Emotional speech is a separate channel of communication that carries the paralinguistic aspects of spoken language. Knowledge of affective information can be crucial for contextual speech recognition, and it can also reveal aspects of the speaker's personality and psychological state, enriching the communication. Such data may play an important role as semantic analysis features of web content and would also apply to intelligent affective new media and social interaction domains. A model for Speech Emotion Recognition (SER) based on a Convolutional Neural Network (CNN) architecture is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The dataset used for training and testing the model is the Acted Emotional Speech Dynamic Database (AESDD), a publicly available corpus in the Greek language. Experiments involving the subjective evaluation of the AESDD are presented to serve as a reference for human-level recognition accuracy. The proposed CNN architecture outperforms the previous baseline machine learning models (Support Vector Machines) by 8.4% in terms of accuracy, and it is also more efficient because it bypasses the stage of handcrafted feature extraction. Data augmentation of the database did not affect classification accuracy in the validation tests but is expected to improve robustness and generalization. Besides the performance improvements, the unsupervised feature-extraction stage of the proposed topology also makes real-time systems feasible.
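The abstract states that recognition is performed on successive time frames of continuous speech. As a rough illustration only, the sketch below cuts a sampled signal into fixed-length frames, each of which could then be passed to a classifier. The function name `split_into_frames` and the `frame_len` and `hop_len` parameters are assumptions made for this example; the frame duration and overlap actually used in the paper are not stated on this page.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float], frame_len: int, hop_len: int
) -> List[List[float]]:
    """Cut a 1-D signal into successive fixed-length frames.

    frame_len and hop_len are counted in samples (hypothetical
    parameters, not values taken from the paper). Trailing samples
    that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame_len and hop_len must be positive")
    frames = []
    # The last valid start index is len(samples) - frame_len.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(list(samples[start:start + frame_len]))
    return frames


if __name__ == "__main__":
    # Toy signal: 10 samples, frames of 4 samples with a hop of 2.
    toy_signal = list(range(10))
    for frame in split_into_frames(toy_signal, frame_len=4, hop_len=2):
        print(frame)
```

With a hop shorter than the frame length, consecutive frames overlap; setting `hop_len` equal to `frame_len` gives non-overlapping frames. Either choice is a design decision of this sketch rather than something taken from the paper.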
Author(s): Vryzas, Nikolaos; Vrysis, Lazaros; Matsiola, Maria; Kotsakis, Rigas; Dimoulas, Charalampos; Kalliris, George
Affiliation:
Aristotle University of Thessaloniki, Thessaloniki, Greece
(See document for exact affiliation information.)
Publication Date:
2020-01-06
Permalink: https://aes2.org/publications/elibrary-page/?id=20714
Vryzas, Nikolaos; Vrysis, Lazaros; Matsiola, Maria; Kotsakis, Rigas; Dimoulas, Charalampos; Kalliris, George; 2020; Continuous Speech Emotion Recognition with Convolutional Neural Networks [PDF]; Aristotle University of Thessaloniki, Thessaloniki, Greece; Paper; Available from: https://aes2.org/publications/elibrary-page/?id=20714
Vryzas, Nikolaos; Vrysis, Lazaros; Matsiola, Maria; Kotsakis, Rigas; Dimoulas, Charalampos; Kalliris, George; Continuous Speech Emotion Recognition with Convolutional Neural Networks [PDF]; Aristotle University of Thessaloniki, Thessaloniki, Greece; Paper; 2020; Available: https://aes2.org/publications/elibrary-page/?id=20714
@article{vryzas2020continuous,
author={Vryzas, Nikolaos and Vrysis, Lazaros and Matsiola, Maria and Kotsakis, Rigas and Dimoulas, Charalampos and Kalliris, George},
journal={Journal of the Audio Engineering Society},
title={Continuous Speech Emotion Recognition with Convolutional Neural Networks},
year={2020},
volume={68},
number={1/2},
pages={14--24},
month={January},
}
TY  - paper
TI  - Continuous Speech Emotion Recognition with Convolutional Neural Networks
SP  - 14
EP  - 24
AU  - Vryzas, Nikolaos
AU  - Vrysis, Lazaros
AU  - Matsiola, Maria
AU  - Kotsakis, Rigas
AU  - Dimoulas, Charalampos
AU  - Kalliris, George
PY  - 2020
JO  - Journal of the Audio Engineering Society
VO  - 68
IS  - 1/2
Y1  - January 2020