E-library page

AES E-Library

Continuous Speech Emotion Recognition with Convolutional Neural Networks

Emotional speech is a separate channel of communication that carries the paralinguistic aspects of spoken language. Affective information knowledge can be crucial for contextual speech recognition, which can also provide elements from the personality and psychological state of the speaker enriching the communication. That kind of data may play an important role as semantic analysis features of web content and would also apply in intelligent affective new media and social interaction domains. A model for Speech Emotion Recognition (SER), based on Convolutional Neural Networks (CNN) architecture is proposed and evaluated. Recognition is performed on successive time frames of continuous speech. The dataset used for training and testing the model is the Acted Emotional Speech Dynamic Database (AESDD), a publicly available corpus in the Greek language. Experiments involving the subjective evaluation of the AESDD are presented to serve as a reference for human-level recognition accuracy. The proposed CNN architecture outperforms previous baseline machine learning models (Support Vector Machines) by 8.4% in terms of accuracy and it is also more efficient because it bypasses the stage of handcrafted feature extraction. Data augmentation of the database did not affect classification accuracy in the validation tests but is expected to improve robustness and generalization. Besides performance improvements, the unsupervised feature-extraction stage of the proposed topology also makes it feasible to create real-time systems.

Author (s): Vryzas, Nikolaos; Vrysis, Lazaros; Matsiola, Maria; Kotsakis, Rigas; Dimoulas, Charalampos; Kalliris, George
Affiliation: Aristotle University of Thessaloniki, Thessaloniki, Greece (See document for exact affiliation information.)
Publication Date: 2020-01-06 Import into BibTeX
Permalink: https://aes2.org/publications/elibrary-page/?id=20714

(195KB)

This paper costs $33 for non-members and is free for AES members and E-Libary subscribers.

Click to purchase paper as a non-member or login as an AES member. If your company or school subscribes to the E-Library then switch to the institutional version. If you are not an AES member Join the AES. If you need to check your member status, login to the Member Portal.

Type: Journal Article
E-Libary location: (CD JAES68) TMP/JAES68/1/

Learn more about the AES E-Library

AES Conventions

AES Conferences

AES Training & Development

Gift Membership

AES Membership Benefits

Gift Membership

AES Membership Benefits

Become a Sustaining Member

AES Membership Benefits

AES Inside Track

Current Standards

Standards Blog

Journal of the AES

AES E-library

Special Publications

AES Sections are active around the world and provide a means for members to meet locally.

AES Student Website

AES Educational Foundation

Student Sections

See the committee’s accomplishments in diversity & inclusion

AES Statement of solidarity

AES E-Library

Continuous Speech Emotion Recognition with Convolutional Neural Networks

Choose your country of residence from this list:

AES E-Library

Login Institutions

Continuous Speech Emotion Recognition with Convolutional Neural Networks

Choose your country of residence from this list: