AES E-Library


Fighting AI with AI: Fake Speech Detection Using Deep Learning

Voice cloning technologies have found applications in a variety of areas, ranging from personalized speech interfaces to advertising and video gaming. Existing voice cloning systems can learn speaker characteristics from only a few samples and generate speech that is perceptually indistinguishable from the real speaker. These advances pose new security and privacy threats to voice-driven interfaces. This paper presents a deep learning-based framework that learns the characteristics of cloned speech synthesis models and of the bona fide speech production process. To this end, a convolutional neural network is trained and tested on spectrograms estimated from input audio recordings. The performance of the proposed method is evaluated on cloned and bona fide audio recordings. Experimental results indicate that the proposed method can distinguish bona fide from cloned audio with close to perfect accuracy.
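The abstract does not state how the spectrograms are estimated, so the following is only a minimal, dependency-free sketch of that front-end step: a log-magnitude short-time Fourier transform of the kind that could be fed to a convolutional network as a 2-D input. The function names (`hann`, `log_spectrogram`), the frame length, the hop size, and the synthetic test tone are all illustrative assumptions, not the paper's settings, and the classifier itself is not shown.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes in dB.

    frame_len and hop are assumed values chosen for illustration only.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis for the non-negative frequency bins.
    basis = [
        [cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
        for k in range(n_bins)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame, then take its DFT bin by bin.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            x = sum(c * b for c, b in zip(chunk, basis[k]))
            row.append(20 * math.log10(abs(x) + eps))
        frames.append(row)
    return frames


# Synthetic stand-in for a recording: a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = log_spectrogram(tone)
# Strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated as a single-channel image. In practice an optimized FFT from a numerical library would replace the explicit DFT loop used here for self-containment.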



