AES E-Library

Deep Learning Based Voice Extraction and Primary-Ambience Decomposition for Stereo to Surround Upmixing

Surround systems have gained popularity in home entertainment, even though most cinematic content is delivered in two-channel stereo. Although several upmixing options exist, it has proven challenging to produce an upmixed signal that approximates the directionality and timbre originally intended by the mixing artist. The aim of this work is to design a two-to-five-channel upmixer using a novel upmixing strategy that combines voice extraction with primary-ambience decomposition. Results from a modified MUSHRA test show that our proposed upmixer outperforms established alternatives for cinematic upmixing in perceived spatial and timbral quality.
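The abstract names primary-ambience decomposition as one of the upmixer's two building blocks. The paper's decomposition is deep-learning based, and its details are not given on this page. As a point of reference only, the sketch below shows the classical PCA-based primary-ambience decomposition of a single stereo block, a common non-learned baseline. It is not the authors' method. The function names and the single broadband block formulation are illustrative assumptions; practical systems commonly apply the same idea per time-frequency tile of an STFT. The primary part is the projection of each stereo sample onto the principal eigenvector of the 2x2 left/right covariance, and the ambience is the residual.

```python
import math
from typing import List, Tuple

# NOTE: hypothetical illustration of the classical PCA baseline,
# not the deep-learning decomposition described in the paper.
Signal = List[float]


def principal_direction(left: Signal, right: Signal) -> Tuple[float, float]:
    """Unit principal eigenvector of the 2x2 left/right covariance matrix."""
    n = len(left)
    a = sum(l * l for l in left) / n                 # var(L), zero-mean audio assumed
    c = sum(r * r for r in right) / n                # var(R)
    b = sum(l * r for l, r in zip(left, right)) / n  # cov(L, R)
    # Largest eigenvalue of [[a, b], [b, c]] in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # Two algebraically valid eigenvector candidates; keep the better-conditioned one.
    v1 = (lam - c, b)
    v2 = (b, lam - a)
    v = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    norm = math.hypot(*v)
    if norm == 0.0:  # isotropic covariance: every direction is principal
        return (1.0, 0.0)
    return (v[0] / norm, v[1] / norm)


def pca_primary_ambience(
    left: Signal, right: Signal
) -> Tuple[Signal, Signal, Signal, Signal]:
    """Return (primary_L, primary_R, ambience_L, ambience_R) for one stereo block."""
    if len(left) != len(right) or not left:
        raise ValueError("left and right must be non-empty and equal in length")
    u0, u1 = principal_direction(left, right)
    proj = [u0 * l + u1 * r for l, r in zip(left, right)]  # u^T x for each sample
    pl = [u0 * p for p in proj]                            # primary = u (u^T x)
    pr = [u1 * p for p in proj]
    al = [l - p for l, p in zip(left, pl)]                 # ambience = x - primary
    ar = [r - p for r, p in zip(right, pr)]
    return pl, pr, al, ar
```

By construction, primary plus ambience reconstructs the input block exactly, and the two parts are orthogonal sample by sample.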

Permalink: https://aes2.org/publications/elibrary-page/?id=22087
E-Library location: 16938