AES E-Library

On-Device Automatic Speech Remastering Solution in Real Time

As AI technology develops, there are many attempts to offer users new experiences by applying it to various multimedia devices. Most of these technologies are delivered through server-based AI models because the models are large. Audio AI technologies in particular are mostly offered through apps and run as offline, server-based models. However, AI technology that can run in real time is clearly important and attractive for streaming devices such as TVs. This paper introduces an on-device automatic speech remastering solution. The solution extracts speech in real time on the device and automatically adjusts the speech level according to the current background sound and the volume level of the device. In addition, an automatic speech normalization technique is applied to reduce the variation in speech level from one piece of content to another. By automatically improving the delivery of speech and normalizing speech levels, the proposed solution improves users' comprehension of and immersion in the content without requiring them to adjust the volume manually. There are three key points in this paper. The first is a deep learning speech extraction model that can run in real time on TV devices, the second is an optimized implementation method using the DSP and NPU, and the last is the audio signal processing used in speech remastering to improve speech intelligibility.
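The level-adjustment idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes the speech and background signals have already been separated frame by frame (the paper uses a deep learning model for this), and it uses a hypothetical gain rule: move the speech toward a target level, then raise it further if it would not sit a minimum margin above the background. The names `rms_db`, `speech_gain_db`, `remaster_frame` and the parameters `target_speech_db`, `min_snr_db`, `max_gain_db` are all assumptions made for illustration, and the device volume level mentioned in the abstract is left out for brevity.

```python
import math


def rms_db(frame, floor_db=-120.0):
    """Return the RMS level of a frame in dBFS (full scale = 1.0)."""
    if not frame:
        return floor_db
    mean_square = sum(x * x for x in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


def speech_gain_db(speech, background, target_speech_db=-23.0,
                   min_snr_db=6.0, max_gain_db=12.0):
    """Hypothetical gain rule (an assumption, not the paper's method)."""
    speech_db = rms_db(speech)
    background_db = rms_db(background)
    # Normalization: gain that would move the speech to the target level.
    gain = target_speech_db - speech_db
    # Intelligibility: gain needed for speech to sit min_snr_db above background.
    needed = (background_db + min_snr_db) - speech_db
    gain = max(gain, needed)
    # Clamp the gain so a single frame cannot cause an extreme level change.
    return max(-max_gain_db, min(max_gain_db, gain))


def remaster_frame(speech, background, **kwargs):
    """Apply the gain to the speech and remix it with the untouched background."""
    linear_gain = 10.0 ** (speech_gain_db(speech, background, **kwargs) / 20.0)
    return [linear_gain * s + b for s, b in zip(speech, background)]
```

For example, when speech and background frames both measure -20 dBFS, this rule boosts the speech by 6 dB so that it sits 6 dB above the background. A real-time system would also smooth the gain across frames to avoid audible pumping, which this sketch omits.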

 

Permalink: https://aes2.org/publications/elibrary-page/?id=22764


E-Library location: 16938