With the development of AI technology, there have been many attempts to provide new user experiences by applying AI to various multimedia devices. Because of their large model sizes, most of these technologies are delivered through server-based AI models. In particular, most audio AI technologies are provided through apps and rely on server-based or offline processing. However, AI technology that can run in real time is clearly important and attractive for streaming devices such as TVs. This paper introduces an on-device automatic speech remastering solution. The solution extracts speech in real time on the device and automatically adjusts the speech level according to the current background sound and the device's volume level. In addition, an automatic speech normalization technique is applied to reduce the variation in speech level across content. By automatically improving speech delivery and normalizing speech levels, the proposed solution gives users better comprehension of, and immersion in, the content without requiring them to adjust the volume manually. The paper has three key points: first, a deep learning speech extraction model that can run in real time on TV devices; second, an optimized implementation using the DSP and NPU; and third, audio signal processing for speech remastering that improves speech intelligibility.
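The abstract does not give the remastering equations, so the following is only a rough illustration of how level control of this kind could work. It assumes the extraction model has already split each frame into separate speech and background signals. Every name and parameter here (`target_speech_db`, `min_snr_db`, `low_volume_extra_db`, `max_gain_db`, `speech_gate_db`, and a device volume normalized to 0..1) is a hypothetical choice, not the authors' method. The speech gain is the larger of two terms: a normalization term that pulls speech toward a fixed target level, and an intelligibility term that keeps speech a set margin above the background, with a larger margin at low volume. The result is then clamped to a safe range.

```python
import math
from typing import List, Sequence


def rms_db(frame: Sequence[float], floor_db: float = -120.0) -> float:
    """RMS level of a frame in dBFS, floored so silence does not give log(0)."""
    if not frame:
        return floor_db
    mean_sq = sum(x * x for x in frame) / len(frame)
    if mean_sq <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_sq))


def speech_gain_db(
    speech_db: float,
    background_db: float,
    volume: float,
    target_speech_db: float = -24.0,   # hypothetical normalization target
    min_snr_db: float = 6.0,           # hypothetical speech-over-background margin
    low_volume_extra_db: float = 6.0,  # hypothetical extra margin at zero volume
    max_gain_db: float = 12.0,         # hypothetical gain limit in either direction
    speech_gate_db: float = -60.0,     # below this, treat the speech stem as silent
) -> float:
    """Gain (dB) to apply to the extracted speech stem for one frame."""
    if speech_db < speech_gate_db:
        return 0.0  # no speech detected: leave the stem untouched
    volume = min(1.0, max(0.0, volume))
    # Normalization term: pull speech toward a fixed target level.
    norm_gain = target_speech_db - speech_db
    # Intelligibility term: keep speech above the background by a margin
    # that grows as the device volume is turned down.
    required_snr = min_snr_db + low_volume_extra_db * (1.0 - volume)
    snr_gain = (background_db + required_snr) - speech_db
    gain = max(norm_gain, snr_gain)
    return max(-max_gain_db, min(max_gain_db, gain))


def remaster_frame(
    speech: Sequence[float], background: Sequence[float], volume: float
) -> List[float]:
    """Re-mix one frame with the speech stem scaled by speech_gain_db."""
    if len(speech) != len(background):
        raise ValueError("speech and background frames must be the same length")
    g_db = speech_gain_db(rms_db(speech), rms_db(background), volume)
    g = 10.0 ** (g_db / 20.0)
    return [g * s + b for s, b in zip(speech, background)]
```

A practical implementation would also smooth the gain over time and use a proper speech-activity detector rather than the simple level gate shown here.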
Author(s): Kim, Dongwoo; Hwang, Inwoo; Kim, Sunmin
Affiliation: Sound Laboratory, Visual Display Division, Samsung Electronics (all authors)
(See document for exact affiliation information.)
AES Convention: 157
Paper Number: 306
Publication Date: 2024-10-01
Permalink: https://aes2.org/publications/elibrary-page/?id=22764
Kim, Dongwoo; Hwang, Inwoo; Kim, Sunmin; 2024; On-Device Automatic Speech Remastering Solution in Real Time; Sound Laboratory, Visual Display Division, Samsung Electronics; Paper 306; Available from: https://aes2.org/publications/elibrary-page/?id=22764