Automatic assessment of coded audio quality is an important task, but progress is hampered by the scarcity of human annotations, by poor generalization to unseen codecs, bitrates, and content types, and by the inflexibility of existing approaches. ViSQOL v3 (ViV3), a typical metric modeled on human perception, has been shown to correlate highly with quality scores rated by humans. In this study, we take steps toward predicting coded audio quality by relying entirely on programmatically generated data that is informed by expert domain knowledge. We propose a learnable neural network, entitled InSE-NET, with a backbone of Inception and Squeeze-and-Excitation modules, to assess the perceived quality of coded audio at a 48 kHz sample rate. We demonstrate that synthetic data augmentation can improve the prediction. Our proposed method is intrusive, i.e., it requires Gammatone spectrograms of the unencoded reference signals. Besides performing comparably to ViV3, our approach provides more robust predictions at higher bitrates.
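For readers unfamiliar with the Squeeze-and-Excitation modules named in the abstract, the following is a minimal sketch of a generic Squeeze-and-Excitation block in NumPy. It is not the InSE-NET configuration: the function name `squeeze_excite`, the tensor shape, the reduction ratio, and the random weights are all illustrative assumptions, and the listing does not specify how the blocks are combined with the Inception modules.

```python
import numpy as np

def squeeze_excite(features, w1, b1, w2, b2):
    """Generic Squeeze-and-Excitation block (illustrative, not InSE-NET's).

    features: array of shape (channels, height, width), e.g. feature maps
    computed from a spectrogram. w1/b1 and w2/b2 are the weights of a small
    two-layer bottleneck network (hypothetical values in this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(1, 2))                       # shape (C,)
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(0.0, w1 @ z + b1)                # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))    # shape (C,), in (0, 1)
    # Scale: reweight each channel of the input by its gate.
    return features * gates[:, None, None], gates

# Assumed toy dimensions: 8 channels, reduction ratio 4, 16x16 feature maps.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
x = rng.standard_normal((channels, 16, 16))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1
b2 = np.zeros(channels)

y, gates = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape, gates.shape)
```

The block leaves the shape of the feature maps unchanged and only rescales channels, which is why it can be attached to other convolutional modules without altering the rest of a network.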
Author(s): Jiang, Guanxin; Biswas, Arijit; Bergler, Christian; Maier, Andreas
Affiliation: Dolby Germany GmbH; Pattern Recognition Lab, FAU Erlangen-Nuremberg, Erlangen, Germany
(See document for exact affiliation information.)
AES Convention: 151
Paper Number: 10514
Publication Date: 2021-10-06
Session subject: Audio quality
Permalink: https://aes2.org/publications/elibrary-page/?id=21478
Jiang, Guanxin; Biswas, Arijit; Bergler, Christian; Maier, Andreas; 2021; InSE-NET: A Perceptually Coded Audio Quality Model based on CNN [PDF]; Dolby Germany GmbH; Pattern Recognition Lab, FAU Erlangen-Nuremberg, Erlangen, Germany; Paper 10514; Available from: https://aes2.org/publications/elibrary-page/?id=21478