Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently improved with the introduction of a labeled time-scaled audio dataset, which was used to train an objective measure. That measure is double-ended: it extends Perceptual Evaluation of Audio Quality and requires both a reference signal and a test signal. This paper proposes two single-ended objective quality measures for time-scaled audio that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network that predicts Subjective Mean Opinion Scores. On the time-scaled audio dataset, the proposed CNN and BGRU measures achieve average Root Mean Square Errors of 0.61 and 0.58, respectively, and mean Pearson Correlation Coefficients of 0.77 and 0.79, respectively. The proposed measures are then used to evaluate TSM algorithms, with comparisons provided for 15 TSM implementations. A link to implementations of the objective measures is provided.
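As an illustration of the two figures of merit quoted in the abstract, the following minimal sketch computes a Root Mean Square Error and a Pearson Correlation Coefficient between predicted and subjective Mean Opinion Scores. It is not the authors' evaluation code: the function names and the score values are hypothetical, chosen only to show the arithmetic, and NumPy is assumed to be available.

```python
import numpy as np


def rmse(predicted, subjective):
    """Root Mean Square Error between predicted and subjective scores."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    return float(np.sqrt(np.mean((predicted - subjective) ** 2)))


def pearson(predicted, subjective):
    """Pearson Correlation Coefficient between the two score lists."""
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the correlation between the two inputs.
    return float(np.corrcoef(predicted, subjective)[0, 1])


# Hypothetical scores on a 1-to-5 opinion scale (illustrative values only,
# not taken from the paper or its dataset).
subjective_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_mos = [1.5, 2.5, 2.5, 4.5, 4.5]

print("RMSE:", rmse(predicted_mos, subjective_mos))
print("PCC:", pearson(predicted_mos, subjective_mos))
```

A lower RMSE means predictions sit closer to the subjective scores in absolute terms, while a PCC closer to 1 means the predictions track the ordering and spread of the subjective scores more faithfully; the two are reported together because a measure can do well on one and poorly on the other.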
Author(s): Roberts, Timothy; Nicolson, Aaron; Paliwal, Kuldip K.
Affiliation:
Griffith University, Nathan, Australia; Australian eHealth Research Centre, CSIRO, Herston, Australia; Griffith University, Nathan, Australia
(See document for exact affiliation information.)
Publication Date:
2021-09-06
Permalink: https://aes2.org/publications/elibrary-page/?id=21461
Roberts, Timothy; Nicolson, Aaron; Paliwal, Kuldip K.; 2021; Deep Learning-Based Single-Ended Quality Prediction for Time-Scale Modified Audio [PDF]; Griffith University, Nathan, Australia; Australian eHealth Research Centre, CSIRO, Herston, Australia; Griffith University, Nathan, Australia; Paper; Available from: https://aes2.org/publications/elibrary-page/?id=21461
Roberts, Timothy; Nicolson, Aaron; Paliwal, Kuldip K.; Deep Learning-Based Single-Ended Quality Prediction for Time-Scale Modified Audio [PDF]; Griffith University, Nathan, Australia; Australian eHealth Research Centre, CSIRO, Herston, Australia; Griffith University, Nathan, Australia; Paper; 2021. Available: https://aes2.org/publications/elibrary-page/?id=21461
@article{roberts2021deep,
author={Roberts, Timothy and Nicolson, Aaron and Paliwal, Kuldip K.},
journal={Journal of the Audio Engineering Society},
title={Deep Learning-Based Single-Ended Quality Prediction for Time-Scale Modified Audio},
year={2021},
volume={69},
number={9},
pages={644--655},
month={September},
}