Journal of the Audio Engineering Society

2018 October - Volume 66 Number 10


Single-Ended Speech Quality Prediction Based on Automatic Speech Recognition

Authors: Huber, Rainer; Ooster, Jasper; Meyer, Bernd T.

Quality evaluation of digitally transmitted speech is an important prerequisite to ensure the required quality of telecommunication service. Although formal subjective listening tests still represent the gold standard, they are time-consuming and costly. A new single-ended speech quality measure is proposed that uses a deep neural network (DNN)-based automatic speech recognition system. A quality measure is used to quantify the degradation of the DNN output caused by speech distortions. The new method was evaluated using five databases containing nine subsets of data covering several conditions of narrowband and broadband speech that was degraded by speech codecs, telecommunication networks, clipping, chopped speech, echoes, competing speakers, and additional background noises. Excluding the training data set, evaluation on the remaining eight data subsets showed good average correlations with subjective speech quality ratings, achieved without any task-specific training or optimization. These average results are close to those achieved with the American National Standard ANIQUE+ and clearly better than those obtained with the ITU-T standard P.563.

Layered Motion and Gesture Sonification in an Interactive Installation

Authors: Burloiu, Grigore; Mihai, Valentin; Damian, Stefan

SoundThimble is an interactive sound installation based on the relationship between human motion and virtual objects in 3D space. A Vicon infrared motion-capture system and custom software are used to track, interpret, and sonify the movement and gestures of a performer relative to a virtual object. The authors explore the resulting possibilities for layered sonification dynamics and extended perception and expression in internal tests as well as in a public demo. Experimental evaluation reveals an average object search time of around 60 s, as well as thresholding ranges for effective gesture spotting. The underlying software platform is open source and portable to similar hardware systems, leaving room for extension and variation. This paper presents the pilot application of the proposed framework. Audience members entering the tracking area shift among the roles of game player, sonic performer, and composer/arranger, according to an iterative interaction schema. The central vehicle in all three layers is the “sound-thimble” itself, a virtual object with particular spatial, sonic, and interaction attributes.

With the ever-increasing applications for digital signal processing, there is a strong motivation to discover new processing techniques. Methods based on matrix rank minimization have been increasingly used for signal analysis, particularly for signal separation. This research considers the analysis and application of Non-Negative Matrix Factorization (NMF), associated with Kullback-Leibler and Itakura-Saito divergences, for the separation of digital sound sources consisting of harmonic and percussive elements. The NMF algorithm and divergence functions were implemented in a MATLAB environment and applied to musical mixes composed of electric guitar, bass, kick, ride, and snare. The performance of the divergence functions was then compared using SNR-based metrics. Considering the inconsistencies between the objective metrics and human perception, two alternative objective metrics were proposed for the Signal-Interference Ratio (SIR), called Windowed SIR (W-SIR) and Average Windowed SIR (AW-SIR). Based on the W-SIR metric, the authors present the new Recursive Semi-Supervised NMF (RSS-NMF), for which the training information is extracted from the original signal. In both cases, the results demonstrated better performance of the RSS-NMF technique in relation to the non-supervised NMF technique.
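As a rough illustration of the factorization this abstract refers to, the following is a minimal sketch of plain (unsupervised) NMF with the standard multiplicative updates for the generalized Kullback-Leibler divergence. It is not the authors' MATLAB implementation and not their RSS-NMF variant; the function names (`nmf_kl`, `kl_divergence`), the toy matrix, and the `EPS` guard are assumptions made for this example.

```python
import math
import random

EPS = 1e-12  # assumed small constant guarding against division by zero and log(0)

def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kl_divergence(V, R):
    """Generalized Kullback-Leibler divergence D(V || R), summed over all entries."""
    total = 0.0
    for v_row, r_row in zip(V, R):
        for v, r in zip(v_row, r_row):
            total += v * math.log((v + EPS) / (r + EPS)) - v + r
    return total

def nmf_kl(V, rank, n_iter=200, seed=0):
    """Factor a non-negative matrix V (e.g. a magnitude spectrogram) as W @ H."""
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        # Update the activations H using the ratio Q = V / (W H).
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + EPS) for j in range(n_cols)] for i in range(n_rows)]
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n_rows)) + EPS
            for j in range(n_cols):
                num = sum(W[i][k] * Q[i][j] for i in range(n_rows))
                H[k][j] *= num / col_sum
        # Update the basis W with the ratio recomputed from the new H.
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + EPS) for j in range(n_cols)] for i in range(n_rows)]
        for k in range(rank):
            row_sum = sum(H[k]) + EPS
            for i in range(n_rows):
                num = sum(Q[i][j] * H[k][j] for j in range(n_cols))
                W[i][k] *= num / row_sum
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history

if __name__ == "__main__":
    # Toy "spectrogram" built from two non-overlapping components.
    V = [[1, 0, 2, 0], [2, 0, 4, 0], [0, 3, 0, 1], [0, 6, 0, 2]]
    W, H, history = nmf_kl(V, rank=2, n_iter=100)
    print("KL divergence, first and last iteration:", history[0], history[-1])
```

In a source-separation setting, the columns of W would typically play the role of spectral templates and the rows of H their activations over time. The Itakura-Saito divergence mentioned in the abstract uses different update rules and is not shown here.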

Minimally Simple Binaural Room Modeling Using a Single Feedback Delay Network

Authors: Agus, Natalie; Anderson, Hans; Chen, Jer-Ming; Lui, Simon; Herremans, Dorien

The widespread adoption of acoustic modeling in such applications as 3D gaming and virtual reality simulation is generally hindered by the complexity of the implementation. The most efficient binaural acoustic modeling systems use a multi-tap delay to generate early reflections, which are then combined with a feedback delay network to produce generic late reverberation. This report presents a method of binaural acoustic simulation that uses one feedback delay network to simultaneously model both first-order reflections and late reverberation. The advantages are simplicity and efficiency. The proposed method is compared to existing methods for modeling binaural early reflections using a multi-tap delay line. Measurements of ISO standard evaluators including interaural correlation coefficient, decay time, clarity, definition, and center time indicate that the proposed method achieves accuracy comparable to that of less efficient methods. The proposed method is implemented as an iOS application, and is able to auralize an input signal directly without convolution and to update in real time. This significantly reduces the computational time because it does not need to produce an impulse response after every parameter update.
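For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of a generic, mono feedback delay network (FDN). It does not reproduce the authors' binaural method or their handling of first-order reflections; the Householder feedback matrix, the delay lengths, and the single broadband `feedback_gain` are assumptions chosen for illustration.

```python
from collections import deque

def householder(n):
    """Orthogonal Householder matrix I - (2/n) * ones, a common FDN feedback matrix."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)] for i in range(n)]

def fdn_process(x, delays, feedback_gain=0.9):
    """Run the samples in x through a small FDN and return the output samples.

    Each delay line receives the input plus a mix of all delay-line outputs.
    A feedback_gain below 1 makes the recirculating energy decay.
    """
    n = len(delays)
    A = householder(n)
    # Each deque holds exactly `d` samples, so index 0 is the sample from d steps ago.
    lines = [deque([0.0] * d, maxlen=d) for d in delays]
    y = []
    for sample in x:
        outs = [line[0] for line in lines]
        y.append(sum(outs))
        for i, line in enumerate(lines):
            fb = feedback_gain * sum(A[i][j] * outs[j] for j in range(n))
            line.append(sample + fb)  # maxlen drops the oldest sample automatically
    return y

if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 1999
    response = fdn_process(impulse, delays=[3, 5, 7, 11])
    print(response[:12])
```

Because the network is driven sample by sample, it can process an input signal directly rather than first rendering an impulse response and convolving with it, which is the property the abstract highlights.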

Modern technology, such as SPICE circuit analysis and signal processing implementation of equations, can be applied to ancient technology such as triode vacuum tubes to gain a better understanding of their properties. In this paper the authors propose a formula that can be applied uniformly to a series of miniature tubes: 12AX7, 12AU7, 12AY7, and 12AT7. This model considers the effects of the differences in physical shape and dimension on the equivalent diode characteristics in the initial velocity region. Such effects were theoretically analyzed at the beginning of tube development in the 1920s. However, the results were never incorporated into today’s SPICE modeling. The model consists of a space-charge modulation formula, which enables the expression of nonlinear behavior over a wide operating range of positive and negative grid bias in a circuit. Thus, the model also successfully simulates the modulation of the amplification factor obtained from the differential parameters of transconductance and output resistance. Furthermore, the dispersion of parameter values for the different tube types is compared in detail based on the electrode shape and the physical dimensions associated with various aspects of their properties. From this analysis, the manufacturing stability for 12AU7 and 12AY7 is found to be better than that for 12AX7 and 12AT7 because of the difficulty of accurately manufacturing tubes with a high amplification factor and a high transconductance, respectively.

Identification of Volterra Models of Tube Audio Devices using Multiple-Variance Method

Authors: Orcioni, Simone; Terenzi, Alessandro; Cecchi, Stefania; Piazza, Francesco; Carini, Alberto

The multiple-variance method is a cross-correlation method that exploits input signals with different powers for the identification of a nonlinear system by means of the Volterra series. It overcomes the problem of the locality of the solution of traditional nonlinear identification methods, which approximate the system successfully only for inputs having approximately the same power as the identification signal. The multiple-variance method improves the model performance in the case of inputs with high dynamic range. This method is used to identify three different tube amplifiers, and it is applied to a novel reduced Volterra model. This overcomes the problem of the very large number of coefficients required by the Volterra series, the so-called “curse of dimensionality.” The paper demonstrates the effectiveness of the multiple-variance methodology in terms of system identification error and computational complexity.
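To make the coefficient-count problem concrete, the following is a minimal sketch of a truncated second-order Volterra model together with a count of distinct kernel coefficients. It illustrates only the model structure; it does not implement the multiple-variance identification method or the authors' reduced model, and the names `volterra2` and `n_coefficients` are hypothetical.

```python
from math import comb

def volterra2(x, h1, h2, h0=0.0):
    """Evaluate a second-order Volterra model with memory M = len(h1).

    y[n] = h0 + sum_i h1[i] x[n-i] + sum_i sum_j h2[i][j] x[n-i] x[n-j]
    Samples before the start of x are taken as zero.
    """
    M = len(h1)
    y = []
    for n in range(len(x)):
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(M)]
        linear = sum(h1[i] * past[i] for i in range(M))
        quadratic = sum(
            h2[i][j] * past[i] * past[j] for i in range(M) for j in range(M)
        )
        y.append(h0 + linear + quadratic)
    return y

def n_coefficients(memory, order):
    """Distinct coefficients of symmetric kernels of orders 1..order."""
    return sum(comb(memory + p - 1, p) for p in range(1, order + 1))

if __name__ == "__main__":
    y = volterra2([1.0, 2.0], h1=[1.0, 0.5], h2=[[0.1, 0.0], [0.0, 0.0]])
    print(y)
    for order in (1, 2, 3, 5):
        print(order, n_coefficients(memory=10, order=order))
```

Even with symmetric kernels, the count grows combinatorially with both memory length and nonlinearity order, which is why reduced model structures are attractive.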

Standards and Information Documents

AES Standards Committee News


Preserving Our Audio Heritage

Authors: Rumsey, Francis

[Feature] Papers presented at the recent conference on archiving, preservation, and restoration majored on the degradation of cylinders and magnetic tapes, but also provided a fascinating insight into other issues such as correcting timebase errors, preservation metadata, and checking digital recordings for ingest errors.

Music-Induced Hearing Disorders Conference Report, Chicago

Spatial Reproduction Conference Report, Tokyo

New Officers 2018-2019

Review of Society's Sustaining Members


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
