Journal of the Audio Engineering Society

2019 December - Volume 67 Number 12


When an audio recording is used as evidence in litigation and forensic investigations, it needs to be checked thoroughly for authenticity and integrity in order to be admissible, compelling, and decisive evidence in a court of law. An audio recording can be subject to tampering attacks with easy-to-use editing and signal processing tools, thereby undermining its legal value. Artifacts embedded in an audio recording can provide valuable clues about the acoustic environment in which the audio was recorded and allow for the detection of tampering. This paper presents findings of two parallel methodologies: (1) where the features are extracted from the room impulse response and (2) where features are extracted directly from the reverberated recordings. These methods focus on extracting parameters from audio recordings that help distinguish different auditory scenes. Experiments employing an exhaustive set of machine learning classifiers along with different acoustic features were conducted for the classification of auditory environments. A comparative analysis has been carried out to assess the performance of each classifier and the relative performance impact of each feature set in terms of the accuracy of classification. A two-layer Artificial Neural Network (ANN) provided an accuracy of 98.7% using room impulse responses and an accuracy of 99.5% when trained on the reverberated audio recordings.

The playability and degradation of polyester magnetic media have been an ongoing concern for decades for audio curators, technicians, and hobbyists. As these collections continue to age, users increasingly desire to transfer their contents. However, such a task can be daunting. This report presents a new, rapid, nontechnical tool for evaluating the playability and physical surface of polyester magnetic tapes without needing to place them on playback equipment or use expensive technical instrumentation. Water contact angle, measured using a microliter-sized droplet, was found to accurately predict the physical playback condition of the vast majority of tapes in a sampling of test tapes from the Library of Congress testing labs. This tool provides an appealingly simple and powerful method to directly probe a tape's physical surface. Results could frequently be interpreted by eye, without needing technical processing equipment or software.

Even though digital technology now dominates the audio industry, there is still the need to preserve historic analog machines and instruments. The Onde Martenot, invented in 1928, is an example of a classic electronic musical instrument based on heterodyne processing. This paper describes a simulation of that instrument. In the Onde Martenot, two oscillators generate high-frequency quasi-sinusoidal signals, one of which is fixed and the other is controlled by a player using a sliding ribbon. The sum of these two oscillators is an amplitude-modulated signal whose envelope is detected using a triode vacuum tube. That produces an audible sound with a frequency that is the difference of the two oscillator frequencies. The triode vacuum tube in the detector is a nonlinear component that adds harmonics to the signal. This paper focuses on using a power-balanced simulation of its ribbon-controlled oscillator, composed of linear, nonlinear, and time-varying components. Numerical experiments on the nonlinear time-varying circuit lead to expected observations: (1) the combination of the triode amplification and the LC-resonator produces a quasi-sinusoidal oscillation with a stable amplitude for a static configuration; (2) the mechanical force produced by the variable capacitor due to the ribbon displacement is undetectable by the musician for over-speed movement; and (3) the latency between the instantaneous frequency and the ribbon position is also undetectable. This corroborates that the Martenot's ribbon-controlled circuit is close to an ideal oscillator.
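The heterodyne principle described in this abstract can be illustrated with a short numerical sketch. It is not the power-balanced simulation from the paper: the carrier frequencies (80 kHz and 80.44 kHz), the sample rate, the full-wave rectifier standing in for the triode detector, and the function name `detected_pitch` are all assumptions chosen for illustration. The sketch sums two high-frequency sinusoids, extracts the envelope, and estimates its frequency, which should come out close to the difference of the two oscillator frequencies.

```python
import math


def detected_pitch(f_fixed, f_variable, sample_rate=1_000_000,
                   duration=0.05, block=100):
    """Estimate the envelope frequency of two summed HF oscillators (hypothetical helper)."""
    n = int(sample_rate * duration)
    # Sum of two equal-amplitude sinusoids gives an amplitude-modulated carrier.
    # Full-wave rectification is an assumed stand-in for the nonlinear detector.
    rectified = [
        abs(math.sin(2 * math.pi * f_fixed * i / sample_rate)
            + math.sin(2 * math.pi * f_variable * i / sample_rate))
        for i in range(n)
    ]
    # Block averaging acts as a crude low-pass filter and decimator,
    # removing the carrier and leaving the slowly varying envelope.
    envelope = [
        sum(rectified[k:k + block]) / block
        for k in range(0, n - block + 1, block)
    ]
    env_rate = sample_rate / block
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]
    # Find upward crossings of the mean level, refined by linear interpolation.
    crossings = []
    for k in range(1, len(centered)):
        a, b = centered[k - 1], centered[k]
        if a < 0 <= b:
            crossings.append((k - 1 + a / (a - b)) / env_rate)
    if len(crossings) < 2:
        raise ValueError("signal too short to estimate the beat frequency")
    # One upward crossing occurs per envelope period.
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])


if __name__ == "__main__":
    pitch = detected_pitch(80_000, 80_440)
    print(f"estimated audible frequency: {pitch:.1f} Hz")
```

Moving the variable oscillator by a few hundred hertz shifts the recovered pitch by the same amount, which is why a small change in the ribbon-controlled capacitor is enough to sweep the instrument across the audible range.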

A Generative Adversarial Net-Based Bandwidth Extension Method for Audio Compression

Authors: Huang, Qingbo; Liu, Tiejun; Wu, Xihong; Qu, Tianshu

To reduce the burden of storing and transmitting audio signals, they are often compressed with a lossy single-channel codec. Because the high-frequency components are effectively truncated when using a low-bitrate encoder, listeners may experience the sound as being uncomfortable, muffled, or dull. To compensate for the perceived degradation, bandwidth extension technology can be used to regenerate the missing high frequencies from the low-frequency components during the decoding process. In this paper the authors propose a bandwidth extension method based on Generative Adversarial Networks (GAN), which is used to estimate the relationship between the MDCT spectrum in the high-frequency part and the low-frequency part. The estimate is evaluated by a discriminant network in the GAN to get a more natural result. A complete audio coding system was built by using AAC Low Complexity as the single-channel core encoder with the proposed bandwidth extension method. To evaluate the audio quality decoded by the new system, a subjective evaluation experiment was carried out using HE-AAC as the baseline system with the MUSHRA experimental method.

A prominent trend in spatial audio research is the realization of virtual acoustic environments based on binaural technology. This study estimates the perceptual influence of system errors on the binaural reproduction of spherical microphone array data for room simulation applications. Specifically, the impact of spatial aliasing, system noise, and microphone positioning errors is perceptually analyzed in a listening experiment using an auditory model. Perceptual and technical data are related by various predictive modeling techniques, which enable estimating the perceptual strength of system errors. The experimental data comprises spherical array simulations under free-field conditions and in two reflective environments, a dry and a reverberant shoebox-shaped room, using five different audio signals for auralization. Results show that error prediction is possible with high accuracy and low errors using nonlinear modeling techniques such as artificial neural networks.

Management of Sound Levels in Live Music Venues

Authors: McGinnity, Siobhan; Mulder, Johannes; Beach, Elizabeth Francis; Cowan, Robert

With the increased recognition of potential damage to listeners when subjected to excessively loud sound, software-based sound level management systems can be viewed as a component of a strategy for reducing sound exposure to patrons and staff in live music venues. However, the use of level management tools in small indoor music venues, which represent a unique environment, has not been systematically explored. In an experimental approach for sound level management, a software system was tried in six indoor live-music venues in Melbourne. Comparing a control (without sound level management software) and the experimental condition (using the software), there was no reduction in mean LAeq,T, although there was a reduction in the number of events with extreme volume levels. Subjective questionnaires indicated that one-fifth of the patrons preferred lower sound levels than they experienced. The findings suggest that modifications to the software system may be necessary if the aim of the system is to reduce patron and staff sound exposure rather than simply to avoid exceeding legislative sound level limits. Recommended alterations could include greater flexibility in choice of target, matching with context of the performance, or changes to the system's visual display so that staying below, not at target, is positively reinforced.

Preferred Levels for Background Ducking to Produce Esthetically Pleasing Audio for TV with Clear Speech

Authors: Torcoli, Matteo; Freke-Morin, Alex; Paulus, Jouni; Simon, Christian; Shirley, Ben


In audio production, background ducking facilitates speech intelligibility while allowing the background to fulfill its purpose, e.g., to create ambiance, set the mood, or convey semantic cues. Technical details for recommended ducking practices are not currently documented in the literature. This report first analyzes the common practices found in TV documentaries, and it then describes a listening test that investigated the preferences of 22 normal-hearing participants on the Loudness Difference (LD) between commentary and background during ducking. Highly personal preferences were observed, highlighting the importance of object-based personalization. A statistically significant difference was found between nonexpert and expert listeners. On average, nonexperts preferred LDs that were 4 LU higher than the ones preferred by experts. A statistically significant difference was also found between Commentary over Music (CoM) and Commentary over Ambiance (CoA). Based on the test results, the authors recommend at least 10 LU difference for CoM and at least 15 LU for CoA. Moreover, a computational method based on the Binaural Distortion-Weighted Glimpse Proportion (BiDWGP) was found to match the median preferred LD for each item with good accuracy.

Standards and Information Documents

AES Standards Committee News


[Feature] Room acoustics affect many of the things that audio engineers make or do. Scale models for simulating building acoustics have been given a new look in a potential role as echo chambers for live performance or recording. Modal decay at low frequencies can be evaluated using a clever wavelet transform technique. Finite element models may be employed to support measurements of reverberation time in small rooms. Music performances may vary in tempo when they’re done in different reverberation conditions, but the effects are not entirely predictable. Finally, it may be possible to make a desk screen that is both visually transparent and performs well acoustically.

147th Convention Report, New York

Exhibitors and Sponsors

Call for Nominations for the Board of Governors

147th Convention Papers Abstracts, New York

Call for Awards Nominations

Index to Volume 66


2018 Statement of Financial Position

AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
