Journal of the Audio Engineering Society

2017 October - Volume 65 Number 10


Over the last decade, numerous three-dimensional audio playback formats have been introduced and standardized for cinema, broadcast, and home theater environments. They differ in terms of number of speakers, speaker positions in the horizontal and vertical planes, and workflow strategies: channel-based, object-based, or some hybrid of the two. Each system possesses inherent pros and cons. This research attempts to determine whether listeners can discriminate among four currently standardized three-dimensional audio formats for reproduction of acoustic music. Double-blind listening tests showed that listeners could discriminate between NHK 22.2 Multichannel Sound (22.2) and several lower-channel-count 3D reproduction formats with a high degree of success, regardless of the musical stimulus. Listeners were also able to discriminate between three relatively similar 3D audio formats: ATSC 11.1, KBS 10.2, and Auro 9.1, although with significantly less success than with 22.2. This suggests that each of these formats delivers a perceptually different listening experience, with 22.2 being particularly different from the other formats under investigation.

Audio-based surveillance systems can be used in public places to detect abnormal events because such events are usually accompanied by abnormal sounds, such as screaming, explosions, gunfire, and crashing sounds. Audio-surveillance systems can supplement video surveillance. This paper proves that a T-distribution model is highly suitable for describing a wide range of typical background noise distributions encountered in public places. Background noise in public places can affect feature extraction for abnormal sounds when Local Mean Decomposition (LMD) is used as a signal-processing tool. The authors first confirm that the background noise obeys a T-distribution using Kolmogorov-Smirnov hypothesis testing. They then propose an improved LMD method based on the T-distribution to enhance feature extraction. They add particular production function components of inhomogeneous random noise obeying a T-distribution to the abnormal sound in a nested manner and then take the ensemble means of the obtained production functions as the decomposition results. This alleviates the mode mixing inherent in LMD. Additionally, the algorithm replaces the moving-average operation with a linear spline to reduce the iteration required in LMD from triple-loop iteration to double-loop iteration. Experimental results demonstrate that the proposed method outperforms commonly used methods in terms of the classification rate and computational cost.

Efficient Design of a Parallel Graphic Equalizer

Authors: Bank, Balázs; Belloch, Jose A.; Välimäki, Vesa

Accurate design of a parallel graphic equalizer involves constructing a complex target frequency response, obtained by smoothly interpolating between the defined gains using minimum-phase characteristics, followed by a least-squares filter design. This work proposes two methods to simplify the design computations. First, the magnitude and phase response of the target is computed as a combination of minimum-phase basis functions, which makes the total frequency response easier to evaluate. Second, the matrix is decomposed into the product of an orthogonal matrix Q and an upper triangular matrix R, which simplifies the required matrix inversion. A comparison with the previous method shows that the accuracy of the proposed design method is not significantly compromised, while the computational cost is radically reduced, making the new algorithm highly attractive for interactive audio applications. The method has been tested on an ARM-based Cortex-A7 system-on-chip, which is currently used in many mobile devices. For the weighted parallel equalizer design, the total speedup is a factor of 7. For the more efficient nonweighted designs, the computation of the filter coefficients takes 0.87 ms on the ARM-A7 processor (the speedup factor is 300 compared to the original method).
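
As a rough illustration of the QR idea mentioned in this abstract, a least-squares problem min ||Ax - b|| can be solved by factoring A = QR and back-substituting in R x = Q^T b, so that no explicit matrix inverse is formed. The sketch below is a generic pure-Python example, not the paper's equalizer design: the matrix, the target vector, and the helper names `qr_decompose` and `solve_least_squares` are assumptions made only for illustration.

```python
def qr_decompose(a):
    """Thin QR factorization of an m x n matrix (list of rows) using
    modified Gram-Schmidt. Returns the columns of Q and the n x n matrix R.
    Assumes the columns of `a` are linearly independent."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already computed orthonormal columns.
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = sum(x * x for x in v) ** 0.5
        q_cols.append([x / r[j][j] for x in v])
    return q_cols, r


def solve_least_squares(a, b):
    """Solve min ||a x - b|| via QR: back-substitute in R x = Q^T b."""
    q_cols, r = qr_decompose(a)
    n = len(r)
    y = [sum(qc[i] * b[i] for i in range(len(b))) for qc in q_cols]  # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(r[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / r[i][i]
    return x


# Toy data (illustrative only): fit b = c0 + c1 * t at t = 0, 1, 2, 3.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [1.0, 3.0, 5.0, 7.0]
coeffs = solve_least_squares(A, b)
```

Because R is upper triangular, the solve reduces to one matrix-vector product and one back-substitution. This is the general reason a QR factorization is cheaper and numerically better behaved than inverting A^T A directly.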

Impulsive Disturbances in Audio Archives: Signal Classification for Automatic Restoration

Authors: Brandt, Matthias; Doclo, Simon; Gerkmann, Timo; Bitzer, Joerg

Historic recordings usually have degraded audio quality because of their age, improper storage, and the shortcomings of the original media. One typical problem is the presence of impulsive disturbances. Recordings that suffer from clicks and crackles can be processed by impulse-restoration algorithms to improve their audio quality. This report presents a new algorithm that classifies one-second frames of an audio recording based on the existence of impulsive disturbances. The algorithm uses supervised learning. It is shown that existing impulse-restoration algorithms suffer from degradation of the desired signal if the input SNR is high and if no manual parameter adjustment is possible. This would make automatic restoration of large amounts of diverse archive audio material unfeasible. The proposed classification algorithm can be used as a supplement to an existing impulse-restoration algorithm to alleviate this drawback. An evaluation using a large number of test signals shows that high classification accuracy can be achieved, making automatic impulse restoration possible. Results show that prewhitening of the input signal by means of a phase-only transform serves to increase the detectability of disturbance impulses, which can also be used as a detection enhancement method for impulse-restoration algorithms.
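
The prewhitening step can be pictured with a minimal sketch of a phase-only transform: each frequency bin is normalized to unit magnitude while its phase is kept, then the result is transformed back to the time domain. The example below is a pure-Python illustration under stated assumptions, not the authors' implementation. The naive DFT, the helper names `dft` and `phase_only_transform`, and the toy signal (a sinusoid plus one small click at sample 40) are all chosen here for demonstration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_only_transform(x, eps=1e-12):
    """Keep the phase of every bin and set its magnitude to one."""
    spectrum = dft(x)
    whitened = [c / max(abs(c), eps) for c in spectrum]
    return [c.real for c in dft(whitened, inverse=True)]


# Toy frame: a strong tone with a weaker click hidden at sample 40.
n = 64
signal = [math.sin(math.pi * t / 8) for t in range(n)]
signal[40] += 0.5

whitened = phase_only_transform(signal)
peak_before = max(range(n), key=lambda t: abs(signal[t]))
peak_after = max(range(n), key=lambda t: abs(whitened[t]))
```

In this toy frame the tone dominates the raw waveform, so its largest sample is not the click. After whitening, the tone's energy is confined to two unit-magnitude bins while the click keeps coherent phase across all bins, so the largest sample falls at the click position.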

Engineering reports

A High Resolution and Full-Spherical Head-Related Transfer Function Database for Different Head-Above-Torso Orientations

Authors: Brinkmann, Fabian; Lindau, Alexander; Weinzierl, Stefan; Par, Steven van de; Müller-Trapet, Markus; Opdam, Rob; Vorländer, Michael


Head-related transfer functions (HRTFs) capture the free-field sound transmission from a sound source to the listener's ears, incorporating all the cues for sound localization, such as interaural time and level differences as well as the spectral cues that originate from scattering, diffraction, and reflection on the human pinnae, head, and body. In this study, HRTFs were acoustically measured and numerically simulated for the FABIAN head-and-torso simulator on a full-spherical and high-resolution sampling grid. HRTFs were acquired for 11 horizontal head-above-torso orientations, covering the typical range of motion of ±50°. This made it possible to account for head movements in dynamic binaural auralizations. Because of a lack of an external reference for the HRTFs, measured and simulated data sets were cross-validated by applying auditory models for localization performance and spectral coloration. The results indicate a high degree of similarity between the two data sets regarding all tested aspects, thus suggesting that they are free of systematic errors.


[Feature] Audio forensics benefits greatly from the research reported at a recent conference, and work continues on improving the reliability and ease of use of ENF data gathering and analysis. There's also considerable effort going into the authentication of recordings made on mobile devices such as iOS systems. A novel approach to recording authentication was reported based on reverberation analysis.

2018 Spatial Reproduction Conference, Call for Contributions, Tokyo

2018 Audio for Virtual and Augmented Reality Conference, Call for Contributions, Redmond

Sound Reinforcement Conference Report, Struer

2018 Audio Archiving, Preservation, and Restoration Conference, Call for Contributions, Culpeper

Review of Society's Sustaining Members

New Officers 2017/2018


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
