Journal of the Audio Engineering Society

2019 January/February - Volume 67 Number 1/2

President's Message

Microphone array techniques for surround sound recording can be broadly classified into two groups: those that attempt to produce continuous phantom imaging around 360° in the horizontal plane and those that treat the front and rear channels separately. The equal segment microphone array (ESMA) is a multichannel microphone technique that attempts to capture a sound field in 360° without any overlap between the stereophonic recording angles of adjacent microphone pairs. This study investigated the optimal microphone spacing for a quadraphonic ESMA using cardioid microphones. Recordings of a speech source were made with ESMAs using four different microphone spacings (0 cm, 24 cm, 30 cm, and 50 cm), based on different psychoacoustic models for microphone array design. Multichannel and binaural stimuli were created with the reproduced sound field rotated in 45° intervals. Listening tests examined the accuracy of phantom image localization for each microphone spacing in both loudspeaker and binaural headphone reproduction. The results generally indicated that the 50-cm spacing, derived from an interchannel time and level trade-off model perceptually optimized for a 90° loudspeaker base angle, produced more accurate localization than the 24-cm and 30-cm spacings, which were based on conventional models derived from the standard 60° loudspeaker setup. The 0-cm spacing produced the worst accuracy, with the most frequent bimodal distributions of responses between the front and back regions. The findings are expected to be useful for recording for virtual reality applications as well as for multichannel surround sound.
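The interchannel cues that such spacing models trade off against each other can be illustrated with simple far-field geometry. The Python sketch below computes the interchannel time difference (ICTD) and interchannel level difference (ICLD) between the two front microphones of a quadraphonic cardioid array. It assumes the four microphones sit on a circle facing outward at ±45° and ±135° (so adjacent microphones are `spacing_m` apart), a plane-wave source, a speed of sound of 343 m/s, and ideal first-order cardioid patterns. The function names are hypothetical, and this is not the trade-off model used in the study.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at roughly 20 °C


def mic_position(spacing_m, facing_deg):
    """(x, y) of one mic, assuming outward-facing mics on a circle.

    Adjacent mics in a square layout are `spacing_m` apart, so the
    radius is spacing / sqrt(2). x points to the front, y to the left.
    """
    r = spacing_m / math.sqrt(2.0)
    a = math.radians(facing_deg)
    return (r * math.cos(a), r * math.sin(a))


def cardioid_gain(source_deg, facing_deg):
    """Ideal first-order cardioid sensitivity: 0.5 * (1 + cos(off-axis angle))."""
    return 0.5 * (1.0 + math.cos(math.radians(source_deg - facing_deg)))


def pair_cues(source_deg, spacing_m, facing_a=45.0, facing_b=-45.0):
    """Return (ICTD in ms, ICLD in dB) of mic A relative to mic B.

    Far-field plane wave from azimuth `source_deg` (0 = front, positive =
    counterclockwise). Positive values mean A leads in time / is louder.
    """
    u = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    pa = mic_position(spacing_m, facing_a)
    pb = mic_position(spacing_m, facing_b)
    # Projection onto the source direction: a larger value means earlier arrival.
    proj_a = pa[0] * u[0] + pa[1] * u[1]
    proj_b = pb[0] * u[0] + pb[1] * u[1]
    ictd_ms = 1000.0 * (proj_a - proj_b) / SPEED_OF_SOUND
    eps = 1e-12  # avoid log of zero at a cardioid's rear null
    ga = max(cardioid_gain(source_deg, facing_a), eps)
    gb = max(cardioid_gain(source_deg, facing_b), eps)
    icld_db = 20.0 * math.log10(ga / gb)
    return ictd_ms, icld_db


if __name__ == "__main__":
    # Cues for a source aimed straight at the front-left mic, for each spacing.
    for spacing_cm in (0, 24, 30, 50):
        t, l = pair_cues(45.0, spacing_cm / 100.0)
        print(f"{spacing_cm:>2} cm: ICTD {t:.3f} ms, ICLD {l:.2f} dB")
```

In this idealized geometry the ICLD depends only on the cardioid directivity, while the ICTD grows linearly with spacing. That is why the spacing is the free parameter when tuning a pair's stereophonic recording angle.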

Neural Network Fusion and Selection Techniques for Noise-Efficient Sound Classification

Authors: Mitilineos, Stelios A.; Tatlas, Nicolas-Alexander; Potirakis, Stelios M.; Rangoussi, Maria

An efficient means of classifying potentially hazardous events using wireless acoustic sensor networks may contribute significantly to the preservation of cultural heritage, artifacts, and architectural sites. However, classifying field-collected sound samples is demanding because omnipresent ambient noise severely degrades the quality of the recorded samples and of the features extracted from them. Building on previous work, the authors present a series of fusion (ensemble learning) techniques that poll a number of artificial neural network classifiers to produce class estimates that are significantly more accurate than those of any single classifier or their average. Furthermore, the effect of ambient noise is simulated by injecting additive white and pink noise into the available sound samples, yielding a wide range of signal-to-noise ratio (SNR) values. Numerical results show that the proposed fusion techniques maintain satisfactory accuracy even at negative SNR values, indicating that the proposed classification platform is applicable to real-world applications.
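Two ingredients of this study can be sketched in a few lines of Python: scaling additive noise to a target SNR, and fusing the outputs of several classifiers. The sketch assumes white Gaussian noise only. Pink noise would additionally require shaping the spectrum so its power falls by 3 dB per octave. It shows two generic fusion rules, majority voting over hard labels and averaging of class probabilities. The abstract does not say which fusion rules the authors used, and all function names here are hypothetical.

```python
import math
import random
from collections import Counter


def add_white_noise(signal, snr_db, rng=None):
    """Return signal plus white Gaussian noise scaled to a target SNR in dB."""
    if rng is None:
        rng = random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Scale so that 10*log10(p_signal / (scale**2 * p_noise)) == snr_db.
    scale = math.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [x + scale * n for x, n in zip(signal, noise)]


def majority_vote(labels):
    """Fuse hard decisions: the most common label (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def mean_probability_fusion(prob_vectors):
    """Fuse soft outputs: average class probabilities and return the argmax index."""
    n_classes = len(prob_vectors[0])
    means = [sum(p[k] for p in prob_vectors) / len(prob_vectors)
             for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: means[k])
```

The noise is rescaled from its measured power rather than its nominal variance, so each realization hits the target SNR exactly. This matters when sweeping down to negative SNR values.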

Automatic Noise PSD Estimation for Restoration of Archived Audio

Authors: Brandt, Matthias; Doclo, Simon; Bitzer, Joerg

The quality of audio recordings is often degraded by various types of disturbances, such as broadband noise, hum, clicks, and crackles. Of these, broadband noise is one of the most common, especially in old recordings. Disturbances can be classified as having either a technical or an acoustic origin. This research presents a novel algorithm to estimate the power spectral density (PSD) of stationary broadband noise disturbances in audio recordings. The proposed algorithm estimates the noise PSD as the mean value of an exponential distribution that corresponds to the truncated periodogram coefficients of the disturbed audio signal. A confidence value is computed to reflect the reliability of the noise PSD estimate. Noise PSD estimates with low confidence are rejected to avoid degrading the desired signal when the estimate is used in a noise-reduction algorithm. Experiments with a large database of clean speech and music signals and various artificial and real-world broadband noise disturbances show that the proposed algorithm yields lower PSD estimation errors than the state-of-the-art minimum statistics algorithm over a wide range of SNRs. The algorithm allows unsupervised operation and thus constitutes an important part of a fully automatic broadband noise restoration system for audio archives.
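For stationary Gaussian noise, the periodogram values in one frequency bin across frames are approximately exponentially distributed, with a mean equal to the noise PSD. Large values caused by the desired signal can be discarded by truncation. The remaining values then follow a truncated exponential distribution whose underlying mean can be recovered. The Python sketch below is a minimal illustration of that idea under stated assumptions. It takes a user-supplied truncation `threshold`, uses a maximum-likelihood fit via bisection, and uses a hypothetical `min_samples` rejection rule in place of the paper's confidence value. The authors' exact estimator, threshold choice, and confidence measure may differ.

```python
import math


def truncated_exp_mean(mu, t):
    """Mean of an exponential variable with mean `mu`, truncated to [0, t]."""
    s = t / mu
    if s > 700.0:  # exp(s) would overflow; the truncation is negligible here
        return mu
    return mu - t / math.expm1(s)


def estimate_noise_psd(periodogram_values, threshold, min_samples=10):
    """Estimate the noise PSD in one frequency bin (illustrative sketch).

    Keeps periodogram values <= threshold (assumed noise-dominated), then
    finds the exponential mean whose truncated mean equals the sample mean
    of the kept values, which is the maximum-likelihood estimate.
    Returns None if too few values survive or no finite solution exists.
    """
    kept = [v for v in periodogram_values if v <= threshold]
    if len(kept) < min_samples:  # hypothetical rejection rule
        return None
    sample_mean = sum(kept) / len(kept)
    # The truncated mean approaches threshold / 2 as mu grows without bound.
    if sample_mean >= threshold / 2.0:
        return None
    lo, hi = threshold * 1e-9, threshold
    while truncated_exp_mean(hi, threshold) <= sample_mean:
        hi *= 2.0
        if hi > threshold * 1e12:
            return None
    for _ in range(100):  # bisection: the truncated mean increases with mu
        mid = 0.5 * (lo + hi)
        if truncated_exp_mean(mid, threshold) < sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is safe here because the truncated mean is monotonic in the underlying mean. A solution exists only when the kept values average less than half the threshold.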

This paper introduces a database of audio events related to the poaching of wild elephants in open nature. Collecting appropriate data with ground truth is generally very time-consuming, and few gunshot databases are available for research. This database contains gunshots, other sounds representing the local acoustic diversity, and mixtures of the two. The relatively small variability of the gunshot signals, together with the variability of the extracted features, was statistically evaluated. Gunshot detection performance was estimated separately for each of four basic feature sets. The database is suitable for developing methods of automatic gunshot detection from continuous audio signals that can be implemented in low-power remote-monitoring systems. Selected recordings from the database are freely downloadable, and the remaining recordings are available from the authors.
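Detectors for low-power remote monitoring typically start with cheap frame-level features. The Python sketch below shows one such generic baseline, a short-time energy detector that flags frames rising well above the median frame energy. It is an illustrative assumption, not one of the paper's four feature sets, which the abstract does not name. The frame length, hop, and `rise_db` threshold are hypothetical defaults.

```python
import math


def frame_energies_db(signal, frame_len, hop):
    """Mean-square energy (dB) of each full frame."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        out.append(10.0 * math.log10(energy + 1e-12))  # floor avoids log(0)
    return out


def detect_impulses(signal, frame_len=256, hop=128, rise_db=15.0):
    """Return start samples of frames where energy first exceeds
    the median frame energy (used as a noise-floor estimate) by `rise_db`."""
    energies = frame_energies_db(signal, frame_len, hop)
    if not energies:
        return []
    background = sorted(energies)[len(energies) // 2]
    onsets, active = [], False
    for i, e in enumerate(energies):
        loud = e > background + rise_db
        if loud and not active:  # report only the rising edge of each event
            onsets.append(i * hop)
        active = loud
    return onsets
```

Only sums, a sort, and comparisons are needed. A practical system would follow such a trigger with a classifier to separate gunshots from other impulsive sounds in the database.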

Engineering Reports

Although predictive models are widely used to predict the results of listening tests, there are currently no standardized statistical metrics for assessing how well such predictions reproduce the rank order of the results. Commonly used rank-order metrics do not take the variance of the listening test data into account. This paper proposes two novel metrics that assess rank order with respect to this variance by adapting Spearman’s rho and Kendall’s tau, and evaluates their performance on actual listening test data with standardized prediction models.
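For reference, the Python sketch below implements the standard Spearman's rho (Pearson correlation of ranks, with average ranks for ties) and Kendall's tau-a. It also adds one hypothetical variance-aware Kendall variant, which skips pairs whose listening-test confidence intervals overlap. This variant only illustrates what taking variance into account could mean. It is an assumption of this sketch and not the metric proposed in the paper.

```python
from itertools import combinations


def _ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    return s / len(pairs)


def kendall_tau_ci(predicted, means, ci_half_widths):
    """HYPOTHETICAL variance-aware Kendall tau (not the paper's metric).

    Pairs whose listening-test confidence intervals overlap are treated as
    unresolved and excluded; tau is computed over the remaining pairs.
    """
    s = n_pairs = 0
    for i, j in combinations(range(len(means)), 2):
        if abs(means[i] - means[j]) <= ci_half_widths[i] + ci_half_widths[j]:
            continue  # the listening test cannot order this pair
        n_pairs += 1
        prod = (predicted[i] - predicted[j]) * (means[i] - means[j])
        s += (prod > 0) - (prod < 0)
    return s / n_pairs if n_pairs else float("nan")
```

For example, a model that swaps two conditions whose mean scores differ by less than their confidence intervals is penalized by `kendall_tau_a` but not by `kendall_tau_ci`. That asymmetry is the kind of behavior a variance-aware rank metric is meant to capture.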

Standards and Information Documents

AES Standards Committee News

Quantifying Recording Techniques

Authors: Rumsey, Francis

[Feature] We are in a time when machines are increasingly being taught to undertake work that was previously the domain of humans, and that may mean picking apart human perceptual and creative processes in a way that enables them to be taught to, or built into, forms of display or control that machines can mediate. Selected papers on recording and production from the 145th Convention are summarized in that light.

147th Convention, Call for Contributions, New York

Automotive Audio Conference, Call for Contributions, Munich

Section News

AES Conventions and Conferences

Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
