 
The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-reports of AES conventions and other Society activities; news from AES sections around the world; Standards and Education Committee work; membership news; new products; and newsworthy developments in the field of audio.
Editor-in-Chief: Brian F.G. Katz
Only AES members and Institutional Journal Subscribers can download
Authors: Hermannsen, Line; Bech, Søren
A sound system was developed to test new, innovative designs for creating personal sound zones in a simulated home environment. A listening test paradigm with rating scales was customized and verified as suitable for collecting reliable data on the basic audio quality and audio quality of experience of sound zones. Based on a previous study, a selection of attributes was evaluated, and all were found to be significantly predicted by the chosen changes in audio content, gain, and bass. In some scenarios, a large subwoofer setup was added to the domestic system to reproduce low-frequency components, and it was confirmed to improve audio quality of experience. However, the bass enhancement also increased perceived annoyance, which highlights the trade-off between bass quality and sound zone separation. Ways to optimize the experimental protocol are suggested for future studies of the perception of, and affective response to, audio reproduction in sound zones, including continued work in the ISOBEL project.
Download: PDF (34.72 MB)
Authors: Lee, Kyung Yun; Meyer-Kahlen, Nils; Schlecht, Sebastian J.; Välimäki, Vesa
The ultimate goal of reverberation models for augmented reality is to create an auditory illusion in which simulated sound sources are indistinguishable from real, measured ones. However, existing evaluation methods are not designed with this objective in mind. This paper adopts the auditory illusion test paradigm to evaluate reverberation models on two distinct tasks: authenticity and transferring. The listening test uses a three-alternative forced-choice (3AFC) design in which subjects are asked to identify the speech signal processed with a model-generated room impulse response (RIR) among signals processed with measured RIRs. In the authenticity task, all three signals contain the same speech sample; in the transferring task, they contain different samples from different speakers. A Bayesian analysis shows that, for all models, detecting model-generated RIRs is significantly more challenging in the transferring task than in the authenticity task. Additionally, although the listening test results correlate positively with the selected objective metrics, it remains uncertain how reliably and how generally these correlations predict listening test outcomes. The proposed evaluation framework for reverberation models can serve as a preliminary analysis for developing dynamic binaural rendering for augmented reality applications.
Download: PDF (25.04 MB)
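In a 3AFC test like the one described above, a listener who cannot tell the signals apart still picks the odd one out with probability 1/3. As a minimal sketch, the snippet below shows a common frequentist check of whether a detection count exceeds that chance level: a one-sided exact binomial test. This is not the Bayesian analysis used by the authors, and the function name `p_value_3afc` is hypothetical.

```python
from math import comb


def p_value_3afc(correct: int, trials: int, chance: float = 1.0 / 3.0) -> float:
    """One-sided exact binomial test: P(X >= correct) if the listener only guesses.

    In a 3AFC design the guessing rate is 1/3 (hypothetical helper, not from the paper).
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Sum the binomial probabilities of `correct` or more hits under pure guessing.
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

For example, `p_value_3afc(9, 12)` gives the probability of at least 9 correct detections in 12 trials by guessing alone; a small value suggests the model-generated RIR was detectable for that listener and condition.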
Authors: Gao, Shan; Wang, Yiwen; Yuan, Zeyu; Wu, Xihong; Qu, Tianshu
Conventional techniques for blind inference of room geometry from acoustic signals often rely on prior knowledge, such as the source signal or source position, which limits their applicability when the sound source is unknown. To address this problem, the authors propose a novel multitask deep neural network (DNN) that jointly estimates sound source location and room geometry from signals captured by a spherical microphone array. Because source content and environmental parameters are coupled in reverberant signals, the model takes extracted early-reflection directions and delays as inputs for estimating spatial parameters, which makes it independent of the source signal. The model employs a hierarchical architecture with dedicated subnetworks that process direction-of-arrival (DOA) and time-difference-of-arrival (TDOA) features, followed by a shared fusion module that exploits geometric constraints between source and boundary positions. Compared with traditional methods, the model requires less prior environmental information and performs both sound source localization and room geometry inference from single-position sound field measurements. Experimental results from simulations and real measurements demonstrate the method's effectiveness and precision relative to conventional approaches across various scenarios.
Download: PDF (24.87 MB)
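The geometric constraints between source and boundary positions mentioned above can be illustrated with the classic image-source relation for a single first-order reflection. The sketch below assumes the receiver (array centre) sits at the origin, the source distance is already known, and the direct-path DOA, reflection DOA, and reflection delay have already been extracted. It recovers the reflecting plane as the perpendicular bisector between the source and its image. This is a hand-derived geometric illustration, not the authors' DNN; `wall_from_reflection` and the speed of sound of 343 m/s are assumptions.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def wall_from_reflection(src_dir, src_dist, refl_dir, tdoa, c=343.0):
    """Estimate a planar boundary {x : n.x = d} from one first-order reflection.

    Hypothetical helper. Receiver at the origin; src_dir and refl_dir are unit
    DOA vectors of the direct sound and the reflection; src_dist is the (assumed
    known) source distance in metres; tdoa is the reflection arrival time minus
    the direct arrival time in seconds; c is an assumed speed of sound in m/s.
    """
    s = [src_dist * u for u in src_dir]              # source position
    r_img = src_dist + c * tdoa                      # image-source path length
    s_img = [r_img * u for u in refl_dir]            # image-source position
    diff = [a - b for a, b in zip(s_img, s)]
    length = math.sqrt(_dot(diff, diff))
    n = [v / length for v in diff]                   # unit normal, source -> image
    mid = [(a + b) / 2.0 for a, b in zip(s, s_img)]  # wall bisects source and image
    return n, _dot(n, mid)
```

With one such plane per detected early reflection, a room model can be assembled; in practice, estimation errors in DOA and delay propagate directly into the normal and distance, which is one reason a learned fusion stage can help.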
Authors: Liu, Xiaojing; Ai, Hongwei; Reiss, Joshua D.
The simultaneous presence of multiple audio signals can cause information loss through auditory masking and interference, often reducing signal clarity. The authors propose a speech enhancement system that presents multiple speech tracks with reduced auditory masking, making it easier to discern multiple simultaneous talkers. The system evaluates auditory masking using the ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model together with ideal mask ratio metrics. It combines an iterative Harmony Search algorithm with integer optimization to apply audio effects, including level balancing, equalization, dynamic range compression, and spatialization, so as to minimize masking. Objective evaluations and subjective listening tests show that the proposed system performs competitively against mixes created by professional sound engineers and outperforms existing automixing systems. The system is applicable to various communication scenarios, including teleconferencing, in-game voice communication, and live streaming.
Download: PDF (19.51 MB)
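Harmony Search, named in the abstract above, is a population-based metaheuristic. The sketch below is a minimal, generic version (memory consideration, pitch adjustment, random selection, replace-the-worst) applied to per-track gains in dB. The cost function `toy_masking_cost` and its `TARGET_DB` values are hypothetical stand-ins for a masking metric; the authors' PEAQ-based objective and their integer-optimization stage are not reproduced, and all parameter values are illustrative assumptions.

```python
import random

TARGET_DB = [-3.0, 0.0, 2.0]  # hypothetical "ideal" gains for three talkers


def toy_masking_cost(gains_db):
    """Placeholder cost: squared distance to TARGET_DB (NOT a real masking model)."""
    return sum((g - t) ** 2 for g, t in zip(gains_db, TARGET_DB))


def harmony_search(cost, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.01,
                   iterations=5000, seed=0):
    """Minimize `cost` over a box given by `bounds` = [(lo, hi), ...].

    hms: harmony memory size; hmcr: memory consideration rate;
    par: pitch adjustment rate; bw: pitch bandwidth as a fraction of each range.
    """
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [cost(v) for v in memory]
    for _ in range(iterations):
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][i]  # reuse a remembered value
                if rng.random() < par:
                    value += rng.uniform(-bw, bw) * (hi - lo)  # small local tweak
            else:
                value = rng.uniform(lo, hi)  # fresh random value
            new.append(min(max(value, lo), hi))  # keep within bounds
        new_cost = cost(new)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:  # replace the worst harmony if improved
            memory[worst], costs[worst] = new, new_cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]
```

Usage: `gains, c = harmony_search(toy_masking_cost, [(-12.0, 12.0)] * 3)`. Because the search only needs cost evaluations, not gradients, it can wrap a non-differentiable perceptual model, which is plausibly why a metaheuristic suits this kind of mixing problem.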
Authors: Vignati, Luca; Turchet, Luca
Packet loss concealment (PLC) is vital for preserving audio quality in networked music performances. Although existing PLC techniques primarily target speech transmission, the particular challenges of music signals, such as complex harmonic structures and diverse timbral ranges, have yet to be adequately addressed. This is partly because a satisfactory objective evaluation metric for music PLC methods is missing. As a foundational first step in this direction, this paper proposes a novel evaluation metric that draws on music psychoacoustics and uses the constant-Q transform to quantify the audibility of glitches caused by unconcealed packet loss (i.e., lost packets replaced with zeros) better than existing metrics. The authors conducted extensive subjective listening tests, producing a publicly available ground-truth data set that maps objective audio features to human assessments of glitch audibility. Results show that the developed metric outperforms other measures (such as mean squared error and mean absolute error) in predicting perceptual impact, a step toward a specialized PLC metric for networked music performances. However, further improvements are needed to match human perceptual accuracy, which calls for further research into a reliable, perceptually motivated evaluation metric.
Download: PDF (4.42 MB)
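The baseline setup in this abstract can be made concrete: unconcealed packet loss zeroes whole packets, and MSE and MAE compare the damaged signal with the original sample by sample. The sketch below also lists the geometrically spaced centre frequencies and constant quality factor of a constant-Q transform, f_k = f_min · 2^(k/b) and Q = 1/(2^(1/b) − 1) for b bins per octave. All function names and the fixed-size packet layout are hypothetical, and this is not the authors' proposed metric.

```python
def zero_fill_packet_loss(signal, packet_size, lost_packets):
    """Simulate unconcealed loss: every sample of each lost packet becomes 0."""
    out = list(signal)
    for p in lost_packets:
        start = p * packet_size
        for n in range(start, min(start + packet_size, len(out))):
            out[n] = 0.0
    return out


def mse(ref, test):
    """Mean squared error between reference and degraded signals."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def mae(ref, test):
    """Mean absolute error between reference and degraded signals."""
    return sum(abs(r - t) for r, t in zip(ref, test)) / len(ref)


def cqt_centre_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced CQT bin centres: f_k = f_min * 2**(k / b)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Constant ratio of centre frequency to bin spacing: 1 / (2**(1/b) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

MSE and MAE weight every sample equally regardless of frequency, whereas constant-Q bins are spaced like musical pitch (12 bins per octave gives one bin per semitone), which is one intuition for why a CQT-based measure may track glitch audibility in music better.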
Authors: Stefani, Domenico; Binelli, Marco; Farina, Angelo; Turchet, Luca
Recent years have seen increasing interest from academic and industrial researchers in software for dynamic auralization and six-degrees-of-freedom (6DoF) navigation of immersive audio environments. Some existing tools rely on convolving source sounds with Ambisonics impulse responses (IRs) recorded in real spaces. However, despite advances in the computing power of modern central processing units, convolution remains computationally demanding, especially with many channels and in real time. Moreover, the efficient computation schemes often used in single-IR matrix tools have not made their way into open-source 6DoF spatial audio plugins. This paper presents MCFX-6DoFconv, an open-source 6DoF convolution plugin that combines the efficient convolution engine of the MCFX-Convolver plugin with the 6DoF navigation features of SPARTA 6DoFconv, along with functional and interface improvements. Compared with the original SPARTA 6DoFconv, the proposed plugin is considerably more computationally efficient across a wide range of IR lengths, channel counts, and audio buffer sizes, with up to a 3.7-fold improvement. This enables real-time auralization with longer IRs and rendering of multiple sources with more plugin instances. Moreover, the proposed plugin applies listener-position updates instantly, eliminating delays of up to two buffer lengths and removing the audio latency caused by internal buffering.
Download: PDF (23.31 MB)
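Real-time convolution engines commonly use partitioned FFT convolution so that long IRs can be applied with a latency of only one audio buffer. The sketch below is a textbook single-channel uniformly partitioned overlap-save (UPOLS) convolver in NumPy. Whether MCFX-Convolver works this way internally is an assumption (it may use a different, e.g. non-uniform, partitioning), and `upols_convolve` is a hypothetical name.

```python
import numpy as np


def upols_convolve(x, h, block_size):
    """Uniformly partitioned overlap-save convolution of signal x with IR h."""
    B = block_size
    P = -(-len(h) // B)  # number of IR partitions (ceiling division)
    h_pad = np.zeros(P * B)
    h_pad[:len(h)] = h
    # Spectrum of each length-B IR partition, zero-padded to FFT size 2B.
    H = np.array([np.fft.rfft(h_pad[p * B:(p + 1) * B], 2 * B) for p in range(P)])
    n_out = len(x) + len(h) - 1
    n_blocks = -(-n_out // B)
    x_pad = np.zeros(n_blocks * B)
    x_pad[:len(x)] = x
    fdl = np.zeros((P, B + 1), dtype=complex)  # frequency-domain delay line
    prev = np.zeros(B)
    y = np.zeros(n_blocks * B)
    for b in range(n_blocks):
        cur = x_pad[b * B:(b + 1) * B]  # one "audio buffer" of input
        fdl = np.roll(fdl, 1, axis=0)  # age all stored input spectra by one block
        fdl[0] = np.fft.rfft(np.concatenate([prev, cur]))  # 2B-sample window
        Y = np.sum(fdl * H, axis=0)  # partition p meets input from p blocks ago
        # Overlap-save: the last B samples of the circular result are alias-free.
        y[b * B:(b + 1) * B] = np.fft.irfft(Y, 2 * B)[B:]
        prev = cur
    return y[:n_out]
```

Each output block depends only on input up to the current block, so latency equals one block of B samples, while the per-block cost grows with the number of partitions rather than with the full IR length in the time domain. Multichannel Ambisonics rendering repeats this per input/output channel pair, which is why engine efficiency dominates 6DoF plugin performance.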