Journal of the Audio Engineering Society

The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.

The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-reports of AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news; new products; and newsworthy developments in the field of audio.

2025 July/August - Volume 73 Number 7/8

Papers


Effects of Reduced Information in the Performance of Low-Frequency Sound Zones

Authors: Cadavid, José; Bo Møller, Martin; Van Waterschoot, Toon; Bech, Søren; Østergaard, Jan


OPEN ACCESS

Sound zone techniques enable separate audio playback in distinct listening areas within a room. This requires loudspeaker arrays, multiple control microphones per zone, and carefully designed control filters. The filters are based on the room impulse responses between all loudspeaker-microphone pairs, which, in some cases, can be obtained with very short acquisition times. Low frequencies, with their longer wavelengths, allow for fewer spatial sampling points inside the sound zones, simplifying the setup and decreasing data processing requirements. This study evaluates the effects of reducing room impulse response acquisition times and the number of sampling points in low-frequency sound zone rendering. Different combinations of both strategies and multiple spatial sampling arrangements were explored. Their performance was evaluated in two acoustic conditions using objective metrics, and the area of influence of the system around the sound zones was also studied. Both strategies effectively reduced the amount of information included while performing as well as cases comprising all available information. For instance, under a short reverberation time, four microphones and a 150-ms acquisition time performed comparably to 15 microphones and a 16-s acquisition time. Using more sampling points reduced the effect of their arrangement on performance. Both strategies resulted in faster control filter calculations, suggesting improved efficiency.

Download: PDF (62.82 MB)
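Control filters of the kind described in the abstract above are commonly obtained by solving a regularized least-squares (pressure-matching) problem at each frequency, driving the reproduced pressure toward a target in the bright zone and toward zero in the dark zone. The following is a minimal single-frequency sketch of that general technique, not the paper's implementation; the transfer matrices here are random placeholders standing in for measured room responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Complex loudspeaker-to-microphone transfer functions at one frequency:
# G_b maps the L loudspeaker weights to pressures at the bright-zone mics,
# G_d to pressures at the dark-zone mics (random placeholders here).
L, Mb, Md = 8, 4, 4
G_b = rng.standard_normal((Mb, L)) + 1j * rng.standard_normal((Mb, L))
G_d = rng.standard_normal((Md, L)) + 1j * rng.standard_normal((Md, L))

p_target = np.ones(Mb, dtype=complex)   # desired bright-zone pressure
beta = 1e-2                             # Tikhonov regularization weight

# Stack bright (match target) and dark (match zero) constraints, then solve
# the regularized normal equations (G^H G + beta I) q = G^H p for the
# loudspeaker weights q.
G = np.vstack([G_b, G_d])
p = np.concatenate([p_target, np.zeros(Md, dtype=complex)])
q = np.linalg.solve(G.conj().T @ G + beta * np.eye(L), G.conj().T @ p)

# Acoustic contrast: bright-zone vs. dark-zone mean squared pressure.
contrast = np.mean(np.abs(G_b @ q) ** 2) / np.mean(np.abs(G_d @ q) ** 2)
print(f"contrast: {10 * np.log10(contrast):.1f} dB")
```

In a real system the entries of G_b and G_d would come from the measured room impulse responses, and beta trades dark-zone suppression against loudspeaker effort.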

Auralization of Finite-Element and Boundary Element Vibroacoustic Models With Wave Field Synthesis

Authors: Proulx, François; Berry, Alain; Gauthier, Philippe-Aubert

Wave field synthesis (WFS) is a spatial audio method that allows for auralization over an extended listening space and for a multi-listener experience. WFS is generally only suitable for simple compact sources and canonical extended virtual sources. It is not yet fully adapted to auralize complex radiating objects modeled via numerical methods, such as the finite element method (FEM) and the boundary element method (BEM). The authors propose an auralization workflow combining either FEM or BEM with WFS, for both coupled (vibroacoustic) and uncoupled (vibration) problems. General expressions are provided for loudspeaker driving functions using classical FEM and BEM outputs. The workflow is tested experimentally on a vibrating plate and a vibrating half-cylinder using a 24-loudspeaker WFS system. Both the frequency response and spatial directivity of these complex radiators are well reproduced. However, unwanted reflections from the listening room are visible in the later part of reproduced impulse responses. Although the methodology has inherent limitations that eventually need to be addressed in future work, it is believed that the proposed workflow is a step toward auralization approaches for complex sources and opens opportunities for engineers in terms of acoustic design and decision making.

Download: PDF (25.47 MB)

Binaural Reproduction of Microphone Array Recordings With 2D Video in Mixed Reality

Authors: Lübeck, Tim; Ben-Hur, Zamir; Lou Alon, David; Crukley, Jeffery

Head-worn devices equipped with microphone arrays and cameras can be used to capture the experience from a user's perspective and reproduce it in virtual, mixed, or augmented reality. A concept that has recently been introduced is to present the video capture as a 2D video screen augmented into the real-world environment through a mixed reality headset. This study presents such a system for reproducing audio and video captured from glasses arrays as a video "augment" along with binaural audio. Results of an initial listening experiment are presented, evaluating different state-of-the-art methods for binaural rendering. A stereo rendering through virtual loudspeakers attached to the video "augment" is compared with head-locked and world-locked binaural syntheses based on a binaural beamforming approach. The results suggest that listeners rated beamforming-based reproduction higher than stereo rendering. World-locked rendering was not rated significantly better than the head-locked version.

Download: PDF (7.37 MB)

Copy-Move Audio Forgery Detection Using Dominant Resonance Frequencies and Dynamic Time Warping

With advanced audio manipulation raising authenticity concerns, this study introduces a method to detect copy-move forgeries, where segments are duplicated and reinserted. It combines dominant resonance frequency analysis and dynamic time warping for high-precision detection, surpassing correlation-based methods. The approach segments audio, applies energy thresholds, and analyzes formant contours to track manipulations. Dynamic time warping provides a key advantage by dynamically aligning speech segments and effectively handling nonuniform distortions and pitch variations that traditional methods fail to capture. This capability significantly enhances its ability to detect temporally inconsistent duplications. Extensive evaluations on the Defense Advanced Research Projects–Texas Instruments/Massachusetts Institute of Technology Acoustic-Phonetic Continuous Speech Corpus (TIMIT; English) and the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGPSEHSC; Hindi) data sets demonstrate this method's exceptional accuracy and cross-lingual adaptability, reliably detecting subtle copy-move forgeries with superior precision even in complex voiced segments. This method's practical applications extend to forensic investigations, combating misinformation and enhancing cybersecurity by ensuring audio content integrity.

Download: PDF (24.05 MB)
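Dynamic time warping, the alignment technique named in the copy-move forgery abstract above, can be sketched in a few lines. This is the generic textbook formulation applied to made-up pitch-like contours, not the authors' detection pipeline:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    Builds the cumulative-cost matrix D, where D[i, j] is the minimal
    cost of aligning a[:i] with b[:j] under a match/insert/delete
    step pattern with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

# Two contours with the same shape, one time-stretched: DTW can align them
# even though a sample-by-sample comparison is not defined for unequal lengths.
ref = np.array([100.0, 110.0, 130.0, 120.0, 100.0])
stretched = np.array([100.0, 105.0, 110.0, 120.0, 130.0, 125.0, 120.0, 100.0])
print(dtw_distance(ref, stretched))
```

It is this tolerance to nonuniform time stretching that makes DTW attractive for matching duplicated speech segments whose timing has been altered.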

Review Paper


Audio Signal Processing in the Artificial Intelligence Era: Challenges and Directions

Authors: Steinmetz, Christian J.; Uhle, Christian; Everardo, Flavio; Mitcheltree, Christopher; McElveen, J. Keith; Jot, Jean-Marc; Wichern, Gordon


OPEN ACCESS

Artificial intelligence (AI) has seen significant advancement in recent years, leading to increasing interest in integrating these techniques to solve both existing and emerging problems in audio engineering. In this paper, the authors investigate current trends in the application of AI for audio engineering, outlining open problems and applications in the research field. The paper begins by providing an overview of AI-based algorithm development in the context of audio, discussing problem selection and taxonomy. Next, human-centric AI challenges and how they relate to audio engineering are explored, including ethics, trustworthiness, explainability, and interaction, emphasizing the need for ethically sound and human-centered AI systems. Subsequently, technical challenges that arise when applying modern AI techniques to audio are examined, including robust generalization, audio quality, high sample rates, and real-time processing with low latency. Finally, the authors outline applications of AI in audio engineering, covering the development of machine learning–powered audio effects, synthesizers, automated mixing systems, spatial audio, speech enhancement, dialog separation, and music generation. Emphasized is the need for a balanced approach that integrates human-centric concerns with technological advancements, advocating for responsible and effective application of AI.

Download: PDF (8.91 MB)

Engineering reports


A Virtual Reality Interface for the Creation of 3D Spatial Audio Trajectories

Authors: Tomasetti, Matteo; Van Kerrebroeck, Bavo; Wanderley, Marcelo M.; Stefani, Domenico; Turchet, Luca

This paper presents SonoSpatia, a virtual reality (VR) system designed for creating 3D spatial audio trajectories. SonoSpatia leverages the immersive capabilities of VR technology to offer an intuitive and expressive approach for composers and sound designers who wish to control positioning parameters in 3D space via gesture-based interactions. The authors conducted a user study with 12 expert composers, sound engineers, and sound designers to assess the ability of the VR interface to enhance the creative process of spatial audio trajectory creation through a more embodied interaction with the spatial audio parameters. To this end, the workflow of SonoSpatia was compared with that of conventional digital audio workstation (DAW) tools, comprising the ControlGRIS and SpatGRIS spatial audio software and the DAW Reaper. Results indicate that SonoSpatia significantly increases user engagement, satisfaction, absorption, and expressiveness over the conventional DAW-based counterpart. Notably, the VR and DAW-based interfaces showed minimal differences in interaction patterns, suggesting similar user behavior across dimensions despite the reported advantages of VR. Despite some challenges related to interface complexity and fine-grained control, the system was well received by participants, suggesting a promising direction for enhancing 3D spatial audio trajectory creation with VR.

Download: PDF (11.51 MB)

Standards and Information Documents


AES Standards Committee News

Download: PDF (54.37 KB)

Features


Call for Papers: Special Issue on New Frontiers in Digital Audio Effects

Download: PDF (61.9 KB)

Departments


Book Reviews

Download: PDF (110.6 KB)

Conv&Conf

Download: PDF (233.22 KB)

Extras


AES Officers, Committees, Offices & Journal Staff

Cover & Sustaining Members List

Table of Contents

Download: PDF (50.47 KB)
