The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-event reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; membership news; new products; and newsworthy developments in the field of audio.
Only AES members and Institutional Journal Subscribers can download.
Authors: Salmon, François; Berthomieu, Gauthier; Palacino, Julian; Paquier, Mathieu
When a sound field is sampled by a finite number of microphones, the upper frequency range of the recorded content is affected by spatial aliasing. The frequency beyond which aliasing-induced artifacts occur is related to the distance between a microphone and its nearest neighbors. In this study, Ambisonic signals of order N = 7 were encoded from simulated recordings made by virtual spherical microphone arrays of varying characteristics (radius, sampling scheme, and sampling grid order). A diffuse-field equalization, whose role is to lessen the aliasing-induced spectral coloration, was also considered in the tested conditions. These aliased stimuli were then compared with unaliased stimuli of order N = 7. Results showed that the diffuse-field equalization had a strong impact on the perceived differences. A perceptual threshold was calculated to determine whether aliasing artifacts are likely to be audible based on the amount of aliasing-induced encoding error.
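As a rough orientation (a standard rule of thumb for spherical arrays, not a result from this paper), the aliasing limit can be estimated from the array radius and the supported spherical harmonic order via kr ≈ N. The sketch below assumes that relation; the function name and example radius are illustrative.

```python
import math

def aliasing_frequency(order: int, radius_m: float, c: float = 343.0) -> float:
    """Rule-of-thumb spatial-aliasing limit for a spherical microphone array.

    A spherical harmonic representation of order N is commonly considered
    valid up to kr = N, i.e. f = N * c / (2 * pi * r).
    """
    return order * c / (2.0 * math.pi * radius_m)

# Example: a 4.2 cm radius array (a common commercial SMA size) at order 7
print(f"{aliasing_frequency(7, 0.042):.0f} Hz")  # roughly 9100 Hz
```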
Authors: De Muynke, Julien; Poirier-Quinot, David; Katz, Brian F. G.
Convolution with spatial room impulse responses (RIRs) can achieve realistic auralizations. When combined with interpolation between spatially distributed RIRs, this technique can be used to create navigable virtual environments. This study explores the impact of various interpolation parameters on the perceived auditory stability of a nearby static sound source as the listener moves through a reverberant environment. The auditory scene was rendered via third-order Ambisonic RIR convolution combined with magnitude-least-squares binaural decoding using nonindividualized head-related transfer functions. First, the estimated direction of arrival as a function of the listener’s position within a 2D grid of RIRs under various configurations is examined as an objective metric. The perceived stability of the auditory source is then assessed through a perceptual experiment. Participants freely explored a virtual scene reproduced over headphones and a tracked head-mounted display. They were asked to rate the stability of a nonvisual source under various conditions of RIR grid density, interpolation panning method, and reverberation time. Results indicate no need for an RIR grid finer than 1 m to optimize source stability when using a three-nearest-neighbor interpolation scheme.
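To make the kind of interpolation under study concrete, here is a minimal sketch of a three-nearest-neighbor RIR blend using inverse-distance weights. The function name and the weighting rule are assumptions for illustration, not the specific panning methods compared in the paper.

```python
import numpy as np

def interpolate_rir(listener_pos, grid_positions, grid_rirs, k=3, eps=1e-6):
    """Blend the k nearest RIRs with inverse-distance weights.

    listener_pos:   (2,) listener position in the horizontal plane
    grid_positions: (M, 2) measurement positions of the RIR grid
    grid_rirs:      (M, L) one RIR of length L per grid point (per channel)
    """
    d = np.linalg.norm(grid_positions - listener_pos, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + eps)   # inverse-distance panning weights
    w /= w.sum()                   # normalize so the weights sum to 1
    return (w[:, None] * grid_rirs[nearest]).sum(axis=0)

# Example: a 1 m grid, listener between four measurement points
grid = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rirs = np.random.randn(4, 4800)    # placeholder RIRs
h = interpolate_rir(np.array([0.3, 0.4]), grid, rirs)
```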
Authors: Hofmann, Anja; Meyer-Kahlen, Nils; Schlecht, Sebastian J.; Lokki, Tapio
This paper investigates audiovisual congruence in virtual reality with both horizontal and vertical offsets between audio and visual rendering. Audiovisual congruence and localization errors are assessed using loudspeaker playback and nonindividualized headphone rendering. To account for the influence of different types of visual information on congruence, presentations of a loudspeaker model and a 3D human avatar were compared. To this end, a new dataset of audiovisual speech was recorded. Results show that human avatar rendering increases perceived congruence and that experienced listeners have an increased tendency to respond with “incongruent” when a loudspeaker model is shown but not when the human avatar is presented. Moreover, a correlation is found between localization precision and audiovisual congruence for horizontally offset stimuli and avatar presentation. For vertical offsets, the angular range of congruence is generally large and localization errors are high, so no correlation can be observed between the two. The paper contributes congruence ranges for audiovisual speech in virtual reality, which also have implications for augmented reality telepresence.
Authors: Helmholz, Hannes; Crukley, Jeffery; Amengual Garí, Sebastià V.; Ben-Hur, Zamir; Ahrens, Jens
The authors present a perceptual evaluation of the binaural rendering quality of signals from several types of baffled microphone arrays. They employ the multi-stimulus category rating (MuSCR) paradigm, which does not require a reference stimulus. The tested conditions also comprise a stimulus of very high numerical accuracy, which was given the highest quality rating in approximately half of the multi-stimulus trials. A comparison with the literature on spherical microphone arrays (SMAs) shows that MuSCR allows for drawing representative conclusions regarding the dependency of the perceived quality on the array and rendering parameters, as in previous experiments with an explicit high-fidelity reference. The authors applied the MuSCR paradigm to evaluate the perceived reproduction quality of equatorial microphone arrays (EMAs), with microphones only along the equator of the spherical baffle, and of equatorial arrays with a nonspherical baffle (XMAs). The results endorse the observation from SMAs that an increasing spherical harmonic order leads to improved perceived quality. The authors also confirm that EMAs lead to the same perceived quality as SMAs despite the substantial difference in the number of microphones. Magnitude equalization of artifacts from spatial undersampling can be very effective for XMAs, whose raw solution deviates significantly from the other array types at high frequencies.
Authors: De Bortoli, Gian Marco; Prawda, Karolina; Schlecht, Sebastian J.
Active acoustics (AA) systems are used to electronically modify the acoustics of a room (e.g., in live music venues). AA systems have an inherent feedback component and can suffer from instability and coloration artifacts resulting from excessively high feedback gains. State-of-the-art methods can improve system stability and coloration, usually at the cost of complex implementations and long parameter-tuning sessions. They can also cause sound artifacts due to time-varying components, limiting the enhancement at low frequencies. This work proposes a time-invariant feedback attenuation method for low frequencies based on a modal reverberator. The attenuation is achieved through destructive acoustic interference, obtained via phase shifts between the input and output signals. The analyzed frequency range is 0–500 Hz, where the room transfer functions are considered highly invariant over time. The results show a gain-before-instability increase of more than 5 dB for a modal reverberator with high mode density in this frequency range. The improvement remains stable under low-magnitude changes in the room transfer functions over time. The proposed method provides a robust AA system with artificial reverberation for the low-frequency range and can be used alongside other established methods.
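To make the modal-reverberator idea concrete, here is a minimal sketch of such a reverberator as a bank of complex one-pole resonators, each with a per-mode phase offset of the kind the destructive-interference approach relies on. All mode frequencies, decay times, and phases below are illustrative placeholders, not the parameters of the proposed method.

```python
import numpy as np
from scipy.signal import lfilter

def modal_reverb(x, fs, freqs, t60s, phases):
    """Modal reverberator sketch: a bank of complex one-pole resonators.

    Each mode is a damped sinusoid at freqs[m] with decay time t60s[m];
    phases[m] applies a per-mode input/output phase shift (illustrative).
    """
    y = np.zeros(len(x))
    for f, t60, phi in zip(freqs, t60s, phases):
        r = np.exp(-3.0 * np.log(10.0) / (t60 * fs))  # per-sample decay for T60
        pole = r * np.exp(1j * 2.0 * np.pi * f / fs)  # complex resonator pole
        gain = (1.0 - r) * np.exp(1j * phi)           # phase-shifted mode gain
        y += lfilter([gain], [1.0, -pole], x).real    # accumulate mode output
    return y

# Example: dense modes in the analyzed 0-500 Hz range, illustrative decays
fs = 8000
freqs = np.linspace(20.0, 500.0, 60)
out = modal_reverb(np.random.randn(fs), fs, freqs,
                   t60s=np.full(60, 1.5), phases=np.full(60, np.pi))
```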
Authors: Năstasă, Mădălina; Pulkki, Ville; Mäkivirta, Aki
This paper studies the effect of room modal resonances on the localization of very low-frequency sound sources. A subjective listening test was conducted with 20 participants in an anechoic chamber, where listeners had to detect the direction of the sound source for pure sinusoids at 31.5, 50, and 80 Hz. A synthetic standing wave pattern modeling a room resonance was created with two additional sound sources located at the left and right sides of the listener. Results show that the perception of low-frequency direction is negatively impacted by the node of the standing wave, even when the standing wave has a relatively low level, whereas the antinode does not have as strong an effect. A second experiment indicates that variations in the presentation level do not alter the effect of the standing wave. The results of this study suggest that in the low-frequency range, direction judgment is less a question of the auditory system’s ability than of the acoustical properties of the listening environment.
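For context (a general property of standing waves, not a finding of the study), adjacent pressure nodes lie half a wavelength apart, which places them several meters apart at the test frequencies:

```python
# Adjacent nodes of a standing wave are half a wavelength apart (lambda / 2),
# so the node spacing at the tested frequencies spans several meters:
c = 343.0  # speed of sound in air at ~20 degrees C, m/s
for f in (31.5, 50.0, 80.0):
    print(f"{f:5.1f} Hz: wavelength {c / f:5.2f} m, node spacing {c / (2 * f):4.2f} m")
# 31.5 Hz: wavelength 10.89 m, node spacing 5.44 m
#  50.0 Hz: wavelength  6.86 m, node spacing 3.43 m
#  80.0 Hz: wavelength  4.29 m, node spacing 2.14 m
```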
Authors: Pörschmann, Christoph; Lübeck, Tim; Arend, Johannes M.
With the Spherical Array Interpolation by Time Alignment (SARITA) method, the authors introduced an approach for spatial upsampling of spherical microphone array (SMA) signals (T. Lübeck, J. M. Arend, and C. Pörschmann, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1163–1174 [2023]). The basic idea of this method is to perform the interpolation after time-aligning adjacent microphone signals. The upsampled SMA signals can be represented as spherical harmonic coefficients of much higher spatial order than is possible with the sparsely measured signals. Instead of impulse responses, the method is now applied to SMA recordings. Binaural decoding of upsampled SMA recordings is compared technically and perceptually to a baseline Ambisonics decoding. The results show that the SARITA method can also be applied to low-order array recordings and significantly improves their binaural reproduction.
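The core idea of interpolating after time alignment can be sketched as follows for a pair of adjacent microphone signals, assuming a simple cross-correlation delay estimate and integer-sample shifts; the published SARITA method is considerably more refined, so treat this only as an illustration of the principle.

```python
import numpy as np

def time_aligned_interp(x1, x2, w=0.5):
    """Interpolate two neighboring microphone signals after time alignment.

    Estimates the inter-channel delay by cross-correlation, removes it,
    crossfades the aligned signals, then re-applies the interpolated delay.
    Integer-sample, circular shifts only (illustrative simplification).
    """
    xc = np.correlate(x2, x1, mode="full")
    delay = int(np.argmax(xc)) - (len(x1) - 1)      # delay of x2 relative to x1
    x2_aligned = np.roll(x2, -delay)                # align x2 onto x1
    blended = (1 - w) * x1 + w * x2_aligned         # interpolate aligned signals
    return np.roll(blended, int(round(w * delay)))  # restore interpolated delay

# Example with two delayed copies of the same signal
sig = np.random.randn(1024)
y = time_aligned_interp(sig, np.roll(sig, 5), w=0.5)
```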
Authors: Hedges, Jacob Michael; Sazdov, Robert; Johnston, Andrew
Extended reality and digital games strive to deliver a high level of “immersion,” a complex phenomenon influenced by both perceptual and psychological factors. Audio plays a crucial role in shaping immersive experiences, yet there is no clear consensus on its impact or on the best methods to evaluate it. This paper presents a systematic literature review spanning two decades, outlining the methods and findings related to how audio influences immersion in extended reality and digital games. It reveals a strong preference for experiments using virtual reality headsets and headphones and notes a gap in research on augmented and mixed reality environments. Moreover, it underscores the need for audio-specific metrics to better assess the ways that audio variables impact immersion. The findings demonstrate that audio elements like spatial fidelity, music, and the integration of sound in multimodal environments generally contribute to the immersive experience but also highlight a threshold beyond which further enhancements may not perceptibly improve the experience. The review emphasizes the need for real-time, objective measures of immersion as well as the consideration of diverse methodological approaches to deepen the understanding of audio’s role in immersive technologies.