The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-convention reports on AES conventions and other society activities; news from AES sections around the world; Standards and Education Committee work; and membership news, new products, and newsworthy developments in the field of audio.
Only AES members and Institutional Journal Subscribers can download
Authors: Caspe, Franco; Shier, Jordie; Sandler, Mark; Saitis, Charalampos; McPherson, Andrew
Neural audio synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, the authors investigate the sources of latency and jitter typically found in interactive NAS models. They then apply this analysis to the task of timbre transfer using the RAVE model (Realtime Audio Variational autoEncoder), a convolutional variational autoencoder for audio waveforms introduced by Caillon and Esling in 2021. Finally, an iterative design approach for optimizing latency is presented. This culminates in a model the authors call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. It is implemented in a specialized framework for low-latency, real-time inference, and a proof-of-concept audio plugin compatible with audio signals from musical instruments is presented. The authors expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.
Download: PDF (8.7 MB)
Authors: Deruty, Emmanuel; Arbez-Nicolas, Pascal; Meredith, David
Aims: Twelve-tone equal temperament (12-TET) is recognized as the modern standard tuning for music. However, contemporary popular music may exhibit significant deviations from this framework. This study investigates such deviations in the music of the electronic musician Vitalic and others. Methods: The study examines relations between signal features and perceived pitches. The artist’s involvement ensures that the analysis focuses on aspects relevant to the music. Results: Deviations from 12-TET can be observed as a result of 1) tuning of quasiharmonic tones outside of 12-TET, 2) audible mistuned partials within an otherwise quasiharmonic context, and 3) subsets of partials perceived as distinct pitches outside 12-TET. Examples from other artists suggest that the use of pitches outside 12-TET is not limited to Vitalic’s music. Conclusions: The deviations from 12-TET are deliberate and pervasive. They are often linked to the acoustic properties of the tones, suggesting 1) a continuum between timbre and pitch and 2) the concept of a “resulting pitch” analogous to the idea of “resulting harmony” in Renaissance polyphonic music. Such conclusions challenge the traditional view of musical pitch.
Download: PDF (88.49 MB)
Authors: Kiyan, Roman; Preihs, Stephan; Peissig, Jürgen
Multichannel loudspeaker systems are generally optimized for a particular listening position—the so-called sweet spot. A sweet area around this position may be defined within which the reproduced sound field is judged to be similar—physically or perceptually—to what is observed in the sweet spot itself. Here, the sweet area is defined in terms of physical sound field features associated with the psychological construct of immersive musical experience, namely, diffuseness and interaural cross-correlation. Binaural and spherical impulse responses are measured in a grid around the sweet spot in a listening room, and deviations of the sound field features across the listening area are evaluated for multiple loudspeaker setups and playback signals. The dependency of sound field feature deviations on the interchannel correlation structure of the signals is analyzed. Deviations of diffuseness and the interaural correlation coefficient are found to be related to interchannel correlation. Subsets of loudspeakers in the reproduction setups whose correlations particularly impact the spatial variation of the sound field features are identified.
Download: PDF (9.49 MB)
Authors: Sun, Shuyuan; Chen, Simiao; Shen, Yong
In subjective listening tests, listening conditions can significantly affect the measured results. Although there are some established standards, quantitative specifications for lighting level are rarely disclosed. This raises a question: does the lighting environment affect the perceived sound quality? In some audio-related industries, it is even widely believed that a darker environment improves subjective sound quality. However, this belief lacks convincing experimental support and a scientific theoretical basis. In this study, a series of listening tests on sound quality preference was conducted under different lighting environments. Analyses of overall rating, program material, and the dispersion of raw data are presented. The results provide evidence that lighting has no significant effect on perceived overall sound quality. However, unexpected significant effects related to program material were observed in some experiments. Additionally, the dispersion of ratings could be affected by the lighting level. These effects may contribute to the common misconception that a dim listening environment can improve perceived sound quality. A comprehensive understanding of these effects can provide more information and guidance for conducting accurate and valid listening tests.
Download: PDF (14.31 MB)