Journal of the Audio Engineering Society

2025 May - Volume 73 Number 5

Papers


Designing Neural Synthesizers for Low-Latency Interaction

Authors: Caspe, Franco; Shier, Jordie; Sandler, Mark; Saitis, Charalampos; McPherson, Andrew


OPEN ACCESS

Neural audio synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, the authors investigate the sources of latency and jitter typically found in interactive NAS models. They then apply this analysis to the task of timbre transfer using the RAVE model (Realtime Audio Variational autoEncoder), a convolutional variational autoencoder for audio waveforms introduced by Caillon and Esling in 2021. Finally, an iterative design approach for optimizing latency is presented. This culminates with a model the authors call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. It is implemented in a specialized inference framework for low-latency, real-time inference, and a proof-of-concept audio plugin compatible with audio signals from musical instruments is presented. The authors expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.

Methods for Pitch Analysis in Contemporary Popular Music: Deviations From 12-Tone Equal Temperament in Vitalic’s Work

Aims: Twelve-tone equal temperament (12-TET) is recognized as the modern standard tuning for music. However, contemporary popular music may exhibit significant deviations from this framework. This study investigates such deviations in the music of the electronic musician Vitalic and others. Methods: The study examines relations between signal features and perceived pitches. The artist’s involvement ensures that the analysis focuses on aspects relevant to the music. Results: Deviations from 12-TET can be observed as a result of 1) tuning of quasiharmonic tones outside of 12-TET, 2) audible mistuned partials within an otherwise quasiharmonic context, and 3) subsets of partials perceived as distinct pitches outside 12-TET. Examples from other artists suggest that the use of pitches outside 12-TET is not limited to Vitalic’s music. Conclusions: The deviations from 12-TET are deliberate and pervasive. They are often linked to the acoustic properties of the tones, suggesting 1) a continuum between timbre and pitch and 2) the concept of a “resulting pitch” analogous to the idea of “resulting harmony” in Renaissance polyphonic music. Such conclusions challenge the traditional view of musical pitch.

Download: PDF (88.49 MB)

Signal Dependencies of the Immersion Sweet Area in Multichannel Loudspeaker Reproduction

Multichannel loudspeaker systems are generally optimized for a particular listening position, the so-called sweet spot. A sweet area around this position may be defined within which the reproduced sound field is judged to be similar, physically or perceptually, to what is observed in the sweet spot itself. Here, the sweet area is defined in terms of physical sound field features associated with the psychological construct of immersive musical experience, namely, diffuseness and interaural cross-correlation. Binaural and spherical impulse responses are measured in a grid around the sweet spot in a listening room, and deviations of the sound field features across the listening area are evaluated for multiple loudspeaker setups and playback signals. The dependency of sound field feature deviations on the interchannel correlation structure of the signals is analyzed. Deviations of diffuseness and the interaural correlation coefficient are found to be related to interchannel correlation. Subsets of loudspeakers in the reproduction setups whose correlations particularly impact the spatial variation of the sound field features are identified.

Download: PDF (9.49 MB)

Engineering reports



The Effects of Lighting on Perceptual Sound Quality in Subjective Listening Tests

OPEN ACCESS

In subjective listening tests, listening conditions can significantly affect the measured results. Although there are some established standards, quantitative specifications for lighting level are rarely disclosed. This raises a question: does the lighting environment affect the perceived sound quality? In some audio-related industries, it is even widely believed that a darker environment is more conducive to improving the subjective sound quality. However, this conclusion clearly lacks convincing experimental data support and a scientific theoretical basis. In this study, a series of listening tests on preference sound quality were conducted under different lighting environments. The analyses on overall rating, program material, and the dispersion of raw data were presented. The results provide evidence that there are no significant effects of lighting on perceived overall sound quality. However, the unexpected significant effects related to program material were observed in some experiments. Additionally, the dispersion of ratings could be affected by the lighting level. These effects may contribute to the common misconception that a dim listening environment can improve perceived sound quality. More information and guidance toward conducting accurate and valid listening tests can be provided by comprehensive understanding of these effects.

Download: PDF (14.31 MB)

Standards and Information Documents


AES Standards Committee News

Download: PDF (222.9 KB)

Departments


Conv&Conf

Download: PDF (1.28 MB)

Extras


Table of Contents

Download: PDF (40.49 KB)

AES Officers, Committees, Offices & Journal Staff
