Neural audio synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, the authors investigate the sources of latency and jitter typically found in interactive NAS models. They then apply this analysis to the task of timbre transfer using the RAVE model (Realtime Audio Variational autoEncoder), a convolutional variational autoencoder for audio waveforms introduced by Caillon and Esling in 2021. Finally, an iterative design approach for optimizing latency is presented. This culminates with a model the authors call BRAVE (Bravely Realtime Audio Variational autoEncoder), which is low-latency and exhibits better pitch and loudness replication while showing timbre modification capabilities similar to RAVE. It is implemented in a specialized inference framework for low-latency, real-time inference, and a proof-of-concept audio plugin compatible with audio signals from musical instruments is presented. The authors expect the challenges and guidelines described in this document to support NAS researchers in designing models for low-latency inference from the ground up, enriching the landscape of possibilities for musicians.
Author(s): Caspe, Franco; Shier, Jordie; Sandler, Mark; Saitis, Charalampos; McPherson, Andrew
Affiliations: Centre for Digital Music, Queen Mary University of London, London, UK; Dyson School of Engineering, Imperial College, London, UK
(See document for exact affiliation information.)
Publication Date: 2025-05-01
Permalink: https://aes2.org/publications/elibrary-page/?id=22820
Caspe, Franco; Shier, Jordie; Sandler, Mark; Saitis, Charalampos; McPherson, Andrew; 2025; Designing Neural Synthesizers for Low-Latency Interaction; Centre for Digital Music, Queen Mary University of London, London, UK; Dyson School of Engineering, Imperial College, London, UK; Available from: https://aes2.org/publications/elibrary-page/?id=22820
@article{caspe2025designing,
  author={Caspe, Franco and Shier, Jordie and Sandler, Mark and Saitis, Charalampos and McPherson, Andrew},
  journal={Journal of the Audio Engineering Society},
  title={Designing Neural Synthesizers for Low-Latency Interaction},
  year={2025},
  volume={73},
  number={5},
  pages={240--255},
  month={may},
}