Journal of the Audio Engineering Society

2025 March - Volume 73 Number 3

Papers


The evaluation of audio quality is important in the development of immersive audio algorithms and reproduction systems, and binaural models are often used for this as a quick alternative to listening tests. Coloration (i.e., perceived loudness differences integrated across ears and frequency) is one key quality aspect; however, most models used to predict coloration are oversimplified or lack a dedicated binaural stage to consider the relative contribution of the left and right ear signals. A binaural coloration model is presented that builds upon previous work and tests three different approaches for its binaural stage. The proposed model is evaluated against nine models that are frequently used to predict coloration, using data from five listening tests totaling 252 stimuli with various audio contents and source positions. The proposed model performed best with 85% of explained variance, followed by predictions based on ISO 532-1 loudness, yielding 78% explained variance. The commonly used log-spectral distance performed worst, with only 44% explained variance. The three tested binaural stages had little influence on the performance of the proposed model. The model is freely available for download.
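
For readers unfamiliar with the log-spectral distance mentioned above, the following is a minimal sketch of one common definition: the root-mean-square difference, in decibels, between two magnitude spectra. The abstract does not state which variant the study used, so the function name `log_spectral_distance`, the single-frame FFT, and the `eps` guard are all assumptions made for illustration.

```python
import numpy as np

def log_spectral_distance(ref, test, eps=1e-12):
    """RMS difference in dB between the magnitude spectra of two signals.

    Illustrative definition only; the variant used in the paper is not
    given in the abstract. `eps` avoids taking the log of zero.
    """
    ref_mag = np.abs(np.fft.rfft(ref)) + eps
    test_mag = np.abs(np.fft.rfft(test)) + eps
    diff_db = 20.0 * np.log10(test_mag / ref_mag)
    return float(np.sqrt(np.mean(diff_db ** 2)))

# Example: a signal compared with a copy at twice the amplitude.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
print(log_spectral_distance(signal, signal))        # identical spectra
print(log_spectral_distance(signal, 2.0 * signal))  # uniform gain of about 6.02 dB
```

A measure of this kind weights every frequency bin equally and has no auditory or binaural stage, which is the sort of simplification the abstract refers to.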

This paper presents a method for modeling optical dynamic range compressors using deep neural networks with selective state space models. The proposed approach surpasses previous methods based on recurrent layers by employing a selective state space block to encode the input audio. It features a refined technique integrating feature-wise linear modulation and gated linear units to adjust the network dynamically, conditioning the compression’s attack and release phases according to external parameters. The proposed architecture is well-suited for low-latency and real-time applications, which are crucial in live audio processing. The method has been validated on the analog optical compressors Tube-Tech CL 1B and Teletronix LA-2A, which possess distinct characteristics. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that black-box modeling methods used here outperform all others, achieving accurate emulation of the compression process for settings both seen and unseen during training. Furthermore, it is shown that this accuracy correlates with the sampling density of the control parameters in the data set, and settings with fast attack and slow release are identified as the most challenging to emulate.

Numerical Noise in Fixed-Pole Parallel and Kautz Filters

Authors: Horváth, Kristóf; Bank, Balázs

In audio filtering, Kautz and fixed-pole parallel filters are commonly used, as they have the ability to approximate the frequency resolution of hearing. However, infinite impulse response filters are susceptible to numerical noise, as quantization happens in their feedback loop. This is especially severe in systems where only fixed-point arithmetic is available. In order to limit the noise to an acceptable level, several low-noise second-order structures have been developed. In this paper, the authors compare the numerical noise properties of Kautz and fixed-pole parallel filters implemented in fixed-point arithmetic. The authors present the results for two different scaling methods and show that the common Kautz realization has unacceptable noise levels. Instead, they suggest the use of better performing parallel filters for general (nonadaptive) infinite impulse response filtering and a rearranged all-pass–based Kautz structure for adaptive filtering.
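
The effect of quantization inside a feedback loop can be illustrated with a toy example. The sketch below is not the authors' method and does not implement a Kautz or parallel structure: it assumes a single one-pole recursion `y[n] = a*y[n-1] + x[n]` whose state is rounded to 15 fractional bits at every step, and compares it with a double-precision reference. All function names and parameter values are hypothetical. Because each rounding error is fed back through the pole, the output error is bounded by `(q/2) / (1 - a)` for rounding step `q`, so the bound grows as the pole approaches the unit circle.

```python
import numpy as np

def quantize(v, frac_bits=15):
    """Round to the nearest multiple of 2**-frac_bits (saturation not modeled)."""
    step = 2.0 ** -frac_bits
    return np.round(v / step) * step

def one_pole(x, a, frac_bits=None):
    """y[n] = a*y[n-1] + x[n]; if frac_bits is set, the state is rounded each step."""
    y = np.zeros(len(x))
    state = 0.0
    for n, xn in enumerate(x):
        state = a * state + xn
        if frac_bits is not None:
            state = quantize(state, frac_bits)
        y[n] = state
    return y

def feedback_noise(a, frac_bits=15, n=20000, seed=0):
    """Peak and RMS error of the rounded recursion against the reference."""
    rng = np.random.default_rng(seed)
    x = quantize(0.01 * rng.standard_normal(n), frac_bits)  # same input for both
    err = one_pole(x, a, frac_bits) - one_pole(x, a)
    return float(np.max(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))

for a in (0.5, 0.9, 0.99):
    peak, rms = feedback_noise(a)
    bound = 0.5 * 2.0 ** -15 / (1.0 - a)
    print(f"a={a}: peak error {peak:.2e}, rms {rms:.2e}, bound {bound:.2e}")
```

Practical audio filters place second-order sections with poles very close to the unit circle at low frequencies, which is why the choice of filter structure and scaling matters in fixed-point implementations.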

Influence of Head Rotation Speed and Individual Head-Related Transfer Functions on Latency Detection Threshold in Dynamic Binaural Rendering

Authors: Rappin, Clément; Palacino, Julian; Rueff, Pascal; Feichter, Laurent; Paquier, Mathieu

Binaural audio is a relevant rendering technique for mass diffusion in immersive experiences. By adding a head tracking system, dynamic binaural rendering can improve the overall quality in comparison with static rendering. However, introduced latency between the head movement and the audio rendering is detrimental to the audio experience. Previous studies estimated the latency detection threshold for binaural listening, but the influence of head movements remains unclear. In this paper, two listening tests on latency detection threshold are presented. Several excerpts were used: pink noise, male speech, a pair of congas, and coffee shop ambiance. The first experiment investigated the influence of head rotation speed on the latency detection threshold. The second experiment focused on the impact of head-related transfer functions. An absolute judgment protocol was used in both tests. Results showed that latency was globally easier to detect with faster movement for expert subjects. No global differences between nonindividual and individual head-related transfer functions were observed. In both experiments, pink noise led to a significantly lower latency detection threshold. Large intersubject differences were also observed.

Review Paper


Issues and Challenges of Audio Technologies for the Musical Metaverse

Authors: Boem, Alberto; Tomasetti, Matteo; Turchet, Luca


OPEN ACCESS

Among all the activities envisioned for the metaverse, music has thus far received comparatively little attention. While virtual concerts and music festivals have been successful in drawing substantial audiences and increasing public attention to the idea of the metaverse, the metaverse is not ready for musicians who decide to take advantage of the distinctive features of socially immersive environments to express themselves and create music together. In this article, the authors analyze the state-of-the-art audio technologies used for the creation of shared, Audio-First immersive environments such as the musical metaverse. This work reveals important issues in consumer electronics that currently prevent the realization of a metaverse compatible with musical activities. These include limitations of the hardware and software used to create and experience shared immersive environments through real-time audio. This work also emphasizes two key challenges: reducing delays in network and audio processing, and addressing the lack of universal standards for spatial audio systems across different platforms. The authors believe that looking at the metaverse from the point of view of musical technologies will provide practitioners in academia and industry with key insights into what is needed to achieve true real-time activities and support human expression in the metaverse in general.
