The Journal of the Audio Engineering Society — the official publication of the AES — is the only peer-reviewed journal devoted exclusively to audio technology. Published 10 times each year, it is available to all AES members and subscribers.
The Journal contains state-of-the-art technical papers and engineering reports; feature articles covering timely topics; pre- and post-convention reports and coverage of other society activities; news from AES sections around the world; the work of the Standards and Education Committees; membership news; new products; and newsworthy developments in the field of audio.
Only AES members and Institutional Journal Subscribers can download.
Authors: Upadhyaya, Sreenivasa; Buyens, Wim; Desmet, Wim; Karsmakers, Peter
Variation in a room's acoustic conditions is a major obstacle to robust sound event classification. The room impulse response characterizes how a sound wave propagates from source to receiver, and with it the overall perceived quality and intelligibility of the recorded sound. This study presents the Room Acoustic Adversarial Neural Network (RAANN) method, which can make sound event classification more robust to changes in acoustic conditions by exploiting knowledge of the room acoustics during learning. With RAANN, the weighted F1 score of the classification task improved by 1.54 percentage points, and the standard deviation of performance dropped from 1.74 to 1.07 percentage points, for acoustic conditions harder than those seen during the learning phase. The Clarity Index over the first 25 ms emerged as a good metric for the acoustic estimation in RAANN training.
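The abstract names the Clarity Index over the first 25 ms as the acoustic target but does not give its formula. Assuming it follows the usual clarity definition used for C50 and C80 (an early-to-late energy ratio of the room impulse response), a minimal sketch for computing it looks like this. The function name `clarity_index`, the assumption that the RIR starts at the direct-sound arrival, and the synthetic RIR in the usage block are all ours; this is not the authors' RAANN code.

```python
import numpy as np


def clarity_index(rir, fs, t_ms=25.0):
    """Early-to-late energy ratio C_t in dB for a room impulse response.

    Assumes the usual clarity definition (the one used for C50 and C80):
        C_t = 10 * log10( sum_{n < N_t} h[n]^2 / sum_{n >= N_t} h[n]^2 )
    with N_t the sample index t_ms after the start of `rir`. The RIR is
    assumed to be trimmed so that index 0 is the direct-sound arrival.
    """
    h = np.asarray(rir, dtype=float)
    n_t = int(round(t_ms * 1e-3 * fs))  # sample index splitting early from late
    early = np.sum(h[:n_t] ** 2)        # energy before t_ms
    late = np.sum(h[n_t:] ** 2)         # energy from t_ms onward
    if late == 0.0:
        return float("inf")             # no late energy: ratio is unbounded
    return float(10.0 * np.log10(early / late))


if __name__ == "__main__":
    # Hypothetical synthetic RIR: white noise under an exponential envelope
    # that falls by 60 dB in 0.5 s, sampled at 16 kHz. Not data from the paper.
    fs = 16000
    t = np.arange(int(0.8 * fs)) / fs
    rng = np.random.default_rng(0)
    rir = rng.standard_normal(t.size) * 10 ** (-3.0 * t / 0.5)
    print(f"C25 = {clarity_index(rir, fs, 25.0):.1f} dB")
    print(f"C80 = {clarity_index(rir, fs, 80.0):.1f} dB")
```

Because moving the boundary later can only add early energy and remove late energy, C80 is never smaller than C25 for the same RIR; a shorter window such as 25 ms weights the measure toward the earliest reflections.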
Authors: La Roda Mauro, Joan F.; Ramírez-Solana, David; Redondo, Javier
Over the past decade, there has been growing concern about how the audience affects sound wave propagation at open-air concerts, particularly in the low-frequency range. The key lies in the difference between the conditions under which the system engineer sets up the mid-high units and the subwoofers before the show, with no audience present, and the conditions during the performance, when a dense crowd occupies the audience area. This paper shows that the audience behaves as an equivalent medium, producing phenomena similar to those found in some acoustic metamaterials and delaying sound propagation relative to the air above the audience. This delay affects the entire sound wave, modifying its phase and magnitude as it propagates. The front-of-house engineer can experience these phenomena, so analyzing how they affect the front-of-house position, whether raised or at ground level, is required to set up the sound system properly under the changing conditions.
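The abstract describes the audience as delaying propagation relative to the air above it. The arithmetic below is a simplified illustration, not the authors' equivalent-medium model. It shows why a delay matters for phase: a pure time delay shifts the phase by 360 · f · Δt degrees, so the same delay produces a different phase error at every frequency. The effective speed `c_eff`, the 1 ms delay, and all function names are hypothetical, and the magnitude changes the abstract mentions are not modeled.

```python
def excess_delay_s(path_m: float, c_eff: float, c_air: float = 343.0) -> float:
    """Extra travel time (s) over path_m at a hypothetical effective speed
    c_eff compared with free air at c_air (both in m/s)."""
    return path_m / c_eff - path_m / c_air


def phase_shift_deg(freq_hz: float, delay_s: float) -> float:
    """Unwrapped phase lag (degrees) that a pure time delay causes at freq_hz."""
    return 360.0 * freq_hz * delay_s


def wrap_deg(phase_deg: float) -> float:
    """Wrap a phase angle into the interval (-180, 180] degrees."""
    wrapped = (phase_deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


if __name__ == "__main__":
    # Hypothetical example: a 1 ms excess delay on the low-frequency path.
    for f in (40.0, 80.0, 120.0):
        lag = phase_shift_deg(f, 0.001)
        print(f"{f:5.0f} Hz: {lag:6.1f} deg (wrapped {wrap_deg(lag):6.1f} deg)")
```

For instance, a 2.5 ms delay at 100 Hz corresponds to a 90 degree lag. Because the lag grows with frequency, a delay measured without the audience cannot simply be reused once the crowd is present.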
Authors: Kim, Taeho; Pulkki, Ville
Recent spatial audio techniques involve separating multichannel signals into direct and background components. However, determining parameters for localizing short sound sources within background sounds remains challenging because knowledge of spatial hearing resolution is limited. This paper investigates localization performance when short bursts in the median plane are presented together with spectrally similar, horizontally spread broadband noise. Listening tests examined target stimuli at different median-plane locations, with and without masking noise, using elevation gain, bias, and error rate to evaluate localization performance. The target stimuli comprised aperiodically repeated multiple-burst stimuli with different burst rates and levels, and single-burst stimuli with varied durations and levels. The results showed that the burst rate of the multiple-burst stimuli had only a weak systematic effect on all localization criteria, regardless of noise. In contrast, extending the duration of single-burst stimuli increased the elevation gain, and the added noise further improved localization performance. Raising the sound level also improved localization performance in the presence of the masker, whereas it had the opposite effect on unmasked stimuli. In this study, the optimal condition for improving localization performance in background noise was a signal-to-noise ratio of at least 18 dB.
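The abstract scores localization with elevation gain, bias, and error rate but does not define them. A common convention in the localization literature, assumed here, treats elevation gain as the slope and bias as the intercept of a linear regression of response elevation on target elevation. The error-rate criterion is not reproduced because the abstract does not specify it. The SNR helper shows how a threshold such as 18 dB would be checked from signal and noise samples. Function names and the numbers in the usage block are hypothetical, not the paper's data.

```python
import numpy as np


def elevation_gain_and_bias(target_deg, response_deg):
    """Fit response = gain * target + bias by least squares (degrees).

    A gain near 1 means responses track target elevation. A gain near 0
    means responses barely change with target elevation. bias is an offset.
    """
    t = np.asarray(target_deg, dtype=float)
    r = np.asarray(response_deg, dtype=float)
    gain, bias = np.polyfit(t, r, 1)  # degree-1 fit returns [slope, intercept]
    return float(gain), float(bias)


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from the mean power of two sample arrays."""
    p_s = np.mean(np.asarray(signal, dtype=float) ** 2)
    p_n = np.mean(np.asarray(noise, dtype=float) ** 2)
    return float(10.0 * np.log10(p_s / p_n))


if __name__ == "__main__":
    # Hypothetical responses for median-plane targets (not the paper's data).
    targets = [-30, 0, 30, 60, 90, 120]
    responses = [-10, 5, 20, 35, 50, 65]
    print(elevation_gain_and_bias(targets, responses))
    print(snr_db(np.full(100, 10.0), np.full(100, 1.0)))
```

With these synthetic responses, the fit returns a gain of about 0.5 and a bias of about 5 degrees, which describes a listener who compresses the elevation range. The SNR helper returns 20 dB for a signal ten times the noise amplitude.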
Authors: Meyer-Kahlen, Nils; Schlecht, Sebastian J.; Amengual Garí, Sebastià V.; Lokki, Tapio
Experiments testing sound for augmented reality can involve both real and virtual sound sources. Paradigms are based either on rating various acoustic attributes or on testing whether a virtual sound source is believed to be real (i.e., whether it evokes an auditory illusion). This study compares four experimental designs for detecting such illusions. The first is an ABX task, suited to evaluation under the authenticity paradigm. The second is a Yes/No task, as proposed for evaluating plausibility. The third is a three-alternative forced-choice (3AFC) task that uses different source signals for the real and virtual sources, as proposed for evaluating transfer-plausibility. The fourth is a 2AFC task. The renderings compared in the tests included mismatches between real and virtual room acoustics. The results confirm that authenticity is hard to achieve under nonideal conditions: ceiling effects occur because differences are always detected. The other paradigms are therefore better suited to evaluating practical augmented reality audio systems. Detection analysis further shows that the 3AFC transfer-plausibility test is more sensitive than the 2AFC task. Moreover, participants are more sensitive to differences between real and virtual sources in the Yes/No task than theory predicts. This contribution aims to help researchers select experimental paradigms for future experiments on the perceptual and technical requirements for sound in augmented reality.
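The abstract reports a detection analysis but gives no formulas. The sketch below shows how sensitivity d' can be estimated for the Yes/No and forced-choice designs. It assumes standard equal-variance Gaussian signal detection theory, the usual framework for comparing sensitivity across such tasks. The m-AFC model assumes an unbiased observer who picks the interval with the largest internal response. The paper's 3AFC transfer-plausibility design uses different source signals for real and virtual stimuli and may need a different decision model. ABX is not covered. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, sd 1


def dprime_yes_no(hit_rate, false_alarm_rate):
    """Yes/No task: d' = z(H) - z(F)."""
    return _STD.inv_cdf(hit_rate) - _STD.inv_cdf(false_alarm_rate)


def dprime_2afc(p_correct):
    """Unbiased 2AFC task: d' = sqrt(2) * z(Pc)."""
    return math.sqrt(2.0) * _STD.inv_cdf(p_correct)


def pc_mafc(d, m, steps=4000):
    """Pc for an unbiased m-AFC observer who picks the largest response:
    Pc = integral of phi(x - d) * Phi(x)**(m - 1) dx, computed with the
    trapezoid rule on [-10, d + 10]."""
    lo, hi = -10.0, d + 10.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * _STD.pdf(x - d) * _STD.cdf(x) ** (m - 1)
    return total * h


def dprime_mafc(p_correct, m, d_max=8.0, tol=1e-6):
    """Invert pc_mafc by bisection (Pc increases monotonically with d)."""
    if not 1.0 / m < p_correct < pc_mafc(d_max, m):
        raise ValueError("p_correct must lie between chance and Pc(d_max)")
    lo, hi = 0.0, d_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pc_mafc(mid, m) < p_correct:
            lo = mid  # predicted Pc too low: need a larger d'
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a quick self-check, `dprime_mafc(p, 2)` should agree with `dprime_2afc(p)` up to the bisection tolerance. The same percent correct maps to a larger d' as m grows, which is why raw percent-correct scores from 2AFC and 3AFC tests should not be compared directly.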
Authors: Lanterman, Aaron D.; Hasler, Jennifer O.
In 1973, David Blackmer introduced a voltage-controlled amplifier consisting of a pair of negative-positive-negative (NPN) transistors performing logarithmic and antilogarithmic functions, coupled with a complementary positive-negative-positive (PNP) pair. This “Blackmer cell” has been widely used in studio equipment such as compressors and mixers. The authors consider replacing the bipolar junction transistors with metal-oxide-semiconductor field-effect transistors (MOSFETs) operating in the subthreshold region to exploit their exponential voltage/current relationship, focusing particularly on implementation on a field-programmable analog array. The resistors used for input voltage-to-current conversion and output current-to-voltage conversion are replaced with operational transconductance amplifiers (OTAs). Simulations show that the circuit may be expected to work well as a voltage-controlled attenuator but may react catastrophically when attempting gains greater than unity, owing to the nonlinearity of the output OTA. The effect of various mismatches in transistor parameters is also studied.
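The sketch below makes the log/antilog idea concrete. It models one polarity of an idealized Blackmer-style core using the textbook subthreshold MOSFET law I = I0 · exp(κV / U_T). This simplification is our own, not the authors' FPAA circuit. It omits the complementary half and the input and output OTAs, whose nonlinearity the abstract blames for failures above unity gain. It also ignores body effect and drain-voltage dependence. The values of κ, I0, and U_T are typical placeholder numbers, and all names are hypothetical.

```python
import math

U_T = 0.0259  # thermal voltage kT/q at roughly 300 K, in volts (assumed)


def subthreshold_current(v, i0, kappa):
    """Idealized subthreshold law: I = I0 * exp(kappa * V / U_T)."""
    return i0 * math.exp(kappa * v / U_T)


def log_voltage(i, i0, kappa):
    """Inverse of the law above: the voltage at which the device carries i."""
    return (U_T / kappa) * math.log(i / i0)


def vca_half(i_in, v_control, i0_log=1e-15, i0_exp=1e-15,
             kappa_log=0.7, kappa_exp=0.7):
    """One polarity of an idealized log/antilog (Blackmer-style) core.

    The input current is converted to a log voltage, the control voltage is
    subtracted, and a second device exponentiates the result. With matched
    devices this reduces to I_out = I_in * exp(-kappa * Vc / U_T).
    """
    v = log_voltage(i_in, i0_log, kappa_log) - v_control
    return subthreshold_current(v, i0_exp, kappa_exp)


def ideal_gain_db(v_control, kappa=0.7):
    """Gain in dB of the matched core: linear in the control voltage."""
    return -20.0 * math.log10(math.e) * kappa * v_control / U_T


if __name__ == "__main__":
    i_in = 1e-9  # 1 nA input current (hypothetical)
    for vc in (0.0, 0.05, 0.1, -0.05):
        g = 20 * math.log10(vca_half(i_in, vc) / i_in)
        print(f"Vc = {vc:+.2f} V: {g:+.2f} dB (ideal {ideal_gain_db(vc):+.2f} dB)")
    # A mismatch in the pre-exponential current I0 shifts the gain by a constant.
    off = 20 * math.log10(vca_half(i_in, 0.0, i0_exp=2e-15) / i_in)
    print(f"I0 mismatch of 2x: {off:+.2f} dB offset")
```

Within this idealized model, the two kinds of mismatch behave differently. An I0 mismatch only adds a fixed gain offset. A κ mismatch between the log and antilog devices turns the transfer into a power law, I_out ∝ I_in^(κ_exp/κ_log), which distorts the signal.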
Authors: Garland, Kevin; Ronan, Malachy; Bassett, Mark
Technical ear training programs generally engage learners in tasks designed to increase their sensitivity to various auditory attributes. However, very little work has integrated ear training into a multitrack environment, where contextual factors dictate how mixing engineers employ and configure audio processing tools. For novice audio engineers, transferring critical listening skills to multitrack scenarios within a conventional digital audio workstation may prove challenging without guidance from an experienced engineer. This report describes the design and development of ReFlow, a novel reverberation training prototype that embeds technical ear training within a dedicated multitrack environment to support bespoke matching tasks based on expert performance. Expert performance is communicated to the user through traditional and cognitive apprenticeship methods: learners recall parameter configurations in the context of a multitrack mix while engaging with the underlying conceptual knowledge provided by the modeled engineer.
Authors: Fraser, Helen; Aubanel, Vincent; Maher, Robert C.; Mawalim, Candy; Wang, Xin; Počta, Peter; Keith, Emma; Chollet, Gérard; Pizzi, Karla
This paper proposes an innovative interdisciplinary approach to evaluating the effectiveness of forensic speech enhancement (FSE). FSE faces unique challenges arising from a range of factors, including poor recording quality, conditions that vary widely from case to case, and uncertainty about the content. Despite these difficulties, FSE is commonly admitted in court and can significantly influence the outcome of criminal trials. Current FSE practice is hindered by unrealistic expectations from courts, which often assume that enhanced audio inherently clarifies content. In fact, FSE can have the opposite effect and potentially cause unfair prejudice, for example, when it increases the credibility of a misleading transcript. The proposed interdisciplinary project advocates greater consideration of speech perception factors, particularly those related to transcription. It aims to bridge the gap between FSE and forensic transcription by promoting a combined approach to enhancing and accurately transcribing forensic audio. By developing a position statement on FSE capabilities, the project seeks to establish realistic standards and foster collaboration among researchers and practitioners. This effort aims to ensure reliable, accountable forensic audio evidence that is consistent with forensic science standards, and to improve the effectiveness of the justice system.