Journal-Online

Journal of the Audio Engineering Society

2020 November - Volume 68 Number 11

Papers

A Compact Spherical Loudspeaker Array for Efficiently Recreating Instrument Directivities

Authors: Neal, Matthew T.; Vigeant, Michelle C.

High levels of realism can be achieved using measurement-based auralizations of concert halls, but typical measurement loudspeakers limit source realism. A compact spherical loudspeaker array was designed to reconstruct the frequency-dependent radiation patterns of different orchestral instruments. A filter bank for each instrument was designed to preprocess measurement signals with built-in source radiation patterns for room impulse response measurements.

Loudness Differences for Voice-Over-Voice Audio in TV and Streaming

Authors: Geary, David; Torcoli, Matteo; Paulus, Jouni; Simon, Christian; Straninger, Davide; Travaglini, Alessandro; Shirley, Ben

Voice-over-Voice (VoV) is a common mixing practice observed in news reports and documentaries, where a foreground voice is mixed on top of a background voice, e.g., to translate an interview. This is achieved by ducking the background voice so that the foreground voice is more intelligible, while still allowing the listener to perceive the presence and tone of the background voice. Currently there is little published research on ducking practices for VoV or on technical details such as the Loudness Difference (LD) between foreground and background speech. This paper investigates the ducking practices of nine expert audio engineers and the preferred LDs of 13 non-expert listeners of ages 57 years and older. Results highlight a clear difference between the LDs used by the experts and those preferred by the non-expert listeners. Experts tended toward LDs of 11.5–17 LU, while non-experts preferred a range of 20–30 LU. Based on these results, a minimum LD of 20 LU is recommended for VoV. High inter-subject variance due to personal preference was observed. This variance makes a substantial case for the introduction of personalization in broadcast and streaming. The audiovisual material used for the tests is provided at https://www.audiolabs-erlangen.de/resources/2020-VoV-DB.
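
The Loudness Difference arithmetic in this abstract can be illustrated with a short sketch. It assumes that LD is the foreground loudness minus the background loudness, that 1 LU corresponds to 1 dB, and that a plain gain change shifts measured loudness by the same amount. The function names and the example loudness values are hypothetical and are not taken from the paper.

```python
def loudness_difference(fg_lufs: float, bg_lufs: float) -> float:
    """Loudness Difference (LU): foreground minus background loudness."""
    return fg_lufs - bg_lufs


def extra_ducking_db(fg_lufs: float, bg_lufs: float,
                     target_ld_lu: float = 20.0) -> float:
    """Additional background attenuation (dB, never negative) needed to
    reach the target LD. The 20 LU default is the minimum recommended
    in the abstract; everything else here is an assumption."""
    return max(0.0, target_ld_lu - loudness_difference(fg_lufs, bg_lufs))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical mix: foreground at -23 LUFS, background at -38 LUFS.
fg, bg = -23.0, -38.0
ld = loudness_difference(fg, bg)      # current LD in LU
duck = extra_ducking_db(fg, bg)       # extra attenuation in dB
gain = db_to_linear(-duck)            # factor to apply to the background
print(f"LD = {ld} LU, duck by {duck} dB (gain {gain:.3f})")
```

In this hypothetical mix the LD sits inside the 11.5–17 LU range reported for the experts, so the background would need further ducking to reach the recommended 20 LU.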

Investigation Into Consistency of Subjective and Objective Perceptual Selection of Non-individual Head-Related Transfer Functions

Authors: Kim, Chungeun; Lim, Veranika; Picinali, Lorenzo

OPEN ACCESS

The binaural technique uses a set of direction-dependent filters known as Head-Related Transfer Functions (HRTFs) in order to create 3D soundscapes through a pair of headphones. Although each HRTF is unique to the person it is measured from, pre-measured non-individual HRTFs are generally used because of the cost and complexity of the measurement process. This study investigates whether it is possible for a listener to perceptually select the best-fitting non-individual HRTFs in a consistent manner, using both subjective and objective methods. Sixteen subjects participated in 3 repeated sessions of binaural listening tests. During each session, participants first listened to moving sound sources spatialized using 7 different non-individual HRTFs and ranked them according to perceived plausibility and externalization (subjective selection). They then performed a localization task with sources spatialized using the same HRTFs (objective selection). In the subjective selection, 3 to 9 participants showed test-retest reliability levels that could be regarded as good or excellent depending on the attribute under question, the source type, and the trajectory. The reliability was better for participants with musical training and critical audio listening experience. In the objective selection, it was not possible to find significant differences between the tested HRTFs based on localization-related performances.

Pressure Matching With Forced Filters for Personal Sound Zones Application

Authors: Vindrola, Lucas; Melon, Manuel; Chamard, Jean-Christophe; Gazengel, Bruno

This paper presents a rethinking of the Pressure Matching Method (PM) used in the generation of Personal Sound Zones when the responses of some filters are already known. These known responses are imposed in the calculation, resulting in a Forced Pressure Matching method. This new formulation is implemented to control two zones (a reproduction zone and a dark zone) in a two-seat configuration aimed at the transportation industry. Due to variations in transportation acoustic environments, the computational time is added to the metrics typically used in the Personal Sound Zones literature (such as acoustic contrast, effort, error, etc.), foreseeing the need for an adaptive system. Perfect Dirac delta functions were forced as filters of the loudspeakers closest to the reproduction zone. The new formulation achieved the same acoustic contrast, effort, and reproduction error very similar to that of the conventional PM but calculated the filters 24% faster.
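
The idea of imposing known filters in a pressure-matching calculation can be sketched as follows. This is not the authors' formulation: it is a single-frequency, Tikhonov-regularized least-squares version written under stated assumptions, in which the contribution of the forced loudspeakers is subtracted from the target and only the remaining weights are solved for. The function name, the random transfer matrix, and the regularization value are all hypothetical. A Dirac delta filter is represented here by a unit weight, since its frequency response is 1.

```python
import numpy as np


def forced_pressure_matching(G, p_target, forced_idx, q_forced, reg=1e-3):
    """Solve for loudspeaker weights at one frequency with some weights imposed.

    G          : (M, L) complex transfer matrix, L loudspeakers to M control points
    p_target   : (M,) desired pressures (e.g. 1 in the reproduction zone, 0 in the dark zone)
    forced_idx : indices of loudspeakers whose weights are known in advance
    q_forced   : imposed weights for those loudspeakers
    reg        : Tikhonov regularization on the free weights (assumed value)
    """
    n_spk = G.shape[1]
    forced_idx = np.asarray(forced_idx, dtype=int)
    free_idx = np.setdiff1d(np.arange(n_spk), forced_idx)

    G_forced = G[:, forced_idx]
    G_free = G[:, free_idx]

    # Remove the pressure already produced by the forced loudspeakers.
    residual = p_target - G_forced @ np.asarray(q_forced, dtype=complex)

    # Regularized normal equations for the free loudspeakers only.
    A = G_free.conj().T @ G_free + reg * np.eye(free_idx.size)
    q_free = np.linalg.solve(A, G_free.conj().T @ residual)

    q = np.zeros(n_spk, dtype=complex)
    q[forced_idx] = q_forced
    q[free_idx] = q_free
    return q


# Hypothetical setup: 4 loudspeakers, 4 bright-zone and 4 dark-zone points.
rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
p = np.concatenate([np.ones(4), np.zeros(4)]).astype(complex)
q = forced_pressure_matching(G, p, forced_idx=[0], q_forced=[1.0])
```

In this sketch the linear system has one unknown fewer per forced loudspeaker, which is one plausible reason a forced formulation could be cheaper to compute; the abstract itself does not state where the reported speed-up comes from.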

Acoustic Scene Classification Using Pixel-Based Attention

Authors: Wang, Xingmei; Xu, Yichao; Shi, Jiahao; Teng, Xuyang

OPEN ACCESS

In this paper, we propose a pixel-based attention (PBA) module for acoustic scene classification (ASC). By performing feature compression on the input spectrogram along the spatial dimension, PBA can obtain the global information of the spectrogram. Besides, PBA applies attention weights to each pixel of each channel through two convolutional layers combined with global information. In addition, the spectrogram obtained after applying the attention weights is multiplied by the gamma coefficient and added to the original spectrogram to obtain more effective spectrogram features for training the network model. Furthermore, this paper implements a convolutional neural network (CNN) based on PBA (PB-CNN) and compares its classification performance on task 1 of Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge with CNN based on time attention (TB-CNN), CNN based on frequency attention (FB-CNN), and pure CNN. The experimental results show that the proposed PB-CNN achieves the highest accuracy of 89.2% among the four CNNs, 1.9% higher than that of TB-CNN (87.3%), 2.2% higher than that of FB-CNN (86.6%), and 3% higher than that of pure CNN (86.2%). Compared with DCASE 2016’s baseline system, the PB-CNN improved by 12%, and its 89.2% accuracy was the highest among all submitted single models.
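
The data flow described in this abstract (spatial compression, two convolutional layers, per-pixel weights, gamma-scaled residual) can be sketched in NumPy. This is only an illustration of that flow, not the authors' PBA module. Several details are assumptions: the global information is taken as a per-channel spatial mean and added to every pixel, the two convolutions are 1x1, the activations are ReLU and sigmoid, and all names and shapes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pixel_based_attention(x, w1, w2, gamma):
    """Illustrative pixel-wise attention with a gamma-scaled residual.

    x     : (C, H, W) feature map (e.g. spectrogram features)
    w1    : (C_mid, C) weights of the first 1x1 convolution (assumed)
    w2    : (C, C_mid) weights of the second 1x1 convolution (assumed)
    gamma : scalar coefficient applied to the attended features
    """
    # Compress along the spatial dimensions to get global information per channel.
    global_info = x.mean(axis=(1, 2), keepdims=True)   # (C, 1, 1)

    # Combine global information with every pixel (assumed: simple addition).
    z = x + global_info                                # (C, H, W)

    # Two 1x1 convolutions, written as channel-mixing einsums.
    h = np.maximum(0.0, np.einsum("mc,chw->mhw", w1, z))
    attn = sigmoid(np.einsum("cm,mhw->chw", w2, h))    # one weight per pixel per channel

    # Scale the attended features by gamma and superimpose the original input.
    return gamma * (attn * x) + x


# Hypothetical shapes: 4 channels, 8x8 pixels, bottleneck of 2 channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w1 = rng.standard_normal((2, 4))
w2 = rng.standard_normal((4, 2))
y = pixel_based_attention(x, w1, w2, gamma=0.5)
```

With `gamma` set to 0 the sketch returns its input unchanged, which shows how a residual form of this kind lets a network start from the original spectrogram features.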

Engineering Reports

Design and Evaluation of a Spectral Phase Rotation Algorithm for Upmixing to 3D Audio

Authors: Keyes, Christopher J.; Tan, Alfred

This paper details the design and evaluation of a novel frequency-domain digital signal processing algorithm intended for upmixing of previously recorded audio to larger 3D loudspeaker arrays, either by itself or in combination with other upmixing techniques. The algorithm attempts to mimic the dynamic and complex phase variances experienced by listeners in concerts of acoustic music. Critical listening evaluations support the algorithm’s effectiveness in increasing the perceived spaciousness and liveliness and show a significant preference for its use.

Quasar Spectroscopy Sound: Analyzing Intergalactic and Circumgalactic Media via Data Sonification

Authors: Hansen, Brian; Burchett, Joseph N.; Forbes, Angus G.

In this paper, we present sonification approaches to support research in astrophysics, using sound to enhance the exploration of the intergalactic medium and the circumgalactic medium. Astrophysicists often analyze matter in these media using a technique called absorption line spectroscopy. Our sonification approaches convey key spectral features identified via this technique, including the presence and width of spectral absorption lines within a region of the Universe, the relationship of a particular redshift location with respect to the absorption peak of a spectral absorption line, and the density of gas at various regions of the Universe. In addition, we introduce Quasar Spectroscopy Sound, a novel software tool that enables researchers to perform these sonification techniques on cosmological data sets, potentially accelerating the discovery and classification of matter in the intergalactic medium and circumgalactic medium.

Standards and Information Documents

Standards News

Features

Loudspeaker technology for the 21st century

Authors: Rumsey, Francis

[Feature] An interesting approach to MEMS loudspeaker design delivers surprisingly good results from a balanced radiator, and "omni" loudspeakers made up of multiple drivers on the surface of a sphere can be modelled in such a way as to understand their likely directivity response. We find that limiting the overall energy stored in a loudspeaker system may prove more effective as a means of control than conventional voltage limiting. Finally, there is clearly still considerable interest in research on flat panel loudspeakers, and some evidence that timbral uniformity across listening locations may be improved when using these.
