Journal of the Audio Engineering Society

2019 November - Volume 67 Number 11


Applications of Spatially Localized Active-Intensity Vectors for Sound-Field Visualization

Authors: McCormack, Leo; Delikaris-Manias, Symeon; Politis, Archontis; Pavlidi, Despoina; Farina, Angelo; Pinardi, Daniel; Pulkki, Ville


This article details and evaluates three alternative approaches to sound-field visualization, all of which employ spatially localized active-intensity (SLAI) vectors. SLAI vectors are of particular interest because they allow direction-of-arrival (DoA) estimates to be extracted in multiple spatially localized sectors, such that sound sources and/or noise present in one sector have reduced influence on the DoA estimates made in the other sectors. These DoA estimates may then be used to visualize the sound field by: i) directly depicting the estimates as icons, with their relative size dictated by the corresponding energy of each sector; ii) generating traditional activity maps via histogram analysis of the DoA estimates; or iii) using the DoA estimates to reassign energy and thereby sharpen traditional beamformer-based activity maps. Since SLAI-based DoA estimates are continuous, these approaches are inherently computationally efficient, as they forgo the dense scanning grids otherwise needed to attain high-resolution imaging.
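Option ii) lends itself to a compact implementation: accumulate an energy-weighted two-dimensional histogram of the per-sector DoA estimates. The following is a minimal NumPy sketch, not the authors' implementation, assuming azimuth/elevation estimates and sector energies have already been extracted:

```python
import numpy as np

def activity_map(doa_az, doa_el, energy, az_bins=72, el_bins=36):
    """Energy-weighted 2-D histogram of DoA estimates.

    doa_az, doa_el : azimuth/elevation estimates in degrees
                     (one entry per sector per time frame).
    energy         : corresponding sector energies, used as weights.
    """
    hist, az_edges, el_edges = np.histogram2d(
        doa_az, doa_el,
        bins=[az_bins, el_bins],
        range=[[-180.0, 180.0], [-90.0, 90.0]],
        weights=energy,
    )
    return hist, az_edges, el_edges

# Toy example: two static sources, the second one 6 dB weaker
rng = np.random.default_rng(0)
az = np.concatenate([rng.normal(30.0, 2.0, 500), rng.normal(-60.0, 2.0, 500)])
el = np.concatenate([rng.normal(0.0, 2.0, 500), rng.normal(20.0, 2.0, 500)])
en = np.concatenate([np.full(500, 1.0), np.full(500, 0.25)])
hist, _, _ = activity_map(az, el, en)
```

Because the DoA estimates are continuous, the histogram resolution can be chosen freely without incurring any scanning-grid cost.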

Spatial Perception of Sound Source Distribution in the Median Plane

Authors: Pulkki, Ville; Pöntynen, Henri; Santala, Olli


Modern spatial audio reproduction techniques using headphones or loudspeakers seek to control the perceived spatial image as accurately as possible in three dimensions. The mechanisms of spatial perception have been studied mainly in the horizontal plane, and this article attempts to shed some light on the corresponding phenomena in the median plane. Spatial perception of concurrently active sound sources was investigated in an exploratory listening experiment. Incoherent noise source distributions of varying spatial characteristics were presented from loudspeaker arrays in anechoic conditions. The arrays coincided with ±45° angular sectors in the frontal median and horizontal planes. The task for immobile subjects was to report the directions of the loudspeakers they perceived to be emitting sound. The results from the median-plane distributions suggest that two concurrent sources located along the vertical midline can be perceived individually, without resorting to head movements, when they are separated in elevation by 60° or more. With source pairs separated by less than 60°, and with more complex physical distributions, the distributions were perceived inaccurately: biased and spatially compressed, but nevertheless not as point-like auditory images.

This paper presents a multiband approach to crosstalk cancellation based on superdirective near-field beamforming (SDB) that adapts dynamically to changes in the listener position. SDB requires a separate set of beamformer weights to be computed for each listener position. For listening positions along a linear trajectory parallel to the array, these weights evolve smoothly, so they can be parameterized with only a few parameters per frequency. At run time, the beamformer weights for any position are then determined efficiently from the parameters with negligible error. Simulations and measurements show that the proposed method provides high channel separation and is robust to small uncertainties in the listener position. A user study with 20 subjects and binaural signals shows consistent auditory localization accuracy across the tested listening positions, comparable to the localization accuracy of headphone rendering. The study also confirms the previously informal observation that fewer front-back confusions occur when listeners face away from the loudspeaker array than when they face toward it.
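The parameterization idea can be illustrated generically: precompute weights on a coarse grid of listener positions, fit a low-order polynomial per frequency bin, and evaluate the polynomial at run time. The sketch below uses synthetic complex weights for a single frequency and a hypothetical polynomial parameterization, purely for illustration; it is not the paper's actual scheme:

```python
import numpy as np

def fit_weight_params(x_grid, W, order=3):
    """Fit per-microphone polynomials to complex beamformer weights.

    x_grid : listener positions (metres along the linear trajectory).
    W      : (n_positions, n_mics) complex weights precomputed at x_grid.
    Returns polynomial coefficients for the real and imaginary parts.
    """
    cr = np.polynomial.polynomial.polyfit(x_grid, W.real, order)
    ci = np.polynomial.polynomial.polyfit(x_grid, W.imag, order)
    return cr, ci

def eval_weights(params, x):
    """Recover the weight vector for an arbitrary position x."""
    cr, ci = params
    return (np.polynomial.polynomial.polyval(x, cr)
            + 1j * np.polynomial.polynomial.polyval(x, ci))

# Synthetic, smoothly varying weights for a 4-microphone array
x_grid = np.linspace(-0.5, 0.5, 9)
W = np.exp(1j * 0.5 * np.outer(x_grid, np.arange(4)))
params = fit_weight_params(x_grid, W)

# Interpolate at an off-grid position and compare with the true weights
w_hat = eval_weights(params, 0.13)
err = float(np.max(np.abs(w_hat - np.exp(1j * 0.5 * 0.13 * np.arange(4)))))
```

Storing only the few coefficients per frequency replaces a dense table of precomputed weight sets, which is what makes the run-time lookup cheap.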

Given an ambisonics-encoded sound field (i.e., a sound field that has been decomposed into spherical harmonics), virtual navigation enables a listener to explore the recorded space and, ideally, experience a spatially and tonally accurate perception of the sound field. Although several navigational methods have been developed, existing studies rarely compare them, and practical assessments of such methods have been limited. The authors conducted numerical simulations to characterize and compare the performance of the time-frequency analysis interpolation method of Thiergart et al. with the recently proposed parametric valid microphone interpolation method. The simulations involved simple incident sound fields, each consisting of a single point source captured by a two-microphone array, with varied source distance and azimuth, microphone spacing, and listener position. The errors introduced by the two methods were evaluated objectively in terms of metrics for sound level, spectral coloration, source localization, and diffuseness. Suitable practical domains for each method were subsequently identified, and guidelines were established for choosing between the two methods based on the intended application.
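Two of the objective metrics mentioned, sound level and spectral coloration, can be sketched as simple comparisons between a rendered and a reference signal. The functions below are illustrative stand-ins, not the metrics actually used in the study:

```python
import numpy as np

def level_error_db(rendered, reference):
    """Broadband level error in dB between rendered and reference signals."""
    return 10.0 * np.log10(np.sum(rendered ** 2) / np.sum(reference ** 2))

def coloration_db(rendered, reference, n_bands=32):
    """Crude spectral-coloration measure: mean absolute per-band magnitude
    difference in dB between the two signals' spectra."""
    R = np.abs(np.fft.rfft(rendered))
    X = np.abs(np.fft.rfft(reference))
    diffs = [20.0 * np.log10((br.mean() + 1e-12) / (bx.mean() + 1e-12))
             for br, bx in zip(np.array_split(R, n_bands),
                               np.array_split(X, n_bands))]
    return float(np.mean(np.abs(diffs)))

# Sanity check on a random signal: doubling the amplitude should give
# ~6 dB of both level error and (uniform) coloration
rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
lvl = level_error_db(2 * x, x)
col = coloration_db(2 * x, x)
```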

Sparse Time-Frequency Representations for Polyphonic Audio Based on Combined Efficient Fan-Chirp Transforms

Authors: Costa, Maurício V. M.; Apolinário, Isabela F.; Biscainho, Luiz W. P.

In audio signal processing, several techniques rely on the Time-Frequency Representation (TFR) of an audio signal, particularly in music information retrieval applications. Examples include automatic music transcription, sound source separation, and classification of the instruments playing in a musical piece. This paper presents a novel method for obtaining a sparse time-frequency representation by combining different instances of the Fan-Chirp Transform (FChT). The described method comprises two main steps: computing the multiple FChTs by means of the structure tensor, and combining them, along with spectrograms, using the smoothed local sparsity method. Experiments conducted with synthetic and real-world audio signals suggest that the proposed method yields markedly sharper TFRs than the standard short-time Fourier transform, especially in the presence of fast frequency variations, and that it extends the applicability of the FChT to polyphonic audio signals. As a result, the proposed method allows more precise information to be extracted from audio signals with multiple sources.
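The combination step can be caricatured as a per-bin weighting of candidate TFRs by a local sparsity measure. The sketch below uses a crude L2/L1 patch ratio as the sparsity measure instead of the paper's smoothed local sparsity method, purely to illustrate the idea:

```python
import numpy as np

def local_sparsity(S, win=5):
    """Crude local sparsity per bin: L2/L1 energy ratio over a win x win
    neighbourhood (higher = more concentrated)."""
    pad = win // 2
    P = np.pad(S, pad, mode="edge")
    # Sliding-window view: patches[i, j] == P[i:i+win, j:j+win]
    patches = np.lib.stride_tricks.as_strided(
        P, shape=(S.shape[0], S.shape[1], win, win), strides=P.strides * 2)
    l1 = patches.sum(axis=(2, 3)) + 1e-12
    l2 = np.sqrt((patches ** 2).sum(axis=(2, 3)))
    return l2 / l1

def combine_tfrs(tfrs, win=5):
    """Per-bin combination of candidate TFRs, weighted by local sparsity."""
    sp = np.stack([local_sparsity(S, win) for S in tfrs])
    w = sp / (sp.sum(axis=0, keepdims=True) + 1e-12)
    return (w * np.stack(tfrs)).sum(axis=0)

# Toy example: a sharply localized TFR versus a smeared one with the
# same total energy; the combination should favour the sharp one
S_sharp = np.zeros((32, 32))
S_sharp[16, 16] = 1.0
S_blur = np.full((32, 32), 1.0 / (32 * 32))
C = combine_tfrs([S_sharp, S_blur])
```

In the actual method the candidates are FChT instances tuned by the structure tensor, so each candidate is sharpest where its chirp rate matches the local frequency variation, and the sparsity weighting selects accordingly.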

Standards and Information Documents

AES Standards Committee News

Download: PDF (66.29 KB)


[Feature] In augmented or assistive listening situations, a compromise must be struck between hearing natural sounds from the environment and hearing reproduced sounds. Ideally, the hear-through sound quality would be the same as if one were not wearing headphones. Bone conduction is another way of getting sound into the head, and one that might be usable for spatializing information as part of a hybrid information display. It may be possible to adapt measurement methods intended for active noise-canceling ear defenders to consumer ANC applications. One may also be able to predict the degree of listening effort needed to hear speech through such headphones in the presence of noise.

Headphone Technology Conference Report, San Francisco

Download: PDF (939.87 KB)

New Officers 2020

Download: PDF (584.53 KB)


Section News

Download: PDF (344.28 KB)



AES Conventions and Conferences

Download: PDF (107.67 KB)


Table of Contents

Download: PDF (37.63 KB)

Cover & Sustaining Members List

Download: PDF (77.66 KB)

AES Officers, Committees, Offices & Journal Staff

Download: PDF (100.63 KB)
