Journal of the Audio Engineering Society

2020 September - Volume 68 Number 9


Wavelet-Based Spatial Audio Format

Authors: Scaini, Davide; Arteaga, Daniel

Ambisonics is a spatial audio technique covering all steps of the audio production chain, from encoding and recording to transmission and decoding, whose building blocks are the spherical harmonics. Some of the drawbacks of low-order Ambisonics, like large source spread and small sweet spot, are directly related to the fact that spherical harmonics do not have compact support on the sphere. In this paper we propose a novel spatial audio format similar in spirit to Ambisonics but which replaces the spherical harmonics by an alternative set of functions with compact support, the spherical wavelets. We develop a complete audio chain from encoding to decoding, using discrete spherical wavelets built on a multiresolution mesh, and illustrate it with an example implementation of the format. We present a decoding algorithm optimizing acoustic and psychoacoustic parameters that can generate decoding matrices for irregular layouts for both Ambisonics and the new wavelet format. This audio workflow is directly compared with Ambisonics. For an industry-standard loudspeaker layout, we show how we can reach well-localized sound sources with almost no negative gains (which are a common issue in most Ambisonics decoder designs). The approach is very flexible: there are different possible incarnations of the wavelet-based audio format, depending on the specific multiresolution mesh and the wavelet family, making it possible to customize the format, for example by adapting it to meshes that closely resemble the distribution of loudspeakers in standard layouts.
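The source spread and negative gains mentioned in this abstract can be seen in a minimal sketch, which is not taken from the paper. It assumes a horizontal-only, first-order encoding with circular harmonics, a basic sampling decoder, and a square loudspeaker ring; the function names `encode_2d` and `sampling_decode` are hypothetical.

```python
import math

def encode_2d(azimuth_deg):
    """First-order circular-harmonic gains for a direction (assumed normalization)."""
    a = math.radians(azimuth_deg)
    return [1.0, math.sqrt(2) * math.cos(a), math.sqrt(2) * math.sin(a)]

def sampling_decode(source_deg, speaker_degs):
    """Loudspeaker gains from a basic sampling decoder on a regular ring.

    Each gain is the inner product of the source encoding with the
    encoding of the loudspeaker direction, divided by the ring size.
    """
    src = encode_2d(source_deg)
    n = len(speaker_degs)
    return [
        sum(s * c for s, c in zip(encode_2d(spk), src)) / n
        for spk in speaker_degs
    ]

speakers = [45, 135, 225, 315]       # hypothetical square layout
gains = sampling_decode(45, speakers)  # source placed exactly on the first speaker
for spk, g in zip(speakers, gains):
    print(f"{spk:3d} deg: {g:+.2f}")
```

Even with the source placed exactly on one loudspeaker, every loudspeaker receives signal, and the opposite one receives a gain of -0.25. Because the basis functions are nonzero over the whole circle, the energy cannot be confined to the source direction at this order.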

Effect of Skill Level on Listener Performance in 3D Audio Evaluation

Authors: Howie, Will; Martin, Denis; Kim, Sungyoung; Kamekawa, Toru; King, Richard

A previous experiment (Part 1) found that, within the context of 3D audio evaluation, both audio production experience and musical training were significant predictors of listener consistency in making preference or attribute rating judgments of stimuli. In that study, 72 subjects ranging from highly experienced to naïve listeners evaluated an excerpt of orchestral music captured by three different 3D music-recording techniques. Using the same data set from Part 1, the current study (Part 2) examines whether the results of skilled listeners can be generalized to the larger population of unskilled listeners within the context of 3D audio evaluation. Results show no significant changes in the rank order of recording technique attribute ratings or preferences as a function of listener skill. Results also show that using highly skilled participants will result in gains in statistical power. This allows for the detection of subtler differences between stimuli or greater efficiency in the number of trials needed to achieve a significant result.

Two methods for undertaking subjective evaluation were compared: a pairwise dissimilarity task (PDT) and a projective mapping task (PMT). For a set of unambiguous, synthetic, auditory stimuli, the aim was to determine the following: whether the PMT limits the recovered dimensionality to two dimensions; how subjects respond using PMT’s two-dimensional response format; the relative time required for PDT and PMT; and hence, whether PMT is an appropriate alternative to PDT for experiments involving auditory stimuli. The results of both Multi-Dimensional Scaling (MDS) analyses and Multiple Factor Analyses (MFA) indicate that, with multiple participants, PMT allows for the recovery of three meaningful dimensions. The results from the MDS and MFA analyses of the PDT data, on the other hand, were ambiguous and did not enable recovery of more than two meaningful dimensions. This result was unexpected given that PDT is generally considered not to limit the dimensionality that can be recovered. Participants took less time to complete the experiment using PMT compared to PDT (a median ratio of approximately 1:4), and employed a range of strategies to express three perceptual dimensions using PMT’s two-dimensional response format. PMT may provide a viable and efficient means to elicit up to 3-dimensional responses from listeners.

To describe the sound radiation of the human voice in all directions, measurements need to be performed on a spherical grid. However, the resolution of such captured directivity patterns is limited and methods for spatial upsampling are required, for example by interpolation in the spherical harmonics (SH) domain. As the number of measurement directions limits the resolvable SH order, the directivity pattern suffers from spatial aliasing and order-truncation errors. We present an approach for spatial upsampling of voice directivity by spatial equalization. It is based on preprocessing, which equalizes the sparse directivity pattern by spectral division with corresponding directional rigid sphere transfer functions, resulting in a time-aligned and spectrally matched directivity pattern that has a significantly reduced spatial complexity. The directivity pattern is then transformed into the SH domain, interpolated to a dense grid by an inverse spherical Fourier transform, and subsequently de-equalized by spectral multiplication with corresponding rigid sphere transfer functions. Based on measurements of a dummy head with an integrated mouth simulator, we compare this approach to reference measurements on a dense grid. The results show that the method significantly decreases errors of spatial undersampling, which allows a meaningful high-resolution voice directivity to be determined from sparse measurements.

Many natural and man-made signals, including speech and music, are well modeled by Laplace distributions. Yet testing, evaluation, design, and simulation of devices and systems are often performed with sine or noise signals whose distributions are very different. Such practice, while generally useful, can lead to erroneous estimates of system performance. Three novel methods, each with several optimized variations, are presented herein to generate signals with nearly Laplace distributions that are continuous and computable at arbitrary instants of time. Further, each method produces signals that are band-limited and thus do not require a low-pass filter when used with sampled systems or limited-bandwidth channels. In the bargain, some distribution functions are presented that might not be widely known. Implementations are summarized in readily accessible form. Other distributions can also model the same signal types well, and the methods described can all be adapted to generate signals with these density distributions, which are strongly peaked around the origin.
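How far a sine test tone is from a Laplace-distributed signal can be illustrated with a short sketch. It does not reproduce any of the three band-limited methods of the paper: it assumes plain random sampling (the difference of two independent exponential variates is Laplace distributed) and compares kurtosis, the ratio of the fourth central moment to the squared variance, which is 6 for a Laplace distribution and 1.5 for a sine. The helper name `kurtosis` and the sample sizes are illustrative choices.

```python
import math
import random

def kurtosis(xs):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Difference of two i.i.d. exponential variates is Laplace(0, 1).
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

# Sine sampled over 100 whole periods (1000 samples per period).
sine = [math.sin(2 * math.pi * k / 1000) for k in range(n)]

print(f"Laplace samples kurtosis: {kurtosis(laplace):.2f}")
print(f"Sine kurtosis:            {kurtosis(sine):.2f}")
```

The large gap in kurtosis reflects how strongly the Laplace density is peaked around the origin, with occasional large excursions, whereas a sine spends most of its time near its extremes.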

Standards and Information Documents

AES Standards Committee News


Engineering XR

Authors: Rumsey, Francis

[Feature] A number of significant challenges arise when attempting to engineer audio systems and processes for extended reality applications. Authors of papers presented at the recent AVAR conference have begun to find ways of representing the acoustics of virtual environments more accurately, such that objects, characters, and participants within them perceive sounds in a more believable way. There’s interesting evidence that the more accurately one renders the acoustics, the less bothered people are about the differences between real and virtual sounds. There's also the interesting problem of the competition for attention in mixed reality environments crowded with stimuli that the user may need to know about.

Call for Papers Special Issue on Internet of Sounds

Audio Engineering Society Educational Foundation 2020 Awardees


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
