Journal of the Audio Engineering Society

2022 October - Volume 70 Number 10


Audio Augmented Reality (AAR) aims to augment people's auditory perception of the real world by synthesizing virtual spatialized sounds. AAR has begun to attract more research interest in recent years, especially because Augmented Reality (AR) applications are becoming more commonly available on mobile and wearable devices. However, because audio augmentation is relatively under-studied in the wider AR community, AAR needs to be further investigated in order to be widely used in different applications. This paper systematically reports on the technologies used in past studies to realize AAR and provides an overview of AAR applications. A total of 563 publications indexed on Scopus and Google Scholar were reviewed, and from these, 117 of the most impactful papers were identified and summarized in more detail. As one of the first systematic reviews of AAR, this paper presents an overall landscape of AAR, discusses the development trends in techniques and applications, and indicates challenges and opportunities for future research. For researchers and practitioners in related fields, this review aims to provide inspiration and guidance for conducting AAR research in the future.

Influence of Changes in Audio Spatialization on Immersion in Audiovisual Experiences

Authors: Agrawal, Sarvesh; Bech, Søren; De Moor, Katrien; Forchhammer, Søren


Understanding the influence of technical system parameters on audiovisual experiences is important for technologists to optimize experiences. The focus in this study was on the influence of changes in audio spatialization (varying the loudspeaker configuration for audio rendering from 2.1 to 5.1 to 7.1.4) on the experience of immersion. First, a magnitude estimation experiment was performed to perceptually evaluate envelopment for verifying the initial condition that there is a perceptual difference between the audio spatialization levels. It was found that envelopment increased from 2.1 to 5.1 reproduction, but there was no significant benefit of extending from 5.1 to 7.1.4. An absolute-rating experimental paradigm was used to assess immersion in four audiovisual experiences by 24 participants. Evident differences between immersion scores could not be established, signaling that a change in audio spatialization and subsequent change in envelopment does not guarantee a psychologically immersive experience.

Assessor Selection Process for Perceptual Quality Evaluation of 360 Audiovisual Content

Authors: Fela, Randy Frans; Zacharov, Nick; Forchhammer, Søren


For accurate and detailed perceptual evaluation of compressed omnidirectional multimedia content, it is imperative for assessor panels to be qualified to obtain consistent and high-quality data. This work extends existing procedures for assessor selection in terms of scope (360° videos with high-order ambisonics), time efficiency, and analytical approach, as described in detail. The main selection procedures consisted of a basic audiovisual screening and three successive discrimination experiments for audio (listening), video (viewing), and audiovisual using a triangle test. Additionally, four factors influencing quality of experience, including the simulator sickness questionnaire, were evaluated and are discussed. After the selection process, a confirmatory study was conducted using three experiments (audio, video, and audiovisual) and based on a rating scale methodology to compare performance between rejected and selected assessors. The studies showed that (i) perceptual discriminations are influenced by the samples, the encoding parameters, and some quality of experience factors; (ii) the probability of symptom occurrence is considerably low, indicating that the proposed procedure is feasible; and (iii) the selected assessors performed better in discrimination than the rejected assessors, indicating the effectiveness of the proposed procedure.

Interaural Time Difference Prediction Using Anthropometric Interaural Distance

Authors: Johansson, Jaan; Mäkivirta, Aki; Malinen, Matti; Saari, Ville


This paper studies the feasibility of predicting the interaural time difference (ITD) in azimuth and elevation once the personal anthropometric interaural distance is known, proposing an enhancement for spherical head ITD models to increase their accuracy. The method and enhancement are developed using data in a Head-Related Impulse Response (HRIR) data set comprising photogrammetrically obtained personal 3D geometries for 170 persons and then evaluated using three acoustically measured HRIR data sets containing 119 persons in total. The directions include 360° in azimuth and –15° to 60° in elevation. The prediction error for each data set is described, the proportion of persons under a given error in all studied directions is shown, and the directions in which large errors occur are analyzed. The enhanced spherical head model can predict the ITD such that the first and 99th percentile levels of the ITD prediction error for all persons and in all directions remain below 122 µs. The anthropometric interaural distance could potentially be measured directly on a person, enabling personalized ITD without measuring the HRIR. The enhanced model can personalize ITD in binaural rendering for headphone reproduction in games and immersive audio applications.
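
For orientation, the kind of spherical head ITD model that the abstract refers to can be sketched with the classic Woodworth far-field formula. This is a minimal illustration and not the enhanced model proposed in the paper. The function name, the 343 m/s speed of sound, the choice of head radius as half the interaural distance, and the azimuth/elevation convention (0° azimuth straight ahead, positive toward one ear) are all assumptions made here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def woodworth_itd(interaural_distance_m, azimuth_deg, elevation_deg=0.0):
    """Return the ITD in seconds from the classic Woodworth spherical head formula.

    Assumptions (not taken from the paper): the head is a rigid sphere whose
    radius is half the interaural distance, and the source is in the far field.
    """
    radius = interaural_distance_m / 2.0
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Lateral angle between the source direction and the median plane.
    s = max(-1.0, min(1.0, math.sin(az) * math.cos(el)))
    lateral = math.asin(s)
    # Woodworth: path difference = radius * (lateral angle + sin(lateral angle)).
    return (radius / SPEED_OF_SOUND) * (lateral + math.sin(lateral))


if __name__ == "__main__":
    # Hypothetical interaural distance of 15 cm, source directly to one side.
    print(f"{woodworth_itd(0.15, 90.0) * 1e6:.0f} microseconds")
```

In this sketch the interaural distance is the only personal parameter, which mirrors the idea of personalizing ITD from a single anthropometric measurement; raising the source in elevation reduces the lateral angle and therefore the predicted ITD.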

Buckling Dielectric Elastomer Transducers as Loudspeakers

Authors: Gareis, Michael; Maas, Jürgen

In recent decades, dielectric elastomers (DE) have emerged as a promising transducing principle for various applications. They promise to be lightweight, efficient, and affordable alternatives to conventional electrodynamic or piezoelectric transducers and show large deformations at fast rates. In this work a loudspeaker concept is proposed, which relies on the elastic instability of a DE membrane. A multilayered DE membrane is clamped in a circular ring. Upon applying a DC voltage, its area increases, and the membrane buckles up. A superimposed signal voltage induces vibration and generates sound. To model the device mechanically, a system of partial differential equations is derived from Hamilton's principle. The mechanical model is then coupled to the electrical and acoustical domains, which are assumed to be linear. Static, dynamic, and acoustic experiments on buckling DE transducers of three different diameters (10, 15, and 20 mm) and different thicknesses (0.4 mm to 0.6 mm) as multilayer configurations are conducted to validate the model. Sound pressure levels of about 70 dB above 1 kHz are reached. Small loudspeakers like this may find application in mobile or array systems.

Dual-Residual Transformer Network for Speech Recognition

Authors: Duan, Zhikui; Gao, Guozhi; Chen, Jiawei; Li, Shiren; Ruan, Jinbiao; Yang, Guangguang; Yu, Xinmei

The Transformer, an attention-based encoder-decoder network, has recently become the prevailing model for automatic speech recognition because of its high recognition accuracy. However, the convergence speed of the Transformer is not optimal. To address this problem, a structure called Dual-Residual Transformer Network (DRTNet), which converges quickly, is proposed. In DRTNet, a direct path inspired by the structure proposed in ResNet is added in the encoder and decoder layers to propagate features. Moreover, this architecture can also fuse features, which tends to improve the model performance. Specifically, the input of the current layer is the integration of the input and output of the previous layer. Empirical evaluation of the proposed DRTNet has been conducted on two public datasets, AISHELL-1 and HKUST. Experimental results on these two datasets show that DRTNet has faster convergence speed and better performance.
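
The layer-to-layer connection described in the abstract (each layer receives the integration of the previous layer's input and output) can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: "integration" is modeled here as element-wise addition, the layers are stand-in functions rather than Transformer blocks, and the names `fuse_by_sum` and `run_stack` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def fuse_by_sum(layer_input: Vector, layer_output: Vector) -> Vector:
    # Assumption: the abstract's "integration" is modeled as element-wise addition.
    return [i + o for i, o in zip(layer_input, layer_output)]


def run_stack(layers: Sequence[Layer], x: Vector,
              fuse: Callable[[Vector, Vector], Vector] = fuse_by_sum) -> Vector:
    """Pass x through the layers, feeding each next layer the fused
    input and output of the layer before it (a direct path across layers)."""
    for layer in layers:
        out = layer(x)
        x = fuse(x, out)  # becomes the input of the next layer
    return x


def double(v: Vector) -> Vector:
    # Stand-in for an encoder or decoder layer; purely illustrative.
    return [2.0 * t for t in v]


if __name__ == "__main__":
    print(run_stack([double, double], [1.0]))
```

The `fuse` argument is kept pluggable because the abstract does not say how the input and output are combined; concatenation followed by a projection would be another plausible choice.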

Engineering reports

A Multi-Angle, Multi-Distance Dataset of Microphone Impulse Responses

Authors: Franco Hernández, Juan Carlos; Bacila, Bogdan; Brookes, Tim; De Sena, Enzo


A new publicly available dataset of microphone impulse responses (IRs) has been generated. The dataset covers 25 microphones, including a Class-1 measurement microphone and polar pattern variations for seven of the microphones. Microphones that were included had omnidirectional, cardioid, supercardioid, and bidirectional polar patterns; condenser, moving-coil, and ribbon transduction types; single and dual diaphragms; multiple body and head basket shapes; small and large diaphragms; and end-address and side-address designs. Using a custom-developed computer-controlled precision turntable, IRs were captured quasi-anechoically at incident angles from 0° to 355° in steps of 5° and at source-to-microphone distances of 0.5, 1.25, and 5 m. The resulting dataset is suitable for perceptual and objective studies related to the incident-angle-dependent response of microphones and for the development of tools for predicting and emulating on-axis and off-axis microphone characteristics. The captured IRs allow generation of frequency response plots with a degree of detail not commonly available in manufacturer-supplied data sheets and are also particularly well-suited to harmonic distortion analysis.

Aluminum-Based Push-Pull Electrostatic MEMS Transducer for Earphones

Authors: Zamir, Aviad; Seiden, Gabriel; Kupershmidt, Haim

With the evolving market of true wireless stereo earphones and improvements in Bluetooth technology, wireless earphones have become a platform for innovation. Performance of such earphones is measured based on two main criteria: sound quality, which includes total harmonic distortion, and power consumption. Power consumption efficiency pertaining to such devices is critical in sustaining a good battery life. In this study, the design and fabrication of a novel aluminum-based push-pull electrostatic microelectromechanical systems transducer for earphones are presented. This device is designed to consume two orders of magnitude less power than a common earphone voice coil speaker and has a substantially higher quality of sound. Particularly, the authors elaborate on the underlying theoretical aspects pertaining to the design and on the unique fabrication challenges originating from the microscale nature.

Standards and Information Documents

AES Standards Committee News

Download: PDF (258.28 KB)


Call for Papers: Special Issue - New Trends in Audio Effects II

Download: PDF (73.3 KB)

