Journal of the Audio Engineering Society

2017 June - Volume 65 Number 6


Perceptual Evaluation of Individualized Binaural Reproduction Using a Virtual Artificial Head

Authors: Rasumow, Eugen; Blau, Matthias; Doclo, Simon; Par, Steven van de; Hansen, Martin; Püschel, Dirk; Mellert, Volker

In binaural recordings, spatial information can be captured with an artificial head that emulates a real human head of average anthropometric geometry and is fitted with ear microphones. Because artificial heads are generic and lack the individual characteristics of the actual listener, such recordings often produce perceptual deficiencies such as front-back confusion and internalized source images. Alternatively, individually measured head-related transfer functions (HRTFs) can be approximately synthesized using a microphone array in conjunction with a filter-and-sum beamformer, called a virtual artificial head (VAH). This approach makes it possible to adapt a recording to an individual’s HRTFs in the recording studio by appropriately modifying the directivity pattern of the VAH. In this study, binaural reproductions using the VAH, two traditional artificial heads, and individual HRTFs were perceptually evaluated in the horizontal plane with respect to the original free-field presentation. The results show that individual HRTFs in conjunction with an individually equalized headphone transfer function yield the best subjective appraisals. The ratings obtained for the VAH setup indicate a high level of acceptance among the subjects, with mean ratings often between good and excellent.
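As a rough illustration of the filter-and-sum idea mentioned in the abstract, the sketch below evaluates the directivity that a set of complex microphone weights produces for one frequency and one direction. It is not the authors' method: the weight design that fits the directivity to a measured HRTF is not shown, and the function names, the line-array geometry, the free-field steering model, and the convention H = sum of conj(w_m) * d_m are all assumptions made for this example.

```python
import cmath
import math


def synthesized_directivity(weights, steering):
    """Return H = sum_m conj(w_m) * d_m for one frequency and one direction."""
    if len(weights) != len(steering):
        raise ValueError("one weight per microphone is required")
    return sum(w.conjugate() * d for w, d in zip(weights, steering))


def plane_wave_steering(mic_positions_m, angle_rad, freq_hz, c=343.0):
    """Free-field steering vector for microphones on the x-axis (assumed geometry)."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber in rad/m
    return [cmath.exp(1j * k * x * math.cos(angle_rad)) for x in mic_positions_m]


# Hypothetical two-microphone array, 10 cm apart, with equal weights.
mics = [-0.05, 0.05]
weights = [0.5 + 0j, 0.5 + 0j]

# For a source at broadside both microphones receive the same phase,
# so the equally weighted sum has a magnitude close to 1.
broadside = synthesized_directivity(
    weights, plane_wave_steering(mics, math.pi / 2, 1000.0)
)
```

In a VAH-style setup, the weights would instead be chosen per frequency so that this directivity approximates the listener's HRTF over a set of directions.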

Robust Acoustic Contrast Control with Reduced In-situ Measurement by Acoustic Modeling

Authors: Zhu, Qiaoxi; Coleman, Philip; Wu, Ming; Yang, Jun


Personal audio systems generate a local sound field for a listener while attenuating the sound energy in predefined quiet zones. In practice, system performance is sensitive to errors in the acoustic transfer functions between the sources and the zones. In this paper, a design framework for robust reproduction is proposed that combines transfer functions and error modeling. The framework allows a physical perspective on the regularization required for a system that is based on there being a bound on the assumed additive or multiplicative errors, which is obtained by acoustic modeling. Acoustic contrast control is separately combined with worst-case and probability-model optimization, exploiting limited knowledge of the potential error distribution. Monte-Carlo simulations show that these approaches give increased system robustness compared to the state-of-the-art approaches for regularization parameter estimation. Experimental results verify that robust sound zone control can be achieved in the presence of loudspeaker gain errors. In addition, to simplify the approach, in-situ transfer function measurements were reduced to a single measurement per loudspeaker per zone with limited acoustic contrast degradation of less than 2 dB over 100–3000 Hz compared to the fully measured regularized case.
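The quantity being optimized here, acoustic contrast, is commonly defined as the ratio of the mean squared pressure in the listening (bright) zone to that in the quiet (dark) zone. The sketch below only evaluates that ratio in decibels for given loudspeaker weights at a single frequency; it does not implement the robust optimization described in the abstract. The transfer matrices, weights, and function names are hypothetical.

```python
import math


def zone_energy(transfer, source_weights):
    """Mean squared pressure magnitude over the control points of one zone.

    transfer[i][j] is the transfer function from loudspeaker j to point i.
    """
    total = 0.0
    for row in transfer:
        pressure = sum(g * q for g, q in zip(row, source_weights))
        total += abs(pressure) ** 2
    return total / len(transfer)


def acoustic_contrast_db(bright, dark, source_weights):
    """Bright-zone to dark-zone energy ratio in dB."""
    ratio = zone_energy(bright, source_weights) / zone_energy(dark, source_weights)
    return 10.0 * math.log10(ratio)


# Toy example: loudspeaker 0 only reaches the bright zone,
# loudspeaker 1 only reaches the dark zone.
bright = [[1.0, 0.0]]
dark = [[0.0, 1.0]]
contrast = acoustic_contrast_db(bright, dark, [1.0, 0.1])
```

With these toy values the dark-zone pressure is one tenth of the bright-zone pressure, which corresponds to a contrast of about 20 dB.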

Music Thumbnailing for Radio Podcasts: A Listener Evaluation

Authors: Mehrabi, Adib; Harte, Chris; Baume, Chris; Dixon, Simon


When radio podcasts are produced from previously broadcast material, thumbnails of songs that were featured in the original program are often included. Such thumbnails provide a summary of the music content. Because creating thumbnails is a labor-intensive process, this is an ideal application for automatic music editing, but it raises the question of how a piece of music can be best summarized. Researchers asked 120 listeners to rate the quality of thumbnails generated by eight methods (five automatic and three manual). The listeners were asked to rate the editing methods based on the song part selection and transition quality in the edited clips, as well as the perceived overall quality. The listener ratings showed a preference for editing methods where the edit points were quantized to bar positions, but there was no preference for whether the chorus was included or not. Ratings for two automatic editing methods were not significantly different from their manual counterparts. This suggests that automatic editing methods can be used to create production-quality thumbnails.
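Quantizing an edit point to a bar position can be pictured as snapping a candidate cut time to the nearest bar boundary. The sketch below assumes the bar times are already known (for example from a beat tracker, which is not shown); the function name and the sample values are hypothetical, and the editing methods evaluated in the study may differ.

```python
def quantize_to_bar(edit_time_s, bar_times_s):
    """Return the bar position (in seconds) closest to the requested edit time."""
    if not bar_times_s:
        raise ValueError("at least one bar position is required")
    return min(bar_times_s, key=lambda t: abs(t - edit_time_s))


# Hypothetical bar boundaries for a song at 120 BPM in 4/4 (2 s per bar).
bars = [0.0, 2.0, 4.0, 6.0]
start = quantize_to_bar(2.7, bars)  # snaps to the bar at 2.0 s
end = quantize_to_bar(5.2, bars)    # snaps to the bar at 6.0 s
```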

This research describes a sound field synthesis method that reconstructs a desired sound field within an extended listening area by taking into account psychoacoustic perceptual constraints. The proposed approach covers the complete system: (a) measuring the radiation characteristics by means of a microphone array technique; (b) storing the radiation characteristics in a database; (c) propagating an arbitrary source sound toward an extended listening area by considering sources as complex point sources; and (d) reconstructing this sound field with a loudspeaker array by solving a linear equation system for discrete listening points that sample the listening area. By capturing and reconstructing the sound radiation characteristics of musical instruments, a spatial sound impression can be created. Psychoacoustic considerations are implemented to allow for wave fronts arriving from different angles and at different points in time while maintaining precise source localization and a natural and spatial sound impression. Furthermore, the psychoacoustic considerations reduce the computational costs, as illustrated by solving the linear equations for only 25 selected frequencies. The strengths and limitations of the psychoacoustic sound field synthesis approach are investigated in a listening test. A simulation demonstrates that the approach is valid up to a critical spatial frequency that is given by the distribution of the listening points.

Engineering Reports

Although electrostatic loudspeakers (ESL) are renowned for their mid- and high-frequency clarity and coherence, the acoustic behavior of ESL assemblies at high audio frequencies is not well understood. This paper collates acoustic models of all ESL components, including an improved model of perforated-plate stators, and compiles a high-frequency model of the complete ESL assembly. The ESL model includes the membrane, the two perforated-plate stators, the damping cloth, and any grills or dustcovers. The collective behavior of the components is found to be very different from the sum of the effects of each component because of the reflections that occur between the surfaces of all of the components. Therefore, the response of an ESL at high frequencies cannot be determined solely from the attributes of the components taken in isolation. The model shows that the high-frequency response of the ESL is dominated by the effects of inter-component reflections. The distances between the various reflecting surfaces and the overall thickness of the assembly should be minimized to ensure the peaks and dips in the response occur at the highest frequencies. Despite its complexity, the model is easily evaluated numerically with SPICE software, and comparisons with measurements show the model provides a good guide for ESL design. This paper demonstrates a method for measuring the transparency.

Dialogue Channel Control for 22.2 Multichannel Sound Broadcasting: Broadcast Chain Scheme and Subjective Evaluation of Effectiveness

Authors: Sugimoto, Takehiro; Nakayama, Yasushige; Komori, Tomoyasu; Chinen, Toru; Hatanaka, Mitsuyuki

The 22.2 multichannel sound system is an advanced sound system composed of 24 channels located in a three-dimensional space that envelops listeners in an immersive sound field. It is the audio system used in the new Japanese 8K broadcasting launched in 2016. To improve the listener experience, dialogue channel control is proposed. Its basis is to separate the 22.2 channels into two groups, dialogue channels and background sound channels, and to control each group separately. This enables two functions: (1) dialogue enhancement, in which dialogue audibility is improved by changing the level balance between the dialogue channels and the background sound channels; and (2) dialogue replacement, in which an alternative dialogue is selected to substitute for the standard dialogue. Subjective listening tests verified the usefulness of the dialogue enhancement. The dialogue channel control was constructed using the syntax of Moving Picture Experts Group (MPEG)-4 advanced audio coding (AAC) technology, and this scheme has been approved as a domestic standard.
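The level-balance idea behind dialogue enhancement can be sketched as applying a gain only to the channels that carry dialogue while leaving the background channels untouched. The example below is a minimal illustration under that assumption; it does not reproduce the MPEG-4 AAC signaling of the actual scheme, and the channel indices, gain value, and function names are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_dialogue_gain(frame, dialogue_channels, gain_db):
    """Scale only the dialogue channels of one multichannel sample frame.

    frame is a list with one sample per channel; dialogue_channels is the
    set of channel indices assumed to carry dialogue.
    """
    gain = db_to_linear(gain_db)
    return [
        sample * gain if index in dialogue_channels else sample
        for index, sample in enumerate(frame)
    ]


# Toy four-channel frame in which channel 1 is assumed to carry dialogue.
frame = [0.1, 0.2, 0.3, 0.4]
enhanced = apply_dialogue_gain(frame, {1}, 6.0)  # raise dialogue by 6 dB
```

A gain of 0 dB leaves the frame unchanged, and a negative gain lowers the dialogue relative to the background.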

Standards and Information Documents

AES Standards Committee News


[Feature] The factors affecting the plausibility of binaural synthesis are complicated and rather context dependent. Low-cost head trackers may be usable for binaural synthesis if chosen with care. The room divergence effect is quite powerful, and head tracking does not seem to be able to overcome it fully. HRTFs turn out to be more complex but more symmetrical in the horizontal plane than the vertical.

2017 Conference on Sound Reinforcement - Open Air Venues Preview, Struer

2017 Automotive Audio Conference Preview, San Francisco


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
