Journal of the Audio Engineering Society

2012 September - Volume 60 Number 9


MPEG Spatial Audio Object Coding—The ISO/MPEG Standard for Efficient Coding of Interactive Audio Scenes

Authors: Herre, Jürgen; Purnhagen, Heiko; Koppens, Jeroen; Hellmuth, Oliver; Engdegård, Jonas; Hilper, Johannes; Villemoes, Lars; Terentiv, Leon; Falch, Cornelia; Hölzer, Andreas; Valero, María Luis; Resch, Barbara; Mundt, Harald; Oh, Hyen-O

In 2010 the ISO/MPEG Audio standardization group issued the Spatial Audio Object Coding (SAOC) specification, which defines technology for parametric low bit-rate coding of audio object signals with a mono or stereo downmix. This paper provides an overview of MPEG SAOC technology and discusses recent verification tests. The authors examine operation modes for typical application scenarios that take advantage of object-based processing. Most importantly, SAOC enables transmission of multi-object signals at data rates of the same order of magnitude as those used to represent two-channel audio. The main application scenarios are envisaged to be high-quality spatial teleconferencing, personal audio, interactive gaming, and rich media. Because the SAOC representation is independent of any particular loudspeaker setup, SAOC signals can be rendered efficiently on either a target loudspeaker configuration or a portable device.

Because the spectral envelope of a sound is a crucial aspect of timbre perception, the authors propose a quantitative model of spectral envelope perception using a set of orthogonal basis functions, analogous to the three primary colors in vision. The goal is to find a quantitative mapping between the physical description of the spectral envelope and its perception. This allows for a meaningful and reliable way of controlling timbre in sonification. This paper presents a quantitative metric to describe the multidimensionality of spectral envelope perception, i.e., the perception that is specifically related to the spectral element of timbre. Mel-frequency cepstral coefficients (MFCC) were chosen as a metric for spectral envelope perception because of their linearity, orthogonality, and multidimensionality. Quantitative data from two experiments illustrate the linear relationship between the subjective perception of spectrally varied synthetic sounds and the MFCC.

Acoustic Detection of Human Activities in Natural Environments

Authors: Ntalampiras, Stavros; Potamitis, Ilyas; Fakotakis, Nikos

Automatic recognition of sound events can be valuable for efficient analysis of audio scenes. For example, detecting human activities such as trespassing and hunting in natural environments can play an important role in their preservation by alerting authorities to take action. In the proposed system, each sound class is represented by a hidden Markov model created from descriptors in the time, frequency, and wavelet domains. The system can automatically adapt to the acoustic conditions of different scenes via a feedback loop that refines an unsupervised model. A reliable testing process was adopted for assessing the performance of the system under adverse conditions characterized by highly nonstationary environmental noise.

To augment the task of navigation and orientation of blind individuals, a new travel aid uses 3D scene sonification to present information about the environment using nonverbal audio. The model is composed of two classes of objects: obstacles and planes. The algorithm uses scene image segmentation, personalized spatial audio, musical tones, and sonar-like sound patterns. Individually measured head-related transfer functions were used to provide users with the illusion of sounds originating from the locations of sonified scene elements. Using a segmented and parametric description overcomes the sensory mismatch between visual and auditory perception. In a pilot study using both blind and sighted volunteers, subjects were able to utilize the prototype for spatial orientation and obstacle avoidance after a few minutes of training, attaining 90% accuracy in estimating the direction and depth of obstacles.

Because the human brain is often optimal for detecting subtle patterns, this paper explores a novel transformation that maps numerical data into sound. In this research, a set of data taken from head-related transfer functions was used to create physical objects (bells made from stainless steel) whose acoustics were then presented to listeners. The technique is called acoustic sonification. Listeners were able to hear differences in pitch and timbre of bells that were constructed from different datasets, while bells constructed from similar datasets sounded similar. Modulating the shape of a bell with a dataset can influence the acoustic spectrum in a way that results in audible differences even though there was no apparent visual difference. Acoustic sonification can take advantage of auditory pattern recognition.

Standards and Information Documents

AES Standards Committee News


Audio Bit Rates

Authors: Rumsey, Francis

[Feature] For many years now, low bit-rate coding has remained a hot topic in audio research and development. It would not be overstating the case to say that advances in this field, together with the increasing ubiquity of the Internet, were primarily responsible for the revolution in the digital music market. One might wonder, therefore, if there is anything left to do or say about the topic, yet it continues to result in innovations and enhanced standards, as well as new products and licensing opportunities.

[Feature] The interface between microphones and microphone inputs has special characteristics and requires special attention. The low output levels of microphones and the possible need for long cables have made it necessary to think about noise and interference of all kinds. A microphone input is also the electrical load for a microphone and can have an adverse influence on its performance. Condenser microphones contain active circuitry that requires some form of powering. With the introduction of transistorized circuitry in the 1960s, it became practical for this powering to be incorporated into microphone inputs. Various methods appeared in the beginning; 48-volt phantom powering is now dominant, but this standard method is still not always implemented correctly.

46th Conference Report, Denver

134th Call for Papers and Engineering Briefs, Rome

52nd Call for Contributions, Guildford


Products and Developments

Advertiser Internet Directory

Membership Information

Section Contacts Directory

AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
