Journal of the Audio Engineering Society

2004 July/August - Volume 52 Number 7/8


Predicting the Perceived Quality of Nonlinearly Distorted Music and Speech Signals

Authors: Tan, Chin-Tuan; Moore, Brian C. J.; Zacharov, Nick; Mattila, Ville-Veikko

In a previous study, perceptual experiments were reported in which subjects had to rate the perceived quality of speech and music that had been subjected to various forms of nonlinear distortion. The subjective ratings were compared to a physical measure of distortion, DS, based on the output spectrum of each nonlinear system in response to a 10-component multitone test signal with logarithmically spaced components. The values of DS were highly negatively correlated with the subjective ratings for stimuli that had been subjected to "artificial" distortions such as peak clipping and zero clipping. However, for stimuli that had been subjected to nonlinear distortion produced by real transducers, the correlation between the DS values and the subjective ratings was only moderately negative. A new method predicts the perceived quality of nonlinearly distorted signals based on the outputs of an array of gammatone filters in response to the original signal and the distorted signal. For each filter, the cross correlation is calculated between the outputs in response to the original and the distorted signals for a series of brief samples (frames). The maximum value of the cross correlation for each filter for each frame is determined, and the maximum values are summed across filters, with a weighting that depends on the magnitude of the output of each filter in response to the distorted signal. The resultant weighted cross correlation gives a perceptually relevant measure of distortion called Rnonlin, which can be used to predict subjective ratings. The predicted ratings correlated highly with the subjective ratings obtained previously, and these correlations were greater than those obtained using the DS measure. A new perceptual experiment, using a mixture of artificial and real distortions, confirmed the validity of the new measure.
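The processing chain described in this abstract (gammatone filtering, frame-wise cross correlation, a maximum per filter and frame, and a level-weighted sum across filters) can be illustrated with a rough sketch. This is not the authors' Rnonlin implementation: the FIR gammatone approximation, the 30 ms frame length, the 1 ms lag range, the normalisation of the weights, and all function names below are assumptions chosen only for illustration, and the mapping from the measure to predicted ratings is not reproduced.

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula).
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    # Truncated gammatone impulse response, normalised to unit energy.
    # duration and order are illustrative assumptions.
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def max_norm_xcorr(x, y, max_lag):
    # Largest normalised cross correlation within +/- max_lag samples.
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if denom == 0.0:
        return 0.0
    full = np.correlate(x, y, mode="full")
    zero_lag = len(x) - 1
    return float(np.max(full[zero_lag - max_lag:zero_lag + max_lag + 1]) / denom)

def weighted_xcorr_score(original, distorted, fs, centre_freqs,
                         frame_len=0.03, max_lag_s=0.001):
    # Hypothetical helper: frame length and lag range are assumptions.
    n = int(frame_len * fs)
    max_lag = int(max_lag_s * fs)
    irs = [gammatone_ir(fc, fs) for fc in centre_freqs]
    outs_o = [np.convolve(original, ir)[:len(original)] for ir in irs]
    outs_d = [np.convolve(distorted, ir)[:len(distorted)] for ir in irs]
    scores = []
    for k in range(len(original) // n):
        sl = slice(k * n, (k + 1) * n)
        peaks = np.array([max_norm_xcorr(o[sl], d[sl], max_lag)
                          for o, d in zip(outs_o, outs_d)])
        # Weight each filter by the RMS of its output for the distorted signal.
        levels = np.array([np.sqrt(np.mean(d[sl] ** 2)) for d in outs_d])
        if levels.sum() == 0.0:
            continue
        scores.append(np.sum(peaks * levels / levels.sum()))
    # Average the per-frame weighted sums (an assumption of this sketch).
    return float(np.mean(scores)) if scores else 0.0
```

With these assumptions, an undistorted signal compared with itself scores approximately 1, and a heavily peak-clipped copy scores lower, which is the qualitative behaviour the abstract describes.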

A new concept for multichannel loudspeakers is introduced for application in wave field synthesis (WFS). It is a multi-actuator panel (MAP), which consists of a damped acoustic radiation panel with a number of exciters that are used to generate the WFS wave field. It is first shown that distributed-mode loudspeaker (DML) technology can be applied successfully to WFS. There are, however, some drawbacks to applying standard DML panels in this application, which led to the development of the MAPs. It is shown from theory and confirmed by measurements that these panels can be designed in such a way that they are ideally suited for WFS sound reproduction.

Hierarchical Automatic Audio Signal Classification

Authors: Burred, Juan José; Lerch, Alexander

The design, implementation, and evaluation of a system for automatic audio signal classification are presented. The signals are classified according to audio type, differentiating between three speech classes, 13 musical genres, and background noise. A large number of audio features are evaluated for their suitability in such a classification task, including MPEG-7 descriptors and several new features. The selection of the features is carried out systematically with regard to their robustness to noise and bandwidth changes, as well as to their ability to distinguish a given set of audio types. Direct and hierarchical approaches for the feature selection and for the classification are evaluated and compared.

A Frequency-Domain Approach to Multichannel Upmix

Authors: Avendano, Carlos; Jot, Jean-Marc

A series of upmixing techniques for generating multichannel audio from stereo recordings is proposed. The techniques use a common analysis framework based on a comparison between the short-time Fourier transforms of the left and right stereo signals. An interchannel coherence measure is used to identify time-frequency regions consisting mostly of ambience components, which can then be weighted via a nonlinear mapping function and extracted to synthesize ambience signals. A similarity measure is used to identify the panning coefficients of the various sources in the mix in the time-frequency plane. Different heuristic mapping functions are then applied to unmix (extract) one or more sources, and perceptually based functions are applied to repan the signals into an arbitrary number of channels. We illustrate the application of the various techniques in the design of a two-to-five channel upmix system.
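As a rough illustration of the interchannel coherence analysis mentioned in this abstract, the sketch below estimates a magnitude coherence per time-frequency bin from the short-time Fourier transforms of the two channels. It is not the authors' implementation: the Hann window, the FFT and hop sizes, the recursive smoothing with a forgetting factor, and the function names are assumptions made for illustration, and the nonlinear mapping that turns coherence into ambience weights is not reproduced.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    # Minimal Hann-windowed short-time Fourier transform (frames x bins).
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def interchannel_coherence(left, right, n_fft=1024, hop=256, forget=0.2):
    # Hypothetical helper: auto- and cross-spectra are smoothed recursively
    # over time; the forgetting factor value is an assumption.
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    n_bins = L.shape[1]
    p_ll = np.zeros(n_bins)
    p_rr = np.zeros(n_bins)
    p_lr = np.zeros(n_bins, dtype=complex)
    coh = np.zeros(L.shape)
    eps = 1e-12  # guards against division by zero in silent bins
    for m in range(L.shape[0]):
        p_ll = (1 - forget) * p_ll + forget * np.abs(L[m]) ** 2
        p_rr = (1 - forget) * p_rr + forget * np.abs(R[m]) ** 2
        p_lr = (1 - forget) * p_lr + forget * L[m] * np.conj(R[m])
        coh[m] = np.abs(p_lr) / np.sqrt(p_ll * p_rr + eps)
    return coh
```

Under these assumptions, bins where the two channels carry the same source give values near 1, while bins dominated by uncorrelated content give lower values once the smoothing has settled; low-coherence regions are the candidates for ambience extraction.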

Engineering Reports

A Frequency-Domain Approach to Multichannel Upmix

Standards and Information Documents

AES Standards Committee News


116th Convention Report, Berlin



117th Convention Preview, San Francisco


     Exhibit Previews

Education News


News of the Sections

Available Literature

Upcoming Meetings

Membership Information

Advertiser Internet Directory

Sections Contacts Directory

AES Conventions and Conferences


Cover & Sustaining Members List

VIP List & Editorial Staff

AES 116th Convention Papers and CD-ROM
