Journal of the Audio Engineering Society

2006 September - Volume 54 Number 9


Relatively little is known about complex auditory events caused by multiple simultaneous sources. In order to gain insight into this topic, the perception of wide-band (200–1179-Hz) noise and click train stimuli was examined with subjective tests focusing on perceived spatial distribution. By reproducing different frequency bands of the stimuli from loudspeakers at different azimuth directions, the spatial content of the overall stimulus was varied in 15 test cases. The subjects were required to indicate those loudspeakers that they perceived as radiating sound in each case. The results suggest that the highest and lowest frequencies of the stimuli were more perceptually significant than the middle frequency region. The test cases were never perceived as being more than half the actual width of the source ensemble. The order of the critical-band signals in the loudspeaker setup had a minor effect on the overall width. When a click train stimulus was used instead of continuous noise, the perceived width was reduced significantly. Cross-correlation-based auditory modeling techniques were also examined for their ability to predict the subjective results and were found to be not entirely suitable for the purpose.

The identification of relevant auditory attributes is pivotal in sound quality evaluation. Two fundamentally different psychometric methods were employed to uncover perceptually relevant auditory features of multichannel reproduced sound. In the first method, called repertory grid technique (RGT), subjects were asked to assign verbal labels directly to the features when encountering them, and to subsequently rate the sounds on the scales thus obtained. The second method, perceptual structure analysis (PSA), required the subjects to consistently use the perceptually relevant features in triadic comparisons, without having to assign them a verbal label; given sufficient consistency, a lattice representation—as frequently used in formal concept analysis (FCA)—can be derived to depict the structure of auditory features.

Two processing approaches that enable perceptually compelling modification of audio signals via accentuation or suppression of transients are described. The first algorithm uses a frequency-domain analysis and a soft-decision paradigm to characterize transients in the signal; the transient characterization is then used to drive a nonlinear frequency-domain modification. In the second algorithm the modulation spectrum of the audio signal is manipulated by modifying the time trajectories of spectral envelopes in different frequency bands; scaling of higher modulation frequencies with shelving filters is used to modify rapidly changing signal events, thus altering transient components without the need for explicit detection. The algorithms are described in detail and it is demonstrated that they can achieve substantial modification of a signal’s perceptual attributes without introducing significant artifacts.

Dithering and noise power modulation in low- and high-order oversampled sigma–delta modulators are investigated. Previous publications have presented theoretical analyses of quantizer distortion and noise power modulation in undithered and dithered LPCM quantizers and low-order sigma–delta modulators. However, simulations on practical implementations tend to document only distortion and idle tones and pay no attention to noise power modulation. Consequently there has been some dissension on the requirements for dithering of higher order sigma–delta modulators. Functional simulations of individual error moments in register transfer level models of dithered and undithered sigma–delta modulators are discussed, including first-order and realistic high-order examples. The fundamental difference between 1-bit and multibit quantization is addressed, and two modern techniques for improving 1-bit performance (dynamic dithering and trellis noise shaping) are investigated. Baseband noise power modulation performance is emphasized and shown explicitly for each example since the baseband, although containing only a small portion of the total noise power, is the practical region of interest for oversampled devices. This discussion should provide a pragmatic context in the debate on dither requirements and how to analyze and achieve good noise power modulation performance.

[Feature Article] Two recent AES convention workshops have shown ways in which new audio technology is being integrated with the next generation of communications systems. At the AES 119th Convention in New York last year Gerhard Schuller led a panel of experts who discussed issues related to coding and networking. At the AES 120th Convention in Paris in May, Jyri Huopaniemi opened up the exciting topic of 3-D audio for mobile communications. In this article we present some highlights of these two events.

Standards and Information Documents

AES Standards Committee News


28th Conference Report, Piteå, Sweden

Next Generation of Audio Communications

Education News

122nd Convention, Vienna, Call for Papers

31st Conference, London, Call for Papers


Membership Information

Advertiser Internet Directory

Sections Contacts Directory

AES Conventions and Conferences

News of the Sections

Upcoming Meetings

Available Literature

New Products and Developments


Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
