Journal of the Audio Engineering Society

2006 March - Volume 54 Number 3


Continuous- and discrete-time switching audio power amplifiers are studied both with and without feedback. Pulse-width modulation (PWM) and sigma–delta modulation (SDM) amplifier configurations are simulated and their interrelationship is described using linear phase modulation (LPM) and linear frequency modulation (LFM). Distortion generation encountered when applying negative feedback to PWM is demonstrated and strategies to improve linearity are presented. Recent innovations in SDM coding and output-stage topologies using pulse-shaping techniques are reviewed with emphasis on stable, low-distortion operation, especially under high-level signal excitation. A simplified low-latency variant of predictive SDM with step back is introduced, which together with dynamic compression of the state variables extends stable operation to a modulation depth of unity, thus allowing SDM to compete with PWM power amplifiers in terms of peak signal capability.
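For readers unfamiliar with the SDM coding mentioned above, the basic idea can be sketched with a first-order, 1-bit loop. This is only an illustrative minimal sketch, not the predictive SDM with step back or the state-variable compression described in the abstract; the function name `sigma_delta_1bit` and the choice of a first-order loop are assumptions made for the example.

```python
def sigma_delta_1bit(samples):
    """First-order 1-bit sigma-delta modulator (illustrative sketch only).

    `samples` is an iterable of input values, assumed to lie in [-1, 1].
    Returns a list of +1.0 / -1.0 output values whose local average
    tracks the input.
    """
    integrator = 0.0  # accumulated error between input and fed-back output
    feedback = 0.0    # previous quantizer output (0.0 before the first sample)
    bits = []
    for x in samples:
        integrator += x - feedback                    # integrate the error
        out = 1.0 if integrator >= 0.0 else -1.0      # 1-bit quantizer
        bits.append(out)
        feedback = out
    return bits


# Usage: a constant input of 0.5 yields a pulse stream averaging about 0.5.
stream = sigma_delta_1bit([0.5] * 1000)
average = sum(stream) / len(stream)
```

The input level relative to the quantizer levels (here 0.5 against outputs of plus or minus 1) corresponds to the modulation depth discussed in the abstract. Higher-order loops are where stability at high modulation depth becomes the concern the paper addresses.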

The correspondence of various spectral difference error metrics to human discrimination data was investigated. Time-varying harmonic amplitude data were obtained from the spectral analysis of eight musical instrument sounds (bassoon, clarinet, flute, horn, oboe, saxophone, trumpet, and violin). Sounds were resynthesized with various levels of random spectral alteration, ranging from 1 to 50%. Listeners were asked to discriminate the randomly altered sounds from reference sounds resynthesized from the original data. Then several formulas designed to predict discrimination performance were evaluated by calculating the correspondence between the discrimination data and the associated spectral difference measurements. Averaged over the eight instruments, the best correspondence was achieved using a spectral error metric based on linear harmonic amplitude differences normalized by rms amplitude and raised to a power a. While an optimum correspondence of 91% was achieved for a = 0.64, good correspondence occurred over a wide range of a. For linear harmonic amplitudes without rms normalization, good correspondence occurred within a narrower range, with a maximum correspondence of 88%. Correspondence was approximately 80% for decibel-amplitude differences over an even narrower range. Other error metrics such as those based on critical-band grouping of components worked well but did not give any improvement over the method based on harmonic amplitudes, and in some cases yielded worse results. Spectral differences using a small number of representative frames emphasizing attack and decay transients yielded results slightly better than using all frames.
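The abstract does not give the metric's exact formula, so the following is only one plausible reading of "linear harmonic amplitude differences normalized by rms amplitude and raised to a power a". The function name `relative_spectral_error`, the per-frame normalization, and the averaging over frames are all assumptions made for illustration, not the authors' definition.

```python
import math


def relative_spectral_error(ref_frames, alt_frames, a=0.64):
    """Hypothetical rms-normalized spectral error (illustrative sketch only).

    ref_frames, alt_frames: equal-length lists of frames, each frame a list
    of linear harmonic amplitudes. For every frame, the absolute amplitude
    differences are raised to the power `a`, summed, and divided by the
    reference frame's rms amplitude raised to the same power. The result is
    the average over all non-silent frames.
    """
    frame_errors = []
    for ref, alt in zip(ref_frames, alt_frames):
        rms = math.sqrt(sum(r * r for r in ref) / len(ref))
        if rms == 0.0:
            continue  # skip silent reference frames to avoid dividing by zero
        diff = sum(abs(r - s) ** a for r, s in zip(ref, alt))
        frame_errors.append(diff / rms ** a)
    if not frame_errors:
        return 0.0
    return sum(frame_errors) / len(frame_errors)


# Usage: two frames of two harmonics each, with the first frame altered.
reference = [[1.0, 0.5], [0.8, 0.4]]
altered = [[1.1, 0.45], [0.8, 0.4]]
error = relative_spectral_error(reference, altered)
```

Under this reading, normalizing by the rms amplitude makes the error independent of overall playback level: scaling both the reference and the altered amplitudes by the same factor leaves the result unchanged.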

An advanced numerical model of a pressure condenser microphone capsule is presented. The acoustic space is divided into internal and external domains, with both domains dynamically coupled to the condenser diaphragm motion. The external acoustic domain is modeled using the boundary-element (BE) method, which allows the capsule surface to take an arbitrary geometry. The internal acoustic domain (both the viscous air film and the back chamber) is modeled as coupled cylindrical cavities with negligible axial pressure variation. The diaphragm is modeled as a circular tensioned membrane with negligible bending stiffness. Flow through the back plate is modeled by annular arrays of circular pores with generalized functions locating each pore position. Although the presented model is specialized for a simple pressure condenser microphone, the numerical implementation is sufficiently generic to allow for a large variation in capsule parameters. The complete model is used to generate a simulated response curve, which is compared to a response curve taken from an experimental prototype. The results show excellent agreement throughout the measured frequency range, indicating that this new coupled model may be used for advanced microphone characterization and design.

Bit-Rate Scalable Intraframe Sinusoidal Audio Coding Based on Rate-Distortion Optimization

Authors: Heusdens, Richard; Jensen, Jesper; Kleijn, W. Bastiaan; Kot, Valery; Niamut, Omar A.; Van De Par, Steven; Van Schijndel, Micholle H.

A coding methodology that aims at rate-distortion optimal sinusoid + noise coding of audio and speech signals is presented. The coder divides the input signal into variable-length time segments and distributes sinusoidal components over the segments such that the resulting distortion (as measured by a perceptual distortion measure) is minimized subject to a prespecified rate constraint. The coder is bit-rate scalable. For a given target bit budget it automatically adapts the segmentation and distribution of sinusoids in a rate-distortion optimal manner. The coder uses frequency-differential coding techniques in order to exploit intrasegment correlations for efficient quantization and encoding of the sinusoidal model parameters. This technique makes the coder more robust toward packet losses when used in a lossy-packet channel environment as compared to time-differential coding techniques, which are commonly used in audio or speech coders. In a subjective listening experiment the present coder showed similar or better performance than a set of four MPEG-4 coders operating at bit rates of 16, 24, 32, and 48 kbit/s, each of which was state of the art for the given target bit rate.

[Feature Article] The challenge of stereophonic sound recording and reproduction has always been one of how to deliver a convincing sound stage, localizable source images, and a satisfactory sense of space from as small a number of loudspeakers as possible. All sorts of psychoacoustic tricks are employed in this process, because it is not possible to render an acoustically accurate soundfield over a wide range of listening positions with a small number of loudspeakers. Aesthetic judgments, often disregarded in the quest for scientific correctness, also have a large part to play in the choice of an appropriate solution. Practical stereophonic recording approaches have employed various compromises between the different requirements, and this remains true with modern 5.1 techniques. It is still difficult, for example, to come up with a single microphone array that will capture all of the cues required to deliver accurate phantom imaging of individual sources at the same time as satisfyingly spacious reverberation. Most often this is overcome in practice by using hybrid techniques that involve panned spot microphones in addition to or instead of a main front array, as well as either artificial reverberation or some sort of spaced array to pick up decorrelated natural reverberation. Notwithstanding this there are still attempts to devise “purist” main microphone arrays that are claimed to do it all.

Standards and Information Documents

AES Standards Committee News


12th Tokyo Regional Convention Report



120th Convention Preview, Paris

   Exhibit Previews

Stereophonic Recording Techniques: Old Challenges, New Approaches

29th Conference, Seoul, Call for Papers

121st Convention, San Francisco, Call for Papers


News of the Sections

Upcoming Meetings

Available Literature

Membership Information

Advertiser Internet Directory

Sections Contacts Directory

AES Conventions and Conferences


Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
