Journal of the Audio Engineering Society

2016 December - Volume 64 Number 12


Modeling of Rocking Modes in Electroacoustic Transducers

Authors: Klippel, Wolfgang; Cardenas, William

Rocking motion of the radiator is a severe problem in headphones, micro-speakers, and other kinds of loudspeakers. This causes voice coil rubbing, which limits the maximum acoustical output at low frequencies. Causes of this problem are small imbalances in the distribution of the stiffness, mass, and magnetic field in the gap. Modal analysis and lumped parameter modeling are used in this paper to explain the generation of rocking modes. This theory is the basis for a new measurement technique using laser vibrometry to quantify the rocking behavior and to identify the dominant root cause in the design or manufacturing process. Rocking modes radiate less sound than the piston mode because the tilting of the radiator generates positive and negative contributions to the total volume velocity. A new model describes the generation of the fundamental mode and the first two rocking modes. This model requires only three state variables (displacement and two tilting angles) and a minimum number of lumped parameters to describe the excitation of the three modes. Rocking behavior becomes very critical in transducers that do not have a spider, such as headphones or micro-speakers.

Root Cause Analysis of Rocking Modes

Authors: Cardenas, William; Klippel, Wolfgang

Most micro-speakers, headphones, and some cone loudspeakers exhibit undesired rotational vibration patterns called rocking modes. These are caused by inhomogeneous distribution of mass, stiffness, and force factor shifting the center of gravity, stiffness, and electrodynamic excitation away from the pivot point, which is the cross point of the nodal lines of the two rocking modes. This paper focuses on practical measurements of rocking modes using laser vibrometry, parameter identification, and root cause analysis. New characteristics are presented that simplify the interpretation of the identified parameters. Due to the high quality factor of the rocking resonators, only a very small asymmetrical force is required (which is usually a few percent of the transversal force) to generate a critical rocking behavior having more energy than the desired piston mode. Assessing the relative rocking level and identifying the imbalances is a convenient way to keep voice coil rubbing under control and to avoid impulsive distortion impairing the quality of the reproduced sound. A new technique was validated by numerical simulations and systematic modifications of a real transducer. The diagnostic value of the new measurement technique is illustrated on a transducer used in headphones.

Impulse Response Measurements using MLS Technique on Nonsynchronous Devices

Authors: Novak, Antonin; Rund, Frantisek; Honzik, Petr

Maximum-length sequences (MLS) are widely used to measure the impulse responses of linear time-invariant systems in acoustics and audio. A commonly cited drawback of the MLS technique is that the devices used to generate and acquire the MLS signals must be synchronized. This study shows that the MLS technique can easily be improved and applied to devices that are not synchronous or that operate at different sampling frequencies. To show the efficiency of the proposed modification to the MLS technique, the authors present several experiments, for example a measurement with devices working at 44.1 kHz on the generation side and 96 kHz on the acquisition side, and a measurement of the frequency response function of an inexpensive mobile phone in which synchronous clocking is not possible. The modifications to the classical MLS method are easy to implement and do not require excessive computational cost.
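
For orientation, the classical synchronous MLS measurement that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not the authors' nonsynchronous modification, which the abstract does not detail. The function names, the 10-bit sequence length, the feedback lag, and the four-tap toy impulse response are all assumptions made for the example.

```python
# Classical (synchronous) MLS measurement sketch, standard library only.
# All names and parameter values here are illustrative assumptions.

def mls(nbits=10, lag=3):
    """Return a +/-1 maximum-length sequence of length 2**nbits - 1.

    Uses the recurrence a[n] = a[n-lag] XOR a[n-nbits]; the default
    (nbits=10, lag=3) corresponds to the primitive polynomial x^10 + x^7 + 1.
    """
    n = (1 << nbits) - 1
    bits = [1] * nbits  # any nonzero seed works
    for i in range(nbits, n):
        bits.append(bits[i - lag] ^ bits[i - nbits])
    return [1.0 - 2.0 * b for b in bits]  # map 0 -> +1, 1 -> -1


def circular_convolve(x, h):
    """Periodic response of an LTI system h to the periodic excitation x."""
    n = len(x)
    return [sum(h[m] * x[(k - m) % n] for m in range(len(h))) for k in range(n)]


def recover_impulse_response(y, s):
    """Estimate h by circular cross-correlation of the response y with the MLS s.

    Because the MLS autocorrelation is N at lag 0 and -1 elsewhere, the
    estimate equals h[k] minus a small constant offset sum(h) / (N + 1).
    """
    n = len(s)
    return [sum(y[i] * s[(i - k) % n] for i in range(n)) / (n + 1) for k in range(n)]


s = mls()
h_true = [1.0, 0.5, -0.25, 0.125]  # toy system, assumed for the demo
y = circular_convolve(s, h_true)   # what a synchronized recorder would capture
h_est = recover_impulse_response(y, s)
print([round(v, 3) for v in h_est[:6]])
```

The cross-correlation step assumes that playback and recording share one sample clock, so that one period of the response lines up sample for sample with the excitation. That is the synchronization requirement the paper sets out to relax.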


Previous research has shown that musical instruments have distinctive emotional characteristics and that these characteristics can be significantly changed with reverberation. This research examines if the changes in character are relatively uniform or dependent on the instrument. A comparison of eight sustained instrument tones with different amounts and lengths of simple parametric reverberation over eight emotional characteristics was performed. The results showed a remarkable consistency in listener rankings of the instruments for each of the different types of reverberation with strong correlations ranging from 90 to 95%. This indicates that the underlying instrument space for emotional characteristics does not change significantly with reverberation. Each instrument has a particular footprint of emotional characteristics. Tested instruments cluster into two fairly distinctive groups: those where the positive energetic emotional characteristics are strong (e.g., oboe, trumpet, violin), and those where the low-arousal characteristics are strong (e.g., bassoon, clarinet, lute, horn). The saxophone was an outlier, and is somewhat strong for most emotional characteristics.

Two subjective experiments were conducted to examine a new vertical image-rendering method called Perceptual Band Allocation (PBA), using octave bands of pink noise presented from main and height loudspeaker pairs. The PBA attempts to control the perceived degree of vertical image spread (VIS) by a flexible mapping between frequency band and loudspeaker layer based on the desired positioning of the band in the vertical plane. The first experiment measured the perceived vertical location of the phantom image of octave-band stimuli for the main and height loudspeaker layers individually. Results showed significant differences among the frequency bands in perceived image location. Based on the localization data from this experiment, six different PBA stimuli were created in such a way that each frequency band was mapped to either the main or height loudspeaker layer depending on the target degree of VIS. The second experiment conducted a listening test to grade the perceived magnitudes of VIS for the six stimuli. The results indicated that PBA could significantly increase the perceived magnitude of VIS compared to that of a sound presented only from the main layer. It was also found that the different PBA schemes produced various degrees of perceived VIS with statistically significant differences.

Physical and Perceptual Comparison of Real and Focused Sound Sources in a Concert Hall

Authors: Garí, Sebastià V. Amengual; Pätynen, Jukka; Lokki, Tapio

Concert hall acoustics have traditionally been evaluated by means of room acoustic measurements and perceptual studies, which require an available concert hall, orchestra, and audience. This article presents a physical and perceptual comparison of room acoustics between arrays of real loudspeakers and virtual loudspeakers implemented with Wave Field Synthesis in a concert hall. The physical comparison comprises time-frequency and spatiotemporal analyses of the spatial room impulse responses measured through the two reproduction methods. Perceptual comparisons are based on formal listening tests performed both in situ and in laboratory conditions, using anechoic classical music recordings as excitation signals. The results indicate that Wave Field Synthesis yields a slower build-up and a spatially more distributed direct sound than real loudspeakers. Perceptually, Wave Field Synthesis presents a brighter sound and a wider and more enveloping spatial sound image, and was preferred in the studied concert hall.

Using nonindividualized HRTFs in virtual audio synthesis produces front-back confusions, up-down reversals, in-head localization, and timbral coloration. Elevation and frontal localization are found to be most affected. In contrast, obtaining individualized HRTFs is a tedious process that involves complex acoustical measurements for each individual. Having a model of HRTF that does not involve tedious acoustical measurements would make the process much easier. In this research, individualization of the median plane HRTFs is explored using frontal projection headphones with a spherical head model because the frontal positioning of the headphone transducer inherently captures the idiosyncratic frontal spectral cues. To create the HRTFs, the important peaks (P1) and notches (N1, N2) are extracted first from the frontal headphone response and then shifted in frequency in accordance with the elevation angle. Detailed subjective experiments indicated that subjects were able to localize the virtual sound sources accurately with modeled HRTFs with results similar to individualized HRTFs.

Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification

Authors: Vrysis, Lazaros; Tsipas, Nikolaos; Dimoulas, Charalampos; Papanikolaou, George

The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with user-enabled data labeling that is intended to detect, segment, and semantically annotate the relevant audio events. This work focuses on a generic audio detection and classification method that combines hierarchical bimodal segmentation with hybrid pattern classification at different temporal resolutions. This paper presents the algorithmic perspective of a mobile back-end system to facilitate the construction, validation, and continuous update of generic audio ground-truth data. The goal is the implementation of a system that is capable of performing well in different conditions without relying on complicated pattern recognition systems and taxonomies. For this reason, minimal prior knowledge is necessary so that there is consistent behavior for different input signals and computational environments. Novel “classification confidence” metrics are implemented.


[Feature] The workflows, tools, and renderers for virtual reality production are rapidly evolving. There are many different paths to a similar end, and yet little in the way of agreement about how best to pass content through the chain in the most convenient and seamless way. Despite that, the level of enthusiasm and creativity in the field is considerable, with a lot of work going on to address these issues. Sophisticated commercial solutions are emerging that aim to handle the connections between creative control of sound scenes and their technical implementation.

141st Convention Report, Los Angeles

141st Convention Exhibitors and Sponsors

Call for Awards Nominations

Call for Nominations for Board of Governors

3rd International Conference on Sound Reinforcement – Open Air Venues, Call for Contributions, Struer

2017 International Conference on Automotive Audio, Call for Contributions, San Francisco

141st Convention Papers Abstracts, Los Angeles

Index to Volume 64


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
