Journal of the Audio Engineering Society

2012 January/February - Volume 60 Number 1/2


A Layer Model of Sound Quality

Authors: Blauert, Jens; Jekosch, Ute

Sound quality is a complex and multilayered phenomenon. When analyzing or modeling the formation process of sound-quality judgments, a variety of quality elements and quality features have to be taken into account, whereby the actual relevance and salience of each of them is situation dependent. In the following we present some ideas with the aim of structuring the quality-formation process into different layers according to the amount of abstraction involved. Depending on the amount of abstraction, different sets of references and evaluation and assessment methods have to be employed.

Evaluating the sound quality of a vehicle is a complex process. Physical and psychoacoustical measures, which capture only superficial cues, cannot sufficiently describe this process. Customers base their quality evaluations on their perceptions, interpretations, and expectations. This study generated a semantic space for vehicle sound. In other words, we elicited numerous attributes related to the perception and quality of vehicle sound. We sought to determine customers’ common language that appropriately describes vehicle sound quality. This study developed and applied a novel systematic approach, which includes a free verbalization interview, a test of participants’ understanding of acoustic attributes, and participant evaluation of the ability of these attributes to describe perceptible vehicle sound properties. In this manner we created a complete semantic database to describe vehicle sounds and tested the relevance and redundancy of these attributes. At the end of the investigation we developed two sets of 28 attributes for interior and exterior driving conditions.

Emoacoustics: A Study of the Psychoacoustical and Psychological Dimensions of Emotional Sound Design

Authors: Asutay, Erkin; Västfjäll, Daniel; Tajadura-Jiménez, Ana; Genell, Anders; Bergman, Penny; Kleiner, Mendel

Even though traditional psychoacoustics has provided indispensable knowledge about auditory perception, it has, in its narrow focus on signal characteristics, neglected listener and contextual characteristics. To demonstrate the influence of the meaning the listener attaches to a sound on the resulting sensations, we used Fourier-time-transform processing to reduce the identifiability of 18 environmental sounds. In a listening experiment, 20 subjects listened to and rated their sensations in response to, first, all the processed stimuli and then all the original stimuli, without being aware of the relationship between the two groups. Another 20 subjects rated only the processed stimuli, which were primed by their original counterparts. This manipulation was used to see how the resulting sensation differs when the subject can tell what the sound source is. In both tests subjects rated their emotional experience for each stimulus on the orthogonal dimensions of valence and arousal, as well as perceived annoyance and perceived loudness. They were also asked to identify the sound source. It was found that processing substantially reduced correct identification, while priming recovered most of the identification. While original stimuli induced a wide range of emotional experience, reactions to processed stimuli were emotionally neutral. The priming manipulation reversed the effects of processing to some extent. Moreover, even though the 5th percentile Zwicker loudness (N5) value of most of the stimuli was reduced after processing, neither perceived loudness nor auditory-induced emotion changed accordingly. This indicates the importance of considering factors other than the physical sound characteristics in sound design.

The perceived sound quality of small loudspeaker systems with and without digital optimization was empirically evaluated in a listening experiment. Further, it was investigated how the presentation order in the performed paired comparisons influenced the results, as well as whether a self-evaluation was of potential use for variance reduction. The systems were optimized by means of FIR filters. The two versions of each loudspeaker system were rated in a paired comparison test for music stimuli. For the purpose of analysis a linear Gaussian model was applied, resulting in an interval scale revealing interesting information about certainty and discrimination ability of the listeners. The test investigated whether linear pre-compensation of small and inexpensive loudspeaker systems results in a significant improvement of the perceived audio quality in a typical listening situation. The results indicated a significant preference for the optimized version, and a significant dependency on the presentation order was detected. The self-evaluation was found to be uncorrelated with the test results.

In our daily lives, we usually perceive an event via more than one sensory modality (e.g., vision, hearing, touch). Therefore, multimodal integration and interactions play an important role when we use objects and recognize events in our environment. A virtual environment (VE) is a computer simulation of a realistic-looking and interactive world. VEs should take into account the multisensory nature of humans and communicate with the user not only through vision but also through other modalities. In addition to vision, hearing and touch are the most commonly used communication channels. Recently, a variety of products with additional tactile input and output capabilities have been developed (e.g., Apple iPhone and other touch-screen devices, Nintendo Wii, etc.). Some of these devices provide new possibilities for interacting with a computer, including the auditory modality. Binaural synthesis and rendering are becoming key technologies for multimedia products. Virtual environments are no longer limited to academic research; they have commercial applications, particularly in the medical, gaming, and entertainment industries. Thus, the quality of VEs is becoming increasingly important. User interaction with a VE is a key issue in the perception of its quality. Several studies have discussed the quality of displays, input and output devices (for different modalities) as well as software and hardware issues; however, multimodal user interaction should also be examined. This paper focuses on the parameters that influence the quality of audio-tactile VEs.

Touch the Sound: Audio-Driven Tactile Feedback for Audio Mixing Applications

Authors: Merchel, Sebastian; Altinsoy, M. Ercan; Stamm, Maik

In this study experiments were conducted to determine whether a person could distinguish percussive audio loops with their fingertips using audio-driven tactile feedback. The audio signal was adapted to generate a vibration signal (tactile feedback), taking into account the limited capabilities of the tactile modality. A systematic approach to find the different adaptation parameters is discussed. The vibrations were created by an electrodynamic shaker mounted behind a touch-sensitive screen. Results indicate that percussive loops are best distinguished if the source features (e.g., frequency spectrum) and sequence features (e.g., rhythm) are maintained.

The headphone transfer function (HpTF) is a major source of spectral coloration observable in binaural synthesis. Filters for frequency response compensation can be derived from measured HpTFs. Therefore, we developed a method for measuring HpTFs reliably at the blocked ear canal. Subsequently, we compared non-individual dynamic binaural simulations based on recordings from a head and torso simulator (HATS) directly to reality, assessing the effect of non-individual, generic, and individual headphone compensation in listening tests. Additionally, we tested improvements of the regularization scheme of an LMS inversion algorithm, the effect of minimum phase inverse filters, and the reproduction of low frequencies by a subwoofer. Results suggest that when using non-individual binaural recordings, the HpTF of the individual used for the recordings (typically a HATS) should be used for headphone compensation.

This article investigates the influence of test duration on user fatigue and the reliability of user ratings in the context of subjective Quality-of-Experience (QoE) assessment. The goal is to provide empirically grounded guidance for the design of lab-based quality experiments, particularly as concerns the overall duration of test sessions. Since subjective user tests tend to be time-consuming and costly, aspects of user workload and fatigue are relevant as they relate to a fundamental challenge: how can test duration be maximized without compromising the quality of results by overly exhausting test participants? In order to address this challenge, we investigate the relationships between test duration, user fatigue, and rating behavior. Our analysis is grounded on measurements and observations made during three typical QoE lab studies with mixed audio, video, and web task profiles that assessed the impact of different network conditions on perceived quality. We measured participant workload and fatigue in two complementary ways: subjectively by means of a questionnaire and objectively by performing physiological measurements in terms of eye blink rates (EBR) as well as electrocardiograms (ECG). Our results show that even after 90 minutes of active testing, participants’ quality gradings were still reliable despite the presence of measurable signs of fatigue. Thus, for comparable QoE lab user experiments, we recommend staying within this limit in order to achieve a good balance between the quantity and the quality of results.

Standards and Information Documents

AES Standards Committee News


[Feature] At the 131st Convention held last October, archiving and preservation of audio material was a prominent topic in the program. A tutorial laid down the why, how, and what of audio preservation, followed by fascinating workshops on the practical exploitation of record company archives in the form of reissues of some of the greatest acts in the history of sound. The challenges of getting the best out of recorded material stored in warehouses and vaults were thoroughly examined, and we discovered that an ounce of prevention is probably more worthwhile than a pound of cure when it comes to treating the “patient” that is the valuable historical asset of a record company or radio station’s archive.

132nd Convention Preview, Budapest

Technology Trends in Audio Engineering

133rd Call for Papers, San Francisco


Products and Developments

Membership Information

Advertiser Internet Directory

Section Contacts Directory

AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
