Journal of the Audio Engineering Society

2016 June - Volume 64 Number 6


Over the last decade, there has been considerable debate over the benefits of recording and rendering high resolution audio beyond standard CD quality audio. This research involved a systematic review and meta-analysis (combining the results of numerous independent studies) to assess the ability of test subjects to perceive a difference between high resolution and standard (16 bit, 44.1 or 48 kHz) audio. Eighteen published experiments for which sufficient data could be obtained were included, providing a meta-analysis that combined over 400 participants in more than 12,500 trials. Results showed a small but statistically significant ability of test subjects to discriminate high resolution content, and this effect increased dramatically when test subjects received extensive training. This result was verified by a sensitivity analysis that explored different selections of studies and different analysis approaches. Potential biases in studies, effect of test methodology, experimental design, and choice of stimuli were also investigated. The overall conclusion is that the perceived fidelity of an audio recording and playback chain can be affected by operating beyond conventional resolution.
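
To illustrate what combining independent discrimination studies can look like, the following is a minimal sketch of inverse-variance (fixed-effect) pooling of each study's proportion correct relative to chance. It is not the analysis used in the paper, which the abstract does not specify; the study counts, the function name fixed_effect_pool, and the 0.5 chance level are assumptions made for this example.

```python
import math

# Hypothetical (correct, trials) counts for three discrimination studies.
# These numbers are invented for illustration and are not from the paper.
studies = [(55, 100), (130, 250), (270, 500)]

def fixed_effect_pool(studies, chance=0.5):
    """Inverse-variance (fixed-effect) pooling of 'proportion correct minus chance'."""
    weighted_sum = 0.0
    weight_total = 0.0
    for correct, trials in studies:
        p = correct / trials
        # Binomial variance of the observed proportion.
        # A study with p of exactly 0 or 1 would need a continuity correction.
        variance = p * (1 - p) / trials
        weight = 1.0 / variance
        weighted_sum += weight * (p - chance)
        weight_total += weight
    pooled = weighted_sum / weight_total
    std_err = math.sqrt(1.0 / weight_total)
    return pooled, std_err

pooled_effect, std_err = fixed_effect_pool(studies)
z_score = pooled_effect / std_err
# One-sided p-value from the standard normal distribution.
p_value = 0.5 * math.erfc(z_score / math.sqrt(2))
print(f"pooled effect above chance: {pooled_effect:.4f}, z = {z_score:.2f}, p = {p_value:.4f}")
```

Larger studies receive more weight because their proportions have smaller variance. A random-effects model, which also allows for differences between studies, would be a natural alternative when the studies use different methodologies.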

Categorization of Broadcast Audio Objects in Complex Auditory Scenes

Authors: Woodcock, James; Davies, William J.; Cox, Trevor J.; Melchior, Frank


Because object-based audio is becoming an important framework for the representation of complex sound scenes, this research describes a series of experiments to determine a categorization framework for broadcast audio objects. Categorization is a fundamental human strategy for reducing cognitive load, and knowledge of these categories should be beneficial for the development of perceptually based representations and rendering strategies for object-based audio. In this study, 21 expert and non-expert listeners took part in a free card sorting task using audio objects from a variety of different types of program material. Hierarchical agglomerative clustering suggests that there are 7 general categories, which relate to sounds indicating actions and movement, continuous background sound, transient background sound, clear speech, non-diegetic music and effects, sounds indicating the presence of people, and prominent attention-grabbing transient sounds. A three-dimensional perceptual space calculated via multidimensional scaling suggests that these categories vary along the dimensions of semantic content, continuous-transient, and presence-absence of people. The position of an audio object along the dimensions of the perceptual space relates to its perceived importance.
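
As a rough illustration of how card sorting data can feed hierarchical agglomerative clustering, the sketch below turns each participant's piles into a pairwise dissimilarity (the fraction of participants who placed two objects in different piles) and then merges clusters by average linkage. The object names, the three participants' sorts, and the choice of average linkage are assumptions for this example; the abstract does not state which linkage or distance measure the authors used.

```python
from itertools import combinations

# Hypothetical audio objects and card sorts (one list of piles per participant).
items = ["dialogue", "narration", "footsteps", "door slam", "rain", "traffic"]
sorts = [
    [{"dialogue", "narration"}, {"footsteps", "door slam"}, {"rain", "traffic"}],
    [{"dialogue", "narration"}, {"footsteps", "door slam", "rain"}, {"traffic"}],
    [{"dialogue", "narration", "footsteps"}, {"door slam"}, {"rain", "traffic"}],
]

def dissimilarity(sorts, items):
    """Fraction of participants who put each pair of items in different piles."""
    d = {}
    for a, b in combinations(items, 2):
        apart = sum(
            1 for piles in sorts
            if not any(a in pile and b in pile for pile in piles)
        )
        d[frozenset((a, b))] = apart / len(sorts)
    return d

def agglomerate(items, d, k):
    """Average-linkage agglomerative clustering, stopping at k clusters."""
    clusters = [frozenset([item]) for item in items]

    def linkage(c1, c2):
        # Mean pairwise dissimilarity between the members of two clusters.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

d = dissimilarity(sorts, items)
clusters = agglomerate(items, d, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

In practice the number of clusters is usually chosen by inspecting the full merge tree rather than fixed in advance, and the same dissimilarity matrix can serve as input to multidimensional scaling.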

Privacy-Aware Acoustic Assessments of Everyday Life

Authors: Bitzer, Joerg; Kissner, Sven; Holube, Inga


Hearing devices are common tools for enhancing people’s ability to interact with their acoustic environments. However, it is difficult to evaluate the benefit of those tools or to measure acoustically challenging situations in natural environments. This paper proposes a way to measure the most important features of everyday acoustic environments by extracting a limited set of features while not compromising the privacy of partners and bystanders. National laws on audio privacy differ considerably among countries. The authors propose using a smartphone as the source recorder but splitting the feature extraction into two phases: an initial feature processing in the smartphone and a later processing on a more powerful computer. For a given feature set, a statistical analysis shows comparable results from the extracted data when using either the original audio or the new privacy-aware extraction methods. A comparison shows that different scenarios result in separable features using the new extraction method.
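
The two-phase idea can be sketched as follows: a first stage, standing in for the smartphone, reduces each audio frame to a few coarse numbers and discards the waveform, and a second stage, standing in for the more powerful computer, derives summary statistics from those numbers alone. The specific features (frame RMS and zero-crossing rate), the frame length, and the function names phase1_on_device and phase2_offline are assumptions for illustration; the abstract does not list the features the authors extract.

```python
import math

def phase1_on_device(samples, frame_len):
    """Reduce audio to per-frame (rms, zero_crossing_rate); the waveform is not kept."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((rms, crossings / (frame_len - 1)))
    return features

def phase2_offline(features):
    """Summarise the stored features without access to the original audio."""
    levels_db = [20 * math.log10(max(rms, 1e-12)) for rms, _ in features]
    rates = [zcr for _, zcr in features]
    return {
        "mean_level_db": sum(levels_db) / len(levels_db),
        "mean_zcr": sum(rates) / len(rates),
    }

# Synthetic test signal: one second of a 1 kHz sine sampled at 8 kHz.
# The phase offset keeps samples away from exact zero.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 1000 * n / sample_rate + math.pi / 8)
    for n in range(sample_rate)
]

stored = phase1_on_device(signal, frame_len=800)
summary = phase2_offline(stored)
print(summary)
```

With frames this long, the stored values describe level and spectral balance over time but carry far too little information to reconstruct speech, which is the property a privacy-aware scheme depends on.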

Engineering reports

Adaptive Personal Tuning of Sound in Mobile Computers

Authors: Czyzewski, Andrzej; Ciarkowski, Andrzej; Kostek, Bozena; Kotus, Jozef; Lopatka, Kuba; Suchomski, Piotr


An integrated methodology for enhancing audio quality in mobile computers is presented, whose key features are adapting the acoustic track to changing acoustic conditions of the environment, and matching audio characteristics to the users’ individual preferences. Signal processing algorithms included linearizing the frequency response, enhancing dialogue intelligibility, and adjusting dynamics to the users’ hearing characteristics. Algorithms were tested on two different computers (an All-in-one and a laptop), both in quiet office-like conditions and in the presence of strong noise. In general, test results showed that audio processing methods were useful tools for improving the sound quality of compact computers. For example, although most of the listeners were untrained, the processing for speech clarity in noise (dialogue enhancement and dynamics processing) yielded the highest scores. The majority of the results indicated that listeners perceived the processing as desirable and useful.


Affective science increasingly concludes that the voice is a powerful tool for emotional communication. Creating a finished product by means of studio recording gives listeners the opportunity to experience the voice in ways quite unlike those of a traditional concert hall or live performance, and even more unlike day-to-day speech. The audio production chain, from sound capture using particular recording techniques to specific effects processing, affords the engineer (or the vocalist themself) unparalleled access to shape the recorded voice, and thereby to enhance its affective impact on the listener. This paper expands upon previous work presented at the 139th AES Convention in New York defining affective potential. It considers a number of examples where one of the points in the production chain has been exploited, either deliberately or by happenstance, to increase the affective impact of the voice. These examples suggest that the affective sciences might find the analysis of such applications of the recorded voice a fertile ground for future investigation of perceived affective correlates and their underlying musical or, more generally, acoustic cues.


[Feature] Audio forensic techniques rarely work like an episode from CSI. Professional members of the AES Technical Committee on Audio Forensics presented two tutorials on the topic at the 139th Convention, offering an excellent primer for those inside the field and out.
