Journal of the Audio Engineering Society

2018 June - Volume 66 Number 6


Qualitative Evaluation of Media Device Orchestration for Immersive Spatial Audio Reproduction

Authors: Francombe, Jon; Woodcock, James; Hughes, Richard J.; Mason, Russell; Franck, Andreas; Pike, Chris; Brookes, Tim; Davies, William J.; Jackson, Philip J. B.; Cox, Trevor J.; Fazi, Filippo M.; Hilton, Adrian


The challenge of installing and setting up dedicated spatial audio systems can make it difficult to deliver immersive listening experiences to the general public. However, the proliferation of smart mobile devices and the rise of the Internet of Things mean that there are increasing numbers of connected devices capable of producing audio in the home. “Media device orchestration” (MDO) is the concept of utilizing an ad hoc set of devices to deliver or augment a media experience. In this paper, the concept is evaluated by implementing MDO for augmented spatial audio reproduction using object-based audio with semantic metadata. A system that augmented a stereo pair of loudspeakers with an ad hoc array of connected devices is described. The MDO approach aims to optimize aspects of the listening experience that are closely related to listener preference rather than attempting to recreate sound fields as devised during production. A thematic analysis of positive and negative listener comments about the system revealed three main categories of responses: perceptual, technical, and content-dependent aspects. MDO performed particularly well in terms of immersion/envelopment, but the quality of listening experience was partly dependent on loudspeaker quality and listener position.

With the widespread use of smartphones that have multiple sensors and sound processing capabilities, there is great potential for increased audience participation in music performances. This paper proposes a framework for participatory mobile music based on mapping arbitrary accelerometer gestures to sound synthesizers. The authors describe Handwaving, a system based on neural networks for real-time gesture recognition and sonification in mobile browsers. Results on a multiuser dataset show that training with data from multiple users improves classification accuracy, supporting the use of the proposed algorithm for user-independent gesture recognition. This illustrates the relevance of user-independent training in multiuser settings, especially in participatory music. Because the system is implemented using web standards, it is simple and quick to deploy on audience devices in live performance settings.

Augmenting a MIDI Keyboard Using Virtual Interfaces

Authors: Desnoyers-Stewart, John; Gerhard, David; Smith, Megan L.

With the ongoing development of virtual reality (VR) systems such as the HTC Vive and Oculus Rift, there is a need for new interfaces that maximize immersion in VR. Mixed reality (MR) lies on a continuum between a purely real environment and a purely virtual one. This paper presents a collection of virtual interfaces, successfully developed and implemented for an MR MIDI keyboard, that augment a MIDI keyboard synchronized in physical and virtual space. Some interfaces exploit the tactility offered by the keyboard’s surface, while others rely on the improved presence the keyboard provides. The interfaces are evaluated with respect to learnability and utility to identify their successes and failures. With few exceptions, they offer functional and immersive control in a VR environment. Some mimic ordinary real interfaces without the need for fixed sensors, offering significantly improved flexibility and functionality while taking advantage of the keyboard’s tactility. Others build on this immersion to take advantage of the virtual environment.


When playing the piano, pedaling is an important technique for expressive performance. It comprises not only the onset and offset information that composers often indicate in the score but also gestures related to the performer’s musical interpretation. This research examines pedaling gestures and techniques on the sustain pedal from the perspectives of measurement, recognition, and visualization. Pedaling gestures are captured by a dedicated measurement system that records sensor data simultaneously with the piano sound under normal playing conditions. Recognition comprises two separate tasks on the sensor data: pedal onset/offset detection and classification by technique. The onset and offset times of each pedaling technique were computed using signal processing algorithms. Features were then extracted from every segment in which the pedal is pressed, and machine-learning methods were used to classify the segments by pedaling technique; cross-validation showed high accuracy. The recognition results can be represented using novel pedaling notations and visualized in an audio-based score-following application.
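The abstract does not state which signal processing algorithms are used for onset/offset detection. Purely as an illustration, the sketch below assumes a pedal-position sensor signal normalized to the range 0 to 1 and uses simple hysteresis thresholding, a swapped-in technique that is not necessarily the authors’ method. The function names, threshold values, and the three per-segment features are hypothetical; they only show how pressed-pedal segments could be found and summarized before classification.

```python
from typing import Dict, List, Tuple


def detect_pedal_segments(signal: List[float],
                          on_threshold: float = 0.2,
                          off_threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (onset, offset) sample indices of pressed-pedal segments.

    Hypothetical hysteresis detector: a press starts when the normalized
    sensor value rises above on_threshold and ends at the first sample that
    falls below off_threshold (offset is exclusive). Using two thresholds
    suppresses chatter from small fluctuations around a single threshold.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    segments: List[Tuple[int, int]] = []
    onset = None
    for i, value in enumerate(signal):
        if onset is None and value > on_threshold:
            onset = i  # pedal press begins
        elif onset is not None and value < off_threshold:
            segments.append((onset, i))  # pedal released
            onset = None
    if onset is not None:
        # Pedal still pressed at the end of the recording.
        segments.append((onset, len(signal)))
    return segments


def segment_features(signal: List[float], segment: Tuple[int, int],
                     sample_rate: float) -> Dict[str, float]:
    """Compute illustrative (hypothetical) features for one pressed segment."""
    onset, offset = segment
    values = signal[onset:offset]
    return {
        "duration_s": (offset - onset) / sample_rate,
        "max_depth": max(values),
        "mean_depth": sum(values) / len(values),
    }


if __name__ == "__main__":
    sig = [0.0, 0.05, 0.3, 0.9, 0.85, 0.15, 0.05, 0.0, 0.25, 0.5, 0.4, 0.0]
    for seg in detect_pedal_segments(sig):
        print(seg, segment_features(sig, seg, sample_rate=100.0))
```

In the example signal, the value 0.15 at index 5 lies between the two thresholds, so the first press is not split; this is the behavior hysteresis is meant to provide. A feature dictionary like this one could then be passed to any classifier, which is the step the abstract attributes to machine-learning methods.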

Speech Emotion Recognition for Performance Interaction

Authors: Vryzas, Nikolaos; Kotsakis, Rigas; Liatsou, Aikaterini; Dimoulas, Charalampos A.; Kalliris, George

This research explores the relevance of machine-driven Speech Emotion Recognition (SER) as a way to augment theatrical performances and interactions, such as controlling stage color/lighting, stimulating active audience engagement, and interactive actor training. It is well known that the meaning of a speech utterance arises from more than its linguistic content; emotional affect can change meaning dramatically. As the basis for classification experiments, the authors developed the Acted Emotional Speech Dynamic Database (AESDD), which contains spoken utterances from 5 actors expressing 5 emotions. Several audio features and various classification techniques were implemented and evaluated on this database, and the results were compared with those obtained on the Surrey Audio-Visual Expressed Emotion (SAVEE) database. The trained classifier was integrated into a novel application that performs live SER, fitting the needs of actor training.

Ethnography has long been used in a variety of settings to articulate and understand the everyday worlds of work and leisure. This paper explores the use of autoethnography as a method for soundscape design in the fields of personal heritage and locative media. Specifically, the authors explore possible connections between digital media, space, and “meaning making,” suggesting how autoethnographies might help discover design opportunities for merging digital media and places. These methods are more personally relevant than those typically associated with system-based design approaches, which are often less sensitive to the way that emotion, relationships, memory, and meaning come together. As digital technologies become increasingly ubiquitous, new possibilities allow people to self-design experiences that can be social, located, or mobile, spanning modalities and times. The paper suggests that tangible interactive technologies might contribute to community-based (or intersubjective) narratives and foster participatory sense-making around such merging of place with media. As physical space and digital media become ever more intertwined, together forming and augmenting meaning and experience, methods are needed to explore how physical places and intangible personal content can be used to develop meaningful experiences.

“Touch the Sound” is a technological tool designed specifically to train children in experimenting with sound. It is based on interaction with physical, tangible elements (passive objects that children can arrange and move within a two-axis space). A tablet-based app uses computer-vision procedures to recognize each object, its position, and its movement in order to play back audio files and modify different sound parameters in real time. The technological approach and design, adopted in line with the philosophy and objectives of the Touch the Sound project, have proved effective despite the drawbacks and technical problems encountered during development. Programming with native Android OS tools enabled the design of a system that achieves optimal performance on devices with limited hardware resources. The system contributes to children’s media literacy, making them aware of the narrative possibilities of sound and teaching them to control and modify its parameters.

Visual programming provides a way to construct and communicate concepts to a computer by manipulating graphical objects and using symbols, spatial arrangements, and visual expressivity instead of text. In the context of music creation, it offers an intuitive yet comprehensive approach. To avoid textual programming, some composers, performers, and multimedia artists employ visual languages to support their creative processes. This research explores contextual aspects of the discovery, learning, and use of different visual languages for music. The authors surveyed 218 participants and quantitatively analyzed relations between relevant dimensions. Interpretation of the analyzed data yielded guidelines for educators, visual language developers, and end users. Educators can use this research to improve how they transfer knowledge and mentor their students. Developers are provided with empirical evidence, gathered through rigorous quantitative research, that can indicate the existence of certain phenomena related to users of their tools. End users can engage in continuous and unstructured exploration and experimentation.


[Feature] When loudspeakers are a commodity product, manufactured in large numbers at the lowest possible price, finding ways of ensuring optimum performance can be challenging. Various acoustical and signal processing methods can be used to extract the optimum performance out of them, and testing regimes need to be sensitive to the real implications of performance variations.

Preview of Spatial Reproduction Conference, Tokyo

Preview of Audio for Virtual and Augmented Reality Conference, Redmond

AES Conventions and Conferences

Section News

Book Reviews

Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff