Journal of the Audio Engineering Society

2018 April - Volume 66 Number 4


This report explores the sound and interaction design aspects of a system for augmenting acoustic drums using electromagnetic actuation of a resonant membrane driven with a continuous audio signal. This system offers a novel application of acoustic synthesis methods to an existing instrument and it scales to offer extended capability when multiple instruments are used in a configurable network. The use of bidirectional OSC communication opens a wide range of possibilities for both performer and audience interaction that have not yet been explored. Integration with smart instruments, WiFi-enabled sensors distributed to performers, and audience-controlled mobile apps offers a large range of possible mappings from gesture to synthesis and processing parameters. Electromagnetic (EM) actuation and wireless connectivity enable a network of augmented drums to function in traditionally percussive roles, as well as in harmonic, melodic, and textural roles.
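The bidirectional OSC communication mentioned above follows the standard OSC 1.0 wire format: a null-padded address pattern, a type tag string, and big-endian arguments. As a minimal sketch (the address `/drum/1/gain` and the gain parameter are illustrative assumptions, not names taken from the system), a message can be packed with only the standard library:

```python
import struct

def osc_string(s: str) -> bytes:
    # OSC strings are null-terminated and padded to a 4-byte boundary
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address: str, *args: float) -> bytes:
    # Type tag string starts with ',' followed by one 'f' per float argument
    tags = "," + "f" * len(args)
    payload = b"".join(struct.pack(">f", a) for a in args)
    return osc_string(address) + osc_string(tags) + payload

# The resulting bytes can be sent to an augmented drum over UDP, e.g. with
# socket.sendto(osc_message("/drum/1/gain", 0.5), (host, port))
```

In practice a library such as liblo or python-osc would handle this encoding; the sketch only illustrates why OSC is cheap enough to run over WiFi to many networked instruments at once.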

People with complex disabilities (conditions that affect both cognitive and motor abilities) can use technology to assist them in music performance. In such “facilitated performance,” musicians are supported by musical experts and other facilitators. This report explores facilitated performances as a design space where multilevel social interactions exist around the technology. Results suggest that including facilitators in the design of Digital Musical Instruments (DMIs) could improve accessibility for users with complex disabilities. During this project a gesture-based technology probe was deployed to explore the potential of embodied interactions with digital instruments for this user group. Outcomes show the social relationship between performer and facilitator to be paramount to success and, as such, highlight Participatory Design as a strong design methodology for facilitated performance. Facilitators can be considered gatekeepers to musical activity for performers with complex disabilities, not only because they possess a wealth of knowledge about music performance and the technologies involved, but also because they are best equipped to communicate this knowledge to the performer. As there is limited research on this practice, designers and developers of DMIs wishing to optimize their products for this setting should consider participatory design methods, engaging facilitators in their product testing and development.

Co-design of a Smart Cajón

Authors: Turchet, Luca; McPherson, Andrew; Barthet, Mathieu

Smart Instruments are a family of musical instruments that embed sensors, actuators, wireless connectivity, and semantic audio technologies. This report describes one such example, a Smart Cajón: a box-shaped percussion instrument augmented with Internet of Musical Things components. Co-design sessions were conducted with five professional cajón players. The players were invited to devise tangible mock-ups by placing the provided sensors on an acoustic cajón and to express desirable use cases and interactions. A prototype was developed on the basis of the designs produced by the participants, who also took part in an evaluation session. Results showed that all participants personalized the integration of the new gestures afforded by the sensors into their normal playing technique, generating different ways of expressing themselves. These novel pathways for expression are not possible with commercially available cajones, which lack the involved sensors, gesture-to-sound mappings, and wireless connectivity to external equipment such as smartphones. Participants adapted to the instrument, coping with its limitations and exploiting it in creative ways; even flaws of the instrument were used creatively, for example by deliberately triggering low-frequency sounds in the top position, where they should not be present.
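A gesture-to-sound mapping of the kind described typically begins with onset detection on a sensor signal. As a minimal sketch (threshold and refractory values are illustrative assumptions, not parameters from the Smart Cajón), percussive hits can be extracted like this:

```python
def detect_hits(samples, threshold=0.3, refractory=100):
    """Naive onset detection for a percussive sensor signal.

    Returns the sample indices where the absolute amplitude first crosses
    `threshold`, suppressing re-triggers for `refractory` samples so that
    one physical strike produces one event.
    """
    hits, last = [], -refractory
    for i, x in enumerate(samples):
        if abs(x) >= threshold and i - last >= refractory:
            hits.append(i)
            last = i
    return hits
```

Each detected hit would then be mapped to a sound trigger or forwarded wirelessly, with the peak amplitude serving as a velocity value.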

The Rough Mile: a Design Template for Locative Audio Experiences

Authors: Hazzard, Adrian; Spence, Jocelyn; Greenhalgh, Chris; McGrath, Sean

The rapid development of mobile devices, networks, and sensors over the past 20 years has expanded the range of listening experiences beyond homes, cars, workplaces, cinemas, and concert halls. Locative or mobile listening experiences come in many forms. Individuals can listen to playlists on their smartphones, where donning headphones masks the ambient environmental sound and replaces it with a user-curated soundtrack. In addition, artists have seized upon mobile technologies to explore new opportunities and settings for creating and presenting sound art, with the location serving as a stage for presentation. Sensor-driven technologies enable contextually guided media experiences that are responsive and personalized. In this paper the authors chart the design, composition, and authoring of “The Rough Mile,” a dynamic locative audio walk in two parts that combines spoken word, original and found music, user-generated content, and ambient environmental sound. The design of the locative walks, set in city center streets, deliberately sought novel mechanisms for creating thematic and structural relationships between the audio treatments and attributes of the built environment. The article reflects upon this distinct design approach and the challenges that emerged from designing locative walking experiences.
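Locative audio walks of this kind are commonly driven by circular GPS trigger zones tied to points in the built environment. As a minimal sketch (not the authoring mechanism used in "The Rough Mile"), a zone test using the haversine great-circle distance looks like this:

```python
import math

def within_zone(lat, lon, zone_lat, zone_lon, radius_m):
    """True if a listener's GPS fix falls inside a circular trigger zone.

    Uses the haversine great-circle distance; coordinates are in decimal
    degrees, the zone radius in meters.
    """
    r_earth = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat), math.radians(zone_lat)
    dphi = math.radians(zone_lat - lat)
    dlmb = math.radians(zone_lon - lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r_earth * math.asin(math.sqrt(a)) <= radius_m

# An authoring tool would attach an audio cue to each zone and start
# playback when within_zone(...) first becomes True for the listener.
```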

PaperClip: A Digital Pen Interface for Semantic Speech Editing in Radio Production

Authors: Baume, Chris; Plumbley, Mark D.; Frohlich, David; Calic, Janko


The radio production workflow typically involves recording material, selecting which parts of that material to use, and then editing the desired material down to the final output. Some radio producers find this process easier with paper rather than editing directly on a screen, which makes a transcript the common denominator. However, after deciding which audio they want to use, producers then must use a digital audio workstation to manually execute those editorial decisions, which is a tedious and slow process. In this paper, the authors describe the design, development, and evaluation of PaperClip, a novel system for editing speech recordings directly on a printed transcript using a digital pen. A user study with eight professional radio producers compared editing with the digital pen to editing with a screen interface. The two interfaces each had advantages and disadvantages. The pen interface was better for fast and simple editing of familiar audio when accurate transcripts were available. The screen interface was better for more complex editing with less familiar audio and less accurate transcripts. There was no overall preference.
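Semantic speech editing of the kind PaperClip performs rests on aligning each transcript word with its timestamps, so that striking words out on paper can be translated into audio edits. As a minimal sketch (the data layout and the `max_gap` parameter are illustrative assumptions, not PaperClip's internals), word-level deletions can be converted into keep-regions like this:

```python
def keep_segments(words, deleted, max_gap=0.2):
    """Convert word-level deletions on a transcript into audio keep-regions.

    `words` is a list of (word, start_sec, end_sec) from a time-aligned
    transcript; `deleted` holds the transcript positions struck out by the
    producer. Kept words separated by less than `max_gap` seconds are merged
    into one region, preserving natural inter-word pauses.
    """
    segments = []
    for i, (_, start, end) in enumerate(words):
        if i in deleted:
            continue
        if segments and start - segments[-1][1] <= max_gap:
            segments[-1][1] = end  # extend the previous contiguous region
        else:
            segments.append([start, end])
    return [tuple(s) for s in segments]
```

The resulting (start, end) regions are what a digital audio workstation would render, replacing the manual cut-and-splice step the abstract describes as tedious.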

Turn-taking and Online Chatting in Remote and Co-located Collaborative Music Live Coding

Authors: Xambó, Anna; Roma, Gerard; Shah, Pratik; Tsuchiya, Takahiko; Freeman, Jason; Magerko, Brian

This paper examines co-located and remote turn-taking and online chatting in collaborative music live coding (CMLC) using the web-based computer science education platform EarSketch. Duo and trio live coding are considered from an autoethnographic stance, complemented by an online survey of six practitioners in live coding and collaboration. Turn-taking in duo and trio live coding was found to be more promising in an educational context than in performance, and turn-taking and online chatting in CMLC among small groups of two, three, or four people are expected to be useful in the classroom for pedagogical purposes. A chat window is an important tool for supporting communication in CMLC, but the proposed semantic hashtags should be reconsidered as a tailorable vocabulary adapted to the needs of each group, perhaps linked to a notification system that facilitates collaboration. Across the four use cases spanning trio/duo and co-located/remote situations, co-located trio live coding mediated by a turn-taking mechanism proved the most interesting for group dynamics, because the roles of one driver and two navigators can specialize and adapt easily during musical improvisation while combining verbal and nonverbal communication.

Assessing Musical Similarity for Computational Music Creativity

Authors: Goddard, Callum; Barthet, Mathieu; Wiggins, Geraint


Automation of musically creative tasks generally requires inclusion of elements of syntactic and/or semantic information related to the specific task being automated. Such information is rational and meaningful and relates to both the task and its context. When this information is based upon subjective judgments, such as musical similarity, its suitability to the task may be unknown and thus needs validation. This paper outlines the design of a computationally creative musical performance system aimed at producing virtuosic interpretations of musical pieces. The case-based reasoning part of the system relies on a measure of musical similarity using the FANTASTIC and SynPy toolkits, which provide melodic and syncopated rhythmic features, respectively. A listening test based on pair-wise comparisons was conducted to assess the extent to which the machine-based similarity models match human perception. The machine-based models were found to differ significantly from human responses, partly owing to variability among participants’ responses. The best-performing model relied on features from the FANTASTIC toolkit, obtaining a 63% rank match rate with human responses, while features from the SynPy toolkit obtained only a 46% rank match rate. These results suggest that features from the FANTASTIC toolkit can serve as a measure of similarity in creative systems: systems that, from a computationally creative perspective, both display creative behavior and are capable of reflection, that is, the ability of an agent (here, a computational system) to evaluate or reason about its own creative output and, in light of this evaluation, adapt or alter its behavior.
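One plausible reading of a pair-wise rank match rate is the fraction of stimulus pairs that the model and the listener order the same way; the paper's exact definition may differ, so the sketch below is illustrative only:

```python
from itertools import combinations

def rank_match_rate(model_scores, human_scores):
    """Fraction of stimulus pairs ordered the same way by model and listener.

    Assumes a higher score means greater similarity; tied pairs count as
    mismatches. Both score lists index the same set of stimuli.
    """
    pairs = list(combinations(range(len(model_scores)), 2))
    agree = sum(
        (model_scores[i] - model_scores[j]) * (human_scores[i] - human_scores[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)
```

Under this definition a rate of 63% means the model agrees with listeners on roughly five of every eight pair-wise orderings, against a 50% chance baseline.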

The Perception of Vocal Traits in Synthesized Voices: Age, Gender, and Human Likeness

Authors: Baird, Alice; Jørgensen, Stina Hasse; Parada-Cabaleiro, Emilia; Cummins, Nicholas; Hantke, Simone; Schuller, Björn

As computer-generated voice synthesis has become a significant part of communication between computers and people, there is a need to understand the role of paralinguistic attributes of the voice, such as age, personality, and gender. In many cases the synthesized voice is produced by concatenating segments of recorded human speech, which can be experienced as a lifeless voice lacking free expression and fluidity. Technology companies have been developing their own unique synthesized voice identities without paying attention to the stereotypical traits being heard. This study evaluated the responses of 18 listeners who were asked to judge the paralinguistic traits of age, gender, and human likeness for 13 voices in IBM’s Watson corpus. The results were similar to those of a previous study: no voice achieved complete human likeness, no voice was perceived within a single age band, and none was tied solidly to its given binary gender.

Engineering Reports

As a result of increasing computational performance and dramatically lower cost, embedded systems have become suitable for digital signal processing of audio signals. This paper describes a novel multichannel, low-latency, Linux-based audio system that is appropriate for real-time processing. Two different driver architectures are described: a common Linux architecture using ALSA and a unique real-time driver architecture (the Bela platform). The development of the ALSA driver architecture covers device drivers that use the ASoC layer, sound server settings, device tree overlays and capes, register maps, and real-time patches to the kernel. The adaptation of the multichannel sound card drivers for Bela focuses on a hard real-time program for data transmission, multichannel buffer alignment, and audio codec control. The overall system has been evaluated with respect to sound quality and latency to gauge its usefulness as a powerful new platform for audio development projects, such as embedded digital effect processors for musicians or augmented and participatory performances.
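The latency gap between the two architectures comes largely from buffering: a period-based driver such as ALSA's must fill its device buffer before audio reaches the converters. A back-of-the-envelope estimate (the period sizes below are illustrative figures, not measurements from the paper) can be computed as:

```python
def buffering_latency_ms(frames_per_period: int, periods: int, sample_rate_hz: int) -> float:
    """One-way buffering latency of a period-based (ALSA-style) audio driver.

    The device buffer holds `periods` periods of `frames_per_period` frames,
    where each frame is one sample per channel at the given sample rate.
    """
    return 1000.0 * frames_per_period * periods / sample_rate_hz
```

For example, a common ALSA configuration of 2 periods of 256 frames at 48 kHz buffers about 10.7 ms per direction, whereas a hard real-time driver running 2 periods of 16 frames buffers about 0.67 ms, before any converter or processing delay is added.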

Standards and Information Documents

AES Standards Committee News


[Feature] There is growing interest in human response beyond the basic quality ratings or attribute scales used in standard listening tests. This interest is motivated by a broader concern with listener experience, which may require examining indirect responses, behavior, or the development of a listener's reaction to sound over a long period of time. When it comes to VR, there is increasing evidence that the evaluation methods and attributes developed for static surround sound systems may be less than adequate.

Conference on Immersive and Interactive Audio, Call for Contributions, York

Preview of Conference on Music-Induced Hearing Disorders, Chicago

Preview of Conference Audio Archiving, Preservation, and Restoration, Culpeper


AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
