Journal of the Audio Engineering Society

2024 January/February - Volume 72 Number 1/2


The Role of Communication and Reference Songs in the Mixing Process: Insights From Professional Mix Engineers

Authors: Vanka, Soumya Sai; Safi, Maryam; Rolland, Jean-Baptiste; Fazekas, György


Effective music mixing requires technical and creative finesse, but clear communication with the client is crucial. The mixing engineer must grasp the client's expectations and preferences and collaborate to achieve the desired sound. The tacit agreement for the desired sound of the mix is established using guides like reference songs and demo mixes exchanged between the artist and the engineer. This paper presents the findings of a two-phased exploratory study aimed at understanding how professional mixing engineers interact with clients and use their feedback to guide the mixing process. For phase one, semistructured interviews were conducted with five mixing engineers with the aim of gathering insights about their communication strategies, creative processes, and decision-making criteria. Based on the inferences from these interviews, an online questionnaire was designed and administered to a larger group of 22 mixing engineers during the second phase. The results shed light on the importance of collaboration and intention in the mixing process and can inform the development of smart multitrack mixing systems. By highlighting the significance of these findings, this paper contributes to the research on the collaborative nature of music production and provides actionable recommendations for the design and implementation of innovative mixing tools.

Factors Affecting Sound Quality in Acoustically Transparent Hearing Devices

Authors: Ohlmann, Kristin; Kollmeier, Birger; Denk, Florian

Sound quality in hearing devices could be improved by providing acoustic transparency, i.e., electronically creating a listening impression similar to that of the open ear. This can be achieved by equalizing the hearing device output to conserve the transfer function of the individual open ear as closely as possible. The achievable accuracy is limited by the unavailability of individual transfer functions, processing delays, and leakage of external sounds into the ear canal. In the present work, the influence of these limitations on perceived sound quality was assessed. Acoustic scenes as heard through a hearing device were simulated using individually measured transfer functions with an in-ear device and presented through headphones. The sound quality was assessed using a MUSHRA-like framework with normal-hearing subjects and a sound quality model. Equalization to the diffuse-field response of the open ear is shown to be close to optimum in most daily-life situations. Although the benefit of incorporating the individual open-ear response is evident, it is limited; using knowledge of the individual driver responses and leakage improves the perceived sound quality, especially with a vented fit. With appropriate equalization, the influence of the fit and processing delays is of less importance. Sound quality models allow a reasonable prediction of perceived sound quality.

Modern telepresence systems incorporating spatial audio can be realized using multichannel loudspeaker reproduction in combination with close-up microphones attached to the sound sources. With the microphones being tracked in space by external sensors, this setup provides an excellent basis for creating interactive virtual acoustic environments. However, the induced acoustical echo loop has to be handled by a suitable acoustic echo cancellation (AEC) system. A popular state-of-the-art adaptive filter with desirable properties for AEC in multichannel systems is the frequency-domain adaptive Kalman filter (FDKF). Combined with previously proposed enhancements, it shows good performance for minor or abrupt echo path changes but has shortcomings with massive and continuous echo path changes, as caused by moving microphones. This article proposes a velocity-controlled FDKF (VC-FDKF) exploiting the knowledge of the microphone motion for a twofold velocity-dependent contribution to the update step-size. The method has been evaluated in simulations with nonsynthetic recorded measurement data considering different trajectories, velocity profiles, signal types, and loudspeaker setups. Common existing approaches, such as the shadow filtering technique, are outperformed by the proposed VC-FDKF in the reported experiments. Furthermore, two extensions of the proposed technique, namely, a position-dependent gain-and-delay compensation and alternative velocity definitions, are briefly studied.

Directivity measurements characterize the angular dependence of source-radiated fields, often through discrete measurements made over a spherical surface. Despite the AES56-2008 (r2019) dual-equiangular standard's ubiquity for directivity applications, no well-known spherical quadrature rule directly applies to its sampling scheme. However, this work shows how Clenshaw-Curtis-type Chebyshev quadrature rules adapt efficiently to equiangular spherical integration. Numerical experiments compare the reliability of Chebyshev, Chebyshev-Lobatto, and Chebyshev-Radau quadrature rules for sampled pressure fields. The results show that significant aliasing effects do not occur until nearly twice the previously assumed limit. They also highlight the benefits of the AES approach of equivalent polar and azimuthal angle sampling intervals.

Information Extraction and Noisy Feature Pruning for Mandarin Speech Recognition

Authors: Gao, Guozhi; Duan, Zhikui; Yang, Guangguang; Li, Shiren; Yu, Xinmei; Zhao, Xiaomeng; Ruan, Jinbiao

The Transformer network has two drawbacks in Automatic Speech Recognition (ASR) tasks. One is that it focuses mainly on global features and neglects other useful features, such as local features. The other is that it is not robust to noisy audio signals. To improve model performance in ASR tasks, useful information extraction and noise removal are therefore the main concerns. First, an information extraction (IE) module is proposed to extract local context information from the integration of previous layers, which contain both low-level and high-level information. Moreover, a noisy feature pruning (NFP) module is proposed to ease the negative effect caused by noisy audio. Finally, a network called EPT-Net is proposed that integrates the IE module, the NFP module, and the Transformer network. Empirical evaluations have been conducted mainly on two widely used Chinese Mandarin datasets, Aishell-1 and HKUST. Experimental results validate the effectiveness of EPT-Net, which achieves character error rates (CERs) of 5.3%/5.6% on the Aishell-1 dev/test sets and 21.9% on the HKUST dev set.

It is known from published papers that the coaxial cable and antenna system of an FM stereo station often reduce stereo channel separation, deteriorating the quality of the emitted sound. This problem arises from impedance mismatch errors caused by the characteristics of the radiating dipoles. In this work, a mathematical analysis of the problem is performed using SPICE simulation to identify its origins. Based on the results of this analysis, a solution built on the design of a monitored audio processor is presented. This technology achieves high channel separation using the existing antenna and adds the capability to remotely measure, adjust, and certify the quality of the emitted sound.

Engineering reports

A Digital Model for the Prologue Voltage Control Filter

Authors: Lazzarini, Victor; Timoney, Joseph

The Prologue polyphonic synthesizer features a voltage control filter (VCF) employing an original topology. This paper examines its circuit and extracts a flowchart, which yields a set of analog filter equations. From these, the authors were able to develop a transfer function to describe its ideal linear time-invariant form. A linear digital version of the filter was then put forward to validate the analysis. Noting that the filter operation within the synthesizer is fairly nonlinear, a number of modifications yielding a nonlinear model of the VCF are proposed. The outputs of these models were compared with those of the original analog filter and were found to approximate them consistently.

Standards and Information Documents

AES Standards Committee News


AES New Officers



Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
