Journal of the Audio Engineering Society

2015 December - Volume 63 Number 12


Wave Field Synthesis (WFS) of a moving sound source is of great importance when reproducing dynamic sound scenes. This research derives time-domain analytical WFS driving functions for the synthesis of uniformly moving acoustic point sources. The results can be regarded as a general WFS solution for both moving and stationary point sources, which can be used directly in practical applications. The derivation adapts the traditional stationary phase approximation in the mixed temporal-frequency domain to the dynamic description of moving point sources, yielding driving functions for harmonic sources optimized on a reference line. The authors prove that, by applying traditional approximations, the resulting driving functions formally coincide with the driving functions for stationary sources once the originally static distances are replaced by dynamic distances. The validity of the analytical results is demonstrated via numerical simulation examples, including a practically applicable configuration: multiple linear Secondary Source Distribution (SSD) elements in place of the theoretically infinite SSD.
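As an illustrative aside (not the paper's derivation), the "dynamic distance" for a uniformly moving point source can be computed in closed form from the retarded-time condition; the sketch below, with hypothetical function and variable names, shows the standard closed-form result for a source moving along the x-axis at Mach number M = v/c:

```python
import math

def dynamic_distances(x, y, z, t, v, c=343.0):
    """Propagation distance R = c*(t - t_e) and "Doppler distance" Delta
    for a point source moving uniformly along the x-axis at speed v,
    passing the origin at t = 0.  Solves the retarded-time condition
    |r - r_s(t_e)| = c*(t - t_e) in closed form (requires |v| < c)."""
    M = v / c                      # Mach number
    xp = x - v * t                 # x relative to the present source position
    rho2 = y * y + z * z           # squared distance off the source trajectory
    delta = math.sqrt(xp * xp + (1.0 - M * M) * rho2)
    R = (M * xp + delta) / (1.0 - M * M)
    return R, delta
```

For v = 0 both distances reduce to the static Euclidean distance, which is the sense in which moving-source driving functions can reduce to the familiar stationary ones.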

Previous research has shown that reverberation influences the perception of clarity, spaciousness, and other aspects of music, but the degree to which it influences the emotional experience of musical instrument sounds is still not known. The authors conducted a listening test to compare the effect of reverberation on the emotional characteristics of eight instrument sounds representing the wind and bowed string families. The subjects compared paired stimuli for eight emotional categories: Happy, Sad, Heroic, Scary, Comic, Shy, Romantic, and Mysterious. For simple parametric reverberation, the results showed the following: a significant effect on Mysterious and Romantic for the back of a large hall; a medium effect on Sad, Scary, and Heroic for the back of a large hall; a mild effect on Happy for the front of a small hall; relatively little effect on Shy; and the opposite effect on Comic, with listeners judging anechoic sounds most Comic. These results give audio engineers and musicians an interesting perspective on simple parametric artificial reverberation. The motivation for this research was to understand how emotional characteristics vary with reverberation length and amount in simple parametric reverberation, which are equivalent to the hall size and the listener's distance from the front.

The paper presents a novel approach to Virtual Bass Synthesis (VBS) on mobile devices, called Smart VBS (SVBS). The proposed algorithm uses an intelligent, rule-based setting of bass synthesis parameters adjusted to the particular music genre. Harmonic generation is based on a nonlinear device (NLD) method, with the intelligent control system adapting to the recognized music genre. To automatically classify music genres, the k-Nearest Neighbor classifier combined with the Principal Component Analysis (PCA) method is employed. To fine-tune the SVBS algorithm, the MUSHRA test is performed. Subjects are presented with music excerpts belonging to various genres, unprocessed as well as processed by SVBS and by a conventional bass boost algorithm. Listening tests show that subjects in most cases prefer the authors' SVBS strategy over both the conventional bass boost algorithm and the unprocessed audio file. Furthermore, the listeners indicated that perception of the SVBS-processed music excerpts is similar across several types of portable devices.
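As a hedged illustration of the NLD principle only (not the authors' rule-based, genre-adaptive system), a half-wave rectifier applied to a low-frequency tone generates the harmonics that a virtual-bass effect relies on; the function names below are illustrative:

```python
import math

def nld_halfwave(x):
    """Nonlinear device: half-wave rectification, one common NLD choice."""
    return [max(s, 0.0) for s in x]

def tone_amplitude(x, f, fs):
    """Amplitude of the component at frequency f via a single-bin DFT."""
    n = len(x)
    re = sum(s * math.cos(2 * math.pi * f * k / fs) for k, s in enumerate(x))
    im = sum(s * math.sin(2 * math.pi * f * k / fs) for k, s in enumerate(x))
    return 2.0 * math.hypot(re, im) / n

fs, f0 = 8000, 50                                             # sample rate, bass fundamental (Hz)
x = [math.sin(2 * math.pi * f0 * k / fs) for k in range(fs)]  # 1-second tone
y = nld_halfwave(x)
# The rectified tone now carries energy at 2*f0, 4*f0, ..., which a small
# loudspeaker can reproduce even when f0 itself lies below its cutoff.
```

A full VBS chain would band-limit the input before the NLD and filter and mix the generated harmonics back in; this sketch shows only the harmonic-generation step.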

The authors propose a new speech enhancement approach based on the application of the wavelet packet transform with an optimal decomposition. The approach uses principal component analysis (PCA) and an improved version of robust PCA obtained by imposing nonnegative factorization on the low-rank matrix. In order to detect noisy time–frequency zones, a post-processing technique based on subspace decomposition was implemented. A number of simulations were then used to evaluate performance under various types of noise. Standard objective measures, as well as subjective evaluations, show that this approach outperforms comparable speech enhancement methods for noise-corrupted speech at low signal-to-noise ratios. The technique introduces the least distortion into the enhanced speech, although it suppresses less noise than on-line semi-supervised Nonnegative Matrix Factorization.
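As a minimal sketch of the PCA building block only (the paper's robust, nonnegativity-constrained variant is considerably more involved), the leading principal component of a set of feature vectors can be estimated by power iteration on the sample covariance matrix; all names below are illustrative:

```python
import math
import random

def principal_component(data, iters=200):
    """Leading principal component of `data` (a list of equal-length
    feature vectors), estimated by power iteration on the sample
    covariance matrix.  Returns a unit vector."""
    dim, n = len(data[0]), len(data)
    mean = [sum(row[j] for row in data) / n for j in range(dim)]
    centred = [[row[j] - mean[j] for j in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
           for i in range(dim)]
    v = [random.random() + 0.1 for _ in range(dim)]   # random start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Projecting the data onto the top few such components and discarding the rest is the basic subspace-denoising step that the robust variants refine.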

Real-time simulation of physical models of musical instruments has applications in a variety of situations where a proposed physical change needs to be auralized instantly. The compactness and computational power of Field Programmable Gate Arrays make it possible to implement Finite Difference methods in the simulation. These methods are based on a discrete representation, in both the spatial and time domains, of the partial differential equations that describe the physical behavior of the instrument. However, unlike large offline computer simulations, the real-time requirement necessitates special ways of representing the simulation. The authors illustrate this approach using a string-excitation model of a North American five-string banjo, which includes five strings, a membrane, and an air cavity. Three examples show how real-time models can be used by musicians in a live-music setting, by researchers exploring instrument acoustics, and by instrument builders making design decisions.
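A hedged, minimal illustration of the finite-difference idea for the ideal 1-D string (the banjo model couples strings to a membrane and air cavity and is far richer): at Courant number λ = 1 the standard leapfrog scheme is exact on the grid, so a plucked shape with fixed ends recurs after one fundamental period. Names below are illustrative:

```python
def fd_string_step(u, u_prev, lam2=1.0):
    """One leapfrog update of the 1-D wave equation with fixed ends.
    u, u_prev: displacement at the current and previous time step;
    lam2 is the squared Courant number (c*dt/dx)**2."""
    n = len(u)
    u_next = [0.0] * n                  # endpoints stay clamped at zero
    for i in range(1, n - 1):
        u_next[i] = (2.0 * u[i] - u_prev[i]
                     + lam2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]))
    return u_next

def simulate(u0, steps, lam2=1.0):
    """Advance the scheme from a plucked shape u0 with zero initial velocity."""
    n = len(u0)
    u_prev = u0[:]
    u = [0.0] * n                       # special first step (zero velocity)
    for i in range(1, n - 1):
        u[i] = u0[i] + 0.5 * lam2 * (u0[i + 1] - 2.0 * u0[i] + u0[i - 1])
    for _ in range(steps - 1):
        u, u_prev = fd_string_step(u, u_prev, lam2), u
    return u
```

An FPGA implementation maps the per-point update to parallel hardware; the arithmetic per grid point is exactly what the inner loop above shows.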

Engineering reports

Voice activity detection (VAD) is a critical part of many speech processing systems, which must distinguish between real voices and unrelated background sounds. This report explores the combination of a neural network and dual microphones to improve VAD estimates in handset applications. Two new features are extracted from the dual microphones: subband signed power difference (SBSPD) and inter-microphone cross correlation (IMCC). SBSPD provides specific and accurate power-difference information in various frequency bands, and IMCC contains detailed spatial location information from both microphones. Extensive objective evaluation has been performed under various noise conditions, including directional speech interference. Compared to existing methods based on the power level difference ratio, the proposed method is superior in terms of accuracy and robustness of the VAD estimate under various noise environments, especially directional speech interference. Because the method adapts to the sonic environment, parameter optimization is not needed and the approach is suitable for hand-held devices.
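As a hedged sketch of the inter-microphone cross correlation idea (the report's actual feature set and network are more elaborate; names here are illustrative), the lag at which the cross-correlation of the two microphone signals peaks encodes the spatial information a VAD can exploit:

```python
import random

def cross_correlation_lag(x, y, max_lag):
    """Lag (in samples) at which the cross-correlation of x and y peaks.
    A positive result means y is delayed relative to x."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(len(y)):
            m = n - lag                 # align y[n] with x[n - lag]
            if 0 <= m < len(x):
                acc += x[m] * y[n]
        if acc > best_val:
            best_val, best_lag = acc, lag
    return best_lag
```

A directional source reaches the two handset microphones with a stable, direction-dependent delay, whereas diffuse noise does not; tracking this peak lag per frame is one simple way to turn dual-microphone geometry into a VAD feature.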

Standards and Information Documents

AES Standards Committee News


[Feature] Recording and mastering engineers gathered in New York for the 139th Convention, at which the impact of new technology on the art was a prominent theme. Panels and tutorials discussed getting the best out of streamed audio, recording on an iPad, and the interaction between technology and classical recording.

59th Conference Report, Montreal

139th Convention Report, New York

139th Convention Exhibitors and Sponsors

60th Conference Preliminary Program, Leuven

Call for Awards Nominations

Call for Nominations for Board of Governors

Call for Papers: Special Issue on Dereverberation of Audio Music and Speech

Call for Papers: Special Issue on Intelligent Audio Processing, Semantics, and Interactions

139th Convention Papers Abstracts, New York

Index to Volume 63


Products and Developments

AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
