Journal of the Audio Engineering Society

2018 December - Volume 66 Number 12


Gray-Box Modeling of Guitar Amplifiers

Authors: Eichas, Felix; Zölzer, Udo

Musical distortion circuits, especially guitar amplifiers, have been the subject of virtual analog modeling for years. There are two main modeling approaches: white-box modeling, where the internal properties of the system are fully known, and black-box modeling, where only the input and output signals are available; gray-box modeling lies between these extremes. This work proposes a gray-box modeling approach for analog guitar amplifiers using iterative optimization to adjust the parameters of a block-based model. The only assumption made about the reference system is its basic structure. The digital model is an extended Wiener–Hammerstein model consisting of a linear time-invariant (LTI) block, a nonlinear block with a nonlinear mapping function, and another LTI block connected in series. The model is adapted in two steps: first the filters are measured, and then the parameters of the nonlinear part of the digital model are optimized with the Levenberg–Marquardt method to minimize a cost function describing the error between the digital model and the analog reference system. A small number of guitar amplifiers were modeled, the adapted models were evaluated with objective scores, and a listening test was performed to rate their quality.
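The Wiener–Hammerstein structure described above is a series connection of an LTI filter, a static nonlinearity, and a second LTI filter. The sketch below is a minimal illustration, not the authors' implementation: the FIR coefficients `b_pre` and `b_post` are hypothetical placeholders for the measured filters, and `tanh` stands in for the adapted nonlinear mapping function, whose drive `gain` would be the kind of parameter the paper optimizes with Levenberg–Marquardt.

```python
import numpy as np

def wiener_hammerstein(x, b_pre, b_post, gain=5.0):
    """Minimal Wiener-Hammerstein sketch: LTI -> static nonlinearity -> LTI.

    b_pre/b_post: FIR coefficients of the input/output filters (placeholders
    for measured amplifier filters). tanh is a common stand-in for tube-style
    saturation; its drive `gain` is fixed here rather than optimized.
    """
    u = np.convolve(x, b_pre, mode="full")[: len(x)]   # input LTI block
    v = np.tanh(gain * u)                              # nonlinear mapping
    y = np.convolve(v, b_post, mode="full")[: len(x)]  # output LTI block
    return y
```

With trivial filters and a high gain the block reduces to pure saturation, which makes the role of each stage easy to verify in isolation.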

Sonification is a technique for presenting data arrays as sound, taking advantage of the human ability to hear patterns that might otherwise not be apparent. Mappings from data parameters to sound parameters form the basis of parameter-mapping sonification, and the choice and design of these mappings influence both the utility of a sonification system and users' ability to interpret the sounds. In this article the authors demonstrate a time-efficient methodology, with an experimental online platform, for assessing mappings. Based on the responses of 100 participants in an online magnitude-estimation experiment, the effectiveness of 16 data-sound mappings was explored. Mappings that used the tempo parameter were generally perceived effectively, while those using other sound parameters varied in their effectiveness; in some cases the interpretability of mappings, and the polarities with which they were perceived, varied among individuals. Exploratory observations suggest that these differences might be related to participants' levels of musical experience.
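A parameter mapping of the kind this study evaluates can be as simple as a linear function from normalized data values to a sound parameter. The sketch below is a hypothetical example, not one of the 16 tested mappings: it maps data onto pitch with a positive polarity, so larger values sound higher.

```python
import numpy as np

def map_to_pitch(data, f_lo=220.0, f_hi=880.0):
    """Linear parameter mapping from data values to pitch (Hz).

    Positive polarity: the data minimum maps to f_lo and the maximum to
    f_hi. Reversing f_lo/f_hi would give the opposite perceived polarity,
    one of the design choices the article's experiments probe.
    """
    d = np.asarray(data, dtype=float)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span else np.zeros_like(d)  # to [0, 1]
    return f_lo + norm * (f_hi - f_lo)
```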

When an omni-directional loudspeaker is placed close to one or more surfaces, reflections from the surface(s) can be as dominant as the directly propagating sound and can thus deteriorate the omni-directionality (also referred to as uniform directionality) of the sound source. This effect can degrade the sound quality because the frequency response is distorted at listening positions. This research is concerned with sound radiation control for uniform directionality in the presence of strong early reflections. A circular array of loudspeakers mounted on the surface of a cylinder is employed to apply radiation control methods. It is shown that even when a wall is close to this array, the directionality can be kept uniform by controlling the radiation, as long as the distance to the surface is known. The effects of errors in the distance and in the reflection coefficient of the surface are investigated. The results imply that such sound radiation control can improve the uniform directionality and sound quality of loudspeaker arrays with the aid of sensors that can measure distances to surfaces.

Hybrid Approach to Speech Source Separation Depending on the Voicing State

Authors: Wiem, Belhedi; Anouar, Ben Messaoud Mohamed; Aicha, Bouzid

Single-channel speech source separation (SCSS) is a research field with applications that include hearing aids and security. This research uses a hybrid method for SCSS that combines two different approaches depending on the voicing state; the algorithm can be used for speech source separation and speech enhancement. The hybrid method combines subspace decomposition for unvoiced speech and Soft-CASA (Computational Auditory Scene Analysis) for voiced speech. The voiced speech source separation process is an improved version of the conventional CASA system, optimized by the use of a soft mask. The unvoiced speech source separation process relies on an optimized approximation of the speech signal by subspace decomposition in the spectral domain. The new system is evaluated both for speech separation performance and for voicing decision accuracy. Despite the challenging acoustic environments used for testing, the proposed speech separation approach yields on average a 58.91% improvement in signal-to-interference ratio, a 12.67% improvement in signal-to-artifact ratio, a 38.91% improvement in signal-to-distortion ratio, and a 45% improvement in perceived speech quality.
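The soft mask mentioned for the voiced-speech stage is, in general CASA terms, a continuous ratio applied to time-frequency magnitudes rather than a binary selection. The snippet below is a generic Wiener-style soft mask, offered only to illustrate the idea; the paper's exact mask formulation is not reproduced here.

```python
import numpy as np

def soft_mask(target_mag, interferer_mag, p=2):
    """Generic Wiener-style soft time-frequency mask.

    Returns values in [0, 1]: 1 where the target dominates a bin,
    0 where the interferer does. p=2 gives the classic power-ratio
    (Wiener) form; p=1 gives a magnitude-ratio mask.
    """
    t = np.asarray(target_mag, dtype=float) ** p
    i = np.asarray(interferer_mag, dtype=float) ** p
    return t / (t + i + 1e-12)  # small constant avoids division by zero
```

Multiplying a mixture's short-time spectrum by such a mask and resynthesizing yields the separated source estimate.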

Higher Power Density Amplifiers

Authors: Iversen, Niels E.; Dahl, Nicolai J.; Knott, Arnold; Andersen, Michael A.E.

This paper proposes a new switching strategy for switch-mode audio power amplifiers that reduces the power dissipation in the switching devices of the power stage. The strategy is based on a thorough analysis of the loss mechanisms and operating conditions of the power stage and how they relate to the audio input. The strategy utilizes a high ripple current combined with full state control to improve soft-switching capability. This shifts the losses from the switching devices to the filter inductors, which are less sensitive to loss variations because of their larger form factor. Measured results on two implemented 100-W test amplifiers show that the proposed strategy reduces the power dissipation within the switches, yielding up to a 45°C temperature reduction locally in the switches and up to 35°C globally in the amplifier. THD+N levels are reduced to 0.03%, and the power density of the implemented amplifiers is 6 W/cm³.
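The high ripple current central to the strategy can be estimated, for a textbook buck-type class-D output stage (an assumption for illustration, not necessarily the paper's topology), from the standard peak-to-peak inductor ripple formula delta_I = V_dc * D * (1 - D) / (L * f_sw).

```python
def inductor_ripple(v_dc, duty, L, f_sw):
    """Peak-to-peak inductor current ripple (A) for a buck-type stage.

    Textbook formula, assumed topology: v_dc = supply voltage (V),
    duty = switching duty cycle, L = filter inductance (H),
    f_sw = switching frequency (Hz). Ripple peaks at duty = 0.5.
    """
    return v_dc * duty * (1 - duty) / (L * f_sw)
```

For example, a 50-V stage with a 10-uH inductor switching at 300 kHz at 50% duty carries roughly 4.2 A of peak-to-peak ripple, the kind of current that enables soft switching at the cost of inductor loss.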

Method to Estimate the Acoustic Center of Directional Sources and its Psychoacoustic Evaluation

Authors: Imbery, Christina; Franz, Sven; van de Par, Steven; Bitzer, Joerg

The effective acoustic center (AC) has multiple definitions, including: (a) the position of the virtual point source from which sound pressure varies inversely with distance, (b) the point from which the approximately spherical wavefronts appear to diverge when observed in a region around the observer, and (c) similar to the previous two definitions but also considering phase. In all three cases, only one frequency is considered. In this paper the position of the acoustic center is estimated from the time delay between two distinct orientations of a loudspeaker source, direct-facing and sideways-facing, in order to approximate the acoustic center position relative to the rotation axis. If the propagation times of both paths are equal, and thus the time delay between them is zero, it is assumed that the loudspeaker rotates around the acoustic center. The time delay estimation is based on the phase information calculated by the Generalized Cross-Correlation Phase Transform (GCC-PHAT) method. Measurements were carried out in an anechoic environment as well as in a room with reverberation. The technical analysis revealed that the acoustic center can be estimated from only two consecutive recordings and that the GCC-PHAT method performed very well under reverberant conditions. A listening experiment with anechoic binaural recordings demonstrated that using the estimated AC position for low frequencies leads to a source position perceived as stable for loudspeaker orientations up to 85°.
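GCC-PHAT itself is a standard method and can be sketched compactly: the cross-power spectrum of the two recordings is normalized by its magnitude, which discards spectral coloration and keeps only phase, sharpening the correlation peak under reverberation. The following is a minimal NumPy version for illustration, not the authors' code.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (seconds) via GCC-PHAT."""
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                  # phase transform: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:                 # optionally bound the search range
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

Applied to the two loudspeaker orientations, a delay estimate of zero would indicate rotation about the acoustic center.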

Engineering Reports

Analysis of 2D Feature Spaces for Deep Learning-Based Speech Recognition

Authors: Korvel, Gražina; Treigys, Povilas; Tamulevicus, Gintautas; Bernataviciene, Jolita; Kostek, Bozena

The aim of this study was to evaluate the suitability of 2D audio-signal feature maps for deep learning-based speech recognition. The proposed methodology employs a convolutional neural network (CNN), a class of deep feed-forward artificial neural networks. The authors analyzed four audio-signal feature maps: spectrograms, linear- and Mel-scale cepstrograms, and chromagrams. This choice was made because CNNs perform well in 2D data-oriented processing contexts. The feature maps were employed in a Lithuanian word-recognition task. The spectrogram feature space led to the highest word recognition rate; the spectral and Mel-scale cepstral feature spaces outperformed the linear-cepstral and chroma spaces. In the 111-word classification experiment, the F1 scores on the test data set were 0.99 for the spectrogram, 0.91 for the Mel-scale cepstrogram, 0.76 for the chromagram, and 0.64 for the linear-cepstrum feature space.
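A spectrogram feature map of the kind fed to a CNN is simply a 2D matrix of windowed short-time spectra. The sketch below shows the construction; the window length and hop size are arbitrary choices for illustration, not those of the study.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Log-magnitude spectrogram as a 2D feature map for a CNN.

    Slices the signal into Hann-windowed frames, takes the real FFT of
    each, and stacks the log magnitudes into a (freq bins, frames) matrix.
    """
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    S = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return 20 * np.log10(S + 1e-10).T       # shape: (n_fft//2 + 1, n_frames)
```

Mel-scale cepstrograms and chromagrams are further transforms of this same short-time spectral representation.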


The conventional approach to achieving relatively uniform directional dispersion of sound from an audio monitor is to rely on diffraction from drivers substantially smaller than the wavelengths of sound they reproduce. Ideally one would like a point sound source emitting a spherical wave front in at least a 90-degree cone. However, larger drivers are desirable because small drivers have difficulty producing sufficient amplitude with good linearity. Larger drivers emit nearly planar wave fronts at higher frequencies, producing substantially larger amplitudes on axis, an effect known as "beaming." With the advent of 3D printing technologies, it is possible to print acoustic lenses with negative focal length that disperse the sound more widely. The approach uses an array of physical channels to delay portions of the planar wave front, shaping it into a spherical wave front with an apparent point source, as illustrated by acoustic measurements and photography. In a practical speaker installation, the acoustic lens reduced the on-axis beaming effect by reshaping the driver's planar wave front into a spherical one. Subjective impressions from listeners were very positive. 3D printing opens up the possibility of creating a range of such lenses for various purposes, particularly for changing the shape and nature of emitted wave fronts.
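The delay each lens channel must introduce follows from simple geometry: for the exiting wave front to appear to diverge from a virtual point source a distance d behind the lens, the portion at radius r from the axis must lag the axial portion by the extra path length sqrt(d^2 + r^2) - d divided by the speed of sound. A sketch of this computation follows; the function and variable names, and the virtual-source distance in the example, are illustrative rather than taken from the report.

```python
import numpy as np

def lens_delays(r, d_virtual, c=343.0):
    """Delay (s) each lens channel at radius r (m) must add so a planar
    wave front exits as if diverging from a virtual point source
    d_virtual metres behind the lens (negative focal length behavior).

    The wave front from the virtual source travels sqrt(d^2 + r^2) to
    reach radius r but only d to reach the axis, so off-axis channels
    must be made physically longer to add the difference in path length.
    """
    extra_path = np.sqrt(d_virtual**2 + np.asarray(r, dtype=float)**2) - d_virtual
    return extra_path / c
```

The channel physical lengths then grow monotonically from the center of the lens outward, which matches the intuition that the edges of the planar wave front must be held back to curve it.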

Standards and Information Documents

AES Standards Committee News


[Feature] There are many more variables and context dependencies in evaluating AVAR (audio for virtual and augmented reality) systems than in conventional audio reproduction systems. This makes the determination of things like references rather hard, with attention being focused on alternative approaches concerned more with quality of experience than with audio quality per se. The concept of plausibility arises regularly here, as does the idea of looking at how users behave or what they can identify. When natural and synthetic elements are combined in AR, there is a need to ensure a high degree of co-immersion and to find ways of evaluating it.

AVAR Conference Report, Redmond

145th Convention Report, New York

145th Convention Exhibitors and Sponsors

146th Convention Preview, Dublin

IIA Conference Preview, York

Call for Awards Nominations

Call for Nominations for the Board of Governors

JAES Special Issue on Sound in Immersion and Emotion, Call for Papers

145th Convention Papers Abstracts, New York

Index to Volume 66


Annual Report

AES Conventions and Conferences


Table of Contents

Cover & Sustaining Members List

AES Officers, Committees, Offices & Journal Staff
