Only AES members and Institutional Journal Subscribers can download
Authors: Jovanovic, Vladan
Tracing errors arise because the reproducing stylus has a different shape than the cutting chisel used to create the original acetate lacquer master for vinyl Long Play (LP) records. Tracing errors are typically the most significant source of distortion in vinyl reproduction and probably the main reason manufacturers of pickup cartridges seldom specified distortion figures for their products. In this paper, a historical overview of harmonic distortion results due to tracing errors is provided. Many of these results are 70 to 80 years old, and some seem largely forgotten by now. Some new simulation results are provided to verify various approximations proposed and used in the past.
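The geometric origin of tracing distortion can be illustrated with a minimal sketch. It is not the paper's simulation: it assumes a simplified two-dimensional model in which a circular tip of radius `tip_radius` rides on a sinusoidal, vertically modulated groove profile, and the function names `traced_waveform` and `harmonic_ratio` are hypothetical. The tip centre follows the upper envelope of circles placed on the groove, so the traced waveform differs from the recorded one and acquires harmonics.

```python
import numpy as np

def traced_waveform(amplitude, wavelength, tip_radius, n=1024):
    """Return (groove, traced) sampled over one wavelength.

    Assumed model: a circle of radius tip_radius resting on the profile
    groove(x) = amplitude * sin(2*pi*x / wavelength).
    """
    dx = wavelength / n
    x = np.arange(n) * dx
    groove = amplitude * np.sin(2 * np.pi * x / wavelength)

    # Horizontal offsets (in samples) at which the circle can touch the groove.
    k = int(np.floor(tip_radius / dx))
    offsets = np.arange(-k, k + 1)
    # Height of the circle centre above a contact point at each offset.
    cap = np.sqrt(np.maximum(0.0, tip_radius**2 - (offsets * dx) ** 2))

    # Centre height = highest value of groove(u) + cap(x - u), periodic in x.
    idx = (np.arange(n)[:, None] + offsets[None, :]) % n
    centre = np.max(groove[idx] + cap[None, :], axis=1)

    # Report the bottom of the tip so a vanishing radius reproduces the groove.
    return groove, centre - tip_radius

def harmonic_ratio(signal, harmonic):
    """Amplitude of the given harmonic relative to the fundamental."""
    spectrum = np.abs(np.fft.rfft(signal))
    return spectrum[harmonic] / spectrum[1]

if __name__ == "__main__":
    for radius in (0.5, 1.0, 2.0):  # illustrative values, same units as wavelength
        _, traced = traced_waveform(1.0, 20.0, radius)
        print(radius, harmonic_ratio(traced, 2))
```

In this simplified model the second-harmonic ratio grows with the tip radius, which is the qualitative behaviour classical tracing-distortion approximations describe. Real lateral or stereo grooves involve two groove walls and are not covered here.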
Download: PDF (1.4 MB)
Authors: McCormack, Leo; Meyer-Kahlen, Nils; Lou Alon, David; Ben-Hur, Zamir; V. Amengual Garí, Sebastià; Robinson, Philip
This article formulates and evaluates four different methods for six-degrees-of-freedom binaural reproduction of head-worn microphone array recordings, which may find application within future augmented reality contexts. Three of the explored methods are signal-independent, utilizing least-squares, magnitude least-squares, or plane-wave-decomposition-based solutions. Rotations and translations are realized by applying directional transformations to the employed spherical rendering or optimization grid. The fourth considered approach is a parametric signal-dependent alternative, which decomposes the array signals into directional and ambient components using beamformers. The directional components are then spatialized by applying binaural filters corresponding to the transformed directions, whereas the ambient sounds are reproduced using the magnitude least-squares solution. Formal perceptual studies were conducted, whereby test participants rated the perceived relative quality of the four binaural rendering methods being evaluated. Of the three signal-independent approaches, the magnitude least-squares solution was rated the highest. The parametric approach was then rated higher than the magnitude least-squares solution when the listeners were permitted to move away from the recording point.
Download: PDF (719.65 KB)
Authors: Grimaldi, Vincent; S.R. Simon, Laurent; Courtois, Gilles; Lissek, Hervé
Head tracking combined with head movements has been shown to improve auditory externalization of a virtual sound source and to contribute to localization performance. With certain technically constrained head-tracking algorithms, as can be found in wearable devices, artefacts can be encountered. Typical artefacts could consist of an estimation mismatch or a tracking latency. The experiments reported in this article aim to evaluate the effect of such artefacts on the spatial perception of a non-individualized binaural synthesis algorithm. The first experiment focused on auditory externalization of a frontal source while the listener was performing a large head movement. The results showed that degraded head tracking combined with head movement yields a higher degree of externalization compared to head movements with no head tracking. This suggests that the listeners could still take advantage of spatial cues provided by the head movement. The second experiment consisted of a localization task in azimuth with the same simulated head-tracking artefacts. The results showed that a large latency (400 ms) did not affect the ability of the listeners to locate virtual sound sources compared to reference head tracking. However, the estimation mismatch artefact reduced the localization performance in azimuth.
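How such artefacts might enter a binaural renderer can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation: latency is modelled as a pure delay of the tracker's yaw signal, the estimation mismatch is modelled as a constant gain on the reported yaw (the abstract does not define it), and the names `wrap_deg` and `rendered_azimuth` and the 100 Hz tracker rate are hypothetical.

```python
import numpy as np

def wrap_deg(angle):
    """Wrap angles in degrees to the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0

def rendered_azimuth(source_az_deg, head_yaw_deg, latency_s=0.0,
                     gain=1.0, rate_hz=100.0):
    """Head-relative azimuth a renderer would use to select its binaural filter.

    latency_s delays the tracker output (assumed artefact model);
    gain scales the reported yaw (assumed model of estimation mismatch).
    """
    yaw = np.asarray(head_yaw_deg, dtype=float)
    delay = min(int(round(latency_s * rate_hz)), len(yaw))
    # Hold the first yaw sample while the delayed tracker output is unavailable.
    delayed = np.concatenate([np.full(delay, yaw[0]), yaw[:len(yaw) - delay]])
    return wrap_deg(source_az_deg - gain * delayed)

if __name__ == "__main__":
    yaw = np.linspace(0.0, 90.0, 101)  # 1 s head turn sampled at 100 Hz
    ideal = rendered_azimuth(0.0, yaw)
    late = rendered_azimuth(0.0, yaw, latency_s=0.4)
    # Largest angular error caused by the 400 ms delay during the turn.
    print(np.max(np.abs(wrap_deg(late - ideal))))
```

With a pure delay the rendered direction is wrong only while the head is moving and settles once the head stops, whereas a gain error persists at any non-zero yaw. That distinction is a property of this sketch, not a finding taken from the article.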
Download: PDF (679.68 KB)
Authors: Jüterbock, Tobias; Brinkmann, Fabian; Gamper, Hannes; Raghuvanshi, Nikunj; Weinzierl, Stefan
Parametric spatial audio rendering aims to provide perceptually convincing audio cues that are agnostic to the playback system to enable the acoustic design of games and virtual reality. The authors propose an algorithm for detecting perceptually important reflections from spatial room impulse responses. First, a parametric representation of the sound field is derived based on perceptually motivated spatio-temporal windowing, followed by a second step that estimates the perceptual salience of the detected reflections by means of a masking threshold. In this work, a vertical dependency is incorporated into both these components. This was inspired by recent research revealing that two sound sources in the median plane can evoke two independent auditory events if their spatial separation is sufficiently large. The proposed algorithm is evaluated in nine simulated shoebox rooms with a wide range of sizes and reverberation times. Evaluation results show improved selection of early reflections by accounting for source elevation and suggest that for speech signals, the perceptual quality increases with an increasing number of rendered early reflections.
Download: PDF (979.75 KB)
Authors: Mendonça, Catarina; Wang, Heng; Pulkki, Ville
The current technological solutions for spatial audio provide realistic auditory impressions but rarely account for multisensory interactions. The intent of this study was to discover if and when spatial sounds could lower the accuracy of visual perception. Sequences of light and sound events were presented, and different sound parameters were tested: spatial and temporal congruency, horizontal and vertical spatial distribution, and source broadness. Participants were asked to report the location of the last visual event in a left-right discrimination task. During the task, cognitive effort was monitored through pupil size measurements. It was found that both spatial and temporal congruence are important for higher accuracy levels and lower cognitive effort levels. However, spatial congruence was found not to be crucial if sounds occur within the same spatial region as visual events. Sounds hindered visual accuracy and increased effort when they occurred within a field that was narrower or wider than that of the visual events, but not too discrepant from it. These effects were replicated with vertical sound distributions. Broad sounds made the task more effortful and limited negative effects of spatially mismatched audiovisual events. When creating spatial sound for audiovisual reproductions, source distribution and broadness should be intentionally controlled.
Download: PDF (631.2 KB)
Authors: Koya, Daisuke; Mason, Russell; Dewhirst, Martin; Bech, Søren
A perceptual model was developed to evaluate the spatial quality of automotive audio systems by adapting the Quality Evaluation of Spatial Transmission and Reproduction by an Artificial Listener (QESTRAL) model of spatial quality developed for domestic audio systems. The QESTRAL model was modified to use a combination of existing and newly created metrics, based on (in order of importance) the interaural cross-correlation, reproduced source angle, scene width, level, entropy, and spectral roll-off. The resulting model predicts the overall spatial quality of two-channel and five-channel automotive audio systems with a cross-validation R² of 0.85 and root-mean-square error (RMSE) of 11.03%. The performance of the modified model improved considerably for automotive applications compared with that of the original model, which had a prediction R² of 0.72 and RMSE of 29.39%. Modifying the model for automotive audio systems did not invalidate its use for domestic audio systems, which were predicted with an R² of 0.77 and RMSE of 11.90%.
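For readers unfamiliar with the two figures of merit quoted above, a minimal sketch of how they are commonly computed follows. It assumes R² is the coefficient of determination and that the RMSE percentage is relative to the width of the rating scale (taken here as 100 points). The abstract does not state either convention, and the function names and sample ratings are hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse_percent(observed, predicted, scale_range=100.0):
    """Root-mean-square error as a percentage of an assumed rating-scale range."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return 100.0 * rmse / scale_range

if __name__ == "__main__":
    # Made-up listener ratings and model predictions, for illustration only.
    ratings = [0.0, 50.0, 100.0]
    predictions = [10.0, 50.0, 90.0]
    print(r_squared(ratings, predictions), rmse_percent(ratings, predictions))
```

Some studies instead report R² as the squared Pearson correlation between ratings and predictions. The two definitions coincide only for a least-squares fit evaluated on its own training data.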
Download: PDF (993.74 KB)
Authors: Miller, Thomas; Downey, Cristina
The frequency response of a headphone is very important for listener satisfaction. Listener preferences have been well studied for frequencies below 10 kHz, but preferences above that frequency are less well known. Recent improvements in the high-frequency performance of ear simulators make it more practical to study this frequency region now. The goal of this study was to determine the preferred headphone response for insert headphones for the audible range above 10 kHz. A new target response is proposed, based on listener preference ratings in a blind listening test. The results show a clear preference for significantly more high-frequency energy than was proposed in a previous popular headphone target curve. The preferred response is also affected by the listener's hearing thresholds, with additional high-frequency boost being preferred for listeners with age-related hearing loss.
Download: PDF (1.48 MB)
Authors: Hwang, Inwoo; Kim, Kibeom; Kim, Sunmin
Audio enhancement is a signal processing method that improves the listening experience. Although most audio devices provide a variety of sound-enhancing effects, it is reported that very few people are active users of this feature. This lack of usability comes from insufficient sound improvement because of concerns about scene-rendering mismatch, which means that the processing applied to an unintended target may even damage the sound quality. The key solution to this problem is sound intelligence that provides an optimal sound effect with very low latency. The authors propose a real-time audio enhancement system based on a highly precise audio scene classifier using convolutional neural networks. The entire computation, including convolutions, is optimized for implementation at the digital signal processing level, resulting in enhanced audio outputs for every audio frame.
Download: PDF (549.88 KB)
Download: PDF (38.63 KB)
Download: PDF (46.18 KB)
Download: PDF (12.67 MB)
Download: PDF (224.96 KB)
Download: PDF (121.84 KB)
Download: PDF (36.45 KB)
Download: PDF (48.93 KB)