Authors: Bernier, Antoine; Bouserhal, Rachel E.; Voix, Jérémie; Herzog, Philippe
This paper presents the design and assessment of an active hearing protection device for musicians that focuses on reducing the occlusion effect through active noise control. The system is designed to adapt its feedback compensation to achieve a specified target performance across users, regardless of their ear canal acoustic properties. The design of the prototype earpiece and of the feedback compensation algorithm is presented in detail, validated, and implemented. Experimental measurements show that the system maintains robust and stable occlusion effect reduction despite large variations in the acoustic properties of an adjustable ear simulator.
Authors: Falcón Pérez, Ricardo; Götz, Georg; Pulkki, Ville
This work suggests a method of presenting information about the acoustical and geometric properties of a room as spherical images to a machine-learning algorithm in order to estimate acoustical parameters of the room. The approach has the advantage that the spatial distribution of the properties can be presented to machine-learning methods in a generic and potentially compact way. The estimation of reverberation time T60 is used here as a proof-of-concept study. The distribution of absorptive material is presented as a spherical map of feature values, in which each value is formed by calculating the equivalent absorption area visible through the corresponding facet of a polyhedron as seen from the polyhedron’s center point. The pixel values are then used as feature vectors, and the measured T60 values of the corresponding rooms are used as target data. This work presents the method and trains a set of neural networks with different spherical map resolutions using a dataset composed of real-world acoustical measurements of a single room with 831 different configurations of furniture and absorptive materials. The estimation of reverberation time using the proposed approach is considerably more accurate than simple analytic methods, which supports the validity of the approach.
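For context on the kind of analytic baseline such an estimate is compared against, the following is a minimal sketch of Sabine's equation, T60 = 0.161 V / A, where V is the room volume and A is the total equivalent absorption area. The abstract does not say which analytic methods were used, so the choice of Sabine's equation here is an assumption, and the function name, room dimensions, and absorption coefficients are hypothetical.

```python
def sabine_t60(volume_m3, surfaces):
    """Estimate reverberation time (seconds) with Sabine's equation.

    volume_m3: room volume in cubic metres.
    surfaces:  iterable of (area_m2, absorption_coefficient) pairs.
    """
    # Total equivalent absorption area A = sum of (area * alpha), in m^2.
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    if absorption_area <= 0:
        raise ValueError("total equivalent absorption area must be positive")
    # 0.161 s/m is the usual Sabine constant for air at room temperature.
    return 0.161 * volume_m3 / absorption_area


# Hypothetical 5 m x 4 m x 3 m room (values chosen only for illustration).
example_surfaces = [
    (20.0, 0.10),  # floor
    (20.0, 0.50),  # absorptive ceiling
    (54.0, 0.05),  # four walls combined
]
example_t60 = sabine_t60(60.0, example_surfaces)
print(f"Sabine T60 estimate: {example_t60:.2f} s")
```

Sabine's equation uses only the total absorption area, so it ignores where the absorptive material sits in the room. That spatial distribution is the information the spherical-map representation described above is meant to carry.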
Authors: Roberts, Timothy; Nicolson, Aaron; Paliwal, Kuldip K.
Objective evaluation of audio processed with Time-Scale Modification (TSM) has recently seen improvement with a labeled time-scaled audio dataset used to train an objective measure. This double-ended measure was an extension of Perceptual Evaluation of Audio Quality and required both reference and test signals. In this paper, two single-ended objective quality measures for time-scaled audio are proposed that do not require a reference signal. Internal representations of spectrogram and speech features are learned by either a Convolutional Neural Network (CNN) or a Bidirectional Gated Recurrent Unit (BGRU) network and fed to a fully connected network to predict Subjective Mean Opinion Scores. The proposed CNN and BGRU measures respectively achieve average Root Mean Square Errors of 0.61 and 0.58 and mean Pearson Correlation Coefficients of 0.77 and 0.79 on the time-scaled audio dataset. The proposed measures are used to evaluate TSM algorithms, and comparisons are provided for 15 TSM implementations. A link to implementations of the objective measures is provided.
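The two figures of merit quoted above, Root Mean Square Error and Pearson Correlation Coefficient, can be computed as in the minimal sketch below. It is not the authors' evaluation code. The function names and the sample scores are hypothetical and serve only to show how predicted scores would be compared with subjective ones.

```python
import math


def rmse(predicted, target):
    """Root mean square error between two equal-length score sequences."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson(predicted, target):
    """Pearson correlation coefficient between two score sequences."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(target) / len(target)
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


# Hypothetical predicted and subjective opinion scores for illustration.
predicted_scores = [3.1, 2.4, 4.0, 1.8, 3.6]
subjective_scores = [3.4, 2.0, 4.2, 2.1, 3.0]
print(f"RMSE: {rmse(predicted_scores, subjective_scores):.3f}")
print(f"PCC:  {pearson(predicted_scores, subjective_scores):.3f}")
```

A lower RMSE and a PCC closer to 1 both indicate closer agreement between the objective measure and the subjective scores.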
Authors: Agrawal, Sarvesh; Bech, Søren; Bærentsen, Klaus; De Moor, Katrien; Forchhammer, Søren
Studying immersion in audiovisual experiences can help technologists deliver engaging and enhanced experiences. As a first step toward this goal, this paper details an investigation conducted to establish an experimental paradigm for quantifying immersion and determining the influence of immersive tendency (susceptibility to becoming immersed) on immersion. A balanced incomplete block design was employed in which 21 assessors rated 15 commercially available stimuli (representative of the highest quality encountered in domestic AV applications) without repetitions or simultaneous comparisons. The assessors were instructed to rate immersion on a graphic line scale and document their familiarity with the content. A questionnaire was administered after the rating experiment to measure immersive tendency. The results show that the assessors can comprehend the description of immersion and follow the experimental protocol. It is found that immersion is a graded experience and that the correlation between immersive tendencies and immersion ratings is predominantly not statistically significant. The experimental paradigm presented in this paper can form the framework for assessing immersion and developing novel methods to thoroughly explore the concept of immersion in audiovisual experiences.
Authors: Rafaelof, Menachem; Wendling, Kyle
A method for predicting the audibility of an arbitrary time-varying noise (signal) in the presence of masking noise is described. The statistical audibility prediction (SAP) method relies on the specific loudness, or loudness perceived through the individual auditory filters, for accurate statistical estimation of audibility vs. time. As such, this work investigated a new hypothesis that audibility is more accurately discerned within individual auditory filters by a higher-level decision-making process. Audibility prediction vs. time is intuitive since it captures changes in audibility as they occur, which is critical for the study of human response to noise. In addition, time-frequency prediction of audibility may provide valuable information about the root cause(s) of audibility, which is useful for the design and operation of noise sources. Empirical data, gathered under a three-alternative forced-choice (3AFC) test paradigm for low-frequency sound, have been used to examine the accuracy of SAPs.
Authors: Wühle, Tom; Merchel, Sebastian; Altinsoy, M. Ercan
In spatial audio reproduction with sound projection, instead of placing loudspeakers in specific directions, virtual sources are created by projecting sound onto reflective boundaries. The projected sound should dominate the localization. One limiting factor is the leading direct sound that occurs because of physical limitations in the focusing capabilities of sound projectors. In this paper, localization masking, a method to reduce the influence of this direct sound on localization, is introduced. Localization masking was investigated in an anechoic chamber with cascaded lead-lag pairs representing the sounds involved. The sounds were reproduced via individual loudspeakers. Natural percussion signals with transient temporal structures were used. The lag localization dominance threshold, defined as the maximum lead level at which the auditory events are localized in the direction of the lag, was measured using a method of adjustment. Localization masking shifted this threshold to a lead level up to 7 dB higher. Localization masking therefore reduced the influence of the initial lead, representing the direct sound, on localization. In practical sound projection scenarios, localization masking may improve the projection of signals with transient structures or reduce the requirements on the focusing capabilities of sound projectors that are used to project such signals.
Authors: Rumsey, Francis
Ambisonic representation provides a relatively versatile and compact way of storing or transmitting spatial audio scenes. Signals in this format may exist in mixed orders, requiring special decoders to be constructed that optimize spatial rendering. When it comes to decoding ambisonics for different speaker layouts, there may be advantages to active or hybrid decoders. Irregular layouts are harder to decode for than regular ones. Binaural rendering is a popular means of reproducing ambisonic content, and there is some evidence that individualized HRTF processing is useful in this context. Ambisonic information can also be data-compressed, and there may be advantages to doing so after some form of signal decomposition that takes advantage of interchannel redundancy.