Binaural sound source localization is the task of finding the location of a sound source using binaural audio as affected by the head-related transfer functions (HRTFs) of a binaural array. The most common approach is to train a convolutional neural network (CNN) directly on the magnitude and phase of the binaural audio. Recurrent layers can also be introduced so that the temporal context of the binaural data is taken into account, creating a convolutional recurrent neural network (CRNN).
This work compares the relative performance of this approach for speech localization on the horizontal plane using four CRNN models based on different types of recurrent layers (Conv-GRU, Conv-BiGRU, Conv-LSTM, and Conv-BiLSTM) and a baseline system, a more conventional CNN with no recurrent layers. These systems were trained and tested on datasets of binaural audio created by convolving speech samples with binaural room impulse responses (BRIRs) of 120 rooms, for 50 azimuthal directions. Additive noise created from additional sound sources on the horizontal plane was also added to the signal.
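The dataset construction described above (convolving speech with a BRIR, then adding noise from another source) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `spatialize` and `mix_at_snr`, the 10 dB SNR, the signal lengths, and the random arrays standing in for speech, noise, and BRIRs are all assumptions made for the example.

```python
import numpy as np


def spatialize(mono, brir):
    """Convolve a mono signal with a 2-channel BRIR of shape (2, taps)."""
    return np.stack([np.convolve(mono, brir[ch]) for ch in range(2)])


def mix_at_snr(target, noise, snr_db):
    """Add noise to target after scaling it to the requested SNR in dB.

    Both arrays have shape (2, samples). The noise must be at least as
    long as the target and is truncated to the target's length.
    """
    noise = noise[:, : target.shape[1]]
    target_power = np.mean(target ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(target_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return target + gain * noise


# Placeholder data (assumption): random signals stand in for real speech,
# an interfering source, and measured BRIRs for two different directions.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
interferer = rng.standard_normal(16000)
brir_target = rng.standard_normal((2, 256))
brir_noise = rng.standard_normal((2, 256))

binaural_speech = spatialize(speech, brir_target)
binaural_noise = spatialize(interferer, brir_noise)
mixture = mix_at_snr(binaural_speech, binaural_noise, snr_db=10.0)
print(mixture.shape)  # (2, 16255)
```

Here the SNR is computed over both ear channels together, which is one possible convention; the paper may define it differently.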
Results show a clear preference for CRNNs over the CNN: overall localization error and front-back confusion are reduced, and such systems are less affected by increasing reverberation time and decreasing signal-to-noise ratio. Comparing the recurrent layers also reveals that LSTM-based layers give the best overall localization performance, while bidirectional layers are more robust, giving an overall preference for Conv-BiLSTM for the task.
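The two reported measures, localization error and front-back confusion, could be computed along these lines. This is a hedged sketch, not the paper's evaluation code: it assumes azimuths in degrees with 0 straight ahead and 180 directly behind, and it counts a front-back confusion whenever the estimate falls in the opposite front/back hemisphere to the target. The abstract does not state the exact definitions used, and the helper names are hypothetical.

```python
import math


def angular_error_deg(true_az, est_az):
    """Smallest absolute difference between two azimuths, in degrees (0..180)."""
    diff = (est_az - true_az + 180.0) % 360.0 - 180.0
    return abs(diff)


def hemisphere(az):
    """Return 1 for front, -1 for back, 0 on the interaural axis (+/-90 deg)."""
    c = math.cos(math.radians(az))
    if abs(c) < 1e-9:
        return 0
    return 1 if c > 0 else -1


def is_front_back_confusion(true_az, est_az):
    """True when target and estimate lie in opposite front/back hemispheres."""
    return hemisphere(true_az) * hemisphere(est_az) < 0


def summarize(true_azs, est_azs):
    """Return (mean angular error in degrees, front-back confusion rate)."""
    pairs = list(zip(true_azs, est_azs))
    errors = [angular_error_deg(t, e) for t, e in pairs]
    confusions = [is_front_back_confusion(t, e) for t, e in pairs]
    return sum(errors) / len(errors), sum(confusions) / len(confusions)


# Made-up example values, not results from the paper.
mean_error, confusion_rate = summarize([0, 30, 350], [10, 150, 10])
print(mean_error, confusion_rate)
```

The wrap-around in `angular_error_deg` matters on a full horizontal circle: a target at 350 degrees and an estimate at 10 degrees are 20 degrees apart, not 340.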
Author(s): Reed-Jones, Jago T.; Jones, Karl O.; Fergus, Paul; Ellis, David L.
Affiliation: Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK
(See document for exact affiliation information.)
AES Convention: 157
Paper Number: 290
Publication Date: 2024-09-27
Permalink: https://aes2.org/publications/elibrary-page/?id=22747
Reed-Jones, Jago T.; Jones, Karl O.; Fergus, Paul; Ellis, David L.; 2024; A study on the relative accuracy and robustness of the convolutional recurrent neural network based approach to binaural sound source localisation [PDF]; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Paper 290; Available from: https://aes2.org/publications/elibrary-page/?id=22747
Reed-Jones, Jago T.; Jones, Karl O.; Fergus, Paul; Ellis, David L.; A study on the relative accuracy and robustness of the convolutional recurrent neural network based approach to binaural sound source localisation [PDF]; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Liverpool John Moores University, UK; Paper 290; 2024 Available: https://aes2.org/publications/elibrary-page/?id=22747
@article{reed-jones2024a,
author={Reed-Jones, Jago T. and Jones, Karl O. and Fergus, Paul and Ellis, David L.},
journal={Journal of the Audio Engineering Society},
title={A study on the relative accuracy and robustness of the convolutional recurrent neural network based approach to binaural sound source localisation},
year={2024},
number={290},
month={april},
}
TY – paper
TI – A study on the relative accuracy and robustness of the convolutional recurrent neural network based approach to binaural sound source localisation
AU – Reed-Jones, Jago T.
AU – Jones, Karl O.
AU – Fergus, Paul
AU – Ellis, David L.
PY – 2024
JO – Journal of the Audio Engineering Society
VL – 290
Y1 – April 2024