Due to the variability in characteristics of audio scenes, some scenes can naturally be recognized earlier than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study to which temporal extent an audio scene can be reliably recognized given state-of-the-art models. Moreover, as model fusion with deep network ensemble is prevalent in audio scene classification, we further study whether, and if so, when model fusion is necessary for this task. To achieve these goals, we employ two single-network systems relying on a convolutional neural network and a recurrent neural network for classification as well as early fusion and late fusion of these networks. Experimental results on the LITIS-Rouen dataset show that some scenes can be reliably recognized with a few seconds while other scenes require significantly longer durations. In addition, model fusion is shown to be the most beneficial when the signal length is short.
Author(s): Phan, Huy; Chén, Oliver Y.; Koch, Philipp; Pham, Lam; McLoughlin, Ian; Mertins, Alfred; De Vos, Maarten
Affiliation:
University of Kent, UK; University of Oxford, UK; University of Lübeck, Germany; University of Kent, UK; University of Kent, UK; University of Lübeck, Germany; University of Oxford, UK
(See document for exact affiliation information.)
Publication Date:
2019-06-06
Permalink: https://aes2.org/publications/elibrary-page/?id=20468
Phan, Huy; Chén, Oliver Y.; Koch, Philipp; Pham, Lam; McLoughlin, Ian; Mertins, Alfred; De Vos, Maarten; 2019; Beyond Equal-Length Snippets: How Long Is Sufficient to Recognize an Audio Scene? [PDF]; University of Kent, UK; University of Oxford, UK; University of Lübeck, Germany; University of Kent, UK; University of Kent, UK; University of Lübeck, Germany; University of Oxford, UK; Paper 16; Available from: https://aes2.org/publications/elibrary-page/?id=20468
Phan, Huy; Chén, Oliver Y.; Koch, Philipp; Pham, Lam; McLoughlin, Ian; Mertins, Alfred; De Vos, Maarten; Beyond Equal-Length Snippets: How Long Is Sufficient to Recognize an Audio Scene? [PDF]; University of Kent, UK; University of Oxford, UK; University of Lübeck, Germany; University of Kent, UK; University of Kent, UK; University of Lübeck, Germany; University of Oxford, UK; Paper 16; 2019 Available: https://aes2.org/publications/elibrary-page/?id=20468
@article{phan2019beyond,
author={Phan, Huy and Chén, Oliver Y. and Koch, Philipp and Pham, Lam and McLoughlin, Ian and Mertins, Alfred and De Vos, Maarten},
journal={Journal of the Audio Engineering Society},
title={Beyond Equal-Length Snippets: How Long Is Sufficient to Recognize an Audio Scene?},
year={2019},
number={16},
month={june},
}
TY  - paper
TI  - Beyond Equal-Length Snippets: How Long Is Sufficient to Recognize an Audio Scene?
AU  - Phan, Huy
AU  - Chén, Oliver Y.
AU  - Koch, Philipp
AU  - Pham, Lam
AU  - McLoughlin, Ian
AU  - Mertins, Alfred
AU  - De Vos, Maarten
PY  - 2019
JO  - Journal of the Audio Engineering Society
VL  - 16
Y1  - June 2019