This work proposes bag-of-features deep learning models for acoustic scene classification (ASC), the task of identifying recording locations by analyzing background sound. We explore the effect on classification accuracy of various front-end feature extraction techniques, ensembles of audio channels, and patch sizes from three kinds of spectrogram. The back-end process presents a two-stage learning model with a pre-trained CNN (preCNN) and a post-trained DNN (postDNN). Additionally, data augmentation using the mixup technique is investigated for both the pre-trained and post-trained processes, to improve classification accuracy by increasing the training conditions near class boundaries. Our experiments on subtasks 1A and 1B of the 2018 Challenge on Detection and Classification of Acoustic Scenes and Events, Acoustic Scene Classification (DCASE2018-ASC), significantly outperform the DCASE2018 reference implementation and approach state-of-the-art performance for each task. Results reveal that the ensemble of multi-spectrogram features and data augmentation is beneficial to performance.
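As a rough illustration of the mixup augmentation mentioned in the abstract, the sketch below blends two training examples and their one-hot labels using a weight drawn from a Beta(alpha, alpha) distribution. It is a minimal sketch of generic mixup, not the authors' implementation: the function name `mixup_pair`, the flat-list feature representation, and the value `alpha=0.4` are assumptions chosen for illustration, since the abstract does not state the paper's settings.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two (features, one-hot label) examples, mixup style.

    alpha=0.4 is an assumed value for illustration only.
    """
    rng = rng or random.Random()
    # Mixing coefficient drawn from Beta(alpha, alpha), always within [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # The same coefficient is applied to the features and to the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Hypothetical usage: two tiny feature vectors from different scene classes.
mixed_x, mixed_y, lam = mixup_pair([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Because the mixed label is a convex combination of the two one-hot labels, it still sums to one, and the resulting example lies between the two classes, which is how mixup populates the region near class boundaries.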
Author(s): Pham, Lam; McLoughlin, Ian; Phan, Huy; Palaniappan, Ramaswamy; Lang, Yue
Affiliation:
University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; Huawei Technologies Co. Ltd., Shenzhen, China
(See document for exact affiliation information.)
Publication Date:
2019-06-06
Permalink: https://aes2.org/publications/elibrary-page/?id=20465
Pham, Lam; McLoughlin, Ian; Phan, Huy; Palaniappan, Ramaswamy; Lang, Yue; 2019; Bag-of-Features Models Based on C-DNN Network for Acoustic Scene Classification [PDF]; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; Huawei Technologies Co. Ltd., Shenzhen, China; Paper 12; Available from: https://aes2.org/publications/elibrary-page/?id=20465
Pham, Lam; McLoughlin, Ian; Phan, Huy; Palaniappan, Ramaswamy; Lang, Yue; Bag-of-Features Models Based on C-DNN Network for Acoustic Scene Classification [PDF]; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; University of Kent, School of Computing, Medway, UK; Huawei Technologies Co. Ltd., Shenzhen, China; Paper 12; 2019 Available: https://aes2.org/publications/elibrary-page/?id=20465
@article{pham2019bag-of-features,
author={Pham, Lam and McLoughlin, Ian and Phan, Huy and Palaniappan, Ramaswamy and Lang, Yue},
journal={Journal of the Audio Engineering Society},
title={Bag-of-Features Models Based on {C-DNN} Network for Acoustic Scene Classification},
year={2019},
number={12},
month={June},
}