The task of general audio detection and segmentation is quite common in contemporary audio applications where computationally intensive processes are frequently involved. Machine learning is usually employed along with user-enabled data labeling that is intended to detect, segment, and semantically annotate the relevant audio events. This work focuses on a generic audio detection and classification method that combines hierarchical bimodal segmentation with hybrid pattern classification at different temporal resolutions. This paper presents the algorithmic perspective of a mobile back-end system to facilitate the construction, validation, and continuous update of generic audio ground-truth data. The goal is the implementation of a system that is capable of performing well in different conditions without relying on complicated pattern recognition systems and taxonomies. For this reason, minimal prior knowledge is necessary so that there is consistent behavior for different input signals and computational environments. Novel “classification confidence” metrics are implemented.
Author(s): Vrysis, Lazaros; Tsipas, Nikolaos; Dimoulas, Charalampos; Papanikolaou, George
Affiliation:
Aristotle University of Thessaloniki, Thessaloniki, Greece
(See document for exact affiliation information.)
Publication Date:
2016-12-06
Permalink: https://aes2.org/publications/elibrary-page/?id=18537
Vrysis, Lazaros; Tsipas, Nikolaos; Dimoulas, Charalampos; Papanikolaou, George; 2016; Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification [PDF]; Aristotle University of Thessaloniki, Thessaloniki, Greece; Paper; Available from: https://aes2.org/publications/elibrary-page/?id=18537
Vrysis, Lazaros; Tsipas, Nikolaos; Dimoulas, Charalampos; Papanikolaou, George; Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification [PDF]; Aristotle University of Thessaloniki, Thessaloniki, Greece; Paper; 2016; Available: https://aes2.org/publications/elibrary-page/?id=18537
@article{vrysis2016crowdsourcing,
  author={Vrysis, Lazaros and Tsipas, Nikolaos and Dimoulas, Charalampos and Papanikolaou, George},
  journal={Journal of the Audio Engineering Society},
  title={Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification},
  year={2016},
  volume={64},
  number={12},
  pages={1042--1054},
  month={december}
}
TY  - paper
TI  - Crowdsourcing Audio Semantics by Means of Hybrid Bimodal Segmentation with Hierarchical Classification
SP  - 1042
EP  - 1054
AU  - Vrysis, Lazaros
AU  - Tsipas, Nikolaos
AU  - Dimoulas, Charalampos
AU  - Papanikolaou, George
PY  - 2016
JO  - Journal of the Audio Engineering Society
VO  - 64
IS  - 12
Y1  - December 2016