Individual sounds are difficult to detect in complex soundscapes because they overlap strongly. This article explores the task of estimating sound polyphony, defined here as the number of audible sound classes. Sound polyphony measures the complexity of a soundscape and can be used to inform sound classification algorithms. First, a listening test is performed to assess the difficulty of the task. The results show that humans are only able to reliably count up to three simultaneous sound sources and that they underestimate the degree of polyphony for more complex soundscapes. Human performance depends mainly on the spectral characteristics of the sounds and, in particular, on the number of overlapping noise-like and transient sounds. In a second step, four deep neural network architectures, including an object detection approach for natural images, are compared to contrast human performance with machine learning-based approaches. The results show that machine listening systems can outperform human listeners for the task at hand. Based on these results, implicit modeling of sound polyphony based on the number of previously detected sound classes seems less promising than the explicit modeling strategy.
Author(s): Abeßer, Jakob; Ullah, Asad; Ziegler, Sebastian; Grollmisch, Sascha
Affiliation:
Semantic Music Technologies, Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany
Publication Date:
2023-12-06
Permalink: https://aes2.org/publications/elibrary-page/?id=22348
Abeßer, Jakob; Ullah, Asad; Ziegler, Sebastian; Grollmisch, Sascha; 2023; Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes [PDF]; Semantic Music Technologies, Fraunhofer Institute for Digital Media Technology (IDMT), Ilmenau, Germany; Paper; Available from: https://aes2.org/publications/elibrary-page/?id=22348
@article{abesser2023human,
  author  = {Abe{\ss}er, Jakob and Ullah, Asad and Ziegler, Sebastian and Grollmisch, Sascha},
  journal = {Journal of the Audio Engineering Society},
  title   = {Human and Machine Performance in Counting Sound Classes in Single-Channel Soundscapes},
  year    = {2023},
  volume  = {71},
  number  = {12},
  pages   = {860--872},
  month   = {December},
}