AES E-Library

Improvement and Cross-Domain Evaluation of SlowFast Networks

Audio classification requires robust models that can capture both transient events and long-duration patterns. This poses a challenge for common single-stream convolutional neural networks, which process short- and long-term events uniformly with the same kernels. Inspired by the dual-stream processing of the human auditory system, SlowFast networks separate temporal and spectral analysis into parallel pathways. In this paper, we propose several enhancements to the SlowFast network: we implement uniform separable convolutions on both the slow and fast pathways to streamline the architecture and improve efficiency, and we introduce a novel lightweight model variant with a 92% parameter reduction. We perform a comprehensive cross-domain evaluation using eight datasets that cover speech, environmental sounds, industrial sounds, and bioacoustic sounds. The enhanced SlowFast network surpasses the original SlowFast network as well as the MobileNetV3 single-stream baseline, especially on single-label tasks, while remaining competitive on multi-label tasks. The study highlights the potential of dual-stream architectures and underscores the importance of architectural design for audio classification.
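As a rough illustration of why separable convolutions can reduce model size, the sketch below counts weights for a standard 2D convolution and for one factored into a temporal kernel followed by a spectral kernel. The abstract does not say which form of separability the paper uses, so the time/frequency factorization, the function names, the channel counts, and the 3x3 kernel are all assumptions made for this example. The numbers are not the paper's and do not reproduce its 92% figure.

```python
def conv2d_params(c_in, c_out, k_time, k_freq, bias=False):
    """Number of weights in a standard 2D convolution layer."""
    return c_in * c_out * k_time * k_freq + (c_out if bias else 0)


def separable_conv2d_params(c_in, c_out, k_time, k_freq, bias=False):
    """Weights when the kernel is factored into (k_time x 1) then (1 x k_freq).

    Assumption for this sketch: the intermediate channel count equals c_out.
    """
    temporal = conv2d_params(c_in, c_out, k_time, 1, bias)   # time-only kernel
    spectral = conv2d_params(c_out, c_out, 1, k_freq, bias)  # frequency-only kernel
    return temporal + spectral


# Hypothetical layer: 64 input channels, 64 output channels, 3x3 kernel.
full = conv2d_params(64, 64, 3, 3)
factored = separable_conv2d_params(64, 64, 3, 3)
reduction = 1 - factored / full

print(f"standard:  {full} weights")
print(f"separable: {factored} weights")
print(f"reduction: {reduction:.1%}")
```

For this hypothetical layer the factored form needs one third fewer weights; the saving grows with kernel size, since the cost scales with the sum of the kernel dimensions rather than their product.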

 

Author(s):
Affiliation: (See document for exact affiliation information.)
Publication Date:
Session subject:
Permalink: https://aes2.org/publications/elibrary-page/?id=23011



Type:
E-Library location: 16938