AES E-Library

Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows

Locating the right sound effect efficiently is an important yet challenging task in audio production. Most current sound-searching systems rely on pre-annotated audio labels created by humans, which are time-consuming to produce and prone to inaccuracies, limiting the efficiency of audio production. Recent work on text-audio multimodal neural networks has led to contrastive language-audio pretraining (CLAP), which learns a shared embedding space for text descriptions and audio samples. Building on this idea, we developed a CLAP-based sound searching system (CLAP-Search) that does not rely on human annotations. To evaluate the effectiveness of CLAP-Search, we conducted comparative experiments against a widely used sound effect searching platform, the BBC Sound Effects Library. Our study evaluates user performance, cognitive load, and satisfaction through ecologically valid tasks based on professional sound-searching workflows. Our results show that CLAP-Search significantly enhanced productivity and reduced frustration while imposing comparable cognitive demands. We also qualitatively analyzed the participants' feedback, which offers valuable perspectives on the design of future AI-assisted sound search systems.
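The retrieval mechanism the abstract describes can be sketched in a few lines: a text query and each audio clip are embedded into the same space by a CLAP model, and clips are ranked by cosine similarity to the query. The sketch below uses toy NumPy vectors in place of real CLAP embeddings; the function name `rank_sounds` and the 4-dimensional embeddings are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rank_sounds(query_embedding, audio_embeddings):
    """Rank audio clips by cosine similarity to a text query embedding.

    In a real CLAP-based system, both inputs would come from the model's
    text and audio encoders; here they are placeholder vectors.
    """
    q = query_embedding / np.linalg.norm(query_embedding)
    a = audio_embeddings / np.linalg.norm(audio_embeddings, axis=1, keepdims=True)
    scores = a @ q                      # cosine similarity in the shared space
    order = np.argsort(scores)[::-1]    # best match first
    return order, scores[order]

# Toy 4-dimensional embeddings standing in for real CLAP outputs.
query = np.array([1.0, 0.0, 0.0, 0.0])
library = np.array([
    [0.0, 1.0, 0.0, 0.0],   # unrelated sound
    [0.9, 0.1, 0.0, 0.0],   # close match
    [0.5, 0.5, 0.0, 0.0],   # partial match
])
order, scores = rank_sounds(query, library)
print(order.tolist())  # indices of the library, best to worst
```

Because ranking needs no pre-assigned labels, adding a new clip to the library only requires embedding its audio once, which is what lets such a system sidestep human annotation.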

 

Permalink: https://aes2.org/publications/elibrary-page/?id=23077