AES E-Library

Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows

Locating the right sound effect efficiently is an important yet challenging task in audio production. Most current sound-searching systems rely on pre-annotated audio labels created by humans, which are time-consuming to produce and prone to inaccuracies, limiting the efficiency of audio production. Recent work on text-audio multimodal neural networks has led to the development of contrastive language-audio pretraining (CLAP), which learns a shared embedding space for text descriptions and audio samples. Using this idea, we built a CLAP-based sound-searching system (CLAP-Search) that does not rely on human annotations. To evaluate the effectiveness of CLAP-Search, we conducted comparative experiments against a widely used sound effect searching platform, the BBC Sound Effect Library. Our study evaluates user performance, cognitive load, and satisfaction through ecologically valid tasks based on professional sound-searching workflows. Our results show that CLAP-Search significantly enhanced productivity and reduced frustration while maintaining comparable cognitive demands. We also qualitatively analyzed the participants' feedback, which offered valuable perspectives on the design of future AI-assisted sound search systems.
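The retrieval idea behind a shared text-audio embedding space can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a CLAP-style model has already mapped the text query and every audio file to vectors in the same space, and it simply ranks files by cosine similarity to the query. The function names, file names, and the toy 3-dimensional vectors are all hypothetical; real embeddings would have hundreds of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_sounds(query_embedding, audio_embeddings, top_k=3):
    """Return the top_k (file name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in audio_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical precomputed audio embeddings (toy values, not model output).
AUDIO_EMBEDDINGS = {
    "door_slam.wav": [0.9, 0.1, 0.0],
    "rain_on_window.wav": [0.0, 0.8, 0.6],
    "thunder_rumble.wav": [0.1, 0.6, 0.8],
}

# Hypothetical embedding of a free-text query such as "rain storm".
QUERY = [0.0, 0.7, 0.7]

results = rank_sounds(QUERY, AUDIO_EMBEDDINGS, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because both modalities live in one space, no human-written labels are consulted at query time; the ranking depends only on vector similarity.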

 

Author(s):
Affiliation: (See document for exact affiliation information.)
AES Convention: Paper Number:
Publication Date:

DOI:


Type:
16938