AES E-Library

On the Potential for Scene Analysis from Compact Microphone Arrays

In the classic signal processing context, identifying and resolving acoustic objects from a small, compact set of directional microphones is a challenging problem. A practical example is developing a robust system for understanding voice activity in a reverberant conference room from a small number of co-incident directional microphones. In an application setting, many assumptions of the classic academic problem formulation are violated. The actual problem is inherently broadband, with a wide dynamic range, simultaneous voice activity, and multi-path acoustic responses that lead to source correlation and ambiguity. Room and occupant noise is rarely stationary, and irrelevant acoustic events are not easily classified as separate from voice. There is, however, a useful set of assumptions that can be exploited. Whilst these can be difficult to specify formally, they correspond to the understandings, common sense and constraints of a real meeting environment. The higher order statistical independence of typical acoustic scenes and voice activity can be utilized to gather information selectively in time. The system discussed in this work combines a simple statistical framework, physical source object modeling and operational heuristics to decompose a meeting scene with low latency from an array of three co-incident directional microphones. An overview of the system architecture is presented with specific details of the raw features, a convenient mapping utilized for clustering, and heuristics over several time scales driven by a voice activity classifier. Longer time frames and suitable constraints on the object state provide robust operation and allow for the use of scene information for an interactive sound field application. Rather than an objective assessment of localization accuracy, the comparative assessment of algorithms was based on field testing, with the key requirements being reliability, testability and an understanding of potential failure modes.
The work is presented as a demonstration and suggestion for the use of lightweight computational auditory scene analysis in a deployed voice conference system.

Authors: Dickins, Glenn; Gunawan, David; Shi, Dong
Affiliation: Dolby Laboratories, Sydney, Australia; Dolby Laboratories, Beijing, China (See document for exact affiliation information.)
Publication Date: 2013-09-06
Session subject: Spatial Field Control Theory and Applications
Permalink: https://aes2.org/publications/elibrary-page/?id=16922

This paper costs $33 for non-members and is free for AES members and E-Library subscribers.

Type: Conference Paper
E-Library location: (CD 52ndPapers) TMP/conf/52/
