In the classic signal processing context, identifying and resolving acoustic objects from a compact array with a small number of directional microphones is a challenging problem. A practical example is developing a robust system for understanding voice activity in a reverberant conference room from a small number of co-incident directional microphones. In an application setting, many assumptions of the classic academic problem formulation are violated. The actual problem is inherently broadband, with a wide dynamic range, simultaneous voice activity and multi-path acoustic responses leading to source correlation and ambiguity. Room and occupant noise is rarely stationary, and irrelevant acoustic events are not easily classified separately from voice. There is, however, a useful set of assumptions which can be utilized. Whilst these can be difficult to formally specify, they correspond to the understandings, common sense and constraints of a real meeting environment. The higher-order statistical independence of typical acoustic scenes and voice activity can be utilized to gather information selectively in time. The system discussed in this work combines a simple statistical framework, physical source object modeling and operational heuristics to decompose a meeting scene with low latency from an array of three co-incident directional microphones. An overview of the system architecture is presented with specific details of the raw features, a convenient mapping utilized for clustering, and heuristics over several time scales driven by a voice activity classifier. Longer time frames and suitable constraints on the object state provide robust operation and allow for the use of scene information for an interactive sound field application. Rather than an objective assessment of localization accuracy, the comparative assessment of algorithms was based on field testing, with the key requirements being reliability, testability and understanding potential failure modes.
The work is presented as a demonstration and suggestion for the use of lightweight computational auditory scene analysis in a deployed voice conference system.
Author(s): Dickins, Glenn; Gunawan, David; Shi, Dong
Affiliation:
Dolby Laboratories, Sydney, Australia; Dolby Laboratories, Beijing, China
(See document for exact affiliation information.)
Publication Date:
2013-09-06
Session subject:
Spatial Field Control Theory and Applications
Permalink: https://aes2.org/publications/elibrary-page/?id=16922
Dickins, Glenn; Gunawan, David; Shi, Dong; 2013; On the Potential for Scene Analysis from Compact Microphone Arrays [PDF]; Dolby Laboratories, Sydney, Australia; Dolby Laboratories, Beijing, China; Paper 2-2; Available from: https://aes2.org/publications/elibrary-page/?id=16922