AES E-Library

Dual-Microphone Voice Activity Detection Estimate in Handset Applications Based on Neural Network by Using Subband Signed Power Difference and Inter-Microphone Cross Correlation

Voice activity detection (VAD) is a critical part of some speech processing algorithms, because the algorithm must distinguish real speech from unrelated background sounds. This report explores combining a neural network with dual microphones to improve VAD estimates in handset applications. Two new features are extracted from the dual-microphone signals: subband signed power difference (SBSPD) and inter-microphone cross correlation (IMCC). SBSPD provides specific and accurate power-difference information in individual frequency bands, and IMCC captures detailed spatial information from both microphones. Extensive objective evaluations were performed under various noise conditions, including directional speech interference. Compared with existing methods based on the power level difference ratio, the proposed method yields more accurate and robust VAD estimates across noise environments, especially under directional speech interference. Because the method adapts to the acoustic environment, no parameter optimization is needed, which makes the approach suitable for handheld devices.
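The abstract names the two features but does not define them. The following is a minimal sketch of per-frame versions of both, under stated assumptions: SBSPD is taken here as the per-band power ratio in dB between a primary (mouth-side) microphone `x1` and a secondary microphone `x2` over equal-width FFT bands, and IMCC as the energy-normalized cross-correlation of the two frames over a small lag range. The function names (`sbspd`, `imcc`, `frame_features`), the Hann window, the band count, and the lag range are hypothetical choices, not the paper's definitions, and the neural network that consumes the features is not shown.

```python
import numpy as np


def sbspd(x1, x2, n_bands=8, eps=1e-12):
    """Hypothetical subband signed power difference for one frame.

    Assumption: the per-band power ratio in dB between the primary
    microphone x1 and the secondary microphone x2, over equal-width
    FFT bands. Positive values mean x1 is louder in that band.
    """
    window = np.hanning(len(x1))
    p1 = np.abs(np.fft.rfft(x1 * window)) ** 2
    p2 = np.abs(np.fft.rfft(x2 * window)) ** 2
    # Split the FFT bins into n_bands contiguous, roughly equal groups.
    bands = np.array_split(np.arange(len(p1)), n_bands)
    return np.array([
        10.0 * np.log10((p1[b].sum() + eps) / (p2[b].sum() + eps))
        for b in bands
    ])


def imcc(x1, x2, max_lag=4, eps=1e-12):
    """Hypothetical inter-microphone cross correlation for one frame.

    Returns c[lag] = sum_t x1[t + lag] * x2[t], normalized by the frame
    energies, for lag = -max_lag .. max_lag. A peak at a negative lag
    means the signal reaches x1 before x2.
    """
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    norm = np.sqrt(np.dot(x1, x1) * np.dot(x2, x2)) + eps
    n = len(x1)
    out = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(x1[lag:], x2[:n - lag])
        else:
            c = np.dot(x1[:n + lag], x2[-lag:])
        out.append(c / norm)
    return np.array(out)


def frame_features(x1, x2):
    """Concatenate both features into one input vector for a classifier."""
    return np.concatenate([sbspd(x1, x2), imcc(x1, x2)])
```

For example, if `x2` is simply `x1` scaled by 0.5, every SBSPD band equals 10*log10(4), about 6.02 dB. If `x2` is `x1` delayed by two samples, the IMCC peak falls at lag -2 with this lag convention. Together the two features separate "louder at the primary microphone" from "arrives first at the primary microphone", which is the kind of level and spatial cue the abstract attributes to SBSPD and IMCC.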

Permalink: https://aes2.org/publications/elibrary-page/?id=18059

E-Library location: 16938