The Transformer network has two drawbacks in Automatic Speech Recognition (ASR) tasks. First, it focuses mainly on global features and neglects other useful features, such as local features. Second, it is not robust to noisy audio signals. Improving model performance in ASR tasks therefore comes down to two concerns: extracting useful information and removing noise. To address the first, an information extraction (IE) module is proposed to extract local context information from the integration of previous layers, which contain both low-level and high-level information. To address the second, a noisy feature pruning (NFP) module is proposed to ease the negative effect of noisy audio. Finally, a network called EPT-Net is proposed that integrates the IE module, the NFP module, and the Transformer network. Empirical evaluations were conducted mainly on two widely used Mandarin Chinese datasets, Aishell-1 and HKUST. The experimental results validate the effectiveness of EPT-Net, which achieves character error rates (CERs) of 5.3%/5.6% on the Aishell-1 dev/test sets and 21.9% on the HKUST dev set.
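For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a character error rate is commonly computed: the character-level edit distance between reference and hypothesis transcripts, divided by the total number of reference characters. It is not taken from the paper. The function names `edit_distance` and `cer` are illustrative, and the paper's exact scoring setup (for example, text normalization) is not stated on this page.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character deleted
                curr[j - 1] + 1,     # extra hypothesis character inserted
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a single substituted character in a six-character Mandarin reference gives a CER of 1/6. Pooling edits over the whole corpus, rather than averaging per-utterance rates, is one common convention and is assumed here.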
Author(s): Gao, Guozhi; Duan, Zhikui; Yang, Guangguang; Li, Shiren; Yu, Xinmei; Zhao, Xiaomeng; Ruan, Jinbiao
Affiliation:
Foshan University, Foshan, China
(See document for exact affiliation information.)
Publication Date:
2024-01-06
Permalink: https://aes2.org/publications/elibrary-page/?id=22378
Gao, Guozhi; Duan, Zhikui; Yang, Guangguang; Li, Shiren; Yu, Xinmei; Zhao, Xiaomeng; Ruan, Jinbiao; 2024; Information Extraction and Noisy Feature Pruning for Mandarin Speech Recognition [PDF]; Foshan University, Foshan, China; Paper ; Available from: https://aes2.org/publications/elibrary-page/?id=22378
Gao, Guozhi; Duan, Zhikui; Yang, Guangguang; Li, Shiren; Yu, Xinmei; Zhao, Xiaomeng; Ruan, Jinbiao; Information Extraction and Noisy Feature Pruning for Mandarin Speech Recognition [PDF]; Foshan University, Foshan, China; Paper ; 2024 Available: https://aes2.org/publications/elibrary-page/?id=22378
@article{gao2024information,
author={Gao, Guozhi and Duan, Zhikui and Yang, Guangguang and Li, Shiren and Yu, Xinmei and Zhao, Xiaomeng and Ruan, Jinbiao},
journal={Journal of the Audio Engineering Society},
title={Information Extraction and Noisy Feature Pruning for {Mandarin} Speech Recognition},
year={2024},
volume={72},
number={1/2},
pages={59--70},
month={January},
}
TY – paper
TI – Information Extraction and Noisy Feature Pruning for Mandarin Speech Recognition
SP – 59
EP – 70
AU – Gao, Guozhi
AU – Duan, Zhikui
AU – Yang, Guangguang
AU – Li, Shiren
AU – Yu, Xinmei
AU – Zhao, Xiaomeng
AU – Ruan, Jinbiao
PY – 2024
JO – Journal of the Audio Engineering Society
VO – 72
IS – 1/2
Y1 – January 2024