Efficient Acoustic Modeling Method for Unsupervised Speech Recognition using Multi-Task Deep Neural Network

Haitao Yao; Maobo An; Ji Xu; Jian Liu

doi:10.2991/nceece-15.2016.72

Corresponding Author

Haitao Yao

Available Online December 2015.

DOI: 10.2991/nceece-15.2016.72
Keywords: Speech Recognition; Acoustic Modeling; Unsupervised Training; Multi-Lingual; Multi-Task Deep Neural Network
Abstract: This paper proposes an acoustic modeling method for speech recognition in zero-resource languages under mismatched conditions. For such languages, very little or no transcribed speech is available for conventional monolingual speech recognition. Conventional methods such as IPA-based universal acoustic modeling have proved effective under matched acoustic conditions (similar speaking styles, adjacent languages, etc.) but usually perform poorly when a mismatch occurs. Because mismatches between languages are common, this paper proposes unsupervised acoustic modeling via cross-lingual knowledge sharing. First, initial acoustic models (AMs) for a target zero-resource language are trained using Multi-Task Deep Neural Networks (MDNN): speech from other languages mapped to the phonemes of the target language (mapped data) is trained jointly with the same data under its language-specific transcriptions (specific data). Then, automatically transcribed target-language data is used in an iterative process to train new AMs with various auxiliary tasks. An experiment on 100 hours of untranscribed Japanese speech achieved a character error rate (CER) of 57.21%, a 19.32% absolute improvement over the baseline (IPA-based universal acoustic modeling).
Copyright: © 2016, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
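
The joint training described in the abstract can be illustrated with a minimal sketch: one shared hidden layer feeds two softmax output heads, one for target-language phoneme labels (mapped data) and one for language-specific labels (specific data), and the training objective is a weighted sum of the two cross-entropy losses. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class name `MultiTaskNet`, the layer sizes, the tanh activation, the single hidden layer, and the `aux_weight` parameter. Only the forward pass and the joint loss are shown, not the training loop or the iterative re-transcription step.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def dense(x, weights, bias):
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskNet:
    """Shared hidden layer with two task-specific output heads (illustrative)."""

    def __init__(self, n_in, n_hidden, n_mapped, n_specific, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_hidden, n_in, rng)
        # Head 1: phonemes of the target language (mapped data).
        self.head_mapped = init_layer(n_mapped, n_hidden, rng)
        # Head 2: labels of the source language itself (specific data).
        self.head_specific = init_layer(n_specific, n_hidden, rng)

    def forward(self, x):
        hidden = [math.tanh(v) for v in dense(x, *self.shared)]
        p_mapped = softmax(dense(hidden, *self.head_mapped))
        p_specific = softmax(dense(hidden, *self.head_specific))
        return p_mapped, p_specific


def joint_loss(net, x, y_mapped, y_specific, aux_weight=0.5):
    """Cross-entropy on the mapped task plus a weighted auxiliary term."""
    p_mapped, p_specific = net.forward(x)
    return -math.log(p_mapped[y_mapped]) - aux_weight * math.log(p_specific[y_specific])


if __name__ == "__main__":
    net = MultiTaskNet(n_in=4, n_hidden=8, n_mapped=3, n_specific=5)
    frame = [0.1, -0.2, 0.3, 0.05]  # stand-in for one acoustic feature vector
    print(joint_loss(net, frame, y_mapped=1, y_specific=4))
```

Because both heads read the same hidden representation, gradients from the language-specific task would also update the shared layer during training, which is the usual motivation for multi-task setups of this kind.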

Volume Title: Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering
Series: Advances in Engineering Research
Publication Date: December 2015
ISBN: 978-94-6252-150-6
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Haitao Yao
AU  - Maobo An
AU  - Ji Xu
AU  - Jian Liu
PY  - 2015/12
DA  - 2015/12
TI  - Efficient Acoustic Modeling Method for Unsupervised Speech Recognition using Multi-Task Deep Neural Network
BT  - Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering
PB  - Atlantis Press
SP  - 365
EP  - 370
SN  - 2352-5401
UR  - https://doi.org/10.2991/nceece-15.2016.72
DO  - 10.2991/nceece-15.2016.72
ID  - Yao2015/12
ER  -