Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering

Efficient Acoustic Modeling Method for Unsupervised Speech Recognition using Multi-Task Deep Neural Network

Authors
Haitao Yao, Maobo An, Ji Xu, Jian Liu
Corresponding Author
Haitao Yao
Available Online December 2015.
DOI
10.2991/nceece-15.2016.72
Keywords
Speech Recognition; Acoustic Modeling; Unsupervised Training; Multi-Lingual; Multi-Task Deep Neural Network
Abstract

This paper proposes an acoustic modeling method for speech recognition in zero-resourced languages under mismatched conditions. For such languages, very little or no transcribed speech is available for traditional monolingual speech recognition. Conventional methods such as IPA-based universal acoustic modeling have proved effective under matched acoustic conditions (similar speaking styles, adjacent languages, etc.) but usually perform poorly when mismatch occurs. Because mismatch between languages is common, this paper proposes unsupervised acoustic modeling via cross-lingual knowledge sharing. First, initial acoustic models (AMs) for a target zero-resourced language are trained using Multi-Task Deep Neural Networks (MDNN): speech from other languages mapped to the phonemes of the target language (mapped data) is trained jointly with the same data carrying its own language-specific transcriptions (specific data). Then, automatically transcribed target-language data is used in an iterative process to train new AMs, with various auxiliary tasks. An experiment on 100 hours of Japanese speech without transcripts achieved a character error rate (CER) of 57.21%, an absolute improvement of 19.32% over the baseline (IPA-based universal acoustic modeling).
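The joint training of mapped and specific data described in the abstract suggests a network with shared hidden layers and one output layer per task. The following is only a minimal sketch of that general idea, not the authors' implementation: the layer sizes, the ReLU activation, the task names "mapped" and "specific", the task weights, and the class name MultiTaskDNN are all assumptions made for illustration, and only the forward pass and the joint loss for a single frame are shown.

```python
import math
import random

random.seed(0)

def init_layer(n_in, n_out):
    """Small random weights (n_out x n_in) and zero biases."""
    weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(layer, x):
    """Compute W x + b for one feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class MultiTaskDNN:
    """Hypothetical multi-task network: shared hidden layers, one softmax head per task."""

    def __init__(self, n_feat, n_hidden, n_layers, task_sizes):
        dims = [n_feat] + [n_hidden] * n_layers
        # Hidden layers are shared by every task.
        self.shared = [init_layer(a, b) for a, b in zip(dims[:-1], dims[1:])]
        # Each task gets its own output layer on top of the shared stack.
        self.heads = {task: init_layer(n_hidden, n_out) for task, n_out in task_sizes.items()}

    def hidden(self, x):
        h = x
        for layer in self.shared:
            h = [max(0.0, v) for v in affine(layer, h)]  # ReLU (assumed activation)
        return h

    def posteriors(self, x, task):
        return softmax(affine(self.heads[task], self.hidden(x)))

def joint_loss(model, x, labels, weights):
    """Weighted sum of per-task cross-entropies for one frame."""
    return sum(
        weights[task] * -math.log(model.posteriors(x, task)[label] + 1e-12)
        for task, label in labels.items()
    )

# Assumed sizes: 40-dim features, 30 target-language phoneme classes,
# 45 language-specific classes. None of these come from the paper.
model = MultiTaskDNN(n_feat=40, n_hidden=64, n_layers=3,
                     task_sizes={"mapped": 30, "specific": 45})
frame = [random.gauss(0.0, 1.0) for _ in range(40)]
loss = joint_loss(model, frame,
                  labels={"mapped": 3, "specific": 7},
                  weights={"mapped": 1.0, "specific": 0.5})
```

In a sketch like this, the gradient of the joint loss would update the shared layers from both tasks at once, which is one common way cross-lingual knowledge can be shared; the iterative retraining on automatically transcribed data mentioned in the abstract is not shown here.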

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering
Series
Advances in Engineering Research
Publication Date
December 2015
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Haitao Yao
AU  - Maobo An
AU  - Ji Xu
AU  - Jian Liu
PY  - 2015/12
DA  - 2015/12
TI  - Efficient Acoustic Modeling Method for Unsupervised Speech Recognition using Multi-Task Deep Neural Network
BT  - Proceedings of the 2015 4th National Conference on Electrical, Electronics and Computer Engineering
PB  - Atlantis Press
SP  - 365
EP  - 370
SN  - 2352-5401
UR  - https://doi.org/10.2991/nceece-15.2016.72
DO  - 10.2991/nceece-15.2016.72
ID  - Yao2015/12
ER  -