Query-by-example spoken term detection based on phonetic posteriorgram
- DOI
- 10.2991/icemct-15.2015.256
- Keywords
- query-by-example; spoken term detection; softmax output features; dynamic time warping.
- Abstract
Spoken term detection in low-resource situations is a challenging problem because traditional large vocabulary continuous speech recognition (LVCSR) approaches are often unusable when little transcribed training data is available. This paper introduces a method that uses deep neural network (DNN) softmax outputs as input features for a query-by-example (QBE) spoken term detection (STD) system. Matches between queries and test utterances are located with a modified dynamic time warping (DTW) search. Subsystems are built with unsupervised Gaussian mixture model (GMM) and DNN monophone models trained on Chinese and English, and are evaluated on the SWS 2013 multilingual database of low-resource languages. Score-level fusion of these subsystems is shown to improve performance significantly over the baseline results.
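The abstract names the ingredients (softmax posteriorgrams, a modified DTW search, score-level fusion) but does not give their exact form. The following minimal Python sketch illustrates them under stated assumptions and is not the authors' implementation. It assumes three things. The local distance between frames is the negative log inner product of two posterior vectors. The "modified" DTW is taken to be subsequence DTW, which lets a match start and end anywhere in the test utterance and normalizes the path cost by its length. Subsystems are fused by averaging z-normalized scores. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[float]  # one posterior vector (one entry per phone class) per frame


def softmax(logits: Sequence[float]) -> Frame:
    """Turn one frame of DNN output logits into a posterior vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posteriorgram(logit_frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Stack per-frame softmax outputs into a posteriorgram (frames x classes)."""
    return [softmax(frame) for frame in logit_frames]


def frame_distance(q: Frame, t: Frame, eps: float = 1e-10) -> float:
    """Assumed local distance: -log of the inner product of two posterior vectors."""
    dot = sum(a * b for a, b in zip(q, t))
    return -math.log(max(dot, eps))


def subsequence_dtw(query: List[Frame], test: List[Frame]) -> Tuple[float, int, int]:
    """Find the best match of `query` anywhere inside `test`.

    Returns (score, start_frame, end_frame); score is the negative
    length-normalized path cost, so higher means a better match.
    """
    n, m = len(query), len(test)
    cost = [[math.inf] * m for _ in range(n)]  # accumulated path cost
    length = [[0] * m for _ in range(n)]       # path length, for normalization
    start = [[0] * m for _ in range(n)]        # test frame where the path began

    # First query frame: a path may start at any test frame (free start).
    for j in range(m):
        cost[0][j] = frame_distance(query[0], test[j])
        length[0][j] = 1
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            d = frame_distance(query[i], test[j])
            # Predecessors: vertical, then horizontal and diagonal steps.
            preds = [(cost[i - 1][j], i - 1, j)]
            if j > 0:
                preds.append((cost[i][j - 1], i, j - 1))
                preds.append((cost[i - 1][j - 1], i - 1, j - 1))
            prev_cost, pi, pj = min(preds)
            cost[i][j] = prev_cost + d
            length[i][j] = length[pi][pj] + 1
            start[i][j] = start[pi][pj]

    # Last query frame: a path may end at any test frame (free end).
    end = min(range(m), key=lambda j: cost[n - 1][j] / length[n - 1][j])
    avg_cost = cost[n - 1][end] / length[n - 1][end]
    return -avg_cost, start[n - 1][end], end


def z_normalize(scores: Sequence[float]) -> List[float]:
    """Normalize one subsystem's scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores)) or 1.0
    return [(s - mean) / std for s in scores]


def fuse(subsystem_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed fusion rule: average the z-normalized scores of all subsystems."""
    normed = [z_normalize(s) for s in subsystem_scores]
    return [sum(col) / len(normed) for col in zip(*normed)]


if __name__ == "__main__":
    A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
    query = [A, B]
    test = [C, A, B, C]  # the query pattern occurs at test frames 1..2
    score, s, e = subsequence_dtw(query, test)
    print(f"match frames {s}..{e}, score {score:.3f}")
```

In this sketch, each subsystem (for example, GMM posteriorgrams or DNN posteriorgrams from one training language) would produce its own list of detection scores for the same query and utterance pairs, and `fuse` would combine them. Z-normalizing first puts subsystems with different score ranges on a common scale before averaging.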
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Beili Song
AU - Weiqiang Zhang
AU - Meng Cai
AU - Jia Liu
AU - Michael T. Johnson
PY - 2015/06
DA - 2015/06
TI - Query-by-example spoken term detection based on phonetic posteriorgram
BT - Proceedings of the 2015 International Conference on Education, Management and Computing Technology
PB - Atlantis Press
SP - 1251
EP - 1256
SN - 2352-5398
UR - https://doi.org/10.2991/icemct-15.2015.256
DO - 10.2991/icemct-15.2015.256
ID - Song2015/06
ER -