Research on the key technologies of corpus preprocessing in Mongolian-Chinese SMT
Authors
Jin-ting Li, Hong-xu Hou, Jing Wu, Hong-bin Wang, Wen-ting Fan
Corresponding Author
Jin-ting Li
Available Online November 2016.
- DOI
- 10.2991/icimm-16.2016.119
- Keywords
- Mongolian-Chinese SMT; corpus preprocessing; Mongolian morphological analysis; Chinese word segmentation.
- Abstract
Traditional preprocessing for Mongolian morphological analysis relies on suffix segmentation and stemming. However, Mongolian has many case forms. If case is left unprocessed, the corpus suffers from data sparsity, which leads to poor translation performance. We therefore summarize and study existing corpus preprocessing methods, focusing on the effect of case processing, in order to improve the performance of Mongolian-Chinese SMT through Mongolian morphological analysis. Experiments show an improvement of about 3.22 (relative) in the BLEU score of SMT over baseline system 1 when the preprocessing method is optimized.
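The link between case processing and data sparsity can be illustrated with a minimal sketch. It is not the authors' method, which this page does not describe. It assumes that a case suffix is separated from its stem by a narrow no-break space (U+202F), as is common in Unicode-encoded traditional Mongolian text. The stems and suffixes are placeholders, not real Mongolian, and the function names and the `+` marker are hypothetical.

```python
NNBSP = "\u202f"  # narrow no-break space, assumed here to mark a suffix boundary


def split_case_suffixes(token, marker="+"):
    """Split one token at each NNBSP and prefix every suffix piece with `marker`."""
    stem, *suffixes = token.split(NNBSP)
    return [stem] + [marker + s for s in suffixes if s]


def preprocess(sentence):
    """Tokenize on ASCII spaces only, then detach suffixes from each token.

    A bare str.split() would also split on U+202F and lose the
    stem/suffix distinction, so the separator is given explicitly.
    """
    pieces = []
    for token in sentence.split(" "):
        if token:
            pieces.extend(split_case_suffixes(token))
    return pieces


def vocabulary(sentences, tokenizer):
    """Collect the set of distinct token types produced by `tokenizer`."""
    return {piece for s in sentences for piece in tokenizer(s)}


# Placeholder corpus: every stem appears with every suffix.
stems = ["stemA", "stemB", "stemC"]
suffixes = ["sfx1", "sfx2", "sfx3"]
corpus = [stem + NNBSP + sfx for stem in stems for sfx in suffixes]

raw_vocab = vocabulary(corpus, lambda s: s.split(" "))  # one type per inflected form
split_vocab = vocabulary(corpus, preprocess)            # stems and suffixes counted separately

print(len(raw_vocab), len(split_vocab))
```

In this toy corpus the unsplit vocabulary grows with the product of stems and suffixes, while the split vocabulary grows with their sum, which is the sparsity effect the abstract attributes to unprocessed case.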
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Jin-ting Li
AU - Hong-xu Hou
AU - Jing Wu
AU - Hong-bin Wang
AU - Wen-ting Fan
PY - 2016/11
DA - 2016/11
TI - Research on the key technologies of corpus preprocessing in Mongolian-Chinese SMT
BT - Proceedings of the 6th International Conference on Information Engineering for Mechanics and Materials
PB - Atlantis Press
SP - 665
EP - 672
SN - 2352-5401
UR - https://doi.org/10.2991/icimm-16.2016.119
DO - 10.2991/icimm-16.2016.119
ID - Li2016/11
ER -