A Phrase Combination Approach to Patent SMT
- 10.2991/jcis.2008.99
- statistical machine translation, patent, phrase combination, word segmentation
This paper presents a phrase combination approach to patent SMT (Statistical Machine Translation) for Japanese to English. To minimize the segmentation problems caused by the many OOV (out-of-vocabulary) words in patent texts, character-based translation phrases are first introduced to avoid segmentation errors in translation modeling. Word-based translation phrases, which are established to utilize the dependent word-level information, are then combined with the character-based translation table by linearly integrating their probabilities. Our experiments on the NTCIR corpus indicate that the proposed method significantly outperformed the original word-based approach.
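The abstract does not state the interpolation formula or the weight used, so the following is only a minimal sketch of what linearly integrating two phrase tables could look like. It assumes each table maps a (source phrase, target phrase) pair to a translation probability, that a pair missing from one table contributes probability 0 from that table, and that a single weight `lam` balances the two. The function name, the weight, and the toy entries are all hypothetical and are not taken from the paper.

```python
def interpolate_phrase_tables(char_table, word_table, lam=0.5):
    """Linearly combine two phrase tables.

    Each table maps (source_phrase, target_phrase) -> probability.
    lam is the weight on the character-based table (an assumed
    parameter; the abstract does not report its value).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    combined = {}
    # Visit every phrase pair that appears in either table.
    for pair in set(char_table) | set(word_table):
        p_char = char_table.get(pair, 0.0)  # assumption: missing pair -> 0
        p_word = word_table.get(pair, 0.0)
        combined[pair] = lam * p_char + (1.0 - lam) * p_word
    return combined


# Toy, made-up entries for illustration only.
char_table = {("特許", "patent"): 0.6, ("特許", "license"): 0.4}
word_table = {("特許", "patent"): 0.9, ("発明", "invention"): 1.0}

combined = interpolate_phrase_tables(char_table, word_table, lam=0.5)
for (src, tgt), prob in sorted(combined.items()):
    print(src, tgt, round(prob, 3))
```

In a sketch like this, phrase pairs found only by the character-based table still receive a nonzero score, which is one way such a combination could help with OOV words that word segmentation gets wrong.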
- © 2008, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Junguo Zhu
AU - Muyun Yang
AU - Tiejun Zhao
AU - Sheng Li
AU - Qi Haoliang
PY - 2008/12
DA - 2008/12
TI - A Phrase Combination Approach to Patent SMT
BT - Proceedings of the 11th Joint Conference on Information Sciences (JCIS 2008)
PB - Atlantis Press
SP - 590
EP - 594
SN - 1951-6851
UR - https://doi.org/10.2991/jcis.2008.99
DO - 10.2991/jcis.2008.99
ID - Zhu2008/12
ER -