Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)

An Adaptive Chinese Word Segmentation Method

Authors
Zhi Yuan
Corresponding Author
Zhi Yuan
Available Online May 2018.
DOI
10.2991/amcce-18.2018.96
Keywords
Chinese word segmentation; active learning; CRF; domain adaptation
Abstract

Because training corpora cover only a limited range of domains, statistical Chinese word segmentation adapts poorly to new domains. Since a large-scale annotated corpus is difficult to obtain for a target domain, this paper proposes a domain adaptation method that combines domain dictionaries with an active learning algorithm. First, the differences between the target-domain text and the existing annotated corpus are analyzed statistically, and a small corpus of the most discrepant unlabeled sentences is selected and prioritized for manual annotation. Next, n-gram statistics from large-scale texts are combined with this corpus to train the segmentation model for the target domain. Finally, domain adaptation is achieved by integrating lexical information into the CRF statistical word segmentation model. Experiments show that this method significantly improves the domain adaptability of statistical Chinese word segmentation.
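
The sentence-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how discrepancy is measured, so the score used here (the fraction of a sentence's character bigrams that never occur in the annotated corpus) is an assumption, and the names `char_bigrams`, `discrepancy`, and `select_for_annotation` are hypothetical, not taken from the paper.

```python
def char_bigrams(sentence):
    """Return the adjacent character pairs of a sentence."""
    return [sentence[i:i + 2] for i in range(len(sentence) - 1)]


def discrepancy(sentence, known_bigrams):
    """Assumed score: share of bigrams unseen in the annotated corpus."""
    grams = char_bigrams(sentence)
    if not grams:
        return 0.0
    unseen = sum(1 for g in grams if g not in known_bigrams)
    return unseen / len(grams)


def select_for_annotation(annotated, unlabeled, budget):
    """Pick the `budget` unlabeled sentences that differ most from the
    annotated corpus, so they can be sent for manual annotation first."""
    known = set()
    for sentence in annotated:
        known.update(char_bigrams(sentence))
    ranked = sorted(unlabeled,
                    key=lambda s: discrepancy(s, known),
                    reverse=True)
    return ranked[:budget]


# Toy data: a general-domain annotated corpus and target-domain raw text.
annotated = ["我爱北京", "北京欢迎你"]
unlabeled = ["我爱北京", "冠状动脉造影", "北京的医院"]
selected = select_for_annotation(annotated, unlabeled, budget=2)
```

In this toy run the medical sentence shares no bigrams with the annotated corpus and is ranked first, while the sentence already present in the annotated corpus is never selected. A real system would likely use a richer statistic than unseen bigrams, and the later steps (n-gram statistics and dictionary features in the CRF) are not covered by this sketch.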

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
Series
Advances in Engineering Research
Publication Date
May 2018
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Zhi Yuan
PY  - 2018/05
DA  - 2018/05
TI  - An Adaptive Chinese Word Segmentation Method
BT  - Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
PB  - Atlantis Press
SP  - 556
EP  - 561
SN  - 2352-5401
UR  - https://doi.org/10.2991/amcce-18.2018.96
DO  - 10.2991/amcce-18.2018.96
ID  - Yuan2018/05
ER  -