An Adaptive Chinese Word Segmentation Method

Zhi Yuan

doi:10.2991/amcce-18.2018.96

Authors: Zhi Yuan

Corresponding Author: Zhi Yuan

Available Online May 2018.

DOI: 10.2991/amcce-18.2018.96
Keywords: Chinese word segmentation; active learning; CRF; domain adaptation
Abstract: Because training corpora are limited to particular domains, statistical Chinese word segmentation adapts poorly to new domains. Since large-scale annotated corpora are difficult to obtain in the target domain, this paper proposes a domain adaptation method that combines domain dictionaries with active learning. First, by statistically analyzing the differences between the target-domain text and the existing annotated corpus, a small-scale corpus containing the largest number of unannotated discrepant sentences is selected and prioritized for manual annotation. Next, n-gram statistics from large-scale texts are combined to train the segmentation model for the target domain. Finally, domain adaptation is achieved by integrating lexical information into the CRF statistical word segmentation model. Experiments show that this method significantly improves the domain adaptability of statistical Chinese word segmentation.
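The sentence-selection step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how discrepancy is measured, so the scoring rule here is an assumption: a character bigram counts as discrepant if it never appears in the existing annotated corpus. The function names `char_bigrams` and `select_for_annotation` and the sample sentences are hypothetical and are not taken from the paper.

```python
def char_bigrams(sentence):
    """Return the list of adjacent character pairs in a sentence."""
    return [sentence[i:i + 2] for i in range(len(sentence) - 1)]


def select_for_annotation(annotated_sentences, target_sentences, k):
    """Pick the k target-domain sentences with the most unseen bigrams.

    Assumed scoring rule (not from the paper): a bigram is "discrepant"
    if it never occurs in the annotated source-domain corpus.
    """
    seen = set()
    for sentence in annotated_sentences:
        seen.update(char_bigrams(sentence))

    def discrepancy(sentence):
        # Count distinct bigrams absent from the annotated corpus.
        return sum(1 for bg in set(char_bigrams(sentence)) if bg not in seen)

    # sorted() is stable, so tied sentences keep their corpus order.
    ranked = sorted(target_sentences, key=discrepancy, reverse=True)
    return ranked[:k]


# Toy data: a general-domain annotated corpus and a medical target domain.
annotated = ["我们去北京", "他在北京工作"]
target = ["我们去北京", "患者出现高血压症状", "他在医院工作"]
selected = select_for_annotation(annotated, target, k=2)
```

In this sketch the sentences ranked highest would be sent for manual annotation first. A fuller version might weight bigrams by frequency or use model uncertainty instead of raw novelty.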
Copyright: © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
Series: Advances in Engineering Research
Publication Date: May 2018
ISBN: 978-94-6252-508-5
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Zhi Yuan
PY  - 2018/05
DA  - 2018/05
TI  - An Adaptive Chinese Word Segmentation Method
BT  - Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
PB  - Atlantis Press
SP  - 556
EP  - 561
SN  - 2352-5401
UR  - https://doi.org/10.2991/amcce-18.2018.96
DO  - 10.2991/amcce-18.2018.96
ID  - Yuan2018/05
ER  -
