Document Structure Identification Method Based on Conditional Random Field
- DOI
- 10.2991/icmcm-16.2016.71
- Keywords
- document structure identification; sequence labeling; CRF
- Abstract
Based on an in-depth analysis of the structural and heading features of documents, this paper examines template-based classification, statistics-based classification, and sequence labeling based on CRF (Conditional Random Field). It then proposes treating document structure identification as a sequential data labeling task: a CRF model is built with feature templates and trained by supervised learning, and the trained model is then used to identify document structure. Experimental results show that identifying paragraph roles from the sequential structure of a document yields higher accuracy and provides a degree of fault tolerance. It is also observed that applying the CRF multiple times can further improve identification accuracy.
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
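The sequence-labeling view described in the abstract can be illustrated with a minimal sketch. It assumes three hypothetical paragraph roles (title, heading, body), a few hypothetical observation features drawn from the current and previous paragraph (loosely in the spirit of a feature template), and hand-set weights standing in for trained CRF parameters. None of these names or values come from the paper; the code only shows how a linear-chain model scores a whole paragraph sequence and decodes it with the Viterbi algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical paragraph roles; the paper's actual label set is not given here.
LABELS = ["title", "heading", "body"]


def local_features(text: str) -> List[str]:
    """Surface features of one paragraph (illustrative choices only)."""
    feats = ["short" if len(text.split()) <= 8 else "long"]
    if text[:1].isdigit():
        feats.append("starts_with_digit")
    if text.rstrip().endswith("."):
        feats.append("ends_with_period")
    return feats


def paragraph_features(paras: List[str], i: int) -> List[str]:
    """Features for paragraph i from a window over positions i-1 and i."""
    feats = [f"cur:{f}" for f in local_features(paras[i])]
    if i == 0:
        feats.append("BOS")  # marks the first paragraph of the document
    else:
        feats += [f"prev:{f}" for f in local_features(paras[i - 1])]
    return feats


def viterbi(paras: List[str],
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model."""
    if not paras:
        return []

    def emit_score(i: int, label: str) -> float:
        return sum(emit.get((f, label), 0.0) for f in paragraph_features(paras, i))

    score = [{y: emit_score(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(paras)):
        row, ptr = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[-1][p] + trans.get((p, y), 0.0))
            row[y] = score[-1][prev] + trans.get((prev, y), 0.0) + emit_score(i, y)
            ptr[y] = prev
        score.append(row)
        back.append(ptr)

    # Backtrack from the best final label to recover the full path.
    label = max(LABELS, key=lambda y: score[-1][y])
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]


# Hand-set weights (assumptions); a real CRF would learn these from labeled data.
EMIT = {
    ("BOS", "title"): 2.0,
    ("cur:short", "title"): 0.5,
    ("cur:short", "heading"): 0.5,
    ("cur:starts_with_digit", "heading"): 2.0,
    ("cur:long", "body"): 1.5,
    ("cur:ends_with_period", "body"): 1.0,
    ("prev:starts_with_digit", "body"): 0.5,
}
TRANS = {
    ("title", "title"): -2.0,
    ("heading", "title"): -2.0,
    ("body", "title"): -2.0,
    ("heading", "body"): 1.0,
}

PARAS = [
    "Document Structure Identification",
    "1 Introduction",
    "Documents contain titles, headings and body text that a reader recognises easily.",
    "2 Method",
    "Each paragraph is labelled using its own features and those of its neighbours.",
]

print(viterbi(PARAS, EMIT, TRANS))
```

Because each paragraph's label is chosen jointly with its neighbours through the transition weights, an ambiguous paragraph can still be assigned a plausible role from its context, which is one way to read the fault tolerance mentioned in the abstract.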
Cite this article
TY - CONF
AU - Yang Lei
AU - Yingai Tian
AU - Ning Li
AU - Xiaolong Gao
PY - 2016/12
DA - 2016/12
TI - Document Structure Identification Method Based on Conditional Random Field
BT - Proceedings of the 2016 7th International Conference on Mechatronics, Control and Materials (ICMCM 2016)
PB - Atlantis Press
SP - 354
EP - 361
SN - 2352-5401
UR - https://doi.org/10.2991/icmcm-16.2016.71
DO - 10.2991/icmcm-16.2016.71
ID - Lei2016/12
ER -