Proceedings of the International Conference on Communication and Electronic Information Engineering (CEIE 2016)

Research on Web Character Information Extraction Based on Semantic Similarity

Authors
Baocheng Wang, Wei Huang, Zhongren Li, Ke Xiao
Corresponding Author
Baocheng Wang
Available Online October 2016.
DOI
https://doi.org/10.2991/ceie-16.2017.85
Keywords
Semantic Similarity; Character Information Extraction; Machine Learning
Abstract
To address the loss of comprehensiveness that occurs when extracting information from large amounts of data, this paper proposes a character information extraction method based on a semantic similarity algorithm, with the aim of making character information extraction from massive network data more comprehensive. The algorithm is applied to a semantic tree to select synonyms of a word, and the character feature set, extended by semantic similarity, is then used for character information extraction. The results show that recall reaches 81.87% while the accuracy rate remains essentially unchanged. The method therefore clearly improves the comprehensiveness of character information extraction and can be applied to network data.
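The abstract describes choosing synonyms from a semantic tree and extending the character feature set by semantic similarity, but it does not give the similarity formula or name the tree. The sketch below is therefore only an illustration under stated assumptions: it swaps in the Wu-Palmer measure (twice the depth of the lowest common ancestor divided by the summed depths of the two words), and the toy tree `TREE`, the helper names, and the threshold of 0.7 are all hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy semantic tree: each node maps to its parent; the root maps to None.
TREE: Dict[str, Optional[str]] = {
    "entity": None,
    "person": "entity",
    "attribute": "entity",
    "occupation": "attribute",
    "birthplace": "attribute",
    "job": "occupation",
    "profession": "occupation",
    "hometown": "birthplace",
    "native_place": "birthplace",
}


def path_to_root(node: str, tree: Dict[str, Optional[str]]) -> List[str]:
    """Return the list [node, parent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path


def depth(node: str, tree: Dict[str, Optional[str]]) -> int:
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node, tree))


def wu_palmer(a: str, b: str, tree: Dict[str, Optional[str]]) -> float:
    """Wu-Palmer similarity (a stand-in, not necessarily the paper's measure)."""
    ancestors_b = set(path_to_root(b, tree))
    # The first ancestor of a (walking upward) that is also an ancestor of b is the LCA.
    lca = next(n for n in path_to_root(a, tree) if n in ancestors_b)
    return 2 * depth(lca, tree) / (depth(a, tree) + depth(b, tree))


def expand_features(
    seeds: Set[str],
    vocabulary: Iterable[str],
    tree: Dict[str, Optional[str]],
    threshold: float = 0.7,  # hypothetical cut-off
) -> Set[str]:
    """Add every vocabulary word whose similarity to some seed reaches the threshold."""
    vocab = list(vocabulary)
    expanded = set(seeds)
    for seed in seeds:
        for word in vocab:
            if word not in expanded and wu_palmer(seed, word, tree) >= threshold:
                expanded.add(word)
    return expanded


if __name__ == "__main__":
    vocab = ["profession", "hometown", "native_place"]
    print(sorted(expand_features({"job", "hometown"}, vocab, TREE)))
    # prints ['hometown', 'job', 'native_place', 'profession']
```

In this toy tree, sibling leaves such as `job` and `profession` score 0.75 and are merged, while `job` and `hometown` share only `attribute` and score 0.5, so they stay apart. Raising the threshold trades recall for precision, which mirrors the trade-off the abstract reports.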
Open Access
This is an open access article distributed under the CC BY-NC license.

Volume Title
Proceedings of the International Conference on Communication and Electronic Information Engineering (CEIE 2016)
Series
Advances in Engineering Research
Publication Date
October 2016
ISBN
978-94-6252-312-8
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Baocheng Wang
AU  - Wei Huang
AU  - Zhongren Li
AU  - Ke Xiao
PY  - 2016/10
DA  - 2016/10
TI  - Research on Web Character Information Extraction Based on Semantic Similarity
BT  - Proceedings of the International Conference on Communication and Electronic Information Engineering (CEIE 2016)
PB  - Atlantis Press
SP  - 663
EP  - 670
SN  - 2352-5401
UR  - https://doi.org/10.2991/ceie-16.2017.85
DO  - https://doi.org/10.2991/ceie-16.2017.85
ID  - Wang2016/10
ER  -