Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)

Similarity calculation based on Mongolian news corpus

Authors
Yaowen Gao, Feilong Bao, Guanglai Gao
Corresponding Author
Yaowen Gao
Available Online May 2018.
DOI
10.2991/amcce-18.2018.31
Keywords
Similarity, Mongolian, Vector Space Model.
Abstract

Similarity calculation is an important part of new event detection, and effective computation of text similarity can remove redundant information and improve the efficiency of users' queries. This paper studies the calculation of similarity between Mongolian news texts. Because the Mongolian news corpus is non-standard, it must be preprocessed before the later steps, which also improves efficiency. The news corpus is therefore first preprocessed, including code conversion, text proofreading, stop-word removal and suffix removal. The news texts are then mapped to vectors with a vector space model, and the similarity between the vectors is calculated with the cosine formula. Finally, precision, recall and F-measure are used as the evaluation criteria for the experimental results. The results show that the experimental approach performs better than the manual one.
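The abstract outlines a pipeline of preprocessing, vector space mapping, cosine similarity and precision/recall/F-measure evaluation. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation: the stop-word list (`STOP_WORDS`) and suffix list (`SUFFIXES`) are hypothetical Latin-transliterated placeholders, code conversion and proofreading are assumed to have been done already, raw term frequency is assumed as the vector weighting, the 0.5 threshold in `similar_pairs` is an arbitrary illustrative value, and evaluation is assumed to be over document pairs. All function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical resources: the abstract does not list the authors' stop words
# or suffix inventory, so these Latin-transliterated entries are placeholders.
STOP_WORDS = {"ba", "ni"}
SUFFIXES = ("iin", "aas")


def preprocess(text):
    """Tokenize on whitespace, drop stop words, strip at most one known suffix."""
    tokens = []
    for tok in text.lower().split():
        if tok in STOP_WORDS:
            continue
        for suf in SUFFIXES:
            # Keep a non-empty stem: only strip when the token is longer than the suffix.
            if tok.endswith(suf) and len(tok) > len(suf):
                tok = tok[: -len(suf)]
                break
        tokens.append(tok)
    return tokens


def to_vector(text):
    """Map a document to a sparse term-frequency vector (assumed weighting)."""
    return Counter(preprocess(text))


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(docs, threshold=0.5):
    """Return index pairs (i, j), i < j, whose cosine similarity is >= threshold."""
    vecs = [to_vector(d) for d in docs]
    return {
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if cosine(vecs[i], vecs[j]) >= threshold
    }


def evaluate(predicted, gold):
    """Precision, recall and F-measure of predicted pairs against gold pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Toy documents (hypothetical), with a toy gold set of similar pairs.
    docs = ["hotiin medee ba uls", "hot medee", "uls"]
    pred = similar_pairs(docs)
    print(pred, evaluate(pred, {(0, 1)}))
```

Swapping `to_vector` for a TF-IDF weighting would only change how the counts are scaled; the cosine and evaluation steps would stay the same.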

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Series
Advances in Engineering Research
Publication Date
May 2018
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Yaowen Gao
AU  - Feilong Bao
AU  - Guanglai Gao
PY  - 2018/05
DA  - 2018/05
TI  - Similarity calculation based on Mongolian news corpus
BT  - Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
PB  - Atlantis Press
SP  - 176
EP  - 181
SN  - 2352-5401
UR  - https://doi.org/10.2991/amcce-18.2018.31
DO  - 10.2991/amcce-18.2018.31
ID  - Gao2018/05
ER  -