Proceedings of the 3rd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2019)

Study of Tibetan Text Classification based on fastText

Authors
Wei Ma, Hongzhi Yu, Jing Ma
Corresponding Author
Wei Ma
Available Online July 2019.
DOI
10.2991/iccia-19.2019.58
Keywords
text classification, Tibetan text, fastText.
Abstract

Tibetan text classification is an important research topic in Tibetan information processing. In this paper, we apply the fastText text classification tool and fastText pre-trained word vectors to Tibetan text classification. In the experiment, for a Tibetan corpus segmented at Tibetan syllable points, we represent every word in each document with its fastText pre-trained word vector and then average all the word vectors in that document. The resulting average vector (docvec) represents the document and is fed into an SVM classifier. The results show that the model outperforms the traditional Tibetan text classification method, improving the F-measure by 10%.
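The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for the fastText pre-trained vectors, the helper names `split_syllables` and `doc_vector` are hypothetical, and returning a zero vector for a document with no known syllables is an assumption. The sketch treats the tsheg (U+0F0B) as the syllable point and uses only the Python standard library.

```python
# Illustrative sketch only: toy 3-dimensional vectors stand in for the
# fastText pre-trained Tibetan vectors, which are not reproduced here.
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark, used here as the syllable point

# Two Tibetan syllables written as escapes so the code points are unambiguous.
BOD = "\u0f56\u0f7c\u0f51"  # the syllable "bod"
YIG = "\u0f61\u0f72\u0f42"  # the syllable "yig"


def split_syllables(text):
    """Split Tibetan text on the tsheg and drop empty pieces."""
    return [s for s in text.split(TSHEG) if s.strip()]


def doc_vector(syllables, vectors, dim):
    """Average the vectors of all in-vocabulary syllables (the docvec)."""
    known = [vectors[s] for s in syllables if s in vectors]
    if not known:
        return [0.0] * dim  # assumption: zero vector when nothing is known
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical lookup table; real values would come from the pre-trained model.
toy_vectors = {
    BOD: [1.0, 0.0, 2.0],
    YIG: [3.0, 2.0, 0.0],
}

text = BOD + TSHEG + YIG + TSHEG
syllables = split_syllables(text)
docvec = doc_vector(syllables, toy_vectors, dim=3)
print(len(syllables), docvec)
```

In the pipeline the abstract describes, one such docvec per document, together with its class label, would then be passed to an SVM classifier. That step is left out here to keep the sketch free of third-party dependencies.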

Copyright
© 2019, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 3rd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2019)
Series
Advances in Computer Science Research
Publication Date
July 2019
ISBN
10.2991/iccia-19.2019.58
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Wei Ma
AU  - Hongzhi Yu
AU  - Jing Ma
PY  - 2019/07
DA  - 2019/07
TI  - Study of Tibetan Text Classification based on fastText
BT  - Proceedings of the 3rd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2019)
PB  - Atlantis Press
SP  - 374
EP  - 380
SN  - 2352-538X
UR  - https://doi.org/10.2991/iccia-19.2019.58
DO  - 10.2991/iccia-19.2019.58
ID  - Ma2019/07
ER  -