Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)

Research on Text Classification Based on Improved TF-IDF Algorithm

Authors
Huilong Fan, Yongbin Qin
Corresponding Author
Huilong Fan
Available Online May 2018.
DOI
10.2991/ncce-18.2018.79
Keywords
TF-IDF; text classification; Bayesian; evaluation index.
Abstract

To calculate feature weights for automatic text classification, we start from TF-IDF, the most widely used weighting algorithm. Despite its popularity, TF-IDF has a weakness: when weights are calculated, it does not adequately reflect how features are distributed across categories. This paper proposes an improved TF-IDF algorithm (TF-IDCRF) that takes the relationships between classes into account. The IDF formula is modified to correct the insufficient category discrimination of features, and the naive Bayes algorithm is then used to perform the classification. Finally, the proposed algorithm is compared with two other improved TF-IDF algorithms. Results on three text classification evaluation metrics show that the proposed algorithm has certain advantages in text classification.
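
For orientation, the baseline weighting that the paper sets out to improve can be sketched as follows. This is the standard textbook TF-IDF only, not the proposed TF-IDCRF variant, whose modified IDF formula is not stated in this abstract. The function name tf_idf and the toy documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook form: tf = count / document length, idf = log(N / df).
    This is an assumed baseline, not the paper's TF-IDCRF formula.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): two finance documents and one sports document.
docs = [
    ["stock", "market", "rises"],
    ["stock", "prices", "fall"],
    ["team", "wins", "match"],
]
weights = tf_idf(docs)
```

Note that "stock" receives a lower weight than "market" in the first document simply because it appears in more documents, even though both of those documents belong to the same topic. Plain IDF uses no class information, which is the kind of gap a class-aware IDF would aim to address.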

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
Series
Advances in Intelligent Systems Research
Publication Date
May 2018
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Huilong Fan
AU  - Yongbin Qin
PY  - 2018/05
DA  - 2018/05
TI  - Research on Text Classification Based on Improved TF-IDF Algorithm
BT  - Proceedings of the 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018)
PB  - Atlantis Press
SP  - 501
EP  - 506
SN  - 1951-6851
UR  - https://doi.org/10.2991/ncce-18.2018.79
DO  - 10.2991/ncce-18.2018.79
ID  - Fan2018/05
ER  -