An Improved Feature Weighting Strategy in Chinese Text Categorization
Authors
Jia Song, Sijun Qin, Pengzhou Zhang
Corresponding Author
Jia Song
Available Online December 2015.
- DOI
- 10.2991/icmse-15.2015.40
- Keywords
- Feature Weighting, Inverse Category Frequency, TFIDF-ICF, Text Categorization
- Abstract
In document formalization, the feature weighting algorithm plays an essential role: it strongly affects the performance of the classifier. The classic TF-IDF algorithm ignores how features are distributed among the classes. To address this shortcoming, we develop a new feature weighting strategy that combines inverse category frequency (ICF) with traditional TF-IDF. We conducted a series of experiments on two text corpora, TanCorpV1.0 and Sogou, to analyze the performance of the strategy; these experiments are described in the paper. Experimental results demonstrate that the proposed strategy can, to some extent, improve the performance of text categorization.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
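The abstract describes combining TF-IDF with inverse category frequency (ICF) but does not state the formula. The following is a minimal sketch of one plausible formulation, not the authors' exact method: it assumes the weight is the product tf × idf × icf, with idf = log(1 + N/df) and icf = log(1 + C/cf), where cf is the number of categories in which a term occurs. The function name `tfidf_icf_weights`, the smoothing, and the toy data are all hypothetical, and documents are assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def tfidf_icf_weights(docs, labels):
    """Weight each term in each document by tf * idf * icf.

    docs   : list of token lists (text assumed to be pre-segmented)
    labels : category label for each document
    Returns one {term: weight} dict per document.
    Assumed formulation, not taken from the paper.
    """
    n_docs = len(docs)
    n_cats = len(set(labels))

    # Document frequency and the set of categories each term occurs in.
    df = Counter()
    cats_with_term = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            cats_with_term.setdefault(term, set()).add(label)

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        doc_weights = {}
        for term, count in tf.items():
            idf = math.log(1 + n_docs / df[term])
            # ICF is larger for terms concentrated in few categories.
            icf = math.log(1 + n_cats / len(cats_with_term[term]))
            doc_weights[term] = (count / total) * idf * icf
        weights.append(doc_weights)
    return weights


docs = [
    ["football", "match", "news"],
    ["stock", "market", "news"],
    ["football", "player"],
    ["stock", "fund"],
]
labels = ["sports", "finance", "sports", "finance"]
weights = tfidf_icf_weights(docs, labels)
```

In this toy corpus, "football" and "news" have the same term frequency and document frequency in the first document, so plain TF-IDF would score them equally. Because "news" appears in both categories while "football" appears in only one, the ICF factor gives "football" the higher weight, which is the class-distribution effect the abstract says TF-IDF ignores.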
Cite this article
TY - CONF
AU - Jia Song
AU - Sijun Qin
AU - Pengzhou Zhang
PY - 2015/12
DA - 2015/12
TI - An Improved Feature Weighting Strategy in Chinese Text Categorization
BT - Proceedings of the 2015 6th International Conference on Manufacturing Science and Engineering
PB - Atlantis Press
SP - 202
EP - 208
SN - 2352-5401
UR - https://doi.org/10.2991/icmse-15.2015.40
DO - 10.2991/icmse-15.2015.40
ID - Song2015/12
ER -