An Improved Feature Extraction Algorithm Based on CHI and MI
- DOI: 10.2991/iccmcee-15.2015.209
- Keywords: text classification, feature weight, chi-square statistic, mutual information
The high dimensionality of feature vectors in the vector space model (VSM) is an important problem in text classification. After preprocessing, the initial number of feature items is generally huge, and many of them are useless for text categorization. This not only increases the running time of the classification algorithm but also hurts classification performance. Previous studies comparing various feature extraction methods concluded that the chi-square statistic (CHI) and mutual information (MI) methods perform comparatively well, but each has shortcomings to varying degrees. In this paper, we analyze the shortcomings of the CHI and MI methods and propose a parameter-correction approach. Finally, the two methods are fused to improve classification performance.
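The abstract does not give the paper's corrected formulas, so the sketch below shows only the standard CHI and MI scores that the method builds on, plus a purely illustrative fusion step. For a term t and class c over N documents, A counts documents in c containing t, B counts documents outside c containing t, C counts documents in c lacking t, and D counts documents outside c lacking t. Then CHI(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), and the common document-frequency estimate of MI is log(A·N / ((A+B)(A+C))). The max-over-classes aggregation, the min-max normalization, the weight `lam`, and all function names are assumptions for illustration, not the authors' parameter correction or fusion rule.

```python
import math


def contingency(docs, labels, term, cls):
    """Count documents as (A, B, C, D) for one term/class pair.

    A: in class, contains term    B: other class, contains term
    C: in class, lacks term       D: other class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Standard CHI statistic: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Document-frequency estimate of MI: log(A*N / ((A+B)(A+C)))."""
    if a == 0:
        return float("-inf")  # the term never co-occurs with this class
    n = a + b + c + d
    return math.log(a * n / ((a + b) * (a + c)))


def global_scores(docs, labels, metric):
    """Score each term by its maximum over classes (an assumed aggregation)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    return {
        t: max(metric(*contingency(docs, labels, t, c)) for c in classes)
        for t in vocab
    }


def min_max(scores):
    """Rescale scores to [0, 1]; all-equal scores map to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def fused_scores(chi, mi, lam=0.5):
    """Hypothetical fusion: weighted sum of min-max normalized CHI and MI."""
    chi_n, mi_n = min_max(chi), min_max(mi)
    return {t: lam * chi_n[t] + (1 - lam) * mi_n[t] for t in chi}


def select_features(scores, k):
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


# Toy corpus: each document is reduced to its set of distinct tokens.
raw = ["goal match team win news", "team coach match",
       "stock market price news", "market bank price rate"]
labels = ["sports", "sports", "finance", "finance"]
docs = [set(text.split()) for text in raw]

chi = global_scores(docs, labels, chi_square)
mi = global_scores(docs, labels, mutual_information)
print(chi["match"], chi["goal"], mi["match"] == mi["goal"])
print(select_features(fused_scores(chi, mi), 3))
```

On this toy corpus, MI gives `goal` (seen in one sports document) the same score as `match` (seen in both sports documents), while CHI ranks `match` three times higher. This illustrates MI's well-known bias toward rare terms, one reason a correction or a combination with CHI can help.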
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - GuiChuan Feng
AU  - Shubin Cai
PY  - 2015/11
DA  - 2015/11
TI  - An Improved Feature Extraction Algorithm Based on CHI and MI
BT  - Proceedings of the 2015 4th International Conference on Computer, Mechatronics, Control and Electronic Engineering
PB  - Atlantis Press
SP  - 1106
EP  - 1109
SN  - 2352-5401
UR  - https://doi.org/10.2991/iccmcee-15.2015.209
DO  - 10.2991/iccmcee-15.2015.209
ID  - Feng2015/11
ER  - 