Proceedings of the 2015 International Conference on Management Science and Management Innovation

An Algorithm of Feature Selection in Text Categorization Based on Gini-index

Authors
Wei-Dong Zhu, Bo Wang, Yong-Min Lin
Corresponding Author
Wei-Dong Zhu
Available Online August 2015.
DOI
10.2991/msmi-15.2015.50
Keywords
Text categorization, Feature selection, Gini-index, Feature selection function.
Abstract

With the rapid development of the World Wide Web, text categorization has come to play an important role in organizing and processing large amounts of text data. The first and major problem of text categorization is how to select the best subset of features from the original high-dimensional feature space, in order to reduce dimensionality and improve classification performance. The Gini index is a measure that has long been used for attribute selection in decision trees, where it performs at a near state-of-the-art level. However, relatively little work has been done on applying the Gini index to text feature selection. We use an improved Gini index for text feature selection and construct a measure function based on it. We compare it with four other feature selection measures, using two kinds of classifiers on two different document corpora. The experimental results show that its classification performance is comparable to that of the other text feature selection approaches, while its main advantage is the time complexity of the algorithm.
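
To make the idea of a Gini-based term score concrete, the following is a minimal sketch only. The abstract does not give the authors' measure function, so the formula used here (the sum over classes of P(t|c)^2 * P(c|t)^2, estimated from document frequencies) is an assumption chosen for illustration, not the paper's method. The names gini_scores and select_top_k are hypothetical.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels):
    """Score each term t by sum over classes c of P(t|c)^2 * P(c|t)^2.

    docs: list of token lists; labels: parallel list of class labels.
    ASSUMPTION: this purity-style formula and the document-frequency
    estimates are illustrative; the source abstract does not specify them.
    """
    class_docs = Counter(labels)            # number of documents per class
    term_docs = Counter()                   # number of documents containing term
    term_class_docs = defaultdict(Counter)  # term -> class -> document count

    for tokens, label in zip(docs, labels):
        for term in set(tokens):            # count each term once per document
            term_docs[term] += 1
            term_class_docs[term][label] += 1

    scores = {}
    for term, per_class in term_class_docs.items():
        score = 0.0
        for c, n_tc in per_class.items():
            p_t_given_c = n_tc / class_docs[c]
            p_c_given_t = n_tc / term_docs[term]
            score += (p_t_given_c ** 2) * (p_c_given_t ** 2)
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [
        ["goal", "match"],
        ["goal", "team"],
        ["stock", "market"],
        ["stock", "team"],
    ]
    labels = ["sport", "sport", "fin", "fin"]
    scores = gini_scores(docs, labels)
    print(select_top_k(scores, 2))
```

Under this assumed score, a term that occurs in every document of one class and nowhere else receives the maximum value of 1, while a term spread evenly across classes scores low. Each term's score needs only one pass over its per-class counts, which is in keeping with the abstract's emphasis on time complexity.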

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Management Science and Management Innovation
Series
Advances in Economics, Business and Management Research
Publication Date
August 2015
ISSN
2352-5428

Cite this article

TY  - CONF
AU  - Wei-Dong Zhu
AU  - Bo Wang
AU  - Yong-Min Lin
PY  - 2015/08
DA  - 2015/08
TI  - An Algorithm of Feature Selection in Text Categorization Based on Gini-index
BT  - Proceedings of the 2015 International Conference on Management Science and Management Innovation
PB  - Atlantis Press
SP  - 272
EP  - 278
SN  - 2352-5428
UR  - https://doi.org/10.2991/msmi-15.2015.50
DO  - 10.2991/msmi-15.2015.50
ID  - Zhu2015/08
ER  -