Proceedings of the 2014 International Conference on Information, Business and Education Technology

Using Gini-Index for Feature Selection in Text Categorization

Authors
Weidong Zhu, Jingyu Feng, Yongmin Lin
Corresponding Author
Weidong Zhu
Available Online February 2014.
DOI
10.2991/icibet-14.2014.22
Keywords
text categorization, feature selection, Gini-Index, feature selection function
Abstract

With the rapid development of the World Wide Web, text categorization has come to play an important role in organizing and processing large amounts of text data. The first and major problem in text categorization is how to select the best feature subset from the original high-dimensional feature space, so as to reduce dimensionality and improve classification performance. We use an improved Gini-Index for text feature selection, constructing a measure function based on the Gini-Index. We compare it with four other feature selection measures using two kinds of classifiers on two different document corpora. The experimental results show that its performance is comparable to that of other text feature selection approaches, while it has an advantage in the time complexity of the algorithm.
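
The abstract does not state the measure function itself. As a rough illustration only, the sketch below scores each term with one Gini-style purity measure, the sum over classes of P(t|c)^2 * P(c|t)^2, estimated from document frequencies. The formula, the function names `gini_scores` and `select_top_k`, and the input layout (tokenized documents plus one label per document) are assumptions made for illustration, not the authors' improved measure.

```python
from collections import Counter, defaultdict


def gini_scores(docs, labels):
    """Score terms with an assumed Gini-style measure.

    score(t) = sum over classes c of P(t|c)^2 * P(c|t)^2,
    where both probabilities are estimated from document frequencies.
    This is an illustrative choice, not the measure defined in the paper.
    """
    class_doc_count = Counter(labels)
    # term -> class -> number of documents of that class containing the term
    term_class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            term_class_df[term][label] += 1

    scores = {}
    for term, per_class in term_class_df.items():
        term_df = sum(per_class.values())
        score = 0.0
        for cls, df in per_class.items():
            p_t_given_c = df / class_doc_count[cls]
            p_c_given_t = df / term_df
            score += (p_t_given_c ** 2) * (p_c_given_t ** 2)
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this assumed measure, a term that occurs in every document of exactly one class gets the maximum score of 1, and a term spread evenly across classes scores much lower. The score needs only one pass over the documents to collect counts and one pass over the vocabulary, which is in keeping with the low computational cost the abstract emphasizes.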

Copyright
© 2014, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2014 International Conference on Information, Business and Education Technology
Series
Advances in Intelligent Systems Research
Publication Date
February 2014
ISBN
978-94-6252-003-5
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Weidong Zhu
AU  - Jingyu Feng
AU  - Yongmin Lin
PY  - 2014/02
DA  - 2014/02
TI  - Using Gini-Index for Feature Selection in Text Categorization
BT  - Proceedings of the 2014 International Conference on Information, Business and Education Technology
PB  - Atlantis Press
SP  - 76
EP  - 80
SN  - 1951-6851
UR  - https://doi.org/10.2991/icibet-14.2014.22
DO  - 10.2991/icibet-14.2014.22
ID  - Zhu2014/02
ER  -