Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)

Text Categorization Based on a Similarity Approach

Authors
Cha Yang¹, Jun Wen
¹School of Computer Science & Engineering, University of Electronic Science and Technology of China
Corresponding Author
Cha Yang
Available Online October 2007.
DOI
10.2991/iske.2007.138
Keywords
text classification; Term Frequency/Inverse Document frequency (TFIDF); feature selection; vector space model; word frequency; similarity
Abstract

Text classification can efficiently enhance text processing capability by automatically sorting documents into a defined collection of categories. This paper uses the TFIDF method to represent documents and sets the NGramSize value to 6. A word frequency vector is used to measure and distinguish the features of different documents. The similarity approach uses the cosine function to construct the classifier. The experimental results indicate that the proposed algorithm performs well, with accuracy of up to 98%.
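
The abstract names the ingredients (TFIDF representation, an NGramSize of 6, cosine similarity) but not how they are combined. The following Python sketch shows one plausible arrangement and is not the authors' implementation. It assumes that NGramSize refers to character n-grams, that IDF is smoothed as log((1 + N) / (1 + df)) + 1, and that each category is represented by the sum of its training vectors. The function names and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict

NGRAM_SIZE = 6  # value from the abstract; character-level n-grams are an assumption


def char_ngrams(text, n=NGRAM_SIZE):
    """Return the overlapping character n-grams of a lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc)))  # count each n-gram once per document
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(text, idf):
    """Weight raw n-gram counts by IDF; n-grams unseen during training are dropped."""
    tf = Counter(char_ngrams(text))
    return {g: c * idf[g] for g, c in tf.items() if g in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(labeled_docs):
    """Build one summed TF-IDF profile per category (an assumed design, not from the paper)."""
    idf = fit_idf([text for text, _ in labeled_docs])
    profiles = defaultdict(lambda: defaultdict(float))
    for text, label in labeled_docs:
        for g, w in tfidf_vector(text, idf).items():
            profiles[label][g] += w
    return idf, profiles


def classify(text, idf, profiles):
    """Assign the category whose profile is most similar to the document vector."""
    vec = tfidf_vector(text, idf)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical toy data, for illustration only.
training = [
    ("the football team won the championship match", "sports"),
    ("the striker scored twice in the football match", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell as interest rates climbed", "finance"),
]
idf, profiles = train(training)
print(classify("interest rates and stock markets", idf, profiles))
print(classify("a football match", idf, profiles))
```

Because cosine similarity normalizes by vector length, summing the training vectors of a category gives the same ranking as averaging them, so the size of a category does not bias the comparison in this sketch.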

Copyright
© 2007, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
Series
Advances in Intelligent Systems Research
Publication Date
October 2007
ISBN
10.2991/iske.2007.138
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Cha Yang
AU  - Jun Wen
PY  - 2007/10
DA  - 2007/10
TI  - Text Categorization Based on a Similarity Approach
BT  - Proceedings of the 2007 International Conference on Intelligent Systems and Knowledge Engineering (ISKE 2007)
PB  - Atlantis Press
SP  - 807
EP  - 811
SN  - 1951-6851
UR  - https://doi.org/10.2991/iske.2007.138
DO  - 10.2991/iske.2007.138
ID  - Yang2007/10
ER  -