Proceedings of the 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering

Short text model based on Strong feature thesaurus

Authors
Wentao Lu, Yongfeng Huang, Xing Li, Zhuo Zhang, Yingkun Li
Corresponding Author
Wentao Lu
Available Online April 2015.
DOI
10.2991/isrme-15.2015.126
Keywords
Short Text Model; Data Sparseness; Strong Feature; Latent Dirichlet Allocation; Clustering
Abstract

Data sparseness, the defining characteristic of short text, is caused by the diversity of language expression and the short length of the texts. Previous text models, represented by Bag of Words (BOW), consider only the statistical features of words and therefore underperform on short texts. To tackle this problem, we introduce a new text model that combines statistical methods with semantic estimation. Specifically, we obtain a “Strong Feature Thesaurus” through a mining process with the Latent Dirichlet Allocation (LDA) model, and then incorporate the semantic information into the BOW representation by weighting those strong feature terms. To assess the performance of this model, we conduct two clustering experiments on short text corpora. The results show that our model outperforms prevailing text models such as BOW.
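
The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything here is an assumption made for illustration only: the function name `weighted_bow`, the constant multiplier `boost`, the toy documents, and the hand-picked `strong_terms` set, which stands in for a thesaurus that the paper mines with LDA.

```python
from collections import Counter

def weighted_bow(tokens, vocabulary, strong_terms, boost=2.0):
    """Build a bag-of-words vector in which strong-feature terms are up-weighted.

    The constant `boost` multiplier is a hypothetical choice; the paper's
    actual weighting scheme is not stated in the abstract.
    """
    counts = Counter(tokens)
    return [counts[t] * (boost if t in strong_terms else 1.0) for t in vocabulary]

# Toy short texts, already tokenized (illustrative data only).
docs = [["cheap", "flight", "deal"], ["flight", "delay", "news"]]
vocabulary = sorted({t for d in docs for t in d})

# Hypothetical thesaurus; in the paper it is obtained by mining with LDA.
strong_terms = {"flight"}

vectors = [weighted_bow(d, vocabulary, strong_terms) for d in docs]
```

With plain BOW the two documents share a single count of 1 on "flight"; after boosting, that shared dimension carries more weight relative to the sparse remainder, which is the intuition behind using strong features to counter data sparseness.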

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering
Series
Advances in Intelligent Systems Research
Publication Date
April 2015
ISBN
978-94-62520-59-2
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Wentao Lu
AU  - Yongfeng Huang
AU  - Xing Li
AU  - Zhuo Zhang
AU  - Yingkun Li
PY  - 2015/04
DA  - 2015/04
TI  - Short text model based on Strong feature thesaurus
BT  - Proceedings of the 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering
PB  - Atlantis Press
SP  - 620
EP  - 625
SN  - 1951-6851
UR  - https://doi.org/10.2991/isrme-15.2015.126
DO  - 10.2991/isrme-15.2015.126
ID  - Lu2015/04
ER  -