Short text model based on Strong feature thesaurus

Wentao Lu; Yongfeng Huang; Xing Li; Zhuo Zhang; Yingkun Li

doi:10.2991/isrme-15.2015.126

Corresponding Author

Wentao Lu

Available Online April 2015.

DOI: 10.2991/isrme-15.2015.126
Keywords: Short Text Model; Data Sparseness; Strong Feature; Latent Dirichlet Allocation; Clustering
Abstract: Data sparseness, a defining characteristic of short text, is caused by the diversity of language expression and the short length of the text. Previous text models, represented by Bag of Words (BOW), consider only the statistical features of words and thus tend to underperform on short texts. To tackle this problem, we introduce a new text model that combines statistical methods with semantic estimation. Specifically, we obtain a “Strong Feature Thesaurus” through a mining process based on the Latent Dirichlet Allocation (LDA) model, and then incorporate the semantic information into the BOW representation by weighting the strong feature terms. To assess the performance of this model, we conduct two clustering experiments on short text corpora. The results show that our model outperforms prevailing text models such as BOW.
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
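
The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the weighting formula or how the thesaurus is mined from LDA topics, so the function name `weighted_bow`, the multiplicative `boost` factor, and the toy thesaurus below are all assumptions made for illustration.

```python
from collections import Counter


def weighted_bow(tokens, vocabulary, strong_terms, boost=2.0):
    """Build a bag-of-words vector over `vocabulary`.

    Counts of terms found in `strong_terms` are multiplied by `boost`.
    Both the multiplicative scheme and the default value are assumptions;
    the paper's actual weighting is not given in the abstract.
    """
    counts = Counter(tokens)
    return [
        counts[term] * (boost if term in strong_terms else 1.0)
        for term in vocabulary
    ]


# Hypothetical data: in the paper the thesaurus is mined with LDA,
# here it is simply written out by hand.
vocabulary = ["cheap", "flight", "hotel", "the", "to"]
strong_terms = {"flight", "hotel"}

vector = weighted_bow("cheap flight to the hotel".split(), vocabulary, strong_terms)
print(vector)
```

Boosting the strong terms raises their share of the vector, so two short texts that share such a term look more similar to a clustering algorithm than they would under plain counts.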

Volume Title: Proceedings of the 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering
Series: Advances in Intelligent Systems Research
Publication Date: April 2015
ISBN: 978-94-62520-59-2
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Wentao Lu
AU  - Yongfeng Huang
AU  - Xing Li
AU  - Zhuo Zhang
AU  - Yingkun Li
PY  - 2015/04
DA  - 2015/04
TI  - Short text model based on Strong feature thesaurus
BT  - Proceedings of the 2015 International Conference on Intelligent Systems Research and Mechatronics Engineering
PB  - Atlantis Press
SP  - 620
EP  - 625
SN  - 1951-6851
UR  - https://doi.org/10.2991/isrme-15.2015.126
DO  - 10.2991/isrme-15.2015.126
ID  - Lu2015/04
ER  -

download .riscopy to clipboard