Proceedings of the 2016 4th International Conference on Machinery, Materials and Information Technology Applications

Cloud Computing K-Means Text Clustering Filtering Algorithm based on Hadoop

Author
Suyu Huang (corresponding author)
Available Online January 2017.
DOI
10.2991/icmmita-16.2016.278
Keywords
clustering; k-means; text; cloud computing; big data; filtering
Abstract

Partitioning and hierarchical methods are the most popular families of clustering algorithms. k-means is sensitive to the choice of initial cluster centers and tends to converge to a local optimum. To address this, the paper presents an improved clustering algorithm based on particle swarm optimization. The number of clusters and the initial cluster centers are determined dynamically by combining the method of Ref. [1] with that of Ref. [2]. The algorithm also optimizes the normalization of the sample set, the weight adjustment of the particle swarm, the computation of the dissimilarity matrix and the swarm fitness variance. The initial cluster centers are selected according to sample density and the max-min distance criterion. This overcomes the sensitivity of k-means to initial values and its tendency to become trapped in local optima. After the attributes of every dimension of the sample set are normalized, the swarm fitness variance is introduced to obtain a further optimized hybrid algorithm. Test results show that the algorithm achieves higher accuracy and stronger convergence.
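The abstract describes seeding k-means by sample density and the max-min distance criterion, with the number of clusters determined dynamically. The Python sketch below illustrates that general idea under explicit assumptions. It is not the author's implementation.

The density radius (`radius_factor` times the mean pairwise distance) and the stopping threshold `theta` are hypothetical parameters. The particle swarm stage, the fitness variance and the Hadoop parallelization are omitted.

The sketch proceeds as follows:

1. Min-max normalize every dimension.
2. Choose the densest point as the first center.
3. Add the candidate farthest from that first center.
4. Keep adding the high-density point whose nearest-center distance is largest, until that distance falls below `theta` times the distance between the first two centers.
5. Use the resulting centers to seed a standard Lloyd's k-means loop.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def min_max_normalize(data: Sequence[Sequence[float]]) -> List[Point]:
    """Scale each dimension to [0, 1]; a constant dimension maps to 0."""
    dims = len(data[0])
    lo = [min(row[d] for row in data) for d in range(dims)]
    hi = [max(row[d] for row in data) for d in range(dims)]
    return [
        tuple((row[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
              for d in range(dims))
        for row in data
    ]


def density_maxmin_centers(points: Sequence[Point],
                           radius_factor: float = 0.5,
                           theta: float = 0.5) -> List[int]:
    """Pick initial center indices by density plus the max-min distance rule.

    radius_factor and theta are hypothetical tuning parameters, not values
    taken from the paper.
    """
    n = len(points)
    if n < 2:
        return list(range(n))
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Density radius: a fraction of the mean pairwise distance (assumption).
    radius = radius_factor * sum(map(sum, dist)) / (n * (n - 1))
    density = [sum(1 for j in range(n) if j != i and dist[i][j] <= radius)
               for i in range(n)]
    avg_density = sum(density) / n
    # Only points at or above average density may become centers,
    # which keeps isolated outliers from being chosen.
    candidates = [i for i in range(n) if density[i] >= avg_density]
    first = max(candidates, key=lambda i: density[i])
    centers = [first]
    others = [i for i in candidates if i != first]
    if not others:
        return centers
    second = max(others, key=lambda i: dist[first][i])
    if dist[first][second] == 0.0:
        return centers
    centers.append(second)
    # Stop adding centers once the best max-min distance drops to this level.
    threshold = theta * dist[first][second]
    while True:
        remaining = [i for i in candidates if i not in centers]
        if not remaining:
            break
        best = max(remaining, key=lambda i: min(dist[i][c] for c in centers))
        if min(dist[best][c] for c in centers) <= threshold:
            break
        centers.append(best)
    return centers


def kmeans(points: Sequence[Point], center_idx: Sequence[int],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Standard Lloyd iterations seeded with the chosen center indices."""
    centers = [points[i] for i in center_idx]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    raw = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    pts = min_max_normalize(raw)
    seeds = density_maxmin_centers(pts)
    labels, _ = kmeans(pts, seeds)
    print(seeds, labels)  # [0, 7] [0, 0, 0, 0, 1, 1, 1, 1]
```

On the two well-separated groups in the usage example, the max-min rule stops after two centers without k being supplied. This is the "dynamic" choice of cluster count the abstract refers to, here in its simplest form.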

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Computer Science Research
Publication Date
January 2017
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Suyu Huang
PY  - 2017/01
DA  - 2017/01
TI  - Cloud Computing K-Means Text Clustering Filtering Algorithm based on Hadoop
BT  - Proceedings of the 2016 4th International Conference on Machinery, Materials and Information Technology Applications
PB  - Atlantis Press
SP  - 1209
EP  - 1214
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmmita-16.2016.278
DO  - 10.2991/icmmita-16.2016.278
ID  - Huang2017/01
ER  -