Proceedings of the 2015 3rd International Conference on Machinery, Materials and Information Technology Applications

Improvements and Implementation of Hierarchical Clustering based on Hadoop

Authors
Jun Zhang, Chunxiao Fan, Yuexin Wu, Ao Xiao
Corresponding Author
Jun Zhang
Available Online November 2015.
DOI
10.2991/icmmita-15.2015.236
Keywords
Hierarchical Clustering; Hadoop; MapReduce
Abstract

Traditional agglomerative hierarchical clustering requires a large number of iterations, which makes a parallel implementation on Hadoop inefficient. We propose an improved hierarchical clustering method: when the between-class distance is monotonically increasing, the clustering order can be changed without changing the final clustering result, so multiple classes can be aggregated in a single MapReduce operation. This reduces the number of iterations and improves computational efficiency. Experiments show that, compared with the traditional hierarchical clustering algorithm implemented on Hadoop, the improved algorithm greatly reduces both the number of iterations and the computation time.
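The abstract does not spell out the merge rule, so the following is only an illustrative sketch and not the authors' implementation. It assumes complete linkage (a linkage whose merge distances increase monotonically) and uses one well-known way to merge several classes per round without changing the result: merging every pair of classes that are each other's nearest neighbor. The function names, the `batch` flag, and the toy data are hypothetical, the code runs in plain Python rather than on Hadoop, and it assumes there are no tied distances.

```python
from itertools import combinations
from math import dist


def linkage(a, b, points):
    # Complete linkage: largest pairwise distance between members of a and b.
    return max(dist(points[i], points[j]) for i in a for j in b)


def nearest(c, clusters, points):
    # Nearest other cluster to c under the linkage above.
    return min((o for o in clusters if o != c),
               key=lambda o: linkage(c, o, points))


def cluster(points, batch):
    """Return (set of merged clusters, number of rounds).

    batch=False: merge only the single closest pair per round (traditional).
    batch=True:  merge all mutual-nearest-neighbor pairs in the same round.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges, rounds = set(), 0
    while len(clusters) > 1:
        rounds += 1
        if batch:
            nn = {c: nearest(c, clusters, points) for c in clusters}
            # A pair is merged only if each cluster is the other's nearest.
            pairs = {frozenset([c, nn[c]]) for c in clusters if nn[nn[c]] == c}
        else:
            a, b = min(combinations(clusters, 2),
                       key=lambda p: linkage(p[0], p[1], points))
            pairs = {frozenset([a, b])}
        for pair in pairs:
            a, b = pair
            clusters.remove(a)
            clusters.remove(b)
            clusters.append(a | b)
            merges.add(a | b)
    return merges, rounds


# Toy data (hypothetical): eight points on a line with no tied distances.
points = [(0, 0), (1, 0), (10, 0), (12, 0), (30, 0), (33, 0), (70, 0), (74, 0)]
seq_merges, seq_rounds = cluster(points, batch=False)
par_merges, par_rounds = cluster(points, batch=True)
print(seq_rounds, par_rounds, seq_merges == par_merges)
```

On this toy input both variants produce the same set of merged classes, but the batched variant needs 4 rounds instead of 7. In a MapReduce setting each round would correspond to one job, which is where the saving in iterations would come from.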

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 3rd International Conference on Machinery, Materials and Information Technology Applications
Series
Advances in Computer Science Research
Publication Date
November 2015
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Jun Zhang
AU  - Chunxiao Fan
AU  - Yuexin Wu
AU  - Ao Xiao
PY  - 2015/11
DA  - 2015/11
TI  - Improvements and Implementation of Hierarchical Clustering based on Hadoop
BT  - Proceedings of the 2015 3rd International Conference on Machinery, Materials and Information Technology Applications
PB  - Atlantis Press
SP  - 1279
EP  - 1284
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmmita-15.2015.236
DO  - 10.2991/icmmita-15.2015.236
ID  - Zhang2015/11
ER  -