Proceedings of the 1st International Workshop on Cloud Computing and Information Security

A Parallel Clustering Method Study Based on MapReduce

Authors
Zhanquan Sun
Corresponding Author
Zhanquan Sun
Available Online November 2013.
DOI
10.2991/ccis-13.2013.96
Keywords
Clustering; Information bottleneck theory; MapReduce; Multidimensional Scaling; Twister
Abstract

Clustering is one of the most important tasks in data mining. Its goal is to determine the intrinsic grouping in a set of unlabeled data, and it has been applied in many areas. Many clustering methods have been studied, such as k-means, the Fisher clustering method, and the Kohonen neural network. In many areas, data sets keep growing in scale, and classical clustering methods become impractical in the face of big data. Clustering methods for large-scale data are therefore an important research topic. MapReduce is regarded as the most efficient model for data-intensive problems. In this paper, a parallel clustering method based on MapReduce is studied. The research makes two main contributions. First, it determines the initial centers objectively. Second, it takes information loss as the distance metric between two samples. The efficiency of the method is illustrated with a practical DNA clustering problem.
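
The abstract does not give the formulas, so the following is only a minimal sketch of how information loss could serve as a distance inside a map/reduce clustering step. It assumes the merge cost from the agglomerative information bottleneck literature, (w1 + w2) times the weighted Jensen-Shannon divergence of the two conditional distributions. The function names (`information_loss`, `map_partition`, `reduce_partials`), the representation of a sample as a `(weight, distribution)` pair, and the choice of initial centers are all hypothetical and are not taken from the paper.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_k == 0 contribute 0."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)

def information_loss(w1, p1, w2, p2):
    """Assumed distance: mutual information lost when two items are merged.

    w1, w2 are prior weights; p1, p2 are conditional distributions over the
    same feature set. The value equals (w1 + w2) * weighted JS divergence.
    """
    total = w1 + w2
    pi1, pi2 = w1 / total, w2 / total
    merged = [pi1 * a + pi2 * b for a, b in zip(p1, p2)]
    return total * (pi1 * kl(p1, merged) + pi2 * kl(p2, merged))

def map_partition(samples, centers):
    """Map step: assign each sample in one data partition to the center with
    the smallest information loss and accumulate per-center partial sums."""
    partial = {}
    for w, p in samples:
        best = min(
            range(len(centers)),
            key=lambda c: information_loss(w, p, centers[c][0], centers[c][1]),
        )
        acc_w, acc_p = partial.get(best, (0.0, [0.0] * len(p)))
        partial[best] = (acc_w + w, [a + w * x for a, x in zip(acc_p, p)])
    return partial

def reduce_partials(partials, centers):
    """Reduce step: combine the partial sums from all partitions and
    recompute each center as the weighted mean distribution of its members."""
    totals = {}
    for partial in partials:
        for c, (w, wp) in partial.items():
            acc_w, acc_p = totals.get(c, (0.0, [0.0] * len(wp)))
            totals[c] = (acc_w + w, [a + x for a, x in zip(acc_p, wp)])
    new_centers = list(centers)  # centers with no members are left unchanged
    for c, (w, wp) in totals.items():
        new_centers[c] = (w, [x / w for x in wp])
    return new_centers

# Illustrative toy data: two partitions, two hypothetical initial centers.
partitions = [
    [(0.25, [0.9, 0.1]), (0.25, [0.1, 0.9])],
    [(0.25, [0.8, 0.2]), (0.25, [0.2, 0.8])],
]
centers = [(0.5, [1.0, 0.0]), (0.5, [0.0, 1.0])]
centers = reduce_partials([map_partition(s, centers) for s in partitions], centers)
```

In an iterative MapReduce runtime, the map and reduce calls would be repeated until the centers stop changing; here a single iteration is run in-process purely for illustration.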

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 1st International Workshop on Cloud Computing and Information Security
Series
Advances in Intelligent Systems Research
Publication Date
November 2013
ISBN
978-90-78677-88-8
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Zhanquan Sun
PY  - 2013/11
DA  - 2013/11
TI  - A Parallel Clustering Method Study Based on MapReduce
BT  - Proceedings of the 1st International Workshop on Cloud Computing and Information Security
PB  - Atlantis Press
SP  - 416
EP  - 419
SN  - 1951-6851
UR  - https://doi.org/10.2991/ccis-13.2013.96
DO  - 10.2991/ccis-13.2013.96
ID  - Sun2013/11
ER  -