Proceedings of the 2012 2nd International Conference on Computer and Information Application (ICCIA 2012)

A New Data Classification Algorithm for Data-Intensive Computing Environments

Authors
Qizhi Deng, Longbo Zhang, Xin Qian, Yali Chen, Fengying Wang
Corresponding Author
Qizhi Deng
Available Online May 2014.
DOI
https://doi.org/10.2991/iccia.2012.335
Keywords
MapReduce, Data-Intensive, SPRINT, Gini index
Abstract

To address the limited scalability of data processing and the data availability problems that data mining techniques encounter in data-intensive computing, this paper presents a new tree learning method. By introducing MapReduce, the SPRINT-based tree learning method achieves good scalability on large datasets. Moreover, we formulate the split-point computation as a series of distributed computations, each implemented with the MapReduce model. In addition, a new data structure, the class distribution table, is introduced to assist in computing histograms. Experimental results and analysis show that the algorithm has strong data mining capability in data-intensive computing environments.
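The abstract does not give the layout of the class distribution table or the exact MapReduce jobs, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single numeric attribute, simulates the map and reduce phases in-process rather than on Hadoop, and uses hypothetical names (`map_partition`, `merge_tables`, `best_split`). Each map task builds a partial table mapping attribute value to per-class counts. The reduce step sums these partial tables. A SPRINT-style sweep over the sorted values then maintains "below" and "above" class histograms and selects the threshold `A <= v` with the lowest weighted Gini index, where gini(S) = 1 - Σ p_j².

```python
from collections import Counter, defaultdict
from functools import reduce


def map_partition(records):
    """Map (hypothetical): build a partial class distribution table
    {attribute value: Counter(class label -> count)} for one data partition."""
    table = defaultdict(Counter)
    for value, label in records:
        table[value][label] += 1
    return table


def merge_tables(a, b):
    """Reduce (hypothetical): combine two partial tables by summing class counts."""
    merged = defaultdict(Counter)
    for table in (a, b):
        for value, counts in table.items():
            merged[value].update(counts)
    return merged


def gini(counts):
    """Gini index 1 - sum(p_j^2) of a class histogram; 0.0 for an empty one."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(table):
    """Sweep candidate thresholds in ascending order, moving each value's class
    counts from the 'above' histogram to the 'below' histogram (SPRINT-style).
    Returns (weighted Gini, threshold v) for the best split A <= v."""
    above = Counter()
    for counts in table.values():
        above.update(counts)
    below = Counter()
    n = sum(above.values())
    best = (float("inf"), None)
    values = sorted(table)
    # Splitting at the largest value would leave the 'above' side empty.
    for v in values[:-1]:
        below.update(table[v])
        above.subtract(table[v])
        n_below = sum(below.values())
        n_above = n - n_below
        g = n_below / n * gini(below) + n_above / n * gini(above)
        if g < best[0]:
            best = (g, v)
    return best


# Usage: two hypothetical partitions of (attribute value, class label) records.
partitions = [
    [(1.0, "A"), (2.0, "A"), (3.0, "B")],
    [(2.0, "A"), (4.0, "B"), (5.0, "B")],
]
table = reduce(merge_tables, (map_partition(p) for p in partitions))
score, threshold = best_split(table)
print(score, threshold)  # 0.0 2.0 (the split A <= 2.0 separates the classes)
```

Keeping per-value class counts, rather than raw records, means the reduce step only moves small histograms between nodes, and the sweep needs a single pass over the distinct values. Categorical attributes, which SPRINT handles with subset splits, are not covered by this sketch.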

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Intelligent Systems Research
Publication Date
May 2014
ISBN
978-94-91216-41-1
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Qizhi Deng
AU  - Longbo Zhang
AU  - Xin Qian
AU  - Yali Chen
AU  - Fengying Wang
PY  - 2014/05
DA  - 2014/05
TI  - A New Data Classification Algorithm for Data-Intensive Computing Environments
BT  - Proceedings of the 2012 2nd International Conference on Computer and Information Application (ICCIA 2012)
PB  - Atlantis Press
SP  - 1351
EP  - 1354
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccia.2012.335
DO  - https://doi.org/10.2991/iccia.2012.335
ID  - Deng2014/05
ER  -