A New Data Classification Algorithm for Data-Intensive Computing Environments
- 10.2991/iccia.2012.335
- MapReduce, Data-Intensive, SPRINT, Gini index
To improve the scalability of data processing and the data availability that data mining techniques require in data-intensive computing, this paper presents a new tree learning method. By introducing MapReduce, the SPRINT-based tree learning method scales well when handling large datasets. Moreover, we define the search for split points as a series of distributed computations, each of which is implemented with the MapReduce model. A new data structure, called the class distribution table, is introduced to assist the calculation of histograms. Experiments and analysis of the results show that the algorithm has strong data mining capability in data-intensive computing environments.
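As a rough illustration of the idea in the abstract, the sketch below counts (attribute value, class label) pairs per partition in a map step, merges the partial counts in a reduce step, and then scans candidate split points with the Gini index. It is not the authors' implementation: the function names, the in-memory `Counter` standing in for the class distribution table, the sample records, and the `value <= v` split convention are all assumptions made for this example, and everything runs in one process rather than on a MapReduce cluster.

```python
from collections import Counter, defaultdict
from functools import reduce


def map_partition(records):
    """Map step (hypothetical): count (attribute value, class label) pairs."""
    return Counter(records)


def merge_counts(left, right):
    """Reduce step (hypothetical): merge partial counts from two partitions."""
    return left + right


def gini(class_counts):
    """Gini index of a node: 1 - sum of squared class proportions."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in class_counts.values())


def best_split(table):
    """Scan candidate splits 'value <= v'; return (weighted Gini, v)."""
    # Group the merged counts by attribute value.
    by_value = defaultdict(Counter)
    for (value, label), n in table.items():
        by_value[value][label] += n

    # Start with every record above the split, none below.
    above = Counter()
    for counts in by_value.values():
        above.update(counts)
    below = Counter()
    total = sum(above.values())

    best = (float("inf"), None)
    # The largest value is skipped: splitting there leaves one side empty.
    for v in sorted(by_value)[:-1]:
        below.update(by_value[v])
        above.subtract(by_value[v])
        n_below = sum(below.values())
        n_above = total - n_below
        score = (n_below * gini(below) + n_above * gini(above)) / total
        if score < best[0]:
            best = (score, v)
    return best


# Two made-up partitions of (age, risk class) records.
partitions = [
    [(23, "high"), (17, "high"), (43, "low")],
    [(68, "low"), (32, "high"), (20, "high")],
]
table = reduce(merge_counts, map(map_partition, partitions), Counter())
score, split_value = best_split(table)
```

On this toy data all "high" records have a value of at most 32 and all "low" records lie above it, so the scan selects `split_value == 32` with a weighted Gini index of 0. In a real deployment the merged counts would be produced by the framework's shuffle and reduce phases rather than by `functools.reduce`.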
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Qizhi Deng
AU - Longbo Zhang
AU - Xin Qian
AU - Yali Chen
AU - Fengying Wang
PY - 2014/05
DA - 2014/05
TI - A New Data Classification Algorithm for Data-Intensive Computing Environments
BT - Proceedings of the 2012 2nd International Conference on Computer and Information Application (ICCIA 2012)
PB - Atlantis Press
SP - 1351
EP - 1354
SN - 1951-6851
UR - https://doi.org/10.2991/iccia.2012.335
DO - 10.2991/iccia.2012.335
ID - Deng2014/05
ER -