The Study and Application of Hadoop across Multiple Clusters
- DOI
- 10.2991/meic-14.2014.346
- Keywords
- multiple clusters; Apache Hadoop; hierarchical distributed computing; virtual HDFS; job adapter
- Abstract
Hadoop is a widely applied tool for large-scale, data-intensive computing in big data, but it can only run in a single-cluster environment. In this paper, we focus on the application of Hadoop across multiple clusters and address the key problems of data sharing and task scheduling among clusters. A hierarchical distributed computing architecture for Hadoop across multiple clusters is designed. A virtual HDFS and a job adapter are proposed to provide a global data view and task allocation across multiple data centers. A job submitted by a user to this platform is automatically decomposed into several sub-jobs, which are then allocated to the corresponding clusters in a location-aware manner. A prototype based on this architecture is presented and is currently applied to distributed spatial information processing across spatial data centers.
- Copyright
- © 2014, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
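The abstract does not describe how the virtual HDFS or the job adapter are implemented. As a rough illustration only, the sketch below shows one way a location-aware job adapter could work, assuming the virtual HDFS is a mount table that maps global path prefixes to a (cluster, local HDFS path) pair. `VirtualHDFS`, `SubJob`, `decompose`, and the cluster names used with them are hypothetical and are not the authors' API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical virtual HDFS: a mount table mapping global path prefixes
# to (cluster name, local HDFS path prefix). This is an assumption, not the paper's design.
@dataclass
class VirtualHDFS:
    mounts: dict

    def resolve(self, global_path):
        """Return (cluster, local_path) using longest-prefix matching."""
        best = None
        for prefix in self.mounts:
            root = prefix.rstrip("/")
            if global_path == root or global_path.startswith(root + "/"):
                if best is None or len(root) > len(best.rstrip("/")):
                    best = prefix
        if best is None:
            raise KeyError(f"no cluster holds {global_path}")
        cluster, local_root = self.mounts[best]
        suffix = global_path[len(best.rstrip("/")):]
        return cluster, local_root.rstrip("/") + suffix


# Hypothetical sub-job: the part of a global job that runs on one cluster.
@dataclass
class SubJob:
    cluster: str
    name: str
    inputs: list = field(default_factory=list)


def decompose(job_name, global_inputs, vfs):
    """Split a job so each cluster only processes the inputs it stores locally."""
    by_cluster = defaultdict(list)
    for path in global_inputs:
        cluster, local_path = vfs.resolve(path)
        by_cluster[cluster].append(local_path)
    # Sort by cluster name so the output order is deterministic.
    return [SubJob(c, f"{job_name}-{c}", paths)
            for c, paths in sorted(by_cluster.items())]
```

For example, with mounts `{"/dc1/imagery": ("cluster-a", "/data/imagery"), "/dc2/vector": ("cluster-b", "/gis/vector")}`, a job over `/dc1/imagery/t1.tif` and `/dc2/vector/roads.shp` would yield one sub-job per cluster, each reading local paths only. Grouping inputs by their owning cluster is the simplest form of location awareness, since it moves the computation to the data rather than copying data between data centers.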
Cite this article
TY  - CONF
AU  - Shengtao Sun
AU  - Aizhi Wu
AU  - Xiaoyang Liu
PY  - 2014/11
DA  - 2014/11
TI  - The Study and Application of Hadoop across Multiple Clusters
BT  - Proceedings of the 2014 International Conference on Mechatronics, Electronic, Industrial and Control Engineering
PB  - Atlantis Press
SP  - 1535
EP  - 1538
SN  - 2352-5401
UR  - https://doi.org/10.2991/meic-14.2014.346
DO  - 10.2991/meic-14.2014.346
ID  - Sun2014/11
ER  - 