An optimization strategy of massive small files storage based on HDFS
- DOI
- 10.2991/jiaet-18.2018.40
- Keywords
- Storage of Small Files, Distribution of Small Files, Merge, Relationship between files.
- Abstract
The Hadoop Distributed File System (HDFS) is a distributed storage system that handles large files well. However, it has an inherent weakness with small files: storing a large number of them produces excessive metadata, which creates a NameNode memory bottleneck, and the frequent RPC communication adds time overhead. To solve these problems, this paper presents a merging algorithm based on two factors: the distribution of files and the correlation between files. The algorithm not only reduces the number of HDFS blocks but also places related files close together. Experimental results show that the algorithm effectively improves the storage efficiency of HDFS for small files and helps optimize small-file access.
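The metadata pressure described in the abstract can be illustrated with a rough back-of-the-envelope estimate. The sketch below is not the paper's algorithm. It assumes a commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file or block), a 128 MiB block size, and a hypothetical workload of one million 100 KiB files. It also ignores the index that any merging scheme needs to locate a small file inside a merged file.

```python
import math

# Assumption: ~150 bytes of NameNode heap per namespace object (file or block).
# This is a widely quoted rule of thumb, not a figure taken from the paper.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MiB)


def namenode_bytes(file_sizes):
    """Estimate heap use when every file is stored on its own:
    one file object plus one object per block it occupies."""
    total = 0
    for size in file_sizes:
        blocks = max(1, math.ceil(size / BLOCK_SIZE))
        total += (1 + blocks) * BYTES_PER_OBJECT
    return total


def merged_namenode_bytes(file_sizes):
    """Estimate heap use when the files are packed into block-sized merged
    files: one file object plus one block object per merged file."""
    merged_files = max(1, math.ceil(sum(file_sizes) / BLOCK_SIZE))
    return merged_files * 2 * BYTES_PER_OBJECT


# Hypothetical workload: one million files of 100 KiB each.
small_files = [100 * 1024] * 1_000_000
before = namenode_bytes(small_files)
after = merged_namenode_bytes(small_files)
print(f"unmerged: {before / 1e6:.1f} MB, merged: {after / 1e3:.1f} KB")
```

Under these assumptions the unmerged layout needs about 300 MB of NameNode heap, while the merged layout needs only a few hundred kilobytes, which is why reducing the number of files and blocks is the main lever for small-file storage.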
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Xun Cai
AU - Cai Chen
AU - Yi Liang
PY - 2018/03
DA - 2018/03
TI - An optimization strategy of massive small files storage based on HDFS
BT - Proceedings of the 2018 Joint International Advanced Engineering and Technology Research Conference (JIAET 2018)
PB - Atlantis Press
SP - 225
EP - 230
SN - 2352-5401
UR - https://doi.org/10.2991/jiaet-18.2018.40
DO - 10.2991/jiaet-18.2018.40
ID - Cai2018/03
ER -