A Distributed Web Crawler Model based on Cloud Computing

Jiankun Yu; Mengrong Li; Dengyin Zhang

doi:10.2991/itoec-16.2016.51

Corresponding Author

Jiankun Yu

Available Online May 2016.

DOI: 10.2991/itoec-16.2016.51
Keywords: web crawler; cloud computing; distributed; cloud-based web crawler
Abstract: With the rapid growth of the Web, distributed web crawlers were introduced to fetch massive numbers of web pages. However, traditional distributed web crawlers balance load poorly across nodes. In addition, the number of pages fetched does not grow linearly as crawling nodes are added. This paper proposes a distributed web crawler model that runs on the Hadoop platform. The characteristics of Hadoop guarantee the scalability of the proposed crawler model. The model also uses HBase to provide storage for the massive volume of crawled web data. This paper further proposes a load-balancing method based on feedback from the crawling nodes. The crawler model is shown to perform well in load balancing and node extension.
Copyright: © 2016, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2nd Information Technology and Mechatronics Engineering Conference (ITOEC 2016)
Series: Advances in Engineering Research
Publication Date: May 2016
ISBN: 978-94-6252-178-0
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Jiankun Yu
AU  - Mengrong Li
AU  - Dengyin Zhang
PY  - 2016/05
DA  - 2016/05
TI  - A Distributed Web Crawler Model based on Cloud Computing
BT  - Proceedings of the 2nd Information Technology and Mechatronics Engineering Conference (ITOEC 2016)
PB  - Atlantis Press
SP  - 276
EP  - 279
SN  - 2352-5401
UR  - https://doi.org/10.2991/itoec-16.2016.51
DO  - 10.2991/itoec-16.2016.51
ID  - Yu2016/05
ER  -
