Proceedings of the International Joint Conference on Science and Engineering 2022 (IJCSE 2022)

Distributed News Crawler Using Fog Cloud Approach

Authors
I. Gusti Lanang Putra Eka Prismana¹,*
¹Universitas Negeri Surabaya, Surabaya, Indonesia
*Corresponding author. Email: lanangprismana@unesa.ac.id
Available Online 27 December 2022.
DOI
10.2991/978-94-6463-100-5_26
Keywords
Web crawler; News; Distributed web crawling; Fog cloud
Abstract

Technology has advanced quickly during Industrial Revolution 4.0, and the Internet has grown larger just as rapidly. Constantly changing website technology therefore poses a major challenge to using the large and complex data available on the global Internet. Traditional stand-alone web crawlers struggle to keep up with this rapid growth in information, so extracting a large amount of data in a short period is difficult. This research uses distributed technology to build a more effective distributed web crawling system for searching news. The crawler system can work efficiently with multiple threads working together, and each node can work efficiently using multithreading. This study applies a new fog cloud approach to web crawling that is considered more efficient at navigating URLs: it organizes URLs according to the domain used and divides them into several priority URL queues, so that URLs can be distributed across concurrent crawler processes. In particular, the proposed model can use resources in the cloud-fog layer effectively by deploying distributed crawlers across the cloud-fog infrastructure to detect news. With the fog cloud, analysis is dynamically distributed across the fog and cloud layers, enabling real-time distribution. The research phases of the distributed news crawler are URL collection, URL filtering, scheduling, accessing URLs, and extracting news data. This research focuses on developing web crawlers for distributed news crawling.
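
The filtering and scheduling phases named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the `news_first` priority rule, the round-robin assignment of domain queues to workers, and the `example` domains are all assumptions made for illustration. Workers stand in for crawler nodes in the fog or cloud layer, and no pages are actually fetched.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit


def news_first(url):
    """Hypothetical priority rule: URLs under /news/ are crawled first (0 < 1)."""
    return 0 if "/news/" in urlsplit(url).path else 1


def filter_urls(urls, allowed_domains):
    """URL filtering: drop duplicates and URLs outside the allowed domains."""
    seen = set()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host in allowed_domains and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


def build_domain_queues(urls, priority_of):
    """Scheduling: keep one priority queue (min-heap) per domain."""
    queues = defaultdict(list)
    for order, url in enumerate(urls):
        host = urlsplit(url).hostname
        # 'order' breaks ties so equal-priority URLs keep their input order.
        heapq.heappush(queues[host], (priority_of(url), order, url))
    return queues


def assign_to_workers(queues, n_workers):
    """Hand whole domain queues to workers round-robin (an assumed policy)."""
    plan = [[] for _ in range(n_workers)]
    for i, host in enumerate(sorted(queues)):
        heap = list(queues[host])  # a copy of a heap is still a valid heap
        while heap:
            _, _, url = heapq.heappop(heap)
            plan[i % n_workers].append(url)
    return plan
```

Keeping each domain's queue on a single worker is one simple way to let crawlers run concurrently without two workers requesting pages from the same site at once; the paper may distribute URLs differently.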

Copyright
© 2022 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Joint Conference on Science and Engineering 2022 (IJCSE 2022)
Series
Advances in Engineering Research
Publication Date
27 December 2022
ISBN
978-94-6463-100-5
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - I. Gusti Lanang Putra Eka Prismana
PY  - 2022
DA  - 2022/12/27
TI  - Distributed News Crawler Using Fog Cloud Approach
BT  - Proceedings of the International Joint Conference on Science and Engineering 2022 (IJCSE 2022)
PB  - Atlantis Press
SP  - 251
EP  - 260
SN  - 2352-5401
UR  - https://doi.org/10.2991/978-94-6463-100-5_26
DO  - 10.2991/978-94-6463-100-5_26
ID  - Prismana2022
ER  -