Web Page Data Collection Based on Multithread
- DOI
- 10.2991/iccsee.2013.509
- Keywords
- web page, data collection, multithread
- Abstract
Web data collection is the process by which a crawler gathers semi-structured, large-scale, and redundant data from the web, including web content, web structure, and web usage data. It is commonly used for information extraction, information retrieval, search engines, and web data mining. This paper introduces the principle of web data collection and discusses related topics such as page download, coding (character encoding) problems, update strategy, and static versus dynamic pages. Multithreading technology is described, and a multithreaded mode for web data collection is proposed. Web data collection with multithreading can achieve better resource utilization, a better average response time, and better performance.
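As a rough illustration of the multithreaded download idea summarized in the abstract, the following Python sketch fetches several pages concurrently with a thread pool. It is not the paper's implementation: the function names `download` and `collect`, the worker count, the timeout, and the UTF-8 fallback for pages that declare no charset are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def download(url, timeout=10):
    """Fetch one page and decode it with the charset the server declares.

    Falling back to UTF-8 is an assumption of this sketch, not a rule
    taken from the paper.
    """
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def collect(urls, fetch=download, max_workers=8):
    """Download pages concurrently.

    Returns a dict mapping each URL to its page text, or to the
    exception raised while fetching it.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # one failed page should not stop the rest
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(safe_fetch, urls)))
```

Calling `collect` with a list of URLs returns once every download has finished or failed. Because page download is dominated by network waiting, the threads spend most of their time blocked on I/O, which is the situation in which a thread pool tends to improve resource utilization and average response time.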
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Wentao Liu
PY - 2013/03
DA - 2013/03
TI - Web Page Data Collection Based on Multithread
BT - Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013)
PB - Atlantis Press
SP - 2023
EP - 2026
SN - 1951-6851
UR - https://doi.org/10.2991/iccsee.2013.509
DO - 10.2991/iccsee.2013.509
ID - Liu2013/03
ER -