Gray Tunneling Based on Joint Link for Focused Crawling
- DOI
- 10.2991/icmra-15.2015.167
- Keywords
- Focused Crawling; Gray Tunneling; Web Link Machine Learning; Q Learning
- Abstract
Because a web page may cover multiple topics, tunneling problems weaken the relevance of highly relevant pages. In this paper, we propose a novel relevance prediction method for focused crawling that addresses this gray tunneling problem. Our approach computes the relevance score of a web page from the relevance scores of its blocks with respect to the topics, and computes the score of a URL from its parent pages and its anchor contexts. It then joins the context similarity and the link similarity, which is based on Q-learning feedback. Experimental results show that the proposed method outperforms the Link-Contexts, Best-First, and Breadth-First crawling strategies on all test data sets.
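The abstract only outlines the scoring pipeline and does not give the paper's formulas, so the following is a minimal sketch of its general shape under stated assumptions, not the authors' method. It assumes term-frequency cosine similarity as the relevance measure, scores a page by its best-matching block, scores a URL as a linear mix (hypothetical weight `alpha`) of its best parent-page score and its anchor-context similarity, updates a link's value with the standard one-step Q-learning rule, and joins context similarity and link value with a hypothetical weight `beta`. All function names and weights are illustrative.

```python
import math
from collections import Counter


def tf(text: str) -> Counter:
    """Bag-of-words term frequencies (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def page_relevance(blocks: list[str], topic: Counter) -> float:
    """Assumption: score a page by its best-matching block, so one highly
    relevant block is not diluted by off-topic blocks on the same page."""
    return max((cosine(tf(b), topic) for b in blocks), default=0.0)


def url_score(parent_scores: list[float], anchor_context: str,
              topic: Counter, alpha: float = 0.5) -> float:
    """Assumption: linear mix (hypothetical weight alpha) of the best parent-page
    score and the similarity of the link's anchor context to the topic."""
    parent = max(parent_scores, default=0.0)
    return alpha * parent + (1 - alpha) * cosine(tf(anchor_context), topic)


def q_update(q: float, reward: float, next_best_q: float,
             lr: float = 0.1, gamma: float = 0.9) -> float:
    """Standard one-step Q-learning update of a link's value; using the
    relevance of the fetched page as the reward is an assumption."""
    return q + lr * (reward + gamma * next_best_q - q)


def joint_score(context_sim: float, link_q: float, beta: float = 0.5) -> float:
    """Assumption: join context similarity and learned link value with a
    hypothetical weight beta."""
    return beta * context_sim + (1 - beta) * link_q


# Usage with toy data (illustrative only).
topic = tf("focused crawling web crawler topic relevance")
blocks = ["focused crawling with a topic relevance model",
          "site navigation login contact"]
page = page_relevance(blocks, topic)  # about 0.617, from the first block
link = url_score([page], "topic relevance", topic)
q = q_update(0.0, reward=page, next_best_q=0.0)
priority = joint_score(cosine(tf("topic relevance"), topic), q)
```

Taking the maximum over blocks is one simple way to keep an off-topic sidebar from pulling down a page whose main block is on topic; a weighted average over blocks would be an equally plausible reading of "block relevancy score".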
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Wei Dong
AU  - Hong Ni
AU  - Haojiang Deng
AU  - Liheng Tuo
PY  - 2015/04
DA  - 2015/04
TI  - Gray Tunneling Based on Joint Link for Focused Crawling
BT  - Proceedings of the 3rd International Conference on Mechatronics, Robotics and Automation
PB  - Atlantis Press
SP  - 859
EP  - 862
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmra-15.2015.167
DO  - 10.2991/icmra-15.2015.167
ID  - Dong2015/04
ER  - 