Proceedings of the 3rd International Conference on Wireless Communication and Sensor Networks (WCSN 2016)

A Survey of Web Page Preprocessing Research

Authors
Qi Qi, Gui-Xian Xu
Corresponding Author
Qi Qi
Available Online December 2016.
DOI
10.2991/icwcsn-16.2017.118
Keywords
Web page cleaning; data mining; Web mining; information retrieval.
Abstract

Pages collected from the Web by a crawler contain not only the required information but also many advertisements and navigation bars. This noise content is independent of the page's topic and should be removed, so it is worthwhile to summarize existing work on Web page denoising and to study it further. First, we explain why page denoising is necessary, define page denoising, and summarize the methods of Web page denoising. Second, we consider how Web page denoising algorithms can be improved. Finally, we discuss open problems in Web page denoising and future research directions.

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 3rd International Conference on Wireless Communication and Sensor Networks (WCSN 2016)
Series
Advances in Computer Science Research
Publication Date
December 2016
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Qi Qi
AU  - Gui-Xian Xu
PY  - 2016/12
DA  - 2016/12
TI  - A Survey of Web Page Preprocessing Research
BT  - Proceedings of the 3rd International Conference on Wireless Communication and Sensor Networks (WCSN 2016)
PB  - Atlantis Press
SP  - 585
EP  - 588
SN  - 2352-538X
UR  - https://doi.org/10.2991/icwcsn-16.2017.118
DO  - 10.2991/icwcsn-16.2017.118
ID  - Qi2016/12
ER  -