An Approach of Web Page Information Extraction

Yaohui Li; Lixia Wang; Jianxiong Wang; Jie Yue; Mingzhan Zhao

doi:10.2991/iccsee.2013.556

Corresponding Author

Yaohui Li

Available Online March 2013.

DOI: 10.2991/iccsee.2013.556
Keywords: Information extraction, DOM, page segmentation, HTML tag
Abstract: The Web has become the largest information source, but noise content is an inevitable part of any web page. Noise content reduces the accuracy of search engines and increases server load. Information extraction technology has been developed in response, and it is mostly based on page segmentation. After analyzing existing page segmentation methods, this paper presents an approach to web page information extraction in which block nodes are identified by analyzing the attributes of HTML tags. The algorithm is easy to implement, and experiments show that it performs well.
Copyright: © 2013, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
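
The abstract describes identifying block nodes by analyzing the attributes of HTML tags, but it does not list the rules the authors use. The following is a minimal sketch of that general idea, not the paper's algorithm: the tag sets `BLOCK_TAGS`, `SKIP_TAGS`, and `VOID_TAGS`, the keyword list `NOISE_HINTS`, and the names `BlockExtractor` and `extract_content` are all assumptions made for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical rule sets: the abstract does not name the actual tags or attributes.
BLOCK_TAGS = {"div", "table", "td", "ul", "section", "article", "nav", "footer", "aside"}
NOISE_TAGS = {"nav", "footer", "aside"}
NOISE_HINTS = ("nav", "menu", "footer", "sidebar", "banner", "advert", "copyright")
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "area", "base", "col", "wbr"}


class BlockExtractor(HTMLParser):
    """Collects text per block node and skips blocks whose attributes look like noise."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, is_noise)
        self.blocks = []  # text of blocks judged to be content
        self.buffer = []  # text fragments of the current content run

    def flush(self):
        # Close the current run of text and store it if it is non-empty.
        text = " ".join(" ".join(self.buffer).split())
        if text:
            self.blocks.append(text)
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so do not track them
        parent_noise = self.stack[-1][1] if self.stack else False
        noise = parent_noise or tag in SKIP_TAGS
        if tag in BLOCK_TAGS:
            self.flush()  # a new block node starts a new text run
            attr_text = " ".join(v or "" for k, v in attrs if k in ("id", "class")).lower()
            noise = noise or tag in NOISE_TAGS or any(h in attr_text for h in NOISE_HINTS)
        self.stack.append((tag, noise))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if tag in BLOCK_TAGS:
                    self.flush()
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1][1]:
            return  # inside a noise block, or inside script/style
        self.buffer.append(data)


def extract_content(html):
    """Return the text of each block node that was not classified as noise."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.blocks


page = """
<html><body>
<div id="top-nav"><a href="/">Home</a> <a href="/about">About</a></div>
<div class="article"><h1>Title</h1><p>First paragraph.</p><p>Second one.</p></div>
<div class="footer">Copyright 2013</div>
</body></html>
"""
print(extract_content(page))  # ['Title First paragraph. Second one.']
```

Keyword matching on `id` and `class` is only one possible attribute test; a fuller treatment might also weigh link density or text length per block, which the abstract does not discuss.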

Volume Title: Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013)
Series: Advances in Intelligent Systems Research
Publication Date: March 2013
ISBN: 978-90-78677-61-1
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Yaohui Li
AU  - Lixia Wang
AU  - Jianxiong Wang
AU  - Jie Yue
AU  - Mingzhan Zhao
PY  - 2013/03
DA  - 2013/03
TI  - An Approach of Web Page Information Extraction
BT  - Proceedings of the 2nd International Conference on Computer Science and Electronics Engineering (ICCSEE 2013)
PB  - Atlantis Press
SP  - 2217
EP  - 2219
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccsee.2013.556
DO  - 10.2991/iccsee.2013.556
ID  - Li2013/03
ER  -