Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications

Study of Extraction for Web Pages Information Based on XML

Authors
Suming Li
Corresponding Author
Suming Li
Available Online May 2016.
DOI
10.2991/wartia-16.2016.174
Keywords
XML, web pages, information extraction, knowledge base.
Abstract

This paper proposes a web information platform based on XML. The platform combines the advantages of several existing extraction technologies. It first extracts the key information from web pages automatically using XML technology, then translates that key information into structured, extensible XML documents. Finally, it derives the corresponding extraction rules from a group of similar pages and uses these rules to complete the extraction of web page information.
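The abstract does not spell out how the rules are derived, so the following is only a minimal sketch of the general idea, not the author's method. It assumes that a "rule" is a tag path whose text differs across a group of similar pages, which is a common simplification for template-based pages. The function names (`text_by_path`, `induce_rules`, `extract_to_xml`) and the output element names (`record`, `field`) are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags that never have a closing tag; they are kept off the path stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class PathTextParser(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the nearest matching open tag (tolerates sloppy HTML).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def text_by_path(html):
    """Map each tag path to the first text found at that path."""
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    table = {}
    for path, text in parser.items:
        table.setdefault(path, text)
    return table


def induce_rules(pages):
    """Assumed rule induction: keep paths that occur on every page
    and whose text is not identical across the pages."""
    tables = [text_by_path(page) for page in pages]
    shared = set(tables[0]).intersection(*tables[1:])
    return sorted(p for p in shared if len({t[p] for t in tables}) > 1)


def extract_to_xml(html, rules):
    """Apply the induced paths to one page and serialize the result as XML."""
    table = text_by_path(html)
    root = ET.Element("record")
    for path in rules:
        if path in table:
            field = ET.SubElement(root, "field", {"path": path})
            field.text = table[path]
    return ET.tostring(root, encoding="unicode")
```

Under these assumptions, calling `induce_rules` on two or more pages that share a template returns the paths of the varying fields, and `extract_to_xml` then turns any further page from the same template into a small XML record. Text that is identical on every sample page, such as a site header, is treated as template content and skipped.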

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications
Series
Advances in Engineering Research
Publication Date
May 2016
ISBN
978-94-6252-195-7
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Suming Li
PY  - 2016/05
DA  - 2016/05
TI  - Study of Extraction for Web Pages Information Based on XML
BT  - Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications
PB  - Atlantis Press
SP  - 827
EP  - 830
SN  - 2352-5401
UR  - https://doi.org/10.2991/wartia-16.2016.174
DO  - 10.2991/wartia-16.2016.174
ID  - Li2016/05
ER  -