Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018)

The Analysis of Web Page Information Processing Based on Natural Language Processing

Authors
Yusheng Zhao
Corresponding Author
Yusheng Zhao
Available Online July 2018.
DOI
10.2991/cecs-18.2018.79
Keywords
Natural Language Processing, Python, Crawler, Word Segmentation, TF-IDF.
Abstract

Web pages have become increasingly complex in structure and increasingly large in content. As a result, they often contain a great deal of useless or even illegal information. Extracting the keywords from web page content and filtering out invalid or illegal information have therefore become a focus of attention. This paper uses a crawler to collect web page content and then processes that content with natural language processing (NLP) techniques, filtering out invalid or illegal information and identifying the key information on each page. The paper concludes that NLP is a reasonable and practical approach for processing web page information.
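The abstract does not include the paper's code. As a rough illustration of the kind of pipeline it describes (extract the text of a crawled page, drop unwanted terms, rank the remaining words with TF-IDF), here is a minimal sketch using only the Python standard library. It assumes the pages have already been downloaded as HTML strings. The `BLOCKED` word list, the function names, and the simple regex tokenizer are hypothetical stand-ins, not the author's method; the paper's crawler, word segmentation, and filtering rules may differ considerably.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Return the visible text of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical blocklist standing in for "invalid or illegal" terms.
BLOCKED = {"casino", "lottery"}


def tokenize(text):
    # Assumption: lowercase ASCII words only. A real system would use a
    # proper word segmenter for the target language instead.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_keywords(pages, top_k=3):
    """Rank each page's terms by TF-IDF and return the top_k per page."""
    docs = [[t for t in tokenize(page_text(p)) if t not in BLOCKED] for p in pages]
    n = len(docs)
    df = Counter()  # document frequency: number of pages containing each term
    for doc in docs:
        df.update(set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc) or 1
        # tf-idf(t, d) = (count of t in d / terms in d) * log(N / df(t))
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_k]])
    return results


if __name__ == "__main__":
    pages = [
        "<html><body><h1>Python crawler</h1><p>The crawler downloads pages.</p>"
        "<script>var x = 1;</script></body></html>",
        "<html><body><p>The TF IDF weight ranks keywords. Casino ads.</p></body></html>",
    ]
    print(tf_idf_keywords(pages))
    # [['crawler', 'downloads', 'pages'], ['ads', 'idf', 'keywords']]
```

With this IDF variant, a word that appears on every page (here "the") gets a score of zero, so common function words sink without a separate stopword list, while "crawler", repeated on the first page, ranks highest there. The blocked term "casino" never reaches the ranking at all.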

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Computer Science Research
Publication Date
July 2018
ISBN
10.2991/cecs-18.2018.79
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Yusheng Zhao
PY  - 2018/07
DA  - 2018/07
TI  - The Analysis of Web Page Information Processing Based on Natural Language Processing
BT  - Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018)
PB  - Atlantis Press
SP  - 466
EP  - 469
SN  - 2352-538X
UR  - https://doi.org/10.2991/cecs-18.2018.79
DO  - 10.2991/cecs-18.2018.79
ID  - Zhao2018/07
ER  -