Proceedings of the 2016 International Conference on Education, Management, Computer and Society

Shenyang, China, 1-3 January 2016

Research on Similarity for XML Information Retrieval

Authors
Xueli Ren, Yubiao Dai
Corresponding Author
Xueli Ren
Available Online January 2016.
DOI
10.2991/emcs-16.2016.476
Keywords
XML; Information Retrieval; Structure Similarity; Semantic; Content
Abstract

With the continuing growth of the Internet and of the resources available on the Web, XML-based information retrieval has emerged, and document similarity is the basis of such retrieval. This paper proposes a new method, SC-Similarity, that computes the similarity of XML documents from both structure and content. Each XML document is represented as a collection of tuples; its paths are extracted and recurring paths are deleted to improve efficiency, and paths are then fuzzily matched using dynamic programming and WordNet. The structural similarity between documents is calculated with the Hungarian algorithm, and the content similarity is estimated by set matching. Finally, the overall similarity of the XML documents is estimated. Two experiments show that the method is effective: Experiment 1 tests structural similarity, and Experiment 2 tests information retrieval on automatically generated document sets and real data sets. The results show that accuracy can reach 95%.
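
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SC-Similarity implementation, and several simplifications are assumptions made here: tags are compared by exact match rather than through WordNet, the one-to-one path matching is found by brute force over permutations as a stand-in for the Hungarian algorithm (workable only for a handful of paths), content similarity is a Jaccard ratio over word sets, and the final score is a weighted sum whose weight `alpha` is hypothetical. All function names are illustrative.

```python
import itertools
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Extract root-to-leaf tag paths; a set drops recurring paths."""
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), ())
    return sorted(paths)


def path_similarity(p, q):
    """Longest common subsequence of two tag paths (dynamic programming),
    normalised by the longer path. Tags match only if identical here;
    a semantic lexicon such as WordNet could relax this test."""
    table = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            if p[i - 1] == q[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(p)][len(q)] / max(len(p), len(q))


def structure_similarity(paths_a, paths_b):
    """Best one-to-one path matching, found by brute force.
    The Hungarian algorithm solves the same assignment problem efficiently."""
    if not paths_a or not paths_b:
        return 0.0
    small, large = sorted((paths_a, paths_b), key=len)
    best = max(
        sum(path_similarity(s, l) for s, l in zip(small, perm))
        for perm in itertools.permutations(large, len(small))
    )
    return best / len(large)


def content_terms(xml_text):
    """Lower-cased set of words found in the text nodes."""
    root = ET.fromstring(xml_text)
    return {word.lower() for text in root.itertext() for word in text.split()}


def content_similarity(terms_a, terms_b):
    """Jaccard ratio of the two word sets (one possible form of set matching)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 1.0


def document_similarity(xml_a, xml_b, alpha=0.5):
    """Weighted sum of structure and content scores; alpha is an assumption."""
    structure = structure_similarity(root_to_leaf_paths(xml_a),
                                     root_to_leaf_paths(xml_b))
    content = content_similarity(content_terms(xml_a), content_terms(xml_b))
    return alpha * structure + (1 - alpha) * content


doc_a = "<book><title>XML retrieval</title><author><name>Ren</name></author></book>"
doc_b = "<book><title>XML similarity</title><writer><name>Dai</name></writer></book>"
print(document_similarity(doc_a, doc_b))
```

In this toy pair the `book/title` paths match exactly, while `book/author/name` and `book/writer/name` match only partially because `author` and `writer` are treated as different tags; a WordNet-based comparison would be expected to score such a pair higher.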

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 International Conference on Education, Management, Computer and Society
Series
Advances in Computer Science Research
Publication Date
January 2016
ISBN
978-94-6252-158-2
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Xueli Ren
AU  - Yubiao Dai
PY  - 2016/01
DA  - 2016/01
TI  - Research on Similarity for XML Information Retrieval
BT  - Proceedings of the 2016 International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 1897
EP  - 1901
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.476
DO  - 10.2991/emcs-16.2016.476
ID  - Ren2016/01
ER  -