Proceedings of the 2016 International Conference on Education, Management, Computer and Society

Research on Similarity for XML Information Retrieval

Authors
Xueli Ren, Yubiao Dai
Corresponding Author
Xueli Ren
Available Online January 2016.
DOI
https://doi.org/10.2991/emcs-16.2016.476
Keywords
XML; Information Retrieval; Structure Similarity; Semantic; Content
Abstract
With the continuous development of the Internet and the growth of rich resources on the Web, XML-based information retrieval has emerged, and document similarity is the basis of information retrieval. This paper proposes a new method, SC-Similarity, that computes the similarity of XML documents from both structure and content. An XML document is expressed as a collection of tuples; its paths are extracted and recurring paths are deleted to improve efficiency, and paths are fuzzily matched using dynamic programming and WordNet. The structural similarity between documents is then calculated using the Hungarian algorithm, and the content similarity is estimated by set matching. Finally, the overall similarity of the XML documents is estimated. Two experiments show that the method is effective: experiment 1 tests structural similarity, and experiment 2 tests information retrieval on automatically generated document sets and real data sets. The results show that accuracy can reach 95%.
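The pipeline described in the abstract (extract paths, drop repeats, match paths, then combine structure and content scores) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not give the paper's formulas. Everything below is an assumption made for illustration: the function names, the use of longest common subsequence as the dynamic-programming path score, exact tag comparison in place of WordNet synonym lookup, a brute-force optimal matching standing in for the Hungarian algorithm, Jaccard overlap as the set-matching measure, and the hypothetical weight `alpha` for combining the two scores.

```python
import xml.etree.ElementTree as ET
from itertools import permutations


def root_to_leaf_paths(xml_text):
    """Return the distinct root-to-leaf tag paths; the set drops repeats."""
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(ET.fromstring(xml_text), ())
    return paths


def path_similarity(p, q):
    """LCS length divided by the longer path length (dynamic programming).

    Assumption: tags match only when equal; a WordNet lookup is omitted.
    """
    table = [[0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            if p[i - 1] == q[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(p)][len(q)] / max(len(p), len(q))


def structure_similarity(paths_a, paths_b):
    """Score of the best one-to-one path matching, normalised to [0, 1].

    Brute force over permutations: a stand-in for the Hungarian algorithm
    that yields the same optimum but is only practical for small path sets.
    """
    small, large = sorted((list(paths_a), list(paths_b)), key=len)
    if not large:
        return 1.0
    best = max(
        sum(path_similarity(s, l) for s, l in zip(small, chosen))
        for chosen in permutations(large, len(small))
    )
    return best / len(large)


def content_similarity(xml_a, xml_b):
    """Jaccard overlap of the lower-cased words in each document's text."""
    words_a = set(" ".join(ET.fromstring(xml_a).itertext()).lower().split())
    words_b = set(" ".join(ET.fromstring(xml_b).itertext()).lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


def sc_similarity(xml_a, xml_b, alpha=0.5):
    """Weighted mix of both scores; `alpha` is a hypothetical parameter."""
    structure = structure_similarity(
        root_to_leaf_paths(xml_a), root_to_leaf_paths(xml_b)
    )
    return alpha * structure + (1 - alpha) * content_similarity(xml_a, xml_b)


doc_a = "<book><title>XML retrieval</title><author><name>Ren</name></author></book>"
doc_b = "<book><title>XML similarity</title><writer><name>Dai</name></writer></book>"
print(sc_similarity(doc_a, doc_b))
```

In this sketch the tags `author` and `writer` count as a mismatch because tags are compared literally. A semantic step such as the WordNet matching named in the abstract would presumably score such a pair higher.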
Open Access
This is an open access article distributed under the CC BY-NC license.

Proceedings
International Conference on Education, Management, Computer and Society
Part of series
Advances in Computer Science Research
Publication Date
January 2016
ISBN
978-94-6252-158-2
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Xueli Ren
AU  - Yubiao Dai
PY  - 2016/01
DA  - 2016/01
TI  - Research on Similarity for XML Information Retrieval
BT  - International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 1897
EP  - 1901
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.476
DO  - https://doi.org/10.2991/emcs-16.2016.476
ID  - Ren2016/01
ER  -