Proceedings of the 2016 International Conference on Education, Management, Computer and Society

Estimation of Structural Similarity of XML Document Based on Frequency and Path

Authors
Xueli Ren, Yubiao Dai
Corresponding Author
Xueli Ren
Available Online January 2016.
DOI
10.2991/emcs-16.2016.66
Keywords
XML; Structural similarity; Frequency; Semantic; Tuple
Abstract

With the continuous development of the Internet and the growth of rich resources on the Web, XML-based information retrieval has emerged, and document similarity is the basis of information retrieval. This paper proposes a method for computing the similarity of XML documents based on path and frequency. An XML document is expressed as a collection of tuples; the paths are extracted and recurring paths are deleted to improve efficiency, and tags are matched using WordNet. Path similarity is then computed from the fuzzy longest common subsequence and frequency, and finally the structural similarity between documents is calculated. Two experiments show that the method is effective: Experiment 1 tests the structural similarity of 15 XML documents from 3 DTDs, and Experiment 2 applies the similarity computation to document classification on real data sets, where the results show that the accuracy can reach 100%.
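The pipeline outlined in the abstract can be illustrated with a rough Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the frequency weighting in `document_similarity` is an assumption, exact tag equality stands in for WordNet-based tag matching, and a plain longest common subsequence stands in for the fuzzy variant. All function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def path_frequencies(xml_text):
    """Map each distinct root-to-node tag path to its number of occurrences.

    Storing a count per distinct path mirrors the idea of deleting
    recurring paths while keeping their frequency.
    """
    counts = Counter()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), ())
    return counts


def lcs_length(a, b):
    """Plain longest common subsequence length over two tag sequences.

    Assumption: tags match only when equal; the paper matches tags
    semantically with WordNet and uses a fuzzy LCS instead.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def path_similarity(p, q):
    """LCS length normalised by the longer path, giving a value in [0, 1]."""
    return lcs_length(p, q) / max(len(p), len(q))


def document_similarity(xml_a, xml_b):
    """Frequency-weighted best-match path similarity, averaged both ways.

    Assumption: this weighting scheme is illustrative only.
    """
    fa, fb = path_frequencies(xml_a), path_frequencies(xml_b)

    def directed(src, dst):
        total = sum(src.values())
        score = sum(n * max(path_similarity(p, q) for q in dst)
                    for p, n in src.items())
        return score / total

    return (directed(fa, fb) + directed(fb, fa)) / 2
```

Under these assumptions, two documents with identical path sets and frequencies score 1.0, and documents that share no tags score 0.0; a classifier like the one in Experiment 2 could assign a document to the class whose members it resembles most.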

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2016 International Conference on Education, Management, Computer and Society
Series
Advances in Computer Science Research
Publication Date
January 2016
ISBN
978-94-6252-158-2
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Xueli Ren
AU  - Yubiao Dai
PY  - 2016/01
DA  - 2016/01
TI  - Estimation of Structural Similarity of XML Document Based on Frequency and Path
BT  - Proceedings of the 2016 International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 272
EP  - 275
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.66
DO  - 10.2991/emcs-16.2016.66
ID  - Ren2016/01
ER  -