Proceedings of the 2016 International Conference on Education, Management, Computer and Society

Estimation of Structural Similarity of XML Document Based on Frequency and Path

Authors
Xueli Ren, Yubiao Dai
Corresponding Author
Xueli Ren
Available Online January 2016.
DOI
https://doi.org/10.2991/emcs-16.2016.66
Keywords
XML; Structural similarity; Frequency; Semantic; Tuple
Abstract
With the continued growth of the Internet and the wealth of resources appearing on the Web, XML-based information retrieval has emerged, and document similarity is the basis of information retrieval. This paper proposes a method for computing the similarity of XML documents based on path and frequency. An XML document is represented as a collection of tuples; paths are extracted and recurring paths are deleted to improve efficiency; tags are matched using WordNet; path similarity is then computed from the fuzzy longest common subsequence and frequency; finally, the structural similarity between documents is calculated. Two experiments show that the method is effective: Experiment 1 tests the structural similarity of 15 XML documents from 3 DTDs, and Experiment 2 applies the similarity computation to document classification on real data sets, where the results show that accuracy can reach 100%.
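The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the paper's formulas, so every function name, the exact-match stand-in for WordNet tag matching, the frequency ratio, and the best-match averaging used to combine path scores are assumptions chosen only to make the steps concrete.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paths(xml_text):
    """Map each distinct root-to-node tag path to how often it occurs.

    Storing a count instead of repeated copies plays the role of
    'deleting recurring paths' while keeping their frequency.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, ())
    return counts


def tag_similarity(a, b):
    # Assumption: stand-in for WordNet matching. Only a case-insensitive
    # exact match counts; a semantic measure could be swapped in here.
    return 1.0 if a.lower() == b.lower() else 0.0


def fuzzy_lcs(p, q, threshold=0.5):
    """Weighted LCS: two tags align when their similarity reaches the
    (hypothetical) threshold, and contribute that similarity as score."""
    m, n = len(p), len(q)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = tag_similarity(p[i - 1], q[j - 1])
            best = max(dp[i - 1][j], dp[i][j - 1])
            if s >= threshold:
                best = max(best, dp[i - 1][j - 1] + s)
            dp[i][j] = best
    return dp[m][n]


def path_similarity(p, fp, q, fq):
    # Assumption: normalised fuzzy LCS scaled by the ratio of frequencies.
    structural = fuzzy_lcs(p, q) / max(len(p), len(q))
    frequency = min(fp, fq) / max(fp, fq)
    return structural * frequency


def document_similarity(xml_a, xml_b):
    """Symmetric average of each path's best match in the other document."""
    paths_a = extract_paths(xml_a)
    paths_b = extract_paths(xml_b)

    def best_avg(src, dst):
        total = sum(
            max(path_similarity(p, fp, q, fq) for q, fq in dst.items())
            for p, fp in src.items()
        )
        return total / len(src)

    return (best_avg(paths_a, paths_b) + best_avg(paths_b, paths_a)) / 2
```

Under these assumptions, two identical documents score 1.0 and two documents with no tags in common score 0.0; documents that share part of their structure fall in between.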
Open Access
This is an open access article distributed under the CC BY-NC license.

Proceedings
International Conference on Education, Management, Computer and Society
Part of series
Advances in Computer Science Research
Publication Date
January 2016
ISBN
978-94-6252-158-2
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Xueli Ren
AU  - Yubiao Dai
PY  - 2016/01
DA  - 2016/01
TI  - Estimation of Structural Similarity of XML Document Based on Frequency and Path
BT  - International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 272
EP  - 275
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.66
DO  - https://doi.org/10.2991/emcs-16.2016.66
ID  - Ren2016/01
ER  -