Title
Semantic similarity of XML documents based on structural and content analysis
Date Issued
17 November 2020
Access level
metadata only access
Resource Type
conference paper
Publisher(s)
Association for Computing Machinery
Abstract
The eXtensible Markup Language (XML) has become the main standard for Web information representation and data exchange over the last decades. However, XML documents are highly heterogeneous in structure. Hence, there is still a need for new approaches to manage and recognize similar information that consider content and semantics in addition to document structure. Most current approaches analyze the content of XML documents semantically without regard to their structure, or vice versa. In this paper, we propose LSI*, a new approach to XML document similarity that integrates structural composition into the semantic analysis. We extend Latent Semantic Indexing (LSI), which is based on Singular Value Decomposition (SVD), by considering both the term itself and the context (i.e., the structural path) in which it appears to determine the semantic similarity between XML documents. To evaluate the performance of our proposal, we perform experiments comparing LSI* to state-of-the-art methods based on structural and content-structural analysis. Results show a precision of up to 71.43% when the XML structure is considered in the content analysis.
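The idea of treating a term together with the structural path in which it appears can be illustrated with a minimal sketch. This is not the authors' LSI* implementation: it omits the SVD-based dimensionality reduction entirely and only shows, under assumed names (`path_term_counts`, `cosine`) and an assumed whitespace tokenization, how path-qualified term counts could be compared with cosine similarity.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_term_counts(xml_text):
    """Count (structural path, term) pairs in an XML document.

    Hypothetical helper: each word in an element's text is keyed by the
    path of tags leading to that element, so the same word under
    different paths becomes a different feature.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        # Assumed tokenization: lowercase, split on whitespace.
        for word in (node.text or "").lower().split():
            counts[(path, word)] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


doc_a = path_term_counts("<book><title>xml similarity</title></book>")
doc_b = path_term_counts(
    "<book><title>xml similarity</title><year>2020</year></book>"
)
doc_c = path_term_counts("<article><name>xml similarity</name></article>")

print(cosine(doc_a, doc_b))  # same terms under the same paths, plus one extra
print(cosine(doc_a, doc_c))  # same terms, but under different paths
```

In this sketch, documents that share terms but place them under different paths share no features at all. A latent-space method such as LSI would typically soften that effect by applying a truncated SVD to the resulting feature-document matrix before comparing documents.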
Language
English
OECD Knowledge area
Computer science
Systems and communications engineering
Scopus EID
2-s2.0-85123040045
Resource of which it is part
ACM International Conference Proceeding Series
ISBN of the container
9781450388894
Conference
ACM International Conference Proceeding Series
Sources of information
Directorio de Producción Científica
Scopus