Title
Schema matching on streams with accuracy guarantees
Date Issued
01 January 2008
Access level
metadata-only access
Resource Type
journal article
Author(s)
Gama J.
Klinkenberg R.
Humboldt-Universität zu Berlin
Publisher(s)
IOS Press
Abstract
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, yet is guaranteed to find a matching that closely approximates the matching that would be obtained if the entire stream were processed. The method can be applied to any given similarity metric (or combination of metrics) that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases. © 2008 IOS Press. All rights reserved.
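The abstract does not state which estimator or error bound the paper uses. The sketch below is only a rough illustration of the general idea of matching attributes from a bounded sample with an accuracy guarantee, and it rests on three assumptions:

- The error bound is a Hoeffding bound, with a union bound taken over all attribute pairs.
- The similarity metric is a hypothetical containment score: the fraction of sampled stream values that appear in a database column's value set.
- The stream prefix behaves like an i.i.d. sample.

The names `sample_size`, `containment` and `match` were invented for this example. If every estimate lies within ε of its true value, then each chosen column's true similarity is within 2ε of the best column's. Under the stated assumptions, that event holds with probability at least 1 - δ.

```python
import math
from itertools import islice


def sample_size(epsilon: float, delta: float, k: int) -> int:
    """Records needed so that k sample means of [0, 1]-valued scores are all
    within +/- epsilon of their true means with probability >= 1 - delta
    (Hoeffding bound per estimate, union bound over the k estimates)."""
    return math.ceil(math.log(2 * k / delta) / (2 * epsilon ** 2))


def containment(sample: list, attr: str, values: set) -> float:
    """Hypothetical metric: fraction of sampled records whose value for
    `attr` occurs in a database column's value set (each term is 0 or 1)."""
    return sum(1 for record in sample if record[attr] in values) / len(sample)


def match(stream, stream_attrs, db_columns, epsilon=0.05, delta=0.01):
    """Map each stream attribute to the database column with the highest
    estimated containment, reading only a bounded prefix of the stream.

    Assumes the prefix behaves like an i.i.d. sample of the stream."""
    k = len(stream_attrs) * len(db_columns)  # number of estimated similarities
    n = sample_size(epsilon, delta, k)
    sample = list(islice(stream, n))  # reads at most n records
    if not sample:
        raise ValueError("stream is empty")
    mapping = {}
    for attr in stream_attrs:
        scores = {col: containment(sample, attr, values)
                  for col, values in db_columns.items()}
        mapping[attr] = max(scores, key=scores.get)
    return mapping


cities = ["Berlin", "Porto", "Lima"]
stream = ({"id": i, "city": cities[i % 3]} for i in range(100_000))
db = {"customer_id": set(range(100_000)),
      "town": {"Berlin", "Porto", "Lima", "Paris"}}
print(match(stream, ["id", "city"], db))  # → {'id': 'customer_id', 'city': 'town'}
```

In the usage example, only 1,337 of the 100,000 records are read, because that is the sample size for k = 4 pairs, ε = 0.05 and δ = 0.01. The required sample size grows only logarithmically in the number of candidate pairs. This is what makes a fixed-size sample attractive for unbounded streams.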
Start page
253
End page
270
Volume
12
Issue
3
Language
English
OECD knowledge area
Information sciences; Telecommunications
Scopus EID
2-s2.0-51849119371
Source
Intelligent Data Analysis
ISSN of the container
1088-467X
Sources of information: Directorio de Producción Científica; Scopus