Federation de données semi-structurées avec XML (Federation of Semi-Structured Data with XML)

Abstract : In contrast to traditional data, semi-structured data are
irregular: data may be missing, similar concepts may be represented with
different data types, and the structure, if any, may not be well known.
In practice, predefined schemas that describe real-world data are often
lacking, which makes it difficult to integrate data from different sources.

We propose a mediator architecture entirely based on XML. The objective of
the mediator architecture is to federate distributed and heterogeneous
data sources. It relies on XQuery, the functional language designed to
query across XML documents. The mediator parses the XQuery request,
dispatches it to the sources for evaluation, and recomposes the results
with additional query evaluation.
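
The dispatch-and-recompose flow described above can be illustrated with a
minimal Python sketch. It is not the thesis implementation: the XQuery
parsing step is assumed to have already produced one sub-query per source,
and the source wrappers (books_source, articles_source), the plan
dictionary and the min_year filter are all hypothetical names introduced
for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A "source" is modelled as a function that takes a sub-query string and
# returns an XML string. Real sources would be DBMSs, Web servers, etc.
Source = Callable[[str], str]


def books_source(subquery: str) -> str:
    # Hypothetical wrapper: ignores the sub-query and returns fixed data.
    return "<books><book><title>XML</title><year>2001</year></book></books>"


def articles_source(subquery: str) -> str:
    # Hypothetical wrapper: ignores the sub-query and returns fixed data.
    return ("<articles><article><title>XQuery</title>"
            "<year>2003</year></article></articles>")


def mediate(plan: Dict[str, str], sources: Dict[str, Source],
            min_year: int) -> ET.Element:
    """Dispatch each sub-query to its source, then recompose the results."""
    result = ET.Element("result")
    for name, subquery in plan.items():
        fragment = ET.fromstring(sources[name](subquery))
        for item in fragment:
            # Additional evaluation at the mediator: a residual filter
            # that is assumed not to have been applied by the sources.
            year = item.findtext("year")
            if year is not None and int(year) >= min_year:
                result.append(item)
    return result


if __name__ == "__main__":
    plan = {"books": "sub-query 1", "articles": "sub-query 2"}
    sources = {"books": books_source, "articles": articles_source}
    merged = mediate(plan, sources, min_year=2002)
    print(ET.tostring(merged, encoding="unicode"))
```

In this sketch the mediator itself applies the year filter; a real mediator
would decide, per source, which parts of the query can be delegated.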

Query evaluation must exploit the specific characteristics of the data in
order to be optimized efficiently. We present XAlgebra, an algebra based
on operators designed for XML. This algebra is used to construct
execution plans for the evaluation of XQuery, and it operates on tuples
of tree structures.

These execution plans must be evaluated with a cost model, and the plan
with the minimal cost is selected. In this thesis, we define a cost model
for semi-structured data that is designed for our algebra.
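
As a rough illustration of cost-based plan selection, the Python sketch
below estimates the cost of each candidate plan and keeps the cheapest.
The operator names, cardinalities, per-tuple costs and the additive cost
formula are assumptions made for this example; they are not the cost model
defined in the thesis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    op: str                 # hypothetical operator name
    cardinality: int        # estimated number of tuples produced
    per_tuple_cost: float   # assumed cost of producing one tuple
    children: List["Plan"] = field(default_factory=list)


def cost(plan: Plan) -> float:
    """Assumed formula: local cost plus the cost of all sub-plans."""
    local = plan.cardinality * plan.per_tuple_cost
    return local + sum(cost(child) for child in plan.children)


def cheapest(plans: List[Plan]) -> Plan:
    """Select the candidate plan with the minimal estimated cost."""
    return min(plans, key=cost)


# Candidate A: fetch everything from the source, filter at the mediator.
plan_a = Plan("select_at_mediator", 10, 0.1,
              [Plan("source_scan", 1000, 1.0)])
# Candidate B: push the filter down to the source.
plan_b = Plan("source_scan_with_filter", 10, 1.5)

best = cheapest([plan_a, plan_b])
print(best.op)
```

With these assumed figures, pushing the filter to the source is selected
because far fewer tuples are produced and transferred.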

Since the data sources (DBMS, Web server, search engine, etc.) may be very
heterogeneous, they can have different data-processing capabilities, and
their cost models may be defined with different levels of precision. To
integrate this information into the mediation architecture, the mediator
and the sources need a way to exchange it. To do this, we use XML-based
languages such as XML Schema and MathML to export the metadata, the cost
formulas and the definitions of source capabilities. The exported
information is transferred through an application interface called XML/DBC.
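
To make the idea of an exported cost formula concrete, the following
Python sketch evaluates a small formula written in content MathML. The
formula itself (5 + 0.5 * card), the variable name card and the evaluate
helper are hypothetical; the abstract does not specify which MathML
constructs the sources actually export, and this sketch only handles
plus and times.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from operator import mul

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical cost formula exported by a source: 5 + 0.5 * card
FORMULA = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/>
    <cn>5</cn>
    <apply><times/><cn>0.5</cn><ci>card</ci></apply>
  </apply>
</math>
"""


def evaluate(node: ET.Element, env: dict) -> float:
    """Evaluate a tiny subset of content MathML (cn, ci, plus, times)."""
    tag = node.tag.replace(MATHML_NS, "")
    if tag == "math":
        return evaluate(node[0], env)
    if tag == "cn":                       # numeric constant
        return float(node.text)
    if tag == "ci":                       # variable looked up in env
        return float(env[node.text.strip()])
    if tag == "apply":                    # first child is the operator
        operator_tag = node[0].tag.replace(MATHML_NS, "")
        args = [evaluate(child, env) for child in node[1:]]
        if operator_tag == "plus":
            return sum(args)
        if operator_tag == "times":
            return reduce(mul, args, 1.0)
    raise ValueError(f"unsupported MathML element: {tag}")


estimated_cost = evaluate(ET.fromstring(FORMULA), {"card": 100})
print(estimated_cost)
```

The mediator side would supply the variable bindings (here a cardinality
estimate) and plug the resulting value into its own plan costing.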

Finally, various optimizations specific to this mediator architecture
must be considered. To this end, we introduce a semantic cache based on
a DBMS prototype that stores XML data natively and efficiently.
Document type : Theses

https://tel.archives-ouvertes.fr/tel-00005162
Contributor : Tuyêt Trâm Dang Ngoc
Submitted on : Saturday, February 28, 2004 - 10:31:15 AM
Last modification on : Tuesday, December 1, 2020 - 2:18:03 PM
Long-term archiving on : Wednesday, September 12, 2012 - 1:45:10 PM

Identifiers

  • HAL Id : tel-00005162, version 1

Citation

Tuyet Tram Dang Ngoc. Federation de données semi-structurées avec XML. Interface homme-machine [cs.HC]. Université de Versailles-Saint Quentin en Yvelines, 2003. Français. ⟨tel-00005162⟩
