A Materialized View-Based Approach for the Integration of XML Documents

Abstract: Semi-structured data play an increasing role in the development of the Web through the use of XML. However, the management of semi-structured data poses specific problems because semi-structured data, unlike classical databases, do not rely on a predefined schema. The schema of a document is contained in the document itself, and similar documents may be represented by different schemas. Consequently, the techniques and algorithms used for querying or integrating these data are more complex than those used for structured data. The objective of our work is the integration of XML data using the principles of Osiris, a prototype KB-DBMS in which views are a central concept. In this system, a family of objects is defined by a hierarchy of views, where a view is defined by its parent views and its own attributes and constraints. Osiris belongs to the family of Description Logics; the minimal view of a family of objects is treated as a primitive concept and its other views as defined concepts. An object of a family satisfies some of its views. For each family of objects, Osiris builds an n-dimensional classification space by analysing the constraints defined in all of its views. This space is used for object classification and indexing. In this thesis we study the contribution of the main features of Osiris (classification, indexing, and semantic query optimization) to the integration of XML documents. For this purpose we produce a target schema (an abstract XML schema), which represents an Osiris schema; every document satisfying a source schema (a concrete XML schema) is rewritten in terms of the target schema before the values of its entities are extracted. The objects corresponding to these entities are then classified and indexed. The Osiris mechanism for semantic query optimization can then be used to extract the objects relevant to a query.
We have implemented a prototype, named OSIX (Osiris-based System for the Integration of XML documents), and we have applied it to the integration and querying of XML documents simulating the data of a hospital.
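
As an illustration of the view-based classification described above, the following minimal Python sketch models a family of objects as a hierarchy of views, each defined by its parent views and its own constraints, and determines which views a given object satisfies. The names `View` and `classify`, the view names, and the hospital-style attributes `age` and `ward` are hypothetical and chosen for illustration only. This is not the Osiris or OSIX implementation, and it does not build the n-dimensional classification space; it simply tests each view in turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An object is represented here as a plain attribute dictionary (an assumption).
Obj = Dict[str, object]


@dataclass
class View:
    """A view defined by its parent views and its own constraints."""
    name: str
    parents: List["View"] = field(default_factory=list)
    constraints: List[Callable[[Obj], bool]] = field(default_factory=list)

    def is_satisfied_by(self, obj: Obj) -> bool:
        # An object satisfies a view if it satisfies every parent view
        # and every constraint declared by the view itself.
        return (all(p.is_satisfied_by(obj) for p in self.parents)
                and all(c(obj) for c in self.constraints))


def classify(obj: Obj, views: List[View]) -> List[str]:
    """Return the names of all views that the object satisfies."""
    return [v.name for v in views if v.is_satisfied_by(obj)]


# Hypothetical family: PERSON is the minimal view, the others refine it.
person = View("PERSON", constraints=[
    lambda o: isinstance(o.get("age"), int) and o["age"] >= 0])
minor = View("MINOR", parents=[person], constraints=[lambda o: o["age"] < 18])
adult = View("ADULT", parents=[person], constraints=[lambda o: o["age"] >= 18])
inpatient = View("INPATIENT", parents=[person],
                 constraints=[lambda o: o.get("ward") is not None])

views = [person, minor, adult, inpatient]
print(classify({"age": 12, "ward": "pediatrics"}, views))
```

In this sketch the minimal view plays the role of the primitive concept, and the other views are defined relative to it. Checking every view against every object is the naive approach; a classification space such as the one the abstract describes would serve to avoid that exhaustive test.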
