IXIA (IndeX-based Integration Approach): A Hybrid Approach to Data Integration

Shokoh Kermanshahani 1
TIMC - Techniques de l'Ingénierie Médicale et de la Complexité - Informatique, Mathématiques et Applications, Grenoble - UMR 5525
Abstract: A large and growing volume of documents, data sources, and database management systems is available worldwide, and many autonomous, heterogeneous sources describe the same reality using different vocabularies and conceptual structures. Many organizations need a system that handles such data homogeneously, which requires integrating these data sources.

The goal of a data integration system is to provide end users with a homogeneous interface for querying several heterogeneous and autonomous sources. Building such an interface raises many challenges, the most important of which are the heterogeneity of the data sources, the fragmentation of the data, and query processing and optimization.

Many research projects propose different approaches, each offering its own solutions to these problems. Depending on the nature of the integrated view, these approaches fall into two main categories, materialized and virtual; hybrid approaches combine materialized and virtual views. The main advantage of a hybrid approach is that it offers a trade-off between query response time and data freshness. In existing approaches, however, query optimization mostly benefits the materialized part of the system.

In this thesis, we develop a hybrid approach that aims to extend query optimization to all queries of the integration system. It also provides a flexible data refreshing mechanism that accommodates the differing characteristics of the sources and their data. The approach is based on the object indexing system of Osiris, a database and knowledge base platform whose object data model is built on a hierarchy of views. Its indexing system relies on partitioning the object space according to the view constraints.
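To make the idea of partitioning the object space by view constraints more concrete, here is a minimal Python sketch. It is not the Osiris implementation: the predicates, the view definitions, and the names `PREDICATES`, `VIEWS`, `cell_of`, `build_index`, and `query_view` are all hypothetical and chosen only for illustration. Each object falls into exactly one cell, identified by the set of elementary constraints it satisfies, and a query on a view only has to visit the cells compatible with that view.

```python
from typing import Callable, Dict, FrozenSet, Set

# Hypothetical elementary constraints (illustrative only, not taken from Osiris).
PREDICATES: Dict[str, Callable[[dict], bool]] = {
    "adult": lambda o: o["age"] >= 18,
    "senior": lambda o: o["age"] >= 65,
    "insured": lambda o: o["insured"],
}

# Hypothetical views, each defined as a conjunction of elementary constraints.
VIEWS: Dict[str, Set[str]] = {
    "Adult": {"adult"},
    "InsuredSenior": {"senior", "insured"},
}


def cell_of(obj: dict) -> FrozenSet[str]:
    """Return the partition cell of an object: the set of constraints it satisfies."""
    return frozenset(name for name, pred in PREDICATES.items() if pred(obj))


def build_index(objects: Dict[str, dict]) -> Dict[FrozenSet[str], Set[str]]:
    """Group object identifiers by partition cell."""
    index: Dict[FrozenSet[str], Set[str]] = {}
    for oid, obj in objects.items():
        index.setdefault(cell_of(obj), set()).add(oid)
    return index


def query_view(index: Dict[FrozenSet[str], Set[str]], view: str) -> Set[str]:
    """Collect the identifiers stored in every cell that satisfies the view's constraints."""
    required = VIEWS[view]
    result: Set[str] = set()
    for cell, oids in index.items():
        if required <= cell:  # the cell satisfies all constraints of the view
            result |= oids
    return result


objects = {
    "o1": {"age": 30, "insured": True},
    "o2": {"age": 70, "insured": True},
    "o3": {"age": 10, "insured": False},
    "o4": {"age": 70, "insured": False},
}
index = build_index(objects)
```

In this sketch, answering a view query never evaluates a predicate on an individual object; the work is done once, when the object is classified into its cell.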

IXIA, the hybrid approach presented in this thesis, materializes the indexing structure of the underlying objects at the mediator level. The object identifiers (Oids), their correspondence with the source objects, and the data needed to refresh the index are also materialized.
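The following Python sketch illustrates, under simplifying assumptions, what a mediator that materializes only an index might look like. It is not the IXIA implementation: the `Source` and `Mediator` classes, the `classify` function, and the use of a single string as index key are hypothetical. The mediator stores the index, the Oid-to-source correspondence, and the current index key of each object; the object data itself stays in the sources and is fetched at query time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


@dataclass
class Source:
    """Stand-in for an autonomous data source (hypothetical)."""
    name: str
    rows: Dict[int, dict]  # local key -> record


class Mediator:
    def __init__(self, classify: Callable[[dict], Hashable]):
        self.classify = classify                          # record -> index key
        self.index: Dict[Hashable, Set[str]] = {}         # index key -> Oids
        self.origin: Dict[str, Tuple[Source, int]] = {}   # Oid -> (source, local key)
        self.key_of: Dict[str, Hashable] = {}             # Oid -> current index key
        self._next = 0

    def _place(self, oid: str, key: Hashable) -> None:
        """Move an Oid to the index entry for `key`, removing any stale entry."""
        old = self.key_of.get(oid)
        if old is not None:
            self.index[old].discard(oid)
        self.index.setdefault(key, set()).add(oid)
        self.key_of[oid] = key

    def register(self, source: Source, local_key: int) -> str:
        """Assign an Oid to a source object and index it."""
        oid = f"oid{self._next}"
        self._next += 1
        self.origin[oid] = (source, local_key)
        self._place(oid, self.classify(source.rows[local_key]))
        return oid

    def refresh(self, oid: str) -> None:
        """Re-read one source object and update its index entry."""
        source, local_key = self.origin[oid]
        self._place(oid, self.classify(source.rows[local_key]))

    def query(self, wanted_keys: Iterable[Hashable]) -> Dict[str, dict]:
        """Use the materialized index to select Oids, then fetch data from the sources."""
        oids: Set[str] = set()
        for key in wanted_keys:
            oids |= self.index.get(key, set())
        return {oid: self.origin[oid][0].rows[self.origin[oid][1]] for oid in oids}


hr = Source("hr", {1: {"age": 30}, 2: {"age": 70}})
mediator = Mediator(lambda r: "senior" if r["age"] >= 65 else "other")
a = mediator.register(hr, 1)
b = mediator.register(hr, 2)
```

Because `refresh` works one object at a time, a refresh policy could be chosen per source or per object, which is one way to read the flexibility described above; the policy itself is left out of this sketch.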

Our index-based data integration approach offers more flexibility in data refreshing than a fully materialized approach, and a better query response time than a fully virtual data integration system.
