Habilitation à diriger des recherches

Performance optimization and evaluation for supporting the design and administration of complex data warehouses

Abstract: Data warehouses form the basis of decision-support systems. They help integrate the production data of companies or organizations and support multidimensional on-line analysis (OLAP) or data mining. Complex data are now increasingly exploited within decision-support processes; hence, new data warehousing approaches are being developed, some of which exploit the XML language. In this context, data warehouse performance remains as crucial an issue as ever.

In this thesis, we aim to propose novel solutions for optimizing and evaluating data warehouse performance. We have designed a generic approach whose objective is to automatically propose solutions to data warehouse administrators for optimizing data access times. The principle of this approach is to apply data mining techniques to a workload (set of queries) that is representative of data warehouse usage in order to deduce a quasi-optimal configuration of indices and/or materialized views. Cost models then help select, among these data structures, those that are the most efficient in terms of performance gain/overhead ratio.
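
As an illustration only, the two-step principle described above (mine the workload for candidates, then filter them with a cost model) can be sketched as follows. This is a minimal sketch under strong assumptions, not the method developed in the thesis: plain frequent attribute-set counting stands in for the data mining step, the cost model is reduced to a support/size ratio under a storage budget, and the names `frequent_attribute_sets`, `select_indexes`, `size_of` and `budget`, as well as the toy workload, are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(workload, min_support):
    """Return attribute sets that appear together in at least
    `min_support` queries (naive frequent itemset counting)."""
    counts = Counter()
    for attrs in workload:
        attrs = sorted(set(attrs))
        # Enumerate every non-empty subset of the query's attributes.
        for size in range(1, len(attrs) + 1):
            for subset in combinations(attrs, size):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def select_indexes(candidates, size_of, budget):
    """Greedily keep the candidates with the best gain/overhead ratio.
    Assumption: gain is approximated by support, overhead by size_of()."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1] / size_of(kv[0]),
                    reverse=True)
    chosen, used = [], 0
    for attrs, _support in ranked:
        cost = size_of(attrs)
        if used + cost <= budget:
            chosen.append(attrs)
            used += cost
    return chosen


# Hypothetical workload: each query is reduced to the attributes
# it restricts or groups on.
workload = [
    ["city", "year"],
    ["city", "year"],
    ["city"],
    ["product"],
]

candidates = frequent_attribute_sets(workload, min_support=2)
# Toy overhead model: one storage unit per indexed attribute.
chosen = select_indexes(candidates, size_of=len, budget=2)
print(chosen)
```

A realistic cost model would estimate access-time gain and maintenance overhead from data warehouse statistics rather than from attribute counts; the greedy ratio-based selection is only one simple way to use such estimates.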

Moreover, performance evaluation can support data warehouse design. In order to experimentally validate our approach, we have also designed several generic benchmarks, whose main design principle is adaptability. Indeed, comparing the efficiency of different performance optimization techniques requires testing them in various environments, on different database and workload configurations, etc. The ability to assess the impact of different architectural choices is also a valuable help in data warehouse design. Our benchmarks thus allow the generation of various data warehouse configurations, as well as associated decision-support workloads.

Finally, our performance optimization and evaluation solutions have been implemented in the contexts of both relational and XML data warehouses.
