Data Engineering: Modeling and Integration Issues

Abstract : This report presents the main results in three research areas we have worked on since 1989: Geographical Databases, Data Integration, and Semantic Issues in PDMS (Peer Data Management Systems). A Geographical Database is a collection of inter-related, geo-referenced data. By definition, it is a database dedicated to the representation, storage and access of spatially referenced information. Traditional data modeling techniques were not adequate for geographical data. The difficulty is that most of these data are validated in terms of their spatial location, time, and the reliability of their collection. In this context, our contribution was the proposal of an object-oriented geographic data model, MGeo+, and its query language, LinGeo. We have also worked on the analysis of spatial access methods and on the proposal of a visual query language for geographical data, along with its user interface. Data integration systems are tools that offer uniform access to distributed and heterogeneous Web data sources. This is done by resolving the heterogeneities and giving the disparate sources a uniform view. Users submit queries over the integrated view without having to spend a lot of time searching and browsing the Web. We have been working on the specification and implementation of a data integration system, focusing mainly on the evolution of the mediation schema, query reformulation and quality issues. Schemas and instances drawn from heterogeneous, dynamic and distributed data sources rarely contain explicit semantic descriptions that could be used to derive the meaning or purpose of schema elements (e.g. entities, attributes and relationships). Implicit semantic information needs to be extracted in order to clarify the meaning of the schema elements.
To achieve this, an ontology of a given knowledge domain provides information about the semantic relations among the vocabulary terms shared by the data sources. Semantic interpretation, however, depends on people's understanding; it is a context-dependent task that requires a specific understanding of the shared domain knowledge. Context may be employed to improve decision-making over heterogeneity reconciliation in data integration processes, since it helps to understand both the data schema semantics and the data content semantics. We present our proposal for a context-oriented model and a domain-independent context manager, a contextual ontology for data integration, and a semantic-based approach to organizing peers in a PDMS.
Document type :
Habilitation à diriger des recherches

Cited literature [62 references]

https://tel.archives-ouvertes.fr/tel-00324525
Contributor : Mokrane Bouzeghoub
Submitted on : Thursday, September 25, 2008 - 11:47:28 AM
Last modification on : Friday, January 10, 2020 - 3:42:19 PM
Long-term archiving on: Friday, June 4, 2010 - 11:47:28 AM

Identifiers

  • HAL Id : tel-00324525, version 1

Citation

Ana Carolina Salgado. Data Engineering: Modeling and Integration Issues. Computer Science [cs]. Université de Versailles-Saint Quentin en Yvelines, 2008. ⟨tel-00324525⟩
