
A Pattern Model and Algebra for Representing and Querying Relative Information Completeness

Abstract : Information incompleteness is a major data quality issue, amplified by the increasing amount of data collected from unreliable sources. Assessing the completeness of data is crucial for determining both the quality of the data and the validity of query answers.

In this work, we tackle the problem of extracting and reasoning about complete and missing information under the relative information completeness setting, in which the completeness of a dataset is assessed with respect to a complete reference dataset. We make two contributions: a pattern model that provides minimal covers summarizing the extent of complete and missing data partitions, and a pattern algebra that derives minimal pattern covers for query answers in order to analyze their validity.

The completeness pattern framework opens the way to many applications, particularly those that aim to improve the quality of tasks affected by missing data. Data imputation is a well-known technique for repairing missing data values, but it can incur a prohibitive cost when applied to large datasets. Query-driven imputation offers a better alternative. We adopt a rule-based query rewriting technique for imputing the answers of aggregation queries that are missing or incorrect because of data incompleteness, and we present a novel query rewriting mechanism guided by the completeness pattern model and algebra.

We also investigate the generalization of our pattern model to summarizing arbitrary data fragments. These summaries can be queried to analyze and compare data fragments in a synthetic and flexible way.
Document type : Theses

Cited literature [172 references]

https://tel.archives-ouvertes.fr/tel-02503212
Contributor : Abes Star
Submitted on : Monday, March 9, 2020 - 6:04:08 PM
Last modification on : Friday, October 23, 2020 - 4:34:19 PM
Long-term archiving on : Wednesday, June 10, 2020 - 4:39:27 PM

File

HANNOU_FZ_these_2019.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02503212, version 1

Citation

Fatma-Zohra Hannou. A Pattern Model and Algebra for Representing and Querying Relative Information Completeness. Databases [cs.DB]. Sorbonne Université, 2019. English. ⟨NNT : 2019SORUS110⟩. ⟨tel-02503212⟩
