Symbolic Inference Methods for Databases

Slawomir Staworko 1, 2
1 LINKS - Linking Dynamic Data
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : This dissertation is a summary of a line of research, that I was actively involved in, on learning in databases from examples. This research focused on traditional as well as novel database models and languages for querying, transforming, and describing the schema of a database. In case of schemas our contributions involve proposing an original languages for the emerging data models of Unordered XML and RDF. We have studied learning from examples of schemas for Unordered XML, schemas for RDF, twig queries for XML, join queries for relational databases, and XML transformations defined with a novel model of tree-to-word transducers. Investigating learnability of the proposed languages required us to examine closely a number of their fundamental properties, often of independent interest, including normal forms, minimization, containment and equivalence, consistency of a set of examples, and finite characterizability. Good understanding of these properties allowed us to devise learning algorithms that explore a possibly large search space with the help of a diligently designed set of generalization operations in search of an appropriate solution. Learning (or inference) is a problem that has two parameters: the precise class of languages we wish to infer and the type of input that the user can provide. We focused on the setting where the user input consists of positive examples i.e., elements that belong to the goal language, and negative examples i.e., elements that do not belong to the goal language. In general using both negative and positive examples allows to learn richer classes of goal languages than using positive examples alone. However, using negative examples is often difficult because together with positive examples they may cause the search space to take a very complex shape and its exploration may turn out to be computationally challenging.
Document type :
Habilitation à diriger des recherches
Complete list of metadatas

Cited literature [103 references]  Display  Hide  Download

https://hal.inria.fr/tel-01254965
Contributor : Slawomir Staworko <>
Submitted on : Wednesday, January 13, 2016 - 1:13:11 AM
Last modification on : Friday, March 22, 2019 - 1:35:16 AM
Long-term archiving on : Friday, November 11, 2016 - 3:34:51 AM

Identifiers

  • HAL Id : tel-01254965, version 1

Citation

Slawomir Staworko. Symbolic Inference Methods for Databases. Databases [cs.DB]. Université de Lille, 2015. ⟨tel-01254965⟩

Share

Metrics

Record views

396

Files downloads

203