Normalisation et Apprentissage de Transductions d'Arbres en Mots

Grégoire Laurence 1, 2
2 LINKS - Linking Dynamic Data
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe
Abstract : Storage, management and sharing of data are central issues in computer science. Structuring data as trees has become a standard (XML, JSON). To ensure the preservation and quick exchange of data, one must identify new mechanisms to automate the transformations of such data. We focus on the study of tree-to-word transformations represented by finite-state machines. We define sequential tree-to-word transducers, which use each node of the input tree exactly once to produce an output. Using a reduction to the equivalence problem of morphisms applied to context-free grammars (Plandowski, 95), we prove that equivalence of sequential transducers is decidable in polynomial time. We introduce the concept of earliest transducer, a normal form for sequential transducers, which aims to produce output "as soon as possible" during the transduction. Using normalization and minimization algorithms, we prove the existence of a canonical transducer, unique, minimal and earliest, for each transduction of our class. Deciding the existence of a transducer representing a sample, i.e. a set of input-output pairs of a transformation, is proved NP-hard. Thus, we propose a learning algorithm that either generates a canonical transducer from a sample or fails, while remaining polynomial. This algorithm is based on grammatical inference techniques and the adaptation of a Myhill-Nerode theorem.
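To make the notion of a sequential tree-to-word transducer more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction used in the thesis: it assumes that each rule has the shape q(f(x1, ..., xk)) -> u0 q1(x1) u1 ... qk(xk) uk, so that the children of a node are visited once each, in order, with constant words emitted in between. The names `Tree`, `RULES` and `transduce`, as well as the XML-like output, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tree:
    """A ranked tree: a node label and an ordered tuple of subtrees."""
    label: str
    children: tuple = ()


# Hypothetical rule table (an assumption made for this sketch).
# RULES[(state, label)] = (words, states), encoding the rule
#   state(label(x1, ..., xk)) -> words[0] states[0](x1) words[1] ... states[k-1](xk) words[k]
# so len(words) == len(states) + 1.
RULES = {
    ("q", "f"): (("<f>", "", "</f>"), ("q", "q")),  # binary symbol f
    ("q", "a"): (("<a/>",), ()),                    # constant symbol a
}


def transduce(rules, state, tree):
    """Apply the transducer to `tree` from `state` and return the output word.

    Each child is processed exactly once and in left-to-right order,
    which is the sequentiality assumption of this sketch.
    """
    key = (state, tree.label)
    if key not in rules:
        raise ValueError(f"no rule for state {state!r} and label {tree.label!r}")
    words, states = rules[key]
    if len(states) != len(tree.children):
        raise ValueError(f"arity mismatch at label {tree.label!r}")
    output = [words[0]]
    for child_state, child, word in zip(states, tree.children, words[1:]):
        output.append(transduce(rules, child_state, child))
        output.append(word)
    return "".join(output)


# Example input tree f(a, f(a, a)).
EXAMPLE = Tree("f", (Tree("a"), Tree("f", (Tree("a"), Tree("a")))))
RESULT = transduce(RULES, "q", EXAMPLE)  # "<f><a/><f><a/><a/></f></f>"
```

With this rule table the transducer serializes a tree into an XML-like word. Since the opening tag is emitted before any child is read, the example also loosely reflects the "as soon as possible" idea behind earliest transducers, although no normalization is performed here.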

Cited literature [57 references]

https://tel.archives-ouvertes.fr/tel-01053084
Contributor : Grégoire Laurence
Submitted on : Tuesday, July 29, 2014 - 3:13:50 PM
Last modification on : Thursday, February 21, 2019 - 10:52:55 AM
Document(s) archived on : Tuesday, November 25, 2014 - 8:21:08 PM

Identifiers

  • HAL Id : tel-01053084, version 1

Citation

Grégoire Laurence. Normalisation et Apprentissage de Transductions d'Arbres en Mots. Base de données [cs.DB]. Université des Sciences et Technologie de Lille - Lille I, 2014. Français. ⟨NNT : 41446⟩. ⟨tel-01053084⟩
