Grammatical inference on ordered alphabets: application to motif discovery in protein families

Aurélien Leroux 1
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This work addresses the adaptation of grammatical inference algorithms to the discovery of common properties in a set of proteins. Given a set of words belonging to a target language, positive grammatical inference generates a grammatical representation that is optimal for this language, i.e. one that gathers and organises the specific properties of its words. We used the Taylor diagram, which classifies amino acids according to their physico-chemical properties, to define an order on groups of amino acids in the form of a lattice. We also developed an inference algorithm (SDTM) that computes the best local alignments between pairs of proteins according to a score based on the order defined by the lattice and on the statistical properties of the given set of proteins. The result of the algorithm is a sequential machine close to a Mealy machine in which the outputs are reduced to accept and reject. The algorithm begins by constructing the largest automaton recognising exactly the words of the language. It then generalises the automaton by successively merging pairs of transitions corresponding to paired amino acids in the selected alignments. Experiments have shown the value of combining pattern discovery with grammatical inference methods.
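
The two-phase scheme described in the abstract (build the largest automaton that accepts exactly the sample, then generalise by merging) can be illustrated with a minimal sketch. This is not the SDTM algorithm: the thesis merges pairs of transitions selected by lattice-based alignment scores, whereas the sketch below merges two states chosen by hand. The function names build_mca, accepts and merge_states are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def build_mca(words):
    """Build an automaton with one disjoint path per word, all starting at state 0.

    It accepts exactly the given words (illustrative construction only).
    """
    transitions = defaultdict(set)  # (state, symbol) -> set of next states
    accepting = set()
    next_state = 1
    for word in words:
        state = 0
        for symbol in word:
            transitions[(state, symbol)].add(next_state)
            state = next_state
            next_state += 1
        accepting.add(state)
    return transitions, accepting


def accepts(transitions, accepting, word):
    """Simulate the (possibly non-deterministic) automaton on a word."""
    current = {0}
    for symbol in word:
        current = {t for s in current for t in transitions.get((s, symbol), ())}
        if not current:
            return False
    return bool(current & accepting)


def merge_states(transitions, accepting, keep, drop):
    """Return a new automaton in which state `drop` is fused into state `keep`."""
    def rename(state):
        return keep if state == drop else state

    merged = defaultdict(set)
    for (src, symbol), targets in transitions.items():
        merged[(rename(src), symbol)].update(rename(t) for t in targets)
    return merged, {rename(s) for s in accepting}


# Hypothetical two-word sample over amino-acid letters.
transitions, accepting = build_mca(["AC", "AD"])
# Fusing the end of the "AC" path into the start state creates a loop,
# so the merged automaton accepts more words than the sample contains.
generalised, generalised_accepting = merge_states(transitions, accepting, keep=0, drop=2)
```

Merging can only enlarge the recognised language, which is why the choice of what to merge (in the thesis, guided by the lattice order and the statistics of the protein set) determines the quality of the generalisation.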

Cited literature: 154 references

https://tel.archives-ouvertes.fr/tel-00185489
Contributor : Anne Jaigu
Submitted on : Tuesday, November 6, 2007 - 11:30:00 AM
Last modification on : Friday, November 16, 2018 - 1:23:33 AM
Long-term archiving on : Monday, April 12, 2010 - 1:26:57 AM

Identifiers

  • HAL Id : tel-00185489, version 1

Citation

Aurélien Leroux. Inférence grammaticale sur des alphabets ordonnés : application à la découverte de motifs dans des familles de protéines. Biochimie [q-bio.BM]. Université Rennes 1, 2005. Français. ⟨tel-00185489⟩
