
Apprentissage d'automates modélisant des familles de séquences protéiques

Goulven Kerbellec 1
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : This thesis presents a new approach to discovering protein family signatures. Given a sample of (unaligned) sequences belonging to a structural or functional family of proteins, the approach infers non-deterministic finite automata (NFA) that characterize the family. A new kind of multiple alignment, called PLMA, is introduced to highlight significant partial and local similarities. From this information, the NFA models are produced by a process drawn from the field of grammatical inference. These NFA models, presented here under the name Protomata, are discrete graphical models with strong expressive power, which distinguishes them from statistical models such as profile HMMs and from pattern models such as Prosite patterns.
Experiments conducted on various biological families, including the MIP and TNF families, show that the approach succeeds on real data.

Cited literature [67 references]
Contributor : François Coste
Submitted on : Thursday, October 9, 2008 - 3:37:06 PM
Last modification on : Monday, October 19, 2020 - 11:07:31 AM
Long-term archiving on : Monday, June 7, 2010 - 5:33:23 PM


  • HAL Id : tel-00327938, version 1


Goulven Kerbellec. Apprentissage d'automates modélisant des familles de séquences protéiques. Human-Computer Interaction [cs.HC]. Université Rennes 1, 2008. French. ⟨tel-00327938⟩


