Apprentissage d'automates modélisant des familles de séquences protéiques [Learning automata that model families of protein sequences]

Goulven Kerbellec 1
1 SYMBIOSE - Biological systems and models, bioinformatics and sequences
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This thesis presents a new approach to discovering protein family signatures. Given a sample of (unaligned) sequences belonging to a structural or functional family of proteins, the approach infers non-deterministic finite automata (NFA) that characterize the family. A new kind of multiple alignment, called PLMA, is introduced to highlight significant partial and local similarities. From this information, the NFA models are produced by a process drawn from the field of grammatical inference. These NFA models, presented here under the name Protomata, are discrete graphical models with strong expressive power, which distinguishes them from statistical models such as profile HMMs and from pattern models such as Prosite patterns.
Experiments conducted on several biological families, including the MIP and TNF families, show that the approach succeeds on real data.
Document type:
Theses
Human-Computer Interaction [cs.HC]. Université Rennes 1, 2008. French

Cited literature: 67 references

https://tel.archives-ouvertes.fr/tel-00327938
Contributor: François Coste
Submitted on: Thursday, October 9, 2008 - 3:37:06 PM
Last modification on: Friday, January 13, 2017 - 2:20:29 PM
Document(s) archived on: Monday, June 7, 2010 - 5:33:23 PM

Identifiers

  • HAL Id : tel-00327938, version 1

Citation

Goulven Kerbellec. Apprentissage d'automates modélisant des familles de séquences protéiques. Human-Computer Interaction [cs.HC]. Université Rennes 1, 2008. French. 〈tel-00327938〉
