
Mixed sequence-structure based analysis of proteins, with applications to functional annotations

Romain Tetley 1
1 ABS - Algorithms, Biology, Structure
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : This thesis focuses on reconciling the realms of structure and sequence for protein analysis. Sequence analysis tools shine when faced with proteins presenting high sequence identity (≥ 30%), but are lackluster when it comes to remote homolog detection. Structural analysis tools present an interesting alternative, but solving structures, when at all possible, is a tedious and expensive process. These observations make clear the need for hybrid methods, which inject information obtained from available structures into a sequence model. This thesis makes four main contributions toward this goal. First, we present a novel structural measure, the RMSDcomb, based on local structural conservation patterns, the so-called structural motifs. Second, we develop a method to identify structural motifs between two structures using a bootstrap method which relies on filtrations. Our approach is not a direct competitor to flexible aligners but can prove useful for performing a multiscale analysis of structural similarities. Third, we build upon the previous methods to design hybrid Hidden Markov Models which are biased towards regions of increased structural conservation between sets of proteins. We test this tool on the class II fusion viral proteins, which are particularly challenging because of their low sequence identity and mild structural homology. We find that we are able to recover known remote homologs of the viral proteins in Drosophila and other organisms. Finally, formalizing a sub-problem encountered when comparing filtrations, we present a new theoretical problem, the D-family matching, on which we present various algorithmic results. We show, in a manner analogous to comparing parts of two protein conformations, how two clusterings of the same data set can be compared using such a theoretical model.

Cited literature : 195 references
Contributor : Abes Star
Submitted on : Tuesday, February 19, 2019 - 12:13:09 PM
Last modification on : Monday, May 25, 2020 - 7:03:24 PM
Long-term archiving on : Monday, May 20, 2019 - 4:16:53 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02024736, version 1



Romain Tetley. Mixed sequence-structure based analysis of proteins, with applications to functional annotations. Data Structures and Algorithms [cs.DS]. Université Côte d'Azur, 2018. English. ⟨NNT : 2018AZUR4111⟩. ⟨tel-02024736⟩


