
Développement d'un alphabet structural intégrant la flexibilité des structures protéiques (Development of a structural alphabet integrating the flexibility of protein structures)

Abstract : The purpose of this PhD is to provide a Structural Alphabet (SA) that characterizes protein three-dimensional (3D) structures more accurately and integrates the growing amount of 3D structure information available in the Protein Data Bank (PDB). The SA also accounts for the logic governing the succession of structural fragments by using a hidden Markov model (HMM). We describe a new structural alphabet called SAFlex (Structural Alphabet Flexibility), which improves on the existing HMM-SA27 structural alphabet by taking into account the uncertainty of the data (missing data in PDB files) and the redundancy of protein structures. SAFlex thus offers a new, rigorous and robust encoding model. It accounts for encoding uncertainty by providing three encoding options: the maximum a posteriori (MAP), the marginal posterior distribution (POST), and the effective number of letters at each position (NEFF). SAFlex also builds a consensus encoding from different replicates (multiple chains, monomers and several homomers) of a single protein, which allows the detection of structural variability between chains. The methodological advances and the construction of the SAFlex alphabet are the main contributions of this PhD. We also present a new PDB parser (SAFlex-PDB) and show, by comparing it with two well-known bioinformatics parsers (Biopython and BioJava), that it is of interest both qualitatively (detection of various errors) and quantitatively (program optimization and parallelization). The SAFlex structural alphabet is made available to the scientific community through a website. The SAFlex web server is the concrete contribution of this PhD, while the SAFlex-PDB parser is essential to the proper functioning of that website.
Here, we describe the functions and interfaces of the SAFlex web server. Given a protein tertiary structure in a PDB format file, SAFlex can be used to encode the 3D structure and to identify and predict missing data. To date, it is the only alphabet able to encode and predict the missing data in a 3D protein structure. Finally, these improvements are promising for exploring the growing redundancy of protein data and for obtaining a useful quantification of protein flexibility.
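The three encoding options named in the abstract (MAP, POST, NEFF) can be illustrated with a minimal sketch on a toy HMM. This is not the SAFlex implementation. It assumes a three-letter alphabet with discrete emissions (whereas a structural alphabet is built on geometric descriptors), it treats a missing observation as uninformative, it takes MAP to be the most probable letter at each position, and it takes NEFF to be the exponential of the Shannon entropy of the posterior. All names and parameter values below are hypothetical.

```python
import math

def forward_backward(init, trans, emit, obs):
    """Posterior marginals P(letter at t | all observations) for a discrete HMM.

    Entries of obs equal to None are treated as missing data: every state gets
    emission likelihood 1 there (an assumption of this sketch).
    """
    n, T = len(init), len(obs)

    def lik(t, s):
        return 1.0 if obs[t] is None else emit[s][obs[t]]

    def normalize(v):
        z = sum(v)
        return [x / z for x in v]

    # Forward pass, normalized at each step to avoid underflow.
    alpha = [normalize([init[s] * lik(0, s) for s in range(n)])]
    for t in range(1, T):
        alpha.append(normalize([
            lik(t, s) * sum(alpha[t - 1][r] * trans[r][s] for r in range(n))
            for s in range(n)
        ]))

    # Backward pass, also normalized at each step.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([
            sum(trans[s][r] * lik(t + 1, r) * beta[t + 1][r] for r in range(n))
            for s in range(n)
        ])

    # POST: product of forward and backward terms, renormalized per position.
    return [normalize([alpha[t][s] * beta[t][s] for s in range(n)]) for t in range(T)]

def map_letters(post, letters):
    """MAP (in this sketch): the most probable letter at each position."""
    return [letters[max(range(len(row)), key=row.__getitem__)] for row in post]

def neff(post):
    """NEFF (in this sketch): exp of the Shannon entropy at each position."""
    return [math.exp(-sum(p * math.log(p) for p in row if p > 0)) for row in post]

# Hypothetical toy model: three letters, "sticky" transitions, noisy emissions.
letters = ["A", "B", "C"]
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
emit = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
obs = [0, 0, None, 1, 1]  # the third position is missing

post = forward_backward(init, trans, emit, obs)
encoding = map_letters(post, letters)
effective = neff(post)
```

In this sketch a NEFF close to 1 means one letter dominates the posterior at that position, while larger values indicate that several letters remain plausible, as is expected at the missing position.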

Cited literature [242 references]
Contributor : ABES STAR
Submitted on : Wednesday, September 25, 2019 - 12:35:07 PM
Last modification on : Friday, August 5, 2022 - 3:00:05 PM
Long-term archiving on : Sunday, February 9, 2020 - 2:18:38 PM


Version validated by the jury (STAR)


  • HAL Id : tel-02296605, version 1


Ikram Allam Sekhi. Développement d'un alphabet structural intégrant la flexibilité des structures protéiques. Bio-informatique [q-bio.QM]. Université Sorbonne Paris Cité, 2018. Français. ⟨NNT : 2018USPCC084⟩. ⟨tel-02296605⟩


