THEORETICAL DEVELOPMENTS AND NUMERICAL METHODS FOR COMPARATIVE ANALYSES OF BIASED GENOMES AND PROTEOMES. Application to the comparison of the genomes and proteomes of Plasmodium falciparum and Arabidopsis thaliana

Abstract: Malaria is a major threat to humankind, with roughly half a billion people infected. Recently, one of the best-known attributes of plant cells, a relic chloroplast termed the apicoplast, was discovered within the cells of apicomplexan parasites, where it appears to hold vital functions unique to plants. It is therefore now accepted that the “plant side” of the parasite harbors innovative targets for intervention using molecules with herbicidal properties. To that end, the release of the complete genome of Plasmodium falciparum paved the way for the search for innovative plant-related protein targets.
A first step in searching for such targets is the genome-scale pairwise comparison between a plant model, such as Arabidopsis thaliana, and P. falciparum. The first release of the P. falciparum genome identified 5268 predicted proteins, 60% of which lack sufficient similarity to proteins in other organisms to justify a functional assignment. A singular feature of the P. falciparum genome was put forward to explain this prediction failure: its A+T richness (82%), which is known to influence the distribution of amino acids in proteins. To take this feature into account, we developed a new scoring scheme that extends the BLOSUM model: the non-symmetric dirAtPf matrices, which account for the difference in the global distribution of amino acids in proteins between the two species.
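As an illustration of why species-specific amino-acid backgrounds lead to a non-symmetric matrix, here is a minimal sketch. It assumes a BLOSUM-style log-odds score in which the single background distribution is replaced by one distribution per species. The abstract does not give the actual construction of dirAtPf, so the function name `asymmetric_log_odds`, the two-letter alphabet and all frequencies below are hypothetical.

```python
import math

def asymmetric_log_odds(pair_freq, bg_x, bg_y, scale=1.0):
    """Log-odds scores with one background distribution per species.

    pair_freq[(a, b)]: frequency of residue a (species X) aligned with
                       residue b (species Y) in trusted alignments.
    bg_x[a], bg_y[b]:  global residue frequencies in each proteome.
    Assumed form: s(a, b) = scale * log2(q_ab / (p_a * r_b)).
    """
    return {
        (a, b): scale * math.log2(q / (bg_x[a] * bg_y[b]))
        for (a, b), q in pair_freq.items()
    }

# Toy two-letter alphabet with invented numbers (not real proteome data).
pair_freq = {("K", "K"): 0.3, ("K", "L"): 0.1,
             ("L", "K"): 0.1, ("L", "L"): 0.5}
bg_x = {"K": 0.2, "L": 0.8}   # hypothetical background, species X
bg_y = {"K": 0.5, "L": 0.5}   # hypothetical background, species Y

scores = asymmetric_log_odds(pair_freq, bg_x, bg_y)
for pair in sorted(scores):
    print(pair, round(scores[pair], 3))
```

Even though the toy pair frequencies are symmetric, the score for (K, L) differs from the score for (L, K), because each residue is weighed against the background of the species it comes from.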
A further contribution to sequence analysis theory is a mathematical demonstration that provides a single-linkage clustering criterion for genome-scale comparison. This demonstration relies on the Z-value computation and the Bienaymé-Chebyshev theorem.
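The link between a Z-value and a probability bound can be sketched as follows. The Z-value is taken here in its usual sense: the observed alignment score, centered and scaled by the mean and standard deviation of scores obtained on shuffled sequences. The Bienaymé-Chebyshev inequality then gives P(S ≥ s) ≤ 1/Z² for Z > 0, whatever the score distribution. The clustering criterion itself is not stated in the abstract, so the function names and the shuffled scores below are hypothetical.

```python
import statistics

def z_value(observed_score, shuffled_scores):
    """Z = (s - mean) / stdev, estimated from scores on shuffled sequences."""
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.stdev(shuffled_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed_score - mu) / sigma

def chebyshev_bound(z):
    """Distribution-free upper bound on P(S >= s): min(1, 1/Z^2) for Z > 0."""
    if z <= 0:
        return 1.0  # the inequality gives no information in this case
    return min(1.0, 1.0 / (z * z))

# Invented scores standing in for alignments of shuffled sequences.
shuffled = [10, 12, 14, 16, 18]
z = z_value(30, shuffled)
bound = chebyshev_bound(z)
print(f"Z = {z:.3f}, P(S >= s) <= {bound:.4f}")
```

Under these assumptions, a pair of sequences could be linked in a single-linkage clustering whenever its bound falls below a chosen threshold. That threshold is a free parameter of this sketch, not a value taken from the thesis.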
We re-examined the estimate of sequence “dissemblance within assessed resemblance” as a source for divergence-time calculation and evolutionary reconstruction. We sought the probabilistic, statistical and geometric rules that an optimal alignment score must obey in light of the recently demonstrated TULIP theorem. We used these rules as a framework of constraints to build a geometric representation of a space of probably homologous proteins and to define a theoretically explicit measure of protein proximity. Finally, we constrained the topology associated with this geometric space by i) respecting the protein clock and derived phylogenetic models and ii) taking into account the lineages that separate sequences from the ancestral diverging events. This unified model, called the TULIP topological space, reconciles concepts from different fields of protein science that had not yet been explicitly connected. Because the spatial geometry and topology of probably homologous proteins, built from pairwise alignments, are univocal, applications include the reconstruction of univocal classification trees. The power of this topological spatial representation is illustrated by comparison with phylogenetic reconstructions obtained from multiple alignments.

https://tel.archives-ouvertes.fr/tel-00080245
Contributor : Olivier Bastien
Submitted on : Thursday, June 15, 2006 - 12:23:43 PM
Last modification on : Thursday, June 28, 2018 - 9:51:09 AM
Long-term archiving on: Friday, November 25, 2016 - 11:07:24 AM

Identifiers

  • HAL Id : tel-00080245, version 1

Citation

Olivier Bastien. DÉVELOPPEMENTS THÉORIQUES ET MÉTHODES NUMÉRIQUES POUR LES ANALYSES COMPARATIVES DE GÉNOMES ET PROTÉOMES BIAISÉS. Application à la comparaison des génomes et protéomes de Plasmodium falciparum et d'Arabidopsis thaliana. Biochimie [q-bio.BM]. Université Joseph-Fourier - Grenoble I, 2006. Français. ⟨tel-00080245⟩
