Modelling and scoring of protein-RNA complexes (Modélisation et score de complexes protéine-ARN)

Abstract : This thesis presents results on the prediction of protein-RNA interactions using machine learning. CAPRI (Critical Assessment of PRedicted Interactions) is an international community experiment that regularly assesses in silico methods for predicting interactions between macromolecules. Methods for protein-protein and, more recently, protein-RNA interaction prediction are assessed through blind predictions made under time constraints.

In a first stage, we worked on curated protein-RNA benchmarks, including 120 3D structures extracted from the non-redundant PRIDB (Protein-RNA Interface DataBase). We also tested the protein-RNA prediction method we designed on 40 protein-RNA complexes extracted from state-of-the-art benchmarks and independent of the non-redundant PRIDB complexes. Generating candidates identical to the in vivo solution from only a few 3D structures is difficult; we tackled this issue by designing a candidate generation strategy that perturbs the RNA structure within the protein-RNA complex. Such candidates are either near-native candidates, if they are close enough to the solution, or decoys, if they are too far away. The goal is to discriminate the near-native candidates from the decoys. For the evaluation, we performed an original cross-validation process that we called leave-"one-pdb"-out, in which there is one fold per protein-RNA complex and each fold contains the candidates generated from that complex. RosettaDock is one of the gold-standard approaches participating in the CAPRI experiment to date, but it was originally optimized for protein-protein complexes. For the learning step of our scoring function, we adapted and used an evolutionary algorithm called ROGER (ROC-based Genetic LearnER) to learn a logistic function. The results show that our scoring function performs much better than the original RosettaDock scoring function. We thus extend RosettaDock to the prediction of protein-RNA interactions.
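As an illustration of the leave-"one-pdb"-out evaluation described above, the following is a minimal Python sketch, not the thesis implementation. It assumes a hypothetical candidate layout (dicts with "pdb", "score" and "near_native" keys), assumes that a higher score means a more native-like candidate, and uses the area under the ROC curve as the per-fold measure because the abstract mentions a ROC-based learner. The complex names and scores are made-up toy values.

```python
from collections import defaultdict


def leave_one_pdb_out(candidates):
    """Yield (held_out_pdb, train, test) with exactly one fold per complex.

    `candidates` is a list of dicts carrying a "pdb" key (hypothetical layout).
    All candidates generated from one complex end up in the same fold.
    """
    by_pdb = defaultdict(list)
    for cand in candidates:
        by_pdb[cand["pdb"]].append(cand)
    for held_out in sorted(by_pdb):
        train = [c for pdb, group in by_pdb.items() if pdb != held_out for c in group]
        yield held_out, train, by_pdb[held_out]


def roc_auc(scores, labels):
    """Fraction of (near-native, decoy) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one near-native candidate and one decoy")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: two hypothetical complexes with pre-computed scores.
candidates = [
    {"pdb": "cplxA", "score": 0.9, "near_native": True},
    {"pdb": "cplxA", "score": 0.2, "near_native": False},
    {"pdb": "cplxB", "score": 0.4, "near_native": True},
    {"pdb": "cplxB", "score": 0.7, "near_native": False},
    {"pdb": "cplxB", "score": 0.1, "near_native": False},
]

folds = list(leave_one_pdb_out(candidates))
aucs = {}
for held_out, train, test in folds:
    # A real run would fit the scoring function on `train` here;
    # this sketch only evaluates the pre-computed toy scores on `test`.
    aucs[held_out] = roc_auc([c["score"] for c in test],
                             [c["near_native"] for c in test])
    print(held_out, aucs[held_out])
# prints "cplxA 1.0" then "cplxB 0.5"
```

Grouping folds by complex keeps candidates derived from the same structure from appearing on both the training and the test side, which is the point of having one fold per protein-RNA complex.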
We also evaluated classifier-based and metaclassifier-based approaches, which may lead to further improvements with additional investigation.

In a second stage, we introduced a new way to evaluate candidates using a multi-scale protocol. A candidate is represented geometrically at the atomic level, the most detailed scale, as well as at a coarse-grained level. The coarse-grained level is based on the construction of a Voronoi diagram over the coarse-grained atoms of the 3D structure. Voronoi diagrams have already been used successfully to model coarse-grained interactions in protein-protein complexes. The idea behind the multi-scale protocol is to first find the interaction patch (epitope) between the protein and the RNA before using the more precise but time-consuming atomic level. We designed new scoring terms, as well as new scoring functions, to evaluate the generated candidates. Results are promising. Reducing the number of parameters involved and optimizing the explicit solvent model may improve the coarse-grained level predictions.
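The two-step idea behind the multi-scale protocol (a cheap coarse-grained pass first, the costly atomic level only afterwards) can be sketched as follows. This is a hedged illustration under stated assumptions, not the protocol of the thesis: the function name `multiscale_rank`, the `keep` cutoff, the toy candidates and both scoring callables are hypothetical, and higher scores are assumed to be better.

```python
def multiscale_rank(candidates, coarse_score, atomic_score, keep):
    """Shortlist with the cheap coarse-grained score, then re-rank the
    shortlist with the expensive atomic-level score (higher is better)."""
    shortlist = sorted(candidates, key=coarse_score, reverse=True)[:keep]
    return sorted(shortlist, key=atomic_score, reverse=True)


# Toy candidates with made-up scores at both scales.
toy = [
    {"id": "c1", "coarse": 0.9, "atomic": 0.3},
    {"id": "c2", "coarse": 0.8, "atomic": 0.7},
    {"id": "c3", "coarse": 0.1, "atomic": 0.95},
    {"id": "c4", "coarse": 0.5, "atomic": 0.1},
]

atomic_calls = []  # records which candidates reach the expensive step


def coarse(cand):
    return cand["coarse"]


def atomic(cand):
    atomic_calls.append(cand["id"])
    return cand["atomic"]


ranked = multiscale_rank(toy, coarse, atomic, keep=2)
ranked_ids = [c["id"] for c in ranked]
print(ranked_ids)         # prints ['c2', 'c1']
print(len(atomic_calls))  # prints 2
```

Only the two shortlisted candidates are scored at the atomic level. The toy candidate c3 shows the trade-off: it would score best at the atomic level but is discarded by the coarse pass, so the quality of the coarse-grained score bounds what the atomic step can recover.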
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-01081605
Contributor : Abes Star
Submitted on : Monday, November 10, 2014 - 10:12:12 AM
Last modification on : Tuesday, April 24, 2018 - 1:51:58 PM
Long-term archiving on : Wednesday, February 11, 2015 - 3:25:27 PM

File

VD2_GUILHOT_GAUDEFFROY_ADRIEN_...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01081605, version 1

Citation

Adrien Guilhot-Gaudeffroy. Modélisation et score de complexes protéine-ARN. Other [cs.OH]. Université Paris Sud - Paris XI, 2014. French. ⟨NNT : 2014PA112228⟩. ⟨tel-01081605⟩
