Modélisation de séquences génomiques structurées, génération aléatoire et applications

Abstract: Evidence of selective pressure on structured genomic data (RNA, proteins, DNA) can be established using random sequence models. From such a model, it is possible to evaluate the significance of an observed phenomenon, either through a sometimes complex mathematical analysis or through random generation. First, we consider the properties of weighted context-free grammars, a class of models especially well suited to modeling RNA secondary structure. We derive efficient random generation algorithms, implemented in the software GenRGenS. We address the computation of weights realizing given frequencies for the terminal symbols, and give directions for solving this problem, both theoretically, by means of Drmota's theorem, and practically, through an optimisation approach. Then, we focus on the modeling of RNA secondary structure. After a short introduction to some of the biological phenomena that motivate our work, we propose several models for this structure based on context-free grammars, allowing for the random generation of realistic RNA structures. Using an optimisation-based algorithm, we derive the weights associated with certain RNA families. We also propose an algorithm for extracting the maximal planar subset of a general, possibly non-planar, RNA structure, in order to exploit the most recent structural data obtained through crystallography. Finally, the last chapter of the thesis deals with the analysis of a similarity search algorithm, whose sensitivity turns out to be closely related to the probability that a given motif occurs in a special class of random walks, the culminating paths. These paths both stay above the x-axis (positivity) and reach their maximal height at their last step. We propose a polynomial-time recursive random generation algorithm for these paths and, by combining enumerative combinatorics, asymptotic analysis and language theory, we obtain linear-time algorithms through a rejection approach in many cases.
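
To make the notion of culminating path concrete, here is a minimal Python sketch of a naive rejection sampler. It is not the polynomial-time or linear-time algorithm of the thesis; it only illustrates the definition and the general idea of rejection. The assumptions are ours: steps are +1 or -1, every prefix height is strictly positive, and the final height is strictly greater than all earlier heights. The names `is_culminating` and `sample_culminating` are hypothetical.

```python
import random


def is_culminating(steps):
    """Return True if a sequence of +1/-1 steps is a culminating path.

    Assumed definition: every prefix height is > 0 (positivity) and the
    final height is strictly greater than every earlier height.
    """
    height = 0
    best_before_last = 0  # highest level reached before the last step
    for k, step in enumerate(steps, start=1):
        height += step
        if height <= 0:
            return False
        if k < len(steps) and height > best_before_last:
            best_before_last = height
    return len(steps) > 0 and height > best_before_last


def sample_culminating(n, rng=random, max_tries=100_000):
    """Draw a culminating path of length n by naive rejection.

    Each unconstrained walk is equally likely, so an accepted walk is
    uniform among culminating paths of length n. The expected number of
    tries grows with n, so this is only practical for small n.
    """
    for _ in range(max_tries):
        steps = [rng.choice((1, -1)) for _ in range(n)]
        if is_culminating(steps):
            return steps
    raise RuntimeError("no culminating path found within max_tries")


if __name__ == "__main__":
    path = sample_culminating(8, rng=random.Random(0))
    print(path)
```

Because every walk of length n is drawn with the same probability, discarding the non-culminating ones leaves a uniform distribution over the culminating ones; the cost is the number of rejected draws, which is what a more careful analysis aims to control.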
Cited literature: 104 references

https://tel.archives-ouvertes.fr/tel-00144130
Contributor : Yann Ponty
Submitted on : Thursday, April 14, 2011 - 11:52:26 AM
Last modification on : Tuesday, April 24, 2018 - 1:38:27 PM
Long-term archiving on : Saturday, December 3, 2016 - 5:07:25 PM

Identifiers

  • HAL Id : tel-00144130, version 2

Citation

Yann Ponty. Modélisation de séquences génomiques structurées, génération aléatoire et applications. Mathematics [math]. Université Paris Sud - Paris XI, 2006. French. ⟨tel-00144130v2⟩
