Comparison of RNA Secondary Structures (Comparaison de structures secondaires d'ARN)

Abstract : RNAs are among the fundamental components of a cell. An RNA is generally defined as an oriented sequence of nucleotides (denoted A, C, G and U). Inside a cell, an RNA does not remain linear but folds in space, and its molecular function depends strongly on this three-dimensional folding. Comparing the three-dimensional structures of two RNAs is therefore essential for determining whether they share the same function. The structure of an RNA is generally described at three levels. The primary structure is the sequence of nucleotides. The secondary structure is the list of links between nucleotides that form helices. The tertiary structure is the exact three-dimensional folding of the RNA. Although the tertiary structure is the most accurate description of the spatial structure adopted by an RNA, it is well known that two RNAs sharing the same function also have closely related secondary structures. Several structural elements can be distinguished in an RNA secondary structure: helices, multiloops, hairpin loops, internal loops and bulges. To date, essentially three data structures have been proposed to represent an RNA secondary structure: arc-annotated sequences, 2-intervals and rooted ordered trees. An arc-annotated sequence is a sequence with an arc between every two nucleotides that form a pair in the structure. 2-intervals generalise arc-annotated sequences: each 2-interval corresponds to two disjoint intervals of the sequence, and an RNA secondary structure is then defined as a family of 2-intervals. Finally, rooted ordered trees can represent an RNA secondary structure at various levels, from the nucleotides up to the network of multiloops. A drawback of all these approaches is that each models the secondary structure of an RNA from a single point of view (nucleotides, helices, etc.).
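To make the arc-annotated representation described above concrete, here is a minimal sketch in Python. It assumes the structure is supplied in dot-bracket notation, a common convention that the abstract itself does not mention. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, int]


def arcs_from_dot_bracket(structure: str) -> List[Arc]:
    """Return one arc (i, j), with i < j, per matched '(' ... ')' pair.

    Assumption: dot-bracket input, where '.' marks an unpaired nucleotide.
    """
    open_positions: List[int] = []
    arcs: List[Arc] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            arcs.append((open_positions.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(arcs)


def is_nested(arcs: List[Arc]) -> bool:
    """Return True if no two arcs cross, i.e. no i < k < j < l."""
    return not any(i < k < j < l for (i, j) in arcs for (k, l) in arcs)


# Hypothetical example: a single hairpin with a three-pair helix.
sequence = "GGGAAACCC"
structure = "(((...)))"
arcs = arcs_from_dot_bracket(structure)
paired_bases = [(sequence[i], sequence[j]) for i, j in arcs]
```

Here each arc links two positions of the sequence, and `is_nested` checks that the arcs do not cross, which is the situation a rooted ordered tree can represent directly.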
We therefore introduce a new model called RNA-MiGaL, made of four interrelated trees. Each tree represents the structure of an RNA at a particular level of detail: the highest level models the network of multiloops, which is regarded as the skeleton of the secondary structure, while the lowest level represents nucleotides. We use the tree edit distance to compare two RNA-MiGaLs. However, because the classical edit distance has limitations when comparing trees that represent RNA secondary structures, we introduce two new edit operations, "node fusion" and "edge fusion", which yield a new edit distance. Using this distance, we developed an algorithm to compare two RNA-MiGaLs. The algorithm has been implemented in a package that allows RNA secondary structures to be compared in various ways.
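For readers unfamiliar with the tree edit distance, the following Python sketch computes the classical edit distance between two rooted ordered trees using the standard recurrence on forests (delete, insert or relabel the rightmost root). It is not the thesis's algorithm: the "node fusion" and "edge fusion" operations are not specified in this abstract and are not modelled here. The unit costs, the nested-tuple tree encoding and the function names are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples are hashable, so results
# can be memoised.


def size(forest) -> int:
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g) -> int:
    """Classical ordered-forest edit distance with unit costs (assumed)."""
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    v_label, v_children = f[-1]  # rightmost root of f
    w_label, w_children = g[-1]  # rightmost root of g
    # Delete v: its children take its place in the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert w: symmetric to deletion.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match v with w, relabelling if the labels differ.
    match = (
        forest_distance(f[:-1], g[:-1])
        + forest_distance(v_children, w_children)
        + (0 if v_label == w_label else 1)
    )
    return min(delete, insert, match)


def tree_distance(t1, t2) -> int:
    """Edit distance between two rooted ordered trees."""
    return forest_distance((t1,), (t2,))


# Hypothetical trees: a(b(c)) versus a(c) differ by deleting node b.
chain = ("a", (("b", (("c", ()),)),))
short = ("a", (("c", ()),))
distance = tree_distance(chain, short)
```

This memoised recursion is only practical for small trees; it is meant to show the three classical operations, to which the thesis adds its two fusion operations.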
https://tel.archives-ouvertes.fr/tel-00637131
Contributor : Guillaume Blin
Submitted on : Sunday, October 30, 2011 - 10:07:32 PM
Last modification on : Wednesday, April 11, 2018 - 12:12:02 PM
Long-term archiving on : Tuesday, January 31, 2012 - 2:25:13 AM

Identifiers

  • HAL Id : tel-00637131, version 1

Citation

Julien Allali. Comparaison de structures secondaires d'ARN. Computer Science [cs]. Université de Marne la Vallée, 2004. French. ⟨tel-00637131⟩
