
Motif representation and discovery

Abstract: An important part of gene regulation is mediated by specific proteins, called transcription factors, which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBSs) or, simply, motifs. Such binding sites are relatively short segments of DNA, normally 5 to 25 nucleotides long, that are over-represented in a set of co-regulated DNA sequences. This setup raises two distinct problems: motif representation, which concerns the model that describes the TFBSs; and motif discovery, which focuses on unravelling TFBSs from a set of co-regulated DNA sequences. This thesis proposes a discriminative scoring criterion that culminates in a discriminative mixture of Bayesian networks to distinguish TFBSs from the background DNA. This new probabilistic model provides further evidence of non-additivity among binding site positions and offers superior discriminative power in TFBS detection. On the motif discovery side, extra knowledge carefully selected from the literature was incorporated into TFBS discovery in order to capture a variety of characteristics of TFBS patterns. This extra knowledge was combined during the motif discovery process, leading to results that are considerably more accurate than those achieved by methods that rely on the DNA sequence alone.
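For context on what "non-additivity among binding site positions" means, the sketch below shows the common additive baseline for motif representation: a position weight matrix (PWM) whose score for a candidate site is a sum of independent per-position log-odds terms against a background model. This is not the method of the thesis; the count matrix, uniform background, pseudocount, threshold and function names are all hypothetical and chosen only for illustration, and the input is assumed to contain uppercase A, C, G, T only.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per position).
# Illustrative numbers only, not taken from the thesis.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
# Assumed uniform background nucleotide distribution.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
PSEUDOCOUNT = 0.5


def log_odds_matrix(counts, background, pseudocount=PSEUDOCOUNT):
    """Turn per-position counts into log2-odds scores against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(column)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in column
        })
    return matrix


def score(site, matrix):
    """Additive score: each position contributes independently of the others."""
    return sum(matrix[i][base] for i, base in enumerate(site))


def scan(sequence, matrix, threshold):
    """Return (offset, site, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        s = score(site, matrix)
        if s >= threshold:
            hits.append((i, site, s))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS, BACKGROUND)
    for offset, site, s in scan("TTACGTGGACGTAA", pwm, threshold=4.0):
        print(offset, site, round(s, 2))
```

Because `score` is a plain sum over positions, a PWM cannot express that the preferred base at one position depends on the base at another; models such as the Bayesian network mixtures proposed in the thesis are designed to capture exactly that kind of dependency.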

Cited literature: 160 references
Contributor: Marie-France Sagot
Submitted on: Tuesday, November 20, 2012 - 1:59:23 PM
Last modification on: Wednesday, November 21, 2012 - 1:27:29 PM
Long-term archiving on: Thursday, February 21, 2013 - 12:15:39 PM


HAL Id: tel-00755042, version 1



A.M. Carvalho. Motif representation and discovery. Bioinformatics [q-bio.QM]. Universidade Técnica de Lisboa, Instituto Superior Técnico, 2011. English. ⟨tel-00755042⟩


