Supertree methods for phylogenomics

Abstract : Phylogenetics is the field of evolutionary biology that studies the evolutionary relationships between species using morphological and molecular data. These relationships can be summarized in the so-called "species tree". A gene tree is an evolutionary tree constructed by analyzing a gene family. Species trees are mainly estimated from gene trees. However, for both methodological and biological reasons, a gene tree may differ from the species tree. To estimate the species tree, biologists therefore analyze several data sets at a time, letting the weight of the evidence decide. This thesis focuses on the "supertree" approach to combining data sets. This approach consists of first constructing trees (commonly called source trees) from primary data, then assembling them into a larger and more comprehensive tree, called a supertree. When supertree construction is used in a divide-and-conquer approach to reconstruct large portions of the Tree of Life, conservative supertree methods are to be preferred in order to obtain reliable supertrees. In this context, a supertree method should display only information that is displayed or induced by the source trees (induction property, PI) and that does not conflict with the source trees or a combination thereof (non-contradiction property, PC). In this thesis we introduce two combinatorial properties that formalize these ideas. We propose algorithms that modify the output of any supertree method so that it satisfies these properties. Since no existing supertree method satisfies both PI and PC, we have developed two methods, PhySIC and PhySIC_IST, which directly build supertrees satisfying these properties. An application of PhySIC_IST to the complex problem of the history of Triticeae is presented.
Since duplication events often result in several copies of the same gene being present in species genomes, gene trees are usually multi-labeled, i.e., a single species can label more than one leaf. Since no supertree method exists to combine multi-labeled trees, until now these gene trees were simply discarded in supertree analyses. Yet they account for 60% to 80% of the gene trees available in phylogenomic databases. In this thesis, we propose several algorithms to extract a maximum amount of speciation signal from multi-labeled trees and express it as single-labeled trees that can be handled by supertree methods. An application to the HOGENOM database is presented.
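To make the notion of a multi-labeled tree concrete, here is a minimal Python sketch that checks whether a gene tree, given as a Newick string whose leaves are labeled by species names, carries the same species on more than one leaf. The function names `leaf_labels` and `is_multi_labeled` are hypothetical and are not taken from the thesis. The sketch assumes simple Newick input without quoted labels or bracketed comments, and it only detects multi-labeling; it does not implement any of the thesis algorithms for extracting speciation signal.

```python
import re
from collections import Counter


def leaf_labels(newick: str) -> list[str]:
    """Return the leaf labels of a simple Newick string, in order.

    Assumption: no quoted labels and no bracketed comments.
    A leaf label directly follows '(' or ','; internal-node labels
    follow ')' and are therefore not captured.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def is_multi_labeled(newick: str) -> bool:
    """True if at least one label appears on more than one leaf."""
    counts = Counter(leaf_labels(newick))
    return any(n > 1 for n in counts.values())


if __name__ == "__main__":
    gene_tree = "((human,mouse),(human,chicken));"
    print(leaf_labels(gene_tree))
    print(is_multi_labeled(gene_tree))
```

In this toy example the species `human` labels two leaves, as would happen after a gene duplication, so the tree is multi-labeled and could not be passed directly to a supertree method that expects single-labeled source trees.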
https://tel.archives-ouvertes.fr/tel-00842893
Contributor : Sylvain Milanesi
Submitted on : Tuesday, July 9, 2013 - 4:26:42 PM
Last modification on : Friday, June 21, 2019 - 1:34:06 AM
Long-term archiving on : Thursday, October 10, 2013 - 4:12:45 AM

Identifiers

  • HAL Id : tel-00842893, version 1

Citation

Celine Scornavacca. Supertree methods for phylogenomics. Bioinformatics [q-bio.QM]. Université Montpellier II - Sciences et Techniques du Languedoc, 2009. English. ⟨tel-00842893⟩