André Wehe
Iowa State University
Publication
Featured research published by André Wehe.
Bioinformatics | 2008
André Wehe; Mukul S. Bansal; J. Gordon Burleigh; Oliver Eulenstein
DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. Availability: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
Systematic Biology | 2011
J. Gordon Burleigh; Mukul S. Bansal; Oliver Eulenstein; Stefanie Hartmann; André Wehe
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
BMC Bioinformatics | 2010
Ruchi Chaudhary; Mukul S. Bansal; André Wehe; David Fernández-Baca; Oliver Eulenstein
Background: The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle. Results: We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results. Conclusions: iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.
Research in Computational Molecular Biology | 2007
Mukul S. Bansal; J. Gordon Burleigh; Oliver Eulenstein; André Wehe
The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene duplications. This problem is NP-hard and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. We show how this local search problem can be solved efficiently by reusing previously computed information. This improves the running time of the current solution by a factor of n, where n is the number of species in the resulting supertree solution, and makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the exceptional performance of our solution in a comparison study using sets of large randomly generated gene trees. Furthermore, we demonstrate the utility of our solution by incorporating large genomic data sets from GenBank into a supertree analysis of plants.
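The duplication count that these heuristics minimize can be illustrated with a small sketch. It is a naive quadratic version written for readability, not the algorithm from the paper: trees are assumed to be rooted and binary, encoded as nested 2-tuples whose leaves are species names, and the function names are hypothetical. Each gene-tree node is mapped to the smallest species-tree cluster containing its species (its least common ancestor), and a node counts as a duplication when it maps to the same place as one of its children.

```python
def node_clusters(tree):
    """Return the leaf set of every node; the last entry is the root's."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    lc, rc = node_clusters(left), node_clusters(right)
    return lc + rc + [lc[-1] | rc[-1]]


def lca_cluster(species_clusters, leaves):
    """Smallest species-tree cluster containing all of `leaves` (the LCA)."""
    return min((c for c in species_clusters if leaves <= c), key=len)


def duplication_cost(gene_tree, species_tree):
    """Count gene-tree nodes that map to the same species node as a child."""
    sc = node_clusters(species_tree)

    def visit(node):
        # Returns (LCA mapping of this node, duplications in its subtree).
        if isinstance(node, str):
            return lca_cluster(sc, frozenset([node])), 0
        (lm, ld), (rm, rd) = visit(node[0]), visit(node[1])
        m = lca_cluster(sc, lm | rm)
        return m, ld + rd + (1 if m in (lm, rm) else 0)

    return visit(gene_tree)[1]


species = (("a", "b"), "c")
print(duplication_cost((("a", "b"), "c"), species))  # congruent gene tree
print(duplication_cost((("a", "c"), "b"), species))  # incongruent gene tree
```

Summing this cost over all gene trees gives the score of one candidate species tree; a local search evaluates that score for every tree in a neighborhood, which is the step the abstract describes speeding up.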
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009
Mukul S. Bansal; Oliver Eulenstein; André Wehe
The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the NNI search problem, which is based on the nearest neighbor interchange operation. In this work, we 1) provide a novel near-linear time algorithm for the NNI search problem, 2) introduce extensions that significantly enlarge the search space of the NNI search problem, and 3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the NNI search problem. The exceptional speedup achieved in the extended NNI search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large randomly generated gene trees.
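As a rough illustration of the nearest neighbor interchange operation, the sketch below enumerates the NNI neighborhood of a rooted binary tree by brute force. It assumes trees encoded as nested 2-tuples with string leaves, and the function name is hypothetical. It shows only what the neighborhood contains; it is not the near-linear time algorithm of the paper, which avoids scoring each neighbor from scratch.

```python
def nni_neighbors(tree):
    """List the trees reachable by one NNI move on a rooted binary tree."""
    if isinstance(tree, str):
        return []
    left, right = tree
    out = []
    # Moves across the edge root -> left: swap a child of `left` with `right`.
    if not isinstance(left, str):
        a, b = left
        out.append(((right, b), a))
        out.append(((a, right), b))
    # Moves across the edge root -> right: swap a child of `right` with `left`.
    if not isinstance(right, str):
        a, b = right
        out.append((a, (left, b)))
        out.append((b, (a, left)))
    # Moves across edges deeper in either subtree.
    out.extend((n, right) for n in nni_neighbors(left))
    out.extend((left, n) for n in nni_neighbors(right))
    return out


for neighbor in nni_neighbors((("a", "b"), ("c", "d"))):
    print(neighbor)
```

Each internal edge contributes two moves, so a rooted binary tree with n leaves has 2(n - 2) neighbors under this encoding.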
Journal of Computational Biology | 2009
John Gordon Burleigh; Mukul S. Bansal; André Wehe; Oliver Eulenstein
Recent analyses of plant genomic data have found extensive evidence of ancient whole genome duplication (or polyploidy) events, but there are many unresolved questions regarding the number and timing of such events in plant evolutionary history. We describe the first exact and efficient algorithm for the Episode Clustering problem, which, given a collection of rooted gene trees and a rooted species tree, seeks the minimum number of locations on the species tree of gene duplication events. Solving this problem allows one to place gene duplication events onto nodes of a given species tree and potentially detect large-scale gene duplication events. We examined the performance of an implementation of our algorithm using 85 plant gene trees that contain genes from a total of 136 plant taxa. We found evidence of large-scale gene duplication events in Populus, Gossypium, Poaceae, Asteraceae, Brassicaceae, Solanaceae, Fabaceae, and near the root of the eudicot clade that are consistent with previous genomic evidence. However, a lack of phylogenetic signal within the gene trees can produce erroneous evidence of large-scale duplication events, especially near the root of the species tree. Although the results of our algorithm should be interpreted cautiously, they provide hypotheses for precise locations of large-scale gene duplication events with data from relatively few gene trees and can complement other genomic approaches to provide a more comprehensive view of ancient large-scale gene duplication events.
Research in Computational Molecular Biology | 2008
J. Gordon Burleigh; Mukul S. Bansal; André Wehe; Oliver Eulenstein
We introduce the first exact and efficient algorithm for Guigó et al.'s problem, which, given a collection of rooted, binary gene trees and a rooted, binary species tree, determines a minimum number of locations for gene duplication events from the gene trees on the species tree. We examined the performance of our algorithm using a set of 85 gene trees that contain genes from a total of 136 plant taxa. There was evidence of large-scale gene duplication events in Populus, Gossypium, Poaceae, Asteraceae, Brassicaceae, Solanaceae, Fabaceae, and near the root of the eudicot clade. However, error in gene trees can produce erroneous evidence of large-scale duplication events, especially near the root of the species tree. Our algorithm can provide hypotheses for precise locations of large-scale gene duplication events with data from relatively few gene trees and can complement other genomic approaches to provide a more comprehensive view of ancient large-scale gene duplication events.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013
André Wehe; J. Gordon Burleigh; Oliver Eulenstein
Phylogenetic inference is a computationally difficult problem, and constructing high-quality phylogenies that can build upon existing phylogenetic knowledge and synthesize insights from new data remains a major challenge. We introduce knowledge-enhanced phylogenetic problems for both supertree and supermatrix phylogenetic analyses. These problems seek an optimal phylogenetic tree that can only be assembled from a user-supplied set of, possibly incompatible, phylogenetic relationships. We describe exact polynomial time algorithms for the knowledge-enhanced versions of the NP-hard Robinson-Foulds, gene duplication, duplication and loss, and deep coalescence supertree problems. Further, we demonstrate that our algorithms can rapidly improve upon results of local search heuristics for these problems. Finally, we introduce a knowledge-enhanced search heuristic that can be applied to any discrete character data set using the maximum parsimony (MP) phylogenetic problem. Although this approach is not guaranteed to find exact solutions, we show that it also can improve upon solutions from commonly used MP heuristics.
International Symposium on Bioinformatics Research and Applications | 2012
André Wehe; J. Gordon Burleigh; Oliver Eulenstein
Supertree algorithms combine smaller phylogenetic trees into a single, comprehensive phylogeny, or supertree. Most supertree problems are NP-hard, and often heuristics identify supertrees with anomalous or unwanted relationships. We introduce knowledge-enhanced supertree problems, which seek an optimal supertree for a collection of input trees that can only be assembled from a set of given, possibly incompatible, phylogenetic relationships. For these problems we introduce efficient algorithms that, in a special setting, also provide exact solutions for the original supertree problems. We describe our algorithms and verify their performance based on the Robinson-Foulds (RF) supertree problem. We demonstrate that our algorithms (i) can significantly improve upon estimates of existing RF-heuristics, and (ii) can compute exact RF supertrees with up to 17 taxa.
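For readers unfamiliar with the RF measure, a minimal sketch of the rooted Robinson-Foulds distance follows. It assumes both trees are rooted, share the same leaf set, and are encoded as nested tuples with string leaves; the function names are hypothetical. The distance is taken as the number of clusters (leaf sets of internal nodes) found in exactly one of the two trees. In a supertree setting the candidate supertree would first have to be restricted to the taxa of each input tree, a step this sketch leaves out.

```python
def collect_clusters(tree, out):
    """Add the leaf set of every internal node of `tree` to the set `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    cluster = frozenset().union(*(collect_clusters(child, out) for child in tree))
    out.add(cluster)
    return cluster


def rf_distance(tree1, tree2):
    """Size of the symmetric difference of the two trees' cluster sets."""
    c1, c2 = set(), set()
    collect_clusters(tree1, c1)
    collect_clusters(tree2, c2)
    return len(c1 ^ c2)


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(rf_distance(t1, t1), rf_distance(t1, t2))
```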
Journal of Parallel and Distributed Computing | 2010
André Wehe; Wen-Chieh Chang; Oliver Eulenstein; Srinivas Aluru
Phylogenetics is a branch of computational and evolutionary biology dealing with the inference of trees depicting evolutionary relationships among species and/or sequences. An important problem in phylogenetics is to find a species tree that is most parsimonious with a given set of gene trees, which are derived from sequencing multiple gene families from various subsets of species. The gene duplication problem is to compute a species tree that requires the minimum number of gene duplication events to reconcile with the given set of gene trees. The best known heuristic algorithm for this NP-hard problem is a local optimization technique that runs in O(n^2 + kmn) time per search step, where k is the number of gene trees, n is the size of the species tree, and m is the maximum size of a gene tree. In this paper, we present a parallel algorithm for the gene duplication problem that runs in O((n^2 + kmn)/p) time for up to p = O(nk/log k) processors. Our algorithm exploits multiple levels of parallelism by parallelizing both the exploration of the search neighborhood and the reconciliation of the gene trees with species trees in the neighborhood. Due to the wide variance in the sizes of the gene trees, it is difficult to completely characterize the behavior of the algorithm analytically. We present experimental results on the Blue Gene/L to study both levels of parallelism and how best they should be combined to achieve overall minimum execution time. On a large problem that took about 62.5 h on a 3 GHz Pentium 4, our parallel algorithm ran in 7.7 min on a 1024 node Blue Gene/L.
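The two timings quoted at the end of the abstract can be turned into a rough wall-clock ratio. The arithmetic below uses only the figures stated above; because the serial and parallel runs used different processors, the result is a ratio of run times across machines, not a parallel efficiency in the strict sense.

```python
serial_minutes = 62.5 * 60   # 62.5 h on the 3 GHz Pentium 4
parallel_minutes = 7.7       # run time on the Blue Gene/L
nodes = 1024                 # Blue Gene/L nodes used

ratio = serial_minutes / parallel_minutes   # wall-clock ratio between the runs
ratio_per_node = ratio / nodes              # same ratio divided by node count

print(f"wall-clock ratio: about {ratio:.0f}x")
print(f"ratio per node:   about {ratio_per_node:.2f}")
```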