Bernard M. E. Moret

École Polytechnique Fédérale de Lausanne

Publication

Featured research published by Bernard M. E. Moret.

Journal of Computational Biology | 2010

How many bootstrap replicates are necessary?

Nicholas D. Pattengale; Masoud Alipour; Olaf R. P. Bininda-Emonds; Bernard M. E. Moret; Alexandros Stamatakis

Phylogenetic bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees that is based on reconstructing many trees from minor variations of the input data, trees called replicates. BS is used with all phylogenetic reconstruction approaches, but we focus here on one of the most popular, maximum likelihood (ML). Because ML inference is so computationally demanding, it has proved too expensive to date to assess the impact of the number of replicates used in BS on the relative accuracy of the support values. For the same reason, a rather small number (typically 100) of BS replicates are computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1 to 2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this article, we propose stopping criteria--that is, thresholds computed at runtime to determine when enough replicates have been generated--and we report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA--single-gene as well as multi-gene--datasets, which include 125-2,554 taxa. We find that our stopping criteria typically stop computations after 100-500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes. 
Our results are thus twofold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through BS, and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess nor worry about the quality of support values; moreover, with most counts of replicates in the 100-500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in the latest release of RAxML v7.2.5, available at http://wwwkramer.in.tum.de/exelixis/software.html.
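As a rough illustration of what a threshold "computed at runtime" can look like, the sketch below repeatedly splits the replicates generated so far into two random halves and checks whether the support counts from the two halves agree. It is a simplified sketch under stated assumptions, not the criteria shipped with RAxML: representing each replicate as a set of bipartition labels, the function names, and the default thresholds (`min_corr`, `min_fraction`, `permutations`) are all hypothetical choices made for this example.

```python
import random
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # Constant vectors: call them correlated only if they are identical.
        return 1.0 if xs == ys else 0.0
    return cov / sqrt(vx * vy)


def support(replicates):
    """Count how many replicates contain each bipartition label."""
    counts = Counter()
    for rep in replicates:
        counts.update(rep)
    return counts


def should_stop(replicates, permutations=100, min_corr=0.99,
                min_fraction=0.99, seed=0):
    """Hypothetical split-half test: stop once two random halves of the
    replicates give highly correlated support counts in most random splits."""
    if len(replicates) < 4:
        return False
    rng = random.Random(seed)
    passed = 0
    for _ in range(permutations):
        shuffled = replicates[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        a = support(shuffled[:half])
        b = support(shuffled[half:2 * half])
        keys = list(set(a) | set(b))
        # Counter returns 0 for bipartitions missing from one half.
        if pearson([a[k] for k in keys], [b[k] for k in keys]) >= min_corr:
            passed += 1
    return passed / permutations >= min_fraction
```

In use, such a check would be called after every batch of new replicates, and replicate generation would end the first time it returns `True`.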

ACM Computing Surveys | 1982

Decision Trees and Diagrams

Bernard M. E. Moret

Decision trees and diagrams (also known as sequential evaluation procedures) have widespread applications in databases, decision table programming, concrete complexity theory, switching theory, pattern recognition, and taxonomy--in short, wherever discrete functions must be evaluated sequentially. In this tutorial survey a common framework of definitions and notation is established, the contributions from the main fields of application are reviewed, recent results and extensions are presented, and areas of ongoing and future research are discussed.

Research in Computational Molecular Biology | 2009

How Many Bootstrap Replicates Are Necessary?

Nicholas D. Pattengale; Masoud Alipour; Olaf R. P. Bininda-Emonds; Bernard M. E. Moret; Alexandros Stamatakis

Phylogenetic Bootstrapping (BS) is a standard technique for inferring confidence values on phylogenetic trees that is based on reconstructing many trees from minor variations of the input data, trees called replicates. BS is used with all phylogenetic reconstruction approaches, but we focus here on the most popular, Maximum Likelihood (ML). Because ML inference is so computationally demanding, it has proved too expensive to date to assess the impact of the number of replicates used in BS on the quality of the support values. For the same reason, a rather small number (typically 100) of BS replicates are computed in real-world studies. Stamatakis et al. recently introduced a BS algorithm that is 1-2 orders of magnitude faster than previous techniques, while yielding qualitatively comparable support values, making an experimental study possible. In this paper, we propose stopping criteria, that is, thresholds computed at runtime to determine when enough replicates have been generated, and report on the first large-scale experimental study to assess the effect of the number of replicates on the quality of support values, including the performance of our proposed criteria. We run our tests on 17 diverse real-world DNA datasets, single-gene as well as multi-gene, that include between 125 and 2,554 sequences. We find that our stopping criteria typically stop computations after 100-500 replicates (although the most conservative criterion may continue for several thousand replicates) while producing support values that correlate at better than 99.5% with the reference values on the best ML trees. Significantly, we also find that the stopping criteria can recommend very different numbers of replicates for different datasets of comparable sizes.
Our results are thus two-fold: (i) they give the first experimental assessment of the effect of the number of BS replicates on the quality of support values returned through bootstrapping; and (ii) they validate our proposals for stopping criteria. Practitioners will no longer have to enter a guess nor worry about the quality of support values; moreover, with most counts of replicates in the 100-500 range, robust BS under ML inference becomes computationally practical for most datasets. The complete test suite is available at http://lcbb.epfl.ch/BS.tar.bz2, and BS with our stopping criteria is included in RAxML 7.1.0.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2004

Phylogenetic Networks: Modeling, Reconstructibility, and Accuracy

Bernard M. E. Moret; Luay Nakhleh; Tandy J. Warnow; C.R. Linder; A. Tholse; A. Padolina; Jerry Sun; R. Timme

Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones.

Comparative Genomics | 2000

An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae

Mary E. Cosner; Robert K. Jansen; Bernard M. E. Moret; Linda A. Raubeson; Li-San Wang; Tandy J. Warnow; Stacey K. Wyman

The first heuristic for reconstructing phylogenetic trees from gene order data was introduced by Blanchette et al. It sought to reconstruct the breakpoint phylogeny and was applied to a variety of datasets. We present a new heuristic for estimating the breakpoint phylogeny which, although not polynomial-time, is much faster in practice than BPAnalysis. We use this heuristic to conduct a phylogenetic analysis of chloroplast genomes in the flowering plant family Campanulaceae. We also present and discuss the results of experimentation on this real dataset with three methods: our new method, BPAnalysis, and the neighbor-joining method, using breakpoint distances, inversion distances, and inversion plus transposition distances.
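For readers unfamiliar with the breakpoint distance mentioned above, the following minimal sketch counts the adjacencies of one gene order that are not conserved in another. It assumes linear genomes over the same gene set, written as lists of signed integers; the function name is hypothetical, and circular genomes (such as chloroplast genomes) would also need the wrap-around adjacency, which this sketch omits.

```python
def breakpoint_distance(g1, g2):
    """Number of adjacencies in g1 that do not appear in g2.

    Genomes are lists of signed integers over the same gene set.
    An adjacency (a, b) is conserved if g2 contains a followed by b,
    or -b followed by -a (the same pair read on the opposite strand).
    """
    conserved = set()
    for a, b in zip(g2, g2[1:]):
        conserved.add((a, b))
        conserved.add((-b, -a))
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in conserved)


# Inverting the segment (2, 3) breaks the two adjacencies at its ends.
print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))  # prints 2
```

Because no end caps are added, a genome and its complete reversal are treated as the same gene order and are at distance 0.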

Workshop on Algorithms in Bioinformatics | 2002

Inversion Medians Outperform Breakpoint Medians in Phylogeny Reconstruction from Gene-Order Data

Bernard M. E. Moret; Adam Siepel; Jijun Tang; Tao Liu

Phylogeny reconstruction from gene-order data has attracted much attention over the last few years. The two software packages used for that purpose, BPAnalysis and GRAPPA, both use so-called breakpoint medians in their computations. Some of our past results indicate that using inversion scores rather than breakpoint scores in evaluating trees leads to the selection of better trees. On that basis, we conjectured that phylogeny reconstructions could be improved by using inversion medians, which minimize evolutionary distance under an inversions-only model of genome rearrangement. Recent algorithmic developments have made it possible to compute inversion medians for problems of realistic size. Our experimental studies unequivocally show that inversion medians are strongly preferable to breakpoint medians in the context of phylogenetic reconstruction from gene-order data. Improvements are most pronounced in the reconstruction of ancestral genomes, but are also evident in the topological accuracy of the reconstruction as well as, surprisingly, in the overall running time. Improvements are strongest for small average distances along tree edges and for evolutionary scenarios with a preponderance of inversion events, but occur in all cases, including evolutionary scenarios with high proportions of transpositions. All of our tests were run using our GRAPPA package, available (under GPL) at www.cs.unm.edu/~moret/GRAPPA; the next release will include the inversion median software we used in this study. The software used includes RevMed, developed by the authors and available at www.cs.unm.edu/~acs, and A. Caprara's inversion median code, generously made available for testing.

Computing and Combinatorics Conference | 2003

Genomic distances under deletions and insertions

Mark Marron; Krister M. Swenson; Bernard M. E. Moret

As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multi-set of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications). In this paper we extend El-Mabrouk's work to handle duplications as well as insertions and present an alternate framework for computing (near) minimal edit sequences involving insertions, deletions, and inversions. We derive an error bound for our polynomial-time distance computation under various assumptions and present preliminary experimental results that suggest that performance in practice may be excellent, within a few percent of the actual distance.

Workshop on Algorithms in Bioinformatics | 2001

Finding an Optimal Inversion Median: Experimental Results

Adam Siepel; Bernard M. E. Moret

We derive a branch-and-bound algorithm to find an optimal inversion median of three signed permutations. The algorithm prunes to manageable size an extremely large search tree using simple geometric properties of the problem and a newly available linear-time routine for inversion distance. Our experiments on simulated data sets indicate that the algorithm finds optimal medians in reasonable time for genomes of medium size when distances are not too large, as commonly occurs in phylogeny reconstruction. In addition, we have compared inversion and breakpoint medians, and found that inversion medians generally score significantly better and tend to be far more unique, which should make them valuable in median-based tree-building algorithms.
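To make the notion of an inversion median concrete, the sketch below finds one by brute force for very small signed permutations. It is deliberately not the branch-and-bound algorithm of the paper and does not use a linear-time distance routine: it assumes permutations are tuples of signed integers, computes inversion distances by breadth-first search over every signed permutation, and is only practical for a handful of genes. All function names are hypothetical.

```python
from collections import deque


def inversions(perm):
    """Yield every permutation reachable from perm by one inversion
    (reverse a contiguous segment and flip the sign of each gene in it)."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            segment = tuple(-g for g in reversed(perm[i:j + 1]))
            yield perm[:i] + segment + perm[j + 1:]


def distances_from(source):
    """Breadth-first search: inversion distance from source to every
    signed permutation of the same genes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in inversions(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def inversion_median(a, b, c):
    """Return (median, score): a permutation minimizing the summed
    inversion distance to a, b and c, together with that sum."""
    da, db, dc = distances_from(a), distances_from(b), distances_from(c)
    best = min(da, key=lambda p: da[p] + db[p] + dc[p])
    return best, da[best] + db[best] + dc[best]
```

The search visits all 2^n * n! signed permutations of n genes, which is exactly the exponential search space that pruning is meant to tame.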

ACM Journal of Experimental Algorithms | 2008

Approximating the true evolutionary distance between two genomes

Krister M. Swenson; Mark Marron; Joel V. Earnest-DeYoung; Bernard M. E. Moret

As more and more genomes are sequenced, evolutionary biologists are becoming increasingly interested in evolution at the level of whole genomes, in scenarios in which the genome evolves through insertions, duplications, deletions, and movements of genes along its chromosomes. In the mathematical model pioneered by Sankoff and others, a unichromosomal genome is represented by a signed permutation of a multiset of genes; Hannenhalli and Pevzner showed that the edit distance between two signed permutations of the same set can be computed in polynomial time when all operations are inversions. El-Mabrouk extended that result to allow deletions and a limited form of insertions (which forbids duplications); in turn we extended it to compute a nearly optimal edit sequence between an arbitrary genome and the identity permutation. In this paper we generalize our approach to compute distances between two arbitrary genomes, but focus on approximating the true evolutionary distance rather than the edit distance. We present experimental results showing that our algorithm produces excellent estimates of the true evolutionary distance up to a (high) threshold of saturation; indeed, the distances thus produced are good enough to enable the simple neighbor-joining procedure to reconstruct our test trees with high accuracy.

SIAM Journal on Scientific and Statistical Computing | 1985

On Minimizing a Set of Tests

Bernard M. E. Moret; Henry D. Shapiro

Minimizing the size or cost of a set of tests without losing any discrimination power is a common problem in fault testing and diagnosis, pattern recognition, and biological identification. This problem, referred to as the minimum test set problem, is known to be NP-hard, so that determining an optimal solution is not always computationally feasible. Accordingly, researchers have proposed a number of heuristics for building approximate solutions, without, however, providing an analysis of their performance. In this paper, we take an in-depth look at the main heuristics and at the optimal solution methods, both from a theoretical and an experimental standpoint. We characterize the worst-case behavior of the heuristics and discuss their use in bounding. We then present the results of extensive experimentation with randomly generated problems. While the exponential explosion suggested by the problem's NP-hardness is apparent, our results suggest that real-world testing problems of large sizes can be solved quickly at the expense of large storage requirements.
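One simple heuristic for the minimum test set problem is a greedy rule: repeatedly pick the test that separates the most pairs of items not yet told apart. The sketch below illustrates that rule only; it is not claimed to be one of the specific heuristics analyzed in the paper. It assumes each test is given as the set of items on which it responds positively, and the function and variable names are hypothetical.

```python
from itertools import combinations


def greedy_test_set(items, tests):
    """Greedily choose tests until every pair of items is distinguished.

    items: iterable of sortable item identifiers.
    tests: dict mapping a test name to the set of items it is positive on.
    Returns the list of chosen test names, in the order they were picked.
    """
    unresolved = set(combinations(sorted(items), 2))
    remaining = dict(tests)
    chosen = []

    def gain(name):
        # Pairs that this test splits: exactly one of the two is positive.
        positive = remaining[name]
        return sum(1 for a, b in unresolved
                   if (a in positive) != (b in positive))

    while unresolved:
        best = max(remaining, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("the given tests cannot distinguish all items")
        positive = remaining.pop(best)
        unresolved = {(a, b) for a, b in unresolved
                      if (a in positive) == (b in positive)}
        chosen.append(best)
    return chosen


tests = {"t1": {0, 1}, "t2": {0, 2}, "t3": {0}}
print(greedy_test_set([0, 1, 2, 3], tests))  # prints ['t1', 't2']
```

The greedy choice gives no optimality guarantee in general; on this small instance two tests are necessary, since a single binary test cannot tell four items apart.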
