Elizabeth S. Allman | Researchain

Publication

Featured research published by Elizabeth S. Allman.

Annals of Statistics | 2009

Identifiability of Parameters in Latent Structure Models with Many Observed Variables

Elizabeth S. Allman; Catherine Matias; John A. Rhodes

While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
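
For reference, the uniqueness condition usually attributed to Kruskal can be written as below. This is the standard form for three-way arrays, not a quotation from the paper; the symbols T, a_j, b_j, c_j, r and the Kruskal ranks k_A, k_B, k_C are notation assumed here for illustration. In a latent-class reading with r hidden classes and three observed variables, the columns would play the role of class-conditional distributions, with the class weights absorbed into one factor.

```latex
% Kruskal rank k_A: the largest k such that every set of k columns of A
% is linearly independent (similarly for k_B and k_C).
T = \sum_{j=1}^{r} a_j \otimes b_j \otimes c_j,
\qquad A = [a_1 \cdots a_r],\quad B = [b_1 \cdots b_r],\quad C = [c_1 \cdots c_r].

% Sufficient condition for uniqueness of the decomposition:
k_A + k_B + k_C \;\ge\; 2r + 2
\;\Longrightarrow\;
(A, B, C) \text{ is determined by } T
\text{ up to simultaneous permutation and rescaling of columns.}
```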

Journal of Mathematical Biology | 2011

Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent

Elizabeth S. Allman; James H. Degnan; John A. Rhodes

Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals—each with many genes—splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for 5 or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.

Journal of Computational Biology | 2006

The identifiability of tree topology for phylogenetic models, including covarion and mixture models.

Elizabeth S. Allman; John A. Rhodes

For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.

Advances in Applied Probability | 2008

Identifiability of a Markovian Model of Molecular Evolution with Gamma-Distributed Rates

Elizabeth S. Allman; Cécile Ané; John A. Rhodes

Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR + Γ + I). Here we prove that one of the most widely used models (GTR + Γ) is identifiable for generic parameters, and for all parameter choices in the case of four-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.

Journal of Theoretical Biology | 2011

Determining species tree topologies from clade probabilities under the coalescent.

Elizabeth S. Allman; James H. Degnan; John A. Rhodes

One approach to estimating a species tree from a collection of gene trees is to first estimate probabilities of clades from the gene trees, and then to construct the species tree from the estimated clade probabilities. While a greedy consensus algorithm, which consecutively accepts the most probable clades compatible with previously accepted clades, can be used for this second stage, this method is known to be statistically inconsistent under the multispecies coalescent model. This raises the question of whether it is theoretically possible to reconstruct the species tree from known probabilities of clades on gene trees. We investigate clade probabilities arising from the multispecies coalescent model, with an eye toward identifying features of the species tree. Clades on gene trees with probability greater than 1/3 are shown to reflect clades on the species tree, while those with smaller probabilities may not. Linear invariants of clade probabilities are studied both computationally and theoretically, with certain linear invariants giving insight into the clade structure of the species tree. For species trees with generic edge lengths, these invariants can be used to identify the species tree topology. These theoretical results both confirm that clade probabilities contain full information on the species tree topology and suggest future directions of study for developing statistically consistent inference methods from clade frequencies on gene trees.
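
The greedy consensus step described in this abstract can be sketched in a few lines. This is a minimal illustration based only on the one-sentence description above, not the authors' implementation; the function names `compatible` and `greedy_consensus`, the tie-breaking rule, and the example probabilities are all assumptions made for the sketch. Clades are treated as compatible when they are disjoint or nested.

```python
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades can sit on one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def greedy_consensus(clade_probs: Dict[Clade, float]) -> List[Clade]:
    """Accept clades in order of decreasing probability, keeping each one
    that is compatible with every clade accepted so far."""
    accepted: List[Clade] = []
    # Ties are broken by sorted taxon names (an assumption, for determinism).
    ordered = sorted(clade_probs.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for clade, _prob in ordered:
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
    return accepted


if __name__ == "__main__":
    # Hypothetical estimated clade probabilities, for illustration only.
    probs = {
        frozenset({"A", "B"}): 0.6,
        frozenset({"B", "C"}): 0.3,
        frozenset({"A", "B", "C"}): 0.8,
        frozenset({"C", "D"}): 0.1,
    }
    for clade in greedy_consensus(probs):
        print(sorted(clade))
```

As the abstract notes, this procedure is known to be statistically inconsistent under the multispecies coalescent model, so the sketch shows the baseline method being discussed rather than a recommended estimator.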

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Identifiability of Two-Tree Mixtures for Group-Based Models

Elizabeth S. Allman; Sonja Petrović; John A. Rhodes; Seth Sullivant

Phylogenetic data arising on two possibly different tree topologies might be mixed through several biological mechanisms, including incomplete lineage sorting or horizontal gene transfer in the case of different topologies, or simply different substitution processes on characters in the case of the same topology. Recent work on a 2-state symmetric model of character change showed that for 4 taxa, such a mixture model has nonidentifiable parameters, and thus, it is theoretically impossible to determine the two tree topologies from any amount of data under such circumstances. Here, the question of identifiability is investigated for two-tree mixtures of the 4-state group-based models, which are more relevant to DNA sequence data. Using algebraic techniques, we show that the tree parameters are identifiable for the JC and K2P models. We also prove that generic substitution parameters for the JC mixture models are identifiable, and for the K2P and K3P models obtain generic identifiability results for mixtures on the same tree. This indicates that the full phylogenetic signal remains in such mixtures, and the 2-state symmetric result is thus a misleading guide to the behavior of other models.

SIAM Journal on Discrete Mathematics | 2014

A Semialgebraic Description of the General Markov Model on Phylogenetic Trees

Elizabeth S. Allman; John A. Rhodes; Amelia Taylor

Many of the stochastic models used in inference of phylogenetic trees from biological sequence data have polynomial parameterization maps. The image of such a map, the collection of joint distributions for a model, forms the model space. Since the parameterization is polynomial, the Zariski closure of the model space is an algebraic variety which is typically much larger than the model space, but has been usefully studied with algebraic methods. Of ultimate interest, however, is not the full variety, but only the model space. Here we develop complete semialgebraic descriptions of the model space arising from the k-state general Markov model on a tree, with slightly restricted parameters. Our approach depends upon both recently formulated analogs of Cayley's hyperdeterminant, and the construction of certain quadratic forms from the joint distribution whose positive (semi-)definiteness encodes information about parameter values. We additionally investigate the use of Sturm sequences for obtaining similar results.

SIAM Journal on Matrix Analysis and Applications | 2013

Tensor Rank, Invariants, Inequalities, and Applications

Elizabeth S. Allman; Peter D. Jarvis; John A. Rhodes; Jeremy G. Sumner

Though algebraic geometry over $\mathbb{C}$ is often used to describe the closure of the tensors of a given size and complex rank, this variety includes tensors of both smaller and larger rank. Here we focus on the

Journal of Computational Biology | 2013

Species tree inference by the STAR method and its generalizations.

Elizabeth S. Allman; James H. Degnan; John A. Rhodes

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2018

Species Tree Inference from Gene Splits by Unrooted STAR Methods

Elizabeth S. Allman; James H. Degnan; John A. Rhodes
