Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Vladimir N. Minin is active.

Publication


Featured researches published by Vladimir N. Minin.


Systematic Biology | 2003

Performance-Based Selection of Likelihood Models for Phylogeny Estimation

Vladimir N. Minin; Zaid Abdo; Paul Joyce; Jack Sullivan

Phylogenetic estimation has largely come to rely on explicitly model-based methods. This approach requires that a model be chosen and that that choice be justified. To date, justification has largely been accomplished through use of likelihood-ratio tests (LRTs) to assess the relative fit of a nested series of reversible models. While this approach certainly represents an important advance over arbitrary model selection, the best fit of a series of models may not always provide the most reliable phylogenetic estimates for finite real data sets, where all available models are surely incorrect. Here, we develop a novel approach to model selection, which is based on the Bayesian information criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. We evaluate this method by examining four real data sets and by using those data sets to define simulation conditions. In the real data sets, the DT method selects the same or simpler models than conventional LRTs. In order to lend generality to the simulations, codon-based models (with parameters estimated from the real data sets) were used to generate simulated data sets, which are therefore more complex than any of the models we evaluate. On average, the DT method selects models that are simpler than those chosen by conventional LRTs. Nevertheless, these simpler models provide estimates of branch lengths that are more accurate both in terms of relative error and absolute error than those derived using the more complex (yet still wrong) models chosen by conventional LRTs. This method is available in a program called DT-ModSel.


Bioinformatics | 2005

Dual multiple change-point model leads to more accurate recombination detection

Vladimir N. Minin; Karin S. Dorman; Fang Fang; Marc A. Suchard

MOTIVATION We introduce a dual multiple change-point (MCP) model for recombination detection among aligned nucleotide sequences. The dual MCP model is an extension of the model introduced previously by Suchard and co-workers. In the original single MCP model, one change-point process is used to model spatial phylogenetic variation. Here, we show that using two change-point processes, one for spatial variation of tree topologies and the other for spatial variation of substitution process parameters, increases recombination detection accuracy. Statistical analysis is done in a Bayesian framework using reversible jump Markov chain Monte Carlo sampling to approximate the joint posterior distribution of all model parameters. RESULTS We use primate mitochondrial DNA data with simulated recombination break-points at specific locations to compare the two models. We also analyze two real HIV sequences to identify recombination break-points using the dual MCP model.


Systematic Biology | 2014

Species Delimitation using Genome-Wide SNP Data

Matthew K. Fujita; Vladimir N. Minin; Remco Bouckaert

The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computation burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the numbers of loci and the number of samples suggest that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested. [Bayes factor; model testing; phylogeography; RADseq; simulation; speciation.].


Molecular Biology and Evolution | 2009

Learning to Count: Robust Estimates for Labeled Distances between Molecular Sequences

John O'Brien; Vladimir N. Minin; Marc A. Suchard

Researchers routinely estimate distances between molecular sequences using continuous-time Markov chain models. We present a new method, robust counting, that protects against the possibly severe bias arising from model misspecification. We achieve this robustness by generalizing the conventional distance estimation to incorporate the empirical distribution of site patterns found in the observed pairwise sequence alignment. Our flexible framework allows for computing distances based only on a subset of possible substitutions. From this, we show how to estimate labeled codon distances, such as expected numbers of synonymous or nonsynonymous substitutions. We present two simulation studies. The first compares the relative bias and variance of conventional and robust labeled nucleotide estimators. In the second simulation, we demonstrate that robust counting furnishes accurate synonymous and nonsynonymous distance estimates based only on easy-to-fit models of nucleotide substitution, bypassing the need for computationally expensive codon models. We conclude with three empirical examples. In the first two examples, we investigate the evolutionary dynamics of the influenza A hemagglutinin gene using labeled codon distances. In the final example, we demonstrate the advantages of using robust synonymous distances to alleviate the effect of convergent evolution on phylogenetic analysis of an HIV transmission network.


Philosophical Transactions of the Royal Society B | 2008

Fast, accurate and simulation-free stochastic mapping

Vladimir N. Minin; Marc A. Suchard

Mapping evolutionary trajectories of discrete traits onto phylogenies receives considerable attention in evolutionary biology. Given the trait observations at the tips of a phylogenetic tree, researchers are often interested where on the tree the trait changes its state and whether some changes are preferential in certain parts of the tree. In a model-based phylogenetic framework, such questions translate into characterizing probabilistic properties of evolutionary trajectories. Current methods of assessing these properties rely on computationally expensive simulations. In this paper, we present an efficient, simulation-free algorithm for computing two important and ubiquitous evolutionary trajectory properties. The first is the mean number of trait changes, where changes can be divided into classes of interest (e.g. synonymous/non-synonymous mutations). The mean evolutionary reward, accrued proportionally to the time a trait occupies each of its states, is the second property. To illustrate the usefulness of our results, we first employ our simulation-free stochastic mapping to execute a posterior predictive test of correlation between two evolutionary traits. We conclude by mapping synonymous and non-synonymous mutations onto branches of an HIV intrahost phylogenetic tree and comparing selection pressure on terminal and internal tree branches.


Proceedings of the National Academy of Sciences of the United States of America | 2009

High frequency of hotspot mutations in core genes of Escherichia coli due to short-term positive selection

Sujay Chattopadhyay; Scott J. Weissman; Vladimir N. Minin; Thomas A. Russo; Daniel E. Dykhuizen; Evgeni V. Sokurenko

Core genes comprising the ubiquitous backbone of bacterial genomes are not subject to frequent horizontal transfer and generally are not thought to contribute to the adaptive evolution of bacterial pathogens. We determined, however, that at least one-third and possibly more than one-half of the core genes in Escherichia coli genomes are targeted by repeated replacement substitutions in the same amino acid positions—hotspot mutations. Occurrence of hotspot mutations is driven by positive selection, as their rate is significantly higher than expected by random chance alone, and neither intragenic recombination nor increased mutability can explain the observed patterns. Also, commensal E. coli strains have a significantly lower frequency of mutated genes and mutations per genome than pathogenic strains. E. coli strains causing extra-intestinal infections accumulate hotspot mutations at the highest rate, whereas the highest total number of mutated genes has been found among Shigella isolates, suggesting the pathoadaptive nature of such mutations. The vast majority of hotspot mutations are of recent evolutionary origin, implying short-term positive selection, where adaptive mutations emerge repeatedly but are not sustained in natural circulation for long. Such pattern of dynamics is consistent with source-sink model of virulence evolution.


Bioinformatics | 2012

A counting renaissance

Philippe Lemey; Vladimir N. Minin; Filip Bielejec; Sergei L. Kosakovsky Pond; Marc A. Suchard

MOTIVATION Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. RESULTS We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples.


Proceedings of the National Academy of Sciences of the United States of America | 2014

Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid

M. Sabrina Pankey; Vladimir N. Minin; Greg C. Imholte; Marc A. Suchard; Todd H. Oakley

Significance Unless there are strong constraints, the probability of complex organs originating multiple times through similar trajectories should be vanishingly small. Here, we report that similar light-producing organs (photophores) evolved separately in two squid species, yet each organ expresses similar genes at comparable levels. Gene expression is so similar that overall expression levels alone can predict organ identity, even in separately evolved traits of squid species separated by tens of millions of years. The striking similarity of expression of hundreds of genes in distinct photophores indicates complex trait evolution may sometimes be more constrained and predictable than expected, either because of internal factors, like a limited array of suitable genetic building blocks, or external factors, like natural selection favoring an optimum. Despite contingency in life’s history, the similarity of evolutionarily convergent traits may represent predictable solutions to common conditions. However, the extent to which overall gene expression levels (transcriptomes) underlying convergent traits are themselves convergent remains largely unexplored. Here, we show strong statistical support for convergent evolutionary origins and massively parallel evolution of the entire transcriptomes in symbiotic bioluminescent organs (bacterial photophores) from two divergent squid species. The gene expression similarities are so strong that regression models of one species’ photophore can predict organ identity of a distantly related photophore from gene expression levels alone. Our results point to widespread parallel changes in gene expression evolution associated with convergent origins of complex organs. Therefore, predictable solutions may drive not only the evolution of novel, complex organs but also the evolution of overall gene expression levels that underlie them.


Genetics | 2007

Phylogenetic Mapping of Recombination Hotspots in Human Immunodeficiency Virus via Spatially Smoothed Change-Point Processes

Vladimir N. Minin; Karin S. Dorman; Fang Fang; Marc A. Suchard

We present a Bayesian framework for inferring spatial preferences of recombination from multiple putative recombinant nucleotide sequences. Phylogenetic recombination detection has been an active area of research for the last 15 years. However, only recently attempts to summarize information from several instances of recombination have been made. We propose a hierarchical model that allows for simultaneous inference of recombination breakpoint locations and spatial variation in recombination frequency. The dual multiple change-point model for phylogenetic recombination detection resides at the lowest level of our hierarchy under the umbrella of a common prior on breakpoint locations. The hierarchical prior allows for information about spatial preferences of recombination to be shared among individual data sets. To overcome the sparseness of breakpoint data, dictated by the modest number of available recombinant sequences, we a priori impose a biologically relevant correlation structure on recombination location log odds via a Gaussian Markov random field hyperprior. To examine the capabilities of our model to recover spatial variation in recombination frequency, we simulate recombination from a predefined distribution of breakpoint locations. We then proceed with the analysis of 42 human immunodeficiency virus (HIV) intersubtype gag recombinants and identify a putative recombination hotspot.


Systematic Biology | 2016

Detecting the Anomaly Zone in Species Trees and Evidence for a Misleading Signal in Higher-Level Skink Phylogeny (Squamata: Scincidae).

Charles W. Linkem; Vladimir N. Minin

The anomaly zone, defined by the presence of gene tree topologies that are more probable than the true species tree, presents a major challenge to the accurate resolution of many parts of the Tree of Life. This discrepancy can result from consecutive rapid speciation events in the species tree. Similar to the problem of long-branch attraction, including more data via loci concatenation will only reinforce the support for the incorrect species tree. Empirical phylogenetic studies often employ coalescent-based species tree methods to avoid the anomaly zone, but to this point these studies have not had a method for providing any direct evidence that the species tree is actually in the anomaly zone. In this study, we use 16 species of lizards in the family Scincidae to investigate whether nodes that are difficult to resolve place the species tree within the anomaly zone. We analyze new phylogenomic data (429 loci), using both concatenation and coalescent-based species tree estimation, to locate conflicting topological signal. We then use the unifying principle of the anomaly zone, together with estimates of ancestral population sizes and species persistence times, to determine whether the observed phylogenetic conflict is a result of the anomaly zone. We identify at least three regions of the Scincidae phylogeny that provide demographic signatures consistent with the anomaly zone, and this new information helps reconcile the phylogenetic conflict in previously published studies on these lizards. The anomaly zone presents a real problem in phylogenetics, and our new framework for identifying anomalous relationships will help empiricists leverage their resources appropriately for investigating and overcoming this challenge.

Collaboration


Dive into the Vladimir N. Minin's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jason Xu

University of Washington

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Fang Fang

Iowa State University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Peter B. Chi

California Polytechnic State University

View shared research outputs
Researchain Logo
Decentralizing Knowledge