Publication


Featured research published by Alan Filipski.


Molecular Biology and Evolution | 2013

MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0

Koichiro Tamura; Glen Stecher; Daniel Peterson; Alan Filipski; Sudhir Kumar

We announce the release of an advanced version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which currently contains facilities for building sequence alignments, inferring phylogenetic histories, and conducting molecular evolutionary analysis. In version 6.0, MEGA now enables the inference of timetrees, as it implements the RelTime method for estimating divergence times for all branching points in a phylogeny. A new Timetree Wizard in MEGA6 facilitates this timetree inference by providing a graphical user interface (GUI) to specify the phylogeny and calibration constraints step-by-step. This version also contains enhanced algorithms to search for the optimal trees under evolutionary criteria and implements a more advanced memory management that can double the size of sequence data sets to which MEGA can be applied. Both GUI and command-line versions of MEGA6 can be downloaded from www.megasoftware.net free of charge.


Proceedings of the National Academy of Sciences of the United States of America | 2012

Estimating divergence times in large molecular phylogenies

Koichiro Tamura; Fabia U. Battistuzzi; Paul Billing-Ross; Oscar Murillo; Alan Filipski; Sudhir Kumar

Molecular dating of species divergences has become an important means to add a temporal dimension to the Tree of Life. Increasingly larger datasets encompassing greater taxonomic diversity are becoming available to generate molecular timetrees by using sophisticated methods that model rate variation among lineages. However, the practical application of these methods is challenging because of the exorbitant calculation times required by current methods for contemporary data sizes, the difficulty in correctly modeling the rate heterogeneity in highly diverse taxonomic groups, and the lack of reliable clock calibrations and their uncertainty distributions for most groups of species. Here, we present a method that estimates relative times of divergences for all branching points (nodes) in very large phylogenetic trees without assuming a specific model for lineage rate variation or specifying any clock calibrations. The method (RelTime) performed better than existing methods when applied to very large computer-simulated datasets where evolutionary rates were varied extensively among lineages by following autocorrelated and uncorrelated models. On average, RelTime completed calculations 1,000 times faster than the fastest Bayesian method, with even greater speed differences for larger numbers of sequences. This speed and accuracy will enable molecular dating analysis of very large datasets. Relative time estimates will be useful for determining the relative ordering and spacing of speciation events, identifying lineages with significantly slower or faster evolutionary rates, diagnosing the effect of selected calibrations on absolute divergence times, and estimating absolute times of divergence when highly reliable calibration points are available.


Molecular Biology and Evolution | 2012

Statistics and Truth in Phylogenomics

Sudhir Kumar; Alan Filipski; Fabia U. Battistuzzi; Sergei L. Kosakovsky Pond; Koichiro Tamura

Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that assessing the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
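The point about very large data sets yielding tiny P values regardless of how small the underlying difference is can be illustrated with a generic one-sample z test. This is a textbook illustration, not the authors' analysis; the difference of 0.01 standard deviations and the sample sizes are hypothetical values chosen only to show the trend.

```python
import math


def z_test_p_value(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided P value for a one-sample z test of H0: true difference = 0."""
    z = mean_diff / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def cohens_d(mean_diff: float, sd: float) -> float:
    """Standardized effect size: the difference expressed in standard deviations."""
    return mean_diff / sd


if __name__ == "__main__":
    diff, sd = 0.01, 1.0  # hypothetical, biologically negligible difference
    for n in (100, 10_000, 1_000_000):
        d = cohens_d(diff, sd)
        p = z_test_p_value(diff, sd, n)
        # The effect size stays fixed while P shrinks as n grows.
        print(f"n={n:>9,}  d={d:.3f}  P={p:.3g}")
```

Because the effect size never changes, only the sample size drives the P value toward zero. That is why reporting the magnitude of a difference alongside its P value is informative for genome-scale data.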


Molecular Biology and Evolution | 2010

Performance of Relaxed-Clock Methods in Estimating Evolutionary Divergence Times and Their Credibility Intervals

Fabia U. Battistuzzi; Alan Filipski; S. Blair Hedges; Sudhir Kumar

The rapid expansion of sequence data and the development of statistical approaches that embrace varying evolutionary rates among lineages have encouraged many more investigators to use DNA and protein data to time species divergences. Here, we report results from a systematic evaluation, by means of computer simulation, of the performance of two frequently used relaxed-clock methods for estimating these times and their credibility intervals (CrIs). These relaxed-clock methods allow rates to vary in a phylogeny randomly over lineages (e.g., BEAST software) and in autocorrelated fashion (e.g., MultiDivTime software). We applied these methods for analyzing sequence data sets simulated using naturally derived parameters (evolutionary rates, sequence lengths, and base substitution patterns) and assuming that clock calibrations are known without error. We find that the estimated times are, on average, close to the true times as long as the assumed model of lineage rate changes matches the actual model. The 95% CrIs also contain the true time for ≥95% of the simulated data sets. However, the use of an incorrect lineage rate model reduces this frequency to 83%, indicating that the relaxed-clock methods are not robust to the violation of the underlying lineage rate model. Because these rate models are rarely known a priori and are difficult to detect empirically, we suggest building composite CrIs using CrIs produced from MultiDivTime and BEAST analyses. These composite CrIs are found to contain the true time for ≥97% of data sets. Our analyses also verify the usefulness of the common practice of interpreting the congruence of times inferred from different methods as a reflection of the accuracy of time estimates. Overall, our results show that simple strategies can be used to enhance our ability to estimate times and their CrIs when using the relaxed-clock methods.
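The abstract does not say exactly how a composite CrI is assembled. The sketch below assumes it is the envelope of the two intervals: the smaller of the two lower bounds and the larger of the two upper bounds. It also shows how coverage, meaning the fraction of data sets whose interval contains the true time, would be tallied. The `Interval` type, the function names, and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    lower: float
    upper: float


def composite_cri(a: Interval, b: Interval) -> Interval:
    """Assumed composite: envelope spanning both credibility intervals."""
    return Interval(min(a.lower, b.lower), max(a.upper, b.upper))


def contains(ci: Interval, true_time: float) -> bool:
    """True if the interval covers the (simulated) true divergence time."""
    return ci.lower <= true_time <= ci.upper


def coverage(intervals: List[Interval], true_times: List[float]) -> float:
    """Fraction of simulated data sets whose interval contains the true time."""
    hits = sum(contains(ci, t) for ci, t in zip(intervals, true_times))
    return hits / len(true_times)


if __name__ == "__main__":
    # Hypothetical CrIs (e.g., in millions of years) from two relaxed-clock analyses.
    autocorrelated = Interval(80.0, 95.0)    # e.g., a MultiDivTime-style result
    uncorrelated = Interval(85.0, 102.0)     # e.g., a BEAST-style result
    combined = composite_cri(autocorrelated, uncorrelated)
    print(combined, contains(autocorrelated, 100.0), contains(combined, 100.0))
```

Taking the envelope can only widen an interval, so its coverage is never lower than that of either input interval. The trade-off is reduced precision.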


Trends in Genetics | 2011

Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations

Sudhir Kumar; Joel T. Dudley; Alan Filipski; Li Liu

Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.


Genome Research | 2009

Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations

Sudhir Kumar; Michael P. Suleski; Glenn J. Markov; Simon Lawrence; Antonio Marco; Alan Filipski

As the cost of DNA sequencing drops, we are moving beyond one genome per species to one genome per individual to improve prevention, diagnosis, and treatment of disease by using personal genotypes. Computational methods are frequently applied to predict impairment of gene function by nonsynonymous mutations in individual genomes and single nucleotide polymorphisms (nSNPs) in populations. These computational tools are, however, known to fail 15%-40% of the time. We find that accurate discrimination between benign and deleterious mutations is strongly influenced by the long-term (among species) history of positions that harbor those mutations. Successful prediction of known disease-associated mutations (DAMs) is much higher for evolutionarily conserved positions and for original-mutant amino acid pairs that are rarely seen among species. Prediction accuracies for nSNPs show opposite patterns, forecasting impediments to building diagnostic tools aiming to simultaneously reduce both false-positive and false-negative errors. The relative allele frequencies of mutations diagnosed as benign and damaging are predicted by positional evolutionary rates. These allele frequencies are modulated by the relative preponderance of the mutant allele in the set of amino acids found at homologous sites in other species (evolutionarily permissible alleles [EPAs]). The nSNPs found in EPAs are biochemically less severe than those missing from EPAs across all allele frequency categories. Therefore, it is important to consider positional evolutionary rates and EPAs when interpreting the consequences and population frequencies of human mutations. The impending sequencing of thousands of human and many more vertebrate genomes will lead to more accurate classifiers needed in real-world applications.
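The EPA definition above, the set of amino acids found at homologous sites in other species, can be sketched for a single alignment column. This is a sketch under stated assumptions. The column is a hypothetical string of one-letter residues from other species. The gap and unknown symbols `-`, `?`, and `X` are assumed to be ignored. The `conservation` function is a crude stand-in for positional evolutionary rate, not the rate estimate used in the paper.

```python
from collections import Counter
from typing import Iterable, Set

# Assumed non-informative symbols in an amino acid alignment column.
NON_RESIDUES = set("-?X")


def evolutionarily_permissible_alleles(homologous_residues: Iterable[str]) -> Set[str]:
    """Amino acids observed at the homologous position in other species."""
    return {aa for aa in homologous_residues if aa not in NON_RESIDUES}


def is_epa(mutant_aa: str, homologous_residues: Iterable[str]) -> bool:
    """True if the mutant amino acid is seen at this position in another species."""
    return mutant_aa in evolutionarily_permissible_alleles(homologous_residues)


def conservation(homologous_residues: Iterable[str]) -> float:
    """Crude proxy: fraction of species carrying the most common residue."""
    residues = [aa for aa in homologous_residues if aa not in NON_RESIDUES]
    if not residues:
        return 0.0
    most_common_count = Counter(residues).most_common(1)[0][1]
    return most_common_count / len(residues)


if __name__ == "__main__":
    column = "LLLLIIV-"  # hypothetical residues at one position across 8 species
    print(sorted(evolutionarily_permissible_alleles(column)))
    print(is_epa("I", column), is_epa("P", column), round(conservation(column), 3))
```

In this toy column, a human mutation to isoleucine would fall in the EPA set and a mutation to proline would not. According to the abstract, mutations in the EPA set tend to be biochemically less severe.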


Molecular Biology and Evolution | 2014

Prospects for building large timetrees using molecular data with incomplete gene coverage among species

Alan Filipski; Oscar Murillo; Anna Freydenzon; Koichiro Tamura; Sudhir Kumar

Scientists are assembling sequence data sets from increasing numbers of species and genes to build comprehensive timetrees. However, data are often unavailable for some species and gene combinations, and the proportion of missing data is often large for data sets containing many genes and species. Surprisingly, there has not been a systematic analysis of the effect of the degree of sparseness of the species-gene matrix on the accuracy of divergence time estimates. Here, we present results from computer simulations and empirical data analyses to quantify the impact of missing gene data on divergence time estimation in large phylogenies. We found that estimates of divergence times were robust even when sequences from a majority of genes for most of the species were absent. From the analysis of such extremely sparse data sets, we found that the most egregious errors occurred for nodes in the tree that had no common genes for any pair of species in the immediate descendant clades of the node in question. These problematic nodes can be easily detected prior to computational analyses based only on the input sequence alignment and the tree topology. We conclude that it is best to use larger alignments, because adding both genes and species to the alignment augments the number of genes available for estimating divergence events deep in the tree and improves their time estimates.
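The abstract notes that the most error-prone nodes, those where no pair of species from the node's immediate descendant clades shares any gene, can be detected from the alignment and topology alone. The sketch below assumes a tree given as nested tuples of species names and a hypothetical `coverage` mapping from each species to the set of genes for which it has sequence data. A pair of species from two child clades shares a gene exactly when the union of genes in one clade intersects the union of genes in the other. The check therefore reduces to one set intersection per pair of child clades.

```python
from typing import Dict, List, Set, Tuple, Union

# A leaf is a species name; an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def problematic_nodes(tree: Tree, coverage: Dict[str, Set[str]]) -> List[Tree]:
    """Return internal nodes whose child clades share no gene (post-order)."""
    flagged: List[Tree] = []

    def visit(node: Tree) -> Set[str]:
        # Returns the union of genes present in any species below `node`.
        if isinstance(node, str):
            return set(coverage.get(node, set()))
        child_genes = [visit(child) for child in node]
        shared = any(
            child_genes[i] & child_genes[j]
            for i in range(len(child_genes))
            for j in range(i + 1, len(child_genes))
        )
        if not shared:
            flagged.append(node)
        return set().union(*child_genes)

    visit(tree)
    return flagged


if __name__ == "__main__":
    # Hypothetical sparse species-gene matrix.
    cov = {"A": {"g1"}, "B": {"g1", "g2"}, "C": {"g3"}, "D": {"g3"}}
    tree = (("A", "B"), ("C", "D"))
    print(problematic_nodes(tree, cov))  # the root: {g1, g2} and {g3} are disjoint
```

Because the check uses only gene presence and the topology, it can run before any divergence time estimation. It is cheap enough to apply to every node of a large tree.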


The Evolution of the Genome | 2005

Comparative Genomics in Eukaryotes

Alan Filipski; Sudhir Kumar

This chapter outlines the development and current status of comparative eukaryotic genomics, from the earliest studies of basic chromosome structure to the sequencing of entire genomes. In the process, it reviews the structure, organization, and composition of the primary eukaryotic genomes that have been sequenced thus far. Although the word “genome,” meaning the total hereditary material of an organism, was coined in 1920, the general concept of a genome arose as early as the 4th century BC, when Aristotle implicated blood as the hereditary substance. The notions of “blood relations” and of characteristics being “in one's blood” persist, although it is now known that the blood of mammals actually contains very little genetic material because their erythrocytes contain neither nuclei nor mitochondria. Although its roots can be traced back to the earliest chromosomal work, comparative genomics involving complete genome sequencing is a science still in its infancy. Fast-growing and full of potential, its maturation is expected to influence an increasingly broad array of biological disciplines. Widespread implications can already be envisioned for evolutionary biology, medicine, and agriculture; in some cases, these have already become reality. The large-scale comparison, and perhaps even manipulation, of genomes is a complex undertaking involving numerous empirical, analytical, and ethical issues. Both important challenges and exciting discoveries lie ahead for genome biology.


Proceedings of the National Academy of Sciences of the United States of America | 2005

Placing confidence limits on the molecular age of the human–chimpanzee divergence

Sudhir Kumar; Alan Filipski; Vinod Swarna; Alan W. Walker; S. Blair Hedges


Genome Research | 2007

Multiple sequence alignment: In pursuit of homologous DNA positions

Sudhir Kumar; Alan Filipski

Collaboration


Top Co-Authors

Koichiro Tamura

Tokyo Metropolitan University

Joel T. Dudley

Icahn School of Medicine at Mount Sinai
