Alan R. Lemmon
Florida State University
Publications
Featured research published by Alan R. Lemmon.
Nature | 2015
Richard O. Prum; Jacob S. Berv; Alex Dornburg; Daniel J. Field; Jeffrey P. Townsend; Emily Moriarty Lemmon; Alan R. Lemmon
Although reconstruction of the phylogeny of living birds has progressed tremendously in the last decade, the evolutionary history of Neoaves—a clade that encompasses nearly all living bird species—remains the greatest unresolved challenge in dinosaur systematics. Here we investigate avian phylogeny with an unprecedented scale of data: >390,000 bases of genomic sequence data from each of 198 species of living birds, representing all major avian lineages, and two crocodilian outgroups. Sequence data were collected using anchored hybrid enrichment, yielding 259 nuclear loci with an average length of 1,523 bases for a total data set of over 7.8 × 10^7 bases. Bayesian and maximum likelihood analyses yielded highly supported and nearly identical phylogenetic trees for all major avian lineages. Five major clades form successive sister groups to the rest of Neoaves: (1) a clade including nightjars, other caprimulgiforms, swifts, and hummingbirds; (2) a clade uniting cuckoos, bustards, and turacos with pigeons, mesites, and sandgrouse; (3) cranes and their relatives; (4) a comprehensive waterbird clade, including all diving, wading, and shorebirds; and (5) a comprehensive landbird clade with the enigmatic hoatzin (Opisthocomus hoazin) as the sister group to the rest. Neither of the two main, recently proposed Neoavian clades—Columbea and Passerea—were supported as monophyletic. The results of our divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous–Palaeogene (K–Pg) mass extinction.
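The data-set size quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures given above and assumes, as a simplification, that every locus was recovered at its average length for every taxon, so the total is an upper-bound estimate rather than the authors' exact count.

```python
# Figures quoted in the abstract above.
n_loci = 259
mean_locus_length = 1523  # bases
n_birds = 198
n_outgroups = 2

# Simplifying assumption: every taxon has every locus at the average length.
bases_per_taxon = n_loci * mean_locus_length
n_taxa = n_birds + n_outgroups
total_bases = bases_per_taxon * n_taxa

print(f"bases per taxon: {bases_per_taxon:,}")
print(f"total bases:     {total_bases:,} (about {total_bases:.2e})")
```

Both products agree with the abstract's ">390,000 bases" per species and "over 7.8 × 10^7 bases" in total.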
Systematic Biology | 2012
Alan R. Lemmon; Sandra A. Emme; Emily Moriarty Lemmon
The field of phylogenetics is on the cusp of a major revolution, enabled by new methods of data collection that leverage both genomic resources and recent advances in DNA sequencing. Previous phylogenetic work has required labor-intensive marker development coupled with single-locus polymerase chain reaction and DNA sequencing on a clade-by-clade and locus-by-locus basis. Here, we present a new, cost-efficient, and rapid approach to obtaining data from hundreds of loci for potentially hundreds of individuals for deep and shallow phylogenetic studies. Specifically, we designed probes for target enrichment of >500 loci in highly conserved anchor regions of vertebrate genomes (flanked by less conserved regions) from five model species and tested enrichment efficiency in nonmodel species up to 508 million years divergent from the nearest model. We found that hybrid enrichment using conserved probes (anchored enrichment) can recover a large number of unlinked loci that are useful at a diversity of phylogenetic timescales. This new approach has the potential not only to expedite resolution of deep-scale portions of the Tree of Life but also to greatly accelerate resolution of the large number of shallow clades that remain unresolved. The combination of low cost (~1% of the cost of traditional Sanger sequencing and ~3.5% of the cost of high-throughput amplicon sequencing for projects on the scale of 500 loci × 100 individuals) and rapid data collection (~2 weeks of laboratory time) is expected to make this approach tractable even for researchers working on systems with limited or nonexistent genomic resources.
Systematic Biology | 2009
Alan R. Lemmon; Jeremy M. Brown; Kathrin F. Stanger-Hall; Emily Moriarty Lemmon
Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of 1 mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
Systematic Biology | 2007
Jeremy M. Brown; Alan R. Lemmon
As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.
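As a rough illustration of how a Bayes factor comparison between two partitioning strategies is usually computed, the sketch below takes the log marginal likelihoods of two models and returns 2 ln BF. The function names, the example likelihood values, and the use of the Kass and Raftery (1995) evidence categories are assumptions made for illustration; the abstract does not state which threshold or estimator the authors used.

```python
def two_ln_bayes_factor(log_ml_model1: float, log_ml_model0: float) -> float:
    """Return 2 ln BF in favor of model 1, given log marginal likelihoods."""
    return 2.0 * (log_ml_model1 - log_ml_model0)


def evidence_category(two_ln_bf: float) -> str:
    """Label support for model 1 on the Kass and Raftery (1995) scale."""
    if two_ln_bf < 0:
        return "favors model 0"
    if two_ln_bf <= 2:
        return "barely worth mentioning"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods (not taken from the paper):
# model 1 is a partitioned analysis, model 0 is unpartitioned.
log_ml_partitioned = -10234.7
log_ml_unpartitioned = -10251.2

stat = two_ln_bayes_factor(log_ml_partitioned, log_ml_unpartitioned)
print(f"2 ln BF = {stat:.1f} ({evidence_category(stat)})")
```

In practice the hard part is estimating the marginal likelihoods themselves; this sketch only covers the final comparison step.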
Systematic Biology | 2004
Alan R. Lemmon; Emily C. Moriarty
We studied the importance of proper model assumption in the context of Bayesian phylogenetics by examining >5,000 Bayesian analyses and six nested models of nucleotide substitution. Model misspecification can strongly bias bipartition posterior probability estimates. These biases were most pronounced when rate heterogeneity was ignored. The type of bias seen at a particular bipartition appeared to be strongly influenced by the lengths of the branches surrounding that bipartition. In the Felsenstein zone, posterior probability estimates of bipartitions were biased when the assumed model was underparameterized but were unbiased when the assumed model was overparameterized. For the inverse Felsenstein zone, however, both underparameterization and overparameterization led to biased bipartition posterior probabilities, although the bias caused by overparameterization was less pronounced and disappeared with increased sequence length. Model parameter estimates were also affected by model misspecification. Underparameterization caused a bias in some parameter estimates, such as branch lengths and the gamma shape parameter, whereas overparameterization caused a decrease in the precision of some parameter estimates. We caution researchers to assure that the most appropriate model is assumed by employing both a priori model choice methods and a posteriori model adequacy tests.
Systematic Biology | 2010
Jeremy M. Brown; Shannon M. Hedtke; Alan R. Lemmon; Emily Moriarty Lemmon
A surprising number of recent Bayesian phylogenetic analyses contain branch-length estimates that are several orders of magnitude longer than corresponding maximum-likelihood estimates. The levels of divergence implied by such branch lengths are unreasonable for studies using biological data and are known to be false for studies using simulated data. We conducted additional Bayesian analyses and studied approximate-posterior surfaces to investigate the causes underlying these large errors. We manipulated the starting parameter values of the Markov chain Monte Carlo (MCMC) analyses, the moves used by the MCMC analyses, and the prior-probability distribution on branch lengths. We demonstrate that inaccurate branch-length estimates result from either 1) poor mixing of MCMC chains or 2) posterior distributions with excessive weight at long tree lengths. Both effects are caused by a rapid increase in the volume of branch-length space as branches become longer. In the former case, both an MCMC move that scales all branch lengths in the tree simultaneously and the use of overdispersed starting branch lengths allow the chain to accurately sample the posterior distribution and should be used in Bayesian analyses of phylogeny. In the latter case, branch-length priors can have strong effects on resulting inferences and should be carefully chosen to reflect biological expectations. We provide a formula to calculate an exponential rate parameter for the branch-length prior that should eliminate inference of biased branch lengths in many cases. In any phylogenetic analysis, the biological plausibility of branch-length output must be carefully considered.
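One way to see why the branch-length prior matters more as trees grow: if each of the 2N − 3 branches of an unrooted binary tree with N taxa gets an independent exponential prior with rate λ, the implied prior on total tree length is Gamma(2N − 3, λ), with mean (2N − 3)/λ. The sketch below computes that induced prior and a simple moment-matching rate. It is a standard probability result used here for illustration, not the formula given in the paper, and the rate of 10 is an illustrative value.

```python
import math


def n_branches(n_taxa: int) -> int:
    """Number of branches in an unrooted, fully bifurcating tree."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return 2 * n_taxa - 3


def tree_length_prior(n_taxa: int, rate: float) -> tuple[float, float]:
    """Mean and standard deviation of the induced Gamma prior on tree length.

    Assumes independent Exponential(rate) priors on every branch.
    """
    k = n_branches(n_taxa)
    return k / rate, math.sqrt(k) / rate


def rate_for_target_mean(n_taxa: int, target_tree_length: float) -> float:
    """Exponential rate whose induced prior mean tree length hits the target.

    Simple moment matching; NOT the formula from the paper.
    """
    return n_branches(n_taxa) / target_tree_length


for n in (10, 100, 1000):
    mean, sd = tree_length_prior(n, rate=10.0)
    print(f"{n:5d} taxa: prior tree length mean = {mean:7.1f}, sd = {sd:5.2f}")
```

Because the standard deviation grows only with the square root of the branch count, the induced prior becomes tightly concentrated around a long tree length as taxa are added, which is one reason a default per-branch prior can pull estimates toward implausibly long trees.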
Proceedings of the National Academy of Sciences of the United States of America | 2002
Alan R. Lemmon; Michel C. Milinkovitch
Large phylogeny estimation is a combinatorial optimization problem that no future computer will ever be able to solve exactly in practical computing time. The difficulty of the problem is amplified by the need to use complex evolutionary models and large taxon samplings. Hence, many heuristic approaches have been developed, with varying degrees of success. Here, we report on a heuristic approach, the metapopulation genetic algorithm, involving several populations of trees that are forced to cooperate in the search for the optimal tree. Within each population, trees are subjected to evaluation, selection, and mutation events, which are directed by using inter-population consensus information. The method proves to be both very accurate and vastly faster than existing heuristics, such that data sets comprised of hundreds of taxa can be analyzed in practical computing times under complex models of maximum-likelihood evolution. Branch support values produced by the metapopulation genetic algorithm might closely approximate the posterior probabilities of the corresponding branches.
BMC Genomics | 2012
Darin R. Rokyta; Alan R. Lemmon; Mark J. Margres; Karalyn Aronow
Background: Snake venoms have significant impacts on human populations through the morbidity and mortality associated with snakebites and as sources of drugs, drug leads, and physiological research tools. Genes expressed by venom-gland tissue, including those encoding toxic proteins, have therefore been sequenced but only with relatively sparse coverage resulting from the low-throughput sequencing approaches available. High-throughput approaches based on 454 pyrosequencing have recently been applied to the study of snake venoms to give the most complete characterizations to date of the genes expressed in active venom glands, but such approaches are costly and still provide a far-from-complete characterization of the genes expressed during venom production. Results: We describe the de novo assembly and analysis of the venom-gland transcriptome of an eastern diamondback rattlesnake (Crotalus adamanteus) based on 95,643,958 pairs of quality-filtered, 100-base-pair Illumina reads. We identified 123 unique, full-length toxin-coding sequences, which cluster into 78 groups with less than 1% nucleotide divergence, and 2,879 unique, full-length nontoxin coding sequences. The toxin sequences accounted for 35.4% of the total reads, and the nontoxin sequences for an additional 27.5%. The most highly expressed toxin was a small myotoxin related to crotamine, which accounted for 5.9% of the total reads. Snake-venom metalloproteinases accounted for the highest percentage of reads mapping to a toxin class (24.4%), followed by C-type lectins (22.2%) and serine proteinases (20.0%). The most diverse toxin classes were the C-type lectins (21 clusters), the snake-venom metalloproteinases (16 clusters), and the serine proteinases (14 clusters). The high-abundance nontoxin transcripts were predominantly those involved in protein folding and translation, consistent with the protein-secretory function of the tissue. Conclusions: We have provided the most complete characterization of the genes expressed in an active snake venom gland to date, producing insights into snakebite pathology and guidance for snakebite treatment for the largest rattlesnake species and arguably the most dangerous snake native to the United States of America, C. adamanteus. We have more than doubled the number of sequenced toxins for this species and created extensive genomic resources for snakes based entirely on de novo assembly of Illumina sequence data.
Evolution | 2007
Emily Moriarty Lemmon; Alan R. Lemmon; David C. Cannatella
Tertiary geological events and Quaternary climatic fluctuations have been proposed as important factors of speciation in the North American flora and fauna. Few studies, however, have rigorously tested hypotheses regarding the specific factors driving divergence of taxa. Here, we test explicit speciation hypotheses by correlating geologic events with divergence times among species in the continentally distributed trilling chorus frogs (Pseudacris). In particular, we ask whether marine inundation of the Mississippi Embayment, uplift of the Appalachian Mountains, or modification of the ancient Teays-Mahomet River system contributed to speciation. To examine the plausibility of ancient rivers causing divergence, we tested whether modern river systems inhibit gene flow. Additionally, we compared the effects of Quaternary climatic factors (glaciation and aridification) on levels of genetic variation. Divergence time estimates using penalized likelihood and coalescent approaches indicate that the major lineages of chorus frogs diversified during the Tertiary, and also exclude Quaternary climate change as a factor in speciation of chorus frogs. We show the first evidence that inundation of the Mississippi Embayment contributed to speciation. We reject the hypotheses that Cenozoic uplift of the Appalachians and that diversion of the Teays-Mahomet River contributed to speciation in this clade. We find that by reducing gene flow, rivers have the potential to cause divergence of lineages. Finally, we demonstrate that populations in areas affected by Quaternary glaciation and aridification have reduced levels of genetic variation compared to those from more equable regions, suggesting recent colonization.
Systematic Biology | 2008
Alan R. Lemmon; Emily Moriarty Lemmon
Due to lack of an adequate statistical framework, biologists studying phylogeography are abandoning traditional methods of estimating phylogeographic history in favor of statistical methods designed to test a priori hypotheses. These new methods may, however, have limited descriptive utility. Here, we develop a new statistical framework that can be used to both test a priori hypotheses and estimate phylogeographic history of a gene (and the statistical confidence in that history) in the absence of such hypotheses. The statistical approach concentrates on estimation of geographic locations of the ancestors of a set of sampled organisms. We derive the likelihood of the ancestral geographic coordinates and the value of the scaled dispersal parameter, given the observed geographic coordinates (assuming known topology and branch lengths). Using a maximum likelihood approach, which is implemented in the new program PhyloMapper, we apply this statistical framework to a 246-taxon mitochondrial genealogy of North American chorus frogs, focusing in detail on one of these species. We demonstrate three lines of evidence for recent northward expansion of the mitochondrion of the coastal clade of Pseudacris feriarum: higher per-generation dispersal distance in the recently colonized region, a noncentral ancestral location, and directional migration. After illustrating one method of accommodating phylogenetic uncertainty, we conclude by discussing how extensions of this framework could function to incorporate a priori ecological and geological information into phylogeographic analyses.
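To make the idea of a likelihood for ancestral coordinates concrete, here is a minimal sketch for the simplest possible case: one ancestor whose sampled descendants each sit at the end of a single branch of known length. It assumes a Gaussian (Brownian-motion) dispersal kernel on planar coordinates, with variance sigma2 per unit branch length along each axis. That kernel, the function names, and the example data are all assumptions made for illustration; they may differ from the model the paper derives and from what PhyloMapper implements.

```python
import math

Point = tuple[float, float]


def log_likelihood(coords: list[Point], branch_lengths: list[float],
                   ancestor: Point, sigma2: float) -> float:
    """Log-likelihood of descendant coordinates under a 2D Gaussian random walk."""
    total = 0.0
    for (x, y), t in zip(coords, branch_lengths):
        var = sigma2 * t  # per-axis variance accumulated along the branch
        d2 = (x - ancestor[0]) ** 2 + (y - ancestor[1]) ** 2
        total += -math.log(2.0 * math.pi * var) - d2 / (2.0 * var)
    return total


def mle_ancestor(coords: list[Point], branch_lengths: list[float]) -> Point:
    """ML ancestral location: mean of descendants weighted by 1 / branch length."""
    weights = [1.0 / t for t in branch_lengths]
    total = sum(weights)
    x = sum(w * c[0] for w, c in zip(weights, coords)) / total
    y = sum(w * c[1] for w, c in zip(weights, coords)) / total
    return (x, y)


def mle_sigma2(coords: list[Point], branch_lengths: list[float],
               ancestor: Point) -> float:
    """ML dispersal variance per unit branch length, given the ancestor location."""
    scaled = sum(((c[0] - ancestor[0]) ** 2 + (c[1] - ancestor[1]) ** 2) / t
                 for c, t in zip(coords, branch_lengths))
    return scaled / (2.0 * len(coords))


# Hypothetical data: two descendants, the second on a branch three times longer.
coords = [(0.0, 0.0), (2.0, 0.0)]
branch_lengths = [1.0, 3.0]

ancestor = mle_ancestor(coords, branch_lengths)
sigma2 = mle_sigma2(coords, branch_lengths, ancestor)
print("ML ancestor location:", ancestor)
print("ML sigma^2:", sigma2)
print("log-likelihood:", log_likelihood(coords, branch_lengths, ancestor, sigma2))
```

The estimated ancestor sits closer to the descendant on the shorter branch, since less time has elapsed for it to disperse. On a full tree, internal nodes depend on one another and the estimates are generally found by numerical optimization rather than a closed form.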