Publication


Featured research published by Joseph F. Walker.


Bioinformatics | 2017

Phyx: Phylogenetic tools for unix

Joseph W. Brown; Joseph F. Walker; Stephen A. Smith

Summary: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream‐centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large datasets. Availability and Implementation: phyx runs on POSIX‐compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
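The stream-centric idea in the abstract above (one tree in memory at a time, read from a standard stream) can be illustrated with a minimal Python sketch. This is not phyx code and does not reproduce any phyx program or option; the function names `iter_newick`, `count_tips` and `tip_counts` are hypothetical, and the parser assumes simple Newick strings without quoted labels or bracketed comments.

```python
import io
from typing import Iterator, TextIO


def iter_newick(stream: TextIO) -> Iterator[str]:
    """Yield one semicolon-terminated Newick string at a time.

    Only the characters of the tree currently being read are buffered,
    so memory use does not grow with the number of trees in the stream.
    Assumption: no semicolons inside quoted labels or comments.
    """
    buffer = []
    for line in stream:
        for char in line:
            if char == ";":
                tree = "".join(buffer).strip()
                buffer = []
                if tree:
                    yield tree + ";"
            else:
                buffer.append(char)


def count_tips(newick: str) -> int:
    """Count tips in a simple Newick string.

    In a well-formed tree each internal node with k children contributes
    k - 1 commas, so the number of tips is the number of commas plus one.
    """
    return newick.count(",") + 1


def tip_counts(stream: TextIO) -> Iterator[int]:
    """Lazily map each tree in the stream to its tip count."""
    for tree in iter_newick(stream):
        yield count_tips(tree)


# Small in-memory demonstration; a real pipeline would pass sys.stdin instead.
demo = io.StringIO("((A,B),C);\n(A,(B,(C,D)));\n")
demo_counts = list(tip_counts(demo))
```

Passing `sys.stdin` to `tip_counts` and printing each value would let such a script sit in the middle of a shell pipeline, which is the design the abstract describes: each step consumes and emits a stream rather than loading a whole dataset.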


American Journal of Botany | 2017

Widespread paleopolyploidy, gene tree conflict, and recalcitrant relationships among the carnivorous Caryophyllales

Joseph F. Walker; Ya Yang; Michael J. Moore; Jessica Mikenas; Alfonso Timoneda; Samuel F. Brockington; Stephen A. Smith

PREMISE OF STUDY The carnivorous members of the large, hyperdiverse Caryophyllales (e.g., Venus flytrap, sundews, and Nepenthes pitcher plants) represent perhaps the oldest and most diverse lineage of carnivorous plants. However, despite numerous studies seeking to elucidate their evolutionary relationships, the early-diverging relationships remain unresolved. METHODS To explore the utility of phylogenomic data sets for resolving relationships among the carnivorous Caryophyllales, we sequenced 10 transcriptomes, including all the carnivorous genera except those in the rare West African liana family Dioncophyllaceae. We used a variety of methods to infer the species tree, examine gene tree conflict, and infer paleopolyploidy events. KEY RESULTS Phylogenomic analyses supported the monophyly of the carnivorous Caryophyllales, with a crown age of 68-83 million years. In contrast to previous analyses, we recovered the remaining noncore Caryophyllales as nonmonophyletic, although the node supporting this relationship contained a significant amount of gene tree discordance. We present evidence that the clade contains at least seven independent paleopolyploidy events, previously unresolved nodes from the literature have high levels of gene tree conflict, and taxon sampling influences topology even in a phylogenomic data set, regardless of the use of coalescent or supermatrix methods. CONCLUSIONS Our data demonstrate the importance of carefully considering gene tree conflict and taxon sampling in phylogenomic analyses. Moreover, they provide a remarkable example of the propensity for paleopolyploidy in angiosperms, with at least seven such events in a clade of less than 2500 species.


New Phytologist | 2018

Improved transcriptome sampling pinpoints 26 ancient and more recent polyploidy events in Caryophyllales, including two allopolyploidy events

Ya Yang; Michael J. Moore; Samuel F. Brockington; Jessica Mikenas; Julia Olivieri; Joseph F. Walker; Stephen A. Smith

Studies of the macroevolutionary legacy of polyploidy are limited by an incomplete sampling of these events across the tree of life. To better locate and understand these events, we need comprehensive taxonomic sampling as well as homology inference methods that accurately reconstruct the frequency and location of gene duplications. We assembled a data set of transcriptomes and genomes from 168 species in Caryophyllales, of which 43 transcriptomes were newly generated for this study, representing one of the most densely sampled genomic-scale data sets available. We carried out phylogenomic analyses using a modified phylome strategy to reconstruct the species tree. We mapped the phylogenetic distribution of polyploidy events by both tree-based and distance-based methods, and explicitly tested scenarios for allopolyploidy. We identified 26 ancient and more recent polyploidy events distributed throughout Caryophyllales. Two of these events were inferred to be allopolyploidy. Through dense phylogenomic sampling, we show the propensity of polyploidy throughout the evolutionary history of Caryophyllales. We also provide a framework for utilizing transcriptome data to detect allopolyploidy, which is important as it may have different macroevolutionary implications compared with autopolyploidy.


New Phytologist | 2018

Disparity, diversity, and duplications in the Caryophyllales

Stephen A. Smith; Joseph W. Brown; Ya Yang; Riva Bruenn; Chloe P. Drummond; Samuel F. Brockington; Joseph F. Walker; Norman A. Douglas; Michael J. Moore

The role played by whole genome duplication (WGD) in plant evolution is actively debated. WGDs have been associated with advantages such as superior colonization, various adaptations, and increased effective population size. However, the lack of a comprehensive mapping of WGDs within a major plant clade has led to uncertainty regarding the potential association of WGDs and higher diversification rates. Using seven chloroplast and nuclear ribosomal genes, we constructed a phylogeny of 5036 species of Caryophyllales, representing nearly half of the extant species. We phylogenetically mapped putative WGDs as identified from analyses on transcriptomic and genomic data and analyzed these in conjunction with shifts in climatic occupancy and lineage diversification rate. Thirteen putative WGDs and 27 diversification shifts could be mapped onto the phylogeny. Of these, four WGDs were concurrent with diversification shifts, with other diversification shifts occurring at more recent nodes than WGDs. Five WGDs were associated with shifts to colder climatic occupancy. While we find that many diversification shifts occur after WGDs, it is difficult to consider diversification and duplication to be tightly correlated. Our findings suggest that duplications may often occur along with shifts in either diversification rate, climatic occupancy, or rate of evolution.


American Journal of Botany | 2018

Quartet Sampling distinguishes lack of support from conflicting support in the green plant tree of life

James B. Pease; Joseph W. Brown; Joseph F. Walker; Cody E. Hinchliff; Stephen A. Smith

PREMISE OF THE STUDY Phylogenetic support has been difficult to evaluate within the green plant tree of life partly due to a lack of specificity between conflicted versus poorly informed branches. As data sets continue to expand in both breadth and depth, new support measures are needed that are more efficient and informative. METHODS We describe the Quartet Sampling (QS) method, a quartet-based evaluation system that synthesizes several phylogenetic and genomic analytical approaches. QS characterizes discordance in large-sparse and genome-wide data sets, overcoming issues of alignment sparsity and distinguishing strong conflict from weak support. We tested QS with simulations and recent plant phylogenies inferred from variously sized data sets. KEY RESULTS QS scores demonstrated convergence with increasing replicates and were not strongly affected by branch depth. Patterns of QS support from different phylogenies led to a coherent understanding of ancestral branches defining key disagreements, including the relationships of Ginkgo to cycads, magnoliids to monocots and eudicots, and mosses to liverworts. The relationships of ANA-grade angiosperms (Amborella, Nymphaeales, Austrobaileyales), major monocot groups, bryophytes, and fern families are likely highly discordant in their evolutionary histories, rather than poorly informed. QS can also detect discordance due to introgression in phylogenomic data. CONCLUSIONS Quartet Sampling is an efficient synthesis of phylogenetic tests that offers more comprehensive and specific information on branch support than conventional measures. The QS method corroborates growing evidence that phylogenomic investigations that incorporate discordance testing are warranted when reconstructing complex evolutionary histories, in particular those surrounding ANA-grade, monocots, and nonvascular plants.


Applications in Plant Sciences | 2017

An Efficient Field and Laboratory Workflow for Plant Phylotranscriptomic Projects

Ya Yang; Michael J. Moore; Samuel F. Brockington; Alfonso Timoneda; Tao Feng; Hannah E. Marx; Joseph F. Walker; Stephen A. Smith

Premise of the study: We describe a field and laboratory workflow developed for plant phylotranscriptomic projects that involves cryogenic tissue collection in the field, RNA extraction and quality control, and library preparation. We also make recommendations for sample curation. Methods and Results: A total of 216 frozen tissue samples of Caryophyllales and other angiosperm taxa were collected from the field or botanical gardens. RNA was extracted, stranded mRNA libraries were prepared, and libraries were sequenced on Illumina HiSeq platforms. These included difficult mucilaginous tissues such as those of Cactaceae and Droseraceae. Conclusions: Our workflow is not only cost effective (ca. $270 per sample, as of August 2016, from tissue to reads) and time efficient (less than 50 h for 10–12 samples including all laboratory work and sample curation), but also has proven robust for extraction of difficult samples such as tissues containing high levels of secondary compounds.


American Journal of Botany | 2018

From cacti to carnivores: Improved phylotranscriptomic sampling and hierarchical homology inference provide further insight into the evolution of Caryophyllales

Joseph F. Walker; Ya Yang; Tao Feng; Alfonso Timoneda; Jessica Mikenas; Vera Hutchison; Caroline Edwards; Ning Wang; Sonia Ahluwalia; Julia Olivieri; Nathanael Walker-Hale; Lucas C. Majure; Raul Puente; Gudrun Kadereit; Maximilian Lauterbach; Urs Eggli; Hilda Flores-Olvera; Helga Ochoterena; Samuel F. Brockington; Michael J. Moore; Stephen A. Smith

PREMISE OF THE STUDY The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergence of trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales. METHODS We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses. KEY RESULTS Our phylogenetic analyses resolved many clades with strong support, but also showed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies, but also represents an opportunity to understand processes that have structured phylogenies. We also found taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling. CONCLUSIONS Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets.


bioRxiv | 2017

Improved Transcriptome Sampling Pinpoints 26 Paleopolyploidy Events In Caryophyllales, Including Two Paleo-Allopolyploidy Events

Ya Yang; Michael J. Moore; Samuel F. Brockington; Jessica Mikenas; Julia Olivieri; Joseph F. Walker; Stephen A. Smith

Studies of the macroevolutionary legacy of paleopolyploidy are limited by an incomplete sampling of these events across the tree of life. To better locate and understand these events, we need comprehensive taxonomic sampling as well as homology inference methods that accurately reconstruct the frequency and location of gene duplications. We assembled a dataset of transcriptomes and genomes from 169 species in Caryophyllales, of which 43 were newly generated for this study, representing one of the densest sampled genomic-scale datasets yet available. We carried out phylogenomic analyses using a modified phylome strategy to reconstruct the species tree. We mapped phylogenetic distribution of paleopolyploidy events by both tree-based and distance-based methods, and explicitly tested scenarios for paleo-allopolyploidy. We identified twenty-six paleopolyploidy events distributed throughout Caryophyllales, and using novel techniques inferred two to be paleo-allopolyploidy. Through dense phylogenomic sampling, we show the propensity of paleo-polyploidy in the clade Caryophyllales. We also provide the first method for utilizing transcriptome data to detect paleo-allopolyploidy, which is important as it may have different macro-evolutionary implications compared to paleo-autopolyploidy.


bioRxiv | 2017

Site and gene-wise likelihoods unmask influential outliers in phylogenomic analyses

Joseph F. Walker; Joseph W. Brown; Stephen A. Smith

Despite the wealth of evolutionary information available from sequence data, recalcitrant nodes in phylogenomic studies remain. A recent study of vertebrate transcriptomes by Brown and Thomson (2016) revealed that less than one percent of genes can have strong enough phylogenetic signal to alter the species tree. While identifying these outliers is important, the use of Bayes factors, advocated by Brown and Thomson (2016), is a heavy computational burden for increasingly large and growing datasets. We do not find fault with the Brown and Thomson (2016) study, but instead hope to build on their suggestions and offer some alternatives. Here we suggest that site- and gene-wise likelihoods may be used to identify discordant genes and nodes. We demonstrate this in the vertebrate dataset analyzed by Brown and Thomson (2016) as well as a dataset of carnivorous Caryophyllales (Eudicots: Superasterids). In both datasets, we identify genes that strongly influence species tree inference, and can overrule the signal present in all remaining genes, altering the species tree topology. By using a less computationally demanding approach, we can more rapidly examine competing hypotheses, providing a more thorough assessment of overall conflict. For example, our analyses highlight that the debated vertebrate relationship of Alligatoridae sister to turtles has only six genes with complete coverage for all species of Alligatoridae, birds and turtles. We also find that two genes (~0.16%) from the 1237-gene dataset of carnivorous Caryophyllales drive the topological estimate and, when removed, the species tree topology supports an alternative hypothesis supported by the remaining genes. Additionally, while the genes highlighted by Brown and Thomson (2016) were revealed to be the result of errors, we suggest that the topology produced by the outlier genes in the carnivorous Caryophyllales may not be the result of methodological error. Close examination of these genes revealed no obvious biases (i.e., no evidence of misidentified orthology, alignment error, or model violations such as significant compositional heterogeneity), suggesting the potential that these genes represent genuine, but exceptional, products of the evolutionary process. Bayes factors have been demonstrated to be helpful in addressing questions of conflict, but require significant computational effort. We suggest that maximum likelihood can also address these questions without the extensive computational burden. Furthermore, we recommend more thorough dataset exploration as this may expose limitations in a dataset to address primary hypotheses. While a dataset may contain hundreds or thousands of genes, only a small subset may be informative for the primary biological question.

Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential “outlier” genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.


bioRxiv | 2018

Evolution of Portulacineae marked by gene tree conflict and gene family expansion associated with adaptation to harsh environments

Ning Wang; Ya Yang; Michael J. Moore; Samuel F. Brockington; Joseph F. Walker; Joseph W. Brown; Bin Liang; Tao Feng; Caroline Edwards; Jessica Mikenas; Julia Olivieri; Vera Hutchison; Alfonso Timoneda; Tommy Stoughton; Raul Puente; Lucas C. Majure; Urs Eggli; Stephen A. Smith

Collaboration


Dive into Joseph F. Walker's collaborations.

Top Co-Authors


Ya Yang

University of Minnesota


Tao Feng

University of Cambridge
