Glenn Hickey
University of California, Santa Cruz
Publication
Featured research published by Glenn Hickey.
Nucleic Acids Research | 2015
Kate R. Rosenbloom; Joel Armstrong; Galt P. Barber; Jonathan Casper; Hiram Clawson; Mark Diekhans; Timothy R. Dreszer; Pauline A. Fujita; Luvina Guruvadoo; Maximilian Haeussler; Rachel A. Harte; Steven G. Heitner; Glenn Hickey; Angie S. Hinrichs; Robert Hubley; Donna Karolchik; Katrina Learned; Brian T. Lee; Chin H. Li; Karen H. Miga; Ngan Nguyen; Benedict Paten; Brian J. Raney; Arian Smit; Matthew L. Speir; Ann S. Zweig; David Haussler; Robert M. Kuhn; W. James Kent
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Science | 2014
Richard E. Green; Edward L. Braun; Joel Armstrong; Dent Earl; Ngan Nguyen; Glenn Hickey; Michael W. Vandewege; John St. John; Salvador Capella-Gutiérrez; Todd A. Castoe; Colin Kern; Matthew K. Fujita; Juan C. Opazo; Jerzy Jurka; Kenji K. Kojima; Juan Caballero; Robert Hubley; Arian Smit; Roy N. Platt; Christine Lavoie; Meganathan P. Ramakodi; John W. Finger; Alexander Suh; Sally R. Isberg; Lee G. Miles; Amanda Y. Chong; Weerachai Jaratlerdsiri; Jaime Gongora; C. Moran; Andrés Iriarte
INTRODUCTION: Crocodilians and birds are the two extant clades of archosaurs, a group that includes the extinct dinosaurs and pterosaurs. Fossils suggest that living crocodilians (alligators, crocodiles, and gharials) have a most recent common ancestor 80 to 100 million years ago. Extant crocodilians are notable for their distinct morphology, limited intraspecific variation, and slow karyotype evolution. Despite their unique biology and phylogenetic position, little is known about genome evolution within crocodilians.
[Figure caption: Evolutionary rates of tetrapods inferred from DNA sequences anchored by ultraconserved elements. Evolutionary rates among reptiles vary, with especially low rates among extant crocodilians but high rates among squamates. We have reconstructed the genomes of the common ancestor of birds and of all archosaurs (shown in gray silhouette, although the morphology of these species is uncertain).]
RATIONALE: Genome sequences for the American alligator, saltwater crocodile, and Indian gharial, representatives of all three extant crocodilian families, were obtained to facilitate better understanding of the unique biology of this group and provide a context for studying avian genome evolution. Sequence data from these three crocodilians and birds also allow reconstruction of the ancestral archosaurian genome.
RESULTS: We sequenced shotgun genomic libraries from each species and used a variety of assembly strategies to obtain draft genomes for these three crocodilians. The assembled scaffold N50 was highest for the alligator (508 kilobases). Using a panel of reptile genome sequences, we generated phylogenies that confirm the sister relationship between crocodiles and gharials, the relationship with birds as members of extant Archosauria, and the outgroup status of turtles relative to birds and crocodilians. We also estimated evolutionary rates along branches of the tetrapod phylogeny using two approaches: ultraconserved element–anchored sequences and fourfold degenerate sites within stringently filtered orthologous gene alignments. Both analyses indicate that the rates of base substitution along the crocodilian and turtle lineages are extremely low. Supporting observations were made for transposable element content and for gene family evolution. Analysis of whole-genome alignments across a panel of reptiles and mammals showed that the rate of accumulation of micro-insertions and microdeletions is proportionally lower in crocodilians, consistent with a single underlying cause of a reduced rate of evolutionary change rather than intrinsic differences in base repair machinery. We hypothesize that this single cause may be a consistently longer generation time over the evolutionary history of Crocodylia. Low heterozygosity was observed in each genome, consistent with previous analyses, including the Chinese alligator. Pairwise sequential Markov chain analysis of regional heterozygosity indicates that during glacial cycles of the Pleistocene, each species suffered reductions in effective population size. The reduction was especially strong for the American alligator, whose current range extends farthest into regions of temperate climates.
CONCLUSION: We used crocodilian, avian, and outgroup genomes to reconstruct 584 megabases of the archosaurian common ancestor genome and the genomes of key ancestral nodes. The estimated accuracy of the archosaurian genome reconstruction is 91% and is higher for conserved regions such as genes. The reconstructed genome can be improved by adding more crocodilian and avian genome assemblies and may provide a unique window to the genomes of extinct organisms such as dinosaurs and pterosaurs.
To provide context for the diversification of archosaurs—the group that includes crocodilians, dinosaurs, and birds—we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.
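The scaffold N50 quoted in the abstract above is a standard assembly contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Hypothetical helper for illustration: walk the scaffolds from longest
    to shortest and stop once the running total covers half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy example (made-up lengths, not real assembly data).
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly; it is a contiguity measure only.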
Evolutionary Bioinformatics | 2008
Glenn Hickey; Frank K. H. A. Dehne; Andrew Rau-Chaplin; Christian Blouin
The subtree prune and regraft distance (dSPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies and as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of dSPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted trees, and the heuristic element is only with respect to the algorithm's computation time. Our method is a heuristic version of a fixed parameter tractability (FPT) approach and our experiments indicate that the running time behaves similarly to that of FPT algorithms. For real data sets, our algorithm was able to quickly compute dSPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.
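To make the operation behind dSPR concrete, here is a minimal sketch of a single subtree prune and regraft move. It is not the algorithm from the paper: for simplicity it assumes rooted binary trees written as nested tuples with unique leaf labels, whereas the paper treats the harder unrooted case. The helper names `prune`, `regraft`, and `spr` are hypothetical. dSPR is the minimum number of such moves needed to turn one tree into the other.

```python
def prune(tree, target):
    """Remove the subtree `target` and suppress its now-unary parent."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def regraft(tree, below, subtree):
    """Attach `subtree` on the edge leading into the node `below`."""
    if tree == below:
        return (tree, subtree)
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    return (regraft(left, below, subtree), regraft(right, below, subtree))


def spr(tree, subtree, below):
    """One SPR move: cut `subtree` out, then reattach it above `below`."""
    return regraft(prune(tree, subtree), below, subtree)


# Toy example: move leaf A next to leaf D.
start = ((("A", "B"), "C"), "D")
print(spr(start, "A", "D"))  # (('B', 'C'), ('D', 'A'))
```

The sketch does no validation: `subtree` is assumed to occur exactly once in `tree`, and `below` is assumed to be a node of the pruned tree.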
Genome Research | 2014
Dent Earl; Ngan Nguyen; Glenn Hickey; Robert S. Harris; Stephen Fitzgerald; Kathryn Beal; Igor Seledtsov; Vladimir Molodtsov; Brian J. Raney; Hiram Clawson; Jaebum Kim; Carsten Kemena; Jia-Ming Chang; Ionas Erb; Alexander Poliakov; Minmei Hou; Javier Herrero; William Kent; Victor Solovyev; Aaron E. Darling; Jian Ma; Cedric Notredame; Michael Brudno; Inna Dubchak; David Haussler; Benedict Paten
Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
Mobile DNA | 2015
Douglas R. Hoen; Glenn Hickey; Guillaume Bourque; Josep Casacuberta; Richard Cordaux; Cédric Feschotte; Anna Sophie Fiston-Lavier; Aurélie Hua-Van; Robert Hubley; Aurélie Kapusta; Emmanuelle Lerat; Florian Maumus; David D. Pollock; Hadi Quesneville; Arian Smit; Travis J. Wheeler; Thomas E. Bureau; Mathieu Blanchette
DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks—that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
Bioinformatics | 2013
Glenn Hickey; Benedict Paten; Dent Earl; Daniel R. Zerbino; David Haussler
Motivation: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. Results: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). Availability: All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics | 2014
Ngan Nguyen; Glenn Hickey; Brian J. Raney; Joel Armstrong; Hiram Clawson; Ann S. Zweig; Donna Karolchik; William James Kent; David Haussler; Benedict Paten
MOTIVATION: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, from individual efforts. It is not possible, or necessarily desirable, that the public genome browsers attempt to curate all these data. Instead, a wealth of powerful tools is emerging to empower users to create their own visualizations and browsers. RESULTS: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology. AVAILABILITY AND IMPLEMENTATION: The source code is available as open source at https://github.com/glennhickey/progressiveCactus. The E. coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page.
Journal of Computational Biology | 2011
Glenn Hickey; Mathieu Blanchette
Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model is shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any type of downstream comparative genomics analyses that rely on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at www.mcb.mcgill.ca/~blanchem/software.html.
Research in Computational Molecular Biology | 2017
Benedict Paten; Adam M. Novak; Erik Garrison; Glenn Hickey
A superbubble is a type of directed acyclic subgraph with single distinct source and sink vertices. In genome assembly and genetics, the possible paths through a superbubble can be considered to represent the set of possible sequences at a location in a genome. Bidirected and biedged graphs are a generalization of digraphs that are increasingly being used to more fully represent genome assembly and variation problems. Here we define snarls and ultrabubbles, generalizations of superbubbles for bidirected and biedged graphs, and give an efficient algorithm for the detection of these more general structures. Key to this algorithm is the cactus graph, which we show encodes the nested decomposition of a graph into snarls and ultrabubbles within its structure. We propose and demonstrate empirically that this decomposition on bidirected and biedged graphs solves a fundamental problem by defining genetic sites for any collection of genomic variations, including complex structural variations, without need for any single reference genome coordinate system. Furthermore, the nesting of the decomposition gives a natural way to describe and model variations contained within large variations, a case not currently dealt with by existing formats, e.g. VCF.
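The idea that paths through a superbubble spell out the possible sequences at a site can be illustrated with a small sketch. It assumes an ordinary acyclic digraph with a sequence label on each vertex, not the bidirected or biedged graphs the paper generalizes to, and it does not check the superbubble property itself. The function name `bubble_sequences` and the toy graph are hypothetical.

```python
def bubble_sequences(edges, labels, source, sink):
    """Yield the sequence spelled by each source-to-sink path.

    `edges` maps a vertex to its successors and `labels` maps a vertex to
    its sequence. The graph is assumed acyclic, so the recursion ends.
    """
    def walk(node, prefix):
        prefix = prefix + labels[node]
        if node == sink:
            yield prefix
            return
        for successor in edges.get(node, ()):
            yield from walk(successor, prefix)

    yield from walk(source, "")


# Toy bubble: a single-base G/T variant between two shared flanks.
toy_edges = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
toy_labels = {"s": "AC", "a": "G", "b": "T", "t": "TA"}
print(sorted(bubble_sequences(toy_edges, toy_labels, "s", "t")))
# ['ACGTA', 'ACTTA']
```

Enumerating paths this way is exponential in the worst case, so it is suited only to small sites like the toy example.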
BMC Bioinformatics | 2016
Daniel R. Zerbino; Tracy Ballinger; Benedict Paten; Glenn Hickey; David Haussler
Background: The study of genomic variation has provided key insights into the functional role of mutations. Predominantly, studies have focused on single nucleotide variants (SNV), which are relatively easy to detect and can be described with rich mathematical models. However, it has been observed that genomes are highly plastic, and that whole regions can be moved, removed or duplicated in bulk. These structural variants (SV) have been shown to have significant impact on phenotype, but their study has been held back by the combinatorial complexity of the underlying models. Results: We describe here a general model of structural variation that encompasses both balanced rearrangements and arbitrary copy-number variants (CNV). Conclusions: In this model, we show that the space of possible evolutionary histories that explain the structural differences between any two genomes can be sampled ergodically.