William Chow
Wellcome Trust Sanger Institute
Publication
Featured research published by William Chow.
PLOS Biology | 2011
Deanna M. Church; Valerie Schneider; Tina Graves; Katherine Auger; Fiona Cunningham; Nathan Bouk; Hsiu Chuan Chen; Richa Agarwala; William M. McLaren; Graham R. S. Ritchie; Derek Albracht; Milinn Kremitzki; Susan Rock; Holland Kotkiewicz; Colin Kremitzki; Aye Wollam; Lee Trani; Lucinda Fulton; Robert S. Fulton; Lucy Matthews; S. Whitehead; William Chow; James Torrance; Matthew Dunn; Glenn Harden; Glen Threadgold; Jonathan Wood; Joanna Collins; Paul Heath; Guy Griffiths
I have read the journal's policy and have the following conflicts: Paul Flicek is married to the deputy editor of PLoS Medicine, Melissa Norton. Evan Eichler is on the board of Pacific Biosciences. Support for this work came from the Intramural Research Program of the NIH, The National Library of Medicine, the European Molecular Biology Laboratory, the Wellcome Trust (grant number 077198), and the Howard Hughes Medical Institute (EEE). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
BMC Genomics | 2008
Nicole L. Quinn; Natasha Levenkova; William Chow; Pascal Bouffard; Keith A. Boroevich; James Knight; Thomas Jarvie; Krzysztof P. Lubieniecki; Brian Desany; Ben F. Koop; Timothy T. Harkins; William S. Davidson
Background: With a whole genome duplication event and wealth of biological data, salmonids are excellent model organisms for studying evolutionary processes, fates of duplicated genes and genetic and physiological processes associated with complex behavioral phenotypes. It is surprising therefore, that no salmonid genome has been sequenced. Atlantic salmon (Salmo salar) is a good representative salmonid for sequencing given its importance in aquaculture and the genomic resources available. However, the size and complexity of the genome combined with the lack of a sequenced reference genome from a closely related fish makes assembly challenging. Given the cost and time limitations of Sanger sequencing as well as recent improvements to next generation sequencing technologies, we examined the feasibility of using the Genome Sequencer (GS) FLX pyrosequencing system to obtain the sequence of a salmonid genome. Eight pooled BACs belonging to a minimum tiling path covering ~1 Mb of the Atlantic salmon genome were sequenced by GS FLX shotgun and Long Paired End sequencing and compared with a ninth BAC sequenced by Sanger sequencing of a shotgun library.
Results: An initial assembly using only GS FLX shotgun sequences (average read length 248.5 bp) with ~30× coverage allowed gene identification, but was incomplete even when 126 Sanger-generated BAC-end sequences (~0.09× coverage) were incorporated. The addition of paired end sequencing reads (additional ~26× coverage) produced a final assembly comprising 175 contigs assembled into four scaffolds with 171 gaps. Sanger sequencing of the ninth BAC (~10.5× coverage) produced nine contigs and two scaffolds. The number of scaffolds produced by the GS FLX assembly was comparable to Sanger-generated sequencing; however, the number of gaps was much higher in the GS FLX assembly.
Conclusion: These results represent the first use of GS FLX paired end reads for de novo sequence assembly. Our data demonstrated that this improved the GS FLX assemblies; however, with respect to de novo sequencing of complex genomes, the GS FLX technology is limited to gene mining and establishing a set of ordered sequence contigs. Currently, for a salmonid reference sequence, it appears that a substantial portion of sequencing should be done using Sanger technology.
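The coverage figures quoted in this abstract follow the usual relation: coverage ≈ (number of reads × mean read length) / target length. The sketch below is an illustration only. It assumes a target of exactly 1 Mb (the abstract says ~1 Mb), and the function names are hypothetical rather than taken from the paper.

```python
def reads_for_coverage(target_bp: float, mean_read_len_bp: float, coverage: float) -> int:
    """Number of reads N such that N * L / G is approximately the requested coverage."""
    return round(coverage * target_bp / mean_read_len_bp)


def coverage_from_reads(n_reads: int, mean_read_len_bp: float, target_bp: float) -> float:
    """Average depth of coverage: total sequenced bases divided by target length."""
    return n_reads * mean_read_len_bp / target_bp


# Assumed inputs: 1 Mb target (approximation), 248.5 bp mean read length, ~30x depth.
n_reads = reads_for_coverage(1_000_000, 248.5, 30)
depth = coverage_from_reads(n_reads, 248.5, 1_000_000)
print(n_reads, round(depth, 2))
```

Under these assumptions, ~30× shotgun coverage corresponds to roughly 120,000 reads. The true read count is not stated in the abstract.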
Genome Research | 2017
Valerie Schneider; Tina A. Graves-Lindsay; Kerstin Howe; Nathan Bouk; Hsiu-Chuan Chen; Paul Kitts; Terence Murphy; Kim D. Pruitt; Françoise Thibaud-Nissen; Derek Albracht; Robert S. Fulton; Milinn Kremitzki; Vincent Magrini; Chris Markovic; Sean McGrath; Karyn Meltz Steinberg; Kate Auger; William Chow; Joanna Collins; Glenn Harden; Tim Hubbard; Sarah Pelan; Jared T. Simpson; Glen Threadgold; James Torrance; Jonathan Wood; Laura Clarke; Sergey Koren; Matthew Boitano; Paul Peluso
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
G3: Genes, Genomes, Genetics | 2017
Wesley C. Warren; LaDeana W. Hillier; Chad Tomlinson; Patrick Minx; Milinn Kremitzki; Tina Graves; Chris Markovic; Nathan Bouk; Kim D. Pruitt; Françoise Thibaud-Nissen; Valerie Schneider; Tamer Mansour; C. Titus Brown; Aleksey V. Zimin; R. J. Hawken; Mitch Abrahamsen; Alexis B. Pyrkosz; Mireille Morisson; Valerie Fillon; Alain Vignal; William Chow; Kerstin Howe; Janet E. Fulton; Marcia M. Miller; Peter V. Lovell; Claudio V. Mello; Morgan Wirthlin; Andrew S. Mason; Richard Kuo; David W. Burt
The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.
BMC Genomics | 2010
Nicole L. Quinn; Keith A. Boroevich; Krzysztof P. Lubieniecki; William Chow; Evelyn A. Davidson; Ruth B. Phillips; Benjamin F. Koop; William S. Davidson
Background: The genomes of salmonids are considered pseudo-tetraploid undergoing reversion to a stable diploid state. Given the genome duplication and extensive biological data available for salmonids, they are excellent model organisms for studying comparative genomics, evolutionary processes, fates of duplicated genes and the genetic and physiological processes associated with complex behavioral phenotypes. The evolution of the tetrapod hemoglobin genes is well studied; however, little is known about the genomic organization and evolution of teleost hemoglobin genes, particularly those of salmonids. The Atlantic salmon serves as a representative salmonid species for genomics studies. Given the well documented role of hemoglobin in adaptation to varied environmental conditions as well as its use as a model protein for evolutionary analyses, an understanding of the genomic structure and organization of the Atlantic salmon α and β hemoglobin genes is of great interest.
Results: We identified four bacterial artificial chromosomes (BACs) comprising two hemoglobin gene clusters spanning the entire α and β hemoglobin gene repertoire of the Atlantic salmon genome. Their chromosomal locations were established using fluorescence in situ hybridization (FISH) analysis and linkage mapping, demonstrating that the two clusters are located on separate chromosomes. The BACs were sequenced and assembled into scaffolds, which were annotated for putatively functional and pseudogenized hemoglobin-like genes. This revealed that the tail-to-tail organization and alternating pattern of the α and β hemoglobin genes are well conserved in both clusters, as well as that the Atlantic salmon genome houses substantially more hemoglobin genes, including non-Bohr β globin genes, than the genomes of other teleosts that have been sequenced.
Conclusions: We suggest that the most parsimonious evolutionary path leading to the present organization of the Atlantic salmon hemoglobin genes involves the loss of a single hemoglobin gene cluster after the whole genome duplication (WGD) at the base of the teleost radiation but prior to the salmonid-specific WGD, which then produced the duplicated copies seen today. We also propose that the relatively high number of hemoglobin genes as well as the presence of non-Bohr β hemoglobin genes may be due to the dynamic life history of salmon and the diverse environmental conditions that the species encounters.
Data deposition: BACs S0155C07 and S0079J05 (fps135): GenBank GQ898924; BACs S0055H05 and S0014B03 (fps1046): GenBank GQ898925
Molecular Biology and Evolution | 2009
K. A. Johnstone; Kate L. Ciborowski; Krzysztof P. Lubieniecki; William Chow; Ruth B. Phillips; Ben F. Koop; William C. Jordan; William S. Davidson
There are three major multigene superfamilies of olfactory receptors (OR, V1R, and V2R) in mammals. The ORs are expressed in the main olfactory organ, whereas the V1Rs and V2Rs are located in the vomeronasal organ. Fish only possess one olfactory organ in each nasal cavity, the olfactory rosette; therefore, it has been proposed that their V2R-like genes be classified as olfactory C family G protein-coupled receptors (OlfC). There are large variations in the sizes of OR gene repertoires. Previous studies have shown that fish have between 12 and 46 functional V2R-like genes, whereas humans have lost all functional V2Rs, and frog sp. have more than 240. Pseudogenization of V2R genes is a prevalent event across species. In the mouse and frog genomes, there are approximately double the number of pseudogenes compared with functional genes. An oligonucleotide probe was designed from a conserved sequence from four Atlantic salmon OlfC genes and used to screen the Atlantic salmon bacterial artificial chromosome (BAC) library. Hybridization-positive BACs were matched to fingerprint contigs, and representative BACs were shotgun cloned and sequenced. We identified 55 OlfC genes. Twenty-nine of the OlfC genes are classified as putatively functional genes and 26 as pseudogenes. The OlfC genes are found in two genomic clusters on chromosomes 9 and 20. Phylogenetic analysis revealed that the OlfC genes could be divided into 10 subfamilies, with nine of these subfamilies corresponding to subfamilies found in other teleosts and one being salmon specific. There is also a large expansion in the number of OlfC genes in one subfamily in Atlantic salmon. Subfamily gene expansions have been identified in other teleosts, and these differences in gene number reflect species-specific evolutionary requirements for olfaction. Total RNA was isolated from the olfactory epithelium and other tissues from a presmolt to examine the expression of the odorant genes. 
Several of the putative OlfC genes that we identified are expressed only in the olfactory epithelium, consistent with these genes encoding odorant receptors.
Genome Research | 2016
Benjamin M. Skinner; Carole A. Sargent; Carol Churcher; Toby Hunt; Javier Herrero; Jane Loveland; Matthew Dunn; Sandra Louzada; Beiyuan Fu; William Chow; James Gilbert; Siobhan Austin-Guest; Kathryn Beal; Denise R. Carvalho-Silva; William Cheng; Daria Gordon; Darren Grafham; Matt Hardy; Jo Harley; Heidi Hauser; Philip Howden; Kerstin Howe; Kim Lachani; Peter Ji Ellis; Daniel Kelly; Giselle Kerry; James Kerwin; Bee Ling Ng; Glen Threadgold; Thomas Wileman
We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes--both single copy and amplified--on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution.
Marine Genomics | 2008
K. A. Johnstone; Krzysztof P. Lubieniecki; William Chow; Ruth B. Phillips; Ben F. Koop; William S. Davidson
Olfactory receptors are encoded by three large multigene superfamilies (OR, V1R and V2R) in mammals. Fish do not possess a vomeronasal system; therefore, it has been proposed that their V1R-like genes be classified as olfactory receptors related to class A G protein-coupled receptors (ora). Unlike mammalian genomes, which contain more than a hundred V1R genes, the five species of teleost fish that have been investigated to date appear to have six ora genes (ora1-6) except for pufferfish that have lost ora1. The common ancestor of salmonid fishes is purported to have undergone a whole genome duplication. As salmonids have a life history that requires the use of olfactory cues to navigate back to their natal habitats to spawn, we set out to determine if ora1 or ora2 is duplicated in a representative species, Atlantic salmon (Salmo salar). We used an oligonucleotide probe designed from a conserved sequence of several teleost ora2 genes to screen an Atlantic salmon BAC library (CHORI-214). Hybridization-positive BACs belonged to a single fingerprint contig of the Atlantic salmon physical map. All were also positive for ora2 by PCR. One of these BACs was chosen for further study, and shotgun sequencing of this BAC identified two V1R-like genes, ora1 and ora2, that are in a head-to-head conformation as is seen in some other teleosts. The gene products, ora1 and ora2, are highly conserved among teleosts. We only found evidence for a single ora1-2 locus in the Atlantic salmon genome, which was mapped to linkage group 6. Fluorescent in situ hybridization (FISH) analysis placed ora1-2 on chromosome 12. Conserved synteny was found surrounding the ora1 and ora2 genes in Atlantic salmon, medaka and three-spined stickleback, but not zebrafish.
Marine Genomics | 2009
Yvonne Y.Y. Lai; Krzysztof P. Lubieniecki; Ruth B. Phillips; William Chow; Ben F. Koop; William S. Davidson
Gene and genome duplications are considered to be driving forces of evolution. The relatively recent genome duplication in the common ancestor of salmonids makes this group of fish an excellent system for studying the re-diploidization process and the fates of duplicate genes. We characterized the structure and genome organization of the intestinal fatty acid binding protein (fabp2) genes in Atlantic salmon as a means of understanding the evolutionary fates of members of this protein family in teleosts. A survey of EST databases identified three unique salmonid fabp2 transcripts (fabp2aI, fabp2aII and fabp2b) compared to one transcript in zebrafish. We screened the CHORI-214 Atlantic salmon BAC library and identified BACs containing each of the three fabp2 genes. Physical mapping, genetic mapping and fluorescence in situ hybridization of Atlantic salmon chromosomes revealed that Atlantic salmon fabp2aI, fabp2aII and fabp2b correspond to separate genetic loci that reside on different chromosomes. Comparative genomic analyses indicated that these genes are related to one another by two genome duplications and a gene loss. The first genome duplication occurred in the common ancestor of all teleosts, giving rise to fabp2a and fabp2b, and the second in the common ancestor of salmonids, producing fabp2aI, fabp2aII, fabp2bI and fabp2bII. A subsequent loss of fabp2bI or fabp2bII gave the complement of fabp2 genes seen in Atlantic salmon today. There is also evidence for independent losses of fabp2b genes in zebrafish and tetraodon. Although there is no evidence for partitioning of tissue expression of fabp2 genes (i.e., sub-functionalization) in Atlantic salmon, the pattern of amino acid substitutions in Atlantic salmon and rainbow trout fabp2aI and fabp2aII suggests that neo-functionalization is occurring.
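The duplication-and-loss scenario described in this abstract can be traced step by step. The sketch below is a toy illustration, not the authors' analysis. The helper names are hypothetical, and because the abstract does not say which fabp2b copy was lost, fabp2bII is dropped arbitrarily.

```python
def whole_genome_duplication(genes, suffixes):
    """Return two copies of every gene, labelled with the given pair of suffixes."""
    return [gene + suffix for gene in genes for suffix in suffixes]


def lose(genes, lost):
    """Return the gene list with one gene removed."""
    return [gene for gene in genes if gene != lost]


ancestral = ["fabp2"]

# Teleost-wide duplication: fabp2 -> fabp2a, fabp2b
after_teleost_wgd = whole_genome_duplication(ancestral, ("a", "b"))

# Salmonid-specific duplication: -> fabp2aI, fabp2aII, fabp2bI, fabp2bII
after_salmonid_wgd = whole_genome_duplication(after_teleost_wgd, ("I", "II"))

# Loss of one b copy (arbitrary choice here; the abstract leaves it open).
remaining = lose(after_salmonid_wgd, "fabp2bII")

# The surviving b copy is referred to simply as fabp2b in the abstract.
present_day = ["fabp2b" if g.startswith("fabp2b") else g for g in remaining]
print(present_day)
```

The final list holds the three genes the abstract reports for Atlantic salmon: fabp2aI, fabp2aII and fabp2b.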
Genome Research | 2018
David Thybert; Maša Roller; Fabio C. P. Navarro; Ian T Fiddes; Ian Streeter; Christine Feig; David Martín-Gálvez; Mikhail Kolmogorov; Václav Janoušek; Wasiu Akanni; Bronwen Aken; Sarah Aldridge; Varshith Chakrapani; William Chow; Laura Clarke; Carla Cummins; Anthony G. Doran; Matthew Dunn; Leo Goodstadt; Kerstin Howe; Matthew Howell; Ambre Aurore Josselin; Robert C. Karn; Lilue Jingtao; Fergal Martin; Matthieu Muffato; Stefanie Nachtweide; Michael A. Quail; Cristina Sisu; Mario Stanke
Understanding the mechanisms driving lineage-specific evolution in both primates and rodents has been hindered by the lack of sister clades with a similar phylogenetic structure having high-quality genome assemblies. Here, we have created chromosome-level assemblies of the Mus caroli and Mus pahari genomes. Together with the Mus musculus and Rattus norvegicus genomes, this set of rodent genomes is similar in divergence times to the Hominidae (human-chimpanzee-gorilla-orangutan). By comparing the evolutionary dynamics between the Muridae and Hominidae, we identified punctate events of chromosome reshuffling that shaped the ancestral karyotype of Mus musculus and Mus caroli between 3 and 6 million yr ago, but that are absent in the Hominidae. Hominidae show between four- and sevenfold lower rates of nucleotide change and feature turnover in both neutral and functional sequences, suggesting an underlying coherence to the Muridae acceleration. Our system of matched, high-quality genome assemblies revealed how specific classes of repeats can play lineage-specific roles in related species. Recent LINE activity has remodeled protein-coding loci to a greater extent across the Muridae than the Hominidae, with functional consequences at the species level such as reproductive isolation. Furthermore, we charted a Muridae-specific retrotransposon expansion at unprecedented resolution, revealing how a single nucleotide mutation transformed a specific SINE element into an active CTCF binding site carrier specifically in Mus caroli, which resulted in thousands of novel, species-specific CTCF binding sites. Our results show that the comparison of matched phylogenetic sets of genomes will be an increasingly powerful strategy for understanding mammalian biology.