Charles A. Steward
Wellcome Trust Sanger Institute
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Charles A. Steward.
Genome Research | 2012
Jennifer Harrow; Adam Frankish; José Manuel Rodríguez González; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen Aken; Daniel Barrell; Amonida Zadissa; Stephen M. J. Searle; I. Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles A. Steward; Rachel A. Harte; Mike Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael L. Tress
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Nature | 2011
Thomas M. Keane; Leo Goodstadt; Petr Danecek; Michael A. White; Kim Wong; Binnaz Yalcin; Andreas Heger; Avigail Agam; Guy Slater; Martin Goodson; N A Furlotte; Eleazar Eskin; Christoffer Nellåker; H Whitley; James Cleak; Deborah Janowitz; Polinka Hernandez-Pliego; Andrew Edwards; T G Belgard; Peter L. Oliver; Rebecca E McIntyre; Amarjit Bhomra; Jérôme Nicod; Xiangchao Gan; Wei Yuan; L van der Weyden; Charles A. Steward; Sendu Bala; Jim Stalker; Richard Mott
We report genome sequences of 17 inbred strains of laboratory mice and identify almost ten times more variants than previously known. We use these genomes to explore the phylogenetic history of the laboratory mouse and to examine the functional consequences of allele-specific variation on transcript abundance, revealing that at least 12% of transcripts show a significant tissue-specific expression bias. By identifying candidate functional variants at 718 quantitative trait loci we show that the molecular nature of functional variants and their position relative to genes vary according to the effect size of the locus. These sequences provide a starting point for a new era in the functional analysis of a key model organism.
Genome Research | 2009
Kim D. Pruitt; Jennifer Harrow; Rachel A. Harte; Craig Wallin; Mark Diekhans; Donna Maglott; Steve Searle; Catherine M. Farrell; Jane Loveland; Barbara J. Ruef; Elizabeth Hart; Marie-Marthe Suner; Melissa J. Landrum; Bronwen Aken; Sarah Ayling; Robert Baertsch; Julio Fernandez-Banet; Joshua L. Cherry; Val Curwen; Michael DiCuccio; Manolis Kellis; Jennifer M. Lee; Michael F. Lin; Michael Schuster; Andrew Shkeda; Clara Amid; Garth Brown; Oksana Dukhanina; Adam Frankish; Jennifer Hart
Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates on manually reviewing inconsistent protein annotations between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes, while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically-annotated, protein-coding regions.
Nature | 2000
James C. Mullikin; Sarah Hunt; Charlotte G. Cole; Beverley Mortimore; Catherine M. Rice; J. Burton; L. H. Matthews; R. Pavitt; R. W. Plumb; Sarah Sims; R. M. R. Ainscough; J. Attwood; J. M. Bailey; K. Barlow; R. M. M. Bruskiewich; P. N. Butcher; Nigel P. Carter; Yuan Chen; C. M. Clee; Penny Coggill; J. Davies; Robert Davies; E. Dawson; M. D. Francis; A. A. Joy; R. G. Lamble; Cordelia Langford; J. Macarthy; V. Mall; A. Moreland
The human genome sequence will provide a reference for measuring DNA sequence variation in human populations. Sequence variants are responsible for the genetic component of individuality, including complex characteristics such as disease susceptibility and drug response. Most sequence variants are single nucleotide polymorphisms (SNPs), where two alternate bases occur at one position. Comparison of any two genomes reveals around 1 SNP per kilobase. A sufficiently dense map of SNPs would allow the detection of sequence variants responsible for particular characteristics on the basis that they are associated with a specific SNP allele. Here we have evaluated large-scale sequencing approaches to obtaining SNPs, and have constructed a map of 2,730 SNPs on human chromosome 22. Most of the SNPs are within 25 kilobases of a transcribed exon, and are valuable for association studies. We have scaled up the process, detecting over 65,000 SNPs in the genome as part of The SNP Consortium programme, which is on target to build a map of 1 SNP every 5 kilobases that is integrated with the human genome sequence and that is freely available in the public domain.
Nature | 2006
Michael C. Zody; Manuel Garber; David J. Adams; Ted Sharpe; Jennifer Harrow; James R. Lupski; Christine Nicholson; Steven M. Searle; Laurens Wilming; Sarah K. Young; Amr Abouelleil; Nicole R. Allen; Weimin Bi; Toby Bloom; Mark L. Borowsky; Boris Bugalter; Jonathan Butler; Jean L. Chang; Chao-Kung Chen; April Cook; Benjamin Corum; Christina A. Cuomo; Pieter J. de Jong; David DeCaprio; Ken Dewar; Michael Fitzgerald; James Gilbert; Richard Gibson; Sante Gnerre; Steven Goldstein
Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome. It is also enriched in segmental duplications, ranking third in density among the autosomes. Here we report a finished sequence for human chromosome 17, as well as a structural comparison with the finished sequence for mouse chromosome 11, the first finished mouse chromosome. Comparison of the orthologous regions reveals striking differences. In contrast to the typical pattern seen in mammalian evolution, the human sequence has undergone extensive intrachromosomal rearrangement, whereas the mouse sequence has been remarkably stable. Moreover, although the human sequence has a high density of segmental duplication, the mouse sequence has a very low density. Notably, these segmental duplications correspond closely to the sites of structural rearrangement, demonstrating a link between duplication and rearrangement. Examination of the main classes of duplicated segments provides insight into the dynamics underlying expansion of chromosome-specific, low-copy repeats in the human genome.
Nucleic Acids Research | 2014
Catherine M. Farrell; Nuala A. O’Leary; Rachel A. Harte; Jane Loveland; Laurens Wilming; Craig Wallin; Mark Diekhans; Daniel Barrell; Stephen M. J. Searle; Bronwen Aken; Susan M. Hiatt; Adam Frankish; Marie-Marthe Suner; Bhanu Rajput; Charles A. Steward; Garth Brown; Ruth Bennett; Michael R. Murphy; Wendy Wu; Mike Kay; Jennifer Hart; Jeena Rajan; Janet Weber; Catherine Snow; Lillian D. Riddick; Toby Hunt; David Webb; Mark G. Thomas; Pamela Tamez; Sanjida H. Rangwala
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
BMC Genomics | 2013
Harry Dawson; Jane Loveland; Géraldine Pascal; James Gilbert; Hirohide Uenishi; Katherine Mann; Yongming Sang; Jie Zhang; Denise R. Carvalho-Silva; Toby Hunt; Matthew Hardy; Zhi-Liang Hu; Shuhong Zhao; Anna Anselmo; Hiroki Shinkai; Celine Chen; Bouabid Badaoui; Daniel Berman; Clara Amid; Mike Kay; David Lloyd; Catherine Snow; Takeya Morozumi; Ryan Pei-Yen Cheng; Megan Bystrom; Ronan Kapetanovic; John C. Schwartz; Ranjit Singh Kataria; Matthew Astley; Eric Fritz
BackgroundThe domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.ResultsThe Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.ConclusionsThis extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.
Nucleic Acids Research | 2014
Jennifer Harrow; Charles A. Steward; Adam Frankish; James Gilbert; José Manuel Rodríguez González; Jane Loveland; Jonathan M. Mudge; Daniel Sheppard; Mark G. Thomas; Stephen J. Trevanion; Laurens Wilming
The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).
Journal of Immunology | 2010
Heather I. Fraser; Calliope A. Dendrou; Barry Healy; Daniel B. Rainbow; Sarah Howlett; Luc J. Smink; Simon G. Gregory; Charles A. Steward; John A. Todd; Laurence B. Peterson; Linda S. Wicker
We have used the public sequencing and annotation of the mouse genome to delimit the previously resolved type 1 diabetes (T1D) insulin-dependent diabetes (Idd)18 interval to a region on chromosome 3 that includes the immunologically relevant candidate gene, Vav3. To test the candidacy of Vav3, we developed a novel congenic strain that enabled the resolution of Idd18 to a 604-kb interval, designated Idd18.1, which contains only two annotated genes: the complete sequence of Vav3 and the last exon of the gene encoding NETRIN G1, Ntng1. Targeted sequencing of Idd18.1 in the NOD mouse strain revealed that allelic variation between NOD and C57BL/6J (B6) occurs in noncoding regions with 138 single nucleotide polymorphisms concentrated in the introns between exons 20 and 27 and immediately after the 3′ untranslated region. We observed differential expression of VAV3 RNA transcripts in thymocytes when comparing congenic mouse strains with B6 or NOD alleles at Idd18.1. The T1D protection associated with B6 alleles of Idd18.1/Vav3 requires the presence of B6 protective alleles at Idd3, which are correlated with increased IL-2 production and regulatory T cell function. In the absence of B6 protective alleles at Idd3, we detected a second T1D protective B6 locus, Idd18.3, which is closely linked to, but distinct from, Idd18.1. Therefore, genetic mapping, sequencing, and gene expression evidence indicate that alteration of VAV3 expression is an etiological factor in the development of autoimmune β-cell destruction in NOD mice. This study also demonstrates that a congenic strain mapping approach can isolate closely linked susceptibility genes.
Journal of Immunology | 2011
Daniel B. Rainbow; Carolyn Moule; Heather I. Fraser; Jan Clark; Sarah Howlett; Oliver Burren; Mikkel Christensen; Val Moody; Charles A. Steward; Javid P. Mohammed; Michael E. Fusakio; Emma L. Masteller; Erik B. Finger; J. P. Houchins; Dieter Naf; Frank Koentgen; William M. Ridgway; John A. Todd; Jeffrey A. Bluestone; Laurence B. Peterson; Jochen Mattner; Linda S. Wicker
We have previously proposed that sequence variation of the CD101 gene between NOD and C57BL/6 mice accounts for the protection from type 1 diabetes (T1D) provided by the insulin-dependent diabetes susceptibility region 10 (Idd10), a <1 Mb region on mouse chromosome 3. In this study, we provide further support for the hypothesis that Cd101 is Idd10 using haplotype and expression analyses of novel Idd10 congenic strains coupled to the development of a CD101 knockout mouse. Susceptibility to T1D was correlated with genotype-dependent CD101 expression on multiple cell subsets, including Foxp3+ regulatory CD4+ T cells, CD11c+ dendritic cells, and Gr1+ myeloid cells. The correlation of CD101 expression on immune cells from four independent Idd10 haplotypes with the development of T1D supports the identity of Cd101 as Idd10. Because CD101 has been associated with regulatory T and Ag presentation cell functions, our results provide a further link between immune regulation and susceptibility to T1D.