Mike Kay
Wellcome Trust Sanger Institute
Publication
Featured research published by Mike Kay.
Genome Research | 2012
Jennifer Harrow; Adam Frankish; José Manuel Rodríguez González; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen Aken; Daniel Barrell; Amonida Zadissa; Stephen M. J. Searle; I. Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles A. Steward; Rachel A. Harte; Mike Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael L. Tress
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Science | 2012
Daniel G. MacArthur; Suganthi Balasubramanian; Adam Frankish; Ni Huang; James A. Morris; Klaudia Walter; Luke Jostins; Lukas Habegger; Joseph K. Pickrell; Stephen B. Montgomery; Cornelis A. Albers; Zhengdong D. Zhang; Donald F. Conrad; Gerton Lunter; Hancheng Zheng; Qasim Ayub; Mark A. DePristo; Eric Banks; Min Hu; Robert E. Handsaker; Jeffrey A. Rosenfeld; Menachem Fromer; Mike Jin; Xinmeng Jasmine Mu; Ekta Khurana; Kai Ye; Mike Kay; Gary Saunders; Marie-Marthe Suner; Toby Hunt
Defective Gene Detective: Identifying genes that give rise to diseases is one of the major goals of sequencing human genomes. However, putative loss-of-function genes, which are often some of the first identified targets of genome and exome sequencing, have often turned out to be sequencing errors rather than true genetic variants. In order to identify the true scope of loss-of-function genes within the human genome, MacArthur et al. (p. 823; see the Perspective by Quintana-Murci) extensively validated the genomes from the 1000 Genomes Project, as well as an additional European individual, and found that the average person has about 100 true loss-of-function alleles of which approximately 20 have two copies within an individual. Because many known disease-causing genes were identified in “normal” individuals, the process of clinical sequencing needs to reassess how to identify likely causative alleles.
Validation of predicted nonfunctional alleles in the human genome affects the medical interpretation of genomic analyses.
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease–causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Nucleic Acids Research | 2014
Catherine M. Farrell; Nuala A. O’Leary; Rachel A. Harte; Jane Loveland; Laurens Wilming; Craig Wallin; Mark Diekhans; Daniel Barrell; Stephen M. J. Searle; Bronwen Aken; Susan M. Hiatt; Adam Frankish; Marie-Marthe Suner; Bhanu Rajput; Charles A. Steward; Garth Brown; Ruth Bennett; Michael R. Murphy; Wendy Wu; Mike Kay; Jennifer Hart; Jeena Rajan; Janet Weber; Catherine Snow; Lillian D. Riddick; Toby Hunt; David Webb; Mark G. Thomas; Pamela Tamez; Sanjida H. Rangwala
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
BMC Genomics | 2013
Harry Dawson; Jane Loveland; Géraldine Pascal; James Gilbert; Hirohide Uenishi; Katherine Mann; Yongming Sang; Jie Zhang; Denise R. Carvalho-Silva; Toby Hunt; Matthew Hardy; Zhi-Liang Hu; Shuhong Zhao; Anna Anselmo; Hiroki Shinkai; Celine Chen; Bouabid Badaoui; Daniel Berman; Clara Amid; Mike Kay; David Lloyd; Catherine Snow; Takeya Morozumi; Ryan Pei-Yen Cheng; Megan Bystrom; Ronan Kapetanovic; John C. Schwartz; Ranjit Singh Kataria; Matthew Astley; Eric Fritz
Background: The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.
Results: The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.
Conclusions: This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.
Genome Biology | 2010
Steve Searle; Adam Frankish; Alexandra Bignell; Bronwen Aken; Thomas Derrien; Mark Diekhans; Rachel A. Harte; C. Howald; Felix Kokocinski; Michael F. Lin; Michael L. Tress; M. Van Baren; I. Barnes; Toby Hunt; D. Carvalho-Silva; C. Davidson; Sarah Donaldson; James Gilbert; Mike Kay; David Lloyd; Jane Loveland; Jonathan M. Mudge; Catherine Snow; J. Vamathevan; Laurens Wilming; Michael R. Brent; Mark Gerstein; Roderic Guigó; Manolis Kellis; Alexandre Reymond
This article is part of the supplement: Beyond the Genome: The true gene count, human evolution and disease genomics, Boston, MA, USA. 11-13 October 2010.
Nucleic Acids Research | 2003
Neil Hall; Matthew Berriman; Nicola Lennard; Barbara Harris; Christiane Hertz-Fowler; Emmanuelle N. Bart‐Delabesse; Caroline S. Gerrard; Rebecca Atkin; Andrew Barron; Sharen Bowman; Sarah P. Bray‐Allen; Frédéric Bringaud; Louise Clark; Craig Corton; Ann Cronin; Robert Davies; Jonathon Doggett; Audrey Fraser; Eric Grüter; Sarah Hall; A. David Harper; Mike Kay; Vanessa Leech; Rebecca Mayes; Claire Price; Michael A. Quail; Ester Rabbinowitsch; Christopher Reitter; Kim Rutherford; Jürgen Sasse
F1000Research | 2014
Mike Kay; Jennifer Harrow
Nature Precedings | 2009
Alexandra Bignell; Adam Frankish; Bronwen Aken; Mark Diekhans; Felix Kokocinski; Mike Lin; Michael L. Tress; J. Van Baren; I. Barnes; Toby Hunt; D. Carvalho-Silva; C. Davidson; S. Donaldson; J. Gilbert; E. Hart; Mike Kay; R. Kinsella; D. Lloyd; J. Loveland; J. E. Mudge; C. Snow; J. Vamathevan; L. Wilming; Michael R. Brent; Mark Gerstein; Roderic Guigó; Rachel A. Harte; Manolis Kellis; Stephen M. J. Searle; Jennifer Harrow