Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Felix Kokocinski is active.

Publication


Featured researches published by Felix Kokocinski.


Nature | 2012

Landscape of transcription in human cells

Sarah Djebali; Carrie A. Davis; Angelika Merkel; Alexander Dobin; Timo Lassmann; Ali Mortazavi; Andrea Tanzer; Julien Lagarde; Wei Lin; Felix Schlesinger; Chenghai Xue; Georgi K. Marinov; Jainab Khatun; Brian A. Williams; Chris Zaleski; Joel Rozowsky; Maik Röder; Felix Kokocinski; Rehab F. Abdelhamid; Tyler Alioto; Igor Antoshechkin; Michael T. Baer; Nadav S. Bar; Philippe Batut; Kimberly Bell; Ian Bell; Sudipto Chakrabortty; Xian Chen; Jacqueline Chrast; Joao Curado

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Genome Research | 2012

GENCODE: The reference human genome annotation for The ENCODE Project

Jennifer Harrow; Adam Frankish; José Manuel Rodríguez González; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen Aken; Daniel Barrell; Amonida Zadissa; Stephen M. J. Searle; I. Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles A. Steward; Rachel A. Harte; Mike Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael L. Tress

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Nucleic Acids Research | 2010

Ensembl’s 10th year

Paul Flicek; Bronwen Aken; Benoit Ballester; Kathryn Beal; Eugene Bragin; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Julio Fernandez-Banet; Leo Gordon; Stefan Gräf; Syed Haider; Martin Hammond; Kerstin Howe; Andrew M. Jenkinson; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Gautier Koscielny; Eugene Kulesha; Daniel Lawson; Ian Longden; Tim Massingham; William M. McLaren

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.


Nature Methods | 2013

Assessment of transcript reconstruction methods for RNA-seq

Tamara Steijger; Josep F. Abril; Pär G. Engström; Felix Kokocinski; Tim Hubbard; Roderic Guigó; Jennifer Harrow; Paul Bertone

We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.


Database | 2016

The Ensembl gene annotation system.

Bronwen Aken; Sarah Ayling; Daniel Barrell; Laura Clarke; Valery Curwen; Susan Fairley; Julio Fernandez Banet; Konstantinos Billis; Carlos Giron; Thibaut Hourlier; Kevin L. Howe; Andreas Kähäri; Felix Kokocinski; Fergal Martin; Daniel N. Murphy; Rishi Nag; Magali Ruffier; Michael Schuster; Y. Amy Tang; Jan-Hinnerk Vogel; Simon White; Amonida Zadissa; Paul Flicek; Stephen M. J. Searle

The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html


European Journal of Human Genetics | 2011

The GENCODE exome: sequencing the complete human exome

Alison J. Coffey; Felix Kokocinski; Maria S Calafato; Carol Scott; Priit Palta; Eleanor Drury; Christopher J. Joyce; Emily M LeProust; Jen Harrow; Sarah Hunt; Anna-Elina Lehesjoki; Daniel J. Turner; Tim Hubbard; Aarno Palotie

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing.


Genome Research | 2012

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome

Cédric Howald; Andrea Tanzer; Jacqueline Chrast; Felix Kokocinski; Thomas Derrien; Nathalie Walters; José Manuel Rodríguez González; Adam Frankish; Bronwen Aken; Thibaut Hourlier; Jan-Hinnerk Vogel; Simon White; Stephen M. J. Searle; Jennifer Harrow; Tim Hubbard; Roderic Guigó; Alexandre Reymond

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient to screen gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail sampling of low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.


Genome Biology | 2010

The GENCODE human gene set

Steve Searle; Adam Frankish; Alexandra Bignell; Bronwen Aken; Thomas Derrien; Mark Diekhans; Rachel A. Harte; C. Howald; Felix Kokocinski; Michael F. Lin; Michael L. Tress; M. Van Baren; I. Barnes; Toby Hunt; D. Carvalho-Silva; C. Davidson; Sarah Donaldson; James Gilbert; Mike Kay; David Lloyd; Jane Loveland; Jonathan M. Mudge; Catherine Snow; J. Vamathevan; Laurens Wilming; Michael R. Brent; Mark Gerstein; Roderic Guigó; Manolis Kellis; Alexandre Reymond

This article is part of the supplement: Beyond the Genome: The true gene count, human evolution and disease genomics, Boston, MA, USA. 11-13 October 2010.


BMC Genomics | 2010

AnnoTrack - a tracking system for genome annotation

Felix Kokocinski; Jennifer Harrow; Tim Hubbard

BackgroundAs genome sequences are determined for increasing numbers of model organisms, demand has grown for better tools to facilitate unified genome annotation efforts by communities of biologists. Typically this process involves numerous experts from the field and the use of data from dispersed sources as evidence. This kind of collaborative annotation project requires specialized software solutions for efficient data tracking and processing.ResultsAs part of the scale-up phase of the ENCODE project (Encyclopedia of DNA Elements), the aim of the GENCODE project is to produce a highly accurate evidence-based reference gene annotation for the human genome. The AnnoTrack software system was developed to aid this effort. It integrates data from multiple distributed sources, highlights conflicts and facilitates the quick identification, prioritisation and resolution of problems during the process of genome annotation.ConclusionsAnnoTrack has been in use for the last year and has proven a very valuable tool for large-scale genome annotation. Designed to interface with standard bioinformatics components, such as DAS servers and Ensembl databases, it is easy to setup and configure for different genome projects. The source code is available at http://annotrack.sanger.ac.uk.


Archive | 2010

AnnoTrack - a tracking system for genome

Felix Kokocinski; Jennifer Harrow; Tim Hubbard

Collaboration


Dive into the Felix Kokocinski's collaboration.

Top Co-Authors

Avatar

Bronwen Aken

European Bioinformatics Institute

View shared research outputs
Top Co-Authors

Avatar

Jennifer Harrow

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Adam Frankish

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar

Stephen M. J. Searle

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Alexandra Bignell

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar

I. Barnes

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar

Mike Kay

Wellcome Trust Sanger Institute

View shared research outputs
Top Co-Authors

Avatar

Toby Hunt

Wellcome Trust Sanger Institute

View shared research outputs
Researchain Logo
Decentralizing Knowledge