Raymund Stefancsik
University of Cambridge
Publication
Featured research published by Raymund Stefancsik.
Nucleic Acids Research | 2017
Simon A. Forbes; David Beare; Harry Boutselakis; Sally Bamford; Nidhi Bindal; John G. Tate; Charlotte G. Cole; Sari Ward; Elisabeth Dawson; Laura Ponting; Raymund Stefancsik; Bhavana Harsha; Chai Yin Kok; Mingming Jia; Harry C. Jubb; Zbyslaw Sondka; Sam Thompson; Tisham De; Peter J. Campbell
COSMIC, the Catalogue of Somatic Mutations in Cancer (http://cancer.sanger.ac.uk) is a high-resolution resource for exploring targets and trends in the genetics of human cancer. Currently the broadest database of mutations in cancer, the information in COSMIC is curated by expert scientists, primarily by scrutinizing large numbers of scientific publications. Over 4 million coding mutations are described in v78 (September 2016), combining genome-wide sequencing results from 28 366 tumours with complete manual curation of 23 489 individual publications focused on 186 key genes and 286 key fusion pairs across all cancers. Molecular profiling of large tumour numbers has also allowed the annotation of more than 13 million non-coding mutations, 18 029 gene fusions, 187 429 genome rearrangements, 1 271 436 abnormal copy number segments, 9 175 462 abnormal expression variants and 7 879 142 differentially methylated CpG dinucleotides. COSMIC now details the genetics of drug resistance, novel somatic gene mutations which allow a tumour to evade therapeutic cancer drugs. Focusing initially on highly characterized drugs and genes, COSMIC v78 contains wide resistance mutation profiles across 20 drugs, detailing the recurrence of 301 unique resistance alleles across 1934 drug-resistant tumours. All information from the COSMIC database is available freely on the COSMIC website.
Nucleic Acids Research | 2014
Susan E. St. Pierre; Laura Ponting; Raymund Stefancsik; Peter McQuilton
FlyBase (http://flybase.org) is the leading website and database of Drosophila genes and genomes. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you successfully find the information you are looking for. Here, we demonstrate some of our more advanced searching systems and highlight some of our new tools for searching the wealth of data on FlyBase. The first section explores gene function in FlyBase, using our TermLink tool to search with Controlled Vocabulary terms and our new RNA-Seq Search tool to search gene expression. The second section of this article describes a few ways to search genomic data in FlyBase, using our BLAST server and the new implementation of GBrowse 2, as well as our new FeatureMapper tool. Finally, we move on to discuss our most powerful search tool, QueryBuilder, before describing pre-computed cuts of the data and how to query the database programmatically.
Database | 2016
Qinghua Wang; Shabbir Syed Abdul; Lara Monteiro Almeida; Sophia Ananiadou; Yalbi Itzel Balderas-Martínez; Riza Theresa Batista-Navarro; David Campos; Lucy Chilton; Hui-Jou Chou; Gabriela Contreras; Laurel Cooper; Hong-Jie Dai; Barbra Ferrell; Juliane Fluck; Socorro Gama-Castro; Nancy George; Georgios V. Gkoutos; Afroza Khanam Irin; Lars Juhl Jensen; Silvia Jimenez; Toni Rose Jue; Ingrid M. Keseler; Sumit Madan; Sérgio Matos; Peter McQuilton; Marija Milacic; Matthew Mort; Jeyakumar Natarajan; Evangelos Pafilis; Emiliano Pereira
Fully automated text mining (TM) systems promote efficient literature searching, retrieval, and review but are not sufficient to produce ready-to-consume curated documents. These systems are not meant to replace biocurators, but instead to assist them in one or more literature curation steps. To do so, the user interface is an important aspect that needs to be considered for tool adoption. The BioCreative Interactive task (IAT) is a track designed for exploring user-system interactions, promoting development of useful TM tools, and providing a communication channel between the biocuration and the TM communities. In BioCreative V, the IAT track followed a format similar to previous interactive tracks, where the utility and usability of TM tools, as well as the generation of use cases, have been the focal points. The proposed curation tasks are user-centric and formally evaluated by biocurators. In BioCreative V IAT, seven TM systems and 43 biocurators participated. Two levels of user participation were offered to broaden curator involvement and obtain more feedback on usability aspects. The full level participation involved training on the system, curation of a set of documents with and without TM assistance, tracking of time-on-task, and completion of a user survey. The partial level participation was designed to focus on usability aspects of the interface and not the performance per se. In this case, biocurators navigated the system by performing pre-designed tasks and then were asked whether they were able to achieve the task and the level of difficulty in completing the task. In this manuscript, we describe the development of the interactive task, from planning to execution and discuss major findings for the systems tested. Database URL: http://www.biocreative.org
Current Protocols in Human Genetics | 2016
Simon A. Forbes; David Beare; Nidhi Bindal; Sally Bamford; Sari Ward; Charlotte G. Cole; Mingming Jia; Chai Yin Kok; Harry Boutselakis; Tisham De; Zbyslaw Sondka; Laura Ponting; Raymund Stefancsik; Bhavana Harsha; John G. Tate; Elisabeth Dawson; Sam Thompson; Harry C. Jubb; Peter J. Campbell
COSMIC (http://cancer.sanger.ac.uk) is an expert‐curated database of somatic mutations in human cancer. Broad and comprehensive in scope, recent releases in 2016 describe over 4 million coding mutations across all human cancer disease types. Mutations are annotated across the entire genome, but expert curation is focused on over 400 key cancer genes. Now encompassing the majority of molecular mutation mechanisms in oncogenetics, COSMIC additionally describes 10 million non‐coding mutations, 1 million copy‐number aberrations, 9 million gene‐expression variants, and almost 8 million differentially methylated CpGs. This information combines a consistent interpretation of the data from the major cancer genome consortia and cancer genome literature with exhaustive hand curation of over 22,000 gene‐specific literature publications. This unit describes the graphical Web site in detail; alternative protocols overview other ways the entire database can be accessed, analyzed, and downloaded.
Virus Research | 2010
Győző L. Kaján; Raymund Stefancsik; Krisztina Ursu; Vilmos Palya; Mária Benkő
The complete genome sequence of an adenovirus, isolated from turkey and proposed to be turkey adenovirus type 1 (TAdV-1), was determined to extend our knowledge about the genome organisation and phylogeny of aviadenoviruses. The longest adenovirus genome, consisting of 45,412 bp, with the highest G+C content (of 67.55%) known to date, was found. The central part of the TAdV-1 genome has the conserved gene set and arrangement that are characteristic for every other adenovirus analysed to date. This genome core is flanked by the terminal early regions 1 and 4 (E1 and E4). Aviadenovirus-specific genus-common genes were found in these regions, each containing nine such open reading frames (ORFs). Additionally a type-specific novel ORF, designated as ORF50, was found in E4. Phylogenetic analysis as well as the presence of the genus-specific genes, splice sites and protease cleavage sites confirmed the classification of TAdV-1 in the genus Aviadenovirus. Intrageneric analyses of two genus-specific genes demonstrated the distinctness of TAdV-1 from other aviadenoviruses, thus supporting the proposal for the establishment of a new species, Turkey adenovirus B for TAdV-1.
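The G+C content quoted above is a simple base-composition ratio. As a minimal sketch of how such a figure is computed (the function name `gc_content` and the toy fragment are illustrative assumptions, not the authors' code or TAdV-1 sequence):

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a nucleotide sequence."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Made-up 12-base fragment for illustration only; it is not TAdV-1 sequence.
toy_fragment = "GGCCGCATGCGC"
print(f"{gc_content(toy_fragment):.2f}% G+C")
```

Applied to a full genome sequence read from a FASTA file, the same ratio would give a genome-wide value comparable to the one reported in the abstract.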
Comparative and Functional Genomics | 2003
Raymund Stefancsik; Jeffrey D. Randall; Chengjian Mao; Satyapriya Sarkar
We describe the cloning, sequencing and structure of the human fast skeletal troponin T (TNNT3) gene located on chromosome 11p15.5. The single-copy gene encodes 19 exons and 18 introns. Eleven of these exons, 1–3, 9–15 and 18, are constitutively spliced, whereas exons 4–8 are alternatively spliced. The gene contains an additional subset of developmentally regulated and alternatively spliced exons, including a foetal exon located between exon 8 and 9 and exon 16 or α (adult) and 17 or β (foetal and neonatal). Exon phasing suggests that the majority of the alternatively spliced exons located at the 5′ end of the gene may have evolved as a result of exon shuffling, because they are of the same phase class. In contrast, the 3′ exons encoding an evolutionarily conserved heptad repeat domain, shared by both TnT and troponin I (TnI), may be remnants of an ancient ancestral gene. The sequence of the 5′ flanking region shows that the putative promoter contains motifs including binding sites for MyoD, MEF-2 and several transcription factors which may play a role in transcriptional regulation and tissue-specific expression of TnT. The coding region of TNNT3 exhibits strong similarity to the corresponding rat sequence. However, unlike the rat TnT gene, TNNT3 possesses two repeat regions of CCA and TC. The exclusive presence of these repetitive elements in the human gene indicates divergence in the evolutionary dynamics of mammalian TnT genes. Homologous muscle-specific splicing enhancer motifs are present in the introns upstream and downstream of the foetal exon, and may play a role in the developmental pattern of alternative splicing of the gene. The genomic correlates of TNNT3 are relevant to our understanding of the evolution and regulation of expression of the gene, as well as the structure and function of the protein isoforms. The nucleotide sequence of TNNT3 has been submitted to EMBL/GenBank under Accession No. AF026276.
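The exon phasing argument above rests on a standard convention: the phase of an intron is the cumulative coding length upstream of it, modulo 3, and an exon flanked by introns of the same phase can be included or skipped without shifting the reading frame. The sketch below illustrates that convention only; the function names and the exon lengths are hypothetical and are not the TNNT3 values.

```python
from itertools import accumulate


def intron_phases(exon_coding_lengths):
    """Phase of the intron following each exon: cumulative coding length modulo 3."""
    return [total % 3 for total in accumulate(exon_coding_lengths)]


def exon_phase_classes(exon_coding_lengths):
    """Return (upstream_phase, downstream_phase) for each exon.

    The first exon is assumed to begin at the start codon, so its upstream phase is 0.
    """
    downstream = intron_phases(exon_coding_lengths)
    upstream = [0] + downstream[:-1]
    return list(zip(upstream, downstream))


# Hypothetical coding lengths in nucleotides, chosen for illustration only.
lengths = [40, 15, 18, 12, 50]
for number, (up, down) in enumerate(exon_phase_classes(lengths), start=1):
    label = "symmetric" if up == down else "asymmetric"
    print(f"exon {number}: class {up}-{down} ({label})")
```

In this toy example the three internal exons all fall in class 1-1, the kind of shared phase class that the abstract cites as consistent with exon shuffling.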
Journal of Biomedical Semantics | 2013
David Osumi-Sutherland; Steven J. Marygold; Gillian Millburn; Peter McQuilton; Laura Ponting; Raymund Stefancsik; Kathleen Falls; Nicholas H. Brown; Georgios V. Gkoutos
Background: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. Results: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. Conclusions: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
DNA Sequence | 2003
Raymund Stefancsik; Satyapriya Sarkar
Transcription factors of the SMAD family relay signals from cell surface receptors to the nucleus in response to TGF-β related soluble factors. Members of the nuclear factor I/CAAT box binding family (NFI/CTF) have been implicated as regulators of diverse biological processes such as adenovirus replication and transcription of TGF-β responsive genes. There are highly conserved DNA binding domains in SMAD and NFI/CTF transcription factors that allow sequence specific DNA binding for members of each family. However, no homology relationship has been established for the DNA binding domains present in these families. For a better understanding of the structure and evolution of SMAD genes, we carried out a sensitive PSI-BLAST database search. This revealed significant similarities between the DNA binding domains of SMADs and NFI/CTF transcription factors. Enhanced graphic matrix analysis and multiple sequence alignment of the amino acid sequences of the SMAD and NFI/CTF DNA binding domains also show that these two classes of domains share considerable structural similarity. These results strongly suggest that these two classes of factors share a homologous DNA binding domain presumably resulting from a common ancestry. In contrast, the C-terminal transcription modulation domains of both SMAD and NFI/CTF families do not show any sequence similarity. Based on the structural relationship of their DNA binding domains, we propose that the SMAD and NFI/CTF transcription factors belong to a new superfamily of genes.
BMC Bioinformatics | 2011
Arun Rangarajan; Tim Schedl; Karen Yook; Juancarlos Chan; Stephen Haenel; Lolly Otis; Sharon Faelten; Tracey DePellegrin-Connelly; Ruth Isaacson; Marek S. Skrzypek; Steven J. Marygold; Raymund Stefancsik; J. Michael Cherry; Paul W. Sternberg; Hans-Michael Müller
Background: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture. Results: We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases. Conclusions: Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.
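The core idea of the pipeline described above, a lexicon lookup that links unambiguous terms automatically and routes ambiguous ones to manual QC, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the lexicon entries and the identifiers are all hypothetical placeholders, and matching is limited to single whole tokens.

```python
import re
from collections import defaultdict


def build_lexicon(entries):
    """Map each lower-cased term to the set of database identifiers that use it."""
    lexicon = defaultdict(set)
    for term, identifier in entries:
        lexicon[term.lower()].add(identifier)
    return lexicon


def mark_up(text, lexicon):
    """Split lexicon matches into automatic links and ambiguous terms for manual QC."""
    links, needs_review = [], []
    for match in re.finditer(r"[A-Za-z0-9-]+", text):
        ids = lexicon.get(match.group().lower())
        if not ids:
            continue  # token is not in the lexicon
        if len(ids) == 1:
            # Exactly one candidate: safe to link automatically.
            links.append((match.group(), match.start(), next(iter(ids))))
        else:
            # Several candidates share this term: leave the decision to a curator.
            needs_review.append((match.group(), match.start(), sorted(ids)))
    return links, needs_review


# Hypothetical lexicon entries; the identifiers are placeholders, not real database IDs.
entries = [
    ("lin-12", "GENE:0001"),
    ("glp-1", "GENE:0002"),
    ("dpy", "GENE:0003"),
    ("dpy", "PHENOTYPE:0001"),
]
lexicon = build_lexicon(entries)
links, needs_review = mark_up(
    "Mutations in lin-12 and glp-1 enhance the Dpy phenotype.", lexicon
)
print("auto-linked:", links)
print("for curator:", needs_review)
```

Keeping the ambiguous matches in a separate list mirrors the manual quality control step that the abstract identifies as the pipeline's distinguishing feature.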
Cancer Research | 2017
Zbyslaw Sondka; Sally Bamford; Charlotte G. Cole; Elisabeth Dawson; Laura Ponting; Raymund Stefancsik; Sari Ward; John G. Tate; Peter J. Campbell; Simon A. Forbes
The Cancer Gene Census is an ongoing effort to catalogue genes for which somatic mutations have been causally implicated in cancer. The Census comprises manually curated summaries of the most relevant information for cancer-driving genes and their somatic mutations and brings together the expertise of a dedicated curation team, cancer scientists and the comprehensive resources of the COSMIC database. Current research focuses on characterising the participation of 609 census genes in hallmarks of cancer and identification of additional genes involved in these biological traits primarily via altered expression, CNA or epigenetic changes. New overviews of cancer gene function focused on hallmarks of cancer pull together manually curated information on the function of proteins coded by cancer genes and summarise the data in simple graphical form. Each overview presents a condensed summary of the most relevant facts with quick access to the literature source, aiming to provide summary characteristics of a cancer gene, rather than a full monograph, to avoid information overload. This functional characterisation enables the creation of lists of genes of interest focused on the particular role they play in the development of cancer, as well as aiming to identify the cellular functions affected by mutations in particular tumours, and helps to choose the right targets for targeted therapy or synthetic lethality experiments. The Census is available from the COSMIC website for online use or download at: http://cancer.sanger.ac.uk/census. Citation Format: Zbyslaw Sondka, Sally Bamford, Charlotte G. Cole, Elisabeth Dawson, Laura Ponting, Raymund Stefancsik, Sari A. Ward, John Tate, Peter J. Campbell, Simon A. Forbes. COSMIC Cancer Gene Census: expert descriptions across genes in oncogenesis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 2599. doi:10.1158/1538-7445.AM2017-2599