Publication


Featured research published by Steven J. Marygold.


Nucleic Acids Research | 2009

FlyBase: enhancing Drosophila Gene Ontology annotations

Susan Tweedie; Michael Ashburner; Kathleen Falls; Paul Leyland; Peter McQuilton; Steven J. Marygold; Gillian Millburn; David Osumi-Sutherland; Andrew Schroeder; Ruth Seal; Haiyan Zhang

FlyBase (http://flybase.org) is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project, a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.


Nucleic Acids Research | 2016

FlyBase: establishing a Gene Group resource for Drosophila melanogaster

Helen Attrill; Kathleen Falls; Joshua L. Goodman; Gillian Millburn; Giulia Antonazzo; Alix J. Rey; Steven J. Marygold

Many publications describe sets of genes or gene products that share a common biology. For example, genome-wide studies and phylogenetic analyses identify genes related in sequence; high-throughput genetic and molecular screens reveal functionally related gene products; and advanced proteomic methods can determine the subunit composition of multi-protein complexes. It is useful for such gene collections to be presented as discrete lists within the appropriate Model Organism Database (MOD) so that researchers can readily access these data alongside other relevant information. To this end, FlyBase (flybase.org), the MOD for Drosophila melanogaster, has established a ‘Gene Group’ resource: high-quality sets of genes derived from the published literature and organized into individual report pages. To facilitate further analyses, Gene Group Reports also include convenient download and analysis options, together with links to equivalent gene groups at other databases. This new resource will enable researchers with diverse backgrounds and interests to easily view and analyse acknowledged D. melanogaster gene sets and compare them with those of other species.


Nucleic Acids Research | 2017

FlyBase at 25: looking to the future

L. Sian Gramates; Steven J. Marygold; Gilberto dos Santos; Jose-Maria Urbano; Giulia Antonazzo; Beverley B. Matthews; Alix J. Rey; Christopher J. Tabone; Madeline A. Crosby; David B. Emmert; Kathleen Falls; Joshua L. Goodman; Yanhui Hu; Laura Ponting; Andrew J. Schroeder; Victor B. Strelets; Jim Thurmond; Pinglei Zhou

Since 1992, FlyBase (flybase.org) has been an essential online resource for the Drosophila research community. Concentrating on the most extensively studied species, Drosophila melanogaster, FlyBase includes information on genes (molecular and genetic), transgenic constructs, phenotypes, genetic and physical interactions, and reagents such as stocks and cDNAs. Access to data is provided through a number of tools, reports, and bulk-data downloads. Looking to the future, FlyBase is expanding its focus to serve a broader scientific community. In this update, we describe new features, datasets, reagent collections, and data presentations that address this goal, including enhanced orthology data, Human Disease Model Reports, protein domain search and visualization, concise gene summaries, a portal for external resources, video tutorials and the FlyBase Community Advisory Group.


BMC Bioinformatics | 2012

Automatic categorization of diverse experimental information in the bioscience literature

Ruihua Fang; Gary Schindelman; Kimberly Van Auken; Jolene S. Fernandes; Wen Chen; Xiaodong Wang; Paul Davis; Mary Ann Tuli; Steven J. Marygold; Gillian Millburn; Beverley B. Matthews; Haiyan Zhang; Nicholas H. Brown; William M. Gelbart; Paul W. Sternberg

Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify from all published literature the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest and is thus usually time-consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature such as C. elegans and D. melanogaster to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.
Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genomics Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.
Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.


Database | 2012

Directly e-mailing authors of newly published papers encourages community curation

Stephanie Bunt; Gary B. Grumbling; Helen I. Field; Steven J. Marygold; Nicholas H. Brown; Gillian Millburn

Much of the data within Model Organism Databases (MODs) comes from manual curation of the primary research literature. Given limited funding and an increasing density of published material, a significant challenge facing all MODs is how to efficiently and effectively prioritize the most relevant research papers for detailed curation. Here, we report recent improvements to the triaging process used by FlyBase. We describe an automated method to directly e-mail corresponding authors of new papers, requesting that they list the genes studied and indicate (‘flag’) the types of data described in the paper using an online tool. Based on the author-assigned flags, papers are then prioritized for detailed curation and channelled to appropriate curator teams for full data extraction. The overall response rate has been 44% and the flagging of data types by authors is sufficiently accurate for effective prioritization of papers. In summary, we have established a sustainable community curation program, with the result that FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction. Database URL: http://flybase.org/


Journal of Biomedical Semantics | 2013

The Drosophila phenotype ontology

David Osumi-Sutherland; Steven J. Marygold; Gillian Millburn; Peter McQuilton; Laura Ponting; Raymund Stefancsik; Kathleen Falls; Nicholas H. Brown; Georgios V. Gkoutos

Background: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions.
Results: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable.
Conclusions: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.


Methods in Molecular Biology | 2016

Using FlyBase, a Database of Drosophila Genes and Genomes

Steven J. Marygold; Madeline A. Crosby; Joshua L. Goodman

For nearly 25 years, FlyBase (flybase.org) has provided a freely available online database of biological information about Drosophila species, focusing on the model organism D. melanogaster. The need for a centralized, integrated view of Drosophila research has never been greater as advances in genomic, proteomic, and high-throughput technologies add to the quantity and diversity of available data and resources. FlyBase has taken several approaches to respond to these changes in the research landscape. Novel report pages have been generated for new reagent types and physical interaction data; Drosophila models of human disease are now represented and showcased in dedicated Human Disease Model Reports; other integrated reports have been established that bring together related genes, datasets, or reagents; Gene Reports have been revised to improve access to new data types and to highlight functional data; links to external sites have been organized and expanded; and new tools have been developed to display and interrogate all these data, including improved batch processing and bulk file availability. In addition, several new community initiatives have served to enhance interactions between researchers and FlyBase, resulting in direct user contributions and improved feedback. This chapter provides an overview of the data content, organization, and available tools within FlyBase, focusing on recent improvements. We hope it serves as a guide for our diverse user base, enabling efficient and effective exploration of the database and thereby accelerating research discoveries.


Database | 2014

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles

Juan Miguel Cejuela; Peter McQuilton; Laura Ponting; Steven J. Marygold; Raymund Stefancsik; Gillian Millburn; Burkhard Rost

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the ‘tagtog’ system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation. Database URL: www.tagtog.net, www.flybase.org


BMC Bioinformatics | 2011

Toward an interactive article: integrating journals and biological databases

Arun Rangarajan; Tim Schedl; Karen Yook; Juancarlos Chan; Stephen Haenel; Lolly Otis; Sharon Faelten; Tracey DePellegrin-Connelly; Ruth Isaacson; Marek S. Skrzypek; Steven J. Marygold; Raymund Stefancsik; J. Michael Cherry; Paul W. Sternberg; Hans-Michael Müller

Background: Journal articles and databases are two major modes of communication in the biological sciences, and thus integrating these critical resources is of urgent importance to increase the pace of discovery. Projects focused on bridging the gap between journals and databases have been on the rise over the last five years and have resulted in the development of automated tools that can recognize entities within a document and link those entities to a relevant database. Unfortunately, automated tools cannot resolve ambiguities that arise from one term being used to signify entities that are quite distinct from one another. Instead, resolving these ambiguities requires some manual oversight. Finding the right balance between the speed and portability of automation and the accuracy and flexibility of manual effort is a crucial goal to making text markup a successful venture.
Results: We have established a journal article mark-up pipeline that links GENETICS journal articles and the model organism database (MOD) WormBase. This pipeline uses a lexicon built with entities from the database as a first step. The entity markup pipeline results in links from over nine classes of objects including genes, proteins, alleles, phenotypes and anatomical terms. New entities and ambiguities are discovered and resolved by a database curator through a manual quality control (QC) step, along with help from authors via a web form that is provided to them by the journal. New entities discovered through this pipeline are immediately sent to an appropriate curator at the database. Ambiguous entities that do not automatically resolve to one link are resolved by hand ensuring an accurate link. This pipeline has been extended to other databases, namely Saccharomyces Genome Database (SGD) and FlyBase, and has been implemented in marking up a paper with links to multiple databases.
Conclusions: Our semi-automated pipeline hyperlinks articles published in GENETICS to model organism databases such as WormBase. Our pipeline results in interactive articles that are data rich with high accuracy. The use of a manual quality control step sets this pipeline apart from other hyperlinking tools and results in benefits to authors, journals, readers and databases.


Fly | 2015

The Aminoacyl-tRNA Synthetases of Drosophila melanogaster

Jiongming Lu; Steven J. Marygold; Walid H. Gharib; Beat Suter

Aminoacyl-tRNA synthetases (aaRSs) ligate amino acids to their cognate tRNAs, allowing them to decode the triplet code during translation. Through different mechanisms aaRSs also perform several non-canonical functions in transcription, translation, apoptosis, angiogenesis and inflammation. Drosophila has become a preferred system to model human diseases caused by mutations in aaRS genes, to dissect effects of reduced translation or non-canonical activities, and to study aminoacylation and translational fidelity. However, the lack of a systematic annotation of this gene family has hampered such studies. Here, we report the identification of the entire set of aaRS genes in the fly genome and we predict their roles based on experimental evidence and/or orthology. Further, we propose a new, systematic and logical nomenclature for aaRSs. We also review the research conducted on Drosophila aaRSs to date. Together, our work provides the foundation for further research in the fly aaRS field.

Collaboration


Dive into Steven J. Marygold's collaborations.

Top Co-Authors

Joshua L. Goodman

Indiana University Bloomington

Alix J. Rey

University of Cambridge
