Sergio Contrino
University of Cambridge
Publication
Featured research published by Sergio Contrino.
Bioinformatics | 2012
Richard N. Smith; Jelena Aleksic; Daniela Butano; Adrian Carr; Sergio Contrino; Fengyuan Hu; Mike Lyne; Rachel Lyne; Alex Kalderimis; Kim Rutherford; Radek Stepan; Julie Sullivan; Matthew Wakeling; Xavier Watkins; Gos Micklem
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages. Availability: Freely available from http://www.intermine.org under the LGPL license. Supplementary information: Supplementary data are available at Bioinformatics online.
Nucleic Acids Research | 2012
Sergio Contrino; Richard N. Smith; Daniela Butano; Adrian Carr; Fengyuan Hu; Rachel Lyne; Kim Rutherford; Alexis Kalderimis; Julie Sullivan; Seth Carbon; E. Kephart; P. Lloyd; Eo Stinson; Nicole L. Washington; M. Perry; P. Ruzanov; Z. Zha; Suzanna E. Lewis; Lincoln Stein; Gos Micklem
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
Nucleic Acids Research | 2015
Vivek Krishnakumar; Matthew R. Hanlon; Sergio Contrino; Erik S. Ferlanti; Svetlana Karamycheva; Maria Kim; Benjamin D. Rosen; Chia Yi Cheng; Walter Moreira; Stephen A. Mock; Joe Stubbs; Julie Sullivan; Konstantinos Krampis; Jason R. Miller; Gos Micklem; Matthew W. Vaughn; Christopher D. Town
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release ‘modules’ that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts ‘science apps,’ developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.
Nucleic Acids Research | 2014
Alex Kalderimis; Rachel Lyne; Daniela Butano; Sergio Contrino; Mike Lyne; Joshua Heimbach; Fengyuan Hu; Richard N. Smith; Radek Štěpán; Julie Sullivan; Gos Micklem
InterMine (www.intermine.org) is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.
Database | 2011
Nicole L. Washington; Eo Stinson; M. Perry; P. Ruzanov; Sergio Contrino; Richard N. Smith; Z. Zha; Rachel Lyne; Adrian Carr; P. Lloyd; E. Kephart; Sheldon J. McKay; Gos Micklem; Lincoln Stein; Suzanna E. Lewis
The model organism Encyclopedia of DNA Elements (modENCODE) project is a National Human Genome Research Institute (NHGRI) initiative designed to characterize the genomes of Drosophila melanogaster and Caenorhabditis elegans. A Data Coordination Center (DCC) was created to collect, store and catalog modENCODE data. An effective DCC must gather, organize and provide all primary, interpreted and analyzed data, and ensure the community is supplied with the knowledge of the experimental conditions, protocols and verification checks used to generate each primary data set. We present here the design principles of the modENCODE DCC, and describe the ramifications of collecting thorough and deep metadata for describing experiments, including the use of a wiki for capturing protocol and reagent information, and the BIR-TAB specification for linking biological samples to experimental results. modENCODE data can be found at http://www.modencode.org. Database URL: http://www.modencode.org.
Genesis | 2015
Rachel Lyne; Julie Sullivan; Daniela Butano; Sergio Contrino; Joshua Heimbach; Fengyuan Hu; Alex Kalderimis; Mike Lyne; Richard N. Smith; Radek Štěpán; Rama Balakrishnan; Gail Binkley; Todd W. Harris; Kalpana Karra; Sierra A. T. Moxon; Howie Motenko; Steven B. Neuhauser; Leyla Ruzicka; Mike Cherry; Joel E. Richardson; Lincoln Stein; Monte Westerfield; Elizabeth A. Worthey; Gos Micklem
InterMine is a data integration warehouse and analysis software system developed for large and complex biological data sets. Designed for integrative analysis, it can be accessed through a user‐friendly web interface. For bioinformaticians, extensive web services as well as programming interfaces for most common scripting languages support access to all features. The web interface includes a useful identifier look‐up system, and both simple and sophisticated search options. Interactive results tables enable exploration, and data can be filtered, summarized, and browsed. A set of graphical analysis tools provide a rich environment for data exploration including statistical enrichment of sets of genes or other entities. InterMine databases have been developed for the major model organisms, budding yeast, nematode worm, fruit fly, zebrafish, mouse, and rat together with a newly developed human database. Here, we describe how this has facilitated interoperation and development of cross‐organism analysis tools and reports. InterMine as a data exploration and analysis tool is also described. All the InterMine‐based systems described in this article are resources freely available to the scientific community. genesis 53:547–560, 2015.
BMC Genomics | 2013
Quang M. Trinh; Fei-Yang Arthur Jen; Ziru Zhou; Kar Ming Chu; M. Perry; E. Kephart; Sergio Contrino; P. Ruzanov; Lincoln Stein
Background: Funded by the National Institutes of Health (NIH), the aim of the Model Organism ENCyclopedia of DNA Elements (modENCODE) project is to provide the biological research community with a comprehensive encyclopedia of functional genomic elements for both model organisms C. elegans (worm) and D. melanogaster (fly). With a total size of just under 10 terabytes of data collected and released to the public, one of the challenges faced by researchers is to extract biologically meaningful knowledge from this large data set. While the basic quality control, pre-processing, and analysis of the data has already been performed by members of the modENCODE consortium, many researchers will wish to reinterpret the data set using modifications and enhancements of the original protocols, or combine modENCODE data with other data sets. Unfortunately this can be a time consuming and logistically challenging proposition. Results: In recognition of this challenge, the modENCODE DCC has released uniform computing resources for analyzing modENCODE data on Galaxy (https://github.com/modENCODE-DCC/Galaxy), on the public Amazon Cloud (http://aws.amazon.com), and on the private Bionimbus Cloud for genomic research (http://www.bionimbus.org). In particular, we have released Galaxy workflows for interpreting ChIP-seq data which use the same quality control (QC) and peak calling standards adopted by the modENCODE and ENCODE communities. For convenience of use, we have created Amazon and Bionimbus Cloud machine images containing Galaxy along with all the modENCODE data, software and other dependencies. Conclusions: Using these resources provides a framework for running consistent and reproducible analyses on modENCODE data, ultimately allowing researchers to use more of their time using modENCODE data, and less time moving it around.
Plant and Cell Physiology | 2016
Vivek Krishnakumar; Sergio Contrino; Chia-Yi Cheng; Irina Belyaeva; Erik S. Ferlanti; Jason R. Miller; Matthew W. Vaughn; Gos Micklem; Christopher D. Town; Agnes P. Chan
ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled.
F1000Research | 2017
Yo Yehudi; Daniela Butano; Matthew Chadwick; Justin Clark-Casey; Sergio Contrino; Joshua Heimbach; Rachel Lyne; Julie Sullivan; Gos Micklem
SWAT4LS | 2016
Maxime Déraspe; Gail Binkley; Daniela Butano; Matthew Chadwick; J. Michael Cherry; Justin Clark-Casey; Sergio Contrino; Jacques Corbeil; Joshua Heimbach; Kalpana Karra; Rachel Lyne; Julie Sullivan; Yo Yehudi; Gos Micklem; Michel Dumontier