
Publications

Featured research published by Philip D. Blood.


Nature Methods | 2017

Critical Assessment of Metagenome Interpretation — a benchmark of metagenomics software

Alexander Sczyrba; Peter Hofmann; Peter Belmann; David Koslicki; Stefan Janssen; Johannes Dröge; Ivan Gregor; Stephan Majda; Jessika Fiedler; Eik Dahms; Andreas Bremges; Adrian Fritz; Ruben Garrido-Oter; Tue Sparholt Jørgensen; Nicole Shapiro; Philip D. Blood; Alexey Gurevich; Yang Bai; Dmitrij Turaev; Matthew Z. DeMaere; Rayan Chikhi; Niranjan Nagarajan; Christopher Quince; Fernando Meyer; Monika Balvočiūtė; Lars Hestbjerg Hansen; Søren J. Sørensen; Burton K H Chia; Bertrand Denis; Jeff Froula

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.


Scientific Reports | 2015

Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture

Noushin Ghaffari; Alejandro Sanchez-Flores; Ryan Doan; Karina D. Garcia-Orozco; Patricia L. Chen; Adrián Ochoa-Leyva; Alonso A. Lopez-Zavala; J. Salvador Carrasco; Chris Hong; Luis G. Brieba; Enrique Rudiño-Piñera; Philip D. Blood; J. E. Sawyer; Charles D. Johnson; Scott V. Dindot; Rogerio R. Sotelo-Mundo; Michael F. Criscitiello

We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality of this assembly and the affiliated targeted homology searches greatly enrich the curated transcripts currently available in public databases for this species. Comparison with the model arthropod Daphnia allows some insights into defining characteristics of decapod crustaceans. This large-scale gene discovery gives the broadest depth yet to the annotated transcriptome of this important species and should be of value to ongoing genomics and immunogenetic resistance studies in this shrimp of paramount global economic importance.



Journal of the American Medical Informatics Association | 2014

Leveraging the national cyberinfrastructure for biomedical research.

Richard D. LeDuc; Matthew W. Vaughn; John M. Fonner; Michael Sullivan; James G. Williams; Philip D. Blood; James Taylor; William K. Barnett

In the USA, the national cyberinfrastructure refers to a system of research supercomputers, other IT facilities, and the high-speed networks that connect them. These resources have been heavily leveraged by scientists in disciplines such as high energy physics, astronomy, and climatology, but until recently they have been little used by biomedical researchers. We suggest that many of the ‘Big Data’ challenges facing the medical informatics community can be efficiently handled using national-scale cyberinfrastructure. Resources such as the Extreme Science and Engineering Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for Big Data challenges, but these resources can be difficult to approach. Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics. We provide examples of how this has been done in basic biology as an illustration for the biomedical informatics community.


PLOS ONE | 2016

TCGA Expedition: A Data Acquisition and Management System for TCGA Data.

Uma Chandran; Olga Medvedeva; M. Michael Barmada; Philip D. Blood; Anish Chakka; Soumya Luthra; Antonio G. Ferreira; Kim F. Wong; Adrian V. Lee; Zhihui Zhang; Robert Budden; J. Ray Scott; Annerose Berndt; Jeremy M. Berg; Rebecca S. Jacobson

Background: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command line access at high-performance computing facilities as well as some functionality with third-party tools. For a subset of TCGA data collected at the University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices.

Results: TCGA Expedition software consists of a set of scripts written in Bash, Python and Java that download, extract, harmonize, version and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered, local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools. Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR), that enabled investigators at our institution to work with all TCGA data formats and to interrogate these data with analysis pipelines and associated tools. WGS data are especially challenging for individual investigators to use, due to issues with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable.

Conclusion: Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.


Extreme Science and Engineering Discovery Environment | 2013

National Center for Genome Analysis Support leverages XSEDE to support life science research

Richard D. LeDuc; Thomas G. Doak; Le-Shin Wu; Philip D. Blood; Carrie L. Ganote; Matthew W. Vaughn

The National Center for Genome Analysis Support (NCGAS) is a response to the concern that NSF-funded life scientists were underutilizing the national cyberinfrastructure because little effort had been made to tailor these resources to the life science community's needs. NCGAS is a multi-institutional service center that provides computational resources, specialized systems support for both end users and systems administrators, curated sets of applications, and, most importantly, scientific consultations for domain scientists unfamiliar with next-generation DNA sequence data analysis. NCGAS is a partnership between the Indiana University Pervasive Technology Institute, the Texas Advanced Computing Center, the San Diego Supercomputer Center, and the Pittsburgh Supercomputing Center. NCGAS provides hardened bioinformatics applications and user support on all aspects of a user's data analysis, including data management, systems usage, bioinformatics, and biostatistics-related issues.


Concurrency and Computation: Practice and Experience | 2014

Enabling large-scale next-generation sequence assembly with Blacklight

M. Brian Couger; Lenore Pipes; Fabio M. Squina; Rolf A. Prade; Adam Siepel; Robert E. Palermo; Michael G. Katze; Christopher E. Mason; Philip D. Blood

A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems.


bioRxiv | 2017

What are the most influencing factors in reconstructing a reliable transcriptome assembly?

Noushin Ghaffari; Jordi Abante; Raminder Singh; Philip D. Blood; Lenore Pipes; Christopher E. Mason; Charles D. Johnson

Reconstructing the genome and transcriptome for a new or extant species are essential steps in expanding our understanding of the organism’s active RNA landscape and gene regulatory dynamics, as well as for developing therapeutic targets to fight disease. The advancement of sequencing technologies has paved the way to generate high-quality draft transcriptomes. With many possible approaches available to accomplish this task, there is a need for a closer investigation of the factors that influence the quality of the results. We carried out an extensive survey of a variety of elements that are important in transcriptome assembly. We utilized the human RNA-Seq data from the Sequencing Quality Control Consortium (SEQC) as a well-characterized and comprehensive resource with an available, well-studied human reference genome. Our results indicate that the quality of the library construction significantly impacts the quality of the assembly. Higher coverage of the genome is not as important as the quality of the input RNA-Seq data. Thus, once a certain coverage is attained, the quality of the assembly is mainly dependent on the base-calling accuracy of the input sequencing reads; and it is important to avoid saturating the assembler with extra coverage.


Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact | 2017

Demonstrating Distributed Workflow Computing with a Federating Wide-Area File System

Philip D. Blood; Anjana Kar; Jason Sommerfield; Beth Lynn Eicher; Richard Angeletti; J. Ray Scott

We have demonstrated the synergy of a wide-area SLASH2 file system [1] with remote bioinformatics workflows between Extreme Science and Engineering Discovery Environment [2] sites using the Galaxy Project's web-based platform [3] for reproducible data analysis. Wide-area Galaxy workflows were enabled by establishing a geographically-distributed SLASH2 instance between the Greenfield [4] system at the Pittsburgh Supercomputing Center [5] and virtual machines incorporating storage within the Corral [6] file system at the Texas Advanced Computing Center [7]. Analysis tasks submitted through a single Galaxy instance seamlessly leverage data available from either site. In this paper, we explore the advantages of SLASH2 for enabling workflows from Galaxy Main [8].



Collaboration

Top co-authors of Philip D. Blood:

Jeff Froula, Joint Genome Institute
Rayan Chikhi, Pennsylvania State University
Eik Dahms, University of Düsseldorf