Publication


Featured research published by Daniel Blankenberg.


Current protocols in molecular biology | 2010

Galaxy: a web-based genome analysis tool for experimentalists.

Daniel Blankenberg; Gregory Von Kuster; Nathaniel Coraor; Guruprasad Ananda; Ross Lazarus; Mary E. Mangan; Anton Nekrutenko; James Taylor

High‐throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated, and often very computationally intensive. Thus, making sense of high‐throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex large‐scale analysis with just a Web browser. Curr. Protoc. Mol. Biol. 89:19.10.1‐19.10.21.


Nucleic Acids Research | 2016

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update

Enis Afgan; Dannon Baker; Marius van den Beek; Daniel Blankenberg; Dave Bouvier; Martin Čech; John Chilton; Dave Clements; Nate Coraor; Carl Eberhard; Björn Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Gregory Von Kuster; Eric Rasche; Nicola Soranzo; Nitesh Turaga; James Taylor; Anton Nekrutenko; Jeremy Goecks

High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.


Bioinformatics | 2010

Manipulation of FASTQ data with Galaxy

Daniel Blankenberg; Assaf Gordon; Gregory Von Kuster; Nathan Coraor; James Taylor; Anton Nekrutenko

Summary: Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps.
Availability and Implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of tools described in this manuscript as well as results from testing components of this tool suite against a set of previously published files are available at http://usegalaxy.org/u/dan/p/fastq
Contact: [email protected]; [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
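To make the idea of quality filtering across FASTQ variants concrete, here is a minimal sketch in Python. It is not the tool suite described in the abstract: the function names and the `PHRED_OFFSETS` table are hypothetical, and it covers only the two Phred-scaled encodings that differ by ASCII offset (33 for Sanger, 64 for Illumina 1.3+). The Solexa variant, which uses a different score scale, is left out.

```python
from statistics import mean

# Assumed ASCII offsets for two Phred-scaled FASTQ variants (illustrative table).
PHRED_OFFSETS = {"sanger": 33, "illumina": 64}


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], seq, qual


def decode_quality(qual, variant="sanger"):
    """Convert an ASCII quality string into a list of integer Phred scores."""
    offset = PHRED_OFFSETS[variant]
    return [ord(char) - offset for char in qual]


def filter_by_mean_quality(records, min_mean=20, variant="sanger"):
    """Keep only records whose mean Phred score is at least min_mean."""
    for name, seq, qual in records:
        scores = decode_quality(qual, variant)
        if scores and mean(scores) >= min_mean:
            yield name, seq, qual
```

Passing the variant explicitly, rather than guessing it from the data, keeps the sketch simple; a real pipeline would also need to validate that decoded scores fall within the range the chosen variant allows.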


Genome Biology | 2012

An encyclopedia of mouse DNA elements (Mouse ENCODE)

John A. Stamatoyannopoulos; Michael Snyder; Ross C. Hardison; Bing Ren; Thomas R. Gingeras; David M. Gilbert; Mark Groudine; M. A. Bender; Rajinder Kaul; Theresa K. Canfield; Erica Giste; Audra K. Johnson; Mia Zhang; Gayathri Balasundaram; Rachel Byron; Vaughan Roach; Peter J. Sabo; Richard Sandstrom; A Sandra Stehling; Robert E. Thurman; Sherman M. Weissman; Philip Cayting; Manoj Hariharan; Jin Lian; Yong Cheng; Stephen G. Landt; Zhihai Ma; Barbara J. Wold; Job Dekker; Gregory E. Crawford

To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.


Current protocols in human genetics | 2007

Using Galaxy to Perform Large‐Scale Interactive Data Analyses

James Taylor; Ian Schenck; Daniel Blankenberg; Anton Nekrutenko

While most experimental biologists know where to download genomic data, few have a concrete plan for how to analyze it. This situation can be corrected by: (1) providing unified portals serving genomic data and (2) building Web applications to allow flexible retrieval and on‐the‐fly analyses of the data. Powerful resources, such as the UCSC Genome Browser, already address the first issue. The second issue, however, remains open. For example, how would one find human protein‐coding exons with the highest density of single nucleotide polymorphisms (SNPs) and extract orthologous sequences from all sequenced mammals? Indeed, one can access all relevant data from the UCSC Genome Browser. But once the data is downloaded, how would one deal with millions of SNPs and gigabytes of alignments? Galaxy (http://g2.bx.psu.edu) is designed specifically for that purpose. It amplifies the strengths of existing resources (such as UCSC Genome Browser) by allowing the user to access and, most importantly, analyze data within a single interface in an unprecedented number of ways. Curr. Protoc. Bioinform. 19:10.5.1‐10.5.25.


Genetics | 2012

CloudMap: A Cloud-Based Pipeline for Analysis of Mutant Genome Sequences

Gregory Minevich; Danny S. Park; Daniel Blankenberg; Richard J. Poole; Oliver Hobert

Whole genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second generation sequencing platforms. We describe here a novel, easily accessible and cloud-based pipeline, called CloudMap, which greatly simplifies the analysis of mutant genome sequences. Available on the Galaxy web platform, CloudMap requires no software installation when run on the cloud, but it can also be run locally or via Amazon's Elastic Compute Cloud (EC2) service. CloudMap uses a series of predefined workflows to pinpoint sequence variations in animal genomes, such as those of premutagenized and mutagenized Caenorhabditis elegans strains. In combination with a variant-based mapping procedure, CloudMap allows users to sharply define genetic map intervals graphically and to retrieve very short lists of candidate variants with a few simple clicks. Automated workflows and extensive video user guides are available to detail the individual analysis steps performed (http://usegalaxy.org/cloudmap). We demonstrate the utility of CloudMap for WGS analysis of C. elegans and Arabidopsis genomes and describe how other organisms (e.g., Zebrafish and Drosophila) can easily be accommodated by this software platform. To accommodate rapid analysis of many mutants from large-scale genetic screens, CloudMap contains an in silico complementation testing tool that allows users to rapidly identify instances where multiple alleles of the same gene are present in the mutant collection. Lastly, we describe the application of a novel mapping/WGS method (“Variant Discovery Mapping”) that does not rely on a defined polymorphic mapping strain, and we integrate the application of this method into CloudMap. CloudMap tools and documentation are continually updated at http://usegalaxy.org/cloudmap.


Genome Biology | 2014

Dissemination of scientific software with Galaxy ToolShed

Daniel Blankenberg; Gregory Von Kuster; Emil Bouvier; Dannon Baker; Enis Afgan; Nicholas Stoler; James Taylor; Anton Nekrutenko

The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.


Proceedings of the National Academy of Sciences of the United States of America | 2014

Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA.

Boris Rebolledo-Jaramillo; Marcia Shu-Wei Su; Nicholas Stoler; Jennifer A. McElhoe; Benjamin J. A. Dickins; Daniel Blankenberg; Thorfinn Sand Korneliussen; Francesca Chiaromonte; Rasmus Nielsen; Mitchell M. Holland; Ian M. Paul; Anton Nekrutenko; Kateryna D. Makova

Significance: The frequency of intraindividual mitochondrial DNA (mtDNA) polymorphisms—heteroplasmies—can change dramatically from mother to child owing to the mitochondrial bottleneck at oogenesis. For deleterious heteroplasmies such a change may transform alleles that are benign at low frequency in a mother into disease-causing alleles when at a high frequency in her child. Our study estimates the mtDNA germ-line bottleneck to be small (30–35) and documents a positive association between the number of child heteroplasmies and maternal age at fertilization, enabling prediction of transmission of disease-causing variants and informing mtDNA evolution.

The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother–child pairs of European ancestry (a total of 156 samples, each sequenced at ∼20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ∼30–35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10⁻⁸ (interquartile range from 4.2 × 10⁻⁹ to 4.1 × 10⁻⁸) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.


BMC Bioinformatics | 2007

Quantitative sequence-function relationships in proteins based on gene ontology

Vineet Sangar; Daniel Blankenberg; Naomi Altman; Arthur M. Lesk

Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins – for instance, orthologs in different mammals – to very distantly-related proteins at the limit of reliable recognition of homology. Results: We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. Conclusion: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.


Database | 2011

Integrating diverse databases into an unified analysis framework: a Galaxy approach.

Daniel Blankenberg; Nathan Coraor; Gregory Von Kuster; James Taylor; Anton Nekrutenko

Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org

Collaboration


Top Co-Authors

Anton Nekrutenko

Pennsylvania State University

Gregory Von Kuster

Pennsylvania State University

Kateryna D. Makova

Pennsylvania State University

Ian M. Paul

Pennsylvania State University

Nicholas Stoler

Pennsylvania State University

Ross C. Hardison

Pennsylvania State University

Francesca Chiaromonte

Pennsylvania State University

Webb Miller

Pennsylvania State University

Dannon Baker

Johns Hopkins University
