Kasper D. Hansen
Johns Hopkins University
Publication
Featured research published by Kasper D. Hansen.
BMC Bioinformatics | 2010
James H. Bullard; Elizabeth Purdom; Kasper D. Hansen; Sandrine Dudoit
Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data. Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as, varying gene lengths, base-calling calibration method (with and without phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection. Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
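The contrast between scaling by total lane counts and a quantile-based alternative can be illustrated with a small sketch. This is not the authors' code: the function names, the toy counts, and the choice of the upper quartile (taken over genes with at least one read in some lane) are assumptions made for illustration only.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def total_count_factors(lanes):
    """Scaling factor per lane: lane total divided by the mean lane total."""
    totals = {name: sum(counts) for name, counts in lanes.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {name: total / mean_total for name, total in totals.items()}


def upper_quartile_factors(lanes):
    """Scaling factor per lane: lane upper quartile divided by the mean upper quartile.

    Assumption: genes with zero counts in every lane are dropped first.
    """
    names = list(lanes)
    n_genes = len(lanes[names[0]])
    keep = [g for g in range(n_genes) if any(lanes[n][g] > 0 for n in names)]
    quartiles = {
        n: percentile(sorted(lanes[n][g] for g in keep), 0.75) for n in names
    }
    mean_q = sum(quartiles.values()) / len(quartiles)
    return {n: q / mean_q for n, q in quartiles.items()}


# Hypothetical toy data: the lanes agree on every gene except the last one,
# which is ten times higher in lane B.
lanes = {
    "A": [10, 20, 30, 40, 100],
    "B": [10, 20, 30, 40, 1000],
}
tc = total_count_factors(lanes)
uq = upper_quartile_factors(lanes)
```

In this toy case the total-count factors differ sharply between the two lanes because a single highly expressed gene dominates the total, so dividing by them would make the four unchanged genes look differentially expressed. The upper-quartile factors are equal for both lanes, leaving those genes untouched.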
Nature Methods | 2015
Wolfgang Huber; Vincent J. Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton Carvalho; Héctor Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D. Hansen; Rafael A. Irizarry; Michael S. Lawrence; Michael I. Love; James W. MacDonald; Valerie Obenchain; Andrzej K. Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K. Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
Nucleic Acids Research | 2010
Kasper D. Hansen; Steven E. Brenner; Sandrine Dudoit
Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
Genome Biology | 2010
Ben Langmead; Kasper D. Hansen; Jeffrey T. Leek
As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna.
Genome Biology | 2012
Kasper D. Hansen; Benjamin Langmead; Rafael A. Irizarry
DNA methylation is an important epigenetic modification involved in gene regulation, which can now be measured using whole-genome bisulfite sequencing. However, cost, complexity of the data, and lack of comprehensive analytical tools are major challenges that keep this technology from becoming widely applied. Here we present BSmooth, an alignment, quality control and analysis pipeline that provides accurate and precise results even with low coverage data, appropriately handling biological replicates. BSmooth is open source software, and can be downloaded from http://rafalab.jhsph.edu/bsmooth.
Biostatistics | 2012
Kasper D. Hansen; Rafael A. Irizarry; Zhijin Wu
The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
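The quantile normalization component mentioned in this abstract can be sketched in a few lines. The sketch below covers only plain quantile normalization across samples; it does not implement the conditional algorithm or the regression on GC-content, and the function name, the toy data, and the tie handling (ties broken by position rather than averaged) are assumptions.

```python
def quantile_normalize(columns):
    """Force every sample (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of measurements per sample.
    Returns new columns in which the k-th smallest value of each sample is
    replaced by the mean of the k-th smallest values across all samples.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average of the order statistics across samples.
    reference = [
        sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        # Indices of this sample's values from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy data: two samples, three genes.
samples = [[5, 2, 3], [4, 1, 6]]
normalized = quantile_normalize(samples)
```

After normalization each sample contains exactly the same set of values, and only the assignment of those values to genes differs, which is what removes global distortions between samples.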
Nature Neuroscience | 2012
Brian Herb; Florian Wolschin; Kasper D. Hansen; Martin J. Aryee; Ben Langmead; Rafael A. Irizarry; Gro V. Amdam; Andrew P. Feinberg
In honeybee societies, distinct caste phenotypes are created from the same genotype, suggesting a role for epigenetics in deriving these behaviorally different phenotypes. We found no differences in DNA methylation between irreversible worker and queen castes, but substantial differences between nurses and forager subcastes. Reverting foragers back to nurses reestablished methylation levels for a majority of genes and provides, to the best of our knowledge, the first evidence in any organism of reversible epigenetic changes associated with behavior.
Proceedings of the National Academy of Sciences of the United States of America | 2012
Jenny Tung; Luis B. Barreiro; Zachary P. Johnson; Kasper D. Hansen; Vasiliki Michopoulos; Donna Toufexis; Katelyn Michelini; Mark E. Wilson; Yoav Gilad
Variation in the social environment is a fundamental component of many vertebrate societies. In humans and other primates, adverse social environments often translate into lasting physiological costs. The biological mechanisms associated with these effects are therefore of great interest, both for understanding the evolutionary impacts of social behavior and in the context of human health. However, large gaps remain in our understanding of the mechanisms that mediate these effects at the molecular level. Here we addressed these questions by leveraging the power of an experimental system that consisted of 10 social groups of female macaques, in which each individual's social status (i.e., dominance rank) could be experimentally controlled. Using this paradigm, we show that dominance rank results in a widespread, yet plastic, imprint on gene regulation, such that peripheral blood mononuclear cell gene expression data alone predict social status with 80% accuracy. We investigated the mechanistic basis of these effects using cell type-specific gene expression profiling and glucocorticoid resistance assays, which together contributed to rank effects on gene expression levels for 694 (70%) of the 987 rank-related genes. We also explored the possible contribution of DNA methylation levels to these effects, and identified global associations between dominance rank and methylation profiles that suggest epigenetic flexibility in response to status-related behavioral cues. Together, these results illuminate the importance of the molecular response to social conditions, particularly in the immune system, and demonstrate a key role for gene regulation in linking the social environment to individual physiology.
Genome Biology | 2014
Jean-Philippe Fortin; Aurelie Labbe; Mathieu Lemire; Brent W. Zanke; Thomas J. Hudson; Elana Fertig; Celia M. T. Greenwood; Kasper D. Hansen
We propose an extension to quantile normalization that removes unwanted technical variation using control probes. We adapt our algorithm, functional normalization, to the Illumina 450k methylation array and address the open problem of normalizing methylation data with global epigenetic changes, such as human cancers. Using data sets from The Cancer Genome Atlas and a large case–control study, we show that our algorithm outperforms all existing normalization methods with respect to replication of results between experiments, and yields robust results even in the presence of batch effects. Functional normalization can be applied to any microarray platform, provided suitable control probes are available.
Genome Research | 2011
Angela N. Brooks; Li Yang; Michael O. Duff; Kasper D. Hansen; Jung W. Park; Sandrine Dudoit; Steven E. Brenner; Brenton R. Graveley
Alternative splicing is generally controlled by proteins that bind directly to regulatory sequence elements and either activate or repress splicing of adjacent splice sites in a target pre-mRNA. Here, we have combined RNAi and mRNA-seq to identify exons that are regulated by Pasilla (PS), the Drosophila melanogaster ortholog of mammalian NOVA1 and NOVA2. We identified 405 splicing events in 323 genes that are significantly affected upon depletion of ps, many of which were annotated as being constitutively spliced. The sequence regions upstream and within PS-repressed exons and downstream from PS-activated exons are enriched for YCAY repeats, and these are consistent with the location of these motifs near NOVA-regulated exons in mammals. Thus, the RNA regulatory map of PS and NOVA1/2 is highly conserved between insects and mammals despite the fact that the target gene orthologs regulated by PS and NOVA1/2 are almost entirely nonoverlapping. This observation suggests that the regulatory codes of individual RNA binding proteins may be nearly immutable, yet the regulatory modules controlled by these proteins are highly evolvable.