Simon Anders
European Bioinformatics Institute
Publication
Featured research published by Simon Anders.
Genome Biology | 2010
Simon Anders; Wolfgang Huber
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.
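As a rough illustration of the error model named in this abstract, the sketch below uses the standard negative binomial relation between variance and mean, var = mu + alpha * mu^2, and a per-gene method-of-moments estimate of the dispersion alpha. This is not DESeq's implementation: the package links variance and mean across genes by local regression, a step omitted here. The function names and the toy counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, alpha):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments dispersion estimate from replicate counts of one gene.

    Solves var = mu + alpha * mu**2 for alpha and clips at zero, because
    counts that vary no more than a Poisson would show no overdispersion.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # unbiased sample variance
    return max(0.0, (var - mu) / mu ** 2)


# Hypothetical counts for two genes across three replicates.
toy_counts = {
    "geneA": [90, 110, 100],  # variance equals the mean: Poisson-like
    "geneB": [10, 60, 20],    # variance far above the mean: overdispersed
}

dispersions = {gene: moment_dispersion(c) for gene, c in toy_counts.items()}
```

With only a few replicates, such per-gene estimates are very noisy, which is the motivation the abstract gives for estimating variability across the whole dynamic range rather than gene by gene.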
Genome Biology | 2014
Michael I. Love; Wolfgang Huber; Simon Anders
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
Bioinformatics | 2015
Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
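The idea of counting the overlap of reads with genes can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the HTSeq API and not the exact rule htseq-count applies: the `Interval` type, the `count_reads` function, the `no_feature` and `ambiguous` labels and the toy coordinates are all hypothetical, and a linear scan over genes stands in for the coordinate-indexed data structures the abstract mentions.

```python
from collections import Counter, namedtuple

# Half-open genomic interval [start, end) on a chromosome.
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def count_reads(genes, reads):
    """Assign each read to the single gene it overlaps, if there is one."""
    counts = Counter({name: 0 for name in genes})
    for read in reads:
        hits = [name for name, iv in genes.items() if overlaps(read, iv)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1  # read falls outside every gene
        else:
            counts["ambiguous"] += 1   # read overlaps more than one gene
    return counts


# Hypothetical annotation and aligned reads.
genes = {
    "geneA": Interval("chr1", 100, 200),
    "geneB": Interval("chr1", 180, 300),
}
reads = [
    Interval("chr1", 110, 160),  # inside geneA only
    Interval("chr1", 185, 195),  # overlaps both genes
    Interval("chr1", 250, 300),  # inside geneB only
    Interval("chr2", 100, 150),  # no annotated gene
]
counts = count_reads(genes, reads)
```

The resulting per-gene counts form the kind of count table that differential expression methods take as input; reads that cannot be assigned to exactly one gene are tallied separately rather than split between genes.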
Nature Methods | 2015
Wolfgang Huber; Vincent J. Carey; Robert Gentleman; Simon Anders; Marc Carlson; Benilton Carvalho; Héctor Corrada Bravo; Sean Davis; Laurent Gatto; Thomas Girke; Raphael Gottardo; Florian Hahne; Kasper D. Hansen; Rafael A. Irizarry; Michael S. Lawrence; Michael I. Love; James W. MacDonald; Valerie Obenchain; Andrzej K. Oleś; Hervé Pagès; Alejandro Reyes; Paul Shannon; Gordon K. Smyth; Dan Tenenbaum; Levi Waldron; Martin Morgan
Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.
Genome Research | 2012
Simon Anders; Alejandro Reyes; Wolfgang Huber
RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.
Nature Protocols | 2013
Simon Anders; Davis J. McCarthy; Yunshun Chen; Michal Okoniewski; Gordon K. Smyth; Wolfgang Huber; Mark D. Robinson
RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4–10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.
Nature Methods | 2013
Philip Brennecke; Simon Anders; Jong Kyoung Kim; Aleksandra A. Kolodziejczyk; Xiuwei Zhang; Valentina Proserpio; Bianka Baying; Vladimir Benes; Sarah A. Teichmann; John C. Marioni; Marcus G. Heisler
Single-cell RNA-seq can yield valuable insights about the variability within a population of seemingly homogeneous cells. We developed a quantitative statistical method to distinguish true biological variability from the high levels of technical noise in single-cell experiments. Our approach quantifies the statistical significance of observed cell-to-cell variability in expression strength on a gene-by-gene basis. We validate our approach using two independent data sets from Arabidopsis thaliana and Mus musculus.
Bioinformatics | 2009
Martin Morgan; Simon Anders; Michael V. Lawrence; Patrick Aboyoun; Hervé Pagès; Robert Gentleman
Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical workflows.
Cell | 2013
Kathi Zarnack; Julian König; Mojca Tajnik; Inigo Martincorena; Sebastian Eustermann; Isabelle Stévant; Alejandro Reyes; Simon Anders; Nicholas M. Luscombe; Jernej Ule
Summary: There are ∼650,000 Alu elements in transcribed regions of the human genome. These elements contain cryptic splice sites, so they are in constant danger of aberrant incorporation into mature transcripts. Despite posing a major threat to transcriptome integrity, little is known about the molecular mechanisms preventing their inclusion. Here, we present a mechanism for protecting the human transcriptome from the aberrant exonization of transposable elements. Quantitative iCLIP data show that the RNA-binding protein hnRNP C competes with the splicing factor U2AF65 at many genuine and cryptic splice sites. Loss of hnRNP C leads to formation of previously suppressed Alu exons, which severely disrupt transcript function. Minigene experiments explain disease-associated mutations in Alu elements that hamper hnRNP C binding. Thus, by preventing U2AF65 binding to Alu elements, hnRNP C plays a critical role as a genome-wide sentinel protecting the transcriptome. The findings have important implications for human evolution and disease.
Genome Biology | 2010
Stefan Thomsen; Simon Anders; Sarath Chandra Janga; Wolfgang Huber; Claudio R. Alonso
Background: The modulation of mRNA levels across tissues and time is key for the establishment and operation of the developmental programs that transform the fertilized egg into a fully formed embryo. Although the developmental mechanisms leading to differential mRNA synthesis are heavily investigated, comparatively little attention is given to the processes of mRNA degradation and how these relate to the molecular programs controlling development. Results: Here we combine timed collection of Drosophila embryos and unfertilized eggs with genome-wide microarray technology to determine the degradation patterns of all mRNAs present during early fruit fly development. Our work studies the kinetics of mRNA decay, the contributions of maternally and zygotically encoded factors to mRNA degradation, and the ways in which mRNA decay profiles relate to gene function, mRNA localization patterns, translation rates and protein turnover. We also detect cis-regulatory sequences enriched in transcripts with common degradation patterns and propose several proteins and microRNAs as developmental regulators of mRNA decay during early fruit fly development. Finally, we experimentally validate the effects of a subset of cis-regulatory sequences and trans-regulators in vivo. Conclusions: Our work advances the current understanding of the processes controlling mRNA degradation during early Drosophila development, taking us one step closer to the understanding of mRNA decay processes in all animals. Our data also provide a valuable resource for further experimental and computational studies investigating the process of mRNA decay.