Niko Beerenwinkel | Researchain

Publication

Featured research published by Niko Beerenwinkel.

Bioinformatics | 2005

ROCR: visualizing classifier performance in R

Tobias Sing; Oliver Sander; Niko Beerenwinkel; Thomas Lengauer

Summary: ROCR is a package for evaluating and visualizing the performance of scoring classifiers in the statistical language R. It features over 25 performance measures that can be freely combined to create two-dimensional performance curves. Standard methods for investigating trade-offs between specific performance measures are available within a uniform framework, including receiver operating characteristic (ROC) graphs, precision/recall plots, lift charts and cost curves. ROCR integrates tightly with R's powerful graphics capabilities, thus allowing for highly adjustable plots. Being equipped with only three commands and reasonable default values for optional parameters, ROCR combines flexibility with ease of use. Availability: http://rocr.bioinf.mpi-sb.mpg.de. ROCR can be used under the terms of the GNU General Public License. Running within R, it is platform-independent. Contact: [email protected].
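
As a rough illustration of the ROC graphs mentioned in this abstract, the sketch below computes ROC points and the area under the curve from classifier scores in plain Python. It does not use or reproduce ROCR's R interface; the function names `roc_points` and `auc` and the toy data are assumptions made for this example only.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the score cutoff step by step.

    labels are 1 for positives and 0 for negatives; both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so each step admits the highest-scoring items first.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Items sharing a score are processed together so ties yield one point.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: four scored items, two of them true positives.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(curve, auc(curve))
```

Each point corresponds to one score cutoff, so plotting the list traces the trade-off between true-positive and false-positive rates that a ROC graph displays.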

Proceedings of the National Academy of Sciences of the United States of America | 2008

Comparative lesion sequencing provides insights into tumor evolution

Siân Jones; Wei Dong Chen; Giovanni Parmigiani; Frank Diehl; Niko Beerenwinkel; Tibor Antal; Arne Traulsen; Martin A. Nowak; Christopher Siegel; Victor E. Velculescu; Kenneth W. Kinzler; Bert Vogelstein; Joseph Willis; Sanford D. Markowitz

We show that the times separating the birth of benign, invasive, and metastatic tumor cells can be determined by analysis of the mutations they have in common. When combined with prior clinical observations, these analyses suggest the following general conclusions about colorectal tumorigenesis: (i) It takes ≈17 years for a large benign tumor to evolve into an advanced cancer but <2 years for cells within that cancer to acquire the ability to metastasize; (ii) it requires few, if any, selective events to transform a highly invasive cancer cell into one with the capacity to metastasize; (iii) the process of cell culture ex vivo does not introduce new clonal mutations into colorectal tumor cell populations; and (iv) the rates at which point mutations develop in advanced cancers are similar to those of normal cells. These results have important implications for understanding human tumor pathogenesis, particularly those associated with metastasis.

PLOS Computational Biology | 2007

Genetic Progression and the Waiting Time to Cancer

Niko Beerenwinkel; Tibor Antal; David Dingli; Arne Traulsen; Kenneth W. Kinzler; Victor E. Velculescu; Bert Vogelstein; Martin A. Nowak

Cancer results from genetic alterations that disturb the normal cooperative behavior of cells. Recent high-throughput genomic studies of cancer cells have shown that the mutational landscape of cancer is complex and that individual cancers may evolve through mutations in as many as 20 different cancer-associated genes. We use data published by Sjöblom et al. (2006) to develop a new mathematical model for the somatic evolution of colorectal cancers. We employ the Wright-Fisher process for exploring the basic parameters of this evolutionary process and derive an analytical approximation for the expected waiting time to the cancer phenotype. Our results highlight the relative importance of selection over both the size of the cell population at risk and the mutation rate. The model predicts that the observed genetic diversity of cancer genomes can arise under a normal mutation rate if the average selective advantage per mutation is on the order of 1%. Increased mutation rates due to genetic instability would allow even smaller selective advantages during tumorigenesis. The complexity of cancer progression can be understood as the result of multiple sequential mutations, each of which has a relatively small but positive effect on net cell growth.
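
To make the Wright-Fisher setting concrete, here is a minimal simulation sketch of the waiting time until a cell first carries a given number of mutations. It is a simplified toy under stated assumptions, not the model or the analytical approximation of the paper: the population size is constant, every mutation confers the same multiplicative advantage, each offspring gains at most one mutation per generation, and the function name `waiting_time` and all parameter values are hypothetical.

```python
import random


def waiting_time(pop_size, mut_prob, advantage, target, rng, max_gen=100_000):
    """Generations until some cell first carries `target` mutations.

    Returns None if that does not happen within max_gen generations.
    """
    counts = [pop_size] + [0] * target  # counts[j] = cells with j mutations
    for gen in range(1, max_gen + 1):
        # Selection: parents are drawn in proportion to fitness (1 + s)^j.
        weights = [counts[j] * (1 + advantage) ** j for j in range(target + 1)]
        parents = rng.choices(range(target + 1), weights=weights, k=pop_size)
        counts = [0] * (target + 1)
        for j in parents:
            # Mutation: each offspring gains one further mutation with prob. u.
            if j < target and rng.random() < mut_prob:
                j += 1
            counts[j] += 1
        if counts[target] > 0:
            return gen
    return None


# Hypothetical parameters chosen only so the example runs quickly.
rng = random.Random(0)
print(waiting_time(pop_size=100, mut_prob=0.01, advantage=0.05, target=2, rng=rng))
```

Averaging the returned value over many seeds gives an empirical waiting time, which can then be compared across different values of the selective advantage, population size, and mutation rate.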

Proceedings of the National Academy of Sciences of the United States of America | 2002

Diversity and complexity of HIV-1 drug resistance: a bioinformatics approach to predicting phenotype from genotype.

Niko Beerenwinkel; Barbara Schmidt; Hauke Walter; Rolf Kaiser; Thomas Lengauer; Daniel Hoffmann; Klaus Korn; Joachim Selbig

Drug resistance testing has been shown to be beneficial for clinical management of HIV type 1 infected patients. Whereas phenotypic assays directly measure drug resistance, the commonly used genotypic assays provide only indirect evidence of drug resistance, the major challenge being the interpretation of the sequence information. We analyzed the significance of sequence variations in the protease and reverse transcriptase genes for drug resistance and derived models that predict phenotypic resistance from genotypes. For 14 antiretroviral drugs, both genotypic and phenotypic resistance data from 471 clinical isolates were analyzed with a machine learning approach. Information profiles were obtained that quantify the statistical significance of each sequence position for drug resistance. For the different drugs, patterns of varying complexity were observed, including between one and nine sequence positions with substantial information content. Based on these information profiles, decision tree classifiers were generated to identify genotypic patterns characteristic of resistance or susceptibility to the different drugs. We obtained concise and easily interpretable models to predict drug resistance from sequence information. The prediction quality of the models was assessed in leave-one-out experiments in terms of the prediction error. We found prediction errors of 9.6–15.5% for all drugs except for zalcitabine, didanosine, and stavudine, with prediction errors between 25.4% and 32.0%. A prediction service is freely available at http://cartan.gmd.de/geno2pheno.html.
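
One way to picture an information profile is to score each alignment column by how much it tells us about the resistance label. The sketch below uses mutual information (in bits) for that purpose; treating mutual information as the scoring measure is an assumption of this example, and the function names and the toy sequences are hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(residues, labels):
    """Mutual information (bits) between one alignment column and the labels."""
    n = len(residues)
    joint = Counter(zip(residues, labels))
    residue_counts = Counter(residues)
    label_counts = Counter(labels)
    mi = 0.0
    for (residue, label), count in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (residue_counts[residue] * label_counts[label]))
    return mi


def information_profile(sequences, labels):
    """One score per sequence position; sequences must be aligned and equal length."""
    return [mutual_information(column, labels) for column in zip(*sequences)]


# Hypothetical two-position alignment with resistant (R) / susceptible (S) labels.
profile = information_profile(["MA", "VA", "MA", "VA"], ["R", "S", "R", "S"])
print(profile)
```

Positions with high scores would be natural candidates for the split variables of a decision tree, while conserved positions score zero.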

PLOS Computational Biology | 2008

Viral Population Estimation Using Pyrosequencing

Nicholas Eriksson; Lior Pachter; Yumi Mitsuya; Soo-Yon Rhee; Chunlin Wang; Baback Gharizadeh; Mostafa Ronaghi; Robert W. Shafer; Niko Beerenwinkel

The diversity of virus populations within single infected hosts presents a major difficulty for the natural immune response as well as for vaccine design and antiviral drug therapy. Recently developed pyrophosphate-based sequencing technologies (pyrosequencing) can be used for quantifying this diversity by ultra-deep sequencing of virus samples. We present computational methods for the analysis of such sequence data and apply these techniques to pyrosequencing data obtained from HIV populations within patients harboring drug-resistant virus strains. Our main result is the estimation of the population structure of the sample from the pyrosequencing reads. This inference is based on a statistical approach to error correction, followed by a combinatorial algorithm for constructing a minimal set of haplotypes that explain the data. Using this set of explaining haplotypes, we apply a statistical model to infer the frequencies of the haplotypes in the population via an expectation–maximization (EM) algorithm. We demonstrate that pyrosequencing reads allow for effective population reconstruction by extensive simulations and by comparison to 165 sequences obtained directly from clonal sequencing of four independent, diverse HIV populations. Thus, pyrosequencing can be used for cost-effective estimation of the structure of virus populations, promising new insights into viral evolutionary dynamics and disease control strategies.
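
The EM step for haplotype frequencies can be sketched in a few lines. The version below is a deliberately simplified illustration, assuming that reads are already error-corrected, that each read is compatible with a known, non-empty set of candidate haplotypes, and that all compatible haplotypes are equally likely to generate the read apart from their frequency. The function name `em_haplotype_frequencies` and the input encoding are assumptions of this example, not the paper's implementation.

```python
def em_haplotype_frequencies(compat, n_haplotypes, n_iter=200):
    """Estimate haplotype frequencies from read-to-haplotype compatibility.

    compat[r] is the non-empty list of haplotype indices compatible with read r.
    """
    freqs = [1.0 / n_haplotypes] * n_haplotypes  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_haplotypes
        # E-step: split each read among its compatible haplotypes
        # in proportion to the current frequency estimates.
        for haps in compat:
            total = sum(freqs[h] for h in haps)
            for h in haps:
                expected[h] += freqs[h] / total
        # M-step: new frequencies are the expected read shares.
        freqs = [e / len(compat) for e in expected]
    return freqs


# Hypothetical data: two reads unique to haplotype 0, one unique to
# haplotype 1, and one read compatible with both.
print(em_haplotype_frequencies([[0], [0], [1], [0, 1]], n_haplotypes=2))
```

In this toy case the ambiguous read is shared out according to the evidence from the unambiguous reads, so the estimates converge towards 2/3 and 1/3.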

Nucleic Acids Research | 2003

Geno2pheno: estimating phenotypic drug resistance from HIV-1 genotypes

Niko Beerenwinkel; Martin Däumer; Mark Oette; Klaus Korn; Daniel Hoffmann; Rolf Kaiser; Thomas Lengauer; Joachim Selbig; Hauke Walter

Therapeutic success of anti-HIV therapies is limited by the development of drug resistant viruses. These genetic variants display complex mutational patterns in their pol gene, which codes for protease and reverse transcriptase, the molecular targets of current antiretroviral therapy. Genotypic resistance testing depends on the ability to interpret such sequence data, whereas phenotypic resistance testing directly measures relative in vitro susceptibility to a drug. From a set of 650 matched genotype-phenotype pairs we construct regression models for the prediction of phenotypic drug resistance from genotypes. Since the range of resistance factors varies considerably between different drugs, two scoring functions are derived from different sets of predicted phenotypes. Firstly, we compare predicted values to those of samples derived from 178 treatment-naive patients and report the relative deviance. Secondly, estimation of the probability density of 2000 predicted phenotypes gives rise to an intrinsic definition of a susceptible and a resistant subpopulation. Thus, for a predicted phenotype, we calculate the probability of membership in the resistant subpopulation. Both scores provide standardized measures of resistance that can be calculated from the genotype and are comparable between drugs. The geno2pheno system makes these genotype interpretations available via the Internet (http://www.genafor.org/).

Nucleic Acids Research | 2010

Error correction of next-generation sequencing data and reliable estimation of HIV quasispecies

Osvaldo Zagordi; Rolf Klein; Martin Däumer; Niko Beerenwinkel

Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb-fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease of the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.

BMC Bioinformatics | 2011

ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data

Osvaldo Zagordi; Arnab Bhattacharya; Nicholas Eriksson; Niko Beerenwinkel

Background: With next-generation sequencing technologies, experiments that were considered prohibitive only a few years ago are now possible. However, while these technologies have the ability to produce enormous volumes of data, the sequence reads are prone to error. This poses fundamental hurdles when genetic diversity is investigated. Results: We developed ShoRAH, a computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The software was run on simulated data and on real data obtained in wet lab experiments to assess its reliability. Conclusions: ShoRAH is implemented in C++, Python, and Perl and has been tested under Linux and Mac OS X. Source code is available under the GNU General Public License at http://www.cbg.ethz.ch/software/shorah.

Frontiers in Microbiology | 2012

Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data

Niko Beerenwinkel; Huldrych F. Günthard; Volker Roth; Karin J. Metzner

Many viruses, including the clinically relevant RNA viruses HIV (human immunodeficiency virus) and HCV (hepatitis C virus), exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing (NGS) technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different NGS platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants (SNVs) to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of NGS to estimate viral diversity.

Current Opinion in Virology | 2011

Ultra-deep sequencing for the analysis of viral populations.

Niko Beerenwinkel; Osvaldo Zagordi

Next-generation sequencing allows for cost-effective probing of virus populations at an unprecedented level of detail. The massively parallel sequencing approach can detect low-frequency mutations and it provides a snapshot of the entire virus population. However, analyzing ultra-deep sequencing data obtained from diverse virus populations is challenging because of PCR and sequencing errors and short read lengths, such that the experiment provides only indirect evidence of the underlying viral population structure. Recent computational and statistical advances allow for accommodating some of the confounding factors, including methods for read error correction, haplotype reconstruction, and haplotype frequency estimation. With these methods ultra-deep sequencing can be more reliably used to analyze, in a quantitative manner, the genetic diversity of virus populations.
