Torsten Hothorn
University of Zurich
Publication
Featured research published by Torsten Hothorn.
Genome Biology | 2004
Robert Gentleman; Vincent J. Carey; Douglas M. Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano M. Iacus; Rafael A. Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony Rossini; Gunther Sawitzki; Colin A. Smith; Gordon K. Smyth; Luke Tierney; Jean Yee Hwa Yang; Jianhua Zhang
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.
Biometrical Journal | 2008
Torsten Hothorn; Frank Bretz; Peter H. Westfall
Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of rejecting erroneously at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth of the results. For the analyses we use the R add-on package multcomp, which provides a convenient interface to the general approach adopted here.
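The inflation of the error rate described in this abstract can be made concrete with a little arithmetic. The sketch below assumes m independent tests of true null hypotheses, each run at level alpha, and uses the Bonferroni adjustment as the simplest possible correction. It is not the procedure of the paper or of the multcomp package; the function names are hypothetical and chosen for illustration only.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false rejection) for m independent true nulls, each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise rate at or below alpha (Bonferroni)."""
    return alpha / m


if __name__ == "__main__":
    # Compare the unadjusted and the Bonferroni-adjusted familywise error rate.
    for m in (1, 5, 10, 20):
        unadjusted = familywise_error_rate(0.05, m)
        adjusted = familywise_error_rate(bonferroni_level(0.05, m), m)
        print(f"m={m:2d}  unadjusted={unadjusted:.3f}  Bonferroni={adjusted:.3f}")
```

With ten independent tests at the 5% level, the chance of at least one false rejection is already about 40%, which is why an adjustment for multiplicity is needed.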
Journal of Computational and Graphical Statistics | 2006
Torsten Hothorn; Kurt Hornik; Achim Zeileis
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.
BMC Bioinformatics | 2007
Carolin Strobl; Anne-Laure Boulesteix; Achim Zeileis; Torsten Hothorn
Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories. Results: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand. Conclusion: We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories.
The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore the suggested method can be applied straightforwardly by scientists in bioinformatics research.
Statistical Science | 2007
Peter Bühlmann; Torsten Hothorn
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions.
Lancet Oncology | 2012
Claus Rödel; Torsten Liersch; Heinz Becker; Rainer Fietkau; Werner Hohenberger; Torsten Hothorn; Ullrich Graeven; Dirk Arnold; Marga Lang-Welzenbach; Hans-Rudolf Raab; Heiko Sülberg; Christian Wittekind; Sergej Potapov; Ludger Staib; Clemens F. Hess; Karin Weigang-Köhler; Gerhard G. Grabenbauer; Hans Hoffmanns; Fritz Lindemann; Anke Schlenska-Lange; Gunnar Folprecht; Rolf Sauer
BACKGROUND Preoperative chemoradiotherapy, total mesorectal excision surgery, and adjuvant chemotherapy with fluorouracil is the standard combined modality treatment for rectal cancer. With the aim of improving disease-free survival (DFS), this phase 3 study (CAO/ARO/AIO-04) integrated oxaliplatin into standard treatment. METHODS This was a multicentre, open-label, randomised, phase 3 study in patients with histologically proven carcinoma of the rectum with clinically staged T3-4 or any node-positive disease. Between July 25, 2006, and Feb 26, 2010, patients were randomly assigned to two groups: a control group receiving standard fluorouracil-based combined modality treatment, consisting of preoperative radiotherapy of 50·4 Gy plus infusional fluorouracil (1000 mg/m² days 1-5 and 29-33), followed by surgery and four cycles of bolus fluorouracil (500 mg/m² days 1-5 and 29; fluorouracil group); and an experimental group receiving preoperative radiotherapy of 50·4 Gy plus infusional fluorouracil (250 mg/m² days 1-14 and 22-35) and oxaliplatin (50 mg/m² days 1, 8, 22, and 29), followed by surgery and eight cycles of adjuvant chemotherapy with oxaliplatin (100 mg/m² days 1 and 15), leucovorin (400 mg/m² days 1 and 15), and infusional fluorouracil (2400 mg/m² days 1-2 and 15-16; fluorouracil plus oxaliplatin group). Randomisation was done with computer-generated block-randomisation codes stratified by centre, clinical T category (cT1-3 vs cT4), and clinical N category (cN0 vs cN1-2) without masking. DFS is the primary endpoint. Secondary endpoints, including toxicity, compliance, and histopathological response are reported here. Safety and compliance analyses included patients as treated; efficacy endpoints were analysed according to the intention-to-treat principle. This study is registered with ClinicalTrials.gov, number NCT00349076.
FINDINGS Of the 1265 patients initially enrolled, 1236 were evaluable (613 in the fluorouracil plus oxaliplatin group and 623 in the fluorouracil group). Preoperative grade 3-4 toxic effects occurred in 140 (23%) of 606 patients who actually received fluorouracil and oxaliplatin during chemoradiotherapy and in 127 (20%) of 624 patients who actually received fluorouracil chemoradiotherapy. Grade 3-4 diarrhoea was more common in those who received fluorouracil and oxaliplatin during chemoradiotherapy than in those who received fluorouracil during chemoradiotherapy (73 patients [12%] vs 52 patients [8%]), as was grade 3-4 nausea or vomiting (23 [4%] vs nine [1%]). 516 (85%) of the 606 patients who received fluorouracil and oxaliplatin-based chemoradiotherapy had the full dose of chemotherapy, and 571 (94%) had the full dose of radiotherapy; as did 495 (79%) and 601 (96%) of 624 patients who received fluorouracil-based chemoradiotherapy, respectively. A pathological complete response was achieved in 103 (17%) of 591 patients who underwent surgery in the fluorouracil and oxaliplatin group and in 81 (13%) of 606 patients who underwent surgery in the fluorouracil group (odds ratio 1·40, 95% CI 1·02-1·92; p=0·038). In the fluorouracil and oxaliplatin group, 352 (81%) of 435 patients who began adjuvant chemotherapy completed all cycles (with or without dose reduction), as did 386 (83%) of 463 patients in the fluorouracil group. INTERPRETATION Inclusion of oxaliplatin into modified fluorouracil-based combined modality treatment was feasible and led to more patients achieving a pathological complete response than did standard treatment. Longer follow-up is needed to assess DFS. FUNDING German Cancer Aid (Deutsche Krebshilfe).
The American Statistician | 2006
Torsten Hothorn; Kurt Hornik; Mark A. van de Wiel; Achim Zeileis
Conditioning on the observed data is an important and flexible design principle for statistical test procedures. Although generally applicable, permutation tests currently in use are limited to the treatment of special cases, such as contingency tables or K-sample problems. A new theoretical framework for permutation tests opens up the way to a unified and generalized view. This article argues that the transfer of such a theory to practical data analysis has important implications in many applications and requires tools that enable the data analyst to compute on the theoretical concepts as closely as possible. We reanalyze four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the “classical” test procedures.
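The idea of conditioning on the observed data can be illustrated with the most familiar special case, a two-sample permutation test. The sketch below is a Monte Carlo approximation using the absolute difference in means as the test statistic. It is written in plain Python for illustration, does not reproduce the coin package or its interface, and every name in it is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Monte Carlo permutation p-value for the absolute difference in group means."""
    rng = random.Random(seed)            # fixed seed so the result is reproducible
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)              # reassign group labels at random
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed - 1e-12:     # small tolerance for floating-point ties
            hits += 1
    # Counting the observed arrangement itself keeps the p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    treated = [11, 12, 13, 14, 15]
    control = [1, 2, 3, 4, 5]
    print(permutation_p_value(treated, control))
```

The reference distribution here is built only from rearrangements of the data actually observed, so no distributional assumption beyond exchangeability under the null hypothesis is required.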
Journal of Computational and Graphical Statistics | 2008
Achim Zeileis; Torsten Hothorn; Kurt Hornik
Recursive partitioning is embedded into the general and well-established class of parametric models that can be fitted using M-type estimators (including maximum likelihood). An algorithm for model-based recursive partitioning is suggested for which the basic steps are: (1) fit a parametric model to a dataset; (2) test for parameter instability over a set of partitioning variables; (3) if there is some overall parameter instability, split the model with respect to the variable associated with the highest instability; (4) repeat the procedure in each of the daughter nodes. The algorithm yields a partitioned (or segmented) parametric model that can be effectively visualized and that subject-matter scientists are used to analyzing and interpreting.
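The four steps listed in this abstract can be sketched in a few lines. The version below is a deliberately reduced stand-in and rests on several assumptions that are not taken from the paper: the "parametric model" is just a constant mean, instability is measured with a CUSUM statistic of the residuals ordered by each partitioning variable, its p-value comes from the asymptotic distribution of the supremum of a Brownian bridge, p-values are Bonferroni-adjusted across variables, the cut point minimises the squared error of the two daughter fits, and ties in a partitioning variable are ordered arbitrarily. All function names are hypothetical.

```python
import math
from statistics import mean


def sup_bridge_p_value(x):
    """Asymptotic P(sup |Brownian bridge| > x), via the Kolmogorov series."""
    if x <= 0:
        return 1.0
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * s))


def instability(y, z):
    """Scaled maximum of the cumulative residual sum, with residuals ordered by z."""
    n, mu = len(y), mean(y)
    resid = [v - mu for v in y]
    sigma = math.sqrt(sum(r * r for r in resid) / n)
    if sigma == 0.0:
        return 0.0
    running, worst = 0.0, 0.0
    for i in sorted(range(n), key=lambda i: z[i]):
        running += resid[i]
        worst = max(worst, abs(running))
    return worst / (sigma * math.sqrt(n))


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(y, z, min_size):
    """Cut point on z that minimises the summed squared error of two constant fits."""
    pairs = sorted(zip(z, y))
    best_cost, best_value = None, None
    for k in range(min_size, len(pairs) - min_size + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                      # cannot cut between equal z values
        cost = sse([p[1] for p in pairs[:k]]) + sse([p[1] for p in pairs[k:]])
        if best_cost is None or cost < best_cost:
            best_cost, best_value = cost, (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_value


def mob(y, Z, alpha=0.05, min_size=5):
    """Z maps a variable name to its list of values; returns the tree as nested dicts."""
    node = {"n": len(y), "mean": mean(y)}              # step 1: fit the model
    if len(y) < 2 * min_size or not Z:
        return node
    p = {name: sup_bridge_p_value(instability(y, z)) for name, z in Z.items()}  # step 2
    name = min(p, key=p.get)
    if min(1.0, p[name] * len(Z)) >= alpha:            # no overall instability: stop
        return node
    cut = best_cut(y, Z[name], min_size)               # step 3: split on that variable
    if cut is None:
        return node

    def subset(keep):
        idx = [i for i in range(len(y)) if keep(Z[name][i])]
        return [y[i] for i in idx], {k: [v[i] for i in idx] for k, v in Z.items()}

    node.update(split=name, cut=cut,
                left=mob(*subset(lambda v: v <= cut), alpha, min_size),   # step 4
                right=mob(*subset(lambda v: v > cut), alpha, min_size))
    return node
```

Separating the choice of the split variable (a test) from the choice of the cut point (an optimisation) is the design point the sketch tries to mirror; a realistic implementation would fit richer models in each node and use the instability tests described in the paper.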
Computational Statistics & Data Analysis | 2003
Torsten Hothorn; Berthold Lausen
The construction of simple classification rules is a frequent problem in medical research. Maximally selected rank statistics allow the evaluation of cutpoints, which provide the classification of observations into two groups by a continuous or ordinal predictor variable. The computation of the exact distribution of a maximally selected rank statistic is discussed and a new lower bound of the distribution is derived based on an extension of an algorithm for the exact distribution of a linear rank statistic. Therefore, the test based on the upper bound of the P-value is of level α. For small to moderate sample sizes the lower bound of the exact distribution is a substantial improvement compared to approximations based on an improved Bonferroni inequality or based on the asymptotic Gaussian process. The lower bound of the distribution is compared to the exact distribution by means of a simulation study and the proposal is illustrated by three clinical studies.
Lancet Oncology | 2015
Claus Rödel; Ullrich Graeven; Rainer Fietkau; Werner Hohenberger; Torsten Hothorn; Dirk Arnold; Ralf-Dieter Hofheinz; Michael Ghadimi; Hendrik A. Wolff; Marga Lang-Welzenbach; Hans-Rudolf Raab; Christian Wittekind; Philipp Ströbel; Ludger Staib; Martin Wilhelm; Gerhard G. Grabenbauer; Hans Hoffmanns; Fritz Lindemann; Anke Schlenska-Lange; Gunnar Folprecht; Rolf Sauer; Torsten Liersch
BACKGROUND Preoperative chemoradiotherapy with infusional fluorouracil, total mesorectal excision surgery, and postoperative chemotherapy with fluorouracil was established by the German CAO/ARO/AIO-94 trial as a standard combined modality treatment for locally advanced rectal cancer. Here we compare the previously established regimen with an investigational regimen in which oxaliplatin was added to both preoperative chemoradiotherapy and postoperative chemotherapy. METHODS In this multicentre, open-label, randomised, phase 3 study we randomly assigned patients with rectal adenocarcinoma, clinically staged as cT3-4 or any node-positive disease, to two groups: a control group receiving standard fluorouracil-based combined modality treatment, consisting of preoperative radiotherapy of 50·4 Gy in 28 fractions plus infusional fluorouracil (1000 mg/m² on days 1-5 and 29-33), followed by surgery and four cycles of bolus fluorouracil (500 mg/m² on days 1-5 and 29); or to an investigational group receiving preoperative radiotherapy of 50·4 Gy in 28 fractions plus infusional fluorouracil (250 mg/m² on days 1-14 and 22-35) and oxaliplatin (50 mg/m² on days 1, 8, 22, and 29), followed by surgery and eight cycles of oxaliplatin (100 mg/m² on days 1 and 15), leucovorin (400 mg/m² on days 1 and 15), and infusional fluorouracil (2400 mg/m² on days 1-2 and 15-16). Randomisation was done with computer-generated block-randomisation codes stratified by centre, clinical T category (cT1-3 vs cT4), and clinical N category (cN0 vs cN1-2) without masking. The primary endpoint was disease-free survival, defined as the time between randomisation and non-radical surgery of the primary tumour (R2 resection), locoregional recurrence after R0/1 resection, metastatic disease or progression, or death from any cause, whichever occurred first.
Survival and cumulative incidence of recurrence analyses followed the intention-to-treat principle; toxicity analyses included all patients treated. Enrolment of patients in this trial is completed and follow-up is ongoing. This study is registered with ClinicalTrials.gov, number NCT00349076. FINDINGS Of the 1265 patients initially enrolled, 1236 were assessable (613 in the investigational group and 623 in the control group). With a median follow-up of 50 months (IQR 38-61), disease-free survival at 3 years was 75·9% (95% CI 72·4-79·5) in the investigational group and 71·2% (95% CI 67·6-74·9) in the control group (hazard ratio [HR] 0·79, 95% CI 0·64-0·98; p=0·03). Preoperative grade 3-4 toxic effects occurred in 144 (24%) of 607 patients who actually received fluorouracil and oxaliplatin during chemoradiotherapy and in 128 (20%) of 625 patients who actually received fluorouracil chemoradiotherapy. Of 445 patients who actually received adjuvant fluorouracil and leucovorin and oxaliplatin, 158 (36%) had grade 3-4 toxic effects, as did 170 (36%) of 470 patients who actually received adjuvant fluorouracil. Late grade 3-4 adverse events in patients who received protocol-specified preoperative and postoperative treatment occurred in 112 (25%) of 445 patients in the investigational group, and in 100 (21%) of 470 patients in the control group. INTERPRETATION Adding oxaliplatin to fluorouracil-based neoadjuvant chemoradiotherapy and adjuvant chemotherapy (at the doses and intensities used in this trial) significantly improved disease-free survival of patients with clinically staged cT3-4 or cN1-2 rectal cancer compared with our former fluorouracil-based combined modality regimen (based on CAO/ARO/AIO-94). The regimen established by CAO/ARO/AIO-04 can be deemed a new treatment option for patients with locally advanced rectal cancer. FUNDING German Cancer Aid (Deutsche Krebshilfe).