
Publication


Featured research published by Achim Zeileis.


Journal of Computational and Graphical Statistics | 2006

Unbiased Recursive Partitioning: A Conditional Inference Framework

Torsten Hothorn; Kurt Hornik; Achim Zeileis

Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, however lacking a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions and therefore the models induced by both approaches are structurally different, confirming the need for an unbiased variable selection. Moreover, it is shown that the prediction accuracy of trees with early stopping is equivalent to the prediction accuracy of pruned trees with unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on glaucoma classification, node positive breast cancer survival and mammography experience are re-analyzed.


BMC Bioinformatics | 2007

Bias in random forest variable importance measures: Illustrations, sources and a solution

Carolin Strobl; Anne-Laure Boulesteix; Achim Zeileis; Torsten Hothorn

Background: Variable importance measures for random forests have been receiving increased attention as a means of variable selection in many classification tasks in bioinformatics and related scientific fields, for instance to select a subset of genetic markers relevant for the prediction of a certain disease. We show that random forest variable importance measures are a sensible means for variable selection in many applications, but are not reliable in situations where potential predictor variables vary in their scale of measurement or their number of categories. This is particularly important in genomics and computational biology, where predictors often include variables of different types, for example when predictors include both sequence data and continuous variables such as folding energy, or when amino acid sequence data show different numbers of categories.
Results: Simulation studies are presented illustrating that, when random forest variable importance measures are used with data of varying types, the results are misleading because suboptimal predictor variables may be artificially preferred in variable selection. The two mechanisms underlying this deficiency are biased variable selection in the individual classification trees used to build the random forest on one hand, and effects induced by bootstrap sampling with replacement on the other hand.
Conclusion: We propose to employ an alternative implementation of random forests, that provides unbiased variable selection in the individual classification trees. When this method is applied using subsampling without replacement, the resulting variable importance measures can be used reliably for variable selection even in situations where the potential predictor variables vary in their scale of measurement or their number of categories. The usage of both random forest algorithms and their variable importance measures in the R system for statistical computing is illustrated and documented thoroughly in an application re-analyzing data from a study on RNA editing. Therefore the suggested method can be applied straightforwardly by scientists in bioinformatics research.


Computational Statistics & Data Analysis | 2003

Testing and dating of structural changes in practice

Achim Zeileis; Christian Kleiber; Walter Krämer; Kurt Hornik

The paper presents an approach to the analysis of data that contains (multiple) structural changes in a linear regression setup. We implement various strategies which have been suggested in the literature for testing against structural changes as well as a dynamic programming algorithm for the dating of the breakpoints in the R statistical software package. Using historical data on Nile river discharges, road casualties in Great Britain and oil prices in Germany it is shown that changes in the mean of a time series as well as in the coefficients of a linear regression are easily matched with identifiable historical, political or economic events.
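
The dating step mentioned in this abstract can be illustrated with a small dynamic program. The following is a minimal sketch in Python, not the R implementation the paper describes. It assumes the simplest setting, shifts in the mean of a series, and minimizes the total residual sum of squares over all placements of a fixed number of breaks. The names `date_breaks` and `segment_rss` and the `min_len` parameter are hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def segment_rss(prefix, prefix_sq, i, j):
    """RSS of y[i:j] around its own mean, computed from prefix sums."""
    n = j - i
    s = prefix[j] - prefix[i]
    sq = prefix_sq[j] - prefix_sq[i]
    return sq - s * s / n


def date_breaks(y, n_breaks, min_len=2):
    """Return (breaks, rss): break positions minimizing the total RSS.

    A break at position b means a new segment starts at y[b] (0-based).
    Illustrative sketch for a mean-shift model only.
    """
    n = len(y)
    if n < (n_breaks + 1) * min_len:
        raise ValueError("series too short for the requested segmentation")
    prefix = [0.0] + list(accumulate(y))
    prefix_sq = [0.0] + list(accumulate(v * v for v in y))
    inf = float("inf")
    # cost[k][j]: minimal RSS when y[:j] is cut into k + 1 segments
    cost = [[inf] * (n + 1) for _ in range(n_breaks + 1)]
    back = [[0] * (n + 1) for _ in range(n_breaks + 1)]
    for j in range(min_len, n + 1):
        cost[0][j] = segment_rss(prefix, prefix_sq, 0, j)
    for k in range(1, n_breaks + 1):
        for j in range((k + 1) * min_len, n + 1):
            # i is the start of the last segment y[i:j]
            for i in range(k * min_len, j - min_len + 1):
                c = cost[k - 1][i] + segment_rss(prefix, prefix_sq, i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    # Trace the optimal break positions back from the end of the series
    breaks, j = [], n
    for k in range(n_breaks, 0, -1):
        j = back[k][j]
        breaks.append(j)
    return sorted(breaks), cost[n_breaks][n]
```

The sketch takes the number of breaks as given; choosing that number, and handling regressions with more than an intercept, are outside its scope.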


The American Statistician | 2006

A Lego System for Conditional Inference

Torsten Hothorn; Kurt Hornik; Mark A. van de Wiel; Achim Zeileis

Conditioning on the observed data is an important and flexible design principle for statistical test procedures. Although generally applicable, permutation tests currently in use are limited to the treatment of special cases, such as contingency tables or K-sample problems. A new theoretical framework for permutation tests opens up the way to a unified and generalized view. This article argues that the transfer of such a theory to practical data analysis has important implications in many applications and requires tools that enable the data analyst to compute on the theoretical concepts as closely as possible. We reanalyze four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the “classical” test procedures.
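
To make the idea of conditioning on the observed data concrete, here is a minimal Python sketch of an exact two-sample permutation test. It is not the coin package's interface and covers only one of the special cases the abstract mentions. The function name `exact_permutation_test` and the choice of the absolute difference in means as test statistic are assumptions made for illustration.

```python
from itertools import combinations


def exact_permutation_test(x, y):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way of assigning len(x) of the pooled observations
    to the first group, so it is only practical for small samples.
    Returns (observed statistic, p-value).
    """
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    total = sum(pooled)

    def stat(idx):
        # Absolute difference between the two group means
        sx = sum(pooled[i] for i in idx)
        return abs(sx / n_x - (total - sx) / (n - n_x))

    observed = stat(range(n_x))
    count = 0
    n_perm = 0
    for idx in combinations(range(n), n_x):
        n_perm += 1
        # Small tolerance guards against floating-point ties
        if stat(idx) >= observed - 1e-12:
            count += 1
    return observed, count / n_perm
```

The reference distribution here is built only from rearrangements of the data actually observed, which is the conditioning principle the abstract refers to.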


Journal of Computational and Graphical Statistics | 2008

Model-Based Recursive Partitioning

Achim Zeileis; Torsten Hothorn; Kurt Hornik

Recursive partitioning is embedded into the general and well-established class of parametric models that can be fitted using M-type estimators (including maximum likelihood). An algorithm for model-based recursive partitioning is suggested for which the basic steps are: (1) fit a parametric model to a dataset; (2) test for parameter instability over a set of partitioning variables; (3) if there is some overall parameter instability, split the model with respect to the variable associated with the highest instability; (4) repeat the procedure in each of the daughter nodes. The algorithm yields a partitioned (or segmented) parametric model that can be effectively visualized and that subject-matter scientists are used to analyzing and interpreting.
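
The four steps listed in this abstract can be sketched in Python under strong simplifying assumptions. The model in each node is just a mean, its scores are the residuals, and instability is assessed with a CUSUM-type statistic whose asymptotic p-value comes from the supremum of a Brownian bridge, Bonferroni-adjusted across variables. This stands in for the paper's general M-fluctuation tests and is not the authors' implementation. All names (`mob_sketch`, `instability_stat`, `best_split`, `min_size`) are hypothetical.

```python
import math


def sup_bridge_pvalue(x, terms=100):
    """Asymptotic P(sup |Brownian bridge| > x), Kolmogorov series."""
    if x < 0.2:  # the probability is 1 to many digits here
        return 1.0
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def instability_stat(y, z):
    """Scaled sup of cumulative residuals, ordered by z (ties arbitrary)."""
    n = len(y)
    mean = sum(y) / n
    resid = [v - mean for v in y]
    sd = math.sqrt(sum(r * r for r in resid) / n)
    if sd <= 1e-12:
        return 0.0
    cum, sup = 0.0, 0.0
    for i in sorted(range(n), key=lambda i: z[i]):
        cum += resid[i]
        sup = max(sup, abs(cum))
    return sup / (sd * math.sqrt(n))


def rss(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(y, z, min_size):
    """Cutpoint on z minimizing the summed RSS of the two children."""
    pairs = sorted(zip(z, y))
    best_rss, best_cut = float("inf"), None
    for i in range(min_size, len(pairs) - min_size + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between tied values
        total = rss([p[1] for p in pairs[:i]]) + rss([p[1] for p in pairs[i:]])
        if total < best_rss:
            best_rss, best_cut = total, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut


def mob_sketch(y, Z, alpha=0.05, min_size=10):
    """Z maps variable names to value lists; returns a nested dict tree."""
    n = len(y)
    node = {"n": n, "mean": sum(y) / n}          # (1) fit the model
    if n < 2 * min_size:
        return node
    pvals = {name: min(1.0, len(Z) * sup_bridge_pvalue(instability_stat(y, z)))
             for name, z in Z.items()}           # (2) test instability
    name = min(pvals, key=pvals.get)
    if pvals[name] >= alpha:
        return node
    cut = best_split(y, Z[name], min_size)       # (3) split on that variable
    if cut is None:
        return node
    node.update(split=name, cut=cut, p=pvals[name])
    for side, keep in (("left", lambda v: v <= cut), ("right", lambda v: v > cut)):
        idx = [i for i in range(n) if keep(Z[name][i])]
        node[side] = mob_sketch([y[i] for i in idx],  # (4) recurse
                                {k: [v[i] for i in idx] for k, v in Z.items()},
                                alpha, min_size)
    return node
```

Stopping is driven by the significance test rather than by pruning, which mirrors the structure of the algorithm described above. A richer node model would replace the mean and its residuals with the estimated parameters and scores of that model.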


Econometric Reviews | 2005

A Unified Approach to Structural Change Tests Based on ML Scores, F Statistics, and OLS Residuals

Achim Zeileis

Three classes of structural change tests (or tests for parameter instability) that have been receiving much attention in both the statistics and the econometrics communities but have been developed in rather loosely connected lines of research are unified by embedding them into the framework of generalized M-fluctuation tests (Zeileis and Hornik, 2003). These classes are tests based on maximum likelihood scores (including the Nyblom–Hansen test), on F statistics (sup F, ave F, exp F tests), and on OLS residuals (OLS-based CUSUM and MOSUM tests). We show that (representatives from) these classes are special cases of the generalized M-fluctuation tests, based on the same functional central limit theorem but employing different functionals for capturing excessive fluctuations. After embedding these tests into the same framework and thus understanding the relationship between these procedures for testing in historical samples, it is shown how the tests can also be extended to a monitoring situation. This is achieved by establishing a general M-fluctuation monitoring procedure and then applying the different functionals corresponding to monitoring with ML scores, F statistics, and OLS residuals. In particular, an extension of the sup F test to a monitoring scenario is suggested and illustrated on a real-world data set.


Journal of Computational and Graphical Statistics | 2005

The Design and Analysis of Benchmark Experiments

Torsten Hothorn; Friedrich Leisch; Achim Zeileis; Kurt Hornik

The assessment of the performance of learners by means of benchmark experiments is an established exercise. In practice, benchmark studies are a tool to compare the performance of several competing algorithms for a certain learning problem. Cross-validation or resampling techniques are commonly used to derive point estimates of the performances which are compared to identify algorithms with good properties. For several benchmarking problems, test procedures taking the variability of those point estimates into account have been suggested. Most of the recently proposed inference procedures are based on special variance estimators for the cross-validated performance. We introduce a theoretical framework for inference problems in benchmark experiments and show that standard statistical test procedures can be used to test for differences in the performances. The theory is based on well-defined distributions of performance measures which can be compared with established tests. To demonstrate the usefulness in practice, the theoretical results are applied to regression and classification benchmark studies based on artificial and real world data.


Remote Sensing | 2013

Shifts in Global Vegetation Activity Trends

Rogier de Jong; Jan Verbesselt; Achim Zeileis; Michael E. Schaepman

Vegetation belongs to the components of the Earth surface, which are most extensively studied using historic and present satellite records. Recently, these records exceeded a 30-year time span composed of preprocessed fortnightly observations (1981–2011). The existence of monotonic changes and trend shifts present in such records has previously been demonstrated. However, information on timing and type of such trend shifts was lacking at global scale. In this work, we detected major shifts in vegetation activity trends and their associated type (either interruptions or reversals) and timing. It appeared that the biospheric trend shifts have, over time, increased in frequency, confirming recent findings of increased turnover rates in vegetated areas. Signs of greening-to-browning reversals around the millennium transition were found in many regions (Patagonia, the Sahel, northern Kazakhstan, among others), as well as negative interruptions—“setbacks”—in greening trends (southern Africa, India, Asia Minor, among others). A minority (26%) of all significant trends appeared monotonic.


Biometrics | 2008

Generalized Maximally Selected Statistics

Torsten Hothorn; Achim Zeileis

Maximally selected statistics for the estimation of simple cutpoint models are embedded into a generalized conceptual framework based on conditional inference procedures. This powerful framework contains most of the published procedures in this area as special cases, such as maximally selected χ² and rank statistics, but also allows for direct construction of new test procedures for less standard test problems. As an application, a novel maximally selected rank statistic is derived from this framework for a censored response partitioned with respect to two ordered categorical covariates and potential interactions. This new test is employed to search for a high-risk group of rectal cancer patients treated with a neo-adjuvant chemoradiotherapy. Moreover, a new efficient algorithm for the evaluation of the asymptotic distribution for a large class of maximally selected statistics is given enabling the fast evaluation of a large number of cutpoints.


Canadian Journal of Zoology | 2009

Responding to spatial and temporal variations in predation risk: space use of a game species in a changing landscape of fear

Vincent Tolon; Stéphane Dray; Anne Loison; Achim Zeileis; Claude Fischer; E. Baubet

Predators generate a “landscape of fear” within which prey can minimize the risk of predation by selecting low-risk areas. Depending on the spatial structure of this “landscape”, i.e., whether it is coarse- or fine-grained, prey may respond to increased risk by shifting their home ranges or by fine-scale redistributions within these ranges, respectively. We studied how wild boar (Sus scrofa L., 1758) responded to temporal changes in risk in hunted areas (risky habitat) surrounding a nature reserve (refuge habitat). Animals with home ranges “in contact” with the reserve during the low-risk season were the only ones to shift toward the refuge when the risk increased. These shifts occurred at two temporal scales in response to the increased risk during the daytime and during the hunting season. Whereas animals not influenced by the reserve found food and shelter in forest during the hunting season, shifts to the refuge area were detrimental to the rather scarce forest areas in the reserve. This confirms that...

Collaboration

Top Co-Authors

Kurt Hornik

Vienna University of Economics and Business

Christian Kleiber

Technical University of Dortmund

Carolin Strobl

Ludwig Maximilian University of Munich

David Meyer

Vienna University of Economics and Business

Christoph Leitner

Vienna University of Economics and Business

Jan Verbesselt

Wageningen University and Research Centre