Thomas Kneib
University of Göttingen
Publication
Featured research published by Thomas Kneib.
Journal of the American Statistical Association | 2011
Nora Fenske; Thomas Kneib; Torsten Hothorn
We investigated the risk factors for childhood malnutrition in India based on the 2005/2006 Demographic and Health Survey by applying a novel estimation technique for additive quantile regression. Ordinary linear and generalized linear regression models relate the mean of a response variable to a linear combination of covariate effects, and, as a consequence, focus on average properties of the response. The use of such a regression model for analyzing childhood malnutrition in developing or transition countries implies that the estimated effects describe the average nutritional status. However, it is of even greater interest to analyze quantiles of the response distribution, such as the 5% or 10% quantile, which relate to the risk of extreme malnutrition. Our investigation is based on a semiparametric extension of quantile regression models where different types of nonlinear effects are included in the model equation, leading to additive quantile regression. We addressed the variable selection and model choice problems associated with estimating such an additive quantile regression model using a novel boosting approach. Our proposal allows for data-driven determination of the amount of smoothness required for the nonlinear effects and combines model choice with an automatic variable selection property. In an empirical evaluation, we compared our boosting approach with state-of-the-art methods for additive quantile regression. The results suggest that boosting is an appropriate tool for estimation and variable selection in additive quantile regression models and helps to identify yet unknown risk factors for childhood malnutrition. This article has supplementary material online.
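As a rough illustration of the boosting idea summarised in this abstract, the following sketch runs component-wise gradient boosting on the quantile (check) loss with simple linear base learners. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the step length `nu`, the number of iterations, the crude empirical-quantile offset, and the restriction to linear (rather than smooth nonlinear) base learners are all choices made here for brevity.

```python
def total_check_loss(y, fit, tau):
    """Sum of asymmetrically weighted absolute deviations (check loss)."""
    return sum((tau if yi > fi else 1.0 - tau) * abs(yi - fi)
               for yi, fi in zip(y, fit))


def quantile_boost(X, y, tau, n_iter=200, nu=0.1):
    """Component-wise boosting for a linear tau-quantile model.

    X is a list of rows (covariates assumed roughly centred), y a list of
    responses. Returns (offset, coefficients). Hypothetical helper.
    """
    n, p = len(y), len(X[0])
    # Crude starting value: an empirical tau-quantile of the response.
    offset = sorted(y)[int(tau * (n - 1))]
    coefs = [0.0] * p
    fit = [offset] * n
    for _ in range(n_iter):
        # Negative gradient of the check loss with respect to the fit.
        u = [tau if yi > fi else tau - 1.0 for yi, fi in zip(y, fit)]
        best = None
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue
            # Least-squares fit of the gradient on covariate j alone.
            beta = sum(v * ui for v, ui in zip(xj, u)) / sxx
            rss = sum((ui - beta * v) ** 2 for v, ui in zip(xj, u))
            if best is None or rss < best[0]:
                best = (rss, j, beta)
        if best is None:
            break
        _, j, beta = best
        # Update only the best-fitting component, damped by nu.
        coefs[j] += nu * beta
        fit = [fi + nu * beta * row[j] for fi, row in zip(fit, X)]
    return offset, coefs
```

Because only one component is updated per iteration, covariates that are never selected keep a coefficient of exactly zero. In this sketch, that is how stopping the algorithm early produces variable selection.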
Biometrics | 2009
Thomas Kneib; Torsten Hothorn; Gerhard Tutz
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
Statistical Modelling | 2013
Elisabeth Waldmann; Thomas Kneib; Yu Ryan Yu; Stefan Lang
Quantile regression provides a convenient framework for analyzing the impact of covariates on the complete conditional distribution of a response variable instead of only the mean. While frequentist treatments of quantile regression are typically completely nonparametric, a Bayesian formulation relies on assuming the asymmetric Laplace distribution as auxiliary error distribution that yields posterior modes equivalent to frequentist estimates. In this paper, we utilize a location-scale mixture of normals representation of the asymmetric Laplace distribution to transfer different flexible modelling concepts from Gaussian mean regression to Bayesian semiparametric quantile regression. In particular, we will consider high-dimensional geoadditive models comprising LASSO regularization priors and mixed models with potentially non-normal random effects distribution modeled via a Dirichlet process mixture. These extensions are illustrated using two large-scale applications on net rents in Munich and longitudinal measurements on obesity among children. The impact of the likelihood misspecification that underlies the Bayesian formulation of quantile regression is studied in terms of simulations.
Infection Control and Hospital Epidemiology | 2009
Jan Beyersmann; Thomas Kneib; Martin Schumacher; Petra Gastmeier
Nosocomial pneumonia and its impact on length of stay are major healthcare concerns. From an epidemiological perspective, nosocomial pneumonia is a time-dependent event. Any statistical analysis that does not explicitly model this time dependency will be biased. The bias is not redeemed by adjusting for baseline information.
Journal of the American Statistical Association | 2010
Tatyana Krivobokova; Thomas Kneib; Gerda Claeskens
In this article we construct simultaneous confidence bands for a smooth curve using penalized spline estimators. We consider three types of estimation methods: (a) as a standard (fixed effect) nonparametric model, (b) using the mixed-model framework with the spline coefficients as random effects, and (c) a full Bayesian approach. The volume-of-tube formula is applied for the first two methods and compared with Bayesian simultaneous confidence bands from a frequentist perspective. We show that the mixed-model formulation of penalized splines can help obtain, at least approximately, confidence bands with either Bayesian or frequentist properties. Simulations and data analysis support the proposed methods. The R package ConfBands accompanies the article.
Statistics and Computing | 2010
Ludwig Fahrmeir; Thomas Kneib; Susanne Konrath
This paper surveys various shrinkage, smoothing and selection priors from a unifying perspective and shows how to combine them for Bayesian regularisation in the general class of structured additive regression models. As a common feature, all regularisation priors are conditionally Gaussian, given further parameters regularising model complexity. Hyperpriors for these parameters encourage shrinkage, smoothness or selection. It is shown that these regularisation (log-) priors can be interpreted as Bayesian analogues of several well-known frequentist penalty terms. Inference can be carried out with unified and computationally efficient MCMC schemes, estimating regularised regression coefficients and basis function coefficients simultaneously with complexity parameters and measuring uncertainty via corresponding marginal posteriors. For variable and function selection we discuss several variants of spike and slab priors which can also be cast into the framework of conditionally Gaussian priors. The performance of the Bayesian regularisation approaches is demonstrated in a hazard regression model and a high-dimensional geoadditive regression model.
Journal of the American Statistical Association | 2015
Nadja Klein; Thomas Kneib; Stefan Lang
Frequent problems in applied research preventing the application of the classical Poisson log-linear model for analyzing count data include overdispersion, an excess of zeros compared to the Poisson distribution, correlated responses, as well as complex predictor structures comprising nonlinear effects of continuous covariates, interactions or spatial effects. We propose a general class of Bayesian generalized additive models for zero-inflated and overdispersed count data within the framework of generalized additive models for location, scale, and shape where semiparametric predictors can be specified for several parameters of a count data distribution. As standard options for applied work we consider the zero-inflated Poisson, the negative binomial and the zero-inflated negative binomial distribution. The additive predictor specifications rely on basis function approximations for the different types of effects in combination with Gaussian smoothness priors. We develop Bayesian inference based on Markov chain Monte Carlo simulation techniques where suitable proposal densities are constructed based on iteratively weighted least squares approximations to the full conditionals. To ensure practicability of the inference, we consider theoretical properties like the involved question whether the joint posterior is proper. The proposed approach is evaluated in simulation studies and applied to count data arising from patent citations and claim frequencies in car insurances. For the comparison of models with respect to the distribution, we consider quantile residuals as an effective graphical device and scoring rules that allow us to quantify the predictive ability of the models. The deviance information criterion is used to select appropriate predictor specifications once a response distribution has been chosen. Supplementary materials for this article are available online.
Ecological Monographs | 2011
Torsten Hothorn; Jörg Müller; Boris Schröder; Thomas Kneib; Roland Brandl
Species distribution models are an important tool to predict the impact of global change on species distributional ranges and community assemblages. Although considerable progress has been made in the statistical modeling during the last decade, many approaches still ignore important features of species distributions, such as nonlinearity and interactions between predictors, spatial autocorrelation, and nonstationarity, or at most incorporate only some of these features. Ecologists, however, require a modeling framework that simultaneously addresses all these features flexibly and consistently. Here we describe such an approach that allows the estimation of the global effects of environmental variables in addition to local components dealing with spatiotemporal autocorrelation as well as nonstationary effects. The local components can be used to infer unknown spatiotemporal processes; the global component describes how the species is influenced by the environment and can be used for predictions, allowing ...
Statistics and Computing | 2014
Stefan Lang; Nikolaus Umlauf; Peter Wechselberger; Kenneth Harttgen; Thomas Kneib
Models with structured additive predictor provide a very broad and rich framework for complex regression modeling. They can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different types. In this paper, we propose a hierarchical or multilevel version of regression models with structured additive predictor where the regression coefficients of a particular nonlinear term may obey another regression model with structured additive predictor. In that sense, the model is composed of a hierarchy of complex structured additive regression models. The proposed model may be regarded as an extended version of a multilevel model with nonlinear covariate terms in every level of the hierarchy. The model framework is also the basis for generalized random slope modeling based on multiplicative random effects. Inference is fully Bayesian and based on Markov chain Monte Carlo simulation techniques. We provide an in-depth description of several highly efficient sampling schemes that allow complex models with several hierarchy levels and a large number of observations to be estimated within a couple of minutes (often even seconds). We demonstrate the practicability of the approach in a complex application on childhood undernutrition with large sample size and three hierarchy levels.
Computational Statistics & Data Analysis | 2012
Fabian Sobotka; Thomas Kneib
Quantile regression has emerged as one of the standard tools for regression analysis that enables a proper assessment of the complete conditional distribution of responses even in the presence of heteroscedastic errors. Quantile regression estimates are obtained by minimising an asymmetrically weighted sum of absolute deviations from the regression line, a decision theoretic formulation of the estimation problem that avoids a full specification of the error term distribution. Recent advances in mean regression have concentrated on making the regression structure more flexible by including nonlinear effects of continuous covariates, random effects or spatial effects. These extensions often rely on penalised least squares or penalised likelihood estimation with quadratic penalties and may therefore be difficult to combine with the linear programming approaches often considered in quantile regression. As a consequence, geoadditive expectile regression based on minimising an asymmetrically weighted sum of squared residuals is introduced. Different estimation procedures are presented including least asymmetrically weighted squares, boosting and restricted expectile regression. The properties of these procedures are investigated in a simulation study and an analysis on rental fees in Munich is provided where the geoadditive specification allows for an analysis of nonlinear effects of the size of flats or the year of construction and the spatial distribution of rents simultaneously.
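The contrast drawn in this abstract between the two loss functions can be sketched as follows for the simplest, intercept-only case. This is an illustrative sketch, not code from the paper: the function names, tolerance, and iteration cap are assumptions made here, and the geoadditive model terms are omitted. The sample expectile is found by least asymmetrically weighted squares, that is, by repeatedly computing a weighted mean whose weights depend on the sign of the current residuals.

```python
def check_loss(residual, tau):
    """Quantile loss: asymmetrically weighted absolute deviation."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * abs(residual)


def expectile_loss(residual, tau):
    """Expectile loss: asymmetrically weighted squared residual."""
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2


def sample_expectile(y, tau, tol=1e-10, max_iter=1000):
    """Intercept-only least asymmetrically weighted squares (hypothetical helper).

    Iterates weighted means until the estimate stops changing.
    """
    m = sum(y) / len(y)  # start at the mean, the 0.5-expectile
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau,
        # all others get weight 1 - tau.
        w = [tau if yi > m else 1.0 - tau for yi in y]
        new_m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m
```

Each iteration is an ordinary weighted least squares step, because the loss is quadratic once the weights are fixed. That is why expectiles combine more easily with quadratic penalties than the linear programming formulations common in quantile regression.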