Ludger Evers
University of Glasgow
Publications
Featured research published by Ludger Evers.
Statistics and Computing | 2005
Jochen Einbeck; Gerhard Tutz; Ludger Evers
Principal components are a well-established tool in dimension reduction. The extension to principal curves allows for general smooth curves which pass through the middle of a multidimensional data cloud. In this paper local principal curves are introduced, which are based on the localization of principal component analysis. The proposed algorithm is able to identify closed curves as well as multiple curves which may or may not be connected. For the evaluation of the performance of principal curves as a tool for data reduction, a measure of coverage is suggested. Using simulated and real data sets, the approach is compared to various alternative concepts of principal curves.
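The core of the algorithm is simple enough to sketch in a few lines: repeatedly compute a kernel-weighted local mean and first local principal component, then step along that component. The Python sketch below is purely illustrative (the bandwidth, step size and stopping rule are placeholders, and the published algorithm additionally handles branching, closed curves and the coverage measure):

    import numpy as np

    def local_principal_curve(X, x0, h=0.3, step=0.15, n_steps=60):
        """Trace one branch of a local principal curve starting from x0."""
        curve, x, direction = [], np.asarray(x0, dtype=float), None
        for _ in range(n_steps):
            w = np.exp(-0.5 * np.sum((X - x) ** 2, axis=1) / h ** 2)  # kernel weights
            if w.sum() < 1e-8:                             # left the data cloud: stop
                break
            mu = (w[:, None] * X).sum(axis=0) / w.sum()               # local mean
            C = (w[:, None] * (X - mu)).T @ (X - mu) / w.sum()        # local covariance
            v = np.linalg.eigh(C)[1][:, -1]                           # first local PC
            if direction is not None and v @ direction < 0:
                v = -v                                     # keep walking the same way
            curve.append(mu)
            x, direction = mu + step * v, v
        return np.array(curve)

    rng = np.random.default_rng(1)
    t = rng.uniform(0, 2 * np.pi, 300)                                # noisy circle
    X = np.c_[np.cos(t), np.sin(t)] + 0.05 * rng.standard_normal((300, 2))
    print(local_principal_curve(X, X[0]).shape)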
Environmental Modelling and Software | 2014
Wayne R. Jones; Michael J. Spence; Adrian Bowman; Ludger Evers; Daniel A. Molinari
The GroundWater Spatiotemporal Data Analysis Tool (GWSDAT) is a user-friendly, open source decision support tool for the analysis and reporting of groundwater monitoring data. Uniquely, GWSDAT applies a spatiotemporal model smoother for a more coherent and smooth interpretation of the interaction between the spatial and time-series components of groundwater solute concentrations. Data entry is via a standardised Microsoft Excel input template, whilst the underlying statistical modelling and graphical output are generated using the open source statistical program R. This paper describes in detail the various plotting options available and how the graphical user interface can be used for rapid, rigorous and interactive trend analysis with facilitated report generation. GWSDAT has been used extensively in the assessment of soil and groundwater conditions at Shell's downstream assets, and the discussion section describes the benefits of its applied use. Finally, some consideration is given to possible future developments.
arXiv: Astrophysics | 2008
Jochen Einbeck; Ludger Evers; Coryn A. L. Bailer-Jones
Often the relation between the variables constituting a multivariate data space might be characterized by one or more of the terms: “nonlinear”, “branched”, “disconnected”, “bent”, “curved”, “heterogeneous”, or, more generally, “complex”. In these cases, simple principal component analysis (PCA) as a tool for dimension reduction can fail badly. Of the many alternative approaches proposed so far, local approximations of PCA are among the most promising. This paper gives a short review of localized versions of PCA, focusing on local principal curves and local partitioning algorithms. Furthermore, we discuss projections other than the local principal components. When performing local dimension reduction for regression or classification problems it is important to focus not only on the manifold structure of the covariates, but also on the response variable(s). Local principal components only achieve the former, whereas localized regression approaches concentrate on the latter. Local projection directions derived from the partial least squares (PLS) algorithm offer an interesting trade-off between these two objectives. We apply these methods to several real data sets. In particular, we consider simulated astrophysical data from the future Galactic survey mission Gaia.
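The trade-off can be made concrete at a single query point: the first local PCA direction is driven by the covariates alone, whereas a first local PLS direction is the kernel-weighted covariance of the covariates with the response. A minimal Python sketch (illustrative only, not the paper's implementation):

    import numpy as np

    def local_directions(X, y, x0, h=1.0):
        """First local PCA and local PLS directions at query point x0."""
        w = np.exp(-0.5 * np.sum((X - x0) ** 2, axis=1) / h ** 2)  # kernel weights
        Xc = X - (w[:, None] * X).sum(axis=0) / w.sum()            # local centring
        yc = y - (w * y).sum() / w.sum()
        C = (w[:, None] * Xc).T @ Xc / w.sum()                     # local covariance
        pca_dir = np.linalg.eigh(C)[1][:, -1]                      # ignores y entirely
        pls_dir = (w[:, None] * Xc).T @ yc                         # covariance with y
        return pca_dir, pls_dir / np.linalg.norm(pls_dir)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    y = X[:, 0] + 0.1 * rng.standard_normal(200)   # only the first covariate drives y
    pca_dir, pls_dir = local_directions(X, y, X.mean(axis=0))
    print(pca_dir, pls_dir)                        # PLS direction aligns with x_1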
International Journal of Neural Systems | 2010
Jochen Einbeck; Ludger Evers; Benedict Powell
We consider principal curves and surfaces in the context of multivariate regression modelling. For predictor spaces featuring complex dependency patterns between the involved variables, the intrinsic dimensionality of the data tends to be very small due to the high redundancy induced by the dependencies. In situations of this type, it is useful to approximate the high-dimensional predictor space through a low-dimensional manifold (i.e., a curve or a surface), and use the projections onto the manifold as compressed predictors in the regression problem. In the case that the intrinsic dimensionality of the predictor space equals one, we use the local principal curve algorithm for the compression step. We provide a novel algorithm which extends this idea to local principal surfaces, thus covering cases of an intrinsic dimensionality equal to two, and which is in principle extendible to manifolds of arbitrary dimension. We motivate and apply the novel techniques using astrophysical and oceanographic data examples.
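As a rough illustration of the compression idea, the sketch below projects each observation onto the plane spanned by its two leading kernel-weighted local principal components. Note a deliberate simplification: these local coordinate frames are not stitched together into a single coherent surface parametrisation, which is precisely what the algorithm in the paper provides.

    import numpy as np

    def local_2d_coordinates(X, h=1.0):
        """Project each point onto the plane of its two leading local PCs."""
        Z = np.empty((len(X), 2))
        for i, x in enumerate(X):
            w = np.exp(-0.5 * np.sum((X - x) ** 2, axis=1) / h ** 2)
            mu = (w[:, None] * X).sum(axis=0) / w.sum()        # local mean
            C = (w[:, None] * (X - mu)).T @ (X - mu) / w.sum()
            V = np.linalg.eigh(C)[1][:, -2:]                   # two leading local PCs
            Z[i] = (x - mu) @ V                                # 2-D local coordinates
        return Z

    rng = np.random.default_rng(1)
    X = rng.standard_normal((400, 6))
    Z = local_2d_coordinates(X)   # compressed predictors for the regression step
    print(Z.shape)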
Journal of Computational and Graphical Statistics | 2009
Ludger Evers; T. J. Heaton
This article is concerned with the application of thresholding to the estimation of possibly sparse single-sequence data observed subject to noise. In such problems, accuracy can be greatly improved by selecting a threshold that adapts to the unknown signal strength. We set out a classification and regression tree approach aimed at partitioning a sequence of inhomogeneous strength into component homogeneous regions, within each of which we can independently set a locally adaptive threshold and thus improve estimation. Our method places a mixture prior on each coefficient consisting of an atom of probability at zero and a symmetric probability density. The mixing weight is chosen via Empirical Bayes. The decision on whether a split should occur is based on a score test. Having selected the partitioning and obtained the local mixing weight for each region, estimation is carried out using the posterior median. We evaluate the performance of our method in the single-sequence case and for wavelet denoising on both simulated and real data. In the wavelet context we consider two alternative implementations, splitting the coefficients levelwise and splitting the original domain. Our method is cheap to compute, and in numerical comparisons it shows excellent performance relative to current thresholding techniques. This article has supplementary material online.
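A hedged sketch of the within-region building block, assuming a Gaussian slab for the symmetric density (the paper allows other choices, and the CART-style partitioning and score test are omitted here): the mixing weight is chosen by marginal maximum likelihood, and each coefficient is then estimated by its posterior median.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq, minimize_scalar

    def eb_weight(d, tau2=4.0):
        """Empirical Bayes weight of the atom at zero, by marginal likelihood."""
        nll = lambda w: -np.sum(np.log(w * norm.pdf(d) +
                                       (1 - w) * norm.pdf(d, 0, np.sqrt(1 + tau2))))
        return minimize_scalar(nll, bounds=(1e-3, 1 - 1e-3), method='bounded').x

    def post_median(d, w, tau2=4.0):
        """Posterior median of theta given d ~ N(theta, 1), prior w*d_0 + (1-w)*N(0, tau2)."""
        rho = tau2 / (1 + tau2)
        m1 = (1 - w) * norm.pdf(d, 0, np.sqrt(1 + tau2))
        p1 = m1 / (w * norm.pdf(d) + m1)               # posterior P(theta != 0)
        below = p1 * norm.cdf(0, rho * d, np.sqrt(rho))
        if below < 0.5 <= below + (1 - p1):
            return 0.0                                 # the atom captures the median
        f = lambda t: p1 * norm.cdf(t, rho * d, np.sqrt(rho)) + (1 - p1) * (t >= 0) - 0.5
        return brentq(f, -50, 50)

    rng = np.random.default_rng(0)
    theta = np.r_[np.zeros(80), rng.normal(0, 3, 20)]  # sparse signal
    d = theta + rng.standard_normal(100)
    w = eb_weight(d)
    est = np.array([post_median(x, w) for x in d])
    print(w, np.mean((est - theta) ** 2))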
Fink, A., Lausen, B., Seidel, W., & Ultsch, A. (Eds.), Advances in Data Analysis, Data Handling and Business Intelligence (Studies in Classification, Data Analysis, and Knowledge Organization), Berlin: Springer, pp. 701–712 | 2009
Jochen Einbeck; Ludger Evers; Kirsty Hinchliff
Frequently the predictor space of a multivariate regression problem of the type y = m(x_1, …, x_p) + ε is intrinsically one-dimensional, or at least of far lower dimension than p. Usual modeling attempts such as the additive model y = m_1(x_1) + … + m_p(x_p) + ε, which try to reduce the complexity of the regression problem by making additional structural assumptions, are then inefficient as they ignore the inherent structure of the predictor space and involve complicated model and variable selection stages. In a fundamentally different approach, one may consider first approximating the predictor space by a (usually nonlinear) curve passing through it, and then regressing the response only against the one-dimensional projections onto this curve. This entails the reduction from a p-dimensional to a one-dimensional regression problem.
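The second stage is easy to make concrete: given a fitted curve (from any principal curve method), each observation is replaced by the arc-length position of its nearest curve point, and the response is regressed on that single coordinate. The Python sketch below assumes the curve is already available and uses a simple Nadaraya-Watson smoother for the one-dimensional regression; all names and settings are illustrative.

    import numpy as np

    def project_to_curve(X, curve):
        """Map each row of X to the arc-length position of its nearest curve point."""
        s = np.r_[0, np.cumsum(np.linalg.norm(np.diff(curve, axis=0), axis=1))]
        nearest = np.argmin(np.sum((X[:, None, :] - curve[None]) ** 2, axis=-1), axis=1)
        return s[nearest]

    def smooth_1d(t, y, t_new, h=0.5):
        """Nadaraya-Watson smoother for the reduced one-dimensional regression."""
        W = np.exp(-0.5 * ((t_new[:, None] - t[None, :]) / h) ** 2)
        return (W * y).sum(axis=1) / W.sum(axis=1)

    rng = np.random.default_rng(2)
    u = rng.uniform(0, 4 * np.pi, 400)                       # position along a helix
    X = np.c_[np.cos(u), np.sin(u), u / 4] + 0.05 * rng.standard_normal((400, 3))
    y = np.sin(u / 2) + 0.1 * rng.standard_normal(400)
    g = np.linspace(0, 4 * np.pi, 200)
    curve = np.c_[np.cos(g), np.sin(g), g / 4]               # assume curve already fitted
    t = project_to_curve(X, curve)
    print(smooth_1d(t, y, np.quantile(t, [0.25, 0.5, 0.75])))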
Environmetrics | 2015
Ludger Evers; Daniel A. Molinari; Adrian Bowman; Wayne R. Jones; Michael J. Spence
Fitting statistical models to spatiotemporal data requires finding the right balance between imposing smoothness and following the data. In the context of P-splines, we propose a Bayesian framework for choosing the smoothing parameter, which allows the construction of fully automatic data-driven methods for fitting flexible models to spatiotemporal data. An implementation, which is highly computationally efficient and exploits the sparsity of the design and penalty matrices, is proposed. The findings are illustrated using a simulation study and two examples, all concerned with the modelling of contaminants in groundwater. This suggests that the proposed strategy is more stable than competing methods based on the use of criteria such as generalised cross-validation and Akaike's Information Criterion.
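For orientation, a minimal sparse P-spline sketch in Python: a cubic B-spline design matrix, a second-order difference penalty, and a penalised least squares solve, all kept in sparse form. The smoothing parameter is fixed by hand here; choosing it in a Bayesian, data-driven way is exactly the contribution of the paper and is not reproduced in this sketch.

    import numpy as np
    import scipy.sparse as sp
    from scipy.interpolate import BSpline
    from scipy.sparse.linalg import spsolve

    rng = np.random.default_rng(3)
    x = np.sort(rng.uniform(0, 1, 300))
    y = np.sin(6 * x) + 0.2 * rng.standard_normal(300)

    k = 3                                                         # cubic B-splines
    knots = np.r_[np.zeros(k), np.linspace(0, 1, 21), np.ones(k)]
    B = BSpline.design_matrix(x, knots, k)                        # sparse design matrix
    m = B.shape[1]
    D = sp.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(m - 2, m))   # 2nd-order differences
    lam = 1.0                                  # fixed here; chosen by the Bayesian method
    A = (B.T @ B + lam * (D.T @ D)).tocsc()    # sparse penalised normal equations
    beta = spsolve(A, B.T @ y)
    print(beta[:5])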
Statistics and Computing | 2018
Charalampos Chanialidis; Ludger Evers; Tereza Neocleous; Agostino Nobile
COM-Poisson regression is an increasingly popular model for count data. Its main advantage is that it permits the mean and the variance of the counts to be modelled separately, thus allowing the same covariate to affect the average level and the variability of the response variable in different ways. A key limiting factor to the use of the COM-Poisson distribution is the calculation of the normalisation constant: its accurate evaluation can be time-consuming and is not always feasible. We circumvent this problem, in the context of estimating a Bayesian COM-Poisson regression, by resorting to the exchange algorithm, an MCMC method applicable to situations where the sampling model (likelihood) can only be computed up to a normalisation constant. The algorithm requires draws from the sampling model, which in the case of the COM-Poisson distribution can be generated efficiently using rejection sampling. We illustrate the method and the benefits of using a Bayesian COM-Poisson regression model through a simulation and two real-world data sets with different levels of dispersion.
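The distribution has unnormalised probabilities p(y) ∝ λ^y / (y!)^ν, and the exchange algorithm only ever needs exact draws from it, never the normalisation constant. For ν ≥ 1 a Poisson(λ) envelope yields a particularly simple rejection sampler, since the acceptance ratio (y!)^(1−ν) is at most one; the sketch below uses this simple envelope rather than the more efficient sampler of the paper, which also covers ν < 1.

    import numpy as np
    from scipy.special import gammaln

    def rcompois(lam, nu, rng, size=1):
        """Rejection sampling from COM-Poisson(lam, nu); valid for nu >= 1."""
        out = []
        while len(out) < size:
            y = rng.poisson(lam)                               # Poisson envelope draw
            if np.log(rng.uniform()) < (1 - nu) * gammaln(y + 1):  # accept w.p. (y!)^(1-nu)
                out.append(y)
        return np.array(out)

    rng = np.random.default_rng(4)
    draws = rcompois(5.0, 1.5, rng, size=2000)
    print(draws.mean(), draws.var())    # under-dispersed: variance below the mean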
Transportation Research Record | 2007
Ludger Evers; Diego Santapaola
This paper proposes a modified version of the iterative proportional fitting (IPF) algorithm that can be used to combine contingency tables with missing dimensions. In many situations data from more than one survey could be used to answer a certain question of interest. Although using only one of the data sets is the simplest approach, it is uneconomical because information contained in the other data sets is not used. When combining data from different sources, one often faces the problem that the data have not been collected in exactly the same way, so certain information is available only in some data sets. Dominici proposed a hierarchical Bayesian model for combining contingency tables with missing dimensions. Although this is an elegant theory, it has a number of shortcomings, the most important being its high time and memory complexity. In other words, this method is too slow to be useful for traffic count data, in which the number of cells easily exceeds several thousand. The IPF algorithm proposed here consists of iterations whose time complexity is linear in the number of cells, so it can easily be applied to huge contingency tables such as those obtained from traffic surveys.
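For reference, a minimal sketch of classical IPF on a complete two-way table: rows and columns are rescaled in turn until both margins match, and each sweep touches every cell exactly once, which is the linear-in-cells cost referred to above. The modification for tables with missing dimensions is not shown.

    import numpy as np

    def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
        """Rescale a seed table until its margins match the targets."""
        T = table.astype(float).copy()
        for _ in range(max_iter):
            T *= (row_targets / T.sum(axis=1))[:, None]    # match row margins
            T *= (col_targets / T.sum(axis=0))[None, :]    # match column margins
            if np.abs(T.sum(axis=1) - row_targets).max() < tol:
                return T
        return T

    seed = np.ones((3, 4))
    print(ipf(seed, np.array([10., 20., 30.]), np.array([15., 15., 15., 15.])))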
Bioinformatics | 2008
Ludger Evers; Claudia-Martina Messow