Michiel Debruyne
University of Antwerp
Publication
Featured research published by Michiel Debruyne.
International Conference on Artificial Neural Networks | 2009
Kris De Brabanter; Kristiaan Pelckmans; Jos De Brabanter; Michiel Debruyne; Johan A. K. Suykens; Mia Hubert; Bart De Moor
It has been shown that Kernel Based Regression (KBR) with a least squares loss has some undesirable properties from a robustness point of view. KBR with more robust loss functions, e.g. Huber or logistic losses, often gives rise to more complicated computations. In this work the practical consequences of this sensitivity are explained, including the breakdown of Support Vector Machines (SVM) and weighted Least Squares Support Vector Machines (LS-SVM) for regression. In classical statistics, robustness is improved by reweighting the original estimate. We study the influence of reweighting the LS-SVM estimate using four different weight functions. Our results give practical guidelines for choosing the weights, providing robustness and fast convergence. It turns out that Logistic and Myriad weights are suitable reweighting schemes when outliers are present in the data. In fact, the Myriad weights outperform the others in the presence of extreme outliers (e.g. Cauchy distributed errors). These findings are then illustrated on a toy example as well as on real life data sets.
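The weight functions named in the abstract can be written compactly as functions of a standardized residual. Below is a minimal sketch of the logistic and Myriad weights, with Huber weights added for comparison; the functional forms are the ones commonly used in the robust LS-SVM literature, but the tuning constants (beta, delta) are assumptions, not the values used in the paper.

```python
import numpy as np

def huber_weights(r, beta=1.345):
    """Huber weights: 1 inside [-beta, beta], beta/|r| outside."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= beta, 1.0, beta / np.maximum(np.abs(r), 1e-12))

def logistic_weights(r):
    """Logistic weights: tanh(r)/r, with limiting value 1 at r = 0."""
    r = np.asarray(r, dtype=float)
    safe_r = np.where(np.abs(r) < 1e-12, 1e-12, r)
    return np.tanh(safe_r) / safe_r

def myriad_weights(r, delta=1.0):
    """Myriad weights: delta^2 / (delta^2 + r^2); downweights extreme residuals most aggressively."""
    r = np.asarray(r, dtype=float)
    return delta ** 2 / (delta ** 2 + r ** 2)
```

The weights are applied to residuals standardized by a robust scale estimate (e.g. the MAD); a full reweighting loop is sketched under the Journal of Multivariate Analysis entry below.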
Advances in Data Analysis and Classification | 2010
Michiel Debruyne; Tim Verdonck
Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high dimensional kernel feature space. The sensitivity of linear PCA to outliers is well-known and various robust alternatives have been proposed in the literature. For KPCA such robust versions received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit and ROBPCA. These robust KPCA algorithms are analyzed in a classification context applying discriminant analysis on the KPCA scores. The performances of the different robust KPCA algorithms are studied in a simulation study comparing misclassification percentages, both on clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
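The evaluation pipeline described here, KPCA followed by discriminant analysis on the scores, can be reproduced for the non-robust baseline with standard tooling. The robust KPCA variants themselves are not available in scikit-learn, so the sketch below only illustrates the classification-on-scores setup with ordinary KPCA; the toy data, kernel choice and parameter values are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Toy data standing in for the protein classification example in the paper.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ordinary (non-robust) KPCA; a robust KPCA variant would replace this step.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.05)
scores_train = kpca.fit_transform(X_train)
scores_test = kpca.transform(X_test)

# Discriminant analysis on the KPCA scores, as in the paper's classification context.
lda = LinearDiscriminantAnalysis().fit(scores_train, y_train)
print("misclassification %:", 100 * (1 - lda.score(scores_test, y_test)))
```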
Computational Statistics & Data Analysis | 2010
Michiel Debruyne; Mia Hubert; Johan Van Horebeek
Kernel Principal Component Analysis extends linear PCA from a Euclidean space to any reproducing kernel Hilbert space. Robustness issues for Kernel PCA are studied. The sensitivity of Kernel PCA to individual observations is characterized by calculating the influence function. A robust Kernel PCA method is proposed by incorporating kernels in the Spherical PCA algorithm. Using the scores from Spherical Kernel PCA, a graphical diagnostic is proposed to detect points that are influential for ordinary Kernel PCA.
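The key ingredient of the spherical approach, the spatial median in feature space, only requires kernel evaluations and can be computed with a Weiszfeld-type iteration. The sketch below illustrates that idea and then eigendecomposes the Gram matrix of the sphere-projected points, following the usual KPCA duality between Gram-matrix eigenvectors and feature-space directions; it is a sketch of the spherical idea under assumed parameters, not the authors' implementation.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def kernel_spatial_median(K, n_iter=100):
    """Weiszfeld iteration in feature space: returns coefficients c such that the
    spatial median is m = sum_j c_j * phi(x_j), plus the distances ||phi(x_i) - m||,
    using only the kernel matrix K."""
    n = K.shape[0]
    c = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        Kc = K @ c
        d = np.sqrt(np.maximum(np.diag(K) - 2.0 * Kc + c @ Kc, 1e-12))
        c = (1.0 / d) / np.sum(1.0 / d)
    Kc = K @ c
    d = np.sqrt(np.maximum(np.diag(K) - 2.0 * Kc + c @ Kc, 1e-12))
    return c, d

def spherical_kpca(K, n_components=2):
    """Project feature-space points onto the unit sphere around the kernel spatial
    median and eigendecompose the resulting Gram matrix (spherical PCA idea)."""
    c, d = kernel_spatial_median(K)
    Kc = K @ c
    # Gram matrix of the sphered points (phi(x_i) - m) / d_i.
    G = (K - Kc[:, None] - Kc[None, :] + c @ Kc) / np.outer(d, d)
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

# Example usage with an RBF kernel on random data (bandwidth is an assumption).
X = np.random.default_rng(0).normal(size=(100, 5))
K = rbf_kernel(X, gamma=0.1)
vals, vecs = spherical_kpca(K, n_components=3)
```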
Journal of Multivariate Analysis | 2010
Michiel Debruyne; Andreas Christmann; Mia Hubert; Johan A. K. Suykens
Kernel Based Regression (KBR) minimizes a convex risk over a possibly infinite dimensional reproducing kernel Hilbert space. Recently, it was shown that KBR with a least squares loss function may have some undesirable properties from a robustness point of view: even very small amounts of outliers can dramatically affect the estimates. KBR with other loss functions is more robust, but often gives rise to more complicated computations (e.g. for Huber or logistic losses). In classical statistics, robustness is often improved by reweighting the original estimate. In this paper we provide a theoretical framework for reweighted Least Squares KBR (LS-KBR) and analyze its robustness. Some important differences are found with respect to linear regression, indicating that LS-KBR with a bounded kernel is much better suited for reweighting. In two special cases our results can be translated into practical guidelines for a good choice of weights, providing robustness as well as fast convergence. In particular, a logistic weight function seems an appropriate choice, not only to downweight outliers, but also to improve performance for heavy-tailed distributions. For the latter some heuristic arguments are given comparing concepts from robustness and stability.
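Reweighted LS-KBR amounts to repeatedly solving a weighted LS-SVM / kernel ridge problem and recomputing the weights from standardized residuals. A minimal sketch follows, using the logistic weights discussed above and MAD-standardized residuals; the bias term is omitted for brevity, and the kernel, regularization and iteration settings are assumptions rather than the paper's choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def reweighted_ls_kbr(X, y, lam=1.0, gamma=0.5, n_reweight=5):
    """Iteratively reweighted least squares kernel-based regression.
    Each step solves (K + lam * W^{-1}) alpha = y, then updates the weights
    by logistic weighting of MAD-standardized residuals."""
    K = rbf_kernel(X, gamma=gamma)
    w = np.ones(len(y))
    for _ in range(n_reweight):
        alpha = np.linalg.solve(K + lam * np.diag(1.0 / w), y)
        resid = y - K @ alpha
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # MAD scale
        r = resid / max(scale, 1e-12)
        safe_r = np.where(np.abs(r) < 1e-12, 1e-12, r)
        w = np.tanh(safe_r) / safe_r                                   # logistic weights
    return alpha

# Toy example with a few gross outliers in the response.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X).ravel() + 0.1 * rng.normal(size=200)
y[:10] += 5.0  # contaminate ten responses
alpha = reweighted_ls_kbr(X, y)
```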
Wiley Interdisciplinary Reviews: Computational Statistics | 2018
Mia Hubert; Michiel Debruyne; Peter J. Rousseeuw
The Minimum Covariance Determinant (MCD) method is a highly robust estimator of multivariate location and scatter, for which a fast algorithm is available. Since estimating the covariance matrix is the cornerstone of many multivariate statistical methods, the MCD is an important building block when developing robust multivariate techniques. It also serves as a convenient and efficient tool for outlier detection. The MCD estimator is reviewed, along with its main properties such as affine equivariance, breakdown value, and influence function. We discuss its computation, and list applications and extensions of the MCD in applied and methodological multivariate statistics. Two recent extensions of the MCD are described. The first one is a fast deterministic algorithm which inherits the robustness of the MCD while being almost affine equivariant. The second is tailored to high-dimensional data, possibly with more dimensions than cases, and incorporates regularization to prevent singular matrices.
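A fast algorithm for the MCD is available in scikit-learn as MinCovDet, which makes the outlier-detection use of the estimator easy to illustrate. In the sketch below the cutoff based on a chi-squared quantile of the squared robust distances is a common convention, not something prescribed by this review, and the simulated data and parameter values are assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
p = 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=200)
X[:15] += 6.0  # a small cluster of outliers

# MCD estimate of multivariate location and scatter.
mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
rd2 = mcd.mahalanobis(X)  # squared robust Mahalanobis distances

# Flag observations beyond the usual chi-squared(0.975, p) cutoff.
outliers = rd2 > chi2.ppf(0.975, df=p)
print("flagged:", int(np.sum(outliers)))
```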
Statistics and Computing | 2018
Michiel Debruyne; Sebastiaan Höppner; Sven Serneels; Tim Verdonck
Outlier detection is an inevitable step in most statistical data analyses. However, the mere detection of an outlying case does not always answer all scientific questions associated with that data point. Outlier detection techniques, classical and robust alike, will typically flag the entire case as outlying, or attribute a specific case weight to the entire case. In practice, particularly in high dimensional data, the outlier will most likely not be outlying along all of its variables, but just along a subset of them. If so, the scientific question why the case has been flagged as an outlier becomes of interest. In this article, a fast and efficient method is proposed to detect the variables that contribute most to an outlier's outlyingness, thereby helping the analyst understand why the outlier lies out. The approach pursued in this work is to estimate the univariate direction of maximal outlyingness. It is shown that the problem of estimating that direction can be rewritten as the normed solution of a classical least squares regression problem. Identifying the subset of variables contributing most to outlyingness can thus be achieved by solving the associated least squares problem in a sparse manner. From a practical perspective, sparse partial least squares (SPLS) regression, preferably via the fast sparse NIPALS (SNIPLS) algorithm, is suggested to tackle that problem. The proposed methodology is shown to perform well both on simulated data and on real life examples.
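The least squares reformulation can be sketched directly: regressing an indicator response for the flagged case on the (robustly) centered data gives, up to norming, a direction proportional to the scatter-inverse times the centered case, i.e. the direction of maximal outlyingness, and solving the same regression sparsely selects the contributing variables. The sketch below substitutes a Lasso for the paper's SPLS/SNIPLS step and uses the MCD location for centering; both substitutions, the helper name and all parameter values are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.linear_model import Lasso

def sparse_outlying_direction(X, case_index, alpha=0.02):
    """Sparse estimate of the direction of maximal outlyingness for one case.
    The indicator-response least squares solution is proportional to the
    inverse scatter applied to the centered case; a Lasso stands in for
    SPLS/SNIPLS to obtain a sparse direction."""
    mcd = MinCovDet(random_state=0).fit(X)
    Xc = X - mcd.location_                  # robustly centered data
    y = np.zeros(len(X))
    y[case_index] = 1.0                     # indicator response for the flagged case
    beta = Lasso(alpha=alpha, fit_intercept=False).fit(Xc, y).coef_
    norm = np.linalg.norm(beta)
    return beta / norm if norm > 0 else beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
X[0, [2, 7]] += 8.0                         # case 0 is outlying in variables 2 and 7 only
direction = sparse_outlying_direction(X, case_index=0)
print(np.nonzero(direction)[0])             # variables driving the outlyingness
```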
Wiley Interdisciplinary Reviews: Computational Statistics | 2010
Mia Hubert; Michiel Debruyne
Statistics & Probability Letters | 2009
Michiel Debruyne; Mia Hubert
Journal of Chemometrics | 2009
Michiel Debruyne; Sven Serneels; Tim Verdonck
arXiv preprint | 2017
Michiel Debruyne; Sebastiaan Höppner; Sven Serneels; Tim Verdonck