Sanne Engelen | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sanne Engelen is active.

Explore More

Publication

Featured researches published by Sanne Engelen.

Bioinformatics | 2004

Robust PCA and classification in biosciences

Mia Hubert; Sanne Engelen

MOTIVATION Principal components analysis (PCA) is a very popular dimension reduction technique that is widely used as a first step in the analysis of high-dimensional microarray data. However, the classical approach that is based on the mean and the sample covariance matrix of the data is very sensitive to outliers. Also, classification methods based on this covariance matrix do not give good results in the presence of outlying measurements. RESULTS First, we propose a robust PCA (ROBPCA) method for high-dimensional data. It combines projection-pursuit ideas with robust estimation of low-dimensional data. We also propose a diagnostic plot to display and classify the outliers. This ROBPCA method is applied to several bio-chemical datasets. In one example, we also apply a robust discriminant method on the scores obtained with ROBPCA. We show that this combination of robust methods leads to better classifications than classical PCA and quadratic discriminant analysis. AVAILABILITY All the programs are part of the Matlab Toolbox for Robust Calibration, available at http://www.wis.kuleuven.ac.be/stat/robust.html.

Critical Reviews in Analytical Chemistry | 2006

Robustness and Outlier Detection in Chemometrics

Peter J. Rousseeuw; Michiel Debruyne; Sanne Engelen; Mia Hubert

In analytical chemistry, experimental data often contain outliers of one type or another. The most often used chemometrical/statistical techniques are sensitive to such outliers, and the results may be adversely affected by them. This paper presents an overview of robust chemometrical/statistical methods which search for the model fitted by the majority of the data, and hence are far less affected by outliers. As an extra benefit, we can then detect the outliers by their large deviation from the robust fit. We discuss robust procedures for estimating location and scatter, and for performing multiple linear regression, PCA, PCR, PLS, and classification. We also describe recent results concerning the robustness of Support Vector Machines, which are kernel-based methods for fitting non-linear models. Finally, we present robust approaches for the analysis of multiway data.

Computational Statistics & Data Analysis | 2007

Fast cross-validation of high-breakdown resampling methods for PCA

Mia Hubert; Sanne Engelen

Cross-validation (CV) is a very popular technique for model selection and model validation. The general procedure of leave-one-out CV (LOO-CV) is to exclude one observation from the data set, to construct the fit of the remaining observations and to evaluate that fit on the item that was left out. In classical procedures such as least-squares regression or kernel density estimation, easy formulas can be derived to compute this CV fit or the residuals of the removed observations. However, when high-breakdown resampling algorithms are used, it is no longer possible to derive such closed-form expressions. High-breakdown methods are developed to obtain estimates that can withstand the effects of outlying observations. Fast algorithms are presented for LOO-CV when using a high-breakdown method based on resampling, in the context of robust covariance estimation by means of the MCD estimator and robust principal component analysis. A robust PRESS curve is introduced as an exploratory tool to select the number of principal components. Simulation results and applications on real data show the accuracy and the gain in computation time of these fast CV algorithms.

Analytica Chimica Acta | 2011

Detecting outlying samples in a parallel factor analysis model

Sanne Engelen; Mia Hubert

To explore multi-way data, different methods have been proposed. Here, we study the popular PARAFAC (Parallel factor analysis) model, which expresses multi-way data in a more compact way, without ignoring the underlying complex structure. To estimate the score and loading matrices, an alternating least squares procedure is typically used. It is however well known that least squares techniques suffer from outlying observations, making the models useless when outliers are present in the data. In this paper, we present a robust PARAFAC method. Essentially, it searches for an outlier-free subset of the data, on which we can then perform the classical PARAFAC algorithm. An outlier map is constructed to identify outliers. Simulations and examples show the robustness of our approach.

Theory and applications of recent robust methods. - Basel, 2004 | 2004

Robust PCR and Robust PLSR: a Comparative Study

Sanne Engelen; Mia Hubert; K. Vanden Branden; Sabine Verboven

Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR) are the two most popular regression techniques in chemo-metrics. They both fit a linear relationship between two sets of variables. The responses are usually low-dimensional whereas the regressors are very numerous compared to the number of observations. In this paper we compare two recent robust PCR and PLSR methods and their classical versions in terms of efficiency, goodness-of-fit, predictive power and robustness.

Austrian Journal of Statistics | 2016