Sabine Verboven | Researchain

Publication

Featured research published by Sabine Verboven.

Chemometrics and Intelligent Laboratory Systems | 2002

A fast method for robust principal components with applications to chemometrics

Mia Hubert; Peter J. Rousseeuw; Sabine Verboven

When faced with high-dimensional data, one often uses principal component analysis (PCA) for dimension reduction. Classical PCA constructs a set of uncorrelated variables, which correspond to eigenvectors of the sample covariance matrix. However, it is well-known that this covariance matrix is strongly affected by anomalous observations. It is therefore necessary to apply robust methods that are resistant to possible outliers. Li and Chen [J. Am. Stat. Assoc. 80 (1985) 759] proposed a solution based on projection pursuit (PP). The idea is to search for the direction in which the projected observations have the largest robust scale. In subsequent steps, each new direction is constrained to be orthogonal to all previous directions. This method is very well suited for high-dimensional data, even when the number of variables p is higher than the number of observations n. However, the algorithm of Li and Chen has a high computational cost. In the references [C. Croux, A. Ruiz-Gazen, in COMPSTAT: Proceedings in Computational Statistics 1996, Physica-Verlag, Heidelberg, 1996, pp. 211–217; C. Croux and A. Ruiz-Gazen, High Breakdown Estimators for Principal Components: the Projection-Pursuit Approach Revisited, 2000, submitted for publication.], a computationally much more attractive method is presented, but in high dimensions (large p) it has a numerical accuracy problem and still consumes much computation time. In this paper, we construct a faster two-step algorithm that is more stable numerically. The new algorithm is illustrated on a data set with four dimensions and on two chemometrical data sets with 1200 and 600 dimensions.
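The projection-pursuit idea described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' algorithm: it assumes the coordinatewise median as the center, the median absolute deviation (MAD) as the robust scale, and the normalized centered observations as the only candidate directions. The function names `mad`, `dot` and `pp_robust_pca` are hypothetical.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation, used here as the robust scale."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pp_robust_pca(rows, k):
    """Return up to k orthonormal directions found by projection pursuit.

    Assumptions (not taken from the abstract): center = coordinatewise
    median, scale = MAD, candidates = normalized centered observations.
    """
    p = len(rows[0])
    center = [median(r[j] for r in rows) for j in range(p)]
    data = [[r[j] - center[j] for j in range(p)] for r in rows]
    directions = []
    for _ in range(k):
        best_dir, best_scale = None, -1.0
        for x in data:
            norm = math.sqrt(dot(x, x))
            if norm < 1e-12:
                continue  # this point no longer defines a direction
            a = [v / norm for v in x]
            # Robust scale of all observations projected on candidate a.
            scale = mad(dot(y, a) for y in data)
            if scale > best_scale:
                best_dir, best_scale = a, scale
        if best_dir is None:
            break
        directions.append(best_dir)
        # Deflation: keep only the part of each observation orthogonal to
        # best_dir, so every later direction is orthogonal to earlier ones.
        data = [
            [yj - dot(y, best_dir) * aj for yj, aj in zip(y, best_dir)]
            for y in data
        ]
    return directions
```

Because each candidate is scored by projecting all n observations, the search costs on the order of n² projections per component, which gives a rough sense of why computation time matters for methods of this kind.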

Intelligent Data Analysis | 2011

Mining train delays

Boris Cule; Bart Goethals; Sven Tassenoy; Sabine Verboven

The Belgian railway network has a high traffic density with Brussels as its gravity center. The star-shape of the network implies heavily loaded bifurcations in which knock-on delays are likely to occur. Knock-on delays should be minimized to improve the total punctuality in the network. Based on experience, the most critical junctions in the traffic flow are known, but others might be hidden. To reveal the hidden patterns of trains passing delays to each other, we study, adapt and apply the state-of-the-art techniques for mining frequent episodes to this specific problem.

Journal of Chemometrics | 2012

Robust preprocessing and model selection for spectral data

Sabine Verboven; Mia Hubert; Peter Goos

To calibrate spectral data, one typically starts with preprocessing the spectra and then applies a multivariate calibration method such as principal component regression or partial least squares regression. In the model selection step, the optimal number of latent variables is determined in order to minimize the prediction error. To protect the analysis against the harmful influence of possible outliers in the data, robust calibration methods have been developed. In this paper, we focus on the preprocessing and the model selection step. We propose several robust preprocessing methods as well as robust measures of the root mean squared error of prediction (RMSEP). To select the optimal preprocessing method, we summarize the results for the different RMSEP values by means of a desirability index, which is a concept from industrial quality control. These robust RMSEP values are also used to select the optimal number of latent variables. We illustrate our newly developed techniques through the analysis of a real data set containing near‐infrared measurements of samples of animal feed.
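To see why a robust RMSEP can matter for model selection, the sketch below compares the classical RMSEP with a trimmed variant that discards the largest squared prediction errors. Trimming is only one common way to robustify such a measure and is an assumption here; the abstract does not specify which robust measures the paper proposes. The names `rmsep`, `trimmed_rmsep` and the `trim` parameter are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Classical root mean squared error of prediction."""
    sq = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def trimmed_rmsep(y_true, y_pred, trim=0.1):
    """RMSEP computed after dropping the largest squared errors.

    `trim` is the fraction of observations discarded (an assumed,
    illustrative choice, not a value taken from the paper).
    """
    sq = sorted((t - p) ** 2 for t, p in zip(y_true, y_pred))
    n_drop = int(math.floor(trim * len(sq)))
    kept = sq[: len(sq) - n_drop]
    return math.sqrt(sum(kept) / len(kept))


# Ten predictions that are each off by 0.1, except one gross outlier.
y_true = [float(i) for i in range(1, 11)]
y_pred = [t + 0.1 for t in y_true]
y_pred[-1] = y_true[-1] + 20.0

classical = rmsep(y_true, y_pred)
robust = trimmed_rmsep(y_true, y_pred, trim=0.1)
```

In this toy example a single outlying prediction dominates the classical value, while the trimmed value reflects the error of the remaining nine observations.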

Computational Biology and Chemistry | 2009

Robust data imputation

Karlien Vanden Branden; Sabine Verboven

Single imputation methods have been a widely discussed topic among researchers in the field of bioinformatics. One major shortcoming of the methods proposed until now is the lack of robustness considerations. Like all data, gene expression data can contain outlying values. The presence of these outliers can have negative effects on the imputed values for the missing entries. As a result, any statistical analysis of the completed data could lead to incorrect conclusions. It is therefore important to consider the possibility of outliers in the data set, and to evaluate how imputation techniques handle these values. In this paper, a simulation study is performed to test existing techniques for data imputation when outlying values are present in the data. To overcome some shortcomings of the existing imputation techniques, a new robust imputation method that can deal with the presence of outliers in the data is introduced. In addition, the robust imputation procedure cleans the data for further statistical analysis. Moreover, this method can easily be extended towards a multiple imputation approach by which the uncertainty of the imputed values is emphasised. Finally, a classification example illustrates the lack of robustness of some existing imputation methods and shows the advantage of the multiple imputation approach of the new robust imputation technique.

Theory and applications of recent robust methods, Basel | 2004

Robust PCR and Robust PLSR: a Comparative Study

Sanne Engelen; Mia Hubert; K. Vanden Branden; Sabine Verboven

Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR) are the two most popular regression techniques in chemometrics. They both fit a linear relationship between two sets of variables. The responses are usually low-dimensional, whereas the regressors are very numerous compared to the number of observations. In this paper we compare two recent robust PCR and PLSR methods and their classical versions in terms of efficiency, goodness-of-fit, predictive power and robustness.

COMPSTAT: proceedings in computational statistics / Härdle, W. [edit.] | 2002

Robust Principal Components Regression

Sabine Verboven; Mia Hubert

We consider the multivariate linear regression model with p explanatory variables X and q ≥ 1 response variables Y. Moreover, we assume that the regressors are multicollinear. This situation often occurs in the calibration of chemometrical data, where the X-variables correspond to spectra that are measured at many frequencies. It is well known that the classical least squares estimator has a large variance in the presence of multicollinearity. Moreover, it cannot be computed when p > n, since the X^T X matrix then becomes singular. Therefore many biased estimators have been proposed. A very appealing method is principal components regression (PCR), since it is easy to understand and to compute.
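For orientation, classical PCR can be written out in symbols as follows. This is the standard textbook formulation, not the robust variant developed in the paper, and the notation (k retained components, loadings P_k, scores T_k) is assumed here for illustration. X (n × p) and Y (n × q) are taken to be column-centered.

```latex
% P_k: p x k matrix holding the first k eigenvectors of X^T X (loadings)
% T_k: n x k matrix of principal component scores
\begin{align*}
  T_k &= X P_k, \\
  \hat{A}_k &= \left( T_k^{T} T_k \right)^{-1} T_k^{T} Y, \\
  \hat{B}_k &= P_k \hat{A}_k, \qquad \hat{Y} = X \hat{B}_k .
\end{align*}
```

The k × k matrix T_k^T T_k is diagonal and invertible whenever k does not exceed the rank of X, so the estimator stays computable even when p > n and X^T X itself is singular.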

Archive | 2003

Robust PCA for high-dimensional data

Mia Hubert; Peter J. Rousseeuw; Sabine Verboven

Principal component analysis (PCA) is a well-known technique for dimension reduction. Classical PCA is based on the empirical mean and covariance matrix of the data, and hence is strongly affected by outlying observations. Therefore, there is a huge need for robust PCA. When the original number of variables is small enough, and in particular smaller than the number of observations, it is known that one can apply a robust estimator of multivariate location and scatter and compute the eigenvectors of the scatter matrix.

COMPSTAT: Proceedings in Computational Statistics | 2000

An improved algorithm for robust PCA

Sabine Verboven; Peter J. Rousseeuw; Mia Hubert

In Croux and Ruiz-Gazen (1996) a robust principal component algorithm is presented. It is based on projection pursuit to ensure that it can be applied to high-dimensional data. We note that this algorithm has a problem of numerical stability and we develop an improved version. To reduce the computation time we then propose a two-step algorithm. The new algorithm is illustrated on a real data set from chemometrics.

Chemometrics and Intelligent Laboratory Systems | 2005