Publication


Featured research published by Johan A. Westerhuis.


BMC Genomics | 2006

Centering, scaling, and transformations: improving the biological information content of metabolomics data.

Robert A. van den Berg; Huub C. J. Hoefsloot; Johan A. Westerhuis; Age K. Smilde; Mariët J. van der Werf

Background: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability.

Results: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the, from a biological point of view, most important metabolites. Furthermore, the stability of the rank, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis.

Conclusion: Different pretreatment methods emphasize different aspects of the data and each pretreatment method has its own merits and drawbacks. The choice for a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified to be the most important.
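As a rough illustration of the pretreatment methods named in this abstract, the sketch below applies each one to the measurements of a single metabolite across samples. The function name `pretreat`, the method labels, and the use of the sample standard deviation (n - 1) are assumptions made for this example; the formulas follow the commonly used definitions and are not taken from the abstract itself.

```python
import math
import statistics


def pretreat(values, method):
    """Pretreat one metabolite's measurements across samples.

    `values` is a list of concentrations; `method` is a hypothetical label.
    """
    if method == "log":
        # Log transformation (base 10 assumed), followed by centering.
        logged = [math.log10(v) for v in values]
        log_mean = statistics.mean(logged)
        return [v - log_mean for v in logged]
    if method == "power":
        # Power transformation (square root assumed), followed by centering.
        rooted = [math.sqrt(v) for v in values]
        root_mean = statistics.mean(rooted)
        return [v - root_mean for v in rooted]

    mean = statistics.mean(values)
    centered = [v - mean for v in values]
    if method == "centering":
        return centered

    sd = statistics.stdev(values)  # sample standard deviation, an assumption
    if method == "autoscaling":
        return [c / sd for c in centered]
    if method == "pareto":
        return [c / math.sqrt(sd) for c in centered]
    if method == "range":
        return [c / (max(values) - min(values)) for c in centered]
    if method == "vast":
        # Autoscaling weighted by the inverse coefficient of variation.
        return [(c / sd) * (mean / sd) for c in centered]
    raise ValueError(f"unknown method: {method}")


# Two hypothetical metabolites with the same pattern but a 1000-fold
# difference in concentration.
abundant = [1000.0, 2000.0, 3000.0]
scarce = [1.0, 2.0, 3.0]
print(pretreat(abundant, "centering"), pretreat(scarce, "centering"))
print(pretreat(abundant, "autoscaling"), pretreat(scarce, "autoscaling"))
```

After centering alone the abundant metabolite still dominates, whereas autoscaling and range scaling put both metabolites on the same footing, which is the dependence on average concentration that the abstract describes.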


Metabolomics | 2008

Assessment of PLSDA cross validation

Johan A. Westerhuis; Huub C. J. Hoefsloot; Suzanne Smit; Daniel J. Vis; Age K. Smilde; Ewoud J. J. van Velzen; John P. M. van Duijnhoven; Ferdi A. van Dorsten

Classifying groups of individuals based on their metabolic profile is one of the main topics in metabolomics research. Due to the low number of individuals compared to the large number of variables, this is not an easy task. PLSDA is one of the data analysis methods used for the classification. Unfortunately, this method eagerly overfits the data, and rigorous validation is necessary. The validation, however, is far from straightforward. In this paper we discuss a strategy based on cross model validation and permutation testing to validate the classification models. It is also shown that overly optimistic results are obtained when the validation is not done properly. Furthermore, we advocate against the use of PLSDA score plots for inference of class differences.
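The permutation-testing idea can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier stands in for PLSDA, and a single leave-one-out loop stands in for the cross model validation described in the abstract; both substitutions, along with all function names, are assumptions of this sketch rather than the authors' procedure.

```python
import random


def loo_misclassifications(X, y):
    """Leave-one-out error count for a nearest-centroid classifier.

    Assumes every class has at least two samples, so that a centroid
    can still be computed when one sample is left out.
    """
    errors = 0
    for i in range(len(X)):
        distances = {}
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [sum(col) / len(rows) for col in zip(*rows)]
            distances[label] = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
        predicted = min(distances, key=distances.get)
        errors += predicted != y[i]
    return errors


def permutation_p_value(X, y, n_permutations=200, seed=0):
    """Compare the observed error count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = loo_misclassifications(X, y)
    as_good = 0
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)
        # The whole validation loop is repeated for each permutation.
        if loo_misclassifications(X, shuffled) <= observed:
            as_good += 1
    return observed, (as_good + 1) / (n_permutations + 1)


# Hypothetical two-group data with a clear difference.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_p_value(X, y))
```

The key design point is that the class labels are permuted before the validation loop is rerun, so the null distribution reflects the same amount of model fitting freedom as the original analysis.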


Chemometrics and Intelligent Laboratory Systems | 2000

Generalized contribution plots in multivariate statistical process monitoring

Johan A. Westerhuis; Stephen P. Gurden; Age K. Smilde

This paper discusses contribution plots for both the D-statistic and the Q-statistic in multivariate statistical process control of batch processes. Contributions of process variables to the D-statistic are generalized to any type of latent variable model with or without orthogonality constraints. The calculation of contributions to the Q-statistic is discussed. Control limits for both types of contributions are introduced to show the relative importance of a contribution compared to the contributions of the corresponding process variables in the batches obtained under normal operating conditions. The contributions are introduced for off-line monitoring of batch processes, but can easily be extended to on-line monitoring and to continuous processes, as is shown in this paper.
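For orientation, the sketch below computes variable contributions for the simplest case only: a PCA-type model with orthonormal loadings, where the D-statistic is taken as the sum of squared scores divided by their variances and the Q-statistic as the sum of squared residuals. The generalization to other latent variable models and the control limits described in the abstract are not reproduced here; the function name, arguments, and the example numbers are all hypothetical.

```python
import math


def contributions(x, loadings, score_variances):
    """Per-variable contributions to D and Q for one (centered) observation.

    x               : list of J variable values
    loadings        : J x A list of lists, assumed orthonormal columns
    score_variances : variance of each of the A scores under normal operation
    """
    n_vars, n_comp = len(loadings), len(score_variances)
    scores = [sum(x[j] * loadings[j][a] for j in range(n_vars))
              for a in range(n_comp)]
    # D contribution of variable j: sum over components of (t_a / var_a) * x_j * p_ja.
    d_contrib = [sum(scores[a] / score_variances[a] * x[j] * loadings[j][a]
                     for a in range(n_comp))
                 for j in range(n_vars)]
    # Q contribution of variable j: squared residual after projecting on the model.
    residual = [x[j] - sum(scores[a] * loadings[j][a] for a in range(n_comp))
                for j in range(n_vars)]
    q_contrib = [e * e for e in residual]
    return d_contrib, q_contrib


# Hypothetical one-component model on three variables.
p = 1.0 / math.sqrt(2.0)
d_c, q_c = contributions([2.0, 2.0, 3.0], [[p], [p], [0.0]], [2.0])
print(d_c, sum(d_c))
print(q_c, sum(q_c))
```

In this form the contributions add up to the corresponding statistic, so a bar chart of `d_c` or `q_c` shows which variables are responsible for an unusually large D or Q value.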


Chemometrics and Intelligent Laboratory Systems | 2001

Direct orthogonal signal correction

Johan A. Westerhuis; Sijmen de Jong; Age K. Smilde

In the present paper, the concept of orthogonal signal correction (OSC) as a spectral preprocessing method is discussed and a number of OSC algorithms that have appeared are compared from a theoretical viewpoint. Since all of these algorithms had some problems concerning the orthogonality towards Y, non-optimal amount of variance removed from X, or a non-attainable solution, a new direct OSC algorithm (DOSC) is introduced. DOSC was originally developed as a direct method solely based on least squares steps that had none of the problems mentioned above. The first practical results with the new method, however, were not encouraging due to the complete orthogonality constraint. If this orthogonality constraint is loosened, the method improves considerably and simplifies the calibration model for the prediction of Y.


Metabolomics | 2012

Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

Ewa Szymańska; Edoardo Saccenti; Age K. Smilde; Johan A. Westerhuis

Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable, and it is commonly used for classification purposes and biomarker selection in metabolomics studies. Several statistical approaches are currently in use to validate outcomes of PLS-DA analyses, e.g. double cross validation procedures or permutation testing. However, there is great inconsistency in the optimization and the assessment of performance of PLS-DA models due to the many different diagnostic statistics currently employed in metabolomics data analyses. In this paper, properties of four diagnostic statistics of PLS-DA, namely the number of misclassifications (NMC), the Area Under the Receiver Operating Characteristic (AUROC), Q2 and Discriminant Q2 (DQ2), are discussed. All four diagnostic statistics are used in the optimization and the performance assessment of PLS-DA models of three different-size metabolomics data sets obtained with two different types of analytical platforms and with different levels of known differences between two groups: control and case groups. Statistical significance of the obtained PLS-DA models was evaluated with permutation testing. PLS-DA models obtained with NMC and AUROC are more powerful in detecting very small differences between groups than models obtained with Q2 and DQ2. Reproducibility of the obtained PLS-DA model outcomes, model complexity and permutation test distributions are also investigated to explain this phenomenon. DQ2 and Q2 (in contrast to NMC and AUROC) prefer PLS-DA models with lower complexity and require a higher number of permutation tests and submodels to accurately estimate the statistical significance of the model performance. NMC and AUROC seem to be more efficient and more reliable diagnostic statistics and should be recommended in two-group discrimination metabolomic studies.
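The four diagnostic statistics can be computed from class labels and (cross-validated) predictions as in the sketch below. The class coding of -1 and 1, the decision threshold halfway between the two labels, and the function and key names are assumptions of this example; the abstract does not specify them.

```python
def diagnostics(y_true, y_pred, low=-1.0, high=1.0):
    """NMC, AUROC, Q2 and DQ2 for a two-group problem.

    y_true : class labels coded as `low` (control) or `high` (case)
    y_pred : continuous predictions, ideally from cross validation
    """
    pairs = list(zip(y_true, y_pred))
    threshold = (low + high) / 2.0

    # NMC: predictions on the wrong side of the threshold.
    nmc = sum((p > threshold) != (t == high) for t, p in pairs)

    # AUROC: fraction of case/control pairs ranked correctly (ties count half).
    cases = [p for t, p in pairs if t == high]
    controls = [p for t, p in pairs if t == low]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    auroc = wins / (len(cases) * len(controls))

    # Q2: one minus prediction error sum of squares over total sum of squares.
    mean_y = sum(y_true) / len(y_true)
    tss = sum((t - mean_y) ** 2 for t in y_true)
    press = sum((t - p) ** 2 for t, p in pairs)

    # DQ2: as Q2, but predictions beyond the class label are not penalized.
    press_d = sum((t - p) ** 2 for t, p in pairs
                  if not ((t == high and p > high) or (t == low and p < low)))

    return {"NMC": nmc, "AUROC": auroc,
            "Q2": 1.0 - press / tss, "DQ2": 1.0 - press_d / tss}


# Hypothetical predictions for two controls and two cases.
print(diagnostics([-1.0, -1.0, 1.0, 1.0], [-1.5, -0.5, 0.5, 2.0]))
```

In the hypothetical example every sample is classified correctly, yet Q2 is lowered by predictions that overshoot the class labels, while DQ2 ignores that overshoot; NMC and AUROC depend only on the side of the threshold and the ranking, respectively.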


Metabolomics | 2010

Multivariate paired data analysis: multilevel PLSDA versus OPLSDA

Johan A. Westerhuis; Ewoud J. J. van Velzen; Huub C. J. Hoefsloot; Age K. Smilde

Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over designed experiment. In such a design, each subject in the study population acts as his or her own control and makes the data paired. For a single univariate response a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). The results from both methods have been evaluated in a small simulated example as well as in a genuine data set from a cross-over designed nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.
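The paired structure can be made explicit by splitting each observation into a between-subject part (the subject's own mean) and a within-subject part (the deviation from that mean). The sketch below shows only this split, on which a multilevel analysis would then operate; the data layout, function name, and example values are assumptions made for illustration.

```python
def split_multilevel(observations):
    """Split paired data into between- and within-subject parts.

    observations : list of (subject, condition, values) tuples, with one
                   entry per subject and condition
    Returns (between, within), where `between` maps each subject to its
    mean profile and `within` lists (subject, condition, deviations).
    """
    by_subject = {}
    for subject, _, values in observations:
        by_subject.setdefault(subject, []).append(values)

    between = {subject: [sum(col) / len(rows) for col in zip(*rows)]
               for subject, rows in by_subject.items()}

    within = [(subject, condition,
               [v - m for v, m in zip(values, between[subject])])
              for subject, condition, values in observations]
    return between, within


# Hypothetical cross-over data: two subjects, two conditions, two variables.
data = [
    ("A", "placebo", [10.0, 1.0]), ("A", "treatment", [12.0, 1.0]),
    ("B", "placebo", [20.0, 5.0]), ("B", "treatment", [22.0, 5.0]),
]
between, within = split_multilevel(data)
print(between)
print(within)
```

In the hypothetical data the subjects differ far more from each other than the treatment changes either of them, but the within-subject part shows the same shift for both, which is the information a paired analysis exploits.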


Metabolomics | 2014

Reflections on univariate and multivariate analysis of metabolomics data

Edoardo Saccenti; Huub C. J. Hoefsloot; Age K. Smilde; Johan A. Westerhuis; Margriet M. W. B. Hendriks

Metabolomics experiments usually result in a large quantity of data. Univariate and multivariate analysis techniques are routinely used to extract relevant information from the data with the aim of providing biological knowledge on the problem studied. Despite the fact that statistical tools like the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis constitute the backbone of the statistical part of the vast majority of metabolomics papers, it seems that many basic but rather fundamental questions are still often asked, like: Why do the results of univariate and multivariate analyses differ? Why apply univariate methods if you have already applied a multivariate method? Why do I see something multivariately if I do not see it univariately? In the present paper we address some aspects of univariate and multivariate analysis, with the aim of clarifying in simple terms the main differences between the two approaches. Applications of the t test, analysis of variance, principal component analysis and partial least squares discriminant analysis will be shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.


Chemical Engineering Science | 2002

Critical evaluation of approaches for on-line batch process monitoring

Eric N.M. van Sprang; Henk-Jan Ramaker; Johan A. Westerhuis; Stephen P. Gurden; Age K. Smilde

Since the introduction of batch process monitoring using component models in 1992, different approaches for statistical batch process monitoring have been suggested in the literature. This is the first evaluation of five proposed approaches so far. The differences and similarities between the approaches are highlighted. The derivation of control charts for these approaches is discussed. A control chart should give a fast and reliable detection of disturbances in the process. These features are evaluated for each approach by means of two performance indices. First, the action signal time for various disturbed batches is tested. Second, the probability of a false warning in a control chart is computed. In order to evaluate the five approaches, five different data sets are studied: one simulation of a batch process, three batch processes obtained from industry and one laboratory spectral data set. The obtained results for the performance indices are summarised and discussed. Recommendations helpful for practical use are given.


Journal of Proteome Research | 2009

Phenotyping Tea Consumers by Nutrikinetic Analysis of Polyphenolic End-Metabolites

Ewoud J. J. van Velzen; Johan A. Westerhuis; John van Duynhoven; Ferdi A. van Dorsten; Christian H. Grün; Doris M. Jacobs; Guus S. M. J. E. Duchateau; Daniel J. Vis; Age K. Smilde

An integration of metabolomics and pharmacokinetics (or nutrikinetics) is introduced as a concept to describe a human study population with different metabolic phenotypes following a nutritional intervention. The approach facilitates an unbiased analysis of the time-response of body fluid metabolites from crossover designed intervention trials without prior knowledge of the underlying metabolic pathways. The method is explained for the case of a human intervention study in which the nutrikinetic analysis of polyphenol-rich black tea consumption was performed in urine over a period of 48 h. First, multilevel PLS-DA analysis was applied to the urinary 1H NMR profiles to select the most differentiating biomarkers between the verum and placebo samples. Then, a one-compartment nutrikinetic model with first-order excretion, a lag time, and a baseline function was fitted to the time courses of these selected biomarkers. The nutrikinetic model used here fully exploits the crossover structure in the data by fitting the data from both the treatment period and the placebo period simultaneously. To demonstrate the procedure, a selected set of urinary biomarkers was used in the model fitting. These metabolites, which include hippuric acid, 4-hydroxyhippuric acid and 1,3-dihydroxyphenyl-2-O-sulfate, are derived from microbial fermentation of polyphenols in the gut. Variations in urinary excretion between and within subjects were observed and used to provide a phenotypic description of the test population.


Journal of Chemometrics | 2000

Multiway multiblock component and covariates regression models

Age K. Smilde; Johan A. Westerhuis; Ricard Boqué

In this paper the general theory of multiway multiblock component and covariates regression models is explained. Unlike in existing methods such as multiblock PLS and multiblock PCA, in the new proposed method a different number of components can be selected for each block. Furthermore, the method can be generalized to incorporate multiway blocks to which any multiway model can be applied. The method is a direct extension of principal covariates regression and therefore works in a simultaneous fashion in which a clearly defined objective criterion is minimized. It can be tuned to fulfil the requirements of the user. Algorithms to calculate the components will be presented. The method will be illustrated with two three‐block examples and compared to existing approaches. The first example is with two‐way data and the second example is with a three‐way array. It will be shown that predictions are as good as with the existing methods, but because for most blocks fewer components are required, diagnostic properties of the method are improved.

Collaboration


Dive into Johan A. Westerhuis's collaborations.

Top Co-Authors

Jos A. Hageman

Wageningen University and Research Centre
