Publication


Featured research published by Age K. Smilde.


BMC Genomics | 2006

Centering, scaling, and transformations: improving the biological information content of metabolomics data.

Robert A. van den Berg; Huub C. J. Hoefsloot; Johan A. Westerhuis; Age K. Smilde; Mariët J. van der Werf

Background: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability. Results: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the, from a biological point of view, most important metabolites. Furthermore, the stability of the rank, the influence of technical errors on data analysis, and the preference of data analysis methods for selecting highly abundant metabolites were affected by the data pretreatment method used prior to data analysis. Conclusion: Different pretreatment methods emphasize different aspects of the data and each pretreatment method has its own merits and drawbacks. The choice for a pretreatment method depends on the biological question to be answered, the properties of the data set and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods. That is, range scaling and autoscaling were able to remove the dependence of the rank of the metabolites on the average concentration and the magnitude of the fold changes and showed biologically sensible results after PCA (principal component analysis). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects the metabolites that are identified to be the most important.
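The pretreatment methods named in this abstract can be illustrated with a small sketch. The formulas below are the commonly used textbook definitions, applied to the values of one metabolite across samples; they are assumptions for illustration and may differ in detail from the definitions used in the paper. The function names and the toy data are hypothetical.

```python
import math
import statistics


def center(x):
    """Subtract the mean so values fluctuate around zero."""
    m = statistics.mean(x)
    return [v - m for v in x]


def autoscale(x):
    """Center, then divide by the sample standard deviation."""
    s = statistics.stdev(x)
    return [v / s for v in center(x)]


def pareto_scale(x):
    """Center, then divide by the square root of the standard deviation."""
    s = statistics.stdev(x)
    return [v / math.sqrt(s) for v in center(x)]


def range_scale(x):
    """Center, then divide by the observed range (max minus min)."""
    r = max(x) - min(x)
    return [v / r for v in center(x)]


def vast_scale(x):
    """Autoscale, then weight by mean / standard deviation."""
    m = statistics.mean(x)
    s = statistics.stdev(x)
    return [v * (m / s) for v in autoscale(x)]


def log_transform(x):
    """Take log10 of strictly positive values, then center."""
    return center([math.log10(v) for v in x])


def power_transform(x):
    """Take the square root of non-negative values, then center."""
    return center([math.sqrt(v) for v in x])


# Two hypothetical metabolites with the same relative pattern
# but a 1000-fold difference in concentration.
low = [1.0, 2.0, 3.0]
high = [1000.0, 2000.0, 3000.0]

print(center(high))     # still dominated by the abundant metabolite
print(autoscale(low))   # [-1.0, 0.0, 1.0]
print(autoscale(high))  # same pattern after autoscaling
```

In this toy case centering alone leaves the abundant metabolite dominating any variance-based analysis, whereas autoscaling and range scaling put both metabolites on the same footing.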


Metabolomics | 2008

Assessment of PLSDA cross validation

Johan A. Westerhuis; Huub C. J. Hoefsloot; Suzanne Smit; Daniel J. Vis; Age K. Smilde; Ewoud J. J. van Velzen; John P. M. van Duijnhoven; Ferdi A. van Dorsten

Classifying groups of individuals based on their metabolic profile is one of the main topics in metabolomics research. Due to the low number of individuals compared to the large number of variables, this is not an easy task. PLSDA is one of the data analysis methods used for the classification. Unfortunately, this method readily overfits the data and rigorous validation is necessary. The validation, however, is far from straightforward. In this paper we discuss a strategy based on cross model validation and permutation testing to validate the classification models. It is also shown that overly optimistic results are obtained when the validation is not done properly. Furthermore, we advocate against the use of PLSDA score plots for inference of class differences.
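The permutation-testing idea can be sketched in a few lines. This is not PLSDA: a nearest-centroid classifier stands in for it so the example stays dependency-free, and a plain leave-one-out loop stands in for the cross model validation discussed in the paper. All function names and the toy data are hypothetical. The point illustrated is that the complete validation procedure is repeated for every permutation of the class labels.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def loo_misclassifications(X, y):
    """Leave-one-out count of misclassified samples (nearest centroid)."""
    errors = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {}
        for lab in set(lab for _, lab in train):
            cents[lab] = centroid([x for x, l in train if l == lab])
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        errors += pred != y[i]
    return errors


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Compare the observed error count with counts under shuffled labels."""
    rng = random.Random(seed)
    observed = loo_misclassifications(X, y)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        # The whole cross-validation is rerun on the permuted labels.
        if loo_misclassifications(X, y_perm) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two clearly separated groups of six samples each.
group_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.2], [0.2, 0.3]]
group_b = [[a + 5.0, b + 5.0] for a, b in group_a]
X = group_a + group_b
y = [0] * 6 + [1] * 6

observed, p_value = permutation_p_value(X, y)
print(observed, p_value)
```

With real metabolomics data, where variables far outnumber samples, the permuted-label error counts are typically much closer to the observed count, which is why the permutation distribution is a useful reference.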


Chemometrics and Intelligent Laboratory Systems | 2000

Generalized contribution plots in multivariate statistical process monitoring

Johan A. Westerhuis; Stephen P. Gurden; Age K. Smilde

This paper discusses contribution plots for both the D-statistic and the Q-statistic in multivariate statistical process control of batch processes. Contributions of process variables to the D-statistic are generalized to any type of latent variable model with or without orthogonality constraints. The calculation of contributions to the Q-statistic is discussed. Control limits for both types of contributions are introduced to show the relative importance of a contribution compared to the contributions of the corresponding process variables in the batches obtained under normal operating conditions. The contributions are introduced for off-line monitoring of batch processes, but can easily be extended to on-line monitoring and to continuous processes, as is shown in this paper.


Analytical Methods | 2014

Principal component analysis

Rasmus Bro; Age K. Smilde

Principal component analysis is one of the most important and powerful methods in chemometrics as well as in a wealth of other areas. This paper provides a description of how to understand, use, and interpret principal component analysis. The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
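As a rough illustration of what principal component analysis computes, the sketch below extracts the first principal component of a small data matrix by power iteration on the covariance matrix. This is one simple way to obtain the leading component, chosen to avoid external libraries; it is an assumption for illustration, not the procedure described in the paper. The function name and data are hypothetical.

```python
import math


def first_principal_component(X, n_iter=100):
    """Return (loading vector, scores) of the first principal component."""
    n, p = len(X), len(X[0])
    # Column-center the data.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiply and renormalize.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Scores are the projections of the centered samples on the loading.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    return v, scores


# Toy data lying exactly on the line y = 2x.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loading, scores = first_principal_component(X)
print(loading)  # proportional to (1, 2), normalized to unit length
print(scores)
```

In practice a singular value decomposition from a numerical library would normally be preferred, since it returns all components at once and is numerically more robust.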


Archive | 2005

Multi-Way Analysis with Applications in the Chemical Sciences

Age K. Smilde; Rasmus Bro; Paul Geladi

Foreword. Preface. Nomenclature and Conventions. 1 Introduction. 1.1 What is multi-way analysis? 1.2 Conceptual aspects of multi-way data analysis. 1.3 Hierarchy of multivariate data structures in chemistry. 1.4 Principal component analysis and PARAFAC. 1.5 Summary. 2 Array definitions and properties. 2.1 Introduction. 2.2 Rows, columns and tubes; frontal, lateral and horizontal slices. 2.3 Elementary operations. 2.4 Linearity concepts. 2.5 Rank of two-way arrays. 2.6 Rank of three-way arrays. 2.7 Algebra of multi-way analysis. 2.8 Summary. Appendix 2.A. 3 Two-way component and regression models. 3.1 Models for two-way one-block data analysis: component models. 3.2 Models for two-way two-block data analysis: regression models. 3.3 Summary. Appendix 3.A: some PCA results. Appendix 3.B: PLS algorithms. 4 Three-way component and regression models. 4.1 Historical introduction to multi-way models. 4.2 Models for three-way one-block data: three-way component models. 4.3 Models for three-way two-block data: three-way regression models. 4.4 Summary. Appendix 4.A: alternative notation for the PARAFAC model. Appendix 4.B: alternative notations for the Tucker3 model. 5 Some properties of three-way component models. 5.1 Relationships between three-way component models. 5.2 Rotational freedom and uniqueness in three-way component models. 5.3 Properties of Tucker3 models. 5.4 Degeneracy problem in PARAFAC models. 5.5 Summary. 6 Algorithms. 6.1 Introduction. 6.2 Optimization techniques. 6.3 PARAFAC algorithms. 6.4 Tucker3 algorithms. 6.5 Tucker2 and Tucker1 algorithms. 6.6 Multi-linear partial least squares regression. 6.7 Multi-way covariates regression models. 6.8 Core rotation in Tucker3 models. 6.9 Handling missing data. 6.10 Imposing non-negativity. 6.11 Summary. Appendix 6.A: closed-form solution for the PARAFAC model. Appendix 6.B: proof that the weights in trilinear PLS1 can be obtained from a singular value decomposition. 7 Validation and diagnostics. 7.1 What is validation? 7.2 Test-set and cross-validation. 7.3 Selecting which model to use. 7.4 Selecting the number of components. 7.5 Residual and influence analysis. 7.6 Summary. 8 Visualization. 8.1 Introduction. 8.2 History of plotting in three-way analysis. 8.3 History of plotting in chemical three-way analysis. 8.4 Scree plots. 8.5 Line plots. 8.6 Scatter plots. 8.7 Problems with scatter plots. 8.8 Image analysis. 8.9 Dendrograms. 8.10 Visualizing the Tucker core array. 8.11 Joint plots. 8.12 Residual plots. 8.13 Leverage plots. 8.14 Visualization of large data sets. 8.15 Summary. 9 Preprocessing. 9.1 Background. 9.2 Two-way centering. 9.3 Two-way scaling. 9.4 Simultaneous two-way centering and scaling. 9.5 Three-way preprocessing. 9.6 Summary. Appendix 9.A: other types of preprocessing. Appendix 9.B: geometric view of centering. Appendix 9.C: fitting bilinear model plus offsets across one mode equals fitting a bilinear model to centered data. Appendix 9.D: rank reduction and centering. Appendix 9.E: centering data with missing values. Appendix 9.F: incorrect centering introduces artificial variation. Appendix 9.G: alternatives to centering. 10 Applications. 10.1 Introduction. 10.2 Curve resolution of fluorescence data. 10.3 Second-order calibration. 10.4 Multi-way regression. 10.5 Process chemometrics. 10.6 Exploratory analysis in chromatography. 10.7 Exploratory analysis in environmental sciences. 10.8 Exploratory analysis of designed data. 10.9 Analysis of variance of data with complex interactions. Appendix 10.A: an illustration of the generalized rank annihilation method. Appendix 10.B: other types of second-order calibration problems. Appendix 10.C: the multiple standards calibration model of the second-order calibration example. References. Index.


Bioinformatics | 2005

ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data

Age K. Smilde; J. Jansen; Huub C. J. Hoefsloot; Robert-Jan A. N. Lamers; Jan van der Greef; Marieke E. Timmerman

Motivation: Datasets resulting from metabolomics or metabolic profiling experiments are becoming increasingly complex. Such datasets may contain underlying factors, such as time (time-resolved or longitudinal measurements), doses or combinations thereof. Currently used biostatistics methods do not take the structure of such complex datasets into account. However, incorporating this structure into the data analysis is important for understanding the biological information in these datasets. Results: We describe ASCA, a new method that can deal with complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a dataset from a metabolomics experiment with time and dose factors.
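The ANOVA-style partition that ASCA builds on can be sketched as follows. The example assumes a balanced design with two factors (time and dose), splits the data matrix into an overall mean, one effect matrix per factor, and a remainder that lumps interaction and residual together. A component analysis of each effect matrix would follow as the second step and is not shown. Function names and data are hypothetical, and the paper's exact model may include further terms.

```python
def column_means(rows):
    """Column-wise mean of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def effect_matrix(X, levels, grand):
    """Per-sample level mean minus the overall mean, for one design factor."""
    by_level = {}
    for row, lev in zip(X, levels):
        by_level.setdefault(lev, []).append(row)
    level_means = {lev: column_means(rows) for lev, rows in by_level.items()}
    return [[m - g for m, g in zip(level_means[lev], grand)] for lev in levels]


def partition(X, time, dose):
    """Split X into overall mean, time effect, dose effect and remainder."""
    grand = column_means(X)
    X_time = effect_matrix(X, time, grand)
    X_dose = effect_matrix(X, dose, grand)
    remainder = [
        [x - g - a - b for x, g, a, b in zip(rx, grand, ra, rb)]
        for rx, ra, rb in zip(X, X_time, X_dose)
    ]
    return grand, X_time, X_dose, remainder


# Balanced 2 x 2 toy design: 4 samples, 2 measured variables.
X = [[1.0, 10.0], [3.0, 10.0], [2.0, 14.0], [4.0, 14.0]]
time = [0, 1, 0, 1]
dose = ["low", "low", "high", "high"]

grand, X_time, X_dose, remainder = partition(X, time, dose)
print(grand)   # [2.5, 12.0]
print(X_time)  # variation attributable to time
print(X_dose)  # variation attributable to dose
```

Because each effect matrix isolates the variation from one design factor, analyzing them separately keeps the factors from being mixed in a single set of components.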


Chemometrics and Intelligent Laboratory Systems | 1992

MULTICRITERIA DECISION-MAKING

Margriet M. W. B. Hendriks; Jan H. de Boer; Age K. Smilde; Durk A. Doornbos

Hendriks, M.M.W.B., De Boer, J.H., Smilde, A.K. and Doornbos, D.A., 1992. Multicriteria decision making. Chemometrics and Intelligent Laboratory Systems, 16: 175–191. Interest is growing in multicriteria decision making (MCDM) techniques and a large number of these techniques are now available. The purpose of this tutorial is to give a theoretical description of some of the MCDM techniques. In addition, we give an overview of the differences and similarities of the techniques discussed. We have tried to select those techniques that are most frequently described in recent publications on analytical chemical and pharmaceutical subjects and, more importantly, that give a good survey of the diversity of techniques. We describe five different MCDM methods: Pareto optimality, desirability functions, overlay plots, utility functions and PROMETHEE. These techniques are compared to each other by applying them to a decision making problem in tablet manufacturing.
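Of the five methods listed, Pareto optimality is the simplest to illustrate. The sketch below keeps every candidate that is not dominated by another one, assuming all criteria are to be maximized. The function names and the candidate values are hypothetical and are not taken from the tablet manufacturing example in the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidates scored on two criteria (higher is better).
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(candidates))  # [(1, 5), (2, 4), (3, 3)]
```

The Pareto-optimal set only narrows the choice; selecting one point from it still requires a preference, which is where methods such as desirability or utility functions come in.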


Metabolomics | 2007

Proposed minimum reporting standards for data analysis in metabolomics

Royston Goodacre; David Broadhurst; Age K. Smilde; Bruce S. Kristal; J. David Baker; Richard D. Beger; Conrad Bessant; Susan C. Connor; Giorgio Capuani; Andrew Craig; Timothy M. D. Ebbels; Douglas B. Kell; Cesare Manetti; Jack Newton; Giovanni Paternostro; Ray L. Somorjai; Michael Sjöström; Johan Trygg; Florian Wulfert

The goal of this group is to define the reporting requirements associated with the statistical analysis (including univariate, multivariate, informatics, machine learning etc.) of metabolite data with respect to other measured/collected experimental data (often called meta-data). These definitions will embrace as many aspects of a complete metabolomics study as possible at this time. In chronological order this will include: Experimental Design, both in terms of sample collection/matching, and data acquisition scheduling of samples through whichever spectroscopic technology used; Deconvolution (if required); Pre-processing, for example, data cleaning, outlier detection, row/column scaling, or other transformations; Definition and parameterization of subsequent visualizations and Statistical/Machine learning Methods applied to the dataset; If required, a clear definition of the Model Validation Scheme used (including how data are split into training/validation/test sets); Formal indication on whether the data analysis has been Independently Tested (either by experimental reproduction, or blind hold out test set). Finally, data interpretation and the visual representations and hypotheses obtained from the data analyses.


Chemometrics and Intelligent Laboratory Systems | 2001

Direct orthogonal signal correction

Johan A. Westerhuis; Sijmen de Jong; Age K. Smilde

In the present paper, the concept of orthogonal signal correction (OSC) as a spectral preprocessing method is discussed and a number of OSC algorithms that have appeared are compared from a theoretical viewpoint. Since all of these algorithms had some problems concerning the orthogonality towards Y, non-optimal amount of variance removed from X, or a non-attainable solution, a new direct OSC algorithm (DOSC) is introduced. DOSC was originally developed as a direct method solely based on least squares steps that had none of the problems mentioned above. The first practical results with the new method, however, were not encouraging due to the complete orthogonality constraint. If this orthogonality constraint is loosened, the method improves considerably and simplifies the calibration model for the prediction of Y.


Metabolomics | 2012

Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies

Ewa Szymańska; Edoardo Saccenti; Age K. Smilde; Johan A. Westerhuis

Partial Least Squares-Discriminant Analysis (PLS-DA) is a PLS regression method with a special binary ‘dummy’ y-variable and it is commonly used for classification purposes and biomarker selection in metabolomics studies. Several statistical approaches are currently in use to validate outcomes of PLS-DA analyses, e.g. double cross validation procedures or permutation testing. However, there is a great inconsistency in the optimization and the assessment of performance of PLS-DA models due to the many different diagnostic statistics currently employed in metabolomics data analyses. In this paper, properties of four diagnostic statistics of PLS-DA, namely the number of misclassifications (NMC), the Area Under the Receiver Operating Characteristic (AUROC), Q2 and Discriminant Q2 (DQ2), are discussed. All four diagnostic statistics are used in the optimization and the performance assessment of PLS-DA models of three metabolomics data sets of different sizes, obtained with two different types of analytical platforms and with different levels of known differences between two groups: control and case groups. Statistical significance of the obtained PLS-DA models was evaluated with permutation testing. PLS-DA models obtained with NMC and AUROC are more powerful in detecting very small differences between groups than models obtained with Q2 and DQ2. The reproducibility of the PLS-DA model outcomes, model complexity and permutation test distributions are also investigated to explain this phenomenon. DQ2 and Q2 (in contrast to NMC and AUROC) prefer PLS-DA models with lower complexity and require a higher number of permutation tests and submodels to accurately estimate the statistical significance of the model performance. NMC and AUROC seem to be more efficient and more reliable diagnostic statistics and should be recommended in two-group discrimination metabolomics studies.
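The four diagnostic statistics can be sketched from a vector of class labels and cross-validated predictions. The definitions below are simplified assumptions: classes are coded 0 and 1, NMC uses a fixed threshold of 0.5, Q2 uses the overall mean of y as reference, and DQ2 leaves out the squared error of predictions that fall beyond their own class label. The paper's exact definitions may differ, and the predictions shown are made up for illustration.

```python
def nmc(y, y_hat, threshold=0.5):
    """Number of misclassifications when thresholding the predictions."""
    return sum((p > threshold) != (t == 1) for t, p in zip(y, y_hat))


def auroc(y, y_hat):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    pos = [p for t, p in zip(y, y_hat) if t == 1]
    neg = [p for t, p in zip(y, y_hat) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def q2(y, y_hat):
    """1 - PRESS / TSS, with TSS taken around the overall mean of y."""
    mean_y = sum(y) / len(y)
    tss = sum((t - mean_y) ** 2 for t in y)
    press = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    return 1 - press / tss


def dq2(y, y_hat):
    """Like Q2, but predictions beyond the class label are not penalized."""
    mean_y = sum(y) / len(y)
    tss = sum((t - mean_y) ** 2 for t in y)
    press_d = 0.0
    for t, p in zip(y, y_hat):
        beyond = (t == 0 and p < 0) or (t == 1 and p > 1)
        if not beyond:
            press_d += (t - p) ** 2
    return 1 - press_d / tss


# Hypothetical labels and cross-validated predictions.
y = [0, 0, 0, 1, 1, 1]
y_hat = [-0.2, 0.1, 0.6, 0.4, 0.9, 1.3]

print(nmc(y, y_hat))    # 2
print(auroc(y, y_hat))
print(q2(y, y_hat))
print(dq2(y, y_hat))
```

NMC and AUROC depend only on thresholded or ranked predictions, while Q2 and DQ2 depend on the numerical distance to the class labels, which is one way to see why the two pairs can behave differently.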

Collaboration


Dive into Age K. Smilde's collaborations.

Top Co-Authors

Rasmus Bro

University of Copenhagen

J. Jansen

University of Amsterdam
