Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Huub C. J. Hoefsloot is active.

Publication


Featured research published by Huub C. J. Hoefsloot.


BMC Genomics | 2006

Centering, scaling, and transformations: improving the biological information content of metabolomics data.

Robert A. van den Berg; Huub C. J. Hoefsloot; Johan A. Westerhuis; Age K. Smilde; Mariët J. van der Werf

Background: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration between metabolites can be present in a metabolomics data set, while these differences are not proportional to the biological relevance of the metabolites. Data analysis methods, however, cannot make this distinction on their own. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data and thus improving its interpretability.

Results: Different data pretreatment methods, i.e. centering, autoscaling, pareto scaling, range scaling, vast scaling, log transformation, and power transformation, were tested on a real-life metabolomics data set. They were found to greatly affect the outcome of the data analysis and thus the rank of the biologically most important metabolites. Furthermore, the stability of this rank, the influence of technical errors on the data analysis, and the preference of data analysis methods for selecting highly abundant metabolites all depended on the pretreatment method applied prior to data analysis.

Conclusion: Different pretreatment methods emphasize different aspects of the data, and each has its own merits and drawbacks. The choice of pretreatment method depends on the biological question to be answered, the properties of the data set, and the data analysis method selected. For the explorative analysis of the validation data set used in this study, autoscaling and range scaling performed better than the other pretreatment methods: they removed the dependence of the metabolite rank on the average concentration and the magnitude of the fold changes, and gave biologically sensible results after principal component analysis (PCA). In conclusion, selecting a proper data pretreatment method is an essential step in the analysis of metabolomics data and greatly affects which metabolites are identified as the most important.
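The pretreatment methods compared in this paper are simple column-wise operations on the data matrix. The following sketch (my own illustration on hypothetical toy data, assuming NumPy; not the authors' code) shows four of them:

```python
import numpy as np

def center(X):
    # Subtract the column (per-metabolite) mean from each measurement.
    return X - X.mean(axis=0)

def autoscale(X):
    # Mean-center, then divide by the column standard deviation: every
    # metabolite gets unit variance regardless of its abundance.
    Xc = center(X)
    return Xc / Xc.std(axis=0, ddof=1)

def pareto_scale(X):
    # Mean-center, then divide by the square root of the standard
    # deviation: large fold changes are shrunk but not fully equalized.
    Xc = center(X)
    return Xc / np.sqrt(Xc.std(axis=0, ddof=1))

def range_scale(X):
    # Mean-center, then divide by the per-metabolite range (max - min).
    Xc = center(X)
    return Xc / (X.max(axis=0) - X.min(axis=0))

# Toy data: 5 samples x 3 metabolites with very different abundances,
# mimicking the 5000-fold concentration differences mentioned above.
rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, 50.0, 5000.0], scale=[0.1, 5.0, 500.0], size=(5, 3))

Xa = autoscale(X)
Xp = pareto_scale(X)
Xr = range_scale(X)
```

After autoscaling, the high-abundance metabolite no longer dominates: every column contributes equally to a subsequent PCA, which is exactly why the choice of pretreatment changes which metabolites look important.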


Metabolomics | 2008

Assessment of PLSDA cross validation

Johan A. Westerhuis; Huub C. J. Hoefsloot; Suzanne Smit; Daniel J. Vis; Age K. Smilde; Ewoud J. J. van Velzen; John P. M. van Duijnhoven; Ferdi A. van Dorsten

Classifying groups of individuals based on their metabolic profile is one of the main topics in metabolomics research. Because the number of individuals is low compared to the large number of variables, this is not an easy task. PLSDA is one of the data analysis methods used for such classification. Unfortunately, this method readily overfits the data, so rigorous validation is necessary. The validation, however, is far from straightforward. In this paper we discuss a strategy based on cross model validation and permutation testing to validate the classification models. It is also shown that overly optimistic results are obtained when the validation is not done properly. Furthermore, we advocate against the use of PLSDA score plots for inference of class differences.


Bioinformatics | 2005

ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data

Age K. Smilde; J. Jansen; Huub C. J. Hoefsloot; Robert-Jan A. N. Lamers; Jan van der Greef; Marieke E. Timmerman

Motivation: Datasets resulting from metabolomics or metabolic profiling experiments are becoming increasingly complex. Such datasets may contain underlying factors, such as time (time-resolved or longitudinal measurements), doses, or combinations thereof. Currently used biostatistics methods do not take the structure of such complex datasets into account. However, incorporating this structure into the data analysis is important for understanding the biological information in these datasets.

Results: We describe ASCA, a new method that can deal with complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a dataset from a metabolomics experiment with time and dose factors.
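The ANOVA-style partitioning behind ASCA can be sketched in a few lines. The example below is a simplified illustration on synthetic data with a balanced time-and-dose design (my own construction, not the authors' implementation): each design factor gets an effect matrix of level means, and each effect matrix is then summarized by simultaneous component analysis (PCA via SVD).

```python
import numpy as np

def effect_matrix(X, factor):
    # ANOVA-style effect estimate: replace each row of the (centered)
    # data by the mean profile of its factor level.
    Xc = X - X.mean(axis=0)
    M = np.zeros_like(Xc)
    for level in np.unique(factor):
        idx = factor == level
        M[idx] = Xc[idx].mean(axis=0)
    return M

def sca(M, n_components=2):
    # Simultaneous component analysis of an effect matrix = PCA via SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :n_components] * s[:n_components], Vt[:n_components].T

rng = np.random.default_rng(2)
time = np.repeat([0, 1, 2], 8)            # 3 time points, 8 samples each
dose = np.tile(np.repeat([0, 1], 4), 3)   # 2 dose levels within each time point
X = rng.normal(size=(24, 50)) + 2.0 * time[:, None]   # a time effect only

X_time = effect_matrix(X, time)           # variation induced by time
X_dose = effect_matrix(X - X_time, dose)  # dose variation, after removing time
scores_time, loadings_time = sca(X_time)
```

Because the simulated data contain only a time effect, the time effect matrix should carry far more variation than the dose effect matrix, and the scores per time level can be read off directly, which is the "easy interpretation" the abstract refers to.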


Metabolomics | 2010

Multivariate paired data analysis: multilevel PLSDA versus OPLSDA

Johan A. Westerhuis; Ewoud J. J. van Velzen; Huub C. J. Hoefsloot; Age K. Smilde

Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over designed experiment. In such a design, each subject in the study population acts as his or her own control and makes the data paired. For a single univariate response a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). The results from both methods have been evaluated in a small simulated example as well as in a genuine data set from a cross-over designed nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.
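The multilevel idea, separating between-subject from within-subject variation in paired cross-over data, can be sketched as follows (a toy illustration with simulated data; the subsequent PLSDA on the within-subject part is omitted, and all sizes and effect magnitudes are arbitrary choices):

```python
import numpy as np

def multilevel_split(X_a, X_b):
    # X_a, X_b: measurements of the same subjects under treatments A and
    # B (rows paired by subject). The between-subject part is the subject
    # mean; the within-subject part carries only the treatment effect.
    subject_mean = (X_a + X_b) / 2.0
    between = subject_mean - subject_mean.mean(axis=0)
    within_a = X_a - subject_mean
    within_b = X_b - subject_mean
    return between, within_a, within_b

rng = np.random.default_rng(3)
n_subjects, n_vars = 10, 30
subject_offset = rng.normal(scale=5.0, size=(n_subjects, n_vars))  # large
treatment_effect = np.zeros(n_vars)
treatment_effect[:3] = 1.0                                          # small
X_a = subject_offset + rng.normal(scale=0.5, size=(n_subjects, n_vars))
X_b = subject_offset + treatment_effect \
      + rng.normal(scale=0.5, size=(n_subjects, n_vars))

between, within_a, within_b = multilevel_split(X_a, X_b)
```

A PLSDA fitted on the within-subject matrices sees only the (small) treatment effect, with the dominant between-subject offsets removed; a method that ignores the pairing has to find the same effect buried under that between-subject variation, which is where the gain in power comes from.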


Metabolomics | 2014

Reflections on univariate and multivariate analysis of metabolomics data

Edoardo Saccenti; Huub C. J. Hoefsloot; Age K. Smilde; Johan A. Westerhuis; Margriet M. W. B. Hendriks

Metabolomics experiments usually result in a large quantity of data. Univariate and multivariate analysis techniques are routinely used to extract relevant information from the data, with the aim of providing biological knowledge on the problem studied. Despite the fact that statistical tools like the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis constitute the backbone of the statistical part of the vast majority of metabolomics papers, many basic but rather fundamental questions are still often asked, such as: Why do the results of univariate and multivariate analyses differ? Why apply univariate methods if you have already applied a multivariate method? Why do I see something multivariately that I do not see univariately? In the present paper we address some aspects of univariate and multivariate analysis, with the aim of clarifying in simple terms the main differences between the two approaches. Applications of the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis are shown on both real and simulated metabolomics data to provide an overview of fundamental aspects of univariate and multivariate methods.
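One classic reason the two approaches disagree is correlation between variables. The toy example below (my own construction, not taken from the paper; assumes SciPy) builds two strongly correlated variables whose class difference lies along the minor axis of the correlation, so each variable alone looks uninformative while a simple multivariate combination separates the classes clearly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 50
y = np.repeat([0, 1], n // 2)

# Shared variation dominates both variables; the classes are shifted in
# opposite directions by a small amount.
common = rng.normal(size=n)
x1 = common + 0.3 * rng.normal(size=n) + 0.4 * y
x2 = common - 0.3 * rng.normal(size=n) - 0.4 * y

# Univariate view: each variable tested on its own.
p1 = stats.ttest_ind(x1[y == 0], x1[y == 1]).pvalue
p2 = stats.ttest_ind(x2[y == 0], x2[y == 1]).pvalue

# Multivariate view: the contrast x1 - x2 cancels the shared variation
# and isolates the class difference.
d = x1 - x2
p_multi = stats.ttest_ind(d[y == 0], d[y == 1]).pvalue
```

This is the "I see something multivariately that I do not see univariately" situation in miniature: the information is in the joint distribution of the variables, not in either marginal.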


Bioinformatics | 2004

Analysis of longitudinal metabolomics data

J. Jansen; Huub C. J. Hoefsloot; Hans F. M. Boelens; Jan van der Greef; Age K. Smilde

Motivation: Metabolomics datasets are generally large and complex. Using principal component analysis (PCA), a simplified view of the variation in the data is obtained. The PCA model can be interpreted and the processes underlying the variation in the data can be analysed. In metabolomics, a priori information about the data is often available. Various forms of this information can be used in an unsupervised data analysis with weighted PCA (WPCA). A WPCA model gives a view on the data that differs from the view obtained using PCA, and it adds to the interpretation of the information in a metabolomics dataset.

Results: A method is presented to translate spectra of repeated measurements into weights describing the experimental error. These weights are used in the data analysis with WPCA. The WPCA model accounts for the non-uniform experimental error and therefore focuses more on the natural variation in the data.

Availability: M-files for MATLAB implementing the algorithm used in this research are available at http://www-its.chem.uva.nl/research/pac/Software/pcaw.zip.
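The weighting idea can be sketched in NumPy (a simplified illustration with simulated replicate spectra, not the authors' MATLAB code): error standard deviations estimated from repeated measurements become inverse weights on the variables before an ordinary SVD-based PCA.

```python
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_vars = 30, 20

# Simulated repeated measurements of one sample: the per-variable
# technical error is non-uniform across the spectrum.
error_std = rng.uniform(0.1, 2.0, size=n_vars)
replicates = rng.normal(scale=error_std, size=(10, n_vars))

# Weights from the repeated measurements: the inverse of the estimated
# experimental error standard deviation.
weights = 1.0 / replicates.std(axis=0, ddof=1)

# Noisy data matrix (toy): variables with larger error are noisier.
X = rng.normal(size=(n_samples, n_vars)) * error_std
Xc = X - X.mean(axis=0)

# Weighted PCA: scale each column by its weight, then do ordinary PCA
# via SVD; variables with large experimental error are down-weighted.
U, s, Vt = np.linalg.svd(Xc * weights, full_matrices=False)
scores = U * s
```

With uniform weights this reduces to ordinary PCA; with error-derived weights the components are steered toward the natural variation rather than toward the noisiest channels.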


Metabolomics | 2010

Dynamic metabolomic data analysis: A tutorial review

Age K. Smilde; Johan A. Westerhuis; Huub C. J. Hoefsloot; Sabina Bijlsma; Carina M. Rubingh; Daniel J. Vis; Renger H. Jellema; Hanno Pijl; Ferdinand Roelfsema; J. van der Greef

In metabolomics, time-resolved, dynamic, or temporal data are increasingly being collected. The number of methods to analyze such data, however, is very limited, and in most cases the dynamic nature of the data is not even taken into account. This paper reviews current methods in use for analyzing dynamic metabolomic data. Moreover, some methods from other fields of science that may be of use for analyzing such data are described in some detail. The methods are put in a general framework after providing a formal definition of what constitutes a 'dynamic' method. Some of the methods are illustrated with real-life metabolomics examples.


International Journal for Numerical Methods in Fluids | 1999

Lattice‐Boltzmann and finite element simulations of fluid flow in a SMRX Static Mixer Reactor

Drona Kandhai; D.J.-E. Vidal; Alfons G. Hoekstra; Huub C. J. Hoefsloot; Piet D. Iedema; Peter M. A. Sloot

Summary: A detailed comparison between the finite element method (FEM) and the lattice-Boltzmann method (LBM) is presented. As a realistic test case, three-dimensional fluid flow simulations in an SMRX static mixer were performed. The SMRX static mixer is a piece of equipment with excellent mixing performance, used as a highly efficient chemical reactor for viscous systems such as polymers. The complex geometry of this mixer makes such three-dimensional simulations non-trivial. Excellent agreement between the results of the two simulation methods was found. Furthermore, the numerical results for the pressure drop as a function of the flow rate were close to experimental measurements. The results show that the relatively simple LBM is a good alternative to traditional methods.
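The lattice-Boltzmann method compared against FEM here can be illustrated in miniature. The sketch below (my own toy example, not the paper's three-dimensional SMRX geometry) simulates body-force-driven 2-D channel flow with a D2Q9 BGK model and full-way bounce-back walls; the grid size, relaxation time, and forcing are arbitrary choices, and the steady profile should be approximately parabolic (Poiseuille flow).

```python
import numpy as np

# D2Q9 lattice: discrete velocities, weights, and opposite directions.
cx = np.array([0, 1, 0, -1, 0, 1, -1, -1, 1])
cy = np.array([0, 0, 1, 0, -1, 1, 1, -1, -1])
w = np.array([4/9, 1/9, 1/9, 1/9, 1/9, 1/36, 1/36, 1/36, 1/36])
opp = [0, 3, 4, 1, 2, 7, 8, 5, 6]

nx, ny = 40, 21      # channel: periodic in x, solid walls at y = 0 and y = ny-1
tau, g = 0.8, 1e-5   # BGK relaxation time; body force stands in for a pressure drop

f = np.tile(w[:, None, None], (1, ny, nx))   # start at rest with density 1
solid = np.zeros((ny, nx), bool)
solid[0, :] = solid[-1, :] = True

for _ in range(4000):
    rho = f.sum(axis=0)
    ux = (f * cx[:, None, None]).sum(axis=0) / rho
    uy = (f * cy[:, None, None]).sum(axis=0) / rho
    ue = ux + tau * g / rho          # body force via a shifted equilibrium velocity
    usq = ue * ue + uy * uy
    feq = np.empty_like(f)
    for i in range(9):
        cu = cx[i] * ue + cy[i] * uy
        feq[i] = w[i] * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq)
    fpost = f - (f - feq) / tau              # BGK collision
    fpost[:, solid] = f[opp][:, solid]       # full-way bounce-back at the walls
    for i in range(9):                       # streaming step
        fpost[i] = np.roll(np.roll(fpost[i], cx[i], axis=1), cy[i], axis=0)
    f = fpost

# x-averaged velocity profile across the channel height.
ux_profile = ((f * cx[:, None, None]).sum(axis=0) / f.sum(axis=0)).mean(axis=1)
```

The appeal the summary mentions is visible even at this scale: collision is purely local and streaming is a fixed shift, so the method parallelizes trivially and handles complex solid geometry by nothing more than marking nodes in the `solid` mask.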


Chemical Engineering Science | 2000

Selection of optimal sensor position in a tubular reactor using robust degree of observability criteria

F.W.J. van den Berg; Huub C. J. Hoefsloot; Hans F. M. Boelens; Age K. Smilde

Robust selection criteria are developed for the optimal location of in-process concentration or temperature sensors along the length of a tubular reactor for the partial oxidation of benzene to maleic anhydride. A model of the reactor is constructed by rewriting the PDEs describing the mass and heat balances into a set of ODEs through the method of lines on a grid defined over the reactor length. The linearized model is described as a continuous, time-invariant state-space model whose state comprises the temperature and concentration profiles at the grid points. The best sensor location for the reactor is found by specifying scalar measures on the observability Gramian from the linear least-squares state estimation problem. New robust criteria for the degree of observability are specified. The scores on these criteria are determined by the amount of signal received by a sensor for a specific system configuration. These new selection criteria are compared with known degree-of-observability measures for the optimal sensor location problem from the literature.
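The overall recipe, method-of-lines model, observability Gramian per candidate sensor position, scalar score, can be sketched as follows. This is a hypothetical 1-D convection-diffusion surrogate for the reactor (my own simplification, assuming SciPy), and trace of the Gramian is used as one simple degree-of-observability measure; the paper's specific robust criteria are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Method of lines: upwind discretization of a 1-D convection-diffusion
# equation gives a stable linear state-space matrix A.
n = 20                      # grid points along the reactor length
D, v = 0.1, 1.0             # diffusion and convection coefficients (arbitrary)
dx = 1.0 / n
A = (np.diag([-(2 * D / dx**2 + v / dx)] * n)
     + np.diag([D / dx**2 + v / dx] * (n - 1), -1)
     + np.diag([D / dx**2] * (n - 1), 1))

def degree_of_observability(A, sensor_index):
    # A single sensor measuring the state at one grid point.
    n = A.shape[0]
    C = np.zeros((1, n))
    C[0, sensor_index] = 1.0
    # The observability Gramian W solves the Lyapunov equation
    #   A^T W + W A = -C^T C   (A must be stable, which it is here).
    W = solve_continuous_lyapunov(A.T, -C.T @ C)
    # One simple scalar measure: trace(W), the total output signal energy.
    return float(np.trace(W))

scores = [degree_of_observability(A, k) for k in range(n)]
best = int(np.argmax(scores))
```

Scanning `sensor_index` over the grid and ranking the scalar scores is the sensor placement loop; swapping `np.trace` for another Gramian functional changes the criterion without changing the machinery.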


Molecular & Cellular Proteomics | 2013

A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

Christin Christin; Huub C. J. Hoefsloot; Age K. Smilde; Berend Hoekman; Frank Suits; Rainer Bischoff; Peter Horvatovich

In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery—t test, the Mann–Whitney–Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine–recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA)—using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
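The two performance summaries used in this study, the f-score (harmonic mean of recall and precision) and the g-score (geometric mean of recall and the true negative rate), are easy to state in code. A small sketch with hypothetical feature indices (not the authors' evaluation pipeline):

```python
import numpy as np

def selection_scores(selected, spiked, all_features):
    # selected: features chosen by a method; spiked: truly discriminating
    # features; all_features: every measured feature.
    selected, spiked = set(selected), set(spiked)
    tp = len(selected & spiked)
    fp = len(selected - spiked)
    fn = len(spiked - selected)
    tn = len(set(all_features) - selected - spiked)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp) if selected else 0.0
    tnr = tn / (tn + fp)
    # f-score: harmonic mean of recall and precision.
    f = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    # g-score: geometric mean of recall and the true negative rate.
    g = np.sqrt(recall * tnr)
    return f, g

features = range(100)
spiked = range(10)                    # 10 truly spiked features
selected = [0, 1, 2, 3, 4, 50, 51]    # 5 true positives, 2 false positives
f_score, g_score = selection_scores(selected, spiked, features)
```

The two scores penalize different failure modes: a method that selects a tiny but clean feature set (like PCDA/PLSDA above) keeps precision high but loses recall, hurting both scores, while a method that selects almost everything keeps recall high but is punished by precision (f-score) and by the true negative rate (g-score).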

Collaboration


Dive into Huub C. J. Hoefsloot's collaboration.

Top Co-Authors

J. Jansen

University of Amsterdam
