Publication


Featured research published by Jos A. Hageman.


Metabolomics | 2016

Improved batch correction in untargeted MS-based metabolomics.

Ron Wehrens; Jos A. Hageman; Fred A. van Eeuwijk; Rik Kooke; Pádraic J. Flood; Erik Wijnker; Joost J. B. Keurentjes; Arjen Lommen; Henriëtte D. L. M. van Eekelen; Robert D. Hall; Roland Mumm; Ric C. H. de Vos

Introduction: Batch effects in large untargeted metabolomics experiments are almost unavoidable, especially when sensitive detection techniques like mass spectrometry (MS) are employed. In order to obtain peak intensities that are comparable across all batches, corrections need to be performed. Since non-detects, i.e., signals with an intensity too low to be detected with certainty, are common in metabolomics studies, the batch correction methods need to take these into account.
Objectives: This paper aims to compare several batch correction methods, and investigates the effect of different strategies for handling non-detects.
Methods: Batch correction methods usually consist of regression models, possibly also accounting for trends within batches. To fit these models, quality control samples (QCs), injected at regular intervals, can be used. Study samples can also be used, provided that the injection order is properly randomized. Normalization methods, which do not use information on batch labels or injection order, can correct for batch effects as well. Introducing two easy-to-use quality criteria, we assess the merits of these batch correction strategies using three large LC–MS and GC–MS data sets of samples from Arabidopsis thaliana.
Results: The three data sets have very different characteristics, leading to clearly distinct behaviour of the batch correction strategies studied. Explicit inclusion of information on batch and injection order in general leads to very good corrections; when enough QCs are available, general normalization approaches also perform well. Several approaches are shown to be able to handle non-detects; replacing them with very small numbers such as zero seems the worst of the approaches considered.
Conclusion: The use of quality control samples for batch correction leads to good results when enough QCs are available. If an experiment is properly set up, batch correction using the study samples usually leads to a similar high-quality correction, but has the advantage that more metabolites are corrected. The strategy for handling non-detects is important: choosing small values like zero can lead to suboptimal batch corrections.
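The regression-based correction described in this abstract could look roughly like the sketch below for a single metabolite. It is an illustration only, not the authors' implementation: the function name `correct_batches`, the choice of a straight-line trend per batch, and the toy data are all assumptions, and non-detects are not handled here.

```python
import numpy as np

def correct_batches(intensity, batch, order, is_qc):
    """Remove per-batch linear trends estimated from QC injections.

    Assumption: a straight line in injection order is fitted per batch,
    using QC samples only; each batch needs at least two QCs.
    """
    intensity = np.asarray(intensity, dtype=float)
    batch = np.asarray(batch)
    order = np.asarray(order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)

    corrected = intensity.copy()
    reference = intensity[is_qc].mean()  # overall QC level restored after detrending
    for b in np.unique(batch):
        in_batch = batch == b
        qc = in_batch & is_qc
        slope, intercept = np.polyfit(order[qc], intensity[qc], deg=1)
        trend = slope * order[in_batch] + intercept
        corrected[in_batch] = intensity[in_batch] - trend + reference
    return corrected

# Noise-free toy data: two batches, each with its own offset and drift.
batch = np.repeat([1, 2], 6)
order = np.tile(np.arange(6), 2)
is_qc = np.tile([True, False, False, True, False, True], 2)
true_level = np.where(is_qc, 100.0, 120.0)
offset = np.where(batch == 1, 5.0, -8.0)
drift = np.where(batch == 1, 1.5, -0.7)
raw = true_level + offset + drift * order

corrected = correct_batches(raw, batch, order, is_qc)
```

On this toy data all QC injections end up at one common level and all study samples sit a constant distance above it, which is the behaviour a batch correction aims for. Fitting on study samples instead of QCs would amount to passing a different mask, under the randomization condition the abstract mentions.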


Journal of Near Infrared Spectroscopy | 2005

Temperature robust multivariate calibration: an overview of methods for dealing with temperature influences on near infrared spectra

Jos A. Hageman; Johan A. Westerhuis; Age K. Smilde

Multivariate calibration is a powerful tool for establishing a relationship between spectral variables and properties of interest. Usually, changes in spectral variables are ascribed to changes in the chemical composition of the sample. However, spectral intensities that are measured at varying temperatures do not only change because of changes in sample composition but also respond to the change in temperature. In these cases, multivariate calibration can be (severely) hindered, resulting in a loss of prediction capabilities. This paper provides an overview of the characteristics and possibilities of (most) methods for temperature robust multivariate calibration. The methods are discussed by using two data sets.


PLOS ONE | 2015

Bovine milk proteome in the first 9 days: protein interactions in maturation of the immune and digestive system of the newborn.

Lina Zhang; Jos A. Hageman; Toon van Hooijdonk; Jacques Vervoort; Kasper Hettinga

In order to better understand the milk proteome and its changes from colostrum to mature milk, samples taken at seven time points in the first 9 days from 4 individual cows were analyzed using proteomic techniques. Both similarity in the changes from day 0 to day 9 in the quantitative milk proteome and differences in specific protein abundance were observed among the four cows. One third of the quantified proteins showed a significant decrease in concentration over the first 9 days after calving, especially the immune proteins (as much as 40-fold). Three relatively high-abundance enzymes (XDH, LPL, and RNASE1) and a cell division and proliferation protein (CREG1) may be involved in the maturation of the gastro-intestinal tract. In addition, high correlations between proteins involved in complement and blood coagulation cascades illustrate the complex nature of biological interrelationships between milk proteins. The linear decrease of protease inhibitors and proteins involved in the innate and adaptive immune systems implies a protective role for protease inhibitors against degradation. In conclusion, the results found in this study not only improve our understanding of the role of colostrum in both host defense and development of the newborn calf but also provide guidance for the improvement of infant formula through better understanding of the complex interactions between milk proteins.


Plant and Soil | 2006

The complementarity of extractable and ester-bound lipids in a soil profile under pine.

Klaas G.J. Nierop; Boris Jansen; Jos A. Hageman; J.M. Verstraten

Extractable and solvent-insoluble, ester-bound lipids were analysed in an acid, sandy soil profile under Corsican pine. The n-alkanes and alkanoic acids from the soil profile showed rather poor correlations with those from the pine needles and roots, while the n-alkanol composition in the mineral horizons strongly indicated the presence of lipids derived from a previous grass vegetation. Although the ester-bound lipids (ω-hydroxyalkanoic acids and α,ω-alkanedioic acids (>C24)) suggested that plant sources other than pines were present in the mineral soil horizons, their composition was less contaminated and a clear distinction between needle and root input could be discerned. The divergent clustering of soil horizons and plant materials by individual and combined compound classes emphasized the usefulness of both extractable lipids and cutin/suberin in unravelling (past) vegetation and tissue history and contributions to soil organic matter.


Metabolomics | 2008

Genetic algorithm based two-mode clustering of metabolomics data

Jos A. Hageman; R.A. van den Berg; Johan A. Westerhuis; M.J. van der Werf; Age K. Smilde

Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.
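To make the idea of a homogeneity criterion concrete, the sketch below scores one candidate two-mode partition by the within-block sum of squares, a common choice for two-mode partitioning. The abstract does not state the exact criterion used, so this formula, the function name `two_mode_sse`, and the toy matrix are assumptions; the genetic algorithm that would search over label vectors is not shown.

```python
import numpy as np

def two_mode_sse(X, row_labels, col_labels):
    """Sum of squared deviations of every entry from its block mean.

    A block is the set of entries whose row falls in one row cluster and
    whose column falls in one column cluster. Lower means more homogeneous.
    """
    X = np.asarray(X, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    sse = 0.0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = X[np.ix_(row_labels == r, col_labels == c)]
            sse += ((block - block.mean()) ** 2).sum()
    return sse

# Toy matrix with a clear 2 x 2 block structure (rows 0-1 vs row 2, cols 0-1 vs cols 2-3).
X = np.array([[1, 1, 5, 5],
              [1, 1, 5, 5],
              [9, 9, 3, 3]])
good = two_mode_sse(X, [0, 0, 1], [0, 0, 1, 1])
bad = two_mode_sse(X, [0, 1, 1], [0, 0, 1, 1])
```

A search procedure such as a genetic algorithm would treat the pair of label vectors as the candidate solution and this score as the quantity to minimize.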


Analytica Chimica Acta | 2011

On the increase of predictive performance with high-level data fusion

T.G. Doeswijk; Age K. Smilde; Jos A. Hageman; Johan A. Westerhuis; F. A. van Eeuwijk

The combination of different data sources for classification purposes, also called data fusion, can be done at different levels: low-level, i.e. concatenating data matrices; medium-level, i.e. concatenating data matrices after feature selection; and high-level, i.e. combining model outputs. In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is used on each of the data sets and dummy variables representing the classes are used as response variables. Based on the estimated responses ŷ(j) for data set j and class k, a Gaussian distribution p(g(k)|ŷ(j)) is fitted. A simulation study is performed that shows the theoretical performance of high-level data fusion for two classes and two data sets. Within-group correlations of the predicted responses of the two models and differences between the predictive ability of each of the separate models and the fused models are studied. Results show that the error rate is always less than or equal to that of the best performing subset and can theoretically approach zero. Negative within-group correlations always improve the predictive performance. However, if the data sets have a joint basis, as with metabolomics data, this is not likely to happen. For equally performing individual classifiers the best results are expected for small within-group correlations. Fusion of a non-predictive classifier with a classifier that exhibits discriminative ability leads to increased predictive performance if the within-group correlations are strong. An example with real-life data shows the applicability of the simulation results.
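As a rough illustration of combining model outputs, the sketch below fits a Gaussian to each model's predicted responses per class and multiplies the per-model densities to obtain fused class probabilities. Treating the models as conditionally independent is a simplifying assumption made here, not something the abstract states (the paper explicitly studies within-group correlations), and the PLS step is replaced by hard-coded predicted responses. All function names and numbers are hypothetical.

```python
import numpy as np

def fit_gaussians(y_hat, labels):
    """Return {class: (mean, std)} of one model's predicted responses."""
    return {int(k): (y_hat[labels == k].mean(), y_hat[labels == k].std(ddof=1))
            for k in np.unique(labels)}

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def fuse(y_hats_new, params_per_model, priors):
    """Fused class probabilities from several models' predicted responses.

    Assumption: per-model densities are multiplied, i.e. the models are
    treated as independent given the class.
    """
    classes = sorted(priors)
    scores = np.array([
        priors[k] * np.prod([gaussian_pdf(y, *params[k])
                             for y, params in zip(y_hats_new, params_per_model)],
                            axis=0)
        for k in classes
    ])
    return classes, scores / scores.sum(axis=0)

# Hypothetical predicted responses from two models on training samples.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y1 = np.array([0.0, 0.2, -0.1, 0.1, 0.9, 1.1, 1.0, 0.8])
y2 = np.array([0.1, -0.2, 0.3, 0.0, 0.7, 1.2, 0.9, 1.0])
params = [fit_gaussians(y1, labels), fit_gaussians(y2, labels)]

# One new sample: model 1 predicts 0.9, model 2 predicts 1.0.
classes, posterior = fuse([np.array([0.9]), np.array([1.0])],
                          params, {0: 0.5, 1: 0.5})
```

Here `posterior` has one row per class and one column per new sample; accounting for within-group correlation would require a joint (for example bivariate) density instead of the product used in this sketch.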


Euphytica | 2012

Two-mode clustering of genotype by trait and genotype by environment data

Jos A. Hageman; Marcos Malosetti; F. A. van Eeuwijk

In this paper, we demonstrate the use of two-mode clustering for genotype by trait and genotype by environment data. In contrast to two separate (one mode) clusterings on genotypes or traits/environments, two-mode clustering simultaneously produces homogeneous groups of genotypes and traits/environments. For two-mode clustering, we first scan all two-mode cluster solutions with all possible numbers of clusters using k-means. After deciding on the final numbers of clusters, we continue with a two-mode clustering algorithm based on a genetic algorithm. This ensures optimal solutions even for large data sets. We discuss the application of two-mode clustering to multiple trait data stemming from genomic research on tomatoes as well as an application to multi-environment data on barley.


Critical Reviews in Analytical Chemistry | 2006

Bagged K-Means Clustering of Metabolome Data

Jos A. Hageman; R.A. van den Berg; Johan A. Westerhuis; Huub C. J. Hoefsloot; Age K. Smilde

Clustering of metabolomics data can be hampered by noise originating from biological variation, physical sampling error and analytical error. Using data analysis methods that are not specifically suited for dealing with noisy data will yield suboptimal solutions. Bootstrap aggregating (bagging) is a resampling technique that can deal with noise and improves accuracy. This paper demonstrates the possibilities for bagged clustering applied to metabolomics data. The metabolomics data used in this paper is computer-generated with the human red blood cell model. Perturbing this model can be done in several ways. In this paper, inhibition experiments are mimicked by inhibiting enzyme activity to 10% of its original value. Comparing bagged K-means clustering to ordinary K-means, the number of metabolites switching clusters under the influence of heteroscedastic noise is lower if bagging is used. This favors bagged K-means over ordinary K-means clustering when dealing with noisy metabolomics data. A special validation scheme, independent of the addition of noise, has been devised to demonstrate the positive effects of bagging on clustering.


PLOS ONE | 2011

Simplivariate models: uncovering the underlying biology in functional genomics data

Edoardo Saccenti; Johan A. Westerhuis; Age K. Smilde; M.J. van der Werf; Jos A. Hageman; M.M.W.B. Hendriks

One of the first steps in analyzing high-dimensional functional genomics data is an exploratory analysis of such data. Cluster Analysis and Principal Component Analysis are then usually the methods of choice. Despite their versatility they also have a severe drawback: they do not always generate simple and interpretable solutions. On the basis of the observation that functional genomics data often contain both informative and non-informative variation, we propose a method that finds sets of variables containing informative variation. This informative variation is subsequently expressed in easily interpretable simplivariate components. We present a new implementation of the recently introduced simplivariate models. In this implementation, the informative variation is described by multiplicative models that can adequately represent the relations between functional genomics data. One simulated and two real-life metabolomics data sets show good performance of the method.


Journal of Computational Chemistry | 2003

Powder pattern indexing using the weighted crosscorrelation and genetic algorithms.

Jos A. Hageman; Ron Wehrens; R. de Gelder; L.M.C. Buydens

X‐ray diffraction is a powerful technique for investigating the structure of crystals and crystalline powders. Unfortunately, for powders, the first step in the structure elucidation process, retrieving the unit cell parameters (indexing), is still very critical. In the present article, an improved approach to powder pattern indexing is presented. The proposed method matches peak positions from experimental X‐ray powder patterns with peak positions from trial cells using a recently published method for pattern comparison (weighted crosscorrelation). Trial cells are optimized with Genetic Algorithms. Patterns are not pretreated to remove any existing zero point shift, as this is determined during optimization. Another improvement is the peak assignment procedure. This assignment is needed for determining the similarity between lines from trial cells and experiment. It no longer allows calculated peaks to be assigned twice to different experimental peaks, which is beneficial for the indexing process. The procedure proves to be robust with respect to false peaks and accidental or systematic absences of reflections, and is successfully applied to powder patterns originating from orthorhombic, monoclinic, and triclinic compounds measured with synchrotron as well as with conventional laboratory X‐ray diffractometers.
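The weighted crosscorrelation mentioned in this abstract can be sketched for two discretized patterns as below. The sketch assumes a triangular weighting function over the shift and patterns sampled on the same grid; the function name `weighted_xcorr`, the `width` parameter, and the toy patterns are illustrative and are not taken from the article.

```python
import numpy as np

def weighted_xcorr(f, g, width):
    """Normalized weighted cross-correlation of two equally sampled patterns.

    Assumption: correlations at shift r are weighted by the triangle
    1 - |r| / width for |r| < width, so nearby peaks still count as similar.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)

    def weighted_sum(a, b):
        total = 0.0
        for r in range(-width + 1, width):
            weight = 1.0 - abs(r) / width
            if r >= 0:
                corr = np.dot(a[:len(a) - r], b[r:])   # sum_i a[i] * b[i + r]
            else:
                corr = np.dot(a[-r:], b[:len(b) + r])  # same sum for negative r
            total += weight * corr
        return total

    return weighted_sum(f, g) / np.sqrt(weighted_sum(f, f) * weighted_sum(g, g))

# Two single-peak patterns whose peaks are two grid points apart.
f = np.zeros(30)
g = np.zeros(30)
f[10] = 1.0
g[12] = 1.0

same = weighted_xcorr(f, f, width=5)
shifted = weighted_xcorr(f, g, width=5)
pointwise = np.dot(f, g)
```

The point of the weighting shows in the toy case: a pointwise comparison sees no overlap between the two peaks, while the weighted measure still reports partial similarity for a small shift.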

Collaboration


Jos A. Hageman's most frequent collaborators.

Top Co-Authors

Kasper Hettinga

Wageningen University and Research Centre

Robert D. Hall

Wageningen University and Research Centre

F. A. van Eeuwijk

Wageningen University and Research Centre

Jacques Vervoort

Wageningen University and Research Centre

Lina Zhang

Wageningen University and Research Centre
