François Husson
Agrocampus Ouest
Publications
Featured research published by François Husson.
Computational Statistics & Data Analysis | 2008
Julie Josse; Jérôme Pagès; François Husson
The relationship between two sets of variables defined for the same individuals can be evaluated by the RV coefficient. However, the RV value alone cannot establish whether the two sets of variables are significantly correlated, which is why a test is required. Asymptotic tests do exist but fail in many situations, hence the interest in permutation tests. The main drawback of permutation tests, however, is that they are time-consuming. It is therefore appealing to approximate the permutation distribution with continuous distributions (without performing any permutations). The current approximations (normal approximation, log-transformation and Pearson type III approximation) are discussed and a new one is described: an Edgeworth expansion. Finally, these approximations are compared on both simulations and a sensory example.
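The RV coefficient and the raw permutation test that the paper's continuous approximations are designed to avoid are compact enough to sketch in base R (FactoMineR::coeffRV implements the approximation route); a minimal sketch:

```r
# Minimal base-R sketch: RV coefficient between two column-centred sets of
# variables measured on the same individuals, plus a raw permutation test.
rv_coeff <- function(X, Y) {
  X <- scale(as.matrix(X), scale = FALSE)   # centre each variable
  Y <- scale(as.matrix(Y), scale = FALSE)
  WX <- tcrossprod(X)                       # individuals x individuals
  WY <- tcrossprod(Y)
  sum(WX * WY) / sqrt(sum(WX^2) * sum(WY^2))
}

rv_perm_test <- function(X, Y, B = 999) {
  obs  <- rv_coeff(X, Y)
  # permute the rows of Y to break the link between the two sets
  perm <- replicate(B, rv_coeff(X, Y[sample(nrow(Y)), , drop = FALSE]))
  mean(c(perm, obs) >= obs)                 # permutation p-value
}
```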
Computational Statistics & Data Analysis | 2012
Julie Josse; François Husson
Cross-validation is a tried and tested approach for selecting the number of components in principal component analysis (PCA); however, its main drawback is its computational cost. In a regression (or nonparametric regression) setting, criteria such as generalized cross-validation (GCV) provide convenient approximations to leave-one-out cross-validation. They are based on the relation between the prediction error and the residual sum of squares weighted by elements of a projection matrix (or a smoothing matrix). Such a relation is established here for PCA, using an original presentation of PCA with a unique projection matrix. It enables the definition of two cross-validation approximation criteria: the smoothing approximation of the cross-validation criterion (SACV) and the GCV criterion. The method is assessed with simulations and gives promising results.
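These criteria are implemented in the authors' missMDA package; a brief usage sketch on a built-in data set (argument names assumed from recent package versions):

```r
# Choosing the number of PCA components with the GCV approximation
# (missMDA::estim_ncpPCA; method.cv = "gcv" avoids explicit cross-validation).
library(missMDA)
res <- estim_ncpPCA(USArrests, ncp.min = 0, ncp.max = 3, method.cv = "gcv")
res$ncp         # number of components minimising the criterion
res$criterion   # criterion value for each candidate dimensionality
```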
Food Quality and Preference | 2001
François Husson; S Le Dien; Jérôme Pagès
The introduction of sensory descriptive questions in consumer studies is often criticized. Using a data set provided for the 5th Sensometrics meeting, we show that sensory profiles obtained from consumers can have two essential qualities: consensus and reproducibility. This implies that the use of sensory profiles given by consumers cannot be ruled out a priori. To show this, we use analysis of variance and multiple factor analysis; these methods are useful for visualizing the data from several panels.
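A sketch of the two tools named above, on hypothetical data frames ('sensory' with one row per product-judge evaluation, 'profiles' with one row per product and each panel's descriptors as a block of columns):

```r
# Per-descriptor two-way ANOVA: does the panel discriminate the products?
summary(aov(sweetness ~ product + judge, data = sensory))

# Multiple factor analysis: each panel's descriptors form one group of
# scaled variables, so the panels' product configurations can be compared.
library(FactoMineR)
res <- MFA(profiles, group = c(8, 8, 8), type = rep("s", 3),
           name.group = c("trained", "consumersA", "consumersB"))
```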
Advances in Data Analysis and Classification | 2011
Julie Josse; Jérôme Pagès; François Husson
The available methods for handling missing values in principal component analysis only provide point estimates of the parameters (axes and components) and of the missing values. To take into account the variability due to missing values, a multiple imputation method is proposed. First, a method to generate multiple imputed data sets from a principal component analysis model is defined. Then, two ways to visualize the uncertainty due to missing values on the principal component analysis results are described. The first consists in projecting the imputed data sets onto a reference configuration as supplementary elements in order to assess the stability of the individuals (respectively of the variables). The second consists in performing a principal component analysis on each imputed data set and fitting each resulting configuration onto the reference one with a Procrustes rotation. The latter strategy makes it possible to assess the variability of the principal component analysis parameters induced by the missing values. The methodology is then evaluated on a real data set.
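This multiple imputation approach is available as MIPCA in the authors' missMDA package; a usage sketch on a hypothetical incomplete numeric data frame 'don' (argument names assumed from recent versions):

```r
library(missMDA)
ncp <- estim_ncpPCA(don)$ncp                # pick the dimensionality first
res <- MIPCA(don, ncp = ncp, nboot = 100)   # 100 imputed data sets
plot(res)   # overlays the imputed configurations on the reference PCA
```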
Journal of Classification | 2012
Julie Josse; Marie Chavent; Benoit Liquet; François Husson
A common approach for dealing with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms in which the missing values are iteratively imputed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence analysis, to handle missing values in multiple correspondence analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, the performance of the regularized iterative MCA algorithm (implemented in the R package missMDA) is assessed on both simulations and a real data set. Results are promising compared with other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi's homogeneity analysis framework.
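A usage sketch of the missMDA workflow named in the abstract, on a hypothetical incomplete categorical data frame 'don' (function names from the package documentation):

```r
library(missMDA)
library(FactoMineR)
nb  <- estim_ncpMCA(don, ncp.max = 5)     # cross-validated dimensionality
imp <- imputeMCA(don, ncp = nb$ncp)       # regularised iterative MCA
res <- MCA(don, tab.disj = imp$tab.disj)  # MCA on the completed indicator matrix
```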
Journal of Dairy Science | 2010
Laure Brun-Lafleur; Luc Delaby; François Husson; Philippe Faverdin
Feed management is one of the principal levers by which the production and composition of milk by dairy cows can be modulated in the short term. The response of milk yield and milk composition to variations in either energy or protein supplies is well known. In practice, however, dietary supplies of energy and protein vary simultaneously, and their interaction is still not well understood. The objective of this trial was to determine whether energy and protein interacted in their effects on milk production and milk composition, and whether the response to changes in the diets depended on the parity and production potential of cows. From the results, a model was built to predict the response of milk yield and milk composition to simultaneous variations in energy and protein supplies relative to the requirements of cows. Nine treatments, defined by their energy and protein supplies, were applied to 48 cows divided into 4 homogeneous groups (primiparous or multiparous × high or low milk potential) over three 4-wk periods. The control treatment was calculated to cover the predicted requirements of the group of cows in the middle of the trial and was applied to each cow. The other 8 treatments corresponded to fixed supplies of energy and protein, higher or lower than those of the control treatment. The results highlighted a significant energy × protein interaction not only on milk yield but also on protein content and yield. The response of milk yield to energy supply was zero with a negative protein balance and increased with protein supply equal to or higher than requirements. The response of milk yield to changes in the diet was greater for cows with high production potential than for those with low production potential, and the response of milk protein content was higher for primiparous cows than for multiparous cows. The model for the response of milk yield, protein yield, and protein content obtained in this trial made it possible to predict more accurately the variations in milk production and composition, relative to the cow's potential, that result from changes in diet composition. In addition, the interaction obtained was consistent with a response governed by the more limiting of the 2 factors, energy or protein.
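The kind of response model the trial motivates can be written as a linear model with an interaction term; a hedged sketch with hypothetical variable names (the published model is not reproduced here):

```r
# Energy x protein interaction on milk yield, adjusting for parity and
# production potential ('cows' is a hypothetical data frame).
fit <- lm(milk_yield ~ energy_supply * protein_supply + parity + potential,
          data = cows)
anova(fit)   # the energy_supply:protein_supply row tests the interaction
```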
Advances in Data Analysis and Classification | 2016
Vincent Audigier; François Husson; Julie Josse
We propose a new method to impute missing values in mixed data sets. It is based on a principal component method, factorial analysis for mixed data, which balances the influence of all the variables, continuous and categorical, in the construction of the principal components. Because the imputation uses the principal axes and components, the prediction of the missing values is based on the similarity between individuals and on the relationships between variables. The properties of the method are illustrated via simulations and the quality of the imputation is assessed using real data sets. The method is compared to a recent method based on random forests (Stekhoven and Bühlmann, Bioinformatics 28:113–118, 2011) and shows better performance, especially for the imputation of categorical variables and in situations with highly linear relationships between continuous variables.
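The method is implemented as imputeFAMD in missMDA; a usage sketch on a hypothetical data frame 'don' mixing numeric and factor columns with missing values:

```r
library(missMDA)
imp <- imputeFAMD(don, ncp = 3)   # factorial analysis for mixed data imputation
head(imp$completeObs)             # continuous and categorical values filled in
```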
Statistics and Computing | 2015
Marie Verbanck; Julie Josse; François Husson
Principal component analysis (PCA) is a well-established dimensionality reduction method commonly used to denoise and visualise data. A classical PCA model is the fixed effect model, in which data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we suggest a regularised version of PCA that essentially selects a certain number of dimensions and shrinks the corresponding singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension. The regularisation term is analytically derived using asymptotic results and can also be justified by a Bayesian treatment of the model. Regularised PCA provides promising results in terms of the recovery of the true signal and the graphical outputs in comparison with classical PCA and with a soft-thresholding estimation strategy. The distinction between PCA and regularised PCA becomes especially important in the case of very noisy data.
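The shrinkage idea is easy to sketch in base R: keep S dimensions and multiply each retained singular value by an estimated signal-to-total variance ratio. The noise-variance estimate below (the mean of the discarded variances) is a simplification of the paper's asymptotic derivation, not the paper's exact estimator.

```r
# Simplified regularised-PCA reconstruction of a noisy data matrix.
rpca_denoise <- function(X, S) {
  Xc  <- scale(X, scale = FALSE)                 # centre each column
  sv  <- svd(Xc)
  lam <- sv$d^2 / nrow(Xc)                       # variance carried by each dimension
  sigma2 <- mean(lam[-(1:S)])                    # crude noise estimate (simplified)
  shrink <- pmax(0, (lam[1:S] - sigma2) / lam[1:S])  # signal / total variance
  sv$u[, 1:S, drop = FALSE] %*%
    diag(sv$d[1:S] * shrink, nrow = S) %*%
    t(sv$v[, 1:S, drop = FALSE])                 # denoised centred reconstruction
}
```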
Journal of Immunology | 2016
Hiroko Fujii; Julie Josse; Miki Tanioka; Yoshiki Miyachi; François Husson; Masahiro Ono
CD4+ T cells that express the transcription factor FOXP3 (FOXP3+ T cells) are commonly regarded as immunosuppressive regulatory T cells (Tregs). FOXP3+ T cells are reported to be increased in tumor-bearing patients or animals and are considered to suppress antitumor immunity, but the evidence is often contradictory. In addition, accumulating evidence indicates that FOXP3 is induced by antigenic stimulation and that some non-Treg FOXP3+ T cells, especially memory-phenotype FOXP3low cells, produce proinflammatory cytokines. Accordingly, the subclassification of FOXP3+ T cells is fundamental for revealing the significance of FOXP3+ T cells in tumor immunity, but the arbitrariness and complexity of manual gating have complicated the issue. In this article, we report a computational method to automatically identify and classify FOXP3+ T cells into subsets using clustering algorithms. By analyzing flow cytometry data from melanoma patients, the proposed method showed that the FOXP3+ subpopulation that had relatively high FOXP3, CD45RO, and CD25 expression was increased in melanoma patients, whereas manual gating did not produce significant results on the FOXP3+ subpopulations. Interestingly, the computationally identified FOXP3+ subpopulation included not only classical FOXP3high Tregs, but also memory-phenotype FOXP3low cells by manual gating. Furthermore, the proposed method successfully analyzed an independent data set, showing that the same FOXP3+ subpopulation was increased in melanoma patients, validating the method. Collectively, the proposed method successfully captured an important feature of melanoma without relying on the existing criteria of FOXP3+ T cells, revealing a hidden association between the T cell profile and melanoma, and providing new insights into FOXP3+ T cells and Tregs.
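A generic sketch of the automated-gating idea with k-means (not the authors' specific algorithm), on a hypothetical cells × markers matrix 'expr' of FOXP3, CD45RO and CD25 intensities:

```r
# Cluster cells on scaled marker intensities into candidate subpopulations,
# then inspect each cluster's size and mean marker profile.
cl <- kmeans(scale(expr), centers = 6, nstart = 25)
table(cl$cluster)                                     # cells per cluster
aggregate(as.data.frame(expr), list(cluster = cl$cluster), mean)
```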
Journal of Statistical Computation and Simulation | 2016
Vincent Audigier; François Husson; Julie Josse
We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the PCA model. Using a simulation study and real data sets, the method is compared to two classical approaches: multiple imputation based on joint modelling and on fully conditional modelling. Unlike these approaches, the proposed method can easily be used on data sets where the number of individuals is smaller than the number of variables and where the variables are highly correlated. In addition, it provides unbiased point estimates of quantities of interest, such as an expectation, a regression coefficient or a correlation coefficient, with a smaller mean squared error. Furthermore, the widths of the confidence intervals built for the quantities of interest are often smaller while ensuring valid coverage.
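The Bayesian variant is exposed through missMDA's MIPCA; a usage sketch on a hypothetical incomplete numeric data frame 'don' (the method.mi argument and result fields are assumed from recent package versions):

```r
library(missMDA)
res <- MIPCA(don, ncp = 2, method.mi = "Bayes", nboot = 100)
length(res$res.MI)   # the imputed data sets, to be analysed and pooled
```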