Pierre Mahé | Researchain

Publication

Featured research published by Pierre Mahé.

Bioinformatics | 2016

Large-scale machine learning for metagenomics sequence classification

Kévin Vervier; Pierre Mahé; Maud Tournoud; Jean-Baptiste Veyrieras; Jean-Philippe Vert

Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and are more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise. Availability and implementation: Data and code are available at http://cbio.ensmp.fr/largescalemetagenomics. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
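
To make the compositional idea in the abstract above concrete, the following is a minimal sketch of how a read can be turned into a k-mer count profile. It is not the authors' implementation: the function names `kmer_profile` and `feature_space_size` are hypothetical, and skipping windows that contain ambiguous bases is an assumption made here for simplicity. With k = 12 the feature space has 4^12 = 16,777,216 possible k-mers, which is consistent with the order of 10^7 dimensions mentioned in the abstract.

```python
from collections import Counter


def kmer_profile(read: str, k: int) -> Counter:
    """Count the k-mers in a read (hypothetical helper, for illustration).

    Assumption: windows containing anything other than A, C, G, T are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_space_size(k: int) -> int:
    """Number of distinct k-mers over the 4-letter DNA alphabet."""
    return 4 ** k


# Toy read with one ambiguous base; only the unambiguous windows are counted.
profile = kmer_profile("ACGTACGTNAC", 4)
print(profile)
print(feature_space_size(12))
```

A classifier would then be trained on such profiles, one sparse vector per read; the choice of learning algorithm is left out of this sketch.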

European Conference on Machine Learning | 2014

On learning matrices with orthogonal columns or disjoint supports

Kévin Vervier; Pierre Mahé; Alexandre d'Aspremont; Jean-Baptiste Veyrieras; Jean-Philippe Vert

We investigate new matrix penalties to jointly learn linear models with orthogonality constraints, generalizing the work of Xiao et al. [24] who proposed a strictly convex matrix norm for orthogonal transfer. We show that this norm converges to a particular atomic norm when its convexity parameter decreases, leading to new algorithmic solutions to minimize it. We also investigate concave formulations of this norm, corresponding to more aggressive strategies to induce orthogonality, and show how these penalties can also be used to learn sparse models with disjoint supports.

IEEE Access | 2014

Classification of Proteomic MS Data as Bayesian Solution of an Inverse Problem

Pascal Szacherski; Jean-François Giovannelli; Laurent Gerfault; Pierre Mahé; Jean-Philippe Charrier; Audrey Giremus; Bruno Lacroix; Pierre Grangeat

The cells in an organism emit different amounts of proteins according to their clinical state (healthy/pathological, for instance). The resulting proteomic profile can be used for early detection, diagnosis, and therapy planning. In this paper, we study the classification of a proteomic sample from the point of view of an inverse problem with a joint Bayesian solution, called inversion-classification. We propose a hierarchical physical forward model and present encouraging results from both simulation and clinical data.

Journal of Microbiological Methods | 2015

Three-dimensional characterization of bacterial microcolonies on solid agar-based culture media

Laurent Drazek; Maud Tournoud; Frédéric Derepas; Maryse Guicherd; Pierre Mahé; Frédéric Pinston; Jean-Baptiste Veyrieras; Sonia Chatellier

For the last century, the in vitro diagnostic process in microbiology has mainly relied on the growth of bacteria on the surface of a solid agar medium. Nevertheless, few studies have focused on the dynamics of microcolony growth on the agar surface before 8 to 10 h of incubation. In this article, chromatic confocal microscopy has been applied to characterize the early development of a bacterial colony. This technology relies on a differential focusing depth of white light. It allows one to fully measure the three-dimensional shape of microcolonies more quickly than classical confocal microscopy but with the same spatial resolution. With the device placed in an incubator, the method was able to track individual colonies growing on an agar plate and to follow the evolution of their surface or volume. Using an appropriate statistical modeling framework, for a given microorganism, the doubling time has been estimated for each individual colony, as well as its variability between colonies, both within and between agar plates. A proof of concept carried out on four bacterial strains of four distinct species demonstrated the feasibility and the interest of the approach. It showed in particular that doubling times derived from early three-dimensional measurements on microcolonies differed from classical measurements in micro-dilutions based on optical diffusion. Such a precise characterization of the three-dimensional shape of microcolonies in their late-lag to early-exponential phase could be beneficial in terms of in vitro diagnostics. Indeed, real-time monitoring of the biomass available in a colony could make it possible to run well-established microbial identification workflows, for instance MALDI-TOF mass spectrometry, as soon as a sufficient quantity of material is available, thereby reducing the time needed to provide a diagnosis. Moreover, as done for pre-identification of macro-colonies, morphological indicators such as three-dimensional growth profiles derived from microcolonies could be used to perform a first pre-identification step, but in a shorter time.
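
As a rough illustration of how a doubling time can be derived from a series of colony volume measurements, the sketch below fits a straight line to log2(volume) against time and takes the inverse of the slope. This assumes simple exponential growth and ordinary least squares; it is not the statistical modeling framework used in the article, which also estimates variability within and between plates. The function name `doubling_time` and the synthetic data are hypothetical.

```python
import math


def doubling_time(times_h, volumes):
    """Estimate a doubling time (hours) from volume measurements.

    Assumption: exponential growth, so log2(volume) is linear in time
    and the doubling time is 1 / slope of the least-squares line.
    """
    logs = [math.log2(v) for v in volumes]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # doublings per hour
    return 1.0 / slope


# Synthetic, noise-free data generated with a doubling time of 0.5 h.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
volumes = [100.0 * 2 ** (t / 0.5) for t in times]
td = doubling_time(times, volumes)
print(round(td, 6))
```

On real measurements the fit would be restricted to the exponential phase, and a per-colony estimate like this one would only be a starting point for a model of between-colony variability.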

bioRxiv | 2018

A fast and agnostic method for bacterial genome-wide association studies: bridging the gap between kmers and genetic events

Magali Jaillard; Leandro Lima; Maud Tournoud; Pierre Mahé; Alex van Belkum; Vincent Lacroix; Laurent Jacob

Motivation: Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or fine-assessment of marker effect. Recently, alignment-free methods based on kmer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are hard to interpret. Methods: Here, we introduce DBGWAS, an extended kmer-based GWAS method producing interpretable genetic variants associated with phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes identified by the association model into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is fast, alignment-free and only requires a set of contigs and phenotypes. It produces annotated subgraphs representing local polymorphisms as well as mobile genetic elements (MGE) and offers a graphical framework to interpret GWAS results. Results: We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa – along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. Conclusion: Our novel method proved its efficiency to retrieve any type of phenotype-associated genetic variant without prior knowledge. All experiments were computed in less than two hours and produced a compact set of meaningful subgraphs, thereby outperforming other GWAS approaches and facilitating the interpretation of the results. Availability: Open-source tool available at https://gitlab.com/leoisl/dbgwas
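
The notion of a compacted De Bruijn graph used in the abstract above can be sketched in a few lines. This toy version is not the DBGWAS code: it works on the forward strand only (no reverse complements), stores k-mers in a plain Python set, and the helper names `build_kmers`, `successors`, `predecessors` and `compact` are hypothetical. It shows how two sequences that differ by a single base produce a "bubble": a shared node, two alternative paths, and a shared node again.

```python
def build_kmers(seqs, k):
    """Collect the set of k-mers (graph nodes) from all sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def successors(kmer, kmers):
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]


def compact(kmers):
    """Merge non-branching paths of k-mers into unitigs (compacted nodes)."""
    def is_start(km):
        preds = predecessors(km, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    unitigs, visited = [], set()
    # First pass starts at path heads; second pass picks up isolated cycles.
    for require_start in (True, False):
        for km in sorted(kmers):
            if km in visited or (require_start and not is_start(km)):
                continue
            path, cur = km, km
            visited.add(km)
            while True:
                succ = successors(cur, kmers)
                if len(succ) != 1:
                    break
                nxt = succ[0]
                if nxt in visited or len(predecessors(nxt, kmers)) != 1:
                    break
                path += nxt[-1]
                visited.add(nxt)
                cur = nxt
            unitigs.append(path)
    return unitigs


# Two toy sequences differing at one position (G vs T).
kmers = build_kmers(["ACCGTAG", "ACCTTAG"], 3)
unitigs = compact(kmers)
print(sorted(unitigs))
```

In an association setting, each compacted node would carry a presence/absence pattern across genomes, and the two branches of the bubble would correspond to the two alleles.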

BMC Bioinformatics | 2018

Variance component analysis to assess protein quantification in biomarker validation: application to selected reaction monitoring-mass spectrometry

Amna Klich; Catherine Mercier; Laurent Gerfault; Pierre Grangeat; Corinne Beaulieu; Elodie Degout-Charmette; Tanguy Fortin; Pierre Mahé; Jean-François Giovannelli; Jean-Philippe Charrier; Audrey Giremus; Delphine Maucort-Boulch; Pascal Roy

Background: In the field of biomarker validation with mass spectrometry, controlling the technical variability is a critical issue. In selected reaction monitoring (SRM) measurements, this issue provides the opportunity of using variance component analysis to distinguish various sources of variability. However, in case of unbalanced data (unequal number of observations in all factor combinations), the classical methods cannot correctly estimate the various sources of variability, particularly in presence of interaction. The present paper proposes an extension of the variance component analysis to estimate the various components of the variance, including an interaction component in case of unbalanced data. Results: We applied an experimental design that uses a serial dilution to generate known relative protein concentrations and estimated these concentrations by two processing algorithms, a classical and a more recent one. The extended method allowed estimating the variances explained by the dilution and the technical process by each algorithm in an experiment with 9 proteins: L-FABP, 14.3.3 sigma, Calgi, Def.A6, Villin, Calmo, I-FABP, Peroxi-5, and S100A14. Whereas the recent algorithm gave a higher dilution variance and a lower technical variance than the classical one in two proteins with three peptides (L-FABP and Villin), there was no significant difference between the two algorithms on all proteins. Conclusions: The extension of the variance component analysis was able to estimate correctly the variance components of protein concentration measurement in case of unbalanced design.

Bioinformatics | 2014

Automatic identification of mixed bacterial species fingerprints in a MALDI-TOF mass-spectrum.

Pierre Mahé; Maud Arsac; Sonia Chatellier; Valérie Monnin; Nadine Perrot; Sandrine Mailler; Victoria Girard; Mahendrasingh Ramjeet; Jérémy Surre; Bruno Lacroix; Alex van Belkum; Jean-Baptiste Veyrieras

Research in Computational Molecular Biology | 2012