Pierre Pudlo | Researchain

Publication

Featured research published by Pierre Pudlo.

Bioinformatics | 2014

DIYABC v2.0: a software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data

Jean-Marie Cornuet; Pierre Pudlo; Julien Veyssier; Alexandre Dehne-Garcia; Mathieu Gautier; Raphaël Leblois; Jean-Michel Marin; Arnaud Estoup

MOTIVATION: DIYABC is a software package for a comprehensive analysis of population history using approximate Bayesian computation on DNA polymorphism data. Version 2.0 implements a number of new features and analytical methods. It allows (i) the analysis of single nucleotide polymorphism data at a large number of loci, in addition to microsatellite and DNA sequence data, (ii) efficient Bayesian model choice using linear discriminant analysis on summary statistics and (iii) the serial launching of multiple post-processing analyses. DIYABC v2.0 also includes a user-friendly graphical interface with various new options. It can be run on three operating systems: GNU/Linux, Microsoft Windows and Apple OS X. AVAILABILITY: Freely available to academic users, with a detailed notice document and example projects, at http://www1.montpellier.inra.fr/CBGP/diyabc CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Statistics and Computing | 2012

Approximate Bayesian Computational methods

Jean-Michel Marin; Pierre Pudlo; Christian P. Robert; Robin J. Ryder

Approximate Bayesian Computation (ABC) methods, also known as likelihood-free techniques, have appeared in the past ten years as the most satisfactory approach to intractable likelihood problems, first in genetics then in a broader spectrum of applications. However, these methods suffer to some degree from calibration difficulties that make them rather volatile in their implementation and thus render them suspect to the users of more traditional Monte Carlo methods. In this survey, we study the various improvements and extensions made to the original ABC algorithm in recent years.
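
For readers unfamiliar with the basic idea, the following is a minimal sketch of plain ABC rejection sampling, the simplest likelihood-free scheme of the kind the survey starts from. It is not taken from the paper: the toy model (a normal distribution with unknown mean), the uniform prior, the sample mean as summary statistic, the tolerance and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, prior_sample, summary, tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated summary lies within `tolerance`
    of the observed summary (plain ABC rejection sampling)."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                  # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))      # simulate data, then summarize
        if abs(s_sim - s_obs) <= tolerance:        # keep only close matches
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: infer the mean of a unit-variance normal.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(50)]


data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed, simulate, prior_sample, statistics.fmean,
    tolerance=0.1, n_draws=10000,
)
posterior_mean = statistics.fmean(posterior_draws)
```

The accepted draws approximate the posterior of the mean given the summary statistic. The choice of tolerance and summary is exactly the kind of calibration issue the abstract mentions: a smaller tolerance gives a better approximation but fewer accepted draws.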

Molecular Ecology | 2013

The effect of RAD allele dropout on the estimation of genetic variation within and between populations

Mathieu Gautier; Karim Gharbi; Timothee Cezard; Julien Foucaud; Carole Kerdelhué; Pierre Pudlo; Jean-Marie Cornuet; Arnaud Estoup

Inexpensive short‐read sequencing technologies applied to reduced representation genomes are revolutionizing genetic research, especially population genetics analysis, by allowing the genotyping of massive numbers of single‐nucleotide polymorphisms (SNP) for large numbers of individuals and populations. Restriction site–associated DNA (RAD) sequencing is a recent technique based on the characterization of genomic regions flanking restriction sites. One of its potential drawbacks is the presence of polymorphism within the restriction site, which makes it impossible to observe the associated SNP allele (i.e. allele dropout, ADO). To investigate the effect of ADO on genetic variation estimated from RAD markers, we first mathematically derived measures of the effect of ADO on allele frequencies as a function of different parameters within a single population. We then used RAD data sets simulated using a coalescence model to investigate the magnitude of biases induced by ADO on the estimation of expected heterozygosity and $F_{ST}$ under a simple demographic model of divergence between two populations. We found that ADO tends to cause genetic variation to be overestimated both within and between populations. Assuming a mutation rate per nucleotide between $10^{-9}$ and $10^{-8}$, this bias remained low for most studied combinations of divergence time and effective population size, except for large effective population sizes. Averaging $F_{ST}$ values over multiple SNPs, for example, by sliding window analysis, did not correct ADO biases. We briefly discuss possible solutions to filter the most problematic cases of ADO using read coverage to detect markers with a large excess of null alleles.

Molecular Ecology | 2013

Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping

Mathieu Gautier; Julien Foucaud; Karim Gharbi; Timothee Cezard; Maxime Galan; Anne Loiseau; Marian Thomson; Pierre Pudlo; Carole Kerdelhué; Arnaud Estoup

Molecular markers produced by next‐generation sequencing (NGS) technologies are revolutionizing genetic research. However, the costs of analysing large numbers of individual genomes remain prohibitive for most population genetics studies. Here, we present results based on mathematical derivations showing that, under many realistic experimental designs, NGS of DNA pools from diploid individuals allows allele frequencies at single nucleotide polymorphisms (SNPs) to be estimated with at least the same accuracy as individual‐based analyses, for considerably lower library construction and sequencing efforts. These findings remain true when taking into account the possibility of substantially unequal contributions of each individual to the final pool of sequence reads. We propose the intuitive notion of effective pool size to account for unequal pooling and derive a Bayesian hierarchical model to estimate this parameter directly from the data. We provide a user‐friendly application assessing the accuracy of allele frequency estimation from both pool‐ and individual‐based NGS population data under various sampling, sequencing depth and experimental error designs. We illustrate our findings with theoretical examples and real data sets corresponding to SNP loci obtained using restriction site–associated DNA (RAD) sequencing in pool‐ and individual‐based experiments carried out on the same population of the pine processionary moth (Thaumetopoea pityocampa). NGS of DNA pools might not be optimal for all types of studies but provides a cost‐effective approach for estimating allele frequencies for very large numbers of SNPs. It thus allows comparison of genome‐wide patterns of genetic variation for large numbers of individuals in multiple populations.

Molecular Ecology Resources | 2012

Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics

Arnaud Estoup; Eric Lombaert; Jean-Michel Marin; Thomas Guillemaud; Pierre Pudlo; Christian P. Robert; Jean-Marie Cornuet

Comparison of demo‐genetic models using Approximate Bayesian Computation (ABC) is an active research field. Although large numbers of populations and models (i.e. scenarios) can be analysed with ABC using molecular data obtained from various marker types, methodological and computational issues arise when these numbers become too large. Moreover, Robert et al. (Proceedings of the National Academy of Sciences of the United States of America, 2011, 108, 15112) have shown that the conclusions drawn from ABC model comparison cannot be trusted per se and require additional simulation analyses. Monte Carlo inferential techniques to empirically evaluate confidence in scenario choice are very time‐consuming, however, when the numbers of summary statistics (Ss) and scenarios are large. We here describe a methodological innovation to compute ABC scenario probabilities efficiently using linear discriminant analysis (LDA) on Ss before computing logistic regression. We used simulated pseudo‐observed data sets (pods) to assess the main features of the method (precision and computation time) in comparison with traditional probability estimation using raw (i.e. not LDA transformed) Ss. We also illustrate the method on real microsatellite data sets produced to make inferences about the invasion routes of the coccinellid Harmonia axyridis. We found that scenario probabilities computed from LDA‐transformed and raw Ss were strongly correlated. Type I and II errors were similar for both methods. The faster probability computation that we observed (speed gain around a factor of 100 for LDA‐transformed Ss) substantially increases the ability of ABC practitioners to analyse large numbers of pods and hence provides a manageable way to empirically evaluate the power available to discriminate among a large set of complex scenarios.

Bioinformatics | 2016

Reliable ABC model choice via random forests

Pierre Pudlo; Jean-Michel Marin; Arnaud Estoup; Jean-Marie Cornuet; Mathieu Gautier; Christian P. Robert

MOTIVATION: Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. RESULTS: We propose a novel approach based on a machine learning tool named random forests (RF) to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with RF and postponing the approximation of the posterior probability of the selected model for a second stage also relying on RF. Compared with earlier implementations of ABC model choice, the ABC RF approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least 50) and (iv) it includes an approximation of the posterior probability of the selected model. The call to RF will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. AVAILABILITY AND IMPLEMENTATION: The proposed methodology is implemented in the R package abcrf available on the CRAN. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Molecular Biology and Evolution | 2014

Maximum-Likelihood Inference of Population Size Contractions from Microsatellite Data

Raphaël Leblois; Pierre Pudlo; Joseph Néron; François Bertaux; Champak Reddy Beeravolu; Renaud Vitalis; François Rousset

Understanding the demographic history of populations and species is a central issue in evolutionary biology and molecular ecology. In this work, we develop a maximum-likelihood method for the inference of past changes in population size from microsatellite allelic data. Our method is based on importance sampling of gene genealogies, extended for new mutation models, notably the generalized stepwise mutation model (GSM). Using simulations, we test its performance to detect and characterize past reductions in population size. First, we test the estimation precision and confidence intervals coverage properties under ideal conditions, then we compare the accuracy of the estimation with another available method (MSVAR) and we finally test its robustness to misspecification of the mutational model and population structure. We show that our method is very competitive compared with alternative ones. Moreover, our implementation of a GSM allows more accurate analysis of microsatellite data, as we show that the violations of a single step mutation assumption induce very high bias toward false contraction detection rates. However, our simulation tests also showed some limits, which most importantly are large computation times for strong disequilibrium scenarios and a strong influence of some form of unaccounted population structure. This inference method is available in the latest implementation of the MIGRAINE software package.

Molecular Biology and Evolution | 2017

Deciphering the routes of invasion of Drosophila suzukii by means of ABC random forest

Antoine Fraimout; Vincent Debat; Simon Fellous; Ruth A. Hufbauer; Julien Foucaud; Pierre Pudlo; Jean-Michel Marin; Donald K. Price; Julien Cattel; Xiao Chen; Maríndia Deprá; Pierre François Duyck; Christelle Guédot; Marc Kenis; Masahito T. Kimura; Gregory M. Loeb; Anne Loiseau; Isabel Martinez-Sañudo; Marta Pascual; Maxi Polihronakis Richmond; Peter Shearer; Nadia Singh; Koichiro Tamura; A. Xuéreb; Jinping Zhang; Arnaud Estoup

Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes, but can be computationally expensive. We use our study to illustrate the use of a new, more efficient, ABC method, ABC random forest (ABC-RF) and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.

Advances in Applied Probability | 2012

The Normalized Graph Cut and Cheeger Constant: from Discrete to Continuous

Ery Arias-Castro; Bruno Pelletier; Pierre Pudlo

Let $M$ be a bounded domain of $\mathbb{R}^d$ with a smooth boundary. We relate the Cheeger constant of $M$ and the conductance of a neighborhood graph defined on a random sample from $M$. By restricting the minimization defining the latter over a particular class of subsets, we obtain consistency (after normalization) as the sample size increases, and show that any minimizing sequence of subsets has a subsequence converging to a Cheeger set of $M$.

Journal of Nonparametric Statistics | 2013

Estimation of density level sets with a given probability content

Benoît Cadre; Bruno Pelletier; Pierre Pudlo

Given a random vector $X$ valued in $\mathbb{R}^d$ with density $f$ and an arbitrary probability number $p \in (0; 1)$, we consider the estimation of the upper level set $\{f \geq t^{(p)}\}$ of $f$ corresponding to probability content $p$, that is, such that the probability that $X$ belongs to $\{f \geq t^{(p)}\}$ is equal to $p$. Based on an i.i.d. random sample $X_1, \ldots, X_n$ drawn from $f$, we define the plug-in level set estimate $\{\hat{f}_n \geq t_n^{(p)}\}$, where $t_n^{(p)}$ is a random threshold depending on the sample and $\hat{f}_n$ is a nonparametric kernel density estimate based on the same sample. We establish the exact convergence rate of the Lebesgue measure of the symmetric difference between the estimated and actual level sets.
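
The plug-in construction described in this abstract can be illustrated with a small one-dimensional sketch. It is an illustration under stated assumptions, not the paper's estimator: the Gaussian kernel, the fixed bandwidth, and the rule that picks the threshold as an empirical quantile of the density values at the sample points are all choices made here, and every function name is hypothetical.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Return a kernel density estimate (Gaussian kernel) as a callable."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def plug_in_threshold(sample, density, p):
    """Assumed rule: pick the threshold so that a fraction p of the sample
    points has estimated density at or above it."""
    values = sorted((density(x) for x in sample), reverse=True)
    k = max(1, math.ceil(p * len(values)))
    return values[k - 1]


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]

f_hat = gaussian_kde(sample, bandwidth=0.4)
t_hat = plug_in_threshold(sample, f_hat, p=0.5)


def in_level_set(x):
    """Membership test for the estimated level set {f_hat >= t_hat}."""
    return f_hat(x) >= t_hat


# Fraction of sample points falling inside the estimated level set.
covered = sum(in_level_set(x) for x in sample) / len(sample)
```

By construction, about half of the sample falls inside the estimated set when `p=0.5`. How fast such a set approaches the true level set as the sample grows is the question the paper addresses.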
