Peter Pfaffelhuber
University of Freiburg
Publications
Featured research published by Peter Pfaffelhuber.
Genetics | 2009
Thomas Städler; Bernhard Haubold; Carlos Merino; Wolfgang Stephan; Peter Pfaffelhuber
Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (D_T) and Fu and Li's D (D_FL) statistics, particularly under specieswide (range) expansions. Pooled samples yield average D_T and D_FL values that are generally intermediate between those of local and scattered samples. Local samples (and, to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects D_FL more than it does D_T. Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes.
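Tajima's D contrasts two estimators of the population mutation rate: the mean number of pairwise differences π and Watterson's estimator S/a1, where S is the number of segregating sites. The minimal sketch below assumes biallelic sites coded as '0'/'1' strings. The function name is our own, and it is not the simulation code used in the study. Fu and Li's D is not covered here. An excess of rare variants (for example, many singletons) pushes D below zero, and an excess of intermediate-frequency variants pushes it above zero.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D for aligned haplotypes coded as '0'/'1' strings.

    Assumes biallelic sites (infinite-sites model); monomorphic
    columns are ignored. Needs n >= 4, since the variance term
    vanishes for n = 2 and n = 3.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    counts = [sum(h[site] == "1" for h in haplotypes)
              for site in range(len(haplotypes[0]))]
    seg = [k for k in counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean pairwise differences; a site with k copies of '1'
    # separates k * (n - k) of the n(n-1)/2 pairs
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    # pi minus Watterson's estimator S / a1, scaled by its standard error
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

For example, four haplotypes that each carry a private mutation (all singletons) give a negative D.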
Genome Biology and Evolution | 2012
Franz Baumdicker; Wolfgang R. Hess; Peter Pfaffelhuber
The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage of such genomic diversity, why it is maintained, whether this variation is largely adaptive or neutral, and why these distinct individuals can coexist remain poorly understood. Here, we present the infinitely many genes (IMG) model, which is a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss (e.g., by pseudogenization and deletion of genes) during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 10¹¹ and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas those of Synechococcus and of the combined sample show a clear deviation from neutrality.
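The sketch below shows how an IMG-type model yields a pangenome size. We assume a Kingman coalescent in which each pair of lineages merges at rate 1, novel genes are gained at rate θ/2 per lineage, and each present gene is lost at rate ρ/2. This parametrization and the closed form below are our own short calculation, not formulas quoted from the paper. Along one lineage, gene content is an immigration-death process with stationary mean θ/ρ.

For a sample of n genomes, trace back the lineages on which a newly gained gene would still reach the sample. With k such lineages, they disappear at rate k(k−1)/2 + kρ/2. The expected tree length on which a gain is visible is therefore Σ_{k=1}^{n} 2/(k−1+ρ), and multiplying by θ/2 gives E[G_n] = θ Σ_{k=0}^{n−1} 1/(ρ+k). The function names are hypothetical.

```python
def expected_genes_per_genome(theta, rho):
    """Stationary mean of the gain/loss process along one lineage."""
    return theta / rho


def expected_pangenome_size(n, theta, rho):
    """Expected number of distinct genes in a sample of n genomes,
    E[G_n] = theta * sum_{k=0}^{n-1} 1 / (rho + k), under the assumed
    parametrization (Kingman time scale, gain rate theta/2 per lineage,
    loss rate rho/2 per gene)."""
    return theta * sum(1 / (rho + k) for k in range(n))
```

Under these assumptions, the expected number of distinct genes keeps growing with sample size, roughly like θ ln n for large n.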
Molecular Ecology | 2010
Bernhard Haubold; Peter Pfaffelhuber; Michael Lynch
Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho. This program can efficiently compute the following three maximum likelihood estimators based on shotgun sequence data obtained from single diploid individuals: the population mutation rate (4Nₑμ), the sequencing error rate, and the population recombination rate (4Nₑc). We demonstrate the accuracy of mlRho by applying it to simulated data sets. In addition, we analyse the genomes of the sea squirt Ciona intestinalis and the water flea Daphnia pulex. Ciona intestinalis is an obligate outcrosser, while D. pulex is a cyclic parthenogen, and we discuss how these contrasting life histories are reflected in our parameter estimates. The program mlRho is freely available from http://guanine.evolbio.mpg.de/mlRho.
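mlRho's own likelihood is not reproduced here. As a toy illustration of estimating a heterozygosity and a sequencing error rate jointly by maximum likelihood from one diploid individual's reads, the sketch below makes several simplifying assumptions of our own:

- each site is biallelic;
- reads are scored only as matching or not matching a known reference base;
- homozygous sites produce non-reference reads at error rate eps;
- heterozygous sites split reads 50:50;
- the heterozygosity h stands in for the mutation rate (for small 4Nₑμ, expected heterozygosity is roughly 4Nₑμ).

Recombination is ignored, and a coarse grid search replaces numerical optimization. All names are hypothetical.

```python
import math
import random
from collections import Counter


def site_likelihood(c, m, h, eps):
    """P(m of c reads are non-reference) in the toy model:
    homozygous (prob 1 - h): each read is an error with prob eps;
    heterozygous (prob h): each read shows either allele with prob 1/2."""
    hom = math.comb(c, m) * eps ** m * (1 - eps) ** (c - m)
    het = math.comb(c, m) * 0.5 ** c
    return (1 - h) * hom + h * het


def grid_mle(sites):
    """Maximize the log-likelihood over a coarse (h, eps) grid."""
    tally = Counter(sites)  # identical (c, m) pairs share one term
    best = None
    for h in (i / 2000 for i in range(1, 101)):
        for eps in (i / 4000 for i in range(1, 81)):
            ll = sum(w * math.log(site_likelihood(c, m, h, eps))
                     for (c, m), w in tally.items())
            if best is None or ll > best[0]:
                best = (ll, h, eps)
    return best[1], best[2]


def simulate(n_sites, coverage, h, eps, rng):
    """Toy data: (coverage, non-reference read count) per site."""
    sites = []
    for _ in range(n_sites):
        p = 0.5 if rng.random() < h else eps
        m = sum(rng.random() < p for _ in range(coverage))
        sites.append((coverage, m))
    return sites
```

Aggregating identical (coverage, count) pairs keeps the grid search cheap, because deep, uniform coverage produces only a few distinct site patterns.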
Genetics | 2008
Peter Pfaffelhuber; A. Lehnert; Wolfgang Stephan
The model of genetic hitchhiking predicts a reduction in sequence diversity at a neutral locus closely linked to a beneficial allele. In addition, it has been shown that the same process results in a specific pattern of correlations (linkage disequilibrium) between neutral polymorphisms along the chromosome at the time of fixation of the beneficial allele. During the hitchhiking event, linkage disequilibrium on either side of the beneficial allele is built up whereas it is destroyed across the selected site. We derive explicit formulas for the expectation of the covariance measure D and of the standardized linkage disequilibrium σ_D² between a pair of polymorphic sites. For our analysis we use the approximation of a star-like genealogy at the selected site. The resulting expressions are approximately correct in the limit of large selection coefficients. Using simulations we show that, for moderately distant neutral loci, the resulting pattern of linkage disequilibrium is quickly destroyed after the fixation of the beneficial allele, i.e., in fewer than 0.1N generations, where N is the diploid population size.
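The two LD measures named above can be computed from sampled haplotypes, for instance to compare simulation output with analytical predictions. The sketch below takes 0/1 haplotype strings and computes D = p_AB − p_A p_B for a pair of sites. It then estimates σ_D² in the sense of Ohta and Kimura, as the ratio E[D²] / E[p_A(1−p_A)p_B(1−p_B)], by summing numerator and denominator over replicate samples such as independent simulation runs. Function names are our own, and the star-like-genealogy formulas of the paper are not implemented.

```python
def two_site_stats(haplotypes, i, j):
    """Return (D, pA(1-pA)pB(1-pB)) for sites i, j of 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # covariance of allelic states at the two sites
    return d, p_a * (1 - p_a) * p_b * (1 - p_b)


def sigma_d2(samples, i, j):
    """Ratio-of-expectations estimate E[D^2] / E[pA(1-pA)pB(1-pB)]
    across replicate samples (e.g. independent simulation runs)."""
    num = den = 0.0
    for haps in samples:
        d, v = two_site_stats(haps, i, j)
        num += d * d
        den += v
    return num / den
```

Using a ratio of sums rather than an average of per-sample ratios matches the ratio-of-expectations definition. It also avoids dividing by zero in replicates where one site happens to be monomorphic.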
Annals of Applied Probability | 2010
Franz Baumdicker; Wolfgang R. Hess; Peter Pfaffelhuber
The distributed genome hypothesis states that the set of genes in a population of bacteria is distributed over all individuals that belong to the specific taxon. It implies that certain genes can be gained and lost from generation to generation. We use the random genealogy given by a Kingman coalescent in order to superimpose events of gene gain and loss along ancestral lines. Gene gains occur at a constant rate along ancestral lines. We assume that gained genes have never been present in the population before. Gene losses occur at a rate proportional to the number of genes present along the ancestral line. In this infinitely many genes model, we derive moments for several statistics within a sample: the average number of genes per individual, the average number of genes differing between individuals, the number of incongruent pairs of genes, the total number of different genes in the sample and the gene frequency spectrum. We demonstrate that the model gives a reasonable fit to gene frequency data from marine cyanobacteria.
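The construction just described (a Kingman coalescent with gene gain and loss superimposed along its branches) can be simulated directly. In this minimal sketch, which is our own and not the authors' code, each pair of lineages merges at rate 1, novel genes are gained at rate θ/2 per lineage, and each present gene is lost at rate ρ/2. The rate scaling is an assumption.

The root starts from the stationary gene content, a Poisson(θ/ρ) number of distinct genes. Each branch of length L keeps every parental gene with probability exp(−ρL/2) and adds a Poisson((θ/ρ)(1 − exp(−ρL/2))) number of new genes that survive to its lower end. Poisson draws use Knuth's method because the standard library has no Poisson sampler.

```python
import itertools
import math
import random


def poisson(lam, rng):
    """Poisson variate via Knuth's multiplication method (small lam only)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def kingman_tree(n, rng):
    """Kingman coalescent: leaves 0..n-1 at height 0, internal nodes n, n+1, ..."""
    height = {i: 0.0 for i in range(n)}
    children = {}
    active = list(range(n))
    t, next_id = 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # each pair merges at rate 1
        i, j = rng.sample(range(k), 2)
        pair = (active[i], active[j])
        for idx in sorted((i, j), reverse=True):  # pop larger index first
            active.pop(idx)
        children[next_id], height[next_id] = pair, t
        active.append(next_id)
        next_id += 1
    return children, height, active[0]


def simulate_img(n, theta, rho, rng):
    """Gene content of n sampled genomes: gains at rate theta/2 per lineage
    (always novel genes), each present gene lost at rate rho/2."""
    children, height, root = kingman_tree(n, rng)
    label = itertools.count()  # fresh label for every gained gene
    # stationary content at the root: Poisson(theta / rho) distinct genes
    genomes = {root: {next(label) for _ in range(poisson(theta / rho, rng))}}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            keep = math.exp(-rho * (height[node] - height[child]) / 2)
            genes = {g for g in genomes[node] if rng.random() < keep}
            # genes gained on this branch that survive to its lower end
            for _ in range(poisson(theta / rho * (1 - keep), rng)):
                genes.add(next(label))
            genomes[child] = genes
            stack.append(child)
    return [genomes[i] for i in range(n)]
```

Averaging over many runs, the number of genes per genome should approach θ/ρ. For n = 2, the expected total number of distinct genes under these assumptions is θ/ρ + θ/(1+ρ).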
Annals of Applied Probability | 2012
Andrej Depperschmidt; Andreas Greven; Peter Pfaffelhuber
The Fleming-Viot measure-valued diffusion is a Markov process describing the evolution of (allelic) types under mutation, selection and random reproduction. We enrich this process by genealogical relations of individuals so that the random type distribution as well as the genealogical distances in the population evolve stochastically. The state space of this tree-valued enrichment of the Fleming-Viot dynamics with mutation and selection (TFVMS) consists of marked ultrametric measure spaces, equipped with the marked Gromov-weak topology and a suitable notion of polynomials as a separating algebra of test functions. The construction and study of the TFVMS is based on a well-posed martingale problem. For existence, we use approximating finite population models, the tree-valued Moran models, while uniqueness follows from duality to a function-valued process. Path properties of the resulting process carry over from the neutral case due to absolute continuity, given by a new Girsanov-type theorem on marked metric measure spaces. To study the long-time behavior of the process, we use a duality based on ideas from Dawson and Greven [On the effects of migration in spatial Fleming-Viot models with selection and mutation (2011c) Unpublished manuscript] and prove ergodicity of the TFVMS if the Fleming-Viot measure-valued diffusion is ergodic. As a further application, we consider the case of two allelic types and additive selection. For small selection strength, we give an expansion of the Laplace transform of genealogical distances in equilibrium, which is a first step in showing that distances are shorter in the selective case.
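The tree-valued Moran models used above as approximating finite population models can be sketched in the neutral case by tracking the matrix of genealogical distances directly. The code below is our own illustration and assumes no mutation or selection. Every ordered pair fires at rate γ/2, and when it does, the child's row and column are overwritten by the parent's. Between events, all distances grow at speed 2, because distance here means twice the time to the most recent common ancestor. In equilibrium each pair coalesces at rate γ, so the expected pairwise distance is 2/γ. Function names are hypothetical.

```python
import random


def tree_valued_moran(n, gamma, t_max, rng):
    """Neutral tree-valued Moran model (sketch) on n individuals.

    d[i][j] is the genealogical distance between individuals i and j,
    i.e. twice the time back to their most recent common ancestor.
    Every ordered pair (i, j) fires at rate gamma / 2: j dies and is
    replaced by an offspring of i. Returns d at time t_max, started
    from a population in which all distances are 0.
    """
    d = [[0.0] * n for _ in range(n)]
    total_rate = gamma * n * (n - 1) / 2
    t = 0.0
    while True:
        tau = rng.expovariate(total_rate)
        grow = min(tau, t_max - t)
        for i in range(n):
            for j in range(n):
                if i != j:
                    d[i][j] += 2 * grow  # both lineages age by `grow`
        t += tau
        if t >= t_max:
            return d
        parent, child = rng.sample(range(n), 2)
        for k in range(n):  # child inherits the parent's genealogy
            d[child][k] = d[k][child] = d[parent][k]
        d[child][child] = 0.0


def mean_distance(d):
    """Average genealogical distance over unordered pairs."""
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(d[i][j] for i, j in pairs) / len(pairs)
```

Storing the full distance matrix keeps the update rule transparent. A more efficient version would store birth times or the genealogy itself.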
Bioinformatics | 2015
Bernhard Haubold; Fabian Klötzl; Peter Pfaffelhuber
MOTIVATION A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays, and our implementation requires only approximately 1 s and 45 MB RAM per Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions, leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae. AVAILABILITY AND IMPLEMENTATION We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi (ANchor DIstances). C sources and documentation are posted at http://github.com/evolbioinf/andi/ SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
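The anchoring idea can be illustrated on a small scale. This sketch is a simplified stand-in of our own, not andi's algorithm. It uses k-mers that occur exactly once in each sequence as anchors, in place of maximal unique matches looked up in enhanced suffix arrays. It treats two consecutive anchors on the same diagonal (equal offset) as delimiting an ungapped homologous segment and counts mismatches in those segments. The resulting mismatch proportion is converted with the Jukes-Cantor correction, which is our assumed substitution model here. The function names and the default k = 12 are hypothetical.

```python
import math
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def anchor_distance(a, b, k=12):
    """Jukes-Cantor distance from mismatches between paired anchors.

    Anchors are k-mers unique in both a and b. Consecutive anchors on
    the same diagonal (j - i equal) bound an ungapped segment that is
    treated as homologous; off-diagonal anchor pairs are skipped.
    """
    ua, ub = unique_kmers(a, k), unique_kmers(b, k)
    anchors = sorted((i, ub[w]) for w, i in ua.items() if w in ub)
    mismatches = length = 0
    for (i1, j1), (i2, j2) in zip(anchors, anchors[1:]):
        if i2 - i1 == j2 - j1:  # same diagonal: ungapped homology
            mismatches += sum(x != y for x, y in zip(a[i1:i2], b[j1:j2]))
            length += i2 - i1
    if length == 0:
        raise ValueError("no anchor pairs found")
    p = mismatches / length  # must stay below 0.75 for the log
    return -0.75 * math.log(1 - 4 * p / 3)
```

Requiring two anchors on one diagonal is what lets the segment between them be trusted as homologous. A lone short match could be spurious.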
Bioinformatics | 2011
Bernhard Haubold; Floyd A. Reed; Peter Pfaffelhuber
MOTIVATION Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast of sequencing. RESULTS We derive and implement an alignment-free estimator of the number of pairwise mismatches. Our implementation of this estimator, pim, is based on an enhanced suffix array and inherits the superior time and memory efficiency of this data structure. Simulations demonstrate that the estimator is accurate if mutations are distributed randomly along the chromosome. While real data often deviate from this ideal, the estimator remains useful for identifying regions of low genetic diversity using a sliding window approach. We demonstrate this by applying it to the complete genomes of 37 strains of Drosophila melanogaster, and to the genomes of two closely related Drosophila species, D. simulans and D. sechellia. In both cases, we detect the diversity minimum and discuss its biological implications.
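The sliding-window step can be illustrated independently of how diversity is estimated. pim itself is alignment-free. For contrast, this sketch of our own uses the ordinary alignment-based per-site nucleotide diversity, the fraction of sequence pairs that differ at each column, averaged over windows. It then reports the window with the lowest value. Window width, step, and function names are illustrative assumptions.

```python
import random
from collections import Counter


def window_diversity(alignment, width, step):
    """Per-site nucleotide diversity in sliding windows of an alignment.

    alignment: equal-length sequences. Returns a list of
    (window_start, mean pairwise mismatch proportion per site).
    """
    n = len(alignment)
    n_pairs = n * (n - 1) / 2
    length = len(alignment[0])
    per_site = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        same = sum(c * (c - 1) / 2 for c in counts.values())
        per_site.append((n_pairs - same) / n_pairs)  # differing pairs
    return [(s, sum(per_site[s:s + width]) / width)
            for s in range(0, length - width + 1, step)]
```

The diversity minimum is then simply the window with the smallest value, for example `min(window_diversity(seqs, 200, 50), key=lambda w: w[1])`.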
Genetics | 2006
Peter Pfaffelhuber; Bernhard Haubold; Anton Wakolbinger
The rapid fixation of an advantageous allele leads to a reduction in linked neutral variation around the target of selection. The genealogy at a neutral locus in such a selective sweep can be simulated by first generating a random path of the advantageous allele's frequency and then a structured coalescent in this background. Usually the frequency path is approximated by a logistic growth curve. We discuss an alternative method that approximates the genealogy by a random binary splitting tree, a so-called Yule tree, which does not require first constructing a frequency path. Compared to the coalescent in a logistic background, this method gives a slightly better approximation for identity by descent during the selective phase and a much better approximation for the number of lineages that stem from the founder of the selective sweep. In applications such as the approximation of the distribution of Tajima's D, the two approximation methods perform equally well. For relevant parameter ranges, the Yule approximation is faster.
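A Yule tree is generated by a pure-birth process: each of the k current lineages splits at a constant rate, so the waiting time to the next split is exponential with rate k times the split rate. The minimal sketch below grows such a tree from a single founder until n lineages exist. In the sweep setting, the split rate plays the role of a selection strength. The exact mapping to the paper's parameters is not reproduced here, and the function name is our own.

```python
import random


def yule_tree(n, rate, rng):
    """Grow a Yule (pure-birth) tree from a single founder lineage.

    Each of the k current lineages splits at `rate`, so the waiting
    time to the next split is Exponential(k * rate). Stops when n
    lineages exist. Returns a list of (time, parent, child) splits.
    """
    t = 0.0
    lineages = [0]
    splits = []
    while len(lineages) < n:
        t += rng.expovariate(rate * len(lineages))
        parent = rng.choice(lineages)  # every lineage equally likely
        child = len(lineages)  # lineages are labeled 0, 1, 2, ...
        lineages.append(child)
        splits.append((t, parent, child))
    return splits
```

No frequency path is needed, which is the point made in the abstract. The expected time until n lineages exist is Σ_{k=1}^{n−1} 1/(k · rate).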
Journal of Biotechnology | 2015
Jan-Philip Schlüter; Peter Czuppon; Oliver Schauer; Peter Pfaffelhuber; Matthew McIntosh; Anke Becker
Phenotypic heterogeneity, defined as the unequal behavior of individuals in an isogenic population, is prevalent in microorganisms. It has a significant impact both on industrial bioprocesses and on microbial ecology. We introduce a new versatile reporter system designed for simultaneous monitoring of the activities of three different promoters, where each promoter is fused to a dedicated fluorescent reporter gene (cerulean, mCherry, and mVenus). The compact 3.1 kb triple reporter cassette can either be carried on a replicating plasmid or be integrated into the genome; integration avoids artifacts associated with variation in the copy number of plasmid-borne reporter constructs. This construct was applied to monitor promoter activities related to quorum sensing (sinI promoter) and biosynthesis of the exopolysaccharide galactoglucan (wgeA promoter) at the single-cell level in colonies of the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti growing in a microfluidics system. The T5 promoter served as a constitutive and homogeneously active control promoter indicating cell viability. wgeA promoter activity was heterogeneous over the whole period of colony development, whereas sinI promoter activity passed through a phase of heterogeneity before becoming homogeneous at late stages. Although quorum sensing-dependent regulation is a major factor activating galactoglucan production, the activities of the two promoters did not correlate at the single-cell level. We developed a novel mathematical strategy for classifying the gene expression status in cell populations based on the increase in fluorescence over time in each individual. With respect to galactoglucan biosynthesis, cells in the population were classified into non-contributors, weak contributors, and strong contributors.
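The abstract does not spell out the classification strategy. As a hypothetical illustration only, cells could be ranked by the least-squares slope of their fluorescence traces over time and split into three classes at two thresholds. The function names, the slope criterion, and the threshold values are our own assumptions, not the published method.

```python
def slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mt = sum(times) / n
    mv = sum(values) / n
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    sxx = sum((t - mt) ** 2 for t in times)
    return sxy / sxx


def classify_cells(tracks, weak, strong):
    """Label each cell by the rate of increase of its fluorescence.

    tracks: {cell_id: (times, fluorescence)}. weak < strong are
    hypothetical slope thresholds (fluorescence units per time unit).
    """
    labels = {}
    for cell, (times, values) in tracks.items():
        s = slope(times, values)
        if s < weak:
            labels[cell] = "non-contributor"
        elif s < strong:
            labels[cell] = "weak contributor"
        else:
            labels[cell] = "strong contributor"
    return labels
```

Using the slope rather than the final intensity makes the label depend on ongoing promoter activity instead of on accumulated reporter from earlier growth phases.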