Paul Fearnhead
Lancaster University
Publications
Featured research published by Paul Fearnhead.
Nature Genetics | 2007
Meredith Yeager; Nick Orr; Richard B. Hayes; Kevin B. Jacobs; Peter Kraft; Sholom Wacholder; Mark J Minichiello; Paul Fearnhead; Kai Yu; Nilanjan Chatterjee; Zhaoming Wang; Robert Welch; Brian Staats; Eugenia E. Calle; Heather Spencer Feigelson; Michael J. Thun; Carmen Rodriguez; Demetrius Albanes; Jarmo Virtamo; Stephanie J. Weinstein; Fredrick R. Schumacher; Edward Giovannucci; Walter C. Willett; Geraldine Cancel-Tassin; Olivier Cussenot; Antoine Valeri; Gerald L. Andriole; Edward P. Gelmann; Margaret A. Tucker; Daniela S. Gerhard
Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 × 10⁻¹³; heterozygote odds ratio (OR): 1.26, 95% confidence interval (CI): 1.13–1.41; homozygote OR: 1.58, 95% CI: 1.40–1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 × 10⁻¹¹; rs6983267 P = 6.62 × 10⁻¹⁰). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than that of the locus marked by rs1447295 (21% versus 9%).
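The per-genotype odds ratios and confidence intervals quoted above are, at heart, 2×2 comparisons of each genotype against the reference homozygote. A minimal sketch of that calculation in Python, using made-up genotype counts (the study's actual analysis pools five studies and adjusts each SNP for the other):

```python
import math

def genotype_or(case, control, ref=0):
    """Odds ratio and Wald 95% CI for each genotype vs. the reference
    homozygote, from genotype counts [ref hom, het, alt hom]."""
    out = {}
    for g, name in [(1, "het"), (2, "hom")]:
        a, b = case[g], control[g]        # genotype of interest
        c, d = case[ref], control[ref]    # reference genotype
        or_ = (a * d) / (b * c)
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        lo, hi = (math.exp(math.log(or_) + z * se) for z in (-1.96, 1.96))
        out[name] = (round(or_, 2), round(lo, 2), round(hi, 2))
    return out

# Purely illustrative counts, not the study's data.
print(genotype_or(case=[300, 560, 312], control=[360, 540, 257]))
```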
Journal of the American Statistical Association | 2012
Rebecca Killick; Paul Fearnhead; Idris A. Eckley
We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time series over longer periods. We consider the common approach of detecting changepoints through minimising a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalised likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions, and hence the optimal number and location of changepoints, that has a computational cost which, under mild conditions, is linear in the number of observations. This compares favourably with existing methods for the same problem, whose computational cost can be quadratic or even cubic. In simulation studies we show that our new method can be orders of magnitude faster than these alternative exact methods. We also compare with the binary segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data.
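The linear-time search described in this abstract is a pruned dynamic program over candidate locations of the most recent changepoint. Below is a rough sketch under simplifying assumptions, a Gaussian change-in-mean cost with unit variance and a fixed penalty beta per changepoint; it illustrates the pruning idea only and is not the authors' implementation (their method is available in the R package changepoint):

```python
import numpy as np

def pelt_mean_change(y, beta):
    """Optimal changepoints minimising (segment costs) + beta * (#changepoints),
    with PELT-style pruning of candidate last-changepoints. Segment cost:
    Gaussian negative log-likelihood (unit variance, up to a constant)."""
    n = len(y)
    S1 = np.concatenate(([0.0], np.cumsum(y)))       # cumulative sums ...
    S2 = np.concatenate(([0.0], np.cumsum(y * y)))   # ... for O(1) segment costs

    def cost(s, t):                                  # cost of segment y[s:t]
        return (S2[t] - S2[s]) - (S1[t] - S1[s]) ** 2 / (t - s)

    F = np.full(n + 1, np.inf)                       # F[t]: optimal cost of y[:t]
    F[0] = -beta                                     # so each changepoint adds beta
    last = np.zeros(n + 1, dtype=int)
    cands = [0]                                      # candidate last changepoints

    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in cands]
        b = int(np.argmin(vals))
        F[t], last[t] = vals[b], cands[b]
        # Prune candidates that can never again be optimal.
        cands = [s for s, v in zip(cands, vals) if v - beta <= F[t]]
        cands.append(t)

    cps, t = [], n                                   # backtrack the solution
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
print(pelt_mean_change(y, beta=2 * np.log(len(y))))  # expect roughly [100]
```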
PLOS Genetics | 2008
Daniel J. Wilson; Edith Gabriel; A. J. H. Leatherbarrow; John Cheesbrough; Steven Gee; Eric Bolton; Andrew Fox; Paul Fearnhead; C. Anthony Hart; Peter J. Diggle
Campylobacter jejuni is the leading cause of bacterial gastro-enteritis in the developed world. It is thought to infect 2–3 million people a year in the US alone, at a cost to the economy in excess of US $4 billion. C. jejuni is a widespread zoonotic pathogen that is carried by animals farmed for meat and poultry. A connection with contaminated food is recognized, but C. jejuni is also commonly found in wild animals and water sources. Phylogenetic studies have suggested that genotypes pathogenic to humans bear greatest resemblance to non-livestock isolates. Moreover, seasonal variation in campylobacteriosis bears the hallmarks of water-borne disease, and certain outbreaks have been attributed to contamination of drinking water. As a result, the relative importance of these reservoirs to human disease is controversial. We use multilocus sequence typing to genotype 1,231 cases of C. jejuni isolated from patients in Lancashire, England. By modeling the DNA sequence evolution and zoonotic transmission of C. jejuni between host species and the environment, we assign human cases probabilistically to source populations. Our novel population genetics approach reveals that the vast majority (97%) of sporadic disease can be attributed to animals farmed for meat and poultry. Chicken and cattle are the principal sources of C. jejuni pathogenic to humans, whereas wild animal and environmental sources are responsible for just 3% of disease. Our results imply that the primary transmission route is through the food chain, and suggest that incidence could be dramatically reduced by enhanced on-farm biosecurity or preventing food-borne transmission.
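The probabilistic assignment of cases to sources can be pictured as a Bayesian classifier over source populations. The toy sketch below replaces the paper's model of DNA sequence evolution with raw sequence-type frequencies plus pseudocount smoothing, so the sequence types, counts, and sources are all illustrative assumptions:

```python
import numpy as np

def attribute(st_counts_by_source, human_st, alpha=1.0):
    """Posterior probability that a human isolate of sequence type
    `human_st` came from each source, under a uniform prior over sources
    and smoothed multinomial type frequencies within each source."""
    sources = list(st_counts_by_source)
    sts = {st for d in st_counts_by_source.values() for st in d}
    lik = np.array([
        (st_counts_by_source[s].get(human_st, 0) + alpha)
        / (sum(st_counts_by_source[s].values()) + alpha * len(sts))
        for s in sources
    ])
    return dict(zip(sources, lik / lik.sum()))

# Illustrative sequence-type counts per source (made up).
counts = {"chicken": {"ST21": 40, "ST45": 10},
          "cattle":  {"ST21": 15, "ST61": 30},
          "wild":    {"ST45": 5,  "ST177": 8}}
print(attribute(counts, "ST21"))
```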
Statistics and Computing | 2006
Paul Fearnhead
We demonstrate how to perform direct simulation from the posterior distribution of a class of multiple changepoint models where the number of changepoints is unknown. The class of models assumes independence between the posterior distribution of the parameters associated with segments of data between successive changepoints. This approach is based on the use of recursions, and is related to work on product partition models. The computational complexity of the approach is quadratic in the number of observations, but an approximate version, which introduces negligible error, and whose computational cost is roughly linear in the number of observations, is also possible. Our approach can be useful, for example within an MCMC algorithm, even when the independence assumptions do not hold. We demonstrate our approach on coal-mining disaster data and on well-log data. Our method can cope with a range of models, and exact simulation from the posterior distribution is possible in a matter of minutes.
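The recursions can be illustrated with a simple conjugate model: independent N(mu, 1) observations within each segment, mu ~ N(0, tau2), and a geometric prior on segment lengths. A backward pass computes the marginal likelihood of the data following each candidate changepoint; a forward pass then draws changepoints exactly from the posterior. This is an illustrative reconstruction, quadratic in the number of observations as stated above, not the paper's code:

```python
import numpy as np
from scipy.special import logsumexp

def simulate_changepoints(y, p=0.01, tau2=10.0, seed=None):
    """One exact posterior draw of changepoint positions under the
    conjugate Gaussian model described in the lead-in."""
    rng = np.random.default_rng(seed)
    n = len(y)
    S1 = np.concatenate(([0.0], np.cumsum(y)))
    S2 = np.concatenate(([0.0], np.cumsum(y * y)))

    def seg(t, s):        # marginal log-likelihood of segment y[t:s]
        d, sy, sy2 = s - t, S1[s] - S1[t], S2[s] - S2[t]
        return (-0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(1 + d * tau2)
                - 0.5 * (sy2 - tau2 * sy * sy / (1 + d * tau2)))

    def logprior(t, s):   # geometric segment-length prior
        return (s - t - 1) * np.log1p(-p) + (np.log(p) if s < n else 0.0)

    # Backward recursion (the quadratic-cost step):
    # Q[t] = log P(y[t:n] | a segment starts at t), with Q[n] = 0.
    Q = np.zeros(n + 1)
    for t in range(n - 1, -1, -1):
        Q[t] = logsumexp([seg(t, s) + logprior(t, s) + Q[s]
                          for s in range(t + 1, n + 1)])

    # Forward pass: sample each segment end exactly from its conditional.
    cps, t = [], 0
    while t < n:
        logw = np.array([seg(t, s) + logprior(t, s) + Q[s]
                         for s in range(t + 1, n + 1)])
        s = t + 1 + int(rng.choice(n - t, p=np.exp(logw - logsumexp(logw))))
        if s < n:
            cps.append(s)
        t = s
    return cps

y = np.concatenate([np.random.default_rng(4).normal(0, 1, 50),
                    np.random.default_rng(5).normal(3, 1, 50)])
print(simulate_changepoints(y, seed=0))  # most draws include a point near 50
```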
Journal of The Royal Statistical Society Series B-statistical Methodology | 2003
Paul Fearnhead; Peter Clifford
We consider the on-line Bayesian analysis of data by using a hidden Markov model, where inference is tractable conditional on the history of the state of the hidden component. A new particle filter algorithm is introduced and shown to produce promising results when analysing data of this type. The algorithm is similar to the mixture Kalman filter but uses a different resampling algorithm. We prove that this resampling algorithm is computationally efficient and optimal, among unbiased resampling algorithms, in terms of minimizing a squared error loss function. In a practical example, that of estimating break points from well-log data, our new particle filter outperforms two other particle filters, one of which is the mixture Kalman filter, by between one and two orders of magnitude.
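The flavour of the resampling step: particles whose weight exceeds a threshold kappa survive with their weights intact, while the remainder are systematically resampled and each assigned weight kappa, with kappa chosen so that the expected number of survivors equals the target M. A simplified sketch (bisection for the threshold; not the paper's exact algorithm):

```python
import numpy as np

def fc_resample(w, M, rng=None):
    """Reduce N weighted particles to M. High-weight particles are kept
    deterministically; the rest are systematically resampled and given
    equal weight kappa. Returns (indices, new_weights)."""
    rng = rng or np.random.default_rng()
    w = np.asarray(w, float)
    w = w / w.sum()

    lo, hi = 0.0, w.max()                  # solve sum(min(w/kappa, 1)) = M
    for _ in range(60):
        kappa = 0.5 * (lo + hi)
        lo, hi = (kappa, hi) if np.minimum(w / kappa, 1).sum() > M else (lo, kappa)
    kappa = 0.5 * (lo + hi)

    keep = np.flatnonzero(w >= kappa)      # survive with their weights
    rest = np.flatnonzero(w < kappa)
    k = M - len(keep)                      # slots left for the rest
    picked = np.array([], dtype=int)
    if k > 0 and len(rest):
        q = np.cumsum(w[rest] / w[rest].sum())
        u = (rng.random() + np.arange(k)) / k      # systematic sampling
        picked = rest[np.searchsorted(q, u)]
    idx = np.concatenate([keep, picked])
    new_w = np.concatenate([w[keep], np.full(len(picked), kappa)])
    return idx, new_w / new_w.sum()

idx, nw = fc_resample(np.random.default_rng(1).random(1000), M=100)
print(len(idx), nw.sum())
```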
Molecular Biology and Evolution | 2009
Daniel J. Wilson; Edith Gabriel; A. J. H. Leatherbarrow; John Cheesbrough; Steven Gee; Eric Bolton; Andrew S. Fox; C. Anthony Hart; Peter J. Diggle; Paul Fearnhead
Responsible for the majority of bacterial gastroenteritis in the developed world, Campylobacter jejuni is a pervasive pathogen of humans and animals, but its evolution is obscure. In this paper, we exploit contemporary genetic diversity and empirical evidence to piece together the evolutionary history of C. jejuni and quantify its evolutionary potential. Our combined population genetics–phylogenetics approach reveals a surprising picture. Campylobacter jejuni is a rapidly evolving species, subject to intense purifying selection that purges 60% of novel variation, but possessing a massive evolutionary potential. The low mutation rate is offset by a large effective population size so that a mutation at any site can occur somewhere in the population within the space of a week. Recombination has a fundamental role, generating diversity at twice the rate of de novo mutation, and facilitating gene flow between C. jejuni and its sister species Campylobacter coli. We attempt to calibrate the rate of molecular evolution in C. jejuni based solely on within-species variation. The rates we obtain are up to 1,000 times faster than conventional estimates, placing the C. jejuni–C. coli split at the time of the Neolithic revolution. We weigh the plausibility of such recent bacterial evolution against alternative explanations and discuss the evidence required to settle the issue.
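The "mutation at any site within a week" claim is a waiting-time calculation: the expected number of generations before a given site mutates somewhere in the population is roughly 1 / (mutation rate × effective population size). A back-of-envelope version with purely illustrative numbers (assumptions, not the paper's estimates):

```python
# All three quantities below are illustrative assumptions for the sake
# of the arithmetic, not values reported in the paper.
mu = 1e-10          # per-site, per-generation mutation rate (assumed)
Ne = 4e8            # effective population size (assumed)
gens_per_day = 100  # bacterial generations per day (assumed)

wait_generations = 1 / (mu * Ne)          # expected wait at one site
print(f"{wait_generations / gens_per_day:.2f} days")  # well under a week here
```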
Statistics and Computing | 2004
Paul Fearnhead
We consider the analysis of data under mixture models where the number of components in the mixture is unknown. We concentrate on Dirichlet process mixture models, and in particular we consider such models under conjugate priors. This conjugacy enables us to integrate out many of the parameters in the model, and to discretize the posterior distribution. Particle filters are particularly well suited to such discrete problems, and we propose the use of the particle filter of Fearnhead and Clifford for this problem. The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than that of the particle filter algorithm of Chen and Liu. In many situations it outperforms a Gibbs sampler. We also show how models without the required amount of conjugacy can be efficiently analyzed by the same particle filter algorithm.
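One way to picture the filter: a particle is a clustering of the observations seen so far; each new observation spawns one offspring per possible allocation (an existing cluster or a new one), weighted by the Chinese-restaurant-process prior times the conjugate posterior predictive; the particle set is then pruned back to M particles. The sketch below assumes unit-variance Gaussian components with a N(0, tau2) prior on means, and uses crude keep-the-top-M pruning where the paper advocates the Fearnhead–Clifford resampling step:

```python
import numpy as np
from scipy.special import logsumexp

def log_pred(members, x, tau2=10.0):
    """Posterior-predictive log-density of x given a cluster's members,
    for N(mu, 1) data with mu ~ N(0, tau2)."""
    n = len(members)
    v_post = 1.0 / (1.0 / tau2 + n)
    m_post = v_post * sum(members)
    v = 1.0 + v_post
    return -0.5 * (np.log(2 * np.pi * v) + (x - m_post) ** 2 / v)

def dp_filter(y, alpha=1.0, M=200, tau2=10.0):
    """Particle filter over clusterings of a DP Gaussian mixture.
    Each particle is an assignment vector z for y[:t]."""
    parts, logw = [[0]], np.array([0.0])
    for t in range(1, len(y)):
        cand, w = [], []
        for z, lw in zip(parts, logw):
            K = max(z) + 1
            for k in range(K + 1):          # existing clusters or a new one
                mem = [y[i] for i, zi in enumerate(z) if zi == k]
                crp = (len(mem) if k < K else alpha) / (len(z) + alpha)
                cand.append(z + [k])
                w.append(lw + np.log(crp) + log_pred(mem, y[t], tau2))
        w = np.array(w)
        keep = np.argsort(w)[-M:]           # crude pruning, see note below
        parts = [cand[i] for i in keep]
        logw = w[keep] - logsumexp(w[keep])
    return parts, np.exp(logw)

y = np.concatenate([np.random.default_rng(2).normal(0, 1, 30),
                    np.random.default_rng(3).normal(4, 1, 30)])
parts, w = dp_filter(y)
print(max(parts[int(np.argmax(w))]) + 1, "clusters in the best particle")
```

Replacing the final keep-the-top-M step with an unbiased resampling scheme, as the paper does, is what keeps the filter's posterior approximation honest.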
Statistics and Computing | 2008
Paul Fearnhead
We consider analysis of complex stochastic models based upon partial information. MCMC and reversible jump MCMC are often the methods of choice for such problems, but in some situations they can be difficult to implement; and suffer from problems such as poor mixing, and the difficulty of diagnosing convergence. Here we review three alternatives to MCMC methods: importance sampling, the forward-backward algorithm, and sequential Monte Carlo (SMC). We discuss how to design good proposal densities for importance sampling, show some of the range of models for which the forward-backward algorithm can be applied, and show how resampling ideas from SMC can be used to improve the efficiency of the other two methods. We demonstrate these methods on a range of examples, including estimating the transition density of a diffusion and of a discrete-state continuous-time Markov chain; inferring structure in population genetics; and segmenting genetic divergence data.
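The link between importance sampling and SMC-style resampling can be shown in a few lines: draw from a heavy-tailed proposal, weight by prior × likelihood / proposal, then resample to turn the weighted draws into an equally weighted sample. A minimal sketch for a Gaussian-mean posterior (all numbers illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Data and model: y_i ~ N(theta, 1), prior theta ~ N(0, 10^2).
y = rng.normal(1.5, 1.0, size=20)

# Proposal: heavy-tailed t distribution, a safe importance-sampling choice.
prop = stats.t(df=3, scale=3)
theta = prop.rvs(size=5000, random_state=rng)

log_w = (stats.norm(0, 10).logpdf(theta)                        # prior
         + stats.norm(theta[:, None], 1).logpdf(y).sum(axis=1)  # likelihood
         - prop.logpdf(theta))                                  # proposal
w = np.exp(log_w - log_w.max())
w /= w.sum()

print("posterior mean:", np.sum(w * theta))
print("effective sample size:", 1.0 / np.sum(w ** 2))

# Resampling (as in SMC): replace weighted draws by an equally weighted
# sample, concentrating effort on high-weight regions.
resampled = rng.choice(theta, size=5000, p=w)
print("after resampling:", resampled.mean())
```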
Journal of The Royal Statistical Society Series B-statistical Methodology | 2002
Paul Fearnhead; Peter Donnelly
There is currently great interest in understanding the way in which recombination rates vary, over short scales, across the human genome. Aside from inherent interest, an understanding of this local variation is essential for the sensible design and analysis of many studies aimed at elucidating the genetic basis of common diseases or of human population histories. Standard pedigree-based approaches do not have the fine scale resolution that is needed to address this issue. In contrast, samples of deoxyribonucleic acid sequences from unrelated chromosomes in the population carry relevant information, but inference from such data is extremely challenging. Although there has been much recent interest in the development of full likelihood inference methods for estimating local recombination rates from such data, they are not currently practicable for data sets of the size being generated by modern experimental techniques. We introduce and study two approximate likelihood methods. The first, a marginal likelihood, ignores some of the data. A careful choice of what to ignore results in substantial computational savings with virtually no loss of relevant information. For larger sequences, we introduce a 'composite' likelihood, which approximates the model of interest by ignoring certain long-range dependences. An informal asymptotic analysis and a simulation study suggest that inference based on the composite likelihood is practicable and performs well. We combine both methods to reanalyse data from the lipoprotein lipase gene, and the results seriously question conclusions from some earlier studies of these data.
IEEE Transactions on Signal Processing | 2005
Paul Fearnhead