Kevin J. Dawson
Wellcome Trust Sanger Institute
Publication
Featured research published by Kevin J. Dawson.
Nature Communications | 2014
Niccolo Bolli; Hervé Avet-Loiseau; David C. Wedge; Peter Van Loo; Ludmil B. Alexandrov; Inigo Martincorena; Kevin J. Dawson; Francesco Iorio; Serena Nik-Zainal; Graham R. Bignell; Jonathan Hinton; Yilong Li; Jose M. C. Tubio; Stuart McLaren; Sarah O’Meara; Adam Butler; Jon Teague; Laura Mudie; Elizabeth Anderson; Naim Rashid; Yu-Tzu Tai; Masood A. Shammas; Adam Sperling; Mariateresa Fulciniti; Paul G. Richardson; Giovanni Parmigiani; Florence Magrangeas; Stephane Minvielle; Philippe Moreau; Michel Attal
Multiple myeloma is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Here we use whole-exome sequencing, copy-number profiling and cytogenetics to analyse 84 myeloma samples. Most cases have a complex subclonal structure and show clusters of subclonal variants, including subclonal driver mutations. Serial sampling reveals diverse patterns of clonal evolution, including linear evolution, differential clonal response and branching evolution. Diverse processes contribute to the mutational repertoire, including kataegis and somatic hypermutation, and their relative contribution changes over time. We find heterogeneity of mutational spectrum across samples, with few recurrent genes. We identify new candidate genes, including truncations of SP140, LTB, ROBO1 and clustered missense mutations in EGR1. The myeloma genome is heterogeneous across the cohort, and exhibits diversity in clonal admixture and in dynamics of evolution, which may impact prognostic stratification, therapeutic approaches and assessment of disease response to treatment.
Genetics | 2010
Eric Bazin; Kevin J. Dawson; Mark A. Beaumont
We address the problem of finding evidence of natural selection from genetic data, accounting for the confounding effects of demographic history. In the absence of natural selection, gene genealogies should all be sampled from the same underlying distribution, often approximated by a coalescent model. Selection at a particular locus will lead to a modified genealogy, and this motivates a number of recent approaches for detecting the effects of natural selection in the genome as “outliers” under some models. The demographic history of a population affects the sampling distribution of genealogies, and therefore the observed genotypes and the classification of outliers. Since we cannot see genealogies directly, we have to infer them from the observed data under some model of mutation and demography. Thus the accuracy of an outlier-based approach depends to a greater or a lesser extent on the uncertainty about the demographic and mutational model. A natural modeling framework for this type of problem is provided by Bayesian hierarchical models, in which parameters, such as mutation rates and selection coefficients, are allowed to vary across loci. It has proved quite difficult computationally to implement fully probabilistic genealogical models with complex demographies, and this has motivated the development of approximations such as approximate Bayesian computation (ABC). In ABC the data are compressed into summary statistics, and computation of the likelihood function is replaced by simulation of data under the model. In a hierarchical setting one may be interested both in hyperparameters and parameters, and there may be very many of the latter—for example, in a genetic model, these may be parameters describing each of many loci or populations. This poses a problem for ABC in that one then requires summary statistics for each locus, which, if used naively, leads to a consequent difficulty in conditional density estimation. 
We develop a general method for applying ABC to Bayesian hierarchical models, and we apply it to detect microsatellite loci influenced by local selection. We demonstrate using receiver operating characteristic (ROC) analysis that this approach has comparable performance to a full-likelihood method and outperforms it when mutation rates are variable across loci.
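The core ABC idea described above (compress the data into summary statistics and replace likelihood evaluation with simulation) can be sketched as a simple rejection sampler. This is not the hierarchical method of the paper: the toy model (normal data with unknown mean), the uniform prior, the tolerance and every function name below are assumptions chosen purely for illustration.

```python
import random
import statistics

def simulate(mu, n, rng):
    """Simulate n observations from the toy model Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def summary(data):
    """Compress a data set into one summary statistic (the sample mean)."""
    return statistics.fmean(data)

def abc_rejection(observed, n_draws=5000, tol=0.1, seed=0):
    """Keep prior draws whose simulated summary is within tol of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw from the assumed uniform prior
        s_sim = summary(simulate(mu, len(observed), rng))
        if abs(s_sim - s_obs) <= tol:  # simulation stands in for the likelihood
            accepted.append(mu)
    return accepted

# Toy "observed" data generated with a true mean of 2.0.
observed = simulate(2.0, 50, random.Random(1))
posterior = abc_rejection(observed)
```

The accepted values in `posterior` approximate the posterior distribution of the mean. In a hierarchical setting with many loci, one summary statistic per locus would be needed, which is the difficulty the abstract refers to.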
Cell | 2017
Inigo Martincorena; Keiran Raine; Moritz Gerstung; Kevin J. Dawson; Kerstin Haase; Peter Van Loo; Helen Davies; Michael R. Stratton; Peter J. Campbell
Cancer develops as a result of somatic mutation and clonal selection, but quantitative measures of selection in cancer evolution are lacking. We adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types. Unlike species evolution, positive selection outweighs negative selection during cancer development. On average, <1 coding base substitution/tumor is lost through negative selection, with purifying selection almost absent outside homozygous loss of essential genes. This allows exome-wide enumeration of all driver coding mutations, including outside known cancer genes. On average, tumors carry ∼4 coding substitutions under positive selection, ranging from <1/tumor in thyroid and testicular cancers to >10/tumor in endometrial and colorectal cancers. Half of driver substitutions occur in yet-to-be-discovered cancer genes. With increasing mutation burden, numbers of driver mutations increase, but not linearly. We systematically catalog cancer genes and show that genes vary extensively in what proportion of mutations are drivers versus passengers.
Molecular Ecology | 2001
J. M. Green; J. H. A. Barker; E. J. P. Marshall; R. J. Froud-Williams; N. C. B. Peters; G M Arnold; Kevin J. Dawson; A. Karp
Nine microsatellites were used to screen 131 samples of Barren Brome (Anisantha sterilis: synonym Bromus sterilis) collected from within the fields of three English farms [from Oxfordshire (Oxon), Leicestershire (Leics) and Wiltshire (Wilts)] and eight seeds taken from samples of each of 10 farms across England, UK. Most individuals (~97%) were homozygous. Polymorphism occurred at all nine loci in all three farms sampled at the field scale, and at most loci for nine of the other 10 farm samples. Between three and 11 alleles were found per locus. Gene diversity ($D = 1 - \sum_i p_i^2$) ranged from 0.088 to 0.760. Polymorphism occurred among individuals within and among fields, and farms. Some alleles were found in only one farm. On the basis of the alleles at all nine loci in the 211 sampled plants, a total of 92 (44%) different genotypes was identified. Clustering analysis using the unweighted pair group method with arithmetic averages (UPGMA) for the combined Oxon, Wilts and Leics samples did not cluster them into their respective farms. Similarly, a phenogram of samples from all 10 farms showed considerable mixing of individuals with respect to farm origins. Identification of genotypes on field plans showed evidence of both spatial localization and mixing. Previous reports have suggested that A. sterilis is strictly inbreeding with little intrapopulation variation at the genetic level. Our data reveal that A. sterilis exists as numerous separate and genetically different lines, which are maintained by inbreeding but which very occasionally outcross. Possible explanations for this pattern of high genetic diversity are discussed.
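The gene diversity statistic quoted in the abstract, one minus the sum of squared allele frequencies at a locus, can be computed as in the following minimal sketch. The function name and the allele labels are hypothetical, and the example values are made up rather than taken from the study.

```python
from collections import Counter

def gene_diversity(alleles):
    """Return D = 1 - sum(p_i^2), where p_i is the frequency of allele i."""
    if not alleles:
        raise ValueError("at least one allele observation is required")
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

# Made-up allele calls at a single locus (labels are illustrative only).
even_locus = ["A", "A", "B", "B"]   # two alleles at frequency 0.5 each
fixed_locus = ["A", "A", "A", "A"]  # a single allele, so no diversity
d_even = gene_diversity(even_locus)
d_fixed = gene_diversity(fixed_locus)
```

A locus fixed for one allele gives D = 0, and D rises toward 1 as more alleles segregate at similar frequencies.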
Genetics | 2014
Renaud Vitalis; Mathieu Gautier; Kevin J. Dawson; Mark A. Beaumont
The recent advent of high-throughput sequencing and genotyping technologies makes it possible to produce, easily and cost effectively, large amounts of detailed data on the genotype composition of populations. Detecting locus-specific effects may help identify those genes that have been, or are currently, targeted by natural selection. How best to identify these selected regions, loci, or single nucleotides remains a challenging issue. Here, we introduce a new model-based method, called SelEstim, to distinguish putative selected polymorphisms from the background of neutral (or nearly neutral) ones and to estimate the intensity of selection at the former. The underlying population genetic model is a diffusion approximation for the distribution of allele frequency in a population subdivided into a number of demes that exchange migrants. We use a Markov chain Monte Carlo algorithm for sampling from the joint posterior distribution of the model parameters, in a hierarchical Bayesian framework. We present evidence from stochastic simulations, which demonstrates the good power of SelEstim to identify loci targeted by selection and to estimate the strength of selection acting on these loci, within each deme. We also reanalyze a subset of SNP data from the Stanford HGDP–CEPH Human Genome Diversity Cell Line Panel to illustrate the performance of SelEstim on real data. In agreement with previous studies, our analyses point to a very strong signal of positive selection upstream of the LCT gene, which encodes for the enzyme lactase–phlorizin hydrolase and is associated with adult-type hypolactasia. The geographical distribution of the strength of positive selection across the Old World matches the interpolated map of lactase persistence phenotype frequencies, with the strongest selection coefficients in Europe and in the Indus Valley.
Bioinformatics | 2014
Andrea Gobbi; Francesco Iorio; Kevin J. Dawson; David C. Wedge; David Tamborero; Ludmil B. Alexandrov; Nuria Lopez-Bigas; Mathew J. Garnett; Giuseppe Jurman; Julio Saez-Rodriguez
Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a ‘mutually exclusive’ manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. Availability and implementation: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html Supplementary information: Supplementary data are available at Bioinformatics online.
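A switching-step of the kind described above can be sketched in a few lines of Python. This is an illustrative toy, not the BiRewire implementation: the edge representation, the function names and the gene/sample labels are all assumptions, and the number of steps is left as a free parameter because the abstract does not state the bound.

```python
import random
from collections import Counter

def switching_steps(edges, n_steps, seed=0):
    """Apply n_steps attempted edge swaps to a bipartite (gene, sample) network."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (g1, s1), (g2, s2) = edge_list[i], edge_list[j]
        new1, new2 = (g1, s2), (g2, s1)
        # Reject swaps that change nothing or would create a duplicate edge.
        if g1 == g2 or s1 == s2 or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_set

def degrees(edges):
    """Return per-gene and per-sample mutation counts for an edge collection."""
    gene_deg = Counter(g for g, _ in edges)
    sample_deg = Counter(s for _, s in edges)
    return gene_deg, sample_deg

# Made-up mutation data: each pair means "gene g is mutated in sample s".
original = {("g1", "s1"), ("g1", "s2"), ("g2", "s2"), ("g2", "s3"), ("g3", "s1")}
rewired = switching_steps(original, n_steps=100)
```

Each accepted swap exchanges the sample endpoints of two edges, so every gene and every sample keeps its mutation count. That invariant is what lets the rewired network serve as a draw from a null model that preserves patient- and gene-wise mutation rates.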
Insect Molecular Biology | 2004
Jacqueline Batley; Keith J. Edwards; J. H. A. Barker; Kevin J. Dawson; Cw Wiltshire; Dm Glen; A. Karp
Phyllodecta (=Phratora) vulgatissima and P. vitellinae (Coleoptera: Chrysomelidae) are important pests of willows and poplars. Their differences in host species preference may provide a non-chemical strategy for pest control. However, little is known about population structure with respect to hosts, regions or seasons. Using five microsatellites, 850 P. vulgatissima and 1100 P. vitellinae individuals, comprising 17 and 22 UK samples, respectively, were genotyped. High diversity was observed at all loci. Migrant numbers exchanged per generation (Nm) were high (2.1–12.6 for P. vulgatissima and 0.9–12.2 for P. vitellinae), suggesting high genetic exchange between samples. Estimates of population differentiation (FST) and analyses of the data using Bayesian methods (Partition and Structure) showed little evidence of subdivision in relation to geography, sampling time or host.
Heredity | 2009
Kevin J. Dawson; Khalid Belkhir
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals—the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. As the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety, unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
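The posterior co-assignment probabilities that the exact linkage algorithm takes as input can be estimated from a sample of partitions. The sketch below assumes that the posterior is available as a list of sampled partitions (for example from an MCMC run), each stored as a dictionary from individual to cluster label. It does not implement the exact linkage algorithm or PartitionView, and all names and data are hypothetical.

```python
def coassignment_probability(partitions, group):
    """Fraction of sampled partitions that place every member of group together."""
    if not partitions:
        raise ValueError("at least one sampled partition is required")
    members = list(group)
    together = 0
    for partition in partitions:
        labels = {partition[individual] for individual in members}
        if len(labels) == 1:  # all members share one cluster label
            together += 1
    return together / len(partitions)

# Made-up posterior sample of four partitions of the individuals a, b and c.
samples = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
    {"a": 2, "b": 2, "c": 2},
    {"a": 0, "b": 0, "c": 1},
]
p_ab = coassignment_probability(samples, ["a", "b"])
p_bc = coassignment_probability(samples, ["b", "c"])
p_abc = coassignment_probability(samples, ["a", "b", "c"])
```

Cluster labels are compared only within a single partition, so the estimate is unaffected by labels being renamed between samples.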
Agricultural and Forest Entomology | 2012
Gudbjorg I. Aradottir; Steven J. Hanley; C. Matilda Collins; Kevin J. Dawson; A. Karp; Simon R. Leather; I. Shield; R. Harrington
1. This study reports the results obtained in an investigation of the putatively parthenogenetic aphid species Tuberolachnus salignus Gmelin. Tuberolachnus salignus is one of the largest aphid species in the world but where and how it overwinters is not known. It has recently become noteworthy because it is increasingly found on commercially grown willows used in bioenergy production.
2. Seven newly-developed polymorphic microsatellite markers were used to investigate the genetic diversity of the species, and also to confirm its reproduction strategy.
3. Tuberolachnus salignus shows very low clonal diversity; only 16 genotypes were found in 660 specimens from 27 populations in five countries.
4. There was limited geographical structuring in the samples, although the two most common genotypes, which comprised more than half of the specimens collected, had a very wide distribution.
5. Furthermore, we determined that these aphids, which live in very dense colonies, can consist of more than one genotype, suggesting aggregation of colonizing T. salignus. These results confirm the parthenogenetic nature of T. salignus and demonstrate the presence of common genotypes that are widespread in time and space.
Leukemia | 2018
Francesco Maura; Mia Petljak; M Lionetti; I Cifola; W Liang; E Pinatel; Ludmil B. Alexandrov; Anthony Fullam; Inigo Martincorena; Kevin J. Dawson; Nicos Angelopoulos; Mehmet Kemal Samur; Raphael Szalat; Jorge Zamora; Patrick Tarpey; Helen Davies; Paolo Corradini; Kenneth C. Anderson; Stephane Minvielle; Antonino Neri; Hervé Avet-Loiseau; Jonathan J. Keats; Peter J. Campbell; Nikhil C. Munshi; Niccolo Bolli
Biological and prognostic impact of APOBEC-induced mutations in the spectrum of plasma cell dyscrasias and multiple myeloma cell lines