Konrad Scheffler
University of California, San Diego
Publication
Featured research published by Konrad Scheffler.
PLOS Genetics | 2012
Ben Murrell; Joel O. Wertheim; Sasha Moola; Thomas Weighill; Konrad Scheffler; Sergei L. Kosakovsky Pond
The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e. it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic: a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.
Molecular Biology and Evolution | 2013
Ben Murrell; Sasha Moola; Amandla Mabona; Thomas Weighill; Daniel J. Sheward; Sergei L. Kosakovsky Pond; Konrad Scheffler
Model-based analyses of natural selection often categorize sites into a relatively small number of site classes. Forcing each site to belong to one of these classes places unrealistic constraints on the distribution of selection parameters, which can result in misleading inference due to model misspecification. We present an approximate hierarchical Bayesian method using a Markov chain Monte Carlo (MCMC) routine that ensures robustness against model misspecification by averaging over a large number of predefined site classes. This leaves the distribution of selection parameters essentially unconstrained, and also allows sites experiencing positive and purifying selection to be identified orders of magnitude faster than by existing methods. We demonstrate that popular random effects likelihood methods can produce misleading results when sites assigned to the same site class experience different levels of positive or purifying selection--an unavoidable scenario when using a small number of site classes. Our Fast Unconstrained Bayesian AppRoximation (FUBAR) is unaffected by this problem, while achieving higher power than existing unconstrained (fixed effects likelihood) methods. The speed advantage of FUBAR allows us to analyze larger data sets than other methods: We illustrate this on a large influenza hemagglutinin data set (3,142 sequences). FUBAR is available as a batch file within the latest HyPhy distribution (http://www.hyphy.org), as well as on the Datamonkey web server (http://www.datamonkey.org/).
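The idea of averaging over many predefined site classes can be illustrated with a minimal sketch. It assumes the per-site likelihoods at each grid point and the grid weights are already available (in FUBAR the weights come from the MCMC routine; here they are simply given), and the function name, grid, and numbers are hypothetical. It is not the HyPhy implementation.

```python
def posterior_positive_selection(site_likelihoods, grid, weights):
    """Posterior probability that beta > alpha at one site.

    site_likelihoods[g] is the likelihood of the site's data under grid
    point g; grid[g] is an (alpha, beta) pair of synonymous and
    nonsynonymous rates; weights[g] is the weight of that grid point.
    All three inputs are assumed to be precomputed.
    """
    joint = [w * lik for w, lik in zip(weights, site_likelihoods)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("site has zero likelihood under every grid point")
    positive = sum(j for j, (alpha, beta) in zip(joint, grid) if beta > alpha)
    return positive / total


# Toy example: two grid points, one purifying (beta < alpha), one positive.
example_grid = [(1.0, 0.5), (1.0, 2.0)]
example_weights = [0.5, 0.5]
example_likelihoods = [0.1, 0.3]
example_posterior = posterior_positive_selection(
    example_likelihoods, example_grid, example_weights
)
```

Because the grid is fixed in advance, the expensive likelihood evaluations are done once per site and grid point, and only the weights need to be sampled, which is consistent with the speed advantage described in the abstract.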
Bioinformatics | 2006
Konrad Scheffler; Darren P. Martin; Cathal Seoighe
Motivation: Accurate detection of positive Darwinian selection can provide important insights to researchers investigating the evolution of pathogens. However, many pathogens (particularly viruses) undergo frequent recombination and the phylogenetic methods commonly applied to detect positive selection have been shown to give misleading results when applied to recombining sequences. We propose a method that makes maximum likelihood inference of positive selection robust to the presence of recombination. This is achieved by allowing tree topologies and branch lengths to change across detected recombination breakpoints. Further improvements are obtained by allowing synonymous substitution rates to vary across sites. Results: Using simulation we show that, even for extreme cases where recombination causes standard methods to reach false positive rates >90%, the proposed method decreases the false positive rate to acceptable levels while retaining high power. We applied the method to two HIV-1 datasets for which we have previously found that inference of positive selection is invalid owing to high rates of recombination. In one of these (env gene) we still detected positive selection using the proposed method, while in the other (gag gene) we found no significant evidence of positive selection. Availability: A HyPhy batch language implementation of the proposed methods and the HIV-1 datasets analysed are available at http://www.cbio.uct.ac.za/pub_support/bioinf06. The HyPhy package is available at http://www.hyphy.org, and it is planned that the proposed methods will be included in the next distribution. RDP2 is available at http://darwin.uvigo.es/rdp/rdp.html.
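The partitioning step behind "allowing tree topologies and branch lengths to change across detected recombination breakpoints" can be sketched as follows. The sketch assumes breakpoints have already been detected (for example with a tool such as RDP2) and are given as 0-based column indices that fall on codon boundaries; the function name and the toy alignment are hypothetical. Fitting a separate tree to each partition is not shown.

```python
def split_at_breakpoints(alignment, breakpoints):
    """Split an alignment into column ranges separated by breakpoints.

    alignment maps sequence names to equal-length aligned strings.
    breakpoints are 0-based column indices; each one starts a new partition.
    Returns a list of alignments, one per non-recombinant segment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    if any(b <= 0 or b >= length for b in breakpoints):
        raise ValueError("breakpoints must lie strictly inside the alignment")
    edges = [0] + sorted(set(breakpoints)) + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(edges, edges[1:])
    ]


# Toy example: one breakpoint after the second codon.
toy_alignment = {"seq_a": "ATGAAACCC", "seq_b": "ATGAAGCCC"}
toy_partitions = split_at_breakpoints(toy_alignment, [6])
```

Each returned partition could then be given its own topology and branch lengths while the selection parameters are shared across partitions, which is the arrangement the abstract describes.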
Molecular Biology and Evolution | 2015
Joel O. Wertheim; Ben Murrell; Martin D. Smith; Sergei L. Kosakovsky Pond; Konrad Scheffler
Relaxation of selective strength, manifested as a reduction in the efficiency or intensity of natural selection, can drive evolutionary innovation and presage lineage extinction or loss of function. Mechanisms through which selection can be relaxed range from the removal of an existing selective constraint to a reduction in effective population size. Standard methods for estimating the strength and extent of purifying or positive selection from molecular sequence data are not suitable for detecting relaxed selection, because they lack power and can mistake an increase in the intensity of positive selection for relaxation of both purifying and positive selection. Here, we present a general hypothesis testing framework (RELAX) for detecting relaxed selection in a codon-based phylogenetic framework. Given two subsets of branches in a phylogeny, RELAX can determine whether selective strength was relaxed or intensified in one of these subsets relative to the other. We establish the validity of our test via simulations and show that it can distinguish between increased positive selection and a relaxation of selective strength. We also demonstrate the power of RELAX in a variety of biological scenarios where relaxation of selection has been hypothesized or demonstrated previously. We find that obligate and facultative γ-proteobacteria endosymbionts of insects are under relaxed selection compared with their free-living relatives and obligate endosymbionts are under relaxed selection compared with facultative endosymbionts. Selective strength is also relaxed in asexual Daphnia pulex lineages, compared with sexual lineages. Endogenous, nonfunctional, bornavirus-like elements are found to be under relaxed selection compared with exogenous Borna viruses. 
Finally, selection on the short-wavelength sensitive, SWS1, opsin genes in echolocating and nonecholocating bats is relaxed only in lineages in which this gene underwent pseudogenization; however, selection on the functional medium/long-wavelength sensitive opsin, M/LWS1, is found to be relaxed in all echolocating bats compared with nonecholocating bats.
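One way to picture what "relaxed or intensified" means for a set of ω (dN/dS) rate classes is the sketch below. It assumes a single exponent k that maps each reference-branch ω to ω^k on the test branches, so that k < 1 pulls every class toward neutrality (ω = 1) and k > 1 pushes classes away from it. The function name and the example ω values are hypothetical, and estimating k and testing it against k = 1 are not shown.

```python
def scale_selection_intensity(reference_omegas, k):
    """Map reference-branch omega classes to test-branch classes as omega ** k.

    k < 1 moves every omega toward 1 (relaxation);
    k > 1 moves every omega away from 1 (intensification).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return [omega ** k for omega in reference_omegas]


# Hypothetical reference classes: purifying, neutral, positive.
reference = [0.1, 1.0, 5.0]
relaxed = scale_selection_intensity(reference, 0.5)
intensified = scale_selection_intensity(reference, 2.0)
```

Under this parameterization, purifying and positive selection weaken together when k < 1, which differs from an increase in positive selection alone, the case the abstract says standard methods can confuse with relaxation.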
Briefings in Bioinformatics | 2008
Wayne Delport; Konrad Scheffler; Cathal Seoighe
Probabilistic models of sequence evolution are in widespread use in phylogenetics and molecular sequence evolution. These models have become increasingly sophisticated and combined with statistical model comparison techniques have helped to shed light on how genes and proteins evolve. Models of codon evolution have been particularly useful, because, in addition to providing a significant improvement in model realism for protein-coding sequences, codon models can also be designed to test hypotheses about the selective pressures that shape the evolution of the sequences. Such models typically assume a phylogeny and can be used to identify sites or lineages that have evolved adaptively. Recently some of the key assumptions that underlie phylogenetic tests of selection have been questioned, such as the assumption that the rate of synonymous changes is constant across sites or that a single phylogenetic tree can be assumed at all sites for recombining sequences. While some of these issues have been addressed through the development of novel methods, others remain as caveats that need to be considered on a case-by-case basis. Here, we outline the theory of codon models and their application to the detection of positive selection. We review some of the more recent developments that have improved their power and utility, laying a foundation for further advances in the modeling of coding sequence evolution.
PLOS Computational Biology | 2010
Wayne Delport; Konrad Scheffler; Gordon Botha; Mike B. Gravenor; Spencer V. Muse; Sergei L. Kosakovsky Pond
Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.
Molecular Biology and Evolution | 2014
Joel O. Wertheim; Martin D. Smith; Davey M. Smith; Konrad Scheffler; Sergei L. Kosakovsky Pond
Herpesviruses have been infecting and codiverging with their vertebrate hosts for hundreds of millions of years. The primate simplex viruses exemplify this pattern of virus–host codivergence, at a minimum, as far back as the most recent common ancestor of New World monkeys, Old World monkeys, and apes. Humans are the only primate species known to be infected with two distinct herpes simplex viruses: HSV-1 and HSV-2. Human herpes simplex viruses are ubiquitous, with over two-thirds of the human population infected by at least one virus. Here, we investigated whether the additional human simplex virus is the result of ancient viral lineage duplication or cross-species transmission. We found that standard phylogenetic models of nucleotide substitution are inadequate for distinguishing among these competing hypotheses; the extent of synonymous substitutions causes a substantial underestimation of the lengths of some of the branches in the phylogeny, consistent with observations in other viruses (e.g., avian influenza, Ebola, and coronaviruses). To more accurately estimate ancient viral divergence times, we applied a branch-site random effects likelihood model of molecular evolution that allows the strength of natural selection to vary across both the viral phylogeny and the gene alignment. This selection-informed model favored a scenario in which HSV-1 is the result of ancient codivergence and HSV-2 arose from a cross-species transmission event from the ancestor of modern chimpanzees to an extinct Homo precursor of modern humans, around 1.6 Ma. These results provide a new framework for understanding human herpes simplex virus evolution and demonstrate the importance of using selection-informed models of sequence evolution when investigating viral origin hypotheses.
Journal of Virology | 2012
Randall G. Fisher; Gert U. van Zyl; Simon A. A. Travers; Sergei L. Kosakovsky Pond; Susan Engelbrech; Ben Murrell; Konrad Scheffler; Davey M. Smith
Standard genotypic antiretroviral resistance testing, performed by bulk sequencing, does not readily detect variants that comprise <20% of the circulating HIV-1 RNA population. Nevertheless, it is valuable in selecting an antiretroviral regimen after antiretroviral failure. In patients with poor adherence, resistant variants may not reach this threshold. Therefore, deep sequencing would be potentially valuable for detecting minority resistant variants. We compared bulk sequencing and deep sequencing to detect HIV-1 drug resistance at the time of a second-line protease inhibitor (PI)-based antiretroviral regimen failure. Eligibility criteria were virologic failure (HIV-1 RNA load of >500 copies/ml) of a first-line nonnucleoside reverse transcriptase inhibitor-based regimen, with at least the M184V mutation (lamivudine resistance), and second-line failure of a lopinavir/ritonavir (LPV/r)-based regimen. An amplicon-sequencing approach on the Roche 454 system was used. Six patients with viral loads of >90,000 copies/ml and one patient with a viral load of 520 copies/ml were included. Mutations not detectable by bulk sequencing during first- and second-line failure were detected by deep sequencing during second-line failure. Low-frequency variants (>0.5% of the sequence population) harboring major protease inhibitor resistance mutations were found in 5 of 7 patients despite poor adherence to the LPV/r-based regimen. In patients with intermittent adherence to a boosted PI regimen, deep sequencing may detect minority PI-resistant variants, which likely represent early events in resistance selection. In patients with poor or intermittent adherence, there may be low evolutionary impetus for such variants to reach fixation, explaining the low prevalence of PI resistance.
PLOS Pathogens | 2008
Wayne Delport; Konrad Scheffler; Cathal Seoighe
Host immune responses against infectious pathogens exert strong selective pressures favouring the emergence of escape mutations that prevent immune recognition. Escape mutations within or flanking functionally conserved epitopes can occur at a significant cost to the pathogen in terms of its ability to replicate effectively. Such mutations come under selective pressure to revert to the wild type in hosts that do not mount an immune response against the epitope. Amino acid positions exhibiting this pattern of escape and reversion are of interest because they tend to coincide with immune responses that control pathogen replication effectively. We have used a probabilistic model of protein coding sequence evolution to detect sites in HIV-1 exhibiting a pattern of rapid escape and reversion. Our model is designed to detect sites that toggle between a wild type amino acid, which is susceptible to a specific immune response, and amino acids with lower replicative fitness that evade immune recognition. Through simulation, we show that this model has significantly greater power to detect selection involving immune escape and reversion than standard models of diversifying selection, which are sensitive to an overall increased rate of non-synonymous substitution. Applied to alignments of HIV-1 protein coding sequences, the model of immune escape and reversion detects a significantly greater number of adaptively evolving sites in env and nef. In all genes tested, the model provides a significantly better description of adaptively evolving sites than standard models of diversifying selection. Several of the sites detected are corroborated by association between Human Leukocyte Antigen (HLA) and viral sequence polymorphisms. Overall, there is evidence for a large number of sites in HIV-1 evolving under strong selective pressure, but exhibiting low sequence diversity. 
A phylogenetic model designed to detect rapid toggling between wild type and escape amino acids identifies a larger number of adaptively evolving sites in HIV-1, and can in some cases correctly identify the amino acid that is susceptible to the immune response.
International Conference on Acoustics, Speech, and Signal Processing | 2000
Konrad Scheffler; Steve J. Young
The field of spoken dialogue systems has developed rapidly. However, optimisation, evaluation and rapid development of systems remain problematic. This paper describes a method of producing a probabilistic simulation of mixed initiative dialogue with recognition and understanding errors. Both user behaviour and system errors are modelled using a data-driven approach, and the quality of the simulations is evaluated by comparing them to real human-machine dialogues. The simulation system can be used to perform rapid evaluations of prototype systems, thus aiding the development process. It is also envisaged that it will be used as a tool for automation of dialogue design.