Kevin Karplus

University of California, Santa Cruz

Publication

Featured research published by Kevin Karplus.

Molecular Systems Biology | 2014

Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega

Fabian Sievers; Andreas Wilm; David Dineen; Toby J. Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D. Thompson

Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high‐quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high‐quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

Bioinformatics | 1998

Hidden Markov models for detecting remote protein homologies.

Kevin Karplus; Christian Barrett; Richard Hughey

MOTIVATION: A new hidden Markov model method (SAM-T98) for finding remote homologs of protein sequences is described and evaluated. The method begins with a single target sequence and iteratively builds a hidden Markov model (HMM) from the sequence and homologs found using the HMM for database search. SAM-T98 is also used to construct model libraries automatically from sequences in structural databases. METHODS: We evaluate the SAM-T98 method with four datasets. Three of the test sets are fold-recognition tests, where the correct answers are determined by structural similarity. The fourth uses a curated database. The method is compared against WU-BLASTP and against DOUBLE-BLAST, a two-step method similar to ISS, but using BLAST instead of FASTA. RESULTS: SAM-T98 had the fewest errors in all tests, dramatically so for the fold-recognition tests. At the minimum-error point on the SCOP (Structural Classification of Proteins)-domains test, SAM-T98 got 880 true positives and 68 false positives, DOUBLE-BLAST got 533 true positives with 71 false positives, and WU-BLASTP got 353 true positives with 24 false positives. The method is optimized to recognize superfamilies, and would require parameter adjustment to be used to find family or fold relationships. One key to the performance of the HMM method is a new score-normalization technique that compares the score to the score with a reversed model rather than to a uniform null model. AVAILABILITY: A World Wide Web server, as well as information on obtaining the Sequence Alignment and Modeling (SAM) software suite, can be found at http://www.cse.ucsc.edu/research/compbio/ CONTACT: [email protected]; http://www.cse.ucsc.edu/karplus
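
The reversed-model normalization mentioned in the abstract can be illustrated with a toy example. This is only a sketch under strong assumptions: it uses an ungapped position-specific profile in place of a full HMM with insert and delete states, and the names `log_score` and `reverse_normalized_score` are hypothetical, not part of the SAM suite.

```python
from math import log

def log_score(profile, seq):
    """Log-probability of seq under an ungapped profile.

    profile is a list of dicts (one per position) mapping residue -> probability.
    This toy model stands in for an HMM; it has no insert or delete states.
    """
    if len(profile) != len(seq):
        raise ValueError("toy profile requires len(seq) == len(profile)")
    return sum(log(column[residue]) for column, residue in zip(profile, seq))

def reverse_normalized_score(profile, seq):
    """Score relative to the same profile with its positions reversed.

    The reversed profile has the same composition and length as the original,
    so the difference rewards position-specific signal rather than composition.
    """
    return log_score(profile, seq) - log_score(profile[::-1], seq)

# Two-letter alphabet, three positions (illustrative numbers only).
profile = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.5, "B": 0.5},
    {"A": 0.1, "B": 0.9},
]
matching = reverse_normalized_score(profile, "AAB")    # fits the forward model
palindrome = reverse_normalized_score(profile, "ABA")  # fits both equally
```

In this toy setting a sequence that matches the profile scores well above zero, while a sequence that fits the forward and reversed profiles equally scores zero, which is the behavior a null model of this kind is meant to provide.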

Proteins | 2009

Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8.

Elmar Krieger; Keehyoung Joo; Jinwoo Lee; Jooyoung Lee; Srivatsan Raman; James Thompson; Mike Tyka; David Baker; Kevin Karplus

A correct alignment is an essential requirement in homology modeling. Yet in order to bridge the structural gap between template and target, which may not only involve loop rearrangements, but also shifts of secondary structure elements and repacking of core residues, high‐resolution refinement methods with full atomic details are needed. Here, we describe four approaches that address this “last mile of the protein folding problem” and have performed well during CASP8, yielding physically realistic models: YASARA, which runs molecular dynamics simulations of models in explicit solvent, using a new partly knowledge‐based all atom force field derived from Amber, whose parameters have been optimized to minimize the damage done to protein crystal structures. The LEE‐SERVER, which makes extensive use of conformational space annealing to create alignments, to help Modeller build physically realistic models while satisfying input restraints from templates and CHARMM stereochemistry, and to remodel the side‐chains. ROSETTA, whose high resolution refinement protocol combines a physically realistic all atom force field with Monte Carlo minimization to allow the large conformational space to be sampled quickly. And finally UNDERTAKER, which creates a pool of candidate models from various templates and then optimizes them with an adaptive genetic algorithm, using a primarily empirical cost function that does not include bond angle, bond length, or other physics‐like terms. Proteins 2009.

Nature Biotechnology | 2012

Automated forward and reverse ratcheting of DNA in a nanopore at 5-Å precision

Gerald M Cherf; Kate R. Lieberman; Hytham Rashid; Christopher Evan Lam; Kevin Karplus; Mark Akeson

An emerging DNA sequencing technique uses protein or solid-state pores to analyze individual strands as they are driven in single-file order past a nanoscale sensor. However, uncontrolled electrophoresis of DNA through these nanopores is too fast for accurate base reads. Here, we describe forward and reverse ratcheting of DNA templates through the α-hemolysin nanopore controlled by phi29 DNA polymerase without the need for active voltage control. DNA strands were ratcheted through the pore at median rates of 2.5–40 nucleotides per second and were examined at one nucleotide spatial precision in real time. Up to 500 molecules were processed at ∼130 molecules per hour through one pore. The probability of a registry error (an insertion or deletion) at individual positions during one pass along the template strand ranged from 10% to 24.5% without optimization. This strategy facilitates multiple reads of individual strands and is transferable to other nanopore devices for implementation of DNA sequence analysis.

Bioinformatics | 1996

Dirichlet mixtures: a method for improving detection of weak but significant protein sequence homology

Kimmen Sjölander; Kevin Karplus; Michael F. Brown; Richard Hughey; Anders Krogh; I. S Mian; David Haussler

We present a method for condensing the information in multiple alignments of proteins into a mixture of Dirichlet densities over amino acid distributions. Dirichlet mixture densities are designed to be combined with observed amino acid frequencies to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model or other statistical model. These estimates give a statistical model greater generalization capacity, so that remotely related family members can be more reliably recognized by the model. This paper corrects the previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.
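
As a rough illustration of how a Dirichlet mixture is combined with observed counts, the sketch below computes the posterior mean estimate for one alignment column. It assumes the standard form of the estimate (each component contributes (n_i + alpha_i) / (|n| + |alpha|), weighted by the posterior probability of that component given the counts). The function names, the three-letter alphabet, and the component parameters are made up for illustration and are not taken from the paper.

```python
from math import lgamma, log, exp

def log_beta(values):
    """Log of the multivariate Beta function: sum(lgamma(v)) - lgamma(sum(v))."""
    return sum(lgamma(v) for v in values) - lgamma(sum(values))

def mixture_posterior_mean(counts, components):
    """Estimate residue probabilities for one column.

    counts:     observed counts per residue (non-negative numbers)
    components: list of (mixture_coefficient, alpha_vector), all alphas > 0
    """
    # Posterior weight of each component, computed in log space for stability.
    # The multinomial coefficient is common to all components and cancels.
    log_weights = []
    for coeff, alpha in components:
        updated = [n + a for n, a in zip(counts, alpha)]
        log_weights.append(log(coeff) + log_beta(updated) - log_beta(alpha))
    peak = max(log_weights)
    weights = [exp(lw - peak) for lw in log_weights]
    norm = sum(weights)
    weights = [w / norm for w in weights]

    # Weighted average of the per-component posterior means.
    total_count = sum(counts)
    estimate = [0.0] * len(counts)
    for weight, (_, alpha) in zip(weights, components):
        total_alpha = sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            estimate[i] += weight * (n + a) / (total_count + total_alpha)
    return estimate

# Hypothetical two-component mixture over a three-letter alphabet.
components = [(0.5, [5.0, 1.0, 1.0]), (0.5, [1.0, 1.0, 5.0])]
prior_only = mixture_posterior_mean([0, 0, 0], components)
with_data = mixture_posterior_mean([3, 0, 0], components)
```

With no observations the estimate reduces to the mixture's prior mean; a few counts for the first letter shift weight toward the component that favors it, which is how the mixture generalizes from small samples.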

Computational Biology and Chemistry | 1996

A flexible motif search technique based on generalized profiles

Philipp Bucher; Kevin Karplus; Nicolas Moeri; Kay Hofmann

A flexible motif search technique is presented which has two major components: (1) a generalized profile syntax serving as a motif definition language; and (2) a motif search method specifically adapted to the problem of finding multiple instances of a motif in the same sequence. The new profile structure, which is the core of the generalized profile syntax, combines the functions of a variety of motif descriptors implemented in other methods, including regular expression-like patterns, weight matrices, previously used profiles, and certain types of hidden Markov models (HMMs). The relationship between generalized profiles and other biomolecular motif descriptors is analyzed in detail, with special attention to HMMs. Generalized profiles are shown to be equivalent to a particular class of HMMs, and conversion procedures in both directions are given. The conversion procedures provide an interpretation for local alignment in the framework of stochastic models, allowing for clear, simple significance tests. A mathematical statement of the motif search problem defines the new method exactly without linking it to a specific algorithmic solution. Part of the definition includes a new definition of disjointness of alignments.

Proteins | 2003

Combining local-structure, fold-recognition, and new fold methods for protein structure prediction

Kevin Karplus; Rachel Karchin; Jenny Draper; Jonathan Casper; Yael Mandel-Gutfreund; Mark Diekhans; Richard Hughey

This article presents an overview of the SAM‐T02 method for protein fold recognition and the UNDERTAKER program for ab initio predictions. The SAM‐T02 server is an automatic method that uses two‐track hidden Markov models (HMMs) to find and align template proteins from PDB to the target protein. The two‐track HMMs use an amino acid alphabet and one of several different local structure alphabets. The UNDERTAKER program is a new fragment‐packing program that can use short or long fragments and alignments to create protein conformations. The HMMs and fold‐recognition alignments from the SAM‐T02 method were used to generate the fragment and alignment libraries used by UNDERTAKER. We present results on a few selected targets for which this combined method worked particularly well: T0129, T0181, T0135, T0130, and T0139. Proteins 2003;53:491–496.

Bioinformatics | 2002

Classifying G-protein coupled receptors with support vector machines

Rachel Karchin; Kevin Karplus; David Haussler

MOTIVATION: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. They are the focus of a significant amount of current pharmaceutical research because they play a key role in many diseases. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors. RESULTS: The last is the most computationally expensive method, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for nearest-neighbor classification on feature vectors (Kernel Nearest Neighbor, kernNN). The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.
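
The two evaluation measures quoted above can be sketched in a few lines. The example assumes that the Minimum Error Point is the score threshold at which false positives plus false negatives is smallest, which is the usual reading of the term, and that higher scores mean stronger predictions. The function names and the scored hits are hypothetical.

```python
def minimum_error_point(hits):
    """Return (errors, true_positives, false_positives) at the best threshold.

    hits is a list of (score, is_true) pairs. The threshold is swept down the
    ranked list; errors = false positives accepted + true members rejected.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    total_true = sum(1 for _, is_true in ranked if is_true)
    tp = fp = 0
    best = (total_true, 0, 0)  # accepting nothing misses every true member
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        errors = fp + (total_true - tp)
        if errors < best[0]:
            best = (errors, tp, fp)
    return best

def true_positives_before_first_false(hits):
    """Count correct hits ranked above the highest-scoring false positive."""
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    count = 0
    for _, is_true in ranked:
        if not is_true:
            break
        count += 1
    return count

# Made-up scored predictions: (score, is the prediction correct?)
hits = [(9.1, True), (8.7, True), (7.9, False), (7.5, True),
        (6.0, True), (5.2, False), (4.8, False), (3.3, True)]
mep = minimum_error_point(hits)
clean_prefix = true_positives_before_first_false(hits)
```

Dividing the error count at the MEP by the number of test sequences gives an errors-per-sequence figure of the kind reported in the abstract.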

Proteins | 2003

Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry

Rachel Karchin; Melissa S. Cline; Yael Mandel-Gutfreund; Kevin Karplus

An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold‐recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two‐track profile hidden Markov models (HMMs). We did not rely on a simple helix‐strand‐coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3‐letter STRIDE alphabet improved fold recognition accuracy by 15% over amino‐acid‐only HMMs and 23% over PSI‐BLAST, measured by ROC‐65 numbers. We compared two‐track HMMs to amino‐acid‐only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3–24% sequence identity). HMMs with a 6‐letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE. Proteins 2003;51:504–514.

Proteins | 1999

CAFASP-1: Critical Assessment of Fully Automated Structure Prediction Methods

Daniel Fischer; Christian Barrett; Kevin Bryson; Arne Elofsson; Adam Godzik; David Jones; Kevin Karplus; Lawrence A. Kelley; Robert M. MacCallum; Krzysztof Pawłowski; Burkhard Rost; Leszek Rychlewski; Michael J. E. Sternberg

The results of the first Critical Assessment of Fully Automated Structure Prediction (CAFASP‐1) are presented. The objective was to evaluate the success rates of fully automatic web servers for fold recognition which are available to the community. This study was based on the targets used in the third meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP‐3). However, unlike CASP‐3, the study was not a blind trial, as it was held after the structures of the targets were known. The aim was to assess the performance of methods without the user intervention that several groups used in their CASP‐3 submissions. Although it is clear that “human plus machine” predictions are superior to automated ones, this CAFASP‐1 experiment is extremely valuable for users of our methods; it provides an indication of the performance of the methods alone, and not of the “human plus machine” performance assessed in CASP. This information may aid users in choosing which programs they wish to use and in evaluating the reliability of the programs when applied to their specific prediction targets. In addition, evaluation of fully automated methods is particularly important to assess their applicability at genomic scales. For each target, groups submitted the top‐ranking folds generated from their servers. In CAFASP‐1 we concentrated on fold‐recognition web servers only and evaluated only recognition of the correct fold, and not, as in CASP‐3, alignment accuracy. Although some performance differences appeared within each of the four target categories used here, overall, no single server has proved markedly superior to the others. The results showed that current fully automated fold recognition servers can often identify remote similarities when pairwise sequence search methods fail. Nevertheless, in only a few cases outside the family‐level targets has the score of the top‐ranking fold been significant enough to allow for a confident fully automated prediction. Because the goals, rules, and procedures of CAFASP‐1 were different from those used at CASP‐3, the results reported here are not comparable with those reported in CASP‐3. Nevertheless, it is clear that current automated fold recognition methods cannot yet compete with “human‐expert plus machine” predictions. Finally, CAFASP‐1 has been useful in identifying the requirements for a future blind trial of automated server‐based protein structure prediction. Proteins Suppl 1999;3:209–217.