Publication


Featured research published by Lukas Käll.


Nature Methods | 2007

Semi-supervised learning for peptide identification from shotgun proteomics datasets

Lukas Käll; Jesse D. Canterbury; Jason Weston; William Stafford Noble; Michael J. MacCoss

Shotgun proteomics uses liquid chromatography–tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
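The semi-supervised idea in this abstract can be sketched in a few lines. The example below is only an illustration, not the published Percolator implementation: it swaps in a plain logistic regression where Percolator uses a support vector machine, omits cross-validation, and all function names (`q_values`, `train_logistic`, `rescore`) and parameters are hypothetical. Each round takes high-confidence target matches as positive examples and all decoy matches as negatives, trains a classifier, and rescores every match.

```python
import math


def q_values(scores, is_decoy):
    """Estimate a q-value per match from the target/decoy ranking."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    targets = decoys = 0
    fdr = []
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr.append(min(1.0, decoys / max(targets, 1)))
    # q-value: the smallest FDR at which this match is still accepted.
    q = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[rank])
        q[order[rank]] = running_min
    return q


def train_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient ascent for logistic regression (stand-in for an SVM)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, label in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = label - 1.0 / (1.0 + math.exp(-z))
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def rescore(features, is_decoy, q_threshold=0.01, iterations=3):
    """Iteratively relabel confident targets, retrain, and rescore."""
    # Assumption: the first feature is the search engine's own score.
    scores = [f[0] for f in features]
    for _ in range(iterations):
        q = q_values(scores, is_decoy)
        positives = [f for f, d, qv in zip(features, is_decoy, q)
                     if not d and qv <= q_threshold]
        negatives = [f for f, d in zip(features, is_decoy) if d]
        if not positives or not negatives:
            break
        labels = [1] * len(positives) + [0] * len(negatives)
        w, b = train_logistic(positives + negatives, labels)
        scores = [sum(wj * xj for wj, xj in zip(w, f)) + b for f in features]
    return scores
```

Calling `rescore(features, is_decoy)` with one feature vector per spectrum match returns a new score per match; passing those scores back to `q_values` gives the list of matches accepted at a chosen threshold.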


Nucleic Acids Research | 2007

Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server.

Lukas Käll; Anders Krogh; Erik L. L. Sonnhammer

When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30–65% of all predicted signal peptides and 25–35% of all predicted transmembrane topologies overlap. This impairs predictions of 5–10% of the proteome, hence this is an important issue in protein annotation. To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions. We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.


Nucleic Acids Research | 2015

The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides

Konstantinos D. Tsirigos; Christoph Peters; Nanjiang Shu; Lukas Käll; Arne Elofsson

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions.


FEBS Letters | 2007

Membrane topology of the Drosophila OR83b odorant receptor.

Carolina Lundin; Lukas Käll; Scott A. Kreher; Katja Kapp; Erik L. L. Sonnhammer; John R. Carlson; Gunnar von Heijne; IngMarie Nilsson

By analogy to mammals, odorant receptors (ORs) in insects, such as Drosophila melanogaster, have long been thought to belong to the G‐protein coupled receptor (GPCR) superfamily. However, recent work has cast doubt on this assumption and has tentatively suggested an inverted topology compared to the canonical Nout-Cin 7 transmembrane (TM) GPCR topology, at least for some Drosophila ORs. Here, we report a detailed topology mapping of the Drosophila OR83b receptor using engineered glycosylation sites as topology markers. Our results are inconsistent with a classical GPCR topology and show that OR83b has an intracellular N‐terminus, an extracellular C‐terminus, and 7TM helices.


Protein Science | 2006

A general model of G protein-coupled receptor sequences and its application to detect remote homologs

Markus Wistrand; Lukas Käll; Erik L. L. Sonnhammer

G protein‐coupled receptors (GPCRs) constitute a large superfamily involved in various types of signal transduction pathways triggered by hormones, odorants, peptides, proteins, and other types of ligands. The superfamily is so diverse that many members lack sequence similarity, although they all span the cell membrane seven times with an extracellular N and a cytosolic C terminus. We analyzed a divergent set of GPCRs and found distinct loop length patterns and differences in amino acid composition between cytosolic loops, extracellular loops, and membrane regions. We configured GPCRHMM, a hidden Markov model, to fit those features and trained it on a large dataset representing the entire superfamily. GPCRHMM was benchmarked to profile HMMs and generic transmembrane detectors on sets of known GPCRs and non‐GPCRs. In a cross‐validation procedure, profile HMMs produced an error rate nearly twice as high as GPCRHMM. In a sensitivity‐selectivity test, GPCRHMM's sensitivity was about 15% higher than that of the best transmembrane predictors, at comparable false positive rates. We used GPCRHMM to search for novel members of the GPCR superfamily in five proteomes. All in all, we detected 120 sequences that lacked annotation and are potentially novel GPCRs. Of those, 102 were found in Caenorhabditis elegans, four in human, and seven in mouse. Many predictions (65) belonged to Pfam domains of unknown function. GPCRHMM strongly rejected a family of arthropod‐specific odorant receptors believed to be GPCRs. A detailed analysis showed that these sequences are indeed very different from other GPCRs. GPCRHMM is available at http://gpcrhmm.cgb.ki.se.


PLOS Computational Biology | 2008

Transmembrane Topology and Signal Peptide Prediction Using Dynamic Bayesian Networks

Sheila M. Reynolds; Lukas Käll; Michael Riffle; Jeff A. Bilmes; William Stafford Noble

Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.
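To make the "Viterbi-style decoding" mentioned above concrete, here is a minimal sketch on a toy two-state model. It is not the Phobius or Philius model: the two states, the hydrophobic residue set, and every probability below are assumptions chosen only for illustration. The decoder returns the single most probable state path, labelling each residue as membrane (`M`) or non-membrane (`.`).

```python
import math

# Toy assumptions, not taken from any published model.
HYDROPHOBIC = set("AILMFVW")
STATES = ("M", ".")  # "M" = membrane, "." = non-membrane


def emission(state, residue):
    """Probability of seeing a residue class in a given state."""
    hydrophobic = residue in HYDROPHOBIC
    matches = hydrophobic if state == "M" else not hydrophobic
    return 0.9 if matches else 0.1


def transition(prev, cur):
    """Sticky transitions: staying in a state is more likely than switching."""
    return 0.9 if prev == cur else 0.1


def viterbi(sequence):
    """Return the most probable state path for an amino acid sequence."""
    if not sequence:
        return ""
    # Work in log space to avoid underflow on long sequences.
    best = {s: math.log(0.5) + math.log(emission(s, sequence[0]))
            for s in STATES}
    back = []
    for residue in sequence[1:]:
        new_best, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES,
                       key=lambda p: best[p] + math.log(transition(p, cur)))
            new_best[cur] = (best[prev] + math.log(transition(prev, cur))
                             + math.log(emission(cur, residue)))
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("KKKKLLLLLLKKKK"))  # ....MMMMMM....
```

Posterior decoding would instead pick the most probable state at each position independently, which can yield paths that violate the model's grammar; the abstract describes a two-stage decoder that combines both approaches.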


Journal of Proteome Research | 2009

Improvements to the Percolator Algorithm for Peptide Identification from Shotgun Proteomics Data Sets

Marina Spivak; Jason Weston; Léon Bottou; Lukas Käll; William Stafford Noble

Shotgun proteomics coupled with database search software allows the identification of a large number of peptides in a single experiment. However, some existing search algorithms, such as SEQUEST, use score functions that are designed primarily to identify the best peptide for a given spectrum. Consequently, when comparing identifications across spectra, the SEQUEST score function Xcorr fails to discriminate accurately between correct and incorrect peptide identifications. Several machine learning methods have been proposed to address the resulting classification task of distinguishing between correct and incorrect peptide-spectrum matches (PSMs). A recent example is Percolator, which uses semisupervised learning and a decoy database search strategy to learn to distinguish between correct and incorrect PSMs identified by a database search algorithm. The current work describes three improvements to Percolator. (1) Percolator's heuristic optimization is replaced with a clear objective function, with intuitive reasons behind its choice. (2) Tractable nonlinear models are used instead of linear models, leading to improved accuracy over the original Percolator. (3) A method, Q-ranker, for directly optimizing the number of identified spectra at a specified q value is proposed, which achieves further gains.


Nature Methods | 2014

HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics

Rui M. Branca; Lukas M. Orre; H. Johansson; Viktor Granholm; Mikael Huss; Åsa Pérez-Bercoff; Jenny Forshed; Lukas Käll; Janne Lehtiö

We present a liquid chromatography–mass spectrometry (LC-MS)-based method permitting unbiased (gene prediction–independent) genome-wide discovery of protein-coding loci in higher eukaryotes. Using high-resolution isoelectric focusing (HiRIEF) at the peptide level in the 3.7–5.0 pH range and accurate peptide isoelectric point (pI) prediction, we probed the six-reading-frame translation of the human and mouse genomes and identified 98 and 52 previously undiscovered protein-coding loci, respectively. The method also enabled deep proteome coverage, identifying 13,078 human and 10,637 mouse proteins.
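The six-reading-frame translation that this abstract relies on can be illustrated with a short sketch. The code below assumes the standard genetic code and uses hypothetical helper names (`translate`, `six_frame_translate`); it is not the pipeline used in the paper, which additionally involves peptide pI prediction and database searching.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Translate both strands in all three offsets."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translate("ATGGCC"))
# {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```

In a proteogenomic search, each translated frame would typically be split at stop codons and digested in silico before being matched against spectra; those steps are left out here.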


Journal of Proteome Research | 2008

Rapid and Accurate Peptide Identification from Tandem Mass Spectra

Christopher Y. Park; Aaron A. Klammer; Lukas Käll; Michael J. MacCoss; William Stafford Noble

Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed with source code freely to noncommercial users.
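Generating a shuffled decoy for each target peptide, as described above, might look like the following sketch. This is not Crux's code: the function name `shuffled_decoy`, the retry limit, and the choice to keep the C-terminal residue in place (so that a tryptic peptide still ends in K or R) are assumptions made for illustration. Because a shuffle preserves amino acid composition, the decoy has the same mass as its target.

```python
import random


def shuffled_decoy(peptide, rng, max_tries=10):
    """Shuffle all residues except the last one.

    Assumption: the C-terminal residue stays fixed. Peptides shorter than
    three residues are returned unchanged because no distinct shuffle exists.
    """
    if len(peptide) < 3:
        return peptide
    prefix, last = list(peptide[:-1]), peptide[-1]
    decoy = peptide
    for _ in range(max_tries):
        rng.shuffle(prefix)
        decoy = "".join(prefix) + last
        if decoy != peptide:
            break  # accept the first shuffle that differs from the target
    return decoy


rng = random.Random(0)  # fixed seed so the decoys are reproducible
for target in ("PEPTIDEK", "LGEYGFQNALIVR"):
    print(target, shuffled_decoy(target, rng))
```

Scoring each spectrum against both the target and its decoys gives an empirical null distribution, which is what makes target-decoy false discovery rate estimates possible.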


Genome Research | 2008

Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations

Gennifer Merrihew; Colleen Davis; Brent Ewing; Gary Williams; Lukas Käll; Barbara Frewen; William Stafford Noble; Phil Green; James H. Thomas; Michael J. MacCoss

We describe a general mass spectrometry-based approach for gene annotation of any organism and demonstrate its effectiveness using the nematode Caenorhabditis elegans. We detected 6779 C. elegans proteins (67,047 peptides), including 384 that, although annotated in WormBase WS150, lacked cDNA or other prior experimental support. We also identified 429 new coding sequences that were unannotated in WS150. Nearly half (192/429) of the new coding sequences were confirmed with RT-PCR data. Thirty-three (approximately 8%) of the new coding sequences had been predicted to be pseudogenes, 151 (approximately 35%) reveal apparent errors in gene models, and 245 (57%) appear to be novel genes. In addition, we verified 6010 exon-exon splice junctions within existing WormBase gene models. Our work confirms that mass spectrometry is a powerful experimental tool for annotating sequenced genomes. In addition, the collection of identified peptides should facilitate future proteomics experiments targeted at specific proteins of interest.

Collaboration


Dive into Lukas Käll's collaborations.

Top Co-Authors

Yasset Perez-Riverol
European Bioinformatics Institute

Samuel H. Payne
Pacific Northwest National Laboratory

Björn Forsström
Royal Institute of Technology

Fredrik Edfors
Royal Institute of Technology

Magnus Palmblad
Leiden University Medical Center