Søren Brunak | Researchain

Publication

Featured research published by Søren Brunak.

Nature Methods | 2011

SignalP 4.0: discriminating signal peptides from transmembrane regions

Thomas Nordahl Petersen; Søren Brunak; Gunnar von Heijne; Henrik Nielsen

We benchmarked SignalP 4.0 against SignalP 3.0 and ten other signal peptide prediction algorithms (Fig. 1). We compared prediction performance using the Matthews correlation coefficient (ref. 16), for which each sequence was counted as a true or false positive or negative. To test SignalP 4.0 performance, we did not use data that had been used in training the networks or selecting the optimal architecture, and the test data did not contain homologs to the training and optimization data (Supplementary Methods). The test set for SignalP 3.0 was also independent of the training set because we removed sequences used to construct SignalP 3.0 and their homologs from the benchmark data. For other algorithms more recent than SignalP 3.0, the benchmark data may include data used to train the methods, possibly leading to slight overestimations of their performance. Our results show that SignalP 4.0 was the best signal-peptide predictor for all three organism types (Fig. 1). This comes at a price, however, because SignalP 4.0 was not in all cases as good as SignalP 3.0 according to cleavage-site sensitivity or signal-peptide correlation when there are no transmembrane proteins present (Supplementary Results). An ideal method would have the best…
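As a rough illustration of the metric named in this abstract, the sketch below computes the Matthews correlation coefficient from per-sequence counts of true and false positives and negatives. The function name and the example counts are hypothetical and are not taken from the paper or its benchmark.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, which is a common
    convention (an assumption here, not something stated in the abstract).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for one predictor on one test set.
    print(matthews_corrcoef(tp=90, fp=5, tn=95, fn=10))
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which makes it usable even when positives and negatives are unevenly represented.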

Nature | 2010

A human gut microbial gene catalogue established by metagenomic sequencing

Junjie Qin; Ruiqiang Li; Jeroen Raes; Manimozhiyan Arumugam; Kristoffer Sølvsten Burgdorf; Chaysavanh Manichanh; Trine Nielsen; Nicolas Pons; Florence Levenez; Takuji Yamada; Daniel R. Mende; Junhua Li; Junming Xu; Shaochuan Li; Dongfang Li; Jianjun Cao; Bo Wang; Huiqing Liang; Huisong Zheng; Yinlong Xie; Julien Tap; Patricia Lepage; Marcelo Bertalan; Jean-Michel Batto; Torben Hansen; Denis Le Paslier; Allan Linneberg; H. Bjørn Nielsen; Eric Pelletier; Pierre Renault

To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ∼150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, indicating that the entire cohort harbours between 1,000 and 1,150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively.

Nature | 2011

Enterotypes of the human gut microbiome

Manimozhiyan Arumugam; Jeroen Raes; Eric Pelletier; Denis Le Paslier; Takuji Yamada; Daniel R. Mende; Gabriel da Rocha Fernandes; Julien Tap; Thomas Brüls; Jean-Michel Batto; Marcelo Bertalan; Natalia Borruel; Francesc Casellas; Leyden Fernandez; Laurent Gautier; Torben Hansen; Masahira Hattori; Tetsuya Hayashi; Michiel Kleerebezem; Ken Kurokawa; Marion Leclerc; Florence Levenez; Chaysavanh Manichanh; H. Bjørn Nielsen; Trine Nielsen; Nicolas Pons; Julie Poulain; Junjie Qin; Thomas Sicheritz-Pontén; Sebastian Tims

Our knowledge of species and functional composition of the human gut microbiome is rapidly increasing, but it is still based on very few cohorts and little is known about variation across the world. By combining 22 newly sequenced faecal metagenomes of individuals from four countries with previously published data sets, here we identify three robust clusters (referred to as enterotypes hereafter) that are not nation or continent specific. We also confirmed the enterotypes in two published, larger cohorts, indicating that intestinal microbiota variation is generally stratified, not continuous. This indicates further the existence of a limited number of well-balanced host–microbial symbiotic states that might respond differently to diet and drug intake. The enterotypes are mostly driven by species composition, but abundant molecular functions are not necessarily provided by abundant species, highlighting the importance of a functional analysis to understand microbial communities. Although individual host properties such as body mass index, age, or gender cannot explain the observed enterotypes, data-driven marker genes or functional modules can be identified for each of these host properties. For example, twelve genes significantly correlate with age and three functional modules with the body mass index, hinting at a diagnostic potential of microbial markers.

Nature Protocols | 2007

Locating proteins in the cell using TargetP, SignalP and related tools

Olof Emanuelsson; Søren Brunak; Gunnar von Heijne; Henrik Nielsen

Determining the subcellular localization of a protein is an important first step toward understanding its function. Here, we describe the properties of three well-known N-terminal sequence motifs directing proteins to the secretory pathway, mitochondria and chloroplasts, and sketch a brief history of methods to predict subcellular localization based on these sorting signals and other sequence properties. We then outline how to use a number of internet-accessible tools to arrive at a reliable subcellular localization prediction for eukaryotic and prokaryotic proteins. In particular, we provide detailed step-by-step instructions for the coupled use of the amino-acid sequence-based predictors TargetP, SignalP, ChloroP and TMHMM, which are all hosted at the Center for Biological Sequence Analysis, Technical University of Denmark. In addition, we describe and provide web references to other useful subcellular localization predictors. Finally, we discuss predictive performance measures in general and the performance of TargetP and SignalP in particular.

Bioinformatics | 2000

Assessing the accuracy of prediction algorithms for classification: an overview

Pierre Baldi; Søren Brunak; Yves Chauvin; Claus A. F. Andersen; Henrik Nielsen

We provide a unified overview of methods that are currently widely used to assess the accuracy of prediction algorithms, ranging from raw percentages, quadratic error measures and other distances, and correlation coefficients to information-theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.
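The information-theoretic measures mentioned in this abstract can be sketched for a two-class predictor as follows. This is a minimal example under stated assumptions: the function names are hypothetical, the counts must sum to more than zero, and mutual information is computed here as the relative entropy between the joint distribution of (predicted, actual) labels and the product of its marginals, which is a standard identity rather than a formulation quoted from the paper.

```python
import math
from typing import Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D(p || q) in bits between two discrete distributions.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mutual information (bits) between predicted and actual binary labels."""
    n = tp + fp + tn + fn
    # Joint distribution over (predicted, actual): ++, +-, -+, --
    joint = [tp / n, fp / n, fn / n, tn / n]
    p_pred_pos = (tp + fp) / n
    p_act_pos = (tp + fn) / n
    # Distribution the labels would follow if prediction and truth were independent.
    independent = [
        p_pred_pos * p_act_pos,
        p_pred_pos * (1 - p_act_pos),
        (1 - p_pred_pos) * p_act_pos,
        (1 - p_pred_pos) * (1 - p_act_pos),
    ]
    return relative_entropy(joint, independent)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mutual_information(tp=90, fp=5, tn=95, fn=10))
```

A predictor that is statistically independent of the true labels scores 0 bits, while a perfect predictor on a balanced two-class set scores 1 bit.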

Nature | 2013

Richness of human gut microbiome correlates with metabolic markers

Trine Nielsen; Junjie Qin; Edi Prifti; Falk Hildebrand; Gwen Falony; Mathieu Almeida; Manimozhiyan Arumugam; Jean-Michel Batto; Sean Kennedy; Pierre Leonard; Junhua Li; Kristoffer Sølvsten Burgdorf; Niels Grarup; Torben Jørgensen; Ivan Brandslund; Henrik Bjørn Nielsen; Agnieszka Sierakowska Juncker; Marcelo Bertalan; Florence Levenez; Nicolas Pons; Simon Rasmussen; Shinichi Sunagawa; Julien Tap; Sebastian Tims; Erwin G. Zoetendal; Søren Brunak; Karine Clément; Joël Doré; Michiel Kleerebezem; Karsten Kristiansen

We are facing a global metabolic health crisis provoked by an obesity epidemic. Here we report the human gut microbial composition in a population sample of 123 non-obese and 169 obese Danish individuals. We find two groups of individuals that differ by the number of gut microbial genes and thus gut bacterial richness. They contain known and previously unknown bacterial species at different proportions; individuals with a low bacterial richness (23% of the population) are characterized by more marked overall adiposity, insulin resistance and dyslipidaemia and a more pronounced inflammatory phenotype when compared with high bacterial richness individuals. The obese individuals among the lower bacterial richness group also gain more weight over time. Only a few bacterial species are sufficient to distinguish between individuals with high and low bacterial richness, and even between lean and obese participants. Our classifications based on variation in the gut microbiome identify subsets of individuals in the general white adult population who may be at increased risk of progressing to adiposity-associated co-morbidities.

Science Signaling | 2010

Quantitative Phosphoproteomics Reveals Widespread Full Phosphorylation Site Occupancy During Mitosis

J. Olsen; Michiel Vermeulen; Anna Santamaria; Chanchal Kumar; Martin L. Miller; Lars Juhl Jensen; Florian Gnad; Jürgen Cox; Thomas Skøt Jensen; Erich A. Nigg; Søren Brunak; Matthias Mann

Protein phosphorylation during the cell cycle may be an all-or-none process in many instances.

All-or-None Phosphorylation

Phosphorylation is a key regulatory event that drives many cellular processes, including cell division. Olsen et al. undertook a phosphoproteomic analysis of HeLa cells at various stages in the cell cycle, which linked new phosphorylation sites and kinase substrates to specific stages. Furthermore, they established a method to calculate the fractional occupancy of particular phosphorylation sites (phosphorylation stoichiometry) on a global level and found that, contrary to expectations, many sites on functionally related proteins appeared to be nearly completely phosphorylated at particular stages of the cell cycle. They observed an inverse relationship in the phosphorylation occupancy of some sites in cells undergoing mitosis compared to those in S phase. The authors speculate that a high stoichiometry of phosphorylation may be necessary to inactivate an entire protein population to effectively block activity, whereas function may only require a low stoichiometry of phosphorylation, because only a small fraction of the protein population may be required for full activity.

Eukaryotic cells replicate by a complex series of evolutionarily conserved events that are tightly regulated at defined stages of the cell division cycle. Progression through this cycle involves a large number of dedicated protein complexes and signaling pathways, and deregulation of this process is implicated in tumorigenesis. We applied high-resolution mass spectrometry–based proteomics to investigate the proteome and phosphoproteome of the human cell cycle on a global scale and quantified 6027 proteins and 20,443 unique phosphorylation sites and their dynamics. Co-regulated proteins and phosphorylation sites were grouped according to their cell cycle kinetics and compared to publicly available messenger RNA microarray data. Most detected phosphorylation sites and more than 20% of all quantified proteins showed substantial regulation, mainly in mitotic cells. Kinase-motif analysis revealed global activation during S phase of the DNA damage response network, which was mediated by phosphorylation by ATM or ATR or DNA-dependent protein kinases. We determined site-specific stoichiometry of more than 5000 sites and found that most of the up-regulated sites phosphorylated by cyclin-dependent kinase 1 (CDK1) or CDK2 were almost fully phosphorylated in mitotic cells. In particular, nuclear proteins and proteins involved in regulating metabolic processes have high phosphorylation site occupancy in mitosis. This suggests that these proteins may be inactivated by phosphorylation in mitotic cells.

Protein Science | 2003

Prediction of lipoprotein signal peptides in Gram-negative bacteria

Agnieszka Sierakowska Juncker; Hanni Willenbrock; Gunnar von Heijne; Søren Brunak; Henrik Nielsen; Anders Krogh

A method to predict lipoprotein signal peptides in Gram‐negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII‐cleaved proteins), SPaseI‐cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI‐cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram‐positive lipoprotein signal peptides differ from Gram‐negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram‐positive test set. A genome search was carried out for 12 Gram‐negative genomes and one Gram‐positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network‐based predictor was developed for comparison, and it gave very similar results. LipoP is available as a Web server at www.cbs.dtu.dk/services/LipoP/.
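To give a feel for what the reported rates could mean in a genome search, the sketch below estimates expected hit counts and precision. Only the 96.8% and 0.3% rates come from the abstract; the genome size and lipoprotein count are hypothetical, the function name is invented for illustration, and applying the 0.3% false-positive rate to every non-lipoprotein in a genome is a simplifying assumption.

```python
from typing import Tuple


def expected_genome_hits(
    n_proteins: int,
    n_lipoproteins: int,
    sensitivity: float,
    false_positive_rate: float,
) -> Tuple[float, float, float]:
    """Expected true positives, false positives and precision for a genome scan."""
    n_other = n_proteins - n_lipoproteins
    true_pos = sensitivity * n_lipoproteins
    false_pos = false_positive_rate * n_other
    predicted = true_pos + false_pos
    precision = true_pos / predicted if predicted > 0 else 0.0
    return true_pos, false_pos, precision


if __name__ == "__main__":
    # Hypothetical genome: 4300 proteins, 90 of them lipoproteins.
    tp, fp, precision = expected_genome_hits(4300, 90, 0.968, 0.003)
    print(f"expected TP={tp:.1f}, FP={fp:.1f}, precision={precision:.3f}")
```

Because non-lipoproteins vastly outnumber lipoproteins in a genome, even a small false-positive rate can account for a noticeable share of the predicted hits.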

Nature Biotechnology | 2007

A human phenome-interactome network of protein complexes implicated in genetic disorders

Kasper Lage; E. Olof Karlberg; Zenia M Størling; Páll Ísólfur Ólason; Anders Gorm Pedersen; Olga Rigina; Anders M. Hinsby; Zeynep Tümer; Flemming Pociot; Niels Tommerup; Yves Moreau; Søren Brunak

We performed a systematic, large-scale analysis of human protein complexes comprising gene products implicated in many different categories of human disease to create a phenome-interactome network. This was done by integrating quality-controlled interactions of human proteins with a validated, computationally derived phenotype similarity score, permitting identification of previously unknown complexes likely to be associated with disease. Using a phenomic ranking of protein complexes linked to human disease, we developed a Bayesian predictor that in 298 of 669 linkage intervals correctly ranks the known disease-causing protein as the top candidate, and in 870 intervals with no identified disease-causing gene, provides novel candidates implicated in disorders such as retinitis pigmentosa, epithelial ovarian cancer, inflammatory bowel disease, amyotrophic lateral sclerosis, Alzheimer disease, type 2 diabetes and coronary heart disease. Our publicly available draft of protein complexes associated with pathology comprises 506 complexes, which reveal functional relationships between disease-promoting genes that will inform future experimentation.

International Journal of Neural Systems | 1997

A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.

Henrik Nielsen; Jacob Engelbrecht; Søren Brunak; Gunnar von Heijne

We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequences. The method performs significantly better than previous prediction schemes, and can easily be applied to genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server: http://www.cbs.dtu.dk/services/SignalP/.
