Marco Punta
Wellcome Trust Sanger Institute
Publications
Featured research published by Marco Punta.
Nucleic Acids Research | 2012
Marco Punta; Penny Coggill; Ruth Y. Eberhardt; Jaina Mistry; John G. Tate; Chris Boursnell; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L. L. Sonnhammer; Sean R. Eddy; Alex Bateman; Robert D. Finn
Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
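As a rough illustration of how a family-specific gathering threshold acts as an inclusion cutoff, the sketch below accepts a hit only if its bit scores reach the thresholds recorded for that family. It assumes each family carries a pair of thresholds (one for the whole sequence, one for the individual domain); the family identifier, the threshold values and the function name are hypothetical and are not taken from Pfam.

```python
# Minimal sketch, assuming each family has a (sequence, domain) pair of
# gathering thresholds in bits. All identifiers and numbers are made up.

def passes_gathering_threshold(family, seq_score, dom_score, thresholds):
    """Return True if both bit scores reach the family's own cutoffs."""
    seq_ga, dom_ga = thresholds[family]
    return seq_score >= seq_ga and dom_score >= dom_ga


# Hypothetical thresholds for one illustrative family.
example_thresholds = {"PF_EXAMPLE": (25.0, 22.0)}

# A hit scoring above both cutoffs is included in the family.
included = passes_gathering_threshold("PF_EXAMPLE", 30.1, 24.5, example_thresholds)

# A hit whose domain score falls below the domain cutoff is excluded,
# even though its sequence score is high enough.
excluded = passes_gathering_threshold("PF_EXAMPLE", 30.1, 20.0, example_thresholds)
```

Because the cutoffs are stored per family, a score that is accepted for one family may be rejected for another, which is the point of curating thresholds family by family rather than applying one global cutoff.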
Nucleic Acids Research | 2014
Robert D. Finn; Alex Bateman; Jody Clements; Penelope Coggill; Ruth Y. Eberhardt; Sean R. Eddy; Andreas Heger; Kirstie Hetherington; Liisa Holm; Jaina Mistry; Erik L. L. Sonnhammer; John G. Tate; Marco Punta
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
Nucleic Acids Research | 2016
Robert D. Finn; Penelope Coggill; Ruth Y. Eberhardt; Sean R. Eddy; Jaina Mistry; Alex L. Mitchell; Simon Potter; Marco Punta; Matloob Qureshi; Amaia Sangrador-Vegas; Gustavo A. Salazar; John G. Tate; Alex Bateman
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
Nucleic Acids Research | 2012
Sarah Hunter; P. D. Jones; Alex L. Mitchell; Rolf Apweiler; Teresa K. Attwood; Alex Bateman; Thomas Bernard; David Binns; Peer Bork; Sarah W. Burge; Edouard de Castro; Penny Coggill; Matthew Corbett; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D. Finn; Matthew Fraser; Julian Gough; Daniel H. Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Ivica Letunic; David M. Lonsdale; Rodrigo Lopez; John Maslen; Craig McAnulla; Jennifer McDowall; Conor McMenamin
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Nucleic Acids Research | 2015
Alex L. Mitchell; Hsin-Yu Chang; Louise Daugherty; Matthew Fraser; Sarah Hunter; Rodrigo Lopez; Craig McAnulla; Conor McMenamin; Gift Nuka; Sebastien Pesseat; Amaia Sangrador-Vegas; Maxim Scheremetjew; Claudia Rato; Siew-Yit Yong; Alex Bateman; Marco Punta; Teresa K. Attwood; Christian J. A. Sigrist; Nicole Redaschi; Catherine Rivoire; Ioannis Xenarios; Daniel Kahn; Dominique Guyot; Peer Bork; Ivica Letunic; Julian Gough; Matt E. Oates; Daniel H. Haft; Hongzhan Huang; Darren A. Natale
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.
Nucleic Acids Research | 2014
Guy Yachdav; Edda Kloppmann; László Kaján; Maximilian Hecht; Tatyana Goldberg; Tobias Hamp; Peter Hönigschmid; Andrea Schafferhans; Manfred Roos; Michael Bernhofer; Lothar Richter; Haim Ashkenazy; Marco Punta; Avner Schlessinger; Yana Bromberg; Reinhard Schneider; Gerrit Vriend; Chris Sander; Nir Ben-Tal; Burkhard Rost
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org.
Nucleic Acids Research | 2013
Jaina Mistry; Robert D. Finn; Sean R. Eddy; Alex Bateman; Marco Punta
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem families using protein regions that match two or more Pfam families not currently annotated as related in Pfam. We find that HMMER3 E-value estimates seem to be less accurate for families that feature periodic patterns of compositional bias, such as the ones typically observed in coiled-coils. These results support the continued use of manually curated inclusion thresholds in the Pfam database, especially on the subset of families that have been identified as problematic in experiments such as these. They also highlight the need for developing new methods that can correct for this particular type of compositional bias.
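The screening step described in this abstract, finding protein regions matched by two or more families that are not annotated as related, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: the Match record, the clan lookup table, and the rule that two families are "related" when they share a clan are all hypothetical simplifications.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Match(NamedTuple):
    seq_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    family: str


def related(fam_a, fam_b, clan_of):
    """Assumed rule: families are related if identical or in the same clan."""
    if fam_a == fam_b:
        return True
    clan_a = clan_of.get(fam_a)
    return clan_a is not None and clan_a == clan_of.get(fam_b)


def conflicting_overlaps(matches, clan_of):
    """Return (match_a, match_b, overlap_length) for every pair of matches
    on the same sequence that overlap and belong to unrelated families."""
    by_seq = defaultdict(list)
    for m in matches:
        by_seq[m.seq_id].append(m)

    conflicts = []
    for seq_matches in by_seq.values():
        for a, b in combinations(seq_matches, 2):
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap > 0 and not related(a.family, b.family, clan_of):
                conflicts.append((a, b, overlap))
    return conflicts


# Toy data: FamA and FamC share a clan, FamB belongs to none.
toy_matches = [
    Match("P1", 10, 100, "FamA"),
    Match("P1", 80, 150, "FamB"),
    Match("P1", 90, 120, "FamC"),
    Match("P2", 1, 50, "FamA"),
    Match("P2", 60, 90, "FamB"),
]
toy_clans = {"FamA": "CL_X", "FamC": "CL_X"}

toy_conflicts = conflicting_overlaps(toy_matches, toy_clans)
```

On the toy data, the FamA/FamC overlap is ignored because the two families share a clan, while the overlaps involving FamB are reported. Families that appear repeatedly in such reports would be the candidates for closer inspection.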
Nature | 2010
Yu-hang Chen; Lei Hu; Marco Punta; Renato Bruni; B. Hillerich; Brian Kloss; Burkhard Rost; J. Love; Steven A. Siegelbaum; Wayne A. Hendrickson
The plant SLAC1 anion channel controls turgor pressure in the aperture-defining guard cells of plant stomata, thereby regulating the exchange of water vapour and photosynthetic gases in response to environmental signals such as drought or high levels of carbon dioxide. Here we determine the crystal structure of a bacterial homologue (Haemophilus influenzae) of SLAC1 at 1.20 Å resolution, and use structure-inspired mutagenesis to analyse the conductance properties of SLAC1 channels. SLAC1 is a symmetrical trimer composed from quasi-symmetrical subunits, each having ten transmembrane helices arranged from helical hairpin pairs to form a central five-helix transmembrane pore that is gated by an extremely conserved phenylalanine residue. Conformational features indicate a mechanism for control of gating by kinase activation, and electrostatic features of the pore coupled with electrophysiological characteristics indicate that selectivity among different anions is largely a function of the energetic cost of ion dehydration.
Nature | 2011
Yu Cao; Xiangshu Jin; Hua Huang; Mehabaw Getahun Derebe; Elena J. Levin; Venkataraman Kabaleeswaran; Yaping Pan; Marco Punta; J. Love; Jun Weng; Matthias Quick; Sheng Ye; Brian Kloss; Renato Bruni; Erik Martinez-Hackert; Wayne A. Hendrickson; Burkhard Rost; Jonathan A. Javitch; Kanagalaghatta R. Rajashankar; Youxing Jiang; Ming Zhou
The TrkH/TrkG/KtrB proteins mediate K+ uptake in bacteria and probably evolved from simple K+ channels by multiple gene duplications or fusions. Here we present the crystal structure of a TrkH from Vibrio parahaemolyticus. TrkH is a homodimer, and each protomer contains an ion permeation pathway. A selectivity filter, similar in architecture to those of K+ channels but significantly shorter, is lined by backbone and side-chain oxygen atoms. Functional studies showed that TrkH is selective for permeation of K+ and Rb+ over smaller ions such as Na+ or Li+. Immediately intracellular to the selectivity filter are an intramembrane loop and an arginine residue, both highly conserved, which constrict the permeation pathway. Substituting the arginine with an alanine significantly increases the rate of K+ flux. These results reveal the molecular basis of K+ selectivity and suggest a novel gating mechanism for this large and important family of membrane transport proteins.
Nature | 2011
Yu Cao; Xiangshu Jin; Elena J. Levin; Hua Huang; Yinong Zong; Matthias Quick; Jun Weng; Yaping Pan; J. Love; Marco Punta; Burkhard Rost; Wayne A. Hendrickson; Jonathan A. Javitch; Kanagalaghatta R. Rajashankar; Ming Zhou
Saccharides have a central role in the nutrition of all living organisms. Whereas several saccharide uptake systems are shared between the different phylogenetic kingdoms, the phosphoenolpyruvate-dependent phosphotransferase system exists almost exclusively in bacteria. This multi-component system includes an integral membrane protein EIIC that transports saccharides and assists in their phosphorylation. Here we present the crystal structure of an EIIC from Bacillus cereus that transports diacetylchitobiose. The EIIC is a homodimer, with an expansive interface formed between the amino-terminal halves of the two protomers. The carboxy-terminal half of each protomer has a large binding pocket that contains a diacetylchitobiose, which is occluded from both sides of the membrane with its site of phosphorylation near the conserved His 250 and Glu 334 residues. The structure shows the architecture of this important class of transporters, identifies the determinants of substrate binding and phosphorylation, and provides a framework for understanding the mechanism of sugar translocation.