Publication


Featured research published by Patricia C. Babbitt.


PLOS Computational Biology | 2009

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Alexandra M. Schnoes; Shoshana D. Brown; Igor Dodevski; Patricia C. Babbitt

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with “overprediction” of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.


Nucleic Acids Research | 2017

InterPro in 2017—beyond protein family and domain annotations

Robert D. Finn; Teresa K. Attwood; Patricia C. Babbitt; Alex Bateman; Peer Bork; Alan Bridge; Hsin Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David R Haft; Gemma L. Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Huaiyu Mi; Jaina Mistry; Darren A. Natale; Marco Necci; Gift Nuka; Christine A. Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon Potter; Neil D. Rawlings; Nicole Redaschi

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Nucleic Acids Research | 2003

BayGenomics: a resource of insertional mutations in mouse embryonic stem cells

Doug Stryke; Michiko Kawamoto; Conrad C. Huang; Susan J. Johns; Leslie A. King; Courtney A. Harper; Elaine C. Meng; Roy E. Lee; Alice Yee; Larry L'Italien; Pao-Tien Chuang; Stephen G. Young; William C. Skarnes; Patricia C. Babbitt; Thomas E. Ferrin

The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in both known and novel genes. Each cell line contains an insertional mutation in a specific gene. The identity of the gene that has been interrupted can be determined from a DNA sequence tag. Approximately 75% of our cell lines contain insertional mutations in known mouse genes or genes that share strong sequence similarities with genes that have been identified in other organisms. These cell lines readily transmit the mutation to the germline of mice and many mutant lines of mice have already been generated from this resource. BayGenomics provides facile access to our entire database, including sequence tags for each mutant ES cell line, through the World Wide Web. Investigators can browse our resource, search for specific entries, download any portion of our database and BLAST sequences of interest against our entire set of cell line sequence tags. They can then obtain the mutant ES cell line for the purpose of generating knockout mice.


Journal of Biological Chemistry | 1997

Understanding Enzyme Superfamilies: Chemistry as the Fundamental Determinant in the Evolution of New Catalytic Activities

Patricia C. Babbitt; John A. Gerlt

Prior to the discovery in 1990 that mandelate racemase (MR) and muconate-lactonizing enzyme (MLE) are structurally similar enzymes that catalyze different overall reactions (1), structurally related enzymes were assumed to catalyze identical chemical reactions but, perhaps, with distinct substrate specificities. For example, all of the members of the serine protease superfamily were known to catalyze the same chemistry, hydrolyses of peptide bonds, although their peptide substrates varied. As described by Craik and Perona in the previous minireview (2), evolutionary accommodation of these differences in substrate specificity can result in major reorganization of the associated structures. In this minireview, we discuss four recently discovered enzyme superfamilies in which an alternate theme predominates; within each of these superfamilies, the member proteins share a common structural scaffold but catalyze different overall reactions. For each of the superfamilies described, the active sites are contained within a single homologous domain. Although they represent several distinct family folds, the enzyme functions in each superfamily are related to their respective structural scaffolds in the same way; the proteins within each superfamily utilize a common mechanistic strategy for lowering the free energies of the rate-limiting transition states in the reactions they catalyze. The existence of several examples of such superfamilies lends further credence to the principle that the evolution of new catalytic activities involves the incorporation of new catalytic groups within an active site while retaining those groups necessary to catalyze the partial reaction common to all of them (3–5). As a consequence, the range of catalytic functions that can be accommodated by a single structural scaffold is considerably broader than had been previously suspected. 
Further, the diversity of function that each superfamily represents allows an economy in the number of unique protein folds required to support life and, as a result, undoubtedly has “simplified” the course of metabolic evolution.

The Enolase Superfamily: Abstraction of the α-Protons of Carboxylic Acids

We recently described the enolase superfamily, the members of which catalyze at least 11 different chemical reactions, including racemization, epimerization, and both syn and anti β-elimination reactions involving water, ammonia, or an intramolecular carboxylate group as leaving group (5). Despite broad differences in substrate structures and the overall reactions they catalyze, all of the reactions of the enolase superfamily are initiated by a common partial reaction, metal-assisted, general base-catalyzed abstraction of the α-proton of a carboxylate anion to generate a stabilized enolate anion intermediate (Reaction 1). However, the fate of the intermediate (protonation in the case of racemization and epimerization reactions and vinylogous β-elimination in the others) must be determined by the different functional groups that “surround” the intermediate in the active site. The common partial reaction is thermodynamically difficult: the pKa values of the α-protons in the substrates range from 29 to 32 whereas the pKa values of the conjugate acids of the active site bases accepting the protons are ≤7. The enzyme-active sites must destabilize the enzyme-substrate complex and/or stabilize the enzyme-intermediate complex so that the free energy of the transition state for α-proton abstraction can be lowered sufficiently to be consistent with the observed kcat values.


PLOS ONE | 2009

Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

Holly J. Atkinson; John H. Morris; Thomas E. Ferrin; Patricia C. Babbitt

The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.


Genome Biology | 2000

Can sequence determine function?

John A. Gerlt; Patricia C. Babbitt

The functional annotation of proteins identified in genome sequencing projects is based on similarities to homologs in the databases. As a result of the possible strategies for divergent evolution, homologous enzymes frequently do not catalyze the same reaction, and we conclude that assignment of function from sequence information alone should be viewed with some skepticism.


Nucleic Acids Research | 2006

The International Gene Trap Consortium Website: a portal to all publicly available gene trap cell lines in mouse

Alex S. Nord; Patricia J. Chang; Bruce R. Conklin; Antony V. Cox; Courtney A. Harper; Geoffrey G Hicks; Conrad C. Huang; Susan J. Johns; Michiko Kawamoto; Songyan Liu; Elaine C. Meng; John H. Morris; Janet Rossant; Patricia Ruiz; William C. Skarnes; Philippe Soriano; William L. Stanford; Doug Stryke; Harald von Melchner; Wolfgang Wurst; Ken-ichi Yamamura; Stephen G. Young; Patricia C. Babbitt; Thomas E. Ferrin

Gene trapping is a method of generating murine embryonic stem (ES) cell lines containing insertional mutations in known and novel genes. A number of international groups have used this approach to create sizeable public cell line repositories available to the scientific community for the generation of mutant mouse strains. The major gene trapping groups worldwide have recently joined together to centralize access to all publicly available gene trap lines by developing a user-oriented Website for the International Gene Trap Consortium (IGTC). This collaboration provides an impressive public informatics resource comprising ∼45 000 well-characterized ES cell lines which currently represent ∼40% of known mouse genes, all freely available for the creation of knockout mice on a non-collaborative basis. To standardize annotation and provide high confidence data for gene trap lines, a rigorous identification and annotation pipeline has been developed combining genomic localization and transcript alignment of gene trap sequence tags to identify trapped loci. This information is stored in a new bioinformatics database accessible through the IGTC Website interface. The IGTC Website allows users to browse and search the database for trapped genes, BLAST sequences against gene trap sequence tags, and view trapped genes within biological pathways. In addition, IGTC data have been integrated into major genome browsers and bioinformatics sites to provide users with outside portals for viewing this data. The development of the IGTC Website marks a major advance by providing the research community with the data and tools necessary to effectively use public gene trap resources for the large-scale characterization of mammalian gene function.


Science | 1995

A functionally diverse enzyme superfamily that abstracts the alpha protons of carboxylic acids

Patricia C. Babbitt; Gregory T. Mrachko; Miriam S. Hasson; Gjalt Huisman; Roberto Kolter; Dagmar Ringe; Gregory A. Petsko; George L. Kenyon; John A. Gerlt

Mandelate racemase and muconate lactonizing enzyme are structurally homologous but catalyze different reactions, each initiated by proton abstraction from carbon. The structural similarity to mandelate racemase of a previously unidentified gene product was used to deduce its function as a galactonate dehydratase. In this enzyme superfamily that has evolved to catalyze proton abstraction from carbon, three variations of homologous active site architectures are now represented: lysine and histidine bases in the active site of mandelate racemase, only a lysine base in the active site of muconate lactonizing enzyme, and only a histidine base in the active site of galactonate dehydratase. This discovery supports the hypothesis that new enzymatic activities evolve by recruitment of a protein catalyzing the same type of chemical reaction.


Biochemistry | 2011

The Enzyme Function Initiative.

John A. Gerlt; Karen N. Allen; Steven C. Almo; Richard N. Armstrong; Patricia C. Babbitt; John E. Cronan; Debra Dunaway-Mariano; Heidi Imker; Matthew P. Jacobson; Wladek Minor; C. Dale Poulter; Frank M. Raushel; Andrej Sali; Brian K. Shoichet; Jonathan V. Sweedler

The Enzyme Function Initiative (EFI) was recently established to address the challenge of assigning reliable functions to enzymes discovered in bacterial genome projects; in this Current Topic, we review the structure and operations of the EFI. The EFI includes the Superfamily/Genome, Protein, Structure, Computation, and Data/Dissemination Cores that provide the infrastructure for reliably predicting the in vitro functions of unknown enzymes. The initial targets for functional assignment are selected from five functionally diverse superfamilies (amidohydrolase, enolase, glutathione transferase, haloalkanoic acid dehalogenase, and isoprenoid synthase), with five superfamily specific Bridging Projects experimentally testing the predicted in vitro enzymatic activities. The EFI also includes the Microbiology Core that evaluates the in vivo context of in vitro enzymatic functions and confirms the functional predictions of the EFI. The deliverables of the EFI to the scientific community include (1) development of a large-scale, multidisciplinary sequence/structure-based strategy for functional assignment of unknown enzymes discovered in genome projects (target selection, protein production, structure determination, computation, experimental enzymology, microbiology, and structure-based annotation), (2) dissemination of the strategy to the community via publications, collaborations, workshops, and symposia, (3) computational and bioinformatic tools for using the strategy, (4) provision of experimental protocols and/or reagents for enzyme production and characterization, and (5) dissemination of data via the EFI's Website, http://enzymefunction.org. The realization of multidisciplinary strategies for functional assignment will begin to define the full metabolic diversity that exists in nature and will impact basic biochemical and evolutionary understanding, as well as a wide range of applications of central importance to industrial, medicinal, and pharmaceutical efforts.


Nucleic Acids Research | 2014

The Structure–Function Linkage Database

Eyal Akiva; Shoshana D. Brown; Daniel E. Almonacid; Alan E. Barber; Ashley F. Custer; Michael A. Hicks; Conrad C. Huang; Florian Lauck; Susan T. Mashiyama; Elaine C. Meng; David Mischel; John H. Morris; Sunil Ojha; Alexandra M. Schnoes; Doug Stryke; Jeffrey M. Yunes; Thomas E. Ferrin; Gemma L. Holliday; Patricia C. Babbitt

The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.

Collaboration


Dive into Patricia C. Babbitt's collaborations.

Top Co-Authors

Steven C. Almo

Albert Einstein College of Medicine

Eyal Akiva

University of California

John H. Morris

University of California
