Winona C. Barker
Georgetown University Medical Center
Publication
Featured research published by Winona C. Barker.
Nucleic Acids Research | 2004
Rolf Apweiler; Amos Marc Bairoch; Cathy H. Wu; Winona C. Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; María Martín; Darren A. Natale; Claire O’Donovan; Nicole Redaschi; Lai-Su L. Yeh
To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.
Methods in Enzymology | 1983
Margaret O. Dayhoff; Winona C. Barker; Lois T. Hunt
Computer-based statistical techniques used to determine homologies between proteins occurring in different species are reviewed. The technique is based on comparison of two protein sequences, either by relating all segments of a given length in one sequence to all segments of the second or by finding the best alignment of the two sequences. Approaches discussed include selection using printed tabulations, identification of very similar sequences, and computer searches of a database. The use of the SEARCH, RELATE, and ALIGN programs (Dayhoff, 1979) is explained; sample data are presented in graphs, diagrams, and tables and the construction of scoring matrices is considered.
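The first comparison strategy described above, relating every segment of a given length in one sequence to every segment of the other, can be sketched roughly as follows. This is an illustration only, not the SEARCH, RELATE, or ALIGN programs themselves: the function names are hypothetical, and simple identity scoring stands in for a real scoring matrix.

```python
from itertools import product


def segment_score(seg_a, seg_b, match=1, mismatch=0):
    """Score two equal-length segments position by position.

    Identity scoring is an assumption made for brevity; a real
    comparison would look up each residue pair in a scoring matrix.
    """
    return sum(match if a == b else mismatch for a, b in zip(seg_a, seg_b))


def compare_all_segments(seq_a, seq_b, length):
    """Return (score, start_a, start_b) for every pair of segments."""
    starts_a = range(len(seq_a) - length + 1)
    starts_b = range(len(seq_b) - length + 1)
    return [
        (segment_score(seq_a[i:i + length], seq_b[j:j + length]), i, j)
        for i, j in product(starts_a, starts_b)
    ]


def best_segment_pair(seq_a, seq_b, length):
    """Return the highest-scoring (score, start_a, start_b) triple."""
    return max(compare_all_segments(seq_a, seq_b, length))


# Toy sequences (made up for illustration).
pairs = compare_all_segments("ACDEFGHIK", "WWDEFGWW", 4)
best = best_segment_pair("ACDEFGHIK", "WWDEFGWW", 4)
```

The exhaustive comparison costs on the order of the product of the two sequence lengths, which is why the distribution of all segment scores, rather than a single best alignment, is what this style of method works with.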
Nucleic Acids Research | 2006
Cathy H. Wu; Rolf Apweiler; Amos Marc Bairoch; Darren A. Natale; Winona C. Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; María Martín; Raja Mazumder; Claire O'Donovan; Nicole Redaschi; Baris E. Suzek
The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at or downloaded at .
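The idea of compressing sequence space by merging identical sequences can be illustrated with a minimal sketch. It covers only the 100% identity case over full-length sequences; clustering at 90% or 50% identity needs alignment-based comparison and is not attempted here. The function name and the accession strings are hypothetical, and this is not the UniRef production procedure.

```python
from collections import defaultdict


def cluster_identical(entries):
    """Group accessions whose full-length sequences are identical.

    entries: iterable of (accession, sequence) pairs.
    Returns a list of clusters, each a sorted list of accessions.
    """
    clusters = defaultdict(list)
    for accession, sequence in entries:
        # Normalise case so identical sequences hash to the same key.
        clusters[sequence.upper()].append(accession)
    return [sorted(ids) for ids in clusters.values()]


# Made-up example records.
entries = [("ID_1", "MKTAYIAK"), ("ID_2", "mktayiak"), ("ID_3", "MKTAYIAR")]
clusters = cluster_identical(entries)
compression = len(entries) / len(clusters)
```

A similarity search then needs to visit one representative per cluster instead of every member, which is where the speed-up comes from.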
Nucleic Acids Research | 1992
Winona C. Barker; John S. Garavelli; Peter B. McGarvey; Christopher R. Marzec; Bruce C. Orcutt; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Robert S. Ledley; Hans-Werner Mewes; Friedhelm Pfeiffer; Akira Tsugita; Cathy H. Wu
The Protein Information Resource (PIR; http://www-nbrf.georgetown.edu/pir/) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-organized and freely available protein sequence database. Since 1988 the database has been maintained collaboratively by PIR-International, an international association of data collection centers cooperating to develop this resource during a period of explosive growth in new sequence data and new computer technologies. The PIR Protein Sequence Database entries are classified into superfamilies, families and homology domains, for which sequence alignments are available. Full-scale family classification supports comparative genomics research, aids sequence annotation, assists database organization and improves database integrity. The PIR WWW server supports direct on-line sequence similarity searches, information retrieval, and knowledge discovery by providing the Protein Sequence Database and other supplementary databases. Sequence entries are extensively cross-referenced and hypertext-linked to major nucleic acid, literature, genome, structure, sequence alignment and family databases. The weekly release of the Protein Sequence Database can be accessed through the PIR Web site. The quarterly release of the database is freely available from our anonymous FTP server and is also available on CD-ROM with the accompanying ATLAS database search program.
Nucleic Acids Research | 2000
Winona C. Barker; John S. Garavelli; Hongzhan Huang; Peter B. McGarvey; Bruce C. Orcutt; Geetha Y. Srinivasarao; Chunlin Xiao; Lai-Su L. Yeh; Robert S. Ledley; Joseph F. Janda; Friedhelm Pfeiffer; Hans-Werner Mewes; Akira Tsugita; Cathy H. Wu
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Computational Biology and Chemistry | 2003
Cathy H. Wu; Hongzhan Huang; Lai-Su L. Yeh; Winona C. Barker
With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.
Nucleic Acids Research | 2011
Darren A. Natale; Cecilia N. Arighi; Winona C. Barker; Judith A. Blake; Michael Caudy; Harold J. Drabkin; Peter D’Eustachio; Alexei V. Evsikov; Hongzhan Huang; Jules Nchoutmboube; Natalia V. Roberts; Barry Smith; Jian Zhang; Cathy H. Wu
The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.
Computational Biology and Chemistry | 2004
Cathy H. Wu; Hongzhan Huang; Anastasia N. Nikolskaya; Zhang-Zhi Hu; Winona C. Barker
Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at a global scale across various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties to complement the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies requires bioinformatics infrastructures that support both data integration and associative analysis. The iProClass database, designed to serve as a framework for data integration in a distributed networking environment, provides comprehensive descriptions of all proteins, with rich links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy. In particular, the database is organized with PIRSF family classification and maps to other family, function, and structure classification schemes. Coupled with the underlying taxonomic information for complete genomes, the iProClass system (http://pir.georgetown.edu/iproclass/) supports associative studies of protein family, domain, function, and structure. A case study of the phosphoglycerate mutases illustrates a systematic approach for protein family and phylogenetic analysis. Such studies may serve as a basis for further analysis of protein functional evolution, and its relationship to the co-evolution of metabolic pathways, cellular networks, and organisms.
Methods in Enzymology | 1990
David G. George; Winona C. Barker; Lois T. Hunt
This chapter describes the mutation data matrix (MDM) and its application for comparing protein sequences. Basic to all sequence comparison is the concept of an alignment that defines the relationship between sequences on a residue-by-residue basis. Sequence comparison methods use a scoring matrix that assigns a value to each possible pair of aligned amino acids. One of the most widely used similarity measures is the mutation data matrix (MDM) developed by Dayhoff and colleagues. The first MDM, published in 1968, was derived from over 400 accepted point mutations between present-day sequences and inferred ancestral sequences. Within the Markovian model, the MDM is derived from a transition probability matrix in which each matrix element gives the probability that amino acid A will be replaced by amino acid B in one unit of evolutionary change. The diagonal elements give the probabilities that the amino acids will remain unchanged. The probability of an amino acid being replaced is estimated as its relative mutability, which is calculated as the ratio of the number of observed changes of an amino acid to its total exposure to change.
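The relationship between relative mutability and the transition probability matrix can be sketched with a toy three-letter alphabet. The counts below are invented for illustration, the function names are hypothetical, and the `scale` constant that fixes the size of one unit of evolutionary change is an assumption of this sketch rather than a value taken from the chapter.

```python
def relative_mutability(changes, exposure):
    """Ratio of observed changes of each amino acid to its exposure to change."""
    return {aa: changes[aa] / exposure[aa] for aa in changes}


def transition_matrix(accepted, mutability, scale):
    """Build probs[a][b]: probability that a is replaced by b in one unit.

    accepted[a][b] holds counts of accepted point mutations between a and b
    (a != b). The diagonal probs[a][a] is the probability a stays unchanged.
    """
    probs = {}
    for a, partners in accepted.items():
        total = sum(partners.values())
        # Share out the probability of change among replacement partners
        # in proportion to how often each replacement was observed.
        row = {b: scale * mutability[a] * n / total for b, n in partners.items()}
        row[a] = 1.0 - scale * mutability[a]
        probs[a] = row
    return probs


# Invented toy counts, not real mutation data.
accepted = {
    "A": {"S": 30, "T": 10},
    "S": {"A": 30, "T": 20},
    "T": {"A": 10, "S": 20},
}
changes = {aa: sum(row.values()) for aa, row in accepted.items()}
exposure = {"A": 400, "S": 250, "T": 300}

mutability = relative_mutability(changes, exposure)
probs = transition_matrix(accepted, mutability, scale=0.1)
```

Each row of `probs` sums to one: the diagonal entry carries the probability of no change, and the remainder is divided among the observed replacements.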
BMC Bioinformatics | 2007
Darren A. Natale; Cecilia N. Arighi; Winona C. Barker; Judith A. Blake; Ti-Cheng Chang; Zhang-Zhi Hu; Hongfang Liu; Barry Smith; Cathy H. Wu
Biomedical ontologies are emerging as critical tools in genomic and proteomic research, where complex data in disparate resources need to be integrated. A number of ontologies describe properties that can be attributed to proteins. For example, protein functions are described by the Gene Ontology (GO) and human diseases by SNOMED CT or ICD10. There is, however, a gap in the current set of ontologies – one that describes the protein entities themselves and their relationships. We have designed the PRotein Ontology (PRO) to facilitate protein annotation and to guide new experiments. The components of PRO extend from the classification of proteins on the basis of evolutionary relationships to the representation of the multiple protein forms of a gene (products generated by genetic variation, alternative splicing, proteolytic cleavage, and other post-translational modifications). PRO will allow the specification of relationships between PRO, GO and other ontologies in the OBO Foundry. Here we describe the initial development of PRO, illustrated using human and mouse proteins involved in the transforming growth factor-beta and bone morphogenetic protein signaling pathways.