Lai-Su L. Yeh
Georgetown University Medical Center
Publication
Featured research published by Lai-Su L. Yeh.
Nucleic Acids Research | 2004
Rolf Apweiler; Amos Marc Bairoch; Cathy H. Wu; Winona C. Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; María Martín; Darren A. Natale; Claire O’Donovan; Nicole Redaschi; Lai-Su L. Yeh
To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.
Nucleic Acids Research | 1992
Winona C. Barker; John S. Garavelli; Peter B. McGarvey; Christopher R. Marzec; Bruce C. Orcutt; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Robert S. Ledley; Hans-Werner Mewes; Friedhelm Pfeiffer; Akira Tsugita; Cathy H. Wu
The Protein Information Resource (PIR; http://www-nbrf.georgetown.edu/pir/) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-organized and freely available protein sequence database. Since 1988 the database has been maintained collaboratively by PIR-International, an international association of data collection centers cooperating to develop this resource during a period of explosive growth in new sequence data and new computer technologies. The PIR Protein Sequence Database entries are classified into superfamilies, families and homology domains, for which sequence alignments are available. Full-scale family classification supports comparative genomics research, aids sequence annotation, assists database organization and improves database integrity. The PIR WWW server supports direct on-line sequence similarity searches, information retrieval, and knowledge discovery by providing the Protein Sequence Database and other supplementary databases. Sequence entries are extensively cross-referenced and hypertext-linked to major nucleic acid, literature, genome, structure, sequence alignment and family databases. The weekly release of the Protein Sequence Database can be accessed through the PIR Web site. The quarterly release of the database is freely available from our anonymous FTP server and is also available on CD-ROM with the accompanying ATLAS database search program.
Nucleic Acids Research | 2000
Winona C. Barker; John S. Garavelli; Hongzhan Huang; Peter B. McGarvey; Bruce C. Orcutt; Geetha Y. Srinivasarao; Chunlin Xiao; Lai-Su L. Yeh; Robert S. Ledley; Joseph F. Janda; Friedhelm Pfeiffer; Hans-Werner Mewes; Akira Tsugita; Cathy H. Wu
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Computational Biology and Chemistry | 2003
Cathy H. Wu; Hongzhan Huang; Lai-Su L. Yeh; Winona C. Barker
With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.
Nucleic Acids Research | 2001
Winona C. Barker; John S. Garavelli; Zhenglin Hou; Hongzhan Huang; Robert S. Ledley; Peter B. McGarvey; Hans-Werner Mewes; Bruce C. Orcutt; Friedhelm Pfeiffer; Akira Tsugita; C. R. Vinayaka; Chunlin Xiao; Lai-Su L. Yeh; Cathy H. Wu
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200,000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
Journal of Molecular Evolution | 1985
David G. George; Lois T. Hunt; Lai-Su L. Yeh; Winona C. Barker
Recent evidence indicates that a gene transposition event occurred during the evolution of the bacterial ferredoxins subsequent to the ancestral intrasequence gene duplication. In light of this new information, the relationships among the bacterial ferredoxins were reexamined and an evolutionary tree consistent with this new understanding was derived. The bacterial ferredoxins can be divided into several groups based on their sequence properties; these include the clostridial-type ferredoxins, the Azotobacter-type ferredoxins, and a group containing the ferredoxins from the anaerobic, green, and purple sulfur bacteria. Based on sequence comparison, it was concluded that the amino-terminal domain of the Azotobacter-type ferredoxins, which contains the novel 3Fe:3S cluster binding site, is homologous with the carboxyl-terminal domain of the ferredoxins from the anaerobic photosynthetic bacteria.
A number of ferredoxin sequences do not fit into any of the groups described above. Based on sequence properties, these sequences can be separated into three groups: a group containing Methanosarcina barkeri ferredoxin and Desulfovibrio desulfuricans ferredoxin II, a group containing Desulfovibrio gigas ferredoxin and Clostridium thermoaceticum ferredoxin, and a group containing Desulfovibrio africanus ferredoxin I and Bacillus stearothermophilus ferredoxin. The last two groups differ from all of the other bacterial ferredoxins in that they bind only one Fe:S cluster per polypeptide, whereas the others bind two. Sequence examination indicates that the second binding site has been either partially or completely lost from these ferredoxins. Methanosarcina barkeri ferredoxin and Desulfovibrio desulfuricans ferredoxin II are of interest because, of all the ferredoxins whose sequences are presently known, they show the strongest evidence of internal gene duplication. However, the derived evolutionary tree indicates that they diverged from the Azotobacter-type ferredoxins well after the ancestral internal gene duplication. This apparent discrepancy is explained by postulating a duplication of one half-chain sequence and a deletion of the other half-chain. The Clostridium thermoaceticum and Bacillus stearothermophilus groups diverged from this line and subsequently lost one of the Fe:S binding sites.
It has recently become apparent that gene duplication is ubiquitous among the ferredoxins. Several organisms are now known to have a variety of ferredoxins with widely divergent properties. Unfortunately, in only one case are the sequences of more than one ferredoxin from the same organism known. Thus, although the major features of the bacterial ferredoxin tree are now understood, a complete bacterial phylogeny cannot be inferred until more sequence information is available.
Nucleic Acids Research | 1997
David G. George; Robert J. Dodson; John S. Garavelli; Daniel H. Haft; Lois T. Hunt; Christopher R. Marzec; Bruce C. Orcutt; Kathryn E. Sidman; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Leslie Arminski; Robert S. Ledley; Akira Tsugita; Winona C. Barker
From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI.
Bioinformatics | 2000
Peter B. McGarvey; Hongzhan Huang; Winona C. Barker; Bruce C. Orcutt; John S. Garavelli; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Chunlin Xiao; Cathy H. Wu
The Protein Information Resource (PIR) has greatly expanded its Web site and developed a set of interactive search and analysis tools to facilitate the analysis, annotation, and functional identification of proteins. New search engines have been implemented to combine sequence similarity search results with database annotation information. The new PIR search systems have proved very useful in providing enriched functional annotation of protein sequences, determining protein superfamily-domain relationships, and detecting annotation errors in genomic database archives. AVAILABILITY http://pir.georgetown.edu/.
Biochemical and Biophysical Research Communications | 1983
David G. George; Lai-Su L. Yeh; Winona C. Barker
Hypothetical lambda protein ORF314 shows significant homology with the carboxyl end of phage T4 tail-fiber protein gp37. Homology can also be demonstrated between hypothetical lambda protein ORF194 and a fragment of bacteriophage T4 protein gp38. This sequence homology is also reflected in the genomic sequences of these two phages.
Bioinformatics | 1999
Geetha Y. Srinivasarao; Lai-Su L. Yeh; Christopher R. Marzec; Bruce C. Orcutt; Winona C. Barker
MOTIVATION The Protein Information Resource (PIR) maintains a database of annotated and curated alignments in order to visually represent interrelationships among sequences in the PIR-International Protein Sequence Database, to spread and standardize protein names, features and keywords among members of a family or superfamily, and to aid us in classifying sequences, in identifying conserved regions, and in defining new homology domains. RESULTS Release 22.0 (December 1998) of the PIR-ALN database contains a total of 3806 alignments, including 1303 superfamily, 2131 family and 372 homology domain alignments. This is an appropriate dataset to develop and extract patterns, test profiles, train neural networks or build Hidden Markov Models (HMMs). These alignments can be used to standardize and spread annotation to newer members by homology, as well as to understand the modular architecture of multidomain proteins. PIR-ALN includes 529 alignments that can be used to develop patterns not represented in the PROSITE, Blocks, PRINTS and Pfam databases. The ATLAS information retrieval system can be used to browse and query the PIR-ALN alignments. AVAILABILITY PIR-ALN is currently being distributed as a single ASCII text file along with the title, member, species, superfamily and keyword indexes. The quarterly and weekly updates can be accessed via the WWW at pir.georgetown.edu. The quarterly updates can also be obtained by anonymous FTP from the PIR FTP site at NBRF.Georgetown.edu, directory [ANONYMOUS.PIR.ALIGNMENT].