Publication


Featured research published by Hongzhan Huang.


Nucleic Acids Research | 2004

UniProt: the Universal Protein knowledgebase

Rolf Apweiler; Amos Marc Bairoch; Cathy H. Wu; Winona C. Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; María Martín; Darren A. Natale; Claire O’Donovan; Nicole Redaschi; Lai-Su L. Yeh

To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.


Nucleic Acids Research | 2006

The Universal Protein Resource (UniProt): an expanding universe of protein information

Cathy H. Wu; Rolf Apweiler; Amos Marc Bairoch; Darren A. Natale; Winona C. Barker; Brigitte Boeckmann; Serenella Ferro; Elisabeth Gasteiger; Hongzhan Huang; Rodrigo Lopez; Michele Magrane; María Martín; Raja Mazumder; Claire O'Donovan; Nicole Redaschi; Baris E. Suzek

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online or downloaded.


Nucleic Acids Research | 2015

The InterPro protein families database: the classification resource after 15 years

Alex L. Mitchell; Hsin-Yu Chang; Louise Daugherty; Matthew Fraser; Sarah Hunter; Rodrigo Lopez; Craig McAnulla; Conor McMenamin; Gift Nuka; Sebastien Pesseat; Amaia Sangrador-Vegas; Maxim Scheremetjew; Claudia Rato; Siew-Yit Yong; Alex Bateman; Marco Punta; Teresa K. Attwood; Christian J. A. Sigrist; Nicole Redaschi; Catherine Rivoire; Ioannis Xenarios; Daniel Kahn; Dominique Guyot; Peer Bork; Ivica Letunic; Julian Gough; Matt E. Oates; Daniel H. Haft; Hongzhan Huang; Darren A. Natale

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.


Bioinformatics | 2007

UniRef: comprehensive and non-redundant UniProt reference clusters

Baris E. Suzek; Hongzhan Huang; Peter B. McGarvey; Raja Mazumder; Cathy H. Wu

Motivation: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. Results: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. Availability: UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. Supplementary information: Supplementary data are available at Bioinformatics online.
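As a rough illustration of the UniRef100 idea described in this abstract (identical sequences and subfragments merged into a single entry), the toy sketch below groups sequences under the longest sequence that contains them and reports the resulting size reduction. It is not the UniRef production pipeline: the function names, accessions and sequences are hypothetical, and plain substring matching stands in for whatever merging rules UniRef actually applies.

```python
from typing import Dict, List


def merge_identical_and_subfragments(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Toy UniRef100-style merge (an assumption, not the real pipeline).

    Each sequence joins the first existing cluster whose representative
    contains it as an exact substring; otherwise it starts a new cluster.
    """
    # Longest sequences first, so a representative exists before its fragments.
    order = sorted(seqs, key=lambda acc: (-len(seqs[acc]), acc))
    clusters: Dict[str, List[str]] = {}
    for acc in order:
        sequence = seqs[acc]
        for rep in clusters:
            if sequence in seqs[rep]:
                clusters[rep].append(acc)
                break
        else:
            clusters[acc] = [acc]
    return clusters


def size_reduction_percent(n_source: int, n_clusters: int) -> float:
    """Percentage by which clustering shrinks the searchable database."""
    return 100.0 * (1.0 - n_clusters / n_source)


# Hypothetical accessions and sequences, for illustration only.
example = {
    "A1": "MKTAYIAKQR",
    "A2": "MKTAYIAKQR",  # identical to A1
    "A3": "TAYIAK",      # subfragment of A1
    "B1": "GGGSSS",      # unrelated
}
clusters = merge_identical_and_subfragments(example)
reduction = size_reduction_percent(len(example), len(clusters))
```

UniRef90 and UniRef50 would need a real sequence identity measure and a dedicated clustering tool, which this sketch deliberately leaves out.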


Nucleic Acids Research | 2017

InterPro in 2017—beyond protein family and domain annotations

Robert D. Finn; Teresa K. Attwood; Patricia C. Babbitt; Alex Bateman; Peer Bork; Alan Bridge; Hsin Yu Chang; Zsuzsanna Dosztányi; Sara El-Gebali; Matthew Fraser; Julian Gough; David R Haft; Gemma L. Holliday; Hongzhan Huang; Xiaosong Huang; Ivica Letunic; Rodrigo Lopez; Shennan Lu; Huaiyu Mi; Jaina Mistry; Darren A. Natale; Marco Necci; Gift Nuka; Christine A. Orengo; Youngmi Park; Sebastien Pesseat; Damiano Piovesan; Simon Potter; Neil D. Rawlings; Nicole Redaschi

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Nucleic Acids Research | 2000

The Protein Information Resource (PIR)

Winona C. Barker; John S. Garavelli; Hongzhan Huang; Peter B. McGarvey; Bruce C. Orcutt; Geetha Y. Srinivasarao; Chunlin Xiao; Lai-Su L. Yeh; Robert S. Ledley; Joseph F. Janda; Friedhelm Pfeiffer; Hans-Werner Mewes; Akira Tsugita; Cathy H. Wu

The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.


Bioinformatics | 2015

UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches

Baris E. Suzek; Yuqi Wang; Hongzhan Huang; Peter B. McGarvey; Cathy H. Wu

Motivation: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. Results: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation. Availability and implementation: Web access and file download from UniProt website at http://www.uniprot.org/uniref and ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. BLAST searches against UniRef are available at http://www.uniprot.org/blast/.
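The search strategy in this abstract (search the UniRef50 representatives, then expand the hit list with cluster members) could be sketched roughly as below. No BLAST call is made here: the `expand_hits` helper, the cluster IDs, the member accessions and the e-values are all hypothetical, and the cutoff simply mirrors the e-value <0.0001 figure quoted in the abstract.

```python
from typing import Dict, List, Tuple

# One hit: (cluster ID of the matched representative, e-value).
Hit = Tuple[str, float]


def expand_hits(
    hits: List[Hit],
    members: Dict[str, List[str]],
    max_evalue: float = 1e-4,
) -> List[Tuple[str, str, float]]:
    """Expand representative-level hits into member-level hits.

    Hits at or above `max_evalue` are dropped. Each member accession is
    reported once, under the first qualifying cluster that lists it.
    """
    expanded: List[Tuple[str, str, float]] = []
    seen = set()
    for cluster_id, evalue in hits:
        if evalue >= max_evalue:
            continue
        for accession in members.get(cluster_id, []):
            if accession not in seen:
                seen.add(accession)
                expanded.append((accession, cluster_id, evalue))
    return expanded


# Hypothetical search output and cluster membership, for illustration only.
hits = [("UniRef50_X1", 1e-30), ("UniRef50_X2", 1e-3), ("UniRef50_X3", 5e-6)]
members = {
    "UniRef50_X1": ["X1", "X1a", "X1b"],
    "UniRef50_X2": ["X2"],
    "UniRef50_X3": ["X3", "X1a"],
}
expanded = expand_hits(hits, members)
```

The short list of representatives is what gets searched, and the full member list is recovered afterwards by a cheap lookup, which fits the abstract's report of shorter hit lists before expansion.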


Computational Biology and Chemistry | 2003

Protein family classification and functional annotation

Cathy H. Wu; Hongzhan Huang; Lai-Su L. Yeh; Winona C. Barker

With the accelerated accumulation of genomic sequence data, there is a pressing need to develop computational methods and advanced bioinformatics infrastructure for reliable and large-scale protein annotation and biological knowledge discovery. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research. PIR produces the Protein Sequence Database of functionally annotated protein sequences. The annotation problems are addressed by a classification-driven and rule-based method with evidence attribution, coupled with an integrated knowledge base system being developed. The approach allows sensitive identification, consistent and rich annotation, and systematic detection of annotation errors, as well as distinction of experimentally verified and computationally predicted features. The knowledge base consists of two new databases, sequence analysis tools, and graphical interfaces. PIR-NREF, a non-redundant reference database, provides a timely and comprehensive collection of all protein sequences, totaling more than 1,000,000 entries. iProClass, an integrated database of protein family, function, and structure information, provides extensive value-added features for about 830,000 proteins with rich links to over 50 molecular databases. This paper describes our approach to protein functional annotation with case studies and examines common identification errors. It also illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.


Nucleic Acids Research | 2011

The Protein Ontology: a structured representation of protein forms and complexes

Darren A. Natale; Cecilia N. Arighi; Winona C. Barker; Judith A. Blake; Michael Caudy; Harold J. Drabkin; Peter D’Eustachio; Alexei V. Evsikov; Hongzhan Huang; Jules Nchoutmboube; Natalia V. Roberts; Barry Smith; Jian Zhang; Cathy H. Wu

The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.


Computational Biology and Chemistry | 2004

The iProClass integrated database for protein functional analysis

Cathy H. Wu; Hongzhan Huang; Anastasia N. Nikolskaya; Zhang-Zhi Hu; Winona C. Barker

Increasingly, scientists have begun to tackle gene functions and other complex regulatory processes by studying organisms at global scales for various levels of biological organization, ranging from genomes to metabolomes and physiomes. Meanwhile, new bioinformatics methods have been developed for inferring protein function using associative analysis of functional properties to complement the traditional sequence homology-based methods. Fully exploiting the value of high-throughput systems biology data and facilitating protein functional studies requires bioinformatics infrastructures that support both data integration and associative analysis. The iProClass database, designed to serve as a framework for data integration in a distributed networking environment, provides comprehensive descriptions of all proteins, with rich links to over 50 databases of protein family, function, pathway, interaction, modification, structure, genome, ontology, literature, and taxonomy. In particular, the database is organized with PIRSF family classification and maps to other family, function, and structure classification schemes. Coupled with the underlying taxonomic information for complete genomes, the iProClass system (http://pir.georgetown.edu/iproclass/) supports associative studies of protein family, domain, function, and structure. A case study of the phosphoglycerate mutases illustrates a systematic approach for protein family and phylogenetic analysis. Such studies may serve as a basis for further analysis of protein functional evolution, and its relationship to the co-evolution of metabolic pathways, cellular networks, and organisms.

Collaboration


Dive into Hongzhan Huang's collaborations.

Top Co-Authors

Cathy H. Wu

University of Delaware

Peter B. McGarvey

Georgetown University Medical Center

Winona C. Barker

Georgetown University Medical Center

Darren A. Natale

Georgetown University Medical Center

Jian Zhang

Georgetown University Medical Center

Zhang-Zhi Hu

Georgetown University Medical Center

Baris E. Suzek

Georgetown University Medical Center

Lai-Su L. Yeh

Georgetown University Medical Center
