Publication


Featured research published by Peter B. McGarvey.


Bioinformatics | 2007

UniRef: comprehensive and non-redundant UniProt reference clusters

Baris E. Suzek; Hongzhan Huang; Peter B. McGarvey; Raja Mazumder; Cathy H. Wu

MOTIVATION Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences. RESULTS The UniRef (UniProt Reference Clusters) databases provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering >4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of approximately 10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis. AVAILABILITY UniRef is updated biweekly and is available for online search and retrieval at http://www.uniprot.org, as well as for download at ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
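The UniRef100 merging rule described above (identical sequences and subfragments collapse into one entry) can be sketched as a greedy pass over the sequences. This is a minimal illustration under stated assumptions, not UniProt's production pipeline: the function name `build_uniref100_like` and the accessions are hypothetical, sequences are plain strings, and "subfragment" is taken to mean an exact substring of a longer sequence.

```python
from typing import Dict, List


def build_uniref100_like(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: merge identical sequences and subfragments.

    Returns representative accession -> member accessions (representative
    first). Assumes a subfragment is an exact substring of a longer sequence.
    """
    # Longest first, so a full-length sequence becomes the representative
    # before any of its fragments is seen; ties broken by accession.
    order = sorted(seqs, key=lambda acc: (-len(seqs[acc]), acc))
    clusters: Dict[str, List[str]] = {}
    for acc in order:
        s = seqs[acc]
        for rep in clusters:
            # Identical sequences are a special case of "contained in".
            if s in seqs[rep]:
                clusters[rep].append(acc)
                break
        else:
            # No representative contains this sequence: start a new cluster.
            clusters[acc] = [acc]
    return clusters


if __name__ == "__main__":
    example = {"P1": "MKTAYIAKQR", "P2": "MKTAYIAKQR", "F1": "TAYIAK", "P3": "MSSHEG"}
    print(build_uniref100_like(example))
    # {'P1': ['P1', 'P2', 'F1'], 'P3': ['P3']}
```

In the toy input, four source entries collapse into two clusters. When a fragment matches several representatives, this sketch simply assigns it to the first one found, and the all-pairs substring check is quadratic, so it only suits small inputs.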


Nature | 2014

Proteogenomic characterization of human colon and rectal cancer

Bing Zhang; Jing Wang; Xiaojing Wang; Jing Zhu; Qi Liu; Zhiao Shi; Matthew C. Chambers; Lisa J. Zimmerman; Kent Shaddox; Sangtae Kim; Sherri R. Davies; Sean Wang; Pei Wang; Christopher R. Kinsinger; Robert Rivers; Henry Rodriguez; R. Reid Townsend; Matthew J. Ellis; Steven A. Carr; David L. Tabb; Robert J. Coffey; Robbert J. C. Slebos; Daniel C. Liebler; Michael A. Gillette; Karl R. Klauser; Eric Kuhn; D. R. Mani; Philipp Mertins; Karen A. Ketchum; Amanda G. Paulovich

Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. Messenger RNA transcript abundance did not reliably predict protein abundance differences between tumours. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA ‘microsatellite instability/CpG island methylation phenotype’ transcriptomic subtype, but had distinct mutation, methylation and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extended to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34) and SRC (SRC proto-oncogene, non-receptor tyrosine kinase). Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.


Nucleic Acids Research | 1992

The PIR-International Protein Sequence Database

Winona C. Barker; John S. Garavelli; Peter B. McGarvey; Christopher R. Marzec; Bruce C. Orcutt; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Robert S. Ledley; Hans-Werner Mewes; Friedhelm Pfeiffer; Akira Tsugita; Cathy H. Wu

The Protein Information Resource (PIR; http://www-nbrf.georgetown.edu/pir/) supports research on molecular evolution, functional genomics, and computational biology by maintaining a comprehensive, non-redundant, well-organized and freely available protein sequence database. Since 1988 the database has been maintained collaboratively by PIR-International, an international association of data collection centers cooperating to develop this resource during a period of explosive growth in new sequence data and new computer technologies. The PIR Protein Sequence Database entries are classified into superfamilies, families and homology domains, for which sequence alignments are available. Full-scale family classification supports comparative genomics research, aids sequence annotation, assists database organization and improves database integrity. The PIR WWW server supports direct on-line sequence similarity searches, information retrieval, and knowledge discovery by providing the Protein Sequence Database and other supplementary databases. Sequence entries are extensively cross-referenced and hypertext-linked to major nucleic acid, literature, genome, structure, sequence alignment and family databases. The weekly release of the Protein Sequence Database can be accessed through the PIR Web site. The quarterly release of the database is freely available from our anonymous FTP server and is also available on CD-ROM with the accompanying ATLAS database search program.


Nucleic Acids Research | 2000

The Protein Information Resource (PIR)

Winona C. Barker; John S. Garavelli; Hongzhan Huang; Peter B. McGarvey; Bruce C. Orcutt; Geetha Y. Srinivasarao; Chunlin Xiao; Lai-Su L. Yeh; Robert S. Ledley; Joseph F. Janda; Friedhelm Pfeiffer; Hans-Werner Mewes; Akira Tsugita; Cathy H. Wu

The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.


Bioinformatics | 2015

UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches

Baris E. Suzek; Yuqi Wang; Hongzhan Huang; Peter B. McGarvey; Cathy H. Wu

Motivation: UniRef databases provide full-scale clustering of UniProtKB sequences and are utilized for a broad range of applications, particularly similarity-based functional annotation. Non-redundancy and intra-cluster homogeneity in UniRef were recently improved by adding a sequence length overlap threshold. Our hypothesis is that these improvements would enhance the speed and sensitivity of similarity searches and improve the consistency of annotation within clusters. Results: Intra-cluster molecular function consistency was examined by analysis of Gene Ontology terms. Results show that UniRef clusters bring together proteins of identical molecular function in more than 97% of the clusters, implying that clusters are useful for annotation and can also be used to detect annotation inconsistencies. To examine coverage in similarity results, BLASTP searches against UniRef50 followed by expansion of the hit lists with cluster members demonstrated advantages compared with searches against UniProtKB sequences; the searches are concise (∼7 times shorter hit list before expansion), faster (∼6 times) and more sensitive in detection of remote similarities (>96% recall at e-value <0.0001). Our results support the use of UniRef clusters as a comprehensive and scalable alternative to native sequence databases for similarity searches and reinforce their reliability for use in functional annotation. Availability and implementation: Web access and file download from UniProt website at http://www.uniprot.org/uniref and ftp://ftp.uniprot.org/pub/databases/uniprot/uniref. BLAST searches against UniRef are available at http://www.uniprot.org/blast/.
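The two analyses in this abstract, expanding a UniRef50 hit list with cluster members and scoring intra-cluster molecular-function consistency, can be sketched on in-memory dictionaries. This is a hypothetical illustration, not the authors' code: the function names, the cluster-to-members mapping, and the consistency rule (all annotated members of a cluster carry an identical set of GO molecular-function terms, judged only for clusters with at least two annotated members) are assumptions.

```python
from typing import Dict, Iterable, List, Set


def expand_hits(rep_hits: Iterable[str], members: Dict[str, List[str]]) -> List[str]:
    """Expand a ranked list of cluster hits into member accessions.

    Members inherit the rank of their cluster's hit; duplicates are dropped.
    A hit with no known members is kept as-is.
    """
    expanded: List[str] = []
    seen: Set[str] = set()
    for rep in rep_hits:
        for acc in members.get(rep, [rep]):
            if acc not in seen:
                seen.add(acc)
                expanded.append(acc)
    return expanded


def consistent_fraction(members: Dict[str, List[str]],
                        go_mf: Dict[str, Set[str]]) -> float:
    """Fraction of evaluable clusters whose annotated members share one GO MF term set."""
    evaluated = consistent = 0
    for accs in members.values():
        # Collect term sets only for members that have MF annotation.
        term_sets = [frozenset(go_mf[a]) for a in accs if go_mf.get(a)]
        if len(term_sets) < 2:
            continue  # too few annotated members to judge consistency
        evaluated += 1
        if len(set(term_sets)) == 1:
            consistent += 1
    return consistent / evaluated if evaluated else float("nan")
```

Searching only the representatives keeps the initial hit list short; the expansion step then restores the full set of member sequences while preserving the order in which clusters were hit.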


Cell | 2016

Integrated proteogenomic characterization of human high-grade serous ovarian cancer

Hui Zhang; Tao Liu; Zhen Zhang; Samuel H. Payne; Bai Zhang; Jason E. McDermott; Jian-Ying Zhou; Vladislav A. Petyuk; Li Chen; Debjit Ray; Shisheng Sun; Feng Yang; Lijun Chen; Jing Wang; Punit Shah; Seong Won Cha; Paul Aiyetan; Sunghee Woo; Yuan Tian; Marina A. Gritsenko; Therese R. Clauss; Caitlin H. Choi; Matthew E. Monroe; Stefani N. Thomas; Song Nie; Chaochao Wu; Ronald J. Moore; Kun-Hsing Yu; David L. Tabb; David Fenyö

To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.


Nucleic Acids Research | 2001

Protein Information Resource: a community resource for expert annotation of protein data

Winona C. Barker; John S. Garavelli; Zhenglin Hou; Hongzhan Huang; Robert S. Ledley; Peter B. McGarvey; Hans-Werner Mewes; Bruce C. Orcutt; Friedhelm Pfeiffer; Akira Tsugita; C. R. Vinayaka; Chunlin Xiao; Lai-Su L. Yeh; Cathy H. Wu

The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200,000 non-redundant protein sequences, which are classified into families and superfamilies, with their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.


Bioinformatics | 2000

PIR: a new resource for bioinformatics

Peter B. McGarvey; Hongzhan Huang; Winona C. Barker; Bruce C. Orcutt; John S. Garavelli; Geetha Y. Srinivasarao; Lai-Su L. Yeh; Chunlin Xiao; Cathy H. Wu

The Protein Information Resource (PIR) has greatly expanded its Web site and developed a set of interactive search and analysis tools to facilitate the analysis, annotation, and functional identification of proteins. New search engines have been implemented to combine sequence similarity search results with database annotation information. The new PIR search systems have proved very useful in providing enriched functional annotation of protein sequences, determining protein superfamily-domain relationships, and detecting annotation errors in genomic database archives. AVAILABILITY http://pir.georgetown.edu/.


Bioinformatics | 2011

A comprehensive protein-centric ID mapping service for molecular data integration

Hongzhan Huang; Peter B. McGarvey; Baris E. Suzek; Raja Mazumder; Jian Zhang; Yongxing Chen; Cathy H. Wu

MOTIVATION Identifier (ID) mapping establishes links between various biological databases and is an essential first step for molecular data integration and functional annotation. ID mapping allows diverse molecular data on genes and proteins to be combined and mapped to functional pathways and ontologies. We have developed comprehensive protein-centric ID mapping services providing mappings for 90 IDs derived from databases on genes, proteins, pathways, diseases, structures, protein families, protein interaction, literature, ontologies, etc. The services are widely used and have been regularly updated since 2006. AVAILABILITY www.uniprot.org/mapping and proteininformationresource.org/pirwww/search/idmapping.shtml
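One way to model a protein-centric mapping is to link every external identifier to UniProtKB accessions and route each conversion through that hub (source ID to accessions to target IDs). The class below is a hypothetical sketch of that idea, not the service's documented internals; the class name, method names, and the free-form database labels chosen by the caller are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

HUB = "UniProtKB"  # assumed hub identifier space


class IdMapper:
    """Hypothetical protein-centric ID mapper routed through UniProtKB accessions."""

    def __init__(self) -> None:
        # (database, external id) -> UniProtKB accessions
        self._to_acc: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        # accession -> database -> external ids
        self._from_acc: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, acc: str, db: str, ext_id: str) -> None:
        """Record that external ID `ext_id` in `db` corresponds to accession `acc`."""
        self._to_acc[(db, ext_id)].add(acc)
        self._from_acc[acc][db].add(ext_id)

    def _accessions(self, db: str, ext_id: str) -> Set[str]:
        """Resolve an identifier to hub accessions (without mutating the indexes)."""
        if db == HUB:
            return {ext_id} if ext_id in self._from_acc else set()
        return set(self._to_acc.get((db, ext_id), set()))

    def map_ids(self, src_db: str, ids: Iterable[str], dst_db: str) -> Dict[str, Set[str]]:
        """Map each source ID to the set of target IDs; unmapped IDs get an empty set."""
        result: Dict[str, Set[str]] = {}
        for ext_id in ids:
            targets: Set[str] = set()
            for acc in self._accessions(src_db, ext_id):
                if dst_db == HUB:
                    targets.add(acc)
                else:
                    targets |= self._from_acc[acc].get(dst_db, set())
            result[ext_id] = targets
        return result
```

Routing through a single hub means each new database needs links only to UniProtKB accessions rather than to every other database; the trade-off is that identifiers with no protein counterpart cannot be mapped.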


Journal of Proteome Research | 2015

The CPTAC Data Portal: A Resource for Cancer Proteomics Research

Nathan Edwards; Mauricio Oberti; Ratna Thangudu; Shuang Cai; Peter B. McGarvey; Shine Jacob; Subha Madhavan; Karen A. Ketchum

The Clinical Proteomic Tumor Analysis Consortium (CPTAC), under the auspices of the National Cancer Institute's Office of Cancer Clinical Proteomics Research, is a comprehensive and coordinated effort to accelerate the understanding of the molecular basis of cancer through the application of proteomic technologies and workflows to clinical tumor samples with characterized genomic and transcript profiles. The consortium analyzes cancer biospecimens using mass spectrometry, identifying and quantifying the constituent proteins and characterizing each tumor sample's proteome. Mass spectrometry enables highly specific identification of proteins and their isoforms, accurate relative quantitation of protein abundance in contrasting biospecimens, and localization of post-translational protein modifications, such as phosphorylation, on a protein's sequence. The combination of proteomics, transcriptomics, and genomics data from the same clinical tumor samples provides an unprecedented opportunity for tumor proteogenomics. The CPTAC Data Portal is the centralized data repository for the dissemination of proteomic data collected by Proteome Characterization Centers (PCCs) in the consortium. The portal currently hosts 6.3 TB of data and includes proteomic investigations of breast, colorectal, and ovarian tumor tissues from The Cancer Genome Atlas (TCGA). The data collected by the consortium is made freely available to the public through the data portal.

Collaboration


Top Co-Authors

Cathy H. Wu

University of Delaware

Baris E. Suzek

Georgetown University Medical Center

Jian Zhang

Georgetown University Medical Center

Raja Mazumder

George Washington University

Bruce C. Orcutt

Georgetown University Medical Center

James N. Baraniuk

Georgetown University Medical Center

Lai-Su L. Yeh

Georgetown University Medical Center
