Jessica H. Fong | Researchain

Publication

Featured researches published by Jessica H. Fong.

Nucleic Acids Research | 2011

CDD: a Conserved Domain Database for the functional annotation of proteins

Shennan Lu; John B. Anderson; Farideh Chitsaz; Myra K. Derbyshire; Carol DeWeese-Scott; Jessica H. Fong; Lewis Y. Geer; Renata C. Geer; Noreen R. Gonzales; Marc Gwadz; David I. Hurwitz; John D. Jackson; Zhaoxi Ke; Christopher J. Lanczycki; Fu-Ping Lu; Gabriele H. Marchler; Mikhail Mullokandov; Marina V. Omelchenko; Cynthia L. Robertson; James S. Song; Narmada Thanki; Roxanne A. Yamashita; Dachuan Zhang; Naigong Zhang; Chanjuan Zheng; Stephen H. Bryant

NCBI’s Conserved Domain Database (CDD) is a resource for the annotation of protein sequences with the location of conserved domain footprints, and functional sites inferred from these footprints. CDD includes manually curated domain models that make use of protein 3D structure to refine domain models and provide insights into sequence/structure/function relationships. Manually curated models are organized hierarchically if they describe domain families that are clearly related by common descent. As CDD also imports domain family models from a variety of external sources, it is a partially redundant collection. To simplify protein annotation, redundant models and models describing homologous families are clustered into superfamilies. By default, domain footprints are annotated with the corresponding superfamily designation, on top of which specific annotation may indicate high-confidence assignment of family membership. Pre-computed domain annotation is available for proteins in the Entrez/Protein dataset, and a novel interface, Batch CD-Search, allows the computation and download of annotation for large sets of protein queries. CDD can be accessed via http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.

Nucleic Acids Research | 2009

CDD: specific functional annotation with the Conserved Domain Database.

John B. Anderson; Farideh Chitsaz; Myra K. Derbyshire; Carol DeWeese-Scott; Jessica H. Fong; Lewis Y. Geer; Renata C. Geer; Noreen R. Gonzales; Marc Gwadz; Siqian He; David I. Hurwitz; John D. Jackson; Zhaoxi Ke; Christopher J. Lanczycki; Cynthia A. Liebert; Chunlei Liu; Fu-er Lu; Shennan Lu; Gabriele H. Marchler; Mikhail Mullokandov; James S. Song; Asba Tasneem; Narmada Thanki; Roxanne A. Yamashita; Dachuan Zhang; Naigong Zhang; Stephen H. Bryant

NCBI’s Conserved Domain Database (CDD) is a collection of multiple sequence alignments and derived database search models, which represent protein domains conserved in molecular evolution. The collection can be accessed at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml, and is also part of NCBI’s Entrez query and retrieval system, cross-linked to numerous other resources. CDD provides annotation of domain footprints and conserved functional sites on protein sequences. Precalculated domain annotation can be retrieved for protein sequences tracked in NCBI’s Entrez system, and CDD’s collection of models can be queried with novel protein sequences via the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. Starting with the latest version of CDD, v2.14, information from redundant and homologous domain models is summarized at a superfamily level, and domain annotation on proteins is flagged as either ‘specific’ (identifying molecular function with high confidence) or as ‘non-specific’ (identifying superfamily membership only).

PLOS Computational Biology | 2009

Intrinsic Disorder in Protein Interactions: Insights From a Comprehensive Structural Analysis

Jessica H. Fong; Benjamin A. Shoemaker; Sergiy O. Garbuzynskiy; Michail Yu. Lobanov; Oxana V. Galzitskaya; Anna R. Panchenko

We perform a large-scale study of intrinsically disordered regions in proteins and protein complexes using a non-redundant set of hundreds of different protein complexes. In accordance with the conventional view that folding and binding are coupled, in many of our cases the disorder-to-order transition occurs upon complex formation and can be localized to binding interfaces. Moreover, analysis of disorder in protein complexes depicts a significant fraction of intrinsically disordered regions, with up to one third of all residues being disordered. We find that the disorder in homodimers, especially in symmetrical homodimers, is significantly higher than in heterodimers and offer an explanation for this interesting phenomenon. We argue that the mechanisms of regulation of binding specificity through disordered regions in complexes can be as common as for unbound monomeric proteins. The fascinating diversity of roles of disordered regions in various biological processes and protein oligomeric forms shown in our study may be a subject of future endeavors in this area.

Nucleic Acids Research | 2012

MMDB: 3D structures and macromolecular interactions

Thomas Madej; Kenneth J. Addess; Jessica H. Fong; Lewis Y. Geer; Renata C. Geer; Christopher J. Lanczycki; Chunlei Liu; Shennan Lu; Anna R. Panchenko; Jie Chen; Paul A. Thiessen; Yanli Wang; Dachuan Zhang; Stephen H. Bryant

Close to 60% of protein sequences tracked in comprehensive databases can be mapped to a known three-dimensional (3D) structure by standard sequence similarity searches. Potentially, a great deal can be learned about proteins or protein families of interest from considering 3D structure, and to this day 3D structure data may remain an underutilized resource. Here we present enhancements in the Molecular Modeling Database (MMDB) and its data presentation, specifically pertaining to biologically relevant complexes and molecular interactions. MMDB is tightly integrated with NCBI’s Entrez search and retrieval system, and mirrors the contents of the Protein Data Bank. It links protein 3D structure data with sequence data, sequence classification resources and PubChem, a repository of small-molecule chemical structures and their biological activities, facilitating access to 3D structure data not only for structural biologists, but also for molecular biologists and chemists. MMDB provides a complete set of detailed and pre-computed structural alignments obtained with the VAST algorithm, and provides visualization tools for 3D structure and structure/sequence alignment via the molecular graphics viewer Cn3D. MMDB can be accessed at http://www.ncbi.nlm.nih.gov/structure.

Nucleic Acids Research | 2010

Inferred Biomolecular Interaction Server—a web server to analyze and predict protein interacting partners and binding sites

Benjamin A. Shoemaker; Dachuan Zhang; Ratna R. Thangudu; Manoj Tyagi; Jessica H. Fong; Stephen H. Bryant; Thomas Madej; Anna R. Panchenko

IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.html.

Nucleic Acids Research | 2012

IBIS (Inferred Biomolecular Interaction Server) reports, predicts and integrates multiple types of conserved interactions for proteins

Benjamin A. Shoemaker; Dachuan Zhang; Manoj Tyagi; Ratna R. Thangudu; Jessica H. Fong; Stephen H. Bryant; Thomas Madej; Anna R. Panchenko

We have recently developed the Inferred Biomolecular Interaction Server (IBIS) and database, which reports, predicts and integrates different types of interaction partners and locations of binding sites in proteins based on the analysis of homologous structural complexes. Here, we highlight several new IBIS features and options. The server’s webpage is now redesigned to allow users easier access to data for different interaction types. An entry page is added to give a quick summary of available results and to now accept protein sequence accessions. To elucidate the formation of protein complexes, not just binary interactions, IBIS currently presents an expandable interaction network. Previously, IBIS provided annotations for four different types of binding partners: proteins, small molecules, nucleic acids and peptides; in the current version a new protein–ion interaction type has been added. Several options provide easy downloads of IBIS data for all Protein Data Bank (PDB) protein chains and the results for each query. In this study, we show that about one-third of all RefSeq sequences can be annotated with IBIS interaction partners and binding sites. The IBIS server is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi and updated biweekly.

Molecular BioSystems | 2013

Regulation of protein-protein binding by coupling between phosphorylation and intrinsic disorder: analysis of human protein complexes

Hafumi Nishi; Jessica H. Fong; Christiana Chang; Sarah A. Teichmann; Anna R. Panchenko

Phosphorylation offers a dynamic way to regulate protein activity, subcellular localization, and stability. The majority of signaling pathways involve an extensive set of protein-protein interactions, and phosphorylation is widely used to regulate protein-protein binding by affecting the stability, kinetics and specificity of interactions. Previously it was found that phosphorylation sites tend to be located on protein-protein binding interfaces and may orthosterically modulate the strength of interactions. Here we studied the effect of phosphorylation on protein binding in relation to intrinsic disorder for different types of human protein complexes with known structure of the binding interface. Our results suggest that the processes of phosphorylation, binding and disorder-order transitions are coupled to each other, with about one quarter of all disordered interface Ser/Thr/Tyr sites being phosphorylated. Namely, residue site disorder and interfacial states significantly affect the phosphorylation of serine and to a lesser extent of threonine. Tyrosine phosphorylation might not be directly associated with binding through disorder, and is often observed in ordered interface regions which are not predicted to be disordered in the unbound state. We analyze possible mechanisms of how phosphorylation might regulate protein-protein binding via intrinsic disorder, and specifically focus on how phosphorylation could prevent disorder-order transitions upon binding.

Molecular BioSystems | 2010

Intrinsic disorder and protein multibinding in domain, terminal, and linker regions

Jessica H. Fong; Anna R. Panchenko

Intrinsic disorder is believed to contribute to the ability of some proteins to interact with multiple partners, which is important for protein functional promiscuity and regulation of the cross-talk between pathways. To better understand the mechanisms of molecular recognition through disordered regions, here, we systematically investigate the coupling between disorder and binding within domain families in a structure interaction network and in terminal and inter-domain linker regions. We showed that the canonical domain-domain interaction model should take into account contributions of N- and C-termini and inter-domain linkers, which may form all or part of the binding interfaces. For the majority of proteins, binding interfaces on domain and terminal regions were predicted to be less disordered than non-interface regions. Analysis of all domain families revealed several exceptions, such as kinases, DNA/RNA binding proteins, certain enzymes, and regulatory proteins, which are candidates for disorder-to-order transitions that can occur upon binding. Domain interfaces that bind single or multiple partners do not exhibit significant difference in disorder content if normalized by the number of interactions. In general, protein families with more diverse interactions exhibit less average disorder over all members of the family. Our results shed light on recent controversies regarding the relationship between disorder and binding of multiple partners at common interfaces. In particular, they support the hypothesis that protein domains with many interacting partners should have a pleiotropic effect on functional pathways and consequently might be more constrained in evolution.

BMC Research Notes | 2008

Protein subfamily assignment using the Conserved Domain Database

Jessica H. Fong

Background: Domains, evolutionarily conserved units of proteins, are widely used to classify protein sequences and infer protein function. Often, two or more overlapping domain models match a region of a protein sequence. Therefore, procedures are required to choose appropriate domain annotations for the protein. Here, we propose a method for assigning NCBI-curated domains from the Conserved Domain Database (CDD) that takes into account the organization of the domains into hierarchies of homologous domain models. Findings: Our analysis of alignment scores from NCBI-curated domain assignments suggests that identifying the correct model among closely related models is more difficult than choosing between non-overlapping domain models. We find that simple heuristics based on sorting scores and domain-specific thresholds are effective at reducing classification error. In fact, in our test set, the heuristics result in almost 90% of current misclassifications due to missing domain subfamilies being replaced by more generic domain assignments, thereby eliminating a significant amount of error within the database. Conclusion: Our proposed domain subfamily assignment rule has been incorporated into the CD-Search software for assigning CDD domains to query protein sequences and has significantly improved pre-calculated domain annotations on protein sequences in NCBI’s Entrez resource.
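
The kind of heuristic this abstract describes (rank the overlapping hits by score, accept a subfamily only if it clears a domain-specific threshold, otherwise fall back to a more generic model in the same hierarchy) might be sketched as follows. This is an illustrative sketch only and not the rule implemented in CD-Search: the names Hit, assign_domain, parent, and threshold, the model identifiers, and every score and threshold value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    model: str    # hypothetical domain model identifier
    score: float  # alignment score of the query against that model


def assign_domain(hits, parent, threshold):
    """Choose one annotation among overlapping hits from one hierarchy.

    hits      -- list of Hit objects covering the same sequence region
    parent    -- dict mapping each model to its parent (root maps to None)
    threshold -- dict mapping each model to its minimum trusted score
    Returns the chosen model name, or None if nothing is trusted.
    """
    if not hits:
        return None
    scores = {h.model: h.score for h in hits}
    # Rank the hits so the best-scoring model is considered first.
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    model = ranked[0].model
    # Walk toward the hierarchy root until a model meets its own threshold.
    while model is not None:
        observed = scores.get(model, float("-inf"))
        required = threshold.get(model, float("inf"))
        if observed >= required:
            return model
        model = parent.get(model)
    return None


# Hypothetical two-level hierarchy: two subfamilies under one generic family.
parent = {"kinase_sub_A": "kinase", "kinase_sub_B": "kinase", "kinase": None}
threshold = {"kinase_sub_A": 200.0, "kinase_sub_B": 180.0, "kinase": 60.0}

weak = [Hit("kinase_sub_A", 150.0), Hit("kinase_sub_B", 140.0), Hit("kinase", 120.0)]
strong = [Hit("kinase_sub_A", 250.0), Hit("kinase", 120.0)]

print(assign_domain(weak, parent, threshold))
print(assign_domain(strong, parent, threshold))
```

In this toy example the first query falls back to the generic family because its best subfamily score is below that subfamily's threshold, while the second query keeps the specific subfamily assignment.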

BMC Genomics | 2013

Comparison of RefSeq protein-coding regions in human and vertebrate genomes

Jessica H. Fong; Terence Murphy; Kim D. Pruitt

Background: Advances in high-throughput sequencing technology have yielded a large number of publicly available vertebrate genomes, many of which are selected for inclusion in NCBI’s RefSeq project and subsequently processed by NCBI’s eukaryotic annotation pipeline. Genome annotation results are affected by differences in available support evidence and may be impacted by annotation pipeline software changes over time. The RefSeq project has not previously assessed annotation trends across organisms or over time. To address this deficiency, we have developed a comparative protocol which integrates analysis of annotated protein-coding regions across a data set of vertebrate orthologs in genomic sequence coordinates, protein sequences, and protein features. Results: We assessed an ortholog dataset that includes 34 annotated vertebrate RefSeq genomes including human. We confirm that RefSeq protein-coding gene annotations in mammals exhibit considerable similarity. Over 50% of the orthologous protein-coding genes in 20 organisms are supported at the level of splicing conservation with at least three selected reference genomes. Approximately 7,500 ortholog sets include at least half of the analyzed organisms, show highly similar sequence and conserved splicing, and may serve as a minimal set of mammalian “core proteins” for initial assessment of new mammalian genomes. Additionally, 80% of the proteins analyzed pass a suite of tests to detect proteins that lack splicing conservation and have unusual sequence or domain annotation. We use these tests to define an annotation quality metric that is based directly on the annotated proteins and thus operates independently of other quality metrics such as availability of transcripts or assembly quality measures. Results are available on the RefSeq FTP site [http://ftp.ncbi.nlm.nih.gov/refseq/supplemental/ProtCore/SM1.txt]. Conclusions: Our multi-factored analysis demonstrates a high level of consistency in RefSeq protein representation among vertebrates. We find that the majority of the RefSeq vertebrate proteins for which we have calculated orthology are good as measured by these metrics. The process flow described provides specific information on the scope and degree of conservation for the analyzed protein sequences and annotations and will be used to enrich the quality of RefSeq records by identifying targets for further improvement in the computational annotation pipeline, and by flagging specific genes for manual curation.
