Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Judith D. Cohn is active.

Publication


Featured researches published by Judith D. Cohn.


Protein Science | 2006

A categorization approach to automated ontological function annotation.

Karin Verspoor; Judith D. Cohn; Susan M. Mniszewski; Cliff Joslyn

Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there are a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We present first an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then also present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (measure of predictions made that are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhoods annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, such increased hierarchical precision performance is purchased at a modest expense of hierarchical recall (measure of all annotations that get predicted at all).


Acta Crystallographica Section D-biological Crystallography | 2006

Automated ligand fitting by core-fragment fitting and extension into density

Thomas C. Terwilliger; Herbert E. Klei; Paul D. Adams; Nigel W. Moriarty; Judith D. Cohn

An automated ligand-fitting procedure has been developed and tested on 9327 ligands and (F o − F c)exp(iϕc) difference density from macromolecular structures in the Protein Data Bank.


BMC Bioinformatics | 2005

Protein annotation as term categorization in the gene ontology using word proximity networks

Karin Verspoor; Judith D. Cohn; Cliff Joslyn; Susan M. Mniszewski; Andreas Rechtsteiner; Luis Mateus Rocha; Tiago Simas

BackgroundWe participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a proteins document neighborhood into the GO.ResultsThe evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results.ConclusionThe initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.


Acta Crystallographica Section D-biological Crystallography | 2007

Ligand identification using electron-density map correlations

Thomas C. Terwilliger; Paul D. Adams; Nigel W. Moriarty; Judith D. Cohn

An automated ligand-fitting procedure is applied to (F o − F c)exp(iϕc) difference density for 200 commonly found ligands from macromolecular structures in the Protein Data Bank to identify ligands from density maps.


BMC Structural Biology | 2008

Fast dynamics perturbation analysis for prediction of protein functional sites.

Dengming Ming; Judith D. Cohn; Michael E. Wall

BackgroundWe present a fast version of the dynamics perturbation analysis (DPA) algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy Dx. Such regions are associated with functional sites.ResultsThe Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that Dxin a normal-modes model is highly correlated with an entropic term that only depends on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in a N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues) were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues) were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm.ConclusionCompared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.


BMC Research Notes | 2012

Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

Joel Berendzen; William J. Bruno; Judith D. Cohn; Nicolas W. Hengartner; Cheryl R. Kuske; Benjamin H. McMahon; Murray Wolinsky; Gary Xie

BackgroundClassification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.ResultsAt even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicates a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.ConclusionsClassification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.


PLOS ONE | 2012

Text mining improves prediction of protein functional sites.

Karin Verspoor; Judith D. Cohn; K. E. Ravikumar; Michael E. Wall

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.


PLOS Computational Biology | 2011

Genome majority vote improves gene predictions.

Michael E. Wall; Sindhu Raghavan; Judith D. Cohn; John Dunbar

Recent studies have noted extensive inconsistencies in gene start sites among orthologous genes in related microbial genomes. Here we provide the first documented evidence that imposing gene start consistency improves the accuracy of gene start-site prediction. We applied an algorithm using a genome majority vote (GMV) scheme to increase the consistency of gene starts among orthologs. We used a set of validated Escherichia coli genes as a standard to quantify accuracy. Results showed that the GMV algorithm can correct hundreds of gene prediction errors in sets of five or ten genomes while introducing few errors. Using a conservative calculation, we project that GMV would resolve many inconsistencies and errors in publicly available microbial gene maps. Our simple and logical solution provides a notable advance toward accurate gene maps.


BMC Genomics | 2011

Consistency of gene starts among Burkholderia genomes

John Dunbar; Judith D. Cohn; Michael E. Wall

BackgroundEvolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins.ResultsExisting Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides.ConclusionsOur findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps.


pacific symposium on biocomputing | 2012

Detection of Protein Catalytic Sites in the Biomedical Literature

Karin Verspoor; Andrew MacKinlay; Judith D. Cohn; Michael E. Wall

This paper explores the application of text mining to the problem of detecting protein functional sites in the biomedical literature, and specifically considers the task of identifying catalytic sites in that literature. We provide strong evidence for the need for text mining techniques that address residue-level protein function annotation through an analysis of two corpora in terms of their coverage of curated data sources. We also explore the viability of building a text-based classifier for identifying protein functional sites, identifying the low coverage of curated data sources and the potential ambiguity of information about protein functional sites as challenges that must be addressed. Nevertheless we produce a simple classifier that achieves a reasonable ∼69% F-score on our full text silver corpus on the first attempt to address this classification task. The work has application in computational prediction of the functional significance of protein sites as well as in curation workflows for databases that capture this information.

Collaboration


Dive into the Judith D. Cohn's collaboration.

Top Co-Authors

Avatar

Michael E. Wall

Los Alamos National Laboratory

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Cliff Joslyn

Pacific Northwest National Laboratory

View shared research outputs
Top Co-Authors

Avatar

Dengming Ming

Los Alamos National Laboratory

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Benjamin H. McMahon

Los Alamos National Laboratory

View shared research outputs
Top Co-Authors

Avatar

John Dunbar

Los Alamos National Laboratory

View shared research outputs
Top Co-Authors

Avatar

Luis Mateus Rocha

Indiana University Bloomington

View shared research outputs
Top Co-Authors

Avatar

Nicolas W. Hengartner

Los Alamos National Laboratory

View shared research outputs
Top Co-Authors

Avatar

Nigel W. Moriarty

Lawrence Berkeley National Laboratory

View shared research outputs
Researchain Logo
Decentralizing Knowledge