Publication


Featured research published by David J. States.


Nature Biotechnology | 2006

Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study

David J. States; Gilbert S. Omenn; Thomas W. Blackwell; Damian Fermin; Jimmy K. Eng; David W. Speicher; Samir M. Hanash

The Human Proteome Organization (HUPO) recently completed the first large-scale collaborative study to characterize the human serum and plasma proteomes. The study was carried out in different locations and used diverse methods and instruments to compare and integrate tandem mass spectrometry (MS/MS) data on aliquots of pooled serum and plasma from healthy subjects. Liquid chromatography (LC)-MS/MS data sets from 18 laboratories were matched to the International Protein Index database, and an initial integration exercise resulted in 9,504 proteins identified with one or more peptides, and 3,020 proteins identified with two or more peptides. This article uses a rigorous statistical approach to take into account the length of coding regions in genes, and multiple hypothesis-testing techniques. On this basis, we now present a reduced set of 889 proteins identified with a confidence level of at least 95%. We also discuss the importance of such an integrated analysis in providing an accurate representation of a proteome as well as the value such data sets contain for the high-confidence identification of protein matches to novel exons, some of which may be localized in alternatively spliced forms of known plasma proteins and some in previously nonannotated gene sequences.
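The abstract mentions multiple hypothesis-testing techniques without naming one. As an illustration only, the following is a minimal sketch of the Benjamini-Hochberg procedure, one common way to control the false discovery rate across many candidate identifications. It is an assumption that a procedure of this kind applies here; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a hypothesis is rejected at FDR alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four candidate protein identifications.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```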


Computational Biology and Chemistry | 1993

Information enhancement methods for large scale sequence analysis

Jean-Michel Claverie; David J. States

The improved efficiency of similarity search programs and the affordability of even faster computers allow studies where whole sequence databases can be the target of various comparisons with increasingly larger or numerous query sequences. However, the usefulness of those “brute force” methods now becomes limited by the time it takes an experienced scientist to sift the biologically relevant matches from overwhelming, albeit “statistically significant” outputs. The discrepancy between statistical vs biological significance has different causes: erroneous database entries, repetitive sequence elements, and the ubiquity of low complexity segments with biased composition. We present two masking methods (programs XNU and XBLAST) capable of eliminating most of the irrelevant outputs in a variety of large scale sequence analysis situations: global “all against all” database comparisons, massive partial cDNA sequencing (EST), positional cloning and genomic data analysis.
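To make the idea of masking low complexity segments concrete, here is a minimal sketch of a sliding-window filter based on Shannon entropy. This is not the XNU or XBLAST algorithm, whose details the abstract does not give; the window size, entropy threshold, and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2, mask_char="X"):
    """Replace residues in every window whose entropy is below min_entropy."""
    masked = list(seq)
    for start in range(0, len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            # Mask the whole window; overlapping windows extend the masked run.
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # A homopolymer run is masked; a window of 12 distinct residues is kept.
    print(mask_low_complexity("A" * 12))
    print(mask_low_complexity("ACDEFGHIKLMN"))
```

Masked residues no longer seed or extend matches in a similarity search, which is how a filter of this kind removes statistically significant but biologically uninformative hits.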


Methods | 1991

Improved Sensitivity of Nucleic Acid Database Searches Using Application-Specific Scoring Matrices

David J. States; Warren Gish; Stephen F. Altschul

Scoring matrices for nucleic acid sequence comparison that are based on models appropriate to the analysis of molecular sequencing errors or biological mutation processes are presented. In mammalian genomes, transition mutations occur significantly more frequently than transversions, and the optimal scoring of sequence alignments based on this substitution model differs from that derived assuming a uniform mutation model. The information from sequence alignments potentially available using an optimal scoring system is compared with that obtained using the BLASTN default scoring. A modified BLAST database search tool allows these, or other explicitly specified scoring matrices, to be utilized in computationally efficient queries of nucleic acid databases with nucleic acid query sequences. Results of searches performed using BLASTN's default score matrix are compared with those using scores based on a mutational model in which transitions are more prevalent than transversions.
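A scoring matrix that distinguishes transitions from transversions can be sketched as a log-odds table. The example below assumes uniform base frequencies of 0.25 and takes the transition and transversion probabilities as free parameters; the specific values and function names are hypothetical and are not taken from the paper.

```python
import math

PURINES = {"A", "G"}
BASES = "ACGT"


def substitution_class(a, b):
    """Classify an aligned base pair as match, transition, or transversion."""
    if a == b:
        return "match"
    # Transitions stay within purines (A, G) or within pyrimidines (C, T).
    same_ring = (a in PURINES) == (b in PURINES)
    return "transition" if same_ring else "transversion"


def log_odds_matrix(p_transition, p_transversion):
    """Log-odds scores in bits, assuming uniform base frequencies of 0.25.

    p_transition:   probability that a base is replaced by its transition partner.
    p_transversion: total probability of a transversion, split evenly between
                    the two possible target bases.
    """
    p_match = 1.0 - p_transition - p_transversion
    conditional = {
        "match": p_match,
        "transition": p_transition,
        "transversion": p_transversion / 2.0,
    }
    # score(a, b) = log2( P(b | a) / P(b) ), with P(b) = 0.25.
    return {
        (a, b): math.log2(conditional[substitution_class(a, b)] / 0.25)
        for a in BASES
        for b in BASES
    }


if __name__ == "__main__":
    matrix = log_odds_matrix(p_transition=0.10, p_transversion=0.05)
    for a in BASES:
        print(a, " ".join(f"{matrix[(a, b)]:6.2f}" for b in BASES))
```

Under this model a transition mismatch is penalized less than a transversion mismatch, whereas a uniform mutation model assigns both the same score.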


Nucleic Acids Research | 2007

Michigan Molecular Interactions (MiMI): putting the jigsaw puzzle together

Magesh Jayapandian; Adriane Chapman; V. Glenn Tarcea; Cong Yu; Aaron Elkiss; Angela Ianni; Bin Liu; Arnab Nandi; Carlos de los Santos; Philip C. Andrews; Brian D. Athey; David J. States; H. V. Jagadish

Protein interaction data exists in a number of repositories. Each repository has its own data format, molecule identifier and supplementary information. Michigan Molecular Interactions (MiMI) assists scientists searching through this overwhelming amount of protein interaction data. MiMI gathers data from well-known protein interaction databases and deep-merges the information. Utilizing an identity function, molecules that may have different identifiers but represent the same real-world object are merged. Thus, MiMI allows the users to retrieve information from many different databases at once, highlighting complementary and contradictory information. To help scientists judge the usefulness of a piece of data, MiMI tracks the provenance of all data. Finally, a simple yet powerful user interface aids users in their queries, and frees them from the onerous task of knowing the data format or learning a query language. MiMI allows scientists to query all data, whether corroborative or contradictory, and specify which sources to utilize. MiMI is part of the National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at: .
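The deep-merge idea can be illustrated with a small sketch. It assumes a very simple identity function, namely that two records describe the same molecule if they share at least one identifier, and it keeps the source of every attribute value as provenance. This is not MiMI's actual implementation; the record layout, source names, and identifiers are hypothetical.

```python
def deep_merge(records):
    """Merge records that share any identifier, keeping per-value provenance.

    Each record is a dict with keys:
      'source': name of the originating database,
      'ids':    set of identifiers for the molecule,
      'attrs':  dict of attribute name -> value.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed identity function: a shared identifier means the same molecule.
    first_seen = {}
    for i, rec in enumerate(records):
        for ident in rec["ids"]:
            if ident in first_seen:
                parent[find(i)] = find(first_seen[ident])
            else:
                first_seen[ident] = i

    merged = {}
    for i, rec in enumerate(records):
        entry = merged.setdefault(find(i), {"ids": set(), "attrs": {}})
        entry["ids"] |= rec["ids"]
        for key, value in rec["attrs"].items():
            # Keep every value with its source so contradictions stay visible.
            entry["attrs"].setdefault(key, []).append((value, rec["source"]))
    return list(merged.values())


if __name__ == "__main__":
    records = [
        {"source": "db_a", "ids": {"P1", "G1"}, "attrs": {"name": "alpha"}},
        {"source": "db_b", "ids": {"G1", "X9"}, "attrs": {"name": "Alpha-1"}},
        {"source": "db_c", "ids": {"P2"}, "attrs": {"name": "beta"}},
    ]
    for molecule in deep_merge(records):
        print(sorted(molecule["ids"]), molecule["attrs"])
```

Storing each value alongside its source lets a user see complementary and contradictory claims side by side and restrict a query to chosen sources.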


Genome Biology | 2006

Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics

Damian Fermin; Baxter B. Allen; Thomas W. Blackwell; Rajasree Menon; Marcin Adamski; Yin Xu; Peter J. Ulintz; Gilbert S. Omenn; David J. States

Background: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. Results: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. Conclusion: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
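The abstract names the inputs to the Poisson statistic but not its formula. The sketch below shows one plausible form, under the assumption that random peptide matches fall on an open reading frame at a rate proportional to its share of the target database. The rate formula, function names, and example numbers are all hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf_below_k = sum(
        math.exp(-lam) * lam**i / math.factorial(i) for i in range(k)
    )
    return max(0.0, 1.0 - cdf_below_k)


def orf_p_value(peptide_hits, orf_length, database_length, total_peptide_matches):
    """Chance of seeing at least peptide_hits random matches on one ORF.

    Assumed model: the expected number of random matches on an ORF equals the
    total number of peptide matches times the ORF's fraction of the database.
    """
    lam = total_peptide_matches * orf_length / database_length
    return poisson_tail(peptide_hits, lam)


if __name__ == "__main__":
    # Hypothetical numbers: 2 peptides on a 300-residue ORF, 1000 matches
    # spread over a database of 3,000,000 residues.
    print(orf_p_value(2, 300, 3_000_000, 1000))
```

A longer matching sequence or a larger number of searched spectra raises the expected count of chance matches, so the same number of peptide hits becomes less significant.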


Nucleic Acids Research | 2009

Michigan molecular interactions r2: from interacting proteins to pathways.

V. Glenn Tarcea; Terry E. Weymouth; Alexander S. Ade; Aaron V. Bookvich; Jing Gao; Vasudeva Mahavisno; Zach Wright; Adriane Chapman; Magesh Jayapandian; Arzucan Özgür; Yuanyuan Tian; James D. Cavalcoli; Barbara Mirel; Jignesh M. Patel; Dragomir R. Radev; Brian D. Athey; David J. States; H. V. Jagadish

Molecular interaction data exists in a number of repositories, each with its own data format, molecule identifier and information coverage. Michigan molecular interactions (MiMI) assists scientists searching through this profusion of molecular interaction data. The original release of MiMI gathered data from well-known protein interaction databases, and deep merged this information while keeping track of provenance. Based on the feedback received from users, MiMI has been completely redesigned. This article describes the resulting MiMI Release 2 (MiMIr2). New functionality includes extension from proteins to genes and to pathways; identification of highlighted sentences in source publications; seamless two-way linkage with Cytoscape; query facilities based on MeSH/GO terms and other concepts; approximate graph matching to find relevant pathways; support for querying in bulk; and a user focus-group driven interface design. MiMI is part of the NIH's National Center for Integrative Biomedical Informatics (NCIBI) and is publicly available at: http://mimi.ncibi.org.


Bioinformatics | 2010

GLay: Community structure analysis of biological networks

Gang Su; Allan Kuchinsky; John H. Morris; David J. States; Fan Meng

Summary: GLay provides Cytoscape users an assorted collection of versatile community structure algorithms and graph layout functions for network clustering and structured visualization. High performance is achieved by dynamically linking highly optimized C functions to the Cytoscape JAVA program, which makes GLay especially suitable for decomposition, display and exploratory analysis of large biological networks. Availability: http://brainarray.mbni.med.umich.edu/glay/


Cancer Research | 2009

Identification of Novel Alternative Splice Isoforms of Circulating Proteins in a Mouse Model of Human Pancreatic Cancer

Rajasree Menon; Qing Zhang; Yan Zhang; Damian Fermin; Nabeel Bardeesy; Ronald A. DePinho; Chunxia Lu; Samir M. Hanash; Gilbert S. Omenn; David J. States

To assess the potential of tumor-associated, alternatively spliced gene products as a source of biomarkers in biological fluids, we have analyzed a large data set of mass spectra derived from the plasma proteome of a mouse model of human pancreatic ductal adenocarcinoma. MS/MS spectra were interrogated for novel splice isoforms using a nonredundant database containing an exhaustive three-frame translation of Ensembl transcripts and gene models from ECgene. This integrated analysis identified 420 distinct splice isoforms, of which 92 did not match any previously annotated mouse protein sequence. We chose seven of those novel variants for validation by reverse transcription-PCR. The results were concordant with the proteomic analysis. All seven novel peptides were successfully amplified in pancreas specimens from both wild-type and mutant mice. Isotopic labeling of cysteine-containing peptides from tumor-bearing mice and wild-type controls enabled relative quantification of the proteins. Differential expression between tumor-bearing and control mice was notable for peptides from novel variants of muscle pyruvate kinase, malate dehydrogenase 1, glyceraldehyde-3-phosphate dehydrogenase, proteoglycan 4, minichromosome maintenance complex component 9, high mobility group box 2, and hepatocyte growth factor activator. Our results show that, in a mouse model for human pancreatic cancer, novel and differentially expressed alternative splice isoforms are detectable in plasma and may be a source of candidate biomarkers.
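The three-frame translation step can be sketched as follows. This shows only how a transcript is translated in its three forward reading frames using the standard genetic code; it does not reproduce the database construction or the MS/MS search described above, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate one forward reading frame; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(transcript):
    """Translate a transcript in forward frames 0, 1, and 2."""
    return [translate(transcript, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAA")):
        print(frame, protein)
```

Transcripts have a known strand, so three forward frames suffice; translating unstranded genomic sequence would also require the three frames of the reverse complement.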


Journal of Biological Chemistry | 2006

A dominant function of IKK/NF-κB signaling in global lipopolysaccharide-induced gene expression

Nathalie Carayol; Ji Chen; Fan Yang; Taocong Jin; Lijian Jin; David J. States; Cun-Yu Wang

Porphyromonas gingivalis is an etiologic pathogen of periodontitis, one of the most common inflammatory diseases. Recently, we found that P. gingivalis LPS activated the transcription factor nuclear factor-κB (NF-κB) through the IκB kinase complex (IKK). NF-κB is a transcription factor that controls inflammation and host responses. In this study, we examined the role of IKK/NF-κB in P. gingivalis LPS-induced gene expression on a genome-wide basis using a combination of microarray and biochemical approaches. A total of 88 early response genes were found to be induced by P. gingivalis LPS in a human THP-1 monocytic cell line. Interestingly, the induction of most of these genes was abolished or attenuated under the inactivation of IKK/NF-κB. Among those IKK/NF-κB-dependent genes, 20 genes were NF-κB-inducible genes reported previously, and 59 genes represented putative novel NF-κB target genes. Using transcription factor binding analysis, we found that most of these putative NF-κB target genes contained one or multiple NF-κB-binding sites. Also, some transcription factor-binding motifs were overrepresented in the promoter of both known and putative NF-κB-dependent genes, indicating that these genes may be regulated in a similar fashion. Furthermore, we found that several transcription factors associated with metabolic and inflammatory responses, including nuclear receptors, activator of protein-1, and early growth responses, were induced by P. gingivalis LPS through IKK/NF-κB, indicating that IKK/NF-κB may utilize these transcription factors to mediate secondary responses. Taken together, our results demonstrate that IKK/NF-κB signaling plays a dominant role in P. gingivalis LPS-induced early response gene expression, suggesting that IKK/NF-κB is a therapeutic target for periodontitis.


Journal of Computational Biology | 1994

QGB: Combined Use of Sequence Similarity and Codon Bias for Coding Region Identification

David J. States; Warren Gish

A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.

Collaboration


Dive into David J. States's collaborations.

Top Co-Authors

Arvind Rao

University of Michigan

Lawrence Hunter

University of Colorado Denver