David Lee Gold
State University of New York System
Publication
Featured research published by David Lee Gold.
BMC Cancer | 2010
Jeffrey C. Miecznikowski; Dan Wang; Song Liu; Lara Sucheston; David Lee Gold
Background: An estimated 12% of females in the United States will develop breast cancer in their lifetime. Although there are advances in treatment options, including surgery and chemotherapy, breast cancer is still the second most lethal cancer in women. Thus, there is a clear need for better methods to predict prognosis for each breast cancer patient. With the advent of large genetic databases and the reduction in cost for the experiments, researchers are faced with choosing from a large pool of potential prognostic markers from numerous breast cancer gene expression profile studies. Methods: Five microarray datasets related to breast cancer were examined using gene set analysis, and the cancers were categorized into different subtypes using a scoring system based on genetic pathway activity. Results: We observed that significant genes in the individual studies show little reproducibility across the datasets. In our comparative analysis, using gene pathways with clinical variables is more reliable across studies and shows promise in assessing a patient's prognosis. Conclusions: This study concludes that, in light of clinical variables, there are significant gene pathways in common across the datasets. Specifically, several pathways can further significantly stratify patients for survival. These candidate pathways should help to develop a panel of significant biomarkers for the prognosis of breast cancer patients in a clinical setting.
Bioinformatics | 2009
David Lee Gold; Jeffrey C. Miecznikowski; Song Liu
Motivation: The decision to commit some or many false positives in practice rests with the investigator. Unfortunately, not all error control procedures perform the same. Our problem is to choose an error control procedure to determine a P-value threshold for identifying differentially expressed pathways in high-throughput gene expression studies. Pathway analysis involves fewer tests than differential gene expression analysis, on the order of a few hundred. We discuss and compare methods for error control for pathway analysis with gene expression data. Results: In consideration of the variability in test results, we find that the widely used Benjamini and Hochberg (BH) false discovery rate (FDR) analysis is less robust than alternative procedures. BH's error control requires a large number of hypothesis tests, a reasonable assumption for differential gene expression analysis, though not the case with pathway-based analysis. Therefore, we advocate, through a series of simulations and applications to real gene expression data, that researchers control the number of false positives rather than the FDR. Availability: Our R package, EPath.omg, is available at http://sphhp.buffalo.edu/biostat/research/software. Supplementary information: Supplementary data are available at Bioinformatics online.
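The contrast drawn in this abstract can be illustrated with a minimal Python sketch. It compares the standard BH step-up procedure with a simple fixed threshold that bounds the expected number of false positives. The function names, the parameter `k`, and the rule of rejecting P-values at or below `k/m` are assumptions made for illustration; they are not taken from the paper or from the EPath.omg package, which may use a different procedure.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the BH step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank r with p_(r) <= r * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff_rank = rank
    # Step-up: reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


def bounded_false_positives(pvalues, k=1.0):
    """Indices with p <= k/m (hypothetical helper).

    With valid null P-values, the expected number of false
    positives under this rule is at most k.
    """
    m = len(pvalues)
    threshold = k / m
    return [i for i, p in enumerate(pvalues) if p <= threshold]


if __name__ == "__main__":
    # Made-up P-values for ten hypothetical pathways.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
    print("BH at q = 0.05:", benjamini_hochberg(pvals, q=0.05))
    print("p <= k/m, k = 1:", bounded_false_positives(pvals, k=1.0))
```

With only ten tests, the two rules can select quite different sets of pathways, which is the kind of sensitivity to the number of tests that the abstract discusses.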
Biometrics | 2011
Xiaoyu Jiang; David Lee Gold; Eric D. Kolaczyk
Predicting the functional roles of proteins based on various genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows for the easy incorporation of protein-protein association network topologies (either binary or weighted) in modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. Our method's performance is evaluated and compared with standard algorithms on weighted yeast protein-protein association networks, extracted from a recently developed integrative database called Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
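For orientation, a probit model with a latent conditional autoregressive (CAR) Gaussian field on a network is often written as below. This is a generic, single-function sketch under stated assumptions, not the authors' multivariate formulation. The symbols are introduced here for illustration: $Y_i$ is the binary label of protein $i$, $W$ is the symmetric (binary or weighted) adjacency matrix of the association network, $D$ is the diagonal matrix of its row sums (assumed positive), $\tau > 0$ is a precision parameter, and $|\rho| < 1$ keeps the precision matrix positive definite.

```latex
\begin{align*}
Y_i &= \mathbb{1}\{Z_i > 0\}, \\
Z_i &= \eta_i + \varepsilon_i, \qquad
  \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1), \\
\boldsymbol{\eta} &\sim \mathcal{N}\!\left(\mu \mathbf{1},\;
  \left[\tau \,(D - \rho W)\right]^{-1}\right),
\end{align*}
```

Under this parameterization $P(Y_i = 1 \mid \eta_i) = \Phi(\eta_i)$, and the network enters only through the precision matrix $\tau (D - \rho W)$, so that proteins linked in the network have correlated latent values.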
Bioinformatics and Biomedicine | 2008
Xiaoyu Jiang; Naoki Nariai; Martin Steffen; Simon Kasif; David Lee Gold; Eric D. Kolaczyk
The study of gene function is critical in various genomic and proteomic fields. Due to the availability of tremendous amounts of different types of protein data, integrating these datasets to predict function has become a significant opportunity in computational biology. In this paper, to predict protein function we (i) develop a novel Bayesian framework combining relational, hierarchical and structural information with improvement in data usage efficiency over similar methods, and (ii) propose to use it in conjunction with an integrative protein-protein association network, STRING (Search Tool for the Retrieval of INteracting Genes/proteins), which combines information from seven different sources. At the heart of our work is accomplishing protein data integration in a concerted fashion with respect to algorithm and data source. Method performance is assessed by a 5-fold cross-validation in yeast on selected terms from the Molecular Function ontology in the Gene Ontology database. Results show that our combined use of the proposed computational framework and the protein network from STRING offers substantial improvements in prediction. The benefits of using an aggressively integrative network, such as STRING, may derive from the fact that although it is likely that the ultimate gene interaction matrix (including but not limited to protein-protein, genetic, or regulatory interactions) will be sparse, presently it is still known only incompletely in most organisms, and thus the use of multiple distinct data sources is rewarded.
Journal of Computational Biology | 2008
David Lee Gold; Bani K. Mallick; Kevin R. Coombes
Advances in microtechnologies are making possible high-throughput control and reporting of gene expression in live cells, in real time. We explore relevant statistical challenges to modeling and inference in real-time gene expression data from single-shock experiments, with special attention to potential confounding between treatment and cell cycle variation. We propose a semi-wavelet non-linear dynamic regression model to infer modulation in gene expression due to treatment shocks in the presence of cell cycle variation. A case study is performed with public data. Results are compared with those obtained when cell cycle variation is ignored. Estimation and inference are performed by a Bayesian approach.
Briefings in Bioinformatics | 2007
David Lee Gold; Kevin R. Coombes; Jing Wang; Bani K. Mallick
Archive | 2009
Bani K. Mallick; David Lee Gold; Veerabhadran Baladandayuthapani
Statistics & Probability Letters | 2011
Jeffrey C. Miecznikowski; David Lee Gold; Lori Shepherd; Song Liu
Collaboration
Veerabhadran Baladandayuthapani
University of Texas MD Anderson Cancer Center