Publication


Featured research published by Olga G. Troyanskaya.


Bioinformatics | 2001

Missing value estimation methods for DNA microarrays

Olga G. Troyanskaya; Michael N. Cantor; Gavin Sherlock; Patrick O. Brown; Trevor Hastie; Robert Tibshirani; David Botstein; Russ B. Altman

MOTIVATION: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

RESULTS: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.
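To make the weighted K-nearest-neighbors idea in the abstract above concrete, here is a minimal sketch in Python with NumPy. It is not the authors' KNNimpute tool. The function name `knn_impute`, the root-mean-square distance over shared experiments, the inverse-distance weights, and the row-average fallback are all assumptions chosen for illustration.

```python
import numpy as np

def knn_impute(X, k=10):
    """Fill NaNs in a genes x experiments matrix from the k most similar genes.

    Illustrative sketch only: distance and weighting choices are assumptions.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        obs_i = ~missing[i]
        for j in np.where(missing[i])[0]:
            # Candidate neighbors: genes that have a value in experiment j.
            candidates = np.where(~missing[:, j])[0]
            dists, vals = [], []
            for g in candidates:
                shared = obs_i & ~missing[g]
                if not shared.any():
                    continue
                # Root-mean-square distance over experiments both genes have.
                d = np.sqrt(np.mean((X[i, shared] - X[g, shared]) ** 2))
                dists.append(d)
                vals.append(X[g, j])
            if not dists:
                # Fallback (assumption): row average, or zero if row is empty.
                filled[i, j] = np.nanmean(X[i]) if obs_i.any() else 0.0
                continue
            dists, vals = np.array(dists), np.array(vals)
            nearest = np.argsort(dists)[:k]
            # Closer genes get larger weights; epsilon avoids division by zero.
            w = 1.0 / (dists[nearest] + 1e-8)
            filled[i, j] = np.sum(w * vals[nearest]) / np.sum(w)
    return filled
```

A call such as `knn_impute(expression_matrix, k=10)` returns a copy with every missing entry estimated; the input matrix is left unchanged. The value of `k` is a tuning parameter, and the abstract reports that the methods were evaluated over a variety of parameter settings.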


Science | 2010

The Genetic Landscape of a Cell

Michael Costanzo; Anastasia Baryshnikova; Jeremy Bellay; Yungil Kim; Eric D. Spear; Carolyn S. Sevier; Huiming Ding; Judice L. Y. Koh; Kiana Toufighi; Jeany Prinz; Robert P. St.Onge; Benjamin VanderSluis; Taras Makhnevych; Franco J. Vizeacoumar; Solmaz Alizadeh; Sondra Bahr; Renee L. Brost; Yiqun Chen; Murat Cokol; Raamesh Deshpande; Zhijian Li; Zhen Yuan Lin; Wendy Liang; Michaela Marback; Jadine Paw; Bryan Joseph San Luis; Ermira Shuteriqi; Amy Hin Yan Tong; Nydia Van Dyk; Iain M. Wallace

Making Connections: Genetic interaction profiles highlight cross-connections between bioprocesses, providing a global view of cellular pleiotropy, and enable the prediction of genetic network hubs. Costanzo et al. (p. 425) performed a pairwise fitness screen covering approximately one-third of all potential genetic interactions in yeast, examining 5.4 million gene-gene pairs and generating quantitative profiles for ~75% of the genome. Of the pairwise interactions tested, about 3% of the genes investigated interact under the conditions tested. On the basis of these data, a reference map for the yeast genetic network was created.

A genome-wide interaction map of yeast identifies genetic interactions, networks, and function.

A genome-scale genetic interaction map was constructed by examining 5.4 million gene-gene pairs for synthetic genetic interactions, generating quantitative genetic interaction profiles for ~75% of all genes in the budding yeast, Saccharomyces cerevisiae. A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function. The global network identifies functional cross-connections between all bioprocesses, mapping a cellular wiring diagram of pleiotropy. Genetic interaction degree correlated with a number of different gene attributes, which may be informative about genetic network hubs in other organisms. We also demonstrate that extensive and unbiased mapping of the genetic landscape provides a key for interpretation of chemical-genetic interactions and drug target identification.


Proceedings of the National Academy of Sciences of the United States of America | 2001

Diversity of gene expression in adenocarcinoma of the lung

Mitchell E. Garber; Olga G. Troyanskaya; Karsten Schluens; Simone Petersen; Zsuzsanna Thaesler; Manuela Pacyna-Gengelbach; Matt van de Rijn; Glenn D. Rosen; Charles M. Perou; Richard I. Whyte; Russ B. Altman; Patrick O. Brown; David Botstein; Iver Petersen

The global gene expression profiles for 67 human lung tumors representing 56 patients were examined by using 24,000-element cDNA microarrays. Subdivision of the tumors based on gene expression patterns faithfully recapitulated morphological classification of the tumors into squamous, large cell, small cell, and adenocarcinoma. The gene expression patterns made possible the subclassification of adenocarcinoma into subgroups that correlated with the degree of tumor differentiation as well as patient survival. Gene expression analysis thus promises to extend and refine standard pathologic analysis.


Proceedings of the National Academy of Sciences of the United States of America | 2003

Endothelial cell diversity revealed by global expression profiling

Jen-Tsan Chi; Howard Y. Chang; Guttorm Haraldsen; Frode L. Jahnsen; Olga G. Troyanskaya; Dustin S. Chang; Zhen Wang; Stanley G. Rockson; Matt van de Rijn; David Botstein; Patrick O. Brown

The vascular system is locally specialized to accommodate widely varying blood flow and pressure and the distinct needs of individual tissues. The endothelial cells (ECs) that line the lumens of blood and lymphatic vessels play an integral role in the regional specialization of vascular structure and physiology. However, our understanding of EC diversity is limited. To explore EC specialization on a global scale, we used DNA microarrays to determine the expression profile of 53 cultured ECs. We found that ECs from different blood vessels and microvascular ECs from different tissues have distinct and characteristic gene expression profiles. Pervasive differences in gene expression patterns distinguish the ECs of large vessels from microvascular ECs. We identified groups of genes characteristic of arterial and venous endothelium. Hey2, the human homologue of the zebrafish gene gridlock, was selectively expressed in arterial ECs and induced the expression of several arterial-specific genes. Several genes critical in the establishment of left/right asymmetry were expressed preferentially in venous ECs, suggesting coordination between vascular differentiation and body plan development. Tissue-specific expression patterns in different tissue microvascular ECs suggest they are distinct differentiated cell types that play roles in the local physiology of their respective organs and tissues.


Proceedings of the National Academy of Sciences of the United States of America | 2003

A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae)

Olga G. Troyanskaya; Kara Dolinski; Art B. Owen; Russ B. Altman; David Botstein

Genomic sequencing is no longer a novelty, but gene function annotation remains a key challenge in modern biology. A variety of functional genomics experimental techniques are available, from classic methods such as affinity precipitation to advanced high-throughput techniques such as gene expression microarrays. In the future, more disparate methods will be developed, further increasing the need for integrated computational analysis of data generated by these studies. We address this problem with MAGIC (Multisource Association of Genes by Integration of Clusters), a general framework that uses formal Bayesian reasoning to integrate heterogeneous types of high-throughput biological data (such as large-scale two-hybrid screens and multiple microarray analyses) for accurate gene function prediction. The system formally incorporates expert knowledge about relative accuracies of data sources to combine them within a normative framework. MAGIC provides a belief level with its output that allows the user to vary the stringency of predictions. We applied MAGIC to Saccharomyces cerevisiae genetic and physical interactions, microarray, and transcription factor binding sites data and assessed the biological relevance of gene groupings using Gene Ontology annotations produced by the Saccharomyces Genome Database. We found that by creating functional groupings based on heterogeneous data types, MAGIC improved accuracy of the groupings compared with microarray analysis alone. We describe several of the biological gene groupings identified.
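The idea of combining data sources according to their relative accuracies and reporting a belief level can be illustrated with a small Python sketch. This is a naive Bayes simplification, not the Bayesian network used in the paper. The function name `posterior_functional_link`, the per-source hit and false-alarm rates, the prior, and the conditional independence of sources are all hypothetical assumptions.

```python
def posterior_functional_link(evidence, reliabilities, prior=0.05):
    """Belief that two genes are functionally related, given several sources.

    evidence:      dict source -> True/False (did this source link the pair?)
    reliabilities: dict source -> (hit_rate, false_alarm_rate), standing in
                   for expert knowledge about each source's accuracy.
    Assumes sources are conditionally independent (naive Bayes), which is a
    simplification made for this sketch.
    """
    odds = prior / (1.0 - prior)
    for source, observed in evidence.items():
        hit_rate, false_alarm_rate = reliabilities[source]
        if observed:
            odds *= hit_rate / false_alarm_rate
        else:
            odds *= (1.0 - hit_rate) / (1.0 - false_alarm_rate)
    return odds / (1.0 + odds)


# Hypothetical reliabilities for three kinds of data named in the abstract.
reliabilities = {
    "two_hybrid": (0.6, 0.05),
    "coexpression": (0.7, 0.20),
    "shared_tf_site": (0.5, 0.10),
}
belief = posterior_functional_link(
    {"two_hybrid": True, "coexpression": True, "shared_tf_site": False},
    reliabilities,
)
```

Thresholding `belief` at a higher or lower value is one way a user could vary the stringency of predictions, in the spirit of the belief level described above.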


Proceedings of the National Academy of Sciences of the United States of America | 2007

Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry

An Chi; Curtis Huttenhower; Lewis Y. Geer; Joshua J. Coon; John E. P. Syka; Dina L. Bai; Jeffrey Shabanowitz; Daniel J. Burke; Olga G. Troyanskaya; Donald F. Hunt

We present a strategy for the analysis of the yeast phosphoproteome that uses endo-Lys C as the proteolytic enzyme, immobilized metal affinity chromatography for phosphopeptide enrichment, a 90-min nanoflow-HPLC/electrospray-ionization MS/MS experiment for phosphopeptide fractionation and detection, gas phase ion/ion chemistry, electron transfer dissociation for peptide fragmentation, and the Open Mass Spectrometry Search Algorithm for phosphoprotein identification and assignment of phosphorylation sites. From a 30-μg (≈600 pmol) sample of total yeast protein, we identify 1,252 phosphorylation sites on 629 proteins. Identified phosphoproteins have expression levels that range from <50 to 1,200,000 copies per cell and are encoded by genes involved in a wide variety of cellular processes. We identify a consensus site that likely represents a motif for one or more uncharacterized kinases and show that yeast kinases, themselves, contain a disproportionately large number of phosphorylation sites. Detection of a pHis containing peptide from the yeast protein, Cdc10, suggests an unexpected role for histidine phosphorylation in septin biology. From diverse functional genomics data, we show that phosphoproteins have a higher number of interactions than an average protein and interact with each other more than with a random protein. They are also likely to be conserved across large evolutionary distances.


Bioinformatics | 2006

Hierarchical multi-label prediction of gene function

Zafer Barutcuoglu; Robert E. Schapire; Olga G. Troyanskaya

MOTIVATION: Assigning functions for unknown genes based on diverse large-scale data is a key task in functional genomics. Previous work on gene function prediction has addressed this problem using independent classifiers for each function. However, such an approach ignores the structure of functional class taxonomies, such as the Gene Ontology (GO). Over a hierarchy of functional classes, a group of independent classifiers where each one predicts gene membership to a particular class can produce a hierarchically inconsistent set of predictions, where for a given gene a specific class may be predicted positive while its inclusive parent class is predicted negative. Taking the hierarchical structure into account resolves such inconsistencies and provides an opportunity for leveraging all classifiers in the hierarchy to achieve higher specificity of predictions.

RESULTS: We developed a Bayesian framework for combining multiple classifiers based on the functional taxonomy constraints. Using a hierarchy of support vector machine (SVM) classifiers trained on multiple data types, we combined predictions in our Bayesian framework to obtain the most probable consistent set of predictions. Experiments show that over a 105-node subhierarchy of the GO, our Bayesian framework improves predictions for 93 nodes. As an additional benefit, our method also provides implicit calibration of SVM margin outputs to probabilities. Using this method, we make function predictions for multiple proteins, and experimentally confirm predictions for proteins involved in mitosis.

SUPPLEMENTARY INFORMATION: Results for the 105 selected GO classes and predictions for 1059 unknown genes are available at: http://function.princeton.edu/genesite/
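The hierarchical inconsistency described above, and the idea of choosing the most probable consistent set of labels, can be shown on a toy hierarchy. The Python sketch below is a brute-force illustration, not the paper's Bayesian network. It assumes each classifier output has already been converted to a probability, treats nodes as independent apart from the parent-child constraint, and enumerates every labeling, so it is only practical for a handful of nodes. The name `most_probable_consistent` is hypothetical.

```python
from itertools import product

def most_probable_consistent(parents, p_pos):
    """Pick the highest-scoring labeling in which no child outranks its parent.

    parents: dict node -> parent node (None for the root)
    p_pos:   dict node -> assumed probability that the gene belongs to the node
    """
    nodes = list(parents)
    best_labels, best_score = None, -1.0
    for bits in product([0, 1], repeat=len(nodes)):
        labels = dict(zip(nodes, bits))
        # Consistency rule: a positive child requires a positive parent.
        inconsistent = any(
            labels[n] and parents[n] is not None and not labels[parents[n]]
            for n in nodes
        )
        if inconsistent:
            continue
        score = 1.0
        for n in nodes:
            score *= p_pos[n] if labels[n] else 1.0 - p_pos[n]
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels


# Thresholding each node at 0.5 would give parent=0, child=1 (inconsistent).
parents = {"cell cycle": None, "mitosis": "cell cycle"}
p_pos = {"cell cycle": 0.4, "mitosis": 0.9}
labels = most_probable_consistent(parents, p_pos)
```

In this toy case the strong evidence for the child class pulls the parent class to positive as well, which is the kind of correction a hierarchy-aware combination is meant to provide.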


Proceedings of the National Academy of Sciences of the United States of America | 2003

Systemic and cell type-specific gene expression patterns in scleroderma skin

Michael L. Whitfield; Deborah Finlay; John I. Murray; Olga G. Troyanskaya; Jen-Tsan Chi; Timothy H. McCalmont; Patrick O. Brown; David Botstein; M. Kari Connolly

We used DNA microarrays representing >12,000 human genes to characterize gene expression patterns in skin biopsies from individuals with a diagnosis of systemic sclerosis with diffuse scleroderma. We found consistent differences in the patterns of gene expression between skin biopsies from individuals with scleroderma and those from normal, unaffected individuals. The biopsies from affected individuals showed nearly indistinguishable patterns of gene expression in clinically affected and clinically unaffected tissue, even though these were clearly distinguishable from the patterns found in similar tissue from unaffected individuals. Genes characteristically expressed in endothelial cells, B lymphocytes, and fibroblasts showed differential expression between scleroderma and normal biopsies. Analysis of lymphocyte populations in scleroderma skin biopsies by immunohistochemistry suggests that the B lymphocyte signature observed on our arrays is from CD20+ B cells. These results provide evidence that scleroderma has systemic manifestations that affect multiple cell types and suggest genes that could be used as potential markers for the disease.


Nature Methods | 2015

Predicting effects of noncoding variants with deep learning-based sequence model

Jian Zhou; Olga G. Troyanskaya

Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.
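One common way to score a single-nucleotide change with a sequence model is to predict on the reference and the altered sequence and compare the two outputs. The Python sketch below illustrates only that comparison. The `toy_model` function is a hypothetical stand-in based on GC content, not DeepSEA, and the names `one_hot` and `variant_effect` are assumptions introduced for this example.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) matrix with columns A, C, G, T."""
    idx = [BASES.index(b) for b in seq.upper()]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def toy_model(x):
    """Hypothetical placeholder for a trained chromatin-feature predictor.

    Returns a value in (0, 1) that rises with the G/C fraction of the input.
    """
    gc_fraction = (x[:, 1].sum() + x[:, 2].sum()) / len(x)
    return 1.0 / (1.0 + np.exp(-4.0 * (gc_fraction - 0.5)))

def variant_effect(seq, pos, alt, model=toy_model):
    """Difference in model output between the altered and reference sequence."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return model(one_hot(alt_seq)) - model(one_hot(seq))
```

With a real trained model substituted for `toy_model`, `variant_effect(sequence, position, alt_base)` would give one signed score per predicted feature for that substitution.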


Genome Biology | 2008

A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

Lourdes Peña-Castillo; Murat Tasan; Chad L. Myers; Hyunju Lee; Trupti Joshi; Chao Zhang; Yuanfang Guan; Michele Leone; Andrea Pagnani; Wan-Kyu Kim; Chase Krumpelman; Weidong Tian; Guillaume Obozinski; Yanjun Qi; Guan Ning Lin; Gabriel F. Berriz; Francis D. Gibbons; Gert R. G. Lanckriet; Jian-Ge Qiu; Charles E. Grant; Zafer Barutcuoglu; David P. Hill; David Warde-Farley; Chris Grouios; Debajyoti Ray; Judith A. Blake; Minghua Deng; Michael I. Jordan; William Stafford Noble; Quaid Morris

Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%.

Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
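The "precision at a recall rate of 20%" figure quoted above is a ranking metric. As a minimal Python sketch of how such a number can be computed for one GO term, the function below walks down the predictions from highest to lowest score and reports precision at the first point where the recall target is reached. The function name, the handling of ties (input order is kept), and the example data are assumptions and do not reproduce the evaluation code used in the study.

```python
def precision_at_recall(scores, labels, recall_target=0.2):
    """Precision at the first rank where recall reaches recall_target.

    scores: predicted confidence for each gene (higher means more confident)
    labels: 1 if the gene truly has the GO term, else 0
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    for n, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= recall_target:
            return true_pos / n
    return true_pos / len(ranked)


# Toy example with made-up scores for five genes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [0, 1, 0, 1, 1]
p20 = precision_at_recall(scores, labels, recall_target=0.2)
```

Averaging this value over many GO terms would give a summary comparable in form to the one reported in the abstract.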

Collaboration


Dive into Olga G. Troyanskaya's collaborations.

Top Co-Authors

Kai Li

Princeton University

Casey S. Greene

University of Pennsylvania
