Publication


Featured research published by Edoardo M. Airoldi.


Biometrika | 2012

Stochastic blockmodels with a growing number of classes.

David S. Choi; Patrick J. Wolfe; Edoardo M. Airoldi

We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size. We also establish finite-sample confidence bounds on maximum-likelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising self-reported school friendships, resulting in block estimates that reveal residual structure.
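As a rough illustration of the setting in this abstract, the following sketch simulates a network whose edges are independent Bernoulli draws and then computes the maximum-likelihood estimate of each block's edge probability. It assumes the class assignments are known, which is a simplification: the paper studies fitting when assignments must be estimated as well. The function names (`simulate_sbm`, `fit_block_probs`), the two-class probability matrix, and the network size are hypothetical choices made for this example, not taken from the paper.

```python
import random


def simulate_sbm(z, B, rng):
    """Draw an undirected graph where edge (i, j) ~ Bernoulli(B[z[i]][z[j]]), independently."""
    n = len(z)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[z[i]][z[j]]:
                edges.add((i, j))
    return edges


def fit_block_probs(z, edges, k):
    """MLE of block edge probabilities given assignments z:
    observed edges divided by possible pairs, per pair of classes."""
    n = len(z)
    possible = [[0] * k for _ in range(k)]
    observed = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = sorted((z[i], z[j]))
            possible[a][b] += 1
            if (i, j) in edges:
                observed[a][b] += 1
    est = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            if possible[a][b]:
                est[a][b] = est[b][a] = observed[a][b] / possible[a][b]
    return est


# Hypothetical example: 200 nodes split evenly into two classes.
sbm_rng = random.Random(0)
z_true = [i % 2 for i in range(200)]
B_true = [[0.30, 0.05], [0.05, 0.25]]
sbm_edges = simulate_sbm(z_true, B_true, sbm_rng)
B_hat = fit_block_probs(z_true, sbm_edges, 2)
```

With a few thousand node pairs per block, each entry of `B_hat` should land close to the corresponding entry of `B_true`; shrinking the network widens that gap, which is the finite-sample behaviour the abstract's confidence bounds address.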


Cell | 2015

Reversible, Specific, Active Aggregates of Endogenous Proteins Assemble upon Heat Stress

Edward Wallace; Jamie L. Kear-Scott; Evgeny V. Pilipenko; Michael H. Schwartz; Pawel R. Laskowski; Alexandra E. Rojek; Christopher D. Katanski; Joshua A. Riback; Michael F. Dion; Alexander Franks; Edoardo M. Airoldi; Tao Pan; Bogdan Budnik; D. Allan Drummond

Heat causes protein misfolding and aggregation and, in eukaryotic cells, triggers aggregation of proteins and RNA into stress granules. We have carried out extensive proteomic studies to quantify heat-triggered aggregation and subsequent disaggregation in budding yeast, identifying >170 endogenous proteins aggregating within minutes of heat shock in multiple subcellular compartments. We demonstrate that these aggregated proteins are not misfolded and destined for degradation. Stable-isotope labeling reveals that even severely aggregated endogenous proteins are disaggregated without degradation during recovery from shock, contrasting with the rapid degradation observed for many exogenous thermolabile proteins. Although aggregation likely inactivates many cellular proteins, in the case of a heterotrimeric aminoacyl-tRNA synthetase complex, the aggregated proteins remain active with unaltered fidelity. We propose that most heat-induced aggregation of mature proteins reflects the operation of an adaptive, autoregulatory process of functionally significant aggregate assembly and disassembly that aids cellular adaptation to thermal stress.


PLOS Computational Biology | 2009

Predicting Cellular Growth from Gene Expression Signatures

Edoardo M. Airoldi; Curtis Huttenhower; David Gresham; Charles Lu; Amy A. Caudy; Maitreya J. Dunham; James R. Broach; David Botstein; Olga G. Troyanskaya

Maintaining balanced growth in a changing environment is a fundamental systems-level challenge for cellular physiology, particularly in microorganisms. While the complete set of regulatory and functional pathways supporting growth and cellular proliferation are not yet known, portions of them are well understood. In particular, cellular proliferation is governed by mechanisms that are highly conserved from unicellular to multicellular organisms, and the disruption of these processes in metazoans is a major factor in the development of cancer. In this paper, we develop statistical methodology to identify quantitative aspects of the regulatory mechanisms underlying cellular proliferation in Saccharomyces cerevisiae. We find that the expression levels of a small set of genes can be exploited to predict the instantaneous growth rate of any cellular culture with high accuracy. The predictions obtained in this fashion are robust to changing biological conditions, experimental methods, and technological platforms. The proposed model is also effective in predicting growth rates for the related yeast Saccharomyces bayanus and the highly diverged yeast Schizosaccharomyces pombe, suggesting that the underlying regulatory signature is conserved across a wide range of unicellular evolution. We investigate the biological significance of the gene expression signature that the predictions are based upon from multiple perspectives: by perturbing the regulatory network through the Ras/PKA pathway, observing strong upregulation of growth rate even in the absence of appropriate nutrients, and discovering putative transcription factor binding sites, observing enrichment in growth-correlated genes. More broadly, the proposed methodology enables biological insights about growth at an instantaneous time scale, inaccessible by direct experimental methods. Data and tools enabling others to apply our methods are available at http://function.princeton.edu/growthrate.


SIAM International Conference on Data Mining | 2011

Block-LDA: Jointly modeling entity-annotated text and entity-entity links.

Edoardo M. Airoldi; David M. Blei; Elena A. Erosheva; Stephen E. Fienberg; Ramnath Balasubramanyan; William W. Cohen

Identifying latent groups of entities from observed interactions between pairs of entities is a frequently encountered problem in areas like analysis of protein interactions and social networks. We present a model that combines aspects of mixed membership stochastic block models and topic models to improve entity-entity link modeling by jointly modeling links and text about the entities that are linked. We apply the model to two datasets: a protein-protein interaction (PPI) dataset supplemented with a corpus of abstracts of scientific publications annotated with the proteins in the PPI dataset and an Enron email corpus. The model is evaluated by inspecting induced topics to understand the nature of the data and by quantitative methods such as functional category prediction of proteins and perplexity which exhibit improvements when joint modeling is used over baselines that use only link or text information.


PLOS Genetics | 2015

Accounting for Experimental Noise Reveals That mRNA Levels, Amplified by Post-Transcriptional Processes, Largely Determine Steady-State Protein Levels in Yeast

Gábor Csárdi; Alexander Franks; David S. Choi; Edoardo M. Airoldi; D. Allan Drummond

Cells respond to their environment by modulating protein levels through mRNA transcription and post-transcriptional control. Modest observed correlations between global steady-state mRNA and protein measurements have been interpreted as evidence that mRNA levels determine roughly 40% of the variation in protein levels, indicating dominant post-transcriptional effects. However, the techniques underlying these conclusions, such as correlation and regression, yield biased results when data are noisy, missing systematically, and collinear (all properties of mRNA and protein measurements), which motivated us to revisit this subject. Noise-robust analyses of 24 studies of budding yeast reveal that mRNA levels explain more than 85% of the variation in steady-state protein levels. Protein levels are not proportional to mRNA levels, but rise much more rapidly. Regulation of translation suffices to explain this nonlinear effect, revealing post-transcriptional amplification of, rather than competition with, transcriptional signals. These results substantially revise widely credited models of protein-level regulation, and introduce multiple noise-aware approaches essential for proper analysis of many biological phenomena.
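To see why a naive correlation can understate how much one quantity determines another, consider the classical attenuation effect: independent measurement noise in both variables shrinks the observed correlation by the square root of the product of their reliabilities. The sketch below is a toy simulation of that effect only, not the noise-robust analysis used in the paper. The variable names, noise levels, and sample size are assumptions made for illustration, and the reliabilities are taken as known, which is rarely the case with real measurements.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


noise_rng = random.Random(42)
n_samples = 20000

# Hypothetical latent signal: the two true quantities are perfectly correlated.
signal = [noise_rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Each quantity is measured with independent unit-variance noise.
noisy_x = [s + noise_rng.gauss(0.0, 1.0) for s in signal]
noisy_y = [s + noise_rng.gauss(0.0, 1.0) for s in signal]

r_observed = pearson(noisy_x, noisy_y)

# Reliability = signal variance / (signal variance + noise variance) = 1 / (1 + 1).
reliability_x = 0.5
reliability_y = 0.5

# Classical correction for attenuation (assumes reliabilities are known).
r_corrected = r_observed / math.sqrt(reliability_x * reliability_y)
```

In this setup the observed correlation is about 0.5 in expectation even though the underlying quantities are perfectly correlated, so the naive squared correlation would suggest only about a quarter of the variance is explained.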


Cell Reports | 2015

Differential Stoichiometry among Core Ribosomal Proteins.

Nikolai Slavov; Stefan Semrau; Edoardo M. Airoldi; Bogdan Budnik; Alexander van Oudenaarden

Understanding the regulation and structure of ribosomes is essential to understanding protein synthesis and its dysregulation in disease. While ribosomes are believed to have a fixed stoichiometry among their core ribosomal proteins (RPs), some experiments suggest a more variable composition. Testing such variability requires direct and precise quantification of RPs. We used mass spectrometry to directly quantify RPs across monosomes and polysomes of mouse embryonic stem cells (ESC) and budding yeast. Our data show that the stoichiometry among core RPs in wild-type yeast cells and ESC depends both on the growth conditions and on the number of ribosomes bound per mRNA. Furthermore, we find that the fitness of cells with a deleted RP-gene is inversely proportional to the enrichment of the corresponding RP in polysomes. Together, our findings support the existence of ribosomes with distinct protein composition and physiological function.


Proceedings of the 3rd international workshop on Link discovery | 2005

A latent mixed membership model for relational data

Edoardo M. Airoldi; David M. Blei; Eric P. Xing; Stephen E. Fienberg

Modeling relational data is an important problem for modern data analysis and machine learning. In this paper we propose a Bayesian model that uses a hierarchy of probabilistic assumptions about the way objects interact with one another in order to learn latent groups, their typical interaction patterns, and the degree of membership of objects to groups. Our model explains the data using a small set of parameters that can be reliably estimated with an efficient inference algorithm. In our approach, the set of probabilistic assumptions may be tailored to a specific application domain in order to incorporate intuitions and/or semantics of interest. We demonstrate our methods on simulated data and we successfully apply our model to a data set of protein-to-protein interactions.
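The hierarchy of assumptions described in this abstract can be sketched as a generative simulation. The version below follows one common way of writing a mixed-membership blockmodel: each object draws a membership vector from a Dirichlet distribution, each ordered pair of objects picks one group per participant from those vectors, and the interaction is a Bernoulli draw governed by the chosen pair of groups. This is an assumption-laden sketch; the paper's exact specification and its inference algorithm may differ, and the names `sample_dirichlet`, `sample_category`, and `simulate_mixed_membership`, as well as all parameter values, are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_category(probs, rng):
    """Draw an index k with probability probs[k]."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def simulate_mixed_membership(n, alpha, B, rng):
    """Simulate directed interactions Y[p][q] among n objects."""
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n)]
    Y = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue  # no self-interactions in this sketch
            g = sample_category(memberships[p], rng)  # group p adopts toward q
            h = sample_category(memberships[q], rng)  # group q adopts toward p
            Y[p][q] = 1 if rng.random() < B[g][h] else 0
    return memberships, Y


# Hypothetical example: 30 objects, 3 latent groups, mostly within-group interaction.
mm_rng = random.Random(1)
group_interaction = [
    [0.80, 0.05, 0.05],
    [0.05, 0.80, 0.05],
    [0.05, 0.05, 0.80],
]
memberships, Y = simulate_mixed_membership(30, [0.5, 0.5, 0.5], group_interaction, mm_rng)
```

Small Dirichlet parameters such as 0.5 tend to concentrate each object's membership on a few groups, while larger values spread membership more evenly; fitting the model means inverting this process to recover `memberships` and `group_interaction` from `Y`.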


Journal of the American Statistical Association | 2016

A Model of Text for Experimentation in the Social Sciences

Margaret E. Roberts; Brandon M. Stewart; Edoardo M. Airoldi

Statistical models of text have become increasingly popular in statistics and computer science as a method of exploring large document collections. Social scientists often want to move beyond exploration, to measurement and experimentation, and make inference about social and political processes that drive discourse and content. In this article, we develop a model of text data that supports this type of substantive research. Our approach is to posit a hierarchical mixed membership model for analyzing topical content of documents, in which mixing weights are parameterized by observed covariates. In this model, topical prevalence and topical content are specified as a simple generalized linear model on an arbitrary number of document-level covariates, such as news source and time of release, enabling researchers to introduce elements of the experimental design that informed document collection into the model, within a generally applicable framework. We demonstrate the proposed methodology by analyzing a collection of news reports about China, where we allow the prevalence of topics to evolve over time and vary across newswire services. Our methods quantify the effect of newswire source on both the frequency and nature of topic coverage. Supplementary materials for this article are available online.


PLOS Computational Biology | 2007

Getting Started in Probabilistic Graphical Models

Edoardo M. Airoldi

Probabilistic graphical models offer a common conceptual architecture where biological and mathematical objects can be expressed with a common, intuitive formalism. This enables effective communication between scientists across the mathematical divide by fostering substantive debate in the context of a scientific problem, and ultimately facilitates the joint development of statistical and computational tools for quantitative data analysis. A number of success stories have appeared over the years [1–4]. Today, probabilistic graphical models promise to play a major role in the resolution of many intriguing conundrums in the biological sciences. The goal of this short article is to be a dense, informative introduction to the language of probabilistic graphical models, for beginners, with pointers to successful applications in selected areas of biology. The exposition introduces the essential concepts involved in PGMs in the context of the various stages of a typical collaboration between natural and computational scientists, and discusses the aspects to which each scientist should contribute to carry out the data analysis successfully using PGMs. Let us start by considering a specific problem in transcriptional regulation. Given measurements about the abundance of gene transcripts in retinal cells across stages of development, we would like to discover which functional processes are relevant for development, and reveal which ones are most important at which stage. To develop a PGM to address this problem, we begin by identifying the biological objects that would appear in a cartoon model of how cellular development impacts transcription. In this illustrative example, we have genes and functional processes/contexts. It is reasonable to assume that each gene will participate in multiple functional processes, although typically in a small number of them, and that not all functional processes will be important at all stages of development. 
We then assess what aspects of the problem we can probe directly, with experimental techniques, and what aspects we cannot. In the example, while an abundance of gene transcripts can be obtained, for instance, via SAGE (serial analysis of gene expression), it is harder to measure functional processes. However, the latter could be operationally defined as sets of genes that share a similar temporal regulation pattern; this definition has the advantage of creating a connection between membership of genes to functional processes (i.e., an unobservable mapping) and similarity of the temporal expression profiles (i.e., observable quantities). The establishment of connections between those biological objects that we can probe and those that we cannot ends a first conceptual effort. A cartoon model of how cellular development impacts transcription is now specified in terms of genes and their abundance, functional processes, and membership of genes to functional processes. Next we translate the biological players and the connections we established among them into mathematical quantities (i.e., random variables) and connections among them (i.e., statistical dependencies). This translation specifies the model structure. At this stage, we rely on biological intuitions to fine-tune the model, for instance, by deciding which sources of variability in the measurements carry information about the latent variables and which do not—if the temporal expression profiles of genes A and B are similar on a relative scale, but their absolute abundance is quite different, should we believe that they both participate in the same functional processes? Last, we assign numerical values to those quantities that are unknown in the final model specifications (i.e., we fit the model to the data) and we use them to develop biological intuitions in the context of the original problem. (Functional aspects of retinal development, in mouse, are fully addressed in [5].) 
In the following, we briefly introduce the basic mathematical quantities that enable the translation of a cartoon model of biology into a PGM, and we review strategies to assign numerical values to the unknown quantities underlying any PGM that are most likely given the observations. We conclude with an overview of selected applications, complete with pointers to published work.


eLife | 2014

Musashi proteins are post-transcriptional regulators of the epithelial-luminal cell state

Yarden Katz; Feifei Li; Nicole J. Lambert; Ethan S. Sokol; Wai Leong Tam; Albert W. Cheng; Edoardo M. Airoldi; Christopher J. Lengner; Piyush B. Gupta; Zhengquan Yu; Rudolf Jaenisch; Christopher B. Burge

The conserved Musashi (Msi) family of RNA binding proteins are expressed in stem/progenitor and cancer cells, but generally absent from differentiated cells, consistent with a role in cell state regulation. We found that Msi genes are rarely mutated but frequently overexpressed in human cancers and are associated with an epithelial-luminal cell state. Using ribosome profiling and RNA-seq analysis, we found that Msi proteins regulate translation of genes implicated in epithelial cell biology and epithelial-to-mesenchymal transition (EMT), and promote an epithelial splicing pattern. Overexpression of Msi proteins inhibited the translation of Jagged1, a factor required for EMT, and repressed EMT in cell culture and in mammary gland in vivo. Knockdown of Msis in epithelial cancer cells promoted loss of epithelial identity. Our results show that mammalian Msi proteins contribute to an epithelial gene expression program in neural and mammary cell types. DOI: http://dx.doi.org/10.7554/eLife.03915.001

Collaboration


Dive into Edoardo M. Airoldi's collaborations.

Top Co-Authors

Eric P. Xing

Carnegie Mellon University

Xue Bai

University of Connecticut
