Sach Mukherjee
German Center for Neurodegenerative Diseases
Publication
Featured research published by Sach Mukherjee.
Nature Communications | 2014
Rehan Akbani; Patrick Kwok Shing Ng; Henrica Maria Johanna Werner; Maria Shahmoradgoli; Fan Zhang; Zhenlin Ju; Wenbin Liu; Ji Yeon Yang; Kosuke Yoshihara; Jun Li; Shiyun Ling; Elena G. Seviour; Prahlad T. Ram; John D. Minna; Lixia Diao; Pan Tong; John V. Heymach; Steven M. Hill; Frank Dondelinger; Nicolas Städler; Lauren Averett Byers; Funda Meric-Bernstam; John N. Weinstein; Bradley M. Broom; Roeland Verhaak; Han Liang; Sach Mukherjee; Yiling Lu; Gordon B. Mills
Protein levels and function are poorly predicted by genomic and transcriptomic analysis of patient tumors. Therefore, direct study of the functional proteome has the potential to provide a wealth of information that complements and extends genomic, epigenomic and transcriptomic analysis in The Cancer Genome Atlas (TCGA) projects. Here we use reverse-phase protein arrays to analyze 3,467 patient samples from 11 TCGA “Pan-Cancer” diseases, using 181 high-quality antibodies that target 128 total proteins and 53 post-translationally modified proteins. The resultant proteomic data is integrated with genomic and transcriptomic analyses of the same samples to identify commonalities, differences, emergent pathways and network biology within and across tumor lineages. In addition, tissue-specific signals are reduced computationally to enhance biomarker and target discovery spanning multiple tumor lineages. This integrative analysis, with an emphasis on pathways and potentially actionable proteins, provides a framework for determining the prognostic, predictive and therapeutic relevance of the functional proteome.
Proceedings of the National Academy of Sciences of the United States of America | 2008
Sach Mukherjee; Terence P. Speed
Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of “network inference” is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.
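The abstract above does not state the functional form of the graph priors, so the following is only an illustrative sketch under assumed notation: the concordance functions $f_i$, weights $w_i$, strength parameter $\lambda$, prior edge set $E^{+}$, and proposal distribution $Q$ are all introduced here for illustration. One common way to encode prior beliefs about network features is a log-linear prior over graphs, combined with a Metropolis-Hastings sampler over structures:

```latex
% Log-linear prior over graphs G (assumed form, for illustration only)
P(G) \propto \exp\Big( \lambda \sum_{i} w_i \, f_i(G) \Big)

% Example concordance function: penalize edges absent from a prior edge set E^{+}
f(G) = - \big| E(G) \setminus E^{+} \big|

% Posterior over structures given data X
P(G \mid X) \propto p(X \mid G) \, P(G)

% Metropolis-Hastings acceptance probability for a proposed move G \to G'
\alpha(G \to G') = \min\left\{ 1,\;
  \frac{p(X \mid G') \, P(G') \, Q(G \mid G')}
       {p(X \mid G) \, P(G) \, Q(G' \mid G)} \right\}
```

With $\lambda = 0$ the prior is flat over graphs; larger values of $\lambda$ pull the posterior more strongly toward graphs that agree with the prior information.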
Bioinformatics | 2012
Steven M. Hill; Yiling Lu; Jennifer R. Molina; Laura M. Heiser; Paul T. Spellman; Terence P. Speed; Joe W. Gray; Gordon B. Mills; Sach Mukherjee
MOTIVATION Protein signaling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. To shed light on signaling network topology in specific contexts, such as cancer, requires interrogation of multiple proteins through time and statistical approaches to make inferences regarding network structure. RESULTS In this study, we use dynamic Bayesian networks to make inferences regarding network structure and thereby generate testable hypotheses. We incorporate existing biology using informative network priors, weighted objectively by an empirical Bayes approach, and exploit a connection between variable selection and network inference to enable exact calculation of posterior probabilities of interest. The approach is computationally efficient and essentially free of user-set tuning parameters. Results on data where the true, underlying network is known place the approach favorably relative to existing approaches. We apply these methods to reverse-phase protein array time-course data from a breast cancer cell line (MDA-MB-468) to predict signaling links that we independently validate using targeted inhibition. The methods proposed offer a general approach by which to elucidate molecular networks specific to biological context, including, but not limited to, human cancers. AVAILABILITY http://mukherjeelab.nki.nl/DBN (code and data).
Nature Methods | 2016
Steven M. Hill; Laura M. Heiser; Thomas Cokelaer; Michael Unger; Nicole K. Nesser; Daniel E. Carlin; Yang Zhang; Artem Sokolov; Evan O. Paull; Christopher K. Wong; Kiley Graim; Adrian Bivol; Haizhou Wang; Fan Zhu; Bahman Afsari; Ludmila Danilova; Alexander V. Favorov; Wai Shing Lee; Dane Taylor; Chenyue W. Hu; Byron L. Long; David P. Noren; Alexander J Bisberg; Gordon B. Mills; Joe W. Gray; Michael R. Kellen; Thea Norman; Stephen H. Friend; Amina A. Qutub; Elana J. Fertig
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
Bioinformatics | 2010
Steven John Kiddle; Oliver P. Windram; Stuart McHattie; A. Mead; Jim Beynon; Vicky Buchanan-Wollaston; Katherine J. Denby; Sach Mukherjee
MOTIVATION Identifying regulatory modules is an important task in the exploratory analysis of gene expression time series data. Clustering algorithms are often used for this purpose. However, gene regulatory events may induce complex temporal features in a gene expression profile, including time delays, inversions and transient correlations, which are not well accounted for by current clustering methods. As the cost of microarray experiments continues to fall, the temporal resolution of time course studies is increasing. This has led to a need to take account of detailed temporal features of this kind. Thus, while standard clustering methods are both widely used and much studied, their shared shortcomings with respect to such temporal features motivate the work presented here. RESULTS Here, we introduce a temporal clustering approach for high-dimensional gene expression data which takes account of time delays, inversions and transient correlations. We do so by exploiting a recently introduced, message-passing-based algorithm called Affinity Propagation (AP). We take account of temporal features of interest following an approximate but efficient dynamic programming approach due to Qian et al. The resulting approach is demonstrably effective in its ability to discern non-obvious temporal features, yet efficient and robust enough for routine use as an exploratory tool. We show results on validated transcription factor-target pairs in yeast and on gene expression data from a study of Arabidopsis thaliana under pathogen infection. The latter reveals a number of biologically striking findings. AVAILABILITY Matlab code for our method is available at http://www.wsbc.warwick.ac.uk/stevenkiddle/tcap.html.
The Annals of Applied Statistics | 2012
Chris J. Oates; Sach Mukherjee
Network inference approaches are now widely used in biological applications to probe regulatory relationships between molecular components such as genes or proteins. Many methods have been proposed for this setting, but the connections and differences between their statistical formulations have received less attention. In this paper, we show how a broad class of statistical network inference methods, including a number of existing approaches, can be described in terms of variable selection for the linear model. This reveals some subtle but important differences between the methods, including the treatment of time intervals in discretely observed data. In developing a general formulation, we also explore the relationship between single-cell stochastic dynamics and network inference on averages over cells. This clarifies the link between biochemical networks as they operate at the cellular level and network inference as carried out on data that are averages over populations of cells. We present empirical results, comparing thirty-two network inference methods that are instances of the general formulation we describe, using two published dynamical models. Our investigation sheds light on the applicability and limitations of network inference and provides guidance for practitioners and suggestions for experimental design.
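To make the idea of network inference as variable selection for the linear model concrete, here is a minimal sketch. It is not the authors' formulation: the lag-1 linear model, the BIC score, the exhaustive search over small parent sets, and the names `bic_linear`, `infer_parents` and `simulate` are all assumptions chosen for illustration. For each variable, the code selects the set of lagged predictors that gives the best-scoring linear regression, and reads the selected predictors as incoming edges.

```python
import itertools
import numpy as np


def bic_linear(y, X):
    """BIC (up to a constant) of a Gaussian linear model y ~ X; X includes the intercept."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = max(float(np.sum((y - X @ beta) ** 2)), 1e-12)  # guard against log(0)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def infer_parents(data, max_parents=2):
    """Select lag-1 parents for each variable by exhaustive BIC search.

    data: array of shape (T, p), one row per time point.
    Returns a dict mapping each child index to a tuple of parent indices.
    """
    T, p = data.shape
    past, future = data[:-1], data[1:]
    ones = np.ones((T - 1, 1))
    parents = {}
    for j in range(p):
        best_score, best_set = np.inf, ()
        for k in range(max_parents + 1):
            for subset in itertools.combinations(range(p), k):
                X = np.hstack([ones, past[:, list(subset)]])
                score = bic_linear(future[:, j], X)
                if score < best_score:
                    best_score, best_set = score, subset
        parents[j] = best_set
    return parents


def simulate(T=200, seed=0):
    """Toy linear chain 0 -> 1 -> 2 with lag-1 dependence (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T, 3))
    x[0] = rng.normal(size=3)
    for t in range(1, T):
        x[t, 0] = 0.5 * x[t - 1, 0] + rng.normal(scale=0.5)
        x[t, 1] = 0.8 * x[t - 1, 0] + rng.normal(scale=0.5)
        x[t, 2] = -0.7 * x[t - 1, 1] + rng.normal(scale=0.5)
    return x


if __name__ == "__main__":
    print(infer_parents(simulate()))
```

Capping the parent-set size keeps the exhaustive search tractable; different scores, lag structures, or treatments of the sampling interval would yield different members of the same family of methods.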
Oncogene | 2016
Elena G. Seviour; Vasudha Sehgal; Yiling Lu; Z. Luo; Tyler Moss; Fahao Zhang; S. M. Hill; W. Liu; S. N. Maiti; L. Cooper; R. Azencot; Gabriel Lopez-Berestein; Cristian Rodriguez-Aguayo; R. Roopaimoole; Chad V. Pecot; Anil K. Sood; Sach Mukherjee; Joe W. Gray; Gordon B. Mills; Prahlad T. Ram
The myc oncogene is overexpressed in almost half of all breast and ovarian cancers, but attempts at therapeutic interventions against myc have proven to be challenging. Myc regulates multiple biological processes, including the cell cycle, and as such is associated with cell proliferation and tumor progression. We identified a protein signature of high myc, low p27 and high phospho-Rb significantly correlated with poor patient survival in breast and ovarian cancers. Screening of a miRNA library by functional proteomics in multiple cell lines and integration of data from patient tumors revealed a panel of five microRNAs (miRNAs) (miR-124, miR-365, miR-34b*, miR-18a and miR-506) as potential tumor suppressors capable of reversing the p27/myc/phospho-Rb protein signature. Mechanistic studies revealed an RNA-activation function of miR-124 resulting in direct induction of p27 protein levels by binding to and inducing transcription on the p27 promoter region leading to a subsequent G1 arrest. Additionally, in vivo studies utilizing a xenograft model demonstrated that nanoparticle-mediated delivery of miR-124 could reduce tumor growth and sensitize cells to etoposide, suggesting a clinical application of miRNAs as therapeutics to target the functional effect of myc on tumor growth.
Bioinformatics | 2014
Chris J. Oates; Frank Dondelinger; Nora Bayani; James E. Korkola; Joe W. Gray; Sach Mukherjee
Motivation: Networks are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of biochemical systems are generally non-linear, suggesting that suitable non-linear formulations may offer gains with respect to causal network inference and aid in associated prediction problems. Results: We present a general framework for network inference and dynamical prediction using time course data that is rooted in non-linear biochemical kinetics. This is achieved by considering a dynamical system based on a chemical reaction graph with associated kinetic parameters. Both the graph and kinetic parameters are treated as unknown; inference is carried out within a Bayesian framework. This allows prediction of dynamical behavior even when the underlying reaction graph itself is unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that non-linear formulations can yield gains in causal network inference and permit dynamical prediction and uncertainty quantification in the challenging setting where the reaction graph is unknown. Availability and implementation: MATLAB R2014a software is available to download from warwick.ac.uk/chrisoates. Supplementary information: Supplementary data are available at Bioinformatics online.
BMC Bioinformatics | 2012
Steven M. Hill; Richard M. Neve; Nora Bayani; Wen Lin Kuo; Safiyyah Ziyad; Paul T. Spellman; Joe W. Gray; Sach Mukherjee
Background: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.
The Annals of Applied Statistics | 2014
Chris J. Oates; James E. Korkola; Joe W. Gray; Sach Mukherjee
Graphical models are widely used to make inferences concerning interplay in multivariate systems. In many applications, data are collected from multiple related but nonidentical units whose underlying networks may differ but are likely to share features. Here we present a hierarchical Bayesian formulation for joint estimation of multiple networks in this nonidentically distributed setting. The approach is general: given a suitable class of graphical models, it uses an exchangeability assumption on networks to provide a corresponding joint formulation. Motivated by emerging experimental designs in molecular biology, we focus on time-course data with interventions, using dynamic Bayesian networks as the graphical models. We introduce a computationally efficient, deterministic algorithm for exact joint inference in this setting. We provide an upper bound on the gains that joint estimation offers relative to separate estimation for each network and empirical results that support and extend the theory, including an extensive simulation study and an application to proteomic data from human cancer cell lines. Finally, we describe approximations that are still more computationally efficient than the exact algorithm and that also demonstrate good empirical performance.