Ritabrata Dutta
University of Lugano
Publications
Featured research published by Ritabrata Dutta.
Statistics and Computing | 2018
Michael U. Gutmann; Ritabrata Dutta; Samuel Kaski; Jukka Corander
Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods becomes thereby available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.
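The core idea of classifier-based discrepancy can be illustrated with a minimal toy sketch (our own construction, not the authors' implementation): a nearest-centroid classifier is trained to separate observed from simulated 1-D Gaussian samples, and its held-out accuracy serves as the discrepancy. Accuracy near 0.5 signals that simulated data resemble the observed data.

```python
import random
import statistics

def classification_accuracy(observed, simulated):
    """Discrepancy via classification: held-out accuracy of a
    nearest-centroid classifier separating observed from simulated.
    Accuracy near 0.5 means the samples are indistinguishable
    (small discrepancy); near 1.0 means easily separated (large)."""
    data = [(x, 0) for x in observed] + [(x, 1) for x in simulated]
    random.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    c0 = statistics.mean(x for x, y in train if y == 0)  # class centroids
    c1 = statistics.mean(x for x, y in train if y == 1)
    correct = sum(1 for x, y in test
                  if (abs(x - c0) <= abs(x - c1)) == (y == 0))
    return correct / len(test)

random.seed(0)
observed = [random.gauss(0.0, 1.0) for _ in range(500)]
close = [random.gauss(0.1, 1.0) for _ in range(500)]  # near-correct model
far = [random.gauss(3.0, 1.0) for _ in range(500)]    # badly wrong model
print(classification_accuracy(observed, close))  # near 0.5
print(classification_accuracy(observed, far))    # near 1.0
```

In the paper's framework any classifier could stand in for the nearest-centroid rule; the better the classifier separates the two samples, the larger the inferred discrepancy between parameter value and data.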
Systematic Biology | 2016
Jarno Lintusaari; Michael U. Gutmann; Ritabrata Dutta; Samuel Kaski; Jukka Corander
Bayesian inference plays an important role in phylogenetics, evolutionary biology, and in many other branches of science. It provides a principled framework for dealing with uncertainty and quantifying how it changes in the light of new evidence. For many complex models and inference problems, however, only approximate quantitative answers are obtainable. Approximate Bayesian computation (ABC) refers to a family of algorithms for approximate inference that makes a minimal set of assumptions by only requiring that sampling from a model is possible. We explain here the fundamentals of ABC, review the classical algorithms, and highlight recent developments. [ABC; approximate Bayesian computation; Bayesian inference; likelihood-free inference; phylogenetics; simulator-based models; stochastic simulation models; tree-based models.]
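The classical rejection-ABC algorithm the review starts from fits in a few lines. The sketch below is a generic illustration on a toy Gaussian-mean model of our own choosing (function names and summary statistic are assumptions, not the paper's): draw parameters from the prior, simulate, and keep draws whose simulated summary lands within a tolerance of the observed one.

```python
import random

def rejection_abc(observed, simulate, prior_sample, distance, eps, n_draws):
    """Basic rejection ABC: keep parameter draws whose simulated data
    fall within eps of the observed data under the chosen distance."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        if distance(simulate(theta), observed) < eps:
            accepted.append(theta)
    return accepted

# Toy model: infer the mean of a Gaussian with known sd = 1,
# summarising each dataset by its sample mean.
random.seed(1)
observed_data = [random.gauss(2.0, 1.0) for _ in range(100)]
obs_mean = sum(observed_data) / len(observed_data)
simulate = lambda mu: sum(random.gauss(mu, 1.0) for _ in range(100)) / 100

posterior = rejection_abc(
    observed=obs_mean,
    simulate=simulate,
    prior_sample=lambda: random.uniform(-5, 5),
    distance=lambda s, o: abs(s - o),
    eps=0.1,
    n_draws=20000,
)
print(len(posterior), sum(posterior) / len(posterior))  # posterior mean near 2.0
```

The accepted draws approximate the posterior; shrinking `eps` improves the approximation at the cost of a lower acceptance rate, which motivates the sequential refinements (e.g. population Monte Carlo variants) surveyed in the review.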
Proceedings of the Platform for Advanced Scientific Computing Conference | 2017
Ritabrata Dutta; Marcel Schoengens; Jukka-Pekka Onnela; Antonietta Mira
ABCpy is a highly modular scientific library for Approximate Bayesian Computation (ABC) written in Python. The main contribution of this paper is to document a software engineering effort that enables domain scientists to apply ABC to their research without being ABC experts; using ABCpy they can run large parallel simulations with little knowledge of parallelization and little additional effort to parallelize their code. Furthermore, ABCpy enables ABC experts to develop new inference schemes, evaluate them in a standardized environment, and extend the library with new algorithms. These benefits stem mainly from the modularity of ABCpy. We give an overview of the design of ABCpy and provide a performance evaluation concentrating on parallelization.
Bioinformatics | 2016
Paul Blomstedt; Ritabrata Dutta; Sohan Seth; Alvis Brazma; Samuel Kaski
Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case versus control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.
Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework, where each experiment is modelled separately and the retrieval is done by finding related models. For retrieval of gene expression experiments, we use a probabilistic model called the product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples. The suggested metric for retrieval using clusterings is the normalized information distance. Empirical results finally suggest that inference for the full probabilistic model can be approximated with good performance using computationally faster heuristic clustering approaches (e.g. k-means). The method is highly scalable and straightforward to apply to construct a general-purpose gene expression experiment retrieval method.
Availability and implementation: The method can be implemented using standard clustering algorithms and normalized information distance, available in many statistical software packages.
Contact: [email protected] or [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
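The normalized information distance between clusterings mentioned above can be computed from entropies and mutual information. The sketch below is a minimal stdlib illustration on toy label lists of our own choosing; definitions of NID vary in the literature, and we use the common convention 1 − I(U,V)/max(H(U), H(V)).

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a label list."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Mutual information (nats) between two clusterings of the same items."""
    n = len(u)
    joint = Counter(zip(u, v))
    pu, pv = Counter(u), Counter(v)
    return sum(c / n * log(c * n / (pu[a] * pv[b]))
               for (a, b), c in joint.items())

def normalized_information_distance(u, v):
    """NID = 1 - I(U,V)/max(H(U),H(V)): 0 for identical partitions
    (up to relabelling), 1 for independent ones."""
    h = max(entropy(u), entropy(v))
    return 1.0 - mutual_information(u, v) / h if h > 0 else 0.0

a = [0, 0, 1, 1, 2, 2]
print(normalized_information_distance(a, a))                   # ~0.0
print(normalized_information_distance(a, [1, 1, 2, 2, 0, 0]))  # ~0.0 (relabelled)
```

Because the distance depends only on the partitions, it is invariant to label permutations, which is what makes it usable for comparing clusterings produced by different models or heuristics such as k-means.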
arXiv: Applications | 2018
Ritabrata Dutta; Antonietta Mira; Jukka-Pekka Onnela
Infectious diseases are studied to understand their spreading mechanisms, to evaluate control strategies and to predict the risk and course of future outbreaks. Because people interact with only a few other individuals, and the structure of these interactions influences spreading processes, the pairwise relationships between individuals can be usefully represented by a network. Although the underlying transmission processes are different, the network approach can be used to study the spread of pathogens in a contact network or the spread of rumours in a social network. We study simulated simple and complex epidemics on synthetic networks and on two empirical networks, a social/contact network in an Indian village and an online social network. Our goal is to learn simultaneously the spreading process parameters and the first infected node, given a fixed network structure and the observed state of nodes at several time points. Our inference scheme is based on approximate Bayesian computation, a likelihood-free inference technique. Our method is agnostic about the network topology and the spreading process. It generally performs well and, somewhat counter-intuitively, the inference problem appears to be easier on more heterogeneous network topologies, which enhances its future applicability to real-world settings where few networks have homogeneous topologies.
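The kind of spreading process inferred here can be illustrated with a minimal discrete-time susceptible-infected (SI) simulation on a toy network; the graph, infection probability, and function names below are our assumptions, not the paper's models or data. In an ABC setting, such a simulator would be run repeatedly with candidate parameters and seed nodes, and the simulated node states compared against the observed snapshots.

```python
import random

def simulate_si(adjacency, seed_node, beta, n_steps, rng):
    """Discrete-time SI epidemic: at each step every infected node
    infects each susceptible neighbour independently with prob beta.
    Returns the set of infected nodes after n_steps."""
    infected = {seed_node}
    for _ in range(n_steps):
        newly = set()
        for u in infected:
            for v in adjacency[u]:
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
    return infected

# Toy network: a 6-node path graph 0-1-2-3-4-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rng = random.Random(42)
final = simulate_si(adj, seed_node=0, beta=0.9, n_steps=10, rng=rng)
print(sorted(final))
```

Replacing the per-edge coin flip with a threshold on the number of infected neighbours would turn this simple contagion into a complex one, which is the distinction the paper's experiments explore.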
Journal of Chemical Physics | 2018
Ritabrata Dutta; Zacharias Faidon Brotzakis; Antonietta Mira
Molecular dynamics (MD) simulations give access to equilibrium structures and dynamic properties given an ergodic sampling and an accurate force-field. The force-field parameters are calibrated to reproduce properties measured by experiments or simulations. The main contribution of this paper is an approximate Bayesian framework for the calibration and uncertainty quantification of the force-field parameters, without assuming parameter uncertainty to be Gaussian. To this aim, since the likelihood function of the MD simulation models is intractable in the absence of a Gaussianity assumption, we use a likelihood-free inference scheme known as approximate Bayesian computation (ABC) and propose an adaptive population Monte Carlo ABC algorithm, which is shown to converge faster and scale better than the previously used ABCsubsim algorithm for the calibration of the force-field of a helium system. The second contribution is the adaptation of ABC algorithms for High Performance Computing to MD simulations within the Python ecosystem ABCpy. This adaptation includes a novel use of a dynamic allocation scheme for Message Passing Interface (MPI). We illustrate the performance of the developed methodology to learn posterior distributions and Bayesian estimates of Lennard-Jones force-field parameters of helium and the TIP4P system of water, using both simulated and experimental datasets collected using neutron and X-ray diffraction. For simulated data, the Bayesian estimate is in close agreement with the true parameter value used to generate the dataset. For experimental as well as for simulated data, the Bayesian posterior distribution shows a strong correlation pattern between the force-field parameters. Providing an estimate of the entire posterior distribution, our methodology also allows us to perform the uncertainty quantification of model prediction.
This research opens up the possibility to rigorously calibrate force-fields from available experimental datasets of any structural and dynamic property.
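A generic population Monte Carlo ABC scheme of the kind adapted here can be sketched as follows. This is a simplified textbook-style sampler on a toy Gaussian-mean model of our own choosing, not the paper's adaptive algorithm or the ABCpy implementation: particles accepted at a loose tolerance are perturbed and re-weighted at progressively tighter tolerances.

```python
import random
import statistics
from math import exp, pi, sqrt

def gauss_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))

def abc_pmc(observed, simulate, prior_pdf, prior_sample,
            distance, eps_schedule, n_particles, rng):
    """Simplified ABC population Monte Carlo over a decreasing
    tolerance schedule, with a Gaussian perturbation kernel."""
    # Round 0: plain rejection ABC at the loosest tolerance.
    particles, weights = [], []
    while len(particles) < n_particles:
        theta = prior_sample()
        if distance(simulate(theta), observed) < eps_schedule[0]:
            particles.append(theta)
            weights.append(1.0)
    for eps in eps_schedule[1:]:
        sigma = 2.0 * statistics.stdev(particles)  # adaptive kernel width
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            base = rng.choices(particles, weights=weights)[0]
            theta = rng.gauss(base, sigma)
            if prior_pdf(theta) > 0 and distance(simulate(theta), observed) < eps:
                # Importance weight: prior over the perturbation mixture.
                kernel = sum(w * gauss_pdf(theta, p, sigma)
                             for p, w in zip(particles, weights))
                new_particles.append(theta)
                new_weights.append(prior_pdf(theta) / kernel)
        particles, weights = new_particles, new_weights
    return particles, weights

# Toy problem: infer the mean of a Gaussian with known sd = 1,
# summarising each dataset by its sample mean.
rng = random.Random(3)
obs_mean = statistics.mean(rng.gauss(2.0, 1.0) for _ in range(50))
simulate = lambda mu: statistics.mean(rng.gauss(mu, 1.0) for _ in range(50))
particles, weights = abc_pmc(
    observed=obs_mean, simulate=simulate,
    prior_pdf=lambda t: 0.1 if -5 <= t <= 5 else 0.0,
    prior_sample=lambda: rng.uniform(-5, 5),
    distance=lambda s, o: abs(s - o),
    eps_schedule=[1.0, 0.5, 0.2], n_particles=100, rng=rng,
)
wmean = sum(w * p for p, w in zip(particles, weights)) / sum(weights)
print(round(wmean, 2))  # weighted posterior mean, close to 2.0
```

In the paper the expensive `simulate` calls are MD runs, which is why the tolerance schedule, kernel adaptation, and the MPI-based dynamic allocation of simulation tasks matter so much for wall-clock performance.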
Frontiers in Physiology | 2018
Ritabrata Dutta; Bastien Chopard; Jonas Latt; Frank Dubois; Karim Zouaoui Boudjeltia; Antonietta Mira
Cardio/cerebrovascular diseases (CVD) have become one of the major health issues in our societies. Recent studies show that existing clinical tests to detect CVD are ineffectual, as they do not consider the different stages of platelet activation or the molecular dynamics involved in platelet interactions. Furthermore, they are incapable of accounting for inter-individual variability. A physical description of platelet deposition was introduced recently in Chopard et al. (2017), integrating fundamental understanding of how platelets interact into a numerical model parameterized by five parameters. These parameters specify the deposition process and are relevant for a biomedical understanding of the phenomena. One of the main intuitions is that these parameters are precisely the information needed for a pathological test identifying CVD and that they capture the inter-individual variability. Following this intuition, we devise here a Bayesian inferential scheme to estimate these parameters using experimental observations, at different time intervals, of the average size of the aggregation clusters, their number per mm², the number of platelets, and the number of activated platelets per μl still in suspension. As the likelihood function of the numerical model is intractable due to the complex stochastic nature of the model, we use a likelihood-free inference scheme, approximate Bayesian computation (ABC), to calibrate the parameters in a data-driven manner. As ABC requires the generation of many pseudo-data by expensive simulation runs, we use a high performance computing (HPC) framework for ABC to make the inference possible for this model. We consider a collective dataset of seven volunteers and use this inference scheme to obtain an approximate posterior distribution and the Bayes estimates of these five parameters.
The mean posterior prediction of the platelet deposition pattern matches the experimental dataset closely with a tight posterior prediction error margin, justifying our main intuition and providing a methodology to infer these parameters given patient data. The present approach can be used to build a new generation of personalized platelet-functionality tests for CVD detection, using numerical modeling of platelet deposition, Bayesian uncertainty quantification, and high performance computing.
arXiv: Machine Learning | 2016
Ritabrata Dutta; Jukka Corander; Samuel Kaski; Michael U. Gutmann
arXiv: Computation | 2014
Michael U. Gutmann; Ritabrata Dutta; Samuel Kaski; Jukka Corander