Sarah Filippi | Researchain

Publication

Featured research published by Sarah Filippi.

PLOS Computational Biology | 2013

Maximizing the Information Content of Experiments in Systems Biology

Juliane Liepe; Sarah Filippi; Michał Komorowski; Michael P. H. Stumpf

Our understanding of most biological systems is in its infancy. Learning their structure and intricacies is fraught with challenges, and often side-stepped in favour of studying the function of different gene products in isolation from their physiological context. Constructing and inferring global mathematical models from experimental data is, however, central to systems biology. Different experimental setups provide different insights into such systems. Here we show how we can combine concepts from Bayesian inference and information theory in order to identify experiments that maximize the information content of the resulting data. This approach allows us to incorporate preliminary information; it is global and not constrained to some local neighbourhood in parameter space and it readily yields information on parameter robustness and confidence. Here we develop the theoretical framework and apply it to a range of exemplary problems that highlight how we can improve experimental investigations into the structure and dynamics of biological systems and their behavior.
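
The quantity at the heart of this kind of experimental design is the mutual information between parameters and data. Below is a minimal sketch that assumes a discrete parameter and a discrete readout. The paper itself works with continuous models and ABC-based estimates, which this sketch does not reproduce. It computes I(θ; Y) exactly for each candidate experiment and picks the most informative one. The function names, experiment names and likelihood tables are all hypothetical.

```python
import math

def mutual_information(prior, likelihood):
    """Exact I(theta; Y) in nats; likelihood[t][y] = p(Y = y | theta = t)."""
    n_outcomes = len(likelihood[0])
    # Marginal (prior predictive) distribution of the readout Y.
    marginal = [sum(p_t * likelihood[t][y] for t, p_t in enumerate(prior))
                for y in range(n_outcomes)]
    info = 0.0
    for t, p_t in enumerate(prior):
        for y in range(n_outcomes):
            joint = p_t * likelihood[t][y]
            if joint > 0:
                info += joint * math.log(likelihood[t][y] / marginal[y])
    return info

def best_experiment(prior, experiments):
    """Name of the candidate experiment whose readout is most informative about theta."""
    return max(experiments, key=lambda name: mutual_information(prior, experiments[name]))

# Hypothetical example: two parameter values, two candidate measurements.
prior = [0.5, 0.5]
experiments = {
    "early_timepoint": [[0.6, 0.4], [0.5, 0.5]],
    "late_timepoint": [[0.9, 0.1], [0.2, 0.8]],
}
chosen = best_experiment(prior, experiments)
```

The hypothetical late time point separates the two parameter values much better. Its readout therefore carries more information about θ, and it is the one selected.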

Proceedings of the National Academy of Sciences of the United States of America | 2012

Perturbation of fetal liver hematopoietic stem and progenitor cell development by trisomy 21

Anindita Roy; Gillian Cowan; Adam Mead; Sarah Filippi; Georg Bohn; Aristeidis Chaidos; Oliver Tunstall; Jerry Chan; Mahesh Choolani; Phillip R. Bennett; Sailesh Kumar; Deborah Atkinson; Josephine Wyatt-Ashmead; Ming Hu; Michael P. H. Stumpf; Katerina Goudevenou; David F. O'Connor; Stella T. Chou; Mitchell J. Weiss; Anastasios Karadimitris; Sten Eirik W. Jacobsen; Paresh Vyas; Irene Roberts

The 40-fold increase in childhood megakaryocyte-erythroid and B-cell leukemia in Down syndrome implicates trisomy 21 (T21) in perturbing fetal hematopoiesis. Here, we show that compared with primary disomic controls, primary T21 fetal liver (FL) hematopoietic stem cells (HSC) and megakaryocyte-erythroid progenitors are markedly increased, whereas granulocyte-macrophage progenitors are reduced. Commensurately, HSC and megakaryocyte-erythroid progenitors show higher clonogenicity, with increased megakaryocyte, megakaryocyte-erythroid, and replatable blast colonies. Biased megakaryocyte-erythroid–primed gene expression was detected as early as the HSC compartment. In lymphopoiesis, T21 FL lymphoid-primed multipotential progenitors and early lymphoid progenitor numbers are maintained, but there was a 10-fold reduction in committed PreproB-lymphoid progenitors and the functional B-cell potential of HSC and early lymphoid progenitor is severely impaired, in tandem with reduced early lymphoid gene expression. The same pattern was seen in all T21 FL samples and no samples had GATA1 mutations. Therefore, T21 itself causes multiple distinct defects in FL myelo- and lymphopoiesis.

Statistical Applications in Genetics and Molecular Biology | 2013

On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo

Sarah Filippi; C. Barnes; Julien Cornebise; Michael P. H. Stumpf

Approximate Bayesian computation (ABC) has gained popularity over the past few years for the analysis of complex models arising in population genetics, epidemiology and systems biology. Sequential Monte Carlo (SMC) approaches have become work-horses in ABC. Here we discuss how to construct the perturbation kernels that are required in ABC SMC approaches, in order to construct a sequence of distributions that start out from a suitably defined prior and converge towards the unknown posterior. We derive optimality criteria for different kernels, which are based on the Kullback-Leibler divergence between a distribution and the distribution of the perturbed particles. We show that for many complicated posterior distributions, locally adapted kernels tend to show the best performance. We find that the moderate added cost of adapting kernel functions is easily regained in terms of the higher acceptance rate. We demonstrate the computational efficiency gains in a range of toy examples which illustrate some of the challenges faced in real-world applications of ABC, before turning to two demanding parameter inference problems in molecular biology, which highlight the huge increases in efficiency that can be gained from the choice of optimal kernels. We conclude with a general discussion of the rational choice of perturbation kernels in ABC SMC settings.
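
To make the role of the perturbation kernel concrete, here is a minimal ABC SMC sketch on a toy problem. It assumes a uniform prior on the mean θ of N(θ, 1) data, with the sample mean used as the summary statistic. The sketch uses a single Gaussian kernel whose variance is twice the weighted variance of the previous population, a common default. It does not implement the locally adapted or KL-optimal kernels derived in the paper. All names, thresholds and settings are illustrative assumptions.

```python
import math
import random
from itertools import accumulate

def abc_smc(observed_mean, n_obs, epsilons, n_particles=500, prior=(-10.0, 10.0), seed=1):
    """Toy ABC SMC for the mean theta of N(theta, 1) data, summarised by the sample mean."""
    rng = random.Random(seed)
    lo, hi = prior

    def distance(theta):
        # Simulate a data set of size n_obs and compare summary statistics.
        simulated_mean = sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs
        return abs(simulated_mean - observed_mean)

    particles, weights = [], []
    for t, eps in enumerate(epsilons):
        if t > 0:
            # Gaussian perturbation kernel: variance = 2 x weighted population variance.
            mean = sum(w * p for w, p in zip(weights, particles))
            var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
            sigma = math.sqrt(2.0 * var)
            cum_weights = list(accumulate(weights))
        new_particles, new_weights = [], []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = rng.uniform(lo, hi)              # sample from the uniform prior
            else:
                parent = rng.choices(particles, cum_weights=cum_weights)[0]
                theta = parent + rng.gauss(0.0, sigma)   # perturb a resampled particle
                if not lo <= theta <= hi:
                    continue                             # zero prior density: reject
            if distance(theta) > eps:
                continue                                 # outside the epsilon-ball
            new_particles.append(theta)
            if t == 0:
                new_weights.append(1.0)
            else:
                # Importance weight = prior density / kernel mixture density. The uniform
                # prior density and the Gaussian normalising constant are the same for
                # every particle, so they cancel after normalisation and are dropped.
                mixture = sum(w * math.exp(-0.5 * ((theta - p) / sigma) ** 2)
                              for w, p in zip(weights, particles))
                new_weights.append(1.0 / mixture)
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights

particles, weights = abc_smc(observed_mean=3.0, n_obs=20, epsilons=[2.0, 1.0, 0.5, 0.25, 0.1])
posterior_mean = sum(w * p for w, p in zip(weights, particles))
```

To try a different kernel, change how `sigma` is chosen and how the mixture density in the weight is evaluated. The abstract's point is that a better-adapted kernel raises the acceptance rate enough to pay for its extra cost.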

Proceedings of the National Academy of Sciences of the United States of America | 2014

The ecology in the hematopoietic stem cell niche determines the clinical outcome in chronic myeloid leukemia

Adam L. MacLean; Sarah Filippi; Michael P. H. Stumpf

Significance: Three contrasting models of the ecological interactions in the hematopoietic stem cell niche explain clinical progression of chronic myeloid leukemia equally well, but do so in different ways. We identify key differences between models and find that we can conclusively rule out those that fail to take competition between healthy and leukemic lineages explicitly into account. Detailed analysis of population dynamics within the bone marrow niche allows us to ascribe mechanisms to distinct disease outcomes and suggests experiments to distinguish between these mechanisms.

Chronic myeloid leukemia (CML) is a blood disease that disrupts normal function of the hematopoietic system. Despite the great progress made in terms of molecular therapies for CML, there remain large gaps in our understanding. By comparing mathematical models that describe CML progression and etiology we sought to identify those models that provide the best description of disease dynamics and their underlying mechanisms. Data for two clinical outcomes (disease remission or relapse) are considered, and we investigate these using Bayesian inference techniques throughout. We find that it is not possible to choose between the models based on fits to the data alone; however, by studying model predictions we can discard models that fail to take niche effects into account. More detailed analysis of the remaining models reveals mechanistic differences: for one model, leukemia stem cell dynamics determine the disease outcome; for the other model, disease progression is determined at the stage of progenitor cells, in particular by differences in progenitor death rates. This analysis also reveals distinct transient dynamics that will be experimentally accessible, but are currently at the limits of what is possible to measure. To resolve these differences we need to be able to probe the hematopoietic stem cell niche directly. Our analysis highlights the importance of further mapping of the bone marrow hematopoietic niche microenvironment as the “ecological” interactions between cells in this niche appear to be intricately linked to disease outcome.

Statistics and Computing | 2012

Considerate approaches to constructing summary statistics for ABC model selection

C. Barnes; Sarah Filippi; Michael P. H. Stumpf; Thomas Thorne

For nearly any challenging scientific problem, evaluation of the likelihood is problematic if not impossible. Approximate Bayesian computation (ABC) allows us to apply the whole Bayesian formalism to problems where we can use simulations from a model, but cannot evaluate the likelihood directly. When summary statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient. Sufficient statistics are, however, not common, and without them ABC inferences are to be considered with caution. Previously, other authors have attempted to combine different statistics in order to construct (approximately) sufficient statistics using search and information heuristics. Here we employ an information-theoretical framework that can be used to construct appropriate (approximately sufficient) statistics by combining different statistics until the loss of information is minimized. We start from a potentially large number of different statistics and choose the smallest set that captures (nearly) the same information as the complete set. We then demonstrate that such sets of statistics can be constructed for both parameter estimation and model selection problems, and we apply our approach to a range of illustrative and real-world model selection problems.

Allerton Conference on Communication, Control, and Computing | 2010

Optimism in reinforcement learning and Kullback-Leibler divergence

Sarah Filippi; Olivier Cappé; Aurélien Garivier

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
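
The inner step of an optimistic strategy of this kind is a linear maximization. Given an estimated transition vector p̂ and current value estimates V, find the q that maximises Σ q_i V_i under a KL constraint, taken in this sketch as KL(p̂ ‖ q) ≤ ε. The sketch below is minimal and assumes q is restricted to the support of p̂. It therefore does not reproduce whatever special treatment the paper gives to unobserved successor states. Setting the gradient of the Lagrangian to zero gives q_i ∝ p̂_i / (ν − V_i) with ν > max V. The KL divergence of that q decreases monotonically in ν, so ν can be found by bisection. The function names are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) in nats, summing over states with p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_optimistic(p_hat, values, eps, iters=100):
    """Maximise sum_i q_i * values_i s.t. KL(p_hat || q) <= eps, with q on the support of p_hat."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    support = [i for i, pi in enumerate(p_hat) if pi > 0]
    v_max = max(values[i] for i in support)
    if all(values[i] == v_max for i in support):
        return list(p_hat)  # every feasible q has the same value

    def tilted(nu):
        # Lagrangian stationarity: q_i proportional to p_i / (nu - V_i), with nu > v_max.
        raw = [pi / (nu - vi) if pi > 0 else 0.0 for pi, vi in zip(p_hat, values)]
        z = sum(raw)
        return [r / z for r in raw]

    # KL(p_hat || tilted(nu)) falls from +inf (nu -> v_max) to 0 (nu -> inf):
    # first bracket the crossing point, then bisect on nu.
    lo, hi = v_max, v_max + 1.0
    while kl(p_hat, tilted(hi)) > eps:
        lo, hi = hi, v_max + 2.0 * (hi - v_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(p_hat, tilted(mid)) > eps:
            lo = mid
        else:
            hi = mid
    return tilted(hi)  # hi always satisfies the KL constraint

# Hypothetical usage: three successor states with estimated probabilities p_hat.
p_hat = [0.5, 0.3, 0.2]
values = [0.0, 1.0, 2.0]
q = kl_optimistic(p_hat, values, eps=0.1)
optimistic_value = sum(qi * vi for qi, vi in zip(q, values))
```

The optimistic q shifts probability mass toward high-value successors as far as the KL budget allows. That shift is what makes the resulting value estimate optimistic.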

Statistical Applications in Genetics and Molecular Biology | 2013

Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems

Daniel Silk; Sarah Filippi; Michael P. H. Stumpf

The likelihood-free sequential Approximate Bayesian Computation (ABC) algorithms are increasingly popular inference tools for complex biological models. Such algorithms proceed by constructing a succession of probability distributions over the parameter space conditional upon the simulated data lying in an ε-ball around the observed data, for decreasing values of the threshold ε. While in theory the distributions (starting from a suitably defined prior) will converge towards the unknown posterior as ε tends to zero, the exact sequence of thresholds can impact upon the computational efficiency and success of a particular application. In particular, we show here that the current preferred method of choosing thresholds as a pre-determined quantile of the distances between simulated and observed data from the previous population can lead to the inferred posterior distribution being very different to the true posterior. Threshold selection thus remains an important challenge. Here we propose that the threshold–acceptance rate curve may be used to determine threshold schedules that avoid local optima, while balancing the need to minimise the threshold with computational efficiency. Furthermore, we provide an algorithm based upon the unscented transform that enables the threshold–acceptance rate curve to be efficiently predicted in the case of deterministic and stochastic state space models.
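
The two threshold ideas contrasted above can be sketched in a few lines. The illustration is minimal and assumes a toy model: a proposal θ yields a simulated sample mean, and the distance is its absolute difference from the observed value. `quantile_threshold` implements the quantile rule that the abstract cautions against. `acceptance_curve` gives a brute-force Monte Carlo estimate of the threshold–acceptance rate curve; the paper predicts this curve with the unscented transform instead, which is not reproduced here. For brevity, a single simulated batch stands in for both the previous population and the new proposals. All names and numbers are hypothetical.

```python
import math
import random

def quantile_threshold(distances, alpha=0.5):
    """Nearest-rank alpha-quantile of the distances from the previous population."""
    ordered = sorted(distances)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]

def acceptance_curve(distances, thresholds):
    """Fraction of simulated proposals whose distance lies inside each epsilon-ball."""
    n = len(distances)
    return [sum(d <= eps for d in distances) / n for eps in thresholds]

# Hypothetical toy setting: proposals scattered around the current population. The
# sample mean of 20 N(theta, 1) draws is drawn directly from its N(theta, 1/20) law.
rng = random.Random(0)
observed = 3.0
proposals = [rng.gauss(3.0, 0.5) for _ in range(2000)]
distances = [abs(rng.gauss(theta, 1.0 / math.sqrt(20)) - observed) for theta in proposals]

next_eps = quantile_threshold(distances, alpha=0.5)
thresholds = [0.05, 0.1, 0.2, 0.4, 0.8]
curve = acceptance_curve(distances, thresholds)
```

Where the estimated curve drops sharply, a smaller threshold costs many more simulations. That cost is the trade-off the proposed schedules try to balance.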

Cell Reports | 2016

Robustness of MEK-ERK Dynamics and Origins of Cell-to-Cell Variability in MAPK Signaling

Sarah Filippi; C. Barnes; Paul Kirk; Takamasa Kudo; Katsuyuki Kunida; Siobhan S. McMahon; Takaho Tsuchiya; Takumi Wada; Shinya Kuroda; Michael P. H. Stumpf

Cellular signaling processes can exhibit pronounced cell-to-cell variability in genetically identical cells. This affects how individual cells respond differentially to the same environmental stimulus. However, the origins of cell-to-cell variability in cellular signaling systems remain poorly understood. Here, we measure the dynamics of phosphorylated MEK and ERK across cell populations and quantify the levels of population heterogeneity over time using high-throughput image cytometry. We use a statistical modeling framework to show that extrinsic noise, particularly that from upstream MEK, is the dominant factor causing cell-to-cell variability in ERK phosphorylation, rather than stochasticity in the phosphorylation/dephosphorylation of ERK. We furthermore show that without extrinsic noise in the core module, variable (including noisy) signals would be faithfully reproduced downstream, but the within-module extrinsic variability distorts these signals and leads to a drastic reduction in the mutual information between incoming signal and ERK activity.

Seminars in Cell & Developmental Biology | 2014

Information theory and signal transduction systems: from molecular information processing to network inference

Siobhan S. Mc Mahon; Aaron Sim; Sarah Filippi; Rob Johnson; Juliane Liepe; Dominic Smith; Michael P. H. Stumpf

Sensing and responding to the environment are two essential functions that all biological organisms need to master for survival and successful reproduction. Developmental processes are marshalled by a diverse set of signalling and control systems, ranging from systems with simple chemical inputs and outputs to complex molecular and cellular networks with non-linear dynamics. Information theory provides a powerful and convenient framework in which such systems can be studied; but it also provides the means to reconstruct the structure and dynamics of molecular interaction networks underlying physiological and developmental processes. Here we supply a brief description of its basic concepts and introduce some useful tools for systems and developmental biologists. Along with a brief but thorough theoretical primer, we demonstrate the wide applicability and biological application-specific nuances by way of different illustrative vignettes. In particular, we focus on the characterisation of biological information processing efficiency, examining cell-fate decision making processes, gene regulatory network reconstruction, and efficient signal transduction experimental design.

Statistics and Computing | 2018

Large-scale kernel methods for independence testing

Qinyi Zhang; Sarah Filippi; Arthur Gretton; Dino Sejdinovic

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable trade-off between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our large-scale methods give comparable performance with existing methods while using significantly less computation time and memory.
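
The random Fourier feature variant can be illustrated with a minimal sketch of an HSIC-style independence test. It assumes scalar x and y, Gaussian kernels with a shared bandwidth, and a permutation null. Each variable is mapped to D random features. The statistic is the squared Frobenius norm of the D × D empirical cross-covariance of the centred features, which costs O(nD²) rather than O(n²). The function names, bandwidth and feature count are illustrative assumptions, not the paper's settings.

```python
import math
import random

def rff_features(values, n_features, bandwidth, rng):
    """Random Fourier features approximating exp(-(v - v')^2 / (2 * bandwidth^2))."""
    omegas = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(w * v + b) for w, b in zip(omegas, phases)] for v in values]

def centre(rows):
    """Subtract the column means from a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def rff_hsic(fx, fy):
    """Squared Frobenius norm of the empirical cross-covariance of centred features."""
    n = len(fx)
    total = 0.0
    for j in range(len(fx[0])):
        for k in range(len(fy[0])):
            c = sum(fx[i][j] * fy[i][k] for i in range(n)) / n
            total += c * c
    return total

def rff_independence_test(x, y, n_features=8, bandwidth=1.0, n_permutations=100, seed=0):
    """Return (statistic, permutation p-value) for H0: x and y are independent."""
    rng = random.Random(seed)
    fx = centre(rff_features(x, n_features, bandwidth, rng))
    fy = centre(rff_features(y, n_features, bandwidth, rng))
    observed = rff_hsic(fx, fy)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = fy[:]
        rng.shuffle(shuffled)  # permuting rows breaks any x-y pairing but keeps centring
        if rff_hsic(fx, shuffled) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_permutations)

# Hypothetical data: y depends on x only through x^2 (almost no linear correlation).
data_rng = random.Random(42)
x = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
y = [xi * xi + data_rng.gauss(0.0, 0.1) for xi in x]
stat, p_value = rff_independence_test(x, y)
```

The quadratic dependence in the example has essentially zero linear correlation. The feature-based statistic should nonetheless detect it, which is the kind of nonlinear association the abstract has in mind.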
