Sophie Lèbre
University of Strasbourg
Publication
Featured research published by Sophie Lèbre.
BMC Systems Biology | 2010
Sophie Lèbre; Jennifer Becq; Frédéric Devaux; Michael P. H. Stumpf; Gaëlle Lelandais
Background: Biological networks are highly dynamic in response to environmental and physiological cues. This variability is in contrast to conventional analyses of biological networks, which have overwhelmingly employed static graph models that stay constant over time to describe biological systems and their underlying molecular interactions.
Methods: To overcome these limitations, we propose here a new statistical modelling framework, the ARTIVA formalism (Auto Regressive TIme VArying models), and an associated inferential procedure that allows us to learn temporally varying gene-regulation networks from biological time-course expression data. ARTIVA simultaneously infers the topology of a regulatory network and how it changes over time. It allows us to recover the chronology of regulatory associations for individual genes involved in a specific biological process (development, stress response, etc.).
Results: We demonstrate that the ARTIVA approach generates detailed insights into the function and dynamics of complex biological systems and efficiently exploits time-course data in systems biology. In particular, two biological scenarios are analyzed: the developmental stages of Drosophila melanogaster and the response of Saccharomyces cerevisiae to benomyl poisoning.
Conclusions: ARTIVA does recover essential temporal dependencies in biological systems from transcriptional data, and provides a natural starting point to learn and investigate their dynamics in greater detail.
Euphytica | 2012
Frank Dondelinger; Dirk Husmeier; Sophie Lèbre
To understand the processes of growth and biomass production in plants, we ultimately need to elucidate the structure of the underlying regulatory networks at the molecular level. The advent of high-throughput postgenomic technologies has spurred substantial interest in reverse engineering these networks from data, and several techniques from machine learning and multivariate statistics have recently been proposed. The present article discusses the problem of inferring gene regulatory networks from gene expression time series, and we focus our exposition on the methodology of Bayesian networks. We describe dynamic Bayesian networks and explain their advantages over other statistical methods. We introduce a novel information sharing scheme, which allows us to infer gene regulatory networks from multiple sources of gene expression data more accurately. We illustrate and test this method on a set of synthetic data, using three different measures to quantify the network reconstruction accuracy. The main application of our method is related to the problem of circadian regulation in plants, where we aim to reconstruct the regulatory networks of nine circadian genes in Arabidopsis thaliana from four gene expression time series obtained under different experimental conditions.
Journal of Statistical Computation and Simulation | 2008
Sophie Lèbre; Pierre-Yves Bourguignon
The mixture transition distribution (MTD) model was introduced by Raftery to face the need for parsimony in the modeling of high-order Markov chains in discrete time. The particularity of this model comes from the fact that the effect of each lag upon the present is considered separately and additively, so that the number of parameters required is drastically reduced. However, the efficiency of the MTD parameter estimation methods proposed to date remains problematic on account of the large number of constraints on the parameters. In this article, an iterative procedure, commonly known as the expectation–maximization (EM) algorithm, is developed in combination with the principle of maximum likelihood estimation (MLE) to estimate the MTD parameters. Applications of MTD modeling show that the proposed EM algorithm is easier to use than the algorithm developed by Berchtold. Moreover, high-order MTD models estimated by EM on DNA sequences outperform the corresponding fully parametrized Markov chain in terms of Bayesian information criterion. A software implementation of our algorithm is available in the library seq++ at http://stat.genopole.cnrs.fr/seqpp.
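To make the parsimony argument concrete, here is a minimal sketch of the single-matrix MTD form usually attributed to Raftery, in which P(X_t = j | X_{t-1} = i_1, ..., X_{t-l} = i_l) is the sum over lags g of lambda_g * q(i_g, j), with the lag weights lambda_g summing to 1 and Q = (q(i, j)) a row-stochastic matrix. The function names and the numerical values of Q and lambdas are illustrative assumptions, not taken from the article or from seq++, and the MTD variant studied in the article may differ from this simplest form.

```python
from itertools import product


def mtd_transition(past, lambdas, Q):
    """Return P(X_t = j | past) for every state j under a single-matrix MTD model.

    past[g] is the state observed g + 1 steps ago; lambdas[g] is the weight of that lag.
    """
    m = len(Q)
    return [sum(lam * Q[state][j] for lam, state in zip(lambdas, past)) for j in range(m)]


def n_params_mtd(m, order):
    # m * (m - 1) free entries in Q plus (order - 1) free lag weights
    return m * (m - 1) + (order - 1)


def n_params_full_markov(m, order):
    # one free distribution over m states for each of the m ** order contexts
    return (m ** order) * (m - 1)


# Hypothetical 2-state, order-2 example (values chosen for illustration only).
Q = [[0.9, 0.1],
     [0.3, 0.7]]
lambdas = [0.6, 0.4]

# Transition distribution for each of the 4 possible two-step pasts.
probs = {past: mtd_transition(past, lambdas, Q) for past in product(range(2), repeat=2)}
```

For a 4-letter DNA alphabet and order 5, n_params_mtd(4, 5) gives 16 free parameters against 3072 for n_params_full_markov(4, 5), which is the reduction the abstract refers to.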
Computational Biology and Chemistry | 2010
Sophie Lèbre; Christian J. Michel
We develop here a new class of stochastic models of gene evolution based on residue Insertion-Deletion Independent from Substitution (IDIS). Indeed, in contrast to all existing evolution models, insertions and deletions are modeled here by a concept in population dynamics. Therefore, they are not only independent from each other, but also independent from the substitution process. After a separate stochastic analysis of the substitution and the insertion-deletion processes, we obtain a matrix differential equation combining these two processes defining the IDIS model. By deriving a general solution, we give an analytical expression of the residue occurrence probability at evolution time t as a function of a substitution rate matrix, an insertion rate vector, a deletion rate and an initial residue probability vector. Various mathematical properties of the IDIS model in relation with time t are derived: time scale, time step, time inversion and sequence length. Particular expressions of the nucleotide occurrence probability at time t are given for classical substitution rate matrices in various biological contexts: equal insertion rate, insertion-deletion only and substitution only. All these expressions can be directly used for biological evolutionary applications. The IDIS model shows a strongly different stochastic behavior from the classical substitution only model when compared on a gene dataset. Indeed, by considering three processes of residue insertion, deletion and substitution independently from each other, it allows a more realistic representation of gene evolution and opens new directions and applications in this research field.
Journal of Theoretical Biology | 2017
Sophie Lèbre; Olivier Gascuel
Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory. Our focus in this paper is on the constraints induced on two overlapping genes in terms of amino acids, as well as polypeptides. We show that simple linear constraints bind the amino-acid composition of two proteins encoded by overlapping genes. Novel constraints are revealed when polypeptides are considered, and not just single amino acids. For example, in double-coding sequences with an overlapping reading frame -2, each Tyrosine (denoted as Tyr or Y) in the overlapping frame overlaps a Tyrosine in the reference frame +0 (and reciprocally), whereas specific words (e.g. YY) never occur. We thus distinguish between null constraints (YY = 0 in frame -2) and non-null constraints (Y in frame +0 ⇔ Y in frame -2). Our equivalence-based constraints are symmetrical and thus enable the characterization of the joint composition of overlapping proteins. We describe several formal frameworks and a graph algorithm to characterize and compute these constraints. As expected, the degrees of freedom left by these constraints vary drastically among the different overlapping frames. Interestingly, the biological meaning of constraints induced on two overlapping proteins (hydropathy, forbidden di-peptides, expected overlap length …) is also specific to the reading frame. 
We study the combinatorics of these constraints for overlapping polypeptides of length n, pointing out that, (i) except for frame -2, non-null constraints are deduced from the amino-acid (length = 1) constraints and (ii) null constraints are deduced from the di-peptide (length = 2) constraints. These results yield support for understanding the mechanisms and evolution of overlapping genes, and for developing novel overlapping gene detection methods.
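The Tyrosine example can be checked by brute force. The sketch below assumes the standard genetic code and assumes that frame -2 is the antisense frame in which a codon covers the first two bases of a sense codon plus the last base of the preceding sense codon; this indexing is chosen here because it reproduces the stated constraint, and the paper's own definition of the frames should be consulted. It also assumes that "double-coding" means neither frame contains a stop codon. All function names are illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
STOP = {"TAA", "TAG", "TGA"}   # stop codons of the standard genetic code
TYR = {"TAT", "TAC"}           # Tyrosine codons


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def check_non_null():
    """Tyr in the sense codon <=> Tyr in the overlapping antisense codon (no stops)."""
    for w in map("".join, product("ACGT", repeat=4)):
        sense = w[1:4]            # sense codon
        anti = revcomp(w[0:3])    # antisense codon sharing sense bases 1 and 2
        if sense in STOP or anti in STOP:
            continue              # not a valid double-coding window
        if (sense in TYR) != (anti in TYR):
            return False
    return True


def check_null():
    """Every sense Tyr-Tyr dipeptide forces a stop codon in the antisense frame."""
    for c1, c2 in product(sorted(TYR), repeat=2):
        s = c1 + c2
        anti = revcomp(s[2:5])    # antisense codon straddling the two sense codons
        if anti not in STOP:
            return False
    return True
```

Both checks rest on the fact that the dinucleotide TA is its own reverse complement, so a TA at codon positions 1 and 2 on one strand is also a TA at positions 1 and 2 of the overlapping codon on the other strand.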
Methods of Molecular Biology | 2012
Sophie Lèbre; Frank Dondelinger; Dirk Husmeier
Dynamic Bayesian networks (DBNs) have received increasing attention from the computational biology community as models of gene regulatory networks. However, conventional DBNs are based on the homogeneous Markov assumption and cannot deal with inhomogeneity and nonstationarity in temporal processes. The present chapter provides a detailed discussion of how the homogeneity assumption can be relaxed. The improved method is evaluated on simulated data, where the network structure is allowed to change with time, and on gene expression time series during morphogenesis in Drosophila melanogaster.
PLOS Computational Biology | 2018
Chloé Bessière; May Taha; Florent Petitprez; Jimmy Vandel; Jean-Michel Marin; Laurent Bréhélin; Sophie Lèbre; Charles-Henri Lecellier
Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are, and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (often even to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of TF combinations from this type of model is not straightforward. Furthermore, these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, able to rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.
Archive | 2013
Radhakrishnan Nagarajan; Marco Scutari; Sophie Lèbre
Real-world entities comprising a complex system evolve as a function of time and respond to external perturbations. Dynamic Bayesian networks extend the fundamental ideas behind static Bayesian networks to model associations arising from the temporal dynamics between the entities of interest. This has to be contrasted with static Bayesian networks, which model the network structure from multiple independent realizations of the entities of a snapshot of the process. More importantly, incorporating the temporal signatures is useful in capturing possible feedback loops that are implicitly disregarded in the case of static Bayesian networks. Since feedback loops are ubiquitous in biological pathways, dynamic Bayesian network modeling is expected to result in better representations of such pathways.
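The point about feedback loops can be illustrated with a small sketch: a static Bayesian network must be a directed acyclic graph, so a loop A -> B -> A is not allowed, whereas unrolling the same edges over time slices (A at time t-1 influences B at time t, and vice versa) yields an acyclic graph. The two-gene loop and the function names below are hypothetical, chosen only for illustration.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)


def unroll(nodes, edges, n_slices):
    """Turn each edge X -> Y into X[t-1] -> Y[t] for t = 1 .. n_slices - 1."""
    t_nodes = [(n, t) for t in range(n_slices) for n in nodes]
    t_edges = [((p, t - 1), (c, t)) for t in range(1, n_slices) for p, c in edges]
    return t_nodes, t_edges


genes = ["A", "B"]
feedback = [("A", "B"), ("B", "A")]   # hypothetical two-gene feedback loop

static_ok = is_acyclic(genes, feedback)                  # the loop is a cycle
dynamic_ok = is_acyclic(*unroll(genes, feedback, 3))     # unrolled over 3 slices
```

Here static_ok is False and dynamic_ok is True: every unrolled edge points forward in time, so no cycle can form.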
Archive | 2013
Radhakrishnan Nagarajan; Marco Scutari; Sophie Lèbre
Most problems in Bayesian network theory have a computational complexity that, in the worst case, scales exponentially with the number of variables. Even for sparse networks, the complexity remains polynomial. Even though newer algorithms are designed to improve scalability, it is infeasible to analyze data containing more than a few hundred variables. Parallel computing provides a way to address this problem by making better use of modern hardware.
Archive | 2013
Radhakrishnan Nagarajan; Marco Scutari; Sophie Lèbre
Chapters 2 and 3 discussed the importance of learning the structure and the parameters of Bayesian networks from observational and interventional data sets. Bayesian inference, on the other hand, is often a follow-up to Bayesian network learning and deals with inferring the state of a set of variables given the state of others as evidence. Such an approach eliminates the need for additional experiments and is therefore extremely helpful. In this chapter, we will introduce inferential techniques for static and dynamic Bayesian networks and their applications to gene expression profiles.
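As a minimal sketch of what inferring the state of some variables given others as evidence means, the following performs exact inference by enumeration on a tiny discrete network. The network (a binary regulator R with two binary target genes G1 and G2), its probability tables, and the function names are hypothetical and are not taken from the chapter, which may use different techniques.

```python
from itertools import product

# Hypothetical network: regulator R -> target genes G1 and G2 (all binary).
P_R = {1: 0.3, 0: 0.7}            # P(R = r)
P_G1_given_R = {1: 0.9, 0: 0.2}   # P(G1 = 1 | R = r)
P_G2_given_R = {1: 0.8, 0: 0.1}   # P(G2 = 1 | R = r)


def bernoulli(p_one, value):
    """Probability of a binary value when P(value = 1) is p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(r, g1, g2):
    """Joint probability from the network factorization P(R) P(G1 | R) P(G2 | R)."""
    return P_R[r] * bernoulli(P_G1_given_R[r], g1) * bernoulli(P_G2_given_R[r], g2)


def query(target, evidence):
    """P(target = 1 | evidence) by summing the joint over all consistent states."""
    names = ("R", "G1", "G2")
    num = den = 0.0
    for values in product((0, 1), repeat=3):
        state = dict(zip(names, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue              # state contradicts the evidence
        p = joint(*values)
        den += p
        if state[target] == 1:
            num += p
    return num / den
```

For instance, query("R", {"G1": 1}) updates the belief that the regulator is active after observing G1 expressed, and query("G2", {"G1": 1}) predicts the second target from the first without any additional experiment. Enumeration is exponential in the number of variables, so it only suits very small networks.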