Publication


Featured research published by Florencia Leonardi.


Archive | 2007

Constructing Probabilistic Genetic Networks of Plasmodium falciparum from Dynamical Expression Signals of the Intraerythrocytic Development Cycle

Junior Barrera; Roberto M. Cesar; David Correa Martins; Ricardo Z. N. Vêncio; Emilio F. Merino; Marcio Yamamoto; Florencia Leonardi; Carlos Alberto Pereira; Hernando A. del Portillo

The completion of the genome sequence of Plasmodium falciparum revealed that close to 60% of the annotated genome corresponds to hypothetical proteins and that many genes, whose metabolic pathways or biological products are known, have not been predicted from sequence similarity searches. Recently, using global gene expression of the asexual blood stages of P. falciparum at 1 h resolution scale and Discrete Fourier Transform based techniques, it has been demonstrated that many genes are regulated in a single periodic manner during the asexual blood stages. Moreover, by ordering the genes according to the phase of expression, a new list of targets for vaccine and drug development was generated. In the present paper, genes are annotated under a different perspective: a list of functional properties is attributed to networks of genes representing subsystems of the P. falciparum regulatory expression system. The model developed to represent genetic networks, called Probabilistic Genetic Network (PGN), is a Markov chain with some additional properties. This model mimics the properties of a gene as a non-linear stochastic gate and the systems are built by coupling of these gates. Moreover, a tool that integrates mining of dynamical expression signals by PGN design techniques, different databases and biological knowledge, was developed. The applicability of this tool for discovering gene networks of the malaria expression regulation system has been validated using the glycolytic pathway as a “gold-standard”, as well as by creating an apicoplast PGN network. Presently, we are tentatively improving the network design technique before trying to validate results from the apicoplast PGN network through reverse genetics approaches.


The Annals of Applied Statistics | 2012

Context tree selection and linguistic rhythm retrieval from written texts

Antonio Galves; Charlotte Galves; Jesús E. García; Nancy L. Garcia; Florencia Leonardi

We introduce a new criterion to select in a consistent way the probabilistic context tree generating a sample. The basic idea is to construct a totally ordered set of candidate trees. This set is composed of the “champion trees”, the ones that maximize the likelihood of the sample for each number of degrees of freedom. The smallest maximizer criterion selects the infimum of the subset of champion trees whose gain in likelihood is negligible. This study was motivated by the linguistic challenge of retrieving rhythmic patterns from written texts. Applied to a data set consisting of texts extracted from daily newspapers, our algorithm identifies different context trees for European Portuguese and Brazilian Portuguese. This is compatible with the long-standing conjecture that European Portuguese and Brazilian Portuguese belong to different rhythmic classes. Moreover, these context trees have several interesting properties which are linguistically meaningful.


The Annals of Applied Statistics | 2009

Testing statistical hypothesis on random trees and applications to the protein classification problem

Jorge Rodolfo Busch; Pablo A. Ferrari; Ana Georgina Flesia; Ricardo Fraiman; Sebastian P. Grynberg; Florencia Leonardi

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.


arXiv: Statistics Theory | 2008

Exponential Inequalities for Empirical Unbounded Context Trees

Antonio Galves; Florencia Leonardi

In this paper we obtain non-uniform exponential upper bounds for the rate of convergence of a version of the algorithm Context, when the underlying tree is not necessarily bounded. The algorithm Context is a well-known tool to estimate the context tree of a Variable Length Markov Chain. As a consequence of the exponential bounds we obtain a strong consistency result. We generalize in this way several previous results in the field.
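As a minimal illustration of the object being estimated (not of the algorithm Context itself), the sketch below shows how a context tree determines the next-symbol distribution of a Variable Length Markov Chain: the relevant context is the unique suffix of the past that belongs to the tree. The tree `TREE`, its probabilities, and the function names are hypothetical and chosen only for this example.

```python
# Hypothetical context tree over the alphabet {"0", "1"}.
# Contexts are written in chronological order (most recent symbol last).
# No context is a suffix of another, so at most one context matches a past.
TREE = {
    "1":  {"0": 0.3, "1": 0.7},
    "10": {"0": 0.5, "1": 0.5},
    "00": {"0": 0.9, "1": 0.1},
}


def find_context(tree, past):
    """Return the suffix of `past` that is a context of `tree`."""
    for length in range(1, len(past) + 1):
        suffix = past[-length:]
        if suffix in tree:
            return suffix
    raise ValueError("past is too short to determine a context")


def next_symbol_prob(tree, past, symbol):
    """Probability of `symbol` given `past` under the context tree model."""
    return tree[find_context(tree, past)][symbol]
```

For example, the past `"0110"` ends in `0`, so one more symbol is needed and the context is `"10"`, whereas the past `"0011"` ends in `1` and the context is just `"1"`. The memory length therefore varies with the past, which is what distinguishes these models from fixed-order Markov chains.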


Stochastic Processes and their Applications | 2011

Context Tree Selection: A Unifying View

Aurélien Garivier; Florencia Leonardi

We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees corresponding to the two sources are finite and share many of their contexts. In order to understand the differences between the two sources, it is important to identify which contexts and which transition probabilities are specific to each source. We consider a class of probabilistic context tree models with three types of contexts: those which appear in one, the other, or both sources. We use a BIC penalized maximum likelihood procedure that jointly estimates the two sources. We propose a new algorithm which efficiently computes the estimated context trees. We prove that the procedure is strongly consistent. We also present a simulation study showing the practical advantage of our procedure over a procedure that works separately on each dataset.


Bioinformatics | 2006

A generalization of the PST algorithm: modeling the sparse nature of protein sequences

Florencia Leonardi

MOTIVATION: A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families and they can be estimated in linear time and space. RESULTS: We introduce a new algorithm, called Sparse Probabilistic Suffix Trees (SPST), that identifies equivalences between the contexts of a VLMC. We show that, in many cases, the identification of these equivalences can improve the classification rate of the classical Probabilistic Suffix Trees (PST) algorithm. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains, and this variation in the SPST algorithm is called F-SPST.


Brazilian Journal of Probability and Statistics | 2010

Some upper bounds for the rate of convergence of penalized likelihood context tree estimators

Florencia Leonardi

We find upper bounds for the probability of underestimation and overestimation errors in penalized likelihood context tree estimation. The bounds are explicit and apply to processes of not necessarily finite memory. We allow for general penalizing terms and we give conditions on the maximal depth of the estimated trees in order to get strongly consistent estimates. This generalizes previous results obtained in the case of estimation of the order of a Markov chain.
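The simpler special case mentioned at the end of the abstract, estimating the order of a Markov chain by penalized likelihood, can be sketched as follows. This is an illustrative sketch only: it assumes the BIC penalty (half the number of free parameters times the log of the sample size), which is one choice among the general penalizing terms, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(seq, k, start):
    """Maximized log-likelihood of seq[start:] under an order-k Markov chain."""
    pair_counts = Counter()  # (context, next symbol) -> count
    ctx_counts = Counter()   # context -> count
    for i in range(start, len(seq)):
        ctx = seq[i - k:i]   # the k symbols preceding position i
        pair_counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    return sum(c * math.log(c / ctx_counts[ctx])
               for (ctx, _), c in pair_counts.items())


def bic_order(seq, max_order):
    """Order in 0..max_order maximizing log-likelihood minus the BIC penalty."""
    alphabet_size = len(set(seq))
    n = len(seq) - max_order  # every order is scored on the same positions
    best_k, best_score = 0, -math.inf
    for k in range(max_order + 1):
        n_params = alphabet_size ** k * (alphabet_size - 1)
        score = log_likelihood(seq, k, max_order) - 0.5 * n_params * math.log(n)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

On the alternating sequence `"01" * 200`, an order-1 chain already predicts every symbol exactly, so higher orders gain no likelihood and only pay a larger penalty; `bic_order("01" * 200, 3)` therefore selects order 1. Overestimation corresponds to choosing too large an order, underestimation to choosing too small a one.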


Annals of Applied Probability | 2014

Loss of memory of hidden Markov models and Lyapunov exponents

Pierre Collet; Florencia Leonardi

In this paper we prove that the asymptotic rate of exponential loss of memory of a finite state hidden Markov model is bounded above by the difference of the first two Lyapunov exponents of a certain product of matrices. We also show that this bound is in fact realized, namely for almost all realizations of the observed process we can find symbols where the asymptotic exponential rate of loss of memory attains the difference of the first two Lyapunov exponents. These results are derived in particular for the observed process and for the filter; that is, for the distribution of the hidden state conditioned on the observed sequence. We also prove similar results in total variation.
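To make the quantity "difference of the first two Lyapunov exponents" concrete, here is a rough numerical sketch for a product of 2x2 matrices. It does not reproduce the specific matrices the paper associates with a hidden Markov model; the function name and input format are assumptions for illustration. It relies on two standard facts: the top exponent is the average log-growth of a generic vector, and for 2x2 invertible matrices the sum of the two exponents is the average of log|det|.

```python
import math


def lyapunov_exponents_2x2(matrices):
    """Estimate (lambda1, lambda2) for an ordered product of 2x2 matrices.

    Each matrix is given as ((a, b), (c, d)) and must be invertible.
    Assumes the starting vector is generic, i.e. not confined to a
    slower-growing direction.
    """
    v = (1 / math.sqrt(2), 1 / math.sqrt(2))
    log_growth = 0.0
    log_det = 0.0
    for (a, b), (c, d) in matrices:
        w = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        norm = math.hypot(w[0], w[1])
        log_growth += math.log(norm)
        v = (w[0] / norm, w[1] / norm)  # renormalize to avoid overflow
        log_det += math.log(abs(a * d - b * c))
    n = len(matrices)
    lam1 = log_growth / n
    lam2 = log_det / n - lam1  # lambda1 + lambda2 = mean log|det|
    return lam1, lam2
```

For the constant sequence of matrices diag(2, 0.5), the exponents are log 2 and -log 2, so the estimated gap `lam1 - lam2` approaches 2 log 2 as the number of factors grows. In dimensions above two, a QR-based iteration would be the usual replacement for the determinant shortcut used here.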


IEEE Transactions on Network Science and Engineering | 2017

A Test of Hypotheses for Random Graph Distributions Built from EEG Data

Andressa Cerqueira; Daniel Fraiman; Claudia D. Vargas; Florencia Leonardi

The theory of random graphs has been applied in recent years to model neural interactions in the brain. While the probabilistic properties of random graphs have been extensively studied, the development of statistical inference methods for this class of objects has received less attention. In this work we propose a non-parametric test of hypotheses to test if a sample of random graphs was generated by a given probability distribution (one-sample test) or if two samples of random graphs originated from the same probability distribution (two-sample test). We prove a Central Limit Theorem providing the asymptotic distribution of the test statistics and we propose a method to compute the quantiles of the finite sample distributions by simulation. The test makes no assumption on the specific form of the distributions and it is consistent against any alternative hypothesis that differs from the sample distribution on at least one edge-marginal. Moreover, we show that the test is a Kolmogorov-Smirnov type test, for a given distance between graphs, and we study its performance on simulated data. We apply it to compare graphs of brain functional network interactions built from electroencephalographic (EEG) data collected during the visualization of point light displays depicting human locomotion.


Brazilian Symposium on Bioinformatics | 2005

Sequence motif identification and protein family classification using probabilistic trees

Florencia Leonardi; Antonio Galves

Efficient family classification of newly discovered protein sequences is a central problem in bioinformatics. We present a new algorithm, using Probabilistic Suffix Trees, which identifies equivalences between the amino acids in different positions of a motif for each family. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains.

Collaboration

Top Co-Authors

Antonio Galves
University of São Paulo

Pablo Groisman
University of Buenos Aires

Charlotte Galves
State University of Campinas

Nancy L. Garcia
State University of Campinas

Pierre Collet
University of Strasbourg

Bruno M. Castro
Federal University of Rio Grande do Norte