Mireille Régnier

French Institute for Research in Computer Science and Automation

Publication

Featured research published by Mireille Régnier.

Nature Biotechnology | 2005

Assessing computational tools for the discovery of transcription factor binding sites

Martin Tompa; Nan Li; Timothy L. Bailey; George M. Church; Bart De Moor; Eleazar Eskin; Alexander V. Favorov; Martin C. Frith; Yutao Fu; W. James Kent; Vsevolod J. Makeev; Andrei A. Mironov; William Stafford Noble; Giulio Pavesi; Mireille Régnier; Nicolas Simonis; Saurabh Sinha; Gert Thijs; Jacques van Helden; Mathias Vandenbogaert; Zhiping Weng; Christopher T. Workman; Chun Ye; Zhou Zhu

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

International Symposium on Information Theory | 1997

On pattern frequency occurrences in a Markovian sequence

Mireille Régnier; Wojciech Szpankowski

Consider a given pattern H and a random text T generated by a Markovian source. We study the frequency of pattern occurrences in a random text when overlapping copies of the pattern are counted separately. We present exact and asymptotic formulae for moments (including the variance), and probability of r pattern occurrences for three different regions of r, namely: (i) r = O(1), (ii) central limit regime, and (iii) large deviations regime. In order to derive these results, we first construct certain language expressions that characterize pattern occurrences, which are later translated into generating functions. We then use analytical methods to extract asymptotic behaviors of the pattern frequency from the generating functions. These findings are of particular interest to molecular biology problems (e.g., finding patterns with unexpectedly high or low frequencies, and gene recognition), information theory (e.g., second-order properties of the relative frequency), and pattern matching algorithms (e.g., q-gram algorithms).
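As a small illustration of the quantity studied in the abstract above, the following Python sketch counts occurrences of a pattern with overlapping copies counted separately, and computes the expected count in a first-order Markov text. It assumes the chain starts in its stationary distribution, so that by linearity of expectation the mean is (n - m + 1) times the probability of reading the pattern at a fixed position. The function names and the dictionary-based representation of the chain are illustrative assumptions, and the sketch covers only the first moment, not the variance or the asymptotic regimes of the paper.

```python
from typing import Dict


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text; overlapping copies count separately."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)


def expected_occurrences(
    pattern: str,
    n: int,
    stationary: Dict[str, float],
    transition: Dict[str, Dict[str, float]],
) -> float:
    """Expected number of (possibly overlapping) occurrences of pattern in a
    text of length n drawn from a stationary first-order Markov chain.

    stationary[a] is the stationary probability of symbol a (assumed input),
    transition[a][b] is the probability that b follows a (assumed input).
    """
    # Probability of reading the pattern at one fixed position.
    prob = stationary[pattern[0]]
    for a, b in zip(pattern, pattern[1:]):
        prob *= transition[a][b]
    # Linearity of expectation over the n - m + 1 possible start positions.
    positions = max(n - len(pattern) + 1, 0)
    return positions * prob


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}
    chain = {"a": dict(uniform), "b": dict(uniform)}
    print(count_overlapping("aaaa", "aa"))
    print(expected_occurrences("ab", 10, uniform, chain))
```

Comparing the observed count with this expectation is the simplest way to flag a pattern as unexpectedly frequent or rare; a proper significance statement also needs the variance, which depends on how the pattern overlaps itself.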

Colloquium on Trees in Algebra and Programming | 1986

Trie Partitioning Process: Limiting Distributions

Philippe Jacquet; Mireille Régnier

This paper is devoted to the well-known trie structure. We consider two basic parameters: depth of the leaves and height when the trie is formed with n items. We prove the convergence of their distributions and of their moments of any order when n → ∞ to a limit distribution. We exhibit the limits: a periodic distribution or a normal distribution. The results are given for uniform or biased data distributions for Bernoulli and Poisson models. Our reasoning is based on generating and characteristic functions. We make extensive use of analytic functions and asymptotic methods.
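To make the parameters named in the abstract above concrete, here is a minimal Python sketch that builds a binary trie implicitly by recursive partitioning and reports the depth of every leaf, the height, and the number of internal nodes. It assumes the keys are distinct bit strings of equal length, drawn uniformly at random; the helper names `trie_stats` and `random_keys` are hypothetical. It is an empirical illustration only and does not reproduce the analytic results of the paper.

```python
import random
from typing import List, Tuple


def trie_stats(keys: List[str], depth: int = 0) -> Tuple[List[int], int]:
    """Return (leaf depths, number of internal nodes) of the trie built on keys.

    keys must be distinct bit strings of equal length (assumption), so every
    group of two or more keys is eventually split before the bits run out.
    """
    if len(keys) <= 1:
        # Empty subtree or a single key stored in a leaf at this depth.
        return [depth] * len(keys), 0
    zeros = [k for k in keys if k[depth] == "0"]
    ones = [k for k in keys if k[depth] == "1"]
    d0, s0 = trie_stats(zeros, depth + 1)
    d1, s1 = trie_stats(ones, depth + 1)
    return d0 + d1, 1 + s0 + s1


def random_keys(n: int, bits: int = 64, seed: int = 0) -> List[str]:
    """Draw n distinct uniform random bit strings of the given length."""
    rng = random.Random(seed)
    keys = set()
    while len(keys) < n:
        keys.add(format(rng.getrandbits(bits), f"0{bits}b"))
    return sorted(keys)


if __name__ == "__main__":
    depths, size = trie_stats(random_keys(1000))
    print("mean leaf depth:", sum(depths) / len(depths))
    print("height:", max(depths))
    print("internal nodes:", size)
```

Running the sketch for increasing n and several seeds gives a rough empirical picture of how depth and height grow with the number of items.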

Discrete Applied Mathematics | 2000

A unified approach to word occurrence probabilities

Mireille Régnier

Evaluation of the expected frequency of occurrences of a given set of patterns in a DNA sequence has numerous applications and has been extensively studied recently. We provide a unified framework for this evaluation that adapts to various constraints and allows previous results to be extended. We assume successively that the patterns may, then may not, overlap. We derive exact formulae for the moments in a Markovian model, which are linear functions of the size of the sequence. We show that our formulae, which occasionally simplify previous results, are computable at low cost, which makes them useful for practical applications.

Bioinformatics | 2006

Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression

Valentina Boeva; Mireille Régnier; Dmitri Papatsenko; Vsevolod J. Makeev

MOTIVATION: Genomic sequences are highly redundant and contain many types of repetitive DNA. Fuzzy tandem repeats (FTRs) are of particular interest. They are found in regulatory regions of eukaryotic genes and are reported to interact with transcription factors. However, accurate assessment of FTR occurrences in different genome segments requires a specific algorithm for efficient FTR identification and classification. RESULTS: We have obtained formulas for P-values of FTR occurrence and developed an FTR identification algorithm implemented in TandemSWAN software. Using TandemSWAN we compared the structure and the occurrence of FTRs with short period length (up to 24 bp) in coding and non-coding regions including UTRs, heterochromatic, intergenic and enhancer sequences of Drosophila melanogaster and Drosophila pseudoobscura. Tandems with period three and its multiples were found in coding segments, whereas FTRs with periods that are multiples of six are overrepresented in all non-coding segments. Periods equal to 5-7 and 11-14 were characteristic of the enhancer regions and other non-coding regions close to genes. AVAILABILITY: TandemSWAN web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/projects/swan/www/ SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

IEEE Transactions on Information Theory | 1989

New results on the size of tries

Mireille Régnier; Philippe Jacquet

A precise asymptotic expansion of the variance of the size of a trie built on random binary strings is presented. This data structure appears in some hashing schemes and communications protocols. The variance is asymptotically linear, and numerical results are given. The reader is referred to an earlier work for formal proofs.

Algorithms for Molecular Biology | 2007

Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules

Valentina Boeva; Julien Clement; Mireille Régnier; Mikhail A. Roytberg; Vsevolod J. Makeev

Background: cis-Regulatory modules (CRMs) of eukaryotic genes often contain multiple binding sites for transcription factors. The phenomenon that binding sites form clusters in CRMs is exploited in many algorithms to locate CRMs in a genome. This gives rise to the problem of calculating the statistical significance of the event that multiple sites, recognized by different factors, would be found simultaneously in a text of a fixed length. The main difficulty comes from overlapping occurrences of motifs. So far, no tools have been developed allowing the computation of p-values for simultaneous occurrences of different motifs which can overlap. Results: We developed and implemented an algorithm computing the p-value that s different motifs occur respectively $k_1, \ldots, k_s$ or more times, possibly overlapping, in a random text. Motifs can be represented with a majority of popular motif models, but in all cases, without indels. Zero or first order Markov chains can be adopted as a model for the random text. The computational tool was tested on the set of cis-regulatory modules involved in D. melanogaster early development, for which there exists an annotation of binding sites for transcription factors. Our test allowed us to correctly identify transcription factors cooperatively/competitively binding to DNA. Method: The algorithm that precisely computes the probability of simultaneous motif occurrences is inspired by the Aho-Corasick automaton and employs a prefix tree together with a transition function. The algorithm runs with the $O\bigl(n|\Sigma|(m|\mathcal{H}| + K|\sigma|^K)\prod_i k_i\bigr)$ time complexity, where n is the length of the text, $|\Sigma|$ is the alphabet size, m is the maximal motif length, $|\mathcal{H}|$ is the total number of words in motifs, K is the order of Markov model, and $k_i$ is the number of occurrences of the i-th motif. Conclusion: The primary objective of the program is to assess the likelihood that a given DNA segment is CRM regulated with a known set of regulatory factors. In addition, the program can also be used to select the appropriate threshold for PWM scanning. Another application is assessing similarity of different motifs. Availability: Project web page, stand-alone version and documentation can be found at http://bioinform.genetika.ru/AhoPro/
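The abstract above mentions a prefix tree with a transition function in the style of the Aho-Corasick automaton. The Python sketch below shows only that building block, under stated assumptions: it builds a textbook Aho-Corasick automaton over a list of words and counts their occurrences in a text, with overlapping occurrences counted separately. It does not compute p-values and is not the authors' AhoPro implementation; the names `build_automaton` and `count_occurrences` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(
    words: List[str],
) -> Tuple[List[Dict[str, int]], List[int], List[List[int]]]:
    """Build the prefix tree (goto), failure links and output lists."""
    goto: List[Dict[str, int]] = [{}]
    out: List[List[int]] = [[]]
    for idx, word in enumerate(words):
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Words ending at the failure state also end here.
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out


def count_occurrences(text: str, words: List[str]) -> List[int]:
    """Count occurrences of each word in text, overlaps included."""
    goto, fail, out = build_automaton(words)
    counts = [0] * len(words)
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    print(count_occurrences("ACACA", ["ACA", "CA", "A"]))
```

A single pass over the text updates the counts for all words at once, which is what makes an automaton of this kind a natural basis for handling several overlapping motifs simultaneously.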

BIT Numerical Mathematics | 1985

Analysis of grid file algorithms

Mireille Régnier

Grid file algorithms were suggested in [12] to provide multi-key access to records in a dynamically growing file. We specify here two algorithms and derive the average sizes of the corresponding directories. We provide an asymptotic analysis. The growth of the indexes appears to be non-linear for uniform distributions: $O(v^{c})$ or $O(v^{\xi})$, where $c = 1 + b^{-1}$, $\xi = 1 + (s-1)/(sb+1)$, s is the number of attributes being used, v the file size, and b the page capacity of the system. Finally we give corresponding results for biased distributions and compare transient phases.
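For a feel of how far from linear the directory growth is, the short Python sketch below evaluates the two exponents for given parameters. It reads the exponents in the abstract above as c = 1 + 1/b and ξ = 1 + (s - 1)/(sb + 1), which is an interpretation of the typeset formula; the function name is hypothetical and the parameter values in the usage line are arbitrary examples, not figures from the paper.

```python
from fractions import Fraction
from typing import Tuple


def grid_file_exponents(b: int, s: int) -> Tuple[Fraction, Fraction]:
    """Return the growth exponents (c, xi) for page capacity b and s attributes.

    Assumed reading of the abstract: c = 1 + 1/b, xi = 1 + (s - 1)/(s*b + 1).
    """
    c = 1 + Fraction(1, b)
    xi = 1 + Fraction(s - 1, s * b + 1)
    return c, xi


if __name__ == "__main__":
    c, xi = grid_file_exponents(b=4, s=2)
    print(f"c = {c} ~ {float(c):.4f}, xi = {xi} ~ {float(xi):.4f}")
```

Under this reading both exponents approach 1 as the page capacity b grows, and ξ equals 1 when a single attribute is used.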

Archive | 1985

Some Uses of the Mellin Integral Transform in the Analysis of Algorithms

Philippe Flajolet; Mireille Régnier; Robert Sedgewick

We informally survey some uses of the Mellin integral transform in the context of the asymptotic evaluation of combinatorial sums arising in the analysis of algorithms.

North-Holland Mathematics Studies | 1985

Algebraic Methods for Trie Statistics

Philippe Flajolet; Mireille Régnier; Dominique Sotteau

Tries are a data structure commonly used to represent sets of binary data. They also constitute a convenient way of modelling a number of algorithms to factorise polynomials, to implement communication protocols or to access files on disk. We present here a systematic method for analysing, in the average case, trie parameters through generating functions and conclude with several applications.