Arnaud Lefebvre | Researchain

Publication

Featured research published by Arnaud Lefebvre.

Bioinformatics | 2003

FORRepeats: detects repeats on entire chromosomes and between genomes.

Arnaud Lefebvre; Thierry Lecroq; Hélène Dauchel; Joël Alexandre

MOTIVATION: As more and more whole genomes become available, there is a need for new methods to compare large sequences and to transfer biological knowledge from annotated genomes to related new ones. BLAST is not suitable for comparing multimegabase DNA sequences, and MegaBLAST is designed to compare closely related large sequences. Some tools for detecting repeats in large sequences, such as MUMmer and REPuter, have already been developed, but they also have time or space restrictions. Moreover, in terms of applications, REPuter only computes repeats and MUMmer works better with related genomes.

RESULTS: We present a heuristic method, named FORRepeats, which is based on a novel data structure called the factor oracle. In the first step, it detects exact repeats in large sequences. In the second step, it computes approximate repeats and performs pairwise comparison. We compared its computational characteristics with those of BLAST and REPuter. The results demonstrate that it is fast and space-economical. We show FORRepeats' ability to perform intra-genomic comparison and to detect repeated DNA sequences in the complete genome of the model plant Arabidopsis thaliana.

Theoretical Computer Science | 2004

Linear-time computation of local periods

Jean-Pierre Duval; Roman Kolpakov; Gregory Kucherov; Thierry Lecroq; Arnaud Lefebvre

We present a linear-time algorithm for computing all local periods of a given word. This subsumes (but is substantially more powerful than), on the one hand, the computation of the (global) period of the word and, on the other hand, the computation of a critical factorization, whose existence is implied by the Critical Factorization Theorem.
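
To make the notions in this abstract concrete, here is a minimal quadratic-time sketch in Python. It is not the linear-time algorithm of the paper. It assumes the usual definition: the local period at position i (the cut between w[i-1] and w[i]) is the smallest p such that w[j] == w[j+p] for every j in the window i-p <= j < i where both letters fall inside the word. The function names are illustrative only.

```python
def period(w):
    """Smallest p >= 1 such that w[j] == w[j + p] wherever both are defined."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[j] == w[j + p] for j in range(n - p)):
            return p
    return n


def local_period(w, i):
    """Local period at the cut between w[i-1] and w[i] (naive check).

    Positions of the repetition that overflow the word on either side
    are simply ignored, which is the assumption stated in the lead-in.
    """
    n = len(w)
    for p in range(1, n + 1):
        window = range(max(0, i - p), min(i, n - p))
        if all(w[j] == w[j + p] for j in window):
            return p
    return n


def local_periods(w):
    """Local periods at every internal position 1 .. len(w) - 1."""
    return [local_period(w, i) for i in range(1, len(w))]


print(local_periods("abaab"))  # [2, 3, 1, 3]
print(period("abaab"))         # 3
```

In this example the largest local period equals the global period, so positions 2 and 4 are critical, which is what the Critical Factorization Theorem guarantees for some position of any word of length at least 2.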

Theoretical Informatics and Applications | 2009

Efficient validation and construction of border arrays and validation of string matching automata

Jean-Pierre Duval; Thierry Lecroq; Arnaud Lefebvre

We present an on-line linear time and space algorithm to check whether an integer array f is the border array of at least one string w built on a bounded or unbounded size alphabet Σ. First of all, we show a bijection between the border array of a string w and the skeleton of the DFA recognizing Σ*w, called a string matching automaton (SMA). Different strings can have the same border array, but the originality of the presented method is that the correspondence between a border array and a skeleton of an SMA is independent of the underlying strings. This makes it possible to design algorithms for validating and generating border arrays that outperform existing ones. The validating algorithm lowers the delay (maximal number of comparisons on one element of the array) from O(|w|) to 1 + min{|Σ|, 1 + log₂|w|} compared to existing algorithms. We then give results on the numbers of distinct border arrays depending on the alphabet size. We also present an algorithm that checks, in linear time, whether a given directed unlabeled graph G is the skeleton of an SMA on an alphabet of size s. Along the way, the algorithm can build one string w for which G is the SMA skeleton.
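
As an illustration of the object being validated, the sketch below computes a border array with the classical failure-function recurrence. It is not the validation algorithm of the paper, and it assumes 0-based indexing in which f[i] is the length of the longest proper border of w[0..i]; the paper's indexing convention may differ.

```python
def border_array(w):
    """Return f where f[i] is the length of the longest proper border of w[:i+1]."""
    f = [0] * len(w)
    k = 0  # length of the border currently being extended
    for i in range(1, len(w)):
        # Fall back to shorter borders until one can be extended by w[i].
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


print(border_array("abaababa"))  # [0, 0, 1, 1, 2, 3, 2, 3]
print(border_array("abaab"))     # [0, 0, 1, 1, 2]
print(border_array("cdccd"))     # [0, 0, 1, 1, 2]
```

The last two calls show two different strings sharing one border array, the situation the abstract refers to when it says the correspondence with the SMA skeleton does not depend on the underlying strings.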

BMC Bioinformatics | 2012

EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics

Sophie Coutant; Chloé Cabot; Arnaud Lefebvre; Martine Léonard; Élise Prieur-Gaston; Dominique Campion; Thierry Lecroq; Hélène Dauchel

Background: Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in the history of medical genetics, impacting both fundamental research and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed to ensure variant discovery. They generally lead to ~20,000 variations that have to be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variations, type of the allelic variants, pathogenicity effect prediction, modes of inheritance, and comparison of exomes from multiple individuals. To deal with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools have to implement these filtering steps. Non-programmer biologists must be able to combine different filtering criteria autonomously and to pursue their own strategy depending on their assumptions and study design.

Results: We describe EVA (Exome Variation Analyzer), a user-friendly web-interfaced software tool dedicated to filtering strategies for medical WES. Thanks to different modules, EVA (i) integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii) allows the main filters dealing with common variations, molecular types, inheritance mode and multiple samples to be combined, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that led to the identification of a new candidate gene related to a rare form of Alzheimer disease.

Conclusions: EVA is developed to be a user-friendly, versatile, and efficient filtering-assistance software tool for WES. It constitutes a platform for data storage and for drastic screening of clinically relevant genetic variations by non-programmer geneticists. It thereby responds to new needs in the expanding era of medical genomics investigated by WES, for both fundamental research and clinical diagnostics.

Information Processing Letters | 2002

Compror: on-line lossless data compression with a factor oracle

Arnaud Lefebvre; Thierry Lecroq

We present in this article a linear time and space data compression method. This method, based on a factor oracle and the computation of the length of repeated suffixes, is easy to implement, fast and gives good compression ratios.

Developments in Language Theory | 2013

Abelian Repetitions in Sturmian Words

Gabriele Fici; Alessio Langiu; Thierry Lecroq; Arnaud Lefebvre; Filippo Mignosi; Élise Prieur-Gaston

We investigate abelian repetitions in Sturmian words. We exploit a bijection between factors of Sturmian words and subintervals of the unitary segment that allows us to study the periods of abelian repetitions by using classical results of elementary Number Theory. If \(k_{m}\) denotes the maximal exponent of an abelian repetition of period \(m\), we prove that \(\limsup k_{m}/m\ge \sqrt{5}\) for any Sturmian word, and the equality holds for the Fibonacci infinite word. We further prove that the longest prefix of the Fibonacci infinite word that is an abelian repetition of period \(F_{j}\), \(j > 1\), has length \(F_{j}(F_{j+1} + F_{j-1} + 1) - 2\) if \(j\) is even or \(F_{j}(F_{j+1} + F_{j-1}) - 2\) if \(j\) is odd. This allows us to give an exact formula for the smallest abelian periods of the Fibonacci finite words. More precisely, we prove that for \(j \ge 3\), the Fibonacci word \(f_{j}\) has abelian period equal to \(F_{n}\), where \(n = \lfloor{j/2}\rfloor\) if \(j = 0, 1, 2\mod{4}\), or \(n = 1 + \lfloor{j/2}\rfloor\) if \(j = 3\mod{4}\).

International Journal of Computer Mathematics | 2002

A heuristic for computing repeats with a factor oracle: Application to biological sequences

Arnaud Lefebvre; Thierry Lecroq

We present in this article a linear time and space method for the computation of the length of a repeated suffix for each prefix of a given word p. Our method is based on the factor oracle of p, which is a new and very compact structure introduced in [1], used for representing all the factors of p. We exhibit applications where our method really speeds up the computation of repetitions in words.
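
For readers unfamiliar with the factor oracle, the following Python sketch shows the standard on-line construction with supply links, as it is commonly described in the literature. It only builds the automaton; the repeated-suffix computation that the abstract describes is not reproduced here. The names build_factor_oracle and accepts are illustrative.

```python
def build_factor_oracle(p):
    """Build the factor oracle of p one letter at a time.

    Returns (trans, supply): trans[s] maps a letter to a target state and
    supply[s] is the supply link of state s (-1 for the initial state 0).
    """
    trans = [{}]
    supply = [-1]
    for i, letter in enumerate(p):
        trans.append({})
        trans[i][letter] = i + 1  # internal transition i -> i + 1
        k = supply[i]
        # Add external transitions along the chain of supply links.
        while k > -1 and letter not in trans[k]:
            trans[k][letter] = i + 1
            k = supply[k]
        supply.append(0 if k == -1 else trans[k][letter])
    return trans, supply


def accepts(trans, word):
    """Follow word from state 0; every state of the oracle is accepting."""
    state = 0
    for letter in word:
        if letter not in trans[state]:
            return False
        state = trans[state][letter]
    return True


p = "abbbaab"
trans, supply = build_factor_oracle(p)
factors = {p[i:j] for i in range(len(p)) for j in range(i, len(p) + 1)}
print(all(accepts(trans, f) for f in factors))  # True
print(accepts(trans, "abaab"), "abaab" in p)    # True False
```

The second line of output shows why the structure is called an oracle: it accepts every factor of p but may also accept a few words that are not factors, which is the price paid for its compactness (len(p) + 1 states).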

Theoretical Computer Science | 2014

Linear computation of unbordered conjugate on unordered alphabet

Jean-Pierre Duval; Thierry Lecroq; Arnaud Lefebvre

We present an algorithm that, given a word w of length n on an unordered alphabet, computes one of its unbordered conjugates. If such a conjugate does not exist, the algorithm computes one of its conjugates that is a power of an unbordered word. The time complexity of the algorithm is O(n): the number of comparisons between letters of w is bounded by 4n.
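
The input/output behaviour described above can be illustrated with a brute-force Python sketch. It is quadratic, not the O(n) algorithm of the paper, and it relies only on equality tests between letters, in keeping with the unordered-alphabet setting. It assumes a non-empty word; the function names are illustrative.

```python
def is_unbordered(w):
    """True if no proper non-empty prefix of w is also a suffix of w."""
    return all(w[:k] != w[-k:] for k in range(1, len(w)))


def unbordered_conjugate(w):
    """Return an unbordered conjugate of w, or, if none exists, a conjugate
    that is a power of an unbordered word (naive search, non-empty w)."""
    n = len(w)
    # Length of the primitive root: smallest r dividing n with w == root^(n/r).
    r = next(q for q in range(1, n + 1) if n % q == 0 and w[:q] * (n // q) == w)
    root = w[:r]
    for shift in range(r):
        candidate = root[shift:] + root[:shift]
        if is_unbordered(candidate):
            return candidate * (n // r)
    raise ValueError("unreachable for non-empty input")


print(unbordered_conjugate("aba"))   # baa
print(unbordered_conjugate("abab"))  # abab
```

For the primitive word "aba" the result "baa" is itself unbordered; for "abab", which is a square, no conjugate is unbordered and the result is a power of the unbordered word "ab".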

Mathematical Foundations of Computer Science | 2003

Linear-Time Computation of Local Periods

Jean-Pierre Duval; Roman Kolpakov; Gregory Kucherov; Thierry Lecroq; Arnaud Lefebvre

Language and Automata Theory and Applications | 2015

Online Computation of Abelian Runs

Gabriele Fici; Thierry Lecroq; Arnaud Lefebvre; Élise Prieur-Gaston

Given a word \(w\) and a Parikh vector \(\mathcal {P}\), an abelian run of period \(\mathcal {P}\) in \(w\) is a maximal occurrence of a substring of \(w\) having abelian period \(\mathcal {P}\). We give an algorithm that finds all the abelian runs of period \(\mathcal {P}\) in a word of length \(n\) in time \(O(n\times |\mathcal {P}|)\) and space \(O(\sigma +|\mathcal {P}|)\).
