Maxime Crochemore | Researchain

Publication

Featured research published by Maxime Crochemore.

String Processing and Information Retrieval | 2010

Extracting powers and periods in a string from its runs structure

Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń

A breakthrough in the field of text algorithms was the discovery of the fact that the maximal number of runs in a string of length n is O(n) and that they can all be computed in O(n) time. We study some applications of this result. New simpler O(n) time algorithms are presented for a few classical string problems: computing all distinct kth string powers for a given k, in particular squares for k = 2, and finding all local periods in a given string of length n. Additionally, we present an efficient algorithm for testing primitivity of factors of a string and computing their primitive roots. Applications of runs, despite their importance, are underrepresented in existing literature (approximately one page in the paper of Kolpakov & Kucherov, 1999). In this paper we attempt to fill in this gap. We use Lyndon words and introduce the Lyndon structure of runs as a useful tool when computing powers. In problems related to periods we use some versions of the Manhattan skyline problem.
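
The notions of primitivity and primitive root mentioned in this abstract can be illustrated with a classical standalone check. This is a minimal sketch, not the runs-based algorithm of the paper: it handles one whole word at a time rather than arbitrary factors, and the function names are illustrative assumptions.

```python
def primitive_root(w: str) -> str:
    """Return the shortest word u such that w is a power of u (w non-empty)."""
    if not w:
        raise ValueError("the primitive root is defined here for non-empty words only")
    # The smallest shift k >= 1 at which w reoccurs inside ww is the length
    # of the primitive root; k == len(w) exactly when w is primitive.
    k = (w + w).find(w, 1)
    return w[:k]


def is_primitive(w: str) -> bool:
    """A word is primitive when it is not a power u^e with e >= 2."""
    return primitive_root(w) == w
```

For example, primitive_root("abababab") returns "ab", while "aba" is its own primitive root.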

Language and Automata Theory and Applications | 2010

On the maximal number of cubic runs in a string

Maxime Crochemore; Costas S. Iliopoulos; Marcin Kubica; Jakub Radoszewski; Wojciech Rytter; Tomasz Waleń

A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition v with a period p such that 2p ≤ |v|. The maximal number of runs in a string of length n has been thoroughly studied, and is known to be between 0.944n and 1.029n. In this paper we investigate cubic runs, in which the shortest period p satisfies 3p ≤ |v|. We show the upper bound of 0.5n on the maximal number of such runs in a string of length n, and construct an infinite sequence of words over a binary alphabet for which the lower bound is 0.406n.
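
The definition of a run, and of a cubic run, can be made concrete with a brute-force enumeration. This is only a sketch for small inputs, far from the linear-time algorithms known for the problem; the function names and the (start, end, period) triple format are assumptions made for the illustration.

```python
def smallest_period(v: str) -> int:
    """Smallest p >= 1 such that v[k] == v[k + p] for every valid k."""
    n = len(v)
    for p in range(1, n + 1):
        if all(v[k] == v[k + p] for k in range(n - p)):
            return p
    return 0  # empty word


def runs(s: str, exponent: int = 2):
    """List (start, end, p) for each run s[start:end] with shortest period p
    and length >= exponent * p. Use exponent=3 for cubic runs."""
    n = len(s)
    found = []
    for p in range(1, n // exponent + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with s[k] == s[k + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = i, j + p  # maximal interval with period p
            if end - start >= exponent * p and smallest_period(s[start:end]) == p:
                found.append((start, end, p))
            i = j + 1
    return sorted(found)
```

On "mississippi" this yields the run "ississi" with period 3 together with the three squares "ss", "ss" and "pp", and none of them is cubic.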

Information Processing Letters | 1981

An optimal algorithm for computing the repetitions in a word

Maxime Crochemore

A word has a repetition when it has at least two consecutive equal factors. For instance, abab is a repetition (a square) in aababba. Recently, it has been proved that the set of words containing a square is not context-free [3,7]. This paper presents an algorithm to compute all the repetitions of primitive factors in a word x in time O(|x| log₂ |x|). A straightforward adaptation of Knuth, Morris and Pratt’s string-matching algorithm [5] also solves the problem, but in time O(|x|²). Main and Lorentz have given an O(|x| log₂ |x|) algorithm to find one square in a word x. Their method cannot be directly extended to solve the present problem since they eliminate many repetitions when they are guaranteed to find another one later in the search. Our algorithm uses an improved version of the well-known partitioning technique [1] for refinements of equivalence relations. This version has already been fruitful in a problem concerning partitions on graphs [2]. The optimality of the algorithm is proved by showing that there exist words which have indeed O(|x| log₂ |x|) repetitions. These particular words are Fibonacci words. With a slight modification, the algorithm gives the maximal repetitions of a word. This algorithm is also optimal since it computes all the O(|x| log₂ |x|) maximal repetitions of a Fibonacci word x in time O(|x| log₂ |x|).
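
The example in this abstract (abab is a square in aababba) can be checked with a naive enumeration of squares. This sketch is deliberately simple and is not the partitioning algorithm of the paper; the function name is an assumption.

```python
def squares(x: str):
    """List (position, factor) for every occurrence of a square uu in x."""
    n = len(x)
    found = []
    for i in range(n):
        # p is the length of the repeated half u.
        for p in range(1, (n - i) // 2 + 1):
            if x[i:i + p] == x[i + p:i + 2 * p]:
                found.append((i, x[i:i + 2 * p]))
    return found
```

Calling squares("aababba") returns [(0, 'aa'), (1, 'abab'), (4, 'bb')], which includes the square abab at position 1.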

Algorithms on Strings | 2007

Algorithms on Strings

Maxime Crochemore; Christophe Hancart; Thierry Lecroq

Describing algorithms in a C-like language, this text presents examples related to the automatic processing of natural language, to the analysis of molecular sequences and to the management of textual databases.

Algorithmica | 1994

Speeding up two string-matching algorithms

Maxime Crochemore; Artur Czumaj; Leszek Gasieniec; Stefan Jarominek; Thierry Lecroq; Wojciech Plandowski; Wojciech Rytter

We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm scans as long as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans while the segment is a factor of the pattern. Both algorithms make a shift of the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with small coefficient), and to speed up the BM algorithm (to make at most 2n comparisons). Only a constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates the techniques to transform algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
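
The basic (unaccelerated) RF scanning rule described above can be sketched as follows. As a simplifying assumption, the factor test uses a plain Python set of all substrings of the pattern instead of the factor graph of the reversed pattern, so this sketch shows the scan-and-shift logic only, not the data structure or the speed-up of the paper. The function name is illustrative.

```python
def reverse_factor_search(pattern: str, text: str):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Stand-in for the factor graph: every substring of the pattern.
    factors = {pattern[i:j] for i in range(m + 1) for j in range(i, m + 1)}
    occurrences = []
    pos = 0
    while pos + m <= n:
        end = pos + m
        j = end      # the scanned segment is text[j:end]
        border = 0   # longest scanned suffix (shorter than m) that is a prefix of pattern
        # Scan right-to-left while the segment is still a factor of the pattern.
        while j > pos and text[j - 1:end] in factors:
            j -= 1
            if end - j < m and pattern.startswith(text[j:end]):
                border = end - j
        if j == pos:
            # A factor of length m can only be the pattern itself.
            occurrences.append(pos)
        # Shift so that the remembered prefix stays aligned with the text.
        pos += m - border
    return occurrences
```

For instance, reverse_factor_search("aba", "ababa") returns [0, 2]. After each shift the scan starts again from scratch, which is the "forget the history" behaviour the paper improves on.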

Conference on Current Trends in Theory and Practice of Informatics | 1999

Factor Oracle: A New Structure for Pattern Matching

Cyril Allauzen; Maxime Crochemore; Mathieu Raffinot

We introduce a new automaton on a word p, a sequence of letters taken from an alphabet Σ, that we call the factor oracle. This automaton is acyclic, recognizes at least the factors of p, has m+1 states (where m is the length of p) and a linear number of transitions. We give an on-line construction to build it. We use this new structure in string-matching algorithms that we conjecture to be optimal according to the experimental results. These algorithms are as efficient as the existing ones while using less memory and being easier to implement.
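
The on-line construction can be sketched in a few lines, following the usual textbook presentation of the factor oracle with supply links. The representation (one transition dictionary per state) and the function names are assumptions made for this illustration.

```python
def build_factor_oracle(p: str):
    """Build the factor oracle of p, one letter at a time.

    Returns one transition dict per state; states are 0..len(p)."""
    m = len(p)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)  # supply link of each state; -1 for the initial state
    for i, c in enumerate(p):
        trans[i][c] = i + 1  # "internal" transition along the word
        k = supply[i]
        # Add "external" transitions from the states on the supply path
        # that have no transition by c yet.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        supply[i + 1] = 0 if k == -1 else trans[k][c]
    return trans


def accepts(trans, w: str) -> bool:
    """Every state is accepting: w is recognized if it can be read from state 0."""
    state = 0
    for c in w:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

With p = "abbbaab" the oracle has 8 states and recognizes every factor of p, but it also recognizes "aba", which is not a factor of p. This is why the abstract says "at least the factors".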

Journal of the ACM | 1991

Two-way string-matching

Maxime Crochemore; Dominique Perrin

A new string-matching algorithm is presented, which can be viewed as an intermediate between the classical algorithms of Knuth, Morris, and Pratt on the one hand and Boyer and Moore on the other hand. The algorithm is linear in time and uses constant space, like the algorithm of Galil and Seiferas. It has the advantage of being remarkably simple, which consequently makes its analysis possible. The algorithm relies on a previously known result in combinatorics on words, called the Critical Factorization Theorem, which relates the global period of a word to its local repetitions of blocks.
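
The relation between the global period and local repetitions can be checked by brute force on small words. The sketch below only illustrates the statement of the Critical Factorization Theorem (some cut position has a local period equal to the global period). It is not the two-way algorithm itself, and the function names are assumptions.

```python
def period(x: str) -> int:
    """Smallest p >= 1 with x[i] == x[i + p] wherever both sides exist."""
    n = len(x)
    for p in range(1, n + 1):
        if all(x[i] == x[i + p] for i in range(n - p)):
            return p
    return 0  # empty word


def local_period(x: str, i: int) -> int:
    """Smallest r >= 1 such that a square of half-length r, centred at the cut
    between x[:i] and x[i:], agrees with x wherever the two overlap."""
    n = len(x)
    r = 1
    while True:
        if all(x[j] == x[j + r] for j in range(max(0, i - r), min(i, n - r))):
            return r
        r += 1


def critical_positions(x: str):
    """Cut positions 1..len(x)-1 whose local period equals the global period."""
    p = period(x)
    return [i for i in range(1, len(x)) if local_period(x, i) == p]
```

For x = "abaab" the period is 3, and the cuts at positions 2 and 4 are critical: critical_positions("abaab") returns [2, 4].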

Mathematical Foundations of Computer Science | 2007

Finding patterns in given intervals

Maxime Crochemore; Costas S. Iliopoulos; M. Sohel Rahman

In this paper, we study the pattern matching problem in given intervals. Depending on whether the intervals are given a priori for pre-processing, or during the query along with the pattern or, even in both cases, we develop solutions for different variants of this problem. In particular, we present efficient indexing schemes for each of the above variants of the problem.

Information Processing Letters | 1998

Automata and forbidden words

Maxime Crochemore; Filippo Mignosi; Antonio Restivo

Let L(M) be the (factorial) language avoiding a given anti-factorial language M. We design an automaton accepting L(M) and built from the language M. The construction is effective if M is finite. If M is the set of minimal forbidden words of a single word ν, the automaton turns out to be the factor automaton of ν (the minimal automaton accepting the set of factors of ν). We also give an algorithm that builds the trie of M from the factor automaton of a single word. It yields a nontrivial upper bound on the number of minimal forbidden words of a word.
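
The minimal forbidden words of a single word can be listed by brute force for small inputs. A word w is minimal forbidden for v when w is not a factor of v but both w without its first letter and w without its last letter are factors. This sketch works from the set of all factors rather than from the factor automaton used in the paper; the function names and the explicit alphabet argument are assumptions.

```python
def factors(v: str):
    """All factors (substrings) of v, including the empty word."""
    n = len(v)
    return {v[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def minimal_forbidden_words(v: str, alphabet: str):
    """Words w over the alphabet with w not a factor of v, while
    w[:-1] and w[1:] are both factors of v."""
    fac = factors(v)
    result = set()
    for u in fac:               # u plays the role of w[:-1]
        for a in alphabet:
            w = u + a
            if w not in fac and w[1:] in fac:
                result.add(w)
    return result
```

For v = "aab" over the alphabet {a, b}, the minimal forbidden words are aaa, ba and bb.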

Handbook of Formal Languages, Vol. 2 | 1997