Pierre Peterlongo | Researchain

Publication

Featured researches published by Pierre Peterlongo.

Algorithms for Molecular Biology | 2009

Lossless Filter for Multiple Repeats with Bounded Edit Distance

Pierre Peterlongo; Gustavo Sacomoto; Alair Pereira do Lago; Nadia Pisanti; Marie-France Sagot

Background: Identifying local similarity between two or more sequences, or identifying repeats occurring at least twice in a sequence, is an essential part of the analysis of biological sequences and of their phylogenetic relationship. Finding such fragments while allowing for a certain number of insertions, deletions, and substitutions is, however, known to be a computationally expensive task, and consequently exact methods usually cannot be applied in practice. Results: The filter TUIUIU that we introduce in this paper provides a possible solution to this problem. It can be used as a preprocessing step for any multiple alignment or repeat inference method, eliminating a possibly large fraction of the input that is guaranteed not to contain any approximate repeat. It consists of verifying several strong necessary conditions that can be checked quickly. We implemented three versions of the filter. The first is a straightforward extension, to the case of multiple sequences, of conditions already existing in the literature. The second uses a stronger condition which, as our results show, filters noticeably more with negligible (if any) additional time. The third version uses an additional condition and pushes the filtering even further, at a non-negligible additional time cost in many circumstances; our experiments show that it is particularly useful with large error rates. This third version was applied as a preprocessing step for a multiple alignment tool, giving an overall time (filter plus alignment) on average 63 and at best 530 times smaller than before (direct alignment), with a better-quality alignment in most cases. Conclusion: To the best of our knowledge, TUIUIU is the first filter designed for multiple repeats and for dealing with error rates greater than 10% of the repeat length.
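The abstract does not state which necessary conditions the filter checks. As an illustration only, the sketch below uses one classical condition from the literature, the q-gram lemma: two strings within edit distance e must share at least max(|u|, |v|) + 1 - q(e + 1) q-grams, counted with multiplicity. The function names are hypothetical, and this should not be read as TUIUIU's actual implementation.

```python
from collections import Counter


def qgrams(s, q):
    """Return the multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_threshold(m, q, e):
    """Minimum number of shared q-grams for length m and at most e edits."""
    return m + 1 - q * (e + 1)


def may_be_similar(u, v, q, e):
    """Necessary (not sufficient) condition for edit distance <= e.

    Returning False guarantees the pair is farther apart than e edits,
    so it can be discarded without loss. Returning True only means the
    pair survives the filter and still has to be verified.
    """
    shared = sum((qgrams(u, q) & qgrams(v, q)).values())
    return shared >= qgram_threshold(max(len(u), len(v)), q, e)


if __name__ == "__main__":
    print(may_be_similar("ACGTACGTAC", "ACGTTCGTAC", q=3, e=1))
    print(may_be_similar("AAAAAAAAAA", "CCCCCCCCCC", q=3, e=1))
```

Because the condition is only necessary, a filter built on it is lossless: it may keep false candidates, but it never discards a true approximate repeat.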

String Processing and Information Retrieval | 2005

Lossless filter for finding long multiple approximate repetitions using a new data structure, the bi-factor array

Pierre Peterlongo; Nadia Pisanti; Frédéric Boyer; Marie-France Sagot

Similarity search in texts, notably biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created to speed up the resolution of the problem. However, previous filters were designed for speeding up pattern matching, or for finding repetitions between two sequences or occurring twice in the same sequence. In this paper, we present an algorithm called NIMBUS for filtering sequences prior to finding repetitions occurring more than twice in a sequence or in more than two sequences. NIMBUS uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: when NIMBUS is used to preprocess a data set in which one wants to find functional elements using a multiple local alignment tool such as GLAM ([7]), the overall execution time can be reduced from 10 hours to 6 minutes while obtaining exactly the same results.
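The abstract does not define the bi-factor array itself. As a rough illustration of indexing gapped seeds, the sketch below assumes a bi-factor is a pair of length-k substrings separated by a fixed gap, and stores occurrences in a plain dictionary rather than the array structure of the paper. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def index_bifactors(text, k, gap):
    """Map each (left, right) pair of k-mers separated by `gap`
    characters to the sorted list of start positions where it occurs.

    Assumption: this dictionary is a simple stand-in for the bi-factor
    array, not a reproduction of it.
    """
    index = defaultdict(list)
    span = 2 * k + gap  # total length covered by one gapped seed
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + gap:i + span]
        index[(left, right)].append(i)
    return dict(index)


if __name__ == "__main__":
    idx = index_bifactors("ACGTACGTAC", k=2, gap=2)
    # Seeds occurring more than once are candidate repeat anchors.
    for seed, positions in idx.items():
        if len(positions) > 1:
            print(seed, positions)
```

The characters inside the gap are ignored, so two occurrences of the same seed may differ there, which is what lets gapped seeds tolerate errors.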

International Conference on Implementation and Application of Automata | 2006

Finding common motifs with gaps using finite automata

Pavlos Antoniou; Jan Holub; Costas S. Iliopoulos; Bořivoj Melichar; Pierre Peterlongo

We present an algorithm that uses finite automata to find the common motifs with gaps occurring in all strings belonging to a finite set S = {S1, S2, ..., Sr}. To find these common motifs, we must first identify the factors that exist in each string. The algorithm therefore begins by constructing a factor automaton for each string Si. To find the common factors of all the strings, the algorithm needs to gather all the factors from the strings together in one data structure, and this is achieved by computing an automaton that accepts the union of the above-mentioned automata. Using this automaton we are able to create a new factor alphabet. Based on this factor alphabet, a finite automaton is created for each string Si that accepts sequences of all non-overlapping factors residing in that string. The intersection of the latter automata produces the finite automaton that accepts all the common subsequences with gaps over the factor alphabet that are present in all the strings of the set S. These common subsequences are the common motifs of the strings.
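The first stage described above, collecting the factors of each string and keeping those common to all of them, can be illustrated with plain sets. This is a naive stand-in under stated assumptions: it enumerates substrings explicitly instead of building factor automata, so it takes quadratic space per string and is only meant to show what the common factors are. The function names are hypothetical.

```python
def factors(s):
    """Return the set of all non-empty factors (substrings) of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def common_factors(strings):
    """Return the factors that occur in every string of the collection."""
    if not strings:
        return set()
    common = factors(strings[0])
    for s in strings[1:]:
        common &= factors(s)  # keep only factors also present in s
    return common


if __name__ == "__main__":
    shared = common_factors(["abcab", "cabd", "xcab"])
    print(sorted(shared, key=lambda f: (len(f), f)))
```

In the approach described in the abstract, the common factors then serve as symbols of a new alphabet over which gapped motifs are searched; that second stage is not sketched here.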

Language and Automata Theory and Applications | 2007

Application of suffix trees for the acquisition of common motifs with gaps in a set of strings

Pavlos Antoniou; Maxime Crochemore; Costas S. Iliopoulos; Pierre Peterlongo

Prague Stringology Conference | 2006

The gapped-factor tree

Pierre Peterlongo; Julien Allali; Marie-France Sagot

Archive | 2011

Mapsembler, targeted assembly of large genomes on a desktop computer

Pierre Peterlongo; Rayan Chikhi

International Conference on Holobionts | 2017

A transcriptomic approach to study marine plankton holobionts

Arnaud Meng; Erwan Corre; Pierre Peterlongo; Camille Marchet; Adriana Alberti; Corinne Da Silva; Patrick Wincker; Ian Probert; Noritoshi Suzuki; Stéphane Le Crom; Lucie Bittner; Fabrice Not

Archive | 2016