Géraldine Jean
University of Nantes
Publication
Featured research published by Géraldine Jean.
BMC Medical Genomics | 2015
Guillaume Fertin; Géraldine Jean; Andreea Radulescu; Irena Rusu
Background: As one of the most studied genome rearrangements, tandem repeats have a considerable impact on the genetic background of inherited diseases. Many methods designed for tandem repeat detection on reference sequences obtain high-quality results. However, in a de novo context, where no reference sequence is available, tandem repeat detection remains a difficult problem. The short reads obtained with second-generation sequencing methods are not long enough to span regions that contain long repeats. This length limitation was tackled by the long reads obtained with third-generation sequencing platforms such as Pacific Biosciences technologies. Nevertheless, the gain in read length came with a significant increase in the error rate. The main objective of current studies on long reads is to handle error rates of up to 16%. Methods: In this paper we present MixTaR, the first de novo method for tandem repeat detection that combines the high quality of short reads with the large length of long reads. Our hybrid algorithm uses the set of short reads for tandem repeat pattern detection based on a de Bruijn graph. These patterns are then validated using the long reads, and the tandem repeat sequences are constructed using local greedy assemblies. Results: MixTaR is tested with both simulated and real reads from complex organisms. For a complete analysis of its robustness to errors, we use short and long reads with different error rates. The results are then analysed in terms of the number of tandem repeats detected and the length of their patterns. Conclusions: Our method shows high precision and sensitivity. With low false positive rates even for highly erroneous reads, MixTaR is able to detect accurate tandem repeats with pattern lengths varying within a significant interval.
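The abstract above mentions pattern detection on a de Bruijn graph built from short reads. As a rough illustration only (this is not MixTaR's algorithm, which the abstract does not detail), the Python sketch below builds a de Bruijn graph and looks for a simple cycle, since a tandem repeat can show up as a cycle when k is small relative to the repeat. The function names, the toy reads, and the choice k = 4 are all assumptions made for the example.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def find_cycle(graph, start):
    """Return the nodes of a simple cycle through `start`, or None (iterative DFS)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                return path
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return None

def cycle_pattern(cycle):
    """Spell the repeated unit: the last character of each node on the cycle."""
    return "".join(node[-1] for node in cycle)

# Hypothetical error-free short reads covering a region with the unit ACG repeated.
reads = ["TTACGACG", "ACGACGACG", "GACGACGTG"]
graph = build_de_bruijn(reads, 4)
cycle = find_cycle(graph, "ACG")   # ['ACG', 'CGA', 'GAC']
pattern = cycle_pattern(cycle)     # 'GAC', a rotation of the unit ACG
```

The sketch ignores sequencing errors, k-mer multiplicities, and validation against long reads, all of which a real method has to handle.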
Scientific Reports | 2018
Dinka Mandakovic; Claudia Rojas; Jonathan Maldonado; Mauricio Latorre; Dante Travisany; Erwan Delage; Audrey Bihouée; Géraldine Jean; Francisca Díaz; Beatriz Fernández-Gómez; Pablo Cabrera; Alexis Gaete; Claudio Latorre; Rodrigo A. Gutiérrez; Alejandro Maass; Verónica Cambiazo; Sergio A. Navarrete; Damien Eveillard; Mauricio González
Understanding the factors that modulate bacterial community assembly in natural soils is a longstanding challenge in microbial community ecology. In this work, we compared two microbial co-occurrence networks representing bacterial soil communities from two different sections of a pH, temperature and humidity gradient occurring along a western slope of the Andes in the Atacama Desert. In doing so, a topological graph alignment of co-occurrence networks was used to determine the impact of a shift in environmental variables on OTU taxonomic composition and relationships. We observed that a fraction of the association patterns identified in the co-occurrence networks persist despite large environmental variation. This apparent resilience seems to be due to: (1) a proportion of OTUs that persist across the gradient and maintain similar association patterns within the community and (2) bacterial community ecological rearrangements, where an important fraction of the OTUs come to fill the ecological roles of other OTUs in the other network. Moreover, potential functional features suggest a fundamental role of persistent OTUs along the soil gradient involving nitrogen fixation. Our results allow the identification of factors that induce changes in microbial assemblage configuration, altering specific bacterial soil functions and interactions within the microbial communities in natural environments.
Workshop on Algorithms in Bioinformatics | 2016
Guillaume Fertin; Géraldine Jean; Eric Tannier
All combinatorial works on genome rearrangements have so far ignored the influence of intergene sizes, i.e. the number of nucleotides between consecutive genes, although it was recently shown to be decisive for the accuracy of inference methods [3, 4]. In this line, we define a new genome rearrangement model called wDCJ, a generalization of the well-known Double Cut and Join (or DCJ) model that allows for modifying both the gene order and the intergene size distribution of a genome. We first provide a generic formula for the wDCJ distance between two genomes, and show that computing this distance is strongly NP-complete. We then propose an approximation algorithm of ratio 3/2, and two exact ones: a fixed-parameter tractable (FPT) algorithm and an ILP formulation. We finally provide theoretical and empirical bounds on the expected growth of the parameter at the center of our FPT and ILP algorithms, assuming a probabilistic model of evolution under wDCJ, which shows that both these algorithms should run reasonably fast in practice.
Bioinformatics and Biomedicine | 2014
Guillaume Fertin; Géraldine Jean; Andreea Radulescu; Irena Rusu
Genomes present various types of repeated structures that play important roles in the mechanism of evolution. In particular, tandem repeats are analysed for their impact on the genetic background of inherited diseases. However, the main objective of today's de novo assemblers is to output long, high-quality, assembled sequences; to this end, they use heuristic-based assembly procedures, which can leave many repeated regions unassembled, in particular exact tandem repeats, due to the genome's complexity. In this paper, we propose an effective method, called DExTaR, that improves the detection of exact tandem repeats (ETRs) in any de novo de Bruijn assembly. DExTaR is based on a de Bruijn graph constructed by an assembler and retrieves ETRs left unassembled. When used with the well-known assembler ABySS, we show that DExTaR is able to obtain high-quality results in terms of the number and length of the detected ETRs.
Theory and Applications of Models of Computation | 2017
Guillaume Fertin; Julien Fradin; Géraldine Jean
Given a vertex-colored arc-weighted directed acyclic graph G, the Maximum Colorful Subtree problem (or MCS) aims at finding an arborescence of maximum weight in G, in which no color appears more than once. The problem was originally introduced in [2] in the context of de novo identification of metabolites by tandem mass spectrometry. However, a thorough analysis of the initial motivation shows that the formal definition of MCS needs to be amended, since the input graph G actually possesses two extra properties, which are so far unexploited. This leads us to introduce in this paper a more precise model that we call Maximum Colorful Arborescence (MCA), and extensively study it in terms of algorithmic complexity. In particular, we show that exploiting the implied color hierarchy of the input graph can lead to polynomial algorithms. We also develop Fixed-Parameter Tractable (FPT) algorithms for the problem, notably using the “dual parameter” ℓ, defined as the number of vertices of G which are not kept in the solution.
Brazilian Symposium on Bioinformatics | 2018
Andre Rodrigues Oliveira; Géraldine Jean; Guillaume Fertin; Ulisses Dias; Zanoni Dias
The evolutionary distance between two genomes can be estimated by computing the minimum-length sequence of operations, called genome rearrangements, that transforms one genome into another. Usually, a genome is modeled as an ordered sequence of (possibly signed) genes, and almost all the studies that have been undertaken in the genome rearrangement literature consist in shaping biological scenarios into mathematical models: for instance, allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number k of genes), considering that a rearrangement implies a cost depending on its length rather than a unit cost, etc. However, most of the works in the field have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes; the genome rearrangement operation we consider here is a constrained type of reversal, called a super short reversal, which affects up to two (consecutive) genes. We propose here three algorithms to solve the problem: a 3-approximation algorithm that applies to any instance, and two additional algorithms that apply only to specific types of genomes with respect to their gene order: the first one is an exact algorithm, while the second is a 2-approximation algorithm.
Discrete Applied Mathematics | 2017
Guillaume Fertin; Loïc Jankowiak; Géraldine Jean
The Sorting by Prefix Reversals problem consists in sorting the elements of a given permutation π using a minimum number of prefix reversals, i.e. reversals that always involve the leftmost element of π. A natural extension of this problem is to consider strings, in which any letter may appear several times, rather than permutations. In strings, three different types of problems arise: grouping (given a string S, transform it so that all identical letters are consecutive), sorting (a constrained version of grouping, in which the target string must be lexicographically ordered) and rearranging (given two strings S and T, transform S into T). In this paper, we study these three problems, from an algorithmic viewpoint, in the setting where two operations, rather than one, are allowed: namely, prefix and suffix reversals, where a suffix reversal must always involve the rightmost element of the string. We first compare the “prefix reversals only” case to our case, before presenting a series of algorithmic results on these three problems concerning polynomiality, constant ratio approximation algorithms, NP-hardness and fixed-parameter tractability. These results depend on the size k of the alphabet on which the strings are built, with a particular focus on small-sized alphabet instances (i.e., k = O(1)) and big-sized alphabet instances (i.e., n − k = O(1), where n is the length of the input string(s)).
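To make the two operations and the grouping problem from this abstract concrete, here is a minimal Python sketch. It is a brute-force breadth-first search over strings, not one of the paper's algorithms; it runs in exponential time and is only meant for tiny inputs. The function names are hypothetical.

```python
from collections import deque

def prefix_reversal(s, i):
    """Reverse the first i characters of s (always involves the leftmost one)."""
    return s[:i][::-1] + s[i:]

def suffix_reversal(s, i):
    """Reverse the last i characters of s (always involves the rightmost one)."""
    cut = len(s) - i
    return s[:cut] + s[cut:][::-1]

def is_grouped(s):
    """True if all identical letters of s are consecutive."""
    seen, prev = set(), None
    for c in s:
        if c != prev:
            if c in seen:
                return False
            seen.add(c)
            prev = c
    return True

def min_reversals_to_group(s):
    """Minimum number of prefix/suffix reversals needed to group s (BFS)."""
    if is_grouped(s):
        return 0
    queue, visited = deque([(s, 0)]), {s}
    while queue:
        cur, dist = queue.popleft()
        for i in range(2, len(cur) + 1):  # reversals of length 1 change nothing
            for nxt in (prefix_reversal(cur, i), suffix_reversal(cur, i)):
                if nxt in visited:
                    continue
                if is_grouped(nxt):
                    return dist + 1
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(prefix_reversal("abcd", 3))      # cbad
print(suffix_reversal("abcd", 3))      # adcb
print(min_reversals_to_group("abab"))  # 2, e.g. abab -> baab -> aabb
```

Because the search explores every reachable string, it serves as a reference for checking faster algorithms on small instances rather than as a practical solver.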
Algorithms for Next-Generation Sequencing Data | 2017
Géraldine Jean; Andreea Radulescu; Irena Rusu
DNA sequencing, assuming no prior knowledge on the target DNA fragment, may be roughly described as the succession of two steps. The first of them uses some sequencing technology to output, for a given DNA fragment (not necessarily a whole genome), a collection of possibly overlapping sequences (called reads) representing small parts of the initial DNA fragment. The second one aims at recovering the sequence of the entire DNA fragment by assembling the reads.
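The second step, assembling overlapping reads, can be illustrated with a classic greedy overlap-merge heuristic. The Python sketch below is only a toy under strong assumptions (error-free reads, same strand, no repeats longer than the overlap threshold) and is not taken from the chapter; the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads

# Hypothetical reads drawn from the fragment ATGGCGTACCTGA.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

Greedy merging is easy to state but can misassemble repeated regions, which is one reason repeats remain difficult in de novo settings.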
Algorithms for Molecular Biology | 2017
Guillaume Fertin; Géraldine Jean; Eric Tannier
String Processing and Information Retrieval | 2015
Guillaume Fertin; Loïc Jankowiak; Géraldine Jean