Publication


Featured research published by Élise Prieur-Gaston.


BMC Bioinformatics | 2012

EVA: Exome Variation Analyzer, an efficient and versatile tool for filtering strategies in medical genomics

Sophie Coutant; Chloé Cabot; Arnaud Lefebvre; Martine Léonard; Élise Prieur-Gaston; Dominique Campion; Thierry Lecroq; Hélène Dauchel

Background: Whole exome sequencing (WES) has become the strategy of choice to identify a coding allelic variant for a rare human monogenic disorder. This approach is a revolution in medical genetics history, impacting both fundamental research and diagnostic methods leading to personalized medicine. A plethora of efficient algorithms has been developed to ensure variant discovery. They generally yield ~20,000 variations that have to be narrowed down to find the potential pathogenic allelic variant(s) and the affected gene(s). For this purpose, commonly adopted procedures involving various filtering strategies have emerged: exclusion of common variations, type of allelic variant, pathogenicity effect prediction, mode of inheritance, and comparison of exomes from multiple individuals. To deal with the expansion of WES in individual medical genomics laboratories, new user-friendly and versatile software tools have to implement these filtering steps. Non-programmer biologists need to be able to combine different filtering criteria on their own and conduct a personal strategy depending on their assumptions and study design. Results: We describe EVA (Exome Variation Analyzer), a user-friendly web-interfaced software dedicated to filtering strategies for medical WES. Through its different modules, EVA (i) integrates and stores annotated exome variation data as strictly confidential to the project owner, (ii) allows the main filters dealing with common variations, molecular types, inheritance mode and multiple samples to be combined, (iii) offers browsing of annotated data and filtered results in various interactive tables, graphical visualizations and statistical charts, and (iv) offers export files and cross-links to useful external databases and software for further prioritization of the small subset of sorted candidate variations and genes. We report a demonstrative case study that allowed the identification of a new candidate gene related to a rare form of Alzheimer disease. Conclusions: EVA is developed to be a user-friendly, versatile, and efficient filtering-assistance software for WES. It constitutes a platform for data storage and for drastic screening of clinically relevant genetic variations by non-programmer geneticists. It thereby provides a response to new needs in the expanding era of medical genomics investigated by WES, for both fundamental research and clinical diagnostics.


BMC Bioinformatics | 2013

Combining MEDLINE and publisher data to create parallel corpora for the automatic translation of biomedical text

Antonio Jimeno Yepes; Élise Prieur-Gaston; Aurélie Névéol

Background: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. Results: We show how a large parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. Conclusions: We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts.


Developments in Language Theory | 2013

Abelian Repetitions in Sturmian Words

Gabriele Fici; Alessio Langiu; Thierry Lecroq; Arnaud Lefebvre; Filippo Mignosi; Élise Prieur-Gaston

We investigate abelian repetitions in Sturmian words. We exploit a bijection between factors of Sturmian words and subintervals of the unitary segment that allows us to study the periods of abelian repetitions by using classical results of elementary Number Theory. If \(k_{m}\) denotes the maximal exponent of an abelian repetition of period \(m\), we prove that \(\limsup k_{m}/m \ge \sqrt{5}\) for any Sturmian word, and the equality holds for the Fibonacci infinite word. We further prove that the longest prefix of the Fibonacci infinite word that is an abelian repetition of period \(F_{j}\), \(j > 1\), has length \(F_{j}(F_{j+1} + F_{j-1} + 1) - 2\) if \(j\) is even or \(F_{j}(F_{j+1} + F_{j-1}) - 2\) if \(j\) is odd. This allows us to give an exact formula for the smallest abelian periods of the Fibonacci finite words. More precisely, we prove that for \(j \ge 3\), the Fibonacci word \(f_{j}\) has abelian period equal to \(F_{n}\), where \(n = \lfloor j/2 \rfloor\) if \(j = 0, 1, 2 \bmod 4\), or \(n = 1 + \lfloor j/2 \rfloor\) if \(j = 3 \bmod 4\).
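As a way to experiment with the notion of abelian period on small Fibonacci words, the following is a minimal brute-force sketch. It is not the method of the paper. It assumes the usual definition in which a word has abelian period \(p\) with head length \(h < p\) when it splits into a head, one or more blocks of length \(p\) sharing the same Parikh vector, and a tail shorter than \(p\), with the letter counts of head and tail contained in those of a block. The indexing \(f_0 = \texttt{b}\), \(f_1 = \texttt{a}\), \(f_j = f_{j-1} f_{j-2}\) and all function names are assumptions made for illustration.

```python
from collections import Counter


def contained(small, big):
    """True if every letter count in `small` is at most the count in `big`."""
    return all(big[c] >= n for c, n in small.items())


def has_abelian_period(w, h, p):
    """Check whether w has abelian period p with a head of length h (assumed definition)."""
    if h >= p or h + p > len(w):
        return False
    ref = Counter(w[h:h + p])          # Parikh vector of the first full block
    if not contained(Counter(w[:h]), ref):
        return False
    i = h + p
    while i + p <= len(w):             # every further full block must match
        if Counter(w[i:i + p]) != ref:
            return False
        i += p
    return contained(Counter(w[i:]), ref)  # tail is shorter than p by construction


def smallest_abelian_period(w):
    """Smallest p such that some head length h < p gives an abelian period (brute force)."""
    for p in range(1, len(w) + 1):
        for h in range(p):
            if has_abelian_period(w, h, p):
                return p
    return None


def fibonacci_word(j):
    """f_0 = 'b', f_1 = 'a', f_j = f_{j-1} + f_{j-2} (assumed indexing)."""
    prev, cur = "b", "a"
    for _ in range(j):
        prev, cur = cur, cur + prev
    return prev


if __name__ == "__main__":
    for j in range(3, 9):
        w = fibonacci_word(j)
        print(j, len(w), smallest_abelian_period(w))
```

The search is polynomial but far from optimal; it is only meant for comparing small cases against the closed formula stated in the abstract.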


BMC Bioinformatics | 2012

Matching health information seekers' queries to medical terms

Lina Fatima Soualmia; Élise Prieur-Gaston; Zied Moalla; Thierry Lecroq; Stéfan Jacques Darmoni

Background: The Internet is a major source of health information but most seekers are not familiar with medical vocabularies. Hence, their searches fail due to bad query formulation. Several methods have been proposed to improve information retrieval: query expansion, syntactic and semantic techniques or knowledge-based methods. However, it would be useful to clean those queries which are misspelled. In this paper, we propose a simple yet efficient method to correct misspellings of queries submitted by health information seekers to a medical online search tool. Methods: In addition to query normalizations and exact phonetic term matching, we tested two approximate string comparators: the similarity score function of Stoilos and the normalized Levenshtein edit distance. We propose here to combine them to increase the number of matched medical terms in French. We first took a sample of query logs to determine the thresholds and processing times. In the second run, at a greater scale, we tested different combinations of query normalizations before or after misspelling correction with the thresholds retained from the first run. Results: According to the total number of suggestions (around 163, the number of the first sample of queries), at a threshold comparator score of 0.3, the normalized Levenshtein edit distance gave the highest F-Measure (88.15%) and at a threshold comparator score of 0.7, the Stoilos function gave the highest F-Measure (84.31%). By combining Levenshtein and Stoilos, the highest F-Measure (80.28%) is obtained with 0.2 and 0.7 thresholds respectively. However, queries are composed of several terms that may be combinations of medical terms. A process of query normalization and segmentation is thus required. The highest F-Measure (64.18%) is obtained when this process is performed before spelling correction. Conclusions: Despite the widely known high performance of the normalized edit distance of Levenshtein, we show in this paper that its combination with the Stoilos algorithm improved the results for misspelling correction of user queries. Accuracy is improved by combining spelling, phoneme-based information and string normalizations and segmentations into medical terms. These encouraging results have enabled the integration of this method into two projects funded by the French National Research Agency-Technologies for Health Care. The first aims to facilitate the coding process of clinical free texts contained in Electronic Health Records and discharge summaries, whereas the second aims at improving information retrieval through Electronic Health Records.
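To make the role of a normalized edit distance in query correction more concrete, here is a minimal sketch. It covers only the Levenshtein side; the Stoilos function and the phonetic matching are not reproduced. Dividing by the length of the longer string and treating the threshold as a maximum allowed normalized distance are assumptions, since the abstract does not spell out either choice, and the names `normalized_levenshtein` and `suggest` are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def normalized_levenshtein(a, b):
    """Edit distance divided by the longer length (assumed normalization), in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def suggest(query, vocabulary, threshold=0.2):
    """Return vocabulary terms within the (assumed) maximum normalized distance, closest first."""
    scored = sorted((normalized_levenshtein(query, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= threshold]


if __name__ == "__main__":
    print(suggest("diabtes", ["diabetes", "dialysis"]))
```

A combined scheme in the spirit of the abstract would keep a candidate only if it also passes a second comparator, at the cost of fewer suggestions.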


Language and Automata Theory and Applications | 2015

Online Computation of Abelian Runs

Gabriele Fici; Thierry Lecroq; Arnaud Lefebvre; Élise Prieur-Gaston

Given a word \(w\) and a Parikh vector \(\mathcal {P}\), an abelian run of period \(\mathcal {P}\) in \(w\) is a maximal occurrence of a substring of \(w\) having abelian period \(\mathcal {P}\). We give an algorithm that finds all the abelian runs of period \(\mathcal {P}\) in a word of length \(n\) in time \(O(n\times |\mathcal {P}|)\) and space \(O(\sigma +|\mathcal {P}|)\).
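A basic building block behind such computations is locating every factor of \(w\) whose Parikh vector equals \(\mathcal{P}\). The sketch below does this with a sliding window. It is not the algorithm of the paper and it does not assemble the occurrences into maximal runs. Representing \(\mathcal{P}\) as a dictionary from letters to counts and the function name `parikh_occurrences` are assumptions for illustration.

```python
from collections import Counter


def parikh_occurrences(w, P):
    """Start positions i such that w[i:i+p] has Parikh vector P, where p = sum of P."""
    target = Counter({c: n for c, n in P.items() if n > 0})
    p = sum(target.values())
    if p == 0 or p > len(w):
        return []
    window = Counter(w[:p])
    positions = [0] if window == target else []
    for i in range(1, len(w) - p + 1):
        out_letter, in_letter = w[i - 1], w[i + p - 1]
        window[out_letter] -= 1
        if window[out_letter] == 0:
            del window[out_letter]      # keep the counter free of zero entries
        window[in_letter] += 1
        if window == target:
            positions.append(i)
    return positions


if __name__ == "__main__":
    print(parikh_occurrences("abaab", {"a": 1, "b": 1}))
```

Each window comparison costs time proportional to the number of distinct letters, so this naive version is slower than the bounds quoted above.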


Theoretical Computer Science | 2016

Binary block order Rouen Transform

Jacqueline W. Daykin; Richard Groult; Yannick Guesnet; Thierry Lecroq; Arnaud Lefebvre; Martine Léonard; Élise Prieur-Gaston

Novel twin binary Burrows-Wheeler type transforms are introduced. The transforms are defined for Lyndon-like B-words which apply binary block order. We call this approach the B-BWT Rouen Transform. These bijective Rouen Transforms and inverses are computed in linear time. Preliminary experimental results indicate potential value of binary transforms. We introduce bijective Burrows-Wheeler type transforms for binary strings. The original method by Burrows and Wheeler [4] is based on lexicographic order for general alphabets, and the transform is defined to be the last column of the ordered BWT matrix. This new approach applies binary block order, B-order, which yields not one, but twin transforms: one based on Lyndon words, the other on a repetition of Lyndon words. These binary B-BWT transforms are constructed here for B-words, analogous structures to Lyndon words. A key computation in the transforms is the application of a linear-time suffix-sorting technique, such as [18,21,22,27], to sort the cyclic rotations of a binary input string into their B-order. Moreover, like the original lexicographic transform, we show that computing the B-BWT inverses is also achieved in linear time by using straightforward combinatorial arguments.
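For reference, the original lexicographic transform mentioned above (the last column of the sorted matrix of cyclic rotations) can be sketched as follows. This illustrates the classic Burrows-Wheeler transform only, not the B-order transforms of the paper, and it sorts rotations naively rather than with a linear-time suffix-sorting technique. The function name `bwt_last_column` is hypothetical.

```python
def bwt_last_column(s):
    """Classic BWT without sentinel: last column of the lexicographically sorted rotations."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt_last_column("banana"))
```

Without a sentinel or an index of the original row, this output alone does not determine the input uniquely, which is one motivation for bijective variants.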


Theoretical Computer Science | 2016

Fast computation of abelian runs

Gabriele Fici; Tomasz Kociumaka; Thierry Lecroq; Arnaud Lefebvre; Élise Prieur-Gaston

Given a word \(w\) and a Parikh vector \(\mathcal{P}\), an abelian run of period \(\mathcal{P}\) in \(w\) is a maximal occurrence of a substring of \(w\) having abelian period \(\mathcal{P}\). Our main result is an online algorithm that, given a word \(w\) of length \(n\) over an alphabet of cardinality \(\sigma\) and a Parikh vector \(\mathcal{P}\), returns all the abelian runs of period \(\mathcal{P}\) in \(w\) in time \(O(n)\) and space \(O(\sigma + p)\), where \(p\) is the norm of \(\mathcal{P}\), i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm \(p\) in \(w\) in time \(O(np)\), for any given norm \(p\). Finally, we give an \(O(n^{2})\)-time offline randomized algorithm for computing all the abelian runs of \(w\). Its deterministic counterpart runs in \(O(n^{2} \log \sigma)\) time.


Archive | 2011

Correction orthographique de requêtes: L’apport des distances de Levenshtein et Stoilos

Zied Moalla; Lina Fatima Soualmia; Élise Prieur-Gaston; Stéfan Jacques Darmoni

Background: Medical text repositories not only constitute a significant amount of data but also represent an interesting scientific test bed for those willing to apply natural language processing to information retrieval. In order to improve retrieval performance of the Catalogue and Index of Health Resources in French (CISMeF) and its search tool Doc’CISMeF, we tested a new method to correct misspellings of the queries written by the users. Methods: In addition to exact phonetic term matching, we tested two approximate string comparators. The approximate comparators are the string distance metric of Stoilos and the Levenshtein edit distance. We also calculated the results of the combination of the two algorithms to examine whether it improves misspelling correction of the queries. Results: At a threshold comparator score of 0.2, the normalized Levenshtein algorithm achieved the highest recall (76%), but the highest precision (94%) was achieved by combining the distances of Levenshtein and Stoilos. Conclusion: Despite the well-known good performance of the normalized edit distance of Levenshtein, we have demonstrated in this paper that its combination with the Stoilos algorithm improves the results for misspelling correction.


Prague Stringology Conference | 2011

Computing abelian periods in words

Gabriele Fici; Thierry Lecroq; Arnaud Lefebvre; Élise Prieur-Gaston


Prague Stringology Conference | 2012

Quasi-linear Time Computation of the Abelian Periods of a Word

Gabriele Fici; Thierry Lecroq; Arnaud Lefebvre; Élise Prieur-Gaston; William F. Smyth
