Publication


Featured research published by Nicolas Philippe.


Genome Biology | 2013

CRAC: an integrated approach to the analysis of RNA-seq reads

Nicolas Philippe; Mikaël Salson; Thérèse Commes; Eric Rivals

A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr.
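
The idea of profiling k-mers along a single read can be illustrated with a minimal sketch. This is not CRAC's implementation: the function names, the in-memory counting index and the `min_support` threshold are assumptions made for illustration, and the genomic-location half of the method is left out. The sketch only computes, for each k-mer of a read, how often that k-mer occurs in the whole read collection (its local coverage), then reports the first run of poorly supported k-mers.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in read order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(reads, k):
    """Count how many times each k-mer occurs across the read collection."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def support_profile(read, index, k):
    """Collection-wide count (support) of each k-mer along one read."""
    return [index[km] for km in kmers(read, k)]


def low_support_window(profile, min_support):
    """Return (start, end) of the first run of k-mers whose support is
    below min_support (end exclusive), or None if there is no such run."""
    start = None
    for i, support in enumerate(profile):
        if support < min_support and start is None:
            start = i
        elif support >= min_support and start is not None:
            return (start, i)
    return (start, len(profile)) if start is not None else None


# Toy collection (hypothetical data): three identical reads and one read
# that differs by a single base at position 5.
reads = ["ACGTACGGTA"] * 3 + ["ACGTAAGGTA"]
index = build_index(reads, k=4)
profile = support_profile("ACGTAAGGTA", index, k=4)
print(profile)                         # [4, 4, 1, 1, 1, 1, 4]
print(low_support_window(profile, 2))  # (2, 6)
```

In this toy case the single differing base is covered by exactly k = 4 consecutive k-mers, so the support drops over a window of length k. Whether such a drop reflects a sequencing error or a true variant depends on how much support the variant k-mers have, which is the kind of signal the abstract refers to as local coverage.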


Nucleic Acids Research | 2014

Combining DGE and RNA-sequencing data to identify new polyA+ non-coding transcripts in the human genome

Nicolas Philippe; Elias Bou Samra; Anthony Boureux; Alban Mancheron; Florence Rufflé; Qiang Bai; John De Vos; Eric Rivals; Thérèse Commes

Recent sequencing technologies that allow massive parallel production of short reads are the method of choice for transcriptome analysis. Particularly, digital gene expression (DGE) technologies produce a large dynamic range of expression data by generating short tag signatures for each cell transcript. These tags can be mapped back to a reference genome to identify new transcribed regions that can be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species-comparison to explore new transcriptional regions and their specific biological features, particularly tissue expression or conservation. We analysed tags from a large DGE data set (designated as ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands and categorized tags corresponding to protein-coding genes, antisense, intronic- or intergenic-transcribed regions and computed their overlap with annotated non-coding transcripts. Using this bioinformatics approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated using sequencing data from human pluripotent stem cells for biological validation, the method could be easily applied for the selection of tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
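
The tag categorization step described above can be sketched as a strand-aware lookup against gene annotations. This is an illustrative sketch only, not the DigitagCT code: the `Gene` record, the coordinates and the rule that the first overlapping gene decides the category are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal annotation record (inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str    # "+" or "-"
    exons: tuple   # tuple of (start, end) pairs


def categorize_tag(chrom, pos, strand, genes):
    """Assign a uniquely mapped tag to one of four categories.

    Simplification: the first gene overlapping the tag position decides
    the category; overlapping genes are not resolved further.
    """
    for gene in genes:
        if gene.chrom != chrom or not (gene.start <= pos <= gene.end):
            continue
        if gene.strand != strand:
            return "antisense"
        if any(s <= pos <= e for s, e in gene.exons):
            return "protein-coding"
        return "intronic"
    return "intergenic"


# Toy annotation (hypothetical coordinates).
genes = [Gene("chr1", 100, 500, "+", ((100, 200), (400, 500)))]

print(categorize_tag("chr1", 150, "+", genes))  # protein-coding
print(categorize_tag("chr1", 300, "+", genes))  # intronic
print(categorize_tag("chr1", 150, "-", genes))  # antisense
print(categorize_tag("chr1", 900, "+", genes))  # intergenic
```

A linear scan over genes is enough for a toy example; for hundreds of thousands of tags an interval index would be the more realistic choice.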


Biodata Mining | 2016

On the evaluation of the fidelity of supervised classifiers in the prediction of chimeric RNAs.

Sacha Beaumeunier; Jérôme Audoux; Anthony Boureux; Florence Rufflé; Thérèse Commes; Nicolas Philippe; Ronnie Alves

Background: High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed particularly in diseases can be used as potential biomarkers in both diagnosis and prognosis.
Results: The task of discriminating true chRNAs from false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artifacts, and during the analysis process bioinformatics tools may generate false positives due to methodological biases. Moreover, even if we succeed in obtaining a proper set of observations (enough sequencing data) about true chRNAs, chances are that the devised model will not be able to generalize beyond it. As in any other machine learning problem, the first big issue is finding good data to build models. To our knowledge, there is no common benchmark data available for chRNA detection. The definition of a classification baseline is lacking in the related literature too. In this work we move towards benchmark data and an evaluation of the fidelity of supervised classifiers in the prediction of chRNAs.
Conclusions: We proposed a modelling strategy that can be used to increase tool performance in the context of chRNA classification, based on a simulated data generator that permits the continuous integration of new complex chimeric events. The pipeline incorporated a genome mutation process and simulated RNA-seq data. Reads at distinct depths were aligned and analysed by CRAC, which integrates genomic location and local coverage, allowing biological predictions at the read scale. Additionally, these reads were functionally annotated and aggregated to form chRNA events, making it possible to evaluate the performance of ML methods (classifiers) at both the read and event levels. Ensemble learning strategies proved more robust for this classification problem, providing an average AUC of 95% (ACC = 94%, Kappa = 0.87). The resulting classification models were also tested on real RNA-seq data from a set of twenty-seven patients with acute myeloid leukemia (AML).
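
For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy (ACC) and Cohen's kappa are derived from a binary confusion matrix. The function name and the example counts are assumptions for illustration and are not taken from the paper. Kappa is a coefficient between -1 and 1 rather than a percentage.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement, i.e. accuracy
    # Agreement expected by chance, from the row and column marginals.
    chance_pos = ((tp + fp) / n) * ((tp + fn) / n)
    chance_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = chance_pos + chance_neg
    if expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts: 100 candidate chRNAs, balanced classes.
acc, kappa = accuracy_and_kappa(tp=45, fp=5, fn=5, tn=45)
print(acc)              # 0.9
print(round(kappa, 6))  # 0.8
```

With balanced classes the chance agreement is 0.5, so an accuracy of 0.9 corresponds to a kappa of 0.8; with imbalanced classes the same accuracy would give a lower kappa.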


Genome Biology | 2017

DE-kupl: exhaustive capture of biological variation in RNA-seq data through k-mer decomposition

Jérôme Audoux; Nicolas Philippe; Rayan Chikhi; Mikaël Salson; Mélina Gallopin; Marc Gabriel; Jérémy Le Coz; Emilie Drouineau; Thérèse Commes; Daniel Gautheret

We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.
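
A toy version of extracting k-mers with differential abundance directly from reads is sketched below. It is only an illustration of the general idea: DE-kupl's actual counting, normalization and statistical testing are not reproduced here, and the function names, the pseudocount and the log2 fold-change threshold are assumptions.

```python
import math
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer in one library (a list of reads)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(libs_a, libs_b, k, min_log2fc=1.0, pseudocount=1.0):
    """Return {k-mer: log2 fold change (B over A)} for k-mers whose mean
    count differs between two conditions by at least min_log2fc.

    libs_a and libs_b are non-empty lists of libraries. No reference
    genome or transcriptome is used; a real analysis would normalize for
    library size and apply a statistical test instead of a bare threshold.
    """
    counts_a = [count_kmers(lib, k) for lib in libs_a]
    counts_b = [count_kmers(lib, k) for lib in libs_b]
    all_kmers = set().union(*counts_a, *counts_b)
    selected = {}
    for km in all_kmers:
        mean_a = sum(c[km] for c in counts_a) / len(counts_a)
        mean_b = sum(c[km] for c in counts_b) / len(counts_b)
        lfc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(lfc) >= min_log2fc:
            selected[km] = lfc
    return selected


# Hypothetical toy libraries: two replicates per condition.
condition_a = [["AAAA"], ["AAAA"]]
condition_b = [["AAAT"] * 3, ["AAAT"] * 3]
print(sorted(differential_kmers(condition_a, condition_b, k=4).items()))
# [('AAAA', -1.0), ('AAAT', 2.0)]
```

Because the comparison works on k-mers rather than annotated features, any local sequence difference between conditions can surface, which is what allows the events to be assigned to categories afterwards.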


BMC Bioinformatics | 2017

SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines

Jérôme Audoux; Mikaël Salson; Christophe Grosset; Sacha Beaumeunier; Jean-Marc Holder; Thérèse Commes; Nicolas Philippe

Background: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. In order to provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices.
Results: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, to give an accurate performance evaluation in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved.
Conclusion: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data sets that would allow accurate comparison between benchmarks performed by different groups, and the publication of more benchmarks based on this public corpus. SimBA software and data sets are available at http://cractools.gforge.inria.fr/softwares/simba/.
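
The comparison of a pipeline's output against the simulated truth can be sketched as a set comparison. This is not BenchCT's interface: the `benchmark` function, the tuple representation of events and the exact-match rule are assumptions made for illustration. The sketch also shows how combining two pipelines (intersection or union of their calls) shifts the balance between precision and sensitivity.

```python
def benchmark(predicted, truth):
    """Compare predicted events with simulated (known) events.

    Events are hashable records, here (chrom, position, change) tuples,
    and a prediction counts as correct only on an exact match.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity}


# Hypothetical simulated truth and the calls of two hypothetical pipelines.
truth = {("chr1", 100, "A>G"), ("chr1", 200, "C>T"),
         ("chr2", 50, "G>A"), ("chr3", 10, "T>C")}
pipeline_a = {("chr1", 100, "A>G"), ("chr1", 200, "C>T"),
              ("chr2", 50, "G>A"), ("chr4", 5, "A>T")}
pipeline_b = {("chr1", 100, "A>G"), ("chr1", 200, "C>T"),
              ("chr3", 10, "T>C"), ("chr5", 7, "C>G")}

print(benchmark(pipeline_a, truth))               # precision 0.75, sensitivity 0.75
print(benchmark(pipeline_a & pipeline_b, truth))  # precision 1.0, sensitivity 0.5
print(benchmark(pipeline_a | pipeline_b, truth))  # sensitivity 1.0, lower precision
```

Which combination is preferable depends on the biological question: the intersection favours precision, the union favours sensitivity.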


Database and Expert Systems Applications | 2015

The Role of Machine Learning in Finding Chimeric RNAs

Sacha Beaumeunier; Jérôme Audoux; Anthony Boureux; Thérèse Commes; Nicolas Philippe; Ronnie Alves

High-throughput sequencing technology and bioinformatics have identified chimeric RNAs (chRNAs), raising the possibility that chRNAs expressed particularly in diseases can be used as potential biomarkers in both diagnosis and prognosis. The task of discriminating true chRNAs from false ones poses an interesting Machine Learning (ML) challenge. First of all, the sequencing data may contain false reads due to technical artefacts, and during the analysis process bioinformatics tools may generate false positives due to methodological biases. Thus separating the real signal from the noise can be a hard task. Furthermore, even if we succeed in obtaining a proper set of observations (enough sequencing data) about true chRNAs, chances are that the devised model will not be able to generalize beyond it. As in any other machine learning problem, the first big issue is finding good data (observations) to build the prediction model. Unfortunately, to our knowledge, there is no common benchmark data available for chRNAs, and the definition of a classification baseline is lacking in the related literature. In this work we move towards benchmark data and a fair comparison analysis unraveling the role of ML techniques in finding chRNAs. We have developed a benchmark pipeline incorporating a genome mutation process and RNA-seq data simulated by Flux Simulator. These sequencing reads were aligned and annotated by CRAC. CRAC offers a new way to analyze RNA-seq data by integrating genomic location and local coverage, allowing biological predictions in one step. The resulting data were used as a benchmark for our comparison analysis. We have observed that the no free lunch theorem does not hold for ensemble classifiers. Ensemble learning strategies proved more robust for this classification problem, providing an average AUC of 95% (ACC = 94%, Kappa = 0.87).
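
The AUC figure quoted above can be read as the probability that a classifier scores a randomly chosen true chRNA higher than a randomly chosen false one. The sketch below computes it that way, by direct pairwise comparison; the function name and the example scores are assumptions for illustration and do not come from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier scores, higher meaning "more likely a true chRNA".
    labels: 1 for true chRNAs, 0 for false ones.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four candidates.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is quadratic in the number of candidates, which is fine for a small example; rank-based formulations give the same value more efficiently on large data sets.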


Human Reproduction Update | 2016

Long non-coding RNAs in human early embryonic development and their potential in ART

Julien Bouckenheimer; S. Assou; Sébastien Riquier; Cyrielle Hou; Nicolas Philippe; Caroline Sansac; Thierry Lavabre-Bertrand; Thérèse Commes; Jean-Marc Lemaître; Anthony Boureux; John De Vos


BMC Bioinformatics | 2011

Querying large read collections in main memory: a versatile data structure

Nicolas Philippe; Mikaël Salson; Thierry Lecroq; Martine Léonard; Thérèse Commes; Eric Rivals


EMBnet.journal | 2012

A combinatorial and integrated method to analyse RNA-seq reads

Nicolas Philippe; Mikaël Salson; Thérèse Commes; Eric Rivals


F1000Research | 2017

New chimeric RNAs in acute myeloid leukemia

Florence Rufflé; Jérôme Audoux; Anthony Boureux; Sacha Beaumeunier; Jean-Baptiste Gaillard; Elias Bou Samra; André Mégarbané; Bruno Cassinat; Christine Chomienne; Ronnie Alves; Sébastien Riquier; Nicolas Gilbert; Jean-Marc Lemaitre; Delphine Bacq-Daian; Anne Laure Bougé; Nicolas Philippe; Thérèse Commes

Collaboration


Dive into Nicolas Philippe's collaborations.

Top Co-Authors

Eric Rivals

Helsinki University of Technology

Jérôme Audoux

University of Montpellier

Anthony Boureux

Helsinki University of Technology

Ronnie Alves

University of Montpellier
