Emanuele Raineri | Researchain

Publication

Featured research published by Emanuele Raineri.

PLOS Biology | 2010

Multi-platform next-generation sequencing of the domestic Turkey (Meleagris gallopavo): Genome assembly and analysis

Rami A. Dalloul; Julie A Long; Aleksey V. Zimin; Luqman Aslam; Kathryn Beal; Le Ann Blomberg; Pascal Bouffard; David W. Burt; Oswald Crasta; R.P.M.A. Crooijmans; Kristal L. Cooper; Roger A. Coulombe; Supriyo De; Mary E. Delany; Jerry B. Dodgson; Jennifer J Dong; Clive Evans; Karin M. Frederickson; Paul Flicek; Liliana Florea; Otto Folkerts; M.A.M. Groenen; Tim Harkins; Javier Herrero; Steve Hoffmann; Hendrik-Jan Megens; Andrew Jiang; Pieter J. de Jong; Peter K. Kaiser; Heebal Kim

The combined application of next-generation sequencing platforms has provided an economical approach to unlocking the potential of the turkey genome.

Science | 2009

Impact of Genome Reduction on Bacterial Metabolism and Its Regulation

Eva Yus; Tobias Maier; Konstantinos Michalodimitrakis; Vera van Noort; Takuji Yamada; Wei-Hua Chen; Judith A. H. Wodke; Marc Güell; Sira Martínez; Ronan Bourgeois; Sebastian Kühner; Emanuele Raineri; Ivica Letunic; Olga V. Kalinina; Michaela Rode; Richard Herrmann; Ricardo Gutiérrez-Gallego; Robert B. Russell; Anne-Claude Gavin; Peer Bork; Luis Serrano

Simply Mycoplasma. The bacterium Mycoplasma pneumoniae, a human pathogen, has a genome of reduced size and is one of the simplest organisms that can reproduce outside of host cells. As such, it represents an excellent model organism in which to attempt a systems-level understanding of its biological organization. Now three papers provide a comprehensive and quantitative analysis of the proteome, the metabolic network, and the transcriptome of M. pneumoniae (see the Perspective by Ochman and Raghavan). Anticipating what might be possible in the future for more complex organisms, Kühner et al. (p. 1235) combine analysis of protein interactions by mass spectrometry with extensive structural information on M. pneumoniae proteins to reveal how proteins work together as molecular machines and map their organization within the cell by electron tomography. The manageable genome size of M. pneumoniae allowed Yus et al. (p. 1263) to map the metabolic network of the organism manually and validate it experimentally. Analysis of the network aided development of a minimal medium in which the bacterium could be cultured. Finally, Güell et al. (p. 1268) applied state-of-the-art sequencing techniques to reveal that this “simple” organism makes extensive use of noncoding RNAs and has exon- and intron-like structure within transcriptional operons that allows complex gene regulation resembling that of eukaryotes.

Reconstruction of a bacterial metabolic network reveals strategies for metabolic control with a genome of reduced size.

To understand basic principles of bacterial metabolism organization and regulation, but also the impact of genome size, we systematically studied one of the smallest bacteria, Mycoplasma pneumoniae. A manually curated metabolic network of 189 reactions catalyzed by 129 enzymes allowed the design of a defined, minimal medium with 19 essential nutrients. More than 1300 growth curves were recorded in the presence of various nutrient concentrations. Measurements of biomass indicators, metabolites, and 13C-glucose experiments provided information on directionality, fluxes, and energetics; integration with transcription profiling enabled the global analysis of metabolic regulation. Compared with more complex bacteria, the M. pneumoniae metabolic network has a more linear topology and contains a higher fraction of multifunctional enzymes; general features such as metabolite concentrations, cellular energetics, adaptability, and global gene expression responses are similar, however.

PLOS ONE | 2012

Fast computation and applications of genome mappability.

Thomas Derrien; Jordi Estellé; Santiago Marco Sola; David G. Knowles; Emanuele Raineri; Roderic Guigó; Paolo Ribeca

We present a fast mapping-based algorithm to compute the mappability of each region of a reference genome up to a specified number of mismatches. Knowing the mappability of a genome is crucial for the interpretation of massively parallel sequencing experiments. We investigate the properties of the mappability of eukaryotic DNA/RNA both as a whole and at the level of the gene family, providing for various organisms tracks which allow the mappability information to be visually explored. In addition, we show that mappability varies greatly between species and gene classes. Finally, we suggest several practical applications where mappability can be used to refine the analysis of high-throughput sequencing data (SNP calling, gene expression quantification and paired-end experiments). This work highlights mappability as an important concept which deserves to be taken into full account, in particular when massively parallel sequencing technologies are employed. The GEM mappability program belongs to the GEM (GEnome Multitool) suite of programs, which can be freely downloaded for any use from its website (http://gemlibrary.sourceforge.net).
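As a rough illustration of the concept, the sketch below computes a naive exact-match mappability score: each position gets the inverse of the number of times its k-mer occurs in the sequence. This is not the GEM algorithm. It assumes zero mismatches, ignores the reverse strand, and the function name `kmer_mappability` is made up for this example.

```python
from collections import Counter


def kmer_mappability(genome: str, k: int) -> list[float]:
    """Naive exact-match mappability (illustrative only, not GEM).

    For every start position, return 1 / (number of occurrences of the
    k-mer starting there). A score of 1.0 means the k-mer is unique.
    Assumptions: no mismatches allowed, forward strand only.
    """
    if k <= 0 or k > len(genome):
        raise ValueError("k must be between 1 and len(genome)")
    # Collect the k-mer starting at each position.
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    # Count how often each distinct k-mer occurs in the whole sequence.
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


# Toy sequence: "ACG" occurs twice, every other 3-mer once.
scores = kmer_mappability("ACGTACGA", 3)
```

On a real genome this dictionary-based approach would be far too slow and memory-hungry, which is the motivation for a fast mapping-based algorithm like the one the abstract describes.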

Nucleic Acids Research | 2012

Modelling and simulating generic RNA-Seq experiments with the flux simulator

Thasso Griebel; Benedikt Zacher; Paolo Ribeca; Emanuele Raineri; Vincent Lacroix; Roderic Guigó; Michael Sammeth

High-throughput sequencing of cDNA libraries constructed from cellular RNA complements (RNA-Seq) naturally provides a digital quantitative measurement for every expressed RNA molecule. Nature, impact and mutual interference of biases in different experimental setups are, however, still poorly understood—mostly due to the lack of data from intermediate protocol steps. We analysed multiple RNA-Seq experiments, involving different sample preparation protocols and sequencing platforms: we broke them down into their common—and currently indispensable—technical components (reverse transcription, fragmentation, adapter ligation, PCR amplification, gel segregation and sequencing), investigating how such different steps influence abundance and distribution of the sequenced reads. For each of those steps, we developed universally applicable models, which can be parameterised by empirical attributes of any experimental protocol. Our models are implemented in a computer simulation pipeline called the Flux Simulator, and we show that read distributions generated by different combinations of these models reproduce well corresponding evidence obtained from the corresponding experimental setups. We further demonstrate that our in silico RNA-Seq provides insights about hidden precursors that determine the final configuration of reads along gene bodies; enhancing or compensatory effects that explain apparently controversial observations can be observed. Moreover, our simulations identify hitherto unreported sources of systematic bias from RNA hydrolysis, a fragmentation technique currently employed by most RNA-Seq protocols.

Nature Communications | 2015

A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

Tyler Alioto; Ivo Buchhalter; Sophia Derdak; Barbara Hutter; Matthew Eldridge; Eivind Hovig; Lawrence E. Heisler; Timothy Beck; Jared T. Simpson; Laurie Tonon; Anne Sophie Sertier; Ann Marie Patch; Natalie Jäger; Philip Ginsbach; Ruben M. Drews; Nagarajan Paramasivam; Rolf Kabbe; Sasithorn Chotewutmontri; Nicolle Diessl; Christopher Previti; Sabine Schmidt; Benedikt Brors; Lars Feuerbach; Michael Heinold; Susanne Gröbner; Andrey Korshunov; Patrick Tarpey; Adam Butler; Jonathan Hinton; David Jones

As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.

Nature Genetics | 2015

Whole-genome fingerprint of the DNA methylome during human B cell differentiation.

Marta Kulis; Angelika Merkel; Simon Heath; Ana C. Queirós; Ronald Schuyler; Giancarlo Castellano; Renée Beekman; Emanuele Raineri; Anna Esteve; Guillem Clot; Néria Verdaguer-Dot; Martí Duran-Ferrer; Nuria Russiñol; Roser Vilarrasa-Blasi; Simone Ecker; Vera Pancaldi; Daniel Rico; Lidia Agueda; Julie Blanc; David C. Richardson; Laura Clarke; Avik Datta; Marien Pascual; Xabier Agirre; Felipe Prosper; Diego Alignani; Bruno Paiva; Gersende Caron; Thierry Fest; Marcus O. Muench

We analyzed the DNA methylome of ten subpopulations spanning the entire B cell differentiation program by whole-genome bisulfite sequencing and high-density microarrays. We observed that non-CpG methylation disappeared upon B cell commitment, whereas CpG methylation changed extensively during B cell maturation, showing an accumulative pattern and affecting around 30% of all measured CpG sites. Early differentiation stages mainly displayed enhancer demethylation, which was associated with upregulation of key B cell transcription factors and affected multiple genes involved in B cell biology. Late differentiation stages, in contrast, showed extensive demethylation of heterochromatin and methylation gain at Polycomb-repressed areas, and genes with apparent functional impact in B cells were not affected. This signature, which has previously been linked to aging and cancer, was particularly widespread in mature cells with an extended lifespan. Comparing B cell neoplasms with their normal counterparts, we determined that they frequently acquire methylation changes in regions already undergoing dynamic methylation during normal B cell differentiation.

BMC Genomics | 2013

Dissecting structural and nucleotide genome-wide variation in inbred Iberian pigs

Anna Esteve-Codina; Yogesh Paudel; L. Ferretti; Emanuele Raineri; Hendrik-Jan Megens; L. Silió; Martein Am Groenen; Sebastian E. Ramos-Onsins; Miguel Pérez-Enciso

Background: In contrast to international pig breeds, the Iberian breed has not been admixed with Asian germplasm. This makes it an important model to study both domestication and relevance of Asian genes in the pig. Besides, Iberian pigs exhibit high meat quality as well as appetite and propensity to obesity. Here we provide a genome wide analysis of nucleotide and structural diversity in a reduced representation library from a pool (n=9 sows) and shotgun genomic sequence from a single sow of the highly inbred Guadyerbas strain. In the pool, we applied newly developed tools to account for the peculiarities of these data.

Results: A total of 254,106 SNPs in the pool (79.6 Mb covered) and 643,783 in the Guadyerbas sow (1.47 Gb covered) were called. The nucleotide diversity (1.31 × 10⁻³ per bp in autosomes) is very similar to that reported in wild boar. A much lower than expected diversity in the X chromosome was confirmed (1.79 × 10⁻⁴ per bp in the individual and 5.83 × 10⁻⁴ per bp in the pool). A strong (0.70) correlation between recombination and variability was observed, but not with gene density or GC content. Multicopy regions affected about 4% of annotated pig genes in their entirety, and 2% of the genes partially. Genes within the lowest variability windows comprised interferon genes and, in chromosome X, genes involved in behavior like HTR2C or MCEP2. A modified Hudson-Kreitman-Aguadé test for pools also indicated an accelerated evolution in genes involved in behavior, as well as in spermatogenesis and in lipid metabolism.

Conclusions: This work illustrates the strength of current sequencing technologies to picture a comprehensive landscape of variability in livestock species, and to pinpoint regions containing genes potentially under selection. Among those genes, we report genes involved in behavior, including feeding behavior, and lipid metabolism. The pig X chromosome is an outlier in terms of nucleotide diversity, which suggests selective constraints. Our data further confirm the importance of structural variation in the species, including Iberian pigs, and allowed us to identify new paralogs for known gene families.

BMC Bioinformatics | 2012

SNP calling by sequencing pooled samples.

Emanuele Raineri; Luca Ferretti; Anna Esteve-Codina; Bruno Nevado; Simon Heath; Miguel Pérez-Enciso

Background: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g. cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read – or, more likely, none – from a true singleton.

Results: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. We used it to compare the performance of snape to that of other packages.

Conclusions: We present a software which helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and False Discovery Rate (FDR) than the comparable packages samtools, PoPoolation, Varscan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).
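The ingredients named in the abstract (a 1/f prior, sequencing errors, a posterior over the allele frequency) can be combined in a toy calculation such as the one below. It is a minimal sketch and not snape's actual model: the symmetric error model, the prior mass `p_mono` on a monomorphic site, and the function name `posterior_allele_count` are all assumptions made for illustration.

```python
from math import comb


def posterior_allele_count(n_alt, n_reads, n_chrom, error=0.01, p_mono=0.9):
    """Toy posterior over k, the number of pooled chromosomes carrying
    the alternative allele (k = 0 .. n_chrom - 1), so that f = k / n_chrom.

    Assumptions (illustrative, not taken from snape):
      * prior: probability p_mono that the site is monomorphic (k = 0),
        the rest spread over k >= 1 proportionally to 1/k (neutral model);
      * each read shows the alternative allele with probability
        f * (1 - error) + (1 - f) * error;
      * reads are independent, so n_alt is binomially distributed.
    """
    harmonic = sum(1.0 / k for k in range(1, n_chrom))
    prior = [p_mono] + [(1.0 - p_mono) / (k * harmonic)
                        for k in range(1, n_chrom)]
    unnormalised = []
    for k, prior_k in enumerate(prior):
        f = k / n_chrom
        p_alt = f * (1.0 - error) + (1.0 - f) * error
        likelihood = (comb(n_reads, n_alt)
                      * p_alt ** n_alt
                      * (1.0 - p_alt) ** (n_reads - n_alt))
        unnormalised.append(prior_k * likelihood)
    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# Posterior probability that the site is segregating, given 10
# alternative reads out of 40 in a pool of 50 chromosomes.
posterior = posterior_allele_count(n_alt=10, n_reads=40, n_chrom=50)
p_segregating = 1.0 - posterior[0]
```

The quantity `1 - posterior[0]` plays the role of the posterior probability that a SNP is segregating, and the full list is a discretised posterior distribution of f.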

Genetics | 2012

Neutrality Tests for Sequences with Missing Data

Luca Ferretti; Emanuele Raineri; Sebastian E. Ramos-Onsins

Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima’s D, Fay and Wu’s H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
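For reference, the standard complete-data versions of two of these statistics can be computed as in the sketch below. It covers only the textbook Watterson estimator and Tajima's D for aligned sequences without missing data, not the modified estimators proposed in the paper. The function name `neutrality_stats` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def neutrality_stats(seqs):
    """Return (theta_W, pi, Tajima's D) for aligned, equal-length
    sequences with no missing data (standard textbook formulas)."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating sites (columns with more than one allele).
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Watterson estimator: theta_W = S / a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    # Constants for the variance of (pi - theta_W), after Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    d = (pi - theta_w) / sqrt(variance) if variance > 0 else float("nan")
    return theta_w, pi, d


theta_w, pi, tajima_d = neutrality_stats(["AAAA", "AAAT", "AATT", "ATTT"])
```

With missing data, the sample size n differs from site to site, so a single a1 no longer applies to the whole alignment. That is the situation the modified statistics are designed to handle.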

Nucleic Acids Research | 2011

BlastR—fast and accurate database searches for non-coding RNAs

Giovanni Bussotti; Emanuele Raineri; Ionas Erb; Matthias Zytnicki; Andreas Wilm; Emmanuel Beaudoing; Philipp Bucher; Cedric Notredame

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-coding RNA sequences. Using Rfam as a gold standard, we benchmarked this approach and show BlastR to be more sensitive than BlastN. We also show that BlastR is both faster and more sensitive than BlastP used with a single nucleotide log-odd substitution matrix. BlastR, when used in combination with WU-BlastP, is about 5% more accurate than WU-BlastN and about 50 times slower. The approach shown here is equally effective when combined with the NCBI-Blast package. The software is an open source freeware available from www.tcoffee.org/blastr.html.
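The recoding step can be pictured with the sketch below, which maps each of the 16 possible dinucleotides to one letter so that the result looks like a protein sequence. The letter assignment here is arbitrary, and the use of overlapping dinucleotides is an assumption. Neither is taken from BlastR itself, and the names `DINUC_TO_LETTER` and `recode_rna` are hypothetical.

```python
from itertools import product

# 16 arbitrary amino-acid-like letters, one per dinucleotide.
# This assignment is illustrative only; it is not BlastR's encoding.
ALPHABET = "ACDEFGHIKLMNPQRS"
DINUC_TO_LETTER = {
    "".join(pair): ALPHABET[index]
    for index, pair in enumerate(product("ACGU", repeat=2))
}


def recode_rna(rna: str) -> str:
    """Recode an RNA sequence as a string over a 16-letter alphabet,
    one letter per overlapping dinucleotide (assumed windowing)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input too
    return "".join(DINUC_TO_LETTER[rna[i:i + 2]]
                   for i in range(len(rna) - 1))


recoded = recode_rna("ACGUACGU")
```

Once sequences are in this form, a protein search tool can compare them using a 16 × 16 substitution matrix over dinucleotides, which is the role BlosumR plays in the abstract.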
