Publication


Featured research published by Luca Ferretti.


Molecular Ecology | 2013

Population genomics from pool sequencing

Luca Ferretti; Sebastian E. Ramos-Onsins; Miguel Pérez-Enciso

Next generation sequencing of pooled samples is an effective approach for studies of variability and differentiation in populations. In this paper we provide a comprehensive set of estimators of the most common statistics in population genetics based on the frequency spectrum, namely the Watterson estimator θW, nucleotide pairwise diversity Π, Tajima's D, Fu and Li's D and F, Fay and Wu's H, McDonald‐Kreitman and HKA tests and FST, corrected for sequencing errors and ascertainment bias. In a simulation study, we show that pool and individual θ estimates are highly correlated and discuss how the performance of the statistics varies with read depth and sample size in different evolutionary scenarios. As an application, we reanalyse sequences from Drosophila mauritiana and from an evolution experiment in Drosophila melanogaster. These methods are useful for population genetic projects with limited budget, study of communities of individuals that are hard to isolate, or autopolyploid species.
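As a rough illustration of the frequency-spectrum statistics named in this abstract, the sketch below computes the textbook Watterson estimator and pairwise diversity from an unfolded site frequency spectrum of individually sequenced samples. It does not include the pool, sequencing-error, or ascertainment corrections that the paper derives; the function names and the input layout are assumptions made for this example.

```python
def harmonic(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i, the normalisation in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs):
    """Watterson's estimator from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites where the derived allele is carried
    by i of the n sampled sequences, for i = 1 .. n - 1 (assumed layout).
    """
    n = len(sfs) + 1
    segregating_sites = sum(sfs)
    return segregating_sites / harmonic(n)


def pairwise_diversity(sfs):
    """Mean number of pairwise differences, computed from the same spectrum."""
    n = len(sfs) + 1
    n_pairs = n * (n - 1) / 2
    # A site with derived count i differs in i * (n - i) of the sequence pairs.
    return sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / n_pairs


# Toy spectrum for n = 4 sequences, proportional to 1/i as expected under neutrality.
toy_sfs = [6, 3, 2]
theta_w = watterson_theta(toy_sfs)
theta_pi = pairwise_diversity(toy_sfs)
```

On a spectrum proportional to 1/i, both estimators target the same θ, which is why their difference is the basis of tests such as Tajima's D.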


BMC Bioinformatics | 2012

SNP calling by sequencing pooled samples.

Emanuele Raineri; Luca Ferretti; Anna Esteve-Codina; Bruno Nevado; Simon Heath; Miguel Pérez-Enciso

Background: Performing high throughput sequencing on samples pooled from different individuals is a strategy to characterize genetic variability at a small fraction of the cost required for individual sequencing. In certain circumstances some variability estimators have even lower variance than those obtained with individual sequencing. SNP calling and estimating the frequency of the minor allele from pooled samples, though, is a subtle exercise for at least three reasons. First, sequencing errors may have a much larger relevance than in individual SNP calling: while their impact in individual sequencing can be reduced by setting a restriction on a minimum number of reads per allele, this would have a strong and undesired effect in pools because it is unlikely that alleles at low frequency in the pool will be read many times. Second, the prior allele frequency for heterozygous sites in individuals is usually 0.5 (assuming one is not analyzing sequences coming from, e.g., cancer tissues), but this is not true in pools: in fact, under the standard neutral model, singletons (i.e. alleles of minimum frequency) are the most common class of variants because P(f) ∝ 1/f and they occur more often as the sample size increases. Third, an allele appearing only once in the reads from a pool does not necessarily correspond to a singleton in the set of individuals making up the pool, and vice versa, there can be more than one read – or, more likely, none – from a true singleton.

Results: To improve upon existing theory and software packages, we have developed a Bayesian approach for minor allele frequency (MAF) computation and SNP calling in pools (and implemented it in a program called snape): the approach takes into account sequencing errors and allows users to choose different priors. We also set up a pipeline which can simulate the coalescence process giving rise to the SNPs, the pooling procedure and the sequencing. We used it to compare the performance of snape to that of other packages.

Conclusions: We present software which helps in calling SNPs in pooled samples: it has good power while retaining a low false discovery rate (FDR). The method also provides the posterior probability that a SNP is segregating and the full posterior distribution of f for every SNP. In order to test the behaviour of our software, we generated (through simulated coalescence) artificial genomes and computed the effect of a pooled sequencing protocol, followed by SNP calling. In this setting, snape has better power and FDR than the comparable packages samtools, PoPoolation, and Varscan: for N = 50 chromosomes, snape has power ≈ 35% and FDR ≈ 2.5%. snape is available at http://code.google.com/p/snape-pooled/ (source code and precompiled binaries).
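The idea of a Bayesian call in a pool can be sketched with a deliberately simplified model. This is not snape's actual algorithm or interface: the function name, the symmetric per-read error model, the binomial read sampling, the default parameter values, and the restriction of the minor-allele count to 0 .. N - 1 are all assumptions made for illustration. The prior follows the P(f) ∝ 1/f shape mentioned in the abstract for segregating sites.

```python
from math import comb


def allele_count_posterior(alt_reads, total_reads, n_chrom,
                           error=0.01, prior_seg=0.01):
    """Posterior over the number k of pooled chromosomes carrying the minor allele.

    Hypothetical simplified model:
      * k ranges over 0 .. n_chrom - 1 (the major allele is assumed present);
      * prior P(k = 0) = 1 - prior_seg, and P(k) proportional to 1/k for k >= 1;
      * each read shows the minor allele with probability
        f * (1 - error) + (1 - f) * error, where f = k / n_chrom.
    """
    norm = sum(1.0 / k for k in range(1, n_chrom))
    prior = [1.0 - prior_seg]
    prior += [prior_seg / (k * norm) for k in range(1, n_chrom)]

    unnormalised = []
    for k, prior_k in enumerate(prior):
        f = k / n_chrom
        p_alt = f * (1.0 - error) + (1.0 - f) * error
        likelihood = (comb(total_reads, alt_reads)
                      * p_alt ** alt_reads
                      * (1.0 - p_alt) ** (total_reads - alt_reads))
        unnormalised.append(prior_k * likelihood)

    total = sum(unnormalised)
    return [value / total for value in unnormalised]


# Posterior probability that the site is segregating in a pool of 50 chromosomes
# when 10 of 100 reads show the minor allele.
posterior = allele_count_posterior(alt_reads=10, total_reads=100, n_chrom=50)
prob_segregating = 1.0 - posterior[0]
```

Because the prior puts most of its segregating mass on low counts, a handful of minor-allele reads is weighed against the error rate rather than against a fixed read-count threshold, which is the motivation the abstract gives for a Bayesian treatment.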


Genetics | 2012

Neutrality Tests for Sequences with Missing Data

Luca Ferretti; Emanuele Raineri; Sebastian E. Ramos-Onsins

Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima’s D, Fay and Wu’s H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.


Genetics | 2010

Optimal neutrality tests based on the frequency spectrum.

Luca Ferretti; Miguel Pérez-Enciso; Sebastian E. Ramos-Onsins

The ascertainment of the demographic and selective history of populations has been a major research goal in genetics for decades. To that end, numerous statistical tests have been developed to detect deviations between expected and observed frequency spectra, e.g., Tajima's D, Fu and Li's F and D tests, and Fay and Wu's H. Recently, Achaz developed a general framework to generate tests that detect deviations in the frequency spectrum. In a further development, we argue that the results of these tests should be as independent of the sample size as possible and propose a scale-free form for them. Furthermore, using the same framework as that of Achaz, we develop a new family of neutrality tests based on the frequency spectrum that are optimal against a chosen alternative evolutionary scenario. These tests maximize the power to reject the standard neutral model and are scalable with the sample size. Optimal tests are derived for several alternative evolutionary scenarios, including demographic processes (population bottleneck, expansion, contraction) and selective sweeps. Within the same framework, we also derive an optimal general test given a generic evolutionary scenario as a null model. All formulas are relatively simple and can be computed very fast, making it feasible to apply them to genome-wide sequence data. A simulation study showed that, generally, the tests proposed are more consistently powerful than standard tests like Tajima's D. We further illustrate the method with real data from a QTL candidate region in pigs.
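For reference, the standard Tajima's D that these tests are compared against can be computed from a site frequency spectrum as sketched below. This is the classical statistic only, not the scale-free or optimal tests proposed in the paper; the function name and the input layout are assumptions of this example.

```python
from math import sqrt


def tajimas_d(sfs):
    """Classical Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of sites with derived-allele count i among
    n sequences, i = 1 .. n - 1 (assumed layout).
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    # Pairwise diversity and Watterson's estimator from the same spectrum.
    n_pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / n_pairs
    theta_w = s / a1

    return (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


d_neutral = tajimas_d([6, 3, 2])      # spectrum proportional to 1/i
d_singletons = tajimas_d([10, 0, 0])  # excess of rare variants
```

An excess of rare variants pulls the statistic below zero, while an excess of intermediate-frequency variants pushes it above zero.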


Molecular Biology and Evolution | 2014

Nucleotide Variability at Its Limit? Insights into the Number and Evolutionary Dynamics of the Sex-Determining Specificities of the Honey Bee Apis mellifera

Sarah Lechner; Luca Ferretti; Caspar Schöning; Wanja Kinuthia; David Willemsen; Martin Hasselmann

Deciphering the evolutionary processes driving nucleotide variation in multiallelic genes is limited by the number of genetic systems in which such genes occur. The complementary sex determiner (csd) gene in the honey bee Apis mellifera is an informative example for studying allelic diversity and the underlying evolutionary forces in a well-described model of balancing selection. Acting as the primary signal of sex determination, diploid individuals heterozygous for csd develop into females, whereas csd homozygotes are diploid males that have zero fitness. Examining 77 of the functional heterozygous csd allele pairs, we established combinatorial criteria that provide insights into the minimum number of amino acid differences among those pairs. Given a data set of 244 csd sequences, we show that the total number of csd alleles found in A. mellifera ranges from 53 (locally) to 87 (worldwide), which is much higher than was previously reported (20). Using a coupon-collector model, we extrapolate the presence of a total of 116–145 csd alleles worldwide. The hypervariable region (HVR) is of particular importance in determining csd allele specificity, and we provide for this region evidence of high evolutionary rate for length differences exceeding those of microsatellites. The proportion of amino acids driven by positive selection and the rate of nonsynonymous substitutions in the HVR-flanking regions reach values close to 1 but differ with respect to the HVR length. Using a model of csd coalescence, we identified the high originating rate of csd specificities as a major evolutionary force, leading to an origin of a novel csd allele every 400,000 years. The csd polymorphism frequencies in natural populations indicate an excess of new mutations, whereas signs of ancestral transspecies polymorphism can still be detected. This study provides a comprehensive view of the enormous diversity and the evolutionary forces shaping a multiallelic gene.
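The logic of a coupon-collector extrapolation can be sketched in its simplest form, assuming all alleles are equally frequent and sampled independently. That assumption is made only for this example and is not necessarily the model used in the paper; with unequal allele frequencies the same observations imply a larger total, so this sketch should not be expected to reproduce the published range. The function names are hypothetical.

```python
def expected_distinct(k_total, n_samples):
    """Expected number of distinct alleles seen in n_samples draws
    from k_total equally frequent alleles."""
    return k_total * (1.0 - (1.0 - 1.0 / k_total) ** n_samples)


def estimate_total(observed, n_samples, tol=1e-6):
    """Solve expected_distinct(K, n_samples) = observed for K by bisection."""
    if not 0 < observed < n_samples:
        raise ValueError("need 0 < observed < n_samples")
    lo, hi = float(observed), 2.0 * observed
    # expected_distinct increases with K, so widen the bracket until it holds.
    while expected_distinct(hi, n_samples) < observed:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_distinct(mid, n_samples) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Equal-frequency extrapolation from 87 distinct alleles among 244 sequences.
k_hat = estimate_total(observed=87, n_samples=244)
```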


Molecular Biology and Evolution | 2015

A Model of Substitution Trajectories in Sequence Space and Long-Term Protein Evolution

Dinara R. Usmanova; Luca Ferretti; Inna S. Povolotskaya; Peter K. Vlasov; Fyodor A. Kondrashov

The nature of factors governing the tempo and mode of protein evolution is a fundamental issue in evolutionary biology. Specifically, whether or not interactions between different sites, or epistasis, are important in directing the course of evolution has become one of the central questions. Several recent reports have scrutinized patterns of long-term protein evolution, claiming them to be compatible only with an epistatic fitness landscape. However, these claims have not yet been substantiated with a formal model of protein evolution. Here, we formulate a simple covarion-like model of protein evolution focusing on the rate at which the fitness impact of amino acids at a site changes with time. We then apply the model to the data on convergent and divergent protein evolution to test whether or not the incorporation of epistatic interactions is necessary to explain the data. We find that convergent evolution cannot be explained without the incorporation of epistasis and that the rate at which an amino acid state switches from being acceptable at a site to being deleterious is faster than the rate of amino acid substitution. Specifically, for proteins that have persisted in modern prokaryotic organisms since the last universal common ancestor, for each amino acid substitution approximately ten amino acid states switch from being accessible to being deleterious, or vice versa. Thus, molecular evolution can only be perceived in the context of rapid turnover of which amino acids are available for evolution.


Genome Biology and Evolution | 2016

Dosage Compensation in the African Malaria Mosquito Anopheles gambiae.

Graham Rose; Elzbieta Krzywinska; Jan Kim; Loic Revuelta; Luca Ferretti; Jaroslaw Krzywinski

Dosage compensation is the fundamental process by which gene expression from the male monosomic X chromosome and from the diploid set of autosomes is equalized. Various molecular mechanisms have evolved in different organisms to achieve this task. In Drosophila, genes on the male X chromosome are upregulated to the levels of expression from the two X chromosomes in females. To test whether a similar mechanism is operating in immature stages of Anopheles mosquitoes, we analyzed global gene expression in the Anopheles gambiae fourth instar larvae and pupae using high-coverage RNA-seq data. In pupae of both sexes, the median expression ratios of X-linked to autosomal genes (X:A) were close to 1.0, and within the ranges of expression ratios between the autosomal pairs, consistent with complete compensation. Gene-by-gene comparisons of expression in males and females revealed mild female bias, likely attributable to a deficit of male-biased X-linked genes. In larvae, male to female ratios of the X chromosome expression levels were more female biased than in pupae, suggesting that compensation may not be complete. No compensation mechanism appears to operate in male germline of early pupae. Confirmation of the existence of dosage compensation in A. gambiae lays the foundation for research into the components of dosage compensation machinery in this important vector species.


Briefings in Functional Genomics | 2015

Beyond fruit-flies: population genomic advances in non-Drosophila arthropods

Martin Hasselmann; Luca Ferretti; Amro Zayed

Understanding the evolutionary processes driving the adaptive differentiation of populations is of broad interest in biology. Genome-wide nucleotide polymorphisms provide the basis for population genetic studies powered by advances in high-throughput sequencing technologies. These advances have led to an extension of genome projects to a variety of non-genetic model organisms, broadening our view on the evolution of gene families and taxonomic-restricted novelties. Here, we review the progress of genome projects in non-Drosophila arthropods, focusing on advances in the analysis of large-scale polymorphism data and functional genomics and examples of population genomic studies.


EPL | 2014

Duality between preferential attachment and static networks on hyperbolic spaces

Luca Ferretti; Michele Cortelezzi; Marcello Mamino

There is a complex relation between the mechanism of preferential attachment, scale-free degree distributions and hyperbolicity in complex networks. In fact, both preferential attachment and hidden hyperbolic spaces often generate scale-free networks. We show that there is actually a duality between a class of growing spatial networks based on preferential attachment on the sphere and a class of static random networks on the hyperbolic plane. Both classes of networks have the same scale-free degree distribution as the Barabasi-Albert model. As a limit of this correspondence, the Barabasi-Albert model is equivalent to a static random network on a hyperbolic space with infinite curvature.
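The preferential attachment mechanism referred to here can be illustrated with a minimal Barabasi-Albert growth simulation. The sketch covers only the plain, non-spatial growth rule, not the spherical or hyperbolic constructions or the duality shown in the paper; the function name, the complete-graph seed, and the parameter choices are assumptions of this example.

```python
import random
from collections import Counter


def barabasi_albert(n_nodes, m, seed=0):
    """Grow a network in which each new node attaches to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    # Every node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = []

    # Seed graph (an assumption of this sketch): a complete graph on m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
            endpoints += [i, j]

    for new_node in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new_node, target))
            endpoints += [new_node, target]
    return edges


edges = barabasi_albert(n_nodes=1000, m=2)
degree = Counter(node for edge in edges for node in edge)
```

Early nodes accumulate links faster than late ones, which produces the heavy-tailed degree distribution the abstract describes as scale-free.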


bioRxiv | 2015

MAGELLAN: a tool to explore small fitness landscapes

Sophie Brouillet; Harry Annoni; Luca Ferretti; Guillaume Achaz

In a fitness landscape, fitness values are associated with all genotypes corresponding to several, potentially all, combinations of a set of mutations. In the last decade, many small experimental fitness landscapes have been partially or completely resolved, and more will likely follow. MAGELLAN is a web-based graphical software tool to explore small fitness/energy landscapes through dynamic visualization and quantitative measures. It can be used to explore custom input landscapes, previously published experimental landscapes or randomly generated model landscapes.

Collaboration


Top Co-Authors

Sebastian E. Ramos-Onsins

Spanish National Research Council

Ginestra Bianconi

Queen Mary University of London

Paolo Ribeca

Institute for Animal Health

Miguel Pérez-Enciso

Spanish National Research Council
