Francesco Vezzi
Science for Life Laboratory
Publication
Featured research published by Francesco Vezzi.
European Respiratory Journal | 2016
Johan Grunewald; Ylva Kaiser; Mahyar Ostadkarampour; Natalia V. Rivera; Francesco Vezzi; Britta Lötstedt; Lina Sylwan; Sverker Lundin; Max Käller; Tatiana Sandalova; Kerstin M. Ahlgren; Jan Wahlström; Adnane Achour; Marcus Ronninger; Anders Eklund
In pulmonary sarcoidosis, CD4+ T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03+ patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4+ T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4+ T-cells were analysed by flow cytometry, DNA-sequenced, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03+ patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429–443 DSLPLVDTHSKRTLL, matching both the HLA peptide-binding cleft and distinct T-cell receptor features perfectly. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4+ T-cells in the lungs of HLA-DRB1*03+ sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides. Clonal CD4+ lung T-cells associating with HLA-DRB1*03 molecules indicate specific antigens in pulmonary sarcoidosis http://ow.ly/UB81x
GigaScience | 2016
Ola Spjuth; Erik Bongcam-Rudloff; Johan Dahlberg; Martin Dahlö; Aleksi Kallio; Luca Pireddu; Francesco Vezzi; Eija Korpelainen
With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.
BMC Evolutionary Biology | 2016
Maria de la Paz Celorio-Mancera; Christopher W. Wheat; Mikael Huss; Francesco Vezzi; Ramprasad Neethiraj; Johan Reimegård; Sören Nylin; Niklas Janz
Background: Although most insect species are specialized on one or few groups of plants, there are phytophagous insects that seem to use virtually any kind of plant as food. Understanding the nature of this ability to feed on a wide repertoire of plants is crucial for the control of pest species and for the elucidation of the macroevolutionary mechanisms of speciation and diversification of insect herbivores. Here we studied Vanessa cardui, the species with the widest diet breadth among butterflies and a potential insect pest, by comparing tissue-specific transcriptomes from caterpillars that were reared on different host plants. We tested whether the similarities of gene-expression response reflect the evolutionary history of adaptation to these plants in the Vanessa and related genera, against the null hypothesis of transcriptional profiles reflecting plant phylogenetic relatedness. Results: Using both unsupervised and supervised methods of data analysis, we found that the tissue-specific patterns of caterpillar gene expression are better explained by the evolutionary history of adaptation of the insects to the plants than by plant phylogeny. Conclusion: Our findings suggest that V. cardui may use two sets of expressed genes to achieve polyphagy, one associated with the ancestral capability to consume Rosids and Asterids, and another allowing the caterpillar to incorporate a wide range of novel host-plants.
GigaScience | 2015
Ignas Bunikis; Ievgeniia Tiukova; Kicki Holmberg; Britta Lötstedt; Olga Vinnere Pettersson; Volkmar Passoth; Max Käller; Francesco Vezzi
Background: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. Methods: In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. Results: We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5% of the optical mapped genome not covered by the Illumina data.
bioRxiv | 2016
Adam Ameur; Johan Dahlberg; Pall Olason; Francesco Vezzi; Robert Karlsson; Pär Lundin; Huiwen Che; Jessada Thutkawkorapin; Andreas Kusalananda Kahari; Mats Dahlberg; Johan Viklund; Jonas Hagberg; Niclas Jareborg; Inger Jonasson; Åsa Johansson; Sverker Lundin; Daniel Nilsson; Björn Nystedt; Patrik K. E. Magnusson; Ulf Gyllensten
Here we describe the SweGen dataset, a high-quality map of genetic variation in the Swedish population. This data represents a basic resource for clinical genetics laboratories as well as for sequencing-based association studies, by providing information on the frequencies of genetic variants in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide population based cohort of over 10,000 individuals. From this sample collection, 1,000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole genome sequencing (WGS). Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a whole-genome map of aggregated variant frequencies in the Swedish population that we hereby release to the scientific community.
BMC Genomics | 2014
Andrey Alexeyenko; Björn Nystedt; Francesco Vezzi; Ellen Sherwood; Rosa Ye; Bjarne Knudsen; Martin Simonsen; Benjamin Turner; Pieter J. de Jong; Cheng-Cang Wu; Joakim Lundeberg
Background: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. Results: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. Conclusions: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
BMC Bioinformatics | 2016
Nicola Prezza; Francesco Vezzi; Max Käller; Alberto Policriti
Background: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be somehow compressed, due to its significant size. Methods: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast and memory consuming. The latter are much slower and light. In this paper, we try to close this gap proposing a data structure for aligning bisulfite-treated reads which is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols. Results: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former, and as memory-efficient as the latter. Conclusions: The new functionalities of our bisulfite aligner and caller make it a fast and memory efficient tool, useful to analyze big datasets with little computational resources, to easily process target enrichment data, and produce statistics such as protocol efficiency and coverage as a function of the distance from target regions.
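As a rough illustration of what a "bisulfite-aware" hash can look like, the sketch below hashes k-mers after collapsing C into T, so that a read with converted cytosines lands in the same bucket as its reference origin. This is only an assumption-laden toy, not the ERNE-BS5 2 implementation: the function names are hypothetical, a plain Python dict stands in for the succinct hash index, and only the forward-strand C-to-T conversion is modelled.

```python
def bisulfite_convert(seq: str) -> str:
    """Collapse C into T, mimicking full conversion of unmethylated cytosines
    on the forward strand (reverse-strand G-to-A is not modelled here)."""
    return seq.upper().replace("C", "T")


def bisulfite_aware_hash(kmer: str) -> int:
    """Hash a k-mer over the reduced alphabet {A, G, T} as a base-3 number.
    Hypothetical helper: k-mers differing only by C/T get the same value."""
    code = {"A": 0, "G": 1, "T": 2}
    value = 0
    for base in bisulfite_convert(kmer):
        value = value * 3 + code[base]
    return value


def build_index(reference: str, k: int) -> dict:
    """Map each k-mer hash to the reference positions where it occurs.
    A plain dict is used here in place of a succinct hash structure."""
    index = {}
    for pos in range(len(reference) - k + 1):
        key = bisulfite_aware_hash(reference[pos:pos + k])
        index.setdefault(key, []).append(pos)
    return index


reference = "ACGTTCGACCGTA"
read = "ATGTTTGA"  # reference[0:8] with both cytosines read as T

index = build_index(reference, k=8)
candidates = index.get(bisulfite_aware_hash(read), [])
print(candidates)
```

Because candidate positions found this way ignore the C/T distinction, a real aligner would still need to verify each hit against the unconverted reference before calling methylation.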
bioRxiv | 2017
Tom van der Valk; Francesco Vezzi; Mattias Ormestad; Love Dalén; Katerina Guschanski
The high-throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged a wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for the latest generation of Illumina sequencing machines. This potentially calls into question previously generated results and may make future use of the newest generation of platforms prohibitive. In this study we rely on barcodes, short sequences that are directly ligated to both ends of the DNA insert, which allows us to quantify the amount of index hopping. Correcting for multiple sources of noise, we identify on average only 0.470% of reads containing a hopped index. Multiplexing of samples on this platform is therefore unlikely to cause markedly different results to those obtained from older platforms.
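The idea of using ligated barcodes to detect hopped indices can be sketched as a simple tally: if each sample has a known barcode and a known index, a read whose barcode belongs to one sample but whose index belongs to another is counted as hopped. The code below is a minimal sketch under that assumption. The function name, the data layout, and the toy sequences are all hypothetical, and none of the noise corrections mentioned in the abstract are applied.

```python
from collections import Counter


def hopping_rate(reads, expected_index):
    """Estimate the fraction of reads whose index disagrees with their barcode.

    reads: iterable of (barcode, index) pairs observed for each read.
    expected_index: dict mapping each sample barcode to the index it was
    prepared with. Reads with an unrecognised barcode are tallied separately
    and excluded from the rate.
    """
    counts = Counter()
    for barcode, index in reads:
        if barcode not in expected_index:
            counts["unknown"] += 1
        elif expected_index[barcode] == index:
            counts["consistent"] += 1
        else:
            counts["hopped"] += 1
    assigned = counts["consistent"] + counts["hopped"]
    rate = counts["hopped"] / assigned if assigned else 0.0
    return rate, counts


# Toy design: two samples, each with its own barcode and index (made-up values).
expected_index = {"ACGTAC": "idx1", "TGCATG": "idx2"}
reads = [
    ("ACGTAC", "idx1"),
    ("ACGTAC", "idx1"),
    ("TGCATG", "idx2"),
    ("TGCATG", "idx1"),  # barcode of sample 2 carrying the index of sample 1
    ("NNNNNN", "idx2"),  # barcode not in the design, ignored for the rate
]

rate, counts = hopping_rate(reads, expected_index)
print(f"hopped fraction: {rate:.3f}")
```

A raw tally like this would also count sequencing errors and cross-contamination as hopping, which is presumably why the study corrects for multiple sources of noise before reporting its estimate.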
bioRxiv | 2018
Tom van der Valk; Francesco Vezzi; Mattias Ormestad; Love Dalén; Katerina Guschanski
The high-throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged a wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for the latest generation of Illumina sequencing machines. This potentially calls into question previously generated results and may make future use of the newest generation of platforms prohibitive. In this study we rely on barcodes, short sequences that are directly ligated to both ends of the DNA insert, which allows us to quantify the amount of index hopping. Correcting for multiple sources of noise, we identify on average only 0.470% of reads containing a hopped index. Multiplexing of samples on this platform is therefore unlikely to cause markedly different results to those obtained from older platforms.