Publication


Featured research published by Edoardo Pasolli.


Nature Methods | 2015

MetaPhlAn2 for enhanced metagenomic taxonomic profiling.

Duy Tin Truong; Eric A. Franzosa; Timothy L. Tickle; Matthias Scholz; George Weingart; Edoardo Pasolli; Adrian Tett; Curtis Huttenhower; Nicola Segata

Profiling of all domains of life. Marker and quasi-marker genes are now identified not only for microbes (Bacteria and Archaea), but also for viruses and Eukaryotic microbes (Fungi, Protozoa) that are crucial components of microbial communities.

A 6-fold increase in the number of considered species. Markers are now identified from >16,000 reference genomes and >7,000 unique species, dramatically expanding the comprehensiveness of the method. The new pipeline for identifying marker genes is also scalable to the quickly increasing number of reference genomes. See Supplementary Tables 1-3.

Introduction of the concept of quasi-markers, allowing more comprehensive and accurate profiling. For species with fewer than 200 markers, MetaPhlAn2 adopts additional quasi-marker sequences (Supplementary Note 2) that are occasionally present in other genomes (because of vertical conservation or horizontal transfer). At profiling time, if no other markers of the potentially confounding species are detected, the corresponding quasi-local markers are used to improve the quality and accuracy of the profiling.

Addition of strain-specific barcoding for microbial strain tracking. MetaPhlAn2 includes a completely new feature that exploits marker combinations to perform species-specific and genus-specific “barcoding” for strains in metagenomic samples (Supplementary Note 7). This feature can be used for culture-free pathogen tracking in epidemiology studies and strain tracking across microbiome samples. See Supplementary Figs. 12-20.

Strain-level identification for organisms with sequenced genomes. When a microbiome includes strains that are very close to one of those already sequenced, MetaPhlAn2 is now able to identify such strains and readily reports their abundances. See Supplementary Note 7, Supplementary Table 13, and Supplementary Fig. 21.

Improvement of false positive and false negative rates. Improvements in the underlying pipeline for identifying marker genes (including the larger set of adopted genomes and the use of quasi-markers) and in the profiling procedure resulted in much improved quantitative performance (higher correlation with true abundances, lower false positive and false negative rates). See the validation on synthetic metagenomes in Supplementary Note 4.

Estimation of the percentage of reads mapped against known reference genomes. MetaPhlAn2 is now able to estimate the number of reads that would map against genomes of each clade detected as present and for which an estimation of its relative abundance is provided by the default output. See Supplementary Note 3 for details.

Integration of MetaPhlAn with post-processing and visualization tools. The MetaPhlAn2 package now includes a set of post-processing and visualization tools (“utils” subfolder of the MetaPhlAn2 repository). Multiple MetaPhlAn profiles can be merged into an abundance table (“merge_metaphlan_tables.py”), exported as BIOM files, and visualized as heatmaps (“metaphlan_hclust_heatmap.py” or the integrated “hclust2” package), GraPhlAn plots (“export2graphlan.py” and the GraPhlAn package), Krona plots (“metaphlan2krona.py”), and single-microbe barplots across samples and conditions (“plot_bug.py”).
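To make the idea of merging several per-sample profiles into one abundance table concrete, here is a minimal Python sketch. It is not the code of “merge_metaphlan_tables.py”: the function name `merge_profiles`, the in-memory dictionary layout, and the example clade names and abundances are all assumptions chosen for illustration.

```python
from typing import Dict, List


def merge_profiles(profiles: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Merge per-sample {clade: relative abundance} maps into one table.

    Hypothetical helper: clades missing from a sample are filled with 0.0,
    and the first row is a header of sample names.
    """
    samples = sorted(profiles)
    clades = sorted({clade for p in profiles.values() for clade in p})
    table = [["clade"] + samples]
    for clade in clades:
        row = [clade] + [f"{profiles[s].get(clade, 0.0):.2f}" for s in samples]
        table.append(row)
    return table


# Made-up example profiles (percent relative abundance per clade).
profiles = {
    "sampleA": {"s__Escherichia_coli": 60.0, "s__Bacteroides_fragilis": 40.0},
    "sampleB": {"s__Bacteroides_fragilis": 100.0},
}
merged = merge_profiles(profiles)
for row in merged:
    print("\t".join(row))
```

The resulting table has one row per clade and one column per sample, which is the shape that heatmap and barplot tools typically expect.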


IEEE Transactions on Geoscience and Remote Sensing | 2009

Automatic Analysis of GPR Images: A Pattern-Recognition Approach

Edoardo Pasolli; Farid Melgani; Massimo Donelli

In this paper, we propose a novel pattern-recognition system to identify and classify buried objects from ground-penetrating radar (GPR) imagery. The entire process is subdivided into four steps. After a preprocessing step, the GPR image is thresholded to highlight the regions containing potential objects. The third step of the system consists of automatically detecting the objects in the obtained binary image by means of a search of linear/hyperbolic patterns formulated within a genetic optimization framework. In the genetic optimizer, each chromosome models the apex position and the curvature associated with the candidate pattern, while the fitness function expresses the Hamming distance between that pattern and the binary image content. Finally, in the fourth step, the recognition of the material type of the identified objects is approached as a classification problem, which is solved by means of a suitable feature-extraction strategy and a support vector machine classifier. To illustrate the performance of the proposed system, we conducted a thorough experimental study using GPR images generated by a simulator based on the finite-difference time-domain method, so as to construct different acquisition scenarios by varying the number of buried objects, their position, their size, their shape, and their material type. In general, the obtained experimental results show that the proposed system exhibits promising performance both in terms of object detection and material recognition.
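The fitness evaluation described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the hyperbola parameterization `row = sqrt(apex_row^2 + (curvature * (col - apex_col))^2)`, the function names, and the choice to compare over the whole image are not taken from the paper, and the genetic search itself is omitted.

```python
import math
from typing import List, Tuple

Image = List[List[int]]


def render_hyperbola(shape: Tuple[int, int], apex_row: float,
                     apex_col: float, curvature: float) -> Image:
    """Draw a candidate hyperbolic pattern as a binary image (assumed model)."""
    rows, cols = shape
    img = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        r = round(math.sqrt(apex_row ** 2 + (curvature * (c - apex_col)) ** 2))
        if 0 <= r < rows:
            img[r][c] = 1
    return img


def hamming_distance(a: Image, b: Image) -> int:
    """Count the pixels at which two equally sized binary images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def fitness(chromosome: Tuple[float, float, float], binary_image: Image) -> int:
    """Lower is better: mismatch between the candidate pattern and the image."""
    shape = (len(binary_image), len(binary_image[0]))
    return hamming_distance(render_hyperbola(shape, *chromosome), binary_image)


# A synthetic binary image containing one hyperbola, and two candidates.
target = render_hyperbola((20, 21), 5, 10, 1.0)
print(fitness((5, 10, 1.0), target))  # exact match
print(fitness((5, 10, 2.0), target))  # wrong curvature
```

A genetic optimizer would evolve the three chromosome values to minimize this fitness.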


IEEE Transactions on Geoscience and Remote Sensing | 2009

Clustering of Hyperspectral Images Based on Multiobjective Particle Swarm Optimization

Andrea Paoli; Farid Melgani; Edoardo Pasolli

In this paper, we present a new methodology for clustering hyperspectral images. It aims at simultaneously solving the following three different issues: 1) estimation of the class statistical parameters; 2) detection of the best discriminative bands without requiring the a priori setting of their number by the user; and 3) estimation of the number of data classes characterizing the considered image. It is formulated within a multiobjective particle swarm optimization (MOPSO) framework and is guided by three different optimization criteria, which are the log-likelihood function, the Bhattacharyya statistical distance between classes, and the minimum description length (MDL). A detailed experimental analysis was conducted on both simulated and real hyperspectral images. In general, the obtained results show that interesting classification performances can be achieved by the proposed methodology despite its completely unsupervised nature.
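As a small illustration of one of the three criteria, the Bhattacharyya distance between two classes can be computed in closed form when each class is modeled as a Gaussian. The sketch below assumes univariate Gaussians for brevity; the abstract does not state the class model, and the function name is hypothetical.

```python
import math


def bhattacharyya_gaussian(mean1: float, var1: float,
                           mean2: float, var2: float) -> float:
    """Bhattacharyya distance between two univariate Gaussian classes.

    D = (1/4) * (m1 - m2)^2 / (v1 + v2)
        + (1/2) * ln((v1 + v2) / (2 * sqrt(v1 * v2)))
    """
    if var1 <= 0 or var2 <= 0:
        raise ValueError("variances must be positive")
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


# Identical classes are at distance zero; separation grows with the mean gap.
print(bhattacharyya_gaussian(0.0, 1.0, 0.0, 1.0))
print(bhattacharyya_gaussian(0.0, 1.0, 2.0, 1.0))
```

Larger values indicate better separated classes, which is why such a distance can serve as a criterion for choosing discriminative bands.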


Nature Methods | 2016

Strain-level microbial epidemiology and population genomics from shotgun metagenomics

Matthias Scholz; Doyle V. Ward; Edoardo Pasolli; Thomas Tolio; Moreno Zolfo; Francesco Asnicar; Duy Tin Truong; Adrian Tett; Ardythe L. Morrow; Nicola Segata

Identifying microbial strains and characterizing their functional potential is essential for pathogen discovery, epidemiology and population genomics. We present pangenome-based phylogenomic analysis (PanPhlAn; http://segatalab.cibio.unitn.it/tools/panphlan), a tool that uses metagenomic data to achieve strain-level microbial profiling resolution. PanPhlAn recognized outbreak strains, produced the largest strain-level population genomic study of human-associated bacteria and, in combination with metatranscriptomics, profiled the transcriptional activity of strains in complex communities.


IEEE Transactions on Geoscience and Remote Sensing | 2014

SVM Active Learning Approach for Image Classification Using Spatial Information

Edoardo Pasolli; Farid Melgani; Devis Tuia; Fabio Pacifici; William J. Emery

In the last few years, active learning has been gaining growing interest in the remote sensing community as a way to optimize the collection of training samples for supervised image classification. Current strategies formulate the active learning problem in the spectral domain only. However, remote sensing images are intrinsically defined both in the spectral and spatial domains. In this paper, we explore this fact by proposing a new active learning approach for support vector machine classification. In particular, we suggest combining spectral and spatial information directly in the iterative process of sample selection. For this purpose, three criteria are proposed to favor the selection of samples distant from the samples already composing the current training set. In the first strategy, the Euclidean distances in the spatial domain from the training samples are explicitly computed, whereas the second one is based on the Parzen window method in the spatial domain. Finally, the last criterion involves the concept of spatial entropy. Experiments on two very high resolution images show the effectiveness of regularization in the spatial domain for active learning purposes.
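The first criterion (explicit Euclidean distance in the spatial domain) could be combined with a spectral uncertainty measure in many ways; the abstract does not say how. The sketch below assumes one simple rule, chosen only for illustration: shortlist the most uncertain candidates by absolute SVM decision value, then pick the one farthest from all current training pixels. All names and numbers are hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float]
Candidate = Tuple[Pixel, float]  # (pixel coordinates, |SVM decision value|)


def min_spatial_distance(pixel: Pixel, training_pixels: List[Pixel]) -> float:
    """Euclidean distance from a pixel to its nearest training pixel."""
    return min(math.dist(pixel, t) for t in training_pixels)


def select_sample(candidates: List[Candidate], training_pixels: List[Pixel],
                  shortlist_size: int = 3) -> Candidate:
    """Assumed rule: most uncertain shortlist first, then spatially farthest."""
    shortlist = sorted(candidates, key=lambda c: c[1])[:shortlist_size]
    return max(shortlist,
               key=lambda c: min_spatial_distance(c[0], training_pixels))


training = [(0.0, 0.0)]
candidates = [((1.0, 1.0), 0.10), ((10.0, 10.0), 0.20),
              ((50.0, 50.0), 0.90), ((2.0, 0.0), 0.15)]
print(select_sample(candidates, training))
```

In this toy run the pixel at (50, 50) is far away but confidently classified, so it never enters the shortlist; the selected sample is both uncertain and spatially distant from the training set.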


Genome Research | 2017

Microbial strain-level population structure and genetic diversity from metagenomes

Duy Tin Truong; Adrian Tett; Edoardo Pasolli; Curtis Huttenhower; Nicola Segata

Among the human health conditions linked to microbial communities, phenotypes are often associated with only a subset of strains within causal microbial groups. Although it has been critical for decades in microbial physiology to characterize individual strains, this has been challenging when using culture-independent high-throughput metagenomics. We introduce StrainPhlAn, a novel metagenomic strain identification approach, and apply it to characterize the genetic structure of thousands of strains from more than 125 species in more than 1500 gut metagenomes drawn from populations spanning North and South American, European, Asian, and African countries. The method relies on per-sample dominant sequence variant reconstruction within species-specific marker genes. It identified primarily subject-specific strain variants (<5% inter-subject strain sharing), and we determined that a single strain typically dominated each species and was retained over time (for >70% of species). Microbial population structure was correlated in several distinct ways with the geographic structure of the host population. In some cases, discrete subspecies (e.g., for Eubacterium rectale and Prevotella copri) or continuous microbial genetic variations (e.g., for Faecalibacterium prausnitzii) were associated with geographically distinct human populations, whereas few strains occurred in multiple unrelated cohorts. We further estimated the genetic variability of gut microbes, with Bacteroides species appearing remarkably consistent (0.45% median number of nucleotide variants between strains), whereas P. copri was among the most plastic gut colonizers. We thus characterize here the population genetics of previously inaccessible intestinal microbes, providing a comprehensive strain-level genetic overview of the gut microbial diversity.
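The idea of reconstructing a per-sample dominant sequence variant and then comparing strains by their fraction of differing nucleotides can be sketched as below. This is not StrainPhlAn's implementation: the pileup representation (one string of observed bases per marker position), the function names, and the toy data are assumptions, and real data would also need alignment, quality filtering, and handling of ties and gaps.

```python
from collections import Counter
from typing import List


def dominant_sequence(pileup_columns: List[str]) -> str:
    """Take the most frequent base at each marker position in one sample."""
    return "".join(Counter(col).most_common(1)[0][0] for col in pileup_columns)


def percent_variants(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two strain sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * diffs / len(seq_a)


# Toy pileups for the same 4-position marker in two samples.
sample1 = dominant_sequence(["AAAT", "CCC", "GGTG", "TT"])
sample2 = dominant_sequence(["AA", "CCG", "GGG", "AAT"])
print(sample1, sample2, percent_variants(sample1, sample2))
```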


PLOS Computational Biology | 2016

Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

Edoardo Pasolli; Duy Tin Truong; Faizan Malik; Levi Waldron; Nicola Segata

Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
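Cross-study analysis requires that the test samples come from a study the model never saw during training. A minimal sketch of such a split is shown below; it is not taken from the MetAML framework, and the tuple layout, study names, and function name are hypothetical. The classifier itself is left out.

```python
from typing import Iterator, List, Tuple

Sample = Tuple[str, str, str]  # (sample_id, study, label)


def leave_one_study_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_study, train, test) with no study shared across sets."""
    studies = sorted({study for _, study, _ in samples})
    for held_out in studies:
        train = [s for s in samples if s[1] != held_out]
        test = [s for s in samples if s[1] == held_out]
        yield held_out, train, test


samples = [
    ("s1", "studyA", "disease"), ("s2", "studyA", "control"),
    ("s3", "studyB", "disease"), ("s4", "studyB", "control"),
    ("s5", "studyC", "control"),
]
splits = list(leave_one_study_out(samples))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Adding control samples from other studies to a training set, as the abstract describes, would amount to extending `train` with the control-labeled samples of additional studies.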


International Conference of the IEEE Engineering in Medicine and Biology Society | 2010

Active Learning Methods for Electrocardiographic Signal Classification

Edoardo Pasolli; Farid Melgani

In this paper, we present three active learning strategies for the classification of electrocardiographic (ECG) signals. Starting from a small and suboptimal training set, these learning strategies select additional beat samples from a large set of unlabeled data. These samples are labeled manually and then added to the training set. The entire procedure is iterated until the final training set is representative of the considered classification problem. The proposed methods are based on support vector machine classification and on three principles: 1) margin sampling; 2) posterior probability; and 3) query by committee. To illustrate their performance, we conducted an experimental study based on both simulated data and real ECG signals from the MIT-BIH arrhythmia database. In general, the obtained results show that the proposed strategies exhibit a promising capability to select samples that are significant for the classification process, i.e., to boost classification accuracy while minimizing the number of labeled samples involved.
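Of the three principles, margin sampling is the simplest to illustrate: the samples closest to the SVM decision boundary are assumed to be the most informative and are sent for manual labeling. The sketch below covers the binary case only, takes precomputed decision values as input, and uses hypothetical beat identifiers; it is not the paper's code.

```python
from typing import Dict, List


def margin_sampling(decision_values: Dict[str, float],
                    batch_size: int) -> List[str]:
    """Return the ids of the unlabeled samples nearest the decision boundary.

    decision_values maps a sample id to its signed distance from the SVM
    hyperplane (assumed to be computed elsewhere).
    """
    ranked = sorted(decision_values, key=lambda sid: abs(decision_values[sid]))
    return ranked[:batch_size]


# Toy signed distances for four unlabeled beats.
unlabeled = {"beat1": 1.8, "beat2": -0.05, "beat3": 0.4, "beat4": -2.3}
to_label = margin_sampling(unlabeled, batch_size=2)
print(to_label)
```

In a full loop, the selected beats would be labeled by an expert, added to the training set, and the classifier retrained before the next selection round.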


IEEE Geoscience and Remote Sensing Letters | 2011

Support Vector Machine Active Learning Through Significance Space Construction

Edoardo Pasolli; Farid Melgani; Yakoub Bazi

Active learning is proving to be a useful approach to improve the efficiency of the classification process for remote sensing images. This letter introduces a new active learning strategy specifically developed for support vector machine (SVM) classification. It relies on two ideas: 1) reformulating the original classification problem into a new problem that requires discriminating between significant and nonsignificant samples, according to a concept of significance which is proper to the SVM theory; and 2) constructing the corresponding significance space to suitably guide the selection of the samples potentially useful to better deal with the original classification problem. Experiments were conducted on both multi- and hyperspectral images. Results show interesting advantages of the proposed method in terms of convergence speed, stability, and sparseness.


IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 2015

Ensemble Multiple Kernel Active Learning For Classification of Multisource Remote Sensing Data

Yuhang Zhang; Hsiuhan Lexie Yang; Saurabh Prasad; Edoardo Pasolli; Jinha Jung; Melba M. Crawford

Incorporating disparate features from multiple sources can provide valuable diverse information for remote sensing data analysis. However, multisource remote sensing data require large quantities of labeled data to train robust supervised classifiers, which are often difficult and expensive to acquire. A mixture-of-kernel approach can facilitate the construction of an effective formulation for acquiring useful samples via active learning (AL). In this paper, we propose an ensemble multiple kernel active learning (EnsembleMKL-AL) framework that incorporates different types of features extracted from multisensor remote sensing data (hyperspectral imagery and LiDAR data) for robust classification. An ensemble of probabilistic multiple kernel classifiers is embedded into a maximum disagreement-based AL system, which adaptively optimizes the kernel for each source during the AL process. At the end of each learning step, a decision fusion strategy is implemented to make a final decision based on the probabilistic outputs. The proposed framework is tested in a multisource environment, including different types of features extracted from hyperspectral and LiDAR data. The experimental results validate the efficacy of the proposed approach. In addition, we demonstrate that using ensemble classifiers and a large number of disparate but relevant features can further improve the performance of an AL-based classification approach.
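A disagreement-based selection step can be sketched with vote entropy, one common way to measure how much an ensemble disagrees on a sample. The abstract does not specify the disagreement measure, so vote entropy is an assumption here, as are the function names and the toy votes; the multiple kernel classifiers themselves are not modeled.

```python
import math
from collections import Counter
from typing import Dict, List


def vote_entropy(votes: List[str]) -> float:
    """Entropy of the class labels predicted by the ensemble for one sample."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def most_disputed(committee_votes: Dict[str, List[str]],
                  batch_size: int) -> List[str]:
    """Return the sample ids on which the ensemble disagrees the most."""
    ranked = sorted(committee_votes,
                    key=lambda sid: vote_entropy(committee_votes[sid]),
                    reverse=True)
    return ranked[:batch_size]


# Toy votes from a three-member ensemble on three unlabeled pixels.
votes = {
    "p1": ["water", "water", "water"],
    "p2": ["water", "soil", "tree"],
    "p3": ["water", "water", "soil"],
}
print(most_disputed(votes, batch_size=1))
```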
