Giuseppe Jurman
fondazione bruno kessler
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Giuseppe Jurman.
Nature Genetics | 2009
John P. A. Ioannidis; David B. Allison; Catherine A. Ball; Issa Coulibaly; Xiangqin Cui; Aedín C. Culhane; Mario Falchi; Cesare Furlanello; Giuseppe Jurman; Jon Mangion; Tapan Mehta; Michael Nitzberg; Grier P. Page; Enrico Petretto; Vera van Noort
Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.
Nature Biotechnology | 2014
Charles Wang; Binsheng Gong; Pierre R. Bushel; Jean Thierry-Mieg; Danielle Thierry-Mieg; Joshua Xu; Hong Fang; Huixiao Hong; Jie Shen; Zhenqiang Su; Joe Meehan; Xiaojin Li; Lu Yang; Haiqing Li; Paweł P. Łabaj; David P. Kreil; Dalila B. Megherbi; Stan Gaj; Florian Caiment; Joost H.M. van Delft; Jos Kleinjans; Andreas Scherer; Viswanath Devanarayan; Jian Wang; Yong Yang; Hui-Rong Qian; Lee Lancashire; Marina Bessarabova; Yuri Nikolsky; Cesare Furlanello
The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R20.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.
BMC Bioinformatics | 2003
Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman
BackgroundWe describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process).ResultsWith E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipfs law profiles.ConclusionsWithout a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
PLOS ONE | 2012
Giuseppe Jurman; Samantha Riccadonna; Cesare Furlanello
We show that the Confusion Entropy, a measure of performance in multiclass problems has a strong (monotone) relation with the multiclass generalization of a classical metric, the Matthews Correlation Coefficient. Analytical results are provided for the limit cases of general no-information (n-face dice rolling) of the binary classification. Computational evidence supports the claim in the general case.
PLOS ONE | 2012
Rebeca Sanz-Pamplona; Antoni Berenguer; David Cordero; Samantha Riccadonna; Xavier Solé; Marta Crous-Bou; Elisabet Guinó; Xavier Sanjuan; Sebastiano Biondo; Antonio Soriano; Giuseppe Jurman; Gabriel Capellá; Cesare Furlanello; Victor Moreno
Introduction The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. Random Forest classifier was used to build predictors from the gene lists. Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-tests probabilities were computed in stage II and III samples. Results Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic.
Bioinformatics | 2008
Giuseppe Jurman; Stefano Merler; Annalisa Barla; Silvano Paoli; Antonio Galea; Cesare Furlanello
MOTIVATION We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the difference in lists may be bewildering, also due to the presence of modules of functionally related genes. Methods for assessing stability understand the dependency of the markers on the data or on the predictors type and help selecting solutions. RESULTS A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearmans footrule. We also consider distances between partial lists and an aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The addressed issues are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis. AVAILABILITY Supplementary Material and software are available at the address http://biodcv.fbk.eu/listspy.html
international symposium on neural networks | 2003
Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman
We describe a new wrapper algorithm for fast feature ranking in classification problems. The Entropy-based Recursive Feature Elimination (E-RFE) method eliminates chunks of uninteresting features according to the entropy of the weights distribution of a SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test E-RFE on synthetic and real data sets, comparing it with other SVM-based methods. The speed-up obtained with E-RFE supports predictive modeling on high dimensional microarray data.
Bioinformatics | 2013
Davide Albanese; Michele Filosi; Roberto Visintainer; Samantha Riccadonna; Giuseppe Jurman; Cesare Furlanello
UNLABELLED We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large ( = 1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. AVAILABILITY AND IMPLEMENTATION Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).
International Journal of Cancer | 2006
Cristiano De Pittà; Lucia Tombolan; Giada Albiero; F. Sartori; Chiara Romualdi; Giuseppe Jurman; Modesto Carli; Cesare Furlanello; Gerolamo Lanfranchi; Angelo Rosolen
We analyzed the expression signatures of 14 tumor biopsies from children affected by alveolar rhabdomyosarcoma (ARMS) to identify genes correlating to biological features of this tumor. Seven of these patients were positive for the PAX3‐FKHR fusion gene and 7 were negative. We used a cDNA platform containing a large majority of probes derived from muscle tissues. The comparison of transcription profiles of tumor samples with fetal skeletal muscle identified 171 differentially expressed genes common to all ARMS patients. The functional classification analysis of altered genes led to the identification of a group of transcripts (LGALS1, BIN1) that may be relevant for the tumorigenic processes. The muscle‐specific microarray platform was able to distinguish PAX3‐FKHR positive and negative ARMS through the expression pattern of a limited number of genes (RAC1, CFL1, CCND1, IGFBP2) that might be biologically relevant for the different clinical behavior and aggressiveness of the 2 ARMS subtypes. Expression levels for selected candidate genes were validated by quantitative real‐time reverse‐transcription PCR.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005
Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman
Class prediction and feature selection are two learning tasks that are strictly paired in the search of molecular profiles from microarray data. Researchers have become aware how easy it is to incur a selection bias effect, and complex validation setups are required to avoid overly optimistic estimates of the predictive accuracy of the models and incorrect gene selections. This paper describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental setups for gene profiling. In particular, we introduce the study of the patterns of single sample responses (sample-tracking profiles) to the gene selection process induced by typical supervised learning tasks in microarray studies. We originate sample-tracking profiles as the aggregated off-training evaluation of SVM models of increasing gene panel sizes. Genes are ranked by E-RFE, an entropy-based variant of the recursive feature elimination for support vector machines (RFE-SVM). A dynamic time warping (DTW) algorithm is then applied to define a metric between sample-tracking profiles. An unsupervised clustering based on the DTW metric allows automating the discovery of outliers and of subtypes of different molecular profiles. Applications are described on synthetic data and in two gene expression studies.