Giuseppe Jurman | Researchain

Publication

Featured researches published by Giuseppe Jurman.

Nature Genetics | 2009

Repeatability of published microarray gene expression analyses

John P. A. Ioannidis; David B. Allison; Catherine A. Ball; Issa Coulibaly; Xiangqin Cui; Aedín C. Culhane; Mario Falchi; Cesare Furlanello; Giuseppe Jurman; Jon Mangion; Tapan Mehta; Michael Nitzberg; Grier P. Page; Enrico Petretto; Vera van Noort

Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.

Nature Biotechnology | 2014

The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance

Charles Wang; Binsheng Gong; Pierre R. Bushel; Jean Thierry-Mieg; Danielle Thierry-Mieg; Joshua Xu; Hong Fang; Huixiao Hong; Jie Shen; Zhenqiang Su; Joe Meehan; Xiaojin Li; Lu Yang; Haiqing Li; Paweł P. Łabaj; David P. Kreil; Dalila B. Megherbi; Stan Gaj; Florian Caiment; Joost H.M. van Delft; Jos Kleinjans; Andreas Scherer; Viswanath Devanarayan; Jian Wang; Yong Yang; Hui-Rong Qian; Lee Lancashire; Marina Bessarabova; Yuri Nikolsky; Cesare Furlanello

The concordance of RNA-sequencing (RNA-seq) with microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed using a range of chemical treatment conditions. Here we use a comprehensive study design to generate Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed in triplicate to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). The cross-platform concordance in terms of differentially expressed genes (DEGs) or enriched pathways is linearly correlated with treatment effect size (R² ≈ 0.8). Furthermore, the concordance is also affected by transcript abundance and biological complexity of the MOA. RNA-seq outperforms microarray (93% versus 75%) in DEG verification as assessed by quantitative PCR, with the gain mainly due to its improved accuracy for low-abundance transcripts. Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.

BMC Bioinformatics | 2003

Entropy-based gene ranking without selection bias for the predictive classification of microarray data

Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman

Background: We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process). Results: With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles. Conclusions: Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.
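
The chunk-elimination idea in the abstract above can be illustrated with a minimal Python sketch. It is not the published E-RFE procedure: the histogram binning, the entropy threshold (half of the maximum entropy) and the chunk size (`chunk_fraction`) are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows the general principle that a low-entropy weight distribution (many weights crowded into a few bins) can justify removing many low-weight features in one step.

```python
import math


def weight_entropy(weights, n_bins=10):
    """Shannon entropy (in bits) of a histogram of absolute weights.

    The binning scheme is an assumption made for this sketch.
    """
    if not weights:
        raise ValueError("weights must not be empty")
    mags = [abs(w) for w in weights]
    lo, hi = min(mags), max(mags)
    if hi == lo:
        return 0.0  # all weights identical: a single occupied bin
    counts = [0] * n_bins
    for m in mags:
        # Map the magnitude to a bin index; the maximum goes in the last bin.
        idx = min(int((m - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(mags)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def features_to_drop(weights, n_bins=10, chunk_fraction=0.5):
    """Return indices of features to eliminate in one RFE step.

    Hypothetical rule: if the entropy is below half of its maximum
    (log2 of the number of bins), drop a chunk of the lowest-weight
    features; otherwise drop only the single lowest-weight feature.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    threshold = 0.5 * math.log2(n_bins)
    if weight_entropy(weights, n_bins) < threshold:
        chunk = max(1, int(len(weights) * chunk_fraction))
        return order[:chunk]
    return order[:1]
```

In a full recursive loop, a classifier would be retrained on the surviving features after each call; that step is omitted here.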

PLOS ONE | 2012

A Comparison of MCC and CEN Error Measures in Multi-Class Prediction

Giuseppe Jurman; Samantha Riccadonna; Cesare Furlanello

We show that the Confusion Entropy, a measure of performance in multiclass problems, has a strong (monotone) relation with the multiclass generalization of a classical metric, the Matthews Correlation Coefficient. Analytical results are provided for the limit cases of general no-information (n-face dice rolling) of the binary classification. Computational evidence supports the claim in the general case.
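
For readers unfamiliar with the multiclass Matthews Correlation Coefficient mentioned above, the following Python sketch computes it from a confusion matrix using the standard multiclass generalization. The function name `multiclass_mcc` and the convention of returning 0 when the denominator vanishes are assumptions of this sketch; the Confusion Entropy side of the comparison is not shown.

```python
import math


def multiclass_mcc(confusion):
    """Multiclass MCC from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    predicted as class j.
    """
    k = len(confusion)
    if any(len(row) != k for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # column sums
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    denom_pred = total ** 2 - sum(p * p for p in pred_counts)
    denom_true = total ** 2 - sum(t * t for t in true_counts)
    if denom_pred == 0 or denom_true == 0:
        return 0.0  # assumed convention for degenerate matrices
    return numerator / math.sqrt(denom_pred * denom_true)
```

For two classes this reduces to the familiar binary MCC, which is a quick way to sanity-check the implementation.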

PLOS ONE | 2012

Clinical value of prognosis gene expression signatures in colorectal cancer: a systematic review

Rebeca Sanz-Pamplona; Antoni Berenguer; David Cordero; Samantha Riccadonna; Xavier Solé; Marta Crous-Bou; Elisabet Guinó; Xavier Sanjuan; Sebastiano Biondo; Antonio Soriano; Giuseppe Jurman; Gabriel Capellá; Cesare Furlanello; Victor Moreno

Introduction: The traditional staging system is inadequate to identify those patients with stage II colorectal cancer (CRC) at high risk of recurrence or with stage III CRC at low risk. A number of gene expression signatures to predict CRC prognosis have been proposed, but none is routinely used in the clinic. The aim of this work was to assess the prediction ability and potential clinical usefulness of these signatures in a series of independent datasets. Methods: A literature review identified 31 gene expression signatures that used gene expression data to predict prognosis in CRC tissue. The search was based on the PubMed database and was restricted to papers published from January 2004 to December 2011. Eleven CRC gene expression datasets with outcome information were identified and downloaded from public repositories. A Random Forest classifier was used to build predictors from the gene lists. The Matthews correlation coefficient was chosen as a measure of classification accuracy and its associated p-value was used to assess association with prognosis. For clinical usefulness evaluation, positive and negative post-test probabilities were computed in stage II and III samples. Results: Five gene signatures showed significant association with prognosis and provided reasonable prediction accuracy in their own training datasets. Nevertheless, all signatures showed low reproducibility in independent data. Stratified analyses by stage or microsatellite instability status showed significant association but limited discrimination ability, especially in stage II tumors. From a clinical perspective, the most predictive signatures showed a minor but significant improvement over the classical staging system. Conclusions: The published signatures show low prediction accuracy but moderate clinical usefulness. Although gene expression data may inform prognosis, better strategies for signature validation are needed to encourage their widespread use in the clinic.

Bioinformatics | 2008

Algebraic stability indicators for ranked lists in molecular profiling

Giuseppe Jurman; Stefano Merler; Annalisa Barla; Silvano Paoli; Antonio Galea; Cesare Furlanello

Motivation: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the difference in lists may be bewildering, also due to the presence of modules of functionally related genes. Methods for assessing stability help to understand the dependency of the markers on the data or on the predictor type, and help in selecting solutions. Results: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and an aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The addressed issues are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques and enrichment analysis. Availability: Supplementary Material and software are available at the address http://biodcv.fbk.eu/listspy.html
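
Two of the ingredients named in the abstract above, the Canberra distance between ranked lists and Borda count aggregation, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the software distributed by the authors: it handles only complete lists over the same items, uses 1-based rank positions, and applies a textbook Borda point scheme with alphabetical tie-breaking. The function names are hypothetical.

```python
def rank_positions(ranked_list):
    """Map each item to its 1-based position in the ranked list."""
    return {item: pos for pos, item in enumerate(ranked_list, start=1)}


def canberra_distance(list_a, list_b):
    """Canberra distance between the rank positions of two complete lists.

    Each item contributes |r_a - r_b| / (r_a + r_b), so disagreements
    near the top of the lists weigh more than those near the bottom.
    """
    pos_a, pos_b = rank_positions(list_a), rank_positions(list_b)
    if set(pos_a) != set(pos_b):
        raise ValueError("both lists must rank the same items")
    return sum(abs(pos_a[g] - pos_b[g]) / (pos_a[g] + pos_b[g]) for g in pos_a)


def borda_aggregate(ranked_lists):
    """Aggregate complete ranked lists with a basic Borda count.

    An item in position p of a list of length n earns n - p points;
    items are returned by decreasing total, ties broken alphabetically.
    """
    scores = {}
    for ranked in ranked_lists:
        n = len(ranked)
        for pos, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=lambda g: (-scores[g], g))
```

A simple stability indicator in this spirit would be the mean pairwise `canberra_distance` over all lists produced by the resampling runs; lower values indicate more stable rankings.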

International Symposium on Neural Networks | 2003

An accelerated procedure for recursive feature ranking on microarray data

Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman

We describe a new wrapper algorithm for fast feature ranking in classification problems. The Entropy-based Recursive Feature Elimination (E-RFE) method eliminates chunks of uninteresting features according to the entropy of the weights distribution of a SVM classifier. With specific regard to DNA microarray datasets, the method is designed to support computationally intensive model selection in classification problems in which the number of features is much larger than the number of samples. We test E-RFE on synthetic and real data sets, comparing it with other SVM-based methods. The speed-up obtained with E-RFE supports predictive modeling on high dimensional microarray data.

Bioinformatics | 2013

minerva and minepy

Davide Albanese; Michele Filosi; Roberto Visintainer; Samantha Riccadonna; Giuseppe Jurman; Cesare Furlanello

We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n = 1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. Availability and implementation: Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).

International Journal of Cancer | 2006

Gene expression profiling identifies potential relevant genes in alveolar rhabdomyosarcoma pathogenesis and discriminates PAX3-FKHR positive and negative tumors

Cristiano De Pittà; Lucia Tombolan; Giada Albiero; F. Sartori; Chiara Romualdi; Giuseppe Jurman; Modesto Carli; Cesare Furlanello; Gerolamo Lanfranchi; Angelo Rosolen

We analyzed the expression signatures of 14 tumor biopsies from children affected by alveolar rhabdomyosarcoma (ARMS) to identify genes correlating to biological features of this tumor. Seven of these patients were positive for the PAX3‐FKHR fusion gene and 7 were negative. We used a cDNA platform containing a large majority of probes derived from muscle tissues. The comparison of transcription profiles of tumor samples with fetal skeletal muscle identified 171 differentially expressed genes common to all ARMS patients. The functional classification analysis of altered genes led to the identification of a group of transcripts (LGALS1, BIN1) that may be relevant for the tumorigenic processes. The muscle‐specific microarray platform was able to distinguish PAX3‐FKHR positive and negative ARMS through the expression pattern of a limited number of genes (RAC1, CFL1, CCND1, IGFBP2) that might be biologically relevant for the different clinical behavior and aggressiveness of the 2 ARMS subtypes. Expression levels for selected candidate genes were validated by quantitative real‐time reverse‐transcription PCR.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2005

Semisupervised Learning for Molecular Profiling

Cesare Furlanello; Maria Serafini; Stefano Merler; Giuseppe Jurman

Class prediction and feature selection are two learning tasks that are strictly paired in the search of molecular profiles from microarray data. Researchers have become aware how easy it is to incur a selection bias effect, and complex validation setups are required to avoid overly optimistic estimates of the predictive accuracy of the models and incorrect gene selections. This paper describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental setups for gene profiling. In particular, we introduce the study of the patterns of single sample responses (sample-tracking profiles) to the gene selection process induced by typical supervised learning tasks in microarray studies. We originate sample-tracking profiles as the aggregated off-training evaluation of SVM models of increasing gene panel sizes. Genes are ranked by E-RFE, an entropy-based variant of the recursive feature elimination for support vector machines (RFE-SVM). A dynamic time warping (DTW) algorithm is then applied to define a metric between sample-tracking profiles. An unsupervised clustering based on the DTW metric allows automating the discovery of outliers and of subtypes of different molecular profiles. Applications are described on synthetic data and in two gene expression studies.
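
The dynamic time warping step mentioned above can be illustrated with the classic dynamic-programming formulation. The Python sketch below is a generic DTW on numeric sequences with absolute difference as the local cost; how the sample-tracking profiles are built, and any windowing or normalization used in the paper, are not reproduced here, and the function name is hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the
    first i elements of seq_a with the first j elements of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must not be empty")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A matrix of pairwise `dtw_distance` values between profiles could then be passed to any distance-based clustering routine to look for outliers or subgroups.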
