Francesca Ruffino | Researchain



Publication

Featured research published by Francesca Ruffino.

Neurocomputing | 2004

Cancer recognition with bagged ensembles of Support Vector Machines

Giorgio Valentini; Marco Muselli; Francesca Ruffino

Expression-based classification of tumors requires stable, reliable, variance-reducing methods, as DNA microarray data are characterized by small sample size, high dimensionality, noise and large biological variability. To address the variance and curse-of-dimensionality problems arising from this difficult task, we propose to apply bagged ensembles of support vector machines (SVMs) and feature selection algorithms to the recognition of malignant tissues. The results presented show that bagged ensembles of SVMs are more reliable than single SVMs and achieve equal or better classification accuracy, whereas feature selection methods can further enhance classification accuracy.

international symposium on neural networks | 2003

Bagged ensembles of Support Vector Machines for gene expression data analysis

Giorgio Valentini; Marco Muselli; Francesca Ruffino

Extracting information from gene expression data is a difficult task, as these data are characterized by very high dimensionality, small sample sizes and a large degree of biological variability. However, feature selection algorithms offer a possible way of dealing with the curse of dimensionality, while variance problems arising from small samples and biological variability can be addressed through ensemble methods based on resampling techniques. These two approaches have been combined to improve the accuracy of Support Vector Machines (SVMs) in the classification of malignant tissues from DNA microarray data. To assess the accuracy and the confidence of the predictions, proper measures have been introduced. The results presented show that bagged ensembles of SVMs are more reliable than single SVMs and achieve equal or better classification accuracy, whereas feature selection methods can further enhance classification accuracy.
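The resampling idea behind bagging can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the SVM base learner used in the paper, and the winning vote fraction is a hypothetical confidence measure, not the one the authors introduce. All function names are made up for this sketch.

```python
import random
from collections import Counter


def centroid_fit(X, y):
    """Stand-in base learner (not an SVM): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def centroid_predict(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def bagged_fit(X, y, n_models=25, seed=0):
    """Train each base learner on a bootstrap replicate (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Majority vote; the winning vote fraction is a crude, assumed confidence."""
    votes = Counter(centroid_predict(m, x) for m in models)
    label, count = votes.most_common(1)[0]
    return label, count / len(models)
```

Calling `bagged_fit(X, y)` on a list of expression vectors and their labels, then `bagged_predict(models, x)` on a new sample, returns the voted label together with the fraction of base learners that agreed with it.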

Artificial Intelligence in Medicine | 2009

Evaluating switching neural networks through artificial and real gene expression data

Marco Muselli; Massimiliano Costacurta; Francesca Ruffino

OBJECTIVE: DNA microarrays make it possible to analyze the expression levels of thousands of genes in a specific tissue. An important goal of this analysis is to derive the subset of genes involved in a biological process of interest. Here, a new method for gene selection is proposed that offers a good level of accuracy and reliability. METHODS AND MATERIALS: The proposed technique adopts switching neural networks (SNN), a particular kind of connectionist model, to assign a relevance value to each gene, and then employs recursive feature addition (RFA) to derive the final list of relevant genes. To evaluate the new approach, called SNN-RFA, fairly, it has been applied to three real gene expression datasets and three artificial ones, the latter generated according to a mathematical model that possesses biological and statistical plausibility. In particular, it has been compared with two other widely used gene selection methods, namely the signal-to-noise ratio (S2N) and support vector machines with recursive feature elimination (SVM-RFE). RESULTS: In all the considered cases SNN-RFA achieves the best performance, and in one of the three artificial datasets it identifies the whole collection of relevant genes. The S2N method exhibits a quality similar to that of SNN-RFA, whereas SVM-RFE shows the worst behavior. CONCLUSION: The quality of the proposed SNN-RFA method has been established, together with the usefulness of the mathematical model adopted to generate the artificial datasets of gene expression levels.
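For readers unfamiliar with the S2N baseline mentioned above, the following sketch ranks genes by a signal-to-noise score. The abstract does not state the formula, so the commonly used form (difference of class means divided by the sum of class standard deviations) is assumed here, with the sample standard deviation; the function names and the dictionary layout are hypothetical.

```python
from statistics import mean, stdev


def s2n_score(values_a, values_b):
    """Signal-to-noise ratio of one gene: (mean_a - mean_b) / (sd_a + sd_b).

    Assumes each class has at least two values and that the two standard
    deviations are not both zero.
    """
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes_by_s2n(expr_a, expr_b):
    """Rank gene names by |S2N|, most discriminative first.

    expr_a and expr_b map a gene name to its expression values in each class.
    """
    scores = {g: s2n_score(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

A gene whose class means are far apart relative to its within-class spread gets a large absolute score and appears early in the ranking.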

International Journal of Approximate Reasoning | 2008

Gene expression modeling through positive boolean functions

Francesca Ruffino; Marco Muselli; Giorgio Valentini

In gene expression data analysis, the selection of biologically relevant sets of genes and the discovery of new subclasses of diseases at the bio-molecular level are two significant problems. Unfortunately, in both cases the correct solution is usually unknown, so evaluating the performance of gene selection and clustering methods is difficult and in many cases infeasible. A natural approach to this issue consists in developing an artificial model for the generation of biologically plausible gene expression data, so that the set of relevant genes and the functional classes involved in the problem are known in advance. In this work we propose a mathematical model, based on positive Boolean functions, for the generation of synthetic gene expression data. Despite its simplicity, this model is sufficiently rich to take account of the specific peculiarities of gene expression, including biological variability, viewed as a sort of random source. As an example application, we also provide some data simulations and numerical experiments for the analysis of the performance of gene selection methods.
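To make the idea of generating data from a positive Boolean function concrete, here is a toy sketch. It is not the authors' model: the DNF terms, the activation probability, the expression levels and the Gaussian noise are all assumptions chosen for illustration. What it does show is the defining property of a positive (monotone) function, namely an OR of ANDs with no negated variables, and the fact that the relevant genes are known by construction.

```python
import random


def positive_dnf(active, terms):
    """Positive (monotone) Boolean function in DNF: an OR of ANDs, no negations.

    active: set of genes that are currently switched on.
    terms: list of gene sets; the function is true if any whole set is active.
    """
    return any(term <= active for term in terms)


def generate_sample(genes, terms, rng, p_active=0.5, noise_sd=0.3):
    """Draw one synthetic sample: noisy expression values and a class label."""
    active = {g for g in genes if rng.random() < p_active}
    # Active genes are centred on 1.0 and inactive ones on 0.0; Gaussian noise
    # stands in for biological variability (an assumption of this sketch).
    expression = {
        g: (1.0 if g in active else 0.0) + rng.gauss(0.0, noise_sd) for g in genes
    }
    return expression, positive_dnf(active, terms)


genes = [f"g{i}" for i in range(10)]
relevant_terms = [{"g0", "g1"}, {"g2"}]  # hypothetical planted signature
rng = random.Random(42)
dataset = [generate_sample(genes, relevant_terms, rng) for _ in range(20)]
```

Because the function contains no negations, switching on an additional gene can never turn a positive label into a negative one.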

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

A Mathematical Model for the Validation of Gene Selection Methods

Marco Muselli; Alberto Bertoni; Marco Frasca; Alessandro Beghini; Francesca Ruffino; Giorgio Valentini

Gene selection methods aim at determining biologically relevant subsets of genes in DNA microarray experiments. However, their assessment and validation represent a major difficulty since the subset of biologically relevant genes is usually unknown. To solve this problem a novel procedure for generating biologically plausible synthetic gene expression data is proposed. It is based on a proper mathematical model representing gene expression signatures and expression profiles through Boolean threshold functions. The results show that the proposed procedure can be successfully adopted to analyze the quality of statistical and machine learning-based gene selection algorithms.
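When the relevant genes are planted in synthetic data, a gene selection method can be scored directly against that ground truth. The sketch below uses precision and recall for this purpose; this choice of measures is an assumption made here for illustration and may differ from the quality indices used in the paper.

```python
def selection_quality(selected, relevant):
    """Precision and recall of a selected gene list against the planted relevant set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, selecting four genes of which two belong to a planted set of three gives a precision of 0.5 and a recall of 2/3.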

international workshop on fuzzy logic and applications | 2007

Evaluating Switching Neural Networks for Gene Selection

Francesca Ruffino; Massimiliano Costacurta; Marco Muselli

A new gene selection method is proposed for analyzing microarray experiments involving two classes of tissues and for determining the relevant genes that characterize the differences between the two classes. The new technique is based on Switching Neural Networks (SNN), learning machines that assign a relevance value to each input variable, and adopts Recursive Feature Addition (RFA) to perform gene selection. The performance of SNN-RFA is evaluated by applying it to two real and two artificial gene expression datasets, the latter generated according to a mathematical model that possesses biological and statistical plausibility. Comparisons with two other widely used gene selection methods are also shown.

international conference on knowledge based and intelligent information and engineering systems | 2008

An Algorithm to Assess the Reliability of Hierarchical Clusters in Gene Expression Data

Roberto Avogadri; Matteo Brioschi; Francesca Ruffino; Fulvia Ferrazzi; Alessandro Beghini; Giorgio Valentini

The validation of clusters discovered in bio-molecular data is a central issue in bioinformatics. Recently, stability-based methods have been successfully applied to the analysis of the reliability of clusterings characterized by a relatively low number of examples and clusters. Nevertheless, several problems in functional genomics are characterized by a very large number of examples and clusters. We present a stability-based algorithm to discover significant clusters in hierarchical clusterings with a large number of examples and clusters. Preliminary results on gene expression data of patients affected by Human Myeloid Leukemia show how to apply the proposed method when thousands of gene clusters are involved.
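One simple way to express the stability idea is to ask how well a cluster is recovered when the data are perturbed and re-clustered. The sketch below is an assumption-laden illustration, not the algorithm of the paper: it takes the perturbed clusterings as given and scores a cluster by its average best-match Jaccard similarity. The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(cluster, perturbed_clusterings):
    """Average best-match Jaccard similarity of `cluster` across perturbed clusterings.

    perturbed_clusterings: list of clusterings, each a list of sets of item ids,
    obtained by re-clustering resampled or otherwise perturbed data.
    """
    best_matches = [
        max(jaccard(cluster, other) for other in clustering)
        for clustering in perturbed_clusterings
    ]
    return sum(best_matches) / len(best_matches)
```

A score close to 1 means that the cluster reappears almost unchanged in every perturbed clustering, while a low score suggests that it is an artifact of the particular sample.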

international conference on knowledge-based and intelligent information and engineering systems | 2007

Reliable learning: a theoretical framework

Marco Muselli; Francesca Ruffino

A theoretical framework, called reliable learning, is introduced for analyzing the consistency of learning techniques that incorporate prior knowledge in the solution of pattern recognition problems. It is obtained by extending standard concepts of Statistical Learning Theory. In particular, two different situations are considered: in the first, a reliable region is determined where the correct classification is known; in the second, the prior knowledge concerns the correct classification of some points in the training set. In both situations, sufficient conditions for ensuring the consistency of the Empirical Risk Minimization (ERM) criterion are established and an explicit bound for the generalization error is derived.

italian workshop on neural nets | 2005

Consistency of Empirical Risk Minimization for Unbounded Loss Functions

Marco Muselli; Francesca Ruffino

The theoretical framework of Statistical Learning Theory (SLT) for pattern recognition problems is extended to cover situations where an infinite value of the loss function is employed to prevent misclassifications in specific regions with high reliability.
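The effect of an infinite loss value on Empirical Risk Minimization can be illustrated with a small sketch. It assumes a finite set of candidate classifiers and samples that carry a flag saying whether they fall in the reliable region; both are simplifications introduced here, and none of the names come from the paper.

```python
import math


def loss(prediction, label, in_reliable_region):
    """0-1 loss, except that an error inside the reliable region costs infinity."""
    if prediction == label:
        return 0.0
    return math.inf if in_reliable_region else 1.0


def empirical_risk(classifier, samples):
    """Average loss over samples given as (x, label, in_reliable_region) triples."""
    return sum(loss(classifier(x), y, r) for x, y, r in samples) / len(samples)


def erm(classifiers, samples):
    """Empirical Risk Minimization over a finite set of candidate classifiers."""
    return min(classifiers, key=lambda c: empirical_risk(c, samples))
```

Any candidate that misclassifies even one point in the reliable region has infinite empirical risk, so ERM only ever selects it if every other candidate does the same.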

international workshop on fuzzy logic and applications | 2005

Biological specifications for a synthetic gene expression data generation model

Francesca Ruffino; Marco Muselli; Giorgio Valentini

An open problem in gene expression data analysis is the evaluation of the performance of gene selection methods applied to discover biologically relevant sets of genes. The problem is difficult because the entire set of genes involved in specific biological processes is usually unknown or only partially known, which makes a fair comparison between different gene selection methods infeasible. The natural solution to this problem consists in developing an artificial model to generate gene expression data, so that the set of biologically relevant genes is known in advance. The models proposed in the literature, even if useful for a preliminary evaluation of gene selection methods, do not explicitly consider the biological characteristics of gene expression data. The main aim of this work is to identify the main biological characteristics that need to be considered when designing a model for validating gene selection methods based on the analysis of DNA microarray data.
