Marco Frasca | Researchain

Publication

Featured researches published by Marco Frasca.

Bioinformatics | 2014

GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

Horacio Caniza; Alfonso Romero; Samuel Heron; Haixuan Yang; Alessandra Devoto; Marco Frasca; Marco Mesiti; Giorgio Valentini; Alberto Paccanaro

Summary: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. Contact: [email protected] Availability: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.
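To make the idea of a term-based semantic similarity concrete, here is a minimal sketch of the widely used Resnik measure on a toy ontology. This listing does not enumerate GOssTo's six measures, so treating Resnik as a representative is an assumption. The function names, the toy terms and the annotation probabilities are all hypothetical, and nothing here reflects GOssTo's actual interface.

```python
import math


def ancestors(term, parents):
    """Return the term together with all of its ancestors in the ontology DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik_similarity(term_a, term_b, parents, prob):
    """Information content (-log p) of the most informative common ancestor."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    return max((-math.log(prob[t]) for t in common), default=0.0)


# Hypothetical toy ontology: each term maps to its direct parents.
toy_parents = {"root": [], "a": ["root"], "b": ["a"], "c": ["a"], "d": ["root"]}
# Hypothetical annotation probabilities (the root annotates everything).
toy_prob = {"root": 1.0, "a": 0.5, "b": 0.1, "c": 0.2, "d": 0.3}

sim_bc = resnik_similarity("b", "c", toy_parents, toy_prob)
sim_bd = resnik_similarity("b", "d", toy_parents, toy_prob)
```

Terms "b" and "c" share the informative ancestor "a", so they score higher than "b" and "d", whose only common ancestor is the root.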

Neural Networks | 2013

A neural network algorithm for semi-supervised node label learning from unbalanced data

Marco Frasca; Alberto Bertoni; Matteo Re; Giorgio Valentini

Given a weighted graph and a partial node labeling, the graph classification problem consists of predicting the labels of all the nodes. In several application domains, from gene to social network analysis, the labeling is unbalanced: for instance, positive labels may be far fewer than negative ones. In this paper we present COSNet (COst Sensitive neural Network), a neural algorithm for predicting node labels in graphs with unbalanced labels. COSNet is based on a 2-parameter family of Hopfield networks, and consists of two main steps: (1) the network parameters are learned through a cost-sensitive optimization procedure; (2) a suitable Hopfield network restricted to the unlabeled nodes is considered and simulated. The reached equilibrium point induces the classification of the unlabeled nodes. The restriction of the dynamics leads to a significant reduction in time complexity and allows the algorithm to scale well to large networks. An experimental analysis on real-world unbalanced data, in the context of the genome-wide prediction of gene functions, shows the effectiveness of the proposed approach.
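Step (2) of the abstract can be illustrated with a rough sketch of Hopfield dynamics restricted to the unlabeled nodes. It is not the COSNet implementation. The two parameters are passed in by hand, so the cost-sensitive learning of step (1) is omitted. The activation values sin(alpha) and -cos(alpha), the threshold gamma, the zero initial state and all names are assumptions made for this example.

```python
import math


def run_restricted_hopfield(weights, labels, alpha, gamma, max_sweeps=100):
    """Asynchronous Hopfield updates on unlabeled nodes; labeled nodes stay clamped.

    weights: dict node -> dict neighbor -> symmetric edge weight
    labels:  dict node -> +1, -1, or None (unlabeled)
    alpha, gamma: the two network parameters (assumed already chosen)
    """
    pos, neg = math.sin(alpha), -math.cos(alpha)  # assumed activation values
    state = {}
    for node, label in labels.items():
        if label is None:
            state[node] = 0.0  # assumed neutral starting value
        else:
            state[node] = pos if label > 0 else neg
    unlabeled = [node for node, label in labels.items() if label is None]

    for _ in range(max_sweeps):
        changed = False
        for node in unlabeled:
            field = sum(w * state[nb] for nb, w in weights[node].items()) - gamma
            new_value = pos if field > 0 else neg
            if new_value != state[node]:
                state[node] = new_value
                changed = True
        if not changed:  # equilibrium: no neuron changed during a full sweep
            break
    return {node: (1 if state[node] == pos else -1) for node in unlabeled}


# Hypothetical toy graph: "c" sits close to the positive node, "d" to the negative one.
toy_weights = {
    "a": {"c": 1.0},
    "b": {"c": 0.1, "d": 1.0},
    "c": {"a": 1.0, "b": 0.1},
    "d": {"b": 1.0},
}
toy_labels = {"a": 1, "b": -1, "c": None, "d": None}
predicted = run_restricted_hopfield(toy_weights, toy_labels, alpha=math.pi / 4, gamma=0.0)
```

Only the unlabeled neurons are ever updated, so each sweep costs time proportional to the edges touching unlabeled nodes. That is the intuition behind the reduced time complexity the abstract mentions.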

Bioinformatics | 2016

RANKS: a flexible tool for node label ranking and classification in biological networks

Giorgio Valentini; Giuliano Armano; Marco Frasca; Jianyi Lin; Marco Mesiti; Matteo Re

RANKS is a flexible software package that can be easily applied to any bioinformatics task formalizable as ranking of nodes with respect to a property given as a label, such as automated protein function prediction, gene disease prioritization and drug repositioning. To this end RANKS provides an efficient and easy-to-use implementation of kernelized score functions, a semi-supervised algorithmic scheme embedding both local and global learning strategies for the analysis of biomolecular networks. To facilitate comparative assessment, baseline network-based methods, e.g. label propagation and random walk algorithms, have also been implemented. Availability and implementation: The package is available from CRAN: https://cran.r-project.org/. The package is written in R, except for the most computationally intensive functionalities, which are implemented in C. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
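As an illustration of ranking nodes with a kernel-based score, the sketch below gives each node the average kernel value between it and the positively labeled nodes, then sorts by that score. It is written in Python rather than R and does not use or reproduce the RANKS API. The function names and the toy kernel matrix are hypothetical, and the average score is assumed here as one simple instance of a score function.

```python
def average_kernel_scores(kernel, positives):
    """Score each node by its mean kernel similarity to the positive nodes.

    kernel:    dict node -> dict node -> kernel value K(u, v)
    positives: non-empty collection of nodes labeled with the property
    """
    positives = list(positives)
    if not positives:
        raise ValueError("at least one positive node is required")
    return {
        node: sum(row[p] for p in positives) / len(positives)
        for node, row in kernel.items()
    }


def rank_nodes(scores):
    """Return nodes sorted by decreasing score, ties broken by node name."""
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical 3-gene kernel matrix (symmetric, ones on the diagonal).
toy_kernel = {
    "g1": {"g1": 1.0, "g2": 0.8, "g3": 0.1},
    "g2": {"g1": 0.8, "g2": 1.0, "g3": 0.2},
    "g3": {"g1": 0.1, "g2": 0.2, "g3": 1.0},
}
toy_scores = average_kernel_scores(toy_kernel, positives={"g1"})
toy_ranking = rank_nodes(toy_scores)
```

In a real evaluation the positives used for scoring would be held out from the nodes being ranked, for example through cross-validation. The toy example skips this for brevity.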

Neurocomputing | 2015

Automated gene function prediction through gene multifunctionality in biological networks

Marco Frasca

As the number of sequenced genomes rapidly grows, Automated Prediction of gene Function (AFP) is now a challenging problem. Despite significant progress in the last several years, the accuracy of gene function prediction still needs to be improved in order to be used effectively in practice. Two of the main issues of the AFP problem are the imbalance of gene functional annotations and the ‘multifunctional properties’ of genes. While the former is a well-studied problem in machine learning, the latter has recently emerged in bioinformatics and few studies have been carried out about it. Here we propose a method for AFP which appropriately handles the label imbalance characterizing biological taxonomies, and embeds in the model the property of some genes of being ‘multifunctional’. We tested the method in predicting the functions of the Gene Ontology functional hierarchy for genes of yeast and fly model organisms, in a genome-wide approach. The achieved results show that cost-sensitive strategies and ‘gene multifunctionality’ can be combined to achieve significantly better results than the compared state-of-the-art algorithms for AFP.

Journal of Computational Biology | 2015

UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions.

Marco Frasca; Alberto Bertoni; Giorgio Valentini

The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins represent two of the main issues of the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not jointly consider these issues, with a negative impact on both sensitivity and precision, due to the unbalance between annotated and unannotated proteins that characterizes the majority of functional classes and to the specific and complementary information content embedded in each available source of data. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.

Multiple Classifier Systems | 2015

A Hierarchical Ensemble Method for DAG-Structured Taxonomies

Peter N. Robinson; Marco Frasca; Sebastian Köhler; Marco Notaro; Matteo Re; Giorgio Valentini

Structured taxonomies characterize several real-world problems, ranging from text categorization to video annotation and protein function prediction. In this context “flat” learning methods may introduce inconsistent predictions, while structured output-aware learning methods can improve the accuracy of the predictions by exploiting the hierarchical relationships between classes. We propose a novel hierarchical ensemble method able to provide theoretically guaranteed consistent predictions for any Directed Acyclic Graph (DAG)-structured taxonomy, and consequently also for any taxonomy structured according to a tree. Results with a complex real-world DAG-structured taxonomy involving about one thousand classes and twenty thousand examples show that the proposed hierarchical ensemble approach significantly improves on flat methods, especially in terms of precision/recall curves.
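What "consistent predictions" means on a DAG can be shown with a generic top-down correction: visit classes so that parents come before children and cap each class score by the lowest score among its parents. This is only an illustration of the consistency constraint, not the ensemble method proposed in the paper. The function name and the toy taxonomy are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def enforce_dag_consistency(scores, parents):
    """Cap every class score by the smallest corrected score among its parents.

    scores:  dict class -> flat classifier score in [0, 1]
    parents: dict class -> iterable of direct parent classes (must be acyclic)
    """
    corrected = {}
    # static_order() yields each class only after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        bound = min((corrected[p] for p in parents.get(node, ())), default=1.0)
        corrected[node] = min(scores[node], bound)
    return corrected


# Hypothetical taxonomy: class "c" has two parents, "a" and "b".
toy_parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
# Flat scores are inconsistent: "c" scores higher than its parent "a".
toy_flat = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}
toy_consistent = enforce_dag_consistency(toy_flat, toy_parents)
```

After the correction no class scores higher than any of its ancestors, so thresholding the scores at any value yields a label set that is closed under the parent relation.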

International Symposium on Neural Networks | 2013

A neural network based algorithm for gene expression prediction from chromatin structure

Marco Frasca; Giulio Pavesi

Gene expression is a very complex process, which is finely regulated and modulated at different levels. The first step of gene expression, the transcription of DNA into mRNA, is in turn regulated both at the genetic and epigenetic level. In particular, the latter, which involves the structure formed by DNA wrapped around histones (chromatin), has been recently shown to be a key factor, with post-translational modifications of histones acting combinatorially to activate or block transcription. In this work we addressed the problem of predicting the level of expression of genes starting from genome-wide maps of chromatin structure, that is, of the localization of several different histone modifications, which have been recently made available through the introduction of technologies like ChIP-Seq. We formalized the problem as a multi-class bipartite ranking problem, in which for each class a gene can be under- or over-expressed with respect to a given reference expression value. In order to deal with this problem, we exploited and extended a semi-supervised method (COSNet) based on a family of Hopfield neural networks. Benchmark genome-wide tests performed on six different human cell lines yielded satisfactory results, with clear improvements over the alternative approach most commonly adopted in the literature.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

Multitask Protein Function Prediction Through Task Dissimilarity

Marco Frasca; Nicolò Cesa-Bianchi

Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein function) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, the experimental comparison carried out on three model organisms shows that our method has a more stable performance in both “protein-centric” and “function-centric” evaluation settings.

Italian Workshop on Neural Nets | 2016

Selection of Negative Examples for Node Label Prediction Through Fuzzy Clustering Techniques

Marco Frasca; Dario Malchiodi

Negative examples, which are required for most machine learning methods to infer new predictions, are rarely directly recorded in several real world databases for classification problems. A variety of heuristics for the choice of negative examples have been proposed, ranging from simply under-sampling non positive instances, to the analysis of class taxonomy structures. Here we propose an efficient strategy for selecting negative examples designed for Hopfield networks which exploits the clustering properties of positive instances. The method has been validated on the prediction of protein functions of a model organism.

Neural Computing and Applications | 2016

Learning node labels with multi-category Hopfield networks

Marco Frasca; Simone Bassis; Giorgio Valentini

In several real-world node label prediction problems on graphs, in fields ranging from computational biology to World Wide Web analysis, nodes can be partitioned into categories different from the classes to be predicted, on the basis of their characteristics or their common properties. Such partitions may provide further information about node classification that classical machine learning algorithms do not take into account. We introduce a novel family of parametric Hopfield networks (m-category Hopfield networks) and a novel algorithm (Hopfield multi-category, HoMCat), designed to appropriately exploit the presence of property-based partitions of nodes into multiple categories. Moreover, the proposed model adopts a cost-sensitive learning strategy to prevent the marked decay in performance usually observed when instance labels are unbalanced, that is, when one class of labels is far less represented than the other. We validate the proposed model on both synthetic and real-world data, in the context of multi-species function prediction, where the classes to be predicted are the Gene Ontology terms and the categories are the different species in the multi-species protein network. We carried out an intensive experimental validation, which on the one hand compares HoMCat with several state-of-the-art graph-based algorithms, and on the other hand reveals that exploiting meaningful prior partitions of input data can substantially improve classification performance.
