Massimo La Rosa | Researchain

Publication

Featured research published by Massimo La Rosa.

BMC Bioinformatics | 2013

Alignment-free analysis of barcode sequences by means of compression-based methods

Massimo La Rosa; Antonino Fiannaca; Riccardo Rizzo; Alfonso Urso

Background: The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. DNA barcodes are a valuable type of information that can be integrated with ecological, genetic, and morphological data to obtain a more consistent taxonomy. Recent studies have shown that, for the animal kingdom, the mitochondrial gene cytochrome c oxidase I (COI), about 650 bp long, can be used as a barcode sequence for identification and taxonomic purposes. In the present work we introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our approach is based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. Our purpose is to justify the use of USM for the analysis of short DNA barcode sequences as well, showing that USM can correctly extract taxonomic information from this kind of sequence. Results: We downloaded 30 datasets of barcode sequences belonging to different animal species from the Barcode of Life Data System (BOLD) database. We built phylogenetic trees for every dataset, using both compression-based and classic evolutionary methods, and compared them in terms of topology preservation. In the experimental tests, the similarity between evolutionary and compression-based trees was between 80% and 100% for most of the datasets (94%). We also carried out experimental tests on simulated barcode datasets composed of 100, 150, 200 and 500 sequences, with each simulation replicated 25 times. In this case, mean similarity scores between evolutionary and compression-based trees ranged from 83% to 99% for all simulated datasets. Conclusions: In the present work we introduced an alignment-free approach to the taxonomic analysis of barcode sequences, based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. In this way we demonstrated the reliability of compression-based methods even for the analysis of short barcode sequences. Compression-based methods, with their strong theoretical assumptions, may therefore represent a valid alignment-free and parameter-free approach for barcode studies.
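As a rough illustration of a compression-based distance, the sketch below computes the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The abstract does not say which two USM approximations or which compressors were used, so the choice of NCD, of Python's `zlib`, and the function names `compressed_size`, `ncd` and `distance_matrix` are all assumptions made for this example.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two sequences.

    Values near 0 suggest very similar sequences, values near 1
    suggest unrelated ones. With a real compressor the result is
    only an approximation of the ideal, non-computable distance.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs):
    """Symmetric pairwise NCD matrix; the diagonal is set to 0 by convention."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ncd(seqs[i], seqs[j])
    return d
```

A matrix built this way could then be passed to any distance-based tree-building method. General-purpose compressors add a fixed overhead, so on sequences of only a few hundred bases the values are noisier than on long inputs.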

BMC Bioinformatics | 2015

Probabilistic topic modeling for the analysis and classification of genomic sequences

Massimo La Rosa; Antonino Fiannaca; Riccardo Rizzo; Alfonso Urso

Background: Studies on genomic sequences for classification and taxonomic identification have a leading role in the biomedical field and in the analysis of biodiversity. These studies focus on the so-called barcode genes, which represent a well-defined region of the whole genome. Recently, alignment-free techniques have been gaining importance because they can overcome the drawbacks of sequence alignment techniques. In this paper a new alignment-free method for DNA sequence clustering and classification is proposed. The method is based on a k-mer representation and text mining techniques. Methods: The presented method is based on Probabilistic Topic Modeling, a statistical technique originally proposed for text documents. Probabilistic topic models can find, in a document corpus, the topics (recurrent themes) that characterize classes of documents. This technique, applied to DNA sequences in the role of documents, exploits the frequency of fixed-length k-mers and builds a generative model for a training group of sequences. This generative model, obtained through the Latent Dirichlet Allocation (LDA) algorithm, is then used to classify a large set of genomic sequences. Results and conclusions: We classified over 7000 16S DNA barcode sequences taken from the Ribosomal Database Project (RDP) repository by training probabilistic topic models. The proposed method is compared to the RDP tool and to the Support Vector Machine (SVM) classification algorithm in an extensive set of trials using both complete sequences and short sequence snippets (from 400 bp down to 25 bp). Our method reaches results very similar to those of the RDP classifier and SVM for complete sequences. The most interesting results are obtained when short sequence snippets are considered. Under these conditions the proposed method outperforms RDP and SVM on ultra-short sequences, and its performance decreases smoothly, at every taxonomic level, as the sequence length is reduced.
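The step of treating DNA sequences as documents can be sketched as follows: each sequence is cut into overlapping fixed-length k-mers ("words"), and the corpus becomes a document-term count matrix of the kind an LDA implementation typically takes as input. This is a minimal sketch under assumptions: the function names `kmer_words` and `build_corpus`, the overlapping-window scheme, and the rule of skipping windows with non-ACGT symbols are illustrative choices, not details taken from the paper.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")


def kmer_words(seq, k):
    """Overlapping k-mers of a sequence, skipping windows with ambiguous bases."""
    seq = seq.upper()
    words = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= DNA_ALPHABET:
            words.append(word)
    return words


def build_corpus(seqs, k):
    """Return (vocabulary, matrix), with matrix[d][t] = count of term t in document d."""
    counts = [Counter(kmer_words(s, k)) for s in seqs]
    vocabulary = sorted(set().union(*counts)) if counts else []
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix
```

For example, `build_corpus(["ACGT", "CGTA"], 2)` yields the vocabulary `["AC", "CG", "GT", "TA"]` and the rows `[1, 1, 1, 0]` and `[0, 1, 1, 1]`. Fitting the topic model itself is left to whichever LDA library is available.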

BMC Bioinformatics | 2015

Analysis of miRNA expression profiles in breast cancer using biclustering

Antonino Fiannaca; Massimo La Rosa; Laura La Paglia; Riccardo Rizzo; Alfonso Urso

Background: MicroRNAs (miRNAs) are key regulators of multiple cellular functions, owing to their crucial role in different physiological processes. MiRNAs are differentially expressed in specific tissues, during specific cell states, or in different diseases such as tumours. RNA sequencing (RNA-seq) is a Next Generation Sequencing (NGS) method for the analysis of differential gene expression. Machine learning algorithms can improve the interpretation of the functional significance of miRNAs in the analysis of RNA-seq data. Furthermore, we tried to identify patterns of deregulated miRNAs in human breast cancer (BC), in order to contribute to the understanding of this type of cancer at the molecular level. Results: We adopted a biclustering approach, using the Iterative Signature Algorithm (ISA), in order to evaluate miRNA deregulation in the context of miRNA abundance and tissue heterogeneity. These are important elements for identifying miRNAs that could be useful as prognostic and diagnostic markers. On a real-world breast cancer dataset, the evaluation of miRNA differential expression in tumours versus healthy tissues revealed 12 different miRNA clusters, associated with specific groups of patients. The identified miRNAs were deregulated in breast tumours compared to healthy controls. Our approach revealed an association among specific sub-classes of tumour samples that share the same immunohistochemical and/or histological features. Biclusters were validated by means of two online repositories, the MetaMirClust database and the UCSC Genome Browser, and by using another biclustering algorithm. Conclusions: The results obtained with the biclustering algorithm aim first of all to contribute to the differential expression analysis in a cohort of BC patients, and secondly to support the potential role that these non-coding RNA molecules could play in clinical practice, in terms of prognosis, tumour evolution and treatment response.

International Conference on Engineering Applications of Neural Networks | 2013

Analysis of DNA Barcode Sequences Using Neural Gas and Spectral Representation

Antonino Fiannaca; Massimo La Rosa; Riccardo Rizzo; Alfonso Urso

In this paper we present an application of the neural gas network to the classification of DNA barcode sequences. The proposed method is based on the identification of distinctive words, extracted from the spectral representation of DNA sequences. In particular, we calculated the “signatures” that characterize a DNA sequence at different taxonomic levels. In order to demonstrate the efficacy of the proposed method, we tested it on 10 real barcode datasets belonging to different animal species, provided by the online resource Barcode of Life Database (BOLD).

Computational Intelligence in Bioinformatics and Computational Biology | 2012

An ontology design methodology for Knowledge-Based systems with application to bioinformatics

Antonino Fiannaca; Massimo La Rosa; Salvatore Gaglio

Ontologies are formal knowledge representation models. Knowledge organization is a fundamental requirement for developing Knowledge-Based systems. In this paper we present the Data-Problem-Solver (DPS) approach, a new ontological paradigm that allows the knowledge designer to model and represent a Knowledge Base (KB) for expert systems. Our approach clearly distinguishes among the knowledge about a problem to solve (answering the “what to do” question), the solver method used to solve it (answering the “how to do” question) and the type of input data required (answering the “what I need” question). The main purpose of the proposed paradigm is to facilitate the generalization of the application domain and the modularity and expandability of the represented knowledge. The proposed DPS ontological approach is applied to modelling the knowledge about a bioinformatics application scenario: the extraction of protein complexes from a protein-protein interaction network.

Computational Intelligence Methods for Bioinformatics and Biostatistics | 2015

A Deep Learning Approach to DNA Sequence Classification

Riccardo Rizzo; Antonino Fiannaca; Massimo La Rosa; Alfonso Urso

Deep learning neural networks are capable of extracting significant features from raw data and of using these features for classification tasks. In this work we present a deep learning neural network for DNA sequence classification based on a spectral sequence representation. The framework is tested on a dataset of 16S genes, and its performance, in terms of accuracy and F1 score, is compared to that of the General Regression Neural Network, already tested on a similar problem, as well as naive Bayes, random forest and support vector machine classifiers. The results show that the deep learning approach outperformed all the other classifiers in the classification of small sequence fragments 500 bp long.

Computational Intelligence Methods for Bioinformatics and Biostatistics | 2013

Genomic Sequence Classification Using Probabilistic Topic Modeling

Massimo La Rosa; Antonino Fiannaca; Riccardo Rizzo; Alfonso Urso

Taxonomic classification of genomic sequences is usually based on evolutionary distance obtained by alignment. In this work we introduce a novel alignment-free classification approach based on probabilistic topic modeling. Using a k-mer (small fragments of length k) decomposition of DNA sequences and the Latent Dirichlet Allocation algorithm, we built a classifier for 16S rRNA bacterial gene sequences. We tested our method with a tenfold cross-validation procedure on a bacteria dataset of 3000 elements belonging to the most numerous bacterial phyla: Actinobacteria, Firmicutes and Proteobacteria. Experiments were carried out using complete and 400 bp long 16S sequences, in order to test the robustness of the proposed methodology. Our precision scores, across different numbers of topics, range from 100% at class level to 77% at genus level, for both full-length and 400 bp sequences, considering k-mers of length 8. These results demonstrate the effectiveness of the proposed approach.

Computational Intelligence Methods for Bioinformatics and Biostatistics | 2012

A Study of Compression–Based Methods for the Analysis of Barcode Sequences

Massimo La Rosa; Antonino Fiannaca; Riccardo Rizzo; Alfonso Urso

This paper introduces a new methodology for the analysis of barcode sequences. Barcode DNA is a very short nucleotide sequence, corresponding in the animal kingdom to the mitochondrial gene cytochrome c oxidase subunit 1, that acts as a unique element for identification and taxonomic purposes. Traditional barcode analysis uses well-established bioinformatics techniques such as sequence alignment, computation of evolutionary distances and phylogenetic trees. The proposed alignment-free approach uses two different compression-based approximations of the Universal Similarity Metric to compute dissimilarity matrices among the barcode sequences of 20 datasets belonging to different species. From these matrices phylogenetic trees are computed and compared, in terms of topology and branch length, with trees built from evolutionary distances. The results show high similarity between compression-based and evolutionary-based trees, which suggests that the former methodology is worth employing for the study of barcode sequences.

Computational Intelligence Methods for Bioinformatics and Biostatistics | 2014

The General Regression Neural Network to Classify Barcode and mini-barcode DNA

Riccardo Rizzo; Antonino Fiannaca; Massimo La Rosa; Alfonso Urso

In the identification of living species through the analysis of their DNA sequences, the mitochondrial “cytochrome c oxidase subunit 1” (COI) gene has proved to be a good DNA barcode. Nevertheless, the quality of full-length barcode sequences often cannot be guaranteed because of DNA degradation in biological samples, so that only short sequences (mini-barcodes) are available. In this paper, a prototype-based classification approach for the analysis of DNA barcodes is proposed, exploiting a spectral representation of DNA sequences and a memory-based neural network. The neural network is a modified version of the General Regression Neural Network (GRNN), used as a classification tool. Furthermore, the relationship between the characteristics of different species and their spectral distribution is investigated. Namely, a subset of the whole spectrum of a DNA sequence, composed of very high frequency DNA k-mers, is considered, providing a robust system for the classification of barcode sequences. The proposed approach is compared with standard classification algorithms, such as the Support Vector Machine (SVM), and obtains better results, especially when applied to mini-barcode sequences.
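The combination of a spectral representation with a memory-based network can be sketched as below. The spectrum is taken here to be the vector of normalized k-mer frequencies over all 4^k possible k-mers, and the classifier is a plain textbook GRNN used for classification: every stored training pattern votes for its label with a Gaussian kernel weight. This is not the authors' modified GRNN, and it omits their high-frequency k-mer selection; the names `spectrum` and `grnn_classify` and the default `sigma` are assumptions made for this example.

```python
import math
from itertools import product


def spectrum(seq, k):
    """Normalized k-mer frequency vector over all 4**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # None for windows with ambiguous bases
        if j is not None:
            vec[j] += 1.0
            total += 1
    return [v / total for v in vec] if total else vec


def grnn_classify(x, train_vectors, train_labels, sigma=0.1):
    """Kernel-weighted vote of all stored patterns; returns (label, class scores)."""
    scores = {}
    for v, label in zip(train_vectors, train_labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, v))
        weight = math.exp(-d2 / (2.0 * sigma ** 2))
        scores[label] = scores.get(label, 0.0) + weight
    total = sum(scores.values())
    if total > 0.0:
        # Normalize so the class scores sum to 1.
        scores = {label: s / total for label, s in scores.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because the network simply stores the training patterns, there is no iterative training step; the only free parameter in this sketch is the kernel width `sigma`, which would normally be tuned on held-out data.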

Expert Systems With Applications | 2014

An expert system hybrid architecture to support experiment management

Antonino Fiannaca; Massimo La Rosa; Riccardo Rizzo; Alfonso Urso; Salvatore Gaglio

Specific expert systems are used for supporting, speeding up and adding precision to in silico experimentation in many domains. In particular, many experimentalists show a growing interest in workflow management systems for building pipelines of experiments. Unfortunately, these types of systems do not integrate a systematic approach or a support component for workflow composition and reuse. For this reason, in this paper we propose a knowledge-based hybrid architecture for designing expert systems that are able to support experiment management. This architecture defines a reference cognitive space and a proper ontology that describe the state of a problem from three different perspectives at the same time: procedural, declarative and workflow-oriented. In addition, we introduce an instance of our architecture in order to demonstrate the features of the proposed work. In particular, we model a bioinformatics case study according to the proposed hybrid architecture guidelines, in order to explain how to design and integrate the required knowledge into an interactive system for composing and running scientific workflows.
