Ricardo Cerri
Federal University of São Carlos
Publication
Featured research published by Ricardo Cerri.
BMC Bioinformatics | 2016
Ricardo Cerri; Rodrigo C. Barros; André Carlos Ponce Leon Ferreira de Carvalho; Yaochu Jin
Background: Hierarchical Multi-Label Classification is a classification task in which the classes to be predicted are hierarchically organized. Each instance can be assigned to classes belonging to more than one path in the hierarchy. This scenario is typically found in protein function prediction, since each protein may perform many functions, which can be further specialized into sub-functions. We present a new hierarchical multi-label classification method based on multiple neural networks for the task of protein function prediction. A set of neural networks is trained incrementally, each responsible for predicting the classes belonging to a given level.
Results: The method proposed here extends our previous work. Here we use the output of the neural network at one level to complement the feature vectors used as input to train the neural network at the next level. We experimentally compare this novel method with several other reduction strategies, showing that it obtains the best predictive performance. Empirical results also show that the proposed method achieves better or comparable predictive performance when compared with state-of-the-art methods for hierarchical multi-label classification in the context of protein function prediction.
Conclusions: The experiments showed that using the output of one level as input to the next level contributed to better classification results. We believe the method was able to learn the relationships between the protein functions during training, and that this information was useful for classification. We also identified the functional classes for which our method performed better.
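The level-wise chaining described in the Results can be sketched as below, assuming each level's trained network is available as a plain callable that returns one score per class at that level. The factory `make_sigmoid_level` is a hypothetical stand-in for a trained network and training itself is not shown, so this is an illustrative sketch rather than the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

# A level model maps a feature vector to one score per class at that level.
LevelModel = Callable[[Sequence[float]], List[float]]


def make_sigmoid_level(weights: List[List[float]], bias: List[float]) -> LevelModel:
    """Hypothetical stand-in for a trained network: one sigmoid unit per class."""
    def model(features: Sequence[float]) -> List[float]:
        scores = []
        for row, b in zip(weights, bias):
            if len(row) != len(features):
                raise ValueError("weight row length must match feature length")
            z = sum(w * f for w, f in zip(row, features)) + b
            scores.append(1.0 / (1.0 + math.exp(-z)))
        return scores
    return model


def chained_predict(levels: List[LevelModel], x: Sequence[float]) -> List[List[float]]:
    """Run the level models top-down; every level after the first receives the
    original features concatenated with the previous level's output scores."""
    outputs: List[List[float]] = []
    features = list(x)
    for model in levels:
        scores = model(features)
        outputs.append(scores)
        # Assumption: only the immediately preceding level's output is appended.
        features = list(x) + scores
    return outputs
```

Note that each level's input width grows by the number of classes in the level above it, so the weight rows of level `k + 1` must have `len(x) + n_classes(k)` entries.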
International Symposium on Neural Networks | 2015
Ricardo Cerri; Rodrigo C. Barros; André Carlos Ponce Leon Ferreira de Carvalho
Hierarchical Multi-label Classification (HMC) is a classification task in which classes are organized in a hierarchical taxonomy and instances can be classified into more than one class simultaneously. This paper investigates the HMC problem of classifying proteins into functions organized according to the Gene Ontology hierarchical taxonomy. This is a complex task, since the Gene Ontology hierarchy is organized as a Directed Acyclic Graph with thousands of hierarchically represented classes. We propose a neural network-based method that incorporates label dependency during learning. The experimental results show that the proposed method achieves competitive results when compared with state-of-the-art methods from the literature.
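Because the Gene Ontology is a DAG, a class can have several parents, and a prediction is usually expected to respect the hierarchy: a term should not score higher than any of its ancestors. The sketch below shows a simple, commonly used post-processing step that enforces this; it is not the paper's method, which incorporates label dependency during learning. Class names are hypothetical, every parent is assumed to have its own score, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable


def enforce_dag_consistency(
    scores: Dict[str, float], parents: Dict[str, Iterable[str]]
) -> Dict[str, float]:
    """Cap each class score by the minimum corrected score of its parents,
    so no class is more probable than any of its ancestors in the DAG.
    Assumes every parent listed in `parents` also has an entry in `scores`."""
    graph = {c: set(parents.get(c, ())) for c in scores}
    fixed: Dict[str, float] = {}
    # TopologicalSorter takes node -> predecessors, so parents are yielded first.
    for node in TopologicalSorter(graph).static_order():
        cap = min((fixed[p] for p in graph.get(node, ())), default=1.0)
        fixed[node] = min(scores[node], cap)
    return fixed
```

Processing nodes in topological order means each parent's score is already corrected when its children are visited, so the cap propagates through every path of the DAG.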
Symposium on Applied Computing | 2017
Jonatas Wehrmann; Rodrigo C. Barros; Silvia N. das Dôres; Ricardo Cerri
In classification tasks, an object usually belongs to one class within a set of disjoint classes. In more complex tasks, an object can belong to more than one class, in what is conventionally termed multi-label classification. Moreover, there are cases in which the set of classes is organised hierarchically and an object must be associated with a single path in this hierarchy, defining the so-called hierarchical classification. Finally, in even more complex scenarios, the classes are organised in a hierarchical structure and an object can be associated with multiple paths of this hierarchy, defining the problem investigated in this article: hierarchical multi-label classification (HMC). We address a typical HMC problem, protein function prediction, and for it we propose an approach that chains multiple neural networks, performing both local and global optimisation in order to provide the final prediction: one or multiple paths in the hierarchy of classes. We experiment with four variations of this chaining process and compare these strategies with state-of-the-art HMC algorithms for protein function prediction, showing that our novel approach significantly outperforms these methods.
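The final output described above is one or more paths in the class hierarchy. One common way to turn per-class output scores into such paths is sketched below, assuming a tree-structured hierarchy and a single global threshold of 0.5 (both assumptions). It does not reproduce the paper's four chaining variants; it only illustrates the last step from scores to paths, and the class names are hypothetical.

```python
from typing import Dict, List, Optional


def predicted_paths(
    scores: Dict[str, float],
    parent: Dict[str, Optional[str]],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Turn per-class scores in a tree hierarchy into predicted top-down paths.
    `parent` maps each class to its parent (None for top-level classes);
    every ancestor of a scored class is assumed to have a score as well."""

    def path_to_root(c: str) -> List[str]:
        path = []
        node: Optional[str] = c
        while node is not None:
            path.append(node)
            node = parent[node]
        return path[::-1]  # top-level class first

    # Keep a class only if it and all of its ancestors pass the threshold.
    kept = {c for c in scores if all(scores[a] >= threshold for a in path_to_root(c))}
    # A predicted path ends at a kept class that has no kept child.
    internal = {parent[c] for c in kept if parent[c] is not None}
    return sorted(path_to_root(c) for c in kept if c not in internal)
```

For example, with `"1": 0.9`, `"1.1": 0.7`, `"1.2": 0.6`, `"2": 0.3` and `"2.1": 0.8`, class `"2.1"` is dropped because its parent fails the threshold, and two paths under `"1"` are returned.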
Bioinformatics | 2015
Carlos Norberto Fischer; Claudia Marcia Carareto; Renato Augusto Corrêa dos Santos; Ricardo Cerri; Eduardo Costa; Leander Schietgat; Celine Vens
Profile hidden Markov models (profile HMMs) are known to efficiently predict whether an amino acid (AA) sequence belongs to a specific protein family. Profile HMMs can also be used to search for protein domains in genome sequences. In this case, HMMs are typically learned from AA sequences and then used to search on the six-frame translation of nucleotide (NT) sequences. However, this approach demands additional processing of the original data and search results. Here, we propose an alternative and more direct method which converts an AA alignment into an NT one, after which an NT-based HMM is trained to be applied directly on a genome.
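The conversion of an AA alignment into an NT alignment can be illustrated with a codon-based back-mapping, similar in spirit to tools such as PAL2NAL. The sketch assumes that, for each aligned protein, the ungapped nucleotide coding sequence that translates exactly into it is available (no introns or frameshifts), and it does not check codons against residues; the function name is hypothetical and the subsequent NT-based HMM training is not shown.

```python
def aa_alignment_to_nt(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Replace each aligned residue with its codon from the ungapped coding
    sequence `cds`, and each alignment gap with a three-character gap."""
    pieces = []
    pos = 0  # current offset into the coding sequence
    for residue in aligned_protein:
        if residue == gap:
            pieces.append(gap * 3)
            continue
        codon = cds[pos:pos + 3]
        if len(codon) != 3:
            raise ValueError("coding sequence is shorter than the aligned protein")
        pieces.append(codon)
        pos += 3
    return "".join(pieces)
```

Applying this to every row of the protein alignment keeps the columns in register, since each AA column becomes exactly three NT columns.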
PLOS Computational Biology | 2018
Leander Schietgat; Celine Vens; Ricardo Cerri; Carlos Norberto Fischer; Eduardo De Paula Costa; Jan Ramon; Claudia Marcia Aparecida Carareto; Hendrik Blockeel
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with those of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, and it outperforms them in terms of predictive performance while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.
Neurocomputing | 2018
Alex Marino Goncalves de Almeida; Ricardo Cerri; Emerson Cabrera Paraiso; Rafael Gomes Mantovani; Sylvio Barbon Junior
Sentiment Analysis is an emerging research field traditionally applied to classify the opinions, sentiments and emotions expressed in text in terms of polarity and subjectivity. An important characteristic of automatic emotion analysis is the standpoint: an opinion can be viewed from two perspectives, that of the opinion holder (author) who expresses it and that of the reader who reads and perceives it. From the reader's standpoint, a text can be interpreted in multiple ways, depending on personal background. This multiplicity of standpoints, in which different readers can perceive the same sentence differently, makes Sentiment Analysis an interesting scenario for the multi-label classification paradigm. This paradigm can handle different target sentiments simultaneously in the same text while also taking advantage of the relations between them. We applied different approaches, such as algorithm adaptation, problem transformation and ensemble methods, in order to explore the wide range of multi-label solutions. The experiments were conducted on 10,080 news sentences from two different real datasets. Experimental results showed that the Ensemble of Classifier Chains outperformed the other algorithms, achieving an average F-measure of 64.89% using emotion strength features when considering six emotions and the neutral sentiment.
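The best-performing method above chains binary classifiers so that each label's prediction is fed to the classifiers that follow it, and the ensemble version votes across several chains. A minimal sketch of the prediction step, assuming already-trained base models given as hypothetical callables, is shown below; random label orders, training, and the paper's actual base learners are omitted, and averaging 0/1 votes against a 0.5 threshold is one common variant rather than a confirmed detail.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical trained binary model: augmented features -> P(label present).
BinaryModel = Callable[[Sequence[float]], float]
# A chain is a label order plus one model per position in that order.
Chain = Tuple[List[int], List[BinaryModel]]


def chain_predict(chain: Chain, x: Sequence[float], n_labels: int) -> List[int]:
    """Predict labels along one chain; each model sees x plus earlier predictions."""
    order, models = chain
    preds = [0] * n_labels
    features = list(x)
    for label, model in zip(order, models):
        bit = 1 if model(features) >= 0.5 else 0
        preds[label] = bit
        features.append(float(bit))  # feed this decision to later links
    return preds


def ecc_predict(chains: List[Chain], x: Sequence[float], n_labels: int,
                threshold: float = 0.5) -> List[int]:
    """Ensemble of chains: average the 0/1 votes per label, then threshold."""
    votes = [0.0] * n_labels
    for chain in chains:
        for label, bit in enumerate(chain_predict(chain, x, n_labels)):
            votes[label] += bit
    return [1 if v / len(chains) >= threshold else 0 for v in votes]
```

Using several chains with different label orders reduces the sensitivity of a single chain to the order in which labels are predicted.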
Journal of Signal Processing Systems | 2018
Saulo Martiello Mastelini; Victor Guilherme Turrisi da Costa; Everton Jose Santana; Felipe Kenji Nakano; Rodrigo Capobianco Guido; Ricardo Cerri; Sylvio Barbon
Multi-target regression (MTR) concerns predictive problems with multiple numerical targets. To solve them, machine learning techniques can model each target as a separate problem based only on the input features. Nonetheless, modelling inter-target correlation can improve predictive performance. When performing MTR tasks using the statistical dependencies of targets, several approaches skip evaluating the pairwise correlation between those targets, which may differ for each problem. Besides that, one of the main drawbacks of the current leading MTR method is its high memory cost. In this paper, we propose a novel MTR method called Multi-output Tree Chaining (MOTC) to overcome these disadvantages. Our method provides an interpretable internal tree-based structure, called Chaining Trees (CT), which represents the relationships between targets. Unlike current techniques, we compute the output dependencies one by one, based on the Random Forest importance metric. Furthermore, we propose a memory-friendly approach that reduces the number of required regression models compared with a leading method, thus reducing computational cost. We compared the proposed algorithm against three MTR methods (Single-target, ST; Multi-Target Regressor Stacking, MTRS; and Ensemble of Regressor Chains, ERC) on 18 benchmark datasets with two base regression algorithms (Random Forest and Support Vector Regression). The results show that our method is superior to the ST approach in predictive performance, while showing no significant difference from ERC and MTRS. Moreover, the interpretable tree-based structures built by MOTC provide valuable insight into the relationships among targets. Lastly, the proposed solution used significantly less memory than ERC while achieving very similar predictive performance.
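The abstract notes that many MTR approaches skip evaluating the pairwise correlation between targets. As a small illustration of what such a pairwise check looks like, the sketch below computes the Pearson correlation for every pair of target columns. It is not MOTC, which, according to the abstract, measures output dependencies with the Random Forest importance metric instead; the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("a target with zero variance has no defined correlation")
    return cov / sqrt(va * vb)


def pairwise_target_correlations(Y: List[Sequence[float]]) -> Dict[Tuple[int, int], float]:
    """Y has one row per instance and one column per target."""
    cols = list(zip(*Y))
    return {(i, j): pearson(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)}
```

A linear measure like this only captures linear dependence between targets, which is one reason a model-based importance measure may be preferred.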
International Symposium on Neural Networks | 2017
Felipe Kenji Nakano; Walter José G. S. Pinto; Gisele L. Pappa; Ricardo Cerri
Transposable Elements are DNA sequences that can move from one place to another within the genome of a cell. They are important for genetic variability and can modify the functionality of genes. The correct classification of these elements is crucial to understanding their role in the evolution of species. In this paper, we investigate Transposable Element classification as a Hierarchical Classification problem using Machine Learning. We present new hierarchical datasets suitable for use by Machine Learning methods, as well as new hierarchical top-down classification strategies using neural networks. We compared our strategies with existing ones in the literature and evaluated them using measures specific to hierarchical problems. Experiments showed that our proposal achieved results better than or competitive with those of other methods in the literature.
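A top-down strategy classifies an instance by starting at the root and letting a local classifier at each internal node choose one child, descending until a leaf is reached. The sketch below shows only this descent, assuming trained local classifiers given as hypothetical callables; the node names are placeholders rather than the paper's TE hierarchy, and stopping early when a classifier returns `None` is an optional assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical local classifier at an internal node: features -> chosen child,
# or None to stop at the current node.
NodeClassifier = Callable[[Sequence[float]], Optional[str]]


def top_down_predict(classifiers: Dict[str, NodeClassifier],
                     x: Sequence[float], root: str = "root") -> List[str]:
    """Descend from the root, letting each node's local classifier pick one child,
    until a node without a classifier (a leaf) is reached or a classifier stops."""
    path: List[str] = []
    node = root
    while node in classifiers:
        child = classifiers[node](x)
        if child is None:
            break
        path.append(child)
        node = child
    return path
```

A known weakness of this design is error propagation: a wrong choice near the root excludes the correct subtree, which no lower-level classifier can recover from.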
International Symposium on Neural Networks | 2017
Iuri Bonna M. de Abreu; Rafael Gomes Mantovani; Ricardo Cerri
Multi-label classification is a machine learning task in which instances can be classified into two or more labels simultaneously. In this task, there are correlations between instances belonging to the same or similar sets of labels. This paper proposes incorporating these instance correlations by modifying the multi-label datasets. We used the label space to create new features that represent these correlations. The original and modified datasets were used with different multi-label classification methods. Experiments showed that better results can be obtained when instance correlations are incorporated into the classification tasks. All methods were evaluated with measures specifically designed for multi-label problems.
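The abstract does not say exactly how the label space is turned into new features. One plausible construction, shown here purely as an assumption, gives each instance one extra feature per label equal to the fraction of its k nearest training neighbours (Euclidean distance) that carry that label, and appends these to the original features. The function names and the choice of k are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def neighbor_label_features(
    X_train: List[Sequence[float]],
    Y_train: List[Sequence[int]],
    x: Sequence[float],
    k: int = 3,
    exclude_index: Optional[int] = None,
) -> List[float]:
    """Fraction of x's k nearest training neighbours carrying each label.
    Pass exclude_index when x is itself X_train[exclude_index], so an instance
    does not see its own labels."""
    ranked = sorted(
        (math.dist(x, xi), i) for i, xi in enumerate(X_train) if i != exclude_index
    )
    neighbours = [i for _, i in ranked[:k]]
    n_labels = len(Y_train[0])
    return [sum(Y_train[i][l] for i in neighbours) / len(neighbours)
            for l in range(n_labels)]


def augment(X_train, Y_train, x, k=3, exclude_index=None) -> List[float]:
    """Original features followed by the label-space features."""
    return list(x) + neighbor_label_features(X_train, Y_train, x, k, exclude_index)
```

Excluding the instance itself when building training features avoids leaking its own labels into its inputs, which would inflate training performance.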
International Symposium on Neural Networks | 2017
Gustavo G. Colombini; Iuri Bonna M. de Abreu; Ricardo Cerri
In Machine Learning, multi-label classification is the task of assigning an instance to two or more categories simultaneously. This is a very challenging task, since datasets can have many instances and can be highly unbalanced. While most methods in the literature use supervised learning to solve multi-label problems, in this paper we propose the use of unsupervised learning through neural networks. More specifically, we explore the power of Self-Organizing Maps (Kohonen Maps), since they are able to self-organize and map input instances onto a map of neurons. Because instances assigned to similar groups of labels tend to be more similar to each other, after organization the network tends to map similar training instances to nearby neurons in the map. Test instances can then be mapped to specific neurons in the network and classified with the labels assigned to the training instances mapped to those neurons. Our proposal was experimentally compared with other methods from the literature, showing competitive performance. The evaluation was performed using freely available datasets and measures specifically designed for multi-label problems.
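The map-then-label procedure described above can be sketched with a tiny pure-Python Kohonen map. The training schedule (linearly decaying learning rate, Gaussian neighbourhood on the grid), the rule of assigning labels held by at least half of the training instances that share the test instance's best-matching neuron, and all function names are assumptions for illustration; the paper's exact labelling rule may differ.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Neuron = Tuple[int, int]


def best_matching_unit(weights: Dict[Neuron, List[float]], x: Sequence[float]) -> Neuron:
    """Neuron whose weight vector is closest (Euclidean) to x."""
    return min(weights, key=lambda n: math.dist(weights[n], x))


def train_som(X: List[Sequence[float]], rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> Dict[Neuron, List[float]]:
    """Online SOM training with a linearly decaying learning rate and neighbourhood."""
    rng = random.Random(seed)
    dim = len(X[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for x in rng.sample(X, len(X)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighbourhood measured on the neuron grid.
                d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def som_predict(weights: Dict[Neuron, List[float]], X_train: List[Sequence[float]],
                Y_train: List[Sequence[int]], x: Sequence[float],
                threshold: float = 0.5) -> List[int]:
    """Assign the labels held by at least `threshold` of the training instances
    that share x's best-matching neuron."""
    bmu = best_matching_unit(weights, x)
    members = [y for xi, y in zip(X_train, Y_train)
               if best_matching_unit(weights, xi) == bmu]
    n_labels = len(Y_train[0])
    if not members:
        return [0] * n_labels  # assumption: an empty neuron predicts no labels
    return [1 if sum(y[l] for y in members) / len(members) >= threshold else 0
            for l in range(n_labels)]
```

In practice the training-instance-to-neuron assignments would be computed once after training rather than on every prediction, as done here for brevity.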