Diego Parente Paiva Mesquita
Federal University of Ceará
Publications
Featured research published by Diego Parente Paiva Mesquita.
International Work-Conference on Artificial and Natural Neural Networks | 2015
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Amauri Holanda de Souza Júnior
Ensemble methods for pattern classification have gained attention in recent years, mainly due to their improvements in classification rates. This paper evaluates ensemble learning methods using the Minimal Learning Machine (MLM), a recently proposed supervised learning algorithm. Additionally, we introduce an alternative output estimation procedure to reduce the complexity of the standard MLM. The proposed methods are evaluated on real datasets and compared to several state-of-the-art classification algorithms.
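As a rough illustration of the MLM idea (a linear map between input-space and output-space distance matrices), here is a minimal numpy sketch. The choice of reference points and the brute-force output estimation over the training outputs are illustrative simplifications, not the paper's exact procedure:

```python
import numpy as np

def mlm_fit(X, Y, R_idx):
    """Fit the MLM regression coefficients B so that Dx @ B approximates Dy,
    where Dx, Dy are distances from each sample to the reference points R_idx."""
    Dx = np.linalg.norm(X[:, None, :] - X[R_idx][None, :, :], axis=2)
    Dy = np.linalg.norm(Y[:, None, :] - Y[R_idx][None, :, :], axis=2)
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return B

def mlm_predict(x, X, Y, R_idx, B):
    """Estimate output-space distances for a new input, then pick the training
    output that best matches them (a cheap stand-in for the usual optimization)."""
    dx = np.linalg.norm(x - X[R_idx], axis=1)
    dy_hat = dx @ B  # estimated distances to the output reference points
    costs = [np.sum((np.linalg.norm(y - Y[R_idx], axis=1) - dy_hat) ** 2)
             for y in Y]
    return Y[int(np.argmin(costs))]
```

On a toy identity-regression task the learned map is exact, so a training input recovers its own output.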
Applied Soft Computing | 2017
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Leonardo Ramos Rodrigues; Saulo A. F. Oliveira; Roberto Kawakami Harrop Galvão
Randomization-based methods for training neural networks have gained increasing attention in recent years and achieved remarkable performance on a wide variety of tasks. The interest in such methods stems from the fact that standard gradient-based learning algorithms may often converge to local minima and are usually time-consuming. Despite the good performance achieved by Randomization Based Neural Networks (RNNs), the random feature mapping procedure may generate redundant information, leading to suboptimal solutions. To overcome this problem, strategies such as feature selection, hidden neuron pruning and ensemble methods have been used. Feature selection methods discard redundant information from the original dataset. Pruning methods eliminate hidden nodes with redundant information. Ensemble methods combine multiple models to generate a single one. Selective ensemble methods select a subset of all available models to generate the final model. In this paper, we propose a selective ensemble of RNNs based on the Successive Projections Algorithm (SPA) for regression problems. The proposed method, named Selective Ensemble of RNNs using the Successive Projections Algorithm (SERS), employs the SPA for three distinct tasks: feature selection, pruning and ensemble selection. SPA was originally developed as a feature selection technique and has recently been employed for RNN pruning. Herein, we show that it can also be employed for ensemble selection. The proposed framework was used to develop three selective ensemble models based on three RNNs: Extreme Learning Machines (ELM), the Feedforward Neural Network with Random Weights (FNNRW) and the Random Vector Functional Link (RVFL). The performances of SERS-ELM, SERS-FNNRW and SERS-RVFL were assessed in terms of model accuracy and model complexity on several real-world benchmark problems. Comparisons to related methods showed that the SERS variants achieved similar accuracies with significant model complexity reduction. Among the proposed models, SERS-RVFL had the best accuracies, and all variants had similar model complexities.
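The Successive Projections Algorithm at the heart of SERS can be sketched in a few lines: starting from the column with the largest norm, it greedily picks the column most orthogonal to those already selected. A minimal numpy illustration; in SERS the columns of Z would stand for features, hidden-neuron outputs, or ensemble-member predictions depending on the task:

```python
import numpy as np

def spa_select(Z, k):
    """Successive Projections Algorithm sketch: greedily select k columns of Z,
    each time projecting the remaining columns onto the orthogonal complement
    of the last pick so redundant (collinear) columns are suppressed."""
    Z = Z.astype(float).copy()
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(Z, axis=0)
        j = int(np.argmax(norms))       # column least explained so far
        selected.append(j)
        v = Z[:, j] / norms[j]
        Z = Z - np.outer(v, v @ Z)      # remove the picked direction
    return selected
```

With a matrix whose third column duplicates the first, SPA skips the redundant copy and selects the two informative columns.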
International Conference on Neural Information Processing | 2015
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Amauri H. Souza
Minimal Learning Machine (MLM) is a recently proposed supervised learning algorithm with a simple implementation and few hyper-parameters. Learning an MLM model consists of building a linear mapping between input and output distance matrices. In this work, the standard MLM is modified to deal with missing data. For that, the expected squared distance approach is used to compute the input-space distance matrix. The proposed approach showed promising results when compared to standard strategies for dealing with missing data.
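The expected squared distance idea can be illustrated with a small numpy sketch, assuming each missing entry is modeled by an independent per-feature Gaussian (the means and variances here are hypothetical placeholders for whatever model is fitted): E[(x_i − y_i)^2] equals the squared difference of means plus the variances of any missing entries, with zero variance for observed ones.

```python
import numpy as np

def expected_sq_dist(x, y, mu, var):
    """Expected squared Euclidean distance between vectors with NaN-marked
    missing entries, assuming missing feature i ~ N(mu[i], var[i])."""
    mx = np.where(np.isnan(x), mu, x)   # fill means for missing entries
    my = np.where(np.isnan(y), mu, y)
    vx = np.where(np.isnan(x), var, 0.0)  # observed entries contribute no variance
    vy = np.where(np.isnan(y), var, 0.0)
    return np.sum((mx - my) ** 2 + vx + vy)
```

When both vectors are fully observed this reduces to the ordinary squared Euclidean distance; a missing entry adds its model variance on top of the mean-based term.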
Neural Processing Letters | 2017
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Amauri Holanda de Souza Júnior
Minimal Learning Machine (MLM) is a recently proposed supervised learning algorithm with performance comparable to most state-of-the-art machine learning methods. In this work, we propose ensemble methods for classification and regression using MLMs. The goal of ensemble strategies is to produce more robust and accurate models when compared to a single classifier or regression model. Despite its successful application, MLM employs a computationally intensive optimization problem as part of its test procedure (out-of-sample data estimation). This becomes even more noticeable in the context of ensemble learning, where multiple models are used. Aiming to provide fast alternatives to the standard MLM, we also propose the Nearest Neighbor Minimal Learning Machine and the Cubic Equation Minimal Learning Machine to cope with classification and single-output regression problems, respectively. The experimental assessment conducted on real-world datasets shows that ensembles of fast MLMs perform comparably or superiorly to reference machine learning algorithms.
Applied Soft Computing | 2016
Diego Parente Paiva Mesquita; Lincoln S. Rocha; João Paulo Pordeus Gomes; Ajalmar R. da Rocha Neto
Highlights:
- We propose the use of classification with reject option for software defect prediction (SDP) as a way to incorporate additional knowledge into the SDP process.
- We propose two variants of the extreme learning machine (ELM) with reject option.
- An ELM with reject option for imbalanced datasets is proposed.
- The proposed methods are tested on five real-world software datasets.
- An example illustrates how rejected software modules can be further analyzed to improve the final SDP accuracy.

Context: Software defect prediction (SDP) is an important task in software engineering. Along with estimating the number of defects remaining in software systems and discovering defect associations, classifying the defect-proneness of software modules plays an important role in software defect prediction. Several machine-learning methods have been applied to handle the defect-proneness of software modules as a classification problem. This type of yes-or-no decision is an important drawback in the decision-making process and, if not precise, may lead to misclassifications. To the best of our knowledge, existing approaches rely on fully automated module classification and do not provide a way to incorporate extra knowledge during the classification process. This knowledge can be helpful in avoiding misclassifications in cases where system modules cannot be classified in a reliable way.

Objective: We seek to develop an SDP method that (i) incorporates a reject option in the classifier to improve the reliability of the decision-making process; and (ii) makes it possible to postpone the final decision on rejected modules for an expert analysis or even for another classifier using extra domain knowledge.

Method: We develop an SDP method called rejoELM and its variant, IrejoELM. Both methods are built upon the weighted extreme learning machine (ELM) with reject option, which makes it possible to postpone the final decision on non-classified modules, the rejected ones, to another moment. While rejoELM aims to maximize the accuracy for a given rejection rate, IrejoELM maximizes the F-measure. Hence, IrejoELM becomes an alternative for classification with reject option on imbalanced datasets.

Results: rejoELM and IrejoELM are tested on five datasets of source code metrics extracted from real-world open-source software projects. Results indicate that rejoELM has an accuracy for several rejection rates that is comparable to some state-of-the-art classifiers with reject option. Although IrejoELM shows lower accuracies for several rejection rates, it clearly outperforms all other methods when the F-measure is used as a performance metric.

Conclusion: rejoELM is a valid alternative for classification with reject option when classes are nearly equally represented. On the other hand, IrejoELM is shown to be the best alternative for classification with reject option on imbalanced datasets. Since SDP problems are usually characterized as imbalanced learning problems, the use of IrejoELM is recommended.
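The reject option itself can be sketched generically with a Chow-style thresholding rule (this is not the rejoELM formulation, just the common underlying idea): predict the top class only when its score clears a confidence threshold, otherwise abstain and defer the module to an expert or a second classifier.

```python
import numpy as np

def classify_with_reject(scores, threshold=0.7):
    """Chow-style reject rule: return the argmax class when its score reaches
    the threshold, and -1 (reject) otherwise. `scores` is (n_samples, n_classes)."""
    scores = np.asarray(scores, dtype=float)
    top = scores.argmax(axis=1)
    conf = scores.max(axis=1)
    return np.where(conf >= threshold, top, -1)
```

Raising the threshold trades coverage for reliability: more modules are rejected, but the accepted predictions are more trustworthy.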
Neurocomputing | 2017
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Amauri Holanda de Souza Júnior; Juvêncio Santos Nobre
This paper proposes a method to estimate the expected value of the Euclidean distance between two possibly incomplete feature vectors. Under the Missing at Random assumption, we show that the Euclidean distance can be modeled by a Nakagami distribution, whose parameters we express as functions of the moments of the unknown data distribution. In our formulation, the data distribution is modeled using a mixture of Gaussians. The proposed method, named Expected Euclidean Distance (EED), is validated through a series of experiments using synthetic and real-world data. Additionally, we show the application of EED to the Minimal Learning Machine (MLM), a distance-based supervised learning method. Experimental results show that EED outperforms existing methods that estimate Euclidean distances in an indirect manner. We also observe that the application of EED to the MLM provides promising results.
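The moment-matching step can be sketched as follows: if S denotes the squared distance with known mean E[S] and variance Var[S], a Nakagami fit gives shape m = E[S]^2 / Var[S] and spread Ω = E[S], and the expected distance E[√S] follows from the Nakagami mean formula. A small standard-library sketch (in practice the moments would come from the Gaussian mixture model, which is omitted here):

```python
from math import lgamma, exp, sqrt

def nakagami_expected_distance(mean_sq, var_sq):
    """Expected Euclidean distance E[sqrt(S)] when the squared distance S is
    matched to a Nakagami model through its first two moments."""
    m = mean_sq ** 2 / var_sq   # Nakagami shape parameter
    omega = mean_sq             # Nakagami spread parameter
    # Nakagami mean: E[sqrt(S)] = Gamma(m + 1/2) / Gamma(m) * sqrt(omega / m)
    return exp(lgamma(m + 0.5) - lgamma(m)) * sqrt(omega / m)
```

As the variance of S shrinks, the estimate approaches sqrt(E[S]), and by Jensen's inequality it never exceeds it.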
Brazilian Conference on Intelligent Systems | 2016
Weslley L. Caldas; João Paulo Pordeus Gomes; Michelle G. Cacais; Diego Parente Paiva Mesquita
Semi-supervised learning is a challenging topic in machine learning that has attracted much attention in recent years. The availability of huge volumes of data and the work necessary to label all these data are two of the reasons that can explain this interest. Among the various methods for semi-supervised learning, the co-training framework has become popular due to its simple formulation and promising results. In this work, we propose Co-MLM, a semi-supervised learning algorithm built upon the co-training framework and based on a recently proposed supervised method named Minimal Learning Machine (MLM). Experiments on UCI datasets showed that Co-MLM has promising performance compared to other co-training-style algorithms.
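The co-training loop itself is simple to sketch. Below is a toy numpy version with 1-NN base learners standing in for the MLM (purely illustrative; Co-MLM's base learner, confidence criterion, and labeling schedule differ):

```python
import numpy as np

def nn_predict(Xtr, ytr, Xte):
    """1-NN prediction; the nearest-neighbor distance doubles as an
    inverse-confidence score (smaller distance = more confident)."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    idx = d.argmin(axis=1)
    return ytr[idx], d.min(axis=1)

def co_training(X1, X2, y, labeled, rounds=10):
    """Toy co-training with two feature views: the views alternate, and each
    round the active view labels its single most confident unlabeled point,
    which then serves as labeled data for both views."""
    y, labeled = y.copy(), labeled.copy()
    views = (X1, X2)
    for t in range(rounds):
        u = np.where(~labeled)[0]
        if len(u) == 0:
            break
        Xv = views[t % 2]
        yhat, dist = nn_predict(Xv[labeled], y[labeled], Xv[u])
        pick = int(np.argmin(dist))        # most confident unlabeled point
        y[u[pick]] = yhat[pick]
        labeled[u[pick]] = True
    return y, labeled
```

On two well-separated clusters visible in both views, the loop propagates the two seed labels to all points.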
New Generation Computing | 2018
Weslley L. Caldas; João Paulo Pordeus Gomes; Diego Parente Paiva Mesquita
Co-training is a framework for semi-supervised learning that has attracted much attention due to its good performance and easy adaptation to various learning algorithms. In a recent work, Caldas et al. proposed a co-training-based method using the recently proposed supervised learning method named minimal learning machine (MLM). Although the proposed method, referred to as Co-MLM, presented results comparable to other semi-supervised algorithms, using MLM as a base learner resulted in a formulation with heavy computational cost. Aiming to mitigate this problem, in this paper we propose an improved variant of Co-MLM with reduced computational cost in both the training and testing phases. The proposed method is compared to Co-MLM and other co-training-based semi-supervised methods, presenting comparable performance.
International Work-Conference on Artificial and Natural Neural Networks | 2017
Marcelo B. A. Veras; Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes; Amauri Holanda de Souza Júnior; Guilherme A. Barreto
The Forward Stagewise Regression (FSR) algorithm is a popular procedure to generate sparse linear regression models. However, the standard FSR assumes that the data are fully observed. This assumption is often violated, and pre-processing steps are applied to the dataset so that FSR can be used. In this paper, we extend the FSR algorithm to directly handle datasets with partially observed feature vectors, removing the need for the data to be pre-processed. Experiments were carried out on real-world datasets, and the proposed method reported promising results when compared to the usual strategies for handling incomplete data.
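For reference, the fully observed FSR baseline that the paper extends can be sketched in a few lines (the missing-data extension itself is not reproduced here): each step nudges the coefficient of the feature most correlated with the current residual by a small amount.

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Plain Forward Stagewise Regression: repeatedly find the feature most
    correlated with the residual and move its coefficient by a small step eps
    in the direction of that correlation."""
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()
    for _ in range(n_steps):
        corr = X.T @ r                       # correlations with the residual
        j = int(np.argmax(np.abs(corr)))     # most correlated feature
        delta = eps * np.sign(corr[j])
        beta[j] += delta
        r -= delta * X[:, j]                 # update the residual
    return beta
```

The small step size is what produces the characteristic slow, sparse coefficient paths; with enough steps the fit approaches the least-squares solution.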
Intelligent Systems Design and Applications | 2016
Diego Parente Paiva Mesquita; João Paulo Pordeus Gomes
Radial Basis Function Neural Networks (RBFNN) are among the most popular supervised learning methods and have shown significant results in various applications. Despite their applicability, the basic RBFNN formulation cannot handle datasets with missing attributes. Aiming to overcome this problem, in this work the RBFNN is modified to deal with missing data. For that, the expected squared distance approach is used to compute the RBF kernel. The proposed approach showed promising results when compared to standard missing-data strategies.
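A single hidden-unit activation under this scheme can be sketched as follows, assuming each missing entry is described by a per-feature Gaussian mean and variance (hypothetical placeholders for the fitted model): the variance of the missing coordinates is simply added to the squared distance to the center before applying the Gaussian kernel.

```python
import numpy as np

def rbf_expected_activation(x, c, mu, var, sigma=1.0):
    """Gaussian RBF hidden-unit activation computed with the expected squared
    distance: NaN entries of x are replaced by their assumed Gaussian mean
    mu[i], and the matching variance var[i] is added to the squared distance
    to center c."""
    obs = ~np.isnan(x)
    d2 = np.sum((np.where(obs, x, mu) - c) ** 2) + np.sum(var[~obs])
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

For a fully observed input this is the ordinary Gaussian RBF; each missing entry inflates the distance by its model variance, shrinking the activation accordingly.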