Cristiano Leite Castro

Publication

Featured research published by Cristiano Leite Castro.

IEEE Transactions on Neural Networks | 2013

Novel Cost-Sensitive Approach to Improve the Multilayer Perceptron Performance on Imbalanced Data

Cristiano Leite Castro; Antônio de Pádua Braga

Traditional learning algorithms applied to complex and highly imbalanced training sets may not give satisfactory results when distinguishing between examples of the classes. The tendency is to yield classification models that are biased towards the overrepresented (majority) class. This paper investigates this class imbalance problem in the context of multilayer perceptron (MLP) neural networks. The consequences of the equal cost (loss) assumption on imbalanced data are formally discussed from a statistical learning theory point of view. A new cost-sensitive algorithm (CSMLP) is presented to improve the discrimination ability of (two-class) MLPs. The CSMLP formulation is based on a joint objective function that uses a single cost parameter to distinguish the importance of class errors. The learning rule extends the Levenberg-Marquardt rule, ensuring the computational efficiency of the algorithm. In addition, it is theoretically demonstrated that the incorporation of prior information via the cost parameter may lead to balanced decision boundaries in the feature space. Based on a statistical analysis of results on real data, our approach significantly improves the area under the receiver operating characteristic curve and the G-mean relative to regular MLPs.
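
As a minimal sketch of the single-cost-parameter idea described in the abstract above, the squared errors of the two classes can be averaged separately and then combined with one weight. This is not the CSMLP objective itself, whose exact form is not given here; the function names and the parameter `lam` are hypothetical, and labels are assumed to be 0 (majority) and 1 (minority). The G-mean is included because the abstract uses it as an evaluation measure.

```python
import math

def cost_sensitive_loss(y_true, y_pred, lam):
    """Weighted sum of per-class mean squared errors.

    lam in [0, 1] weights the positive (minority) class error and
    (1 - lam) weights the negative (majority) class error.
    This joint form is an assumption made for illustration only.
    """
    pos = [(t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 1]
    neg = [(t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 0]
    pos_err = sum(pos) / len(pos) if pos else 0.0
    neg_err = sum(neg) / len(neg) if neg else 0.0
    return lam * pos_err + (1.0 - lam) * neg_err

def g_mean(y_true, y_label):
    """Geometric mean of the true positive and true negative rates.

    Requires at least one example of each class in y_true.
    """
    tp = sum(1 for t, l in zip(y_true, y_label) if t == 1 and l == 1)
    tn = sum(1 for t, l in zip(y_true, y_label) if t == 0 and l == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))
```

With one positive among four negatives and a model that always outputs 0, the plain mean squared error is only 0.2, while the class-balanced loss with `lam = 0.5` is 0.5 and the G-mean is 0, which makes the bias towards the majority class visible.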

international conference on engineering applications of neural networks | 2009

An Improved Algorithm for SVMs Classification of Imbalanced Data Sets

Cristiano Leite Castro; Mateus Araujo Carvalho; Antônio de Pádua Braga

Support Vector Machines (SVMs) have strong theoretical foundations and excellent empirical success in many pattern recognition and data mining applications. However, when induced by imbalanced training sets, where the examples of the target class (minority) are outnumbered by the examples of the non-target class (majority), SVM classifiers perform considerably worse. In medical diagnosis and text classification, for instance, small and heavily imbalanced data sets are common. In this paper, we propose the Boundary Elimination and Domination algorithm (BED) to enhance SVM class-prediction accuracy in applications with imbalanced class distributions. BED is an informative resampling strategy in input space. In order to balance the class distributions, our algorithm uses density information in the training sets to remove noisy examples of the majority class and generate new synthetic examples of the minority class. In our experiments, we compared BED with the original SVM and with the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results demonstrate that this new approach improves SVM classifier performance on several real-world imbalanced problems.

IEEE Transactions on Dielectrics and Electrical Insulation | 2016

A comparison of cycle spinning versus stationary wavelet transform for the extraction of features of partial discharge signals

Hilton de Oliveira Mota; Flávio Henrique Vasconcelos; Cristiano Leite Castro

This paper presents a comparison of three feature extraction methods to denoise partial discharge (PD) signals. The denoising technique employs the Stationary Wavelet Transform (SWT) combined with a spatially adaptive selection procedure based on the propagation of coefficients along decomposition levels (scales). The PD-related and noise-related coefficients are identified and separated by an automatic data classifier using Support Vector Machines (SVM). The first and second feature extraction methods act directly on the SWT coefficients and differ only in the procedures used to characterize the propagation. The third method relies on Cycle Spinning (CS) applied to the several translated Discrete Wavelet Transforms (DWTs) obtained from the SWT. We conducted an empirical study using Analysis of Variance (ANOVA) to evaluate the influence of the methods on denoising performance and to ensure the statistical significance of the tests. Afterwards, performance was evaluated on real PD signals measured in air and in solid dielectrics, corrupted by several types of interference, both stationary and time-varying. The results show that the three approaches allow robust signal recovery and significant noise rejection, but differ substantially in the quality of the reconstructed signals.
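
To illustrate what cycle spinning means in general, the sketch below shifts a signal circularly, denoises each shifted copy, undoes the shift, and averages the results. It is a simplified stand-in, not the method of the paper: it assumes a one-level Haar transform with soft thresholding instead of the SWT-based, SVM-driven coefficient selection described above, and the names `haar_denoise`, `cycle_spin`, `thr` and `shifts` are hypothetical.

```python
import math

def haar_denoise(x, thr):
    """One-level Haar DWT, soft-threshold the detail coefficients, invert.

    Assumes len(x) is even.
    """
    s = math.sqrt(2.0)
    out = []
    for i in range(0, len(x), 2):
        a = (x[i] + x[i + 1]) / s          # approximation coefficient
        d = (x[i] - x[i + 1]) / s          # detail coefficient
        d = math.copysign(max(abs(d) - thr, 0.0), d)  # soft threshold
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def cycle_spin(x, thr, shifts):
    """Average the denoised results over `shifts` circular shifts of x."""
    n = len(x)
    acc = [0.0] * n
    for k in range(shifts):
        shifted = x[k:] + x[:k]            # circular left shift by k
        den = haar_denoise(shifted, thr)
        # Undo the shift so every result is aligned with the input.
        aligned = den[-k:] + den[:-k] if k else den
        for i in range(n):
            acc[i] += aligned[i]
    return [v / shifts for v in acc]
```

Averaging over shifts reduces the dependence of the result on where the signal happens to fall relative to the transform's fixed pairing of samples.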

brazilian symposium on neural networks | 2008

Optimization of the Area under the ROC Curve

Cristiano Leite Castro; Antônio de Pádua Braga

In this paper, we propose a new binary classification algorithm (AUCtron), based on gradient descent learning, that directly optimizes AUC (area under the ROC curve). We compare it with a linear classifier and with the previously proposed AUCsplit. The AUCtron algorithm implicitly considers class prior probabilities in the decision criteria. Our results show that AUC is a sufficiently sensitive metric that, when used on small and imbalanced data sets, may lead to better class separation.
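
For reference, the AUC can be computed as the fraction of (positive, negative) score pairs that are ranked correctly, which is the Wilcoxon-Mann-Whitney statistic. The sketch below shows that computation together with a smooth sigmoid surrogate of the kind that makes gradient-based optimization possible. The surrogate and the names `soft_auc` and `tau` are assumptions for illustration; the abstract does not specify the objective used by AUCtron.

```python
import math

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as one half.
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def soft_auc(scores_pos, scores_neg, tau=1.0):
    """Differentiable surrogate: the step function is replaced by a sigmoid.

    tau controls the steepness (hypothetical parameter).
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            total += 1.0 / (1.0 + math.exp(-(sp - sn) / tau))
    return total / (len(scores_pos) * len(scores_neg))
```

Because the statistic is normalized by the number of positive-negative pairs, it does not depend on the class proportions, which is one reason AUC is attractive on imbalanced data.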

Mathematical Problems in Engineering | 2015

Stream-Based Extreme Learning Machine Approach for Big Data Problems

Euler Guimarães Horta; Cristiano Leite Castro; Antônio de Pádua Braga

Big Data problems demand data models able to handle time-varying, massive, and high-dimensional data. In this context, Active Learning emerges as an attractive technique for developing high-performance models from few data. The importance of Active Learning for Big Data becomes more evident when the labeling cost is high and data are presented to the learner via data streams. This paper presents a novel Active Learning method based on Extreme Learning Machines (ELMs) and Hebbian Learning. Linearization of the input data by a large ELM hidden layer makes our method largely insensitive to parameter settings. Overfitting is inherently controlled via the Hebbian Learning crosstalk term. We also demonstrate that a simple convergence test can be used as an effective labeling criterion, since it indicates the number of labels necessary for learning. The proposed method has inherent properties that make it highly attractive for handling Big Data: incremental learning via data streams, elimination of redundant patterns, and learning from a reduced, informative training set. Experimental results show that our method is competitive with some large-margin Active Learning strategies and also with a linear SVM.

international conference on artificial neural networks | 2012

A computational geometry approach for pareto-optimal selection of neural networks

Luiz C. B. Torres; Cristiano Leite Castro; Antônio de Pádua Braga

This paper presents a Pareto-optimal selection strategy for multiobjective learning that is based on the geometry of the separation margin between classes. The Gabriel Graph, a method borrowed from Computational Geometry, is constructed in order to obtain margin patterns and class borders. From the border edges, a target separator is derived that corresponds to a large margin classifier. The model selected from the generated Pareto set is the one closest to the target separator. The method is robust on both synthetic and real benchmark datasets. It is efficient for Pareto-optimal selection of neural networks, and no claim is made that the obtained solution is equivalent to a maximum margin separator.
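
The Gabriel Graph mentioned in the abstract above has a simple definition: two points are connected when the closed ball whose diameter is the segment between them contains no other point. The sketch below builds the graph by brute force and keeps the edges that join points of different classes, which is one plausible reading of "border edges". The function names are hypothetical, and the construction of the target separator is not shown because the abstract does not describe it.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gabriel_edges(points):
    """Return index pairs (i, j) that are Gabriel neighbours.

    A third point k blocks the edge when it lies in the closed ball
    with diameter (i, j), i.e. d(i,k)^2 + d(j,k)^2 <= d(i,j)^2.
    Brute force, O(n^3) distance evaluations.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) <= d_ij
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges

def border_edges(points, labels):
    """Gabriel edges whose endpoints carry different class labels."""
    return [(i, j) for i, j in gabriel_edges(points) if labels[i] != labels[j]]
```

For example, with points `(0, 0)`, `(2, 0)` and `(1, 0.1)`, the third point lies inside the ball spanned by the first two, so those two are not connected directly.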

international conference on artificial neural networks | 2012

Improving ANNs performance on unbalanced data with an AUC-Based learning algorithm

Cristiano Leite Castro; Antônio de Pádua Braga

This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named AUCMLP, which incorporates AUC optimization into the Multi-layer Perceptron (MLP) learning process, is presented. The basic principle of AUCMLP is the solution of an optimization problem that targets ranking quality as well as the separability of the class distributions with respect to the decision threshold. Preliminary results on real data indicate that our approach is promising and can lead to better decision surfaces, especially under more severe imbalance.

Sba: Controle & Automação Sociedade Brasileira de Automatica | 2011

Aprendizado supervisionado com conjuntos de dados desbalanceados (Supervised learning with imbalanced data sets)

Cristiano Leite Castro; Antônio de Pádua Braga

Traditional learning algorithms induced by complex and highly imbalanced training sets may have difficulty in distinguishing between examples of the groups. The tendency is to create classification models that are biased toward the overrepresented (majority) class, resulting in a low recognition rate for the minority group. This paper provides a survey of this problem, which has attracted the interest of many researchers in recent years. In the scope of two-class classification tasks, concepts related to the nature of the class imbalance problem and to evaluation metrics are presented, including the foundations of ROC (Receiver Operating Characteristic) analysis, as well as a review of the state of the art in proposed solutions. The paper closes with a brief discussion of how the subject can be extended to multiclass learning.

Neural Computing and Applications | 2017

Semi-supervised relevance index for feature selection

Frederico Coelho; Cristiano Leite Castro; Antônio de Pádua Braga; Michel Verleysen

This paper presents a new relevance index, based on mutual information, that uses both labeled and unlabeled data. The proposed index takes into account the similarity between features and their joint influence on the output variable. Based on this principle, a feature selection method is developed that eliminates redundant and irrelevant features when the relevance index value is less than a threshold value. A strategy to set the threshold is also proposed in this work. Experiments show that the new method is capable of capturing important joint relations between input and output variables, which are incorporated into a new feature selection clustering approach.
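
The quantity underlying the index, mutual information, can be estimated for discrete variables directly from co-occurrence counts. The sketch below shows only this standard plug-in estimate; it is not the semi-supervised relevance index of the paper, and the function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi
```

A feature that determines a balanced binary label exactly yields 1 bit, while a feature that is independent of the label yields 0 bits, which is the basis for discarding features whose relevance falls below a threshold.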

ChemBioChem | 2016

A New Oversampling-Based Approach for Class Imbalance Problem

Alexandre W. C. Faria; Cristiano Leite Castro; Antônio de Pádua Braga

In this paper, a new oversampling method is proposed to improve the representativeness of minority groups in the training data set. Our methodology creates artificial (synthetic) examples on the basis of the spatial distribution of the classes. The original data are expanded (duplicated) along the lines connecting the class centroid and each minority pattern under consideration. In contrast to other methods known in the literature (such as SMOTE), our geometric approach to data generation has the advantage of being straightforward, i.e., it does not require the user to define any parameters. Experiments conducted with real and synthetic data indicate that our solution to the class imbalance problem is able to improve the number of correct minority classifications and the balance between the class accuracies.
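
A rough sketch of the geometric idea, generating one synthetic example on the line between the minority class centroid and each minority pattern, is given below. The abstract does not say where on that line the new point is placed, so the midpoint used here is purely an assumption, as are the function names; the sketch should not be read as the authors' procedure.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def oversample_toward_centroid(minority):
    """Duplicate the minority class along centroid-to-pattern lines.

    For each pattern, one synthetic point is placed at the midpoint of
    the segment joining it to the class centroid (assumed placement).
    Returns the original patterns followed by the synthetic ones.
    """
    c = centroid(minority)
    synthetic = [
        tuple((x + cx) / 2.0 for x, cx in zip(p, c))
        for p in minority
    ]
    return list(minority) + synthetic
```

Applied to the patterns `(0.0, 0.0)` and `(2.0, 0.0)`, whose centroid is `(1.0, 0.0)`, this doubles the class by adding `(0.5, 0.0)` and `(1.5, 0.0)`, with no user-set parameter involved.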
