Cristiano Leite Castro
Universidade Federal de Minas Gerais
Publication
Featured research published by Cristiano Leite Castro.
IEEE Transactions on Neural Networks | 2013
Cristiano Leite Castro; Antônio de Pádua Braga
Traditional learning algorithms applied to complex and highly imbalanced training sets may not give satisfactory results when distinguishing between examples of the classes. The tendency is to yield classification models that are biased towards the overrepresented (majority) class. This paper investigates this class imbalance problem in the context of multilayer perceptron (MLP) neural networks. The consequences of the equal cost (loss) assumption on imbalanced data are formally discussed from a statistical learning theory point of view. A new cost-sensitive algorithm (CSMLP) is presented to improve the discrimination ability of (two-class) MLPs. The CSMLP formulation is based on a joint objective function that uses a single cost parameter to distinguish the importance of class errors. The learning rule extends the Levenberg-Marquardt rule, ensuring the computational efficiency of the algorithm. In addition, it is theoretically demonstrated that the incorporation of prior information via the cost parameter may lead to balanced decision boundaries in the feature space. Based on the statistical analysis of results on real data, our approach shows a significant improvement in the area under the receiver operating characteristic curve and G-mean measures over regular MLPs.
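The idea of weighting class errors with a single cost parameter, and the G-mean measure used for evaluation, can be illustrated with a minimal sketch. The function names and the exact weighting form (`cost` for minority errors, `1 - cost` for majority errors) are assumptions made for illustration; the abstract does not give the CSMLP objective in detail, and this is not the authors' implementation.

```python
import math


def class_weighted_sse(y_true, y_pred, cost):
    """Squared error split by class and recombined with one cost parameter.

    Assumed form (hypothetical): minority errors (label 1) are weighted by
    `cost`, majority errors (label 0) by `1 - cost`.
    """
    err_pos = sum((t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 1)
    err_neg = sum((t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 0)
    return cost * err_pos + (1.0 - cost) * err_neg


def g_mean(y_true, y_label):
    """Geometric mean of the true positive rate and the true negative rate.

    Assumes both classes are present in `y_true`.
    """
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = sum(1 for t in y_true if t == 0)
    tp = sum(1 for t, p in zip(y_true, y_label) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_label) if t == 0 and p == 0)
    return math.sqrt((tp / n_pos) * (tn / n_neg))


# A model that always outputs 0 on a 1-vs-3 imbalanced set.
y_true = [1, 0, 0, 0]
y_pred = [0.0, 0.0, 0.0, 0.0]
loss_equal = class_weighted_sse(y_true, y_pred, cost=0.5)
loss_costly = class_weighted_sse(y_true, y_pred, cost=0.9)
```

With equal weights the missed minority example contributes little to the loss; raising `cost` makes the same mistake more expensive, which is the intuition behind cost-sensitive training on imbalanced data.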
International Conference on Engineering Applications of Neural Networks | 2009
Cristiano Leite Castro; Mateus Araujo Carvalho; Antônio de Pádua Braga
Support Vector Machines (SVMs) have strong theoretical foundations and excellent empirical success in many pattern recognition and data mining applications. However, when induced by imbalanced training sets, where the examples of the target class (minority) are outnumbered by the examples of the non-target class (majority), the performance of the SVM classifier degrades. In medical diagnosis and text classification, for instance, small and heavily imbalanced data sets are common. In this paper, we propose the Boundary Elimination and Domination algorithm (BED) to enhance SVM class-prediction accuracy on applications with imbalanced class distributions. BED is an informative resampling strategy in input space. In order to balance the class distributions, our algorithm considers density information in training sets to remove noisy examples of the majority class and generate new synthetic examples of the minority class. In our experiments, we compared BED with the original SVM and the Synthetic Minority Oversampling Technique (SMOTE), a popular resampling strategy in the literature. Our results demonstrate that this new approach improves SVM classifier performance on several real-world imbalanced problems.
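For context, the SMOTE baseline mentioned above creates synthetic minority examples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that interpolation step in a simplified form; it is not BED, and the function name, parameters, and defaults are hypothetical.

```python
import random


def smote_like(minority, k=2, n_new=4, seed=0):
    """Generate `n_new` synthetic points by SMOTE-style interpolation.

    Simplified illustration: picks a random minority point, one of its
    `k` nearest minority neighbours, and a random position on the segment
    joining them. Assumes `minority` holds at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority points, sorted by squared distance to x.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        u = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like(minority, k=2, n_new=5)
```

Every generated point lies on a segment between two existing minority points, so the minority region is filled in rather than extended.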
IEEE Transactions on Dielectrics and Electrical Insulation | 2016
Hilton de Oliveira Mota; Flávio Henrique Vasconcelos; Cristiano Leite Castro
This paper presents a comparison of three feature extraction methods to denoise partial discharge (PD) signals. The denoising technique employs the Stationary Wavelet Transform (SWT) combined with a spatially adaptive selection procedure based on the propagation of coefficients along decomposition levels (scales). The PD-related and noise-related coefficients are identified and separated by an automatic data classifier using Support Vector Machines (SVM). The first and second feature extraction methods act directly on the SWT coefficients and differ only in the procedures used to characterize the propagation. The third method relies on Cycle Spinning (CS) over the several translated Discrete Wavelet Transforms (DWT) obtained from the SWT. We conducted an empirical study using Analysis of Variance (ANOVA) to evaluate the influence of the methods on denoising performance and to guarantee the statistical significance of the tests. Afterwards, performance was evaluated considering real PD signals measured in air and in solid dielectrics, corrupted by several types of interference, both stationary and time-varying. The results show that the three approaches allow robust signal recovery and significant noise rejection, but differ substantially in the quality of the reconstructed signals.
Brazilian Symposium on Neural Networks | 2008
Cristiano Leite Castro; Antônio de Pádua Braga
In this paper, we propose a new binary classification algorithm (AUCtron), based on gradient descent learning, that directly optimizes AUC (area under the ROC curve). We compare it with a linear classifier and with the previously proposed AUCsplit. The AUCtron algorithm implicitly considers class prior probabilities in the decision criteria. Our results demonstrate that AUC is a sufficiently sensitive metric that, when used on small and imbalanced data sets, may lead to a better separation.
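As background on the quantity being optimized, AUC equals the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half (the Wilcoxon-Mann-Whitney statistic). The sketch below computes that statistic only; it does not implement AUCtron, and the function name is chosen here for illustration.

```python
def auc_pairwise(scores_pos, scores_neg):
    """AUC as the fraction of correctly ranked positive/negative pairs."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Three of the four positive/negative pairs are ranked correctly.
auc = auc_pairwise([0.9, 0.4], [0.5, 0.1])
```

Because the statistic depends only on how positives rank against negatives, it does not change with the class proportions, which is one reason it is attractive for imbalanced data.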
Mathematical Problems in Engineering | 2015
Euler Guimarães Horta; Cristiano Leite Castro; Antônio de Pádua Braga
Big Data problems demand data models able to handle time-varying, massive, and high-dimensional data. In this context, Active Learning emerges as an attractive technique for developing high-performance models using few data. The importance of Active Learning for Big Data becomes more evident when the labeling cost is high and data is presented to the learner via data streams. This paper presents a novel Active Learning method based on Extreme Learning Machines (ELMs) and Hebbian Learning. Linearization of the input data by a large ELM hidden layer makes our method largely insensitive to parameter settings. Overfitting is inherently controlled via the Hebbian Learning crosstalk term. We also demonstrate that a simple convergence test can be used as an effective labeling criterion, since it indicates the number of labels necessary for learning. The proposed method has inherent properties that make it highly attractive for handling Big Data: incremental learning via data streams, elimination of redundant patterns, and learning from a reduced, informative training set. Experimental results have shown that our method is competitive with some large-margin Active Learning strategies and also with a linear SVM.
International Conference on Artificial Neural Networks | 2012
Luiz C. B. Torres; Cristiano Leite Castro; Antônio de Pádua Braga
This paper presents a Pareto-optimal selection strategy for multiobjective learning that is based on the geometry of the separation margin between classes. The Gabriel Graph, a method borrowed from Computational Geometry, is constructed in order to obtain margin patterns and class borders. From the border edges, a target separator is derived that yields a large-margin classifier. The model selected from the generated Pareto set is the one closest to the target separator. The method is robust on both synthetic and real benchmark datasets. It is efficient for Pareto-optimal selection of neural networks, and no claim is made that the obtained solution is equivalent to a maximum margin separator.
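A Gabriel Graph joins two points whenever no third point lies inside the ball whose diameter is the segment between them. The brute-force sketch below builds that graph and then keeps the edges whose endpoints carry different labels. Treating those mixed-label edges as the "border edges" is an assumption made here for illustration, and the function names are hypothetical; the construction of the target separator is not shown.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(points):
    """Return index pairs (i, j) that are edges of the Gabriel Graph.

    Points i and j are joined when every other point k satisfies
    d(i,k)^2 + d(j,k)^2 >= d(i,j)^2, i.e. k is not strictly inside the
    ball with diameter ij. Brute force, O(n^3).
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
            for k in range(len(points))
            if k not in (i, j)
        ):
            edges.append((i, j))
    return edges


def border_edges(points, labels):
    """Gabriel edges whose endpoints belong to different classes (assumed definition)."""
    return [(i, j) for i, j in gabriel_edges(points) if labels[i] != labels[j]]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
labels = [0, 0, 1]
all_edges = gabriel_edges(points)
mixed = border_edges(points, labels)
```

In this three-point example the outer pair is not connected because the middle point sits inside their ball, so the only mixed-label edge links the two points that actually face each other across the class boundary.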
International Conference on Artificial Neural Networks | 2012
Cristiano Leite Castro; Antônio de Pádua Braga
This paper investigates the use of the Area Under the ROC Curve (AUC) as an alternative criterion for model selection in classification problems with unbalanced datasets. A novel algorithm, named here AUCMLP, which incorporates AUC optimization into the Multi-layer Perceptron (MLP) learning process, is presented. The basic principle of AUCMLP is the solution of an optimization problem that aims at ranking quality as well as the separability of class distributions with respect to the decision threshold. Preliminary results achieved on real data indicate that our approach is promising and can lead to better decision surfaces, especially under more severe unbalance conditions.
Sba: Controle & Automação Sociedade Brasileira de Automatica | 2011
Cristiano Leite Castro; Antônio de Pádua Braga
Traditional learning algorithms induced by complex and highly imbalanced training sets may have difficulty in distinguishing between examples of the groups. The tendency is to create classification models that are biased toward the overrepresented (majority) class, resulting in a low rate of recognition for the minority group. This paper provides a survey of this problem, which has attracted the interest of many researchers in recent years. In the scope of two-class classification tasks, concepts related to the nature of the class imbalance problem and evaluation metrics are presented, including the foundations of ROC (Receiver Operating Characteristic) analysis, together with a review of the state of the art in proposed solutions. At the end of the paper, a brief discussion on how the subject can be extended to multiclass learning is provided.
Neural Computing and Applications | 2017
Frederico Coelho; Cristiano Leite Castro; Antônio de Pádua Braga; Michel Verleysen
This paper presents a new relevance index, based on mutual information, that uses both labeled and unlabeled data. The proposed index takes into account the similarity between features and their joint influence on the output variable. Based on this principle, a feature selection method is developed that eliminates redundant and irrelevant features when the relevance index value is less than a threshold value. A strategy to set the threshold is also proposed in this work. Experiments show that the new method is capable of capturing important joint relations between input and output variables, which are incorporated into a new feature selection clustering approach.
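The quantity underlying the index is the standard mutual information between two variables, I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))]. The sketch below estimates it from counts for discrete variables; it illustrates only this building block, not the proposed relevance index or its use of unlabeled data, and the function name is chosen here for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# A feature identical to the output carries log(2) nats about it;
# a feature independent of the output carries none.
mi_copy = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A feature whose score against the output is near zero is a candidate for removal as irrelevant, and two features with high mutual information between them are candidates for redundancy.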
ChemBioChem | 2016
Alexandre W. C. Faria; Cristiano Leite Castro; Antônio de Pádua Braga
In this paper, a new oversampling method is proposed to improve the representativeness of minority groups in the training data set. Our methodology creates artificial (synthetic) examples on the basis of the spatial distribution of the classes. The original data are expanded (duplicated) along the lines connecting the class centroid and each minority pattern under consideration. In contrast to other methods known in the literature (such as SMOTE), our geometric approach to data generation has the advantage of being accomplished in a straightforward way, i.e., without requiring the user to define parameters. Experiments conducted with real and synthetic data indicate that our solution to the class imbalance problem is able to improve the number of correct minority classifications and the balance between the class accuracies.
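The geometric idea of placing synthetic examples on the line between a class centroid and each minority pattern can be sketched as follows. The abstract does not say which centroid is used or where on the line the new point is placed, so two assumptions are made here purely for illustration: the centroid is that of the minority class, and the new point is the midpoint of the segment. The function names are hypothetical and this is not the authors' procedure.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def oversample_on_centroid_lines(minority):
    """One synthetic point per minority pattern, on the line to the centroid.

    Assumption: uses the minority-class centroid and the segment midpoint,
    which keeps the sketch free of user-defined parameters.
    """
    c = centroid(minority)
    return [tuple((a + b) / 2.0 for a, b in zip(p, c)) for p in minority]


minority = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
synthetic = oversample_on_centroid_lines(minority)
```

Each call doubles the number of minority examples without any neighbourhood size or interpolation factor to tune, in the spirit of the parameter-free claim above.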
Collaboration
Carlos Henrique Nogueira de Resende Barbosa
Universidade Federal de Minas Gerais