Telmo de Menezes e Silva Filho
Federal University of Pernambuco
Publications
Featured research published by Telmo de Menezes e Silva Filho.
Expert Systems with Applications | 2015
Telmo de Menezes e Silva Filho; Bruno A. Pimentel; Renata M. C. R. Souza; Adriano L. I. Oliveira
Highlights: We present two new hybrids of FCM and improved self-adaptive PSO. The methods are based on the FCM-PSO algorithm. We use FCM to initialize one particle to achieve better results in fewer iterations. The new methods are compared to FCM-PSO using many real and synthetic datasets. The proposed methods consistently outperform FCM-PSO in three evaluation metrics.

Fuzzy clustering has become an important research field with many applications to real-world problems. Among fuzzy clustering methods, fuzzy c-means (FCM) is one of the best known for its simplicity and efficiency, although it shows some weaknesses, particularly its tendency to fall into local minima. To tackle this shortcoming, many optimization-based fuzzy clustering methods have been proposed in the literature. Some of these methods rely solely on a metaheuristic such as particle swarm optimization (PSO), whereas others are hybrid methods that combine a metaheuristic with a traditional partitional clustering method such as FCM. The literature shows that methods hybridizing PSO and FCM achieve better clustering accuracy than traditional partitional approaches. On the other hand, PSO-based clustering methods have poor execution times in comparison to partitional clustering techniques. Another problem with PSO-based clustering is that current PSO algorithms require tuning a range of parameters before they are able to find good solutions. In this paper we introduce two hybrid fuzzy clustering methods that aim to address these shortcomings. The methods, referred to as FCM-IDPSO and FCM2-IDPSO, combine FCM with a recent version of PSO, the IDPSO, which adjusts PSO parameters dynamically during execution, providing a better balance between exploration and exploitation, avoiding premature convergence to local minima and thereby obtaining better solutions. Experiments using two synthetic data sets and eight real-world data sets are reported and discussed, covering the proposed methods as well as some recent PSO-based fuzzy clustering methods. The results show that the methods introduced in this paper provide comparable or, in many cases, better solutions than the other methods in the comparison, and were much faster than the other state-of-the-art PSO-based methods.
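To make the FCM seeding step concrete, here is a minimal sketch of plain fuzzy c-means; a hybrid like FCM-IDPSO could use the resulting centroids to initialize one particle. This is our own illustrative code, not the authors' implementation, and the IDPSO stage with its dynamic parameter adjustment is not shown.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means (illustrative sketch). Returns centroids
    that a PSO hybrid could use to seed one particle."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # fuzzy memberships
    for _ in range(n_iter):
        Um = U ** m
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))     # inverse-distance update
        U_new /= U_new.sum(axis=1, keepdims=True)
        done = np.abs(U_new - U).max() < tol
        U = U_new
        if done:                                   # memberships converged
            break
    return centroids, U
```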
Knowledge-Based Systems | 2017
Leandro C. Souza; Renata M. C. R. Souza; Getúlio J. A. Amaral; Telmo de Menezes e Silva Filho
Interval symbolic data is a complex data type that can often be obtained by summarizing large datasets. All existing linear regression approaches for interval data model intervals through fixed reference points, such as midpoints, ranges, or lower and upper bounds. This is a limitation, because different datasets might be better represented by different reference points. In this paper, we propose a new method for extracting knowledge from interval data. Our parametrized approach automatically extracts the best reference points from the regressor variables. These reference points are then used to build two linear regressions: one for the lower bounds of the response variable and another for its upper bounds. Before the regressions are applied, we compute a criterion to verify the mathematical coherence of predicted values, that is, that the upper bounds are greater than the lower bounds. If the criterion shows that coherence is not guaranteed, we suggest a novel interval Box-Cox transformation of the response variable. Experimental evaluations with synthetic and real interval datasets illustrate the advantages and usefulness of the proposed method for interval linear regression.
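A minimal sketch of the two-regression scheme under our own simplifying assumptions: each interval regressor is reduced to a reference point r = lo + t·(hi − lo), where the scalar t stands in for the paper's automatically extracted reference points, and separate least-squares fits predict the lower and upper bounds of the response. The coherence check and the interval Box-Cox remedy are only indicated in comments.

```python
import numpy as np

def fit_interval_regression(X_lo, X_hi, y_lo, y_hi, t=0.5):
    """Illustrative sketch, not the authors' code: represent each
    interval regressor by the reference point lo + t*(hi - lo), then
    fit separate least-squares models for the response bounds."""
    R = X_lo + t * (X_hi - X_lo)                # reference points
    A = np.column_stack([np.ones(len(R)), R])   # add intercept column
    beta_lo, *_ = np.linalg.lstsq(A, y_lo, rcond=None)
    beta_hi, *_ = np.linalg.lstsq(A, y_hi, rcond=None)
    return beta_lo, beta_hi

def predict_intervals(beta_lo, beta_hi, X_lo, X_hi, t=0.5):
    R = X_lo + t * (X_hi - X_lo)
    A = np.column_stack([np.ones(len(R)), R])
    lo, hi = A @ beta_lo, A @ beta_hi
    if not np.all(hi >= lo):
        # Mathematical coherence violated; the paper's remedy is an
        # interval Box-Cox transformation of the response (not shown).
        raise ValueError("incoherent prediction: upper bound < lower bound")
    return lo, hi
```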
IEEE International Conference on Fuzzy Systems | 2013
Telmo de Menezes e Silva Filho; Renata M. C. R. Souza
Symbolic data analysis deals with complex data types capable of modeling internal data variability and imprecise data. This paper introduces two Fuzzy Learning Vector Quantization algorithms for interval symbolic data. The first employs an interval Euclidean distance; the second uses a weighted interval Euclidean distance to achieve better classification performance when the data set is composed of classes with varying sizes, shapes and structures. The algorithms are evaluated on synthetic and real data sets. This paper aims to contribute to the area of Supervised Learning within Symbolic Data Analysis.
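A minimal sketch of the two distances involved, under our own reading: the plain interval Euclidean distance compares lower bounds with lower bounds and upper bounds with upper bounds, and the weighted variant scales each feature's contribution by a relevance weight. Names and the weight handling are our assumptions, not the paper's code.

```python
import numpy as np

def interval_euclidean(a_lo, a_hi, b_lo, b_hi):
    """Squared interval Euclidean distance between two interval
    vectors, summing per-feature bound differences."""
    return np.sum((a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2)

def weighted_interval_euclidean(a_lo, a_hi, b_lo, b_hi, w):
    """Weighted variant: w holds one non-negative relevance weight
    per feature, letting classes with different sizes, shapes and
    structures stretch or shrink each dimension (our illustrative
    formulation)."""
    return np.sum(w * ((a_lo - b_lo) ** 2 + (a_hi - b_hi) ** 2))
```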
International Symposium on Neural Networks | 2011
Telmo de Menezes e Silva Filho; Renata M. C. R. de Souza
This paper presents learning vector quantization classifiers with adaptive distances. The classifiers derive discriminant class regions, represented by prototypes, from the input data set. To compare prototypes and patterns, the classifiers use adaptive distances that change at each iteration and differ from one class to another or from one prototype to another. Experiments with real and synthetic data sets demonstrate the usefulness of these classifiers.
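The sketch below illustrates the general LVQ1-style update these classifiers build on: the winning prototype is pulled toward a correctly classified pattern and pushed away otherwise, with per-prototype relevance weights standing in for the adaptive distances. This is a rough sketch under our assumptions, not the paper's exact update rules.

```python
import numpy as np

def lvq1_step(x, y, protos, proto_labels, weights, lr=0.05):
    """One LVQ1 update with an adaptive (weighted) squared Euclidean
    distance; weights[j] holds per-feature relevances for prototype j.
    Illustrative only: the paper derives specific weight updates that
    differ per class or per prototype."""
    d = np.sum(weights * (protos - x) ** 2, axis=1)  # adaptive distances
    j = np.argmin(d)                                 # winning prototype
    sign = 1.0 if proto_labels[j] == y else -1.0     # attract or repel
    protos[j] += sign * lr * (x - protos[j])
    return j
```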
International Conference on Artificial Neural Networks | 2009
Renata M. C. R. de Souza; Telmo de Menezes e Silva Filho
This paper presents a classifier based on Optimized Learning Vector Quantization (an optimized version of the basic LVQ1) and an adaptive Euclidean distance. The classifier derives discriminative class regions, represented by prototypes, from the input data set. To compare prototypes and patterns, the classifier uses an adaptive Euclidean distance that changes at each iteration but is the same for all class regions. Experiments with real and synthetic data sets demonstrate the usefulness of this classifier.
Neural Networks | 2016
Telmo de Menezes e Silva Filho; Renata M. C. R. Souza; Ricardo Bastos Cavalcante Prudêncio
Some complex data types are capable of modeling data variability and imprecision. These data types are studied in the field of symbolic data analysis. One such data type is interval data, which represents ranges of values and is more versatile than classic point data for many domains. This paper proposes a new prototype-based classifier for interval data, trained by a swarm optimization method. Our work has two main contributions: a swarm method capable of performing both automatic feature selection and pruning of unused prototypes, and a generalized weighted squared Euclidean distance for interval data. By discarding unnecessary features and prototypes, the proposed algorithm deals with typical limitations of prototype-based methods, such as the problem of prototype initialization. The proposed distance is useful for learning classes in interval datasets with different shapes, sizes and structures. Compared to other prototype-based methods, the proposed method achieves lower error rates on both synthetic and real interval datasets.
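As a hedged illustration of how one swarm particle could encode both contributions, the snippet below decodes a flat parameter vector into interval prototypes, feature-relevance weights and per-prototype activity flags, so that near-zero relevances discard features and switched-off flags prune prototypes. The layout and threshold are our assumptions; the paper's actual encoding and update rules may differ.

```python
import numpy as np

def decode_particle(theta, n_protos, n_feats, w_thresh=1e-3):
    """Split a flat particle vector into interval prototypes (lower
    and upper bounds), feature relevance weights and prototype
    activity flags (hypothetical layout, for illustration only)."""
    k = n_protos * n_feats
    lo = theta[:k].reshape(n_protos, n_feats)
    hi = lo + np.abs(theta[k:2 * k]).reshape(n_protos, n_feats)  # hi >= lo
    w = np.abs(theta[2 * k:2 * k + n_feats])
    w = np.where(w < w_thresh, 0.0, w)          # automatic feature selection
    active = theta[2 * k + n_feats:] > 0.0      # prune unused prototypes
    return lo, hi, w, active
```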
International Conference on Neural Information Processing | 2012
Telmo de Menezes e Silva Filho; Renata M. C. R. de Souza
Symbolic Data Analysis deals with complex data types capable of modeling internal data variability and imprecise data. This paper introduces a Learning Vector Quantization algorithm for symbolic data that uses a weighted interval Euclidean distance to achieve better classification performance when the dataset is composed of classes of varying structures. This algorithm is compared with a Learning Vector Quantization algorithm that uses the traditional interval Euclidean distance. The algorithms are evaluated and compared on synthetic and real datasets. This paper aims to contribute to the area of Supervised Learning within Symbolic Data Analysis.
International Journal of Business Intelligence and Data Mining | 2017
Renata M. C. R. Souza; Maria P.S. Souza; Telmo de Menezes e Silva Filho; Getúlio J. A. Amaral
Swarm-based optimisation methods have previously been used for tackling clustering tasks, with good results. However, the results obtained by this kind of algorithm are highly dependent on the chosen fitness criterion. In this work, we investigate the influence of four different fitness criteria on swarm-based clustering performance. The first function is the typical sum of distances between instances and their cluster centroids, the most commonly used clustering criterion. The remaining functions are based on three different types of data dispersion: total dispersion, within-group dispersion and between-groups dispersion. We use a swarm-based algorithm to optimise these criteria and perform clustering tasks with nine real and artificial datasets. For each dataset, we select the best criterion in terms of adjusted Rand index and compare it with three state-of-the-art swarm-based clustering algorithms, trained with their proposed criteria. Numerical results confirm the importance of selecting an appropriate fitness criterion for each clustering task.
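A compact sketch of the four fitness criteria as we read them, for a labeled partition with centroids c_j; these are our formulations and the paper's exact definitions may differ in normalization.

```python
import numpy as np

def clustering_criteria(X, labels, centroids):
    """Four candidate fitness criteria for swarm-based clustering:
    sum of instance-to-centroid distances, total dispersion,
    within-group dispersion and between-groups dispersion
    (illustrative formulations)."""
    overall = X.mean(axis=0)
    sum_dist = sum(np.linalg.norm(X[labels == j] - c, axis=1).sum()
                   for j, c in enumerate(centroids))
    total = np.sum((X - overall) ** 2)                 # total dispersion
    within = sum(np.sum((X[labels == j] - c) ** 2)
                 for j, c in enumerate(centroids))     # within-group
    between = sum((labels == j).sum() * np.sum((c - overall) ** 2)
                  for j, c in enumerate(centroids))    # between-groups
    return sum_dist, total, within, between
```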
Electronic Journal of Statistics | 2017
Meelis Kull; Telmo de Menezes e Silva Filho; Peter A. Flach
For optimal decision making under variable class distributions and misclassification costs, a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful nonparametric method that is, however, prone to overfitting on smaller datasets; hence a parametric method based on the logistic sigmoid curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers, including Naive Bayes and AdaBoost, suffer from a particular distortion where these score distributions are heavily skewed. In such cases logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, so logistic calibration can easily uncalibrate a perfectly calibrated classifier. In this paper we solve all these problems with a richer class of parametric calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for a wide range of classifiers: Naive Bayes, AdaBoost, random forest, logistic regression, support vector machine and multi-layer perceptron. If the original classifier is already calibrated, then beta calibration learns a function close to the identity. On this we build a statistical test to recognise whether a model deviates from being well-calibrated.
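The claim that fitting beta calibration is as easy as fitting a logistic curve can be made concrete: the three-parameter map μ(s) = 1 / (1 + 1 / (e^c · s^a / (1 − s)^b)) is recovered by logistic regression on the features ln(s) and −ln(1 − s). The sketch below is a minimal implementation consistent with that description; variable names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_beta_calibration(scores, y, eps=1e-12):
    """Fit a beta calibration map via logistic regression on
    [ln(s), -ln(1 - s)]; the coefficients give a and b, the
    intercept gives c. Note: the paper additionally constrains
    a, b >= 0, which is not enforced in this sketch."""
    s = np.clip(scores, eps, 1 - eps)
    F = np.column_stack([np.log(s), -np.log(1 - s)])
    lr = LogisticRegression(C=1e12).fit(F, y)  # effectively unregularized
    a, b = lr.coef_[0]
    c = lr.intercept_[0]
    return a, b, c

def beta_calibrate(scores, a, b, c, eps=1e-12):
    """Apply the fitted beta calibration map to raw scores."""
    s = np.clip(scores, eps, 1 - eps)
    return 1.0 / (1.0 + np.exp(-(a * np.log(s) - b * np.log(1 - s) + c)))
```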
International Conference on Data Mining | 2016
Miquel Perelló-Nieto; Telmo de Menezes e Silva Filho; Meelis Kull; Peter A. Flach
We introduce a powerful technique to make classifiers more reliable and versatile. Background Check equips classifiers with the ability to assess how much unlabelled test data differs from the training data. In particular, Background Check gives classifiers the capability to (i) perform cautious classification with a reject option, (ii) identify outliers, and (iii) better assess the confidence in their predictions. We derive the method from first principles and consider four particular relationships between background and foreground distributions. One of these assumes an affine relationship with two parameters, and we show how this bivariate parameter space naturally interpolates between the above capabilities. We demonstrate the versatility of the approach by comparing it experimentally with published special-purpose solutions for outlier detection and confident classification on 41 benchmark datasets. Results show that Background Check can match, and in many cases surpass, the performance of specialised approaches.
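As a heavily hedged illustration of the idea, and our own simplification rather than the paper's derivation: estimate a foreground density f from the training data, posit a background density that is an affine function of f with two parameters, and use the resulting posterior probability of background to drive rejection, outlier flagging and confidence discounting.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def background_check(X_train, X_test, alpha=0.5, beta=0.5, bandwidth=1.0):
    """Toy instantiation under our own affine assumption: model the
    background as b(x) = alpha + beta * (f_max - f(x)), where f is a
    kernel density estimate of the foreground (training) data, and
    return p(background | x). High values suggest rejecting the
    prediction or flagging x as an outlier."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_train)
    f = np.exp(kde.score_samples(X_test))             # foreground density
    f_max = np.exp(kde.score_samples(X_train)).max()  # density scale
    b = alpha + beta * (f_max - f)                    # affine background
    return b / (b + f)                                # posterior of background
```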