Renato Cordeiro de Amorim
University of Hertfordshire
Publications
Featured research published by Renato Cordeiro de Amorim.
Pattern Recognition | 2012
Renato Cordeiro de Amorim; Boris Mirkin
This paper represents another step in overcoming a drawback of K-Means, its lack of defense against noisy features, by using feature weights in the criterion. The Weighted K-Means method of Huang et al. (2008, 2004, 2005) [5-7] is extended to the corresponding Minkowski metric for measuring distances. Under the Minkowski metric, the feature weights become intuitively appealing feature rescaling factors in a conventional K-Means criterion. To see how this can be used to address another issue of K-Means, the initial setting, a method that initializes K-Means with anomalous clusters is adapted. The Minkowski-metric-based method is experimentally validated on datasets from the UCI Machine Learning Repository and on generated sets of Gaussian clusters, both as they are and with additional uniform random noise features, and appears competitive with other K-Means-based feature-weighting algorithms.
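As an illustration of the mechanism described above, the sketch below implements the cluster-specific weight update of Minkowski Weighted K-Means, with weights derived from within-cluster feature dispersions. It assumes p > 1; the function names and the small guard constant are my own.

import numpy as np

def mwk_weights(X, labels, centroids, p, k):
    # Per-cluster feature weights from within-cluster dispersions:
    # features that are compact within a cluster get larger weights.
    # Assumes p > 1; the 1e-9 guard against zero dispersion is illustrative.
    n_features = X.shape[1]
    W = np.zeros((k, n_features))
    for ki in range(k):
        members = X[labels == ki]
        D = np.sum(np.abs(members - centroids[ki]) ** p, axis=0) + 1e-9
        for v in range(n_features):
            W[ki, v] = 1.0 / np.sum((D[v] / D) ** (1.0 / (p - 1)))
    return W

def mwk_distance(x, centroid, w, p):
    # Weighted Minkowski dissimilarity; the weights act as rescaling factors.
    return np.sum((w * np.abs(x - centroid)) ** p)

In the assignment step, each point would go to the centroid minimizing mwk_distance under that cluster's weights, which is what makes the weights behave as feature rescaling factors.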
Information Sciences | 2015
Renato Cordeiro de Amorim; Christian Hennig
In this paper we introduce three methods for re-scaling data sets, aiming to improve the likelihood that clustering validity indexes return the true number of spherical Gaussian clusters in the presence of additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using the squared Euclidean, Manhattan, and pth power of the Minkowski distance), Dunn's, Calinski-Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters, with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
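To make the index-based selection step concrete, here is a minimal sketch that assumes the data have already been re-scaled by whatever feature re-scaling factors one uses (the paper's own factors are not reproduced here); KMeans and silhouette_score come from scikit-learn.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_k(X, k_range=range(2, 11)):
    # Return the number of clusters that maximizes the Silhouette index
    # on the (re-scaled) data X.
    best_k, best_s = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        s = silhouette_score(X, labels)
        if s > best_s:
            best_k, best_s = k, s
    return best_k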
Intelligent Data Analysis | 2012
Renato Cordeiro de Amorim; Peter Komisarczuk
Minkowski Weighted K-Means is a variant of K-Means set in Minkowski space that automatically computes weights for features at each cluster. As a variant of K-Means, its accuracy depends heavily on the initial centroids fed to it. In this paper we discuss our experiments comparing six initializations, a random start and five others set in Minkowski space, in terms of accuracy, processing time, and recovery of the Minkowski exponent p. We have found that the Ward method in Minkowski space tends to outperform the other initializations, with the exception of low-dimensional Gaussian models with noise features, on which a modified version of intelligent K-Means excels.
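A rough sketch of a Ward-style initialization over Minkowski distances, using SciPy, follows. This is only an approximation I am supplying for illustration: SciPy's 'ward' linkage formally assumes Euclidean distances, whereas the paper works with a Ward method defined directly in Minkowski space.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def ward_style_init(X, k, p):
    # Agglomerative clustering on pairwise Minkowski distances; the mean of
    # each resulting cluster serves as an initial centroid for (MW)K-Means.
    Z = linkage(pdist(X, metric='minkowski', p=p), method='ward')
    labels = fcluster(Z, t=k, criterion='maxclust')  # labels in 1..k
    return np.array([X[labels == i].mean(axis=0) for i in range(1, k + 1)])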
International Symposium on Computational Intelligence and Informatics | 2012
Renato Cordeiro de Amorim
In this paper we introduce the Constrained Minkowski Weighted K-Means. This algorithm calculates cluster-specific feature weights that can be interpreted as feature rescaling factors thanks to the use of the Minkowski distance. Here, we use a small amount of labelled data to select a Minkowski exponent and to generate clustering constraints based on pairwise must-link and cannot-link rules. We validate our new algorithm on a total of 12 datasets, most of which contain features with uniformly distributed noise, running the algorithm numerous times on each. These experiments ratify the general superiority of using feature weighting in K-Means, particularly when applying the Minkowski distance. We have also found that the use of constrained clustering rules has little effect on the average proportion of correctly clustered entities; however, it considerably improves the maximum of that proportion.
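For intuition, a COP-KMeans-style assignment step that respects must-link and cannot-link rules might look like the sketch below. This is a generic illustration, not the paper's exact procedure, and the dictionary-based constraint encoding is my own.

import numpy as np

def constrained_assign(X, centroids, must_link, cannot_link, dist):
    # Assign each point to its nearest feasible cluster, i.e. one that
    # violates no pairwise constraint with points already assigned.
    labels = np.full(len(X), -1)
    for i in range(len(X)):
        for c in np.argsort([dist(X[i], m) for m in centroids]):
            ml_ok = all(labels[j] == c for j in must_link.get(i, ()) if labels[j] != -1)
            cl_ok = all(labels[j] != c for j in cannot_link.get(i, ()) if labels[j] != -1)
            if ml_ok and cl_ok:
                labels[i] = c
                break
    return labels  # -1 marks points left with no feasible cluster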
Mexican International Conference on Artificial Intelligence | 2012
Renato Cordeiro de Amorim
This paper presents an analysis of the number of iterations K-Means takes to converge under different initializations. We have experimented with seven initialization algorithms on a total of 37 real and synthetic datasets. We have found that hierarchical initializations tend to be the most effective at reducing the number of iterations, especially a divisive algorithm using the Ward criterion when applied to real datasets.
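As a minimal illustration of the kind of measurement involved (not the paper's seven initializations, which include hierarchical ones), scikit-learn exposes the iteration count directly:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Compare iterations-to-convergence for two common initializations
# on synthetic Gaussian blobs.
X, _ = make_blobs(n_samples=500, centers=5, random_state=0)
for init in ('random', 'k-means++'):
    km = KMeans(n_clusters=5, init=init, n_init=1, random_state=0).fit(X)
    print(init, 'converged in', km.n_iter_, 'iterations')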
Neurocomputing | 2016
Renato Cordeiro de Amorim; Vladimir Makarenkov
We consider the Weighted K-Means algorithm with distributed centroids, aimed at clustering data sets with numerical, categorical and mixed types of data. Our approach allows features (i.e., variables) to have different weights at different clusters, thus supporting the intuitive idea that features may have different degrees of relevance at different clusters. We use the Minkowski metric in such a way that feature weights become feature re-scaling factors for any considered exponent. Moreover, we adapt the traditional Silhouette clustering validity index to deal with both numerical and categorical types of features. Finally, we show that our new method usually outperforms traditional K-Means as well as the recently proposed WK-DC clustering algorithm.
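A much-simplified sketch of a weighted per-feature dissimilarity for mixed data follows. Note that the paper represents categorical features via distributed centroids rather than the plain 0/1 mismatch used here, so this only conveys the general shape of the computation; all names are illustrative.

import numpy as np

def mixed_dissimilarity(x, centroid, w, p, is_categorical):
    # Weighted Minkowski term for numerical features; a simple mismatch
    # indicator stands in for the categorical dissimilarity.
    d = 0.0
    for v in range(len(x)):
        delta = float(x[v] != centroid[v]) if is_categorical[v] else abs(x[v] - centroid[v])
        d += (w[v] * delta) ** p
    return d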
Intelligent Data Analysis | 2012
Renato Cordeiro de Amorim; Trevor I. Fenner
In this paper we introduce the Minkowski weighted partition around medoids algorithm (MW-PAM). This extends the popular partition around medoids algorithm (PAM) by automatically assigning K weights to each feature in a dataset, where K is the number of clusters. Our approach utilizes the within-cluster variance of features to calculate the weights and uses the Minkowski metric. We show through many experiments that MW-PAM, particularly when initialized with the Build algorithm (also using the Minkowski metric), is superior to other medoid-based algorithms in terms of both accuracy and identification of irrelevant features.
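The medoid update under a weighted Minkowski dissimilarity can be sketched as below, assuming the cluster's weights have already been computed from within-cluster feature variances; the function name is my own.

import numpy as np

def update_medoid(cluster_points, w, p):
    # Choose the in-cluster point minimizing the total weighted Minkowski
    # dissimilarity to all other members of the cluster.
    costs = [np.sum((w * np.abs(cluster_points - x)) ** p) for x in cluster_points]
    return cluster_points[int(np.argmin(costs))]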
Journal of Classification | 2016
Renato Cordeiro de Amorim
In a real-world data set there is always the possibility, rather high in our opinion, that different features have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data pre-processing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and that this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means-based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since, but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analyzing some of the most popular, or innovative, feature-weighting mechanisms based on K-Means.
Artificial Intelligence Review | 2012
Renato Cordeiro de Amorim; Boris Mirkin; John Q. Gan
In this paper we describe a new method for EEG signal classification in which the classification of one subject's EEG signals is based on features learnt from another subject. The method applies to power spectral density data and assigns class-dependent information weights to individual features. The informative features appear rather similar across subjects, supporting the view that there are subject-independent brain patterns for the same mental task. Classification is done via clustering using the intelligent K-Means algorithm with the most informative features from a different subject. We experimentally compare our method with others.
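Conceptually, the cross-subject step amounts to something like the sketch below, where plain K-Means stands in for intelligent K-Means and the feature-selection rule (keep the top-weighted features) is my own simplification:

import numpy as np
from sklearn.cluster import KMeans

def cross_subject_cluster(weights_subject_a, X_subject_b, k, top=20):
    # Keep the features found most informative on subject A and cluster
    # subject B's power-spectrum data on those features alone.
    keep = np.argsort(weights_subject_a)[-top:]
    return KMeans(n_clusters=k, n_init=10).fit_predict(X_subject_b[:, keep])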
Pattern Recognition | 2017
Renato Cordeiro de Amorim; Andrei Shestakov; Boris Mirkin; Vladimir Makarenkov
Highlights: We generate optimal Minkowski partitions at various values of the exponent p. We define the Minkowski profile based on the average similarity between partitions. The Minkowski profile is highly correlated with ARI vectors related to the ground truth. We define the central Minkowski partition, which can serve as a consensus partition. The Silhouette width should be used for selecting the optimal Minkowski exponent.

The Minkowski weighted K-means (MWK-means) is a recently developed clustering algorithm capable of computing feature weights. The cluster-specific weights in MWK-means follow the intuitive idea that a feature with low variance should have a greater weight than a feature with high variance. The final clustering found by this algorithm depends on the selection of the Minkowski distance exponent. This paper explores the possibility of using the central Minkowski partition in the ensemble of all Minkowski partitions for selecting an optimal value of the Minkowski exponent. The central Minkowski partition appears to be a good consensus partition as well. Furthermore, we discovered some striking correlations between the Minkowski profile, defined as a mapping of the Minkowski exponent values into the average similarity values of the optimal Minkowski partitions, and the Adjusted Rand Index vectors resulting from the comparison of the obtained partitions to the ground truth. Our findings were confirmed by a series of computational experiments involving synthetic Gaussian clusters and real-world data.
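A sketch of the central-partition idea, assuming an ensemble of partitions obtained at different Minkowski exponents; the Adjusted Rand Index is used here as the pairwise similarity for illustration, and the exact similarity measure behind the authors' Minkowski profile is not reproduced.

import numpy as np
from sklearn.metrics import adjusted_rand_score

def central_partition(partitions):
    # The 'central' partition is the one with the highest average pairwise
    # similarity to all other partitions in the ensemble.
    n = len(partitions)
    avg_sim = [np.mean([adjusted_rand_score(partitions[i], partitions[j])
                        for j in range(n) if j != i])
               for i in range(n)]
    return int(np.argmax(avg_sim)), avg_sim  # index of the centre, full profile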