Ricardo J. G. B. Campello



Publication

Featured research published by Ricardo J. G. B. Campello.

Systems, Man, and Cybernetics | 2009

A Survey of Evolutionary Algorithms for Clustering

Eduardo R. Hruschka; Ricardo J. G. B. Campello; Alex Alves Freitas; A. de Carvalho

This paper presents a survey of evolutionary algorithms designed for clustering tasks. It tries to reflect the profile of this area by focusing more on those subjects that have been given more importance in the literature. In this context, most of the paper is devoted to partitional algorithms that look for hard clusterings of data, though overlapping (i.e., soft and fuzzy) approaches are also covered in the paper. The paper is original in two main respects. First, it provides an up-to-date overview that is fully devoted to evolutionary algorithms for clustering, is not limited to any particular kind of evolutionary approach, and comprises advanced topics like multiobjective and ensemble-based evolutionary clustering. Second, it provides a taxonomy that highlights some very important aspects in the context of evolutionary data clustering, namely, fixed or variable number of clusters, cluster-oriented or nonoriented operators, context-sensitive or context-insensitive operators, guided or unguided operators, binary, integer, or real encodings, centroid-based, medoid-based, label-based, tree-based, or graph-based representations, among others. A number of references are provided that describe applications of evolutionary algorithms for clustering in different domains, such as image processing, computer security, and bioinformatics. The paper ends by addressing some important issues and open questions that can be the subject of future research.

Information Sciences | 2006

Evolving clusters in gene-expression data

Eduardo R. Hruschka; Ricardo J. G. B. Campello; Leandro Nunes de Castro

Clustering is a useful exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, there are only a few works that deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters to be either specified in advance or selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based upon numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs using six gene-expression datasets in which the right clusters are known a priori. The results illustrate that all the proposed algorithms perform well in gene-expression data, although statistical comparisons in terms of the computational efficiency of each algorithm point out that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k.

Pattern Recognition Letters | 2007

A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment

Ricardo J. G. B. Campello

A fuzzy extension of the Rand index [Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 846-850] is introduced in this paper. The Rand index is a traditional criterion for assessment and comparison of different results provided by classifiers and clustering algorithms. It is able to measure the quality of different hard partitions of a data set from a classification perspective, including partitions with different numbers of classes or clusters. The original Rand index is extended here by making it able to evaluate a fuzzy partition of a data set - provided by a fuzzy clustering algorithm or a classifier with fuzzy-like outputs - against a reference hard partition that encodes the actual (known) data classes. A theoretical formulation based on formal concepts from the fuzzy set theory is derived and used as a basis for the mathematical interpretation of the proposed Fuzzy Rand Index. The fuzzy counterparts of five other related indexes, namely, the Adjusted Rand Index of Hubert and Arabie, the Jaccard coefficient, the Minkowski measure, the Fowlkes-Mallows Index, and the Γ statistics, are also derived from this formulation.
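For orientation, the original (crisp) Rand index that this paper extends can be sketched as the fraction of object pairs on which two hard partitions agree. The function name `rand_index` is illustrative only, and the sketch covers the crisp index, not the fuzzy extension derived in the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Crisp Rand index between two hard partitions of the same objects.

    A pair of objects counts as an agreement when both partitions place the
    pair in the same cluster, or both place it in different clusters.
    """
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")

    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs


# Identical partitions up to a relabeling of the clusters agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because only pairwise co-membership is compared, the two partitions may use different label names and different numbers of clusters.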

Fuzzy Sets and Systems | 2006

A fuzzy extension of the silhouette width criterion for cluster analysis

Ricardo J. G. B. Campello; Eduardo R. Hruschka

The present paper proposes a new cluster validity measure as an additional criterion to help the decision making process in fuzzy cluster analysis. This measure, named Fuzzy Silhouette, is a generalization to the fuzzy case of the Average Silhouette Width Criterion, originally conceived to assess crisp (non-fuzzy) data partitions. The Fuzzy Silhouette is more appealing than its crisp counterpart in the context of fuzzy cluster analysis since it makes explicit use of the fuzzy partition matrix provided by the clustering algorithm. In addition, it has been designed to improve performance of the original silhouette criterion in detecting regions with higher data density when the data set involves overlapping clusters. The performance of the Fuzzy Silhouette is evaluated and compared to that of five well-known cluster validity measures. Six data sets are used to illustrate different scenarios in which the proposed Fuzzy Silhouette performs similarly to or better than these other criteria, thus becoming eligible to join a pool of measures to be used all together in fuzzy cluster analysis.
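The abstract does not state the formula, so the following is only a sketch under an assumption: that the Fuzzy Silhouette is a weighted average of the crisp silhouettes of the defuzzified partition, with each object weighted by the gap between its two largest memberships raised to a power `alpha`. The function names, the `alpha` default, and the convention of scoring singleton clusters as 0 are illustrative choices and should be checked against the paper.

```python
import math


def crisp_silhouettes(points, labels):
    """Per-object silhouette of a hard partition (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        # Assumed convention: singletons (or a single cluster) score 0.
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the object's own cluster.
        a = sum(math.dist(point, points[k]) for k in own if k != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, points[k]) for k in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def fuzzy_silhouette(points, memberships, alpha=1.0):
    """Membership-weighted average silhouette (assumed form, see lead-in).

    memberships[j][c] is the membership of object j in cluster c; each row
    needs at least two clusters.
    """
    # Defuzzify: assign each object to its highest-membership cluster.
    labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
    silhouettes = crisp_silhouettes(points, labels)

    weights = []
    for row in memberships:
        top = sorted(row, reverse=True)
        weights.append((top[0] - top[1]) ** alpha)

    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, silhouettes)) / total
```

Under this form, objects whose two largest memberships are nearly equal (typically those in overlap regions) contribute little, which is consistent with the abstract's emphasis on higher-density regions.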

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013

Density-Based Clustering Based on Hierarchical Density Estimates

Ricardo J. G. B. Campello; Davoud Moulavi; Joerg Sander

We propose a theoretically and practically improved density-based, hierarchical clustering method, providing a clustering hierarchy from which a simplified tree of significant clusters can be constructed. For obtaining a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms the current, state-of-the-art, density-based clustering methods on a wide variety of real world data.

Automatica | 2004

Optimal expansions of discrete-time Volterra models using Laguerre functions

Ricardo J. G. B. Campello; Gérard Favier; Wagner Caradori do Amaral

This work is concerned with the optimization of Laguerre bases for the orthonormal series expansion of discrete-time Volterra models. The aim is to minimize the number of Laguerre functions associated with a given series truncation error, thus reducing the complexity of the resulting finite-dimensional representation. Fu and Dumont (IEEE Trans. Automatic Control 38(6) (1993) 934) indirectly approached this problem in the context of linear systems by minimizing an upper bound for the error resulting from the truncated Laguerre expansion of impulse response models, which are equivalent to first-order Volterra models. A generalization of the work mentioned above focusing on Volterra models of any order is presented in this paper. The main result is the derivation of analytic strict global solutions for the optimal expansion of the Volterra kernels either using an independent Laguerre basis for each kernel or using a common basis for all the kernels.

SIGKDD Explorations | 2014

Ensembles for unsupervised outlier detection: challenges and research questions (a position paper)

Arthur Zimek; Ricardo J. G. B. Campello; Jörg Sander

Ensembles for unsupervised outlier detection is an emerging topic that has been neglected for a surprisingly long time (although there are reasons why this is more difficult than supervised ensembles or even clustering ensembles). Aggarwal recently discussed algorithmic patterns of outlier detection ensembles, identified traces of the idea in the literature, and remarked on potential as well as unlikely avenues for future transfer of concepts from supervised ensembles. Complementary to his points, here we focus on the core ingredients for building an outlier ensemble, discuss the first steps taken in the literature, and identify challenges for future research.

Statistical and Scientific Database Management | 2013

On the combination of relative clustering validity criteria

Lucas Vendramin; Pablo A. Jaskowiak; Ricardo J. G. B. Campello

Many different relative clustering validity criteria exist that are very useful as quantitative measures for assessing the quality of data partitions. These criteria are endowed with particular features that may make each of them more suitable for specific classes of problems. Nevertheless, the performance of each criterion is usually unknown a priori by the user. Hence, choosing a specific criterion is not a trivial task. A possible approach to circumvent this drawback consists of combining different relative criteria in order to obtain more robust evaluations. However, this approach has so far been applied in an ad-hoc fashion only; its real potential is actually not well-understood. In this paper, we present an extensive study on the combination of relative criteria considering both synthetic and real datasets. The experiments involved 28 criteria and 4 different combination strategies applied to a varied collection of data partitions produced by 5 clustering algorithms. In total, 427,680 partitions of 972 synthetic datasets and 14,000 partitions of a collection of 400 image datasets were considered. Based on the results, we discuss the shortcomings and possible benefits of combining different relative criteria into a committee.
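The abstract does not say which four combination strategies were studied. Purely as an illustration of what combining relative criteria into a committee can look like, the sketch below averages the ranks that several criteria assign to a set of candidate partitions. The mean-rank strategy, the function name `combine_by_mean_rank`, and the criterion names in the example are all hypothetical.

```python
def combine_by_mean_rank(scores_by_criterion):
    """Average the per-criterion ranks of candidate partitions.

    scores_by_criterion maps a criterion name to a list of scores, one per
    candidate partition, oriented so that higher is better (criteria that
    are minimized should be negated first). Returns the mean rank of each
    candidate; rank 0 is best. Ties are broken by candidate order.
    """
    score_lists = list(scores_by_criterion.values())
    if not score_lists:
        raise ValueError("at least one criterion is required")
    n_candidates = len(score_lists[0])
    if any(len(scores) != n_candidates for scores in score_lists):
        raise ValueError("every criterion must score every candidate")

    rank_sums = [0.0] * n_candidates
    for scores in score_lists:
        # Candidate indices ordered from best to worst under this criterion.
        order = sorted(range(n_candidates), key=lambda i: scores[i], reverse=True)
        for rank, candidate in enumerate(order):
            rank_sums[candidate] += rank
    return [total / len(score_lists) for total in rank_sums]


# Hypothetical scores of three candidate partitions under three criteria.
example = {
    "criterion_a": [0.2, 0.9, 0.5],
    "criterion_b": [1.0, 3.0, 2.0],
    "criterion_c": [0.7, 0.6, 0.1],
}
mean_ranks = combine_by_mean_rank(example)
best_candidate = min(range(len(mean_ranks)), key=mean_ranks.__getitem__)
print(best_candidate, mean_ranks)
```

Working on ranks rather than raw scores avoids having to put criteria with different ranges on a common scale, at the cost of discarding how large the score differences are.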

Knowledge Discovery and Data Mining | 2013

Subsampling for efficient and effective unsupervised outlier detection ensembles

Arthur Zimek; Matthew Gaudet; Ricardo J. G. B. Campello; Jörg Sander

Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples further improves the results. While in the literature so far the intuition that ensembles improve over single outlier detectors has just been transferred from the classification literature, here we also justify analytically why ensembles are also expected to work in the unsupervised area of outlier detection. As a side effect, running an ensemble of several outlier detectors on subsamples of the dataset is more efficient than ensembles based on other means of introducing diversity and, depending on the sample rate and the size of the ensemble, can be even more efficient than just the single outlier detector on the complete data.
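The idea of scoring every object against several random subsamples and combining the results can be sketched as follows. The base detector (distance to the k-th nearest neighbor), the averaging of scores, and all names and default parameters are assumptions made for illustration; the abstract does not fix any of them.

```python
import math
import random


def subsample_ensemble_scores(points, k=2, sample_rate=0.5, n_members=10, seed=0):
    """Average k-NN-distance outlier scores over random subsamples.

    Each ensemble member draws a subsample without replacement and scores
    every object by the distance to its k-th nearest neighbor within that
    subsample (the object itself is excluded). Higher means more outlying.
    """
    rng = random.Random(seed)
    n = len(points)
    # The subsample needs k neighbors even after excluding the scored object.
    sample_size = max(k + 1, int(sample_rate * n))
    if sample_size > n:
        raise ValueError("not enough points for the requested k")

    totals = [0.0] * n
    for _ in range(n_members):
        sample = rng.sample(range(n), sample_size)
        for i in range(n):
            dists = sorted(
                math.dist(points[i], points[j]) for j in sample if j != i
            )
            totals[i] += dists[k - 1]
    return [total / n_members for total in totals]


# Eight points near the origin plus one distant point (index 8).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (2, 1), (1, 2), (100, 100)]
scores = subsample_ensemble_scores(data)
print(scores.index(max(scores)))
```

Each member only computes distances to `sample_size` reference points rather than to the full dataset, which is where the efficiency argument in the abstract comes from.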

IEEE Transactions on Fuzzy Systems | 2012

Collaborative Fuzzy Clustering Algorithms: Some Refinements and Design Guidelines

Luiz F. S. Coletta; Lucas Vendramin; Eduardo R. Hruschka; Ricardo J. G. B. Campello; Witold Pedrycz

There are some variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. Those methods have been studied under different names, like collaborative and parallel fuzzy clustering. In this study, we offer some augmentation of the two FCM-based clustering algorithms used to cluster distributed data by arriving at some constructive ways of determining essential parameters of the algorithms (including the number of clusters) and forming a set of systematically structured guidelines such as a selection of the specific algorithm depending on the nature of the data environment and the assumptions being made about the number of clusters. A thorough complexity analysis, including space, time, and communication aspects, is reported. A series of detailed numeric experiments is used to illustrate the main ideas discussed in the study.
