Eduardo R. Hruschka | Researchain

Publication

Featured research published by Eduardo R. Hruschka.

IEEE Transactions on Systems, Man, and Cybernetics, Part C | 2009

A Survey of Evolutionary Algorithms for Clustering

Eduardo R. Hruschka; Ricardo J. G. B. Campello; Alex Alves Freitas; A. de Carvalho

This paper presents a survey of evolutionary algorithms designed for clustering tasks. It tries to reflect the profile of this area by focusing on the subjects that have received more attention in the literature. In this context, most of the paper is devoted to partitional algorithms that look for hard clusterings of data, though overlapping (i.e., soft and fuzzy) approaches are also covered. The paper is original in two main respects. First, it provides an up-to-date overview that is fully devoted to evolutionary algorithms for clustering, is not limited to any particular kind of evolutionary approach, and covers advanced topics such as multiobjective and ensemble-based evolutionary clustering. Second, it provides a taxonomy that highlights several important aspects of evolutionary data clustering, namely: fixed or variable number of clusters; cluster-oriented or nonoriented operators; context-sensitive or context-insensitive operators; guided or unguided operators; binary, integer, or real encodings; and centroid-based, medoid-based, label-based, tree-based, or graph-based representations, among others. A number of references are provided that describe applications of evolutionary algorithms for clustering in different domains, such as image processing, computer security, and bioinformatics. The paper ends by addressing some important issues and open questions that can be the subject of future research.
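The taxonomy distinguishes, among other things, label-based and centroid-based representations. As a minimal sketch (the function names are hypothetical, and nothing here is taken from a specific surveyed algorithm), the code below decodes a label-based genotype, in which gene j holds the cluster label of object j, into one centroid per cluster, and decodes a centroid-based genotype back into labels by nearest-centroid assignment. Because the number of distinct labels is not fixed, a label-based genotype naturally encodes a variable number of clusters.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def decode_labels(genotype: Sequence[int]) -> Dict[int, List[int]]:
    """Label-based genotype: gene j holds the cluster label of object j."""
    clusters: Dict[int, List[int]] = {}
    for obj, label in enumerate(genotype):
        clusters.setdefault(label, []).append(obj)
    return clusters


def labels_to_centroids(genotype: Sequence[int], data: Sequence[Point]) -> Dict[int, Point]:
    """One centroid per distinct label, so k = number of distinct labels."""
    centroids: Dict[int, Point] = {}
    for label, members in decode_labels(genotype).items():
        # Column-wise mean of the member points
        centroids[label] = tuple(
            sum(col) / len(members) for col in zip(*(data[i] for i in members))
        )
    return centroids


def centroids_to_labels(centroids: Dict[int, Point], data: Sequence[Point]) -> List[int]:
    """Decode a centroid-based genotype by assigning each object to its nearest centroid."""
    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(p, centroids[c])) for p in data]
```

For example, the genotype `[1, 1, 2, 2]` over four points encodes k = 2 clusters, and `labels_to_centroids` turns it into the equivalent centroid-based genotype.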

ACM Computing Surveys | 2013

Data stream clustering: A survey

Jonathan de Andrade Silva; Elaine R. Faria; Rodrigo C. Barros; Eduardo R. Hruschka; André Carlos Ponce Leon Ferreira de Carvalho; João Gama

Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering poses several challenges, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires algorithms capable of fast, incremental processing of data objects that suitably address time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering and presents an overview of the commonly employed experimental methodologies. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.
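One design component that many stream clustering algorithms share is a compact, incrementally updatable summary of each micro-cluster. The sketch below assumes the additive cluster-feature vector (N, LS, SS) from BIRCH, which CluStream extends with time-stamp statistics that are omitted here; the class name is hypothetical. Each insertion costs O(d) time, and the summary uses O(d) memory no matter how many points have arrived, which is what makes single-pass processing feasible.

```python
import math
from typing import List, Sequence


class ClusterFeature:
    """Additive summary (N, LS, SS) of the points absorbed by one micro-cluster."""

    def __init__(self, dim: int) -> None:
        self.n = 0              # number of points absorbed
        self.ls = [0.0] * dim   # per-dimension linear sum
        self.ss = [0.0] * dim   # per-dimension sum of squares

    def insert(self, x: Sequence[float]) -> None:
        """Absorb one point in O(d) time; the point itself is not stored."""
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other: "ClusterFeature") -> None:
        """CF vectors are additive, so two micro-clusters merge by summing components."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        var = sum(sq / self.n - (s / self.n) ** 2 for s, sq in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors
```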

Information Sciences | 2006

Evolving clusters in gene-expression data

Eduardo R. Hruschka; Ricardo J. G. B. Campello; Leandro Nunes de Castro

Clustering is a useful exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, only a few works deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters either to be specified in advance or to be selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based on numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs on six gene-expression datasets in which the correct clusters are known a priori. The results show that all the proposed algorithms perform well on gene-expression data, although statistical comparisons of the computational efficiency of each algorithm indicate that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k.
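The traditional baseline mentioned above, multiple runs of k-means over a range of k, can be sketched as follows. This is not EAC. The sketch assumes Euclidean data, initializes each run with a farthest-first traversal from a random starting point, and keeps the partition with the highest simplified silhouette (each object compared with centroids rather than with all other objects). These choices and the function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_first(data: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Random first centroid, then repeatedly the point farthest from those already chosen."""
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(data: Sequence[Point], k: int, rng: random.Random, iters: int = 100):
    centroids = farthest_first(data, k, rng)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in data]
        # Update step: mean of members (an empty cluster keeps its old centroid)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            new.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def simplified_silhouette(data, labels, centroids) -> float:
    """Average of (b - a) / max(a, b), with a and b measured to centroids (needs k >= 2)."""
    total = 0.0
    for p, lab in zip(data, labels):
        a = math.dist(p, centroids[lab])
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(data)


def best_k(data: Sequence[Point], k_min: int = 2, k_max: int = 10, runs: int = 5, seed: int = 0):
    """Run k-means `runs` times for each k and keep the best-scoring partition."""
    rng = random.Random(seed)
    best = (-math.inf, 0, [])
    for k in range(k_min, min(k_max, len(data)) + 1):
        for _ in range(runs):
            labels, centroids = kmeans(data, k, rng)
            score = simplified_silhouette(data, labels, centroids)
            if score > best[0]:
                best = (score, k, labels)
    return best
```

The cost grows with both the number of runs and the width of the k range, which is the kind of repetition an evolutionary search over k is meant to avoid.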

Fuzzy Sets and Systems | 2006

A fuzzy extension of the silhouette width criterion for cluster analysis

Ricardo J. G. B. Campello; Eduardo R. Hruschka

The present paper proposes a new cluster validity measure as an additional criterion to support decision making in fuzzy cluster analysis. This measure, named Fuzzy Silhouette, is a generalization to the fuzzy case of the Average Silhouette Width Criterion, originally conceived to assess crisp (non-fuzzy) data partitions. The Fuzzy Silhouette is more appealing than its crisp counterpart in the context of fuzzy cluster analysis because it makes explicit use of the fuzzy partition matrix provided by the clustering algorithm. In addition, it has been designed to improve on the original silhouette criterion in detecting regions of higher data density when the data set involves overlapping clusters. The performance of the Fuzzy Silhouette is evaluated and compared to that of five well-known cluster validity measures. Six data sets are used to illustrate different scenarios in which the proposed Fuzzy Silhouette performs similarly to or better than these other criteria, making it eligible to join a pool of measures to be used together in fuzzy cluster analysis.
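As we understand the definition, the Fuzzy Silhouette is a weighted average of the crisp silhouettes s_j = (b_j - a_j) / max(a_j, b_j), where the hard partition is obtained by assigning each object to its cluster of highest membership, and the weight of object j is (u_pj - u_qj)^alpha, with u_pj and u_qj its largest and second-largest memberships (alpha = 1 by default). Objects lying in overlapping regions, whose top two memberships are nearly equal, therefore count less. The sketch below is a straightforward O(N^2) implementation under those assumptions; the function names and the convention s_j = 0 for singleton clusters are our choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def crisp_silhouettes(data: Sequence[Point], labels: Sequence[int]) -> List[float]:
    """Per-object silhouette s_j = (b_j - a_j) / max(a_j, b_j) for a hard partition."""
    clusters = set(labels)

    def mean_dist(j: int, c: int) -> float:
        d = [math.dist(data[j], data[i]) for i in range(len(data)) if labels[i] == c and i != j]
        return sum(d) / len(d)

    s = []
    for j, lj in enumerate(labels):
        if labels.count(lj) == 1:
            s.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(j, lj)  # mean distance to the object's own cluster
        b = min(mean_dist(j, c) for c in clusters if c != lj)  # nearest other cluster
        s.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return s


def fuzzy_silhouette(data: Sequence[Point], memberships: Sequence[Sequence[float]],
                     alpha: float = 1.0) -> float:
    """memberships[j][i] is the membership of object j in cluster i (at least 2 clusters)."""
    # Hard partition induced by the highest membership of each object
    labels = [max(range(len(u)), key=lambda i: u[i]) for u in memberships]
    s = crisp_silhouettes(data, labels)
    num = den = 0.0
    for u, sj in zip(memberships, s):
        first, second = sorted(u, reverse=True)[:2]
        w = (first - second) ** alpha  # small weight for objects in overlapping regions
        num += w * sj
        den += w
    return num / den if den > 0 else 0.0
```

With crisp (0/1) memberships every weight equals 1, so the measure reduces to the ordinary average silhouette width.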

Decision Support Systems | 2014

Tweet sentiment analysis with classifier ensembles

Nádia Félix Felipe da Silva; Eduardo R. Hruschka; Estevam R. Hruschka

Twitter is a microblogging site on which users can post updates (tweets) to friends (followers). It has become an immense dataset of so-called sentiments. In this paper, we introduce an approach that automatically classifies the sentiment of tweets by using classifier ensembles and lexicons. Tweets are classified as either positive or negative with respect to a query term. This approach is useful for consumers, who can use sentiment analysis to search for products, for companies that aim to monitor the public sentiment about their brands, and for many other applications. Sentiment classification in microblogging services (e.g., Twitter) through classifier ensembles and lexicons has not been well explored in the literature. Our experiments on a variety of public tweet sentiment datasets show that classifier ensembles formed by Multinomial Naive Bayes, SVM, Random Forest, and Logistic Regression can improve classification accuracy. We show that classifier ensembles are promising for tweet sentiment analysis. We compare bag-of-words and feature hashing for the representation of tweets. Classifier ensembles obtained from bag-of-words and feature hashing are discussed.
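The two tweet representations compared here differ in how tokens are mapped to vector positions: bag-of-words keeps an explicit vocabulary, while feature hashing maps each token straight to a bucket by hashing it, so no vocabulary has to be stored. The sketch below assumes whitespace tokenization and uses `zlib.crc32` as a stable hash (Python's built-in `hash()` is randomized per process for strings). It also shows majority voting as one simple way to combine base-classifier outputs; that combination rule is an assumption, not necessarily the one used in the paper. All names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately naive: lowercase and split on whitespace
    return text.lower().split()


def bag_of_words(docs: Sequence[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Build an explicit vocabulary, then count tokens per document."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in tokenize(doc):
            vec[vocab[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


def hashed_features(doc: str, n_features: int = 2 ** 10) -> List[int]:
    """Feature hashing: bucket = stable hash of the token mod n_features; no vocabulary kept."""
    vec = [0] * n_features
    for tok in tokenize(doc):
        vec[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return vec


def majority_vote(predictions: Sequence[str]) -> str:
    """Combine base-classifier labels; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]
```

Two tokens can collide in the same bucket; that is the price paid for a fixed-size vector and no stored vocabulary.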

IEEE Transactions on Fuzzy Systems | 2012

Collaborative Fuzzy Clustering Algorithms: Some Refinements and Design Guidelines

Luiz F. S. Coletta; Lucas Vendramin; Eduardo R. Hruschka; Ricardo J. G. B. Campello; Witold Pedrycz

There are several variants of the widely used Fuzzy C-Means (FCM) algorithm that support clustering data distributed across different sites. These methods have been studied under different names, such as collaborative and parallel fuzzy clustering. In this study, we augment the two FCM-based algorithms used to cluster distributed data by providing constructive ways of determining their essential parameters (including the number of clusters) and by forming a set of systematically structured guidelines, such as how to select the specific algorithm depending on the nature of the data environment and on the assumptions made about the number of clusters. A thorough complexity analysis, covering space, time, and communication aspects, is reported. A series of detailed numeric experiments illustrates the main ideas discussed in the study.
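For reference, the centralized FCM iteration that the collaborative and parallel variants build on alternates two updates: prototypes v_i = sum_k u_ik^m x_k / sum_k u_ik^m, and memberships u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), where d_ik is the distance from object k to prototype i. The sketch below implements only this single-site FCM, not the distributed algorithms studied in the paper; the random initialization, the fuzzifier default m = 2, and the stopping rule are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def fcm(data: Sequence[Point], c: int, m: float = 2.0, iters: int = 300,
        tol: float = 1e-9, seed: int = 0) -> Tuple[List[Point], List[List[float]]]:
    """Centralized Fuzzy C-Means; u[k][i] holds u_ik, the membership of object k in cluster i."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial memberships, each row normalized to sum to 1
    u = []
    for _ in data:
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers: List[Point] = []
    for _ in range(iters):
        # Prototype update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            sw = sum(w)
            centers.append(tuple(sum(wk * x[d] for wk, x in zip(w, data)) / sw for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        new_u = []
        for x in data:
            dist = [math.dist(x, v) for v in centers]
            if min(dist) == 0.0:
                # Object coincides with a prototype: give it full membership there
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dist[i] / dist[j]) ** (2 / (m - 1)) for j in range(c))
                              for i in range(c)])
        # Stop when no membership changes by more than tol
        delta = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if delta < tol:
            break
    return centers, u
```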

Neurocomputing | 2006

Extracting Rules from Multilayer Perceptrons in Classification Problems: A Clustering-Based Approach

Eduardo R. Hruschka; Nelson F. F. Ebecken

Multilayer perceptrons adjust their internal parameters by performing vector mappings from the input space to the output space. Although they may achieve high classification accuracy, the knowledge acquired by such neural networks is usually incomprehensible to humans. This is a major obstacle in data mining applications, in which understandable patterns (like classification rules) are very important. Therefore, many algorithms for rule extraction from neural networks have been developed. This work presents a method to extract rules from multilayer perceptrons trained on classification problems. The rule extraction algorithm consists of two steps. First, a clustering genetic algorithm is applied to find clusters of hidden unit activation values. Then, classification rules describing these clusters, in relation to the inputs, are generated. The proposed approach is experimentally evaluated on four benchmark datasets for data mining applications and on a real-world meteorological dataset, leading to interesting results.
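The first step of the approach, grouping a hidden unit's activation values, can be illustrated with a simplified sketch. Here a plain 1-D k-means stands in for the clustering genetic algorithm, a single sigmoid hidden unit with given weights replaces a trained network, and each activation cluster is mapped to its majority class. Translating the resulting activation intervals back into conditions on the inputs (the second step) is not shown. All names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(inputs: Sequence[Sequence[float]], weights: Sequence[float],
                       bias: float) -> List[float]:
    """Activation of one sigmoid hidden unit for every input vector."""
    return [sigmoid(sum(w * x for w, x in zip(weights, xs)) + bias) for xs in inputs]


def cluster_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Plain 1-D k-means (k >= 2), standing in for the clustering genetic algorithm."""
    srt = sorted(values)
    # Initialize centers at evenly spaced order statistics
    centers = [srt[i * (len(srt) - 1) // (k - 1)] for i in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels


def activation_rules(activations: Sequence[float], classes: Sequence[str],
                     k: int) -> List[Tuple[float, float, str]]:
    """One (low, high, majority class) rule per activation cluster, sorted by interval."""
    labels = cluster_1d(activations, k)
    rules = []
    for c in set(labels):
        vals = [a for a, lab in zip(activations, labels) if lab == c]
        cls = Counter(y for y, lab in zip(classes, labels) if lab == c).most_common(1)[0][0]
        rules.append((min(vals), max(vals), cls))
    return sorted(rules)
```

Each returned tuple reads as a rule of the form "if the hidden activation lies in [low, high], predict class".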

International Conference on Data Mining | 2004

Evolutionary algorithms for clustering gene-expression data

Eduardo R. Hruschka; L.N. de Castro; Ricardo J. G. B. Campello

This work deals with the problem of automatically finding optimal partitions in bioinformatics datasets. We propose incremental improvements for a clustering genetic algorithm (CGA), culminating in the evolutionary algorithm for clustering (EAC). The CGA and its modified versions are evaluated on five gene-expression datasets, showing that the proposed EAC is a promising tool for clustering gene-expression data.

Journal of Heuristics | 2009

On the efficiency of evolutionary fuzzy clustering

Ricardo J. G. B. Campello; Eduardo R. Hruschka; Vinicius S. Alves

This paper tackles the problem of showing that evolutionary algorithms for fuzzy clustering can be more efficient than systematic (i.e., repetitive) approaches when the number of clusters in a data set is unknown. To do so, a fuzzy version of an Evolutionary Algorithm for Clustering (EAC) is introduced. A fuzzy cluster validity criterion and a fuzzy local search algorithm are used instead of their hard counterparts employed by EAC. Theoretical complexity analyses of both the systematic and the evolutionary algorithms under consideration are provided. Examples with computational experiments and statistical analyses are also presented.
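A back-of-the-envelope count shows why a systematic approach grows quickly with the largest candidate number of clusters. Assume (our assumption, not the paper's analysis) N objects with n attributes, r FCM runs for each k from 2 to k_max, t iterations per run, and O(Nkn) work per iteration when memberships are computed from normalized inverse-distance powers. Summing over k gives:

```latex
\sum_{k=2}^{k_{\max}} r\,t\,N\,k\,n
  \;=\; r\,t\,N\,n\left(\frac{k_{\max}(k_{\max}+1)}{2}-1\right)
  \;=\; O\!\left(r\,t\,N\,n\,k_{\max}^{2}\right)
```

Evaluating a validity criterion on every candidate partition adds further cost that this count ignores.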

Knowledge-Based Systems | 2015

Simultaneous co-clustering and learning to address the cold start problem in recommender systems

Andre Luiz Vizine Pereira; Eduardo R. Hruschka

Recommender Systems (RSs) are powerful and popular tools for e-commerce. To build their recommendations, RSs make use of varied data sources, which capture the characteristics of items, users, and their transactions. Despite recent advances in RSs, the cold start problem, which arises from the lack of prior information about new users and new items, is still a relevant issue that deserves further attention. To minimize system degradation, a hybrid approach is presented that combines collaborative filtering recommendations with demographic information. The approach is based on an existing algorithm, SCOAL (Simultaneous Co-Clustering and Learning), and can address the (pure) cold start problem, in which no collaborative information (ratings) is available for new users. Relaxing the original assumptions in this way, so that demographic information stands in for the missing information about the new user, produces better predictions. Experiments using real-world datasets show the effectiveness of the proposed approach.
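A minimal sketch of the general idea behind a demographic fallback for pure cold start, assuming that user clusters and item clusters have already been produced by some co-clustering step: the new user is placed in the user cluster whose demographic centroid is nearest, and the prediction is the mean rating of the corresponding co-cluster. SCOAL itself fits a predictive model per co-cluster; the co-cluster mean here is a deliberate simplification, and the function names and inputs are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Key = Tuple[int, int]  # (user cluster, item cluster)


def cocluster_means(ratings: Dict[Tuple[Hashable, Hashable], float],
                    user_cluster: Dict[Hashable, int],
                    item_cluster: Dict[Hashable, int]) -> Dict[Key, float]:
    """Mean observed rating in each (user cluster, item cluster) block."""
    sums: Dict[Key, float] = {}
    counts: Dict[Key, int] = {}
    for (user, item), r in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] = sums.get(key, 0.0) + r
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def demographic_centroids(demographics: Dict[Hashable, Sequence[float]],
                          user_cluster: Dict[Hashable, int]) -> Dict[int, Tuple[float, ...]]:
    """Average demographic vector of the known users in each user cluster."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for user, vec in demographics.items():
        groups.setdefault(user_cluster[user], []).append(vec)
    return {c: tuple(sum(col) / len(vecs) for col in zip(*vecs)) for c, vecs in groups.items()}


def cold_start_predict(new_user_demo: Sequence[float], item: Hashable,
                       demographics, user_cluster, item_cluster, ratings,
                       default: float = 0.0) -> float:
    """Predict a rating for a user with no ratings, using only demographic data."""
    cents = demographic_centroids(demographics, user_cluster)
    nearest = min(cents, key=lambda c: math.dist(new_user_demo, cents[c]))
    means = cocluster_means(ratings, user_cluster, item_cluster)
    return means.get((nearest, item_cluster[item]), default)  # default if block is empty
```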