Heitor Murilo Gomes

Pontifícia Universidade Católica do Paraná

Publication

Featured research published by Heitor Murilo Gomes.

ACM Computing Surveys | 2017

A Survey on Ensemble Learning for Data Stream Classification

Heitor Murilo Gomes; Jean Paul Barddal; Fabrício Enembreck; Albert Bifet

Ensemble-based methods are among the most widely used techniques for data stream classification. Their popularity is attributable to their good performance in comparison to strong single learners while being relatively easy to deploy in real-world applications. Ensemble algorithms are especially useful for data stream learning as they can be integrated with drift detection algorithms and incorporate dynamic updates, such as selective removal or addition of classifiers. This work proposes a taxonomy for data stream ensemble learning as derived from reviewing over 60 algorithms. Important aspects such as combination, diversity, and dynamic updates, are thoroughly discussed. Additional contributions include a listing of popular open-source tools and a discussion about current data stream research challenges and how they relate to ensemble learning (big data streams, concept evolution, feature drifts, temporal dependencies, and others).

Machine Learning | 2017

Adaptive random forests for evolving data stream classification

Heitor Murilo Gomes; Albert Bifet; Jesse Read; Jean Paul Barddal; Fabrício Enembreck; Bernhard Pfahringer; Geoff Holmes; Talel Abdessalem

Random forests is currently one of the most used machine learning algorithms in the non-streaming (batch) setting. This preference is attributable to its high learning performance and low demands with respect to input preparation and hyper-parameter tuning. However, in the challenging context of evolving data streams, there is no random forests algorithm that can be considered state-of-the-art in comparison to bagging and boosting based algorithms. In this work, we present the adaptive random forest (ARF) algorithm for classification of evolving data streams. In contrast to previous attempts of replicating random forests for data stream learning, ARF includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets. We present experiments with a parallel implementation of ARF which has no degradation in terms of classification performance in comparison to a serial implementation, since trees and adaptive operators are independent from one another. Finally, we compare ARF with state-of-the-art algorithms in a traditional test-then-train evaluation and a novel delayed labelling evaluation, and show that ARF is accurate and uses a feasible amount of resources.

Journal of Systems and Software | 2017

A survey on feature drift adaptation

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck; Bernhard Pfahringer

This paper provides insights into a nearly neglected type of drift: feature drifts. Existing works on feature drift detection and adaptation are surveyed. Existing works are empirically analyzed showing how challenging this problem is. This paper provides insights into future directions for research into feature drift detection and adaptation.

Data stream mining is a fast growing research topic due to the ubiquity of data in several real-world problems. Given their ephemeral nature, data stream sources are expected to undergo changes in data distribution, a phenomenon called concept drift. This paper focuses on one specific type of drift that has not yet been thoroughly studied, namely feature drift. Feature drift occurs whenever a subset of features becomes, or ceases to be, relevant to the learning task; thus, learners must detect and adapt to these changes accordingly. We survey existing work on feature drift adaptation with both explicit and implicit approaches. Additionally, we benchmark several algorithms and a naive feature drift detection approach using synthetic and real-world datasets. The results from our experiments indicate the need for future research in this area as even naive approaches produced gains in accuracy while reducing resources usage. Finally, we state current research topics, challenges and future directions for feature drift adaptation.

International Conference on Tools with Artificial Intelligence | 2015

A Survey on Feature Drift Adaptation

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck

Mining data streams is of the utmost importance because they appear in many real-world situations, such as sensor networks, stock market analysis, and computer network intrusion detection systems. Data streams are, by definition, potentially unbounded sequences of data that arrive intermittently at rapid rates. Extracting useful knowledge from data streams embeds virtually all problems from conventional data mining with the addition of single-pass real-time processing within limited time and memory space. Additionally, due to their ephemeral nature, streams are expected to undergo changes in their data distribution, called concept drifts. In this work, we focus on one specific kind of concept drift that has not been extensively addressed in the literature, namely feature drift. A feature drift happens when changes occur in the set of features, such that a subset of features becomes, or ceases to be, relevant to the learning problem. Specifically, changes in the relevance of features directly imply modifications in the decision boundary to be learned; thus, the learner must detect and adapt to them accordingly. Timely detection of and recovery from feature drifts is a challenging task that can be modeled as a dynamic feature selection problem. In this paper we survey existing work on dynamic feature selection for data streams that acts either implicitly or explicitly. We conclude that there is a need for future research in this area, which we highlight as future research directions.

ACM Symposium on Applied Computing | 2015

SNCStream: a social network-based data stream clustering algorithm

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck

Data Stream Clustering is an active area of research which requires efficient algorithms capable of finding and updating clusters incrementally. On top of that, due to the inherent evolving nature of data streams, it is expected that these algorithms manage to quickly adapt to both concept drifts and the appearance and disappearance of clusters. Nevertheless, many of the developed two-step algorithms are only capable of finding hyper-spherical clusters and are highly dependent on parametrization. In this paper we introduce SNCStream, a one-step online clustering algorithm based on Social Networks Theory, which uses homophily to find non-hyper-spherical clusters. Our empirical studies show that SNCStream is able to surpass density-based algorithms in cluster quality and requires a feasible amount of resources (time and memory) when compared to other algorithms.

ACM Symposium on Applied Computing | 2014

SFNClassifier: a scale-free social network method to handle concept drift

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck

In this paper, we present a new ensemble method, the Scale-free Network Classifier (SFNClassifier), that is conceived as a dynamic sized scale-free network. In Data Stream Mining, ensemble-based approaches have been proposed to enhance accuracy and allow fast recovery from concept drift. However, these approaches are based on both update and polling heuristics that do not present good accuracy results in arbitrary domains and do not represent explicitly the similarity between classifiers. The representation of the ensemble as a network allows us to extract centrality metrics, which are used to perform a weighted majority vote, where the weight of a classifier is proportional to its centrality value. Based on empirical studies, we concluded that SFNClassifier has comparable results to other ensemble-learners in terms of accuracy and outperformed the other methods in processing time.
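The centrality-weighted majority vote described above can be sketched as follows. The abstract does not say which centrality metric is used, so degree centrality is an assumption here; the classifier names, edges, and predictions are likewise hypothetical and chosen only to show the mechanism.

```python
from collections import defaultdict


def degree_centrality(edges, nodes):
    """Degree of each node divided by the maximum possible degree."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    denom = max(len(nodes) - 1, 1)
    return {n: d / denom for n, d in degree.items()}


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for clf, label in predictions.items():
        totals[label] += weights[clf]
    return max(totals, key=totals.get)


# Hypothetical ensemble: c1 is a hub connected to every other classifier.
nodes = ["c1", "c2", "c3", "c4"]
edges = [("c1", "c2"), ("c1", "c3"), ("c1", "c4")]
centrality = degree_centrality(edges, nodes)

predictions = {"c1": "spam", "c2": "ham", "c3": "ham", "c4": "spam"}
winner = weighted_vote(predictions, centrality)
```

An unweighted vote would tie at two votes per label in this example; weighting by centrality lets the hub classifier `c1` break the tie in favour of its label.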

European Conference on Machine Learning | 2016

On Dynamic Feature Weighting for Feature Drifting Data Streams

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck; Bernhard Pfahringer; Albert Bifet

The ubiquity of data streams has been encouraging the development of new incremental and adaptive learning algorithms. Data stream learners must be fast, memory-bounded, but mainly, tailored to adapt to possible changes in the data distribution, a phenomenon named concept drift. Recently, several works have shown the impact of a so far nearly neglected type of drift: feature drifts. Feature drifts occur whenever a subset of features becomes, or ceases to be, relevant to the learning task. In this paper we (i) provide insights into how the relevance of features can be tracked as a stream progresses according to information theoretical Symmetrical Uncertainty; and (ii) how it can be used to boost two learning schemes: Naive Bayesian and k-Nearest Neighbor. Furthermore, we investigate the usage of these two new dynamically weighted learners as prediction models in the leaves of the Hoeffding Adaptive Tree classifier. Results show improvements in accuracy (an average of 10.69 % for k-Nearest Neighbor, 6.23 % for Naive Bayes and 4.42 % for Hoeffding Adaptive Trees) in both synthetic and real-world datasets at the expense of a bounded increase in both memory consumption and processing time.
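Symmetrical Uncertainty is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is entropy and I is mutual information. The sketch below computes it for discrete values over a fixed batch of instances; how the paper maintains these statistics incrementally over a stream is not stated in the abstract, so the batch computation and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a score in [0, 1]."""
    h_x, h_y = entropy(xs), entropy(ys)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing can be shared
    h_xy = entropy(list(zip(xs, ys)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy      # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Toy window of four instances, made up for this example.
labels = [0, 0, 1, 1]
relevant = ["a", "a", "b", "b"]      # determines the label exactly
irrelevant = ["a", "b", "a", "b"]    # independent of the label

su_relevant = symmetrical_uncertainty(relevant, labels)
su_irrelevant = symmetrical_uncertainty(irrelevant, labels)
```

A feature that fully determines the class scores 1 and an independent feature scores 0, so the score can serve directly as a feature weight. Recomputing it over recent instances is one way such weights could follow a feature drift.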

Computational Intelligence and Data Mining | 2013

SAE: Social Adaptive Ensemble classifier for data streams

Heitor Murilo Gomes; Fabrício Enembreck

This work encompasses the development of a new ensemble classifier that uses a Social Network abstraction for Data Stream Classification, namely the Social Adaptive Ensemble (SAE). In the context of data stream classification, concept drift is considered one of the most difficult and important issues to be addressed. Ensemble classifiers can be successfully applied to data streams as long as the ensemble efficiently adapts itself when a concept drift occurs. The SAE algorithm inherits strategies from other ensemble methods, such as Online Bagging [4] and DWM [2], and merges these with the notion of connectivity between similar classifiers w.r.t. their individual predictions. The relational data obtained through measuring similarities between classifiers is used to arrange ensemble members in a social network structure that allows us to identify subgroups (subnetworks) of highly similar classifiers. Being able to identify similar classifiers allows us to implement a combination strategy that first combines predictions within similar classifiers and later combines these into the final prediction. Moreover, this combination strategy assigns more weight to the predictions of recently added classifiers during concept drifts, since these are dissimilar to all other existing classifiers. The similarity between classifiers is also used to identify and remove redundant classifiers. This effectively saves system resources and sometimes improves accuracy. We present empirical experiments with synthetic data streams containing abrupt, gradual and no drift showing that SAE is a valid option for stream classification, especially when data stream characteristics (e.g. presence of abrupt drifts) are previously unknown and system resources, such as CPU time and memory space, are a concern.
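The two-stage combination described above (combine within groups of similar classifiers, then across groups) can be sketched roughly as follows. The agreement measure, the 0.75 threshold, the use of connected components as subnetworks, and plain majority voting at both stages are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations


def agreement(a, b):
    """Fraction of past instances on which two classifiers predicted the same label."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def similar_groups(history, threshold):
    """Group classifiers into connected components of the similarity network."""
    names = list(history)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if agreement(history[a], history[b]) >= threshold:
            parent[find(a)] = find(b)  # connect the two classifiers

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


def two_stage_vote(groups, current):
    """Majority vote inside each group, then majority vote across group results."""
    group_votes = [Counter(current[n] for n in g).most_common(1)[0][0] for g in groups]
    return Counter(group_votes).most_common(1)[0][0]


# Hypothetical prediction histories: c1, c2 and c3 are redundant copies.
history = {
    "c1": [1, 1, 0, 1],
    "c2": [1, 1, 0, 1],
    "c3": [1, 1, 0, 1],
    "c4": [0, 0, 1, 0],
    "c5": [0, 1, 1, 1],
}
groups = similar_groups(history, threshold=0.75)
current = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
final = two_stage_vote(groups, current)
```

A flat majority vote would return "A" here (three votes to two). The two-stage vote counts the three redundant classifiers as one group and returns "B", which shows how dissimilar classifiers, such as ones added after a drift, gain influence.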

Information Systems | 2016

SNCStream+: Extending A High Quality True Anytime Data Stream Clustering Algorithm

Jean Paul Barddal; Heitor Murilo Gomes; Fabrício Enembreck; Jean-Paul A. Barthès

Data Stream Clustering is an active area of research which requires efficient algorithms capable of finding and updating clusters incrementally as data arrives. On top of that, due to the inherent evolving nature of data streams, it is expected that algorithms undergo both concept drifts and evolutions, which must be taken into account by the clustering algorithm, allowing incremental clustering updates. In this paper we present the Social Network Clusterer Stream+ (SNCStream+). SNCStream+ tackles the data stream clustering problem as a network formation and evolution problem, where instances and micro-clusters form clusters based on homophily. Our proposal has its parameters analyzed and it is evaluated in a broad set of problems against literature baselines. Results show that SNCStream+ achieves superior clustering quality (CMM), and feasible processing time and memory space usage when compared to the original SNCStream and other proposals of the literature.

ACM Symposium on Applied Computing | 2014

SAE2: advances on the social adaptive ensemble classifier for data streams

Heitor Murilo Gomes; Fabrício Enembreck

This work presents SAE2, a dynamic ensemble classifier for data stream classification that is built on the Social Adaptive Ensemble (SAE). Similarly to its predecessor, SAE2 maintains an ensemble of classifiers arranged as a network in which connections are created between two classifiers if they have similar predictions. In comparison with SAE, SAE2 includes a more scalable adaptation method, achieved by updating classifiers' connection weights before they are added to the ensemble; an alternative combination method based on maximal cliques; a voting strategy based on weighted majority, which diminishes prediction ties; and some other minor enhancements, such as a threshold to limit the maximum ensemble size. We present empirical experiments with synthetic and real data streams containing gradual, abrupt and no drift and compare SAE2 to state-of-the-art data stream classifiers. Our analysis shows that SAE2 is able to deliver high accuracy on a multitude of data streams, sometimes outperforming state-of-the-art data stream classifiers, without demanding as much processing time and memory as some of them.