Roberto Souto Maior de Barros

Publication

Featured research published by Roberto Souto Maior de Barros.

Pattern Recognition Letters | 2013

RCD: A recurring concept drift framework

Paulo M. Goncalves; Roberto Souto Maior de Barros

This paper presents recurring concept drifts (RCD), a framework that offers an alternative approach to handling data streams that suffer from recurring concept drifts (on-line learning). It creates a new classifier for each context found and stores a sample of the data used to build it. When a new concept drift occurs, the algorithm compares the new context to previous ones using a non-parametric multivariate statistical test to verify whether both contexts come from the same distribution. If so, the corresponding classifier is reused. The RCD framework is compared with several algorithms (both single-classifier and ensemble approaches) on artificial and real data sets, chosen from algorithms and data sets frequently used in concept drift research. We claim that the proposed framework achieved better average ranks on data sets with abrupt and gradual concept drifts than both the single classifiers and the ensemble approaches that use the same base learner.
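The reuse step can be sketched in Python as below. This is a minimal sketch under stated assumptions. The abstract does not name the multivariate test, so an energy-distance permutation test stands in for it here. The names `ContextRepository`, `same_distribution`, and `build_classifier`, along with the `alpha` and `permutations` defaults, are hypothetical and are not RCD's actual API.

```python
import math
import random


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_statistic(xs, ys):
    """V-statistic form of the energy distance between two samples of vectors."""
    between = sum(_euclidean(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    within_x = sum(_euclidean(a, b) for a in xs for b in xs) / len(xs) ** 2
    within_y = sum(_euclidean(a, b) for a in ys for b in ys) / len(ys) ** 2
    return 2 * between - within_x - within_y


def same_distribution(xs, ys, alpha=0.05, permutations=200, seed=0):
    """Stand-in test (assumption): True if 'same distribution' is not rejected."""
    rng = random.Random(seed)
    observed = energy_statistic(xs, ys)
    pooled = list(xs) + list(ys)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (permutations + 1)
    return p_value >= alpha


class ContextRepository:
    """Hypothetical store of (data sample, classifier) pairs, one per context."""

    def __init__(self):
        self.contexts = []

    def on_drift(self, new_sample, build_classifier):
        # Reuse the classifier of the first stored context whose sample
        # appears to come from the same distribution as the new one.
        for sample, classifier in self.contexts:
            if same_distribution(sample, new_sample):
                return classifier
        # Otherwise treat this as a new context: build and store a classifier.
        classifier = build_classifier(new_sample)
        self.contexts.append((new_sample, classifier))
        return classifier
```

Each permutation recomputes all pairwise distances, so one comparison costs O(permutations · n²). That is acceptable only for small stored samples, which fits the idea of keeping a sample per context rather than the whole history.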

Expert Systems With Applications | 2014

A Comparative Study on Concept Drift Detectors

Paulo M. Goncalves; Silas Garrido Teixeira de Carvalho Santos; Roberto Souto Maior de Barros; Davi C. de L. Vieira

In data stream environments, drift detection methods are used to identify when the context has changed. This paper evaluates eight different concept drift detectors (DDM, EDDM, PHT, STEPD, DoF, ADWIN, Paired Learners, and ECDD) and performs tests using artificial datasets affected by abrupt and gradual concept drifts, with several rates of drift, with and without noise and irrelevant attributes, as well as real-world datasets. In addition, a 2^k factorial design was used to identify the parameters that most influence performance, which is a novelty in the area. A variation of the Friedman non-parametric statistical test was also used to identify the best methods. Experiments compared accuracy, evaluation time, and false alarm and miss detection rates. Additionally, we used the Mahalanobis distance to measure how similar each method's detections are to the best possible detection output. This work can, to some extent, also be seen as a survey of existing drift detection methods.
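False alarms and missed detections can be counted as in the Python sketch below. The matching rule is an assumption made for illustration. A detection counts as correct if it falls within a hypothetical `tolerance` window after a true drift that has not yet been matched. The paper's exact criteria may differ.

```python
def detection_metrics(true_drifts, detections, tolerance):
    """Count correct detections, false alarms, and missed drifts.

    A detection d matches the earliest unmatched true drift t with
    t <= d <= t + tolerance; any other detection is a false alarm.
    """
    matched = set()
    false_alarms = 0
    for d in sorted(detections):
        hit = next(
            (t for t in sorted(true_drifts)
             if t not in matched and t <= d <= t + tolerance),
            None,
        )
        if hit is None:
            false_alarms += 1
        else:
            matched.add(hit)
    return {
        "detected": len(matched),
        "false_alarms": false_alarms,
        "missed": len(set(true_drifts)) - len(matched),
    }


# detection_metrics([1000, 2000], [1030, 1500, 2040, 2060], tolerance=100)
# returns {'detected': 2, 'false_alarms': 2, 'missed': 0}
```

In this sketch a second detection after the same drift (2060 above) is counted as a false alarm. This is a deliberate choice that penalizes detectors that fire repeatedly on one change.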

Journal of Systems and Software | 2015

A large-scale study on the usage of Java's concurrent programming constructs

Gustavo Pinto; Weslley Torres; Benito Fernandes; Fernando Castor; Roberto Souto Maior de Barros

Highlights: An analysis of 2227 Java projects, comprising more than 650 million lines of code. Seventy-seven percent of the projects create threads or employ a concurrency control mechanism. Concurrent programming constructs are used both frequently and intensively. Adoption of java.util.concurrent is moderate (23% of the concurrent projects use it). Efficient and safe data structures, e.g., ConcurrentHashMap, are not yet widely used.

In both academia and industry, there is a strong belief that multicore technology will radically change the way software is built. However, little is known about the current state of use of concurrent programming constructs. In this work we present an empirical study of the usage of concurrent programming constructs in 2227 real-world, stable, and mature Java projects from SourceForge. We studied the usage of concurrent techniques in the most recent versions of these applications and also how usage has evolved over time. The main findings of our study are: (I) More than 75% of the latest versions of the projects either explicitly create threads or employ some concurrency control mechanism. (II) More than half of these projects exhibit at least 47 synchronized methods and 3 implementations of the Runnable interface per 100,000 LoC, which means that concurrent programming constructs are not only used often but also employed intensively. (III) The adoption of the java.util.concurrent library is only moderate (approximately 23% of the concurrent projects employ it). (IV) Efficient and thread-safe data structures, such as ConcurrentHashMap, are not yet widely used, despite their numerous advantages.

International Conference on Tools with Artificial Intelligence | 2015

A Lightweight Concept Drift Detection Ensemble

Bruno Iran Ferreira Maciel; Silas Garrido Teixeira de Carvalho Santos; Roberto Souto Maior de Barros

Uncovering information from large data streams containing changes in the data distribution (concept drift) makes online learning a challenge that is becoming progressively more relevant. This paper proposes Drift Detection Ensemble (DDE), a small ensemble classifier that aggregates the warnings and drift detections of three concept drift detectors, aiming to improve on the results of the individual methods by using different strategies and configurations. DDE was programmed to use different default combinations of detectors depending on the chosen sensitivity of the ensemble. Experiments compared DDE against six drift detectors in their default configurations on multiple artificial datasets containing concept drifts of different frequencies and durations, as well as on real-world datasets. Our results indicate that the two best methods were DDE versions, which were statistically superior to several detectors.
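One way to aggregate detector outputs is sketched below in Python. It is an assumption-laden illustration, not DDE's published strategy. The hypothetical `DetectorVote` remembers which detectors have signalled a drift and reports an ensemble drift once a hypothetical `sensitivity` count of them agree. It then resets.

```python
from enum import Enum


class Signal(Enum):
    STABLE = 0
    WARNING = 1
    DRIFT = 2


class DetectorVote:
    """Hypothetical aggregation of several drift detectors' outputs."""

    def __init__(self, n_detectors, sensitivity):
        self.sensitivity = sensitivity  # detectors that must agree on a drift
        self.fired = [False] * n_detectors

    def update(self, signals):
        # Remember every detector that has signalled a drift so far.
        for i, signal in enumerate(signals):
            if signal is Signal.DRIFT:
                self.fired[i] = True
        if sum(self.fired) >= self.sensitivity:
            self.fired = [False] * len(self.fired)  # start a new concept
            return Signal.DRIFT
        # Partial agreement or any individual warning is reported as a warning.
        if any(self.fired) or Signal.WARNING in signals:
            return Signal.WARNING
        return Signal.STABLE
```

Under this rule a lower `sensitivity` makes the ensemble react sooner, at the cost of inheriting more of each member's false alarms.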

European Conference on Machine Learning | 2014

Speeding up recovery from concept drifts

Silas Garrido Teixeira de Carvalho Santos; Paulo Mauricio Gonçalves Júnior; Geyson Daniel dos Santos Silva; Roberto Souto Maior de Barros

The extraction of knowledge from data streams is an activity in progressively increasing demand. However, in this type of environment, changes in the data distribution, or concept drift, can occur constantly and pose a challenge. This paper proposes Adaptable Diversity-based Online Boosting (ADOB), a modified version of the Online Boosting algorithm proposed by Oza and Russell, aimed at speeding up the experts' recovery after concept drifts. We performed experiments comparing the accuracy, execution time, and memory use of ADOB against a number of other methods on several artificial and real-world datasets, chosen from the most used ones in the area. Results suggest that, in many different situations, the proposed approach maintains high accuracy and outperforms the other tested methods in regularity, with no significant change in execution time and memory use. In particular, ADOB was especially efficient in situations where frequent and abrupt concept drifts occur.

Expert Systems With Applications | 2017

RDDM: Reactive drift detection method

Roberto Souto Maior de Barros; Danilo Rafael de Lima Cabral; Paulo M. Goncalves; Silas Garrido Teixeira de Carvalho Santos

Concept drift detectors are online learning software components that mostly attempt to estimate drift positions in data streams in order to modify the base classifier after these changes and improve accuracy. This is very important in applications such as the detection of anomalies in TCP/IP traffic and/or fraud in financial transactions. Drift Detection Method (DDM) is a simple, efficient, well-known method whose performance is often impaired when concepts are very long. This article proposes the Reactive Drift Detection Method (RDDM), which is based on DDM and, among other modifications, discards older instances of very long concepts in order to detect drifts earlier, improving the final accuracy. Experiments run in MOA, using abrupt and gradual concept drift versions of different dataset generators and sizes (48 artificial datasets in total), as well as three real-world datasets, suggest that RDDM beats the accuracy results of DDM, ECDD, and STEPD in most scenarios.
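For reference, the DDM rule that RDDM builds on can be sketched in Python. It tracks the online error rate p and its standard deviation s = sqrt(p(1 - p)/n), records p and s at the minimum of p + s, and signals a warning or a drift when p + s exceeds p_min + 2·s_min or p_min + 3·s_min. The levels 2 and 3 and the 30-instance minimum follow commonly used DDM defaults. RDDM's additional mechanisms, such as discarding older instances of very long concepts, are not reproduced here.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al.) over a stream of 0/1 errors."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0            # running error rate
        self.p_min = math.inf   # error rate at the minimum of p + s
        self.s_min = math.inf   # standard deviation at the minimum of p + s

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; return the state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

Because s shrinks roughly as 1/sqrt(n), the statistics of a very long concept become slow to move. This is the weakness the RDDM abstract targets.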

International Joint Conference on Neural Networks | 2016

A Boosting-like Online Learning Ensemble

Roberto Souto Maior de Barros; Silas Garrido Teixeira de Carvalho Santos; Paulo Mauricio Gonçalves Júnior

Changes in the data distribution (concept drift) make online learning a challenge that is progressively attracting more attention. This paper proposes the Boosting-like Online Learning Ensemble (BOLE), based on heuristic modifications to Adaptable Diversity-based Online Boosting (ADOB), which is itself a modified version of Oza and Russell's Online Boosting. More precisely, we empirically investigate the effects of (a) weakening the requirements for allowing the experts to vote and (b) changing the concept drift detection method used internally, aiming to improve the ensemble accuracy. BOLE was tested against the original and other modified versions of both boosting methods, as well as three renowned ensembles, on well-known artificial and real-world datasets, and it statistically surpassed the accuracies of both boosting methods and of the three ensembles. Accuracy improved in most tested situations, most evidently on the datasets with more concept drifts, where the accuracy gains were very high.

International Conference on Tools with Artificial Intelligence | 2015

Optimizing the Parameters of Drift Detection Methods Using a Genetic Algorithm

Silas Garrido Teixeira de Carvalho Santos; Roberto Souto Maior de Barros; Paulo Mauricio Gonçalves Júnior

Extracting knowledge from environments with a continuous flow of data (data streams) is progressively receiving more attention. In such environments, the data distribution usually changes over time, which is known as concept drift. This paper presents a genetic algorithm aimed at adjusting the parameters of concept drift detection methods to improve their accuracy. Experiments were performed with four drift detectors, comparing the results obtained with the parameter values given in their original proposals against those obtained with the average of the values returned by the genetic algorithm on multiple datasets containing the same type of concept drift. Tests were performed on nine artificial datasets, each with abrupt, slow gradual, and fast gradual concept drift versions, as well as on three real-world datasets. Results indicate statistically significant increases in predictive accuracy in many situations.
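A compact genetic algorithm of this kind can be sketched in Python. In the paper, the fitness of a parameter vector would be the accuracy a detector achieves with those parameters. Here `fitness` is any caller-supplied function. The operators (binary tournament selection, blend crossover, Gaussian mutation, and keeping the single best individual) and the population size, generation count, and mutation scale are assumptions, not the authors' settings.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.1, seed=0):
    """Maximise `fitness` over real-valued vectors within per-gene `bounds`."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                w = rng.random()
                gene = w * g1 + (1 - w) * g2                      # blend crossover
                gene += rng.gauss(0, mutation_scale * (hi - lo))  # mutation
                child.append(min(hi, max(lo, gene)))              # stay in bounds
            children.append(child)
        population = children
    return max(population, key=fitness)
```

With a toy fitness such as `lambda v: -(v[0] - 0.3) ** 2` and bounds `[(0.0, 1.0)]`, the returned gene should end up near 0.3. In the setting of the abstract, the GA's results would then be averaged across datasets that share the same drift type.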

Asia-Pacific Web Conference | 2005

Providing geographic-multidimensional decision support over the web

Joel da Silva; Valéria Cesário Times; Robson do Nascimento Fidalgo; Roberto Souto Maior de Barros

In recent years, many researchers have addressed the problem of integrating analytic and geographic processing systems. The main goal is to provide users with a system capable of processing both geographic and multidimensional data, abstracting away the complexity of querying and analyzing these data separately in a decision-making process. However, this integration may not yet be fully achieved, or it may rely on proprietary technologies. This paper presents a service integration model for supporting analytic and/or geographic requests over the Web. The model has been implemented as a Web Service, named GMLA WS, which is strongly based on standardized technologies such as Web Services, Java, and XML. GMLA WS query results are displayed in a Web browser as maps and/or tables to help users in their decision making.

Neurocomputing | 2018

Wilcoxon Rank Sum Test Drift Detector

Roberto Souto Maior de Barros; Juan Isidro González Hidalgo; Danilo Rafael de Lima Cabral

Online learning involves extracting information from large quantities of data (streams), usually affected by changes in the distribution (concept drift). Drift detectors are software components that estimate the positions of these changes in order to replace the base learner and ultimately improve accuracy. The Statistical Test of Equal Proportions (STEPD) is a simple, well-known, efficient detector that uses a hypothesis test between two proportions to signal concept drifts. However, despite identifying existing drifts close to their correct positions, STEPD tends to produce many false positives. This article examines the application of the Wilcoxon rank sum statistical test to concept drift detection, proposing WSTD. Experiments run in the MOA framework using four artificial dataset generators, with abrupt and gradual drift versions in three sizes, as well as seven real-world datasets, suggest that WSTD improves on the detections and accuracies of STEPD and other methods in many scenarios.
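The core of such a detector is a Wilcoxon rank sum test between an older and a recent window of prediction outcomes. It can be sketched in Python with the normal approximation and a correction for ties, which are pervasive when outcomes are 0/1. The window contents, the `alpha` value, and the rule that only a drop in accuracy counts as drift are illustrative assumptions, not WSTD's published parameters. `drift_suspected` is a hypothetical name.

```python
import math


def average_ranks(values):
    """1-based ranks with ties averaged, plus the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    return ranks, tie_term


def rank_sum_p_value(sample1, sample2):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    ranks, tie_term = average_ranks(list(sample1) + list(sample2))
    w = sum(ranks[:n1])
    mean = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # all values identical: no evidence of a difference
        return 1.0
    z = (w - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def drift_suspected(older, recent, alpha=0.003):
    """Hypothetical rule: flag drift when recent accuracy is significantly lower.

    `older` and `recent` hold 1 for a correct prediction and 0 for an error.
    """
    worse = sum(recent) / len(recent) < sum(older) / len(older)
    return worse and rank_sum_p_value(older, recent) < alpha
```

Requiring the recent window to be worse keeps the detector from firing when accuracy improves. A two-sided p-value alone would not make that distinction.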
