Pedro Pereira Rodrigues


Publication

Featured research published by Pedro Pereira Rodrigues.

Brazilian Symposium on Artificial Intelligence | 2004

Learning with Drift Detection

João Gama; Pedro Medas; Gladys Castillo; Pedro Pereira Rodrigues

Most work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detecting changes in the probability distribution of examples. The idea behind the drift detection method is to monitor the online error rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the current model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method tracks the trace of the online error of the algorithm. For the current context we define a warning level and a drift level. A new context is declared if, in a sequence of examples, the error increases, reaching the warning level at example k_w and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real-world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show good performance in detecting drift and in learning the new concept. We also observe that the method is independent of the learning algorithm.
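The warning-level and drift-level logic described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the two levels are set, so the sketch assumes they sit two and three standard deviations above the lowest error rate observed so far, and that monitoring starts after 30 examples. The names `DriftDetector` and `find_drift` are hypothetical.

```python
import math


class DriftDetector:
    """Tracks the online error rate and reports 'stable', 'warning' or 'drift'."""

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_examples=30):
        # Assumed settings: the abstract does not specify how the levels are chosen.
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_examples = min_examples
        self.reset()

    def reset(self):
        """Start a new context: forget all error statistics."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, misclassified):
        """Feed one prediction outcome (True if the model was wrong)."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n                 # online error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_examples:
            return "stable"
        if p + s < self.p_min + self.s_min:      # best error level so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"


def find_drift(error_flags):
    """Return (k_w, k_d): index of the warning and of the drift that follows it."""
    detector = DriftDetector()
    k_w = None
    for k, flag in enumerate(error_flags):
        state = detector.update(flag)
        if state == "warning" and k_w is None:
            k_w = k
        elif state == "stable":
            k_w = None                           # the warning was a false alarm
        elif state == "drift":
            return (k if k_w is None else k_w), k
    return None, None
```

In use, a learner would keep the examples seen since `k_w` and retrain on them once `find_drift` (or the equivalent online loop) reports a drift at `k_d`.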

Knowledge Discovery and Data Mining | 2009

Issues in evaluation of stream learning algorithms

João Gama; Raquel Sebastião; Pedro Pereira Rodrigues

Learning from data streams is a research area of increasing importance. Several stream learning algorithms have been developed in recent years. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet adequately addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no gold standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i) assessing the performance of a learning algorithm; ii) comparing learning algorithms; iii) hypothesis testing using the McNemar test; and iv) change detection using the Page-Hinkley test. In these tasks, the prequential error estimated using fading factors provides reliable estimates. In comparison to sliding windows, fading factors are faster and memoryless, a requirement for streaming applications. This paper is a contribution to the discussion of good practice in performance assessment when learning dynamic models that evolve over time.
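A minimal sketch of a prequential error estimated with a fading factor may help make the idea concrete. The abstract does not give the update rule, so the recursion below (a discounted sum of losses divided by a discounted count) and the default `alpha=0.99` are assumptions, and `fading_prequential_error` is a hypothetical name.

```python
def fading_prequential_error(losses, alpha=0.99):
    """Prequential error with a fading factor, one estimate per example.

    losses: iterable of per-example losses (e.g. 0 or 1), each computed by
            testing on an example before training on it.
    alpha:  fading factor in (0, 1]; alpha = 1 gives the plain prequential mean.
    """
    faded_sum = 0.0     # discounted sum of losses
    faded_count = 0.0   # discounted number of examples
    estimates = []
    for loss in losses:
        faded_sum = loss + alpha * faded_sum
        faded_count = 1.0 + alpha * faded_count
        estimates.append(faded_sum / faded_count)
    return estimates
```

Only two numbers are stored regardless of stream length, which is the sense in which the estimator needs no memory of past examples.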

IEEE Transactions on Knowledge and Data Engineering | 2008

Hierarchical Clustering of Time-Series Data Streams

Pedro Pereira Rodrigues; João Gama; João Pedro Pedroso

This paper presents and analyzes an incremental system for clustering streaming time series. The Online Divisive-Agglomerative Clustering (ODAC) system continuously maintains a tree-like hierarchy of clusters that evolves with data, using a top-down strategy. The splitting criterion is a correlation-based dissimilarity measure among time series, splitting each node by the farthest pair of streams. The system also uses a merge operator that reaggregates a previously split node in order to react to changes in the correlation structure between time series. The split and merge operators are triggered in response to changes in the diameters of existing clusters, assuming that in stationary environments, expanding the structure leads to a decrease in the diameters of the clusters. The system is designed to process thousands of data streams that flow at a high rate. The main features of the system include update time and memory consumption that do not depend on the number of examples in the stream. Moreover, the time and memory required to process an example decrease whenever the cluster structure expands. Experimental results on artificial and real data assess the processing qualities of the system, suggesting competitive performance on clustering streaming time series and also exploring its ability to deal with concept drift.
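The notions of a correlation-based dissimilarity and of a cluster's farthest pair can be illustrated as follows. This is a sketch, not the ODAC implementation: the abstract does not state the exact measure, so the formula sqrt((1 - corr) / 2) is an assumption, and the function names are hypothetical. The correlation is computed from running sums only, which is one way the cost per example could stay independent of the stream length.

```python
import math
from itertools import combinations


def correlation(x, y):
    """Pearson correlation computed from sums (sufficient statistics)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return cov / math.sqrt(var_x * var_y)


def dissimilarity(x, y):
    """Assumed measure: 0 for perfectly correlated series, 1 for anti-correlated."""
    return math.sqrt(max(0.0, (1.0 - correlation(x, y)) / 2.0))


def farthest_pair(streams):
    """Return the two stream names with the largest dissimilarity.

    streams: dict mapping a stream name to its list of observations.
    The dissimilarity of the returned pair is the cluster's diameter.
    """
    return max(
        combinations(streams, 2),
        key=lambda pair: dissimilarity(streams[pair[0]], streams[pair[1]]),
    )
```

A split would then use the two returned streams as pivots for the child clusters, assigning every other stream to the closer pivot.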

Machine Learning | 2013

On evaluating stream learning algorithms

João Gama; Raquel Sebastião; Pedro Pereira Rodrigues

Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of the prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors all converge to the Bayes error. The use of the prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. It is also worthwhile to use the proposed methods for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments show that the use of forgetting mechanisms (sliding windows or fading factors) is required for fast and efficient change detection. In comparison to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper is a contribution to the discussion of best practice for performance assessment when learning is a continuous process and the decision models are dynamic and evolve over time.
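For contrast with fading factors, a prequential error estimated over a sliding window could look like the sketch below. The window size of 100 and the name `windowed_prequential_error` are illustrative assumptions; the point is only that the window must store its most recent losses, which is the memory cost the abstract refers to.

```python
from collections import deque


def windowed_prequential_error(losses, window=100):
    """Prequential error over the most recent `window` examples.

    losses: iterable of per-example losses (e.g. 0 or 1), each computed by
            testing on an example before training on it.
    """
    recent = deque()   # the last `window` losses must be kept in memory
    total = 0.0
    estimates = []
    for loss in losses:
        recent.append(loss)
        total += loss
        if len(recent) > window:
            total -= recent.popleft()   # forget the oldest loss
        estimates.append(total / len(recent))
    return estimates
```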

ACM Symposium on Applied Computing | 2005

Learning decision trees from dynamic data streams

João Gama; Pedro Medas; Pedro Pereira Rodrigues

This paper presents a system for inducing a forest of functional trees from data streams that is able to detect concept drift. The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and the information gain to estimate the merit of each possible splitting test. For multi-class problems the algorithm grows a binary tree for each possible pair of classes, leading to a forest of trees. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process. Naive-Bayes classifiers in leaves are used to classify test examples; naive-Bayes classifiers in inner nodes can be used as multivariate splitting tests, if chosen by the splitting criteria, and are used to detect drift in the distribution of the examples that traverse the node. When a drift is detected, the whole sub-tree rooted at that node is pruned. The use of naive-Bayes classifiers at leaves to classify test examples, the use of splitting tests based on the outcome of naive-Bayes, and the use of naive-Bayes classifiers at decision nodes to detect drift are all obtained directly from the sufficient statistics required to compute the splitting criteria, without any additional computation. This is a major advantage in the context of high-speed data streams. The methodology was tested with artificial and real-world data sets. The experimental results show very good performance in comparison to a batch decision tree learner, and a high capacity to detect and react to drift.
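Two ingredients named in this abstract, the information gain of a binary splitting test and the one-tree-per-pair-of-classes decomposition, can be sketched as below. This is a generic illustration rather than the UFFT code; the function names and the class-count representation are assumptions.

```python
import math
from itertools import combinations


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(left, right):
    """Gain of a binary test that sends `left` and `right` class counts
    to its two branches (both lists are indexed by class)."""
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    children = (n_left / n) * entropy(left) + (n_right / n) * entropy(right)
    return entropy(parent) - children


def class_pairs(classes):
    """One binary tree is grown per unordered pair of classes."""
    return list(combinations(classes, 2))
```

For k classes the forest therefore holds k(k-1)/2 trees, for example three trees for three classes.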

Avian Pathology | 2011

Molecular characterization of vancomycin-resistant enterococci and extended-spectrum β-lactamase-containing Escherichia coli isolates in wild birds from the Azores Archipelago

Nuno Silva; Gilberto Igrejas; Pedro Pereira Rodrigues; Tiago Rodrigues; Alexandre Gonçalves; Ana Felgar; Rui Pacheco; David Gonçalves; Regina Tristão da Cunha; Patrícia Poeta

To study the prevalence of vancomycin-resistant enterococci (VRE) and extended-spectrum β-lactamase (ESBL)-containing Escherichia coli isolates, and the mechanisms of resistance implicated, 220 faecal samples from wild birds were collected between 2006 and 2010 in the Azores Archipelago. Samples were spread on Slanetz–Bartley agar plates supplemented with 4 mg/l vancomycin and on Levine agar plates supplemented with 2 mg/l cefotaxime for VRE and ESBL-containing E. coli isolation, respectively. vanA-containing enterococcal isolates (four Enterococcus faecium and two Enterococcus durans) and vanC-1 Enterococcus gallinarum isolates were detected in six and seven faecal samples, respectively. VRE isolates showed ampicillin (n=11), ciprofloxacin (n=9), tetracycline (n=6), erythromycin (n=5), quinupristin/dalfopristin (n=3) and high-level kanamycin resistance (n=1). The tet(L) and/or tet(M) gene was found in all tetracycline-resistant isolates and the erm(B) gene in all erythromycin-resistant isolates. Three vanA-containing E. faecium and two E. gallinarum presented specific sequences of the Tn5397 transposon. Four VRE isolates harboured the ace virulence gene. One faecal sample revealed one ESBL-containing E. coli isolate that belongs to the A phylogenetic group, showed a phenotype of resistance to β-lactams and tetracycline, and harboured the bla CTX-M-14, bla SHV-12 and the tet(A) genes. To our knowledge, this is the first study to focus on defining the prevalence of VRE and/or ESBL-containing E. coli strains in wild birds from the Azores. The data recovered are essential to improve knowledge about the dissemination of resistant strains through wild ecosystems and their possible implications by transferring these resistances to other animals or to humans.

European Conference on Principles of Data Mining and Knowledge Discovery | 2007

Stream-Based Electricity Load Forecast

João Gama; Pedro Pereira Rodrigues

Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors), producing a continuous flow of data, in a dynamic non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. We propose an architecture based on an online clustering algorithm where each cluster (a group of sensors with high correlation) contains a neural-network-based predictive model. The goal is to maintain in real time a clustering model and a predictive model able to incorporate new information at the speed at which data arrives, detecting changes and adapting the decision models to the most recent information. We present results illustrating the advantages of the proposed architecture, on several temporal horizons, and its competitiveness with another predictive strategy.

IEEE Transactions on Nuclear Science | 2006

The Clear-PEM Electronics System

Edgar Albuquerque; Pedro Bento; Carlos Leong; Fernando Gonçalves; João Nobre; Joel Rego; Paulo Relvas; Pedro Lousã; Pedro Pereira Rodrigues; Isabel C. Teixeira; João Paulo Teixeira; Luís Silva; M. Medeiros Silva; Andreia Trindade; J. Varela

The Clear-PEM detector system is a compact positron emission mammography scanner with about 12000 channels aiming at high sensitivity and good spatial resolution. Front-end, Trigger, and Data Acquisition electronics are crucial components of this system. The on-detector front-end is implemented as a data-driven synchronous system that identifies and selects the analog signals whose energy is above a predefined threshold. The off-detector trigger logic uses digitized front-end data streams to compute pulse amplitudes and timing. Based on this information it generates a coincidence trigger signal that is used to initiate the conditioning and transfer of the relevant data to the data acquisition computer. To minimize dead-time, the data acquisition electronics makes extensive use of pipeline processing structures and derandomizer memories with multievent capacity. The system operates at 100-MHz clock frequency, and is capable of sustaining a data acquisition rate of 1 million events per second with an efficiency above 95%, at a total single photon background rate of 10 MHz. The basic component of the front-end system is a low-noise amplifier-multiplexer chip presently under development. The off-detector system is designed around a dual-bus crate backplane for fast intercommunication between the system boards. The trigger and data acquisition logic is implemented in large FPGAs with 4 million gates. Monte Carlo simulation results evaluating the trigger performance, as well as results of hardware simulations, are presented, showing the correctness of the design and the implementation approach.

Intelligent Data Analysis | 2009

A system for analysis and prediction of electricity-load streams

Pedro Pereira Rodrigues; João Gama

Sensors distributed all around electrical-power distribution networks produce streams of data at high speed. From a data mining perspective, this sensor network problem is characterized by a large number of variables (sensors), producing a continuous flow of data, in a dynamic non-stationary environment. Companies make decisions to buy or sell energy based on load profiles and forecasts. In this work we analyze the most relevant data mining problems and issues: continuously learning clusters and predictive models, model adaptation in large domains, and change detection and adaptation. The goal is to continuously maintain a clustering model, defining profiles, and a predictive model able to incorporate new information at the speed at which data arrives, detecting changes and adapting the decision models to the most recent information. We present experimental results in a large real-world scenario, illustrating the advantages of continuous learning and its competitiveness against wavelet-based prediction. We also propose a lightweight electrical load visualization system which enhances the ability to inspect forecast results on mobile devices.

World Journal of Gastroenterology | 2013

Clinical prognostic factors for disabling Crohn's disease: a systematic review and meta-analysis.

Cláudia Dias; Pedro Pereira Rodrigues; Altamiro Costa-Pereira; Fernando Magro

AIM: To identify demographic and clinical factors associated with disabling Crohn's disease (CD). METHODS: A systematic review and meta-analysis of observational studies, focusing on the factors that can predict the prognosis of different outcomes of CD, was undertaken. PubMed, ISI Web of Knowledge and Scopus were searched to identify studies investigating the above-mentioned factors in adult patients with CD. Studies were eligible for inclusion if they described prognostic factors in CD, with inclusion and exclusion criteria defined as follows. Studies of adult patients with CD, written in English and studying the association between clinical factors and at least one prognostic outcome were included. Meta-analysis of effects was undertaken for the disabling disease outcome, using the odds ratio (OR) to assess the effect of the different factors on the outcome. The statistical method used was Mantel-Haenszel for fixed effects. The 16-item quality assessment tool (QATSDD) was used to assess the quality of the studies (range: 0-42). RESULTS: Of the 913 papers initially selected, sixty studies were reviewed and three were included in the systematic review and meta-analysis. The global QATSDD scores of the papers were 18, 21 and 22. Of a total of 1961 patients enrolled, 1332 (78%) were classified with disabling disease five years after diagnosis. In two studies, age at diagnosis was a factor associated with disabling disease five years after diagnosis. Individuals under 40 years old had a higher risk of developing disabling disease. In two studies, patients who were treated with corticosteroids on the first flare developed disabling disease five years after diagnosis. Further, perianal disease was found to be relevant in all of the studies at two and five years after diagnosis. Finally, one study showed localization as a factor associated with disabling disease five years after diagnosis, with L3 being a higher risk factor. This meta-analysis showed a significantly higher risk of developing disabling disease at five years after initial diagnosis among patients younger than 40 years of age (OR = 2.47, 95%CI: 1.74-3.51), with initial steroid treatment for first flare (OR = 2.42, 95%CI: 1.87-3.11) and with perianal disease (OR = 2.00, 95%CI: 1.41-2.85). CONCLUSION: Age at diagnosis, perianal disease, initial use of steroids and localization seem to be independent prognostic factors of disabling disease.
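The Mantel-Haenszel fixed-effects pooling of odds ratios mentioned in the methods can be sketched as follows. The function name and the layout of each 2x2 table are assumptions made for illustration, and any counts passed to it in an example are hypothetical rather than data from the reviewed studies.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio across studies (Mantel-Haenszel, fixed effects).

    tables: iterable of (a, b, c, d) counts per study, where
        a = exposed patients with the outcome
        b = exposed patients without the outcome
        c = unexposed patients with the outcome
        d = unexposed patients without the outcome
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d            # study size weights each contribution
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator
```

With a single study the result reduces to the usual odds ratio ad / bc, so `mantel_haenszel_or([(10, 5, 5, 10)])` gives 4.0 for that hypothetical table.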
