Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Ricard Gavaldà is active.

Publication


Featured research published by Ricard Gavaldà.


Knowledge Discovery and Data Mining | 2009

New ensemble methods for evolving data streams

Albert Bifet; Geoffrey Holmes; Bernhard Pfahringer; Richard Brendon Kirkby; Ricard Gavaldà

Advanced analysis of data streams is quickly becoming a key area of data mining research as the number of applications demanding such processing increases. Online mining when such data streams evolve over time, that is, when concepts drift or change completely, is becoming one of the core issues. When tackling non-stationary concepts, ensembles of classifiers have several advantages over single-classifier methods: they are easy to scale and parallelize, they can adapt to change quickly by pruning under-performing parts of the ensemble, and therefore they usually also generate more accurate concept descriptions. This paper proposes a new experimental data stream framework for studying concept drift, and two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. Using the new experimental framework, an evaluation study on synthetic and real-world datasets comprising up to ten million examples shows that the new ensemble methods perform very well compared to several known methods.
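As a concrete illustration, both variants build on Oza-style online bagging, where each incoming example is shown to each ensemble member k times with k drawn from Poisson(1). The minimal Python sketch below shows only that core idea; scikit-learn's SGDClassifier is a stand-in for the Hoeffding trees used in the paper, and the ADWIN and ASHT machinery is omitted.

# Minimal sketch of online (Oza-style) bagging over a data stream.
# Assumption: SGDClassifier stands in for the paper's Hoeffding trees;
# ADWIN/ASHT logic is not included.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])
ensemble = [SGDClassifier(random_state=i) for i in range(5)]

def learn_one(x, y):
    # Each member sees the example k ~ Poisson(1) times, simulating a bootstrap.
    for member in ensemble:
        for _ in range(rng.poisson(1.0)):
            member.partial_fit(x.reshape(1, -1), [y], classes=classes)

def predict_one(x):
    # Unweighted majority vote over members that have seen at least one example.
    votes = [int(m.predict(x.reshape(1, -1))[0]) for m in ensemble if hasattr(m, "coef_")]
    return int(round(np.mean(votes))) if votes else 0

# Tiny synthetic stream: the label is 1 when the feature sum is positive.
for t in range(1000):
    x = rng.normal(size=4)
    y = int(x.sum() > 0)
    yhat = predict_one(x)
    learn_one(x, y)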


Intelligent Data Analysis | 2009

Adaptive Learning from Evolving Data Streams

Albert Bifet; Ricard Gavaldà

We propose and illustrate a method for developing algorithms that can adaptively learn from data streams that drift over time. As an example, we take the Hoeffding Tree, an incremental decision tree inducer for data streams, and use it as a basis to build two new methods that can deal with distribution and concept drift: a sliding window-based algorithm, Hoeffding Window Tree, and an adaptive method, Hoeffding Adaptive Tree. Our methods are based on using change detectors and estimator modules at the right places; we choose implementations with theoretical guarantees in order to extend such guarantees to the resulting adaptive learning algorithm. A main advantage of our methods is that they require no guess about how fast or how often the stream will drift; other methods typically have several user-defined parameters to this effect. In our experiments, the new methods never do worse, and in some cases do much better, than CVFDT, a well-known method for tree induction on data streams with drift.
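The adaptive-window idea behind these methods can be illustrated with a much-simplified detector: keep recent prediction errors in a window and drop the older part when its mean differs from the recent part by more than a Hoeffding-style bound. The sketch below checks only one cut point and stores the raw values, whereas the real ADWIN examines many cut points over an exponential-histogram compression; it is an illustration of the principle, not the authors' algorithm.

import math
from collections import deque

class SimpleAdaptiveWindow:
    """Toy change detector in the spirit of ADWIN: keep a window of recent
    values (e.g. 0/1 prediction errors) and shrink it when the older and the
    newer halves have significantly different means."""
    def __init__(self, delta=0.002, max_len=1000):
        self.delta = delta
        self.window = deque(maxlen=max_len)

    def add(self, value):
        self.window.append(value)
        return self._detect_and_shrink()

    def _detect_and_shrink(self):
        n = len(self.window)
        if n < 10:
            return False
        cut = n // 2
        old, new = list(self.window)[:cut], list(self.window)[cut:]
        mean_old = sum(old) / len(old)
        mean_new = sum(new) / len(new)
        # Hoeffding-style bound on the difference of the two sample means.
        m = 1.0 / (1.0 / len(old) + 1.0 / len(new))  # effective sample size
        eps = math.sqrt((1.0 / (2 * m)) * math.log(4.0 / self.delta))
        if abs(mean_old - mean_new) > eps:
            # Change detected: keep only the recent half of the data.
            self.window = deque(new, maxlen=self.window.maxlen)
            return True
        return False

In a tree learner such a detector would sit at each node, feeding on that node's error stream and triggering the replacement of the affected subtree.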


Data Mining and Knowledge Discovery | 2002

Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms

Carlos Domingo; Ricard Gavaldà; Osamu Watanabe

Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that can handle large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several researchers, random sampling is difficult to use because of the difficulty of determining an appropriate sample size. In this paper, we take a sequential sampling approach to this difficulty, and propose an adaptive sampling method that solves a general problem covering many actual problems arising in applications of discovery science. An algorithm following this method obtains examples sequentially, in an online fashion, and determines from the examples obtained so far whether it has already seen enough of them. Thus, the sample size is not fixed a priori; instead, it adapts to the situation. Thanks to this adaptiveness, if we are not in a worst-case situation, as fortunately happens in many practical applications, we can solve the problem with far fewer examples than the worst case requires. We prove the correctness of our method and estimate its efficiency theoretically. To illustrate its usefulness, we consider one concrete task requiring sampling, provide an algorithm based on our method, and show its efficiency experimentally.
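The flavor of the sequential approach can be shown with a toy sampler that draws examples one at a time and stops as soon as a Hoeffding-style radius is small relative to the current estimate, so that frequent properties need far fewer samples than the worst case. Both the bound and the crude union bound over stopping times below are illustrative choices, not the guarantees proved in the paper.

import math
import random

def adaptive_estimate(draw, rel_error=0.1, delta=0.05, max_n=1_000_000):
    """Sequentially estimate the mean of a {0,1}-valued property.
    Stops once a Hoeffding-style radius falls below rel_error times the
    current estimate, so easy (high-frequency) properties stop early.
    `draw` is a zero-argument function returning one random 0/1 observation."""
    total, n = 0, 0
    while n < max_n:
        total += draw()
        n += 1
        # Crude union bound over all possible stopping times up to max_n.
        radius = math.sqrt(math.log(2 * max_n / delta) / (2 * n))
        if n >= 30 and total > 0 and radius <= rel_error * (total / n):
            break
    return total / n, n

# Example: estimating the fraction of records satisfying some predicate,
# simulated here by a biased coin with unknown bias 0.3.
random.seed(1)
estimate, used = adaptive_estimate(lambda: 1 if random.random() < 0.3 else 0)
print(f"estimate={estimate:.3f} using {used} samples")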


Advances in Algorithms, Languages, and Complexity | 1997

Algorithms for Learning Finite Automata from Queries: A Unified View

José L. Balcázar; Josep Díaz; Ricard Gavaldà; Osamu Watanabe

In this survey we compare several known variants of the algorithm for learning deterministic finite automata via membership and equivalence queries. We believe that our presentation makes it easier to understand what is going on and what the differences between the various algorithms mean. We also include a comparative analysis of the algorithms, review some known lower bounds, prove a new one, and discuss the question of parallelizing this sort of algorithm.
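The query model these algorithms assume can be made concrete with a small sketch: a teacher answers membership queries by running the target DFA, and answers equivalence queries (here only approximately, by random testing) with a counterexample. The learner itself, for example the observation table of L*, is omitted.

import random

class DFA:
    """Deterministic finite automaton over a fixed alphabet."""
    def __init__(self, start, accepting, transitions):
        self.start, self.accepting, self.transitions = start, accepting, transitions

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.transitions[(state, symbol)]
        return state in self.accepting

class Teacher:
    """Answers the two query types assumed by L*-style learners.
    Equivalence is only approximated here by random testing; a real teacher
    returns either 'equivalent' or a genuine counterexample."""
    def __init__(self, target, alphabet, tests=10000, max_len=12, seed=0):
        self.target, self.alphabet = target, alphabet
        self.tests, self.max_len = tests, max_len
        self.rng = random.Random(seed)

    def membership(self, word):
        return self.target.accepts(word)

    def equivalence(self, hypothesis):
        for _ in range(self.tests):
            word = tuple(self.rng.choice(self.alphabet)
                         for _ in range(self.rng.randint(0, self.max_len)))
            if hypothesis.accepts(word) != self.target.accepts(word):
                return word          # counterexample
        return None                  # no difference found

# Target language: binary strings with an even number of 1s.
even_ones = DFA(start=0, accepting={0},
                transitions={(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
teacher = Teacher(even_ones, alphabet=["0", "1"])
print(teacher.membership(("1", "1")))   # True
print(teacher.equivalence(even_ones))   # None: hypothesis equals target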


Knowledge Discovery and Data Mining | 2011

Mining frequent closed graphs on evolving data streams

Albert Bifet; Geoffrey Holmes; Bernhard Pfahringer; Ricard Gavaldà

Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real time. Data stream mining faces hard constraints on processing time and space, and also needs to provide for concept drift detection. In this paper we present a framework for studying graph pattern mining on time-varying streams. Three new methods for mining frequent closed subgraphs are presented. All methods work on coresets of closed subgraphs, compressed representations of graph sets, and maintain these sets in a batch-incremental manner, but use different approaches to address potential concept drift. An evaluation study on datasets comprising up to four million graphs explores the strengths and limitations of the proposed methods. To the best of our knowledge this is the first work on mining frequent closed subgraphs in non-stationary data streams.


Dependable Systems and Networks | 2010

Adaptive on-line software aging prediction based on machine learning

Javier Alonso; Jordi Torres; Josep Lluis Berral; Ricard Gavaldà

The growing complexity of software systems is resulting in an increasing number of software faults. According to the literature, software faults are becoming one of the main sources of unplanned system outages, with an important impact on company profits and image. For this reason, many techniques (such as clustering, fail-over, or server redundancy) have been proposed to avoid software failures, and yet failures still happen. Many of them are due to the software aging phenomenon. In this work, we present a detailed evaluation of our chosen machine learning prediction algorithm (M5P) in the face of dynamic and non-deterministic software aging. We have tested our prediction model on a three-tier J2EE web application, achieving acceptable prediction accuracy in complex scenarios with small training data sets. Furthermore, we have found that the model generated by the machine learning algorithm is also a useful aid in determining the root cause of the failure.
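For flavor, the prediction task can be sketched with scikit-learn's DecisionTreeRegressor standing in for M5P (a Weka model-tree learner), on synthetic monitoring data where memory leaks at a constant rate; the features, units, and target definition below are illustrative assumptions, not the paper's experimental setup.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic monitoring samples: (throughput, used_memory, memory_growth_rate).
n = 2000
throughput = rng.uniform(50, 500, n)      # requests per second
used = rng.uniform(100, 900, n)           # MB currently used
growth = rng.uniform(0.01, 1.0, n)        # MB leaked per second
capacity = 1000.0                          # MB available in total

# Ground-truth "time until memory exhaustion" in seconds, plus noise.
tte = (capacity - used) / growth + rng.normal(0, 20, n)

X = np.column_stack([throughput, used, growth])
X_train, X_test, y_train, y_test = train_test_split(X, tte, random_state=0)

# DecisionTreeRegressor is a stand-in for the M5P model tree used in the paper.
model = DecisionTreeRegressor(max_depth=8, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))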


IEEE Transactions on Information Theory | 1997

Computational power of neural networks: a characterization in terms of Kolmogorov complexity

José L. Balcázar; Ricard Gavaldà; Hava T. Siegelmann

The computational power of recurrent neural networks is shown to depend ultimately on the complexity of the real constants (weights) of the network. The complexity, or information content, of the weights is measured by a variant of resource-bounded Kolmogorov (1965) complexity that takes into account the time required for constructing the numbers. In particular, we reveal a full and proper hierarchy of nonuniform complexity classes associated with networks having weights of increasing Kolmogorov complexity.


International Conference on User Modeling, Adaptation, and Personalization | 2007

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

Nicolas Poggi; Toni Moreno; Josep Lluis Berral; Ricard Gavaldà; Jordi Torres

In the Web environment, user identification is becoming a major challenge for admission control systems on high-traffic sites. When a web server is overloaded there is a significant loss of throughput measured in finished sessions, as opposed to responses per second; longer sessions are usually the ones ending in sales, but they are also the most sensitive to load failures. Session-based admission control systems maintain a high QoS for a limited number of sessions, but they do not maximize revenue because they treat all non-logged sessions the same. We present a novel method for learning to assign priorities to sessions according to the revenue they are expected to generate. For this, we use traditional machine learning techniques and Markov-chain models. We are able to train a system to estimate the probability that a user intends to purchase, based on the user's early navigation clicks and other static information. The predictions can be used by admission control systems to prioritize sessions, or to deny them if no resources are available, thus improving sales throughput per unit of time for a given infrastructure. We test our approach on access logs obtained from a high-traffic online travel agency, with promising results.
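One simple way to realize the Markov-chain part is to train one first-order chain on sessions that ended in a purchase and another on sessions that did not, and score a new session prefix by their log-likelihood ratio. The page categories and sessions below are hypothetical, and the paper's actual models and features may differ.

import math
from collections import defaultdict

def train_chain(sessions, smoothing=1.0):
    """First-order Markov chain over page categories, with add-one smoothing.
    Returns transition probabilities P(next | current)."""
    counts = defaultdict(lambda: defaultdict(float))
    pages = set()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1.0
            pages.update((a, b))
    probs = {}
    for a in pages:
        total = sum(counts[a].values()) + smoothing * len(pages)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in pages}
    return probs

def log_likelihood(chain, session):
    ll = 0.0
    for a, b in zip(session, session[1:]):
        ll += math.log(chain.get(a, {}).get(b, 1e-9))
    return ll

# Hypothetical training sessions (page categories), split by final outcome.
buyers = [["home", "search", "product", "cart", "buy"],
          ["search", "product", "product", "cart", "buy"]]
browsers = [["home", "search", "search", "home"],
            ["home", "product", "home", "search"]]

buy_chain, browse_chain = train_chain(buyers), train_chain(browsers)

# Score the first clicks of a new session: a positive score leans "will buy".
prefix = ["home", "search", "product"]
score = log_likelihood(buy_chain, prefix) - log_likelihood(browse_chain, prefix)
print("buy-vs-browse log-likelihood ratio:", round(score, 2))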


Discovery Science | 2006

Kalman filters and adaptive windows for learning in data streams

Albert Bifet; Ricard Gavaldà

We study the combination of the Kalman filter and a recently proposed algorithm for dynamically maintaining a sliding window, for learning from streams of examples. We integrate this idea into two well-known learning algorithms, the Naive Bayes classifier and the k-means clusterer. We show on synthetic data that the new algorithms never do worse, and in some cases do much better, than the algorithms using only memoryless Kalman filters or sliding windows without filtering.
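The memoryless filter component can be illustrated with a one-dimensional Kalman filter under a random-walk state model, tracking the drifting mean of a numeric stream without storing past examples; the noise variances below are hand-picked illustrative values, not the paper's settings.

import random

class ScalarKalman:
    """1-D Kalman filter with a random-walk state model x_t = x_{t-1} + w_t
    and observation y_t = x_t + v_t. Tracks the (possibly drifting) mean of
    a numeric stream without keeping any window of past examples."""
    def __init__(self, q=1e-3, r=1.0, x0=0.0, p0=1.0):
        self.q, self.r = q, r    # process and observation noise variances
        self.x, self.p = x0, p0  # state estimate and its variance

    def update(self, y):
        self.p += self.q                     # predict: variance grows by q
        k = self.p / (self.p + self.r)       # Kalman gain
        self.x += k * (y - self.x)           # correct with the new observation
        self.p *= (1.0 - k)
        return self.x

# Stream whose mean jumps from 0 to 3 halfway through (abrupt concept drift).
random.seed(0)
kf = ScalarKalman()
for t in range(2000):
    true_mean = 0.0 if t < 1000 else 3.0
    estimate = kf.update(random.gauss(true_mean, 1.0))
print("final estimate (true mean 3.0):", round(estimate, 2))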


Asian Conference on Machine Learning | 2009

Improving Adaptive Bagging Methods for Evolving Data Streams

Albert Bifet; Geoffrey Holmes; Bernhard Pfahringer; Ricard Gavaldà

We propose two improvements to bagging methods for evolving data streams. Recently, two new variants of Bagging were proposed: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging. ASHT Bagging uses trees of different sizes, and ADWIN Bagging uses ADWIN as a change detector to decide when to discard underperforming ensemble members. We improve ADWIN Bagging using Hoeffding Adaptive Trees, trees that can adaptively learn from data streams that change over time. To speed up adaptation to change in Adaptive-Size Hoeffding Tree (ASHT) Bagging, we add an error change detector to each classifier. We test our improvements with an evaluation study on synthetic and real-world datasets comprising up to ten million examples.
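The per-classifier error change detector can be illustrated with a monitor in the style of the Drift Detection Method: track the running error rate and signal change when it rises well above the best level seen so far. This is a generic sketch for illustration, not necessarily the detector used in the paper.

import math

class ErrorChangeDetector:
    """DDM-style monitor of a classifier's error stream (1 = mistake, 0 = hit).
    Tracks the running error rate p and its std s, remembers the lowest
    p + s seen so far, and signals change when the current p + s exceeds
    that minimum by three standard deviations."""
    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n, self.errors = 0, 0
        self.p_min, self.s_min = float("inf"), float("inf")

    def add(self, error):
        self.n += 1
        self.errors += error
        if self.n < self.min_samples:
            return False
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3.0 * self.s_min:
            self.reset()          # drift: the wrapped classifier should be rebuilt
            return True
        return False

# Typical use inside an ensemble, one detector per member:
# if detector.add(int(member_prediction != true_label)):
#     replace_or_retrain(member)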

Collaboration


Dive into Ricard Gavaldà's collaborations.

Top Co-Authors

Albert Bifet (Université Paris-Saclay)
Jordi Torres (Polytechnic University of Catalonia)
Osamu Watanabe (Tokyo Institute of Technology)
José L. Balcázar (Polytechnic University of Catalonia)
Josep Lluis Berral (Polytechnic University of Catalonia)
Nicolas Poggi (Polytechnic University of Catalonia)
Javier Alonso (University of Extremadura)