Matteo Gagliolo
Dalle Molle Institute for Artificial Intelligence Research
Publications
Featured research published by Matteo Gagliolo.
Neural Computation | 2007
Jürgen Schmidhuber; Daan Wierstra; Matteo Gagliolo; Faustino J. Gomez
In recent years, gradient-based LSTM recurrent neural networks (RNNs) have solved many previously RNN-unlearnable tasks. Sometimes, however, gradient information is of little use for training RNNs, due to numerous local minima. For such cases, we present a novel method: EVOlution of systems with LINear Outputs (Evolino). Evolino evolves weights to the nonlinear, hidden nodes of RNNs while computing optimal linear mappings from hidden state to output, using methods such as pseudo-inverse-based linear regression. If we instead use quadratic programming to maximize the margin, we obtain the first evolutionary recurrent support vector machines. We show that Evolino-based LSTM can solve tasks that Echo State nets (Jaeger, 2004a) cannot and achieves higher accuracy in certain continuous function generation tasks than conventional gradient descent RNNs, including gradient-based LSTM.
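The readout step is the easiest part to make concrete: given the hidden activations produced by an evolved network, the output weights are obtained in closed form by least squares. Below is a minimal sketch assuming NumPy, with random stand-ins for the hidden activations rather than an actual evolved LSTM:

```python
import numpy as np

def linear_readout(hidden_states: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Optimal linear mapping from hidden state to output, via
    pseudo-inverse-based linear regression (ordinary least squares)."""
    # hidden_states: (T, n) activations of the evolved nonlinear nodes
    # targets:       (T, m) desired outputs
    # Solves W = argmin ||hidden_states @ W - targets||^2
    return np.linalg.pinv(hidden_states) @ targets

# Scoring one evolved individual by its residual training error (toy data)
rng = np.random.default_rng(0)
H = np.tanh(rng.normal(size=(200, 10)))  # stand-in for RNN hidden activations
Y = rng.normal(size=(200, 1))            # stand-in for the target sequence
W = linear_readout(H, Y)
print("training MSE:", float(np.mean((H @ W - Y) ** 2)))
```

Swapping the least-squares solve for a margin-maximizing quadratic program is what yields the recurrent support vector machine variant mentioned above.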
Annals of Mathematics and Artificial Intelligence | 2006
Matteo Gagliolo; Juergen Schmidhuber
Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection, and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark, and with a set of solvers for the Auction Winner Determination problem.
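The mixing step lends itself to a compact sketch. Everything below (the function name, the fixed exploration weight epsilon) is illustrative; in the paper, the weight of the uniform share is controlled by the bandit problem solver and decreases as the model improves:

```python
import numpy as np

def mixed_time_shares(allocator_shares, allocator_weights, epsilon=0.1):
    """Blend the time allocators' proposed machine-time shares with a
    uniform share; epsilon is the fraction kept for uniform exploration."""
    # allocator_shares:  (A, K) rows are per-allocator shares over K algorithms
    # allocator_weights: (A,)   bandit-solver weights over the A allocators
    w = allocator_weights / allocator_weights.sum()
    model_share = w @ allocator_shares              # weighted model-based share
    uniform = np.full(model_share.shape, 1.0 / model_share.size)
    return (1 - epsilon) * model_share + epsilon * uniform

shares = np.array([[0.7, 0.2, 0.1],    # allocator 1's proposal
                   [0.3, 0.3, 0.4]])   # allocator 2's proposal
weights = np.array([2.0, 1.0])         # the solver trusts allocator 1 more
print(mixed_time_shares(shares, weights))
```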
Annals of Mathematics and Artificial Intelligence | 2011
Matteo Gagliolo; Juergen Schmidhuber
We propose a method that learns to allocate computation time to a given set of algorithms, of unknown performance, with the aim of solving a given sequence of problem instances in a minimum time. Analogous meta-learning techniques are typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. We adopt instead an online approach, named GAMBLETA, in which algorithm performance models are iteratively updated, and used to guide allocation on a sequence of problem instances. GAMBLETA is a general method for selecting among two or more alternative algorithm portfolios. Each portfolio has its own way of allocating computation time to the available algorithms, possibly based on performance models, in which case its performance is expected to improve over time, as more runtime data becomes available. The resulting exploration-exploitation trade-off is represented as a bandit problem. In our previous work, the algorithms corresponded to the arms of the bandit, and allocations evaluated by the different portfolios were mixed, using a solver for the bandit problem with expert advice, but this required the setting of an arbitrary bound on algorithm runtimes, invalidating the optimal regret of the solver. In this paper, we propose a simpler version of GAMBLETA, in which the allocators correspond to the arms, such that a single portfolio is selected for each instance. The selection is represented as a bandit problem with partial information, and an unknown bound on losses. We devise a solver for this game, proving a bound on its expected regret. We present experiments based on results from several solver competitions, in various domains, comparing GAMBLETA with another online method.
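A minimal sketch of the arms-are-allocators idea follows, using a standard Exp3-style update in which losses are rescaled by the largest loss observed so far. The rescaling is only a heuristic stand-in: the solver devised in the paper handles the unknown loss bound with a proven regret bound, which this sketch does not reproduce:

```python
import numpy as np

class PortfolioSelector:
    """Exp3-style bandit over alternative portfolios (arms = allocators)."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.w = np.ones(n_arms)
        self.gamma = gamma
        self.max_loss = 1e-9               # running estimate of the loss bound
        self.rng = np.random.default_rng(seed)

    def probs(self):
        p = (1 - self.gamma) * self.w / self.w.sum()
        return p + self.gamma / len(self.w)

    def select(self):
        return self.rng.choice(len(self.w), p=self.probs())

    def update(self, arm, loss):
        # Partial information: only the chosen portfolio's loss is observed.
        self.max_loss = max(self.max_loss, loss)
        estimate = (loss / self.max_loss) / self.probs()[arm]
        self.w[arm] *= np.exp(-self.gamma * estimate / len(self.w))

selector = PortfolioSelector(n_arms=3)
arm = selector.select()            # run the chosen portfolio on this instance
selector.update(arm, loss=12.7)    # loss = e.g. the observed solution time
```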
european conference on machine learning | 2004
Matteo Gagliolo; Viktor Zhumatiy; Juergen Schmidhuber
Given is a search problem or a sequence of search problems, as well as a set of potentially useful search algorithms. We propose a general framework for online allocation of computation time to search algorithms based on experience with their performance so far. In an example instantiation, we use simple linear extrapolation of performance for allocating time to various simultaneously running genetic algorithms characterized by different parameter values. Despite the large number of searchers tested in parallel, on various tasks this rather general approach compares favorably to a more specialized state-of-the-art heuristic; in one case it is nearly two orders of magnitude faster.
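The extrapolation heuristic is simple enough to sketch directly. Names and data below are illustrative; the framework in the paper allocates time across many searchers running in parallel rather than picking a single winner:

```python
import numpy as np

def predicted_time_to_target(times, fitnesses, target):
    """Fit a line to a searcher's best-so-far fitness curve and predict
    how much longer it needs to reach `target` (inf if not improving)."""
    slope, intercept = np.polyfit(times, fitnesses, 1)
    if slope <= 0:
        return np.inf
    return max((target - intercept) / slope - times[-1], 0.0)

# Toy histories for two GAs with different parameter settings
histories = {
    "ga_pop50":  ([1, 2, 3, 4], [0.20, 0.35, 0.50, 0.62]),
    "ga_pop200": ([1, 2, 3, 4], [0.10, 0.15, 0.22, 0.27]),
}
eta = {name: predicted_time_to_target(t, f, target=1.0)
       for name, (t, f) in histories.items()}
print("most promising searcher:", min(eta, key=eta.get))
```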
international conference on artificial neural networks | 2005
Matteo Gagliolo; Juergen Schmidhuber
One aim of meta-learning techniques is to minimize the time needed for problem solving, and the effort of parameter hand-tuning, by automating algorithm selection. The predictive model of algorithm performance needed for this task often requires long training times. We address the problem in an online fashion, running multiple algorithms in parallel on a sequence of tasks, continually updating their relative priorities according to a neural model that maps their current state to the expected time to the solution. The model itself is updated at the end of each task, based on the actual performance of each algorithm. Censored sampling allows us to train the model effectively, without the need for additional exploration after each task's solution. We present a preliminary experiment in which this new inter-problem technique learns to outperform a previously proposed intra-problem heuristic.
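Once the model has produced a predicted time to solution for each running algorithm, the priorities can be derived from the predictions; one plausible choice, shown below as an assumption rather than the paper's exact rule, is a softmax over negative predicted times:

```python
import numpy as np

def priorities(predicted_times, temperature=1.0):
    """Map each algorithm's predicted time-to-solution to a machine-time
    priority: solvers predicted to finish sooner get larger shares."""
    t = np.asarray(predicted_times, dtype=float)
    scores = np.exp(-t / temperature)
    return scores / scores.sum()

# Predictions would come from the neural model; these values are made up
print(priorities([3.0, 10.0, 5.0]))  # the 3.0s solver gets the largest share
```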
Swarm Intelligence | 2013
Giovanni Pini; Matteo Gagliolo; Arne Brutschy; Marco Dorigo; Mauro Birattari
Task partitioning consists in dividing a task into sub-tasks that can be tackled separately. Partitioning a task might have both positive and negative effects: On the one hand, partitioning might reduce physical interference between workers, enhance exploitation of specialization, and increase efficiency. On the other hand, partitioning may introduce overheads due to coordination requirements. As a result, whether partitioning is advantageous or not has to be evaluated on a case-by-case basis. In this paper we consider the case in which a swarm of robots must decide whether to complete a given task as an unpartitioned task, or utilize task partitioning and tackle it as a sequence of two sub-tasks. We show that the problem of selecting between the two options can be formulated as a multi-armed bandit problem and tackled with algorithms that have been proposed in the reinforcement learning literature. Additionally, we study the implications of using explicit communication between the robots to tackle the studied task partitioning problem. We consider a foraging scenario as a testbed and we perform simulation-based experiments to evaluate the behavior of the system. The results confirm that existing multi-armed bandit algorithms can be employed in the context of task partitioning. The use of communication can result in better performance, but it may also hinder the flexibility of the system.
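As a concrete illustration of the formulation, the choice between the unpartitioned and the partitioned strategy can be driven by a textbook bandit algorithm such as UCB1. The sketch below uses a toy Gaussian reward in place of the simulated foraging environment:

```python
import math
import random

class UCB1:
    """UCB1 over two options: tackle the task whole, or as two sub-tasks."""

    def __init__(self, n_arms=2):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self):
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm                    # try each option once first
        total = sum(self.counts)
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(total) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = UCB1()
for _ in range(100):
    arm = bandit.select()                        # 0 = whole, 1 = partitioned
    reward = random.gauss(0.5 + 0.2 * arm, 0.1)  # toy reward, e.g. items/minute
    bandit.update(arm, reward)
print("preferred strategy:", bandit.select())
```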
principles and practice of constraint programming | 2006
Matteo Gagliolo; Juergen Schmidhuber
Algorithm selection, algorithm portfolios, and randomized restarts can profit from a probabilistic model of algorithm runtime, to be estimated from data gathered by running a set of training experiments. Censored sampling offers a principled way of reducing this initial training time. We study the trade-off between training time and model precision by varying the censoring threshold, and analyzing the consequent impact on the performance of an optimal restart strategy based on an estimated model of the runtime distribution. We present experiments with a SAT solver on a graph-coloring benchmark. Due to the “heavy-tailed” runtime distribution, even modest censoring can reduce training time by a few orders of magnitude. The nature of the optimization process underlying the restart strategy renders its performance surprisingly robust, even to more aggressive censoring.
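The optimal fixed-cutoff restart strategy referenced here minimizes, over cutoffs t, the expected total cost E[min(T, t)] / P(T ≤ t) of restarting at t until success (Luby et al., 1993). A minimal sketch on an uncensored empirical RTD follows; the paper's point is precisely that the model can instead be estimated from censored samples:

```python
import numpy as np

def optimal_restart_cutoff(runtimes):
    """Pick the cutoff t minimizing E[min(T, t)] / P(T <= t), estimated
    from an (uncensored, for simplicity) empirical runtime sample."""
    rt = np.sort(np.asarray(runtimes, dtype=float))
    best_t, best_cost = None, np.inf
    for i, t in enumerate(rt):
        success_prob = (i + 1) / len(rt)         # empirical P(T <= t)
        expected_run = np.minimum(rt, t).mean()  # empirical E[min(T, t)]
        cost = expected_run / success_prob
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Heavy-tailed toy RTD: most runs finish fast, a few take very long
rng = np.random.default_rng(1)
samples = rng.pareto(1.1, size=2000) + 1.0
cutoff, cost = optimal_restart_cutoff(samples)
print(f"cutoff ~ {cutoff:.2f}, expected cost ~ {cost:.2f}")
```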
international symposium on neural networks | 2012
Kevin Van Vaerenbergh; Abdel Rodríguez; Matteo Gagliolo; Peter Vrancx; Ann Nowé; Julian Stoev; Stijn Goossens; Gregory Pinte; Wim Symens
A common approach when applying reinforcement learning to control problems is to first learn a policy based on an approximated model of the plant, whose behavior can be quickly and safely explored in simulation, and then implement the obtained policy to control the actual plant. Here we follow this approach to learn to engage a transmission clutch, with the aim of obtaining a rapid and smooth engagement with a small torque loss. Using an approximated model of a wet clutch, which simulates a portion of the whole engagement, we first learn an open-loop control signal, which is then transferred to the actual wet clutch and improved by further learning with a different reward function, based on the actual torque loss observed.
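The two-phase scheme can be illustrated with a deliberately simple sketch: a parameterized open-loop signal is optimized against a simulated reward, and the resulting parameters seed further learning against the plant's reward. Everything below, the piecewise-constant parameterization, the (1+1)-ES optimizer, and the stand-in reward functions, is a hypothetical substitute for the paper's actual setup:

```python
import numpy as np

def open_loop_signal(params, horizon=50):
    """Piecewise-constant control signal: one level per segment."""
    return np.repeat(params, horizon // len(params))

def learn(reward_fn, theta, iters=200, sigma=0.05, seed=0):
    """(1+1) evolution strategy: keep a Gaussian perturbation if it helps."""
    rng = np.random.default_rng(seed)
    best = reward_fn(open_loop_signal(theta))
    for _ in range(iters):
        candidate = np.clip(theta + sigma * rng.normal(size=theta.shape), 0.0, 1.0)
        reward = reward_fn(open_loop_signal(candidate))
        if reward > best:
            theta, best = candidate, reward
    return theta

sim_reward = lambda u: -np.mean((u - 0.60) ** 2)    # stand-in simulator objective
plant_reward = lambda u: -np.mean((u - 0.65) ** 2)  # stand-in torque-loss objective

theta = learn(sim_reward, np.full(5, 0.5))   # phase 1: learn on the model
theta = learn(plant_reward, theta)           # phase 2: refine on the plant
```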
Archive | 2010
Matteo Gagliolo; Catherine Legrand
Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which models of the runtime distributions of the available algorithms are iteratively updated and used to guide the allocation of computational resources, while solving a sequence of problem instances. The models are estimated using survival analysis techniques, which allow us to reduce computation time by censoring the runtimes of the slower algorithms. Here, we review the statistical aspects of our online selection method, discussing the bias induced in the runtime distribution (RTD) models by the competition of different algorithms on the same problem instances.
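A standard survival-analysis building block in this setting is the Kaplan-Meier estimator, which accommodates runs censored because a competing algorithm finished first. The sketch below uses made-up data and stops well short of the bias analysis discussed in the paper:

```python
import numpy as np

def kaplan_meier(times, finished):
    """Kaplan-Meier survival estimate for runtimes; finished[i] is False
    when run i was censored (e.g. killed once a competitor solved the
    instance), so it contributes to the risk set but not to the events."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(finished, dtype=bool)
    order = np.lexsort((~d, t))     # sort by time, events before censorings
    t, d = t[order], d[order]
    s, curve = 1.0, []
    for i in range(len(t)):
        if d[i]:                    # an observed completion
            at_risk = len(t) - i
            s *= (at_risk - 1) / at_risk
        curve.append((t[i], s))
    return curve

times = [1.2, 0.8, 3.0, 3.0, 5.5]
finished = [True, True, False, True, False]
for t_i, s_i in kaplan_meier(times, finished):
    print(f"S({t_i:.1f}) = {s_i:.3f}")
```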
soft computing | 2009
Matteo Gagliolo; Juergen Schmidhuber
In recent work we have developed an online algorithm selection technique, in which a model of algorithm performance is learned incrementally while being used. The resulting exploration-exploitation trade-off is solved as a bandit problem. The candidate solvers are run in parallel on a single machine, as an algorithm portfolio, and computation time is shared among them according to their expected performances. In this paper, we extend our technique to the more interesting and practical case of multiple CPUs.
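One concrete sub-step in going from one machine to several is discretizing a portfolio's time-share vector into whole-CPU assignments. The largest-remainder rounding below is one plausible way to do it, offered as an assumption rather than the method of the paper:

```python
import numpy as np

def shares_to_cpus(shares, n_cpus):
    """Round a time-share vector to whole-CPU counts with largest
    remainders, so that exactly n_cpus CPUs are assigned in total."""
    raw = np.asarray(shares, dtype=float) * n_cpus
    alloc = np.floor(raw).astype(int)
    remainders = raw - alloc
    for i in np.argsort(remainders)[::-1][: n_cpus - alloc.sum()]:
        alloc[i] += 1
    return alloc

print(shares_to_cpus([0.55, 0.30, 0.15], n_cpus=8))   # -> [4 3 1]
```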