Levente Kocsis
Hungarian Academy of Sciences
Publication
Featured research published by Levente Kocsis.
European Conference on Machine Learning | 2006
Levente Kocsis; Csaba Szepesvári
For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to find near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite-sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
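The bandit idea mentioned in this abstract can be illustrated with a UCB1-style selection rule applied at a single tree node. This is a minimal sketch, not the paper's implementation: the function name `ucb1_select`, the list-based statistics, and the exploration constant `c` are all assumptions made for illustration.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick the action with the highest upper confidence bound.

    counts[a] is how often action a was tried at this node,
    values[a] is the sum of returns observed after trying it.
    The constant c is an illustrative choice, not a value from the paper.
    """
    # Try every action once before comparing confidence bounds.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)

    def score(action):
        mean = values[action] / counts[action]
        bonus = c * math.sqrt(math.log(total) / counts[action])
        return mean + bonus

    return max(range(len(counts)), key=score)
```

In a tree search, a rule of this kind would be applied at every node on the path from the root, so that rarely tried actions keep receiving some samples while promising actions receive most of them.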
Communications of the ACM | 2012
Sylvain Gelly; Levente Kocsis; Marc Schoenauer; Michèle Sebag; David Silver; Csaba Szepesvári; Olivier Teytaud
The ancient oriental game of Go has long been considered a grand challenge for artificial intelligence. For decades, computer Go has defied the classical methods in game tree search that worked so successfully for chess and checkers. However, recent play in computer Go has been transformed by a new paradigm for tree search based on Monte-Carlo methods. Programs based on Monte-Carlo tree search now play at human-master levels and are beginning to challenge top professional players. In this paper, we describe the leading algorithms for Monte-Carlo tree search and explain how they have advanced the state of the art in computer Go.
Computational Intelligence and Games | 2008
Benjamin E. Childs; James H. Brodeur; Levente Kocsis
Monte Carlo search, and specifically the UCT (Upper Confidence Bounds applied to Trees) algorithm, has contributed to a significant improvement in the game of Go and has received considerable attention in other applications. This article investigates two enhancements to the UCT algorithm. First, we consider the possible adjustments to UCT when the search tree is treated as a graph (and information is shared among transpositions). The second modification introduces move groupings, which may reduce the effective branching factor. Experiments with both enhancements were performed using artificial trees and in the game of Go. From the experimental results we conclude that both exploiting the graph structure and grouping moves may contribute to an increase in the playing strength of game programs using UCT.
Journal of Artificial Intelligence Research | 2011
András György; Levente Kocsis
Local search algorithms applied to optimization problems often suffer from getting trapped in a local optimum. The common solution for this deficiency is to restart the algorithm when no progress is observed. Alternatively, one can start multiple instances of a local search algorithm, and allocate computational resources (in particular, processing time) to the instances depending on their behavior. Hence, a multi-start strategy has to decide (dynamically) when to allocate additional resources to a particular instance and when to start new instances. In this paper we propose multi-start strategies motivated by works on multi-armed bandit problems and Lipschitz optimization with an unknown constant. The strategies continuously estimate the potential performance of each algorithm instance by supposing a convergence rate of the local search algorithm up to an unknown constant, and in every phase allocate resources to those instances that could converge to the optimum for a particular range of the constant. Asymptotic bounds are given on the performance of the strategies. In particular, we prove that at most a quadratic increase in the number of times the target function is evaluated is needed to achieve the performance of a local search algorithm started from the attraction region of the optimum. Experiments are provided using SPSA (Simultaneous Perturbation Stochastic Approximation) and k-means as local search algorithms, and the results indicate that the proposed strategies work well in practice, and, in all cases studied, need only logarithmically more evaluations of the target function as opposed to the theoretically suggested quadratic increase.
Machine Learning | 2006
Levente Kocsis; Csaba Szepesvári
Most game programs have a large number of parameters that are crucial for their performance. While tuning these parameters by hand is rather difficult, efficient and easy-to-use generic automatic parameter optimisation algorithms are known only for special problems such as the adjustment of the parameters of an evaluation function. The SPSA algorithm (Simultaneous Perturbation Stochastic Approximation) is a generic stochastic gradient method for optimising an objective function when an analytic expression of the gradient is not available, a frequent case in game programs. Further, SPSA in its canonical form is very easy to implement. As such, it is an attractive choice for parameter optimisation in game programs, both due to its generality and simplicity. The goal of this paper is twofold: (i) to introduce SPSA to the game programming community by putting it into a game-programming perspective, and (ii) to propose and discuss several methods that can be used to enhance the performance of SPSA. These methods include using common random numbers and antithetic variables, a combination of SPSA with RPROP, and the reuse of samples of previous performance evaluations. SPSA with the proposed enhancements was tested in some large-scale experiments on tuning the parameters of an opponent model, a policy and an evaluation function in our poker program, MCRAISE. Whilst SPSA with no enhancements failed to make progress using the allocated resources, SPSA with the enhancements proved to be competitive with other methods, including TD-learning, increasing the average payoff per game by as much as 0.19 times the size of the small bet. From the experimental study, we conclude that the use of an appropriately enhanced variant of SPSA for the optimisation of game program parameters is a viable approach, especially if no good alternative exists for the types of parameters considered.
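To make the "canonical form" of SPSA concrete, the sketch below estimates the gradient from only two evaluations of the objective per iteration, perturbing all parameters simultaneously with random signs. It is a minimal illustration and includes none of the enhancements listed in the abstract (common random numbers, antithetic variables, RPROP, sample reuse). The function name `spsa_minimize`, the gain constants `a` and `c`, and the decay exponents 0.602 and 0.101 are assumptions chosen for illustration, not values taken from the paper.

```python
import random


def spsa_minimize(f, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimise f with plain SPSA, starting from the list theta.

    All gain settings here are illustrative assumptions.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size, slowly decaying
        c_k = c / k ** 0.101  # perturbation size, slowly decaying
        # Perturb every coordinate at once with a random sign.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two evaluations give a gradient estimate for all coordinates.
        diff = f(plus) - f(minus)
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta


def quadratic(x):
    """Toy objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


result = spsa_minimize(quadratic, [0.0, 0.0])
```

For a game program, `f` would be replaced by a noisy performance measure, such as the negated average payoff over a batch of games, which is the setting where averaging and variance-reduction techniques become important.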
Advances in Computer Games | 2006
Levente Kocsis; Csaba Szepesvári; Mark H. M. Winands
Most game programs have a large number of parameters that are crucial for their performance. Tuning these parameters by hand is rather difficult. Therefore automatic optimization algorithms in game programs are interesting research domains. However, successful applications are only known for parameters that belong to certain components (e.g., evaluation-function parameters). The SPSA (Simultaneous Perturbation Stochastic Approximation) algorithm is an attractive choice for optimizing any kind of parameters of a game program, both for its generality and its simplicity. Its disadvantage is that it can be very slow. In this article we propose several methods to speed up SPSA, in particular, the combination with RPROP, using common random numbers, antithetic variables, and averaging. We test the resulting algorithm for tuning various types of parameters in two domains, Poker and LOA. From the experimental study, we may conclude that using SPSA is a viable approach for optimization in game programs, in particular if no good alternative exists for the types of parameters considered.
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2001
Levente Kocsis; Jos W. H. M. Uiterwijk; H. Jaap van den Herik
The efficiency of alpha-beta search algorithms heavily depends on the order in which the moves are examined. This paper focuses on using neural networks to estimate the likelihood of a move being the best in a certain position. The moves considered more likely to be the best are examined first. We selected Lines of Action as a testing ground. We investigated several schemes to encode the moves in a neural network. In the experiments, the best performance was obtained by using one output unit for each possible move of the game. The results indicate that our move-ordering approach can speed up the search by 20 to 50 percent compared with one of the best current alternatives, the history heuristic.
annual conference on computers | 2000
Levente Kocsis; Jos W. H. M. Uiterwijk; H. Jaap van den Herik
The strength of a game-playing program is mainly based on the adequacy of the evaluation function and the efficacy of the search algorithm. This paper investigates how temporal difference learning and genetic algorithms can be used to improve various decisions made during game-tree search. The existing TD algorithms are not directly suitable for learning search decisions. Therefore, we propose a modified update rule that uses the TD error of the evaluation function to shorten the lag between two rewards. The genetic algorithms can be applied directly to learn search decisions. For our experiments we selected the problem of time allocation from the set of search decisions. On each move the player can decide on a certain search depth, being constrained by the amount of time left. As testing ground, we used the game of Lines of Action, which has roughly the same complexity as Othello. From the results we conclude that both the TD and the genetic approach lead to good results when compared to the existing time-allocation techniques. Finally, a brief discussion of the issues that can emerge when the algorithms are applied to more complex search decisions is given.
annual conference on computers | 2002
Levente Kocsis; Jos W. H. M. Uiterwijk; Eric O. Postma; H. Jaap van den Herik
The efficiency of alpha-beta search algorithms heavily depends on the order in which the moves are examined. This paper investigates a new move-ordering heuristic in chess, namely the Neural MoveMap (NMM) heuristic. The heuristic uses a neural network to estimate the likelihood of a move being the best in a certain position. The moves considered more likely to be the best are examined first. We develop an enhanced approach to apply the NMM heuristic during the search, by using a weighted combination of the neural-network scores and the history-heuristic scores. Moreover, we analyse the influence of existing game databases and opening theory on the design of the training patterns. The NMM heuristic is tested for middle-game chess positions by the program Crafty. The experimental results indicate that the NMM heuristic outperforms the existing move ordering, especially when a weighted-combination approach is chosen.
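The weighted-combination idea in this abstract can be sketched as a simple move-ordering function. This is only an illustration under stated assumptions: the function name `order_moves`, the dictionary-based score tables, the linear mixing formula, and the default `weight` are hypothetical, since the abstract does not specify how the two scores are scaled or combined.

```python
def order_moves(moves, nn_score, history_score, weight=0.5):
    """Sort moves so the most promising ones are searched first.

    nn_score and history_score map a move to a number; moves missing
    from a table count as 0. The linear mix below is an assumed form
    of the combination, not the one used in the paper.
    """
    def combined(move):
        return (weight * nn_score.get(move, 0.0)
                + (1.0 - weight) * history_score.get(move, 0.0))

    return sorted(moves, key=combined, reverse=True)
```

In practice the two score sources would need to be brought to a comparable scale before mixing, and the weight would be tuned experimentally.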
European Conference on Machine Learning | 2009
Levente Kocsis; András György
Local search algorithms for global optimization often suffer from getting trapped in a local optimum. The common solution for this problem is to restart the algorithm when no progress is observed. Alternatively, one can start multiple instances of a local search algorithm, and allocate computational resources (in particular, processing time) to the instances depending on their behavior. Hence, a multi-start strategy has to decide (dynamically) when to allocate additional resources to a particular instance and when to start new instances. In this paper we propose a consistent multi-start strategy that assumes a convergence rate of the local search algorithm up to an unknown constant, and in every phase gives preference to those instances that could converge to the best value for a particular range of the constant. Combined with the local search algorithm SPSA (Simultaneous Perturbation Stochastic Approximation), the strategy performs remarkably well in practice, both on synthetic tasks and on tuning the parameters of learning algorithms.