Istvan Szita
Eötvös Loránd University
Publication
Featured research published by Istvan Szita.
Neural Computation | 2006
Istvan Szita; András Lőrincz
The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems to be limited because it often converges to suboptimal policies. We add noise to prevent early convergence of the cross-entropy method and demonstrate the approach on the computer game Tetris. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
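The idea in this abstract can be illustrated with a minimal sketch, under stated assumptions: a Gaussian cross-entropy search whose per-dimension variance gets an extra noise term after every update, so the sampling distribution does not collapse too early. The function names, the toy objective in the usage note, and the particular decreasing noise schedule are illustrative choices, not the settings used in the paper.

```python
import random
import statistics


def decreasing_noise(t):
    """Hypothetical schedule: extra variance that fades to zero after 10 iterations."""
    return max(0.5 - t / 20.0, 0.0)


def noisy_cross_entropy(score, dim, iterations=60, population=100,
                        elite_frac=0.1, noise=decreasing_noise, seed=0):
    """Maximize `score` over real vectors with a Gaussian cross-entropy search.

    After each update the variance is set to the elite variance plus noise(t),
    which keeps exploration alive for longer.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    var = [100.0] * dim  # wide initial distribution (an assumption)
    n_elite = max(2, int(population * elite_frac))
    for t in range(iterations):
        # Draw a population from the current per-dimension Gaussians.
        samples = [[rng.gauss(m, v ** 0.5) for m, v in zip(mean, var)]
                   for _ in range(population)]
        # Keep the best-scoring fraction as the elite set.
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Refit mean and variance to the elite, then add the noise term.
        for i in range(dim):
            column = [s[i] for s in elite]
            mean[i] = statistics.fmean(column)
            var[i] = statistics.pvariance(column) + noise(t)
    return mean
```

For example, `noisy_cross_entropy(lambda x: -sum((xi - 3.0) ** 2 for xi in x), dim=3)` searches for the maximizer of a toy quadratic. In a game setting the score would instead come from playing episodes with the sampled weight vector.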
Journal of Artificial Intelligence Research | 2007
Istvan Szita; András Lőrincz
In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either handcrafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions, so the policy space is sufficiently rich, and (ii) the search is biased towards low-complexity policies, so solutions with a compact description can be found quickly if they exist.
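A decision list of the kind described above can be sketched as an ordered list of (condition, action) rules in which the first matching rule decides. This is a deliberately simplified illustration: it returns a single action and does not model the temporally extended, concurrent action modules of the article. The observation key `ghost_distance`, the action names, and the threshold are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Rule = Tuple[Callable[[Observation], bool], str]


def decide(rules: List[Rule], obs: Observation, default: str = "wait") -> str:
    """Return the action of the first rule whose condition holds."""
    for condition, action in rules:
        if condition(obs):
            return action
    return default


# Hypothetical hand-written rule list, ordered by priority.
example_rules: List[Rule] = [
    (lambda o: o.get("ghost_distance", float("inf")) < 4.0, "flee_ghost"),
    (lambda o: True, "eat_nearest_dot"),
]
```

With these rules, `decide(example_rules, {"ghost_distance": 2.0})` picks the flee action, while a distant ghost falls through to the dot-eating rule. Learning then amounts to choosing which rules from a larger pool enter the list.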
International Conference on Artificial Neural Networks | 2006
Istvan Szita; Viktor Gyenes; András Lőrincz
Function approximators are often used in reinforcement learning tasks with large or continuous state spaces. Artificial neural networks, among them recurrent neural networks (RNNs), are popular function approximators, especially in tasks where some kind of memory is needed, as in real-world partially observable scenarios. However, convergence guarantees for such methods are rarely available. Here, we propose a method using a class of novel RNNs, the echo state networks. Proof of convergence to a bounded region is provided for k-order Markov decision processes. Runs on POMDPs were performed to test and illustrate the working of the architecture.
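For readers unfamiliar with echo state networks, the following minimal sketch shows only the reservoir part: fixed random recurrent weights driven by an input sequence, with a trained linear readout omitted. The scaling rule used here (largest absolute row sum below 1, which makes the tanh update a contraction) is a simple sufficient condition chosen for this illustration and is not taken from the paper. All function names and parameters are assumptions.

```python
import math
import random


def make_reservoir(n_units, n_inputs, max_row_sum=0.9, seed=0):
    """Create fixed random recurrent and input weights for a small reservoir."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)] for _ in range(n_units)]
    # Rescale so the largest absolute row sum equals max_row_sum (< 1).
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[v * max_row_sum / norm for v in row] for row in w]
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_units)]
    return w, w_in


def step(w, w_in, state, inputs):
    """One reservoir update: new_state = tanh(W * state + W_in * inputs)."""
    return [
        math.tanh(sum(a * s for a, s in zip(w_row, state))
                  + sum(b * u for b, u in zip(in_row, inputs)))
        for w_row, in_row in zip(w, w_in)
    ]


def run(w, w_in, state, input_sequence):
    """Drive the reservoir with a sequence of input vectors and return the final state."""
    for inputs in input_sequence:
        state = step(w, w_in, state, inputs)
    return state
```

Because the update is a contraction, two copies of the reservoir started from different states and fed the same inputs end up in nearly the same state, so the state works as a fading memory of recent observations.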
Journal of Machine Learning Research | 2003
Istvan Szita; Bálint Takács; András Lőrincz
In this paper ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even if considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
Neural Computation | 2004
Istvan Szita; András Lőrincz
There is growing interest in using Kalman filter models in brain modeling. The question arises whether Kalman filter models can be used on-line not only for estimation but also for control. The usual method of optimal control for Kalman filter models makes use of off-line backward recursion, which is not satisfactory for this purpose. Here, it is shown that a slight modification of the linear-quadratic-Gaussian Kalman filter model allows the on-line estimation of optimal control by using reinforcement learning and overcomes this difficulty. Moreover, the emerging learning rule for value estimation exhibits a Hebbian form, which is weighted by the error of the value estimation.
Neurocomputing | 2006
Istvan Szita; András Lőrincz
It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as a minimization problem of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a bridge to reinforcement learning. We exploit this connection and derive a convergent, policy iteration-based algorithm, and show that RNN training can be made to fit the reinforcement learning framework in a natural fashion. The relevance of this connection is discussed. We also present experimental results, which demonstrate the appealing properties of the unique parameter structure prescribed by reinforcement learning. Experiments cover both sequence learning and long-term prediction.
International Symposium on Neural Networks | 2004
Istvan Szita; András Lőrincz
We can memorize long sequences like melodies or poems, and it is intriguing to develop efficient connectionist representations for this problem. Recurrent neural networks have proven to offer a reasonable approach here. We start from a few axiomatic assumptions and provide a simple mathematical framework that encapsulates the problem. A gradient-descent-based algorithm is derived in this framework. Demonstrations on a benchmark problem show the applicability of our approach.
International Symposium on Neural Networks | 2004
Bálint Takács; Istvan Szita; András Lőrincz
For interacting agents in time-critical applications, learning whether a subtask can be scheduled reliably is an important issue. Identifying sub-problems of this nature may help with, e.g., planning, scheduling, and segmenting in Markov decision processes. We define a subtask to be schedulable if its execution time has a small variance. We present an algorithm for finding such subtasks.
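The definition of schedulability given above can be written down directly. The sketch below only applies the variance criterion to recorded execution times. It is not the authors' algorithm for discovering subtasks, and the subtask names, sample durations, and the variance threshold are hypothetical.

```python
import statistics
from typing import Dict, List


def schedulable_subtasks(durations: Dict[str, List[float]],
                         max_variance: float) -> List[str]:
    """Return subtasks whose observed execution times have variance <= max_variance.

    Subtasks with fewer than two recorded runs are skipped, since their
    variance cannot be judged from a single sample.
    """
    return [
        name
        for name, samples in durations.items()
        if len(samples) >= 2 and statistics.pvariance(samples) <= max_variance
    ]


# Hypothetical execution times (in time steps) collected over several runs.
observed = {
    "open_door": [5.0, 5.0, 6.0, 5.0],
    "search_room": [3.0, 20.0, 9.0, 40.0],
    "dock": [7.0],
}
```

Calling `schedulable_subtasks(observed, max_variance=1.0)` keeps only the subtasks whose run times are nearly constant, which are the ones a scheduler can rely on.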
International Conference on Machine Learning | 2008
Istvan Szita; András Lőrincz
Acta Cybernetica | 2008
Istvan Szita; András Lőrincz