Istvan Szita | Researchain

Publication

Featured research published by Istvan Szita.

Neural Computation | 2006

Learning Tetris using the noisy cross-entropy method

Istvan Szita; András Lőrincz

The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems to be limited because it often converges to suboptimal policies. We apply noise for preventing early convergence of the cross-entropy method, using Tetris, a computer game, for demonstration. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
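
The idea of adding noise to keep the cross-entropy method from converging too early can be illustrated with a minimal sketch. It is not the authors' implementation: the toy objective, the Gaussian sampling distribution, the constant `noise` term added to the variance, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def noisy_cross_entropy(score, dim, n_samples=100, elite_frac=0.1,
                        n_iters=50, noise=0.1, seed=0):
    """Maximize score(w) by Gaussian cross-entropy search with variance noise.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    var = np.full(dim, 100.0)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidate parameter vectors from the current distribution.
        samples = rng.normal(mean, np.sqrt(var), size=(n_samples, dim))
        scores = np.array([score(w) for w in samples])
        # Keep the best-scoring candidates and refit the distribution to them.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        # The extra noise term stops the variance from collapsing to zero.
        var = elite.var(axis=0) + noise
    return mean

# Hypothetical toy objective with a known maximum at `target`.
target = np.array([1.0, -2.0, 0.5])

def toy_score(w):
    return -np.sum((w - target) ** 2)

best = noisy_cross_entropy(toy_score, dim=3)
```

With `noise=0.0` this reduces to the plain cross-entropy method, whose variance can shrink to zero before a good solution is found.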

Journal of Artificial Intelligence Research | 2007

Learning to play using low-complexity rule-based policies: illustrations through Ms. Pac-Man

Istvan Szita; András Lőrincz

In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either handcrafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist.
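
A decision list of condition-action rules can be sketched as follows. This is a generic first-match decision list, not the paper's policy representation (which also allows concurrent, temporally extended actions); the observation names, action names, and thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Rule = Tuple[Callable[[Observation], bool], str]

def decide(rules: List[Rule], obs: Observation, default: str = "wander") -> str:
    """Return the action of the first rule whose condition holds."""
    for condition, action in rules:
        if condition(obs):
            return action
    return default

# Hypothetical rules; earlier entries take priority over later ones.
policy: List[Rule] = [
    (lambda o: o["ghost_distance"] < 4.0, "flee_ghost"),
    (lambda o: o["dot_distance"] < 10.0, "eat_nearest_dot"),
]

near_ghost = decide(policy, {"ghost_distance": 2.0, "dot_distance": 1.0})
near_dot = decide(policy, {"ghost_distance": 9.0, "dot_distance": 3.0})
nothing_close = decide(policy, {"ghost_distance": 9.0, "dot_distance": 30.0})
```

In this sketch, selecting which rules enter `policy` (and in which order) is the search problem that an optimizer such as the cross-entropy method would address.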

International Conference on Artificial Neural Networks | 2006

Reinforcement learning with echo state networks

Istvan Szita; Viktor Gyenes; András Lőrincz

Function approximators are often used in reinforcement learning tasks with large or continuous state spaces. Artificial neural networks, among them recurrent neural networks, are popular function approximators, especially in tasks where some kind of memory is needed, as in real-world partially observable scenarios. However, convergence guarantees for such methods are rarely available. Here, we propose a method using a class of novel RNNs, the echo state networks. Proof of convergence to a bounded region is provided for k-order Markov decision processes. Runs on POMDPs were performed to test and illustrate the working of the architecture.

Journal of Machine Learning Research | 2003

ε-MDPs: Learning in varying environments

Istvan Szita; Bálint Takács; András Lőrincz

In this paper ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even if considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.

Neural Computation | 2004

Kalman filter control embedded into the reinforcement learning framework

Istvan Szita; András Lőrincz

There is a growing interest in using Kalman filter models in brain modeling. The question arises whether Kalman filter models can be used on-line not only for estimation but also for control. The usual method of optimal control with a Kalman filter makes use of off-line backward recursion, which is not satisfactory for this purpose. Here, it is shown that a slight modification of the linear-quadratic-Gaussian Kalman filter model allows the on-line estimation of optimal control by using reinforcement learning and overcomes this difficulty. Moreover, the emerging learning rule for value estimation exhibits a Hebbian form, which is weighted by the error of the value estimation.

Neurocomputing | 2006

PIRANHA: Policy iteration for recurrent artificial neural networks with hidden activities

Istvan Szita; András Lőrincz

It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as a minimization problem of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a bridge to reinforcement learning. We exploit this connection and derive a convergent, policy iteration-based algorithm, and show that RNN training can be made to fit the reinforcement learning framework in a natural fashion. The relevance of this connection is discussed. We also present experimental results, which demonstrate the appealing properties of the unique parameter structure prescribed by reinforcement learning. Experiments cover both sequence learning and long-term prediction.

International Symposium on Neural Networks | 2004

Simple algorithm for recurrent neural networks that can learn sequence completion

Istvan Szita; András Lőrincz

We can memorize long sequences like melodies or poems, and it is intriguing to develop efficient connectionist representations for this problem. Recurrent neural networks have been shown to offer a reasonable approach here. We start from a few axiomatic assumptions and provide a simple mathematical framework that encapsulates the problem. A gradient-descent-based algorithm is derived in this framework. Demonstrations on a benchmark problem show the applicability of our approach.

International Symposium on Neural Networks | 2004

An algorithm for finding reliably schedulable plans

Bálint Takács; Istvan Szita; András Lőrincz

For interacting agents in time-critical applications, learning whether a subtask can be scheduled reliably is an important issue. Identifying sub-problems of this nature may help with, for example, planning, scheduling, and segmenting in Markov decision processes. We define a subtask to be schedulable if its execution time has a small variance. We present an algorithm for finding such subtasks.
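
The definition of schedulability given above (execution time with small variance) can be illustrated with a short check. This only illustrates the definition, not the paper's algorithm for finding such subtasks; the function name, the `max_variance` threshold, and the sample timings are hypothetical.

```python
from statistics import pvariance
from typing import Sequence

def is_schedulable(execution_times: Sequence[float],
                   max_variance: float = 1.0) -> bool:
    """Treat a subtask as schedulable if its observed execution times
    have a variance at or below max_variance (an assumed threshold)."""
    if len(execution_times) < 2:
        return False  # too few observations to estimate a variance
    return pvariance(execution_times) <= max_variance

steady = is_schedulable([5.0, 5.1, 4.9])   # nearly constant duration
erratic = is_schedulable([1.0, 9.0, 5.0])  # widely varying duration
```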

International Conference on Machine Learning | 2008