Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Ronald Parr is active.

Publication


Featured research published by Ronald Parr.


Journal of Artificial Intelligence Research | 2003

Efficient solution algorithms for factored MDPs

Carlos Guestrin; Daphne Koller; Ronald Parr; Shobha Venkataraman

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
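The sketch below illustrates the approximate-linear-programming side of this approach on a tiny, fully enumerated MDP: the value function is a weighted sum of basis functions, and the weights come from an LP whose constraints are the Bellman inequalities. The paper's contribution is solving this LP without enumerating states, via the factored decomposition; the MDP, basis functions, and sizes here are invented for illustration.

```python
# Minimal sketch of approximate linear programming (ALP) for an enumerated MDP.
# NOT the paper's factored decomposition; the MDP and features are hypothetical.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 2, 0.9

P = rng.dirichlet(np.ones(nS), size=(nA, nS))   # P[a, s, s'] transition probabilities
R = rng.uniform(0.0, 1.0, size=(nS, nA))        # R[s, a] rewards

# Basis functions: a constant plus two hand-picked features (illustrative only).
Phi = np.column_stack([np.ones(nS),
                       np.arange(nS) / (nS - 1),
                       (np.arange(nS) % 2).astype(float)])
k = Phi.shape[1]

# ALP: minimize alpha^T (Phi w)  subject to
#      (Phi w)(s) >= R(s, a) + gamma * sum_s' P(s'|s, a) (Phi w)(s')  for all s, a.
alpha = np.ones(nS) / nS                         # state-relevance weights
c = Phi.T @ alpha

A_ub, b_ub = [], []
for a in range(nA):
    # Row form: (gamma * P[a] @ Phi - Phi) w <= -R[:, a]
    A_ub.append(gamma * P[a] @ Phi - Phi)
    b_ub.append(-R[:, a])
A_ub, b_ub = np.vstack(A_ub), np.concatenate(b_ub)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * k)
V = Phi @ res.x
print("ALP value estimates:", np.round(V, 3))
print("Greedy policy:", np.argmax(R + gamma * np.einsum('ast,t->sa', P, V), axis=1))
```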


international conference on robotics and automation | 2004

DP-SLAM 2.0

Austin I. Eliazar; Ronald Parr

Probabilistic approaches have proved very successful at addressing the basic problems of robot localization and mapping and they have shown great promise on the combined problem of simultaneous localization and mapping (SLAM). One approach to SLAM assumes relatively sparse, relatively unambiguous landmarks and builds a Kalman filter over landmark positions. Other approaches assume dense sensor data which individually are not very distinctive, such as those available from a laser range finder. In earlier work, we presented an algorithm called DP-SLAM, which provided a very accurate solution to the latter case by efficiently maintaining a joint distribution over robot maps and poses. The approach assumed an extremely accurate laser range finder and a deterministic environment. In this work we demonstrate an improved map representation and laser penetration model, an improvement in the asymptotic efficiency of the algorithm, and empirical results of loop closing on a high resolution map of a very challenging domain.
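For readers unfamiliar with the underlying machinery, the following is a generic sample/weight/resample particle-filter loop of the kind SLAM particle filters build on. It is only a baseline illustration: DP-SLAM's actual contributions, such as sharing occupancy maps across particles and the improved laser penetration model, are not reproduced, and the one-dimensional world and Gaussian models are hypothetical.

```python
# Generic particle-filter skeleton for localization in a toy 1-D world.
# Illustrates sample / weight / resample only; not DP-SLAM's data structures.
import numpy as np

rng = np.random.default_rng(1)
n_particles, motion_noise, obs_noise = 500, 0.2, 0.5
landmark = 10.0                                  # known landmark position (toy map)

particles = rng.normal(0.0, 1.0, n_particles)    # initial pose belief
true_pose = 0.0

for step in range(20):
    control = 0.5                                # commanded forward motion
    true_pose += control
    observation = (landmark - true_pose) + rng.normal(0.0, obs_noise)

    # 1) Propagate each particle through a noisy motion model.
    particles = particles + control + rng.normal(0.0, motion_noise, n_particles)
    # 2) Weight particles by the likelihood of the range observation.
    expected = landmark - particles
    weights = np.exp(-0.5 * ((observation - expected) / obs_noise) ** 2)
    weights /= weights.sum()
    # 3) Resample in proportion to the weights.
    particles = particles[rng.choice(n_particles, n_particles, p=weights)]

print("true pose:", round(true_pose, 2),
      "estimate:", round(float(particles.mean()), 2))
```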


international conference on machine learning | 2007

Analyzing feature generation for value-function approximation

Ronald Parr; Christopher Painter-Wakefield; Lihong Li; Michael L. Littman

We analyze a simple, Bellman-error-based approach to generating basis functions for value-function approximation. We show that it generates orthogonal basis functions that provably tighten approximation error bounds. We also illustrate the use of this approach in the presence of noise on some sample problems.
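A minimal sketch of the idea analyzed here, under the simplifying assumption of a known model: at each step, solve the linear fixed point for the current features, then add the (normalized) Bellman residual as the next basis function. The random chain MDP and sizes are made up.

```python
# Bellman-error basis-function (BEBF) generation for policy evaluation,
# assuming a known transition matrix for clarity.  MDP is synthetic.
import numpy as np

rng = np.random.default_rng(2)
nS, gamma = 30, 0.95
P = rng.dirichlet(np.ones(nS), size=nS)          # fixed-policy transition matrix
R = rng.uniform(0.0, 1.0, size=nS)
V_true = np.linalg.solve(np.eye(nS) - gamma * P, R)

Phi = np.ones((nS, 1))                           # start from the constant feature
for i in range(6):
    # Linear fixed-point (LSTD-style) solution for the current feature set.
    A = Phi.T @ (Phi - gamma * P @ Phi)
    b = Phi.T @ R
    w = np.linalg.solve(A, b)
    V = Phi @ w
    bellman_error = R + gamma * P @ V - V        # residual of the current approximation
    print(f"{Phi.shape[1]} features: ||V - V*|| = {np.linalg.norm(V - V_true):.4f}")
    # BEBF step: the normalized Bellman error becomes the next basis function.
    Phi = np.column_stack([Phi, bellman_error / np.linalg.norm(bellman_error)])
```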


international conference on machine learning | 2008

An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning

Ronald Parr; Lihong Li; Gavin Taylor; Christopher Painter-Wakefield; Michael L. Littman

We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
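The following is a small numerical check of the equivalence claimed above, under the simplifying assumption of a known model: projecting the rewards and next-state features onto the span of the features yields a compressed linear model whose exact solution coincides with the linear (LSTD-style) fixed point. The random MDP and features are hypothetical.

```python
# Linear value-function approximation vs. linear model approximation.
import numpy as np

rng = np.random.default_rng(3)
nS, k, gamma = 20, 4, 0.9
P = rng.dirichlet(np.ones(nS), size=nS)          # fixed-policy transitions
R = rng.uniform(size=nS)
Phi = np.column_stack([np.ones(nS), rng.normal(size=(nS, k - 1))])

# (1) Linear fixed point of the Bellman equation (LSTD solution with a known model).
w_lstd = np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ R)

# (2) Approximate linear model: regress rewards and next-state features onto span(Phi),
#     then solve that compressed model exactly in feature space.
pinv = np.linalg.pinv(Phi)
P_phi = pinv @ P @ Phi                           # compressed transition model (k x k)
r_phi = pinv @ R                                 # compressed reward model
w_model = np.linalg.solve(np.eye(k) - gamma * P_phi, r_phi)

print("max |difference|:", np.max(np.abs(Phi @ w_lstd - Phi @ w_model)))  # ~0
```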


very large data bases | 2002

XPathLearner: an on-line self-tuning Markov histogram for XML path selectivity estimation

Lipyeow Lim; Min Wang; Sriram Padmanabhan; Jeffrey Scott Vitter; Ronald Parr

The Extensible Markup Language (XML) is gaining widespread use as a format for data exchange and storage on the World Wide Web. Queries over XML data require accurate selectivity estimation of path expressions to optimize query execution plans. Selectivity estimation of XML path expressions is usually done based on summary statistics about the structure of the underlying XML repository. All previous methods require an off-line scan of the XML repository to collect the statistics. In this paper, we propose XPathLearner, a method for estimating selectivity of the most commonly used types of path expressions without looking at the XML data. XPathLearner gathers and refines the statistics using query feedback in an on-line manner and is especially suited to queries in Internet-scale applications since the underlying XML repository is either inaccessible or too large to be scanned in its entirety. Besides the on-line property, our method also has two other novel features: (a) XPathLearner is workload-aware in collecting the statistics and thus can be more accurate than the more costly off-line method under tight memory constraints, and (b) XPathLearner automatically adjusts the statistics using query feedback when the underlying XML data change. We show empirically the estimation accuracy of our method using several real data sets.
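A much-simplified sketch of the kind of statistic involved: a first-order Markov histogram estimates a path's selectivity from per-tag counts and tag-pair probabilities, and refines those statistics from query feedback. The class, defaults, and update rule below are hypothetical simplifications (the conditional probabilities are not even clamped), not the paper's actual estimator or update rules.

```python
# Toy first-order Markov statistics for simple XML paths like ('site', 'people', 'person').
from collections import defaultdict

class MarkovPathHistogram:
    def __init__(self):
        self.root_counts = defaultdict(lambda: 1.0)   # guessed count of each starting tag
        self.cond = defaultdict(lambda: 0.5)          # guessed P(child tag | parent tag)

    def estimate(self, path):
        """Selectivity estimate: root count times the chain of tag-pair probabilities."""
        est = self.root_counts[path[0]]
        for parent, child in zip(path, path[1:]):
            est *= self.cond[(parent, child)]
        return est

    def feedback(self, path, true_count):
        """On-line refinement: spread a multiplicative correction evenly over the
        path's statistics so the estimate for this exact path matches the feedback."""
        ratio = (true_count / self.estimate(path)) ** (1.0 / len(path))
        self.root_counts[path[0]] *= ratio
        for parent, child in zip(path, path[1:]):
            self.cond[(parent, child)] *= ratio

h = MarkovPathHistogram()
print("before feedback:", h.estimate(("site", "people", "person")))
h.feedback(("site", "people", "person"), true_count=2550)
print("after feedback: ", h.estimate(("site", "people", "person")))
```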


international conference on machine learning | 2009

Kernelized value function approximation for reinforcement learning

Gavin Taylor; Ronald Parr

A recent surge in research in kernelized approaches to reinforcement learning has sought to bring the benefits of kernelized machine learning techniques to reinforcement learning. Kernelized reinforcement learning techniques are fairly new and different authors have approached the topic with different assumptions and goals. Neither a unifying view nor an understanding of the pros and cons of different approaches has yet emerged. In this paper, we offer a unifying view of the different approaches to kernelized value function approximation for reinforcement learning. We show that, except for different approaches to regularization, Kernelized LSTD (KLSTD) is equivalent to a model-based approach that uses kernelized regression to find an approximate reward and transition model, and that Gaussian Process Temporal Difference learning (GPTD) returns a mean value function that is equivalent to these other approaches. We also discuss the relationship between our model-based approach and the earlier Gaussian Processes in Reinforcement Learning (GPRL). Finally, we decompose the Bellman error into the sum of transition error and reward error terms, and demonstrate through experiments that this decomposition can be helpful in choosing regularization parameters.
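The Bellman-error decomposition mentioned at the end can be illustrated numerically. The sketch below uses Gaussian-kernel features on a finite chain MDP with no regularization, and checks that the Bellman error of the fixed-point solution equals the reward-model error plus gamma times the transition-model error. This is a sketch of the idea, not the paper's exact kernelized (KLSTD/GPTD) derivation; the MDP and kernel width are made up.

```python
# Bellman error = reward-model error + gamma * transition-model error.
import numpy as np

rng = np.random.default_rng(4)
nS, gamma, width = 25, 0.9, 2.0
states = np.arange(nS, dtype=float)
P = rng.dirichlet(np.ones(nS), size=nS)          # fixed-policy transitions
R = rng.uniform(size=nS)

# Kernel features: one Gaussian bump per "support" state.
centers = states[::3]
Phi = np.exp(-0.5 * ((states[:, None] - centers[None, :]) / width) ** 2)

# Fixed-point value function and its implied compressed (model-based) view.
pinv = np.linalg.pinv(Phi)
P_phi, r_phi = pinv @ P @ Phi, pinv @ R          # learned transition / reward models
w = np.linalg.solve(np.eye(Phi.shape[1]) - gamma * P_phi, r_phi)

bellman_error = R + gamma * P @ (Phi @ w) - Phi @ w
reward_error = R - Phi @ r_phi                   # error of the learned reward model
transition_error = (P @ Phi - Phi @ P_phi) @ w   # error of the learned transition model
print("max |BE - (reward_error + gamma * transition_error)| =",
      np.max(np.abs(bellman_error - (reward_error + gamma * transition_error))))
```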


international conference on machine learning | 2004

Learning probabilistic motion models for mobile robots

Austin I. Eliazar; Ronald Parr

Machine learning methods are often applied to the problem of learning a map from a robot's sensor data, but they are rarely applied to the problem of learning a robot's motion model. The motion model, which can be influenced by robot idiosyncrasies and terrain properties, is a crucial aspect of current algorithms for Simultaneous Localization and Mapping (SLAM). In this paper we concentrate on generating the correct motion model for a robot by applying EM methods in conjunction with a current SLAM algorithm. In contrast to previous calibration approaches, we not only estimate the mean of the motion, but also the interdependencies between motion terms, and the variances in these terms. This can be used to provide a more focused proposal distribution to a particle filter used in a SLAM algorithm, which can reduce the resources needed for localization while decreasing the chance of losing track of the robot's position. We validate this approach by recovering a good motion model despite initialization with a poor one. Further experiments validate the generality of the learned model in similar circumstances.
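As a simplified illustration of what such a motion model looks like: given commanded (distance, turn) controls and observed pose changes, one can fit the mean response by least squares and the full noise covariance, including cross-terms, from the residuals, then use the result as a particle-filter proposal. The paper instead learns the model with EM inside a SLAM algorithm precisely because true displacements are not observed; assuming observed displacements here is a deliberate simplification, and the data are synthetic.

```python
# Fitting a Gaussian motion model (mean response + full covariance) from
# synthetic, fully observed data; a simplification of the paper's EM setting.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
controls = rng.uniform([0.0, -0.5], [1.0, 0.5], size=(n, 2))     # (distance, turn) commands

# "Ground-truth" robot: systematic scaling plus correlated noise (unknown to the learner).
true_A = np.array([[0.90, 0.10],
                   [0.05, 1.10]])
true_cov = np.array([[0.010, 0.004],
                     [0.004, 0.020]])
moves = controls @ true_A.T + rng.multivariate_normal([0, 0], true_cov, size=n)

# Fit the mean response by least squares, then the residual covariance.
A_hat, *_ = np.linalg.lstsq(controls, moves, rcond=None)
A_hat = A_hat.T
residuals = moves - controls @ A_hat.T
cov_hat = np.cov(residuals.T)

print("estimated response matrix:\n", np.round(A_hat, 3))
print("estimated noise covariance:\n", np.round(cov_hat, 4))

def sample_motion(control, k, rng=rng):
    """Draw k particle-filter proposal samples for one control."""
    return rng.multivariate_normal(A_hat @ control, cov_hat, size=k)

print(sample_motion(np.array([0.5, 0.1]), 3))
```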


hellenic conference on artificial intelligence | 2002

Least-Squares Methods in Reinforcement Learning for Control

Michail G. Lagoudakis; Ronald Parr; Michael L. Littman

Least-squares methods have been successfully used for prediction problems in the context of reinforcement learning, but little has been done in extending these methods to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) algorithm in the spirit of Q-learning. Later, an effort to remedy some limitations of this algorithm (approximation bias, poor sample utilization) led to the Least-Squares Policy Iteration (LSPI) algorithm, which is a form of model-free approximate policy iteration and makes efficient use of training samples collected in any arbitrary manner. The algorithms are demonstrated on a variety of learning domains, including algorithm selection, inverted pendulum balancing, bicycle balancing and riding, multiagent learning in factored domains, and, recently, on two-player zero-sum Markov games and the game of Tetris.
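A minimal sketch of the LSPI loop on a tiny random MDP: LSTDQ builds and solves a linear system from a fixed batch of samples to evaluate the current policy's Q-function, then the policy is improved greedily and the process repeats. One-hot (tabular) features are used so the linear algebra stays transparent; the MDP, sample size, and feature choice are illustrative, and LSPI's point is that richer features can replace the one-hot encoding without changing the algorithm.

```python
# Least-Squares Policy Iteration (LSPI) sketch with one-hot features.
import numpy as np

rng = np.random.default_rng(6)
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nA, nS))
R = rng.uniform(size=(nS, nA))

def phi(s, a):
    f = np.zeros(nS * nA)
    f[s * nA + a] = 1.0                          # one-hot feature for the (s, a) pair
    return f

# Collect a fixed batch of samples (s, a, r, s') under a uniformly random policy.
samples = []
for _ in range(5000):
    s, a = rng.integers(nS), rng.integers(nA)
    s_next = rng.choice(nS, p=P[a, s])
    samples.append((s, a, R[s, a], s_next))

policy = np.zeros(nS, dtype=int)                 # initial policy: always action 0
for iteration in range(10):
    # LSTDQ: solve A w = b built from the batch, evaluating the current policy.
    A = np.zeros((nS * nA, nS * nA))
    b = np.zeros(nS * nA)
    for s, a, r, s_next in samples:
        f = phi(s, a)
        A += np.outer(f, f - gamma * phi(s_next, policy[s_next]))
        b += r * f
    w = np.linalg.lstsq(A, b, rcond=None)[0]
    # Policy improvement: act greedily with respect to the learned Q.
    new_policy = w.reshape(nS, nA).argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("LSPI policy:", policy)
```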


IEEE Transactions on Signal Processing | 2007

Nonmyopic Multiaspect Sensing With Partially Observable Markov Decision Processes

Shihao Ji; Ronald Parr; Lawrence Carin

We consider the problem of sensing a concealed or distant target by interrogation from multiple sensors situated on a single platform. The available actions that may be taken are selection of the next relative target-platform orientation and the next sensor to be deployed. The target is modeled in terms of a set of states, each state representing a contiguous set of target-sensor orientations over which the scattering physics is relatively stationary. The sequence of states sampled at multiple target-sensor orientations may be modeled as a Markov process. The sensor only has access to the scattered fields, without knowledge of the particular state being sampled, and, therefore, the problem is modeled as a partially observable Markov decision process (POMDP). The POMDP yields a policy, in which the belief state at any point is mapped to a corresponding action. The nonmyopic policy is compared to an approximate myopic approach, with example results presented for measured underwater acoustic scattering data.
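To make the POMDP machinery concrete, the sketch below shows a discrete Bayes belief update over hidden target states and a myopic baseline that picks the action with the best expected immediate reward under the current belief. The transition, observation, and reward tables are invented for illustration; the paper's nonmyopic policy comes from solving the POMDP, which is not reproduced here.

```python
# Discrete POMDP belief update plus a myopic action-selection baseline.
import numpy as np

rng = np.random.default_rng(7)
nStates, nActions, nObs = 4, 3, 5
T = rng.dirichlet(np.ones(nStates), size=(nActions, nStates))   # T[a, s, s']
O = rng.dirichlet(np.ones(nObs), size=(nActions, nStates))      # O[a, s', o]
Rwd = rng.uniform(size=(nStates, nActions))                     # immediate sensing reward

def belief_update(belief, action, obs):
    """Bayes update: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) b(s)."""
    predicted = T[action].T @ belief
    posterior = O[action, :, obs] * predicted
    return posterior / posterior.sum()

def myopic_action(belief):
    """Greedy baseline: maximize expected immediate reward under the belief."""
    return int(np.argmax(belief @ Rwd))

belief = np.ones(nStates) / nStates                              # start uninformed
for step in range(5):
    a = myopic_action(belief)
    o = rng.integers(nObs)                                       # stand-in for a real measurement
    belief = belief_update(belief, a, o)
    print(f"step {step}: action {a}, observation {o}, belief {np.round(belief, 3)}")
```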


international joint conference on artificial intelligence | 2011

Security games with multiple attacker resources

Dmytro Korzhyk; Vincent Conitzer; Ronald Parr

Algorithms for finding game-theoretic solutions are now used in several real-world security applications. This work has generally assumed a Stackelberg model where the defender commits to a mixed strategy first. In general two-player normal-form games, Stackelberg strategies are easier to compute than Nash equilibria, though it has recently been shown that in many security games, Stackelberg strategies are also Nash strategies for the defender. However, the work on security games so far assumes that the attacker attacks only a single target. In this paper, we generalize to the case where the attacker attacks multiple targets simultaneously. Here, Stackelberg and Nash strategies for the defender can be truly different. We provide a polynomial-time algorithm for finding a Nash equilibrium. The algorithm gradually increases the number of defender resources and maintains an equilibrium throughout this process. Moreover, we prove that Nash equilibria in security games with multiple attackers satisfy the interchange property, which resolves the problem of equilibrium selection in such games. On the other hand, we show that Stackelberg strategies are actually NP-hard to compute in this context. Finally, we provide experimental results.

Collaboration


Dive into Ronald Parr's collaborations.

Top Co-Authors

Michail G. Lagoudakis, Technical University of Crete
Jason Pazis, Massachusetts Institute of Technology
Gavin Taylor, United States Naval Academy