Nicholas K. Jong
University of Texas at Austin
Publication
Featured research published by Nicholas K. Jong.
european conference on machine learning | 2008
Matthew E. Taylor; Nicholas K. Jong; Peter Stone
Reinforcement learning agents typically require a significant amount of data before performing well on complex tasks. Transfer learning methods have made progress reducing sample complexity, but they have primarily been applied to model-free learning methods, not more data-efficient model-based learning methods. This paper introduces TIMBREL, a novel method capable of transferring information effectively into a model-based reinforcement learning algorithm. We demonstrate that TIMBREL can significantly improve the sample efficiency and asymptotic performance of a model-based algorithm when learning in a continuous state space. Additionally, we conduct experiments to test the limits of TIMBREL's effectiveness.
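As a rough illustration of the instance-transfer idea at work here, the sketch below maps recorded source-task transitions into the target task through hand-coded inter-task mappings so they can seed the target learner's model. The mapping functions and tuple layout are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical inter-task mappings; in practice these encode domain
# knowledge about how source-task states and actions correspond to
# target-task states and actions.
def map_state(source_state):
    return source_state[:2]          # e.g. drop extra source dimensions

def map_action(source_action):
    return source_action

def transfer_instances(source_transitions):
    """Translate source-task transitions (s, a, r, s') into the target
    task so they can seed the target agent's learned model."""
    return [(map_state(s), map_action(a), r, map_state(s_next))
            for s, a, r, s_next in source_transitions]

# Example: two recorded source transitions become target-task instances.
source = [((0.1, 0.2, 0.0), 0, -1.0, (0.2, 0.3, 0.0)),
          ((0.2, 0.3, 0.0), 1, -1.0, (0.3, 0.5, 0.0))]
print(transfer_instances(source))
```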
symposium on abstraction reformulation and approximation | 2007
Nicholas K. Jong; Peter Stone
Modern reinforcement learning algorithms effectively exploit experience data sampled from an unknown controlled dynamical system to compute a good control policy, but to obtain the necessary data they typically rely on naive exploration mechanisms or human domain knowledge. Approaches that first learn a model offer improved exploration in finite problems, but discrete model representations do not extend directly to continuous problems. This paper develops a method for approximating continuous models by fitting data to a finite sample of states, leading to finite representations compatible with existing model-based exploration mechanisms. Experiments with the resulting family of fitted-model reinforcement learning algorithms reveal the critical importance of how the continuous model is generalized from finite data. This paper demonstrates instantiations of fitted-model algorithms that lead to faster learning on benchmark problems than contemporary model-free RL algorithms that only apply generalization in estimating action values. Finally, the paper concludes that in continuous problems, the exploration-exploitation tradeoff is better construed as a balance between exploration and generalization.
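A minimal sketch of the fitted-model idea, assuming a nearest-neighbor projection of experience onto a finite sample of states; the sample selection, averaging scheme, and names are illustrative rather than the paper's exact construction.

```python
import numpy as np

def nearest(sample_states, x):
    """Index of the sample state closest to continuous state x."""
    return int(np.argmin(np.linalg.norm(sample_states - x, axis=1)))

def build_finite_model(sample_states, transitions, n_actions):
    """Average observed transitions (s, a, r, s') onto the finite sample,
    yielding tabular reward and transition estimates that existing
    model-based exploration mechanisms can consume."""
    n = len(sample_states)
    counts = np.zeros((n, n_actions))
    R = np.zeros((n, n_actions))
    P = np.zeros((n, n_actions, n))
    for s, a, r, s_next in transitions:
        i, j = nearest(sample_states, s), nearest(sample_states, s_next)
        counts[i, a] += 1
        R[i, a] += r
        P[i, a, j] += 1
    counts = np.maximum(counts, 1)        # avoid division by zero
    return R / counts, P / counts[:, :, None]

def value_iteration(R, P, gamma=0.95, iters=200):
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = np.max(R + gamma * (P @ V), axis=1)
    return V

# Example: two sample states, one action, two observed transitions.
S = np.array([[0.0], [1.0]])
data = [(np.array([0.1]), 0, 0.0, np.array([0.9])),
        (np.array([0.95]), 0, 1.0, np.array([1.05]))]
R, P = build_finite_model(S, data, n_actions=1)
print(value_iteration(R, P))
```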
adaptive agents and multi-agents systems | 2007
Nicholas K. Jong; Peter Stone
Reinforcement learning promises a generic method for adapting agents to arbitrary tasks in arbitrary stochastic environments, but applying it to new real-world problems remains difficult, a few impressive success stories notwithstanding. Most interesting agent-environment systems have large state spaces, so performance depends crucially on efficient generalization from a small amount of experience. Current algorithms rely on model-free function approximation, which estimates the long-term values of states and actions directly from data and assumes that actions have similar values in similar states. This paper proposes model-based function approximation, which combines two forms of generalization by assuming that in addition to having similar values in similar states, actions also have similar effects. For one family of generalization schemes known as averagers, computation of an approximate value function from an approximate model is shown to be equivalent to the computation of the exact value function for a finite model derived from data. This derivation both integrates two independent sources of generalization and permits the extension of model-based techniques developed for finite problems. Preliminary experiments with a novel algorithm, AMBI (Approximate Models Based on Instances), demonstrate that this approach yields faster learning on some standard benchmark problems than many contemporary algorithms.
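The averager idea can be sketched as follows, assuming a Gaussian kernel over stored instances: because the weights are non-negative and sum to one, the averaged model's predictions are convex combinations of recorded outcomes, and planning with it amounts to solving a finite model over the recorded successor states. The kernel, bandwidth, and names are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def averager_weights(query, states, bandwidth=0.5):
    """Gaussian-kernel weights over stored instance states; non-negative
    and summing to one, the defining property of an averager."""
    d = np.linalg.norm(states - query, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)
    return w / w.sum()

def predict(query, instances):
    """Expected reward and next state under the averaged model.
    `instances` is a list of (s, r, s') tuples for one fixed action."""
    states = np.array([s for s, _, _ in instances])
    rewards = np.array([r for _, r, _ in instances])
    successors = np.array([s2 for _, _, s2 in instances])
    w = averager_weights(query, states)
    return w @ rewards, w @ successors

# Two stored instances; querying between them blends their outcomes.
data = [(np.array([0.0]), -1.0, np.array([0.2])),
        (np.array([1.0]),  0.0, np.array([1.1]))]
print(predict(np.array([0.4]), data))
```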
Robotics and Autonomous Systems | 2006
Peter Stone; Mohan Sridharan; Daniel Stronger; Gregory Kuhlmann; Nate Kohl; Peggy Fidelman; Nicholas K. Jong
Mobile robots must cope with uncertainty from many sources along the path from interpreting raw sensor inputs to behavior selection to execution of the resulting primitive actions. This article identifies several such sources and introduces methods for (i) reducing uncertainty and (ii) making decisions in the face of uncertainty. We present a complete vision-based robotic system that includes several algorithms for learning models that are useful and necessary for planning, and then place particular emphasis on the planning and decision-making capabilities of the robot. Specifically, we present models for autonomous color calibration, autonomous sensor and actuator modeling, and an adaptation of particle filtering for improved localization on legged robots. These contributions enable effective planning under uncertainty for robots engaged in goal-oriented behavior within a dynamic, collaborative and adversarial environment. Each of our algorithms is fully implemented and tested on a commercial off-the-shelf vision-based quadruped robot.
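For the localization component, a single particle-filter update (predict, weight, resample) might look like the sketch below; the one-dimensional motion and sensor models are illustrative assumptions, not the robot's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, control, observation,
                         motion_noise=0.05, obs_noise=0.1):
    # Predict: move every particle according to the odometry/control input.
    particles = particles + control + rng.normal(0, motion_noise, particles.shape)
    # Weight: likelihood of the observation (here a noisy position reading).
    weights = np.exp(-0.5 * ((observation - particles) / obs_noise) ** 2)
    weights /= weights.sum()
    # Resample: draw a new particle set in proportion to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = rng.uniform(0.0, 1.0, size=200)   # initial belief over position
particles = particle_filter_step(particles, control=0.1, observation=0.62)
print(particles.mean())                        # posterior position estimate
```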
international conference on machine learning | 2008
Nicholas K. Jong; Peter Stone
Hierarchical decomposition promises to help scale reinforcement learning algorithms naturally to real-world problems by exploiting their underlying structure. Model-based algorithms, which provided the first finite-time convergence guarantees for reinforcement learning, may also play an important role in coping with the relative scarcity of data in large environments. In this paper, we introduce an algorithm that fully integrates modern hierarchical and model-learning methods in the standard reinforcement learning setting. Our algorithm, R-maxq, inherits the efficient model-based exploration of the R-max algorithm and the opportunities for abstraction provided by the MAXQ framework. We analyze the sample complexity of our algorithm, and our experiments in a standard simulation environment illustrate the advantages of combining hierarchies and models.
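A minimal sketch of the R-max style optimism that R-maxq inherits: primitive state-action pairs visited fewer than m times are modeled as jumping to a fictitious, maximally rewarding state, so planning itself drives the agent toward unexplored regions; in R-maxq this construction is applied within each subtask of the MAXQ hierarchy. The tabular representation and constants below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def optimistic_model(counts, reward_sums, trans_counts, m=5, r_max=1.0):
    """Build tabular R and P in which under-sampled (state, action) pairs
    lead to an absorbing 'optimism' state (index n) with reward r_max."""
    n, n_actions = counts.shape
    R = np.zeros((n + 1, n_actions))
    P = np.zeros((n + 1, n_actions, n + 1))
    R[n, :] = r_max
    P[n, :, n] = 1.0                              # optimism state absorbs
    for s in range(n):
        for a in range(n_actions):
            if counts[s, a] >= m:                 # known: empirical model
                R[s, a] = reward_sums[s, a] / counts[s, a]
                P[s, a, :n] = trans_counts[s, a] / counts[s, a]
            else:                                 # unknown: optimistic jump
                R[s, a] = r_max
                P[s, a, n] = 1.0
    return R, P

# One state, two actions: action 0 is known, action 1 is still optimistic.
R, P = optimistic_model(counts=np.array([[6, 1]]),
                        reward_sums=np.array([[3.0, 0.0]]),
                        trans_counts=np.array([[[6], [0]]]), m=5)
print(R)
```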
european conference on machine learning | 2009
Nicholas K. Jong; Peter Stone
Innovations such as optimistic exploration, function approximation, and hierarchical decomposition have helped scale reinforcement learning to more complex environments, but these three ideas have rarely been studied together. This paper develops a unified framework that formalizes these algorithmic contributions as operators on learned models of the environment. Our formalism reveals some synergies among these innovations, and it suggests a straightforward way to compose them. The resulting algorithm, Fitted R-MAXQ, is the first to combine the function approximation of fitted algorithms, the efficient model-based exploration of R-MAX, and the hierarchical decomposition of MAXQ.
european conference on machine learning | 2006
Gerald Tesauro; Nicholas K. Jong; Rajarshi Das; Mohamed N. Bennani
Reinforcement Learning (RL) holds particular promise in an emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype Data Center, without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach, in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids potentially poor performance in live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing model policies.
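A rough sketch of the hybrid training setup, assuming a log of (state, action, reward, next state) tuples collected while the queuing-theoretic policy controls the system, with fitted Q-iteration and a small neural network standing in for the nonlinear approximator; the state/action encoding and toy log are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fitted_q_iteration(log, actions, gamma=0.9, iters=10):
    """Offline batch RL on logged transitions: repeatedly regress
    Q(s, a) onto r + gamma * max_b Q(s', b) with a neural network."""
    X = np.array([np.append(s, a) for s, a, _, _ in log])
    Q = None
    for _ in range(iters):
        targets = []
        for s, a, r, s_next in log:
            if Q is None:
                targets.append(r)                 # first pass: immediate reward
            else:
                q_next = max(Q.predict([np.append(s_next, b)])[0] for b in actions)
                targets.append(r + gamma * q_next)
        Q = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X, targets)
    return Q

# Toy log: state = observed demand, action = servers allocated,
# reward = throughput minus allocation cost.
log = [(np.array([0.3]), 1, 0.2, np.array([0.4])),
       (np.array([0.8]), 2, 0.5, np.array([0.7])),
       (np.array([0.7]), 1, 0.1, np.array([0.6]))]
Q = fitted_q_iteration(log, actions=[1, 2])
print(Q.predict([[0.5, 2]]))                      # estimated value of allocating 2 servers
```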
Cluster Computing | 2007
Gerald Tesauro; Nicholas K. Jong; Rajarshi Das; Mohamed N. Bennani
international conference on machine learning | 2003
Satinder P. Singh; Michael L. Littman; Nicholas K. Jong; David Pardoe; Peter Stone
international joint conference on artificial intelligence | 2005
Nicholas K. Jong; Peter Stone