Rémi Munos
École Polytechnique
Publications
Featured research published by Rémi Munos.
Machine Learning | 2002
Rémi Munos; Andrew W. Moore
The problem of state abstraction is of central importance in optimal control, reinforcement learning, and Markov decision processes. This paper studies the case of variable resolution state abstraction for continuous time and space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches based on value function and policy properties that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of other states. Variance is an efficiently calculable measure of how risky a state in a Markov chain is: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, given by the value function. The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar, non-linear, non-minimum-phase, two-dimensional “Car on the Hill” problem. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.
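To make the top-down refinement idea concrete, here is a minimal Python sketch of kd-trie cell splitting driven by a local criterion. The criterion used here (the spread of a value function over a cell's corners), the threshold, and all names are illustrative stand-ins, not the paper's method: the actual representations use Kuhn triangulations for interpolation, and the paper's criteria include the non-local influence and variance measures, which this toy does not implement.

```python
# Toy sketch (not the paper's implementation): top-down refinement of a
# kd-trie over a 2-D state space, splitting a cell when a local criterion
# exceeds a threshold. The criterion here is a simple stand-in: the spread
# of a value function evaluated at the cell's corners.
from dataclasses import dataclass
from typing import Callable, Optional
import itertools

@dataclass
class Cell:
    lo: tuple                         # lower corner of the cell
    hi: tuple                         # upper corner of the cell
    depth: int = 0
    children: Optional[tuple] = None  # (low_child, high_child) after a split

def corner_values(cell: Cell, value_fn: Callable) -> list:
    """Evaluate the value function at every corner of the cell."""
    corners = itertools.product(*zip(cell.lo, cell.hi))
    return [value_fn(c) for c in corners]

def local_criterion(cell: Cell, value_fn: Callable) -> float:
    """Stand-in splitting criterion: value spread across the cell's corners."""
    vals = corner_values(cell, value_fn)
    return max(vals) - min(vals)

def refine(cell: Cell, value_fn: Callable, threshold: float, max_depth: int = 10):
    """Recursively split cells (cycling through axes, as in a kd-trie)
    while the criterion exceeds the threshold."""
    if cell.depth >= max_depth or local_criterion(cell, value_fn) <= threshold:
        return
    axis = cell.depth % len(cell.lo)            # kd-trie: alternate split axes
    mid = 0.5 * (cell.lo[axis] + cell.hi[axis])
    lo_hi = list(cell.hi); lo_hi[axis] = mid    # upper corner of the lower child
    hi_lo = list(cell.lo); hi_lo[axis] = mid    # lower corner of the upper child
    low_child = Cell(cell.lo, tuple(lo_hi), cell.depth + 1)
    high_child = Cell(tuple(hi_lo), cell.hi, cell.depth + 1)
    cell.children = (low_child, high_child)
    refine(low_child, value_fn, threshold, max_depth)
    refine(high_child, value_fn, threshold, max_depth)

# Example: refine more finely where the (made-up) value function varies.
root = Cell(lo=(-1.0, -2.0), hi=(1.0, 2.0))
refine(root, value_fn=lambda s: s[0] ** 2 + s[1], threshold=0.5)
```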
Machine Learning | 2000
Rémi Munos
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and continuous-time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear first-order (or second-order, depending on whether the process is deterministic or stochastic) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exist infinitely many generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into the DP equation of a Markov Decision Process (MDP), which could be solved by DP methods (thanks to a “strong” contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach, as we consider a system that learns “from experience” through interaction with an a priori (at least partially) unknown environment, the initial data are not perfectly known but have to be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only “approximations” (in the sense of satisfying some “weak” contraction property) of the initial data. This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though this latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and one numerical simulation for the “Car on the Hill” problem.
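As a rough illustration of the discretize-then-DP pipeline described above (not the paper's FD/FE schemes or its RL convergence result), the following Python sketch grids a toy 1-D deterministic control problem, treats each grid transition as one step of a discounted MDP, and solves the resulting DP equation by value iteration. The dynamics, reward, and discount rate are assumptions made for the example.

```python
# Toy illustration (assumed dynamics, reward, and unit discount rate): a 1-D
# deterministic control problem dx/dt = a, a in {-1, +1}, discretized on a
# uniform grid so that the HJB equation becomes the DP equation of a finite
# MDP, which value iteration solves thanks to the gamma-contraction.
import numpy as np

N = 101
xs = np.linspace(-1.0, 1.0, N)     # grid over the state space [-1, 1]
dt = xs[1] - xs[0]                 # time to cross one cell at unit speed
gamma = np.exp(-dt)                # per-step discount factor e^{-dt}
actions = (-1, +1)                 # bang-bang controls

def reward(x: float) -> float:
    # Assumed reinforcement: reward only near the right boundary.
    return 1.0 if x >= 1.0 - 1e-9 else 0.0

V = np.zeros(N)
for _ in range(10_000):            # value iteration on the discretized MDP
    V_new = np.array([
        max(reward(xs[i]) * dt + gamma * V[int(np.clip(i + a, 0, N - 1))]
            for a in actions)
        for i in range(N)
    ])
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Greedy policy with respect to the computed value function.
policy = [max(actions,
              key=lambda a, i=i: reward(xs[i]) * dt
                                 + gamma * V[int(np.clip(i + a, 0, N - 1))])
          for i in range(N)]
```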
Siam Journal on Control and Optimization | 2005
Emmanuel Gobet; Rémi Munos
We consider a multidimensional diffusion process $(X^\alpha_t)_{0\leq t\leq T}$ whose dynamics depends on a parameter $\alpha$. Our first purpose is to write as an expectation the sensitivity $\nabla_\alpha J(\alpha)$ for the expected cost $J(\alpha)=\mathbb{E}(f(X^\alpha_T))$
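For intuition about the quantity in question, here is a small Python sketch that estimates $J(\alpha)=\mathbb{E}(f(X^\alpha_T))$ by Monte Carlo simulation of an Euler-Maruyama scheme and approximates $\nabla_\alpha J(\alpha)$ by a central finite difference with common random numbers. The drift, diffusion, cost $f$, and all parameter values are assumptions for the example; this is not the estimator developed in the paper.

```python
# Illustrative sketch (assumed SDE and cost, not the paper's estimator):
# estimate J(alpha) = E[f(X_T^alpha)] by Euler-Maruyama Monte Carlo, and
# approximate the sensitivity dJ/dalpha by a central finite difference
# using common random numbers across the two perturbed simulations.
import numpy as np

def simulate_terminal(alpha: float, noise: np.ndarray, x0: float = 1.0,
                      T: float = 1.0) -> np.ndarray:
    """Euler-Maruyama terminal values X_T for dX = alpha*X dt + 0.2*X dW."""
    n_paths, n_steps = noise.shape
    dt = T / n_steps
    x = np.full(n_paths, x0)
    for k in range(n_steps):
        x = x + alpha * x * dt + 0.2 * x * np.sqrt(dt) * noise[:, k]
    return x

def J_and_sensitivity(alpha: float, n_paths: int = 100_000,
                      n_steps: int = 100, eps: float = 1e-3):
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((n_paths, n_steps))   # common random numbers
    f = lambda x: np.maximum(x - 1.0, 0.0)            # assumed cost f
    J = f(simulate_terminal(alpha, noise)).mean()
    J_plus = f(simulate_terminal(alpha + eps, noise)).mean()
    J_minus = f(simulate_terminal(alpha - eps, noise)).mean()
    return J, (J_plus - J_minus) / (2.0 * eps)        # finite-difference gradient

print(J_and_sensitivity(alpha=0.05))
```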
international conference on machine learning | 2005
Csaba Szepesvári; Rémi Munos
conference on learning theory | 2006
András Antos; Csaba Szepesvári; Rémi Munos
international conference on machine learning | 2003
Rémi Munos
international joint conference on artificial intelligence | 1999
Rémi Munos; Andrew W. Moore
Methodology and Computing in Applied Probability | 2006
Christophe Barrera-Esteve; Florent Bergeret; Charles Dossal; Emmanuel Gobet; Asma Meziou; Rémi Munos; Damien Reboul-Salze
Journal of Machine Learning Research | 2006
Rémi Munos
neural information processing systems | 1998
Rémi Munos; Andrew W. Moore