Rémi Munos
École Polytechnique
Publications
Featured research published by Rémi Munos.
Machine Learning | 2002
Rémi Munos; Andrew W. Moore
The problem of state abstraction is of central importance in optimal control, reinforcement learning, and Markov decision processes. This paper studies the case of variable resolution state abstraction for continuous time and space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches based on value function and policy properties that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of other states. Variance is an efficiently calculable measure of how risky a state in a Markov chain is: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, given by the value function. The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar, non-linear, non-minimum-phase, two-dimensional “Car on the Hill” problem. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.
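To make the top-down refinement idea concrete, here is a minimal Python sketch of kd-trie cell splitting driven by a local criterion. The criterion used here (the spread of a value function over a cell's corners), the threshold, and all names are illustrative stand-ins, not the paper's method: the actual representations use Kuhn triangulations for interpolation, and the paper's criteria include the non-local influence and variance measures, which this toy does not implement.

```python
# Toy sketch (not the paper's implementation): top-down refinement of a
# kd-trie over a 2-D state space, splitting a cell when a local criterion
# exceeds a threshold. The criterion here is a simple stand-in: the spread
# of a value function evaluated at the cell's corners.
from dataclasses import dataclass
from typing import Callable, Optional
import itertools

@dataclass
class Cell:
    lo: tuple                         # lower corner of the cell
    hi: tuple                         # upper corner of the cell
    depth: int = 0
    children: Optional[tuple] = None  # (low_child, high_child) after a split

def corner_values(cell: Cell, value_fn: Callable) -> list:
    """Evaluate the value function at every corner of the cell."""
    corners = itertools.product(*zip(cell.lo, cell.hi))
    return [value_fn(c) for c in corners]

def local_criterion(cell: Cell, value_fn: Callable) -> float:
    """Stand-in splitting criterion: value spread across the cell's corners."""
    vals = corner_values(cell, value_fn)
    return max(vals) - min(vals)

def refine(cell: Cell, value_fn: Callable, threshold: float, max_depth: int = 10):
    """Recursively split cells (cycling through axes, as in a kd-trie)
    while the criterion exceeds the threshold."""
    if cell.depth >= max_depth or local_criterion(cell, value_fn) <= threshold:
        return
    axis = cell.depth % len(cell.lo)            # kd-trie: alternate split axes
    mid = 0.5 * (cell.lo[axis] + cell.hi[axis])
    lo_hi = list(cell.hi); lo_hi[axis] = mid    # upper corner of the lower child
    hi_lo = list(cell.lo); hi_lo[axis] = mid    # lower corner of the upper child
    low_child = Cell(cell.lo, tuple(lo_hi), cell.depth + 1)
    high_child = Cell(tuple(hi_lo), cell.hi, cell.depth + 1)
    cell.children = (low_child, high_child)
    refine(low_child, value_fn, threshold, max_depth)
    refine(high_child, value_fn, threshold, max_depth)

# Example: refine more finely where the (made-up) value function varies.
root = Cell(lo=(-1.0, -2.0), hi=(1.0, 2.0))
refine(root, value_fn=lambda s: s[0] ** 2 + s[1], threshold=0.5)
```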
Machine Learning | 2000
Rémi Munos
This paper proposes a study of Reinforcement Learning (RL) for continuous state-space and continuous-time control problems, based on the theoretical framework of viscosity solutions (VSs). We use the method of dynamic programming (DP), which introduces the value function (VF), the expectation of the best future cumulative reinforcement. In the continuous case, the value function satisfies a non-linear first-order (or second-order, depending on whether the process is deterministic or stochastic) differential equation called the Hamilton-Jacobi-Bellman (HJB) equation. It is well known that there exist infinitely many generalized solutions (differentiable almost everywhere) to this equation, other than the VF. We show that gradient-descent methods may converge to one of these generalized solutions, thus failing to find the optimal control. In order to solve the HJB equation, we use the powerful framework of viscosity solutions and state that there exists a unique viscosity solution to the HJB equation, which is the value function. Then, we use another main result of VSs (their stability when passing to the limit) to prove the convergence of numerical approximation schemes based on finite difference (FD) and finite element (FE) methods. These methods discretize, at some resolution, the HJB equation into the DP equation of a Markov Decision Process (MDP), which could be solved by DP methods (thanks to a “strong” contraction property) if all the initial data (the state dynamics and the reinforcement function) were perfectly known. However, in the RL approach, as we consider a system that learns “from experience” through interaction with an a priori (at least partially) unknown environment, the initial data are not perfectly known but have to be approximated during learning. The main contribution of this work is to derive a general convergence theorem for RL algorithms when one uses only “approximations” (in the sense of satisfying some “weak” contraction property) of the initial data. This result can be used for model-based or model-free RL algorithms, with off-line or on-line updating methods, for deterministic or stochastic state dynamics (though this latter case is not described here), and based on FE or FD discretization methods. It is illustrated with several RL algorithms and one numerical simulation for the “Car on the Hill” problem.
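As a rough illustration of the discretize-then-DP pipeline described above (not the paper's FD/FE schemes or its RL convergence result), the following Python sketch grids a toy 1-D deterministic control problem, treats each grid transition as one step of a discounted MDP, and solves the resulting DP equation by value iteration. The dynamics, reward, and discount rate are assumptions made for the example.

```python
# Toy illustration (assumed dynamics, reward, and unit discount rate): a 1-D
# deterministic control problem dx/dt = a, a in {-1, +1}, discretized on a
# uniform grid so that the HJB equation becomes the DP equation of a finite
# MDP, which value iteration solves thanks to the gamma-contraction.
import numpy as np

N = 101
xs = np.linspace(-1.0, 1.0, N)     # grid over the state space [-1, 1]
dt = xs[1] - xs[0]                 # time to cross one cell at unit speed
gamma = np.exp(-dt)                # per-step discount factor e^{-dt}
actions = (-1, +1)                 # bang-bang controls

def reward(x: float) -> float:
    # Assumed reinforcement: reward only near the right boundary.
    return 1.0 if x >= 1.0 - 1e-9 else 0.0

V = np.zeros(N)
for _ in range(10_000):            # value iteration on the discretized MDP
    V_new = np.array([
        max(reward(xs[i]) * dt + gamma * V[int(np.clip(i + a, 0, N - 1))]
            for a in actions)
        for i in range(N)
    ])
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

# Greedy policy with respect to the computed value function.
policy = [max(actions,
              key=lambda a, i=i: reward(xs[i]) * dt
                                 + gamma * V[int(np.clip(i + a, 0, N - 1))])
          for i in range(N)]
```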
Siam Journal on Control and Optimization | 2005
Emmanuel Gobet; Rémi Munos
We consider a multidimensional diffusion process $(X^\alpha_t)_{0\leq t\leq T}$ whose dynamics depends on a parameter $\alpha$. Our first purpose is to write as an expectation the sensitivity $\nabla_\alpha J(\alpha)$ for the expected cost $J(\alpha)=\mathbb{E}(f(X^\alpha_T))$
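For intuition about the quantity in question, here is a small Python sketch that estimates $J(\alpha)=\mathbb{E}(f(X^\alpha_T))$ by Monte Carlo simulation of an Euler-Maruyama scheme and approximates $\nabla_\alpha J(\alpha)$ by a central finite difference with common random numbers. The drift, diffusion, cost $f$, and all parameter values are assumptions for the example; this is not the estimator developed in the paper.

```python
# Illustrative sketch (assumed SDE and cost, not the paper's estimator):
# estimate J(alpha) = E[f(X_T^alpha)] by Euler-Maruyama Monte Carlo, and
# approximate the sensitivity dJ/dalpha by a central finite difference
# using common random numbers across the two perturbed simulations.
import numpy as np

def simulate_terminal(alpha: float, noise: np.ndarray, x0: float = 1.0,
                      T: float = 1.0) -> np.ndarray:
    """Euler-Maruyama terminal values X_T for dX = alpha*X dt + 0.2*X dW."""
    n_paths, n_steps = noise.shape
    dt = T / n_steps
    x = np.full(n_paths, x0)
    for k in range(n_steps):
        x = x + alpha * x * dt + 0.2 * x * np.sqrt(dt) * noise[:, k]
    return x

def J_and_sensitivity(alpha: float, n_paths: int = 100_000,
                      n_steps: int = 100, eps: float = 1e-3):
    rng = np.random.default_rng(0)
    noise = rng.standard_normal((n_paths, n_steps))   # common random numbers
    f = lambda x: np.maximum(x - 1.0, 0.0)            # assumed cost f
    J = f(simulate_terminal(alpha, noise)).mean()
    J_plus = f(simulate_terminal(alpha + eps, noise)).mean()
    J_minus = f(simulate_terminal(alpha - eps, noise)).mean()
    return J, (J_plus - J_minus) / (2.0 * eps)        # finite-difference gradient

print(J_and_sensitivity(alpha=0.05))
```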
international conference on machine learning | 2005
Csaba Szepesvári; Rémi Munos
conference on learning theory | 2006
András Antos; Csaba Szepesvári; Rémi Munos
international conference on machine learning | 2003
Rémi Munos
international joint conference on artificial intelligence | 1999
Rémi Munos; Andrew W. Moore
Methodology and Computing in Applied Probability | 2006
Christophe Barrera-Esteve; Florent Bergeret; Charles Dossal; Emmanuel Gobet; Asma Meziou; Rémi Munos; Damien Reboul-Salze
Journal of Machine Learning Research | 2006
Rémi Munos
neural information processing systems | 1998
Rémi Munos; Andrew W. Moore