Olivier Spanjaard | Researchain

Publication

Featured research published by Olivier Spanjaard.

algorithmic decision theory | 2015

Reducing the Number of Queries in Interactive Value Iteration

Hugo Gilbert; Olivier Spanjaard; Paolo Viappiani; Paul Weng

To tackle the potentially hard task of defining the reward function in a Markov Decision Process (MDP), a new approach called Interactive Value Iteration (IVI) has recently been proposed by Weng and Zanuttini (2013). This solving method, which interweaves elicitation and optimization phases, computes a near-optimal policy without knowing the precise reward values. The procedure as originally presented can be improved to reduce the number of queries needed to determine an optimal policy. The key insights are that (1) queries should be delayed as long as possible, to avoid asking queries that might not be necessary to determine the best policy; (2) queries should be asked in a priority order, because the answers to some queries can resolve other queries; and (3) queries can be avoided by using heuristic information to guide the process. Following these ideas, a modified IVI algorithm is presented, and experimental results show a significant decrease in the number of queries issued.
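The idea of avoiding unnecessary queries can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes, purely for illustration, that each candidate is summarized by a vector counting how often each reward level occurs, and that the unknown reward values are only known to satisfy 0 <= r[0] <= ... <= r[-1]. The names `dominates`, `compare` and `ask_user` are hypothetical. Under that assumption, `u` is at least as good as `v` for every admissible reward vector exactly when all suffix sums of `u - v` are nonnegative, so the user only needs to be queried when neither vector dominates the other.

```python
def dominates(u, v):
    """True if u.r >= v.r for every r with 0 <= r[0] <= ... <= r[-1].

    Writing r as a sum of nonnegative increments shows that this holds
    exactly when every suffix sum of (u - v) is nonnegative.
    """
    suffix = 0.0
    for ui, vi in zip(reversed(u), reversed(v)):
        suffix += ui - vi
        if suffix < 0:
            return False
    return True


def compare(u, v, ask_user):
    """Return (winner, query_was_needed); ask_user is a hypothetical oracle."""
    if dominates(u, v):
        return "u", False
    if dominates(v, u):
        return "v", False
    # Neither vector dominates: only now is a query issued.
    return ask_user(u, v), True


# Resolved without a query: moving one unit to a higher reward level never hurts.
print(compare([0, 1], [1, 0], ask_user=lambda u, v: "u"))
# Requires a query: the answer depends on the unknown reward values.
print(compare([2, 0], [0, 1], ask_user=lambda u, v: "v"))
```

The first comparison is settled by dominance alone, while the second depends on whether two occurrences of the lower reward outweigh one occurrence of the higher one, which the ordering constraint cannot decide.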

European Journal of Operational Research | 2017

A double oracle approach to minmax regret optimization problems with interval data

Hugo Gilbert; Olivier Spanjaard

In this paper, we provide a generic anytime lower bounding procedure for minmax regret optimization problems. We show that the lower bound obtained is always at least as accurate as the lower bound recently proposed by Chassein and Goerigk [3]. This lower bound can be viewed as the optimal value of a linear programming relaxation of a mixed integer programming formulation of minmax regret optimization, but the contribution of the paper is to compute this lower bound via a double oracle algorithm [10] that we specify. The double oracle algorithm is designed by relying on a game-theoretic view of robust optimization, similar to the one developed by Mastin et al. [9], and it can be efficiently implemented for any minmax regret optimization problem whose standard version is easy. We describe how to efficiently embed this lower bound in a branch and bound procedure. Finally, we apply our approach to the robust shortest path problem. Our numerical results show a significant gain in computation times compared to previous approaches in the literature.
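To make the objective concrete, the following sketch computes the minmax regret path on a tiny graph by brute-force enumeration. It is not the double oracle procedure or the lower bound described in the abstract; it only illustrates what is being optimized. It relies on the standard fact that, with interval costs, the worst-case scenario for a given path sets the costs of its own edges to their upper bounds and all other costs to their lower bounds. The graph, the intervals and the function names are made up for the example.

```python
def enumerate_paths(edges, source, target):
    """Yield every simple path from source to target as a tuple of edges."""
    def extend(node, visited, path):
        if node == target:
            yield tuple(path)
            return
        for (u, v) in edges:
            if u == node and v not in visited:
                yield from extend(v, visited | {v}, path + [(u, v)])
    yield from extend(source, {source}, [])


def max_regret(path, intervals, source, target):
    """Worst-case regret of `path` over all cost scenarios in the intervals."""
    on_path = set(path)
    # Worst case: upper bounds on the chosen path, lower bounds elsewhere.
    scenario = {e: (hi if e in on_path else lo)
                for e, (lo, hi) in intervals.items()}
    cost = sum(scenario[e] for e in path)
    best = min(sum(scenario[e] for e in p)
               for p in enumerate_paths(intervals, source, target))
    return cost - best


def minmax_regret_path(intervals, source, target):
    """Path minimizing the maximum regret (exhaustive search, small graphs only)."""
    paths = enumerate_paths(intervals, source, target)
    return min(paths, key=lambda p: max_regret(p, intervals, source, target))


# Hypothetical instance: edge -> (lower cost, upper cost).
intervals = {
    ("s", "a"): (1, 5), ("a", "t"): (1, 4),
    ("s", "b"): (2, 3), ("b", "t"): (2, 3),
}
best = minmax_regret_path(intervals, "s", "t")
print(best, max_regret(best, intervals, "s", "t"))
```

Exhaustive enumeration is exponential in general, which is why bounding procedures and branch and bound, as discussed in the abstract, matter for realistic instances.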

multi disciplinary trends in artificial intelligence | 2013

Markov Decision Processes with Functional Rewards

Olivier Spanjaard; Paul Weng

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. In order to show the potential of our framework, we conclude the paper by presenting several illustrative examples.
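One possible reading of this idea is sketched below as finite-horizon backward induction in which every transition carries a function applied to the value of the successor state. This is an illustrative assumption, not the formulation from the paper: the function names, the toy MDP and the way expectations are taken are all made up, and the sketch does not check the conditions under which dynamic programming is valid. Choosing the reward function `lambda v: r + v` recovers the standard additive model.

```python
def backward_induction(states, actions, transitions, reward_fn, horizon):
    """Finite-horizon backward induction with functional rewards.

    transitions[s][a] maps each successor state to its probability.
    reward_fn(s, a, s2) returns a function applied to the successor's value.
    """
    values = {s: 0.0 for s in states}
    policy = []  # policy[t][s] is the action chosen at step t in state s
    for _ in range(horizon):
        new_values, decision = {}, {}
        for s in states:
            q = {
                a: sum(p * reward_fn(s, a, s2)(values[s2])
                       for s2, p in transitions[s][a].items())
                for a in actions
            }
            best = max(q, key=q.get)
            decision[s], new_values[s] = best, q[best]
        values = new_values
        policy.insert(0, decision)
    return values, policy


# Hypothetical two-state example with deterministic transitions.
states, actions = ["A", "B"], ["stay", "go"]
transitions = {
    "A": {"stay": {"A": 1.0}, "go": {"B": 1.0}},
    "B": {"stay": {"B": 1.0}, "go": {"A": 1.0}},
}
immediate = {("A", "stay"): 1.0, ("A", "go"): 0.0,
             ("B", "stay"): 2.0, ("B", "go"): 0.0}


def additive_reward(s, a, s2):
    """Standard additive reward expressed as a function of the future value."""
    r = immediate[(s, a)]
    return lambda v: r + v


values, policy = backward_induction(states, actions, transitions,
                                    additive_reward, horizon=3)
print(values, policy[0])
```

Swapping `additive_reward` for another family of functions changes the model without touching the induction loop, which is the appeal of treating rewards as functions.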

International Journal on Artificial Intelligence Tools | 2017

Functional Reward Markov Decision Processes: Theory and Applications

Paul Weng; Olivier Spanjaard

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. We also discuss the infinite horizon case and the case where a maximum operator does not exist. In order to show the potential of our framework, we conclude the paper by presenting several illustrative examples.

A Quarterly Journal of Operations Research | 2004

Non-classical preference models in combinatorial problems: Models and algorithms for graphs

Olivier Spanjaard

This is a summary of the most important results presented in the author’s PhD thesis (Spanjaard 2003). This thesis, written in French, was defended on 16 December 2003 and supervised by Patrice Perny. A copy is available from the author upon request. This thesis deals with the search for preferred solutions in combinatorial optimization problems (and more particularly graph problems). It aims at reconciling preference modelling and algorithmic concerns for decision aiding.

uncertainty in artificial intelligence | 2002