
Publication


Featured research published by Paul Weng.


European Conference on Artificial Intelligence | 2010

On Finding Compromise Solutions in Multiobjective Markov Decision Processes

Patrice Perny; Paul Weng

A Markov Decision Process (MDP) is a general model for solving planning problems under uncertainty. It has been extended to multiobjective MDP to address multicriteria or multiagent problems in which the value of a decision must be evaluated according to several viewpoints, sometimes conflicting. Although most of the studies concentrate on the determination of the set of Pareto-optimal policies, we focus here on a more specialized problem that concerns the direct determination of policies achieving well-balanced tradeoffs. We first explain why this problem cannot simply be solved by optimizing a linear combination of criteria. This leads us to use an alternative optimality concept which formalizes the notion of best compromise solution, i.e. a policy yielding an expected-utility vector as close as possible (w.r.t. Tchebycheff norm) to a reference point. We show that this notion of optimality depends on the initial state. Moreover, it appears that the best compromise policy cannot be found by a direct adaptation of value iteration. In addition, we observe that in some (if not most) situations, the optimal solution can only be obtained with a randomized policy. To overcome all these problems, we propose a solution method based on linear programming and give some experimental results.
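For readers unfamiliar with compromise programming, the criterion the abstract alludes to can be sketched as follows; the weights, reference point, and notation below are illustrative assumptions, not the paper's exact formulation:

```latex
% Best-compromise policy w.r.t. a reference point z^* under the weighted
% Tchebycheff norm; V_i^\pi(s_0) denotes the expected value of policy \pi
% on objective i from the initial state s_0 (notation is illustrative).
\pi^{\ast} \;\in\; \arg\min_{\pi} \; \max_{i = 1, \dots, n} \; w_i \, \bigl| z^{\ast}_i - V_i^{\pi}(s_0) \bigr|
```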


International Journal of Information Technology and Decision Making | 2013

A Compromise Programming Approach to Multiobjective Markov Decision Processes

Włodzimierz Ogryczak; Patrice Perny; Paul Weng

A Markov decision process (MDP) is a general model for solving planning problems under uncertainty. It has been extended to multiobjective MDP to address multicriteria or multiagent problems in which the value of a decision must be evaluated according to several viewpoints, sometimes conflicting. Although most of the studies concentrate on the determination of the set of Pareto-optimal policies, we focus here on a more specialized problem that concerns the direct determination of policies achieving well-balanced tradeoffs. To this end, we introduce a reference point method based on the optimization of a weighted ordered weighted average (WOWA) of individual disachievements. We show that the resulting notion of optimal policy does not satisfy the Bellman principle and depends on the initial state. To overcome these difficulties, we propose a solution method based on a linear programming (LP) reformulation of the problem. Finally, we illustrate the feasibility of the proposed method on two types of planning problems under uncertainty arising in navigation of an autonomous agent and in inventory management.
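As a rough numerical illustration of the WOWA aggregation used as the scalarizing criterion (Torra's weighted OWA), here is a minimal sketch. The function names, the piecewise-linear interpolation of cumulative weights, and the example numbers are my assumptions and do not reproduce the paper's LP reformulation:

```python
import numpy as np

def wowa(values, owa_weights, importance_weights):
    """Weighted OWA (Torra): aggregate `values` with rank-based OWA weights
    modulated by per-criterion importance weights.

    Both weight vectors are assumed non-negative and summing to 1.
    """
    values = np.asarray(values, dtype=float)
    p = np.asarray(importance_weights, dtype=float)
    w = np.asarray(owa_weights, dtype=float)

    # Sort values in decreasing order, carrying their importance weights along.
    order = np.argsort(-values)
    v_sorted = values[order]
    p_sorted = p[order]

    # w* interpolates the points (k/n, sum of the k first OWA weights);
    # here a simple piecewise-linear interpolation is used.
    n = len(w)
    xs = np.linspace(0.0, 1.0, n + 1)            # 0, 1/n, ..., 1
    ys = np.concatenate(([0.0], np.cumsum(w)))   # cumulative OWA weights

    cum_p = np.concatenate(([0.0], np.cumsum(p_sorted)))
    # Effective weight of the i-th largest value:
    # omega_i = w*(P_i) - w*(P_{i-1}), with P_i the cumulated importance.
    omega = np.interp(cum_p[1:], xs, ys) - np.interp(cum_p[:-1], xs, ys)
    return float(np.dot(omega, v_sorted))

# Example: three objectives, OWA weights putting more weight on the largest values.
print(wowa([0.9, 0.2, 0.5], owa_weights=[0.5, 0.3, 0.2],
           importance_weights=[1 / 3, 1 / 3, 1 / 3]))
```

With equal importance weights the function reduces to a plain OWA, which is one way to check the sketch.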


Algorithmic Decision Theory | 2011

On minimizing ordered weighted regrets in multiobjective Markov decision processes

Włodzimierz Ogryczak; Patrice Perny; Paul Weng

In this paper, we propose an exact solution method to generate fair policies in Multiobjective Markov Decision Processes (MMDPs). MMDPs consider n immediate reward functions, representing either individual payoffs in a multiagent problem or rewards with respect to different objectives. In this context, we focus on the determination of a policy that fairly shares regrets among agents or objectives, the regret being defined on each dimension as the opportunity loss with respect to optimal expected rewards. To this end, we propose to minimize the ordered weighted average of regrets (OWR). The OWR criterion indeed extends the minimax regret, relaxing egalitarianism for a milder notion of fairness. After showing that OWR-optimality is state-dependent and that the Bellman principle does not hold for OWR-optimal policies, we propose a linear programming reformulation of the problem. We also provide experimental results showing the efficiency of our approach.
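In symbols, the criterion described above can be sketched roughly as follows (notation mine, not the paper's):

```latex
% Regret of policy \pi on objective i from initial state s_0, where V_i^* is
% the optimal expected reward when objective i is taken alone:
r_i^{\pi} \;=\; V_i^{\ast}(s_0) - V_i^{\pi}(s_0)
% Ordered weighted regret with decreasing weights w_1 \ge \dots \ge w_n and
% regrets reordered so that r_{(1)}^{\pi} \ge \dots \ge r_{(n)}^{\pi}:
\mathrm{OWR}(\pi) \;=\; \sum_{i=1}^{n} w_i \, r_{(i)}^{\pi},
\qquad \pi^{\ast} \in \arg\min_{\pi} \mathrm{OWR}(\pi)
% With w = (1, 0, \dots, 0), this reduces to minimax regret.
```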


Scalable Uncertainty Management | 2014

Solving Hidden-Semi-Markov-Mode Markov Decision Problems

Emmanuel Hadoux; Aurélie Beynier; Paul Weng

Hidden-Mode Markov Decision Processes (HM-MDPs) were proposed to represent sequential decision-making problems in non-stationary environments that evolve according to a Markov chain. We introduce in this paper Hidden-Semi-Markov-Mode Markov Decision Processes (HS3MDPs), a generalization of HM-MDPs to the more realistic case of non-stationary environments evolving according to a semi-Markov chain. Like HM-MDPs, HS3MDPs form a subclass of Partially Observable Markov Decision Processes. Therefore, large instances of HS3MDPs and HM-MDPs can be solved using an online algorithm, the Partially Observable Monte Carlo Planning (POMCP) algorithm, based on Monte Carlo Tree Search exploiting particle filters for belief state approximation. We propose a first adaptation of POMCP to solve HS3MDPs more efficiently by exploiting their structure. Our empirical results show that the first adapted POMCP reaches higher cumulative rewards than the original algorithm. However, in larger instances, POMCP may run out of particles. To solve this issue, we propose a second adaptation of POMCP, replacing particle filters by exact representations of beliefs. Our empirical results indicate that this new version reaches high cumulative rewards faster than the former adapted POMCP and still remains efficient even for large problems.
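To make the model concrete, here is a minimal Python sketch of the hidden semi-Markov mode dynamics that a particle would follow in a POMCP-style simulator. The function and argument names are illustrative assumptions, and the adaptations actually proposed in the paper (structure exploitation, exact belief representation) are not shown:

```python
import random

def advance_mode(mode, remaining_duration, mode_transition, sample_duration):
    """One step of the hidden semi-Markov mode dynamics behind HS3MDPs.

    A particle carries a pair (mode, remaining duration). While the duration
    is positive, the environment keeps following the MDP of the current mode;
    when it runs out, a new mode is drawn from `mode_transition[mode]`
    (a dict {next_mode: probability}) together with how long it will last,
    via `sample_duration(next_mode)`. Names are illustrative, not the paper's.
    """
    if remaining_duration > 1:
        return mode, remaining_duration - 1
    next_mode = random.choices(
        population=list(mode_transition[mode].keys()),
        weights=list(mode_transition[mode].values()),
    )[0]
    return next_mode, sample_duration(next_mode)
```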


Algorithmic Decision Theory | 2015

Reducing the Number of Queries in Interactive Value Iteration

Hugo Gilbert; Olivier Spanjaard; Paolo Viappiani; Paul Weng

To tackle the potentially hard task of defining the reward function in a Markov Decision Process (MDP), a new approach, called Interactive Value Iteration (IVI), has recently been proposed by Weng and Zanuttini (2013). This solving method, which interweaves elicitation and optimization phases, computes a near-optimal policy without knowing the precise reward values. The procedure as originally presented can be improved in order to reduce the number of queries needed to determine an optimal policy. The key insights are that (1) asking queries should be delayed as much as possible, avoiding queries that might not be necessary to determine the best policy, (2) queries should be asked in a priority order, because the answers to some queries can make it possible to resolve other queries, and (3) queries can be avoided by using heuristic information to guide the process. Following these ideas, a modified IVI algorithm is presented, and experimental results show a significant decrease in the number of queries issued.
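The first two insights can be illustrated with a small, heavily simplified fragment: if the quantities being compared are only known up to intervals, a query is worth asking only when the intervals overlap. This is a sketch under my own assumptions, not the authors' modified IVI algorithm:

```python
def query_needed(bounds_a, bounds_b):
    """Return True if comparing two imprecisely known values requires a query.

    `bounds_a` and `bounds_b` are (lower, upper) intervals. If the intervals
    are disjoint, the comparison is already resolved by earlier answers and
    the query can be skipped (or deferred); otherwise it remains useful.
    Purely illustrative; the paper's queries bear on reward values in an MDP.
    """
    lo_a, up_a = bounds_a
    lo_b, up_b = bounds_b
    return not (up_a <= lo_b or up_b <= lo_a)

# Example: [0.2, 0.4] vs [0.5, 0.9] is already decided, so no query is needed.
print(query_needed((0.2, 0.4), (0.5, 0.9)))  # False
```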


Machine Learning | 2014

Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm

Róbert Busa-Fekete; Balázs Szörényi; Paul Weng; Weiwei Cheng; Eyke Hüllermeier

We introduce a novel approach to preference-based reinforcement learning, namely a preference-based variant of a direct policy search method based on evolutionary optimization. The core of our approach is a preference-based racing algorithm that selects the best among a given set of candidate policies with high probability. To this end, the algorithm operates on a suitable ordinal preference structure and only uses pairwise comparisons between sample rollouts of the policies. Embedding the racing algorithm in a rank-based evolutionary search procedure, we show that approximations of the so-called Smith set of optimal policies can be produced with certain theoretical guarantees. Apart from a formal performance and complexity analysis, we present first experimental studies showing that our approach performs well in practice.
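To give a flavour of the pairwise, rollout-based comparisons at the core of the racing step, here is a minimal sketch using a Hoeffding confidence bound. The function names and stopping rule are my assumptions; the actual algorithm races a whole set of policies at once and targets the Smith set:

```python
import math
import random

def prefer_with_confidence(rollout_a, rollout_b, compare, delta=0.05, max_samples=1000):
    """Toy pairwise comparison in the spirit of preference-based racing.

    `rollout_a()` / `rollout_b()` each sample one trajectory of a policy, and
    `compare(ta, tb)` returns +1 if the first trajectory is preferred, -1
    otherwise. Pairs are sampled until a Hoeffding bound separates the
    estimated preference probability from 1/2, or the budget runs out.
    """
    wins, n = 0, 0
    while n < max_samples:
        n += 1
        if compare(rollout_a(), rollout_b()) > 0:
            wins += 1
        p_hat = wins / n
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))  # Hoeffding radius
        if p_hat - radius > 0.5:
            return "a"  # a preferred with confidence at least 1 - delta
        if p_hat + radius < 0.5:
            return "b"
    return "undecided"
```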


European Conference on Artificial Intelligence | 2012

Ordinal decision models for Markov decision processes

Paul Weng

Setting the values of rewards in Markov decision processes (MDPs) may be a difficult task. In this paper, we consider two ordinal decision models for MDPs where only an order is known over rewards. The first one, which has been proposed recently in MDPs [23], defines preferences with respect to a reference point. The second model, which can be viewed as the dual approach of the first one, is based on quantiles. Based on the first decision model, we give a new interpretation of rewards in standard MDPs, which sheds some interesting light on the preference system used in standard MDPs. The second model, based on quantile optimization, is a new approach in MDPs with ordinal rewards. Although quantile-based optimality is state-dependent, we prove that an optimal stationary deterministic policy exists for a given initial state. Finally, we propose solution methods based on linear programming for optimizing quantiles.
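The quantile criterion can be sketched as follows; the notation and the exact object whose quantile is taken are illustrative assumptions, not the paper's definitions:

```latex
% Tau-quantile criterion over an ordered reward scale r_1 \prec \dots \prec r_k,
% where R^\pi_{s_0} denotes the (ordinal) outcome of policy \pi from s_0:
q_\tau(\pi, s_0) \;=\; \min \bigl\{\, r_j \;:\; \Pr\bigl[ R^{\pi}_{s_0} \preceq r_j \bigr] \ge \tau \,\bigr\},
\qquad \pi^{\ast} \in \arg\max_{\pi} \; q_\tau(\pi, s_0)
```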


IEEE Design & Test of Computers | 2017

Hierarchical Electric Vehicle Charging Aggregator Strategy Using Dantzig-Wolfe Decomposition

M. Hadi Amini; Paul McNamara; Paul Weng; Orkun Karabasoglu; Yinliang Xu

This article focuses on reducing the charging cost for electric vehicles (EVs). A charging strategy is proposed to minimize the charging cost of EVs within the charging station constraints. — Zili Shao
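As a toy illustration of the kind of cost-minimization subproblem involved (a single vehicle, made-up prices, one-hour slots so kW and kWh coincide), here is a minimal linear-programming sketch; the hierarchical Dantzig-Wolfe coordination across vehicles that the article actually proposes is not shown:

```python
import numpy as np
from scipy.optimize import linprog

# Schedule the charging power of one EV over T time slots so that it receives
# the required energy at minimal cost, within the charger's power limit.
T = 6
price = np.array([0.30, 0.25, 0.15, 0.10, 0.12, 0.28])  # $/kWh per slot (made up)
required_energy = 20.0                                   # kWh to deliver
max_power = 7.0                                          # kW charger limit

res = linprog(
    c=price,                                   # minimize total charging cost
    A_eq=np.ones((1, T)), b_eq=[required_energy],
    bounds=[(0.0, max_power)] * T,
    method="highs",
)
print(res.x, res.fun)  # per-slot charging power and total cost
```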


Multi-disciplinary Trends in Artificial Intelligence | 2013

Axiomatic Foundations of Generalized Qualitative Utility

Paul Weng

The aim of this paper is to provide a unifying axiomatic justification for a class of qualitative decision models comprising, among others, optimistic/pessimistic qualitative utilities, binary possibilistic utility, likelihood-based utility, and Spohn's disbelief function-based utility. All those criteria that are instances of Algebraic Expected Utility have been shown to be counterparts of Expected Utility, thanks to a unifying axiomatization in a von Neumann-Morgenstern setting, when non-probabilistic decomposable uncertainty measures are used. Those criteria are based on a pair of operators (⊕, ⊗), counterparts of the (+, ×) used by Expected Utility, where ⊕ is an idempotent operator and ⊗ is a triangular norm. The axiomatization is carried out in the Savage setting, which is more general than that of von Neumann-Morgenstern, as here we do not assume that the uncertainty representation of the decision-maker is known.
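Concretely, the criteria in question take the following generic form (notation mine; X stands for the relevant set of states or consequences and κ for the decomposable uncertainty measure):

```latex
% Algebraic expected utility of an act f, with \oplus idempotent and \otimes a
% triangular norm, generalizing the (\sum, \times) of standard expected utility:
AEU(f) \;=\; \bigoplus_{x \in X} \, \kappa(x) \otimes u\bigl(f(x)\bigr)
% For instance, optimistic qualitative utility is the (\max, \min) instance:
% U^{+}(f) = \max_{x \in X} \min\bigl(\pi(x),\, u(f(x))\bigr).
```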


Multi-disciplinary Trends in Artificial Intelligence | 2013

Markov Decision Processes with Functional Rewards

Olivier Spanjaard; Paul Weng

Markov decision processes (MDPs) have become one of the standard models for decision-theoretic planning problems under uncertainty. In their standard form, rewards are assumed to be numerical additive scalars. In this paper, we propose a generalization of this model allowing rewards to be functional. The value of a history is recursively computed by composing the reward functions. We show that several variants of MDPs presented in the literature can be instantiated in this setting. We then identify sufficient conditions on these reward functions for dynamic programming to be valid. In order to show the potential of our framework, we conclude the paper by presenting several illustrative examples.
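A minimal sketch of the "value of a history by composing reward functions" idea, with names and the backward composition order as my own assumptions; the standard additive and discounted cases then appear as special instances:

```python
def history_value(step_reward_functions, terminal_value=0.0):
    """Value of a history in a functional-reward MDP: each step contributes a
    reward *function*, and the history's value is obtained by composing them,
    here folded backward from the end of the history (illustrative choice).
    """
    value = terminal_value
    for f in reversed(step_reward_functions):
        value = f(value)
    return value

# Standard additive rewards 2, 1, 3 expressed as functions:
print(history_value([lambda v: v + 2, lambda v: v + 1, lambda v: v + 3]))   # 6.0
# Discounted rewards (gamma = 0.9) fit the same scheme:
print(history_value([lambda v: 2 + 0.9 * v,
                     lambda v: 1 + 0.9 * v,
                     lambda v: 3 + 0.9 * v]))  # 5.33
```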

Collaboration


Dive into Paul Weng's collaborations.

Top Co-Authors

Emmanuel Hadoux
University College London

Olivier Spanjaard
Pierre-and-Marie-Curie University

Włodzimierz Ogryczak
Warsaw University of Technology