Xianping Guo
Sun Yat-sen University
Publications
Featured research published by Xianping Guo.
Archive | 2009
Xianping Guo; Onésimo Hernández-Lerma
In Chap. 2, we formally introduce the concepts associated with a continuous-time MDP. Namely, the basic model of continuous-time MDPs and the concept of a Markov policy are stated in precise terms in Sect. 2.2. We also give, in Sect. 2.3, a precise definition of the state and action processes in continuous-time MDPs, together with some fundamental properties of these two processes. Then, in Sect. 2.4, we introduce the basic optimality criteria that we are interested in.
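For reference, here is a minimal sketch of the standard setting these chapters work with, written in the usual notation for denumerable-state continuous-time MDPs (background supplied by this summary, not quoted from the chapter): the model is a collection
\[
\{\,S,\ A,\ (A(x),\ x\in S),\ q(y\mid x,a),\ r(x,a)\,\},
\]
with state space S, admissible action sets A(x), transition rates q(y|x,a) satisfying q(y|x,a) \ge 0 for y \ne x and \sum_y q(y\mid x,a)=0, and reward rate r(x,a). The two criteria most commonly used in this setting (and throughout the publications below) are the expected discounted reward and the long-run expected average reward,
\[
V_\alpha(x,\pi)=E_x^{\pi}\!\int_0^{\infty} e^{-\alpha t}\,r(x_t,a_t)\,dt \quad (\alpha>0),
\qquad
J(x,\pi)=\liminf_{T\to\infty}\frac{1}{T}\,E_x^{\pi}\!\int_0^{T} r(x_t,a_t)\,dt .
\]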
TOP | 2006
Xianping Guo; Onésimo Hernández-Lerma; Tomás Prieto-Rumeau; Xi-Ren Cao; Junyu Zhang; Qiying Hu; Mark E. Lewis; Ricardo Vélez
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
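The "more selective" criteria mentioned here have a compact formulation that may help place them; the following is the standard sensitive-discount (n-discount) definition, stated as background rather than quoted from the survey. A policy \pi^* is n-discount optimal (n = -1, 0, 1, \dots) if
\[
\liminf_{\alpha\downarrow 0}\ \alpha^{-n}\,\big[V_\alpha(x,\pi^*)-V_\alpha(x,\pi)\big]\ \ge\ 0
\qquad\text{for all } x\in S \text{ and all policies } \pi,
\]
where V_\alpha denotes the \alpha-discounted value. In this literature, n = -1 recovers average (gain) optimality, n = 0 is closely tied to bias optimality under appropriate conditions, and larger n gives increasingly selective criteria.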
Acta Applicandae Mathematicae | 2003
Xianping Guo; Onésimo Hernández-Lerma
This paper studies denumerable state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a ‘martingale characterization’ of an optimal stationary policy. Our results are illustrated with controlled birth and death processes.
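The controlled birth-and-death illustration suggests a small numerical sketch. In the discounted case the optimal value solves \alpha V(x)=\sup_{a\in A(x)}\{r(x,a)+\sum_y q(y\mid x,a)V(y)\}, and on a finite, bounded-rate truncation this can be computed by value iteration after uniformization. The Python snippet below is only a toy under those assumptions (the paper itself allows unbounded rates and an infinite state space, which this sketch does not handle); all names and parameter values (N, alpha, actions, mu, r) are made up for illustration.

import numpy as np

# Toy truncated controlled birth-death process, discounted reward criterion.
# Action = birth (arrival) rate; the death (service) rate is fixed.
N = 50                     # truncation level: states 0, 1, ..., N
alpha = 0.1                # discount rate
actions = [0.5, 1.0, 1.5]  # admissible birth rates (the controls)
mu = 1.0                   # death rate

def q(x, y, a):
    """Transition rate from state x to state y under action a."""
    if y == x + 1 and x < N:
        return a
    if y == x - 1 and x > 0:
        return mu
    if y == x:
        return -((a if x < N else 0.0) + (mu if x > 0 else 0.0))
    return 0.0

def r(x, a):
    """Reward rate: throughput reward minus a holding cost (may be negative)."""
    return 2.0 * a - 0.1 * x

Lam = max(actions) + mu    # uniformization constant: bounds all exit rates

V = np.zeros(N + 1)
for _ in range(5000):      # value iteration; contraction modulus Lam / (alpha + Lam) < 1
    V_new = np.empty_like(V)
    for x in range(N + 1):
        best = -np.inf
        for a in actions:
            # (alpha + Lam) V(x) = max_a [ r(x,a) + sum_y q(y|x,a) V(y) + Lam V(x) ]
            total = r(x, a) + Lam * V[x]
            for y in (x - 1, x, x + 1):
                if 0 <= y <= N:
                    total += q(x, y, a) * V[y]
            best = max(best, total / (alpha + Lam))
        V_new[x] = best
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("approximate discounted values at states 0..5:", np.round(V[:6], 3))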
IEEE Transactions on Automatic Control | 2001
Xianping Guo; Ke Liu
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.
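For orientation, the optimality inequality referred to here has the following typical form in the denumerable-state continuous-time setting (a sketch in standard notation, not quoted from the note): there exist a constant g and a function h on the state space such that
\[
g\ \ge\ \inf_{a\in A(x)}\Big\{c(x,a)+\sum_{y} q(y\mid x,a)\,h(y)\Big\}
\qquad\text{for all states } x,
\]
and, under suitable growth and regularity conditions, any stationary policy attaining the infimum on the right-hand side is average-cost optimal with optimal average cost g. The note's counterexample shows that conditions ensuring this inequality need not ensure a solution of the corresponding optimality equation (with equality in place of the inequality).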
SIAM Journal on Control and Optimization | 2005
Xianping Guo; Xi-Ren Cao
In this paper we study continuous-time Markov decision processes with the average sample-path reward (ASPR) criterion and possibly unbounded transition and reward rates. We propose conditions on the system's primitive data for the existence of ε-ASPR-optimal (deterministic) stationary policies in a class of randomized Markov policies satisfying some additional continuity assumptions. The proof of this fact is based on the time discretization technique, the martingale stability theory, and the concept of potential. We also provide both policy and value iteration algorithms for computing, or at least approximating, the ε-ASPR-optimal stationary policies. We illustrate with examples our main results as well as the difference between the ASPR and the average expected reward criteria.
Automatica | 2004
Xi-Ren Cao; Xianping Guo
IEEE Transactions on Automatic Control | 2007
Xianping Guo
SIAM Journal on Optimization | 2000
Xianping Guo; Peng Shi
Annals of Applied Probability | 2011
Xianping Guo; Xin-Yuan Song
Stochastic Analysis and Applications | 2005
Quanxin Zhu; Xianping Guo
We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play a central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
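For concreteness, the two formulas playing the central role here can be stated in their simplest (ergodic, discrete-time) form, as in the cited Cao (2000) setting; the multichain and continuous-time versions treated in the paper are more involved, so the following is background rather than the paper's own statement. With transition matrix P, reward vector r, stationary distribution \pi, average reward \eta=\pi r, and performance potential (bias) g solving
\[
(I-P)\,g = r - \eta e \quad (\text{up to an additive constant, } e=(1,\dots,1)^{\top}),
\]
the performance-difference and performance-gradient formulas for a second policy (P',r') with stationary distribution \pi' read
\[
\eta'-\eta=\pi'\big[(P'-P)\,g+(r'-r)\big],
\qquad
\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta=0}=\pi\big[(P'-P)\,g+(r'-r)\big],
\]
where P_\delta=P+\delta(P'-P) and r_\delta=r+\delta(r'-r). Since \pi' is componentwise positive for an ergodic chain, improving the bracketed term state by state improves \eta, which is how the difference formula yields the policy improvement step of policy iteration.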