Xianping Guo
Sun Yat-sen University
Publications
Featured research published by Xianping Guo.
Archive | 2009
Xianping Guo; Onésimo Hernández-Lerma
In Chap. 2, we formally introduce the concepts associated with a continuous-time MDP. Namely, the basic model of continuous-time MDPs and the concept of a Markov policy are stated in precise terms in Sect. 2.2. We also give, in Sect. 2.3, a precise definition of the state and action processes in continuous-time MDPs, together with some fundamental properties of these two processes. Then, in Sect. 2.4, we introduce the basic optimality criteria that we are interested in.
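For reference, here is a minimal sketch of the standard setting these chapters work with, written in the usual notation for denumerable-state continuous-time MDPs (background supplied by this summary, not quoted from the chapter): the model is a collection
\[
\{\,S,\ A,\ (A(x),\ x\in S),\ q(y\mid x,a),\ r(x,a)\,\},
\]
with state space S, admissible action sets A(x), transition rates q(y|x,a) satisfying q(y|x,a) \ge 0 for y \ne x and \sum_y q(y\mid x,a)=0, and reward rate r(x,a). The two criteria most commonly used in this setting (and throughout the publications below) are the expected discounted reward and the long-run expected average reward,
\[
V_\alpha(x,\pi)=E_x^{\pi}\!\int_0^{\infty} e^{-\alpha t}\,r(x_t,a_t)\,dt \quad (\alpha>0),
\qquad
J(x,\pi)=\liminf_{T\to\infty}\frac{1}{T}\,E_x^{\pi}\!\int_0^{T} r(x_t,a_t)\,dt .
\]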
TOP | 2006
Xianping Guo; Onésimo Hernández-Lerma; Tomás Prieto-Rumeau; Xi-Ren Cao; Junyu Zhang; Qiying Hu; Mark E. Lewis; Ricardo Vélez
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games.
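The "more selective" criteria mentioned here have a compact formulation that may help place them; the following is the standard sensitive-discount (n-discount) definition, stated as background rather than quoted from the survey. A policy \pi^* is n-discount optimal (n = -1, 0, 1, \dots) if
\[
\liminf_{\alpha\downarrow 0}\ \alpha^{-n}\,\big[V_\alpha(x,\pi^*)-V_\alpha(x,\pi)\big]\ \ge\ 0
\qquad\text{for all } x\in S \text{ and all policies } \pi,
\]
where V_\alpha denotes the \alpha-discounted value. In this literature, n = -1 recovers average (gain) optimality, n = 0 is closely tied to bias optimality under appropriate conditions, and larger n gives increasingly selective criteria.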
Acta Applicandae Mathematicae | 2003
Xianping Guo; Onésimo Hernández-Lerma
This paper studies denumerable state continuous-time controlled Markov chains with the discounted reward criterion and a Borel action space. The reward and transition rates are unbounded, and the reward rates are allowed to take positive or negative values. First, we present new conditions for a nonhomogeneous Q(t)-process to be regular. Then, using these conditions, we give a new set of mild hypotheses that ensure the existence of ε-optimal (ε ≥ 0) stationary policies. We also present a ‘martingale characterization’ of an optimal stationary policy. Our results are illustrated with controlled birth and death processes.
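The controlled birth-and-death illustration suggests a small numerical sketch. In the discounted case the optimal value solves \alpha V(x)=\sup_{a\in A(x)}\{r(x,a)+\sum_y q(y\mid x,a)V(y)\}, and on a finite, bounded-rate truncation this can be computed by value iteration after uniformization. The Python snippet below is only a toy under those assumptions (the paper itself allows unbounded rates and an infinite state space, which this sketch does not handle); all names and parameter values (N, alpha, actions, mu, r) are made up for illustration.

import numpy as np

# Toy truncated controlled birth-death process, discounted reward criterion.
# Action = birth (arrival) rate; the death (service) rate is fixed.
N = 50                     # truncation level: states 0, 1, ..., N
alpha = 0.1                # discount rate
actions = [0.5, 1.0, 1.5]  # admissible birth rates (the controls)
mu = 1.0                   # death rate

def q(x, y, a):
    """Transition rate from state x to state y under action a."""
    if y == x + 1 and x < N:
        return a
    if y == x - 1 and x > 0:
        return mu
    if y == x:
        return -((a if x < N else 0.0) + (mu if x > 0 else 0.0))
    return 0.0

def r(x, a):
    """Reward rate: throughput reward minus a holding cost (may be negative)."""
    return 2.0 * a - 0.1 * x

Lam = max(actions) + mu    # uniformization constant: bounds all exit rates

V = np.zeros(N + 1)
for _ in range(5000):      # value iteration; contraction modulus Lam / (alpha + Lam) < 1
    V_new = np.empty_like(V)
    for x in range(N + 1):
        best = -np.inf
        for a in actions:
            # (alpha + Lam) V(x) = max_a [ r(x,a) + sum_y q(y|x,a) V(y) + Lam V(x) ]
            total = r(x, a) + Lam * V[x]
            for y in (x - 1, x, x + 1):
                if 0 <= y <= N:
                    total += q(x, y, a) * V[y]
            best = max(best, total / (alpha + Lam))
        V_new[x] = best
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("approximate discounted values at states 0..5:", np.round(V[:6], 3))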
IEEE Transactions on Automatic Control | 2001
Xianping Guo; Ke Liu
This note deals with continuous-time Markov decision processes with a denumerable state space and the average cost criterion. The transition rates are allowed to be unbounded, and the action set is a Borel space. We give a new set of conditions under which the existence of optimal stationary policies is ensured by using the optimality inequality. Our results are illustrated with a controlled queueing model. Moreover, we use an example to show that our conditions do not imply the existence of a solution to the optimality equations in the previous literature.
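For orientation, the optimality inequality referred to here has the following typical form in the denumerable-state continuous-time setting (a sketch in standard notation, not quoted from the note): there exist a constant g and a function h on the state space such that
\[
g\ \ge\ \inf_{a\in A(x)}\Big\{c(x,a)+\sum_{y} q(y\mid x,a)\,h(y)\Big\}
\qquad\text{for all states } x,
\]
and, under suitable growth and regularity conditions, any stationary policy attaining the infimum on the right-hand side is average-cost optimal with optimal average cost g. The note's counterexample shows that conditions ensuring this inequality need not ensure a solution of the corresponding optimality equation (with equality in place of the inequality).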
SIAM Journal on Control and Optimization | 2005
Xianping Guo; Xi-Ren Cao
In this paper we study continuous-time Markov decision processes with the average sample-path reward (ASPR) criterion and possibly unbounded transition and reward rates. We propose conditions on the system's primitive data for the existence of ε-ASPR-optimal (deterministic) stationary policies in a class of randomized Markov policies satisfying some additional continuity assumptions. The proof of this fact is based on the time discretization technique, the martingale stability theory, and the concept of potential. We also provide both policy and value iteration algorithms for computing, or at least approximating, the ε-ASPR-optimal stationary policies. We illustrate with examples our main results as well as the difference between the ASPR and the average expected reward criteria.
Automatica | 2004
Xi-Ren Cao; Xianping Guo
IEEE Transactions on Automatic Control | 2007
Xianping Guo
SIAM Journal on Optimization | 2000
Xianping Guo; Peng Shi
Annals of Applied Probability | 2011
Xianping Guo; Xin-Yuan Song
Stochastic Analysis and Applications | 2005
Quanxin Zhu; Xianping Guo
We propose a unified framework for Markov decision problems and performance sensitivity analysis of multichain Markov processes with both discounted and average-cost performance criteria. With the fundamental concept of performance potentials, we derive both performance-gradient and performance-difference formulas, which play a central role in performance optimization. The standard policy iteration algorithms for both discounted- and average-reward MDPs can be established using the performance-difference formulas in a simple and intuitive way, and the performance-gradient formulas together with stochastic approximation may lead to new optimization schemes. This sensitivity-based point of view of performance optimization provides insights that link perturbation analysis, Markov decision processes, and reinforcement learning together. The research is an extension of previous work on ergodic Markov chains (Cao, Automatica 36 (2000) 771).
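For concreteness, the two formulas playing the central role here can be stated in their simplest (ergodic, discrete-time) form, as in the cited Cao (2000) setting; the multichain and continuous-time versions treated in the paper are more involved, so the following is background rather than the paper's own statement. With transition matrix P, reward vector r, stationary distribution \pi, average reward \eta=\pi r, and performance potential (bias) g solving
\[
(I-P)\,g = r - \eta e \quad (\text{up to an additive constant, } e=(1,\dots,1)^{\top}),
\]
the performance-difference and performance-gradient formulas for a second policy (P',r') with stationary distribution \pi' read
\[
\eta'-\eta=\pi'\big[(P'-P)\,g+(r'-r)\big],
\qquad
\left.\frac{d\eta_\delta}{d\delta}\right|_{\delta=0}=\pi\big[(P'-P)\,g+(r'-r)\big],
\]
where P_\delta=P+\delta(P'-P) and r_\delta=r+\delta(r'-r). Since \pi' is componentwise positive for an ergodic chain, improving the bracketed term state by state improves \eta, which is how the difference formula yields the policy improvement step of policy iteration.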