Eugene A. Feinberg
Stony Brook University
Publications
Featured research published by Eugene A. Feinberg.
Mathematical Methods of Operations Research | 1994
Eugene A. Feinberg
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time, as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
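To make the linear programming approach concrete, the sketch below solves a toy constrained average-reward problem via the occupation-measure LP. It treats only the unichain, unit-sojourn-time (ordinary MDP) special case of the SMDP model above, and all data, as well as the use of scipy.optimize.linprog, are illustrative assumptions rather than the paper's actual algorithm.

```python
# A minimal sketch, not the paper's algorithm: occupation-measure LP for a
# toy constrained average-reward problem (unichain, unit sojourn times, i.e.
# an ordinary-MDP special case of the SMDP model).  All data are hypothetical.
import numpy as np
from scipy.optimize import linprog

P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 3.0],                 # reward per step
              [0.0, 2.0]])
d = np.array([[0.0, 2.0],                 # constraint cost per step
              [0.0, 1.0]])
C = 1.0                                   # require: average constraint cost <= C
n_s, n_a = r.shape
idx = lambda s, a: s * n_a + a

# Balance equations  sum_a x[s,a] - sum_{s',a} P[s',a,s] x[s',a] = 0
# plus the normalization  sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((n_s + 1, n_s * n_a))
for s in range(n_s):
    for a in range(n_a):
        A_eq[s, idx(s, a)] += 1.0
        for s2 in range(n_s):
            A_eq[s, idx(s2, a)] -= P[s2, a, s]
A_eq[n_s, :] = 1.0
b_eq = np.append(np.zeros(n_s), 1.0)

res = linprog(c=-r.ravel(),                         # maximize average reward
              A_ub=d.ravel()[None, :], b_ub=[C],    # one linear constraint
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(n_s, n_a)
policy = x / x.sum(axis=1, keepdims=True)           # randomized stationary policy
print("optimal average reward:", -res.fun)
print("policy (rows = states):\n", policy)
```

In the unichain case the randomized stationary policy read off from the occupation measure x is feasible and optimal, which is consistent with the existence result stated in the abstract.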
Mathematics of Operations Research | 1996
Eugene A. Feinberg; Adam Shwartz
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semicontinuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a feasible policy exists for a problem with M constraints. We prove two results on the existence and structure of optimal policies. First, we show that there exists a randomized stationary optimal policy which requires at most M actions more than a nonrandomized stationary one. This result is known for several particular cases. Second, we prove that there exists an optimal policy which is (i) stationary nonrandomized from some step onward, (ii) randomized Markov before this step, but the total number of actions which are added by randomization is at most M, and (iii) such that the total number of actions that are added by nonstationarity is at most M. We also establish Pareto optimality of policies from the two classes described above for multi-criteria problems. We describe an algorithm to compute optimal policies with properties (i)-(iii) for constrained problems. The policies that satisfy properties (i)-(iii) have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. In contrast, a randomized stationary policy may require an infinite number of randomizations over time.
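The bound on randomization can also be seen in the standard occupation-measure LP for a discounted constrained problem: a basic optimal solution with M = 1 constraint randomizes in at most one state. The sketch below is a hypothetical finite-state illustration of that structure, not the algorithm from the paper.

```python
# A hypothetical finite-state illustration (not the paper's algorithm): the
# occupation-measure LP for a discounted problem with M = 1 constraint.  A
# basic (vertex) optimal solution randomizes in at most M states.
import numpy as np
from scipy.optimize import linprog

beta = 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s']
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 3.0], [0.0, 2.0]])    # objective reward
d = np.array([[0.0, 2.0], [0.0, 1.0]])    # constraint cost
mu0 = np.array([1.0, 0.0])                # initial distribution
C = 3.0                                   # constraint level

n_s, n_a = r.shape
idx = lambda s, a: s * n_a + a
# Discounted occupation-measure equations:
# sum_a x[s,a] - beta * sum_{s',a} P[s',a,s] x[s',a] = mu0[s].
A_eq = np.zeros((n_s, n_s * n_a))
for s in range(n_s):
    for a in range(n_a):
        A_eq[s, idx(s, a)] += 1.0
        for s2 in range(n_s):
            A_eq[s, idx(s2, a)] -= beta * P[s2, a, s]

# Dual simplex returns a vertex solution, so the induced stationary policy
# randomizes in at most one state in this M = 1 example.
res = linprog(c=-r.ravel(), A_ub=d.ravel()[None, :], b_ub=[C],
              A_eq=A_eq, b_eq=mu0, bounds=(0, None), method="highs-ds")
x = res.x.reshape(n_s, n_a)
print("policy (rows = states):\n", x / x.sum(axis=1, keepdims=True))
```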
Mathematics of Operations Research | 2004
Eugene A. Feinberg
This paper introduces and develops a new approach to the theory of continuous time jump Markov decision processes (CTJMDP). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and eventually to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence of strategies that change actions between jumps and the randomized strategies that change actions only at jump epochs. This holds both for one-criterion problems and for multiple-objective problems with constraints. In particular, this paper introduces the theory for multiple-objective problems with expected total discounted rewards and constraints. If a problem is feasible, there exist three types of optimal policies: (i) nonrandomized switching stationary policies, (ii) randomized stationary policies for the CTJMDP, and (iii) randomized stationary policies for the corresponding SMDP with exponentially distributed sojourn times, and these policies can be implemented as randomized strategies in the CTJMDP.
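For exponentially distributed sojourn times, the reduction of a discounted continuous-time jump model to a discrete-time discounted problem can be stated in one line. The small function below records the textbook transformation (expected discounted reward until the next jump and the effective one-step discount factor); it is a hedged illustration with made-up numbers, not necessarily the exact construction used in the paper.

```python
# A hedged illustration, not necessarily the paper's construction: for a
# state-action pair with exponential sojourn time (jump rate lam), reward
# rate rho, and discount rate alpha, compute the equivalent discrete-time data.
def reduce_exponential_step(lam: float, rho: float, alpha: float):
    one_step_reward = rho / (alpha + lam)   # E[ integral_0^tau e^{-alpha t} rho dt ]
    effective_beta = lam / (alpha + lam)    # E[ e^{-alpha tau} ]
    return one_step_reward, effective_beta

# Example with hypothetical numbers: jump rate 2, reward rate 5, discount rate 0.1.
print(reduce_exponential_step(lam=2.0, rho=5.0, alpha=0.1))
```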
Mathematics of Operations Research | 1995
Eugene A. Feinberg; Adam Shwartz
This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, (ii) randomized Markov before this step, but the total number of actions which are added by randomization is bounded by the number of constraints. Optimality of such policies for multi-criteria problems is also established. These new policies have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. This result is new even for constrained optimization with a single discount factor, where the optimality of randomized stationary policies is known. However, a randomized stationary policy may req...
Mathematics of Operations Research | 2012
Eugene A. Feinberg; Pavlo O. Kasyanov; Nina V. Zadoianchuk
This paper presents sufficient conditions for the existence of stationary optimal policies for average cost Markov decision processes with Borel state and action sets and weakly continuous transition probabilities. The one-step cost functions may be unbounded, and the action sets may be noncompact. The main contributions of this paper are: (i) general sufficient conditions for the existence of stationary discount optimal and average cost optimal policies, and descriptions of properties of value functions and sets of optimal actions; (ii) a sufficient condition for the average cost optimality of a stationary policy in the form of optimality inequalities; and (iii) approximations of average cost optimal actions by discount optimal actions.
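As a small numerical illustration of point (iii): for a toy finite model, value iteration at discount factors approaching 1 shows (1 − β)V_β stabilizing near the optimal average cost while the discount-optimal actions settle on average-cost optimal ones. The model data below are hypothetical, and the finite example is only meant to visualize the vanishing-discount approximation, not the Borel-space results of the paper.

```python
# A toy finite-state sketch (hypothetical data) of the vanishing-discount
# approximation: as beta -> 1, (1 - beta) * V_beta approaches the optimal
# average cost and discount-optimal actions approximate average-cost ones.
import numpy as np

P = np.array([[[0.8, 0.2], [0.3, 0.7]],     # P[s, a, s']
              [[0.4, 0.6], [0.9, 0.1]]])
cost = np.array([[2.0, 1.0],
                 [3.0, 0.5]])

def discount_optimal(beta, iters=5000):
    V = np.zeros(2)
    for _ in range(iters):
        Q = cost + beta * P @ V              # Q[s, a]
        V = Q.min(axis=1)                    # Bellman update (cost minimization)
    return V, Q.argmin(axis=1)

for beta in (0.9, 0.99, 0.999):
    V, policy = discount_optimal(beta)
    print(f"beta={beta}: (1-beta)*V = {(1 - beta) * V}, policy = {policy}")
```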
Mathematics of Operations Research | 1994
Eugene A. Feinberg; Adam Shwartz
We consider a discrete-time Markov Decision Process with infinite horizon. The criterion to be maximized is the sum of a number of standard discounted rewards, each with a different discount factor. Situations in which such criteria arise include modeling investments, production, projects of different durations, systems with multiple criteria, and some axiomatic formulations of multi-attribute preference theory. We show that for this criterion, for some positive ε, there need not exist an ε-optimal randomized stationary strategy, even when the state and action sets are finite. However, ε-optimal Markov nonrandomized strategies and optimal Markov strategies exist under weak conditions. We exhibit ε-optimal Markov strategies which are stationary from some time onward. When both state and action spaces are finite, there exists an optimal Markov strategy with this property. We provide an explicit algorithm for the computation of such strategies and give a description of the set of optimal strategies.
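To illustrate the criterion itself, the sketch below evaluates a fixed stationary policy under a sum of discounted rewards with different discount factors by solving one standard policy-evaluation system per discount factor. The transition matrix, rewards, and discount factors are made-up examples, and this evaluation step is not the paper's optimization algorithm.

```python
# A minimal sketch (hypothetical data): evaluating a stationary policy pi under
# the criterion "sum of several standard discounted rewards, each with its own
# discount factor".  V = sum_k V_k, where V_k solves (I - beta_k * P_pi) V_k = r_k.
import numpy as np

n = 3
P_pi = np.array([[0.5, 0.5, 0.0],       # transition matrix induced by a
                 [0.1, 0.6, 0.3],       # fixed stationary policy pi
                 [0.0, 0.2, 0.8]])
rewards = [np.array([1.0, 0.0, 2.0]),   # r_1 under pi
           np.array([0.0, 3.0, 1.0])]   # r_2 under pi
betas = [0.9, 0.5]                      # one discount factor per reward

V = sum(np.linalg.solve(np.eye(n) - b * P_pi, r)
        for b, r in zip(betas, rewards))
print("value of pi under the mixed-discount criterion:", V)
```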
Queueing Systems | 2002
Eugene A. Feinberg; Offer Kella
We consider an M/G/1 queue with a removable server. When a customer arrives, the workload becomes known. The cost structure consists of switching costs, running costs, and a holding cost per unit time that is a nonnegative nondecreasing right-continuous function of the current workload in the system. We prove an old conjecture that D-policies are optimal for the average cost per unit time criterion. This means that for this criterion there is an optimal policy that either runs the server all the time or switches the server off when the system becomes empty and switches it on when the workload reaches or exceeds some threshold D.
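The structure of a D-policy is easy to simulate. The rough sketch below estimates the average cost of a few D-policies in an M/M/1 special case with a linear holding-cost function; all parameters and the particular cost accounting (a switching cost charged only when the server is turned on) are hypothetical, whereas the paper's result covers general service times and general nondecreasing holding-cost functions.

```python
# A small simulation sketch (hypothetical parameters): an M/G/1-type queue with
# a removable server operated by a D-policy -- switch the server on when the
# workload reaches or exceeds D, switch it off when the system empties.
import random

def simulate_D_policy(D, lam=0.5, mean_job=1.0, switch_cost=5.0,
                      run_cost=1.0, hold_rate=1.0, horizon=10_000.0, seed=0):
    rng = random.Random(seed)
    t, w, on, total = 0.0, 0.0, False, 0.0
    while t < horizon:
        dt = rng.expovariate(lam)                  # time to next arrival
        if on:
            busy = min(dt, w)                      # server works until empty
            total += run_cost * busy
            total += hold_rate * (w**2 - (w - busy)**2) / 2.0   # linear holding cost
            w -= busy
            if w == 0.0:
                on = False                         # switch off when empty
            total += hold_rate * w * (dt - busy)   # (zero after emptying)
        else:
            total += hold_rate * w * dt            # workload sits untouched
        t += dt
        w += rng.expovariate(1.0 / mean_job)       # exponential job size (M/M/1 case)
        if not on and w >= D:
            on = True
            total += switch_cost                   # pay to switch the server on
    return total / t                               # average cost per unit time

for D in (0.0, 1.0, 2.0, 4.0):
    print(f"D = {D}: average cost ≈ {simulate_D_policy(D):.3f}")
```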
IEEE Transactions on Automatic Control | 1999
Eugene A. Feinberg; Adam Shwartz
We consider a discrete time Markov decision process, where the objectives are linear combinations of standard discounted rewards, each with a different discount factor. We describe several applications that motivate the recent interest in these criteria. For the special case where a standard discounted cost is to be minimized, subject to a constraint on another standard discounted cost but with a different discount factor, we provide an implementable algorithm for computing an optimal policy.
Mathematics of Operations Research | 2007
Eugene A. Feinberg; Mark E. Lewis
For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural applications of our results. We describe structural properties of average cost optimal policies for the cash balance problem, an inventory control problem where the demand may be negative and the decision-maker can produce or scrap inventory. We also show the convergence of optimal thresholds in the finite-horizon case to those under the expected discounted cost criterion, and of those under the expected discounted cost criterion to those under the average cost per unit time criterion.
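As a purely illustrative sketch of the kind of threshold structure referred to above, a cash-balance style policy can be written as a two-sided target band: produce up to a lower target, scrap down to an upper target, and do nothing in between. The targets below are placeholders, not values derived in the paper, and the exact structural properties established there may be richer than this simple band.

```python
# A tiny sketch (hypothetical numbers) of a two-sided threshold policy for a
# cash-balance-style problem: the actual optimal targets come from the dynamic
# program; the constants below are placeholders for illustration only.
LOWER_TARGET, UPPER_TARGET = -2.0, 5.0

def cash_balance_action(inventory: float) -> float:
    """Return the production (+) or scrapping (-) quantity."""
    if inventory < LOWER_TARGET:
        return LOWER_TARGET - inventory    # produce up to the lower target
    if inventory > UPPER_TARGET:
        return UPPER_TARGET - inventory    # scrap down to the upper target
    return 0.0                             # no action inside the band
```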
Probability in the Engineering and Informational Sciences | 1994
Eugene A. Feinberg; Martin I. Reiman
We consider a controlled queueing system that is a generalization of the M/M/c/W queue. There are m types of customers that arrive according to independent Poisson processes. Service times are exponential and independent and do not depend on the customer type. There is room in the system for a total of N customers; if there are N customers in the system, new arrivals are lost. Type j customers are more profitable than type (j + 1) customers, j = 2, …, m − 1, and type 1 customers are at least as profitable as type 2 customers. The allowed control is to accept or reject customers at arrival. No preemption of customers in service is allowed. The goal is to maximize the average reward per unit of time subject to a constraint that the blocking probability of type 1 customers is no greater than a given level. For an M/M/c/c system without a constraint, Miller [12] proved that an optimal policy has a simple threshold structure. We show that, for the constrained problem described above, an optimal policy has a similar structure, but one of the thresholds might have to be randomized. We also derive an algorithm that constructs an optimal policy and describe other forms of optimal policies.
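The policy form described above, type-dependent admission thresholds with one threshold randomized so that the blocking-probability constraint is met exactly, can be sketched as a small admission rule. The thresholds, the designated type, and the randomization probability below are hypothetical; the paper's algorithm is what would actually produce them.

```python
# A minimal sketch (hypothetical thresholds): a threshold admission policy with
# one randomized level, of the kind described in the abstract.
import random

N = 10                                    # room for N customers in total
thresholds = {1: N, 2: 7, 3: 5}           # hypothetical per-type thresholds
randomized = (3, 0.4)                     # type 3 is admitted with prob. 0.4
                                          # at its boundary state

def admit(customer_type: int, n_in_system: int, rng=random) -> bool:
    """Accept a type-j arrival iff fewer than thresholds[j] customers are
    present; one designated type has its boundary state randomized, mixing
    two neighbouring thresholds."""
    if n_in_system >= N:
        return False                      # system full: arrival is lost
    t = thresholds[customer_type]
    if n_in_system < t:
        return True
    j, p = randomized
    if customer_type == j and n_in_system == t:
        return rng.random() < p           # the single randomized level
    return False
```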