Warren B. Powell | Researchain

Publication

Featured research published by Warren B. Powell.

Archive | 2011

Approximate dynamic programming : solving the curses of dimensionality

Warren B. Powell

Preface. Acknowledgments.
1. The challenges of dynamic programming. 1.1 A dynamic programming example: a shortest path problem. 1.2 The three curses of dimensionality. 1.3 Some real applications. 1.4 Problem classes. 1.5 The many dialects of dynamic programming. 1.6 What is new in this book? 1.7 Bibliographic notes.
2. Some illustrative models. 2.1 Deterministic problems. 2.2 Stochastic problems. 2.3 Information acquisition problems. 2.4 A simple modeling framework for dynamic programs. 2.5 Bibliographic notes. Problems.
3. Introduction to Markov decision processes. 3.1 The optimality equations. 3.2 Finite horizon problems. 3.3 Infinite horizon problems. 3.4 Value iteration. 3.5 Policy iteration. 3.6 Hybrid value-policy iteration. 3.7 The linear programming method for dynamic programs. 3.8 Monotone policies. 3.9 Why does it work? 3.10 Bibliographic notes. Problems.
4. Introduction to approximate dynamic programming. 4.1 The three curses of dimensionality (revisited). 4.2 The basic idea. 4.3 Sampling random variables. 4.4 ADP using the post-decision state variable. 4.5 Low-dimensional representations of value functions. 4.6 So just what is approximate dynamic programming? 4.7 Experimental issues. 4.8 Dynamic programming with missing or incomplete models. 4.9 Relationship to reinforcement learning. 4.10 But does it work? 4.11 Bibliographic notes. Problems.
5. Modeling dynamic programs. 5.1 Notational style. 5.2 Modeling time. 5.3 Modeling resources. 5.4 The states of our system. 5.5 Modeling decisions. 5.6 The exogenous information process. 5.7 The transition function. 5.8 The contribution function. 5.9 The objective function. 5.10 A measure-theoretic view of information. 5.11 Bibliographic notes. Problems.
6. Stochastic approximation methods. 6.1 A stochastic gradient algorithm. 6.2 Some stepsize recipes. 6.3 Stochastic stepsizes. 6.4 Computing bias and variance. 6.5 Optimal stepsizes. 6.6 Some experimental comparisons of stepsize formulas. 6.7 Convergence. 6.8 Why does it work? 6.9 Bibliographic notes. Problems.
7. Approximating value functions. 7.1 Approximation using aggregation. 7.2 Approximation methods using regression models. 7.3 Recursive methods for regression models. 7.4 Neural networks. 7.5 Batch processes. 7.6 Why does it work? 7.7 Bibliographic notes. Problems.
8. ADP for finite horizon problems. 8.1 Strategies for finite horizon problems. 8.2 Q-learning. 8.3 Temporal difference learning. 8.4 Policy iteration. 8.5 Monte Carlo value and policy iteration. 8.6 The actor-critic paradigm. 8.7 Bias in value function estimation. 8.8 State sampling strategies. 8.9 Starting and stopping. 8.10 A taxonomy of approximate dynamic programming strategies. 8.11 Why does it work? 8.12 Bibliographic notes. Problems.
9. Infinite horizon problems. 9.1 From finite to infinite horizon. 9.2 Algorithmic strategies. 9.3 Stepsizes for infinite horizon problems. 9.4 Error measures. 9.5 Direct ADP for online applications. 9.6 Finite horizon models for steady state applications. 9.7 Why does it work? 9.8 Bibliographic notes. Problems.
10. Exploration vs. exploitation. 10.1 A learning exercise: the nomadic trucker. 10.2 Learning strategies. 10.3 A simple information acquisition problem. 10.4 Gittins indices and the information acquisition problem. 10.5 Variations. 10.6 The knowledge gradient algorithm. 10.7 Information acquisition in dynamic programming. 10.8 Bibliographic notes. Problems.
11. Value function approximations for special functions. 11.1 Value functions versus gradients. 11.2 Linear approximations. 11.3 Piecewise linear approximations. 11.4 The SHAPE algorithm. 11.5 Regression methods. 11.6 Cutting planes. 11.7 Why does it work? 11.8 Bibliographic notes. Problems.
12. Dynamic resource allocation. 12.1 An asset acquisition problem. 12.2 The blood management problem. 12.3 A portfolio optimization problem. 12.4 A general resource allocation problem. 12.5 A fleet management problem. 12.6 A driver management problem. 12.7 Bibliographic references. Problems.
13. Implementation challenges. 13.1 Will ADP work for your problem? 13.2 Designing an ADP algorithm for complex problems. 13.3 Debugging an ADP algorithm. 13.4 Convergence issues. 13.5 Modeling your problem. 13.6 Online vs. offline models. 13.7 If it works, patent it!
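Section 1.1 of the book opens with a shortest path problem. As a minimal sketch (not code from the book), the Bellman recursion V(i) = min_j [c(i, j) + V(j)] with V(target) = 0 can be solved by repeated sweeps over the nodes; the graph, node names, and the function name `shortest_path_values` are hypothetical, and arc costs are assumed to contain no negative cycles.

```python
import math


def shortest_path_values(graph, target):
    """Solve V(i) = min_j [c(i, j) + V(j)], V(target) = 0, by Bellman sweeps.

    `graph` maps every node (including `target`) to a dict {successor: arc_cost}.
    Assumes no negative-cost cycles, so |N| - 1 sweeps are enough.
    """
    values = {node: math.inf for node in graph}
    values[target] = 0.0
    for _ in range(len(graph) - 1):
        changed = False
        for i, arcs in graph.items():
            if i == target:
                continue
            # One application of the optimality equation at node i
            best = min((cost + values[j] for j, cost in arcs.items()), default=math.inf)
            if best < values[i]:
                values[i] = best
                changed = True
        if not changed:  # values are a fixed point of the recursion
            break
    return values


# Hypothetical four-node network with destination "t"
graph = {"q": {"r": 2, "s": 5}, "r": {"s": 1, "t": 6}, "s": {"t": 2}, "t": {}}
print(shortest_path_values(graph, "t"))
```

Each sweep is one pass of the deterministic optimality equation; the later chapters replace the exact table `values` with approximations when the state space is too large to enumerate.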

2004

Handbook of learning and approximate dynamic programming

Jennie Si; Andrew G. Barto; Warren B. Powell; Donald C. Wunsch

Foreword.
1. ADP: goals, opportunities and principles.
Part I: Overview. 2. Reinforcement learning and its relationship to supervised learning. 3. Model-based adaptive critic designs. 4. Guidance in the use of adaptive critics for control. 5. Direct neural dynamic programming. 6. The linear programming approach to approximate dynamic programming. 7. Reinforcement learning in large, high-dimensional state spaces. 8. Hierarchical decision making.
Part II: Technical advances. 9. Improved temporal difference methods with linear function approximation. 10. Approximate dynamic programming for high-dimensional resource allocation problems. 11. Hierarchical approaches to concurrency, multiagency, and partial observability. 12. Learning and optimization - from a system theoretic perspective. 13. Robust reinforcement learning using integral-quadratic constraints. 14. Supervised actor-critic reinforcement learning. 15. BPTT and DAC - a common framework for comparison.
Part III: Applications. 16. Near-optimal control via reinforcement learning. 17. Multiobjective control problems by reinforcement learning. 18. Adaptive critic based neural network for control-constrained agile missile. 19. Applications of approximate dynamic programming in power systems control. 20. Robust reinforcement learning for heating, ventilation, and air conditioning control of buildings. 21. Helicopter flight control using direct neural dynamic programming. 22. Toward dynamic stochastic optimal power flow. 23. Control, optimization, security, and self-healing of benchmark power systems.

Networks | 1982

An algorithm for the equilibrium assignment problem with random link times

Yosef Sheffi; Warren B. Powell

In this article we offer an equivalent minimization formulation for the traffic assignment problem when the link travel times are flow-dependent random variables. The paper shows that the first-order conditions of this program are equivalent to the stochastic equilibrium conditions, and that the solution is unique. The paper also describes an algorithmic approach to solving this program, including a proof of convergence. Finally, we conduct some limited numerical experiments on the rate of convergence of the algorithm and on the merits of the stochastic equilibrium model, in general, as compared with deterministic approaches.
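The abstract does not spell out the algorithm. A common approach for stochastic equilibrium problems of this kind is the method of successive averages (MSA), which repeatedly loads demand onto routes given current travel times and averages the result into the current flows with stepsize 1/n. The sketch below is a toy illustration under loud assumptions: two parallel routes, hypothetical linear link-performance functions, and a closed-form logit route choice used as a stand-in for choice under random link times (the paper's model may differ). All function names are hypothetical.

```python
import math


def route_times(flows, free, slope):
    # Hypothetical link-performance functions: t_k(x_k) = free_k + slope_k * x_k
    return [f + s * x for f, s, x in zip(free, slope, flows)]


def logit_split(times, demand, theta):
    # Logit route choice: a closed-form stand-in for choice under random times
    m = min(times)
    w = [math.exp(-theta * (t - m)) for t in times]
    total = sum(w)
    return [demand * wi / total for wi in w]


def msa(demand, free, slope, theta=1.0, iters=1000):
    """Method of successive averages with the predetermined stepsize 1/n."""
    n_routes = len(free)
    flows = [demand / n_routes] * n_routes
    for n in range(1, iters + 1):
        # Auxiliary flows: demand loaded according to current travel times
        target = logit_split(route_times(flows, free, slope), demand, theta)
        step = 1.0 / n
        flows = [x + step * (y - x) for x, y in zip(flows, target)]
    return flows


flows = msa(10.0, free=[1.0, 2.0], slope=[0.5, 0.2])
print(flows)
```

At convergence the flows reproduce themselves under the choice model, which is the stochastic equilibrium condition; the 1/n stepsize is what allows convergence even though each loading step can overshoot.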

Siam Journal on Control and Optimization | 2008

A Knowledge-Gradient Policy for Sequential Information Collection

Peter I. Frazier; Warren B. Powell; Savas Dayanik

In a sequential Bayesian ranking and selection problem with independent normal populations and common known variance, we study a previously introduced measurement policy which we refer to as the knowledge-gradient policy. This policy myopically maximizes the expected increment in the value of information in each time period, where the value is measured according to the terminal utility function. We show that the knowledge-gradient policy is optimal both when the horizon is a single time period and in the limit as the horizon extends to infinity. We show furthermore that, in some special cases, the knowledge-gradient policy is optimal regardless of the length of any given fixed total sampling horizon. We bound the knowledge-gradient policy's suboptimality in the remaining cases, and show through simulations that it performs competitively with or significantly better than other policies.
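A minimal sketch of the one-step knowledge-gradient computation for independent normal beliefs with a common known measurement noise, using the closed form usually stated for this setting: KG_x = s̃_x f(-|μ_x - max_{x'≠x} μ_{x'}| / s̃_x) with f(z) = zΦ(z) + φ(z), where s̃_x² = σ_x⁴ / (σ_x² + σ_W²) is the variance of the change in the posterior mean. The function name and inputs are hypothetical; at least two alternatives and positive prior standard deviations are assumed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, for Phi (cdf) and phi (pdf)


def knowledge_gradient(mu, sigma, noise_sd):
    """One-step KG value of measuring each alternative once.

    mu, sigma: lists of prior means and prior std. devs (sigma > 0).
    noise_sd: common, known std. dev. of a single measurement.
    """
    kg = []
    for x, (m, s) in enumerate(zip(mu, sigma)):
        # Std. dev. of the change in the posterior mean from one measurement of x
        s_tilde = s * s / (s * s + noise_sd ** 2) ** 0.5
        best_other = max(mu[:x] + mu[x + 1:])
        z = -abs(m - best_other) / s_tilde
        kg.append(s_tilde * (z * STD_NORMAL.cdf(z) + STD_NORMAL.pdf(z)))
    return kg


values = knowledge_gradient([0.0, 0.0, 0.5], [2.0, 0.1, 0.1], noise_sd=1.0)
choice = max(range(len(values)), key=values.__getitem__)  # KG policy: measure argmax
print(values, choice)
```

In the usage line, the alternative with a wide prior is chosen even though its mean is not the highest, which is the value-of-information trade-off the policy captures.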

Transportation Science | 2002

An Adaptive Dynamic Programming Algorithm for Dynamic Fleet Management, II: Multiperiod Travel Times

Gregory A. Godfrey; Warren B. Powell

In a companion paper (Godfrey and Powell 2002) we introduced an adaptive dynamic programming algorithm for stochastic dynamic resource allocation problems, which arise in the context of logistics and distribution, fleet management, and other allocation problems. The method depends on estimating separable nonlinear approximations of value functions, using a dynamic programming framework. That paper considered only the case in which the time to complete an action was always a single time period. Experiments with this technique quickly showed that when the basic algorithm was applied to problems with multiperiod travel times, the results were very poor. In this paper, we illustrate why this behavior arose, and propose a modified algorithm that addresses the issue. Experimental work demonstrates that the modified algorithm works on problems with multiperiod travel times, with results that are almost as good as the original algorithm applied to single period travel times.
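The abstract mentions separable nonlinear value function approximations. A minimal sketch of one such approximation, assuming a piecewise-linear concave value in the number of vehicles at a location, stored as nonincreasing slopes: a sampled marginal value is smoothed into one slope, and a simple leveling step restores concavity. The leveling projection and all names are assumptions for illustration, not necessarily the update used in the paper.

```python
def update_concave_slopes(slopes, r, sample_slope, step):
    """Smooth a sampled marginal value into slope r, then restore concavity.

    slopes[i] is the value of the (i+1)-th unit; concavity means the list is
    nonincreasing. `step` is a stepsize in (0, 1].
    """
    v = list(slopes)
    v[r] = (1 - step) * v[r] + step * sample_slope
    # Leveling: slopes to the left of r must be >= v[r], to the right <= v[r]
    for i in range(r):
        v[i] = max(v[i], v[r])
    for i in range(r + 1, len(v)):
        v[i] = min(v[i], v[r])
    return v


def approx_value(slopes, units):
    # Piecewise-linear value of holding `units` vehicles: sum of the first slopes
    return sum(slopes[:units])


slopes = update_concave_slopes([10.0, 8.0, 6.0, 4.0], r=2, sample_slope=12.0, step=0.5)
print(slopes, approx_value(slopes, 3))
```

Keeping each location's approximation concave and separable is what lets the per-period decision problem stay a network-style optimization.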

Transportation Science | 1992

An Optimization-Based Heuristic for Vehicle Routing and Scheduling with Soft Time Window Constraints

Yiannis A. Koskosidis; Warren B. Powell; Marius M. Solomon

The Vehicle Routing and Scheduling Problem with Time Window constraints is formulated as a mixed integer program, and optimization-based heuristics which extend the cluster-first, route-second algorithm of Fisher and Jaikumar are developed for its solution. We present a new formulation based on the treatment of the time window constraints as soft constraints that can be violated at a cost and we heuristically decompose the problem into an assignment/clustering component and a series of routing and scheduling components. Numerical results based on randomly generated and benchmark problem sets indicate that the algorithm compares favorably to state-of-the-art local insertion and improvement heuristics.
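To make "soft constraints that can be violated at a cost" concrete, here is a minimal sketch that evaluates one route under assumed linear penalties for early and late service, with service starting on arrival. The penalty form, the depot at index 0, and all names and numbers are hypothetical illustrations, not the paper's exact cost model.

```python
def route_cost(route, travel, windows, service, early_pen, late_pen, start=0.0):
    """Travel cost plus soft time-window penalties for one vehicle route.

    route: customer indices visited in order (the depot is node 0).
    travel[i][j]: travel time (also used as cost) from i to j.
    windows[c] = (a, b): desired service window for customer c.
    Assumes linear penalties and that service starts on arrival (no waiting).
    """
    t, prev = start, 0
    travel_cost = penalty = 0.0
    for c in route:
        t += travel[prev][c]
        travel_cost += travel[prev][c]
        a, b = windows[c]
        if t < a:
            penalty += early_pen * (a - t)  # arrived before the window opens
        elif t > b:
            penalty += late_pen * (t - b)   # arrived after the window closes
        t += service[c]
        prev = c
    travel_cost += travel[prev][0]  # return to the depot
    return travel_cost + penalty


travel = [[0, 2, 5], [2, 0, 3], [4, 3, 0]]
windows = [(0, 100), (3, 5), (4, 6)]
service = [0, 1, 1]
print(route_cost([1, 2], travel, windows, service, early_pen=2.0, late_pen=5.0))
```

Because violations are priced rather than forbidden, every route has a finite cost, which is what makes decomposing the problem into clustering and routing components workable.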

Informs Journal on Computing | 1999

Solving Parallel Machine Scheduling Problems by Column Generation

Zhi-Long Chen; Warren B. Powell

We consider a class of problems of scheduling n jobs on m identical, uniform, or unrelated parallel machines with an objective of minimizing an additive criterion. We propose a decomposition approach for solving these problems exactly. The decomposition approach first formulates these problems as an integer program, and then reformulates the integer program, using Dantzig-Wolfe decomposition, as a set partitioning problem. Based on this set partitioning formulation, branch-and-bound exact solution algorithms can be designed for these problems. In such a branch-and-bound tree, each node is the linear relaxation problem of a set partitioning problem. This linear relaxation problem is solved by a column generation approach where each column represents a schedule on one machine and is generated by solving a single machine subproblem. Branching is conducted on variables in the original integer programming formulation instead of variables in the set partitioning formulation such that single machine subproblems are more tractable. We apply this decomposition approach to two particular problems: the total weighted completion time problem and the weighted number of tardy jobs problem. The computational results indicate that the decomposition approach is promising and capable of solving large problems.
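For the total weighted completion time problem, a single-machine schedule for a fixed job set is optimal when jobs are sequenced by Smith's rule (weighted shortest processing time, ascending p_j / w_j). A minimal sketch of pricing one column, assuming a master problem with one covering row per job (duals π_j) and one machine-count row (dual σ), so the reduced cost is c_s - Σ_j π_j - σ. Names and data are hypothetical; the paper's pricing subproblem searches over job sets, whereas this only evaluates a given one.

```python
def column_cost(jobs, proc, weight):
    """Sequence a fixed job subset on one machine by WSPT; return (order, sum w_j C_j)."""
    order = sorted(jobs, key=lambda j: proc[j] / weight[j])
    t, cost = 0.0, 0.0
    for j in order:
        t += proc[j]           # completion time C_j of job j
        cost += weight[j] * t
    return order, cost


def reduced_cost(jobs, proc, weight, job_duals, machine_dual):
    # Reduced cost of the column (single-machine schedule) covering `jobs`
    _, cost = column_cost(jobs, proc, weight)
    return cost - sum(job_duals[j] for j in jobs) - machine_dual


proc, weight = [3, 1, 2], [1, 2, 1]
print(column_cost([0, 1, 2], proc, weight))
print(reduced_cost([0, 1, 2], proc, weight, job_duals=[4, 3, 2], machine_dual=1))
```

A column with negative reduced cost would be added to the restricted master problem; this instance prices out at a positive value and would not be added.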

Transportation Science | 1996

A Stochastic Formulation of the Dynamic Assignment Problem, with an Application to Truckload Motor Carriers

Warren B. Powell

The dynamic assignment problem arises in a number of application areas in transportation and logistics. Taxi drivers have to be assigned to pick up passengers, police have to be assigned to emergencies, and truck drivers have to pick up and carry loads of freight. All of these problems are characterized by demands that arrive continuously and randomly throughout the day, and require a dispatcher to assign a driver to handle a specific demand. We use as our motivating application the load matching problem that arises in long-haul truckload trucking, where we have to assign drivers to loads on a real-time basis. A hybrid model is presented that handles the detailed assignment of drivers to loads, as well as handling forecasts of future loads. Numerical experiments demonstrate that our stochastic, dynamic model outperforms standard myopic models that are widely used in practice.
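A toy illustration of the difference between a myopic assignment and one that accounts for the future: the sketch scores each driver-to-load pairing by its immediate contribution plus a discounted value of the load's destination. The value table stands in for forecasts of future loads; it and all names and numbers are hypothetical, and brute-force enumeration is only for tiny instances (at least as many loads as drivers).

```python
from itertools import permutations


def best_assignment(contrib, load_dest, future_value, gamma=1.0):
    """Assign each driver a distinct load, maximizing contribution + gamma * V(dest).

    contrib[d][l]: immediate contribution of driver d covering load l.
    gamma = 0 gives the myopic assignment.
    """
    n_drivers, n_loads = len(contrib), len(contrib[0])
    best, best_score = None, float("-inf")
    for loads in permutations(range(n_loads), n_drivers):
        score = sum(contrib[d][l] + gamma * future_value[load_dest[l]]
                    for d, l in enumerate(loads))
        if score > best_score:
            best, best_score = loads, score
    return best, best_score


contrib = [[10, 8, 7], [6, 9, 5]]
load_dest = ["A", "B", "C"]
future_value = {"A": 0.0, "B": 1.0, "C": 6.0}
print(best_assignment(contrib, load_dest, future_value, gamma=0.0))  # myopic
print(best_assignment(contrib, load_dest, future_value, gamma=1.0))  # forward-looking
```

Here the forward-looking rule sends driver 0 to the lower-paying load because it ends at a location with more anticipated future value, a choice a purely myopic rule would not make.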

Transportation Science | 1990

A Successive Linear Approximation Procedure for Stochastic, Dynamic Vehicle Allocation Problems

Linos F. Frantzeskakis; Warren B. Powell

The Stochastic Dynamic Vehicle Allocation problem involves managing a fleet of vehicles over time in an uncertain demand environment to maximize expected total profits. The problem is formulated as a Stochastic Programming problem. A new heuristic algorithm is developed and compared with various deterministic approximations. The paper presents computational results obtained by employing a Rolling Horizon Procedure to simulate the operation of a truckload carrier. Results indicate the superiority of the new algorithm over the other approaches tested.

Informs Journal on Computing | 2006

Dynamic-Programming Approximations for Stochastic Time-Staged Integer Multicommodity-Flow Problems

Huseyin Topaloglu; Warren B. Powell

In this paper, we consider a stochastic and time-dependent version of the min-cost integer multicommodity-flow problem that arises in the dynamic resource allocation context. In this problem class, tasks arriving over time have to be covered by a set of indivisible and reusable resources of different types. The assignment of a resource to a task removes the task from the system, modifies the resource, and generates a profit. When serving a task, resources of different types can serve as substitutes of each other, possibly yielding different revenues. We propose an iterative, adaptive dynamic-programming-based methodology that makes use of linear or nonlinear approximations of the value function. Our numerical work shows that the proposed method provides high-quality solutions and is computationally attractive for large problems.
