
Publications


Featured research published by Abhijit Gosavi.


Archive | 2003

Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning

Abhijit Gosavi



INFORMS Journal on Computing | 2009

Reinforcement Learning: A Tutorial Survey and Recent Advances

Abhijit Gosavi

In the last few years, reinforcement learning (RL), also called adaptive (or approximate) dynamic programming, has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. Although seminal research in this area was performed in the artificial intelligence (AI) community, more recently it has attracted the attention of optimization theorists because of several noteworthy success stories from operations management. It is on large-scale and complex problems of dynamic optimization, in particular the Markov decision problem (MDP) and its variants, that the power of RL becomes more obvious. It has been known for many years that on large-scale MDPs, the curse of dimensionality and the curse of modeling render classical dynamic programming (DP) ineffective. The excitement in RL stems from its direct attack on these curses, which allows it to solve problems that were considered intractable via classical DP in the past. The success of RL is due to its strong mathematical roots in the principles of DP, Monte Carlo simulation, function approximation, and AI. Topics treated in some detail in this survey are temporal differences, Q-learning, semi-MDPs, and stochastic games. Several recent advances in RL, e.g., policy gradients and hierarchical RL, are covered along with references. Pointers to numerous examples of applications are provided. This overview is aimed at uncovering the mathematical roots of this science so that readers gain a clear understanding of the core concepts and are able to use them in their own research. The survey points to more than 100 references from the literature.
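
Among the topics the survey treats, Q-learning is the most widely used model-free RL algorithm. As a minimal sketch (the two-state MDP, rewards, and parameters below are hypothetical, not from the survey), here is the Q-learning update rule applied in exhaustive sweeps over a toy deterministic MDP, so that its fixed point can be checked by hand:

```python
# Toy deterministic MDP (illustrative; not from the survey): 2 states, 2 actions.
# transitions[(state, action)] = (next_state, reward)
transitions = {
    (0, 0): (0, 1.0), (0, 1): (1, 0.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}
gamma, alpha = 0.9, 0.5
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

# Q-learning update, applied in exhaustive sweeps over all state-action
# pairs so the iterates converge to the optimal Q-values.
for _ in range(500):
    for (s, a), (s2, r) in transitions.items():
        target = r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

# Greedy policy with respect to the learned Q-values.
policy = {s: max((0, 1), key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(policy)
```

With gamma = 0.9, the self-loop in state 1 earns 2 per step, so Q(1,1) converges to 2/(1 - 0.9) = 20 and the greedy policy takes action 1 in both states.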


European Journal of Operational Research | 2004

Reinforcement learning for long-run average cost

Abhijit Gosavi

A large class of sequential decision-making problems under uncertainty can be modeled as Markov and semi-Markov decision problems (SMDPs), when their underlying probability structure has a Markov chain. They may be solved using classical dynamic programming (DP) methods. However, DP methods suffer from the curse of dimensionality and break down rapidly in the face of large state spaces. In addition, DP methods require the exact computation of the so-called transition probabilities, which are often hard to obtain, and are hence said to suffer from the curse of modeling as well. In recent years, a simulation-based method, called reinforcement learning (RL), has emerged in the literature. It can, to a great extent, relieve stochastic DP of its curses by generating near-optimal solutions to problems having large state spaces and complex transition mechanisms. In this paper, a simulation-based algorithm that solves Markov and semi-Markov decision problems is presented, along with its convergence analysis. The algorithm involves a step-size-based transformation on two time scales. Its convergence analysis is based on a recent result on the asynchronous convergence of iterates on two time scales. We present numerical results from the new algorithm on a classical preventive-maintenance case study of a reasonable size, for which results on the optimal policy are also available. In addition, we present a tutorial that explains the framework of RL in the context of long-run average-cost SMDPs.
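
The step-size-based transformation on two time scales can be illustrated in miniature (a hypothetical deterministic two-state chain under a fixed policy; this is not the paper's algorithm or its case study): the relative values are driven by a faster step size than the average-cost estimate.

```python
# Hypothetical two-state process under a fixed policy: state 0 incurs
# cost 1, state 1 incurs cost 3, and the chain alternates deterministically.
costs = {0: 1.0, 1: 3.0}
nxt = {0: 1, 1: 0}

# Two-timescale iteration: the relative values h use a faster (larger)
# step size than the average-cost estimate rho.
h = {0: 0.0, 1: 0.0}
rho = 0.0
s = 0
for k in range(1, 10001):
    fast = 1.0 / (k ** 0.6)   # faster step size for the relative values
    slow = 1.0 / k            # slower step size for the average cost
    s2 = nxt[s]
    h[s] += fast * (costs[s] - rho + h[s2] - h[s])
    rho += slow * (costs[s] - rho)
    s = s2

print(round(rho, 2))
```

Because the slow step size is 1/k, the average-cost iterate reduces to a running mean of the observed costs and converges to the long-run average cost (1 + 3)/2 = 2.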


Automatica | 2006

A risk-sensitive approach to total productive maintenance

Abhijit Gosavi

While risk-sensitive (RS) approaches for designing plans of total productive maintenance are critical in manufacturing systems, there is little in the literature by way of theoretical modeling. Developing such plans often requires the solution of a discrete-time stochastic control-optimization problem. Renewal theory and Markov decision processes (MDPs) are commonly employed tools for solving the underlying problem. The literature on preventive maintenance, for the most part, focuses on minimizing the expected net cost and disregards issues related to minimizing risks. RS maintenance managers employ safety factors to modify the risk-neutral solution in an attempt to heuristically accommodate elements of risk in their decision making. In this paper, our efforts are directed toward developing a formal theory for RS preventive-maintenance plans. We employ the Markowitz paradigm, in which one seeks to optimize a function of the expected cost and its variance. In particular, we present (i) a result for an RS approach in the setting of renewal processes and (ii) a result for solving an RS MDP. We also provide computational results to demonstrate the efficacy of these results. Finally, the theory developed here is of a sufficiently general nature that it can be applied to problems in other relevant domains.
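
The Markowitz paradigm the paper adopts scores a policy by a weighted combination of expected cost and cost variance. A toy numeric illustration (the two policies and the risk-aversion weight theta below are hypothetical, not from the paper):

```python
# Two hypothetical preventive-maintenance policies, each with a discrete
# per-cycle cost distribution. The Markowitz-style score trades expected
# cost against variance via a risk-aversion weight theta.
def mean_var(costs, probs):
    m = sum(c * p for c, p in zip(costs, probs))
    v = sum(p * (c - m) ** 2 for c, p in zip(costs, probs))
    return m, v

# Policy A: cheap on average but volatile; Policy B: dearer but stable.
m_a, v_a = mean_var([10.0, 100.0], [0.9, 0.1])   # mean 19, variance 729
m_b, v_b = mean_var([25.0, 35.0], [0.5, 0.5])    # mean 30, variance 25

theta = 0.02  # risk-aversion weight
score_a = m_a + theta * v_a   # 19 + 0.02 * 729 = 33.58
score_b = m_b + theta * v_b   # 30 + 0.02 * 25  = 30.5

best = "A" if score_a < score_b else "B"
print(best)
```

A risk-neutral manager would pick policy A (lower expected cost); the variance penalty flips the choice to the stabler policy B.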


Machine Learning | 2004

A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis

Abhijit Gosavi

We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving average reward Markov and semi-Markov decision problems. In the literature on discounted reward RL, algorithms based on policy iteration and actor-critic algorithms have appeared. Our algorithm is an asynchronous, model-free algorithm (which can be used on large-scale problems) that hinges on the idea of computing the value function of a given policy and searching over policy space. In the applied operations research community, RL has been used to derive good solutions to problems previously considered intractable. Hence in this paper, we have tested the proposed algorithm on a commercially significant case study related to a real-world problem from the airline industry. It focuses on yield management, which has been hailed as the key factor for generating profits in the airline industry. In the experiments conducted, we use our algorithm with a nearest-neighbor approach to tackle a large state space. We also present a convergence analysis of the algorithm via an ordinary differential equation method.
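
The evaluate-then-improve loop that policy iteration builds on can be sketched exactly on a toy model. The version below is the classical discounted, model-based form (the paper's algorithm is average-reward, asynchronous, and model-free), run on a hypothetical two-state MDP:

```python
# Toy deterministic MDP (illustrative only): (state, action) -> (next_state, reward)
transitions = {
    (0, 0): (0, 1.0), (0, 1): (1, 0.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}
gamma = 0.9

def evaluate(policy, sweeps=2000):
    """Policy evaluation: iterate V(s) = r + gamma * V(s') to convergence."""
    V = {0: 0.0, 1: 0.0}
    for _ in range(sweeps):
        for s in (0, 1):
            s2, r = transitions[(s, policy[s])]
            V[s] = r + gamma * V[s2]
    return V

# Policy iteration: evaluate the current policy, then improve greedily.
policy = {0: 0, 1: 0}
while True:
    V = evaluate(policy)
    improved = {s: max((0, 1), key=lambda a: transitions[(s, a)][1]
                       + gamma * V[transitions[(s, a)][0]])
                for s in (0, 1)}
    if improved == policy:
        break
    policy = improved

print(policy)
```

Each improvement step is a greedy search over policy space using the evaluated value function, which is the structure the paper's simulation-based algorithm mimics without a model.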


International Journal of Production Research | 2002

Global supply chain management: A reinforcement learning approach

P. Pontrandolfo; Abhijit Gosavi; O.G. Okogbaa; Tapas K. Das

In recent years, researchers and practitioners alike have devoted a great deal of attention to supply chain management (SCM). The main focus of SCM is the need to integrate operations along the supply chain as part of an overall logistic support function. At the same time, the need for globalization requires that the solution of SCM problems be performed in an international context as part of what we refer to as Global Supply Chain Management (GSCM). This paper proposes an approach to study GSCM problems using an artificial intelligence framework called reinforcement learning (RL). The RL framework allows the management of global supply chains under an integration perspective. The RL approach has remarkable similarities to that of an autonomous agent network (AAN); a similarity that we shall discuss. The RL approach is applied to a case example, namely a networked production system that spans several geographic areas and logistics stages. We discuss the results and provide guidelines and implications for practical applications.


IIE Transactions | 1997

Economic design of dual-sampling-interval policies for X̄ charts with and without run rules

Tapas K. Das; Vikas Jain; Abhijit Gosavi

Recent studies show that the dual-sampling-interval (DSI) policies of the X̄ control chart yield a smaller average time to signal (ATS) than Shewhart's classical fixed-sampling-interval (FSI) policy for off-target processes. An economic design approach for DSI policies has not been addressed in the literature. In this paper we develop a comprehensive cost model for DSI policies, with and without run rules, under steady-state performance. The expression for the unit cost of quality is used as the objective function in the optimal design of the DSI policy parameters. The design process and the sensitivities of some of the model input parameters are illustrated through numerical examples.


OR Spectrum | 2006

Simulation optimization for revenue management of airlines with cancellations and overbooking

Abhijit Gosavi; Emrah Ozkaya; Aykut F. Kahraman

This paper develops a model-free simulation-based optimization model to solve a seat-allocation problem arising in airlines. The model is designed to accommodate a number of realistic assumptions for real-world airline systems—in particular, allowing cancellations of tickets by passengers and overbooking of planes by carriers. The simulation–optimization model developed here can be used to solve both single-leg problems and multi-leg or network problems. A model-free simulation–optimization approach only requires a discrete-event simulator of the system along with a numerical optimization method such as a gradient-ascent technique or a meta-heuristic. In this sense, it is relatively “easy” because alternative models such as dynamic programming or model-based gradient-ascent usually require more mathematically involved frameworks. Also, existing simulation-based approaches in the literature, unlike the one presented here, fail to capture the dynamics of cancellations and overbooking in their models. Empirical tests conducted with our approach demonstrate that it can produce robust solutions which provide revenue improvements over heuristics used in the industry, namely, EMSR (Expected Marginal Seat Revenue) for single-leg problems and DAVN (Displacement Adjusted Virtual Nesting) for networks.
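
A model-free simulation-optimization loop of the kind described needs only a discrete-event simulator and a numerical search method. A minimal single-leg sketch (all parameters hypothetical, not from the paper; exhaustive search stands in for a gradient-ascent technique or meta-heuristic):

```python
import random

random.seed(42)

# Toy single-leg model: every accepted booking independently cancels with
# probability CANCEL_P before departure; a passenger denied boarding costs
# a penalty of twice the fare. All parameters are illustrative.
CAPACITY, FARE, CANCEL_P, BUMP_COST = 100, 1.0, 0.2, 2.0

def simulate_revenue(booking_limit, reps=2000):
    """Model-free evaluation: average net revenue over simulated flights."""
    total = 0.0
    for _ in range(reps):
        # Each of the accepted bookings shows up with probability 1 - CANCEL_P.
        shows = sum(random.random() > CANCEL_P for _ in range(booking_limit))
        boarded = min(shows, CAPACITY)
        bumped = shows - boarded
        total += FARE * boarded - BUMP_COST * bumped
    return total / reps

# Simulation optimization in its simplest form: search over the booking
# limit using only simulated revenue estimates, no transition model.
limits = range(100, 151, 5)
best_limit = max(limits, key=simulate_revenue)
print(best_limit)
```

Because roughly 20% of bookings cancel, the best booking limit found exceeds the 100-seat capacity, which is precisely the case for overbooking.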


IIE Transactions | 2004

A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward

Abhijit Gosavi; Tapas K. Das; Sudeep Sarkar

Many problems of sequential decision making under uncertainty, whose underlying probabilistic structure has a Markov chain, can be set up as Markov Decision Problems (MDPs). However, when their underlying transition mechanism cannot be characterized by the Markov chain alone, the problems may be set up as Semi-Markov Decision Problems (SMDPs). The framework of dynamic programming has been used extensively in the literature to solve such problems. An alternative framework that exists in the literature is that of the Learning Automata (LA). This framework can be combined with simulation to develop convergent LA algorithms for solving MDPs under long-run cost (or reward). A very attractive feature of this framework is that it avoids a major stumbling block of dynamic programming: that of having to compute the one-step transition probability matrices of the Markov chain for every possible action of the decision-making process. In this paper, we extend this framework to the more general SMDP. We also present numerical results on a case study from the domain of preventive maintenance, in which the decision-making problem is modeled as an SMDP. An algorithm based on LA theory is employed, which may be implemented in a simulator as a solution method. It produces satisfactory results in all the numerical examples studied.


Engineering Management Journal | 2010

A Reinforcement Learning Approach for Inventory Replenishment in Vendor-Managed Inventory Systems With Consignment Inventory

Zheng Sui; Abhijit Gosavi; Li Lin

In a Vendor-Managed Inventory (VMI) system, the supplier makes the inventory-management decisions for the retailer; the retailer is not responsible for placing orders. There is a dearth of optimization models for replenishment strategies in VMI systems, and the industry relies on well-understood but simple models, e.g., the newsvendor rule. In this article, we propose a methodology based on reinforcement learning, which is rooted in the Bellman equation, to determine a replenishment policy in a VMI system with consignment inventory. We also propose rules based on the newsvendor rule. Our numerical results show that our approach can outperform the newsvendor-based rules.
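
The newsvendor rule used as the baseline orders up to the critical fractile of the demand distribution. A sketch with hypothetical costs and demand (not the article's data):

```python
# Newsvendor critical-fractile rule with illustrative costs: understocking
# loses the unit margin, overstocking loses the unit's consigned cost.
under_cost = 6.0   # lost profit per unit of unmet demand
over_cost = 2.0    # loss per unsold unit
critical_fractile = under_cost / (under_cost + over_cost)  # 6 / 8 = 0.75

# Discrete demand distribution: order the smallest quantity Q whose
# cumulative probability F(Q) reaches the critical fractile.
demand_pmf = {10: 0.2, 20: 0.3, 30: 0.3, 40: 0.2}
cdf, order_qty = 0.0, None
for d in sorted(demand_pmf):
    cdf += demand_pmf[d]
    if cdf >= critical_fractile and order_qty is None:
        order_qty = d

print(order_qty)
```

Here F(20) = 0.5 falls short of 0.75 while F(30) = 0.8 reaches it, so the rule orders 30 units; the RL approach in the article replaces this single-period logic with a policy learned over the full dynamics.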

Collaboration


Top co-authors of Abhijit Gosavi:

- Susan L. Murray, Missouri University of Science and Technology
- Tapas K. Das, University of South Florida
- Suzanna Long, Missouri University of Science and Technology
- Jane M. Fraser, Colorado State University–Pueblo
- Ruwen Qin, Missouri University of Science and Technology
- Elizabeth A. Cudney, Missouri University of Science and Technology
- Scott E. Grasman, Rochester Institute of Technology
- Casey Noll, Sandia National Laboratories
- Cihan H. Dagli, Missouri University of Science and Technology