Publication


Featured research published by Sridhar Mahadevan.


Discrete Event Dynamic Systems | 2003

Recent Advances in Hierarchical Reinforcement Learning

Andrew G. Barto; Sridhar Mahadevan

Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability. Concluding remarks address open challenges facing the further development of reinforcement learning in a hierarchical setting.
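
The SMDP view the survey relies on changes the ordinary Q-learning backup in one place: the discount factor is raised to the (random) duration of the temporally extended activity. A minimal sketch of that update, with tabular values, hypothetical state/option indices, and hand-picked learning parameters, none of which come from the paper:

```python
import numpy as np

def smdp_q_update(Q, s, o, cumulative_reward, s_next, duration,
                  alpha=0.1, gamma=0.95):
    """One SMDP Q-learning backup for a temporally extended activity (option).

    cumulative_reward is the discounted reward accumulated while the option
    ran for `duration` primitive steps; gamma is raised to that duration,
    which is what distinguishes the semi-Markov update from one-step
    Q-learning.
    """
    target = cumulative_reward + (gamma ** duration) * np.max(Q[s_next])
    Q[s, o] += alpha * (target - Q[s, o])
    return Q

# Hypothetical usage: 10 states, 3 options, one completed option execution.
Q = np.zeros((10, 3))
Q = smdp_q_update(Q, s=2, o=1, cumulative_reward=1.7, s_next=5, duration=4)
```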


Machine Learning | 1996

Average reward reinforcement learning: foundations, algorithms, and empirical results

Sridhar Mahadevan

This paper presents a detailed study of average reward reinforcement learning, an undiscounted optimality framework that is more appropriate for cyclical tasks than the much better studied discounted framework. A wide spectrum of average reward algorithms are described, ranging from synchronous dynamic programming methods to several (provably convergent) asynchronous algorithms from optimal control and learning automata. A general sensitive discount optimality metric called n-discount-optimality is introduced, and used to compare the various algorithms. The overview identifies a key similarity across several asynchronous algorithms that is crucial to their convergence, namely independent estimation of the average reward and the relative values. The overview also uncovers a surprising limitation shared by the different algorithms: while several algorithms can provably generate gain-optimal policies that maximize average reward, none of them can reliably filter these to produce bias-optimal (or T-optimal) policies that also maximize the finite reward to absorbing goal states. This paper also presents a detailed empirical study of R-learning, an average reward reinforcement learning method, using two empirical testbeds: a stochastic grid world domain and a simulated robot environment. A detailed sensitivity analysis of R-learning is carried out to test its dependence on learning rates and exploration levels. The results suggest that R-learning is quite sensitive to exploration strategies and can fall into sub-optimal limit cycles. The performance of R-learning is also compared with that of Q-learning, the best studied discounted RL method. Here, the results suggest that R-learning can be fine-tuned to give better performance than Q-learning in both domains.
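
R-learning's defining feature, noted above, is that the average reward is estimated independently of the relative action values. A toy sketch of one backup, with illustrative learning rates and the usual convention that the average-reward estimate is adjusted only after greedy actions (a simplification, not the paper's exact experimental setup):

```python
import numpy as np

def r_learning_update(R, rho, s, a, reward, s_next, alpha=0.1, beta=0.01):
    """One R-learning backup (the average-reward analogue of Q-learning).

    R holds relative action values; rho is a separate running estimate of
    the average reward per step, the independent estimation the survey
    highlights as crucial for convergence.
    """
    was_greedy = R[s, a] == np.max(R[s])   # was the taken action greedy?
    max_curr = np.max(R[s])
    max_next = np.max(R[s_next])
    R[s, a] += alpha * (reward - rho + max_next - R[s, a])
    # Adjust the average-reward estimate only on non-exploratory steps.
    if was_greedy:
        rho += beta * (reward + max_next - max_curr - rho)
    return R, rho

# Hypothetical usage: 5 states, 2 actions.
R, rho = np.zeros((5, 2)), 0.0
R, rho = r_learning_update(R, rho, s=0, a=1, reward=1.0, s_next=3)
```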


International Joint Conference on Artificial Intelligence | 2011

Heterogeneous domain adaptation using manifold alignment

Chang Wang; Sridhar Mahadevan

We propose a manifold alignment based approach for heterogeneous domain adaptation. A key aspect of this approach is to construct mappings to link different feature spaces in order to transfer knowledge across domains. The new approach can reuse labeled data from multiple source domains in a target domain even in the case when the input domains do not share any common features or instances. As a pre-processing step, our approach can also be combined with existing domain adaptation approaches to learn a common feature space for all input domains. This paper extends existing manifold alignment approaches by making use of labels rather than correspondences to align the manifolds. This extension significantly broadens the application scope of manifold alignment, since the correspondence relationship required by existing alignment approaches is hard to obtain in many applications.
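
One way to read the label-based alignment idea is as a joint graph construction: within-domain edges come from feature similarity, cross-domain edges from shared labels, and eigenvectors of the joint graph Laplacian give a common embedding. The sketch below follows that reading with an RBF kernel, equal-weight cross edges, and random toy data; all of these are illustrative choices, not the authors' exact formulation.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

def align_by_labels(X, y_x, Z, y_z, k_dim=2, sigma=1.0, mu=1.0):
    """Toy label-based manifold alignment for two heterogeneous domains."""
    def rbf(A):
        d = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=-1)
        return np.exp(-d / (2 * sigma ** 2))

    Wx, Wz = rbf(X), rbf(Z)                      # within-domain similarity graphs
    Wc = mu * (y_x[:, None] == y_z[None, :]).astype(float)  # shared-label edges
    W = np.block([[Wx, Wc], [Wc.T, Wz]])
    L = laplacian(W, normed=True)
    # Smallest nontrivial eigenvectors give a shared low-dimensional embedding.
    _, vecs = eigh(L)
    embedding = vecs[:, 1:1 + k_dim]
    return embedding[:len(X)], embedding[len(X):]

# Hypothetical usage: 3-D and 5-D domains that share only a label set.
rng = np.random.default_rng(0)
X, Z = rng.normal(size=(20, 3)), rng.normal(size=(30, 5))
y_x, y_z = rng.integers(0, 2, 20), rng.integers(0, 2, 30)
emb_x, emb_z = align_by_labels(X, y_x, Z, y_z)
```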


International Conference on Machine Learning | 2005

Proto-value functions: developmental reinforcement learning

Sridhar Mahadevan

This paper presents a novel framework called proto-reinforcement learning (PRL), based on a mathematical model of a proto-value function: these are task-independent basis functions that form the building blocks of all value functions on a given state space manifold. Proto-value functions are learned not from rewards, but instead from analyzing the topology of the state space. Formally, proto-value functions are Fourier eigenfunctions of the Laplace-Beltrami diffusion operator on the state space manifold. Proto-value functions facilitate structural decomposition of large state spaces, and form geodesically smooth orthonormal basis functions for approximating any value function. The theoretical basis for proto-value functions combines insights from spectral graph theory, harmonic analysis, and Riemannian manifolds. Proto-value functions enable a novel generation of algorithms called representation policy iteration, unifying the learning of representation and behavior.
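
Concretely, the basis construction needs only the state adjacency graph: form a graph Laplacian and keep its smoothest eigenvectors. A small sketch on a hypothetical 10x10 grid world; the grid size and the number of bases are arbitrary illustration choices.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

def proto_value_functions(adjacency, num_bases=8):
    """Task-independent basis functions derived from state-space topology.

    Following the proto-value-function idea, the bases are the smoothest
    eigenvectors of the normalized graph Laplacian of the state adjacency
    graph; rewards are never consulted.
    """
    L = laplacian(adjacency.astype(float), normed=True)
    _, vecs = eigh(L)
    return vecs[:, :num_bases]          # columns = basis functions over states

# Example: adjacency graph of a 10x10 grid world (4-neighbour connectivity).
n = 10
A = np.zeros((n * n, n * n))
for r in range(n):
    for c in range(n):
        i = r * n + c
        if r + 1 < n:
            A[i, i + n] = A[i + n, i] = 1
        if c + 1 < n:
            A[i, i + 1] = A[i + 1, i] = 1
basis = proto_value_functions(A, num_bases=8)   # shape (100, 8)
```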


Advances in Psychology | 2001

14 – Gaze Control for Face Learning and Recognition by Humans and Machines

John M. Henderson; Richard J. Falk; Silviu Minut; Fred C. Dyer; Sridhar Mahadevan

In this chapter we describe an ongoing project designed to investigate gaze control in face perception, a problem of central importance in both human and machine vision. The project uses converging evidence from behavioral studies of human observers and computational studies in machine vision. The research is guided by a formal framework for understanding gaze control based on Markov decision processes (MDPs). Behavioral data from human observers provide new insight into gaze control in a complex task, and are used to motivate an artificial gaze control system using the Markov framework. Furthermore, the efficacy of a foveal Markov-based approach to gaze control for face recognition in machine vision is tested. The general goal of the project is to uncover key principles of gaze control that cut across the specific implementation of the system (biological or machine).


Archive | 1993

Rapid Task Learning for Real Robots

Jonathan H. Connell; Sridhar Mahadevan

For learning to be useful on real robots, whatever algorithm is used must converge in some “reasonable” amount of time. If each trial step takes on the order of seconds, a million steps would take several months of continuous run time. In many cases such extended runs are neither desirable nor practical. In this chapter we discuss how learning can be speeded up by exploiting properties of the task, sensor configuration, environment, and existing control structure.


Adaptive Agents and Multi-Agent Systems | 2001

A reinforcement learning model of selective visual attention

Silviu Minut; Sridhar Mahadevan

This paper proposes a model of selective attention for visual search tasks, based on a framework for sequential decision-making. The model is implemented using a fixed pan-tilt-zoom camera in a visually cluttered lab environment, which samples the environment at discrete time steps. The agent has to decide where to fixate next based purely on visual information, in order to reach the region where a target object is most likely to be found. The model consists of two interacting modules. A reinforcement learning module learns a policy on a set of regions in the room for reaching the target object, using as objective function the expected value of the sum of discounted rewards. By selecting an appropriate gaze direction at each step, this module provides top-down control in the selection of the next fixation point. The second module performs “within fixation” processing, based exclusively on visual information. Its purpose is twofold: to provide the agent with a set of locations of interest in the current image, and to perform the detection and identification of the target object. Detailed experimental results show that the number of saccades to a target object significantly decreases with the number of training epochs. The results also show the learned policy to find the target object is invariant to small physical displacements as well as object inversion.
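
The top-down module can be pictured as ordinary tabular Q-learning over a discrete set of gaze regions. The sketch below shows an epsilon-greedy fixation choice and the discounted backup, with made-up exploration and learning rates rather than the parameters used in the experiments:

```python
import numpy as np

def choose_fixation(Q, region, epsilon=0.1, rng=np.random.default_rng()):
    """Epsilon-greedy choice of the next gaze region (the top-down step)."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[region]))

def gaze_q_update(Q, r, r_next, reward, alpha=0.1, gamma=0.9):
    """Discounted Q-learning backup applied after the within-fixation module
    reports whether the target object was detected (the reward signal)."""
    Q[r, r_next] += alpha * (reward + gamma * np.max(Q[r_next]) - Q[r, r_next])
    return Q

# Hypothetical usage: 6 gaze regions, one saccade that found the target.
Q = np.zeros((6, 6))
nxt = choose_fixation(Q, region=0)
Q = gaze_q_update(Q, r=0, r_next=nxt, reward=1.0)
```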


International Conference on Robotics and Automation | 2002

Approximate planning with hierarchical partially observable Markov decision process models for robot navigation

Georgios Theocharous; Sridhar Mahadevan

We propose and investigate a planning framework based on the hierarchical partially observable Markov decision process model (HPOMDP), and apply it to robot navigation. We show how this framework can be used to produce more robust plans as compared to flat models such as partially observable Markov decision processes (POMDPs). In our approach the environment is modeled at different levels of resolution, where abstract states represent both spatial and temporal abstraction. We test our hierarchical POMDP approach using a large simulated and real navigation environment. The results show that the robot is more successful in navigating to goals starting with no positional knowledge (uniform initial belief state distribution) using the hierarchical POMDP framework as compared to the flat POMDP approach.
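
The flat-POMDP building block underneath both approaches is the Bayesian belief update: starting from a uniform belief (no positional knowledge), each action/observation pair reweights the distribution over states. A minimal sketch with hypothetical array shapes and random toy models:

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Bayesian belief-state update used in POMDP planning.

    T[a, s, s'] are transition probabilities and O[a, s', o] observation
    probabilities; the array layout is an illustrative convention, not the
    paper's. Repeating this update is how a POMDP-controlled robot localizes
    itself while navigating.
    """
    predicted = T[action].T @ belief            # sum_s T(s, a, s') b(s)
    updated = O[action, :, observation] * predicted
    return updated / updated.sum()              # renormalize

# Hypothetical usage: 4 states, 2 actions, 3 observations, uniform prior.
n_states, n_actions, n_obs = 4, 2, 3
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
O = rng.dirichlet(np.ones(n_obs), size=(n_actions, n_states))
b = np.full(n_states, 1.0 / n_states)
b = belief_update(b, T, O, action=1, observation=2)
```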


Foundations and Trends® in Machine Learning | 2009

Learning Representation and Control in Markov Decision Processes: New Frontiers

Sridhar Mahadevan

Learning Representation and Control in Markov Decision Processes describes methods for automatically compressing Markov decision processes (MDPs) by learning a low-dimensional linear approximation defined by an orthogonal set of basis functions. A unique feature of the text is the use of Laplacian operators, whose matrix representations have non-positive off-diagonal elements and zero row sums. The generalized inverses of Laplacian operators, in particular the Drazin inverse, are shown to be useful in the exact and approximate solution of MDPs. The author goes on to describe a broad framework for solving MDPs, generically referred to as representation policy iteration (RPI), where both the basis function representations for approximation of value functions as well as the optimal policy within their linear span are simultaneously learned. Basis functions are constructed by diagonalizing a Laplacian operator or by dilating the reward function or an initial set of bases by powers of the operator. The idea of decomposing an operator by finding its invariant subspaces is shown to be an important principle in constructing low-dimensional representations of MDPs. Theoretical properties of these approaches are discussed, and they are also compared experimentally on a variety of discrete and continuous MDPs. Finally, challenges for further research are briefly outlined. Learning Representation and Control in Markov Decision Processes is a timely exposition of a topic with broad interest within machine learning and beyond.
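
Of the two basis-construction routes mentioned (diagonalizing a Laplacian and dilating the reward by powers of the operator), the second is easy to sketch: repeatedly apply the transition operator to the reward vector and orthonormalize the resulting Krylov vectors. The code below is an illustrative simplification, not the book's exact procedure.

```python
import numpy as np

def krylov_bases(P, r, num_bases=5):
    """Reward-dilation (Krylov) bases spanning {r, Pr, P^2 r, ...}.

    P is a policy's transition matrix and r its reward vector; QR
    orthonormalization keeps the basis well conditioned so a value function
    can be projected onto it stably.
    """
    vectors = [r]
    for _ in range(num_bases - 1):
        vectors.append(P @ vectors[-1])
    Q, _ = np.linalg.qr(np.stack(vectors, axis=1))
    return Q

# Hypothetical usage: a random 6-state chain and reward vector.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(6), size=6)
r = rng.normal(size=6)
Phi = krylov_bases(P, r, num_bases=4)   # shape (6, 4)
```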


Autonomous Robots | 1998

Rapid Concept Learning for Mobile Robots

Sridhar Mahadevan; Georgios Theocharous; Nikfar Khaleeli

Concept learning in robotics is an extremely challenging problem: sensory data is often high dimensional, and noisy due to specularities and other irregularities. In this paper, we investigate two general strategies to speed up learning, based on spatial decomposition of the sensory representation and simultaneous learning of multiple classes using a shared structure. We study two concept learning scenarios: a hallway navigation problem, where the robot has to induce features such as “opening” or “wall”, and a recycling task, where the robot has to learn to recognize objects such as a “trash can”. We use a common underlying function approximator in both studies in the form of a feedforward neural network, with several hundred input units and multiple output units. Despite the many degrees of freedom afforded by such an approximator, we show the two strategies provide sufficient bias to achieve rapid learning. We provide detailed experimental studies on an actual mobile robot called PAVLOV to illustrate the effectiveness of this approach.
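
The shared-structure strategy can be illustrated with any multi-output feedforward network: one hidden layer serves all concepts, so examples of one class also shape the features used by the others. A toy sketch with scikit-learn and random stand-in data; input size, layer width, and the class labels are placeholders, not the robot's actual sensor encoding.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# One hidden layer shared by all concepts, several output units (one per
# class such as "opening", "wall", "trash can"), trained jointly.
n_inputs, n_classes, n_samples = 200, 3, 500
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, n_inputs))             # stand-in sensor vectors
Y = rng.integers(0, 2, size=(n_samples, n_classes))    # stand-in multi-label targets

shared_net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=300)
shared_net.fit(X, Y)   # all classes learned through the same hidden layer
```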

Collaboration


Dive into Sridhar Mahadevan's collaborations.

Top Co-Authors

Chang Wang, University of Massachusetts Amherst
Georgios Theocharous, Massachusetts Institute of Technology
Thomas Boucher, University of Massachusetts Amherst
CJ Carey, University of Massachusetts Amherst
Ian Gemp, University of Massachusetts Amherst
Jeffrey Johns, University of Massachusetts Amherst
Khashayar Rohanimanesh, University of Massachusetts Amherst
Bo Liu, University of Massachusetts Amherst
Kimberly Ferguson, University of Massachusetts Amherst