Publication


Featured research published by Prasad Tadepalli.


International Conference on Machine Learning | 2007

Multi-task reinforcement learning: a hierarchical Bayesian approach

Aaron Wilson; Alan Fern; Soumya Ray; Prasad Tadepalli

We consider the problem of multi-task reinforcement learning, where the agent needs to solve a sequence of Markov Decision Processes (MDPs) chosen randomly from a fixed but unknown distribution. We model the distribution over MDPs using a hierarchical Bayesian infinite mixture model. For each novel MDP, we use the previously learned distribution as an informed prior for model-based Bayesian reinforcement learning. The hierarchical Bayesian framework provides a strong prior that allows us to rapidly infer the characteristics of new environments based on previous environments, while the use of a nonparametric model allows us to quickly adapt to environments we have not encountered before. In addition, the use of infinite mixtures allows the model to automatically learn the number of underlying MDP components. We evaluate our approach and show that it leads to significant speedups in convergence to an optimal policy after observing only a small number of tasks.
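
The informed-prior idea can be pictured with a small sketch. This is not the paper's algorithm: it assumes a simplified Chinese-Restaurant-Process assignment of a new task to a previously seen mixture component (or a fresh one) using Dirichlet-multinomial transition counts; the names TaskClass, ALPHA, and the data layout are all illustrative.

```python
# Hypothetical sketch of reusing earlier tasks as an informed prior:
# assign a new MDP to a mixture component with a CRP-style MAP rule and
# hand that component's transition pseudo-counts to model-based Bayesian RL.
import math
from collections import defaultdict

ALPHA = 1.0  # CRP concentration parameter (assumed value)

class TaskClass:
    def __init__(self, n_states):
        # Dirichlet pseudo-counts over next states for each (state, action)
        self.counts = defaultdict(lambda: [1.0] * n_states)
        self.n_tasks = 0

def log_likelihood(transitions, cls):
    """Approximate log P(observed transitions | component) from pseudo-counts."""
    ll = 0.0
    for s, a, s_next in transitions:
        c = cls.counts[(s, a)]
        ll += math.log(c[s_next] / sum(c))
    return ll

def informed_prior(transitions, components, n_states):
    """Return the MAP component (possibly a fresh one) for the new task;
    its counts seed the prior for model-based Bayesian RL on that task."""
    total = sum(c.n_tasks for c in components)
    fresh = TaskClass(n_states)
    best, best_score = fresh, (math.log(ALPHA / (total + ALPHA))
                               + log_likelihood(transitions, fresh))
    for cls in components:
        score = (math.log(cls.n_tasks / (total + ALPHA))
                 + log_likelihood(transitions, cls))
        if score > best_score:
            best, best_score = cls, score
    return best
```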


Machine Learning | 2008

Structured machine learning: the next ten years

Thomas G. Dietterich; Pedro M. Domingos; Lise Getoor; Stephen Muggleton; Prasad Tadepalli

The field of inductive logic programming (ILP) has made steady progress, since the first ILP workshop in 1991, based on a balance of developments in theory, implementations and applications. More recently there has been an increased emphasis on Probabilistic ILP and the related fields of Statistical Relational Learning (SRL) and Structured Prediction. The goal of the current paper is to consider these emerging trends and chart out the strategic directions and open problems for the broader area of structured machine learning for the next 10 years.


Artificial Intelligence | 1998

Model-based average reward reinforcement learning

Prasad Tadepalli; DoKyeong Ok

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this “Auto-exploratory H-Learning” performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks.
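
As a rough illustration of the average-reward criterion, here is a minimal sketch of one model-based backup and an incremental gain estimate in the spirit of H-learning. The model format, the learning rate beta, and the omission of the Auto-exploratory variant and the function-approximation extensions are simplifications, not the paper's exact procedure.

```python
# Hypothetical sketch: model-based average-reward backup and gain update.
# model[s][a] = (rbar, {s_next: prob}); h is the value (bias) table; rho is
# the current estimate of the average reward per step.

def backup(s, h, model, rho):
    """h(s) <- max_a [ rbar(s,a) - rho + sum_s' P(s'|s,a) h(s') ]"""
    h[s] = max(rbar - rho + sum(p * h[s2] for s2, p in trans.items())
               for rbar, trans in model[s].values())
    return h[s]

def greedy_action(s, h, model, rho):
    return max(model[s], key=lambda a: model[s][a][0] - rho +
               sum(p * h[s2] for s2, p in model[s][a][1].items()))

def update_rho(rho, r, h, s, s_next, beta=0.05):
    # incremental estimate of the gain, updated on greedy transitions
    return (1 - beta) * rho + beta * (r - h[s] + h[s_next])
```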


Machine Learning | 2008

Transfer in variable-reward hierarchical reinforcement learning

Neville Mehta; Sriraam Natarajan; Prasad Tadepalli; Alan Fern

Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs, and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.
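
Because the reward is linear in a set of features, the value function of a fixed policy can be stored as a vector in feature space and rescalarized for any new weight vector. The sketch below shows only that initialization step; the StoredPolicy class and data layout are illustrative, and the hierarchical decomposition is omitted.

```python
# Hypothetical sketch of initializing a new SMDP's value function from
# previously learned vector-valued value functions.
import numpy as np

class StoredPolicy:
    def __init__(self, vec_values):
        self.vec_values = vec_values  # state -> value vector over reward features

def init_values(w_new, stored_policies, states):
    """Initialize state-by-state with the best scalarized value
    w_new . V_pi(s) over previously learned policies."""
    return {s: max((float(np.dot(w_new, p.vec_values[s])) for p in stored_policies),
                   default=0.0)
            for s in states}
```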


International Conference on Machine Learning | 2008

Automatic discovery and transfer of MAXQ hierarchies

Neville Mehta; Soumya Ray; Prasad Tadepalli; Thomas G. Dietterich

We present an algorithm, HI-MAT (Hierarchy Induction via Models And Trajectories), that discovers MAXQ task hierarchies by applying dynamic Bayesian network models to a successful trajectory from a source reinforcement learning task. HI-MAT discovers subtasks by analyzing the causal and temporal relationships among the actions in the trajectory. Under appropriate assumptions, HI-MAT induces hierarchies that are consistent with the observed trajectory and have compact value-function tables employing safe state abstractions. We demonstrate empirically that HI-MAT constructs compact hierarchies that are comparable to manually-engineered hierarchies and facilitate significant speedup in learning when transferred to a target task.
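
The causal analysis can be pictured as annotating the trajectory with arcs between an action that changes a state variable and the later action that consumes it. The sketch below is only an approximation of that step, assuming hypothetical helpers changed_vars() and relevant_vars() derived from learned DBN action models; it is not the full HI-MAT procedure.

```python
# Hypothetical sketch: collect causal arcs (i, j, v), meaning action i changes
# variable v, action j is the first later action whose DBN model depends on v,
# and no action in between overwrites v.
def causal_arcs(actions, changed_vars, relevant_vars):
    arcs = []
    for i, a_i in enumerate(actions):
        for v in changed_vars(a_i):
            for j in range(i + 1, len(actions)):
                if v in relevant_vars(actions[j]):
                    arcs.append((i, j, v))
                    break          # arc goes to the first consumer of v
                if v in changed_vars(actions[j]):
                    break          # v is overwritten before it is used
    return arcs
```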


International Conference on Machine Learning | 2005

Dynamic preferences in multi-criteria reinforcement learning

Sriraam Natarajan; Prasad Tadepalli

The current framework of reinforcement learning is based on maximizing the expected return computed from scalar rewards. But in many real-world situations, tradeoffs must be made among multiple objectives. Moreover, the agent's preferences between different objectives may vary with time. In this paper, we consider the problem of learning in the presence of time-varying preferences among multiple objectives, using numeric weights to represent their importance. We propose a method that allows us to store a finite number of policies, choose an appropriate policy for any weight vector and improve upon it. The idea is that although there are infinitely many weight vectors, they may be well-covered by a small number of optimal policies. We show this empirically in two domains: a version of the Buridan's ass problem and network routing.
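
A tiny sketch of the policy-library idea: each stored policy carries a per-objective value vector for the start state, the current weight vector selects the best stored policy, and an improved policy is kept only if it beats everything already stored. The threshold, data layout, and function names are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of covering the weight space with a small policy library.
import numpy as np

def select_policy(w, library):
    """library: list of (policy, v0) with v0 the per-objective value vector
    of the start state under that policy."""
    return max(library, key=lambda entry: float(np.dot(w, entry[1])))[0]

def maybe_add(library, policy, v0, w, eps=1e-6):
    """Keep the improved policy only if it is better than every stored policy
    for the current weight vector."""
    best = max((float(np.dot(w, v)) for _, v in library), default=float("-inf"))
    if float(np.dot(w, v0)) > best + eps:
        library.append((policy, v0))
```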


International Joint Conference on Artificial Intelligence | 2011

Imitation learning in relational domains: a functional-gradient boosting approach

Sriraam Natarajan; Saket Joshi; Prasad Tadepalli; Kristian Kersting; Jude W. Shavlik

Imitation learning refers to the problem of learning how to behave by observing a teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have proved beneficial: they are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with a relational representation allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains.
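
For intuition, here is a minimal propositional sketch of functional-gradient boosting for imitation learning: the policy is a softmax over an additive score built from regression trees, each fit to functional gradients (indicator of the teacher's action minus the current probability). The paper uses relational regression trees over relational traces; this sketch uses scikit-learn's DecisionTreeRegressor on fixed-length feature vectors purely for illustration, and the stage count and tree depth are assumed.

```python
# Hypothetical propositional sketch of gradient-boosted imitation learning.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def psi(trees, x):                       # x: feature vector for (state, action)
    return sum(t.predict([x])[0] for t in trees)

def action_probs(trees, state_action_feats):
    scores = np.array([psi(trees, x) for x in state_action_feats])
    e = np.exp(scores - scores.max())    # softmax policy over candidate actions
    return e / e.sum()

def boost(demos, n_stages=20, max_depth=3):
    """demos: list of (state_action_feats, teacher_action_index) pairs."""
    trees = []
    for _ in range(n_stages):
        X, grad = [], []
        for feats, a_star in demos:
            p = action_probs(trees, feats)
            for i, x in enumerate(feats):
                X.append(x)
                grad.append((1.0 if i == a_star else 0.0) - p[i])  # functional gradient
        trees.append(DecisionTreeRegressor(max_depth=max_depth).fit(X, grad))
    return trees
```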


International Conference on Machine Learning | 1998

Learning First-Order Acyclic Horn Programs from Entailment

Chandra Reddy; Prasad Tadepalli

In this paper, we consider learning first-order Horn programs from entailment. In particular, we show that any subclass of first-order acyclic Horn programs with constant arity is exactly learnable from equivalence and entailment membership queries provided it allows a polynomial-time subsumption procedure and satisfies some closure conditions. One consequence of this is that first-order acyclic determinate Horn programs with constant arity are exactly learnable from equivalence and entailment membership queries.


Journal of Artificial Intelligence Research | 2014

HC-search: a learning framework for search-based structured prediction

Janardhan Rao Doppa; Alan Fern; Prasad Tadepalli

Structured prediction is the problem of learning a function that maps structured inputs to structured outputs. Prototypical examples of structured prediction include part-of-speech tagging and semantic segmentation of images. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs and then employs a separate learned cost function C to select a final prediction among those outputs. The overall loss of this prediction architecture decomposes into the loss due to H not leading to high quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall loss in a greedy stagewise manner by first training H to quickly uncover high quality outputs via imitation learning, and then training C to correctly rank the outputs generated via H according to their true losses. Importantly, this training procedure is sensitive to the particular loss function of interest and the time-bound allowed for predictions. Experiments on several benchmark domains show that our approach significantly outperforms several state-of-the-art methods.
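
The decision procedure at prediction time is easy to sketch: a time-bounded greedy search guided by H generates candidate outputs, and C picks the final prediction among them. In the sketch below, successors(), H(), C(), the initial output y0, and the budget are all assumed inputs rather than details from the paper.

```python
# Hypothetical sketch of the HC-Search prediction-time procedure.
def hc_search(x, y0, successors, H, C, budget=100):
    y, candidates = y0, [y0]
    for _ in range(budget):
        nexts = successors(x, y)
        if not nexts:
            break
        y = min(nexts, key=lambda yn: H(x, yn))      # heuristic-guided greedy step
        candidates.append(y)
    return min(candidates, key=lambda yc: C(x, yc))  # cost function selects the output
```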


Inductive Logic Programming | 2008

Logical Hierarchical Hidden Markov Models for Modeling User Activities

Sriraam Natarajan; Hung Hai Bui; Prasad Tadepalli; Kristian Kersting; Weng-Keen Wong

Hidden Markov Models (HMMs) have been successfully used in applications such as speech recognition, activity recognition, bioinformatics, etc. There have been previous attempts, such as Hierarchical HMMs and Abstract HMMs, to elegantly extend HMMs at multiple levels of temporal abstraction (for example, to represent the user's activities). Similarly, there has been previous work, such as Logical HMMs, on extending HMMs to domains with relational structure. In this work we develop a representation that naturally combines the power of both relational and hierarchical models in the form of Logical Hierarchical Hidden Markov Models (LoHiHMMs). LoHiHMMs inherit the compactness of representation from Logical HMMs and the tractability of inference from Hierarchical HMMs. We outline two inference algorithms: one based on grounding the LoHiHMM to a propositional HMM and the other based on particle filtering adapted for this setting. We present the results of our experiments with the model in two simulated domains.
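
Once a logical model is grounded into a finite propositional state set, filtering reduces to the standard HMM forward recursion, which is what the grounding-based inference route relies on. The sketch below shows only that standard recursion; the parameter layout (init, trans, emit) is assumed to come from a separate grounding step that is not shown.

```python
# Hypothetical sketch: forward filtering on the grounded propositional HMM.
import numpy as np

def forward_filter(obs, n_states, init, trans, emit):
    """obs: observation indices; init[i], trans[i][j], emit[i][o] are the
    grounded HMM parameters. Returns P(state | observations so far)."""
    alpha = np.array([init[i] * emit[i][obs[0]] for i in range(n_states)])
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = np.array([emit[j][o] * sum(alpha[i] * trans[i][j]
                                           for i in range(n_states))
                          for j in range(n_states)])
        alpha /= alpha.sum()
    return alpha
```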

Collaboration


Dive into Prasad Tadepalli's collaborations.

Top Co-Authors

Alan Fern (Oregon State University)
Sriraam Natarajan (Indiana University Bloomington)
Aaron Wilson (Oregon State University)
Chao Ma (Oregon State University)