Alborz Geramifard
Massachusetts Institute of Technology
Publications
Featured research published by Alborz Geramifard.
The International Journal of Robotics Research | 2010
Ruijie He; Abraham Bachrach; Michael Achtelik; Alborz Geramifard; Daniel Gurdan; Sam Prentice; Jan Stumpf; Nicholas Roy
The MAV ’08 competition focused on the problem of using air and ground vehicles to locate and rescue hostages being held in a remote building. To execute this mission, a number of technical challenges were addressed, including designing the micro air vehicle (MAV), using the MAV to geo-locate ground targets, and planning the motion of ground vehicles to reach the hostage location without detection. In this paper, we describe the complete system designed for the MAV ’08 competition, and present our solutions to three technical challenges that were addressed within this system. First, we summarize the design of our MAV, focusing on the navigation and sensing payload. Second, we describe the vision and state estimation algorithms used to track ground features, including stationary obstacles and moving adversaries, from a sequence of images collected by the MAV. Third, we describe the planning algorithm used to generate motion plans for the ground vehicles to approach the hostage building undetected by adversaries; these adversaries are tracked by the MAV from the air. We examine different variants of a search algorithm and describe their performance under different conditions. Finally, we provide results of our system’s performance during the mission execution.
Foundations and Trends® in Machine Learning | 2013
Alborz Geramifard; Thomas J. Walsh; Stefanie Tellex; Girish Chowdhary; Nicholas Roy; Jonathan P. How
A Markov Decision Process (MDP) is a natural framework for formulating sequential decision-making problems under uncertainty. In recent years, researchers have greatly advanced algorithms for learning and acting in MDPs. This article reviews such algorithms, beginning with well-known dynamic programming methods for solving MDPs such as policy iteration and value iteration, then describes approximate dynamic programming methods such as trajectory-based value iteration, and finally moves to reinforcement learning methods such as Q-Learning, SARSA, and least-squares policy iteration. We describe algorithms in a unified framework, giving pseudocode together with memory and iteration complexity analysis for each. Empirical evaluations of these techniques with four representations across four domains provide insight into how these algorithms perform with various feature sets in terms of running time and performance.
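The value-iteration method the survey begins with can be illustrated with a minimal sketch on a toy tabular MDP. The two-state chain, its rewards, and the discount factor below are illustrative stand-ins, not an example from the article; the backup is the standard Bellman optimality update.

```python
# Minimal tabular value iteration on a toy 2-state, 2-action MDP.
# P[s, a, s'] is the transition probability, R[s, a] the expected reward;
# the numbers are illustrative, chosen so state 1 is the "good" state.
import numpy as np

gamma = 0.9
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)      # shape (n_states, n_actions)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop when the backup has converged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)        # greedy policy w.r.t. the converged values
```

Policy iteration differs only in alternating a full policy-evaluation solve with a greedy improvement step, rather than folding the `max` into every backup.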
Conference on Decision and Control | 2013
Christopher Amato; Girish Chowdhary; Alborz Geramifard; N. Kemal Ure; Mykel J. Kochenderfer
The focus of this paper is on solving multi-robot planning problems in continuous spaces with partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems, but representing and solving Dec-POMDPs is often intractable for large problems. To allow for a high-level representation that is natural for multi-robot problems and scalable to large discrete and continuous problems, this paper extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP). The Dec-POSMDP formulation allows asynchronous decision-making by the robots, which is crucial in multi-robot domains. We also present an algorithm for solving this Dec-POSMDP which is much more scalable than previous methods since it can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed method's performance is evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent multi-robot problems and provide high-quality solutions for large-scale problems.
Advances in Computing and Communications | 2010
Joshua Redding; Alborz Geramifard; Aditya Undurti; Han-Lim Choi; Jonathan P. How
This paper presents an extension of existing cooperative control algorithms that have been developed for multi-UAV applications to utilize real-time observations and/or performance metric(s) in conjunction with learning methods to generate a more intelligent planner response. We approach this issue from a cooperative control perspective and embed elements of feedback control and active learning, resulting in a new intelligent Cooperative Control Architecture (iCCA). We describe this architecture, discuss some of the issues that must be addressed, and present illustrative examples of cooperative control problems where iCCA can be applied effectively.
European Conference on Machine Learning | 2012
N. Kemal Ure; Alborz Geramifard; Girish Chowdhary; Jonathan P. How
Solving large scale sequential decision making problems without prior knowledge of the state transition model is a key problem in the planning literature. One approach to tackle this problem is to learn the state transition model online using limited observed measurements. We present an adaptive function approximator (incremental Feature Dependency Discovery (iFDD)) that grows the set of features online to approximately represent the transition model. The approach leverages existing feature-dependencies to build a sparse representation of the state transition model. Theoretical analysis and numerical simulations in domains with state space sizes varying from thousands to millions are used to illustrate the benefit of using iFDD for incrementally building transition models in a planning framework.
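The core mechanism of incremental Feature Dependency Discovery can be sketched as follows: error accumulates on conjunctions of co-active features, and a conjunction is promoted to a full feature once its accumulated error crosses a threshold. The class below is a simplified illustration under assumed bookkeeping (pairwise candidates only, a fixed threshold); it is not the authors' implementation.

```python
# A simplified iFDD-style feature-expansion sketch: when the cumulative
# absolute TD error attributed to a pair of co-active base features exceeds
# a threshold, their conjunction is added as a new feature. The threshold
# and pairwise-only candidates are illustrative simplifications.
from itertools import combinations

class IFDDSketch:
    def __init__(self, n_base_features, threshold=1.0):
        # Features are sets of base-feature indices; start with singletons.
        self.features = {frozenset([i]) for i in range(n_base_features)}
        self.relevance = {}          # candidate conjunction -> accumulated |TD error|
        self.threshold = threshold

    def active(self, base_active):
        """Return the known features that are subsets of the active base set."""
        s = set(base_active)
        return [f for f in self.features if f <= s]

    def discover(self, base_active, td_error):
        """Accumulate error on candidate pairs; promote those past the threshold."""
        added = []
        for pair in combinations(sorted(base_active), 2):
            cand = frozenset(pair)
            if cand in self.features:
                continue
            self.relevance[cand] = self.relevance.get(cand, 0.0) + abs(td_error)
            if self.relevance[cand] >= self.threshold:
                self.features.add(cand)
                added.append(cand)
        return added
```

Because new features are only introduced where the current representation demonstrably errs, the representation stays sparse while still capturing the feature dependencies the transition model needs.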
International Conference on Robotics and Automation | 2013
Joshua Mason Joseph; Alborz Geramifard; John W. Roberts; Jonathan P. How; Nicholas Roy
Real-world robots commonly have to act in complex, poorly understood environments where the true world dynamics are unknown. To compensate for the unknown world dynamics, we often provide a class of models to a learner so it may select a model, typically using a minimum prediction error metric over a set of training data. Often in real-world domains the model class is unable to capture the true dynamics, due to either limited domain knowledge or a desire to use a small model. In these cases we call the model class misspecified, and an unfortunate consequence of misspecification is that even with unlimited data and computation there is no guarantee the model with minimum prediction error leads to the best performing policy. In this work, our approach improves upon the standard maximum likelihood model selection metric by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model. We present an algorithm for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation. Empirically, we demonstrate that our algorithm is often superior to the maximum likelihood learner in a batch learning setting for two common RL benchmark problems and a third real-world system, the hydrodynamic cart-pole, a domain whose complex dynamics cannot be known exactly.
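The contrast the paper draws between the two selection criteria can be shown schematically. In the sketch below, the candidate models, the likelihood function, and the planning/evaluation routines are hypothetical placeholders; only the selection criteria themselves reflect the abstract.

```python
# Schematic contrast between maximum-likelihood model selection and
# reward-based model selection over a (possibly misspecified) model class.
# `log_likelihood`, `plan`, and `evaluate_return` are assumed callables.

def select_max_likelihood(models, data, log_likelihood):
    """Pick the model that best predicts the training data."""
    return max(models, key=lambda m: log_likelihood(m, data))

def select_max_reward(models, plan, evaluate_return):
    """Pick the model whose induced policy earns the most in the real system."""
    return max(models, key=lambda m: evaluate_return(plan(m)))
```

Under misspecification these two criteria can disagree: the model that best fits the data may induce a poorly performing policy, which is exactly the gap the reward-based criterion closes.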
Advances in Computing and Communications | 2012
Alborz Geramifard; Joshua Redding; Joshua Mason Joseph; Nicholas Roy; Jonathan P. How
Risk and reward are fundamental concepts in the cooperative control of unmanned systems. In this research, we focus on developing a constructive relationship between cooperative planning and learning algorithms to mitigate the learning risk, while boosting system (planner & learner) asymptotic performance and guaranteeing the safety of agent behavior. Our framework is an instance of the intelligent cooperative control architecture (iCCA) where the learner incrementally improves on the output of a baseline planner through interaction and constrained exploration. We extend previous work by extracting the embedded parameterized transition model from within the cooperative planner and making it adaptable and accessible to all iCCA modules. We empirically demonstrate the advantage of using an adaptive model over a static model and pure learning approaches in an example GridWorld problem and a UAV mission planning scenario with 200 million possibilities. Finally we discuss two extensions to our approach to handle cases where the true model cannot be captured exactly through the presumed functional form.
International Journal of Advanced Robotic Systems | 2005
Alborz Geramifard; Peyman Nayeri; Reza Zamani-Nasab; Jafar Habibi
This paper presents a new architecture called FAIS for implementing intelligent agents cooperating in a particular multi-agent environment, namely the RoboCup Rescue Simulation System. This layered architecture is customized for solving the fire-extinguishing problem. Structured decision-making algorithms are combined with heuristic ones in this model, making it a hybrid architecture.
AIAA Guidance, Navigation, and Control Conference | 2011
Joshua Redding; Tuna Toksoz; N. Kemal Ure; Alborz Geramifard; Jonathan P. How; Matthew A. Vavrina; John Vian
This paper introduces and demonstrates a full hardware testbed for research in long-duration missions for multiple, autonomous agents. Specifically, we describe an automated battery management platform designed to service multiple quadrotor agents in the MIT RAVEN and Boeing VSTL flight environments. The changing/charging station allows the quadrotor's spent battery to be quickly swapped for a fresh one without requiring it to power down or wait for recharge, a significant benefit in persistent and/or time-critical missions. We focus on a multi-agent persistent search and track scenario and construct both centralized and decentralized MDP-based mission planners. We further show that for the three-agent case, decentralized planners (one for each agent) offer a 99% reduction in computation time and only a relatively small (10%) degradation in overall mission performance when compared to the centralized approach over a long-term simulated mission.
Uncertainty in Artificial Intelligence | 2008
Richard S. Sutton; Csaba Szepesvári; Alborz Geramifard; Michael P. Bowling