Paul Reverdy
University of Pennsylvania
Publications
Featured research published by Paul Reverdy.
arXiv: Learning | 2014
Paul Reverdy; Vaibhav Srivastava; Naomi Ehrich Leonard
In this paper, we present a formal model of human decision making in explore-exploit tasks using the context of multiarmed bandit problems, where the decision maker must choose among multiple options with uncertain rewards. We address the standard multiarmed bandit problem, the multiarmed bandit problem with transition costs, and the multiarmed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision maker uses Bayesian inference to estimate the reward values, and we model the decision maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper-credible-limit (UCL) algorithm for the standard multiarmed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend the deterministic UCL algorithm to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multiarmed bandit problem with transition costs and the multiarmed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sublogarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples.
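To make the UCL heuristic concrete, here is a minimal sketch assuming independent Gaussian arms with known sampling noise sigma_s and a conjugate normal prior per arm; the credible level of the form 1 - 1/(Kt) reflects our reading of the construction, and the function names and numerical defaults are our own illustration.

```python
import numpy as np
from scipy.stats import norm

def posterior(mu0, sigma0, sigma_s, rewards):
    """Conjugate Gaussian update: posterior mean and std of one arm's mean reward."""
    prec = 1.0 / sigma0**2 + len(rewards) / sigma_s**2   # posterior precision
    mean = (mu0 / sigma0**2 + sum(rewards) / sigma_s**2) / prec
    return mean, np.sqrt(1.0 / prec)

def ucl_index(mean, std, t, K=np.sqrt(2 * np.pi * np.e)):
    """Upper credible limit: posterior mean plus a (1 - 1/(K t)) quantile bonus."""
    return mean + std * norm.ppf(1.0 - 1.0 / (K * t))
```

At each round t, the deterministic UCL policy pulls the arm with the largest index; the stochastic variant instead samples an arm from a softmax over the indices.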
conference on decision and control | 2014
Vaibhav Srivastava; Paul Reverdy; Naomi Ehrich Leonard
We study a path planning problem in an environment that changes abruptly due to the arrival of unknown spatial events. The objective of the path planning problem is to collect the data that is most evidential about the events. We formulate this problem as a multiarmed bandit (MAB) problem with Gaussian rewards and change points, and address the fundamental tradeoff between learning the true event (exploration) and collecting the data that is most evidential about the true event (exploitation). We extend the switching-window UCB algorithm for MAB problems with bounded rewards and change points to the context of correlated Gaussian rewards and develop the switching-window UCL (SW-UCL) algorithm. We further develop an adaptive SW-UCL algorithm that uses statistical change detection to adapt the algorithm online, and a block SW-UCL algorithm that reduces the number of transitions among arms and is more amenable to robotic applications.
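A sliding-window variant can be sketched by restricting the Gaussian posterior of each arm to its last w observations, so estimates track change points; the window size, prior values, and credible level below are our assumptions, not the paper's tuned values.

```python
import numpy as np
from scipy.stats import norm

def sw_ucl_index(timed_rewards, t, w, mu0=0.0, sigma0=10.0, sigma_s=1.0,
                 K=np.sqrt(2 * np.pi * np.e)):
    """timed_rewards: list of (round, reward) pairs for one arm."""
    recent = [r for (s, r) in timed_rewards if s > t - w]  # discard stale data
    prec = 1.0 / sigma0**2 + len(recent) / sigma_s**2      # windowed posterior
    mean = (mu0 / sigma0**2 + sum(recent) / sigma_s**2) / prec
    return mean + np.sqrt(1.0 / prec) * norm.ppf(1.0 - 1.0 / (K * min(t, w)))
```

The adaptive variant described above would additionally reset or resize the window when a change detector fires.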
allerton conference on communication, control, and computing | 2013
Vaibhav Srivastava; Paul Reverdy; Naomi Ehrich Leonard
We consider two variants of the standard multi-armed bandit problem, namely, the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs. We develop block allocation algorithms for these problems that achieve an expected cumulative regret uniformly dominated by a logarithmic function of time and an expected cumulative number of transitions from one arm to another uniformly dominated by a double-logarithmic function of time. We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in the literature.
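The switching economy comes from committing to an arm for progressively longer blocks. The sketch below is a generic doubling-block heuristic in that spirit, not the paper's exact frame construction; the UCB1-style index, noise model, and doubling schedule are all our assumptions.

```python
import math, random

def run_blocks(means, horizon, c=2.0):
    """Block-allocation toy: pick an arm by index, commit for a doubling block."""
    n = len(means)
    counts, sums, blocks = [0] * n, [0.0] * n, [0] * n
    t, switches, prev = 0, 0, None
    while t < horizon:
        idx = [float('inf') if counts[i] == 0 else
               sums[i] / counts[i] + math.sqrt(c * math.log(t + 1) / counts[i])
               for i in range(n)]
        arm = idx.index(max(idx))
        if prev is not None and arm != prev:
            switches += 1                                    # a transition between arms
        prev = arm
        blocks[arm] += 1
        for _ in range(min(2 ** blocks[arm], horizon - t)):  # doubling block
            sums[arm] += random.gauss(means[arm], 1.0)
            counts[arm] += 1
            t += 1
    return switches
```

Because block lengths grow geometrically, the number of switches grows far more slowly than the number of pulls, which is the effect the regret and transition bounds formalize.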
IEEE Transactions on Automation Science and Engineering | 2016
Paul Reverdy; Naomi Ehrich Leonard
We contribute to the development of a systematic means to infer features of human decision-making from behavioral data. Motivated by the common use of softmax selection in models of human decision-making, we study the maximum-likelihood (ML) parameter estimation problem for softmax decision-making models with linear objective functions. We present conditions under which the likelihood function is convex. These allow us to provide sufficient conditions for convergence of the resulting ML estimator and to construct its asymptotic distribution. In the case of models with nonlinear objective functions, we show how the estimator can be applied by linearizing about a nominal parameter value. We apply the estimator to fit the stochastic Upper Credible Limit (UCL) model of human decision-making to human subject data. The fits show statistically significant differences in behavior across related, but distinct, tasks.
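As a sketch of the estimation setup: with a linear objective x_i^T theta, the softmax choice probabilities give a negative log-likelihood that is convex in theta, so a generic optimizer recovers the global ML estimate. The data shapes and the synthetic data below are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, features, choices):
    """features: (trials, options, dims) array; choices: chosen option per trial."""
    nll = 0.0
    for x, c in zip(features, choices):
        logits = x @ theta                                # linear objective per option
        nll -= logits[c] - np.log(np.sum(np.exp(logits)))
    return nll

rng = np.random.default_rng(0)                            # hypothetical data
features = rng.normal(size=(200, 3, 2))
choices = rng.integers(0, 3, size=200)
theta_hat = minimize(neg_log_likelihood, np.zeros(2), args=(features, choices)).x
```

Convexity is what makes the asymptotic analysis of the estimator tractable; for nonlinear objectives, the same machinery applies after linearizing about a nominal parameter value, as the abstract notes.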
intelligent robots and systems | 2015
Paul Reverdy; B. Deniz Ilhan; Daniel E. Koditschek
We develop a stochastic framework for modeling and analysis of robot navigation in the presence of obstacles. We show that, with appropriate assumptions, the probability of a robot avoiding a given obstacle can be reduced to a function of a single dimensionless parameter which captures all relevant quantities of the problem. This parameter is analogous to the Péclet number considered in the literature on mass transport in advection-diffusion fluid flows. Using the framework we also compute statistics of the time required to escape an obstacle in an informative case. The results of the computation show that adding noise to the navigation strategy can improve performance. Finally, we present experimental results that illustrate these performance improvements on a robotic platform.
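For intuition on the dimensionless parameter: the classical Péclet number is Pe = L v / D, the ratio of advective to diffusive transport. By analogy (the paper's exact grouping is not given in the abstract), the robot's nominal speed plays the role of advection and its control noise the role of diffusion.

```python
def peclet(length_scale, speed, diffusivity):
    """Dimensionless ratio of directed motion to noise-driven spreading."""
    return length_scale * speed / diffusivity

# e.g., obstacle scale 1.0 m, speed 0.5 m/s, noise intensity 0.1 m^2/s -> Pe = 5
print(peclet(1.0, 0.5, 0.1))
```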
conference on decision and control | 2014
Paul Reverdy; Naomi Ehrich Leonard
We propose a satisficing objective for the multi-armed bandit problem, in which the objective is to achieve performance above a given threshold. We show that this new problem is equivalent to a standard multi-armed bandit problem with a maximizing objective and use this equivalence to find bounds on performance in terms of the satisficing objective. For the special case of Gaussian rewards, we show that the satisficing problem is equivalent to a related standard multi-armed bandit problem, again with Gaussian rewards. We apply the Upper Credible Limit (UCL) algorithm to this standard problem and show how it achieves optimal performance in terms of the satisficing objective.
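One simple way to realize such an equivalence (a sketch of the idea, not necessarily the paper's exact construction) is to feed a standard bandit algorithm the satisfaction indicator 1{r >= M} instead of the raw reward r, turning "perform above threshold M" into an ordinary maximizing problem.

```python
M = 0.5  # hypothetical satisficing threshold

def satisfaction(reward, threshold=M):
    """Bernoulli signal for a standard maximizing bandit algorithm."""
    return 1.0 if reward >= threshold else 0.0

# For a Gaussian arm with mean mu and std sigma, the induced mean satisfaction
# is P(r >= M) = 1 - Phi((M - mu) / sigma); with equal sigmas this ranking
# agrees with ranking arms by mu, consistent with the stated equivalence.
```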
conference on decision and control | 2012
Paul Reverdy; Robert C. Wilson; Philip Holmes; Naomi Ehrich Leonard
Motivated by models of human decision making, we consider a heuristic solution for explore-exploit problems. In a numerical example we show that, with appropriate parameter values, the algorithm performs well. However, the parameters of the algorithm trade off exploration against exploitation in a complicated way so that finding the optimal parameter values is not obvious. We show that the optimal parameter values can be analytically computed in some cases and prove that suboptimal parameter tunings can provide robustness to modeling error. The analytic results suggest a feedback control law for dynamically optimizing parameters.
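A generic instance of such a heuristic (our illustration, not necessarily the paper's exact rule) scores each option by its estimated mean plus an uncertainty bonus b and then samples through a softmax with temperature tau; the pair (b, tau) is exactly the kind of exploration/exploitation tuning the abstract analyzes.

```python
import numpy as np

def choose(means, stds, b=1.0, tau=0.5, rng=np.random.default_rng()):
    """Softmax selection over uncertainty-bonused value estimates."""
    scores = np.asarray(means) + b * np.asarray(stds)  # exploration bonus
    p = np.exp((scores - scores.max()) / tau)          # numerically stable softmax
    return rng.choice(len(p), p=p / p.sum())
```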
intelligent robots and systems | 2016
Paul Reverdy; Daniel E. Koditschek
Spatial point process models are a commonly used statistical tool for studying the distribution of objects of interest in a domain. We study the problem of deploying mobile robots as remote sensors to estimate the parameters of such a model, in particular the intensity parameter λ, which measures the mean density of points in a Poisson point process. This problem requires covering an appropriately large section of the domain while avoiding the objects, which we treat as obstacles. We develop a control law that covers an expanding section of the domain, together with an online criterion for determining when to stop sampling, i.e., when the covered area is large enough to achieve a desired level of estimation accuracy, and we illustrate the resulting system with numerical simulations.
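The estimation side admits a compact sketch: for a Poisson process observed over a covered area A, the count N gives the ML estimate λ = N / A, and since the estimator's variance is λ / A the relative standard error is roughly 1/sqrt(N). The stopping rule below, requiring N >= 1/eps**2, is our illustrative criterion, not necessarily the paper's.

```python
def intensity_estimate(n_points, covered_area):
    """ML estimate of the Poisson intensity lambda over a covered region."""
    return n_points / covered_area

def keep_sampling(n_points, eps=0.1):
    """True while the ~1/sqrt(N) relative error still exceeds the target eps."""
    return n_points < 1.0 / eps**2

# e.g., a 10% relative-error target requires about 100 detected points
```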
14th AIAA Aviation Technology, Integration, and Operations Conference | 2014
Paul Reverdy; Akhil S. Reddy; Luigi Martinelli; Naomi Ehrich Leonard
Multidisciplinary Design Optimization (MDO) is a powerful engineering tool that allows designers to incorporate information from all relevant design disciplines simultaneously. In aerospace applications, for example, MDO has been used to produce designs that incorporate both the structural and aerodynamic disciplines. It is not generally possible to optimize the objectives of all disciplines simultaneously, so producing an optimal design requires a human designer to balance the tradeoffs between the various objectives. We propose and implement a novel system that helps the designer explore the various possible tradeoffs and systematically find their most preferred design. We show that the system converges to the most preferred design in a simulated task and discuss how it could be used in an industrial MDO problem.
IEEE Transactions on Automatic Control | 2017
Paul Reverdy; Vaibhav Srivastava; Naomi Ehrich Leonard
Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.