Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chelsea Finn is active.

Publication


Featured research published by Chelsea Finn.


International Conference on Robotics and Automation | 2016

Deep spatial autoencoders for visuomotor learning

Chelsea Finn; Xin Yu Tan; Yan Duan; Trevor Darrell; Sergey Levine; Pieter Abbeel

Reinforcement learning provides a powerful and flexible framework for automated acquisition of robotic motion skills. However, applying reinforcement learning requires a sufficiently detailed representation of the state, including the configuration of task-relevant objects. We present an approach that automates state-space construction by learning a state representation directly from camera images. Our method uses a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects, and then learns a motion skill with these feature points using an efficient reinforcement learning method based on local linear models. The resulting controller reacts continuously to the learned feature points, allowing the robot to dynamically manipulate objects in the world with closed-loop control. We demonstrate our method with a PR2 robot on tasks that include pushing a free-standing toy block, picking up a bag of rice using a spatula, and hanging a loop of rope on a hook at various positions. In each task, our method automatically learns to track task-relevant objects and manipulate their configuration with the robot's arm.
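
As a rough illustration of the feature-point idea, the sketch below implements a spatial soft-argmax that converts each convolutional activation map into an expected (x, y) image coordinate. It is a minimal stand-in rather than the authors' implementation; the array shapes and the temperature parameter are illustrative assumptions.

import numpy as np

def spatial_soft_argmax(feature_maps, temperature=1.0):
    """Map each (H, W) activation map to an expected (x, y) image position.

    feature_maps: array of shape (C, H, W); each channel yields one 2-D
    feature point, so C channels give a 2*C-dimensional representation.
    """
    c, h, w = feature_maps.shape
    # Softmax over all pixels of each channel.
    flat = feature_maps.reshape(c, -1) / temperature
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(c, h, w)

    # Pixel coordinate grids normalized to [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")

    # Expected coordinates under each channel's softmax distribution.
    expected_x = (probs * xs).sum(axis=(1, 2))
    expected_y = (probs * ys).sum(axis=(1, 2))
    return np.stack([expected_x, expected_y], axis=1)  # shape (C, 2)

# Example: 16 activation maps from a conv encoder -> 16 feature points (32-D).
points = spatial_soft_argmax(np.random.randn(16, 32, 32))
print(points.shape)  # (16, 2)

The resulting 2C coordinates could then be concatenated with the robot's joint state to form the low-dimensional state used by the local linear-model reinforcement learning stage described in the abstract.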


International Conference on Robotics and Automation | 2016

Learning deep neural network policies with continuous memory states

Marvin Zhang; Zoe McCarthy; Chelsea Finn; Sergey Levine; Pieter Abbeel

Policy learning for partially observed control tasks requires policies that can remember salient information from past observations. In this paper, we present a method for learning policies with internal memory for high-dimensional, continuous systems, such as robotic manipulators. Our approach consists of augmenting the state and action space of the system with continuous-valued memory states that the policy can read from and write to. Learning general-purpose policies with this type of memory representation directly is difficult, because the policy must automatically figure out the most salient information to memorize at each time step. We show that, by decomposing this policy search problem into a trajectory optimization phase and a supervised learning phase through a method called guided policy search, we can acquire policies with effective memorization and recall strategies. Intuitively, the trajectory optimization phase chooses the values of the memory states that will make it easier for the policy to produce the right action in future states, while the supervised learning phase encourages the policy to use memorization actions to produce those memory states. We evaluate our method on tasks involving continuous control in manipulation and navigation settings, and show that our method can learn complex policies that successfully complete a range of tasks that require memory.
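
A minimal sketch of the state/action augmentation described above, assuming a toy environment interface whose reset() returns an observation and whose step(control) returns (obs, reward, done); the wrapper class, its names, and the plain overwrite of memory at each step are illustrative assumptions, not the paper's implementation.

import numpy as np

class MemoryAugmentedEnv:
    """Augment a partially observed system with continuous memory states
    that the policy can read from and write to.

    The augmented observation is [obs, memory]; the augmented action is
    [control, memory_write], where memory_write replaces the memory.
    """

    def __init__(self, env, memory_dim=4):
        self.env = env
        self.memory_dim = memory_dim
        self.memory = np.zeros(memory_dim)

    def reset(self):
        self.memory = np.zeros(self.memory_dim)
        obs = self.env.reset()
        return np.concatenate([obs, self.memory])

    def step(self, augmented_action):
        control = augmented_action[:-self.memory_dim]        # real actuation
        memory_write = augmented_action[-self.memory_dim:]   # memory "action"
        obs, reward, done = self.env.step(control)
        self.memory = memory_write                            # policy-chosen memory
        return np.concatenate([obs, self.memory]), reward, done

Under this interface, an otherwise memoryless policy trained on the augmented system can learn what to store and when to recall it, which is the behavior the trajectory-optimization and supervised-learning phases encourage.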


International Conference on Robotics and Automation | 2017

Reset-free guided policy search: Efficient deep reinforcement learning with stochastic initial states

William H. Montgomery; Anurag Ajay; Chelsea Finn; Pieter Abbeel; Sergey Levine

Autonomous learning of robotic skills can allow general-purpose robots to learn wide behavioral repertoires without extensive manual engineering. However, robotic skill learning must typically make trade-offs to enable practical real-world learning, such as requiring manually designed policy or value function representations, initialization from human demonstrations, instrumentation of the training environment, or extremely long training times. We propose a new reinforcement learning algorithm that can train general-purpose neural network policies with minimal human engineering, while still allowing for fast, efficient learning in stochastic environments. We build on the guided policy search (GPS) algorithm, which transforms the reinforcement learning problem into supervised learning from a computational teacher (without human demonstrations). In contrast to prior GPS methods, which require a consistent set of initial states to which the system must be reset after each episode, our approach can handle random initial states, allowing it to be used even when deterministic resets are impossible. We compare our method to existing policy search algorithms in simulation, showing that it can train high-dimensional neural network policies with the same sample efficiency as prior GPS methods, and can learn policies directly from image pixels. We also present real-world robot results that show that our method can learn manipulation policies with visual features and random initial states.
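
The schematic sketch below shows the GPS-style alternation the abstract describes: local controllers act as computational teachers, and a global neural-network policy is trained by supervised learning to imitate them. It is not the authors' implementation; the nearest-anchor assignment of random initial states to conditions is an illustrative stand-in, and fit_local_controller and train_global_policy are hypothetical callables.

import numpy as np

def guided_policy_search_sketch(sample_initial_state, fit_local_controller,
                                train_global_policy, n_iterations=10, n_conditions=4):
    """Schematic GPS-style loop with random initial states.

    Each sampled start is assigned to the nearest "condition", a local
    controller (the computational teacher) is refit for that condition, and
    the global policy is trained to imitate all local controllers.
    """
    # Anchor conditions with a few sampled starts (stands in for clustering).
    anchors = [sample_initial_state() for _ in range(n_conditions)]
    local_controllers = [None] * n_conditions
    global_policy = None

    for _ in range(n_iterations):
        supervision = []  # (state, teacher_action) pairs for supervised learning
        for _ in range(n_conditions):
            x0 = sample_initial_state()  # random start; no deterministic reset needed
            k = int(np.argmin([np.linalg.norm(x0 - a) for a in anchors]))
            local_controllers[k], pairs = fit_local_controller(
                x0, local_controllers[k], global_policy)
            supervision.extend(pairs)
        global_policy = train_global_policy(supervision)
    return global_policy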


International Conference on Robotics and Automation | 2015

Beyond lowest-warping cost action selection in trajectory transfer

Dylan Hadfield-Menell; Alex X. Lee; Chelsea Finn; Eric Tzeng; Sandy H. Huang; Pieter Abbeel

We consider the problem of learning from demonstrations to manipulate deformable objects. Recent work [1], [2], [3] has shown promising results that enable robotic manipulation of deformable objects through learning from demonstrations. Their approach is able to generalize from a single demonstration to new test situations, and suggests a nearest neighbor approach to select a demonstration to adapt to a given test situation. Such a nearest neighbor approach, however, ignores important aspects of the problem: brittleness (versus robustness) of demonstrations when generalized through this process, and the extent to which a demonstration makes progress towards a goal. In this paper, we frame the problem of selecting which demonstration to transfer as an options Markov decision process (MDP). We present max-margin Q-function estimation: an approach to learn a Q-function from expert demonstrations. Our learned policies account for variability in robustness of demonstrations and the sequential nature of our tasks. We developed two knot-tying benchmarks to experimentally validate the effectiveness of our proposed approach. The selection strategy described in [2] achieves success rates of 70% and 54%, respectively. Our approach performs significantly better, with success rates of 88% and 76%, respectively.
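
As a rough illustration of replacing lowest-warping-cost selection with a learned score, the sketch below picks the demonstration with the highest linear Q-value and applies a perceptron-style max-margin weight update toward the expert's choice. The featurization and update rule are simplified assumptions, not the paper's exact formulation.

import numpy as np

def select_demonstration(scene_features, demo_features, w):
    """Pick the demonstration with the highest learned score w . phi(scene, demo),
    rather than the one with the lowest warping cost."""
    scores = [w @ np.concatenate([scene_features, phi]) for phi in demo_features]
    return int(np.argmax(scores))

def max_margin_update(w, scene_features, demo_features, expert_idx, lr=0.1, margin=1.0):
    """One max-margin step: if a non-expert demonstration scores within `margin`
    of the expert's choice, push the weights toward the expert's feature vector
    and away from the offending one."""
    feats = [np.concatenate([scene_features, phi]) for phi in demo_features]
    scores = np.array([w @ f for f in feats])
    # Loss-augmented selection: inflate every non-expert score by the margin.
    rival = int(np.argmax(scores + margin * (np.arange(len(scores)) != expert_idx)))
    if rival != expert_idx:
        w = w + lr * (feats[expert_idx] - feats[rival])
    return w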


Journal of Machine Learning Research | 2016

End-to-end training of deep visuomotor policies

Sergey Levine; Chelsea Finn; Trevor Darrell; Pieter Abbeel


Neural Information Processing Systems | 2016

Unsupervised Learning for Physical Interaction through Video Prediction

Chelsea Finn; Ian J. Goodfellow; Sergey Levine


International Conference on Machine Learning | 2017

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn; Pieter Abbeel; Sergey Levine


International Conference on Machine Learning | 2016

Guided cost learning: deep inverse optimal control via policy optimization

Chelsea Finn; Sergey Levine; Pieter Abbeel


arXiv: Computer Vision and Pattern Recognition | 2015

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

Eric Tzeng; Coline Devin; Judy Hoffman; Chelsea Finn; Pieter Abbeel; Sergey Levine; Kate Saenko; Trevor Darrell


arXiv: Learning | 2017

One-Shot Visual Imitation Learning via Meta-Learning

Chelsea Finn; Tianhe Yu; Tianhao Zhang; Pieter Abbeel; Sergey Levine

Collaboration


Dive into Chelsea Finn's collaborations.

Top Co-Authors

Sergey Levine, University of California
Pieter Abbeel, University of California
Trevor Darrell, University of California
Alex X. Lee, University of California
Frederik Ebert, University of California
Tianhe Yu, University of California
Eric Tzeng, University of California
Allan Jabri, University of California
Coline Devin, University of California