Tom Erez

Washington University in St. Louis

Publication

Featured research published by Tom Erez.

intelligent robots and systems | 2012

MuJoCo: A physics engine for model-based control

Emanuel Todorov; Tom Erez; Yuval Tassa

We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 dofs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
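The finite differencing mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy pendulum in place of the engine's dynamics; `pendulum_dynamics` and `finite_difference_jacobian` are hypothetical names chosen for illustration and are not part of MuJoCo's API. Each perturbed evaluation is independent of the others, which is why this kind of workload lends itself to parallel evaluation.

```python
import math


def pendulum_dynamics(state, control, dt=0.01):
    """Toy stand-in for a physics step: one explicit-Euler step of a pendulum."""
    angle, velocity = state
    accel = -9.81 * math.sin(angle) + control
    return [angle + dt * velocity, velocity + dt * accel]


def finite_difference_jacobian(f, state, control, eps=1e-6):
    """Approximate d f / d state with central differences.

    Every perturbed call to f is independent, so the calls could be
    dispatched to separate workers; here they run sequentially.
    """
    n = len(state)
    columns = []
    for i in range(n):
        plus = list(state)
        minus = list(state)
        plus[i] += eps
        minus[i] -= eps
        f_plus = f(plus, control)
        f_minus = f(minus, control)
        columns.append([(a - b) / (2 * eps) for a, b in zip(f_plus, f_minus)])
    # columns[j][i] holds d f_i / d state_j; transpose into row-major form.
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]


# Jacobian of the toy step around the downward rest position.
jac = finite_difference_jacobian(pendulum_dynamics, [0.0, 0.0], 0.0)
```

The same pattern applies to derivatives with respect to the controls by perturbing `control` instead of `state`.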

intelligent robots and systems | 2012

Synthesis and stabilization of complex behaviors through online trajectory optimization

Yuval Tassa; Tom Erez; Emanuel Todorov

We present an online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers. The resulting behaviors, illustrated in the attached video, are computed only 7× slower than real time, on a standard PC. The video also shows results on the acrobot problem, planar swimming and one-legged hopping. These simpler problems can already be solved in real time, without pre-computing anything.

ieee-ras international conference on humanoid robots | 2013

An integrated system for real-time model predictive control of humanoid robots

Tom Erez; Kendall Lowrey; Yuval Tassa; Vikash Kumar; Svetoslav Kolev; Emanuel Todorov

Generating diverse behaviors with a humanoid robot requires a mix of human supervision and automatic control. Ideally, the user's input is restricted to high-level instruction and guidance, and the controller is intelligent enough to accomplish the tasks autonomously. Here we describe an integrated system that achieves this goal. The automatic controller is based on real-time model-predictive control (MPC) applied to the full dynamics of the robot. This is possible due to the speed of our new physics engine (MuJoCo), the efficiency of our trajectory optimization algorithm, and the contact smoothing methods we have developed for the purpose of control optimization. In our system, the operator specifies subtasks by selecting from a menu of predefined cost functions, and optionally adjusting the mixing weights of the different cost terms at runtime. The resulting composite cost is sent to the MPC machinery which constructs a new locally-optimal time-varying linear feedback control law once every 30 msec, while planning 500 msec into the future. This control law is evaluated at 1 kHz to generate control signals for the robot, until the next control law becomes available. Performance is illustrated on a subset of the tasks from the DARPA Virtual Robotics Challenge.
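The two-rate structure described in this abstract (a new feedback law every 30 msec, evaluated at 1 kHz) can be sketched as follows. This is an illustrative toy, not the authors' system: the scalar integrator, the `plan` function standing in for the trajectory optimizer, and the fixed gain are all assumptions. In particular, planning here is instantaneous, whereas a real planner would compute the next law while the current one is being executed.

```python
REPLAN_PERIOD_MS = 30   # a new control law every 30 msec (from the abstract)
HORIZON_MS = 500        # planning horizon (from the abstract)
DT = 0.001              # 1 kHz evaluation rate (from the abstract)


def plan(x0, gain=5.0):
    """Hypothetical stand-in for the trajectory optimizer.

    Returns reference states, reference controls, and feedback gains for a
    scalar integrator x' = x + DT * u, one entry per millisecond of horizon.
    """
    xs, us, ks = [], [], []
    x = x0
    for _ in range(HORIZON_MS):
        u = -gain * x
        xs.append(x)
        us.append(u)
        ks.append(-gain)
        x = x + DT * u
    return xs, us, ks


def run(x0, duration_ms, disturbance_at_ms=None):
    """Simulate the loop: replan every 30 ticks, apply feedback every tick."""
    x = x0
    replans = 0
    xs = us = ks = None
    for t in range(duration_ms):
        if t % REPLAN_PERIOD_MS == 0:
            xs, us, ks = plan(x)
            replans += 1
        k = t % REPLAN_PERIOD_MS
        # Time-varying linear feedback law: u = u_ref[k] + K[k] * (x - x_ref[k])
        u = us[k] + ks[k] * (x - xs[k])
        x = x + DT * u
        if t == disturbance_at_ms:
            x += 0.5  # external push between replans
    return x, replans


final_state, replans = run(1.0, 3000, disturbance_at_ms=1000)
```

The feedback term is what keeps the state regulated in the interval between a disturbance and the next replan.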

intelligent robots and systems | 2012

Trajectory optimization for domains with contacts using inverse dynamics

Tom Erez; Emanuel Todorov

This paper presents an algorithm for direct trajectory optimization in domains with contact. Since contacts and other unilateral constraints may introduce non-smooth dynamics, many standard algorithms of optimal control and reinforcement learning cannot be directly applied to such domains. We use a smooth contact model that can compute inverse dynamics through the contact, thereby avoiding hybrid representation of the non-smooth contact state. This allows us to formulate an unconstrained, continuous trajectory optimization problem, which can be solved using standard optimization tools. We demonstrate our approach by optimizing a running gait for a 31-dimensional simulated humanoid. The resulting gait is demonstrated in a movie attached as supplementary material. The optimization result exhibits a synchronous motion of the arm and the opposite leg, eliminating undesired angular momentum; this is a key feature of bipedal running, and its emergence attests to the power of the optimization process.

uncertainty in artificial intelligence | 2010

A scalable method for solving high-dimensional continuous POMDPs using local approximation

Tom Erez; William D. Smart

Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state-of-the-art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions.

international conference on robotics and automation | 2015

Simulation tools for model-based robotics: Comparison of Bullet, Havok, MuJoCo, ODE and PhysX

Tom Erez; Yuval Tassa; Emanuel Todorov

There is growing need for software tools that can accurately simulate the complex dynamics of modern robots. While a number of candidates exist, the field is fragmented. It is difficult to select the best tool for a given project, or to predict how much effort will be needed and what the ultimate simulation performance will be. Here we introduce new quantitative measures of simulation performance, focusing on the numerical challenges that are typical for robotics as opposed to multi-body dynamics and gaming. We then present extensive simulation results, obtained within a new software framework for instantiating the same model in multiple engines and running side-by-side comparisons. Overall we find that each engine performs best on the type of system it was designed and optimized for: MuJoCo wins the robotics-related tests, while the gaming engines win the gaming-related tests without a clear leader among them. The simulations are illustrated in the accompanying movie.

robotics: science and systems | 2011

Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts

Tom Erez; Yuval Tassa; Emanuel Todorov

We present a method that combines offline trajectory optimization and online Model Predictive Control (MPC), generating robust controllers for complex periodic behavior in domains with unilateral constraints (e.g., contact with the environment). MPC offers robust and adaptive control even in high-dimensional domains; however, the online optimization gets stuck in local minima when the domain has discontinuous dynamics. Some methods of trajectory optimization are immune to such problems, but these are often too slow to be applied online. In this paper, we use offline optimization to find the limit-cycle solution of an infinite-horizon average-cost optimal-control task. We then compute a local quadratic approximation of the Value function around this limit cycle. Finally, we use this quadratic approximation as the terminal cost of an online MPC. This combination of an offline solution of the infinite-horizon problem with an online MPC controller is known as Infinite Horizon Model Predictive Control (IHMPC), and has previously been applied only to simple stabilization objectives. Here we extend IHMPC to tackle periodic tasks, and demonstrate the power of our approach by synthesizing hopping behavior in a simulated robot. IHMPC involves a limited computational load, and can be executed online on a standard laptop computer. The resulting behavior is extremely robust, allowing the hopper to recover from virtually any perturbation. In real robotic domains, modeling errors are inevitable. We show how IHMPC is robust to modeling errors by altering the morphology of the robot; the same controller remains effective, even when the underlying infinite-horizon solution is no longer accurate.

international conference on development and learning | 2008

What does shaping mean for computational reinforcement learning?

Tom Erez; William D. Smart

This paper considers the role of shaping in applications of reinforcement learning, and proposes a formulation of shaping as a homotopy-continuation method. By considering reinforcement learning tasks as elements in an abstracted task space, we conceptualize shaping as a trajectory in task space, leading from simple tasks to harder ones. The solution of earlier, simpler tasks serves to initialize and facilitate the solution of later, harder tasks. We list the different ways reinforcement learning tasks may be modified, and review cases where continuation methods were employed (most of which were originally presented outside the context of shaping). We contrast our proposed view with previous work on computational shaping, and argue against the often-held view that equates shaping with a rich reward scheme. We conclude by discussing a proposed research agenda for the computational study of shaping in the context of reinforcement learning.
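The continuation idea in this abstract can be illustrated on a toy optimization problem rather than a reinforcement learning task. The sketch below is an assumption-laden analogy: the "easy" and "hard" objectives, the linear blend between them, and all function names are hypothetical. It shows only the mechanism, in which each stage's solution initializes the next, slightly harder stage.

```python
def easy_grad(x):
    """Gradient of the easy task: a convex bowl (x + 1)^2."""
    return 2.0 * (x + 1.0)


def hard_cost(x):
    """Hard task: a tilted double well with a poor local minimum near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def hard_grad(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def descend(grad, x, steps=2000, rate=0.01):
    """Plain gradient descent from the starting point x."""
    for _ in range(steps):
        x -= rate * grad(x)
    return x


def solve_with_shaping(x0, stages=10):
    """Follow a path in task space from the easy task (lam=0) to the hard one (lam=1)."""
    x = x0
    for i in range(stages + 1):
        lam = i / stages
        blended = lambda z, lam=lam: (1.0 - lam) * easy_grad(z) + lam * hard_grad(z)
        x = descend(blended, x)  # warm start from the previous stage's solution
    return x


cold = descend(hard_grad, 2.0)     # tackle the hard task directly
shaped = solve_with_shaping(2.0)   # reach it through easier tasks
```

From the same starting point, the direct attempt settles in the shallower well near x = +1, while the shaped sequence ends in the deeper well near x = -1.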

IEEE Transactions on Neural Networks | 2007

Least Squares Solutions of the HJB Equation With Neural Network Value-Function Approximators

Yuval Tassa; Tom Erez

In this paper, we present an empirical study of iterative least squares minimization of the Hamilton-Jacobi-Bellman (HJB) residual with a neural network (NN) approximation of the value function. Although the nonlinearities in the optimal control problem and NN approximator preclude theoretical guarantees and raise concerns of numerical instabilities, we present two simple methods for promoting convergence, the effectiveness of which is presented in a series of experiments. The first method involves the gradual increase of the horizon time scale, with a corresponding gradual increase in value function complexity. The second method involves the assumption of stochastic dynamics which introduces a regularizing second derivative term to the HJB equation. A gradual reduction of this term provides further stabilization of the convergence. We demonstrate the solution of several problems, including the 4D inverted-pendulum system with bounded control. Our approach requires no initial stabilizing policy or any restrictive assumptions on the plant or cost function, only knowledge of the plant dynamics. In the appendix, we provide the equations for first- and second-order differential backpropagation.
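For orientation, a standard textbook form of the discounted, stochastic HJB equation is sketched below. The symbols (running cost \ell, dynamics f, noise magnitude \sigma, horizon time scale \tau) and the isotropic-noise assumption are introduced here for illustration; the paper's own notation and normalization may differ.

```latex
% Assumed setting: dx = f(x,u)\,dt + \sigma\,d\omega, with cost discounted at rate 1/\tau.
\[
\frac{1}{\tau} V(x)
  = \min_{u} \Big[ \ell(x,u) + \nabla V(x)^{\top} f(x,u) \Big]
  + \frac{\sigma^{2}}{2} \operatorname{tr}\!\big( \nabla^{2} V(x) \big)
\]

% Residual whose squared value is minimized over sampled states x_i:
\[
R(x) = \min_{u} \Big[ \ell(x,u) + \nabla V(x)^{\top} f(x,u) \Big]
  + \frac{\sigma^{2}}{2} \operatorname{tr}\!\big( \nabla^{2} V(x) \big)
  - \frac{1}{\tau} V(x),
\qquad
\min_{V} \sum_{i} R(x_i)^{2}
\]
```

In this form, increasing \tau lengthens the effective horizon, and letting \sigma go to zero removes the second-derivative term and recovers the deterministic equation. These correspond to the two gradual schedules described in the abstract.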

international conference on robotics and automation | 2014

Real-time behaviour synthesis for dynamic hand-manipulation

Vikash Kumar; Yuval Tassa; Tom Erez; Emanuel Todorov

Dexterous hand manipulation is one of the most complex types of biological movement, and has proven very difficult to replicate in robots. The usual approaches to robotic control - following pre-defined trajectories or planning online with reduced models - are both inapplicable. Dexterous manipulation is so sensitive to small variations in contact force and object location that it seems to require online planning without any simplifications. Here we demonstrate for the first time online planning (or model-predictive control) with a full physics model of a humanoid hand, with 28 degrees of freedom and 48 pneumatic actuators. We augment the actuation space with motor synergies which speed up optimization without removing dexterity. Most of our results are in simulation, showing non-prehensile object manipulation as well as typing. In both cases the input to the system is a high level task description, while all details of the hand movement emerge online from fully automated numerical optimization.
