Emo Todorov
University of Washington
Publications
Featured research published by Emo Todorov.
International Conference on Robotics and Automation | 2014
Yuval Tassa; Nicolas Mansard; Emo Todorov
Trajectory optimizers are a powerful class of methods for generating goal-directed robot motion. Differential Dynamic Programming (DDP) is an indirect method which optimizes only over the unconstrained control-space and is therefore fast enough to allow real-time control of a full humanoid robot on modern computers. Although indirect methods automatically take into account state constraints, control limits pose a difficulty. This is particularly problematic when an expensive robot is strong enough to break itself. In this paper, we demonstrate that simple heuristics used to enforce limits (clamping and penalizing) are not efficient in general. We then propose a generalization of DDP which accommodates box inequality constraints on the controls, without significantly sacrificing convergence quality or computational effort. We apply our algorithm to three simulated problems, including the 36-DoF HRP-2 robot. A movie of our results can be found at goo.gl/eeiMnn.
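The core algorithmic change in control-limited DDP is that the unconstrained minimization of the local quadratic model in the backward pass is replaced by a box-constrained QP. Below is a minimal sketch of that subproblem, using plain projected gradient descent as a stand-in for the Projected-Newton solver the paper uses; the toy numbers are illustrative only.

```python
import numpy as np

def boxqp(Quu, Qu, lb, ub, iters=500):
    """Minimize 0.5*u'Quu u + Qu'u subject to lb <= u <= ub.

    Projected gradient descent: a simple stand-in for the
    Projected-Newton solver used in control-limited DDP.
    """
    u = np.clip(-np.linalg.solve(Quu, Qu), lb, ub)  # clamped warm start
    step = 1.0 / np.linalg.norm(Quu, 2)             # 1/L step size is safe
    for _ in range(iters):
        u = np.clip(u - step * (Quu @ u + Qu), lb, ub)
    return u

# Toy 2-D subproblem whose unconstrained optimum violates the box.
Quu = np.array([[2.0, 0.5], [0.5, 1.0]])
Qu = np.array([-4.0, 1.0])
print(boxqp(Quu, Qu, lb=np.array([-1.0, -1.0]), ub=np.array([1.0, 1.0])))
```

In the paper's backward pass this subproblem is solved at every time step, and feedback gains are computed only over the unclamped control dimensions, which is what distinguishes the method from naive clamping.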
Advances in Computing and Communications | 2010
Evangelos A. Theodorou; Yuval Tassa; Emo Todorov
Although there has been a significant amount of work in stochastic optimal control theory towards the development of new algorithms, the problem of how to control a stochastic nonlinear system remains an open research topic. Recent iterative linear quadratic optimal control methods (iLQG) handle control- and state-multiplicative noise, but they are derived from a first-order approximation of the dynamics. On the other hand, methods such as Differential Dynamic Programming expand the dynamics up to second order, but so far they handle only nonlinear systems with additive noise. In this work we present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-multiplicative process noise, and proceed to derive the second-order expansion of the cost-to-go. We find the correction terms that arise from the stochastic assumption. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.
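To see where the correction terms come from, consider how a quadratic value function behaves in expectation under the noise. A schematic of the key step in assumed notation (this mirrors the standard iLQG-style correction rather than the paper's exact derivation): for dynamics x' = f(x, u) + C(x, u)ξ with ξ ~ N(0, I),

```latex
\mathbb{E}_{\xi}\big[\, V(f + C\xi) \,\big]
  \;\approx\; V(f) \;+\; \tfrac{1}{2}\,\operatorname{tr}\!\big( C^{\top} V_{xx}\, C \big).
```

Because C depends on state and control (multiplicative noise), differentiating the trace term adds extra pieces to the quadratic model, e.g. each noise column c_i contributes c_{i,u}^T V_xx c_{i,u} to Q_uu. The paper carries the same expectation through a second-order expansion of f, which is where the cubic and quartic terms appear before cancelling.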
Robotics: Science and Systems | 2010
Yuval Tassa; Emo Todorov
We present a method for smoothing discontinuous dynamics involving contact and friction, thereby facilitating the use of local optimization techniques for control. The method replaces the standard Linear Complementarity Problem with a Stochastic Linear Complementarity Problem. The resulting dynamics are continuously differentiable, and the resulting controllers are robust to disturbances. We demonstrate our method on a simulated 6-dimensional manipulation task, which involves a finger learning to spin an anchored object by repeated flicking.
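The idea can be illustrated on the simplest complementarity ingredient, the non-negative projection max(0, x) that appears in contact-impulse computations. Taking its expectation under Gaussian noise yields a smooth surrogate with a closed form. A one-dimensional sketch (the paper works with the full stochastic LCP; this analogue is only illustrative):

```python
import numpy as np
from scipy.stats import norm

def smooth_relu(x, sigma):
    """E[max(0, x + sigma*w)] for w ~ N(0, 1).

    Closed form: x*Phi(x/sigma) + sigma*phi(x/sigma). Infinitely
    differentiable, and it approaches max(0, x) as sigma -> 0.
    """
    z = x / sigma
    return x * norm.cdf(z) + sigma * norm.pdf(z)

xs = np.linspace(-1.0, 1.0, 5)
print(smooth_relu(xs, sigma=0.1))  # smooth near 0, linear for large |x|
print(np.maximum(xs, 0.0))         # the hard version it approximates
```

Applying the same expectation inside the complementarity problem is what makes the resulting dynamics continuously differentiable, and hence usable by gradient-based trajectory optimization through contact.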
IEEE/ASME Transactions on Mechatronics | 2013
Eric Rombokas; Mark Malhotra; Evangelos A. Theodorou; Emo Todorov; Yoky Matsuoka
Tendon-driven systems are ubiquitous in biology and provide considerable advantages for robotic manipulators, but control of these systems is challenging because of the increase in dimensionality and intrinsic nonlinearities. Researchers in biological movement control have suggested that the brain may employ “muscle synergies” to make planning, control, and learning more tractable by expressing the tendon space in a lower dimensional virtual synergistic space. We employ synergies that respect the differing constraints of actuation and sensation, and apply path integral reinforcement learning in the virtual synergistic space as well as the full tendon space. Path integral reinforcement learning has been used successfully on torque-driven systems to learn episodic tasks without using explicit models, which is particularly important for difficult-to-model dynamics like tendon networks and contact transitions. We show that optimizing a small number of trajectories in virtual synergy space can produce comparable performance to optimizing the trajectories of the tendons individually. The six tendons of the index finger and eight tendons of the thumb, each actuating four degrees of joint freedom, are used to slide a switch and turn a knob. The learned control strategies provide a method for discovery of novel task strategies and system phenomena without explicitly modeling the physics of the robot and environment.
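Concretely, the synergy idea amounts to commanding the tendons through a fixed low-rank map, so that learning searches the small synergy space rather than the full tendon space. A minimal sketch of the projection (the synergy matrix here is random for illustration; in the paper it is derived from the hand and respects the differing actuation and sensing constraints):

```python
import numpy as np

n_tendons, n_synergies = 6, 2  # e.g. the index finger's six tendons
rng = np.random.default_rng(0)
S = rng.standard_normal((n_tendons, n_synergies))  # illustrative basis

def tendon_command(a_syn):
    """Map a low-dimensional synergy activation to full tendon commands."""
    return S @ a_syn

# The learner now searches over 2 numbers per time step instead of 6.
print(tendon_command(np.array([0.5, -0.2])))
```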
International Conference on Robotics and Automation | 2012
Eric Rombokas; Evangelos A. Theodorou; Mark Malhotra; Emo Todorov; Yoky Matsuoka
We apply path integral reinforcement learning to a biomechanically accurate dynamics model of the index finger and then to the Anatomically Correct Testbed (ACT) robotic hand. We illustrate the applicability of Policy Improvement with Path Integrals (PI2) to parameterized and non-parameterized control policies. This method is based on sampling variations in control, executing them in the real world, and minimizing a cost function on the resulting performance. Iteratively improving the control policy based on real-world performance requires no direct modeling of tendon network nonlinearities and contact transitions, allowing improved task performance.
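The PI2 update itself is compact: perturb the policy parameters, execute rollouts, and average the perturbations with softmax weights on the resulting costs. A minimal sketch with a throwaway quadratic cost standing in for a real-world rollout (all names and constants here are assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(4)                       # policy parameters
target = np.array([1.0, -0.5, 0.3, 0.8])  # optimum of the fake rollout cost

def rollout_cost(params):
    """Stand-in for executing the policy and measuring trajectory cost."""
    return np.sum((params - target) ** 2)

lam, sigma, K = 0.1, 0.3, 20  # temperature, exploration scale, rollouts
for _ in range(100):
    eps = sigma * rng.standard_normal((K, theta.size))  # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)            # path-integral weights
    w /= w.sum()
    theta = theta + w @ eps                             # reward-weighted update
print(theta)  # converges toward `target`
```

Because the update needs only sampled costs, no gradient of the tendon-network dynamics or contact model is ever required, which is the point of applying it to the ACT hand.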
Archive | 2014
Perle Geoffroy; Nicolas Mansard; Maxime Raison; Sofiane Achiche; Emo Todorov
Numerical optimal control (the approximation of an optimal trajectory using numerical iterative algorithms) is a promising approach for computing the control of complex dynamical systems whose instantaneous linearization is not meaningful. Aside from the problem of computational cost, these methods raise several conceptual issues, such as stability, robustness, and simply understanding the nature of the obtained solution. In this chapter, we propose a rewriting of the Differential Dynamic Programming solver. Our variant is more efficient and has more appealing numerical properties. Furthermore, it invites some interesting comparisons with the classical inverse formulation: in particular, we show that inverse kinematics can be seen as a singular case of it, in which the preview horizon collapses.
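The connection to inverse kinematics can be made concrete: with the preview horizon collapsed to a single step and a quadratic task error, the control update reduces to a damped least-squares step. A minimal sketch of that degenerate case (the Jacobian, damping, and exact correspondence are illustrative, not the chapter's formulation):

```python
import numpy as np

def dls_ik_step(J, task_error, damping=1e-2):
    """Damped least-squares IK: dq = J'(J J' + lambda*I)^(-1) * error."""
    JJt = J @ J.T + damping * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(JJt, task_error)

J = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3]])  # 2-D task, 3 joints
print(dls_ik_step(J, task_error=np.array([0.1, -0.05])))
```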
Robotics: Science and Systems | 2012
Eric Rombokas; Mark Malhotra; Evangelos A. Theodorou; Yoky Matsuoka; Emo Todorov
Biological motor control is capable of learning complex movements containing contact transitions and unknown force requirements while adapting the impedance of the system. In this work, we seek to achieve robotic mimicry of this compliance, employing stiffness only when it is necessary for task completion. We use path integral reinforcement learning, which has been successfully applied on torque-driven systems to learn episodic tasks without using explicit models. Applying this method to tendon-driven systems is challenging because of the increase in dimensionality, the intrinsic nonlinearities of such systems, and the increased effect of external dynamics on the lighter tendon-driven end effectors. We demonstrate the simultaneous learning of feedback gains and desired tendon trajectories in a dynamically complex sliding-switch task with a tendon-driven robotic hand. The learned controls look noisy but nonetheless result in smooth and expert task performance. We show discovery of dynamic strategies not explored in a demonstration, and that the learned strategy is useful for understanding difficult-to-model plant characteristics.
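Learning gains and trajectories simultaneously just means stacking both into the single parameter vector that the path integral method perturbs. A sketch of the controller structure (dimensions and names are illustrative):

```python
import numpy as np

n_x, n_u, T = 4, 2, 50  # state dim, tendon dim, horizon (made-up sizes)

def unpack(theta):
    """Split one flat parameter vector into feedforward and feedback parts."""
    u_ff = theta[: T * n_u].reshape(T, n_u)    # desired tendon trajectory
    K = theta[T * n_u :].reshape(T, n_u, n_x)  # time-varying feedback gains
    return u_ff, K

def control(theta, t, x, x_des):
    u_ff, K = unpack(theta)
    return u_ff[t] + K[t] @ (x_des - x)  # stiff only where the gains are large

theta = np.zeros(T * n_u + T * n_u * n_x)  # the learner perturbs all of this
print(control(theta, t=0, x=np.zeros(n_x), x_des=np.ones(n_x)))
```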
Conference on Decision and Control | 2013
Evangelos A. Theodorou; Krishnamurthy Dvijotham; Emo Todorov
We derive Policy Gradients (PGs) with time-varying parameterizations for nonlinear diffusion processes that are affine in the noise. The resulting policy gradients take the form of a reward-weighted gradient. The analysis is in continuous time and covers both linear and nonlinear parameterizations. Examples on stochastic control problems for diffusion processes are provided.
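For orientation, the discrete-time analogue of the reward-weighted form is the familiar likelihood-ratio identity; the paper's contribution is the continuous-time version for diffusions with time-varying θ, so the notation below is generic rather than the paper's:

```latex
\nabla_{\theta} J(\theta)
  \;=\; \nabla_{\theta}\, \mathbb{E}_{\tau \sim p_{\theta}}\big[ R(\tau) \big]
  \;=\; \mathbb{E}_{\tau \sim p_{\theta}}\big[\, R(\tau)\,
        \nabla_{\theta} \log p_{\theta}(\tau) \,\big].
```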
Advances in Computing and Communications | 2012
Krishnamurthy Dvijotham; Emo Todorov
Recent work has led to a novel theory of linearly solvable optimal control, in which the Bellman equation characterizing the optimal value function is reduced to a linear equation. This work has already shown promising results in planning and control of nonlinear systems in high-dimensional state spaces. In this paper, we extend the class of linearly solvable problems to include certain kinds of 2-player Markov games. In terms of modeling power, the new framework is more general than previous work and can apply to any noisy dynamical system. We also obtain analytical solutions for the optimal value function of continuous-state control problems with linear dynamics and a very flexible class of cost functions. The linearity leads to many other useful properties: the ability to compose solutions to simple control problems to obtain solutions to new problems, a convex optimization formulation of inverse optimal control, and so on. We demonstrate the usefulness of the framework through examples of forward and inverse optimal control problems in continuous as well as discrete state spaces.
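In the discrete (LMDP) setting, substituting the desirability z(x) = exp(-v(x)) turns the Bellman equation into a linear fixed point that can be solved by power iteration. A minimal sketch on a made-up chain, using the average-cost eigenproblem formulation (transition matrix and costs are random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)  # passive dynamics p(x'|x)
q = rng.random(n)                  # state costs

# Linear Bellman equation: z = diag(exp(-q)) P z. The normalized iteration
# is power iteration; z converges to the principal eigenvector, and in the
# average-cost formulation the eigenvalue encodes the average cost.
z = np.ones(n)
for _ in range(200):
    z = np.exp(-q) * (P @ z)
    z /= np.linalg.norm(z)
v = -np.log(z)  # value function, up to an additive constant
print(v)
```

The optimal controlled transition is then the passive one reweighted by z, which is also what makes solutions compose linearly across problems.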
ASME 2010 Summer Bioengineering Conference, Parts A and B | 2010
Evangelos A. Theodorou; Emo Todorov; Francisco J. Valero-Cuevas
In this work we present the first constrained stochastic optimal feedback controller applied to a fully nonlinear, tendon-driven index finger model. Our model also takes into account an extensor mechanism and muscle force-length and force-velocity properties. We show this feedback controller is robust to noise and perturbations of the dynamics, while successfully handling the nonlinearities and high dimensionality of the system. By extending prior methods, we are able to approximate physiological realism by ensuring positivity of neural commands and tendon tensions at all times.
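The force-length and force-velocity properties enter as multiplicative, Hill-style scalings on a non-negative activation, which is also where the positivity constraints bite. A sketch with generic curve shapes (the fitted curves in the actual model differ; everything here is a stand-in):

```python
import numpy as np

def muscle_force(a, l, v, f_max=100.0):
    """Hill-type scaling: F = a * F_max * f_l(l) * f_v(v).

    f_l: generic bump around optimal fiber length (l normalized to 1).
    f_v: generic curve, less force when shortening (v < 0), more when
    lengthening (v > 0). Both are illustrative stand-ins.
    """
    a = np.clip(a, 0.0, 1.0)  # positivity of the neural command
    f_l = np.exp(-((l - 1.0) ** 2) / 0.45)
    f_v = np.maximum(1.0 + np.tanh(2.0 * v), 0.0)
    return a * f_max * f_l * f_v  # tendon tension stays non-negative

print(muscle_force(a=0.5, l=1.05, v=-0.1))
```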