Extending iLQR method with control delay
All authors were with East China University of Science and Technology. This work was completed in 2013 but, for various reasons, was not published. Corresponding author: [email protected]
Cheng Ju, Yan Qin and Chunjiang Fu *
1. Introduction

In the framework of optimal control, research on the human motor control system has been progressing for thirty years. Numerous experimental phenomena have raised fresh questions and driven the development of theory, which has come to cover problems involving nonlinearity, redundancy, signal-dependent noise, and constraints. Last century, feedforward optimal control was introduced first, with various cost criteria (Uno, 1989). Later, the importance of noise was pointed out by Harris and Wolpert (1998). After the turn of the century, feedback was incorporated to form a relatively unified computational framework for describing the motor control system, the optimal feedback control (OFC) framework, which also accounts for multiplicative noise and constraints (Todorov, 2002).

However, this framework does not specify how to deal with time delay. Physiologically speaking, delays exist at all stages of the motor system (receiving sensory data, transmitting motor commands, muscle response) and vary from 10 ms to 150 ms depending on the task. Delays produce error and even instability in the estimate of the present state, which cannot be neglected in the stochastic OFC framework if the movement lasts long enough. Some studies even employ delay to explain the famous Fitts' law (Beamish, 2006).

This work incorporates time delay into the OFC framework. As is well known, it is difficult to find a globally optimal control regulator for such a complex problem. Following the idea of the iterative linear quadratic regulator (iLQR; Li and Todorov, 2005), a discrete-time linear quadratic stochastic control system is developed to approximate the original problem, and a locally optimal feedback control regulator is then obtained in recursive form. This work demonstrates that OFC is, in principle, capable of accommodating time delay.

The paper is organized as follows. Section 2 presents the nonlinear stochastic dynamic system with time delay.
An approximate linear quadratic stochastic control system is introduced via linearization techniques. In Section 3, the iterative linear quadratic regulator (iLQR) algorithm with input delay is designed in our main theorem.
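As background for the construction that follows, the delay-free finite-horizon LQR backward recursion at the core of iLQR can be sketched in a few lines. This is the standard textbook recursion (see e.g. Astrom, 1970), not the delayed version derived in Section 3; the double-integrator system and cost weights below are illustrative assumptions only.

```python
import numpy as np

# Standard finite-horizon discrete LQR backward recursion (no delay):
#   u_k = -(E + B^T S_{k+1} B)^{-1} B^T S_{k+1} A x_k.
# iLQR applies such a recursion to a local LQG approximation of a
# nonlinear problem; Section 3 extends it to inputs with a fixed delay.
def lqr_gains(A, B, D, E, S_final, K):
    S = S_final
    gains = []
    for _ in range(K):                        # backward in time
        H = E + B.T @ S @ B                   # control Hessian
        L = np.linalg.solve(H, B.T @ S @ A)   # feedback gain
        S = D + A.T @ S @ A - (B.T @ S @ A).T @ L  # Riccati update
        gains.append(L)
    return gains[::-1]                        # gains[k] for time step k

A = np.array([[1.0, 0.1], [0.0, 1.0]])        # double integrator, dt = 0.1
B = np.array([[0.0], [0.1]])
D = 0.1 * np.eye(2)                           # state cost weight
E = 0.01 * np.eye(1)                          # control cost weight
gains = lqr_gains(A, B, D, E, S_final=np.eye(2), K=50)
```

The delayed version in Section 3 augments this recursion with additional matrices that track how past controls re-enter the dynamics through the delay.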
2. Problem Formulation

2.1 The nonlinear stochastic dynamic system with time delay

Consider a class of nonlinear dynamic systems with time delay described by the stochastic state equation
$$dx(t) = f(x(t), u(t), u(t-\tau))\,dt + F(x(t), u(t), u(t-\tau))\,d\omega(t) \qquad (1)$$
with the initial condition $x(0) = x_0$ and $u(s) = 0$ for any $s \in [-\tau, 0)$. Here $\tau$ is a positive fixed delay, $x \in \mathbb{R}^n$ is the state variable, and $u \in \mathbb{R}^m$ is the control input, which is unconstrained in our paper. $\omega(t)$ is a $p$-dimensional standard Brownian motion. The coefficients $f$ and $F$ are continuous with respect to all their arguments, with continuous and bounded derivatives.

The cost function to be minimized is defined as follows:
$$J(t, x) = E\Big\{ (x(t_f) - x_{t_f})^\top P_{t_f} (x(t_f) - x_{t_f}) + \int_t^{t_f} \big[\, x(s)^\top P(s)\, x(s) + u(s, x(s))^\top Q(s)\, u(s, x(s)) \,\big]\, ds \Big\}, \qquad (2)$$
where the superscript $\top$ denotes the transpose, $J(t, x)$ is the total cost expected to accumulate if the system is initialized in state $x$ at time $t$ and controlled until the final time $t_f$ according to the control law $u = u(t, x)$, $E\{\cdot\}$ is the expectation with respect to the underlying $\sigma$-algebra, $x_{t_f}$ is the target position, $P_{t_f}$ and $P(s)$ are symmetric non-negative definite matrices, and $Q(s)$ is a symmetric positive definite matrix. The quadratic criterion (2) is adopted to reduce the complexity of the optimal control regulator algorithm while reflecting the physiological characteristics of the biomechanical model.

The optimal control problem is to find the optimal feedback control $u^*(t)$, $t \le t_f$, that minimizes the criterion $J(0, x_0)$, along with the trajectory $x^*(t)$, $t \le t_f$. Note that it is difficult to find a globally optimal control regulator for such a nonlinear stochastic dynamic system. Instead, we seek locally optimal control laws: we build a linear-quadratic-Gaussian (LQG) approximation to our original nonlinear time-delay system.
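For intuition, the delayed stochastic dynamics (1) can be simulated with an Euler-Maruyama scheme. The following is a minimal sketch with a scalar state; the drift `f`, diffusion `F`, and control `u` are hypothetical stand-ins (the paper leaves them abstract), and the input history is zero on $[-\tau, 0)$ as in the initial condition.

```python
import numpy as np

# Euler-Maruyama simulation of the delayed SDE (1):
#   dx = f(x, u(t), u(t - tau)) dt + F(x, u(t), u(t - tau)) dw.
def simulate(f, F, u, x0, tau, dt, T, rng):
    n_steps = int(round(T / dt))
    lag = int(round(tau / dt))             # delay measured in whole steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    u_hist = np.zeros(n_steps + lag)       # u(s) = 0 for s in [-tau, 0)
    for k in range(n_steps):
        u_hist[k + lag] = u(k * dt, x[k])
        u_now, u_delayed = u_hist[k + lag], u_hist[k]
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over dt
        x[k + 1] = (x[k] + f(x[k], u_now, u_delayed) * dt
                    + F(x[k], u_now, u_delayed) * dw)
    return x

rng = np.random.default_rng(0)
f = lambda x, u, ud: -x + ud               # drift driven by the delayed input
F = lambda x, u, ud: 0.1 * u               # control-multiplicative noise
u = lambda t, x: 1.0                       # constant open-loop input
traj = simulate(f, F, u, x0=0.0, tau=0.15, dt=0.01, T=1.0, rng=rng)
```

Note the control-multiplicative diffusion: the noise scales with the control signal, which is the structure the linearization in Section 2.2 preserves.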
Then we design a quasi-optimal control regulator in Section 3.

2.2 Local LQG approximation

Throughout the paper, there is an equidistant grid on the time interval $[0, t_f]$ with mesh (sample period) $\Delta t = t_f / K$ and $\tau = l\,\Delta t$, for two positive integers $K$ and $l$. The interval $[t_k, t_{k+1}) = [k\Delta t, (k+1)\Delta t)$ is the $k$-th sampling interval. We assume the input is piecewise constant over each sampling interval, i.e. the zero-order-hold assumption holds: $u(t) = u(k\Delta t) = u_k = \text{constant}$ for $k\Delta t \le t < (k+1)\Delta t$, and similarly for all other time-varying quantities.

The method starts with an open-loop initial guess $\bar u(t)$ and the corresponding "zero-noise" trajectory $\bar x(t)$, obtained by applying $\bar u(t)$ to the deterministic system $\dot{\bar x}(t) = f(\bar x(t), \bar u(t), \bar u(t - \tau))$ with initial state $\bar x(0) = x_0$ and initial input $\bar u(t) = 0$ for $t \in [-\tau, 0)$. This can also be done by Euler integration, $\bar x_{k+1} = \bar x_k + f(\bar x_k, \bar u_k, \bar u_{k-l})\,\Delta t$.

By linearizing the state equation (1) around $(\bar x, \bar u)$, we obtain discrete-time linear equations in the state and control deviations $\delta x_k = x_k - \bar x_k$, $\delta u_k = u_k - \bar u_k$. Written in terms of these deviations, the discrete-time linear approximation of our original nonlinear state equation becomes
$$\delta x_{k+1} = A_k\,\delta x_k + B_k\,\delta u_k + B_k^\tau\,\delta u_{k-l} + C_k(\delta u_k, \delta u_{k-l})\,\varepsilon_k, \qquad k = 0, 1, \ldots, K-1, \qquad (3)$$
with $\delta x_0 = 0$ and $\delta u_{-l} = \cdots = \delta u_{-1} = 0$. The coefficients are
$$A_k = I_n + \Delta t\, f_x, \qquad B_k = \Delta t\, f_u, \qquad B_k^\tau = \Delta t\, f_{u^\tau},$$
$$C_k(\delta u_k, \delta u_{k-l}) = \big[\; c_{1,k} + C_{1,k}\,\delta u_k + C_{1,k}^\tau\,\delta u_{k-l}, \;\; \ldots, \;\; c_{p,k} + C_{p,k}\,\delta u_k + C_{p,k}^\tau\,\delta u_{k-l} \;\big],$$
$$c_{i,k} = \sqrt{\Delta t}\; F^{[i]}, \qquad C_{i,k} = \sqrt{\Delta t}\; F^{[i]}_{u}, \qquad C_{i,k}^\tau = \sqrt{\Delta t}\; F^{[i]}_{u^\tau}, \qquad i = 1, \ldots, p,$$
where $f_x = \partial f(\bar x_k, \bar u_k, \bar u_{k-l})/\partial x$, $f_u = \partial f(\bar x_k, \bar u_k, \bar u_{k-l})/\partial u_k$, $f_{u^\tau} = \partial f(\bar x_k, \bar u_k, \bar u_{k-l})/\partial u_{k-l}$, $F^{[i]}_{u} = \partial F^{[i]}(\bar u_k, \bar u_{k-l})/\partial u_k$, $F^{[i]}_{u^\tau} = \partial F^{[i]}(\bar u_k, \bar u_{k-l})/\partial u_{k-l}$, and $F^{[i]}$ denotes the $i$-th column of $F$, $i = 1, \ldots, p$.

The noises $\varepsilon_k \sim \mathcal{N}(0, I_p)$, $k \le K$, are mutually independent. The $\sqrt{\Delta t}$ term appears because the covariance of Brownian motion grows linearly with time. Since we are interested in multiplicative noise in the control signal, $F$ is linearized only with respect to $u_k$ and $u_{k-l}$. The $i$-th column of the matrix $C_k(\delta u_k, \delta u_{k-l})$ is $c_{i,k} + C_{i,k}\,\delta u_k + C_{i,k}^\tau\,\delta u_{k-l}$. Thus the conditional covariance of the noise term given $\varepsilon_0, \ldots, \varepsilon_{k-1}$ is
$$\mathrm{Cov}\big[\, C_k(\delta u_k, \delta u_{k-l})\,\varepsilon_k \,\big] = \sum_{i=1}^{p} \big(c_{i,k} + C_{i,k}\,\delta u_k + C_{i,k}^\tau\,\delta u_{k-l}\big)\big(c_{i,k} + C_{i,k}\,\delta u_k + C_{i,k}^\tau\,\delta u_{k-l}\big)^\top.$$

Similarly, by quadratizing the cost function (2) around $(\bar x, \bar u)$, we obtain a cost in discrete-time quadratic form: for $k = K, K-1, \ldots, 0$,
$$\mathrm{cost}_k = d_k + 2\,\mathbf{d}_k^\top \delta x_k + \delta x_k^\top D_k\,\delta x_k + 2\,e_k^\top \delta u_k + \delta u_k^\top E_k\,\delta u_k, \qquad (4)$$
where, for $k = K-1, \ldots, 0$,
$$d_k = \Delta t\,\big(\bar x_k^\top P(k\Delta t)\,\bar x_k + \bar u_k^\top Q(k\Delta t)\,\bar u_k\big), \qquad \mathbf{d}_k = \Delta t\, P(k\Delta t)\,\bar x_k, \qquad D_k = \Delta t\, P(k\Delta t),$$
$$e_k = \Delta t\, Q(k\Delta t)\,\bar u_k, \qquad E_k = \Delta t\, Q(k\Delta t).$$
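Returning to the linearization step, the matrices $A_k$, $B_k$, $B_k^\tau$ can be obtained by finite differences when analytic derivatives of $f$ are inconvenient. A minimal sketch, assuming a hypothetical dynamics function `f(x, u, u_delayed)` that returns $\dot x$:

```python
import numpy as np

# Finite-difference linearization of the Euler step
#   x_{k+1} = x_k + dt * f(x_k, u_k, u_{k-l})
# around a nominal point (xbar, ubar, ubar_delayed), producing
#   A_k = I + dt * df/dx,  B_k = dt * df/du_k,  B_tau_k = dt * df/du_{k-l}
# of the discrete model (3). The dynamics f below is a hypothetical example.
def linearize_step(f, xb, ub, ub_del, dt, eps=1e-6):
    n, m = len(xb), len(ub)
    f0 = f(xb, ub, ub_del)
    fx = np.zeros((n, n))
    fu = np.zeros((n, m))
    fu_tau = np.zeros((n, m))
    for j in range(n):                      # derivative w.r.t. the state
        dx = np.zeros(n); dx[j] = eps
        fx[:, j] = (f(xb + dx, ub, ub_del) - f0) / eps
    for j in range(m):                      # derivatives w.r.t. both inputs
        du = np.zeros(m); du[j] = eps
        fu[:, j] = (f(xb, ub + du, ub_del) - f0) / eps
        fu_tau[:, j] = (f(xb, ub, ub_del + du) - f0) / eps
    return np.eye(n) + dt * fx, dt * fu, dt * fu_tau

# Hypothetical linear dynamics (finite differences are exact here):
f = lambda x, u, ud: np.array([x[1], -x[0] + u[0] + 0.5 * ud[0]])
A, B, B_tau = linearize_step(f, np.zeros(2), np.zeros(1), np.zeros(1), dt=0.01)
```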
At the final time step $k = K$, we have
$$d_K = (\bar x(t_f) - x_{t_f})^\top P_{t_f} (\bar x(t_f) - x_{t_f}), \qquad \mathbf d_K = P_{t_f}(\bar x(t_f) - x_{t_f}), \qquad D_K = P_{t_f}, \qquad e_K = 0, \qquad E_K = 0.$$
So $E_k$ is a symmetric positive definite matrix for $0 \le k \le K-1$. The discrete-time quadratic approximation of our original criterion becomes
$$J(k) = E_k\Big\{ \sum_{i=k}^{K} \mathrm{cost}_i \Big\}, \qquad (5)$$
where $E_k\{\cdot\}$ is the conditional expectation with respect to $\varepsilon_0, \ldots, \varepsilon_{k-1}$. The LQG approximation problem defined by (3) and (5) is then to find the optimal controls $\delta u_0, \delta u_1, \ldots, \delta u_{K-1}$ that minimize the criterion $J(0)$ (that is, starting from the initial time $0$), along with the trajectory $\delta x_0, \delta x_1, \ldots, \delta x_K$ generated by the state equation (3). For notational simplicity, we henceforth write $x_k$ and $u_k$ for the deviations $\delta x_k$ and $\delta u_k$.

3. Designing iLQR with input delay

In this section we focus on the above LQG approximation system. To develop an iterative linear quadratic regulator (iLQR) algorithm for the time-delay system, we first find the optimal control $u^*_{K-1}$ that minimizes $J(0)$ when the optimal control laws $u^*_0, u^*_1, \ldots, u^*_{K-2}$ are given. Bellman's dynamic programming principle tells us that this is equivalent to finding the optimal control $u^*_{K-1}$ that minimizes $J(K-1)$ in the LQG system. Substituting the state equation (3) into the criterion (5) gives a new quadratic form for $J(K-1)$ that no longer involves $x_K$. Notice that this new $J(K-1)$ is quadratic in the controls $u^*_0, u^*_1, \ldots, u^*_{K-2}$, $u_{K-1}$ and the state $x_{K-1}$. Based on Pontryagin's maximum principle, the optimal control law $u^*_{K-1}$ can then be written as an explicit linear expression in $u^*_0, u^*_1, \ldots, u^*_{K-2}$ and $x_{K-1}$. For simplicity, the optimal control $u^*_k$ is denoted by $u_k$ below. Repeating this procedure backward, the optimal feedback control laws of the LQG system are derived in the following theorem.

Theorem 1.
Consider the LQG system with the state equation (3) and the quadratic criterion (5). The optimal control laws $u_0, u_1, \ldots, u_{K-1}$ satisfy
$$u_k = -H_k^{-1}\Big( I_k + L_k\, x_k + \sum_{i=0}^{\min\{K-k,\,l\}-1} M_k^i\, u_{k+i-l} \Big), \qquad (6)$$
where $u_j = 0$ for $j < 0$, consistent with the initial condition. Moreover, the minimum of the criterion (5) equals
$$J(k) = s_k + 2\,\mathbf s_k^\top x_k + x_k^\top S_k\, x_k + 2 \sum_{i=0}^{\min\{K-k,\,l\}-1} u_{k+i-l}^\top\, \mathbf r_k^i + 2 \sum_{i=0}^{\min\{K-k,\,l\}-1} u_{k+i-l}^\top R_k^i\, x_k + \sum_{i,j=0}^{\min\{K-k,\,l\}-1} u_{k+i-l}^\top R_k^{ij}\, u_{k+j-l} \qquad (7)$$
for $0 \le k \le K$. The coefficients are given by the backward recursions
$$H_k = \begin{cases} E_k + B_k^\top S_{k+1} B_k + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}, & K-l \le k \le K-1,\\[4pt] E_k + B_k^\top S_{k+1} B_k + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k} + R_{k+1}^{l-1} B_k + B_k^\top (R_{k+1}^{l-1})^\top + R_{k+1}^{(l-1)(l-1)}, & 0 \le k \le K-l-1, \end{cases}$$
$$I_k = \begin{cases} e_k + B_k^\top \mathbf s_{k+1} + \sum_{i=1}^p C_{i,k}^\top S_{k+1}\, c_{i,k}, & K-l \le k \le K-1,\\[4pt] e_k + B_k^\top \mathbf s_{k+1} + \sum_{i=1}^p C_{i,k}^\top S_{k+1}\, c_{i,k} + \mathbf r_{k+1}^{l-1}, & 0 \le k \le K-l-1, \end{cases}$$
$$L_k = \begin{cases} B_k^\top S_{k+1} A_k, & K-l \le k \le K-1,\\[4pt] B_k^\top S_{k+1} A_k + R_{k+1}^{l-1} A_k, & 0 \le k \le K-l-1, \end{cases}$$
$$M_k^0 = \begin{cases} B_k^\top S_{k+1} B_k^\tau + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}^\tau, & K-l \le k \le K-1,\\[4pt] B_k^\top S_{k+1} B_k^\tau + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}^\tau + R_{k+1}^{l-1} B_k^\tau, & 0 \le k \le K-l-1, \end{cases}$$
$$M_k^i = \begin{cases} B_k^\top (R_{k+1}^{i-1})^\top, & K-l \le k \le K-1, \quad i = 1, \ldots, K-k-1,\\[4pt] B_k^\top (R_{k+1}^{i-1})^\top + R_{k+1}^{(l-1)(i-1)}, & 0 \le k \le K-l-1, \quad i = 1, \ldots, l-1, \end{cases}$$
and
$$s_k = d_k + s_{k+1} + \sum_{i=1}^p c_{i,k}^\top S_{k+1}\, c_{i,k} - I_k^\top H_k^{-1} I_k,$$
$$\mathbf s_k = \mathbf d_k + A_k^\top \mathbf s_{k+1} - L_k^\top H_k^{-1} I_k,$$
$$S_k = D_k + A_k^\top S_{k+1} A_k - L_k^\top H_k^{-1} L_k,$$
$$\mathbf r_k^0 = (B_k^\tau)^\top \mathbf s_{k+1} + \sum_{i=1}^p (C_{i,k}^\tau)^\top S_{k+1}\, c_{i,k} - (M_k^0)^\top H_k^{-1} I_k,$$
$$\mathbf r_k^i = \mathbf r_{k+1}^{i-1} - (M_k^i)^\top H_k^{-1} I_k, \qquad i = 1, \ldots, \min\{K-k,\,l\}-1,$$
$$R_k^0 = (B_k^\tau)^\top S_{k+1} A_k - (M_k^0)^\top H_k^{-1} L_k,$$
$$R_k^i = R_{k+1}^{i-1} A_k - (M_k^i)^\top H_k^{-1} L_k, \qquad i = 1, \ldots, \min\{K-k,\,l\}-1,$$
$$R_k^{00} = (B_k^\tau)^\top S_{k+1} B_k^\tau + \sum_{i=1}^p (C_{i,k}^\tau)^\top S_{k+1} C_{i,k}^\tau - (M_k^0)^\top H_k^{-1} M_k^0,$$
$$R_k^{i0} = (R_k^{0i})^\top = R_{k+1}^{i-1} B_k^\tau - (M_k^i)^\top H_k^{-1} M_k^0, \qquad i = 1, \ldots, \min\{K-k,\,l\}-1,$$
$$R_k^{ij} = R_{k+1}^{(i-1)(j-1)} - (M_k^i)^\top H_k^{-1} M_k^j, \qquad i, j = 1, \ldots, \min\{K-k,\,l\}-1. \qquad (8)$$

Proof. We prove that (6) and (7) hold by induction. First, for $k = K-1$, from (5) and (4) we know that
$$J(K-1) = E_{K-1}\Big\{ \sum_{i=K-1}^{K} \mathrm{cost}_i \Big\} = \mathrm{cost}_{K-1} + E_{K-1}\{J(K)\} \quad \text{and} \quad J(K) = s_K + 2\,\mathbf s_K^\top x_K + x_K^\top S_K\, x_K$$
with the coefficients
$$s_K = d_K = (\bar x(t_f) - x_{t_f})^\top P_{t_f} (\bar x(t_f) - x_{t_f}), \qquad \mathbf s_K = \mathbf d_K = P_{t_f}(\bar x(t_f) - x_{t_f}), \qquad S_K = D_K = P_{t_f}. \qquad (9)$$

Substituting the state equation (3) into the criterion $J(K-1)$, it gives
$$\begin{aligned} J(K-1) ={} & d_{K-1} + 2\,\mathbf d_{K-1}^\top x_{K-1} + x_{K-1}^\top D_{K-1}\, x_{K-1} + 2\,e_{K-1}^\top u_{K-1} + u_{K-1}^\top E_{K-1}\, u_{K-1} \\ & + E_{K-1}\big\{ s_K + 2\,\mathbf s_K^\top \big( A_{K-1} x_{K-1} + B_{K-1} u_{K-1} + B_{K-1}^\tau u_{K-1-l} + C_{K-1}\,\varepsilon_{K-1} \big) \\ & \quad + \big( A_{K-1} x_{K-1} + B_{K-1} u_{K-1} + B_{K-1}^\tau u_{K-1-l} \big)^\top S_K \big( A_{K-1} x_{K-1} + B_{K-1} u_{K-1} + B_{K-1}^\tau u_{K-1-l} \big) \\ & \quad + \big( C_{K-1}\,\varepsilon_{K-1} \big)^\top S_K \big( C_{K-1}\,\varepsilon_{K-1} \big) \big\}, \end{aligned}$$
where $C_{K-1} = C_{K-1}(u_{K-1}, u_{K-1-l})$; the cross terms between the deterministic part and the noise vanish in expectation because $E\{\varepsilon_{K-1}\} = 0$.

Using the fact that $\mathrm{trace}(UV) = \mathrm{trace}(VU)$ and $\varepsilon_k \sim \mathcal N(0, I_p)$, we have
$$E\big\{ (C_{K-1}\varepsilon_{K-1})^\top S_K (C_{K-1}\varepsilon_{K-1}) \big\} = \sum_{i=1}^{p} \big( c_{i,K-1} + C_{i,K-1} u_{K-1} + C_{i,K-1}^\tau u_{K-1-l} \big)^\top S_K \big( c_{i,K-1} + C_{i,K-1} u_{K-1} + C_{i,K-1}^\tau u_{K-1-l} \big).$$

Collecting the terms in $u_{K-1}$ then gives
$$J(K-1) = u_{K-1}^\top H_{K-1}\, u_{K-1} + 2\,u_{K-1}^\top G_{K-1} + g_{K-1}, \qquad (10)$$
where
$$H_{K-1} = E_{K-1} + B_{K-1}^\top S_K B_{K-1} + \sum_{i=1}^p C_{i,K-1}^\top S_K C_{i,K-1}, \qquad G_{K-1} = I_{K-1} + L_{K-1}\, x_{K-1} + M_{K-1}^0\, u_{K-1-l},$$
$$I_{K-1} = e_{K-1} + B_{K-1}^\top \mathbf s_K + \sum_{i=1}^p C_{i,K-1}^\top S_K\, c_{i,K-1}, \qquad L_{K-1} = B_{K-1}^\top S_K A_{K-1},$$
$$M_{K-1}^0 = B_{K-1}^\top S_K B_{K-1}^\tau + \sum_{i=1}^p C_{i,K-1}^\top S_K C_{i,K-1}^\tau, \qquad (11)$$
$g_{K-1}$ collects the remaining terms, which do not involve $u_{K-1}$, and $H_{K-1}$ is a symmetric positive definite matrix (see Appendix).

When the optimal control laws $u_0, \ldots, u_{K-2}$ are given, the well-known Bellman dynamic programming principle means that
$$\min_{u_0, \ldots, u_{K-1}} J(0) = \min_{u_0, \ldots, u_{K-2}} E\Big\{ \sum_{i=0}^{K-2} \mathrm{cost}_i + \min_{u_{K-1}} J(K-1) \Big\},$$
so we must find the optimal control $u_{K-1}$ that minimizes $J(K-1)$ in the LQG system. Based on Pontryagin's maximum principle, the optimal control law can be written as the linear expression
$$u_{K-1} = -H_{K-1}^{-1} G_{K-1} = -H_{K-1}^{-1}\big( I_{K-1} + L_{K-1}\, x_{K-1} + M_{K-1}^0\, u_{K-1-l} \big), \qquad (12)$$
which shows that (6) holds for $k = K-1$. Substituting (12) into (10) yields
$$J(K-1) = s_{K-1} + 2\,\mathbf s_{K-1}^\top x_{K-1} + x_{K-1}^\top S_{K-1}\, x_{K-1} + 2\,u_{K-1-l}^\top \mathbf r_{K-1}^0 + 2\,u_{K-1-l}^\top R_{K-1}^0\, x_{K-1} + u_{K-1-l}^\top R_{K-1}^{00}\, u_{K-1-l},$$
where
$$s_{K-1} = d_{K-1} + s_K + \sum_{i=1}^p c_{i,K-1}^\top S_K\, c_{i,K-1} - I_{K-1}^\top H_{K-1}^{-1} I_{K-1}, \qquad \mathbf s_{K-1} = \mathbf d_{K-1} + A_{K-1}^\top \mathbf s_K - L_{K-1}^\top H_{K-1}^{-1} I_{K-1},$$
$$S_{K-1} = D_{K-1} + A_{K-1}^\top S_K A_{K-1} - L_{K-1}^\top H_{K-1}^{-1} L_{K-1}, \qquad \mathbf r_{K-1}^0 = (B_{K-1}^\tau)^\top \mathbf s_K + \sum_{i=1}^p (C_{i,K-1}^\tau)^\top S_K\, c_{i,K-1} - (M_{K-1}^0)^\top H_{K-1}^{-1} I_{K-1},$$
$$R_{K-1}^0 = (B_{K-1}^\tau)^\top S_K A_{K-1} - (M_{K-1}^0)^\top H_{K-1}^{-1} L_{K-1}, \qquad R_{K-1}^{00} = (B_{K-1}^\tau)^\top S_K B_{K-1}^\tau + \sum_{i=1}^p (C_{i,K-1}^\tau)^\top S_K C_{i,K-1}^\tau - (M_{K-1}^0)^\top H_{K-1}^{-1} M_{K-1}^0. \qquad (13)$$

So (6) and (7) hold for $k = K-1$. Now assume that (6) and (7) hold at time step $k+1$. We prove that they hold at time step $k$ in two cases.

Case 1: When $k \in (K-l-1,\; K-1]$, we have $\min\{K-k-1,\, l\} = K-k-1$, so (7) at step $k+1$ reads
$$J(k+1) = s_{k+1} + 2\,\mathbf s_{k+1}^\top x_{k+1} + x_{k+1}^\top S_{k+1}\, x_{k+1} + 2 \sum_{i=0}^{K-k-2} u_{k+1+i-l}^\top\, \mathbf r_{k+1}^i + 2 \sum_{i=0}^{K-k-2} u_{k+1+i-l}^\top R_{k+1}^i\, x_{k+1} + \sum_{i,j=0}^{K-k-2} u_{k+1+i-l}^\top R_{k+1}^{ij}\, u_{k+1+j-l}.$$
Substituting (3) into the cost function $J(k) = E_k\{\mathrm{cost}_k + J(k+1)\}$ and proceeding as before gives
$$J(k) = u_k^\top H_k\, u_k + 2\,u_k^\top G_k + g_k, \qquad (14)$$
where
$$H_k = E_k + B_k^\top S_{k+1} B_k + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}, \qquad G_k = I_k + L_k\, x_k + \sum_{i=0}^{K-k-1} M_k^i\, u_{k+i-l},$$
$$I_k = e_k + B_k^\top \mathbf s_{k+1} + \sum_{i=1}^p C_{i,k}^\top S_{k+1}\, c_{i,k}, \qquad L_k = B_k^\top S_{k+1} A_k,$$
$$M_k^0 = B_k^\top S_{k+1} B_k^\tau + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}^\tau, \qquad M_k^i = B_k^\top (R_{k+1}^{i-1})^\top, \quad i = 1, \ldots, K-k-1, \qquad (15)$$
$g_k$ collects the terms independent of $u_k$, and $H_k$ is a positive definite matrix (see Appendix). Pontryagin's maximum principle tells us that the feedback optimal control law is
$$u_k = -H_k^{-1} G_k = -H_k^{-1}\Big( I_k + L_k\, x_k + \sum_{i=0}^{K-k-1} M_k^i\, u_{k+i-l} \Big). \qquad (16)$$
Substituting it into the cost function (14), we have
$$J(k) = s_k + 2\,\mathbf s_k^\top x_k + x_k^\top S_k\, x_k + 2 \sum_{i=0}^{K-k-1} u_{k+i-l}^\top\, \mathbf r_k^i + 2 \sum_{i=0}^{K-k-1} u_{k+i-l}^\top R_k^i\, x_k + \sum_{i,j=0}^{K-k-1} u_{k+i-l}^\top R_k^{ij}\, u_{k+j-l}, \qquad (17)$$
where
$$s_k = d_k + s_{k+1} + \sum_{i=1}^p c_{i,k}^\top S_{k+1}\, c_{i,k} - I_k^\top H_k^{-1} I_k, \qquad \mathbf s_k = \mathbf d_k + A_k^\top \mathbf s_{k+1} - L_k^\top H_k^{-1} I_k,$$
$$S_k = D_k + A_k^\top S_{k+1} A_k - L_k^\top H_k^{-1} L_k, \qquad \mathbf r_k^0 = (B_k^\tau)^\top \mathbf s_{k+1} + \sum_{i=1}^p (C_{i,k}^\tau)^\top S_{k+1}\, c_{i,k} - (M_k^0)^\top H_k^{-1} I_k,$$
$$\mathbf r_k^i = \mathbf r_{k+1}^{i-1} - (M_k^i)^\top H_k^{-1} I_k, \qquad R_k^0 = (B_k^\tau)^\top S_{k+1} A_k - (M_k^0)^\top H_k^{-1} L_k, \qquad R_k^i = R_{k+1}^{i-1} A_k - (M_k^i)^\top H_k^{-1} L_k,$$
$$R_k^{00} = (B_k^\tau)^\top S_{k+1} B_k^\tau + \sum_{i=1}^p (C_{i,k}^\tau)^\top S_{k+1} C_{i,k}^\tau - (M_k^0)^\top H_k^{-1} M_k^0, \qquad R_k^{i0} = (R_k^{0i})^\top = R_{k+1}^{i-1} B_k^\tau - (M_k^i)^\top H_k^{-1} M_k^0,$$
$$R_k^{ij} = R_{k+1}^{(i-1)(j-1)} - (M_k^i)^\top H_k^{-1} M_k^j, \qquad i, j = 1, \ldots, K-k-1. \qquad (18)$$

Case 2: When $0 \le k \le K-l-1$, we have $\min\{K-k-1,\, l\} = l$, and the cost-to-go (7) at step $k+1$ can similarly be written as
$$J(k+1) = s_{k+1} + 2\,\mathbf s_{k+1}^\top x_{k+1} + x_{k+1}^\top S_{k+1}\, x_{k+1} + 2 \sum_{i=0}^{l-1} u_{k+1+i-l}^\top\, \mathbf r_{k+1}^i + 2 \sum_{i=0}^{l-1} u_{k+1+i-l}^\top R_{k+1}^i\, x_{k+1} + \sum_{i,j=0}^{l-1} u_{k+1+i-l}^\top R_{k+1}^{ij}\, u_{k+1+j-l}.$$
Note that now $u_k$ itself appears in $J(k+1)$, through the term with index $i = l-1$. Substituting (3) into $J(k) = E_k\{\mathrm{cost}_k + J(k+1)\}$ again gives the form (14), now with the coefficients
$$H_k = E_k + B_k^\top S_{k+1} B_k + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k} + R_{k+1}^{l-1} B_k + B_k^\top (R_{k+1}^{l-1})^\top + R_{k+1}^{(l-1)(l-1)},$$
$$I_k = e_k + B_k^\top \mathbf s_{k+1} + \sum_{i=1}^p C_{i,k}^\top S_{k+1}\, c_{i,k} + \mathbf r_{k+1}^{l-1}, \qquad L_k = B_k^\top S_{k+1} A_k + R_{k+1}^{l-1} A_k,$$
$$M_k^0 = B_k^\top S_{k+1} B_k^\tau + \sum_{i=1}^p C_{i,k}^\top S_{k+1} C_{i,k}^\tau + R_{k+1}^{l-1} B_k^\tau, \qquad M_k^i = B_k^\top (R_{k+1}^{i-1})^\top + R_{k+1}^{(l-1)(i-1)}, \quad i = 1, \ldots, l-1.$$
Based on Pontryagin's maximum principle, the feedback optimal control law can be written as
$$u_k = -H_k^{-1} G_k = -H_k^{-1}\Big( I_k + L_k\, x_k + \sum_{i=0}^{l-1} M_k^i\, u_{k+i-l} \Big). \qquad (19)$$
Substituting it into the cost function (14), we have
$$J(k) = s_k + 2\,\mathbf s_k^\top x_k + x_k^\top S_k\, x_k + 2 \sum_{i=0}^{l-1} u_{k+i-l}^\top\, \mathbf r_k^i + 2 \sum_{i=0}^{l-1} u_{k+i-l}^\top R_k^i\, x_k + \sum_{i,j=0}^{l-1} u_{k+i-l}^\top R_k^{ij}\, u_{k+j-l}, \qquad (20)$$
where the coefficients $s_k, \mathbf s_k, S_k, \mathbf r_k^i, R_k^i, R_k^{ij}$, $i, j = 0, 1, \ldots, l-1$, are the same as those in (8). So (6) and (7) hold by induction, which completes the proof.

Appendix

We now prove that the matrix $H_k$ defined in (8) is positive definite. To better represent the solution, we switch to a matrix form of the criterion: (7) can be rewritten in the uniform matrix form
$$J(k) = z_k^\top \Psi_k\, z_k, \qquad z_k = \big( x_k^\top,\; u_{k-l}^\top,\; \ldots,\; u_{k-1}^\top,\; 1 \big)^\top, \qquad (21)$$
where the block matrix
$$\Psi_k = \begin{pmatrix} S_k & (R_k^0)^\top & \cdots & (R_k^{l-1})^\top & \mathbf s_k \\ R_k^0 & R_k^{00} & \cdots & R_k^{0(l-1)} & \mathbf r_k^0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ R_k^{l-1} & R_k^{(l-1)0} & \cdots & R_k^{(l-1)(l-1)} & \mathbf r_k^{l-1} \\ \mathbf s_k^\top & (\mathbf r_k^0)^\top & \cdots & (\mathbf r_k^{l-1})^\top & s_k \end{pmatrix},$$
the matrices $s_k, \mathbf s_k, S_k, \mathbf r_k^i, R_k^i, R_k^{ij}$, $i, j = 0, 1, \ldots, l-1$, are given by (8), and $\mathbf r_k^i = 0$, $R_k^i = 0$, $R_k^{ij} = 0$ for $i, j = K-k, \ldots, l-1$ when $k \in (K-l-1,\; K-1]$. By an induction parallel to the proof of Theorem 1, we prove that $\Psi_k$ is a non-negative definite symmetric matrix and $H_k$ is a positive definite symmetric matrix.

Base step. According to the proof of Theorem 1, (10) can be rewritten as
$$J(K-1) = z^\top \Theta_{K-1}\, z, \qquad z = \big( x_{K-1}^\top,\; u_{K-1}^\top,\; u_{K-1-l}^\top,\; 1 \big)^\top,$$
where $\Theta_{K-1}$ is the sum of three symmetric matrices: the stage-cost block built from $(D_{K-1}, \mathbf d_{K-1}, E_{K-1}, e_{K-1}, d_{K-1})$; the terminal block $\Lambda^\top \begin{pmatrix} S_K & \mathbf s_K \\ \mathbf s_K^\top & s_K \end{pmatrix} \Lambda$, where $\Lambda$ maps $z$ to $(x_K^\top, 1)^\top$ through the deterministic part of (3); and the noise block $\sum_{i=1}^p (\cdot)^\top S_K (\cdot)$ built from $c_{i,K-1}$, $C_{i,K-1}$, $C_{i,K-1}^\tau$.

By the definition of $D_k, \mathbf d_k, E_k, e_k, d_k$ in (4) and of $S_K, \mathbf s_K, s_K$ in (9), for $k = 0, 1, \ldots, K-1$ the stage-cost block factors as
$$\Delta t \begin{pmatrix} I & 0 & 0 & \bar x_k \end{pmatrix}^\top P(k\Delta t) \begin{pmatrix} I & 0 & 0 & \bar x_k \end{pmatrix} + \Delta t \begin{pmatrix} 0 & I & 0 & \bar u_k \end{pmatrix}^\top Q(k\Delta t) \begin{pmatrix} 0 & I & 0 & \bar u_k \end{pmatrix},$$
and the terminal block satisfies
$$\begin{pmatrix} S_K & \mathbf s_K \\ \mathbf s_K^\top & s_K \end{pmatrix} = \begin{pmatrix} I \\ (\bar x(t_f) - x_{t_f})^\top \end{pmatrix} P_{t_f} \begin{pmatrix} I & \bar x(t_f) - x_{t_f} \end{pmatrix};$$
both are non-negative definite, and the noise block is non-negative definite as a sum of Gram-type terms. Hence $\Theta_{K-1}$ is non-negative definite. Moreover,
$$H_{K-1} = E_{K-1} + B_{K-1}^\top S_K B_{K-1} + \sum_{i=1}^p C_{i,K-1}^\top S_K C_{i,K-1}$$
is a positive definite symmetric matrix, because $E_{K-1} = \Delta t\, Q((K-1)\Delta t)$ is positive definite and $S_K = P_{t_f}$ is non-negative definite. Substituting the feedback optimal control law (12) amounts to setting $z = \mathcal K_{K-1}\, z'$ with $z' = (x_{K-1}^\top,\; u_{K-1-l}^\top,\; 1)^\top$ and
$$\mathcal K_{K-1} = \begin{pmatrix} I & 0 & 0 \\ -H_{K-1}^{-1} L_{K-1} & -H_{K-1}^{-1} M_{K-1}^0 & -H_{K-1}^{-1} I_{K-1} \\ 0 & I & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
so that $J(K-1) = z'^\top \mathcal K_{K-1}^\top \Theta_{K-1} \mathcal K_{K-1}\, z'$. After padding with zero blocks for the remaining delayed controls ($\mathbf r_{K-1}^i = 0$, $R_{K-1}^i = 0$, $R_{K-1}^{ij} = 0$ for $i, j = 1, \ldots, l-1$), this is exactly (21) for $k = K-1$, and $\Psi_{K-1}$ is non-negative definite because $\Theta_{K-1}$ is.

Induction step. Assume that the matrix $\Psi_{k+1}$ defined in (21) is non-negative definite. Substituting (3) into the cost function $J(k) = E_k\{\mathrm{cost}_k + J(k+1)\}$ yields
$$J(k) = w^\top \Theta_k\, w, \qquad w = \big( x_k^\top,\; u_k^\top,\; u_{k-l}^\top,\; \ldots,\; u_{k-1}^\top,\; 1 \big)^\top, \qquad (22)$$
where $\Theta_k$ is again the sum of the (non-negative definite) stage-cost block, the block $\Lambda_k^\top \Psi_{k+1} \Lambda_k$ obtained from the deterministic part of (3), with $\Lambda_k$ mapping $w$ to $(x_{k+1}^\top,\; u_{k+1-l}^\top,\; \ldots,\; u_k^\top,\; 1)^\top$, and the (non-negative definite) noise block; hence $\Theta_k$ is non-negative definite. The diagonal block of $\Theta_k$ corresponding to $u_k$ is exactly $H_k$ from (8), and it is positive definite because $E_k$ is positive definite and the remaining contributions are non-negative definite. Substituting the feedback optimal control law $u_k = -H_k^{-1}\big( I_k + L_k x_k + \sum_i M_k^i u_{k+i-l} \big)$ corresponds to setting $w = \mathcal K_k\, w'$ with $w' = (x_k^\top,\; u_{k-l}^\top,\; \ldots,\; u_{k-1}^\top,\; 1)^\top$, so
$$\Psi_k = \mathcal K_k^\top \Theta_k\, \mathcal K_k$$
is non-negative definite; this relation is equivalent to the recursions (8). In particular, when $k \in (K-l-1,\; K-1]$, the coefficients satisfy $\mathbf r_{k+1}^i = 0$, $R_{k+1}^i = 0$, $R_{k+1}^{ij} = 0$ for $i, j = K-k-1, \ldots, l-1$, and a direct calculation from (8) then shows that $\mathbf r_k^i = 0$, $R_k^i = 0$, $R_k^{ij} = 0$ for $i, j = K-k, \ldots, l-1$, as claimed.

References

[1] Y. Uno, M. Kawato, R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement. Biological Cybernetics, 1989, 61(2): 89-101.
[2] C. M. Harris, D. M. Wolpert. Signal-dependent noise determines motor planning. Nature, 1998, 394(6695): 780-784.
[3] E. Todorov, M. I. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 2002, 5(11): 1226-1235.
[4] D. Beamish. Fifty years later: a neurodynamic explanation of Fitts' law. Journal of The Royal Society Interface, 2006.
[5] W. Li, E. Todorov. Iterative linear quadratic regulator design for nonlinear biological movement systems. In: 1st International Conference on Informatics in Control, Automation and Robotics, 2005.
[6] K. J. Astrom. Introduction to Stochastic Control Theory. New York: Academic Press, 1970.