Feasible Region-based Identification Using Duality (Extended Version)
FFeasible Region-based Identification Using Duality(Extended Version)
Jaskaran Grover , Changliu Liu , Katia Sycara ∗ Abstract
We consider the problem of estimating bounds on parameters representing tasks being performed byindividual robots in a multirobot system. In our previous work, we derived necessary conditions based onpersistency of excitation analysis for exact identification of these parameters. We concluded that depend-ing on the robot’s task, the dynamics of individual robots may fail to satisfy these conditions, therebypreventing exact inference. As an extension to that work, this paper focuses on estimating bounds ontask parameters when such conditions are not satisfied. Each robot in the team uses optimization-basedcontrollers for mediating between task satisfaction and collision avoidance. We use KKT conditionsof this optimization and SVD of active collision avoidance constraints to derive explicit relations be-tween Lagrange multipliers, robot dynamics and task parameters. Using these relations, we are ableto derive bounds on each robot’s task parameters. Through numerical simulations we show how ourproposed region based identification approach generates feasible regions for parameters when a conven-tional estimator such as a UKF fails. Additionally, empirical evidence shows that this approach generatescontracting sets which converge to the true parameters much faster than the rate at which a UKF basedestimate converges. Videos of these results are available at https://bit.ly/2JDMgeJ
Task-based motion planning and control for multiple robots has several applications including search andrescue [1], sensor coverage [2] and environmental exploration [3]. The resiliency and robustness that can beachieved by multiple robots is superior to that achievable by one [4], [5]. While controlling these robots canbe treated as a forward problem, in this paper we consider the inverse problem of task inference [6], whichhas several applications of its own. For example, analyzing how easy it is to infer tasks being performedby a team of robots by simply observing them can reveal vulnerabilities of that robot team to adversarialattacks. This analysis can be used to design privacy preserving controllers to make the team resilient toattacks. In this paper, we take the view of an observer monitoring a team of robots in which each robot isperforming some task. The question we want to address is how can the observer compute reasonable bounds on the parameters representing the task of each robot by measuring their positions over time.The approach to infer parameters by estimating bounds is different from inferring them using point-wiseestimation algorithms such as an Unscented Kalman Filter (UKF) [7] or adaptive observers [8],[9],[10]. Thesealgorithms are point-wise estimators because they generate a single estimate of the parameter that convergesto the true parameter. Their convergence, however, is only guaranteed when the system dynamics satisfythe persistency of excitation condition [11]. When an observer has the freedom to command inputs to thesystem, he can design sufficiently rich inputs that will ensure convergence of these estimators to the trueparameters of the system. This open-loop identification has been used for identifying physical parametersof manipulators [12] and quadrotors [13].By contrast, our scenario is different from these systems because of two reasons. Firstly, our observercan only observe the robots but not intervene physically. Hence the parameter identification must occur ina closed-loop way. Secondly, while manipulators and quadrotors are monolithic, our “plant” is comprised ofmultiple robots. In our previous work [14], we derived necessary identifiability conditions for this system andshowed that multiple active interactions between robots leads to violation of the persistency of excitationcondition. Since interactions between robots are inevitable, point-wise parameter estimation algorithmswould fail to converge owing to the violation of this condition.This takeaway from our previous work forms the motivation for this paper. That is, while it may not bepossible to get an exact estimate of the task parameter of a robot, we want to determine if we can estimateany bounds for it. To address this problem, we propose a feasible-region based parameter identification algorithm that generates a bounded set where the true task parameter of each robot must belong. Further,we design this algorithm in a way such that the “measure” of this set ( i.e. its size) is non-increasing with time.There are several advantages of our proposed identification approach over point-wise estimation. Firstly, thisalgorithm is anytime i.e. the set computed at a given time is comprised of parameters that are all validcandidates for explaining the robot measurements observed until that time. Secondly, this algorithm doesnot have any gains to tune unlike Kalman filters or adaptive observers [10] where tuning relies on user’sexperience. Thirdly, this algorithm can also work symbiotically with point-wise estimation algorithms andcan expedite their convergence by continually projecting their estimates to the feasible regions generated bythis algorithm. Finally, our empirical evidence suggests that even in cases where point-wise estimation ispossible, region-based estimation converges much faster than a point-wise estimator (we have used a UKFfor comparison). ∗ J. Grover, C. Liu, K. Sycara are with the Robotics Institute at Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh,PA 15213, USA. { jaskarag,cliu6,sycara } @andrew.cmu.edu a r X i v : . [ c s . R O ] N ov he outline of this paper is as follows. In section 2, we give a brief review of point-wise parameterestimation following the development in [14, 15] and establish notation for our proposed feasible region-based identification approach. In section 3, we discuss how prior work [16] has addressed the multirobot tasksatisfaction problem using optimization-based controllers. Although prior work has mostly considered thetask of making robots reach some goals, we make the formulation abstract to allow for arbitrary tasks. Weformalize the feasible region-based task identification problem in section 4. The main contributions beginfrom section 5 where we derive the KKT conditions of the control-synthesis optimization. These conditionsallow us to pose an equality constrained optimization problem (EQP) formulated using the set of activeinteractions ( i.e. active constraints) of the ego robot. In section 6, we classify this robot’s dynamics basedon the number of constraints in this EQP, and linear independence relations amongst these constraints.Taking the SVD of these constraints allows us to derive explicit relations between robot dynamics, Lagrangemultipliers and task parameters. Finally, by using these relations along with the KKT conditions, we deriveanalytical descriptions of the sets where the task parameters must belong as we wanted. We demonstrate thepower of this identification approach through numerical simulations in section 7. In these simulations, weconsider geometric settings that span all combinations of number of active constraints and linear indepen-dence relations we theorized in section 6. For these cases, we show through simulations how the bounds ontask parameters computed by our approach converge much faster to the true parameter than an UnscentedKalman Filter. We conclude in 8 by summarizing and provide directions for future work. We describe two different approaches for parameter identification. These include (a) Point-wise identificationand (b) Feasible-region based identification. The need for distinguishing between these two approaches isnecessitated by our application i.e. task inference for multirobot systems. For these systems, we willdemonstrate that approach (a) which is the de-facto standard for parameter identification, can often failto converge owing to unavoidable interactions between robots. In such circumstances, approach (b) on theother hand, can provide reasonable bounds on task parameters, if not their exact values.
The point-wise identification approach focuses on designing an online parameter update law which ensuresthat the estimate of the parameter converges to the true parameter of the underlying system. Severalknown estimation algorithms fall in this category, including RLS [17], UKF, EKF [7] and the ones based onadaptive observers [15, 9, 8]. Adaptive observer based estimators provide conditions based on the persistencyof excitation criterion [18, 19] under which convergence to the true parameters is achieved. To formalizepoint-wise identification, consider a nonlinear system as follows˙ x = G ( x ) θ + f ( x ) , (1)where x ∈ R n is the measurable state, θ ∈ R p is the unknown parameter and G ( x ) : R n −→ R n × p , f ( x ) : R n −→ R n are known functions. For example, in our context, x ( t ) will correspond to the position ofthe ego robot under observation and θ denotes the task parameters of its controller that the observer wishesto infer. We can safely assume that the observer runs several parallel estimators, one for estimating theparameters of each robot, so the observer’s focus is on the ego robot. Adhering to the pointwise identification paradigm, the observer’s problem is to design an estimation law ˙ˆ θ = ψ (ˆ θ , x ) that guarantees convergence ofˆ θ −→ θ by using x ( t ) over some t ∈ [0 , T ] where T is large enough. This approach computes a set in the parameter space where the parameter of the underlying system mustbelong. The observer may use some measurements y ( t ) and compute a set Θ ( t ) ⊂ R p such that1. Θ ( t ) is bounded.2. θ ∈ Θ ( t ) ∀ t .3. The “measure” of this set, denoted by Vol ( Θ ( t )) is non-increasing with time.Ideally, if condition (2) is satisfied and condition (3) is replaced with a strict decrease in Vol ( Θ ( t )), it isguaranteed that lim t →∞ Θ ( t ) = θ . However, the relaxed condition (3) i.e. Vol ( Θ ( t )) is non-increasing iseasier to ensure in practice and will be our focus. One way to define a set that satisfies these three conditionsis as follows. Suppose at a given time t , we deduce that θ ∈ Ω ( t ) where Ω ( t ) := { ˆ θ ∈ R p | g ( y ( t ) , ˆ θ ) ≺ } , (2)for some function g ( y ( t ) , ˆ θ ) that depends on the observer’s measurements y ( t ). Note that Ω ( t ) need notbe bounded, that would depend on how g ( y ( t ) , ˆ θ ) is defined. Further, suppose Θ ⊂ R p is a known time-invariant compact set in which θ is known to belong a-priori. Then defining Θ ( t ) := (cid:92) ≤ τ ≤ t Ω ( τ ) ∩ Θ (3)will ensure satisfaction of the three conditions proposed above. Indeed, let t and t be two time instants suchthat t < t , then from (3) it is evident that Θ ( t ) ⊆ Θ ( t ). This would in-turn imply that Vol ( Θ ( t )) ≤ ol ( Θ ( t )) or in other words, Vol ( Θ ( t )) non-increasing. Since Θ ( t ) is derived from the set Ω ( t ), it sufficesto focus on constructing the instantaneous feasible set Ω ( t ) i.e. deriving g ( y ( t ) , ˆ θ ).In the rest of the paper, our goal will be to design a feasible-region based identification method forinferring task parameters of a multirobot system. To construct the the instantaneous feasible set Ω ( t ),we need to define functions g ( y ( t ) , ˆ θ ) needed in (2). This requires some assumptions on the controllersthat the robots use to perform their tasks (parametrized by θ ) and the resulting dynamics of the robots.The most basic of these assumptions is safety i.e. we assume that all robots have an underlying collisionavoidance mechanism to ensure collision-free motions while they perform their tasks. To mediate betweentask satisfaction and safety, a common approach is to use reactive optimization-based controllers as hasbeen demonstrated in [16, 20]. Throughout the rest of the paper, we will assume that the robots use suchcontrollers. The identification framework we develop assumes Control Barrier Function based QPs (CBF-QPs) as the underlying controllers, just to illustrate the mechanics of constructing Ω ( t ). However, ourapproach can be used to construct this set for any other type of optimization-based controller as well [21]. We give a brief summary of CBF-QPs here, referring the reader to [16] for a detailed treatment. Let therebe a total of M + 1 robots in the system. From the perspective of an ego robot, the remaining M robotsare “cooperative obstacles” who share the responsibility of avoiding collisions with the ego robot, whileperforming their own respective tasks. In the following discussion, the focus is on the ego robot. This robotfollows single-integrator dynamics i.e. ˙ x = u , (4)where x = ( p x , p y ) ∈ R is its position and u ∈ R is its velocity ( i.e. the control input). Assume that thisrobot can use a nominal controller ˆ u ( x ) = C ( x ) θ + d ( x ) that guarantees satisfaction of the task when thereis no other robot in the system. Here, θ is the task-parameter the observer wishes to infer, and C ( x ) , d ( x )are appropriate functions. As an example, if the ego robot’s task is to reach a goal at x d , the ego robot canuse ˆ u ( x ) = − k p ( x − x d ), a proportional controller that guarantees x → x d . If the observer’s problem is toinfer the goal x d , then defining C ( x ) = k p , d ( x ) = − k p x and θ = x d gives C ( x ) θ + d ( x ) = − k p ( x − x d ).Now although ˆ u ( x ) by itself ensures task completion, the robots must also avoid collisions amongstthemselves. To that end, the ego robot must maintain a distance of at-least D s with all the other robotslocated at { x oj } Mj =1 . That is the their positions ( x , x oj ) must satisfy (cid:107) ∆ x j (cid:107) ≥ D s where ∆ x j := x − x oj .To combine the collision avoidance requirement with the task-satisfaction objective, the ego robot solves aQP that computes a controller closest to the nominal task control ˆ u ( x ) and satisfies M collision avoidanceconstraints as follows: u ∗ = arg min u (cid:107) u − ˆ u ( x ) (cid:107) subject to A ( x ) u ≤ b ( x ) (5)Here A ( x ) ∈ R M × , b ( x ) ∈ R M are defined such that the j th row of A is a Tj and the j th element of b is b j : a Tj ( x ) := − ∆ x Tj = − ( x − x oj ) T b j ( x ) := γ (cid:107) ∆ x j (cid:107) − D s ) ∀ j ∈ { , , . . . , M } (6)The ego robot solves this QP at every time step to determine its control u ∗ , which ensures collision avoidancewhile encouraging satisfaction of the task parametrized by θ .Aside from its position x , the ego robot’s control u ∗ depends on task parameter θ . This dependenceis implicitly encoded through the cost function of (5) (recall ˆ u ( x ) = C ( x ) θ + d ( x )). To highlight thisdependence, we denote the control as u ∗ θ ( x ) where θ is the unknown parameter the observer aims to identify.Next, we formulate the problem that the observer seeks to solve. The observer’s problem is to identify a region in the parameter space consistent with conditions (1) and (2)of the feasible-region based identification approach proposed in Sec. 2. That is, the observer must identifya set Θ ( t ) ⊂ R p where θ must lie ∀ t ∈ [0 , T ] such that its volume is non-increasing over this duration.Following the analysis in subsec. 2.2, it suffices to estimate the instantaneous feasible set Ω ( t ) (2) as the0-sublevel set of a function g ( y ( t ) , ˆ θ ). The measurements y ( t ) (2) that the observer can use to computethis set include (a) the positions of the ego robot x ( t ) and (b) the positions of the “cooperative obstacles” x oj ( t ) ∀ j ∈ { , , · · · , M } over t ∈ [0 , T ] i.e. y ( t ) = ( x ( t ) , x o ( t ) , · · · , x oM ( t )). The observer will run M + 1parallel set-estimators to compute { Θ i ( t ) } M +1 i =1 for each robot in the team since each robot has its uniquetask parameter θ i .Focusing on the ego robot, the observer needs to derive functions g ( y ( t ) , ˆ θ ) needed in (2) to compute Ω ( t ). For that, the observer must know an explicit relation between the dynamics of the ego robot and theparameters θ of the form ˙ x = G ( x ) θ + f ( x ) similar to (1). That is, it must know G ( x ) and f ( x ). However,owing to the fact that ˙ x = u ∗ θ ( x ) is optimization-based (5), such explicit relations are not known. It ispossible to derive these relations by focusing on the KKT conditions of (5). We derive these conditions inthe next section using which the observer will be able to compute the set Ω ( t ). Before proceeding, let’s stateall the assumptions on the observer’s knowledge: 3 ssumption 1. The observer can measure both the position x ( t ) and velocity (i.e. the control u ∗ θ ( x ( t )) ofthe ego robot. Assumption 2.
The observer knows the form of safety constraints A ( x ) , b ( x ) in (5) and that the costfunction is of the form (cid:107) u − ˆ u ( x ) (cid:107) . Assumption 3.
The observer knows the task functions C ( x ) , d ( x ) of ˆ u ( x ) = C ( x ) θ + d ( x )Assumption 1 is not restrictive because positions are easily measurable and velocities can be obtainedthrough numerical differentiation of positions. Assumption 2 is needed since we are interested in derivingexpressions of g ( y ( t ) , ˆ θ ) and that requires knowledge of the underlying control synthesis approach. Assump-tions 3 is not restrictive in practice because the observer can hypothesize on the high-level task by observingthe robots and query a library of task-to-function mappings for guessing C ( x ) , d ( x ). To analyze the relation between the optimizer of (5) i.e. u ∗ θ ( x ) and parameters θ , we look at the KKTconditions of this QP. These are necessary and sufficient conditions satisfied by u ∗ θ ( x ). The Lagrangian for(5) is L ( u , µ ) = (cid:107) u − ˆ u (cid:107) + µ T ( A u − b ) . Let ( u ∗ θ , µ ∗ θ ) be the optimal primal-dual solution to (5). The KKT conditions are [22]:1. Stationarity: ∇ u L ( u , µ ) | ( u ∗ θ , µ ∗ θ ) = 0,= ⇒ u ∗ θ = ˆ u − (cid:88) j ∈{ , ··· ,M } µ ∗ j θ a j = ˆ u − A T µ ∗ θ . (7)2. Primal Feasibility A u ∗ θ ≤ b ⇐⇒ a Tj u ∗ θ ≤ b j ∀ j ∈ { , · · · , M } . (8)3. Dual Feasibility µ ∗ j θ ≥ ∀ j ∈ { , , · · · , M } . (9)4. Complementary Slackness µ ∗ j θ · ( a Tj u ∗ θ − b j ) = 0 ∀ j ∈ { , , · · · , M } . (10)We define the set of active and inactive constraints as A ( u ∗ θ ) := { j ∈ { , , · · · , M } | a Tj u ∗ θ = b j } , (11) IA ( u ∗ θ ) := { j ∈ { , , · · · , M } | a Tj u ∗ θ < b j } . (12)The set of active constraints qualitatively represents those robots that the ego robot “worries” about forcollisions. From the perspective of the ego robot, we simply refer to the “other robots” as obstacles . Now we briefly describe how these conditions are useful for the computing the instantaneous feasible set where the parameter θ can lie i.e. the set Ω ( t ) defined in (2).1. Inactive constraints:
From the definition of inactive constraints (11), recall a Tj u ∗ θ < b j ∀ j ∈ IA ( u ∗ θ ).Thus, deriving an explicit expression for u ∗ θ = G ( x ) θ + f ( x ) will allow us to prune regions in theparameter where θ belongs.2. Non-negativity of Lagrange multipliers:
Likewise from dual feasibility (9), recall that µ ∗ j θ ≥ ∀ j ∈ { , , · · · , M } . Since µ ∗ j θ depends on θ , deriving an explicit expression for µ ∗ j θ as a function of θ will allow us to further prune the region where θ belongs.Thus we define the instantaneous feasible set Ω ( t ) as Ω ( t ) := { ˆ θ ∈ R p | a Tj ( t ) u ∗ θ ( t ) − b j ( t ) < ∀ j ∈ IA ( u ∗ θ ( t )) , − µ ∗ j θ ( t ) ≤ ∀ j ∈ { , · · · , M }} (13)4able 1: Enumerating cases where either point-wise identification or region-based identification or both arepossible Case Number of activeconstraints rank ( A ac ) Point-wise identifi-cation Region-basedidentification
A None - Possible PossibleB 1 1 Possible PossibleC ≥ ≥ u ∗ θ and µ ∗ j θ The following discussion summarizes how we can simplify problem (5) to a simpler problem so as to deriveexplicit expressions for both u ∗ θ and µ ∗ j θ . Let there be a total of K active constraints i.e. card ( A ( u ∗ θ )) = K where K ∈ { , , · · · , M } . Using (9) and (10), we deduce µ ∗ j θ = 0 ∀ j ∈ IA ( u ∗ θ ) . (14)Therefore, we can restrict the summation in (7) only to the set of active constraints i.e. u ∗ θ = ˆ u − (cid:88) j ∈A ( u ∗ θ ) µ ∗ j θ a j = ˆ u − A Tac µ ac θ . (15)where A ac ( x ) ∈ R K × is the matrix formed using the rows of A that are indexed by the active set A ( u ∗ θ ),and likewise µ ac θ := { µ ∗ j θ } j ∈A ( u ∗ θ ) . Similarly, let b ac ( x ) ∈ R K denote the vector formed from the elements of b indexed by A ( u ∗ θ ). Likewise, we can define A inac ( x ) and b inac ( x ) corresponding to the inactive set. Bydeleting all inactive constraints and retaining only the active constraints from (5), we can pose another QPthat consists only of active constraints, whose solution is the same as that of (5). This equality-constrainedprogram (EQP) is given by u ∗ = arg min u (cid:107) u − ˆ u ( x ) (cid:107) subject to A ac ( x ) u = b ac ( x ) (16)The system A ac ( x ) u = b ac ( x ) is always consistent by construction (11), as long as a solution u ∗ θ to (5)exists. This EQP is useful because it is easier to derive an expression u ∗ θ ( x ) = G ( x ) θ + f ( x ) for (16) thanthe inequality constrained problem (5). The question is how can the observer estimate the active set A ( u ∗ θ )to determine A ac ( x ) , b ac ( x ) for (16). This can be done as follows. From Assumption 1, recall that theobserver can measure both the position x ( t ) and velocity u ∗ θ ( x ( t )) of the robot. Using these, the observercan determine A ( u ∗ θ ) by comparing residuals | a Tj ( x ) u ∗ θ − b j ( x ) | against a small threshold (cid:15) > A obs. := { j ∈ { , · · · , M } | | a Tj ( x ) u ∗ θ − b j ( x ) | < (cid:15) } (17)For a small threshold (cid:15) , it holds true that A obs. = A ( u ∗ θ ) consistent with (11). This allows the observer todetermine the active set. Next, we work with (16) to derive an explicit expression for control i.e. u ∗ θ ( x ) = G ( x ) θ + f ( x ) and µ ∗ j θ for various combinations of card ( A ( u ∗ θ )) = K and rank ( A ac ( x )). ( t ) using SVD of A ac ( x ) u = b ac ( x ) The aim of this section is to derive explicit expressions for the optimal control u ∗ θ and Lagrange multipliers µ ∗ j θ for region based estimation of θ . To derive these, we will bank on the EQP proposed in (16) as asurrogate to the original problem in (5). We can further simplify the analysis of (16) by considering differentcases that can arise based on the number of active constraints ( K ) and linear independence relations amongthe constraints governed by rank ( A ac ( x )). Table 1 summarizes these cases along with whether point-wiseestimation and feasible region-based estimation is possible when either of these arise in practice. Note thatthe last two rows where the number of active constraints are ≥ θ . As we will show, these in fact correspond tosituations that are most likely to arise in multirobot systems applications as well, highlighting the importanceof our proposed technique. i.e. K = 0 When no constraint is active, we have µ j θ = 0 ∀ j ∈ { , , · · · , M } . This means that A ( u ∗ θ ) = φ and IA ( u ∗ θ ) = { , , · · · , M } . In other words, the ego robot does not “worry” about collisions with any otherrobot. From (7) we get u ∗ θ = ˆ u ( x ) = C ( x ) θ + d ( x ). From this expression, it is evident that the control u ∗ θ and the robot dynamics ˙ x , exhibit a well-defined dependence on θ . Therefore, it is possible to do point-wiseidentification of θ using RLS, UKF or AO type of estimators (provided that C ( x ) is persistently exciting[19]). However, since point-wise identification is not under the purview of this paper, we focus on estimatingregions where θ can lie. Referring to 5.1, note that the condition for nonnegativity of Lagrange multipliers is5atisfied trivially because µ j θ = 0 ∀ j ∈ { , , · · · , M } , so they do not give any information about θ . However,the condition for inactive constraints gives ∀ j ∈ IA ( u ∗ θ ) = { , , · · · , M } a Tj u ∗ θ − b j < ⇒ a Tj ( C θ + d ) − b j < ⇒ ( a Tj C ) θ < b j − a Tj d (18)which represents a set of M halfspace constraints on θ . Recall from (6) that a j and b j depend on theposition of the ego robot i.e. x and the positions of other robots i.e. x oj ∀ j ∈ { , , · · · , M } . In (18), wehave omitted these dependencies to keep the notation light. Thus, the desired set Ω is defined using (6) and(13) as follows Ω := { ˆ θ ∈ R p | ( AC ) ˆ θ < b − A d } (19) i.e. K = 1 When exactly constraint is active i.e. K = 1, there is one obstacle that the ego robot “worries” about forcollision. Since there are two degrees of freedom in the control input, and one obstacle to avoid, the ego robotcan avoid this obstacle and additionally minimize (cid:107) u − ˆ u ( x ) (cid:107) with the remaining degree of freedom. Thiscauses u ∗ θ to exhibit a well-defined dependence on ˆ u ( x ) and by extension, on θ . This allows for point-wiseidentification of θ .Region-based identification is also possible and can be used to expedite the convergence of point-wiseidentification. To derive the feasible region where θ belongs, we need the expression for control u ∗ θ andthe Lagrange multiplier corresponding to the active constraint. Let i ∈ { , , · · · , M } denote the index ofthe active constraint. Thus, from (11), we have A ac ( x ) u ∗ θ = a Ti ( x ) u ∗ θ = b i ( x ) where A ac ( x ) := a Ti ( x ) and a Ti ( x ) , b i ( x ) are defined in (6). We use the SVD of A ac = U Σ V T . Defining U = 1Σ = (cid:2) Σ m , (cid:3) where Σ m = (cid:107) a i (cid:107) V = (cid:2) V , V (cid:3) V = a i (cid:107) a i (cid:107) , V = R π a i (cid:107) a i (cid:107) , (20)Since V forms a basis for R , any u can be expressed as u = (cid:2) V , V (cid:3) (cid:20) ˜ u ˜ u (cid:21) . = ⇒ a Ti u − b i = U (cid:2) Σ m , (cid:3) (cid:20) V T V T (cid:21) (cid:2) V , V (cid:3) (cid:20) ˜ u ˜ u (cid:21) − b i = U Σ m ˜ u + 0 · ˜ u − b i = 0 (21)Choosing ˜ u = Σ − m U T b i and ˜ u = ψ ∈ R , we find that u = V Σ − m U T b i + V ψ (22)satisfies a Ti ( x ) u = b i ( x ) ∀ ψ ∈ R . Recall from the properties of SVD that V forms a basis for N ( a Ti ( x )).We tune ψ to minimize (cid:107) u − ˆ u (cid:107) by solving the following unconstrained minimization problem ψ ∗ = arg min ψ (cid:107) u − ˆ u (cid:107) = arg min ψ (cid:13)(cid:13) V Σ − m U T b i + V ψ − ˆ u (cid:13)(cid:13) , (23)which gives ψ ∗ = V T ˆ u . Substituting this in (22), gives u ∗ θ = V Σ − m U T b i + V V T ˆ u . (24)This equation is the solution to (16) and by extension, to (5). Substituting ˆ u ( x ) = C ( x ) θ + d ( x ) in (24)gives u ∗ θ = V Σ − m U T b i + V V T ( C θ + d )= V V T C (cid:124) (cid:123)(cid:122) (cid:125) G θ + V Σ − m U T b i + V V T d (cid:124) (cid:123)(cid:122) (cid:125) f = G θ + f (25)From inactive constraints, we get ∀ j ∈ IA ( u ∗ θ ): a Tj u ∗ θ − b j < ⇒ a Tj ( G θ + f ) − b j < ⇒ ( a Tj G ) θ < b j − a Tj f = ⇒ ( A inac G ) θ ≺ b inac − A inac f , (26)6here A inac , b inac are the rows of A, b indexed by the inactive constraints IA ( u ∗ θ ). To get the Lagrangemultiplier µ ∗ i θ , we use (15) and (25), u ∗ θ = ˆ u − A Tac µ ∗ i θ = ⇒ A Tac µ ∗ i θ = ( C − G ) (cid:124) (cid:123)(cid:122) (cid:125) ˜ G θ + ( d − f ) (cid:124) (cid:123)(cid:122) (cid:125) ˜ f = ˜ G θ + ˜ f = ⇒ µ ∗ i θ = 2 A ac ( ˜ G θ + ˜ f ) (cid:107) A Tac (cid:107) (27)From non-negativity of µ ∗ i θ we get µ ∗ i θ ≥ ⇐⇒ − A ac ˜ G θ ≤ A ac ˜ f (28)Thus the desired set Ω where θ belongs is defined using (26) and (28) as follows Ω := { ˆ θ ∈ R p | ( A inac G ) ˆ θ ≺ b inac − A inac f , − A ac ˜ G ˆ θ ≤ A ac ˜ f } (29) ≤ K ≤ M and rank ( A ac ( x )) = 1 Let’s consider the more general case in which there is more than one constraint active, but all these con-straints are linearly dependent on one constraint among them. That is to say there is effectively only one“representative constraint”. Consequently, this case is similar to the case with just one active obstacle. Wenow derive expressions for both u ∗ θ and µ ac θ . Let i , i , · · · , i K ∈ { , · · · , M } be the indices of the K activeconstraints which satisfy a Ti ( x ) a Ti ( x )... a Ti K ( x ) u ∗ θ = b i ( x ) b i ( x )... b i K ( x ) or A ac ( x ) u ∗ θ = b ac ( x ) , (30)where a Ti j ( x ) and b i j ( x ) are defined using (6). Using the SVD of A ac (30), we can derive an expression for u ∗ θ ( x ). Let A ac = U Σ V T where U := (cid:2) U , U (cid:3) where, U ∈ R K × , U ∈ R K × K − Σ := (cid:20) Σ r × K − × K − × (cid:21) V := (cid:2) V , V (cid:3) where, V , V ∈ R . (31)Choosing u = V ˜ u + V ˜ u , from (30) we get A ac u − b ac = (cid:2) U , U (cid:3) (cid:20) Σ r
00 0 (cid:21) (cid:20) V T V T (cid:21) (cid:2) V , V (cid:3) (cid:20) ˜ u ˜ u (cid:21) − b ac = (cid:2) U , U (cid:3) (cid:18) (cid:20) Σ r ˜ u (cid:21) − (cid:20) U T b ac U T b ac (cid:21) (cid:19) . (32)Since (cid:107)(cid:107) is unitary invariant, from (30) (cid:107) A ac u − b ac (cid:107) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:20) Σ r ˜ u (cid:21) − (cid:20) U T b ac U T b ac (cid:21)(cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13) Σ r ˜ u − U T b ac (cid:13)(cid:13) + (cid:13)(cid:13) U T b ac (cid:13)(cid:13) . (33)The minimum norm is achieved for ˜ u = Σ − r U T b ac . Choosing ˜ u = ψ ∈ R , the “least-squares” solutions are u = V Σ − r U T b ac + V ψ. (34)Computing ψ by minimizing (cid:107) u − ˆ u (cid:107) , we get ψ ∗ = V T ˆ u which gives u ∗ θ = V Σ − r U T b ac + V V T ˆ u . (35)This control is the solution to (16) and by extension, to (5). Substituting ˆ u ( x ) = C ( x ) θ + d ( x ) in (35) gives u ∗ θ = V Σ − r U T b ac + V V T ( C θ + d )= V V T C (cid:124) (cid:123)(cid:122) (cid:125) G θ + V Σ − r U T b ac + V V T d (cid:124) (cid:123)(cid:122) (cid:125) f = G θ + f (36)Following the approach in (26), for the inactive constraints, we get:( A inac G ) θ ≺ b inac − A inac f (37)7o get the Lagrange multipliers µ ac θ , we use (15) and (36), u ∗ θ = ˆ u − A Tac µ ac θ = ⇒ A Tac µ ac θ = ( C − G ) θ + ( d − f )= ˜ G θ + ˜ f = ⇒ A Tac µ ac θ = 2( ˜ G θ + ˜ f )= ⇒ A Tac µ ac θ = w (38)We will call the right handside 2( ˜ G θ + ˜ f ) = w for convenience of notation. Here, given the fact that A Tac ∈ R × K and rank ( A Tac ) = 1, the Lagrange multipliers are underdetermined. We use the SVD of A Tac = ˜ U ˜Σ ˜ V T to deduce an expression for µ ac θ . Here˜ U = (cid:2) ˜ U , ˜ U (cid:3) where, ˜ U , ˜ U ∈ R ˜Σ = (cid:20) ˜Σ r × K − × × K − (cid:21) ˜ V = (cid:2) ˜ V , ˜ V (cid:3) where, ˜ V ∈ R K × , ˜ V ∈ R K × K − . (39)As before we represent, µ = ˜ V ˜ µ + ˜ V ˜ µ . Since (cid:107)(cid:107) is unitary invariant, from (38) (cid:13)(cid:13) A Tac µ − w (cid:13)(cid:13) = (cid:13)(cid:13)(cid:13)(cid:13)(cid:20) Σ r ˜ µ (cid:21) − (cid:20) ˜ U T w ˜ U T w (cid:21)(cid:13)(cid:13)(cid:13)(cid:13) = (cid:13)(cid:13)(cid:13) Σ r ˜ µ − ˜ U T w (cid:13)(cid:13)(cid:13) + (cid:13)(cid:13)(cid:13) ˜ U T w (cid:13)(cid:13)(cid:13) . (40)The minimum norm is achieved for ˜ µ = Σ − r ˜ U T w . Choosing ˜ µ = η ∈ R , the “least-squares” solutions are µ = V Σ − r ˜ U T w + V η . (41)Then using (38), we get µ ac θ = 2 ˜ V ˜Σ − r ˜ U T ( ˜ G θ + ˜ f ) + ˜ V η , (42)where η ∈ R K − are floating variables which can assume any value. From non-negativity of µ ac θ we get µ ac θ (cid:23) ⇐⇒ − V ˜Σ − r ˜ U T ˜ G θ − ˜ V η (cid:22) V ˜Σ − r ˜ U T ˜ f (43)Thus, the desired set Ω where θ belongs is defined using (37) and (43) as follows Ω := { ˆ θ ∈ R p | ( A inac G ) ˆ θ < b inac − A inac f , − V ˜Σ − r ˜ U T ˜ G ˆ θ − ˜ V η (cid:22) V ˜Σ − r ˜ U T ˜ f if ∃ η ∈ R K − } (44) K = 2 and rank ( A ac ( x )) = 2 Consider the case where there are exactly two constraints that are active and linearly independent. In thiscase, there are as many degrees of freedom in control as the number of independent active obstacles to avoid.Consequently, u ∗ θ is completely determined by the active constraints and hence does not depend on ˆ u . Weformally demonstrate this claim as follows. Let i , i ∈ { , , · · · , M } be the indices of the two constraintsthat are active. These constraints satisfy (30) except that here A ac ( x ) ∈ R × and rank ( A ac ( x )) = 2. Thisproblem is well-posed, its solution is given by A ac ( x ) u ∗ θ = b ac ( x )= ⇒ u ∗ θ = A − ac ( x ) b ac ( x ) (45)where the inverse exists because rank ( A ac ( x )) = 2. Since neither A − ac nor b ac depend on θ (6), u ∗ θ also doesnot depend on θ . Hence, point-wise identification is not possible. For feasible region-based identification,we cannot get information from inactive constraints because that is also contingent upon on u ∗ θ dependingon θ (subsec. 5.1). Nevertheless, non-negativity of Lagrange multipliers is still useful. Since constraints i and i are active, it follows from dual feasibility (9) and subsec. 5.1 that µ ∗ i θ ≥ µ ∗ i θ ≥
0. Define µ ac θ = ( µ ∗ i θ , µ ∗ i θ ), then from (15) and (45), we have u ∗ θ = ˆ u − A Tac µ ac θ = ⇒ A − ac b ac = ˆ u − A Tac µ ac θ = ⇒ µ ac θ = 2 A − Tac (ˆ u − A − ac b ac )= ⇒ µ ac θ = 2 A − Tac ( C θ + d − A − ac b ac )= ⇒ µ ac (cid:23) ⇐⇒ − A − Tac C θ (cid:22) A − Tac ( d − A − ac b ac ) (46)Thus the desired set Ω is defined using (13) as follows Ω := { ˆ θ ∈ R p | − A − Tac C ˆ θ (cid:22) A − Tac ( d − A − ac b ac ) } (47)where we have omitted the dependence on x and x oj to keep the notation light.8 .5 < K ≤ M and rank ( A ac ( x )) = 2 Finally, let’s consider the case where there are
K > u ∗ θ nor the robot dynamics depend on θ making point-wise identification infeasible.Let i , i , · · · , i K ∈ { , , · · · , M } be the indices of the active constraints. These constraints satisfy (30)except that here rank ( A ac ( x )) = 2. This problem is well-posed albeit overdetermined, its solution is givenby u ∗ θ = A † ac ( x ) b ac ( x ) (48)where A † ac denotes the Moore-Penrose pseudoinverse which exists because rank ( A ac ( x )) = 2. Since u ∗ θ isindependent of θ , point-wise identification is not possible. Likewise, for feasible region-based identification,we cannot get information from inactive constraints. Nevertheless, non-negativity of Lagrange multipliers isstill useful. Since constraints i , i , · · · , i K are active, it follows from dual feasibility (9) and subsec. 5.1 that µ ∗ i θ , · · · , µ ∗ i K θ ≥
0. From (15) and (48), we have A † ac b ac = ˆ u − A Tac µ ac θ = ⇒ A Tac µ ac θ = 2( C θ + d − A † ac b ac )= ⇒ A Tac µ ac θ = w (49)We defined w = 2( C θ + d − A † ac b ac ) to simplify notation. The above linear system for determining µ ac θ isfull rank but underdetermined because rank ( A Tac ) = 2. The SVD of A Tac is A Tac = ˜ U ˜Σ ˜ V T where˜ U ∈ R × ˜Σ = (cid:2) ˜Σ m × K − (cid:3) where, ˜Σ m ∈ R × ˜ V = (cid:2) ˜ V , ˜ V (cid:3) where, ˜ V ∈ R K × , ˜ V ∈ R K × K − . (50)Since ˜ V forms a basis for R K , any µ can be expressed as µ = (cid:2) ˜ V , ˜ V (cid:3) (cid:20) ˜ µ ˜ µ (cid:21) = ˜ V ˜ µ + ˜ V ˜ µ . = ⇒ A Tac µ − w = ˜ U (cid:2) Σ m , (cid:3) (cid:20) ˜ V T ˜ V T (cid:21) (cid:2) ˜ V , ˜ V (cid:3) (cid:20) ˜ µ ˜ µ (cid:21) − w = ˜ U Σ m ˜ µ + 0 · ˜ µ − w = 0 (51)Choosing ˜ µ = Σ − m ˜ U T w and ˜ µ = η ∈ R K − , we find that µ ac θ = ˜ V ˜Σ − m ˜ U T w + ˜ V η = 2 ˜ V ˜Σ − m ˜ U T ( C θ + d − A † ac b ac ) + ˜ V η (52)Here η ∈ R K − are floating variables which can assume any value. From non-negativity of µ ac θ we get µ ac θ (cid:23) ⇐⇒ − V ˜Σ − m ˜ U T C θ − ˜ V η (cid:22) V ˜Σ − m ˜ U T ( d − A † ac b ac ) (53)Thus the desired set Ω where θ belongs is defined using (53) as follows Ω := { ˆ θ ∈ R p | − V ˜Σ − m ˜ U T C ˆ θ − ˜ V η (cid:22) V ˜Σ − m ˜ U T ( d − A † ac b ac ) if ∃ η ∈ R K − } (54) In this section, we present simulations to demonstrate the effectiveness of our proposed region-based pa-rameter identification. We consider a multirobot system in which the task of each robot is to reach a goalposition while avoiding collisions with other robots. The observer’s problem is to infer the goal of each robotusing the its positions and velocities as measurements. We use an Unscented Kalman Filter (UKF) as abaseline for goal estimation, to show how our region-based estimator outperforms a UKF.To infer robot goals using the region-based estimator, the observer must estimate the instantaneousfeasible region Ω ( t ) using which it will compute the cumulative feasible region Θ ( t ). Recall from (3), that Θ ( t ) is computed by taking intersections of all Ω ( t ) over time, and then its intersection with a compactset Θ where the θ is known to belong. For the goal inference problem, the observer can compute theinstantaneous set Ω ( t ) using the results we derived in subsections 6.1-6.5 by checking how many obstaclesare active at a given time (17). As for Θ , we assume that the observer knows some reasonable upper andlower bounds on the location of the robot goals. In subsec. 6.4, we showed that when two or more obstaclesare active at a given time, point-wise parameter identification is not possible because robot dynamics areindependent of the underlying parameters. The feasible region based estimator, however, can give produce abounded set where the underlying parameter belongs. To substantiate this, we first show results for a singlerobot navigating in an environment consisting of static obstacles. Subsequently, we will show results formultirobot inference as well. In Figs. 1(a)-1(c), an ego robot (red) is trying to reach its goal x d shown inblack. The green regions correspond to Θ ( t ) computed using the instantaneous feasible regions Ω ( t ). As therobot moves to the right, obstacles one and two remain active until t = 1 . s . After this, obstacles three and9 a) t = 0 . s (b) t = 1 . s (c) t = 2 . s (d) Error Comparison Figure 1: (a)-(c) Goal identification for a robot navigating among static obstacles. Dark discs representactive obstacles. (d) Comparison of region-based estimation (blue) with a UKF. Video at https://youtu.be/jH3mxZhX2mA (a) t = 0 . s (b) t = 1 . s (c) t = 3 . s (d) Error Comparison Figure 2: (a)-(c) Goal identification for a robot navigating among static obstacles. Dark discs representactive obstacles. (d) Comparison of region-based estimation (blue) with a UKF (red). Video at https://youtu.be/VthIXiBvjfU four stay active until t = 2 . s . Therefore, a point-wise estimator will not result in reduction in estimationerror. Thus, until t = 2 . s , the UKF based estimator (red) does not get updated as is evident from Figs.1(a)-1(b). On the contrary, the green regions continue to shrink. The inset in these figures shows a zoomedin view around the goal to empirically demonstrate that the feasible region always remains non-empty andalways encompasses the goal. In Fig. 1(c), all the obstacles are inactive so the UKF estimator will converge.Additionally, the feasible region based estimator will also provide feasible regions based on the set definedin subsec 6.1. We measure the Vol ( Θ ( t ) of the feasible regions by computing the area of the green regionsusing the MPT3 toolbox in MATLAB. Fig. 1(d) shows this area as a function of time and demonstrates thefast convergence as the robot moves, in comparison to the UKF estimator based error shown in red.In Figs. 2(a)-2(c), we consider a different arrangement of obstacles. Here, until t = 0 . s , only oneobstacle is active; then until t = 2 .
08s exactly two obstacles are active, then until t = 3 . s exactly one isactive, following which all obstacles are inactive. As the robot moves, the region-based estimator convergesquickly to the true goal as is evident in the insets in Figs. 2(a)-2(c). Further, the errors shown in 2(d)illustrate how the UKF estimator takes some time to converge while the region based estimator convergesmuch faster to the true goal. Furthermore, the UKF estimator does not respect the feasibility of the goal,it frequently remains outside the feasible region. This is to be expected because the derivation for UKFestimator does not consider non-negativity of Lagrange multipliers. 2(d).Finally, we consider a multirobot system in Fig. 3 in which we run parallel region-based estimatorssynchronously. There are four robots shown in different colors, their goals are shown in the same colors asare their feasible regions Θ i ( t ) ∀ i ∈ { , , , } . Fig. 4 compares the convergence of region based estimator(left) to the UKF estimator (right). As is evident, the region-based estimators converge to the true goalmuch faster than the UKF based estimators. (a) t = 0 . s (b) t = 0 . s (c) t = 0 . s (d) t = 3 . s Figure 3: (a)-(d) Region-based goal identification for multirobot system. Video at https://youtu.be/eh3nApDVUws
In this paper, we proposed a region-based parameter identification method to compute bounds on parametersof tasks being performed by a multirobot system. This approach works even when exact estimation usingpoint-wise estimators is likely to fail. We used KKT conditions to describe the instantaneous feasible set ofthe underlying parameters using which we computed contracting sets where the parameters must belong. Todemonstrate the effectiveness of our method, we showed numerical simulations for inferring goals of a singlerobot moving amongst static obstacles as well as for each robot in a multirobot system. These scenariosshow how our region based estimator outperforms a UKF in terms of speed of convergence. There are severaldirections that we would like to pursue further. Firstly, we want to see how physical intervention by theobserver be designed to further expedite the convergence of the region-based estimator. Additionally, we alsowant to explore connections between our work and prior work on inference using inverse convex optimization.
References [1] G. Kantor, S. Singh, R. Peterson, D. Rus, A. Das, V. Kumar, G. Pereira, and J. Spletzer, “Distributedsearch and rescue with robot and sensor teams,” in
Field and Service Robotics , pp. 529–538, Springer.[2] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, “Coverage control for mobile sensing networks,”
IEEETransactions on robotics and Automation , vol. 20, no. 2, pp. 243–255, 2004.[3] W. Burgard, M. Moors, C. Stachniss, and F. E. Schneider, “Coordinated multi-robot exploration,”
IEEE Transactions on robotics , vol. 21, no. 3, pp. 376–386, 2005.[4] P. Ogren, M. Egerstedt, and X. Hu, “A control lyapunov function approach to multiagent coordination,”
IEEE Transactions on Robotics and Automation , vol. 18, no. 5, pp. 847–851, 2002.[5] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agentsystems,”
Proceedings of the IEEE , vol. 95, no. 1, pp. 215–233, 2007.[6] A. Byravan, M. Monfort, B. Ziebart, B. Boots, and D. Fox, “Graph-based inverse optimal control forrobot manipulation,” in
Twenty-Fourth International Joint Conference on Artificial Intelligence , 2015.[7] S. Thrun, “Probabilistic robotics,”
Communications of the ACM , vol. 45, no. 3, pp. 52–57, 2002.[8] J. Na, M. N. Mahyuddin, G. Herrmann, X. Ren, and P. Barber, “Robust adaptive finite-time parameterestimation and control for robotic systems,”
International Journal of Robust and Nonlinear Control ,vol. 25, no. 16, pp. 3045–3071, 2015.[9] V. Adetola and M. Guay, “Performance improvement in adaptive control of linearly parameterizednonlinear systems,”
IEEE Transactions on Automatic Control , vol. 55, no. 9, pp. 2182–2186, 2010.[10] M. Hartman, N. Bauer, and A. R. Teel, “Robust finite-time parameter estimation using a hybrid systemsframework,”
IEEE transactions on automatic control , vol. 57, no. 11, pp. 2956–2962, 2012.[11] J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. De Moor, “A note on persistency of excitation,”
Systems & Control Letters , vol. 54, no. 4, pp. 325–329, 2005.[12] C. Yang, Y. Jiang, W. He, J. Na, Z. Li, and B. Xu, “Adaptive parameter estimation and control designfor robot manipulators with finite-time convergence,”
IEEE Transactions on Industrial Electronics ,vol. 65, no. 10, pp. 8112–8123, 2018. 1113] J. Zhao, X. Wang, G. Gao, J. Na, H. Liu, and F. Luan, “Online adaptive parameter estimation forquadrotors,”
Algorithms , vol. 11, no. 11, p. 167, 2018.[14] J. S. Grover, C. Liu, and K. Sycara, “Parameter identification for multirobot systems using optimizationbased controllers (extended version).” https://arxiv.org/abs/2009.13817 , 2020.[15] V. Adetola and M. Guay, “Finite-time parameter estimation in adaptive control of nonlinear systems,”
IEEE Transactions on Automatic Control , vol. 53, no. 3, pp. 807–811, 2008.[16] L. Wang, A. D. Ames, and M. Egerstedt, “Safety barrier certificates for collisions-free multirobot sys-tems,”
IEEE Transactions on Robotics , vol. 33, no. 3, pp. 661–674, 2017.[17] T. F. Edgar, “Recursive least squares parameter estimation for linear steady state and dynamic models.”[18] P. A. Ioannou and J. Sun,
Robust adaptive control . Courier Corporation, 2012.[19] K. S. Narendra and A. M. Annaswamy,
Stable adaptive systems . Courier Corporation, 2012.[20] J. Grover, C. Liu, and K. Sycara, “Deadlock analysis and resolution in multi-robot systems (extendedversion),” 2019.[21] T. Wei and C. Liu, “Safe control algorithms using energy functions: A uni ed framework, benchmark,and new directions,” in , pp. 238–243,IEEE, 2019.[22] S. Boyd and L. Vandenberghe,