Unified Multi-Rate Control: from Low Level Actuation to High Level Planning
Ugo Rosolia, Member, IEEE, Andrew Singletary, Member, IEEE, and Aaron D. Ames, Senior Member, IEEE
Abstract—In this paper we present a hierarchical multi-rate control architecture for nonlinear autonomous systems operating in partially observable environments. Control objectives are expressed using syntactically co-safe Linear Temporal Logic (LTL) specifications and the nonlinear system is subject to state and input constraints. At the highest level of abstraction, we model the system-environment interaction using a discrete Mixed Observable Markov Decision Process (MOMDP), where the environment states are partially observed. The high-level control policy is used to update the constraint sets and cost function of a Model Predictive Controller (MPC), which plans a reference trajectory. Afterwards, the MPC planned trajectory is fed to a low-level high-frequency tracking controller, which leverages Control Barrier Functions (CBFs) to guarantee bounded tracking errors. Our strategy is based on model abstractions of increasing complexity and layers running at different frequencies. We show that the proposed hierarchical multi-rate control architecture maximizes the probability of satisfying the high-level specifications while guaranteeing state and input constraint satisfaction. Finally, we test the proposed strategy in simulations and experiments on examples inspired by the Mars exploration mission, where only partial environment observations are available.
Index Terms—Partially observable, predictive control, control barrier function, multi-rate control, hierarchical control.
I. INTRODUCTION
Control design for complex cyber-physical systems, which are described by continuous and discrete variables, is usually divided into different layers [1]–[9]. Each layer is designed using models of increasing accuracy and complexity, which allow the controller to take high-level decisions, e.g., performing an overtaking maneuver, and to compute low-level commands, e.g., the input current to a motor. High-level decisions and low-level control actions are computed at different frequencies, and the interaction between layers should be taken into account to guarantee safety of the closed-loop system [1].

Control policies for high-level decision making are usually synthesized using discrete model abstractions, and the high-level control objectives are often expressed by Linear Temporal Logic (LTL) formulas [10], as they are a formalism to express high-level system behaviors using logical and temporal operators [10]. Motion planning with LTL and syntactically co-safe LTL (scLTL) specifications has been widely studied in the literature [1]–[3], [5], [6], [11]–[20]. For deterministic systems with finite state spaces, several approaches and toolboxes are available for synthesis [1]–[3], [11]–[13]. When the system-environment interactions are uncertain, the high-level abstractions are described by discrete Markov Decision Processes
(MDPs) and the high-level decision making problem can be solved exactly using dynamic programming, policy iteration and linear programming strategies [21]. On the other hand, when the system dynamics are uncertain and only partial observations are available, the system-environment interaction can be modeled using discrete Partially Observable Markov Decision Processes (POMDPs). Computing a control policy in POMDP settings is NP-hard [22], but approximate solutions can be computed using finite state controllers [23] and performing point-based approximations [24].

Given a high-level decision, reachability-based techniques [1], [2] or simulation-based abstractions [3], [4] may be used to compute a goal set for the continuous time system, e.g., a subset of a lane where we would like to drive the vehicle when performing an overtaking maneuver. Therefore, the input to the system's actuators is computed by solving mid-level planning and low-level control problems, which have been studied extensively in the literature [9], [25]–[34]. The planning problem is usually defined for a simplified model and the resulting reference trajectory is then tracked using low-level controllers, which leverage the nonlinear system dynamics. Tracking controllers may be synthesized using Hamilton-Jacobi (HJ) reachability analysis [9] or sum-of-squares programming [29], [30].

U. Rosolia, A. Singletary and A. D. Ames are with the AMBER lab at the California Institute of Technology, Pasadena, CA, USA, e-mail: {urosolia, asinglet, ames}@caltech.edu.

Fig. 1. This figure shows an environment composed of cells, obstacles (yellow boxes) and uncertain regions (light brown). In this example the goal of the controller is to explore the state space in order to find a science sample.
Another strategy to solve mid-level planning and low-level control problems is to use nonlinear tube MPC [31]–[35], where the difference between the planned trajectory and the actual one is over-approximated using Lyapunov-based analysis or Lipschitz properties of the nonlinear dynamics. When the planned trajectory is computed without taking into account tracking errors, safety can be guaranteed using filters which, given a desired mid-level command and/or planned trajectory, compute a safe control action using CBFs [25]–[27], feasibility of an MPC problem [28] or reachability analysis [8].

Fig. 2. Multi-rate control architecture. The high-level decision maker leverages the system's state x(t) and partial environment observations o_k to compute a goal cell, the constraint set and the goal positions, which are fed to the mid-level MPC planner. The planner computes a reference trajectory given the tracking error bounds E from the low-level tracking controller. Finally, at the lowest level, the control action is computed by summing the mid-level input u_m(t) and the low-level input u_l(t).

In this work we present a multi-rate hierarchical control scheme for nonlinear systems operating in partially observable environments. Our architecture, which is composed of three layers running at different frequencies, guarantees constraint satisfaction and maximization of the closed-loop probability of satisfying the high-level specifications. At the lowest level, we leverage Control Barrier Functions (CBFs) and Control Lyapunov Functions (CLFs), which are based on the continuous time nonlinear system model and guarantee a bounded tracking error. The mid-level planning layer computes a reference trajectory using an MPC, which leverages a simplified prediction model and the low-level tracking error bounds.
Finally, at the highest level of abstraction, we model the system-environment interaction using Mixed Observable Markov Decision Processes (MOMDPs), which allow us to account for partial environment observations.

Contribution:
Our contribution is threefold. First, we show how to integrate a CLF-CBF tracking controller with an MPC planner. Compared to our previous work [36], the MPC planner is based on a fixed-tube robust MPC scheme [37], where the initial state of the planned trajectory is an optimization variable. For this reason, the proposed strategy does not require the online computation of robust reachable sets to formulate the MPC problem and it is therefore computationally more efficient than the approach proposed in [36]. Second, we introduce a mid-level planner, which leverages an MPC with time-varying constraint sets and cost function. These time-varying components are given by the high-level decision maker and they can jeopardize the feasibility of the MPC problem. For this reason, we propose a contingency scheme, which guarantees feasibility of the MPC planner with time-varying components. The feasibility of this contingency plan and the low-level tracking error bounds guarantee safety, when a local reachability assumption on the system dynamics is satisfied. Such a reachability assumption, which is tailored to navigation problems, together with the proposed contingency scheme allows us to avoid the construction of finite state abstractions defined over the entire state space. Third, we show how to model the system-environment interaction using Mixed Observable Markov Decision Processes (MOMDPs), where the system state is fully observable and the environment state is partially observable. We use the high-level action from the MOMDP to update the MPC time-varying components and we show that our hierarchical strategy guarantees that the probability of satisfying the high-level specifications is maximized. Finally, we test our strategy on the navigation task shown in Figure 1, where a Segway-like robot has to find science samples while navigating a partially observable environment.

This paper is organized as follows. Section II describes the problem formulation.
The hierarchical architecture is introduced in Section III and the closed-loop properties are discussed in Section IV. Finally, we illustrate the effectiveness of the proposed strategy with high-fidelity simulations and hardware experiments.
Notation:
The Minkowski sum of two sets X ⊂ R^n and Y ⊂ R^n is denoted as X ⊕ Y, and the Pontryagin difference as X ⊖ Y. K_e is the set of extended class-K functions β, which are strictly increasing and satisfy β(0) = 0. For a set A ⊂ R^n and a vector x ∈ R^n, we denote the projection Proj(x, A) = argmin_{d ∈ A} ||x − d|| and the cardinality of the set A as |A|. We define Z_{0+} = {0, 1, 2, . . .} and R_{0+} = {x ∈ R | x ≥ 0}, which denote the sets of non-negative integers and real numbers, respectively. Finally, given t ∈ R_{0+} and T ∈ Z_{0+}, we define ⌊t/T⌋ = floor(t/T).

II. PROBLEM FORMULATION
This section describes the problem formulation. First, we introduce the continuous system dynamics. Afterwards, we present the discrete environment model. Finally, we describe the synthesis goals and summarize the overall control architecture from Figure 2.
System Model: As discussed in the introduction, our goal is to design a controller for nonlinear dynamical systems. In particular, we consider nonlinear control affine systems of the following form:

ẋ(t) = f(x(t)) + g(x(t))u(t),   (1)

where f and g are Lipschitz continuous, the input u(t) ∈ R^{n_u} and the state x(t) = [p(t)^⊤, q(t)^⊤]^⊤ ∈ R^{n_x}, for the position vector p(t) ∈ R^{n_p} and the vector q(t) ∈ R^{n_q} collecting the remaining states. Furthermore, the above system is subject to the following state and input constraints:

u(t) ∈ U, p(t_i) ∈ X_p and q(t_i) ∈ X_q,   (2)

for all t ∈ R_{0+} and t_i = iT for all i ∈ Z_{0+}. The time constant T is specified by the user and, as will become clear later on, it defines the frequency at which the controller updates the planned trajectory. In the above equation (2), X_p represents the free space and X_q is a user-defined constraint set.

Remark 1.
We consider state constraints which are enforced pointwise in time to streamline the presentation. The proposed control strategy can be extended to account for constraints which must hold for all time t ∈ R_{0+}. In this case, it is required to modify the low-level controller as discussed in [36].

Environment Model: We consider nonlinear dynamical systems operating in partially observable environments, which are partitioned into C_1, . . . , C_c cells as in the example from Figure 1. We assume that the state of the system is perfectly observable, but we are given only partial observations about the environment state. Thus, at the highest level of abstraction, we model the interaction between the nonlinear system (1) and the environment using a Mixed Observable Markov Decision Process (MOMDP). A MOMDP provides a sequential decision-making formalism for high-level planning under mixed full and partial observations [38] and it is defined as the tuple (S, Z, A, O, T_s, T_z, O), where
• S = {1, . . . , |S|} is a set of fully observable states;
• Z = {1, . . . , |Z|} is a set of partially observable states;
• A = {1, . . . , |A|} is a set of actions;
• O = {1, . . .
, |O|} is the set of observations for the partially observable state z ∈ Z;
• The function T_s : S × Z × A × S → [0, 1] describes the probability of transitioning to a state s′ given the action a and the system's state (s, z), i.e., T_s(s, z, a, s′) := P(s_{k+1} = s′ | s_k = s, z_k = z, a_k = a);
• The function T_z : S × Z × A × S × Z → [0, 1] describes the probability of transitioning to a state z′ given the action a, the successor observable state s′ and the system's current state (s, z), i.e., T_z(s, z, a, s′, z′) := P(z_{k+1} = z′ | s_k = s, z_k = z, a_k = a, s_{k+1} = s′);
• The function O : S × Z × A × O → [0, 1] describes the probability of observing the measurement o ∈ O, given the current state of the system (s′, z′) ∈ S × Z and the action a applied at the previous time step, i.e., O(s′, z′, a, o) := P(o_k = o | s_k = s′, z_k = z′, a_{k−1} = a).

MOMDPs were introduced in [38] to model systems where a subspace of the state space is perfectly observable. As we will discuss later on, our hierarchical architecture leverages robust control methodologies to guarantee that the high-level transitions of the observable state are deterministic. Therefore, we consider MOMDPs with the following transition dynamics:

T_s(s, z, a, s′) = 1 if s′ = f_s(s, z, a), and 0 otherwise,

for the high-level update function f_s : S × Z × A → S.

Specifications:
High-level objectives are expressed using syntactically co-safe Linear Temporal Logic (scLTL) specifications. An scLTL specification is defined as follows:

ψ := p | ¬p | ψ₁ ∧ ψ₂ | ψ₁ ∨ ψ₂ | ψ₁ U ψ₂ | ψ₁ ◯ ψ₂,

where the atomic proposition p ∈ {true, false} and ψ, ψ₁, ψ₂ are scLTL formulas, which can be defined using the logic operators negation (¬), conjunction (∧) and disjunction (∨). Furthermore, scLTL formulas can be specified using the temporal operators until (U) and next (◯). Each atomic proposition p is associated with a subset of the high-level state space P ⊂ S × Z and, for a high-level state ω_k = (s_k, z_k), the proposition p is true if ω_k ∈ P. Finally, we say that a high-level trajectory ω = [ω₀, ω₁, . . .] satisfies the specification ψ, and we write

ω |= ψ,   (3)

when the following holds:
i) ω |= p ⟺ ω_k ∈ P, ∀k ≥ 0;
ii) ω |= ψ₁ ∧ ψ₂ ⟺ (ω |= ψ₁) ∧ (ω |= ψ₂);
iii) ω |= ψ₁ ∨ ψ₂ ⟺ (ω |= ψ₁) ∨ (ω |= ψ₂);
iv) ω |= ψ₁ U ψ₂ ⟺ ∃k such that [ω₀, . . . , ω_k] |= ψ₁ and [ω_{k+1}, ω_{k+2}, . . .] |= ψ₂;
v) ω |= ψ₁ ◯ ψ₂ ⟺ [ω₀, . . . , ω_k] |= ψ₁ implies that [ω_{k+1}, ω_{k+2}, . . .] |= ψ₂.

In the example from Figure 1, the high-level specification is not to collide with an obstacle until the goal is reached. This objective is expressed by the scLTL formula ψ = ¬Collision U Goal, where the atomic proposition
Collision is true when the system (1) is in a cell occupied by an obstacle and the atomic proposition Goal is true when the system (1) has reached the goal location.

Synthesis Objectives:
Given the system's state x(t) ∈ R^{n_x} and k observations o_{k−1} = [o₀, . . . , o_{k−1}] ∈ O^k about the environment, our goal is to design a control policy

π : R^{n_x} × O^k → U,   (4)

which maps the state x(t) and the observation vector o_{k−1} to the continuous control action u ∈ U. Furthermore, the control policy (4) should guarantee that state and input constraints (2) are satisfied and that the probability of satisfying the specification (3) is maximized. Notice that standard control strategies for nonlinear systems can be used to guarantee constraint satisfaction [25]–[27], [31]–[35]. Furthermore, standard decision making methodologies for Partially Observable Markov Decision Processes (POMDPs) can be used to synthesize a control policy which maximizes the probability of satisfying the specification [5], [6], [15]–[18]. In this paper, we bridge the gap between the two communities and propose a hierarchical control scheme for nonlinear systems operating in partially observable environments, which guarantees that state and input constraints are satisfied and that the probability of satisfying the specifications is maximized.
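To make the until semantics above concrete, the following sketch (illustrative only; the labeling scheme and helper name are hypothetical, not from the paper) checks a formula of the form ¬Collision U Goal on a finite high-level trajectory whose states are labeled with the atomic propositions that hold there:

```python
# Illustrative check of "lhs U rhs" on a finite trajectory: each state is a
# set of atomic propositions that hold there (labels are hypothetical).

def until_holds(trajectory, lhs, rhs):
    """True iff lhs holds at every step strictly before the first step
    where rhs holds, and rhs is eventually reached."""
    for labels in trajectory:
        if rhs in labels:
            return True   # rhs reached: until is satisfied
        if lhs not in labels:
            return False  # lhs violated before reaching rhs
    return False          # rhs never reached on this finite trajectory

# "safe" stands for the proposition ¬Collision, "goal" for Goal.
ok  = [{"safe"}, {"safe"}, {"safe", "goal"}]
bad = [{"safe"}, set(), {"safe", "goal"}]  # collision at the second state

print(until_holds(ok, "safe", "goal"))   # True
print(until_holds(bad, "safe", "goal"))  # False
```

Note that on a finite trajectory a co-safe until is decidable exactly, which is why scLTL (rather than full LTL) is the natural formalism for finite-horizon missions.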
Navigation Example:
Figure 1 shows our motivating example, where a Segway has to reach a goal cell while avoiding known obstacles and exploring uncertain regions, which may be traversable with some probability. The Segway dynamics are nonlinear and the system is open-loop unstable; for this reason, a low-level high-frequency controller that stabilizes the system during operations is required. On the other hand, at the highest level of abstraction we model the system using the discrete state s_k ∈ S, which denotes the grid cell containing the nonlinear system (1), and the environment state z_k ∈ Z representing the traversability of the uncertain regions R_1, R_2 and R_3. For instance, in the example from Figure 1 the environment's state z_k = [0, 0, 1], as regions R_1 and R_2 are not traversable and region R_3 is traversable.

Strategy Overview: We summarize the proposed multi-rate control architecture depicted in Figure 2. The key idea is to divide the controller into three layers and compute the control action u(t) as the summation of a high-frequency component u_l(t) and a low-frequency component u_m(t), i.e.,

u(t) = u_l(t) + u_m(t).

At the lowest level, the control action u_l is updated continuously (at high frequency, ∼1 kHz) and it is computed using Control Barrier Functions (CBFs), which leverage the full nonlinear model (1) to track a reference trajectory x̄(t). The middle layer updates at a constant frequency the reference trajectory x̄(t) and the reference input u_m, which is computed using a Model Predictive Controller (MPC). This reference trajectory steers the system from the current state x(t) to a goal cell C^k_goal. Finally, the high-level planner computes the goal cell C^k_goal based on partial observations o_k about the environment.

III. UNIFIED MULTI-RATE ARCHITECTURE
In this section, we describe the multi-rate control architecture. First, we design a low-level CLF-CBF controller, which tracks a reference state-input trajectory and guarantees bounded tracking errors. Afterwards, we show how to update the state-input reference trajectory leveraging an MPC, which is designed using a goal state computed by a discrete high-level decision maker. Finally, we introduce the hierarchical multi-rate architecture, which guarantees that the synthesis objectives from Section II are satisfied.
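The timing structure of the three layers can be sketched as a single loop (a hypothetical skeleton with stubbed controllers; all rates, names and values are illustrative, not the authors' implementation): the low-level input is recomputed at every step, the mid-level input every T steps, and the high-level goal only at the start and when the goal cell is reached.

```python
# Hypothetical skeleton of the multi-rate composition u(t) = u_l(t) + u_m(t):
# the high-level goal is updated only when the goal cell is reached, the
# mid-level input every T steps, and the low-level input at every step.
# All controllers are stubs; only the timing logic is illustrated.

def run(steps, T, high_level, mid_level, low_level, reached_goal):
    inputs = []
    goal, u_m = None, 0.0
    for t in range(steps):
        if t == 0 or reached_goal(t):
            goal = high_level(t)        # new goal cell / constraint sets
        if t % T == 0:
            u_m = mid_level(t, goal)    # replan the reference trajectory
        u_l = low_level(t, u_m)         # fast corrective tracking term
        inputs.append(u_l + u_m)        # total input u = u_l + u_m
    return inputs

log = run(steps=6, T=3,
          high_level=lambda t: "goal-cell",
          mid_level=lambda t, g: 10.0,   # piecewise-constant u_m
          low_level=lambda t, u_m: 0.1 * t,
          reached_goal=lambda t: False)
print(log)
```

The separation mirrors the architecture of Figure 2: each callback could be replaced by the MOMDP policy, the MPC planner and the CLF-CBF controller described below without changing the loop.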
A. Low-Level Control
We leverage CBFs and CLFs to design a low-level tracking controller for the nonlinear system (1). CBFs guarantee safety for nonlinear systems [26], but they are suboptimal as the control action is computed without forecasting the system's trajectory. For this reason, we use CBFs to enforce safety around a reference state-input trajectory that is computed at low frequency by the mid-level planner, as shown in Figure 2.
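To give a concrete feel for the optimization solved at this layer, here is a one-dimensional sketch of a CLF-CBF program in the style of the QP (9) defined below. It assumes V(e) = e², a safe set encoded by h(e) = h_max² − e², and scalar error dynamics ė = d + g·v; a brute-force grid search replaces a QP solver, and all constants are hypothetical:

```python
# Illustrative 1-D sketch (not the authors' implementation) of a CLF-CBF
# program: minimize v^2 + c1*gamma^2 subject to a relaxed CLF decrease
# condition and a hard CBF condition, solved by grid search for clarity.

def clf_cbf_1d(e, d, g, u_lo, u_hi, c1=10.0, c2=1.0, alpha=1.0, h_max=1.0):
    # Error dynamics: e_dot = d + g*v; CLF V(e) = e^2; CBF h(e) = h_max^2 - e^2.
    h = h_max**2 - e**2
    best = None
    n = 2001
    for i in range(n):
        v = u_lo + (u_hi - u_lo) * i / (n - 1)
        e_dot = d + g * v
        # Hard CBF constraint: dh/de * e_dot >= -alpha * h(e), with dh/de = -2e.
        if -2.0 * e * e_dot < -alpha * h:
            continue
        # Relaxed CLF constraint: slack gamma makes it always satisfiable.
        gamma = max(0.0, 2.0 * e * e_dot + c2 * e**2)
        cost = v**2 + c1 * gamma**2
        if best is None or cost < best[0]:
            best = (cost, v, gamma)
    return best  # (cost, v*, gamma*), or None if the CBF part is infeasible

res = clf_cbf_1d(e=0.5, d=1.0, g=1.0, u_lo=-3.0, u_hi=3.0)
print(res)
```

The hard CBF constraint filters out inputs that would leave the safe set, while the relaxed CLF constraint trades tracking performance against input effort through the slack penalty.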
Error Model:
At the lowest layer, the goal of the controller is to track a reference trajectory x̄(t). We assume that the reference trajectory is given by the following Linear Time-Varying (LTV) model:

Σ_x̄ : ẋ̄(t) = A_{⌊t/T⌋} x̄(t) + B_{⌊t/T⌋} u_m(t) for t ∈ T, and x̄⁺(t) = Δ_x̄(x⁻(t)) for t ∈ T_c,   (5)

where T_c = ∪_{j=0}^∞ {jT}, T = ∪_{j=0}^∞ (jT, (j+1)T) and the time T from (2) is specified by the user. Furthermore, we denote x̄⁻(t) = lim_{τ↑t} x̄(τ) and x̄⁺(t) = lim_{τ↓t} x̄(τ) as the left and right limits of the reference trajectory x̄(t) ∈ R^{n_x}, which is assumed left continuous. In the above system, the reference input u_m(t) ∈ R^{n_u} and the reset map Δ_x̄, which depends on the state of the nonlinear system (1), are given by the middle layer, as we will discuss in Section III-B. Finally, the time-varying matrices (A_{⌊t/T⌋}, B_{⌊t/T⌋}) are known and, in practice, may be computed by linearizing the system dynamics (1), as discussed in the results section.

Given the nonlinear system (1) and the LTV model (5), we define the error state e(t) = x(t) − x̄(t) and the associated error dynamics:

Σ_e : ė(t) = f_e(x(t), x̄(t), u_l(t) + u_m(t), t) for t ∈ T, and e⁺(t) = x⁺(t) − x̄⁺(t) for t ∈ T_c,   (6)

where the time-varying error dynamics are

f_e(x, x̄, u_l + u_m, t) = f(x) + g(x)(u_l + u_m) − (A_{⌊t/T⌋} x̄ + B_{⌊t/T⌋} u_m).

In the above definition, we dropped the dependence on time for states and inputs to simplify the notation. Furthermore, we introduce the low-level input constraint set U_l ⊂ U and the mid-level input constraint set U_m ⊂ U, which partition the input space, i.e., U_l ⊕ U_m = U. Next, we design a low-level controller which guarantees that the reference trajectory x̄(t) from the LTV model (5) is tracked within some error bounds.

Control Barrier and Lyapunov Functions:
We show how to design a tracking controller using CBFs and CLFs [26]. First, we define the candidate Lyapunov function

V(e) = ||e||²_Q,   (7)

where ||e||²_Q = e^⊤ Q e. Furthermore, we introduce the following safe set for the error dynamics (6):

E = {e ∈ R^{n_x} : h_e(e) ≥ 0} ⊂ R^{n_x}.   (8)

The function h_e is defined by the user and depends on the application, as discussed in the results section.

Finally, the CBF associated with the safe set (8) and the CLF from (7) are used to define the following CLF-CBF Quadratic Program (QP):

min_{v_l ∈ U_l, γ}  ||v_l||² + c₁γ²
s.t.  (∂V(e)/∂e) f_e(x, x̄, v_l + u_m) ≤ −c₂ V(e) + γ,
      (∂h_e(e)/∂e) f_e(x, x̄, v_l + u_m) ≥ −α(h_e(e)),   (9)

where we dropped the time dependence to simplify the notation. In the above QP, the parameters c₁ ∈ R_{0+}, c₂ ∈ R_{0+} and α ∈ K_e. Let v*_l(t) be the optimal input action from the QP (9); then the low-level control policy is defined as follows:

u_l(t) = π_l(x(t), x̄(t), u_m(t)) = v*_l(t).   (10)

Assumption 1.
The CLF-CBF QP (9) is feasible for all e = x − x̄ ∈ E and for all u_m ∈ U_m.

The low-level control policy (10) guarantees that the difference between the evolution of the nonlinear system (1) and the LTV model (5) is bounded. Indeed, when Assumption 1 is satisfied, the CLF-CBF QP (9) guarantees invariance of the safe set (8) for all t ∈ (iT, (i+1)T) and i ∈ Z_{0+}, as discussed in Section IV. Next, we show how to design a mid-level planner which leverages the safe set E from (8).

B. Mid-Level Planning
In this section we describe the mid-level planning strategy. At this level of abstraction, we assume that we are given a goal grid cell to which we would like to steer the system. Afterwards, we compute a reference state-input trajectory using a Model Predictive Controller (MPC), which leverages a simplified model and the tracking error bounds from the previous section.
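The tightening of the planner's constraint sets by the tracking error bound relies on the Minkowski sum ⊕ and Pontryagin difference ⊖ introduced in the notation section. For axis-aligned boxes, both operations reduce to per-coordinate interval arithmetic; a minimal sketch (not the authors' code):

```python
# Illustrative sketch: Minkowski sum and Pontryagin difference for
# axis-aligned boxes, given as lists of (lo, hi) intervals per coordinate.

def minkowski_sum(box_x, box_y):
    """X ⊕ Y: every coordinate interval is enlarged by the corresponding one."""
    return [(xl + yl, xu + yu) for (xl, xu), (yl, yu) in zip(box_x, box_y)]

def pontryagin_diff(box_x, box_y):
    """X ⊖ Y: the set of points d with {d} ⊕ Y ⊆ X (None if empty)."""
    out = []
    for (xl, xu), (yl, yu) in zip(box_x, box_y):
        lo, hi = xl - yl, xu - yu
        if lo > hi:
            return None  # the difference is empty in this coordinate
        out.append((lo, hi))
    return out

X = [(0.0, 10.0), (0.0, 10.0)]   # state constraint box
E = [(-1.0, 1.0), (-1.0, 1.0)]   # tracking-error bound

print(minkowski_sum(X, E))    # enlarged box
print(pontryagin_diff(X, E))  # tightened box used when planning
```

Planning inside the tightened set X ⊖ E guarantees that any true state within the error bound E of the planned one still satisfies the original constraint, which is exactly the role the error bound plays in the MPC problem below.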
Grid Model:
Given the state x(t) = [p(t)^⊤, q(t)^⊤]^⊤, we define the current grid cell C^k_curr, which contains the nonlinear system (1) for time t ∈ [t_k, t_{k+1}), i.e.,

p(t) ∈ C^k_curr ⊂ X_p, ∀t ∈ [t_k, t_{k+1}).   (11)

Similarly, we define the goal cell C^k_goal, which represents the region where we want to steer the system for time t ∈ [t_k, t_{k+1}). Finally, we introduce the equilibrium sets X^k_curr and X^k_goal, which collect the unforced equilibrium states that are contained in C^k_curr and C^k_goal, i.e., for i ∈ {curr, goal}

X^k_i = {x = [p, q] ∈ R^{n_x} | p ∈ C^k_i, ẋ = f(x) = 0} ⊂ R^{n_x}.   (12)

Throughout this section, we assume that t_k, X^k_goal, C^k_curr and C^k_goal are given by the high-level planner and we synthesize a controller to drive the system from the current cell C^k_curr to the goal cell C^k_goal.

Model Predictive Control:
We design a Model Predictive Controller (MPC) to compute the mid-level input u_m(t) that defines the evolution of the reference trajectory (5) and to define the reset map Δ_x̄ for the LTV model (5). The MPC problem is solved at 1/T Hertz and therefore the reference mid-level input is piecewise constant, i.e., u̇_m(t) = 0 for all t ∈ T = ∪_{k=0}^∞ (kT, (k+1)T).

Next, we introduce the following discrete time linear model:

x̄_d((i+1)T) = Ā_i x̄_d(iT) + B̄_i u(iT),   (13)

where the transition matrices are

Ā_i = e^{A_i T} and B̄_i = ∫₀^T e^{A_i(T−η)} B_i dη,

for all i ∈ Z_{0+}. Now notice that, as the mid-level input u_m is piecewise constant, if at time t_i = iT the state x̄(iT) = x̄⁺(iT) = x̄_d(iT), then at time t_{i+1} = (i+1)T we have that

x̄⁻((i+1)T) = x̄_d((i+1)T).   (14)

Given the discrete time model (13), at time t_i = iT ∈ T_c we solve the following finite time optimal control problem:

J(x(iT), N) = min_{v, x^d_{i|i}}  ||x^d_{i|i} − x(iT)||²_{Q_e} + Σ_{t=i}^{i+N−1} h(x^d_{t|i}, v_{t|i}) + ||p^d_{i+N|i} − p^k_goal||²_{Q_f}
s.t.  x^d_{t+1|i} = Ā_t x^d_{t|i} + B̄_t v^d_{t|i},
      x^d_{t|i} = [p^d_{t|i}; q^d_{t|i}] ∈ X^k_{p,q} ⊖ E,  v^d_{t|i} ∈ U_m,
      x^d_{i|i} − x(iT) ∈ E,
      x^d_{i+N|i} ∈ X^k_goal ⊖ E_p,  ∀t ∈ {i, . . . , i+N−1},   (15)

where E is defined in (8), ||p||²_Q = p^⊤ Q p,

X^k_{p,q} = {x = [p; q] ∈ R^{n_x} | p ∈ C^k_curr ∪ C^k_goal and q ∈ X_q},   (16)

and

E_p = {e_p ∈ R^{n_p} | ∃ e_q ∈ R^{n_q} such that [e_p; e_q] ∈ E}.   (17)

Notice that the MPC problem (15) is designed based on the time-varying components X^k_goal, C^k_curr, C^k_goal and p^k_goal, which are given by the high-level decision maker, as shown in Figure 2. Problem (15) computes a sequence of open loop actions v^d_t = [v^d_{t|t}, . . .
, v^d_{t+N|t}] and an initial condition x^d_{i|i} such that the predicted trajectory steers the system to the terminal set X^k_goal, while minimizing the cost and satisfying state and input constraints. Let v^{d,*}_t = [v^{d,*}_{t|t}, . . . , v^{d,*}_{t+N|t}] be the optimal solution and [x^{d,*}_{t|t}, . . . , x^{d,*}_{t+N|t}] the associated optimal trajectory; then the mid-level policy is

Π_m : u_m(t) = π_m(x(t), N) = v^{d,*}_{t|t} for t ∈ T_c, and u̇_m(t) = 0 for t ∈ T.   (18)

Finally, we define the reset map from the LTV model (5) as follows:

Δ_x̄(x(t)) = x^{d,*}_{t|t}.   (19)

Assumption 2.
Consider the equilibrium set X^k_curr defined in Equation (12). For all states x(t) ∈ X^k_curr ⊕ E, Problem (15) is feasible with horizon N.

The above assumption is satisfied when any equilibrium state x̄ ∈ X^k_curr of the discrete time system (13) can be steered to the goal equilibrium set X^k_goal in at most N time steps. More formally, Assumption 2 holds when, for the discrete time system (13), X^k_goal is N-step backward reachable from the set X^k_curr.

In Section IV, we will show that when the nonlinear system (1) and the LTV system (5) are in closed-loop with the low-level policy (10) and the mid-level policy (18), state and input constraints (2) are satisfied for system (1). Furthermore, the nonlinear system (1) is steered from the current cell C^k_curr to the goal cell C^k_goal in finite time.

C. High-Level Decision Making
In this section, we first describe how to compute a control policy which maximizes the probability of satisfying the specifications. Afterwards, we show how to compute the time-varying components p^k_goal, C^k_curr, C^k_goal and X^k_goal used in the MPC problem (15).

Belief Model:
For the MOMDP from Section II, the environment state z_k is not perfectly observed. Therefore, as in [38], we introduce the belief space B = {b ∈ R^{|Z|} : Σ_{z=1}^{|Z|} b(z) = 1} and the belief state b_k ∈ B, which represents the posterior probability that the partially observable state z_k equals z ∈ Z, i.e., b_k = [b^{(1)}_k, . . . , b^{(|Z|)}_k] with

b^{(z)}_k = P(z_k = z | o_k, s_k, a_{k−1}), ∀z ∈ {1, . . . , |Z|},

where at time k the observation vector o_k = [o₀, . . . , o_k], the observable state vector s_k = [s₀, . . . , s_k] and the action vector a_{k−1} = [a₀, . . . , a_{k−1}]. The belief is a sufficient statistic and, for all z′ ∈ Z, it evolves according to the following update equation:

b^{(z′)}_{k+1} = η O(s_{k+1}, z′, a_k, o_k) Σ_{z∈Z} T_s(s_k, z, a_k, s_{k+1}) T_z(s_k, z, a_k, s_{k+1}, z′) b^{(z)}_k,

where η is a normalization constant [38]. Notice that the above update equation can be written in compact form, i.e.,

b_{k+1} = f_b(s_{k+1}, s_k, o_k, a_k, b_k),   (20)

where f_b : S × S × O × A × B → B. Finally, given the belief b_k, we introduce the maximum likelihood environment state estimate:

ẑ_k = argmax_{z∈Z} P(z_k = z | o_k, s_k, a_{k−1}) = argmax_{z∈Z} b^{(z)}_k.   (21)

Quantitative Control Policy:
At the highest level of abstraction, our goal is to compute a control policy π_h which maximizes the probability that the high-level trajectory ω satisfies the specification ψ. Such a control policy can be computed by solving the following quantitative problem:

π_h = argmax_π P^π[ω |= ψ],   (22)

Algorithm 1:
Update High-Level
inputs: x(t), o_k, s_{k−1}, a_{k−1}, b_{k−1};
1: set current high-level state s_k = getState(x(t));
2: compute current set C^k_curr = getCell(s_k);
3: update belief b_k using (20);
4: compute high-level action a_k = π_h(s_k, b_k);
5: compute maximum likelihood estimate ẑ_k using (21);
6: update state s_{k+1} = f_s(s_k, ẑ_k, a_k);
7: compute goal set C^k_goal = getCell(s_{k+1});
8: compute the forecasted action â = π_h(s_{k+1}, b_k);
9: compute the forecasted state ŝ_{k+2} = f_s(s_{k+1}, ẑ_k, â);
10: set forecasted set C^k_forc = getCell(ŝ_{k+2});
11: get forecasted cell center c_forc = getCenter(C^k_forc);
12: compute goal position p^k_goal = Proj(c_forc, C^k_goal);
return: a_k, b_k, s_k, C^k_goal, C^k_curr, p^k_goal

where P^π[ω |= ψ] represents the probability that the specification ψ is satisfied by the closed-loop trajectory ω under the policy π. The solution to the above quantitative problem can be approximated using point-based and simulation-based strategies [5], [6], [15]–[17]. The resulting high-level control policy maps the high-level state s_k and the environment belief b_k to the high-level control action a_k, i.e.,

a_k = π_h(s_k, b_k).   (23)

The high-level policy (22) is leveraged in Algorithm 1 to compute the goal position p^k_goal and the sets C^k_curr and C^k_goal, which are used in the MPC problem (15). In Algorithm 1, we first use the function getState, which maps the current state x(t) to the high-level state s_k representing the cell containing the nonlinear system (1) (line 1). Then, we compute the current cell C^k_curr associated with the high-level state s_k using the function getCell (line 2).
Afterwards, we update the belief state b_k and we compute the control action a_k (lines 3−4). Given the control action a_k and the maximum likelihood estimate of the environment state ẑ_k, we update the high-level state and compute the goal cell C^k_goal (lines 5−7). Next, given the current belief b_k, we forecast the action â and the state ŝ_{k+2} (lines 8−9). These quantities are used to compute the forecasted cell C^k_forc associated with the forecasted state ŝ_{k+2} and the forecasted cell center c_forc ∈ R^{n_p} (lines 10−11).

Fig. 3. This figure illustrates the high-level update from Algorithm 1.
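The belief update (20) and the maximum-likelihood estimate (21) used by Algorithm 1 can be sketched as follows (a hypothetical two-state environment and sensor model; not the authors' implementation):

```python
# Sketch of the belief update (20) and ML estimate (21) for a MOMDP.
# The transition and observation models below are illustrative stand-ins.

def belief_update(b, s, a, s_next, o, Tz, Obs):
    """b_{k+1}(z') ∝ Obs(s', z', a, o) * Σ_z Tz(s, z, a, s', z') * b(z)."""
    nz = len(b)
    unnorm = [Obs(s_next, zp, a, o) *
              sum(Tz(s, z, a, s_next, zp) * b[z] for z in range(nz))
              for zp in range(nz)]
    eta = sum(unnorm)          # normalization constant η
    return [u / eta for u in unnorm]

def ml_estimate(b):
    """ẑ = argmax_z b(z)."""
    return max(range(len(b)), key=lambda z: b[z])

# Hypothetical 2-state environment (z ∈ {0: blocked, 1: free}) that does not
# change over time, observed by a sensor that reports the true z 80% of the time.
Tz  = lambda s, z, a, s_next, zp: 1.0 if zp == z else 0.0
Obs = lambda s_next, zp, a, o: 0.8 if o == zp else 0.2

b0 = [0.5, 0.5]
b1 = belief_update(b0, s=0, a=1, s_next=1, o=1, Tz=Tz, Obs=Obs)
print(b1)               # observation "free" shifts mass toward z = 1
print(ml_estimate(b1))  # 1
```

Repeated consistent observations concentrate the belief further, which is what lets the high-level policy commit to exploring or avoiding an uncertain region.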
Finally, the goal cell C^k_goal and the forecasted center c_forc ∈ R^{n_p} are used to compute the goal position p^k_goal (line 12). Figure 3 illustrates Algorithm 1. In this example, the Segway is located in the top left corner of the grid and the current high-level action a_k is to move east. The figure also shows the forecasted action â that the Segway would take from the goal region, if the belief b_k is not updated. Basically, â is a high-level open-loop prediction of the future control action and it is used to incorporate forecasts into the high-level decision maker. Indeed, the goal position p^k_goal is computed by projecting the forecasted cell center c_forc onto the goal cell C^k_goal.

D. Control Architecture
Finally, we introduce the multi-rate hierarchical control architecture, which leverages the low-level, mid-level and high-level control policies from the previous sections. The multi-rate control Algorithm 2 details the architecture depicted in Figure 2. When the nonlinear system (1) reaches the goal cell (i.e., p(t) ∈ C^k_goal), the high-level decision maker reads the new observations o_{k+1} and updates the high-level state, action, goal position p^k_goal, goal cell C^k_goal and current cell C^k_curr (lines 2–7). Finally, it updates the high-level time k and it initializes the MPC horizon N^k_i = N. Afterwards, the mid-level planner (lines 8–21) updates the mid-level time counter i and the planned trajectory at a constant frequency of 1/T Hertz. First, it solves the MPC problem (15) with N = N^k_i and time-varying components X^k_goal, C^k_goal, C^k_curr and p^k_goal. If the MPC problem is not feasible, the planner computes a contingency plan (lines 10–14), otherwise it updates the prediction horizon (lines 15–16). Finally, Algorithm 2 computes the low-level control action by solving the CLF-CBF QP (9) and the total control input u(t) = u_l(t) + u_m(t).

Algorithm 2: Multi-Rate Control
1: inputs: k, s_k, b_k, a_k, i, x(t), u_m(t), x̄(t), C^k_curr, C^k_goal, p^k_goal, N^k_i, C^{k−1}_curr, C^{k−1}_goal, p^{k−1}_goal, N^{k−1}_i;
2: if p(t) ∈ C^k_goal or k = 0 then  // Update high-level goal
3:   measure o_{k+1};
4:   update a_{k+1}, b_{k+1}, s_{k+1}, X^{k+1}_goal, C^{k+1}_goal, C^{k+1}_curr, p^{k+1}_goal using Algorithm 1 with x(t), o_{k+1}, s_k, a_k, b_k;
5:   set N^{k+1}_i = N;
6:   k = k + 1;
7: end
8: if t ∈ T_c = ∪_{j=0}^∞ {jT} then  // Update mid-level plan
9:   solve the MPC problem (15) with N = N^k_i and X^k_goal, C^k_curr, C^k_goal, p^k_goal;
10:  if the MPC problem (15) is not feasible then
11:    solve the MPC problem (15) with N = N^{k−1}_i and X^{k−1}_goal, C^{k−1}_curr, C^{k−1}_goal, p^{k−1}_goal;
12:    set N^{k−1}_{i+1} = max(1, N^{k−1}_i − 1);
13:    set N^k_{i+1} = N^k_i;
14:  else
15:    set N^{k−1}_{i+1} = N^{k−1}_i;
16:    set N^k_{i+1} = max(1, N^k_i − 1);
17:  end
18:  set u_m(t) = v^{d,*}_{t|t} + K(x(t) − x̄^{d,*}_{t|t});
19:  update x̄(t) = Δ_x̄(x(t)) = x̄^{d,*}_{t|t};
20:  i = i + 1;
21: end
22: solve the CBF problem (9);  // Compute low-level control
23: compute the total input u(t) = u_l(t) + u_m(t);
24: return: u(t), k, s_k, b_k, a_k, i, x(t), u_m(t), x̄(t), C^k_curr, C^k_goal, p^k_goal, N^k_i, N^{k−1}_i

IV. SAFETY AND PERFORMANCE GUARANTEES
In this section we show the properties of the proposed multi-rate control architecture. We consider the augmented system:

Σ_aug:  ẋ(t) = f(x(t)) + g(x(t))(u_l(t) + u_m(t)),  t ≥ 0,
        ˙x̄(t) = A_{⌊t/T⌋} x̄(t) + B_{⌊t/T⌋} u_m(t),  t ∉ T_c,
        x̄⁺(t) = Δ_x̄(x⁻(t)),  t ∈ T_c,  (24)

where the nonlinear dynamics for the state x(t) ∈ R^n are defined in (1), the LTV model for the nominal state x̄(t) ∈ R^n is defined in (5), and the reset map (19) is given by the MPC. In what follows, we analyse the properties of the proposed multi-rate control Algorithm 2 in closed-loop with system (24). We show that the closed-loop system satisfies state and input constraints (2) and that the proposed algorithm maximizes the probability of satisfying the specifications. Notice that in practice the state x(t) is given by the nonlinear system (1), whereas the nominal state x̄(t) is computed by the low-level layer to update the tracking error e(t), as shown in Figure 2.

Proposition 1.
Consider the closed-loop system (10) and (24) with mid-level input u_m(t) ∈ U_m and u̇_m(t) = 0 between consecutive update times in T_c. If Assumption 1 holds and the error e(kT) = x(kT) − x̄(kT) ∈ E for all k ∈ Z_{0+}, then the control policy (10) guarantees that e(t) ∈ E and u_l(t) ∈ U_l, for all t ∈ [kT, (k+1)T).

Proof:
The proof follows from standard CBF arguments [26]. First, we notice that the error e(kT) = x(kT) − x̄(kT) follows the error dynamics in (6). Furthermore, by construction the time-varying matrices (A_{⌊t/T⌋}, B_{⌊t/T⌋}) are constant for t ∈ [kT, (k+1)T). Therefore, for all k ∈ Z_{0+} and t ∈ [kT, (k+1)T), the error dynamics in (6) are nonlinear and control affine in the low-level input u_l. This fact implies that, if at time t = kT the error e(kT) = x(kT) − x̄(kT) ∈ E, then the feasibility of the CLF-CBF QP (9) from Assumption 1 guarantees that e(t) = x(t) − x̄(t) ∈ E, for all t ∈ [kT, (k+1)T).

Remark 2.
We underline that Assumption 1 is satisfied for some extended class-K functions α_1 and α_2 when the set E is robust control invariant for system (6) with u_m(t) ∈ U_m and mild assumptions on the Lie derivatives of (6) hold (see [26] for further details). The set E may be hard to compute, and standard techniques are based on HJB reachability analysis [9], SOS programming [30], Lyapunov-based methods [31] and Lipschitz properties of the system dynamics [34], [39].

Lemma 1 shows that between time t_i = iT and t_{i+1} = (i+1)T the difference between the state x and the state x̄ of the reference trajectory is bounded. Next, we show that this property allows us to guarantee safety and convergence in finite time to the goal cell C^k_goal for the nonlinear system (1). In turn, convergence in finite time allows us to show that the high-level specifications are satisfied when the following assumption holds.

Assumption 3.
For the environment state sequence z = [z_0, z_1, ...], we have that for all s ∈ S there exists a high-level control policy κ : S × B → A such that the high-level trajectory ω satisfies the specifications ψ with probability one.

Theorem 1.
Let Assumptions 1-3 hold and consider system (24) in closed-loop with Algorithm 2. If at time t_i = iT the MPC problem (15) is feasible with N^k_i = N and time-varying components X^k_goal, C^k_curr, C^k_goal and p^k_goal, then there exists a j ∈ {i, ..., i+N−1} such that the closed-loop system satisfies state and input constraints (2) for all t ∈ {iT, ..., jT} and the state x((j+1)T) = [p^⊤((j+1)T), q^⊤((j+1)T)]^⊤ reaches the goal cell C^k_goal, i.e., p((j+1)T) ∈ C^k_goal.

Proof:
First, we show that Algorithm 1 returns a goal cell C^k_goal which is contained in the feasible set X_p. From Assumption 3 we have that, for all s ∈ S and the environment state sequence z = [z_0, z_1, ...], there exists a policy which satisfies the specifications with probability one. Consequently, the high-level policy (22), which maximizes the probability of satisfying the specifications, takes a high-level action a_k which avoids collision, i.e.,

C^k_goal ⊂ X_p. (25)

Next, we show that if at time t_i = iT the MPC problem (15) is feasible with X^k_goal, C^k_goal, C^k_curr, p^k_goal and N^k_i > 1, then at time t_{i+1} = (i+1)T the MPC problem (15) is feasible with X^k_goal, C^k_goal, C^k_curr, p^k_goal and N^k_{i+1} = N^k_i − 1. Let

[x^{d,*}_{i|i}, x^{d,*}_{i+1|i}, ..., x^{d,*}_{i+N^k_i|i}] and [u^{d,*}_{i|i}, ..., u^{d,*}_{i+N^k_i−1|i}]

be the optimal state-input sequence of the MPC problem (15) at time t_i = iT. Then, from Lemma 1, equation (14) and the definition of the reset map (19), we have that

x((i+1)T) − x̄^{d,*}_{i+1|i} = x((i+1)T) − x̄((i+1)T) ∈ E (26)

and therefore, by standard MPC arguments, the following sequences of N^k_i states and N^k_i − 1 inputs

[x^{d,*}_{i+1|i}, ..., x^{d,*}_{i+N^k_i|i}] and [u^{d,*}_{i+1|i}, ..., u^{d,*}_{i+N^k_i−1|i}] (27)

are feasible at time t_{i+1} = (i+1)T for the MPC problem (15) with X^k_goal, C^k_goal, C^k_curr, p^k_goal and N^k_{i+1} = N^k_i − 1.

Now, we show that state and input constraints are satisfied until the system reaches the goal set C^k_goal. Recall that by assumption the MPC problem is feasible at time t_i = iT with X^k_goal, C^k_goal, C^k_curr, p^k_goal and N^k_i = N, and assume that p(jT) ∉ C^k_goal for all j ∈ {i, ..., i+N−1}. By induction, the MPC problem (15) with X^k_goal, C^k_goal, C^k_curr, p^k_goal and N^k_j = N^k_i − (j − i) is feasible for all j ∈ {i, ..., i+N−1}.
Consequently, Algorithm 2 returns a feasible mid-level control action u_m(t) ∈ U_m. Furthermore, from Lemma 1, the low-level controller returns a feasible control action u_l(t) ∈ U_l and therefore

u(t) = u_l(t) + u_m(t) ∈ U_l ⊕ U_m = U, for all t ≥ 0. (28)

The feasibility of the state-input sequences in (27) for the MPC problem solved with X^k_goal, C^k_goal, C^k_curr, p^k_goal implies that

x^{d,*}_{j|j} ∈ X^k_{p,q} ⊖ E and x(jT) − x^{d,*}_{j|j} ∈ E, (29)

for all j ∈ {i, ..., i+N−1}. Consequently, from the above equation and definition (16), we have that p(jT) ∈ X_p and q(jT) ∈ X_q, for all j ∈ {i, ..., i+N−1}.

Finally, we show that the state x(t) of the augmented system (24) in closed-loop with Algorithm 2 converges to the goal cell C^k_goal in finite time. We have shown that, if p(jT) ∉ C^k_goal for all j ∈ {i, ..., i+N−1}, then the MPC problem is feasible for all times t_k = kT with k ∈ {i, ..., i+N−1}. Now we notice that, by feasibility of the MPC problem at time t_{i+N−1} = (i+N−1)T with N_{i+N−1} = 1, the optimal planned trajectory satisfies

x^{d,*}_{i+N|i+N−1} ∈ X^k_goal ⊖ E_p.

From Lemma 1, equation (14) and the definition of the reset map (19), we have that

x((i+N)T) − x̄^{d,*}_{i+N|i+N−1} = x((i+N)T) − x̄((i+N)T) ∈ E.

The above equation together with definition (17) implies that at time t_{i+N} = (i+N)T

x((i+N)T) = [p^⊤((i+N)T), q^⊤((i+N)T)]^⊤ ∈ X^k_goal ⊖ E_p ⊕ E

and therefore p((i+N)T) ∈ C^k_goal.

Concluding, if for all times t_j = jT and j ∈ {i, . . .
, i+N−1} we have that p(jT) ∉ C^k_goal, then p((i+N)T) ∈ C^k_goal. Thus, the closed-loop system converges to the goal cell C^k_goal in finite time.

Finally, we leverage Theorem 1 to show that the multi-rate control Algorithm 2 steers the system in finite time to the goal cell C^k_goal for all k ∈ Z_{0+} and, consequently, the closed-loop system satisfies the high-level specifications when Assumption 3 holds. In particular, we show that the contingency plan from lines 10-14 of Algorithm 2 guarantees feasibility of the planner when the time-varying components are updated.

Theorem 2.
Let Assumptions 1-3 hold and consider system (24) in closed-loop with Algorithm 2. If x(0) ∈ X^0_curr ⊕ E, then the closed-loop system (24) satisfies the state and input constraints (2) and the high-level specifications.

Proof:
The proof follows by induction. Assume that at time t_i = iT the closed-loop system reaches the goal cell C^k_goal, i.e., p(iT) ∈ C^k_goal. Then, at time t_i = iT the high-level decision maker from Algorithm 2 (lines 2–7) updates the high-level time and the time-varying components X^{k+1}_goal, C^{k+1}_curr, C^{k+1}_goal, p^{k+1}_goal used to design the MPC problem (15). (Note that, as long as p(jT) ∉ C^k_goal, the MPC time-varying components are not updated.) After the high-level update, the MPC problem with N = N^{k+1}_j, X^{k+1}_goal, C^{k+1}_curr, C^{k+1}_goal, and p^{k+1}_goal may be either feasible or infeasible. Thus, we analyse the following three cases for j ≥ i:

Case 1:
The MPC problem with C^{k+1}_goal, C^{k+1}_curr, p^{k+1}_goal and N = N^{k+1}_j is feasible; therefore, from Theorem 1, Algorithm 2 steers the nonlinear system to the goal C^{k+1}_goal.

Case 2:
The MPC problem with C^{k+1}_goal, C^{k+1}_curr, p^{k+1}_goal and N = N^{k+1}_j is not feasible and N^k_j = 1. Then, from Theorem 1, the contingency MPC with N^k_j, C^k_goal, C^k_curr and p^k_goal is feasible and Algorithm 2 returns a feasible control action. Furthermore, as N^k_j = 1, the terminal state of the optimal predicted trajectory satisfies

x^{d,*}_{j+1|j} ∈ X^k_goal ⊖ E_p.

The above equation together with equation (26) implies that

x((j+1)T) ∈ X^k_goal ⊖ E_p ⊕ E ⊂ X^k_goal ⊕ E;

therefore, from Assumption 2, at the next time step t_{j+1} = (j+1)T the MPC problem with N^{k+1}_{j+1} = N, X^{k+1}_goal, C^{k+1}_goal, C^{k+1}_curr and p^{k+1}_goal is feasible and, from Theorem 1, Algorithm 2 steers the nonlinear system to the goal C^{k+1}_goal in finite time.

Case 3:
The MPC problem with C^{k+1}_goal, C^{k+1}_curr, p^{k+1}_goal and N = N^{k+1}_j is not feasible and N^k_j > 1. Then, from Theorem 1, the contingency MPC with N^k_j, C^k_goal, C^k_curr and p^k_goal is feasible. (Infeasibility may be caused by the update of C^{k+1}_goal, C^{k+1}_curr and X^{k+1}_curr.)

Fig. 4. Closed-loop trajectory. The Segway first explores region R, which is traversable, and goal region G, which does not contain the science sample. Afterwards, it explores the traversable region R and it reaches G.

Fig. 5. Closed-loop probability of mission success, which equals the probability of satisfying the high-level specifications. We also report the beliefs about the regions R being free and about the goal regions G containing the science sample.

Concluding, by assumption x(0) ∈ X^0_curr ⊕ E, which from Assumption 2 implies that at time t = 0 the MPC problem is feasible; therefore, by Theorem 1, Algorithm 2 steers system (1) to the goal cell. Afterwards, as N^k_{j+1} = N^k_j − 1, Case 3 occurs at most N times until the conditions from Case 1 or Case 2 hold. Therefore, from Cases 1-2, Algorithm 2 steers system (24) to the goal cell C^k_goal for all k ∈ Z_{0+}. Consequently, as the high-level policy (22) maximizes the probability of satisfying the specifications and from Assumption 3 there exists a policy which completes the control task with probability one, the closed-loop system satisfies the specifications.

V. RESULTS
We tested the proposed strategy in simulations and experiments on navigation tasks inspired by the Mars exploration mission [5], [6], [14]. We control a Segway-like robot and our goal is to explore the environment to find science samples, which may be located in known goal regions G_i with some probability. The high-level specification is ψ = ¬collision U sample, where the atomic proposition collision is true when the Segway is in a cell which is not traversable and the atomic proposition sample is true when the Segway is in a goal cell G_i which contains a science sample. While performing the search task, we have to collect observations to determine the state of the uncertain regions R_i, which may be traversable with some probability. The controller has access only to partial observations about the environment. In particular, the Segway receives a perfect observation about the state of the uncertain region R_i when one cell away, an observation which is correct with a known probability when the Manhattan distance is smaller than two, and an uninformative observation otherwise. Similarly, the Segway receives a partial observation about the goal region G_i which is correct with a known probability when one cell away, and a perfect observation when the goal cell G_i is reached.

Fig. 6. Computational time associated with the middle and low layers. Computing the mid-level control actions takes on the order of milliseconds, and the low-level commands less than 1 ms. In this example the middle layer is discretized at 20 Hz and the lowest level at 1 kHz.

The state of the Segway is x = [X, Y, θ, v, θ̇, ψ, ψ̇], where (X, Y) represents the position of the center of mass, (θ, θ̇) the heading angle and yaw rate, v the velocity, and (ψ, ψ̇) the rod's angle and angular velocity. The control input is u = [T_l, T_r], where T_l and T_r are the torques applied to the left and right wheel motors, respectively.
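The distance-dependent observation model described above can be paired with a standard Bayes rule to update the belief that a region is traversable. The accuracy values and thresholds below are illustrative placeholders (the exact probabilities used in the paper are not reproduced here), and `update_belief` is a hypothetical scalar version of the belief update (20):

```python
def obs_accuracy(manhattan_dist):
    """Illustrative distance-dependent accuracy: perfect when one cell
    away, partially reliable nearby, uninformative far away."""
    if manhattan_dist <= 1:
        return 1.0
    if manhattan_dist <= 2:
        return 0.8
    return 0.5  # a coin-flip observation carries no information

def update_belief(b_free, obs_says_free, p_correct):
    """Bayes update of the scalar belief that a region is traversable,
    given a binary observation that is correct with prob. p_correct."""
    if obs_says_free:
        num = p_correct * b_free
        den = num + (1.0 - p_correct) * (1.0 - b_free)
    else:
        num = (1.0 - p_correct) * b_free
        den = num + p_correct * (1.0 - b_free)
    return num / den

# Approach the region: observations become more informative as the
# Manhattan distance shrinks, and the belief sharpens accordingly.
b = 0.6  # prior belief that the region is traversable
for dist in (3, 2, 1):
    b = update_belief(b, True, obs_accuracy(dist))
```

With an uninformative observation (accuracy 0.5) the belief is unchanged, while the perfect observation received one cell away collapses the belief to certainty; this is the mechanism by which the probability of mission success jumps or drops in Figure 5.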
To implement the low-level CLF-CBF QP we used the following barrier function:

h(e) = 1 − ‖diag(v_h)(x − x̄)‖₂², (30)

where v_h ∈ R^7 is a vector of inverse weights (one per state) and x̄ = [X̄, Ȳ, θ̄, v̄, ˙θ̄, ψ̄, ˙ψ̄] represents the state of the nominal system from (5). The candidate control Lyapunov function is V(e) = ‖diag(v_v)(x − x̄)‖₂², where v_v is a vector of weights, and in the CLF-CBF QP (9) we used c_1 = 1, c_2 = 10 and α(x) = x. The planning model (5) is computed by iteratively linearizing the Segway dynamics around the predicted MPC trajectory. This strategy is standard in MPC; for more details on the linearization strategy please refer to [40]. The stage cost is h(x, u) = x^⊤ Q x + u^⊤ R u with diagonal tuning matrices Q and R and terminal weight Q_f. Furthermore, we added an input rate cost with penalty Q_rate and a slack variable with weight Q_slack for the terminal constraint on the state q_{t+N|i}. Finally, we approximated

S = {e = x − x̄ ∈ R^n : h(e) ≥ 0} = {e = x − x̄ ∈ R^n : ‖diag(v_h)(x − x̄)‖₂ ≤ 1}

with the box

S̄ = {e = x − x̄ ∈ R^n : ‖diag(v_h)(x − x̄)‖_∞ ≤ c}

for a suitable constant c. This strategy allows us to write the MPC problem (15) as a QP, which we solved using OSQP [41].

Fig. 7. Comparison between the barrier function associated with the proposed strategy and a naive MPC based on the linearized dynamics. As shown in the figure, when the low-level controller is not used, the difference between the planned trajectory and the MPC trajectory grows and, as a result, the barrier function (30) becomes negative.

A. Simulation
We implemented the proposed strategy in our high-fidelity Robot Operating System (ROS) simulator. Figure 4 shows the locations of the uncertain and goal regions. The code can be found at https://github.com/DrewSingletary/segway_sim; please check the
README.md to replicate our results. In this example each goal region G_i may contain a science sample with some known probability and each uncertain region R_i may be traversable with some known probability, as shown in Figure 5.

Figure 4 shows the closed-loop trajectory of the Segway. We notice that the controller explores one of the uncertain regions, which in this example is traversable, and afterwards it reaches the first goal region. As shown in Figure 5, at the high-level time k = 19 the controller figures out that this goal cell does not contain a science sample and, consequently, the probability of mission success drops. Afterwards, the controller steers the Segway through the other traversable region to the second goal region, which contains a science sample, and the mission is completed successfully, as shown in Figure 5.

The mid-level is discretized with T = 50 ms and the low-level at 1 kHz. Figure 6 shows the computational time associated with the mid-level and low-level control actions. Computing the mid-level control action u_m(t) takes on the order of milliseconds, and the low-level action u_l(t) less than 1 ms. (Note that using the set S directly would render the MPC problem an SOCP, which is convex but computationally more demanding.)

Fig. 8. Input torque sent to the right (top) and left (bottom) motor over a short time window. The mid-level input is updated at 20 Hz, whereas the low-level action is updated at 1 kHz. Notice that the total input is the summation of the low- and mid-level inputs.

Finally, we analyse the evolution of the barrier function (30), which quantifies the difference between the trajectory x(t) of system (1) and the reference trajectory x̄(t) associated with the nominal model (5). We compared the proposed strategy with a naive MPC which is synthesized as in (15), but without taking into account the effect of the tracking error, i.e., we do not tighten the constraints and we set x_{i|i} = x(t).
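To see how a CBF-based filter keeps a barrier like (30) nonnegative, the following scalar sketch solves a one-constraint CBF-QP in closed form. The one-dimensional error dynamics ė = u, the barrier h(e) = 1 − e² and the gains are illustrative stand-ins, not the paper's CLF-CBF QP (9); with a single affine constraint, the QP projection has an explicit solution:

```python
def cbf_qp_1d(e, u_des, alpha=1.0):
    """Closed-form solution of the scalar CBF-QP
       min_u (u - u_des)^2  s.t.  (dh/de) * u >= -alpha * h(e),
    for error dynamics e_dot = u and barrier h(e) = 1 - e^2,
    so dh/de = -2e. A minimal sketch of a low-level safety filter."""
    h = 1.0 - e ** 2
    a = -2.0 * e        # coefficient of u in the CBF constraint
    b = -alpha * h      # right-hand side of the constraint
    if a * u_des >= b or abs(a) < 1e-12:
        return u_des    # constraint inactive: keep the desired input
    return b / a        # otherwise project onto the constraint boundary

# Simulate: a reference input persistently pushes the error toward the
# boundary of the safe set E = {|e| <= 1}; the filter keeps h(e) >= 0.
e, dt = 0.5, 1e-3
for _ in range(5000):
    u = cbf_qp_1d(e, u_des=2.0)
    e += dt * u
```

Even though the nominal input keeps pushing outward, the filtered error approaches the boundary of the safe set without crossing it, which mirrors the behavior of the proposed strategy in Figure 7; without the filter (u = u_des), the error leaves the set and the barrier goes negative, as for the naive MPC.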
Figure 7 shows the evolution of the barrier function for the proposed strategy and the naive MPC. We notice that when the low-level controller is not used, the barrier function becomes negative and in general has a lower magnitude. Therefore, this figure shows the advantage of the proposed hierarchical control architecture, where the low-level high-frequency controller is leveraged to track the reference trajectory. Indeed, this high-frequency feedback is used to modify the mid-level control actions, as shown in Figure 8. As discussed, the mid-level control action is updated at 20 Hz and the low-level input at 1 kHz. Notice that right after the update of the mid-level input, the contribution of the low-level input to the total control action is limited. However, as time progresses, the linearization used to plan the reference trajectory becomes less and less accurate and, for this reason, the magnitude of the low-level control action increases.

B. Experiment
We implemented the proposed multi-rate hierarchical control strategy on the Segway-like robot shown in Figure 1. State estimation is based on wheel encoders and IMU data from a VectorNav VN-100. The state estimate and the low-level control action u_l are computed on board the Segway, which is equipped with an ARM Cortex-A57 (quad-core) @ 2 GHz CPU running the ERIKA3 RTOS. (In this example the nominal model is computed by iteratively linearizing the nonlinear dynamics.)

Fig. 9. Experimental comparison between the barrier function associated with the proposed strategy and a naive MPC based on the linearized dynamics. Also in this case, when the low-level controller is not used, the difference between the planned trajectory and the MPC trajectory grows and, as a result, the barrier function (30) becomes negative.

On the other hand, the mid-level planner and the high-level decision maker run on a desktop with an Intel Core i7-8700
CPU (6-cores) @ 3.7 GHz, which sends the reference trajectory x̄ and the reference input u_m via WiFi.

Figure 1 shows the locations of the three uncertain regions, which may each be traversable with a known prior probability. In this example, we assume that the goal region G contains the science sample with probability 1. Figure 10 shows the closed-loop trajectory. First, the controller explores the first uncertain region, which is not traversable, and afterwards it steers the Segway towards the other two regions. After collecting observations about the environment, the controller detects that one of them is not traversable and that the remaining region is free space through which the Segway can navigate to reach the goal region G. A video of the experiment and a comparison with a naive MPC can be found at .

Fig. 10. Closed-loop trajectory during the experiment. The Segway first explores the uncertain regions and, afterwards, it reaches the goal region.

Fig. 11. Experimental results. Input torque sent to the right (top) and left (bottom) motor over a short time window. The mid-level input is updated at a lower frequency than the low-level action. Notice that the total input is the summation of the low- and mid-level inputs.

Figure 9 shows the evolution of the control barrier function (30). We compare the proposed strategy with a naive
MPC which is designed as in (15), but without robustifying the constraint sets and setting x_{i|i} = x(t). Also in this case, when the high-frequency low-level controller is not active, the barrier function becomes negative, meaning that the error e does not belong to the safe set E, i.e., e(t) ∉ E for some t. This result highlights the importance of the low-level high-frequency feedback from the CLF-CBF QP, which compensates for the model mismatch at the planning layer. Indeed, the MPC planner uses a linearized and discretized model, which is a first-order approximation of the true dynamics. This approximation is accurate at the discrete time instances when the MPC input is computed. For this reason, the low-level CLF-CBF QP tracking controller computes the high-frequency component u_l(t) which corrects the mid-level piecewise-constant input u_m(t), as shown in Figure 11.

VI. CONCLUSIONS
In this paper we presented a multi-rate hierarchical control architecture for navigation tasks in partially observable environments. At the lowest level we leverage a CLF-CBF QP, which is used to track a reference trajectory within some error bounds. The reference trajectory is computed by a mid-level planner which leverages an MPC with time-varying terminal components. The feasibility of the MPC planner is guaranteed via a contingency scheme and a local reachability assumption on the planning model. Finally, at the highest level of abstraction, we showed how to model the system-environment interaction using an MOMDP and we proposed an algorithm to update the MPC time-varying components. The effectiveness of the proposed strategy is shown on navigation examples, where a Segway-like robot has to find science samples while avoiding partially observable obstacles.

VII. ACKNOWLEDGEMENTS
The authors would like to thank Geoffroy le Courtois du Manoir for helping with the experiments.

REFERENCES

[1] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon temporal logic planning," IEEE Transactions on Automatic Control, vol. 57, no. 11, pp. 2817–2830, 2012.
[2] T. Wongpiromsarn, U. Topcu, and R. M. Murray, "Receding horizon control for temporal logic specifications," in Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control, 2010, pp. 101–110.
[3] P. Tabuada and G. J. Pappas, "Linear time logic control of discrete-time linear systems," IEEE Transactions on Automatic Control, vol. 51, no. 12, pp. 1862–1877, 2006.
[4] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas, "Discrete abstractions of hybrid systems," Proceedings of the IEEE, vol. 88, no. 7, pp. 971–984, 2000.
[5] S. Haesaert, R. Thakker, R. Nilsson, A. Agha-mohammadi, and R. M. Murray, "Temporal logic planning in uncertain environments with probabilistic roadmaps and belief spaces," IEEE, 2019, pp. 6282–6287.
[6] S. Haesaert, P. Nilsson, C. I. Vasile, R. Thakker, A.-a. Agha-mohammadi, A. D. Ames, and R. M. Murray, "Temporal logic control of POMDPs via label-based stochastic simulation relations," IFAC-PapersOnLine, vol. 51, no. 16, pp. 271–276, 2018.
[7] S. Kousik, S. Vaskov, F. Bu, M. Johnson-Roberson, and R. Vasudevan, "Bridging the gap between safety and real-time performance in receding-horizon trajectory design for mobile robots," arXiv preprint arXiv:1809.06746, 2018.
[8] Y. S. Shao, C. Chao, S. Kousik, and R. Vasudevan, "Reachability-based trajectory safeguard (RTS): A safe and fast reinforcement learning safety layer for continuous control," arXiv preprint arXiv:2011.08421, 2020.
[9] S. L. Herbert, M. Chen, S. Han, S. Bansal, J. F. Fisac, and C. J. Tomlin, "FaSTrack: A modular framework for fast and guaranteed safe motion planning," IEEE, 2017, pp. 1517–1522.
[10] A. Pnueli, "The temporal logic of programs," IEEE, 1977, pp. 46–57.
[11] S. G. Loizou and K. J. Kyriakopoulos, "Automatic synthesis of multi-agent motion tasks based on LTL specifications," vol. 1, IEEE, 2004, pp. 153–158.
[12] G. E. Fainekos, H. Kress-Gazit, and G. J. Pappas, "Hybrid controllers for path planning: A temporal logic approach," in Proceedings of the 44th IEEE Conference on Decision and Control, IEEE, 2005, pp. 4885–4890.
[13] M. Kloetzer and C. Belta, "A fully automated framework for control of linear systems from temporal logic specifications," IEEE Transactions on Automatic Control, vol. 53, no. 1, pp. 287–297, 2008.
[14] P. Nilsson, S. Haesaert, R. Thakker, K. Otsu, C.-I. Vasile, A.-A. Agha-Mohammadi, R. M. Murray, and A. D. Ames, "Toward specification-guided active Mars exploration for cooperative robot teams," Robotics: Science and Systems (RSS), 2018.
[15] M. Bouton, J. Tumova, and M. J. Kochenderfer, "Point-based methods for model checking in partially observable Markov decision processes," in AAAI, 2020, pp. 10061–10068.
[16] C.-I. Vasile, K. Leahy, E. Cristofalo, A. Jones, M. Schwager, and C. Belta, "Control in belief space with temporal logic specifications," IEEE, 2016, pp. 7419–7424.
[17] Y. Wang, S. Chaudhuri, and L. E. Kavraki, "Bounded policy synthesis for POMDPs with safe-reachability objectives," arXiv preprint arXiv:1801.09780, 2018.
[18] M. Ahmadi, R. Sharan, and J. W. Burdick, "Stochastic finite state control of POMDPs with LTL specifications," arXiv preprint arXiv:2001.07679, 2020.
[19] M. Kwiatkowska, G. Norman, and D. Parker, "PRISM 4.0: Verification of probabilistic real-time systems," in International Conference on Computer Aided Verification, Springer, 2011, pp. 585–591.
[20] C. Dehnert, S. Junges, J.-P. Katoen, and M. Volk, "A Storm is coming: A modern probabilistic model checker," in International Conference on Computer Aided Verification, Springer, 2017, pp. 592–600.
[21] E. Altman, Constrained Markov Decision Processes. CRC Press, 1999, vol. 7.
[22] E. J. Sondik, "The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs," Operations Research, vol. 26, no. 2, pp. 282–304, 1978.
[23] P. Poupart and C. Boutilier, "Bounded finite state controllers," in NIPS, 2003.
[24] J. Pineau, G. Gordon, S. Thrun et al., "Point-based value iteration: An anytime algorithm for POMDPs," in IJCAI, vol. 3, 2003, pp. 1025–1032.
[25] T. Gurriet, A. Singletary, J. Reher, L. Ciarletta, E. Feron, and A. Ames, "Towards a framework for realizable safety critical control through active set invariance," IEEE, 2018, pp. 98–106.
[26] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs for safety critical systems," IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, Aug 2017.
[27] L. Wang, A. D. Ames, and M. Egerstedt, "Safety barrier certificates for collisions-free multirobot systems," IEEE Transactions on Robotics, vol. 33, no. 3, pp. 661–674, 2017.
[28] K. P. Wabersich and M. N. Zeilinger, "Linear model predictive safety certification for learning-based control," IEEE, 2018, pp. 7130–7135.
[29] H. Yin, M. Bujarbaruah, M. Arcak, and A. Packard, "Optimization based planner tracker design for safety guarantees," arXiv preprint arXiv:1910.00782, 2019.
[30] S. Singh, M. Chen, S. L. Herbert, C. J. Tomlin, and M. Pavone, "Robust tracking with model mismatch for fast and safe planning: an SOS optimization approach," arXiv preprint arXiv:1808.00649, 2018.
[31] S. Singh, A. Majumdar, J.-J. Slotine, and M. Pavone, "Robust online motion planning via contraction theory and convex optimization," IEEE, 2017, pp. 5883–5890.
[32] Y. Gao, A. Gray, H. E. Tseng, and F. Borrelli, "A tube-based robust nonlinear predictive control approach to semiautonomous ground vehicles," Vehicle System Dynamics, vol. 52, no. 6, pp. 802–823, 2014.
[33] M. Kögel and R. Findeisen, "Discrete-time robust model predictive control for continuous-time nonlinear systems," IEEE, 2015, pp. 924–930.
[34] S. Yu, C. Maier, H. Chen, and F. Allgöwer, "Tube MPC scheme based on robust control invariant set with application to Lipschitz nonlinear systems," Systems & Control Letters, vol. 62, no. 2, pp. 194–200, 2013.
[35] J. Köhler, R. Soloperto, M. A. Müller, and F. Allgöwer, "A computationally efficient robust model predictive control framework for uncertain nonlinear systems," IEEE Transactions on Automatic Control, 2020.
[36] U. Rosolia and A. D. Ames, "Multi-rate control design leveraging control barrier functions and model predictive control policies," IEEE Control Systems Letters, vol. 5, no. 3, pp. 1007–1012, 2021.
[37] D. Q. Mayne, M. M. Seron, and S. Raković, "Robust model predictive control of constrained linear systems with bounded disturbances," Automatica, vol. 41, no. 2, pp. 219–224, 2005.
[38] S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee, "Planning under uncertainty for robotic tasks with mixed observability," The International Journal of Robotics Research, vol. 29, no. 8, pp. 1053–1068, 2010.
[39] Y. Chen, H. Peng, J. Grizzle, and N. Ozay, "Data-driven computation of minimal robust control invariant set," IEEE, 2018, pp. 4052–4058.
[40] U. Rosolia and F. Borrelli, "Learning how to autonomously race a car: a predictive control approach," IEEE Transactions on Control Systems Technology, 2019.
[41] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, "OSQP: An operator splitting solver for quadratic programs,"