Temporal logic control of general Markov decision processes by approximate policy refinement
arXiv preprint [cs.SY]
Sofie Haesaert¹, Sadegh Soudjani², and Alessandro Abate³
¹ California Institute of Technology, United States
² School of Computing, Newcastle University, United Kingdom
³ Computer Science Department, Oxford University, United Kingdom
Abstract.
The formal verification and controller synthesis for Markov decision processes that evolve over uncountable state spaces are computationally hard and thus generally rely on the use of approximations. In this work, we consider the correct-by-design control of general Markov decision processes (gMDPs) with respect to temporal logic properties by leveraging approximate probabilistic relations between the original model and its abstraction. We newly work with a robust satisfaction for the construction and verification of control strategies, which allows for both deviations in the outputs of the gMDPs and in the probabilistic transitions. The computation is done over the reduced or abstracted models, such that when a property is robustly satisfied on the abstract model, it is also satisfied on the original model with respect to a refined control strategy.
1 Introduction

With the ever more ubiquitous embedding of digital components into physical systems, new computationally efficient verification and control synthesis methods for these cyber-physical systems are needed. The correct functioning of cyber-physical systems can only be expressed over the combined behaviour of both the digital component and its connected physical system. Quite importantly, stochastic models are key when computers interact with physical systems such as biological processes, power networks, and smart grids. These dynamic systems with uncertainty and non-determinism can be modelled as Markov processes evolving over continuous spaces. Their potential safety-critical impact on the environment they interact with makes it of particular interest to develop formal methods that assist in their verifiable design. In this work, we newly enable the verification and synthesis of these stochastic systems with respect to probabilistic linear temporal logic properties.

As our modelling framework, we consider the rich class of general Markov decision processes (gMDPs), which are Markov decision processes evolving over continuous or uncountable state spaces and which have control-dependent stochastic transitions in combination with a metric output space. It is over this output that we define properties of interest. The characterisation of properties over such processes can in general not be attained analytically [3], so an alternative is to approximate these models by simpler processes, such as finite-state MDPs [9] or continuous-space reduced-order models [21], that are amenable to mathematical analysis or algorithmic verification [10]. In [13,12] we have proposed new approximate similarity relations to quantify the accuracy of the approximation, utilising bounds on the output distance and on the transition probabilities.
We have shown that, for bounded safety properties, these approximate similarity relations can be used to refine control strategies with bounded error on the transition probabilities and bounded deviations in the output space. The main goal of this paper is to study general temporal logic properties of gMDPs in combination with these approximate similarity relations. As the main contribution of the paper, we show that the standard verification and control synthesis for gMDPs can be made robust a priori to the introduced accuracy errors.
Related Work.
Properties defined in PCTL, PLTL, and PCTL* for finite-state Markov (decision) processes can be verified using tools such as PRISM [17]. Moreover, it is also well known how to design policies, i.e., to control these Markov decision processes such that the satisfaction probability of these properties is maximised. The work in [2] has studied model checking of automata specifications against autonomous (i.e., uncontrolled) discrete-time stochastic models over uncountable state spaces. It was shown that the computation of the probability of satisfying a specification expressed as a deterministic finite automaton (DFA) can be restated in terms of a probabilistic reachability problem over the product between the original model and the DFA. This result has been extended in [22] to the case of controlled discrete-time Markov processes, which are a special subclass of the gMDPs introduced in our work (obtained with the identity output map, h(x) = x for all x ∈ X, and so with output space Y = X). In this work we extend the model class discussed in [22] and build on their results by showing that an approximate model can be used to compute the solutions of the Bellman equations associated to the stochastic reachability problem.

The paper is organised as follows. In the next section, we first define gMDPs and state the temporal logic control problem. In Section 3, we define approximate simulation relations on gMDPs and solve the robust probabilistic reachability problem. In Section 4, we extend the results to probabilistic temporal logic control utilising the product of a gMDP and a DFA. In Section 5, we detail the approximation procedure for linear stochastic dynamical systems and apply it to a simple toy case. Throughout the paper, proofs of the theorems have been relegated to the appendix of the extended version [14].

2 Preliminaries and problem statement

In this work, we only consider Borel measurable spaces, i.e., (X, B(X)), and we restrict our attention to Polish spaces [5].
Together with the measurable space (X, B(X)), a probability measure P defines the probability space, denoted by (X, B(X), P), with realisations x ∼ P. Let us further denote the set of all probability measures for a given measurable space (X, B(X)) as P(X, B(X)). For sets A and B, a relation R ⊆ A × B is a subset of the Cartesian product A × B. The relation R relates x ∈ A with y ∈ B if (x, y) ∈ R, which is equivalently written as x R y. For a given set Y, a metric or distance function d_Y is a function d_Y : Y × Y → R≥0 satisfying the following conditions for all y1, y2, y3 ∈ Y: d_Y(y1, y2) = 0 iff y1 = y2; d_Y(y1, y2) = d_Y(y2, y1); and d_Y(y1, y3) ≤ d_Y(y1, y2) + d_Y(y2, y3). General Markov decision processes are related to control Markov processes [1] and Markov decision processes [4,20,15], and are formalised as follows.
Definition 1 (general Markov decision process (gMDP)).
A discrete-time gMDP is a tuple M = (X, π, T, U, h, Y) with
– X, an (uncountable) Polish state space with states x ∈ X as its elements;
– U, the set of controls, which is a Polish space;
– π, the initial probability measure π : B(X) → [0, 1];
– T : X × U × B(X) → [0, 1], a conditional stochastic kernel assigning to each state x ∈ X and control u ∈ U a probability measure T(· | x, u) over (X, B(X));
– Y, the output space decorated with metric d_Y; and
– h : X → Y, a measurable output map.

For any set A ∈ B(X), P_{x,u}(x(t+1) ∈ A) = ∫_A T(dx' | x(t) = x, u), where P_{x,u} denotes the conditional probability P(· | x, u). At every state, the state transition depends non-deterministically on the choice of u ∈ U. When u is chosen according to a probability measure µ_u : B(U) → [0, 1], we write u ∼ µ_u and denote the transition kernel as T(· | x, µ_u) = ∫_U T(· | x, u) µ_u(du) ∈ P(X, B(X)). Given a string of inputs {u(t)}_{t≤N} := u(0), u(1), . . . , u(N) over a finite time horizon [0, N], and an initial condition x(0) (sampled from π), the state at the (t+1)-st time instant, x(t+1), is obtained as a realisation of the controlled Borel-measurable stochastic kernel T(· | x(t), u(t)); these semantics induce paths (or executions) of the gMDP. Further, output traces of the gMDP are obtained by applying the output map h(·) to its paths, namely {y(t)}_{t≤N} := y(0), y(1), . . . , y(N) with y(t) = h(x(t)) for all t ∈ [0, N]. Denote the class of all gMDPs with the same metric output space Y as M_Y.

A policy is a selection of control inputs based on the past history of states and controls. When the selected controls only depend on the current state, the policy is referred to as Markov.

Definition 2 (Markov policy).
For a gMDP M = (X, π, T, U, h, Y), a Markov policy µ is a sequence µ = (µ_0, µ_1, µ_2, . . .) of universally measurable maps µ_t : X → P(U, B(U)), t ∈ N := {0, 1, 2, . . .}, from the state space X to the set of controls.

We allow controls to be selected via universally measurable maps [4] from the state to the space of stochastic control inputs, so that properties such as safety can be maximised [3]. We introduce the notion of a control strategy, and define it as a broader, memory-dependent version of the Markov policy above. This strategy is formulated again as a gMDP that takes as its input the state of the to-be-controlled gMDP.
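To fix intuition before the formal definition, a Markov policy driving a gMDP can be simulated directly. The following is an illustrative sketch only (the scalar linear-Gaussian kernel, the output map, and the policy are hypothetical examples, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(x, u):
    # stochastic kernel T(. | x, u): here x' ~ N(0.9 x + u, 0.2^2)
    return 0.9 * x + u + 0.2 * rng.standard_normal()

def h(x):
    # measurable output map h : X -> Y (identity for this sketch)
    return x

def policy(t, x):
    # Markov policy mu = (mu_0, mu_1, ...): each mu_t maps the current
    # state to a control input (a point-mass measure in this sketch)
    return -0.5 * x

N = 20
x = rng.standard_normal()      # x(0) ~ pi (standard Gaussian)
ys = [h(x)]
for t in range(N):
    u = policy(t, x)           # u(t) ~ mu_t(x(t))
    x = kernel(x, u)           # x(t+1) ~ T(. | x(t), u(t))
    ys.append(h(x))            # output trace y(t) = h(x(t))
print(len(ys))  # N + 1 outputs
```

Replacing `policy` by a map that also reads an internal memory state gives the more general control strategy defined next.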
Definition 3 (Control strategy).
A control strategy C = (X_C, x_{C0}, X, T_C, h_C) for a gMDP M = (X, π, T, U, h, Y) is a gMDP with state space X_C; initial state x_{C0}; input space X; universally measurable kernels T_C : X_C × X × B(X_C) → [0, 1]; and with universally measurable output map h_C : X_C → P(U, B(U)).

Note that the stochastic transitions of the control strategy and of the gMDP are selected in an alternating fashion. The output map of the strategy is indexed based on the time instant at which the resulting policy will be applied to the gMDP. This is further elucidated in Algorithm 1.

Algorithm 1 Execution semantics of the controlled model C × M.
1: procedure Execution(C, M)
2:   set t := 0 and x_C(0) := x_{C0}
3:   draw x(0) ∼ π
4:   for t ≤ N do
5:     draw x_C(t+1) ∼ T_C(· | x_C(t), x(t))
6:     draw u(t) from µ_t := h_C(x_C(t+1))
7:     draw x(t+1) ∼ T(· | x(t), u(t))
8:     set t := t + 1
9:   end for
10:  return {x(t)}_{t≤N}
11: end procedure

[Figure: interconnection of the control strategy C and the gMDP M, exchanging the states x(0), x(1), . . ., the controller states x_C(0), x_C(1), x_C(2), . . ., the inputs u(0), u(1), . . ., and the outputs y(0), y(1), y(2), . . .]

The execution {(x(t), x_C(t)), t ∈ [0, N]} of a gMDP M controlled with strategy C (denoted by C × M) is defined on the canonical sample space Ω := (X × X_C)^{N+1} endowed with its product topology B(Ω) and with a unique probability measure P_{C×M}. A Markov policy is a special case of a control strategy, one which does not have an internal state that can be used to remember relevant past events.

Remark 1.
Any Markov policy µ as defined in Def. 2 can be written as a control strategy C_µ := (X_µ, x_{µ0}, X, T_µ, h_µ) with X_µ := Q × X, for which Q is the set of time indices Q := {−1, 0, 1, 2, . . .}, and with x_{µ0} := (−1, x_0) the initial state for some x_0 ∈ X. The probability measure on the next control state x'_µ = (q', x̃') ∈ X_µ, given x(t) and given the current state x_µ = (q, x̃), is defined with a stochastic kernel as T_µ(A | (q, x̃), x(t)) := 1 if (q+1, x(t)) ∈ A, and 0 otherwise. The policy is then embedded into the control strategy via the output map h_µ((q, x̃)) := µ_q(x̃).

Consider a measurable target set K ⊂ Y. We say that an output trace {y(t)}_{t≤N} reaches a target set K if there exists a time t ∈ [0, N] such that y(t) ∈ K. This bounded reaching of K is denoted by ♦^{≤N}{y ∈ K}, or briefly ♦^{≤N}K. For N → ∞, we denote the reachability property as ♦K, i.e., eventually K. For a given gMDP M with control strategy C, a verification task consists of quantifying the probability that an output trace of C × M reaches K within the time horizon [0, N], i.e., P_{C×M}(♦^{≤N}K), or that the target set K is eventually reached, i.e., P_{C×M}(♦K), and verifying that it is within a given threshold.

More complex properties can be described using temporal logic. Consider a set of atomic propositions AP, the alphabet Σ := 2^{AP}, and infinite words that are strings composed of elements from Σ, ω = ω(0), ω(1), ω(2), . . . ∈ Σ^N. Of interest are atomic propositions that are connected to the gMDP via a measurable labelling function L : Y → Σ from the output space to the alphabet Σ. Via its trivial extension, output traces {y(t)}_{t≥0} ∈ Y^N can be mapped to the set of infinite words Σ^N, as ω = L({y(t)}_{t≥0}) := {L(y(t))}_{t≥0}. Consider linear-time temporal logic properties with syntax

ψ ::= true | p | ¬ψ | ψ1 ∧ ψ2 | ◯ψ | ψ1 U ψ2. (1)

Let ω^t = ω(t), ω(t+1), ω(t+2), . . .
be a postfix of the word ω; then the satisfaction relation between ω and a property ψ, expressed via LTL, is denoted by ω^0 ⊨ ψ (or equivalently ω ⊨ ψ). The semantics of the satisfaction relation are defined recursively over ω^t and the syntax of the LTL formula ψ. An atomic proposition p ∈ AP is satisfied by ω^t, i.e., ω^t ⊨ p, iff p ∈ ω(t). Furthermore, ω^t ⊨ ¬ψ if ω^t ⊭ ψ, and we say that ω^t ⊨ ψ1 ∧ ψ2 if ω^t ⊨ ψ1 and ω^t ⊨ ψ2. The next operator ω^t ⊨ ◯ψ holds if the property holds at the next time instance, ω^{t+1} ⊨ ψ. The temporal until operator ω^t ⊨ ψ1 U ψ2 holds if ∃i ∈ N : ω^{t+i} ⊨ ψ2, and ∀j ∈ N : 0 ≤ j < i, ω^{t+j} ⊨ ψ1. Based on these semantics, operators such as disjunction (∨) can also be defined through negation and conjunction: ω^t ⊨ ψ1 ∨ ψ2 ⇔ ω^t ⊨ ψ1 or ω^t ⊨ ψ2. We are interested in a fragment of LTL known as syntactically co-safe linear temporal logic (scLTL) [16]. Even though scLTL formulas are interpreted over infinite words, their satisfaction is guaranteed in finite time. This fragment is defined as follows.
An scLTL formula over a set of atomic propositions AP has syntax

ψ ::= true | p | ¬p | ψ1 ∧ ψ2 | ψ1 ∨ ψ2 | ◯ψ | ψ1 U ψ2 | ♦ψ (2)

with p ∈ AP.

(Footnote: Note that for this construction, the separability of the Polish space X is important, as otherwise T_µ would not be measurable in general.)

In the remainder, we will mainly consider scLTL properties, since their verification can be computed via a reachability property over a finite-state automaton [16]. With respect to an scLTL property ψ, we say that a gMDP M satisfies ψ for a given control strategy C with probability at least p iff P_{C×M}(L({y(t)}_{t≥0}) ⊨ ψ) ≥ p. Apart from this verification task, we are mostly interested in synthesising control strategies such that C × M satisfies the inequality. In this paper, we tackle the control synthesis for the (bounded) probabilistic reachability problem and for the temporal logic control problem, defined next.
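As an aside, the recursive satisfaction semantics above admit a direct implementation on finite words. The sketch below is a hypothetical illustration (not from the paper): each letter of the word is a set of atomic propositions, and a formula of syntax (2) is encoded as a nested tuple:

```python
def sat(word, t, phi):
    """Check word^t |= phi for formulas built from
    ('ap', p), ('not_ap', p), ('and', f, g), ('or', f, g),
    ('next', f), ('until', f, g), ('even', f)  -- cf. syntax (2)."""
    op = phi[0]
    if op == 'ap':
        return phi[1] in word[t]
    if op == 'not_ap':
        return phi[1] not in word[t]
    if op == 'and':
        return sat(word, t, phi[1]) and sat(word, t, phi[2])
    if op == 'or':
        return sat(word, t, phi[1]) or sat(word, t, phi[2])
    if op == 'next':
        return t + 1 < len(word) and sat(word, t + 1, phi[1])
    if op == 'until':  # exists i: word^{t+i} |= phi2 and phi1 holds before
        return any(sat(word, i, phi[2]) and
                   all(sat(word, j, phi[1]) for j in range(t, i))
                   for i in range(t, len(word)))
    if op == 'even':   # eventually phi[1]
        return any(sat(word, i, phi[1]) for i in range(t, len(word)))
    raise ValueError(op)

# word over the alphabet 2^{AP} with AP = {a, b}
word = [{'a'}, {'a'}, {'b'}, set()]
print(sat(word, 0, ('until', ('ap', 'a'), ('ap', 'b'))))  # a U b -> True
print(sat(word, 0, ('even', ('ap', 'b'))))                # eventually b -> True
```

Since scLTL satisfaction is witnessed by a finite prefix, evaluating on finite words as above suffices for this fragment.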
Problem 1 ((Bounded) probabilistic reachability)
Given a gMDP M and a set K ⊂ Y, compute a control strategy C that maximises the bounded reachability probability P_{C×M}(♦^{≤N}K) or the reachability probability P_{C×M}(♦K).

Problem 2 (Temporal logic control)
Given a gMDP M, an scLTL property ψ and a labelling function L, compute a control strategy C that maximises the probability of the controlled Markov process C × M satisfying ψ, i.e.,

max_C P_{C×M}(L({y(t)}_{t≥0}) ⊨ ψ). (3)

Exact computation of (3) for a given controlled Markov process is generally impossible. In the next sections, we give robust computations based on (ǫ, δ)-probabilistic simulation relations between a given model and its approximation.

3 (ǫ, δ)-Probabilistic simulation relations

Let a gMDP M = (X, π, T, U, h, Y) and a target set K ⊂ Y be given. In the robust formulation of Problem 1, we compute a control strategy C and a quantified lower bound p, such that the probability of satisfying ♦^{≤N}K, respectively ♦K, is lower bounded by p, i.e., P_{C×M}(♦^{≤N}K) ≥ p, respectively P_{C×M}(♦K) ≥ p.

We associate to the target set K ⊂ Y the corresponding set in the state space, K_X := h^{-1}(K) ∈ B(X). Let us denote by r_µ(K_X, N) the probability that K is reached within N time steps when a Markov policy µ = (µ_0, µ_1, . . . , µ_{N−1}) is used. We iterate the computation of stochastic reachability of a Markov decision process, as explained in [3]. The value of r_µ(K_X, N) can be computed by a backward recursion initialised with V_N = 0 and iterated for k = N−1, . . . , 0:

V_k(x) = ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V_{k+1}(x̄)] T(dx̄ | x, µ_k(x)). (4)

Based on the final value function after N iterations, we have that

r_µ(K_X, N) = ∫_X [1_{K_X}(x) + 1_{X\K_X}(x) V_0(x)] π(dx). (5)

Furthermore, the optimal value functions V*_k(x), k ∈ [0, N], computed recursively with V*_N(x) = 0 and, for all x ∈ X and k = N−1, . . . , 0,

V*_k(x) = sup_{µ_k} ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*_{k+1}(x̄)] T(dx̄ | x, µ_k(x)), (6)

give the optimal reachability probability

r*(K_X, N) = ∫_X [1_{K_X}(x) + 1_{X\K_X}(x) V*_0(x)] π(dx). (7)
Using V*_k(x), the control strategy C* maximising P_{C×M}(♦^{≤N}K) is defined by a Markov policy µ* with elements µ*_k,

µ*_k(x) ∈ arg sup_{µ_k} ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*_{k+1}(x̄)] T(dx̄ | x, µ_k(x)). (8)

The unbounded optimal reachability probability r*(K_X) can be evaluated based on the fixed point of (6). More specifically, for N → ∞ the value functions are monotonically increasing and converge to the fixed-point solution of

V*(x) = sup_µ ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*(x̄)] T(dx̄ | x, µ(x)). (9)

For a given policy µ, the unbounded reachability probability r_µ(K_X) and the associated value function V_µ(x) are formulated similarly. Thus we have that sup_C P_{C×M}(♦K) = r*(K_X) and, respectively, sup_C P_{C×M}(♦^{≤N}K) = r*(K_X, N). This, together with its universal measurability, has been discussed in [3]. Computation of the backward recursions (4) and (6) is generally only tractable for finite state spaces. In the following, we define approximate probabilistic simulation relations over M_Y, as introduced in [13], and apply them to compute a lower bound on probabilistic reachability and to the corresponding synthesis problem.

Consider two gMDPs M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1,
2, that have the same metric output space (Y, d_Y). Given state-action pairs x1 ∈ X1, u1 ∈ U1 and x2 ∈ X2, u2 ∈ U2, we want to relate the corresponding transition kernels, namely the probability measures T1(· | x1, u1) ∈ P(X1, B(X1)) and T2(· | x2, u2) ∈ P(X2, B(X2)). As in [13], we introduce the concept of δ-lifting as follows.

Definition 5 (δ-lifting for general state spaces). Let X1, X2 be two sets with associated measurable spaces (X1, B(X1)), (X2, B(X2)), and let R ⊆ X1 × X2 be a relation for which R ∈ B(X1 × X2). We denote by R̄_δ ⊆ P(X1, B(X1)) × P(X2, B(X2)) the corresponding lifted relation, so that ∆ R̄_δ Θ holds if there exists a probability space (X1 × X2, B(X1 × X2), W) (equivalently, a lifting W) satisfying
L1. for all X1 ∈ B(X1): W(X1 × X2) = ∆(X1);
L2. for all X2 ∈ B(X2): W(X1 × X2) = Θ(X2);
L3. for the probability space (X1 × X2, B(X1 × X2), W) it holds that x1 R x2 with probability at least 1 − δ, or equivalently that W(R) ≥ 1 − δ.

We introduce a notion of approximate probabilistic simulation relations which naturally leads to control refinement. For this, we build on the notion of an interface function [11] to define probabilistic simulation relations that allow for the hierarchical control refinement of two gMDPs:

U_v : U1 × X1 × X2 → P(U2, B(U2)).

This interface function U_v is required to be a Borel measurable function. Intuitively, an interface function implements (or refines) any control action synthesised over the abstract model to an action for the concrete model.

Definition 6 ((ǫ, δ)-probabilistic simulation relation). Consider two gMDPs M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1, 2, over a shared metric output space (Y, d_Y). M1 is (ǫ, δ)-stochastically simulated by M2 if there exists an interface function U_v and a relation R ⊆ X1 × X2, for which there exists a Borel measurable stochastic kernel W_T(· | u1, x1, x2) on X1 × X2 given U1 × X1 × X2, such that:
APS1. ∀(x1, x2) ∈ R: d_Y(h1(x1), h2(x2)) ≤ ǫ;
APS2. ∀(x1, x2) ∈ R, ∀u1 ∈ U1: T1(· | x1, u1) R̄_δ T2(· | x2, U_v(u1, x1, x2)), with lifted probability measure W_T(· | u1, x1, x2);
APS3. π1 R̄_δ π2.
The simulation relation is denoted as M1 ⪯_ǫ^δ M2.

This definition extends the known exact notions of probabilistic simulation in [18], and the approximate notions of [7,8], to gMDPs over Polish spaces, as elaborated in [13]. The Borel measurability of both U_v (see above) and W_T (as in this definition), which is used to prove the well-posedness of the controller refinement, can be relaxed to universal measurability [13].

δ-Robust probabilistic reachability

Definition 7 ((ǫ, δ)-Robust satisfaction). Consider any M1 ∈ M_Y. We say that a Markov policy µ for M1 (ǫ, δ)-robustly satisfies ♦^{≤N}K with probability p if for every M2 ∈ M_Y with M1 ⪯_ǫ^δ M2 there exists a controller C2 for M2 such that P_{C2×M2}(♦^{≤N}K) ≥ p.

For a given universally measurable map ν : X1 → P(U1, B(U1)) and constant δ, define the operator T^ν_δ : F → F acting on the set of functions F := {f : X1 → [0, 1]} as

T^ν_δ(V)(x) := L( ∫_{X1} [1_{K_X}(x̄) + 1_{X1\K_X}(x̄) V(x̄)] T1(dx̄ | x, ν(x)) − δ ) (10)

with L : R → [0,
1] being the truncation function L(·) := min(1, max(0, ·)). Define also the operator T*_δ(V) on F as T*_δ(V)(x) := sup_ν T^ν_δ(V)(x).

Theorem 1.
A target set K ⊂ Y of gMDP M1 is reached (0, δ)-robustly with Markov policy µ and with probability r_µ(K_X, N), where

r_µ(K_X, N) := L( ∫_{X1} [1_{K_X}(x) + 1_{X1\K_X}(x) V^δ_0(x)] π1(dx) − δ ), (11)

and V^δ_0(x) is computed recursively according to V^δ_k := T^{µ_k}_δ(V^δ_{k+1}), for k = N−1, . . . , 0, with initial value function V^δ_N = 0. If V^{δ,*}_0 is computed similarly as the solution of the recursion V^{δ,*}_k = T*_δ(V^{δ,*}_{k+1}) with initial value function V^{δ,*}_N = 0, and µ*_k ∈ arg sup_{µ_k} T^{µ_k}_δ(V^{δ,*}_{k+1}), then we call µ* = {µ*_0, µ*_1, . . .} the optimal (0, δ)-robust policy.

Notice that for δ = 0 the computation of the value functions V^δ_k in Theorem 1 is the same as (4). The proof of Theorem 1 builds on the construction of a refined control strategy, as has been explained in [13]. For any M2 such that M1 ⪯^δ_0 M2 with the lifted probability measure W_T, the control policy C2 can be refined from C1 (cf. [13]). More precisely, a control strategy C2 that refines C1 over M2 is obtained by extending C1 with internal states (x1, x2). While the state (x1, x2) of C2 is in R, the control refinement has as its basic ingredients the states x1 and x2, whose stochastic transition to the pair (x'1, x'2) is governed firstly by a point distribution δ_{x2(t)}(dx'2) based on the measured state x2(t) of M2, and subsequently by the lifted probability measure W_T(dx'1 | x'2, u1, x1, x2), conditioned on x'2.

Before tackling unbounded reachability properties, we first analyse the behaviour of T^ν_δ and T*_δ. Suppose that W1(x) ≥ W2(x) for all x ∈ X1; then for a given map ν : X1 → P(U1, B(U1)) we have T^ν_δ(W1)(x) ≥ T^ν_δ(W2)(x), hence T*_δ(W1)(x) ≥ T*_δ(W2)(x). Then the series V^δ_N, V^δ_{N−1}, . . . , V^δ_0 constructed with V^δ_k = T^{µ_k}_δ(V^δ_{k+1}) and V^δ_N = 0 is monotonically increasing.
Therefore, for a given policy µ, the sequence of functions {(T^µ_δ)^q(V)}_{q≥0} initialised with V = 0 is point-wise converging, since it is monotonically increasing and upper bounded. Additionally, the same holds for the sequence {(T*_δ)^q(V)}_{q≥0} initialised with V = 0. For unbounded reachability, this yields the following result.

Corollary 1
A target set K ⊂ Y of gMDP M1 is reached (0, δ)-robustly with time-homogeneous Markov policy µ and with probability r_µ(K_X), where

r_µ(K_X) := L( ∫_{X1} [1_{K_X}(x) + 1_{X1\K_X}(x) V^δ(x)] π1(dx) − δ ), (12)

with V^δ : X1 → [0, 1] the solution of V^δ = T^µ_δ(V^δ), computed as the limit of the sequence {(T^µ_δ)^q(V)}_{q≥0} that is initialised with V = 0. If V^{δ,*} is computed similarly as the solution of V^{δ,*} = T*_δ(V^{δ,*}), and µ* ∈ arg sup_µ T^µ_δ(V^{δ,*}), then we call µ* the optimal (0, δ)-robust policy.

(ǫ, δ)-Robust probabilistic reachability

Consider a target set K ⊂ Y, and let K^ǫ be the largest Borel measurable set such that

K^ǫ ⊂ {y | ∀ȳ ∈ Y with d_Y(ȳ, y) ≤ ǫ : ȳ ∈ K}. (13)

We can now introduce an eroded version of the original target set K_X as K^ǫ_X := h1^{-1}(K^ǫ), such that for any pair x1, x2, if x1 ∈ K^ǫ_X and x1 R x2, then h2(x2) ∈ K. As a consequence of Theorem 1 and of Corollary 1, we can evaluate the (ǫ, δ)-robust reachability with respect to K^ǫ_X.

Corollary 2 ((ǫ, δ)-robust probabilistic reachability) A target set K ⊂ Y is reached (ǫ, δ)-robustly with Markov policy µ for M1 with probability r^{(δ,ǫ)}(K_X, N) (or r^{(δ,ǫ)}(K_X)) if it reaches the target set K^ǫ as in (13), (0, δ)-robustly with probability r_µ(K^ǫ_X, N) defined in (11) (or r_µ(K^ǫ_X) defined in (12)).

Hence, for any model M2 that is in an (ǫ, δ)-probabilistic simulation relation with M1, the combination of Theorem 1 and Corollaries 1 and 2 means that we can verify probabilistic reachability robustly over M1, and moreover, that we can synthesise a robust controller maximising the satisfaction probability robustly. We now question whether we can also quantify an upper bound on a reachability probability using an approximate model M1.
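Before that, note that on a finite-state abstraction the robust recursion of Theorem 1 reduces to a truncated Bellman iteration. The following sketch uses a hypothetical three-state, two-action MDP (not from the paper) and implements V_k = T*_δ(V_{k+1}) with the truncation L(·) = min(1, max(0, ·)):

```python
import numpy as np

# hypothetical finite MDP: 3 states, 2 actions; T[a][i, j] = P(j | i, a)
T = [np.array([[0.8, 0.1, 0.1],
               [0.2, 0.6, 0.2],
               [0.0, 0.0, 1.0]]),
     np.array([[0.5, 0.4, 0.1],
               [0.1, 0.8, 0.1],
               [0.0, 0.0, 1.0]])]
goal = np.array([0.0, 0.0, 1.0])    # indicator 1_{K_X}: state 2 is the target
delta = 0.05                        # transition-probability deviation
N = 10

L = lambda v: np.clip(v, 0.0, 1.0)  # truncation L(.) = min(1, max(0, .))

V = np.zeros(3)                     # V_N = 0
for k in range(N):
    # T*_delta: optimise over actions, subtract delta, truncate to [0, 1]
    Q = np.stack([Ta @ (goal + (1 - goal) * V) - delta for Ta in T])
    V = L(Q.max(axis=0))

pi0 = np.array([1.0, 0.0, 0.0])     # initial distribution pi
r = L(pi0 @ (goal + (1 - goal) * V) - delta)
print(round(float(r), 3))           # delta-robust lower bound on the reach prob.
```

The argmax over actions at each step recovers the optimal (0, δ)-robust Markov policy of Theorem 1; setting `delta = 0` recovers the standard recursion (4).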
Consider the operator T^µ_{−δ}, defined as

T^µ_{−δ}(V)(x) := L( ∫_{X1} [1_{K^{−ǫ}_X}(x̄) + 1_{X1\K^{−ǫ}_X}(x̄) V(x̄)] T1(dx̄ | x, µ(x)) + δ ) (14)

with K^{−ǫ}_X := h1^{-1}(K^{−ǫ}) and K^{−ǫ} := {y + y_ǫ ∈ Y | y ∈ K and d_Y(0, y_ǫ) ≤ ǫ}.

Theorem 2.
Consider a target set K ⊂ Y; then an upper bound on the maximal reachability r_µ(K_X, N) of M2, for M2 ⪰^δ_ǫ M1, can be given as

∀µ : r_µ(K_X, N) ≤ r^{(−δ,−ǫ)}(K_X, N), (15)

for which r^{(−δ,−ǫ)}(K_X, N) := r^{(−δ)}(K^{−ǫ}_X, N) is computed with M1 as follows:

r^{(−δ,−ǫ)}(K_X, N) := L( ∫_{X1} [1_{K^{−ǫ}_X}(x) + 1_{X1\K^{−ǫ}_X}(x) V^δ_0(x)] π1(dx) + δ )

with V^δ_k : X1 → [0, 1] such that V^δ_k = sup_µ T^µ_{−δ}(V^δ_{k+1}) and V^δ_N = 0.

4 Temporal logic control with (ǫ, δ)-probabilistic simulation relations

We extend the results on robust probabilistic reachability to the scLTL properties in Def. 4. For this purpose, we introduce a model known as a deterministic finite-state automaton (DFA).
Definition 8 (DFA).
A DFA is a tuple A = (Q, q0, Σ, F, t), where Q is a finite set of locations, q0 ∈ Q is the initial location, Σ is a finite alphabet, F ⊆ Q is a set of accept locations, and t : Q × Σ → Q is a transition function.

A finite word composed of letters of the alphabet Σ, i.e., ω = (ω(0), . . . , ω(n)) ∈ Σ^{n+1}, is accepted by a DFA A if there exists a finite run q = (q(0), . . . , q(n+1)) ∈ Q^{n+2} such that q(0) = q0, q(i+1) = t(q(i), ω(i)) for all 0 ≤ i ≤ n, and q(n+1) ∈ F. Similarly, we say that an infinite word is accepted by a DFA A if there exists a finite prefix of ω accepted by A as a finite word. More precisely, an infinite word ω ∈ Σ^N is accepted by A if and only if there exists an infinite run q ∈ Q^N such that q(0) = q0, q(i+1) = t(q(i), ω(i)) for all i ∈ N, and there exists j ∈ N such that q(j) ∈ F. The accepted language of A, denoted L(A), is the set of all words accepted by A. For every scLTL property ψ, cf. Definition 4, there exists a DFA A_ψ such that

ω ⊨ ψ ⇔ ω ∈ L(A_ψ). (16)

As a result, the satisfaction of the property ψ now becomes equivalent to the reaching of the accept locations in the DFA. We use the DFA A_ψ to specify properties of the gMDP M = (X, π, T, U, h, Y) as follows. Remember that L : Y → Σ is a given measurable labelling function: to each output y ∈ Y it assigns the letter L(y) ∈ Σ. Given a control strategy C, we can define the probability that a path of M satisfies an scLTL property ψ, i.e., P_{C×M}(ω ∈ L(A_ψ)).

We can reduce the computation of P_{C×M}(ω ∈ L(A_ψ)) over the traces ω of M to a reachability problem over another gMDP M ⊗ A_ψ, which we refer to as the product of the gMDP M and the automaton A_ψ. This was originally derived in [22] for MDPs. We give a similar definition of the product construction as follows.
Given a gMDP M = (X, π, T, U, h, Y), a finite alphabet Σ, a labelling function L : Y → Σ, and a DFA A_ψ = (Q, q0, Σ, F, t), we define the product between M and A_ψ to be another gMDP, denoted as M ⊗ A_ψ = (X̄, π̄, T̄, U, h̄, Y). Here X̄ = X × Q, h̄(x, q) = h(x) for any (x, q) ∈ X̄, and

T̄(A × {q'} | x, q, u) = ∫_{x̃∈A} 1[q' = t(q, L(h(x̃)))] · T(dx̃ | x, u),

initialised with π̄(dx, q) = π(dx) 1[q = t(q0, L(h(x)))].

The quantity P_{C×M}(ω ∈ L(A_ψ)) can be related to the reachability probability over the gMDP M ⊗ A_ψ with the goal states G := X × F, as was shown to be the case for MDPs in [22].

Lemma 1.
Given a gMDP M, alphabet Σ, labelling function L, and scLTL specification ψ modelled with DFA A_ψ, it holds that

P_{C(µ,ψ)×M}(ω ∈ L(A_ψ)) = P_{µ×(A_ψ⊗M)}(♦G)

for any Markov policy µ on the product space of A_ψ ⊗ M and any control strategy C(µ, ψ) on M with properly defined mappings between µ and the control strategy.

δ-Robust satisfaction of scLTL properties

In the following, we analyse the robust satisfaction of scLTL specifications, which are temporal specifications that go beyond the reachability properties defined on M. The probability of satisfying such a temporal specification can be quantified as a reachability probability with respect to M ⊗ A_ψ. For two gMDPs M1 and M2, subject to M1 ⪯^δ_0 M2, we show that (δ-approximate) probabilistic simulation relations are preserved under a product with a DFA.

Theorem 3.
Let M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1, 2, be two gMDPs such that M1 ⪯^δ_0 M2, and let A = (Q, q0, Σ, F, t) be an automaton. For any labelling function L : Y → Σ we have M1 ⊗ A ⪯^δ_0 M2 ⊗ A.

This theorem enables us to quantify temporal logic properties for M2 with respect to M1. Consider an scLTL property ψ with a corresponding DFA A_ψ and two gMDPs M1, M2 ∈ M_Y for which M1 ⪯^δ_0 M2. If there exists a Markov policy µ for M1 ⊗ A_ψ such that the accepting states are reached with δ-robust probability p, then there exists a control strategy C2 for M2 such that the accepting states of A_ψ are reached with probability p under the evolution of C2 × M2. More precisely, denote by X̄1 := X1 × Q the state space of M1 ⊗ A_ψ; then the mapping T^ν_δ becomes

T^ν_δ(V)(x1, q) = L( ∫_{X1} [1_F(t_x(q, x'1)) + 1_{Q\F}(t_x(q, x'1)) V(x'1, t_x(q, x'1))] T1(dx'1 | x1, ν(x1, q)) − δ ) (17)

with t_x(q, x'1) := t(q, L(h1(x'1))). For V(x1, q) satisfying T^µ_δ(V)(x1, q) = V(x1, q), the δ-robust reachability probability is defined as

r_µ(F × X1) = L( ∫_{X1} [1_F(t_x(q0, x1)) + 1_{Q\F}(t_x(q0, x1)) V(x1, t_x(q0, x1))] π1(dx1) − δ ). (18)

(ǫ, δ)-Robust satisfaction of scLTL properties

We now integrate the ǫ error in the output space into the robust synthesis problem via the effect it has on the labelling. Given L : Y → Σ, we define the relaxed labelling L_ǫ : Y → 2^Σ as

L_ǫ(y) := {q ∈ Σ | ∃y_ǫ : d_Y(y, y_ǫ) ≤ ǫ and q = L(y_ǫ)}. (19)

Consider M1 ⪯^δ_ǫ M2 with relation R_ǫ; then, for all (x1, x2) ∈ R_ǫ, it holds that L(h2(x2)) ∈ L_ǫ(h1(x1)). In Figure 1, an output space together with its labels is depicted. When taking into account an ǫ error in the output, the labelling becomes non-deterministic in some regions.
This can be observed on the right of Figure 1. Instead of integrating this relaxed labelling into the product construction of a given gMDP, we will directly adapt the δ-robust reachability computations (17) to deal with this non-determinism.

Fig. 1: A typical labelling over the output space. On the left a normal labelling is given; on the right the labelling is non-deterministic due to the output error.

Consider the (ǫ, δ)-robust operator T^ν_{ǫ,δ}(V)(x1, q), defined as

T^ν_{ǫ,δ}(V)(x1, q) = L( ∫_{X1} min_{q' ∈ t̄_x(q, x'1)} [1_F(q') + 1_{Q\F}(q') V(x'1, q')] T1(dx'1 | x1, ν(x1, q)) − δ )

with t̄_x(q, x1) := {t(q, α) with α ∈ L_ǫ(h1(x1))}. For a time-homogeneous Markov policy µ and V(x1, q) that satisfy T^µ_{ǫ,δ}(V)(x1, q) = V(x1, q), the δ-robust reachability probability is defined as

r_{ǫ,δ}(F × X1) = L( ∫_{X1} min_{q' ∈ t̄_x(q0, x1)} [1_F(q') + 1_{Q\F}(q') V(x1, q')] π1(dx1) − δ ).

Consider an scLTL property ψ and the corresponding A_ψ with goal states F. If F × X1 is δ-robustly reachable with probability r_{ǫ,δ}(F × X1), then we can refine µ to C2(µ, ψ) such that ψ is satisfied by C2(µ, ψ) × M2 with a probability p ≥ r_{ǫ,δ}(F × X1). Of course, the apparent non-determinism due to the relaxed labelling will be resolved in the refined control strategy by selecting the labels of the concrete model.

Building on Corollary 1, we can now also maximise the (ǫ, δ)-robust probability using T*_{ǫ,δ}, defined as T*_{ǫ,δ}(V)(x1, q) := sup_µ T^µ_{ǫ,δ}(V)(x1, q), which yields an optimised robust Markov policy as µ*(x1, q) ∈ arg sup_µ T^µ_{ǫ,δ}(V*)(x1, q) for T*_{ǫ,δ}(V*)(x1, q) = V*(x1, q).

Hence, we have shown that we can leverage probabilistic simulation relations to use approximate models for the controller synthesis and verification of both probabilistic reachability (cf. Problem 1) and scLTL properties (cf. Problem 2).
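For a finite-state abstraction, the pipeline of this section (label the outputs, build the product with the DFA A_ψ, and value-iterate towards the accept locations) fits in a few lines. The sketch below is hypothetical and not from the paper: a two-state labelled Markov chain, a DFA for ♦b, and the recursion (17) with δ = 0 and no control input:

```python
import numpy as np

# hypothetical 2-state Markov chain abstraction with labels L(h(x))
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
label = ['a', 'b']                 # L(h(x)) for states 0 and 1

# DFA A_psi for psi = eventually b: locations {0, 1}, accept F = {1}
def t(q, letter):
    return 1 if (q == 1 or letter == 'b') else 0

F = {1}
N = 25

# value iteration towards G = X x F on the product M (x) A_psi,
# cf. recursion (17) with delta = 0
V = np.zeros((2, 2))               # V(x, q)
for _ in range(N):
    Vn = np.zeros_like(V)
    for x in range(2):
        for q in range(2):
            acc = 0.0
            for xp in range(2):
                qp = t(q, label[xp])          # q' = t(q, L(h(x')))
                acc += P[x, xp] * (1.0 if qp in F else V[xp, qp])
            Vn[x, q] = acc
    V = Vn

# initialisation as in (18): q(1) = t(q0, L(h(x0))), pi a point mass on state 0
q1 = t(0, label[0])
p_sat = 1.0 if q1 in F else V[0, q1]
print(round(p_sat, 3))  # prints 0.928, i.e. 1 - 0.9**25
```

Adding a `min` over the DFA successors allowed by the relaxed labelling L_ǫ, and subtracting δ inside the truncation, turns this into the (ǫ, δ)-robust operator above.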
In this section, we apply our results to formal controller synthesis for stochastic linear dynamical systems. Existing formal controller synthesis results for this class of models either rely on model-order reduction [19] or use abstraction techniques such as finite-state MDPs [9,22]. Our new results combine these two approaches in one framework that benefits from both. More precisely, the abstract model used for synthesis is obtained by discretising the state space of a reduced-order version of the concrete model.

Concrete model.
Consider the following linear dynamical model M_1:

x(t+1) = A x(t) + B u(t) + B_w w(t),  x(0) = x_0 ∈ X,
y(t) = C x(t),  t = 0, 1, 2, ...,

where x(·) ∈ X ⊂ R^n, u(·) ∈ U ⊂ R^m, and y(·) ∈ Y ⊂ R^p. The matrices A, B, B_w, and C have appropriate dimensions, and w(·) are i.i.d. random variables with standard multivariate Gaussian distributions.

Constructing the abstract model.
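As a minimal, hypothetical scalar instance of the construction described in this subsection, the partition of the reduced space, its representative points, and the induced operator Π can be sketched as follows (all numerical values are illustrative):

```python
import numpy as np

def make_partition(lo, hi, n_cells):
    """Uniform partition of a scalar X_s = [lo, hi]; midpoints as representatives."""
    edges = np.linspace(lo, hi, n_cells + 1)
    reps = 0.5 * (edges[:-1] + edges[1:])
    return edges, reps

def Pi(x, edges, reps):
    """Operator Pi: map a continuous state to the representative of its cell."""
    i = int(np.clip(np.searchsorted(edges, x, side='right') - 1, 0, len(reps) - 1))
    return reps[i]

edges, reps = make_partition(-1.0, 1.0, 10)    # cells of diameter 0.2

# one transition of the abstract model, x2+ = Pi(A_s x2 + B_s u + B_sw w),
# for hypothetical scalar reduced-order dynamics
A_s, B_s, B_sw = 0.9, 0.5, 0.1
def abstract_step(x2, u, w):
    return Pi(A_s * x2 + B_s * u + B_sw * w, edges, reps)
```

The diameter of the cells is what enters the construction as the partition error, so refining the grid tightens the resulting precision at the cost of more abstract states.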
Construction of the abstract model relies on partitioning a new space X_s ⊂ R^{n_s}, where n_s < n, as {A_i ⊂ X_s, i = 1, 2, ..., l}. Over this partition, we select representative points {z_i ∈ A_i, i = 1, 2, ..., l}, which we collect in X_2, the state space of the abstract model M_2. Introduce the operator Π : X_s → X_2 that assigns to any x ∈ A_i, i ∈ {1, ..., l}, the representative point z_i = Π(x) of A_i.

Next we provide a dynamical characterisation of M_2. The state evolution of M_2 is written as

x_2(t+1) = Π( A_s x_2(t) + B_s u_2(t) + B_{s,w} w(t) ),  x_2(0) = x_{2,0} ∈ X_2,
y_2(t) = C_s x_2(t),  t = 0, 1, 2, ...,

with state x_2(·) ∈ X_2, input u_2(·) ∈ U, and output y_2(·) ∈ Y, and matrices A_s, B_s, B_{s,w}, C_s of appropriate dimensions. Note that the noise term w(t) in M_2 is the same as the one in M_1, thereby allowing us to define a lifting W_T.

Computing the (ε, δ)-simulation relation.
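Part of establishing the relation can be automated. The sketch below checks, for hypothetical matrices, the drift condition P A_s = A P + B Q (existence of a suitable Q via least squares) together with the output conditions C_s = C P and CᵀC ⪯ M; the full quadratic inequality would additionally be verified with LMIs and the S-procedure:

```python
import numpy as np

# hypothetical concrete (A, B, C) and reduced (A_s, C_s) matrices
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
P = np.array([[1.0], [0.0]])       # x1 ~ P x2: keep the first state
A_s = np.array([[0.9]])
C_s = C @ P                        # enforces the output-matching condition

def check_conditions(M):
    # P A_s = A P + B Q must hold for some Q: solve least squares, check residual
    rhs = P @ A_s - A @ P                      # must lie in the range of B
    Q, *_ = np.linalg.lstsq(B, rhs, rcond=None)
    drift_ok = np.allclose(B @ Q, rhs)
    # C^T C <= M (positive semidefinite ordering) bounds the output distance on R
    output_ok = np.all(np.linalg.eigvalsh(M - C.T @ C) >= -1e-9)
    return drift_ok, output_ok

drift_ok, output_ok = check_conditions(M=np.eye(2))
```

Choosing a larger M weakens the output-distance guarantee and enlarges the relation, so M is a design parameter trading off ε against the feasibility of the invariance inequality.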
Consider the linear interface function u_1 = R u_2 + Q x_2 + K(x_1 − P x_2), with P a matrix satisfying P A_s = A P + B Q. Define the relation (x_1, x_2) ∈ R^δ_ε iff (x_1 − P x_2)ᵀ M (x_1 − P x_2) ≤ ε². Next we check the conditions of Def. 6 under which M_2 ⪯^δ_ε M_1. It is guaranteed that d_Y(y_1, y_2) ≤ ε for any (x_1, x_2) ∈ R^δ_ε (cf. APS1 in Def. 6) if C_s = C P and CᵀC ⪯ M. Condition APS2 in Def. 6 holds if c_w is selected such that P(wᵀw ≤ c_w) ≥ 1 − δ and the inequality

(Ā x̄ + B̄ u_2 + B̄_w w + P β)ᵀ M (Ā x̄ + B̄ u_2 + B̄_w w + P β) ≤ ε²   (20)

is satisfied for all x̄, u_2, w, β such that wᵀw ≤ c_w, |u_2| ≤ c_u, x̄ᵀ M x̄ ≤ ε², and |β| ≤ δ_s. The matrices in (20) are defined as Ā := A + BK, B̄ := BR − PB_s, and B̄_w := B_w − P B_{s,w}. The vector δ_s is the diameter of the partition {A_i, i = 1, ..., l}, which satisfies |x_s − x'_s| ≤ δ_s component-wise for any x_s, x'_s ∈ A_i and any i ∈ {1, 2, ..., l}. Condition (20) can be checked using LMIs and the S-procedure [6].

For condition APS3 in Def. 6, we ask when there exists a deterministic initial state x_{2,0} for a given deterministic initial state x_{1,0} satisfying (x_{1,0} − P x_{2,0})ᵀ M (x_{1,0} − P x_{2,0}) ≤ ε². We can choose x_{2,0} := P̂ x_{1,0} with P̂ := (Pᵀ M P)⁻¹ Pᵀ M, which minimises the left-hand side, and select it as the representative point of the associated partition set in M_2. Alternatively, when the representative point cannot be freely chosen, we select x_{2,0} = Π(P̂ x_{1,0}). In the former case, there exists an initial state x_{2,0} if x_{1,0}ᵀ M (I_n − P P̂) x_{1,0} ≤ ε², or, dually, ε is lower bounded for a given x_{1,0}. In the latter case, if ‖M^{1/2}(I_n − P P̂) x_{1,0}‖ + ‖M^{1/2} P δ_s‖ ≤ ε, then there exists an initial state x_{2,0} satisfying condition APS3.

Toy example.
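The toy example below uses a bounded reach-and-stay specification; a DFA for this fragment can be sketched directly. The two-letter alphabet {'K', '!K'} and the absorbing accepting state are simplifying assumptions of this illustration, not the automaton of Figure 2:

```python
# DFA sketch for psi = <> []^{<=n} {y in K}: states 0..n count consecutive
# steps inside K; state n is accepting and (here) absorbing
def make_reach_and_stay_dfa(n):
    def t(q, a):
        if q == n:                       # accepted: stay accepted
            return n
        return q + 1 if a == 'K' else 0  # reset the counter on leaving K
    return t, 0, {n}

t, q0, F = make_reach_and_stay_dfa(3)

def run(word):
    q = q0
    for a in word:
        q = t(q, a)
    return q in F

print(run(['!K', 'K', 'K', 'K']))   # True: three consecutive steps in K
print(run(['K', 'K', '!K', 'K']))   # False: the stay is interrupted
```

In the product construction this transition function t is exactly what drives the discrete coordinate q of the product gMDP.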
We consider the specification ψ = ◇□^{≤n}{y ∈ K}, which encodes reach and stay over a bounded time interval. The associated DFA is given in Figure 2, together with an illustration of a potential application of this toy example.

Fig. 2: A game of tag: ◇□^{≤n}{x_a ∈ K}.

Consider the original model M_1, a three-dimensional model with output y(t) = x_a(t) and

x_a(t+1) = x_a(t) − a_1 (x_b(t) − x_c(t)) − a_2 u(t) + a_3 w(t),
x_b(t+1) = b x_b(t) + u(t),
x_c(t+1) = c_1 x_c(t) + c_2 w(t),   (21)

with a_1 = 0.…, a_2 = 0.…, a_3 = 6e-3, b = 0.…, c_1 = 0.…, c_2 = 0.….
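Sample paths of dynamics with the shape of (21) are generated directly. Since several coefficients above are elided, the sketch below uses hypothetical placeholder values and an arbitrary initial state:

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder coefficients (hypothetical; the paper's values differ)
a1, a2, a3 = 0.2, 0.1, 6e-3
b, c1, c2 = 0.8, 0.9, 0.1

def step(x, u):
    xa, xb, xc = x
    w = rng.standard_normal()            # shared scalar Gaussian disturbance
    return np.array([
        xa - a1 * (xb - xc) - a2 * u + a3 * w,   # x_a update as in (21)
        b * xb + u,                              # x_b update
        c1 * xc + c2 * w,                        # x_c update
    ])

x = np.array([1.0, 0.5, 0.0])            # arbitrary initial state
ys = []
for t in range(100):
    x = step(x, u=0.0)                   # uncontrolled run, for illustration
    ys.append(x[0])                      # output y(t) = x_a(t)
ys = np.asarray(ys)
```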
For the game we consider the case with n = 3. As in [13], we compute the lower-dimensional model via balanced truncation of the original controlled model, with a suitable feedback gain K = [−0.… 0.… −0.…].

Fig. 3: On the left: the (ε, δ)-robust satisfaction probability r_{ε,δ}(F, X) of ◇□^{≤n}{y ∈ [−…, …]} with ε = 1.… and δ = 0.03, as a function of the initial state. On the right: simulation runs of the original model and the abstract model under the composed robust controller; the crosses are the outputs of the abstract model M_2, whereas the lines are the outputs of the original model M_1.

In this paper, we have introduced a new robust way of synthesising control strategies and verifying probabilistic temporal logic properties. Beyond this theoretical contribution, future work will focus on the computational aspects of this approach, to prepare for applications to realistically sized problems.
References

1. A. Abate. Approximation metrics based on probabilistic bisimulations for general state-space Markov processes: A survey. ENTCS, 297:3–25, 2013.
2. A. Abate, J.-P. Katoen, and A. Mereacre. Quantitative automata model checking of autonomous stochastic hybrid systems. In Proc. 14th ACM Int. Conf. on Hybrid Systems: Computation and Control, pages 83–92, 2011.
3. A. Abate, M. Prandini, J. Lygeros, and S. Sastry. Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica, 44(11):2724–2734, 2008.
4. D. Bertsekas and S. E. Shreve. Stochastic Optimal Control: The Discrete Time Case. Athena Scientific, 1996.
5. V. I. Bogachev. Measure Theory. Springer Science & Business Media, 2007.
6. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
7. J. Desharnais, F. Laviolette, and M. Tracol. Approximate analysis of probabilistic processes: Logic, simulation and games. In Int. Conf. on Quantitative Evaluation of Systems, pages 264–273, Sept. 2008.
8. A. D'Innocenzo, A. Abate, and J.-P. Katoen. Robust PCTL model checking. In Proc. 15th ACM Int. Conf. on Hybrid Systems: Computation and Control, pages 275–285, 2012.
9. S. Esmaeil Zadeh Soudjani and A. Abate. Adaptive and sequential gridding procedures for the abstraction and verification of stochastic processes. SIAM Journal on Applied Dynamical Systems, 12(2):921–956, 2013.
10. S. Esmaeil Zadeh Soudjani, C. Gevaerts, and A. Abate. FAUST²: Formal abstractions of uncountable-state stochastic processes. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, pages 272–286. Springer, 2015.
11. A. Girard and G. J. Pappas. Hierarchical control system design using approximate simulation. Automatica, 45(2):566–571, 2009.
12. S. Haesaert, A. Abate, and P. M. J. Van den Hof. Verification of general Markov decision processes by approximate similarity relations and policy refinement. In Quantitative Evaluation of Systems (QEST), pages 227–243, 2016.
13. S. Haesaert, S. Esmaeil Zadeh Soudjani, and A. Abate. Verification of general Markov decision processes by approximate similarity relations and policy refinement. SIAM Journal on Control and Optimization, 55(4):2333–2367, 2017.
15. O. Hernández-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes, volume 30 of Applications of Mathematics. Springer, 1996.
16. O. Kupferman and M. Y. Vardi. Model checking of safety properties. Formal Methods in System Design, pages 291–314, 2001.
17. M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In G. Gopalakrishnan and S. Qadeer, editors, Proc. 23rd Int. Conf. on Computer Aided Verification (CAV'11), volume 6806 of LNCS, pages 585–591. Springer, 2011.
18. K. G. Larsen and A. Skou. Bisimulation through probabilistic testing. Information and Computation, 94(1):1–28, 1991.
19. A. Lavaei, S. Esmaeil Zadeh Soudjani, R. Majumdar, and M. Zamani. Compositional abstractions of interconnected discrete-time stochastic control systems. ArXiv e-prints, Sept. 2017.
20. S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer-Verlag London, 1993.
21. M. G. Safonov and R. Chiang. A Schur method for balanced-truncation model reduction. IEEE Transactions on Automatic Control, 34(7):729–733, 1989.
22. I. Tkachev, A. Mereacre, J.-P. Katoen, and A. Abate. Quantitative automata-based controller synthesis for non-autonomous stochastic hybrid systems. In HSCC, pages 293–302, 2013.

¹ The simulation runs of Fig. 3 are initiated at x_a = 2.…, x_b = 2.…, x_c = 1.….
A Additional proofs
Proof (of Theorem 1). Consider any two models M_1, M_2 ∈ M_Y with M_2 ⪯_δ M_1. A Markov policy μ = (μ_0, ..., μ_{N−1}) for M_2 can be refined to M_1. Technically, this means that we first write it as a control strategy, for which control refinement can be carried out as proven in [13]. Denote the control strategy that refines the Markov policy by C. Then the composed system C × M_1 contains transitions over X_2 × X_1 with stochastic transition kernels W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1) for k ∈ {0, ..., N−1}.

We now need to show that r^μ_δ(K_X, N), defined in (11), is a lower bound for the probability that the target set K is reached by C × M_1 in N time steps, P_{C×M_1}(◇^{≤N} K). Observe that the following holds:

P_{C×M_1}[ ((x_2, x_1) ∈ R) U^{≤N} ((x_2, x_1) ∈ R ∧ x_2 ∈ K_X) ] ≤ P_{C×M_1}[ ◇^{≤N} (y_1 ∈ K) ],   (22)

since ((x_2, x_1) ∈ R ∧ x_2 ∈ K_X) implies h_1(x_1) = h_2(x_2) and h_1(x_1) ∈ K. The left-hand side of (22) can be computed via the backward recursion of a reach-avoid property. For this, the safe set (i.e., the complement of the avoid set) is R and the reach set is G := (K_X × X_1) ∩ R. The value functions {V_k, k = 0, ..., N} of the backward recursion are initialised with V_N ≡ 0 and computed as

V_k(x_2, x_1) = ∫_{X_2×X_1} [ 1_G(x̄_2, x̄_1) + 1_{R\G}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1).

This can also be written as

V_k(x_2, x_1) = ∫_R [ 1_{K_X×X_1}(x̄_2) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1).

We now want to compute a lower bound on V_k(x_2, x_1) based on backward computations over M_2. The value functions V^δ_k : X_2 → [0, 1] are defined inductively as V^δ_k = T^{μ_k}_δ(V^δ_{k+1}), which is

V^δ_k(x_2) := L( ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2)) − δ ),

initialised with V^δ_N ≡ 0. With focus on the above two recursions, we claim that if V^δ_{k+1}(x_2) is a lower bound for V_{k+1}(x_2, x_1) for all (x_2, x_1) ∈ R, then V^δ_k(x_2) is also a lower bound for V_k(x_2, x_1). Once we prove this claim, by induction we get that V^δ_0(x_2) is a lower bound for V_0(x_2, x_1). As a result, the δ-robust probability can be computed as

r^μ_δ(K_X, N) := L( ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_0(x̄_2) ] π_2(dx̄_2) − δ ).

To prove the claim, we define Ṽ^δ_k : X_2 → [−δ, 1] as

Ṽ^δ_k(x_2) := ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2)) − δ,

such that V^δ_k(x_2) = L(Ṽ^δ_k(x_2)). For any (x_2, x_1) ∈ R, we have

Ṽ^δ_k(x_2) + δ = ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2))
= ∫_{X_2×X_1} [ 1_{K_X×X_1}(x̄_2, x̄_1) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V^δ_{k+1}(x̄_2) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)   (23)
≤ ∫_R [ 1_{K_X×X_1}(x̄_2, x̄_1) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)
  + ∫_{(X_2×X_1)\R} W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)   (24)
≤ V_k(x_2, x_1) + δ.

Note that (23) follows from the fact that W_T is a lifting of T_2, and (24) holds since V^δ_{k+1}(x̄_2) is upper bounded by 1 and, over R, upper bounded by V_{k+1}(x̄_2, x̄_1); the last step uses W_T((X_2 × X_1) \ R | ·) ≤ δ. Thus, by the definition of V_k, we have Ṽ^δ_k(x_2) ≤ V_k(x_2, x_1) for all (x_2, x_1) ∈ R. Since V_k is also lower bounded by 0, we have V^δ_k(x_2) = L(Ṽ^δ_k(x_2)) ≤ V_k(x_2, x_1), which completes the proof.
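The induction step of this proof can be exercised numerically on a finite example: build a lifted kernel W_T whose diagonal (the relation R) carries at least 1 − δ of the mass, take T_2 as its marginal, and run both recursions. The construction below is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, delta = 4, 6, 0.1
K = np.array([False, False, False, True])       # target set K_X (last state)

# lifted kernel W[i, j2, j1] for coupled diagonal pairs (i, i):
# (1 - delta) of the mass keeps the two copies together (stays in R),
# the remaining delta is spread arbitrarily, so W(R | i) >= 1 - delta
p = rng.random((n, n)); p /= p.sum(axis=1, keepdims=True)
q = rng.random((n, n, n)); q /= q.sum(axis=(1, 2), keepdims=True)
W = delta * q
for i in range(n):
    W[i] += (1 - delta) * np.diag(p[i])

T2 = W.sum(axis=2)                              # abstract marginal kernel on X_2
diag = np.einsum('ijj->ij', W)                  # mass that remains inside R

Vc = np.zeros(n)                                # coupled value V_k on the diagonal
Vd = np.zeros(n)                                # delta-robust value V^delta_k
for _ in range(N):
    Vc = diag @ np.where(K, 1.0, Vc)
    Vd = np.clip(T2 @ np.where(K, 1.0, Vd) - delta, 0.0, 1.0)

# the robust recursion is a guaranteed lower bound on the coupled one
print(np.all(Vd <= Vc + 1e-12))                 # True
```

Shrinking delta to 0 makes the two recursions coincide, mirroring the exact (non-approximate) simulation case.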
Proof of Theorem 2 ). This proof upper-bounds the maximal reach-ability probability and follows an analogue path to the proof of Theorem 1.Consider any two models M , M ∈ M Y with M (cid:23) δǫ M . A Markov policy µ = ( µ , . . . , µ N ) for M can be refined to M such that the composed system C × M contains transitions over X × X with stochastic transition kernelsdefined as W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) for k ∈ { , . . . , N − } . We now need to show that P µ × M ( ♦ ≤ N K ) has an upper bound computed as givenin Theorem 2. Remark that ♦ ≤ N ( y ∈ K ) implies the specification ψ := [( x , x ) ∈ R ] U ≤ N (cid:2) ( x , x ) ∈ ¯ R ∨ ( x ∈ K − ǫ X ) (cid:3) with ¯ R := X × X \ R . Hence it follows that P µ × M (cid:2) ♦ ≤ N ( y ∈ K ) (cid:3) ≤ P C × M [ ψ ] . (25)The right-hand side of (25) can be computed via backward recursion of reach-avoid property. For this the safe set (i.e., the complement of the avoid set) is R and the reach set is (( K X × X ) ∩ R ) ∪ ¯ R . The value functions V k in thebackward recursion are initialised with V N = 0 and computed as V k ( x , x ) = Z X × X h ( K − ǫ X × X ) ∩R (¯ x , ¯ x ) + ¯ R (¯ x , ¯ x )+ R\ ( K − ǫ X × X ) (¯ x , ¯ x ) V k +1 (¯ x , ¯ x ) i W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) . (26)We now want to compute an upper bound on V k ( x , x ) based on backwardrecursion over M . Let V − δk : X → [0 ,
1] be defined inductively as V − δk ( x ) := sup ¯ µ L (cid:18)Z X h K X (¯ x ) + X \ K X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ k ( x ) , x ) + δ (cid:19) and initialised with V − δN = 0. We claim that if V − δk +1 ( x ) ≥ V k +1 ( x , x ) for all( x , x ) ∈ R , then V − δk ( x ) ≥ V k ( x , x ). Thus by induction we get V − δ ( x ) ≥ V ( x , x ). By repeating the same argument for initial measure π , we get Theo-rem 2.In order to prove the above claim we need to define value functions ˜ V δk : X → [ − δ,
1] as˜ V − δk ( x ) := sup ¯ µ (cid:18)Z X h K X (¯ x ) + X \ K X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ k ( x ) , x ) (cid:19) + δ, such that V − δk ( x ) = L (cid:16) ˜ V − δk ( x ) (cid:17) due to the fact that L and sup are inter-changeable here. For any ( x , x ) ∈ R we have V k ( x , x ) ≤ Z R h ( K − ǫ X × X ) ∩R (¯ x , ¯ x ) + R\ ( K − ǫ X × X ) (¯ x , ¯ x ) V k +1 (¯ x , ¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ Z R h ( K − ǫ X × X ) (¯ x , ¯ x ) + ( X × X ) \ ( K − ǫ X × X ) (¯ x , ¯ x ) V − δk +1 (¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ Z X × X h ( K − ǫ X × X ) (¯ x , ¯ x ) + ( X × X ) \ ( K − ǫ X × X ) (¯ x , ¯ x ) V − δk +1 (¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ sup ¯ µ Z X h K − ǫ X (¯ x ) + X \ K − ǫ X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ ( x ) , x ) + δ = ˜ V − δk ( x ) . Since V k ( x , x ) : X × X → [0 ,
1] it holds that if V k ( x , x ) ≤ ˜ V − δk ( x ) thenalso V k ( x , x ) ≤ L (cid:16) ˜ V − δk ( x ) (cid:17) = V − δk ( x ). This completes the proof. Proof (
Proof of Theorem 3 ). Since M (cid:22) δ M , according to Definition 6there exists an interface function U v , a relation R ⊆ X × X , and a Borelmeasurable stochastic kernel W T ( · | u , x , x ) such that1. ∀ ( x , x ) ∈ R , h ( x ) = h ( x );2. ∀ ( x , x ) ∈ R , ∀ u ∈ U , T ( ·| x , u ) ¯ R δ T ( ·| x , U v ( u , x , x )) , with liftedprobability measure W T ( · | u , x , x );3. π ¯ R δ π .Indicate the product gMDPs by M i ⊗ A = ( ¯ X i , ¯ π i , ¯ T i , U , ¯ h i , Y ). According toDefinition 9 we have for any i = 1 ,
2, ¯ X i = X i × Q , ¯ π i ( dx i , q ) = π i ( dx i ) · q = q ),¯ h i ( x i , q ) = h i ( x i ) for any ( x i , q ) ∈ ¯ X i , and¯ T i ( A i × { q ′ }| x i , q, u i ) = Z ˜ x i ∈ A i ( q ′ = t ( q, L ( h i (˜ x i )))) · T i ( d ˜ x i | x i , u i ) . In order to prove the theorem we construct the relation R p , the interface function U p v , and the lifted measure W p T based on R , U v , W T .1. Define R p ⊆ ¯ X × ¯ X with ( x , q ) R p ( x , q ) iff x R x and q = q . Selectany ¯ x = ( x , q ) ∈ ¯ X and ¯ x = ( x , q ) ∈ ¯ X . Then¯ h (¯ x ) = ¯ h ( x , q ) = h ( x ) and ¯ h (¯ x ) = ¯ h ( x , q ) = h ( x ) . Thus(¯ x , ¯ x ) ∈ R p ⇒ ( x , x ) ∈ R ⇒ h ( x ) = h ( x ) ⇒ ¯ h (¯ x ) = ¯ h (¯ x ) .
2. Since π_2 R̄_δ π_1, there exists a lifted measure W_Init such that W_Init(R) ≥ 1 − δ, W_Init(A_2 × X_1) = π_2(A_2) for all A_2 ∈ B(X_2), and W_Init(X_2 × A_1) = π_1(A_1) for all A_1 ∈ B(X_1). Define the probability space (X̄_2 × X̄_1, B(X̄_2 × X̄_1), W̄_Init) with the property that for all A_2 ∈ B(X_2), A_1 ∈ B(X_1), and q_2, q_1 ∈ Q,

W̄_Init({q_2} × A_2 × {q_1} × A_1) := W_Init(A_2 × A_1) 1(q_2 = q_0) 1(q_1 = q_0),

where q_0 is the initial state of the automaton A. In words, W̄_Init assigns probabilities (which are the same as W_Init) to Borel measurable subsets of X̄_2 × X̄_1 if and only if the discrete modes are equal to q_0. For this particular lifted measure, we have
– W̄_Init({q_2} × A_2 × X̄_1) = W_Init(A_2 × X_1) 1(q_2 = q_0) = π_2(A_2) 1(q_2 = q_0) = π̄_2({q_2} × A_2);
– W̄_Init(X̄_2 × {q_1} × A_1) = W_Init(X_2 × A_1) 1(q_1 = q_0) = π_1(A_1) 1(q_1 = q_0) = π̄_1({q_1} × A_1);
– W̄_Init(R_p) = W̄_Init({(x̄_2, x̄_1) : (x_2, x_1) ∈ R ∧ q_2 = q_1}) = W_Init(R) ≥ 1 − δ.
Therefore π̄_2 R̄_{p,δ} π̄_1 with lifted measure W̄_Init.
3. We know that for all (x_2, x_1) ∈ R and u_2 ∈ U,

T_2(· | x_2, u_2) R̄_δ T_1(· | x_1, U_v(u_2, x_2, x_1)),

with lifted probability measure W_T(· | u_2, x_2, x_1). Define the new measure W̄_T(· | u_2, x̄_2, x̄_1) on X̄_2 × X̄_1 given U × X̄_2 × X̄_1 with

W̄_T({q'_2} × A_2 × {q'_1} × A_1 | u_2, q_2, x_2, q_1, x_1)
:= ∫_{x̃_2 ∈ A_2} ∫_{x̃_1 ∈ A_1} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) · 1(q'_1 = t(q_1, L(h_1(x̃_1)))) · W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1).   (27)

In words, W̄_T assigns to Borel measurable subsets of X̄_2 × X̄_1 the same probabilities as W_T and evolves the discrete modes of the two gMDPs according to the automaton A. For this particular lifted measure W̄_T, we have, for any x̄_2 = (x_2, q_2) ∈ X̄_2 and x̄_1 = (x_1, q_1) ∈ X̄_1 with x̄_2 R_p x̄_1 and any u_2 ∈ U:

W̄_T({q'_2} × A_2 × X̄_1 | u_2, q_2, x_2, q_1, x_1) = Σ_{q'_1 ∈ Q} ∫_{x̃_2 ∈ A_2} ∫_{x̃_1 ∈ X_1} ...
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) ∫_{x̃_1 ∈ X_1} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) W_T(dx̃_2 × X_1 | u_2, x_2, x_1)
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) T_2(dx̃_2 | x_2, u_2)
= T̄_2(A_2 × {q'_2} | x_2, q_2, u_2),

and

W̄_T(X̄_2 × {q'_1} × A_1 | u_2, q_2, x_2, q_1, x_1) = Σ_{q'_2 ∈ Q} ∫_{x̃_2 ∈ X_2} ∫_{x̃_1 ∈ A_1} ...
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) ∫_{x̃_2 ∈ X_2} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) W_T(X_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) T_1(dx̃_1 | x_1, U_v(u_2, x_2, x_1))
= T̄_1(A_1 × {q'_1} | x_1, q_1, U_v(u_2, x_2, x_1)).

Take any (x_2, q_2) R_p (x_1, q_1), which implies q_2 = q_1 and x_2 R x_1, hence h_2(x_2) = h_1(x_1). If we also require (x̃_2, q'_2) R_p (x̃_1, q'_1), then t(q_2, L(h_2(x̃_2))) = t(q_1, L(h_1(x̃_1))), and we get

W̄_T(R_p | u_2, q_2, x_2, q_1, x_1) = ∫_{(x̃_2, x̃_1) ∈ R} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1) · Σ_{q'_2, q'_1 ∈ Q} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) · 1(q'_1 = t(q_1, L(h_1(x̃_1)))).

The above sum is equal to one due to q_2 = q_1, (x̃_2, x̃_1) ∈ R, and the DFA being deterministic. Then

W̄_T(R_p | u_2, q_2, x_2, q_1, x_1) = ∫_R W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1) ≥ 1 − δ.

We have thus shown that T̄_2(· | x_2, q_2, u_2) R̄_{p,δ} T̄_1(· | x_1, q_1, U_v(u_2, x_2, x_1)) with the lifted measure W̄_T defined in (27) and the same interface function U_v, which completes the proof.