Temporal logic control of general Markov decision processes by approximate policy refinement
arXiv preprint [cs.SY]
Sofie Haesaert¹, Sadegh Soudjani², and Alessandro Abate³
¹ California Institute of Technology, United States
² School of Computing, Newcastle University, United Kingdom
³ Computer Science Department, Oxford University, United Kingdom
Abstract.
The formal verification and controller synthesis for Markov decision processes that evolve over uncountable state spaces are computationally hard and thus generally rely on the use of approximations. In this work, we consider the correct-by-design control of general Markov decision processes (gMDPs) with respect to temporal logic properties by leveraging approximate probabilistic relations between the original model and its abstraction. We newly work with a robust satisfaction for the construction and verification of control strategies, which allows for both deviations in the outputs of the gMDPs and in the probabilistic transitions. The computation is done over the reduced or abstracted models, such that when a property is robustly satisfied on the abstract model, it is also satisfied on the original model with respect to a refined control strategy.
1 Introduction

With the ever more ubiquitous embedding of digital components into physical systems, new computationally efficient verification and control synthesis methods for these cyber-physical systems are needed. The correct functioning of cyber-physical systems can only be expressed over the combined behaviour of both the digital component and its connected physical system. Quite importantly, stochastic models are key when computers interact with physical systems such as biological processes, power networks, and smart grids. These dynamic systems with uncertainty and non-determinism can be modelled as Markov processes evolving over continuous spaces. Their potential safety-critical impact on the environment they interact with makes it of particular interest to develop formal methods that assist in their verifiable design. In this work, we newly enable the verification and synthesis of these stochastic systems with respect to probabilistic linear temporal logic properties.

As our modelling framework, we consider the rich class of general Markov decision processes (gMDPs), which are Markov decision processes evolving over continuous or uncountable state spaces and which have control-dependent stochastic transitions in combination with a metric output space. It is over this output that we define properties of interest. The characterisation of properties over such processes can in general not be attained analytically [3], so an alternative is to approximate these models by simpler processes, such as finite-state MDPs [9] or continuous-space reduced-order models [21], that are amenable to mathematical analysis or algorithmic verification [10]. In [13,12] we have proposed new approximate similarity relations to quantify the accuracy of the approximation, utilising bounds on the output distance and on the transition probabilities.
We have shown that, for bounded safety properties, these approximate similarity relations can be used to refine control strategies with bounded error on the transition probabilities and bounded deviations in the output space. The main goal of this paper is to study general temporal logic properties of gMDPs in combination with these approximate similarity relations. As the main contribution of the paper, we show that the standard verification and control synthesis for gMDPs can be made robust a priori to the introduced accuracy errors.
Related Work.
Properties defined in PCTL, PLTL, and PCTL* for finite-state Markov (decision) processes can be verified using tools such as PRISM [17]. Moreover, it is also well known how to design policies, i.e., to control these Markov decision processes such that the satisfaction probability of these properties is maximised. The work in [2] has studied model checking of automata specifications against autonomous (i.e., uncontrolled) discrete-time stochastic models over uncountable state spaces. It was shown that the computation of the probability of satisfying a specification expressed as a deterministic finite automaton (DFA) can be restated in terms of a probabilistic reachability problem over the product between the original model and the DFA. This result has been extended in [22] to the case of controlled discrete-time Markov processes, which are a special subclass of the gMDPs introduced in our work (obtained with the identity output map, h(x) = x for all x ∈ X, and so with output space Y = X). In this work we extend the model class discussed in [22] and build on their results by showing that an approximate model can be used to compute the solutions of the Bellman equations associated to the stochastic reachability problem.

The paper is organised as follows. In the next section, we first define gMDPs and state the temporal logic control problem. In Section 3, we define approximate simulation relations on gMDPs and solve the robust probabilistic reachability problem. In Section 4, we extend the results to probabilistic temporal logic control utilising the product of a gMDP and a DFA. In Section 5, we detail the approximation procedure for linear stochastic dynamical systems and apply it to a simple toy case. Throughout the paper, proofs of the theorems have been relegated to the appendix of the extended version [14].

2 Preliminaries and problem statement

In this work, we only consider Borel measurable spaces, i.e., (X, B(X)), and we restrict our attention to Polish spaces [5].
Together with the measurable space (X, B(X)), a probability measure P defines the probability space, denoted by (X, B(X), P), with realisations x ∼ P. Let us further denote the set of all probability measures for a given measurable space (X, B(X)) as P(X, B(X)). For sets A and B, a relation R ⊆ A × B is a subset of the Cartesian product A × B. The relation R relates x ∈ A with y ∈ B if (x, y) ∈ R, which is equivalently written as x R y. For a given set Y, a metric or distance function d_Y is a function d_Y : Y × Y → R≥0 satisfying the following conditions for all y1, y2, y3 ∈ Y: d_Y(y1, y2) = 0 iff y1 = y2; d_Y(y1, y2) = d_Y(y2, y1); and d_Y(y1, y3) ≤ d_Y(y1, y2) + d_Y(y2, y3). General Markov decision processes are related to control Markov processes [1] and Markov decision processes [4,20,15], and are formalised as follows.
Definition 1 (general Markov decision process (gMDP)).
A discrete-time gMDP is a tuple M = (X, π, T, U, h, Y) with
– X, an (uncountable) Polish state space with states x ∈ X as its elements;
– U, the set of controls, which is a Polish space;
– π, the initial probability measure π : B(X) → [0, 1];
– T : X × U × B(X) → [0, 1], a conditional stochastic kernel assigning to each state x ∈ X and control u ∈ U a probability measure T(· | x, u) over (X, B(X));
– Y, the output space decorated with metric d_Y; and
– h : X → Y, a measurable output map.

For any set A ∈ B(X), P_{x,u}(x(t+1) ∈ A) = ∫_A T(dx' | x(t) = x, u), where P_{x,u} denotes the conditional probability P(· | x, u). At every state, the state transition depends non-deterministically on the choice of u ∈ U. When u is chosen according to a probability measure µ_u : B(U) → [0, 1], we write u ∼ µ_u and denote the transition kernel as T(· | x, µ_u) = ∫_U T(· | x, u) µ_u(du) ∈ P(X, B(X)). Given a string of inputs {u(t)}_{t≤N} := u(0), u(1), . . . , u(N) over a finite time horizon [0, N], and an initial condition x(0) (sampled from π), the state at the (t+1)-st time instant, x(t+1), is obtained as a realisation of the controlled Borel-measurable stochastic kernel T(· | x(t), u(t)); these semantics induce paths (or executions) of the gMDP. Further, output traces of the gMDP are obtained by applying the output map h(·) to its paths, namely {y(t)}_{t≤N} := y(0), y(1), . . . , y(N) with y(t) = h(x(t)) for all t ∈ [0, N]. Denote the class of all gMDPs with the same metric output space Y as M_Y.

A policy is a selection of control inputs based on the past history of states and controls. When the selected controls only depend on the current state, the policy is referred to as Markov.

Definition 2 (Markov policy).
For a gMDP M = (X, π, T, U, h, Y), a Markov policy µ is a sequence µ = (µ_0, µ_1, µ_2, . . .) of universally measurable maps µ_t : X → P(U, B(U)), t ∈ N := {0, 1, 2, . . .}, from the state space X to the set of controls.

We allow controls to be selected via universally measurable maps [4] from the state to the space of stochastic control inputs, so that properties such as safety can be maximised [3]. We introduce the notion of a control strategy, and define it as a broader, memory-dependent version of the Markov policy above. This strategy is formulated again as a gMDP that takes as its input the state of the to-be-controlled gMDP.
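To fix intuition before the formal definition, a Markov policy driving a gMDP can be simulated directly. The following is an illustrative sketch only (the scalar linear-Gaussian kernel, the output map, and the policy are hypothetical examples, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(x, u):
    # stochastic kernel T(. | x, u): here x' ~ N(0.9 x + u, 0.2^2)
    return 0.9 * x + u + 0.2 * rng.standard_normal()

def h(x):
    # measurable output map h : X -> Y (identity for this sketch)
    return x

def policy(t, x):
    # Markov policy mu = (mu_0, mu_1, ...): each mu_t maps the current
    # state to a control input (a point-mass measure in this sketch)
    return -0.5 * x

N = 20
x = rng.standard_normal()      # x(0) ~ pi (standard Gaussian)
ys = [h(x)]
for t in range(N):
    u = policy(t, x)           # u(t) ~ mu_t(x(t))
    x = kernel(x, u)           # x(t+1) ~ T(. | x(t), u(t))
    ys.append(h(x))            # output trace y(t) = h(x(t))
print(len(ys))  # N + 1 outputs
```

Replacing `policy` by a map that also reads an internal memory state gives the more general control strategy defined next.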
Definition 3 (Control strategy).
A control strategy C = (X_C, x_{C0}, X, T_C, h_C) for a gMDP M = (X, π, T, U, h, Y) is a gMDP with state space X_C; initial state x_{C0}; input space X; universally measurable kernels T_C : X_C × X × B(X_C) → [0, 1]; and with universally measurable output map h_C : X_C → P(U, B(U)).

Note that the stochastic transitions of the control strategy and of the gMDP are selected in an alternating fashion. The output map of the strategy is indexed based on the time instant at which the resulting policy will be applied to the gMDP. This is further elucidated in Algorithm 1.

Algorithm 1 Execution semantics of the controlled model C × M.
1: procedure Execution(C, M)
2:   set t := 0 and x_C(0) := x_{C0}
3:   draw x(0) ∼ π
4:   for t ≤ N do
5:     draw x_C(t+1) ∼ T_C(· | x_C(t), x(t))
6:     draw u(t) from µ_t := h_C(x_C(t+1))
7:     draw x(t+1) ∼ T(· | x(t), u(t))
8:     set t := t + 1
9:   end for
10:  return {x(t)}_{t≤N}
11: end procedure

[Figure: interconnection of the control strategy C and the gMDP M, exchanging the states x(0), x(1), . . ., the controller states x_C(0), x_C(1), x_C(2), . . ., the inputs u(0), u(1), . . ., and the outputs y(0), y(1), y(2), . . .]

The execution {(x(t), x_C(t)), t ∈ [0, N]} of a gMDP M controlled with strategy C (denoted by C × M) is defined on the canonical sample space Ω := (X × X_C)^{N+1} endowed with its product topology B(Ω) and with a unique probability measure P_{C×M}. A Markov policy is a special case of a control strategy, one which does not have an internal state that can be used to remember relevant past events.

Remark 1.
Any Markov policy µ as defined in Def. 2 can be written as a control strategy C_µ := (X_µ, x_{µ0}, X, T_µ, h_µ) with X_µ := Q × X, for which Q is the set of time indices Q := {−1, 0, 1, 2, . . .}, and with x_{µ0} := (−1, x_0) the initial state for some x_0 ∈ X. The probability measure on the next control state x'_µ = (q', x̃') ∈ X_µ, given x(t) and given the current state x_µ = (q, x̃), is defined with a stochastic kernel as T_µ(A | (q, x̃), x(t)) := 1 if (q+1, x(t)) ∈ A, and 0 otherwise. The policy is then embedded into the control strategy via the output map h_µ((q, x̃)) := µ_q(x̃).

Consider a measurable target set K ⊂ Y. We say that an output trace {y(t)}_{t≤N} reaches a target set K if there exists a time t ∈ [0, N] such that y(t) ∈ K. This bounded reaching of K is denoted by ♦^{≤N}{y ∈ K}, or briefly ♦^{≤N}K. For N → ∞, we denote the reachability property as ♦K, i.e., eventually K. For a given gMDP M with control strategy C, a verification task consists of quantifying the probability that an output trace of C × M reaches K within the time horizon [0, N], i.e., P_{C×M}(♦^{≤N}K), or that the target set K is eventually reached, i.e., P_{C×M}(♦K), and verifying that it is within a given threshold.

More complex properties can be described using temporal logic. Consider a set of atomic propositions AP, the alphabet Σ := 2^{AP}, and infinite words that are strings composed of elements from Σ, ω = ω(0), ω(1), ω(2), . . . ∈ Σ^N. Of interest are atomic propositions that are connected to the gMDP via a measurable labelling function L : Y → Σ from the output space to the alphabet Σ. Via its trivial extension, output traces {y(t)}_{t≥0} ∈ Y^N can be mapped to the set of infinite words Σ^N, as ω = L({y(t)}_{t≥0}) := {L(y(t))}_{t≥0}. Consider linear-time temporal logic properties with syntax

ψ ::= true | p | ¬ψ | ψ1 ∧ ψ2 | ◯ψ | ψ1 U ψ2. (1)

Let ω^t = ω(t), ω(t+1), ω(t+2), . . .
be a postfix of the word ω; then the satisfaction relation between ω and a property ψ, expressed via LTL, is denoted by ω^0 ⊨ ψ (or equivalently ω ⊨ ψ). The semantics of the satisfaction relation are defined recursively over ω^t and the syntax of the LTL formula ψ. An atomic proposition p ∈ AP is satisfied by ω^t, i.e., ω^t ⊨ p, iff p ∈ ω(t). Furthermore, ω^t ⊨ ¬ψ if ω^t ⊭ ψ, and we say that ω^t ⊨ ψ1 ∧ ψ2 if ω^t ⊨ ψ1 and ω^t ⊨ ψ2. The next operator ω^t ⊨ ◯ψ holds if the property holds at the next time instance, ω^{t+1} ⊨ ψ. The temporal until operator ω^t ⊨ ψ1 U ψ2 holds if ∃i ∈ N : ω^{t+i} ⊨ ψ2, and ∀j ∈ N : 0 ≤ j < i, ω^{t+j} ⊨ ψ1. Based on these semantics, operators such as disjunction (∨) can also be defined through negation and conjunction: ω^t ⊨ ψ1 ∨ ψ2 ⇔ ω^t ⊨ ψ1 or ω^t ⊨ ψ2. We are interested in a fragment of LTL known as syntactically co-safe linear temporal logic (scLTL) [16]. Even though scLTL formulas are interpreted over infinite words, their satisfaction is guaranteed in finite time. This fragment is defined as follows.
An scLTL formula over a set of atomic propositions AP has syntax

ψ ::= true | p | ¬p | ψ1 ∧ ψ2 | ψ1 ∨ ψ2 | ◯ψ | ψ1 U ψ2 | ♦ψ (2)

with p ∈ AP.

(Footnote: Note that for this construction, the separability of the Polish space X is important, as otherwise T_µ would not be measurable in general.)

In the remainder, we will mainly consider scLTL properties, since their verification can be computed via a reachability property over a finite-state automaton [16]. With respect to an scLTL property ψ, we say that a gMDP M satisfies ψ for a given control strategy C with probability at least p iff P_{C×M}(L({y(t)}_{t≥0}) ⊨ ψ) ≥ p. Apart from this verification task, we are mostly interested in synthesising control strategies such that C × M satisfies the inequality. In this paper, we tackle the control synthesis for the (bounded) probabilistic reachability problem and for the temporal logic control problem, defined next.
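As an aside, the recursive satisfaction semantics above admit a direct implementation on finite words. The sketch below is a hypothetical illustration (not from the paper): each letter of the word is a set of atomic propositions, and a formula of syntax (2) is encoded as a nested tuple:

```python
def sat(word, t, phi):
    """Check word^t |= phi for formulas built from
    ('ap', p), ('not_ap', p), ('and', f, g), ('or', f, g),
    ('next', f), ('until', f, g), ('even', f)  -- cf. syntax (2)."""
    op = phi[0]
    if op == 'ap':
        return phi[1] in word[t]
    if op == 'not_ap':
        return phi[1] not in word[t]
    if op == 'and':
        return sat(word, t, phi[1]) and sat(word, t, phi[2])
    if op == 'or':
        return sat(word, t, phi[1]) or sat(word, t, phi[2])
    if op == 'next':
        return t + 1 < len(word) and sat(word, t + 1, phi[1])
    if op == 'until':  # exists i: word^{t+i} |= phi2 and phi1 holds before
        return any(sat(word, i, phi[2]) and
                   all(sat(word, j, phi[1]) for j in range(t, i))
                   for i in range(t, len(word)))
    if op == 'even':   # eventually phi[1]
        return any(sat(word, i, phi[1]) for i in range(t, len(word)))
    raise ValueError(op)

# word over the alphabet 2^{AP} with AP = {a, b}
word = [{'a'}, {'a'}, {'b'}, set()]
print(sat(word, 0, ('until', ('ap', 'a'), ('ap', 'b'))))  # a U b -> True
print(sat(word, 0, ('even', ('ap', 'b'))))                # eventually b -> True
```

Since scLTL satisfaction is witnessed by a finite prefix, evaluating on finite words as above suffices for this fragment.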
Problem 1 ((Bounded) probabilistic reachability)
Given a gMDP M and a set K ⊂ Y, compute a control strategy C that maximises the bounded reachability probability P_{C×M}(♦^{≤N}K) or the reachability probability P_{C×M}(♦K).

Problem 2 (Temporal logic control)
Given a gMDP M, an scLTL property ψ and a labelling function L, compute a control strategy C that maximises the probability of the controlled Markov process C × M satisfying ψ, i.e.,

max_C P_{C×M}(L({y(t)}_{t≥0}) ⊨ ψ). (3)

Exact computation of (3) for a given controlled Markov process is generally impossible. In the next sections, we give robust computations based on (ǫ, δ)-probabilistic simulation relations between a given model and its approximation.

3 (ǫ, δ)-Probabilistic simulation relations

Let a gMDP M = (X, π, T, U, h, Y) and a target set K ⊂ Y be given. In the robust formulation of Problem 1, we compute a control strategy C and a quantified lower bound p, such that the probability of satisfying ♦^{≤N}K, respectively ♦K, is lower bounded by p, i.e., P_{C×M}(♦^{≤N}K) ≥ p, respectively P_{C×M}(♦K) ≥ p.

We associate to the target set K ⊂ Y the corresponding set in the state space, K_X := h^{-1}(K) ∈ B(X). Let us denote by r_µ(K_X, N) the probability that K is reached within N time steps when a Markov policy µ = (µ_0, µ_1, . . . , µ_{N−1}) is used. We iterate the computation of stochastic reachability of a Markov decision process, as explained in [3]. The value of r_µ(K_X, N) can be computed by a backward recursion initialised with V_N = 0 and iterated for k = N−1, . . . , 0:

V_k(x) = ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V_{k+1}(x̄)] T(dx̄ | x, µ_k(x)). (4)

Based on the final value function after N iterations, we have that

r_µ(K_X, N) = ∫_X [1_{K_X}(x) + 1_{X\K_X}(x) V_0(x)] π(dx). (5)

Furthermore, the optimal value functions V*_k(x), k ∈ [0, N], computed recursively with V*_N(x) = 0 and, for all x ∈ X and k = N−1, . . . , 0,

V*_k(x) = sup_{µ_k} ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*_{k+1}(x̄)] T(dx̄ | x, µ_k(x)), (6)

give the optimal reachability probability

r*(K_X, N) = ∫_X [1_{K_X}(x) + 1_{X\K_X}(x) V*_0(x)] π(dx). (7)
Using V*_k(x), the control strategy C* maximising P_{C×M}(♦^{≤N}K) is defined by a Markov policy µ* with elements µ*_k,

µ*_k(x) ∈ arg sup_{µ_k} ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*_{k+1}(x̄)] T(dx̄ | x, µ_k(x)). (8)

The unbounded optimal reachability probability r*(K_X) can be evaluated based on the fixed point of (6). More specifically, for N → ∞ the value functions are monotonically increasing and converge to the fixed-point solution of

V*(x) = sup_µ ∫_X [1_{K_X}(x̄) + 1_{X\K_X}(x̄) V*(x̄)] T(dx̄ | x, µ(x)). (9)

For a given policy µ, the unbounded reachability probability r_µ(K_X) and the associated value function V_µ(x) are formulated similarly. Thus we have that sup_C P_{C×M}(♦K) = r*(K_X) and, respectively, sup_C P_{C×M}(♦^{≤N}K) = r*(K_X, N). This, together with its universal measurability, has been discussed in [3]. Computation of the backward recursions (4) and (6) is generally only tractable for finite state spaces. In the following, we define approximate probabilistic simulation relations over M_Y, as introduced in [13], and apply them to compute a lower bound on probabilistic reachability and to the corresponding synthesis problem.

Consider two gMDPs M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1,
2, that have the same metric output space (Y, d_Y). Given state-action pairs x1 ∈ X1, u1 ∈ U1 and x2 ∈ X2, u2 ∈ U2, we want to relate the corresponding transition kernels, namely the probability measures T1(· | x1, u1) ∈ P(X1, B(X1)) and T2(· | x2, u2) ∈ P(X2, B(X2)). As in [13], we introduce the concept of δ-lifting as follows.

Definition 5 (δ-lifting for general state spaces). Let X1, X2 be two sets with associated measurable spaces (X1, B(X1)), (X2, B(X2)), and let R ⊆ X1 × X2 be a relation for which R ∈ B(X1 × X2). We denote by R̄_δ ⊆ P(X1, B(X1)) × P(X2, B(X2)) the corresponding lifted relation, so that ∆ R̄_δ Θ holds if there exists a probability space (X1 × X2, B(X1 × X2), W) (equivalently, a lifting W) satisfying
L1. for all X1 ∈ B(X1): W(X1 × X2) = ∆(X1);
L2. for all X2 ∈ B(X2): W(X1 × X2) = Θ(X2);
L3. for the probability space (X1 × X2, B(X1 × X2), W) it holds that x1 R x2 with probability at least 1 − δ, or equivalently that W(R) ≥ 1 − δ.

We introduce a notion of approximate probabilistic simulation relations which naturally leads to control refinement. For this, we build on the notion of an interface function [11] to define probabilistic simulation relations that allow for the hierarchical control refinement of two gMDPs:

U_v : U1 × X1 × X2 → P(U2, B(U2)).

This interface function U_v is required to be a Borel measurable function. Intuitively, an interface function implements (or refines) any control action synthesised over the abstract model to an action for the concrete model.

Definition 6 ((ǫ, δ)-probabilistic simulation relation). Consider two gMDPs M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1, 2, over a shared metric output space (Y, d_Y). M1 is (ǫ, δ)-stochastically simulated by M2 if there exists an interface function U_v and a relation R ⊆ X1 × X2, for which there exists a Borel measurable stochastic kernel W_T(· | u1, x1, x2) on X1 × X2 given U1 × X1 × X2, such that:
APS1. ∀(x1, x2) ∈ R: d_Y(h1(x1), h2(x2)) ≤ ǫ;
APS2. ∀(x1, x2) ∈ R, ∀u1 ∈ U1: T1(· | x1, u1) R̄_δ T2(· | x2, U_v(u1, x1, x2)), with lifted probability measure W_T(· | u1, x1, x2);
APS3. π1 R̄_δ π2.
The simulation relation is denoted as M1 ⪯_ǫ^δ M2.

This definition extends the known exact notions of probabilistic simulation in [18], and the approximate notions of [7,8], to gMDPs over Polish spaces, as elaborated in [13]. The Borel measurability of both U_v (see above) and W_T (as in this definition), which is used to prove the well-posedness of the controller refinement, can be relaxed to universal measurability [13].

δ-Robust probabilistic reachability

Definition 7 ((ǫ, δ)-Robust satisfaction). Consider any M1 ∈ M_Y. We say that a Markov policy µ for M1 (ǫ, δ)-robustly satisfies ♦^{≤N}K with probability p if for every M2 ∈ M_Y with M1 ⪯_ǫ^δ M2 there exists a controller C2 for M2 such that P_{C2×M2}(♦^{≤N}K) ≥ p.

For a given universally measurable map ν : X1 → P(U1, B(U1)) and constant δ, define the operator T^ν_δ : F → F acting on the set of functions F := {f : X1 → [0, 1]} as

T^ν_δ(V)(x) := L( ∫_{X1} [1_{K_X}(x̄) + 1_{X1\K_X}(x̄) V(x̄)] T1(dx̄ | x, ν(x)) − δ ) (10)

with L : R → [0,
1] being the truncation function L(·) := min(1, max(0, ·)). Define also the operator T*_δ(V) on F as T*_δ(V)(x) := sup_ν T^ν_δ(V)(x).

Theorem 1.
A target set K ⊂ Y of gMDP M1 is reached (0, δ)-robustly with Markov policy µ and with probability r_µ(K_X, N), where

r_µ(K_X, N) := L( ∫_{X1} [1_{K_X}(x) + 1_{X1\K_X}(x) V^δ_0(x)] π1(dx) − δ ), (11)

and V^δ_0(x) is computed recursively according to V^δ_k := T^{µ_k}_δ(V^δ_{k+1}), for k = N−1, . . . , 0, with initial value function V^δ_N = 0. If V^{δ,*}_0 is computed similarly as the solution of the recursion V^{δ,*}_k = T*_δ(V^{δ,*}_{k+1}) with initial value function V^{δ,*}_N = 0, and µ*_k ∈ arg sup_{µ_k} T^{µ_k}_δ(V^{δ,*}_{k+1}), then we call µ* = {µ*_0, µ*_1, . . .} the optimal (0, δ)-robust policy.

Notice that for δ = 0 the computation of the value functions V^δ_k in Theorem 1 is the same as (4). The proof of Theorem 1 builds on the construction of a refined control strategy, as has been explained in [13]. For any M2 such that M1 ⪯^δ_0 M2 with the lifted probability measure W_T, the control policy C2 can be refined from C1 (cf. [13]). More precisely, a control strategy C2 that refines C1 over M2 is obtained by extending C1 with internal states (x1, x2). While the state (x1, x2) of C2 is in R, the control refinement has as its basic ingredients the states x1 and x2, whose stochastic transition to the pair (x'1, x'2) is governed firstly by a point distribution δ_{x2(t)}(dx'2) based on the measured state x2(t) of M2, and subsequently by the lifted probability measure W_T(dx'1 | x'2, u1, x1, x2), conditioned on x'2.

Before tackling unbounded reachability properties, we first analyse the behaviour of T^ν_δ and T*_δ. Suppose that W1(x) ≥ W2(x) for all x ∈ X1; then for a given map ν : X1 → P(U1, B(U1)) we have T^ν_δ(W1)(x) ≥ T^ν_δ(W2)(x), hence T*_δ(W1)(x) ≥ T*_δ(W2)(x). Then the series V^δ_N, V^δ_{N−1}, . . . , V^δ_0 constructed with V^δ_k = T^{µ_k}_δ(V^δ_{k+1}) and V^δ_N = 0 is monotonically increasing.
Therefore, for a given policy µ, the sequence of functions {(T^µ_δ)^q(V)}_{q≥0} initialised with V = 0 is point-wise converging, since it is monotonically increasing and upper bounded. Additionally, the same holds for the sequence {(T*_δ)^q(V)}_{q≥0} initialised with V = 0. For unbounded reachability, this yields the following result.

Corollary 1
A target set K ⊂ Y of gMDP M1 is reached (0, δ)-robustly with time-homogeneous Markov policy µ and with probability r_µ(K_X), where

r_µ(K_X) := L( ∫_{X1} [1_{K_X}(x) + 1_{X1\K_X}(x) V^δ(x)] π1(dx) − δ ), (12)

with V^δ : X1 → [0, 1] the solution of V^δ = T^µ_δ(V^δ), computed as the limit of the sequence {(T^µ_δ)^q(V)}_{q≥0} that is initialised with V = 0. If V^{δ,*} is computed similarly as the solution of V^{δ,*} = T*_δ(V^{δ,*}), and µ* ∈ arg sup_µ T^µ_δ(V^{δ,*}), then we call µ* the optimal (0, δ)-robust policy.

(ǫ, δ)-Robust probabilistic reachability

Consider a target set K ⊂ Y, and let K^ǫ be the largest Borel measurable set such that

K^ǫ ⊂ {y | ∀ȳ ∈ Y with d_Y(ȳ, y) ≤ ǫ : ȳ ∈ K}. (13)

We can now introduce an eroded version of the original target set K_X as K^ǫ_X := h1^{-1}(K^ǫ), such that for any pair x1, x2, if x1 ∈ K^ǫ_X and x1 R x2, then h2(x2) ∈ K. As a consequence of Theorem 1 and of Corollary 1, we can evaluate the (ǫ, δ)-robust reachability with respect to K^ǫ_X.

Corollary 2 ((ǫ, δ)-robust probabilistic reachability) A target set K ⊂ Y is reached (ǫ, δ)-robustly with Markov policy µ for M1 with probability r^{(δ,ǫ)}(K_X, N) (or r^{(δ,ǫ)}(K_X)) if it reaches the target set K^ǫ as in (13), (0, δ)-robustly with probability r_µ(K^ǫ_X, N) defined in (11) (or r_µ(K^ǫ_X) defined in (12)).

Hence, for any model M2 that is in an (ǫ, δ)-probabilistic simulation relation with M1, the combination of Theorem 1 and Corollaries 1 and 2 means that we can verify probabilistic reachability robustly over M1, and moreover, that we can synthesise a robust controller maximising the satisfaction probability robustly. We now question whether we can also quantify an upper bound on a reachability probability using an approximate model M1.
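Before that, note that on a finite-state abstraction the robust recursion of Theorem 1 reduces to a truncated Bellman iteration. The following sketch uses a hypothetical three-state, two-action MDP (not from the paper) and implements V_k = T*_δ(V_{k+1}) with the truncation L(·) = min(1, max(0, ·)):

```python
import numpy as np

# hypothetical finite MDP: 3 states, 2 actions; T[a][i, j] = P(j | i, a)
T = [np.array([[0.8, 0.1, 0.1],
               [0.2, 0.6, 0.2],
               [0.0, 0.0, 1.0]]),
     np.array([[0.5, 0.4, 0.1],
               [0.1, 0.8, 0.1],
               [0.0, 0.0, 1.0]])]
goal = np.array([0.0, 0.0, 1.0])    # indicator 1_{K_X}: state 2 is the target
delta = 0.05                        # transition-probability deviation
N = 10

L = lambda v: np.clip(v, 0.0, 1.0)  # truncation L(.) = min(1, max(0, .))

V = np.zeros(3)                     # V_N = 0
for k in range(N):
    # T*_delta: optimise over actions, subtract delta, truncate to [0, 1]
    Q = np.stack([Ta @ (goal + (1 - goal) * V) - delta for Ta in T])
    V = L(Q.max(axis=0))

pi0 = np.array([1.0, 0.0, 0.0])     # initial distribution pi
r = L(pi0 @ (goal + (1 - goal) * V) - delta)
print(round(float(r), 3))           # delta-robust lower bound on the reach prob.
```

The argmax over actions at each step recovers the optimal (0, δ)-robust Markov policy of Theorem 1; setting `delta = 0` recovers the standard recursion (4).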
Consider the operator T^µ_{−δ}, defined as

T^µ_{−δ}(V)(x) := L( ∫_{X1} [1_{K^{−ǫ}_X}(x̄) + 1_{X1\K^{−ǫ}_X}(x̄) V(x̄)] T1(dx̄ | x, µ(x)) + δ ) (14)

with K^{−ǫ}_X := h1^{-1}(K^{−ǫ}) and K^{−ǫ} := {y + y_ǫ ∈ Y | y ∈ K and d_Y(0, y_ǫ) ≤ ǫ}.

Theorem 2.
Consider a target set K ⊂ Y; then an upper bound on the maximal reachability r_µ(K_X, N) of M2, for M2 ⪰^δ_ǫ M1, can be given as

∀µ : r_µ(K_X, N) ≤ r^{(−δ,−ǫ)}(K_X, N), (15)

for which r^{(−δ,−ǫ)}(K_X, N) := r^{(−δ)}(K^{−ǫ}_X, N) is computed with M1 as follows:

r^{(−δ,−ǫ)}(K_X, N) := L( ∫_{X1} [1_{K^{−ǫ}_X}(x) + 1_{X1\K^{−ǫ}_X}(x) V^δ_0(x)] π1(dx) + δ )

with V^δ_k : X1 → [0, 1] such that V^δ_k = sup_µ T^µ_{−δ}(V^δ_{k+1}) and V^δ_N = 0.

4 Temporal logic control with (ǫ, δ)-probabilistic simulation relations

We extend the results on robust probabilistic reachability to the scLTL properties in Def. 4. For this purpose, we introduce a model known as a deterministic finite-state automaton (DFA).
Definition 8 (DFA).
A DFA is a tuple A = (Q, q0, Σ, F, t), where Q is a finite set of locations, q0 ∈ Q is the initial location, Σ is a finite alphabet, F ⊆ Q is a set of accept locations, and t : Q × Σ → Q is a transition function.

A finite word composed of letters of the alphabet Σ, i.e., ω = (ω(0), . . . , ω(n)) ∈ Σ^{n+1}, is accepted by a DFA A if there exists a finite run q = (q(0), . . . , q(n+1)) ∈ Q^{n+2} such that q(0) = q0, q(i+1) = t(q(i), ω(i)) for all 0 ≤ i ≤ n, and q(n+1) ∈ F. Similarly, we say that an infinite word is accepted by a DFA A if there exists a finite prefix of ω accepted by A as a finite word. More precisely, an infinite word ω ∈ Σ^N is accepted by A if and only if there exists an infinite run q ∈ Q^N such that q(0) = q0, q(i+1) = t(q(i), ω(i)) for all i ∈ N, and there exists j ∈ N such that q(j) ∈ F. The accepted language of A, denoted L(A), is the set of all words accepted by A. For every scLTL property ψ, cf. Definition 4, there exists a DFA A_ψ such that

ω ⊨ ψ ⇔ ω ∈ L(A_ψ). (16)

As a result, the satisfaction of the property ψ now becomes equivalent to the reaching of the accept locations in the DFA. We use the DFA A_ψ to specify properties of the gMDP M = (X, π, T, U, h, Y) as follows. Remember that L : Y → Σ is a given measurable labelling function: to each output y ∈ Y it assigns the letter L(y) ∈ Σ. Given a control strategy C, we can define the probability that a path of M satisfies an scLTL property ψ, i.e., P_{C×M}(ω ∈ L(A_ψ)).

We can reduce the computation of P_{C×M}(ω ∈ L(A_ψ)) over the traces ω of M to a reachability problem over another gMDP M ⊗ A_ψ, which we refer to as the product of the gMDP M and the automaton A_ψ. This was originally derived in [22] for MDPs. We give a similar definition of the product construction as follows.
Given a gMDP M = (X, π, T, U, h, Y), a finite alphabet Σ, a labelling function L : Y → Σ, and a DFA A_ψ = (Q, q0, Σ, F, t), we define the product between M and A_ψ to be another gMDP, denoted as M ⊗ A_ψ = (X̄, π̄, T̄, U, h̄, Y). Here X̄ = X × Q, h̄(x, q) = h(x) for any (x, q) ∈ X̄, and

T̄(A × {q'} | x, q, u) = ∫_{x̃∈A} 1[q' = t(q, L(h(x̃)))] · T(dx̃ | x, u),

initialised with π̄(dx, q) = π(dx) 1[q = t(q0, L(h(x)))].

The quantity P_{C×M}(ω ∈ L(A_ψ)) can be related to the reachability probability over the gMDP M ⊗ A_ψ with the goal states G := X × F, as was shown to be the case for MDPs in [22].

Lemma 1.
Given a gMDP M, alphabet Σ, labelling function L, and scLTL specification ψ modelled with DFA A_ψ, it holds that

P_{C(µ,ψ)×M}(ω ∈ L(A_ψ)) = P_{µ×(A_ψ⊗M)}(♦G)

for any Markov policy µ on the product space of A_ψ ⊗ M and any control strategy C(µ, ψ) on M with properly defined mappings between µ and the control strategy.

δ-Robust satisfaction of scLTL properties

In the following, we analyse the robust satisfaction of scLTL specifications, which are temporal specifications that go beyond the reachability properties defined on M. The probability of satisfying such a temporal specification can be quantified as a reachability probability with respect to M ⊗ A_ψ. For two gMDPs M1 and M2, subject to M1 ⪯^δ_0 M2, we show that (δ-approximate) probabilistic simulation relations are preserved under a product with a DFA.

Theorem 3.
Let M_i = (X_i, π_i, T_i, U_i, h_i, Y), i = 1, 2, be two gMDPs such that M1 ⪯^δ_0 M2, and let A = (Q, q0, Σ, F, t) be an automaton. For any labelling function L : Y → Σ we have M1 ⊗ A ⪯^δ_0 M2 ⊗ A.

This theorem enables us to quantify temporal logic properties for M2 with respect to M1. Consider an scLTL property ψ with a corresponding DFA A_ψ and two gMDPs M1, M2 ∈ M_Y for which M1 ⪯^δ_0 M2. If there exists a Markov policy µ for M1 ⊗ A_ψ such that the accepting states are reached with δ-robust probability p, then there exists a control strategy C2 for M2 such that the accepting states of A_ψ are reached with probability p under the evolution of C2 × M2. More precisely, denote by X̄1 := X1 × Q the state space of M1 ⊗ A_ψ; then the mapping T^ν_δ becomes

T^ν_δ(V)(x1, q) = L( ∫_{X1} [1_F(t_x(q, x'1)) + 1_{Q\F}(t_x(q, x'1)) V(x'1, t_x(q, x'1))] T1(dx'1 | x1, ν(x1, q)) − δ ) (17)

with t_x(q, x'1) := t(q, L(h1(x'1))). For V(x1, q) satisfying T^µ_δ(V)(x1, q) = V(x1, q), the δ-robust reachability probability is defined as

r_µ(F × X1) = L( ∫_{X1} [1_F(t_x(q0, x1)) + 1_{Q\F}(t_x(q0, x1)) V(x1, t_x(q0, x1))] π1(dx1) − δ ). (18)

(ǫ, δ)-Robust satisfaction of scLTL properties

We now integrate the ǫ error in the output space into the robust synthesis problem via the effect it has on the labelling. Given L : Y → Σ, we define the relaxed labelling L_ǫ : Y → 2^Σ as

L_ǫ(y) := {q ∈ Σ | ∃y_ǫ : d_Y(y, y_ǫ) ≤ ǫ and q = L(y_ǫ)}. (19)

Consider M1 ⪯^δ_ǫ M2 with relation R_ǫ; then, for all (x1, x2) ∈ R_ǫ, it holds that L(h2(x2)) ∈ L_ǫ(h1(x1)). In Figure 1, an output space together with its labels is depicted. When taking into account an ǫ error in the output, the labelling becomes non-deterministic in some regions.
This can be observed on the right of Figure 1. Instead of integrating this relaxed labelling into the product construction of a given gMDP, we will directly adapt the δ-robust reachability computations (17) to deal with this non-determinism.

Fig. 1: A typical labelling over the output space. On the left a normal labelling is given; on the right the labelling is non-deterministic due to the output error.

Consider the (ǫ, δ)-robust operator T^ν_{ǫ,δ}(V)(x1, q), defined as

T^ν_{ǫ,δ}(V)(x1, q) = L( ∫_{X1} min_{q' ∈ t̄_x(q, x'1)} [1_F(q') + 1_{Q\F}(q') V(x'1, q')] T1(dx'1 | x1, ν(x1, q)) − δ )

with t̄_x(q, x1) := {t(q, α) with α ∈ L_ǫ(h1(x1))}. For a time-homogeneous Markov policy µ and V(x1, q) that satisfy T^µ_{ǫ,δ}(V)(x1, q) = V(x1, q), the δ-robust reachability probability is defined as

r_{ǫ,δ}(F × X1) = L( ∫_{X1} min_{q' ∈ t̄_x(q0, x1)} [1_F(q') + 1_{Q\F}(q') V(x1, q')] π1(dx1) − δ ).

Consider an scLTL property ψ and the corresponding A_ψ with goal states F. If F × X1 is δ-robustly reachable with probability r_{ǫ,δ}(F × X1), then we can refine µ to C2(µ, ψ) such that ψ is satisfied by C2(µ, ψ) × M2 with a probability p ≥ r_{ǫ,δ}(F × X1). Of course, the apparent non-determinism due to the relaxed labelling will be resolved in the refined control strategy by selecting the labels of the concrete model.

Building on Corollary 1, we can now also maximise the (ǫ, δ)-robust probability using T*_{ǫ,δ}, defined as T*_{ǫ,δ}(V)(x1, q) := sup_µ T^µ_{ǫ,δ}(V)(x1, q), which yields an optimised robust Markov policy as µ*(x1, q) ∈ arg sup_µ T^µ_{ǫ,δ}(V*)(x1, q) for T*_{ǫ,δ}(V*)(x1, q) = V*(x1, q).

Hence, we have shown that we can leverage probabilistic simulation relations to use approximate models for the controller synthesis and verification of both probabilistic reachability (cf. Problem 1) and scLTL properties (cf. Problem 2).
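For a finite-state abstraction, the pipeline of this section (label the outputs, build the product with the DFA A_ψ, and value-iterate towards the accept locations) fits in a few lines. The sketch below is hypothetical and not from the paper: a two-state labelled Markov chain, a DFA for ♦b, and the recursion (17) with δ = 0 and no control input:

```python
import numpy as np

# hypothetical 2-state Markov chain abstraction with labels L(h(x))
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
label = ['a', 'b']                 # L(h(x)) for states 0 and 1

# DFA A_psi for psi = eventually b: locations {0, 1}, accept F = {1}
def t(q, letter):
    return 1 if (q == 1 or letter == 'b') else 0

F = {1}
N = 25

# value iteration towards G = X x F on the product M (x) A_psi,
# cf. recursion (17) with delta = 0
V = np.zeros((2, 2))               # V(x, q)
for _ in range(N):
    Vn = np.zeros_like(V)
    for x in range(2):
        for q in range(2):
            acc = 0.0
            for xp in range(2):
                qp = t(q, label[xp])          # q' = t(q, L(h(x')))
                acc += P[x, xp] * (1.0 if qp in F else V[xp, qp])
            Vn[x, q] = acc
    V = Vn

# initialisation as in (18): q(1) = t(q0, L(h(x0))), pi a point mass on state 0
q1 = t(0, label[0])
p_sat = 1.0 if q1 in F else V[0, q1]
print(round(p_sat, 3))  # prints 0.928, i.e. 1 - 0.9**25
```

Adding a `min` over the DFA successors allowed by the relaxed labelling L_ǫ, and subtracting δ inside the truncation, turns this into the (ǫ, δ)-robust operator above.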
In this section, we apply our results to formal controller synthesis for stochastic linear dynamical systems. Existing formal controller synthesis results for this class of models either rely on model-order reduction [19] or use abstraction techniques such as finite-state MDPs [9,22]. Our new results combine these two approaches in one framework that benefits from both. More precisely, the abstract model used for synthesis is obtained by discretising the state space of a reduced-order version of the concrete model.

Concrete model.
Consider the following linear dynamical model M_1:

x(t+1) = A x(t) + B u(t) + B_w w(t),  x(0) = x_0 ∈ X,
y(t) = C x(t),  t = 0, 1, 2, ...,

where x(·) ∈ X ⊂ R^n, u(·) ∈ U ⊂ R^m, and y(·) ∈ Y ⊂ R^p. The matrices A, B, B_w, and C have appropriate dimensions, and w(·) are i.i.d. random variables with standard multivariate Gaussian distributions.

Constructing the abstract model.
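As a minimal, hypothetical scalar instance of the construction described in this subsection, the partition of the reduced space, its representative points, and the induced operator Π can be sketched as follows (all numerical values are illustrative):

```python
import numpy as np

def make_partition(lo, hi, n_cells):
    """Uniform partition of a scalar X_s = [lo, hi]; midpoints as representatives."""
    edges = np.linspace(lo, hi, n_cells + 1)
    reps = 0.5 * (edges[:-1] + edges[1:])
    return edges, reps

def Pi(x, edges, reps):
    """Operator Pi: map a continuous state to the representative of its cell."""
    i = int(np.clip(np.searchsorted(edges, x, side='right') - 1, 0, len(reps) - 1))
    return reps[i]

edges, reps = make_partition(-1.0, 1.0, 10)    # cells of diameter 0.2

# one transition of the abstract model, x2+ = Pi(A_s x2 + B_s u + B_sw w),
# for hypothetical scalar reduced-order dynamics
A_s, B_s, B_sw = 0.9, 0.5, 0.1
def abstract_step(x2, u, w):
    return Pi(A_s * x2 + B_s * u + B_sw * w, edges, reps)
```

The diameter of the cells is what enters the construction as the partition error, so refining the grid tightens the resulting precision at the cost of more abstract states.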
Construction of the abstract model relies on partitioning a new space X_s ⊂ R^{n_s}, where n_s < n, as {A_i ⊂ X_s, i = 1, 2, ..., l}. Over this partition, we select representative points {z_i ∈ A_i, i = 1, 2, ..., l}, which we collect in X_2, the state space of the abstract model M_2. Introduce the operator Π : X_s → X_2 that assigns to any x ∈ A_i, i ∈ {1, ..., l}, the representative point z_i = Π(x) of A_i.

Next we provide a dynamical characterisation of M_2. The state evolution of M_2 is written as

x_2(t+1) = Π( A_s x_2(t) + B_s u_2(t) + B_{s,w} w(t) ),  x_2(0) = x_{2,0} ∈ X_2,
y_2(t) = C_s x_2(t),  t = 0, 1, 2, ...,

with state x_2(·) ∈ X_2, input u_2(·) ∈ U, and output y_2(·) ∈ Y, and matrices A_s, B_s, B_{s,w}, C_s of appropriate dimensions. Note that the noise term w(t) in M_2 is the same as the one in M_1, thereby allowing us to define a lifting W_T.

Computing the (ε, δ)-simulation relation.
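Part of establishing the relation can be automated. The sketch below checks, for hypothetical matrices, the drift condition P A_s = A P + B Q (existence of a suitable Q via least squares) together with the output conditions C_s = C P and CᵀC ⪯ M; the full quadratic inequality would additionally be verified with LMIs and the S-procedure:

```python
import numpy as np

# hypothetical concrete (A, B, C) and reduced (A_s, C_s) matrices
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
P = np.array([[1.0], [0.0]])       # x1 ~ P x2: keep the first state
A_s = np.array([[0.9]])
C_s = C @ P                        # enforces the output-matching condition

def check_conditions(M):
    # P A_s = A P + B Q must hold for some Q: solve least squares, check residual
    rhs = P @ A_s - A @ P                      # must lie in the range of B
    Q, *_ = np.linalg.lstsq(B, rhs, rcond=None)
    drift_ok = np.allclose(B @ Q, rhs)
    # C^T C <= M (positive semidefinite ordering) bounds the output distance on R
    output_ok = np.all(np.linalg.eigvalsh(M - C.T @ C) >= -1e-9)
    return drift_ok, output_ok

drift_ok, output_ok = check_conditions(M=np.eye(2))
```

Choosing a larger M weakens the output-distance guarantee and enlarges the relation, so M is a design parameter trading off ε against the feasibility of the invariance inequality.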
Consider the linear interface function u_1 = R u_2 + Q x_2 + K(x_1 − P x_2), with P a matrix satisfying P A_s = A P + B Q. Define the relation (x_1, x_2) ∈ R^δ_ε iff (x_1 − P x_2)ᵀ M (x_1 − P x_2) ≤ ε². Next we check the conditions of Def. 6 under which M_2 ⪯^δ_ε M_1. It is guaranteed that d_Y(y_1, y_2) ≤ ε for any (x_1, x_2) ∈ R^δ_ε (cf. APS1 in Def. 6) if C_s = C P and CᵀC ⪯ M. Condition APS2 in Def. 6 holds if c_w is selected such that P(wᵀw ≤ c_w) ≥ 1 − δ and the inequality

(Ā x̄ + B̄ u_2 + B̄_w w + P β)ᵀ M (Ā x̄ + B̄ u_2 + B̄_w w + P β) ≤ ε²   (20)

is satisfied for all x̄, u_2, w, β such that wᵀw ≤ c_w, |u_2| ≤ c_u, x̄ᵀ M x̄ ≤ ε², and |β| ≤ δ_s. The matrices in (20) are defined as Ā := A + BK, B̄ := BR − PB_s, and B̄_w := B_w − P B_{s,w}. The vector δ_s is the diameter of the partition {A_i, i = 1, ..., l}, which satisfies |x_s − x'_s| ≤ δ_s component-wise for any x_s, x'_s ∈ A_i and any i ∈ {1, 2, ..., l}. Condition (20) can be checked using LMIs and the S-procedure [6].

For condition APS3 in Def. 6, we ask when there exists a deterministic initial state x_{2,0} for a given deterministic initial state x_{1,0} satisfying (x_{1,0} − P x_{2,0})ᵀ M (x_{1,0} − P x_{2,0}) ≤ ε². We can choose x_{2,0} := P̂ x_{1,0} with P̂ := (Pᵀ M P)⁻¹ Pᵀ M, which minimises the left-hand side, and select it as the representative point of the associated partition set in M_2. Alternatively, when the representative point cannot be freely chosen, we select x_{2,0} = Π(P̂ x_{1,0}). In the former case, there exists an initial state x_{2,0} if x_{1,0}ᵀ M (I_n − P P̂) x_{1,0} ≤ ε², or, dually, ε is lower bounded for a given x_{1,0}. In the latter case, if ‖M^{1/2}(I_n − P P̂) x_{1,0}‖ + ‖M^{1/2} P δ_s‖ ≤ ε, then there exists an initial state x_{2,0} satisfying condition APS3.

Toy example.
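The toy example below uses a bounded reach-and-stay specification; a DFA for this fragment can be sketched directly. The two-letter alphabet {'K', '!K'} and the absorbing accepting state are simplifying assumptions of this illustration, not the automaton of Figure 2:

```python
# DFA sketch for psi = <> []^{<=n} {y in K}: states 0..n count consecutive
# steps inside K; state n is accepting and (here) absorbing
def make_reach_and_stay_dfa(n):
    def t(q, a):
        if q == n:                       # accepted: stay accepted
            return n
        return q + 1 if a == 'K' else 0  # reset the counter on leaving K
    return t, 0, {n}

t, q0, F = make_reach_and_stay_dfa(3)

def run(word):
    q = q0
    for a in word:
        q = t(q, a)
    return q in F

print(run(['!K', 'K', 'K', 'K']))   # True: three consecutive steps in K
print(run(['K', 'K', '!K', 'K']))   # False: the stay is interrupted
```

In the product construction this transition function t is exactly what drives the discrete coordinate q of the product gMDP.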
We consider the specification ψ = ◇□^{≤n}{y ∈ K}, which encodes reach and stay over a bounded time interval. The associated DFA is given in Figure 2, together with an illustration of a potential application of this toy example.

Fig. 2: A game of tag: ◇□^{≤n}{x_a ∈ K}.

Consider the original model M_1, a three-dimensional model with output y(t) = x_a(t) and

x_a(t+1) = x_a(t) − a_1 (x_b(t) − x_c(t)) − a_2 u(t) + a_3 w(t),
x_b(t+1) = b x_b(t) + u(t),
x_c(t+1) = c_1 x_c(t) + c_2 w(t),   (21)

with a_1 = 0.…, a_2 = 0.…, a_3 = 6e-3, b = 0.…, c_1 = 0.…, c_2 = 0.….
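Sample paths of dynamics with the shape of (21) are generated directly. Since several coefficients above are elided, the sketch below uses hypothetical placeholder values and an arbitrary initial state:

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder coefficients (hypothetical; the paper's values differ)
a1, a2, a3 = 0.2, 0.1, 6e-3
b, c1, c2 = 0.8, 0.9, 0.1

def step(x, u):
    xa, xb, xc = x
    w = rng.standard_normal()            # shared scalar Gaussian disturbance
    return np.array([
        xa - a1 * (xb - xc) - a2 * u + a3 * w,   # x_a update as in (21)
        b * xb + u,                              # x_b update
        c1 * xc + c2 * w,                        # x_c update
    ])

x = np.array([1.0, 0.5, 0.0])            # arbitrary initial state
ys = []
for t in range(100):
    x = step(x, u=0.0)                   # uncontrolled run, for illustration
    ys.append(x[0])                      # output y(t) = x_a(t)
ys = np.asarray(ys)
```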
For the game we consider the case with n = 3. As in [13], we compute the lower-dimensional model via balanced truncation of the original controlled model, with a suitable feedback gain K = [−0.… 0.… −0.…].

Fig. 3: On the left: the (ε, δ)-robust satisfaction probability r_{ε,δ}(F, X) of ◇□^{≤n}{y ∈ [−…, …]} with ε = 1.… and δ = 0.03, as a function of the initial state. On the right: simulation runs of the original model and the abstract model under the composed robust controller; the crosses are the outputs of the abstract model M_2, whereas the lines are the outputs of the original model M_1.

In this paper, we have introduced a new robust way of synthesising control strategies and verifying probabilistic temporal logic properties. Beyond this theoretical contribution, future work will focus on the computational aspects of this approach, to prepare for applications to realistically sized problems.
References

1. A. Abate. Approximation metrics based on probabilistic bisimulations for general state-space Markov processes: A survey. ENTCS, 297:3–25, 2013.
2. A. Abate, J.-P. Katoen, and A. Mereacre. Quantitative automata model checking of autonomous stochastic hybrid systems. In Proc. 14th ACM Int. Conf. on Hybrid Systems: Computation and Control, pages 83–92, 2011.
3. A. Abate, M. Prandini, J. Lygeros, and S. Sastry. Probabilistic reachability and safety for controlled discrete time stochastic hybrid systems. Automatica, 44(11):2724–2734, 2008.
4. D. Bertsekas and S. E. Shreve. Stochastic Optimal Control: The Discrete Time Case. Athena Scientific, 1996.
5. V. I. Bogachev. Measure Theory. Springer Science & Business Media, 2007.
6. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
7. J. Desharnais, F. Laviolette, and M. Tracol. Approximate analysis of probabilistic processes: Logic, simulation and games. In Int. Conf. on Quantitative Evaluation of Systems, pages 264–273, Sept. 2008.
8. A. D'Innocenzo, A. Abate, and J.-P. Katoen. Robust PCTL model checking. In Proc. 15th ACM Int. Conf. on Hybrid Systems: Computation and Control, pages 275–285, 2012.
9. S. Esmaeil Zadeh Soudjani and A. Abate. Adaptive and sequential gridding procedures for the abstraction and verification of stochastic processes. SIAM Journal on Applied Dynamical Systems, 12(2):921–956, 2013.
10. S. Esmaeil Zadeh Soudjani, C. Gevaerts, and A. Abate. FAUST²: Formal abstractions of uncountable-state stochastic processes. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, pages 272–286. Springer, 2015.
11. A. Girard and G. J. Pappas. Hierarchical control system design using approximate simulation. Automatica, 45(2):566–571, 2009.
12. S. Haesaert, A. Abate, and P. M. J. Van den Hof. Verification of general Markov decision processes by approximate similarity relations and policy refinement. In Quantitative Evaluation of Systems (QEST), pages 227–243, 2016.
13. S. Haesaert, S. Esmaeil Zadeh Soudjani, and A. Abate. Verification of general Markov decision processes by approximate similarity relations and policy refinement. SIAM Journal on Control and Optimization, 55(4):2333–2367, 2017.
15. O. Hernández-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes, volume 30 of Applications of Mathematics. Springer, 1996.
16. O. Kupferman and M. Y. Vardi. Model checking of safety properties. Formal Methods in System Design, pages 291–314, 2001.
17. M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In G. Gopalakrishnan and S. Qadeer, editors, Proc. 23rd Int. Conf. on Computer Aided Verification (CAV'11), volume 6806 of LNCS, pages 585–591. Springer, 2011.
18. K. G. Larsen and A. Skou. Bisimulation through probabilistic testing. Information and Computation, 94(1):1–28, 1991.
19. A. Lavaei, S. Esmaeil Zadeh Soudjani, R. Majumdar, and M. Zamani. Compositional abstractions of interconnected discrete-time stochastic control systems. ArXiv e-prints, Sept. 2017.
20. S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer-Verlag London, 1993.
21. M. G. Safonov and R. Chiang. A Schur method for balanced-truncation model reduction. IEEE Transactions on Automatic Control, 34(7):729–733, 1989.
22. I. Tkachev, A. Mereacre, J.-P. Katoen, and A. Abate. Quantitative automata-based controller synthesis for non-autonomous stochastic hybrid systems. In HSCC, pages 293–302, 2013.

¹ The simulation runs of Fig. 3 are initiated at x_a = 2.…, x_b = 2.…, x_c = 1.….
A Additional proofs
Proof (of Theorem 1). Consider any two models M_1, M_2 ∈ M_Y with M_2 ⪯_δ M_1. A Markov policy μ = (μ_0, ..., μ_{N−1}) for M_2 can be refined to M_1. Technically, this means that we first write it as a control strategy, for which control refinement can be carried out as proven in [13]. Denote the control strategy that refines the Markov policy by C. Then the composed system C × M_1 contains transitions over X_2 × X_1 with stochastic transition kernels W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1) for k ∈ {0, ..., N−1}.

We now need to show that r^μ_δ(K_X, N), defined in (11), is a lower bound for the probability that the target set K is reached by C × M_1 in N time steps, P_{C×M_1}(◇^{≤N} K). Observe that the following holds:

P_{C×M_1}[ ((x_2, x_1) ∈ R) U^{≤N} ((x_2, x_1) ∈ R ∧ x_2 ∈ K_X) ] ≤ P_{C×M_1}[ ◇^{≤N} (y_1 ∈ K) ],   (22)

since ((x_2, x_1) ∈ R ∧ x_2 ∈ K_X) implies h_1(x_1) = h_2(x_2) and h_1(x_1) ∈ K. The left-hand side of (22) can be computed via the backward recursion of a reach-avoid property. For this, the safe set (i.e., the complement of the avoid set) is R and the reach set is G := (K_X × X_1) ∩ R. The value functions {V_k, k = 0, ..., N} of the backward recursion are initialised with V_N ≡ 0 and computed as

V_k(x_2, x_1) = ∫_{X_2×X_1} [ 1_G(x̄_2, x̄_1) + 1_{R\G}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1).

This can also be written as

V_k(x_2, x_1) = ∫_R [ 1_{K_X×X_1}(x̄_2) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1).

We now want to compute a lower bound on V_k(x_2, x_1) based on backward computations over M_2. The value functions V^δ_k : X_2 → [0, 1] are defined inductively as V^δ_k = T^{μ_k}_δ(V^δ_{k+1}), which is

V^δ_k(x_2) := L( ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2)) − δ ),

initialised with V^δ_N ≡ 0. With focus on the above two recursions, we claim that if V^δ_{k+1}(x_2) is a lower bound for V_{k+1}(x_2, x_1) for all (x_2, x_1) ∈ R, then V^δ_k(x_2) is also a lower bound for V_k(x_2, x_1). Once we prove this claim, by induction we get that V^δ_0(x_2) is a lower bound for V_0(x_2, x_1). As a result, the δ-robust probability can be computed as

r^μ_δ(K_X, N) := L( ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_0(x̄_2) ] π_2(dx̄_2) − δ ).

To prove the claim, we define Ṽ^δ_k : X_2 → [−δ, 1] as

Ṽ^δ_k(x_2) := ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2)) − δ,

such that V^δ_k(x_2) = L(Ṽ^δ_k(x_2)). For any (x_2, x_1) ∈ R, we have

Ṽ^δ_k(x_2) + δ = ∫_{X_2} [ 1_{K_X}(x̄_2) + 1_{X_2\K_X}(x̄_2) V^δ_{k+1}(x̄_2) ] T_2(dx̄_2 | x_2, μ_k(x_2))
= ∫_{X_2×X_1} [ 1_{K_X×X_1}(x̄_2, x̄_1) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V^δ_{k+1}(x̄_2) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)   (23)
≤ ∫_R [ 1_{K_X×X_1}(x̄_2, x̄_1) + 1_{(X_2\K_X)×X_1}(x̄_2, x̄_1) V_{k+1}(x̄_2, x̄_1) ] W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)
  + ∫_{(X_2×X_1)\R} W_T(dx̄_2 × dx̄_1 | μ_k(x_2), x_2, x_1)   (24)
≤ V_k(x_2, x_1) + δ.

Note that (23) follows from the fact that W_T is a lifting of T_2, and (24) holds since V^δ_{k+1}(x̄_2) is upper bounded by 1 and, over R, upper bounded by V_{k+1}(x̄_2, x̄_1); the last step uses W_T((X_2 × X_1) \ R | ·) ≤ δ. Thus, by the definition of V_k, we have Ṽ^δ_k(x_2) ≤ V_k(x_2, x_1) for all (x_2, x_1) ∈ R. Since V_k is also lower bounded by 0, we have V^δ_k(x_2) = L(Ṽ^δ_k(x_2)) ≤ V_k(x_2, x_1), which completes the proof.
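The induction step of this proof can be exercised numerically on a finite example: build a lifted kernel W_T whose diagonal (the relation R) carries at least 1 − δ of the mass, take T_2 as its marginal, and run both recursions. The construction below is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, delta = 4, 6, 0.1
K = np.array([False, False, False, True])       # target set K_X (last state)

# lifted kernel W[i, j2, j1] for coupled diagonal pairs (i, i):
# (1 - delta) of the mass keeps the two copies together (stays in R),
# the remaining delta is spread arbitrarily, so W(R | i) >= 1 - delta
p = rng.random((n, n)); p /= p.sum(axis=1, keepdims=True)
q = rng.random((n, n, n)); q /= q.sum(axis=(1, 2), keepdims=True)
W = delta * q
for i in range(n):
    W[i] += (1 - delta) * np.diag(p[i])

T2 = W.sum(axis=2)                              # abstract marginal kernel on X_2
diag = np.einsum('ijj->ij', W)                  # mass that remains inside R

Vc = np.zeros(n)                                # coupled value V_k on the diagonal
Vd = np.zeros(n)                                # delta-robust value V^delta_k
for _ in range(N):
    Vc = diag @ np.where(K, 1.0, Vc)
    Vd = np.clip(T2 @ np.where(K, 1.0, Vd) - delta, 0.0, 1.0)

# the robust recursion is a guaranteed lower bound on the coupled one
print(np.all(Vd <= Vc + 1e-12))                 # True
```

Shrinking delta to 0 makes the two recursions coincide, mirroring the exact (non-approximate) simulation case.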
Proof of Theorem 2 ). This proof upper-bounds the maximal reach-ability probability and follows an analogue path to the proof of Theorem 1.Consider any two models M , M ∈ M Y with M (cid:23) δǫ M . A Markov policy µ = ( µ , . . . , µ N ) for M can be refined to M such that the composed system C × M contains transitions over X × X with stochastic transition kernelsdefined as W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) for k ∈ { , . . . , N − } . We now need to show that P µ × M ( ♦ ≤ N K ) has an upper bound computed as givenin Theorem 2. Remark that ♦ ≤ N ( y ∈ K ) implies the specification ψ := [( x , x ) ∈ R ] U ≤ N (cid:2) ( x , x ) ∈ ¯ R ∨ ( x ∈ K − ǫ X ) (cid:3) with ¯ R := X × X \ R . Hence it follows that P µ × M (cid:2) ♦ ≤ N ( y ∈ K ) (cid:3) ≤ P C × M [ ψ ] . (25)The right-hand side of (25) can be computed via backward recursion of reach-avoid property. For this the safe set (i.e., the complement of the avoid set) is R and the reach set is (( K X × X ) ∩ R ) ∪ ¯ R . The value functions V k in thebackward recursion are initialised with V N = 0 and computed as V k ( x , x ) = Z X × X h ( K − ǫ X × X ) ∩R (¯ x , ¯ x ) + ¯ R (¯ x , ¯ x )+ R\ ( K − ǫ X × X ) (¯ x , ¯ x ) V k +1 (¯ x , ¯ x ) i W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) . (26)We now want to compute an upper bound on V k ( x , x ) based on backwardrecursion over M . Let V − δk : X → [0 ,
1] be defined inductively as V − δk ( x ) := sup ¯ µ L (cid:18)Z X h K X (¯ x ) + X \ K X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ k ( x ) , x ) + δ (cid:19) and initialised with V − δN = 0. We claim that if V − δk +1 ( x ) ≥ V k +1 ( x , x ) for all( x , x ) ∈ R , then V − δk ( x ) ≥ V k ( x , x ). Thus by induction we get V − δ ( x ) ≥ V ( x , x ). By repeating the same argument for initial measure π , we get Theo-rem 2.In order to prove the above claim we need to define value functions ˜ V δk : X → [ − δ,
1] as˜ V − δk ( x ) := sup ¯ µ (cid:18)Z X h K X (¯ x ) + X \ K X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ k ( x ) , x ) (cid:19) + δ, such that V − δk ( x ) = L (cid:16) ˜ V − δk ( x ) (cid:17) due to the fact that L and sup are inter-changeable here. For any ( x , x ) ∈ R we have V k ( x , x ) ≤ Z R h ( K − ǫ X × X ) ∩R (¯ x , ¯ x ) + R\ ( K − ǫ X × X ) (¯ x , ¯ x ) V k +1 (¯ x , ¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ Z R h ( K − ǫ X × X ) (¯ x , ¯ x ) + ( X × X ) \ ( K − ǫ X × X ) (¯ x , ¯ x ) V − δk +1 (¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ Z X × X h ( K − ǫ X × X ) (¯ x , ¯ x ) + ( X × X ) \ ( K − ǫ X × X ) (¯ x , ¯ x ) V − δk +1 (¯ x ) i × W T ( d ¯ x × d ¯ x | µ k ( x ) , x , x ) + δ ≤ sup ¯ µ Z X h K − ǫ X (¯ x ) + X \ K − ǫ X (¯ x ) V − δk +1 (¯ x ) i T ( d ¯ x | ¯ µ ( x ) , x ) + δ = ˜ V − δk ( x ) . Since V k ( x , x ) : X × X → [0 ,
1] it holds that if V k ( x , x ) ≤ ˜ V − δk ( x ) thenalso V k ( x , x ) ≤ L (cid:16) ˜ V − δk ( x ) (cid:17) = V − δk ( x ). This completes the proof. Proof (
Proof of Theorem 3 ). Since M (cid:22) δ M , according to Definition 6there exists an interface function U v , a relation R ⊆ X × X , and a Borelmeasurable stochastic kernel W T ( · | u , x , x ) such that1. ∀ ( x , x ) ∈ R , h ( x ) = h ( x );2. ∀ ( x , x ) ∈ R , ∀ u ∈ U , T ( ·| x , u ) ¯ R δ T ( ·| x , U v ( u , x , x )) , with liftedprobability measure W T ( · | u , x , x );3. π ¯ R δ π .Indicate the product gMDPs by M i ⊗ A = ( ¯ X i , ¯ π i , ¯ T i , U , ¯ h i , Y ). According toDefinition 9 we have for any i = 1 ,
2, ¯ X i = X i × Q , ¯ π i ( dx i , q ) = π i ( dx i ) · q = q ),¯ h i ( x i , q ) = h i ( x i ) for any ( x i , q ) ∈ ¯ X i , and¯ T i ( A i × { q ′ }| x i , q, u i ) = Z ˜ x i ∈ A i ( q ′ = t ( q, L ( h i (˜ x i )))) · T i ( d ˜ x i | x i , u i ) . In order to prove the theorem we construct the relation R p , the interface function U p v , and the lifted measure W p T based on R , U v , W T .1. Define R p ⊆ ¯ X × ¯ X with ( x , q ) R p ( x , q ) iff x R x and q = q . Selectany ¯ x = ( x , q ) ∈ ¯ X and ¯ x = ( x , q ) ∈ ¯ X . Then¯ h (¯ x ) = ¯ h ( x , q ) = h ( x ) and ¯ h (¯ x ) = ¯ h ( x , q ) = h ( x ) . Thus(¯ x , ¯ x ) ∈ R p ⇒ ( x , x ) ∈ R ⇒ h ( x ) = h ( x ) ⇒ ¯ h (¯ x ) = ¯ h (¯ x ) .
2. Since π_2 R̄_δ π_1, there exists a lifted measure W_Init such that W_Init(R) ≥ 1 − δ, W_Init(A_2 × X_1) = π_2(A_2) for all A_2 ∈ B(X_2), and W_Init(X_2 × A_1) = π_1(A_1) for all A_1 ∈ B(X_1). Define the probability space (X̄_2 × X̄_1, B(X̄_2 × X̄_1), W̄_Init) with the property that for all A_2 ∈ B(X_2), A_1 ∈ B(X_1), and q_2, q_1 ∈ Q,

W̄_Init({q_2} × A_2 × {q_1} × A_1) := W_Init(A_2 × A_1) 1(q_2 = q_0) 1(q_1 = q_0),

where q_0 is the initial state of the automaton A. In words, W̄_Init assigns probabilities (which are the same as W_Init) to Borel measurable subsets of X̄_2 × X̄_1 if and only if the discrete modes are equal to q_0. For this particular lifted measure, we have
– W̄_Init({q_2} × A_2 × X̄_1) = W_Init(A_2 × X_1) 1(q_2 = q_0) = π_2(A_2) 1(q_2 = q_0) = π̄_2({q_2} × A_2);
– W̄_Init(X̄_2 × {q_1} × A_1) = W_Init(X_2 × A_1) 1(q_1 = q_0) = π_1(A_1) 1(q_1 = q_0) = π̄_1({q_1} × A_1);
– W̄_Init(R_p) = W̄_Init({(x̄_2, x̄_1) : (x_2, x_1) ∈ R ∧ q_2 = q_1}) = W_Init(R) ≥ 1 − δ.
Therefore π̄_2 R̄_{p,δ} π̄_1 with lifted measure W̄_Init.
3. We know that for all (x_2, x_1) ∈ R and u_2 ∈ U,

T_2(· | x_2, u_2) R̄_δ T_1(· | x_1, U_v(u_2, x_2, x_1)),

with lifted probability measure W_T(· | u_2, x_2, x_1). Define the new measure W̄_T(· | u_2, x̄_2, x̄_1) on X̄_2 × X̄_1 given U × X̄_2 × X̄_1 with

W̄_T({q'_2} × A_2 × {q'_1} × A_1 | u_2, q_2, x_2, q_1, x_1)
:= ∫_{x̃_2 ∈ A_2} ∫_{x̃_1 ∈ A_1} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) · 1(q'_1 = t(q_1, L(h_1(x̃_1)))) · W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1).   (27)

In words, W̄_T assigns to Borel measurable subsets of X̄_2 × X̄_1 the same probabilities as W_T and evolves the discrete modes of the two gMDPs according to the automaton A. For this particular lifted measure W̄_T, we have, for any x̄_2 = (x_2, q_2) ∈ X̄_2 and x̄_1 = (x_1, q_1) ∈ X̄_1 with x̄_2 R_p x̄_1 and any u_2 ∈ U:

W̄_T({q'_2} × A_2 × X̄_1 | u_2, q_2, x_2, q_1, x_1) = Σ_{q'_1 ∈ Q} ∫_{x̃_2 ∈ A_2} ∫_{x̃_1 ∈ X_1} ...
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) ∫_{x̃_1 ∈ X_1} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) W_T(dx̃_2 × X_1 | u_2, x_2, x_1)
= ∫_{x̃_2 ∈ A_2} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) T_2(dx̃_2 | x_2, u_2)
= T̄_2(A_2 × {q'_2} | x_2, q_2, u_2),

and

W̄_T(X̄_2 × {q'_1} × A_1 | u_2, q_2, x_2, q_1, x_1) = Σ_{q'_2 ∈ Q} ∫_{x̃_2 ∈ X_2} ∫_{x̃_1 ∈ A_1} ...
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) ∫_{x̃_2 ∈ X_2} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) W_T(X_2 × dx̃_1 | u_2, x_2, x_1)
= ∫_{x̃_1 ∈ A_1} 1(q'_1 = t(q_1, L(h_1(x̃_1)))) T_1(dx̃_1 | x_1, U_v(u_2, x_2, x_1))
= T̄_1(A_1 × {q'_1} | x_1, q_1, U_v(u_2, x_2, x_1)).

Take any (x_2, q_2) R_p (x_1, q_1), which implies q_2 = q_1 and x_2 R x_1, hence h_2(x_2) = h_1(x_1). If we also require (x̃_2, q'_2) R_p (x̃_1, q'_1), then t(q_2, L(h_2(x̃_2))) = t(q_1, L(h_1(x̃_1))), and we get

W̄_T(R_p | u_2, q_2, x_2, q_1, x_1) = ∫_{(x̃_2, x̃_1) ∈ R} W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1) · Σ_{q'_2, q'_1 ∈ Q} 1(q'_2 = t(q_2, L(h_2(x̃_2)))) · 1(q'_1 = t(q_1, L(h_1(x̃_1)))).

The above sum is equal to one due to q_2 = q_1, (x̃_2, x̃_1) ∈ R, and the DFA being deterministic. Then

W̄_T(R_p | u_2, q_2, x_2, q_1, x_1) = ∫_R W_T(dx̃_2 × dx̃_1 | u_2, x_2, x_1) ≥ 1 − δ.

We have thus shown that T̄_2(· | x_2, q_2, u_2) R̄_{p,δ} T̄_1(· | x_1, q_1, U_v(u_2, x_2, x_1)) with the lifted measure W̄_T defined in (27) and the same interface function U_v, which completes the proof.