A compositional semantics for Repairable Fault Trees with general distributions

Abstract

Fault Tree Analysis (FTA) is a prominent technique in industrial and scientific risk assessment. Repairable Fault Trees (RFT) enhance the classical Fault Tree (FT) model by making it possible to describe complex dependent repairs of system components. The usual frameworks for analyzing FTs, such as BDD, SBDD, and Markov chains, fail to assess the desired properties over complex RFT models, either because these become too large or because of the cyclic behaviour introduced by dependent repairs. Simulation is another way to carry out this kind of analysis. In this paper we review the RFT model with Repair Boxes as introduced by Daniele Codetta-Raiteri. We present a compositional semantics for this model in terms of Input/Output Stochastic Automata, which allows for the modelling of events occurring according to general continuous distributions. Moreover, we prove that the semantics generates (weakly) deterministic models, which are hence suitable for discrete event simulation, and prominently for Rare Event Simulation using the FIG tool.


Raúl Monti, Pedro R. D'Argenio, Carlos E. Budde

Universidad Nacional de Córdoba, FAMAF, Córdoba, Argentina
CONICET, Córdoba, Argentina
Saarland University, Department of Computer Science, Saarbrücken, Germany
UTWENTE


Supported by SeCyT-UNC 05/BP12, 05/B497 and ERC grant 695614 (POWVER). Also by NWO project 15474 (SEQUOIA) and EU project 102112 (SUCCESS).

Fault Tree Analysis is a prominent technique for dependability assessment of complex industrial systems. Standard or Static Fault Trees (SFTs [21]) are DAGs whose leaves are called Basic Events (BE), and usually represent the failure of a physical system component. Each leaf is equipped with a failure rate or discrete probability, indicating the frequency at which the component breaks. The other FT nodes are called gates, and they model how the failures of basic components combine to induce more complex system failures, until the failure of interest (the top event of the tree) occurs. SFTs thus encode a logical formula. One of the most efficient analysis techniques uses Binary Decision Diagrams (BDD) to represent the formula, and then performs dependability studies using specialised algorithms. This assumes the absence of stochastic dependency among BEs.

Many extensions to SFTs allow for further modelling capabilities. Among the most studied are Dynamic Fault Trees (DFTs [16,22]). DFTs add gates to describe time- and order-dependence among the tree nodes, in contrast to the plain combinatorial behavior of SFT gates. New analysis methods were introduced in order to capture temporal requirements, such as cut sequences, translation to Markov models [16,6], Sequence BDDs [19,28,35], algebraic approaches [25,1], simulation, and combinations and optimisations thereof [3,20].

Repairable Fault Trees (RFT [4,27,2,6]) increase the expressiveness of FTs by introducing the possibility to model complex inter-dependent repair mechanisms for basic components, i.e. the system components that produce the basic events. In former models such as DFT, certain notions of repair had been addressed by allowing components to be repaired independently. Nevertheless, this is not usual in real world systems, where repair scheduling, resource management, and maintenance play an important role. To address this, we focus on the Repair Box model (RBOX [18,27]). A RBOX models a repair unit in charge of repairing certain BEs following a certain policy. Different repair policies, such as first come first serve, priority service, random or nondeterministic choice, make it possible to analyze the impact of taking these decisions in the real system. The introduction of these boxes greatly changes the dynamics of the tree. Quantitative analyses are no longer a combinatorial calculation, since the evolution of the system over time has to be considered [32]. Furthermore, traditional qualitative analyses such as cut sets lose their utility because they do not take repairability into account. Traditional quantitative analysis is also ruled out by the cyclic behavior introduced by this model, which prevents the use of the combinatorial solutions proposed for non-repairable FTs and requires a state-based solution instead [3].

In this work we present a formal definition of Repairable Fault Trees (RFT), along with its semantics given in terms of Input/Output Stochastic Automata (IOSA) [15,17]. We show that the underlying IOSA semantics of the RFT specification is weakly deterministic, that is, the non-determinism present in the IOSA model is spurious. Hence the model is equivalent to a fully stochastic model and thus amenable to discrete event simulation. IOSA allows us to model RFTs with general continuous failure and repair distributions.

A variety of works address the problem of defining a rigorous syntax and semantics for FT, DFT, and RFT [14,4,6,2, etc.]. They usually differ, for example, in the types and meaning of gates, in expressive power, and in how spare elements are claimed and how repair races are resolved. The presence of non-deterministic situations is also a main point of disagreement. Comprehensive surveys on FTs can be found in [22] and [32]. In Section 5 we formally define the syntax of RFTs in a similar manner as [5] has done for DFTs. Furthermore, in order to define the compositional and weakly deterministic semantics using IOSA, we discuss different concerns about determinism in RFTs.

As discussed before, RFT analysis requires a state space solution. This usually means one of the following two approaches. A first approach is to translate the model to a Markov model, applying as many optimisations as possible during modelling and analysis in order to relieve the state explosion problem. This is the approach followed by many works such as [2,3,4]. Two main drawbacks can be pointed out. The first one is that, no matter which existing optimisation methods are used, there is no guarantee of a significant state space reduction for general models. This is especially problematic when analyzing large and complex industrial-size systems involving repair. A second drawback is the restriction to exponentially distributed events, which prevents the correct modelling of real-life systems where timing is governed by other continuous distributions. This is the case, for example, of phenomena such as timeouts in communication protocols, hard deadlines in real-time systems, human response times, or the variability of the delay of sound and video frames (so-called jitter) in modern multimedia communication systems, which are typically described by non-memoryless distributions such as uniform, log-normal, or Weibull distributions [17].

A second approach to RFT analysis is to resort to simulation, which does not need the full state space of the model to be constructed, and does not impose per se a restriction to any kind of probability distribution. The main problem with simulation is the large amount of computation needed to reach a sufficiently accurate result. This is most relevant when analyzing highly dependable or fault tolerant systems, where the failure probability is very small and plain Monte Carlo simulation becomes infeasible. To face this problem one can make use of Rare Event Simulation techniques such as Importance Splitting or Importance Sampling [33,10,11,29].

Our main contribution in this work is a method for precisely modelling RFTs with generally distributed events. Furthermore, since the method yields a deterministic IOSA model, which is thus amenable to discrete event simulation, we are able to analyze it with the FIG Rare Event Simulation Tool [11,12], greatly improving efficiency when analyzing highly dependable systems. The recent work [31] also addresses the use of rare event simulation to analyze DFTs with complex repairs. Nevertheless, it is restricted to Exponential and Erlang distributions, and the analysis is ultimately conducted over a Markov model, hence suffering from potential state space explosion.
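The infeasibility of plain Monte Carlo simulation for very small failure probabilities can be quantified with a standard back-of-the-envelope estimate. For n independent runs and a true probability p, the relative standard error of the crude estimator is sqrt((1 - p) / (n p)). The sketch below inverts this formula; the function name `required_samples` and the chosen probabilities are illustrative assumptions, not values taken from this work.

```python
import math


def required_samples(p: float, rel_err: float) -> int:
    """Smallest n with sqrt((1 - p) / (n * p)) <= rel_err for a crude Monte Carlo estimator."""
    if not (0.0 < p < 1.0) or rel_err <= 0.0:
        raise ValueError("need 0 < p < 1 and rel_err > 0")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


# Hypothetical failure probabilities, for illustration only.
for p in (1e-3, 1e-6, 1e-9):
    n = required_samples(p, 0.1)
    print(f"p = {p:.0e}: about {n:.2e} runs for a 10% relative error")
```

The number of runs grows roughly like 1/p, which is why rare event techniques become attractive for highly dependable systems.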

In Fig. 1 we depict the set of RFT elements that we consider in this work. Each of them has a set of inputs where its subtrees are connected, and an output (if applicable) to propagate failure, repair and other signals. The propagation of a failure and its subsequent repair starts at the leaves of the fault tree, which include only (spare) basic elements. When one of them fails, or gets repaired, it instantaneously propagates the event to the gates to which it is connected. The state of a gate changes based on the signals it receives from its inputs, and the gate propagates its new state to the gates it serves as input. Thus, a proper combination and timing of fail signals may change a gate's state to failing, and similarly, a proper combination and timing of repair signals may change it back to a working state. This very much depends on the type of gate. The state changes will at the same time trigger output signals accordingly. Not only fail and repair signals, but also other signals may be produced, as in the case of repair boxes, which may output a start repairing signal to any of their input basic elements.

Fig. 1: RFT elements (2/3 VOTING gate, PAND gate, OR gate, AND gate, BE, SBE, RBOX, FDEP gate with triggering and dependent inputs, spare gate with main and spare inputs).

The intuition about the behavior of each gate is as follows. An AND gate fails whenever all its inputs fail, and gets repaired (stops failing) when at least one of its inputs is repaired. An OR gate fails whenever at least one of its inputs fails and is repaired when all of its inputs are repaired. A k/n VOTING gate fails whenever at least k of its n inputs fail and stops failing if at most k − 1 of its inputs remain failing. A PAND gate fails whenever its inputs fail from left to right, inducing an order on the failure occurrence, and it is repaired if the last input is repaired.

A functional dependency gate (FDEP) has n + 1 inputs. The fail signal of one of its inputs (the triggering one) makes all the other inputs inaccessible to the rest of the system. Note that the dependent inputs do not necessarily fail, and they will be accessible again as soon as the triggering component is repaired (note the difference with [6,31], where dependent BEs do fail). In fact this gate can be easily replaced by a system of OR gates [34].

A spare basic element (SBE) is a special case of BE which can be enabled and disabled, and can be used as a spare part for other BEs through spare gates. The same SBE can be shared by several spare gates, and different sharing policies are introduced for this purpose. A spare gate (SG) makes it possible to replace a basic element by one of several spare basic elements in case it fails. Each spare gate has a main input and n spare inputs. The main input can only be a BE. The spare inputs can only be SBEs. As soon as the main input fails, the SG uses its own policy to ask for a replacement by one of its spare inputs. The SG will fail whenever it does not obtain a replacement, and will signal repair whenever the main input gets repaired or a spare input is obtained. If an in-use replacement fails, the SG will look for a new one. If the main input is repaired, the SG will free the acquired spare input, in case there is one.

A repair box (RBOX) is the unit in charge of managing the repair of failed BEs and SBEs. It has n inputs, which are the elements administered for repair, and a dummy output. A RBOX policy determines in which order the failing elements will be repaired. Also notice that a RBOX can only repair one of its inputs at a time, while the rest of its failing inputs wait for repair.

Input/Output Stochastic Automata

Input/Output Stochastic Automata [15,17] is a modelling formalism tailored to model stochastic systems for the purpose of simulation. IOSA combine continuous probability jumps from Stochastic Automata with discrete event synchronisation for a compositional style of modelling. IOSAs use continuous random variables to control and observe the passage of time. These variables, called clocks, are set to a value according to their associated probability distribution and, as time evolves, count down all at the same rate until they reach the value zero. Clocks control the moments when actions are taken, and thus make it possible to model systems where events occur at random continuous time stamps. Output and input transitions can be used to synchronize and communicate between different IOSAs. Output transitions are autonomous, while the occurrence of inputs depends on synchronisation with outputs. A transversal classification of actions marks them as urgent or non-urgent. While a non-urgent output is controlled by the expiration of clocks (i.e., clocks reaching the value zero), an urgent output action is taken as soon as the state in which it is enabled is reached. Though an IOSA may be non-deterministic, [17] provides a set of sufficient conditions that guarantee weak determinism (i.e. only spurious non-determinism is present). Furthermore, such conditions can be checked with a polynomial algorithm on the components of the model.
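The countdown behaviour of clocks described above can be illustrated with a small Python sketch. It is not the IOSA semantics itself: the clock names `fail` and `timeout`, their distributions, and the rule that only the expired clock is re-sampled are assumptions made for this toy example (in an IOSA, each transition states which clocks it resets).

```python
import random


def next_expiry(clocks):
    """Return (name, delay) of the clock that reaches zero first."""
    name = min(clocks, key=clocks.get)
    return name, clocks[name]


def advance(clocks, delay):
    """Let `delay` time units pass: every clock counts down at the same rate."""
    return {name: value - delay for name, value in clocks.items()}


rng = random.Random(42)
# Hypothetical general (non-exponential) distributions, one sampler per clock.
samplers = {
    "fail": lambda: rng.weibullvariate(100.0, 1.5),
    "timeout": lambda: rng.uniform(20.0, 30.0),
}
clocks = {name: draw() for name, draw in samplers.items()}
now = 0.0
for _ in range(5):
    fired, delay = next_expiry(clocks)
    now += delay
    clocks = advance(clocks, delay)
    clocks[fired] = samplers[fired]()  # toy rule: re-sample only the expired clock
    print(f"t = {now:8.2f}: clock '{fired}' expired")
```

Because the remaining value of every clock is stored explicitly, no memoryless property is needed, which is what allows distributions such as Weibull or uniform to be used.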

Definition 1. An input/output stochastic automaton with urgency (IOSA) is a structure $(S, \mathcal{A}, \mathcal{C}, \to, C_0, s_0)$, where $S$ is a (denumerable) set of states, $\mathcal{A}$ is a (denumerable) set of labels partitioned into disjoint sets of input labels $\mathcal{A}^i$ and output labels $\mathcal{A}^o$, from which a subset $\mathcal{A}^u \subseteq \mathcal{A}$ is marked as urgent, $\mathcal{C}$ is a (finite) set of clocks such that each $x \in \mathcal{C}$ has an associated continuous probability measure $\mu_x$ on $\mathbb{R}$ s.t. $\mu_x(\mathbb{R}_{>0}) = 1$, $\to\ \subseteq S \times 2^{\mathcal{C}} \times \mathcal{A} \times 2^{\mathcal{C}} \times S$ is a transition function, $C_0$ is the set of clocks that are initialized in the initial state, and $s_0 \in S$ is the initial state. In addition it should satisfy the following constraints:

(a) If $s \xrightarrow{C, a, C'} s'$ and $a \in \mathcal{A}^i \cup \mathcal{A}^u$, then $C = \emptyset$.
(b) If $s \xrightarrow{C, a, C'} s'$ and $a \in \mathcal{A}^o \setminus \mathcal{A}^u$, then $C$ is a singleton set.
(c) If $s \xrightarrow{\{x\}, a_1, C_1} s_1$ and $s \xrightarrow{\{x\}, a_2, C_2} s_2$, then $a_1 = a_2$, $C_1 = C_2$ and $s_1 = s_2$.
(d) For every $a \in \mathcal{A}^i$ and state $s$, there exists a transition $s \xrightarrow{\emptyset, a, C} s'$.
(e) For every $a \in \mathcal{A}^i$, if $s \xrightarrow{\emptyset, a, C'_1} s_1$ and $s \xrightarrow{\emptyset, a, C'_2} s_2$, then $C'_1 = C'_2$ and $s_1 = s_2$.
(f) There exists a function $\mathit{active} : S \to 2^{\mathcal{C}}$ such that: (i) $\mathit{active}(s_0) \subseteq C_0$, (ii) $\mathit{enabling}(s) \subseteq \mathit{active}(s)$, (iii) if $s$ is stable, $\mathit{active}(s) = \mathit{enabling}(s)$, and (iv) if $t \xrightarrow{C, a, C'} s$ then $\mathit{active}(s) \subseteq (\mathit{active}(t) \setminus C) \cup C'$,

where $\mathit{enabling}(s) = \{ y \mid s \xrightarrow{\{y\}, \_, \_} \_ \}$, and $s$ is stable if there is no $a \in \mathcal{A}^u \cap \mathcal{A}^o$ such that $s \xrightarrow{\emptyset, a, \_} \_$. (The symbol $\_$ indicates the existential quantification of a parameter.)

Restrictions (a) to (f) are there to ensure that at most one non-urgent output action is enabled at a time. If in addition the IOSA is closed (i.e., all communications have been resolved and hence the set of inputs is empty) and all its urgent actions are confluent (in the sense of [26], see also Def. 5), it turns out that all the non-determinism is spurious and does not alter the stochastic behavior (i.e. regardless of how non-determinism is resolved, the stochastic properties remain the same) [15,17]. We call this property weak determinism.

IOSAs are closed under parallel composition, which is defined according to the rules in Table 1. In order to avoid unintended behavior, the component IOSAs are required to be compatible, that is, they should not share output actions nor clocks, and should be consistent with respect to urgent actions.

Table 1: Parallel composition on IOSA

$$\frac{s_1 \xrightarrow{C, a, C'}_1 s'_1}{s_1 \| s_2 \xrightarrow{C, a, C'} s'_1 \| s_2} \quad a \in \mathcal{A}_1 \setminus \mathcal{A}_2 \qquad (1)$$

$$\frac{s_2 \xrightarrow{C, a, C'}_2 s'_2}{s_1 \| s_2 \xrightarrow{C, a, C'} s_1 \| s'_2} \quad a \in \mathcal{A}_2 \setminus \mathcal{A}_1 \qquad (2)$$

$$\frac{s_1 \xrightarrow{C_1, a, C'_1}_1 s'_1 \qquad s_2 \xrightarrow{C_2, a, C'_2}_2 s'_2}{s_1 \| s_2 \xrightarrow{C_1 \cup C_2,\, a,\, C'_1 \cup C'_2} s'_1 \| s'_2} \quad a \in \mathcal{A}_1 \cap \mathcal{A}_2 \qquad (3)$$

    module BE
      fc, rc : clock;
      inform : [0..2] init 0;
      broken : [0..2] init 0;

      [fl!] broken=0 @ fc -> (inform'=1) & (broken'=1);
      [r??] broken=1 -> (broken'=2) & (rc'=γ);
      [up!] broken=2 @ rc -> (inform'=2) & (broken'=0) & (fc'=µ);

      [f!!] inform=1 -> (inform'=0);
      [u!!] inform=2 -> (inform'=0);
    endmodule

Fig. 2: Basic Element IOSA symbolic model.
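As a reading aid for the module of Fig. 2, the following Python sketch hand-translates its five transitions into methods of a small class. It is only an illustration under stated assumptions: the class name `BasicElement`, the Weibull and log-normal samplers, and the idea that a repair box answers immediately after each failure are all hypothetical and not part of the FIG tool or of the formal semantics.

```python
import random


class BasicElement:
    """Toy version of module BE: broken is 0 (up), 1 (waiting for repair), 2 (in repair)."""

    def __init__(self, fail_dist, repair_dist):
        self.fail_dist, self.repair_dist = fail_dist, repair_dist
        self.broken, self.inform = 0, 0
        self.fc = fail_dist()    # all clocks are sampled in the initial state
        self.rc = repair_dist()  # stale until a repair starts (broken == 2)

    def fl(self):
        """[fl!] broken=0 @ fc: fail when fc expires; returns the elapsed delay."""
        assert self.broken == 0
        self.broken, self.inform = 1, 1
        return self.fc

    def r(self):
        """[r??] broken=1: start the repair; any other state is an implicit self-loop."""
        if self.broken == 1:
            self.broken, self.rc = 2, self.repair_dist()

    def up(self):
        """[up!] broken=2 @ rc: finish the repair and re-sample fc; returns the delay."""
        assert self.broken == 2
        delay = self.rc
        self.broken, self.inform, self.fc = 0, 2, self.fail_dist()
        return delay

    def urgent(self):
        """[f!!] / [u!!]: emit the pending urgent signal without letting time pass."""
        signal = {1: "f", 2: "u"}.get(self.inform)
        self.inform = 0
        return signal


rng = random.Random(1)
be = BasicElement(lambda: rng.weibullvariate(1000.0, 2.0),   # assumed failure distribution
                  lambda: rng.lognormvariate(1.0, 0.5))      # assumed repair distribution
now, trace = 0.0, []
for _ in range(3):
    now += be.fl()
    trace.append((round(now, 1), be.urgent()))
    be.r()  # assumption: an idle repair box reacts at once
    now += be.up()
    trace.append((round(now, 1), be.urgent()))
print(trace)
```

The value `broken == 2` plays the same role as in the module: it separates a repair that is actually in progress from a state where `rc` merely holds a leftover sample.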

We present a symbolic language to describe an IOSA model. This language is the input language of the tool FIG [12,10] and strongly resembles the PRISM modelling language [23]. The compositional style of modelling of IOSA is also reflected in the language, where each component is modeled separately by what we call a module. A module is composed of a set of variables, whose valuation represents the current state of the component, a set of clocks corresponding to the enabling clocks of non-urgent transitions, and a set of transitions which symbolically describe the possible jumps between states (changes of valuations and resetting of clocks). Fig. 2 models a basic element as an example. Variables can be of integer (with finite range) or boolean type. As we will see later, arrays can also be defined as variables. An initial value for each variable is given after the keyword init. Clock measures are defined at the transitions where the clocks are reset. A transition is described by the name of the action which takes place, a guard that defines the origin states, an enabling clock (only in the case of non-urgent output transitions), a condition describing the target states, and the set of clocks to be reset.

A quick overview of Fig. 2 will help to further understand our symbolic language. Two clocks, fc and rc, are defined at line 2. These clocks are used as enabling clocks for the transitions at lines 6 and 8, and are reset by the transitions at lines 7 and 8, where γ and µ are the distributions associated with rc and fc, respectively. Lines 3 and 4 define variables inform and broken, both of integer type ranging between 0 and 2, and initialized with value 0. Line 6 defines a set of non-urgent output transitions, which produce the output action fl. More precisely, this line defines the set of non-urgent transitions $s \xrightarrow{\{fc\}, fl!, \emptyset} s'$, where s meets the condition broken=0, and s' is the result of changing the values of variables inform and broken to 1 while the other variables keep the values they have in state s. The @ symbol precedes the enabling clock of the transition, while the -> symbol separates the conditions on the origin state from those on the target state. The conditions on the target state are expressed as assignments to the next values of the variables, indicated with an apostrophe. Line 7 defines an urgent input transition with label r. The double question mark after the name indicates that it describes urgent input transitions. Urgent output transitions are indicated with double exclamation marks (!!), non-urgent input transitions with a single question mark, and non-urgent output transitions with a single exclamation mark. At the end of line 7 we find the reset of the clock rc to a value drawn from the probability distribution γ. This line then defines the transitions $s \xrightarrow{\emptyset, r??, \{rc\}} s'$, where s meets the condition broken=1 and s' is identical to s except for variable broken, which has value 2. At line 10, an urgent output transition is defined, indicating the failure of this component through action f!!. We will usually use these urgent transitions to synchronize and communicate with other modules.

The text of Fig. 2 is tacitly completed with self-loops for all inputs under all conditions that are not explicitly written. For example, in Fig. 2, the line "[r??] broken != 1 -> ;" is assumed to exist.

In this section we present a formal definition of the RFT similar to those of [5,7], along with its semantics given in terms of IOSA. Each element of a RFT is characterized by a tuple consisting of its type, its arity (i.e. number of inputs), and possibly other parameters like the probability distributions for fail and repair events in a BE.

Definition 2. Let $n, m, k \in \mathbb{N}^+$, and let $\mu$, $\nu$ and $\gamma$ be continuous probability distributions. We define the set $\mathcal{E}$ of elements of a RFT to be composed of the following tuples:

– $(\mathsf{be}, 0, \mu, \gamma)$ and $(\mathsf{sbe}, 0, \mu, \nu, \gamma)$, which represent basic and spare basic elements, with no inputs, with an active failure distribution $\mu$, a dormant failure distribution $\nu$, and a repair distribution $\gamma$.
– $(\mathsf{and}, n)$, $(\mathsf{or}, n)$ and $(\mathsf{pand}, n)$, which represent AND, OR and PAND gates with $n$ inputs, respectively.
– $(\mathsf{vot}, n, k)$, which represents a $k$ from $n$ voting gate.
– $(\mathsf{fdep}, n)$, which represents a functional dependency gate, with one trigger input and $n - 1$ dependent ones. By convention the first input is the triggering one.
– $(\mathsf{sg}, n)$, which represents a SPARE gate with one main input and $n - 1$ spare inputs. By convention the first input is the main one.
– $(\mathsf{rbox}, n)$, which represents a RBOX element for $n$ BEs (or SBEs).
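The tuples of Definition 2 map naturally onto a small record type. The sketch below is one possible encoding, given purely for illustration: the class name `Element`, the use of sampler callables in place of distributions, and the check `1 <= k <= n` for voting gates are assumptions of this example and are not stated in the definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Dist = Callable[[], float]  # a sampler standing in for a continuous distribution

KINDS = ("be", "sbe", "and", "or", "pand", "vot", "fdep", "sg", "rbox")


@dataclass(frozen=True)
class Element:
    kind: str                     # first projection: the type of the element
    arity: int                    # second projection: number of inputs
    k: Optional[int] = None       # threshold, only for voting gates
    mu: Optional[Dist] = None     # active failure distribution (be, sbe)
    nu: Optional[Dist] = None     # dormant failure distribution (sbe only)
    gamma: Optional[Dist] = None  # repair distribution (be, sbe)

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown element type: {self.kind}")
        is_leaf = self.kind in ("be", "sbe")
        if is_leaf != (self.arity == 0):
            raise ValueError("exactly the (spare) basic elements have no inputs")
        if (self.kind == "vot") != (self.k is not None):
            raise ValueError("k is given exactly for voting gates")
        if self.kind == "vot" and not 1 <= self.k <= self.arity:
            raise ValueError("assumed sanity check: 1 <= k <= n")


# Usage with hypothetical constant samplers.
pump = Element("be", 0, mu=lambda: 100.0, gamma=lambda: 5.0)
vote = Element("vot", 3, k=2)
print(pump.kind, pump.arity, vote.kind, vote.arity, vote.k)
```

Keeping the type and the arity as the first two fields mirrors the projections used later to read the type and the number of inputs of a vertex.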

A RFT is a directed acyclic graph, for which every vertex $v$ is labeled with an element $l(v) \in \mathcal{E}$. An edge from $v$ to $w$ means that the output of $v$ is connected to an input of $w$. Since the order of the inputs is relevant, we give them in terms of a list $i(w)$ instead of a set. Similarly, $si(v)$ lists all the spare gates to which a spare basic element $v$ is connected as an input. Let $t(v)$ indicate the type of $v$, that is, $t(v)$ is the first projection of $l(v)$. Let $\#(v)$ indicate the number of inputs of $v$, that is, the second projection of $l(v)$.

Definition 3. A repair fault tree is a four-tuple $T = (V, i, si, l)$, where $V$ is a set of vertices, $l : V \to \mathcal{E}$ is a function labeling each vertex with a RFT element, $i : V \to V^*$ is a function assigning $\#(v)$ inputs to each element $v$ in $V$, and $si : V \to V^*$ is a function indicating which spare gates manage each spare BE. The set of edges $E = \{ (v, w) \in V^2 \mid \exists j \cdot v = (i(w))[j] \}$ is the set of pairs $(v, w)$ such that $v$ is an input of $w$. If such an edge exists, we say that $v$ is connected to $w$ and $w$ to $v$. In addition, a RFT $T$ should satisfy the following conditions:

– The tuple $(V, E)$ is a directed acyclic graph (DAG).
– $T$ has a unique top element, i.e. a unique element whose non-dummy output is not connected to another gate. That is, there is a unique vertex $v \in V$ such that for all $w \in V$, $(v, w) \notin E$ and $t(v) \notin \{\mathsf{fdep}, \mathsf{rbox}\}$.
– An output can not be the input of the same gate more than once. That is, for all $0 \le j, k < |i(w)|$ with $i(w)[j] = i(w)[k]$, we have $j = k$.
– Since FDEP and RBOX outputs are dummy, if $(v, w) \in E$ then $t(v) \notin \{\mathsf{fdep}, \mathsf{rbox}\}$.
– The inputs of a repair box can only be basic elements. I.e., if $(v, w) \in E$ and $t(w) = \mathsf{rbox}$ then either $t(v) = \mathsf{be}$ or $t(v) = \mathsf{sbe}$.
– Each (spare) basic element can be connected to a single RBOX. I.e., if $(v, w) \in E$ and $(v, w') \in E$ and $t(w) = t(w') = \mathsf{rbox}$, then $w = w'$.
– The spare inputs of a spare gate can only be spare basic elements, while its main input can only be a basic element. I.e., if $t(w) = \mathsf{sg}$ then $t(i(w)[0]) = \mathsf{be}$ and, for $j > 0$, $t(i(w)[j]) = \mathsf{sbe}$. Furthermore, a spare basic element can only be connected to a spare gate or a RBOX, i.e., if $(v, w) \in E$ and $t(v) = \mathsf{sbe}$ then $t(w) \in \{\mathsf{sg}, \mathsf{rbox}\}$.
– A spare basic element is an input of a spare gate if and only if that spare gate is listed among the spare gates of the spare basic element, i.e. for $v$ and $v'$ such that $l(v') = (\mathsf{sbe}, 0, \mu, \nu, \gamma)$ and $l(v) = (\mathsf{sg}, n)$, $(v', v) \in E$ if and only if there exists $j$ such that $v = si(v')[j]$.
– A basic element can be connected to at most one spare gate, i.e. if $(v, w) \in E$ and $(v, w') \in E$ with $t(w) = t(w') = \mathsf{sg}$ and $t(v) = \mathsf{be}$ then $w = w'$.
– If a basic element is connected to a spare gate then it can not be connected to a FDEP gate, i.e. if $(v, w) \in E$, $t(v) = \mathsf{be}$ and $t(w) = \mathsf{sg}$, then there is no $(v, w') \in E$ such that $t(w') = \mathsf{fdep}$.

In the following, we present a parametric semantics for RFT elements. This will be used later to obtain the semantics of each vertex in a given RFT, and consequently the semantics of the full model as a parallel composition of its components. In this section, we only give the semantics for BEs, AND gates, OR gates, PAND gates, and RBOX. Remember that FDEP can be replaced by OR gates. Similarly, voting gates can be modeled by a series of AND and OR gates (although a simpler model can be found in Appendix D). In the design of the IOSA modules we should take into account the communication of each element of a RFT with its children and parents. For instance, a basic element has to communicate its failure and repair to those gates for which it is an input. Similarly, a RBOX has to communicate a start repairing signal to its inputs. In order to do so, the semantics of each element is given by a function which takes actions as parameters.

    module AND
      informf: bool init false;
      informu: bool init false;
      count: [0..2] init 0;

      [f_1??] count=1 -> (count'=2) & (informf'=true);
      [f_1??] count=0 -> (count'=1);
      [f_1??] count=2 -> ;
      [f_2??] count=1 -> (count'=2) & (informf'=true);
      [f_2??] count=0 -> (count'=1);
      [f_2??] count=2 -> ;

      [u_1??] count=2 -> (count'=1) & (informu'=true);
      [u_1??] count=1 -> (count'=0);
      [u_1??] count=0 -> ;
      [u_2??] count=2 -> (count'=1) & (informu'=true);
      [u_2??] count=1 -> (count'=0);
      [u_2??] count=0 -> ;

      [f!!] informf & count=2 -> (informf'=false);
      [u!!] informu & count!=2 -> (informu'=false);
    endmodule

Fig. 3: AND gate IOSA symbolic model.

For a BE element $e \in \mathcal{E}$, its semantics is a function $[\![e]\!] : \mathcal{A}^5 \to \mathrm{IOSA}$, where $[\![(\mathsf{be}, 0, \mu, \gamma)]\!](fl, up, f, u, r)$ results in the IOSA of Fig. 2. The state of a basic element is defined by the fail clock fc, the repair clock rc, a variable inform that indicates when to signal the failure or repair, and a variable broken to distinguish between broken and normal states. A basic element fails when clock fc expires (line 6) and immediately informs it with the urgent signal f!! at line 10. As soon as the repair begins, triggered by the corresponding connected repair box (line 7), clock rc is set. When it expires, the component becomes repaired. Hence, fc is set again at line 8, and the repair is signaled with urgent action u!! at line 11. At the starting state of an IOSA module all its clocks are set randomly according to their associated distributions. Thus, rc is set at the initial state and could eventually expire without having been set by a repair transition. This is why we have to distinguish the cases when the BE is being repaired (broken=2) from those when it is not.

For an AND gate element with two inputs, its semantics is a function $[\![e]\!] : \mathcal{A}^6 \to \mathrm{IOSA}$, where $[\![(\mathsf{and}, 2)]\!](f, u, f_1, u_1, f_2, u_2)$ results in the IOSA in Fig. 3. At lines 6 to 11, the AND gate gets informed of the failure of either of its inputs. Upon failure of some input, we distinguish between the case where the other input has already failed (count=1) and the case where it has not (count=0). In the first case the AND gate has to move to a failure state, for which we set the informf variable in order to enable the signaling of failure at line 20. Furthermore, in both cases we increase the value of count so that we take note of the failure of an input. A similar reasoning applies to the repair of an input at lines 13 to 18. In this case we have to set the module to signal a repair when an input gets repaired in a state where both inputs were failing (lines 13 and 16), by enabling the transition at line 21.
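The counting logic of the AND gate described above can be mirrored in a few lines of Python. This is a sketch for illustration only: the class name `AndGate2` and its method names are hypothetical, both inputs share one handler because their transitions in Fig. 3 are identical, and the sketch assumes that an input never reports two failures without a repair in between.

```python
class AndGate2:
    """Two-input AND gate: count tracks how many inputs are currently failed."""

    def __init__(self):
        self.count = 0
        self.informf = False
        self.informu = False

    def on_fail(self):
        """Urgent input: one of the two inputs signals a failure."""
        if self.count == 1:
            self.count, self.informf = 2, True
        elif self.count == 0:
            self.count = 1
        # count == 2: self-loop required by input enabledness

    def on_repair(self):
        """Urgent input: one of the two inputs signals a repair."""
        if self.count == 2:
            self.count, self.informu = 1, True
        elif self.count == 1:
            self.count = 0
        # count == 0: self-loop required by input enabledness

    def urgent_outputs(self):
        """Emit the pending urgent outputs whose guards hold, clearing their flags."""
        out = []
        if self.informf and self.count == 2:
            self.informf = False
            out.append("f")
        if self.informu and self.count != 2:
            self.informu = False
            out.append("u")
        return out


gate = AndGate2()
gate.on_fail()
print(gate.urgent_outputs())  # one input failed: the gate stays silent
gate.on_fail()
print(gate.urgent_outputs())  # both inputs failed: the gate signals its failure
gate.on_repair()
print(gate.urgent_outputs())  # one input repaired: the gate signals its repair
```

The two flags make the gate report only changes of its own state, so repairing an input while the other one was never broken produces no output.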
From now on, we omit writing down the self-loops that originate from IOSA's input enabledness, such as lines 8, 11, 15 and 18, as they are assumed to be there. Nevertheless, we remark that it is necessary to take them into account when analyzing confluence in the next section. The semantics for an OR gate is similar to that of the AND gate and can be found in Appendix B.

The semantics of an n-input repair box with priority policy is a function $[\![(\mathsf{rbox}, n)]\!] : \mathcal{A}^{3 * n} \to \mathrm{IOSA}$, where $[\![(\mathsf{rbox}, n)]\!](fl_0, up_0, r_0, \ldots, fl_{n-1}, up_{n-1}, r_{n-1})$ results in the IOSA of Fig. 4. The RBOX with priority uses the array broken[n] to keep track of failed inputs, updating it when it receives their fail signals (lines 5 to 7) and up signals (lines 13 to 15). At the same time, when not busy, it sends repair signals to broken inputs (lines 9 to 12). Guards ensure the priority order for repairing. Note that instead of listening to the urgent output signals of the input BEs, it listens for the non-urgent actions of the transitions that trigger the failure or repair. This is done with the sole purpose of facilitating the confluence analysis over this module. Other types of repair boxes can be modeled, taking into account different repairing policies (see App. C).

    module RBOX
      broken[n]: bool init false;
      busy: bool init false;

      [fl_0?] -> (broken[0]'=true);
      ...
      [fl_{n-1}?] -> (broken[n-1]'=true);

      [r_0!!] !busy & broken[0] -> (busy'=true);
      ...
      [r_{n-1}!!] !busy & broken[n-1] & !broken[n-2] & ... & !broken[0]
          -> (busy'=true);
      [up_0?] -> (broken[0]'=false) & (busy'=false);
      ...
      [up_{n-1}?] -> (broken[n-1]'=false) & (busy'=false);
    endmodule

Fig. 4: RBOX with priority policy.

The semantics of a Priority AND gate with 2 inputs is defined by $[\![(\mathsf{pand}, 2)]\!] : \mathcal{A}^6 \to \mathrm{IOSA}$, where $[\![(\mathsf{pand}, 2)]\!](f, u, f_1, u_1, f_2, u_2)$ results in the IOSA of Fig. 5. PAND gates fail only when their inputs fail from left to right. This makes it possible to condition the failure of a system not only on the failure of the subsystems but also on the order in which they fail. Notice that an n-input PAND gate is simply syntactic sugar for a system of n − 1 two-input PAND gates connected in cascade.

    module PAND
      f0: bool init false;
      f1: bool init false;
      st: [0..4] init 0;
      // values 0..4: up, inform fail, failed,
      // inform up, unbreakable

      [_?] st=0 & f1 & !f0 -> (st'=4);

      [f_1??] st=0 & !f0 & !f1 -> (f0'=true);
      [f_1??] st=0 & !f0 & f1 -> (st'=1) & (f0'=true);
      [f_1??] st!=0 & !f0 -> (f0'=true);
      [f_1??] f0 -> ;

      [f_2??] st=0 & !f0 & !f1 -> (f1'=true);
      [f_2??] st=0 & f0 & !f1 -> (st'=1) & (f1'=true);
      [f_2??] st=3 & !f1 -> (st'=2) & (f1'=true);
      [f_2??] (st=1|st=2|st=4) & !f1 -> (f1'=true);
      [f_2??] f1 -> ;

      [u_1??] st!=1 & f0 -> (f0'=false);
      [u_1??] st=1 & f0 -> (st'=0) & (f0'=false);
      [u_1??] !f0 -> ;

      [u_2??] (st=0|st=3) & f1 -> (f1'=false);
      [u_2??] (st=1|st=4) & f1 -> (st'=0) & (f1'=false);
      [u_2??] st=2 & f1 -> (st'=3) & (f1'=false);

      [f!!] st=1 -> (st'=2);
      [u!!] st=3 -> (st'=0);
    endmodule

Fig. 5: PAND gate.

The literature is not always clear, or even disagrees, on what the behavior of the PAND gate should be in case both inputs fail at the same time [24,14]. This situation arises in some constructions with AND and OR gates, or when the inputs of a PAND gate are connected to the same FDEP (see Fig. 6). Some proposals disallow these situations and discard them in early syntactic checks [31].

Others assume a non-deterministic situation and find it important to analyze scenarios where the behavior is in fact unknown [5]. Other works decided that the PAND gate does not fail unless its inputs break strictly from left to right [6,4]. Still others state that PAND gates also fail when both their inputs fail at the same time [14,9,8]. We opted for this last case, so the gate needs to be able to identify whether time has passed between the occurrence of the failures, and act accordingly. In the particular case where no time passes between the failure of the inputs, we consider that the order in which the dependent BEs fail does not really matter and thus the non-determinism is spurious. To identify whether time has passed between the occurrence of the input failures, the model listens to any output action indicating that a clock has expired.

Fig. 6: Spurious non-determinism.

This is done by the special input action [_?] at line 8 of Fig. 5, which synchronizes with all non-urgent outputs, regardless of the name of the action. Notice that there is only one scenario that we want to rule out, which is when the second input fails and then time passes without the first input failing too. This is in fact the case described by the guard of line 8. Furthermore, this transition moves to the 'unbreakable' state (st=4), from which the gate can only go back when input 1 is fixed. Consequently, the failure of the gate occurs either if both inputs fail at the same time or if the first input fails, then time passes, and then the second input fails.

The semantics of a RFT is that of the parallel composition of the semantics of its components, conveniently synchronized.

Definition 4. Given a RFT $T = (V, i, si, l)$ we define the semantics of $T$ as $[\![T]\!] = \,\|_{v \in V}\, [\![v]\!]$, where $[\![v]\!]$ is defined by:

\[
[\![v]\!] =
\begin{cases}
[\![l(v)]\!](\mathit{fl}_v, \mathit{up}_v, f_v, u_v, r_v) & \text{if } l(v) = (\mathrm{be}, 0, \mu, \gamma) \\
[\![l(v)]\!](f_v, u_v, f_{i(v)[0]}, u_{i(v)[0]}, \ldots, f_{i(v)[n-1]}, u_{i(v)[n-1]}) & \text{if } l(v) \in \{(\mathrm{and}, n), (\mathrm{or}, n)\} \\
[\![l(v)]\!](f_v, u_v, f_{i(v)[0]}, u_{i(v)[0]}, f_{i(v)[1]}, u_{i(v)[1]}) & \text{if } l(v) = (\mathrm{pand}, 2) \\
[\![l(v)]\!](\mathit{fl}_{i(v)[0]}, \mathit{up}_{i(v)[0]}, r_{i(v)[0]}, \ldots, \mathit{fl}_{i(v)[n-1]}, \mathit{up}_{i(v)[n-1]}, r_{i(v)[n-1]}) & \text{if } l(v) = (\mathrm{rbox}, n)
\end{cases}
\]

In Section 7, we extend the semantics to spare gates and spare basic elements.
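The PAND behaviour of Fig. 5 can be explored with a small executable sketch. The Python class below is not part of the paper or of FIG; it merely transcribes the guards of the PAND module, with the hypothetical names `PandGate`, `time_passes`, and `run` introduced for illustration. As a simplifying assumption, the urgent outputs `f!!` and `u!!` are fired immediately after every input, which is only one of the interleavings the IOSA semantics allows.

```python
class PandGate:
    """Two-input PAND gate transcribed from the guards of Fig. 5 (inputs 0 and 1)."""
    UP, INFORM_FAIL, FAILED, INFORM_UP, UNBREAKABLE = range(5)

    def __init__(self):
        self.f0 = False
        self.f1 = False
        self.st = self.UP
        self.outputs = []  # urgent outputs emitted so far: "f" or "u"

    def _flush(self):
        # Assumption: urgent outputs f!! and u!! fire right away.
        if self.st == self.INFORM_FAIL:
            self.outputs.append("f")
            self.st = self.FAILED
        elif self.st == self.INFORM_UP:
            self.outputs.append("u")
            self.st = self.UP

    def time_passes(self):
        # [_?] st=0 & f1 & !f0 -> (st'=4)
        if self.st == self.UP and self.f1 and not self.f0:
            self.st = self.UNBREAKABLE

    def fail(self, i):
        if i == 0 and not self.f0:
            if self.st == self.UP and self.f1:
                self.st = self.INFORM_FAIL  # both inputs failed with no delay
            self.f0 = True
        elif i == 1 and not self.f1:
            if self.st == self.UP and self.f0:
                self.st = self.INFORM_FAIL  # left-to-right failure
            elif self.st == self.INFORM_UP:
                self.st = self.FAILED
            self.f1 = True
        self._flush()

    def repair(self, i):
        if i == 0 and self.f0:
            if self.st == self.INFORM_FAIL:
                self.st = self.UP
            self.f0 = False
        elif i == 1 and self.f1:
            if self.st in (self.INFORM_FAIL, self.UNBREAKABLE):
                self.st = self.UP
            elif self.st == self.FAILED:
                self.st = self.INFORM_UP
            self.f1 = False
        self._flush()


def run(events):
    """Feed a list of events ("tick", ("fail", i), ("repair", i)) to a fresh gate."""
    gate = PandGate()
    for ev in events:
        if ev == "tick":
            gate.time_passes()
        else:
            kind, i = ev
            (gate.fail if kind == "fail" else gate.repair)(i)
    return gate.outputs
```

Under these assumptions, `run([("fail", 0), "tick", ("fail", 1)])` and `run([("fail", 1), ("fail", 0)])` both report a gate failure, whereas `run([("fail", 1), "tick", ("fail", 0)])` reports none, matching the three cases discussed above.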

In this section we show that RFTs composed only of BEs, AND gates, OR gates, PAND gates, and RBOX are weakly deterministic. Since voting and FDEP gates can be constructed using OR and AND gates, the result extends to these gates. Results in this section rely heavily on results about weak determinism on IOSA proven in [17]. Therefore, we first summarize the essentials of [17] for this paper.

Definition 5. An IOSA is confluent if for every pair of urgent actions $a$ and $b$, and for every (reachable) state $s$, it satisfies that, if $s \xrightarrow{\varnothing, a, C_1} s_1$ and $s \xrightarrow{\varnothing, b, C_2} s_2$, then there is a state $s_3$ such that $s_1 \xrightarrow{\varnothing, b, C_2} s_3$ and $s_2 \xrightarrow{\varnothing, a, C_1} s_3$.

Note that, according to this definition, regardless of the order of the confluent transitions, the same state is reached. This non-determinism is spurious in the sense that it does not alter the stochastic properties of the given IOSA, regardless of the manner in which it is resolved. Since non-determinism can only arise on urgent actions, we say that a closed IOSA is weakly deterministic if all its urgent actions are confluent. In [17], we provided sufficient conditions to ensure that a closed IOSA is weakly deterministic. This is stated in Theorem 1 below, which requires the following definition.
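The diamond condition of Definition 5 can be checked mechanically on an explicit transition system. The following Python sketch is only illustrative: the function name `is_confluent` and the triple encoding `(source, action, target)` are assumptions made here, the clock sets of the transitions are abstracted away, and only pairs of distinct actions are compared.

```python
from itertools import product


def is_confluent(transitions, urgent):
    """Check the diamond property of Definition 5 on explicit transitions.

    transitions: iterable of (source, action, target) triples over reachable states.
    urgent: set of urgent action names.
    """
    succ = {}
    for s, a, t in transitions:
        succ.setdefault((s, a), set()).add(t)

    for (s, a), (s2, b) in product(list(succ), repeat=2):
        # Only compare two distinct urgent actions leaving the same state.
        if s != s2 or a == b or a not in urgent or b not in urgent:
            continue
        for s_a, s_b in product(succ[(s, a)], succ[(s, b)]):
            # A common state s3 must close the diamond: s_a -b-> s3 and s_b -a-> s3.
            if not (succ.get((s_a, b), set()) & succ.get((s_b, a), set())):
                return False
    return True


# A closed diamond, and a variant where the two orders reach different states.
diamond = [("s", "a", "s1"), ("s", "b", "s2"), ("s1", "b", "s3"), ("s2", "a", "s3")]
split = [("s", "a", "s1"), ("s", "b", "s2"), ("s1", "b", "s3"), ("s2", "a", "s4")]
```

With these inputs, `is_confluent(diamond, {"a", "b"})` holds while `is_confluent(split, {"a", "b"})` does not, since in the second system the order in which `a` and `b` are taken changes the reached state.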

Definition 6. Given an IOSA $I$ with state space $S$ and actions $A$, we distinguish the following sets of actions:
– A set of urgent output actions $B \subseteq A^o \cap A^u$ is initial if each $b \in B$ is enabled in $s_0$, i.e. if for each $b \in B$ there is a state $s \in S$ and $C \subseteq \mathcal{C}$, such that $s_0 \xrightarrow{\varnothing, b, C} s$.
– We say that a set $B \subseteq A^o \cap A^u$ of output urgent actions is spontaneously enabled by $b \in A \setminus A^u$ if there are stochastically reachable states $s, s' \in S$ (a state is stochastically reachable if there is a path in the IOSA from the initial state that reaches such state with probability greater than zero) such that $s$ is stable, $s \xrightarrow{\_, b, \_} s'$, and all actions in $B$ are enabled in $s'$.

Let $a \in A^u$ and $b \in A^o \cap A^u$. We say that $a$ triggers $b$ if there are stochastically reachable states $s_1, s_2, s_3 \in S$ such that: $s_1 \xrightarrow{\_, a, \_} s_2$, $s_2 \xrightarrow{\_, b, \_} s_3$, and, if $a \neq b$, then there is no outgoing transition from $s_1$ labeled with $b$. The set $\{(a, b) \mid a \text{ triggers } b\}$ is called the triggering relation.

The approximate indirect triggering relation of a composite IOSA is defined as the reflexive transitive closure of the union of the triggering relations of its components. The following theorem from [17] gives necessary conditions for a closed IOSA not to be confluent. As a consequence, it provides sufficient conditions for a closed IOSA to be weakly deterministic.

Theorem 1. Given a closed composite IOSA $I = (I_1 \| \ldots \| I_n)$ with actions $A$, if $I$ is not confluent then there exists a pair of urgent actions $a, b \in A^u$ such that
1. one of the components is not confluent with respect to $a$ and $b$,
2. there are actions $c$ and $d$ that approximately indirectly trigger $a$ and $b$, respectively, and
3. one of the following holds: (i) $c$ and $d$ are initial actions, or (ii) there exists an action $e$ and possibly empty sets $B_1$ to $B_n$ spontaneously enabled by $e$ in $I_1$ to $I_n$ respectively, such that $c$ and $d$ are in $\bigcup_{i=1}^{n} B_i$.

In the following, we prove accessory propositions to eventually prove, using Theorem 1, that the IOSA defined by a RFT is weakly deterministic.
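The approximate indirect triggering relation defined above is a plain reflexive transitive closure, so it can be computed with a few lines of code. The Python sketch below is an illustration under assumptions: the function name `approx_indirect_triggering`, the string encoding of actions (`"f_x"`, `"u_x"`), and the small example tree (an AND gate `g` over basic elements `x` and `y`, feeding an OR gate `top` together with `z`) are all hypothetical. The component relations follow the shape that Proposition 3 states for AND and OR gates.

```python
def approx_indirect_triggering(component_relations, actions):
    """Reflexive transitive closure of the union of the components' triggering relations."""
    closure = {(a, a) for a in actions}
    for rel in component_relations:
        closure |= set(rel)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical tree: g = AND(x, y), top = OR(g, z).
rel_g = {("f_x", "f_g"), ("f_y", "f_g"), ("u_x", "u_g"), ("u_y", "u_g")}
rel_top = {("f_g", "f_top"), ("f_z", "f_top"), ("u_g", "u_top"), ("u_z", "u_top")}
actions = {a for pair in rel_g | rel_top for a in pair}

closure = approx_indirect_triggering([rel_g, rel_top], actions)
# In this example, fail actions never reach up actions and vice versa.
same_kind_only = all(a[0] == b[0] for (a, b) in closure)
```

In this example `("f_x", "f_top")` belongs to the closure while no pair mixes a fail action with an up action, which is the separation that the proof of Theorem 2 relies on.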

Proposition 1. Let $T$ be a RFT. $[\![T]\!]$ has no initially enabled actions. Moreover, the only spontaneous sets of actions are singletons of the form $\{f_v\}$ and $\{u_v\}$, for $t(l(v)) = \mathrm{be}$, which are spontaneously enabled by $\mathit{fl}_v$ and $\mathit{up}_v$, respectively.

Proof. As a consequence of [17], the initially enabled actions of $[\![T]\!]$ are contained in the union of the sets of initially enabled actions of its components $[\![v]\!]$, $v \in V$, and the spontaneously enabled actions of $[\![T]\!]$ are contained in the union of the spontaneously enabled sets of $[\![v]\!]$. It is direct to see that, for any element $e \in E$, none of the urgent outputs are enabled at the initial state of $[\![e]\!]$, since their guards are initially false. Furthermore, the only non-urgent output transitions in our models are at lines 6 and 8 of the BE (Fig. 2). Let $v \in V$ such that $t(v) = \mathrm{be}$. Then, after taking the transition at line 6 the only urgent output enabled is $f_v$ (on the instance $[\![v]\!]$), while after taking the transition at line 8 the only one is $u_v$, and thus these are the only possible spontaneously enabled actions. □

Proposition 2. Let $T$ be a RFT. The only possible pairs of non-confluent actions in $[\![T]\!]$ are $\{(f_v, u_{v'}) \mid v, v' \in i(w),\ t(w) \in \{\mathrm{and}, \mathrm{or}, \mathrm{pand}\}\} \cup \{(f_w, u_v), (u_w, f_v) \mid v \in i(w),\ t(w) \in \{\mathrm{and}, \mathrm{or}\}\}$.

Proof. The proof of this proposition follows from an exhaustive check over each urgent transition of each model, in order to single out any non-confluent situation, and can be found in Appendix E.

Proposition 3. Let $T$ be a RFT. For each $v \in V$, the triggering relation of $[\![v]\!]$ is given by:
– $\{\}$, if $l(v) \in \{(\mathrm{be}, 0, \mu, \gamma), (\mathrm{rbox}, n)\}$,
– $\{(f_w, f_v) \mid w \in i(v)\} \cup \{(u_w, u_v) \mid w \in i(v)\}$, if $l(v) \in \{(\mathrm{and}, n), (\mathrm{or}, n)\}$, and
– $\{(u_w, u_v) \mid w = i(v)[1]\} \cup \{(f_w, f_v) \mid w \in i(v)\}$, if $l(v) = (\mathrm{pand}, 2)$.

Proof (sketch). It suffices to make a satisfiability analysis over guards and postconditions of each pair $(t_a, t_b)$ with $t_b$ an output urgent symbolic transition and $t_a$ any urgent symbolic transition, taking into account only reachable states. □

Theorem 2.

Let $T$ be a RFT. Then $[\![T]\!]$ is weakly deterministic.

Proof. We look for $a$, $b$, $c$, $d$ and $e$ as well as sets $B_i$ with $i = 1 \ldots n$ as Theorem 1 suggests. Since Prop. 1 ensures that there are no initially enabled actions in $[\![T]\!]$, $c$ and $d$ should be spontaneously enabled actions. By the same proposition either $e$ is of the form $\mathit{fl}_v$ for some $v$ and then $\bigcup_{i=1}^{n} B_i = \{f_v\}$, or $e$ is of the form $\mathit{up}_v$ for some $v$ and then $\bigcup_{i=1}^{n} B_i = \{u_v\}$. In the first case, we get $c = d = f_v$ for some $v$, and in the second case $c = d = u_v$. Furthermore, by Prop. 2, either $a$ is of the form $f_w$ for some $w$ and $b$ is of the form $u_{w'}$ for some $w'$, or the other way around. As shown by Prop. 3, fail actions ($f_v$ for some $v$) only trigger fail actions and up actions ($u_v$ for some $v$) only trigger up actions, thus it is impossible that $c$ and $d$ indirectly trigger $a$ and $b$ respectively. Therefore, it is not possible to find actions $a$, $b$, $c$, $d$, and $e$ satisfying conditions 1 to 3 in Theorem 1, and hence $[\![T]\!]$ is confluent. Since $[\![T]\!]$ is also closed, it is weakly deterministic. □

In this section we add the spare gate and spare basic element to the semantics of RFTs. As before, we aim to guarantee that the IOSA model derived from the RFT is weakly deterministic. In order to do so, we need to pay special attention to two particular scenarios that could introduce non-determinism if not correctly tackled.

The first scenario arises when a main basic element fails at a spare gate which is served by several spare basic elements. At this point, the question arises of which of the available spare basic elements the spare gate should take. Traditionally, spare elements are selected in order from an ordered set. To generalize this mechanism for the selection of the spares we intend to allow for more complex state-involved policies. It should always be the case that this policy is deterministic in its choices.

The second scenario arises when several spare gates have requested a broken or already taken SBE, which eventually gets fixed by a repair box or freed by the owning spare gate. At this point, it is unclear which of the requesting spare gates will take the newly available SBE. For this, we define sharing policies on the SBE. Thus, to provide semantics to an SBE, we actually introduce two IOSA modules: one providing the extended behavior of a BE that can be taken from dormant to enabled state and vice versa, and another one, the multiplexer module, which manages the sharing of the SBE. Notice that this scenario is not a problem in the absence of repair boxes, since in such cases SBEs do not become available after they are taken or fail. Neither is it a problem when spare elements are not shared by different spare gates [4,3]. The work [22] also studies race conditions in spare gates when two spare gates fail at the same time. This last situation is impossible in our setting given the last two properties of Definition 3 and the fact that two simultaneous failures of our basic elements are ruled out by the IOSA deterministic semantics.

The models for the spare gate, the spare basic element and the multiplexer can be found in Appendix F. We extend the semantics of the RFT with the SBE and SG elements as follows.

Definition 7. Given a RFT $T = (V, E)$, we extend Definition 4 with the following cases:

\[
[\![v]\!] =
\begin{cases}
\cdots \\
[\![l(v)]\!](\mathit{fl}_v, \mathit{up}_v, f_v, u_v, r_v, e_v, d_v, \mathit{rq}_{(si(v)[0],v)}, \mathit{asg}_{(v,si(v)[0])}, \mathit{rel}_{(si(v)[0],v)}, \mathit{acc}_{(si(v)[0],v)}, \mathit{rj}_{(v,si(v)[0])}, \ldots, \mathit{rj}_{(v,si(v)[n-1])}) & \text{if } l(v) = (\mathrm{sbe}, n, \mu, \nu, \gamma) \\
[\![l(v)]\!](f_v, u_v, \mathit{fl}_{i(v)[0]}, \mathit{up}_{i(v)[0]}, \mathit{fl}_{i(v)[1]}, \mathit{up}_{i(v)[1]}, \mathit{rq}_{(v,i(v)[1])}, \mathit{asg}_{(i(v)[1],v)}, \mathit{acc}_{(v,i(v)[1])}, \mathit{rj}_{(i(v)[1],v)}, \mathit{rel}_{(v,i(v)[1])}, \ldots, \mathit{rel}_{(v,i(v)[n-1])}) & \text{if } l(v) = (\mathrm{sg}, n)
\end{cases}
\]

Notice that in the case of the SBE and SG, several signals are indexed by a pair of elements. This pair indicates which element performs the action and which one listens for synchronisation. As an example, $\mathit{asg}_{(v,si(v)[0])}$ indicates that the multiplexer that manages $v$ assigns its spare basic element to its first connected spare gate ($si(v)[0]$).

Unfortunately, we could not find an easy or direct way to prove that this extension is indeed weakly deterministic, as we did with the RFT without spares. This is due in part to the complexity of the IOSA modules, intended to avoid the aforementioned non-deterministic situations. While the spare basic element module can be easily proved to be confluent, this is not the case for the modules of the multiplexer and the spare gate. When analyzing these modules in isolation we find that some transitions are not confluent, so Theorem 1 cannot be used directly. However, by partially composing spare gates with multiplexers, we were able to check that the conditions of Theorem 1 are not met. We performed this check automatically in several configurations, and showed that they are confluent. As parallel composition preserves confluence, they can be inserted in other RFT contexts yielding weakly deterministic IOSAs.
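The sharing policy encoded by the guards of the multiplexer module in Appendix F can be read as a fixed priority among the requesting spare gates. The Python sketch below is a hedged reading of those guards, not the authors' implementation: the function name `next_assignment` and the constants are hypothetical, and the function only reports which assignment action `asg_i` would be enabled for a given multiplexer state.

```python
# Queue entry values, following the comment in the MUX module of Appendix F.
IDLE, REQUESTING, REJECT, USING = range(4)


def next_assignment(queue, broken, avail):
    """Return the index i whose asg_i guard holds, or None if no guard holds.

    The guard of asg_i requires queue[i] = REQUESTING, every lower entry to be
    IDLE, the SBE not to be broken, and the SBE to be available.
    """
    if broken or not avail:
        return None
    for i, state in enumerate(queue):
        if state == REQUESTING:
            return i
        if state != IDLE:
            return None  # a lower entry is not idle, so no higher asg is enabled
    return None
```

Because at most one index can satisfy the guard, the choice of the spare gate that receives a newly available SBE is deterministic in this reading: `next_assignment([IDLE, REQUESTING, REQUESTING], False, True)` selects gate 1, and nothing is assigned while the SBE is broken or taken.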
Footnote: For the reviewers' eyes only: the scripts that prove said configurations are available at https://git.cs.famaf.unc.edu.ar/raulmonti/DeterminismScriptsRFT .

In this work we have defined a semantics for Dynamic Fault Trees with repair boxes in terms of Input/Output Stochastic Automata, introducing the novel feature of general probability measures for failure and repair rates of basic elements. Furthermore we have shown that our semantics produces weakly deterministic models which are hence amenable to discrete event simulation. In particular, our models serve as direct input to the FIG Simulator ( http://dsg.famaf.unc.edu.ar/fig ) [12,10], as well as to other tools through the intermediate language Jani [13]. A future work direction could be introducing maintenance mechanisms and levels of degradation as in [30], in order to increase the possibilities for defining repair models. Another line of work would be defining an automatic translation from a graphical modelling tool for fault trees into the IOSA models, in order to automate and ease the modelling and analysis of industrial size systems. Adding support for spare sub-trees such as in [] would be an interesting upgrade too, along with support for sub-tree dedicated repair boxes.

References

1. Amari, S., Dill, G., Howald, E.: A new approach to solve dynamic fault trees. In: Reliability and Maintainability Symposium, 2003. Annual. pp. 374–379. IEEE (2003)
2. Beccuti, M., Raiteri, D.C., Franceschinis, G., Haddad, S.: Non deterministic repairable fault trees for computing optimal repair strategy. In: Baras, J.S., Courcoubetis, C. (eds.) 3rd International ICST Conference on Performance Evaluation Methodologies and Tools, VALUETOOLS 2008, Athens, Greece, October 20-24, 2008. p. 56. ICST/ACM (2008), https://doi.org/10.4108/ICST.VALUETOOLS2008.4411
3. Bobbio, A., Franceschinis, G., Gaeta, R., Portinale, L.: Parametric fault tree for the dependability analysis of redundant systems and its high-level petri net semantics. IEEE Trans. Software Eng. 29(3), 270–287 (2003), https://doi.org/10.1109/TSE.2003.1183940
4. Bobbio, A., Raiteri, D.C.: Parametric fault trees with dynamic gates and repair boxes. In: Reliability and Maintainability, 2004 Annual Symposium-RAMS. pp. 459–465. IEEE (2004)
5. Boudali, H., Crouzen, P., Stoelinga, M.: A compositional semantics for dynamic fault trees in terms of interactive markov chains. In: Namjoshi, K.S., Yoneda, T., Higashino, T., Okamura, Y. (eds.) ATVA 2007. LNCS, vol. 4762, pp. 441–456. Springer (2007), https://doi.org/10.1007/978-3-540-75596-8_31
6. Boudali, H., Crouzen, P., Stoelinga, M.: Dynamic fault tree analysis using input/output interactive markov chains. In: DSN 2007. pp. 708–717. IEEE Computer Society (2007), https://doi.org/10.1109/DSN.2007.37
7. Boudali, H., Crouzen, P., Stoelinga, M.: A rigorous, compositional, and extensible framework for dynamic fault tree analysis. IEEE Trans. Dependable Sec. Comput. 7(2), 128–143 (2010), https://doi.org/10.1109/TDSC.2009.45
8. Boudali, H., Dugan, J.B.: A discrete-time bayesian network reliability modeling and analysis framework. Rel. Eng. & Sys. Safety 87(3), 337–349 (2005), https://doi.org/10.1016/j.ress.2004.06.004
9. Boudali, H., Dugan, J.B.: Corrections on "a continuous-time bayesian network reliability modeling and analysis framework". IEEE Trans. Reliability 57(3), 532–533 (2008), https://doi.org/10.1109/TR.2008.925796
10. Budde, C.E.: Automation of Importance Splitting Techniques for Rare Event Simulation. Ph.D. thesis, Universidad Nacional de Córdoba (2017)
11. Budde, C.E., D’Argenio, P.R., Hermanns, H.: Rare event simulation with fully automated importance splitting. In: Beltrán, M., Knottenbelt, W.J., Bradley, J.T. (eds.) EPEW 2015. LNCS, vol. 9272, pp. 275–290. Springer (2015), https://doi.org/10.1007/978-3-319-23267-6_18
12. Budde, C.E., D’Argenio, P.R., Monti, R.E.: Compositional Construction of Importance Functions in Fully Automated Importance Splitting. ACM (2017), http://dx.doi.org/10.4108/eai.25-10-2016.2266501
13. Budde, C.E., Dehnert, C., Hahn, E.M., Hartmanns, A., Junges, S., Turrini, A.: JANI: quantitative model and tool interaction. In: Legay, A., Margaria, T. (eds.) TACAS 2017. LNCS, vol. 10206, pp. 151–168 (2017), https://doi.org/10.1007/978-3-662-54580-5_9
14. Coppit, D., Sullivan, K.J., Dugan, J.B.: Formal semantics for computational engineering: A case study on dynamic fault trees. In: ISSRE 2000. pp. 270–282. IEEE Computer Society (2000), https://doi.org/10.1109/ISSRE.2000.885878
15. D’Argenio, P.R., Lee, M.D., Monti, R.E.: Input/output stochastic automata - compositionality and determinism. In: Fränzle, M., Markey, N. (eds.) FORMATS 2016. LNCS, vol. 9884, pp. 53–68. Springer (2016), https://doi.org/10.1007/978-3-319-44878-7_4
16. Dugan, J.B., Bavuso, S.J., Boyd, M.A.: Dynamic fault-tree models for fault-tolerant computer systems. IEEE Transactions on Reliability 41(3), 363–377 (Sep 1992)
17. D’Argenio, P.R., Monti, R.E.: Input/output stochastic automata with urgency: Confluence and weak determinism. In: International Colloquium on Theoretical Aspects of Computing. pp. 132–152. Springer (2018)
18. Franceschinis, G., Gribaudo, M., Iacono, M., Mazzocca, N., Vittorini, V.: Towards an object based multi-formalism multi-solution modeling approach. In: Proc. of the Second Workshop on Modelling of Objects, Components and Agents Aarhus (MOCA02), Denmark. vol. 26, pp. 47–65 (2002)
19. Ge, D., Lin, M., Yang, Y., Zhang, R., Chou, Q.: Quantitative analysis of dynamic fault trees using improved sequential binary decision diagrams. Rel. Eng. & Sys. Safety 142, 289–299 (2015), https://doi.org/10.1016/j.ress.2015.06.001
20. Gulati, R., Dugan, J.B.: A modular approach for analyzing static and dynamic fault trees. In: Reliability and Maintainability Symposium. 1997 Proceedings, Annual. pp. 57–63. IEEE (1997)
21. Haasl, D.F., Roberts, N., Vesely, W., Goldberg, F.: Fault tree handbook. Tech. rep., Nuclear Regulatory Commission, Washington, DC (USA). Office of Nuclear Regulatory Research (1981)
22. Junges, S., Guck, D., Katoen, J., Stoelinga, M.: Uncovering dynamic fault trees. In: DSN 2016. pp. 299–310. IEEE Computer Society (2016), https://doi.org/10.1109/DSN.2016.35
23. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM: probabilistic symbolic model checker. In: Field, T., Harrison, P.G., Bradley, J.T., Harder, U. (eds.) TOOLS 2002. LNCS, vol. 2324, pp. 200–204. Springer (2002), https://doi.org/10.1007/3-540-46029-2_13
24. Manian, R., Coppit, D.W., Sullivan, K.J., Dugan, J.B.: Bridging the gap between systems and dynamic fault tree models. In: Reliability and Maintainability Symposium, 1999. pp. 105–111. IEEE (1999)
25. Merle, G., Roussel, J., Lesage, J., Bobbio, A.: Probabilistic algebraic analysis of fault trees with priority dynamic gates and repeated events. IEEE Trans. Reliability 59(1), 250–261 (2010), https://doi.org/10.1109/TR.2009.2035793
26. Milner, R.: Communication and concurrency. PHI Series in computer science, Prentice Hall (1989)
27. Raiteri, D.C., Iacono, M., Franceschinis, G., Vittorini, V.: Repairable fault tree for the automatic evaluation of repair policies. In: DSN 2004. pp. 659–668. IEEE Computer Society (2004), https://doi.org/10.1109/DSN.2004.1311936
28. Rauzy, A.: Sequence algebra, sequence decision diagrams and dynamic fault trees. Rel. Eng. & Sys. Safety 96(7), 785–792 (2011), https://doi.org/10.1016/j.ress.2011.02.005
29. Rubino, G., Tuffin, B.: Rare Event Simulation Using Monte Carlo Methods. Wiley Publishing (2009)
30. Ruijters, E., Guck, D., Drolenga, P., Stoelinga, M.: Fault maintenance trees: reliability centered maintenance via statistical model checking. In: Reliability and Maintainability Symposium (RAMS), 2016 Annual. pp. 1–6. IEEE (2016)
31. Ruijters, E., Reijsbergen, D., de Boer, P., Stoelinga, M.: Rare event simulation for dynamic fault trees. In: Tonetta, S., Schoitsch, E., Bitsch, F. (eds.) Computer Safety, Reliability, and Security - 36th International Conference, SAFECOMP 2017, Trento, Italy, September 13-15, 2017, Proceedings. Lecture Notes in Computer Science, vol. 10488, pp. 20–35. Springer (2017), https://doi.org/10.1007/978-3-319-66266-4_2
32. Ruijters, E., Stoelinga, M.: Fault tree analysis: A survey of the state-of-the-art in modeling, analysis and tools. Computer Science Review 15, 29–62 (2015), https://doi.org/10.1016/j.cosrev.2015.03.001
33. Villén-Altamirano, M., Villén-Altamirano, J.: The rare event simulation method RESTART: efficiency analysis and guidelines for its application. In: Kouvatsos, D.D. (ed.) Network Performance Engineering - A Handbook on Convergent Multi-Service Networks and Next Generation Internet, LNCS, vol. 5233, pp. 509–547. Springer (2011), https://doi.org/10.1007/978-3-642-02742-0_22
34. Xing, L., Dugan, J.B., Morrissette, B.A.: Efficient reliability analysis of systems with functional dependence loops. Eksploatacja I Niezawodnosc-Maintenance and Reliability (3), 65–69 (2009)
35. Xing, L., Shrestha, A., Dai, Y.: Exact combinatorial reliability analysis of dynamic systems with sequence-dependent failures. Rel. Eng. & Sys. Safety 96(10), 1375–1385 (2011), https://doi.org/10.1016/j.ress.2011.05.007

A IOSA Symbolic Language

The following context free grammar defines the complete IOSA symbolic modelling language. Here * stands for any number of repetitions, + for at least one repetition, ? for optional, | for alternatives, and parentheses group productions and elements.

    MODEL      = ( MODULE )+
    MODULE     = ( VARIABLE | ARRAY | CLOCK )+ TRANSITION+
    VARIABLE   = NAME : TYPE init VALUE ;
    ARRAY      = NAME [ INT ]: TYPE init VALUE ;
    CLOCK      = NAME : clock ;
    TRANSITION = [ ( NAME ( ? | ?? | ! | !! )? )? ] PRE ( @ NAME )? -> POS ;
    PRE        = ( ( NAME = EXPR ) ( & NAME = EXPR )* )?
    POS        = ( ( NAME' = EXPR ) ( & ( NAME' = EXPR ) )* )?
    EXPR       = VALUE | NAME | EXPR OP EXPR | ( EXPR ) | ! EXPR
    OP         = | | & | + | - | * | / | =
    NAME       = ( a | b | ... | z | A | B | ... | Z )( a | b | ... | z | A | B | ... | Z | 0 | ... | 9 | _ | - )*
    TYPE       = boolean | [ INT .. INT ]
    VALUE      = true | false | INT
    INT        = ( 0 | 1 | ... | 9 )( 0 | 1 | ... | 9 )*

Fig. 7: IOSA symbolic language grammar.

An IOSA model is composed of a set of modules, each one describing a concurrent component of the system to model. The body of a module can be clearly divided into three parts: the variable declarations, the clock declarations, and the transition specifications. Arrays are declared along with variables, with the additional requirement of defining the range of the array between brackets. Transition preconditions are boolean formulas describing the origin states for the symbolic transition. Here the & symbol stands for the logical conjunction operator while | stands for the logical disjunction operator. Postconditions, on the other hand, describe the changes on the module's variables (state) by means of assignments to future values. Each assignment is enclosed in parentheses, and the variable's name is followed by an apostrophe to indicate that it corresponds to the value of the variable in the state reached after taking the transition. An & separates each assignment. Notice the similarity with the PRISM [23] syntax for describing transitions. Along with the assignment of values to future variables, we find the reset of clocks. A clock is assigned a probability distribution (clock' = γ) to indicate that it will be reset to a value sampled from that probability distribution immediately before reaching the new state.

B OR Gate

For an OR gate element with two inputs, its semantics is a function $[\![(\mathrm{or},2)]\!] : A^6 \to \mathrm{IOSA}$, where $[\![(\mathrm{or},2)]\!](f, u, f_0, u_0, f_1, u_1)$ results in the following IOSA:

    module OR
      informf: bool init false;
      informu: bool init false;
      count: [0..2] init 0;

      [f_0??] count=0 -> (count'=1) & (informf'=true);
      [f_0??] count=1 -> (count'=2);
      [f_1??] count=0 -> (count'=1) & (informf'=true);
      [f_1??] count=1 -> (count'=2);

      [u_0??] count=2 -> (count'=1);
      [u_0??] count=1 -> (count'=0) & (informu'=true);
      [u_1??] count=2 -> (count'=1);
      [u_1??] count=1 -> (count'=0) & (informu'=true);

      [f!!] informf & count!=0 -> (informf'=false);
      [u!!] informu & count=0 -> (informu'=false);
    endmodule

In the OR gate model, a counter (count) is used to register how many inputs have failed at each moment. The failure of an input increases the counter, while the repair of an input decreases it. We take as a premise that an input will not break twice in a row without being repaired in between, nor will it be repaired if it has not failed. When the counter changes its value from 0 to 1, the gate has to inform a failure. It does so in the transition at line 16, which gets enabled by the change of variable informf either at line 6 or 8. In the same way, when count becomes 0, the repair is informed by enabling the u!! transition through the change of variable informu in either of the input up transitions guarded by count=1.

C Repair BOXes

For a repair box element $e \in E$ with first come first serve policy and $n$ inputs, its semantics is a function $[\![e]\!] : A^{3n} \to \mathrm{IOSA}$, where $[\![(\mathrm{rbox}, n)]\!](\mathit{fl}_0, \mathit{up}_0, r_0, \ldots, \mathit{fl}_{n-1}, \mathit{up}_{n-1}, r_{n-1})$ results in the following IOSA:

    module RBOX % with first come first serve policy
      queue[n]: [0..n] init 0;
      busy: bool init false;
      r: [0..n] init n;
      dummy: [0..0] init 0;

      [fl_0?] -> (dummy'=broken(queue,0));
      ...
      [fl_{n-1}?] -> (dummy'=broken(queue,n-1));

      [!!] fstexclude(queue,0) != -1 & r = n -> (r'=maxfrom(queue,0));

      [r_0!!] !busy & r = 0 -> (busy'=true) & (queue[0]'=0);
      ...
      [r_{n-1}!!] !busy & r = n-1 -> (busy'=true) & (queue[n-1]'=0);

      [up_0?] -> (queue[0]'=0) & (busy'=false) & (r' = n);
      ...
      [up_{n-1}?] -> (queue[n-1]'=0) & (busy'=false) & (r' = n);
    endmodule

The model for a repair box with first come first serve policy uses an array to mark each broken input. Notice that each position in the queue corresponds to an input. A value 0 at an index i means that input i has not failed, while a greater value at that position indicates for "how long" it has been broken. Repair boxes use some syntactic elements present in the FIG simulator (http://dsg.famaf.unc.edu.ar/fig). These elements do not introduce new semantic behavior and are there only to reduce the complexity and obfuscation that modelling this using only the grammar presented in App. A would entail. An example is the function broken which, given an array (in this case queue) and an index (in this case that of the failed input), increases by one the value at that index and every other value greater than 0 in the array. In this way we can check the order in which the inputs failed by comparing the values at the corresponding indices. The greater the value, the sooner they broke. The syntactic function fstexclude, on the other hand, takes an array and a value and returns the index of the first element with a value different from the one passed. In this case we use it to check whether there is any failed input. If there is at least one, then the maxfrom function will return the index of the highest value in queue, which corresponds to the input that broke first among all the broken ones. For a quick determinism analysis we point out that broken, fstexclude, and maxfrom are all deterministic. Furthermore all pairs of urgent transitions in the model are confluent given that their preconditions are mutually exclusive given the value of variable r.

For a repair box element $e \in E$ with random policy and $n$ inputs, its semantics is a function $[\![e]\!] : A^{3n} \to \mathrm{IOSA}$, where $[\![(\mathrm{rbox}, n)]\!](\mathit{fl}_0, \mathit{up}_0, r_0, \ldots, \mathit{fl}_{n-1}, \mathit{up}_{n-1}, r_{n-1})$ results in the following IOSA:

    module RBOX % with random policy
      broken[n]: bool init false;
      busy: bool init false;
      r: [0..n] init n;

      [fl_0?] -> (broken[0]'=true);
      ...
      [fl_{n-1}?] -> (broken[n-1]'=true);

      [!!] some(broken) & r = n -> (r'=random(broken));

      [r_0!!] !busy & r = 0 -> (busy'=true);
      ...
      [r_{n-1}!!] !busy & r = n-1 -> (busy'=true);

      [up_0?] -> (broken[0]'=false) & (busy'=false) & (r' = n);
      ...
      [up_{n-1}?] -> (broken[n-1]'=false) & (busy'=false) & (r' = n);
    endmodule

The model for a random policy repair box presents two new syntactic elements from FIG. These are the function some, which returns a boolean value indicating whether there is some value different from zero in the array, and the function random, which models a uniform selection of an index among the non-zero-valued positions of an array. Given that these two functions are deterministic, and with an analysis similar to that for the first come first serve policy repair box, we can deduce that this is also a deterministic model.
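The bookkeeping done by the first come first serve repair box can be made concrete with a small sketch of its three helper functions. The Python versions below are written only from the prose description above; they are not FIG's implementation, and details such as the meaning of the second argument of `maxfrom` (taken here as a start index) and the tie-breaking rule are assumptions.

```python
def broken(queue, j):
    """Register the failure of input j: age it and every already failed input by one."""
    for k in range(len(queue)):
        if k == j or queue[k] > 0:
            queue[k] += 1
    return 0  # value assigned to the 'dummy' variable in the module


def fstexclude(queue, value):
    """Index of the first element different from value, or -1 if there is none."""
    for k, x in enumerate(queue):
        if x != value:
            return k
    return -1


def maxfrom(queue, start):
    """Index of the highest value at or after position start (first one on ties)."""
    best = start
    for k in range(start, len(queue)):
        if queue[k] > queue[best]:
            best = k
    return best


# Input 2 fails first, then input 0; input 1 never fails.
queue = [0, 0, 0]
broken(queue, 2)
broken(queue, 0)
first_to_repair = maxfrom(queue, 0)
```

After the two failures the array holds `[1, 0, 2]`, so `fstexclude(queue, 0)` is not -1 and `maxfrom(queue, 0)` selects input 2, the one that has been broken the longest.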

D Voting gate

The following IOSA model corresponds to a 2-out-of-3 voting gate. A generalisation to other values of N and K can be easily obtained.

    module VOTING
      count: [0..3] init 0;
      inform: bool init false;

      [f_0??] -> (count'=count+1) & (inform'=(count+1=2));
      [f_1??] -> (count'=count+1) & (inform'=(count+1=2));
      [f_2??] -> (count'=count+1) & (inform'=(count+1=2));

      [u_0??] -> (count'=count-1) & (inform'=(count=2));
      [u_1??] -> (count'=count-1) & (inform'=(count=2));
      [u_2??] -> (count'=count-1) & (inform'=(count=2));

      [f!!] inform & count >= 2 -> (inform'=false);
      [u!!] inform & count < 2 -> (inform'=false);
    endmodule

Voting gates are modeled using a counter which counts how many inputs have failed. This is done by listening to the corresponding fail signals at lines 5 to 7, and repair signals at lines 9 to 11. In these same lines we take into account whether we have just reached the K value (2 in our example) or have just dropped below it, which are the circumstances under which the failure and the repair are to be informed, respectively. This is finally done at lines 13 and 14. Although an alternative model of these gates can be obtained by a combination of OR and AND gates, one may want to reduce the complexity of the system model by using this one, which also happens to be deterministic.

E Proof (of Proposition 2.).

Parallel composition does not introduce new non-confluent pairs of actions and, moreover, it preserves the confluence of its components [17]. Thus, we look at the components in isolation. First notice that transitions in an IOSA module are defined symbolically. Each symbolic transition in a module describes, in fact, a set of IOSA transitions, which become concrete when the symbolic transition is evaluated on a state that satisfies the guard. Notice also that a state in a module is defined by the current values of its variables. When analyzing whether two urgent actions $a$ and $b$ are confluent in a module, for each pair of symbolic transitions $t_a$ and $t_b$ defined for those actions in that module, we look for a non-confluence witness, i.e., a state that satisfies the guards of $t_a$ and $t_b$ and shows that $a$ and $b$ are not confluent (i.e., the pair does not satisfy Def. 5). Note that by only checking reachable states in the component, we are already overapproximating the reachable states in the composition.

For this proof we only analyze the case of the AND gate. For other RFT elements, the proof follows similarly. Let $v$ be a vertex in a RFT such that $l(v) = (\mathrm{and}, 2)$. We analyze an input fail action $f_i$ against an input up action $u_j$ in $[\![(\mathrm{and}, 2)]\!]$ (Fig. 5) and show that they are not confluent. Take the state $s$ defined by count=1, informf=false and informu=false, which can be easily checked to be reachable. This state enables a symbolic transition with label $f_i$ and one with label $u_j$. On the one hand, the $f_i$ transition moves to the state where count=2, informf=true and informu=false. At this point action $u_j$ can only be performed through a single transition, which yields the state $s'$ defined by count=1, informf=true and informu=true. On the other hand, the $u_j$ transition moves to the state where count=0, informf=false and informu=false. This state only enables $f_i$ through a single transition, which yields the state $s''$ defined by count=1, informf=false and informu=false. Since $s'$ and $s''$ are two different states, we have proved that $f_i$ and $u_j$ are not confluent. Similarly, we can show that the pairs $(f, u_i)$ and $(u, f_i)$, for each input $i$, are not confluent.

All other pairs are confluent. Take for instance the transitions defined for the fail actions $f_i$ and $f_j$ of the two different inputs, and the state $s$ defined by count=0, informf=false and informu=false. On the one hand, the $f_i$ transition leads to the state where count=1, informf=false and informu=false, which in turn enables $f_j$ through a single transition only, yielding the state $s'$ defined by count=2, informf=true and informu=false. On the other hand, the $f_j$ transition at state $s$ moves to the state where count=1, informf=false and informu=false, which only enables $f_i$ through a single transition, yielding the same state $s'$. The proof follows similarly from any other reachable state enabling $f_i$ and $f_j$, thus showing that $f_i$ and $f_j$ are confluent. In some other cases the proof of confluence follows from the fact that the pair of actions are never enabled simultaneously, as is the case, e.g., of $f$ and $u$ (notice that the guards enabling each one of them are mutually exclusive). □

F The Spare Gate model

The Spare basic element (SBE).

For an SBE element e ∈ E, its semantics is a function [[e]] : A^{7+5∗n} → IOSA, where [[(sbe, n, µ, ν, γ)]](fl, up, f, u, r, e, d, rq_0, asg_0, rel_0, acc_0, rj_0, ..., rq_{n-1}, asg_{n-1}, rel_{n-1}, acc_{n-1}, rj_{n-1}) results in the following pair of IOSA modules:

    module SBE
      fc, dfc, rc : clock;
      inform : [0..2] init 0;
      active : bool init false;
      broken : [0..2] init 0;

      [e??] !active -> (active'=true) & (fc'=µ);
      [d??] active -> (active'=false) & (dfc'=ν);
      [fl!] active & broken=0 @ fc -> (inform'=1) & (broken'=1);
      [fl!] !active & broken=0 @ dfc -> (inform'=1) & (broken'=1);
      [r??] -> (broken'=2) & (rc'=γ);
      [up!] active & broken=2 @ rc -> (inform'=2) & (broken'=0) & (fc'=µ);
      [up!] !active & broken=2 @ rc -> (inform'=2) & (broken'=0) & (dfc'=µ);
      [f!!] inform=1 -> (inform'=0);
      [u!!] inform=2 -> (inform'=0);
    endmodule

    module MUX
      queue[n]: [0..3] init 0; % idle, requesting, reject, using
      avail: bool init true;
      broken: bool init false;
      enable: [0..2] init 0;

      [fl?] -> (broken'=true);
      [up?] -> (broken'=false);

      [e!!] enable=1 -> (enable'=0);
      [d!!] enable=2 -> (enable'=0);

      [rq_0??] queue[0]=0 & (broken | !avail) -> (queue[0]'=2);
      [rq_0??] queue[0]=0 & !broken & avail -> (queue[0]'=1);
      [asg_0!!] queue[0]=1 & !broken & avail -> (queue[0]'=3) & (avail'=false);
      [rj_0!!] queue[0]=2 -> (queue[0]'=1);
      [rel_0??] queue[0]=3 -> (queue[0]'=0) & (avail'=true) & (enable'=2);
      [acc_0??] -> (enable'=1);
      ...
      [rq_{n-1}??] queue[n-1]=0 & (broken | !avail) -> (queue[n-1]'=2);
      [rq_{n-1}??] queue[n-1]=0 & !broken & avail -> (queue[n-1]'=1);
      [asg_{n-1}!!] queue[n-1]=1 & queue[n-2]=0 & ... & queue[0]=0 & !broken & avail -> (queue[n-1]'=3) & (avail'=false);
      [rj_{n-1}!!] queue[n-1]=2 -> (queue[n-1]'=1);
      [rel_{n-1}??] queue[n-1]=3 -> (queue[n-1]'=0) & (avail'=true) & (enable'=2);
      [acc_{n-1}??] -> (enable'=1);
    endmodule

The model for a Spare basic element consists of two IOSA modules.
One of them describes the behaviour of a basic element that can be enabled and disabled. The other one, the multiplexer, provides the means to manage the sharing of the SBE among the interested spare gates. In this case, we have decided to model the multiplexer with a priority policy, which prioritizes lower-indexed input spare gates over higher-indexed ones (notice the assignment transitions at lines 15 and 22 of the multiplexer module). Other kinds of policies can be defined, as for repair boxes. In the model, action rq_i indicates that the spare gate at input i is requesting the spare. Action acc_i indicates that input i accepts the spare that has previously been assigned to it through action asg_i. On the other hand, action rj_i indicates that the request from input i is rejected. Action rel_i indicates that input i is releasing the spare that has previously been assigned to it. Finally, actions e and d enable and disable the spare basic element when needed.

The Spare Gate (SG).

For a spare gate element e ∈ E with priority policy, its semantics is a function [[e]] : A^{4+7∗n} → IOSA, where [[(sg, n)]](f, u, fl_0, up_0, fl_1, up_1, rq_1, asg_1, acc_1, rj_1, rel_1, ..., fl_n, up_n, rq_n, asg_n, acc_n, rj_n, rel_n) results in the following IOSA:

    module SPAREGATE
      state: [0..4] init 0; // on main, request, wait, on spare, broken
      inform: [0..2] init 0;
      release: [-n..n] init 0;
      idx: [1..n] init 1;

      [fl_0?] state=0 -> (state'=1) & (idx'=1);
      [up_0?] state=4 -> (state'=0) & (inform'=2);
      [up_0?] state=3 & idx=1 -> (state'=0) & (idx'=1) & (release'=1);
      ...
      [up_0?] state=3 & idx=n -> (state'=0) & (idx'=1) & (release'=n);

      [fl_1?] state=3 & idx=1 -> (release'=1);
      ...
      [fl_n?] state=3 & idx=n -> (release'=n);

      [rq_1!!] state=1 & idx=1 -> (state'=2);
      ...
      [rq_n!!] state=1 & idx=n -> (state'=2);

      [asg_1??] state=0 | state=1 | state=3 -> (release'=1);
      [asg_1??] state=2 & idx=1 -> (release'=-1) & (state'=3);
      [asg_1??] state=4 -> (release'=-1) & (state'=3) & (idx'=1) & (inform'=2);
      ...
      [asg_n??] state=0 | state=1 | state=3 -> (release'=n);
      [asg_n??] state=2 & idx=n -> (release'=-n) & (state'=3);
      [asg_n??] state=4 -> (release'=-n) & (state'=3) & (idx'=n) & (inform'=2);

      [rj_1??] state=2 & idx=1 -> (idx'=2) & (state'=1);
      [rj_2??] state=2 & idx=2 -> (idx'=3) & (state'=1);
      ...
      [rj_n??] state=2 & idx=n -> (state'=4) & (idx'=1) & (inform'=1);

      [rel_1!!] release=1 & !(state=3 & idx=1) -> (release'=0);
      [rel_1!!] release=1 & state=3 & idx=1 -> (release'=0) & (state'=1) & (idx'=1);
      ...
      [rel_n!!] release=n & !(state=3 & idx=n) -> (release'=0);
      [rel_n!!] release=n & state=3 & idx=n -> (release'=0) & (state'=1) & (idx'=1);

      [acc_1!!] release=-1 -> (release'=0);
      ...
      [acc_n!!] release=-n -> (release'=0);

      [f!!] inform = 1 -> (inform'=0);
      [u!!] inform = 2 -> (inform'=0);
    endmodule

The Spare Gate model uses a priority policy over the available spare BEs. This means that, when looking for a spare BE, it first asks its lowest-indexed input and moves on to higher indices until it obtains a replacement. Other policies can be defined for the spare gate too, just as with the multiplexer and the repair box. In the SG model, the variable state distinguishes whether the SG is working with its main BE, requesting an SBE, waiting for a response from its inputs, working on an SBE, or broken. The variable release indicates, for SBE input i, when the SG has to release (value i) or accept (value −i) the assignment of that SBE. The variable idx indicates which of the inputs to request next. The first transition of the SG starts the SBE acquisition protocol whenever the main BE fails. The transitions on up_0 and fl_i that follow it release the acquired SBE whenever it fails or the main BE is repaired. The transitions at lines 17 to 19 request each available SBE. After doing so, the gate waits for a response from the corresponding multiplexer (state'=2). The request can be rejected (lines 29 to 32), in which case the gate proceeds by asking for the next SBE, setting idx to the corresponding value if there is one, or fails if none of the SBEs were available (state'=4 at line 32). An SBE can be assigned to the gate when it is no longer needed (first asg_i transition), when the gate was expecting it in order to avoid failing (second asg_i transition), or when the gate had already failed and thus gets repaired by using it (third asg_i transition). The gate releases an SBE when it is assigned but not needed (first rel_i transition) or when it fails while in use (second rel_i transition). Finally, the gate accepts assigned SBEs through the acc_i transitions, and signals failure through action f and repair through action u at the end of the module.
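The request/reject cycle of the spare gate can be illustrated with a small untimed sketch in Python. This is not the IOSA semantics itself: the class `SpareGateSketch`, the driver `acquire` and the set `available` (the spares whose multiplexer would answer with asg rather than rj) are hypothetical names introduced only for illustration, and the sketch covers just the acquisition path that starts when the main BE fails.

```python
from dataclasses import dataclass, field

# State encoding taken from the SPAREGATE listing.
ON_MAIN, REQUEST, WAIT, ON_SPARE, BROKEN = range(5)


@dataclass
class SpareGateSketch:
    """Untimed, hypothetical re-encoding of the SG acquisition protocol."""
    n: int                   # number of spare inputs
    state: int = ON_MAIN
    inform: int = 0          # 1 = failure pending, 2 = repair pending
    release: int = 0         # i = release spare i, -i = accept spare i
    idx: int = 1             # next spare input to request
    trace: list = field(default_factory=list)

    def main_fails(self):    # mirrors [fl_0?] state=0
        if self.state == ON_MAIN:
            self.state, self.idx = REQUEST, 1
            self.trace.append("fl_0")

    def request(self):       # mirrors [rq_idx!!] state=1
        assert self.state == REQUEST
        self.state = WAIT
        self.trace.append(f"rq_{self.idx}")
        return self.idx

    def assigned(self, i):   # mirrors [asg_i??] state=2 & idx=i
        assert self.state == WAIT and self.idx == i
        self.release, self.state = -i, ON_SPARE
        self.trace.append(f"asg_{i}")

    def rejected(self, i):   # mirrors [rj_i??] state=2 & idx=i
        assert self.state == WAIT and self.idx == i
        if i < self.n:
            # Try the next spare in priority order.
            self.idx, self.state = i + 1, REQUEST
        else:
            # No spare left: the gate fails and must inform it.
            self.state, self.idx, self.inform = BROKEN, 1, 1
        self.trace.append(f"rj_{i}")


def acquire(n, available):
    """Drive the gate after a main-BE failure.

    `available` is an assumed set of spare indices that answer asg;
    every other spare answers rj.
    """
    gate = SpareGateSketch(n)
    gate.main_fails()
    while gate.state == REQUEST:
        i = gate.request()
        if i in available:
            gate.assigned(i)
        else:
            gate.rejected(i)
    return gate
```

For instance, `acquire(3, {2, 3})` ends with the gate working on spare 2 after one rejection, while `acquire(2, set())` ends in the broken state with a pending failure notification. Timing, the release transitions and late assignments are deliberately left out of the sketch.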