Verification of Hierarchical Artifact Systems
Alin Deutsch
UC San Diego
Yuliang Li
UC San Diego
Victor Vianu
UC San Diego & INRIA Saclay
ABSTRACT
Data-driven workflows, of which IBM’s Business Artifacts are a prime exponent, have been successfully deployed in practice, adopted in industrial standards, and have spawned a rich body of research in academia, focused primarily on static analysis. The present work represents a significant advance on the problem of artifact verification, by considering a much richer and more realistic model than in previous work, incorporating core elements of IBM’s successful Guard-Stage-Milestone model. In particular, the model features task hierarchy, concurrency, and richer artifact data. It also allows database key and foreign key dependencies, as well as arithmetic constraints. The results show decidability of verification and establish its complexity, making use of novel techniques including a hierarchy of Vector Addition Systems and a variant of quantifier elimination tailored to our context.
Keywords: data-centric workflows; business process management; temporal logic; verification
1. INTRODUCTION
The past decade has witnessed the evolution of workflow specification frameworks from the traditional process-centric approach towards data-awareness. Process-centric formalisms focus on control flow while under-specifying the underlying data and its manipulations by the process tasks, often abstracting them away completely. In contrast, data-aware formalisms treat data as first-class citizens. A notable exponent of this class is IBM’s business artifact model pioneered in [44], successfully deployed in practice [11, 10, 18, 23, 57] and adopted in industrial standards. Business artifacts have also spawned a rich body of research in academia, dealing with issues ranging from formal semantics to static analysis (see related work).

In a nutshell, business artifacts (or simply “artifacts”) model key business-relevant entities, which are updated by a set of services that implement business process tasks, specified declaratively by pre-and-post conditions. A collection of artifacts and services is called an artifact system. IBM has developed several variants of artifacts, of which the most recent is Guard-Stage-Milestone (GSM) [20, 36]. The GSM approach provides rich structuring mechanisms for services, including parallelism, concurrency and hierarchy, and has been incorporated in the OMG standard for Case Management Model and Notation (CMMN) [13, 40].

Artifact systems deployed in industrial settings typically specify very complex workflows that are prone to costly bugs, whence the need for verification of critical properties. Over the past few years, we have embarked upon a study of the verification problem for artifact systems. Rather than relying on general-purpose software verification tools suffering from well-known limitations, our aim is to identify practically relevant classes of artifact systems and properties for which fully automatic verification is possible. This is an ambitious goal, since artifacts are infinite-state systems due to the presence of unbounded data. Our approach relies critically on the declarative nature of service specifications and brings into play a novel marriage of database and computer-aided verification techniques.

In previous work [24, 19], we studied the verification problem for a bare-bones variant of artifact systems, without hierarchy or concurrency, in which each artifact consists of a flat tuple of evolving values and the services are specified by simple pre-and-post conditions on the artifact and database. More precisely, we considered the problem of statically checking whether all runs of an artifact system satisfy desirable properties expressed in LTL-FO, an extension of linear-time temporal logic where propositions are interpreted as ∃FO sentences on the database and current artifact tuple. In order to deal with the resulting infinite-state system, we developed in [24] a symbolic approach allowing a reduction to finite-state model checking and yielding a PSPACE verification algorithm for the simplest variant of the model (no database dependencies and uninterpreted data domain). In [19] we extended our approach to allow for database dependencies and numeric data testable by arithmetic constraints. Unfortunately, decidability was obtained subject to a rather complex semantic restriction on the artifact system and property (feedback freedom), and the verification algorithm has non-elementary complexity.

The present work represents a significant advance on the artifact verification problem on several fronts. We consider a much richer and more realistic model, called Hierarchical Artifact System (HAS), abstracting core elements of the GSM model. In particular, the model features task hierarchy, concurrency, and richer artifact data (including updatable artifact relations). We consider properties expressed in a novel hierarchical temporal logic, HLTL-FO, that is well-suited to the model. Our main results establish the complexity of checking HLTL-FO properties for various classes of HAS, highlighting the impact of various features on verification. The results require qualitatively novel techniques, because the reduction to finite-state model checking used in previous work is no longer possible. Instead, the richer model requires the use of a hierarchy of Vector Addition Systems with States (VASS) [14]. The arithmetic constraints are handled using quantifier elimination techniques, adapted to our setting.

We next describe the model and results in more detail. A HAS consists of a database and a hierarchy (rooted tree) of tasks. Each task has associated to it local evolving data consisting of a tuple of artifact variables and an updatable artifact relation. It also has an associated set of services. Each application of a service is guarded by a pre-condition on the database and local data and causes an update of the local data, specified by a post-condition (constraining the next artifact tuple) and an insertion or retrieval of a tuple from the artifact relation. In addition, a task may invoke a child task with a tuple of parameters, and receive back a result if the child task completes. A run of the artifact system consists of an infinite sequence of transitions obtained by any valid interleaving of concurrently running task services.

In order to express properties of HAS’s we introduce hierarchical LTL-FO (HLTL-FO). Intuitively, an HLTL-FO formula uses as building blocks LTL-FO formulas acting on runs of individual tasks, called local runs, referring only to the database and local data, and can recursively state HLTL-FO properties on runs resulting from calls to children tasks. The language HLTL-FO closely fits the computational model and is also motivated on technical grounds discussed in the paper. A main justification for adopting HLTL-FO is that LTL-FO (and even LTL) properties are undecidable for HAS’s.

Hierarchical artifact systems as sketched above provide powerful extensions to the variants we previously studied, each of which immediately leads to undecidability of verification if not carefully controlled. Our main contribution is to put forward a package of restrictions that ensures decidability while capturing a significant subset of the GSM model. This requires a delicate balancing act aiming to limit the dangerous features while retaining their most useful aspects. In contrast to [19], this is achieved without the need for unpleasant semantic constraints such as feedback freedom. The restrictions are discussed in detail in the paper, and shown to be necessary by undecidability results.

The complexity of verification under various restrictions is summarized in Tables 1 (without arithmetic) and 2 (with arithmetic). As seen, the complexity ranges from PSPACE to non-elementary for various packages of features. The non-elementary complexity (a tower of exponentials whose height is the depth of the hierarchy) is reached for HAS with cyclic schemas, artifact relations and arithmetic. For acyclic schemas, which include the widely used Star (or Snowflake) schemas [38, 54], the complexity ranges from PSPACE (without arithmetic or artifact relations) to double-exponential space (with both arithmetic and artifact relations). This is a significant improvement over the previous algorithm of [19], which even for acyclic schemas has non-elementary complexity in the presence of arithmetic (a tower of exponentials whose height is the square of the total number of artifact variables in the system).

The paper is organized as follows. The HAS model is presented in Section 2. We present its syntax and semantics, including a representation of runs as a tree of local task runs, that factors out interleavings of independent concurrent tasks. An example HAS modeling a simple travel booking process is provided in the appendix. The temporal logic HLTL-FO is introduced in Section 3, together with a corresponding extension of Büchi automata to trees of local runs. In Section 4 we prove the decidability of verification without arithmetic, and establish its complexity. To this end, we develop a symbolic representation of HAS runs and a reduction of model checking to state reachability problems in a set of nested VASS (mirroring the task hierarchy). In Section 5 we show how the verification results can be extended in the presence of arithmetic. Section 6 traces the boundary of decidability, showing that the main restrictions adopted in defining the HAS model cannot be relaxed. Finally, we discuss related work in Section 7 and conclude. The appendix provides more details and proofs, together with our running example.
2. FRAMEWORK
In this section we present the syntax and semantics of Hierarchical Artifact Systems (HAS’s). We begin with the underlying database schema.
Definition. A database schema $\mathcal{DB}$ is a finite set of relation symbols, where each relation $R$ of $\mathcal{DB}$ has an associated sequence of distinct attributes containing the following:
• a key attribute $\mathrm{ID}$ (present in all relations),
• a set of foreign key attributes $\{F_1, \ldots, F_m\}$, and
• a set of non-key attributes $\{A_1, \ldots, A_n\}$ disjoint from $\{\mathrm{ID}, F_1, \ldots, F_m\}$.
To each foreign key attribute $F_i$ of $R$ is associated a relation $R_{F_i}$ of $\mathcal{DB}$ and the inclusion dependency $R[F_i] \subseteq R_{F_i}[\mathrm{ID}]$. It is said that $F_i$ references $R_{F_i}$.

The domain $Dom(A)$ of each attribute $A$ depends on its type. The domain of all non-key attributes is numeric, specifically $\mathbb{R}$. The domain of each key attribute is a countably infinite domain disjoint from $\mathbb{R}$. For distinct relations $R$ and $R'$, $Dom(R.\mathrm{ID}) \cap Dom(R'.\mathrm{ID}) = \emptyset$. The domain of a foreign key attribute $F$ referencing $R$ is $Dom(R.\mathrm{ID})$. We denote $\mathrm{DOM}_{id} = \bigcup_{R \in \mathcal{DB}} Dom(R.\mathrm{ID})$. Intuitively, in such a database schema, each tuple is an object with a globally unique id. This id does not appear anywhere else in the database except as foreign keys referencing it.

An instance of a database schema $\mathcal{DB}$ is a mapping $D$ associating to each relation symbol $R$ a finite relation $D(R)$ of the same arity as $R$, whose tuples provide, for each attribute, a value from its domain. In addition, $D$ satisfies all key and inclusion dependencies associated with the keys and foreign keys of the schema. The active domain of $D$, denoted $adom(D)$, consists of all elements of $D$ (ids and reals).

A database schema $\mathcal{DB}$ is acyclic if there are no cycles in the references induced by foreign keys. More precisely, consider the labeled graph FK whose nodes are the relations of the schema and in which there is an edge from $R_i$ to $R_j$ labeled with $F$ if $R_i$ has a foreign key attribute $F$ referencing $R_j$. The schema $\mathcal{DB}$ is acyclic if the graph FK is acyclic, and it is linearly-cyclic if each relation $R$ is contained in at most one simple cycle.

The assumption that the ID of each relation is a single attribute is made for simplicity, and multiple-attribute IDs can be easily handled. The fact that the domain of all non-key attributes is numeric is also harmless. Indeed, an uninterpreted domain on which only equality can be used can be easily simulated. Note that the keys and foreign keys used in our schemas are special cases of the dependencies used in [19].
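As an illustration of the acyclicity condition on the graph FK, the following is a minimal Python sketch and not part of the paper's formal development. The dictionary encoding of a schema, the function name `is_acyclic`, and the example relation names are all assumptions made for this illustration; the check itself is an ordinary depth-first search for a directed cycle.

```python
from typing import Dict

# Hypothetical encoding (not from the paper): each relation name maps to its
# foreign keys, given as {foreign_key_attribute: referenced_relation}.
# Every referenced relation is assumed to appear as a key of the dictionary.
Schema = Dict[str, Dict[str, str]]


def is_acyclic(schema: Schema) -> bool:
    """Return True iff the foreign-key graph FK has no directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color = {r: WHITE for r in schema}

    def visit(r: str) -> bool:
        color[r] = GREY
        for target in schema[r].values():
            if color[target] == GREY:  # back edge: a cycle through r
                return False
            if color[target] == WHITE and not visit(target):
                return False
        color[r] = BLACK
        return True

    return all(color[r] != WHITE or visit(r) for r in schema)


# A star-like schema (illustrative names only): one fact table referencing
# two dimension tables.
star = {
    "ORDERS": {"customer": "CUSTOMERS", "product": "PRODUCTS"},
    "CUSTOMERS": {},
    "PRODUCTS": {},
}
# A schema with a self-referencing foreign key, hence a cycle of length one.
cyclic = {"EMPLOYEES": {"manager": "EMPLOYEES"}}
```

Under this encoding, `is_acyclic(star)` holds while `is_acyclic(cyclic)` does not. The sketch covers only acyclicity; checking the linearly-cyclic condition would additionally require counting the simple cycles through each relation.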
The limitation to keys and foreign keys is one of the factors leading to improved complexity of verification and still captures most schemas of practical interest.

We next proceed with the definition of tasks and services, described informally in the introduction. The definition imposes various restrictions needed for decidability of verification. These are discussed and motivated in Section 6.

Similarly to the database schema, we consider two infinite, disjoint sets $\mathrm{VAR}_{id}$ of ID variables and $\mathrm{VAR}_{\mathbb{R}}$ of numeric variables. We associate to each variable $x$ its domain $Dom(x)$. If $x \in \mathrm{VAR}_{id}$, then $Dom(x) = \{\mathtt{null}\} \cup \mathrm{DOM}_{id}$, where $\mathtt{null} \notin \mathrm{DOM}_{id} \cup \mathbb{R}$ ($\mathtt{null}$ plays a special role that will become clear shortly). If $x \in \mathrm{VAR}_{\mathbb{R}}$, then $Dom(x) = \mathbb{R}$. An artifact variable is a variable in $\mathrm{VAR}_{id} \cup \mathrm{VAR}_{\mathbb{R}}$. If $\bar{x}$ is a sequence of artifact variables, a valuation of $\bar{x}$ is a mapping $\nu$ associating to each variable $x$ in $\bar{x}$ an element of its domain $Dom(x)$.

Definition. A task schema over database schema $\mathcal{DB}$ is a triple $T = \langle \bar{x}^T, S^T, \bar{s}^T \rangle$ where $\bar{x}^T$ is a sequence of distinct artifact variables, $S^T$ is a relation symbol not in $\mathcal{DB}$ with associated arity $k$, and $\bar{s}^T$ is a sequence of $k$ distinct id variables in $\bar{x}^T$.

We denote $\bar{x}^T_{id} = \bar{x}^T \cap \mathrm{VAR}_{id}$ and $\bar{x}^T_{\mathbb{R}} = \bar{x}^T \cap \mathrm{VAR}_{\mathbb{R}}$. We refer to $S^T$ as the artifact relation or set of $T$.

Definition. An artifact schema is a tuple $\mathcal{A} = \langle \mathcal{H}, \mathcal{DB} \rangle$ where $\mathcal{DB}$ is a database schema and $\mathcal{H}$ is a rooted tree of task schemas over $\mathcal{DB}$ with pairwise disjoint sets of artifact variables and distinct artifact relation symbols.

The rooted tree $\mathcal{H}$ defines the task hierarchy. Suppose the set of tasks is $\{T_1, \ldots, T_k\}$. For uniformity, we always take task $T_1$ to be the root of $\mathcal{H}$. We denote by $\preceq_{\mathcal{H}}$ (or simply $\preceq$ when $\mathcal{H}$ is understood) the partial order on $\{T_1, \ldots, T_k\}$ induced by $\mathcal{H}$ (with $T_1$ the minimum).
For a node $T$ of $\mathcal{H}$, we denote by $tree(T)$ the subtree of $\mathcal{H}$ rooted at $T$, by $child(T)$ the set of children of $T$ (also called subtasks of $T$), and by $desc(T)$ the set of descendants of $T$ (excluding $T$). Finally, $desc^*(T)$ denotes $desc(T) \cup \{T\}$. We denote by $\mathcal{S}_{\mathcal{H}}$ (or simply $\mathcal{S}$ when $\mathcal{H}$ is understood) the relational schema $\{S^{T_i} \mid 1 \le i \le k\}$. An instance of $\mathcal{S}$ is a mapping associating to each $S^{T_i} \in \mathcal{S}$ a finite relation over $\mathrm{DOM}_{id}$ of the same arity.

Definition. An instance of an artifact schema $\mathcal{A} = \langle \mathcal{H}, \mathcal{DB} \rangle$ is a tuple $\bar{I} = \langle \bar{\nu}, stg, D, \bar{S} \rangle$ where $D$ is a finite instance of $\mathcal{DB}$, $\bar{S}$ a finite instance of $\mathcal{S}$, $\bar{\nu}$ a valuation of $\bigcup_{i=1}^{k} \bar{x}^{T_i}$, and $stg$ (standing for “stage”) a mapping of $\{T_1, \ldots, T_k\}$ to $\{\mathtt{init}, \mathtt{active}, \mathtt{closed}\}$.

The stage $stg(T_i)$ of a task $T_i$ has the following intuitive meaning in the context of a run of its parent: $\mathtt{init}$ indicates that $T_i$ has not yet been called within the run, $\mathtt{active}$ says that $T_i$ has been called and has not returned its answer, and $\mathtt{closed}$ indicates that $T_i$ has returned its answer. As we will see, a task $T_i$ can only be called once within a given run of its parent. However, it can be called again in subsequent runs.

We denote by $\mathcal{C}$ an infinite set of relation symbols, each of which has a fixed interpretation as the set of real solutions of a finite set of polynomial inequalities with integer coefficients. By slight abuse, we sometimes use the same notation for a relation symbol in $\mathcal{C}$ and its fixed interpretation. For a given artifact schema $\mathcal{A} = \langle \mathcal{H}, \mathcal{DB} \rangle$ and a sequence $\bar{x}$ of variables, a condition on $\bar{x}$ is a quantifier-free FO formula over $\mathcal{DB} \cup \mathcal{C} \cup \{=\}$ whose variables are included in $\bar{x}$. The special constant $\mathtt{null}$ can be used in equalities with ID variables. For each atom $R(x, y_1, \ldots, y_m, z_1, \ldots, z_n)$ of relation $R(\mathrm{ID}, A_1, \ldots, A_m, F_1, \ldots, F_n) \in \mathcal{DB}$, $\{x, z_1, \ldots, z_n\} \subseteq \mathrm{VAR}_{id}$ and $\{y_1, \ldots, y_m\} \subseteq \mathrm{VAR}_{\mathbb{R}}$.
Atoms over $\mathcal{C}$ use only numeric variables. If $\alpha$ is a condition on $\bar{x}$, $D$ is an instance of $\mathcal{DB}$ and $\nu$ a valuation of $\bar{x}$, we denote by $D \cup \mathcal{C} \models \alpha(\nu)$ the fact that $D \cup \mathcal{C}$ satisfies $\alpha$ with valuation $\nu$ under the standard semantics. For an atom $R(\bar{y})$ in $\alpha$ where $R \in \mathcal{DB}$ and $\bar{y} \subseteq \bar{x}$, if $\nu(y) = \mathtt{null}$ for some $y \in \bar{y}$, then $R(\bar{y})$ is false.

We next define services of tasks. We start with internal services, which update the artifact variables and artifact relation of the task.

Definition. Let $T = \langle \bar{x}^T, S^T, \bar{s}^T \rangle$ be a task of an artifact schema $\mathcal{A}$. An internal service $\sigma$ of $T$ is a tuple $\langle \pi, \psi, \delta \rangle$ where:
• $\pi$ and $\psi$, called pre-condition and post-condition, respectively, are conditions over $\bar{x}^T$;
• $\delta \subseteq \{+S^T(\bar{s}^T), -S^T(\bar{s}^T)\}$ is a set of set updates; $+S^T(\bar{s}^T)$ and $-S^T(\bar{s}^T)$ are called the insertion and retrieval of $\bar{s}^T$, respectively.

Intuitively, $+S^T(\bar{s}^T)$ causes an insertion of the current value of $\bar{s}^T$ into $S^T$, while $-S^T(\bar{s}^T)$ causes the removal of some non-deterministically chosen current tuple of $S^T$ and its assignment as the next value of $\bar{s}^T$. In particular, if $\delta = \{+S^T(\bar{s}^T), -S^T(\bar{s}^T)\}$, the tuple inserted by $+S^T(\bar{s}^T)$ and the one retrieved by $-S^T(\bar{s}^T)$ are generally distinct, but may be the same as a degenerate case (see the definition of the semantics below).

As will become apparent, although pre-and-post conditions are quantifier-free, ∃FO conditions can be simulated by adding variables to $\bar{x}^T$.

An internal service of a task $T$ specifies transitions that only modify the variables $\bar{x}^T$ of $T$ and the contents of $S^T$. Interactions among tasks are specified using two kinds of special services, called the opening-services and closing-services.
Definition. Let $T_c$ be a child of a task $T$ in $\mathcal{A}$.
(i) The opening-service $\sigma^o_{T_c}$ of $T_c$ is a tuple $\langle \pi, f_{in} \rangle$, where $\pi$ is a condition over $\bar{x}^T$, and $f_{in}$ is a partial 1-1 mapping from $\bar{x}^{T_c}$ to $\bar{x}^T$ (called the input variable mapping). We denote $dom(f_{in})$ by $\bar{x}^{T_c}_{in}$, called the input variables of $T_c$, and $range(f_{in})$ by $\bar{x}^{T}_{T_c\downarrow}$ (the variables of $T$ passed as input to $T_c$).
(ii) The closing-service $\sigma^c_{T_c}$ of $T_c$ is a tuple $\langle \pi, f_{out} \rangle$, where $\pi$ is a condition over $\bar{x}^{T_c}$, and $f_{out}$ is a partial 1-1 mapping from $\bar{x}^T$ to $\bar{x}^{T_c}$ (called the output variable mapping). We denote $dom(f_{out})$ by $\bar{x}^{T}_{T_c\uparrow}$, referred to as the returned variables from $T_c$. It is required that $\bar{x}^{T}_{T_c\uparrow} \cap \bar{x}^{T}_{in} = \emptyset$. We denote by $\bar{x}^{T_c}_{ret}$ the to-be-returned variables (or return variables), defined as $range(f_{out})$.

Intuitively, the opening-service $\langle \pi, f_{in} \rangle$ of a task $T_c$ specifies the condition $\pi$ that the parent task $T$ has to satisfy in order to open $T_c$. When $T_c$ is opened, a subset of the variables of $T$ are sent to $T_c$ according to the mapping $f_{in}$. Similarly, the closing-service $\langle \pi, f_{out} \rangle$ specifies the condition $\pi$ that $T_c$ has to satisfy in order to be closed and return to $T$. When $T_c$ is closed, a subset of $\bar{x}^{T_c}$ is sent back to $T$, as specified by $f_{out}$.

For uniformity of notation, we also equip the root task $T_1$ with a service $\sigma^o_{T_1}$ with pre-condition true that initiates the computation by providing a valuation to a designated subset $\bar{x}^{T_1}_{in}$ of $\bar{x}^{T_1}$ (the input variables of $T_1$), and a service $\sigma^c_{T_1}$ whose pre-condition is false (so it never occurs in a run).
For a task $T$ we denote by $\Sigma_T$ the set of its internal services, $\Sigma^{oc}_T = \Sigma_T \cup \{\sigma^o_T, \sigma^c_T\}$, $\Sigma^{obs}_T = \Sigma^{oc}_T \cup \{\sigma^o_{T_c}, \sigma^c_{T_c} \mid T_c \in child(T)\}$, and $\Sigma^{\delta}_T = \Sigma_T \cup \{\sigma^o_T\} \cup \{\sigma^c_{T_c} \mid T_c \in child(T)\}$. Intuitively, $\Sigma^{obs}_T$ consists of the services observable in runs of task $T$ and $\Sigma^{\delta}_T$ consists of services whose application can modify the variables $\bar{x}^T$.

Definition. A Hierarchical Artifact System (HAS) is a triple $\Gamma = \langle \mathcal{A}, \Sigma, \Pi \rangle$, where $\mathcal{A}$ is an artifact schema, $\Sigma$ is a set of services over $\mathcal{A}$ including $\sigma^o_T$ and $\sigma^c_T$ for each task $T$ of $\mathcal{A}$, and $\Pi$ is a condition over $\bar{x}^{T_1}_{in}$ (where $T_1$ is the root task).

We next define the semantics of HAS’s. Intuitively, a run of a HAS on a database $D$ consists of an infinite sequence of transitions among HAS instances (also referred to as configurations, or snapshots), starting from an initial artifact tuple satisfying pre-condition $\Pi$. At each snapshot, each active task $T$ can open a subtask $T_c$ if the pre-condition of the opening service of $T_c$ holds, and the values of a subset of $\bar{x}^T$ are passed to $T_c$ as its input variables. $T_c$ can be closed if the pre-condition of its closing service is satisfied. When $T_c$ is closed, the values of a subset of $\bar{x}^{T_c}$ are sent to $T$ as $T$’s returned variables from $T_c$. An internal service of $T$ can only be applied after all active subtasks of $T$ have returned their answer.

Because of the hierarchical structure, and the locality of task specifications, the actions of concurrently active children of a given task are independent of each other and can be arbitrarily interleaved. To capture just the essential information, factoring out the arbitrary interleavings, we first define the notion of local run and tree of local runs. Intuitively, a local run of a task consists of a sequence of services of the task, together with the transitions they cause on the task’s local artifact variables and relation. The task’s input and output are also specified. A tree of local runs captures the relationship between the local runs of tasks and those of their subtasks, including the passing of inputs and results. Then the runs of the full artifact system simply consist of all legal interleavings of transitions represented in the tree of local runs, lifted to full HAS instances (we refer to these as global runs). We begin by defining instances of tasks and local transitions.
For a mapping $M$, we denote by $M[a \mapsto b]$ the mapping that sends $a$ to $b$ and agrees with $M$ everywhere else.

Definition. Let $T = \langle \bar{x}^T, S^T, \bar{s}^T \rangle$ be a task in $\Gamma$ and $D$ a database instance over $\mathcal{DB}$. An instance of $T$ is a pair $(\nu, S)$ where $\nu$ is a valuation of $\bar{x}^T$ and $S$ an instance of $S^T$. For instances $I = (\nu, S)$ and $I' = (\nu', S')$ of $T$ and a service $\sigma \in \Sigma^{obs}_T$, there is a local transition $I \xrightarrow{\sigma} I'$ if the following holds. If $\sigma$ is an internal service $\langle \pi, \psi, \delta \rangle$, then:
• $D \cup \mathcal{C} \models \pi(\nu)$ and $D \cup \mathcal{C} \models \psi(\nu')$,
• $\nu'(y) = \nu(y)$ for each $y$ in $\bar{x}^T_{in}$,
• if $\delta = \{+S^T(\bar{s}^T)\}$, then $S' = S \cup \{\nu(\bar{s}^T)\}$,
• if $\delta = \{-S^T(\bar{s}^T)\}$, then $\nu'(\bar{s}^T) \in S$ and $S' = S - \{\nu'(\bar{s}^T)\}$,
• if $\delta = \{+S^T(\bar{s}^T), -S^T(\bar{s}^T)\}$, then $\nu'(\bar{s}^T) \in S \cup \{\nu(\bar{s}^T)\}$ and $S' = (S \cup \{\nu(\bar{s}^T)\}) - \{\nu'(\bar{s}^T)\}$,
• if $\delta = \emptyset$, then $S' = S$.
If $\sigma = \sigma^o_{T_c} = \langle \pi, f_{in} \rangle$ is the opening-service for a child $T_c$ of $T$, then $D \cup \mathcal{C} \models \pi(\nu)$, $\nu' = \nu$ and $S' = S$. If $\sigma = \sigma^c_{T_c}$, then $S = S'$, $\nu'|_{(\bar{x}^T - \bar{x}^{T}_{T_c\uparrow})} = \nu|_{(\bar{x}^T - \bar{x}^{T}_{T_c\uparrow})}$ and $\nu'(z) = \nu(z)$ for every $z \in \bar{x}^{T}_{T_c\uparrow} \cap \mathrm{VAR}_{id}$ for which $\nu(z) \neq \mathtt{null}$. Finally, if $\sigma = \sigma^c_T$ then $I' = I$.

We now define local runs.
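Before turning to local runs, the four set-update clauses of the local transition relation can be read operationally. The following is a minimal Python sketch, not taken from the paper: the function name `set_update_successors`, the string encoding of $\delta$ as a subset of `{"+", "-"}`, and the use of `None` to mean "not constrained by the set update" are assumptions of this illustration. Pre- and post-conditions, and the rest of the valuation, are deliberately left out.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Tup = Tuple[object, ...]  # one value of the tuple of set variables
Rel = FrozenSet[Tup]      # contents of the artifact relation


def set_update_successors(
    S: Rel, s_now: Tup, delta: Set[str]
) -> List[Tuple[Optional[Tup], Rel]]:
    """List the pairs (s_next, S_next) permitted by the set updates in delta.

    delta is a subset of {"+", "-"} (insertion, retrieval). s_next is None
    when the set update places no constraint on the next value of the set
    variables (the post-condition, not modeled here, may still do so).
    """
    if delta == {"+"}:
        # Insertion: the current tuple is added to the relation.
        return [(None, S | {s_now})]
    if delta == {"-"}:
        # Retrieval: some tuple of S is removed and becomes the next value.
        return [(t, S - {t}) for t in S]
    if delta == {"+", "-"}:
        # Both: retrieve from S extended with the current tuple; choosing
        # the current tuple itself is the degenerate case.
        pool = S | {s_now}
        return [(t, pool - {t}) for t in pool]
    if not delta:
        # No set update: the relation is unchanged.
        return [(None, S)]
    raise ValueError(f"unknown set updates: {delta}")
```

Retrieval from an empty relation yields no successor, which matches the requirement $\nu'(\bar{s}^T) \in S$: such a service simply cannot fire.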
Definition. Let $T = \langle \bar{x}^T, S^T, \bar{s}^T \rangle$ be a non-root task in $\Gamma$ and $D$ a database instance over $\mathcal{DB}$. A local run of $T$ over $D$ is a triple $\rho_T = (\nu_{in}, \nu_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$, where:
• $\gamma \in \mathbb{N} \cup \{\omega\}$,
• for each $i \ge 0$, $I_i = (\nu_i, S_i)$ is an instance of $T$ and $\sigma_i \in \Sigma^{obs}_T$,
• $\nu_{in}$ is a valuation of $\bar{x}^T_{in}$,
• $\sigma_0 = \sigma^o_T$ and $S_0 = \emptyset$,
• $\nu_0|_{\bar{x}^T_{in}} = \nu_{in}$, $\nu_0(z) = \mathtt{null}$ for $z \in \mathrm{VAR}_{id} - \bar{x}^T_{in}$ and $\nu_0(z) = 0$ for $z \in \mathrm{VAR}_{\mathbb{R}} - \bar{x}^T_{in}$,
• if for some $i$, $\sigma_i = \sigma^c_T$, then $\gamma \in \mathbb{N}$ and $i = \gamma - 1$ (and $\rho_T$ is called a returning local run),
• $\nu_{out} = \nu_{\gamma-1}|_{\bar{x}^T_{ret}}$ if $\rho_T$ is a returning run and $\bot$ otherwise,
• a segment of $\rho_T$ is a subsequence $\{(I_i, \sigma_i)\}_{i \in J}$, where $J$ is a maximal interval $[a, b] \subseteq \{i \mid 0 \le i < \gamma\}$ such that no $\sigma_j$ is an internal service of $T$ for $j \in [a+1, b]$. A segment $J$ is terminal if $\gamma \in \mathbb{N}$ and $b = \gamma - 1$ (and is called returning if $\sigma_{\gamma-1} = \sigma^c_T$ and blocking otherwise). Segments of $\rho_T$ must satisfy the following properties. For each child $T_c$ of $T$ there is at most one $i \in J$ such that $\sigma_i = \sigma^o_{T_c}$. If $J$ is not blocking and such $i$ exists, there is exactly one $j \in J$ for which $\sigma_j = \sigma^c_{T_c}$, and $j > i$. If $J$ is blocking, there is at most one such $j$,
• for every $0 < i < \gamma$, $I_{i-1} \xrightarrow{\sigma_i} I_i$.
Local runs of the root task $T_1$ are defined as above, except that $\nu_{in}$ is a valuation of $\bar{x}^{T_1}_{in}$ such that $D \cup \mathcal{C} \models \Pi(\nu_{in})$, and $\nu_{out} = \bot$ (the root task never returns).

For a local run as above, we denote $\gamma(\rho_T) = \gamma$. Note that by definition of segment, a task can call each of its children tasks at most once between two consecutive services in $\Sigma^{oc}_T$, and all of the called children tasks must complete within the segment, unless it is blocking. These restrictions are essential for decidability and are discussed in Section 6.

Observe that local runs take arbitrary inputs and allow for arbitrary return values from their children tasks. The valid interactions between the local runs of a task and those of its children are captured by the notion of tree of local runs.
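The segment conditions in the definition of local runs can be checked mechanically on the sequence of services of a finite local run. The sketch below is a hedged Python illustration and not the paper's construction: the event encoding (`"open_self"`, `"internal"`, `"open"`, `"close"`, `"close_self"`), the function names, and the task names used in the usage note are assumptions. It also rejects a child closing without having been opened in the same segment, which the definition presupposes rather than states. Infinite runs are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of the services of a finite local run of a task T:
#   ("open_self", None)   opening service of T itself (first event)
#   ("internal", name)    an internal service of T
#   ("open", child)       opening service of a child task
#   ("close", child)      closing service of a child task
#   ("close_self", None)  closing service of T itself (last event, if any)
Event = Tuple[str, Optional[str]]


def split_segments(run: List[Event]) -> List[List[Event]]:
    """Cut the run into segments; each internal service starts a new one."""
    segments: List[List[Event]] = []
    for ev in run:
        if not segments or ev[0] == "internal":
            segments.append([])
        segments[-1].append(ev)
    return segments


def segments_ok(run: List[Event]) -> bool:
    """Check the per-segment restrictions on opening and closing children."""
    segments = split_segments(run)
    for idx, seg in enumerate(segments):
        is_terminal = idx == len(segments) - 1
        # A terminal segment not ending with T's closing service is blocking.
        blocking = is_terminal and seg[-1][0] != "close_self"
        opened: Dict[str, int] = {}
        closed: Dict[str, int] = {}
        for pos, (kind, child) in enumerate(seg):
            if kind == "open":
                if child in opened:  # each child opened at most once
                    return False
                opened[child] = pos
            elif kind == "close":
                # The close must follow a matching open in this segment and
                # occur at most once.
                if child not in opened or child in closed:
                    return False
                closed[child] = pos
        # Unless the segment is blocking, every opened child must have closed.
        if not blocking and set(opened) != set(closed):
            return False
    return True
```

For example, a run that opens hypothetical children `T2` and `T3`, closes both, applies an internal service and then closes itself passes the check, whereas a run that applies an internal service while `T2` is still open does not.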
Definition. A tree of local runs is a directed labeled tree $Tree$ in which each node is a local run $\rho_T$ for some task $T$, and every edge connects a local run of a task $T$ with a local run of a child task $T_c$ and is labeled with a non-negative integer $i$ (denoted $i(\rho_{T_c})$). In addition, the following properties are satisfied. Let $\rho_T = (\nu^T_{in}, \nu^T_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ be a node of $Tree$, where $I_i = (\nu_i, S_i)$, $i \ge 0$. Let $i$ be such that $\sigma_i = \sigma^o_{T_c}$ for some child $T_c$ of $T$. There exists a unique edge labeled $i$ from $\rho_T$ to a node $\rho_{T_c} = (\nu_{in}, \nu_{out}, \{(I'_i, \sigma'_i)\}_{0 \le i < \gamma'})$ of $Tree$, and the following hold:
• $\nu_{in} = f_{in} \circ \nu_i$, where $f_{in}$ is the input variable mapping of $\sigma^o_{T_c}$ (composition is left-to-right),
• $\rho_{T_c}$ is a returning run iff there exists $j > i$ such that $\sigma_j = \sigma^c_{T_c}$; let $k$ be the minimum such $j$. Then $\nu_k(z) = \nu_{out}(f_{out}(z))$ for every $z \in \bar{x}^{T}_{T_c\uparrow}$ for which $\nu_{k-1}(z) = \mathtt{null}$, where $f_{out}$ is the output mapping of $\sigma^c_{T_c}$.
Finally, for every node $\rho_T$ of $Tree$, if $\rho_T$ is blocking then there exists a child of $\rho_T$ that is not returning (so infinite or blocking).

Note that a tree of local runs may generally be rooted at a local run of any task of $\Gamma$. We say that $Tree$ is full if it is rooted at a local run of $T_1$.

We next turn to global runs. A global run of $\Gamma$ on database instance $D$ over $\mathcal{DB}$ is an infinite sequence $\rho = \{(I_i, \sigma_i)\}_{i \ge 0}$, where each $I_i$ is an instance $(\nu_i, stg_i, D, S_i)$ of $\mathcal{A}$ and $\sigma_i \in \Sigma$, resulting from a tree of local runs by interleaving its transitions, lifted to full HAS instances (see Appendix for the formal definition). For a tree of local runs $Tree$, we denote by $\mathcal{L}(Tree)$ the set of all global runs induced by the legal interleavings of $Tree$.
3. HIERARCHICAL LTL-FO
In order to specify temporal properties of HAS’s we use an extension of LTL (linear-time temporal logic). Recall that LTL is propositional logic augmented with temporal operators X (next), U (until), G (always) and F (eventually) (see, e.g., [30]). Their semantics is reviewed in Appendix B.2. An extension of LTL in which propositions are interpreted as FO sentences has previously been defined to specify properties of sequences of structures [51], and in particular of runs of artifact systems [24, 19]. The extension is denoted by LTL-FO. In order to specify properties of HAS’s, we shall use a variant of LTL-FO, called hierarchical LTL-FO, denoted HLTL-FO. Intuitively, an HLTL-FO formula uses as building blocks LTL-FO formulas acting on local runs of individual tasks, referring only to the database and local data, and can recursively state HLTL-FO properties on runs resulting from calls to children tasks. This closely mirrors the hierarchical execution of tasks, and is a natural fit for this computation model. In addition to its naturalness, the choice of HLTL-FO has several technical justifications. First, verification of LTL-FO (and even LTL) properties is not possible for HAS’s.
Theorem. It is undecidable, given an LTL-FO formula $\varphi$ and a HAS $\Gamma = \langle \mathcal{A}, \Sigma, \Pi \rangle$, whether $\Gamma \models \varphi$. Moreover, this holds even for LTL formulas over $\Sigma$ (restricting the sequence of services in a global run).

The proof, provided in Appendix B.3, is by reduction from repeated state reachability in VASS with resets and bounded lossiness, whose undecidability follows from [41].

Another technical argument in favor of HLTL-FO is that it only expresses properties that are invariant under interleavings of independent tasks. Interleaving invariance is not only a natural soundness condition, but also allows more efficient model checking by partial-order reduction [45]. Moreover, HLTL-FO enjoys a pleasing completeness property: it expresses, in a reasonable sense, all interleaving-invariant LTL-FO properties of HAS’s. The proof is non-trivial, building on completeness results for propositional temporal logics on Mazurkiewicz traces [27, 28] (see Appendix B.4).

We next define HLTL-FO. Propositions in HLTL-FO are interpreted as conditions on artifact instances in the run, or recursively as HLTL-FO formulas on runs of invoked children tasks. The different conditions may share some universally quantified global variables.

Definition
Let
Γ = (cid:104)A , Σ , Π (cid:105) be an artifact systemwhere A = (cid:104)H , DB(cid:105) . Let ¯ y be a finite sequence of variables For consistency with previous notation, we denote the logicHLTL-FO although the FO interpretations are restricted tobe quantifier free. in VAR id ∪ VAR R disjoint from { ¯ x T | T ∈ H} , called globalvariables . We first define recursively the set Ψ( T, ¯ y ) of basicHLTL-FO formulas with global variables ¯ y , for each task T ∈H . The set Ψ( T, ¯ y ) consists of all formulas ϕ f obtained asfollows: • ϕ is an LTL formula with propositions P ∪ Σ obsT where P is a finite set of proposition disjoint from Σ ; • Let Φ be the set of conditions on ¯ x T ∪ ¯ y extended by al-lowing atoms of the form S T (¯ z ) in which all variables in ¯ z are in ¯ y ∩ VAR id ; f is a function from P to Φ ∪{ [ ψ ] T c | ψ ∈ Ψ( T c , ¯ y ) , T c ∈ child ( T ) } ; • ϕ f is obtained by replacing each p ∈ P with f ( p ) ;An HLTL-FO formula over A is an expression ∀ ¯ y [ ϕ f ] T where ϕ f is in Ψ( T , ¯ y ) . In an HLTL-FO formula of task T , each proposition ismapped to either a quantifier-free FO formula referring tothe variables and set of task T , or an HLTL-FO formula ofa child task of T . The intuition is the following. A proposi-tion mapped to a quantifier-free FO formula holds in a givenconfiguration of T if the formula is true in that configura-tion. A proposition mapped to an expression [ ψ ] T c holds ina given configuration if T makes a call to T c and the run of T c resulting from the call satisfies ψ . Example
Let T be a root task with child tasks T and T . The HLTL-FO formula (with no global variables) ϕ = [ F [ ψ ] T → G ( σ oT → [ ψ ] T )] T states that whenever T calls child task T and T ’s local runsatisfies property ψ , then if T is also called (via the openingservice σ oT ), its local run must satisfy property ψ . See Appendix A.2 for a concrete HLTL-FO property of sim-ilar structure, in the context of our example for the HASmodel.Since HLTL-FO properties depend on local runs of tasksand their relationship to local runs of their descendants,their semantics is naturally defined using the full trees oflocal runs. We first define satisfaction by a local run ofHLTL-FO formulas with no global variables. This is donerecursively. Let
Tree be a full tree of local runs of Γ oversome database D . Let ϕ f be a formula in Ψ( T, (cid:104)(cid:105) ) (no globalvariables). Recall that ϕ is a propositional LTL formula over P ∪ Σ obsT . Let ρ T = ( ν in , ν out , { ( I i , σ i ) } i<γ ) be a local runof T in Tree . A proposition σ ∈ Σ obsT holds in ( I j , σ j ) if σ = σ j . Consider p ∈ P and f ( p ). If f ( p ) is an FO formula,the standard definition applies. If f ( p ) = [ ψ ] T c , then ( I j , σ j )satisfies [ ψ ] T c iff σ j = σ T c and the local run of T c connectedto ρ T in Tree by an edge labeled j satisfies ψ . The formula ϕ f is satisfied if the sequence of truth values of its proposi-tions via f satisfies ϕ . Note that ρ T may be finite, in whichcase a finite variant of the LTL semantics is used [22] (seeAppendix B.2).A full tree of local runs satisfies ϕ f ∈ Ψ( T , (cid:104)(cid:105) ) if its root (alocal run of T ) satisfies ϕ f . Finally, let ϕ f (¯ y ) be a formulain Ψ( T , ¯ y ). Then ∀ ¯ y [ ϕ f (¯ y )] T is satisfied by Tree , denoted
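$\mathit{Tree} \models \forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$, if for every valuation $\nu$ of $\bar{y}$, $\mathit{Tree}$ satisfies $\varphi_{f^\nu}$ where $f^\nu$ is obtained from $f$ by replacing each $y$ in $f(p)$ by $\nu(y)$ for every $p \in P$. Note that $\varphi_{f^\nu} \in \Psi(T_1, \langle\rangle)$. Finally, $\Gamma$ satisfies $\forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$, denoted $\Gamma \models \forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$, if $\mathit{Tree} \models \forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$ for every database instance $D$ and tree of local runs $\mathit{Tree}$ of $\Gamma$ on $D$.

(Footnote to the definition of HLTL-FO formulas: $[\psi]_{T_c}$ is an expression whose meaning is given by the semantics defined in this section.)

The semantics of HLTL-FO on trees of local runs of a HAS also induces a semantics on the global runs of the HAS. Let $\forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$ be an HLTL-FO formula and $\rho \in \mathcal{L}(\mathit{Tree})$, where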
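$\mathit{Tree}$ is a full tree of local runs of $\Gamma$. We say that $\rho$ satisfies $\forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$ if $\mathit{Tree}$ satisfies $\forall \bar{y}\, [\varphi_f(\bar{y})]_{T_1}$. This is well defined in view of the following easily shown fact: if $\rho \in \mathcal{L}(\mathit{Tree}_1) \cap \mathcal{L}(\mathit{Tree}_2)$ then $\mathit{Tree}_1 = \mathit{Tree}_2$.

Simplifications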
Before proceeding, we note that several simplifications to HLTL-FO formulas and HAS specifications can be made without impact on verification. First, although useful at the surface syntax, the global variables, as well as set atoms, can be easily eliminated from the HLTL-FO formula to be verified (Lemma 30 in Appendix B.5). It is also useful to note that one can assume, without loss of generality, two simplifications on artifact systems regarding the interaction of tasks with their subtasks: (i) for every task $T$, the set of variables passed to subtasks is disjoint from the set of variables returned by subtasks, and (ii) all variables returned by subtasks are non-numeric (Lemma 31 in Appendix B.5). In view of the above, we henceforth consider only properties with no global variables or set atoms, and artifact systems simplified as described.

Checking HLTL-FO Properties Using Automata
We next show how to check HLTL-FO properties of trees of local runs of artifact systems. Before we do so, recall the standard construction of a Büchi automaton $B_\varphi$ corresponding to an LTL formula $\varphi$ [53, 49]. The automaton $B_\varphi$ has exponentially many states and accepts precisely the set of $\omega$-words that satisfy $\varphi$. Recall that we are interested in evaluating LTL formulas $\varphi$ on both infinite and finite runs. It is easily seen that for the $B_\varphi$ obtained by the standard construction there is a subset $Q^{fin}$ of its states such that $B_\varphi$ viewed as a finite-state automaton with final states $Q^{fin}$ accepts precisely the finite words that satisfy $\varphi$ (details omitted).

Consider now an artifact system $\Gamma$ and let $\varphi = [\xi]_{T_1}$ be an HLTL-FO formula over $\Gamma$. Consider a full tree $\mathit{Tree}$ of local runs. For task $T$, denote by $\Phi_T$ the set of sub-formulas $[\psi]_T$ occurring in $\varphi$ and by $2^{\Phi_T}$ the set of truth assignments to these formulas. For each $T$ and $\eta \in 2^{\Phi_T}$, let $B(T, \eta)$ be the Büchi automaton constructed from the formula
$$\Big( \bigwedge_{\psi \in \Phi_T,\ \eta(\psi) = 1} \psi \Big) \wedge \Big( \bigwedge_{\psi \in \Phi_T,\ \eta(\psi) = 0} \neg \psi \Big)$$
and define $\mathcal{B}_\varphi = \{ B(T, \eta) \mid T \in \mathcal{H},\ \eta \in 2^{\Phi_T} \}$.

We now define acceptance of $\mathit{Tree}$ by $\mathcal{B}_\varphi$. An adornment of $\mathit{Tree}$ is a mapping $\alpha$ associating to each edge from $\rho_T$ to $\rho_{T_c}$ a truth assignment in $2^{\Phi_{T_c}}$. $\mathit{Tree}$ is accepted by $\mathcal{B}_\varphi$ if there exists an adornment $\alpha$ such that:
• for each local run $\rho_T$ of $T$ with no outgoing edge and incoming edge with adornment $\eta$, $\rho_T$ is accepted by $B(T, \eta)$;
• for each local run $\rho_T$ of $T$ with incoming edge labeled by $\eta$, $\alpha(\rho_T)$ is accepted by $B(T, \eta)$, where $\alpha(\rho_T)$ extends $\rho_T$ by assigning to each configuration $(\rho_j, \sigma^o_{T_c})$ the truth assignment in $2^{\Phi_{T_c}}$ adorning its outgoing edge labeled $j$. (Recall that in configurations $(I_j, \sigma_j)$ for which $\sigma_j \neq \sigma^o_{T_c}$, all formulas in $\Phi_{T_c}$ are false by definition.)
• $\alpha(\rho_{T_1})$ is accepted by the Büchi automaton $B_\xi$, where $\alpha(\rho_{T_1})$ is defined as above.

The following can be shown.

Lemma
A full tree of local runs $\mathit{Tree}$ satisfies $\varphi = [\xi]_{T_1}$ iff $\mathit{Tree}$ is accepted by $\mathcal{B}_\varphi$.
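As a rough illustration of how the propositional LTL skeleton of an HLTL-FO formula is evaluated once each proposition has been assigned a truth value at every position of a finite local run, the following Python sketch may help. The tuple encoding of formulas, the function name `holds`, and the proposition names `p`, `o`, `q` are assumptions of this sketch, not notation from the paper. The finite-trace semantics used here (next is false at the last position) is one common choice and is not necessarily the variant of Appendix B.2.

```python
# Formulas are nested tuples (a representation assumed for this sketch):
#   ("ap", name), ("not", f), ("and", f, g), ("implies", f, g),
#   ("X", f), ("F", f), ("G", f), ("U", f, g)

def holds(formula, trace, i=0):
    """Evaluate an LTL formula at position i of a non-empty finite trace.

    trace[j] is the set of propositions true at position j. Assumed
    finite-trace semantics: X f is false at the last position.
    """
    op = formula[0]
    rest = range(i, len(trace))
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "implies":
        return (not holds(formula[1], trace, i)) or holds(formula[2], trace, i)
    if op == "X":
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "F":
        return any(holds(formula[1], trace, j) for j in rest)
    if op == "G":
        return all(holds(formula[1], trace, j) for j in rest)
    if op == "U":
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, k) for k in range(i, j))
            for j in rest
        )
    raise ValueError(f"unknown operator: {op}")


# Shape of the example property: F p -> G (o -> q), where (hypothetically)
#   p: a child task is opened here and its local run satisfies its sub-formula,
#   o: another child task is opened here,
#   q: that child's local run satisfies its sub-formula.
prop = ("implies", ("F", ("ap", "p")),
        ("G", ("implies", ("ap", "o"), ("ap", "q"))))

good_run = [set(), {"p"}, {"o", "q"}, set()]
bad_run = [{"p"}, {"o"}, set()]

print(holds(prop, good_run))  # True
print(holds(prop, bad_run))   # False
```

In the setting of the text, the truth values of `p` and `q` at a position would come from the adornment of the outgoing edge at that position, which is what lets each task be checked against its own automaton.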
4. VERIFICATION WITHOUT ARITHMETIC
In this section we consider verification for the case when the artifact system and the HLTL-FO property have no arithmetic constraints. We show in Section 5 how our approach can be extended when arithmetic is present.

The roadmap to verification is the following. Let $\Gamma$ be a HAS and $\varphi = [\xi]_{T_1}$ an HLTL-FO formula over $\Gamma$. To verify that every tree of local runs of $\Gamma$ satisfies $\varphi$, we check that there is no tree of local runs satisfying $\neg\varphi = [\neg\xi]_{T_1}$, or equivalently, accepted by $\mathcal{B}_{\neg\varphi}$. Since there are infinitely many trees of local runs of $\Gamma$ due to the unbounded data domain, and each tree can be infinite, an exhaustive search is impossible. We address this problem by developing a symbolic representation of trees of local runs, called symbolic tree of runs. The symbolic representation is subtle for several reasons. First, unlike the representations in [24, 19], it is not finite state. This is because summarizing the relevant information about artifact relations requires keeping track of the number of tuples of various isomorphism types. Second, the symbolic representation does not capture the full information about the actual runs, but just enough for verification. Specifically, we show that for every HLTL-FO formula $\varphi$, there exists a tree of local runs accepted by $\mathcal{B}_\varphi$ iff there exists a symbolic tree of runs accepted by $\mathcal{B}_\varphi$. We then develop an algorithm to check the latter. The algorithm relies on reductions to state reachability problems in Vector Addition Systems with States (VASS) [14].

One might wonder whether there is a simpler approach to verification of HAS, that reduces it to verification of a flat system (consisting of a single task). This could indeed be done in the absence of artifact relations, by essentially concatenating the artifact tuples of the tasks along the hierarchy that are active at any given time, and simulating all transitions by internal services.
However, there is strong evidence that this is no longer possible when tasks are equipped with artifact relations. First, a naive simulation using a single artifact relation would require more powerful updating capabilities than available in the model. Moreover, Theorem 11 shows that LTL is undecidable for hierarchical systems, whereas the results in this section imply that it is decidable for flat ones (as it coincides with HLTL for single tasks). While this does not rule out a simulation, it shows that there can be no effective simulation natural enough to be extensible to LTL properties. A reduction to the model of [19] is even less plausible, because of the lack of artifact relations. Note that, even if a reduction were possible, the results of [19] would be of no help in obtaining our lower complexities for verification, since the algorithm provided there is non-elementary in all cases.

We next embark upon the development outlined above. We begin by defining the symbolic analog of a local run, called local symbolic run. The symbolic tree of runs is obtained by connecting the local symbolic runs similarly to the way local runs are connected in trees of local runs.

Each local symbolic run is a sequence of symbolic representations of an actual instance within a local run of a task $T$. The representation has the following ingredients:
1. the equality type of the artifact variables of $T$ and the elements in the database reachable from them by navigating foreign keys up to a specified depth $h(T)$. This is called the $T$-isomorphism type of the variables.
2. the $T$-isomorphism type of the input and return variables (if representing a returning local run)
3. for each $T$-isomorphism type of the set variables of $T$ together with the input variables, the net number of insertions of tuples of that type in $S^T$.

Intuitively, (1) and (2) are needed in order to ensure that the assumptions made about the database while navigating via foreign keys in tasks and their subtasks are consistent. The depth $h(T)$ is chosen to be sufficiently large to ensure the consistency. (3) is required in order to make sure that a retrieval from $S^T$ of a tuple with a given $T$-isomorphism type is allowed only when sufficiently many tuples of that type have been inserted in $S^T$.

We now formally define the symbolic representation, starting with $T$-isomorphism type. Let $\bar{x}^T$ be the variables of $T$. We define $h(T)$ as follows. Let FK be the foreign key graph of the schema $\mathcal{DB}$ and $F(n)$ be the maximum number of distinct paths of length at most $n$ starting from any relation $R$ in FK. Let $h(T) = 1 + |\bar{x}^T| \cdot F(\delta)$ where $\delta = 1$ if $T$ is a leaf task and $\delta = \max_{T_c \in child(T)} h(T_c)$ otherwise.

We next define expressions that denote navigation via foreign keys starting from the set of id variables $\bar{x}^T_{id}$ of $T$. For each $x \in \bar{x}^T_{id}$ and $R \in \mathcal{DB}$, let $x_R$ be a new symbol. An expression is a sequence $\xi_1.\xi_2.\ldots.\xi_m$, where $\xi_1 = x_R$ for some $x \in \bar{x}^T_{id}$ and $R \in \mathcal{DB}$, $\xi_j$ is a foreign key in some relation of $\mathcal{DB}$ for $2 \leq j < m$, $\xi_m$ is a foreign key or a numeric attribute, $\xi_2$ is an attribute of $R$, and for each $i$, $2 < i \leq m$, if $\xi_{i-1}$ is a foreign key referencing $Q$ then $\xi_i$ is an attribute of $Q$. We define the length of $\xi_1.\xi_2.\ldots.\xi_m$ as $m$. A navigation set $\mathcal{E}_T$ is a set of expressions such that:
• for each $x \in \bar{x}^T_{id}$ there is at most one $R \in \mathcal{DB}$ for which the expression $x_R$ is in $\mathcal{E}_T$;
• every expression in $\mathcal{E}_T$ is of the form $x_R.w$ where $x_R \in \mathcal{E}_T$, and has length $\leq h(T)$;
• if $e \in \mathcal{E}_T$ then every expression $e.s$ of length $\leq h(T)$ extending $e$ is also in $\mathcal{E}_T$.
Note that $\mathcal{E}_T$ is closed under prefix. We can now define $T$-isomorphism type.
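The recursive definition of the navigation depth $h(T)$ above can be sketched in Python as follows. This is only an illustration under stated assumptions: the foreign key graph is given as a dictionary from relation names to the list of relations their foreign keys reference, the empty path is counted among the "paths of length at most $n$", and the names `max_paths`, `depth_bound`, `children` and `num_vars` are hypothetical.

```python
def max_paths(fk_graph, n):
    """F(n): maximum, over relations R, of the number of distinct paths
    with at most n edges starting at R in the foreign key graph.

    Assumptions of this sketch: the empty path is counted, and two
    foreign keys referencing the same relation count as distinct edges.
    """
    exact = {r: 1 for r in fk_graph}   # paths with exactly 0 edges
    total = dict(exact)                # paths with at most k edges so far
    for _ in range(n):
        # paths with exactly k+1 edges from r: follow one key, then k edges
        exact = {r: sum(exact[q] for q in fk_graph[r]) for r in fk_graph}
        for r in fk_graph:
            total[r] += exact[r]
    return max(total.values())


def depth_bound(task, children, num_vars, fk_graph):
    """h(T) = 1 + |x^T| * F(delta), with delta = 1 for a leaf task and
    delta = max over children T_c of h(T_c) otherwise."""
    kids = children.get(task, [])
    if kids:
        delta = max(depth_bound(c, children, num_vars, fk_graph) for c in kids)
    else:
        delta = 1
    return 1 + num_vars[task] * max_paths(fk_graph, delta)


# Hypothetical two-relation acyclic schema and a two-level hierarchy.
acyclic = {"ORDERS": ["CUSTOMERS"], "CUSTOMERS": []}
cyclic = {"EMP": ["EMP"]}  # a self-referencing foreign key
children = {"T_root": ["T_leaf"]}
num_vars = {"T_root": 2, "T_leaf": 3}

print(depth_bound("T_leaf", children, num_vars, acyclic))  # 7
print(depth_bound("T_root", children, num_vars, acyclic))  # 5
print(max_paths(cyclic, 3))                                # 4
```

On the acyclic schema $F$ stays constant, whereas on the cyclic one it grows with $n$, which is consistent with the schema class affecting the size of the symbolic representation.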
Let $\mathcal{E}^+_T = \mathcal{E}_T \cup \bar{x}^T \cup \{\mathtt{null}, 0\}$. The sort of $e \in \mathcal{E}^+_T$ is numeric if $e \in \bar{x}^T_{\mathbb{R}} \cup \{0\}$ or $e = w.a$ where $a$ is a numeric attribute; its sort is null if $e = \mathtt{null}$ or $e = x \in \bar{x}^T_{id}$ and $x_R \notin \mathcal{E}_T$ for all $R \in \mathcal{DB}$; and its sort is ID($R$) for $R \in \mathcal{DB}$ if $e = x_R$, or $e = x \in \bar{x}^T_{id}$ and $x_R \in \mathcal{E}_T$, or $e = w.f$ where $f$ is a foreign key referencing $R$.

Definition A $T$-isomorphism type $\tau$ consists of a navigation set $\mathcal{E}_T$ together with an equivalence relation $\sim_\tau$ over $\mathcal{E}^+_T$ such that:
• if $e \sim_\tau f$ then $e$ and $f$ are of the same sort;
• for every $\{x, x_R\} \subseteq \mathcal{E}^+_T$, $x \sim_\tau x_R$;
• for every $e$ of sort null, $e \sim_\tau \mathtt{null}$;
• if $u \sim_\tau v$ and $u.f, v.f \in \mathcal{E}_T$ then $u.f \sim_\tau v.f$.

We call an equivalence relation $\sim_\tau$ as above an equality type for $\tau$. The relation $\sim_\tau$ is extended to tuples componentwise.

Note that $\tau$ provides enough information to evaluate conditions over $\bar{x}^T$. Satisfaction of a condition $\varphi$ by an isomorphism type $\tau$, denoted $\tau \models \varphi$, is defined as follows:
• $x = y$ holds in $\tau$ iff $x \sim_\tau y$,
• $R(x, y_1, \ldots, y_m, z_1, \ldots, z_n)$ holds in $\tau$ for relation $R(id, a_1, \ldots, a_m, f_1, \ldots, f_n)$ iff $\{x_R.a_1, \ldots, x_R.a_m, x_R.f_1, \ldots, x_R.f_n\} \subseteq \mathcal{E}_T$, and $(y_1, \ldots, y_m, z_1, \ldots, z_n) \sim_\tau (x_R.a_1, \ldots, x_R.a_m, x_R.f_1, \ldots, x_R.f_n)$,
• Boolean combinations of conditions are standard.

Let $\tau$ be a $T$-isomorphism type with navigation set $\mathcal{E}_T$ and equality type $\sim_\tau$. The projection of $\tau$ onto a subset of variables $\bar{z}$ of $\bar{x}^T$ is defined as follows. Let $\mathcal{E}_T|\bar{z} = \{x_R.e \in \mathcal{E}_T \mid x \in \bar{z}\}$ and $\sim_\tau|\bar{z}$ be the projection of $\sim_\tau$ onto $\bar{z} \cup \mathcal{E}_T|\bar{z} \cup \{\mathtt{null}, 0\}$. The projection of $\tau$ onto $\bar{z}$, denoted as $\tau|\bar{z}$, is a $T$-isomorphism type with navigation set $\mathcal{E}_T|\bar{z}$ and equality type $\sim_\tau|\bar{z}$. Furthermore, the projection of a $T$-isomorphism type onto $\bar{z}$ up to length $k$, denoted as $\tau|(\bar{z}, k)$, is defined as $\tau|\bar{z}$ with all expressions in $\mathcal{E}_T|\bar{z}$ with length more than $k$ removed.

We apply variable renaming to isomorphism types as follows.
Let $f$ be a 1-1 partial mapping from $\bar{x}^T$ to $\mathit{VAR}_{id} \cup \mathit{VAR}_{\mathbb{R}}$ such that $f(\bar{x}^T_{id}) \subseteq \mathit{VAR}_{id}$, $f(\bar{x}^T_{\mathbb{R}}) \subseteq \mathit{VAR}_{\mathbb{R}}$ and $f(\bar{x}^T) \cap \bar{x}^T = \emptyset$. For a $T$-isomorphism type $\tau$ with navigation set $\mathcal{E}_T$, $f(\tau)$ is the isomorphism type obtained as follows. Its navigation set is obtained by replacing in $\mathcal{E}_T$ each variable $x$ and $x_R$ with $f(x)$ and $f(x)_R$, for $x \in dom(f)$. The relation $\sim_{f(\tau)}$ is the image of $\sim_\tau$ under the same substitution.

As seen above, a $T$-isomorphism type captures all information needed to evaluate a condition on $\bar{x}^T$. However, the set $S^T$ can contain unboundedly many tuples, which cannot be represented by a finite equality type. This is handled by keeping a set of counters for projections of $T$-isomorphism types on the variables relevant to $S^T$, that is, $(\bar{x}^T_{in} \cup \bar{s}^T)$. We refer to the projection of a $T$-isomorphism type onto $(\bar{x}^T_{in} \cup \bar{s}^T)$ as a $TS$-isomorphism type, and denote by $TS(T)$ the set of $TS$-isomorphism types of $T$. We will use counters to record the number of tuples in $S^T$ of each $TS$-isomorphism type.

We can now define symbolic instances.
Definition
A symbolic instance $I$ of task $T$ is a tuple $(\tau, \bar{c})$ where $\tau$ is a $T$-isomorphism type and $\bar{c}$ is a vector of integers where each dimension of $\bar{c}$ corresponds to a $TS$-isomorphism type.
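To make the two components of a symbolic instance more tangible, the following Python sketch represents an equality type as a partition of expressions, checks the closure condition "if $u \sim_\tau v$ and $u.f, v.f \in \mathcal{E}_T$ then $u.f \sim_\tau v.f$", and pairs the partition with a counter vector. The string encoding of expressions (for example `"x_R.a"`), the class `SymbolicInstance`, and the helper names are all assumptions of this sketch. Sorts and the other conditions of a $T$-isomorphism type are deliberately not modeled.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, NamedTuple, Set


def class_index(partition: List[Set[str]]) -> Dict[str, int]:
    """Map each expression to the index of its equivalence class."""
    return {e: i for i, block in enumerate(partition) for e in block}


def is_congruent(partition, nav_set, attributes) -> bool:
    """Check: u ~ v and u.f, v.f in the navigation set imply u.f ~ v.f.

    Assumes every expression of nav_set occurs in some block.
    """
    idx = class_index(partition)
    for block in partition:
        for u, v in combinations(sorted(block), 2):
            for f in attributes:
                uf, vf = f"{u}.{f}", f"{v}.{f}"
                if uf in nav_set and vf in nav_set and idx[uf] != idx[vf]:
                    return False
    return True


def equal_in(partition, x: str, y: str) -> bool:
    """The condition x = y holds in the type iff x ~ y."""
    idx = class_index(partition)
    return idx[x] == idx[y]


class SymbolicInstance(NamedTuple):
    """A pair (type, counters); counters are keyed by a label that stands
    for a TS-isomorphism type (the labels are hypothetical)."""
    partition: List[FrozenSet[str]]
    counters: Dict[str, int]


nav = {"x_R", "y_R", "x_R.a", "y_R.a"}
good = [{"x", "x_R", "y", "y_R"}, {"x_R.a", "y_R.a"}, {"null"}, {"0"}]
bad = [{"x", "x_R", "y", "y_R"}, {"x_R.a"}, {"y_R.a"}, {"null"}, {"0"}]

print(is_congruent(good, nav, ["a"]))  # True
print(is_congruent(bad, nav, ["a"]))   # False: same id, different attribute a
instance = SymbolicInstance([frozenset(b) for b in good], {"ts_type_0": 0})
```

The rejected partition `bad` corresponds to a violation of the key dependency: two variables denoting the same tuple id cannot disagree on an attribute reached from that tuple.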
We denote by $\bar{c}(\hat{\tau})$ the value of the dimension of $\bar{c}$ corresponding to the $TS$-isomorphism type $\hat{\tau}$ and by $\bar{c}[\hat{\tau} \mapsto a]$ the vector obtained from $\bar{c}$ by replacing $\bar{c}(\hat{\tau})$ with $a$.

Definition A local symbolic run $\tilde{\rho}_T$ of task $T$ is a tuple $(\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \leq i < \gamma})$, where:
• each $I_i$ is a symbolic instance $(\tau_i, \bar{c}_i)$ of $T$
• each $\sigma_i$ is a service in $\Sigma^{obs}_T$
• $\gamma \in \mathbb{N} \cup \{\omega\}$ (if $\gamma = \omega$ then $\tilde{\rho}_T$ is infinite, otherwise it is finite)
• $\tau_{in}$, called the input isomorphism type, is a $T$-isomorphism type projected to $\bar{x}^T_{in}$. Moreover, $\tau_{in} \models \Pi$ if $T = T_1$.
• at the first instance $I_0$, $\tau_0|\bar{x}^T_{in} = \tau_{in}$, for every $x \in \bar{x}^T_{id} - \bar{x}^T_{in}$, $x \sim_{\tau_0} \mathtt{null}$, and for every $x \in \bar{x}^T_{\mathbb{R}} - \bar{x}^T_{in}$, $x \sim_{\tau_0} 0$. Also $\bar{c}_0 = \bar{0}$ and $\sigma_0 = \sigma^o_T$.
• if for some $i$, $\sigma_i = \sigma^c_T$ then $\tilde{\rho}_T$ is finite and $i = \gamma - 1$ (and $\tilde{\rho}_T$ is called a returning run)
• $\tau_{out}$ is $\bot$ if $\tilde{\rho}_T$ is infinite or finite but $\sigma_{\gamma-1} \neq \sigma^c_T$, and it is $\tau_{\gamma-1}|(\bar{x}^T_{in} \cup \bar{x}^T_{ret})$ otherwise
• a segment of $\tilde{\rho}_T$ is a subsequence $\{(I_i, \sigma_i)\}_{i \in J}$, where $J$ is a maximal interval $[a, b] \subseteq \{i \mid 0 \leq i < \gamma\}$ such that no $\sigma_j$ is an internal service of $T$ for $j \in [a+1, b]$. A segment $J$ is terminal if $\gamma \in \mathbb{N}$ and $b = \gamma - 1$. Segments of $\tilde{\rho}_T$ must satisfy the following properties. For each child $T_c$ of $T$ there is at most one $i \in J$ such that $\sigma_i = \sigma^o_{T_c}$. If $J$ is not terminal and such $i$ exists, there is exactly one $j \in J$ for which $\sigma_j = \sigma^c_{T_c}$, and $j > i$. If $J$ is terminal, there is at most one such $j$.
• for every $0 < i < \gamma$, $I_i$ is a successor of $I_{i-1}$ under $\sigma_i$ (see below).

The successor relation is defined next. We begin with some preliminary definitions. A
$TS$-isomorphism type $\hat{\tau}$ is input-bound if for every $s \in \bar{s}^T$, $s \not\sim_{\hat{\tau}} \mathtt{null}$ implies that there exists an expression $x_R.w$ in $\hat{\tau}$ such that $x \in \bar{x}^T_{in}$ and $x_R.w \sim_{\hat{\tau}} s$. We denote by $TS_{ib}(T)$ the set of input-bound types in $TS(T)$. For $\hat{\tau}, \hat{\tau}' \in TS(T)$, update $\delta$ of the form $\{+S^T(\bar{s}^T)\}$ or $\{-S^T(\bar{s}^T)\}$ and mapping $\bar{c}_{ib}$ from $TS_{ib}(T)$ to $\{0, 1\}$, we define the mapping $\bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$ from $TS(T)$ to $\{-1, 0, 1\}$ as follows ($\bar{a}_0$ is the mapping sending $TS(T)$ to 0):
• if $\delta = \{+S^T(\bar{s}^T)\}$, then $\bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$ is $\bar{a}_0[\hat{\tau} \mapsto 1]$ if $\hat{\tau}$ is not input-bound, and $\bar{a}_0[\hat{\tau} \mapsto (1 - \bar{c}_{ib}(\hat{\tau}))]$ otherwise
• if $\delta = \{-S^T(\bar{s}^T)\}$, then $\bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib}) = \bar{a}_0[\hat{\tau}' \mapsto -1]$
• if $\delta$ is $\{+S^T(\bar{s}^T), -S^T(\bar{s}^T)\}$ then $\bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib}) = \bar{a}(\delta^+, \hat{\tau}, \hat{\tau}', \bar{c}_{ib}) + \bar{a}(\delta^-, \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$ where $\delta^+ = \{+S^T(\bar{s}^T)\}$ and $\delta^- = \{-S^T(\bar{s}^T)\}$.

Intuitively, the vector $\bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$ specifies how the current counters need to be modified to reflect the update $\delta$. The input-bound $TS$-isomorphism types require special handling because consecutive insertions necessarily collide, so the counter's value cannot go beyond 1.

For symbolic instances $I = (\tau, \bar{c})$ and $I' = (\tau', \bar{c}')$, $I'$ is a successor of $I$ by applying service $\sigma'$ iff:
• If $\sigma'$ is an internal service $\langle \pi, \psi, \delta \rangle$, then for $\hat{\tau} = \tau|(\bar{x}^T_{in} \cup \bar{s}^T)$ and $\hat{\tau}' = \tau'|(\bar{x}^T_{in} \cup \bar{s}^T)$,
  – $\tau|\bar{x}^T_{in} = \tau'|\bar{x}^T_{in}$,
  – $\tau \models \pi$ and $\tau' \models \psi$,
  – $\bar{c}' \geq \bar{0}$ and $\bar{c}' = \bar{c} + \bar{a}(\delta, \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$, where $\bar{c}_{ib}$ is the restriction of $\bar{c}$ to $TS_{ib}(T)$.
• If $\sigma'$ is an opening service $\langle \pi, f_{in} \rangle$ of subtask $T_c$, then $\tau = \tau'$, $\tau \models \pi$ and $\bar{c}' = \bar{c}$.
• If $\sigma'$ is a closing service of subtask $T_c$, then for $\bar{x}^T_{const} = \bar{x}^T - \{x \in \bar{x}^T_{T_c^\uparrow} \mid x \sim_\tau \mathtt{null}\}$, $\tau'|\bar{x}^T_{const} = \tau|\bar{x}^T_{const}$ and $\bar{c}' = \bar{c}$.
• If $\sigma'$ is the closing service $\sigma^c_T = \langle \pi, f_{out} \rangle$ of $T$, then $\tau \models \pi$ and $(\tau, \bar{c}) = (\tau', \bar{c}')$.

Note that there is a subtle mismatch between transitions in actual local runs and in symbolic runs. In the symbolic transitions defined above, a service inserting a tuple in $S^T$ always causes the corresponding counter to increase (except for the input-bound case).
However, in actual runs, an inserted tuple may collide with an already existing tuple in the set, in which case the number of tuples does not increase. Symbolic runs do not account for such collisions (beyond the input-bound case), which raises the danger that they might overestimate the number of available tuples and allow impossible retrievals. Fortunately, the proof of Theorem 20 shows that collisions can be ignored at no peril. More specifically, it follows from the proof that for every actual local run with collisions satisfying an HLTL-FO property there exists an actual local run without collisions that satisfies the same property. The intuition is the following. First, given an actual run with collisions, one can modify it so that only new tuples are inserted in the artifact relation, thus avoiding collisions. However, this raises a challenge, since it may require augmenting the database with new tuples. If done naively, this could result in an infinite database. The more subtle observation, detailed in the proof of Theorem 20, is that only a bounded number of new tuples must be created, thus keeping the database finite.

Definition A symbolic tree of runs is a directed labeled tree $\mathit{Sym}$ in which each node is a local symbolic run $\tilde{\rho}_T$ for some task $T$, and every edge connects a local symbolic run of a task $T$ with a local symbolic run of a child task $T_c$ and is labeled with a non-negative integer $i$ (denoted $i(\tilde{\rho}_{T_c})$). In addition, the following properties are satisfied. Let $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \leq i < \gamma})$ be a node of $\mathit{Sym}$. Let $i$ be such that $\sigma_i = \sigma^o_{T_c}$ for some child $T_c$ of $T$.
There exists a unique edge labeled $i$ from $\tilde{\rho}_T$ to a node $\tilde{\rho}_{T_c} = (\tau'_{in}, \tau'_{out}, \{(I'_i, \sigma'_i)\}_{0 \leq i < \gamma'})$ of $\mathit{Sym}$, and the following hold:
• $\tau'_{in} = f^{-1}_{in}(\tau_i)|(\bar{x}^{T_c}_{in}, h(T_c))$ where $f_{in}$ is the input variable mapping of $\sigma^o_{T_c}$
• $\tilde{\rho}_{T_c}$ is a returning run iff there exists $j > i$ such that $\sigma_j = \sigma^c_{T_c}$; let $k$ be the minimum such $j$. Let $\bar{x}_r = \bar{x}^T_{T_c^\downarrow}$ and $\bar{x}_w = \{x \mid x \in \bar{x}^T_{T_c^\uparrow},\ x \sim_{\tau_{k-1}} \mathtt{null}\}$. Then $\tau_k|(\bar{x}_r \cup \bar{x}_w, h(T_c)) = ((f_{in} \circ f^{-1}_{out})(\tau_{out}))|(\bar{x}_r \cup \bar{x}_w)$ where $f_{out}$ is the output variable mapping of $\sigma^c_{T_c}$.
For every local symbolic run $\tilde{\rho}_T$ where $\gamma \neq \omega$ and $\tau_{out} = \bot$, there exists a child of $\tilde{\rho}_T$ which is not returning.

Now consider an HLTL-FO formula $\varphi = [\xi]_{T_1}$ over $\Gamma$. Satisfaction of $\varphi$ by a symbolic tree of runs is defined analogously to satisfaction by local runs, keeping in mind that as previously noted, isomorphism types of symbolic instances of $T$ provide enough information to evaluate conditions over $\bar{x}^T$. The definition of acceptance by the automaton $\mathcal{B}_\varphi$, and Lemma 14, are also immediately extended to symbolic trees of runs. We state the following.

Lemma
A symbolic tree of runs $\mathit{Sym}$ over $\Gamma$ satisfies $\varphi$ iff $\mathit{Sym}$ is accepted by $\mathcal{B}_\varphi$.

The key result enabling the use of symbolic trees of runs is the following (see Appendix for proof).
Theorem
For an artifact system $\Gamma$ and HLTL-FO property $\varphi$, there exists a tree of local runs $\mathit{Tree}$ accepted by $\mathcal{B}_\varphi$ iff there exists a symbolic tree of runs $\mathit{Sym}$ accepted by $\mathcal{B}_\varphi$.

The only-if part is relatively straightforward, but the if part is non-trivial. The construction of an accepted tree of local runs from an accepted symbolic tree of runs $\mathit{Sym}$ is done in two stages. First, an accepted tree of local runs over an infinite database is constructed, using a global equality type that extends the local equality types by taking into account connections across instances resulting from the propagation of input variables and insertions and retrievals of tuples from $S^T$, and subject to satisfaction of the key constraints. In the second stage, the infinite database is turned into a finite one by carefully merging data values, while avoiding any inconsistencies.

In view of Theorem 20, we can now focus on the problem of checking the existence of a symbolic tree of runs satisfying a given HLTL-FO property. To begin, we define a notion that captures the functionality of each task and allows a modular approach to the verification algorithm. Let $\varphi$ be an HLTL-FO formula over $\Gamma$, and recall the automaton $\mathcal{B}_\varphi$ and associated notation from Section 3. We consider the relation $\mathcal{R}_T$ between inputs and outputs of each task, defined by its symbolic runs that satisfy a given truth assignment $\beta$ to the formulas in $\Phi_T$. More specifically, we denote by $\mathcal{H}_T$ the restriction of $\mathcal{H}$ to $T$ and its descendants, and by $\Gamma_T$ the corresponding HAS, with precondition true. The relation $\mathcal{R}_T$ consists of the set of triples $(\tau_{in}, \tau_{out}, \beta)$ for which there exists a symbolic tree of runs $\mathit{Sym}_T$ of $\mathcal{H}_T$ such that:
• $\beta$ is a truth assignment to $\Phi_T$
• $\mathit{Sym}_T$ is accepted by $\mathcal{B}_\beta$
• the root of $\mathit{Sym}_T$ is $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \leq i < \gamma})$
Note that there exists a symbolic tree of runs $\mathit{Sym}$ over $\Gamma$ satisfying $\varphi = [\xi]_{T_1}$ iff $(\tau_{in}, \bot, \beta) \in \mathcal{R}_{T_1}$ for some $\tau_{in}$ satisfying the precondition of $\Gamma$, and $\beta(\xi) = 1$.
Thus, if $\mathcal{R}_T$ is computable for every $T$, then satisfiability of $[\xi]_{T_1}$ by some symbolic tree of runs over $\Gamma$ is decidable, and yields an algorithm for model-checking HLTL-FO properties of HAS's. We next describe an algorithm that computes the relations $\mathcal{R}_T(\tau_{in}, \tau_{out}, \beta)$ recursively. The algorithm uses as a key tool Vector Addition Systems with States (VASS) [14, 33], which we review next.

A VASS $\mathcal{V}$ is a pair $(Q, A)$ where $Q$ is a finite set of states and $A$ is a finite set of actions of the form $(p, \bar{a}, q)$ where $\bar{a} \in \mathbb{Z}^d$ for some fixed $d > 0$, and $p, q \in Q$. A run of $\mathcal{V} = (Q, A)$ is a finite sequence $(q_0, \bar{z}_0) \ldots (q_n, \bar{z}_n)$ where $\bar{z}_0 = \bar{0}$ and for each $i \geq 0$, $q_i \in Q$, $\bar{z}_i \in \mathbb{N}^d$, and $(q_i, \bar{a}, q_{i+1}) \in A$ for some $\bar{a}$ such that $\bar{z}_{i+1} = \bar{z}_i + \bar{a}$. We will use the following decision problems related to VASS.
• State Reachability: For given states $q_0, q_f \in Q$, is there a run $(q_0, \bar{z}_0) \ldots (q_n, \bar{z}_n)$ of $\mathcal{V}$ such that $q_n = q_f$?
• State Repeated Reachability: For given states $q_0, q_f \in Q$, is there a run $(q_0, \bar{z}_0) \ldots (q_m, \bar{z}_m) \ldots (q_n, \bar{z}_n)$ of $\mathcal{V}$ such that $q_m = q_n = q_f$ and $\bar{z}_m \leq \bar{z}_n$?

Both problems are known to be EXPSPACE-complete [39, 47, 33]. In particular, [33] shows that for an $n$-state, $d$-dimensional VASS where every dimension of each action has constant size, the state repeated reachability problem can be solved in $O((\log n) \cdot 2^{c \cdot d \log d})$ non-deterministic space for some constant $c$. The state reachability problem has the same complexity.

VASS Construction
Let $T$ be a task, and suppose that relations $\mathcal{R}_{T_c}$ have been computed for all children $T_c$ of $T$. We show how to compute $\mathcal{R}_T$ using an associated VASS. For each truth assignment $\beta$ of $\Phi_T$, we construct a VASS $\mathcal{V}(T, \beta) = (Q, A)$ as follows. The states in $Q$ are all tuples $(\tau, \sigma, q, \bar{o}, \bar{c}_{ib})$ where $\tau$ is a $T$-isomorphism type, $\sigma$ a service, $q$ a state of $B(T, \beta)$, and $\bar{c}_{ib}$ a mapping from $TS_{ib}(T)$ to $\{0, 1\}$. The vector $\bar{o}$ indicates the current stage of each child $T_c$ of $T$ (init, active or closed) and also specifies the outputs of $T_c$ (an isomorphism type or $\bot$). That is, $\bar{o}$ is a partial mapping associating to some of the children $T_c$ of $T$ the value $\bot$, a $T_c$-isomorphism type projected to $\bar{x}^{T_c}_{in} \cup \bar{x}^{T_c}_{ret}$, or the value closed. Intuitively, $T_c \notin dom(\bar{o})$ means that $T_c$ is in the init state, and $\bar{o}(T_c) = \bot$ indicates that $T_c$ has been called but will not return. If $\bar{o}(T_c)$ is an isomorphism type $\tau$, this indicates that $T_c$ has been called, has not yet returned, and will return the isomorphism type $\tau$. When $T_c$ returns, $\bar{o}(T_c)$ is set to closed, and $T_c$ cannot be called again before an internal service of $T$ is applied.

The set of actions $A$ consists of all triples $(\alpha, \bar{a}, \alpha')$ where $\alpha = (\tau, \sigma, q, \bar{o}, \bar{c}_{ib})$, $\alpha' = (\tau', \sigma', q', \bar{o}', \bar{c}'_{ib})$, $\delta'$ is the update of $\sigma'$, and the following hold:
• $\tau'$ is a successor of $\tau$ by applying service $\sigma'$;
• $\bar{a} = \bar{a}(\delta', \hat{\tau}, \hat{\tau}', \bar{c}_{ib})$ (defined in Section 4.1), where $\hat{\tau} = \tau|(\bar{x}^T_{in} \cup \bar{s}^T)$ and $\hat{\tau}' = \tau'|(\bar{x}^T_{in} \cup \bar{s}^T)$;
• $\bar{c}'_{ib} = \bar{c}_{ib} + \bar{a}$;
• if $\sigma'$ is an internal service, $dom(\bar{o}') = \emptyset$.
• If $\sigma' = \sigma^o_{T_c}$, then $T_c \notin dom(\bar{o})$ and for $\tau^{T_c}_{in} = f^{-1}_{in}(\tau|(\bar{x}^T_{T_c^\downarrow}, h(T_c)))$, for some output $\tau^{T_c}_{out}$ of $T_c$ and truth assignment $\beta^{T_c}$ to $\Phi_{T_c}$, tuple $(\tau^{T_c}_{in}, \tau^{T_c}_{out}, \beta^{T_c})$ is in $\mathcal{R}_{T_c}$. Note that $\tau^{T_c}_{out}$ can be $\bot$, which indicates that this call to $T_c$ does not return. Also, $\bar{o}' = \bar{o}[T_c \mapsto \tau^{T_c}_{out}]$.
• If $\sigma' = \sigma^c_{T_c}$, then $\bar{o}(T_c) = (f_{out} \circ f^{-1}_{in})(\tau'|(\bar{x}^T_{T_c^\downarrow} \cup \bar{x}^T_{T_c^\uparrow}, h(T_c)))$ and $\bar{o}' = \bar{o}[T_c \mapsto \mathtt{closed}]$.
• $q'$ is a successor of $q$ in $B(T, \beta)$ by evaluating $\Phi_T$ using $(\tau', \sigma')$. If $\sigma' = \sigma^o_{T_c}$, formulas in $\Phi_{T_c}$ are assigned the truth values defined by $\beta^{T_c}$.

An initial state of $\mathcal{V}(T, \beta)$ is a state of the form $v_0 = (\tau_0, \sigma_0, q_0, \bar{o}_0, \bar{c}^0_{ib})$ where $\tau_0$ is an initial $T$-isomorphism type (i.e., for every $x \in \bar{x}^T_{id} - \bar{x}^T_{in}$, $x \sim_{\tau_0} \mathtt{null}$, and for every $x \in \bar{x}^T_{\mathbb{R}} - \bar{x}^T_{in}$, $x \sim_{\tau_0} 0$), $\sigma_0 = \sigma^o_T$, $q_0$ is the successor of some initial state of $B(T, \beta)$ under $(\tau_0, \sigma_0)$, $dom(\bar{o}_0) = \emptyset$, and $\bar{c}^0_{ib} = \bar{0}$.

Computing $\mathcal{R}_T(\tau_{in}, \tau_{out}, \beta)$ from $\mathcal{V}(T, \beta)$

Checking whether $(\tau_{in}, \tau_{out}, \beta)$ is in $\mathcal{R}_T$ can be done using a (repeated) reachability test on $\mathcal{V}(T, \beta)$, as stated in the following key lemma (see Appendix for proof).
Lemma 21. $(\tau_{in}, \tau_{out}, \beta) \in \mathcal{R}_T$ iff there exists an initial state $v_0 = (\tau_0, \sigma_0, q_0, \bar{o}_0, \bar{c}^0_{ib})$ of $\mathcal{V}(T, \beta)$ for which $\tau_0|\bar{x}^T_{in} = \tau_{in}$ and the following hold:
• If $\tau_{out} \neq \bot$, then there exists a state $v_n = (\tau_n, \sigma_n, q_n, \bar{o}_n, \bar{c}^n_{ib})$ where $\tau_{out} = \tau_n|(\bar{x}^T_{in} \cup \bar{x}^T_{ret})$, $\sigma_n = \sigma^c_T$, $q_n \in Q^{fin}$ where $Q^{fin}$ is the set of accepting states of $B(T, \beta)$ for finite runs, such that $v_n$ is reachable from $v_0$. A path from $(v_0, \bar{0})$ to $(v_n, \bar{z}_n)$ is called a returning path.
• If $\tau_{out} = \bot$, then one of the following holds:
  – there exists a state $v_n = (\tau_n, \sigma_n, q_n, \bar{o}_n, \bar{c}^n_{ib})$ in which $q_n \in Q^{inf}$ where $Q^{inf}$ is the set of accepting states of $B(T, \beta)$ for infinite runs, such that $v_n$ is repeatedly reachable from $v_0$. A path $(v_0, \bar{0}) \ldots (v_n, \bar{z}_n) \ldots (v_n, \bar{z}'_n)$ where $\bar{z}_n \leq \bar{z}'_n$ is called a lasso path.
  – there exists a state $v_n = (\tau_n, \sigma_n, q_n, \bar{o}_n, \bar{c}^n_{ib})$ in which $\bar{o}_n(T_c) = \bot$ for some child $T_c$ of $T$ and $q_n \in Q^{fin}$, such that $v_n$ is reachable from $v_0$. The path from $(v_0, \bar{0})$ to $(v_n, \bar{z}_n)$ is called a blocking path.

Complexity of Verification
We now have all ingredients in place for our verification algorithm. Let $\Gamma$ be a HAS and $\varphi = [\xi]_{T_1}$ an HLTL-FO formula over $\Gamma$. In view of the previous development, $\Gamma \models \varphi$ iff $[\neg\xi]_{T_1}$ is not satisfiable by a symbolic tree of runs of $\Gamma$. We outline a non-deterministic algorithm for checking satisfiability of $[\neg\xi]_{T_1}$, and establish its space complexity $O(f)$, where $f$ is a function of the relevant parameters. The space complexity of verification (the complement) is then $O(f^2)$ by Savitch's theorem [48].

Recall that $[\neg\xi]_{T_1}$ is satisfiable by a symbolic tree of runs of $\Gamma$ iff $(\tau_{in}, \bot, \beta) \in \mathcal{R}_{T_1}$ for some $\tau_{in}$ satisfying the precondition of $\Gamma$, and $\beta(\neg\xi) = 1$. By Lemma 21, membership in $\mathcal{R}_{T_1}$ can be reduced to state (repeated) reachability in the VASS $\mathcal{V}(T_1, \beta)$. For a given VASS, (repeated) reachability is decided by non-deterministically generating runs of the VASS up to a certain length, using space $O(\log n \cdot 2^{c \cdot d \log d})$ where $n$ is the number of states, $d$ is the vector dimension and $c$ is a constant [33]. The same approach can be used for the VASS $\mathcal{V}(T_1, \beta)$, with the added complication that generating transitions requires membership tests in the relations $\mathcal{R}_{T_c}$ for $T_c \in child(T_1)$. These in turn become (repeated) reachability tests in the corresponding VASS. Assuming that $n$ and $d$ are upper bounds for the number of states and dimensions for all $\mathcal{V}(T, \beta)$ with $T \in \mathcal{H}$, this yields a total space bound of $O(h \log n \cdot 2^{c \cdot d \log d})$ for membership testing in $\mathcal{V}(T_1, \beta)$, where $h$ is the depth of $\mathcal{H}$.

| | Acyclic | Linearly-Cyclic | Cyclic |
|---|---|---|---|
| w/o artifact relations | $c \cdot N^{O(1)}$ | $O(N^{c \cdot h})$ | $h$-$\exp(O(N))$ |
| w. artifact relations | $O(\exp(N^c))$ | $O(2$-$\exp(N^{c \cdot h}))$ | $(h+2)$-$\exp(O(N))$ |

Table 1: Space complexity of verification without arithmetic ($N$: size of $(\Gamma, \varphi)$; $h$: depth of hierarchy; $c$: constants depending on the schema)

In our construction of $\mathcal{V}(T, \beta)$, the vector dimension $d$ is the number of $TS$-isomorphism types. The number of states $n$ is at most the product of the number of $T$-isomorphism types, the number of states in $B(T, \beta)$, the number of all possible $\bar{o}$ and the number of possible states of $\bar{c}_{ib}$. The worst-case complexity occurs for HAS with unrestricted schemas (cyclic foreign keys) and artifact relations. To understand the impact of the foreign key structure and artifact relations, we also consider the complexity for acyclic and linear-cyclic schemas, and without artifact relations. A careful analysis yields the following (see Appendix C.3). For better readability, we state the complexity for HAS over a fixed schema (database and maximum arity of artifact relations). The impact of the schema is detailed in Appendix C.3.

Theorem
Let $\Gamma$ be a HAS over a fixed schema and $\varphi$ an HLTL-FO formula over $\Gamma$. The deterministic space complexity of checking whether $\Gamma \models \varphi$ is summarized in Table 1.

Note that the worst-case space complexity is non-elementary, as for feedback-free systems [19]. However, the height of the tower of exponentials in [19] is the square of the total number of artifact variables of the system, whereas in our case it is the depth of the hierarchy, likely to be much smaller.
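To make the VASS notions used in this section more concrete, here is a small Python sketch of state reachability. It is a bounded breadth-first search, not the EXPSPACE procedure of [33]: configurations in which some counter exceeds `max_counter` are simply not explored, so a negative answer is only meaningful up to that bound. The function name `reachable`, the bound parameter and the toy actions are assumptions of this sketch.

```python
from collections import deque


def reachable(actions, start, target, dim, max_counter):
    """Bounded search for state reachability in a VASS.

    actions: iterable of triples (p, vec, q), with vec a tuple of dim integers.
    A run starts in (start, 0...0); counters must stay non-negative.
    Assumption of this sketch: counters above max_counter are not explored,
    so False means 'not reachable within the bound'.
    """
    init = (start, (0,) * dim)
    seen = {init}
    queue = deque([init])
    while queue:
        state, vec = queue.popleft()
        if state == target:
            return True
        for p, delta, q in actions:
            if p != state:
                continue
            nxt = tuple(v + d for v, d in zip(vec, delta))
            if any(v < 0 or v > max_counter for v in nxt):
                continue  # counters range over the naturals (and the bound)
            cfg = (q, nxt)
            if cfg not in seen:
                seen.add(cfg)
                queue.append(cfg)
    return False


# One counter standing for the tuples of a single type held in an artifact
# relation: an insertion adds 1, a retrieval subtracts 1.
with_insert = [("idle", (1,), "idle"), ("idle", (-1,), "got")]
retrieve_only = [("idle", (-1,), "got")]

print(reachable(with_insert, "idle", "got", dim=1, max_counter=3))    # True
print(reachable(retrieve_only, "idle", "got", dim=1, max_counter=3))  # False
```

The second call fails because the retrieval would drive the counter below zero, which mirrors the requirement that a tuple of a given type can be retrieved only after enough tuples of that type have been inserted.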
5. VERIFICATION WITH ARITHMETIC
We next outline the extension of our verification algorithm to handle HAS and HLTL-FO properties whose conditions use arithmetic constraints expressed as polynomial inequalities with integer coefficients over the numeric variables (ranging over $\mathbb{R}$). We note that one could alternatively limit the arithmetic constraints to linear inequalities with integer coefficients (and variables ranging over $\mathbb{Q}$), with the same complexity results. These are sufficient for many applications.

The seed idea behind our approach is that, in order to determine whether the arithmetic constraints are satisfied, we do not need to keep track of actual valuations of the task variables and the numeric navigation expressions they anchor (for which the search space would be infinite). Instead, we show that these valuations can be partitioned into a finite set of equivalence classes with respect to satisfaction of the arithmetic constraints, which we then incorporate into the isomorphism types of Section 4, extending the algorithm presented there. This however raises some significant technical challenges, which we discuss next.

(Footnote to Tables 1 and 2: $k$-$\exp$ is the tower of exponential functions of height $k$.)

Intuitively, this approach uses the fact that a finite set of polynomials $\mathcal{P}$ partitions the space into a bounded number of cells containing points located in the same region ($= 0$, $< 0$, $> 0$) with respect to every polynomial $P \in \mathcal{P}$. Isomorphism types are extended to include a cell, which determines which arithmetic constraints are satisfied in the conditions of services and in the property. In addition to the requirements detailed in Section 4, we need to enforce cell compatibility across symbolic service calls. For instance, when a task executes an internal service, the corresponding symbolic transition from cell $c$ to $c'$ is possible only if the projections of $c$ and $c'$ on the subspace corresponding to the task's input variables have non-empty intersection (since input variables are preserved). Similarly, when the opening or closing service of a child task is called, compatibility is required between the parent's and the child's cell on the shared variables, which amounts again to non-empty intersection between cell projections. This suggests the following first-cut (and problematic) attempt at a verification algorithm: once a local transition imposes new constraints, represented by a cell $c'$, these constraints are propagated back to previously guessed cells, refining them via intersection with $c'$. If an intersection becomes empty, the candidate symbolic run constructed so far has no corresponding actual run and the search is pruned. The problem with this attempt is that it is incompatible with the way we deal with sets in Section 4: the contents of sets are represented by associating counters to the isomorphism types of their elements. Since extended isomorphism types include cells, retroactive cell intersection invalidates the counters and the results of previous VASS reachability checks.

We develop an alternative solution that avoids retroactive cell intersection altogether.
More specifically, for each task, our algorithm extends isomorphism types with cells guessed from a pre-computed set constructed by following the task hierarchy bottom-up and including in the parent’s set those cells obtained by appropriately projecting the children’s cells on shared variables and expressions. Only non-empty cells are retained. We call the resulting cell collection the Hierarchical Cell Decomposition (HCD).

The key benefit of the HCD is that it arranges the space of cells so that consistency of a symbolic run can be guaranteed by performing simple local compatibility tests on the cells involved in each transition. Specifically, (i) in the case of internal service calls, the next cell c′ must refine the current cell c on the shared variables (that is, the projection of c′ must be contained in the projection of c); (ii) in the case of child task opening/closing services, the parent cell c must refine the child cell c′. This ensures that in case (i) the intersection with c′ of all relevant previously guessed cells is non-empty (because we only guess non-empty cells and c′ refines all prior guesses), and in case (ii) the intersection with the child’s cell c′ is a no-op for the parent cell. Consequently, retroactive intersection can be skipped as it can never lead to empty cells.

A natural starting point for constructing the HCD is to gather for each task all the polynomials appearing in its arithmetic constraints (or in the property sub-formulas referring to that task), and associate sign conditions to each. This turns out to be insufficient. For example, the projection from the child cell can impose on the parent variables new constraints which do not appear explicitly in the parent task. It is a priori not obvious that the constrained cells can be represented symbolically, let alone efficiently computed. The tool enabling our solution is the Tarski-Seidenberg Theorem [52], which ensures that the projection of a cell is representable by a union of cells defined by a set of polynomials (computed from the original ones) and sign conditions for them. The polynomials can be efficiently computed using quantifier elimination.

Observe that a bound on the number of newly constructed polynomials yields a bound on the number of cells in the HCD, which in turn implies a bound on the number of distinct extended isomorphism types manipulated by the verification algorithm, ultimately yielding decidability of verification. A naive analysis produces a bound on the number of cells that is hyperexponential in the height of the task hierarchy, because the number of polynomials can proliferate at this rate when constructing all possible projections, and p polynomials may produce 3^p cells. Fortunately, a classical result from real algebraic geometry ([4], reviewed in Appendix D.2) bounds the number of distinct non-empty cells to only exponential in the number of variables (the exponent is independent of the number of polynomials). This yields an upper bound on the number of cells (and also the number of extended isomorphism types) which is singly exponential in the number of numeric expressions and doubly exponential in the height of the hierarchy H. We state below our complexity results for verification with arithmetic, relegating details (including a fine-grained analysis) to Appendix D.

                          Acyclic               Linearly-Cyclic       Cyclic
w/o. Artifact relations   O(exp(N^(c·h)))       O(exp(N^(c·h)))       (h + 1)-exp(O(N))
w. Artifact relations     O(2-exp(N^(c·h)))     O(2-exp(N^(c·h)))     (h + 2)-exp(O(N))

Table 2: Space complexity of verification with arithmetic (N: size of (Γ, ϕ); h: depth of hierarchy; c: constants depending on the schema)

Theorem
Let Γ be a HAS over a fixed database schema and ϕ an HLTL-FO formula over Γ. If arithmetic is allowed in (Γ, ϕ), then the deterministic space complexity of checking whether Γ ⊨ ϕ is summarized in Table 2.
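The notion of a cell used in this section can be illustrated with a small sketch: a cell is identified by the vector of signs that a fixed set of polynomials takes at a point. The three polynomials below, the helper names, and the sampling grid are all hypothetical choices for illustration. Sampling only discovers cells that happen to contain a sample point, so it is not the symbolic construction (quantifier elimination) that the algorithm relies on.

```python
from itertools import product


def sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_vector(polys, point):
    """Sign condition of every polynomial at the given point."""
    return tuple(sign(p(*point)) for p in polys)


# Hypothetical polynomials over two numeric variables x and y.
POLYS = (
    lambda x, y: x,
    lambda x, y: y,
    lambda x, y: x - y,
)


def sampled_cells(polys, grid):
    """Distinct sign vectors hit by the points of grid x grid."""
    return {sign_vector(polys, pt) for pt in product(grid, repeat=2)}


if __name__ == "__main__":
    cells = sampled_cells(POLYS, range(-2, 3))
    # Three polynomials admit 3**3 sign vectors, but far fewer are non-empty.
    print(len(cells), "non-empty cells found out of", 3 ** len(POLYS))
    # Two points with the same sign vector lie in the same cell.
    print(sign_vector(POLYS, (2, 1)) == sign_vector(POLYS, (1.5, 0.25)))
```

For these three lines through the origin, the plane splits into 6 open sectors, 6 rays, and the origin, so only 13 of the 27 conceivable sign vectors correspond to non-empty cells. This is a toy instance of the gap between the naive 3^p count and the number of cells that actually need to be considered.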
6. RESTRICTIONS AND UNDECIDABILITY
We briefly review the main restrictions imposed on the HAS model and motivate them by showing that they are needed to ensure decidability of verification. Specifically, recall that the following restrictions are placed in the model:
1. in an internal transition of a given task (caused by an internal service), only the input parameters of the task are explicitly propagated from one artifact tuple to the next
2. each task may overwrite upon return only null variables in the parent task
3. the artifact variables of a task storing the values returned by its subtasks are disjoint from the task’s input variables
4. an internal transition can take place only if all active subtasks have returned
5. each task has just one artifact relation
6. the artifact relation of a task is reset to empty every time the task closes
7. the tuple of artifact variables whose value is inserted or retrieved from a task’s artifact relation is fixed
8. each subtask may be called at most once between internal transitions of its parent
These restrictions are placed in order to control the data flow and recursive computation in the system. Lifting any of them leads to undecidability of verification, as stated informally next.

Theorem
For each i, 1 ≤ i ≤ 8, let HAS(i) be defined identically to HAS but without restriction (i) above. It is undecidable, given a HAS(i) Γ and an HLTL-FO formula ϕ over Γ, whether Γ ⊨ ϕ.

The proofs of undecidability for (1)-(7) are by reduction from the Post Correspondence Problem (PCP) [46, 48]. They make no use of arithmetic, so undecidability holds even without arithmetic constraints. The only undecidability result relying on arithmetic is (8). Indeed, restriction (8) can be lifted in the absence of numeric variables, with no impact on decidability or complexity of verification. This is because restriction (2) ensures that even if a subtask is called repeatedly, only a bounded number of calls have a non-vacuous effect.

The proofs using a reduction from the PCP rely on the same main idea: removing the restriction makes it possible to extract from the database a path of unbounded length in a labeled graph, and to check that its labels spell a solution to the PCP. For illustration, the proof of undecidability for (2) using this technique is sketched in Appendix E.

We claim that the above restrictions remain sufficiently permissive to capture a wide class of applications of practical interest. This is confirmed by numerous examples of practical business processes modeled as artifact systems that we encountered in our collaboration with IBM (see [19]). The restrictions limit the recursion and data flow among tasks and services. In practical workflows, the required recursion is rarely powerful enough to allow unbounded propagation of data among services. Instead, as also discussed in [19], recursion is often due to two scenarios:
• allowing a certain task to undo and retry an unbounded number of times, with each retrial independent of previous ones, and depending only on a context that remains unchanged throughout the retrial phase (its input parameters). A typical example is repeatedly providing credit card information until the payment goes through, while the order details remain unchanged.
• allowing a task to batch-process an unbounded collection of records, each processed independently, with unchanged input parameters (e.g. sending invitations to an event to all attendees on the list, for the same event details).
Such recursive computation can be expressed with the above restrictions, which are satisfied by our example provided in Appendix A.1.
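The remark that restriction (2) bounds the number of subtask calls with a non-vacuous effect can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the model's definition: it represents null by `None`, covers only ID-like variables, and simply ignores returned values aimed at non-null parent variables (how the actual model treats such returns is not specified in this section). The name `apply_return` is hypothetical.

```python
def apply_return(parent_vars, returned):
    """Copy returned values into parent variables that are still null.

    Returns the updated variable map and a flag telling whether the
    return had any effect. Writes to non-null variables are ignored,
    which is an assumption of this sketch.
    """
    updated = dict(parent_vars)
    changed = False
    for name, value in returned.items():
        if updated.get(name) is None and value is not None:
            updated[name] = value
            changed = True
    return updated, changed


if __name__ == "__main__":
    state = {"flight_id": None, "hotel_id": None}
    state, first = apply_return(state, {"flight_id": "f1"})
    state, second = apply_return(state, {"flight_id": "f2"})
    # The second return is vacuous: flight_id was no longer null.
    print(state, first, second)
```

In this toy setting each effective return fills at least one null variable, so the number of effective returns between resets is at most the number of variables, regardless of how many times the subtask is called.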
7. RELATED WORK
We have already discussed our own prior related work in the introduction. We summarize next other related work on verification of artifact systems.

Initial work on formal analysis of artifact-based business processes in restricted contexts has investigated reachability [31, 32], general temporal constraints [32], and the existence of complete execution or dead end [12]. For each considered problem, verification is generally undecidable; decidability results were obtained only under rather severe restrictions, e.g., restricting all pre-conditions to be “true” [31], restricting to bounded domains [32, 12], or restricting the pre- and post-conditions to be propositional, and thus not referring to data values [32]. [17] adopts an artifact model variation with arithmetic operations but no database. Decidability relies on restricting runs to bounded length. [56] addresses the problem of the existence of a run that satisfies a temporal property, for a restricted case with no database and only propositional LTL properties. All of these works model no underlying database, sets (artifact relations), task hierarchy, or arithmetic.

A recent line of work has tackled verification of artifact-centric processes with an underlying relational database. [6, 5, 7, 8, 21] evolve the business process model and property language, culminating in [34], which addresses verification of first-order µ-calculus (hence branching time) properties over business processes expressed in a framework that is equivalent to artifact systems whose input is provided by external services. [9, 16] extend the results of [34] to artifact-centric multi-agent systems where the property language is a version of first-order branching-time temporal-epistemic logic expressing the knowledge of the agents. This line of work uses variations of a business process model called DCDS (data-centric dynamic systems), which is sufficiently expressive to capture the GSM model, as shown in [50]. In their unrestricted form, DCDS and HAS have similar expressive power. However, the difference lies in the tackled verification problem and in the restrictions imposed to achieve decidability. We check satisfaction of linear-time properties for every possible choice of initial database instance, whereas the related line checks branching-time properties and assumes that the initial database is given. None of the related works address arithmetic. In the absence of arithmetic, the restrictions introduced for decidability are incomparable (neither subsumes the other).

Beyond artifact systems, there is a plethora of literature on data-centric processes, dealing with various static analysis problems and also with runtime monitoring and synthesis. We discuss the most related works here and refer the reader to the surveys [15, 25] for more. Static analysis for semantic web services is considered in [43], but in a context restricted to finite domains. The works [26, 51, 2] are ancestors of [24] from the context of verification of electronic commerce applications. Their models could conceptually (if not naturally) be encoded in HAS but correspond only to particular cases supporting no arithmetic, sets, or hierarchies. Also, they limit external inputs to essentially come from the active domain of the database, thus ruling out fresh values introduced during the run.
8. CONCLUSION
We showed decidability of verification for a rich artifact model capturing core elements of IBM’s successful GSM system: task hierarchy, concurrency, database keys and foreign keys, arithmetic constraints, and richer artifact data. The extended framework requires the use of novel techniques including nested Vector Addition Systems and a variant of quantifier elimination tailored to our context. We improve significantly on previous work on verification of artifact systems with arithmetic [19], which only exhibits non-elementary upper bounds regardless of the schema shape, even absent artifact relations. In contrast, for acyclic and linearly-cyclic schemas, even in the presence of arithmetic and artifact relations, our new upper bounds are elementary (doubly-exponential in the input size and triply-exponential in the depth of the hierarchy). This brings the verification algorithm closer to practical relevance, particularly since its complexity gracefully reduces to PSPACE (for acyclic schemas) and EXPSPACE in the hierarchy depth (for linearly-cyclic schemas) when arithmetic and artifact relations are not present. The sole remaining case of non-elementary complexity occurs for arbitrary cyclic schemas. Altogether, our results provide substantial new insight and techniques for the automatic verification of realistic artifact systems.
Acknowledgement
This work was supported in part by the National Science Foundation under award IIS-1422375.
9. REFERENCES
JCSS, 61(2):236–269, 2000. Extended abstract in PODS 98.
[3] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry (Algorithms and Computation in Mathematics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[4] S. Basu, R. Pollack, and M.-F. Roy. On the number of cells defined by a family of polynomials on a variety. Mathematika, 43(1):120–126, 1996.
[5] F. Belardinelli, A. Lomuscio, and F. Patrizi. A computationally-grounded semantics for artifact-centric systems and abstraction results. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pages 738–743, 2011.
[6] F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of deployed artifact systems via data abstraction. In Service-Oriented Computing - 9th International Conference, ICSOC 2011, Paphos, Cyprus, December 5-8, 2011 Proceedings, pages 142–156, 2011.
[7] F. Belardinelli, A. Lomuscio, and F. Patrizi. An abstraction technique for the verification of artifact-centric systems. In Principles of Knowledge Representation and Reasoning: Proceedings of the Thirteenth International Conference, KR 2012, Rome, Italy, June 10-14, 2012, 2012.
[8] F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of gsm-based artifact-centric systems through finite abstraction. In Service-Oriented Computing - 10th International Conference, ICSOC 2012, Shanghai, China, November 12-15, 2012. Proceedings, pages 17–31, 2012.
[9] F. Belardinelli, A. Lomuscio, and F. Patrizi. Verification of agent-based artifact systems. J. Artif. Intell. Res. (JAIR), 51:333–376, 2014.
[10] K. Bhattacharya, N. S. Caswell, S. Kumaran, A. Nigam, and F. Y. Wu. Artifact-centered operational modeling: Lessons from customer engagements. IBM Systems Journal, 46(4):703–721, 2007.
[11] K. Bhattacharya et al. A model-driven approach to industrializing discovery processes in pharmaceutical research. IBM Systems Journal, 44(1):145–162, 2005.
[12] K. Bhattacharya, C. E. Gerede, R. Hull, R. Liu, and J. Su. Towards formal analysis of artifact-centric business process models. In Proc. Int. Conf. on Business Process Management (BPM), pages 288–304, 2007.
[13] BizAgi and Cordys and IBM and Oracle and SAP AG and Singularity (OMG Submitters) and Agile Enterprise Design and Stiftelsen SINTEF and TIBCO and Trisotech (Co-Authors). Case Management Model and Notation (CMMN), FTF Beta 1, Jan. 2013. OMG Document Number dtc/2013-01-01, Object Management Group.
[14] M. Blockelet and S. Schmitz. Model checking coverability graphs of vector addition systems. In Mathematical Foundations of Computer Science 2011, pages 108–119. Springer, 2011.
[15] D. Calvanese, G. De Giacomo, and M. Montali. Foundations of data-aware process analysis: a database theory perspective. In PODS, pages 1–12, 2013.
[16] D. Calvanese, G. Delzanno, and M. Montali. Verification of relational multiagent systems with data types. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, January 25-30, 2015, Austin, Texas, USA, pages 2031–2037, 2015.
[17] D. Calvanese, G. D. Giacomo, R. Hull, and J. Su. Artifact-centric workflow dominance. In ICSOC/ServiceWave, pages 130–143, 2009.
[18] T. Chao et al. Artifact-based transformation of IBM Global Financing: A case study. In BPM, 2009.
[19] E. Damaggio, A. Deutsch, and V. Vianu. Artifact systems with data dependencies and arithmetic. ACM Trans. Database Syst., 37(3):22, 2012. Also in ICDT 2011.
[20] E. Damaggio, R. Hull, and R. Vaculín. On the equivalence of incremental and fixpoint semantics for business artifacts with guard-stage-milestone lifecycles. Information Systems, 38:561–584, 2013.
[21] G. De Giacomo, R. D. Masellis, and R. Rosati. Verification of conjunctive artifact-centric services. Int. J. Cooperative Inf. Syst., 21(2):111–140, 2012.
[22] G. De Giacomo and M. Y. Vardi. Linear temporal logic and linear dynamic logic on finite traces. In Proceedings of the Twenty-Third international joint conference on Artificial Intelligence, pages 854–860. AAAI Press, 2013.
[23] H. de Man. Case management: Cordys approach. BPTrends, 2009.
[24] A. Deutsch, R. Hull, F. Patrizi, and V. Vianu. Automatic verification of data-centric business processes. In ICDT, pages 252–267, 2009.
[25] A. Deutsch, R. Hull, and V. Vianu. Automatic verification of database-centric systems. SIGMOD Record, 43(3):5–17, 2014.
[26] A. Deutsch, L. Sui, and V. Vianu. Specification and verification of data-driven web applications. JCSS, 73(3):442–474, 2007.
[27] V. Diekert and P. Gastin. Pure future local temporal logics are expressively complete for Mazurkiewicz traces. In LATIN 2004: Theoretical Informatics, 6th Latin American Symposium, Buenos Aires, Argentina, April 5-8, 2004, Proceedings, pages 232–241, 2004.
[28] V. Diekert and P. Gastin. Pure future local temporal logics are expressively complete for Mazurkiewicz traces. Inf. Comput., 204(11):1597–1619, 2006.
[29] V. Diekert and G. Rozenberg. The Book of Traces. World Scientific, Singapore, 1995.
[30] E. A. Emerson. Temporal and modal logic. In J. V. Leeuwen, editor, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, pages 995–1072. North-Holland Pub. Co./MIT Press, 1990.
[31] C. E. Gerede, K. Bhattacharya, and J. Su. Static analysis of business artifact-centric operational models. In IEEE International Conference on Service-Oriented Computing and Applications, 2007.
[32] C. E. Gerede and J. Su. Specification and verification of artifact behaviors in business process models. In Proceedings of 5th International Conference on Service-Oriented Computing (ICSOC), Vienna, Austria, September 2007.
[33] P. Habermehl. On the complexity of the linear-time µ-calculus for petri nets. In Application and Theory of Petri Nets 1997, pages 102–116. Springer, 1997.
[34] B. B. Hariri, D. Calvanese, G. De Giacomo, A. Deutsch, and M. Montali. Verification of relational data-centric dynamic systems with external services. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, New York, NY, USA - June 22 - 27, 2013, pages 163–174, 2013.
[35] J. Heintz, P. Solernó, and M. Roy. On the complexity of semialgebraic sets. In IFIP Congress, pages 293–298, 1989.
[36] R. Hull, E. Damaggio, R. D. Masellis, F. Fournier, M. Gupta, F. H. III, S. Hobson, M. Linehan, S. Maradugu, A. Nigam, P. Sukaviriya, and R. Vaculín. Business artifacts with guard-stage-milestone lifecycles: Managing artifact interactions with conditions and events. In ACM DEBS, 2011.
[37] H. Kamp. Tense logic and the theory of linear order, 1968. Phd thesis, University of California, Los Angeles.
[38] R. Kimball and M. Ross. The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons, 2011.
[39] R. Lipton. The reachability problem requires exponential space. Research Report 62, Department of Computer Science, Yale University, New Haven, Connecticut, 1976.
[40] M. Marin, R. Hull, and R. Vaculín. Data centric bpm and the emerging case management standard: A short survey. In BPM Workshops, 2012.
[41] R. Mayr. Undecidable problems in unreliable computations. Theoretical Computer Science, 297(1):337–354, 2003.
[42] A. Mazurkiewicz. Concurrent program schemes and their interpretation. DAIMI Rep. PB 78, Aarhus University, Aarhus, 1977.
[43] S. Narayanan and S. McIlraith. Simulation, verification and automated composition of web services. In Intl. World Wide Web Conf. (WWW2002), 2002.
[44] A. Nigam and N. S. Caswell. Business artifacts: An approach to operational specification. IBM Systems Journal, 42(3):428–445, 2003.
[45] D. Peled. Combining partial order reductions with on-the-fly model-checking. In Computer aided verification, pages 377–390. Springer, 1994.
[46] E. L. Post. Recursive unsolvability of a problem of Thue. J. of Symbolic Logic, 12:1–11, 1947.
[47] C. Rackoff. The covering and boundedness problems for vector addition systems. Theoretical Computer Science, 6(2):223–231, 1978.
[48] M. Sipser. Introduction to the theory of computation. PWS Publishing Company, 1997.
[49] A. P. Sistla, M. Y. Vardi, and P. Wolper. The complementation problem for Büchi automata with applications to temporal logic. Theoretical Computer Science, 49:217–237, 1987.
[50] D. Solomakhin, M. Montali, S. Tessaris, and R. D. Masellis. Verification of artifact-centric systems: Decidability and modeling issues. In Service-Oriented Computing - 11th International Conference, ICSOC 2013, Berlin, Germany, December 2-5, 2013, Proceedings, pages 252–266, 2013.
[51] M. Spielmann. Verification of relational transducers for electronic commerce. JCSS, 66(1):40–65, 2003. Extended abstract in PODS 2000.
[52] A. Tarski. A decision method for elementary algebra and geometry. 1951.
[53] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification. In LICS, 1986.
[54] P. Vassiliadis and T. Sellis. A survey of logical models for olap databases. ACM Sigmod Record, 28(4):64–69, 1999.
[55] P. Wolper, M. Y. Vardi, P. Sistla, et al. Reasoning about infinite computation paths. In Foundations of Computer Science, 1983., 24th Annual Symposium on, pages 185–194. IEEE, 1983.
[56] X. Zhao, J. Su, H. Yang, and Z. Qiu. Enforcing constraints on life cycles of business artifacts. In TASE, pages 111–118, 2009.
[57] W.-D. Zhu et al. Advanced Case Management with IBM Case Manager. Available at .

APPENDIX
A. EXAMPLES
In this section we provide an example of HAS modeling a simple travel booking business process similar to Expedia [1]. We also show an example property that the process should satisfy, using HLTL-FO.
A.1 Example Hierarchical Artifact System
The artifact system captures a process where a customer books flights and/or makes hotel reservations. The customer starts with constructing a trip by adding a flight and/or hotel reservation to it. During this time, the customer has the choice to store the trip as a candidate or retrieve a previously stored trip. Once the customer has made a decision, she can proceed to book the trip. If a hotel reservation is made together with certain flights, a discount price may be applied to the hotel reservation. In addition, the hotel reservation can be made by itself, together with the flight, or even after the flight is purchased. After submitting a valid payment, the customer is able to cancel the flight and/or the hotel reservation and receive a refund. If the customer cancels the purchase of a flight, she cannot receive the discount on the hotel reservation.

The hierarchical artifact system has the following database schema:
• FLIGHTS(id, price, comp_hotel_id)
• HOTELS(id, unit_price, discount_price)
In the schema, the id’s are key attributes, price, unit_price, discount_price are non-key attributes, and comp_hotel_id is a foreign key attribute satisfying the dependency FLIGHTS[comp_hotel_id] ⊆ HOTELS[id]. Intuitively, each flight stored in the FLIGHTS table has a hotel compatible for discount. If a flight is purchased together with a compatible hotel reservation, a discount is applied on the hotel reservation. Otherwise, the full price needs to be paid.

The artifact system has 6 tasks: “T1: ManageTrips”, “T2: AddHotel”, “T3: AddFlight”, “T4: BookInitialTrip”, “T5: Cancel” and “T6: AlsoBookHotel”, which form the hierarchy represented in Figure 1.

[Task hierarchy diagram with nodes T1: ManageTrips, T2: AddHotel, T3: AddFlight, T4: BookInitialTrip, T5: Cancel, T6: AlsoBookHotel]
Figure 1: Tasks Hierarchy
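As a reading aid, the hierarchy can be written down as a small data structure. The sketch below is not part of the specification. The edges from ManageTrips to AddHotel and AddFlight, and from AddHotel to AlsoBookHotel, follow the task descriptions in this appendix, while placing BookInitialTrip and Cancel directly under ManageTrips is our reading of the figure. The names `TASK_CHILDREN`, `depth`, and `parent_of` are hypothetical, and measuring depth in edges is an assumed convention.

```python
# Child lists for each task; the placement of BookInitialTrip and Cancel
# under ManageTrips is an assumption based on the figure.
TASK_CHILDREN = {
    "ManageTrips": ["AddHotel", "AddFlight", "BookInitialTrip", "Cancel"],
    "AddHotel": ["AlsoBookHotel"],
    "AddFlight": [],
    "BookInitialTrip": [],
    "Cancel": [],
    "AlsoBookHotel": [],
}


def depth(task, children=TASK_CHILDREN):
    """Number of edges on the longest path from task down to a leaf."""
    kids = children.get(task, [])
    if not kids:
        return 0
    return 1 + max(depth(kid, children) for kid in kids)


def parent_of(task, children=TASK_CHILDREN):
    """Return the parent of task, or None for the root."""
    for parent, kids in children.items():
        if task in kids:
            return parent
    return None


if __name__ == "__main__":
    print("depth of hierarchy:", depth("ManageTrips"))
    print("parent of AlsoBookHotel:", parent_of("AlsoBookHotel"))
```

With this encoding the example hierarchy is shallow (two edges from the root to AlsoBookHotel), which is the kind of small depth the complexity bounds of the main sections are parameterized by.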
The process can be described informally as follows. The customer starts with task ManageTrips, where the customer can add a flight and/or hotel to the trip by calling the AddHotel or the AddFlight tasks. The customer is also allowed to store candidate trips in an artifact relation TRIPS and retrieve previously stored trips. (Note that for simplicity, our example considers only outbound flights in the trip. Return flights can be added by a simple extension to the specification.) After the customer has made a decision, the BookInitialTrip task is called to book the trip and the payment is processed. The process also mimics a key feature of Expedia as follows. After payment is made successfully, if the customer booked the flight with no hotel reservation, then she has the opportunity to add a hotel reservation by calling the AddHotel task. When she does so, the task AlsoBookHotel needs to be called to handle the payment of the added hotel reservation. Note that the AlsoBookHotel task can only be called after the flight is booked but a hotel reservation is missing from the trip. Once the payment is made, the customer can cancel the order by calling the Cancel task. Using Cancel, the customer is able to cancel the flight and/or the hotel with a full refund. It is important to note that if the customer cancels the purchase of the flight, then she cannot receive the discount on the hotel reservation.

The tasks are specified below. For convenience, we use existential quantifications in conditions, which can be simulated by adding extra variables. String values are used as syntactic sugar for numeric variables. We assume that the set of strings we used (“Unpaid”, “Paid”, “FlightCanceled”, etc.) correspond to distinct numeric constants. In particular, the string “Unpaid” corresponds to the constant 0. Also for convenience, we use artifact variables with the same names in parent and child tasks. By default, each input/return variable is mapped to the variable in the parent/child task having the same name.
ManageTrips : This is the root task, modeling the processwhereby the customer creates, stores, and retrieves candi-date trips. A trip consists of a flight and/or hotel reserva-tion. Eventually, one of the candidate trips may be chosenfor booking. As the root task, its opening condition is true and closing condition is false . The task has the followingartifact variables: • ID variables: flight id , hotel id , • numeric variables: status and amount paid It also has an artifact relation
TRIPS storing candidate trips( flight id , hotel id ). The customer can use the subtasks AddFlight and
AddHotel (specified below) to fill in vari-ables flight id and hotel id . In addition, the task hastwo internal services:
StoreTrip and
RetrieveTrip . Intu-itively, when
StoreTrip is called, the current candidate trip( flight id , hotel id ) is inserted into TRIPS . When
Re-trieveTrip is called, one tuple is non-deterministically chosenand removed from
TRIPS , and ( flight id , hotel id ) is setto be the chosen tuple. The two tasks are specified as fol-lows: StoreTrip :Pre-condition: status = “Unpaid” ∧ ( flight id (cid:54) = null ∨ hotel id (cid:54) = null )Post-condition: flight id = null ∧ hotel id = null ∧ status = “Unpaid” ∧ amount paid = 0Set update: { + TRIPS ( flight id , hotel id ) } RetrieveTrip :Pre-condition: status = “Unpaid”Post-condition: status = “Unpaid” ∧ amount paid = 0Set update: {− TRIPS ( flight id , hotel id ) } AddFlight : This task adds a flight to the trip. It can beopened if flight id = null and status = “Unpaid” in theparent task. It has no input variable and the return variableis flight id . The task has a single internal service Choose-Flight that chooses a flight from the
FLIGHTS database andstores it in flight id , which is returned to
ManageTrips . AddHotel : This task adds a hotel reservation to the trip.It can be opened when hotel id = null and status is either“Paid” or “Unpaid”.This task has the following artifact variables: • ID variables: flight id , (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) hotel id • numeric variables: status , amount paid , (cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58)(cid:58) new amount paid (overwriting amount paid in the parent task when the taskreturns), discount price , unit price and hotel price the underlined variables are input variables the wavy underlined variables are return variables The task has a single internal service ChooseHotel whichpicks a hotel from
HOTELS and determines the price by check-ing whether the hotel is compatible with the chosen flight. Ifthey are compatible, then hotel price is set to the discountprice, otherwise it is set to the full price.A hotel can be added to the trip in two scenarios. First,if status is “Unpaid”, which means that the trip has notbeen booked, then this task chooses a hotel and the id ofthe hotel is returned to
ManageTrips . Second, if status is “Paid”, which means that a flight has already been pur-chased without a hotel reservation, then this task choosesa hotel and then the child task
AlsoBookHotel needs tobe called to handle the payment of the newly added hotel.In
AlsoBookHotel , a payment is received and the newtotal amount of payment received is written into variable new amount paid when
AlsoBookHotel returns.The closing service of
AddHotel has condition status =“Unpaid” ∨ ( status = “Paid” ∧ hotel price = new amount paid − amount paid ), which means that eitherthere is no need to call AlsoBookHotel or a correct pay-ment has been received in
AlsoBookHotel . The
Choose-Hotel service is specified as follows:
ChooseHotel :Pre-condition:
True
Post-condition: ∃ cid ∃ p f ( flight id = null → cid = null ) ∧ ( flight id (cid:54) = null → FLIGHTS ( flight id , p f , cid )) ∧ HOTELS ( hotel id , unit price , discount price ) ∧ ( cid = hotel id → hotel price = discount price ) ∧ ( cid (cid:54) = hotel id → hotel price = unit price ) ∧ ( new amount paid = 0) AlsoBookHotel : This task handles payment of hotel reser-vation made after the flight is purchased. It can be openedif hotel id (cid:54) = null and status = “Paid” in AddHotel . Itreceives input variables hotel price and amount paid fromthe parent and has local numeric variables new amount paid and hotel amount paid . It has a single service
Pay whichprocesses the payment. This service simply receives a hotelpayment in variable hotel amount paid and the new totalamount of payment received is calculated ( new amount paid = amount paid + hotel amount paid ). The service can failand the user can retry for unlimited number of times. Thistask can return only when the payment is successful, whichmeans that the closing condition is hotel amount paid = hotel price . When AlsoBookHotel returns, the numericvariable new amount paid is returned to
ManageTrips.

BookInitialTrip: This task allows the customer to reserve and pay for the chosen trip. Its opening condition is status = “Unpaid”. This task has the following variables:
• ID variables: flight_id, hotel_id
• numeric variables: status, amount_paid, ticket_price, hotel_price

The task contains a single service Pay to process the payment, which can fail and be retried an unlimited number of times. Note that if the trip contains both the flight and the hotel, when Pay is called, the payments for both of them are received.

If the payment is successful (i.e., amount_paid equals the flight price plus the hotel price), status is set to “Paid”. Otherwise it is set to “Failed”. The closing condition of this task is status = “Paid” or status = “Failed”. When BookInitialTrip returns, status and amount_paid in the parent task are updated by the new status and amount_paid returned by BookInitialTrip. The Pay service is specified as follows:
Pay:
Pre-condition: hotel_id ≠ null ∨ flight_id ≠ null
Post-condition:
∃cid ∃p_1 ∃p_2 (flight_id = null → ticket_price = 0 ∧ cid = null)
∧ (flight_id ≠ null → FLIGHTS(flight_id, ticket_price, cid))
∧ (hotel_id = null → hotel_price = 0)
∧ (hotel_id ≠ null → (HOTELS(hotel_id, p_1, p_2) ∧ (hotel_id = cid → hotel_price = p_2) ∧ (hotel_id ≠ cid → hotel_price = p_1)))
∧ (amount_paid = ticket_price + hotel_price → status = “Paid”)
∧ (amount_paid ≠ ticket_price + hotel_price → status = “Failed”)

Cancel: In this task, the customer can cancel the flight and/or hotel after the trip has been paid for. Its opening condition is status = “Paid”. This task has the following variables:
• ID variables: hotel_id and flight_id
• numeric variables: amount_paid, ticket_price, discount_price, unit_price, hotel_price, amount_refunded and status

The task has 3 services, CancelFlight, CancelHotel and CancelBoth, which cancel the flight, the hotel reservation, or both of them, respectively. When any of these services is called, amount_refunded is set to the correct amount that needs to be refunded to the customer, and status is set to “FlightCanceled”, “HotelCanceled” and “AllCanceled” respectively. In particular, if the customer would like to cancel the flight while keeping the hotel reservation, and if a discount has been applied to the hotel reservation, then the correct amount refunded equals ticket_price minus the difference between the normal cost and the discounted cost of the hotel, since she is no longer eligible for the discount. The closing condition of this task is True. We show the specification of CancelFlight as an example. Let Discounted be the subformula
(hotel_id ≠ null) ∧ (hotel_price = discount_price)
and let Penalized be the subformula
amount_refunded = ticket_price − (unit_price − discount_price)

CancelFlight:
Pre-condition: flight_id ≠ null ∧ status ≠ “FlightCanceled” ∧ status ≠ “HotelCanceled” ∧ status ≠ “AllCanceled”
Post-condition:
∃cid FLIGHTS(flight_id, ticket_price, cid)
∧ (hotel_price = amount_paid − ticket_price)
∧ (hotel_id ≠ null → HOTELS(hotel_id, unit_price, discount_price))
∧ (¬Discounted → amount_refunded = ticket_price)
∧ (Discounted → Penalized)
∧ status = “FlightCanceled”

A.2 Example HLTL-FO Property
Suppose we wish to enforce the following policy: if a discount is applied to the hotel reservation, then a compatible flight must be purchased without cancellation. One typical way to defeat the policy would be for a user to first pay for the flight, then reserve the hotel with the discount price, but next cancel the flight without penalty. Detecting such bugs can be subtle, especially in a system allowing concurrency. The following HLTL-FO property of task ManageTrips says “If AddHotel is called and a hotel reservation is added with a discounted price, then at the task Cancel, if the customer would like to cancel the flight, a penalty must be paid”.

The property is specified as [ϕ]_{T1} where ϕ is the formula:

ϕ = F [F (Discounted ∧ X σ^o_{T6: AlsoBookHotel})]_{T2: AddHotel} → G (σ^o_{T5: Cancel} → [G (CancelFlight → Penalized)]_{T5: Cancel})

with the subformulas Discounted and Penalized defined above.

Notice that in the specification there is no guard preventing AddHotel and Cancel from running concurrently after a successful payment is made, which can lead to a violation of this property. The problem can be fixed by adding a new variable in ManageTrips to indicate whether AddHotel or Cancel is currently running, and modifying their opening conditions to make sure that these two tasks are mutually exclusive.
B. FRAMEWORK AND HLTL-FO
B.1 Definition of global run
The global runs of a HAS $\Gamma$ are obtained from interleavings of the transitions in a tree of local runs, lifted to transitions over instances of $\mathcal{A}$. We make this more precise. Let $D$ be a database and $\mathit{Tree}$ a full tree of local runs over $D$. For a local run $\rho = (\nu_{in}, \nu_{out}, \{(I_m, \sigma_m)\}_{m < \gamma})$ (where $I_m = (\nu_m, S_m)$) and $i < \gamma$, we denote $\sigma(\rho, i) = \sigma_i$, $\nu(\rho, i) = \nu_i$, and $S(\rho, i) = S_i$. Let $\preceq$ be the pre-order on the set $\{(\rho, i) \mid \rho \in \mathit{Tree},\ 0 \le i < \gamma(\rho)\}$ defined as the smallest reflexive-transitive relation containing the following:
1. for each node $\rho$ and $0 \le i \le j < \gamma(\rho)$, $(\rho, i) \preceq (\rho, j)$;
2. for each edge in $\mathit{Tree}$ from $\rho_T$ to $\rho_{T_c}$ labeled $i$, $(\rho_T, i) \preceq (\rho_{T_c}, 0)$ and $(\rho_{T_c}, 0) \preceq (\rho_T, i)$. Additionally, if $\rho_{T_c}$ is returning and $m$ is the smallest $j > i$ for which $\sigma(\rho_T, j) = \sigma^c_{T_c}$, then $(\rho_{T_c}, \gamma(\rho_{T_c})) \preceq (\rho_T, m)$ and $(\rho_T, m) \preceq (\rho_{T_c}, \gamma(\rho_{T_c}))$.

Let $\sim$ be the equivalence relation induced by $\preceq$ (i.e., $a \sim b$ iff $a \preceq b$ and $b \preceq a$). Note that all classes of $\sim$ are singletons except for the ones induced by (2), which are of the form $\{(\rho_1, i), (\rho_2, j)\}$ where $\sigma(\rho_1, i) = \sigma(\rho_2, j) \in \{\sigma^o_T, \sigma^c_T\}$ for some task $T$. For an equivalence class $\varepsilon$ of $\sim$ we denote by $\sigma(\varepsilon)$ the unique service of elements in $\varepsilon$. A linearization of $\preceq$ is an enumeration of the equivalence classes of $\sim$ consistent with $\preceq$. Consider a linearization $\{\varepsilon_i\}_{i \ge 0}$ of $\preceq$. Note that $\varepsilon_0 = (\rho_{T_1}, 0)$ and let $\nu(\rho_{T_1}, 0) = \nu$. A global run induced by $\{\varepsilon_i\}_{i \ge 0}$ is a sequence $\rho = \{(\bar{I}_i, \sigma_i)\}_{i \ge 0}$ such that $\sigma_i = \sigma(\varepsilon_i)$ and each $\bar{I}_i$ is an instance $(\bar\nu_i, stg_i, D, \bar{S}_i)$ of $\mathcal{A}$, defined inductively as follows. For $i = 0$,
• $\bar\nu_0(\bar{x}^{T_1}) = \nu(\bar{x}^{T_1})$ (and arbitrary on other variables)
• $stg_0 = \{T_1 \mapsto \mathtt{active},\ T_i \mapsto \mathtt{init} \mid 2 \le i \le k\}$
• $\bar{S}_0 = \{S^{T_i} \mapsto \emptyset \mid 1 \le i \le k\}$.

For $i > 0$, $\bar{I}_i$ is defined as follows. Suppose first that $\varepsilon_i = \{(\rho, j)\}$ where $\rho$ is a local run of task $T$ and $\sigma(\rho, j)$ is an internal service of $T$. Then $\bar\nu_i = \bar\nu_{i-1}[\bar{x}^T \mapsto \nu(\rho, j)(\bar{x}^T)]$, $\bar{S}_i = \bar{S}_{i-1}[S^T \mapsto S(\rho, j)]$, and $stg_i = stg_{i-1}[\bar{T} \mapsto \mathtt{init} \mid \bar{T} \in desc(T)]$. Now suppose $\varepsilon_i = \{(\rho_T, j), (\rho_{T_c}, 0)\}$, where $T_c$ is a child of $T$, $\rho_T$ and $\rho_{T_c}$ are local runs of $T$ and $T_c$, and $\sigma(\varepsilon_i) = \sigma^o_{T_c}$. Then $\bar\nu_i = \bar\nu_{i-1}[\bar{x}^{T_c} \mapsto \nu(\rho_{T_c}, 0)(\bar{x}^{T_c})]$, $\bar{S}_i = \bar{S}_{i-1}[S^{T_c} \mapsto \emptyset]$, and $stg_i = stg_{i-1}[T_c \mapsto \mathtt{active}]$. Finally, suppose $\varepsilon_i = \{(\rho_T, j), (\rho_{T_c}, \gamma - 1)\}$ where $\sigma(\varepsilon_i) = \sigma^c_{T_c}$. Then $\bar\nu_i = \bar\nu_{i-1}[\bar{x}^T \mapsto \nu(\rho_T, j)(\bar{x}^T)]$, $stg_i = stg_{i-1}[T_c \mapsto \mathtt{closed}]$, and $\bar{S}_i = \bar{S}_{i-1}[S^{T_c} \mapsto \emptyset]$.

We denote by $\mathcal{L}(\mathit{Tree})$ the set of global runs induced by linearizations of $\preceq$. The set of global runs of $\Gamma$ on a database $D$ is $Runs_D(\Gamma) = \bigcup \{\mathcal{L}(\mathit{Tree}) \mid \mathit{Tree}$ is a full tree of local runs of $\Gamma$ on $D\}$ and the set of global runs of $\Gamma$ is $Runs(\Gamma) = \bigcup_D Runs_D(\Gamma)$.

B.2 Review of LTL
We review the classical definition of linear-time temporal logic (LTL) over a set $P$ of propositions. LTL specifies properties of infinite words ($\omega$-words) $\{\tau_i\}_{i \ge 0}$ over the alphabet consisting of truth assignments to $P$. Let $\tau_{\ge j}$ denote $\{\tau_i\}_{i \ge j}$, for $j \ge 0$. The semantics of the temporal operators $\mathbf{X}$, $\mathbf{U}$ is the following (where $\models$ denotes satisfaction and $j \ge 0$):
• $\tau_{\ge j} \models \mathbf{X}\varphi$ iff $\tau_{\ge j+1} \models \varphi$,
• $\tau_{\ge j} \models \varphi\,\mathbf{U}\,\psi$ iff $\exists k \ge j$ such that $\tau_{\ge k} \models \psi$ and $\tau_{\ge l} \models \varphi$ for $j \le l < k$.

Observe that the above temporal operators can simulate all commonly used operators, including $\mathbf{G}$ (always) and $\mathbf{F}$ (eventually). Indeed, $\mathbf{F}\varphi \equiv \mathit{true}\,\mathbf{U}\,\varphi$ and $\mathbf{G}\varphi \equiv \neg(\mathbf{F}\neg\varphi)$.

The standard construction of a Büchi automaton $B_\varphi$ corresponding to an LTL formula $\varphi$ is given in [53, 49]. The automaton $B_\varphi$ has exponentially many states and accepts precisely the set of $\omega$-words that satisfy $\varphi$.

It is sometimes useful to apply LTL on finite words rather than $\omega$-words. The finite semantics we use for temporal operators is the following [22]. Let $\{\tau_i\}_{0 \le i \le n}$ be a finite sequence of truth assignments to $P$. Similarly to the above, let $\tau_{\ge j}$ denote $\{\tau_i\}_{j \le i \le n}$, for $0 \le j \le n$. The semantics of $\mathbf{X}$ and $\mathbf{U}$ are defined as follows:
• $\tau_{\ge j} \models \mathbf{X}\varphi$ iff $n > j$ and $\tau_{\ge j+1} \models \varphi$,
• $\tau_{\ge j} \models \varphi\,\mathbf{U}\,\psi$ iff $\exists k$, $j \le k \le n$, such that $\tau_{\ge k} \models \psi$ and $\tau_{\ge l} \models \varphi$ for $j \le l < k$.

It is easy to verify that for the $B_\varphi$ obtained by the standard construction [53, 49] there is a subset $Q_{fin}$ of its states such that $B_\varphi$ viewed as a finite-state automaton with final states $Q_{fin}$ accepts precisely the finite words that satisfy $\varphi$.

B.3 Proof of Theorem 11
We show that it is undecidable whether a HAS $\Gamma = \langle \mathcal{A}, \Sigma, \Pi \rangle$ satisfies an LTL formula over $\Sigma$. The proof is by reduction from the repeated state reachability problem of VASS with reset arcs and bounded lossiness (RB-VASS) [41]. An RB-VASS extends the VASS reviewed in Section 4 as follows. In addition to increment and decrement of the counters, an action of an RB-VASS also allows resetting the values of some counters to 0. Moreover, after each transition, the value of each counter can decrease non-deterministically by an integer value bounded by some constant $c$. The results in [41] (Definition 2 and Theorem 18) indicate that the repeated state reachability problem for RB-VASS is undecidable for every fixed $c \ge 0$, since the structural termination problem for Reset Petri nets with bounded lossiness can be reduced to the repeated state reachability problem for RB-VASS's. In our proof, we use RB-VASS's with $c = 1$.

Formally, an RB-VASS $V$ (with lossiness bound 1 and dimension $d > 0$) is a pair $(Q, A)$ where $Q$ is a finite set of states and $A$ is a set of actions of the form $(p, \bar{a}, q)$ where $\bar{a} \in \{-1, +1, r\}^d$, and $p, q \in Q$. A run of $V = (Q, A)$ is a sequence $(q_0, \bar{z}_0), \ldots, (q_n, \bar{z}_n)$ where $\bar{z}_0 = \bar{0}$ and for each $i \ge 0$, $q_i \in Q$, $\bar{z}_i \in \mathbb{N}^d$, and there is some $\bar{a}$ with $(q_i, \bar{a}, q_{i+1}) \in A$ such that, for $1 \le j \le d$:
• if $\bar{a}(j) \in \{-1, +1\}$, then $\bar{z}_{i+1}(j) = \bar{z}_i(j) + \bar{a}(j)$ or $\bar{z}_{i+1}(j) = \bar{z}_i(j) + \bar{a}(j) - 1$, and
• if $\bar{a}(j) = r$, then $\bar{z}_{i+1}(j) = 0$.

For a given RB-VASS $V = (Q, A)$ and a pair of states $q_0, q_f \in Q$, we say that $q_f$ is repeatedly reachable from $q_0$ if there exists a run $(q_0, \bar{z}_0) \ldots (q_n, \bar{z}_n) \ldots (q_m, \bar{z}_m)$ of $V$ such that $q_n = q_m = q_f$ and $\bar{z}_n \le \bar{z}_m$. As discussed above, checking whether $q_f$ is repeatedly reachable from $q_0$ is undecidable.

We now show that for a given RB-VASS $V = (Q, A)$ and $(q_0, q_f)$, one can construct a HAS $\Gamma = \langle \mathcal{A}, \Sigma, \Pi \rangle$ and LTL property $\Phi$ over $\Sigma$ such that $q_f$ is repeatedly reachable from $q_0$ iff $\Gamma \models \Phi$. At a high level, the construction of $\Gamma$ uses $d$ tasks to simulate the $d$-dimensional vector of counters. Each task is equipped with an artifact relation, and the number of elements in the artifact relation is the current value of the corresponding counter. Increments and decrements of the counters are simulated by internal services of these tasks, and resets of the counters are simulated by closing and reopening the task (recall that this resets the artifact relation to empty). Then we specify in the LTL formula $\Phi$ that the updates of the counters of the same action are grouped in sequence. Note that this requires coordinating the actions of sibling tasks, which is not possible in HLTL-FO. The construction is detailed next.

The database schema of $\Gamma$ consists of a single unary relation $R(id)$. The artifact system has a root task $T_1$ and subtasks $\{P_0, P_1, \ldots, P_d, C_1, \ldots, C_d\}$ which form the following tasks hierarchy:

[Figure 2: Tasks Hierarchy. Nodes: $T_1$; $P_0, P_1, P_2, \ldots, P_{d-1}, P_d$; $C_1, C_2, \ldots, C_{d-1}, C_d$.]

The tasks are defined as follows. The root task $T_1$ has no variables and no internal services. The task $P_0$ contains a numeric variable $s$, indicating the current state of the RB-VASS. For each $q \in Q$, $P_0$ has a service $\sigma_q$, whose pre-condition is $\mathit{true}$ and whose post-condition sets $s$ to $q$.

For $i \ge 1$, task $P_i$ has no variable. It has a single internal service $\sigma^r_i$ whose pre- and post-conditions are both $\mathit{true}$. Each $C_i$ has an ID variable $x$, an artifact relation $S_i$ and a pair of services $\sigma^+_i$ and $\sigma^-_i$, which simply insert $x$ into $S_i$ and remove an element from $S_i$, respectively. Intuitively, the size of $S_i$ is the current value of the $i$-th counter. Application of service $\sigma^r_i$ corresponds to resetting the $i$-th counter, and application of services $\sigma^+_i$ and $\sigma^-_i$ corresponds to increment and decrement of the $i$-th counter, respectively.

Except for the closing condition of $T_1$, all opening and closing conditions of tasks are $\mathit{true}$.

We encode the set of actions $A$ into an LTL formula as follows. For each state $p \in Q$, we denote by $\alpha(p)$ the set of actions starting from $p$. For each action $\alpha = (p, \bar{a}, q) \in A$, we construct an LTL formula $\varphi(\alpha)$ as follows. First, let $\phi_1, \ldots, \phi_d, \phi_{d+1}$ be LTL formulas where:
• $\phi_{d+1} = \mathbf{X}\sigma_q$,
• for $i = d, d-1, \ldots, 1$,
  – if $\bar{a}(i) = +1$, then $\phi_i = \sigma^+_i \wedge \mathbf{X}\phi_{i+1}$,
  – if $\bar{a}(i) = -1$, then $\phi_i = (\sigma^-_i \wedge \mathbf{X}\phi_{i+1}) \vee (\sigma^-_i \wedge \mathbf{X}(\sigma^-_i \wedge \mathbf{X}\phi_{i+1}))$, and
  – if $\bar{a}(i) = r$, then $\phi_i = \sigma^c_i \wedge \mathbf{X}(\sigma^r_i \wedge \mathbf{X}(\sigma^o_i \wedge \mathbf{X}\phi_{i+1}))$,
where $\sigma^o_i$ and $\sigma^c_i$ are the opening and closing services of task $C_i$.

Let $\varphi(\alpha) = \mathbf{X}\phi_1$. Intuitively, $\varphi(\alpha)$ specifies a sequence of service calls that update the content of the artifact relations $S_1, \ldots, S_d$ according to the vector $\bar{a}$. In particular, for $\bar{a}(i) = r$, the subsequence of services $\sigma^c_i \sigma^r_i \sigma^o_i$ first closes task $C_i$ and then reopens it. This empties $S_i$. For $\bar{a}(i) = +1$, by executing $\sigma^+_i$, the size of $S_i$ might be increased by 1 or 0, depending on whether the element to be inserted is already in $S_i$. For $\bar{a}(i) = -1$, we let $\sigma^-_i$ be executed either once or twice, so the size of $S_i$ can decrease by 1 or 2 nondeterministically. Then we let

$\Phi = \Phi_{init} \wedge \bigwedge_{p \in Q} \mathbf{G}\Big(\sigma_p \rightarrow \bigvee_{\alpha \in \alpha(p)} \varphi(\alpha)\Big) \wedge \mathbf{G}\mathbf{F}\sigma_{q_f}$

where $\Phi_{init}$ is a formula specifying that the run is correctly initialized, which simply means that the opening services $\sigma^o_T$ of all tasks are executed once at the beginning of the run, and then $\sigma_{q_0}$ is executed.

The second clause says that for every state $p \in Q$, whenever the run enters a state $p$ (by calling $\sigma_p$), a sequence of services as specified in $\varphi(\alpha)$ is called to update $S_1, \ldots, S_d$, simulating the action $\alpha$ that starts from $p$.

Finally, the last clause $\mathbf{G}\mathbf{F}\sigma_{q_f}$ guarantees that the service $\sigma_{q_f}$ is applied infinitely often, which means that $q_f$ is reached infinitely often in the run.

We can prove the following lemma, which implies Theorem 11:

Lemma. For RB-VASS $(Q, A)$ and states $q_0, q_f \in Q$, there exists a run $(q_0, \bar{z}_0), \ldots, (q_m, \bar{z}_m), \ldots, (q_n, \bar{z}_n)$ of $(Q, A)$ where $q_m = q_n = q_f$ and $\bar{z}_m \le \bar{z}_n$ iff there exists a global run $\rho$ of $\Gamma$ such that $\rho \models \Phi$.

B.4 Expressiveness of HLTL-FO
We next show that HLTL-FO expresses, in a reasonable sense, all interleaving-invariant LTL-FO properties. We consider a notion of interleaving-invariance of LTL-FO formulas based on their propositional structure, rather than the specifics of the propositions' interpretation (which may lead to “accidental” invariance). In view of Lemma 30, we consider only formulas with no global variables or set atoms.

We first recall the logic LTL-FO, slightly adapted to our context. Let $\Gamma = \langle \mathcal{A}, \Sigma, \Pi \rangle$ be a HAS where $\mathcal{A} = \langle \mathcal{H}, \mathcal{DB} \rangle$. An LTL-FO formula $\varphi_f$ over $\Gamma$ consists of an LTL formula $\varphi$ with propositions $P \cup \Sigma$ together with a mapping $f$ associating to each $p \in P$ a condition over $\bar{x}^T$ for some $T \in \mathcal{T}$ (and we say that $f(p)$ is over $T$). Satisfaction of $\varphi_f$ on a global run $\rho = \{(I_i, \sigma_i)\}_{i \ge 0}$ of $\Gamma$ on database $D$, where $I_i = (\nu_i, stg_i, D, S_i)$, is defined as usual, modulo the following:
• $f(p)$ over $T$ holds in $(I_i, \sigma_i)$ iff $stg_i(T) = \mathtt{active}$ and the condition $f(p)$ on $\nu_i(\bar{x}^T)$ holds;
• proposition $\sigma$ in $\Sigma$ holds in $(I_i, \sigma_i)$ if $\sigma = \sigma_i$.
Thus, the information about $(I_i, \sigma_i)$ relevant to satisfaction of $\varphi_f$ consists of $\sigma_i$, the stage of each task (active or not), and the truth values in $I_i$ of $f(p)$ for $p \in P$.

We now make more precise the notion of (propositional) invariance under interleavings. Consider an LTL-FO formula $\varphi_f$ over $\Gamma$. Invariance under interleavings is a property of the propositional formula $\varphi$ (so independent of the interpretation of propositions provided by $f$). Let $P \cup \Sigma$ be the set of propositions of $\varphi$ and let $P_T$ denote the subset of $P$ for which $f(p)$ is a condition over $\bar{x}^T$. Thus, $\{P_T \mid T \in \mathcal{T}\}$ is a partition of $P$. We define the set $\mathcal{L}(\Gamma)$ of $\omega$-words associated to $\Gamma$, on which $\varphi$ operates.
The alphabet, denoted $A(\Gamma)$, consists of all triples $(\kappa, stg, \sigma)$ where $\sigma \in \Sigma$, $\kappa$ is a truth assignment to the propositions in $P$, and $stg$ is a mapping associating to each $T \in \mathcal{T}$ its stage ($\mathtt{active}$, $\mathtt{init}$, or $\mathtt{closed}$). An $\omega$-word $\{(\kappa_i, stg_i, \sigma_i)\}_{i \ge 0}$ over $A(\Gamma)$ is in $\mathcal{L}(\Gamma)$ if the following hold:
1. for each $i > 0$, if $\sigma_i \in \Sigma^\delta_T$, then $\kappa_i$ and $\kappa_{i-1}$ agree on all $P_{\bar{T}}$ where $\bar{T} \ne T$;
2. the sequence of calls, returns, and internal services obeys the conditions on service sequences in global runs of $\Gamma$;
3. for each $i > 0$ and $T \in \mathcal{T}$, $stg_i(T)$ is the stage of $T$ as determined by the sequence of calls and returns in $\{\sigma_j\}_{j}$ [...] $\sigma_{i_j} \in \Sigma^\delta_T$ then $\bar\kappa_j = \bar\kappa_{j-1}[P_T \mapsto \kappa_{i_j}(P_T)]$.

Intuitively, $u_\alpha$ is obtained from $u$ by commuting actions that are incomparable with respect to $\preceq_u$, yielding the linearization $\alpha$. We note that the relation $\preceq_u$ is the analog in our setting of Mazurkiewicz traces, used in concurrent systems to capture dependencies among process actions [42, 29, 28].

Definition. An LTL-FO formula $\varphi_f$ over $\Gamma$ is propositionally invariant with respect to interleavings if for every $u \in \mathcal{L}(\Gamma)$ and linearization $\alpha$ of $\preceq_u$, $u \models \varphi$ iff $u_\alpha \models \varphi$.

We can show the following.
Theorem. HLTL-FO expresses precisely the LTL-FO properties of HAS's that are propositionally invariant with respect to interleavings.

We next sketch the proof. For conciseness, we refer throughout the proof to propositionally interleaving-invariant LTL-FO simply as interleaving-invariant LTL-FO.

Showing that HLTL-FO expresses only interleaving-invariant LTL-FO properties is straightforward. The converse however is non-trivial. We begin by showing a normal form for LTL formulas, which facilitates the application to our context of results from [27, 28] on temporal logics for concurrent processes. Consider the alphabet $H(\Gamma) = \{(\kappa, \sigma) \mid (\kappa, stg, \sigma) \in A(\Gamma)\}$. Thus, $H(\Gamma)$ is $A(\Gamma)$ with the stage information omitted. Let $\mathcal{H}(\Gamma) = h(\mathcal{L}(\Gamma))$ where $h((\kappa, stg, \sigma)) = (\kappa, \sigma)$. We define local-LTL to be LTL using the set of propositions $P\Sigma = \{(p, \sigma) \mid p \in P_T, \sigma \in \Sigma^{obs}_T\}$. A proposition $(p, \sigma)$ holds in $(\bar\kappa, \bar\sigma)$ iff $\bar\sigma = \sigma$ and $\bar\kappa(p)$ is true. The definition of interleaving-invariant local-LTL formula is the same as for LTL.

Lemma. For each interleaving-invariant LTL formula $\varphi$ over $\mathcal{L}(\Gamma)$ one can construct an interleaving-invariant local-LTL formula $\bar\varphi$ over $\mathcal{H}(\Gamma)$ such that for every $u \in \mathcal{L}(\Gamma)$, $u \models \varphi$ iff $h(u) \models \bar\varphi$ where $h((\kappa, stg, \sigma)) = (\kappa, \sigma)$.

Proof. We use the equivalence of FO and LTL over $\omega$-words [37]. It is easy to see that each LTL formula $\varphi$ over $\mathcal{L}(\Gamma)$ can be translated into an FO formula $\psi(\varphi)$ over $\mathcal{H}(\Gamma)$ using only propositions in $P\Sigma$, such that for every $u \in \mathcal{L}(\Gamma)$, $u \models \varphi$ iff $h(u) \models \psi(\varphi)$. Indeed, it is straightforward to define by FO means the stage of each transaction in a given configuration, as well as each proposition in $P \cup \Sigma$ in terms of propositions in $P\Sigma$, on words in $\mathcal{H}(\Gamma)$. One can then construct from the FO sentence $\psi(\varphi)$ an LTL formula $\bar\varphi$ equivalent to it over words in $\mathcal{H}(\Gamma)$, using the same set of propositions $P\Sigma$. The resulting LTL formula is thus in local-LTL, and it is easily seen that it is interleaving-invariant.

We use a propositional variant HLTL of HLTL-FO, defined over $\omega$-words in $\mathcal{H}(\Gamma)$ similarly to HLTL-FO. More precisely, LTL formulas applying to transaction $T$ use propositions in $P_T \cup \Sigma^{obs}_T$ and expressions $[\psi]_{T_c}$ where $T_c$ is a child of $T$ and $\psi$ is an HLTL formula applying to $T_c$. We show the following key fact.

Lemma. For each interleaving-invariant local-LTL formula over $\mathcal{H}(\Gamma)$ there exists an equivalent HLTL formula over $\mathcal{H}(\Gamma)$.

Proof. To show completeness of HLTL, we use a logic shown in [27, 28] to be complete for expressing LTL properties invariant with respect to valid interleavings of actions of concurrent processes (or equivalently, well-defined on Mazurkiewicz traces). The logic, adapted to our framework, operates on partial orders $\preceq_u$ of words $u \in \mathcal{H}(\Gamma)$, and is denoted LTL($\preceq$). For $u = \{(\kappa_i, \sigma_i) \mid i \ge 0\}$, we define the projection of $u$ on $T$ as the subsequence $\pi_T(u) = \{(\kappa_{i_j}|_{P_T}, \sigma_{i_j})\}_{j \ge 0}$ where $\{\sigma_{i_j} \mid j \ge 0\}$ is the subsequence of $\{\sigma_i \mid i \ge 0\}$ retaining all services in $\Sigma^{obs}_T$. LTL($\preceq$) uses the set of propositions $P\Sigma$ and the following temporal operators on $\preceq_u$:
• $\mathbf{X}_T\varphi$, which holds in $(\kappa_i, \sigma_i)$ if $\pi_T(v) \ne \epsilon$ for $v = \{(\kappa_j, \sigma_j) \mid j \ge m\}$, where $m$ is the minimum index such that $i \prec_u m$, and $\varphi$ holds on $\pi_T(v)$;
• $\varphi\,\mathbf{U}_T\,\psi$, which holds in $(\kappa_i, \sigma_i)$ if $\pi_T(v) \ne \epsilon$ for $v = \{(\kappa_j, \sigma_j) \mid j \ge i\}$, and $\varphi\,\mathbf{U}\,\psi$ holds on $\pi_T(v)$.

From Theorem 18 in [27] and Proposition 2 and Corollary 26 in [28] it follows that LTL($\preceq$) expresses all local-LTL properties over $\mathcal{H}(\Gamma)$ invariant with respect to interleavings.

We next show that HLTL can simulate LTL($\preceq$). To this end, we consider an extension of HLTL in which LTL($\preceq$) formulas may be used in addition to propositions in $P_T \cup \Sigma^{obs}_T$ in every formula applying to transaction $T$. We denote the extension by HLTL+LTL($\preceq$). Note that for each formula $\xi$ in LTL($\preceq$), $[\xi]_{T_1}$ is an HLTL+LTL($\preceq$) formula. The proof consists in showing that the LTL($\preceq$) formulas can be eliminated from HLTL+LTL($\preceq$) formulas. This is done by recursively reducing the depth of nesting of $\mathbf{X}_T$ and $\mathbf{U}_T$ operators, and finally eliminating propositions. We define the rank of an LTL($\preceq$) formula to be the maximum number of $\mathbf{X}_T$ and $\mathbf{U}_T$ operators along a path in its syntax tree.
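The rank just defined is a simple recursion over the syntax tree. The following minimal Python sketch illustrates it, assuming a hypothetical `Formula` node type whose `op` field is one of "prop", "not", "and", "or", "X_T", "U_T". This encoding is only an illustration and is not notation taken from [27, 28].

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Formula:
    """Hypothetical syntax-tree node of an LTL formula with X_T / U_T operators."""
    op: str                               # "prop", "not", "and", "or", "X_T", "U_T"
    children: Tuple["Formula", ...] = ()
    task: str = ""                        # task index of X_T / U_T, unused otherwise


TEMPORAL_OPS = {"X_T", "U_T"}


def rank(f: Formula) -> int:
    # Maximum number of X_T / U_T operators along a root-to-leaf path.
    deepest_child = max((rank(c) for c in f.children), default=0)
    return deepest_child + (1 if f.op in TEMPORAL_OPS else 0)
```

For example, a formula of the shape X_T((p U_T (X_T q)) ∧ p) has rank 3 under this sketch, since the longest path crosses the outer X_T, the U_T, and the inner X_T.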
For a formula $\xi$ in HLTL+LTL($\preceq$), we define $r(\xi) = (n, m)$ where $n$ is the maximum rank of an LTL($\preceq$) formula occurring in $\xi$, and $m$ is the number of such formulas with rank $n$. The pairs $(n, m)$ are ordered lexicographically.

Let $[\xi]_{T_1}$ be an HLTL+LTL($\preceq$) formula. We associate to $[\xi]_{T_1}$ the tree $\mathit{Tree}(\xi)$ whose nodes are all occurrences of subformulas of the form $[\psi]_T$, with an edge from $[\psi_i]_{T_i}$ to $[\psi_j]_{T_j}$ if the latter occurs in $\psi_i$ and $T_j$ is a child of $T_i$ in $\mathcal{H}$.

Consider an HLTL+LTL($\preceq$) formula $[\xi]_{T_1}$ such that $r(\xi) \ge (1, 1)$. Suppose $\xi$ has a subformula $\mathbf{X}_T\varphi$ in LTL($\preceq$) of maximum rank. Pick one such occurrence and let $\bar{T}$ be the minimum task (wrt $\mathcal{H}$) such that $\mathbf{X}_T\varphi$ occurs in $[\psi]_{\bar{T}}$. We construct an HLTL+LTL($\preceq$) formula $\bar\xi$ such that $r(\bar\xi) < r(\xi)$, essentially by eliminating $\mathbf{X}_T$. We consider 4 cases: $T = \bar{T}$, $T$ is a descendant or ancestor of $\bar{T}$, or neither.

Suppose first that $T = \bar{T}$. Consider an occurrence of $\mathbf{X}_T\varphi$. Intuitively, there are two cases: $\mathbf{X}_T\varphi$ is evaluated inside the run of $T$ corresponding to $[\psi]_T$, or at the last configuration. In the first case ($\neg\sigma^c_T$ holds), $\mathbf{X}_T\varphi$ is equivalent to $\mathbf{X}\varphi$. In the second case ($\sigma^c_T$ holds), $\mathbf{X}_T\varphi$ holds iff $\varphi$ holds at the next call to $T$. Thus, $\xi$ is equivalent to $\xi_1 \vee \xi_2$, where:
1. $\xi_1$ says that $\varphi$ does not hold at the next call to $T$ (or no such call exists) and $\mathbf{X}_T\varphi$ is replaced in $\psi$ by $\neg\sigma^c_T \wedge \mathbf{X}\varphi$;
2. $\xi_2$ says that $\varphi$ holds at the next call to $T$ (which exists) and $\mathbf{X}_T\varphi$ is replaced in $\psi$ by $\neg\sigma^c_T \rightarrow \mathbf{X}\varphi$.

We next describe how $\xi_1$ states that $\varphi$ does not hold at the next call to $T$ ($\xi_2$ is similar). We need to state that either there is no future call to $T$, or such a call exists and $\neg\varphi$ holds at the first such call. Consider the path from $T_1$ to $T$ in $\mathcal{H}$. Assume for simplicity that the path is $T_1, T_2, \ldots, T_k$ where $T_k = T$.
For each $i$, $1 \le i < k$, we define inductively (from $k-1$ down to $1$) formulas $\alpha_i$, $\beta_i(\neg\varphi)$ such that $\alpha_i$ says that there is no call leading to $T$ in the remainder of the current subrun of $T_i$, and $\beta_i(\neg\varphi)$ says that such a call exists and the first call leads to a subrun of $T$ satisfying $\neg\varphi$. First, $\alpha_{k-1} = \mathbf{G}(\neg\sigma^o_{T_k})$ and $\beta_{k-1}(\neg\varphi) = \neg\sigma^o_{T_k}\,\mathbf{U}\,[\neg\varphi]_{T_k}$. For $1 \le i < k-1$, $\alpha_i = \mathbf{G}(\sigma^o_{T_{i+1}} \rightarrow [\alpha_{i+1}]_{T_{i+1}})$ and $\beta_i(\neg\varphi) = (\sigma^o_{T_{i+1}} \rightarrow [\alpha_{i+1}]_{T_{i+1}})\,\mathbf{U}\,[\beta_{i+1}(\neg\varphi)]_{T_{i+1}}$. Now $\xi_1 = \xi \vee \bigvee_{1 \le j}$ [...] 1, $\bar\psi_i$ is obtained from $\psi_i$ by replacing $[\psi_{i+1}]_{T_{i+1}}$ with $[\bar\psi_{i+1}]_{T_{i+1}} \wedge \alpha_i$. For $1 \le j < k$, $\xi_j$ is obtained by replacing in $\psi_j$, $[\psi_{j+1}]_{T_{j+1}}$ with $[\bar\psi_{j+1}]_{T_{j+1}} \wedge \beta_j(\neg\varphi)$. It is clear that $\xi_1$ states the desired property. The formula $\xi_2$ is constructed similarly. Note that $r(\xi_1 \vee \xi_2) < r(\xi)$.

Now suppose $T$ is an ancestor of $\bar{T}$. We reduce this case to the previous one ($T = \bar{T}$). Let $T'$ be the child of $T$. Suppose $[\psi_T]_T$ is the ancestor of $[\psi]_{\bar{T}}$ in $\mathit{Tree}(\xi)$. Then $\xi$ is equivalent to $\bar\xi = \xi_1 \vee \xi_2$ where:
1. $\xi_1$ says that $\varphi$ does not hold at the next action of $T$ wrt $\preceq$ (or no such next action exists) and $\psi$ is replaced by $\psi(\mathbf{X}_T\varphi \leftarrow \mathit{false})$ ($\leftarrow$ denotes substitution);
2. $\xi_2$ says that $\varphi$ holds at the next action of $T$ wrt $\preceq$ and $\psi$ is replaced by $\psi(\mathbf{X}_T\varphi \leftarrow \mathit{true})$.

To state that $\varphi$ does not hold at the next call to $T$ (or no such call exists), $\xi_1$ is further modified by replacing in $\psi_T$, $[\psi_{T'}]_{T'}$ with $[\psi_{T'}]_{T'} \wedge (\mathbf{G}(\neg\sigma^c_{T'}) \vee (\neg\sigma^c_{T'}\,\mathbf{U}\,(\sigma^c_{T'} \wedge \neg\mathbf{X}_T\varphi)))$. Similarly, $\xi_2$ is further modified by replacing in $\psi_T$, $[\psi_{T'}]_{T'}$ with $[\psi_{T'}]_{T'} \wedge (\neg\sigma^c_{T'}\,\mathbf{U}\,(\sigma^c_{T'} \wedge \mathbf{X}_T\varphi))$. Note that there are now two occurrences of $\mathbf{X}_T\varphi$ in the modified $\psi_T$'s. By applying twice the construction for the case $\bar{T} = T$ we obtain an equivalent $\bar\xi$ such that $r(\bar\xi) < r(\xi)$.

Next consider the case when $\bar{T}$ is an ancestor of $T$. Suppose the path from $T_1$ to $T$ in $\mathcal{H}$ is $T_1, \ldots, T_i, \ldots, T_k$ where $T_i = \bar{T}$ and $T_k = T$. Consider the value of $\mathbf{X}_T\varphi$ in the run $\rho_\psi$ of $\bar{T}$ on which $\psi$ is evaluated. Similarly to the case $T = \bar{T}$, there are two cases: $\varphi$ holds at the next invocation of $T$ following $\rho_\psi$, or it does not. Thus, $\xi$ is equivalent to $\xi_1 \vee \xi_2$, where:
1. $\xi_1$ says that $\varphi$ does not hold at the next call to $T$ (or no such call exists) and $\mathbf{X}_T\varphi$ is replaced in $\psi$ by $\beta_i(\varphi)$, where $\beta_i(\varphi)$ says that there exists a future call leading to $T$ in the current run of $\bar{T}$, and the first such run of $T$ satisfies $\varphi$; $\beta_i(\varphi)$ is constructed as in the case $T = \bar{T}$.
2. $\xi_2$ says that $\varphi$ holds at the next call to $T$ following the current run of $\bar{T}$ and $\mathbf{X}_T\varphi$ is replaced in $\psi$ by $\alpha_i \vee \beta_i(\varphi)$ where $\alpha_i$, constructed as for the case $T = \bar{T}$, says that there is no future call leading to $T$ in the current run of $\bar{T}$.

To say that $\varphi$ does not hold at the next call to $T$ following $\rho_\psi$ (or no such call exists), $\xi_1$ is modified analogously to the case $\bar{T} = T$, and similarly for $\xi_2$.

Finally suppose the least common ancestor of $\bar{T}$ and $T$ is $\hat{T}$, distinct from both. Let $[\psi_{\hat{T}}]_{\hat{T}}$ be the ancestor of $[\psi]_{\bar{T}}$ in $\mathit{Tree}(\xi)$. Consider the value of $\mathbf{X}_T\varphi$ in the run of $\bar{T}$ on which $\psi$ is evaluated. There are two cases: $\varphi$ holds at the next invocation of $T$ following the run of $\bar{T}$, or it does not. Thus, $\xi$ is equivalent to $\xi_1 \vee \xi_2$, where:
1. $\xi_1$ says that $\varphi$ does not hold at the next call to $T$ (or no such call exists) and $\psi$ is replaced by $\psi(\mathbf{X}_T\varphi \leftarrow \mathit{false})$;
2. $\xi_2$ says that $\varphi$ holds at the next call to $T$ and $\psi$ is replaced by $\psi(\mathbf{X}_T\varphi \leftarrow \mathit{true})$.

To say that $\varphi$ does not hold at the next call to $T$ (or no such call exists), $\xi_1$ is modified analogously to the case $\bar{T} = T$, and similarly for $\xi_2$, taking into account the fact that the next call to $T$, if it exists, must take place in the current run of $\hat{T}$ or of one of its ancestors. This completes the simulation of $\mathbf{X}_T\varphi$.

Now suppose $\xi$ has a subformula $(\varphi_1\,\mathbf{U}_T\,\varphi_2)$ of maximum rank.
Pick one such occurrence and let $\bar{T}$ be the minimum task (wrt $\mathcal{H}$) such that $(\varphi_1\,\mathbf{U}_T\,\varphi_2)$ occurs in $[\psi]_{\bar{T}}$. There are several cases: $\bar{T} = T$, $\bar{T}$ is an ancestor or descendant of $T$, or neither. The simulation technique is similar to the above. We outline the construction for the most interesting case, when $\bar{T} = T$.

Consider the run of $T$ on which $[\psi]_T$ is evaluated. There are two cases: ($\dagger$) $(\varphi_1\,\mathbf{U}_T\,\varphi_2)$ holds on the concatenation of the future runs of $T$, or ($\dagger$) does not hold. Thus, $\xi$ is equivalent to $\xi_1 \vee \xi_2$ where:
1. $\xi_1$ says that ($\dagger$) holds and $\psi$ is modified by replacing the occurrence of $(\varphi_1\,\mathbf{U}_T\,\varphi_2)$ with $\mathbf{G}\varphi_1 \vee (\varphi_1\,\mathbf{U}\,\varphi_2)$, and
2. $\xi_2$ says that ($\dagger$) does not hold and $\psi$ is modified by replacing the occurrence of $(\varphi_1\,\mathbf{U}_T\,\varphi_2)$ with $(\varphi_1\,\mathbf{U}\,\varphi_2)$.

We show how $\xi_1$ ensures ($\dagger$). Let $T_1, \ldots, T_k$ be the path from the root to $T$ in $\mathcal{H}$. For each $i$, $1 \le i < k$, we define inductively (from $k-1$ down to $1$) formulas $\alpha_i$, $\beta_i$ as follows. Intuitively, $\alpha_i$ says that all future calls leading to $T$ from the current run of $T_i$ must result in runs satisfying $\mathbf{G}\varphi_1$:
• $\alpha_{k-1} = \mathbf{G}(\sigma^o_{T_k} \rightarrow [\mathbf{G}\varphi_1]_{T_k})$,
• for $1 \le i < k-1$, $\alpha_i = \mathbf{G}(\sigma^o_{T_{i+1}} \rightarrow [\alpha_{i+1}]_{T_{i+1}})$.

The formula $\beta_i$ says that there must be a future call to $T$ in the current run of $T_i$ satisfying $\varphi_1\,\mathbf{U}\,\varphi_2$ and all prior calls result in runs satisfying $\mathbf{G}\varphi_1$:
• $\beta_{k-1} = (\sigma^o_{T_k} \rightarrow [\mathbf{G}\varphi_1]_{T_k})\,\mathbf{U}\,[\varphi_1\,\mathbf{U}\,\varphi_2]_{T_k}$,
• for $1 \le i < k-1$, $\beta_i = (\sigma^o_{T_{i+1}} \rightarrow [\alpha_{i+1}]_{T_{i+1}})\,\mathbf{U}\,[\beta_{i+1}]_{T_{i+1}}$.

Now $\xi_1$ is $\bigvee_{1 \le j}$ [...]

We first show that the global variables, as well as set atoms, can be eliminated from HLTL-FO formulas.

Lemma. Let $\Gamma$ be a HAS and $\forall\bar{y}[\varphi_f(\bar{y})]_{T_1}$ an HLTL-FO formula over $\Gamma$. One can construct in linear time a HAS $\bar\Gamma$ and an HLTL-FO formula $[\bar\varphi_f]_{\bar{T}_1}$, where $\bar\varphi_f$ contains no atoms $S^T(\bar{z})$, such that $\Gamma \models \forall\bar{y}[\varphi_f(\bar{y})]_{T_1}$ iff $\bar\Gamma \models [\bar\varphi_f]_{\bar{T}_1}$.

Proof. Consider first the elimination of global variables. Suppose $\Gamma$ has tasks $T_1, \ldots, T_k$.
The Hierarchical artifactsystem ¯Γ is constructed from Γ by adding ¯ y to the vari-ables of T and augmenting the input variables of all othertasks with ¯ y (appropriately renamed). Note that ¯ y is uncon-strained, so it can be initialized to an arbitrary valuation andthen passed as input to all other tasks. Let Γ consist of theresulting tasks, ¯ T , . . . , ¯ T k . It is clear that Γ | = ∀ ¯ y [ ϕ f (¯ y )] T iff ¯Γ | = [ ¯ ϕ f ] ¯ T . Consider now how to eliminate atoms of the form S T (¯ z )from ¯ ϕ f . Recall that for all such atoms, ¯ z ⊆ ¯ y , so ¯ z isfixed throughout each run. The idea is keep track of themembership of ¯ z in S T using two additional numeric artifactvariables x ¯ z and y ¯ z , such that x ¯ z = y ¯ z indicates that S T (¯ z )holds . Specifically, a pre-condition ensures that x ¯ z (cid:54) = y ¯ z initially holds, then x ¯ z (cid:54) = y ¯ z is enforced as soon as thereis an insertion + S T (¯ s T ) for which ¯ s T = ¯ z , and x ¯ z (cid:54) = y ¯ z isenforced again whenever there is a retrieval of a tuple equalto ¯ z . This can be achieved using pre-and-post conditionsof services carrying out the insertion or retrieval. Then theatom S T (¯ z ) can be replaced in ¯ ϕ f with ( x ¯ z = y ¯ z ).We next consider two simplifications of artifact systemsregarding the interaction of tasks with their subtasks. Lemma Let Γ be a HAS and ϕ an HLTL-FO propertyover Γ . One can construct a HAS ˜Γ and an HLTL-FO for-mula ˜ ϕ such that Γ | = ϕ iff ˜Γ | = ˜ ϕ and: ( i ) (cid:83) T c ∈ child ( T ) ¯ x TT ↑ c and (cid:83) T c ∈ child ( T ) ¯ x TT ↓ c are disjoint for each task T in ˜Γ , ( ii ) for each child task T c ∈ child ( T ) , ¯ x TT ↑ c ∩ VAR R = ∅ . Proof. Consider (i). We describe here informally theconstruction of ˜Γ that eliminates overlapping between (cid:83) T c ∈ child ( T ) ¯ x TT ↑ c and (cid:83) T c ∈ child ( T ) ¯ x TT ↓ c . 
For each task $T$ and for each subtask $T_c$ of $T$, for each variable $x \in \bar{x}^T_{T_c^\downarrow}$, we introduce to $T$ a new variable $\hat{x}$ whose type is the same as the type (id or numeric) of $x$. We denote by $\hat{x}^T_{T_c^\downarrow}$ the set of variables added to $T$ for subtask $T_c$. Then instead of passing $\bar{x}^T_{T_c^\downarrow}$ to $T_c$, $T$ passes $\hat{x}^T_{T_c^\downarrow}$ to $T_c$ when $T_c$ opens. For the opening service $\sigma^o_{T_c}$ with opening condition $\pi$, we check $\pi$ in conjunction with $\bigwedge_{x \in \bar{x}^T_{T_c^\downarrow}} (x = \hat{x})$. Note that $\bigcup_{T_c \in child(T)} \hat{x}^T_{T_c^\downarrow}$ and $\bigcup_{T_c \in child(T)} \bar{x}^T_{T_c^\uparrow}$ are disjoint. By this construction, in each run of $\tilde\Gamma$, after each application of an internal service $\sigma$ of task $T$, the variables in $\hat{x}^T_{T_c^\downarrow}$ for each subtask $T_c$ receive a set of non-deterministically chosen values. Each subtask $T_c$ can then be opened only when $\hat{x}^T_{T_c^\downarrow}$ and $\bar{x}^T_{T_c^\downarrow}$ have the same values. So passing $\hat{x}^T_{T_c^\downarrow}$ to $T_c$ is equivalent to passing $\bar{x}^T_{T_c^\downarrow}$ to $T_c$.

To guarantee that there is a bijection from the runs of $\Gamma$ to the runs of $\tilde\Gamma$, we also need to make sure that the values of $\hat{x}^T_{T_c^\downarrow}$ are non-deterministically chosen before the first application of an internal service. (Recall that they contain either 0 or null at the point when $T$ is opened.) So we augment $T$ with an extra binary variable $x_{init}$ and an extra internal service $\sigma^{init}_T$. Variable $x_{init}$ indicates whether task $T$ has been "initialized". The service $\sigma^{init}_T$ has a pre-condition that checks whether $x_{init} = 0$ and a post-condition that sets $x_{init} = 1$. It sets all id variables to null and all numeric variables to 0, except for the variables in $\hat{x}^T_{T_c^\downarrow}$ for any $T_c$. So an application of $\sigma^{init}_T$ assigns values to $\hat{x}^T_{T_c^\downarrow}$ for every subtask $T_c$ non-deterministically, and all other variables are initialized to the initial state when $T$ is opened. All other services are modified such that they can be applied only when $x_{init} = 1$. So in a projected run $\rho_T$ of $\tilde\Gamma$, the suffix with $x_{init} = 1$ corresponds to the original projected run of $\Gamma$.
Thus we only need to rewrite the HLTL-FO property $\varphi$ to $\tilde\varphi$ such that each formula in $\Phi_T$ only looks at the suffix of the projected run $\rho_T$ after $x_{init}$ is set to 1. (Namely, each $\psi \in \Phi_T$ is replaced with $\mathbf{F}((x_{init} = 1) \wedge \psi)$.)

(Footnote: This is done to avoid introducing constants, which could also be used as flags.)

Now consider (ii). We outline the construction of $\tilde\Gamma$ and $\tilde\varphi$ informally. For each task $T$, we introduce a set of new numeric variables $\{x_{T_c} \mid T_c \in child(T), x \in \bar{x}^T_{T_c^\uparrow} \cap \mathit{VAR}_{\mathbb{R}}\}$ to $\bar{x}^T$. Intuitively, these variables contain non-deterministically guessed returning values from each child task $T_c$. These are passed to each child task $T_c$ as additional input variables. Before $T_c$ returns, these are compared to the values of the returning numeric variables of $T_c$, and $T_c$ returns only if they are identical. More formally, for each child task $T_c$ of $T$, the variables $\{x_{T_c} \mid x \in \bar{x}^T_{T_c^\uparrow} \cap \mathit{VAR}_{\mathbb{R}}\}$ are passed from $T$ to $T_c$ as part of the input variables of $T_c$. For each variable $x_{T_c}$ in $T$, we let $x_{T_c \to T} \in \bar{x}^{T_c}$ be the corresponding input variable of $x_{T_c}$. For each $x_{T_c}$, we denote by $x_{ret}$ the variable in $\bar{x}^{T_c}$ satisfying $f_{out}(x) = x_{ret}$ for $f_{out}$ in the original $\Gamma$. Then at $T_c$, we remove all numeric variables from $\bar{x}^{T_c}_{ret}$ and add the condition $\bigwedge_{x \in \bar{x}^T_{T_c^\uparrow} \cap \mathit{VAR}_{\mathbb{R}}} x_{ret} = x_{T_c \to T}$ to the closing condition of $T_c$. Note that we need to guarantee that the variables in $\{x_{T_c} \mid T_c \in child(T), x \in \bar{x}^T_{T_c^\uparrow} \cap \mathit{VAR}_{\mathbb{R}}\}$ obtain non-deterministically guessed values. This can be done as in the simulation for (i).

Conditions on $\bar{x}^T$ after a subset of $T$'s children have returned are evaluated using the guessed values for the variables returned so far. Specifically, the correct value to be used is the latest one returned by a child transaction, if any (recall that children tasks can overwrite each other's numeric return variables in the parent).
Keeping track of the sequence of returned transactions and evaluating conditions with the correct value can be easily done directly in the verification algorithm, at negligible extra cost. This means that we can assume that tasks have the form in (ii) without the exponential blowup in the conditions, but with the quadratic blowup in the number of variables.

Achieving the simulation fully via the specification is costlier because some of the conditions needed have exponential size. We next show how this can be done. Intuitively, we guess initially an order of the return of the children transactions and enforce that it be respected. We also keep track of the children that have already returned. Let $child(T) = \{T_1, \ldots, T_n\}$. To guess an order of return, we use new ID variables $\bar{o} = \{o_{ij} \mid 1 \le i, j \le n\}$. Intuitively, $o_{ij} \neq \mathtt{null}$ says that $T_i$ returns before $T_j$. We also use new ID variables $\{t_i \mid 1 \le i \le n\}$, where $t_i \neq \mathtt{null}$ means that $T_i$ has returned. The variables $\bar{o}$ are subject to a condition specifying the axioms for a total order: $\bigwedge_{1 \le i, j \le n} (o_{ij} \neq \mathtt{null} \vee o_{ji} \neq \mathtt{null}) \wedge \bigwedge_{1 \le i \ldots} \ldots$

C.1.1 Only-if: from actual runs to symbolic runs

Let $\mathit{Tree}$ be a tree of local runs accepted by $\mathcal{B}_\varphi$ (with database $D$). The construction of $\mathit{Sym}$ from $\mathit{Tree}$ is simple. This can be done by replacing each local run $\rho_T \in \mathit{Tree}$ with a local symbolic run $\tilde\rho_T$. More precisely, let
$\rho_T = (\nu_{in}, \nu_{out}, \{(J_i, \sigma_i)\}_{0 \le i < \gamma})$
be a local run in $\mathit{Tree}$, where $J_i = (\nu_i, S_i)$. We construct a corresponding local symbolic run
$\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$.
For $0 \le i < \gamma$, $I_i = (\tau_i, \bar{c}_i)$ is constructed from $(\nu_i, S_i)$ as follows. The navigation set $\mathcal{E}_T$ of $\tau_i$ contains every $x_R$ for every $x \in \bar{x}^T$ and $R$ such that $\nu_i(x)$ is an ID of relation $R$ in $D$.
Then we define $\nu^*_i$ to be a mapping from $\mathcal{E}^+_T = \mathcal{E}_T \cup \{0, \mathtt{null}\} \cup \bar{x}^T$ to actual values, where:
• $\nu^*_i(e) = e$ if $e \in \{0, \mathtt{null}\}$,
• $\nu^*_i(e) = \nu_i(x)$ for $e = x$ or $e = x_R$, and
• $\nu^*_i(e.\xi) = t.\xi$ if $\nu^*_i(e)$ is an ID of a tuple $t \in D$.

We construct the equality type $\sim_{\tau_i}$ such that for every $e$ and $e'$ in $\mathcal{E}^+_T$, $e \sim_{\tau_i} e'$ iff $\nu^*_i(e) = \nu^*_i(e')$. Also we let $\tau_{in} = \tau_0|\bar{x}^T_{in}$ and $\tau_{out} = \tau_{\gamma-1}|(\bar{x}^T_{in} \cup \bar{x}^T_{ret})$ if $\nu_{out} \neq \bot$, and $\tau_{out} = \bot$ otherwise. Since $D$ satisfies the functional dependencies, for every $\tau_i$ and expressions $e$ and $e'$, $e \sim_{\tau_i} e'$ implies that $\nu^*_i(e) = \nu^*_i(e')$, so for every attribute $a$, if $e.a$ and $e'.a$ are in the navigation set of $\tau_i$, then $e.a \sim_{\tau_i} e'.a$ because $\nu^*_i(e.a) = \nu^*_i(e'.a)$.

We also note the following facts.

Fact 32. For every condition $\psi$ over $\bar{x}^T$, $D \models \psi(\nu_i)$ iff $\tau_i \models \psi$.

Fact 33. For all $i, i'$ and $\bar{x} \subseteq \bar{x}^T$, if $\nu_i(\bar{x}) = \nu_{i'}(\bar{x})$ then $\tau_i|\bar{x} = \tau_{i'}|\bar{x}$.

Given the sequence $\{(\tau_i, \sigma_i)\}_{0 \le i < \gamma}$, the sequence of vectors of TS-isomorphism type counters $\{\bar{c}_i\}_{0 \le i < \gamma}$ is uniquely defined. Let $\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$. In view of Fact 32, it is easy to see that $\tilde\rho_T$ satisfies all items in the definition of local symbolic run that do not involve the counters. To show that $\tilde\rho_T$ is a local symbolic run, it remains to show that $\bar{c}_i \ge \bar{0}$ for $0 \le i < \gamma$. To see that this holds, we associate a sequence of counter vectors $\{\tilde{c}_i\}_{0 \le i < \gamma}$ to the local run $\rho_T$, where each $\tilde{c}_i$ provides, for each TS-isomorphism type $\hat\tau$, the number of tuples in $S_i$ of TS-isomorphism type $\hat\tau$ (the TS-isomorphism type of a tuple $t \in S_i$ is defined analogously to the $T$-isomorphism type for each local instance). By definition, $\tilde{c}_i \ge \bar{0}$ for each $i \ge 0$. Thus it is sufficient to show that $\tilde{c}_i \le \bar{c}_i$ for each $i$. We show this by induction.
For $i = 0$, $\tilde{c}_0 = \bar{c}_0 = \bar{0}$. Suppose $\tilde{c}_{i-1} \le \bar{c}_{i-1}$ and consider the transition under service $\sigma_i$ in $\rho_T$ and $\tilde\rho_T$. It is easily seen that $\tilde{c}_{i-1}$ and $\bar{c}_{i-1}$ are modified in the same way except in the case when $+S^T(\bar{s}^T) \in \delta_i$, $\hat\tau_{i-1}$ is not input-bound, and $\nu_{i-1}(\bar{s}^T) \in S_{i-1}$. In this case, if $\hat\tau$ is the TS-isomorphism type of $\nu_{i-1}(\bar{s}^T)$, then $\tilde{c}_i(\hat\tau) = \tilde{c}_{i-1}(\hat\tau)$ whereas $\bar{c}_i(\hat\tau) = \bar{c}_{i-1}(\hat\tau) + 1$. In all cases, $\tilde{c}_i \le \bar{c}_i$. Thus, $\tilde\rho_T$ is a local symbolic run. The fact that $\mathit{Sym}$ is a tree of symbolic local runs follows from Fact 33, which ensures the consistency of the isomorphism types passed to and from subtasks. Finally, the fact that $\mathit{Sym}$ is accepted by $\mathcal{B}_\varphi$ follows from acceptance of $\mathit{Tree}$ by $\mathcal{B}_\varphi$ and Fact 32.

C.1.2 If part: from symbolic runs to actual runs

We denote by $FD$ the set of key dependencies in the database schema $\mathcal{DB}$ and by $IND$ the set of foreign key dependencies. We show the following.

Lemma 34. For every symbolic tree of runs $\mathit{Sym}$ accepted by $\mathcal{B}_\beta$, there exists a tree $\mathit{Tree}$ of local runs accepted by $\mathcal{B}_\beta$ with a finite database instance $D$ where $D \models FD$.

Note that the above does not require that $D$ satisfy $IND$. This is justified by the following.

Lemma 35. For every tree of local runs $\mathit{Tree}$ with database $D \models FD$, if $\mathit{Tree}$ is accepted by $\mathcal{B}_\beta$ then there exists a finite database $D' \models FD \cup IND$ such that $\mathit{Tree}$ with database $D'$ is also a tree of local runs accepted by $\mathcal{B}_\beta$.

Proof. We can construct $D'$ by adding tuples to $D$ as follows. First, for each relation $R$ such that $R$ is empty in $D$, we add an arbitrary tuple $t$ to $R$. Next, for each foreign key dependency $R_i[F] \subseteq R_j[ID]$, for each tuple $t$ of $R_i$ such that there is no tuple in $R_j$ with id $t[F]$, we add to $R_j$ a tuple $t'$ where
• $t'[ID] = t[F]$, and
• $t'[attr(R_j) - \{ID\}] = t''[attr(R_j) - \{ID\}]$ where $t''$ is an existing tuple in $R_j$.
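The two repair steps listed above can be tried out on a toy database. The following is only a minimal sketch under stated assumptions, not the construction used in the proof itself: the dictionary encoding of relations, the function name `repair_foreign_keys`, the `("fresh", R)` placeholder tuple, and the use of Python's `None` for null are all hypothetical choices made for illustration.

```python
def repair_foreign_keys(db, schema, foreign_keys):
    """Extend `db` so that every foreign key holds, leaving existing tuples untouched.

    db:           {relation: {tuple_id: {attribute: value}}}
                  (IDs are dictionary keys, so the key dependencies hold by construction)
    schema:       {relation: [names of the non-ID attributes]}
    foreign_keys: [(R_i, F, R_j)], read as "R_i[F] is included in R_j[ID]"
    """
    out = {r: {tid: dict(t) for tid, t in db.get(r, {}).items()} for r in schema}

    # Step 1: every empty relation receives one arbitrary tuple, used later as a template.
    for r, attrs in schema.items():
        if not out[r]:
            out[r][("fresh", r)] = {a: ("fresh", r) for a in attrs}

    # Step 2: for each dangling reference t[F], add a tuple with that ID to R_j and
    # copy its remaining attributes from an existing tuple of R_j.
    changed = True
    while changed:
        changed = False
        for ri, f, rj in foreign_keys:
            for t in list(out[ri].values()):
                target = t[f]
                if target is not None and target not in out[rj]:
                    template = next(iter(out[rj].values()))
                    out[rj][target] = dict(template)
                    changed = True
    return out


db = {"Order": {"o1": {"cust": "c9"}}, "Customer": {"c1": {"name": "Ann"}}}
schema = {"Order": ["cust"], "Customer": ["name"]}
fks = [("Order", "cust", "Customer")]
repaired = repair_foreign_keys(db, schema, fks)
```

Because added tuples copy their attributes from a template, the repair introduces no foreign key values beyond those of the templates and placeholders, so the loop reaches a fixpoint after finitely many additions.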
$\mathit{Tree}$ with database $D'$ is accepted by $\mathcal{B}_\beta$ since $D'$ is an extension of $D$. Also, $D'$ is finite since the number of added tuples is at most linear in the sum of the number of empty relations in $D$ and the number of tuples in $D$ that violate $IND$.

To show Lemma 34, we begin with a construction of a local run $\rho_T$ on a finite database $D_T$ for each local symbolic run $\tilde\rho_T \in \mathit{Sym}$. The local runs are constructed so that they can be merged consistently into a tree of local runs $\mathit{Tree}$ with a single finite database $D$. The major challenge in the construction of each $\rho_T$ and $D_T$ is that if $\tilde\rho_T$ is infinite, the size of $S^T$ can grow infinitely, and a naive construction of $\rho_T$ would require infinitely many distinct values in $D_T$. Our construction needs to ensure that $D_T$ is always finite. For ease of exposition, we first consider the case where $\tilde\rho_T$ is finite and then extend the result to infinite $\tilde\rho_T$.

Finite Local Symbolic Runs

Recall from the previous section that $\nu^*(e)$ denotes the value of expression $e$ in database $D_T$ with valuation $\nu$ of $\bar{x}^T$. By abuse of notation, we extend $\nu^*(e)$ to $e \in \{x_R.w \mid x \in \bar{x}^T, R \in \mathcal{DB}\} \cup \bar{x}^T \cup \{0, \mathtt{null}\}$ where there is no restriction on the length of $w$. So for an expression $e = x_R.w$, $\nu^*(e)$ is the value in $D_T$ obtained by foreign key navigation starting from the value $\nu^*(x)$ at relation $R$ and following the sequence of attributes $w$, if such a value exists. Note that $\nu^*$ may be only partially defined since $D_T$ may not satisfy all foreign key constraints. Analogously, we define $\nu^*_{in}(e)$ to be the value of $e$ in $D_T$ at valuation $\nu_{in}$ and $\nu^*_{out}(e)$ to be the value of $e$ in $D_T$ at valuation $\nu_{out}$.

We prove the following, showing the existence of an actual local run corresponding to a finite local symbolic run. The lemma provides some additional information used when merging local runs into a final tree of runs.
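To make the partiality of $\nu^*$ concrete, here is a small sketch of foreign key navigation over a toy database. It is an illustration under assumptions rather than part of the formal development: the encoding of relations as nested dictionaries, the `fk_target` map, the function name `navigate`, and the `UNDEFINED` sentinel are all hypothetical.

```python
UNDEFINED = object()  # sentinel for "no value"; distinct from None, which models null


def navigate(db, fk_target, valuation, x, relation, path):
    """Value of the expression x_R.w, or UNDEFINED when navigation gets stuck.

    db:        {relation: {tuple_id: {attribute: value}}}
    fk_target: {(relation, attribute): referenced relation} for foreign key attributes
    valuation: {variable: value}
    x, relation, path: the variable x, the relation R, and the attribute sequence w
    """
    value, rel = valuation[x], relation
    for attr in path:
        row = db.get(rel, {}).get(value)
        if row is None or attr not in row:
            return UNDEFINED  # dangling reference, or attribute of a non-ID value
        value = row[attr]
        rel = fk_target.get((rel, attr))  # None when attr is not a foreign key
    return value


db = {
    "Order": {"o1": {"cust": "c1", "total": 5}},
    "Customer": {"c1": {"city": "k7"}},
}
fk_target = {("Order", "cust"): "Customer", ("Customer", "city"): "City"}
valuation = {"x": "o1"}
```

In this toy instance the reference from `Customer` to `City` is dangling, so navigation of length two succeeds while any longer path is undefined, mirroring the remark that $D_T$ need not satisfy all foreign key constraints.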
Lemma 36. For every finite local symbolic run $\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ ($\gamma \neq \omega$), there exists a local run $\rho_T = (\nu_{in}, \nu_{out}, \{(\rho_i, \sigma_i)\}_{0 \le i < \gamma})$ on a finite database $D_T \models FD$ such that for every $0 \le i < \gamma$,
(i) for every expression $e = x_R.w$ where $\nu^*_i(e)$ is defined, there exists an expression $e' = x_R.w'$ where $|w'| \le h(T)$ such that $\nu^*_i(e) = \nu^*_i(e')$,
(ii) for all expressions $e, e' \in \mathcal{E}^+_T$ of $\tau_i$, if $\nu^*_i(e)$ and $\nu^*_i(e')$ are defined, then $e \sim_{\tau_i} e'$ iff $\nu^*_i(e) = \nu^*_i(e')$, and
(iii) for $\delta = h(T_c)$ if $\sigma_i \in \{\sigma^o_{T_c}, \sigma^c_{T_c}\}$ for some $T_c \in child(T)$ and $\delta = 1$ otherwise, for every expression $e \in \mathcal{E}^-_T = \mathcal{E}^+_T - \{x_R.w \mid x \in \bar{x}^T, |w| > \delta\}$, $\nu^*_i(e)$ is defined.

Part (i), needed for technical reasons, says that for all values $v$ in $D_T$, if $v$ is the value of an expression $x_R.w$, then $v$ is also the value of an expression $x_R.w'$ where the length of $w'$ is within $h(T)$. Part (ii) says, intuitively, that the equality types in the symbolic local run and the constructed local run are the same. Part (iii) states that for every $0 \le i < \gamma$, at valuation $\nu_i$, every expression $e$ within $\delta$ steps of foreign key navigation from any variable $x$ is defined in $D_T$. Since $\delta \ge 1$, this together with (ii) implies that for every condition $\pi$, $\tau_i \models \pi$ iff $D_T \models \pi(\nu_i)$. So if $\tilde\rho_T$ is accepted by some computation of a Büchi automaton $B(T, \eta)$ then $\rho_T$ is also accepted by the same computation of $B(T, \eta)$.

We provide the proof of Lemma 36 in the remainder of the section. We first show that from each finite local symbolic run $\tilde\rho_T$, we can construct a global isomorphism type of $\tilde\rho_T$, which is essentially an equality type over the entire set of expressions in the symbolic instances of $\tilde\rho_T$.
Then we show that the local run $\rho_T$ and database $D_T$ whose domain values are the equivalence classes of the global isomorphism type satisfy the properties in Lemma 36.

Global Isomorphism Types

We prove Lemma 36 by constructing $\rho_T$ and $D_T$ from $\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ ($\gamma \neq \omega$). We first introduce some additional notation.

Let $\mathcal{I}^+$ be the set of symbolic instances $I_i$ of $\tilde\rho_T$ ($i < \gamma - 1$) such that $+S^T(\bar{s}^T) \in \delta_{i+1}$ and $\hat\tau_i$ is not input-bound. Similarly, let $\mathcal{I}^-$ be the set of symbolic instances $I_i$ ($i < \gamma$) such that $-S^T(\bar{s}^T) \in \delta_i$ and $\hat\tau_i$ is not input-bound. We define a one-to-one function $\mathit{Retrieve}$ from $\mathcal{I}^-$ to $\mathcal{I}^+$ such that for every $I_i = \mathit{Retrieve}(I_j)$, $i < j$ and $\hat\tau_i = \hat\tau_j$. We say that $I_j$ retrieves from $I_i$. As $\bar{c}_i \ge \bar{0}$ for every $i$, at least one mapping $\mathit{Retrieve}$ always exists. Intuitively, $\mathit{Retrieve}$ connects symbolic instance $I_j$ to $I_i$ such that $I_j$ retrieves a tuple from $S^T$ which has the same isomorphism type as a tuple inserted at $I_i$. For each $I_i = \mathit{Retrieve}(I_j)$, in the local run $\rho_T$ we construct, valuations $\nu_i$ and $\nu_j$ have the same values on the variables $\bar{s}^T$. Here we ignore input-bound isomorphism types since these can be seen as part of the input isomorphism type: in $\rho_T$, instances having the same input-bound TS-isomorphism type have the same values on $\bar{s}^T$.

Recall that a segment $S = \{(I_i, \sigma_i)\}_{a \le i \le b}$ is a maximal consecutive subsequence of $\{(I_i, \sigma_i)\}_{0 \le i < \gamma}$ such that $\sigma_a$ is an internal service and for $a < i \le b$, $\sigma_i$ is an opening service or closing service of a child task of $T$. For our choice of the $\mathit{Retrieve}$ relation, we define a life cycle $L = \{(I_i, \sigma_i)\}_{i \in J}$ as a maximal subsequence of $\{(I_i, \sigma_i)\}_{0 \le i < \gamma}$ for $J \subseteq [0, \gamma)$ where for each pair of consecutive $(I_a, \sigma_a)$ and $(I_b, \sigma_b)$ in $L$ where $a < b$, $(I_a, \sigma_a)$ and $(I_b, \sigma_b)$ are either in the same segment or $I_a = \mathit{Retrieve}(I_b)$. Note that a life cycle $L$ is also a sequence of segments.
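One possible way to obtain a mapping with the properties required of $\mathit{Retrieve}$ is a first-in-first-out matching per isomorphism type. The sketch below is an assumption-laden illustration, not the choice made in the proof (which allows any such mapping): the function name `build_retrieve`, the list-and-set encoding of the run, and the FIFO policy are hypothetical.

```python
from collections import defaultdict, deque


def build_retrieve(types, inserting, retrieving):
    """Match every retrieving index j to an earlier inserting index i of the same type.

    types:      types[i] is the (non-input-bound) isomorphism type at instance i
    inserting:  set of indices i whose instance inserts a tuple of type types[i]
    retrieving: set of indices j whose instance retrieves a tuple of type types[j]
    Returns a dict j -> i that is one-to-one, with i < j and types[i] == types[j].
    """
    pending = defaultdict(deque)  # per type: inserting indices not yet matched
    retrieve = {}
    for j, tau in enumerate(types):
        # Handle the retrieval first, so that only strictly earlier insertions qualify.
        if j in retrieving:
            if not pending[tau]:
                # Corresponds to a counter for tau dropping below zero.
                raise ValueError(f"no unmatched insertion of type {tau!r} before index {j}")
            retrieve[j] = pending[tau].popleft()
        if j in inserting:
            pending[tau].append(j)
    return retrieve


matching = build_retrieve(["a", "b", "a", "a", "b"], {0, 1, 2}, {3, 4})
```

The non-negativity of the counters is what guarantees that the queue for the relevant type is never empty when a retrieval is processed; the sketch raises an error in the situation that the counters rule out.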
From the definition of local symbolic runs, we can show the following properties for segments and life cycles:

Lemma 37. (i) For every segment $S = \{(I_i, \sigma_i)\}_{a \le i \le b}$, for every $i, j \in [a, b]$ where $i < j$, for $\bar{x} = \{x \mid x \in \bar{x}^T, x \not\sim_{\tau_i} \mathtt{null}\}$, $\tau_i|\bar{x} = \tau_j|\bar{x}$. (ii) For every life cycle $L = \{(I_i, \sigma_i)\}_{i \in J}$, for every $i, j \in J$ where $i < j$, for $\bar{x} = \{x \mid x \in \bar{x}^T_{in} \cup \bar{s}^T, x \not\sim_{\tau_i} \mathtt{null}\}$, $\tau_i|\bar{x} = \tau_j|\bar{x}$.

Next, for each symbolic instance $I_i$, we define the pruned isomorphism type $\lambda_i = (\mathcal{E}_i, \sim_i)$ of $I_i$ as follows. Intuitively, $\lambda_i$ is obtained from $\tau_i$ by removing expressions with "long" navigation from variables. Formally, let $\mathcal{E}^+_T$ be the extended navigation set of $\tau_i$ and $\mathcal{E}^-_T = \mathcal{E}^+_T - \{x_R.w \mid x \in \bar{x}^T, |w| > \delta\}$, where $\delta = 1$ if $T$ is a leaf task, and otherwise $\delta = \max_{T_c \in child(T)} h(T_c)$. A local expression of $I_i$ is a pair $(i, e)$ where $e \in \mathcal{E}^-_T$, and we define $\mathcal{E}_i = \{(i, e) \mid e \in \mathcal{E}^-_T\}$ to be the local navigation set of $\lambda_i$. We also define the local equality type $\sim_i$ of $\lambda_i$ to be an equality type over $\mathcal{E}_i$ where $(i, e) \sim_i (i, e')$ iff $e \sim_{\tau_i} e'$, for every $e, e' \in \mathcal{E}^-_T$.

Then we define the global isomorphism type as follows. A global isomorphism type is a pair $\Lambda = (\mathcal{E}, \sim)$, where $\mathcal{E} = \bigcup_{0 \le i < \gamma} \mathcal{E}_i$ is called the global navigation set and $\sim$ is an equality type over $\mathcal{E}$ called the global equality type. For each expression $e \in \mathcal{E}$, let $[e]$ denote its equivalence class with respect to $\sim$. The global equality type $\sim$ is constructed as follows:
1. Initialization: $\sim \; \leftarrow \; \bigcup_{0 \le i < \gamma} \sim_i$.
2. Chase: Until convergence, merge two equivalence classes $E$ and $E'$ of $\sim$ if $E$ and $E'$ satisfy one of the following conditions:
• Segment-Condition: For some segment $S = \{(I_i, \sigma_i)\}_{a \le i \le b}$, variable $x \in \bar{x}^T$ and $i, i' \in [a, b]$ where $x \not\sim_{\tau_i} \mathtt{null}$ and $x \not\sim_{\tau_{i'}} \mathtt{null}$, $E = [(i, x)]$ and $E' = [(i', x)]$.
• Life-Cycle-Condition: For some life cycle $L = \{(I_i, \sigma_i)\}_{i \in J}$, variable $x \in \bar{x}^T_{in} \cup \bar{s}^T$ and $i, i' \in J$ where $x \not\sim_{\tau_i} \mathtt{null}$ and $x \not\sim_{\tau_{i'}} \mathtt{null}$, $E = [(i, x)]$ and $E' = [(i', x)]$.
• Input-Condition: For some variable $x \in \bar{x}^T_{in}$ and $i, i' \in [0, \gamma)$, $E = [(i, x)]$ and $E' = [(i', x)]$.
• FD-Condition: For some local expressions $(i, e), (i', e')$ and attribute $a$ where $(i, e) \sim (i', e')$, $E = [(i, e.a)]$ and $E' = [(i', e'.a)]$.

From the global isomorphism type $\Lambda$ defined above, we construct $\rho_T$ and $D_T$ as follows. The domain of $D_T$ is the set of equivalence classes of $\sim$. Each relation $R(id, a_1, \ldots, a_k)$ in $D_T$ consists of all tuples $([(i, e)], [(i, e.a_1)], \ldots, [(i, e.a_k)])$ for which $(i, e), (i, e.a_1), \ldots, (i, e.a_k) \in \mathcal{E}$. Note that the chase step guarantees that for all local expressions $(i, e), (i', e')$, if $(i, e.a), (i', e'.a) \in \mathcal{E}$ and $(i, e) \sim (i', e')$, then $(i, e.a) \sim (i', e'.a)$. It follows that $D_T \models FD$. We next define $\rho_T = (\nu_{in}, \nu_{out}, \{(\rho_i, \sigma_i)\}_{0 \le i < \gamma})$, where $\rho_i = (\nu_i, S_i)$. First, let $\nu_i(x) = [(i, x)]$ for $0 \le i < \gamma$, $\nu_{in} = \nu_0|\bar{x}^T_{in}$, and $\nu_{out} = \bot$ if $\tau_{out} = \bot$ and $\nu_{out} = \nu_{\gamma-1}|\bar{x}^T_{ret}$ otherwise. Suppose that, as will be shown below, properties (i)-(iii) of Lemma 36 hold for $D_T$ and the sequence $\{\nu_i\}_{0 \le i < \gamma}$ so defined. Note that (ii) and (iii) imply that the pre-and-post conditions of all services $\sigma_i$ hold. Also, by construction, for every variable $x \in \bar{x}^T$ where $\nu_{i-1}(x) = \nu_i(x)$ is required by the transition under $\sigma_i$, we always have $(i-1, x) \sim (i, x)$. Consider the sets $\{S_i\}_{0 \le i < \gamma}$. Recall the constraints imposed on sets by the definition of local run: $S_0 = \emptyset$, and for $0 < i < \gamma$ where $\delta_i$ is the set update of $\sigma_i$,
1.
$S_i = S_{i-1} \cup \{\nu_{i-1}(\bar{s}^T)\}$ if $\delta_i = \{+S^T(\bar{s}^T)\}$,
2. $S_i = S_{i-1} - \{\nu_i(\bar{s}^T)\}$ if $\delta_i = \{-S^T(\bar{s}^T)\}$,
3. $S_i = (S_{i-1} \cup \{\nu_{i-1}(\bar{s}^T)\}) - \{\nu_i(\bar{s}^T)\}$ if $\delta_i = \{+S^T(\bar{s}^T), -S^T(\bar{s}^T)\}$, and
4. $S_i = S_{i-1}$ if $\delta_i = \emptyset$.

Note that the only cases that can make $\rho_T$ invalid are those for which $\delta_i$ contains $-S^T(\bar{s}^T)$. Indeed, while a tuple can always be inserted, a tuple can be retrieved only if it belongs to $S^T$ (or is simultaneously inserted as in case (3)). Thus, in order to show that the specified retrievals are possible, it is sufficient to prove the following.

Lemma 38. Let $0 < i < \gamma$ be such that (1)-(4) hold for $\{S_j\}_{0 \le j \ldots}$
The key observations, which are easily checked by the construction of $\Lambda$, are the following:
($\dagger$) for every $k, k' \in [0, \gamma)$, if $\hat\tau_k, \hat\tau_{k'}$ are not input-bound and $I_k$ and $I_{k'}$ are not in the same life cycle, then $\nu_k(\bar{s}^T) \neq \nu_{k'}(\bar{s}^T)$.
($\ddagger$) for every $k, k' \in [0, \gamma)$, if $\hat\tau_k, \hat\tau_{k'}$ are input-bound, then $\nu_k(\bar{s}^T) = \nu_{k'}(\bar{s}^T)$ iff $\hat\tau_k = \hat\tau_{k'}$.

Now suppose that $0 < i < \gamma$, (1)-(4) hold for $\{S_j\}_{0 \le j \ldots}$
We first introduce some additional notation. For each $i$ and $(i, e) \in \mathcal{E}_i$, we denote by $[(i, e)]_i$ the equivalence class of $(i, e)$ with respect to $\sim_i$. For $x \in \bar{x}^T$ we denote by $Reach_i(x, w)$ the unique equivalence class of $\sim_i$ reachable from $[(i, x_R)]_i$ by some navigation $w$ (if such a class exists). More precisely:

Definition 39. For each $0 \le i < \gamma$, we define $G(\sim_i)$ to be the labeled directed graph whose nodes are the equivalence classes of $\sim_i$ and where for each attribute $a$, there is an edge labeled $a$ from $E$ to $F$ if there exist $e \in E$ and $f \in F$ such that $(i, e.a) \in \mathcal{E}_i$ and $e.a \sim_{\tau_i} f$. Note that for each $E$ there is at most one outgoing edge labeled $a$. For $x \in \bar{x}^T$, $x \not\sim_i \mathtt{null}$ and a sequence of attributes $w$, we denote by $Reach_i(x, w)$ the unique equivalence class $F$ of $\sim_i$ reachable from $[(i, x)]_i$ by a path in $G(\sim_i)$ whose sequence of edge labels spells $w$, if such a class exists, and the empty set otherwise.

By our choice of $h(T)$ and our construction of the $\lambda_i$'s, we can show the following.

Lemma 40. For every $0 \le i < \gamma$ and expression $x_R.w$, if $Reach_i(x, w)$ is non-empty, then there exists an expression $x_R.\tilde{w}$ where $|\tilde{w}| < h(T)$ such that $Reach_i(x, w) = Reach_i(x, \tilde{w})$.

Proof. It is sufficient to show that for each $i$, $|G(\sim_i)| \ldots$

To show property (ii), it is sufficient to show an invariant which implies property (ii) and is satisfied throughout the construction of $\Lambda$. For simplicity, we assume that the chase step in the construction of $\sim$ is divided into the following 3 phases.
• The Segment Phase. In this phase, we merge equivalence classes $E$ and $E'$ that satisfy either the Segment-Condition or the FD-Condition.
• The Life Cycle Phase. In this phase, we merge equivalence classes $E$ and $E'$ that satisfy either the Life-Cycle-Condition or the FD-Condition.
• The Input Phase.
In this phase, we merge equivalence classes $E$ and $E'$ that satisfy either the Input-Condition or the FD-Condition.

It is easily seen that no chase step applies after the input phase. Thus, the above steps compute the complete chase. For each equivalence class $E$ of $\sim$, we let $i(E)$ be the set of indices $\{i \mid (i, e) \in E\}$ and for each $i \in i(E)$, we denote by $E|_i$ the projection of $E$ on the navigation set $\mathcal{E}_i$. One can show that during the segment phase, for every $E$ of $\sim$, $i(E)$ are indices within the same segment. During the life cycle phase, for every $E$ of $\sim$, $i(E)$ are indices within the same life cycle. During the input phase, $i(E)$ can be arbitrary indices.

The invariant is defined as follows.

Lemma 41. (Invariant of $\Lambda$) Throughout the construction of $\Lambda$, for every equivalence class $E$ of $\sim$, there exist a variable $x \in \bar{x}^T$ and a navigation $w$ where $|w| \le h(T)$, such that for every $i \in i(E)$, $E|_i = Reach_i(x, w)$.

Lemma 41 implies that for each equivalence class $E$ of $\sim$ and for each $\lambda_i$, $E$ is a superset of at most one equivalence class of $\lambda_i$. So $(i, e) \sim (i, e')$ implies $(i, e) \sim_i (i, e')$, thus $\Lambda|\mathcal{E}_i = \lambda_i$ for every $0 \le i < \gamma$, which implies property (ii) of Lemma 36.

Proof. We consider each step of the construction of the global equality type $\sim$. For the initialization step, the invariant holds by Lemma 40.

For the chase steps, assume that the invariant is satisfied before merging two equivalence classes $E$ and $E'$. For each equivalence class $E$ of $\sim$, we denote by $x(E)$ and $w(E)$ the variable and the navigation for $E$ as stated in Lemma 41. To show that the invariant is satisfied after merging $E$ and $E'$, it is sufficient to show that there exist a variable $y$ and a navigation $u$ where $|u| \le h(T)$ such that for every $i \in i(E)$, $E|_i = Reach_i(y, u)$ and for every $i \in i(E')$, $E'|_i = Reach_i(y, u)$.

Consider the segment phase.
Suppose first that $E$ and $E'$ are merged due to the Segment-Condition. For simplicity, we let $x = x(E)$, $x' = x(E')$, $w = w(E)$ and $w' = w(E')$. If $E = [(i, y)]$ and $E' = [(i', y)]$ where $i, i'$ are indices within the same segment $S$, then by the assumption, we have $(i, y) \in Reach_i(x, w)$, so $y \sim_{\tau_i} x_R.w$. As $i(E)$ are indices of a segment $S$, and by Lemma 37, we have that for every $j \in i(E)$, $y \sim_{\tau_j} x_R.w$, so $E|_j = Reach_j(x, w) = Reach_j(y, \epsilon)$. Similarly, we can show that for every $j \in i(E')$, $E'|_j = Reach_j(y, \epsilon)$.

Next suppose $E$ and $E'$ are merged due to the FD-Condition. Thus, $E = [(i, e.a)]$ and $E' = [(i', e'.a)]$ where $(i, e) \sim (i', e')$. Let $E^*$ be the equivalence class of $\sim$ that contains $(i, e)$ and $(i', e')$. By the assumption, for $y = x(E^*)$ and $u = w(E^*)$, we have that $E^*|_i = Reach_i(y, u)$ so $(i, e) \in Reach_i(y, u)$. By Lemma 40, there exists a navigation $\tilde{u}$ where $|\tilde{u}| < h(T)$ such that $Reach_i(y, u) = Reach_i(y, \tilde{u})$. So $(i, e.a) \in Reach_i(y, \tilde{u}.a)$. Then in $E$, by the hypothesis, we have $(i, e.a) \in Reach_i(x, w)$ so $Reach_i(y, \tilde{u}.a) = Reach_i(x, w)$. As $i(E)$ are indices of a segment $S$, and by Lemma 37, we have that for every $j \in i(E)$, for some relations $R_1$ and $R_2$, $y_{R_1}.\tilde{u}.a \sim_{\tau_j} x_{R_2}.w$, so $E|_j = Reach_j(x, w) = Reach_j(y, \tilde{u}.a)$. Similarly, we can show that for every $j \in i(E')$, $E'|_j = Reach_j(y, \tilde{u}.a)$. Therefore, the invariant is preserved during the segment phase.

Consider the life cycle phase. We can show that the invariant is again preserved, together with the following additional property: for each equivalence class $E$ of $\sim$ produced in this phase, $x(E) \in \bar{x}^T_{in} \cup \bar{s}^T$.
Suppose $E$ and $E'$ are merged due to the Life-Cycle-Condition, where $E = [(i, y)]$, $E' = [(i', y)]$ and $y \in \bar{x}^T_{in} \cup \bar{s}^T$. We have that $E|_j = Reach_j(x, w) = Reach_j(y, \epsilon)$ for every $j \in i(E)$. Indeed, by Lemma 37 and because $i(E)$ are indices of some life cycle $L$, $x_R.w \sim_{\tau_i} y$ implies that $x_R.w \sim_{\tau_j} y$ for every index $j$ of $L$. Similarly, $E'|_j = Reach_j(y, \epsilon)$ for every $j \in i(E')$. The case when $E$ and $E'$ are merged in this stage due to the FD-Condition is similar to the above. Following a similar analysis, we can show that the input phase also preserves the invariant together with the property that for every $E$ produced in the input phase, $x(E) \in \bar{x}^T_{in}$. This uses the fact that $\tau_i|\bar{x}^T_{in} = \tau_{in}$ for every $0 \le i < \gamma$.

This completes the proof of Lemma 36.

Infinite Local Symbolic Runs

In this section we show that Lemma 36 can be extended to infinite periodic local symbolic runs, which together with finite runs are sufficient to represent accepted symbolic trees of runs by our VASS construction (see Lemma 21). Specifically, we show that we can extend the construction of the global isomorphism type to infinite periodic $\tilde\rho_T$, while producing only finitely many equivalence classes. This is sufficient to show that the corresponding database $D_T$ is finite.

We define periodic local symbolic runs next.

Definition 42. A local symbolic run $\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ is periodic if $\gamma = \omega$ and there exist $n > 0$ and $0 < t \le n$, such that for every $i \ge n$, the symbolic instances $I_i = (\tau_i, \bar{c}_i, \sigma_i)$ and $I_{i-t} = (\tau_{i-t}, \bar{c}_{i-t}, \sigma_{i-t})$ satisfy $(\tau_i, \sigma_i) = (\tau_{i-t}, \sigma_{i-t})$ and $\bar{c}_i \ge \bar{c}_{i-t}$. The integer $t$ is called the period of $\tilde\rho_T$.
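The periodicity condition can be checked mechanically on any finite prefix of a run. The sketch below is only an illustration under assumptions: an infinite run cannot be inspected in full, so the hypothetical function `is_periodic_on_prefix` tests the condition for the indices available; the triple encoding of instances and the dictionary encoding of counter vectors (missing entries read as 0, all entries non-negative) are likewise assumed.

```python
def is_periodic_on_prefix(run, n, t):
    """Check the periodicity condition for all indices i with n <= i < len(run).

    run: list of triples (tau, sigma, counters), where counters maps an
         isomorphism type to a non-negative integer (missing keys mean 0).
    n, t: candidate threshold and period, required to satisfy 0 < t <= n.
    """
    if not (0 < t <= n):
        return False
    for i in range(n, len(run)):
        tau, sigma, counters = run[i]
        prev_tau, prev_sigma, prev_counters = run[i - t]
        # The type and the service must repeat with period t.
        if (tau, sigma) != (prev_tau, prev_sigma):
            return False
        # The counter vector must not decrease in any component.
        if any(counters.get(k, 0) < v for k, v in prev_counters.items()):
            return False
    return True


prefix = [
    ("t0", "s0", {}),
    ("t1", "s1", {"a": 1}),
    ("t2", "s2", {"a": 1}),
    ("t1", "s1", {"a": 2}),
    ("t2", "s2", {"a": 2}),
    ("t1", "s1", {"a": 2}),
]
```

On this prefix the condition holds for $n = 3$ and $t = 2$ but fails for $t = 1$, since consecutive instances carry different types.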
From Lemma 21 in Section 4, we have the following:

Corollary 43. If there exists a symbolic tree of runs $\mathit{Sym}$ accepted by $\mathcal{B}_\beta$, then there exists a symbolic tree of runs $\mathit{Sym}'$ accepted by $\mathcal{B}_\beta$ such that for every $\tilde\rho_T \in \mathit{Sym}'$, $\tilde\rho_T$ is finite or periodic.

The above corollary indicates that for verification, it is sufficient to consider only finite and periodic $\tilde\rho_T$. So what we need to prove is:

Lemma 44. For every periodic local symbolic run $\tilde\rho_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \omega})$, there exists a local run $\rho_T = (\nu_{in}, \nu_{out}, \{(\rho_i, \sigma_i)\}_{0 \le i < \omega})$ on a finite database $D_T \models FD$ such that for every $i \ge 0$,
(i) for every expression $e = x_R.w$ where $\nu^*_i(e)$ is defined, there exists an expression $e' = x_R.w'$ where $|w'| \le h(T)$ such that $\nu^*_i(e) = \nu^*_i(e')$,
(ii) for all expressions $e, e' \in \mathcal{E}^+_T$ of $\tau_i$, if $\nu^*_i(e)$ and $\nu^*_i(e')$ are defined, then $e \sim_{\tau_i} e'$ iff $\nu^*_i(e) = \nu^*_i(e')$, and
(iii) for $\delta = h(T_c)$ if $\sigma_i \in \{\sigma^o_{T_c}, \sigma^c_{T_c}\}$ for some $T_c \in child(T)$ and $\delta = 1$ otherwise, for every expression $e \in \mathcal{E}^-_T = \mathcal{E}^+_T - \{x_R.w \mid x \in \bar{x}^T, |w| > \delta\}$, $\nu^*_i(e)$ is defined.

Intuitively, if we directly apply the construction of $\rho_T$ and $D_T$ from Lemma 36 for the case of finite $\tilde\rho_T$, then each life cycle with non-input-bound TS-isomorphism types would be assigned a distinct set of values, which could lead to an infinite $D_T$. However, for any two life cycles $L_1$ and $L_2$ which are disjoint in their timespans, reusing the same values in $L_1$ and $L_2$ does not cause any conflict. In particular, if $L_1$ and $L_2$ are identical on the sequence of $\tau_i$'s and $\sigma_i$'s, they can share exactly the same set of values.

Thus at a high level, our goal is to show that any periodic local symbolic run $\tilde\rho_T$ can be partitioned into finitely many subsets of identical life cycles with disjoint timespans.
Unfortunately, this is generally not true if we pick the $\mathit{Retrieve}$ function arbitrarily (recall that $\mathit{Retrieve}$ defines the set of life cycles). This is because an arbitrary $\mathit{Retrieve}$ may yield life cycles whose timespans have unbounded length. If the timespans overlap, it is impossible to separate the life cycles into finitely many subsets of life cycles with disjoint timespans. So instead of picking an arbitrary $\mathit{Retrieve}$ as in the finite case, we show that for periodic $\tilde\rho_T$ we can construct $\mathit{Retrieve}$ such that the timespan of each life cycle has bounded length. This implies that we can partition the life cycles into finitely many subsets of identical life cycles with disjoint timespans, as desired. Finally we show that given the partition, we can construct the local run $\rho_T$ together with a finite $D_T$.

We first define the equivalence relation between life cycles.

Definition 45. Segments $S_1 = \{(I_i, \sigma_i)\}_{a_1 \le i \le b_1}$ and $S_2 = \{(I_i, \sigma_i)\}_{a_2 \le i \le b_2}$ are equivalent, denoted as $S_1 \equiv S_2$, if $\{(\tau_i, \sigma_i)\}_{a_1 \le i \le b_1} = \{(\tau_i, \sigma_i)\}_{a_2 \le i \le b_2}$.

Definition 46. A segment $S = \{(I_i, \sigma_i)\}_{a \le i \le b}$ is static if $I_a \in \mathcal{I}^-$, $I_b \in \mathcal{I}^+$ and $\tau_a|\bar{s}^T = \tau_b|\bar{s}^T$. A segment $S$ is called dynamic if it is not static.

When we compare two life cycles $L_1$ and $L_2$, we can ignore their static segments since they do not change the content of $S^T$. We define equivalence of two life cycles as follows.

Definition 47. For a life cycle $L$, let $dym(L) = \{S_i\}_{1 \le i \le k}$ be the sequence of dynamic segments of $L$. Two life cycles $L_1$ and $L_2$ are equivalent, denoted as $L_1 \equiv L_2$, if $|dym(L_1)| = |dym(L_2)|$ and for $dym(L_1) = \{S^1_i\}_{1 \le i \le k}$ and $dym(L_2) = \{S^2_i\}_{1 \le i \le k}$, for every $1 \le i \le k$, $S^1_i \equiv S^2_i$.

Note that for each life cycle $L$, the number of dynamic segments within $L$ is bounded by $|\bar{s}^T|$ since within $L$, each variable in $\bar{s}^T$ is written at most once by returns of child tasks of $T$.
For a task $T$, as the number of $T$-isomorphism types is bounded, the number of services is bounded, and the length of a segment is bounded because each subtask can be called at most once, the number of equivalence classes of segments is bounded. And since the number of dynamic segments is bounded within the same life cycle, the number of equivalence classes of life cycles is also bounded. Thus,

Lemma 48. The equivalence relation $\equiv$ on life cycles has finite index.

Our next step is to show that one can define a $\mathit{Retrieve}$ function so that all life cycles have bounded timespans. The timespan of a life cycle is defined as follows:

Definition 49. The timespan of a life cycle $L$, denoted by $sp(L)$, is an interval $[a, b]$ where $a$ is the index of the first symbolic instance of the first dynamic segment of $L$ and $b$ is the index of the last symbolic instance of the last dynamic segment.

Consider an equivalence class $\mathcal{L}$ of life cycles. Suppose that for each $L \in \mathcal{L}$, the length of $sp(L)$ is bounded by some constant $m$. Then we can further partition $\mathcal{L}$ into $m$ subsets $\mathcal{L}_0, \ldots, \mathcal{L}_{m-1}$ of life cycles with disjoint timespans by assigning each $L \in \mathcal{L}$ where $sp(L) = [a, b]$ to the subset $\mathcal{L}_k$ where $k = a \bmod m$.

We next show how to construct the function $\mathit{Retrieve}$. In particular, we construct a periodic $\mathit{Retrieve}$ such that there is a short gap between each pair of inserting and retrieving instances. This is done in several steps, illustrated in Figure 3.
1. Initialize $\mathit{Retrieve}$ to be an arbitrary one-to-one mapping with domain $\{I_i \mid I_i \in \mathcal{I}^-, 0 \le i \le n\}$ such that for every $I_i = \mathit{Retrieve}(I_j)$, $i < j$ and $\hat\tau_i = \hat\tau_j$ (recall that $\hat\tau_i = \tau_i|(\bar{x}^T_{in} \cup \bar{s}^T)$).
2.
For every j ∈ [ n + 1 , n + t ], for j (cid:48) = j − t and for i (cid:48) beingthe index where I i (cid:48) = Retrieve ( I j (cid:48) ),(i) if i (cid:48) ∈ [ n − t + 1 , n ], then for i = i (cid:48) + t , let Retrieve ← Retrieve [ I j + k · t (cid:55)→ I i + k · t | k ≥ i (cid:48) ∈ [0 , n − t ], then we pick i ∈ [ n − t + 1 , n ] sat-isfying that I i ∈ I + , ˆ τ i = ˆ τ j and I i is currently notin the range of Retrieve . Then we let Retrieve ← Retrieve [ I j + k · t (cid:55)→ I i + k · t | k ≥ i (cid:48) ∈ [0 , n − t ], the i that we picked al-ways exists for the following reason. For every T S -isomorphismtype ˆ τ , let • M − ˆ τ be the number of symbolic instances in I − with T S -isomorphism type ˆ τ and indices in [ n − t + 1 , n ] that re-trieves from symbolic instances with indices in [0 , n − t ],and • M +ˆ τ be the number of symbolic instances in I + with T S -isomorphism type ˆ τ and indices in [ n − t +1 , n ] that is NOTretrieved by symbolic instances with indices in [ n − t +1 , n ].We have M +ˆ τ − M − ˆ τ = ¯ c n (ˆ τ ) − ¯ c n − t (ˆ τ ) ≥ 0. So for every I i (cid:48) = Retrieve ( I j (cid:48) ) where j (cid:48) ∈ [ n − t +1 , n ] and i (cid:48) ∈ [0 , n − t ],we can always find a unique i ∈ [ n − t + 1 , n ] such that I i ∈E + , ˆ τ i = ˆ τ j (cid:48) = ˆ τ i (cid:48) and I i is not retrieved by any retrievinginstances with indices in [ n − t + 1 , n ].Let us fix the function Retrieve constructed above. Wefirst show the following: Lemma For every periodic ˜ ρ T , and j > n , I i = Retrieve ( I j ) implies that j − i ≤ t and I i + t = Retrieve ( I j + t ) . Proof. By construction, for every I i = Retrieve ( I j )where j > i > n , I i + t = Retrieve ( I j + t ). And it is also guar-anteed that for the indices i and j , either (1) i and j are bothin the same range [ n + tk +1 , n + t ( k +1)] for some k ≥ 0, or (2) i ∈ [ n + tk +1 , n + t ( k +1)] and j ∈ [ n + t ( k +1)+1 , n + t ( k +2)]for some k ≥ 0. In both cases, j − i ≤ t . 
[Figure 3: Construction of Retrieve, with panels for Case 2(i) and Case 2(ii).]

For every life cycle $L$ and every pair of consecutive dynamic segments $S$ and $S'$, we denote by $gap(S, S')$ the number of static segments in between $S$ and $S'$. To show that $sp(L)$ is bounded, it is sufficient to show that $gap(S, S')$ is bounded for every pair of consecutive dynamic segments $S$ and $S'$. For every segment $S$, we denote by $a(S)$ the index of the first symbolic instance of $S$. For every segment $S$ where $a(S) > n$, we let $p(S) = (a(S) - n - 1) \bmod t$.

For every pair of consecutive dynamic segments $S$ and $S'$, and by periodicity of Retrieve, there are no two static segments $T$ and $T'$ in $L$ in between $S$ and $S'$ such that $a(S) < a(T) < a(T') < a(S')$ and $p(T) = p(T')$. Thus in $L$, the number of static segments in between $S$ and $S'$ is at most $n + t$. Then by Lemma 50, the number of symbolic instances in between any pair of consecutive segments is bounded by $\max(2t, n)$, so $gap(S, S') \le (n+t) \cdot \max(2t, n+t)$. And as the number of dynamic segments in $L$ is bounded by $|\bar{s}^T|$ and the length of each segment is at most $2|child(T)|$, it follows that:

Lemma. For every periodic local symbolic run $\tilde{\rho}_T$ and life cycle $L$ of $\tilde{\rho}_T$, $|sp(L)|$ is bounded by $m = (n+t) \cdot \max(2t, n+t) \cdot (|\bar{s}^T| + 1) \cdot |child(T)|$.

So for a possibly infinite set of life cycles $\mathcal{L}$ where $|sp(L)| \le m$ for each $L \in \mathcal{L}$, $\mathcal{L}$ can be partitioned into sets $\mathcal{L}_0, \ldots, \mathcal{L}_{m-1}$ by assigning each life cycle $L \in \mathcal{L}$ where $sp(L) = [a, b]$ to the set $\mathcal{L}_{a \bmod m}$. So for every $\mathcal{L}_i$ and two distinct $L_1, L_2$ in $\mathcal{L}_i$ where $sp(L_1) = [a_1, b_1]$ and $sp(L_2) = [a_2, b_2]$, we have $a_1 \ne a_2$. Assume $a_1 < a_2$. Then as $a_1 \equiv a_2 \pmod{m}$, $a_2 - a_1 \ge m$. And since $b_1 - a_1 + 1 < m$, the timespans of $L_1$ and $L_2$ are disjoint.
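The modular partition argument can be checked on a small example. The following is a minimal sketch, not part of the construction itself: the names `partition_by_start` and `pairwise_disjoint` are hypothetical, timespans are modeled as plain integer pairs, and we assume each timespan covers at most `m` indices and that distinct life cycles start at distinct indices.

```python
from collections import defaultdict


def partition_by_start(timespans, m):
    """Assign each timespan [a, b] to bucket a mod m (hypothetical helper).

    Assumes every timespan covers at most m indices.
    """
    buckets = defaultdict(list)
    for a, b in timespans:
        if b - a + 1 > m:
            raise ValueError(f"timespan [{a}, {b}] is longer than m={m}")
        buckets[a % m].append((a, b))
    return buckets


def pairwise_disjoint(spans):
    """Check that the closed intervals in `spans` do not overlap."""
    ordered = sorted(spans)
    return all(prev_end < next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))


# Overlapping timespans of length at most 3, with distinct start indices.
spans = [(0, 2), (1, 3), (3, 5), (4, 4), (6, 8), (7, 9)]
buckets = partition_by_start(spans, m=3)
all_disjoint = all(pairwise_disjoint(group) for group in buckets.values())
```

Two timespans in the same bucket start at least `m` indices apart, so the earlier one ends before the later one begins; this is the same reasoning as in the paragraph above.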
Thus, given Lemma 48 and Lemma 51, we have

Lemma. Every local symbolic run $\tilde{\rho}_T$ can be partitioned into finitely many subsets of life cycles such that for each subset $\mathcal{L}$, if $L_1 \in \mathcal{L}$, $L_2 \in \mathcal{L}$ and $L_1 \ne L_2$ then $L_1 \equiv L_2$ and $sp(L_1) \cap sp(L_2) = \emptyset$.

Next, we show how we can construct the local run $\rho_T$ and finite database $D_T$ from $\tilde{\rho}_T$ using the partition. We first construct the global isomorphism type $\Lambda = (\mathcal{E}, \sim)$ of $\tilde{\rho}_T$ using the approach for the finite case. Then we merge equivalent segments in $\Lambda$ as follows to obtain a new global isomorphism type with finitely many equivalence classes. To merge two equivalent segments $S_1 = \{(I_i, \sigma_i)\}_{a_1 \le i \le a_1 + l}$ and $S_2 = \{(I_i, \sigma_i)\}_{a_2 \le i \le a_2 + l}$, first for every $0 \le i \le l$ and for every $x \in \bar{x}^T$, we merge the equivalence classes $[(a_1 + i, x)]$ and $[(a_2 + i, x)]$ of $\sim$. Then we apply the chase step (i.e. the FD-condition) to make sure the resulting database satisfies FD.

The new $\Lambda$ is constructed as follows. For every two segments $S_1 = \{(I_i, \sigma_i)\}_{a \le i \le b}$ and $S_2 = \{(I_i, \sigma_i)\}_{c \le i \le d}$, we say that $S_1$ precedes $S_2$, denoted by $S_1 \prec S_2$, if $b < c$. For each subset $\mathcal{L}$ and for each pair of life cycles $L_1, L_2 \in \mathcal{L}$ where $dym(L_1) = \{S^1_i\}_{1 \le i \le k}$ and $dym(L_2) = \{S^2_i\}_{1 \le i \le k}$,
• for $1 \le i \le k$, merge $S^1_i$ and $S^2_i$,
• for $1 \le i < k$, for all static segments $S_1 \subseteq L_1$ and $S_2 \subseteq L_2$ where $S^1_i \prec S_1 \prec S^1_{i+1}$, $S^2_i \prec S_2 \prec S^2_{i+1}$ and $S_1 \equiv S_2$, merge $S_1$ and $S_2$, and
• for every pair of static segments $S_1 \subseteq L_1$ and $S_2 \subseteq L_2$ where $S^1_k \prec S_1$, $S^2_k \prec S_2$ and $S_1 \equiv S_2$, merge $S_1$ and $S_2$.

Finally, $\rho_T$ and $D_T$ are constructed following the same approach as in the finite case.
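The class-merging step for two equivalent segments can be pictured with a union-find structure. The sketch below is only an illustration under simplifying assumptions: `DisjointSet` and `merge_segments` are hypothetical names, equivalence classes are keyed by (index, variable) pairs rather than by full expressions, and the subsequent chase step for FD is deliberately omitted.

```python
class DisjointSet:
    """Minimal union-find over hashable items (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited item directly at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, left, right):
        left_root, right_root = self.find(left), self.find(right)
        if left_root != right_root:
            self.parent[right_root] = left_root


def merge_segments(classes, start1, start2, length, variables):
    """Merge [(start1 + i, x)] with [(start2 + i, x)] for 0 <= i <= length.

    The chase step that restores the functional dependencies is NOT modeled.
    """
    for offset in range(length + 1):
        for x in variables:
            classes.union((start1 + offset, x), (start2 + offset, x))


classes = DisjointSet()
merge_segments(classes, start1=2, start2=7, length=1, variables=["x", "y"])
same_x = classes.find((2, "x")) == classes.find((7, "x"))
same_y = classes.find((3, "y")) == classes.find((8, "y"))
```

Because merging only ever identifies positions of equivalent segments at the same offset, the number of classes contributed by a subset of pairwise equivalent life cycles stays bounded, which is what makes the resulting database finite.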
In the above construction, as the number of subsets of life cycles is finite, and for each $\mathcal{L}$ the number of dynamic segments is bounded and the number of equivalence classes of static segments is bounded, the number of equivalence classes of $\Lambda$ is also finite, so $D_T$ is finite.

By an analysis similar to the finite case, we can show that $\rho_T$ and $D_T$ satisfy properties (i)-(iii) in Lemma 44 and $D_T \models$ FD. In particular, to show property (ii), we can show the same invariant as in Lemma 41; the invariant holds because every pair of merged segments is equivalent.

Finally, to show Lemma 44, it remains to show that $\rho_T$ is a valid local run. Similar to the finite case, it is sufficient to show that

Lemma. For every $i \ge 1$, if $\delta_i = \{-\mathcal{S}^T(\bar{s}^T)\}$ then $\nu_i(\bar{s}^T) \in S_{i-1}$. If $\delta_i = \{+\mathcal{S}^T(\bar{s}^T), -\mathcal{S}^T(\bar{s}^T)\}$ then either $\nu_i(\bar{s}^T) \in S_{i-1}$ or $\nu_i(\bar{s}^T) = \nu_{i-1}(\bar{s}^T)$.

Proof. The following can be easily checked from the construction of $\Lambda$:
(i) for every pair of distinct life cycles $L$ and $L'$ where $sp(L) \cap sp(L') \ne \emptyset$, for every $I_k \in L$ and $I_{k'} \in L'$, if $\hat{\tau}_k, \hat{\tau}_{k'}$ are not input-bound then $\nu_k(\bar{s}^T) \ne \nu_{k'}(\bar{s}^T)$,
(ii) for every pair of life cycles $L$ and $L'$ where $sp(L) \cap sp(L') = \emptyset$, if $I_i, I_j \in L$, $I_j = \mathit{Retrieve}(I_i)$, $\hat{\tau}_i$ is not input-bound, $I_k \in L'$ for $j < k < i$ and $\nu_k(\bar{s}^T) = \nu_i(\bar{s}^T) = \nu_j(\bar{s}^T)$, then $I_k$ is contained in a static segment of $L'$, and
(iii) for every $k, k' \ge 0$, if $\hat{\tau}_k, \hat{\tau}_{k'}$ are input-bound, then $\nu_k(\bar{s}^T) = \nu_{k'}(\bar{s}^T)$ iff $\hat{\tau}_k = \hat{\tau}_{k'}$.

Consider the case when $\delta_i = \{-\mathcal{S}^T(\bar{s}^T)\}$ and $\hat{\tau}_i$ is not input-bound. Let $I_j = \mathit{Retrieve}(I_i)$ and let $L$ be the life cycle that contains $I_i$. Consider $I_k$ where $j < k < i$ and let $L'$ be the life cycle containing $I_k$. If $sp(L) \cap sp(L') \ne \emptyset$, by (i), $\nu_i(\bar{s}^T) \ne \nu_k(\bar{s}^T)$.
If $sp(L) \cap sp(L') = \emptyset$, by (ii), the segment containing $I_k$ is static, so it does not change $\mathcal{S}^T$. Thus, for every segment $S$ between $I_j$ and $I_i$, the tuple $\nu_i(\bar{s}^T)$ remains in $\mathcal{S}^T$ after $S$. So $\nu_i(\bar{s}^T) \in S_{i-1}$. The case when $\delta_i = \{-\mathcal{S}^T(\bar{s}^T), +\mathcal{S}^T(\bar{s}^T)\}$ is similar.

The proof for the case when $\hat{\tau}_i$ is input-bound is the same as the proof for Lemma 38.

This completes the proof of Lemma 44.

Symbolic Trees of Runs

Finally, we show Lemma 34 by providing a recursive construction of a tree of runs $\mathit{Tree}$ and database $D$ from any symbolic tree of runs $\mathit{Sym}$ where all local symbolic runs are either finite or periodic, using Lemmas 36 and 44. Intuitively, the construction simply applies the two lemmas to each node $\tilde{\rho}_T$ of $\mathit{Sym}$ to obtain a local run $\rho_T$ with a local database $D_T$. Then the local runs and databases are combined into a tree of local runs recursively by renaming the values in each $\rho_T$ and $D_T$ in a bottom-up manner, reflecting the communication among local runs via input and return variables.

Formally, we first define recursively the construction function $F$, where $F(\mathit{Sym}_T) = (\mathit{Tree}_T, D_T)$, $\mathit{Sym}_T$ is a subtree of $\mathit{Sym}$, and $(\mathit{Tree}_T, D_T)$ are the resulting subtree of local runs and database instance. $F$ is defined as follows.

If $T$ is a leaf task, then $\mathit{Sym}_T$ contains a single local symbolic run $\tilde{\rho}_T$. We define $F(\mathit{Sym}_T) = F(\tilde{\rho}_T) = (\rho_T, D_T)$ where $\rho_T$ and $D_T$ are the local run and database instance shown to exist in Lemmas 36 and 44 corresponding to $\tilde{\rho}_T$.

If $T$ is a non-leaf task where the root of $\mathit{Sym}_T$ is $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$, then we first let $(\rho_T, D_{root}) = F(\tilde{\rho}_T)$. Next, let $J = \{i \mid \sigma_i = \sigma^o_{T_c}, T_c \in child(T)\}$. For every $i \in J$, we denote by $\mathit{Sym}_i$ the subtree rooted at the child of $\tilde{\rho}_T$ where the edge connecting it with $\tilde{\rho}_T$ is labeled $i$, and let $\tilde{\rho}_i$ be the root of $\mathit{Sym}_i$. We denote by $(\mathit{Tree}_i, D_i) = F(\mathit{Sym}_i)$ and by $\rho_i$ the local run at the root of $\mathit{Tree}_i$.
From the construction in Lemmas 36 and 44, we assume that the domains of $D_{root}$ and the $D_i$'s are equivalence classes of local expressions. We first define the renaming function $r$ whose domain is $\bigcup_{i \in J} adom(D_i)$ as follows.

1. Initialize $r$ to be the identity function.

2. For every $i \in J$, for every expression $x_R.w$ where $x \in \bar{x}^{T_c}_{in}$ and $\nu^*_{in}(x_R.w)$ is defined, for $y = f_{in}(x)$, let $r \leftarrow r[\nu^*_{in}(x_R.w) \mapsto \nu^*_i(y_R.w)]$. Note that $\nu^*_{in}$ is defined wrt $\nu_{in}$ of $\rho_i$ and $D_i$, and $\nu^*_i$ is defined wrt $\nu_i$ of $\rho_T$ and $D_{root}$. We shall see next that for every such $x_R.w$, if $\nu^*_{in}(x_R.w)$ is defined, then $\nu^*_i(y_R.w)$ is also defined.

3. For every $i \in J$ where $\tilde{\rho}_i$ is a returning local symbolic run and the index of the corresponding $\sigma^c_{T_c}$ in $\tilde{\rho}_T$ is $j$, for every expression $x_R.w$ where $x \in \bar{x}^{T_c}_{ret}$ and $\nu^*_{out}(x_R.w)$ is defined, for $y = f^{-1}_{out}(x)$, let $r \leftarrow r[\nu^*_{out}(x_R.w) \mapsto \nu^*_j(y_R.w)]$.

We denote by $r(D)$ the database instance obtained by replacing each value $v \in dom(r)$ in $D$ with $r(v)$, and by $r(\mathit{Tree})$ the tree of runs obtained by replacing each value $v \in dom(r)$ in $\mathit{Tree}$ with $r(v)$.

Then if $\tilde{\rho}_T$ is finite, we define $F(\mathit{Sym}_T) = (\mathit{Tree}_T, D_T)$ where $D_T = D_{root} \cup \bigcup_{i \in J} r(D_i)$ and $\mathit{Tree}_T$ is obtained from $\mathit{Sym}_T$ by replacing the root of $\mathit{Sym}_T$ with $\rho_T$ and each subtree $\mathit{Sym}_i$ with $r(\mathit{Tree}_i)$.

If $\tilde{\rho}_T$ is periodic where the period is $t$ and the loop starts with index $n$, we define $F(\mathit{Sym}_T) = (\mathit{Tree}_T, D_T)$ where $D_T = D_{root} \cup \bigcup_{i \in J,\ i \ldots}$ …

Lemma. For every symbolic tree of runs $\mathit{Sym}_T$ where $(\mathit{Tree}_T, D_T) = F(\mathit{Sym}_T)$, $D_T$ is a finite database satisfying FD, $\mathit{Tree}_T$ is a valid tree of runs over $D_T$, and $(\rho_T, D_T)$ satisfies properties (i)-(iii) in Lemmas 36 and 44.

Proof. We use a simple induction. For the base case, where $T$ is a leaf task, the lemma holds trivially.
For the induction step, assume that for each $i \in J$, $D_i$ is finite and satisfies FD, $\mathit{Tree}_i$ is a valid tree of runs over $D_i$, and $(\rho_i, D_i)$ satisfies properties (i)-(iii).

For each $i \in J$, where $\tilde{\rho}_i$ is a local symbolic run of task $T_c \in child(T)$, we first consider the connection between $\tilde{\rho}_i$ and $\tilde{\rho}_T$ via input variables. As $\rho_i$ satisfies properties (i) and (ii), for all expressions $x_R.w$ and $x'_{R'}.w'$ in the input isomorphism type $\tau_{in}$ of $\tilde{\rho}_i$, if $\nu^*_{in}(x_R.w)$ and $\nu^*_{in}(x'_{R'}.w')$ are defined, then $\nu^*_{in}(x_R.w) = \nu^*_{in}(x'_{R'}.w')$ iff $x_R.w \sim_{\tau_{in}} x'_{R'}.w'$. And by definition of symbolic tree of runs, we have that $\tau_{in} = f^{-1}_{in}(\tau_i)|(\bar{x}^{T_c}_{in}, h(T_c))$. So for $y = f_{in}(x)$ and $y' = f_{in}(x')$, $\nu^*_{in}(x_R.w) = \nu^*_{in}(x'_{R'}.w')$ iff $y_R.w \sim_{\tau_i} y'_{R'}.w'$. Then as $\rho_T$ satisfies (ii) and (iii), $\nu^*_i(y_R.w)$ and $\nu^*_i(y'_{R'}.w')$ are defined and $\nu^*_i(y_R.w) = \nu^*_i(y'_{R'}.w')$ iff $y_R.w \sim_{\tau_i} y'_{R'}.w'$, so $\nu^*_i(y_R.w) = \nu^*_i(y'_{R'}.w')$ iff $\nu^*_{in}(x_R.w) = \nu^*_{in}(x'_{R'}.w')$.

If $\tilde{\rho}_i$ is returning, using the same argument as above, we can show the following. Let $j$ be the index of the corresponding returning service $\sigma^c_{T_c}$. Let $f$ be the function where
$$f(x) = \begin{cases} f_{in}(x), & x \in \bar{x}^{T_c}_{in} \\ f^{-1}_{out}(x), & x \in \bar{x}^{T_c}_{ret} \end{cases}$$
and let $\nu$ be the valuation where
$$\nu(x) = \begin{cases} \nu_{in}(x), & x \in \bar{x}^{T_c}_{in} \\ \nu_{out}(x), & x \in \bar{x}^{T_c}_{ret} \end{cases}$$
where $\nu_{in}$ and $\nu_{out}$ are the input and output valuations of $\rho_i$.
For all expressions $x_R.w$ and $x'_{R'}.w'$ where $x, x' \in \bar{x}^{T_c}_{ret} \cup \bar{x}^{T_c}_{in}$, if $\nu^*(x_R.w)$ and $\nu^*(x'_{R'}.w')$ are defined, then for $y = f(x)$ and $y' = f(x')$, $\nu^*_j(y_R.w)$ and $\nu^*_j(y'_{R'}.w')$ are also defined and $\nu^*_j(y_R.w) = \nu^*_j(y'_{R'}.w')$ iff $\nu^*(x_R.w) = \nu^*(x'_{R'}.w')$.

Given this, after renaming, $D_{root}$ and $r(D_i)$ can be combined consistently. Also, one can easily check that $\mathit{Tree}_T$ is a valid tree of runs where $(\rho_T, D_T)$ satisfies properties (i)-(iii) and $D_T \models$ FD. And $D_T$ is a finite database because it is the union of $D_{root}$ and finitely many $r(D_i)$'s and, by the hypothesis, $D_{root}$ and the $D_i$'s are finite.

Finally, to complete the proof of correctness of the construction, we note:

Lemma. For every full symbolic tree of runs $\mathit{Sym}$ where all local symbolic runs in $\mathit{Sym}$ are either finite or periodic, for $(\mathit{Tree}, D) = F(\mathit{Sym})$ and every HLTL-FO property $\varphi$, $\mathit{Sym}$ is accepted by $\mathcal{B}_\varphi$ iff $\mathit{Tree}$ is accepted by $\mathcal{B}_\varphi$ on $D$.

The above follows immediately from the fact that, by construction, for every task $T$ and local symbolic run $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ in $\mathit{Sym}$ where the corresponding local run in $\mathit{Tree}$ is $\rho_T = (\nu_{in}, \nu_{out}, \{(\rho_i, \sigma_i)\}_{0 \le i < \gamma})$, for every condition $\pi$ over $\bar{x}^T$ and $0 \le i < \gamma$, $\tau_i \models \pi$ iff $D \models \pi(\nu_i)$.

This completes the proof of Lemma 34, and the only-if part of Theorem 20.

C.2 Proof of Lemma 21

The proof is by induction on the task hierarchy $\mathcal{H}$.

Base Case. Consider $\mathcal{R}_T(\tau_{in}, \tau_{out}, \beta)$ where $T$ is a leaf task. As $T$ has no subtask, $dom(\bar{o})$ is always empty so $\bar{o}$ can be ignored. Note that, by definition, there can be no blocking path of $\mathcal{V}(T, \beta)$.

For the if part, consider $(\tau_{in}, \tau_{out}, \beta) \in \mathcal{R}_T$. Suppose first that $\tau_{out} \ne \bot$.
By definition, there exists a finite local symbolic run $(\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ accepted by $\mathcal{B}(T, \beta)$, where $\gamma \in \mathbb{N}$ and $\sigma_{\gamma-1} = \sigma^c_T$. Consider an accepting computation $\{q_i\}_{0 \le i < \gamma}$ of $\mathcal{B}(T, \beta)$ on $\{(I_i, \sigma_i)\}_{0 \le i < \gamma}$, such that $q_{\gamma-1} \in Q_{fin}$. We can construct a returning path $P = \{(p_i, \bar{z}_i)\}_{0 \le i < \gamma}$ of $\mathcal{V}(T, \beta)$ where for each state $p_i = (\tau_i, \sigma_i, q_i, \bar{o}_i, \bar{c}^i_{ib})$, $(\tau_i, \sigma_i, q_i)$ is obtained directly from $\{(I_i, \sigma_i)\}_{0 \le i < \gamma}$ and $\{q_i\}_{0 \le i < \gamma}$, $\bar{z}_i = \bar{c}_i$, and $\bar{c}^i_{ib}$ is the projection of $\bar{c}_i$ to input-bound TS-isomorphism types.

Now suppose $\tau_{out} = \bot$. By definition, and since $T$ is a leaf task, there exists an infinite symbolic run $(\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \omega})$ accepted by $\mathcal{B}(T, \beta)$. Consider the sequence $\{q_i\}_{0 \le i < \omega}$ of states in an accepting computation of $\mathcal{B}(T, \beta)$ on $\{(I_i, \sigma_i)\}_{0 \le i < \omega}$. There must exist $q_f \in Q_{inf}$ such that $q_i = q_f$ for infinitely many $i$. So we can construct a path $P = \{(p_i, \bar{z}_i)\}_{0 \le i < \omega}$ of $\mathcal{V}(T, \beta)$ where each state $p_i = (\tau_i, \sigma_i, q_i, \bar{o}_i, \bar{c}^i_{ib})$ is obtained in the same way as in the case where $\tau_{out} \ne \bot$. It is sufficient to show that there exists a finite prefix $\{(p_i, \bar{z}_i)\}_{0 \le i \le n}$ of $P$ and some $m < n$ such that $(\tau_m, \sigma_m, q_m, \bar{c}^m_{ib}) = (\tau_n, \sigma_n, q_n, \bar{c}^n_{ib})$, $q_m = q_n = q_f$, and $\bar{z}_m \le \bar{z}_n$. By the pigeonhole principle, there exist $\tau$, $\sigma$, $\bar{c}_{ib}$ and an infinite $J \subseteq \mathbb{N}$ such that $(\tau_j, \sigma_j, \bar{c}^j_{ib}, q_j) = (\tau, \sigma, \bar{c}_{ib}, q_f)$ for every $j \in J$. Consider the sequence $\{\bar{z}_j \mid j \in J\}$. There exists an infinite $J_1 \subseteq J$ such that $\{\bar{z}_j \mid j \in J_1\}$ is non-decreasing in the first dimension. A straightforward induction shows that there exists an infinite $J_{|\bar{z}|} \subseteq J$ such that $\{\bar{z}_j \mid j \in J_{|\bar{z}|}\}$ is non-decreasing in all dimensions. Now consider $m, n \in J_{|\bar{z}|}$, $m < n$. The sequence $(p_0, \bar{z}_0), \ldots, (p_m, \bar{z}_m), \ldots, (p_n, \bar{z}_n)$ is a lasso path of $\mathcal{V}(T, \beta)$.

For the only-if direction, if there exists a returning path in $\mathcal{V}(T, \beta)$, then by definition, $\tau_{in}$ and $\tau_{out}$ together with the sequence $\{(I_i, \sigma_i)\}_{0 \le i \le n}$, where each $(I_i, \sigma_i)$ is obtained directly from $(p_i, \bar{z}_i)$, form a valid local symbolic run $\tilde{\rho}_T$. And $\tilde{\rho}_T$ is accepted by $\mathcal{B}(T, \beta)$ since $q_n$ is in $Q_{fin}$. If there exists a lasso path in $\mathcal{V}(T, \beta)$, then we can obtain a finite sequence $\{(I_i, \sigma_i)\}_{0 \le i \le n}$ as above, and we can construct $\{(I_i, \sigma_i)\}_{0 \le i < \omega}$ by repeating the subsequence from index $m+1$ to index $n$ infinitely many times. As $q_n = q_f \in Q_{inf}$, $(\tau_{in}, \bot, \{(I_i, \sigma_i)\}_{0 \le i < \omega})$ is an infinite local symbolic run accepted by $\mathcal{B}(T, \beta)$, so $(\tau_{in}, \bot, \beta) \in \mathcal{R}_T$.

Induction. Consider a non-leaf task $T$, and suppose the statement is true for all its children tasks.

For the if part, suppose $(\tau_{in}, \tau_{out}, \beta) \in \mathcal{R}_T$. Then there exists an adorned symbolic tree of runs $\mathit{Sym}_T$ with root $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ accepted by $\mathcal{B}_{\bar{\beta}}$. We construct a path $P = \{(p_i, \bar{z}_i)\}_{0 \le i < \gamma}$ of $\mathcal{V}(T, \beta)$ as follows. The transitions in $\tilde{\rho}_T$ caused by internal services are treated as in the base case. Suppose that $\sigma_i = \sigma^o_{T_c}$ for some child $T_c$ of $T$. Then there is an edge labeled $(i, \beta_{T_c})$ from $\tilde{\rho}_T$ to a symbolic tree of runs accepted by $\mathcal{B}_{\bar{\beta}_{T_c}}$, rooted at a run $\tilde{\rho}_{T_c}$ of $T_c$ with input $\tau^{T_c}_{in}$ and output $\tau^{T_c}_{out}$. Thus, $(\tau^{T_c}_{in}, \tau^{T_c}_{out}, \beta_{T_c}) \in \mathcal{R}_{T_c}$ and $\mathcal{V}(T, \beta)$ can make the transition from $(p_{i-1}, \bar{z}_{i-1})$ to $(p_i, \bar{z}_i)$ as in its definition (including the updates to $\bar{o}$). If $\tau^{T_c}_{out} \ne \bot$ then there exists a minimum $j > i$ for which $\sigma_j = \sigma^c_{T_c}$, and once again $\mathcal{V}(T, \beta)$ can make the transition from $(p_{j-1}, \bar{z}_{j-1})$ to $(p_j, \bar{z}_j)$ as in its definition, mimicking the return of $T_c$ using the isomorphism type $\tau^{T_c}_{out}$ stored in $\bar{o}(T_c)$. Now consider the resulting path $P = \{(p_i, \bar{z}_i)\}_{0 \le i < \gamma}$.
By applying a similar analysis as in the base case, if $\gamma \ne \omega$ and $\tau_{out} \ne \bot$, then $P$ is a returning path. If $\gamma \ne \omega$ and $\tau_{out} = \bot$, then $P$ is a blocking path. If $\gamma = \omega$, then there exists a prefix $P'$ of $P$ such that $P'$ is a lasso path.

For the only-if direction, let $P$ be a path of $\mathcal{V}(T, \beta)$, starting from a state $p_0 = (\tau_0, \sigma_0, q_0, \bar{o}_0, \bar{c}^0_{ib})$ where $\tau_0|\bar{x}^T_{in} = \tau_{in}$. If $P$ is a returning path, let $v_n = (\tau_n, \sigma_n, q_n, \bar{o}_n, \bar{c}^n_{ib})$ be its last state and $\tau_{out} = \tau_n|(\bar{x}^T_{in} \cup \bar{x}^T_{ret})$. If $P$ is not a returning path, then $\tau_{out} = \bot$. From $P$ we can construct an adorned symbolic tree of runs $\mathit{Sym}_T$ accepted by $\mathcal{B}_{\bar{\beta}}$ as follows. The root of $\mathit{Sym}_T$ is a local symbolic run $\tilde{\rho}_T$ constructed analogously to the construction in the only-if direction in the base case. Then for each $\sigma_i = \sigma^o_{T_c}$, by the induction hypothesis, there exists a symbolic tree of runs $\mathit{Sym}_{T_c}$ whose root has input isomorphism type $\tau^{T_c}_{in}$ and output isomorphism type $\tau^{T_c}_{out}$, and which is accepted by $\mathcal{B}_{\beta_{T_c}}$ (note that $\tau^{T_c}_{in}$, $\tau^{T_c}_{out}$ and $\beta_{T_c}$ are uniquely defined by $P$ and $i$). We connect $\mathit{Sym}_T$ with $\mathit{Sym}_{T_c}$ with an edge labeled $(i, \beta_{T_c})$.

If $P$ is a returning or blocking path, then $\mathit{Sym}_T$ is accepted by $\mathcal{B}_{\bar{\beta}}$. If $P$ is a lasso path, then we first modify the root $\tilde{\rho}_T$ of $\mathit{Sym}_T$ by repeating the subsequence from $m+1$ to $n$ infinitely. Then, for each integer $i$ such that $m+1 \le i \le n$ and $\mathit{Sym}_T$ is connected with some $\mathit{Sym}_{T_c}$ by an edge labeled $(i, \beta_{T_c})$, and for each repetition $I_{i'}$ of the symbolic instance $I_i$, we make a copy of $\mathit{Sym}_{T_c}$ and connect $\mathit{Sym}_T$ with the copy by an edge labeled $(i', \beta_{T_c})$. The resulting $\mathit{Sym}_T$ is accepted by $\mathcal{B}_{\bar{\beta}}$. Thus, $(\tau_{in}, \tau_{out}, \beta) \in \mathcal{R}_T$.

C.3 Complexity of Verification without Arithmetic

Let $\Gamma$ be a HAS and $\varphi$ an HLTL-FO formula over $\Gamma$. Recall the VASS $\mathcal{V}(T, \beta)$ constructed for each task $T$ and assignment $\beta$ to $\Phi_T$.
According to the discussion of the complexity of verification in Section 4, checking whether $\Gamma \not\models \varphi$ can be done in $O(h \log n \cdot 2^{c \cdot d \log d})$ nondeterministic space, where $c$ is a constant, $h$ is the depth of $\mathcal{H}$, and $n, d$ bound the number of states, resp. vector dimensions, of $\mathcal{V}(T, \beta)$ for all $T$ and $\beta$. We will estimate these bounds using the maximum number of $T$-isomorphism types, denoted $M$, and the maximum number of TS-isomorphism types, denoted $D$. We also denote by $N$ the size of $(\Gamma, \varphi)$. To complete the analysis, the specific bounds $M$ and $D$ will be computed for acyclic, linearly-cyclic, and cyclic schemas, as well as with and without artifact relations.

By our construction, the vector dimension of each $\mathcal{V}(T, \beta)$ is the number of TS-isomorphism types, so it is bounded by $D$. The number of states is at most the product of the number of distinct $T$-isomorphism types, the number of states in $\mathcal{B}(T, \beta)$, the number of all possible $\bar{o}$ and the number of possible states of $\bar{c}_{ib}$. And since the number of $T_c$-isomorphism types is no more than the number of $T$-isomorphism types if $T_c$ is a child of $T$, the number of all possible $\bar{o}$ is at most $(3 + M)^{|child(T)|} \le (3 + M)^N$. Note that the number of states in $\mathcal{B}(T, \beta)$ is at most exponential in the size of the HLTL-FO property $\varphi$ (extending the classical construction [55]). Thus, $n = M \cdot 2^{O(N)} \cdot (3 + M)^N \cdot D$ bounds the number of states of all $\mathcal{V}(T, \beta)$. It follows that $O(h \log n \cdot 2^{c \cdot d \log d}) = O(h \cdot N \cdot \log M \cdot 2^{c \cdot D \log D})$, yielding the complexity of checking $\Gamma \not\models \varphi$. Thus, checking whether $\Gamma \models \varphi$ can be done in $O(h^2 \cdot N^2 \log^2 M \cdot 2^{c \cdot D \log D})$ deterministic space by Savitch's Theorem [48], for some constant $c$.

For artifact systems with no artifact relation, the bounds reduce to $O(h \cdot N \log M)$ and $O(h^2 \cdot N^2 \log^2 M)$, respectively.

The number of $T$- and TS-isomorphism types depends on the type of the schema $DB$ of $\Gamma$, as described next.
In our analysis, we denote by $r$ the number of relations in $DB$ and by $a$ the maximum arity of relations in $DB$. We also let $k = \max_{T \in \mathcal{H}} |\bar{x}^T|$, $s = \max_{T \in \mathcal{H}} |\bar{s}^T|$ and $h$ be the height of $\mathcal{H}$.

Acyclic Schema. If $DB$ is acyclic, then the length of each expression in the navigation set is bounded by the number of relations in $DB$. So the size of the navigation set of each $T$-isomorphism type is at most $a^r k$. The total number of $T$-isomorphism types is at most the product of the number of possible navigation sets and the number of possible equality types. So $M = (r+1)^k \cdot (a^r k)^{a^r k}$ is a bound for the number of $T$-isomorphism types for every $T$.

For TS-isomorphism types, we note that within the same path in $\mathcal{V}(T, \beta)$, all TS-isomorphism types have the same projections on $\bar{x}^T_{in}$ since the input variables are unchanged throughout a local symbolic run. So within each query of (repeated) reachability, each TS-isomorphism type can be represented by (1) the equality connections from expressions starting with $x \in \bar{x}^T_{in}$ to expressions starting with $x \in \bar{s}^T$ and (2) the equality connections within expressions starting with $x \in \bar{s}^T$. For (1), the total number of all possible connections is at most $M_1^{M_2}$ where $M_1$ is the number of expressions starting with $x \in \bar{x}^T_{in}$ and $M_2$ is the number of expressions starting with $x \in \bar{s}^T$. For (2), the total number of all possible connections is at most $M_2^{M_2}$. Note that $M_1 \le a^r k$ and $M_2 \le a^r s$.
So the total number of TS-isomorphism types is at most $D = (r+1)^s \cdot (a^r k \cdot a^r s)^{a^r s} = (r+1)^s \cdot (a^{2r} k \cdot s)^{a^r s}$. So for $DB$ of fixed size and $\mathcal{S}^T$ of fixed arity, the number of $T$-isomorphism types is exponential in $k$ and the number of TS-isomorphism types is polynomial in $k$.

By substituting the above values of $M$ and $D$ in the space bound $O(h^2 \cdot N^2 \log^2 M \cdot 2^{c \cdot D \log D})$, we obtain:

Theorem. For HAS $\Gamma$ with acyclic schema and HLTL-FO property $\varphi$ over $\Gamma$, $\Gamma \models \varphi$ can be checked in $O(\exp(N^{c_1}))$ deterministic space, where $c_1 = O(a^r \log rs)$. If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be checked in $c_2 \cdot N^{O(1)}$ deterministic space, where $c_2 = O(a^r \log a^r)$.

Note that if $DB$ is a Star schema [38, 54], which is a special case of acyclic schema, then the size of the navigation set is at most $ark$ instead of $a^r k$. So verification has the complexities stated in Theorem 56, with constants $c_1 = O(ars)$ and $c_2 = O(ar \log ar)$ respectively.

Note that with the simulation used in Lemma 31, the number of variables is at most quadratic in the original number of variables. This only affects the constants in the above complexities.

Linearly-Cyclic Schema. Consider the case where $DB$ is linearly cyclic. To bound the number of $T$- and TS-isomorphism types, it is sufficient to bound $h(T)$, which equals $1 + k \cdot F(\delta)$ where $\delta = \max_{T_c \in child(T)} \{h(T_c)\}$ if $T$ is a non-leaf task and $\delta = 1$ if $T$ is a leaf. Recall that $F(\delta)$ is the maximum number of distinct paths of length at most $\delta$ starting from any relation in the foreign key graph FK. If $DB$ is linearly cyclic, then by definition, the cycles in FK form an acyclic graph $G$ (each node in $G$ is a cycle in the FK graph and there is an edge from cycle $u$ to cycle $v$ iff there is an edge from some node in $u$ to some node in $v$ in FK).

Consider each path $P$ of length at most $\delta$ in FK.
$P$ can be decomposed into a list of subsequences of nodes, where each subsequence consists of nodes within the same cycle in FK (as shown in Figure 4).

[Figure 4: Path in Linearly-Cyclic Foreign Key Graph]

So $F(\delta)$ can be bounded by the product of (1) the number of distinct paths in $G$ starting from any cycle and (2) the maximum number of distinct paths of length at most $\delta$ formed using subsequences of nodes from cycles within the same path in $G$. It is easy to see that (1) is at most $a^r$. And since the length of a path in $G$ is at most $r$, (2) is at most $\delta^r$. Thus $F(\delta)$ is bounded by $a^r \cdot \delta^r = (a \cdot \delta)^r$.

So if $DB$ is linearly cyclic, then $h(T)$ is bounded by $1 + a^r k$ if $T$ is a leaf task, and by $1 + (a \cdot \delta)^r \cdot k$ if $T$ is a non-leaf task, where $\delta = \max_{T_c \in child(T)} \{h(T_c)\}$. By solving the recursion, for every task $T$, we have that $h(T) \le c \cdot (a \cdot k)^{r \cdot h}$ for some constant $c$. So the size of the navigation set of each $T$-isomorphism type is at most $c \cdot (a \cdot k)^{r(h+1)}$. Thus the numbers of $T$- and TS-isomorphism types are bounded by $(r+1)^k \cdot (c \cdot (a \cdot k)^{r(h+1)})^{c \cdot (a \cdot k)^{r(h+1)}}$. By an analysis similar to that for acyclic schemas, we can show that

Theorem. For HAS $\Gamma$ with linearly-cyclic schema and HLTL-FO property $\varphi$ over $\Gamma$, $\Gamma \models \varphi$ can be checked in $O(2\text{-}\exp(N^{c_1 \cdot h}))$ deterministic space where $c_1 = O(r)$. If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be checked in $O(N^{c_2 \cdot h})$ deterministic space where $c_2 = O(r)$.

Cyclic Schema. If $DB$ is cyclic, then each relation in FK has at most $a$ outgoing edges, so $F(\delta)$ is bounded by $a^\delta$. So $h(T) = O(k \cdot a^\delta)$ where $\delta = 1$ if $T$ is a leaf task and $\delta = \max_{T_c \in child(T)} h(T_c)$ otherwise. Solving the recursion yields $h(T) = h\text{-}\exp(O(N))$. By pursuing the analysis similarly to the above, we obtain the following:

Theorem. For HAS $\Gamma$ with cyclic schema and HLTL-FO property $\varphi$ over $\Gamma$, $\Gamma \models \varphi$ can be checked in $(h+2)\text{-}\exp(O(N))$ deterministic space.
If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be checked in $h\text{-}\exp(O(N))$ deterministic space.

To summarize, the schema type determines the size of the navigation set, and hence the complexity of verification, as follows ($h$ is the height of the task hierarchy and $N$ the size of $(\Gamma, \varphi)$).
• Acyclic schemas are the least general, yet sufficiently expressive for many applications. A special case of acyclic schema is the Star schema [38, 54] (or Snowflake schema), which is widely used in modeling business process data. For fixed acyclic schemas, the navigation sets have constant depth.
• Linearly-cyclic schemas extend acyclic schemas but yield higher complexity. In general, the size of the navigation set is exponential in $h$ and polynomial in $N$. Linearly-cyclic schemas allow very simple cyclic foreign key relations such as a single Employee-Manager relation. They include important special cases such as schemas where each relation has at most one foreign key attribute.
• Cyclic schemas allow arbitrary foreign keys but also come with much higher complexity (a tower of exponentials of height $h$), as the size of navigation sets becomes hyper-exponential wrt $h$.

D. VERIFICATION WITH ARITHMETIC

D.1 Review of Quantifier Elimination

The quantifier elimination (QE) problem for the reals can be stated as follows.

Definition. For real variables $Y = \{y_i\}_{1 \le i \le l}$ and a formula $\Phi(Y)$ of the form $(Q_1 x_1) \ldots (Q_k x_k) F(y_1 \ldots y_l, x_1 \ldots x_k)$ where $Q_i \in \{\exists, \forall\}$ and $F(y_1 \ldots y_l, x_1 \ldots x_k)$ is a Boolean combination of polynomial inequalities with integer coefficients, the quantifier elimination problem is to output a quantifier-free formula $\Psi(Y)$ such that for every $Y \in \mathbb{R}^l$, $\Phi(Y)$ is true iff $\Psi(Y)$ is true.

The best known algorithm for solving the QE problem for the reals has time and space complexity doubly-exponential in the number of quantifier alternations and singly-exponential in the number of variables.
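As a small illustration of the definition (a textbook instance chosen here for concreteness, not an example from the verification algorithm), take $Y = \{b, c\}$, so $l = 2$, and a single existential quantifier, so $k = 1$. Completing the square eliminates the quantified variable $x$:

```latex
\begin{align*}
\Phi(b, c) \;&=\; \exists x \,\bigl(x^2 + b x + c \le 0\bigr) \\
x^2 + b x + c \;&=\; \Bigl(x + \tfrac{b}{2}\Bigr)^2 + c - \tfrac{b^2}{4},
  \quad\text{so}\quad \min_{x \in \mathbb{R}} \,(x^2 + b x + c) \;=\; c - \tfrac{b^2}{4} \\
\Psi(b, c) \;&=\; \bigl(b^2 - 4c \ge 0\bigr)
\end{align*}
```

For every $(b, c) \in \mathbb{R}^2$, $\Phi(b, c)$ holds iff the minimum is at most $0$, that is, iff $\Psi(b, c)$ holds, and $\Psi$ is quantifier-free with integer coefficients as required.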
When applying QE in verification of HAS, we are only interested in formulas that are existentially quantified. According to Algorithm 14.6 of [3], the result for this special case can be stated as follows:

Theorem. For an existentially quantified formula $\Phi(Y)$, an equivalent quantifier-free formula $\Psi(Y)$ can be computed in time and space $(s \cdot d)^{O(k) O(l)}$, where $s$ is the number of polynomials in $\Phi$, $d$ is the maximum degree of the polynomials, $k$ is the quantifier rank of $\Phi$ and $l = |Y|$.

Note that in the special case when $l = 0$, quantifier elimination simply checks satisfiability. Thus we have:

Corollary. Satisfiability over the reals of a Boolean combination $\Phi$ of polynomial inequalities with integer coefficients can be decided in time and space $(s \cdot d)^{O(k)}$, where $s$ is the number of polynomials in $\Phi$, $d$ is the maximum degree of the polynomials, and $k$ is the number of variables in $\Phi$.

Also in [3], it is shown that if the bit-size of coefficients in $\Phi$ is bounded by $\tau$, then the bit-size of coefficients in $\Psi$ is bounded by $\tau \cdot d^{O(k) O(l)}$.

D.2 Review of General Real Algebraic Geometry

We next review a classic result in general real algebraic geometry. For a given set of polynomials $\mathcal{P} = \{P_1, \ldots, P_s\}$ over $k$ variables $\{x_i\}_{1 \le i \le k}$, a sign condition of $\mathcal{P}$ is a mapping $\sigma : \mathcal{P} \to \{-1, 0, +1\}$. We denote by $c(\sigma, \mathcal{P})$ the semi-algebraic set $\{x \mid x \in \mathbb{R}^k, sign(P(x)) = \sigma(P), \forall P \in \mathcal{P}\}$, called the cell of the sign condition $\sigma$ for $\mathcal{P}$.

We use the following result from [35, 4]:

Theorem. Given a set of polynomials $\mathcal{P}$ with integer coefficients over $k$ variables $\{x_i\}_{1 \le i \le k}$, the number of distinct non-empty cells, namely $|\{\sigma : \mathcal{P} \to \{-1, 0, +1\} \mid c(\sigma, \mathcal{P}) \ne \emptyset\}|$, is at most $(s \cdot d)^{O(k)}$, where $s = |\mathcal{P}|$ and $d$ is the maximum degree of polynomials in $\mathcal{P}$.

Given a set of polynomials $\mathcal{P}$, we can use the following naive approach to compute the set of sign conditions resulting in non-empty cells.
We simply enumerate the sign conditions of $\mathcal{P}$ and discard those that result in empty cells or in cells equivalent to the cell of an already recorded sign condition known to be non-empty. Checking whether a cell is empty and checking whether two cells are equivalent can be reduced to checking satisfiability of a formula of polynomial inequalities. By Corollary 61, this naive approach takes space $(s \cdot d)^{O(k)}$.

Theorem 63. Given a set of polynomials $\mathcal{P}$ over $\{x_i\}_{1 \le i \le k}$, the set of non-empty cells $\{\sigma : \mathcal{P} \to \{-1, 0, +1\} \mid c(\sigma, \mathcal{P}) \ne \emptyset\}$ defined by $\mathcal{P}$ can be computed in space $(s \cdot d)^{O(k)}$, where $s = |\mathcal{P}|$ and $d$ is the maximum degree of polynomials in $\mathcal{P}$.

D.3 Cells for Verification

Intuitively, in order to handle arithmetic in our verification framework, we need to extend each isomorphism type $\tau$ with a set of polynomial inequality constraints over the set of numeric expressions in the extended navigation set $\mathcal{E}^+_T$. We say that an expression $e$ is numeric if $e = x$ for some numeric variable $x$, or $e = x_R.w$ and the last attribute of $w$ is numeric. For each task $T$, we denote by $\mathcal{E}^T_{\mathbb{R}}$ the set of numeric expressions of $T$, where for each $x_R.w \in \mathcal{E}^T_{\mathbb{R}}$, $|w| \le h(T)$.

The constraints over the numeric expressions are represented by a non-empty cell $c$ (formally defined below). When a service is applied, the arithmetic parts of the conditions are evaluated against $c$. For every transition $I \xrightarrow{\sigma'} I'$ where $c, c'$ are the cells of $I, I'$ respectively, if any variables are modified by the transition, then the projection of $c'$ onto the preserved numeric expressions has to refine the projection of $c$ onto the preserved numeric expressions. Similar compatibility checks are required when a child task returns to its parent.

We introduce some more notation. For every $T \in \mathcal{H}$, we consider polynomials in the polynomial ring $\mathbb{Z}[\mathcal{E}^T_{\mathbb{R}}]$.
For each polynomial $P$, we denote by $var(P)$ the set of numeric expressions mentioned in $P$, and for a set of polynomials $\mathcal{P}$, we denote by $var(\mathcal{P})$ the set $\bigcup_{P \in \mathcal{P}} var(P)$. For $\mathcal{P} \subset \mathbb{Z}[\mathcal{E}^T_{\mathbb{R}}]$ and $\mathcal{E} \subseteq \mathcal{E}^T_{\mathbb{R}}$, we denote by $\mathcal{P}|\mathcal{E}$ the set of polynomials $\{P \mid P \in \mathcal{P},\ var(P) \subseteq \mathcal{E}\}$.

We next define the cells used in our verification algorithm. At task $T$, for a set of numeric expressions $\mathcal{E} \subseteq \mathcal{E}^T_{\mathbb{R}}$ and a set of polynomials $\mathcal{P}$ where $var(\mathcal{P}) \subseteq \mathcal{E}$, we define the cells over $(\mathcal{E}, \mathcal{P})$ as follows.

Definition 64. A cell $c$ over $(\mathcal{E}, \mathcal{P})$ is a subset of $\mathbb{R}^{|\mathcal{E}|}$ for which there exists a sign condition $\sigma$ of $\mathcal{P}$ such that $c = c(\sigma, \mathcal{P})$.

For $\mathcal{P} \subset \mathbb{Z}[\mathcal{E}^T_{\mathbb{R}}]$, we denote by $\mathcal{K}(\mathcal{P}, \mathcal{E})$ the set of cells over $(\mathcal{E}, \mathcal{P}|\mathcal{E})$. Namely, $\mathcal{K}(\mathcal{P}, \mathcal{E}) = \{c(\sigma, \mathcal{P}|\mathcal{E}) \mid \sigma : \mathcal{P}|\mathcal{E} \to \{-1, 0, +1\}\}$. We denote by $\mathcal{K}(\mathcal{P})$ the set of cells $\bigcup_{\mathcal{E} \subseteq \mathcal{E}^T_{\mathbb{R}}} \mathcal{K}(\mathcal{P}, \mathcal{E})$.

Compatibility between cells is tested using the notion of refinement. Intuitively, a cell $c$ refines another cell $c'$ if $c$ can be obtained by adding extra numeric expressions and/or constraints to $c'$. Formally,

Definition 65. For cell $c$ over $(\mathcal{E}, \mathcal{P})$ and cell $c'$ over $(\mathcal{E}', \mathcal{P}')$ where $c = c(\sigma, \mathcal{P})$ and $c' = c(\sigma', \mathcal{P}')$, we say that $c$ refines $c'$, denoted by $c \sqsubseteq c'$, if $\mathcal{E}' \subseteq \mathcal{E}$, $\mathcal{P}' \subseteq \mathcal{P}$ and $\sigma|\mathcal{P}' = \sigma'$.

Note that if $\mathcal{E} = \mathcal{E}'$, then $c \sqsubseteq c'$ iff $c \subseteq c'$.

We next define the projection of a cell onto a set of variables. For each cell $c$ over $(\mathcal{E}, \mathcal{P})$ where $\mathcal{E} \subseteq \mathcal{E}^T_{\mathbb{R}}$ and variables $\bar{x} \subseteq \bar{x}^T$, the projection of $c$ onto $\bar{x}$, denoted by $c|\bar{x}$, is defined to be the projection of $c$ onto the expressions $\mathcal{E}|\bar{x}$, where $\mathcal{E}|\bar{x} = \{e \in \mathcal{E} \mid e = x_R.w \lor e = x,\ x \in \bar{x}\}$. By the Tarski-Seidenberg theorem [52], $c|\bar{x}$ is a union of disjoint cells. Also, the projections $c|\bar{x}$ can be obtained by quantifier elimination. Let $\Phi(c)$ be the conjunctive formula defining $c$ using polynomials in $\mathcal{P}$.
Then by treating $\mathcal{E}|\bar{x}$ as the set of free variables, the formula $\Psi(c)$ obtained by eliminating $\mathcal{E} - \mathcal{E}|\bar{x}$ from $\Phi(c)$ defines $c|\bar{x}$. We denote by $proj(c, \bar{x})$ the set of polynomials mentioned in $\Psi(c)$. It is easy to see that $c|\bar{x}$ is a union of cells over $(\mathcal{E}|\bar{x}, proj(c, \bar{x}))$.

The following notation is useful for checking compatibility between a cell and the projection of another cell: we say that a cell $c$ refines another cell $c'$ with respect to projection to $\bar{x}$, denoted as $c \sqsubseteq_{\bar{x}} c'$, if there exists a cell $\tilde{c} \subseteq c'|\bar{x}$ such that $c \sqsubseteq \tilde{c}$.

Finally, we introduce notation for variable passing between parent task and child task. For each task $T$ and $T_c \in child(T)$, we denote by $\mathcal{E}^{T_c \to T}_{\mathbb{R}}$ the set of numeric expressions $\{e \mid e \in \mathcal{E}^{T_c}_{\mathbb{R}},\ e = x \lor e = x_R.w,\ x \in \bar{x}^{T_c}_{in} \cup \bar{x}^{T_c}_{ret}\}$. In other words, $\mathcal{E}^{T_c \to T}_{\mathbb{R}}$ is the subset of expressions in $\mathcal{E}^{T_c}_{\mathbb{R}}$ connected with expressions in $\mathcal{E}^T_{\mathbb{R}}$ by calls/returns of $T_c$. Let $f_{in}, f_{out}$ be the input and output mappings between $T$ and $T_c$. For each expression $e \in \mathcal{E}^{T_c \to T}_{\mathbb{R}}$, we define $e^{T_c \to T}$ to be an expression in $\mathcal{E}^T_{\mathbb{R}}$ as follows. If $e = x$, then $e^{T_c \to T} = (f_{in} \circ f^{-1}_{out})(x)$. If $e = x_R.w$, then $e^{T_c \to T} = ((f_{in} \circ f^{-1}_{out})(x))_R.w$.

For a set of expressions $\mathcal{E} \subseteq \mathcal{E}^{T_c \to T}_{\mathbb{R}}$, we define $\mathcal{E}^{T_c \to T}$ to be $\{e^{T_c \to T} \mid e \in \mathcal{E}\}$. For a polynomial $P$ over $\mathcal{E}^{T_c \to T}_{\mathbb{R}}$ where $T_c \in child(T)$, we denote by $P^{T_c \to T}$ the polynomial obtained by replacing in $P$ each numeric expression $e$ with $e^{T_c \to T}$. For a cell $c$ of $T_c$ where $c = c(\sigma, \mathcal{P})$ and $var(P) \subseteq \mathcal{E}^{T_c \to T}_{\mathbb{R}}$ for every $P \in \mathcal{P}$, we let $c^{T_c \to T}$ be the cell of $T$ which equals $c(\sigma', \mathcal{P}')$, where $\mathcal{P}' = \{P^{T_c \to T} \mid P \in \mathcal{P}\}$ and $\sigma'$ is a sign condition over $\mathcal{P}'$ such that $\sigma'(P^{T_c \to T}) = \sigma(P)$ for every $P \in \mathcal{P}$.
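The notions of sign condition, cell and refinement introduced in this section can be illustrated with a small sketch. The Python fragment below is only an illustration under simplifying assumptions that are not part of the paper: polynomials are given as Python functions identified by hypothetical names (two polynomials count as equal iff their names are equal), a cell is stored symbolically as its sign condition rather than as a subset of $\mathbb{R}^{|\mathcal{E}|}$, and non-emptiness is witnessed by a sample point instead of being decided by quantifier elimination.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, Mapping

Valuation = Mapping[str, float]        # numeric expression -> real value
Poly = Callable[[Valuation], float]    # a polynomial, given as a Python function


def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


@dataclass
class Cell:
    """A cell c(sigma, P) over (E, P), stored symbolically (illustrative type)."""
    exprs: FrozenSet[str]      # E: the numeric expressions of the cell
    sigma: Dict[str, int]      # sign condition: polynomial name -> -1, 0 or +1


def cell_of_point(exprs: Iterable[str], polys: Dict[str, Poly],
                  point: Valuation) -> Cell:
    """The cell over (exprs, polys) containing `point`: its sign condition
    assigns to each polynomial the sign of its value at the point."""
    return Cell(frozenset(exprs),
                {name: sign(p(point)) for name, p in polys.items()})


def refines(c: Cell, c2: Cell) -> bool:
    """c refines c2: E' is a subset of E, P' is a subset of P, and sigma
    restricted to P' equals sigma' (primed objects belong to c2)."""
    return (c2.exprs <= c.exprs
            and c2.sigma.keys() <= c.sigma.keys()
            and all(c.sigma[name] == s for name, s in c2.sigma.items()))


# Hypothetical example: expressions x and y, polynomials x, y and x - y.
polys_fine = {
    "x": lambda v: v["x"],
    "y": lambda v: v["y"],
    "x-y": lambda v: v["x"] - v["y"],
}
polys_coarse = {"x": lambda v: v["x"]}
point = {"x": 2.0, "y": 5.0}

fine = cell_of_point({"x", "y"}, polys_fine, point)   # x > 0, y > 0, x - y < 0
coarse = cell_of_point({"x"}, polys_coarse, point)    # x > 0
negative_x = Cell(frozenset({"x"}), {"x": -1})        # x < 0
```

In this example `refines(fine, coarse)` holds because `fine` adds the expression `y` and two further constraints to `coarse`, while `refines(coarse, fine)` and `refines(fine, negative_x)` both fail.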
D.4 Hierarchical Cell Decomposition

We now introduce the Hierarchical Cell Decomposition. Intuitively, for each task $T$, we would like to compute a set of polynomials $\mathcal{P}$ and a set of cells $\mathcal{K}_T$ such that for each subset $\mathcal{E}$ of $\mathcal{E}^T_{\mathbb{R}}$, the set of cells over $(\mathcal{E}, \mathcal{P}|\mathcal{E})$ in $\mathcal{K}_T$ is a partition of $\mathbb{R}^{|\mathcal{E}|}$.

The set of cells $\mathcal{K}_T$ satisfies the property that for the set of polynomials $\mathcal{P}$ mentioned in any condition of $T$ in the specification $\Gamma$ and the HLTL-FO property $\varphi$, each cell $c \in \mathcal{K}_T$ uniquely defines the sign condition of $\mathcal{P}$. This allows us to compute the sign of any polynomial in any condition in the local symbolic runs. In addition, for each pair of cells $c, c' \in \mathcal{K}_T$, we require that the projections of $c$ and $c'$ to the input variables $\bar{x}^T_{in}$ (and to $\bar{x}^T_{in} \cup \bar{s}^T$) be disjoint or identical. So to check whether two cells $c$ and $c'$ of two consecutive symbolic instances in a local symbolic run are compatible when applying an internal service, we simply need to check whether their projections on $\bar{x}^T_{in}$ are equal (note that refinement is implied by equality). Finally, for each child task $T_c$ of $T$, for each cell $c \in \mathcal{K}_T$ and $c' \in \mathcal{K}_{T_c}$, $c$ uniquely defines the sign condition for the set of polynomials that defines $c'|\bar{x}^{T_c}_{in}$ and $c'|(\bar{x}^{T_c}_{ret} \cup \bar{x}^{T_c}_{in})$. This reduces to cell refinement the problem of checking compatibility when child tasks are called or return.

The Hierarchical Cell Decomposition is formally defined as follows.

Definition 66. The Hierarchical Cell Decomposition associated to an artifact system $\mathcal{H}$ and property $\varphi$ is a collection $\{\mathcal{K}_T\}_{T \in \mathcal{H}}$ of sets of cells, such that for each $T \in \mathcal{H}$, $\mathcal{K}_T = \mathcal{K}(\mathcal{P}'_T)$, where the set of polynomials $\mathcal{P}'_T$ is defined as follows.
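First, let $\mathcal{P}_T$ consist of the following:
• all polynomials mentioned in any condition over $\bar{x}^T$ in $\Gamma$ and the property $\varphi$,
• polynomials $\{e \mid e \in \mathcal{E}^T_{\mathbb{R}}\} \cup \{e - e' \mid e, e' \in \mathcal{E}^T_{\mathbb{R}}\}$, and
• for every $T_c \in child(T)$ and subset $\bar{x} \subseteq \bar{x}^{T_c}_{ret}$, the set of polynomials $\{P^{T_c \to T} \mid P \in proj(c, \bar{x}^{T_c}_{in} \cup \bar{x}),\ c \in \mathcal{K}_{T_c}\}$.

Next, let $\mathcal{P}^s_T = \mathcal{P}_T \cup \bigcup_{c \in \mathcal{K}(\mathcal{P}_T)} proj(c, \bar{x}^T_{in} \cup \bar{s}^T)$. Finally, $\mathcal{P}'_T = \mathcal{P}^s_T \cup \bigcup_{c \in \mathcal{K}(\mathcal{P}^s_T)} proj(c, \bar{x}^T_{in})$.

The Hierarchical Cell Decomposition satisfies the following property, as desired.

Lemma 67. Let $T$ be a task and $\mathcal{P}'_T$ as above. For every pair of cells $c_1, c_2 \in \mathcal{K}_T$, and $\bar{x} = (\bar{x}^T_{in} \cup \bar{s}^T)$ or $\bar{x} = \bar{x}^T_{in}$, if $c_1 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_1)$ and $c_2 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_2)$ where $\mathcal{E}_1|\bar{x} = \mathcal{E}_2|\bar{x}$, then $c_1|\bar{x}$ and $c_2|\bar{x}$ are either equal or disjoint.

Proof. We prove the lemma for the case when $\bar{x} = \bar{x}^T_{in}$. The proof is similar for $\bar{x} = \bar{x}^T_{in} \cup \bar{s}^T$.

Let $\tilde{\mathcal{P}}^s_T = \bigcup_{c \in \mathcal{K}(\mathcal{P}^s_T)} proj(c, \bar{x}^T_{in})$. For each cell $c \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E})$, since $\mathcal{P}'_T|\mathcal{E} = (\mathcal{P}^s_T|\mathcal{E}) \cup (\tilde{\mathcal{P}}^s_T|\mathcal{E})$ as $\mathcal{P}'_T = \mathcal{P}^s_T \cup \tilde{\mathcal{P}}^s_T$, there exist $c_1 \in \mathcal{K}(\mathcal{P}^s_T, \mathcal{E})$ and $c_2 \in \mathcal{K}(\tilde{\mathcal{P}}^s_T, \mathcal{E})$ such that $c = c_1 \cap c_2$. Then consider $c|\bar{x}^T_{in}$. Since all polynomials in $\tilde{\mathcal{P}}^s_T$ are over expressions of $\bar{x}^T_{in}$, we have $c|\bar{x}^T_{in} = (c_1 \cap c_2)|\bar{x}^T_{in} = (c_1|\bar{x}^T_{in}) \cap c_2$. By definition, $proj(c_1, \bar{x}^T_{in}) \subseteq \tilde{\mathcal{P}}^s_T$, so $c_2$ uniquely defines the sign conditions for $proj(c_1, \bar{x}^T_{in})$, which means that either $c_2 \cap c_1|\bar{x}^T_{in} = \emptyset$ or $c_2 \subseteq c_1|\bar{x}^T_{in}$. As $c_2 \cap c_1|\bar{x}^T_{in} = c|\bar{x}^T_{in}$ is non-empty, $c|\bar{x}^T_{in} = c_2$.

Therefore, for every $c_1 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_1)$ and $c_2 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_2)$ where $\mathcal{E}_1|\bar{x}^T_{in} = \mathcal{E}_2|\bar{x}^T_{in} = \mathcal{E}$, there exist cells $\tilde{c}_1, \tilde{c}_2 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E})$ such that $c_1|\bar{x}^T_{in} = \tilde{c}_1$ and $c_2|\bar{x}^T_{in} = \tilde{c}_2$.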
Since $\tilde{c}_1$ and $\tilde{c}_2$ are either disjoint or equal, $c_1|\bar{x}^T_{in}$ and $c_2|\bar{x}^T_{in}$ are also either disjoint or equal.

From the above lemma, the following is immediate:

Corollary 68. For every task $T$ and $c \in \mathcal{K}_T$, $c|\bar{x}^T_{in}$ and $c|(\bar{x}^T_{in} \cup \bar{s}^T)$ are single cells in $\mathcal{K}_T$.

In view of the corollary, we use the notation of single-cell operators (projection, refinement, etc.) on $c|\bar{x}^T_{in}$ and $c|(\bar{x}^T_{in} \cup \bar{s}^T)$ in the rest of our discussion.

To be able to connect with child tasks, we show the following property of $\mathcal{K}_T$:

Lemma 69. For all tasks $T$ and $T_c$ where $T_c \in child(T)$, and every cell $c_1 \in \mathcal{K}_T$ and $c_2 \in \mathcal{K}_{T_c}$ where $c_1 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_1)$ and $c_2 \in \mathcal{K}(\mathcal{P}'_{T_c}, \mathcal{E}_2)$, for each set of variables $\bar{x} = \bar{x}^T_{T_c^\uparrow} \cup \bar{y}$ where $\bar{y}$ is some subset of $\bar{x}^T_{T_c^\downarrow}$, if $\mathcal{E}_1|\bar{x} = (\mathcal{E}_2)^{T_c \to T}|\bar{x}$, then either (1) $c_1 \sqsubseteq_{\bar{x}} (c_2)^{T_c \to T}$ or (2) $c_1|\bar{x}$ is disjoint from $(c_2)^{T_c \to T}|\bar{x}$.

Proof. Denote by $\mathcal{P}^{\bar{x}}_{T_c}$ the set of polynomials $\{P^{T_c \to T} \mid P \in proj(c, \bar{x}),\ c \in \mathcal{K}_{T_c}\}$. For each cell $c_1 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}_1)$, there exists $\tilde{c} \in \mathcal{K}(\mathcal{P}^{\bar{x}}_{T_c}, \mathcal{E}_1)$ such that $c_1 \subseteq \tilde{c}$. For each cell $c_2 \in \mathcal{K}(\mathcal{P}'_{T_c}, \mathcal{E}_2)$, as $\mathcal{E}_1|\bar{x} = (\mathcal{E}_2)^{T_c \to T}|\bar{x}$, $(c_2)^{T_c \to T}|\bar{x}$ is a union of cells in $\mathcal{K}(\mathcal{P}^{\bar{x}}_{T_c}, \mathcal{E}_1)$. So $\tilde{c}$ is either disjoint from or contained in $(c_2)^{T_c \to T}|\bar{x}$. If $\tilde{c}$ and $(c_2)^{T_c \to T}|\bar{x}$ are disjoint, then $(c_2)^{T_c \to T}|\bar{x}$ and $c_1|\bar{x}$ are disjoint. If $\tilde{c} \subseteq (c_2)^{T_c \to T}|\bar{x}$, then we have $c_1 \sqsubseteq \tilde{c} \subseteq (c_2)^{T_c \to T}|\bar{x}$, so $c_1 \sqsubseteq_{\bar{x}} (c_2)^{T_c \to T}$.

D.5 Extended Isomorphism Types

Given the Hierarchical Cell Decomposition $\{\mathcal{K}_T\}_{T \in \mathcal{H}}$, we can extend our notion of isomorphism type to support arithmetic.
Definition 70. For navigation set $\mathcal{E}_T$, equality type $\sim_\tau$ over $\mathcal{E}^+_T$ and $c \in \mathcal{K}_T$, the triple $\tau = (\mathcal{E}_T, \sim_\tau, c)$ is an extended $T$-isomorphism type if
• $(\mathcal{E}_T, \sim_\tau)$ is a $T$-isomorphism type, and
• $c = c(\sigma, \mathcal{P}'_T|(\mathcal{E}^T_{\mathbb{R}} \cap \mathcal{E}^+_T))$ for some sign condition $\sigma$ of $\mathcal{P}'_T|(\mathcal{E}^T_{\mathbb{R}} \cap \mathcal{E}^+_T)$ such that for all numeric expressions $e, e' \in \mathcal{E}^+_T$, $e \sim_\tau e'$ iff $\sigma(e - e') = 0$ and $e \sim_\tau 0$ iff $\sigma(e) = 0$.

For each condition $\pi$ over $\bar{x}^T$ and extended $T$-isomorphism type $\tau$, $\tau \models \pi$ is defined as follows. For each polynomial inequality "$P \circ 0$" in $\pi$ where $\circ \in \{<, >, =\}$, $P \circ 0$ is true iff $\sigma(P) \circ 0$, where $\sigma$ is the sign condition of $c$. The rest of the semantics is the same as for normal $T$-isomorphism types.

The projection of an extended $T$-isomorphism type $\tau$ on $\bar{x}^T_{in}$ and $\bar{x}^T_{in} \cup \bar{s}^T$ is defined in the obvious way. For $\tau = (\mathcal{E}_T, \sim_\tau, c)$, we define $\tau|\bar{x} = (\mathcal{E}_T|\bar{x}, \sim_\tau|\bar{x}, c|\bar{x})$ for $\bar{x} = \bar{x}^T_{in}$ or $\bar{x} = \bar{x}^T_{in} \cup \bar{s}^T$. The projection of $\tau$ on $\bar{x}^T_{in}$ and $\bar{x}^T_{in} \cup \bar{s}^T$ up to length $k$ is defined analogously. The projection of every extended $T$-isomorphism type on $\bar{x}^T_{in} \cup \bar{s}^T$ is an extended $T_S$-isomorphism type.

To extend the definitions of local symbolic run and symbolic tree of runs, we first replace $T$-isomorphism type with extended $T$-isomorphism type and $T_S$-isomorphism type with extended $T_S$-isomorphism type in the original definitions. The semantics is extended with the following rules.

For two symbolic instances $I$ and $I'$ where the cell of $I$ is $c$ and the cell of $I'$ is $c'$, $I'$ is a valid successor of $I$ by applying service $\sigma'$ if the following conditions hold in addition to the original requirements:
• if $\sigma'$ is an internal service, then $c|\bar{x}^T_{in} = c'|\bar{x}^T_{in}$.
• if $\sigma'$ is an opening service of $T_c \in child(T)$ or closing service of $T$, then $c = c'$.
• if $\sigma'$ is a closing service of $T_c \in child(T)$, then $c' \sqsubseteq c$.

The counters $\bar{c}$ are updated as in transitions between symbolic instances without arithmetic. Each dimension of $\bar{c}$ corresponds to an extended $T_S$-isomorphism type.

For each local symbolic run $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$, the following are additionally satisfied:
• $c_{in} = c_0|\bar{x}^T_{in}$, where $c_{in}$ is the cell of $\tau_{in}$ and $c_0$ is the cell of $\tau_0$;
• if $\tau_{out} \ne \bot$, then $c_{out} \sqsubseteq_{\bar{x}^T_{in} \cup \bar{x}^T_{ret}} c_{\gamma-1}$, where $c_{out}$ is the cell of $\tau_{out}$ and $c_{\gamma-1}$ is the cell of $\tau_{\gamma-1}$.

In a symbolic tree of runs $\mathit{Sym}$, for every two local symbolic runs $\tilde{\rho}_T = (\tau_{in}, \tau_{out}, \{(I_i, \sigma_i)\}_{0 \le i < \gamma})$ and $\tilde{\rho}_{T_c} = (\tau'_{in}, \tau'_{out}, \{(I'_i, \sigma'_i)\}_{0 \le i < \gamma'})$ where $T_c \in child(T)$, if $\tilde{\rho}_{T_c}$ is connected to $\tilde{\rho}_T$ by an edge labeled with index $i$, then the following conditions must be satisfied in addition to the original requirements:
• for the cell $c_i$ of symbolic instance $I_i$ and the cell $c_{in}$ of $\tau'_{in}$, $c_i \sqsubseteq c^{T_c \to T}_{in}$.
• if $\tilde{\rho}_{T_c}$ is a returning local symbolic run, then for the cells $c_{out}$ of $\tau'_{out}$ and $c_j$ of $I_j$, where $j$ is the smallest index such that $\sigma_j = \sigma^c_{T_c}$ and $j > i$, we have that $c_j \sqsubseteq_{\bar{x}_{null}} c^{T_c \to T}_{out}$, where $\bar{x}_{null} = \{x \mid x \in \bar{x}^T_{T_c^\uparrow},\ x \sim_{\tau_{j-1}} \mathtt{null}\}$.

D.6 Actual Runs versus Symbolic Runs

We next show that the connection between actual runs and symbolic runs established in Theorem 20 still holds for the extended local and symbolic runs. The structure of the proof is the same, so we only state the modifications needed to handle arithmetic.

D.6.1 From Trees of Local Runs to Symbolic Trees of Runs

Given a tree of local runs $\mathit{Tree}$, the construction of a corresponding symbolic tree of runs $\mathit{Sym}$ can be done as follows. We first construct $\mathit{Sym}$ from $\mathit{Tree}$ without the cells, following the construction described in the proof of the only-if part of Theorem 20.
Then for each task $T$ and symbolic instance $I$ with extended isomorphism type $\tau$ in some local symbolic run of $T$, let $\mathcal{E}$ be the set of numeric expressions in $\tau$ and $v : \mathcal{E} \to \mathbb{R}$ the valuation of $\mathcal{E}$ at $I$. Then the cell $c$ of $I$ is chosen to be the unique cell in $\mathcal{K}(\mathcal{P}'_T, \mathcal{E})$ that contains $v$. For cells $c$ and $c'$ of two consecutive symbolic instances $I$ and $I'$ where the service that leads to $I'$ is $\sigma'$,
• if $\sigma'$ is an internal service, by Lemma 67, as $c|\bar{x}^T_{in}$ and $c'|\bar{x}^T_{in}$ overlap, we have $c|\bar{x}^T_{in} = c'|\bar{x}^T_{in}$,
• if $\sigma'$ is an opening service, $c = c'$ is obvious, and
• if $\sigma'$ is a closing service, let $\mathcal{E}$ be the numeric expressions of $c$ and $\mathcal{E}'$ be the numeric expressions of $c'$. We have $\mathcal{E} \subseteq \mathcal{E}'$, so $\mathcal{P}'_T|\mathcal{E} \subseteq \mathcal{P}'_T|\mathcal{E}'$. So $c'$ can be written as $c_1 \cap c_2$ where $c_1 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E})$ and $c_2 \in \mathcal{K}(\mathcal{P}'_T, \mathcal{E}' - \mathcal{E})$. As the values of the preserved numeric expressions are equal in the two consecutive instances, we have $c_1 = c$, so $c' \sqsubseteq c$, as required by the successor rule for closing services in Appendix D.5.

Thus, each local symbolic run in $\mathit{Sym}$ is valid. Following a similar analysis, one can verify that for every two connected local symbolic runs $\tilde{\rho}_T$ and $\tilde{\rho}_{T_c}$, the conditions for symbolic trees of runs stated in Appendix D.5 are satisfied due to Lemma 69.

D.6.2 From Symbolic Trees of Runs to Trees of Local Runs

Given a symbolic tree of runs $\mathit{Sym}$, we construct the tree of local runs $\mathit{Tree}$ as follows. Recall that in the original proof, for each local symbolic run $\tilde{\rho}_T$, we construct the global isomorphism type $\Lambda$ of $\tilde{\rho}_T$ and use $\Lambda$ to construct the local run $\rho_T$ and database instance $D_T$.
With arithmetic, the construction of $\Lambda$ remains unchanged but we use a different construction for $\rho_T$ and $D_T$.

To construct $\rho_T$ and $D_T$, we first define a sequence of mappings $\{p_i\}_{0 \le i < \gamma}$ from the sequence of cells $\{c_i\}_{0 \le i < \gamma}$ of $\tilde{\rho}_T$, where each $p_i$ is a mapping from $\mathcal{E}^+_T \cap \mathcal{E}^T_{\mathbb{R}}$ to $\mathbb{R}$ and $\mathcal{E}^+_T$ is the extended navigation set of $\tau_i$. Note that each $p_i$ can also be viewed as a point in $c_i$. The sequence of mappings $\{p_i\}_{0 \le i < \gamma}$ determines the values of numeric expressions, as we shall see next. For each mapping $p$ whose domain is the set of numeric expressions $\mathcal{E}$, we denote by $p|\bar{x}$ the projection of $p$ to $\mathcal{E} \cap (\bar{x} \cup \{x_R.w \mid x \in \bar{x}\})$. Then $\{p_i\}_{0 \le i < \gamma}$ is constructed as follows:
• First, we pick an arbitrary point (mapping) $p_{in}$ from $c_{in}$, where $c_{in}$ is the cell of the input isomorphism type of $\tilde{\rho}_T$.
• Then, for each equivalence class $\mathcal{L}$ of life cycles in $\tilde{\rho}_T$, let $c_{\mathcal{L}}$ be the cell of the last symbolic instances in the last dynamic segments of life cycles in $\mathcal{L}$. Pick a mapping $p_{\mathcal{L}} \in c_{\mathcal{L}}$ such that $p_{\mathcal{L}}|\bar{x}^T_{in} = p_{in}$. Such a mapping always exists because, by Lemma 67, for each $0 \le i < \gamma$, $c_i|\bar{x}^T_{in} = c_{in}$.
• Next, for each equivalence class $\mathcal{S}$ of segments in $\mathcal{L}$, let $c_{\mathcal{S}}$ be the cell of the last symbolic instance in segments in $\mathcal{S}$. Pick a mapping $p_{\mathcal{S}}$ from $c_{\mathcal{S}}$ such that $p_{\mathcal{S}}|(\bar{x}^T_{in} \cup \bar{s}^T) = p_{\mathcal{L}}|(\bar{x}^T_{in} \cup \bar{s}^T)$. Such a mapping always exists because for each life cycle $L \in \mathcal{L}$ and $I_i$ in $L$, $c_{\mathcal{L}}|(\bar{x}^T_{in} \cup \bar{s}^T) \sqsubseteq c_i|(\bar{x}^T_{in} \cup \bar{s}^T)$.
• Finally, for each segment $S = \{(I_i, \sigma_i)\}_{a \le i \le b} \in \mathcal{S}$, let $p_b = p_{\mathcal{S}}$, and for $a \le i < b$, let $p_i = p_{i+1}|\bar{x}$, where $\bar{x} = \{x \mid x \not\sim_{\tau_i} \mathtt{null}\}$ are the preserved variables from $I_i$ to $I_{i+1}$. Such mappings always exist because for each $a \le i < b$, $c_{i+1} \sqsubseteq c_i$.

For the sequence of mappings $\{p_i\}_{0 \le i < \gamma}$ constructed above, the following is easily shown:

Lemma 71. For all local expressions $(i, e)$ and $(i', e')$ in the global isomorphism type $\Lambda$, where $e$ and $e'$ are numeric, $(i, e) \sim (i', e')$ implies that $p_i(e) = p_{i'}(e')$.

Given the above property, we can construct $\rho_T$ and $D_T$ as follows. We first construct $\rho_T$ and $D_T$ as in the case without arithmetic. Then for each equivalence class $[(i, e)]$, we replace the value $[(i, e)]$ in $\rho_T$ and $D_T$ with the value $p_i(e)$. It is clear that Lemmas 36 and 44 still hold since the global equality type in $\Lambda$ remains unchanged.

To construct the full tree of local runs $\mathit{Tree}$ from the symbolic tree of runs, we perform the above construction in a top-down manner. We first construct $\{p_i\}_{0 \le i < \gamma}$ for the root $\tilde{\rho}_T$ of $\mathit{Sym}$ using the above construction. Then recursively for each $\tilde{\rho}_T \in \mathit{Sym}$ and child $\tilde{\rho}_{T_c}$ connected to $\tilde{\rho}_T$ by an edge labeled with index $i$, we pick a mapping $p_{in}$ from $c_{in}$ of $\tilde{\rho}_{T_c}$ such that $p^{T_c \to T}_{in} = p_i|\bar{x}^T_{T_c^\downarrow}$. If $\tilde{\rho}_{T_c}$ is a returning run, we pick $p_{out}$ from $c_{out}$ of $\tilde{\rho}_{T_c}$ such that $p^{T_c \to T}_{out}|\bar{x}_{null} = p_j|\bar{x}_{null}$, where $j$ is the index of the corresponding closing service $\sigma^c_{T_c}$ at $\tilde{\rho}_T$, and $\bar{x}_{null}$ is defined as above.

We next construct $\{p_i\}_{0 \le i < \gamma}$ of $\tilde{\rho}_{T_c}$ similarly to above, except that (1) $p_{in}$ is given, and (2) if $\tilde{\rho}_{T_c}$ is a returning run, then for the equivalence class $\mathcal{L}$ of life cycles where $I_{\gamma-1}$ is contained in some life cycle $L \in \mathcal{L}$, we pick $p_{\mathcal{L}}$ such that $p_{\mathcal{L}}|(\bar{x}^{T_c}_{in} \cup \bar{x}^{T_c}_{ret}) = p_{out}$.
Then $\rho_{T_c}$ and $D_{T_c}$ are constructed following the above approach. The tree of local runs $\mathit{Tree}$ is constructed as described in the proof of Theorem 20. Following the same approach, we can show:

Theorem 72. For every HAS $\Gamma$ and HLTL-FO property $\varphi$ with arithmetic, there exists a symbolic tree of runs $\mathit{Sym}$ accepted by $\mathcal{B}_\varphi$ iff there exists a tree of local runs $\mathit{Tree}$ and database $D$ such that $\mathit{Tree}$ is accepted by $\mathcal{B}_\varphi$ on $D$.

D.7 Complexity of Verification with Arithmetic

Similarly to the analysis in Appendix C.3, it is sufficient to upper-bound the number of $T$- and $T_S$-isomorphism types. To do so, we need to bound the size of $\{\mathcal{K}_T\}_{T \in \mathcal{H}}$. By the construction of each $\mathcal{K}_T$ and by Theorem 62, it is sufficient to bound the size of each $\mathcal{P}'_T$.

We denote by $l$ the number of numeric expressions, $s$ the number of polynomials in $\Gamma$ and $\varphi$, $d$ the maximum degree of these polynomials, $t$ the maximum bit-size of the coefficients, and $h$ the height of the task hierarchy $\mathcal{H}$. For each task $T$, we denote by $s(T)$ the number of polynomials in $\mathcal{P}'_T$ and by $d(T)$ the maximum degree of polynomials in $\mathcal{P}'_T$.

If $T$ is a leaf task, then $|\mathcal{P}_T| \le s + l$. The number of polynomials in $\mathcal{P}^s_T$ is no more than the product of (1) the number of subsets of $\mathcal{E}^T_{\mathbb{R}}$, (2) the maximum number of non-empty cells over $(\mathcal{E}, \mathcal{P}_T|\mathcal{E})$ and (3) the maximum number of polynomials in each $proj(c, \bar{x}^T_{in} \cup \bar{s}^T)$. By Theorem 60, the number of polynomials is no more than the running time, which is bounded by $((s + l) \cdot d)^{O(l)}$. Then by Theorem 62, the number of non-empty cells over $(\mathcal{E}, \mathcal{P}_T|\mathcal{E})$ is at most $((s + l) \cdot d)^{O(l)}$. Thus, $|\mathcal{P}^s_T| \le ((s + l) \cdot d)^{O(l)}$. By the same analysis, we obtain that for $\mathcal{P}'_T$, $s(T) = |\mathcal{P}'_T| \le ((s + l) \cdot d)^{O(l)}$. Similarly, $d(T)$ can be upper-bounded by $((s + l) \cdot d)^{O(l)}$.

Next, if $T$ is a non-leaf task, we denote by $s'$ the size of $\mathcal{P}_T$ and by $d'$ the maximum degree of polynomials in $\mathcal{P}_T$.
We have that
$$s' \le (s + l) + \sum_{T_c \in child(T)} l \cdot (s(T_c) \cdot d(T_c))^{O(l)} \cdot (s(T_c) \cdot d(T_c))^{O(l)} \le (s + l) + (s(T_c) \cdot d(T_c))^{O(l)},$$
and $d' \le \max_{T_c \in child(T)} (s(T_c) \cdot d(T_c))^{O(l)}$.

Following the same analysis as above, we have that both $s(T)$ and $d(T)$ are at most $((s' + l) \cdot d')^{O(l)}$. By solving the recursion, we obtain that $s(T), d(T) \le ((s + l) \cdot d)^{(c \cdot l)^h}$ for some constant $c$. Then by Theorem 62, $|\mathcal{K}_T|$ is at most $(s(T) \cdot d(T))^{O(k)}$. So we have

Lemma 73. For each task $T$, the number of cells in $\mathcal{K}_T$ is at most $((s + l) \cdot d)^{(c \cdot l)^h}$ for some constant $c$.

The space used by the verification algorithm with arithmetic is no more than the space needed to pre-compute $\{\mathcal{K}_T\}_{T \in \mathcal{H}}$ plus the space for the VASS (repeated) reachability for each task $T$. By Theorem 63, for each task $T$, the set $\mathcal{K}_T$ can be computed in space $O\bigl(((s + l) \cdot d)^{(c \cdot l)^h}\bigr)$.

For VASS (repeated) reachability, according to the analysis in Appendix C.3, state (repeated) reachability can be computed in $O(h \cdot N \log M \cdot c \cdot D \log D)$ space ($O(h \cdot N \log M)$ without artifact relations), where $h$ is the height of $\mathcal{H}$, $N$ is the size of $(\Gamma, \varphi)$, $M$ is the number of extended $T$-isomorphism types and $D$ is the number of extended $T_S$-isomorphism types. With arithmetic, $M$ and $D$ are the numbers of normal $T$- and $T_S$-isomorphism types, respectively, each multiplied by $|\mathcal{K}_T|$. As $l$ is less than the number of expressions whose upper bounds are obtained in Appendix C.3, by applying Lemma 73, we obtain upper bounds for $M$ and $D$ for the different types of schema.

By substituting the bounds for $M$ and $D$, we have the following results. Note that for $\Gamma$ without artifact relations, the complexity is dominated by the space for pre-computing $\{\mathcal{K}_T\}_{T \in \mathcal{H}}$.

Theorem 74. Let $\Gamma$ be a HAS with acyclic schema and $\varphi$ an HLTL-FO property over $\Gamma$, where arithmetic is allowed in $\Gamma$ and $\varphi$.
$\Gamma \models \varphi$ can be verified in $2$-$\exp(N^{O(h + r)})$ deterministic space. If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be verified in $\exp(N^{O(h + r)})$ deterministic space.

Theorem 75. Let $\Gamma$ be a HAS with linearly-cyclic schema and $\varphi$ an HLTL-FO property over $\Gamma$, where arithmetic is allowed in $\Gamma$ and $\varphi$. $\Gamma \models \varphi$ can be verified in $O(2$-$\exp(N^{c \cdot h}))$ deterministic space, where $c = O(r)$. If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be verified in $O(\exp(N^{c \cdot h}))$ deterministic space, where $c = O(r)$.

Theorem 76. Let $\Gamma$ be a HAS with cyclic schema and $\varphi$ an HLTL-FO property over $\Gamma$, where arithmetic is allowed in $\Gamma$ and $\varphi$. $\Gamma \models \varphi$ can be verified in $(h + 2)$-$\exp(O(N))$ deterministic space. If $\Gamma$ does not contain artifact relations, then $\Gamma \models \varphi$ can be verified in $(h + 1)$-$\exp(O(N))$ deterministic space.

E. UNDECIDABILITY RESULTS

We provide a proof of Theorem 24 for relaxing restriction (2). Recall that HAS$^{(2)}$ allows subtasks of a given task to overwrite non-null ID variables. The same proof idea can be used for restrictions (1) to (7).

Proof. We show undecidability by reduction from the Post Correspondence Problem (PCP) [46, 48]. Given an instance $P = \{(a_i, b_i)\}_{1 \le i \le k}$ of PCP, where each $(a_i, b_i)$ is a pair of non-empty strings over $\{0, 1\}$, we show how to construct a HAS$^{(2)}$ $\Gamma$ and HLTL-FO formula $\varphi$ such that there is a solution to $P$ iff there exists a run of $\Gamma$ satisfying $\varphi$ (i.e., $\Gamma \not\models \neg\varphi$).

The database schema of $\Gamma$ contains a single relation $G(\mathit{id}, \mathit{next}, \mathit{label})$, where $\mathit{next}$ is a foreign-key attribute referencing attribute $\mathit{id}$ and $\mathit{label}$ is a non-key attribute. Let $\alpha, \beta$ be distinct id values in $G$. A path in $G$ from $\alpha$ to $\beta$ is a sequence of IDs $i_0, \ldots, i_n$ in $G$ where $\alpha = i_0$, $\beta = i_n$, and for each $j$, $0 \le j < n$, $i_{j+1} = i_j.\mathit{next}$. It is easy to see that there is at most one path from $\alpha$ to $\beta$ for which $i_j \ne \alpha, \beta$ for $0 < j < n$, and the path must be simple ($i_0, i_1, \ldots, i_n$ are distinct).
If such a path exists, we denote by $w(\alpha, \beta)$ the sequence of labels $i_1.\mathit{label}, \ldots, i_n.\mathit{label}$ (a word over $\{0, 1\}$, assuming the values of $\mathit{label}$ are 0 or 1). Intuitively, $\Gamma$ and $\varphi$ do the following given database $G$:
1. non-deterministically pick two distinct ids $\alpha, \beta$ in $G$;
2. check that there exists a simple path from $\alpha$ to $\beta$ and that $w(\alpha, \beta)$ witnesses a solution to $P$; the uniqueness of the simple path from $\alpha$ to $\beta$ is essential to ensure that $w(\alpha, \beta)$ is well defined.

Step 2 requires simultaneously parsing $w(\alpha, \beta)$ as $a_{s_1} \ldots a_{s_m}$ and $b_{s_1} \ldots b_{s_m}$ for some $s_i \in [1, k]$, $1 \le i \le m$, by synchronously walking the path from $\alpha$ to $\beta$ with two pointers $P_a$ and $P_b$. More precisely, $P_a$ and $P_b$ are initialized to $\alpha$. Then repeatedly, an index $s_j \in [1, k]$ is picked non-deterministically, and $P_a$ advances $|a_{s_j}|$ steps to a new position $P'_a$, such that the sequence of labels along the path from $P_a$ to $P'_a$ is $a_{s_j}$ and no id along the path equals $\alpha$ or $\beta$. Similarly, $P_b$ advances $|b_{s_j}|$ steps to a new position $P'_b$, such that the sequence of labels along the path from $P_b$ to $P'_b$ is $b_{s_j}$ and no id along the path equals $\alpha$ or $\beta$. This step repeats until $P_a$ and $P_b$ simultaneously reach $\beta$ (if ever). The property $\varphi$ checks that eventually $P_a = P_b = \beta$, so $w(\alpha, \beta)$ witnesses a solution to $P$.

In more detail, we use two tasks $T_p$ and $T_c$ where $T_c$ is a child task of $T_p$ (see Figure 5).

[Figure 5: Undecidability for HAS$^{(2)}$. The parent task $T_p$ has variables start, end, $P_a$, $P_b$; the child task $T_c$ has variables start, end, $P_a$, $P_b$, $P'_a$, $P'_b$.]

Task $T_p$ has two input variables start, end (initialized to distinct ids $\alpha$ and $\beta$ by the global precondition), and two artifact variables $P_a$ and $P_b$ (holding the two pointers). $T_p$ also has a binary artifact relation $S$ whose set variables are $(P_a, P_b)$. At each segment of $T_p$, the subtask $T_c$ is called with $(P_a, P_b, \text{start}, \text{end})$ passed as input.
Then an internal service of $T_c$ computes $P'_a$ and $P'_b$, such that $P_a$, $P'_a$, $P_b$ and $P'_b$ satisfy the condition stated above for some $s_j \in [1, k]$. Then $T_c$ closes and returns $P'_a$ and $P'_b$ to $T_p$, overwriting $P_a$ and $P_b$ (note that this is only possible because restriction (2) is lifted). At this point we would like to call $T_c$ again, but multiple calls to a subtask are disallowed between internal transitions. To circumvent this, we equip $T_p$ with an internal service that simply propagates $(P_a, P_b, \text{start}, \text{end})$. The variables start, end are automatically propagated as input variables of $T_p$. Propagating $(P_a, P_b)$ is done by inserting it into $S$ and retrieving it in the next configuration (so $\delta = \{+S(P_a, P_b), -S(P_a, P_b)\}$). Now we are allowed to call $T_c$ again, as desired.

It can be shown that there exists a solution to $P$ iff there exists a run of the above system that reaches a configuration in which $P_a = P_b = \text{end}$. This can be detected by a second internal service success of $T_p$ with pre-condition $P_a = P_b = \text{end}$. Thus, the HLTL-FO property $\varphi$ is simply $[\mathbf{F}(\text{success})]_{T_p}$. Note that this is in fact an HLTL formula. Thus, checking HLTL-FO (and indeed HLTL) properties of HAS$^{(2)}$