Safety Verification of Parameterized Systems under Release-Acquire
Adwait Godbole
University of California at Berkeley, [email protected]
Shankara Narayanan Krishna
IIT Bombay, [email protected]
Roland Meyer
Institute of Theoretical Computer Science, Braunschweig, [email protected]
Abstract
We study the safety verification problem for parameterized systems under the release-acquire (RA) semantics. It has been shown that the problem is intractable for systems with unlimited access to atomic compare-and-swap (CAS) instructions. We show that, from a verification perspective where approximate results help, this is overly pessimistic. We study parameterized systems consisting of an unbounded number of environment threads executing identical but CAS-free programs and a fixed number of distinguished threads that are unrestricted.
Our first contribution is a new semantics that considerably simplifies RA but is still equivalent for the above systems as far as safety verification is concerned. We apply this (general) result to two subclasses of our model. We show that safety verification is only PSPACE-complete for the bounded model checking problem where the distinguished threads are loop-free. Interestingly, we can still afford the unbounded environment. We show that the complexity jumps to NEXPTIME-complete for thread-modular verification where an unrestricted distinguished 'ego' thread interacts with an environment of CAS-free threads plus loop-free distinguished threads (as in the earlier setting). Besides the usefulness for verification, the results are strong in that they delineate the tractability border for an established semantics.
Concurrency, Verification
Keywords and phrases release acquire, parameterized systems
Release-acquire (RA) is a popular fragment of C++11 [14] (in which reads are annotated by acquire and writes by release) that strikes a good balance between programmability and performance and has received considerable attention (see e.g., [8, 41, 47, 49, 51, 53, 60, 63, 64]). The model is not limited to concurrent programs, though. RA has tight links [52] with causal consistency (CC) [7], a prominent consistency guarantee in distributed databases [55]. Common to RA implementations and distributed databases is that they tend to offer functionality to multi-threaded client programs, be it means of synchronization or access to shared data.
We are interested in verifying such implementations on top of RA. For verification, we can abstract the client program to invocations of the offered functionality [20]. The result is a so-called instance of the implementation in which concurrent threads execute the code of interest. There is a subtlety. As the RA implementation should be correct for every client, we cannot fix the instance to be verified. We have to prove correctness irrespective of the number of threads executing the code. This is the classical formulation of a parameterized system as it has been studied over the last 35 years [20].
We are interested in the decidability and complexity of safety verification for parameterized programs under RA. The goal is to identify expressive classes of programs for which the problem is tractable. There are good arguments in favor of this agenda. From a pragmatic point of view, even if the implementation at hand does not fall into one of the classes identified, we may hope for a reasonably precise encoding. From a conceptual point of view, tractability of verification is linked to programmability, and understanding the complexity may lead to suggestions for better consistency notions [50] or programming guidelines, e.g. in the form of type systems [56]. Safety verification is a good fit for linearizability [43], the de-facto standard correctness condition for concurrency libraries, and has to be settled before going to more complicated notions.
To explain the challenges of parameterized verification under RA, it will be helpful to have an understanding of how to program under RA. The slogan of RA is never read "overwritten" values [52]. Assume we have shared variables g and d, initially 0, and a thread first stores 1 to d and then 1 to g. Assume a second thread reads the 1 from g. Under RA, that thread can no longer claim d = 0. Formulated axiomatically [9], the reads-from, modification order, program order, and from-read relations should be acyclic [52]. While less concise, there are operational formulations of RA that make explicit information about the computation, which will be useful for our development [47, 48, 59]. The mechanism is as follows. Program and modification order are encoded as natural numbers, called timestamps. Each thread stores locally a view object, a map from shared variables to timestamps. This map reflects the thread's progress in terms of seeing (or, as above, hearing from) stores to a shared variable. The communication is organized in a way that achieves the desired acyclicity.
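As a concrete preview of the view mechanism (the dictionary-based views and the message pool below are our own minimal encoding, not the paper's formal definitions), the following sketch replays the g/d example:

```python
# Minimal sketch (our own encoding) of views as maps from shared
# variables to timestamps, replaying the g/d example: a thread that
# has read g = 1 can no longer read the overwritten d = 0.

def join(vw1, vw2):
    # pointwise maximum of two views
    return {x: max(vw1[x], vw2[x]) for x in vw1}

# The memory pool: messages (variable, value, view); timestamp 0 marks
# the initial messages.
memory = [("d", 0, {"d": 0, "g": 0}), ("g", 0, {"d": 0, "g": 0})]

# Thread 1 stores 1 to d and then 1 to g, raising the timestamp of the
# written variable each time (program order becomes visible in views).
vw1 = {"d": 0, "g": 0}
vw1 = {**vw1, "d": 1}; memory.append(("d", 1, dict(vw1)))
vw1 = {**vw1, "g": 1}; memory.append(("g", 1, dict(vw1)))

# Thread 2 loads g = 1; the load joins its view with the message view.
vw2 = {"d": 0, "g": 0}
(_, _, msg_view) = next(m for m in memory if m[0] == "g" and m[1] == 1)
vw2 = join(vw2, msg_view)

# Now the initial d-message is outdated for thread 2 (timestamp
# 0 < vw2["d"] = 1): every d-message it may still load carries 1.
loadable_d = [d for (x, d, mvw) in memory if x == "d" and mvw["d"] >= vw2["d"]]
assert loadable_d == [1]
```

The join of views is what forbids reading the overwritten d = 0 after having seen g = 1, exactly the acyclicity guarantee described above.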
Store instructions generate messages that decorate the variable-value pair by a view. This view is the one held by the thread, except that the timestamp of the variable being written is raised to a strictly higher value. The shared memory is implemented as a pool to which the generated messages are added and in which they remain forever. When loading a message from the pool, the timestamp of the variable given by the message must be at least the timestamp in the thread. The views are then joined so that the receiver cannot load values older than what the sender has seen.
The timestamps render the RA semantics infinite-state, which makes algorithmic verification difficult. Indeed, the problem of solving safety verification under RA in a complete way has been studied very recently in the non-parameterized setting and proven to be undecidable even for programs with finite control flow and finite data domains [1]. With this insight, [1] proposes to give up completeness and shows how to encode an under-approximation of the safety verification problem into sequential consistency [54]. Lahav and Boker [50] drew a different conclusion. They proposed strong release-acquire (SRA) as a new consistency guarantee under which safety verification is decidable for general non-parameterized programs. Unfortunately, the lower bound is again non-primitive recursive. Also the related problem of checking CC itself for a given implementation has been studied. It is undecidable in general, but EXPSPACE-complete under the assumption of data independence [22].
To sum up, despite recent efforts [1, 22, 50] we are missing an expressive class of programs for which the safety verification problem under RA is tractable. The parameterized verification problem has not been studied.
Problem Statement. The parameterized systems of interest have the form env ∥ dis_1 ∥ ··· ∥ dis_n. We have a fixed number of distinguished threads, collectively referred to as dis and executing programs c_dis^1, ..., c_dis^n, respectively.
Moreover, we have an environment consisting of arbitrarily many threads executing the same program c_env. We obtain an instance of the system by also fixing the number of environment threads. The safety verification problem is as follows:
Safety Verification for Parameterized Systems: Given a parameterized system env ∥ dis_1 ∥ ··· ∥ dis_n, is there an instance of the system and a computation in that instance that reaches an assertion violation?
The complexity of the problem depends on the system class under consideration. We denote system classes by signatures of the form env(type_env) ∥ dis_1(type_1) ∥ ··· ∥ dis_n(type_n), where the types constrain the programs executed by the threads. The parameters are the structure of the control flow, which may be loop-free, denoted by acyc, and the instruction set, which may forbid the atomic compare-and-swap (CAS) command, denoted by nocas. We drop the type if no restriction applies. If a thread is not present, we do not mention it in the signature. With this, dis_1(acyc) ∥ dis_2(nocas) ∥ dis_3 is a non-parameterized system (without env threads) having three dis threads executing a loop-free c_dis^1, a CAS-free c_dis^2, and an unrestricted c_dis^3, respectively.
Justifying the Parameters.
In [1], the safety verification problem under RA has been shown to be undecidable for non-parameterized (env-free) systems from dis_1(nocas) ∥ dis_2(nocas) ∥ dis_3 ∥ dis_4 and non-primitive-recursive for systems from dis_1(nocas) ∥ dis_2(nocas). There are several conclusions to draw from this.
With distinguished threads, we cannot hope to arrive at a tractable verification problem. We take the bounded model checking [28] approach and consider loop-free code. Acyclic programs, however, are not very expressive. Fortunately, RA implementations tend to be parameterized, and, as we will see, this frees us from the acyclicity restriction. The fact that parameterization simplifies verification has been observed in various works [5, 33, 39, 46, 62] that we discuss below.
Restricting the use of CAS requires an explanation. The class env of unconstrained environment threads enables what we call leader isolation: an env thread can distinguish itself from the others by acquiring a CAS-based lock. Even just t CAS operations allow for the isolation of t distinguished threads, which takes us back to the results of [1] for t = 2 resp. t = 4. Acyclicity will not help in this case; in Section 3 we show that safety verification for env(acyc) is undecidable.
Contributions. We state our main results and present the technical details in the later parts.
A Simplified Semantics. We consider parameterized systems of the form env(nocas) ∥ dis_1 ∥ ··· ∥ dis_n. Our first contribution is a simplified semantics (Section 4) that is equivalent with the standard RA semantics as far as safety verification is concerned. The simplified semantics uses the notion of timestamp abstraction, which allows us to be imprecise about the exact timestamps of the env threads. Note that we do not make any assumptions on the form of the distinguished threads but support cyclic control flow and CAS. So the result in particular applies to the intractable classes from [1], even when extended with a parameterized environment. Supporting CAS in the distinguished threads is important. Without it, there is no way to capture the optimistic synchronization strategies used in performance-critical programming [42].
We continue to apply the simplified semantics to prove tight complexity bounds for the safety verification problem in two particular cases of dis programs.
Loop-Free Setting.
In Section 5, we show a PSPACE upper bound for the safety verification problem of parameterized programs from env(nocas) ∥ dis_1(acyc) ∥ ··· ∥ dis_n(acyc). The class reflects the bounded model checking problem [28], which unrolls a given program into a loop-free under-approximation. Interestingly, we can squeeze into PSPACE the unbounded environment of cyclic threads. Our decision procedure is not only optimal complexity-wise, it also has the potential of being practical (we do not have experiments). We show how to encode the safety verification problem into the query evaluation problem for linear Datalog, the format supported by Horn-clause solvers [17, 18], a state-of-the-art backend in verification.
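To give a flavor of the target format (the generic reachability query below is our own illustration, not the paper's actual encoding), a linear Datalog program has at most one recursive predicate per rule body, e.g. reach(q') :- reach(q), edge(q, q'); its least fixpoint can be computed by a simple saturation loop:

```python
# Illustration (not the paper's encoding): evaluating the linear Datalog
# query  reach(q') :- reach(q), edge(q, q')  with the fact  reach(q0),
# by saturating the least fixpoint.
edges = {("q0", "q1"), ("q1", "q2"), ("q2", "q1")}
reach = {"q0"}                               # the base fact reach(q0)
changed = True
while changed:
    changed = False
    for (a, b) in edges:
        if a in reach and b not in reach:    # apply the linear rule
            reach.add(b)
            changed = True
assert reach == {"q0", "q1", "q2"}
```

Linearity (one recursive atom per body) is what keeps such queries amenable to the Horn-clause solvers mentioned above.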
Leader Setting.
We continue to show an NEXPTIME upper bound for env(nocas) ∥ dis_1(acyc) ∥ ··· ∥ dis_n(acyc) ∥ ldr in Section 6. These systems add an unconstrained distinguished thread, called the leader (denoted ldr), to the system from Section 5. The class is in the spirit of thread-modular verification techniques [35, 57], where the safety of a single 'ego' thread is verified when interacting with an environment.
We note that these results delineate the border of tractability: adding another dis thread results in a non-primitive-recursive lower bound [1], and adding CAS operations to env results in undecidability (Section 3).
Lower Bounds. Our last contributions are matching lower bounds for the two classes. Interestingly, they hold even in the absence of CAS. We show that the safety verification problem is PSPACE-hard already for env(nocas, acyc), while it is NEXPTIME-hard for env(nocas, acyc) ∥ ldr(nocas).
Related Work. There is a vast body of work on algorithmic verification under consistency models. Since our interest is in decidability and complexity, we focus on complete methods. We have already discussed the related work on RA and CC.
Other Consistency Models.
Atig et al. have shown that safety verification is decidable for assembly programs running on TSO, the consistency model of x86 architectures [11]. The result has been generalized to consistency models with non-speculative writes [12] and very recently to models with persistence [3]. It has also been generalized to parameterized programs executed by an unbounded number of threads [4]. Behind the decision procedures are (often drastic) reformulations of the semantics combined with well-structuredness arguments [6]. A notable exception is [5], showing that safety verification under TSO can be solved in PSPACE for CAS-free parameterized programs, called env(nocas) here. On the widely-used Power architecture, safety is undecidable [2].
The decidability and complexity of verification problems has also been studied for distributed databases and data structures. Enea et al. considered the problem of checking eventual consistency (EC) [66] of replicated databases and developed a surprising link to vector addition systems [23] that yields decidability and complexity results for the safety and liveness aspects of EC. For concurrent data structures, the default correctness criterion is linearizability wrt. a specification [43]. While checking linearizability is EXPSPACE-complete in general [10, 40], important data structures (for which the specification is then fixed) admit PSPACE algorithms [21].
Parameterized Systems with Asynchronous Communication. We exploit a pleasant interplay between the asynchronous communication in RA and the parameterization of our systems in the number of threads. Kahlon [46] was the first to observe that parameterization simplifies verification in the case of concurrent pushdowns. Hague [39] showed that safety verification remains decidable when adding a distinguished leader thread. Esparza, Ganty, and Majumdar studied the complexity of what is now called leader-contributor systems [33]. It is surprisingly low, NP-complete for systems of finite-state components and PSPACE-complete for systems of pushdowns. At the heart of their technique is the so-called copycat lemma.
The work has been generalized [62] to all classes of models that are closed under regular intersection and have a computable downward-closure. It has also been generalized to liveness verification [31, 36]. Finally, the study has been generalized to parameterized complexity, for safety [27] and liveness [25]. Our work is related in that the distinguished threads behave like a leader. Moreover, our simplified semantics relies on an infinite-supply property, the proof of which gives a copycat variant for RA. Our Datalog encoding is reminiscent of the notion of Strahler number [34].
Leader-contributor systems are closely related to broadcast networks [32, 61]. Also there, safety verification has been found to be surprisingly cheap, namely PTIME-complete [30]. For liveness verification, there was a gap between EXPSPACE and PTIME that was settled recently with a non-trivial polynomial-time algorithm [26]. What is new in broadcast networks and neither occurs in leader-contributor systems nor in our setting is the problem of reconfiguration [13, 16, 29].
A parameterized system consists of an unknown and potentially large number of threads, all running the same program. Threads compute locally over a set of registers and interact with each other by writing to and reading from a shared memory. The interaction with the shared memory is under the Release-Acquire (RA) semantics [48, 52, 59].

[Figure 1 belongs here: the inference rules for the thread-local transition relation (silent transitions, ST-local, LD-local, CAS-local) and the global transition relation (LD-global, ST-global, CAS-global, Unlabelled).]

Figure 1 Local transition relation: silent (thread-local) transitions (pink), shared-memory transitions (blue). Global transition relation (below, in green).
We model the individual threads in our system as (non-deterministic) sequential programs. Assume a standard while-language Com defined by:
c ::= skip | assume e(r̄) | assert false | r := e(r̄) | c; c | c ⊕ c | c* | r := x | x := r | cas(x, r_1, r_2)
The programs compute on (thread-local) registers r from the finite set Reg using assume, assert, assignments, sequential composition, non-deterministic choice, and iteration. Conditionals if and iteratives while can be derived from these operators, and we use them where convenient. The shared-memory variables x are accessed only by means of load, store, and compare-and-swap (CAS) operations, written r := x, x := r, and cas(x, r_1, r_2), respectively. These instructions are also referred to as events. We have a finite set Var of shared variables, and work with the data domain Dom = N. We do not insist on a shape of expressions e but require an interpretation [[e]] : Dom^n → Dom that respects the arity n of the expression.
We give the semantics of parameterized systems under release-acquire consistency. We opted for an operational [48, 59] over an axiomatic [52] definition, and follow [1]. What makes the operational definition attractive is that it comes with a notion of configuration or state of the system that we use to reason about computations. We first define thread-local configurations, then add the shared memory, and give the global transition relation.
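The derivation of if and while mentioned above can be made explicit; the tuple encoding of commands below is our own illustration, not the paper's:

```python
# Sketch (our own encoding of Com as nested tuples): conditionals and
# loops derived from assume, non-deterministic choice (⊕), and
# iteration (*), as indicated in the text.
def seq(c1, c2):    return ("seq", c1, c2)
def choice(c1, c2): return ("choice", c1, c2)      # c1 ⊕ c2
def star(c):        return ("star", c)             # c*
def assume(e):      return ("assume", e)

def if_then_else(e, not_e, c1, c2):
    # if e then c1 else c2  ≡  (assume e; c1) ⊕ (assume ¬e; c2)
    return choice(seq(assume(e), c1), seq(assume(not_e), c2))

def while_do(e, not_e, c):
    # while e do c  ≡  (assume e; c)*; assume ¬e
    return seq(star(seq(assume(e), c)), assume(not_e))

body = ("load", "r", "x")            # stands for the load r := x
loop = while_do("r != 0", "r == 0", body)
assert loop == ("seq",
                ("star", ("seq", ("assume", "r != 0"), body)),
                ("assume", "r == 0"))
```

The negated guard is passed explicitly since the grammar only offers assume over expressions, not a negation operator on commands.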
Local Configurations.
The RA semantics enforces a total order on all stores to the same variable that have been performed in the computation. We model these total orders by Time = N and refer to elements of Time as timestamps. Using the total orders, each thread keeps track of its progress in the computation. It maintains a view from View = Var → Time, a function that, for a shared variable x, returns the timestamp of the most recent event the thread has observed on x. Besides, the thread keeps track of the command to be executed next (which can be represented as a program counter) and the register valuation from RVal = Reg → Dom. The set of thread-local configurations is thus LCF = Com × RVal × View.
Unbounded Threads.
The number of threads executing in the system is not known a priori. As long as we restrict ourselves to safety properties, there are two ways of modeling this. One way is to define instance programs for a given number of threads, and then require correctness of all instances, as has been done in [19]. The alternative is to consider an infinite number of threads right away. We take the latter approach and define TID = N to be the set of thread identifiers. The thread-local configuration map then assigns a local configuration to each thread: LCFMap = TID → LCF.
Views.
The views maintained by the threads are used for synchronization. They determine where in the (appropriate) total order a thread can place a store and from which stores it can load a value. To achieve this, the shared memory consists of messages, which are variable-value pairs enriched by a view, with the form (x, d, vw): Msgs = Var × Dom × View.
Shared Memory. A memory state is a set of such messages, and we use Mem = 2^Msgs for the set of all memory states. With this, the set of all configurations of parameterized systems under release-acquire is CF = Mem × LCFMap.
Transitions.
To define the transition relation among configurations, we first give a thread-local transition relation among thread-local configurations, → ⊆ LCF × LAB × LCF, in Figure 1. Thread-local transitions may be labeled or unlabeled, indicated by LAB = {ε} ∪ ({ld, st, cas} × Msgs). The unlabeled transitions capture the control flow within a thread and properly handle assignments and assumes. They are standard. The message-labeled transitions capture the interaction of the thread with the shared memory. We elaborate on the load, store, and CAS transitions by which a thread with local view vw interacts with the shared memory.
Load. A load transition r := x picks a message (x, d, vw') from the shared memory, where d is the value stored in the message, and updates its register r with value d. The message should not be outdated, which means the timestamp of x in the message, vw'(x), should be at least the thread's current timestamp for x, vw(x). The timestamps of other variables do not influence the feasibility of the load transition. They are taken into account, however, when the load is performed. The thread's local view is updated by joining the thread's current view vw and vw', taking the maximum timestamp per address: (vw ⊔ vw') = λx. max(vw(x), vw'(x)).
Store. When a thread executes a store x := r, it adds a message (x, d, vw') to the memory, where d is the value held by the register r. The new thread-local view (and the message view), vw', is obtained from the current vw by increasing the timestamp of x. We use vw <_x vw' to mean vw(x) < vw'(x) and vw(y) = vw'(y) for all variables y ≠ x.
CAS. A CAS transition is a load and a store executed atomically. cas(x, r_1, r_2) has the intuitive meaning atomic { r := x; assume r = r_1; x := r_2 }. The instruction checks whether the shared variable x holds the value of r_1 and, in case it does, sets it to the value of r_2. The check and the assignment happen atomically.
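The load, store, and CAS steps described above can be prototyped as follows. This is a deliberately simplified executable sketch (our own class and method names, eliding the non-conflict checks and the rule format of Figure 1); in particular, our store always places the new message above all existing ones, whereas the formal semantics only requires a strictly higher timestamp than the thread's view.

```python
# Simplified sketch of the message pool and per-thread views.

def join(vw1, vw2):
    # pointwise maximum of two views
    return {x: max(vw1[x], vw2[x]) for x in vw1}

class Pool:
    def __init__(self, variables):
        init = {x: 0 for x in variables}
        # initial messages: value 0, timestamp 0 on every variable
        self.msgs = [(x, 0, dict(init)) for x in variables]

    def loadable(self, x, vw):
        # messages on x whose timestamp is not below the thread's view
        return [(d, mvw) for (y, d, mvw) in self.msgs
                if y == x and mvw[x] >= vw[x]]

    def load(self, x, vw):
        d, mvw = self.loadable(x, vw)[-1]   # one nondeterministic choice
        return d, join(vw, mvw)             # join the two views

    def store(self, x, d, vw):
        # simplification: strictly above every existing timestamp on x,
        # hence in particular strictly above the thread's own view
        ts = max(mvw[x] for (y, _, mvw) in self.msgs if y == x) + 1
        nvw = dict(vw)
        nvw[x] = ts
        self.msgs.append((x, d, dict(nvw)))
        return nvw

    def cas(self, x, expect, new, vw):
        # load + store atomically: the store timestamp must be adjacent
        # to the loaded one, so only the maximal message can be read
        y, d, mvw = max((m for m in self.msgs if m[0] == x),
                        key=lambda m: m[2][x])
        if d != expect or mvw[x] < vw[x]:
            return None                     # CAS not enabled
        nvw = join(vw, mvw)
        nvw[x] = mvw[x] + 1
        self.msgs.append((x, new, dict(nvw)))
        return nvw
```

For instance, after one thread stores 5 to x, a cas of x from 5 to 7 succeeds exactly once; a second attempt expecting 5 reads 7 at the maximal timestamp and is disabled.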
Under RA, this means the timestamp ts of the load and the timestamp ts' of the store involved in the CAS should be adjacent, ts' = ts + 1.
The transition relation among configurations, → ⊆ CF × TID × (Msgs ∪ {ε}) × CF, is defined in Figure 1. It is labeled by a thread identifier and possibly a message (if the transition interacts with the shared memory). The relation expects a thread t which performs the transition. In the case of local computations, there are no further requirements and the transition propagates to the configuration. In the case of loads, we require the memory to hold the message to be loaded. In the case of stores, the message to be stored should not conflict with the memory. In the case of CAS, we require both of the above, and that the two messages have consecutive timestamps. We defer the definition of non-conflicting messages for the moment, until we can give it in broader perspective.
Fix a parameterized system of interest c. The initial thread-local configuration is lcf_init = (c, rv_init, vw_init), where the register valuation assigns rv_init(r) = 0 to all registers and the view has vw_init(x) = 0 for all x ∈ Var. The initial configuration of the parameterized system is cf_init = (Mem_init, lcfm_init), with an initial memory Mem_init consisting of messages in which all shared variables store the value d_init ∈ Dom together with the initial view assigning timestamp 0 to all shared variables, and lcfm_init(t) = lcf_init for all threads. A computation (or a run) is a finite sequence of consecutive transitions ρ = cf_0 →^{(t_1, msg_1)} cf_1 →^{(t_2, msg_2)} ··· →^{(t_n, msg_n)} cf_n. The computation is initialized if cf_0 = cf_init. We use TS(ρ) for the set of all non-zero timestamps that occur in all configurations across all variables. We use TID(ρ) to refer to the set of thread identifiers labeling the transitions. For a set TID' ⊆ TID of thread identifiers, we use ρ↓TID' to project the computation to transitions from the given threads. With first(ρ) = cf_0 and last(ρ) = cf_n, we access the first/last configurations in the computation.
Example 1.
Consider the program given in Figure 2, which implements a simplified version of Dekker's mutual exclusion protocol for two threads. There are two shared variables x and y. Both x and y are initialized to 0, and at the instructions labeled λ_1 and λ_1' the registers r_1 and r_1' are initialized to 1. The first thread t_1 signals that it wants to enter the critical section by writing the value 1 to x. It then checks if thread t_2 has asked to enter the critical section by reading the value of y and storing it into the register r_2. The thread t_1 is allowed to enter the critical section only if the value stored in the register r_2 is 0. The second thread t_2 behaves in a symmetric manner.

Variables x and y have been initialized to 0.
Thread t_1:  λ_1: r_1 := 1;  λ_2: x := r_1;  λ_3: r_2 := y;  λ_4: if (r_2 == 0): critical section
Thread t_2:  λ_1': r_1' := 1;  λ_2': y := r_1';  λ_3': r_2' := x;  λ_4': if (r_2' == 0): critical section

Figure 2
On top, a simplified version of Dekker's mutual exclusion protocol. Below, a partial execution sequence under RA. The rectangles show the contents (messages) of the shared memory. Messages have three components: (1) the variable, (2) the value of the message, and (3) the message view, a map from {x, y} to the set of timestamps Time (N). The lines below show the thread-local state: instruction pointer, register valuation, and thread-local view.

Under Sequential Consistency (SC) [54], which is a stronger notion of consistency, the mutual exclusion property (i.e., at most one thread is in the critical section at any time) is preserved. However, this is not the case under the RA memory model. To see why, consider the execution sequence presented in Figure 2. At each instant, the figure shows where the instruction counter (i.e., the label of the next instruction to get executed) resides in each of the threads, along with the values of the registers. The black arrows with instruction labels λ_2, λ_2' show the evolution of the run on executing the instructions labeled λ_2 and λ_2', respectively. Let m_λ represent the memory obtained after executing the instruction labeled λ, and let msg_λ be the unique new message (if any) that is part of m_λ after the execution of the instruction labeled λ. The initial memory is m_init, where x and y have values and timestamps 0; msg_x and msg_y represent the messages in m_init corresponding to x and y. The execution of the instruction labeled λ_2 results in the addition of a new message msg_{λ_2} to the memory whose timestamp (10) is higher than 0 (the current timestamp of the variable x for t_1). The view of t_1 is then updated wrt. the variable x. Likewise, the execution of the instruction labeled λ_2' results in the addition of a new message msg_{λ_2'} to the memory with a higher timestamp (7), which updates the view of t_2 wrt. the variable y. The read instruction labeled λ_3 is then allowed to use the message msg_y to fetch the value of y, since the view of t_1 wrt. y is 0. Likewise, for the execution of the instruction labeled λ_3' in t_2, the message msg_x is used, since the view of t_2 wrt. x is 0. After these steps, both threads enter their respective critical sections.

We need a notion of conflict not only for messages, but also for memories, configurations, and computations. Two messages are non-conflicting, denoted (x, d, vw) ⋈ (x', d', vw'), if either their variables are different, x ≠ x', the timestamps are different, vw(x) ≠ vw'(x'), or the timestamps are zero, vw(x) = 0 = vw'(x'). Observe that initial messages do not conflict with any other message. Two memory states are non-conflicting, m ⋈ m', if for all msg ∈ m and all msg' ∈ m' we have msg ⋈ msg'. Two configurations are non-conflicting, cf ⋈ cf', if their memory states are non-conflicting. Two computations are non-conflicting, denoted ρ ⋈ ρ', if they use different threads and non-conflicting messages, TID(ρ) ∩ TID(ρ') = ∅ and last(ρ) ⋈ last(ρ').

env(acyc)

In this section, we establish the undecidability of the class env(acyc), that is, the class with loop-free env threads (which can execute arbitrarily many CAS operations) and without any dis threads. This result shows that even with the loop-free assumption, allowing env threads to perform CAS operations is in itself intractable from a safety verification viewpoint. Hence, the nocas restriction that we impose on env threads is a justified means of achieving tractability. In fact, we will show a stronger result.
We will show that we can transform a non-parameterized system consisting of n distinguished threads having the full instruction set and loops under RA (the class dis_1 ∥ ··· ∥ dis_n) into a parameterized system of the class env(acyc) such that control-state reachability is preserved. With this equivalence, the claim follows from the undecidability result of [1]. To show this, we take as input n programs {c_1, c_2, ..., c_n} and a failure state label λ_fail in some c_i (with possibly the full instruction set and loops) and transform them into a single program c_env with failure state λ_fail, the control flow of which is loop-free but which uses the full instruction set including CAS operations. We claim that the state label λ_fail is reachable in dis_1 ∥ ··· ∥ dis_n with dis_i executing c_i if and only if the state label λ_fail is reachable in the system env(acyc) with the environment threads executing c_env. Let the variable set, data domain, and register set of the original system be Var, Dom, and
Reg = {r^1, ..., r^k} as usual. We assume that the memory is initialized to 0 on all variables.
Converting a single program c_i to c'_i. We show how to convert one thread program c_i into a loop-free program c'_i, and then show how to combine all the programs together into a single loop-free program c_env. Consider, for some i, the program c_i. For the purposes of this construction, we assume that the program c_i has been specified as a transition system rather than in the while-language syntax. It is clear that both representations are equivalent and can be interconverted with only polynomial overhead. Hence we assume that c_i = (Q, ∆, ι), where Q is the set of control states, ∆ is the transition relation, and ι maps each transition to its corresponding instruction from {skip, assume e(r̄), r := e(r̄), r := x, x := r, cas(x, r_1, r_2)}. We transform c_i to a loop-free program as follows. Let Q = {q_0, ..., q_{|Q|−1}}, with q_0 as the initial state.
In this conversion, we add extra variables and values such that Var' = Var ⊎ {t_i, r_i^1, ..., r_i^{|Reg|}} and Dom' = Dom ∪ {0, ..., |Q| − 1, Λ⊥}, where ⊎ denotes disjoint union. Now we specify the new transition system c'_i, which needs to be loop-free. For each transition ∂ = (q_a, q_b) ∈ ∆, with source and end states q_a and q_b respectively and instruction ι(∂), we transform it into the following transition sequence (a CAS, followed by k = |Reg| load operations; the transition corresponding to ι(∂); then k store operations, ending with a CAS), denoted E(∂):

E(∂) = q_start --cas(t_i, a, Λ⊥)--> q_∂^1 --r^1 := r_i^1--> q_∂^2 ··· q_∂^k --r^k := r_i^k--> q_∂^{k+1} --ι(∂)--> q_∂^{k+2} --r_i^1 := r^1--> q_∂^{k+3} ··· q_∂^{2k+1} --r_i^k := r^k--> q_∂^{2k+2} --cas(t_i, Λ⊥, b)--> q_end

We construct E(∂) for each transition ∂ in ∆ to get the complete transition system. The initial/final (collectively called terminal) nodes of this transition system are q_start and q_end (which are common to all E(∂)). The internal states q_∂^j are all distinct across the E(∂) for different ∂. The transition system that we obtain has size O(|Reg||∆|) (each original transition from q_a to q_b is transformed into a sequence of 2|Reg| + 3 transitions between q_start and q_end). It is clearly loop-free. See Figure 3 for an example starting with dis_i ∥ dis_j.
Combining the individual c'_i. We construct programs c'_i as described above for each thread dis_i. Now we combine these individual programs into a single program c_env. We ensure that the newly added shared variables (t_i and the r_i^j for dis_i) are disjoint across threads. Hence the variable set is now Var' = Var ⊎ {t_1, ..., t_n} ⊎ {r_i^j}_{i ∈ [n], j ∈ [|Reg|]} (where Var is the original variable set that {c_1, c_2, ..., c_n} were operating with). Finally, the combined data domain is simply the union of the individual data domains (which possibly overlap). We combine the individual programs as c_env = c'_1 ⊕ c'_2 ⊕ ··· ⊕ c'_n, where ⊕ denotes non-deterministic choice. It is clear that c_env is loop-free. Additionally, |c_env| and the new |Dom| and |Var| are polynomial in Σ_i |c_i| and the previous |Var| and |Dom|.

Figure 3
Examples of two dis threads dis_i and dis_j executing programs c_i, c_j and the corresponding transformed programs c'_i, c'_j. The program c_i has 2 transitions while c_j has 3 transitions. Note how the read and write values of the CAS operations in the transformed programs match the transitions in the original programs.

We now prove that the system env(acyc), with the env threads executing the program c_env as defined above, respects the original system. We defer the precise notion of 'maintaining reachability' until a bit later, and first make some observations about the program c_env.

Locking/unlocking of c'_i. For any single transformed program c'_i, we note that at any given point, only one thread can be in an internal (not initial/final) state of c'_i. To see this, note the two atomic CAS operations flanking each path E(∂) in c'_i. All these CAS operations are on the same variable t_i, and moreover there are no other operations on t_i. Hence at any given point in time, there is only one message on t_i (the most recent write) that is available for a CAS operation. The value of this message dictates whether the operation will succeed. When it succeeds, the most recent write value changes to the value written by the CAS. Now note that t_i is initialized with value 0; hence initially one thread, say t, can perform a CAS and change the recent value to Λ⊥. There is no transition from q_start that performs a CAS reading Λ⊥. Hence all other threads are kept waiting until the recent value on t_i changes from Λ⊥. This is possible only when the initial thread t executes the final transition and reaches q_end, maintaining the claim. Hence these CAS operations play the role of a mutual-exclusion lock. But they perform another function too.

State transference.
We now know that for each i, only one thread may execute c'_i at any given time. However, the locking/unlocking operations using CAS also enable threads to transfer their state to their successors. There are three components to the state, which we handle in turn.

Control-state: Note that the recent value on variable t_i is v ≠ Λ⊥ only if the previous thread terminated after simulating some transition ending at q_v. Additionally, a locking CAS operation for E(∂) reads value v only if ∂ is a transition from q_v to some other state. Hence, it is guaranteed that the successor thread will execute some transition that emerges from the state where the previous thread left off. Note that this holds for the first thread as well, since the initial value on all variables is 0 and the initial state of the transition system is q_0.

View: The second component that we consider is the view. This too is transferred from a thread to its successor through the CAS operation. In particular, when a thread t executes the final CAS operation to reach q_end, it generates a message on t_i which is read by its successor. This read implies that the successor takes the join of its own (initial) view with that of the message, and hence accumulates exactly the view that the previous thread left with. So the view is transferred as well.

Register valuations: The previous thread t stores its register valuations in the shared variables r^j_i in the final sequence of store operations before terminating. These are then accessed by the successor thread through the initial sequence of load operations.

In this way we see that not only is mutual exclusion ensured, but the thread states are transferred from one thread to the next. Together, these sequences of threads simulate the entire run of the original dis_i in fragments. The above holds for all i ∈ [n]. Hence at any given point, there are at most n threads simulating the original ones. Now we formalize the notion of equivalence in reachability.
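The per-transition gadget described above (locking CAS, register loads, the simulated instruction, register stores, unlocking CAS) can be sketched in a few lines; the function name, the LAMBDA marker and the string encoding of instructions are ours, for illustration only:

```python
# Toy sketch of the gadget for a transition from state index a to state index b
# with instruction instr, where k = |Reg|. Not the paper's artifact.
def gadget(a, b, instr, k):
    seq = [f"cas(t_i, {a}, LAMBDA)"]                      # lock: consume source-state index a
    seq += [f"r{j} := r{j}_i" for j in range(1, k + 1)]   # restore predecessor's registers
    seq.append(instr)                                     # simulate the original instruction
    seq += [f"r{j}_i := r{j}" for j in range(1, k + 1)]   # hand registers to the successor
    seq.append(f"cas(t_i, LAMBDA, {b})")                  # unlock: publish target-state index b
    return seq

path = gadget(0, 3, "x := r1", k=2)
assert len(path) == 2 * 2 + 3   # 2|Reg| + 3 transitions, matching the size bound
```

The two CAS endpoints are what make the path both a mutual-exclusion lock and a hand-off of the simulated control state.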
We say that an original state of the threads {dis_1, ..., dis_n} is equivalent to a new state when the following hold: if the control states of threads {dis_1, ..., dis_n} are (q_{i_1}, q_{i_2}, ..., q_{i_n}) respectively, then in the new system with env,
the recent value of shared variable t_j is i_j,
the register valuation of each original dis_i is reflected in the most recent writes to the variables r^j_i (the value of r^j equals the most recent write to r^j_i) for each thread i,
the view of dis_i is the view stored in the most recent message on t_i (again projected on the original variable set), and
the global memory (projected) is identical across the original and env states.
We claim the following: a state in the original system can be reached if and only if some equivalent state in the new system can be reached. We prove this by induction. The base case is that all threads are in their initial states, registers and views, with the memory containing only initial messages (0 on each variable). This trivially satisfies the requirement, both in the forward and reverse directions.
Now for the inductive case (⇒). Assume the claim holds at some instant. Let some dis_i execute an instruction for the transition ∂. In the new system, we simulate this by a fresh env thread t taking the path corresponding to E(∂) in c'_i. By the observations above, the invariants for the thread-local state (control-state, register valuations and view) are maintained. Additionally, if dis_i wrote a message to the memory, then so can t. In particular, since the view of t is obtained from the CAS read, it matches that of dis_i. Hence the message added by t can have the same timestamps as the one added by dis_i.
Inductive case (⇐). The same argument works in the reverse direction. Assume that a pair of equivalent states has been reached. Now consider an env thread path E(∂) in c'_i where ∂ = (q_a, q_b). Then, by the induction hypothesis, dis_i is in state q_a in the original run.
Given the equality of thread and memory state initially, it too can take the transition (q_a, q_b). Once again, the invariant follows from the earlier observations. This gives a sketch of the proof. In particular, note that even though we give an equivalence between a control state in the original system and a variable value in the new system, this can easily be converted to an equivalence between control states themselves. This means that the reachability problem for dis_1 ‖ · · · ‖ dis_n can be converted to a reachability problem for env(acyc). This prompts us to restrict env threads to a reduced (CAS-free) instruction set and motivates the idea of modelling CAS instructions in a run via computations of the dis threads.

In this section, we propose a simplified semantics for the class of systems given by env(nocas) ‖ dis_1 ‖ · · · ‖ dis_n. The core of this result relies on the Infinite Supply Lemma, which shows that if some env thread could generate a message (x, val, vw), then a clone of that thread could generate a message (x, val, vw') with vw' = vw[x ↦ t] for some t > vw(x). There are two assumptions that the Infinite Supply Lemma, and hence our semantics simplification result, rely on:
arbitrarily many env threads executing identical programs;
the env threads do not have atomic instructions (CAS).
The first assumption allows us to have clone env threads that duplicate the computation and hence the messages generated in it. The second assumption is required for the duplicated computation to remain valid under RA. While performing the duplication, one must keep in mind the dependencies between stores and loads across threads. The fact that dis threads are not replicatable (their messages cannot be duplicated) adds to the challenge. To ensure that the clone threads can follow in the footsteps of the original computation, we require that dis messages can be read by the
env clones whenever they can be read by the original env threads. This necessitates that we respect the relative order of timestamps between env and dis threads.

We develop some intermediate concepts that help us in constructing a valid duplicate run. In order to accommodate the clone threads, we must make space (create unused timestamps) along Time for the clones to write messages. We do this via timestamp liftings. Having done this, we need to define how to combine the original computation with that of the clones. We develop the concept of superposition of computations for this. Finally, the Infinite Supply (of messages) Lemma shows how, using these two concepts, we can generate copies of messages with higher timestamps.

This 'duplication-at-will' of env messages means that we need not store the entire set of env messages produced: those with the smallest timestamps act as good representatives of the set. Additionally, when any thread reads from an env message, we need not be bothered about timestamp comparisons, since we can always generate a copy of that message with as high a timestamp as required. It is this observation that gives us the timestamp abstraction, and with it the simplified semantics.
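The "smallest timestamps as representatives" observation can be illustrated with a small sketch; the function name and the (variable, value, timestamp) tuple encoding are ours, not the paper's:

```python
# Sketch: keep only the smallest-timestamp representative of each env message,
# justified by duplication-at-will (higher-timestamp copies can be regenerated).
def representatives(env_msgs):
    best = {}
    for var, val, ts in env_msgs:
        key = (var, val)
        if key not in best or ts < best[key]:
            best[key] = ts
    return {(var, val, ts) for (var, val), ts in best.items()}

msgs = {("x", 1, 3), ("x", 1, 7), ("y", 2, 5)}
assert representatives(msgs) == {("x", 1, 3), ("y", 2, 5)}
```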
We now make these arguments precise. Our strategy is to split up the timestamps (hence the computation) and separate the part originating from the dis threads from the env part (which can be duplicated at will). We write ρ↓env and ρ↓dis to denote the projections of ρ to the env and dis threads respectively.

In our development we make use of timestamp transformations tf : Time → Time. We extend these to views vw via per-variable timestamp transformations tf = {tf_x}_{x ∈ Var}, where tf_x only transforms the timestamps for the variable x. The transformed view tf(vw) : Var → Time is defined by (tf(vw))(x) = tf_x(vw(x)) for every variable x. As an example, consider shared variables x, y and a view vw = [x ↦ 3, y ↦ 5]. Applying tf = {tf_x, tf_y}, where tf_x(0) = tf_y(0) = 0, tf_x(t) = t + 2 and tf_y(t) = t + 7 for t > 0, we obtain tf(vw) = [x ↦ 5, y ↦ 12].

RA-valid timestamp lifting. An
RA-valid timestamp lifting for a run ρ is a (per-variable) timestamp transformation M = {µ_x}_{x ∈ Var} satisfying two properties for each x ∈ Var: (1) it is strictly increasing with µ_x(0) = 0; for all t_1, t_2 ∈ ℕ with t_1 < t_2 we have µ_x(t_1) < µ_x(t_2); and (2) if there is a CAS operation on x with (load, store) timestamps (t, t + 1), then µ_x(t + 1) = µ_x(t) + 1, i.e. consecution of CAS timestamps is maintained. Note that M(cf_init) = cf_init. In the example above, tf is an RA-valid timestamp lifting. Lemma 2 says that the run M(ρ), obtained by modifying the timestamps of a valid run ρ with an RA-valid timestamp lifting M, is also valid under the RA semantics.

▶ Lemma 2 (Timestamp Lifting Lemma). Let M = {µ_x}_{x ∈ Var} be an RA-valid timestamp lifting. If ρ is a computation under RA, then so is M(ρ). Hence if a configuration cf is reachable under RA, then so is M(cf).

Proof.
This result follows since a timestamp lifting is just a relabelling of timestamps for each shared variable. The lemma relies on the following observations:
There are no timestamp comparisons across variables: vw(x) is never compared with vw(x') for x ≠ x'.
The relative order between timestamps on the same variable is preserved, due to the strictly increasing property. Additionally, µ_x(0) = 0, maintaining the timestamps of the init messages.
The (load, store) timestamps of CAS operations remain consecutive.
The lemma can be formally proven by induction on the length of the run. The base case is trivial, and the inductive case follows by showing that each instruction (read, write, CAS) that can be executed in ρ can also be executed in the lifted run M(ρ). ◀

The duplication of messages by the clone env threads requires us to copy computations and then merge them such that the RA semantics is not violated. This requires (1) the timestamps of the merged computations to not conflict, and (2) the reads-from dependencies between threads to be respected. With this in mind, we introduce the idea of superposition.
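The two conditions defining an RA-valid lifting (Lemma 2) can be checked mechanically; the following is a minimal sketch for a single variable, with function and argument names of our own choosing:

```python
# Sketch: check that mu is an RA-valid lifting on one variable, given the
# sorted timestamps used on that variable and the (load, store) pairs of CAS.
def is_ra_valid_lifting(mu, timestamps, cas_pairs):
    if mu(0) != 0:                               # init messages keep timestamp 0
        return False
    for t1, t2 in zip(timestamps, timestamps[1:]):
        if not (t1 < t2 and mu(t1) < mu(t2)):    # (1) strictly increasing
            return False
    # (2) CAS (load, store) = (t, t + 1) stays consecutive after lifting
    return all(mu(t + 1) == mu(t) + 1 for (t, _) in cas_pairs)

# tf_x(t) = t + 2 for t > 0 (as in the running example) is a valid lifting
# provided no CAS pair straddles the shift point
assert is_ra_valid_lifting(lambda t: 0 if t == 0 else t + 2, [0, 1, 2, 3], [(1, 2)])
```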
We define the superposition ρ . ρ' of two computations ρ, ρ' as the computation that first executes ρ and then ρ'. This requires us to combine the memory in last(ρ) with that of every configuration in ρ'. Moreover, the threads transitioning in ρ and ρ' must be disjoint. Given these considerations, the operation requires the computations to be non-conflicting (see Section 2.2.1), and is defined as follows: ρ . ρ' = ρ ; (last(ρ) + ρ'). The addition of a configuration cf to a computation ρ' = cf_0 --(t_1, msg_1)--> ... --(t_n, msg_n)--> cf_n yields the new computation cf + ρ' = (cf + cf_0) --(t_1, msg_1)--> ... --(t_n, msg_n)--> (cf + cf_n). Addition of configurations cf_1 = (m_1, lcfm_1) and cf_2 = (m_2, lcfm_2) is the configuration cf_1 + cf_2 = (m_1 ∪ m_2, lcfm), where lcfm(t) = lcfm_1(t) if lcfm_1(t) ≠ lcf_init, and lcfm(t) = lcfm_2(t) otherwise.

When ρ and ρ' are non-conflicting, we have: (1) for any thread t, if it has transitioned in ρ, then it cannot transition in ρ'; likewise, if it has not transitioned in ρ, then it can in ρ'; (2) last(ρ) and last(ρ') are non-conflicting, and since the memory in earlier configurations of ρ' is a subset of that in last(ρ'), the memory unions performed above involve non-conflicting memories. An initial configuration is neutral for addition; in particular, last(ρ) + first(ρ') = last(ρ). The operation of concatenation ρ ; ρ' expects two computations ρ and ρ' that satisfy last(ρ) = first(ρ') and returns the sequence consisting of the transitions in ρ followed by the transitions in ρ'. This need not be a valid computation under RA, but under the following conditions it is. Let Msgs(ρ) be the memory in last(ρ). Likewise, let Msgs(ρ↓dis) ⊆ Msgs(ρ) be the subset of the memory in last(ρ) that has been added by dis threads during ρ.

▶ Lemma 3 (Superposition). Consider valid computations ρ, ρ' of a parameterized system under RA such that ρ↓env and ρ'↓env are non-conflicting and Msgs(ρ↓dis) = Msgs(ρ'↓dis). Then the superposition ρ . ρ'↓env is a valid computation under RA.

Proof.
Since there are arbitrarily many env threads, we may take the env threads in ρ to be distinct from the env threads in ρ'. By doing so we ensure that the threads operating (changing state) in ρ and ρ'↓env are disjoint.

Now consider the global state obtained after executing ρ (which is a valid run under RA). By hypothesis, the memory contains the messages of ρ↓dis, which are identical to those of ρ'↓dis. After the execution of ρ is complete, we claim that we can execute ρ'↓env one step at a time. Whenever a dis thread loads from a message generated by an env thread in ρ', the same can happen in ρ . ρ'↓env. Likewise, the relative timestamps between the dis threads and the env threads in ρ' are the same, so ρ'↓env can be executed after ρ. Similarly, reads made by some env thread in ρ or ρ', whether from another env thread or from a dis thread, proceed in exactly the same way in ρ . ρ'↓env, since the messages added by dis threads are exactly the same in ρ and ρ', and the env threads are disjoint. These two points show that we have exactly the same reads-from dependencies (dis ↔ env in ρ, dis ↔ env in ρ') in ρ . ρ'↓env. Finally, all writes made by the respective env threads of ρ and ρ' can be made in ρ . ρ'↓env; likewise, all writes made by the dis threads in ρ can also be made in ρ . ρ'↓env. The reason is that ρ↓env and ρ'↓env are non-conflicting, and trivially, so are ρ↓dis and ρ'↓env. This ensures no conflict of write timestamps. Formally, the claim can be proven by induction on the length of ρ'. ◀

Now we develop the Infinite Supply Lemma. Recall that our goal is to generate arbitrarily many copies of env messages with the same variable and value but higher timestamps. Let us fix one such message, msg = (x, d, vw), for the discussion here and see how we can replicate it. Towards this end, consider a computation ρ in which it is generated.
We 'spread apart' the timestamps of Msgs(ρ) using timestamp liftings, so that we create 'holes' (unused timestamps) along Time. Then we generate copies of env threads, denoted copy(env) (possible since env threads can be replicated). The holes accommodate the timestamps of copy(env) and the (higher) timestamp of the copy of msg. Throughout this, we preserve the order of the timestamps of the env and copy(env) threads relative to those of the dis threads. This ensures that reads-from dependencies are maintained: copy(env) can read a dis message whenever env can do so.

We define the computation ρ̃ as a copy of ρ↓env executed by copy(env) threads. The write timestamps used by the copy(env) threads are the unoccupied timestamps generated by the timestamp lifting operation M(ρ). We show an example of this via a graphic. Let eT^i and dT^i respectively denote the timestamps chosen by env and dis along ρ (first row).

ρ :    init  dT^1  eT^1           dT^2  eT^2           eT^3
M(ρ) : init  dT^1  __    eT^1_a   dT^2  __    eT^2_a   __    eT^3_a
ρ̃ :    init  dT^1  eT^1_b  __     dT^2  eT^2_b  __     eT^3_b  __

The second row shows the lifted timestamps (with subscript a) of M(ρ) and the holes (marked __). The third row shows the holes being used by copy(env) for ρ̃ (these have subscript b). The construction guarantees that M(ρ) and ρ̃ are non-conflicting, so the superposition M(ρ) . ρ̃ is allowed. In this computation, ρ̃ generates a copy of msg, msg' = (x, val, vw'), with higher vw'(x). Additionally, since eT^i_a and eT^i_b have the same position relative to all dT^j timestamps, so will vw(y) and vw'(y) for y ≠ x.

Now we state the Infinite Supply Lemma. As helper notation, for a run ρ and each variable x, we denote the timestamps of stores of dis threads on x as ts^x_1 < ts^x_2 < · · · .

▶ Lemma 4 (Infinite Supply). Let ρ be a valid run under the RA semantics, in which the message (x, d, vw) has been generated by an env thread. Then for each timestamp t* ∈ ℕ, there exist two timestamp lifting functions M_1, M_2 and a run ρ' such that M_1(ρ) . M_2(ρ↓env) . ρ' is a valid run. This run contains a message (x, d, vw') satisfying (the ts^x_i come from ρ):
(1) ∀i: (t* ≤ ts^x_i ∧ vw(x) ≤ ts^x_i) ⟹ vw'(x) ≤ µ_x(ts^x_i)
(2) vw'(x) ≥ µ_x(t*)
(3) ∀x' ≠ x, ∀i: vw(x') ≤ ts^{x'}_i ⟹ vw'(x') ≤ µ_{x'}(ts^{x'}_i)

Proof.
Without loss of generality, we assume that in the run ρ the timestamps on each variable are consecutive. If that is not the case, we can always use a timestamp lowering operation that 'fills in the gaps' between non-consecutive timestamps, while maintaining consecution of the (load, store) timestamps of CAS operations. We give a constructive proof. We specify M_1(ρ) . M_2(ρ↓env) . ρ' by defining M_1, M_2 and ρ', and showing that the resulting run is valid under RA. Then we show how a copy of the message (x, d, vw) can be obtained as claimed. First, we describe how to copy runs.

Copying a run. For a variable x, we define the lifting functions as follows. With the consecutiveness assumption, the messages on x have consecutive timestamps and are generated by either dis or env; we denote the corresponding timestamps below as disT and envT respectively. For intuition, consider the following sequence of consecutive timestamps on some variable:

init disT disT envT disT envT envT disT

Intuitively, the new interleaved run is obtained by triplicating each envT timestamp into three adjacent timestamps envT_a, envT_b and envT_c. The envT_a timestamps belong to the lifted run M_1(ρ), the envT_b timestamps belong to M_2(ρ↓env), and the envT_c timestamps belong to ρ'. The a, b, c copies are ordered as b < c < a, giving us the following timestamp sequence from the one above:

init disT disT envT_b envT_c envT_a disT envT_b envT_c envT_a envT_b envT_c envT_a disT

We can formalize M_1 and M_2 by counting the number of disT timestamps smaller than the envT^i timestamp, but for ease of presentation we keep this implicit. The total shift can be done, for instance, using the function which maps a timestamp p ∈ ℕ belonging to an env thread to (number of dis timestamps before p) + 3·(number of env timestamps before p) + 3, while a dis timestamp p ∈ ℕ is mapped to (number of dis timestamps before p) + 3·(number of env timestamps before p) + 1. So, for instance, the envT at timestamp 3 above moves to the envT_a timestamp 5, while the last disT moves to the disT timestamp 3 + 3(3) + 1 = 13. M_1 maps the timestamp envT^i to the timestamp envT^i_a. Similarly, M_2 maps the envT^i timestamp to the timestamp envT^i_b = envT^i_a − 2. Finally, we have envT^i_c = envT^i_a − 1. M_1 and M_2 map each disT^i timestamp to the corresponding disT^i timestamp in the expanded run.

We first note that M_1 satisfies the premise of the Timestamp Lifting Lemma: consecutive (load, store) timestamps of CAS operations remain consecutive. This follows since only dis can perform CAS, and under M_1 consecution is maintained both for (disT^{i−1}, disT^i) pairs as well as for (envT_a, disT^i) pairs, as depicted in the following timestamp sequence (unused holes in parentheses). Thus, by the Timestamp Lifting Lemma, M_1(ρ) is a valid run under RA.

init disT disT (envT_b) (envT_c) envT_a disT (envT_b) (envT_c) envT_a (envT_b) (envT_c) envT_a disT

The first superposition gives a valid run. We now claim that the run M_1(ρ) . M_2(ρ↓env) is a valid run under RA. For an env thread t in ρ we denote by copyB(t) its 'copy' (with a distinct TID) in M_2(ρ↓env) (B since it occupies the b timestamps). We will show that copyB(t) copies the transitions that t took. For this we use the following invariant relating the view vw of thread t in ρ with the view vw' of copyB(t). Let TS(t) denote the set of timestamps used by thread t in a run ρ. For every shared variable x, if vw(x) ∈ TS(env), then vw'(x) = vw(x) −
2, else if vw(x) ∈ TS(dis) then vw'(x) = vw(x).
Now we can reason by induction on the length of the run that whenever t takes a transition in ρ, copyB(t) can replicate it, but with the view as given by the above invariant. More precisely, whenever an env thread t makes a store with timestamp envT, copyB(t) makes a store with timestamp envT − 2. Similarly, when an env thread t makes a load: (a) if the load is from a message by a dis thread, copyB(t) also loads from the same message; (b) if the load is from a message by some env thread t' in ρ, copyB(t) loads from copyB(t'). It is easy to check that the view invariant is maintained through this simulation. Crucially, we have envT^i_a < disT^j ⟺ envT^i_b < disT^j. Thus t and copyB(t) can always read the same set of dis messages. Hence M_1(ρ) . M_2(ρ↓env) is valid under RA. Now we focus on message generation by ρ'. Intuitively, ρ' will also be a copy of ρ, but will occupy the envT_c timestamps.

Generating a copy of the message. Now we describe how we can use ρ' to generate the message (x, d, vw'). Let ρ_p be the prefix of ρ just (one transition) before the message (x, d, vw) is generated. We generate the run ρ' by copying ρ_p↓env using the c-timestamps. The run obtained is M_1(ρ) . M_2(ρ↓env) . ρ'. By reasoning similar to the above, this is a valid run under the RA semantics. Now let the env thread t be the thread that generates the message (x, d, vw) in ρ. Then there is a copy of t, thread copyC(t) in ρ', that is now in a control state enabling it to generate a message on variable x with value d (since the transitions have been replicated exactly between t and copyC(t)). It remains to reason about the view of the message generated by copyC(t). If the view of t is vw_a and that of copyC(t) is vw_c, we have the following, which again follows from the invariant mentioned above. For each variable x', if vw_a(x') ∈ TS(env), then vw_c(x') = vw_a(x') − 1, else if vw_a(x') ∈ TS(dis) then vw_c(x') = vw_a(x').
Observe how this immediately satisfies condition (3) of the lemma, since in both cases we have vw_c(x') ≤ vw_a(x'). Now the thread copyC(t) chooses the timestamp vw'(x) for variable x. Assume t* ∈ ℕ has been given. We have two cases: (i) t* ≤ vw(x) and (ii) t* > vw(x).
(i) In this case there is nothing left to prove: the original message is lifted to (x, d, vw') with vw'(x) = µ_x(vw(x)) ≥ µ_x(t*), which satisfies both conditions (1) and (2).
(ii) We choose vw'(x) = µ_x(t*) + 1, which satisfies (2). Note that this timestamp is higher than vw_c(x), since µ_x(t*) ≥ µ_x(vw(x)) = vw_a(x) > vw_c(x). The timestamp vw'(x) = µ_x(t*) + 1 is a c-type timestamp and is hence left unused by M_1, M_2 and ρ'. Additionally, ts^x_i ≥ t* implies µ_x(ts^x_i) ≥ µ_x(t*), satisfying condition (1). In this case, ρ' is defined as the copy of ρ_p↓env extended by the store transition generating the message. Note that since it uses a c-type timestamp, the timestamp is available.
Thus in both cases we have a message (x, d, vw') with vw' satisfying the required conditions. This proves the lemma. ◀

To sum up, we interpret the infinite supply as follows: M_1(ρ) is the lifted run with holes, M_2(ρ↓env) is the copy(env) run, and ρ' is obtained by running another copy that generates the new message. We note that run triplication is not strictly necessary for message duplication, but it makes the proof easier. Points (1) and (3) above refer to the relative ordering between env and dis timestamps, and (2) refers to the new message having an arbitrarily high timestamp.

We now introduce the timestamp abstraction, which is a building block for the simplified semantics. Let us call a message msg an env (dis) message if it is generated by an env (dis) thread. With
Withthe intuition that env messages can be replicated with arbitrarily high timestamps, while dis or initial messages cannot be, we distinguish the write timestamps of the two types ofmessages. Timestamp Abstraction.
If an env thread has read a message (x, d, vw) from a dis thread with timestamp ts = vw(x) and has generated a message msg on x, then copies of msg are available with arbitrarily high timestamps, at least as high as ts. To capture this in our abstraction, we assign the env message msg a timestamp ts⁺ that is, by definition, larger than ts. We define the set of timestamps in the simplified semantics as ℕ ⊎ ℕ⁺, where ℕ⁺ contains, for each ts ∈ ℕ, a timestamp ts⁺. The timestamps are equipped with the order ≼ in which ts⁺ is greater than ts and smaller than ts + 1:

0 ≺ 0⁺ ≺ 1 ≺ 1⁺ ≺ 2 ≺ . . .

Timestamps of the form ts ∈ ℕ are used for the stores of dis threads, while those of the form ts⁺ are used for the stores of env threads. We allow multiple stores with the same timestamp of the form ts⁺, while allowing at most one store per timestamp of the form ts. This abstracts the timestamps of multiple env messages between two dis messages by a single ts⁺ timestamp. Initial messages have timestamp 0 as usual. We utilize this timestamp abstraction by defining a simplified semantics; note that this simplification is not per se a simpler formulation, but rather is simple in the sense that it paves the way for efficient verification procedures (as detailed in Sections 5 and 6). We then show that a run ρ in the classical RA semantics has an equivalent run in the simplified semantics, where the timestamps are transformed according to some timestamp transformation M as defined above. Reachability is preserved across the two semantics since both the order and the consecution of timestamps are maintained.

RA semantics, simplified.
As in the classical RA semantics, the transition rules of the simplified semantics require us to increase timestamps (upon writing messages). We define the function raise(−) on ℕ ⊎ ℕ⁺ by raise(ts) = raise(ts⁺) = ts⁺ for ts ∈ ℕ. The definition of the simplified semantics replaces the domain
Time by P = ℕ ⊎ ℕ⁺. We use the term abstract to refer to the resulting views, messages, memory, local configurations, and configurations, and use a superscript de (shorthand for dis/env) to indicate that an element is abstract. So an abstract view is a function vw^de that maps shared variables to P. We now specify the transitions in the abstract semantics. Owing to their different nature (one is replicatable, the other is not), the dis and env threads have different transition rules in the simplified semantics.

For storing a value, the env threads use a rule (ST-local-env) that coincides with rule (ST-local) from the RA semantics (Figure 1), except that it replaces the relation <_x by <^env_x, defined as follows: vw <^env_x vw' iff raise(vw(x)) ≼ vw'(x) ∈ ℕ⁺ and vw'(y) = vw(y) for y ≠ x. Additionally, for stores of env threads, we no longer require the timestamp of the message to be unused, so we disregard the msg ∉ m check in the global (ST-global) rule (crucially, this is for env only). The dis threads use (ST-local) from the RA semantics without modifications, and hence choose a timestamp in ℕ, not a raised value.

For load instructions, we distinguish between messages generated by dis and env threads. This is a natural consequence of the different nature of the timestamps: ts for dis and ts⁺ for env messages. For loading a dis message, we use rule (LD-local) (Figure 1) from the RA semantics without changes. For loading from env messages, we introduce a new rule (LD-local-env). It is defined by replacing the join ⊔ in (LD-local) by ⊔^env_x. We drop the check on the order of timestamps (overwrite it by true); an env message may always be read, independently of the reading thread's view. The join depends on the variable x being read. To define vw ⊔^env_x vw', let vw be the view of the thread loading the message and vw' be the view in the message.
vw ⊔^env_x vw' = (vw[x ↦ raise(vw(x))]) ⊔ vw'

Thus, if vw(x) = 4 and vw'(x) = 2⁺, then (vw ⊔^env_x vw')(x) = 4⁺. The update to raise(vw(x)) ensures that if the timestamp on x was ts, it is now at least ts⁺, and hence the thread cannot read a (dis) message with timestamp ts again. We note that the above join operation is not commutative.

Now we consider the atomic operation (CAS-local), which can only be performed by dis. We have two cases, depending on whether (CAS-local) loads from a dis or an env message. If it is the latter, then the transition is identical to ((LD-local-env); (ST-local)), with the additional condition that the load and store timestamps must be ts⁺ and ts + 1 for some ts. If it is the former (load from dis), then the load and store timestamps must be ts and ts + 1. Consequently, there cannot be any messages with timestamp ts⁺. Conversely, if there is (at least one) message with timestamp ts⁺, then the (CAS-local) operation with load and store timestamps ts and ts + 1 is forbidden. We keep track of such 'blocked' intervals (ts, ts + 1) by adding a set B to the global state in the simplified semantics. The global and local transition relations of the full simplified semantics are given in Figures 4 and 5.

The simplified semantics exactly captures reachability of the original semantics. Define α^de to be a function which drops all views from messages and local configurations, and define =^de as equality of local configurations modulo views.
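The abstract timestamp domain and the env join can be sketched concretely. The encoding below is ours (not the paper's): we represent ts as the pair (ts, 0) and ts⁺ as (ts, 1), so that Python's tuple order realizes 0 ≺ 0⁺ ≺ 1 ≺ 1⁺ ≺ . . . :

```python
# Sketch of N ⊎ N+ with the order ≼, raise(-), and the env join ⊔env_x.
def nat(ts):            # a dis timestamp ts ∈ N
    return (ts, 0)

def plus(ts):           # an env timestamp ts+
    return (ts, 1)

def raise_ts(t):        # raise(ts) = raise(ts+) = ts+
    return (t[0], 1)

def join_env(vw, vw_msg, x):
    """vw ⊔env_x vw_msg = (vw[x -> raise(vw(x))]) ⊔ vw_msg (pointwise max)."""
    lifted = dict(vw)
    lifted[x] = raise_ts(vw[x])
    return {y: max(lifted[y], vw_msg[y]) for y in lifted}

assert nat(0) < plus(0) < nat(1) < plus(1) < nat(2)   # the order ≼
vw     = {"x": nat(4), "y": nat(1)}                   # vw(x) = 4
vw_msg = {"x": plus(2), "y": nat(0)}                  # vw'(x) = 2+
assert join_env(vw, vw_msg, "x")["x"] == plus(4)      # result 4+, as in the text
```

Note that the join is not commutative: only the loading thread's view is raised on the loaded variable.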
(LD-global): lcfm(t) = lcf, lcf —ld, msg→ lcf′, msg ∈ m entails (m, lcfm, B) —(t, msg)→ (m, lcfm[t ← lcf′], B)

(Unlabelled): lcfm(t) = lcf, lcf → lcf′ entails (m, lcfm, B) —t→ (m, lcfm[t ← lcf′], B)

(ST-global dis): t ∈ dis, lcfm(t) = lcf, lcf —st, msg, dis→ lcf′ entails (m, lcfm, B) —(t, msg)→ (m ∪ {msg}, lcfm[t ← lcf′], B)

(ST-global env): t ∈ env, lcfm(t) = lcf, lcf —st, msg, env→ lcf′, msg = (x, d, vw^de), vw^de(x) = ts⁺, ts ∉ B entails (m, lcfm, B) —(t, msg)→ (m ∪ {msg}, lcfm[t ← lcf′], B ∪ {ts⁺})

(CAS-global): lcfm(t) = lcf, lcf —cas, msg_r, msg_w→ lcf′, msg_r ∈ m, msg_w = (x, d, vw^de), vw^de(x) = ts + 1, and if msg_r(x) ∈ ℕ then ts⁺ ∉ B, entails (m, lcfm, B) —(t, msg_w)→ (m ∪ {msg_w}, lcfm[t ← lcf′], B ∪ {ts})

Figure 4
Simplified semantics. Global transition relation. B is a set of blocked timestamps. For an env thread making a store operation, the timestamp ts⁺ ∈ ℕ⁺ can be chosen only when ts has not been blocked (ts ∉ B). ts⁺ is added to B whenever an env thread makes a store operation adding a message (x, d, ts⁺). Likewise, when a dis thread makes a CAS operation loading from a message (x, d, vw) with vw(x) = ts ∈ ℕ, it must be checked that ts⁺ ∉ B, ensuring that there are no timestamps between ts and ts + 1. ts ∈ ℕ is added to B when a dis thread makes a CAS, loading from a message (x, d, vw) with vw(x) = ts ∈ ℕ.

▶ Theorem 5 (Soundness and Completeness). If a configuration cf is reachable under RA, then there is an abstract configuration cf^de reachable in the simplified semantics so that cf^de =^de α^de(cf). Conversely, if a configuration cf^de is reachable in the simplified semantics, then there is a configuration cf reachable under RA such that α^de(cf) =^de cf^de.

Proof.
At the outset, we note that the only component of the configuration that differs between the classical and simplified semantics is that of the timestamps, and hence the view map: vw in the concrete and vw^de in the abstract configuration. We now give a relation between these timestamps. With this relation in place, the formal equivalence between the semantics can be shown by a case analysis of the transitions that the threads can take. For quick intuition, consider the timestamps on a single shared variable x:

message:  init  disT  disT  envT  disT  envT  envT  disT
concrete:    0     1     2     3     4     5     6     7
abstract:    0     1     2    2⁺     3    3⁺    3⁺     4

(ST-local env) store operation for env: rv(r) = d, vw^de <^env_x vw′^de entails (x := r, rv, vw^de) —st, (x, d, vw′^de), env→ (skip, rv, vw′^de)

(ST-local) store operation for dis: rv(r) = d, vw^de <_x vw′^de, vw′^de(x) ∈ ℕ entails (x := r, rv, vw^de) —st, (x, d, vw′^de), dis→ (skip, rv, vw′^de)

(LD-local env) load from env messages: rv′ = rv[r ← d], vw′^de(x) ∈ ℕ⁺ entails (r := x, rv, vw^de) —ld, (x, d, vw′^de)→ (skip, rv′, vw^de ⊔^env_x vw′^de)

(LD-local) load from dis messages: rv′ = rv[r ← d], vw^de(x) ⪯ vw′^de(x) ∈ ℕ entails (r := x, rv, vw^de) —ld, (x, d, vw′^de)→ (skip, rv′, vw^de ⊔ vw′^de)

(CAS-local env) cas with load from env messages: rv(r₁) = d₁, rv(r₂) = d₂, vw′^de(x) = ts⁺, vw₁^de = vw^de ⊔^env_x vw′^de, vw″^de = vw₁^de[x ← ts + 1] entails (cas(x, r₁, r₂), rv, vw^de) —cas, (x, d₁, vw′^de), (x, d₂, vw″^de)→ (skip, rv, vw″^de)

(CAS-local) cas with load from dis messages: rv(r₁) = d₁, rv(r₂) = d₂, vw′^de(x) = ts ≥ vw^de(x), vw₁^de = vw^de ⊔ vw′^de, vw″^de = vw₁^de[x ← ts + 1] entails (cas(x, r₁, r₂), rv, vw^de) —cas, (x, d₁, vw′^de), (x, d₂, vw″^de)→ (skip, rv, vw″^de)

Figure 5
Simplified semantics. Thread-local transition relation. Margin annotations provide descriptions. The store rules refer to the thread type (dis/env) executing the instruction; the load rules refer to the thread type which generated the message that is being loaded (similarly for the load part of CAS operations, which can only be executed by dis threads). In rule (ST-local env), we use vw^de <^env_x vw′^de to mean raise(vw^de(x)) ⪯ vw′^de(x) ∈ ℕ⁺ and vw′^de(y) = vw^de(y) for all variables y ≠ x. In rule (LD-local env), vw^de ⊔^env_x vw′^de is defined as vw^de[x ← raise(vw^de(x))] ⊔ vw′^de. The join ⊔ always means an element-wise max over the relevant domain.

In this fashion, the ts⁺ values abstract the env timestamps between any two dis timestamps. We define the abstraction (similarly, concretization) function as the function that transforms all timestamps in the run as shown above. With the above timestamp abstraction/concretization in mind, we show that abstract and concrete configurations are equivalent in terms of reachability. We prove this by induction on the length of a run. We show that a concrete (similarly, abstract) configuration is reachable if and only if it has some abstraction (similarly, concretization) that is reachable.

Base Case. In the base case, equivalence is maintained as the initial concrete configuration is equivalent to its simplified configuration where all timestamps are 0. Recall that all timestamp transformations maintain 0 as a fixpoint. Hence the initial thread-local states and memory are equivalent for the concrete and abstract semantics.
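The timestamp abstraction illustrated by the table above can be sketched as a small function (a hypothetical helper for a single variable, not the paper's artifact): dis messages keep consecutive natural timestamps, and every env message collapses onto the raised timestamp of the preceding dis message.

```python
# Sketch: the timestamp abstraction on one variable. origins[0] is the init
# message (treated like a dis message); returned stamps are strings like
# '3' (a natural) or '3+' (the raised value 3⁺).

def abstract_timestamps(origins):
    """origins: list of 'dis' / 'env', the creators of the messages in
    increasing concrete-timestamp order."""
    out, last_dis = [], -1
    for o in origins:
        if o == "dis":
            last_dis += 1
            out.append(str(last_dis))
        else:  # env message: raised timestamp over the preceding dis message
            out.append(f"{last_dis}+")
    return out
```

On the run from the table (init, disT, disT, envT, disT, envT, envT, disT) this yields 0, 1, 2, 2⁺, 3, 3⁺, 3⁺, 4, matching the abstract row.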
Inductive Case - Concrete to Abstract
For the inductive case, assume that we have the result after n ∈ ℕ steps in a computation. Now we induct by considering cases over the type of the (n + 1)-th instruction in the computation. Silent: Silent (thread-local) instructions are handled trivially. They only change the thread-local state, identically for the concrete and abstract configurations.
Load: A load transition can be either from a dis or an env message. In both cases, we note that the timestamp abstraction maintains the relative order of timestamps (including equality). Hence whenever a concrete message is readable, so is the corresponding abstract message.
Store: This follows since the corresponding thread in the abstract configuration can simulate the store using the corresponding timestamp (ts⁺ ∈ ℕ⁺ in case of an env store, and ts ∈ ℕ in case of a dis store). Note once again that the abstraction preserves the order on the timestamps; consequently, a store is allowed in the abstract semantics if it was allowed in the concrete computation.

CAS: In this case, we note that the set B keeps track of which timestamps are allowed for CAS operations. If the CAS operation reads from an env message, the semantics follows from (LD-local env); (ST-local). However, if the CAS load is performed using the store of a dis thread, then this implies that there are no env timestamps between the load and store timestamps (ts, ts + 1) of the CAS (similar to two consecutive disT messages in the figure above). Consequently, we see that the set B in the abstract semantics does not contain the timestamp ts⁺ (ts⁺ is added to B the moment an env thread makes a store with the timestamp ts⁺, to disallow a CAS with load and store timestamps ts and ts + 1). Thus the equivalent CAS operation is also allowed under the abstract semantics.

Inductive Case - Abstract to Concrete

Silent: Silent instructions are handled trivially; they only change the thread-local state, identically for the concrete and abstract configurations.
Load: We consider two cases depending on whether the load happens from a dis thread or an env thread.

In the case where we load from a dis message, the semantics are equivalent between the abstract and concrete transitions, since we compare the timestamps ts⁺ ≺ ts + 1 and ts ≺ ts + 1 (see the rule (LD-local) in Figure 5). Given that the concretization function (like the abstraction) maintains the relative order between dis and env timestamps, the load is also feasible in the concrete semantics.

In the second case, the load is from an env message. By the inductive hypothesis, we have the concrete computation up to the load transition. In particular, the message (x, d, vw) we wish to load has already been generated in the concrete computation. To this concrete computation ρ obtained by the inductive hypothesis, we apply the infinite supply lemma with t∗ as the reading thread's local view on x to generate the computation M(ρ) . M(ρ↓env) . ρ with the fresh message (x, d, vw′). By point (2) in the lemma the message is loadable: vw′(x) ≥ µ_x(t∗). Note how we apply the timestamp lifting function to t∗, since the reading thread's new concrete timestamp has changed. Additionally, by points (1) and (3), the relative order of timestamps in vw′ on variables other than x remains the same w.r.t. the dis thread messages. This implies that after reading the message, the view of the reading thread will only increase on x; for all other variables it will remain the same, thus maintaining equivalence between the timestamps in the concrete and abstract run.

Store: The store transition for dis is identical to its concrete counterpart. For an env thread, we note that we generate copies of the abstract ts⁺ timestamp to get a sequence of concrete timestamps. Here we can generate an arbitrary number of copies, and hence the thread will always find a vacant timestamp for its store.

CAS: When a dis thread makes a CAS, it can either read from an env message or from the store of a dis thread.
In the latter case, let the timestamps of the load and store in the CAS be ts and ts + 1. Then in the abstract semantics we require that ts⁺ ∉ B. This implies that in the concrete semantics, too, there are no env timestamps between the load and store timestamps, and hence the CAS is possible in the concrete semantics as well. In the former case we again use the infinite supply lemma, as we did in the case of loads, to generate a loadable env message. ◀

This section discusses the safety verification problem for the class env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), consisting of a set of n distinguished dis threads executing loop-free programs in the presence of an unbounded number of env threads. We show that the safety verification problem for this class of systems can be decided in PSPACE by leveraging the simplified semantics from Section 4. We will assume that the domain
Dom is finite. In parallel, we demonstrate the ability to improve automatic verification techniques by showing how to encode the safety verification problem (of whether all assertions hold) into Datalog programs. The encoding is interesting for two reasons: (1) it yields a complexity upper bound that, given [1], came as a surprise; (2) it provides practical verification opportunities, considering that Datalog-based Horn-clause solvers are state-of-the-art in program verification [17, 18]. ▶
Theorem 6.
The safety verification problem for env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), n ∈ ℕ, is non-deterministic polynomial-time relative to the query evaluation problem in linear Datalog (NP^PSPACE), and hence is in PSPACE.

We note that the theorem mentions non-deterministic polynomial time relative to the linear Datalog oracle. We provide a non-deterministic polynomial-time procedure Algo that, given a verification instance, converts it to a Datalog problem P such that (1) for a 'yes' verification instance, at least one execution of Algo results in P having a successful query evaluation, and (2) for a 'no' verification instance, no execution of Algo leads to the resulting P having a successful query evaluation.

Linear Datalog is a syntactically restricted variant of Datalog for which query evaluation is easy to solve (PSPACE) at the cost of being inconvenient as an encoding target. Given that we show a
PSPACE upper bound on parameterized safety verification for the class env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), in principle we could have directly encoded the parameterized safety verification problem instance as a linear Datalog program. For convenience of encoding, we do not directly reduce safety verification to query evaluation in linear Datalog, but use an intermediate notion of Cache Datalog. To make the ideas behind our reduction clear, we proceed in three steps. We introduce Cache Datalog, which is Datalog with an additional parameter, called the Cache, that turns out to be decisive in controlling the complexity of encodings in the following sense: every Cache Datalog program can be turned into a linear Datalog program at a cost that is linear in the size of the program plus that of the Cache (Lemma 7). We then show that Algo generates Cache Datalog problems that satisfy the description from the previous paragraph (Lemma 8). Finally, we argue that for all Cache Datalog instances generated by Algo, a Cache of polynomial size is sufficient for query evaluation (Lemma 9). This shows Theorem 6.
Linearizing Datalog. A Datalog program Prog [24] consists of a predicate set Preds, a data domain Data, and a set Rules of rules (also called clauses). Each predicate comes with a fixed arity > 0. A predicate P of arity j is a mapping from Data^j to {true, false}. An atom P(t₁, …, t_j) consists of a predicate P and a list t₁, …, t_j of arguments, where each t_i is a term. A term is either a variable or a constant; a term is a ground term if it is a constant, and an atom is a ground atom if all its terms are constants. A positive literal is a positive atom P(t₁, …, t_j) and a negative literal is a negative atom ¬P(t₁, …, t_j); a ground literal is a ground atom. A rule has the form

head :− body₁, …, body_t

where head and the body_i are positive literals. A rule with one literal in the body is a linear rule; one without a body is called a fact. A linear Datalog program is one where all rules are linear or are facts. An instantiation of a rule is the result of replacing each occurrence of a variable in the rule by a constant. For any instantiation of a rule, if all ground atoms constituting the body are true, then the ground atom in the head can be inferred to be true. All instantiations of facts are trivially true. We write Prog ⊢ g to denote that the ground atom g can be inferred from program Prog.

Query Evaluation Problem.
The query evaluation problem for Datalog is, given a query instance (Prog, g) consisting of a Datalog program Prog and a ground atom g, to determine whether Prog ⊢ g. When studying the combined complexity, both Prog and g are given as input [65]. It is known [38] that the combined complexity of query evaluation for linear Datalog is in PSPACE, while allowing non-linear rules raises the complexity to NEXPTIME ([65] and [44]). Motivated by verification, there has been interest in linearizing Datalog [45].
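For intuition, the inference relation ⊢ can be realized by a naive bottom-up fixpoint. The following sketch is one minimal way to do it (its names and its convention that variables are capitalized strings are assumptions of this sketch, not the paper's):

```python
# Sketch of bottom-up Datalog inference. An atom is (pred, args); a rule is
# (head, body_list); a fact is a rule with an empty body.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(atom, fact, subst):
    """Unify a (possibly non-ground) atom with a ground fact, extending
    substitution subst; return the extended substitution or None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.get(a, f) != f:
                return None
            s[a] = f
        elif a != f:
            return None
    return s

def infers(rules, goal):
    """Return True iff the ground atom goal is derivable: Prog |- goal."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # all ways of grounding the body against already-derived facts
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in derived
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                g = (head[0], tuple(s.get(a, a) for a in head[1]))
                if g not in derived:
                    derived.add(g)
                    changed = True
    return goal in derived
```

The classic transitive-closure program (edge facts plus two path rules, the second of which is non-linear) derives exactly the reachable pairs.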
Adding Cache to Datalog: Cache Datalog. We introduce to Datalog the concept of a Cache. A Cache is a set of ground atoms that is used to control the inference process. The resulting program is called a Cache Datalog program. In the presence of a Cache, the semantics of Datalog is adapted by the following two rules.

Add: For an instantiated rule, the ground atom in the head can be inferred and added to the Cache only when all the ground atoms in the body are in the Cache.

Drop: Atoms in the Cache can be dropped non-deterministically.

The standard semantics of Datalog can be recovered by monotonically adding all inferred atoms (starting with facts) to the Cache and never dropping anything. To show the upper bound, we use a notion of inference that takes into account the size of the Cache and minimizes it. For a Cache Datalog program Prog and k ∈ ℕ, we write Prog ⊢_k g to mean that the ground atom g can be inferred from Prog with a computation in which |Cache| ≤ k, i.e., the number of atoms in the Cache is always at most k. The Cache size measures the complexity of linearizing Cache Datalog as follows.
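As an aside, for ground programs the bounded relation Prog ⊢_k g is itself directly executable: the Add/Drop semantics induces a finite search over cache states. A minimal sketch (hypothetical names; rules assumed already instantiated, which is an assumption of this sketch):

```python
# Sketch of Prog |-_k g for ground Cache Datalog programs. "Add" fires a
# rule whose body lies in the cache; "Drop" removes atoms
# non-deterministically; we explore all reachable cache states.

from collections import deque

def infers_with_cache(rules, goal, k):
    """rules: list of (head, frozenset(body)); facts have an empty body.
    Returns True iff goal can be inferred while |Cache| <= k throughout."""
    start = frozenset()
    seen = {start}
    queue = deque([start])
    while queue:
        cache = queue.popleft()
        # Add: fire any rule whose body is contained in the cache
        for head, body in rules:
            if body <= cache:
                nxt = cache | {head}
                if len(nxt) > k:
                    continue  # would exceed the cache bound
                if head == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Drop: non-deterministically remove any single atom
        for atom in cache:
            nxt = cache - {atom}
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For the chain a; b :− a; c :− a, b, the goal c needs a and b in the cache simultaneously, so it is inferable with k = 3 but not with k = 2.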
▶ Lemma 7. Given a Cache Datalog program Prog, a ground atom g, and a bound k, in time quadratic in |Prog| + |g| + k we can construct a linear Datalog program Prog′ so that Prog ⊢_k g iff Prog′ ⊢ g.

Proof.
To go from Cache Datalog to linear Datalog, the idea is to simulate the Cache using a new predicate CachePred of arity k in the constructed linear Datalog program Prog′. We know that a Cache of size k suffices in the Cache Datalog program, so any rule head :− body₁, …, body_p in the Cache Datalog program is such that p < k.

Simulating the Cache. Intuitively, the predicate CachePred(t₁, t₂, ⋯, t_k) represents that the terms t_i are members of the Cache. We can simulate the set Cache by reshuffling terms, using rules that swap the i-th and j-th elements:

CachePred(t₁, ⋯, t_j, ⋯, t_i, ⋯, t_k) :− CachePred(t₁, ⋯, t_i, ⋯, t_j, ⋯, t_k)

There are quadratically many such rules.

Rules. Consider a rule R with a body of size p in the Cache Datalog program:

head :− body₁, …, body_p

We convert this into a rule which matches the first p terms of CachePred with the elements of the body. If there is such a matching, the term head can be inferred and added into
Cache. This is simulated by replacing some term amongst the t_i with the term in the head while keeping the other terms the same:

CachePred(t₁, ⋯, t_i = head, ⋯, t_k) :− CachePred(t₁ = body₁, ⋯, t_p = body_p, t_{p+1}, ⋯, t_k)

There are k choices for the term to be replaced. Thus we have k new rules per rule in the original program.

Final Inference. Finally, since we know that each element of the Cache is true, we add the inference rules

t_i :− CachePred(t₁, t₂, ⋯, t_k) for 1 ≤ i ≤ k

Now, g can be generated if g ever enters the Cache, i.e., CachePred(t₁, t₂, …, g, …, t_k) holds for some other terms t_i. Then we can use the above inference rule to infer g. This shows that we need at most quadratically many rules, each with a single body literal, to give us a linear Datalog program. ◀

Theorem 5 tells us that safety verification under RA is equivalent to safety verification in the simplified semantics. Safety verification in the simplified semantics, in turn, can be reduced to the
Message Generation (MG) problem. Given a parametrized system c and a message msg = (x∗, d∗, _), called the goal message, does there exist a reachable configuration cf^de = (m^de, lcfm^de) such that msg ∈ m^de (for some vw^de)?

To see the connection between MG and safety verification, note that we can replace each assert false statement in the program by x∗ := d∗ for a variable x∗ and value d∗ unused elsewhere. The system is unsafe if and only if a goal message msg = (x∗, d∗, vw^de) is generated for some vw^de. While encoding into Datalog, we non-deterministically guess vw^de. For this, we crucially show that there are only exponentially many choices of vw^de which need to be enumerated. Henceforth we assume that the queried goal message msg can have an arbitrary vw^de. Given c and msg, our non-deterministic polynomial-time procedure Algo satisfies the following, the proof of which is in Section 5.1.2.

▶ Lemma 8.
Given a parametrized system c and a goal message msg, Message Generation (MG) holds iff there is some execution of Algo that generates a query instance (Prog, g) such that Prog ⊢ g. The construction of Prog and g is in (non-deterministic) time polynomial in |c|.

The procedure Algo generates one query instance (Prog, g) per execution. We postpone the full description of Algo and first give some intuition. Since the parameterized system consists of n loop-free dis threads, each can execute only linearly many instructions in its size. The total number of instructions executed (and hence the total number of timestamps used) by the dis threads is polynomial in |c_dis|, the combined size of the dis programs (concretely, the sum of the sizes of the individual c^i_dis programs). Algo guesses the dis threads' part of the computation and generates a query instance (Prog, g).

Prog itself uses four main predicates. The environment message predicate emp(x, d, vw^de) represents the availability of an env message on variable x with value d and view vw^de. The environment thread predicate etp(lc, rv, vw^de) encodes the env thread configuration, where lc is the control state, rv is the register valuation, and vw^de is the thread view. We also have similar message and thread predicates for dis threads. The distinguished message predicate dmp(x, d, vw^de) represents the availability of a dis message. Additionally, for each dis thread i ∈ [n], we have a distinguished thread predicate dtp[i](lc, rv, vw^de) that encodes the configurations of dis[i].

In the set of rules, we have the fact dmp(x, d_init, vw^de_init) for each x ∈ Var, with d_init the initial value and vw^de_init the initial view. We also have (i) facts etp(λ_init, rv_init, vw^de_init) and dtp[i](λ_init, rv_init, vw^de_init) representing the initial states of both env and dis threads, and (ii) rules corresponding to the env transitions and the guessed dis thread run fragments.
Finally, the query atom g is a ground atom over one of the predicates emp or dmp and captures the goal message msg being generated. The instances generated in the non-deterministic branches of Algo differ only due to the guessed dis run and the atom g. We now describe the full Datalog program, also proving Lemma 8.

Algo for query instance generation. We discuss the details of the procedure Algo, which generates the query instance (Prog, g) non-deterministically. We use the following predicates in the constructed Datalog program:

emp(msg): the message generation predicate for env threads, where msg is a message;
etp(lc, rv, vw^de): the thread state predicate for env threads;
dmp(msg): the message generation predicate for dis threads, where msg is a message;
dtp[i](lc, rv, vw^de): the thread state predicate, one for each dis thread;
avail(x, ts⁺): the timestamp availability predicate, per variable, which indicates that a timestamp ts⁺ is not blocked by a CAS operation.

The Datalog program generated has two parts: one does not depend on the non-deterministic choices made by Algo, while the other does. We describe the former part first; these rules for the Datalog program are in Figure 6. The second set of rules, depending on the non-deterministic choices of Algo, is in Figure 7.

The first set of rules in the Datalog program (Figure 6). The facts, in green, provide the ground terms for the init messages as well as the initial states of the dis and env threads. The orange rules capture the thread-local transitions of the env threads. We deviate a bit from the standard notation for programs here and instead view them as labelled transition systems; it is easy to see that the two notions are equivalent. The initial state labels are λ^env_init for the env threads and λ^i_init for the dis threads. For a pair of labels, we write λ —i→ λ′ to denote that λ′ can be reached from λ by executing i. In the Datalog program, we have a rule for each such transition in the program.
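As an illustration of this per-transition rule generation, the following sketch emits one Datalog-style rule string per env transition, in the spirit of Figure 6a. The helper name and the string format are this sketch's own, and only the two silent instruction shapes are covered:

```python
# Sketch (hypothetical helper): one rule per transition λ --i--> λ'.
# RV and VW stand for the register-valuation and view arguments.

def env_rules(transitions):
    """transitions: list of (src, instr, dst) with instr = ('skip',) or
    ('assign', register, expression). Returns textual rules."""
    rules = []
    for src, instr, dst in transitions:
        if instr[0] == "skip":
            # silent step: control state changes, registers and view do not
            rules.append(f"etp({dst},RV,VW) :- etp({src},RV,VW)")
        elif instr[0] == "assign":
            _, r, e = instr
            # local assignment: update register r with the evaluated expression
            rules.append(f"etp({dst},RV[{r}<-{e}(RV)],VW) :- etp({src},RV,VW)")
    return rules
```

Loads and stores would additionally place emp/dmp atoms in the bodies and heads, as in the violet and pink rules of Figure 6a.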
The thread-local transitions are in orange. Loads are in violet (the first rule corresponding to loads from env messages, the second to loads from dis messages). For loads, the rule requires an atom with the message predicate (from which the thread is reading) in the body of the rule. Stores are in pink: the first rule corresponds to the new thread-local state after execution of the store; the second rule corresponds to the generation of an atom for the new message (in the head). Though we use some higher-order syntax for rules, such as assume, ⊔, and <^env_x, we note that these can be easily translated to pure Datalog with small overhead, given the polynomial size of the domain and the constant arity of the predicates.

These rules capture completely the env thread component of the run. As we had mentioned earlier, the component of the query instance that differs due to the non-determinism of Algo is

rule | condition on program of env threads, c_env
etp(λ′, rv, vw^de) :− etp(λ, rv, vw^de) | if λ —skip→ λ′
etp(λ′, rv, vw^de) :− etp(λ, rv, vw^de), with [[e]](rv(r̄)) ≠ 0 | if λ —assume e(r̄)→ λ′
etp(λ′, rv[r ← e(r̄)], vw^de) :− etp(λ, rv, vw^de) | if λ —r := e(r̄)→ λ′
etp(λ′, rv[r ← d], vw^de ⊔^env_x vw′^de) :− etp(λ, rv, vw^de), emp(x, d, vw′^de) | if λ —r := x→ λ′
etp(λ′, rv[r ← d], vw^de ⊔ vw′^de) :− etp(λ, rv, vw^de), dmp(x, d, vw′^de), vw^de <_x vw′^de | if λ —r := x→ λ′
etp(λ′, rv, vw′^de) :− etp(λ, rv, vw^de), avail(x, vw′^de(x)) | if λ —x := r→ λ′, with vw^de <^env_x vw′^de
emp(x, rv(r), vw′^de) :− etp(λ, rv, vw^de), avail(x, vw′^de(x)) | if ∃λ′. λ —x := r→ λ′, with vw^de <^env_x vw′^de

(a) The (fixed) set of rules in the Datalog program encoding the transition system of the env threads. Silent transitions (in orange); memory accesses: loads (in violet) and stores (in pink).
fact | comment
dmp(x, d_init, vw^de_init) :− | for all variables x
etp(λ^env_init, rv_init, vw^de_init) :− | λ^env_init is the initial state of the env threads
dtp[i](λ^i_init, rv_init, vw^de_init) :− | λ^i_init is the initial state of the dis[i] thread

(b) First set of facts in the Datalog program; these do not depend on the non-deterministic guess made by Algo for the computation of dis threads. These facts encode the initial configurations of the threads and the initial messages.

Figure 6
First set of rules for the Datalog program. This fixed rule set is independent of the non-determinism of Algo.

the dis part of the run. Essentially, Algo guesses in polynomial time the executions of all the dis threads. This is possible since they are loop-free, and hence execution lengths are linear in the size of their specifications. We now describe this second part of the Datalog query instance.

Second set of rules in the Datalog program (Figure 7). We have a bound on the number of write timestamps that can be used by the dis threads: an easy bound is the combined number of instructions in the dis threads, |c_dis|. We will refer to this bound as T. By the simplified semantics, it suffices to consider the timestamps {0, 0⁺, · · · , T, T⁺}. This follows since the dis threads perform at most T writes; hence we need only T timestamps of the form ℕ. Additionally, we have only one timestamp of the form ℕ⁺ between any two timestamps of the form ℕ. This shows that the view terms in the predicates of the Datalog program can be guessed in polynomial space (since T is polynomial in the input).

Now for each dis thread i, the procedure Algo non-deterministically guesses the computation ρ_i for dis[i]. That is, Algo guesses the timestamps and the register valuations of dis_i at each configuration in this run, along with the messages dis_i loaded from. After this, it converts ρ_i to a set of rules which are then added to the earlier set from Figure 6.

Consider the computation ρ_i ≡ λ^i_init —i₁→ λ₁ —i₂→ λ₂ · · · —i_{|ρ_i|}→ λ_{|ρ_i|} of length |ρ_i|. Let the view of dis thread i at point j in the run be given as vw^de_j. Additionally, if i_j is a load instruction, Algo also guesses the message that was read by the dis thread i. Each instruction i_j in this computation is then converted into one amongst the rules in Figure 7a, depending on the instruction i_j executed, represented in the figure as 'condition'.
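The bound on the abstract timestamp universe can be spelled out with a quick sketch (hypothetical helpers, not the paper's artifact): the domain {0, 0⁺, …, T, T⁺} has 2(T + 1) elements per variable, so a single view is polynomial-sized even though the number of views is exponential in the number of variables.

```python
# Sketch: the abstract timestamp universe for guessed dis computations.

def timestamp_domain(T):
    """Enumerate the abstract timestamps 0, 0⁺, ..., T, T⁺ ('+' marks N⁺)."""
    dom = []
    for ts in range(T + 1):
        dom.append(str(ts))
        dom.append(f"{ts}+")
    return dom

def views_count(T, num_vars):
    """Number of abstract views: (2(T+1))^|Var|, exponential in |Var|, but a
    single view needs only polynomially many bits to write down."""
    return (2 * (T + 1)) ** num_vars
```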
Additionally, to encode the ℕ⁺ timestamps that have not been occupied by CAS operations (and hence are free for use by env stores), we have the rule in Figure 7b.

rule | condition on thread transition i_j of computation ρ_i for thread dis_i
dtp[i](λ_j, rv, vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = skip
dtp[i](λ_j, rv, vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = assume e(r̄) ∧ [[e]](rv(r̄)) ≠ 0
dtp[i](λ_j, rv[r ← e(r̄)], vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = r := e(r̄)
dtp[i](λ_j, rv[r ← d], vw^de ⊔^env_x vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) | i_j = r := x ∧ thread loads msg = (x, d, vw′^de) ∧ vw′^de(x) ∈ ℕ⁺
dtp[i](λ_j, rv[r ← d], vw^de ⊔ vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) | i_j = r := x ∧ thread loads msg = (x, d, vw′^de) ∧ vw′^de(x) ∈ ℕ ∧ vw^de <_x vw′^de
dtp[i](λ_j, rv, vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de) and dmp(x, rv(r), vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = x := r ∧ thread stores msg = (x, rv(r), vw′^de) ∧ vw^de <_x vw′^de
dtp[i](λ_j, rv, vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) and dmp(x, rv(r), vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) | i_j = cas(x, d, r) ∧ vw′^de(x) = ts⁺ ∈ ℕ⁺, vw₁^de = vw^de ⊔^env_x vw′^de, vw″^de = vw₁^de[x ← ts + 1]
dtp[i](λ_j, rv, vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) and dmp(x, rv(r), vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) | i_j = cas(x, d, r) ∧ vw^de(x) ≤ ts = vw′^de(x), vw₁^de = vw^de ⊔ vw′^de, vw″^de = vw₁^de[x ← ts + 1]

(a) These rules are chosen depending upon the non-deterministic choice made by Algo of the computation ρ_i of thread i. Each instruction i_j executed in ρ_i is then mapped to one of the rules above, depending upon which condition (right column) is satisfied.
Rules for silent i_j (in orange); memory accesses: loads (in violet) and stores (in pink), and CAS (in gray). The second pink rule corresponds to message generation by the thread i executing a store instruction. The first two CAS rules correspond to the case where the load is from an env message, and the last two correspond to a load from a dis message. In each case, the first rule is the thread-local state change rule, while the second rule generates the ground atom corresponding to the message generated by the CAS operation.

fact | condition on availability of timestamps
avail(x, ts⁺) :− | ts ∈ {0, · · · , T} ∧ no dis thread performs a cas operation with timestamps (ts, ts + 1)

(b) This fact corresponds to the availability of an ℕ⁺ timestamp for stores by env threads, which is known once all the dis computations have been guessed. These facts are not generated on a per-dis-thread basis, but rather once the computations ρ_i for all dis threads have been non-deterministically guessed. Referring to the simplified semantics, this rule captures ts ∉ B, the fact that there is no CAS operation with timestamps (ts, ts + 1). Note that the avail predicate plays a role in inferring the env thread state and message predicates, as seen in the last two rows in Figure 6: we can infer the env thread state and message predicates with a view vw′(x) only when the respective timestamp is not blocked. This in turn is used in the first CAS rule (first gray row, Figure 7(a)) when loads happen from an env thread: the merged view (vw^de ⊔^env_x vw′^de)(x) is ts⁺, and the new timestamp after the CAS is ts + 1. Note that this is possible since (i) if the timestamp of the env message for x from which we load was ts⁺, then there is no dis thread with a timestamp ts for x, and (ii) if the timestamp of the env message from which we load was ≺ ts⁺, then the timestamp of the dis thread performing the CAS was ts for x. In both cases, the timestamp after the CAS will be ts + 1.

Figure 7
Second set of rules for the Datalog program. This rule set depends on the non-deterministic choices made by Algo for the computations of the dis threads.

Since we have the polynomial bound on T, it is easy to see that the rules above for the run ρ_i executed by each dis thread i can be generated in polynomial time after non-deterministically guessing ρ_i. These (non-determinism dependent) rules, along with the rules from Figure 6, together form the complete Datalog program.

(Inference in the Datalog program ↔ Computations in the Simplified Semantics) and proof of Lemma 8
Now we see how an inference process in the (complete) Datalog program corresponds to a computation in the simplified semantics. To do this, we give invariants which relate the inference of atoms in the Datalog program with the existence of events in the computation.
These invariants together imply the equivalence between an inference sequence in the Datalog program and a computation of c under the simplified semantics. Finally, if the goal message msg is reachable at the end of a computation ρ of c, then, thanks to the invariants, we can also infer the ground term g, being dmp(msg) or emp(msg) in the Datalog program depending on whether the goal message was generated by a dis thread or an env thread in the computation.

env thread-local state invariant. The ground atom etp(λ, rv, vw^de) can be inferred iff some env thread can reach the local configuration lcf = (λ, rv, vw^de). This says that some env thread is able to reach the state from its transition system with label λ such that the thread-local view and the register valuation at that time are vw^de and rv respectively. We can prove that this holds by induction on the length of the run, noting from the Datalog rules (Figure 6) that there is a rule corresponding to each transition λ --i--> λ′. Additionally, the load rules (in violet) require that the corresponding message atoms (emp/dmp) hold, which, as we will see below, implies that the corresponding message can be generated in the memory.

env thread message invariant. The ground atom emp(x, d, vw^de) can be inferred iff the corresponding message msg = (x, d, vw^de) can be generated in the simplified semantics by some env thread. Note that a ground atom of the form emp(x, d, vw^de) can only be inferred using the last rule in Figure 6. The body of this rule contains the term etp(λ, rv, vw^de). This, if true, implies that some env thread can reach the corresponding thread-local state by the first invariant (above). An env thread in this thread-local state can generate the message (x, rv(r), vw^de), since there is an outgoing transition from λ with instruction x := r.
Note that we check the existence of the transition to ensure that the message can indeed be generated; this check is required for the last rule to exist in the program.

dis thread-local state invariant. For each dis thread i, we have the following invariant. The ground atom dtp[i](λ, rv, vw^de) can be inferred iff the dis thread i can reach the local configuration lcf = (λ, rv, vw^de). This is just the dis analog of the first invariant for env threads. It can also be proved by induction on the length of the run (or the inference sequence). Analogous to the invariant for emp, we also have an invariant for dis messages.

dis thread message invariant. The ground atom dmp(x, d, vw^de) can be inferred iff the corresponding (dis) message msg = (x, d, vw^de) can be generated in the simplified semantics by some dis thread.

avail timestamp availability invariant. If the fact avail(x, ts⁺) is in the Datalog program, then ts ∉ B throughout the computation ρ of the simplified semantics.

The base case for the message predicates holds since the facts dmp(x, d_init, vw^de_init) for all variables are given in the Datalog program. The base case for the thread-state predicates holds due to the fact etp(λ_init, rv_init, vw^de_init), which captures the initial state of a thread. The inductive steps can be formally proved by considering a computation ρ under the simplified semantics and mapping each transition in ρ to an inference step in the Datalog program. For the converse, we assume an inference sequence (a sequence of invocations of the rules) and, for each rule invoked to infer a new ground atom, show that a corresponding transition can be taken by a thread in the simplified semantics so that the invariants are maintained.
This, in turn, is done by taking cases on the next instruction to be executed.

The equivalence between transitions in a computation ρ of the simplified semantics and the application of rules/facts in the Datalog program, with reachability of some message msg in ρ iff the corresponding ground term dmp(msg) or emp(msg) is inferred in Datalog, is sufficient to prove Lemma 8. In particular, the generation of msg^de in some computation ρ of c gives a sub-computation ρ↓dis performed by the dis threads. We consider the Datalog query instance (Prog, g) generated when Algo correctly guesses ρ↓dis. By the message generation invariant, the ground atom dmp(msg^de) or emp(msg^de) corresponding to msg^de can be inferred, Prog ⊢ g, giving the forward direction of the lemma. For the reverse direction, we note that Prog ⊢ emp(msg^de) or Prog ⊢ dmp(msg^de) immediately implies that the message can be generated in some computation of the system c (the dis computation is already determined in the guessed program Prog).

Cache Size
Having described the encoding(s), the challenge now is to provide a polynomial bound on the cache size for the query instances generated by Algo. The Cache behaves like a memoized set of atoms which are used for the inference process. The reason why a polynomial-sized Cache suffices is that we can "forget" (remove from Cache) previously inferred atoms when they are not being actively used. We use this crucially in the context of the env predicates emp, etp. Technically, this is possible since the arbitrary replication property of env threads allows us to "forget" the state of the previously simulated env thread and simulate a fresh copy instead. Let Q = 2|Dom||Var| + |c_dis|. We show that a Cache of size O(Q) is sufficient to infer g.

▶ Lemma 9. For each (Prog, g) generated by Algo, Prog ⊢ g if and only if Prog ⊢_k g with k ∈ O(Q).

An inference sequence performed on Prog corresponds to a computation of the parameterized system c in the simplified semantics (Section 5.1.2). Hence, to see that the above size of Cache is sufficient, we analyze the structure of computations in the simplified semantics. The analysis will reveal a dependency relation among the messages generated. We will see that this gives enough information to guide the Datalog computation so as to use a small-sized Cache.

Consider a computation ρ^de ending in the configuration last(ρ^de) = (m^de, lcfm^de). For every message msg^de in m^de, we define genthread(msg^de) as the first thread which added msg^de to the memory m^de. (Recall that the simplified semantics admits repeated insertions of env messages due to the reuse of timestamps from N⁺.) We define depend(msg^de) as the set of messages which genthread(msg^de) reads from before generating the first instance of msg^de. We define the notion of a dependency graph for a computation ρ^de.

▶ Definition 10.
The dependency graph of a computation ρ^de with last(ρ^de) = (m^de, lcfm^de) is the directed graph G_{ρ^de} = (V, E) whose vertices V = m^de are the messages in the final configuration and whose edges reflect the dependencies: (msg₁^de, msg₂^de) ∈ E if msg₁^de ∈ depend(msg₂^de).

As depend(−) is based on the linear order of the computation, the dependency graph is acyclic. The acyclicity of dependency graphs follows immediately from the definition of depend: if there were a cycle, then all the threads involved in the cycle would depend on each other for the first generation of the respective messages, causing a deadlock. We denote the sets of sink and source vertices of G by sink(G) resp. source(G). A path in G is also called a dependency sequence. A path or dependency sequence m₁ → m₂ → … → m_{n−1} → m_n thus says that m₁ was read by some thread which generated m₂, m₂ in turn was read by a thread which generated m₃, and so on, until the thread which generated m_n read m_{n−1}. Given such a sequence, we say m_i is an ancestor of m_j if i < j. The height of a vertex v is the length of a longest path from a source vertex to v. The maximal height over all vertices is height(G). See Figure 8 for an example.

Figure 8 Two possible dependency graphs for the code snippet with initial messages x = 0 = y and env threads T₁: y.load(0); y.load(1); x.store(1); y.store(2) and T₂: x.load(0); y.store(1); x.load(1); y.store(2). The color of each message msg^de signifies genthread(msg^de) (T₁ orange, T₂ violet, init gray). We denote the view as a vector (t_x, t_y). Since we only consider the thread adding a message for the first time, genthread((y, 2, −)) can be either T₁ (left graph) or T₂ (right graph).

Compact Computations. Unfortunately, dependency graphs may contain exponentially many vertices (due to the views), and given the PSPACE-hardness in Section 7 there is no way to reduce this to polynomial size. Yet, there are two parameters that we can reduce: the 'fan-in' of each vertex v (the number of messages read by genthread(v) before generating v) and the 'height' of the dependency graph (the length of the longest dependency sequence). A computation ρ^de is compact if its dependency graph G_{ρ^de} satisfies the following two bounds. (1) Every message v depends on a small number of other messages, |depend(v)| ≤ Q. (2) The dependency sequences are polynomially long, that is, height(G_{ρ^de}) ≤ Q. The following lemma says that compact computations are sufficient:

▶ Lemma 11.
Any message that can be generated in the simplified semantics can be generated by a compact computation.
Proof.
We prove both parts (fan-in and height) of this lemma by showing that if there exists a computation whose dependency graph violates the bound for fan-in (similarly height), then there must exist a computation whose dependency graph has a lower fan-in (height), with the rest of the graph (fan-ins of other vertices) unchanged. We first show this for fan-in. We assume that the programs c_dis executed by dis threads have been specified as transition systems (note that we can interconvert between the while-language and transition-system representations with only polynomial blowup). Then |c_dis| is an upper bound on the total number of transitions of all dis threads together.

Fan-in. Suppose, to the contrary, that |depend(v)| > |Dom||Var| + |c_dis| for some message v. Consider the thread p = genthread(v) which generated the message represented by vertex v for the first time. There are only |Dom||Var| distinct (variable, value) pairs, |Dom||Var| many init messages, and only |c_dis| many dis messages (|c_dis| is an upper bound on the number of transitions the dis threads can take). Hence, by a pigeonhole argument, p must have read two env messages with the same (variable, value) pair but distinct abstract views. Let these messages be m₁ = (x, d, vw₁^de) and m₂ = (x, d, vw₂^de), where the abstract views are unequal. Without loss of generality, assume that p = genthread(v) read m₁ first and m₂ later (in this order) before it generated v.

It can be seen that any time p read m₂, it could have read m₁ instead. This follows since timestamp comparisons are irrelevant when reading from env messages. The thread-local view obtained on replacing a read of m₂ with that of m₁ will only decrease or remain the same. From the simplified semantics, after reading m₁ once, the thread view vw^de satisfies vw^de ⊒ vw₁^de (per variable). Hence reading from m₁ again leaves the thread view at vw^de ⊔ₓᵉⁿᵛ vw₁^de = vw^de, while after reading m₂ the view would be vw^de ⊔ₓᵉⁿᵛ vw₂^de, which is at least vw^de. Indeed, instead of reading from m₂, the loading thread can read from m₁, resulting in a lower view for x (compared to reading from m₂).

Let ρ₁ denote the sub-computation starting from the position right after reading from the env message m₂. We can see that if we replace this read operation by a read from m₁, we can continue with ρ₁ as before. Indeed, all store operations in ρ₁ are independent of this load from m₂ (or m₁). Consider a load operation along ρ₁. A load on a variable y ≠ x is clearly not affected. Consider now a load on x performed by loading some message m₃, say by thread p. The view of x along ρ₁ for thread p was at least that given by m₂; if loading from m₃ was possible in ρ₁ when the view on x was at least vw₂^de(x), it definitely is possible now with a lower view on x. Lastly, consider a CAS operation on variable x along ρ₁, and assume the load was made from m₂ with vw₂^de(x) = ts⁺. The CAS operation then adds a new message on x with timestamp ts+1. The same thread can still perform the CAS by reading from m₁, with vw₁^de(x) ≼ ts⁺: as argued for the CAS rules of Figure 7, in both cases the timestamp after the CAS is ts+1. Hence reading from m₁ instead of m₂ does not affect the sub-computation ρ₁. Thus we can eliminate all reads of m₂ to decrease |depend(v)|, and so |depend(v)| ≤ |Dom||Var| + |c_dis| for each vertex v.

Height. Let there be a dependency sequence of length greater than 2|Dom||Var| + |c_dis|. There are only |Dom||Var| (variable, value) pairs, |Dom||Var| many init messages, and at most |c_dis| many dis messages. Hence, by a pigeonhole argument, for a dependency sequence longer than 2|Dom||Var| + |c_dis| there exists a (variable, value) pair (x, d) such that there are two env messages m = (x, d, vw₁^de) and n = (x, d, vw₂^de) along it. Without loss of generality, let n be an ancestor of m. So n has been read before generating m. Then we must have vw₁^de ⊒ vw₂^de by the RA semantics (since the thread generating m indirectly accumulates the view of n). Then the thread reading from (depending on) m could have directly read from n instead (note that since m itself depends on n, by the time m has been generated, n must have been as well). By reading from n, its view may only decrease or remain the same, thus not affecting the run (as justified above). Thus we can eventually shorten the dependency sequences so that all have length at most 2|Dom||Var| + |c_dis|. This gives us the result. ◀

In Cache
Datalog, the inference of an atom g from the program Prog involves a sequence of applications of the Add (to Cache) and Drop (from Cache) rules that ends with g being inferred. Such a sequence for Prog ⊢ g corresponds to a run ρ^de under the simplified RA semantics; we show that this follows from the structure of the query instance (Prog, g). The run ρ^de can be compacted to ρ̄^de by Lemma 11. From the dependency graph of ρ̄^de we can read off an inference strategy that keeps the Cache size polynomial in |Var|, |Dom| and |c_dis|. The following lemma formalizes this argument and so proves Lemma 9. This lemma, together with Lemma 11, gives Lemma 9 and leads to the coveted PSPACE bound. Since the term 2|Dom||Var| + |c_dis| will occur repeatedly, we denote it by the quantity Q. From here on, Q = 2|Dom||Var| + |c_dis|.

▶ Lemma 12 (Datalog Inference Strategy). Let Algo generate the query instance (Prog, g). The inference for Prog ⊢ g implies the existence of an execution ρ^de under the simplified semantics, which can be compacted to ρ̄^de. The computation ρ̄^de can be mapped back to a new inference sequence such that Prog ⊢_k g for k ∈ O(Q).

Proof.
This lemma has two parts: (1) it states that computations in the simplified semantics and inference sequences in the Cache-Datalog program are related, and (2) it says that compact computations can be mapped to an inference sequence with a small Cache size.

Let (Prog, g) be generated by the procedure Algo with Prog ⊢ g. We need to show that g can also be inferred from Prog with a small Cache. Recall that when generating the Datalog program Prog, the procedure Algo guesses the computations of the dis processes. Consider some inference sequence for Prog ⊢ g. For each application of an inference rule in the sequence, we can find a corresponding transition of a thread in the simplified semantics. This follows from the invariants in Section 5.1.2. Hence we can convert the sequence of inferences into a run ρ^de. This run in turn can be compacted by the arguments of Lemma 11 to get a smaller run ρ̄^de. Now we need to see how this compact run implies the existence of an inference sequence with a smaller Cache. To do this, we consider the dependency graph of ρ̄^de.

We proceed by induction on the height of messages in the dependency graph. We strengthen the statement and show that for every message msg^de at height h = height(msg^de), we have Prog ⊢_k emp(msg^de) (resp. Prog ⊢_k dmp(msg^de)) for k = h · Q. The lemma follows by the definition of compactness, which guarantees h ≤ height(G_{ρ̄^de}) ≤ Q.

The base case is trivial, since all messages in sink(G_{ρ̄^de}) are facts in the Datalog program Prog. We now show the inductive case for a message v ∈ G_{ρ̄^de} at height h + 1. The messages v′ in depend(v) have height at most h. The inductive hypothesis thus yields Prog ⊢_{hQ} v′. We infer these messages one at a time, store them in the Cache, and discard all atoms in the Cache used for the inference of the v′. Hence, at each step in the inference sequence, the Cache contains a subset of depend(v) which has already been inferred and, additionally, some atoms which are currently being used for the inference of the next member of depend(v). The former is bounded by Q by compactness (Lemma 11, |depend(v)| ≤ Q), while the latter is bounded by hQ by the induction hypothesis. Thus, by reusing the space in the Cache to infer members of depend(v), we only require an additional space of hQ, and the space consumption is at most

(Q − 1) [bound on |depend(v)|] + hQ [inductive hypothesis for the next atom at height h] = (h + 1)Q − 1.

At the end of this process, the size of the Cache equals |depend(v)|, and we are ready to infer v, having inferred and inserted into the Cache the atoms corresponding to messages from depend(v). This inference of v from the messages in depend(v) requires us to simulate the run of genthread(v) using the rules of the Datalog program (by mapping each transition executed by genthread(v) to its corresponding rule). We note that at all points in the simulation it suffices to store exactly one extra atom, either of etp or of dtp (depending on the type of genthread(v)), corresponding to the local state of genthread(v). The additional atom can be accommodated along with depend(v) since |depend(v)| + 1 ≤ (h + 1)Q (as |depend(v)| ≤ Q). Hence a Cache of size at most 2(h + 1)(|Dom||Var| + |c_dis|) is sufficient, and by induction the lemma follows. ◀

Lemma 11 along with the compact inference sequences of Lemma 12 together show that, for all the query instances generated by Algo, inference is possible iff it is possible with a small Cache. This shows Lemma 9, giving us PSPACE-membership.
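The inference strategy of the proof above can be sketched as a small recursive procedure. This is an illustrative sketch under our own naming (`infer`, the `rules` encoding as a map from ground heads to alternative bodies, and the `stats` bookkeeping are assumptions, not the paper's algorithm); it assumes the ground rules are acyclic, as guaranteed by the dependency graph.

```python
def infer(atom, facts, rules, cache, stats):
    """Derive `atom` from ground `facts` and `rules` (head -> list of bodies).
    Body atoms are Added to the Cache while a rule fires and Dropped once the
    head has been derived, mirroring the strategy in the proof of Lemma 12.
    Assumes the rule dependencies are acyclic."""
    if atom in facts or atom in cache:
        return True
    for body in rules.get(atom, []):
        added, ok = [], True
        for b in body:
            if b in facts or b in cache:
                continue
            if infer(b, facts, rules, cache, stats):
                cache.add(b)                 # Add: keep the inferred dependency
                added.append(b)
                stats["peak"] = max(stats["peak"], len(cache))
            else:
                ok = False
                break
        for b in added:
            cache.discard(b)                 # Drop: scaffolding no longer needed
        if ok:
            return True
    return False
```

On a chain of alternating etp/emp rules the peak cache size stays at one atom regardless of chain length, while for a rule with a large body the peak tracks the size of depend(v) plus the recursion, matching the (h + 1)Q accounting above.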
In this section our goal is to support compositional verification methods prominent in program logics and thread-modular reasoning style algorithmic verification. Such approaches focus on a single thread and study its interaction with others.

We extend the system from Section 5 by adding a single distinguished 'ego' thread, which we refer to as the leader, denoted by the symbol ldr. Amongst the n dis threads, only the ldr can execute loops, while the others, as in Section 5, are required to be loop-free. The environment once again consists of arbitrarily many identical env threads that are required to be CAS-free. We can represent this as env(nocas) ∥ dis₁ ∥ dis₂(acyc) ∥ ⋯ ∥ disₙ(acyc), which we refer to as the leader setting.

Note that the simplified semantics presented in Section 4 applies here. This allows us to leverage Theorem 5, by which we can operate on the simplified semantics instead. The main challenge of this section, then, is to go from the simplified semantics in the presence of a leader to an NEXPTIME verification technique, by means of a small-model argument.
As discussed before, the safety verification problem amounts to solving the message generation problem (MG) (Section 5.1). Let the goal message be denoted msg. We demonstrate that the simplified semantics helps solve the problem. Our main finding is that message generation has short witness computations (assuming the domain is finite). The proof of Theorem 13 is in Section 6.3.

▶ Theorem 13. In the leader setting, a message can be generated in the simplified semantics if and only if it can be generated by a computation of length at most exponential in the input specification, |c_dis| · |c_env| · |Reg| · |Dom| · |Var|.

▶ Corollary 14. In the leader setting, the message generation problem for RA is in NEXPTIME.

We establish the result in two steps. First we show that every computation in the simplified semantics has a "backbone", which is made up solely of some threads called essential threads (Lemma 16). Then we show how to truncate this backbone to obtain a short computation (Section 6.2).
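The dependency bookkeeping behind this analysis (genthread, depend, and vertex heights, as in Definition 10) can be sketched from a linearly ordered computation log; the event encoding and function names are our own illustration, not the paper's formalism.

```python
from collections import defaultdict

def dependency_graph(events):
    """events: ('store', thread, msg) / ('load', thread, msg) records in
    computation order. genthread(msg) is the first thread to store msg;
    depend(msg) is the set of messages that thread read before its first
    store of msg."""
    genthread, depend = {}, {}
    reads = defaultdict(set)
    for kind, thread, msg in events:
        if kind == "load":
            reads[thread].add(msg)
        elif msg not in genthread:        # repeated insertions are ignored
            genthread[msg] = thread
            depend[msg] = set(reads[thread])
    return genthread, depend

def height(depend, msg):
    """Length of a longest dependency sequence ending in msg (depend is acyclic)."""
    preds = depend.get(msg, set())
    return 0 if not preds else 1 + max(height(depend, m) for m in preds)
```

For instance, on the right-hand computation of Figure 8, the message (y, 2, −) is generated by T₂ after it has read the init message on x and the message (x, 1, −), giving that vertex height 3.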
Analyzing Dependencies in the Dependency Graph. The following study of dependencies generalizes the one in Section 5.2. In a computation of the simplified semantics, messages from the dis threads have unique timestamps, whereas messages from env threads may have identical timestamps. We recall genthread(msg), the thread which first generated message msg, and the dependency set of a message msg, denoted depend(msg), as defined in Section 5.2. We define depend(msg) = ∅ for initial messages. We write depend*(msg) for the reflexive and transitive closure of depend: the smallest set containing msg and such that for all msg′ ∈ depend*(msg) we have depend(msg′) ⊆ depend*(msg).

Similar to Lemma 11, we now show that we can focus on computations where any write event directly depends on a small number of other events, and where dependency sequences are short. The main difference with Section 5.2 is that, since the leader has loops, we cannot a priori bound executions w.r.t. |c_dis|. Keeping this in mind, we provide an alternative notion of compact computations.

Compact Computations. We call a computation ρ compact if for every env message msg′ ∈ depend*(msg) in the computation (1) |depend(msg′) ∩ Msgs(ρ↓env)| ≤ |Dom||Var| and (2) for every msg′′ ≠ msg′ from depend*(msg) ∩ Msgs(ρ↓env), either the variable or the value of msg′′ is different from that of msg′. The first point addresses the situation where an env thread reads two messages with the same variable and value but different views: it says that the thread could have chosen to read one of the messages twice. The second point says there is no need to generate two env messages with the same variable and value along a dependency sequence: a thread reading the second message could equally well read the first, since the ts⁺ timestamps of env messages make them available forever.

▶ Lemma 15.
In the leader setting, if the message msg can be generated in the simplifiedsemantics, then it can be generated by a compact computation.
In a compact computation, both the fan-in (size of the depend set) and the depth (along a dependency sequence) of env messages are O(|Dom||Var|), since there are only as many distinct (variable, value) pairs. Hence O((|Dom||Var|)^{|Dom||Var|}) many env messages are sufficient to generate msg. Our goal is to derive a similar bound on dis messages. First, we consider the dis messages read by env threads, i.e., the dis-env reads-from dependencies. The dis-dis dependencies will be handled later.

Essential Messages and Threads. Given a computation ρ in the simplified semantics, the essential messages for generating message msg, denoted edepend(msg), form the smallest set that includes msg and is closed as follows. (1) For all messages msg′ ∈ edepend(msg) ∩ Msgs(ρ↓env), we have depend(msg′) ⊆ edepend(msg). (2) For all msg′ ∈ edepend(msg) ∩ Msgs(ρ↓dis), we have depend(msg′) ∩ Msgs(ρ↓env) ⊆ edepend(msg). Note the asymmetry: for the env threads we track all dependencies, for the dis threads we only track the dependencies from env.

For a computation ρ, the threads generating essential messages of msg for the first time, together with the set of dis threads, are the essential threads: ethread(ρ) = {genthread(m) | m ∈ edepend(msg)} ∪ dis.

We claim that projecting ρ to the essential threads yields a valid computation in the simplified semantics. Essential messages thus form the backbone of the computation mentioned above. We now give the proofs of Lemma 16 and Corollary 17.

▶ Lemma 16. If ρ is a computation in the simplified semantics, so is ρ↓ethread(ρ).

Proof.
To prove this lemma, it suffices to show that there is no thread in ethread(ρ) that reads from some thread t ∉ ethread(ρ). Then we can simply project away the threads not in ethread(ρ), and all the reads-from dependencies will still be respected. This follows from the definition of edepend(·). Indeed, for an essential env thread t, the messages (and hence threads) that t reads from are also essential. All dis threads are essential by definition. Additionally, for any dis thread, we add all its env dependencies to the essential set. The set ethread(ρ) is then closed under reads-from dependencies, and hence the computation ρ↓ethread(ρ) is valid under RA. ◀

Now we discuss the bounding of essential messages. Essential env messages (and essential env threads) are at most exponentially many, bounded by Q₁ = (|Dom||Var|)^{|Dom||Var|} by the earlier compactness argument. We show that the number of essential dis messages is bounded as well. Firstly, each env thread has a state space (control-state, registers) bounded by Q₂ = |c_env| · |Dom|^{|Reg|}. Given the earlier bound on the total number of essential env messages (and hence on those generated by a single thread), an env thread run of length greater than O(Q₁Q₂) implies that there exists a sub-run in which (1) no essential message was generated and (2) the thread revisited the same local state twice. We can truncate this sub-run, since the absence of essential messages implies that external reads-from dependencies are not affected. Hence the computation of a single env thread is O(Q₁Q₂)-bounded. Given this bound on env threads, the total number of dis messages consumed by the env threads can be at most O(Q₁ · Q₁Q₂). This implies that exponentially many essential dis messages suffice.

▶ Corollary 17.
Let the goal message msg be generated in a computation of the system c. Then, for some compact computation, |edepend(msg)| is at most exponential in |c|. Proof.
Recall the notation genthread(msg), which refers to the thread that generated the message msg for the first time. In the following, if t = genthread(msg), we also refer to t as the "first writer" of msg.

First we observe that edepend(msg) ⊆ depend*(msg). Hence, in particular, we have edepend(msg) ∩ Msgs(ρ↓env) ⊆ depend*(msg) ∩ Msgs(ρ↓env). This is shown to be at most exponential (O(Q₁)) by Lemma 15, since both the height and the env fan-in of the dependency graph restricted to env are polynomial. Given that each essential message is generated for the first time by a unique essential thread, the number of essential env threads is also bounded by O(Q₁).

Now, consider the fragment ρ′ of the computation between two consecutive first-writes (first points of generation) of two essential env messages. If any env thread performs more than O(Q₂) = O(|c_env| · |Dom|^{|Reg|}) many transitions within ρ′, this implies that there are two configurations lcf₁, lcf₂ within ρ′ at which the local states of the thread (modulo view) are identical; this follows since |c_env| is the program size and |Dom|^{|Reg|} is the number of distinct register valuations. Additionally, note that the view at lcf₁ cannot be greater than that at lcf₂ (monotonicity of views in RA). Hence we can simply truncate the sub-computation between lcf₁ and lcf₂ while keeping the computation valid under RA (the thread with the lower view can still perform all its remaining transitions). In this truncation, no essential messages are lost, and hence the reads-from dependencies are respected.

To explain further, suppose to the contrary that some thread t which is the first writer of an essential message executed more than O(Q₁Q₂) transitions lcf₁ lcf₂ ⋯ lcf_l. Since the total number of essential messages is only O(Q₁), there must exist a subsequence σ such that no essential env messages were generated (for the first time) in σ. Additionally, since the state space of each thread is O(Q₂), by a pigeonhole argument it follows that two local configurations lcf_i, lcf_j of t in σ are equal. We can simply truncate the fragment of the run between these configurations, since no essential messages have been generated for the first time in it.

Then it suffices for each first-writer env thread to take at most O(Q₁Q₂) many transitions and consequently read at most exponentially many dis messages. Recall that the dis messages read by first writers of essential env messages are themselves essential. Since the number of essential env threads which are first writers is itself bounded by O(Q₁), the number of essential dis messages is bounded by O(Q₁ · Q₁Q₂), which is exponential in the input. Since edepend(msg) is a union of essential dis and env messages, we get the exponential bound on essential messages. ◀

Combined with Lemma 16, the corollary says it is sufficient to focus on computations with at most exponentially many essential threads and essential messages. We now want to bound the computation of the dis threads.
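The asymmetric closure defining the essential messages can be sketched as follows; this is an illustration under our own encoding (`edepend` over a `depend` map, with an `is_env` flag per message), not the paper's formalism.

```python
def edepend(goal, depend, is_env):
    """Smallest set containing `goal` that is closed under all dependencies
    of env messages but only the env dependencies of dis messages."""
    essential, frontier = {goal}, [goal]
    while frontier:
        m = frontier.pop()
        for d in depend.get(m, ()):
            # track d if m is an env message (all deps) or d is an env message
            if (is_env[m] or is_env[d]) and d not in essential:
                essential.add(d)
                frontier.append(d)
    return essential
```

The essential threads would then be obtained as the first writers of these messages together with all dis threads, e.g. `{genthread[m] for m in essential} | dis_threads`.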
The computation truncation idea as applied to env threads earlier does not apply to the leader. Recall the asymmetry in the definition of essential dependencies: we did not include the dis-dis load dependencies. These dependencies come in two forms: (1) those involving (either as message writer or as reader) some non-leader dis thread, and (2) ldr-ldr dependencies. The former are polynomially many owing to the loop-free nature of the non-leader dis threads. Hence, we focus on ldr-ldr dependencies. For a memory m^de, let m^de↓ldr be the set of ldr messages in it. Assuming vw^de is the view of the ldr, let selfRead(vw^de, m^de) denote the (x, d) pairs in messages of m^de↓ldr which can be read by the ldr.

▶ Definition 18. selfRead(vw^de, m^de) = {(x, d) | (x, d, vw′^de) ∈ m^de↓ldr, vw′^de(x) = vw^de(x)}.

We note that a pair (x, d) is in selfRead when this pair is the last store by the ldr on x following which vw^de(x) has not changed. Observe that there can be at most |Dom|^{|Var|} many distinct selfRead functions. Consider a sub-computation of the leader between two generations of essential messages. We call configurations cf₁^de and cf₂^de ldr-equivalent if (1) the local configurations of the leader coincide except for the views vw₁^de resp. vw₂^de, and (2) the memories m₁^de and m₂^de satisfy selfRead(vw₁^de, m₁^de) = selfRead(vw₂^de, m₂^de).

The computation of the leader between ldr-equivalent cf₁^de and cf₂^de can then be projected away while retaining a computation in the simplified semantics. Since there are only O(|c_ldr| · |Dom|^{|Reg|} · |Dom|^{|Var|}) many distinct configurations that are pairwise not ldr-equivalent, after projecting away the redundant parts, the leader has an at most exponentially long computation between the generations of two consecutive essential messages. Given the exponential bound on all essential messages, we see that, post projection, the leader computation is reduced to exponential size. Combined with the argument for the env and non-leader dis threads, this gives Theorem 13. Note that the resulting nondeterministic algorithm does not run in polynomial space, as there may be exponentially many essential ldr messages which need to be generated concurrently with the env threads.

NEXPTIME-membership of safety verification in the leader case
We now move on to Theorem 13. It suffices to show that we only need to consider computations of exponential length in order to verify safety properties of a parameterized system under the simplified semantics in the leader case. For this, we show exponential bounds on the env and dis components of the computation. We have already seen that for the essential env threads, O(Q^Q) is an upper bound on the number of transitions they need to make. Additionally, this bound also applies to the number of essential dis messages. Note that the non-leader dis threads are loop-free, and hence their number of transitions is polynomial in |c_dis|. Hence we now focus on computations of the leader. We denote by Q = |c_ldr| · |Dom|^{|Reg| + |Var|} a bound on the number of distinct (non-equivalent) leader configurations, and use it below in the proof. For the ldr, we need to maintain more state (as compared to the env threads) to ensure that the truncated run is valid, because we also want to capture ldr-ldr dependencies. The selfRead function does precisely this: at each point in the run it tracks the set of ldr messages that can be read by the ldr itself. Assume once again that there is a (super-exponential) leader computation of length greater than O(Q · Q^Q). Then, since O(Q^Q) is a bound on the total number of essential dis messages (and in particular essential ldr messages), there must exist a sub-computation of the ldr of length greater than O(Q) that is free of essential message generation. Let this sub-computation be lcf_0 lcf_1 ⋯ lcf_l, and let the memory states along it be m^de_0 m^de_1 … m^de_l. We augment each configuration lcf_i with the respective memory state m^de_i, obtaining an augmented configuration as explained below. Consider the configurations obtained by augmenting lcf_i = (c_i, rv_i, vw^de_i) with the set selfRead(vw^de_i, m^de_i).
That is, given lcf_i = (c_i, rv_i, vw^de_i), on augmentation with selfRead(vw^de_i, m^de_i) we obtain the augmented state ⟨c_i, rv_i, selfRead(vw^de_i, m^de_i)⟩. Now, selfRead can take at most |Dom|^{|Var|} many values, while the leader local state (modulo view) has only |c_ldr| · |Dom|^{|Reg|} values. This implies, by a pigeonhole argument, the existence of a pair i, j such that ⟨lcf_i, selfRead(vw^de_i, m^de_i)⟩ and ⟨lcf_j, selfRead(vw^de_j, m^de_j)⟩ are equivalent. Now, the view of the ldr thread is monotonic. This implies that if for i ≠ j we have ⟨c_i, rv_i, selfRead(vw^de_i, m^de_i)⟩ = ⟨c_j, rv_j, selfRead(vw^de_j, m^de_j)⟩, then the sub-computation between i and j may be truncated. Thus the run lcf_0 ⋯ lcf_i lcf_{j+1} ⋯ lcf_l is also a valid run of the thread. Moreover, it does not affect other threads since, once again, no essential messages are lost. Hence for any super-exponential (length greater than O(Q · Q^Q)) leader computation, there exists a shorter computation which also preserves reachability. Thus for safety verification it suffices to consider runs of at most exponential length, immediately giving an NEXPTIME upper bound.
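The pigeonhole-and-splice step of the argument can be phrased as a generic routine on sequences of (hashable) augmented states. This is only an illustration of the counting argument in our own notation, not an algorithm from the paper.

```python
def splice_first_repeat(states):
    """If some augmented state repeats at positions i < j, drop the segment
    strictly between them (keeping one copy), mirroring the truncation
    lcf_0 ... lcf_i lcf_{j+1} ... lcf_l; otherwise return the run unchanged."""
    seen = {}
    for j, st in enumerate(states):
        if st in seen:
            i = seen[st]
            return states[:i + 1] + states[j + 1:]
        seen[st] = j
    return states
```

Iterating this until no state repeats bounds the run length by the number of distinct augmented states, which is how the bound Q on non-equivalent configurations yields runs of at most exponential length.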
PSPACE-hardness of env(nocas, acyc)

We show that the applications of semantic simplification to the loop-free and leader settings are tight, and that further simplification is not possible. Having shown that safety verification of env(nocas) ‖ dis_1(acyc) ‖ ⋯ ‖ dis_n(acyc) is in PSPACE, we give a matching lower bound. For the lower bound, it suffices to consider the variant with no dis threads and loop-free env threads, env(nocas, acyc). In fact, this result captures the inherent complexity of parameterized RA, termed PureRA, i.e., RA in its simplest form. The simplicity of PureRA comes from (1) disallowing registers, and (2) allowing stores to write only the value 1, with the memory initialized to 0. We obtain PSPACE-hardness even for this reduced form, which is surprising, given that the full form is in PSPACE. Notice that PSPACE-hardness with registers is trivial, since PSPACE computations can be encoded in valuations of registers.
In this section, we elaborate on the PSPACE-hardness of checking safety properties of parameterized systems under RA in the absence of dis threads (and with loop-free, cas-free env threads), which we denote env(nocas, acyc). In fact, we investigate the inherent complexity of RA by removing all extra frills such as registers and arbitrary data domains. What remains is PureRA, RA in its simplest form. The simplicity of PureRA comes from the fact that we do not use registers, and the only writes allowed are those of value 1 to a shared variable, where we assume that the memory is initialized to 0, so that the data domain is {0, 1}. The remarkable thing is that we obtain PSPACE-hardness even in this reduced form, which is surprising, given that in its full form RA is in PSPACE by Section 5. Notice that PSPACE-hardness with registers is trivial, since computations can be encoded in register operations themselves.

c = c_AG ⊕ c_SATC ⊕ c_FE[0] ⊕ ⋯ ⊕ c_FE[n−1] ⊕ c_assert
choose(u) = (t_u := 1) ⊕ (f_u := 1)
c_AG = choose(u_0); choose(e_1); choose(u_1); ⋯ ; choose(u_n); (s := 1)
c_SATC = assume(s = 1); check(Φ); ((assume(t_{u_n} = 0); a_{n,1} := 1) ⊕ (assume(f_{u_n} = 0); a_{n,0} := 1))
c_FE[i] = assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1); (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)); ((assume(t_{u_i} = 0); a_{i,1} := 1) ⊕ (assume(f_{u_i} = 0); a_{i,0} := 1))
c_assert = assume(a_{0,0} = 1); assume(a_{0,1} = 1); assert false

Figure 9
The parametrized system used in the reduction
To show the PSPACE-hardness of checking safety properties of parameterized systems of the class env(nocas, acyc), we establish a reduction from the canonical PSPACE-complete problem, QBF. The QBF problem is described as follows. Given a quantified boolean formula Ψ = ∀u_0 ∃e_1 ∀u_1 ∃e_2 ⋯ ∃e_n ∀u_n Φ(u_0, e_1, ⋯, u_n) over variables Vars(Ψ) = {u_0, …, u_n, e_1, …, e_n}, decide if Ψ is true. Ψ has n + 1 universally quantified variables and n existentially quantified variables. To establish the reduction, we construct an instance of the parametrized reachability problem for RA (in fact PureRA) consisting of the parametrized system c, such that c is unsafe if and only if the QBF instance is true. We assume that the QBF instance Ψ is as given above and now detail the construction. The program c executed by the env threads (given in Figure 9) consists of functions (sub-programs), one of which may be executed non-deterministically:

c = c_AG ⊕ c_SATC ⊕ c_FE[0] ⊕ ⋯ ⊕ c_FE[n−1] ⊕ c_assert

Gadgets used.
The task of checking the satisfiability of Ψ is distributed over the env threads executing these functions. Each function has a particular role; we term these roles gadgets and now describe them.

c_AG: The Assignment Guesser guesses a possible satisfying assignment for Vars(Ψ).
c_SATC: The SATisfiability Checker checks satisfiability of Φ w.r.t. an assignment guessed by c_AG.
c_FE[i]: The ∀∃ (ForallExists) Checker at level i, 0 ≤ i ≤ n − 1, verifies that the (i + 1)-th quantifier alternation ∀u_i ∃e_{i+1} is respected by the guessed assignments. This proceeds in levels, where the check function at level i + 1, c_FE[i+1], 'triggers' the check function at level i, c_FE[i], till we have verified that all assignments satisfying Φ constitute the truth of Ψ.
c_assert: The Assertion Checker reaches the assert false instruction when all the previous functions act as intended, implying that the formula was true.

Due to the parameterization, an arbitrary number of threads may execute the different functions at the same time. However, there is no interference between threads, and there is a natural order between the roles: c_SATC requires c_AG to function as intended, and c_FE[i] requires the functions c_AG, c_SATC and c_FE[j], n − 1 ≥ j > i.

Shared Variables.
We use the following set of shared variables in c. For each x ∈ Vars(Ψ), we have boolean shared variables t_x and f_x in c. These variables represent true and false assignments to x in a way that is explained below. All the shared variables used are boolean, and the initial value of all variables is 0. We also have a special (boolean) variable s.

Encoding variable assignments of Ψ: the essence of the construction. Recall that the messages in the memory are of the form (x, d, vw) where x is a shared variable, d ∈ {0, 1}, and vw is a view. To begin, the views of all variables are assigned timestamp 0. An assignment to the variables in Ψ can be read off from the vw of a message (s, 1, vw) in the memory state. For v ∈ Vars(Ψ), if vw(t_v) = 0, then v is considered to have been assigned true, while if vw(f_v) = 0, then v is assigned false. Our construction, explained below, ensures that exactly one of the shared variables t_v, f_v will have timestamp 0 in the view of the message (s, 1, vw). The zero/non-zero timestamps of the variables t_x and f_x in the view of (s, 1, vw) can be used to check satisfiability of Φ, since only a thread with a zero timestamp can read the initial message on the corresponding variable.

Checking a single clause.
As an example, consider the i-th clause, say e_1 ∨ ¬u_1 ∨ u_2. The satisfiability check is implemented in a code fragment as follows:

check(i) = (assume t_{e_1} = 0) ⊕ (assume f_{u_1} = 0) ⊕ (assume t_{u_2} = 0)

and check(Φ) = check(1); check(2); ⋯; check(l), where l is the number of clauses. Finally, we have the boolean variables a_{i,0} and a_{i,1} for i ∈ {0, ⋯, n}: these are 2(n + 1) 'universality enforcing' variables that ensure that all possible assignments to the universal variables in Vars(Ψ) have been checked.
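The zero-timestamp reading of literals can be mimicked executably as follows. The representation of views and literals is ours, chosen only to mirror the assume(t_x = 0)/assume(f_x = 0) pattern above; it is not the paper's formalism.

```python
def literal_can_pass(view, lit):
    """A branch assume(t_v = 0) (positive literal) or assume(f_v = 0)
    (negative literal) succeeds exactly when the corresponding timestamp
    in the thread's view is 0."""
    var, positive = lit
    key = ("t", var) if positive else ("f", var)
    return view[key] == 0

def check_clause(view, clause):
    # check(i) is a non-deterministic choice (⊕) over the literals:
    # the clause passes iff some branch can succeed.
    return any(literal_can_pass(view, lit) for lit in clause)

def check_phi(view, clauses):
    # check(Φ) = check(1); ...; check(l) is sequential, so all must pass.
    return all(check_clause(view, c) for c in clauses)
```

Since the view is fixed once the (s, 1, vw) message is loaded, check(Φ) succeeds exactly when the embedded assignment satisfies every clause.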
We now detail the gadgets (functions) mentioned in Figure 9.
Assignment Guesser c_AG: The job of the Assignment Guesser is to guess a possible assignment for the variables. This is done by writing 1 to exactly one of the variables t_x, f_x for each x ∈ Vars(Ψ). Each such write is required to have a timestamp greater than 0 by the RA semantics, and the view vw of the writing thread is updated accordingly. After making the assignment to all variables in Vars(Ψ) as described, the writing thread adds the message (s, 1, vw) to the memory. Consequently, the view vw of the writing thread (and hence of the message) satisfies

∀x ∈ Vars(Φ): vw(t_x) = 0 ⊕ vw(f_x) = 0.

We interpret this as: the assignment chosen for x ∈ Vars(Φ) is true if vw(t_x) = 0 and is false if vw(f_x) = 0. The chosen assignment is thus encoded in vw and hence can be incorporated by threads loading 1 from s using the message (s, 1, vw) (see c_SATC). This follows since load operations of the RA semantics cause the thread-local view to be updated by the view in the message loaded.
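A sketch of this encoding: each write bumps the timestamp of the written twin above 0, so the untouched twin (timestamp 0) is the one that encodes the chosen polarity. The dictionary representation and the integer clock are our own modelling choices, not part of the construction.

```python
def guess_assignment(assignment):
    """Model of c_AG: for each variable x, write 1 to exactly one of t_x/f_x.
    The write gives that twin a fresh timestamp > 0 in the writer's view, so
    the *other* twin keeps timestamp 0; x is true iff vw(t_x) = 0."""
    view, clock = {}, 1
    for x, val in assignment.items():
        written = ("f", x) if val else ("t", x)  # write the opposite twin
        kept = ("t", x) if val else ("f", x)
        view[written] = clock                    # fresh timestamp > 0
        view[kept] = 0
        clock += 1
    return view  # the view embedded in the final message (s, 1, vw)

def decode(view, variables):
    """Read the embedded assignment back off a loaded (s, 1, vw) message."""
    return {x: view[("t", x)] == 0 for x in variables}
```

Decoding a guessed view recovers exactly the guessed assignment, which is the round trip the c_SATC threads rely on.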
SAT Checker c_SATC: The SAT Checker reads from one of the messages of the form (s, 1, vw) generated by c_AG. Using the code explained in Figure 9, it must check that the assignment obtained using vw satisfies Φ. The crucial observation is that assume(t_x = 0) (resp. assume(f_x = 0)) being successful is synonymous with the timestamp of t_x (resp. f_x) in vw being 0. This holds since assume(v = 0) requires the ability to read the initial message on v, which in turn requires the thread-local view on v to be 0. The timestamp of t_x (resp. f_x) in vw itself being 0 is equivalent to x being assigned the value true (resp. false) by c_AG. Finally, it checks that either t_{u_n} or f_{u_n} had timestamp 0 in vw, and writes 1 to a_{n,1} or a_{n,0} correspondingly in Figure 9. For insight, we note prematurely that we will enforce both these writes to a_{n,1} and a_{n,0} as a way of ensuring universality for the variable u_n. The main task is to verify the 'goodness' of the assignments satisfying Φ. One of the things to verify is that we have satisfying assignments for both values true/false of the universal variables u_i. If assume(t_{u_n} = 0) evaluates to true in c_SATC, then in the view of the message (s, 1, vw) obtained at the end of c_AG, we have vw(t_{u_n}) = 0. We now need a c_AG function (executed by some thread) to make an assignment such that in the view of the message (s, 1, vw'), we have vw'(f_{u_n}) = 0, and the formula Φ is satisfiable again. The next step is to check whether these assignments, which differ in u_n, are sound with respect to the ∀u_{n−1} ∃e_n part of Ψ: that is, the assignment to e_n is independent of that of u_n. This procedure has to be iterated with respect to all of u_0, u_1, …, u_{n−1} by (1) first ensuring that Φ is satisfiable for both assignments to u_i, 0 ≤ i ≤ n − 1, and (2) then checking for each 1 ≤ i ≤ n that the choice of assignment to e_i is independent of all variables in {u_i, e_{i+1}, ⋯, u_n}.

ForallExists Checker c_FE[_]: The n ∀∃ Checkers c_FE[0], …, c_FE[n−1] take over at this point, consuming the writes made earlier. In general, for each i ∈ {0, ⋯, n − 1}, we have a ∀∃ Checker function c_FE[i] that operates at level i by reading 1 from the a_{i+1,0}, a_{i+1,1} variables and making writes to the a_{i,0}, a_{i,1} variables.

Universality Check: c_FE[i] first verifies that all possible valuations of the universally quantified variable u_{i+1} made Φ satisfiable: the two statements assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1) verify this by reading 1 from a_{i+1,0} and a_{i+1,1} (note how all higher-level functions c_FE[j], j > i, enforce this by generating a dependency tree such as the one in Figure 10).

Existentiality Check: Next, c_FE[i] checks that the satisfying assignments of Φ seen so far agree on the existentially quantified variable e_{i+1}: the statements (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)) check this. Assume that we have satisfying assignments of Φ which do not agree on e_{i+1}. Then we have messages (a_{i+1,0}, 1, vw_1) and (a_{i+1,1}, 1, vw_2) such that vw_1(t_{e_{i+1}}) > 0 (e_{i+1} assigned false) but vw_2(t_{e_{i+1}}) = 0 (e_{i+1} assigned true). Now when c_FE[i] reads from both these messages, its view vw will have both vw(t_{e_{i+1}}) > 0 and vw(f_{e_{i+1}}) > 0. This prevents c_FE[i] from executing (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)), since the messages in the memory where t_{e_{i+1}} and f_{e_{i+1}} have value 0 (and timestamp 0) cannot be read. This enforces that the choice of the existentially quantified variable e_{i+1} is independent of the choice of the assignments made to the variables in {u_{i+1}, e_{i+2}, ⋯, u_n}, and hence the proper semantics of quantifier alternation is maintained.

Propagation:
Finally, the c_FE[i] function 'propagates' assignments to the next level, that is, to c_FE[i−1], after a last verification. Let A_{i+1,j} contain all assignments satisfying Φ which agree on e_{i+1} and where u_i is assigned value j ∈ {0, 1}. Such assignments are propagated to the next level by a c_FE[i] function which writes 1 to a_{i,j}. c_FE[i−1] is accessible only when A_{i+1,0} and A_{i+1,1} are both propagated.

Figure 10
The dependency tree for the case of ∀u_0 ∃e_1 ∀u_1 ∃e_2 ∀u_2 Φ. The same color of sibling nodes c_FE[i] represents that the value of e_{i+1} is the same at both of these.

Assert Checker c_assert: After the n ∀∃ Checkers finish, the Assertion Checker reads 1 from the variables a_{0,0} and a_{0,1} and reaches the assertion assert false. This is possible only if all the earlier functions act as intended, which in turn is only possible if the QBF evaluates to true.

The non-deterministic branching between the choices of the gadgets above means that each env thread executes exactly one of the gadgets. Together, however, they check Ψ in a distributed fashion, one thread passing on a part of its state to the next one by the load-stores on the a_{_,0/1} variables as mentioned above. Hence a computation that reaches the assertion requires each thread to play a part in this tableau. We now describe this. First, a set of 2^{n+1} threads run the c_AG gadgets; they guess one assignment each, such that all possible assignments for the universally quantified variables are covered and such that the existentially quantified variables are chosen so that the semantics of quantifier alternation is respected. Essentially this means that the 2^{n+1} assignments guessed would be a sufficient witness to the truth of Ψ. Now, 2^{n+1} threads execute c_SATC and check that each of the assignments guessed satisfies Φ (one thread checks one assignment). They produce a 'proof' that this check is complete by writing to the variables a_{n,0/1}. This also checks that the innermost universality is respected. At level n − 1, 2^n threads execute c_FE[n−1]. Each c_FE[n−1] reads 1 from both a_{n,0} and a_{n,1} and reads 0 from exactly one of t_{e_n} or f_{e_n}. Depending on the view read from the level below, they either write 1 to a_{n−1,1} or to a_{n−1,0}. (Prematurely, this corresponds to the assignments A_{n,1} and A_{n,0} in the proof below.) In essence these threads check that the last quantifier alternation (∀u_{n−1} ∃e_n) is respected. 2^{n−1} threads then execute c_FE[n−2] at level n − 2, reading 1 from both a_{n−1,0} and a_{n−1,1}, and reading 0 from exactly one of t_{e_{n−1}} or f_{e_{n−1}}. These threads then write 1 to either a_{n−2,1} or to a_{n−2,0} (representing the assignments A_{n−1,1} and A_{n−1,0} in the proof below). These threads check that the second-last quantifier alternation ∀u_{n−2} ∃e_{n−1} is respected. This continues till two threads execute c_FE[0], each writing 1 to a_{0,1} or a_{0,0}. These two writes are read by a thread executing c_assert. The views of these threads are all stitched together by the stores and loads they perform on the variables s (for guessing assignments) and a_{_,0/1} (for checking proper alternation). Figure 10 illustrates how the views (in which the assignments are embedded as described earlier) propagate through these threads for the case of the QBF ∀u_0 ∃e_1 ∀u_1 ∃e_2 ∀u_2 Φ. The nodes represent individual threads executing the corresponding gadget, and the edges represent the variable to which a child writes to pass on its view to its parent.
▶ Lemma 19. Ψ is true iff the assert false statement is reachable in c.

This gives us the main theorem.

▶ Theorem 20. The verification of safety properties for parametrized systems of the class env(nocas, acyc) under RA is PSPACE-hard.
We prove that reaching assert false is possible in the parameterized system c iff the QBF Ψ is satisfiable. First we fix some notation. Given the QBF Ψ = ∀u_0 ∃e_1 … ∃e_n ∀u_n Φ(u_0, e_1, …, u_n), we define for 0 ≤ i ≤ n the level-i QBF corresponding to Ψ as follows. For 0 ≤ i ≤ n − 1, the level-i QBF, denoted Ψ_i, is defined as

Ψ_i ≡ ∀u_i ∃e_{i+1} ∀u_{i+1} ∃e_{i+2} … ∀u_n ∃e_1 ∃e_2 … ∃e_i ∃u_0 ∃u_1 … ∃u_{i−1} Φ(u_0, e_1, …, u_n)

For i = n, the level-n QBF, denoted Ψ_n, is defined as

Ψ_n ≡ ∀u_n ∃e_1 … ∃e_n ∃u_0 ∃u_1 … ∃u_{n−1} Φ(u_0, e_1, …, u_n)

Note that Ψ_0 is the same as Ψ. To prove Lemma 19, we prove the following helper lemmas. For ease of argument, we add some labels to our gadgets, and reproduce them below.

choose(u) = (t_u := 1) ⊕ (f_u := 1)
c_AG = choose(u_0); choose(e_1); choose(u_1); ⋯ ; choose(u_n); (s := 1)

Figure 11
Implementation of the Assignment Guesser c_AG gadget.

▶ Lemma 21. Ψ_n is true iff we reach the label λ_1 of the c_SATC gadget (ref. Figure 12) in some thread, and the label λ_2 of the c_SATC gadget in some thread.
▶ Lemma 22. For 0 ≤ i ≤ n − 1, Ψ_i is satisfiable iff we reach the label λ_1 in the c_FE[i] gadget (ref. Figure 13) in some thread, and the label λ_2 in the c_FE[i] gadget in some thread.

▶ Lemma 23. assert false is reachable iff we reach the label λ_1 in the c_FE[0] gadget in some thread, and the label λ_2 in the c_FE[0] gadget in some thread.
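For intuition on the level-i formulas used in these lemmas, the following brute-force QBF evaluator (a small illustration in our own notation, with 'A'/'E' marking universal/existential quantifiers) builds the prefix of Ψ_i directly from its definition:

```python
def eval_qbf(prefix, phi, partial=None):
    """Evaluate a quantified boolean formula: prefix is a list of
    (quantifier, variable) pairs; phi is a predicate on complete assignments."""
    partial = dict(partial or {})
    if not prefix:
        return phi(partial)
    q, v = prefix[0]
    branches = [eval_qbf(prefix[1:], phi, {**partial, v: b})
                for b in (False, True)]
    return all(branches) if q == "A" else any(branches)

def level_prefix(i, n):
    """Quantifier prefix of the level-i QBF Psi_i: the alternating block
    from u_i to u_n, followed by the remaining e's and u's existentially."""
    pre = []
    for k in range(i, n):
        pre += [("A", f"u{k}"), ("E", f"e{k+1}")]
    pre.append(("A", f"u{n}"))
    pre += [("E", f"e{k}") for k in range(1, i + 1)]
    pre += [("E", f"u{k}") for k in range(i)]
    return pre
```

For instance, with n = 1, level_prefix(0, 1) gives the prefix of Ψ = Ψ_0 and level_prefix(1, 1) the prefix of Ψ_n.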
In the following, we write Φ for Φ(u_0, e_1, …, e_n, u_n) since the free variables of Φ are clear.

Proof of Lemma 21
Assume Ψ_n is satisfiable. Then there are satisfying assignments α_1 and α_2 with α_1(u_n) = 0 and α_2(u_n) = 1, such that α_1, α_2 ⊨ Φ. These assignments α_1, α_2 can be guessed by c_AG gadgets in two threads, resulting in adding messages (s, 1, view_1) and (s, 1, view_2) to the memory, such that view_1(f_{u_n}) = 0 and view_2(t_{u_n}) = 0. Correspondingly, there are c_SATC gadgets

c_SATC = assume(s = 1); check(Φ); λ_0: skip; [(assume(t_{u_n} = 0); a_{n,1} := 1; λ_1: skip;) ⊕ (assume(f_{u_n} = 0); a_{n,0} := 1; λ_2: skip;)]

Figure 12 Implementation of the SAT Checker c_SATC gadget with labels λ_0, λ_1, λ_2.

which read from these views (they read 1 from s) and check the satisfiability of Φ using the view_1, view_2 values of t_x, f_x for x ∈ Vars(Ψ). Since both are satisfying assignments, the label λ_0 is reachable in both c_SATC gadgets. One of them will reach the label λ_1 reading t_{u_n} = 0 (using view_2), and the other will reach the label λ_2 reading f_{u_n} = 0 (using view_1). Conversely, assume that the label λ_1 of c_SATC is reachable in one thread, while the label λ_2 of c_SATC is reachable in another thread. Then we know that in one thread we have read a message (s, 1, view), checked the satisfiability of Φ using view, and also verified that view(t_{u_n}) = 0, while in another thread we have read a message (s, 1, view'), checked the satisfiability of Φ using view', and also verified that view'(f_{u_n}) = 0. Thus, we have two satisfying assignments of Φ, one where u_n has been assigned 1 and the other where u_n has been assigned 0. Hence Ψ_n is satisfiable. ◀

▶ Definition 24.
Let view be a view. We say that an assignment α : Vars(Ψ) → {0, 1} is embedded in view iff for all x ∈ Vars(Ψ), view(t_x) = 0 ⇔ α(x) = 1 and view(f_x) = 0 ⇔ α(x) = 0.

The term 'embedded' is used since the view also has (program) variables outside of t_x and f_x. For 0 ≤ i ≤ n, let A_i and B_i respectively represent the sets of assignments which are embedded in the views reaching the labels λ_1, λ_2 of the c_FE[i] gadget. Thus, we know that A_n = {α ⊨ Φ | α(u_n) = 1} and B_n = {α ⊨ Φ | α(u_n) = 0}.

c_FE[i] = [assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1)]; κ_1: skip; [assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)]; κ_2: skip; [(assume(t_{u_i} = 0); a_{i,1} := 1; λ_1: skip;) ⊕ (assume(f_{u_i} = 0); a_{i,0} := 1; λ_2: skip;)]

Figure 13 The ∀∃ Checker at level i, c_FE[i], with labels κ_1, κ_2, λ_1, λ_2. We have n such gadgets, one for each level 0 ≤ i ≤ n − 1.

▶ Lemma 25.
For 0 ≤ i ≤ n − 1, define the sets of assignments

A_{i,0} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 1, α(e_{i+1}) = 0}
A_{i,1} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 1, α(e_{i+1}) = 1}
B_{i,0} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 0, α(e_{i+1}) = 0}
B_{i,1} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 0, α(e_{i+1}) = 1}

where ⊎ denotes disjoint union. Then A_i is equal to one of the sets A_{i,0} or A_{i,1}. Similarly, B_i is equal to one of the sets B_{i,0} or B_{i,1}.

Proof of Lemma 25. We already know the definitions of A_n and B_n. Consider the case of A_{n−1} and B_{n−1}. By construction, to reach label λ_1 of c_FE[n−1],
(a) we need to have reached labels λ_1, λ_2 of c_FE[n]. The view (say view_A) on reaching the label κ_1 in c_FE[n−1] has embedded assignments from A_n ⊎ B_n.
(b) To reach the label κ_2 of c_FE[n−1], we need either f_{e_n} or t_{e_n} to have timestamp 0 in view_A. If we had view_A(t_{e_n}) > 0 and view_A(f_{e_n}) > 0, then the label κ_2 is not reachable. That is, the assignments embedded in view_A agree on the assignment of e_n.
(c) To reach the label λ_1 in c_FE[n−1], the assignments embedded in view_A agree on the assignment of u_{n−1}, such that u_{n−1} is assigned 1.
Thus, A_{n−1} is obtained from A_n ⊎ B_n by keeping those assignments which agree on e_n and where u_{n−1} is true. Similarly, to reach label λ_2 in c_FE[n−1],
(a) we need to have reached the labels λ_1, λ_2 of c_FE[n]. The view (say view_B) on reaching the label κ_1 in c_FE[n−1] has embedded assignments from A_n ⊎ B_n.
(b) To reach label κ_2, we need either f_{e_n} or t_{e_n} to have timestamp 0 in view_B. If we had view_B(t_{e_n}) > 0 and view_B(f_{e_n}) > 0, then the label κ_2 is not reachable. That is, the assignments embedded in view_B agree on the assignment of e_n.
(c) To reach the label λ_2 in c_FE[n−1], the assignments embedded in view_B agree on the assignment of u_{n−1}, such that u_{n−1} is assigned 0.
Thus, B_{n−1} is obtained from A_n ⊎ B_n by keeping those assignments which agree on e_n and where u_{n−1} is false. The proof easily follows for any A_i, B_i, using the definitions of A_{i+1}, B_{i+1} as above. ◀

Proof of Lemma 22
We give an inductive proof, using Lemma 21 as the base case. As the inductive step, assume that Ψ_{i+1} is satisfiable iff we reach the label λ_1 of the c_FE[i+1] gadget in some thread, and the label λ_2 of the c_FE[i+1] gadget in some thread. Assume Ψ_i is satisfiable. We can write Ψ_i as ∀u_i ∃e_{i+1} Ψ_{i+1}. We show that there is a thread which reaches label λ_1 of the c_FE[i] gadget with a view that has A_i embedded in it, and a thread which reaches the label λ_2 of the c_FE[i] gadget with a view that has B_i embedded in it. By the inductive hypothesis, since Ψ_{i+1} is satisfiable, there is a thread which reaches the label λ_1 of the c_FE[i+1] gadget with a view view_A that has A_{i+1} embedded in it, and a thread which reaches label λ_2 of the c_FE[i+1] gadget with a view view_B that has B_{i+1} embedded in it. Note that a_{i+1,1}, a_{i+1,0} have been written 1 by these threads respectively, such that view_A(a_{i+1,1}) > 0 and view_B(a_{i+1,0}) > 0. Thanks to this, there is a thread which can take on the role of the c_FE[i] gadget now. This thread begins with a view view_C which is the merge of view_B and view_A. The label κ_1 of this c_FE[i] gadget is reachable by reading 1 from both a_{i+1,0} and a_{i+1,1}, and we want view_C(t_{e_{i+1}}) = 0 or view_C(f_{e_{i+1}}) = 0. As seen in item (b) in the proof of Lemma 25, this is possible only if view_B(t_{e_{i+1}}) = 0 and view_A(t_{e_{i+1}}) = 0, or view_B(f_{e_{i+1}}) = 0 and view_A(f_{e_{i+1}}) = 0. By assumption, since Ψ_i is satisfiable, there exist assignments from A_{i+1} and B_{i+1} which agree on e_{i+1} and u_i. In particular, the satisfiability of Ψ_i = ∀u_i ∃e_{i+1} Ψ_{i+1} says that we have a set of assignments S_1 ⊆ A_{i+1} ⊎ B_{i+1} which satisfy Ψ_i, such that for all α ∈ S_1, α(u_i) = 1 and α(e_{i+1}) is some fixed value. Similarly, the satisfiability of Ψ_i also gives us a set of assignments S_2 ⊆ A_{i+1} ⊎ B_{i+1} such that for all α ∈ S_2, α(u_i) = 0 and α(e_{i+1}) is some fixed value. It is easy to see that S_1 = A_i, while S_2 = B_i. Thus, the satisfiability of Ψ_i implies the feasibility of the assignments A_i and B_i.
Thus, starting with a view view_C which has embedded assignments A_{i+1} ⊎ B_{i+1}, it is possible for a thread to: read 1 from a_{i+1,0} and a_{i+1,1} (these are present in view_C); check that either t_{e_{i+1}} has timestamp 0 in view_C or f_{e_{i+1}} has timestamp 0 in view_C (this is possible since the embedded assignments agree on e_{i+1}); and check that t_{u_i} has timestamp 0 in view_C (this is possible since the embedded assignments are such that u_i is assigned 1). This ensures that the thread reaches the label λ_1 of c_FE[i] with a view having A_i embedded in it (notice that the last two checks filter out A_i from A_{i+1} ⊎ B_{i+1}). In a similar manner, starting with a view view_C which has embedded assignments A_{i+1} ⊎ B_{i+1}, it is possible for a thread to: read 1 from a_{i+1,0} and a_{i+1,1}; check that either t_{e_{i+1}} or f_{e_{i+1}} has timestamp 0 in view_C (possible since the embedded assignments agree on e_{i+1}); and check that f_{u_i} has timestamp 0 in view_C (possible since the embedded assignments are such that u_i is assigned 0). This ensures that the thread reaches the label λ_2 of c_FE[i] with a view having B_i embedded in it (notice that the last two checks filter out B_i from A_{i+1} ⊎ B_{i+1}). Conversely, assume that we have two threads which have reached, respectively, labels λ_1, λ_2 of the c_FE[i] gadget with views in which A_i and B_i are embedded. We show that Ψ_i is satisfiable. By the definition of A_i, we know that we have assignments from A_{i+1} ⊎ B_{i+1} which agree on e_{i+1} and which set u_i to 1. The fact that we reached the label λ_1 of the c_FE[i] gadget with a view having A_i embedded in it shows that these assignments are feasible. Similarly, reaching the label λ_2 of c_FE[i] with a view having B_i embedded in it shows that we have assignments from A_{i+1} ⊎ B_{i+1} which agree on e_{i+1} and which set u_i to 0. The existence of these two sets of assignments proves the satisfiability of Ψ_i. ◀

Proof of Lemma 23
Assume that we reach assert false. Then we have read 1 from a_{0,0} and a_{0,1}. These are set to 1 only when the labels λ_1, λ_2 of c_FE[0] have been visited. The converse is exactly similar: indeed, if we reach the labels λ_1, λ_2 of c_FE[0], we have written 1 to a_{0,1} and a_{0,0}. This enables the reads of 1 from a_{0,0} and a_{0,1}, leading to assert false. ◀

▶ Theorem 26. Parameterized safety verification for env(nocas, acyc) is PSPACE-hard.
NEXPTIME-hardness of env(nocas, acyc) ‖ dis(nocas)
NEXPTIME lower bound on the safety verification problem inthe presence of a single leader dis thread, env ( nocas , acyc ) || dis ( nocas ). The lower bound isobtained with a fragment of RA which does not use registers and surprisingly in which dis alsodoes not perform any compare-and-swap operations. As in the case of the PSPACE -hardness, dwait Godbole, S. Krishna, Roland Meyer 47 we work with a fixed set of shared memory locations X (also called shared variables) froma finite data domain D . We show the hardness via a reduction from the succinct versionof 3CNF-SAT, denoted SuccinctSAT . Following the main part of the paper, we refer to thedistinguished dis thread as the ‘leader’ and individual threads from env as ‘contributors’.
SuccinctSAT : succinct satisfiability
The complexity of succinct representations was studied in the pioneering work [37] for graph problems. Typically, the complexity of a problem is measured as a function of some quantity V, with the assumption that the input size is polynomial in V. If the underlying problem concerns graphs, then V is the number of vertices in the graph, while if the underlying problem concerns boolean formulae, then V is the size of the formula. [37] investigated the complexity of graph problems when the input has an exponentially succinct representation, that is, the input size is polylogarithmic in |V|, where V is the number of vertices of the graph, and showed that succinct representations render trivial graph problems NP-complete, while [58] showed that graph properties which are NP-complete under the usual representation become NEXPTIME-complete under succinct representations.
SuccinctSAT is essentially an exponentially succinct encoding of a 3CNF-SAT problem instance. Let φ(x_0, ⋯, x_{2^n − 1}) be a 3CNF formula with 2^n variables and 2^n clauses. Assume an n-bit binary address for each clause. A succinct encoding of φ is a circuit D(y_1, ⋯, y_n) (with size polynomial in n) which, on an n-bit input y_1 ⋯ y_n interpreted as the binary address of a clause c, generates 3n + 3 bits specifying the indices of the 3 variables from x_0, ⋯, x_{2^n − 1} occurring in clause c and their signs (1 bit each). Thus, the circuit D provides a complete description of φ(x_0, ⋯, x_{2^n − 1}) when evaluated on all n-bit inputs. Define SuccinctSAT as the following NEXPTIME-complete [58] problem.
Given a succinct description D of φ, check whether φ is satisfiable.

Adopting the notation above, we assume that we have been given n, the formula φ with 2^n boolean variables BVars = {x_0, ⋯, x_{2^n − 1}}, and the succinct representation D with input variables {y_1, ⋯, y_n}. Denote the variables in clause c as var1(c), var2(c), var3(c) and their signs as sig1(c), sig2(c), sig3(c). We denote the n-bit address c̄ of a clause c as a (boolean) word c̄ ∈ {0, 1}^n and commonly use the variable α to refer to clause addresses. We denote the variable addresses also as n-bit (boolean) words and commonly use the variable β to represent them. We construct an instance of the parametrized reachability problem consisting of a ldr leader thread running program c_ldr and the env contributor threads running program c_env. We show that this system is 'unsafe' (an assert false is reachable) if and only if the SuccinctSAT instance is satisfiable.
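To fix intuition, the following sketch expands a succinct description into an explicit clause list and checks an assignment against it. Representing D as a Python function and signs as booleans is our modelling choice, not the paper's circuit formalism.

```python
def expand(D, n):
    """Explicit 3CNF from a succinct description: D maps an n-bit clause
    address (tuple of bits, most significant first) to three
    (variable_index, sign) pairs; sign True means the literal is positive."""
    return [D(tuple((addr >> b) & 1 for b in reversed(range(n))))
            for addr in range(2 ** n)]

def satisfies(assignment, clauses):
    """assignment[i] is the boolean assigned to x_i; a clause holds when
    some literal's sign matches the assigned value."""
    return all(any(assignment[i] == s for (i, s) in clause)
               for clause in clauses)
```

Note that the explicit formula has 2^n clauses, exponential in the size of D; this blow-up is the source of the jump from NP to NEXPTIME.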
The leader, running program c_ldr, guesses an assignment to the Boolean variables in φ. The contributors, running the program c_env, are tasked with checking that the assignment guessed by the leader does in fact satisfy the formula φ. They do this in a distributed fashion, where one clause from φ is verified by one contributor. Then, similar to the PSPACE-hardness proof, the program c_env forces the contributors to combine the checks for individual clauses into a dependency tree, so that the root of the tree is able to reach an assertion failure only if all threads could successfully check their clauses under the leader's guessed assignment. However, since all the contributors run the same program, the trick is to enforce that all clauses will be checked.

c_env = c_CL-ENC ⊕ c_SAT ⊕ c_Forall[0] ⊕ · · · ⊕ c_Forall[n-2] ⊕ c_assert

choose(u) = (t_u := 1) ⊕ (f_u := 1)

c_CL-ENC = choose(u_0); choose(u_1); · · · ; choose(u_{n-1}); s := 1

c_SAT = (assume s = 1); c_CV; c_Check;
  ((assume (t_{u_{n-1}} = 0); a_{n-1,1} := 1) ⊕ (assume (f_{u_{n-1}} = 0); a_{n-1,0} := 1))

c_Forall[i] = assume a_{i+1,0} = 1; assume a_{i+1,1} = 1;
  ((assume t_{u_i} = 0; a_{i,1} := 1) ⊕ (assume f_{u_i} = 0; a_{i,0} := 1))

c_assert = assume (a_{0,0} = 1); assume (a_{0,1} = 1); assert false

Figure 14
The contributor program c_env used in the reduction. The sub-routines c_CV and c_Check are described later.
Gadgets. c_env consists of a set of gadgets (modelled as 'functions' in the program), only one of which may be non-deterministically executed by a contributor, while c_ldr is the program executed by the leader.

c_env = c_CL-ENC ⊕ c_SAT ⊕ c_Forall[0] ⊕ · · · ⊕ c_Forall[n-2] ⊕ c_assert

Recall that in the PSPACE-hardness proof, too, similar gadgets were executed by the env threads. The gadgets in c_env perform the following tasks.
c_CL-ENC guesses an n-bit address c̄ of a clause c in φ.
c_SAT (1) acquires a clause address c̄ generated by c_CL-ENC, (2) uses the circuit D to obtain the indices of the variables var1(c), var2(c), var3(c) in clause c, along with their signs (this is done by the sub-routine c_CV), (3) accesses the assignment made to the variables by the leader (sub-routine c_Check), and (4) checks that the assignment is such that c is satisfied.
c_Forall[i] (0 ≤ i ≤ n-2) together ensure that the satisfiability of all the clauses in φ has been checked. This is done by instantiations of c_SAT, in levels (similar to the proof of PSPACE-hardness). At the i-th level, c_Forall[i] checks the ∀-universality of the i-th address bit of the clauses.
c_assert finally reaches assert false if all the previous functions execute faithfully, implying that the SuccinctSAT instance is satisfiable.
The non-deterministic branching implies that each env thread will only be able to execute one of these gadgets. The check for satisfiability of φ is distributed between the env threads much like in the PSPACE-hardness construction. For this distributed check, threads are allocated roles depending upon the function (gadget) they execute. Additionally, the distinguished leader thread is tasked with guessing the assignment. We now describe this.
Role of the leader.
We have one leader thread which guesses a satisfying assignment for the Boolean variables BVars as a string of writes made to a special program variable g. The writes made to g use n+2 values d_t, d_f, 1, ..., n in a specific order. Let the initial value of all variables in the system be init ∉ {d_t, d_f, 1, ..., n}. To illustrate with a concrete example, consider the case where n = 3. Let the guessed assignment for BVars be w = ftftttff ∈ {t,f}^8, where t denotes true and f false. Then the writes made by the leader are as below, where d_t and d_f are macros for data domain values (other than {init, 1, ..., n}) representing true and false respectively.

d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f

The leader alternates writing a guessed assignment for x_0, ..., x_7 with writing a value from {1, ..., n}. The values in {1, ..., n} (here {1, 2, 3}) are written in a deterministic pattern, 1 2 1 3 1 2 1, which we call a 'binary search pattern' with 3 values, denoted BSP(3) for short. BSP(n) is the unique word of length 2^n - 1 over {1, ..., n} defined inductively as follows.

BSP(1) = 1
BSP(n) = BSP(n-1) · n · BSP(n-1) for n ≥ 2

The guessed assignments for x_0, ..., x_{2^n - 1} are interspersed alternately with the symbols of BSP(n) by the leader while writing to g. Formally, let S(n, w) = BSP(n) ⧢ w represent the perfect shuffle (alternation) of BSP(n) with the guessed assignment w ∈ {d_t, d_f}^{2^n}. The leader writes the word S(n, w) to g. From the example above, S(3, ftftttff) = d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f. We show that the shuffle sequence which needs to be generated is easily implementable by the leader with a polynomial-sized program.

Lemma 27.
There exists a program c_ldr which nondeterministically chooses w ∈ {d_t, d_f}^{2^n} and generates the write sequence S(n, w) on a shared memory location g, with the size of the program growing polynomially with n.

How contributors access variable assignments, intuitively.
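As a concrete companion to the discussion below, BSP(n) and the leader's word S(n, w) are easy to compute directly from their definitions. A sketch (the function names `bsp` and `leader_word` are ours, and the characters 'f'/'t' stand in for the values d_f/d_t):

```python
def bsp(n):
    """BSP(n): the 'binary search pattern' over {1, ..., n}, of length
    2^n - 1.  BSP(1) = 1 and BSP(n) = BSP(n-1) . n . BSP(n-1)."""
    if n == 1:
        return [1]
    inner = bsp(n - 1)
    return inner + [n] + inner

def leader_word(n, w):
    """S(n, w): the perfect shuffle of the guessed assignment w
    (a sequence of 2^n values d_t/d_f) with BSP(n); this is the
    sequence of values the leader writes to g."""
    pattern = bsp(n)           # 2^n - 1 separators
    out = [w[0]]
    for sep, val in zip(pattern, w[1:]):
        out += [sep, val]
    return out
```

For the running example, `leader_word(3, list("ftftttff"))` reproduces the word d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f shown above.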
Each contributor wants to check a single clause, for which it needs to access the 3 variables occurring in that clause and their signs. Since it pertains to the BSP, we first explain this access task and discuss the others (selecting a clause, acquiring a variable address and sign, etc.) later. For now we assume that the contributor has a variable x with address β and sign σ and wants to access the assignment made to x by the leader.
For Boolean variable x, the contributor uses the BSP(n) pattern to locate the assignment made to x, by reading a subword of S(n, w). From program variable g, the contributor reads n+1 values from {d_f, 1, ..., n} or {d_t, 1, ..., n} without repetitions, depending upon the sign of x in the clause (d_f if the sign is negative, d_t if positive). In the running example, if the contributor wants to access x_2 from d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f, it reads the sequence 2 d_f 1 3; x_6 is obtained by reading 3 2 d_f 1, while for x_0, the contributor must read d_f 1 2 3. For each x ∈ BVars, there is a unique 'access pattern' which forces the thread to acquire the assignment of exactly x and not any other variable. In this search, it is guided by the BSP, which acts as an indexing mechanism: it helps the contributor narrow down, unambiguously, to the part of S(n, w) which contains the value of x.

The contributors check that each clause in φ has been satisfied in a distributed fashion. Each contributor executes one of the functions in c_env. They do this as follows.

Clause Encoding: c_CL-ENC: A thread executing c_CL-ENC selects, nondeterministically, a clause address α ∈ {0,1}^n. This is done by writing 1 to either t_{u_i} or f_{u_i} for each 0 ≤ i ≤ n-1, followed by a write of 1 to s. The function c_CL-ENC in Figure 14 describes this. The view of the message (s, 1, vw) encodes the address α of a clause, satisfying (vw(t_{u_i}) > 0 ⟺ α[i] = 0) and (vw(f_{u_i}) > 0 ⟺ α[i] = 1) for 0 ≤ i ≤ n-1, as in the PSPACE-hardness proof. Each bit is encoded in the view of a message. Overall, 2^n threads will execute c_CL-ENC to cover all the clauses in the formula.
Satisfaction checking (for one clause): c_SAT: A thread executing c_SAT acquires the address c̄ of a clause c through the view vw, by reading the message (s, 1, vw) generated by c_CL-ENC. This thread has to check the satisfiability of the clause with address c̄. For this, it needs to know the 3 Boolean variables var1(c), var2(c), var3(c) appearing in c. Recall that we have been given, as part of the problem, the circuit D which takes an n-bit address α corresponding to some clause as input, and outputs the 3n+3 bits corresponding to the 3 variables appearing in the clause, along with their signs. We use D, and the encoding of the clause address c̄ stored in vw, to compute D(α). We have a polynomial-sized sub-routine c_CV (CV for circuit value) that can compute the circuit value of D.

Circuit Value: c_CV: The c_CV sub-routine takes the address α (of a clause c) and converts it into the index of one of the variables in c. Thus, in essence, c_CV evaluates the circuit-value problem D(α) by simulating the (polynomially many) gates in D. The idea is to keep two Boolean program variables for each node in D, and to propagate the evaluation of nodes in the obvious way (for instance, if we have an ∧ gate with input gates g_1, g_2 evaluating to 0 and 1 respectively, then t_∧ will be written 1). We now briefly explain how the circuit value can be evaluated, taking the example of a single gate.
For each node p in D we use two Boolean program variables, t_p and f_p. We say that a view vw encodes the value at node p if the following holds; we write encAddr(vw) to denote the values for the Boolean variables encoded in vw.

(vw(t_p) > 0 ⟺ p = 0) and (vw(f_p) > 0 ⟺ p = 1)   (1)

Now assume a thread has a view vw when it wants to evaluate a logic (NAND) gate G, with output node o and input nodes i_1 and i_2. We assume that vw encodes the values of i_1 and i_2 (the thread has evaluated the inputs of G) and that the thread has not evaluated G before (we have vw(t_o) = vw(f_o) = 0). Assuming that these conditions hold, the thread executes instructions such that the new view vw' of the thread (1) differs from vw only on t_o and f_o, and (2) vw' correctly encodes the value of o. The function in Figure 15 evaluates G. The main observation is that a thread can read init from a variable only if its view on that variable is 0 (since there is only one init message, with timestamp 0).

c_NAND = (((assume f_{i_1} = init) ⊕ (assume f_{i_2} = init)); f_o := 1)
  ⊕ ((assume t_{i_1} = init); (assume t_{i_2} = init); (t_o := 1))

Figure 15 c_NAND - encoding the evaluation of a NAND gate in the views of threads.

Claim (1) holds trivially, since only the timestamps of t_o or f_o may be augmented (reading from init does not change the timestamp). With a little observation we see that the thread can write to f_o only if one of f_{i_1}, f_{i_2} has timestamp 0 in its view, implying that one of the inputs to the gate is 0 by the assumption; then the new view on f_o is greater than 0, and thus claim (2) holds. The case for t_o may be checked similarly. Since D has polynomially many gates, any thread can evaluate them in topological order, and hence will eventually end up with the evaluation of D(α). Also note that, since a thread relies only on its internal view, the same set of program variables {t_p, f_p | p a node of D} may be used by all threads (crucially avoiding a number of variables that varies with the thread count).

Lemma 28.
There exists a sub-routine c_CV that, starting with the view vw from (s, 1, vw), evaluates the circuit value D(α), where α is the clause address encoded in the variables t_{u_i} and f_{u_i} in encAddr(vw). Also, c_CV is polynomial-sized in n.

Once D(α) has been computed, the thread can nondeterministically choose one of the three variables appearing in clause c, say x ∈ {var1(c), var2(c), var3(c)}. For simplicity we include this choice as a part of the routine c_CV itself. Given the address β of the variable x, the contributor accesses the assignment made by the leader to x and checks whether it satisfies clause c. This is done by the routine c_Check.

Clause Check: c_Check: Having acquired the address β = β_{n-1} · · · β_0 and sign σ of variable x by executing c_CV, the thread checks that variable x satisfies clause c. To faithfully access the assignment to x from the variable g, the BSP guides the thread. The 'access pattern' for x, denoted AP(n) (β, σ implicit), is recursively defined as follows.

For 0 < i ≤ n:
AP(i) = i · AP(i-1)   if β_{i-1} = 1
AP(i) = AP(i-1) · i   if β_{i-1} = 0

For checking satisfiability:
AP(0) = d_f if σ = 0
AP(0) = d_t if σ = 1

For example, if x_6, with negative sign (σ = 0), is to be accessed, then the access pattern is AP(3) = 3 · AP(2) = 3 · 2 · AP(1) = 3 · 2 · AP(0) · 1 = 3 · 2 · d_f · 1. For x_4, with negative sign (σ = 0), the access pattern is AP(3) = 3 · AP(2) = 3 · AP(1) · 2 = 3 · AP(0) · 1 · 2 = 3 · d_f · 1 · 2. Since d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f was written to g by the leader, it is easy to see that the reads with the access pattern for x_6 (3 · 2 · d_f · 1) would be successful, since x_6 had been assigned false by the leader, while those for x_4 (3 · d_f · 1 · 2) would fail, since x_4 had been assigned true while the contributor wished to read d_f. AP(0) is defined to ensure satisfiability of the clause: AP(0) = d_f iff f_sign = 0 (the sign of the variable in the clause is negative) and AP(0) = d_t iff t_sign = 0 (the sign of the variable in the clause is positive).
The above recursive formulation gives us a polynomial-sized sub-routine which reads values matching the AP sequence. We thus have the following lemma.

Lemma 29.
There exists a sub-routine c_Check which, starting with a view vw encoding (in t_{d_0}, f_{d_0}, ..., t_{d_{n-1}}, f_{d_{n-1}} and t_sign, f_sign) the address and sign of a Boolean variable x in clause c, terminates only if c is satisfied under the assignment to x made by the leader.

Until now, a thread which reads a clause from c_CL-ENC has checked its satisfiability with respect to the assignment guessed by the leader, using the c_SAT module. However, to ensure satisfiability of φ, this check must be done for all 2^n clauses. This is done in levels 0 ≤ i ≤ n-2 by the gadgets c_Forall[i], exactly as in the PSPACE-hardness proof. Finally, we reach assert false after reading 1 from both a_{0,0} and a_{0,1}. However, in this case we do not have to check for alternation, but only for universality in the assignments.

Forall Checker: c_Forall[i]: The c_Forall[n-2] gadget checks the 'universality' with respect to the second-last bit of the clause address, the c_Forall[n-3] gadget does this check with respect to the third-last bit, and so on, till c_Forall[0] does this check for the first bit, ensuring that all clauses have been covered.
2^n threads execute c_SAT, and write 1 to a_{n-1,0} or a_{n-1,1}, depending on the last address bit of the clause each checks. Next, 2^{n-1} threads execute c_Forall[n-2]. A thread executing c_Forall[n-2] reads 1 from both a_{n-1,0} and a_{n-1,1}, representing two clauses whose last bits differ; this thread checks that the second-last bits in these two clauses agree: it writes 1 to a_{n-2,0} (if the second-last bit is 0) or to a_{n-2,1} (if the second-last bit is 1). When 2^{n-1} threads finish executing c_Forall[n-2], we have covered the second-last bits across all clauses. This continues with 2^{n-2} threads executing c_Forall[n-3]. A thread executing c_Forall[n-3] reads 1 from both a_{n-2,0} and a_{n-2,1}, representing two clauses whose second-last bits differ, and checks that the third-last bits in these two clauses agree.
Finally, we have 2 threads executing c_Forall[0], certifying the universality of the first address bit and writing 1 to a_{0,0} and a_{0,1}.

Assertion Checker: c_assert: The assertion checker gadget in Figure 14 reads 1 from a_{0,0} and a_{0,1}. If this is successful, then we reach the assert false.

Comparison with the PSPACE-hardness proof
As is evident, there are many things common between the two proofs. We now recapitulate the similarities and differences.
In the PSPACE-hardness proof we wanted to check the truth of a QBF, hence guessing an assignment was not necessary. Here the leader is tasked with guessing an assignment to the Boolean variables.
In the PSPACE-hardness proof we wanted to check for quantifier alternation in the Boolean variables of Ψ. Here we instead want to check for universality of addresses, i.e., the fact that all clauses have been checked. This makes the c_Forall[_] gadget a bit simpler than its c_FE[_] counterpart.
In the PSPACE-hardness proof, the CNF formula φ was given in a simple form, and hence all threads executing c_SAT checked the same formula. Additionally, given the exponential size of the formula, the task is distributed between (exponentially) many threads. Here the CNF formula is given in an encoded form, hence we had to devise the circuit-value machinery to extract it from the succinct representation D.

The proof of this lemma is very close to that of Lemma 19, and some of the terminology we use is borrowed from the proof of Lemma 19. As in the case of Section 7.4, we add some labels in the function descriptions for ease of argument in the proof. We describe the notation and the key sub-lemmas required for the proof.
Notation and Interpretation of Boolean Variables involved in the construction
We denote by α_U an assignment to the (Boolean) variables {u_0, u_1, ..., u_{n-1}}, interpreted as the (n-bit) address of a clause. Here u_{n-1} is the most significant bit (MSB) and u_0 the least significant bit (LSB). We view the assignment so generated as an n-bit vector α_U ∈ {0,1}^n; α_U(u_i) gives the assignment to u_i.
We denote by α_D an assignment to the (Boolean) variables {d_0, d_1, ..., d_{n-1}, d_sign}, interpreted as the (n-bit) index of a variable in BVars together with one sign bit. Here d_{n-1} is the MSB and d_0 the LSB. We view the assignment as an (n+1)-bit vector α_D ∈ {0,1}^{n+1}.
For an assignment α_U ∈ B^n, D_1(α_U) (similarly D_2(α_U) and D_3(α_U)) denotes the n+1 bits signifying var1(α_U), sig1(α_U) (respectively var2(α_U), sig2(α_U) and var3(α_U), sig3(α_U)).

c_SAT = (assume s = 1); c_CV; λ_1 : skip; c_Check;
  ((assume (t_{u_{n-1}} = 0); a_{n-1,1} := 1) ⊕ (assume (f_{u_{n-1}} = 0); a_{n-1,0} := 1))

Figure 16 c_SAT - acquiring a clause c and checking satisfiability of that clause, with the label λ_1.

c_Forall[i] = assume a_{i+1,0} = 1; assume a_{i+1,1} = 1;
  ((assume t_{u_i} = 0; a_{i,1} := 1; λ_2 : skip) ⊕ (assume f_{u_i} = 0; a_{i,0} := 1; λ_3 : skip))

Figure 17 c_Forall[i] at level i, with the labels λ_2, λ_3. We have 0 ≤ i ≤ n-2.

We observe that each thread executing a c_CL-ENC function makes a (single) write to s, with the message (s, 1, view), where view has embedded in it an assignment α_U. We write α ◁ view to denote that the assignment α is embedded in view. Now, a thread p executing a c_SAT function acquires the assignment α_U and (non-deterministically) computes one amongst D_1(α_U), D_2(α_U), D_3(α_U), reaching the label λ_1. The correctness invariant involved is formalized in the following lemma.

Lemma 30.
Let a thread p executing the c_SAT function read a message (s, 1, view) with α ◁ view. Suppose p reaches the label λ_1 having computed D_i (i ∈ {1, 2, 3}), so that α_D = D_i(α_U), and let the view of the thread at that point be view'. Then we have α_D ◁ view'.

Continuing from above, let p compute D_i in c_CV, reaching λ_1. Then, by Lemma 30, we have α_D = D_i(α_U) embedded in the view of the thread. Now, using c_Check, p checks that the clause with index α_U is satisfied by the n+1 bits α_D representing a variable and the sign of that variable in the clause. Finally, the thread writes to one of the program variables a_{n-1,0} or a_{n-1,1}. We have the following lemma that shows the correctness of this operation.

Lemma 31.
A thread p can make the write (a_{n-1,1}, 1) (similarly (a_{n-1,0}, 1)) if and only if clause α_U is satisfied and α_U(u_{n-1}) = 1 (similarly α_U(u_{n-1}) = 0).

In Section 8.3.1 and Section 8.3.2 we have discussed how the system can check satisfiability of a single clause. Now, we need to check that each clause is satisfied. We do this via additional modules on top of the PSPACE construction. Towards this goal, define a level predicate IsSAT(u_{n-1}, u_{n-2}, ..., u_0) denoting that the clause with address α_U = u_{n-1} · · · u_1 u_0 is satisfied. Now, very similarly to Section 7.4, we define the following formulae. For 0 ≤ i ≤ n-2:

Υ_i ≡ ∀u_i ∀u_{i+1} . . . ∀u_{n-1} ∃u_0 . . . ∃u_{i-1} IsSAT(u_{n-1}, u_{n-2}, ..., u_0)

And we claim the following lemma.

Lemma 32.
For 0 ≤ i ≤ n-2, Υ_i is true ⟺ the labels λ_2, λ_3 in the gadget c_Forall[i] can be reached.

The proof of Lemma 32 follows along exactly the same lines as those of Lemma 21 and Lemma 22. Finally, note that Υ_0 is equivalent to the SuccinctSAT instance being satisfiable. We have the following final corollary showing the correctness of the entire construction.
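The quantifier structure of Υ_i is directly executable, which makes the difference between the levels easy to inspect. A sketch (our naming: `upsilon`, with `is_sat` taking the bit tuple (u_0, ..., u_{n-1})):

```python
from itertools import product

def upsilon(i, n, is_sat):
    """Evaluate Y_i = forall u_i..u_{n-1} exists u_0..u_{i-1}
    IsSAT(u_{n-1}, ..., u_0).  is_sat takes (u_0, ..., u_{n-1})."""
    return all(
        any(is_sat(lo + hi) for lo in product((0, 1), repeat=i))
        for hi in product((0, 1), repeat=n - i)
    )
```

In particular, upsilon(0, n, is_sat) holds iff IsSAT holds for every clause address, matching the observation that Υ_0 captures satisfiability of the SuccinctSAT instance, while upsilon(i, n, is_sat) for i > 0 only demands a witness on the low-order bits.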
Corollary 33.
We can reach the assert false assertion in the c_assert gadget ⟺ Υ_0 is true. This gives us the main theorem.
Theorem 34.
The verification of safety properties for parameterized systems of the class env(nocas, acyc) ∥ dis(nocas) under RA is NEXPTIME-hard.
Atomic CAS operations are indispensable for most practical implementations of distributed protocols; yet, they hinder verification efforts. The undecidability of safety verification in the non-parameterized setting [1], and even in the loop-free parameterized setting env(acyc), is a testament to this.
We tried to reconcile the two by studying the effects of allowing restricted access to CAS operations in parameterized systems. Systems which prevent the env threads from performing CAS operations and allow only either (1) loop-free dis programs or (2) loop-free dis programs along with a single ('ego') program with loops lead to accessible complexity bounds. The simplified semantics based on a timestamp abstraction provides the infrastructure for these results. The PSPACE-hardness gives an insight into the core complexity of RA (PureRA) that stems from the consistency mechanisms of view-joining and timestamp comparison.
We conclude with some interesting avenues for future work. A problem arising from this work is the decidability of CAS-free parameterized systems, env(nocas) ∥ dis_1(nocas) ∥ · · · ∥ dis_n(nocas), which seems to be as elusive as its non-parameterized twin dis_1(nocas) ∥ · · · ∥ dis_n(nocas). We believe that the ideas in this paper can be adapted to causally consistent shared-memory models [50], as well as to transactional programs [15], in the parameterized setting. On the practical side, the Datalog encoding suggests the development of a tool, considering that Horn-clause solvers are state-of-the-art in program verification.

References

P. A. Abdulla, J. Arora, M. F. Atig, and S. N. Krishna. Verification of programs under the release-acquire semantics. In
PLDI , pages 1117–1132. ACM, 2019. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, Egor Derevenetc, CarlLeonardsson, and Roland Meyer. On the state reachability problem for concurrent programsunder power. In Chryssis Georgiou and Rupak Majumdar, editors,
Networked Systems - 8thInternational Conference, NETYS 2020, Marrakech, Morocco, June 3-5, 2020, Proceedings ,volume 12129 of
Lecture Notes in Computer Science , pages 47–59. Springer, 2020. doi:10.1007/978-3-030-67087-0\_4 . Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, K. Narayan Kumar, andPrakash Saivasan. Deciding reachability under persistent x86-tso.
Proc. ACM Program. Lang. ,5(POPL):1–32, 2021. doi:10.1145/3434337 . Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. Aload-buffer semantics for total store ordering.
Log. Methods Comput. Sci., 14(1), 2018. doi:10.23638/LMCS-14(1:9)2018. Parosh Aziz Abdulla, Mohamed Faouzi Atig, and Rojin Rezvan. Parameterized verification under TSO is PSPACE-complete.
Proc. ACM Program. Lang. , 4(POPL), December 2019. doi:10.1145/3371094 . Parosh Aziz Abdulla and Bengt Jonsson. Verifying programs with unreliable channels. In
Proceedings of the Eighth Annual Symposium on Logic in Computer Science (LICS ’93),Montreal, Canada, June 19-23, 1993 , pages 160–170. IEEE Computer Society, 1993. doi:10.1109/LICS.1993.287591 . Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and Phillip W. Hutto. Causalmemory: Definitions, implementation, and programming.
Distributed Comput. , 9(1):37–49,1995. doi:10.1007/BF01784241 . Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,testing, and data mining for weak memory.
ACM Trans. Program. Lang. Syst. , 36(2), July2014. doi:10.1145/2627752 . Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,testing, and data mining for weak memory.
ACM Trans. Program. Lang. Syst. , 36(2):7:1–7:74,2014. Rajeev Alur, Kenneth L. McMillan, and Doron A. Peled. Model-checking of correctnessconditions for concurrent objects.
Inf. Comput. , 160(1-2):167–188, 2000. Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi.On the verification problem for weak memory models. In Manuel V. Hermenegildo and JensPalsberg, editors,
Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principlesof Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010 , pages 7–18.ACM, 2010. doi:10.1145/1706299.1706303 . Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi.What’s decidable about weak memory models? In Helmut Seidl, editor,
ProgrammingLanguages and Systems - 21st European Symposium on Programming, ESOP 2012, Held asPart of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012,Tallinn, Estonia, March 24 - April 1, 2012. Proceedings , volume 7211 of
Lecture Notes inComputer Science , pages 26–46. Springer, 2012. doi:10.1007/978-3-642-28869-2\_2 . A. R. Balasubramanian, Nathalie Bertrand, and Nicolas Markey. Parameterized verification ofsynchronization in constrained reconfigurable broadcast networks. In Dirk Beyer and MariekeHuisman, editors,
Tools and Algorithms for the Construction and Analysis of Systems - 24thInternational Conference, TACAS 2018, Held as Part of the European Joint Conferenceson Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018,Proceedings, Part II , volume 10806 of
Lecture Notes in Computer Science , pages 38–54. Springer,2018. doi:10.1007/978-3-319-89963-3\_3 . Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizingc++ concurrency.
SIGPLAN Not. , 46(1):55–66, January 2011. URL: http://doi.acm.org/10.1145/1925844.1926394 , doi:10.1145/1925844.1926394 . Sidi Mohamed Beillahi, Ahmed Bouajjani, and Constantin Enea. Robustness against trans-actional causal consistency. In , pages 30:1–30:18, 2019. doi:10.4230/LIPIcs.CONCUR.2019.30 . Nathalie Bertrand, Patricia Bouyer, and Anirban Majumdar. Reconfiguration and MessageLosses in Parameterized Broadcast Networks. In Wan Fokkink and Rob van Glabbeek, editors, , volume 140 of
LeibnizInternational Proceedings in Informatics (LIPIcs) , pages 32:1–32:15, Dagstuhl, Germany, 2019.Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. URL: http://drops.dagstuhl.de/opus/volltexte/2019/10934 , doi:10.4230/LIPIcs.CONCUR.2019.32 . Nikolaj Bjørner, Arie Gurfinkel, Ken McMillan, and Andrey Rybalchenko. Horn clause solversfor program verification. In
Fields of Logic and Computation II , pages 24–51. Springer, 2015. Nikolaj Bjørner, Ken McMillan, and Andrey Rybalchenko. On solving universally quantifiedHorn clauses. In
SAS , volume 7935 of
LNCS , pages 105–125. Springer, Springer, 2013. Roderick Bloem, Swen Jacobs, Ayrat Khalimov, Igor Konnov, Sasha Rubin, Helmut Veith, andJosef Widder.
Decidability of Parameterized Verification . Synthesis Lectures on DistributedComputing Theory. Morgan & Claypool Publishers, 2015. Roderick Bloem, Swen Jacobs, Ayrat Khalimov, Igor Konnov, Sasha Rubin, Helmut Veith,and Josef Widder. Decidability in parameterized verification.
SIGACT News , 47(2):53–64,2016. doi:10.1145/2951860.2951873 . Ahmed Bouajjani, Michael Emmi, Constantin Enea, and Jad Hamza. On reducing linearizabilityto state reachability. In Magnús M. Halldórsson, Kazuo Iwama, Naoki Kobayashi, and BettinaSpeckmann, editors,
Automata, Languages, and Programming - 42nd International Colloquium,ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part II , volume 9135 of
Lecture Notesin Computer Science , pages 95–107. Springer, 2015. doi:10.1007/978-3-662-47666-6\_8 . Ahmed Bouajjani, Constantin Enea, Rachid Guerraoui, and Jad Hamza. On verifying causalconsistency. In Giuseppe Castagna and Andrew D. Gordon, editors,
Proceedings of the44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017,Paris, France, January 18-20, 2017 , pages 626–638. ACM, 2017. URL: http://dl.acm.org/citation.cfm?id=3009888 . Ahmed Bouajjani, Constantin Enea, and Jad Hamza. Verifying eventual consistency ofoptimistic replication systems. In Suresh Jagannathan and Peter Sewell, editors,
The 41stAnnual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,POPL ’14, San Diego, CA, USA, January 20-21, 2014 , pages 285–296. ACM, 2014. doi:10.1145/2535838.2535877 . Stefano Ceri, Georg Gottlob, and Letizia Tanca. Syntax and semantics of datalog. In
LogicProgramming and Databases , pages 77–93. Springer, 1990. Peter Chini, Roland Meyer, and Prakash Saivasan. Complexity of liveness in parameterizedsystems. In Arkadev Chattopadhyay and Paul Gastin, editors, , volume 150 of
LIPIcs , pages 37:1–37:15. SchlossDagstuhl - Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.FSTTCS.2019.37 . Peter Chini, Roland Meyer, and Prakash Saivasan. Liveness in broadcast networks. InMohamed Faouzi Atig and Alexander A. Schwarzmann, editors,
Networked Systems - 7thInternational Conference, NETYS 2019, Marrakech, Morocco, June 19-21, 2019, RevisedSelected Papers , volume 11704 of
Lecture Notes in Computer Science , pages 52–66. Springer,2019. doi:10.1007/978-3-030-31277-0\_4 . Peter Chini, Roland Meyer, and Prakash Saivasan. Fine-grained complexity of safety verifica-tion.
J. Autom. Reason. , 64(7):1419–1444, 2020. doi:10.1007/s10817-020-09572-x . Edmund M. Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checkingusing satisfiability solving.
Formal Methods Syst. Des. , 19(1):7–34, 2001. Giorgio Delzanno, Arnaud Sangnier, Riccardo Traverso, and Gianluigi Zavattaro. On thecomplexity of parameterized reachability in reconfigurable broadcast networks. In DeepakD’Souza, Telikepalli Kavitha, and Jaikumar Radhakrishnan, editors,
IARCS Annual Conferenceon Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012,December 15-17, 2012, Hyderabad, India , volume 18 of
LIPIcs , pages 289–300. Schloss Dagstuhl- Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPIcs.FSTTCS.2012.289 . Giorgio Delzanno, Arnaud Sangnier, and Gianluigi Zavattaro. Verification of ad hoc networkswith node and communication failures. In Holger Giese and Grigore Rosu, editors,
FormalTechniques for Distributed Systems - Joint 14th IFIP WG 6.1 International Conference,FMOODS 2012 and 32nd IFIP WG 6.1 International Conference, FORTE 2012, Stockholm,Sweden, June 13-16, 2012. Proceedings , volume 7273 of
Lecture Notes in Computer Science, pages 235–250. Springer, 2012. doi:10.1007/978-3-642-30793-5_15. Antoine Durand-Gasselin, Javier Esparza, Pierre Ganty, and Rupak Majumdar. Model checking parameterized asynchronous shared-memory systems. In Daniel Kroening and Corina S. Pasareanu, editors,
Computer Aided Verification - 27th International Conference, CAV 2015,San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I , volume 9206 of
Lecture Notesin Computer Science , pages 67–84. Springer, 2015. doi:10.1007/978-3-319-21690-4\_5 . Javier Esparza, Alain Finkel, and Richard Mayr. On the verification of broadcast protocols.In ,pages 352–359, 1999. doi:10.1109/LICS.1999.782630 . Javier Esparza, Pierre Ganty, and Rupak Majumdar. Parameterized verification of asynchron-ous shared-memory systems.
J. ACM , 63(1):10:1–10:48, 2016. doi:10.1145/2842603 . Philippe Flajolet, Jean-Claude Raoult, and Jean Vuillemin. The number of registers requiredfor evaluating arithmetic expressions.
Theoretical Computer Science , 9(1):99–125, 1979. Cormac Flanagan, Stephen N. Freund, and Shaz Qadeer. Thread-modular verification forshared-memory programs. In
ESOP , volume 2305 of
LNCS , pages 262–277. Springer, 2002. Marie Fortin, Anca Muscholl, and Igor Walukiewicz. Model-checking linear-time properties ofparametrized asynchronous shared-memory pushdown systems. In Rupak Majumdar and ViktorKuncak, editors,
Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 155–175. Springer, 2017. doi:10.1007/978-3-319-63390-9_9.
Hana Galperin and Avi Wigderson. Succinct representations of graphs. Inf. Control., 56(3):183–198, 1983. doi:10.1016/S0019-9958(83)80004-7.
Georg Gottlob and Christos Papadimitriou. On the complexity of single-rule datalog queries. Information and Computation, 183(1):104–122, 2003.
Matthew Hague. Parameterised pushdown systems with non-atomic writes. In
IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2011, December 12-14, 2011, Mumbai, India, pages 457–468, 2011. doi:10.4230/LIPIcs.FSTTCS.2011.457.
Jad Hamza. On the complexity of linearizability. In Ahmed Bouajjani and Hugues Fauconnier, editors, Networked Systems - Third International Conference, NETYS 2015, Agadir, Morocco, May 13-15, 2015, Revised Selected Papers, volume 9466 of Lecture Notes in Computer Science, pages 308–321. Springer, 2015. doi:10.1007/978-3-319-26850-7_21.
Mengda He, Viktor Vafeiadis, Shengchao Qin, and João F. Ferreira. GPS+: Reasoning about fences and relaxed atomics.
Int. J. Parallel Program., 46(6):1157–1183, 2018.
Maurice Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1):124–149, 1991. doi:10.1145/114005.102808.
Maurice Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
Neil Immerman. Descriptive Complexity. Springer Science & Business Media, 2012.
Bishoksan Kafle, John P. Gallagher, and Pierre Ganty. Solving non-linear Horn clauses using a linear Horn clause solver. arXiv preprint arXiv:1607.04459, 2016.
Vineet Kahlon. Parameterization as abstraction: A tractable approach to the dataflow analysis of concurrent programs. In
Proceedings of the Twenty-Third Annual IEEE Symposium on Logic in Computer Science, LICS 2008, 24-27 June 2008, Pittsburgh, PA, USA, pages 181–192, 2008. doi:10.1109/LICS.2008.37.
Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and Viktor Vafeiadis. Strong logic for weak memory: Reasoning about release-acquire consistency in Iris. In Peter Müller, editor, ECOOP 2017, volume 74 of LIPIcs, pages 17:1–17:29. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ECOOP.2017.17.
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising semantics for relaxed-memory concurrency. In
POPL, pages 175–189. ACM, 2017.
Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. Effective stateless model checking for C/C++ concurrency. Proc. ACM Program. Lang., 2(POPL):17:1–17:32, 2018. doi:10.1145/3158105.
Ori Lahav and Udi Boker. Decidable verification under a causally consistent shared memory. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, pages 211–226. ACM, 2020. doi:10.1145/3385412.3385966.
Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. Taming release-acquire consistency. In Rastislav Bodík and Rupak Majumdar, editors,
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 649–662. ACM, 2016. doi:10.1145/2837614.2837643.
Ori Lahav and Viktor Vafeiadis. Owicki-Gries reasoning for weak memory models. In
ICALP, volume 9135 of LNCS, pages 311–323. Springer, 2015.
Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Computers, 28(9):690–691, 1979.
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Ted Wobber and Peter Druschel, editors, SOSP, pages 401–416. ACM, 2011. URL: http://dblp.uni-trier.de/db/conf/sosp/sosp2011.html.
Roland Meyer and Sebastian Wolff. Pointer life cycle types for lock-free data structures with memory reclamation.
Proc. ACM Program. Lang., 4(POPL):68:1–68:36, 2020.
S. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Informatica, 6:319–340, 1976.
Christos H. Papadimitriou and Mihalis Yannakakis. A note on succinct representations of graphs. Inf. Control., 71(3):181–185, 1986. doi:10.1016/S0019-9958(86)80009-2.
Anton Podkopaev, Ilya Sergey, and Aleksandar Nanevski. Operational aspects of C/C++ concurrency. CoRR, abs/1606.01400, 2016. URL: http://arxiv.org/abs/1606.01400, arXiv:1606.01400.
Azalea Raad, Ori Lahav, and Viktor Vafeiadis. On parallel snapshot isolation and release/acquire consistency. In
Programming Languages and Systems - 27th European Symposium on Programming, ESOP 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, pages 940–967, 2018. doi:10.1007/978-3-319-89884-1_33.
Anu Singh, C. R. Ramakrishnan, and Scott A. Smolka. Query-based model checking of ad hoc network protocols. In Mario Bravetti and Gianluigi Zavattaro, editors, CONCUR 2009 - Concurrency Theory, 20th International Conference, CONCUR 2009, Bologna, Italy, September 1-4, 2009. Proceedings, volume 5710 of Lecture Notes in Computer Science, pages 603–619. Springer, 2009. doi:10.1007/978-3-642-04081-8_40.
Salvatore La Torre, Anca Muscholl, and Igor Walukiewicz. Safety of parametrized asynchronous shared-memory systems is almost always decidable. In CONCUR 2015, pages 72–84, 2015. doi:10.4230/LIPIcs.CONCUR.2015.72.
Aaron Turon, Viktor Vafeiadis, and Derek Dreyer. GPS: navigating weak memory with ghosts, protocols, and separation. In
OOPSLA, pages 691–707. ACM, 2014.
Viktor Vafeiadis and Chinmay Narayan. Relaxed separation logic: a program logic for C11 concurrency. In OOPSLA, pages 867–884. ACM, 2013.
M. Vardi. The complexity of relational database queries. In Proc. STOC, pages 137–146, 1982.
Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44, 2009. doi:10.1145/1435417.1435432.