Safety Verification of Parameterized Systems under Release-Acquire
Adwait Godbole
University of California at Berkeley, [email protected]
Shankara Narayanan Krishna
IIT Bombay, [email protected]
Roland Meyer
Institute of Theoretical Computer Science, Braunschweig, [email protected]
Abstract
We study the safety verification problem for parameterized systems under the release-acquire (RA) semantics. It has been shown that the problem is intractable for systems with unlimited access to atomic compare-and-swap (CAS) instructions. We show that, from a verification perspective where approximate results help, this is overly pessimistic. We study parameterized systems consisting of an unbounded number of environment threads executing identical but CAS-free programs and a fixed number of distinguished threads that are unrestricted.
Our first contribution is a new semantics that considerably simplifies RA but is still equivalent for the above systems as far as safety verification is concerned. We apply this (general) result to two subclasses of our model. We show that safety verification is only PSPACE-complete for the bounded model checking problem where the distinguished threads are loop-free. Interestingly, we can still afford the unbounded environment. We show that the complexity jumps to NEXPTIME-complete for thread-modular verification where an unrestricted distinguished 'ego' thread interacts with an environment of CAS-free threads plus loop-free distinguished threads (as in the earlier setting). Besides the usefulness for verification, the results are strong in that they delineate the tractability border for an established semantics.
Concurrency, Verification
Keywords and phrases release acquire, parameterized systems
Release-acquire (RA) is a popular fragment of C++11 [14] (in which reads are annotated by acquire and writes by release) that strikes a good balance between programmability and performance and has received considerable attention (see e.g., [8, 41, 47, 49, 51, 53, 60, 63, 64]). The model is not limited to concurrent programs, though. RA has tight links [52] with causal consistency (CC) [7], a prominent consistency guarantee in distributed databases [55]. Common to RA implementations and distributed databases is that they tend to offer functionality to multi-threaded client programs, be it means of synchronization or access to shared data.
We are interested in verifying such implementations on top of RA. For verification, we can abstract the client program to invocations of the offered functionality [20]. The result is a so-called instance of the implementation in which concurrent threads execute the code of interest. There is a subtlety. As the RA implementation should be correct for every client, we cannot fix the instance to be verified. We have to prove correctness irrespective of the number of threads executing the code. This is the classical formulation of a parameterized system as it has been studied over the last 35 years [20].
We are interested in the decidability and complexity of safety verification for parameterized programs under RA. The goal is to identify expressive classes of programs for which the problem is tractable. There are good arguments in favor of this agenda. From a pragmatic point of view, even if the implementation at hand does not fall into one of the classes identified, we may hope for a reasonably precise encoding. From a conceptual point of view, tractability of verification is linked to programmability, and understanding the complexity may lead to suggestions for better consistency notions [50] or programming guidelines, e.g. in the form of type systems [56]. Safety verification is a good fit for linearizability [43], the de-facto standard correctness condition for concurrency libraries, and has to be settled before going to more complicated notions.
To explain the challenges of parameterized verification under RA, it will be helpful to have an understanding of how to program under RA. The slogan of RA is never read "overwritten" values [52]. Assume we have shared variables g and d, initially 0, and a thread first stores 1 to d and then 1 to g. Assume a second thread reads the 1 from g. Under RA, that thread can no longer claim d = 0. Formulated axiomatically [9], the reads-from, modification order, program order, and from-read relations should be acyclic [52]. While less concise, there are operational formulations of RA that make explicit information about the computation, which will be useful for our development [47, 48, 59]. The mechanism is as follows. Program and modification order are encoded as natural numbers, called timestamps. Each thread stores locally a view object, a map from shared variables to timestamps. This map reflects the thread's progress in terms of seeing (or, as above, hearing from) stores to a shared variable. The communication is organized in a way that achieves the desired acyclicity.
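As a concrete preview of the view mechanism (the dictionary-based views and the message pool below are our own minimal encoding, not the paper's formal definitions), the following sketch replays the g/d example:

```python
# Minimal sketch (our own encoding) of views as maps from shared
# variables to timestamps, replaying the g/d example: a thread that
# has read g = 1 can no longer read the overwritten d = 0.

def join(vw1, vw2):
    # pointwise maximum of two views
    return {x: max(vw1[x], vw2[x]) for x in vw1}

# The memory pool: messages (variable, value, view); timestamp 0 marks
# the initial messages.
memory = [("d", 0, {"d": 0, "g": 0}), ("g", 0, {"d": 0, "g": 0})]

# Thread 1 stores 1 to d and then 1 to g, raising the timestamp of the
# written variable each time (program order becomes visible in views).
vw1 = {"d": 0, "g": 0}
vw1 = {**vw1, "d": 1}; memory.append(("d", 1, dict(vw1)))
vw1 = {**vw1, "g": 1}; memory.append(("g", 1, dict(vw1)))

# Thread 2 loads g = 1; the load joins its view with the message view.
vw2 = {"d": 0, "g": 0}
(_, _, msg_view) = next(m for m in memory if m[0] == "g" and m[1] == 1)
vw2 = join(vw2, msg_view)

# Now the initial d-message is outdated for thread 2 (timestamp
# 0 < vw2["d"] = 1): every d-message it may still load carries 1.
loadable_d = [d for (x, d, mvw) in memory if x == "d" and mvw["d"] >= vw2["d"]]
assert loadable_d == [1]
```

The join of views is what forbids reading the overwritten d = 0 after having seen g = 1, exactly the acyclicity guarantee described above.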
Store instructions generate messages that decorate the variable-value pair by a view. This view is the one held by the thread, except that the timestamp of the variable being written is raised to a strictly higher value. The shared memory is implemented as a pool to which the generated messages are added and in which they remain forever. When loading a message from the pool, the timestamp of the variable given by the message must be at least the timestamp in the thread. The views are then joined so that the receiver cannot load values older than what the sender has seen.
The timestamps render the RA semantics infinite-state, which makes algorithmic verification difficult. Indeed, the problem of solving safety verification under RA in a complete way has been studied very recently in the non-parameterized setting and proven to be undecidable even for programs with finite control flow and finite data domains [1]. With this insight, [1] proposes to give up completeness and shows how to encode an under-approximation of the safety verification problem into sequential consistency [54]. Lahav and Boker [50] drew a different conclusion. They proposed strong release-acquire (SRA) as a new consistency guarantee under which safety verification is decidable for general non-parameterized programs. Unfortunately, the lower bound is again non-primitive recursive. Also the related problem of checking CC itself for a given implementation has been studied. It is undecidable in general, but EXPSPACE-complete under the assumption of data independence [22].
To sum up, despite recent efforts [1, 22, 50] we are missing an expressive class of programs for which the safety verification problem under RA is tractable. The parameterized verification problem has not been studied.
Problem Statement. The parameterized systems of interest have the form env ∥ dis_1 ∥ ··· ∥ dis_n. We have a fixed number of distinguished threads, collectively referred to as dis and executing programs c_dis^1, ..., c_dis^n, respectively.
Moreover, we have an environment consisting of arbitrarily many threads executing the same program c_env. We obtain an instance of the system by also fixing the number of environment threads. The safety verification problem is as follows:
Safety Verification for Parameterized Systems: Given a parameterized system env ∥ dis_1 ∥ ··· ∥ dis_n, is there an instance of the system and a computation in that instance that reaches an assertion violation?
The complexity of the problem depends on the system class under consideration. We denote system classes by signatures of the form env(type_env) ∥ dis_1(type_1) ∥ ··· ∥ dis_n(type_n), where the types constrain the programs executed by the threads. The parameters are the structure of the control flow, which may be loop-free, denoted by acyc, and the instruction set, which may forbid the atomic compare-and-swap (CAS) command, denoted by nocas. We drop the type if no restriction applies. If a thread is not present, we do not mention it in the signature. With this, dis_1(acyc) ∥ dis_2(nocas) ∥ dis_3 is a non-parameterized system (without env threads) having three dis threads executing a loop-free c_dis^1, a CAS-free c_dis^2, and an unrestricted c_dis^3, respectively.
Justifying the Parameters.
In [1], the safety verification problem under RA has been shown to be undecidable for non-parameterized (env-free) systems from dis_1(nocas) ∥ dis_2(nocas) ∥ dis_3 ∥ dis_4 and non-primitive-recursive for systems from dis_1(nocas) ∥ dis_2(nocas). There are several conclusions to draw from this.
With distinguished threads, we cannot hope to arrive at a tractable verification problem. We take the bounded model checking [28] approach and consider loop-free code. Acyclic programs, however, are not very expressive. Fortunately, RA implementations tend to be parameterized, and, as we will see, this frees us from the acyclicity restriction. The fact that parameterization simplifies verification has been observed in various works [5, 33, 39, 46, 62] that we discuss below.
Restricting the use of CAS requires an explanation. The class env of unconstrained environment threads enables what we call leader isolation: an env thread can distinguish itself from the others by acquiring a CAS-based lock. Even just t CAS operations allow for the isolation of t distinguished threads, which takes us back to the results of [1] for t = 2 resp. t = 4. Acyclicity will not help in this case; in Section 3 we show that safety verification for env(acyc) is undecidable.
Contributions. We state our main results and present the technical details in the later parts.
A Simplified Semantics. We consider parameterized systems of the form env(nocas) ∥ dis_1 ∥ ··· ∥ dis_n. Our first contribution is a simplified semantics (Section 4) that is equivalent with the standard RA semantics as far as safety verification is concerned. The simplified semantics uses the notion of timestamp abstraction, which allows us to be imprecise about the exact timestamps of the env threads. Note that we do not make any assumptions on the form of the distinguished threads but support cyclic control flow and CAS. So the result in particular applies to the intractable classes from [1], even when extended with a parameterized environment. Supporting CAS in the distinguished threads is important. Without it, there is no way to capture the optimistic synchronization strategies used in performance-critical programming [42].
We continue to apply the simplified semantics to prove tight complexity bounds for the safety verification problem in two particular cases of dis programs.
Loop-Free Setting.
In Section 5, we show a PSPACE upper bound for the safety verification problem of parameterized programs from env(nocas) ∥ dis_1(acyc) ∥ ··· ∥ dis_n(acyc). The class reflects the bounded model checking problem [28], which unrolls a given program into a loop-free under-approximation. Interestingly, we can squeeze into PSPACE the unbounded environment of cyclic threads. Our decision procedure is not only optimal complexity-wise, it also has the potential of being practical (we do not have experiments). We show how to encode the safety verification problem into the query evaluation problem for linear Datalog, the format supported by Horn-clause solvers [17, 18], a state-of-the-art backend in verification.
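To give a flavor of the target format (the generic reachability query below is our own illustration, not the paper's actual encoding), a linear Datalog program has at most one recursive predicate per rule body, e.g. reach(q') :- reach(q), edge(q, q'); its least fixpoint can be computed by a simple saturation loop:

```python
# Illustration (not the paper's encoding): evaluating the linear Datalog
# query  reach(q') :- reach(q), edge(q, q')  with the fact  reach(q0),
# by saturating the least fixpoint.
edges = {("q0", "q1"), ("q1", "q2"), ("q2", "q1")}
reach = {"q0"}                               # the base fact reach(q0)
changed = True
while changed:
    changed = False
    for (a, b) in edges:
        if a in reach and b not in reach:    # apply the linear rule
            reach.add(b)
            changed = True
assert reach == {"q0", "q1", "q2"}
```

Linearity (one recursive atom per body) is what keeps such queries amenable to the Horn-clause solvers mentioned above.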
Leader Setting.
We continue to show an NEXPTIME upper bound for env(nocas) ∥ dis_1(acyc) ∥ ··· ∥ dis_n(acyc) ∥ ldr in Section 6. These systems add an unconstrained distinguished thread, called the leader (denoted ldr), to the system from Section 5. The class is in the spirit of thread-modular verification techniques [35, 57], where the safety of a single 'ego' thread is verified when interacting with an environment.
We note that these results delineate the border of tractability: adding another dis thread results in a non-primitive-recursive lower bound [1], and adding CAS operations to env results in undecidability (Section 3).
Lower Bounds. Our last contributions are matching lower bounds for the two classes. Interestingly, they hold even in the absence of CAS. We show that the safety verification problem is PSPACE-hard already for env(nocas, acyc), while it is NEXPTIME-hard for env(nocas, acyc) ∥ ldr(nocas).
Related Work. There is a vast body of work on algorithmic verification under consistency models. Since our interest is in decidability and complexity, we focus on complete methods. We have already discussed the related work on RA and CC.
Other Consistency Models.
Atig et al. have shown that safety verification is decidable for assembly programs running on TSO, the consistency model of x86 architectures [11]. The result has been generalized to consistency models with non-speculative writes [12] and very recently to models with persistence [3]. It has also been generalized to parameterized programs executed by an unbounded number of threads [4]. Behind the decision procedures are (often drastic) reformulations of the semantics combined with well-structuredness arguments [6]. A notable exception is [5], showing that safety verification under TSO can be solved in PSPACE for CAS-free parameterized programs, called env(nocas) here. On the widely-used Power architecture, safety is undecidable [2].
The decidability and complexity of verification problems has also been studied for distributed databases and data structures. Enea et al. considered the problem of checking eventual consistency (EC) [66] of replicated databases and developed a surprising link to vector addition systems [23] that yields decidability and complexity results for the safety and liveness aspects of EC. For concurrent data structures, the default correctness criterion is linearizability wrt. a specification [43]. While checking linearizability is EXPSPACE-complete in general [10, 40], important data structures (for which the specification is then fixed) admit PSPACE algorithms [21].
Parameterized Systems with Asynchronous Communication. We exploit a pleasant interplay between the asynchronous communication in RA and the parameterization of our systems in the number of threads. Kahlon [46] was the first to observe that parameterization simplifies verification in the case of concurrent pushdowns. Hague [39] showed that safety verification remains decidable when adding a distinguished leader thread. Esparza, Ganty, and Majumdar studied the complexity of what is now called leader-contributor systems [33]. It is surprisingly low, NP-complete for systems of finite-state components and PSPACE-complete for systems of pushdowns. At the heart of their technique is the so-called copycat lemma.
The work has been generalized [62] to all classes of models that are closed under regular intersection and have a computable downward-closure. It has also been generalized to liveness verification [31, 36]. Finally, the study has been generalized to parameterized complexity, for safety [27] and liveness [25]. Our work is related in that the distinguished threads behave like a leader. Moreover, our simplified semantics relies on an infinite-supply property, the proof of which gives a copycat variant for RA. Our Datalog encoding is reminiscent of the notion of Strahler number [34].
Leader-contributor systems are closely related to broadcast networks [32, 61]. Also there, safety verification has been found to be surprisingly cheap, namely PTIME-complete [30]. For liveness verification, there was a gap between EXPSPACE and PTIME that was settled recently with a non-trivial polynomial-time algorithm [26]. What is new in broadcast networks and neither occurs in leader-contributor systems nor in our setting is the problem of reconfiguration [13, 16, 29].
A parameterized system consists of an unknown and potentially large number of threads, all running the same program. Threads compute locally over a set of registers and interact with each other by writing to and reading from a shared memory. The interaction with the shared memory is under the Release-Acquire (RA) semantics [48, 52, 59].

[Figure 1 belongs here: the inference rules for the thread-local transition relation (silent transitions, ST-local, LD-local, CAS-local) and the global transition relation (LD-global, ST-global, CAS-global, Unlabelled).]

Figure 1 Local transition relation: silent (thread-local) transitions (pink), shared-memory transitions (blue). Global transition relation (below, in green).
We model the individual threads in our system as (non-deterministic) sequential programs. Assume a standard while-language Com defined by:
c ::= skip | assume e(r̄) | assert false | r := e(r̄) | c; c | c ⊕ c | c* | r := x | x := r | cas(x, r_1, r_2)
The programs compute on (thread-local) registers r from the finite set Reg using assume, assert, assignments, sequential composition, non-deterministic choice, and iteration. Conditionals if and iteratives while can be derived from these operators, and we use them where convenient. The shared-memory variables x are accessed only by means of load, store, and compare-and-swap (CAS) operations, written r := x, x := r, and cas(x, r_1, r_2), respectively. These instructions are also referred to as events. We have a finite set Var of shared variables, and work with the data domain Dom = N. We do not insist on a shape of expressions e but require an interpretation [[e]] : Dom^n → Dom that respects the arity n of the expression.
We give the semantics of parameterized systems under release-acquire consistency. We opted for an operational [48, 59] over an axiomatic [52] definition, and follow [1]. What makes the operational definition attractive is that it comes with a notion of configuration or state of the system that we use to reason about computations. We first define thread-local configurations, then add the shared memory, and give the global transition relation.
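The derivation of if and while mentioned above can be made explicit; the tuple encoding of commands below is our own illustration, not the paper's:

```python
# Sketch (our own encoding of Com as nested tuples): conditionals and
# loops derived from assume, non-deterministic choice (⊕), and
# iteration (*), as indicated in the text.
def seq(c1, c2):    return ("seq", c1, c2)
def choice(c1, c2): return ("choice", c1, c2)      # c1 ⊕ c2
def star(c):        return ("star", c)             # c*
def assume(e):      return ("assume", e)

def if_then_else(e, not_e, c1, c2):
    # if e then c1 else c2  ≡  (assume e; c1) ⊕ (assume ¬e; c2)
    return choice(seq(assume(e), c1), seq(assume(not_e), c2))

def while_do(e, not_e, c):
    # while e do c  ≡  (assume e; c)*; assume ¬e
    return seq(star(seq(assume(e), c)), assume(not_e))

body = ("load", "r", "x")            # stands for the load r := x
loop = while_do("r != 0", "r == 0", body)
assert loop == ("seq",
                ("star", ("seq", ("assume", "r != 0"), body)),
                ("assume", "r == 0"))
```

The negated guard is passed explicitly since the grammar only offers assume over expressions, not a negation operator on commands.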
Local Configurations.
The RA semantics enforces a total order on all stores to the same variable that have been performed in the computation. We model these total orders by Time = N and refer to elements of Time as timestamps. Using the total orders, each thread keeps track of its progress in the computation. It maintains a view from View = Var → Time, a function that, for a shared variable x, returns the timestamp of the most recent event the thread has observed on x. Besides, the thread keeps track of the command to be executed next (which can be represented as a program counter) and the register valuation from RVal = Reg → Dom. The set of thread-local configurations is thus LCF = Com × RVal × View.
Unbounded Threads.
The number of threads executing in the system is not known a priori. As long as we restrict ourselves to safety properties, there are two ways of modeling this. One way is to define instance programs for a given number of threads, and then require correctness of all instances, as has been done in [19]. The alternative is to consider an infinite number of threads right away. We take the latter approach and define TID = N to be the set of thread identifiers. The thread-local configuration map then assigns a local configuration to each thread: LCFMap = TID → LCF.
Views.
The views maintained by the threads are used for synchronization. They determine where in the (appropriate) total order a thread can place a store and from which stores it can load a value. To achieve this, the shared memory consists of messages, which are variable-value pairs enriched by a view, with the form (x, d, vw): Msgs = Var × Dom × View.
Shared Memory. A memory state is a set of such messages, and we use Mem = 2^Msgs for the set of all memory states. With this, the set of all configurations of parameterized systems under release-acquire is CF = Mem × LCFMap.
Transitions.
To define the transition relation among configurations, we first give a thread-local transition relation among thread-local configurations, → ⊆ LCF × LAB × LCF, in Figure 1. Thread-local transitions may be labeled or unlabeled, indicated by LAB = {ε} ∪ ({ld, st, cas} × Msgs). The unlabeled transitions capture the control flow within a thread and properly handle assignments and assumes. They are standard. The message-labeled transitions capture the interaction of the thread with the shared memory. We elaborate on the load, store, and CAS transitions by which a thread with local view vw interacts with the shared memory.
Load. A load transition r := x picks a message (x, d, vw') from the shared memory, where d is the value stored in the message, and updates its register r with value d. The message should not be outdated, which means the timestamp of x in the message, vw'(x), should be at least the thread's current timestamp for x, vw(x). The timestamps of other variables do not influence the feasibility of the load transition. They are taken into account, however, when the load is performed. The thread's local view is updated by joining the thread's current view vw and vw', taking the maximum timestamp per address: (vw ⊔ vw') = λx. max(vw(x), vw'(x)).
Store. When a thread executes a store x := r, it adds a message (x, d, vw') to the memory, where d is the value held by the register r. The new thread-local view (and the message view), vw', is obtained from the current vw by increasing the timestamp of x. We use vw <_x vw' to mean vw(x) < vw'(x) and vw(y) = vw'(y) for all variables y ≠ x.
CAS. A CAS transition is a load and a store executed atomically. cas(x, r_1, r_2) has the intuitive meaning atomic { r := x; assume r = r_1; x := r_2 }. The instruction checks whether the shared variable x holds the value of r_1 and, in case it does, sets it to the value of r_2. The check and the assignment happen atomically.
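The load, store, and CAS steps described above can be prototyped as follows. This is a deliberately simplified executable sketch (our own class and method names, eliding the non-conflict checks and the rule format of Figure 1); in particular, our store always places the new message above all existing ones, whereas the formal semantics only requires a strictly higher timestamp than the thread's view.

```python
# Simplified sketch of the message pool and per-thread views.

def join(vw1, vw2):
    # pointwise maximum of two views
    return {x: max(vw1[x], vw2[x]) for x in vw1}

class Pool:
    def __init__(self, variables):
        init = {x: 0 for x in variables}
        # initial messages: value 0, timestamp 0 on every variable
        self.msgs = [(x, 0, dict(init)) for x in variables]

    def loadable(self, x, vw):
        # messages on x whose timestamp is not below the thread's view
        return [(d, mvw) for (y, d, mvw) in self.msgs
                if y == x and mvw[x] >= vw[x]]

    def load(self, x, vw):
        d, mvw = self.loadable(x, vw)[-1]   # one nondeterministic choice
        return d, join(vw, mvw)             # join the two views

    def store(self, x, d, vw):
        # simplification: strictly above every existing timestamp on x,
        # hence in particular strictly above the thread's own view
        ts = max(mvw[x] for (y, _, mvw) in self.msgs if y == x) + 1
        nvw = dict(vw)
        nvw[x] = ts
        self.msgs.append((x, d, dict(nvw)))
        return nvw

    def cas(self, x, expect, new, vw):
        # load + store atomically: the store timestamp must be adjacent
        # to the loaded one, so only the maximal message can be read
        y, d, mvw = max((m for m in self.msgs if m[0] == x),
                        key=lambda m: m[2][x])
        if d != expect or mvw[x] < vw[x]:
            return None                     # CAS not enabled
        nvw = join(vw, mvw)
        nvw[x] = mvw[x] + 1
        self.msgs.append((x, new, dict(nvw)))
        return nvw
```

For instance, after one thread stores 5 to x, a cas of x from 5 to 7 succeeds exactly once; a second attempt expecting 5 reads 7 at the maximal timestamp and is disabled.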
Under RA, this means the timestamp ts of the load and the timestamp ts' of the store involved in the CAS should be adjacent, ts' = ts + 1.
The transition relation among configurations, → ⊆ CF × TID × (Msgs ∪ {ε}) × CF, is defined in Figure 1. It is labeled by a thread identifier and possibly a message (if the transition interacts with the shared memory). The relation expects a thread t which performs the transition. In the case of local computations, there are no further requirements and the transition propagates to the configuration. In the case of loads, we require the memory to hold the message to be loaded. In the case of stores, the message to be stored should not conflict with the memory. In the case of CAS, we require both of the above, and that the two messages have consecutive timestamps. We defer the definition of non-conflicting messages for the moment, until we can give it in broader perspective.
Fix a parameterized system of interest c. The initial thread-local configuration is lcf_init = (c, rv_init, vw_init), where the register valuation assigns rv_init(r) = 0 to all registers and the view has vw_init(x) = 0 for all x ∈ Var. The initial configuration of the parameterized system is cf_init = (Mem_init, lcfm_init), with an initial memory Mem_init consisting of messages in which all shared variables store the value d_init ∈ Dom together with the initial view assigning timestamp 0 to all shared variables, and lcfm_init(t) = lcf_init for all threads. A computation (or a run) is a finite sequence of consecutive transitions ρ = cf_0 →^{(t_1, msg_1)} cf_1 →^{(t_2, msg_2)} ··· →^{(t_n, msg_n)} cf_n. The computation is initialized if cf_0 = cf_init. We use TS(ρ) for the set of all non-zero timestamps that occur in all configurations across all variables. We use TID(ρ) to refer to the set of thread identifiers labeling the transitions. For a set TID' ⊆ TID of thread identifiers, we use ρ↓TID' to project the computation to transitions from the given threads. With first(ρ) = cf_0 and last(ρ) = cf_n, we access the first/last configurations in the computation.
Example 1.
Consider the program given in Figure 2, which implements a simplified version of Dekker's mutual exclusion protocol for two threads. There are two shared variables x and y. Both x and y are initialized to 0, and at the instructions labeled λ_1 and λ_1' the registers r_1 and r_1' are initialized to 1. The first thread t_1 signals that it wants to enter the critical section by writing the value 1 to x. It then checks if thread t_2 has asked to enter the critical section by reading the value of y and storing it into the register r_2. The thread t_1 is allowed to enter the critical section only if the value stored in the register r_2 is 0. The second thread t_2 behaves in a symmetric manner.

Variables x and y have been initialized to 0.
Thread t_1:  λ_1: r_1 := 1;  λ_2: x := r_1;  λ_3: r_2 := y;  λ_4: if (r_2 == 0): critical section
Thread t_2:  λ_1': r_1' := 1;  λ_2': y := r_1';  λ_3': r_2' := x;  λ_4': if (r_2' == 0): critical section

Figure 2
On top, a simplified version of Dekker's mutual exclusion protocol. Below, a partial execution sequence under RA. The rectangles show the contents (messages) of the shared memory. Messages have three components: (1) the variable, (2) the value of the message, and (3) the message view, a map from {x, y} to the set of timestamps Time (N). The lines below show the thread-local state: instruction pointer, register valuation, and thread-local view.

Under Sequential Consistency (SC) [54], which is a stronger notion of consistency, the mutual exclusion property (i.e., at most one thread is in the critical section at any time) is preserved. However, this is not the case under the RA memory model. To see why, consider the execution sequence presented in Figure 2. At each instant, the figure shows where the instruction counter (i.e., the label of the next instruction to get executed) resides in each of the threads, along with the values of the registers. The black arrows with instruction labels λ_2, λ_2' show the evolution of the run on executing the instructions labeled λ_2 and λ_2', respectively. Let m_λ represent the memory obtained after executing the instruction labeled λ, and let msg_λ be the unique new message (if any) that is part of m_λ after the execution of the instruction labeled λ. The initial memory is m_init, where x and y have values and timestamps 0; msg_x and msg_y represent the messages in m_init corresponding to x and y. The execution of the instruction labeled λ_2 results in the addition of a new message msg_{λ_2} to the memory whose timestamp (10) is higher than 0 (the current timestamp of the variable x for t_1). The view of t_1 is then updated wrt. the variable x. Likewise, the execution of the instruction labeled λ_2' results in the addition of a new message msg_{λ_2'} to the memory with a higher timestamp (7), which updates the view of t_2 wrt. the variable y. The read instruction labeled λ_3 is then allowed to use the message msg_y to fetch the value of y, since the view of t_1 wrt. y is 0. Likewise, for the execution of the instruction labeled λ_3' in t_2, the message msg_x is used, since the view of t_2 wrt. x is 0. After these steps, both threads enter their respective critical sections.

We need a notion of conflict not only for messages, but also for memories, configurations, and computations. Two messages are non-conflicting, denoted (x, d, vw) ⋈ (x', d', vw'), if either their variables are different, x ≠ x', the timestamps are different, vw(x) ≠ vw'(x'), or the timestamps are zero, vw(x) = 0 = vw'(x'). Observe that initial messages do not conflict with any other message. Two memory states are non-conflicting, m ⋈ m', if for all msg ∈ m and all msg' ∈ m' we have msg ⋈ msg'. Two configurations are non-conflicting, cf ⋈ cf', if their memory states are non-conflicting. Two computations are non-conflicting, denoted ρ ⋈ ρ', if they use different threads and non-conflicting messages, TID(ρ) ∩ TID(ρ') = ∅ and last(ρ) ⋈ last(ρ').

env(acyc)

In this section, we establish the undecidability of the class env(acyc), that is, the class with loop-free env threads (which can execute arbitrarily many CAS operations) and without any dis threads. This result shows that even with the loop-free assumption, allowing env threads to perform CAS operations is in itself intractable from a safety verification viewpoint. Hence, the nocas restriction that we impose on env threads is a justified means of achieving tractability. In fact, we will show a stronger result.
We will show that we can transform a non-parameterized system consisting of n distinguished threads having the full instruction set and loops under RA (the class dis_1 ∥ ··· ∥ dis_n) into a parameterized system of the class env(acyc) such that control-state reachability is preserved. With this equivalence, the claim follows from the undecidability result of [1]. To show this, we take as input n programs {c_1, c_2, ..., c_n} and a failure state label λ_fail in some c_i (with possibly the full instruction set and loops) and transform them into a single program c_env with failure state λ_fail, the control flow of which is loop-free but which uses the full instruction set including CAS operations. We claim that the state label λ_fail is reachable in dis_1 ∥ ··· ∥ dis_n with dis_i executing c_i if and only if the state label λ_fail is reachable in the system env(acyc) with the environment threads executing c_env. Let the variable set, data domain, and register set of the original system be Var, Dom, and
Reg = {r^1, ..., r^k} as usual. We assume that the memory is initialized to 0 on all variables.
Converting a single program c_i to c'_i. We show how to convert one thread program c_i into a loop-free program c'_i, and then show how to combine all the programs together into a single loop-free program c_env. Consider, for some i, the program c_i. For the purposes of this construction, we assume that the program c_i has been specified as a transition system rather than in the while-language syntax. It is clear that both representations are equivalent and can be interconverted with only polynomial overhead. Hence we assume that c_i = (Q, ∆, ι), where Q is the set of control states, ∆ is the transition relation, and ι maps each transition to its corresponding instruction from {skip, assume e(r̄), r := e(r̄), r := x, x := r, cas(x, r_1, r_2)}. We transform c_i to a loop-free program as follows. Let Q = {q_0, ..., q_{|Q|−1}}, with q_0 as the initial state.
In this conversion, we add extra variables and values such that Var' = Var ⊎ {t_i, r_i^1, ..., r_i^{|Reg|}} and Dom' = Dom ∪ {0, ..., |Q| − 1, Λ⊥}, where ⊎ denotes disjoint union. Now we specify the new transition system c'_i, which needs to be loop-free. For each transition ∂ = (q_a, q_b) ∈ ∆, with source and end states q_a and q_b respectively and instruction ι(∂), we transform it into the following transition sequence (a CAS, followed by k = |Reg| load operations; the transition corresponding to ι(∂); then k store operations, ending with a CAS), denoted E(∂):

E(∂) = q_start --cas(t_i, a, Λ⊥)--> q_∂^1 --r^1 := r_i^1--> q_∂^2 ··· q_∂^k --r^k := r_i^k--> q_∂^{k+1} --ι(∂)--> q_∂^{k+2} --r_i^1 := r^1--> q_∂^{k+3} ··· q_∂^{2k+1} --r_i^k := r^k--> q_∂^{2k+2} --cas(t_i, Λ⊥, b)--> q_end

We construct E(∂) for each transition ∂ in ∆ to get the complete transition system. The initial/final (collectively called terminal) nodes of this transition system are q_start and q_end (which are common to all E(∂)). The internal states q_∂^j are all distinct across the E(∂) for different ∂. The transition system that we obtain has size O(|Reg||∆|) (each original transition from q_a to q_b is transformed into a sequence of 2|Reg| + 3 transitions between q_start and q_end). It is clearly loop-free. See Figure 3 for an example starting with dis_i ∥ dis_j.
Combining the individual c'_i. We construct programs c'_i as described above for each thread dis_i. Now we combine these individual programs into a single program c_env. We ensure that the newly added shared variables (t_i and the r_i^j for dis_i) are disjoint across threads. Hence the variable set is now Var' = Var ⊎ {t_1, ..., t_n} ⊎ {r_i^j}_{i ∈ [n], j ∈ [|Reg|]} (where Var is the original variable set that {c_1, c_2, ..., c_n} were operating with). Finally, the combined data domain is simply the union of the individual data domains (which possibly overlap). We combine the individual programs as c_env = c'_1 ⊕ c'_2 ⊕ ··· ⊕ c'_n, where ⊕ denotes non-deterministic choice. It is clear that c_env is loop-free. Additionally, |c_env| and the new |Dom| and |Var| are polynomial in Σ_i |c_i| and the previous |Var| and |Dom|.

Figure 3
Examples of two dis threads dis_i and dis_j executing programs c_i, c_j and the corresponding transformed programs c'_i, c'_j. The program c_i has 2 transitions while c_j has 3 transitions. Note how the read and write values of the CAS operations in the transformed programs match the transitions in the original programs.

We now prove that the system env(acyc), with the env threads executing the program c_env as defined above, respects the original system. We defer the precise notion of 'maintaining reachability' until a bit later, and first make some observations about the program c_env.

Locking/unlocking of c'_i. For any single transformed program c'_i, we note that at any given point, only one thread can be in an internal (not initial/final) state of c'_i. To see this, note the two atomic CAS operations flanking each path E(∂) in c'_i. All these CAS operations are on the same variable t_i, and moreover there are no other operations on t_i. Hence at any given point in time, there is only one message on t_i (the most recent write) that is available for a CAS operation. The value of this message dictates whether the operation will succeed. When it succeeds, the most recent write value changes to the value written by the CAS. Now note that t_i is initialized with value 0; hence initially one thread, say t, can perform a CAS and change the recent value to Λ⊥. There is no transition from q_start that performs a CAS reading Λ⊥. Hence all other threads are kept waiting until the recent value on t_i changes from Λ⊥. This is possible only when the initial thread t executes the final transition and reaches q_end, maintaining the claim. Hence these CAS operations play the role of a mutual-exclusion lock. But they perform another function too.

State transference.
We now know that for each i, only one thread may execute c'_i at any given time. However, the locking/unlocking operations using CAS also enable threads to transfer their state to their successors. There are three components to the state, which we handle in turn.

Control-state: Note that the recent value on variable t_i is v ≠ Λ⊥ only if the previous thread terminated after simulating some transition ending at q_v. Additionally, a locking CAS operation for E(∂) reads value v only if ∂ is a transition from q_v to some other state. Hence, it is guaranteed that the successor thread will execute some transition that emerges from the state where the previous thread left off. Note that this holds for the first thread as well, since the initial value on all variables is 0 and the initial state of the transition system is q_0.

View: The second component that we consider is the view. This too is transferred from a thread to its successor through the CAS operation. In particular, when a thread t executes the final CAS operation to reach q_end, it generates a message on t_i which is read by its successor. This read implies that the successor takes the join of its own (initial) view with that of the message, and hence accumulates exactly the view that the previous thread left with. So the view is transferred as well.

Register valuations: The previous thread t stores its register valuations in the shared variables r^j_i in the final sequence of store operations before terminating. These are then accessed by the successor thread through the initial sequence of load operations.

In this way we see that not only is mutual exclusion ensured, but the thread states are transferred from one thread to the next. Together, these sequences of threads simulate the entire run of the original dis_i in fragments. The above holds for all i ∈ [n]. Hence at any given point, there are at most n threads simulating the original ones. Now we formalize the notion of equivalence in reachability.
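The per-transition gadget described above (locking CAS, register loads, the simulated instruction, register stores, unlocking CAS) can be sketched in a few lines; the function name, the LAMBDA marker and the string encoding of instructions are ours, for illustration only:

```python
# Toy sketch of the gadget for a transition from state index a to state index b
# with instruction instr, where k = |Reg|. Not the paper's artifact.
def gadget(a, b, instr, k):
    seq = [f"cas(t_i, {a}, LAMBDA)"]                      # lock: consume source-state index a
    seq += [f"r{j} := r{j}_i" for j in range(1, k + 1)]   # restore predecessor's registers
    seq.append(instr)                                     # simulate the original instruction
    seq += [f"r{j}_i := r{j}" for j in range(1, k + 1)]   # hand registers to the successor
    seq.append(f"cas(t_i, LAMBDA, {b})")                  # unlock: publish target-state index b
    return seq

path = gadget(0, 3, "x := r1", k=2)
assert len(path) == 2 * 2 + 3   # 2|Reg| + 3 transitions, matching the size bound
```

The two CAS endpoints are what make the path both a mutual-exclusion lock and a hand-off of the simulated control state.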
We say that an original state of the threads {dis_1, ..., dis_n} is equivalent to a new state when the following hold: if the control states of threads {dis_1, ..., dis_n} are (q_{i_1}, q_{i_2}, ..., q_{i_n}) respectively, then in the new system with env,
the recent value of shared variable t_j is i_j,
the register valuation of each original dis_i is reflected in the most recent writes to the variables r^j_i (the value of r^j equals the most recent write to r^j_i) for each thread i,
the view of dis_i is the view stored in the most recent message on t_i (again projected on the original variable set), and
the global memory (projected) is identical across the original and env states.
We claim the following: a state in the original system can be reached if and only if some equivalent state in the new system can be reached. We prove this by induction. The base case is that all threads are in their initial states, registers and views, with the memory containing only initial messages (0 on each variable). This trivially satisfies the requirement, both in the forward and reverse directions.
Now for the inductive case (⇒). Assume the claim holds at some instant. Let some dis_i execute an instruction for the transition ∂. In the new system, we simulate this by a fresh env thread t taking the path corresponding to E(∂) in c'_i. By the observations above, the invariants for the thread-local state (control-state, register valuations and view) are maintained. Additionally, if dis_i wrote a message to the memory, then so can t. In particular, since the view of t is obtained from the CAS read, it matches that of dis_i. Hence the message added by t can have the same timestamps as the one added by dis_i.
Inductive case (⇐). The same argument works in the reverse direction. Assume that a pair of equivalent states has been reached. Now consider an env thread path E(∂) in c'_i where ∂ = (q_a, q_b). Then, by the induction hypothesis, dis_i is in state q_a in the original run.
Given the equality of thread and memory state initially, it too can take the transition (q_a, q_b). Once again, the invariant follows from the earlier observations. This gives a sketch of the proof. In particular, note that even though we give an equivalence between a control state in the original system and a variable value in the new system, this can easily be converted to an equivalence between control states themselves. This means that the reachability problem for dis_1 ‖ · · · ‖ dis_n can be converted to a reachability problem for env(acyc). This prompts us to restrict env threads to a reduced (CAS-free) instruction set and motivates the idea of modelling CAS instructions in a run via computations of the dis threads.

In this section, we propose a simplified semantics for the class of systems given by env(nocas) ‖ dis_1 ‖ · · · ‖ dis_n. The core of this result relies on the Infinite Supply Lemma, which shows that if some env thread could generate a message (x, val, vw), then a clone of that thread could generate a message (x, val, vw') with vw' = vw[x ↦ t] for some t > vw(x). There are two assumptions that the Infinite Supply Lemma, and hence our semantics simplification result, rely on:
arbitrarily many env threads executing identical programs;
the env threads do not have atomic instructions (CAS).
The first assumption allows us to have clone env threads that duplicate the computation and hence the messages generated in it. The second assumption is required for the duplicated computation to remain valid under RA. While performing the duplication, one must keep in mind the dependencies between stores and loads across threads. The fact that dis threads are not replicatable (their messages cannot be duplicated) adds to the challenge. To ensure that the clone threads can follow in the footsteps of the original computation, we require that dis messages can be read by the
env clones whenever they can be read by the original env threads. This necessitates that we respect the relative order of timestamps between env and dis threads.

We develop some intermediate concepts that help us in constructing a valid duplicate run. In order to accommodate the clone threads, we must make space (create unused timestamps) along Time for the clones to write messages. We do this via timestamp liftings. Having done this, we need to define how to combine the original computation with that of the clones. We develop the concept of superposition of computations for this. Finally, the Infinite Supply (of messages) Lemma shows how, using these two concepts, we can generate copies of messages with higher timestamps.

This 'duplication-at-will' of env messages means that we need not store the entire set of env messages produced: those with the smallest timestamps act as good representatives of the set. Additionally, when any thread reads from an env message, we need not be bothered about timestamp comparisons, since we can always generate a copy of that message with as high a timestamp as required. It is this observation that gives us the timestamp abstraction, and with it the simplified semantics.
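The "smallest timestamps as representatives" observation can be illustrated with a small sketch; the function name and the (variable, value, timestamp) tuple encoding are ours, not the paper's:

```python
# Sketch: keep only the smallest-timestamp representative of each env message,
# justified by duplication-at-will (higher-timestamp copies can be regenerated).
def representatives(env_msgs):
    best = {}
    for var, val, ts in env_msgs:
        key = (var, val)
        if key not in best or ts < best[key]:
            best[key] = ts
    return {(var, val, ts) for (var, val), ts in best.items()}

msgs = {("x", 1, 3), ("x", 1, 7), ("y", 2, 5)}
assert representatives(msgs) == {("x", 1, 3), ("y", 2, 5)}
```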
We now make these arguments precise. Our strategy is to split up the timestamps (hence the computation) and separate the part originating from the dis threads from the env part (which can be duplicated at will). We write ρ↓env and ρ↓dis to denote the projections of ρ to the env and dis threads respectively.

In our development we make use of timestamp transformations tf : Time → Time. We extend these to views vw via per-variable timestamp transformations tf = {tf_x}_{x ∈ Var}, where tf_x only transforms the timestamps for the variable x. The transformed view tf(vw) : Var → Time is defined by (tf(vw))(x) = tf_x(vw(x)) for every variable x. As an example, consider shared variables x, y and a view vw = [x ↦ 3, y ↦ 5]. Applying tf = {tf_x, tf_y}, where tf_x(0) = tf_y(0) = 0, tf_x(t) = t + 2 and tf_y(t) = t + 7 for t > 0, we obtain tf(vw) = [x ↦ 5, y ↦ 12].

RA-valid timestamp lifting. An
RA-valid timestamp lifting for a run ρ is a (per-variable) timestamp transformation M = {µ_x}_{x ∈ Var} satisfying two properties for each x ∈ Var: (1) it is strictly increasing with µ_x(0) = 0; for all t_1, t_2 ∈ ℕ with t_1 < t_2 we have µ_x(t_1) < µ_x(t_2); and (2) if there is a CAS operation on x with (load, store) timestamps (t, t + 1), then µ_x(t + 1) = µ_x(t) + 1, i.e. consecution of CAS timestamps is maintained. Note that M(cf_init) = cf_init. In the example above, tf is an RA-valid timestamp lifting. Lemma 2 says that the run M(ρ), obtained by modifying the timestamps of a valid run ρ with an RA-valid timestamp lifting M, is also valid under the RA semantics.

▶ Lemma 2 (Timestamp Lifting Lemma). Let M = {µ_x}_{x ∈ Var} be an RA-valid timestamp lifting. If ρ is a computation under RA, then so is M(ρ). Hence if a configuration cf is reachable under RA, then so is M(cf).

Proof.
This result follows since a timestamp lifting is just a relabelling of timestamps for each shared variable. The lemma relies on the following observations:
There are no timestamp comparisons across variables: vw(x) is never compared with vw(x') for x ≠ x'.
The relative order between timestamps on the same variable is preserved, due to the strictly increasing property. Additionally, µ_x(0) = 0, maintaining the timestamps of the init messages.
The (load, store) timestamps of CAS operations remain consecutive.
The lemma can be formally proven by induction on the length of the run. The base case is trivial, and the inductive case follows by showing that each instruction (read, write, CAS) that can be executed in ρ can also be executed in the lifted run M(ρ). ◀

The duplication of messages by the clone env threads requires us to copy computations and then merge them such that the RA semantics is not violated. This requires (1) the timestamps of the merged computations to not conflict, and (2) the reads-from dependencies between threads to be respected. With this in mind, we introduce the idea of superposition.
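The two conditions defining an RA-valid lifting (Lemma 2) can be checked mechanically; the following is a minimal sketch for a single variable, with function and argument names of our own choosing:

```python
# Sketch: check that mu is an RA-valid lifting on one variable, given the
# sorted timestamps used on that variable and the (load, store) pairs of CAS.
def is_ra_valid_lifting(mu, timestamps, cas_pairs):
    if mu(0) != 0:                               # init messages keep timestamp 0
        return False
    for t1, t2 in zip(timestamps, timestamps[1:]):
        if not (t1 < t2 and mu(t1) < mu(t2)):    # (1) strictly increasing
            return False
    # (2) CAS (load, store) = (t, t + 1) stays consecutive after lifting
    return all(mu(t + 1) == mu(t) + 1 for (t, _) in cas_pairs)

# tf_x(t) = t + 2 for t > 0 (as in the running example) is a valid lifting
# provided no CAS pair straddles the shift point
assert is_ra_valid_lifting(lambda t: 0 if t == 0 else t + 2, [0, 1, 2, 3], [(1, 2)])
```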
We define the superposition ρ . ρ' of two computations ρ, ρ' as the computation that first executes ρ and then ρ'. This requires us to combine the memory in last(ρ) with that of every configuration in ρ'. Moreover, the threads transitioning in ρ and ρ' must be disjoint. Given these considerations, the operation requires the computations to be non-conflicting (see Section 2.2.1), and is defined as follows: ρ . ρ' = ρ ; (last(ρ) + ρ'). The addition of a configuration cf to a computation ρ' = cf_0 --(t_1, msg_1)--> ... --(t_n, msg_n)--> cf_n yields the new computation cf + ρ' = (cf + cf_0) --(t_1, msg_1)--> ... --(t_n, msg_n)--> (cf + cf_n). Addition of configurations cf_1 = (m_1, lcfm_1) and cf_2 = (m_2, lcfm_2) is the configuration cf_1 + cf_2 = (m_1 ∪ m_2, lcfm), where lcfm(t) = lcfm_1(t) if lcfm_1(t) ≠ lcf_init, and lcfm(t) = lcfm_2(t) otherwise.

When ρ and ρ' are non-conflicting, we have: (1) for any thread t, if it has transitioned in ρ, then it cannot transition in ρ'; likewise, if it has not transitioned in ρ, then it can in ρ'; (2) last(ρ) and last(ρ') are non-conflicting, and since the memory in earlier configurations of ρ' is a subset of that in last(ρ'), the memory unions performed above involve non-conflicting memories. An initial configuration is neutral for addition; in particular, last(ρ) + first(ρ') = last(ρ). The operation of concatenation ρ ; ρ' expects two computations ρ and ρ' that satisfy last(ρ) = first(ρ') and returns the sequence consisting of the transitions in ρ followed by the transitions in ρ'. This need not be a valid computation under RA, but under the following conditions it is. Let Msgs(ρ) be the memory in last(ρ). Likewise, let Msgs(ρ↓dis) ⊆ Msgs(ρ) be the subset of the memory in last(ρ) that has been added by dis threads during ρ.

▶ Lemma 3 (Superposition). Consider valid computations ρ, ρ' of a parameterized system under RA such that ρ↓env and ρ'↓env are non-conflicting and Msgs(ρ↓dis) = Msgs(ρ'↓dis). Then the superposition ρ . ρ'↓env is a valid computation under RA.

Proof.
Since there are arbitrarily many env threads, we may take the env threads in ρ to be distinct from the env threads in ρ'. By doing so we ensure that the threads operating (changing state) in ρ and ρ'↓env are disjoint.

Now consider the global state obtained after executing ρ (which is a valid run under RA). By hypothesis, the memory contains the messages of ρ↓dis, which are identical to those of ρ'↓dis. After the execution of ρ is complete, we claim that we can execute ρ'↓env one step at a time. Whenever a dis thread loads from a message generated by an env thread in ρ', the same can happen in ρ . ρ'↓env. Likewise, the relative timestamps between the dis threads and the env threads in ρ' are the same, so ρ'↓env can be executed after ρ. Similarly, reads made by some env thread in ρ or ρ', whether from another env thread or from a dis thread, proceed in exactly the same way in ρ . ρ'↓env, since the messages added by dis threads are exactly the same in ρ and ρ', and the env threads are disjoint. These two points show that we have exactly the same reads-from dependencies (dis ↔ env in ρ, dis ↔ env in ρ') in ρ . ρ'↓env. Finally, all writes made by the respective env threads of ρ and ρ' can be made in ρ . ρ'↓env; likewise, all writes made by the dis threads in ρ can also be made in ρ . ρ'↓env. The reason is that ρ↓env and ρ'↓env are non-conflicting, and trivially, so are ρ↓dis and ρ'↓env. This ensures no conflict of write timestamps. Formally, the claim can be proven by induction on the length of ρ'. ◀

Now we develop the Infinite Supply Lemma. Recall that our goal is to generate arbitrarily many copies of env messages with the same variable and value but higher timestamps. Let us fix one such message, msg = (x, d, vw), for the discussion here and see how we can replicate it. Towards this end, consider a computation ρ in which it is generated.
We 'spread apart' the timestamps of Msgs(ρ) using timestamp liftings, so that we create 'holes' (unused timestamps) along Time. Then we generate copies of env threads, denoted copy(env) (possible since env threads can be replicated). The holes accommodate the timestamps of copy(env) and the (higher) timestamp of the copy of msg. Throughout this, we preserve the order of the timestamps of the env and copy(env) threads relative to those of the dis threads. This ensures that reads-from dependencies are maintained: copy(env) can read a dis message whenever env can do so.

We define the computation ρ̃ as a copy of ρ↓env executed by copy(env) threads. The write timestamps used by the copy(env) threads are the unoccupied timestamps generated by the timestamp lifting operation M(ρ). We show an example of this via a graphic. Let eT^i and dT^i respectively denote the timestamps chosen by env and dis along ρ (first row).

ρ :    init  dT^1  eT^1           dT^2  eT^2           eT^3
M(ρ) : init  dT^1  __    eT^1_a   dT^2  __    eT^2_a   __    eT^3_a
ρ̃ :    init  dT^1  eT^1_b  __     dT^2  eT^2_b  __     eT^3_b  __

The second row shows the lifted timestamps (with subscript a) of M(ρ) and the holes (marked __). The third row shows the holes being used by copy(env) for ρ̃ (these have subscript b). The construction guarantees that M(ρ) and ρ̃ are non-conflicting, so the superposition M(ρ) . ρ̃ is allowed. In this computation, ρ̃ generates a copy of msg, msg' = (x, val, vw'), with higher vw'(x). Additionally, since eT^i_a and eT^i_b have the same position relative to all dT^j timestamps, so will vw(y) and vw'(y) for y ≠ x.

Now we state the Infinite Supply Lemma. As helper notation, for a run ρ and each variable x, we denote the timestamps of stores of dis threads on x as ts^x_1 < ts^x_2 < · · · .

▶ Lemma 4 (Infinite Supply). Let ρ be a valid run under the RA semantics, in which the message (x, d, vw) has been generated by an env thread. Then for each timestamp t* ∈ ℕ, there exist two timestamp lifting functions M_1, M_2 and a run ρ' such that M_1(ρ) . M_2(ρ↓env) . ρ' is a valid run. This run contains a message (x, d, vw') satisfying (the ts^x_i come from ρ):
(1) ∀i: (t* ≤ ts^x_i ∧ vw(x) ≤ ts^x_i) ⟹ vw'(x) ≤ µ_x(ts^x_i)
(2) vw'(x) ≥ µ_x(t*)
(3) ∀x' ≠ x, ∀i: vw(x') ≤ ts^{x'}_i ⟹ vw'(x') ≤ µ_{x'}(ts^{x'}_i)

Proof.
Without loss of generality, we assume that in the run ρ the timestamps on each variable are consecutive. If that is not the case, we can always use a timestamp lowering operation that 'fills in the gaps' between non-consecutive timestamps, while maintaining consecution of the (load, store) timestamps of CAS operations. We give a constructive proof. We specify M_1(ρ) . M_2(ρ↓env) . ρ' by defining M_1, M_2 and ρ', and showing that the resulting run is valid under RA. Then we show how a copy of the message (x, d, vw) can be obtained as claimed. First, we describe how to copy runs.

Copying a run. For a variable x, we define the lifting functions as follows. With the consecutiveness assumption, the messages on x have consecutive timestamps and are generated by either dis or env; we denote the corresponding timestamps below as disT and envT respectively. For intuition, consider the following sequence of consecutive timestamps on some variable:

init disT disT envT disT envT envT disT

Intuitively, the new interleaved run is obtained by triplicating each envT timestamp into three adjacent timestamps envT_a, envT_b and envT_c. The envT_a timestamps belong to the lifted run M_1(ρ), the envT_b timestamps belong to M_2(ρ↓env), and the envT_c timestamps belong to ρ'. The a, b, c copies are ordered as b < c < a, giving us the following timestamp sequence from the one above:

init disT disT envT_b envT_c envT_a disT envT_b envT_c envT_a envT_b envT_c envT_a disT

We can formalize M_1 and M_2 by counting the number of disT timestamps smaller than the envT^i timestamp, but for ease of presentation we keep this implicit. The total shift can be done, for instance, using the function which maps a timestamp p ∈ ℕ belonging to an env thread to (number of dis timestamps before p) + 3·(number of env timestamps before p) + 3, while a dis timestamp p ∈ ℕ is mapped to (number of dis timestamps before p) + 3·(number of env timestamps before p) + 1. So, for instance, the envT at timestamp 3 above moves to the envT_a timestamp 5, while the last disT moves to the disT timestamp 3 + 3(3) + 1 = 13. M_1 maps the timestamp envT^i to the timestamp envT^i_a. Similarly, M_2 maps the envT^i timestamp to the timestamp envT^i_b = envT^i_a − 2. Finally, we have envT^i_c = envT^i_a − 1. M_1 and M_2 map each disT^i timestamp to the corresponding disT^i timestamp in the expanded run.

We first note that M_1 satisfies the premise of the Timestamp Lifting Lemma: consecutive (load, store) timestamps of CAS operations remain consecutive. This follows since only dis can perform CAS, and under M_1 consecution is maintained both for (disT^{i−1}, disT^i) pairs as well as for (envT_a, disT^i) pairs, as depicted in the following timestamp sequence (unused holes in parentheses). Thus, by the Timestamp Lifting Lemma, M_1(ρ) is a valid run under RA.

init disT disT (envT_b) (envT_c) envT_a disT (envT_b) (envT_c) envT_a (envT_b) (envT_c) envT_a disT

The first superposition gives a valid run. We now claim that the run M_1(ρ) . M_2(ρ↓env) is a valid run under RA. For an env thread t in ρ we denote by copyB(t) its 'copy' (with a distinct TID) in M_2(ρ↓env) (B since it occupies the b timestamps). We will show that copyB(t) copies the transitions that t took. For this we use the following invariant relating the view vw of thread t in ρ with the view vw' of copyB(t). Let TS(t) denote the set of timestamps used by thread t in a run ρ. For every shared variable x, if vw(x) ∈ TS(env), then vw'(x) = vw(x) −
2, else if vw(x) ∈ TS(dis) then vw'(x) = vw(x).
Now we can reason by induction on the length of the run that whenever t takes a transition in ρ, copyB(t) can replicate it, but with the view as given by the above invariant. More precisely, whenever an env thread t makes a store with timestamp envT, copyB(t) makes a store with timestamp envT − 2. Similarly, when an env thread t makes a load: (a) if the load is from a message by a dis thread, copyB(t) also loads from the same message; (b) if the load is from a message by some env thread t' in ρ, copyB(t) loads from copyB(t'). It is easy to check that the view invariant is maintained through this simulation. Crucially, we have envT^i_a < disT^j ⟺ envT^i_b < disT^j. Thus t and copyB(t) can always read the same set of dis messages. Hence M_1(ρ) . M_2(ρ↓env) is valid under RA. Now we focus on message generation by ρ'. Intuitively, ρ' will also be a copy of ρ, but will occupy the envT_c timestamps.

Generating a copy of the message. Now we describe how we can use ρ' to generate the message (x, d, vw'). Let ρ_p be the prefix of ρ just (one transition) before the message (x, d, vw) is generated. We generate the run ρ' by copying ρ_p↓env using the c-timestamps. The run obtained is M_1(ρ) . M_2(ρ↓env) . ρ'. By reasoning similar to the above, this is a valid run under the RA semantics. Now let the env thread t be the thread that generates the message (x, d, vw) in ρ. Then there is a copy of t, thread copyC(t) in ρ', that is now in a control state enabling it to generate a message on variable x with value d (since the transitions have been replicated exactly between t and copyC(t)). It remains to reason about the view of the message generated by copyC(t). If the view of t is vw_a and that of copyC(t) is vw_c, we have the following, which again follows from the invariant mentioned above. For each variable x', if vw_a(x') ∈ TS(env), then vw_c(x') = vw_a(x') − 1, else if vw_a(x') ∈ TS(dis) then vw_c(x') = vw_a(x').
Observe how this immediately satisfies condition (3) of the lemma, since in both cases we have vw_c(x') ≤ vw_a(x'). Now the thread copyC(t) chooses the timestamp vw'(x) for variable x. Assume t* ∈ ℕ has been given. We have two cases: (i) t* ≤ vw(x) and (ii) t* > vw(x).
(i) In this case there is nothing left to prove: the original message is lifted to (x, d, vw') with vw'(x) = µ_x(vw(x)) ≥ µ_x(t*), which satisfies both conditions (1) and (2).
(ii) We choose vw'(x) = µ_x(t*) + 1, which satisfies (2). Note that this timestamp is higher than vw_c(x), since µ_x(t*) ≥ µ_x(vw(x)) = vw_a(x) > vw_c(x). The timestamp vw'(x) = µ_x(t*) + 1 is a c-type timestamp and is hence left unused by M_1, M_2 and ρ'. Additionally, ts^x_i ≥ t* implies µ_x(ts^x_i) ≥ µ_x(t*), satisfying condition (1). In this case, ρ' is defined as the copy of ρ_p↓env extended by the store transition generating the message. Note that since it uses a c-type timestamp, the timestamp is available.
Thus in both cases we have a message (x, d, vw') with vw' satisfying the required conditions. This proves the lemma. ◀

To sum up, we interpret the infinite supply as follows: M_1(ρ) is the lifted run with holes, M_2(ρ↓env) is the copy(env) run, and ρ' is obtained by running another copy that generates the new message. We note that run triplication is not strictly necessary for message duplication, but it makes the proof easier. Points (1) and (3) above refer to the relative ordering between env and dis timestamps, and (2) refers to the new message having an arbitrarily high timestamp.

We now introduce the timestamp abstraction, which is a building block for the simplified semantics. Let us call a message msg an env (dis) message if it is generated by an env (dis) thread. With
Withthe intuition that env messages can be replicated with arbitrarily high timestamps, while dis or initial messages cannot be, we distinguish the write timestamps of the two types ofmessages. Timestamp Abstraction.
If an env thread has read a message (x, d, vw) from a dis thread with timestamp ts = vw(x) and has generated a message msg on x, then copies of msg are available with arbitrarily high timestamps, at least as high as ts. To capture this in our abstraction, we assign the env message msg a timestamp ts⁺ that is, by definition, larger than ts. We define the set of timestamps in the simplified semantics as ℕ ⊎ ℕ⁺, where ℕ⁺ contains, for each ts ∈ ℕ, a timestamp ts⁺. The timestamps are equipped with the order ≼ in which ts⁺ is greater than ts and smaller than ts + 1:

0 ≺ 0⁺ ≺ 1 ≺ 1⁺ ≺ 2 ≺ . . .

Timestamps of the form ts ∈ ℕ are used for the stores of dis threads, while those of the form ts⁺ are used for the stores of env threads. We allow multiple stores with the same timestamp of the form ts⁺, while allowing at most one store per timestamp of the form ts. This abstracts the timestamps of multiple env messages between two dis messages by a single ts⁺ timestamp. Initial messages have timestamp 0 as usual. We utilize this timestamp abstraction by defining a simplified semantics; note that this simplification is not per se a simpler formulation, but rather is simple in the sense that it paves the way for efficient verification procedures (as detailed in Sections 5 and 6). We then show that a run ρ in the classical RA semantics has an equivalent run in the simplified semantics, where the timestamps are transformed according to some timestamp transformation M as defined above. Reachability is preserved across the two semantics since both the order and the consecution of timestamps are maintained.

RA semantics, simplified.
As in the classical RA semantics, the transition rules of the simplified semantics require us to increase timestamps (upon writing messages). We define the function raise(−) on ℕ ⊎ ℕ⁺ by raise(ts) = raise(ts⁺) = ts⁺ for ts ∈ ℕ. The definition of the simplified semantics replaces the domain
Time by P = ℕ ⊎ ℕ⁺. We use the term abstract to refer to the resulting views, messages, memory, local configurations, and configurations, and use a superscript de (shorthand for dis/env) to indicate that an element is abstract. So an abstract view is a function vw^de that maps shared variables to P. We now specify the transitions in the abstract semantics. Owing to their different nature (one is replicatable, the other is not), the dis and env threads have different transition rules in the simplified semantics.

For storing a value, the env threads use a rule (ST-local-env) that coincides with rule (ST-local) from the RA semantics (Figure 1), except that it replaces the relation <_x by <^env_x, defined as follows: vw <^env_x vw' iff raise(vw(x)) ≼ vw'(x) ∈ ℕ⁺ and vw'(y) = vw(y) for y ≠ x. Additionally, for stores of env threads, we no longer require the timestamp of the message to be unused, so we disregard the msg ∉ m check in the global (ST-global) rule (crucially, this is for env only). The dis threads use (ST-local) from the RA semantics without modifications, and hence choose a timestamp in ℕ, not a raised value.

For load instructions, we distinguish between messages generated by dis and env threads. This is a natural consequence of the different nature of the timestamps: ts for dis and ts⁺ for env messages. For loading a dis message, we use rule (LD-local) (Figure 1) from the RA semantics without changes. For loading from env messages, we introduce a new rule (LD-local-env). It is defined by replacing the join ⊔ in (LD-local) by ⊔^env_x. We drop the check on the order of timestamps (overwrite it by true); an env message may always be read, independently of the reading thread's view. The join depends on the variable x being read. To define vw ⊔^env_x vw', let vw be the view of the thread loading the message and vw' be the view in the message.
vw ⊔^env_x vw' = (vw[x ↦ raise(vw(x))]) ⊔ vw'

Thus, if vw(x) = 4 and vw'(x) = 2⁺, then (vw ⊔^env_x vw')(x) = 4⁺. The update to raise(vw(x)) ensures that if the timestamp on x was ts, it is now at least ts⁺, and hence the thread cannot read a (dis) message with timestamp ts again. We note that the above join operation is not commutative.

Now we consider the atomic operation (CAS-local), which can only be performed by dis. We have two cases, depending on whether (CAS-local) loads from a dis or an env message. If it is the latter, then the transition is identical to ((LD-local-env); (ST-local)), with the additional condition that the load and store timestamps must be ts⁺ and ts + 1 for some ts. If it is the former (load from dis), then the load and store timestamps must be ts and ts + 1. Consequently, there cannot be any messages with timestamp ts⁺. Conversely, if there is (at least one) message with timestamp ts⁺, then the (CAS-local) operation with load and store timestamps ts and ts + 1 is forbidden. We keep track of such 'blocked' intervals (ts, ts + 1) by adding a set B to the global state in the simplified semantics. The global and local transition relations of the full simplified semantics are given in Figures 4 and 5.

The simplified semantics exactly captures reachability of the original semantics. Define α^de to be a function which drops all views from messages and local configurations, and define =^de as equality of local configurations modulo views.
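The abstract timestamp domain and the env join can be sketched concretely. The encoding below is ours (not the paper's): we represent ts as the pair (ts, 0) and ts⁺ as (ts, 1), so that Python's tuple order realizes 0 ≺ 0⁺ ≺ 1 ≺ 1⁺ ≺ . . . :

```python
# Sketch of N ⊎ N+ with the order ≼, raise(-), and the env join ⊔env_x.
def nat(ts):            # a dis timestamp ts ∈ N
    return (ts, 0)

def plus(ts):           # an env timestamp ts+
    return (ts, 1)

def raise_ts(t):        # raise(ts) = raise(ts+) = ts+
    return (t[0], 1)

def join_env(vw, vw_msg, x):
    """vw ⊔env_x vw_msg = (vw[x -> raise(vw(x))]) ⊔ vw_msg (pointwise max)."""
    lifted = dict(vw)
    lifted[x] = raise_ts(vw[x])
    return {y: max(lifted[y], vw_msg[y]) for y in lifted}

assert nat(0) < plus(0) < nat(1) < plus(1) < nat(2)   # the order ≼
vw     = {"x": nat(4), "y": nat(1)}                   # vw(x) = 4
vw_msg = {"x": plus(2), "y": nat(0)}                  # vw'(x) = 2+
assert join_env(vw, vw_msg, "x")["x"] == plus(4)      # result 4+, as in the text
```

Note that the join is not commutative: only the loading thread's view is raised on the loaded variable.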
(LD-global): lcfm(t) = lcf, lcf —ld, msg→ lcf′, msg ∈ m entails (m, lcfm, B) —(t, msg)→ (m, lcfm[t ← lcf′], B)

(Unlabelled): lcfm(t) = lcf, lcf → lcf′ entails (m, lcfm, B) —t→ (m, lcfm[t ← lcf′], B)

(ST-global dis): t ∈ dis, lcfm(t) = lcf, lcf —st, msg, dis→ lcf′ entails (m, lcfm, B) —(t, msg)→ (m ∪ {msg}, lcfm[t ← lcf′], B)

(ST-global env): t ∈ env, lcfm(t) = lcf, lcf —st, msg, env→ lcf′, msg = (x, d, vw^de), vw^de(x) = ts⁺, ts ∉ B entails (m, lcfm, B) —(t, msg)→ (m ∪ {msg}, lcfm[t ← lcf′], B ∪ {ts⁺})

(CAS-global): lcfm(t) = lcf, lcf —cas, msg_r, msg_w→ lcf′, msg_r ∈ m, msg_w = (x, d, vw^de), vw^de(x) = ts + 1, and if msg_r(x) ∈ ℕ then ts⁺ ∉ B, entails (m, lcfm, B) —(t, msg_w)→ (m ∪ {msg_w}, lcfm[t ← lcf′], B ∪ {ts})

Figure 4
Simplified semantics. Global transition relation. B is a set of blocked timestamps. For an env thread making a store operation, the timestamp ts⁺ ∈ ℕ⁺ can be chosen only when ts has not been blocked (ts ∉ B). ts⁺ is added to B whenever an env thread makes a store operation adding a message (x, d, ts⁺). Likewise, when a dis thread makes a CAS operation loading from a message (x, d, vw) with vw(x) = ts ∈ ℕ, it must be checked that ts⁺ ∉ B, ensuring that there are no timestamps between ts and ts + 1. ts ∈ ℕ is added to B when a dis thread makes a CAS, loading from a message (x, d, vw) with vw(x) = ts ∈ ℕ.

▶ Theorem 5 (Soundness and Completeness). If a configuration cf is reachable under RA, then there is an abstract configuration cf^de reachable in the simplified semantics so that cf^de =^de α^de(cf). Conversely, if a configuration cf^de is reachable in the simplified semantics, then there is a configuration cf reachable under RA such that α^de(cf) =^de cf^de.

Proof.
At the outset, we note that the only component of the configuration that differs between the classical and simplified semantics is that of the timestamps, and hence the view map: vw in the concrete and vw^de in the abstract configuration. We now give a relation between these timestamps. With this relation in place, the formal equivalence between the semantics can be shown by a case analysis of the transitions that the threads can take. For quick intuition, consider the timestamps on a single shared variable x:

message:  init  disT  disT  envT  disT  envT  envT  disT
concrete:    0     1     2     3     4     5     6     7
abstract:    0     1     2    2⁺     3    3⁺    3⁺     4

(ST-local env) store operation for env: rv(r) = d, vw^de <^env_x vw′^de entails (x := r, rv, vw^de) —st, (x, d, vw′^de), env→ (skip, rv, vw′^de)

(ST-local) store operation for dis: rv(r) = d, vw^de <_x vw′^de, vw′^de(x) ∈ ℕ entails (x := r, rv, vw^de) —st, (x, d, vw′^de), dis→ (skip, rv, vw′^de)

(LD-local env) load from env messages: rv′ = rv[r ← d], vw′^de(x) ∈ ℕ⁺ entails (r := x, rv, vw^de) —ld, (x, d, vw′^de)→ (skip, rv′, vw^de ⊔^env_x vw′^de)

(LD-local) load from dis messages: rv′ = rv[r ← d], vw^de(x) ⪯ vw′^de(x) ∈ ℕ entails (r := x, rv, vw^de) —ld, (x, d, vw′^de)→ (skip, rv′, vw^de ⊔ vw′^de)

(CAS-local env) cas with load from env messages: rv(r₁) = d₁, rv(r₂) = d₂, vw′^de(x) = ts⁺, vw₁^de = vw^de ⊔^env_x vw′^de, vw″^de = vw₁^de[x ← ts + 1] entails (cas(x, r₁, r₂), rv, vw^de) —cas, (x, d₁, vw′^de), (x, d₂, vw″^de)→ (skip, rv, vw″^de)

(CAS-local) cas with load from dis messages: rv(r₁) = d₁, rv(r₂) = d₂, vw′^de(x) = ts ≥ vw^de(x), vw₁^de = vw^de ⊔ vw′^de, vw″^de = vw₁^de[x ← ts + 1] entails (cas(x, r₁, r₂), rv, vw^de) —cas, (x, d₁, vw′^de), (x, d₂, vw″^de)→ (skip, rv, vw″^de)

Figure 5
Simplified semantics. Thread-local transition relation. Margin annotations provide descriptions. The store rules refer to the thread type (dis/env) executing the instruction; the load rules refer to the thread type which generated the message that is being loaded (similarly for the load part of CAS operations, which can only be executed by dis threads). In rule (ST-local env), we use vw^de <^env_x vw′^de to mean raise(vw^de(x)) ⪯ vw′^de(x) ∈ ℕ⁺ and vw′^de(y) = vw^de(y) for all variables y ≠ x. In rule (LD-local env), vw^de ⊔^env_x vw′^de is defined as vw^de[x ← raise(vw^de(x))] ⊔ vw′^de. The join ⊔ always means an element-wise max over the relevant domain.

In this fashion, the ts⁺ values abstract the env timestamps between any two dis timestamps. We define the abstraction (similarly, concretization) function as the function that transforms all timestamps in the run as shown above. With the above timestamp abstraction/concretization in mind, we show that abstract and concrete configurations are equivalent in terms of reachability. We prove this by induction on the length of a run. We show that a concrete (similarly, abstract) configuration is reachable if and only if it has some abstraction (similarly, concretization) that is reachable.

Base Case. In the base case, equivalence is maintained as the initial concrete configuration is equivalent to its simplified configuration where all timestamps are 0. Recall that all timestamp transformations maintain 0 as a fixpoint. Hence the initial thread-local states and memory are equivalent for the concrete and abstract semantics.
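The timestamp abstraction illustrated by the table above can be sketched as a small function (a hypothetical helper for a single variable, not the paper's artifact): dis messages keep consecutive natural timestamps, and every env message collapses onto the raised timestamp of the preceding dis message.

```python
# Sketch: the timestamp abstraction on one variable. origins[0] is the init
# message (treated like a dis message); returned stamps are strings like
# '3' (a natural) or '3+' (the raised value 3⁺).

def abstract_timestamps(origins):
    """origins: list of 'dis' / 'env', the creators of the messages in
    increasing concrete-timestamp order."""
    out, last_dis = [], -1
    for o in origins:
        if o == "dis":
            last_dis += 1
            out.append(str(last_dis))
        else:  # env message: raised timestamp over the preceding dis message
            out.append(f"{last_dis}+")
    return out
```

On the run from the table (init, disT, disT, envT, disT, envT, envT, disT) this yields 0, 1, 2, 2⁺, 3, 3⁺, 3⁺, 4, matching the abstract row.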
Inductive Case - Concrete to Abstract
For the inductive case, assume that we have the result after n ∈ ℕ steps in a computation. Now we induct by considering cases over the type of the (n + 1)-th instruction in the computation. Silent: Silent (thread-local) instructions are handled trivially. They only change the thread-local state, identically for the concrete and abstract configurations.
Load: A load transition can be either from a dis or an env message. In both cases, we note that the timestamp abstraction maintains the relative order of timestamps (including equality). Hence whenever a concrete message is readable, so is the corresponding abstract message.
Store: This follows since the corresponding thread in the abstract configuration can simulate the store using the corresponding timestamp (ts⁺ ∈ ℕ⁺ in case of an env store, and ts ∈ ℕ in case of a dis store). Note once again that the abstraction preserves the order on the timestamps; consequently, a store is allowed in the abstract semantics if it was allowed in the concrete computation.

CAS: In this case, we note that the set B keeps track of which timestamps are allowed for CAS operations. If the CAS operation reads from an env message, the semantics follows from (LD-local env); (ST-local). However, if the CAS load is performed using the store of a dis thread, then this implies that there are no env timestamps between the load and store timestamps (ts, ts + 1) of the CAS (similar to two consecutive disT messages in the figure above). Consequently, we see that the set B in the abstract semantics does not contain the timestamp ts⁺ (ts⁺ is added to B the moment an env thread makes a store with the timestamp ts⁺, to disallow a CAS with load and store timestamps ts and ts + 1). Thus the equivalent CAS operation is also allowed under the abstract semantics.

Inductive Case - Abstract to Concrete

Silent: Silent instructions are handled trivially; they only change the thread-local state, identically for the concrete and abstract configurations.
Load: We consider two cases depending on whether the load happens from a dis thread or an env thread.

In the case where we load from a dis message, the semantics are equivalent between the abstract and concrete transitions, since we compare the timestamps ts⁺ ≺ ts + 1 and ts ≺ ts + 1 (see the rule (LD-local) in Figure 5). Given that the concretization function (like the abstraction) maintains the relative order between dis and env timestamps, the load is also feasible in the concrete semantics.

In the second case, the load is from an env message. By the inductive hypothesis, we have the concrete computation up to the load transition. In particular, the message (x, d, vw) we wish to load has already been generated in the concrete computation. To this concrete computation ρ obtained by the inductive hypothesis, we apply the infinite supply lemma with t∗ as the reading thread's local view on x to generate the computation M(ρ) . M(ρ↓env) . ρ with the fresh message (x, d, vw′). By point (2) in the lemma the message is loadable: vw′(x) ≥ µ_x(t∗). Note how we apply the timestamp lifting function to t∗, since the reading thread's new concrete timestamp has changed. Additionally, by points (1) and (3), the relative order of timestamps in vw′ on variables other than x remains the same w.r.t. the dis thread messages. This implies that after reading the message, the view of the reading thread will only increase on x; for all other variables it will remain the same, thus maintaining equivalence between the timestamps in the concrete and abstract run.

Store: The store transition for dis is identical to its concrete counterpart. For an env thread, we note that we generate copies of the abstract ts⁺ timestamp to get a sequence of concrete timestamps. Here we can generate an arbitrary number of copies, and hence the thread will always find a vacant timestamp for its store.

CAS: When a dis thread makes a CAS, it can either read from an env message or from the store of a dis thread.
In the latter case, let the timestamps of the load and store in the CAS be ts and ts + 1. Then in the abstract semantics we require that ts⁺ ∉ B. This implies that in the concrete semantics, too, there are no env timestamps between the load and store timestamps, and hence the CAS is possible in the concrete semantics as well. In the former case we again use the infinite supply lemma, as we did in the case of loads, to generate a loadable env message. ◀

This section discusses the safety verification problem for the class env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), consisting of a set of n distinguished dis threads executing loop-free programs in the presence of an unbounded number of env threads. We show that the safety verification problem for this class of systems can be decided in PSPACE by leveraging the simplified semantics from Section 4. We will assume that the domain
Dom is finite. In parallel, we demonstrate the ability to improve automatic verification techniques by showing how to encode the safety verification problem (of whether all assertions hold) into Datalog programs. The encoding is interesting for two reasons: (1) it yields a complexity upper bound that, given [1], came as a surprise; (2) it provides practical verification opportunities, considering that Datalog-based Horn-clause solvers are state-of-the-art in program verification [17, 18]. ▶
Theorem 6.
The safety verification problem for env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), n ∈ ℕ, is non-deterministic polynomial-time relative to the query evaluation problem in linear Datalog (NP^PSPACE), and hence is in PSPACE.

We note that the theorem mentions non-deterministic polynomial time relative to the linear Datalog oracle. We provide a non-deterministic polynomial-time procedure Algo that, given a verification instance, converts it to a Datalog problem P such that (1) for a 'yes' verification instance, at least one execution of Algo results in P having a successful query evaluation, and (2) for a 'no' verification instance, no execution of Algo leads to the resulting P having a successful query evaluation.

Linear Datalog is a syntactically restricted variant of Datalog for which query evaluation is easy to solve (PSPACE) at the cost of being inconvenient as an encoding target. Given that we show a
PSPACE upper bound on parameterized safety verification for the class env(nocas) ‖ dis₁(acyc) ‖ · · · ‖ dis_n(acyc), in principle we could have directly encoded the parameterized safety verification problem instance as a linear Datalog program. For convenience of encoding, we do not directly reduce safety verification to query evaluation in linear Datalog, but use an intermediate notion of Cache Datalog. To make the ideas behind our reduction clear, we proceed in three steps. We introduce Cache Datalog, which is Datalog with an additional parameter, called the Cache, that turns out to be decisive in controlling the complexity of encodings in the following sense: every Cache Datalog program can be turned into a linear Datalog program at a cost that is linear in the size of the program plus that of the Cache (Lemma 7). We then show that Algo generates Cache Datalog problems that satisfy the description from the previous paragraph (Lemma 8). Finally, we argue that for all Cache Datalog instances generated by Algo, a Cache of polynomial size is sufficient for query evaluation (Lemma 9). This shows Theorem 6.
Linearizing Datalog. A Datalog program Prog [24] consists of a predicate set Preds, a data domain Data, and a set Rules of rules (also called clauses). Each predicate comes with a fixed arity > 0. A predicate P of arity j is a mapping from Data^j to {true, false}. An atom P(t₁, …, t_j) consists of a predicate P and a list t₁, …, t_j of arguments, where each t_i is a term. A term is either a variable or a constant; a term is a ground term if it is a constant, and an atom is a ground atom if all its terms are constants. A positive literal is a positive atom P(t₁, …, t_j) and a negative literal is a negative atom ¬P(t₁, …, t_j); a ground literal is a ground atom. A rule has the form

head :− body₁, …, body_t

where head and the body_i are positive literals. A rule with one literal in the body is a linear rule; one without a body is called a fact. A linear Datalog program is one where all rules are linear or are facts. An instantiation of a rule is the result of replacing each occurrence of a variable in the rule by a constant. For any instantiation of a rule, if all ground atoms constituting the body are true, then the ground atom in the head can be inferred to be true. All instantiations of facts are trivially true. We write Prog ⊢ g to denote that the ground atom g can be inferred from program Prog.

Query Evaluation Problem.
The query evaluation problem for Datalog is, given a query instance (Prog, g) consisting of a Datalog program Prog and a ground atom g, to determine whether Prog ⊢ g. When studying the combined complexity, both Prog and g are given as input [65]. It is known [38] that the combined complexity of query evaluation for linear Datalog is in PSPACE, while allowing non-linear rules raises the complexity to NEXPTIME ([65] and [44]). Motivated by verification, there has been interest in linearizing Datalog [45].
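For intuition, the inference relation ⊢ can be realized by a naive bottom-up fixpoint. The following sketch is one minimal way to do it (its names and its convention that variables are capitalized strings are assumptions of this sketch, not the paper's):

```python
# Sketch of bottom-up Datalog inference. An atom is (pred, args); a rule is
# (head, body_list); a fact is a rule with an empty body.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(atom, fact, subst):
    """Unify a (possibly non-ground) atom with a ground fact, extending
    substitution subst; return the extended substitution or None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.get(a, f) != f:
                return None
            s[a] = f
        elif a != f:
            return None
    return s

def infers(rules, goal):
    """Return True iff the ground atom goal is derivable: Prog |- goal."""
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            # all ways of grounding the body against already-derived facts
            substs = [{}]
            for atom in body:
                substs = [s2 for s in substs for f in derived
                          if (s2 := match(atom, f, s)) is not None]
            for s in substs:
                g = (head[0], tuple(s.get(a, a) for a in head[1]))
                if g not in derived:
                    derived.add(g)
                    changed = True
    return goal in derived
```

The classic transitive-closure program (edge facts plus two path rules, the second of which is non-linear) derives exactly the reachable pairs.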
Adding Cache to Datalog: Cache Datalog. We introduce to Datalog the concept of a Cache. A Cache is a set of ground atoms that is used to control the inference process. The resulting program is called a Cache Datalog program. In the presence of a Cache, the semantics of Datalog is adapted by the following two rules.

Add: For an instantiated rule, the ground atom in the head can be inferred and added to the Cache only when all the ground atoms in the body are in the Cache.

Drop: Atoms in the Cache can be dropped non-deterministically.

The standard semantics of Datalog can be recovered by monotonically adding all inferred atoms (starting with facts) to the Cache and never dropping anything. To show the upper bound, we use a notion of inference that takes into account the size of the Cache and minimizes it. For a Cache Datalog program Prog and k ∈ ℕ, we write Prog ⊢_k g to mean that the ground atom g can be inferred from Prog with a computation in which |Cache| ≤ k, i.e., the number of atoms in the Cache is always at most k. The Cache size measures the complexity of linearizing Cache Datalog as follows.
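As an aside, for ground programs the bounded relation Prog ⊢_k g is itself directly executable: the Add/Drop semantics induces a finite search over cache states. A minimal sketch (hypothetical names; rules assumed already instantiated, which is an assumption of this sketch):

```python
# Sketch of Prog |-_k g for ground Cache Datalog programs. "Add" fires a
# rule whose body lies in the cache; "Drop" removes atoms
# non-deterministically; we explore all reachable cache states.

from collections import deque

def infers_with_cache(rules, goal, k):
    """rules: list of (head, frozenset(body)); facts have an empty body.
    Returns True iff goal can be inferred while |Cache| <= k throughout."""
    start = frozenset()
    seen = {start}
    queue = deque([start])
    while queue:
        cache = queue.popleft()
        # Add: fire any rule whose body is contained in the cache
        for head, body in rules:
            if body <= cache:
                nxt = cache | {head}
                if len(nxt) > k:
                    continue  # would exceed the cache bound
                if head == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Drop: non-deterministically remove any single atom
        for atom in cache:
            nxt = cache - {atom}
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

For the chain a; b :− a; c :− a, b, the goal c needs a and b in the cache simultaneously, so it is inferable with k = 3 but not with k = 2.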
▶ Lemma 7. Given a Cache Datalog program Prog, a ground atom g, and a bound k, in time quadratic in |Prog| + |g| + k we can construct a linear Datalog program Prog′ so that Prog ⊢_k g iff Prog′ ⊢ g.

Proof.
To go from Cache Datalog to linear Datalog, the idea is to simulate the Cache using a new predicate CachePred of arity k in the constructed linear Datalog program Prog′. We know that a Cache of size k suffices in the Cache Datalog program, so any rule head :− body₁, …, body_p in the Cache Datalog program is such that p < k.

Simulating the Cache. Intuitively, the predicate CachePred(t₁, t₂, ⋯, t_k) represents that the terms t_i are members of the Cache. We can simulate the set Cache by reshuffling terms, using rules that swap the i-th and j-th elements:

CachePred(t₁, ⋯, t_j, ⋯, t_i, ⋯, t_k) :− CachePred(t₁, ⋯, t_i, ⋯, t_j, ⋯, t_k)

There are quadratically many such rules.

Rules. Consider a rule R with a body of size p in the Cache Datalog program:

head :− body₁, …, body_p

We convert this into a rule which matches the first p terms of CachePred with the elements of the body. If there is such a matching, the term head can be inferred and added into
Cache. This is simulated by replacing some term amongst the t_i with the term in the head while keeping the other terms the same:

CachePred(t₁, ⋯, t_i = head, ⋯, t_k) :− CachePred(t₁ = body₁, ⋯, t_p = body_p, t_{p+1}, ⋯, t_k)

There are k choices for the term to be replaced. Thus we have k new rules per rule in the original program.

Final Inference. Finally, since we know that each element of the Cache is true, we add the inference rules

t_i :− CachePred(t₁, t₂, ⋯, t_k) for 1 ≤ i ≤ k

Now, g can be generated if g ever enters the Cache, i.e., CachePred(t₁, t₂, …, g, …, t_k) holds for some other terms t_i. Then we can use the above inference rule to infer g. This shows that we need at most quadratically many rules, each with a single body literal, to give us a linear Datalog program. ◀

Theorem 5 tells us that safety verification under RA is equivalent to safety verification in the simplified semantics. Safety verification in the simplified semantics, in turn, can be reduced to the
Message Generation (MG) problem. Given a parametrized system c and a message msg = (x∗, d∗, _), called the goal message, does there exist a reachable configuration cf^de = (m^de, lcfm^de) such that msg ∈ m^de (for some vw^de)?

To see the connection between MG and safety verification, note that we can replace each assert false statement in the program by x∗ := d∗ for a variable x∗ and value d∗ unused elsewhere. The system is unsafe if and only if a goal message msg = (x∗, d∗, vw^de) is generated for some vw^de. While encoding into Datalog, we non-deterministically guess vw^de. For this, we crucially show that there are only exponentially many choices of vw^de which need to be enumerated. Henceforth we assume that the queried goal message msg can have an arbitrary vw^de. Given c and msg, our non-deterministic polynomial-time procedure Algo satisfies the following, the proof of which is in Section 5.1.2.

▶ Lemma 8.
Given a parametrized system c and a goal message msg, Message Generation (MG) holds iff there is some execution of Algo that generates a query instance (Prog, g) such that Prog ⊢ g. The construction of Prog and g is in (non-deterministic) time polynomial in |c|.

The procedure Algo generates one query instance (Prog, g) per execution. We postpone the full description of Algo and first give some intuition. Since the parameterized system consists of n loop-free dis threads, each can execute only linearly many instructions in its size. The total number of instructions executed (and hence the total number of timestamps used) by the dis threads is polynomial in |c_dis|, the combined size of the dis programs (concretely, the sum of the sizes of the individual c^i_dis programs). Algo guesses the dis threads' part of the computation and generates a query instance (Prog, g).

Prog itself uses four main predicates. The environment message predicate emp(x, d, vw^de) represents the availability of an env message on variable x with value d and view vw^de. The environment thread predicate etp(lc, rv, vw^de) encodes the env thread configuration, where lc is the control state, rv is the register valuation, and vw^de is the thread view. We also have similar message and thread predicates for dis threads. The distinguished message predicate dmp(x, d, vw^de) represents the availability of a dis message. Additionally, for each dis thread i ∈ [n], we have a distinguished thread predicate dtp[i](lc, rv, vw^de) that encodes the configurations of dis[i].

In the set of rules, we have the fact dmp(x, d_init, vw^de_init) for each x ∈ Var, with d_init the initial value and vw^de_init the initial view. We also have (i) facts etp(λ_init, rv_init, vw^de_init) and dtp[i](λ_init, rv_init, vw^de_init) representing the initial states of both env and dis threads, and (ii) rules corresponding to the env transitions and the guessed dis thread run fragments.
Finally, the query atom g is a ground atom over one of the predicates emp or dmp and captures the goal message msg being generated. The instances generated in the non-deterministic branches of Algo differ only due to the guessed dis run and the atom g. We now describe the full Datalog program, also proving Lemma 8.

Algo for query instance generation. We discuss the details of the procedure Algo, which generates the query instance (Prog, g) non-deterministically. We use the following predicates in the constructed Datalog program:

emp(msg): the message generation predicate for env threads, where msg is a message;
etp(lc, rv, vw^de): the thread state predicate for env threads;
dmp(msg): the message generation predicate for dis threads, where msg is a message;
dtp[i](lc, rv, vw^de): the thread state predicate, one for each dis thread;
avail(x, ts⁺): the timestamp availability predicate, per variable, which indicates that a timestamp ts⁺ is not blocked by a CAS operation.

The Datalog program generated has two parts: one does not depend on the non-deterministic choices made by Algo, while the other does. We describe the former part first; these rules for the Datalog program are in Figure 6. The second set of rules, depending on the non-deterministic choices of Algo, is in Figure 7.

The first set of rules in the Datalog program (Figure 6). The facts, in green, provide the ground terms for the init messages as well as the initial states of the dis and env threads. The orange rules capture the thread-local transitions of the env threads. We deviate a bit from the standard notation for programs here and instead view them as labelled transition systems; it is easy to see that the two notions are equivalent. The initial state labels are λ^env_init for the env threads and λ^i_init for the dis threads. For a pair of labels, we write λ —i→ λ′ to denote that λ′ can be reached from λ by executing i. In the Datalog program, we have a rule for each such transition in the program.
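As an illustration of this per-transition rule generation, the following sketch emits one Datalog-style rule string per env transition, in the spirit of Figure 6a. The helper name and the string format are this sketch's own, and only the two silent instruction shapes are covered:

```python
# Sketch (hypothetical helper): one rule per transition λ --i--> λ'.
# RV and VW stand for the register-valuation and view arguments.

def env_rules(transitions):
    """transitions: list of (src, instr, dst) with instr = ('skip',) or
    ('assign', register, expression). Returns textual rules."""
    rules = []
    for src, instr, dst in transitions:
        if instr[0] == "skip":
            # silent step: control state changes, registers and view do not
            rules.append(f"etp({dst},RV,VW) :- etp({src},RV,VW)")
        elif instr[0] == "assign":
            _, r, e = instr
            # local assignment: update register r with the evaluated expression
            rules.append(f"etp({dst},RV[{r}<-{e}(RV)],VW) :- etp({src},RV,VW)")
    return rules
```

Loads and stores would additionally place emp/dmp atoms in the bodies and heads, as in the violet and pink rules of Figure 6a.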
The thread-local transitions are in orange. Loads are in violet (the first rule corresponding to loads from env messages, the second to loads from dis messages). For loads, the rule requires an atom with the message predicate (from which the thread is reading) in the body of the rule. Stores are in pink: the first rule corresponds to the new thread-local state after execution of the store; the second rule corresponds to the generation of an atom for the new message (in the head). Though we use some higher-order syntax for rules, such as assume, ⊔, and <^env_x, we note that these can be easily translated to pure Datalog with small overhead, given the polynomial size of the domain and the constant arity of the predicates.

These rules capture completely the env thread component of the run. As we had mentioned earlier, the component of the query instance that differs due to the non-determinism of Algo is

rule | condition on program of env threads, c_env
etp(λ′, rv, vw^de) :− etp(λ, rv, vw^de) | if λ —skip→ λ′
etp(λ′, rv, vw^de) :− etp(λ, rv, vw^de), with [[e]](rv(r̄)) ≠ 0 | if λ —assume e(r̄)→ λ′
etp(λ′, rv[r ← e(r̄)], vw^de) :− etp(λ, rv, vw^de) | if λ —r := e(r̄)→ λ′
etp(λ′, rv[r ← d], vw^de ⊔^env_x vw′^de) :− etp(λ, rv, vw^de), emp(x, d, vw′^de) | if λ —r := x→ λ′
etp(λ′, rv[r ← d], vw^de ⊔ vw′^de) :− etp(λ, rv, vw^de), dmp(x, d, vw′^de), vw^de <_x vw′^de | if λ —r := x→ λ′
etp(λ′, rv, vw′^de) :− etp(λ, rv, vw^de), avail(x, vw′^de(x)) | if λ —x := r→ λ′, with vw^de <^env_x vw′^de
emp(x, rv(r), vw′^de) :− etp(λ, rv, vw^de), avail(x, vw′^de(x)) | if ∃λ′. λ —x := r→ λ′, with vw^de <^env_x vw′^de

(a) The (fixed) set of rules in the Datalog program encoding the transition system of the env threads. Silent transitions (in orange); memory accesses: loads (in violet) and stores (in pink).
fact | comment
dmp(x, d_init, vw^de_init) :− | for all variables x
etp(λ^env_init, rv_init, vw^de_init) :− | λ^env_init is the initial state of the env threads
dtp[i](λ^i_init, rv_init, vw^de_init) :− | λ^i_init is the initial state of the dis[i] thread

(b) First set of facts in the Datalog program; these do not depend on the non-deterministic guess made by Algo for the computation of dis threads. These facts encode the initial configurations of the threads and the initial messages.

Figure 6
First set of rules for the Datalog program. This fixed rule set is independent of the non-determinism of Algo.

the dis part of the run. Essentially, Algo guesses in polynomial time the executions of all the dis threads. This is possible since they are loop-free, and hence execution lengths are linear in the size of their specifications. We now describe this second part of the Datalog query instance.

Second set of rules in the Datalog program (Figure 7). We have a bound on the number of write timestamps that can be used by the dis threads: an easy bound is the combined number of instructions in the dis threads, |c_dis|. We will refer to this bound as T. By the simplified semantics, it suffices to consider the timestamps {0, 0⁺, · · · , T, T⁺}. This follows since the dis threads perform at most T writes; hence we need only T timestamps of the form ℕ. Additionally, we have only one timestamp of the form ℕ⁺ between any two timestamps of the form ℕ. This shows that the view terms in the predicates of the Datalog program can be guessed in polynomial space (since T is polynomial in the input).

Now for each dis thread i, the procedure Algo non-deterministically guesses the computation ρ_i for dis[i]. That is, Algo guesses the timestamps and the register valuations of dis_i at each configuration in this run, along with the messages dis_i loaded from. After this, it converts ρ_i to a set of rules which are then added to the earlier set from Figure 6.

Consider the computation ρ_i ≡ λ^i_init —i₁→ λ₁ —i₂→ λ₂ · · · —i_{|ρ_i|}→ λ_{|ρ_i|} of length |ρ_i|. Let the view of dis thread i at point j in the run be given as vw^de_j. Additionally, if i_j is a load instruction, Algo also guesses the message that was read by the dis thread i. Each instruction i_j in this computation is then converted into one amongst the rules in Figure 7a, depending on the instruction i_j executed, represented in the figure as 'condition'.
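The bound on the abstract timestamp universe can be spelled out with a quick sketch (hypothetical helpers, not the paper's artifact): the domain {0, 0⁺, …, T, T⁺} has 2(T + 1) elements per variable, so a single view is polynomial-sized even though the number of views is exponential in the number of variables.

```python
# Sketch: the abstract timestamp universe for guessed dis computations.

def timestamp_domain(T):
    """Enumerate the abstract timestamps 0, 0⁺, ..., T, T⁺ ('+' marks N⁺)."""
    dom = []
    for ts in range(T + 1):
        dom.append(str(ts))
        dom.append(f"{ts}+")
    return dom

def views_count(T, num_vars):
    """Number of abstract views: (2(T+1))^|Var|, exponential in |Var|, but a
    single view needs only polynomially many bits to write down."""
    return (2 * (T + 1)) ** num_vars
```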
Additionally, to encode the ℕ⁺ timestamps that have not been occupied by CAS operations (and hence are free for use by env stores), we have the rule in Figure 7b.

rule | condition on thread transition i_j of computation ρ_i for thread dis_i
dtp[i](λ_j, rv, vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = skip
dtp[i](λ_j, rv, vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = assume e(r̄) ∧ [[e]](rv(r̄)) ≠ 0
dtp[i](λ_j, rv[r ← e(r̄)], vw^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = r := e(r̄)
dtp[i](λ_j, rv[r ← d], vw^de ⊔^env_x vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) | i_j = r := x ∧ thread loads msg = (x, d, vw′^de) ∧ vw′^de(x) ∈ ℕ⁺
dtp[i](λ_j, rv[r ← d], vw^de ⊔ vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) | i_j = r := x ∧ thread loads msg = (x, d, vw′^de) ∧ vw′^de(x) ∈ ℕ ∧ vw^de <_x vw′^de
dtp[i](λ_j, rv, vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de) and dmp(x, rv(r), vw′^de) :− dtp[i](λ_{j−1}, rv, vw^de) | i_j = x := r ∧ thread stores msg = (x, rv(r), vw′^de) ∧ vw^de <_x vw′^de
dtp[i](λ_j, rv, vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) and dmp(x, rv(r), vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), emp(x, d, vw′^de) | i_j = cas(x, d, r) ∧ vw′^de(x) = ts⁺ ∈ ℕ⁺, vw₁^de = vw^de ⊔^env_x vw′^de, vw″^de = vw₁^de[x ← ts + 1]
dtp[i](λ_j, rv, vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) and dmp(x, rv(r), vw″^de) :− dtp[i](λ_{j−1}, rv, vw^de), dmp(x, d, vw′^de) | i_j = cas(x, d, r) ∧ vw^de(x) ≤ ts = vw′^de(x), vw₁^de = vw^de ⊔ vw′^de, vw″^de = vw₁^de[x ← ts + 1]

(a) These rules are chosen depending upon the non-deterministic choice made by Algo of the computation ρ_i of thread i. Each instruction i_j executed in ρ_i is then mapped to one of the rules above, depending upon which condition (right column) is satisfied.
Rules for silent i_j (in orange); memory accesses: loads (in violet) and stores (in pink), and CAS (in gray). The second pink rule corresponds to message generation by the thread i executing a store instruction. The first two CAS rules correspond to the case where the load is from an env message, and the last two correspond to a load from a dis message. In each case, the first rule is the thread-local state change rule, while the second rule generates the ground atom corresponding to the message generated by the CAS operation.

fact | condition on availability of timestamps
avail(x, ts⁺) :− | ts ∈ {0, · · · , T} ∧ no dis thread performs a cas operation with timestamps (ts, ts + 1)

(b) This fact corresponds to the availability of an ℕ⁺ timestamp for stores by env threads, which is known once all the dis computations have been guessed. These facts are not generated on a per-dis-thread basis, but rather once the computations ρ_i for all dis threads have been non-deterministically guessed. Referring to the simplified semantics, this rule captures ts ∉ B, the fact that there is no CAS operation with timestamps (ts, ts + 1). Note that the avail predicate plays a role in inferring the env thread state and message predicates, as seen in the last two rows in Figure 6: we can infer the env thread state and message predicates with a view vw′(x) only when the respective timestamp is not blocked. This in turn is used in the first CAS rule (first gray row, Figure 7(a)) when loads happen from an env thread: the merged view (vw^de ⊔^env_x vw′^de)(x) is ts⁺, and the new timestamp after the CAS is ts + 1. Note that this is possible since (i) if the timestamp of the env message for x from which we load was ts⁺, then there is no dis thread with a timestamp ts for x, and (ii) if the timestamp of the env message from which we load was ≺ ts⁺, then the timestamp of the dis thread performing the CAS was ts for x. In both cases, the timestamp after the CAS will be ts + 1.

Figure 7
Second set of rules for the Datalog program. This rule set depends on the non-deterministic choices made by Algo for the computations of the dis threads.

Since we have the polynomial bound on T, it is easy to see that the rules above for the run ρ_i executed by each dis thread i can be generated in polynomial time after non-deterministically guessing ρ_i. These (non-determinism dependent) rules, along with the rules from Figure 6, together form the complete Datalog program.

(Inference in the Datalog program ↔ Computations in the Simplified Semantics) and proof of Lemma 8
Now we see how an inference process in the (complete) Datalog program corresponds to a computation in the simplified semantics. To do this, we give invariants which relate the inference of atoms in the Datalog program with the existence of events in the computation.
These invariants together imply the equivalence between an inference sequence in the Datalog program and a computation of c under the simplified semantics. Finally, if the goal message msg is reachable at the end of a computation ρ of c, then, thanks to the invariants, we can also infer the ground term g, being dmp(msg) or emp(msg) in the Datalog program depending on whether the goal message was generated by a dis thread or an env thread in the computation.

env thread-local state invariant. The ground atom etp(λ, rv, vw^de) can be inferred iff some env thread can reach the local configuration lcf = (λ, rv, vw^de). This says that some env thread is able to reach the state from its transition system with label λ such that the thread-local view and the register valuation at that time are vw^de and rv respectively. We can prove that this holds by induction on the length of the run, noting from the Datalog rules (Figure 6) that there is a rule corresponding to each transition λ --i--> λ′. Additionally, the load rules (in violet) require that the corresponding message atoms (emp/dmp) hold, which, as we will see below, implies that the corresponding message can be generated in the memory.

env thread message invariant. The ground atom emp(x, d, vw^de) can be inferred iff the corresponding message msg = (x, d, vw^de) can be generated in the simplified semantics by some env thread. Note that a ground atom of the form emp(x, d, vw^de) can only be inferred using the last rule in Figure 6. The body of this rule contains the term etp(λ, rv, vw^de). This, if true, implies that some env thread can reach the corresponding thread-local state by the first invariant (above). An env thread in this thread-local state can generate the message (x, rv(r), vw^de), since there is an outgoing transition from λ with instruction x := r.
Note that we check the existence of the transition to ensure that the message can indeed be generated; this check is required for the last rule to exist in the program.

dis thread-local state invariant. For each dis thread i, we have the following invariant. The ground atom dtp[i](λ, rv, vw^de) can be inferred iff the dis thread i can reach the local configuration lcf = (λ, rv, vw^de). This is just the dis analog of the first invariant for env threads. It can also be proved by induction on the length of the run (or the inference sequence). Analogous to the invariant for emp, we also have an invariant for dis messages.

dis thread message invariant. The ground atom dmp(x, d, vw^de) can be inferred iff the corresponding (dis) message msg = (x, d, vw^de) can be generated in the simplified semantics by some dis thread.

avail timestamp availability invariant. If the fact avail(x, ts⁺) is in the Datalog program, then ts ∉ B throughout the computation ρ of the simplified semantics.

The base case for the message predicates holds since the facts dmp(x, d_init, vw^de_init) for all variables are given in the Datalog program. The base case for the thread-state predicates holds due to the fact etp(λ_init, rv_init, vw^de_init), which captures the initial state of a thread. The inductive steps can be formally proved by considering a computation ρ under the simplified semantics and mapping each transition in ρ to an inference step in the Datalog program. For the converse, we assume an inference sequence (a sequence of invocations of the rules) and, for each rule invoked to infer a new ground atom, show that a corresponding transition can be taken by a thread in the simplified semantics so that the invariants are maintained.
This, in turn, is done by taking cases on the next instruction to be executed.

The equivalence between transitions in a computation ρ of the simplified semantics and the application of rules/facts in the Datalog program, with reachability of some message msg in ρ iff the corresponding ground term dmp(msg) or emp(msg) is inferred in Datalog, is sufficient to prove Lemma 8. In particular, the generation of msg^de in some computation ρ of c gives a sub-computation ρ↓dis performed by the dis threads. We consider the Datalog query instance (Prog, g) generated when Algo correctly guesses ρ↓dis. By the message generation invariant, the ground atom dmp(msg^de) or emp(msg^de) corresponding to msg^de can be inferred, Prog ⊢ g, giving the forward direction of the lemma. For the reverse direction, we note that Prog ⊢ emp(msg^de) or Prog ⊢ dmp(msg^de) immediately implies that the message can be generated in some computation of the system c (the dis computation is already determined in the guessed program Prog).

Cache Size
Having described the encoding(s), the challenge now is to provide a polynomial bound on the cache size for the query instances generated by Algo. The Cache behaves like a memoized set of atoms which are used for the inference process. The reason why a polynomial-sized Cache suffices is that we can "forget" (remove from Cache) previously inferred atoms when they are not being actively used. We use this crucially in the context of the env predicates emp, etp. Technically, this is possible since the arbitrary replication property of env threads allows us to "forget" the state of the previously simulated env thread and simulate a fresh copy instead. Let Q = 2|Dom||Var| + |c_dis|. We show that a Cache of size O(Q) is sufficient to infer g.

▶ Lemma 9. For each (Prog, g) generated by Algo, Prog ⊢ g if and only if Prog ⊢_k g with k ∈ O(Q).

An inference sequence performed on Prog corresponds to a computation of the parameterized system c in the simplified semantics (Section 5.1.2). Hence, to see that the above size of Cache is sufficient, we analyze the structure of computations in the simplified semantics. The analysis will reveal a dependency relation among the messages generated. We will see that this gives enough information to guide the Datalog computation so as to use a small-sized Cache.

Consider a computation ρ^de ending in the configuration last(ρ^de) = (m^de, lcfm^de). For every message msg^de in m^de, we define genthread(msg^de) as the first thread which added msg^de to the memory m^de. (Recall that the simplified semantics admits repeated insertions of env messages due to the reuse of timestamps from N⁺.) We define depend(msg^de) as the set of messages which genthread(msg^de) reads from before generating the first instance of msg^de. We define the notion of a dependency graph for a computation ρ^de.

▶ Definition 10.
The dependency graph of a computation ρ^de with last(ρ^de) = (m^de, lcfm^de) is the directed graph G_{ρ^de} = (V, E) whose vertices V = m^de are the messages in the final configuration and whose edges reflect the dependencies: (msg₁^de, msg₂^de) ∈ E if msg₁^de ∈ depend(msg₂^de).

As depend(−) is based on the linear order of the computation, the dependency graph is acyclic. The acyclicity of dependency graphs follows immediately from the definition of depend: if there were a cycle, then all the threads involved in the cycle would depend on each other for the first generation of the respective messages, causing a deadlock. We denote the sets of sink and source vertices of G by sink(G) resp. source(G). A path in G is also called a dependency sequence. A path or dependency sequence m₁ → m₂ → … → m_{n−1} → m_n thus says that m₁ was read by some thread which generated m₂, m₂ in turn was read by a thread which generated m₃, and so on, until the thread which generated m_n read m_{n−1}. Given such a sequence, we say m_i is an ancestor of m_j if i < j. The height of a vertex v is the length of a longest path from a source vertex to v. The maximal height over all vertices is height(G). See Figure 8 for an example.

Figure 8 Two possible dependency graphs for the code snippet with initial messages x = 0 = y and env threads T₁: y.load(0); y.load(1); x.store(1); y.store(2) and T₂: x.load(0); y.store(1); x.load(1); y.store(2). The color of each message msg^de signifies genthread(msg^de) (T₁ orange, T₂ violet, init gray). We denote the view as a vector (t_x, t_y). Since we only consider the thread adding a message for the first time, genthread((y, 2, −)) can be either T₁ (left graph) or T₂ (right graph).

Compact Computations. Unfortunately, dependency graphs may contain exponentially many vertices (due to the views), and given the PSPACE-hardness in Section 7 there is no way to reduce this to polynomial size. Yet, there are two parameters that we can reduce: the 'fan-in' of each vertex v (the number of messages read by genthread(v) before generating v) and the 'height' of the dependency graph (the length of the longest dependency sequence). A computation ρ^de is compact if its dependency graph G_{ρ^de} satisfies the following two bounds. (1) Every message v depends on a small number of other messages, |depend(v)| ≤ Q. (2) The dependency sequences are polynomially long, that is, height(G_{ρ^de}) ≤ Q. The following lemma says that compact computations are sufficient:

▶ Lemma 11.
Any message that can be generated in the simplified semantics can be generated by a compact computation.
Proof.
We prove both parts (fan-in and height) of this lemma by showing that if there exists a computation whose dependency graph violates the bound for fan-in (similarly height), then there must exist a computation whose dependency graph has a lower fan-in (height), with the rest of the graph (fan-ins of other vertices) unchanged. We first show this for fan-in. We assume that the programs c_dis executed by dis threads have been specified as transition systems (note that we can interconvert between the while-language and transition-system representations with only polynomial blowup). Then |c_dis| is an upper bound on the total number of transitions of all dis threads together.

Fan-in. Suppose, to the contrary, that |depend(v)| > |Dom||Var| + |c_dis| for some message v. Consider the thread p = genthread(v) which generated the message represented by vertex v for the first time. There are only |Dom||Var| distinct (variable, value) pairs, |Dom||Var| many init messages, and only |c_dis| many dis messages (|c_dis| is an upper bound on the number of transitions the dis threads can take). Hence, by a pigeonhole argument, p must have read two env messages with the same (variable, value) pair but distinct abstract views. Let these messages be m₁ = (x, d, vw₁^de) and m₂ = (x, d, vw₂^de), where the abstract views are unequal. Without loss of generality, assume that p = genthread(v) read m₁ first and m₂ later (in this order) before it generated v.

It can be seen that any time p read m₂, it could have read m₁ instead. This follows since timestamp comparisons are irrelevant when reading from env messages. The thread-local view obtained on replacing a read of m₂ with that of m₁ will only decrease or remain the same. From the simplified semantics, after reading m₁ once, the thread view vw^de satisfies vw^de ⊒ vw₁^de (per variable). Hence reading from m₁ again leaves the thread view at vw^de ⊔ₓᵉⁿᵛ vw₁^de = vw^de, while after reading m₂ the view would be vw^de ⊔ₓᵉⁿᵛ vw₂^de, which is at least vw^de. Indeed, instead of reading from m₂, the loading thread can read from m₁, resulting in a lower view for x (compared to reading from m₂).

Let ρ₁ denote the sub-computation starting from the position right after reading from the env message m₂. We can see that if we replace this read operation by a read from m₁, we can continue with ρ₁ as before. Indeed, all store operations in ρ₁ are independent of this load from m₂ (or m₁). Consider a load operation along ρ₁. A load on a variable y ≠ x is clearly not affected. Consider now a load on x performed by loading some message m₃, say by thread p. The view of x along ρ₁ for thread p was at least that given by m₂; if loading from m₃ was possible in ρ₁ when the view on x was at least vw₂^de(x), it definitely is possible now with a lower view on x. Lastly, consider a CAS operation on variable x along ρ₁, and assume the load was made from m₂ with vw₂^de(x) = ts⁺. The CAS operation then adds a new message on x with timestamp ts+1. The same thread can still perform the CAS by reading from m₁, with vw₁^de(x) ≼ ts⁺: as argued for the CAS rules of Figure 7, in both cases the timestamp after the CAS is ts+1. Hence reading from m₁ instead of m₂ does not affect the sub-computation ρ₁. Thus we can eliminate all reads of m₂ to decrease |depend(v)|, and so |depend(v)| ≤ |Dom||Var| + |c_dis| for each vertex v.

Height. Let there be a dependency sequence of length greater than 2|Dom||Var| + |c_dis|. There are only |Dom||Var| (variable, value) pairs, |Dom||Var| many init messages, and at most |c_dis| many dis messages. Hence, by a pigeonhole argument, for a dependency sequence longer than 2|Dom||Var| + |c_dis| there exists a (variable, value) pair (x, d) such that there are two env messages m = (x, d, vw₁^de) and n = (x, d, vw₂^de) along it. Without loss of generality, let n be an ancestor of m. So n has been read before generating m. Then we must have vw₁^de ⊒ vw₂^de by the RA semantics (since the thread generating m indirectly accumulates the view of n). Then the thread reading from (depending on) m could have directly read from n instead (note that since m itself depends on n, by the time m has been generated, n must have been as well). By reading from n, its view may only decrease or remain the same, thus not affecting the run (as justified above). Thus we can eventually shorten the dependency sequences so that all have length at most 2|Dom||Var| + |c_dis|. This gives us the result. ◀

In Cache
Datalog, the inference of an atom g from the program Prog involves a sequence of applications of the Add (to Cache) and Drop (from Cache) rules that ends with g being inferred. Such a sequence for Prog ⊢ g corresponds to a run ρ^de under the simplified RA semantics; we show that this follows from the structure of the query instance (Prog, g). The run ρ^de can be compacted to ρ̄^de by Lemma 11. From the dependency graph of ρ̄^de we can read off an inference strategy that keeps the Cache size polynomial in |Var|, |Dom| and |c_dis|. The following lemma formalizes this argument and so proves Lemma 9. This lemma, together with Lemma 11, gives Lemma 9 and leads to the coveted PSPACE bound. Since the term 2|Dom||Var| + |c_dis| will occur repeatedly, we denote it by the quantity Q. From here on, Q = 2|Dom||Var| + |c_dis|.

▶ Lemma 12 (Datalog Inference Strategy). Let Algo generate the query instance (Prog, g). The inference for Prog ⊢ g implies the existence of an execution ρ^de under the simplified semantics, which can be compacted to ρ̄^de. The computation ρ̄^de can be mapped back to a new inference sequence such that Prog ⊢_k g for k ∈ O(Q).

Proof.
This lemma has two parts: (1) it states that computations in the simplified semantics and inference sequences in the Cache-Datalog program are related, and (2) it says that compact computations can be mapped to an inference sequence with a small Cache size.

Let (Prog, g) be generated by the procedure Algo with Prog ⊢ g. We need to show that g can also be inferred from Prog with a small Cache. Recall that when generating the Datalog program Prog, the procedure Algo guesses the computations of the dis processes. Consider some inference sequence for Prog ⊢ g. For each application of an inference rule in the sequence, we can find a corresponding transition of a thread in the simplified semantics. This follows from the invariants in Section 5.1.2. Hence we can convert the sequence of inferences into a run ρ^de. This run in turn can be compacted by the arguments of Lemma 11 to get a smaller run ρ̄^de. Now we need to see how this compact run implies the existence of an inference sequence with a smaller Cache. To do this, we consider the dependency graph of ρ̄^de.

We proceed by induction on the height of messages in the dependency graph. We strengthen the statement and show that for every message msg^de at height h = height(msg^de), we have Prog ⊢_k emp(msg^de) (resp. Prog ⊢_k dmp(msg^de)) for k = h · Q. The lemma follows by the definition of compactness, which guarantees h ≤ height(G_{ρ̄^de}) ≤ Q.

The base case is trivial, since all messages in sink(G_{ρ̄^de}) are facts in the Datalog program Prog. We now show the inductive case for a message v ∈ G_{ρ̄^de} at height h + 1. The messages v′ in depend(v) have height at most h. The inductive hypothesis thus yields Prog ⊢_{hQ} v′. We infer these messages one at a time, store them in the Cache, and discard all atoms in the Cache used for the inference of the v′. Hence, at each step in the inference sequence, the Cache contains a subset of depend(v) which has already been inferred and, additionally, some atoms which are currently being used for the inference of the next member of depend(v). The former is bounded by Q by compactness (Lemma 11, |depend(v)| ≤ Q), while the latter is bounded by hQ by the induction hypothesis. Thus, by reusing the space in the Cache to infer members of depend(v), we only require an additional space of hQ, and the space consumption is at most

(Q − 1) [bound on |depend(v)|] + hQ [inductive hypothesis for the next atom at height h] = (h + 1)Q − 1.

At the end of this process, the size of the Cache equals |depend(v)|, and we are ready to infer v, having inferred and inserted into the Cache the atoms corresponding to messages from depend(v). This inference of v from the messages in depend(v) requires us to simulate the run of genthread(v) using the rules of the Datalog program (by mapping each transition executed by genthread(v) to its corresponding rule). We note that at all points in the simulation it suffices to store exactly one extra atom, either of etp or of dtp (depending on the type of genthread(v)), corresponding to the local state of genthread(v). The additional atom can be accommodated along with depend(v) since |depend(v)| + 1 ≤ (h + 1)Q (as |depend(v)| ≤ Q). Hence a Cache of size at most 2(h + 1)(|Dom||Var| + |c_dis|) is sufficient, and by induction the lemma follows. ◀

Lemma 11 along with the compact inference sequences of Lemma 12 together show that, for all the query instances generated by Algo, inference is possible iff it is possible with a small Cache. This shows Lemma 9, giving us PSPACE-membership.
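The inference strategy of the proof above can be sketched as a small recursive procedure. This is an illustrative sketch under our own naming (`infer`, the `rules` encoding as a map from ground heads to alternative bodies, and the `stats` bookkeeping are assumptions, not the paper's algorithm); it assumes the ground rules are acyclic, as guaranteed by the dependency graph.

```python
def infer(atom, facts, rules, cache, stats):
    """Derive `atom` from ground `facts` and `rules` (head -> list of bodies).
    Body atoms are Added to the Cache while a rule fires and Dropped once the
    head has been derived, mirroring the strategy in the proof of Lemma 12.
    Assumes the rule dependencies are acyclic."""
    if atom in facts or atom in cache:
        return True
    for body in rules.get(atom, []):
        added, ok = [], True
        for b in body:
            if b in facts or b in cache:
                continue
            if infer(b, facts, rules, cache, stats):
                cache.add(b)                 # Add: keep the inferred dependency
                added.append(b)
                stats["peak"] = max(stats["peak"], len(cache))
            else:
                ok = False
                break
        for b in added:
            cache.discard(b)                 # Drop: scaffolding no longer needed
        if ok:
            return True
    return False
```

On a chain of alternating etp/emp rules the peak cache size stays at one atom regardless of chain length, while for a rule with a large body the peak tracks the size of depend(v) plus the recursion, matching the (h + 1)Q accounting above.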
In this section our goal is to support compositional verification methods prominent in program logics and thread-modular reasoning style algorithmic verification. Such approaches focus on a single thread and study its interaction with others.

We extend the system from Section 5 by adding a single distinguished 'ego' thread, which we refer to as the leader, denoted by the symbol ldr. Amongst the n dis threads, only the ldr can execute loops, while the others, as in Section 5, are required to be loop-free. The environment once again consists of arbitrarily many identical env threads that are required to be CAS-free. We can represent this as env(nocas) ∥ dis₁ ∥ dis₂(acyc) ∥ ⋯ ∥ disₙ(acyc), which we refer to as the leader setting.

Note that the simplified semantics presented in Section 4 applies here. This allows us to leverage Theorem 5, by which we can operate on the simplified semantics instead. The main challenge of this section, then, is to go from the simplified semantics in the presence of a leader to an NEXPTIME verification technique, by means of a small-model argument.
As discussed before, the safety verification problem amounts to solving the message generation problem (MG) (Section 5.1). Let the goal message be denoted msg. We demonstrate that the simplified semantics helps solve the problem. Our main finding is that message generation has short witness computations (assuming the domain is finite). The proof of Theorem 13 is in Section 6.3.

▶ Theorem 13. In the leader setting, a message can be generated in the simplified semantics if and only if it can be generated by a computation of length at most exponential in the input specification, |c_dis| · |c_env| · |Reg| · |Dom| · |Var|.

▶ Corollary 14. In the leader setting, the message generation problem for RA is in NEXPTIME.

We establish the result in two steps. First we show that every computation in the simplified semantics has a "backbone", which is made up solely of some threads called essential threads (Lemma 16). Then we show how to truncate this backbone to obtain a short computation (Section 6.2).
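The dependency bookkeeping behind this analysis (genthread, depend, and vertex heights, as in Definition 10) can be sketched from a linearly ordered computation log; the event encoding and function names are our own illustration, not the paper's formalism.

```python
from collections import defaultdict

def dependency_graph(events):
    """events: ('store', thread, msg) / ('load', thread, msg) records in
    computation order. genthread(msg) is the first thread to store msg;
    depend(msg) is the set of messages that thread read before its first
    store of msg."""
    genthread, depend = {}, {}
    reads = defaultdict(set)
    for kind, thread, msg in events:
        if kind == "load":
            reads[thread].add(msg)
        elif msg not in genthread:        # repeated insertions are ignored
            genthread[msg] = thread
            depend[msg] = set(reads[thread])
    return genthread, depend

def height(depend, msg):
    """Length of a longest dependency sequence ending in msg (depend is acyclic)."""
    preds = depend.get(msg, set())
    return 0 if not preds else 1 + max(height(depend, m) for m in preds)
```

For instance, on the right-hand computation of Figure 8, the message (y, 2, −) is generated by T₂ after it has read the init message on x and the message (x, 1, −), giving that vertex height 3.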
Analyzing Dependencies in the Dependency Graph. The following study of dependencies generalizes the one in Section 5.2. In a computation of the simplified semantics, messages from the dis threads have unique timestamps, whereas messages from env threads may have identical timestamps. We recall genthread(msg), the thread which first generated message msg, and the dependency set of a message msg, denoted depend(msg), as defined in Section 5.2. We define depend(msg) = ∅ for initial messages. We write depend*(msg) for the reflexive and transitive closure of depend: the smallest set containing msg and such that for all msg′ ∈ depend*(msg) we have depend(msg′) ⊆ depend*(msg).

Similar to Lemma 11, we now show that we can focus on computations where any write event directly depends on a small number of other events, and where dependency sequences are short. The main difference with Section 5.2 is that, since the leader has loops, we cannot a priori bound executions w.r.t. |c_dis|. Keeping this in mind, we provide an alternative notion of compact computations.

Compact Computations. We call a computation ρ compact if for every env message msg′ ∈ depend*(msg) in the computation (1) |depend(msg′) ∩ Msgs(ρ↓env)| ≤ |Dom||Var| and (2) for every msg′′ ≠ msg′ from depend*(msg) ∩ Msgs(ρ↓env), either the variable or the value of msg′′ is different from that of msg′. The first point addresses the situation where an env thread reads two messages with the same variable and value but different views: it says that the thread could have chosen to read one of the messages twice. The second point says there is no need to generate two env messages with the same variable and value along a dependency sequence: a thread reading the second message could equally well read the first, since the ts⁺ timestamps of env messages make them available forever.

▶ Lemma 15.
In the leader setting, if the message msg can be generated in the simplifiedsemantics, then it can be generated by a compact computation.
In a compact computation, both the fan-in (size of the depend set) and the depth (along a dependency sequence) of env messages are O(|Dom||Var|), since there are only as many distinct (variable, value) pairs. Hence O((|Dom||Var|)^{|Dom||Var|}) many env messages are sufficient to generate msg. Our goal is to derive a similar bound on dis messages. First, we consider the dis messages read by env threads, i.e., the dis-env reads-from dependencies. The dis-dis dependencies will be handled later.

Essential Messages and Threads. Given a computation ρ in the simplified semantics, the essential messages for generating message msg, denoted edepend(msg), form the smallest set that includes msg and is closed as follows. (1) For all messages msg′ ∈ edepend(msg) ∩ Msgs(ρ↓env), we have depend(msg′) ⊆ edepend(msg). (2) For all msg′ ∈ edepend(msg) ∩ Msgs(ρ↓dis), we have depend(msg′) ∩ Msgs(ρ↓env) ⊆ edepend(msg). Note the asymmetry: for the env threads we track all dependencies, for the dis threads we only track the dependencies from env.

For a computation ρ, the threads generating essential messages of msg for the first time, together with the set of dis threads, are the essential threads: ethread(ρ) = {genthread(m) | m ∈ edepend(msg)} ∪ dis.

We claim that projecting ρ to the essential threads yields a valid computation in the simplified semantics. Essential messages thus form the backbone of the computation mentioned above. We now give the proofs of Lemma 16 and Corollary 17.

▶ Lemma 16. If ρ is a computation in the simplified semantics, so is ρ↓ethread(ρ).

Proof.
To prove this lemma, it suffices to show that there is no thread in ethread(ρ) that reads from some thread t ∉ ethread(ρ). Then we can simply project away the threads not in ethread(ρ), and all the reads-from dependencies will still be respected. This follows from the definition of edepend(·). Indeed, for an essential env thread t, the messages (and hence threads) that t reads from are also essential. All dis threads are essential by definition. Additionally, for any dis thread, we add all its env dependencies to the essential set. The set ethread(ρ) is then closed under reads-from dependencies, and hence the computation ρ↓ethread(ρ) is valid under RA. ◀

Now we discuss the bounding of essential messages. Essential env messages (and essential env threads) are at most exponentially many, bounded by Q₁ = (|Dom||Var|)^{|Dom||Var|} by the earlier compactness argument. We show that the number of essential dis messages is bounded as well. Firstly, each env thread has a state space (control-state, registers) bounded by Q₂ = |c_env| · |Dom|^{|Reg|}. Given the earlier bound on the total number of essential env messages (and hence on those generated by a single thread), an env thread run of length greater than O(Q₁Q₂) implies that there exists a sub-run in which (1) no essential message was generated and (2) the thread revisited the same local state twice. We can truncate this sub-run, since the absence of essential messages implies that external reads-from dependencies are not affected. Hence the computation of a single env thread is O(Q₁Q₂)-bounded. Given this bound on env threads, the total number of dis messages consumed by the env threads can be at most O(Q₁ · Q₁Q₂). This implies that exponentially many essential dis messages suffice.

▶ Corollary 17.
Let the goal message msg be generated in a computation of the system c. Then, for some compact computation, |edepend(msg)| is at most exponential in |c|. Proof.
Recall the notation genthread(msg), which refers to the thread that generated the message msg for the first time. In the following, if t = genthread(msg), we also refer to t as the "first writer" of msg.

First we observe that edepend(msg) ⊆ depend*(msg). Hence, in particular, we have edepend(msg) ∩ Msgs(ρ↓env) ⊆ depend*(msg) ∩ Msgs(ρ↓env). This is shown to be at most exponential (O(Q₁)) by Lemma 15, since both the height and the env fan-in of the dependency graph restricted to env are polynomial. Given that each essential message is generated for the first time by a unique essential thread, the number of essential env threads is also bounded by O(Q₁).

Now, consider the fragment ρ′ of the computation between two consecutive first-writes (first points of generation) of two essential env messages. If any env thread performs more than O(Q₂) = O(|c_env| · |Dom|^{|Reg|}) many transitions within ρ′, this implies that there are two configurations lcf₁, lcf₂ within ρ′ at which the local states of the thread (modulo view) are identical; this follows since |c_env| is the program size and |Dom|^{|Reg|} is the number of distinct register valuations. Additionally, note that the view at lcf₁ cannot be greater than that at lcf₂ (monotonicity of views in RA). Hence we can simply truncate the sub-computation between lcf₁ and lcf₂ while keeping the computation valid under RA (the thread with the lower view can still perform all its remaining transitions). In this truncation, no essential messages are lost, and hence the reads-from dependencies are respected.

To explain further, suppose to the contrary that some thread t which is the first writer of an essential message executed more than O(Q₁Q₂) transitions lcf₁ lcf₂ ⋯ lcf_l. Since the total number of essential messages is only O(Q₁), there must exist a subsequence σ such that no essential env messages were generated (for the first time) in σ. Additionally, since the state space of each thread is O(Q₂), by a pigeonhole argument it follows that two local configurations lcf_i, lcf_j of t in σ are equal. We can simply truncate the fragment of the run between these configurations, since no essential messages have been generated for the first time in it.

Then it suffices for each first-writer env thread to take at most O(Q₁Q₂) many transitions and consequently read at most exponentially many dis messages. Recall that the dis messages read by first writers of essential env messages are themselves essential. Since the number of essential env threads which are first writers is itself bounded by O(Q₁), the number of essential dis messages is bounded by O(Q₁ · Q₁Q₂), which is exponential in the input. Since edepend(msg) is a union of essential dis and env messages, we get the exponential bound on essential messages. ◀

Combined with Lemma 16, the corollary says it is sufficient to focus on computations with at most exponentially many essential threads and essential messages. We now want to bound the computation of the dis threads.
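The asymmetric closure defining the essential messages can be sketched as follows; this is an illustration under our own encoding (`edepend` over a `depend` map, with an `is_env` flag per message), not the paper's formalism.

```python
def edepend(goal, depend, is_env):
    """Smallest set containing `goal` that is closed under all dependencies
    of env messages but only the env dependencies of dis messages."""
    essential, frontier = {goal}, [goal]
    while frontier:
        m = frontier.pop()
        for d in depend.get(m, ()):
            # track d if m is an env message (all deps) or d is an env message
            if (is_env[m] or is_env[d]) and d not in essential:
                essential.add(d)
                frontier.append(d)
    return essential
```

The essential threads would then be obtained as the first writers of these messages together with all dis threads, e.g. `{genthread[m] for m in essential} | dis_threads`.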
The computation truncation idea as applied to env threads earlier does not apply to the leader. Recall the asymmetry in the definition of essential dependencies: we did not include the dis-dis load dependencies. These dependencies come in two forms: (1) those involving (either as message writer or as reader) some non-leader dis thread, and (2) ldr-ldr dependencies. The former are polynomially many owing to the loop-free nature of the non-leader dis threads. Hence, we focus on ldr-ldr dependencies. For a memory m^de, let m^de↓ldr be the set of ldr messages in it. Assuming vw^de is the view of the ldr, let selfRead(vw^de, m^de) denote the (x, d) pairs in messages of m^de↓ldr which can be read by the ldr.

▶ Definition 18. selfRead(vw^de, m^de) = {(x, d) | (x, d, vw′^de) ∈ m^de↓ldr, vw′^de(x) = vw^de(x)}.

We note that a pair (x, d) is in selfRead when this pair is the last store by the ldr on x following which vw^de(x) has not changed. Observe that there can be at most |Dom|^{|Var|} many distinct selfRead functions. Consider a sub-computation of the leader between two generations of essential messages. We call configurations cf₁^de and cf₂^de ldr-equivalent if (1) the local configurations of the leader coincide except for the views vw₁^de resp. vw₂^de, and (2) the memories m₁^de and m₂^de satisfy selfRead(vw₁^de, m₁^de) = selfRead(vw₂^de, m₂^de).

The computation of the leader between ldr-equivalent cf₁^de and cf₂^de can then be projected away while retaining a computation in the simplified semantics. Since there are only O(|c_ldr| · |Dom|^{|Reg|} · |Dom|^{|Var|}) many distinct configurations that are pairwise not ldr-equivalent, after projecting away the redundant parts, the leader has an at most exponentially long computation between the generations of two consecutive essential messages. Given the exponential bound on all essential messages, we see that, post projection, the leader computation is reduced to exponential size. Combined with the argument for the env and non-leader dis threads, this gives Theorem 13. Note that the resulting nondeterministic algorithm does not run in polynomial space, as there may be exponentially many essential ldr messages which need to be generated concurrently with the env threads.

NEXPTIME-membership of safety verification in the leader case
We now move on to Theorem 13. It suffices to show that we only need to consider computations of exponential length in order to verify safety properties of a parameterized system under the simplified semantics in the leader case. For this, we show exponential bounds on the env and dis components of the computation. We have already seen that for the essential env threads, O(Q^Q) is an upper bound on the number of transitions they need to make. Additionally, this bound also applies to the number of essential dis messages. Note that the non-leader dis threads are loop-free, and hence their number of transitions is polynomial in |c_dis|. Hence we now focus on computations of the leader. We denote by Q = |c_ldr| · |Dom|^{|Reg| + |Var|} a bound on the number of distinct (non-equivalent) leader configurations, and use it below in the proof. For the ldr, we need to maintain more state (as compared to the env threads) to ensure that the truncated run is valid, because we also want to capture ldr-ldr dependencies. The selfRead function does precisely this: at each point in the run it tracks the set of ldr messages that can be read by the ldr itself. Assume once again that there is a (super-exponential) leader computation of length greater than O(Q · Q^Q). Then, since O(Q^Q) is a bound on the total number of essential dis messages (and in particular essential ldr messages), there must exist a sub-computation of the ldr of length greater than O(Q) that is free of essential message generation. Let this sub-computation be lcf_0 lcf_1 ⋯ lcf_l, and let the memory states along it be m^de_0 m^de_1 … m^de_l. We augment each configuration lcf_i with the respective memory state m^de_i, obtaining an augmented configuration as explained below. Consider the configurations obtained by augmenting lcf_i = (c_i, rv_i, vw^de_i) with the set selfRead(vw^de_i, m^de_i).
That is, given lcf_i = (c_i, rv_i, vw^de_i), on augmentation with selfRead(vw^de_i, m^de_i) we obtain the augmented state ⟨c_i, rv_i, selfRead(vw^de_i, m^de_i)⟩. Now, selfRead can take at most |Dom|^{|Var|} many values, while the leader local state (modulo view) has only |c_ldr| · |Dom|^{|Reg|} values. This implies, by a pigeonhole argument, the existence of a pair i, j such that ⟨lcf_i, selfRead(vw^de_i, m^de_i)⟩ and ⟨lcf_j, selfRead(vw^de_j, m^de_j)⟩ are equivalent. Now, the view of the ldr thread is monotonic. This implies that if for i ≠ j we have ⟨c_i, rv_i, selfRead(vw^de_i, m^de_i)⟩ = ⟨c_j, rv_j, selfRead(vw^de_j, m^de_j)⟩, then the sub-computation between i and j may be truncated. Thus the run lcf_0 ⋯ lcf_i lcf_{j+1} ⋯ lcf_l is also a valid run of the thread. Moreover, it does not affect other threads since, once again, no essential messages are lost. Hence for any super-exponential (length greater than O(Q · Q^Q)) leader computation, there exists a shorter computation which also preserves reachability. Thus for safety verification it suffices to consider runs of at most exponential length, immediately giving an NEXPTIME upper bound.
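The pigeonhole-and-splice step of the argument can be phrased as a generic routine on sequences of (hashable) augmented states. This is only an illustration of the counting argument in our own notation, not an algorithm from the paper.

```python
def splice_first_repeat(states):
    """If some augmented state repeats at positions i < j, drop the segment
    strictly between them (keeping one copy), mirroring the truncation
    lcf_0 ... lcf_i lcf_{j+1} ... lcf_l; otherwise return the run unchanged."""
    seen = {}
    for j, st in enumerate(states):
        if st in seen:
            i = seen[st]
            return states[:i + 1] + states[j + 1:]
        seen[st] = j
    return states
```

Iterating this until no state repeats bounds the run length by the number of distinct augmented states, which is how the bound Q on non-equivalent configurations yields runs of at most exponential length.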
PSPACE-hardness of env(nocas, acyc)

We show that the applications of semantic simplification to the loop-free and leader settings are tight, and that further simplification is not possible. Having shown that safety verification of env(nocas) ‖ dis_1(acyc) ‖ ⋯ ‖ dis_n(acyc) is in PSPACE, we give a matching lower bound. For the lower bound, it suffices to consider the variant with no dis threads and loop-free env threads, env(nocas, acyc). In fact, this result captures the inherent complexity of parameterized RA, termed PureRA, i.e., RA in its simplest form. The simplicity of PureRA comes from (1) disallowing registers, and (2) allowing stores to write only the value 1, with the memory initialized to 0. We obtain PSPACE-hardness even for this reduced form, which is surprising, given that the full form is in PSPACE. Notice that PSPACE-hardness with registers is trivial, since PSPACE computations can be encoded in valuations of registers.
In this section, we elaborate on the PSPACE-hardness of checking safety properties of parameterized systems under RA in the absence of dis threads (and with loop-free, cas-free env threads), which we denote env(nocas, acyc). In fact, we investigate the inherent complexity of RA by removing all extra frills such as registers and arbitrary data domains. What remains is PureRA, RA in its simplest form. The simplicity of PureRA comes from the fact that we do not use registers, and the only writes allowed are those of value 1 to a shared variable, where we assume that the memory is initialized to 0, so that the data domain is {0, 1}. The remarkable thing is that we obtain PSPACE-hardness even in this reduced form, which is surprising, given that in its full form RA is in PSPACE by Section 5. Notice that PSPACE-hardness with registers is trivial, since computations can be encoded in register operations themselves.

c = c_AG ⊕ c_SATC ⊕ c_FE[0] ⊕ ⋯ ⊕ c_FE[n−1] ⊕ c_assert
choose(u) = (t_u := 1) ⊕ (f_u := 1)
c_AG = choose(u_0); choose(e_1); choose(u_1); ⋯ ; choose(u_n); (s := 1)
c_SATC = assume(s = 1); check(Φ); ((assume(t_{u_n} = 0); a_{n,1} := 1) ⊕ (assume(f_{u_n} = 0); a_{n,0} := 1))
c_FE[i] = assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1); (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)); ((assume(t_{u_i} = 0); a_{i,1} := 1) ⊕ (assume(f_{u_i} = 0); a_{i,0} := 1))
c_assert = assume(a_{0,0} = 1); assume(a_{0,1} = 1); assert false

Figure 9
The parametrized system used in the reduction
To show the PSPACE-hardness of checking safety properties of parameterized systems of the class env(nocas, acyc), we establish a reduction from the canonical PSPACE-complete problem, QBF. The QBF problem is described as follows. Given a quantified boolean formula Ψ = ∀u_0 ∃e_1 ∀u_1 ∃e_2 ⋯ ∃e_n ∀u_n Φ(u_0, e_1, ⋯, u_n) over variables Vars(Ψ) = {u_0, …, u_n, e_1, …, e_n}, decide if Ψ is true. Ψ has n + 1 universally quantified variables and n existentially quantified variables. To establish the reduction, we construct an instance of the parametrized reachability problem for RA (in fact PureRA) consisting of the parametrized system c, such that c is unsafe if and only if the QBF instance is true. We assume that the QBF instance Ψ is as given above and now detail the construction. The program c executed by the env threads (given in Figure 9) consists of functions (sub-programs), one of which may be executed non-deterministically:

c = c_AG ⊕ c_SATC ⊕ c_FE[0] ⊕ ⋯ ⊕ c_FE[n−1] ⊕ c_assert

Gadgets used.
The task of checking the satisfiability of Ψ is distributed over the env threads executing these functions. Each function has a particular role; we term these roles gadgets and now describe them.

c_AG: The Assignment Guesser guesses a possible satisfying assignment for Vars(Ψ).
c_SATC: The SATisfiability Checker checks satisfiability of Φ w.r.t. an assignment guessed by c_AG.
c_FE[i]: The ∀∃ (ForallExists) Checker at level i, 0 ≤ i ≤ n − 1, verifies that the (i + 1)-th quantifier alternation ∀u_i ∃e_{i+1} is respected by the guessed assignments. This proceeds in levels, where the check function at level i + 1, c_FE[i+1], 'triggers' the check function at level i, c_FE[i], till we have verified that all assignments satisfying Φ constitute the truth of Ψ.
c_assert: The Assertion Checker reaches the assert false instruction when all the previous functions act as intended, implying that the formula was true.

Due to the parameterization, an arbitrary number of threads may execute the different functions at the same time. However, there is no interference between threads, and there is a natural order between the roles: c_SATC requires c_AG to function as intended, and c_FE[i] requires the functions c_AG, c_SATC and c_FE[j], n − 1 ≥ j > i.

Shared Variables.
We use the following set of shared variables in c. For each x ∈ Vars(Ψ), we have boolean shared variables t_x and f_x in c. These variables represent true and false assignments to x in a way that is explained below. All the shared variables used are boolean, and the initial value of all variables is 0. We also have a special (boolean) variable s.

Encoding variable assignments of Ψ: the essence of the construction. Recall that the messages in the memory are of the form (x, d, vw) where x is a shared variable, d ∈ {0, 1}, and vw is a view. To begin, the views of all variables are assigned timestamp 0. An assignment to the variables in Ψ can be read off from the vw of a message (s, 1, vw) in the memory state. For v ∈ Vars(Ψ), if vw(t_v) = 0, then v is considered to have been assigned true, while if vw(f_v) = 0, then v is assigned false. Our construction, explained below, ensures that exactly one of the shared variables t_v, f_v will have timestamp 0 in the view of the message (s, 1, vw). The zero/non-zero timestamps of the variables t_x and f_x in the view of (s, 1, vw) can be used to check satisfiability of Φ, since only a thread with a zero timestamp can read the initial message on the corresponding variable.

Checking a single clause.
As an example, consider the i-th clause, say e_1 ∨ ¬u_1 ∨ u_2. The satisfiability check is implemented in a code fragment as follows:

check(i) = (assume t_{e_1} = 0) ⊕ (assume f_{u_1} = 0) ⊕ (assume t_{u_2} = 0)

and check(Φ) = check(1); check(2); ⋯; check(l), where l is the number of clauses. Finally, we have the boolean variables a_{i,0} and a_{i,1} for i ∈ {0, ⋯, n}: these are 2(n + 1) 'universality enforcing' variables that ensure that all possible assignments to the universal variables in Vars(Ψ) have been checked.
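The zero-timestamp reading of literals can be mimicked executably as follows. The representation of views and literals is ours, chosen only to mirror the assume(t_x = 0)/assume(f_x = 0) pattern above; it is not the paper's formalism.

```python
def literal_can_pass(view, lit):
    """A branch assume(t_v = 0) (positive literal) or assume(f_v = 0)
    (negative literal) succeeds exactly when the corresponding timestamp
    in the thread's view is 0."""
    var, positive = lit
    key = ("t", var) if positive else ("f", var)
    return view[key] == 0

def check_clause(view, clause):
    # check(i) is a non-deterministic choice (⊕) over the literals:
    # the clause passes iff some branch can succeed.
    return any(literal_can_pass(view, lit) for lit in clause)

def check_phi(view, clauses):
    # check(Φ) = check(1); ...; check(l) is sequential, so all must pass.
    return all(check_clause(view, c) for c in clauses)
```

Since the view is fixed once the (s, 1, vw) message is loaded, check(Φ) succeeds exactly when the embedded assignment satisfies every clause.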
We now detail the gadgets (functions) mentioned in Figure 9.
Assignment Guesser c_AG: The job of the Assignment Guesser is to guess a possible assignment for the variables. This is done by writing 1 to exactly one of the variables t_x, f_x for each x ∈ Vars(Ψ). Each such write is required to have a timestamp greater than 0 by the RA semantics, and the view vw of the writing thread is updated accordingly. After making the assignment to all variables in Vars(Ψ) as described, the writing thread adds the message (s, 1, vw) to the memory. Consequently, the view vw of the writing thread (and hence of the message) satisfies

∀x ∈ Vars(Φ): vw(t_x) = 0 ⊕ vw(f_x) = 0.

We interpret this as: the assignment chosen for x ∈ Vars(Φ) is true if vw(t_x) = 0 and is false if vw(f_x) = 0. The chosen assignment is thus encoded in vw and hence can be incorporated by threads loading 1 from s using the message (s, 1, vw) (see c_SATC). This follows since load operations of the RA semantics cause the thread-local view to be updated by the view in the message loaded.
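A sketch of this encoding: each write bumps the timestamp of the written twin above 0, so the untouched twin (timestamp 0) is the one that encodes the chosen polarity. The dictionary representation and the integer clock are our own modelling choices, not part of the construction.

```python
def guess_assignment(assignment):
    """Model of c_AG: for each variable x, write 1 to exactly one of t_x/f_x.
    The write gives that twin a fresh timestamp > 0 in the writer's view, so
    the *other* twin keeps timestamp 0; x is true iff vw(t_x) = 0."""
    view, clock = {}, 1
    for x, val in assignment.items():
        written = ("f", x) if val else ("t", x)  # write the opposite twin
        kept = ("t", x) if val else ("f", x)
        view[written] = clock                    # fresh timestamp > 0
        view[kept] = 0
        clock += 1
    return view  # the view embedded in the final message (s, 1, vw)

def decode(view, variables):
    """Read the embedded assignment back off a loaded (s, 1, vw) message."""
    return {x: view[("t", x)] == 0 for x in variables}
```

Decoding a guessed view recovers exactly the guessed assignment, which is the round trip the c_SATC threads rely on.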
SAT Checker c_SATC: The SAT Checker reads from one of the messages of the form (s, 1, vw) generated by c_AG. Using the code explained in Figure 9, it must check that the assignment obtained using vw satisfies Φ. The crucial observation is that assume(t_x = 0) (resp. assume(f_x = 0)) being successful is synonymous with the timestamp of t_x (resp. f_x) in vw being 0. This holds since assume(v = 0) requires the ability to read the initial message on v, which in turn requires the thread-local view on v to be 0. The timestamp of t_x (resp. f_x) in vw itself being 0 is equivalent to x being assigned the value true (resp. false) by c_AG. Finally, it checks that either t_{u_n} or f_{u_n} had timestamp 0 in vw, and writes 1 to a_{n,1} or a_{n,0} correspondingly in Figure 9. For insight, we note prematurely that we will enforce both these writes to a_{n,1} and a_{n,0} as a way of ensuring universality for the variable u_n. The main task is to verify the 'goodness' of the assignments satisfying Φ. One of the things to verify is that we have satisfying assignments for both values true/false of the universal variables u_i. If assume(t_{u_n} = 0) evaluates to true in c_SATC, then in the view of the message (s, 1, vw) obtained at the end of c_AG, we have vw(t_{u_n}) = 0. We now need a c_AG function (executed by some thread) to make an assignment such that in the view of the message (s, 1, vw'), we have vw'(f_{u_n}) = 0, and the formula Φ is satisfiable again. The next step is to check whether these assignments, which differ in u_n, are sound with respect to the ∀u_{n−1} ∃e_n part of Ψ: that is, the assignment to e_n is independent of that of u_n. This procedure has to be iterated with respect to all of u_0, u_1, …, u_{n−1} by (1) first ensuring that Φ is satisfiable for both assignments to u_i, 0 ≤ i ≤ n − 1, and (2) then checking for each 1 ≤ i ≤ n that the choice of assignment to e_i is independent of all variables in {u_i, e_{i+1}, ⋯, u_n}.

ForallExists Checker c_FE[_]: The n ∀∃ Checkers c_FE[0], …, c_FE[n−1] take over at this point, consuming the writes made earlier. In general, for each i ∈ {0, ⋯, n − 1}, we have a ∀∃ Checker function c_FE[i] that operates at level i by reading 1 from the a_{i+1,0}, a_{i+1,1} variables and making writes to the a_{i,0}, a_{i,1} variables.

Universality Check: c_FE[i] first verifies that all possible valuations of the universally quantified variable u_{i+1} made Φ satisfiable: the two statements assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1) verify this by reading 1 from a_{i+1,0} and a_{i+1,1} (note how all higher-level functions c_FE[j], j > i, enforce this by generating a dependency tree such as the one in Figure 10).

Existentiality Check: Next, c_FE[i] checks that the satisfying assignments of Φ seen so far agree on the existentially quantified variable e_{i+1}: the statements (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)) check this. Assume that we have satisfying assignments of Φ which do not agree on e_{i+1}. Then we have messages (a_{i+1,0}, 1, vw_1) and (a_{i+1,1}, 1, vw_2) such that vw_1(t_{e_{i+1}}) > 0 (e_{i+1} assigned false) but vw_2(t_{e_{i+1}}) = 0 (e_{i+1} assigned true). Now when c_FE[i] reads from both these messages, its view vw will have both vw(t_{e_{i+1}}) > 0 and vw(f_{e_{i+1}}) > 0. This prevents c_FE[i] from executing (assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)), since the messages in the memory where t_{e_{i+1}} and f_{e_{i+1}} have value 0 (and timestamp 0) cannot be read. This enforces that the choice of the existentially quantified variable e_{i+1} is independent of the choice of the assignments made to the variables in {u_{i+1}, e_{i+2}, ⋯, u_n}, and hence the proper semantics of quantifier alternation is maintained.

Propagation:
Finally, the c_FE[i] function 'propagates' assignments to the next level, that is, to c_FE[i−1], after a last verification. Let A_{i+1,j} contain all assignments satisfying Φ which agree on e_{i+1} and where u_i is assigned value j ∈ {0, 1}. Such assignments are propagated to the next level by a c_FE[i] function which writes 1 to a_{i,j}. c_FE[i−1] is accessible only when A_{i+1,0} and A_{i+1,1} are both propagated.

Figure 10
The dependency tree for the case of ∀u_0 ∃e_1 ∀u_1 ∃e_2 ∀u_2 Φ. The same color of sibling nodes c_FE[i] represents that the value of e_{i+1} is the same at both of these.

Assert Checker c_assert: After the n ∀∃ Checkers finish, the Assertion Checker reads 1 from the variables a_{0,0} and a_{0,1} and reaches the assertion assert false. This is possible only if all the earlier functions act as intended, which in turn is only possible if the QBF evaluates to true.

The non-deterministic branching between the choices of the gadgets above means that each env thread executes exactly one of the gadgets. Together, however, they check Ψ in a distributed fashion, one thread passing on a part of its state to the next one by the load-stores on the a_{_,0/1} variables as mentioned above. Hence a computation that reaches the assertion requires each thread to play a part in this tableau. We now describe this. First, a set of 2^{n+1} threads run the c_AG gadgets; they guess one assignment each, such that all possible assignments for the universally quantified variables are covered and such that the existentially quantified variables are chosen so that the semantics of quantifier alternation is respected. Essentially this means that the 2^{n+1} assignments guessed would be a sufficient witness to the truth of Ψ. Now, 2^{n+1} threads execute c_SATC and check that each of the assignments guessed satisfies Φ (one thread checks one assignment). They produce a 'proof' that this check is complete by writing to the variables a_{n,0/1}. This also checks that the innermost universality is respected. At level n − 1, 2^n threads execute c_FE[n−1]. Each c_FE[n−1] reads 1 from both a_{n,0} and a_{n,1} and reads 0 from exactly one of t_{e_n} or f_{e_n}. Depending on the view read from the level below, they either write 1 to a_{n−1,1} or to a_{n−1,0}. (Prematurely, this corresponds to the assignments A_{n,1} and A_{n,0} in the proof below.) In essence these threads check that the last quantifier alternation (∀u_{n−1} ∃e_n) is respected. 2^{n−1} threads then execute c_FE[n−2] at level n − 2, reading 1 from both a_{n−1,0} and a_{n−1,1}, and reading 0 from exactly one of t_{e_{n−1}} or f_{e_{n−1}}. These threads then write 1 to either a_{n−2,1} or to a_{n−2,0} (representing the assignments A_{n−1,1} and A_{n−1,0} in the proof below). These threads check that the second-last quantifier alternation ∀u_{n−2} ∃e_{n−1} is respected. This continues till two threads execute c_FE[0], each writing 1 to a_{0,1} or a_{0,0}. These two writes are read by a thread executing c_assert. The views of these threads are all stitched together by the stores and loads they perform on the variables s (for guessing assignments) and a_{_,0/1} (for checking proper alternation). Figure 10 illustrates how the views (in which the assignments are embedded as described earlier) propagate through these threads for the case of the QBF ∀u_0 ∃e_1 ∀u_1 ∃e_2 ∀u_2 Φ. The nodes represent individual threads executing the corresponding gadget, and the edges represent the variable to which a child writes to pass on its view to its parent.
▶ Lemma 19. Ψ is true iff the assert false statement is reachable in c.

This gives us the main theorem.

▶ Theorem 20. The verification of safety properties for parametrized systems of the class env(nocas, acyc) under RA is PSPACE-hard.
We prove that reaching assert false is possible in the parameterized system c iff the QBF Ψ is satisfiable. First we fix some notation. Given the QBF Ψ = ∀u_0 ∃e_1 … ∃e_n ∀u_n Φ(u_0, e_1, …, u_n), we define for 0 ≤ i ≤ n the level-i QBF corresponding to Ψ as follows. For 0 ≤ i ≤ n − 1, the level-i QBF, denoted Ψ_i, is defined as

Ψ_i ≡ ∀u_i ∃e_{i+1} ∀u_{i+1} ∃e_{i+2} … ∀u_n ∃e_1 ∃e_2 … ∃e_i ∃u_0 ∃u_1 … ∃u_{i−1} Φ(u_0, e_1, …, u_n)

For i = n, the level-n QBF, denoted Ψ_n, is defined as

Ψ_n ≡ ∀u_n ∃e_1 … ∃e_n ∃u_0 ∃u_1 … ∃u_{n−1} Φ(u_0, e_1, …, u_n)

Note that Ψ_0 is the same as Ψ. To prove Lemma 19, we prove the following helper lemmas. For ease of argument, we add some labels to our gadgets, and reproduce them below.

choose(u) = (t_u := 1) ⊕ (f_u := 1)
c_AG = choose(u_0); choose(e_1); choose(u_1); ⋯ ; choose(u_n); (s := 1)

Figure 11
Implementation of the Assignment Guesser c_AG gadget.

▶ Lemma 21. Ψ_n is true iff we reach the label λ_1 of the c_SATC gadget (ref. Figure 12) in some thread, and the label λ_2 of the c_SATC gadget in some thread.
▶ Lemma 22. For 0 ≤ i ≤ n − 1, Ψ_i is satisfiable iff we reach the label λ_1 in the c_FE[i] gadget (ref. Figure 13) in some thread, and the label λ_2 in the c_FE[i] gadget in some thread.

▶ Lemma 23. assert false is reachable iff we reach the label λ_1 in the c_FE[0] gadget in some thread, and the label λ_2 in the c_FE[0] gadget in some thread.
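For intuition on the level-i formulas used in these lemmas, the following brute-force QBF evaluator (a small illustration in our own notation, with 'A'/'E' marking universal/existential quantifiers) builds the prefix of Ψ_i directly from its definition:

```python
def eval_qbf(prefix, phi, partial=None):
    """Evaluate a quantified boolean formula: prefix is a list of
    (quantifier, variable) pairs; phi is a predicate on complete assignments."""
    partial = dict(partial or {})
    if not prefix:
        return phi(partial)
    q, v = prefix[0]
    branches = [eval_qbf(prefix[1:], phi, {**partial, v: b})
                for b in (False, True)]
    return all(branches) if q == "A" else any(branches)

def level_prefix(i, n):
    """Quantifier prefix of the level-i QBF Psi_i: the alternating block
    from u_i to u_n, followed by the remaining e's and u's existentially."""
    pre = []
    for k in range(i, n):
        pre += [("A", f"u{k}"), ("E", f"e{k+1}")]
    pre.append(("A", f"u{n}"))
    pre += [("E", f"e{k}") for k in range(1, i + 1)]
    pre += [("E", f"u{k}") for k in range(i)]
    return pre
```

For instance, with n = 1, level_prefix(0, 1) gives the prefix of Ψ = Ψ_0 and level_prefix(1, 1) the prefix of Ψ_n.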
In the following, we write Φ for Φ(u_0, e_1, …, e_n, u_n) since the free variables of Φ are clear.

Proof of Lemma 21
Assume Ψ_n is satisfiable. Then there are satisfying assignments α_1 and α_2 with α_1(u_n) = 0 and α_2(u_n) = 1, such that α_1, α_2 ⊨ Φ. These assignments α_1, α_2 can be guessed by c_AG gadgets in two threads, resulting in adding messages (s, 1, view_1) and (s, 1, view_2) to the memory, such that view_1(f_{u_n}) = 0 and view_2(t_{u_n}) = 0. Correspondingly, there are c_SATC gadgets

c_SATC = assume(s = 1); check(Φ); λ_0: skip; [(assume(t_{u_n} = 0); a_{n,1} := 1; λ_1: skip;) ⊕ (assume(f_{u_n} = 0); a_{n,0} := 1; λ_2: skip;)]

Figure 12 Implementation of the SAT Checker c_SATC gadget with labels λ_0, λ_1, λ_2.

which read from these views (they read 1 from s) and check the satisfiability of Φ using the view_1, view_2 values of t_x, f_x for x ∈ Vars(Ψ). Since both are satisfying assignments, the label λ_0 is reachable in both c_SATC gadgets. One of them will reach the label λ_1 reading t_{u_n} = 0 (using view_2), and the other will reach the label λ_2 reading f_{u_n} = 0 (using view_1). Conversely, assume that the label λ_1 of c_SATC is reachable in one thread, while the label λ_2 of c_SATC is reachable in another thread. Then we know that in one thread we have read a message (s, 1, view), checked the satisfiability of Φ using view, and also verified that view(t_{u_n}) = 0, while in another thread we have read a message (s, 1, view'), checked the satisfiability of Φ using view', and also verified that view'(f_{u_n}) = 0. Thus, we have two satisfying assignments of Φ, one where u_n has been assigned 1 and the other where u_n has been assigned 0. Hence Ψ_n is satisfiable. ◀

▶ Definition 24.
Let view be a view. We say that an assignment α : Vars(Ψ) → {0, 1} is embedded in view iff for all x ∈ Vars(Ψ), view(t_x) = 0 ⇔ α(x) = 1 and view(f_x) = 0 ⇔ α(x) = 0.

The term 'embedded' is used since the view also has (program) variables outside of t_x and f_x. For 0 ≤ i ≤ n, let A_i and B_i respectively represent the sets of assignments which are embedded in the views reaching the labels λ_1, λ_2 of the c_FE[i] gadget. Thus, we know that A_n = {α ⊨ Φ | α(u_n) = 1} and B_n = {α ⊨ Φ | α(u_n) = 0}.

c_FE[i] = [assume(a_{i+1,0} = 1); assume(a_{i+1,1} = 1)]; κ_1: skip; [assume(f_{e_{i+1}} = 0) ⊕ assume(t_{e_{i+1}} = 0)]; κ_2: skip; [(assume(t_{u_i} = 0); a_{i,1} := 1; λ_1: skip;) ⊕ (assume(f_{u_i} = 0); a_{i,0} := 1; λ_2: skip;)]

Figure 13 The ∀∃ Checker at level i, c_FE[i], with labels κ_1, κ_2, λ_1, λ_2. We have n such gadgets, one for each level 0 ≤ i ≤ n − 1.

▶ Lemma 25.
For 0 ≤ i ≤ n − 1, define the sets of assignments

A_{i,0} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 1, α(e_{i+1}) = 0}
A_{i,1} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 1, α(e_{i+1}) = 1}
B_{i,0} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 0, α(e_{i+1}) = 0}
B_{i,1} = {α ∈ A_{i+1} ⊎ B_{i+1} | α(u_i) = 0, α(e_{i+1}) = 1}

where ⊎ denotes disjoint union. Then A_i is equal to one of the sets A_{i,0} or A_{i,1}. Similarly, B_i is equal to one of the sets B_{i,0} or B_{i,1}.

Proof of Lemma 25. We already know the definitions of A_n and B_n. Consider the case of A_{n−1} and B_{n−1}. By construction, to reach label λ_1 of c_FE[n−1],
(a) we need to have reached labels λ_1, λ_2 of c_FE[n]. The view (say view_A) on reaching the label κ_1 in c_FE[n−1] has embedded assignments from A_n ⊎ B_n.
(b) To reach the label κ_2 of c_FE[n−1], we need either f_{e_n} or t_{e_n} to have timestamp 0 in view_A. If we had view_A(t_{e_n}) > 0 and view_A(f_{e_n}) > 0, then the label κ_2 is not reachable. That is, the assignments embedded in view_A agree on the assignment of e_n.
(c) To reach the label λ_1 in c_FE[n−1], the assignments embedded in view_A agree on the assignment of u_{n−1}, such that u_{n−1} is assigned 1.
Thus, A_{n−1} is obtained from A_n ⊎ B_n by keeping those assignments which agree on e_n and where u_{n−1} is true. Similarly, to reach label λ_2 in c_FE[n−1],
(a) we need to have reached the labels λ_1, λ_2 of c_FE[n]. The view (say view_B) on reaching the label κ_1 in c_FE[n−1] has embedded assignments from A_n ⊎ B_n.
(b) To reach label κ_2, we need either f_{e_n} or t_{e_n} to have timestamp 0 in view_B. If we had view_B(t_{e_n}) > 0 and view_B(f_{e_n}) > 0, then the label κ_2 is not reachable. That is, the assignments embedded in view_B agree on the assignment of e_n.
(c) To reach the label λ_2 in c_FE[n−1], the assignments embedded in view_B agree on the assignment of u_{n−1}, such that u_{n−1} is assigned 0.
Thus, B_{n−1} is obtained from A_n ⊎ B_n by keeping those assignments which agree on e_n and where u_{n−1} is false. The proof easily follows for any A_i, B_i, using the definitions of A_{i+1}, B_{i+1} as above. ◀

Proof of Lemma 22
We give an inductive proof, using Lemma 21 as the base case. As the inductive step, assume that Ψ_{i+1} is satisfiable iff we reach the label λ_1 of the c_FE[i+1] gadget in some thread, and the label λ_2 of the c_FE[i+1] gadget in some thread. Assume Ψ_i is satisfiable. We can write Ψ_i as ∀u_i ∃e_{i+1} Ψ_{i+1}. We show that there is a thread which reaches label λ_1 of the c_FE[i] gadget with a view that has A_i embedded in it, and a thread which reaches the label λ_2 of the c_FE[i] gadget with a view that has B_i embedded in it. By the inductive hypothesis, since Ψ_{i+1} is satisfiable, there is a thread which reaches the label λ_1 of the c_FE[i+1] gadget with a view view_A that has A_{i+1} embedded in it, and a thread which reaches label λ_2 of the c_FE[i+1] gadget with a view view_B that has B_{i+1} embedded in it. Note that a_{i+1,1}, a_{i+1,0} have been written 1 by these threads respectively, such that view_A(a_{i+1,1}) > 0 and view_B(a_{i+1,0}) > 0. Thanks to this, there is a thread which can take on the role of the c_FE[i] gadget now. This thread begins with a view view_C which is the merge of view_B and view_A. The label κ_1 of this c_FE[i] gadget is reachable by reading 1 from both a_{i+1,0} and a_{i+1,1}, and we want view_C(t_{e_{i+1}}) = 0 or view_C(f_{e_{i+1}}) = 0. As seen in item (b) in the proof of Lemma 25, this is possible only if view_B(t_{e_{i+1}}) = 0 and view_A(t_{e_{i+1}}) = 0, or view_B(f_{e_{i+1}}) = 0 and view_A(f_{e_{i+1}}) = 0. By assumption, since Ψ_i is satisfiable, there exist assignments from A_{i+1} and B_{i+1} which agree on e_{i+1} and u_i. In particular, the satisfiability of Ψ_i = ∀u_i ∃e_{i+1} Ψ_{i+1} says that we have a set of assignments S_1 ⊆ A_{i+1} ⊎ B_{i+1} which satisfy Ψ_i, such that for all α ∈ S_1, α(u_i) = 1 and α(e_{i+1}) is some fixed value. Similarly, the satisfiability of Ψ_i also gives us a set of assignments S_2 ⊆ A_{i+1} ⊎ B_{i+1} such that for all α ∈ S_2, α(u_i) = 0 and α(e_{i+1}) is some fixed value. It is easy to see that S_1 = A_i, while S_2 = B_i. Thus, the satisfiability of Ψ_i implies the feasibility of the assignments A_i and B_i.
Thus, starting with a view view_C which has embedded assignments A_{i+1} ⊎ B_{i+1}, it is possible for a thread to: read 1 from a_{i+1,0} and a_{i+1,1} (these are present in view_C); check that either t_{e_{i+1}} has timestamp 0 in view_C or f_{e_{i+1}} has timestamp 0 in view_C (this is possible since the embedded assignments agree on e_{i+1}); and check that t_{u_i} has timestamp 0 in view_C (this is possible since the embedded assignments are such that u_i is assigned 1). This ensures that the thread reaches the label λ_1 of c_FE[i] with a view having A_i embedded in it (notice that the last two checks filter out A_i from A_{i+1} ⊎ B_{i+1}). In a similar manner, starting with a view view_C which has embedded assignments A_{i+1} ⊎ B_{i+1}, it is possible for a thread to: read 1 from a_{i+1,0} and a_{i+1,1}; check that either t_{e_{i+1}} or f_{e_{i+1}} has timestamp 0 in view_C (possible since the embedded assignments agree on e_{i+1}); and check that f_{u_i} has timestamp 0 in view_C (possible since the embedded assignments are such that u_i is assigned 0). This ensures that the thread reaches the label λ_2 of c_FE[i] with a view having B_i embedded in it (notice that the last two checks filter out B_i from A_{i+1} ⊎ B_{i+1}). Conversely, assume that we have two threads which have reached, respectively, labels λ_1, λ_2 of the c_FE[i] gadget with views in which A_i and B_i are embedded. We show that Ψ_i is satisfiable. By the definition of A_i, we know that we have assignments from A_{i+1} ⊎ B_{i+1} which agree on e_{i+1} and which set u_i to 1. The fact that we reached the label λ_1 of the c_FE[i] gadget with a view having A_i embedded in it shows that these assignments are feasible. Similarly, reaching the label λ_2 of c_FE[i] with a view having B_i embedded in it shows that we have assignments from A_{i+1} ⊎ B_{i+1} which agree on e_{i+1} and which set u_i to 0. The existence of these two sets of assignments proves the satisfiability of Ψ_i. ◀

Proof of Lemma 23
Assume that we reach assert false. Then we have read 1 from a_{0,0} and a_{0,1}. These are set to 1 only when the labels λ_1, λ_2 of c_FE[0] have been visited. The converse is exactly similar: indeed, if we reach the labels λ_1, λ_2 of c_FE[0], we have written 1 to a_{0,1} and a_{0,0}. This enables the reads of 1 from a_{0,0} and a_{0,1}, leading to assert false. ◀

▶ Theorem 26. Parameterized safety verification for env(nocas, acyc) is PSPACE-hard.
NEXPTIME-hardness of env(nocas, acyc) ‖ dis(nocas)
NEXPTIME lower bound on the safety verification problem inthe presence of a single leader dis thread, env ( nocas , acyc ) || dis ( nocas ). The lower bound isobtained with a fragment of RA which does not use registers and surprisingly in which dis alsodoes not perform any compare-and-swap operations. As in the case of the PSPACE -hardness, dwait Godbole, S. Krishna, Roland Meyer 47 we work with a fixed set of shared memory locations X (also called shared variables) froma finite data domain D . We show the hardness via a reduction from the succinct versionof 3CNF-SAT, denoted SuccinctSAT . Following the main part of the paper, we refer to thedistinguished dis thread as the ‘leader’ and individual threads from env as ‘contributors’.
SuccinctSAT : succinct satisfiability
The complexity of succinct representations was studied in the pioneering work [37] for graph problems. Typically, the complexity of a problem is measured as a function of some quantity V, with the assumption that the input size is polynomial in V. If the underlying problem concerns graphs, then V is the number of vertices in the graph, while if the underlying problem concerns boolean formulae, then V is the size of the formula. [37] investigated the complexity of graph problems when the input has an exponentially succinct representation, that is, the input size is polylogarithmic in |V|, where V is the number of vertices of the graph, and showed that succinct representations render trivial graph problems NP-complete, while [58] showed that graph properties which are NP-complete under the usual representation become NEXPTIME-complete under succinct representations.
SuccinctSAT is essentially an exponentially succinct encoding of a 3CNF-SAT problem instance. Let φ(x_0, ⋯, x_{2^n − 1}) be a 3CNF formula with 2^n variables and 2^n clauses. Assume an n-bit binary address for each clause. A succinct encoding of φ is a circuit D(y_1, ⋯, y_n) (with size polynomial in n) which, on an n-bit input y_1 ⋯ y_n interpreted as the binary address of a clause c, generates 3n + 3 bits specifying the indices of the 3 variables from x_0, ⋯, x_{2^n − 1} occurring in clause c and their signs (1 bit each). Thus, the circuit D provides a complete description of φ(x_0, ⋯, x_{2^n − 1}) when evaluated on all n-bit inputs. Define SuccinctSAT as the following NEXPTIME-complete [58] problem.
Given a succinct description D of φ, check whether φ is satisfiable.

Adopting the notation above, we assume that we have been given n, the formula φ with 2^n boolean variables BVars = {x_0, ⋯, x_{2^n − 1}}, and the succinct representation D with input variables {y_1, ⋯, y_n}. Denote the variables in clause c as var1(c), var2(c), var3(c) and their signs as sig1(c), sig2(c), sig3(c). We denote the n-bit address c̄ of a clause c as a (boolean) word c̄ ∈ {0, 1}^n and commonly use the variable α to refer to clause addresses. We denote the variable addresses also as n-bit (boolean) words and commonly use the variable β to represent them. We construct an instance of the parametrized reachability problem consisting of a ldr leader thread running program c_ldr and the env contributor threads running program c_env. We show that this system is 'unsafe' (an assert false is reachable) if and only if the SuccinctSAT instance is satisfiable.
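To fix intuition, the following sketch expands a succinct description into an explicit clause list and checks an assignment against it. Representing D as a Python function and signs as booleans is our modelling choice, not the paper's circuit formalism.

```python
def expand(D, n):
    """Explicit 3CNF from a succinct description: D maps an n-bit clause
    address (tuple of bits, most significant first) to three
    (variable_index, sign) pairs; sign True means the literal is positive."""
    return [D(tuple((addr >> b) & 1 for b in reversed(range(n))))
            for addr in range(2 ** n)]

def satisfies(assignment, clauses):
    """assignment[i] is the boolean assigned to x_i; a clause holds when
    some literal's sign matches the assigned value."""
    return all(any(assignment[i] == s for (i, s) in clause)
               for clause in clauses)
```

Note that the explicit formula has 2^n clauses, exponential in the size of D; this blow-up is the source of the jump from NP to NEXPTIME.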
The leader, running program c_ldr, guesses an assignment to the Boolean variables in φ. The contributors, running the program c_env, are tasked with checking that the assignment guessed by the leader does in fact satisfy the formula φ. They do this in a distributed fashion, where one clause from φ is verified by one contributor. Then, similar to the PSPACE-hardness proof, the program c_env forces the contributors to combine the checks for individual clauses into a dependency tree, so that the root of the tree is able to reach an assertion failure only if all threads could successfully check their clauses under the leader's guessed assignment. However, since all the contributors run the same program, the trick is to enforce that all clauses will be checked.

c_env = c_CL-ENC ⊕ c_SAT ⊕ c_Forall[0] ⊕ · · · ⊕ c_Forall[n-2] ⊕ c_assert

choose(u) = (t_u := 1) ⊕ (f_u := 1)

c_CL-ENC = choose(u_0); choose(u_1); · · · ; choose(u_{n-1}); s := 1

c_SAT = (assume s = 1); c_CV; c_Check;
  ((assume (t_{u_{n-1}} = 0); a_{n-1,1} := 1) ⊕ (assume (f_{u_{n-1}} = 0); a_{n-1,0} := 1))

c_Forall[i] = assume a_{i+1,0} = 1; assume a_{i+1,1} = 1;
  ((assume t_{u_i} = 0; a_{i,1} := 1) ⊕ (assume f_{u_i} = 0; a_{i,0} := 1))

c_assert = assume (a_{0,0} = 1); assume (a_{0,1} = 1); assert false

Figure 14
The contributor program c_env used in the reduction. The sub-routines c_CV and c_Check are described later.
Gadgets. c_env consists of a set of gadgets (modelled as 'functions' in the program), only one of which may be non-deterministically executed by a contributor, while c_ldr is the program executed by the leader.

c_env = c_CL-ENC ⊕ c_SAT ⊕ c_Forall[0] ⊕ · · · ⊕ c_Forall[n-2] ⊕ c_assert

Recall that in the PSPACE-hardness proof, too, similar gadgets were executed by the env threads. The gadgets in c_env perform the following tasks.
c_CL-ENC guesses an n-bit address c̄ of a clause c in φ.
c_SAT (1) acquires a clause address c̄ generated by c_CL-ENC, (2) uses the circuit D to obtain the indices of the variables var1(c), var2(c), var3(c) in clause c, along with their signs (this is done by the sub-routine c_CV), (3) accesses the assignment made to the variables by the leader (sub-routine c_Check), and (4) checks that the assignment is such that c is satisfied.
c_Forall[i] (0 ≤ i ≤ n-2) together ensure that the satisfiability of all the clauses in φ has been checked. This is done by instantiations of c_SAT, in levels (similar to the proof of PSPACE-hardness). At the i-th level, c_Forall[i] checks the ∀-universality of the i-th address bit of the clauses.
c_assert finally reaches assert false if all the previous functions execute faithfully, implying that the SuccinctSAT instance is satisfiable.
The non-deterministic branching implies that each env thread will only be able to execute one of these gadgets. The check for satisfiability of φ is distributed between the env threads much like in the PSPACE-hardness construction. For this distributed check, threads are allocated roles depending upon the function (gadget) they execute. Additionally, the distinguished leader thread is tasked with guessing the assignment. We now describe this.
Role of the leader.
We have one leader thread which guesses a satisfying assignment for the Boolean variables BVars as a string of writes made to a special program variable g. The writes made to g use n+2 values d_t, d_f, 1, ..., n in a specific order. Let the initial value of all variables in the system be init ∉ {d_t, d_f, 1, ..., n}. To illustrate with a concrete example, consider the case where n = 3. Let the guessed assignment for BVars be w = ftftttff ∈ {t,f}^8, where t denotes true and f false. Then the writes made by the leader are as below, where d_t and d_f are macros for data domain values (other than {init, 1, ..., n}) representing true and false respectively.

d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f

The leader alternates writing a guessed assignment for x_0, ..., x_7 with writing a value from {1, ..., n}. The values in {1, ..., n} (here {1, 2, 3}) are written in a deterministic pattern, 1 2 1 3 1 2 1, which we call a 'binary search pattern' with 3 values, denoted BSP(3) for short. BSP(n) is the unique word of length 2^n - 1 over {1, ..., n} defined inductively as follows.

BSP(1) = 1
BSP(n) = BSP(n-1) · n · BSP(n-1) for n ≥ 2

The guessed assignments for x_0, ..., x_{2^n - 1} are interspersed alternately with the symbols of BSP(n) by the leader while writing to g. Formally, let S(n, w) = BSP(n) ⧢ w represent the perfect shuffle (alternation) of BSP(n) with the guessed assignment w ∈ {d_t, d_f}^{2^n}. The leader writes the word S(n, w) to g. From the example above, S(3, ftftttff) = d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f. We show that the shuffle sequence which needs to be generated is easily implementable by the leader with a polynomial-sized program.

Lemma 27.
There exists a program c_ldr which nondeterministically chooses w ∈ {d_t, d_f}^{2^n} and generates the write sequence S(n, w) on a shared memory location g, with the size of the program growing polynomially with n.

How contributors access variable assignments, intuitively.
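As a concrete companion to the discussion below, BSP(n) and the leader's word S(n, w) are easy to compute directly from their definitions. A sketch (the function names `bsp` and `leader_word` are ours, and the characters 'f'/'t' stand in for the values d_f/d_t):

```python
def bsp(n):
    """BSP(n): the 'binary search pattern' over {1, ..., n}, of length
    2^n - 1.  BSP(1) = 1 and BSP(n) = BSP(n-1) . n . BSP(n-1)."""
    if n == 1:
        return [1]
    inner = bsp(n - 1)
    return inner + [n] + inner

def leader_word(n, w):
    """S(n, w): the perfect shuffle of the guessed assignment w
    (a sequence of 2^n values d_t/d_f) with BSP(n); this is the
    sequence of values the leader writes to g."""
    pattern = bsp(n)           # 2^n - 1 separators
    out = [w[0]]
    for sep, val in zip(pattern, w[1:]):
        out += [sep, val]
    return out
```

For the running example, `leader_word(3, list("ftftttff"))` reproduces the word d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f shown above.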
Each contributor wants to check a single clause, for which it needs to access the 3 variables occurring in that clause and their signs. Since it pertains to the BSP, we first explain this access task and discuss the others (selecting a clause, acquiring a variable address and sign, etc.) later. For now we assume that the contributor has a variable x with address β and sign σ and wants to access the assignment made to x by the leader.
For Boolean variable x, the contributor uses the BSP(n) pattern to locate the assignment made to x, by reading a subword of S(n, w). From program variable g, the contributor reads n+1 values from {d_f, 1, ..., n} or {d_t, 1, ..., n} without repetitions, depending upon the sign of x in the clause (d_f if the sign is negative, d_t if positive). In the running example, if the contributor wants to access x_2 from d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f, it reads the sequence 2 d_f 1 3; x_6 is obtained by reading 3 2 d_f 1, while for x_0, the contributor must read d_f 1 2 3. For each x ∈ BVars, there is a unique 'access pattern' which forces the thread to acquire the assignment of exactly x and not any other variable. In this search, it is guided by the BSP, which acts as an indexing mechanism: it helps the contributor narrow down, unambiguously, to the part of S(n, w) which contains the value of x.

The contributors check that each clause in φ has been satisfied in a distributed fashion. Each contributor executes one of the functions in c_env. They do this as follows.

Clause Encoding: c_CL-ENC: A thread executing c_CL-ENC selects, nondeterministically, a clause address α ∈ {0,1}^n. This is done by writing 1 to either t_{u_i} or f_{u_i} for each 0 ≤ i ≤ n-1, followed by a write of 1 to s. The function c_CL-ENC in Figure 14 describes this. The view of the message (s, 1, vw) encodes the address α of a clause, satisfying (vw(t_{u_i}) > 0 ⟺ α[i] = 0) and (vw(f_{u_i}) > 0 ⟺ α[i] = 1) for 0 ≤ i ≤ n-1, as in the PSPACE-hardness proof. Each bit is encoded in the view of a message. Overall, 2^n threads will execute c_CL-ENC to cover all the clauses in the formula.
Satisfaction checking (for one clause): c_SAT: A thread executing c_SAT acquires the address c̄ of a clause c through the view vw, by reading the message (s, 1, vw) generated by c_CL-ENC. This thread has to check the satisfiability of the clause with address c̄. For this, it needs to know the 3 Boolean variables var1(c), var2(c), var3(c) appearing in c. Recall that we have been given, as part of the problem, the circuit D which takes an n-bit address α corresponding to some clause as input, and outputs the 3n+3 bits corresponding to the 3 variables appearing in the clause, along with their signs. We use D, and the encoding of the clause address c̄ stored in vw, to compute D(α). We have a polynomial-sized sub-routine c_CV (CV for circuit value) that can compute the circuit value of D.

Circuit Value: c_CV: The c_CV sub-routine takes the address α (of a clause c) and converts it into the index of one of the variables in c. Thus, in essence, c_CV evaluates the circuit-value problem D(α) by simulating the (polynomially many) gates in D. The idea is to keep two Boolean program variables for each node in D, and to propagate the evaluation of nodes in the obvious way (for instance, if we have an ∧ gate with input gates g_1, g_2 evaluating to 0 and 1 respectively, then t_∧ will be written 1). We now briefly explain how the circuit value can be evaluated, taking the example of a single gate.
For each node p in D we use two Boolean program variables, t_p and f_p. We say that a view vw encodes the value at node p if the following holds; we write encAddr(vw) to denote the values for the Boolean variables encoded in vw.

(vw(t_p) > 0 ⟺ p = 0) and (vw(f_p) > 0 ⟺ p = 1)   (1)

Now assume a thread has a view vw when it wants to evaluate a logic (NAND) gate G, with output node o and input nodes i_1 and i_2. We assume that vw encodes the values of i_1 and i_2 (the thread has evaluated the inputs of G) and that the thread has not evaluated G before (we have vw(t_o) = vw(f_o) = 0). Assuming that these conditions hold, the thread executes instructions such that the new view vw' of the thread (1) differs from vw only on t_o and f_o, and (2) vw' correctly encodes the value of o. The function in Figure 15 evaluates G. The main observation is that a thread can read init from a variable only if its view on that variable is 0 (since there is only one init message, with timestamp 0).

c_NAND = (((assume f_{i_1} = init) ⊕ (assume f_{i_2} = init)); f_o := 1)
  ⊕ ((assume t_{i_1} = init); (assume t_{i_2} = init); (t_o := 1))

Figure 15 c_NAND - encoding the evaluation of a NAND gate in the views of threads.

Claim (1) holds trivially, since only the timestamps of t_o or f_o may be augmented (reading from init does not change the timestamp). With a little observation we see that the thread can write to f_o only if one of f_{i_1}, f_{i_2} has timestamp 0 in its view, implying that one of the inputs to the gate is 0 by the assumption; then the new view on f_o is greater than 0, and thus claim (2) holds. The case for t_o may be checked similarly. Since D has polynomially many gates, any thread can evaluate them in topological order, and hence will eventually end up with the evaluation of D(α). Also note that, since a thread relies only on its internal view, the same set of program variables {t_p, f_p | p a node of D} may be used by all threads (crucially avoiding a number of variables that varies with the thread count).

Lemma 28.
There exists a sub-routine c_CV that, starting with the view vw from (s, 1, vw), evaluates the circuit value D(α), where α is the clause address encoded in the variables t_{u_i} and f_{u_i} in encAddr(vw). Also, c_CV is polynomial-sized in n.

Once D(α) has been computed, the thread can nondeterministically choose one of the three variables appearing in clause c, say x ∈ {var1(c), var2(c), var3(c)}. For simplicity we include this choice as a part of the routine c_CV itself. Given the address β of the variable x, the contributor accesses the assignment made by the leader to x and checks whether it satisfies clause c. This is done by the routine c_Check.

Clause Check: c_Check: Having acquired the address β = β_{n-1} · · · β_0 and sign σ of variable x by executing c_CV, the thread checks that variable x satisfies clause c. To faithfully access the assignment to x from the variable g, the BSP guides the thread. The 'access pattern' for x, denoted AP(n) (β, σ implicit), is recursively defined as follows.

For 0 < i ≤ n:
AP(i) = i · AP(i-1)   if β_{i-1} = 1
AP(i) = AP(i-1) · i   if β_{i-1} = 0

For checking satisfiability:
AP(0) = d_f if σ = 0
AP(0) = d_t if σ = 1

For example, if x_6, with negative sign (σ = 0), is to be accessed, then the access pattern is AP(3) = 3 · AP(2) = 3 · 2 · AP(1) = 3 · 2 · AP(0) · 1 = 3 · 2 · d_f · 1. For x_4, with negative sign (σ = 0), the access pattern is AP(3) = 3 · AP(2) = 3 · AP(1) · 2 = 3 · AP(0) · 1 · 2 = 3 · d_f · 1 · 2. Since d_f 1 d_t 2 d_f 1 d_t 3 d_t 1 d_t 2 d_f 1 d_f was written to g by the leader, it is easy to see that the reads with the access pattern for x_6 (3 · 2 · d_f · 1) would be successful, since x_6 had been assigned false by the leader, while those for x_4 (3 · d_f · 1 · 2) would fail, since x_4 had been assigned true while the contributor wished to read d_f. AP(0) is defined to ensure satisfiability of the clause: AP(0) = d_f iff f_sign = 0 (the sign of the variable in the clause is negative) and AP(0) = d_t iff t_sign = 0 (the sign of the variable in the clause is positive).
The above recursive formulation gives us a polynomial-sized sub-routine which reads values matching the AP sequence. We thus have the following lemma.

Lemma 29.
There exists a sub-routine c_Check which, starting with a view vw encoding (in t_{d_0}, f_{d_0}, ..., t_{d_{n-1}}, f_{d_{n-1}} and t_sign, f_sign) the address and sign of a Boolean variable x in clause c, terminates only if c is satisfied under the assignment to x made by the leader.

Until now, a thread which reads a clause from c_CL-ENC has checked its satisfiability with respect to the assignment guessed by the leader, using the c_SAT module. However, to ensure satisfiability of φ, this check must be done for all 2^n clauses. This is done in levels 0 ≤ i ≤ n-2 by the gadgets c_Forall[i], exactly as in the PSPACE-hardness proof. Finally, we reach assert false after reading 1 from both a_{0,0} and a_{0,1}. However, in this case we do not have to check for alternation, but only for universality in the assignments.

Forall Checker: c_Forall[i]: The c_Forall[n-2] gadget checks the 'universality' with respect to the second-last bit of the clause address, the c_Forall[n-3] gadget does this check with respect to the third-last bit, and so on, till c_Forall[0] does this check for the first bit, ensuring that all clauses have been covered.
2^n threads execute c_SAT, and write 1 to a_{n-1,0} or a_{n-1,1}, depending on the last address bit of the clause each checks. Next, 2^{n-1} threads execute c_Forall[n-2]. A thread executing c_Forall[n-2] reads 1 from both a_{n-1,0} and a_{n-1,1}, representing two clauses whose last bits differ; this thread checks that the second-last bits in these two clauses agree: it writes 1 to a_{n-2,0} (if the second-last bit is 0) or to a_{n-2,1} (if the second-last bit is 1). When 2^{n-1} threads finish executing c_Forall[n-2], we have covered the second-last bits across all clauses. This continues with 2^{n-2} threads executing c_Forall[n-3]. A thread executing c_Forall[n-3] reads 1 from both a_{n-2,0} and a_{n-2,1}, representing two clauses whose second-last bits differ, and checks that the third-last bits in these two clauses agree.
Finally, we have 2 threads executing c_Forall[0], certifying the universality of the first address bit and writing 1 to a_{0,0} and a_{0,1}.

Assertion Checker: c_assert: The assertion checker gadget in Figure 14 reads 1 from a_{0,0} and a_{0,1}. If this is successful, then we reach the assert false.

Comparison with the PSPACE-hardness proof
As is evident, there are many things common between the two proofs. We now recapitulate the similarities and differences.
In the PSPACE-hardness proof we wanted to check the truth of a QBF, hence guessing an assignment was not necessary. Here the leader is tasked with guessing an assignment to the Boolean variables.
In the PSPACE-hardness proof we wanted to check for quantifier alternation in the Boolean variables of Ψ. Here we instead want to check for universality of addresses, i.e., the fact that all clauses have been checked. This makes the c_Forall[_] gadget a bit simpler than its c_FE[_] counterpart.
In the PSPACE-hardness proof, the CNF formula φ was given in a simple form, and hence all threads executing c_SAT checked the same formula. Additionally, given the exponential size of the formula, the task is distributed between (exponentially) many threads. Here the CNF formula is given in an encoded form, hence we had to devise the circuit-value machinery to extract it from the succinct representation D.

The proof of this lemma is very close to that of Lemma 19, and some of the terminology we use is borrowed from the proof of Lemma 19. As in the case of Section 7.4, we add some labels in the function descriptions for ease of argument in the proof. We describe the notation and the key sub-lemmas required for the proof.
Notation and Interpretation of Boolean Variables involved in the construction
We denote by α_U an assignment to the (Boolean) variables {u_0, u_1, ..., u_{n-1}}, interpreted as the (n-bit) address of a clause. Here u_{n-1} is the most significant bit (MSB) and u_0 the least significant bit (LSB). We view the assignment so generated as an n-bit vector α_U ∈ {0,1}^n; α_U(u_i) gives the assignment to u_i.
We denote by α_D an assignment to the (Boolean) variables {d_0, d_1, ..., d_{n-1}, d_sign}, interpreted as the (n-bit) index of a variable in BVars together with one sign bit. Here d_{n-1} is the MSB and d_0 the LSB. We view the assignment as an (n+1)-bit vector α_D ∈ {0,1}^{n+1}.
For an assignment α_U ∈ B^n, D_1(α_U) (similarly D_2(α_U) and D_3(α_U)) denotes the n+1 bits signifying var1(α_U), sig1(α_U) (respectively var2(α_U), sig2(α_U) and var3(α_U), sig3(α_U)).

c_SAT = (assume s = 1); c_CV; λ_1 : skip; c_Check;
  ((assume (t_{u_{n-1}} = 0); a_{n-1,1} := 1) ⊕ (assume (f_{u_{n-1}} = 0); a_{n-1,0} := 1))

Figure 16 c_SAT - acquiring a clause c and checking satisfiability of that clause, with the label λ_1.

c_Forall[i] = assume a_{i+1,0} = 1; assume a_{i+1,1} = 1;
  ((assume t_{u_i} = 0; a_{i,1} := 1; λ_2 : skip) ⊕ (assume f_{u_i} = 0; a_{i,0} := 1; λ_3 : skip))

Figure 17 c_Forall[i] at level i, with the labels λ_2, λ_3. We have 0 ≤ i ≤ n-2.

We observe that each thread executing a c_CL-ENC function makes a (single) write to s, with the message (s, 1, view), where view has embedded in it an assignment α_U. We write α ◁ view to denote that the assignment α is embedded in view. Now, a thread p executing a c_SAT function acquires the assignment α_U and (non-deterministically) computes one amongst D_1(α_U), D_2(α_U), D_3(α_U), reaching the label λ_1. The correctness invariant involved is formalized in the following lemma.

Lemma 30.
Let a thread p executing the c_SAT function read a message (s, 1, view) with α ◁ view. Suppose p reaches the label λ_1 having computed D_i (i ∈ {1, 2, 3}), so that α_D = D_i(α_U), and let the view of the thread at that point be view'. Then we have α_D ◁ view'.

Continuing from above, let p compute D_i in c_CV, reaching λ_1. Then, by Lemma 30, we have α_D = D_i(α_U) embedded in the view of the thread. Now, using c_Check, p checks that the clause with index α_U is satisfied by the n+1 bits α_D representing a variable and the sign of that variable in the clause. Finally, the thread writes to one of the program variables a_{n-1,0} or a_{n-1,1}. We have the following lemma that shows the correctness of this operation.

Lemma 31.
A thread p can make the write (a_{n-1,1}, 1) (similarly (a_{n-1,0}, 1)) if and only if clause α_U is satisfied and α_U(u_{n-1}) = 1 (similarly α_U(u_{n-1}) = 0).

In Section 8.3.1 and Section 8.3.2 we have discussed how the system can check satisfiability of a single clause. Now, we need to check that each clause is satisfied. We do this via additional modules on top of the PSPACE construction. Towards this goal, define a level predicate IsSAT(u_{n-1}, u_{n-2}, ..., u_0) denoting that the clause with address α_U = u_{n-1} · · · u_1 u_0 is satisfied. Now, very similarly to Section 7.4, we define the following formulae. For 0 ≤ i ≤ n-2:

Υ_i ≡ ∀u_i ∀u_{i+1} . . . ∀u_{n-1} ∃u_0 . . . ∃u_{i-1} IsSAT(u_{n-1}, u_{n-2}, ..., u_0)

And we claim the following lemma.

Lemma 32.
For 0 ≤ i ≤ n-2, Υ_i is true ⟺ the labels λ_2, λ_3 in the gadget c_Forall[i] can be reached.

The proof of Lemma 32 follows along exactly the same lines as those of Lemma 21 and Lemma 22. Finally, note that Υ_0 is equivalent to the SuccinctSAT instance being satisfiable. We have the following final corollary showing the correctness of the entire construction.
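The quantifier structure of Υ_i is directly executable, which makes the difference between the levels easy to inspect. A sketch (our naming: `upsilon`, with `is_sat` taking the bit tuple (u_0, ..., u_{n-1})):

```python
from itertools import product

def upsilon(i, n, is_sat):
    """Evaluate Y_i = forall u_i..u_{n-1} exists u_0..u_{i-1}
    IsSAT(u_{n-1}, ..., u_0).  is_sat takes (u_0, ..., u_{n-1})."""
    return all(
        any(is_sat(lo + hi) for lo in product((0, 1), repeat=i))
        for hi in product((0, 1), repeat=n - i)
    )
```

In particular, upsilon(0, n, is_sat) holds iff IsSAT holds for every clause address, matching the observation that Υ_0 captures satisfiability of the SuccinctSAT instance, while upsilon(i, n, is_sat) for i > 0 only demands a witness on the low-order bits.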
Corollary 33.
We can reach the assert false assertion in the c_assert gadget ⟺ Υ_0 is true. This gives us the main theorem.
Theorem 34.
The verification of safety properties for parameterized systems of the class env(nocas, acyc) ∥ dis(nocas) under RA is NEXPTIME-hard.
Atomic CAS operations are indispensable for most practical implementations of distributed protocols; yet, they hinder verification efforts. The undecidability of safety verification in the non-parameterized setting [1], and even in the loop-free parameterized setting env(acyc), is a testament to this.
We tried to reconcile the two by studying the effects of allowing restricted access to CAS operations in parameterized systems. Systems which prevent the env threads from performing CAS operations and allow only either (1) loop-free dis programs or (2) loop-free dis programs along with a single ('ego') program with loops lead to accessible complexity bounds. The simplified semantics based on a timestamp abstraction provides the infrastructure for these results. The PSPACE-hardness gives an insight into the core complexity of RA (PureRA) that stems from the consistency mechanisms of view-joining and timestamp comparison.
We conclude with some interesting avenues for future work. A problem arising from this work is the decidability of CAS-free parameterized systems, env(nocas) ∥ dis_1(nocas) ∥ · · · ∥ dis_n(nocas), which seems to be as elusive as its non-parameterized twin dis_1(nocas) ∥ · · · ∥ dis_n(nocas). We believe that the ideas in this paper can be adapted to causally consistent shared-memory models [50], as well as to transactional programs [15], in the parameterized setting. On the practical side, the Datalog encoding suggests the development of a tool, considering that Horn-clause solvers are state-of-the-art in program verification.

References

P. A. Abdulla, J. Arora, M. F. Atig, and S. N. Krishna. Verification of programs under the release-acquire semantics. In
PLDI , pages 1117–1132. ACM, 2019. Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, Egor Derevenetc, CarlLeonardsson, and Roland Meyer. On the state reachability problem for concurrent programsunder power. In Chryssis Georgiou and Rupak Majumdar, editors,
Networked Systems - 8thInternational Conference, NETYS 2020, Marrakech, Morocco, June 3-5, 2020, Proceedings ,volume 12129 of
Lecture Notes in Computer Science , pages 47–59. Springer, 2020. doi:10.1007/978-3-030-67087-0\_4 . Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, K. Narayan Kumar, andPrakash Saivasan. Deciding reachability under persistent x86-tso.
Proc. ACM Program. Lang. ,5(POPL):1–32, 2021. doi:10.1145/3434337 . Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. Aload-buffer semantics for total store ordering.
Log. Methods Comput. Sci., 14(1), 2018. doi:10.23638/LMCS-14(1:9)2018. Parosh Aziz Abdulla, Mohamed Faouzi Atig, and Rojin Rezvan. Parameterized verification under TSO is PSPACE-complete.
Proc. ACM Program. Lang. , 4(POPL), December 2019. doi:10.1145/3371094 . Parosh Aziz Abdulla and Bengt Jonsson. Verifying programs with unreliable channels. In
Proceedings of the Eighth Annual Symposium on Logic in Computer Science (LICS ’93),Montreal, Canada, June 19-23, 1993 , pages 160–170. IEEE Computer Society, 1993. doi:10.1109/LICS.1993.287591 . Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and Phillip W. Hutto. Causalmemory: Definitions, implementation, and programming.
Distributed Comput. , 9(1):37–49,1995. doi:10.1007/BF01784241 . Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,testing, and data mining for weak memory.
ACM Trans. Program. Lang. Syst. , 36(2), July2014. doi:10.1145/2627752 . Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation,testing, and data mining for weak memory.
ACM Trans. Program. Lang. Syst. , 36(2):7:1–7:74,2014. Rajeev Alur, Kenneth L. McMillan, and Doron A. Peled. Model-checking of correctnessconditions for concurrent objects.
Inf. Comput. , 160(1-2):167–188, 2000. Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi.On the verification problem for weak memory models. In Manuel V. Hermenegildo and JensPalsberg, editors,
Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principlesof Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010 , pages 7–18.ACM, 2010. doi:10.1145/1706299.1706303 . Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and Madanlal Musuvathi.What’s decidable about weak memory models? In Helmut Seidl, editor,
ProgrammingLanguages and Systems - 21st European Symposium on Programming, ESOP 2012, Held asPart of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012,Tallinn, Estonia, March 24 - April 1, 2012. Proceedings , volume 7211 of
Lecture Notes inComputer Science , pages 26–46. Springer, 2012. doi:10.1007/978-3-642-28869-2\_2 . A. R. Balasubramanian, Nathalie Bertrand, and Nicolas Markey. Parameterized verification ofsynchronization in constrained reconfigurable broadcast networks. In Dirk Beyer and MariekeHuisman, editors,
Tools and Algorithms for the Construction and Analysis of Systems - 24thInternational Conference, TACAS 2018, Held as Part of the European Joint Conferenceson Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018,Proceedings, Part II , volume 10806 of
Lecture Notes in Computer Science , pages 38–54. Springer,2018. doi:10.1007/978-3-319-89963-3\_3 . Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizingc++ concurrency.
SIGPLAN Not. , 46(1):55–66, January 2011. URL: http://doi.acm.org/10.1145/1925844.1926394 , doi:10.1145/1925844.1926394 . Sidi Mohamed Beillahi, Ahmed Bouajjani, and Constantin Enea. Robustness against trans-actional causal consistency. In , pages 30:1–30:18, 2019. doi:10.4230/LIPIcs.CONCUR.2019.30 . Nathalie Bertrand, Patricia Bouyer, and Anirban Majumdar. Reconfiguration and MessageLosses in Parameterized Broadcast Networks. In Wan Fokkink and Rob van Glabbeek, editors, , volume 140 of
LeibnizInternational Proceedings in Informatics (LIPIcs) , pages 32:1–32:15, Dagstuhl, Germany, 2019.Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. URL: http://drops.dagstuhl.de/opus/volltexte/2019/10934 , doi:10.4230/LIPIcs.CONCUR.2019.32 . Nikolaj Bjørner, Arie Gurfinkel, Ken McMillan, and Andrey Rybalchenko. Horn clause solversfor program verification. In
Fields of Logic and Computation II , pages 24–51. Springer, 2015. Nikolaj Bjørner, Ken McMillan, and Andrey Rybalchenko. On solving universally quantifiedHorn clauses. In
SAS , volume 7935 of
LNCS , pages 105–125. Springer, Springer, 2013. Roderick Bloem, Swen Jacobs, Ayrat Khalimov, Igor Konnov, Sasha Rubin, Helmut Veith, andJosef Widder.
Decidability of Parameterized Verification . Synthesis Lectures on DistributedComputing Theory. Morgan & Claypool Publishers, 2015. Roderick Bloem, Swen Jacobs, Ayrat Khalimov, Igor Konnov, Sasha Rubin, Helmut Veith,and Josef Widder. Decidability in parameterized verification.
SIGACT News , 47(2):53–64,2016. doi:10.1145/2951860.2951873 . Ahmed Bouajjani, Michael Emmi, Constantin Enea, and Jad Hamza. On reducing linearizabilityto state reachability. In Magnús M. Halldórsson, Kazuo Iwama, Naoki Kobayashi, and BettinaSpeckmann, editors,
Automata, Languages, and Programming - 42nd International Colloquium,ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part II , volume 9135 of
Lecture Notesin Computer Science , pages 95–107. Springer, 2015. doi:10.1007/978-3-662-47666-6\_8 . Ahmed Bouajjani, Constantin Enea, Rachid Guerraoui, and Jad Hamza. On verifying causalconsistency. In Giuseppe Castagna and Andrew D. Gordon, editors,
Proceedings of the44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017,Paris, France, January 18-20, 2017 , pages 626–638. ACM, 2017. URL: http://dl.acm.org/citation.cfm?id=3009888 . Ahmed Bouajjani, Constantin Enea, and Jad Hamza. Verifying eventual consistency ofoptimistic replication systems. In Suresh Jagannathan and Peter Sewell, editors,
The 41stAnnual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,POPL ’14, San Diego, CA, USA, January 20-21, 2014 , pages 285–296. ACM, 2014. doi:10.1145/2535838.2535877 . Stefano Ceri, Georg Gottlob, and Letizia Tanca. Syntax and semantics of datalog. In
LogicProgramming and Databases , pages 77–93. Springer, 1990. Peter Chini, Roland Meyer, and Prakash Saivasan. Complexity of liveness in parameterizedsystems. In Arkadev Chattopadhyay and Paul Gastin, editors, , volume 150 of
LIPIcs , pages 37:1–37:15. SchlossDagstuhl - Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.FSTTCS.2019.37 . Peter Chini, Roland Meyer, and Prakash Saivasan. Liveness in broadcast networks. InMohamed Faouzi Atig and Alexander A. Schwarzmann, editors,
Networked Systems - 7thInternational Conference, NETYS 2019, Marrakech, Morocco, June 19-21, 2019, RevisedSelected Papers , volume 11704 of
Lecture Notes in Computer Science , pages 52–66. Springer,2019. doi:10.1007/978-3-030-31277-0\_4 . Peter Chini, Roland Meyer, and Prakash Saivasan. Fine-grained complexity of safety verifica-tion.
J. Autom. Reason. , 64(7):1419–1444, 2020. doi:10.1007/s10817-020-09572-x . Edmund M. Clarke, Armin Biere, Richard Raimi, and Yunshan Zhu. Bounded model checkingusing satisfiability solving.
Formal Methods Syst. Des. , 19(1):7–34, 2001. Giorgio Delzanno, Arnaud Sangnier, Riccardo Traverso, and Gianluigi Zavattaro. On thecomplexity of parameterized reachability in reconfigurable broadcast networks. In DeepakD’Souza, Telikepalli Kavitha, and Jaikumar Radhakrishnan, editors,
IARCS Annual Conferenceon Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012,December 15-17, 2012, Hyderabad, India , volume 18 of
LIPIcs , pages 289–300. Schloss Dagstuhl- Leibniz-Zentrum für Informatik, 2012. doi:10.4230/LIPIcs.FSTTCS.2012.289 . Giorgio Delzanno, Arnaud Sangnier, and Gianluigi Zavattaro. Verification of ad hoc networkswith node and communication failures. In Holger Giese and Grigore Rosu, editors,
FormalTechniques for Distributed Systems - Joint 14th IFIP WG 6.1 International Conference,FMOODS 2012 and 32nd IFIP WG 6.1 International Conference, FORTE 2012, Stockholm,Sweden, June 13-16, 2012. Proceedings , volume 7273 of
Lecture Notes in Computer Science, pages 235–250. Springer, 2012. doi:10.1007/978-3-642-30793-5_15. Antoine Durand-Gasselin, Javier Esparza, Pierre Ganty, and Rupak Majumdar. Model checking parameterized asynchronous shared-memory systems. In Daniel Kroening and Corina S. Pasareanu, editors,
Computer Aided Verification - 27th International Conference, CAV 2015,San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I , volume 9206 of
Lecture Notesin Computer Science , pages 67–84. Springer, 2015. doi:10.1007/978-3-319-21690-4\_5 . Javier Esparza, Alain Finkel, and Richard Mayr. On the verification of broadcast protocols.In ,pages 352–359, 1999. doi:10.1109/LICS.1999.782630 . Javier Esparza, Pierre Ganty, and Rupak Majumdar. Parameterized verification of asynchron-ous shared-memory systems.
J. ACM , 63(1):10:1–10:48, 2016. doi:10.1145/2842603 . Philippe Flajolet, Jean-Claude Raoult, and Jean Vuillemin. The number of registers requiredfor evaluating arithmetic expressions.
Theoretical Computer Science , 9(1):99–125, 1979. Cormac Flanagan, Stephen N. Freund, and Shaz Qadeer. Thread-modular verification forshared-memory programs. In
ESOP , volume 2305 of
LNCS , pages 262–277. Springer, 2002. Marie Fortin, Anca Muscholl, and Igor Walukiewicz. Model-checking linear-time properties ofparametrized asynchronous shared-memory pushdown systems. In Rupak Majumdar and ViktorKuncak, editors,
Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part II, volume 10427 of Lecture Notes in Computer Science, pages 155–175. Springer, 2017. doi:10.1007/978-3-319-63390-9_9.
Hana Galperin and Avi Wigderson. Succinct representations of graphs. Inf. Control., 56(3):183–198, 1983. doi:10.1016/S0019-9958(83)80004-7.
Georg Gottlob and Christos Papadimitriou. On the complexity of single-rule datalog queries. Information and Computation, 183(1):104–122, 2003.
Matthew Hague. Parameterised pushdown systems with non-atomic writes. In
IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2011, December 12-14, 2011, Mumbai, India, pages 457–468, 2011. doi:10.4230/LIPIcs.FSTTCS.2011.457.
Jad Hamza. On the complexity of linearizability. In Ahmed Bouajjani and Hugues Fauconnier, editors, Networked Systems - Third International Conference, NETYS 2015, Agadir, Morocco, May 13-15, 2015, Revised Selected Papers, volume 9466 of Lecture Notes in Computer Science, pages 308–321. Springer, 2015. doi:10.1007/978-3-319-26850-7_21.
Mengda He, Viktor Vafeiadis, Shengchao Qin, and João F. Ferreira. GPS+: Reasoning about fences and relaxed atomics.
Int. J. Parallel Program., 46(6):1157–1183, 2018.
Maurice Herlihy. Wait-free synchronization. ACM Trans. Program. Lang. Syst., 13(1):124–149, 1991. doi:10.1145/114005.102808.
Maurice Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
Neil Immerman. Descriptive Complexity. Springer Science & Business Media, 2012.
Bishoksan Kafle, John P. Gallagher, and Pierre Ganty. Solving non-linear Horn clauses using a linear Horn clause solver. arXiv preprint arXiv:1607.04459, 2016.
Vineet Kahlon. Parameterization as abstraction: A tractable approach to the dataflow analysis of concurrent programs. In
Proceedings of the Twenty-Third Annual IEEE Symposium on Logic in Computer Science, LICS 2008, 24-27 June 2008, Pittsburgh, PA, USA, pages 181–192, 2008. doi:10.1109/LICS.2008.37.
Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and Viktor Vafeiadis. Strong logic for weak memory: Reasoning about release-acquire consistency in Iris. In Peter Müller, editor, ECOOP 2017, volume 74 of LIPIcs, pages 17:1–17:29. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2017. doi:10.4230/LIPIcs.ECOOP.2017.17.
Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising semantics for relaxed-memory concurrency. In
POPL, pages 175–189. ACM, 2017.
Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Viktor Vafeiadis. Effective stateless model checking for C/C++ concurrency. Proc. ACM Program. Lang., 2(POPL):17:1–17:32, 2018. doi:10.1145/3158105.
Ori Lahav and Udi Boker. Decidable verification under a causally consistent shared memory. In Alastair F. Donaldson and Emina Torlak, editors, Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, pages 211–226. ACM, 2020. doi:10.1145/3385412.3385966.
Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. Taming release-acquire consistency. In Rastislav Bodík and Rupak Majumdar, editors,
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 649–662. ACM, 2016. doi:10.1145/2837614.2837643.
Ori Lahav and Viktor Vafeiadis. Owicki-Gries reasoning for weak memory models. In
ICALP, volume 9135 of LNCS, pages 311–323. Springer, 2015.
Leslie Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Computers, 28(9):690–691, 1979.
Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Ted Wobber and Peter Druschel, editors, SOSP, pages 401–416. ACM, 2011. URL: http://dblp.uni-trier.de/db/conf/sosp/sosp2011.html.
Roland Meyer and Sebastian Wolff. Pointer life cycle types for lock-free data structures with memory reclamation.
Proc. ACM Program. Lang., 4(POPL):68:1–68:36, 2020.
S. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Informatica, 6:319–340, 1976.
Christos H. Papadimitriou and Mihalis Yannakakis. A note on succinct representations of graphs. Inf. Control., 71(3):181–185, 1986. doi:10.1016/S0019-9958(86)80009-2.
Anton Podkopaev, Ilya Sergey, and Aleksandar Nanevski. Operational aspects of C/C++ concurrency. CoRR, abs/1606.01400, 2016. URL: http://arxiv.org/abs/1606.01400, arXiv:1606.01400.
Azalea Raad, Ori Lahav, and Viktor Vafeiadis. On parallel snapshot isolation and release/acquire consistency. In
Programming Languages and Systems - 27th European Symposium on Programming, ESOP 2018, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2018, Thessaloniki, Greece, April 14-20, 2018, Proceedings, pages 940–967, 2018. doi:10.1007/978-3-319-89884-1_33.
Anu Singh, C. R. Ramakrishnan, and Scott A. Smolka. Query-based model checking of ad hoc network protocols. In Mario Bravetti and Gianluigi Zavattaro, editors, CONCUR 2009 - Concurrency Theory, 20th International Conference, CONCUR 2009, Bologna, Italy, September 1-4, 2009. Proceedings, volume 5710 of Lecture Notes in Computer Science, pages 603–619. Springer, 2009. doi:10.1007/978-3-642-04081-8_40.
Salvatore La Torre, Anca Muscholl, and Igor Walukiewicz. Safety of parametrized asynchronous shared-memory systems is almost always decidable. In CONCUR 2015, pages 72–84, 2015. doi:10.4230/LIPIcs.CONCUR.2015.72.
Aaron Turon, Viktor Vafeiadis, and Derek Dreyer. GPS: navigating weak memory with ghosts, protocols, and separation. In
OOPSLA, pages 691–707. ACM, 2014.
Viktor Vafeiadis and Chinmay Narayan. Relaxed separation logic: a program logic for C11 concurrency. In OOPSLA, pages 867–884. ACM, 2013.
M. Vardi. The complexity of relational database queries. In Proc. STOC, pages 137–146, 1982.
Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44, 2009. doi:10.1145/1435417.1435432.