A Framework for Consistency Algorithms
Peter Chini
Prakash Saivasan
The Institute of Mathematical Sciences
Abstract
We present a framework that provides deterministic consistency algorithms for given memory models. Such an algorithm checks whether the executions of a shared-memory concurrent program are consistent under the axioms defined by a model. For memory models like SC and TSO, checking consistency is NP-complete. Our framework shows that, despite the hardness, fast deterministic consistency algorithms can be obtained by employing tools from fine-grained complexity. The framework is based on a universal consistency problem which can be instantiated by different memory models. We construct an algorithm for the problem running in time O*(2^k), where k is the number of write accesses in the execution that is checked for consistency. Each instance of the framework then admits an O*(2^k)-time consistency algorithm. By applying the framework, we obtain corresponding consistency algorithms for SC, TSO, PSO, and RMO. Moreover, we show that the obtained algorithms for SC, TSO, and PSO are optimal in the fine-grained sense: there is no consistency algorithm for these running in time 2^{o(k)} unless the exponential time hypothesis fails.

2012 ACM Subject Classification Theory of computation → Concurrency; Theory of computation → Problems, reductions and completeness
Keywords and phrases
Consistency, Weak Memory, Fine-Grained Complexity.
The paper at hand develops a framework for consistency algorithms. Given an execution of a concurrent program over a shared-memory system, consistency algorithms check whether the execution is consistent under the intended behavior of the memory. Our framework takes an abstraction of this intended behavior, a memory model, and yields a deterministic consistency algorithm for it. By applying the framework, we obtain provably optimal consistency algorithms for the well-known memory models SC [38], TSO, and PSO [2].

Checking consistency is central in the verification of shared-memory implementations. Such implementations promise programmers consistency guarantees according to a certain memory model. However, due to their complex and performance-oriented design, shared-memory implementations are sensitive to errors and may not provide the promised guarantees. Consistency algorithms test this. They take an execution over a shared-memory implementation: multiple sequences of read and write events, one for each thread. Then they check whether the execution is viable under the memory model, namely whether the read and write events can be arranged in an interleaving that satisfies the axioms of the model.

In 1997, Gibbons and Korach [32] were the first to study consistency checking as it is considered in this work. They focused on the basic memory model Sequential Consistency (SC) by Lamport [38]. In SC, read and write accesses to the memory are atomic, making each write of a thread immediately visible to all other threads. Gibbons and Korach showed that checking consistency in this setting is, in general, NP-complete. Moreover, they considered restrictions of the problem, showing that even under the assumption that certain parameters like the number of threads are constant, the problem still remains NP-complete.

The SPARC memory models Total Store Order (TSO), Partial Store Order (PSO), and
Relaxed Memory Order (RMO) were investigated by Cantin et al. in [16]. The authors showed that, like for SC, checking consistency for these models is NP-hard. Furbach et al. [31] extended the NP-hardness to almost all models appearing in the Steinke-Nutt hierarchy [46], a hierarchy developed for the classification of memory models. This yields NP-hardness results for memory models like Causal Consistency (CC) [37], Pipelined RAM (PRAM) [44], Cache Consistency [33], or variants of Processor Consistency [33, 5]. Bouajjani et al. [12] independently found that checking CC, and variants of it, for a given execution is NP-hard.

We approach consistency checking under the assumption of data-independence [12, 50, 11]. In fact, the behavior of a shared-memory implementation or a database does not depend on precise values in practice [49, 1, 4]. We can therefore assume that in a given execution, a value is written at most once. However, the NP-hardness of checking consistency under SC, TSO, and PSO carries over to the data-independent case [32, 31]. Deterministic consistency algorithms for these models will therefore face exponential running times. By employing a fine-grained complexity analysis, we show that one can still obtain consistency algorithms that have only a mild exponential dependence on certain parameters and are provably optimal.

Fine-grained complexity analyses are a task of
Parameterized Complexity [30, 22, 24]. The goal of this new field within complexity theory is to measure the influence of certain parameters on a problem's complexity. In particular, if a problem is NP-hard, one can determine which parameter k of the problem still offers a fast deterministic algorithm. Such an algorithm runs in time f(k) · poly(n), where f is a computable function that only depends on the parameter, and poly(n) is a polynomial in the size of the input n. Problems admitting such algorithms lie in the class FPT of fixed-parameter tractable problems. The time complexity of a problem in FPT is denoted by O*(f(k)). A fine-grained complexity analysis determines the precise function f that is needed to solve the problem. While finding upper bounds amounts to finding algorithms, lower bounds on f can be obtained from the exponential time hypothesis (ETH) [35]. It assumes that n-variable 3-SAT cannot be solved in time 2^{o(n)} and is considered standard in parameterized complexity [22, 39, 21, 17]. A function f is optimal when upper and lower bounds match.

Our contribution is a framework which yields consistency algorithms that are optimal in the fine-grained sense. The obtained algorithms run in time O*(2^k), where k is the number of write events in the given execution. We demonstrate the applicability by obtaining consistency algorithms for SC, TSO, PSO, and RMO. Relying on the ETH, we prove that for the former three models, consistency cannot be checked in time 2^{o(k)}. This shows that our framework yields optimal algorithms for these models. Note that considering other parameters like the number of threads, the number of events per thread, or the size of the underlying data domain yields W[1]-hard problems [42, 32] that are unlikely to admit FPT-algorithms [22, 24].

The framework is based on a universal consistency problem that can be instantiated by a memory model of choice. We develop an algorithm for this universal problem running in time O*(2^k). Then, any instance by a memory model automatically admits an O*(2^k)-time consistency algorithm. For the formulation of the problem, we rely on the formal framework of Alglave [6] and Alglave et al. [7] for describing memory models in terms of relations. In fact, checking consistency then amounts to finding a particular store order [50] on the write events that satisfies various acyclicity constraints.

For solving the universal consistency problem, we show that instead of a store order we can also find a total order on the write events satisfying similar acyclicity constraints. The latter is algorithmically simpler to find. We develop a notion of snapshot orders that mimic total orders on subsets of write events. This allows for shifting from the relation-based domain of the problem to the subset lattice of writes. On this lattice, we can perform a dynamic programming which builds up total orders step by step and avoids an explicit iteration over such orders. Keeping track of the acyclicity constraints is achieved by so-called coherence graphs. The dynamic programming runs in time O*(2^k), which constitutes the complexity.

To apply the framework, we follow the formal description of SC, TSO, PSO, and
RMO, given in [6, 7], and instantiate the universal consistency problem. Optimality of the algorithms for SC, TSO, and PSO is obtained from the ETH. To this end, we construct a reduction from 3-SAT to the corresponding consistency problem that generates only linearly many write events. The reduction transports the assumed lower bound on 3-SAT to consistency checking.
Related Work.
In its general form, consistency checking is NP-hard for most memory models. Furbach et al. [31] show that LOCAL [3] is an exception: checking consistency under LOCAL takes polynomial time. This also holds for Cache Consistency and PRAM if certain parameters of the consistency problem are assumed to be constant. In the case of data-independence, Bouajjani et al. [12] show that checking consistency under CC and variants of CC also takes polynomial time. Wei et al. [48] present a similar result for PRAM. In [50], Bouajjani et al. present practically efficient algorithms for the consistency problems of SC and TSO under data-independence. They rely on the polynomial-time algorithm for CC [12] and obtain a partial store order, which is completed by an enumeration. In theory, the enumeration has a worst-case time complexity of O*(k^k). We avoid such an enumeration by a dynamic programming running in time O*(2^k). Consistency checking for weaker and stronger notions of consistency, like linearizability [34], is considered in [26, 27, 25].

Instead of checking consistency for a single execution of a shared-memory implementation, there have been efforts in verifying that all executions are consistent under a certain memory model. Alur et al. show in [8] that for SC, the problem is undecidable. This also holds for CC [12]. Under data-independence, the problem becomes decidable for CC [12]. Verifying Eventual Consistency [47] was shown to be decidable by Bouajjani et al. in [13]. There has also been work on other verification problems like reachability and robustness. Atig et al. show in [9] that, under TSO and PSO, reachability is decidable. In [10] the authors extend their results and present a relaxation of TSO with a decidable reachability problem. Robustness against TSO was considered in [14] and shown to be PSPACE-complete. This also holds for POWER [40, 45], as shown in [23], and for partitioned global address spaces [15].

Parameterized complexity has been applied to other verification problems as well. Biswas and Enea [11] study the complexity of transactional consistency and obtain an FPT-algorithm in the size and the width of a history. This also yields an algorithm for the serializability problem, proven to be NP-hard by Papadimitriou [43] in 1979. A fine-grained algorithm for serializability under TSO was given in [28]. The authors of [29] present an FPT-algorithm for predicting atomicity violations as well as an intractability result. The parameterized complexity of data race prediction was considered in [42]. Fine-grained complexity analyses were conducted for reachability under bounded context switching on finite-state systems [18], and for reachability and liveness on parameterized systems [19, 20].
To state our framework, we introduce some basic notions around memory models and the consistency problem. We mainly follow [7, 6, 50, 12]. Further, we give a short introduction to fine-grained complexity. For standard textbooks in this field, we refer to [30, 24, 22].
Relations, Histories, and Memory Models.
We consider the consistency problem: given an execution of a concurrent program and a model of the shared memory, decide whether the execution adheres to the model. Formally, executions consist of events modeling write and read accesses to the shared memory. To define these, let Var be the finite set of variables of the program. Moreover, let Val be its finite data domain and Lab a finite set of labels. A write event is defined by w : wr(x, v), where w ∈ Lab is a label, x ∈ Var is a variable, and v ∈ Val is a value. The set of write events is defined by WR = {w : wr(x, v) | w ∈ Lab, x ∈ Var, v ∈ Val}. A read event is given by r : rd(x, v). The set of read events is denoted by RD. We define the set of all events by E = WR ∪ RD. If it is clear from the context, we omit the label of an event. Given an event o ∈ E, we access the variable of o by var(o) ∈ Var. For a subset O ⊆ E, we denote by WR(O) and RD(O) the sets of write and read events in O.

For modeling dependencies between events we use strict orders. Let O ⊆ E be a set of events. A strict partial order on O is an irreflexive, transitive relation over O. A strict total order is a strict partial order that is total. We often refer to these notions without mentioning that they are strict. Given two relations rel, rel′ ⊆ O × O, we denote by rel ∘ rel′ their composition, by rel⁺ the transitive closure, and by rel⁻¹ the inverse. For a variable x, we denote by rel_x the restriction of rel to events on x: rel_x = {(o, o′) ∈ rel | var(o) = var(o′) = x}.

Executions are modeled by histories. A history is a tuple h = ⟨O, po, rf⟩, where O ⊆ E is a set of events executed by the threads of the program. The program order po is a partial order on O which orders the events of a thread according to the execution. Typically, it is a union of total orders, one for each thread. The relation rf ⊆ WR(O) × RD(O) is called the reads-from relation. It specifies the write event providing the value for a read event in the history. Moreover, for each read event r ∈ RD(O) there is a write event w ∈ WR(O) such that (w, r) ∈ rf, and whenever (w, r) ∈ rf, both events access the same variable.

Note that we assume the reads-from relation to be given as part of the history. This is due to the data-independence of shared-memory and database implementations in practice [49, 1, 11, 4, 12, 50].
This means that the behavior of the implementation does not depend on actual values, and in an execution we may assume each value to be written at most once. From such an execution, we can simply read off the relation rf.

Our framework is compatible with histories that feature initial writes. These histories have a write event for each variable writing the initial value of that variable. Formally, these write events are smaller than all other events under program order. If a history h = ⟨O, po, rf⟩ is fixed, we abuse notation and also use WR and RD to denote WR(O) and RD(O). For a variable x, we write WR(x) = {w ∈ WR | var(w) = x} for the set of write events on x in h. Furthermore, we will later make use of the relation po-loc, defined by restricting po to events on the same variable: po-loc = {(o, o′) ∈ po | var(o) = var(o′)}.

A memory model is an abstraction of the memory behavior defining axioms that the relations in a history must adhere to. Formally, a memory model MM is a tuple MM = (po-mm, rf-mm). The relation po-mm, also called preserved program order, is a subrelation of po describing the structure maintained by the memory model. The relation rf-mm is a subrelation of rf. It shows which write events are visible globally under MM.

Fine-Grained Complexity.
For many memory models, the consistency problem is NP-hard [31, 32, 16, 12]. Hence, deterministic consistency algorithms usually face exponential running times. But the exponents might only depend on certain parameters of the problem, which still allows the algorithm to be fast. Finding such parameters is a task of parameterized complexity.

The basis of parameterized complexity are parameterized problems, that is, subsets P of Σ* × ℕ, where Σ is a finite alphabet. An input to P is of the form (x, k), with k being called the parameter. A particularly interesting class of parameterized problems are the fixed-parameter tractable (FPT) problems. A problem P is FPT if it can be solved by a deterministic algorithm running in time f(k) · |x|^{O(1)}, where f is a computable function only dependent on k. The running time of such an algorithm is usually denoted by O*(f(k)) to suppress the polynomial part. The class FPT is contained in the class W[1]. Problems that are W[1]-hard are considered intractable since they are unlikely to be FPT.

Given a fixed-parameter tractable problem P, finding an upper bound for f is achieved by constructing an algorithm for P. Lower bounds on f are usually obtained from the exponential time hypothesis (ETH) [35]. This standard hardness assumption asserts that 3-SAT cannot be solved by an algorithm running in time 2^{o(n)}, where n is the number of variables. A lower bound on f is then obtained by a suitable reduction from 3-SAT to P. We are interested in finding the optimal f for the consistency problem, where upper and lower bounds match. The search for such an f is referred to as fine-grained complexity.

We present our framework. Given a model describing the memory, the framework provides an (optimal) deterministic algorithm for the corresponding consistency problem, that is, for deciding whether a given history can be scheduled under the axioms imposed by the model. The obtained algorithm can then be used within a testing routine for concurrent programs. At the heart of the framework is a consistency problem that can be instantiated with different memory models. We solve this universal problem by switching from a relation-based domain, where the problem is defined, to a subset-based domain. On the latter, we can then apply a dynamic programming which constitutes the desired deterministic algorithm.
The basis of our framework is a universal consistency problem which can be instantiated to simulate a particular memory model. For its formulation, we make use of a consistency notion that allows for the construction of a fast algorithm but deviates from the literature [6, 7, 50] at first sight. Therefore, it is proven in Section 4 that instantiating the problem with a particular memory model yields the correct notion of consistency.

We clarify our notion of consistency. Intuitively, a history is consistent under a memory model if it can be scheduled such that certain axioms defined by the model are satisfied. Following the formal framework of [6, 7], finding such a schedule amounts to finding a particular order of the write events that satisfies acyclicity requirements imposed by the axioms. Formally, let h = ⟨O, po, rf⟩ be a history and let MM be a memory model described by the tuple (po-mm, rf-mm). Then h is called MM-consistent if there exists a strict total order tw on the write events WR of h such that the graphs

G_loc = (O, po-loc ∪ rf ∪ tw ∪ cf) and G_mm = (O, po-mm ∪ rf-mm ∪ tw ∪ cf)

are both acyclic. Here, the conflict relation cf is defined by cf = rf⁻¹ ∘ ⋃_{x ∈ Var} tw_x. Phrased differently, (r, w) ∈ cf if r is a read event on a variable x, w is a write event on x, and there is a write event w′ on x such that (w′, r) ∈ rf and (w′, w) ∈ tw.

The acyclicity of G_loc is called the uniprocessor requirement [6] or memory coherence for each location [16]. Roughly, it demands that an order among writes to the same location that can be extracted from the history is kept in tw. The second acyclicity requirement in the definition resembles the underlying memory model MM. If G_mm is acyclic, the history can be scheduled adhering to the axioms defined by MM.
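To make the definition concrete, the following sketch checks MM-consistency of a small history by enumerating all candidate total orders tw on WR and testing both acyclicity requirements with a Kahn-style test. The encoding is our own simplification (events as strings, relations as sets of pairs) and is an illustration, not code from the paper; it realizes the naive enumeration that the framework improves upon.

```python
from itertools import permutations

def has_cycle(nodes, edges):
    # Kahn-style test: a graph is cyclic iff it admits no topological sorting.
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for (a, b) in edges:
        succ[a].append(b)
        indeg[b] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while stack:
        n = stack.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                stack.append(m)
    return seen != len(nodes)

def conflict(rf, tw_x):
    # cf = rf^-1 ∘ (union over variables of tw restricted to x):
    # (r, w) is in cf iff some w' with (w', r) in rf satisfies (w', w) in tw_x.
    return {(r, w) for (wp, r) in rf for (wq, w) in tw_x if wp == wq}

def is_consistent(events, var, writes, po_loc, rf, po_mm, rf_mm):
    for perm in permutations(writes):  # all candidate total orders tw on WR
        tw = {(perm[i], perm[j])
              for i in range(len(perm)) for j in range(i + 1, len(perm))}
        tw_x = {(a, b) for (a, b) in tw if var[a] == var[b]}
        cf = conflict(rf, tw_x)
        g_loc = po_loc | rf | tw | cf
        g_mm = po_mm | rf_mm | tw | cf
        if not has_cycle(events, g_loc) and not has_cycle(events, g_mm):
            return True
    return False
```

For instance, on the classic store-buffering history with initial writes (two threads, each writing one variable and reading the other variable's initial value), the check fails when po-mm = po and rf-mm = rf (an SC-style instantiation), but succeeds once the per-thread write-to-read edges are dropped from po-mm (a TSO-style relaxation).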
Our definition of consistency deviates from the literature in two aspects. First, we demand a total order tw instead of a store order, a partial order that is total on writes to the same location [6, 7, 50]. In Section 4 we will show that the resulting notions of consistency are equivalent. A further difference is that we do not explicitly test for out-of-thin-air values [41]. For the majority of memory models considered in this work, the test is not necessary as it is implied by the acyclicity of G_loc and G_mm. But it can easily be added when needed.

We are ready to state the universal consistency problem. To this end, let MM be a fixed memory model. Given a history h, the problem asks whether h is MM-consistent.

MM-Consistency
Input:
A history h = ⟨O, po, rf⟩.
Question: Is h MM-consistent?

Instantiations of the problem by well-known memory models like SC or TSO are typically NP-hard [32, 31]. However, we are interested in a deterministic algorithm for MM-Consistency. While we cannot avoid an exponential running time for such an algorithm, a fine-grained complexity analysis can determine the optimal exponential dependence. Many parameters of MM-Consistency like the number of threads, the maximum size per thread, or the size of the data domain yield parameterizations that are W[1]-hard [42, 32]. Therefore, we conduct a fine-grained analysis for the parameter k = |WR|, the number of writes in h. The main finding is an algorithm for MM-Consistency running in time O*(2^k). The optimality of this approach is shown in Section 5 by a complementing lower bound. We formally state the upper bound in the following theorem. There, n = |O| denotes the number of events in h.

Theorem 1.
The problem MM-Consistency can be solved in time O(2^k · k · n).

Note that an algorithm for MM-Consistency running in time O*(k^k) is immediate. One can iterate over all total orders of WR and check the acyclicity of G_loc and G_mm in polynomial time. Since we cannot afford this iteration in O*(2^k), improving the running time needs an alternative approach and further technical development, which we summarize in Section 3.2.

We present the upper bound for MM-Consistency as stated in Theorem 1. Our algorithm is a dynamic programming. It switches from the domain of total orders to subsets of write events and iterates over the latter. The crux is that for a particular subset we do not need to remember a precise order. In fact, we only need to store that it can be ordered by a so-called snapshot order that mimics total orders on subsets. Not having a precise order at hand yields a disadvantage: we cannot just test both acyclicity requirements in the end. Instead, we perform an acyclicity test on a coherence graph in each step of the iteration. These graphs carry enough information to ensure acyclicity as it is required by MM-Consistency.

We begin our technical development by introducing snapshot orders. Intuitively, these simulate total orders of the write events on subsets of writes. Given a subset, a snapshot order consists of two parts: a total order on the subset and a partial order. The latter expresses that the complement of the given set precedes the subset but is yet unordered.

Definition 2.
Let V ⊆ WR. A snapshot order on V is a union tw[V] = t[V] ∪ r[V]. The relation t[V] is a strict total order on V and r[V] = {(v̄, v) | v̄ ∈ V̄, v ∈ V} arranges that the elements of V̄ are smaller than the elements of V. By V̄, we denote the complement of V in the write events, V̄ = WR \ V. Note that r[V] does not impose an order among V̄.

A snapshot order is indeed a strict partial order. Even more, when the considered set is the whole set of write events WR, a snapshot order tw[WR] is a total order on WR. Therefore, MM-consistency can be checked by finding a snapshot order on WR satisfying both acyclicity requirements. The advantage of this formulation is that we can construct such an order from snapshot orders on subsets. Technically, we parameterize the problem along all V ⊆ WR.

For the acyclicity requirements, we need a similar parameterization. To this end, let V ⊆ WR be a subset and tw[V] a snapshot order on V. We parameterize the above graphs G_loc and G_mm via exchanging the total order by the snapshot order:

G_loc(tw[V]) = (O, po-loc ∪ rf ∪ tw[V] ∪ cf[V]),
G_mm(tw[V]) = (O, po-mm ∪ rf-mm ∪ tw[V] ∪ cf[V]).

As above, the conflict relation is defined by cf[V] = rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V]_x. Note that for a snapshot order tw[WR] on the whole set of write events, the resulting graphs G_loc(tw[WR]) and G_mm(tw[WR]) are exactly those appearing in the acyclicity requirement.

Now we have the tools to state the parameterization of MM-Consistency along subsets of write events. This allows for leaving the domain of total orders and switching to subsets instead. To this end, we define a table T with a Boolean entry T[V] for each V ⊆ WR. Entry T[V] will be 1 if there is a snapshot order on V satisfying the acyclicity requirement on both parameterized graphs. Otherwise, T[V] will evaluate to 0. Formally, T[V] is defined by

T[V] = 1, if there is a snapshot order tw[V] such that G_loc(tw[V]) and G_mm(tw[V]) are acyclic,
T[V] = 0, otherwise.

The following lemma relates MM-Consistency to the table T. It is crucial in our development as it states the correctness of the constructed parameterization. The proof follows from the beforehand definitions and the fact that a snapshot order on WR is already total.

Lemma 3.
History h is MM-consistent if and only if T[WR] = 1.

We are now left with the problem of evaluating the entry T[WR]. Our approach is to set up a recursion among the entries of T and evaluate it via a bottom-up dynamic programming. The recursion explains how entries of subsets are aggregated to compute entries of larger sets. In fact, write events are added element by element: the recursion shows how an entry T[V] can be utilized to compute the entry of an enlarged set V ∪ {v}, where v ∈ V̄.

When passing from T[V] to T[V ∪ {v}], we need to provide a snapshot order on V ∪ {v} that satisfies the acyclicity requirements. A snapshot order on V can always be extended to a snapshot order on V ∪ {v}: we insert v as new minimal element in the contained total order. But we need to keep track of whether the acyclicity is compatible with the new minimal element v. To this end, we perform acyclicity tests on coherence graphs. These do not depend on a snapshot order and rely solely on the fact that v is the new minimal element. This will later allow for an evaluation of the table without touching precise orders.

Definition 4.
Let V ⊆ WR and v ∈ V̄. The coherence graphs of V and v are defined by

G_loc[V, v] = (O, po-loc ∪ rf ∪ r[V, v] ∪ cf[V, v]),
G_mm[V, v] = (O, po-mm ∪ rf-mm ∪ r[V, v] ∪ cf[V, v]).

(The parameterization here does not refer to parameterized complexity.)
In the definition, the relation r[V, v] expresses that the complement of V ∪ {v} is smaller than V ∪ {v} and that v is the minimal element in V ∪ {v}. Formally, it is given by r[V, v] = r[V ∪ {v}] ∪ {(v, w) | w ∈ V}. The conflict relation is defined by cf[V, v] = rf⁻¹ ∘ ⋃_{x ∈ Var} r[V, v]_x.

Coherence graphs are key for the recursion among the entries of T. Assume we are given a snapshot order tw[V] on V meeting the acyclicity requirements of T, and we extend it to a snapshot order tw[V′] on V′ = V ∪ {v}, as above, by inserting v as minimal element of V′. We show that each potential cycle in G_loc(tw[V′]) or G_mm(tw[V′]) either implies a cycle in a coherence graph G_loc[V, v] or G_mm[V, v], or in one of the graphs G_loc(tw[V]) or G_mm(tw[V]). If T[V] = 1, we can assume the latter graphs to be acyclic. Moreover, if we have checked that the coherence graphs are acyclic as well, we obtain that T[V′] = 1. Hence, a recursion should check whether T[V] = 1 and whether the corresponding coherence graphs are acyclic.

We formulate the recursion in the subsequent lemma. Note that it is a top-down formulation that only refers to non-empty subsets of write events. An evaluation of the base case is immediate. Entry T[∅] is evaluated to 1 if G_loc(∅) = (O, po-loc ∪ rf) and G_mm(∅) = (O, po-mm ∪ rf-mm) are both acyclic. Otherwise it is evaluated to 0.

Lemma 5.
Let V ⊆ WR be a non-empty subset. Entry T[V] admits the following recursion:

T[V] = ⋁_{v ∈ V} (G_loc[V \ {v}, v] acyclic) ∧ (G_mm[V \ {v}, v] acyclic) ∧ T[V \ {v}].

We interpret (G_loc[V \ {v}, v] acyclic) as a predicate evaluating to 1 if the graph is acyclic, and to 0 otherwise. Hence, the recursion requires the existence of a v ∈ V such that both coherence graphs are acyclic and T[V \ {v}] evaluates to 1. A proof of Lemma 5 is given in Appendix A.

With the recursion at hand, we can evaluate the table T by a dynamic programming. To this end, we store already computed entries and look them up when needed. An entry T[V] is evaluated as follows. We branch over all write events v ∈ V and test whether the coherence graphs G_loc[V \ {v}, v] and G_mm[V \ {v}, v] are acyclic. Then, we look up whether T[V \ {v}] = 1. If all three queries are positive, we store T[V] = 1. Otherwise, T[V] = 0.

The complexity estimation of Theorem 1 is obtained as follows. The table has 2^k many entries that we evaluate, which constitutes the exponential factor. For each entry T[V], we branch over at most k write events v ∈ V. Looking up the value of T[V \ {v}] can be done in constant time. The following lemma shows that O(k · n) time suffices to construct the coherence graphs and to check them for acyclicity. The latter checks are based on Kahn's algorithm [36] for finding a topological sorting. This completes the proof of Theorem 1.

Lemma 6. Let V ⊆ WR and v ∈ V̄. Constructing the coherence graphs G_loc[V, v] and G_mm[V, v] and testing both for acyclicity can be done in time O(k · n).

We show the applicability of our framework and obtain consistency algorithms for the memory models SC, TSO, PSO, and
RMO. To this end, we first need to show that our notion of consistency coincides with the notion of consistency used in the literature for these models. This ensures that the obtained algorithms really solve the correct problem. Once this is achieved, we can directly apply the framework to SC, TSO, and PSO. For RMO, we show how the framework can be slightly modified to also capture this more relaxed model.
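Before turning to the individual models, the complete procedure of the previous section (the base case, the recursion of Lemma 5, and the coherence-graph tests of Definition 4) can be summarized in a short sketch. The encoding is again our own simplification, not the paper's implementation: events are strings, relations are sets of pairs, and table entries are keyed by frozensets of write events.

```python
from itertools import combinations

def has_cycle(nodes, edges):
    # Kahn's algorithm: a graph is cyclic iff no topological sorting exists.
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for (a, b) in edges:
        succ[a].append(b)
        indeg[b] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while stack:
        n = stack.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                stack.append(m)
    return seen != len(nodes)

def mm_consistency_dp(events, var, writes, po_loc, rf, po_mm, rf_mm):
    """O*(2^k) dynamic programming over subsets of writes (Lemma 5 sketch)."""
    all_w = frozenset(writes)

    def coherence_acyclic(V, v):
        # r[V, v]: the complement of V ∪ {v} precedes V ∪ {v}, v minimal in V ∪ {v}.
        Vv = V | {v}
        comp = all_w - Vv
        r = {(u, w) for u in comp for w in Vv} | {(v, w) for w in V}
        # cf[V, v] = rf^-1 ∘ (union over variables of r[V, v] restricted to x).
        r_x = {(a, b) for (a, b) in r if var[a] == var[b]}
        cf = {(rd, w) for (wp, rd) in rf for (wq, w) in r_x if wp == wq}
        return (not has_cycle(events, po_loc | rf | r | cf)
                and not has_cycle(events, po_mm | rf_mm | r | cf))

    # Base case T[∅], then bottom-up evaluation over subsets by size.
    T = {frozenset(): (not has_cycle(events, po_loc | rf)
                       and not has_cycle(events, po_mm | rf_mm))}
    for size in range(1, len(all_w) + 1):
        for combo in combinations(sorted(all_w), size):
            V = frozenset(combo)
            T[V] = any(T[V - {v}] and coherence_acyclic(V - {v}, v) for v in V)
    return T[all_w]  # the history is MM-consistent iff T[WR] = 1 (Lemma 3)
```

To instantiate it for a concrete model one supplies po-mm and rf-mm; roughly, for SC one would take po-mm = po and rf-mm = rf, while weaker models drop edges from these relations, as described in the following.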
Consistency, as it is considered in the literature, is also known as validity [6, 7]. We use the latter name to avoid confusion with our notion of consistency. Before we show that both notions actually coincide, we formally define validity. The definition is based on store orders [6, 7, 50] (also known as coherence orders). Given a history h = ⟨O, po, rf⟩, a store order ww ⊆ WR × WR takes the form ww = ⋃_{x ∈ Var} ww_x so that each ww_x is a strict total order on WR(x). Phrased differently, store orders are unions of total orders on writes to the same variable. Note that, in contrast to a total order on WR, a store order does not have any edge between write events referring to distinct variables.

Validity is similar to consistency. But instead of a total order, the acyclicity requirements need to be satisfied by a store order. Let MM be a memory model described by (po-mm, rf-mm). A history h = ⟨O, po, rf⟩ is MM-valid if there exists a store order ww such that

G_loc^ww = (O, po-loc ∪ rf ∪ ww ∪ fr) and G_mm^ww = (O, po-mm ∪ rf-mm ∪ ww ∪ fr)

are acyclic. The from-read relation is defined by fr = rf⁻¹ ∘ ww. Note that the definition, as in the case of consistency above, omits checking for out-of-thin-air values. We will later add an explicit test for memory models that require it. This will not affect the complexity.

We show the equivalence of validity and consistency. To this end, we need to prove that a store order can be replaced by a total order on the write events while acyclicity is preserved. The following lemma states the result. It is crucial for the applicability of our framework.

Lemma 7.
A history h is MM-valid if and only if it is MM-consistent.

Before we give the proof of Lemma 7, we need an auxiliary statement. It shows that a store order ww in G^ww_loc can be replaced by any linearization of ww without affecting acyclicity. Phrased differently, any total order tw on the write events that contains ww can be inserted into the graph G^ww_loc: it will still be acyclic. We state the corresponding lemma.

▶ Lemma 8.
Let h = ⟨O, po, rf⟩ be a history, ww a store order, and tw a total order on WR such that ww ⊆ tw. If G^ww_loc is acyclic, then so is G^tw_loc = (O, po-loc ∪ rf ∪ tw ∪ fr).

The proof of Lemma 8 is given in Appendix B. We turn to the proof of Lemma 7.
Proof of Lemma 7.
First assume that h = ⟨O, po, rf⟩ is MM-valid. Then there is a store order ww such that G^ww_loc and G^ww_mm are acyclic. Consider the edges of the latter graph. They form a relation ord-mm = po-mm ∪ rf-mm ∪ ww ∪ fr. Since G^ww_mm is acyclic, the transitive closure ord-mm⁺ is a strict partial order on O. Hence, there exists a linear extension, a strict total order L containing ord-mm⁺. We define tw = L ∩ (WR × WR). Then, tw is a total order on WR and we have ww ⊆ L ∩ (WR × WR) = tw. We show that G_loc and G_mm are acyclic. Note that the latter refer to the graphs from the definition of consistency.

The store order ww is contained in tw. Hence, we obtain that ww_x ⊆ tw_x for each variable x ∈ Var. This implies that ww_x = tw_x since ww_x is total on WR(x). We can deduce ww = ⋃_{x ∈ Var} ww_x = ⋃_{x ∈ Var} tw_x and thus cf = rf⁻¹ ∘ ⋃_{x ∈ Var} tw_x = rf⁻¹ ∘ ww = fr. Since fr = cf, we get the acyclicity of G_loc = G^tw_loc from Lemma 8. The acyclicity of G_mm follows since its edges po-mm ∪ rf-mm ∪ tw ∪ cf form a subrelation of L. A cycle would mean that L has a reflexive element, but L is a strict order. Hence, h is MM-consistent.

For the other direction, assume that h is MM-consistent. By definition, there is a total order tw on WR such that G_loc and G_mm are acyclic. We construct the store order ww = ⋃_{x ∈ Var} tw_x. Note that, since tw_x is total on WR(x), ww is indeed a store order and we have ww ⊆ tw. We show that G^ww_loc and G^ww_mm are acyclic. In fact, we have that fr = rf⁻¹ ∘ ww = cf. This implies that G^ww_loc and G^ww_mm are subgraphs of G_loc and G_mm, respectively. Hence, the two graphs are acyclic and h is MM-valid. ◀
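To make the definitions concrete, the following is a minimal Python sketch (the event and relation encodings are our own, chosen for illustration, not taken from the paper) that, for one fixed candidate store order ww, builds fr = rf⁻¹ ∘ ww and checks the two acyclicity requirements of MM-validity with Kahn's algorithm:

```python
from collections import defaultdict, deque

def compose(rel_a, rel_b):
    """Relational composition: (u, w) whenever (u, v) in rel_a and (v, w) in rel_b."""
    by_src = defaultdict(set)
    for v, w in rel_b:
        by_src[v].add(w)
    return {(u, w) for (u, v) in rel_a for w in by_src[v]}

def is_acyclic(events, edges):
    """Kahn's algorithm: True iff the graph has a topological order, i.e. no cycle."""
    succ = defaultdict(list)
    indeg = {e: 0 for e in events}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(e for e, d in indeg.items() if d == 0)
    done = 0
    while queue:
        u = queue.popleft()
        done += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return done == len(indeg)

def is_valid(events, po_loc, rf, po_mm, rf_mm, ww):
    """MM-validity for one FIXED candidate store order ww:
    build fr = rf^-1 . ww and require both graphs to be acyclic."""
    rf_inv = {(r, w) for (w, r) in rf}
    fr = compose(rf_inv, ww)
    g_loc = po_loc | rf | ww | fr
    g_mm = po_mm | rf_mm | ww | fr
    return is_acyclic(events, g_loc) and is_acyclic(events, g_mm)
```

Relations are encoded as sets of event-id pairs. A full validity check would still have to search over candidate store orders; avoiding that naive search is exactly what the O*(2^k) framework is for.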
We apply the algorithmic framework to the mentioned memory models and obtain (optimal) deterministic algorithms for their corresponding validity/consistency problems. To this end, we employ the formal description of these models given in [6, 7].
Sequential Consistency.
Sequential Consistency (SC) is a basic memory model, first defined by Lamport in [38]. Intuitively, SC strictly follows the given program order and flushes each issued write immediately to the memory so that it is visible to all other threads.

Formally, SC is described by the tuple SC = (po-sc, rf-sc) with po-sc = po and rf-sc = rf. Hence, it employs the full program order and reads-from relation, making the uniprocessor test on G_loc obsolete. However, our framework still applies. It yields an algorithm for the corresponding validity/consistency problem running in time O(2^k · k · n). We show in Section 5 that the obtained algorithm is optimal under ETH.

Total Store Ordering.
The SPARC memory model Total Store Order (TSO) [2] resembles a more relaxed memory behavior. Instead of flushing writes immediately to the memory, like in SC, each thread has its own FIFO buffer, and issued writes of that thread are pushed into the buffer. Writes in the buffer are only visible to the owning thread. If the owner reads a certain variable, it first looks through the buffer and reads the latest issued write on that variable. This is called an early read. At some nondeterministic point, the buffer is flushed to the memory, making the writes visible to other threads as well.

The formal description of TSO is given by the tuple TSO = (po-tso, rf-tso), where po-tso = po \ (WR × RD) is a relaxation of the program order, containing no write-read pairs. The relation rf-tso = rf_e is a restriction of rf to write-read pairs from different threads:

rf_e = {(w, r) ∈ rf | (w, r) ∉ po, (r, w) ∉ po}.

Unlike in the case of SC, we do not have the full program order and reads-from relation at hand. Hence, the uniprocessor test is essential. Applying the framework yields an algorithm for the validity/consistency problem of TSO running in time O(2^k · k · n). The optimality of the obtained algorithm is shown in Section 5.

Partial Store Ordering.
The second SPARC model that we consider is Partial Store Order (PSO) [2]. It is weaker than TSO since writes to different locations issued by a thread may not arrive at the memory in program order. Intuitively, in PSO each thread has a buffer per variable where the corresponding writes to the variable are pushed. Like for TSO, threads can read early from their buffers and the buffers are, at some point, flushed to the memory.

Formally, PSO is captured by the tuple PSO = (po-pso, rf-pso). Here, the relation po-pso = po \ (WR × RD ∪ WR × WR) takes away the write-read pairs and the write-write pairs from the program order and, like for TSO, we have rf-pso = rf_e. Hence, we can apply our framework and obtain an O(2^k · k · n)-time algorithm. The obtained algorithm is optimal.

Relaxed Memory Order.
We extend the framework to also capture SPARC's Relaxed Memory Order (RMO) [2]. The model needs an explicit out-of-thin-air test and allows for so-called load-load hazards. We show how both modifications can be built into the framework without affecting the complexity of the resulting consistency algorithm.

The model RMO relies on an additional dependency relation resembling address and data dependencies among events in an execution of a program. For instance, a read event may influence the value written by a subsequent write event. We assume that the dependency relation dp is given along with a history h = ⟨O, po, rf⟩ and is a subrelation of po ∩ (RD × O). The latter means that dp always starts in a read event. With the relation at hand we can perform an out-of-thin-air test. In fact, such a test [6] requires that (O, dp ∪ rf) is acyclic. This can be checked by Kahn's algorithm [36] in time O(n). Hence, the test can be added to the framework without increasing the time complexity of the obtained consistency algorithm.

Load-load hazards are allowed by RMO. These occur when two reads of the same variable are scheduled not following the program order. To obtain an algorithm from the framework in this case, we need to weaken the uniprocessor check [6]. In fact, we replace the relation po-loc by po-loc-llh = po-loc \ (RD × RD) and require that the graph G_loc-llh = (O, po-loc-llh ∪ rf ∪ tw ∪ cf) is acyclic. The correctness of the framework is ensured since Lemma 7 still holds in this setting. Moreover, the running time of the resulting algorithm is not affected.

With these modifications, we can obtain a consistency algorithm for RMO. Formally, RMO = (po-rmo, rf-rmo) where po-rmo = dp and rf-rmo = rf_e. Applying the framework with out-of-thin-air test and G_loc-llh yields a consistency algorithm running in time O(2^k · k · n).

We show that the framework provides optimal consistency algorithms for SC, TSO, and
PSO. To this end, we employ the ETH and prove that checking consistency under these three memory models cannot be achieved in subexponential time 2^{o(k)}. Since the algorithms obtained in Section 4 match the lower bound, they are indeed optimal.

We begin with the lower bound for SC-Consistency. For its proof, we rely on a characterization of the ETH, known as the Sparsification Lemma [35]. It states that ETH is equivalent to the assumption that 3-SAT cannot be solved in time 2^{o(n+m)}, where n is the number of variables and m is the number of clauses of the input formula. To transport the lower bound to consistency checking, we construct a polynomial-time reduction from 3-SAT to SC-Consistency which controls the number of writes k. Technically, for a given formula ϕ, the reduction yields a history h_ϕ that has only k = O(n + m) many write events and is SC-consistent if and only if ϕ is satisfiable. By invoking the reduction, a 2^{o(k)}-time algorithm for SC-Consistency would yield a 2^{o(n+m)}-time algorithm for 3-SAT, contradicting the ETH.

▶ Theorem 9. SC-Consistency cannot be solved in time 2^{o(k)} unless ETH fails.
It is left to construct the reduction. Let ϕ be a 3-SAT instance over the variables X = {x_1, …, x_n} and with clauses C_1, …, C_m. Moreover, let L denote the set of literals. We construct a history h_ϕ the number of writes of which depends linearly on n + m.

The main idea of the reduction is to mimic an evaluation of ϕ by an interleaving of the events in h_ϕ. To this end, we divide evaluating ϕ into three steps: (1) choose an evaluation of the variables, (2) evaluate the literals accordingly, and (3) check whether the clauses are satisfied. For each of these steps we have separate threads taking care of the task. Scheduling them in different orders will yield different evaluations. An overview is given in Figure 1.

Figure 1 presents h_ϕ as a collection of threads. The program order is obtained from reading threads top to bottom. The reads-from relation is given since each value is written at most once to a variable. Hence, there is always a unique write event providing the read value.

We elaborate on the details of the reduction. For realizing Step (1), we construct two threads, T_0(x) and T_1(x), for each variable x ∈ X. These mimic an evaluation of the variable and consist of only one write event. Thread T_0(x) writes 0 to x, thread T_1(x) writes 1. If T_0(x) gets scheduled before T_1(x), variable x is evaluated to 1 and to 0 otherwise. Hence, the thread that is scheduled later will determine the actual evaluation of x.

[Figure 1 shows the threads: T_0(x): wr(x, 0); T_1(x): wr(x, 1); T_0(ℓ): rd(x, 0), wr(ℓ, c), rd(x, 0); T_1(ℓ): rd(x, 1), wr(ℓ, d), rd(x, 1); and clause threads T_1(C), T_2(C), T_3(C), each consisting of two read events on literals of C.]

Figure 1
Parts of the history h_ϕ for a variable x ∈ X, a literal ℓ ∈ L, and a clause C = ℓ_1 ∨ ℓ_2 ∨ ℓ_3. The values of c and d depend on ℓ. If ℓ = x, then c = 0, d = 1. Otherwise, c = 1, d = 0.

In Step (2), we propagate the evaluation of the variables to the literals. To this end, we construct two threads for each literal ℓ ∈ L. Let ℓ = x / ¬x be a literal on variable x ∈ X. The first thread T_0(ℓ) is responsible for evaluating ℓ when x is evaluated to 0. It first performs a read event rd(x, 0), followed by wr(ℓ, c) and rd(x, 0). The value c depends on the literal: if ℓ = x, then c = 0. Otherwise c = 1. Note that the read events guard the write event. This ensures that T_0(ℓ) can only run if x is already evaluated to 0 and, once T_0(ℓ) is running, the evaluation of x cannot change until the thread finishes. Thread T_1(ℓ) behaves similarly. It evaluates the literal ℓ when x is evaluated to 1. Both threads cannot interfere. Like for the variables, the later scheduled thread determines the actual evaluation of the literal.

It is left to evaluate the clauses. For a clause C = ℓ_1 ∨ ℓ_2 ∨ ℓ_3, we have threads T_1(C), T_2(C), and T_3(C) as shown in Figure 1. It is the task of these threads to ensure that at least one literal in C evaluates to 1. To see this, assume we have the contrary, an evaluation of the variables (and the literals) such that ℓ_1, ℓ_2, and ℓ_3 evaluate to 0. Due to the construction, ℓ_1 storing 0 implies that wr(ℓ_1, 1) preceded the write event wr(ℓ_1, 0). Before wr(ℓ_1, 0), the read event rd(ℓ_1, 1) in T_3(C) must have already been scheduled. In particular, it has to occur before rd(ℓ_1, 0) in T_1(C). Since ℓ_2 and ℓ_3 also store 0, we get a similar dependency among their reads: rd(ℓ_2, 1) occurs before rd(ℓ_2, 0) and rd(ℓ_3, 1) occurs before rd(ℓ_3, 0). Together with the program order of the clause threads, these dependencies form a cycle:

rd(ℓ_1, 1) → rd(ℓ_1, 0) → rd(ℓ_2, 1) → rd(ℓ_2, 0) → rd(ℓ_3, 1) → rd(ℓ_3, 0) → rd(ℓ_1, 1).

An arrow r → r′ means that r has to precede r′ in an interleaving of the events in h_ϕ. Since cycles cannot occur in an interleaving, the threads can only be scheduled properly when a satisfying assignment is given. The construction of a proper schedule is subtle. We provide details in Appendix C. The following lemma states the correctness of the construction.

▶ Lemma 10.
Formula ϕ is satisfiable if and only if the history h_ϕ is SC-consistent.

Clearly, h_ϕ can be constructed in polynomial time. We determine the number of write events. For each variable x ∈ X and each literal ℓ ∈ L, we introduce two write events. Hence, k = 2 · n + 2 · |L|. Since there are at most 3 · m many literals in ϕ, we get that k is bounded by 2 · n + 6 · m, a number linear in n + m. This finishes the proof of Theorem 9.

We obtain lower bounds for TSO and PSO by constructing a similar reduction from 3-SAT to TSO- and PSO-Consistency. To this end, we extend the above reduction by only adding read events that enforce sequential behavior. Intuitively, we can force the FIFO buffers of TSO and PSO to push each issued write to the memory immediately. Then, the above correctness argument still applies. The number of write events does not change and is still linear in n + m. This yields the following result. Details are given in Appendix C.

▶ Theorem 11. TSO- and PSO-Consistency cannot be solved in time 2^{o(k)} unless ETH fails.
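The write-count bound of the reduction is easy to sanity-check programmatically. Below is a small sketch (the literal encoding is hypothetical, chosen for illustration) that counts the write events of h_ϕ, two per variable and two per distinct literal, and checks k ≤ 2·n + 6·m:

```python
def count_writes(n_vars, clauses):
    """Number of write events k in the reduction's history h_phi.

    `clauses` is a list of 3-literal clauses; a literal is a (variable, polarity)
    pair (an encoding we chose for illustration). Step (1) contributes two
    writes per variable, step (2) two writes per distinct literal; the clause
    threads of step (3) only read."""
    literals = {lit for clause in clauses for lit in clause}
    return 2 * n_vars + 2 * len(literals)

# k is linear in n + m: there are at most 3m distinct literals, so k <= 2n + 6m.
n = 3
clauses = [[(0, True), (1, False), (2, True)], [(0, False), (1, True), (2, False)]]
assert count_writes(n, clauses) <= 2 * n + 6 * len(clauses)
```
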
References

[1] https://github.com/jepsen-io/jepsen/blob/master/galera/src/jepsen/galera/dirty_reads.clj.
[2] The SPARC Architecture Manual - Version 8 and Version 9. 1992, 1994.
[3] A. Heddaya and H. Sinha. Coherence, non-coherence and local consistency in distributed shared memory for parallel computing. Technical Report BU-CS-92-004, Boston University, 1992.
[4] P. A. Abdulla, F. Haziza, L. Holík, B. Jonsson, and A. Rezine. An integrated specification and verification technique for highly concurrent data structures. In TACAS, volume 7795 of Lecture Notes in Computer Science, pages 324–338. Springer, 2013.
[5] M. Ahamad, R. A. Bazzi, R. John, P. Kohli, and G. Neiger. The power of processor consistency. Pages 251–260. ACM, 1993.
[6] J. Alglave. A formal hierarchy of weak memory models. Formal Methods Syst. Des., 41(2):178–210, 2012.
[7] J. Alglave, L. Maranget, and M. Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1–7:74, 2014.
[8] R. Alur, K. L. McMillan, and D. A. Peled. Model-checking of correctness conditions for concurrent objects. Inf. Comput., 160(1-2):167–188, 2000.
[9] M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. On the verification problem for weak memory models. In POPL, pages 7–18. ACM, 2010.
[10] M. F. Atig, A. Bouajjani, S. Burckhardt, and M. Musuvathi. What's decidable about weak memory models? In ESOP, volume 7211 of Lecture Notes in Computer Science, pages 26–46. Springer, 2012.
[11] R. Biswas and C. Enea. On the complexity of checking transactional consistency. Proc. ACM Program. Lang., 3(OOPSLA):165:1–165:28, 2019.
[12] A. Bouajjani, C. Enea, R. Guerraoui, and J. Hamza. On verifying causal consistency. In POPL, pages 626–638. ACM, 2017.
[13] A. Bouajjani, C. Enea, and J. Hamza. Verifying eventual consistency of optimistic replication systems. In POPL, pages 285–296. ACM, 2014.
[14] A. Bouajjani, R. Meyer, and E. Möhlmann. Deciding robustness against total store ordering. In ICALP, volume 6756 of Lecture Notes in Computer Science, pages 428–440. Springer, 2011.
[15] G. Calin, E. Derevenetc, R. Majumdar, and R. Meyer. A theory of partitioned global address spaces. In FSTTCS, volume 24 of LIPIcs, pages 127–139. Schloss Dagstuhl, 2013.
[16] J. F. Cantin, M. H. Lipasti, and J. E. Smith. The complexity of verifying memory coherence and consistency. IEEE Transactions on Parallel and Distributed Systems, 16(7):663–671, 2005.
[17] J. Chen, B. Chor, M. Fellows, X. Huang, D. W. Juedes, I. A. Kanj, and G. Xia. Tight lower bounds for certain parameterized NP-hard problems. Inf. Comput., 201(2):216–231, 2005.
[18] P. Chini, J. Kolberg, A. Krebs, R. Meyer, and P. Saivasan. On the complexity of bounded context switching. In ESA, volume 87 of LIPIcs, pages 27:1–27:15. Schloss Dagstuhl, 2017.
[19] P. Chini, R. Meyer, and P. Saivasan. Fine-grained complexity of safety verification. In TACAS, volume 10806 of Lecture Notes in Computer Science, pages 20–37. Springer, 2018.
[20] P. Chini, R. Meyer, and P. Saivasan. Complexity of liveness in parameterized systems. In FSTTCS, volume 150 of LIPIcs, pages 37:1–37:15. Schloss Dagstuhl, 2019.
[21] M. Cygan, H. Dell, D. Lokshtanov, D. Marx, J. Nederlof, Y. Okamoto, R. Paturi, S. Saurabh, and M. Wahlström. On problems as hard as CNF-SAT. ACM Trans. Algorithms, 12(3):41:1–41:24, 2016.
[22] M. Cygan, F. V. Fomin, Ł. Kowalik, D. Lokshtanov, D. Marx, M. Pilipczuk, M. Pilipczuk, and S. Saurabh. Parameterized Algorithms. Springer, 2015.
[23] E. Derevenetc and R. Meyer. Robustness against Power is PSPACE-complete. In ICALP, volume 8573 of Lecture Notes in Computer Science, pages 158–170. Springer, 2014.
[24] R. G. Downey and M. R. Fellows. Fundamentals of Parameterized Complexity. Springer, 2013.
[25] M. Emmi and C. Enea. Monitoring weak consistency. In CAV, volume 10981 of Lecture Notes in Computer Science, pages 487–506. Springer, 2018.
[26] M. Emmi and C. Enea. Sound, complete, and tractable linearizability monitoring for concurrent collections. Proc. ACM Program. Lang., 2(POPL):25:1–25:27, 2018.
[27] M. Emmi, C. Enea, and J. Hamza. Monitoring refinement via symbolic reasoning. In PLDI, pages 260–269. ACM, 2015.
[28] C. Enea and A. Farzan. On atomicity in presence of non-atomic writes. In TACAS, volume 9636 of Lecture Notes in Computer Science, pages 497–514. Springer, 2016.
[29] A. Farzan and P. Madhusudan. The complexity of predicting atomicity violations. In TACAS, volume 5505 of Lecture Notes in Computer Science, pages 155–169. Springer, 2009.
[30] F. V. Fomin and D. Kratsch. Exact Exponential Algorithms. Texts in Theoretical Computer Science. Springer, 2010.
[31] F. Furbach, R. Meyer, K. Schneider, and M. Senftleben. Memory-model-aware testing: A unified complexity analysis. ACM Trans. Embedded Comput. Syst., 14(4):63:1–63:25, 2015.
[32] P. B. Gibbons and E. Korach. Testing shared memories. SIAM J. Comput., 26(4):1208–1244, 1997.
[33] J. R. Goodman. Cache consistency and sequential consistency. Technical Report 1006, University of Wisconsin-Madison, 1991.
[34] M. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.
[35] R. Impagliazzo and R. Paturi. On the complexity of k-SAT. JCSS, 62(2):367–375, 2001.
[36] A. B. Kahn. Topological sorting of large networks. Commun. ACM, 5(11):558–562, 1962.
[37] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, 1978.
[38] L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Computers, 28(9):690–691, 1979.
[39] D. Lokshtanov, D. Marx, and S. Saurabh. Slightly superexponential parameterized problems. In SODA, pages 760–776. SIAM, 2011.
[40] S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave, S. Owens, R. Alur, M. M. K. Martin, P. Sewell, and D. Williams. An axiomatic memory model for POWER multiprocessors. In CAV, volume 7358 of Lecture Notes in Computer Science, pages 495–512. Springer, 2012.
[41] J. Manson, W. Pugh, and S. V. Adve. The Java memory model. In POPL, pages 378–391. ACM, 2005.
[42] U. Mathur, A. Pavlogiannis, and M. Viswanathan. The complexity of dynamic data race prediction. In LICS, pages 713–727. ACM, 2020.
[43] C. H. Papadimitriou. The serializability of concurrent database updates. J. ACM, 26(4):631–653, 1979.
[44] R. J. Lipton and J. S. Sandberg. PRAM: A scalable shared memory. Technical Report CS-TR-180-88, Princeton University, 1988.
[45] S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams. Understanding POWER multiprocessors. In PLDI, pages 175–186. ACM, 2011.
[46] R. C. Steinke and G. J. Nutt. A unified theory of shared memory consistency. J. ACM, 51(5):800–849, 2004.
[47] D. B. Terry, M. Theimer, K. Petersen, A. J. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. In SOSP, pages 172–183. ACM, 1995.
[48] H. Wei, Y. Huang, J. Cao, X. Ma, and J. Lu. Verifying PRAM consistency over read/write traces of data replicas. CoRR, abs/1302.5161, 2013.
[49] P. Wolper. Expressing interesting properties of programs in propositional temporal logic. In POPL, pages 184–193. ACM, 1986.
[50] R. Zennou, A. Bouajjani, C. Enea, and M. Erradi. Gradual consistency checking. In CAV, volume 11562 of Lecture Notes in Computer Science, pages 267–285. Springer, 2019.
A Proofs of Section 3
Proof of Lemma 5.
Let V ⊆ WR be non-empty. We have to prove two directions. To this end, first assume T[V] = 1. We show that there is an element v ∈ V such that G_loc[V\{v}, v] and G_mm[V\{v}, v] are both acyclic and T[V\{v}] = 1.

Since T[V] = 1, there is a snapshot order tw[V] = t[V] ∪ r[V] with a total order t[V] on V. Moreover, the snapshot order satisfies that the graphs

G_loc(tw[V]) = (O, po-loc ∪ rf ∪ tw[V] ∪ cf[V]),
G_mm(tw[V]) = (O, po-mm ∪ rf-mm ∪ tw[V] ∪ cf[V])

are both acyclic.

We extract the suitable write event. Since t[V] is total on V, there is a unique minimal element v according to the order. We set V′ = V\{v} and show the following three facts: (1) G_loc[V′, v] is a subgraph of G_loc(tw[V]), (2) G_mm[V′, v] is a subgraph of G_mm(tw[V]), and (3) T[V′] = 1. With these facts at hand, we can conclude that v is the element we were looking for: since G_loc(tw[V]) and G_mm(tw[V]) are acyclic, any subgraphs of these are acyclic as well.

We begin by proving (1). To this end, we show that each edge of the coherence graph

G_loc[V′, v] = (O, po-loc ∪ rf ∪ r[V′, v] ∪ cf[V′, v])

is already present in G_loc(tw[V]). Since po-loc and rf are already present in G_loc(tw[V]), we need to show that the edges of r[V′, v] and cf[V′, v] are also there.

By definition, r[V′, v] = r[V] ∪ {(v, w) | w ∈ V′}. Since v was selected to be the minimal element of t[V] on V and t[V] is total, we get that each edge (v, w) with w ∈ V′ is also contained in t[V]. Hence, we can deduce the following:

r[V′, v] ⊆ r[V] ∪ t[V] = tw[V].

For the edges of cf[V′, v] we then obtain:

cf[V′, v] = rf⁻¹ ∘ ⋃_{x ∈ Var} r[V′, v]_x ⊆ rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V]_x = cf[V],

showing that G_loc[V′, v] is a subgraph of G_loc(tw[V]). The proof of (2) follows from (1).
We have to show that each edge of

G_mm[V′, v] = (O, po-mm ∪ rf-mm ∪ r[V′, v] ∪ cf[V′, v])

is contained in G_mm(tw[V]). The edges of po-mm and rf-mm are already present in G_mm(tw[V]). Since r[V′, v] ⊆ tw[V] and cf[V′, v] ⊆ cf[V] hold by (1), we get that G_mm[V′, v] is a subgraph of G_mm(tw[V]).

It is left to prove (3). To this end, we construct a snapshot order tw[V′] on V′ such that G_loc(tw[V′]) is a subgraph of G_loc(tw[V]) and G_mm(tw[V′]) is a subgraph of G_mm(tw[V]). This shows that T[V′] = 1 since the two latter graphs are acyclic.

We construct the snapshot order tw[V′] as follows. Set tw[V′] = t[V′] ∪ r[V′], where r[V′] = {(w, w′) | w ∈ WR\V′, w′ ∈ V′} and t[V′] = t[V] ∩ (V′ × V′) is the restriction of t[V] to the set V′. Note that t[V′] is total on V′. Hence, tw[V′] is a proper snapshot order.
By definition we get that t[V′] ⊆ t[V]. Now consider an edge (w, w′) from r[V′] with w ∈ WR\V′ and w′ ∈ V′. There are two cases: (1) For w = v, the edge (v, w′) is already contained in t[V] since v was chosen to be t[V]-minimal and t[V] is total on V. (2) For w ≠ v, we get that w ∈ WR\V. Hence, the edge (w, w′) is already contained in r[V]. Putting the cases together, we obtain the following inclusions:

tw[V′] = t[V′] ∪ r[V′] ⊆ t[V] ∪ r[V] = tw[V],
cf[V′] = rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V′]_x ⊆ rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V]_x = cf[V].

From these inclusions, we immediately obtain that G_loc(tw[V′]) is a subgraph of G_loc(tw[V]) and that G_mm(tw[V′]) is a subgraph of G_mm(tw[V]).

For the other direction, assume the existence of a write event v ∈ V such that the coherence graphs G_loc[V′, v] and G_mm[V′, v] are acyclic and T[V′] = 1. Here, V′ = V\{v}. In order to show that T[V] = 1, we need to construct a snapshot order tw[V] on V such that the graphs G_loc(tw[V]) and G_mm(tw[V]) are acyclic.

By the assumption T[V′] = 1, there is a snapshot order tw[V′] = t[V′] ∪ r[V′] such that

G_loc(tw[V′]) = (O, po-loc ∪ rf ∪ tw[V′] ∪ cf[V′]),
G_mm(tw[V′]) = (O, po-mm ∪ rf-mm ∪ tw[V′] ∪ cf[V′])

are both acyclic. We extend the order t[V′] by adding v as new minimal element. Define t[V] = t[V′] ∪ {(v, w) | w ∈ V′}. Then, t[V] is a total order on V. Thus, tw[V] = t[V] ∪ r[V] with relation r[V] = {(w, w′) | w ∈ WR\V, w′ ∈ V} is a snapshot order on V.

We show the acyclicity of G_loc(tw[V]) and G_mm(tw[V]) in two steps. First, we define intermediary graphs J_loc[V, v] and J_mm[V, v] and show that G_loc(tw[V]) and G_mm(tw[V]) are subgraphs. In the second step, we prove that J_loc[V, v] and J_mm[V, v] are acyclic.
In fact, we show that a cycle in one of the two graphs would induce a cycle in one of the coherence graphs G_loc[V′, v] and G_mm[V′, v] or in G_loc(tw[V′]) and G_mm(tw[V′]), which are acyclic by assumption. The acyclicity of G_loc(tw[V]) and G_mm(tw[V]) follows and we obtain T[V] = 1.

We begin with the first step. Before we define the intermediary graphs, we need two new relations accounting for the fact that v is the new minimal element of V. Define

inc(v) = {(w, v) | w ∈ WR\V} and cfinc(v) = rf⁻¹ ∘ ⋃_{x ∈ Var} inc(v)_x.

Note that a pair (r, v) is in cfinc(v) if r is a read event with a write event w ∈ WR\V such that (w, r) ∈ rf, (w, v) ∈ inc(v), and w and v write to the same variable x.

The graphs J_loc[V, v] and J_mm[V, v] are now defined as follows:

J_loc[V, v] = (O, po-loc ∪ rf ∪ tw[V′] ∪ inc(v) ∪ cf[V′] ∪ cfinc(v)),
J_mm[V, v] = (O, po-mm ∪ rf-mm ∪ tw[V′] ∪ inc(v) ∪ cf[V′] ∪ cfinc(v)).

We show that G_loc(tw[V]) is a subgraph of J_loc[V, v]. First note that the edges of po-loc and rf are already present in J_loc[V, v]. It is left to argue that tw[V] and cf[V] are included in the edges of J_loc[V, v] as well. To this end, consider the following inclusion:

t[V] = t[V′] ∪ {(v, w) | w ∈ V′} ⊆ t[V′] ∪ r[V′] = tw[V′].

The first equality is the definition of t[V]. The inclusion holds since v ∉ V′ and thus {(v, w) | w ∈ V′} ⊆ r[V′]. Relation r[V] is embedded as follows:

r[V] = {(w, w′) | w ∈ WR\V, w′ ∈ V} = {(w, w′) | w ∈ WR\V, w′ ∈ V′} ∪ {(w, v) | w ∈ WR\V} ⊆ r[V′] ∪ inc(v).

The latter inclusion holds due to the fact that WR\V ⊆ WR\V′. Combining the above inclusions then yields tw[V] ⊆ tw[V′] ∪ inc(v).
For the conflict relation, we consequently obtain

cf[V] = rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V]_x ⊆ rf⁻¹ ∘ ⋃_{x ∈ Var} (tw[V′] ∪ inc(v))_x = rf⁻¹ ∘ ⋃_{x ∈ Var} (tw[V′]_x ∪ inc(v)_x) = (rf⁻¹ ∘ ⋃_{x ∈ Var} tw[V′]_x) ∪ (rf⁻¹ ∘ ⋃_{x ∈ Var} inc(v)_x) = cf[V′] ∪ cfinc(v).

Hence, all edges of G_loc(tw[V]) are present in J_loc[V, v], which proves the subgraph relation. The fact that G_mm(tw[V]) is a subgraph of J_mm[V, v] follows easily from the above observations: since the relations po-mm and rf-mm are already present in J_mm[V, v], and tw[V] ⊆ tw[V′] ∪ inc(v) as well as cf[V] ⊆ cf[V′] ∪ cfinc(v) hold independently of the considered graphs, we obtain the desired subgraph relation.

In the second step, we show the acyclicity of J_loc[V, v] and J_mm[V, v]. We focus on J_loc[V, v] since the proof for J_mm[V, v] is similar. Assume there is a cycle C in J_loc[V, v]. If C contains neither an edge from inc(v) nor from cfinc(v), the cycle has only edges over po-loc ∪ rf ∪ tw[V′] ∪ cf[V′]. Hence, C is a cycle in G_loc(tw[V′]), which is a contradiction since the graph is acyclic. Therefore, C goes through at least one edge from inc(v) or cfinc(v). In both cases, this means that C passes through the write event v. We may think of C as a cycle that starts and ends in v: C is of the form C = e_1.e_2 … e_ℓ with edges e_i and e_1 = (v, w_1), e_ℓ = (w_ℓ, v) for events w_1, w_ℓ ∈ O. Moreover, we assume that C is short: the write event v is only visited once. Otherwise, we would get a shorter cycle.

Out of C, we show how to construct a cycle Ĉ in G_loc[V′, v] which contradicts the assumption that the coherence graphs are acyclic. To this end, we induct over the edges of C and construct Ĉ while keeping the invariant that all edges of Ĉ are in G_loc[V′, v]. Initially, Ĉ does not have any edges. The induction step is as follows.
Assume we have already constructed a part of Ĉ while iterating to the i-th edge e of C. We get the following case distinction, based upon the type of e:

If e is an edge in po-loc ∪ rf, then e is also present in G_loc[V′, v] and we can add it to Ĉ by setting Ĉ = Ĉ.e.

If e is an edge in tw[V′], we get two subcases: (1) If e is in t[V′], then e = (w, w′), where w, w′ ∈ V′ are write events. There is an edge (v, w′) ∈ r[V′, v] by definition. We delete the content of Ĉ and start a new cycle with this edge: Ĉ = (v, w′). It lies in G_loc[V′, v]. (2) If e is an edge in r[V′], then e = (w, w′) where w ∈ WR\V′ and w′ ∈ V′. The edge then also lies in r[V′, v] and thus in G_loc[V′, v]. We add it to Ĉ: Ĉ = Ĉ.e.

If e is an edge in inc(v), then e is of the form (w, v) with w ∈ WR\V. Thus, e lies in r[V′, v] and therefore in G_loc[V′, v]. We add the edge to Ĉ by Ĉ = Ĉ.e.

If e is an edge in cf[V′], then e = (r, w′), where r is a read event and w′ ∈ V′ is a write event. There is an edge (v, w′) ∈ r[V′, v] and thus in G_loc[V′, v]. We delete Ĉ and start a new cycle via Ĉ = (v, w′).

If e is an edge in cfinc(v), then e lies in cf[V′, v] since inc(v) ⊆ r[V′, v] and

cfinc(v) = rf⁻¹ ∘ ⋃_{x ∈ Var} inc(v)_x ⊆ rf⁻¹ ∘ ⋃_{x ∈ Var} r[V′, v]_x = cf[V′, v].

Hence, e can be added by Ĉ = Ĉ.e.

In the construction, the first edge of Ĉ always leaves the write event v; it is of the form (v, w) for some event w ∈ O. Moreover, the edges in inc(v) and cfinc(v) are the only edges in J_loc[V, v] that are incoming for v. Such an edge is always the last edge of C and never gets deleted during the construction of Ĉ. Note that we assumed the existence of such an edge. Hence, by construction, Ĉ is a non-empty cycle in G_loc[V′, v] that starts and ends in v. This contradicts the acyclicity of the coherence graph. Altogether, J_loc[V, v] is acyclic.
◀
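The recurrence established by Lemma 5 directly suggests the O*(2^k) procedure: a table over subsets of write events, filled by peeling off one write at a time. A compressed sketch (subsets as bitmasks; the coherence-graph acyclicity check of Lemma 6 is abstracted behind a callback, and all names are hypothetical):

```python
from functools import lru_cache

def consistency_dp(writes, graphs_acyclic):
    r"""Subset DP behind the O*(2^k) framework, following Lemma 5's recurrence:
    T[V] = 1 iff some v in V has acyclic coherence graphs G_loc[V\{v}, v] and
    G_mm[V\{v}, v] (abstracted here by the callback `graphs_acyclic(rest, v)`,
    cf. Lemma 6) and T[V\{v}] = 1. Subsets of `writes` are bitmasks."""
    full = (1 << len(writes)) - 1

    @lru_cache(maxsize=None)
    def table(mask):
        if mask == 0:
            return True  # the empty set of writes is trivially schedulable
        for i, v in enumerate(writes):
            if mask & (1 << i):
                rest = mask & ~(1 << i)
                if graphs_acyclic(rest, v) and table(rest):
                    return True
        return False

    return table(full)
```

Memoization keeps the table at 2^k entries; each entry tries up to k candidate writes, which matches the O*(2^k) shape of the bound.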
Proof of Lemma 6.
We provide a proof for G_loc[V, v] since the statement for G_mm[V, v] is shown similarly. First, we focus on the construction of the graph. Recall that

G_loc[V, v] = (O, po-loc ∪ rf ∪ r[V, v] ∪ cf[V, v]).

Constructing the vertices can clearly be done in time O(n), as n = |O|. The edges of po-loc ∪ rf are part of the input. Hence, we can iterate over these edges and add them to the graph. Since po-loc ∪ rf is a relation in O × O, this takes time at most O(n). Next, we construct the edges of the relation

r[V, v] = {(w, w′) | w ∈ WR\(V ∪ {v}), w′ ∈ V ∪ {v}} ∪ {(v, w) | w ∈ V}.

The latter part is simple to construct: we add an edge (v, w) for each w ∈ V. These are at most O(k) many edges, and adding them takes the same amount of time. For constructing the former relation, we iterate over w ∈ WR\(V ∪ {v}) and w′ ∈ V ∪ {v} and add the edge (w, w′). This takes at most O(k²) = O(k · n) time. Hence, the relation r[V, v] contains at most O(k · n) many edges and can be constructed in time O(k · n).

It is left to construct the conflict relation cf[V, v] = rf⁻¹ ∘ ⋃_{x ∈ Var} r[V, v]_x. To this end, we first construct the relation rf⁻¹ by turning around the edges stored in rf. Note that rf consists of at most O(n) many edges since it contains exactly one edge for each read event. Hence, rf⁻¹ can be constructed in time O(n). The relations r[V, v]_x can be constructed from r[V, v]. We iterate over the edges in r[V, v] and put an edge (w, w′) into the corresponding projection r[V, v]_x if both write events w, w′ write to variable x. This takes time at most O(k · n). The composition cf[V, v] = rf⁻¹ ∘ ⋃_{x ∈ Var} r[V, v]_x is then obtained as follows. We iterate over all edges (r, w_1) in rf⁻¹ and (w_2, ŵ) in one of the r[V, v]_x and add (r, ŵ) to cf[V, v] if w_1 = w_2.
Since rf⁻ contains at most O(n) many edges and the union of the r[V, v]_x contains at most O(k · n) many edges, constructing cf[V, v] takes time O(k · n). Hence, the graph G_loc[V, v] can be constructed in time O(k · n). It is left to show that cycles in G_loc[V, v] can be detected within the same amount of time. To this end, we apply Kahn's algorithm [36]. It finds a topological sorting for a given graph. Such a sorting only exists if the graph is acyclic. If this is not the case, the algorithm outputs an error. Kahn's algorithm runs in time linear in the vertices and edges. In our setting, it needs at most O(n + k · n) time since the graph has n vertices and at most O(k · n) many edges. This stays within the bound of O(k · n) and therefore finishes the proof of the lemma. ◀

B Proofs of Section 4
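Kahn's algorithm, used above for cycle detection, is short to implement. A minimal sketch (illustrative code, not taken from the paper):

```python
from collections import deque

def is_acyclic(vertices, edges):
    """Kahn's algorithm: repeatedly remove vertices of in-degree 0.
    The graph is acyclic iff every vertex eventually gets removed."""
    indeg = {u: 0 for u in vertices}
    succ = {u: [] for u in vertices}
    for (a, b) in edges:
        succ[a].append(b)
        indeg[b] += 1
    queue = deque(u for u in vertices if indeg[u] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == len(vertices)
```

Each vertex and each edge is processed a constant number of times, which gives the linear O(|V| + |E|) running time invoked in the proof.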
Proof of Lemma 8.
First, we consider the structure of the acyclic graph G^ww_loc. In fact, G^ww_loc decomposes into a disjoint union of its projections to the variables. We show that

G^ww_loc = ⋃_{x ∈ Var} G^ww_loc(x), where G^ww_loc(x) = (O(x), po-loc_x ∪ rf_x ∪ ww_x ∪ fr_x)

is the projection to variable x and the union is taken over vertices and edges.

It is clear that each projection G^ww_loc(x) is contained in G^ww_loc as O(x) ⊆ O and each projected relation is a subset of the original relation. For the other inclusion, first note that the set of vertices O is contained in the union since we can write O = ⋃_{x ∈ Var} O(x). Phrased differently, each event in O refers to exactly one location. It is left to show that all edges of G^ww_loc are contained in the union. By definition, each of the relations po-loc, rf, ww, and fr only relates events to the same location. Hence, an edge (w, w′) from one of the relations is an edge among events on a variable x and therefore contained in the graph G^ww_loc(x). The union is disjoint since there is no edge in G^ww_loc that involves events on different variables.

The order tw contains the store order ww, and for each x ∈ Var, the order ww_x is total on WR(x). This implies that tw_x = ww_x. Hence, tw differs from ww by additional edges among write events on different variables. Formally, we can write tw as a disjoint union: tw = ww ∪ ext, where ext = { (w, w′) ∈ tw | var(w) ≠ var(w′) }. This means that the graph G^tw_loc of interest has a structure similar to G^ww_loc but with edges connecting the projections:

G^tw_loc = ext ∪ ⋃_{x ∈ Var} G^ww_loc(x).

In the union, we interpret the relation ext as a graph with vertices O and edges ext.

Now assume that there is a cycle C in the graph G^tw_loc. Then, C takes the following form:

C = o₁ →^{π₁} o₁′ →^{ext} o₂ →^{π₂} o₂′ →^{ext} … →^{ext} o_ℓ →^{π_ℓ} o_ℓ′,

where (1) o_i, o_i′ are write events in a graph G^ww_loc(x_i) for a variable x_i, (2) o_ℓ′ = o₁, (3) each π_i is a path within G^ww_loc(x_i), and (4) each edge o_i′ →^{ext} o_{i+1} is an edge in ext. Since o_i and o_i′ are write events on the same variable x_i and ww_{x_i} is a total order on these events, there is a relation between the two writes: either (o_i, o_i′) ∈ ww_{x_i} or (o_i′, o_i) ∈ ww_{x_i}. In the latter case, we would immediately get a cycle o_i′ →^{ww_{x_i}} o_i →^{π_i} o_i′ in the graph G^ww_loc(x_i). But as a subgraph of G^ww_loc, the graph is acyclic and the cycle cannot appear. Hence, we get that (o_i, o_i′) ∈ ww_{x_i} for each i. Since ww_{x_i} is contained in tw, we have an edge (o_i, o_i′) ∈ tw for each i. Hence, we can shorten the cycle C to a cycle C_tw of the form:

C_tw = o₁ →^{tw} o₁′ →^{tw} o₂ →^{tw} o₂′ →^{tw} … →^{tw} o_ℓ →^{tw} o_ℓ′.

Note that we used the fact ext ⊆ tw. The cycle C_tw contradicts the fact that tw is a strict total order of WR. Hence, cycle C cannot exist and G^tw_loc is acyclic. ◀
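The disjointness argument above—every relation of G^ww_loc relates only events on the same variable—can be checked mechanically. A small illustrative sketch (function and map names are hypothetical):

```python
def decompose_by_variable(edges, var_of):
    """Split a graph's edge set into per-variable projections.
    Returns None if some edge crosses variables, i.e. the union
    would not be disjoint."""
    parts = {}
    for (a, b) in edges:
        if var_of[a] != var_of[b]:
            return None  # cross-variable edge: no disjoint decomposition
        parts.setdefault(var_of[a], set()).add((a, b))
    return parts
```

For G^ww_loc the function succeeds, while adding any ext-edge (which by definition crosses variables) makes it fail—mirroring the split tw = ww ∪ ext used in the proof.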
C Proofs of Section 5
Proof of Lemma 10.
Let ϕ be satisfiable. We show that h_ϕ is SC-consistent. Since the formula is satisfiable, there is an evaluation function v : X → {0, 1} that evaluates ϕ to 1. In order to prove that h_ϕ is SC-consistent, we need to construct a total order tw on the write events of h_ϕ such that G_sc is acyclic. Note that the acyclicity of G_loc is implied. In fact, we construct a topological sorting of the vertices of G_sc which implies acyclicity.

For the construction of tw, we first extract an ordering on the literals of each clause. For any clause C_i = ℓ_i1 ∨ ℓ_i2 ∨ ℓ_i3, let L(C_i, 1) = { ℓ_ij ∈ C_i | v(ℓ_ij) = 1 } be the set of literals in C_i that evaluate to 1. Since v is satisfying, we get that for each i ∈ {1, …, m}, the set of literals L(C_i, 1) is non-empty. Similarly, we define L(C_i, 0) = { ℓ_ij ∈ C_i | v(ℓ_ij) = 0 }.

We begin by constructing a partial order among the literals of the clauses. To this end, let Nxt(j) = (j mod 3) + 1 for j = 1, 2, 3. We first let all literals of a clause that evaluate to 0 be smaller than the literals evaluating to 1. Formally, we set L(C_i, 0) < L(C_i, 1) for each i ∈ {1, …, m}. If we find that |L(C_i, 0)| > 1, there are two literals evaluating to 0 (note that it cannot be three). In this case, let L(C_i, 0) = { ℓ_ij, ℓ_ij′ } where j′ = Nxt(j). Note that the literals in L(C_i, 0) always have this form. We set ℓ_ij < ℓ_ij′. The reason why we construct the order like this is that in a topological sorting (linear order) of the events of h_ϕ, the corresponding read events of the literals will respect this order.

The total order.
We construct tw. To this end, we consider the following sets of threads:

First(X) = { T₁(x) | v(x) = 1 } ∪ { T₂(x) | v(x) = 0 },
Sec(X) = { T₁(x) | v(x) = 0 } ∪ { T₂(x) | v(x) = 1 }.

The set First(X) contains those threads that will write the complement evaluation of v into the variables. These threads have to run first in an interleaving/topological sorting. The variables then get overwritten by the threads of Sec(X). These write the correct evaluation v to the variables.

We need further notation to construct tw. Let ℓ be a literal over x ∈ X. We set

T_neg(ℓ) = T₁(ℓ), if ℓ = x, and T_neg(ℓ) = T₂(ℓ), otherwise.

This is the thread that stores 0 in ℓ. Similarly, we may define a notation for the thread of h_ϕ that stores 1 in ℓ. It is given by:

T_pos(ℓ) = T₁(ℓ), if ℓ = ¬x, and T_pos(ℓ) = T₂(ℓ), otherwise.

We go on with the definition of further sets. Let C_i be a clause. First(C_i) is the set of threads that write 0 to the literals that are evaluated to 1 under v. It also contains the threads writing 1 to literals that are evaluated to 0:

First(C_i) = { T_neg(ℓ) | ℓ ∈ C_i, v(ℓ) = 1 } ∪ { T_pos(ℓ) | ℓ ∈ C_i, v(ℓ) = 0 }.

The idea is similar as above. In an interleaving of all events, the threads of First(C_i) run first. The variables then get overwritten by the threads of Sec(C_i). These forward the evaluation v of the variables to the literals:

Sec(C_i) = { T_pos(ℓ) | ℓ ∈ C_i, v(ℓ) = 1 } ∪ { T_neg(ℓ) | ℓ ∈ C_i, v(ℓ) = 0 }.

The total order tw consists of several parts obtained from ordering the above sets. Let Lin_WR(First(X)) be some total order on the write events of the threads occurring in First(X), based on an assumed order on the variables. Similarly, let Lin_WR(Sec(X)) be a total order on the write events of the threads in Sec(X). Also the second order respects the assumed order on the variables.
Further, let Lin_WR(First(C_i)) be a total order on the write events of First(C_i) that respects the above order on literals (where we see literals as variables of h_ϕ). Similarly, let Lin_WR(Sec(C_i)) be a total order on write events from the threads of Sec(C_i).

To finally define tw, we use a suitable append operator. Let t and r be two total orders. Then t.r is the total order obtained from ordering the elements of t to be smaller than the elements of r while preserving the orders t and r, meaning t, r ⊆ t.r. The total order tw on all write events of h_ϕ is then given by combining the total orders defined above as follows. Define tw = tw_first . tw_sec, where

tw_first = Lin_WR(First(X)) . Lin_WR(First(C₁)) . … . Lin_WR(First(C_m)),
tw_sec = Lin_WR(Sec(X)) . Lin_WR(Sec(C₁)) . … . Lin_WR(Sec(C_m)).

Interleaving the events.
We construct an interleaving of all events following the total order tw. First, we store the complement evaluation v̄ of v. This is achieved by scheduling First(X) in the beginning. We run these threads in an assumed order on the variables. Then, we forward the evaluation v̄ to the literals. Since the literal threads are guarded by read events, there is a read dependency, meaning that the threads in First(X) provide the read values for the threads in First(C_i) for each i, see Figure 1. For providing the values, we run First(X) first, followed by First(C_i) for each i. Similarly, running Sec(X) will store the evaluation v in the variables and the threads in Sec(C_i) will push it to the literals.

The total order tw provides the write events in such a way that the read events of the clause threads T₁(C), T₂(C), and T₃(C) can be scheduled properly. Let C = ℓ₁ ∨ ℓ₂ ∨ ℓ₃ be a clause. We distinguish the following cases:

(1) If C is satisfied by all three literals, we have v(ℓ_i) = 1 for i = 1, 2, 3. Then, under v̄, we have v̄(ℓ_i) = 0 for i = 1, 2, 3. Since the write events in First(C) evaluate the literals under v̄, we can schedule the read events rd(ℓ₁, 0), rd(ℓ₂, 0), rd(ℓ₃, 0) since the values are provided. Technically, we schedule these events after tw_first and before tw_sec. The remaining reads rd(ℓ₁, 1), rd(ℓ₂, 1), rd(ℓ₃, 1) can then be scheduled after tw_sec.

(2) C is satisfied by two literals. Without loss of generality, we assume v(ℓ₁) = 1, v(ℓ₂) = 1, and v(ℓ₃) = 0. Then v̄(ℓ₁) = 0, v̄(ℓ₂) = 0, and v̄(ℓ₃) = 1. To schedule all the reads of the clause threads properly, we do the following: we schedule the reads in the order rd(ℓ₁, 0) . rd(ℓ₂, 0) . rd(ℓ₃, 1) after tw_first. After tw_sec, where the literals are evaluated according to v, we can then schedule rd(ℓ₂, 1) . rd(ℓ₃, 0) . rd(ℓ₁, 1).

(3) C is satisfied by one literal. Let us assume v(ℓ₁) = 1, v(ℓ₂) = 0, and v(ℓ₃) = 0. Then, we get v̄(ℓ₁) = 0, v̄(ℓ₂) = 1, and v̄(ℓ₃) = 1. In this case, we schedule the reads rd(ℓ₁, 0) . rd(ℓ₂, 1) immediately after tw_first. We cannot schedule rd(ℓ₃, 1) since it is blocked by the read rd(ℓ₂, 0) the value of which we have not provided yet. For providing it, consider T_neg(ℓ₂). The thread occurs in Sec(C) and is scheduled within tw_sec. Immediately after it performed its write wr(ℓ₂, 0), we schedule the read rd(ℓ₂, 0) which was blocking. Since we did not change the content of variable ℓ₃ yet, we can schedule rd(ℓ₃, 1). In Lin_WR(Sec(C)), the thread T_neg(ℓ₂) precedes T_neg(ℓ₃). After the described schedule, T₁(C) and T₂(C) are completely executed. After tw_sec, the ℓ_i store the evaluation under v. We can then schedule the remaining reads rd(ℓ₃, 0) . rd(ℓ₁, 1).
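The bookkeeping in the construction above—evaluating literals under an assignment and computing the sets L(C, 1), L(C, 0) together with Nxt—is easy to mechanize. An illustrative sketch (the clause encoding as pairs (variable, polarity) is an assumption for this snippet, not the paper's notation):

```python
def nxt(j):
    """Nxt(j) = (j mod 3) + 1 for j in {1, 2, 3}."""
    return (j % 3) + 1

def literal_sets(clause, v):
    """Positions j of literals evaluating to 1 resp. 0, i.e. the index
    sets behind L(C, 1) and L(C, 0). A literal is a pair (x, positive);
    v maps variables to {0, 1}."""
    def val(lit):
        x, positive = lit
        return v[x] if positive else 1 - v[x]
    L1 = {j for j, lit in enumerate(clause, start=1) if val(lit) == 1}
    L0 = {j for j, lit in enumerate(clause, start=1) if val(lit) == 0}
    return L1, L0
```

For a satisfying assignment, L1 is non-empty for every clause, which is exactly the property the case distinction (1)–(3) branches on.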
By constructing a schedule following these rules for each clause, we obtain an interleaving/total order on all events. This implies that G_sc is acyclic.

For the other direction of the proof, assume that h_ϕ is SC-consistent. We show that ϕ is satisfiable. By definition, we obtain a total order tw on the write events of h_ϕ such that G_sc is acyclic. We construct the evaluation v : X → {0, 1} along the total order:

v(x) = 1 if and only if wr(x, 0) →^{tw} wr(x, 1).

Hence, variable x admits the value that is written latest in tw. We show that the evaluation can be consistently extended to the literals. For each literal ℓ we have:

v(ℓ) = 1 if and only if wr(ℓ, 0) →^{tw} wr(ℓ, 1).

To prove this, let ℓ be a literal evaluating to 1 under v. Towards a contradiction, suppose that wr(ℓ, 1) →^{tw} wr(ℓ, 0). Assume ℓ = x. The proof for ℓ = ¬x is similar. Since v(x) = 1 as well, we get wr(x, 0) →^{tw} wr(x, 1). This yields the following cycle in G_sc:

wr(ℓ, 1) →^{tw} wr(ℓ, 0) →^{po} rd(x, 0) →^{cf} wr(x, 1) →^{rf} rd(x, 1) →^{po} wr(ℓ, 1).

Hence, we get that wr(ℓ, 0) →^{tw} wr(ℓ, 1). If ℓ is a literal evaluating to 0 under v, we can obtain a similar proof. This shows the above equivalence.

Now we show that for each clause, there is at least one literal that evaluates to 1 under v. Assume the contrary. Then there is a clause C = ℓ₁ ∨ ℓ₂ ∨ ℓ₃ such that v(ℓ_i) = 0 for i = 1, 2, 3. By the above equivalence, we get wr(ℓ_i, 1) →^{tw} wr(ℓ_i, 0) for each i = 1, 2, 3. From this, we can obtain the following cycle in G_sc:

rd(ℓ₁, 0) →^{po} rd(ℓ₂, 1) →^{cf} wr(ℓ₂, 0) →^{rf} rd(ℓ₂, 0) →^{po} rd(ℓ₃, 1) →^{cf} wr(ℓ₃, 0) →^{rf} rd(ℓ₃, 0) →^{po} rd(ℓ₁, 1) →^{cf} wr(ℓ₁, 0) →^{rf} rd(ℓ₁, 0).

Hence, the clauses are satisfied under v. This completes the proof. ◀

T₁(ℓ): rd(x, 0); wr(ℓ, c)
T₂(ℓ): rd(x, 1); wr(ℓ, d)
T₁′(ℓ): rd(ℓ, c); rd(x, 0)
T₂′(ℓ): rd(ℓ, d); rd(x, 1)

Figure 2 Parts of the history h_ϕ′ for a literal ℓ ∈ L. The values of c and d depend on ℓ: if ℓ = x, then c = 0, d = 1. Otherwise, c = 1, d = 0.
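The reverse direction of the proof of Lemma 10 reads the evaluation off the total order tw: v(x) = 1 iff wr(x, 0) precedes wr(x, 1). A minimal sketch (the encoding of tw as a list of (variable, value) writes is an assumption for illustration):

```python
def evaluation_from_tw(tw, variables):
    """v(x) = 1 iff wr(x, 0) precedes wr(x, 1) in tw, i.e. x holds
    the value written latest. tw is a list of (variable, value) pairs
    containing both writes for every variable."""
    v = {}
    for x in variables:
        pos0 = tw.index((x, 0))
        pos1 = tw.index((x, 1))
        v[x] = 1 if pos0 < pos1 else 0
    return v
```

The proof then argues that this v, extended to the literals via the same last-writer rule, must satisfy every clause—otherwise G_sc would contain the cycle displayed above.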
Recall that for the memory models TSO and PSO, the preserved program orders are po-tso = po \ (WR × RD) and po-pso = po \ (WR × RD ∪ WR × WR), respectively. While there are no po-relations of the form WR × WR in our lower bound construction h_ϕ for the case of SC, it has po-relations of the form WR × RD. Recall that the threads T₁(ℓ) and T₂(ℓ) were guarded by reads. The program order edges connecting the write event in the threads with the latter read events will vanish under TSO and PSO. Hence, for a total order tw on the write events of h_ϕ, a cycle in G_sc does not necessarily imply a cycle in G_tso or G_pso.

We overcome this issue by replacing the latter guard in T_i(ℓ) by a separate thread T_i′(ℓ). The thread reads the value written to ℓ in T_i(ℓ), followed by the guarding read. The construction is shown in Figure 2. We denote the obtained history by h_ϕ′. The advantage of h_ϕ′ is that for any total order tw, the graphs G_sc, G_tso, and G_pso are all equal. This is due to the construction: there are no po-relations of the form WR × RD or WR × WR which could be relaxed by TSO or PSO. Intuitively, we enforce sequential behavior with the new threads.

It is left to prove that ϕ is satisfiable if and only if h_ϕ′ is SC-consistent (and thus TSO/PSO-consistent). To this end, note that h_ϕ′ is SC-consistent if and only if h_ϕ is. Indeed, the program order edge wr(ℓ, c) →^{po} rd(x, 0) in h_ϕ is replaced by wr(ℓ, c) →^{rf} rd(ℓ, c) →^{po} rd(x, 0) in h_ϕ′. A similar replacement is done for wr(ℓ, d) →^{po} rd(x, 1). Hence, there is a path wr(ℓ, c) →* rd(x, 0) in h_ϕ if and only if there is a path wr(ℓ, c) →* rd(x, 0) in h_ϕ′. Due to this, acyclicity (and hence SC-consistency) is preserved across the histories. ◀
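The preserved program orders po-tso and po-pso are set differences that are simple to compute. An illustrative sketch (event names and the `kind_of` map are hypothetical):

```python
def preserved_po(po, kind_of, model):
    """po-tso = po \ (WR × RD); po-pso = po \ (WR × RD ∪ WR × WR).
    kind_of maps an event to 'wr' or 'rd'; SC keeps all of po."""
    def relaxed(a, b):
        if kind_of[a] != "wr":
            return False
        if model == "tso":
            return kind_of[b] == "rd"   # drop write-to-read edges
        if model == "pso":
            return kind_of[b] in ("rd", "wr")  # also drop write-to-write
        return False
    return {(a, b) for (a, b) in po if not relaxed(a, b)}
```

On the history h_ϕ′, no po-edge starts in a write, so the three filters agree—this is exactly why G_sc, G_tso, and G_pso coincide there.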