Reconciling Event Structures with Modern Multiprocessors
Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis
Evgenii Moiseenko
St. Petersburg University, Russia and JetBrains Research, [email protected]
Anton Podkopaev
National Research University Higher School of Economics, Russia and MPI-SWS, Germany and JetBrains Research, [email protected]
Ori Lahav
Tel Aviv University, [email protected]
Orestis Melkonian
University of Edinburgh, [email protected]
Viktor Vafeiadis
MPI-SWS, [email protected]
Abstract
Weakestmo is a recently proposed memory consistency model that uses event structures to resolve the infamous "out-of-thin-air" problem and to enable efficient compilation to hardware. Nevertheless, this latter property (compilation correctness) has not yet been formally established.

This paper closes this gap by establishing correctness of the intended compilation schemes from Weakestmo to a wide range of formal hardware memory models (x86, POWER, ARMv7, ARMv8) in the Coq proof assistant. Our proof is the first that establishes correctness of compilation of an event-structure-based model that forbids "out-of-thin-air" behaviors, as well as the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of hardware platforms. Our compilation proof goes via the recent Intermediate Memory Model (IMM), which we suitably extend with sequentially consistent accesses.
Theory of computation → Logic and verification; Software and its engineering → Concurrent programming languages
Keywords and phrases
Weak Memory Consistency, Event Structures, IMM, Weakestmo.
© Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis; licensed under Creative Commons License CC-BY. 34th European Conference on Object-Oriented Programming (ECOOP 2020). Editors: Robert Hirschfeld and Tobias Pape; Article No. 5; pp. 5:1–5:34. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

Introduction

A major research problem in concurrency semantics is to develop a weak memory model that allows load-to-store reordering (a.k.a. load buffering, LB) and compiler optimizations (e.g., elimination of fake dependencies), while forbidding "out-of-thin-air" behaviors [19, 11, 5, 14]. The problem can be illustrated with the following two programs, which access locations x and y initialized to 0.

a := [x]            b := [y]
[y] := 1 + a ∗ 0    [x] := b        (LB-fake)

a := [x]            b := [y]
[y] := a            [x] := b        (LB-data)

The annotated outcome a = b = 1 ought to be allowed for LB-fake because 1 + a ∗ 0 always evaluates to 1, so the dependency of the write to y on a is fake and may be removed by an optimizing compiler. The same outcome ought to be forbidden for LB-data, whose dependency is genuine: allowing it would make the value 1 appear "out of thin air". Among the proposed models that correctly distinguish between these two programs is the recent Weakestmo model [6].
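As a sanity check (not part of the paper's development), a brute-force enumeration of all sequentially consistent interleavings of LB-data confirms that, without some reordering of a thread's load before its store, the outcome a = b = 1 cannot arise. The encoding below is ours; it models each thread as a list of load/store operations.

```python
from itertools import permutations

def run(schedule):
    """Run one interleaving of LB-data under sequential consistency."""
    mem = {"x": 0, "y": 0}
    regs = {"a": 0, "b": 0}
    prog = {
        1: [("load", "a", "x"), ("store", "y", "a")],   # a := [x]; [y] := a
        2: [("load", "b", "y"), ("store", "x", "b")],   # b := [y]; [x] := b
    }
    pc = {1: 0, 2: 0}
    for tid in schedule:
        op = prog[tid][pc[tid]]
        pc[tid] += 1
        if op[0] == "load":
            _, reg, loc = op
            regs[reg] = mem[loc]
        else:  # store: write the value of a register to a location
            _, loc, reg = op
            mem[loc] = regs[reg]
    return regs["a"], regs["b"]

# All interleavings of two 2-instruction threads = permutations of [1,1,2,2].
outcomes = {run(s) for s in set(permutations([1, 1, 2, 2]))}
assert (1, 1) not in outcomes  # SC never produces the out-of-thin-air outcome
```

In fact, under SC the only outcome here is a = b = 0, since no interleaving ever writes a non-zero value; the annotated outcome requires a weaker model.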
Weakestmo was developed in response to certain limitations of earlier models, such as the "promising semantics" of Kang et al. [12], namely that (i) they did not cover the whole range of C/C++ concurrency features and that (ii) they did not support the intended compilation schemes to hardware. Being flexible in its design, Weakestmo addresses the former point. It supports all usual features of the C/C++11 model [3] and can easily be adapted to support any new concurrency features that may be added in the future. It does not, however, fully address the latter point. Due to the difficulty of establishing correctness of the intended compilation schemes to hardware architectures that permit load-store reordering (i.e., POWER, ARMv7, ARMv8), Chakraborty and Vafeiadis [6] only establish correctness of suboptimal schemes that add (unnecessary) explicit fences to prevent load-store reordering.

In this paper, we address this major limitation of the Weakestmo paper. We establish in Coq correctness of the intended compilation schemes to a wide range of hardware architectures that includes the major ones: x86-TSO [18], POWER [1], ARMv7 [1], ARMv8 [22]. The compilation schemes, whose correctness we prove, do not require any fences or fake dependencies for relaxed accesses. Because of a technical limitation of our setup (see §6), however, compilation of read-modify-write (RMW) accesses to ARMv8 uses a load-reserve/store-conditional loop (similar to that of ARMv7 and POWER) as opposed to the newly introduced ARMv8 instructions for certain kinds of RMWs.

The main challenge in this proof is to reconcile the different ways in which hardware models and Weakestmo allow load-store reordering. Unlike most models at the programming language level, hardware models (such as ARMv8) do not execute instructions in sequence; they instead keep track of dependencies between instructions and ensure that no dependency cycles ever arise in a single execution. In contrast, Weakestmo executes instructions in order, but simultaneously considers multiple executions to justify an execution where a load reads a value that indirectly depends upon a later store. Technically, these multiple executions together form an event structure, upon which Weakestmo places various constraints.
Figure 1 Results proved in this paper. (The diagram relates C11, Weakestmo, IMM extended with SC accesses, and the hardware models x86-TSO, POWER, ARMv7, and ARMv8.)
The high-level proof structure is shown in Fig. 1. We reuse IMM, an intermediate memory model, introduced by Podkopaev et al. [20] as an abstraction over all major existing hardware memory models. To support Weakestmo compilation, we extend IMM with sequentially consistent (SC) accesses following the RC11 model [14]. As IMM is very much a hardware-like model (e.g., it tracks dependencies), the main result is compilation from Weakestmo to IMM (indicated by the bold arrow). The other arrows in the figure are extensions of previous results to account for SC accesses, while double arrows indicate results for two compilation schemes.

The complexity of the proof is also evident from the size of the Coq development. We have written about 30K lines of Coq definitions and proof scripts on top of an existing infrastructure of about another 20K lines (defining IMM, the aforementioned hardware models and many lemmas about them). As part of developing the proof, we also had to mechanize the Weakestmo definition in Coq and to fix some minor deficiencies in the original definition, which were revealed by our proof effort.

To the best of our knowledge, our proof is the first proof of correctness of compilation of an event-structure-based memory model. It is also the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of
hardware architectures. The latter, although fairly straightforward in our case, has had a history of wrong compilation correctness arguments (see [14] for details).

Figure 2 Executions of LB and LB-data/LB-fake with outcome a = b = 1. (Panel (a) shows G_LB, an execution graph of LB with events Init, R(x, 1), W(y, 1), R(y, 1), W(x, 1) and its po, ppo, and rf edges; panel (b) shows the common execution of LB-data and LB-fake, which has an additional ppo edge in thread 1.)

Outline
We start with an informal overview of IMM, Weakestmo, and our compilation proof (§2). We then present a fragment of Weakestmo formally (§3) and its compilation proof (§4). Subsequently, we extend these results to cover SC accesses (§5), discuss related work (§6) and conclude (§7). The associated proof scripts and supplementary material for our paper are publicly available at http://plv.mpi-sws.org/weakestmoToImm/.

Overview

To get an idea about the
IMM and Weakestmo memory models, consider a version of the LB-fake and LB-data programs from §1 with no dependency in thread 1:

a := [x]    b := [y]
[y] := 1    [x] := b        (LB)

As we will see, the annotated outcome is allowed by both IMM and Weakestmo, albeit in different ways. The different treatment of load-store reordering affects the outcomes of other programs. For example, IMM forbids the annotated outcome of LB-fake by treating it exactly as LB-data, whereas Weakestmo allows the outcome by treating LB-fake exactly as LB.
IMM
IMM is a declarative (also called axiomatic) model identifying a program's semantics with a set of execution graphs, or just executions. As an example, Fig. 2a contains G_LB, an IMM execution graph of LB corresponding to an execution yielding the annotated behavior.

Vertices of execution graphs, called events, represent memory accesses either due to the initialization of memory or to the execution of program instructions. Each event is labeled with the type of the access (e.g., R for reads, W for writes), the location accessed, and the value read or written. Memory initialization consists of a set of events labeled W(x, 0) for each location x used in the program; for conciseness, however, we depict the initialization events as a single event with label Init.

Edges of execution graphs represent different relations on events. In Fig. 2, three different relations are depicted. The program order relation (po) totally orders events originated from the same thread according to their order in the program, as well as the initialization event(s) before all other events. The reads-from relation (rf) relates a write event to the read events that read from it. Finally, the preserved program order (ppo) is a subset of the program order relating events that cannot be executed out of order. Such ppo edges arise whenever there is a dependency chain between the corresponding instructions (e.g., a write storing the value read by a prior read).
Because of the syntactic nature of ppo, IMM conflates the executions of LB-data and LB-fake leading to the outcome a = b = 1 (see Fig. 2b). This choice is in line with hardware memory models; it means, however, that IMM is not suitable as a memory model for a programming language (because, as argued in §1, LB-fake can be transformed to LB by an optimizing compiler).

The executions of a program are constructed in two steps. First, a thread-local semantics determines the sequential executions of each thread, where the values returned by each read access are chosen non-deterministically (among the set of all possible values), and the executions of different threads are combined into a single execution. Then, the execution graphs are filtered by a consistency predicate, which determines which executions are allowed (i.e., are IMM-consistent). These IMM-consistent executions form the program's semantics. IMM-consistency checks three basic constraints:

Completeness: Every read event reads from precisely one write with the same location and value;

Coherence: For each location x, there is a total ordering of x-related events extending the program order so that each read of x reads from the most recent prior write according to that total order; and

Acyclic dependency: There is no cycle consisting only of ppo and rf edges.

The final constraint disallows executions in which an event recursively depends upon itself, as this pattern can lead to "out-of-thin-air" outcomes. Specifically, the execution in Fig. 2b, which represents the annotated behavior of LB-fake and LB-data, is not IMM-consistent because of the (ppo ∪ rf)-cycle. In contrast, G_LB is IMM-consistent.
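The acyclic-dependency check can be replayed concretely on the two executions of Fig. 2. The sketch below is ours (the event names Rx, Wy, Ry, Wx are shorthands we introduce, not the paper's notation): it represents each relation as a set of edges and detects cycles with a depth-first search.

```python
def has_cycle(nodes, edges):
    """Detect a cycle via depth-first search over a relation given as a set of pairs."""
    succ = {n: [b for (a, b) in edges if a == n] for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Events of the executions in Fig. 2: Rx = R(x,1), Wy = W(y,1), Ry = R(y,1), Wx = W(x,1).
events = ["Rx", "Wy", "Ry", "Wx"]
rf  = {("Wy", "Ry"), ("Wx", "Rx")}            # reads-from edges
ppo_lb   = {("Ry", "Wx")}                      # LB: a dependency only in thread 2
ppo_data = {("Rx", "Wy"), ("Ry", "Wx")}        # LB-data/LB-fake: dependencies in both threads

assert not has_cycle(events, ppo_lb | rf)      # G_LB satisfies acyclic dependency
assert has_cycle(events, ppo_data | rf)        # Fig. 2b contains a (ppo ∪ rf)-cycle
```

The cycle in the second case is exactly Rx → Wy → Ry → Wx → Rx, alternating ppo and rf edges.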
Weakestmo
We move on to Weakestmo, which also defines the program's semantics as a set of execution graphs. However, they are constructed differently: they are extracted from a final event structure, which Weakestmo incrementally builds for a program.

An event structure represents multiple executions of a program in a single graph. Like execution graphs, event structures contain a set of events and several relations among them. Like execution graphs, the program order (po) orders events according to each thread's control flow. However, unlike execution graphs, po is not necessarily total among the events of a given thread. Events of the same thread that are not po-ordered are said to be in conflict (cf) with one another, and cannot belong to the same execution. Such conflicting events arise when two read events originate from the same read instruction (e.g., representing executions where the reads return different values). Moreover, cf "extends downwards": events that depend upon conflicting events (i.e., have conflicting po-predecessors) are also in conflict with one another. In pictures, we typically show only the immediate conflict edges (between reads originating from the same instruction) and omit the conflict edges between events po-after immediately conflicting ones.

Event structures are constructed incrementally starting from an event structure consisting only of the initialization events. Then, events corresponding to the execution of program instructions are added one at a time. We start by executing the first instruction of a program's thread. Then, we may execute the second instruction of the same thread or the first instruction of another thread, and so on. For a detailed formal description of the graphs and their construction process we refer the reader to [20, §2.2].
Figure 3 A run of Weakestmo witnessing the annotated outcome of LB. (Panels (a)-(f) show the successively built event structures S_a, ..., S_f; the executions X_b, X_d, and X_f are selected in panels (b), (d), and (f). The panels depict the events added for LB's instructions together with their jf, cf, and ew edges.)
As an example, Fig. 3 constructs an event structure for LB. Fig. 3a depicts the event structure S_a obtained from the initial event structure by executing a := [x] in LB's thread 1. As a result of the instruction execution, a read event e1: R(x, 0) is added.

Whenever the event added is a read, Weakestmo has to justify the returned value from an appropriate write event. In this case, there is only one write to x (the initialization write), and so S_a has a justified from edge, denoted jf, going to e1 in S_a. This is a requirement of Weakestmo: each read event in an event structure has to be justified from exactly one write event with the same value and location. (This requirement is analogous to the completeness requirement in IMM-consistency for execution graphs.) Since events are added in program order and read events are always justified from existing events in the event structure, po ∪ jf is guaranteed to be acyclic by construction.

The next three steps (Figures 3b to 3d) simply add a new event to the event structure. Notice that, unlike IMM executions, Weakestmo event structures do not track syntactic dependencies; e.g., S_d in Fig. 3d does not contain a ppo edge between the read and the write of thread 2. This is precisely what allows Weakestmo to assign the same behavior to LB and LB-fake: they have exactly the same event structures. As a programming-language-level memory model, Weakestmo supports optimizations removing fake dependencies.

The next step (Fig. 3e) is more interesting because it showcases the key distinction between event structures and execution graphs, namely that event structures may contain more than one execution for each thread. Specifically, the transition from S_d to S_e reruns the first instruction of thread 1 and adds a new event e5 justified from a different write event. We say that this new event conflicts (cf) with e1 because they cannot both occur in a single execution. Because of conflicts, po in event structures does not totally order all events of a thread; e.g., e1 and e5 are not po-ordered in S_e. Two events of the same thread are conflicted precisely when they are not po-ordered.
Figure 4 Traversal configurations for G_LB. (Panels (a)-(f) show the traversal configurations TC_a, ..., TC_f, marking which events of G_LB are issued and which are covered at each step.)

The final construction step (Fig. 3f) demonstrates another Weakestmo feature. Conflicting write events writing the same value to the same location (e.g., e2 and e6 in S_f) may be declared equal writes, i.e., connected by an equivalence relation ew.

The ew relation is used to define Weakestmo's version of the reads-from relation, rf, which relates a read to all (non-conflicted) writes equal to the write justifying the read. For example, e3 reads from both e2 and e6.

Weakestmo's rf relation is used for extraction of program executions. An execution graph G is extracted from an event structure S, denoted S ▷ G, if G is a maximal conflict-free subset of S, it contains only visible events (to be defined in §3), and every read event in G reads from some write in G according to S.rf. Two execution graphs can be extracted from S_f, {Init, e1, e2, e3, e4} and {Init, e5, e6, e3, e4}, representing the outcomes a = 0 ∧ b = 1 and a = b = 1 respectively.

Weakestmo to IMM Compilation: High-Level Proof Structure
In this paper, we assume that Weakestmo is defined for the same assembly language as IMM (see [20, Fig. 2]) extended with SC accesses, and refer to this language as L. Having that, we show the correctness of the identity mapping as a compilation scheme from Weakestmo to IMM in the following theorem.

▶ Theorem 1. Let prog be a program in L, and G be an IMM-consistent execution graph of prog. Then there exists an event structure S of prog under Weakestmo such that S ▷ G.

To prove the theorem, we must show that Weakestmo may construct the needed event structure in a step by step fashion. If the IMM-consistent execution graph G contains no po ∪ rf cycles, then the construction is completely straightforward: G itself is a Weakestmo-consistent event structure (setting jf to be just rf), and its events can be added in any order extending po ∪ rf.

The construction becomes tricky for IMM-consistent execution graphs, such as G_LB, that contain po ∪ rf cycles. Due to the cycle(s), G cannot be directly constructed as a (conflict-free) Weakestmo event structure. We must instead construct a larger event structure S containing multiple executions, one of which will be the desired graph G. Roughly, for each po ∪ rf cycle in G, we have to construct an immediate conflict in the event structure.

To generate the event structure S, we rely on a basic property of IMM-consistent execution graphs shown by Podkopaev et al. [20, §§6,7], namely that execution graphs can be traversed in a certain order, i.e., their events can be issued and covered in that order, so that in the end all events are covered. The traversal captures a possible execution order of the program that yields the given execution. In that execution order, events are not added according to program order, but rather according to preserved program order (ppo) in two steps. Events are first issued when all their dependencies have been resolved, and are later covered when all their po-prior events have been covered.

In more detail, a traversal of an IMM-consistent execution graph G is a sequence of traversal steps between traversal configurations. A traversal configuration TC of an execution graph G is a pair of sets of events, ⟨C, I⟩, called the covered and issued set respectively. As an example, Fig. 4 presents all six traversal configurations of the execution graph G_LB of LB from Fig. 2a except for the initial configuration; the issued and covered sets are marked in the figure.

A traversal might be seen as an execution of an abstract machine that can execute write instructions early but has to execute everything else in order. The first option corresponds to issuing a write event, and the second option to covering an event. (For readers familiar with PS [12], issuing a write event corresponds to promising a message, and covering an event to normal execution of an instruction.) The traversal strategy has certain constraints. To issue a write event, all external reads that it depends upon must be resolved; i.e., they must read from already issued events. To cover an event, all its po-predecessors must also be covered. For example, in Fig. 4, a traversal cannot issue W(x, 1) before issuing W(y, 1), nor cover R(x, 1) before issuing W(x, 1). By Podkopaev et al. [20, Prop. 6.5], every IMM-consistent execution graph G has a full traversal of the following form:

G ⊢ TC_init(G) → TC1 → TC2 → ... → TC_final(G)

where the initial configuration, TC_init(G) ≜ ⟨G.Init, G.Init⟩, has issued and covered only G's initial events, and the final configuration, TC_final(G) ≜ ⟨G.E, G.W⟩, has covered all G's events and issued all its write events.

We construct the event structure S following a full traversal of G. We define a simulation relation, I(prog, G, TC, S, X), between the program prog, the current traversal configuration TC of execution G, and the current event structure's state ⟨S, X⟩, where X is a subset of events corresponding to a particular execution graph extracted from the event structure S. Our simulation proof is divided into the following three lemmas, which state that the initial states are simulated, that simulation extends along traversal steps, and that the simulation of final states means that G can be extracted from the generated event structure.

▶ Lemma 2 (Simulation Start). Let prog be a program of L, and G be an IMM-consistent execution graph of prog. Then I(prog, G, TC_init(G), S_init(prog), S_init(prog).E) holds.

▶ Lemma 3 (Weak Simulation Step). If I(prog, G, TC, S, X) and G ⊢ TC → TC′ hold, then there exist S′ and X′ such that I(prog, G, TC′, S′, X′) and S →* S′ hold.

▶ Lemma 4 (Simulation End). If I(prog, G, TC_final(G), S, X) holds, then the execution graph associated with X is isomorphic to G.
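The issue/cover discipline can be sketched with a small exhaustive search. The predicates below are our simplified reading of the informal constraints above (in particular, we let a write be covered as soon as it is issued and its po-predecessors are covered); they are not the paper's formal definitions, and the event names Rx, Wy, Ry, Wx are ours.

```python
# A simplified model of traversals of G_LB (a sketch under the assumptions above).
po  = {("Rx", "Wy"), ("Ry", "Wx")}
ppo = {("Ry", "Wx")}                 # preserved program order (dependencies)
rf  = {("Wy", "Ry"), ("Wx", "Rx")}
writes = {"Wy", "Wx"}
events = {"Rx", "Wy", "Ry", "Wx"}
src = {r: w for (w, r) in rf}        # the write each read reads from

def issuable(w, issued):
    # every read that w depends upon must read from an already-issued write
    return all(src[r] in issued for (r, w2) in ppo if w2 == w)

def coverable(e, covered, issued):
    if any(p not in covered for (p, q) in po if q == e):
        return False                 # all po-predecessors must be covered
    if e in writes:
        return e in issued           # simplification: writes are covered once issued
    return src[e] in issued          # a read needs its source write to be issued

def traversals(covered=frozenset(), issued=frozenset(), trace=()):
    """Enumerate all full traversals as sequences of ('issue'|'cover', event)."""
    if covered == events:
        yield trace
        return
    for w in writes - issued:
        if issuable(w, issued):
            yield from traversals(covered, issued | {w}, trace + (("issue", w),))
    for e in events - covered:
        if coverable(e, covered, issued):
            yield from traversals(covered | {e}, issued, trace + (("cover", e),))

all_traces = list(traversals())
assert all_traces  # G_LB can be fully traversed
for t in all_traces:
    steps = {s: i for i, s in enumerate(t)}
    assert steps[("issue", "Wy")] < steps[("issue", "Wx")]   # W(y,1) issued before W(x,1)
    assert steps[("issue", "Wx")] < steps[("cover", "Rx")]   # R(x,1) covered only after W(x,1) issued
```

Every enumerated traversal respects exactly the two ordering constraints quoted above for Fig. 4.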
The proof of Theorem 1 then proceeds by induction on the length of the traversal G ⊢ TC_init(G) →* TC_final(G). Lemma 2 serves as the base case, Lemma 3 is the induction step simulating each traversal step with a number of event structure construction steps, and Lemma 4 concludes the proof.

The proofs of Lemmas 2 and 4 are technical but fairly straightforward. (We define I in a way that makes these lemmas immediate.) In contrast, Lemma 3 is much more difficult to prove. As we will see, simulating a traversal step sometimes requires us to construct a new branch in the event structure, i.e., to add multiple events (see §4.3).

Weakestmo to IMM Compilation Correctness by Example
Before presenting any formal definitions, we conclude this overview section by showcasing the construction used in the proof of Lemma 3 on the execution graph G_LB in Fig. 2a, following the traversal of Fig. 4. We have actually already seen the sequence of event structures constructed in Fig. 3. Note that, even though Figures 3 and 4 have the same number of steps, there is no one-to-one correspondence between them, as we explain below.

Consider the last event structure S_f from Fig. 3. A subset of its events X_f, marked in Fig. 3f, which we call a simulated execution, is a maximal conflict-free subset of S_f, and all read events in X_f read from some write in X_f (i.e., are justified from a write deemed "equal" to some write in X_f). Then, by definition, X_f is extracted from S_f. Also, the execution graph induced by X_f is isomorphic to G_LB. That is, the construction of S_f for LB shows that in Weakestmo it is possible to observe the same behavior as G_LB. Now, we explain how we construct S_f and choose X_f.

During the simulation, we maintain the relation I(prog, G, TC, S, X) connecting a program prog, its execution graph G, its traversal configuration TC, an event structure S, and a subset of its events X. Among other properties (presented in §4.2), the relation states that all issued and covered events of TC have exact counterparts in X, and that X can be extracted from S.

The initial event structure and the initial X consist of only initial events. Then, following the issuing of event W(y, 1) in TC_a (see Fig. 4a), we need to add a branch to the event structure that has W(y, 1) in it. Since Weakestmo requires adding events according to program order, we first need to add a read event corresponding to 'a := [x]' of LB's thread 1. Each read event in an event structure has to be justified from somewhere. In this case, the only write event to location x is the initial one. That is, the added read event e1 is justified from it (see Fig. 3a). In the general case, having more than one option, we would choose a 'safe' write event for an added read event to be justified from, i.e., one which the corresponding branch is 'aware' of already and being justified from which would not break consistency of the event structure. After that, a write event e2: W(y, 1) can be added po-after e1 (see Fig. 3b), and I(LB, G_LB, TC_a, S_b, X_b) holds for X_b = {Init, e1, e2}.

Next, we need to simulate the second traversal step (see Fig. 4b), which issues W(x, 1). The read event e3 of thread 2 has to get value 1, since there is a dependency between the instructions in thread 2. As we mentioned earlier, the traversal strategy guarantees that e2: W(y, 1) is issued at the moment of issuing W(x, 1), so there is a write event to justify e3 from (see Fig. 3c). Now, the write event e4: W(x, 1) representing the issued write can be added to the event structure (see Fig. 3d), and I(LB, G_LB, TC_b, S_d, X_d) holds for X_d = {Init, e1, e2, e3, e4}.

In the third traversal step (see Fig. 4c), the read event R(x, 1) is covered. To have a representative event for it in the event structure, we add e5 (see Fig. 3e). It is justified from e4, which writes the needed value 1. Also, e5 represents an alternative to e1 execution of the first instruction of thread 1, so the two events are in conflict.

However, we cannot choose a simulated execution X related to TC_c and S_e by the simulation relation, since X has to contain e5 and a representative for W(y, 1) (in S_e it is represented by e2) while being conflict-free. Thus, the event structure has to make one other step (see Fig. 3f) and add the new event e6 to represent W(y, 1), so that X_f = {Init, e5, e6, e3, e4}.

Since X_f has to be extracted from S_f, every read event in X_f has to be connected via an rf edge to an event in X_f. To preserve the requirement, we connect the newly added event e6 and e2 via an ew edge, i.e., marking them to be equal writes. This induces an rf edge between e6 and e3. That is, I(LB, G_LB, TC_c, S_f, X_f) holds.

To simulate the remaining traversal steps (Figures 4d to 4f), we do not need to modify S_f, because it already contains counterparts for the newly covered events and, moreover, the execution graph associated with X_f is isomorphic to G_LB. That is, we just need to show that I(LB, G_LB, TC_d, S_f, X_f), I(LB, G_LB, TC_e, S_f, X_f), and I(LB, G_LB, TC_f, S_f, X_f) hold.

Weakestmo
In this section, we introduce the notation used in the rest of the paper and define the Weakestmo memory model. For simplicity, we present only a minimal fragment of Weakestmo containing only relaxed reads and writes. For the definition of the full Weakestmo model, we refer the readers to Chakraborty and Vafeiadis [6] and to our Coq development [17].
Notation
Given relations R1 and R2, we write R1; R2 for their sequential composition. Given a relation R, we write R?, R+ and R* to denote its reflexive, transitive and reflexive-transitive closures. We write id to denote the identity relation (i.e., id ≜ {⟨x, x⟩}). For a set A, we write [A] to denote the identity relation restricted to A (that is, [A] ≜ {⟨a, a⟩ | a ∈ A}). Hence, for instance, we may write [A]; R; [B] instead of R ∩ (A × B). We also write [e] to denote [{e}] if e is not a set.

Given a function f : A → B, we denote by =_f the set of f-equivalent elements (=_f ≜ {⟨a, b⟩ ∈ A × A | f(a) = f(b)}). In addition, given a relation R, we denote by R|_{=f} the restriction of R to f-equivalent elements (R|_{=f} ≜ R ∩ =_f), and by R|_{≠f} the restriction of R to non-f-equivalent elements (R|_{≠f} ≜ R \ =_f).

Events, e ∈ Event, and thread identifiers, t ∈ Tid, are represented by natural numbers. We treat the thread with identifier 0 as the initialization thread. We let x ∈ Loc range over locations, and v ∈ Val over values. A label, l ∈ Lab, takes one of the following forms:

R(x, v): a read of value v from location x.

W(x, v): a write of value v to location x.

(Footnotes from the example in §2: actually, it is easy to show that there can be only one such event, since equal writes are in conflict and X is conflict-free. Also, the newly added write could have been left without any outgoing ew edges, since the choice of equal writes for newly added events in Weakestmo is non-deterministic; however, that would not preserve the simulation relation.)
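These relational operators are straightforward to prototype. The toolkit below is our own sketch (not part of the Coq development), representing a relation as a Python set of pairs and mirroring the notation just introduced.

```python
from itertools import product

def seq(r1, r2):                       # R1 ; R2  (sequential composition)
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

def ident(A):                          # [A]  (identity relation restricted to A)
    return {(a, a) for a in A}

def refl(r, dom):                      # R?  (reflexive closure over a domain)
    return r | ident(dom)

def trans(r):                          # R+  (transitive closure)
    closure = set(r)
    while True:
        extra = seq(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

# [A] ; R ; [B] coincides with R ∩ (A × B):
R = {(1, 2), (2, 3), (3, 1)}
A, B = {1, 2}, {2, 3}
assert seq(seq(ident(A), R), ident(B)) == R & set(product(A, B))
assert refl({(1, 2)}, {1, 2}) == {(1, 2), (1, 1), (2, 2)}
assert trans({(1, 2), (2, 3)}) == {(1, 2), (2, 3), (1, 3)}
```

The last assertions check the claimed equivalence [A]; R; [B] = R ∩ (A × B) on a small example, along with the closure operators.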
Given a label l, the functions typ, loc, val return (when applicable) its type (i.e., R or W), location and value correspondingly. When a specific function assigning labels to events is clear from the context, we abuse the notations R and W to denote the sets of all events labelled with the corresponding type. We also use subscripts to further restrict this set to a specific location (e.g., W_x denotes the set of write events operating on location x).

Event structures

An event structure S is a tuple ⟨E, tid, lab, po, jf, ew, co⟩ where:

- E is a set of events, i.e., E ⊆ Event.
- tid : E → Tid is a function assigning a thread identifier to every event. We treat events with the thread identifier equal to 0 as initialization events and denote them as Init, that is, Init ≜ {e ∈ E | tid(e) = 0}.
- lab : E → Lab is a function assigning a label to every event in E.
- po ⊆ E × E is a strict partial order on events, called program order, that tracks their precedence in the control flow of the program. Initialization events are po-before all other events, whereas non-initialization events can only be po-before events from the same thread. Not all events of a thread are necessarily ordered by po. We call such po-unordered non-initialization events of the same thread conflicting events. The corresponding binary relation cf is defined as follows:

  cf ≜ ([E \ Init]; =_tid; [E \ Init]) \ (po ∪ po⁻¹)?

- jf ⊆ [E ∩ W]; (=_loc ∩ =_val); [E ∩ R] is the justified from relation, which relates a write event to the reads it justifies. We require that reads are not justified by conflicting writes (i.e., jf ∩ cf = ∅) and that jf⁻¹ be functional (i.e., whenever ⟨w1, r⟩, ⟨w2, r⟩ ∈ jf, then w1 = w2). We also define the notion of external justification: jfe ≜ jf \ po. A read event is externally justified from a write if the write is not po-before the read.
- ew ⊆ [E ∩ W]; (cf ∩ =_loc ∩ =_val)?; [E ∩ W] is an equivalence relation called the equal-writes relation. Equal writes have the same location and value, and (unless identical) are in conflict with one another.
- co ⊆ [E ∩ W]; (=_loc \ ew); [E ∩ W] is the coherence order, a strict partial order that relates non-equal write events with the same location. We require that coherence be closed with respect to equal writes (i.e., ew; co; ew ⊆ co) and total with respect to ew on writes to the same location: ∀x ∈ Loc. ∀w1, w2 ∈ W_x. ⟨w1, w2⟩ ∈ ew ∪ co ∪ co⁻¹.

Given an event structure S, we use "dot notation" to refer to its components (e.g., S.E, S.po). For a set A of events, we write S.A for the set A ∩ S.E (for instance, S.W_x = {e ∈ S.E | typ(S.lab(e)) = W ∧ loc(S.lab(e)) = x}). Further, for e ∈ S.E, we write S.
typ(e) to retrieve typ(S.lab(e)). Similar notation is used for the functions loc and val. Given a set of thread identifiers T, we write S.thread(T) to denote the set of events belonging to one of the threads in T, i.e., S.thread(T) ≜ { e ∈ S.E | S.tid(e) ∈ T }. When T = {t} is a singleton, we often write S.thread(t) instead of S.thread({t}).

We define the immediate po and cf edges of an event structure as follows:

S.po_imm ≜ S.po \ (S.po ; S.po)    S.cf_imm ≜ S.cf ∩ (S.po_imm⁻¹ ; S.po_imm)

An event e₁ is an immediate po-predecessor of e₂ if e₁ is po-before e₂ and there is no event po-between them. Two conflicting events are immediately conflicting if they have the same immediate po-predecessor.

Given a program prog, we construct its event structures operationally in a way that guarantees completeness (i.e., that every read is justified from some write) and po ∪ jf acyclicity. We start with an event structure containing only the initialization events and add one event at a time following each thread's semantics.

For the thread semantics, we assume reductions of the form σ −e→ σ′ between thread states σ, σ′ ∈ ThreadState, labeled by the event e ∈ E generated by that execution step. Given a thread t and a sequence of events e₁, …, eₙ ∈ S.thread(t) in immediate po succession (i.e., ⟨eᵢ, eᵢ₊₁⟩ ∈ S.po_imm for 1 ≤ i < n) starting from a first event of thread t (i.e., dom(S.po ; [e₁]) ⊆ Init), we can add an event e po-after that sequence of events provided that there exist thread states σ₁, …, σₙ and σ such that prog(t) −e₁→ σ₁ −e₂→ σ₂ ⋯ −eₙ→ σₙ −e→ σ, where prog(t) is the initial thread state of thread t of the program prog. By construction, this means that the newly added event e will be in conflict with all other events of thread t besides e₁, …
, eₙ.

Further, when the new event e is a read event, it has to be justified from an existing write event, so as to ensure completeness and prevent "out-of-thin-air" values. The write event is picked non-deterministically from all non-conflicting writes with the same location as the new read event. Similarly, when e is a write event, its position in the co order must be chosen: either an ew equivalence class is picked and the new write is included in it, or the new write is put immediately after some existing write in co order. At each step, we also check for event structure consistency (to be defined in Def. 5): if the event structure obtained after the addition of the new event is inconsistent, it is discarded.

To define consistency, we first need a number of auxiliary definitions. The happens-before order S.hb is a generalization of the program order. Besides the program order edges, it includes certain synchronization edges (captured by the synchronizes-with relation, S.sw):

S.hb ≜ (S.po ∪ S.sw)⁺

For the fragment covered in this section, there are no synchronization edges (i.e., sw = ∅), and so hb and po coincide. In the full model, however, certain justification edges (e.g., between release/acquire accesses) contribute to sw and hence to hb.

The extended conflict relation S.ecf extends the notion of conflicting events to account for hb; two events are in extended conflict if they happen after conflicting events:

S.ecf ≜ (S.hb⁻¹)^? ; S.cf ; S.hb^?

As already mentioned in §2, the reads-from relation, S.rf, of a Weakestmo event structure is derived. It is defined as an extension of S.jf to all S.ew-equivalent writes:

S.rf ≜ (S.ew ; S.jf) \ S.cf

(Our definition of immediate conflicts differs from that of [6] and is easier to work with. The two definitions are equivalent if the set of initialization events is non-empty. The full model is presented in [6] and also in our Coq development [17].)
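The derivation of reads-from above can be made concrete with a small Python sketch. This is our own encoding over explicit edge sets, not part of the paper's Coq development [17]; the event names are hypothetical.

```python
# Sketch: the derived reads-from relation of a Weakestmo event structure,
#   rf = (ew ; jf) \ cf,
# over relations represented as sets of ordered pairs of event ids.

def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def derive_rf(ew, jf, cf):
    """Extend jf to all ew-equivalent writes, then drop conflicting pairs."""
    return compose(ew, jf) - cf

# Hypothetical fragment: w1 and w2 are equal writes (same location and
# value, in conflict with each other); the read r is justified from w1
# and conflicts with w2.
ew = {("w1", "w1"), ("w2", "w2"), ("w1", "w2"), ("w2", "w1")}
jf = {("w1", "r")}
cf = {("w1", "w2"), ("w2", "w1"), ("w2", "r"), ("r", "w2")}

print(sorted(derive_rf(ew, jf, cf)))  # [('w1', 'r')]
```

The edge w2 → r produced by ew ; jf is discarded because w2 conflicts with r, matching the requirement that a read never reads from a conflicting write.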
Note that, unlike S.jf⁻¹, the relation S.rf⁻¹ is not functional. This does not cause any problems, however, since all the writes from which a read reads have the same location and value and are in conflict with one another.

The relation S.fr, called from-read or reads-before, places read events before subsequent writes:

S.fr ≜ S.rf⁻¹ ; S.co

The extended coherence S.eco is a strict partial order that orders events operating on the same location. (It is almost total on accesses to a given location, except that it does not order equal writes nor reads reading from the same write.)

S.eco ≜ (S.co ∪ S.rf ∪ S.fr)⁺

We observe that in our model, eco is equal to rf ∪ co ; rf^? ∪ fr ; rf^?, similar to the corresponding definitions about execution graphs in the literature.

The last ingredient that we need for event structure consistency is the notion of visible events, which will be used to constrain external justifications. We define it in a few steps. Let e be some event in S. First, consider all write events used to externally justify e or one of its justification ancestors. The relation S.jfe ; (S.po ∪ S.jf)* defines this connection formally. Among that set of write events, restrict attention to those conflicting with e, and call that set M. That is, M ≜ dom(S.cf ∩ (S.jfe ; (S.po ∪ S.jf)*) ; [e]). Event e is visible if all writes in M have an equal write that is po-related with e. Formally,

S.Vis ≜ { e ∈ S.E | S.cf ∩ (S.jfe ; (S.po ∪ S.jf)*) ; [e] ⊆ S.ew ; (S.po ∪ S.po⁻¹)^? }

Intuitively, visible events cannot depend on conflicting events: for every such justification dependence, there ought to be an equal non-conflicting write.
Consistency places a number of additional constraints on event structures. First, it checks that there is no redundancy in the event structure: immediate conflicts arise only because of read events justified from non-equal writes. Second, it extends the constraints about cf to the extended conflict ecf; namely, no event can conflict with itself or be justified from a conflicting event. Third, it checks that reads are justified either from events of the same thread or from visible events of other threads. Finally, it ensures coherence, i.e., that executions restricted to accesses of a single location do not have any weak behaviors.

▶ Definition 5.
An event structure S is said to be consistent if the following conditions hold.

dom(S.cf_imm) ⊆ S.R  (cf_imm-read)
S.jf ; S.cf_imm ; S.jf⁻¹ ; S.ew is irreflexive.  (cf_imm-justification)
S.ecf is irreflexive.  (ecf-irreflexivity)
S.jf ∩ S.ecf = ∅  (jf-non-conflict)
dom(S.jfe) ⊆ S.Vis  (jfe-visible)
S.hb ; S.eco^? is irreflexive.  (coherence)

This equivalence does not hold in the original
Weakestmo model [6]. To make the equivalence hold, we made ew transitive and require ew ; co ; ew ⊆ co.

Note that in [6] the definition of the visible events is slightly more verbose. We proved in Coq [17] that our simpler definition is equivalent to the one given there.
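A few of the consistency conditions of Def. 5 translate directly into executable checks. The following Python sketch (our own encoding, covering only ecf-irreflexivity, jf-non-conflict, and coherence) illustrates this on explicit relations:

```python
# Sketch: checking three of the Def. 5 consistency conditions
# (ecf-irreflexivity, jf-non-conflict, coherence) on explicit relations
# given as sets of ordered pairs of event ids.

def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def irreflexive(r):
    return all(a != b for (a, b) in r)

def consistent_fragment(events, hb, eco, ecf, jf):
    eco_refl = eco | {(e, e) for e in events}          # eco^?
    return (irreflexive(ecf)                           # ecf-irreflexivity
            and not (jf & ecf)                         # jf-non-conflict
            and irreflexive(compose(hb, eco_refl)))    # coherence

events = {"w", "r"}
# A write w hb-before a read r that reads from it: consistent.
print(consistent_fragment(events, hb={("w", "r")}, eco={("w", "r")},
                          ecf=set(), jf={("w", "r")}))  # True
# A coherence violation: r is hb-before w but eco-after it.
print(consistent_fragment(events, hb={("r", "w")}, eco={("w", "r")},
                          ecf=set(), jf={("w", "r")}))  # False
```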
The last part of
Weakestmo is the extraction of executions from an event structure. An execution is essentially a conflict-free event structure.

▶ Definition 6. An execution graph G is a tuple ⟨E, tid, lab, po, rf, co⟩ where its components are defined similarly as in the case of an event structure, with the following exceptions:

po is required to be total on the set of events from the same thread. Thus, execution graphs have no conflicting events, i.e., cf = ∅.
The rf relation is given explicitly instead of being derived. Also, there are no jf and ew relations.
co totally orders write events operating on the same location.

All derived relations are defined similarly as for event structures. Next we show how to extract an execution graph from the event structure.

▶
Definition 7.
A set of events X is called extracted from S if the following conditions are met:

X is conflict-free, i.e., [X] ; S.cf ; [X] = ∅.
X is S.rf-complete, i.e., X ∩ S.R ⊆ codom([X] ; S.rf).
X contains only visible events of S, i.e., X ⊆ S.Vis.
X is hb-downward-closed, i.e., dom(S.hb ; [X]) ⊆ X.

Given an event structure S and an extracted subset of its events X, it is possible to associate with X an execution graph G simply by restricting the corresponding components of S to X:

G.E = X    G.tid = S.tid|_X    G.lab = S.lab|_X
G.po = [X] ; S.po ; [X]    G.rf = [X] ; S.rf ; [X]    G.co = [X] ; S.co ; [X]

We say that such an execution graph G is associated with X and that it is extracted from the event structure S.

Weakestmo additionally defines another consistency predicate to further filter out some of the extracted execution graphs. In the
Weakestmo fragment we consider, this additionalconsistency predicate is trivial—every extracted execution satisfies it—and so we do notpresent it here. In the full model, execution consistency checks atomicity of read-modify-writeinstructions, and sequential consistency for SC accesses.
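The extraction conditions of Def. 7 are likewise directly checkable. Here is a Python sketch (our encoding; set and relation names follow the paper, the concrete events are hypothetical):

```python
# Sketch: the extraction conditions of Def. 7 for a candidate set X:
# conflict-freedom, rf-completeness, visibility, hb-downward-closure.

def extracted(X, cf, rf, R, Vis, hb):
    conflict_free = not any((a, b) in cf for a in X for b in X)
    rf_complete   = all(any(w in X and (w, r) in rf for w in X)
                        for r in X & R)
    visible_only  = X <= Vis
    hb_closed     = all(a in X for (a, b) in hb if b in X)
    return conflict_free and rf_complete and visible_only and hb_closed

# Hypothetical three-event structure: Init, a write w, and a read r
# justified from w, with no conflicts; every event is visible.
E  = {"Init", "w", "r"}
rf = {("w", "r")}
hb = {("Init", "w"), ("w", "r"), ("Init", "r")}
print(extracted(E, set(), rf, {"r"}, E, hb))             # True
# Dropping w breaks both rf-completeness and hb-closure.
print(extracted({"Init", "r"}, set(), rf, {"r"}, E, hb))  # False
```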
Weakestmo
In this section, we outline our correctness proof for the compilation from
Weakestmo to thevarious hardware models. As already mentioned, our proof utilizes
IMM [20]. In the following,we briefly present
IMM for the fragment of the model containing only relaxed reads andwrites (§4.1), our simulation relation (§4.2) for the compilation from
Weakestmo to IMM ,and outline the argument as to why the simulation relation is preserved (§4.3). Mappingfrom
IMM to the hardware models has already been proved correct by Podkopaev et al. [20],so we do not present this part here. Later, in §5, we will extend the
IMM mapping results tocover SC accesses.As a further motivating example for this section consider yet another variant of the loadbuffering program shown in Fig. 5. As we will see, its annotated weak behavior is allowed by
IMM and also by
Weakestmo , albeit in a different way. The argument for constructing the
Weakestmo event structure that exhibits the weak behavior from the given
IMM executiongraph is non-trivial.
Figure 5: A variant of the load-buffering program (left) and the IMM graph G corresponding to its annotated weak behavior (right). The program is

r₁ := [x]    ∥  r₂ := [y]
[y] := r₁    ∥  r₃ := [z]
[z] := 1     ∥  [x] := r₃

and the graph consists of the initialization events together with events e₁: R(x), e₂: W(y), e₃: W(z) of thread 1 and e₄: R(y), e₅: R(z), e₆: W(x) of thread 2, connected by rf and ppo edges (the annotated values are omitted here).

IMM
In order to discuss the proof, we briefly present a simplified version of the formal
IMM definition, where we have omitted constraints about RMW accesses and fences.

▶ Definition 8. An IMM execution graph G is an execution graph (Def. 6) extended with one additional component: the preserved program order ppo ⊆ [R] ; po ; [W].

Preserved program order edges correspond to syntactic dependencies guaranteed to be preserved by all major hardware platforms. For example, the execution graph in Fig. 5 has two ppo edges corresponding to the data dependencies via registers r₁ and r₃. (The full IMM definition [20] distinguishes between the different types of dependencies (control, data, address) and includes them as separate components of execution graphs. In the full model, ppo is actually derived from the more basic dependencies.)
IMM-consistency checks completeness, coherence, and acyclicity:

▶ Definition 9. An IMM execution graph G is IMM-consistent if

codom(G.rf) = G.R,  (completeness)
G.hb ; G.eco^? is irreflexive, and  (coherence)
G.rf ∪ G.ppo is acyclic.  (no-thin-air)

As we can see, the execution graph G of Fig. 5 is IMM-consistent because every read of the graph reads from some write event and, moreover, the coherence and no-thin-air properties hold.
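The (no-thin-air) condition is a plain graph-acyclicity check on rf ∪ ppo. Below is a Python sketch (ours); the event names and edges are one reading of the Fig. 5 graph, under the annotated outcome in which thread 1 reads x from thread 2's write.

```python
# Sketch: checking (no-thin-air) of Def. 9, i.e., acyclicity of rf ∪ ppo,
# via an iterative depth-first search over the union of the edge sets.

def acyclic(edges):
    succs = {}
    for a, b in edges:
        succs.setdefault(a, []).append(b)
    visited, done = set(), set()
    for root in list(succs):
        if root in done:
            continue
        visited.add(root)
        stack = [(root, iter(succs.get(root, [])))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                done.add(node)
                stack.pop()
            elif nxt in visited and nxt not in done:
                return False            # back edge: cycle found
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(succs.get(nxt, []))))
    return True

# Our reading of Fig. 5: e1..e3 in thread 1 (R x; W y; W z),
# e4..e6 in thread 2 (R y; R z; W x).
rf  = {("e2", "e4"), ("e3", "e5"), ("e6", "e1")}
ppo = {("e1", "e2"), ("e5", "e6")}
print(acyclic(rf | ppo))  # True: the constant write to z breaks the cycle
```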
Weakestmo to IMM
Proof
In this section, we define the simulation relation I , which is used for the simulation of atraversal of an IMM -consistent execution graph by a
Weakestmo event structure presented in§2.3.The way we define I ( prog , G, h C, I i , S, X ) induces a strong connection between events inthe execution graph G and the event structure S . We make this connection explicit with thefunction s2g G,S : S. E → G. E , which maps events of the event structure S into the events ofthe execution graph G , such that e and s2g G,S ( e ) belong to the same thread and have the Again, this is a simplified presentation for a fragment of the model. We refer the reader to Podkopaev et al. [20] for the full definition, which further distinguishes between internal and external rf edges. A refined version of the simulation relation for the full
Weakestmo model can be found in [17, Appendix A].
same po-position in the thread. Note that s2g
G,S is defined for all events e ∈ S. E , meaningthat the event structure S does not contain any redundant events that do not correspond toevents in the IMM execution graph G . The function s2g G,S , however, does not have to beinjective: in particular, events e and e that are in immediate conflict in S have the same s2g G,S -image in G . In the rest of the paper, whenever G and S are clear from the context,we omit the G, S subscript from s2g .In the context of a function s2g (for some G and S ), we also use (cid:86) · (cid:87) and (cid:84) · (cid:85) to lift s2g to sets and relations:for A S ⊆ S. E : (cid:86) A S (cid:87) (cid:44) { s2g ( e ) | e ∈ A S } for A G ⊆ G. E : (cid:84) A G (cid:85) (cid:44) { e ∈ S. E | s2g ( e ) ∈ A G } for R S ⊆ S. E × S. E : (cid:86) R S (cid:87) (cid:44) {h s2g ( e ) , s2g ( e ) i | h e, e i ∈ R S } for R G ⊆ G. E × G. E : (cid:84) R G (cid:85) (cid:44) {h e, e i ∈ S. E × S. E | h s2g ( e ) , s2g ( e ) i ∈ R G } For example, (cid:84) C (cid:85) denotes a subset of S ’s events whose s2g -images are covered events in G ,and (cid:86) S. rf (cid:87) denotes a relation on events in G whose s2g -preimages in S are related by S. rf .We define the relation I ( prog , G, h C, I i , S, X ) to hold if the following conditions are met: G is an IMM -consistent execution of prog . S is a Weakestmo -consistent event structure of prog . X is an extracted subset of S . S and X corresponds precisely to all covered and issued events and their po -predecessors: (cid:86) S. E (cid:87) = (cid:86) X (cid:87) = C ∪ dom ( G. po ? ; [ I ])(Note that C is closed under po -predecessors, so dom ( G. po ? ; [ C ]) = C .) Each S event has the same thread, type, modifier, and location as its corresponding G event. In addition, covered and issued events in X have the same value as theircorresponding ones in G . a. ∀ e ∈ S. E . S. { tid , typ , loc , mod } ( e ) = G. { tid , typ , loc , mod } ( s2g ( e )) b. ∀ e ∈ X ∩ (cid:84) C ∪ I (cid:85) . S. val ( e ) = G. 
val ( s2g ( e )) Program order in S corresponds to program order in G : (cid:86) S. po (cid:87) ⊆ G. po Identity relation in G corresponds to identity or conflict relation in S : (cid:84) id (cid:85) ⊆ S. cf ? Reads in S are justified by writes that have already been observed by the correspondingevents in G . Moreover, covered events in X are justified by a write corresponding to thatread from the corresponding read in G : a. (cid:86) S. jf (cid:87) ⊆ G. rf ? ; G. hb ? Here we assume existence and uniqueness of such a function. In our Coq development [17], we have adifferent representation of execution graph events (but the same for events of event structures), whichmakes the existence and uniqueness questions trivial.More specifically, we follow Podkopaev et al. [20, §2.2]. There each non-initializing event e of an executiongraph G is encoded as a pair h t, n i where t is e ’s thread and n is a serial number of e in thread t , i.e., aposition of e in G. po restricted to events of thread t ; each initializing event is encoded by the correspondinglocation— h init l i .In this representation, the function s2g G,S for an event e returns (i) the e ’s thread and a number ofnon-initial events which S. po -preceded e if e is non-initialing or (ii) its location if it is initializing: s2g G,S ( e ) (cid:44) (cid:26) h S. tid ( e ) , | dom ([ S. E \ S. Init ]; S. po ; [ e ]) |i for e S. Init h init S. loc ( e ) i for e ∈ S. Init
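Under the event representation described in the footnote above, s2g is directly computable: a non-initial event is mapped to its thread paired with the number of its non-initial po-predecessors, and an initial event to its location. A Python sketch of that encoding (our own, mirroring the footnote):

```python
# Sketch: the s2g encoding from the footnote. A non-initial event e maps
# to (tid(e), |dom([E \ Init] ; po ; [e])|); an initial event maps to
# its location.

def s2g(e, tid, loc, po, Init):
    if e in Init:
        return ("init", loc[e])
    serial = len({a for (a, b) in po if b == e and a not in Init})
    return (tid[e], serial)

# A thread (id 1) with three events a, b, c; po is transitively closed
# and includes the initialization event i.
Init = {"i"}
po   = {("i", "a"), ("i", "b"), ("i", "c"),
        ("a", "b"), ("a", "c"), ("b", "c")}
tid  = {"a": 1, "b": 1, "c": 1}
loc  = {"i": "x"}
print([s2g(e, tid, loc, po, Init) for e in ("i", "a", "b", "c")])
# [('init', 'x'), (1, 0), (1, 1), (1, 2)]
```

Two immediately conflicting events have the same set of non-initial po-predecessors, so they receive the same serial number and hence the same s2g-image, as required.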
Figure 6
The execution graph G , its traversal configuration T C a , the related event structure S a ,and the selected execution X a . Covered events are marked by and issued ones by . Eventsbelonging to the selected execution are marked by . b. (cid:86) S. jf ; [ X ∩ (cid:84) C (cid:85) ] (cid:87) ⊆ G. rf Every write event justifying some external read event should be S. ew -equal to some issuedwrite event in X : dom ( S. jfe ) ⊆ dom ( S. ew ; [ X ∩ (cid:84) I (cid:85) ]) Equal writes in S correspond to the same write event in G : (cid:86) S. ew (cid:87) ⊆ id Every non-trivial S. ew equivalence class contains an issued write in X : S. ew ⊆ ( S. ew ; [ X ∩ (cid:84) I (cid:85) ] ; S. ew ) ? Coherence edges in S correspond to coherence or identity edges in G . (We will explain in§4.3 why a coherence edge in S might correspond to an identity edge in G .) (cid:86) S. co (cid:87) ⊆ G. co ? As an example, consider the execution G from Fig. 5, the traversal configuration T C a (cid:44) h{ Init } , { Init , e }i , and the event structure S a shown in Fig. 6. We will show that I ( prog , G, T C a , S a , X a ), where X a (cid:44) S a . E , holds.Take s2g G,S a = { Init Init , e e , e e , e e } . Given that cf = ew = ∅ , theconsistency constraints hold immediately. For example, condition 8 holds because e isjustified by Init , which happens before it. Finally, note that only e and e are required tohave the same value by constraint 5, the other related thread events only need to have thesame type and address.The definition of the simulation relation I renders the proofs of Lemmas 2 and 4 straight-forward. Specifically, for Lemma 2, the initial configuration T C init ( G ) containing only theinitialization events is simulated by the initial event structure S init as all the constraints aretrivially satisfied ( S init . po = S init . jf = S init . ew = S init . 
co = ∅ ).For Lemma 4, since T C final ( G ) covers all events of G , property 5 implies that the labelsof the events in X are equal to the corresponding events of G ; property 6 means that po isthe same between them; property 8 means that rf is the same between them; properties 7and 12 together mean that co is the same. Therefore, G and the execution corresponding to X are isomorphic. We next outline the proof of Lemma 3, which states that the simulation relation I can berestored after a traversal step. . Moiseenko et al. 5:17 Init e : R ( x, e : W ( y, e : W ( z, e : R ( y, e : R ( z, e : W ( x, T C b vf vfvf ppo ppo Init e : R ( x, e : W ( y, e : W ( z, e : R ( y, e : R ( z, e : W ( x, jf jfjf The event structure S b andthe selected execution X b Figure 7
The traversal configuration
T C b , the related event structure S b , and the selectedexecution X b . Suppose that I ( prog , G, T C, S, X ) holds for some prog , G , T C , S , and X , and we needto simulate a traversal step T C −→ T C that either covers or issues an event of thread t . Then we need to produce an event structure S and a subset of its events X such that I ( prog , G, T C , S , X ) holds. Whenever thread t has any uncovered issued write events, Weakestmo might need to take multiple steps from S to S so as to add any missing events po -before the uncovered issued writes of thread t . Borrowing the terminology of the “promisingsemantics” [12], we refer to these steps as constructing a certification branch for the issuedwrite(s).Before we present the construction, let us return to the example of Fig. 5. Considerthe traversal step from configuration T C a to configuration T C b (cid:44) h{ Init } , { Init , e , e }i byissuing the event e (see Fig. 7). To simulate this step, we need to show that it is possibleto execute instructions of thread 2 and extend the event structure with a set of events Br b matching these instructions. As we have already seen, the labels of the new events can differfrom their counterparts in G —they only have to agree for the covered and issued events. Inthis case, we set Br b = { e , e , e } , and adding them to the event structure S a gives usevent structure S b shown in Fig. 
7.In more detail, we need to build a run of thread-local semantics prog (2) e −−→ e −−→ e −−→ σ such that (1) it contains events corresponding to all the events of thread 2 up to e ( i.e., e , e , e ) with the same location, type, and thread identifier and (2) any events correspondingto covered or issued events ( i.e., e ) should also have the same value as the correspondingevent in G .Then, following the run of the thread-local semantics, we should extend the event structure S a to S b by adding new events Br b , and ensure that the constructed event structure S b isconsistent (Def. 5) and simulates the configuration T C b . In particular, it means that:for each read event in Br b we need to pick a justification write event, which is eitheralready present in S or po -preceed the read event;for each write event in Br b we should determine its position in co order of the eventstructure.Finally, we need to update the selected execution by replacing all events of thread 2 by thenew events Br b : X b (cid:44) X a \ S. thread ( { } ) ∪ Br b . E C O O P 2 0 2 0 :18 Reconciling Event Structures with Modern Multiprocessors
In order to determine whence these read events should be justified (and hence what valuethey should return), we have adopted the approach of Podkopaev et al. [20] for a similarproblem with certifying promises in the compilation proof from PS to IMM . The constructionrelies on several auxiliary definitions.First, given an execution G and a traversal configuration h C, I i , we define the set of determined events to be those events of G that must have equal counterparts in S . Inparticular, this means that S should assign to these events the same label as G , and thus thesame reads-from source for the read events. G. determined h C,I i (cid:44) C ∪ I ∪ dom (( G. rf ∩ G. po ) ? ; G. ppo ; [ I ]) ∪ codom ([ I ] ; ( G. rf ∩ G. po ))Besides covered and issued events, the set of determined events also contains the ppo -prefixesof issued events, since issued events may depend on their values, as well as any internal readsreading from issued events, since their values are also determined by the issued events.For the graph G and traversal configuration T C b , the set of determined events containsevents e , e , and e . (The events e and e are issued, whereas e has a ppo edge to e .)In contrast, events e , e , and e are not determined, since their corresponding events in S read/write a different value.Second, we introduce the viewfront relation ( vf ) to contain all the writes that have beenobserved at a certain point in the graph. That is, the edge h w, e i ∈ G. vf T C indicates thatthe write w either happens before e , is read by a covered event happening before e , or isread by a determined read earlier in the same thread as e . G. vf h C,I i (cid:44) [ G. W ] ; ( G. rf ; [ C ]) ? ; G. hb ? ∪ G. rf ; [ G. determined h C,I i ] ; G. po ? Figure 7 depicts three G. vf T C b edges. Since G. vf T C ; G. po ⊆ G. vf T C , the other incomingviewfront edges to thread 2 can be derived. 
Note that there is no edge from e to thread 2,since e neither happens before any event in thread 2 nor is read by any determined read.Finally, we construct the stable justification relation ( sjf ) that helps us justify the readevents in Br b in the event structure: G. sjf T C (cid:44) ([ G. W ] ; ( G. vf T C ∩ = G. loc ) ; [ G. R ]) \ ( G. co ; G. vf T C )It relates a read event r to the co -last ‘observed’ write event with same location. Assumingthat G is IMM -consistent, it can be shown that G. sjf agrees with G. rf on the set ofdetermined reads. G. sjf T C ; [ G. determined T C ] ⊆ G. rf For the graph G and traversal configuration T C b shown in Fig. 7 the sjf relation coincideswith the depicted vf edges: i.e., we have h Init , e i , h Init , e i , h e , e i ∈ G. sjf T C b .Having sjf T C b as a guide for values read by instructions in the certification run, weconstruct the steps of the thread-local operational semantics prog (2) −→ ∗ σ using thereceptiveness property of the thread’s semantics, which essentially says that given an executiontrace τ = e , ... , e n of the thread semantics, and a subset of events K ⊆ { e , ... , e n − } alongthat trace that have no ppo -successors in the graph, we arbitrarily change the values of readevents in K , and there exist values for the write events in K such that the updated executiontrace is also a trace of the thread semantics. The formal definition of the receptiveness property is quite elaborate. For the detailed definition werefer the reader to the Coq development of
IMM [7].
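The stable justification relation sjf defined above can also be computed directly from vf and co: among the observed same-location writes for a read, keep only those not co-overwritten by another observed write. A Python sketch (ours; event names hypothetical):

```python
# Sketch: G.sjf = ([W] ; (vf ∩ =loc) ; [R]) \ (co ; vf), relating each
# read to the co-maximal observed write on its location.

def sjf(W, R, loc, vf, co):
    observed = {(w, r) for (w, r) in vf
                if w in W and r in R and loc[w] == loc[r]}
    overwritten = {(w, r) for (w, r) in observed
                   if any((w, w2) in co and (w2, r) in vf for w2 in W)}
    return observed - overwritten

# Two writes to x, w1 co-before w2, both observed by a read r: only the
# co-later write w2 stably justifies r.
W, R = {"w1", "w2"}, {"r"}
loc  = {"w1": "x", "w2": "x", "r": "x"}
vf   = {("w1", "r"), ("w2", "r")}
co   = {("w1", "w2")}
print(sjf(W, R, loc, vf, co))  # {('w2', 'r')}
```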
Figure 8: The traversal configuration TC_c, the related event structure S_c, and the selected execution X_c.

The relation sjf
T C b is also used to pick justification writes for the read events in Br b . Wehave proved that each sjf edge either starts in some issued event (of the previous traversalconfiguration) or it connects two events that are related by po : G. sjf T C b ⊆ [ I a ] ; G. sjf T C b ∪ G. po In the former case, thanks to the property 4 of our simulation relation, we can pick awrite event from X a corresponding to the issued write ( e.g., for Fig. 7, it is the event e ,corresponding to the issued write e ). In the latter case, we pick either the initial write orsome S b . po preceding write belonging to Br b . In order to pick the S b . co position of the new write events in the updated event structure, wegenerally follow the original G. co order of the IMM graph. Because of the conflicting events,however, it is not always possible to preserve the inclusion between the relations. This iswhy we relax the inclusion to (cid:86) S. co (cid:87) ⊆ G. co ? in property 12 of the simulation relation.To see the problem let us return to the example. Suppose that the next traversal stepcovers the read e . To simulate this step, we build an event structure S c (see Fig. 8). Itcontains the new events Br c (cid:44) { e , e , e } .Consider the write events e and e of the event structure. Since the events havedifferent labels, we cannot make them ew -equivalent. And since S c . co should be total amongall writes to the same location (with respect to S c . ew ), we must put a co edge between thesetwo events in one direction or another. Note that events e and e correspond to the sameevent e in the graph, thus we cannot use the coherence order of the graph G. co to guideour decision.In fact, the co -order between these two events does not matter, so we could pick eitherdirection. For the purposes of our proofs, however, we found it more convenient to alwaysput the new events earlier in the co order (thus we have h e , e i ∈ S c . co ). 
Thereby we canshow that the co edges of the event structure ending in the new events, have correspondingedges in the graph: (cid:86) S c . co ; [ Br c ] (cid:87) ⊆ G. co .Now consider the events e and e . Since these events have the same label and correspondto the same event in G , we make them ew -equivalent. In fact, this choice is necessary for thecorrectness of our construction. Otherwise, the new events Br c would be deemed invisible,because of the S c . cf ∩ ( S c . jfe ; ( S c . po ∪ S c . jf ) ∗ ) path between e and e . Recall that onlythe visible events can be used to extract an execution from the event structure (Def. 7). E C O O P 2 0 2 0 :20 Reconciling Event Structures with Modern Multiprocessors
In general, assuming that I ( prog , G, h C, I i , S, X ) holds, we attach the new write event e to an S. ew equivalence class represented by the write event w , s.t. (i) w has the same s2g image as e , i.e., s2g ( w ) = s2g ( e ); (ii) w belongs to X and its s2g image is issued, that is w ∈ X ∩ (cid:84) I (cid:85) . If there is no such an event w , we put e S. co -after events such that their s2g images are ordered G. co -before s2g ( e ), and S. co -before events such that their s2g imagesare equal to s2g ( e ) or ordered G. co -after it. Note that thanks to property 9 of the simulationrelation, that is dom ( S. jfe ) ⊆ dom ( S. ew ; [ X ∩ (cid:84) I (cid:85) ]), our choice of ew guarantees that allnew events will be visible. To sum up, to prove Lemma 3, we consider the events of G. thread ( { t } ) where t is thethread of the event issued or covered by the traversal step T C −→ T C , together with the sjf relation determining the values of the read events. At this point, we can show that I -conditions for the new configuration T C hold for all events except for those in thread t .Because of receptiveness, there exists a sequence of the thread steps prog ( t ) −→ ∗ σ forsome thread state σ such that the labels on this sequence match the events G. thread ( { t } )with the labels determined by sjf , and include an event with the same label as the oneissued or covered by the traversal step T C −→ T C .We then do an induction on this sequence of steps, and add each event to the eventstructure S and to its selected subset of events X (unless already there), showing along theway that the I -conditions also hold for the updated event structure, selected subset, andthe events added. At the end, when we have considered all the events generated by thestep sequence, we will have generated the event structure S and execution X such that I ( prog , G, T C , S , X ) holds. In this section, we briefly describe the changes needed in order to handle the compilationof
Weakestmo ’s sequentially consistent (SC) accesses. The purpose of SC accesses is toguarantee sequential consistency for the simple programming pattern that uses exclusivelySC accesses to communicate between threads. As Lahav et al. [14] showed, however, theirsemantics is quite complicated because they can be freely mixed with non-SC accesses.We first define an extension of
IMM , which we call
IMM SC . Its consistency extends thatof IMM with an additional acyclicity requirement concerning SC accesses, which is takendirectly from
RC11-consistency [14, Definition 1].

▶ Definition 10. An execution graph G is IMM_SC-consistent if it is IMM-consistent [20, Definition 3.11] and G.psc_base ∪ G.psc_F is acyclic, where:

G.scb ≜ G.po ∪ G.po|≠loc ; G.hb ; G.po|≠loc ∪ G.hb|loc ∪ G.co ∪ G.fr
G.psc_base ≜ ([G.E^sc] ∪ [G.F^sc] ; G.hb^?) ; G.scb ; ([G.E^sc] ∪ G.hb^? ; [G.F^sc])
G.psc_F ≜ [G.F^sc] ; (G.hb ∪ G.hb ; G.eco ; G.hb) ; [G.F^sc]

(In IMM_SC, event labels include an "access mode", where sc denotes an SC access. The set G.E^sc consists of all SC accesses (reads, writes, and fences) in G, and G.F^sc consists of all SC fences in G.)

The scb, psc_base, and psc_F relations were carefully designed by Lahav et al. [14] (and recently adopted by the C++ standard) so that they provide strong enough guarantees for programmers while being weak enough to support the intended compilation of SC accesses to commodity hardware. In particular, a previous (simpler) proposal in [2], which essentially includes G.hb between SC accesses in the relation required to be acyclic, is too strong for efficient compilation to the POWER architecture. Indeed, the compilation schemes to POWER do not enforce a strong barrier on hb-paths between SC accesses, but rather on G.po ; G.hb ; G.po-paths between SC accesses.

▶ Remark 11.
The full IMM model (i.e., including release/acquire accesses and SC fences, as defined by Podkopaev et al. [20]) forbids cycles in rfe ∪ ppo ∪ bob ∪ psc_F, where bob is (similar to ppo) a subset of the program order that must be preserved due to the presence of a memory fence or release/acquire access. Since psc_F is already included in IMM's acyclicity constraint, one may consider the natural option of including psc_base in that acyclicity constraint as well. However, this leads to a model that is too strong, as it forbids the following behavior:

a := [x]_rlx ; [y]_sc := 1   ∥   [y]_sc := 2   ∥   b := [y]_rlx ; [x]_rlx := b

(The accompanying execution graph, garbled in this version, shows the events R_rlx(x), W_sc(y, 1), W_sc(y, 2), R_rlx(y), and W_rlx(x) connected by bob, coe, psc_base, ppo, and rfe edges forming the cycle in question.)

This behavior is allowed by POWER (using either of the two intended compilation schemes for SC accesses; see §5.1.2).

Adapting the compilation from Weakestmo to IMM SC to cover SC accesses is straightforward because the full definition of Weakestmo [6] does not have any additional constraints about SC accesses at the level of event structures. It only has an SC constraint at the level of extracted executions, which is actually the same as in RC11 and which we took as is for IMM SC.

IMM SC to Hardware

In this section, we describe the extension of the results of [20] to support SC accesses with their intended compilation schemes to the different architectures. As was done in [20], since IMM SC and the models of hardware we consider are all defined in the same declarative framework (using execution graphs), we formulate our results at the level of execution graphs. Thus, we actually consider the mapping of IMM SC execution graphs to target-architecture execution graphs that is induced by compilation of IMM SC programs to machine programs. Hence, roughly speaking, for each architecture α ∈ {TSO, POWER, ARMv7, ARMv8}, our (mechanized) result takes the following form:

If the α-execution-graph G_α corresponds to the IMM SC-execution-graph G, then α-consistency of G_α implies IMM SC-consistency of G.

Since the mapping from Weakestmo to IMM SC (on the program level) is the identity mapping (Theorem 1), we obtain as a corollary the correctness of the compilation from Weakestmo to each architecture α that we consider. The exact notions of correspondence between G_α and G are presented in [17, Appendices B, C, and D]. The mapping of IMM SC to each architecture follows the intended compilation scheme of C/C++11 [16, 14], and extends the corresponding mappings of IMM from Podkopaev et al. [20] with the mapping of SC reads and writes. Next, we schematically present these extensions.
TSO
There are two alternative sound mappings of SC accesses to x86-TSO:

Fence after SC writes:             Fence before SC reads:
(|R_sc|) ≜ mov                     (|R_sc|) ≜ mfence;mov
(|W_sc|) ≜ mov;mfence              (|W_sc|) ≜ mov
(|RMW_sc|) ≜ (lock) xchg           (|RMW_sc|) ≜ (lock) xchg

The first, which is implemented in mainstream compilers, inserts an mfence after every SC write, whereas the second inserts an mfence before every SC read. Importantly, one should globally apply one of the two mappings to ensure the existence of an mfence between every SC write and every subsequent SC read.
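The need to apply one scheme globally can be checked mechanically on compiled instruction sequences. The sketch below is our own illustration (the helper names are invented): it compiles SC accesses under each scheme and tests that an mfence separates every SC write from every later SC read, a property that fails when the two schemes are mixed:

```python
def compile_tso(accesses, scheme):
    """Compile a list of SC accesses ("Wsc"/"Rsc"); tag each instruction."""
    out = []
    for kind in accesses:
        if scheme == "fence-after-writes" and kind == "Wsc":
            out += [("mov", kind), ("mfence", None)]
        elif scheme == "fence-before-reads" and kind == "Rsc":
            out += [("mfence", None), ("mov", kind)]
        else:
            out += [("mov", kind)]
    return out

def write_read_fenced(compiled):
    """Is there an mfence between every SC write and every later SC read?"""
    for i, (_, ki) in enumerate(compiled):
        for j in range(i + 1, len(compiled)):
            _, kj = compiled[j]
            if ki == "Wsc" and kj == "Rsc":
                if not any(ins == "mfence" for ins, _ in compiled[i + 1 : j]):
                    return False
    return True

accs = ["Wsc", "Rsc", "Wsc", "Rsc"]
assert write_read_fenced(compile_tso(accs, "fence-after-writes"))
assert write_read_fenced(compile_tso(accs, "fence-before-reads"))

# Mixing the schemes across accesses loses the guarantee: an unfenced
# SC write followed by an unfenced SC read.
mixed = compile_tso(["Wsc"], "fence-before-reads") + \
        compile_tso(["Rsc"], "fence-after-writes")
assert not write_read_fenced(mixed)
```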
POWER
There are two alternative sound mappings of SC accesses to POWER:

Leading sync:                           Trailing sync:
(|R_sc|) ≜ sync; (|R_acq|)              (|R_sc|) ≜ ld;sync
(|W_sc|) ≜ sync;st                      (|W_sc|) ≜ (|W_rel|);sync
(|RMW_sc|) ≜ sync; (|RMW_acq|)          (|RMW_sc|) ≜ (|RMW_rel|);sync

The first scheme inserts a sync before every SC access, while the second inserts a sync after every SC access. Importantly, one should globally apply one of the two mappings to ensure the existence of a sync between every two SC accesses.

Observing that sync is the result of mapping an SC fence to POWER, we can reuse the existing proof for the mapping of IMM to POWER. To handle the leading-sync (respectively, trailing-sync) scheme, we introduce a preceding step, in which we prove that splitting every SC access in the whole execution graph into a pair of an SC fence followed (respectively, preceded) by a release/acquire access is a sound transformation under IMM SC. That is, this global execution-graph transformation cannot make an inconsistent execution consistent:

Theorem 12. Let G be an execution graph such that
[R_sc ∪ W_sc] ; (G.po′ ∪ G.po′ ; G.hb ; G.po′) ; [R_sc ∪ W_sc] ⊆ G.hb ; [F_sc] ; G.hb,
where G.po′ ≜ G.po \ G.rmw. Let G′ be the execution graph obtained from G by weakening the access modes of SC write and read events to release and acquire modes respectively. Then, IMM SC-consistency of G follows from IMM-consistency of G′.

Having this theorem, we can think of the mapping of IMM SC to POWER as consisting of three steps, and we establish the correctness of each of them separately:
1. At the IMM SC level, we globally split each SC access into an SC fence and a release/acquire access. Correctness of this step follows by Theorem 12.
2. We map IMM to POWER, whose correctness follows by the existing results of [20], since we no longer have SC accesses at this stage.
3. We remove any redundant fences introduced by the previous step. Indeed, following the leading-sync scheme, we obtain sync;lwsync;st for an SC write. The lwsync is redundant here, since sync provides stronger guarantees than lwsync, and can be removed. Similarly, following the trailing-sync scheme, we obtain ld;cmp;bc;isync;sync for an SC read. Again, the sync makes the other synchronization instructions redundant.
ARMv7
The ARMv7 model [1] is very similar to the POWER model, the main difference being that it has a weaker preserved program order than POWER. However, Podkopaev et al. [20] proved the correctness of the IMM-to-POWER compilation without relying on POWER's preserved program order explicitly, instead assuming the weaker ARMv7 version of that order. Thus, their proof also establishes correctness of compilation from IMM to ARMv7.

Extending the proof to cover SC accesses follows the same scheme discussed for POWER, since the two intended mappings of SC accesses for ARMv7 are the same except for replacing POWER's sync fence with ARMv7's dmb:

Leading dmb:                            Trailing dmb:
(|R_sc|) ≜ dmb; (|R_acq|)               (|R_sc|) ≜ ldr;dmb
(|W_sc|) ≜ dmb;str                      (|W_sc|) ≜ (|W_rel|);dmb
(|RMW_sc|) ≜ dmb; (|RMW_acq|)           (|RMW_sc|) ≜ (|RMW_rel|);dmb

ARMv8
Since ARMv8 has added dedicated instructions to support C/C++-style SC accesses, we have established the correctness of a mapping employing these new instructions:

(|R_sc|) ≜ LDAR
(|W_sc|) ≜ STLR
(|FADD_sc|) ≜ L:LDAXR;STLXR;BC L
(|CAS_sc|) ≜ L:LDAXR;CMP;BC Le;STLXR;BC L;Le:

We note that in this mapping, we follow Podkopaev et al. [20] and compile RMW operations to loops with load-linked and store-conditional instructions (LDX/STX). An alternative mapping for RMWs would be to use single hardware instructions, such as LDADD and CAS, that directly implement the required functionality. Unfortunately, due to a limitation of the current IMM setup and unclarity about the exact semantics of the CAS instruction, we are not able to prove the correctness of the alternative mapping employing these instructions. The problem is that IMM assumes that every po-edge from an RMW instruction is preserved, which holds for the mapping of CAS using the aforementioned loop, but not necessarily using the single instruction.
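The constraint at issue can be phrased as an inclusion between edge sets. The following sketch is a schematic rendering with invented edge sets, not IMM's actual definitions: IMM requires the po edge from an RMW to a later access to land in the preserved order; the LL/SC loop's status-checking branch yields a control dependency that provides such an edge, while a hypothetical single-instruction LDADD mapping yields none:

```python
# One thread: an RMW followed in program order by a relaxed write "w".
po_from_rmw = {("rmw", "w")}       # the edge IMM assumes is preserved

# LL/SC loop: the conditional branch that checks the store-conditional
# result creates a control dependency from the RMW to later instructions.
preserved_llsc = {("rmw", "w")}

# Single LDADD: no branch, hence no dependency ordering the later write.
preserved_ldadd = set()

# The loop mapping satisfies IMM's assumption; the single-instruction
# mapping need not, which is exactly the obstacle described above.
assert po_from_rmw <= preserved_llsc
assert not (po_from_rmw <= preserved_ldadd)
```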
Related Work

While there are several memory model definitions both for hardware architectures [1, 10, 18, 22, 23] and for programming languages [3, 4, 11, 15, 19, 21] in the literature, there are relatively few compilation correctness results [6, 9, 12, 14, 20, 25].

Most of these compilation results do not tackle any of the problems caused by po ∪ rf cycles, which are the main cause of complexity in establishing correctness of compilation mappings to hardware architectures. A number of papers (e.g., [6, 12, 25]) consider only hardware models that forbid such cycles, such as x86-TSO [18] and "strong POWER" [13], while others (e.g., [9]) consider compilation schemes that introduce fences and/or dependencies so as to prevent po ∪ rf cycles. The only compilation results with some non-trivial interplay of dependencies are by Lahav et al. [14] and by Podkopaev et al. [20].

The former paper [14] defines the RC11 model (repaired C11) and establishes a number of results about it, most of which are not related to compilation. The only relevant result is its pencil-and-paper correctness proof of a compilation scheme from RC11 to POWER that adds a fence between relaxed reads and subsequent relaxed writes, but not between non-atomic accesses. As such, the only po ∪ rf cycles possible under the compilation scheme involve a racy non-atomic access. Since non-atomic races have undefined semantics in RC11, whenever there is such a cycle, the proof appeals to receptiveness to construct a different acyclic execution exhibiting the race.

The latter paper [20] introduced IMM and used it to establish correctness of compilation from the "promising semantics" (PS) [12] to the usual hardware models. As already mentioned, IMM's definition catered precisely for the needs of the PS compilation proof, and so did not include important features such as sequentially consistent (SC) accesses. Our compilation proof shares some infrastructure with that proof, namely, the definition of IMM and traversals, but also has substantial differences because PS is quite different from Weakestmo. The main challenges in the PS proof were (1) to encode the various orders of the IMM execution graphs with the timestamps of the PS machine, and (2) to construct the certification runs for each outstanding promise. In contrast, the main technical challenge in the Weakestmo compilation proof is that event structures represent several possible executions of the program together, and that Weakestmo consistency includes constraints that correlate these executions, allowing one execution to affect the consistency of another.
Conclusion

In this paper, we presented the first correctness proof of the mapping from the Weakestmo memory model to a number of hardware architectures. To show correctness of Weakestmo compilation to hardware, we employed IMM [20], which we extended with SC accesses and from which compilation to hardware follows.

Although relying on IMM modularizes the compilation proof and makes it easy to extend to multiple architectures, it does have one limitation. As was discussed in §5.1.4, IMM enforces ordering between RMW events and subsequent memory accesses, while one desirable alternative compilation mapping of RMWs to ARMv8 does not enforce this ordering, which means that we cannot prove soundness of that mapping via the current definition of IMM. We are investigating whether one can weaken the corresponding IMM constraint, so that we can establish correctness of the alternative ARMv8 mapping as well.

Another way to establish correctness of this alternative mapping to ARMv8 may be to use the recently developed Promising-ARM model [23]. Indeed, since Promising-ARM is closely related to PS [12], it should be relatively easy to prove the correctness of compilation from PS to Promising-ARM. Establishing compilation correctness of Weakestmo to Promising-ARM, however, would remain unresolved, because Weakestmo and PS are incomparable [6]. Moreover, a direct compilation proof would probably also be quite difficult because of the rather different styles in which these models are defined.

Acknowledgments.
Evgenii Moiseenko and Anton Podkopaev were supported by RFBR (grant number 18-01-00380). Ori Lahav was supported by the Israel Science Foundation (grant number 5166651), by Len Blavatnik and the Blavatnik Family Foundation, and by the Alon Young Faculty Fellowship.
References
[1] Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1–7:74, July 2014. doi:10.1145/2627752.
[2] Mark Batty, Alastair F. Donaldson, and John Wickerson. Overhauling SC atomics in C11 and OpenCL. In POPL 2016, pages 634–648. ACM, 2016.
[3] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizing C++ concurrency. In POPL 2011, pages 55–66, New York, 2011. ACM. doi:10.1145/1925844.1926394.
[4] John Bender and Jens Palsberg. A formalization of Java's concurrent access modes. Proc. ACM Program. Lang., 3(OOPSLA):142:1–142:28, October 2019. doi:10.1145/3360568.
[5] Hans-J. Boehm and Brian Demsky. Outlawing ghosts: Avoiding out-of-thin-air results. In MSPC 2014, pages 7:1–7:6. ACM, 2014. doi:10.1145/2618128.2618134.
[6] Soham Chakraborty and Viktor Vafeiadis. Grounding thin-air reads with event structures. Proc. ACM Program. Lang., 3(POPL):70:1–70:27, 2019. doi:10.1145/3290383.
[7] The Coq development of IMM, available at http://github.com/weakmemory/imm, 2019.
[8] Will Deacon. The ARMv8 application level memory model, 2017. URL: https://github.com/herd/herdtools7/blob/master/herd/libdir/aarch64.cat.
[9] Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy. Bounding data races in space and time. In PLDI 2018, pages 242–255, New York, 2018. ACM. doi:10.1145/3192366.3192421.
[10] Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. Modelling the ARMv8 architecture, operationally: Concurrency and ISA. In POPL 2016, pages 608–621, New York, 2016. ACM. doi:10.1145/2837614.2837615.
[11] Alan Jeffrey and James Riely. On thin air reads: Towards an event structures model of relaxed memory. In LICS 2016, pages 759–767, New York, 2016. ACM. doi:10.1145/2933575.2934536.
[12] Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising semantics for relaxed-memory concurrency. In POPL 2017, pages 175–189, New York, 2017. ACM. doi:10.1145/3009837.3009850.
[13] Ori Lahav and Viktor Vafeiadis. Explaining relaxed memory models with program transformations. In FM 2016. Springer, 2016. doi:10.1007/978-3-319-48989-6_29.
[14] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. Repairing sequential consistency in C/C++11. In PLDI 2017, pages 618–632, New York, 2017. ACM. doi:10.1145/3062341.3062352.
[15] Jeremy Manson, William Pugh, and Sarita V. Adve. The Java memory model. In POPL 2005, pages 378–391, New York, 2005. ACM. doi:10.1145/1040305.1040336.
[16] C/C++11 mappings to processors, 2016.
[17] Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, and Viktor Vafeiadis. Coq proof scripts and supplementary material for this paper, available at http://plv.mpi-sws.org/weakestmoToImm/, 2020.
[18] Scott Owens, Susmit Sarkar, and Peter Sewell. A better x86 memory model: x86-TSO. In TPHOLs 2009, volume 5674 of LNCS, pages 391–407, Heidelberg, 2009. Springer.
[19] Jean Pichon-Pharabod and Peter Sewell. A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions. In POPL 2016, pages 622–633, New York, 2016. ACM. doi:10.1145/2837614.2837616.
[20] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. Bridging the gap between programming languages and hardware weak memory models. Proc. ACM Program. Lang., 3(POPL):69:1–69:31, 2019. doi:10.1145/3290382.
[21] Anton Podkopaev, Ilya Sergey, and Aleksandar Nanevski. Operational aspects of C/C++ concurrency. CoRR, abs/1606.01400, 2016. URL: http://arxiv.org/abs/1606.01400.
[22] Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell. Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8. Proc. ACM Program. Lang., 2(POPL):19:1–19:29, 2018. doi:10.1145/3158107.
[23] Christopher Pulte, Jean Pichon-Pharabod, Jeehoon Kang, Sung-Hwan Lee, and Chung-Kil Hur. Promising-ARM/RISC-V: A simpler and faster operational concurrency model. In PLDI 2019, pages 1–15, New York, 2019. ACM. doi:10.1145/3314221.3314624.
[24] Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. Common compiler optimisations are invalid in the C11 memory model and what we can do about it. In POPL 2015, pages 209–220, New York, 2015. ACM. doi:10.1145/2676726.2676995.
[25] Jaroslav Ševčík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter Sewell. CompCertTSO: A verified compiler for relaxed-memory concurrency. J. ACM, 60(3):22, 2013. doi:10.1145/2487241.2487248.

A Simulation Relation for the Complete Weakestmo Model
Here we present the simulation relation I_T(prog, G, TC, S, X) and the auxiliary relation I_cert(prog, G, ⟨C, I⟩, ⟨C′, I′⟩, S, X, t, Br, σ, σ′) for the complete Weakestmo memory model. In addition to relaxed accesses, the full versions of the relations handle fences, read-modify-write pairs, release, acquire, and sequentially consistent accesses.

We define the relation I_T(prog, G, ⟨C, I⟩, S, X) to hold if the following conditions are met:
– G is an IMM SC-consistent execution of prog.
– S is a Weakestmo-consistent event structure of prog.
– X is an extracted subset of S.
– The s2g-image of X is equal to the union of the covered and issued events and the events which po-precede the issued ones: ⟦X⟧ = C ∪ dom(G.po? ; [I])
– The s2g-image of every event from a thread t ∈ T lies in C ∪ dom(G.po? ; [I]): ⟦S.thread(T)⟧ ⊆ C ∪ dom(G.po? ; [I])
– Every event of S and its s2g-image have the same thread, type, modifier, and location. Additionally, every event of X which is covered or issued has the same value as its s2g-image:
  a. ∀e ∈ S.E. S.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(s2g(e))
  b. ∀e ∈ X ∩ ⌈C ∪ I⌉. S.val(e) = G.val(s2g(e))
– The s2g-image of S.po is a subset of the G.po relation: ⟦S.po⟧ ⊆ G.po
– The identity relation in G corresponds to the identity or conflict relation in S: ⌈id⌉ ⊆ S.cf?
– The s2g-image of a justification edge is included in paths in G representing observation of the corresponding thread. The s2g-image of a justification edge is in G.rf if the edge ends either in the domain of S.rmw, in an acquire access, or in a read followed by an acquire fence. Moreover, the s2g-image of S.jf edges ending in X matches the simulation reads-from relation:
  a. ⟦S.jf⟧ ⊆ G.rf? ; (G.hb ; [G.F_sc])? ; G.psc_F? ; G.hb?
  b. ⟦S.jf ; S.rmw⟧ ⊆ G.rf ; G.rmw
  c. ⟦S.jf ; (S.po ; [S.F])? ; [S.E^⊒acq]⟧ ⊆ G.rf ; (G.po ; [G.F])? ; [G.E^⊒acq]
  d. ⟦S.jf ; [X]⟧ ⊆ G.sjf(TC)
  Using the last property, it is possible to derive that ⟦S.jf ; [X ∩ ⌈C⌉]⟧ ⊆ G.rf.
– Each write event in S which justifies some read event externally should be S.ew-equal to a write event in X whose s2g-image is issued: dom(S.jfe) ⊆ dom(S.ew ; [X ∩ ⌈I⌉])
– The s2g-image of S.ew is a subset of the identity relation: ⟦S.ew⟧ ⊆ id
– Let w and w′ be different events in one S.ew equivalence class. Then there is w″ in this equivalence class s.t. w″ is in X and s2g(w″) is issued: S.ew ⊆ (S.ew ; [X ∩ ⌈I⌉] ; S.ew)?
– The s2g-image of S.co lies in the reflexive closure of G.co. Additionally, the s2g-images of S.co-edges ending in X ∩ S.thread(T) lie in G.co:
  a. ⟦S.co⟧ ⊆ G.co?
  b. ⟦S.co ; [X ∩ S.thread(T)]⟧ ⊆ G.co
– The s2g-image of S.rmw is in G.rmw. Vice versa, every G.rmw edge ending in the covered set is in the s2g-image of S.rmw ending in X:
  a. ⟦S.rmw⟧ ⊆ G.rmw
  b. G.rmw ; [C] ⊆ ⟦S.rmw ; [X]⟧
– Let e, w, and w′ be events in S s.t. (i) ⟨e, w⟩ is an S.release edge, (ii) w and w′ are in the same S.ew equivalence class, (iii) w′ is in X, and (iv) s2g(w′) is issued. Then e is in X:
  dom(S.release ; S.ew ; [X ∩ ⌈I⌉]) ⊆ X
  This property is needed to show that dom(S.hb \ S.po) is included in X.
– Let r, r′, w, and w′ be events in S s.t. (i) r and r′ are in immediate conflict and justified from w and w′ respectively, and (ii) r′ is in X and its thread is in T. Then s2g(w) is G.co-less than s2g(w′):
  ⟦S.jf ; S.cf_imm ; [X ∩ S.thread(T)] ; S.jf⁻¹⟧ ⊆ G.co
  This property is needed to prove cf_imm-justification on the simulation step.
– For all t ∈ T there exists σ s.t. S.K_C(t) −→*_t σ, and the thread-local execution graph σ.G is equivalent, modulo the rf and co components, to the restriction of G to the thread t.

In addition to I_T, we also define a version of the simulation relation which holds during the construction of a certification branch, I_cert. We define the relation I_cert(prog, G, ⟨C, I⟩, ⟨C′, I′⟩, S, X, t, Br, σ, σ′) to hold if the following conditions are met:
– I_{T\{t}}(prog, G, ⟨C, I⟩, S, X) holds.
– G ⊢ ⟨C, I⟩ −→_t ⟨C′, I′⟩ holds.
– σ and σ′ are thread states s.t. σ′ is reachable from σ, σ corresponds to the S.po-last event in Br, and the partial execution graph of σ′ contains the covered and issued events up to the G.po-last issued write in the thread t:
  a. σ −→*_t σ′
  b. σ.G.E = ⟦Br⟧
  c. σ′.G.E = G.thread(t) ∩ (C′ ∪ dom(G.po? ; [I′]))
– The set Br consists of events from the thread t, and the covered prefixes of Br and of X restricted to thread t coincide:
  a. Br ⊆ S.thread(t)
  b. Br ∩ ⌈C⌉ = X ∩ S.thread(t) ∩ ⌈C⌉
– The partial execution graph of σ′ assigns the same thread identifier, type, location, and mode as the full execution graph G does. Additionally, it assigns the same value as G to determined events:
  a. ∀e ∈ σ′.G.E. σ′.G.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(e)
  b. ∀e ∈ σ′.G.E ∩ G.determined(⟨C′, I′⟩). σ′.G.val(e) = G.val(e)
– The s2g-image of jf edges ending in Br is included in G.sjf(⟨C′, I′⟩):
  ⟦S.jf ; [Br]⟧ ⊆ G.sjf(⟨C′, I′⟩)
– For every issued event from Br there exists an S.ew-equivalent event in X. Symmetrically, every issued event from X within the processed part of the certification branch has an S.ew-equivalent event in Br:
  a. Br ∩ ⌈I⌉ ⊆ dom(S.ew ; [X])
  b. X ∩ ⌈I ∩ σ.G.E⌉ ⊆ dom(S.ew ; [Br])
– The s2g-image of S.co edges ending in Br lies in G.co. The s2g-image of S.co edges ending in X ∩ S.thread(t), and not in the processed part of the certification branch, lies in G.co:
  a. ⟦S.co ; [Br]⟧ ⊆ G.co
  b. ⟦S.co ; [X ∩ S.thread(t) \ ⌈σ.G.E⌉]⟧ ⊆ G.co
– Each G.rmw edge ending in the processed part of the certification branch is the s2g-image of some S.rmw edge ending in Br:
  G.rmw ; [C ∩ σ.G.E] ⊆ ⟦S.rmw ; [Br]⟧
– Suppose w, w′, r, and r′ are events of S s.t. (i) r and r′ are justified from w and w′ respectively, and (ii) r and r′ are in immediate conflict and belong to thread t. Then ⟨s2g(w), s2g(w′)⟩ is in G.co if either r′ is in Br:
  ⟦S.jf ; S.cf_imm ; [Br] ; S.jf⁻¹⟧ ⊆ G.co
  or r is not in Br and r′ is in X ∩ S.thread(t):
  ⟦S.jf ; [S.E \ Br] ; S.cf_imm ; [X ∩ S.thread(t)] ; S.jf⁻¹⟧ ⊆ G.co

Figure 9: Compilation scheme from IMM SC to ARMv8.
(|r := [e]_rlx|) ≈ "ldr"               (|[e]_rlx := e′|) ≈ "str"
(|r := [e]_⊒acq|) ≈ "ldar"             (|[e]_⊒rel := e′|) ≈ "stlr"
(|fence_acq|) ≈ "dmb.ld"               (|fence_≠acq|) ≈ "dmb.sy"
(|r := FADD_o(e₁, e₂)|) ≈ "L:" ++ ld(o) ++ st(o) ++ "bc L"
(|r := CAS_o(e, e_R, e_W)|) ≈ "L:" ++ ld(o) ++ "cmp;bc Le;" ++ st(o) ++ "bc L; Le:"
ld(o) ≜ o ⊒ acq ? "ldaxr;" : "ldxr;"
st(o) ≜ o ⊒ rel ? "stlxr.;" : "stxr.;"

B From IMM SC to ARMv8
The intended mapping of IMM to ARMv8 is presented schematically in Fig. 9 and follows [16]. Note that acquire and SC loads are compiled to the same instruction (ldar), as are release and SC stores (stlr). In ARM assembly, RMWs are represented as pairs of instructions, an exclusive load (ldxr) followed by an exclusive store (stxr), and these instructions also have their stronger (SC) counterparts, ldaxr and stlxr.

We use the ARMv8 declarative model [8] (see also [22]). (We only describe the fragment of the model that is needed for the mapping of IMM SC, thus excluding isb fences.) Its labels are given by:
– ARM read label: R^{o_R}(x, v) where x ∈ Loc, v ∈ Val, o_R ∈ {rlx, Q, A}, and rlx ⊏ Q ⊏ A.
– ARM write label: W^{o_W}(x, v) where x ∈ Loc, v ∈ Val, o_W ∈ {rlx, L}, and rlx ⊏ L.
– ARM fence label: F^{o_F} where o_F ∈ {ld, sy} and ld ⊏ sy.
In turn, ARM's execution graphs are defined as IMM SC's ones, except for the CAS dependency, casdep, which is not present in ARM executions.

The definition of ARMv8-consistency requires the following derived relations (see [22] for further explanations and details):
obs ≜ rfe ∪ fre ∪ coe (observed-by)
dob ≜ (addr ∪ data) ; rfi? ∪ (ctrl ∪ data) ; [W] ; coi? ∪ addr ; po ; [W] (dependency-ordered-before)
aob ≜ rmw ∪ [W_ex] ; rfi ; [R^⊒Q] (atomic-ordered-before)
bob ≜ po ; [F^sy] ; po ∪ [R] ; po ; [F^ld] ; po ∪ [R^⊒Q] ; po ∪ po ; [W^L] ; coi? ∪ [W^L] ; po ; [R^A] (barrier-ordered-before)

Definition 13. An ARMv8 execution graph G_a is called ARMv8-consistent if the following hold:
– codom(G_a.rf) = G_a.R.
– For every location x ∈ Loc, G_a.co totally orders G_a.W(x).
– G_a.po|loc ∪ G_a.rf ∪ G_a.fr ∪ G_a.co is acyclic. (sc-per-loc)
– G_a.rmw ∩ (G_a.fre ; G_a.coe) = ∅. (atomicity)
– G_a.obs ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob is acyclic. (external)

Figure 10: Compilation scheme from IMM SC to TSO.
(|r := [e]_≠sc|) ≈ "mov"           (|fence_≠sc|) ≈ ""
(|[e]_≠sc := e′|) ≈ "mov"          (|fence_sc|) ≈ "mfence"
(|r := FADD_o(e₁, e₂)|) ≈ "(lock) xadd"
(|r := CAS_o(e, e_R, e_W)|) ≈ "(lock) cmpxchg"
Alt. 1: (|r := [e]_sc|) ≈ "mov"            Alt. 2: (|r := [e]_sc|) ≈ "mfence;mov"
        (|[e]_sc := e′|) ≈ "mov;mfence"            (|[e]_sc := e′|) ≈ "mov"

We interpret the intended compilation on execution graphs:
Definition 14. Let G be an IMM execution graph. An ARM execution graph G_a corresponds to G if the following hold:
– G_a.E = G.E and G_a.po = G.po
– G_a.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E} where:
  (|R^rlx_s(x, v)|) ≜ R^rlx(x, v)    (|W^rlx(x, v)|) ≜ W^rlx(x, v)
  (|R^acq_s(x, v)|) ≜ R^Q(x, v)      (|W^⊒rel(x, v)|) ≜ W^L(x, v)
  (|R^sc_s(x, v)|) ≜ R^A(x, v)
  (|F^acq|) ≜ F^ld    (|F^rel|) = (|F^acqrel|) = (|F^sc|) ≜ F^sy
– G.rmw = G_a.rmw, G.data = G_a.data, and G.addr = G_a.addr (the compilation does not change RMW pairs and data/address dependencies)
– G.ctrl ⊆ G_a.ctrl (the compilation only adds control dependencies)
– [G.R_ex] ; G.po ⊆ G_a.ctrl ∪ (G_a.rmw ∩ G_a.data) (exclusive reads entail a control dependency to any future event, except for their immediate exclusive write successor if it arose from an atomic increment)
– G.casdep ; G.po ⊆ G_a.ctrl (a CAS dependency to an exclusive read entails a control dependency to any future event)

We state our theorem ensuring IMM SC-consistency if the corresponding ARMv8 execution graph is ARMv8-consistent.

Theorem 15. Let G be an IMM execution graph with whole serial numbers (sn[G.E] ⊆ ℕ), and let G_a be an ARMv8 execution graph that corresponds to G. Then, ARMv8-consistency of G_a implies IMM SC-consistency of G.

Proof outline. IMM-consistency of G follows from [20, Theorem 4.5]. That is, we only need to show that acyclicity of G.psc_base ∪ G.psc_F holds. We start by showing that G_a.obs′ ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob′ is acyclic, where
obs′ ≜ rfe ∪ fr ∪ co
bob′ ≜ bob ∪ [R] ; po ; [F^ld] ∪ po ; [F^sy] ∪ [F^⊒ld] ; po
Then, we finish the proof by showing that G_a.psc_base ∪ G_a.psc_F is included in (G_a.obs′ ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob′)⁺. ◀

C From IMM SC to TSO
The intended mapping of IMM SC to TSO is presented schematically in Fig. 10. There are two possible alternatives for compiling SC accesses (see the bottom of Fig. 10): compiling an SC store to a store followed by a fence, or compiling an SC load to a load preceded by a fence. Both schemes guarantee that, in the compiled code, there is a fence between every store and load instruction originating from SC accesses. Regarding the compilation schemes of SC accesses, our proof of compilation correctness from IMM SC to TSO depends only on this property. Thus, in this section, we concentrate only on the compilation alternative which compiles SC stores using fences.

As a model of the TSO architecture, we use the declarative model from [1]. Its labels are given by:
– TSO read label: R(x, v) where x ∈ Loc and v ∈ Val.
– TSO write label: W(x, v) where x ∈ Loc and v ∈ Val.
– TSO fence label: MFENCE.
In turn, TSO's execution graphs are defined as IMM SC's ones. Below, we interpret the compilation on execution graphs.

Definition 16. Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ). A TSO execution graph G_t corresponds to G if the following hold:
– G_t.E = G.E \ G.F_≠sc ∪ {n + 0.5 | n ∈ G.W_sc} (non-SC fences are removed)
– G_t.tid(e) = G.tid(⌊e⌋) for all e in G_t
– G_t.po = [G_t.E] ; (G.po ∪ {⟨a, n + 0.5⟩ | ⟨a, n⟩ ∈ G.po?} ∪ {⟨n + 0.5, a⟩ | ⟨n, a⟩ ∈ G.po}) ; [G_t.E] (new events are added after SC writes)
– G_t.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E \ G.F_≠sc} ∪ {e ↦ MFENCE | e ∈ G_t.E \ G.E} where:
  (|R^{o_R}_s(x, v)|) ≜ R(x, v)    (|W^{o_W}(x, v)|) ≜ W(x, v)    (|F^sc|) ≜ MFENCE
– G.rmw = G_t.rmw, G.data = G_t.data, and G.addr = G_t.addr (the compilation does not change RMW pairs and data/address dependencies)
– G.ctrl ; [G.E \ G.F_≠sc] ⊆ G_t.ctrl (the compilation only adds control dependencies)

The following derived relations are used to define the TSO-consistency predicate:
ppo_TSO ≜ [R ∪ W] ; po ; [R ∪ W] \ [W] ; po ; [R]
fence_TSO ≜ [R ∪ W] ; po ; [MFENCE] ; po ; [R ∪ W]
implied_fence_TSO ≜ [W] ; po ; [dom(rmw)] ∪ [codom(rmw)] ; po ; [R]
hb_TSO ≜ ppo_TSO ∪ fence_TSO ∪ implied_fence_TSO ∪ rfe ∪ co ∪ fr

Definition 17. G is called TSO-consistent if the following hold:
– codom(G.rf) = G.R. (rf-completeness)
– For every location x ∈ Loc, G.co totally orders G.W(x). (co-totality)
– po|loc ∪ rf ∪ fr ∪ co is acyclic. (sc-per-loc)
– G.rmw ∩ (G.fre ; G.coe) = ∅. (atomicity)
– G.hb_TSO is acyclic. (tso-no-thin-air)
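The role of the mfence in hb_TSO can be seen on the classic store-buffering litmus test. The sketch below is our own illustration (not part of the Coq development): for the outcome a = b = 0, without fences ppo_TSO drops the write-to-read program order and hb_TSO is acyclic, so the outcome is allowed; with an mfence in each thread, fence_TSO restores those edges and hb_TSO becomes cyclic, so the outcome is forbidden:

```python
def has_cycle(edges):
    """Cycle check on a relation given as a set of pairs (DFS over successors)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    state = {}

    def visit(v):
        state[v] = "gray"
        for n in succ.get(v, []):
            if state.get(n) == "gray" or (n not in state and visit(n)):
                return True
        state[v] = "black"
        return False

    return any(v not in state and visit(v) for v in list(succ))

# Store buffering, outcome a = b = 0:
#   thread 1: Wx; Ry        thread 2: Wy; Rx
# Each read observes the initial value, so it is fr-before the other write.
fr = {("Ry", "Wy"), ("Rx", "Wx")}

# Without fences, [W];po;[R] is excluded from ppo_TSO: hb_TSO is just fr.
assert not has_cycle(fr)              # acyclic: outcome allowed under TSO

# With an mfence between the write and the read in each thread,
# fence_TSO contributes Wx -> Ry and Wy -> Rx.
fence = {("Wx", "Ry"), ("Wy", "Rx")}
assert has_cycle(fr | fence)          # hb_TSO cyclic: outcome forbidden
```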
Next, we state our theorem that ensures
IMM SC -consistency if the corresponding TSO execution graph is
TSO -consistent.
E C O O P 2 0 2 0 :32 Reconciling Event Structures with Modern Multiprocessors (cid:73)
Theorem 18.
Let G be an IMM SC execution graph with whole identifiers (G.E ⊆ ℕ), and let G_t be a TSO execution graph that corresponds to G. Then, TSO-consistency of G_t implies IMM SC-consistency of G.

Proof outline.
Since G_t corresponds to G, we know that

  [G.W^{sc}] ; G.po ; [G.R^{sc}] ⊆ G_t.po ; [G_t.MFENCE] ; G_t.po

as the aforementioned property of the compilation scheme. We show that

  G_t.ehb_TSO ≜ G_t.hb_TSO ∪ [G_t.MFENCE] ; G_t.po ∪ G_t.po ; [G_t.MFENCE]

is acyclic. Then, we show that G.psc_base ∪ G.psc_F is included in G_t.ehb_TSO^+. It means that acyclicity of G.psc_base ∪ G.psc_F holds, and it leaves us to prove that G is IMM-consistent. That is done by standard relational techniques (see [7]). ◀

(|r := [e]_rlx|) ≈ "ld"                        (|[e]_rlx := e|) ≈ "st"
(|r := [e]_acq|) ≈ "ld;cmp;bc;isync"           (|[e]_rel := e|) ≈ "lwsync;st"
(|fence_{≠sc}|) ≈ "lwsync"                     (|fence_{sc}|) ≈ "sync"
(|r := FADD^o(e, e)|) ≈ wmod(o) ++ "L: lwarx;stwcx.;bc L" ++ rmod(o)
(|r := CAS^o(e, e_R, e_W)|) ≈ wmod(o) ++ "L: lwarx;cmp;bc Le;stwcx.;bc L; Le:" ++ rmod(o)
wmod(o) ≜ o ⊒ rel ? "lwsync;" : ""             rmod(o) ≜ o ⊒ acq ? ";isync" : ""

Leading sync:                                  Trailing sync:
(|r := [e]_sc|) ≈ "sync;ld;cmp;bc;isync"       (|r := [e]_sc|) ≈ "ld;sync"
(|[e]_sc := e|) ≈ "sync;st"                    (|[e]_sc := e|) ≈ "lwsync;st;sync"

Figure 11 Compilation scheme from IMM to POWER.

D From IMM_SC to POWER
Here we use the same mapping of IMM to POWER (see Fig. 11) as in [20] for all instructions except for SC accesses. For the latter, there are two standard compilation schemes [16], presented at the bottom of Fig. 11: with leading and trailing sync fences.

The next definition presents the correspondence between IMM execution graphs and their mapped POWER ones, following the leading compilation scheme in Fig. 11 with elimination of the aforementioned redundancy of SC write compilation.
Definition 19.
Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ). A POWER execution graph G_p corresponds to G if the following hold:
- G_p.E = G.E ∪ { n + 0.5 | n ∈ (G.R^{⊒acq} \ dom(G.rmw)) ∪ codom([G.R^{⊒acq}] ; G.rmw) } ∪ { n − 0.5 | n ∈ (G.E^{⊒rel} \ dom(G.rmw)) ∪ dom(G.rmw ; [G.W^{⊒rel}]) } (new events are added after acquire reads and acquire RMW pairs, and before release/SC accesses and release/SC RMW pairs)
- G_p.tid(e) = G.tid(n) for all e ∈ {n − 0.5, n, n + 0.5} in G_p with n ∈ G.E
- G_p.po = G.po ∪ ((G_p.E × G_p.E) ∩ ({ ⟨a, n − 0.5⟩ | ⟨a, n⟩ ∈ G.po } ∪ { ⟨n − 0.5, a⟩ | ⟨n, a⟩ ∈ G.po^? } ∪ { ⟨a, n + 0.5⟩ | ⟨a, n⟩ ∈ G.po^? } ∪ { ⟨n + 0.5, a⟩ | ⟨n, a⟩ ∈ G.po }))
- G_p.lab = { e ↦ (|G.lab(e)|) | e ∈ G.E }
    ∪ { n + 0.5 ↦ F^{isync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ }
    ∪ { n − 0.5 ↦ F^{lwsync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∉ G.E^{sc} ∪ dom(G.rmw ; [G.W^{sc}]) }
    ∪ { n − 0.5 ↦ F^{sync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.E^{sc} ∪ dom(G.rmw ; [G.W^{sc}]) }
  where: (|R^o_s(x, v)|) ≜ R(x, v)   (|W^o(x, v)|) ≜ W(x, v)
         (|F^{acq}|) = (|F^{rel}|) = (|F^{acqrel}|) ≜ F^{lwsync}   (|F^{sc}|) ≜ F^{sync}
- G.rmw = G_p.rmw, G.data = G_p.data, and G.addr = G_p.addr (the compilation does not change RMW pairs and data/address dependencies)
- G.ctrl ⊆ G_p.ctrl (the compilation only adds control dependencies)
- [G.R^{⊒acq}] ; G.po ⊆ G_p.rmw ∪ G_p.ctrl (a control dependency is placed from every acquire or SC read)
- [G.R_{ex}] ; G.po ⊆ G_p.ctrl ∪ (G_p.rmw ∩ G_p.data) (exclusive reads entail a control dependency to any future event, except for their immediate exclusive write successor if it arose from an atomic increment)
- G.data ; [codom(G.rmw)] ; G.po ⊆ G_p.ctrl (a data dependency to an exclusive write entails a control dependency to any future event)
- G.casdep ; G.po ⊆ G_p.ctrl (a CAS dependency to an exclusive read entails a control dependency to any future event)

The correspondence between IMM and POWER execution graphs that follows the trailing compilation scheme may be presented similarly, with two main differences. First, obviously, SC accesses are compiled to release and acquire accesses followed by SC fences:

- G_p.E = G.E ∪ { n + 0.5 | n ∈ (G.E^{⊒acq} \ dom(G.rmw)) ∪ codom([G.R^{⊒acq}] ; G.rmw) } ∪ { n − 0.5 | n ∈ (G.W^{⊒rel} \ dom(G.rmw)) ∪ dom(G.rmw ; [G.W^{⊒rel}]) }
- G_p.lab = { e ↦ (|G.lab(e)|) | e ∈ G.E }
    ∪ { n + 0.5 ↦ F^{isync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.R^{acq} ∪ codom([G.R^{acq}] ; G.rmw) }
    ∪ { n + 0.5 ↦ F^{sync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.E^{sc} ∪ codom([G.R^{sc}] ; G.rmw) }
    ∪ { n − 0.5 ↦ F^{lwsync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ }

Second, [G.R^{⊒acq}] ; G.po has to be included in G_p.rmw ∪ G_p.ctrl ∪ G_p.po ; [G_p.F^{lwsync}] ; G_p.po^?, not just in G_p.rmw ∪ G_p.ctrl, to allow for elimination of the aforementioned SC read compilation redundancy.

The next theorem ensures IMM_SC-consistency if the corresponding POWER execution graph is
POWER-consistent.
Theorem 20.
Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ), and let G_p be a POWER execution graph that corresponds to G. Then, POWER-consistency of G_p implies IMM_SC-consistency of G.

Proof Outline. We construct an IMM execution graph G′ by inserting SC fences before SC accesses in G. We also construct G′_NoSC from G′ by replacing the SC write and read accesses of G′ with release write and acquire read ones, respectively. Obviously, IMM_SC-consistency of G follows from IMM_SC-consistency of G′, which, in turn, follows from IMM-consistency of G′_NoSC by Theorem 12. We construct an IMM execution graph G′′ from G′_NoSC by inserting release fences before release writes, and then an IMM execution graph G′′_NoRel from G′′ by weakening the access modes of its release write events to the relaxed mode. As in the previous proof step, IMM-consistency of G′_NoSC follows from IMM-consistency of G′′, which in turn follows from IMM-consistency of G′′_NoRel by [20, Theorem 4.1].

Thus, to prove the theorem, we need to show that G′′_NoRel is IMM-consistent. Note that G_p, the POWER execution graph corresponding to G, also corresponds to G′′_NoRel by construction of G′′_NoRel. That is, IMM-consistency of G′′_NoRel follows from POWER-consistency of G_p by [20, Theorem 4.3], since G′′_NoRel does not contain SC read and write access events, nor release write access events. ◀
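The chain of graph transformations used in this outline can be sketched over per-thread label sequences. This is a toy Python illustration with hypothetical helper names, not the mechanized construction; it tracks only labels, not the relational components of the graphs:

```python
# Labels are (kind, mode) pairs; a thread is a list of labels. The pipeline
# mirrors G -> G' -> G'_NoSC -> G'' -> G''_NoRel from the proof outline,
# using list insertion in place of fractional event identifiers.

def insert_fences(trace, mode, before):
    """Insert an F-fence of the given mode before every label satisfying `before`."""
    out = []
    for lab in trace:
        if before(lab):
            out.append(('F', mode))
        out.append(lab)
    return out

def weaken(trace, table):
    """Weaken access modes according to `table`: (kind, mode) -> new mode."""
    return [(k, table.get((k, m), m)) for (k, m) in trace]

def no_sc(trace):
    # G': SC fences before SC accesses; G'_NoSC: SC accesses become rel/acq.
    g1 = insert_fences(trace, 'sc',
                       lambda l: l[0] in ('R', 'W') and l[1] == 'sc')
    return weaken(g1, {('W', 'sc'): 'rel', ('R', 'sc'): 'acq'})

def no_rel(trace):
    # G'': release fences before release writes; G''_NoRel: rel writes -> rlx.
    g2 = insert_fences(trace, 'rel', lambda l: l == ('W', 'rel'))
    return weaken(g2, {('W', 'rel'): 'rlx'})

trace = [('W', 'sc'), ('R', 'sc'), ('W', 'rel'), ('R', 'rlx')]
final = no_rel(no_sc(trace))
print(final)  # no SC accesses and no release writes remain, only fences
```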