Reconciling Event Structures with Modern Multiprocessors
Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis
Evgenii Moiseenko
St. Petersburg University, Russia and JetBrains Research, [email protected]
Anton Podkopaev
National Research University Higher School of Economics, Russia and MPI-SWS, Germany and JetBrains Research, [email protected]
Ori Lahav
Tel Aviv University, [email protected]
Orestis Melkonian
University of Edinburgh, [email protected]
Viktor Vafeiadis
MPI-SWS, [email protected]
Abstract
Weakestmo is a recently proposed memory consistency model that uses event structures to resolve the infamous "out-of-thin-air" problem and to enable efficient compilation to hardware. Nevertheless, this latter property (compilation correctness) has not yet been formally established.

This paper closes this gap by establishing correctness of the intended compilation schemes from Weakestmo to a wide range of formal hardware memory models (x86, POWER, ARMv7, ARMv8) in the Coq proof assistant. Our proof is the first that establishes correctness of compilation of an event-structure-based model that forbids "out-of-thin-air" behaviors, as well as the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of hardware platforms. Our compilation proof goes via the recent Intermediate Memory Model (IMM), which we suitably extend with sequentially consistent accesses.
Theory of computation → Logic and verification; Software and its engineering → Concurrent programming languages
Keywords and phrases
Weak Memory Consistency, Event Structures, IMM, Weakestmo.
© Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, Viktor Vafeiadis; licensed under Creative Commons License CC-BY. 34th European Conference on Object-Oriented Programming (ECOOP 2020). Editors: Robert Hirschfeld and Tobias Pape; Article No. 5; pp. 5:1–5:34. Leibniz International Proceedings in Informatics, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany.

Introduction

A major research problem in concurrency semantics is to develop a weak memory model that allows load-to-store reordering (a.k.a. load buffering, LB) and compiler optimizations (e.g., elimination of fake dependencies), while forbidding "out-of-thin-air" behaviors [19, 11, 5, 14]. The problem can be illustrated with the following two programs, which access locations x and y initialized to 0.

a := [x]            b := [y]
[y] := 1 + a ∗ 0    [x] := b        (LB-fake)

a := [x]            b := [y]
[y] := a            [x] := b        (LB-data)

The annotated outcome a = b = 1 ought to be allowed for LB-fake because 1 + a ∗ 0 always evaluates to 1, so the dependency of the write to y on a is fake and may be removed by an optimizing compiler. The same outcome ought to be forbidden for LB-data, whose dependency is genuine: allowing it would make the value 1 appear "out of thin air". Among the proposed models that correctly distinguish between these two programs is the recent Weakestmo model [6].
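As a sanity check (not part of the paper's development), a brute-force enumeration of all sequentially consistent interleavings of LB-data confirms that, without some reordering of a thread's load before its store, the outcome a = b = 1 cannot arise. The encoding below is ours; it models each thread as a list of load/store operations.

```python
from itertools import permutations

def run(schedule):
    """Run one interleaving of LB-data under sequential consistency."""
    mem = {"x": 0, "y": 0}
    regs = {"a": 0, "b": 0}
    prog = {
        1: [("load", "a", "x"), ("store", "y", "a")],   # a := [x]; [y] := a
        2: [("load", "b", "y"), ("store", "x", "b")],   # b := [y]; [x] := b
    }
    pc = {1: 0, 2: 0}
    for tid in schedule:
        op = prog[tid][pc[tid]]
        pc[tid] += 1
        if op[0] == "load":
            _, reg, loc = op
            regs[reg] = mem[loc]
        else:  # store: write the value of a register to a location
            _, loc, reg = op
            mem[loc] = regs[reg]
    return regs["a"], regs["b"]

# All interleavings of two 2-instruction threads = permutations of [1,1,2,2].
outcomes = {run(s) for s in set(permutations([1, 1, 2, 2]))}
assert (1, 1) not in outcomes  # SC never produces the out-of-thin-air outcome
```

In fact, under SC the only outcome here is a = b = 0, since no interleaving ever writes a non-zero value; the annotated outcome requires a weaker model.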
Weakestmo was developed in response to certain limitations of earlier models, such as the "promising semantics" of Kang et al. [12], namely that (i) they did not cover the whole range of C/C++ concurrency features and that (ii) they did not support the intended compilation schemes to hardware. Being flexible in its design, Weakestmo addresses the former point. It supports all usual features of the C/C++11 model [3] and can easily be adapted to support any new concurrency features that may be added in the future. It does not, however, fully address the latter point. Due to the difficulty of establishing correctness of the intended compilation schemes to hardware architectures that permit load-store reordering (i.e., POWER, ARMv7, ARMv8), Chakraborty and Vafeiadis [6] only establish correctness of suboptimal schemes that add (unnecessary) explicit fences to prevent load-store reordering.

In this paper, we address this major limitation of the Weakestmo paper. We establish in Coq correctness of the intended compilation schemes to a wide range of hardware architectures that includes the major ones: x86-TSO [18], POWER [1], ARMv7 [1], ARMv8 [22]. The compilation schemes, whose correctness we prove, do not require any fences or fake dependencies for relaxed accesses. Because of a technical limitation of our setup (see §6), however, compilation of read-modify-write (RMW) accesses to ARMv8 uses a load-reserve/store-conditional loop (similar to that of ARMv7 and POWER) as opposed to the newly introduced ARMv8 instructions for certain kinds of RMWs.

The main challenge in this proof is to reconcile the different ways in which hardware models and Weakestmo allow load-store reordering. Unlike most models at the programming language level, hardware models (such as ARMv8) do not execute instructions in sequence; they instead keep track of dependencies between instructions and ensure that no dependency cycles ever arise in a single execution. In contrast, Weakestmo executes instructions in order, but simultaneously considers multiple executions to justify an execution where a load reads a value that indirectly depends upon a later store. Technically, these multiple executions together form an event structure, upon which Weakestmo places various constraints.
Figure 1 Results proved in this paper. (The diagram relates C11, Weakestmo, IMM extended with SC accesses, and the hardware models x86-TSO, POWER, ARMv7, and ARMv8.)
The high-level proof structure is shown in Fig. 1. We reuse IMM, an intermediate memory model, introduced by Podkopaev et al. [20] as an abstraction over all major existing hardware memory models. To support Weakestmo compilation, we extend IMM with sequentially consistent (SC) accesses following the RC11 model [14]. As IMM is very much a hardware-like model (e.g., it tracks dependencies), the main result is compilation from Weakestmo to IMM (indicated by the bold arrow). The other arrows in the figure are extensions of previous results to account for SC accesses, while double arrows indicate results for two compilation schemes.

The complexity of the proof is also evident from the size of the Coq development. We have written about 30K lines of Coq definitions and proof scripts on top of an existing infrastructure of about another 20K lines (defining IMM, the aforementioned hardware models and many lemmas about them). As part of developing the proof, we also had to mechanize the Weakestmo definition in Coq and to fix some minor deficiencies in the original definition, which were revealed by our proof effort.

To the best of our knowledge, our proof is the first proof of correctness of compilation of an event-structure-based memory model. It is also the first mechanized compilation proof of a weak memory model supporting sequentially consistent accesses to such a range of
hardware architectures. The latter, although fairly straightforward in our case, has had a history of wrong compilation correctness arguments (see [14] for details).

Figure 2 Executions of LB and LB-data/LB-fake with outcome a = b = 1. (Panel (a) shows G_LB, an execution graph of LB with events Init, R(x, 1), W(y, 1), R(y, 1), W(x, 1) and its po, ppo, and rf edges; panel (b) shows the common execution of LB-data and LB-fake, which has an additional ppo edge in thread 1.)

Outline
We start with an informal overview of IMM, Weakestmo, and our compilation proof (§2). We then present a fragment of Weakestmo formally (§3) and its compilation proof (§4). Subsequently, we extend these results to cover SC accesses (§5), discuss related work (§6) and conclude (§7). The associated proof scripts and supplementary material for our paper are publicly available at http://plv.mpi-sws.org/weakestmoToImm/.

Overview

To get an idea about the
IMM and Weakestmo memory models, consider a version of the LB-fake and LB-data programs from §1 with no dependency in thread 1:

a := [x]    b := [y]
[y] := 1    [x] := b        (LB)

As we will see, the annotated outcome is allowed by both IMM and Weakestmo, albeit in different ways. The different treatment of load-store reordering affects the outcomes of other programs. For example, IMM forbids the annotated outcome of LB-fake by treating it exactly as LB-data, whereas Weakestmo allows the outcome by treating LB-fake exactly as LB.
IMM
IMM is a declarative (also called axiomatic) model identifying a program's semantics with a set of execution graphs, or just executions. As an example, Fig. 2a contains G_LB, an IMM execution graph of LB corresponding to an execution yielding the annotated behavior.

Vertices of execution graphs, called events, represent memory accesses either due to the initialization of memory or to the execution of program instructions. Each event is labeled with the type of the access (e.g., R for reads, W for writes), the location accessed, and the value read or written. Memory initialization consists of a set of events labeled W(x, 0) for each location x used in the program; for conciseness, however, we depict the initialization events as a single event with label Init.

Edges of execution graphs represent different relations on events. In Fig. 2, three different relations are depicted. The program order relation (po) totally orders events originated from the same thread according to their order in the program, as well as the initialization event(s) before all other events. The reads-from relation (rf) relates a write event to the read events that read from it. Finally, the preserved program order (ppo) is a subset of the program order relating events that cannot be executed out of order. Such ppo edges arise whenever there is a dependency chain between the corresponding instructions (e.g., a write storing the value read by a prior read).
Because of the syntactic nature of ppo, IMM conflates the executions of LB-data and LB-fake leading to the outcome a = b = 1 (see Fig. 2b). This choice is in line with hardware memory models; it means, however, that IMM is not suitable as a memory model for a programming language (because, as argued in §1, LB-fake can be transformed to LB by an optimizing compiler).

The executions of a program are constructed in two steps. First, a thread-local semantics determines the sequential executions of each thread, where the values returned by each read access are chosen non-deterministically (among the set of all possible values), and the executions of different threads are combined into a single execution. Then, the execution graphs are filtered by a consistency predicate, which determines which executions are allowed (i.e., are IMM-consistent). These IMM-consistent executions form the program's semantics. IMM-consistency checks three basic constraints:

Completeness: Every read event reads from precisely one write with the same location and value;

Coherence: For each location x, there is a total ordering of x-related events extending the program order so that each read of x reads from the most recent prior write according to that total order; and

Acyclic dependency: There is no cycle consisting only of ppo and rf edges.

The final constraint disallows executions in which an event recursively depends upon itself, as this pattern can lead to "out-of-thin-air" outcomes. Specifically, the execution in Fig. 2b, which represents the annotated behavior of LB-fake and LB-data, is not IMM-consistent because of the (ppo ∪ rf)-cycle. In contrast, G_LB is IMM-consistent.
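The acyclic-dependency check can be replayed concretely on the two executions of Fig. 2. The sketch below is ours (the event names Rx, Wy, Ry, Wx are shorthands we introduce, not the paper's notation): it represents each relation as a set of edges and detects cycles with a depth-first search.

```python
def has_cycle(nodes, edges):
    """Detect a cycle via depth-first search over a relation given as a set of pairs."""
    succ = {n: [b for (a, b) in edges if a == n] for n in nodes}
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    def dfs(n):
        color[n] = GREY
        for m in succ[n]:
            if color[m] == GREY or (color[m] == WHITE and dfs(m)):
                return True
        color[n] = BLACK
        return False
    return any(color[n] == WHITE and dfs(n) for n in nodes)

# Events of the executions in Fig. 2: Rx = R(x,1), Wy = W(y,1), Ry = R(y,1), Wx = W(x,1).
events = ["Rx", "Wy", "Ry", "Wx"]
rf  = {("Wy", "Ry"), ("Wx", "Rx")}            # reads-from edges
ppo_lb   = {("Ry", "Wx")}                      # LB: a dependency only in thread 2
ppo_data = {("Rx", "Wy"), ("Ry", "Wx")}        # LB-data/LB-fake: dependencies in both threads

assert not has_cycle(events, ppo_lb | rf)      # G_LB satisfies acyclic dependency
assert has_cycle(events, ppo_data | rf)        # Fig. 2b contains a (ppo ∪ rf)-cycle
```

The cycle in the second case is exactly Rx → Wy → Ry → Wx → Rx, alternating ppo and rf edges.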
Weakestmo
We move on to Weakestmo, which also defines the program's semantics as a set of execution graphs. However, they are constructed differently: they are extracted from a final event structure, which Weakestmo incrementally builds for a program.

An event structure represents multiple executions of a program in a single graph. Like execution graphs, event structures contain a set of events and several relations among them. Like execution graphs, the program order (po) orders events according to each thread's control flow. However, unlike execution graphs, po is not necessarily total among the events of a given thread. Events of the same thread that are not po-ordered are said to be in conflict (cf) with one another, and cannot belong to the same execution. Such conflicting events arise when two read events originate from the same read instruction (e.g., representing executions where the reads return different values). Moreover, cf "extends downwards": events that depend upon conflicting events (i.e., have conflicting po-predecessors) are also in conflict with one another. In pictures, we typically show only the immediate conflict edges (between reads originating from the same instruction) and omit the conflict edges between events po-after immediately conflicting ones.

Event structures are constructed incrementally starting from an event structure consisting only of the initialization events. Then, events corresponding to the execution of program instructions are added one at a time. We start by executing the first instruction of a program's thread. Then, we may execute the second instruction of the same thread or the first instruction of another thread, and so on. For a detailed formal description of the graphs and their construction process we refer the reader to [20, §2.2].
Figure 3 A run of Weakestmo witnessing the annotated outcome of LB. (Panels (a)-(f) show the successively built event structures S_a, ..., S_f; the executions X_b, X_d, and X_f are selected in panels (b), (d), and (f). The panels depict the events added for LB's instructions together with their jf, cf, and ew edges.)
As an example, Fig. 3 constructs an event structure for LB. Fig. 3a depicts the event structure S_a obtained from the initial event structure by executing a := [x] in LB's thread 1. As a result of the instruction execution, a read event e1: R(x, 0) is added.

Whenever the event added is a read, Weakestmo has to justify the returned value from an appropriate write event. In this case, there is only one write to x (the initialization write), and so S_a has a justified from edge, denoted jf, going to e1 in S_a. This is a requirement of Weakestmo: each read event in an event structure has to be justified from exactly one write event with the same value and location. (This requirement is analogous to the completeness requirement in IMM-consistency for execution graphs.) Since events are added in program order and read events are always justified from existing events in the event structure, po ∪ jf is guaranteed to be acyclic by construction.

The next three steps (Figures 3b to 3d) simply add a new event to the event structure. Notice that, unlike IMM executions, Weakestmo event structures do not track syntactic dependencies; e.g., S_d in Fig. 3d does not contain a ppo edge between the read and the write of thread 2. This is precisely what allows Weakestmo to assign the same behavior to LB and LB-fake: they have exactly the same event structures. As a programming-language-level memory model, Weakestmo supports optimizations removing fake dependencies.

The next step (Fig. 3e) is more interesting because it showcases the key distinction between event structures and execution graphs, namely that event structures may contain more than one execution for each thread. Specifically, the transition from S_d to S_e reruns the first instruction of thread 1 and adds a new event e5 justified from a different write event. We say that this new event conflicts (cf) with e1 because they cannot both occur in a single execution. Because of conflicts, po in event structures does not totally order all events of a thread; e.g., e1 and e5 are not po-ordered in S_e. Two events of the same thread are conflicted precisely when they are not po-ordered.
Figure 4 Traversal configurations for G_LB. (Panels (a)-(f) show the traversal configurations TC_a, ..., TC_f, marking which events of G_LB are issued and which are covered at each step.)

The final construction step (Fig. 3f) demonstrates another Weakestmo feature. Conflicting write events writing the same value to the same location (e.g., e2 and e6 in S_f) may be declared equal writes, i.e., connected by an equivalence relation ew.

The ew relation is used to define Weakestmo's version of the reads-from relation, rf, which relates a read to all (non-conflicted) writes equal to the write justifying the read. For example, e3 reads from both e2 and e6.

Weakestmo's rf relation is used for extraction of program executions. An execution graph G is extracted from an event structure S, denoted S ▷ G, if G is a maximal conflict-free subset of S, it contains only visible events (to be defined in §3), and every read event in G reads from some write in G according to S.rf. Two execution graphs can be extracted from S_f, {Init, e1, e2, e3, e4} and {Init, e5, e6, e3, e4}, representing the outcomes a = 0 ∧ b = 1 and a = b = 1 respectively.

Weakestmo to IMM Compilation: High-Level Proof Structure
In this paper, we assume that Weakestmo is defined for the same assembly language as IMM (see [20, Fig. 2]) extended with SC accesses, and refer to this language as L. Having that, we show the correctness of the identity mapping as a compilation scheme from Weakestmo to IMM in the following theorem.

▶ Theorem 1. Let prog be a program in L, and G be an IMM-consistent execution graph of prog. Then there exists an event structure S of prog under Weakestmo such that S ▷ G.

To prove the theorem, we must show that Weakestmo may construct the needed event structure in a step by step fashion. If the IMM-consistent execution graph G contains no po ∪ rf cycles, then the construction is completely straightforward: G itself is a Weakestmo-consistent event structure (setting jf to be just rf), and its events can be added in any order extending po ∪ rf.

The construction becomes tricky for IMM-consistent execution graphs, such as G_LB, that contain po ∪ rf cycles. Due to the cycle(s), G cannot be directly constructed as a (conflict-free) Weakestmo event structure. We must instead construct a larger event structure S containing multiple executions, one of which will be the desired graph G. Roughly, for each po ∪ rf cycle in G, we have to construct an immediate conflict in the event structure.

To generate the event structure S, we rely on a basic property of IMM-consistent execution graphs shown by Podkopaev et al. [20, §§6,7], namely that execution graphs can be traversed in a certain order, i.e., their events can be issued and covered in that order, so that in the end all events are covered. The traversal captures a possible execution order of the program that yields the given execution. In that execution order, events are not added according to program order, but rather according to preserved program order (ppo) in two steps. Events are first issued when all their dependencies have been resolved, and are later covered when all their po-prior events have been covered.

In more detail, a traversal of an IMM-consistent execution graph G is a sequence of traversal steps between traversal configurations. A traversal configuration TC of an execution graph G is a pair of sets of events, ⟨C, I⟩, called the covered and issued set respectively. As an example, Fig. 4 presents all six traversal configurations of the execution graph G_LB of LB from Fig. 2a except for the initial configuration; the issued and covered sets are marked in the figure.

A traversal might be seen as an execution of an abstract machine that can execute write instructions early but has to execute everything else in order. The first option corresponds to issuing a write event, and the second option to covering an event. (For readers familiar with PS [12], issuing a write event corresponds to promising a message, and covering an event to normal execution of an instruction.) The traversal strategy has certain constraints. To issue a write event, all external reads that it depends upon must be resolved; i.e., they must read from already issued events. To cover an event, all its po-predecessors must also be covered. For example, in Fig. 4, a traversal cannot issue W(x, 1) before issuing W(y, 1), nor cover R(x, 1) before issuing W(x, 1). By Podkopaev et al. [20, Prop. 6.5], every IMM-consistent execution graph G has a full traversal of the following form:

G ⊢ TC_init(G) → TC1 → TC2 → ... → TC_final(G)

where the initial configuration, TC_init(G) ≜ ⟨G.Init, G.Init⟩, has issued and covered only G's initial events, and the final configuration, TC_final(G) ≜ ⟨G.E, G.W⟩, has covered all G's events and issued all its write events.

We construct the event structure S following a full traversal of G. We define a simulation relation, I(prog, G, TC, S, X), between the program prog, the current traversal configuration TC of execution G, and the current event structure's state ⟨S, X⟩, where X is a subset of events corresponding to a particular execution graph extracted from the event structure S. Our simulation proof is divided into the following three lemmas, which state that the initial states are simulated, that simulation extends along traversal steps, and that the simulation of final states means that G can be extracted from the generated event structure.

▶ Lemma 2 (Simulation Start). Let prog be a program of L, and G be an IMM-consistent execution graph of prog. Then I(prog, G, TC_init(G), S_init(prog), S_init(prog).E) holds.

▶ Lemma 3 (Weak Simulation Step). If I(prog, G, TC, S, X) and G ⊢ TC → TC′ hold, then there exist S′ and X′ such that I(prog, G, TC′, S′, X′) and S →* S′ hold.

▶ Lemma 4 (Simulation End). If I(prog, G, TC_final(G), S, X) holds, then the execution graph associated with X is isomorphic to G.
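The issue/cover discipline can be sketched with a small exhaustive search. The predicates below are our simplified reading of the informal constraints above (in particular, we let a write be covered as soon as it is issued and its po-predecessors are covered); they are not the paper's formal definitions, and the event names Rx, Wy, Ry, Wx are ours.

```python
# A simplified model of traversals of G_LB (a sketch under the assumptions above).
po  = {("Rx", "Wy"), ("Ry", "Wx")}
ppo = {("Ry", "Wx")}                 # preserved program order (dependencies)
rf  = {("Wy", "Ry"), ("Wx", "Rx")}
writes = {"Wy", "Wx"}
events = {"Rx", "Wy", "Ry", "Wx"}
src = {r: w for (w, r) in rf}        # the write each read reads from

def issuable(w, issued):
    # every read that w depends upon must read from an already-issued write
    return all(src[r] in issued for (r, w2) in ppo if w2 == w)

def coverable(e, covered, issued):
    if any(p not in covered for (p, q) in po if q == e):
        return False                 # all po-predecessors must be covered
    if e in writes:
        return e in issued           # simplification: writes are covered once issued
    return src[e] in issued          # a read needs its source write to be issued

def traversals(covered=frozenset(), issued=frozenset(), trace=()):
    """Enumerate all full traversals as sequences of ('issue'|'cover', event)."""
    if covered == events:
        yield trace
        return
    for w in writes - issued:
        if issuable(w, issued):
            yield from traversals(covered, issued | {w}, trace + (("issue", w),))
    for e in events - covered:
        if coverable(e, covered, issued):
            yield from traversals(covered | {e}, issued, trace + (("cover", e),))

all_traces = list(traversals())
assert all_traces  # G_LB can be fully traversed
for t in all_traces:
    steps = {s: i for i, s in enumerate(t)}
    assert steps[("issue", "Wy")] < steps[("issue", "Wx")]   # W(y,1) issued before W(x,1)
    assert steps[("issue", "Wx")] < steps[("cover", "Rx")]   # R(x,1) covered only after W(x,1) issued
```

Every enumerated traversal respects exactly the two ordering constraints quoted above for Fig. 4.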
The proof of Theorem 1 then proceeds by induction on the length of the traversal G ⊢ TC_init(G) →* TC_final(G). Lemma 2 serves as the base case, Lemma 3 is the induction step simulating each traversal step with a number of event structure construction steps, and Lemma 4 concludes the proof.

The proofs of Lemmas 2 and 4 are technical but fairly straightforward. (We define I in a way that makes these lemmas immediate.) In contrast, Lemma 3 is much more difficult to prove. As we will see, simulating a traversal step sometimes requires us to construct a new branch in the event structure, i.e., to add multiple events (see §4.3).

Weakestmo to IMM Compilation Correctness by Example
Before presenting any formal definitions, we conclude this overview section by showcasing the construction used in the proof of Lemma 3 on the execution graph G_LB in Fig. 2a, following the traversal of Fig. 4. We have actually already seen the sequence of event structures constructed in Fig. 3. Note that, even though Figures 3 and 4 have the same number of steps, there is no one-to-one correspondence between them, as we explain below.

Consider the last event structure S_f from Fig. 3. A subset of its events X_f, marked in Fig. 3f, which we call a simulated execution, is a maximal conflict-free subset of S_f, and all read events in X_f read from some write in X_f (i.e., are justified from a write deemed "equal" to some write in X_f). Then, by definition, X_f is extracted from S_f. Also, the execution graph induced by X_f is isomorphic to G_LB. That is, the construction of S_f for LB shows that in Weakestmo it is possible to observe the same behavior as G_LB. Now, we explain how we construct S_f and choose X_f.

During the simulation, we maintain the relation I(prog, G, TC, S, X) connecting a program prog, its execution graph G, its traversal configuration TC, an event structure S, and a subset of its events X. Among other properties (presented in §4.2), the relation states that all issued and covered events of TC have exact counterparts in X, and that X can be extracted from S.

The initial event structure and the initial X consist of only initial events. Then, following the issuing of event W(y, 1) in TC_a (see Fig. 4a), we need to add a branch to the event structure that has W(y, 1) in it. Since Weakestmo requires adding events according to program order, we first need to add a read event corresponding to 'a := [x]' of LB's thread 1. Each read event in an event structure has to be justified from somewhere. In this case, the only write event to location x is the initial one. That is, the added read event e1 is justified from it (see Fig. 3a). In the general case, having more than one option, we would choose a 'safe' write event for an added read event to be justified from, i.e., one which the corresponding branch is 'aware' of already and being justified from which would not break consistency of the event structure. After that, a write event e2: W(y, 1) can be added po-after e1 (see Fig. 3b), and I(LB, G_LB, TC_a, S_b, X_b) holds for X_b = {Init, e1, e2}.

Next, we need to simulate the second traversal step (see Fig. 4b), which issues W(x, 1). The read event e3 of thread 2 has to get value 1, since there is a dependency between the instructions in thread 2. As we mentioned earlier, the traversal strategy guarantees that e2: W(y, 1) is issued at the moment of issuing W(x, 1), so there is a write event to justify e3 from (see Fig. 3c). Now, the write event e4: W(x, 1) representing the issued write can be added to the event structure (see Fig. 3d), and I(LB, G_LB, TC_b, S_d, X_d) holds for X_d = {Init, e1, e2, e3, e4}.

In the third traversal step (see Fig. 4c), the read event R(x, 1) is covered. To have a representative event for it in the event structure, we add e5 (see Fig. 3e). It is justified from e4, which writes the needed value 1. Also, e5 represents an alternative to e1 execution of the first instruction of thread 1, so the two events are in conflict.

However, we cannot choose a simulated execution X related to TC_c and S_e by the simulation relation, since X has to contain e5 and a representative for W(y, 1) (in S_e it is represented by e2) while being conflict-free. Thus, the event structure has to make one other step (see Fig. 3f) and add the new event e6 to represent W(y, 1), so that X_f = {Init, e5, e6, e3, e4}.

Since X_f has to be extracted from S_f, every read event in X_f has to be connected via an rf edge to an event in X_f. To preserve the requirement, we connect the newly added event e6 and e2 via an ew edge, i.e., marking them to be equal writes. This induces an rf edge between e6 and e3. That is, I(LB, G_LB, TC_c, S_f, X_f) holds.

To simulate the remaining traversal steps (Figures 4d to 4f), we do not need to modify S_f, because it already contains counterparts for the newly covered events and, moreover, the execution graph associated with X_f is isomorphic to G_LB. That is, we just need to show that I(LB, G_LB, TC_d, S_f, X_f), I(LB, G_LB, TC_e, S_f, X_f), and I(LB, G_LB, TC_f, S_f, X_f) hold.

Weakestmo
In this section, we introduce the notation used in the rest of the paper and define the Weakestmo memory model. For simplicity, we present only a minimal fragment of Weakestmo containing only relaxed reads and writes. For the definition of the full Weakestmo model, we refer the readers to Chakraborty and Vafeiadis [6] and to our Coq development [17].
Notation
Given relations R1 and R2, we write R1; R2 for their sequential composition. Given a relation R, we write R?, R+ and R* to denote its reflexive, transitive and reflexive-transitive closures. We write id to denote the identity relation (i.e., id ≜ {⟨x, x⟩}). For a set A, we write [A] to denote the identity relation restricted to A (that is, [A] ≜ {⟨a, a⟩ | a ∈ A}). Hence, for instance, we may write [A]; R; [B] instead of R ∩ (A × B). We also write [e] to denote [{e}] if e is not a set.

Given a function f : A → B, we denote by =_f the set of f-equivalent elements (=_f ≜ {⟨a, b⟩ ∈ A × A | f(a) = f(b)}). In addition, given a relation R, we denote by R|_{=f} the restriction of R to f-equivalent elements (R|_{=f} ≜ R ∩ =_f), and by R|_{≠f} the restriction of R to non-f-equivalent elements (R|_{≠f} ≜ R \ =_f).

Events, e ∈ Event, and thread identifiers, t ∈ Tid, are represented by natural numbers. We treat the thread with identifier 0 as the initialization thread. We let x ∈ Loc range over locations, and v ∈ Val over values. A label, l ∈ Lab, takes one of the following forms:

R(x, v): a read of value v from location x.

W(x, v): a write of value v to location x.

(Footnotes from the example in §2: actually, it is easy to show that there can be only one such event, since equal writes are in conflict and X is conflict-free. Also, the newly added write could have been left without any outgoing ew edges, since the choice of equal writes for newly added events in Weakestmo is non-deterministic; however, that would not preserve the simulation relation.)
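These relational operators are straightforward to prototype. The toolkit below is our own sketch (not part of the Coq development), representing a relation as a Python set of pairs and mirroring the notation just introduced.

```python
from itertools import product

def seq(r1, r2):                       # R1 ; R2  (sequential composition)
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

def ident(A):                          # [A]  (identity relation restricted to A)
    return {(a, a) for a in A}

def refl(r, dom):                      # R?  (reflexive closure over a domain)
    return r | ident(dom)

def trans(r):                          # R+  (transitive closure)
    closure = set(r)
    while True:
        extra = seq(closure, closure) - closure
        if not extra:
            return closure
        closure |= extra

# [A] ; R ; [B] coincides with R ∩ (A × B):
R = {(1, 2), (2, 3), (3, 1)}
A, B = {1, 2}, {2, 3}
assert seq(seq(ident(A), R), ident(B)) == R & set(product(A, B))
assert refl({(1, 2)}, {1, 2}) == {(1, 2), (1, 1), (2, 2)}
assert trans({(1, 2), (2, 3)}) == {(1, 2), (2, 3), (1, 3)}
```

The last assertions check the claimed equivalence [A]; R; [B] = R ∩ (A × B) on a small example, along with the closure operators.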
Given a label l, the functions typ, loc, val return (when applicable) its type (i.e., R or W), location and value correspondingly. When a specific function assigning labels to events is clear from the context, we abuse the notations R and W to denote the sets of all events labelled with the corresponding type. We also use subscripts to further restrict this set to a specific location (e.g., W_x denotes the set of write events operating on location x).

Event structures

An event structure S is a tuple ⟨E, tid, lab, po, jf, ew, co⟩ where:

- E is a set of events, i.e., E ⊆ Event.
- tid : E → Tid is a function assigning a thread identifier to every event. We treat events with the thread identifier equal to 0 as initialization events and denote them as Init, that is, Init ≜ {e ∈ E | tid(e) = 0}.
- lab : E → Lab is a function assigning a label to every event in E.
- po ⊆ E × E is a strict partial order on events, called program order, that tracks their precedence in the control flow of the program. Initialization events are po-before all other events, whereas non-initialization events can only be po-before events from the same thread. Not all events of a thread are necessarily ordered by po. We call such po-unordered non-initialization events of the same thread conflicting events. The corresponding binary relation cf is defined as follows:

  cf ≜ ([E \ Init]; =_tid; [E \ Init]) \ (po ∪ po⁻¹)?

- jf ⊆ [E ∩ W]; (=_loc ∩ =_val); [E ∩ R] is the justified from relation, which relates a write event to the reads it justifies. We require that reads are not justified by conflicting writes (i.e., jf ∩ cf = ∅) and that jf⁻¹ be functional (i.e., whenever ⟨w1, r⟩, ⟨w2, r⟩ ∈ jf, then w1 = w2). We also define the notion of external justification: jfe ≜ jf \ po. A read event is externally justified from a write if the write is not po-before the read.
- ew ⊆ [E ∩ W]; (cf ∩ =_loc ∩ =_val)?; [E ∩ W] is an equivalence relation called the equal-writes relation. Equal writes have the same location and value, and (unless identical) are in conflict with one another.
- co ⊆ [E ∩ W]; (=_loc \ ew); [E ∩ W] is the coherence order, a strict partial order that relates non-equal write events with the same location. We require that coherence be closed with respect to equal writes (i.e., ew; co; ew ⊆ co) and total with respect to ew on writes to the same location: ∀x ∈ Loc. ∀w1, w2 ∈ W_x. ⟨w1, w2⟩ ∈ ew ∪ co ∪ co⁻¹.

Given an event structure S, we use "dot notation" to refer to its components (e.g., S.E, S.po). For a set A of events, we write S.A for the set A ∩ S.E (for instance, S.W_x = {e ∈ S.E | typ(S.lab(e)) = W ∧ loc(S.lab(e)) = x}). Further, for e ∈ S.E, we write S.
typ(e) to retrieve typ(S.lab(e)). Similar notation is used for the functions loc and val. Given a set of thread identifiers T, we write S.thread(T) to denote the set of events belonging to one of the threads in T, i.e., S.thread(T) ≜ { e ∈ S.E | S.tid(e) ∈ T }. When T = {t} is a singleton, we often write S.thread(t) instead of S.thread({t}).

We define the immediate po and cf edges of an event structure as follows:

S.po_imm ≜ S.po \ (S.po ; S.po)    S.cf_imm ≜ S.cf ∩ (S.po_imm⁻¹ ; S.po_imm)

An event e₁ is an immediate po-predecessor of e₂ if e₁ is po-before e₂ and there is no event po-between them. Two conflicting events are immediately conflicting if they have the same immediate po-predecessor.

Given a program prog, we construct its event structures operationally in a way that guarantees completeness (i.e., that every read is justified from some write) and po ∪ jf acyclicity. We start with an event structure containing only the initialization events and add one event at a time following each thread's semantics.

For the thread semantics, we assume reductions of the form σ −e→ σ′ between thread states σ, σ′ ∈ ThreadState, labeled by the event e ∈ E generated by that execution step. Given a thread t and a sequence of events e₁, …, eₙ ∈ S.thread(t) in immediate po succession (i.e., ⟨eᵢ, eᵢ₊₁⟩ ∈ S.po_imm for 1 ≤ i < n) starting from a first event of thread t (i.e., dom(S.po ; [e₁]) ⊆ Init), we can add an event e po-after that sequence of events provided that there exist thread states σ₁, …, σₙ and σ such that prog(t) −e₁→ σ₁ −e₂→ σ₂ ⋯ −eₙ→ σₙ −e→ σ, where prog(t) is the initial thread state of thread t of the program prog. By construction, this means that the newly added event e will be in conflict with all other events of thread t besides e₁, …
, eₙ.

Further, when the new event e is a read event, it has to be justified from an existing write event, so as to ensure completeness and prevent "out-of-thin-air" values. The write event is picked non-deterministically from all non-conflicting writes with the same location as the new read event. Similarly, when e is a write event, its position in the co order must be chosen: either an ew equivalence class is picked and the new write is included in it, or the new write is put immediately after some existing write in co order. At each step, we also check for event structure consistency (to be defined in Def. 5): if the event structure obtained after the addition of the new event is inconsistent, it is discarded.

To define consistency, we first need a number of auxiliary definitions. The happens-before order S.hb is a generalization of the program order. Besides the program order edges, it includes certain synchronization edges (captured by the synchronizes-with relation, S.sw):

S.hb ≜ (S.po ∪ S.sw)⁺

For the fragment covered in this section, there are no synchronization edges (i.e., sw = ∅), and so hb and po coincide. In the full model, however, certain justification edges (e.g., between release/acquire accesses) contribute to sw and hence to hb.

The extended conflict relation S.ecf extends the notion of conflicting events to account for hb; two events are in extended conflict if they happen after conflicting events:

S.ecf ≜ (S.hb⁻¹)^? ; S.cf ; S.hb^?

As already mentioned in §2, the reads-from relation, S.rf, of a Weakestmo event structure is derived. It is defined as an extension of S.jf to all S.ew-equivalent writes:

S.rf ≜ (S.ew ; S.jf) \ S.cf

(Our definition of immediate conflicts differs from that of [6] and is easier to work with. The two definitions are equivalent if the set of initialization events is non-empty. The full model is presented in [6] and also in our Coq development [17].)
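The derivation of reads-from above can be made concrete with a small Python sketch. This is our own encoding over explicit edge sets, not part of the paper's Coq development [17]; the event names are hypothetical.

```python
# Sketch: the derived reads-from relation of a Weakestmo event structure,
#   rf = (ew ; jf) \ cf,
# over relations represented as sets of ordered pairs of event ids.

def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def derive_rf(ew, jf, cf):
    """Extend jf to all ew-equivalent writes, then drop conflicting pairs."""
    return compose(ew, jf) - cf

# Hypothetical fragment: w1 and w2 are equal writes (same location and
# value, in conflict with each other); the read r is justified from w1
# and conflicts with w2.
ew = {("w1", "w1"), ("w2", "w2"), ("w1", "w2"), ("w2", "w1")}
jf = {("w1", "r")}
cf = {("w1", "w2"), ("w2", "w1"), ("w2", "r"), ("r", "w2")}

print(sorted(derive_rf(ew, jf, cf)))  # [('w1', 'r')]
```

The edge w2 → r produced by ew ; jf is discarded because w2 conflicts with r, matching the requirement that a read never reads from a conflicting write.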
Note that, unlike S.jf⁻¹, the relation S.rf⁻¹ is not functional. This does not cause any problems, however, since all the writes from which a read reads have the same location and value and are in conflict with one another.

The relation S.fr, called from-read or reads-before, places read events before subsequent writes:

S.fr ≜ S.rf⁻¹ ; S.co

The extended coherence S.eco is a strict partial order that orders events operating on the same location. (It is almost total on accesses to a given location, except that it does not order equal writes nor reads reading from the same write.)

S.eco ≜ (S.co ∪ S.rf ∪ S.fr)⁺

We observe that in our model, eco is equal to rf ∪ co ; rf^? ∪ fr ; rf^?, similar to the corresponding definitions about execution graphs in the literature.

The last ingredient that we need for event structure consistency is the notion of visible events, which will be used to constrain external justifications. We define it in a few steps. Let e be some event in S. First, consider all write events used to externally justify e or one of its justification ancestors. The relation S.jfe ; (S.po ∪ S.jf)* defines this connection formally. Among that set of write events, restrict attention to those conflicting with e, and call that set M. That is, M ≜ dom(S.cf ∩ (S.jfe ; (S.po ∪ S.jf)*) ; [e]). Event e is visible if all writes in M have an equal write that is po-related with e. Formally,

S.Vis ≜ { e ∈ S.E | S.cf ∩ (S.jfe ; (S.po ∪ S.jf)*) ; [e] ⊆ S.ew ; (S.po ∪ S.po⁻¹)^? }

Intuitively, visible events cannot depend on conflicting events: for every such justification dependence, there ought to be an equal non-conflicting write.
Consistency places a number of additional constraints on event structures. First, it checks that there is no redundancy in the event structure: immediate conflicts arise only because of read events justified from non-equal writes. Second, it extends the constraints about cf to the extended conflict ecf; namely, no event can conflict with itself or be justified from a conflicting event. Third, it checks that reads are justified either from events of the same thread or from visible events of other threads. Finally, it ensures coherence, i.e., that executions restricted to accesses of a single location do not have any weak behaviors.

▶ Definition 5.
An event structure S is said to be consistent if the following conditions hold.

dom(S.cf_imm) ⊆ S.R  (cf_imm-read)
S.jf ; S.cf_imm ; S.jf⁻¹ ; S.ew is irreflexive.  (cf_imm-justification)
S.ecf is irreflexive.  (ecf-irreflexivity)
S.jf ∩ S.ecf = ∅  (jf-non-conflict)
dom(S.jfe) ⊆ S.Vis  (jfe-visible)
S.hb ; S.eco^? is irreflexive.  (coherence)

This equivalence does not hold in the original
Weakestmo model [6]. To make the equivalence hold, we made ew transitive and require ew ; co ; ew ⊆ co.

Note that in [6] the definition of the visible events is slightly more verbose. We proved in Coq [17] that our simpler definition is equivalent to the one given there.
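A few of the consistency conditions of Def. 5 translate directly into executable checks. The following Python sketch (our own encoding, covering only ecf-irreflexivity, jf-non-conflict, and coherence) illustrates this on explicit relations:

```python
# Sketch: checking three of the Def. 5 consistency conditions
# (ecf-irreflexivity, jf-non-conflict, coherence) on explicit relations
# given as sets of ordered pairs of event ids.

def compose(r1, r2):
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def irreflexive(r):
    return all(a != b for (a, b) in r)

def consistent_fragment(events, hb, eco, ecf, jf):
    eco_refl = eco | {(e, e) for e in events}          # eco^?
    return (irreflexive(ecf)                           # ecf-irreflexivity
            and not (jf & ecf)                         # jf-non-conflict
            and irreflexive(compose(hb, eco_refl)))    # coherence

events = {"w", "r"}
# A write w hb-before a read r that reads from it: consistent.
print(consistent_fragment(events, hb={("w", "r")}, eco={("w", "r")},
                          ecf=set(), jf={("w", "r")}))  # True
# A coherence violation: r is hb-before w but eco-after it.
print(consistent_fragment(events, hb={("r", "w")}, eco={("w", "r")},
                          ecf=set(), jf={("w", "r")}))  # False
```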
The last part of
Weakestmo is the extraction of executions from an event structure. An execution is essentially a conflict-free event structure.

▶ Definition 6. An execution graph G is a tuple ⟨E, tid, lab, po, rf, co⟩ where its components are defined similarly as in the case of an event structure, with the following exceptions:

po is required to be total on the set of events from the same thread. Thus, execution graphs have no conflicting events, i.e., cf = ∅.
The rf relation is given explicitly instead of being derived. Also, there are no jf and ew relations.
co totally orders write events operating on the same location.

All derived relations are defined similarly as for event structures. Next we show how to extract an execution graph from the event structure.

▶
Definition 7.
A set of events X is called extracted from S if the following conditions are met:

X is conflict-free, i.e., [X] ; S.cf ; [X] = ∅.
X is S.rf-complete, i.e., X ∩ S.R ⊆ codom([X] ; S.rf).
X contains only visible events of S, i.e., X ⊆ S.Vis.
X is hb-downward-closed, i.e., dom(S.hb ; [X]) ⊆ X.

Given an event structure S and an extracted subset of its events X, it is possible to associate with X an execution graph G simply by restricting the corresponding components of S to X:

G.E = X    G.tid = S.tid|_X    G.lab = S.lab|_X
G.po = [X] ; S.po ; [X]    G.rf = [X] ; S.rf ; [X]    G.co = [X] ; S.co ; [X]

We say that such an execution graph G is associated with X and that it is extracted from the event structure S.

Weakestmo additionally defines another consistency predicate to further filter out some of the extracted execution graphs. In the
Weakestmo fragment we consider, this additionalconsistency predicate is trivial—every extracted execution satisfies it—and so we do notpresent it here. In the full model, execution consistency checks atomicity of read-modify-writeinstructions, and sequential consistency for SC accesses.
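The extraction conditions of Def. 7 are likewise directly checkable. Here is a Python sketch (our encoding; set and relation names follow the paper, the concrete events are hypothetical):

```python
# Sketch: the extraction conditions of Def. 7 for a candidate set X:
# conflict-freedom, rf-completeness, visibility, hb-downward-closure.

def extracted(X, cf, rf, R, Vis, hb):
    conflict_free = not any((a, b) in cf for a in X for b in X)
    rf_complete   = all(any(w in X and (w, r) in rf for w in X)
                        for r in X & R)
    visible_only  = X <= Vis
    hb_closed     = all(a in X for (a, b) in hb if b in X)
    return conflict_free and rf_complete and visible_only and hb_closed

# Hypothetical three-event structure: Init, a write w, and a read r
# justified from w, with no conflicts; every event is visible.
E  = {"Init", "w", "r"}
rf = {("w", "r")}
hb = {("Init", "w"), ("w", "r"), ("Init", "r")}
print(extracted(E, set(), rf, {"r"}, E, hb))             # True
# Dropping w breaks both rf-completeness and hb-closure.
print(extracted({"Init", "r"}, set(), rf, {"r"}, E, hb))  # False
```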
Weakestmo
In this section, we outline our correctness proof for the compilation from
Weakestmo to thevarious hardware models. As already mentioned, our proof utilizes
IMM [20]. In the following,we briefly present
IMM for the fragment of the model containing only relaxed reads andwrites (§4.1), our simulation relation (§4.2) for the compilation from
Weakestmo to IMM ,and outline the argument as to why the simulation relation is preserved (§4.3). Mappingfrom
IMM to the hardware models has already been proved correct by Podkopaev et al. [20],so we do not present this part here. Later, in §5, we will extend the
IMM mapping results tocover SC accesses.As a further motivating example for this section consider yet another variant of the loadbuffering program shown in Fig. 5. As we will see, its annotated weak behavior is allowed by
IMM and also by
Weakestmo , albeit in a different way. The argument for constructing the
Weakestmo event structure that exhibits the weak behavior from the given
IMM executiongraph is non-trivial.
Figure 5: A variant of the load-buffering program (left) and the IMM graph G corresponding to its annotated weak behavior (right). The program is

r₁ := [x]    ∥  r₂ := [y]
[y] := r₁    ∥  r₃ := [z]
[z] := 1     ∥  [x] := r₃

and the graph consists of the initialization events together with events e₁: R(x), e₂: W(y), e₃: W(z) of thread 1 and e₄: R(y), e₅: R(z), e₆: W(x) of thread 2, connected by rf and ppo edges (the annotated values are omitted here).

IMM
In order to discuss the proof, we briefly present a simplified version of the formal
IMM definition, where we have omitted constraints about RMW accesses and fences.

▶ Definition 8. An IMM execution graph G is an execution graph (Def. 6) extended with one additional component: the preserved program order ppo ⊆ [R] ; po ; [W].

Preserved program order edges correspond to syntactic dependencies guaranteed to be preserved by all major hardware platforms. For example, the execution graph in Fig. 5 has two ppo edges corresponding to the data dependencies via registers r₁ and r₃. (The full IMM definition [20] distinguishes between the different types of dependencies (control, data, address) and includes them as separate components of execution graphs. In the full model, ppo is actually derived from the more basic dependencies.)
IMM-consistency checks completeness, coherence, and acyclicity:

▶ Definition 9. An IMM execution graph G is IMM-consistent if

codom(G.rf) = G.R,  (completeness)
G.hb ; G.eco^? is irreflexive, and  (coherence)
G.rf ∪ G.ppo is acyclic.  (no-thin-air)

As we can see, the execution graph G of Fig. 5 is IMM-consistent because every read of the graph reads from some write event and, moreover, the coherence and no-thin-air properties hold.
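The (no-thin-air) condition is a plain graph-acyclicity check on rf ∪ ppo. Below is a Python sketch (ours); the event names and edges are one reading of the Fig. 5 graph, under the annotated outcome in which thread 1 reads x from thread 2's write.

```python
# Sketch: checking (no-thin-air) of Def. 9, i.e., acyclicity of rf ∪ ppo,
# via an iterative depth-first search over the union of the edge sets.

def acyclic(edges):
    succs = {}
    for a, b in edges:
        succs.setdefault(a, []).append(b)
    visited, done = set(), set()
    for root in list(succs):
        if root in done:
            continue
        visited.add(root)
        stack = [(root, iter(succs.get(root, [])))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                done.add(node)
                stack.pop()
            elif nxt in visited and nxt not in done:
                return False            # back edge: cycle found
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(succs.get(nxt, []))))
    return True

# Our reading of Fig. 5: e1..e3 in thread 1 (R x; W y; W z),
# e4..e6 in thread 2 (R y; R z; W x).
rf  = {("e2", "e4"), ("e3", "e5"), ("e6", "e1")}
ppo = {("e1", "e2"), ("e5", "e6")}
print(acyclic(rf | ppo))  # True: the constant write to z breaks the cycle
```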
Weakestmo to IMM
Proof
In this section, we define the simulation relation I , which is used for the simulation of atraversal of an IMM -consistent execution graph by a
Weakestmo event structure presented in§2.3.The way we define I ( prog , G, h C, I i , S, X ) induces a strong connection between events inthe execution graph G and the event structure S . We make this connection explicit with thefunction s2g G,S : S. E → G. E , which maps events of the event structure S into the events ofthe execution graph G , such that e and s2g G,S ( e ) belong to the same thread and have the Again, this is a simplified presentation for a fragment of the model. We refer the reader to Podkopaev et al. [20] for the full definition, which further distinguishes between internal and external rf edges. A refined version of the simulation relation for the full
Weakestmo model can be found in [17, Appendix A].
same po-position in the thread. Note that s2g
G,S is defined for all events e ∈ S. E , meaningthat the event structure S does not contain any redundant events that do not correspond toevents in the IMM execution graph G . The function s2g G,S , however, does not have to beinjective: in particular, events e and e that are in immediate conflict in S have the same s2g G,S -image in G . In the rest of the paper, whenever G and S are clear from the context,we omit the G, S subscript from s2g .In the context of a function s2g (for some G and S ), we also use (cid:86) · (cid:87) and (cid:84) · (cid:85) to lift s2g to sets and relations:for A S ⊆ S. E : (cid:86) A S (cid:87) (cid:44) { s2g ( e ) | e ∈ A S } for A G ⊆ G. E : (cid:84) A G (cid:85) (cid:44) { e ∈ S. E | s2g ( e ) ∈ A G } for R S ⊆ S. E × S. E : (cid:86) R S (cid:87) (cid:44) {h s2g ( e ) , s2g ( e ) i | h e, e i ∈ R S } for R G ⊆ G. E × G. E : (cid:84) R G (cid:85) (cid:44) {h e, e i ∈ S. E × S. E | h s2g ( e ) , s2g ( e ) i ∈ R G } For example, (cid:84) C (cid:85) denotes a subset of S ’s events whose s2g -images are covered events in G ,and (cid:86) S. rf (cid:87) denotes a relation on events in G whose s2g -preimages in S are related by S. rf .We define the relation I ( prog , G, h C, I i , S, X ) to hold if the following conditions are met: G is an IMM -consistent execution of prog . S is a Weakestmo -consistent event structure of prog . X is an extracted subset of S . S and X corresponds precisely to all covered and issued events and their po -predecessors: (cid:86) S. E (cid:87) = (cid:86) X (cid:87) = C ∪ dom ( G. po ? ; [ I ])(Note that C is closed under po -predecessors, so dom ( G. po ? ; [ C ]) = C .) Each S event has the same thread, type, modifier, and location as its corresponding G event. In addition, covered and issued events in X have the same value as theircorresponding ones in G . a. ∀ e ∈ S. E . S. { tid , typ , loc , mod } ( e ) = G. { tid , typ , loc , mod } ( s2g ( e )) b. ∀ e ∈ X ∩ (cid:84) C ∪ I (cid:85) . S. val ( e ) = G. 
val ( s2g ( e )) Program order in S corresponds to program order in G : (cid:86) S. po (cid:87) ⊆ G. po Identity relation in G corresponds to identity or conflict relation in S : (cid:84) id (cid:85) ⊆ S. cf ? Reads in S are justified by writes that have already been observed by the correspondingevents in G . Moreover, covered events in X are justified by a write corresponding to thatread from the corresponding read in G : a. (cid:86) S. jf (cid:87) ⊆ G. rf ? ; G. hb ? Here we assume existence and uniqueness of such a function. In our Coq development [17], we have adifferent representation of execution graph events (but the same for events of event structures), whichmakes the existence and uniqueness questions trivial.More specifically, we follow Podkopaev et al. [20, §2.2]. There each non-initializing event e of an executiongraph G is encoded as a pair h t, n i where t is e ’s thread and n is a serial number of e in thread t , i.e., aposition of e in G. po restricted to events of thread t ; each initializing event is encoded by the correspondinglocation— h init l i .In this representation, the function s2g G,S for an event e returns (i) the e ’s thread and a number ofnon-initial events which S. po -preceded e if e is non-initialing or (ii) its location if it is initializing: s2g G,S ( e ) (cid:44) (cid:26) h S. tid ( e ) , | dom ([ S. E \ S. Init ]; S. po ; [ e ]) |i for e S. Init h init S. loc ( e ) i for e ∈ S. Init
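Under the event representation described in the footnote above, s2g is directly computable: a non-initial event is mapped to its thread paired with the number of its non-initial po-predecessors, and an initial event to its location. A Python sketch of that encoding (our own, mirroring the footnote):

```python
# Sketch: the s2g encoding from the footnote. A non-initial event e maps
# to (tid(e), |dom([E \ Init] ; po ; [e])|); an initial event maps to
# its location.

def s2g(e, tid, loc, po, Init):
    if e in Init:
        return ("init", loc[e])
    serial = len({a for (a, b) in po if b == e and a not in Init})
    return (tid[e], serial)

# A thread (id 1) with three events a, b, c; po is transitively closed
# and includes the initialization event i.
Init = {"i"}
po   = {("i", "a"), ("i", "b"), ("i", "c"),
        ("a", "b"), ("a", "c"), ("b", "c")}
tid  = {"a": 1, "b": 1, "c": 1}
loc  = {"i": "x"}
print([s2g(e, tid, loc, po, Init) for e in ("i", "a", "b", "c")])
# [('init', 'x'), (1, 0), (1, 1), (1, 2)]
```

Two immediately conflicting events have the same set of non-initial po-predecessors, so they receive the same serial number and hence the same s2g-image, as required.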
Figure 6
The execution graph G , its traversal configuration T C a , the related event structure S a ,and the selected execution X a . Covered events are marked by and issued ones by . Eventsbelonging to the selected execution are marked by . b. (cid:86) S. jf ; [ X ∩ (cid:84) C (cid:85) ] (cid:87) ⊆ G. rf Every write event justifying some external read event should be S. ew -equal to some issuedwrite event in X : dom ( S. jfe ) ⊆ dom ( S. ew ; [ X ∩ (cid:84) I (cid:85) ]) Equal writes in S correspond to the same write event in G : (cid:86) S. ew (cid:87) ⊆ id Every non-trivial S. ew equivalence class contains an issued write in X : S. ew ⊆ ( S. ew ; [ X ∩ (cid:84) I (cid:85) ] ; S. ew ) ? Coherence edges in S correspond to coherence or identity edges in G . (We will explain in§4.3 why a coherence edge in S might correspond to an identity edge in G .) (cid:86) S. co (cid:87) ⊆ G. co ? As an example, consider the execution G from Fig. 5, the traversal configuration T C a (cid:44) h{ Init } , { Init , e }i , and the event structure S a shown in Fig. 6. We will show that I ( prog , G, T C a , S a , X a ), where X a (cid:44) S a . E , holds.Take s2g G,S a = { Init Init , e e , e e , e e } . Given that cf = ew = ∅ , theconsistency constraints hold immediately. For example, condition 8 holds because e isjustified by Init , which happens before it. Finally, note that only e and e are required tohave the same value by constraint 5, the other related thread events only need to have thesame type and address.The definition of the simulation relation I renders the proofs of Lemmas 2 and 4 straight-forward. Specifically, for Lemma 2, the initial configuration T C init ( G ) containing only theinitialization events is simulated by the initial event structure S init as all the constraints aretrivially satisfied ( S init . po = S init . jf = S init . ew = S init . 
co = ∅ ).For Lemma 4, since T C final ( G ) covers all events of G , property 5 implies that the labelsof the events in X are equal to the corresponding events of G ; property 6 means that po isthe same between them; property 8 means that rf is the same between them; properties 7and 12 together mean that co is the same. Therefore, G and the execution corresponding to X are isomorphic. We next outline the proof of Lemma 3, which states that the simulation relation I can berestored after a traversal step. . Moiseenko et al. 5:17 Init e : R ( x, e : W ( y, e : W ( z, e : R ( y, e : R ( z, e : W ( x, T C b vf vfvf ppo ppo Init e : R ( x, e : W ( y, e : W ( z, e : R ( y, e : R ( z, e : W ( x, jf jfjf The event structure S b andthe selected execution X b Figure 7
The traversal configuration
T C b , the related event structure S b , and the selectedexecution X b . Suppose that I ( prog , G, T C, S, X ) holds for some prog , G , T C , S , and X , and we needto simulate a traversal step T C −→ T C that either covers or issues an event of thread t . Then we need to produce an event structure S and a subset of its events X such that I ( prog , G, T C , S , X ) holds. Whenever thread t has any uncovered issued write events, Weakestmo might need to take multiple steps from S to S so as to add any missing events po -before the uncovered issued writes of thread t . Borrowing the terminology of the “promisingsemantics” [12], we refer to these steps as constructing a certification branch for the issuedwrite(s).Before we present the construction, let us return to the example of Fig. 5. Considerthe traversal step from configuration T C a to configuration T C b (cid:44) h{ Init } , { Init , e , e }i byissuing the event e (see Fig. 7). To simulate this step, we need to show that it is possibleto execute instructions of thread 2 and extend the event structure with a set of events Br b matching these instructions. As we have already seen, the labels of the new events can differfrom their counterparts in G —they only have to agree for the covered and issued events. Inthis case, we set Br b = { e , e , e } , and adding them to the event structure S a gives usevent structure S b shown in Fig. 
7.In more detail, we need to build a run of thread-local semantics prog (2) e −−→ e −−→ e −−→ σ such that (1) it contains events corresponding to all the events of thread 2 up to e ( i.e., e , e , e ) with the same location, type, and thread identifier and (2) any events correspondingto covered or issued events ( i.e., e ) should also have the same value as the correspondingevent in G .Then, following the run of the thread-local semantics, we should extend the event structure S a to S b by adding new events Br b , and ensure that the constructed event structure S b isconsistent (Def. 5) and simulates the configuration T C b . In particular, it means that:for each read event in Br b we need to pick a justification write event, which is eitheralready present in S or po -preceed the read event;for each write event in Br b we should determine its position in co order of the eventstructure.Finally, we need to update the selected execution by replacing all events of thread 2 by thenew events Br b : X b (cid:44) X a \ S. thread ( { } ) ∪ Br b . E C O O P 2 0 2 0 :18 Reconciling Event Structures with Modern Multiprocessors
In order to determine whence these read events should be justified (and hence what valuethey should return), we have adopted the approach of Podkopaev et al. [20] for a similarproblem with certifying promises in the compilation proof from PS to IMM . The constructionrelies on several auxiliary definitions.First, given an execution G and a traversal configuration h C, I i , we define the set of determined events to be those events of G that must have equal counterparts in S . Inparticular, this means that S should assign to these events the same label as G , and thus thesame reads-from source for the read events. G. determined h C,I i (cid:44) C ∪ I ∪ dom (( G. rf ∩ G. po ) ? ; G. ppo ; [ I ]) ∪ codom ([ I ] ; ( G. rf ∩ G. po ))Besides covered and issued events, the set of determined events also contains the ppo -prefixesof issued events, since issued events may depend on their values, as well as any internal readsreading from issued events, since their values are also determined by the issued events.For the graph G and traversal configuration T C b , the set of determined events containsevents e , e , and e . (The events e and e are issued, whereas e has a ppo edge to e .)In contrast, events e , e , and e are not determined, since their corresponding events in S read/write a different value.Second, we introduce the viewfront relation ( vf ) to contain all the writes that have beenobserved at a certain point in the graph. That is, the edge h w, e i ∈ G. vf T C indicates thatthe write w either happens before e , is read by a covered event happening before e , or isread by a determined read earlier in the same thread as e . G. vf h C,I i (cid:44) [ G. W ] ; ( G. rf ; [ C ]) ? ; G. hb ? ∪ G. rf ; [ G. determined h C,I i ] ; G. po ? Figure 7 depicts three G. vf T C b edges. Since G. vf T C ; G. po ⊆ G. vf T C , the other incomingviewfront edges to thread 2 can be derived. 
Note that there is no edge from e to thread 2,since e neither happens before any event in thread 2 nor is read by any determined read.Finally, we construct the stable justification relation ( sjf ) that helps us justify the readevents in Br b in the event structure: G. sjf T C (cid:44) ([ G. W ] ; ( G. vf T C ∩ = G. loc ) ; [ G. R ]) \ ( G. co ; G. vf T C )It relates a read event r to the co -last ‘observed’ write event with same location. Assumingthat G is IMM -consistent, it can be shown that G. sjf agrees with G. rf on the set ofdetermined reads. G. sjf T C ; [ G. determined T C ] ⊆ G. rf For the graph G and traversal configuration T C b shown in Fig. 7 the sjf relation coincideswith the depicted vf edges: i.e., we have h Init , e i , h Init , e i , h e , e i ∈ G. sjf T C b .Having sjf T C b as a guide for values read by instructions in the certification run, weconstruct the steps of the thread-local operational semantics prog (2) −→ ∗ σ using thereceptiveness property of the thread’s semantics, which essentially says that given an executiontrace τ = e , ... , e n of the thread semantics, and a subset of events K ⊆ { e , ... , e n − } alongthat trace that have no ppo -successors in the graph, we arbitrarily change the values of readevents in K , and there exist values for the write events in K such that the updated executiontrace is also a trace of the thread semantics. The formal definition of the receptiveness property is quite elaborate. For the detailed definition werefer the reader to the Coq development of
IMM [7].
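The stable justification relation sjf defined above can also be computed directly from vf and co: among the observed same-location writes for a read, keep only those not co-overwritten by another observed write. A Python sketch (ours; event names hypothetical):

```python
# Sketch: G.sjf = ([W] ; (vf ∩ =loc) ; [R]) \ (co ; vf), relating each
# read to the co-maximal observed write on its location.

def sjf(W, R, loc, vf, co):
    observed = {(w, r) for (w, r) in vf
                if w in W and r in R and loc[w] == loc[r]}
    overwritten = {(w, r) for (w, r) in observed
                   if any((w, w2) in co and (w2, r) in vf for w2 in W)}
    return observed - overwritten

# Two writes to x, w1 co-before w2, both observed by a read r: only the
# co-later write w2 stably justifies r.
W, R = {"w1", "w2"}, {"r"}
loc  = {"w1": "x", "w2": "x", "r": "x"}
vf   = {("w1", "r"), ("w2", "r")}
co   = {("w1", "w2")}
print(sjf(W, R, loc, vf, co))  # {('w2', 'r')}
```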
Figure 8: The traversal configuration TC_c, the related event structure S_c, and the selected execution X_c.

The relation sjf
T C b is also used to pick justification writes for the read events in Br b . Wehave proved that each sjf edge either starts in some issued event (of the previous traversalconfiguration) or it connects two events that are related by po : G. sjf T C b ⊆ [ I a ] ; G. sjf T C b ∪ G. po In the former case, thanks to the property 4 of our simulation relation, we can pick awrite event from X a corresponding to the issued write ( e.g., for Fig. 7, it is the event e ,corresponding to the issued write e ). In the latter case, we pick either the initial write orsome S b . po preceding write belonging to Br b . In order to pick the S b . co position of the new write events in the updated event structure, wegenerally follow the original G. co order of the IMM graph. Because of the conflicting events,however, it is not always possible to preserve the inclusion between the relations. This iswhy we relax the inclusion to (cid:86) S. co (cid:87) ⊆ G. co ? in property 12 of the simulation relation.To see the problem let us return to the example. Suppose that the next traversal stepcovers the read e . To simulate this step, we build an event structure S c (see Fig. 8). Itcontains the new events Br c (cid:44) { e , e , e } .Consider the write events e and e of the event structure. Since the events havedifferent labels, we cannot make them ew -equivalent. And since S c . co should be total amongall writes to the same location (with respect to S c . ew ), we must put a co edge between thesetwo events in one direction or another. Note that events e and e correspond to the sameevent e in the graph, thus we cannot use the coherence order of the graph G. co to guideour decision.In fact, the co -order between these two events does not matter, so we could pick eitherdirection. For the purposes of our proofs, however, we found it more convenient to alwaysput the new events earlier in the co order (thus we have h e , e i ∈ S c . co ). 
Thereby we canshow that the co edges of the event structure ending in the new events, have correspondingedges in the graph: (cid:86) S c . co ; [ Br c ] (cid:87) ⊆ G. co .Now consider the events e and e . Since these events have the same label and correspondto the same event in G , we make them ew -equivalent. In fact, this choice is necessary for thecorrectness of our construction. Otherwise, the new events Br c would be deemed invisible,because of the S c . cf ∩ ( S c . jfe ; ( S c . po ∪ S c . jf ) ∗ ) path between e and e . Recall that onlythe visible events can be used to extract an execution from the event structure (Def. 7). E C O O P 2 0 2 0 :20 Reconciling Event Structures with Modern Multiprocessors
In general, assuming that I ( prog , G, h C, I i , S, X ) holds, we attach the new write event e to an S. ew equivalence class represented by the write event w , s.t. (i) w has the same s2g image as e , i.e., s2g ( w ) = s2g ( e ); (ii) w belongs to X and its s2g image is issued, that is w ∈ X ∩ (cid:84) I (cid:85) . If there is no such an event w , we put e S. co -after events such that their s2g images are ordered G. co -before s2g ( e ), and S. co -before events such that their s2g imagesare equal to s2g ( e ) or ordered G. co -after it. Note that thanks to property 9 of the simulationrelation, that is dom ( S. jfe ) ⊆ dom ( S. ew ; [ X ∩ (cid:84) I (cid:85) ]), our choice of ew guarantees that allnew events will be visible. To sum up, to prove Lemma 3, we consider the events of G. thread ( { t } ) where t is thethread of the event issued or covered by the traversal step T C −→ T C , together with the sjf relation determining the values of the read events. At this point, we can show that I -conditions for the new configuration T C hold for all events except for those in thread t .Because of receptiveness, there exists a sequence of the thread steps prog ( t ) −→ ∗ σ forsome thread state σ such that the labels on this sequence match the events G. thread ( { t } )with the labels determined by sjf , and include an event with the same label as the oneissued or covered by the traversal step T C −→ T C .We then do an induction on this sequence of steps, and add each event to the eventstructure S and to its selected subset of events X (unless already there), showing along theway that the I -conditions also hold for the updated event structure, selected subset, andthe events added. At the end, when we have considered all the events generated by thestep sequence, we will have generated the event structure S and execution X such that I ( prog , G, T C , S , X ) holds. In this section, we briefly describe the changes needed in order to handle the compilationof
Weakestmo ’s sequentially consistent (SC) accesses. The purpose of SC accesses is toguarantee sequential consistency for the simple programming pattern that uses exclusivelySC accesses to communicate between threads. As Lahav et al. [14] showed, however, theirsemantics is quite complicated because they can be freely mixed with non-SC accesses.We first define an extension of
IMM , which we call
IMM SC . Its consistency extends thatof IMM with an additional acyclicity requirement concerning SC accesses, which is takendirectly from
RC11-consistency [14, Definition 1].

▶ Definition 10. An execution graph G is IMM_SC-consistent if it is IMM-consistent [20, Definition 3.11] and G.psc_base ∪ G.psc_F is acyclic, where:

G.scb ≜ G.po ∪ G.po|≠loc ; G.hb ; G.po|≠loc ∪ G.hb|loc ∪ G.co ∪ G.fr
G.psc_base ≜ ([G.E^sc] ∪ [G.F^sc] ; G.hb^?) ; G.scb ; ([G.E^sc] ∪ G.hb^? ; [G.F^sc])
G.psc_F ≜ [G.F^sc] ; (G.hb ∪ G.hb ; G.eco ; G.hb) ; [G.F^sc]

(In IMM_SC, event labels include an "access mode", where sc denotes an SC access. The set G.E^sc consists of all SC accesses (reads, writes, and fences) in G, and G.F^sc consists of all SC fences in G.)

The scb, psc_base, and psc_F relations were carefully designed by Lahav et al. [14] (and recently adopted by the C++ standard) so that they provide strong enough guarantees for programmers while being weak enough to support the intended compilation of SC accesses to commodity hardware. In particular, a previous (simpler) proposal in [2], which essentially includes G.hb between SC accesses in the relation required to be acyclic, is too strong for efficient compilation to the POWER architecture. Indeed, the compilation schemes to POWER do not enforce a strong barrier on hb-paths between SC accesses, but rather on G.po ; G.hb ; G.po-paths between SC accesses.

▶ Remark 11.
The full IMM model (i.e., including release/acquire accesses and SC fences, as defined by Podkopaev et al. [20]) forbids cycles in rfe ∪ ppo ∪ bob ∪ psc_F, where bob is (similar to ppo) a subset of the program order that must be preserved due to the presence of a memory fence or release/acquire access. Since psc_F is already included in IMM's acyclicity constraint, one may consider the natural option of including psc_base in that acyclicity constraint as well. However, this leads to a model that is too strong, as it forbids the following behavior:

a := [x]_rlx ; [y]_sc := 1   ∥   [y]_sc := 2   ∥   b := [y]_rlx ; [x]_rlx := b

(The accompanying execution graph, garbled in this version, shows the events R_rlx(x), W_sc(y, 1), W_sc(y, 2), R_rlx(y), and W_rlx(x) connected by bob, coe, psc_base, ppo, and rfe edges forming the cycle in question.)

This behavior is allowed by POWER (using either of the two intended compilation schemes for SC accesses; see §5.1.2).

Adapting the compilation from Weakestmo to IMM SC to cover SC accesses is straightforward because the full definition of Weakestmo [6] does not have any additional constraints about SC accesses at the level of event structures. It only has an SC constraint at the level of extracted executions, which is actually the same as in RC11 and which we took as is for IMM SC.

IMM SC to Hardware

In this section, we describe the extension of the results of [20] to support SC accesses with their intended compilation schemes to the different architectures. As was done in [20], since IMM SC and the models of hardware we consider are all defined in the same declarative framework (using execution graphs), we formulate our results at the level of execution graphs. Thus, we actually consider the mapping of IMM SC execution graphs to target-architecture execution graphs that is induced by compilation of IMM SC programs to machine programs. Hence, roughly speaking, for each architecture α ∈ {TSO, POWER, ARMv7, ARMv8}, our (mechanized) result takes the following form:

If the α-execution-graph G_α corresponds to the IMM SC-execution-graph G, then α-consistency of G_α implies IMM SC-consistency of G.

Since the mapping from Weakestmo to IMM SC (on the program level) is the identity mapping (Theorem 1), we obtain as a corollary the correctness of the compilation from Weakestmo to each architecture α that we consider. The exact notions of correspondence between G_α and G are presented in [17, Appendices B, C, and D]. The mapping of IMM SC to each architecture follows the intended compilation scheme of C/C++11 [16, 14], and extends the corresponding mappings of IMM from Podkopaev et al. [20] with the mapping of SC reads and writes. Next, we schematically present these extensions.
TSO
There are two alternative sound mappings of SC accesses to x86-TSO:

Fence after SC writes:             Fence before SC reads:
(|R_sc|) ≜ mov                     (|R_sc|) ≜ mfence;mov
(|W_sc|) ≜ mov;mfence              (|W_sc|) ≜ mov
(|RMW_sc|) ≜ (lock) xchg           (|RMW_sc|) ≜ (lock) xchg

The first, which is implemented in mainstream compilers, inserts an mfence after every SC write, whereas the second inserts an mfence before every SC read. Importantly, one should globally apply one of the two mappings to ensure the existence of an mfence between every SC write and every subsequent SC read.
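The need to apply one scheme globally can be checked mechanically on compiled instruction sequences. The sketch below is our own illustration (the helper names are invented): it compiles SC accesses under each scheme and tests that an mfence separates every SC write from every later SC read, a property that fails when the two schemes are mixed:

```python
def compile_tso(accesses, scheme):
    """Compile a list of SC accesses ("Wsc"/"Rsc"); tag each instruction."""
    out = []
    for kind in accesses:
        if scheme == "fence-after-writes" and kind == "Wsc":
            out += [("mov", kind), ("mfence", None)]
        elif scheme == "fence-before-reads" and kind == "Rsc":
            out += [("mfence", None), ("mov", kind)]
        else:
            out += [("mov", kind)]
    return out

def write_read_fenced(compiled):
    """Is there an mfence between every SC write and every later SC read?"""
    for i, (_, ki) in enumerate(compiled):
        for j in range(i + 1, len(compiled)):
            _, kj = compiled[j]
            if ki == "Wsc" and kj == "Rsc":
                if not any(ins == "mfence" for ins, _ in compiled[i + 1 : j]):
                    return False
    return True

accs = ["Wsc", "Rsc", "Wsc", "Rsc"]
assert write_read_fenced(compile_tso(accs, "fence-after-writes"))
assert write_read_fenced(compile_tso(accs, "fence-before-reads"))

# Mixing the schemes across accesses loses the guarantee: an unfenced
# SC write followed by an unfenced SC read.
mixed = compile_tso(["Wsc"], "fence-before-reads") + \
        compile_tso(["Rsc"], "fence-after-writes")
assert not write_read_fenced(mixed)
```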
POWER
There are two alternative sound mappings of SC accesses to POWER:

Leading sync:                           Trailing sync:
(|R_sc|) ≜ sync; (|R_acq|)              (|R_sc|) ≜ ld;sync
(|W_sc|) ≜ sync;st                      (|W_sc|) ≜ (|W_rel|);sync
(|RMW_sc|) ≜ sync; (|RMW_acq|)          (|RMW_sc|) ≜ (|RMW_rel|);sync

The first scheme inserts a sync before every SC access, while the second inserts a sync after every SC access. Importantly, one should globally apply one of the two mappings to ensure the existence of a sync between every two SC accesses.

Observing that sync is the result of mapping an SC fence to POWER, we can reuse the existing proof for the mapping of IMM to POWER. To handle the leading-sync (respectively, trailing-sync) scheme, we introduce a preceding step, in which we prove that splitting every SC access in the whole execution graph into a pair of an SC fence followed (respectively, preceded) by a release/acquire access is a sound transformation under IMM SC. That is, this global execution-graph transformation cannot make an inconsistent execution consistent:

Theorem 12. Let G be an execution graph such that
[R_sc ∪ W_sc] ; (G.po′ ∪ G.po′ ; G.hb ; G.po′) ; [R_sc ∪ W_sc] ⊆ G.hb ; [F_sc] ; G.hb,
where G.po′ ≜ G.po \ G.rmw. Let G′ be the execution graph obtained from G by weakening the access modes of SC write and read events to release and acquire modes respectively. Then, IMM SC-consistency of G follows from IMM-consistency of G′.

Having this theorem, we can think of the mapping of IMM SC to POWER as consisting of three steps, and we establish the correctness of each of them separately:
1. At the IMM SC level, we globally split each SC access into an SC fence and a release/acquire access. Correctness of this step follows by Theorem 12.
2. We map IMM to POWER, whose correctness follows by the existing results of [20], since we no longer have SC accesses at this stage.
3. We remove any redundant fences introduced by the previous step. Indeed, following the leading-sync scheme, we obtain sync;lwsync;st for an SC write. The lwsync is redundant here, since sync provides stronger guarantees than lwsync, and can be removed. Similarly, following the trailing-sync scheme, we obtain ld;cmp;bc;isync;sync for an SC read. Again, the sync makes the other synchronization instructions redundant.
ARMv7
The ARMv7 model [1] is very similar to the POWER model, the main difference being that it has a weaker preserved program order than POWER. However, Podkopaev et al. [20] proved the correctness of the IMM-to-POWER compilation without relying on POWER's preserved program order explicitly, instead assuming the weaker ARMv7 version of that order. Thus, their proof also establishes correctness of compilation from IMM to ARMv7.

Extending the proof to cover SC accesses follows the same scheme discussed for POWER, since the two intended mappings of SC accesses for ARMv7 are the same except for replacing POWER's sync fence with ARMv7's dmb:

Leading dmb:                            Trailing dmb:
(|R_sc|) ≜ dmb; (|R_acq|)               (|R_sc|) ≜ ldr;dmb
(|W_sc|) ≜ dmb;str                      (|W_sc|) ≜ (|W_rel|);dmb
(|RMW_sc|) ≜ dmb; (|RMW_acq|)           (|RMW_sc|) ≜ (|RMW_rel|);dmb

ARMv8
Since ARMv8 has added dedicated instructions to support C/C++-style SC accesses, we have established the correctness of a mapping employing these new instructions:

(|R_sc|) ≜ LDAR
(|W_sc|) ≜ STLR
(|FADD_sc|) ≜ L:LDAXR;STLXR;BC L
(|CAS_sc|) ≜ L:LDAXR;CMP;BC Le;STLXR;BC L;Le:

We note that in this mapping, we follow Podkopaev et al. [20] and compile RMW operations to loops with load-linked and store-conditional instructions (LDX/STX). An alternative mapping for RMWs would be to use single hardware instructions, such as LDADD and CAS, that directly implement the required functionality. Unfortunately, due to a limitation of the current IMM setup and unclarity about the exact semantics of the CAS instruction, we are not able to prove the correctness of the alternative mapping employing these instructions. The problem is that IMM assumes that every po-edge from an RMW instruction is preserved, which holds for the mapping of CAS using the aforementioned loop, but not necessarily using the single instruction.
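The constraint at issue can be phrased as an inclusion between edge sets. The following sketch is a schematic rendering with invented edge sets, not IMM's actual definitions: IMM requires the po edge from an RMW to a later access to land in the preserved order; the LL/SC loop's status-checking branch yields a control dependency that provides such an edge, while a hypothetical single-instruction LDADD mapping yields none:

```python
# One thread: an RMW followed in program order by a relaxed write "w".
po_from_rmw = {("rmw", "w")}       # the edge IMM assumes is preserved

# LL/SC loop: the conditional branch that checks the store-conditional
# result creates a control dependency from the RMW to later instructions.
preserved_llsc = {("rmw", "w")}

# Single LDADD: no branch, hence no dependency ordering the later write.
preserved_ldadd = set()

# The loop mapping satisfies IMM's assumption; the single-instruction
# mapping need not, which is exactly the obstacle described above.
assert po_from_rmw <= preserved_llsc
assert not (po_from_rmw <= preserved_ldadd)
```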
Related Work

While there are several memory model definitions both for hardware architectures [1, 10, 18, 22, 23] and for programming languages [3, 4, 11, 15, 19, 21] in the literature, there are relatively few compilation correctness results [6, 9, 12, 14, 20, 25].

Most of these compilation results do not tackle any of the problems caused by po ∪ rf cycles, which are the main cause of complexity in establishing correctness of compilation mappings to hardware architectures. A number of papers (e.g., [6, 12, 25]) consider only hardware models that forbid such cycles, such as x86-TSO [18] and "strong POWER" [13], while others (e.g., [9]) consider compilation schemes that introduce fences and/or dependencies so as to prevent po ∪ rf cycles. The only compilation results with some non-trivial interplay of dependencies are by Lahav et al. [14] and by Podkopaev et al. [20].

The former paper [14] defines the RC11 model (repaired C11) and establishes a number of results about it, most of which are not related to compilation. The only relevant result is its pencil-and-paper correctness proof of a compilation scheme from RC11 to POWER that adds a fence between relaxed reads and subsequent relaxed writes, but not between non-atomic accesses. As such, the only po ∪ rf cycles possible under the compilation scheme involve a racy non-atomic access. Since non-atomic races have undefined semantics in RC11, whenever there is such a cycle, the proof appeals to receptiveness to construct a different acyclic execution exhibiting the race.

The latter paper [20] introduced IMM and used it to establish correctness of compilation from the "promising semantics" (PS) [12] to the usual hardware models. As already mentioned, IMM's definition catered precisely for the needs of the PS compilation proof, and so did not include important features such as sequentially consistent (SC) accesses. Our compilation proof shares some infrastructure with that proof, namely, the definition of IMM and traversals, but also has substantial differences because PS is quite different from Weakestmo. The main challenges in the PS proof were (1) to encode the various orders of the IMM execution graphs with the timestamps of the PS machine, and (2) to construct the certification runs for each outstanding promise. In contrast, the main technical challenge in the Weakestmo compilation proof is that event structures represent several possible executions of the program together, and that Weakestmo consistency includes constraints that correlate these executions, allowing one execution to affect the consistency of another.
Conclusion

In this paper, we presented the first correctness proof of the mapping from the Weakestmo memory model to a number of hardware architectures. To show correctness of Weakestmo compilation to hardware, we employed IMM [20], which we extended with SC accesses and from which compilation to hardware follows.

Although relying on IMM modularizes the compilation proof and makes it easy to extend to multiple architectures, it does have one limitation. As was discussed in §5.1.4, IMM enforces ordering between RMW events and subsequent memory accesses, while one desirable alternative compilation mapping of RMWs to ARMv8 does not enforce this ordering, which means that we cannot prove soundness of that mapping via the current definition of IMM. We are investigating whether one can weaken the corresponding IMM constraint, so that we can establish correctness of the alternative ARMv8 mapping as well.

Another way to establish correctness of this alternative mapping to ARMv8 may be to use the recently developed Promising-ARM model [23]. Indeed, since Promising-ARM is closely related to PS [12], it should be relatively easy to prove the correctness of compilation from PS to Promising-ARM. Establishing compilation correctness of Weakestmo to Promising-ARM, however, would remain unresolved, because Weakestmo and PS are incomparable [6]. Moreover, a direct compilation proof would probably also be quite difficult because of the rather different styles in which these models are defined.

Acknowledgments.
Evgenii Moiseenko and Anton Podkopaev were supported by RFBR (grant number 18-01-00380). Ori Lahav was supported by the Israel Science Foundation (grant number 5166651), by Len Blavatnik and the Blavatnik Family Foundation, and by the Alon Young Faculty Fellowship.
References
[1] Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1–7:74, July 2014. doi:10.1145/2627752.
[2] Mark Batty, Alastair F. Donaldson, and John Wickerson. Overhauling SC atomics in C11 and OpenCL. In POPL 2016, pages 634–648. ACM, 2016.
[3] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Mathematizing C++ concurrency. In POPL 2011, pages 55–66, New York, 2011. ACM. doi:10.1145/1925844.1926394.
[4] John Bender and Jens Palsberg. A formalization of Java's concurrent access modes. Proc. ACM Program. Lang., 3(OOPSLA):142:1–142:28, October 2019. doi:10.1145/3360568.
[5] Hans-J. Boehm and Brian Demsky. Outlawing ghosts: Avoiding out-of-thin-air results. In MSPC 2014, pages 7:1–7:6. ACM, 2014. doi:10.1145/2618128.2618134.
[6] Soham Chakraborty and Viktor Vafeiadis. Grounding thin-air reads with event structures. Proc. ACM Program. Lang., 3(POPL):70:1–70:27, 2019. doi:10.1145/3290383.
[7] The Coq development of IMM, available at http://github.com/weakmemory/imm, 2019.
[8] Will Deacon. The ARMv8 application level memory model, 2017. URL: https://github.com/herd/herdtools7/blob/master/herd/libdir/aarch64.cat.
[9] Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy. Bounding data races in space and time. In PLDI 2018, pages 242–255, New York, 2018. ACM. doi:10.1145/3192366.3192421.
[10] Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. Modelling the ARMv8 architecture, operationally: Concurrency and ISA. In POPL 2016, pages 608–621, New York, 2016. ACM. doi:10.1145/2837614.2837615.
[11] Alan Jeffrey and James Riely. On thin air reads: Towards an event structures model of relaxed memory. In LICS 2016, pages 759–767, New York, 2016. ACM. doi:10.1145/2933575.2934536.
[12] Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising semantics for relaxed-memory concurrency. In POPL 2017, pages 175–189, New York, 2017. ACM. doi:10.1145/3009837.3009850.
[13] Ori Lahav and Viktor Vafeiadis. Explaining relaxed memory models with program transformations. In FM 2016. Springer, 2016. doi:10.1007/978-3-319-48989-6_29.
[14] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek Dreyer. Repairing sequential consistency in C/C++11. In PLDI 2017, pages 618–632, New York, 2017. ACM. doi:10.1145/3062341.3062352.
[15] Jeremy Manson, William Pugh, and Sarita V. Adve. The Java memory model. In POPL 2005, pages 378–391, New York, 2005. ACM. doi:10.1145/1040305.1040336.
[16] C/C++11 mappings to processors, 2016.
[17] Evgenii Moiseenko, Anton Podkopaev, Ori Lahav, Orestis Melkonian, and Viktor Vafeiadis. Coq proof scripts and supplementary material for this paper, available at http://plv.mpi-sws.org/weakestmoToImm/, 2020.
[18] Scott Owens, Susmit Sarkar, and Peter Sewell. A better x86 memory model: x86-TSO. In TPHOLs 2009, volume 5674 of LNCS, pages 391–407, Heidelberg, 2009. Springer.
[19] Jean Pichon-Pharabod and Peter Sewell. A concurrency semantics for relaxed atomics that permits optimisation and avoids thin-air executions. In POPL 2016, pages 622–633, New York, 2016. ACM. doi:10.1145/2837614.2837616.
[20] Anton Podkopaev, Ori Lahav, and Viktor Vafeiadis. Bridging the gap between programming languages and hardware weak memory models. Proc. ACM Program. Lang., 3(POPL):69:1–69:31, 2019. doi:10.1145/3290382.
[21] Anton Podkopaev, Ilya Sergey, and Aleksandar Nanevski. Operational aspects of C/C++ concurrency. CoRR, abs/1606.01400, 2016. URL: http://arxiv.org/abs/1606.01400.
[22] Christopher Pulte, Shaked Flur, Will Deacon, Jon French, Susmit Sarkar, and Peter Sewell. Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8. Proc. ACM Program. Lang., 2(POPL):19:1–19:29, 2018. doi:10.1145/3158107.
[23] Christopher Pulte, Jean Pichon-Pharabod, Jeehoon Kang, Sung-Hwan Lee, and Chung-Kil Hur. Promising-ARM/RISC-V: A simpler and faster operational concurrency model. In PLDI 2019, pages 1–15, New York, 2019. ACM. doi:10.1145/3314221.3314624.
[24] Viktor Vafeiadis, Thibaut Balabonski, Soham Chakraborty, Robin Morisset, and Francesco Zappa Nardelli. Common compiler optimisations are invalid in the C11 memory model and what we can do about it. In POPL 2015, pages 209–220, New York, 2015. ACM. doi:10.1145/2676726.2676995.
[25] Jaroslav Ševčík, Viktor Vafeiadis, Francesco Zappa Nardelli, Suresh Jagannathan, and Peter Sewell. CompCertTSO: A verified compiler for relaxed-memory concurrency. J. ACM, 60(3):22, 2013. doi:10.1145/2487241.2487248.

A Simulation Relation for the Complete Weakestmo Model
Here we present the simulation relation I_T(prog, G, TC, S, X) and the auxiliary relation I_cert(prog, G, ⟨C, I⟩, ⟨C′, I′⟩, S, X, t, Br, σ, σ′) for the complete Weakestmo memory model. In addition to relaxed accesses, the full versions of the relations handle fences, read-modify-write pairs, release, acquire, and sequentially consistent accesses.

We define the relation I_T(prog, G, ⟨C, I⟩, S, X) to hold if the following conditions are met:
– G is an IMM SC-consistent execution of prog.
– S is a Weakestmo-consistent event structure of prog.
– X is an extracted subset of S.
– The s2g-image of X is equal to the union of the covered and issued events and the events which po-precede the issued ones: ⟦X⟧ = C ∪ dom(G.po? ; [I])
– The s2g-image of every event from a thread t ∈ T lies in C ∪ dom(G.po? ; [I]): ⟦S.thread(T)⟧ ⊆ C ∪ dom(G.po? ; [I])
– Every event of S and its s2g-image have the same thread, type, modifier, and location. Additionally, every event of X which is covered or issued has the same value as its s2g-image:
  a. ∀e ∈ S.E. S.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(s2g(e))
  b. ∀e ∈ X ∩ ⌈C ∪ I⌉. S.val(e) = G.val(s2g(e))
– The s2g-image of S.po is a subset of the G.po relation: ⟦S.po⟧ ⊆ G.po
– The identity relation in G corresponds to the identity or conflict relation in S: ⌈id⌉ ⊆ S.cf?
– The s2g-image of a justification edge is included in paths in G representing observation of the corresponding thread. The s2g-image of a justification edge is in G.rf if the edge ends either in the domain of S.rmw, in an acquire access, or in a read followed by an acquire fence. Moreover, the s2g-image of S.jf edges ending in X matches the simulation reads-from relation:
  a. ⟦S.jf⟧ ⊆ G.rf? ; (G.hb ; [G.F_sc])? ; G.psc_F? ; G.hb?
  b. ⟦S.jf ; S.rmw⟧ ⊆ G.rf ; G.rmw
  c. ⟦S.jf ; (S.po ; [S.F])? ; [S.E^⊒acq]⟧ ⊆ G.rf ; (G.po ; [G.F])? ; [G.E^⊒acq]
  d. ⟦S.jf ; [X]⟧ ⊆ G.sjf(TC)
  Using the last property, it is possible to derive that ⟦S.jf ; [X ∩ ⌈C⌉]⟧ ⊆ G.rf.
– Each write event in S which justifies some read event externally should be S.ew-equal to a write event in X whose s2g-image is issued: dom(S.jfe) ⊆ dom(S.ew ; [X ∩ ⌈I⌉])
– The s2g-image of S.ew is a subset of the identity relation: ⟦S.ew⟧ ⊆ id
– Let w and w′ be different events in one S.ew equivalence class. Then there is w″ in this equivalence class s.t. w″ is in X and s2g(w″) is issued: S.ew ⊆ (S.ew ; [X ∩ ⌈I⌉] ; S.ew)?
– The s2g-image of S.co lies in the reflexive closure of G.co. Additionally, the s2g-images of S.co-edges ending in X ∩ S.thread(T) lie in G.co:
  a. ⟦S.co⟧ ⊆ G.co?
  b. ⟦S.co ; [X ∩ S.thread(T)]⟧ ⊆ G.co
– The s2g-image of S.rmw is in G.rmw. Vice versa, every G.rmw edge ending in the covered set is in the s2g-image of S.rmw ending in X:
  a. ⟦S.rmw⟧ ⊆ G.rmw
  b. G.rmw ; [C] ⊆ ⟦S.rmw ; [X]⟧
– Let e, w, and w′ be events in S s.t. (i) ⟨e, w⟩ is an S.release edge, (ii) w and w′ are in the same S.ew equivalence class, (iii) w′ is in X, and (iv) s2g(w′) is issued. Then e is in X:
  dom(S.release ; S.ew ; [X ∩ ⌈I⌉]) ⊆ X
  This property is needed to show that dom(S.hb \ S.po) is included in X.
– Let r, r′, w, and w′ be events in S s.t. (i) r and r′ are in immediate conflict and justified from w and w′ respectively, and (ii) r′ is in X and its thread is in T. Then s2g(w) is G.co-less than s2g(w′):
  ⟦S.jf ; S.cf_imm ; [X ∩ S.thread(T)] ; S.jf⁻¹⟧ ⊆ G.co
  This property is needed to prove cf_imm-justification on the simulation step.
– For all t ∈ T there exists σ s.t. S.K_C(t) −→*_t σ, and the thread-local execution graph σ.G is equivalent, modulo the rf and co components, to the restriction of G to the thread t.

In addition to I_T, we also define a version of the simulation relation which holds during the construction of a certification branch, I_cert. We define the relation I_cert(prog, G, ⟨C, I⟩, ⟨C′, I′⟩, S, X, t, Br, σ, σ′) to hold if the following conditions are met:
– I_{T\{t}}(prog, G, ⟨C, I⟩, S, X) holds.
– G ⊢ ⟨C, I⟩ −→_t ⟨C′, I′⟩ holds.
– σ and σ′ are thread states s.t. σ′ is reachable from σ, σ corresponds to the S.po-last event in Br, and the partial execution graph of σ′ contains the covered and issued events up to the G.po-last issued write in the thread t:
  a. σ −→*_t σ′
  b. σ.G.E = ⟦Br⟧
  c. σ′.G.E = G.thread(t) ∩ (C′ ∪ dom(G.po? ; [I′]))
– The set Br consists of events from the thread t, and the covered prefixes of Br and of X restricted to thread t coincide:
  a. Br ⊆ S.thread(t)
  b. Br ∩ ⌈C⌉ = X ∩ S.thread(t) ∩ ⌈C⌉
– The partial execution graph of σ′ assigns the same thread identifier, type, location, and mode as the full execution graph G does. Additionally, it assigns the same value as G to determined events:
  a. ∀e ∈ σ′.G.E. σ′.G.{tid, typ, loc, mod}(e) = G.{tid, typ, loc, mod}(e)
  b. ∀e ∈ σ′.G.E ∩ G.determined(⟨C′, I′⟩). σ′.G.val(e) = G.val(e)
– The s2g-image of jf edges ending in Br is included in G.sjf(⟨C′, I′⟩):
  ⟦S.jf ; [Br]⟧ ⊆ G.sjf(⟨C′, I′⟩)
– For every issued event from Br there exists an S.ew-equivalent event in X. Symmetrically, every issued event from X within the processed part of the certification branch has an S.ew-equivalent event in Br:
  a. Br ∩ ⌈I⌉ ⊆ dom(S.ew ; [X])
  b. X ∩ ⌈I ∩ σ.G.E⌉ ⊆ dom(S.ew ; [Br])
– The s2g-image of S.co edges ending in Br lies in G.co. The s2g-image of S.co edges ending in X ∩ S.thread(t), and not in the processed part of the certification branch, lies in G.co:
  a. ⟦S.co ; [Br]⟧ ⊆ G.co
  b. ⟦S.co ; [X ∩ S.thread(t) \ ⌈σ.G.E⌉]⟧ ⊆ G.co
– Each G.rmw edge ending in the processed part of the certification branch is the s2g-image of some S.rmw edge ending in Br:
  G.rmw ; [C ∩ σ.G.E] ⊆ ⟦S.rmw ; [Br]⟧
– Suppose w, w′, r, and r′ are events of S s.t. (i) r and r′ are justified from w and w′ respectively, and (ii) r and r′ are in immediate conflict and belong to thread t. Then ⟨s2g(w), s2g(w′)⟩ is in G.co if either r′ is in Br:
  ⟦S.jf ; S.cf_imm ; [Br] ; S.jf⁻¹⟧ ⊆ G.co
  or r is not in Br and r′ is in X ∩ S.thread(t):
  ⟦S.jf ; [S.E \ Br] ; S.cf_imm ; [X ∩ S.thread(t)] ; S.jf⁻¹⟧ ⊆ G.co

Figure 9: Compilation scheme from IMM SC to ARMv8.
(|r := [e]_rlx|) ≈ "ldr"               (|[e]_rlx := e′|) ≈ "str"
(|r := [e]_⊒acq|) ≈ "ldar"             (|[e]_⊒rel := e′|) ≈ "stlr"
(|fence_acq|) ≈ "dmb.ld"               (|fence_≠acq|) ≈ "dmb.sy"
(|r := FADD_o(e₁, e₂)|) ≈ "L:" ++ ld(o) ++ st(o) ++ "bc L"
(|r := CAS_o(e, e_R, e_W)|) ≈ "L:" ++ ld(o) ++ "cmp;bc Le;" ++ st(o) ++ "bc L; Le:"
ld(o) ≜ o ⊒ acq ? "ldaxr;" : "ldxr;"
st(o) ≜ o ⊒ rel ? "stlxr.;" : "stxr.;"

B From IMM SC to ARMv8
The intended mapping of IMM to ARMv8 is presented schematically in Fig. 9 and follows [16]. Note that acquire and SC loads are compiled to the same instruction (ldar), as are release and SC stores (stlr). In ARM assembly, RMWs are represented as pairs of instructions, an exclusive load (ldxr) followed by an exclusive store (stxr), and these instructions also have their stronger (SC) counterparts, ldaxr and stlxr.

We use the ARMv8 declarative model [8] (see also [22]). (We only describe the fragment of the model that is needed for the mapping of IMM SC, thus excluding isb fences.) Its labels are given by:
– ARM read label: R^{o_R}(x, v) where x ∈ Loc, v ∈ Val, o_R ∈ {rlx, Q, A}, and rlx ⊏ Q ⊏ A.
– ARM write label: W^{o_W}(x, v) where x ∈ Loc, v ∈ Val, o_W ∈ {rlx, L}, and rlx ⊏ L.
– ARM fence label: F^{o_F} where o_F ∈ {ld, sy} and ld ⊏ sy.
In turn, ARM's execution graphs are defined as IMM SC's ones, except for the CAS dependency, casdep, which is not present in ARM executions.

The definition of ARMv8-consistency requires the following derived relations (see [22] for further explanations and details):
obs ≜ rfe ∪ fre ∪ coe (observed-by)
dob ≜ (addr ∪ data) ; rfi? ∪ (ctrl ∪ data) ; [W] ; coi? ∪ addr ; po ; [W] (dependency-ordered-before)
aob ≜ rmw ∪ [W_ex] ; rfi ; [R^⊒Q] (atomic-ordered-before)
bob ≜ po ; [F^sy] ; po ∪ [R] ; po ; [F^ld] ; po ∪ [R^⊒Q] ; po ∪ po ; [W^L] ; coi? ∪ [W^L] ; po ; [R^A] (barrier-ordered-before)

Definition 13. An ARMv8 execution graph G_a is called ARMv8-consistent if the following hold:
– codom(G_a.rf) = G_a.R.
– For every location x ∈ Loc, G_a.co totally orders G_a.W(x).
– G_a.po|loc ∪ G_a.rf ∪ G_a.fr ∪ G_a.co is acyclic. (sc-per-loc)
– G_a.rmw ∩ (G_a.fre ; G_a.coe) = ∅. (atomicity)
– G_a.obs ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob is acyclic. (external)

Figure 10: Compilation scheme from IMM SC to TSO.
(|r := [e]_≠sc|) ≈ "mov"           (|fence_≠sc|) ≈ ""
(|[e]_≠sc := e′|) ≈ "mov"          (|fence_sc|) ≈ "mfence"
(|r := FADD_o(e₁, e₂)|) ≈ "(lock) xadd"
(|r := CAS_o(e, e_R, e_W)|) ≈ "(lock) cmpxchg"
Alt. 1: (|r := [e]_sc|) ≈ "mov"            Alt. 2: (|r := [e]_sc|) ≈ "mfence;mov"
        (|[e]_sc := e′|) ≈ "mov;mfence"            (|[e]_sc := e′|) ≈ "mov"

We interpret the intended compilation on execution graphs:
Definition 14. Let G be an IMM execution graph. An ARM execution graph G_a corresponds to G if the following hold:
– G_a.E = G.E and G_a.po = G.po
– G_a.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E} where:
  (|R^rlx_s(x, v)|) ≜ R^rlx(x, v)    (|W^rlx(x, v)|) ≜ W^rlx(x, v)
  (|R^acq_s(x, v)|) ≜ R^Q(x, v)      (|W^⊒rel(x, v)|) ≜ W^L(x, v)
  (|R^sc_s(x, v)|) ≜ R^A(x, v)
  (|F^acq|) ≜ F^ld    (|F^rel|) = (|F^acqrel|) = (|F^sc|) ≜ F^sy
– G.rmw = G_a.rmw, G.data = G_a.data, and G.addr = G_a.addr (the compilation does not change RMW pairs and data/address dependencies)
– G.ctrl ⊆ G_a.ctrl (the compilation only adds control dependencies)
– [G.R_ex] ; G.po ⊆ G_a.ctrl ∪ (G_a.rmw ∩ G_a.data) (exclusive reads entail a control dependency to any future event, except for their immediate exclusive write successor if it arose from an atomic increment)
– G.casdep ; G.po ⊆ G_a.ctrl (a CAS dependency to an exclusive read entails a control dependency to any future event)

We state our theorem ensuring IMM SC-consistency if the corresponding ARMv8 execution graph is ARMv8-consistent.

Theorem 15. Let G be an IMM execution graph with whole serial numbers (sn[G.E] ⊆ ℕ), and let G_a be an ARMv8 execution graph that corresponds to G. Then, ARMv8-consistency of G_a implies IMM SC-consistency of G.

Proof outline. IMM-consistency of G follows from [20, Theorem 4.5]. That is, we only need to show that acyclicity of G.psc_base ∪ G.psc_F holds. We start by showing that G_a.obs′ ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob′ is acyclic, where
obs′ ≜ rfe ∪ fr ∪ co
bob′ ≜ bob ∪ [R] ; po ; [F^ld] ∪ po ; [F^sy] ∪ [F^⊒ld] ; po
Then, we finish the proof by showing that G_a.psc_base ∪ G_a.psc_F is included in (G_a.obs′ ∪ G_a.dob ∪ G_a.aob ∪ G_a.bob′)⁺. ◀

C From IMM SC to TSO
The intended mapping of IMM SC to TSO is presented schematically in Fig. 10. There are two possible alternatives for compiling SC accesses (see the bottom of Fig. 10): compiling an SC store to a store followed by a fence, or compiling an SC load to a load preceded by a fence. Both schemes guarantee that, in the compiled code, there is a fence between every store and load instruction originating from SC accesses. Regarding the compilation schemes of SC accesses, our proof of compilation correctness from IMM SC to TSO depends only on this property. Thus, in this section, we concentrate only on the compilation alternative which compiles SC stores using fences.

As a model of the TSO architecture, we use the declarative model from [1]. Its labels are given by:
– TSO read label: R(x, v) where x ∈ Loc and v ∈ Val.
– TSO write label: W(x, v) where x ∈ Loc and v ∈ Val.
– TSO fence label: MFENCE.
In turn, TSO's execution graphs are defined as IMM SC's ones. Below, we interpret the compilation on execution graphs.

Definition 16. Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ). A TSO execution graph G_t corresponds to G if the following hold:
– G_t.E = G.E \ G.F_≠sc ∪ {n + 0.5 | n ∈ G.W_sc} (non-SC fences are removed)
– G_t.tid(e) = G.tid(⌊e⌋) for all e in G_t
– G_t.po = [G_t.E] ; (G.po ∪ {⟨a, n + 0.5⟩ | ⟨a, n⟩ ∈ G.po?} ∪ {⟨n + 0.5, a⟩ | ⟨n, a⟩ ∈ G.po}) ; [G_t.E] (new events are added after SC writes)
– G_t.lab = {e ↦ (|G.lab(e)|) | e ∈ G.E \ G.F_≠sc} ∪ {e ↦ MFENCE | e ∈ G_t.E \ G.E} where:
  (|R^{o_R}_s(x, v)|) ≜ R(x, v)    (|W^{o_W}(x, v)|) ≜ W(x, v)    (|F^sc|) ≜ MFENCE
– G.rmw = G_t.rmw, G.data = G_t.data, and G.addr = G_t.addr (the compilation does not change RMW pairs and data/address dependencies)
– G.ctrl ; [G.E \ G.F_≠sc] ⊆ G_t.ctrl (the compilation only adds control dependencies)

The following derived relations are used to define the TSO-consistency predicate:
ppo_TSO ≜ [R ∪ W] ; po ; [R ∪ W] \ [W] ; po ; [R]
fence_TSO ≜ [R ∪ W] ; po ; [MFENCE] ; po ; [R ∪ W]
implied_fence_TSO ≜ [W] ; po ; [dom(rmw)] ∪ [codom(rmw)] ; po ; [R]
hb_TSO ≜ ppo_TSO ∪ fence_TSO ∪ implied_fence_TSO ∪ rfe ∪ co ∪ fr

Definition 17. G is called TSO-consistent if the following hold:
– codom(G.rf) = G.R. (rf-completeness)
– For every location x ∈ Loc, G.co totally orders G.W(x). (co-totality)
– po|loc ∪ rf ∪ fr ∪ co is acyclic. (sc-per-loc)
– G.rmw ∩ (G.fre ; G.coe) = ∅. (atomicity)
– G.hb_TSO is acyclic. (tso-no-thin-air)
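The role of the mfence in hb_TSO can be seen on the classic store-buffering litmus test. The sketch below is our own illustration (not part of the Coq development): for the outcome a = b = 0, without fences ppo_TSO drops the write-to-read program order and hb_TSO is acyclic, so the outcome is allowed; with an mfence in each thread, fence_TSO restores those edges and hb_TSO becomes cyclic, so the outcome is forbidden:

```python
def has_cycle(edges):
    """Cycle check on a relation given as a set of pairs (DFS over successors)."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    state = {}

    def visit(v):
        state[v] = "gray"
        for n in succ.get(v, []):
            if state.get(n) == "gray" or (n not in state and visit(n)):
                return True
        state[v] = "black"
        return False

    return any(v not in state and visit(v) for v in list(succ))

# Store buffering, outcome a = b = 0:
#   thread 1: Wx; Ry        thread 2: Wy; Rx
# Each read observes the initial value, so it is fr-before the other write.
fr = {("Ry", "Wy"), ("Rx", "Wx")}

# Without fences, [W];po;[R] is excluded from ppo_TSO: hb_TSO is just fr.
assert not has_cycle(fr)              # acyclic: outcome allowed under TSO

# With an mfence between the write and the read in each thread,
# fence_TSO contributes Wx -> Ry and Wy -> Rx.
fence = {("Wx", "Ry"), ("Wy", "Rx")}
assert has_cycle(fr | fence)          # hb_TSO cyclic: outcome forbidden
```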
Next, we state our theorem that ensures
IMM SC -consistency if the corresponding TSO execution graph is
TSO -consistent.
E C O O P 2 0 2 0 :32 Reconciling Event Structures with Modern Multiprocessors (cid:73)
Theorem 18.
Let G be an IMM SC execution graph with whole identifiers (G.E ⊆ ℕ), and let G_t be a TSO execution graph that corresponds to G. Then, TSO-consistency of G_t implies IMM SC-consistency of G.

Proof outline.
Since G_t corresponds to G, we know that

  [G.W^{sc}] ; G.po ; [G.R^{sc}] ⊆ G_t.po ; [G_t.MFENCE] ; G_t.po

as the aforementioned property of the compilation scheme. We show that

  G_t.ehb_TSO ≜ G_t.hb_TSO ∪ [G_t.MFENCE] ; G_t.po ∪ G_t.po ; [G_t.MFENCE]

is acyclic. Then, we show that G.psc_base ∪ G.psc_F is included in G_t.ehb_TSO^+. It means that acyclicity of G.psc_base ∪ G.psc_F holds, and it leaves us to prove that G is IMM-consistent. That is done by standard relational techniques (see [7]). ◀

(|r := [e]_rlx|) ≈ "ld"                        (|[e]_rlx := e|) ≈ "st"
(|r := [e]_acq|) ≈ "ld;cmp;bc;isync"           (|[e]_rel := e|) ≈ "lwsync;st"
(|fence_{≠sc}|) ≈ "lwsync"                     (|fence_{sc}|) ≈ "sync"
(|r := FADD^o(e, e)|) ≈ wmod(o) ++ "L: lwarx;stwcx.;bc L" ++ rmod(o)
(|r := CAS^o(e, e_R, e_W)|) ≈ wmod(o) ++ "L: lwarx;cmp;bc Le;stwcx.;bc L; Le:" ++ rmod(o)
wmod(o) ≜ o ⊒ rel ? "lwsync;" : ""             rmod(o) ≜ o ⊒ acq ? ";isync" : ""

Leading sync:                                  Trailing sync:
(|r := [e]_sc|) ≈ "sync;ld;cmp;bc;isync"       (|r := [e]_sc|) ≈ "ld;sync"
(|[e]_sc := e|) ≈ "sync;st"                    (|[e]_sc := e|) ≈ "lwsync;st;sync"

Figure 11 Compilation scheme from IMM to POWER.

D From IMM_SC to POWER
Here we use the same mapping of IMM to POWER (see Fig. 11) as in [20] for all instructions except for SC accesses. For the latter, there are two standard compilation schemes [16], presented at the bottom of Fig. 11: with leading and trailing sync fences.

The next definition presents the correspondence between IMM execution graphs and their mapped POWER ones, following the leading compilation scheme in Fig. 11 with elimination of the aforementioned redundancy of SC write compilation.
Definition 19.
Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ). A POWER execution graph G_p corresponds to G if the following hold:
- G_p.E = G.E ∪ { n + 0.5 | n ∈ (G.R^{⊒acq} \ dom(G.rmw)) ∪ codom([G.R^{⊒acq}] ; G.rmw) } ∪ { n − 0.5 | n ∈ (G.E^{⊒rel} \ dom(G.rmw)) ∪ dom(G.rmw ; [G.W^{⊒rel}]) } (new events are added after acquire reads and acquire RMW pairs, and before release/SC accesses and release/SC RMW pairs)
- G_p.tid(e) = G.tid(n) for all e ∈ {n − 0.5, n, n + 0.5} in G_p with n ∈ G.E
- G_p.po = G.po ∪ ((G_p.E × G_p.E) ∩ ({ ⟨a, n − 0.5⟩ | ⟨a, n⟩ ∈ G.po } ∪ { ⟨n − 0.5, a⟩ | ⟨n, a⟩ ∈ G.po^? } ∪ { ⟨a, n + 0.5⟩ | ⟨a, n⟩ ∈ G.po^? } ∪ { ⟨n + 0.5, a⟩ | ⟨n, a⟩ ∈ G.po }))
- G_p.lab = { e ↦ (|G.lab(e)|) | e ∈ G.E }
    ∪ { n + 0.5 ↦ F^{isync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ }
    ∪ { n − 0.5 ↦ F^{lwsync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∉ G.E^{sc} ∪ dom(G.rmw ; [G.W^{sc}]) }
    ∪ { n − 0.5 ↦ F^{sync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.E^{sc} ∪ dom(G.rmw ; [G.W^{sc}]) }
  where: (|R^o_s(x, v)|) ≜ R(x, v)   (|W^o(x, v)|) ≜ W(x, v)
         (|F^{acq}|) = (|F^{rel}|) = (|F^{acqrel}|) ≜ F^{lwsync}   (|F^{sc}|) ≜ F^{sync}
- G.rmw = G_p.rmw, G.data = G_p.data, and G.addr = G_p.addr (the compilation does not change RMW pairs and data/address dependencies)
- G.ctrl ⊆ G_p.ctrl (the compilation only adds control dependencies)
- [G.R^{⊒acq}] ; G.po ⊆ G_p.rmw ∪ G_p.ctrl (a control dependency is placed from every acquire or SC read)
- [G.R_{ex}] ; G.po ⊆ G_p.ctrl ∪ (G_p.rmw ∩ G_p.data) (exclusive reads entail a control dependency to any future event, except for their immediate exclusive write successor if it arose from an atomic increment)
- G.data ; [codom(G.rmw)] ; G.po ⊆ G_p.ctrl (a data dependency to an exclusive write entails a control dependency to any future event)
- G.casdep ; G.po ⊆ G_p.ctrl (a CAS dependency to an exclusive read entails a control dependency to any future event)

The correspondence between IMM and POWER execution graphs that follows the trailing compilation scheme may be presented similarly, with two main differences. First, obviously, SC accesses are compiled to release and acquire accesses followed by SC fences:

- G_p.E = G.E ∪ { n + 0.5 | n ∈ (G.E^{⊒acq} \ dom(G.rmw)) ∪ codom([G.R^{⊒acq}] ; G.rmw) } ∪ { n − 0.5 | n ∈ (G.W^{⊒rel} \ dom(G.rmw)) ∪ dom(G.rmw ; [G.W^{⊒rel}]) }
- G_p.lab = { e ↦ (|G.lab(e)|) | e ∈ G.E }
    ∪ { n + 0.5 ↦ F^{isync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.R^{acq} ∪ codom([G.R^{acq}] ; G.rmw) }
    ∪ { n + 0.5 ↦ F^{sync} | n + 0.5 ∈ G_p.E ∧ n ∈ ℕ ∧ n ∈ G.E^{sc} ∪ codom([G.R^{sc}] ; G.rmw) }
    ∪ { n − 0.5 ↦ F^{lwsync} | n − 0.5 ∈ G_p.E ∧ n ∈ ℕ }

Second, [G.R^{⊒acq}] ; G.po has to be included in G_p.rmw ∪ G_p.ctrl ∪ G_p.po ; [G_p.F^{lwsync}] ; G_p.po^?, not just in G_p.rmw ∪ G_p.ctrl, to allow for elimination of the aforementioned SC read compilation redundancy.

The next theorem ensures IMM_SC-consistency if the corresponding POWER execution graph is
POWER-consistent.
Theorem 20.
Let G be an IMM execution graph with whole identifiers (G.E ⊆ ℕ), and let G_p be a POWER execution graph that corresponds to G. Then, POWER-consistency of G_p implies IMM_SC-consistency of G.

Proof Outline. We construct an IMM execution graph G′ by inserting SC fences before SC accesses in G. We also construct G′_NoSC from G′ by replacing the SC write and read accesses of G′ with release write and acquire read ones, respectively. Obviously, IMM_SC-consistency of G follows from IMM_SC-consistency of G′, which, in turn, follows from IMM-consistency of G′_NoSC by Theorem 12. We construct an IMM execution graph G′′ from G′_NoSC by inserting release fences before release writes, and then an IMM execution graph G′′_NoRel from G′′ by weakening the access modes of its release write events to the relaxed mode. As in the previous proof step, IMM-consistency of G′_NoSC follows from IMM-consistency of G′′, which in turn follows from IMM-consistency of G′′_NoRel by [20, Theorem 4.1].

Thus, to prove the theorem, we need to show that G′′_NoRel is IMM-consistent. Note that G_p, the POWER execution graph corresponding to G, also corresponds to G′′_NoRel by construction of G′′_NoRel. That is, IMM-consistency of G′′_NoRel follows from POWER-consistency of G_p by [20, Theorem 4.3], since G′′_NoRel does not contain SC read and write access events, nor release write access events. ◀
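The chain of graph transformations used in this outline can be sketched over per-thread label sequences. This is a toy Python illustration with hypothetical helper names, not the mechanized construction; it tracks only labels, not the relational components of the graphs:

```python
# Labels are (kind, mode) pairs; a thread is a list of labels. The pipeline
# mirrors G -> G' -> G'_NoSC -> G'' -> G''_NoRel from the proof outline,
# using list insertion in place of fractional event identifiers.

def insert_fences(trace, mode, before):
    """Insert an F-fence of the given mode before every label satisfying `before`."""
    out = []
    for lab in trace:
        if before(lab):
            out.append(('F', mode))
        out.append(lab)
    return out

def weaken(trace, table):
    """Weaken access modes according to `table`: (kind, mode) -> new mode."""
    return [(k, table.get((k, m), m)) for (k, m) in trace]

def no_sc(trace):
    # G': SC fences before SC accesses; G'_NoSC: SC accesses become rel/acq.
    g1 = insert_fences(trace, 'sc',
                       lambda l: l[0] in ('R', 'W') and l[1] == 'sc')
    return weaken(g1, {('W', 'sc'): 'rel', ('R', 'sc'): 'acq'})

def no_rel(trace):
    # G'': release fences before release writes; G''_NoRel: rel writes -> rlx.
    g2 = insert_fences(trace, 'rel', lambda l: l == ('W', 'rel'))
    return weaken(g2, {('W', 'rel'): 'rlx'})

trace = [('W', 'sc'), ('R', 'sc'), ('W', 'rel'), ('R', 'rlx')]
final = no_rel(no_sc(trace))
print(final)  # no SC accesses and no release writes remain, only fences
```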