A Thread-Local Semantics and Efficient Static Analyses for Race Free Programs

Abstract

Data race free (DRF) programs constitute an important class of concurrent programs. In this paper we provide a framework for designing and proving the correctness of data flow analyses that target this class of programs. These analyses are in the same spirit as the "sync-CFG" analysis proposed in earlier literature. To achieve this, we first propose a novel concrete semantics for DRF programs, called L-DRF, that is thread-local in nature---each thread operates on its own copy of the data state. We show that abstractions of our semantics allow us to reduce the analysis of DRF programs to a sequential analysis. This aids in rapidly porting existing sequential analyses to sound and scalable analyses for DRF programs. Next, we parameterize L-DRF with a partitioning of the program variables into "regions" which are accessed atomically. Abstractions of the region-parameterized semantics yield more precise analyses for "region-race" free concurrent programs. We instantiate these abstractions to devise efficient relational analyses for race free programs, which we have implemented in a prototype tool called RATCOP. On the benchmarks, RATCOP was able to prove up to 65% of the assertions, in comparison to 25% proved by our baseline. Moreover, in a comparative study with a recent concurrent static analyzer, RATCOP was up to 5 orders of magnitude faster.


Suvam Mukherjee · Oded Padon · Sharon Shoham · Deepak D’Souza · Noam Rinetzky

Suvam Mukherjee, Microsoft Research, India
Work done while the author was at the Indian Institute of Science, Bangalore, India
Oded Padon, Stanford University, USA
Sharon Shoham, Tel Aviv University, Israel
Deepak D’Souza, Indian Institute of Science, India
Noam Rinetzky, Tel Aviv University, Israel

arXiv preprint [cs.PL]

Keywords

Abstract Interpretation · Concurrent Programs · Static Analysis · Data-race freedom

Our aim in this work is to provide a framework for developing data-flow analyses which specifically target the class of data race free (DRF) concurrent programs. DRF programs constitute an important class of concurrent programs, as most programmers strive to write race free code. There are a couple of reasons why programmers do so. Firstly, even assuming sequential consistency (SC) semantics, a racy program often leads to undesirable effects like atomicity violations. Secondly, under the prevalent “SC-for-DRF” policy only DRF programs are guaranteed to have sequentially consistent execution behaviors in many weak memory models [1, 6, 24]. Non-DRF programs do not have this guarantee: for example the Java Memory Model [24] gives some weak guarantees, while the C++ semantics [6] gives essentially no guarantees, for the execution semantics of racy programs. Thus ensuring that a racy program does something useful is a difficult job for a programmer. For these and other reasons, programmers tend to write race free programs. There is thus a large code base of DRF programs that can benefit from data-flow analysis techniques that leverage the property of race-freedom to provide analyses that run efficiently.

The starting point of this work is the “sync-CFG” style of statically analyzing DRF programs, proposed in [11]. The analysis here essentially runs a sequential analysis on each thread, communicating data-flow facts between threads only via “synchronization edges”, which go from a release statement in one thread to the corresponding acquire statement in another thread. The analysis thus runs on the control-flow graphs (CFGs) of the threads, augmented with synchronization edges, as shown in the center of Fig. 2, which explains the name for this style of analysis. The analysis computes data flow facts about the value of a variable that are sound only at points where that variable is relevant, in that it is read or written to at that location.
The analysis thus trades unsoundness of facts at irrelevant points for the efficiency gained by restricting interference between threads to points of synchronization alone.

However, the analysis in [11] suffers from some drawbacks. Firstly, the analysis is intrinsically a “value-set” analysis, which can only keep track of the set of values each variable can assume, and not the relationships between variables. Any naive attempt to extend the analysis to a more precise relational one quickly leads to unsoundness. The second issue concerns the technique for establishing soundness. A convenient way to prove soundness of an analysis is to show that it is a consistent abstraction [9] of a canonical analysis, like the collecting semantics for sequential programs [9] or the interleaving semantics for concurrent programs [22]. For this one typically makes use of the “local” sufficient conditions for consistent abstraction given in [9]. However, for a sync-CFG-based analysis, it appears difficult to use this route to show it to be a consistent abstraction of the standard interleaving semantics. This is largely due to the thread-local nature of the states and the unsoundness at irrelevant points, which makes it difficult to come up with natural abstraction and concretization functions that form a Galois connection. Instead, one needs to resort to an intricate argument, as done in [11], which essentially shows that in the least fixed point of the analysis, every write to a variable will flow to a read of that variable via a happens-before path (that is guaranteed to exist by the property of race-freedom).
Thus, while one can argue soundness of abstractions of the value-set analysis by demonstrating a consistent abstraction with the latter, to argue soundness of any other proposed sync-CFG style analysis (in particular one that uses a more precise domain than value-sets), one would have to work out a similar involved proof as in [11].

Towards addressing these issues, we propose a framework that facilitates the design of different sync-CFG analyses with varying degrees of precision and efficiency. The foundation of this framework is a novel thread-local semantics for DRF programs, which can play the role of a “most precise” analysis which other sync-CFG analyses can be shown to be consistent abstractions of. This semantics, which we call L-DRF [30], is similar to the interleaving semantics of concurrent programs, but keeps thread-local (or per-thread) copies of the shared state. Intuitively, our semantics works as follows. Apart from its local copy of the shared data state, each thread t also maintains a per-variable version count, which is incremented whenever t writes to the variable. The exchange of information between threads is via buffers, associated with release program points in the program. When a thread releases a lock m, it stores its local data state to the corresponding buffer, along with the version counts of the variables. As a result, the buffer of a release point records both the local data state and the variable versions, as they were when the release was last executed. When some thread t′ subsequently acquires m, it compares its per-variable version count with those in the buffers pertaining to release points associated with m. The thread t′ then copies over the valuation (and the version) of a variable to its local state, if it is newer in some buffer (as indicated by a higher version count).
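The release and acquire steps just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names (`fresh_env`, `write`, `release`, `acquire`) are hypothetical, the lock map is omitted, and each environment maps a variable to a (value, version) pair.

```python
# Illustrative sketch only: all names are hypothetical, and lock ownership
# (the lock map) is omitted to keep the example short.

def fresh_env(variables):
    """Every variable starts with value 0 and version 0."""
    return {var: (0, 0) for var in variables}

def read(env, var):
    """Return the thread-local value of var (it may be stale)."""
    return env[var][0]

def write(env, var, value):
    """A write stores the new value and increments the variable's version."""
    _, version = env[var]
    env[var] = (value, version + 1)

def release(env, buffers, release_point):
    """On release, copy the local versioned environment into the buffer."""
    buffers[release_point] = dict(env)

def acquire(env, buffers, release_points_of_lock):
    """On acquire, import each variable that is newer in some buffer."""
    for point in release_points_of_lock:
        snapshot = buffers.get(point)
        if snapshot is None:  # the buffer is still empty
            continue
        for var, (value, version) in snapshot.items():
            if version > env[var][1]:
                env[var] = (value, version)

# Two threads with private copies of the shared variables x, y and z.
env_t1 = fresh_env(["x", "y", "z"])
env_t2 = fresh_env(["x", "y", "z"])
buffers = {7: None, 13: None}  # one buffer per post-release location

write(env_t1, "x", read(env_t1, "y"))      # x := y
write(env_t1, "x", read(env_t1, "x") + 1)  # x++
write(env_t1, "y", read(env_t1, "y") + 1)  # y++
write(env_t2, "z", read(env_t2, "z") + 1)  # z++ in the other thread
release(env_t1, buffers, 7)
acquire(env_t2, buffers, [7, 13])
print(env_t2)
```

After the acquire, the second thread holds the newer values of x and y from the buffer, while its own newer value of z is kept because the buffered version of z is older.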
The value of a shared variable in the local state of a thread may be “stale”, in that the variable has subsequently been updated by another thread but the update has not yet been reflected here. The L-DRF semantics leverages the race freedom property to ensure that the value of a variable is correct in the local state at program points where it is relevant (read or written to). It thus captures the essence of a sync-CFG analysis. The L-DRF semantics is also of independent interest, since it can be viewed as an alternative characterization of the behavior of data race free programs.

The analysis induced by the L-DRF semantics is shown to be sound for DRF programs. In addition, the analysis is, in some sense, the most precise sync-CFG analysis one can hope for: at every point in a thread, the relevant part of the thread-local copy of the shared state is guaranteed to arise in some execution of the program.

Using the L-DRF semantics as a basis, we now propose several precise and efficient relational sync-CFG analyses. The soundness of these analyses follows immediately, since they can easily be shown to be consistent abstractions of L-DRF. The key idea behind obtaining a sound relational analysis is suggested by the L-DRF analysis: we preserve variable correlations within a thread, whereas at each acquire point, we apply a mix operator on the abstract values. The mix operation essentially amounts to forgetting all correlations between the variables.

While these analyses allow maintaining fully-relational properties within thread-local states, communicating information over cross-thread edges loses all correlations due to the mix operation. To improve precision further, we refine the L-DRF semantics to take into account data regions. Technically, we introduce the notion of region race freedom and develop the L-RegDRF semantics [30]: the programmer can partition the program variables into “regions” that should be accessed atomically.
A program is region race free if it does not contain conflicting accesses to variables in the same region that are unordered by the happens-before relation [21]. The classical notion of data race freedom is a special case of region race freedom, where each region consists of a single variable. Techniques to determine whether a program is race free can be naturally extended to determine region race freedom (see Sec. 7). L-RegDRF refines L-DRF by taking into account the atomic nature of accesses that the program makes to variables in the same region. For programs which are free from region-races, L-RegDRF produces executions which are indistinguishable, with respect to reads of the regions, from the ones produced by L-DRF. By leveraging the L-RegDRF semantics as a starting point, we obtain more precise sequential analyses that track relational properties within regions and across threads. This is obtained by refining the granularity of the mix operator from single variables to regions.

We have implemented the new relational analyses (based on L-DRF and L-RegDRF) in a prototype analyzer called RATCOP [29], and provide a thorough empirical evaluation in Sec. 8. We show that RATCOP attains a precision of up to 65% on a subset of race-free programs from the SV-COMP15 suite. This subset contains programs which have interesting relational invariants. In contrast, an interval based value-set analysis derived from [11] (which we use as our baseline) was able to prove only 25% of the assertions. On a separate set of experiments, RATCOP turns out to be nearly 5 orders of magnitude faster than an existing state-of-the-art abstract interpretation based tool [28].

The rest of this paper is organized as follows. In the next section we give an overview of our thread-local semantics and the associated analyses. In Sec. 3 we define our programming language and its standard interleaving semantics. Sec.
4 contains the L-DRF semantics and the proof of its soundness and completeness vis-a-vis the standard semantics. We then introduce some analyses inspired by the L-DRF semantics, and formally show how we can prove their soundness by showing them to be consistent abstractions of the L-DRF semantics. In Sec. 7 we introduce our region-based analysis. In Sec. 8 we describe the implementation of our analyses, and experimental evaluation. We conclude in Sec. 9 with related work and discussion.

We illustrate the L-DRF semantics, and its sequential abstractions, on the simple program in Fig. 1. We assume that all variables are shared and are initialized to 0. The threads access x and y only after acquiring lock m. The program is free from data races.

    Thread t1() {
      1: acquire(m);
      2: x := y;
      3: x++;
      4: y++;
      5: assert(x = y);
      6: release(m);
      7: }

    Thread t2() {
      8: z++;
      9: assert(z = 1);
      10: acquire(m);
      11: assert(x = y);
      12: release(m);
      13: }

Fig. 1 A simple race free multi-threaded program. The variables x, y and z are shared and initialized to 0.

Fig. 2 shows the sync-CFG representation of the program (the control-flow graphs of the threads have been made implicit to improve clarity) in the center. The columns to the left and right show data flow facts obtained using three different analyses based on the L-DRF semantics, which we will describe later.

Fig. 2 The sync-CFG representation of the program of Fig. 1 (center), with the facts computed by three analyses based on the L-DRF semantics shown in the three columns on the sides. In the sync-CFG the intra-thread control flow edges are omitted for clarity, and only the synchronization edges are shown. The columns Rel and RegRel show the facts computed by polyhedral-based relational abstractions of the L-DRF semantics and its region-parameterized version, respectively. The Value-Set column shows the facts computed by interval abstractions of the Value-Set analysis of [11]. The RegRel analysis is able to prove all 3 assertions, while Rel fails to prove the assertion at line 11. Value-Set manages to prove only the assertion at line 9.

A state in the L-DRF semantics keeps track of the following components: a location map pc mapping each thread to the location of the next command to be executed, a lock map µ which maps each lock to the thread holding it, a local environment (variable to value map) Θ for each thread, and a function Λ which maps each buffer (associated with each program location following a release command) to an environment. Every release point of each lock m has an associated buffer, where a thread stores a copy of its local environment when it executes the corresponding release instruction. In the environments, each variable x has a version count associated with it which, along any execution π, essentially associates this valuation of x with a unique prior write to it in π. As an example, the “versioned” environment ⟨x ↦ 1², y ↦ 1¹, z ↦ 0⁰⟩ (versions shown as superscripts), obtained at some point in an execution π, says that x has the value 1 by the second write to x, y has the value 1 by the first write to y in π, and z has not been written to. An execution is an interleaving of commands from different threads. Consider an execution of the program in Fig. 1 where, after a certain number of interleaved steps, we have the state

    pc    : t₁ ↦ 6, t₂ ↦ 10
    Θ(t₁) : x ↦ 1², y ↦ 1¹, z ↦ 0⁰
    Θ(t₂) : x ↦ 0⁰, y ↦ 0⁰, z ↦ 1¹
    µ     : m ↦ t₁
    Λ     : 7 ↦ ⊥, 13 ↦ ⊥

The release buffers are all empty as no thread has executed a release yet. Note that the values (and versions) of x and y in t₂ (similarly for z in t₁) are stale, as they do not have the latest value of these variables which were updated by another thread. Next, t₁ can execute the release at line 6, thereby setting µ(m) = ⊥ and storing its current local versioned environment to Λ(7). Now t₂ can execute the acquire at line 10. In doing so, the following state changes take place. As usual, the pc is updated to say that t₂ is now at line 11, and the lock map is updated to say that t₂ now holds lock m. Additionally t₂ “imports” the most up-to-date values (and versions) of x and y from the release buffer Λ(7). We call this inter-thread join operation a mix. This results in its local state becoming ⟨x ↦ 1², y ↦ 1¹, z ↦ 1¹⟩ (the valuations of x and y are pulled in from the buffer, while the valuation of z in t₂’s local state persists). The state thus becomes

    pc    : t₁ ↦ 7, t₂ ↦ 11
    Θ(t₁) : x ↦ 1², y ↦ 1¹, z ↦ 0⁰
    Θ(t₂) : x ↦ 1², y ↦ 1¹, z ↦ 1¹
    µ     : m ↦ t₂
    Λ(7)  : x ↦ 1², y ↦ 1¹, z ↦ 0⁰
    Λ(13) : ⊥

We note that the values of x and y in Θ(t₂) are no longer stale: the L-DRF semantics leverages race freedom to ensure that the values of x and y are correct when they are read at line 11.

Roughly, we obtain sequential data-flow abstractions of the L-DRF semantics via the following steps:
– Provide a data abstraction of sets of environments.
– Define the state to be a map from locations to these abstract data values.
– Compute the sync-CFG representation of the program by drawing inter-thread edges which connect releases and acquires of the same lock (as shown in the center of Fig. 2).
– Define an abstract mix operation which soundly approximates the “import” step outlined earlier.
– Analyze the program as if it was a sequential program, with inter-thread join points (the acquires) using the mix operator.

The analysis in [11] is precisely such a sequential abstraction, where the abstract data values are abstractions of value-sets (variables mapped to sets of values). Value sets do not track correlations between variables, and only allow coarse abstractions like Intervals [8]. The mix operator, in this case, turns out to be the standard join (union of value-sets). For the program of Fig. 1, the interval based value-set analysis, shown in the column “Value-Set” in Fig. 2, only manages to prove the assertion at line 9.

A more precise relational abstraction of L-DRF, which we call Rel, can be obtained by keeping track of a set of environments at each point. Fig. 2 shows (in the column “Rel”) the results of such an analysis implemented using convex polyhedra [10]. The resulting analysis is more precise than the interval analysis, being able to prove the assertions at lines 5 and 9. However, in this case, the mix must forget the correlations among variables in the incoming states: it essentially treats them as value sets.
This is essential for soundness. Thus, even though the acquire at line 10 obtains the fact that x = y from the buffer at 7, and the incoming fact from 9 also has x = y, it fails to maintain this correlation after the mix. Consequently, it fails to prove the assertion at line 11.

Finally, one can exploit the fact that x and y form a data “region” in that they are protected by the same lock. The variable z constitutes a region by itself. As we show later in Sec. 7, the program is region race free for this particular region definition. One can parameterize the L-DRF semantics with this region definition, to yield the L-RegDRF semantics. The resulting analysis, called RegRel, maintains relational information as in the Rel analysis, but has a more precise mix operator which preserves relational facts that hold within a region. Since both the incoming facts at line 10 satisfy x = y, the mix preserves this fact, and the analysis is able to prove the assertion at line 11.

Note that in all three analyses, we are guaranteed to compute sound facts for variables only at points where they are accessed. For example, all three analyses claim that x and y are both 0 at line 9, which is clearly wrong. However, we note that x and y are not accessed at this point. This loss of soundness at “irrelevant” points helps us gain efficiency in the analysis by not having to propagate all interferences from one thread to all points of another thread. We also point out that in Fig. 2, the inter-thread edges add a spurious loop in the sync-CFG (and, therefore, in the analysis of the program), which prevents us from computing an upper bound for the values of x and y. We show in Sec. 5.5 how we can appropriately abstract the versions to avoid some of these spurious loops.
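The difference between the variable-wise mix of Rel and the region-wise mix of RegRel can be illustrated on explicit finite sets of environments. The sketch below is an assumption-laden toy, not the polyhedral implementation in RATCOP: the function `mix`, the representation of environments as frozensets of (variable, value) pairs, and the two incoming facts are all hypothetical, and versions are ignored. It recombines the incoming environments region by region, so correlations survive only inside a region.

```python
from itertools import product

def mix(env_sets, regions):
    """Recombine incoming environments region by region (toy model).

    env_sets: list of sets of environments, each environment being a
              frozenset of (variable, value) pairs.
    regions:  list of tuples of variable names that partition the variables.
    """
    incoming = [dict(env) for envs in env_sets for env in envs]
    # For each region, collect the valuations of that region seen in any input.
    choices = []
    for region in regions:
        seen = {tuple(env[var] for var in region) for env in incoming}
        choices.append(sorted(seen))
    # Every combination of per-region valuations is part of the result.
    result = set()
    for combo in product(*choices):
        env = {}
        for region, values in zip(regions, combo):
            env.update(zip(region, values))
        result.add(frozenset(env.items()))
    return result

def holds_everywhere(envs, predicate):
    """Check that a predicate holds in every environment of the set."""
    return all(predicate(dict(env)) for env in envs)

# Hypothetical incoming facts at an acquire; both satisfy x = y.
from_buffer = {frozenset({"x": 1, "y": 1, "z": 0}.items())}
from_thread = {frozenset({"x": 0, "y": 0, "z": 1}.items())}

def x_equals_y(env):
    return env["x"] == env["y"]

# Rel-style mix: every variable is its own region, so x = y is lost.
rel = mix([from_buffer, from_thread], [("x",), ("y",), ("z",)])
# RegRel-style mix: x and y share a region, so x = y is preserved.
reg_rel = mix([from_buffer, from_thread], [("x", "y"), ("z",)])

print(len(rel), holds_everywhere(rel, x_equals_y))
print(len(reg_rel), holds_everywhere(reg_rel, x_equals_y))
```

With single-variable regions the result contains environments such as x = 0, y = 1, which is why Rel cannot prove the assertion at line 11. Grouping x and y into one region rules these out.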

In this section we introduce the programming language we use to describe multi-threaded programs, and describe the standard interleaving semantics for programs in this language.

3.1 Preliminaries

We begin by introducing some of the mathematical notation we will use in this paper. We denote the set of natural numbers {0, 1, 2, …} by ℕ. We use → and ⇀ to denote total and partial functions, respectively. We use “⊥” to denote an undefined value, which we assume is included in every domain under consideration. We denote the length of a finite sequence of elements π by |π|, and the i-th element of π, for 0 ≤ i < |π|, by π_i. For a function f : A → B, we denote by dom(f) its domain A, and for a ∈ A and b ∈ B, we write f[a ↦ b] to denote the function f′ : A → B such that f′(x) = b if x = a, and f′(x) = f(x) otherwise. For a pair of elements ve = ⟨φ, ν⟩, we write ve.φ and ve.ν to denote the first and second components, respectively, of the pair ve.

We will make use of the standard notion of labelled transition systems to describe the semantics we will give to our programs. A Labelled Transition System (LTS) is a structure L = (S, Γ, s₀, →), where S is a set of states, Γ is a set of transition labels, s₀ ∈ S is the initial state, and → ⊆ S × Γ × S is the (labelled) transition relation. We sometimes write a transition t = ⟨s, l, s′⟩ as s →_l s′. An execution of an LTS L = (S, Γ, s₀, →) is a finite sequence of transitions π = t₁, t₂, …, tₙ (n ≥ 0) from →, such that there exists a sequence of states q₀, q₁, …, qₙ from S, with q₀ = s₀ and tᵢ = (qᵢ₋₁, lᵢ, qᵢ) for each 1 ≤ i ≤ n. Wherever convenient we will also represent an execution like π above as an interleaved sequence of the form q₀ →_{l₁} q₁ →_{l₂} ⋯ →_{lₙ} qₙ. We also define Reach(L) to be the set of states reachable by an execution of L. Thus Reach(L) = { s ∈ S | ∃ an execution q₀ →_{l₁} ⋯ →_{lₙ} qₙ with s = qₙ }.

… V and locks M which are shared by the threads of the program. We denote by 𝕍 the set of values that the program variables can assume. In this work we will take 𝕍 to be simply the set of integers.

Each thread in the program is a control-flow graph in which each edge is labelled by a basic statement (or command) over the set of variables V and locks M. We allow a small set of basic commands over V and M, which we denote by cmd_{V,M}, as shown in Tab. 1. For generality, we refrain from defining the syntax of the expressions e and boolean conditions b.

    Type        Syntax       Description
    Assignment  x := e       Assigns the value of expression e to variable x ∈ V
    Assume      assume(b)    Blocks execution if condition b does not hold
    Acquire     acquire(m)   Acquires lock m ∈ M, provided m is not held by any thread
    Release     release(m)   Releases lock m ∈ M, provided the executing thread holds m

Table 1 The set of program commands cmd_{V,M} over variables V and locks M

Formally, we represent a multi-threaded program as a tuple P = (V, M, T) where
– V is a finite set of program variables
– M is a finite set of locks
– T is a finite set of thread identifiers. Each thread t ∈ T has an associated control-flow graph of the form G_t = (L_t, ent_t, inst_t) where
  – L_t is a finite set of locations of thread t
  – ent_t ∈ L_t is the entry location of thread t
  – inst_t ⊆ L_t × cmd_{V,M} × L_t is a finite set of instructions of thread t.

Some definitions related to threads will be useful going forward. We denote by L_P = ⋃_{t∈T} L_t the disjoint union of the thread locations. We denote by ent_P the set { ent_t | t ∈ T } of all entry locations of P. Henceforth, whenever P is clear from the context we will drop the subscript P from L_P and its decorations. For a location n ∈ L, we denote by tid(n) the thread t which contains location n. We denote the set of instructions of P by inst_P = ⋃_{t∈T} inst_t. For an instruction ι ∈ inst_t, we will also write tid(ι) to mean the thread t containing ι. For an instruction ι = ⟨n_s, c, n_t⟩, we call n_s the source location, and n_t the target location of ι. We expect instructions pertaining to acquire() and release() commands to have unique source and target locations. Let L^rel_t be the set of program locations in thread t which are the target of a release() instruction. We refer to L^rel_t as t’s post-release points and denote the set of release points in the program by L^rel = ⋃_{t∈T} L^rel_t. Similarly, we define t’s pre-acquire points, denoted L^acq_t, and denote a program’s acquire points by L^acq = ⋃_{t∈T} L^acq_t. We denote the sets of post-release and pre-acquire points pertaining to operations on lock m by L^rel_m and L^acq_m, respectively.

We denote the set of commands appearing in program P by cmd(P).
We consider an assignment x := e to be a write-access to x, and a read-access to every variable that appears in the expression e. Similarly, an assume(b) statement is considered a read-access to every variable that occurs in the boolean condition b.

We illustrate these definitions for the example program from Fig. 1. Here V = {x, y, z}, M = {m}, and T = {t₁, t₂}. Some example instructions in this program are ⟨2, x := y, 3⟩ and ⟨10, acquire(m), 11⟩. The set L_{t₁} of program locations in thread t₁ is {1, 2, 3, 4, 5, 6, 7}, while tid(8) = t₂. In this program, the set L^rel_{t₁} of post-release points in t₁ is {7}. The set of post-release points of the whole program L^rel is {7, 13}. The set of pre-acquire points of the whole program L^acq is {1, 10}. Since this program has a single lock, m, L^rel_m = {7, 13} and L^acq_m = {1, 10}.

Many other standard commands can be expressed using the basic commands in our language. A goto instruction from program location l to l′ can be simulated by the instruction ⟨l, assume(true), l′⟩. Constructs like if and while can be simulated using assume statements in a standard way.

3.3 Interleaving Semantics

We now define the standard interleaving semantics of a multi-threaded program. We first introduce some notation that will be useful in the sequel. Given a program P = (V, M, T), an environment for P is a valuation φ : V → 𝕍, which assigns values from the value domain 𝕍 to the variables of P. We denote by Env_P the set of all environments for P. A lock map for P is a partial map µ : M ⇀ T which assigns to each lock the thread that holds it (if such a thread exists). We denote by LM_P the set of lock maps for P. Finally, a program counter for P is a map pc : T → L_P which assigns a location to each thread in P, such that for each t ∈ T, pc(t) ∈ L_t. We denote by PC_P the set of program counters of P.
As usual, whenever P is clear from the context we will drop the subscript P from these symbols. Fig. 3 summarizes the semantic domains, and the meta-variables ranging over them, that we will make use of in this section and subsequently.

    x, y ∈ V                               Variable identifiers
    m ∈ M                                  Lock identifiers
    t ∈ T                                  Thread identifiers
    n ∈ L                                  Program locations
    v ∈ 𝕍                                  Values
    r ∈ R                                  Region identifiers
    φ ∈ Env ≡ V → 𝕍                        Environments
    µ ∈ LM ≡ M ⇀ T                         Lock map
    pc ∈ PC ≡ T → L                        Program counters
    ν ∈ Ver ≡ V → ℕ                        Variable versions
    ve ∈ VE ≡ Env × Ver                    Versioned environments
    s = ⟨pc, µ, φ⟩ ∈ S ≡ PC × LM × Env     Standard States
    σ = ⟨pc, µ, Θ, Λ⟩ ∈ Σ ≡ PC × LM × (T → VE) × (L^rel → VE)    Thread-Local States

Fig. 3 Some of the semantic domains associated with a program P = (V, M, T).

Let us fix a program P = (V, M, T). We define the interleaving semantics of P using an LTS L^S_P = (S, T, s_ent, TR^S_P) whose components are defined below. The set of states S is PC × LM × Env. Thus each state is of the form ⟨pc, µ, φ⟩, where pc is a program counter, µ is a lock map, and φ is an environment for P. The transition labels come from the set T of thread identifiers of P. The initial state s_ent is ⟨λt. ent_t, λm. ⊥, λx. 0⟩. Thus, in s_ent, every thread is at its entry program location, no thread holds a lock, and all the variables are initialized to zero.

Transition Relation.

The transition relation TR^S_P is the union of the transition relations TR^S_ι induced by each instruction ι in inst_P. We elaborate on this below.

The transition relation for each instruction depends on the command associated with it. Intuitively, the semantics of the program commands are as follows. An assignment x := e command updates the value of the variable x according to the (possibly non-deterministic) expression e. An assume(b) command generates transitions only from states in which the (deterministic) boolean interpretation of the condition b is true. An acquire(m) command executed by thread t sets µ(m) = t, provided the lock m is not held by any other thread. A release(m) command executed by thread t sets µ(m) = ⊥ provided t holds m.

It will be convenient to first define a notation for the evaluation of expressions. The evaluation of an expression e, in an environment φ, is a value in 𝕍. We denote this value by ⟦e⟧φ. The interpretation of a boolean condition b, in an environment φ, is a boolean value true or false, and we denote this value by ⟦b⟧φ.

For an instruction ι = ⟨n, c, n′⟩ in inst_P, with tid(ι) = t, we define TR^S_ι as the set of all transitions ⟨⟨pc, µ, φ⟩, t, ⟨pc′, µ′, φ′⟩⟩ such that pc(t) = n, pc′ = pc[t ↦ n′] and the following additional conditions are satisfied:
– If c is a command of the form x := e then µ′ = µ, and φ′ = φ[x ↦ ⟦e⟧φ].
– If c is a command of the form assume(b) then µ′ = µ, ⟦b⟧φ = true, and φ′ = φ.
– If c is a command of the form acquire(m) then µ(m) = ⊥, µ′ = µ[m ↦ t], and φ′ = φ.
– If c is a command of the form release(m) then µ(m) = t, µ′ = µ[m ↦ ⊥], and φ′ = φ.

For a transition τ caused by an instruction ι = ⟨n, c, n′⟩ in inst_t, we denote by tid(τ) the thread t, by instr(τ) the instruction ι, and by cmd(τ) the command c. The transition relation TR^S_P can now be defined as: TR^S_P = ⋃_{ι ∈ inst_P} TR^S_ι.

Executions. An execution of the program P in the interleaving semantics is simply an execution of the LTS L^S_P. When dealing with executions in the interleaving semantics, we will denote the transition relation TR^S_P by ⇒_S. We denote by Reach_S(P) the set Reach(L^S_P), namely the set of reachable states in the standard interleaving semantics of P.

Fig. 4 depicts an execution of the program in Fig. 1 in the interleaving semantics. To keep it simple we show only the sequence of program instructions (from top to bottom), and the thread they belong to (column t₁ or t₂). The states along the execution can be inferred by the standard semantics of the commands. The other annotations in the figure will be explained in Sec. 3.4.

3.4 Data Races and the Happens-Before Relation

Now that we have formally defined the standard interleaving semantics, we are in a position to formally define what constitutes a data race. A standard way to formalize the notion of data race freedom (DRF) is to use the happens-before relation [2, 21] induced by executions.

For a given execution of the program P in the standard interleaving semantics, the happens-before relation is defined as the reflexive and transitive closure of the program-order and synchronizes-with relations, formalized below.

Definition 1 (Program order)

Let $\pi$ be an execution of $P$. Transition $\pi_i$ is related to the transition $\pi_j$ according to the program-order relation in $\pi$, denoted by $\pi_i \xrightarrow{po}_{\pi} \pi_j$, if $j = \min\{k \mid i < k < |\pi| \wedge tid(\pi_k) = tid(\pi_i)\}$.

Fig. 4

A typical execution of the program in Fig. 1 with two threads, according to the standard interleaving semantics. Time flows from top to bottom. Instructions ordered by program-order are annotated as po. The release executed by $t_1$ and the acquire executed by $t_2$ are related by synchronizes-with, and this edge is annotated as sw. The write of $x$ in thread $t_1$ and its subsequent read in thread $t_2$ are connected by a happens-before path comprising po- and sw-annotated edges.

That is, $\pi_i$ and $\pi_j$ are successive executions, in $\pi$, of instructions by the same thread. The transitions related by program-order in Fig. 4 are marked with po.

Definition 2 (Synchronizes-with)

Let $\pi$ be an execution of $P$. Transition $\pi_i$ is related in $\pi$, by the synchronizes-with relation, to the transition $\pi_j$, denoted by $\pi_i \xrightarrow{sw}_{\pi} \pi_j$, if $cmd(\pi_i) = release(m)$ for some lock $m$, and $j = \min\{k \mid i < k < |\pi| \wedge cmd(\pi_k) = acquire(m)\}$.

That is, $\pi_i$ is a release of lock $m$ in $\pi$, $\pi_j$ is a subsequent acquire of $m$, and there are no intervening acquires of $m$. The transitions related by synchronizes-with in Fig. 4 are marked with sw.

Definition 3 (Happens before)

The happens-before relation pertaining to an execution $\pi$ of $P$, denoted by $\cdot \xrightarrow{hb}_{\pi} \cdot$, is the reflexive and transitive closure of the union of the program-order and synchronizes-with relations induced by the execution $\pi$.

(Strictly speaking, the various relations we define are between the indices $\{0, \ldots, |\pi| - 1\}$ of an execution, and not between transitions, so we should have written, e.g., $i \xrightarrow{po}_{\pi} j$ instead of $\pi_i \xrightarrow{po}_{\pi} \pi_j$. We use the latter, informal notation for readability.)

Note that transitions executed by the same thread are always connected by a chain of program-order edges, and are thus always related by the happens-before relation.
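The three relations defined above can be computed directly over the indices of a finite execution. The following is a minimal Python sketch, assuming a hypothetical encoding in which each transition is a pair `(thread, command)` and commands are tuples such as `("acquire", "m")`. The function names and the example trace are illustrative assumptions; the trace is not the program of Fig. 1.

```python
def program_order(trace):
    """Pairs (i, j) where j is the next transition by the same thread as i."""
    edges = set()
    for i, (tid, _) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            if trace[j][0] == tid:
                edges.add((i, j))
                break
    return edges


def synchronizes_with(trace):
    """Pairs (i, j) where i releases a lock m and j is the next acquire of m."""
    edges = set()
    for i, (_, cmd) in enumerate(trace):
        if cmd[0] != "release":
            continue
        for j in range(i + 1, len(trace)):
            if trace[j][1] == ("acquire", cmd[1]):
                edges.add((i, j))
                break
    return edges


def happens_before(trace):
    """Reflexive and transitive closure of po and sw over trace indices."""
    n = len(trace)
    hb = {(i, i) for i in range(n)}
    hb |= program_order(trace) | synchronizes_with(trace)
    # Warshall-style closure: the intermediate index k is the outermost loop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if (i, k) in hb and (k, j) in hb:
                    hb.add((i, j))
    return hb


# Hypothetical two-thread execution (an assumption made for illustration).
trace = [
    ("t1", ("acquire", "m")),  # index 0
    ("t1", ("write", "x")),    # index 1
    ("t2", ("write", "z")),    # index 2
    ("t1", ("release", "m")),  # index 3
    ("t2", ("acquire", "m")),  # index 4
    ("t2", ("read", "x")),     # index 5
]
hb = happens_before(trace)
print((1, 5) in hb)  # the write of x reaches the read of x via po, sw, po
print((1, 2) in hb)  # unrelated, although index 1 precedes index 2 in time
```

Working on indices rather than on transitions follows the remark above that the relations are, strictly speaking, defined between positions of the execution.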

Definition 4 (Data Race)

Let $\pi$ be an execution of $P$. Transitions $\pi_i$ and $\pi_j$ in $\pi$ constitute a racing pair, or a data race, if the following conditions are satisfied:
1. $cmd(\pi_i)$ and $cmd(\pi_j)$ are conflicting accesses to a variable $x$ (i.e., they both access the variable $x$, and at least one of them is a write access), and
2. neither $\pi_i \xrightarrow{hb}_{\pi} \pi_j$ nor $\pi_j \xrightarrow{hb}_{\pi} \pi_i$ holds.

To illustrate these definitions, consider the execution of the program of Fig. 1, shown in Fig. 4. The program-order relation between transitions is shown using edges marked po, while the synchronizes-with relation is shown using edges marked sw. For example, the transition where $t_1$ executes x := y and the one where $t_1$ executes x++ are related by program-order. The transition where $t_1$ releases the lock $m$, and the subsequent transition where $t_2$ acquires $m$, are related by the synchronizes-with relation. There is a happens-before path, namely the path comprising po- and sw-annotated edges in Fig. 4, between the write to $x$ by $t_1$ and the subsequent read of $x$ by $t_2$. Note that even though the instruction x := y is executed by $t_1$ before $t_2$ executes z++ in the execution in Fig. 4, these two instructions are not related by happens-before. Suppose, for a moment, that $t_2$ did not have the acquire(m) instruction. Then the transitions made by $t_1$ could never be happens-before related to the ones in $t_2$ (due to the absence of sw edges). In particular, the write to $x$ by $t_1$ would not be happens-before ordered with the read of $x$ in $t_2$, and we would have a data race in the execution.

A program in which every execution is free from data races is said to be data race free. The program in Fig. 1 is an example of such a race free program.

We say an instruction $\iota$ in $P$ is racy if there is an execution $\pi$ of $P$ in which two transitions $\pi_i$ and $\pi_j$ are involved in a race and $instr(\pi_i) = \iota$.

We can now define the notion of the set of variables "owned" by a thread at one of its locations.
We say a variable $x$ is owned by a thread $t$ at a location $n \in L_t$, in program $P$, if the introduction of a read of $x$ at location $n$ is not racy. In other words, if we introduce the instruction $\iota$ with command $assume(x == x)$ at point $n$ in $t$, to get the program $P'$, then the instruction $\iota$ is not racy in $P'$. For example, in the program of Fig. 1, at location 3, thread $t_1$ owns the variables $x$ and $y$. However, it does not own the variable $z$ at location 3, since a read of $z$ introduced at this point would be racy (it would race with the write to $z$ at line 8 in $t_2$).

4 L-DRF

In this section, we introduce a novel semantics for the class of data race free programs, which we refer to as the L-DRF semantics [30]. The "L" highlights the fact that the semantics is thread-local in nature, while DRF emphasizes that we deal exclusively with data race free programs. The L-DRF semantics paves the way towards devising efficient "thread-local" data flow analyses for race free concurrent programs. Like the standard interleaving semantics we saw in Sec. 3.3, we present the L-DRF semantics of a program as a labeled transition system. We then prove that the L-DRF semantics is sound and complete with respect to the standard semantics, in the sense that for each execution of the program in the standard semantics, there is an "equivalent" execution in the L-DRF semantics, and vice versa.

4.1 The L-DRF Semantics

Our thread-local semantics, like the standard one defined in Sec. 3.3, is based on the interleaving of transitions made by different threads, and the use of a lock map to coordinate the use of locks.
However, unlike the standard semantics, where the threads share access to a single global environment, in the L-DRF semantics every thread has its own local environment, which it uses to evaluate conditions and perform assignments.

Threads exchange information through release buffers: every post-release point $n \in L^{rel}_t$ of a thread $t$ is associated with a buffer $\Lambda(n)$, which records a snapshot of $t$'s local environment the last time $t$ ended up at the program point $n$. Recall that this happens right after $t$ executes the instruction $\langle n', release(m), n \rangle \in inst_P$. When a thread $t'$ subsequently acquires the lock $m$, it updates its local environment using the snapshots stored in all the buffers pertaining to the release of $m$.

To ensure that the acquiring thread updates its environment so that the value of every variable is up-to-date, every thread maintains its own version map $\nu : \mathcal{V} \to \mathbb{N}$, which associates a count with each variable. A thread increments $\nu(x)$ whenever it writes to $x$. Along any execution, the version $\nu(x)$, for $x \in \mathcal{V}$, in the version map $\nu$ of thread $t$ associates a unique prior write with this particular valuation of $x$. It also reflects the total number of write accesses made (across threads) to $x$ to obtain the value of $x$ stored in the map. A thread stores both its local environment and its version map in the buffer after releasing a lock $m$. When a thread subsequently acquires lock $m$, it copies from the release buffers at $L^{rel}_m$ the most up-to-date value (according to the version numbers) of every variable. We prove that for data race free programs, there can be only one such value. If the version of $x$ in the local state of $t$ is higher than the versions of $x$ in the associated release buffers, then the value of $x$ in the local state persists.

Let us fix a concurrent race free program $P = (\mathcal{V}, \mathcal{M}, \mathcal{T})$. As in Sec. 3.3, we define the L-DRF semantics of $P$ in terms of a labeled transition system $L^L_P = (\Sigma, \mathcal{T}, \sigma_{ent}, TR^L_P)$, whose components we define below.
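Before the formal definitions, the bookkeeping described above (version increments on writes, snapshots on release, and a per-variable merge on acquire) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class `VersionedEnv` and the helpers `write` and `acquire` are hypothetical names, program counters and the lock map are omitted, and the list of buffers passed to `acquire` stands for the release buffers of the acquired lock.

```python
from dataclasses import dataclass


@dataclass
class VersionedEnv:
    phi: dict  # variable -> value
    nu: dict   # variable -> version count

    def snapshot(self):
        """Copy stored in a release buffer when the thread releases a lock."""
        return VersionedEnv(dict(self.phi), dict(self.nu))


def write(ve, x, value):
    """A write to x stores the new value and increments the version of x."""
    ve.phi[x] = value
    ve.nu[x] += 1


def acquire(ve, buffers):
    """For each variable, keep the (value, version) pair with the highest
    version among the local environment and the given release buffers."""
    for x in ve.phi:
        newest = max([ve, *buffers], key=lambda env: env.nu[x])
        ve.phi[x], ve.nu[x] = newest.phi[x], newest.nu[x]


zero = {"x": 0, "y": 0}
t1 = VersionedEnv(dict(zero), dict(zero))
t2 = VersionedEnv(dict(zero), dict(zero))

write(t1, "x", 5)       # t1 writes x while holding the lock
buffer = t1.snapshot()  # t1 releases the lock
write(t2, "y", 7)       # t2 writes a variable that only t2 accesses
acquire(t2, [buffer])   # t2 acquires the lock and merges the snapshot
print(t2.phi, t2.nu)
```

When two candidates carry the same version for a variable, `max` keeps the first one; for race free programs the text states that the most up-to-date value is unique, so the choice does not matter in that setting.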
States. A state $\sigma \in \Sigma$ in the L-DRF semantics of $P$ is a tuple $\langle pc, \mu, \Theta, \Lambda \rangle$, where $pc$ and $\mu$ are the program counter and lock map, as in the standard interleaving semantics (Sec. 3.3). A versioned environment is a pair $\langle \phi, \nu \rangle$, where $\phi \in Env$ is an environment and $\nu : \mathcal{V} \to \mathbb{N}$ is a version map, which assigns a version count to each variable. We denote by $VE_P$ (or just $VE$ when $P$ is clear from the context) the set of versioned environments of program $P$. The local environment map $\Theta : \mathcal{T} \to VE$ maps every thread to a local versioned environment, and the release buffer map $\Lambda : L^{rel} \to VE$ records the snapshots of versioned environments stored in buffers associated with post-release points.

(Recall that $L^{rel}_t$ is the set of all post-release points in the thread $t$, and that $L^{rel}_m$ is the set of all post-release points in the program associated with the release of lock $m$.)

Initial State.

The initial state $\sigma_{ent}$ is defined to be

$$\sigma_{ent} = \langle \lambda t.\, ent_t,\ \lambda m.\, \bot,\ \lambda t.\, ve_{ent},\ \lambda l \in L^{rel}.\, ve_{ent} \rangle,$$

where $ve_{ent} = \langle \lambda x.\, 0,\ \lambda x.\, 0 \rangle$. Thus, in $\sigma_{ent}$, every thread is at its entry program location, no thread holds a lock, and all the thread-local versioned environments have all the variables and versions initialized to 0. The release buffers are also initialized to the versioned environment where all variable values and versions are 0.

Transition Relation.

The transition relation $TR^L_P \subseteq \Sigma \times \mathcal{T} \times \Sigma$ captures the interleaving nature of the L-DRF semantics of $P$. Like the interleaving semantics in Sec. 3.3, $TR^L_P$ is the union of the transition relations $TR^L_\iota$ induced by each instruction $\iota \in inst_P$.

For an instruction $\iota = \langle n, c, n' \rangle$ in $inst_P$, with $tid(\iota) = t$, we define $TR^L_\iota$ as the set of all transitions $\langle \langle pc, \mu, \Theta, \Lambda \rangle, t, \langle pc', \mu', \Theta', \Lambda' \rangle \rangle$ such that $pc(t) = n$, $pc' = pc[t \mapsto n']$, and the following additional conditions are satisfied:
– Assignment. If $c$ is a command of the form $x := e$, then $\mu' = \mu$, $\Lambda' = \Lambda$, and $\Theta' = \Theta[t \mapsto \langle \phi', \nu' \rangle]$, where $\phi'$ and $\nu'$ are given as follows. Let $\Theta(t) = \langle \phi, \nu \rangle$. Then $\phi' = \phi[x \mapsto [\![e]\!]\phi]$ and $\nu' = \nu[x \mapsto \nu(x) + 1]$. For subsequent use, we define the interpretation of an assignment statement $x := e$ on a versioned environment $\langle \phi, \nu \rangle$, denoted $[\![x := e]\!]^L(\langle \phi, \nu \rangle)$, to be $\langle \phi', \nu' \rangle$, where $\phi' = \phi[x \mapsto [\![e]\!]\phi]$ and $\nu' = \nu[x \mapsto \nu(x) + 1]$.
– Assume. If $c$ is an assume statement of the form $assume(b)$, then $\mu' = \mu$, $\Theta' = \Theta$, $\Lambda' = \Lambda$, and $[\![b]\!]^L(\Theta(t))$ is true. Here by $[\![b]\!]^L\langle \phi, \nu \rangle$ we simply mean $[\![b]\!]\phi$.

We note that for instructions which execute either assignment or assume commands, the executing thread accesses and modifies only its own local versioned environment.
– Acquire. An $acquire(m)$ command, executed by a thread $t$, has the same effect on the lock map component as in the standard semantics (see Sec. 3.3).
In addition, it updates the versioned environment $\Theta(t)$ based on the contents of the relevant release buffers. The release buffers relevant to a thread when it acquires $m$ are the ones at $L^{rel}_m$.

We define an auxiliary function $updEnv$ to update the value of each $x \in \mathcal{V}$ (along with its version) in $\Theta(t)$, by taking its value from a snapshot stored at a relevant buffer which has the highest version of $x$, if the latter version is higher than $\Theta(t).\nu(x)$, the version of $x$ in $\Theta(t)$. If the version of $x$ is highest in $\Theta(t)$, then $t$ simply retains this value. Finding the most up-to-date (value, version) pairs for a variable $x$ from a set of versioned environments is the job of the auxiliary function $take_x$. We will separately prove (in Lemma 4) that all reachable L-DRF states are admissible, in that in any two component versioned environments (i.e., the thread-local versioned environments or release buffers of the state), if the versions for a variable coincide, then so must their values. Thus if $\langle \phi, \nu \rangle$ and $\langle \phi', \nu' \rangle$ are two versioned environments in the components of a reachable state, then for each variable $x$, $\nu(x) = \nu'(x) \implies \phi(x) = \phi'(x)$.

Given a set of versioned environments $Y$, we define $take_x(Y)$ to be the set of (value, version) pairs $\langle v, m \rangle$ such that there exists a versioned environment $\langle \phi, \nu \rangle$ in $Y$ with $\phi(x) = v$ and $\nu(x) = m$, and $m$ is the highest version of $x$ among the versioned environments in $Y$ (i.e., $\nu(x) \ge \nu'(x)$ for each $\langle \phi', \nu' \rangle$ in $Y$).

Given a versioned environment $ve$ and a set of versioned environments $X$, we define $updEnv(ve, X)$ to be the set of versioned environments $\langle \phi', \nu' \rangle$ such that for each variable $x \in \mathcal{V}$, $\langle \phi'(x), \nu'(x) \rangle \in take_x(\{ve\} \cup X)$.

We can now define the transition induced by an acquire command.
If $c$ is an acquire statement of the form $acquire(m)$, then $\mu(m) = \bot$, $\mu' = \mu[m \mapsto t]$, $\Theta' = \Theta[t \mapsto ve']$, and $\Lambda' = \Lambda$, where $ve' = updEnv(\Theta(t), \Lambda^m)$ and $\Lambda^m = \{\Lambda(n'') \mid n'' \in L^{rel}_m\}$ is the set of versioned environments relevant to $m$. Strictly, $updEnv$ returns a set; for admissible states this set is a singleton, which we identify with its only element.

As an example, consider again the execution of the program of Fig. 1, as shown in Fig. 4. When thread $t_2$ executes the $acquire(m)$ instruction, the contents of the relevant buffers and the thread-local state of $t_2$ are as shown in Fig. 5. The figure also outlines the operation of the functions $take_x$, $take_y$ and $take_z$, and finally the operation of the function $updEnv$.

Fig. 5

Operation of the functions $take_x$, $take_y$, $take_z$, and $updEnv$ when $t_2$ acquires $m$ in the execution of the program of Fig. 1, as shown in Fig. 4. The superscripts indicate the versions.

– Release. If $c$ is a release statement of the form $release(m)$, then $\mu(m) = t$, $\mu' = \mu[m \mapsto \bot]$, $\Theta' = \Theta$, and $\Lambda' = \Lambda[n' \mapsto \Theta(t)]$. Thus an instruction $\iota$ pertaining to a $release(m)$ command has the same effect on the lock map component of the state in the L-DRF semantics that it has in the standard semantics (see Sec. 3.3). In addition, it stores the local versioned environment of thread $t$ ($= tid(\iota)$), $\Theta(t)$, in the buffer associated with the post-release point of the executed $release(m)$ instruction.

The transition relation $TR^L_P$ of program $P$ according to the L-DRF semantics is the union of the sets of all possible transitions generated by its instructions. Formally,

$$TR^L_P = \bigcup_{\iota \in inst_P} TR^L_\iota.$$

This completes the description of the labeled transition system $L^L_P$ capturing the L-DRF semantics. An execution of program $P$ in the L-DRF semantics is simply an execution of the transition system $L^L_P$. When dealing with executions in the L-DRF semantics, we will denote the transition relation $TR^L_P$ by $\Rightarrow^L$. We denote by $Reach^L(P)$ the set of reachable states in this semantics, namely $Reach(L^L_P)$.

4.2 Soundness and Completeness of L-DRF

In this section, we show that for the class of data race free programs, the thread-local semantics L-DRF is sound and complete with respect to the standard interleaving semantics. Intuitively, the L-DRF and the standard semantics are "equivalent" in the sense that for each execution of a program $P$ in the standard semantics, one can find a corresponding execution in the L-DRF semantics which agrees with it on the values read from the variables.
Likewise, every execution of program $P$ in the L-DRF semantics has a corresponding execution in the standard semantics.

Let us fix a race free program $P = (\mathcal{V}, \mathcal{M}, \mathcal{T})$. To formalize the above claim, we first define a function which extracts a state in the interleaving semantics from a state in the L-DRF semantics.

Definition 5 (Extraction Function $\chi$) The extraction function $\chi : \Sigma \rightharpoonup S$ is defined for admissible states (see Sec. 4.1) in $\Sigma$ as follows:

$$\chi(\langle pc, \mu, \Theta, \Lambda \rangle) = \langle pc, \mu, \phi \rangle,$$

where $\phi$ is defined as follows. For each $x \in \mathcal{V}$, $\phi(x) = v$, provided there exists a version value $m$ with $\langle v, m \rangle \in take_x\big(\bigcup_{t \in \mathcal{T}} \{\Theta(t)\}\big)$.

The function $\chi$ thus preserves the values of the program counters and the lock map, while it takes the value of a variable $x$ from the thread which has the maximal version count for $x$ in its local environment. The map $\chi$ is clearly well-defined for admissible states.

The function $\chi$ can be extended to executions in the L-DRF semantics, in the following sense. Given an execution $\hat{\pi} = \sigma_0 \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_n} \sigma_n$ of program $P$ in the L-DRF semantics, and an execution $\pi = s_0 \Rightarrow^S_{t_1} \ldots \Rightarrow^S_{t_l} s_l$ of $P$ in the standard semantics, we say $\pi = \chi(\hat{\pi})$ if $l = n$ and, for each $i$ with $0 \le i \le n$, $s_i = \chi(\sigma_i)$.

Theorem 1 (Completeness)

For any execution $\pi$ of $P$ in the standard interleaving semantics, there exists an execution $\hat{\pi}$ of $P$ in the L-DRF semantics such that $\chi(\hat{\pi}) = \pi$.

Theorem 2 (Soundness)

For any execution $\hat{\pi}$ of $P$ in the L-DRF semantics, there is an execution $\pi$ in the standard interleaving semantics of $P$ with $\pi = \chi(\hat{\pi})$.

In order to prove Theorem 1 and Theorem 2, we need to establish a few intermediate results.
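As an aside, the extraction function $\chi$ of Definition 5, on which both theorems rely, can be sketched in Python. This is an illustration under stated assumptions: a versioned environment is encoded as a hypothetical pair of dictionaries `(phi, nu)`, `theta` maps thread names to such pairs and is assumed to be non-empty, and the release buffers are left out because Definition 5 consults only the thread-local environments.

```python
def take(x, envs):
    """(value, version) pairs for x that carry the highest version in envs."""
    top = max(nu[x] for _, nu in envs)
    return {(phi[x], nu[x]) for phi, nu in envs if nu[x] == top}


def chi(pc, mu, theta):
    """Extract a standard state (pc, mu, phi) from the L-DRF components.

    theta maps each thread to its versioned environment (phi, nu).
    """
    envs = list(theta.values())
    variables = envs[0][0].keys()
    phi = {}
    for x in variables:
        pairs = take(x, envs)
        if len(pairs) != 1:
            # Two environments agree on the version of x but not on its
            # value, so the state is not admissible and chi is undefined.
            raise ValueError(f"state is not admissible for variable {x!r}")
        value, _version = next(iter(pairs))
        phi[x] = value
    return pc, mu, phi


theta = {
    "t1": ({"x": 5, "y": 0}, {"x": 1, "y": 0}),
    "t2": ({"x": 0, "y": 7}, {"x": 0, "y": 1}),
}
print(chi({"t1": 3, "t2": 9}, {"m": None}, theta))
```

Raising an error on inadmissible input mirrors the fact that $\chi$ is a partial function, defined only on admissible states.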

Lemma 1

In any execution $\hat{\pi}$ in the L-DRF semantics of $P$, the version of any variable $x \in \mathcal{V}$, in any component versioned environment of any state $\sigma$ in $\hat{\pi}$, is bounded by the total number of writes to $x$ preceding it.

Proof. In $\hat{\pi}$, the only transitions which can increment the version of variable $x$ pertain to instructions containing commands which write to $x$, of the form $x := e$. Instructions containing other commands (assume, acquire and release) only make copies of existing version counts. If there are $n$ such transitions containing instructions writing to $x$ in $\hat{\pi}$, and the initial version count of $x$ is 0 in all the component versioned environments of the initial state $\sigma_{ent}$, the version of $x$ in any component versioned environment of any state $\sigma$ in $\hat{\pi}$ can be at most $n$. □

Lemma 2

Let $\hat{\pi} = \langle pc_0, \mu_0, \Theta_0, \Lambda_0 \rangle \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_N} \langle pc_N, \mu_N, \Theta_N, \Lambda_N \rangle$ be an execution in the L-DRF semantics of program $P$. Let

$$\tau_j = \langle pc_{j-1}, \mu_{j-1}, \Theta_{j-1}, \Lambda_{j-1} \rangle \Rightarrow^L_{t_j} \langle pc_j, \mu_j, \Theta_j, \Lambda_j \rangle$$

be a transition in $\hat{\pi}$ which contains an access (read or write) to the variable $x$. Suppose there is a prior write to $x$ in $\hat{\pi}$, and let

$$\tau_i = \langle pc_{i-1}, \mu_{i-1}, \Theta_{i-1}, \Lambda_{i-1} \rangle \Rightarrow^L_{t_i} \langle pc_i, \mu_i, \Theta_i, \Lambda_i \rangle$$

be the last transition, prior to $\tau_j$, which contains an assignment to $x$. Then

$$\Theta_{j-1}(t_j).\nu(x) \ge \Theta_i(t_i).\nu(x).$$

In other words, the version of $x$ in $\Theta_{j-1}(t_j)$ is no less than the version of $x$ in the local state of $t_i$ after the write at $\tau_i$.

Fig. 6

A typical execution of a program $P$ in the L-DRF semantics. The solid arrows represent the interleaved execution of the instructions from different threads. The dotted arrows denote the happens-before path induced by this execution. The figure marks the sections of the happens-before path which are program-order related (po), and the transitions related by synchronizes-with (sw).

Proof

Fig. 6 provides a pictorial description of the situation we are considering. We lift the notion of a happens-before path, which we defined for the interleaving semantics, in a natural way to L-DRF executions. The sequence of transitions in $\hat{\pi}$ can also be viewed as a standard execution, and the resulting happens-before path in $\hat{\pi}$ contains the same sequence of transitions as the happens-before path in the execution in the standard interleaving semantics. Since $\tau_i$ and $\tau_j$ are conflicting accesses to the variable $x$, and since the program $P$ is assumed to be free from races, we have $\tau_i \xrightarrow{hb}_{\hat{\pi}} \tau_j$ (indicated by the path comprising dotted arrows in Fig. 6).

Let $\rho$ be such a happens-before path between $\tau_i$ and $\tau_j$, excluding both $\tau_i$ and $\tau_j$. If $\rho$ has length 0, then $\tau_j$ must immediately follow $\tau_i$ in the same thread, and the lemma clearly holds. Suppose $\rho$ has length at least one, and consider a transition $\tau_k$ in $\rho$. By induction on the position $n$ of $\tau_k$ in $\rho$, we claim that $\Theta_k(t_k).\nu(x) \ge \Theta_i(t_i).\nu(x)$.

Base Case. If $n = 1$, then $\tau_i$ and $\tau_k$ must be related by program order, which implies $t_i = t_k$ and $k = i + 1$. Thus clearly $\Theta_k(t_k).\nu(x) \ge \Theta_i(t_i).\nu(x)$.

Inductive Case.

Assume that the hypothesis holds for all transitions at positions less than or equal to $n$ in $\rho$, and suppose $\tau_k$ occurs at position $n + 1$ in $\rho$. Let the $n$-th transition in $\rho$ be

$$\tau_u = \langle pc_{u-1}, \mu_{u-1}, \Theta_{u-1}, \Lambda_{u-1} \rangle \Rightarrow^L_{t_u} \langle pc_u, \mu_u, \Theta_u, \Lambda_u \rangle.$$

There are two possible cases here. In the first, $\tau_u \xrightarrow{po}_{\hat{\pi}} \tau_k$, and consequently $t_u = t_k$. In this case too, clearly $\Theta_k(t_k).\nu(x) \ge \Theta_u(t_u).\nu(x)$, which, by the induction hypothesis, is greater than or equal to $\Theta_i(t_i).\nu(x)$. Hence this case is taken care of. In the second, $\tau_u \xrightarrow{sw}_{\hat{\pi}} \tau_k$; then $\tau_u$ must be the release of some lock $m$, and $\tau_k$ must be the acquire of $m$. By the L-DRF semantics of acquire, thread $t_k$ will observe the buffer associated with the release command of $\tau_u$. Consequently, $\Theta_k(t_k).\nu(x) \ge \Theta_u(t_u).\nu(x)$ by the semantics of the acquire command, and $\Theta_u(t_u).\nu(x) \ge \Theta_i(t_i).\nu(x)$ by the induction hypothesis. Thus the hypothesis holds in this case as well. This proves the claim.

The lemma now follows directly from the claim. □

Lemma 3

Let $\hat{\pi} = \langle pc_0, \mu_0, \Theta_0, \Lambda_0 \rangle \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_N} \langle pc_N, \mu_N, \Theta_N, \Lambda_N \rangle$ be an execution in the L-DRF semantics of program $P$, and let the $i$-th transition in the execution be

$$\tau_i = \langle pc_{i-1}, \mu_{i-1}, \Theta_{i-1}, \Lambda_{i-1} \rangle \Rightarrow^L_{t_i} \langle pc_i, \mu_i, \Theta_i, \Lambda_i \rangle.$$

Consider a transition $\tau_k$ with $cmd(\tau_k)$ being an assignment to a variable $x$. Then

$$\Theta_k(t_k).\nu(x) = |\{ i : i \le k \text{ and } cmd(\tau_i) \text{ is an assignment to } x \}|.$$

That is, in the post-state of an assignment to a variable $x$ by thread $t$, the version of $x$ in the local versioned environment of $t$ equals the total number of writes made to $x$ up to that point.

Proof. We prove the lemma by induction on $k$.

Base Case. If $k = 1$, then clearly $\Theta_k(t_k).\nu(x) = 1$, and we are done.

Inductive Case.

Let $k = n + 1$ and assume the lemma holds for all earlier writes to $x$ in $\hat{\pi}$. Let the last write to $x$ prior to $\tau_{n+1}$ be in the transition $\tau_i$. By the induction hypothesis,

$$\Theta_i(t_i).\nu(x) = |\{ j : j \le i \wedge cmd(\tau_j) \text{ is an assignment to } x \}| = w \text{ (say)}.$$

(By abuse of notation, we use $cmd(\tau)$ to denote the command of the instruction $\iota$ causing the transition $\tau$.)

We now infer the following: $\Theta_n(t_{n+1}).\nu(x) \ge w$ from Lemma 2, and $\Theta_n(t_{n+1}).\nu(x) \le w$ from Lemma 1. Therefore $\Theta_n(t_{n+1}).\nu(x) = w$. Since $\tau_{n+1}$ increments the version of $x$ in $\Theta_{n+1}(t_{n+1})$, we have

$$\begin{aligned}
\Theta_{n+1}(t_{n+1}).\nu(x) &= \Theta_n(t_{n+1}).\nu(x) + 1 \\
&= w + 1 \\
&= |\{ j : j \le i \wedge cmd(\tau_j) \text{ is an assignment to } x \}| + 1 \\
&= |\{ j : j \le n + 1 \wedge cmd(\tau_j) \text{ is an assignment to } x \}|.
\end{aligned}$$

This completes the proof of the lemma. □

Corollary 1

Let $\hat{\pi} = \sigma_0 \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_N} \sigma_N$ be an execution in the L-DRF semantics of program $P$. Let $\sigma_i = \langle pc_i, \mu_i, \Theta_i, \Lambda_i \rangle$, and let the $i$-th transition in the execution be $\tau_i = \sigma_{i-1} \Rightarrow^L_{t_i} \sigma_i$. Suppose $\tau_k$ contains an access (read or write) to the variable $x$. Let $m$ be the highest version count of $x$ among all component versioned environments in $\sigma_{k-1}$. Then $\Theta_{k-1}(t_k).\nu(x) = m$. In other words, whenever a thread accesses a variable $x$, the version of $x$ is the highest in its local versioned environment.

Proof. Suppose no write to $x$ precedes $\tau_k$ in $\hat{\pi}$. Then by Lemma 1, $\Theta_{k-1}(t).\nu(x) = 0$ for each $t \in \mathcal{T}$, and we are done. Otherwise, let there be $m \ge 1$ writes to $x$ before $\tau_k$, and let $\tau_i$ be the last such write. Then by Lemma 1, $\Theta_{k-1}(t).\nu(x) \le m$ for each $t \in \mathcal{T}$, and also $\Lambda_{k-1}(n).\nu(x) \le m$ for each $n \in L^{rel}$. Further, by Lemma 3, $\Theta_i(t_i).\nu(x) = m$, and by Lemma 2, $\Theta_{k-1}(t_k).\nu(x) \ge m$. Hence $\Theta_{k-1}(t_k).\nu(x) = m$, and we have the corollary. □

The next lemma proves that the L-DRF semantics generates only admissible states.

Lemma 4

Let $\hat{\pi} = \sigma_{ent} \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_N} \sigma_N$ be an execution of $P$ in the L-DRF semantics. Then, for any $\sigma_k$ with two component versioned environments (in thread-local states or buffers) $\langle \phi_1, \nu_1 \rangle$ and $\langle \phi_2, \nu_2 \rangle$, and any variable $x \in \mathcal{V}$, if $\nu_1(x) = \nu_2(x)$, then $\phi_1(x) = \phi_2(x)$.

Proof

We prove the lemma using induction on the position $k$ in $\hat{\pi}$. Let the $i$-th transition in $\hat{\pi}$ be $\tau_i = \sigma_{i-1} \Rightarrow^L_{t_i} \sigma_i$, and let each $\sigma_i$ be $\langle pc_i, \mu_i, \Theta_i, \Lambda_i \rangle$.

Base Case.

When $k = 0$, we have $\sigma_k = \sigma_{ent}$. Since all versions and values are 0, the hypothesis clearly holds.

Inductive Case.

Let us assume that the claim of the lemma holds for all $k \le n$, and consider $k = n + 1$. We consider the different cases for $cmd(\tau_{n+1})$.

If $cmd(\tau_{n+1})$ is either an assume or a release statement, then the claim clearly holds, since by assumption it holds for $\sigma_n$ and these commands do not create any new (value, version) pair in going from $\sigma_n$ to $\sigma_{n+1}$ (a release only copies an existing versioned environment into a buffer).

If $cmd(\tau_{n+1})$ is of the form $acquire(m)$, then $t_{n+1}$ updates its local versioned environment based on its local versioned environment and the versioned environments at the relevant buffers. By the induction hypothesis, the versioned environments in $\sigma_n$ satisfy the property of the lemma. By the semantics of the acquire command, $t_{n+1}$ copies over the version and the valuation of $x$ from one such versioned environment (including, possibly, $t_{n+1}$'s local versioned environment) in $\sigma_n$. Thus the hypothesis also holds for $\sigma_{n+1}$ in this case.

If $cmd(\tau_{n+1})$ is of the form $x := e$, then $t_{n+1}$ updates the version and valuation of $x$ in its local versioned environment. By Lemma 1, in any component versioned environment $\langle \phi, \nu \rangle$ in $\sigma_n$, we must have $\nu(x) \le m$, where $m$ is the total number of writes to $x$ preceding $\tau_{n+1}$. By Lemma 3, $\Theta_{n+1}(t_{n+1}).\nu(x) = m + 1$. This implies that for any component versioned environment $\langle \phi', \nu' \rangle$ in $\sigma_{n+1}$, other than the local versioned environment of $t_{n+1}$, $\nu'(x) < \Theta_{n+1}(t_{n+1}).\nu(x)$. Since none of the other versioned environments is modified, the claim of the lemma continues to hold for $\sigma_{n+1}$. This completes the proof of the lemma. □

We now proceed to prove the completeness and soundness results (Theorem 1 and Theorem 2).
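The admissibility property established by Lemma 4 can be phrased as an executable check. The sketch below is illustrative only and assumes a hypothetical encoding in which each component versioned environment of a state (a thread-local environment or a release buffer) is a pair of dictionaries `(phi, nu)` over the same variables.

```python
from itertools import combinations


def is_admissible(envs):
    """True if any two environments that agree on the version of a variable
    also agree on its value."""
    for (phi1, nu1), (phi2, nu2) in combinations(envs, 2):
        for x in nu1:
            if nu1[x] == nu2[x] and phi1[x] != phi2[x]:
                return False
    return True


# A thread-local environment, a stale one, and a buffer holding a snapshot.
state = [({"x": 5}, {"x": 1}), ({"x": 0}, {"x": 0}), ({"x": 5}, {"x": 1})]
print(is_admissible(state))
```

Such a predicate could serve as a runtime assertion in a prototype interpreter for the semantics; it is not part of the proof.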

Proof (Completeness, Theorem 1)

We first outline the idea behind the proof, using Fig. 7. For any trace $\pi$ of $P$ in the interleaving semantics, we obtain a corresponding trace $\hat{\pi}$ in the L-DRF semantics by taking the same interleaving of instructions from the threads. Our inductive hypothesis is that every length-$N$ standard interleaving execution has a corresponding length-$N$ L-DRF execution. We then consider a length-$(N+1)$ execution $\pi$ in the standard interleaving semantics, and show that there exists a state $\sigma_{N+1}$ with which we can extend the length-$N$ L-DRF trace to a length-$(N+1)$ trace that is $\chi$-equivalent to $\pi$.

We prove the result using induction on the length of the execution. Let $P(N)$ denote the following hypothesis: for any trace

$$\pi = \langle pc_0, \mu_0, \phi_0 \rangle \Rightarrow^S_{t_1} \ldots \Rightarrow^S_{t_N} \langle pc_N, \mu_N, \phi_N \rangle$$

of program $P$ in the standard semantics, there exists a trace

$$\hat{\pi} = \langle pc_0, \mu_0, \Theta_0, \Lambda_0 \rangle \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_N} \langle pc_N, \mu_N, \Theta_N, \Lambda_N \rangle$$

in the L-DRF semantics such that $\chi(\hat{\pi}) = \pi$. We outline the inductive arguments.

Fig. 7

The inductive proof obligation for Completeness. If we hypothesize that every length-$n$ trace $\pi$ of program $P$ in the standard semantics has an equivalent trace $\hat{\pi}$ in the L-DRF semantics, and if we can extend the trace $\pi$ by a single step to reach state $s_{n+1}$, then there exists a state $\sigma_{n+1}$, with $\chi(\sigma_{n+1}) = s_{n+1}$, by which we can extend the trace $\hat{\pi}$ by a single step as well.

Base Case.

For $N = 0$, the execution $\pi$ contains the single state $s_{ent}$. The length-0 L-DRF execution contains the single state $\sigma_{ent}$. Since $\chi(\sigma_{ent}) = s_{ent}$, $P(0)$ holds.

Inductive Case.

Assume that $P(k)$ holds for all executions of length $k$, where $0 \le k \le n$. We prove that $P(n + 1)$ holds. Consider a length-$(n+1)$ execution

$$\pi = \langle pc_0, \mu_0, \phi_0 \rangle \Rightarrow^S_{t_1} \ldots \Rightarrow^S_{t_{n+1}} \langle pc_{n+1}, \mu_{n+1}, \phi_{n+1} \rangle$$

of program $P$ in the interleaving semantics. Let the instruction corresponding to the last transition in $\pi$ be $\langle l, c, l' \rangle$. We denote by $\pi[1 \ldots n]$ the length-$n$ prefix of $\pi$. By the induction hypothesis, there exists a trace

$$\hat{\pi}' = \langle pc_0, \mu_0, \Theta_0, \Lambda_0 \rangle \Rightarrow^L_{t_1} \ldots \Rightarrow^L_{t_n} \langle pc_n, \mu_n, \Theta_n, \Lambda_n \rangle$$

of length $n$ in the L-DRF semantics, such that $\pi[1 \ldots n] = \chi(\hat{\pi}')$. Note that this implies that

$$\chi(\sigma_n) = s_n \qquad (1)$$

where $\sigma_n = \langle pc_n, \mu_n, \Theta_n, \Lambda_n \rangle$ and $s_n = \langle pc_n, \mu_n, \phi_n \rangle$.

We show that there exists a state $\sigma_{n+1} = \langle pc_{n+1}, \mu_{n+1}, \Theta_{n+1}, \Lambda_{n+1} \rangle$ in the L-DRF semantics, such that $\chi(\sigma_{n+1}) = s_{n+1}$ and $\sigma_n \Rightarrow^L_{t_{n+1}} \sigma_{n+1}$ via the same instruction $\langle l, c, l' \rangle$ used in the transition $s_n \Rightarrow^S_{t_{n+1}} s_{n+1}$. Let $\hat{\pi}$ be the resulting L-DRF execution $\sigma_0 \Rightarrow^L_{t_1} \cdots \Rightarrow^L_{t_n} \sigma_n \Rightarrow^L_{t_{n+1}} \sigma_{n+1}$. Then this would prove that $\hat{\pi}$ satisfies the property $\chi(\hat{\pi}) = \pi$. We show this proof obligation diagrammatically in Fig. 7.

We note that since $\langle l, c, l' \rangle$ is the last instruction in $\pi$, we must have $pc_n(t_{n+1}) = l$ and $pc_{n+1}(t_{n+1}) = l'$. Note that, by construction, the $pc$ and $\mu$ components of $\sigma_{n+1}$ and $s_{n+1}$ are made equal. Thus the components $pc_{n+1}$ and $\mu_{n+1}$ of $\sigma_{n+1}$ are already fixed, and it remains to define $\Theta_{n+1}$ and $\Lambda_{n+1}$ appropriately. We now case split on the command $c$.
– $c = acquire(m)$: We define the components of state $\sigma_{n+1}$ as follows: $\Theta_{n+1} = \Theta_n[t_{n+1} \mapsto updEnv(\Theta_n(t_{n+1}), \Lambda^m_n)]$ and $\Lambda_{n+1} = \Lambda_n$, where $\Lambda^m_n$ is the set of versioned environments in $\Lambda_n$ relevant to $m$.
Since the lock maps in both $\sigma_n$ and $s_n$ are the same, the lock acquisition succeeds from $\sigma_n$ as well. By the L-DRF semantics of acquire, $\sigma_n \Rightarrow^L_{t_{n+1}} \sigma_{n+1}$. Since the acquire does not change the maximum version, and the corresponding value, of each $x \in \mathcal{V}$ between $\sigma_n$ and $\sigma_{n+1}$, we have $\chi(\sigma_{n+1}) = s_{n+1}$. Thus $P(n + 1)$ holds in this case.
– $c = release(m)$: We define the components of state $\sigma_{n+1}$ as follows: $\Theta_{n+1} = \Theta_n$ and $\Lambda_{n+1} = \Lambda_n[l' \mapsto \Theta_n(t_{n+1})]$. Once again, the lock release must succeed from $\sigma_n$ as well. By the L-DRF semantics of release, $\sigma_n \Rightarrow^L_{t_{n+1}} \sigma_{n+1}$. Since the release does not change the maximum version, and the corresponding value, of each $x \in \mathcal{V}$ between $\sigma_n$ and $\sigma_{n+1}$, we have $\chi(\sigma_{n+1}) = s_{n+1}$. Thus $P(n + 1)$ holds in this case as well.
– $c = assume(b)$: We define the components of state $\sigma_{n+1}$ as follows: $\Theta_{n+1} = \Theta_n$ and $\Lambda_{n+1} = \Lambda_n$. Consider an arbitrary variable $x$ that is read in the condition $b$. By Corollary 1, in $\sigma_n$, the version of $x$ is highest in $\Theta_n(t_{n+1})$. Given that by the induction hypothesis $\chi(\sigma_n) = s_n$, this implies that for any such variable $x$, $\phi_n(x) = \Theta_n(t_{n+1}).\phi(x)$. Hence, it follows that $[\![b]\!]\phi_n = [\![b]\!]^L(\Theta_n(t_{n+1}))$. Since, by assumption, $s_n \Rightarrow^S_{t_{n+1}} s_{n+1}$, it follows that $\sigma_n \Rightarrow^L_{t_{n+1}} \sigma_{n+1}$. Since the assume does not alter the maximum version, and the corresponding value, of each $x \in \mathcal{V}$ between $\sigma_n$ and $\sigma_{n+1}$, we have $\chi(\sigma_{n+1}) = s_{n+1}$. Thus $P(n + 1)$ holds in this case as well.
– $c = x := e$: We define the components of state $\sigma_{n+1}$ as follows: $\Theta_{n+1} = \Theta_n[t_{n+1} \mapsto \langle \phi', \nu' \rangle]$ and $\Lambda_{n+1} = \Lambda_n$, where $\phi'$ and $\nu'$ are defined as follows. Let $\Theta_n(t_{n+1})$ be $\langle \phi, \nu \rangle$. Then $\phi' = \phi[x \mapsto \phi_{n+1}(x)]$ and $\nu' = \nu[x \mapsto \nu(x) + 1]$. Consider an arbitrary variable $y$ that is read in the expression $e$.
By Corollary 1, in $\sigma_n$, the version of $y$ is highest in $\Theta_n(t_{n+1})$. This implies that for any such variable $y \in \mathcal{V}$,

$$\phi_n(y) = \phi(y) \;\Longrightarrow\; \llbracket e \rrbracket \phi_n = \llbracket e \rrbracket_{L}(\Theta_n(t_{n+1})) \;\Longrightarrow\; \phi_{n+1}(x) = (\Theta_{n+1}(t_{n+1}).\phi)(x).$$

Coupled with the definition of $\nu'$, this proves that $\langle \phi', \nu' \rangle = \llbracket x := e \rrbracket_{L}\, \Theta_n(t_{n+1})$, which allows us to conclude that $\sigma_n \Rightarrow^{L}_{t_{n+1}} \sigma_{n+1}$. By Lemma 3 and the construction of $\sigma_{n+1}$, the version of $x$ is highest in $\Theta_{n+1}(t_{n+1})$ among all component versioned environments of $\sigma_{n+1}$. This, coupled with the fact that no other versions are modified, lets us conclude that $\chi(\sigma_{n+1}) = s_{n+1}$. Consequently, $P(n+1)$ holds here as well.

This completes the induction argument, and hence the lemma. $\square$

Fig. 8

The inductive proof obligation for Soundness. If we hypothesize that every $n$-length execution $\hat{\pi}$ of program $P$ in the L-DRF semantics has an equivalent execution $\pi$ in the standard semantics, and if we can extend the execution $\hat{\pi}$ by a single step to reach state $\sigma_{n+1}$, then there exists a state $s_{n+1}$, with $\chi(\sigma_{n+1}) = s_{n+1}$, by which we can extend the execution $\pi$ by a single step as well.

Proof (Soundness, Theorem 2)

We outline the proof idea using Fig. 8. Here the situation is the inverse of that in Fig. 7. Given any execution $\hat{\pi}$ in the L-DRF semantics of $P$, we show that the sequence of states induced by the $\chi$-map is a valid execution of $P$ in the interleaving semantics. Our induction hypothesis is on the length $n$ of the L-DRF execution. When we consider an L-DRF execution $\hat{\pi}'$ of length $n+1$, we know there exists an execution $\pi$ in the interleaving semantics corresponding to the $n$-length prefix of $\hat{\pi}'$. We show that we can extend $\pi$ by using $\chi(\sigma_{n+1})$ in order to obtain an execution of length $n+1$ in the interleaving semantics, which is $\chi$-related to $\hat{\pi}'$.

Consider an execution

$$\hat{\pi} = \sigma_{ent} \Rightarrow^{L}_{t_1} \cdots \Rightarrow^{L}_{t_N} \sigma_N$$

in the L-DRF semantics of program $P$. We define a sequence of states of $P$ in the standard semantics

$$\pi = s_{ent} \Rightarrow^{S}_{t_1} \cdots \Rightarrow^{S}_{t_N} s_N,$$

where for each $i$ with $0 \le i \le N$, $s_i = \chi(\sigma_i)$, and claim this to be a valid execution of $P$ in the standard semantics. For each $i$, let $\sigma_i = \langle pc_i, \mu_i, \Theta_i, \Lambda_i \rangle$ and $s_i = \langle pc_i, \mu_i, \phi_i \rangle$. We prove the claim by induction on the length $N$ of the execution $\hat{\pi}$.

Base Case. If $N = 0$, the execution $\hat{\pi}$ contains the single state $\sigma_0 = \sigma_{ent}$. Since $\chi(\sigma_0) = s_0 = s_{ent}$, we have that $\pi$ is a valid length-0 execution of $P$ in the standard semantics.

Inductive Case.

Assume that the claim holds for all L-DRF executions of length $n$. Let $N = n + 1$. If $\hat{\pi}[1 \ldots n]$ denotes the $n$-length prefix of the execution $\hat{\pi}$, then by the induction hypothesis,

$$s_0 \Rightarrow^{S}_{t_1} \cdots \Rightarrow^{S}_{t_n} s_n$$

is a valid execution of $P$ in the interleaving semantics. We show that $s_n \Rightarrow^{S}_{t_{n+1}} s_{n+1}$, where $s_{n+1} = \chi(\sigma_{n+1})$, using the same instruction as in the corresponding transition of $\hat{\pi}$. We show the proof obligation diagrammatically in Fig. 8.

We case split on $cmd(\tau_{n+1})$, where $\tau_{n+1}$ is the last transition in $\hat{\pi}$.

If $cmd(\tau_{n+1})$ is either an acquire or a release, then since the location maps and lock maps are identical in both $s_n$ and $\sigma_n$, the lock acquisition (or release) is enabled from $s_n$. Moreover, since neither of the commands alters the versions between $\sigma_n$ and $\sigma_{n+1}$, we have $\phi_n = \phi_{n+1}$. Thus, $s_n \Rightarrow^{S}_{t_{n+1}} s_{n+1}$, and the claim holds in this case.

If $cmd(\tau_{n+1})$ is $\mathtt{assume}(b)$, then, by Corollary 1, the version of any variable $x$ read in the condition $b$ is highest in $\Theta_n(t_{n+1})$. Moreover, since $\chi(\sigma_n) = s_n$, for any variable $x$ accessed in the condition $b$, we must have $\phi_n(x) = (\Theta_n(t_{n+1}).\phi)(x)$. This implies that $\llbracket b \rrbracket \phi_n = \llbracket b \rrbracket_{L}(\Theta_n(t_{n+1}))$. Thus, $s_n \Rightarrow^{S}_{t_{n+1}} s_{n+1}$ and the claim holds in this case too.

Finally, we consider the case when $cmd(\tau_{n+1})$ is an assignment statement of the form $x := e$. In a manner analogous to the case of the assume earlier, we can prove that $\llbracket e \rrbracket \phi_n = \llbracket e \rrbracket_{L}(\Theta_n(t_{n+1}))$. By Lemma 3, the version of $x$ in $\sigma_{n+1}$ is highest in $\Theta_{n+1}(t_{n+1})$. Thus, $\phi_{n+1}(x) = (\Theta_{n+1}(t_{n+1}).\phi)(x)$. Since the assignment command is always enabled, and the above facts hold, we obtain that $s_n \Rightarrow^{S}_{t_{n+1}} s_{n+1}$ is a valid transition, and we are done.

This completes the proof of the claim, and hence the theorem follows.
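The role that versions play in both proofs can be illustrated with a small executable sketch. It assumes, in line with how $\chi$ is used above, that $\chi$ assigns to each variable the value stored in a component versioned environment holding that variable's highest version; the class `VersionedEnv`, the helper names, the thread names and the buffer key are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedEnv:
    """A versioned environment <phi, nu>."""
    phi: dict  # variable -> value
    nu: dict   # variable -> version


def assign(ve, x, value):
    """Assignment case of the proof: phi' = phi[x -> value], nu' = nu[x -> nu(x) + 1]."""
    return VersionedEnv({**ve.phi, x: value}, {**ve.nu, x: ve.nu[x] + 1})


def chi(theta, buffers):
    """Assumed reading of chi: each variable takes the value stored in a
    component versioned environment that holds its highest version."""
    components = list(theta.values()) + list(buffers.values())
    variables = components[0].phi.keys()
    return {x: max(components, key=lambda ve: ve.nu[x]).phi[x] for x in variables}


# Hypothetical two-thread state; both threads start from the all-zero environment.
initial = VersionedEnv({"x": 0, "y": 0}, {"x": 0, "y": 0})
theta = {"t1": initial, "t2": initial}

# Thread t1 executes x := 5 on its local copy only.
theta["t1"] = assign(theta["t1"], "x", 5)
after_assign = chi(theta, {})

# A release by t1 copies its environment into a buffer (keyed here by a
# made-up post-release location); the maximum versions are unchanged.
buffers = {"post_release_t1": theta["t1"]}
after_release = chi(theta, buffers)
```

In this sketch the local copy of thread `t2` still holds the stale value of `x`, while `chi` recovers the standard environment from the highest versions, which is the invariant the case analysis maintains at every step.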
$\square$

An important corollary of the proofs of these theorems is that the L-DRF semantics is both sound and precise (vis-a-vis the standard semantics) in a relational sense, provided we restrict our attention to variables owned by a thread at a program point. For environments $\phi$ and $\phi'$ and a subset of variables $V$ of $\mathcal{V}$, we use the notation $\phi =_V \phi'$ to mean that $\phi$ and $\phi'$ agree on the values of variables in $V$; i.e. for all $x \in V$ we have $\phi(x) = \phi'(x)$.

Corollary 2

Let $P$ be a race-free program as above. Consider a thread $t \in \mathcal{T}$ and a point $n \in \mathcal{L}_t$. Let $V \subseteq \mathcal{V}$ be the set of variables owned by $t$ at $n$. Then
1. If $\langle pc, \mu, \phi \rangle$ is a reachable state in the standard interleaving semantics of $P$, with $pc(t) = n$, then there exists a reachable state in the L-DRF semantics of the form $\langle pc, \mu, \Theta, \Lambda \rangle$, with $\phi_t =_V \phi$, where $\Theta(t) = \langle \phi_t, \nu_t \rangle$.
2. Conversely, if $\langle pc, \mu, \Theta, \Lambda \rangle$ is a reachable state in the L-DRF semantics of $P$, with $pc(t) = n$, then there exists a reachable state in the standard semantics of the form $\langle pc, \mu, \phi \rangle$, with $\phi_t =_V \phi$, where $\Theta(t) = \langle \phi_t, \nu_t \rangle$.

Proof We prove the two parts separately.
1. Since $s = \langle pc, \mu, \phi \rangle$ is a reachable state in the interleaving semantics, there is an execution $\pi$ in the standard semantics that ends at $s$. By the completeness proof, there exists an execution $\hat{\pi}$ of the L-DRF semantics ending in a state $\sigma$, with $\pi = \chi(\hat{\pi})$. It follows that $\sigma$ must be of the form $\langle pc, \mu, \Theta, \Lambda \rangle$ with $s = \chi(\sigma)$. Further, it follows from Corollary 1 that for each variable $x \in V$, the version of $x$ must be highest in $\Theta(t)$. It now follows that $\phi =_V \phi_t$.
2. If $\sigma = \langle pc, \mu, \Theta, \Lambda \rangle$ is a state in $Reach^{L}(P)$, then there must exist an execution $\hat{\pi} = \sigma_{ent} \Rightarrow^{L} \cdots \Rightarrow^{L} \sigma$ of $P$ in the L-DRF semantics. By Theorem 2, there exists an execution $s_{ent} \Rightarrow^{S} \cdots \Rightarrow^{S} s$ of $P$ in the standard semantics, with $\chi(\sigma) = s$. Thus $s$ is of the form $\langle pc, \mu, \phi \rangle$ for some environment $\phi$. Once again, it follows from Corollary 1 that the version of each $x \in V$ is highest in $\Theta(t)$, among all component versioned environments in $\sigma$. By the construction of the function $\chi$, it follows that for each variable $x \in V$, $\phi(x) = \phi_t(x)$. $\square$

Remark 1

Until now we assumed that buffers associated with every post-release point in $\mathcal{L}^{rel}_{m}$ are “relevant” to each pre-acquire point in $\mathcal{L}^{acq}_{m}$. That is, for a post-release point $n$, if we take $G(n)$ to be the set of pre-acquire points for which $n$ is relevant, then so far we have assumed that $G(n) = \mathcal{L}^{acq}_{m}$. However, if no (standard) execution of the program $P$ contains a transition $\tau_i$ (with the target location being $n$) which synchronizes-with a transition $\tau_j$ (with source location $n' \in \mathcal{L}^{acq}_{m}$), then Theorem 1 (as well as Theorem 2) holds even if we remove $n'$ from $G(n)$. This is true because in race-free programs, conflicting accesses are ordered by the happens-before relation. Thus, if the most up-to-date value of a variable accessed by $t$ was written by another thread $t'$, then in between these accesses there must be a (sequence of) synchronization operations starting at a lock released by $t'$ and ending at a lock acquired by $t$. This refinement of the set $G$ can be used to improve the precision of the analyses derived from L-DRF, as it reduces the set of possible release points an acquire can observe.

L-DRF

In this section we introduce and illustrate a few static program analyses which are based on the sync-CFG representation of a program and are, in turn, derived from the L-DRF semantics. We also reason about the correctness of such analyses using the notion of consistent abstractions. We begin by adapting the standard notion of abstract interpretation [9] to our setting, and recalling the theory of consistent abstractions.

5.1 Abstract Interpretation of programs

Let us fix a program $P = (\mathcal{V}, \mathcal{M}, \mathcal{T})$ for the rest of this section.

An abstract interpretation (or data-flow analysis) of $P$ is a structure of the form $A = (D, \le, d_0, F)$ where
– $D$ is the set of abstract states and $\le$ represents a partial ordering over $D$.
– $(D, \le)$ forms a complete lattice. We denote the join (least upper bound) in this lattice by $\sqcup_{\le}$, or simply $\sqcup$ when the ordering is clear from the context.
– $d_0 \in D$ is the initial abstract state.
– $F : inst_P \to (D \to D)$ associates a transfer function $F(\iota)$ with each instruction $\iota$ of $P$. In what follows, we will write $F_\iota$ instead of $F(\iota)$ for ease of presentation.

We require each transfer function $F_\iota$ to be monotonic, in that whenever $d \le d'$ we have $F_\iota(d) \le F_\iota(d')$.

An abstract interpretation $A = (D, \le, d_0, F)$ of $P$ induces a “global” transfer function $\mathcal{F} : D \to D$, given by

$$\mathcal{F}(d) = d_0 \sqcup \bigsqcup_{\iota \in inst_P} F_\iota(d).$$

This transfer function can also be seen to be monotonic. By the Knaster-Tarski theorem [33], $\mathcal{F}$ has a least fixed point (LFP) in $D$, and we define this to be the “semantics” or “meaning” associated to $P$ by the interpretation $A$, and denote it as $\llbracket P \rrbracket_A$. Formally,

$$\llbracket P \rrbracket_A \stackrel{\text{def}}{=} LFP(\mathcal{F}).$$

Given two analyses $C = (D, \le, d_0, F)$ and $A = (D', \le', d_0', F')$ for $P$, we say $A$ is a consistent abstraction of $C$ if there exist functions $\alpha : D \to D'$ (called the abstraction function), and $\gamma : D' \to D$ (called the concretization function), such that:
1. $\alpha$ and $\gamma$ form a Galois connection, which entails the following:
(a) $\alpha$ and $\gamma$ are monotonic
(b) $\alpha$ and $\gamma$ satisfy the following conditions
– $\forall d \in D : \gamma(\alpha(d)) \ge d$
– $\forall d' \in D' : \alpha(\gamma(d')) = d'$
2. $\alpha(\llbracket P \rrbracket_C) \le' \llbracket P \rrbracket_A$ (or, equivalently, $\llbracket P \rrbracket_C \le \gamma(\llbracket P \rrbracket_A)$).

A sufficient condition for consistent abstraction, that can be checked “locally” for each instruction, was proposed in [9]:

Theorem 3 ([9])

Let $C = (D, \le, d_0, F)$ and $A = (D', \le', d_0', F')$ be analyses for $P$. A sufficient condition for $A$ to be a consistent abstraction of $C$ is that there exist maps $\alpha : D \to D'$, and $\gamma : D' \to D$, which satisfy:
1. $\alpha$ and $\gamma$ form a Galois connection,
2. for each $\iota \in inst_P$, $F'_\iota$ safely approximates $F_\iota$, in that $\forall d \in D : \alpha(F_\iota(d)) \le' F'_\iota(\alpha(d))$,
3. and $\alpha(d_0) \le' d_0'$. $\square$

The standard interleaving semantics induces a collecting analysis for $P$,

$$A^{S} = (\mathcal{P}(S), \subseteq, \{s_{ent}\}, F^{S}),$$

where, for any instruction $\iota \in inst_P$, with $tid(\iota) = t$ say, and for any subset $X \subseteq S$,

$$F^{S}_{\iota}(X) = \{ s' \mid \exists s \in X \text{ with } s \Rightarrow^{S}_{t} s' \}.$$

It turns out that the LFP of this analysis is exactly the reachable set of states in the transition system $L^{S}_{P}$: $\llbracket P \rrbracket_{A^{S}} = Reach(L^{S}_{P})$.

In a similar way, the L-DRF semantics of Sec. 4 induces a collecting analysis $A^{L}$ given by

$$A^{L} = (\mathcal{P}(\Sigma), \subseteq, \{\sigma_{ent}\}, F^{L}),$$

where, for any instruction $\iota \in inst_P$, with $tid(\iota) = t$ say, and for any subset $X \subseteq \Sigma$,

$$F^{L}_{\iota}(X) = \{ \sigma' \mid \exists \sigma \in X \text{ with } \sigma \Rightarrow^{L}_{t} \sigma' \}.$$

Once again, the LFP of this analysis can be seen to coincide with the reachable set of states in the transition system $L^{L}_{P}$ of Sec. 4 for the L-DRF semantics: $\llbracket P \rrbracket_{A^{L}} = Reach(L^{L}_{P})$.

The sync-CFG of a program $P$ comprises the control flow graphs of each static thread code, augmented with synchronizes-with edges between synchronization operations (like releases and acquires of the same lock). Each thread operates on local copies of the data states, and communication between the threads is limited to synchronization points alone. Such an analysis was first introduced in [11], while analyses similar in spirit have been proposed in the literature (for example the thread-modular shape analysis of [16]).

A sync-CFG differs from the standard “product-graph” representation of concurrent programs in two important ways:
1. The sync-CFG contains nodes corresponding to each control location in the concurrent program $P$. In contrast, the product graph contains nodes corresponding to every possible combination of control locations in $P$.

2. Each execution of $P$ corresponds to some path in its product graph representation. A sync-CFG does not maintain such a property in general. On the other hand, a key property maintained by the sync-CFG is that for each execution of $P$, every happens-before path induced by the execution corresponds to some path in the sync-CFG.

As an example, consider again the program in Fig. 1. The sync-CFG representation of the program is given on the left in Fig. 9 (also shown in the center of Fig. 2). On the other hand, an excerpt of the far larger product-graph of this program is shown on the right of the same figure. As one may expect, any analysis based on the product graph would be intractable for large programs.
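The size gap between the two representations can be made concrete with a rough sketch. The location numbering, the lock name and the choice of connecting every post-release point to every pre-acquire point of the same lock are assumptions made for illustration only; they are not the paper's construction.

```python
from itertools import product

# Hypothetical control locations of a two-thread program (names are made up).
thread_locations = {
    "t1": [1, 2, 3, 4, 5, 6],
    "t2": [7, 8, 9, 10, 11],
}
# Assumed synchronization structure: per lock, post-release and pre-acquire points.
post_release = {"l": [6, 11]}
pre_acquire = {"l": [1, 7]}


def sync_cfg_nodes(locations):
    """One node per control location of each thread."""
    return [loc for locs in locations.values() for loc in locs]


def product_graph_nodes(locations):
    """One node per combination of control locations, one location per thread."""
    return list(product(*locations.values()))


def sync_edges(releases, acquires):
    """Synchronizes-with edges between release and acquire points of the same lock."""
    return [(r, a) for lock in releases for r in releases[lock] for a in acquires.get(lock, [])]


sync_nodes = sync_cfg_nodes(thread_locations)
product_nodes = product_graph_nodes(thread_locations)
edges = sync_edges(post_release, pre_acquire)
```

The number of sync-CFG nodes is the sum of the per-thread location counts, whereas the number of product-graph nodes is their product, so the latter grows exponentially with the number of threads.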

Fig. 9 The sync-CFG representation of the program of Fig. 1 is presented on the left. On the right is an excerpt of the standard product graph representation of the same program.

More precisely, we say an abstract interpretation $A$ of a program $P$ is a sync-CFG based analysis if:
1. The domain of abstract states of $A$ is of the form $\mathcal{L}_P \to D'$. Thus the domain associates an abstract fact from $D'$ with each location in $P$.
2. The transfer function for each instruction $\iota = (n, c, n')$ depends only on the abstract fact at $n$ for commands other than acquire(), while for acquire() commands the transfer function depends on the abstract facts at $n$ and associated release() points.

The soundness of the facts computed by a sync-CFG based analysis needs to be qualified. The abstract fact computed by the analysis at each program point may not be an over-approximation of the set of concrete (interleaving) states arising at that point. However, the facts are sound as long as they are interpreted in the window of variables owned by the thread at that point (cf. Sec. 3.4). This property of soundness of sync-CFG analyses was hitherto proved by a direct and somewhat involved argument that the LFP of the analysis will over-approximate the owned portion of the concrete state along an execution [11, 16]. In particular, it appears difficult to argue soundness by showing that the analysis is a consistent abstraction of the standard interleaving semantics.

Instead, we give a way of arguing soundness of sync-CFG-based analyses by showing them to be consistent abstractions of the L-DRF semantics. In this sense, the L-DRF semantics is a kind of canonical or reference analysis for sync-CFG based analyses. We elaborate on this in Sec. 5.6. Before that, however, we outline several sync-CFG based analyses, as examples, which can be derived from the L-DRF semantics.

Thread t1() {
1: acquire(l);
2: x := y;
3: x++;
4: y++;
5: release(l);
6: }

Thread t2() {
7: acquire(l);
8: x++;
9: y++;
10: release(l);
11: }

Fig. 10

A simple race-free program on which we illustrate the analyses VRel, Rel and ValSet. All the variables are shared.

L-DRF

We introduce and illustrate some sync-CFG analyses that are derived from the L-DRF semantics. We call these analyses (in decreasing order of precision) VRel (for “Versioned Relational”), Rel (for “Relational”) and ValSet (for “Value Set” [11]). We will use the race free program in Fig. 10 as an example to illustrate these analyses.
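Each of these analyses is an instance of the abstract-interpretation scheme of Sec. 5.1, so its meaning is the least fixed point of a global transfer function built from the initial state and the per-instruction transfer functions. A minimal sketch of that computation by Kleene iteration follows; it assumes a lattice without infinite ascending chains (so that the loop terminates), and the toy transition rules are invented purely to exercise the code.

```python
def lfp(d0, transfer_functions, join, bottom):
    """Kleene iteration for the global function d -> d0 join (join of all F_i(d)),
    started at bottom. Termination relies on the absence of infinite ascending chains."""
    d = bottom
    while True:
        nxt = d0
        for f in transfer_functions:
            nxt = join(nxt, f(d))
        if nxt == d:
            return d
        d = nxt


# Toy collecting analysis over the powerset of {0, 1, 2, 3}; the two
# "instructions" below are hypothetical.
def step_inc(states):
    return frozenset(s + 1 for s in states if s < 3)


def step_reset(states):
    return frozenset(0 for s in states if s == 3)


reachable = lfp(frozenset({0}), [step_inc, step_reset], frozenset.union, frozenset())
```

For a collecting analysis over sets of states, as in this toy instance, the fixed point is exactly the set of states reachable from the initial one.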

The VRel analysis keeps track of sets of versioned environments at each program point. The abstract states are functions mapping program locations to sets of environments, ordered by point-wise inclusion. We call these states cartesian, since they now lose the correlation between thread locations in the program counter. We define $VRel = (\mathcal{L} \to \mathcal{P}(VE), \preceq, d_{VRel}, F^{VRel})$, where
– $f \preceq g$ iff for each $n \in \mathcal{L}$ we have $f(n) \subseteq g(n)$.
– The initial abstract state is

$$d_{VRel} = \lambda n. \begin{cases} \{ve_{ent}\} & \text{if } n \in ent_P \\ \emptyset & \text{otherwise.} \end{cases}$$

Here $ve_{ent}$ is the versioned environment $\langle \lambda x.0, \lambda x.0 \rangle$.
– The transfer function $F^{VRel}_{\iota}$, for an instruction $\iota = (n, c, n')$ of $P$ is given by $F^{VRel}_{\iota} = \lambda f. (f \sqcup_{\preceq} f')$, where $f'$ is defined based on the command $c$ as follows. If $c$ is an assignment command $x := e$,

$$f'(l) = \begin{cases} \llbracket x := e \rrbracket_{L}(f(n)) & \text{if } l = n' \\ \emptyset & \text{otherwise.} \end{cases}$$

By $\llbracket c \rrbracket_{L}(f(n))$ we mean the application of the semantics of the command $c$, $\llbracket c \rrbracket_{L}$, pointwise on the set of versioned environments $f(n)$. The case when $c$ is an $\mathtt{assume}(b)$ command is handled similarly.

When $c$ is an $\mathtt{acquire}(m)$ command, we define

$$f'(l) = \begin{cases} \bigcup_{ve \in f(n)} UpdEnv(ve, X) & \text{if } l = n' \\ \emptyset & \text{otherwise,} \end{cases}$$

where $X = \bigcup_{\bar{n} \in \mathcal{L}^{rel}_{m}} f(\bar{n})$.

Interestingly, the effect of release commands in the cartesian semantics is the same as skip. This is because the abstraction neither tracks ownership of locks nor explicitly manipulates the contents of buffers. Thus when $c$ is a release command, we define

$$f'(l) = \begin{cases} f(n) & \text{if } l = n' \\ \emptyset & \text{otherwise.} \end{cases}$$

Remark 2

We note here that we have chosen to define the transfer function in the form $F_\iota = \lambda d. (d \sqcup d')$ instead of simply $F_\iota = \lambda d. d'$. This is because (a) it is easy to see that the LFP of the analyses coincide in both forms, and (b) the former form will be convenient for showing the sufficient conditions for consistent abstraction in Sec. 6.

Fig. 12 shows a sequence of instructions from the program in Fig. 10, along with the abstract states obtained by running the VRel analysis along this path. This is shown in the column marked VRel. We show only the state at the relevant locations of the active thread along the execution. The leftmost column shows the L-DRF states along the execution. Each L-DRF state shown has four rows corresponding to the location counter, the local state of thread $t_1$, the local state of thread $t_2$, and finally the contents of the release buffers. We ignore the lock maps here. It is instructive to see how the VRel analysis over-approximates the L-DRF analysis at each step along the execution path. The abstraction map here maps a set of L-DRF states $X$ to a set of versioned environments $Y_n$ at point $n$ in a thread $t$, which contains the thread-local versioned environments of $t$ in the states of $X$ where thread $t$ is at point $n$. Finally, Fig. 13 shows the fixed point solutions of the three analyses we consider here, for the program of Fig. 10. The leftmost columns on the two sides of the program show the values for the VRel analysis, with version tags abstracted away.

We now define the Rel analysis, which abstracts the VRel analysis by abstracting away the version numbers. This is a more practicable analysis, and is one of the analyses we focus on subsequently in our experiments.

We define $Rel = (A_{\times}, \sqsubseteq_{\times}, a^{ent}_{\times}, F^{\times})$, where
– The set of abstract states is $\mathcal{L} \to \mathcal{P}(Env)$, which we call $A_{\times}$, and we range over it using the meta-variable $a_{\times}$.
– We have $a_{\times} \sqsubseteq_{\times} a'_{\times}$ iff $\forall n \in \mathcal{L}$ we have $a_{\times}(n) \subseteq a'_{\times}(n)$.
– The initial abstract state is

$$a^{ent}_{\times} = \lambda n. \begin{cases} \{\lambda x.0\} & \text{if } n \in ent_P \\ \emptyset & \text{otherwise.} \end{cases}$$

The initial state thus maps the entry location of every thread to the set containing the single environment, where all the variables are initialized to 0. Every other program location is mapped to the empty set.
– The transfer function $F^{\times}_{\iota}$, for an instruction $\iota = (n, c, n')$ of $P$ is given as follows. We define $F^{\times}_{\iota} = \lambda a_{\times}. (a_{\times} \sqcup_{\times} a'_{\times})$, where $a'_{\times}$ is defined as follows.

When $c$ is an assignment command $x := e$, we define

$$a'_{\times}(l) = \begin{cases} \llbracket x := e \rrbracket_{S}(a_{\times}(n)) & \text{if } l = n' \\ \emptyset & \text{otherwise.} \end{cases}$$

Here $\llbracket c \rrbracket_{S}$ is the interpretation of the command $c$ according to the standard semantics, assumed to apply pointwise on a set of environments. The case of an assume command is defined similarly.

When $c$ is a release command, we have

$$a'_{\times}(l) = \begin{cases} a_{\times}(n) & \text{if } l = n' \\ \emptyset & \text{otherwise.} \end{cases}$$

More directly, $F^{\times}_{\iota} = \lambda a_{\times}. a_{\times}[n' \mapsto (a_{\times}(n') \cup a_{\times}(n))]$.

When $c$ is an $\mathtt{acquire}(m)$ command, we define

$$a'_{\times}(l) = \begin{cases} E_{mix} & \text{if } l = n' \\ \emptyset & \text{otherwise,} \end{cases}$$

where $E_{mix} = mix\big(a_{\times}(n) \cup \bigcup \{ a_{\times}(\bar{n}) \mid \bar{n} \in \mathcal{L}^{rel}_{m} \wedge n \in G(\bar{n}) \}\big)$, and

$$mix : \mathcal{P}(Env) \to \mathcal{P}(Env) \equiv \lambda B_{\times}. \{ \phi' \mid \forall x \in \mathcal{V}, \exists \phi \in B_{\times} : \phi'(x) = \phi(x) \}.$$

In other words, mix returns a cartesian product of the input states. Note that as a result of abstracting away the version numbers, a thread cannot determine the most up-to-date value of a variable, and thus conservatively picks any possible value found either in its own local environment or in a relevant release buffer. Fig. 11 illustrates the operation of the mix function on two arbitrary input environments.
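The definition of mix given above can be tried out directly. The following is a small sketch that follows the set-comprehension literally for a non-empty set of variables; the representation of environments as frozensets of (variable, value) pairs and the helper `env` are choices made here for illustration, not part of the paper.

```python
from itertools import product


def env(**bindings):
    """Hashable environment: a frozenset of (variable, value) pairs."""
    return frozenset(bindings.items())


def mix(envs, variables):
    """mix(B) = { phi' | for every x there is phi in B with phi'(x) = phi(x) }.
    Assumes `variables` is non-empty."""
    if not envs:
        return set()
    # For each variable, the values it takes in some input environment.
    choices = [sorted({dict(e)[x] for e in envs}) for x in variables]
    # Every per-variable combination of those values is an output environment.
    return {frozenset(zip(variables, combo)) for combo in product(*choices)}


# Two input environments that both satisfy x = y.
inputs = {env(x=0, y=0), env(x=1, y=1)}
mixed = mix(inputs, ["x", "y"])
```

Here `mixed` contains the two inputs together with the environments where `x` and `y` differ, which shows how the invariant `x = y` is lost once values are chosen per variable.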

Fig. 11 Illustrating the mix on a set containing two environments $\phi_1$ and $\phi_2$. Observe that the invariant $x = y$ holds in the input environments. However, since this mix operates at the granularity of single variables, the correlation is lost in the output states.

We denote the LFP of the Rel analysis for program $P$ by $\llbracket P \rrbracket_{\times}$.

[Figure: an execution of the program of Fig. 10 annotated, step by step, with the L-DRF states and the corresponding abstract states in the columns VRel, Rel and ValSet.]

Fig. 12

The interpretation of VRel, Rel, and ValSet along an execution of the program of Fig. 10.
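The relationship between the Rel and ValSet columns can be illustrated at a single program location: the value-set fact is the variable-wise projection of the relational fact, and concretizing it yields every combination of the recorded values. The sketch below assumes this reading of the value-set abstraction and concretization maps that the text defines for the ValSet analysis; the function names and the dictionary representation are hypothetical.

```python
from itertools import product


def alpha_vs(envs, variables):
    """Abstraction at one location: each variable maps to the set of values it takes."""
    return {x: {e[x] for e in envs} for x in variables}


def gamma_vs(value_sets):
    """Concretization at one location: all environments built from the value sets."""
    variables = sorted(value_sets)
    return [dict(zip(variables, combo))
            for combo in product(*(sorted(value_sets[x]) for x in variables))]


# A relational fact in which x = y holds.
rel_fact = [{"x": 0, "y": 0}, {"x": 1, "y": 1}]
vs_fact = alpha_vs(rel_fact, ["x", "y"])
concretized = gamma_vs(vs_fact)
```

In this example the concretization contains the original environments plus the uncorrelated ones, and abstracting it again returns the same value sets.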

The ValSet analysis of [11] can be obtained as an abstraction of the Rel analysis. The abstract domain of the ValSet analysis is of the form $\mathcal{L} \to VS$, where $VS$ is the “value-set” domain which maps each program variable to a set of values, that is, $VS : \mathcal{V} \to \mathcal{P}(\mathbb{V})$ (with $\mathbb{V}$ the set of values). We define $ValSet = (\mathcal{L} \to VS, \sqsubseteq, s_{ValSet}, F^{ValSet})$ where
– $s \sqsubseteq s'$ iff $\forall n \in \mathcal{L}$ and $\forall x \in \mathcal{V}$ we have $s(n)(x) \subseteq s'(n)(x)$.

Fig. 13 The fixed point results of the VRel, Rel, and ValSet analyses on the program of Fig. 10. The set of variables owned at locations 1, 6, 7 and 11 is $\emptyset$, while at other points it is $\{x, y\}$. The facts are sound (even in a relational sense) when restricted to the variables owned at each point.

– The initial abstract state is

$$s_{ValSet} = \lambda n. \begin{cases} \lambda x.\{0\} & \text{if } n \in ent_P \\ \lambda x.\emptyset & \text{otherwise.} \end{cases}$$

– The transfer function $F^{ValSet}$ can be defined via the transfer function $F^{\times}$ of the Rel analysis. Let us define the value-set abstraction function $\alpha_{VS} : A_{\times} \to (\mathcal{L} \to VS)$ as

$$\alpha_{VS}(a_{\times}) = \lambda n. (\lambda x. \{ v \mid \exists \phi \in a_{\times}(n) : \phi(x) = v \}),$$

and the value-set concretization function $\gamma_{VS} : (\mathcal{L} \to VS) \to A_{\times}$ as

$$\gamma_{VS}(s) = \lambda n. \{ \phi \mid \forall x \in \mathcal{V} : \phi(x) \in s(n)(x) \}.$$

The transfer function of the ValSet analysis for an instruction $\iota$ can now be defined as $F^{ValSet}_{\iota}(s) = \alpha_{VS}(F^{\times}_{\iota}(\gamma_{VS}(s)))$.

In the ValSet analysis, the abstract mix operator reduces to the standard value-set join operation (which takes a component-wise union of the value-sets).

The abstract state of the ValSet analysis along the example execution is shown in the third column of Fig. 12, and the fixed point solution in the third column of Fig. 13.

As one can see from Fig. 13, the analysis VRel computes the most precise facts: it is able to establish the equality between $x$ and $y$ prior to the release() command in both the threads. The Rel analysis loses this correlation after the acquire() command in thread $t_2$. Lastly, the ValSet analysis fails to establish any useful relation between $x$ and $y$.

L-DRF

We can improve upon Rel in a practicable way by not forgetting the versions entirely. We augment $A_{\times}$ with “recency” information based on the versions as follows. For a set $C$ of states of the L-DRF semantics, define $recent(C)$ to be the set of threads $t \in \mathcal{T}$ such that there exists a state $\langle pc, \mu, \Theta, \Lambda \rangle \in C$, and $x \in \mathcal{V}$, such that $(\Theta(t).\nu)(x) \ge (\Theta(t').\nu)(x)$ for each $t' \in \mathcal{T}$. In other words, $recent(C)$ is the set of threads which contain the most up-to-date value of some variable $x$. This additional information can now be used to improve the precision of mix.

Fig. 14

A simple race-free program to demonstrate the benefit of using thread-identifiers in the abstract state. In the normal setting, the synchronizes-with edges create a cycle in the program, and it is not possible to derive an upper bound on the value of $x$. However, if we track thread-identifiers in the state, thread $t_1$ observes that any state it receives from $t_2$ is tagged with the set $\{t_1\}$, and thus $t_1$ can safely drop the data flow facts.

In the program shown in Fig. 14, thread $t_1$ writes to $x$ while holding the lock $m$, whereas thread $t_2$ reads from $x$ while holding $m$. In the usual sync-CFG setting, the synchronizes-with edges create a cycle in the program graph. Thus, the data flow facts propagate back and forth between the threads, and the analysis, in this example, fails to derive an upper bound for the value of $x$. In the recency based analysis, the data flow fact comprises elements from $A_{\times}$, as well as a set $S$ of thread-identifiers that overapproximates the recency information. Whenever a thread writes to a variable, it adds its identifier to $S$. Other commands do not affect $S$. In the example, $t_1$ adds its identifier to $S$, and this is propagated to $t_2$. However, since $t_2$ does not write to $x$, the set $S$ is propagated back, unaltered, to $t_1$. The thread $t_1$ now finds that the incoming data flow fact contains a singleton $S$, with its own thread-identifier, which indicates it is receiving a stale fact. This allows the thread to safely drop the data flow fact along an incoming sync-edge, thereby breaking the cycle. An abstract analysis based on thread-identifiers can, in fact, prove an upper bound for $x$.

5.6 Soundness of Sync-CFG analyses

Consider a sync-CFG analysis $A$ for program $P$. We can prove the “soundness” of $A$, in the sense defined in Sec. 5.3, with respect to the interleaving semantics, by showing $A$ to be a consistent abstraction of the L-DRF analysis via an abstraction map $\alpha$ and concretization map $\gamma$.
Simply put, the set of environments computed by the sync-CFG analysis $A$ at location $n$ in thread $t$ is guaranteed to be a safe approximation of the actual concrete (standard) states arising whenever thread $t$ is at location $n$, provided we restrict our attention to the sub-environments on the set of variables owned by $t$ at $n$. We state this more formally below.

Theorem 4 Let $A$ be a sync-CFG analysis of a race free program $P$. Suppose that $A$ has been shown to be a consistent abstraction of the L-DRF analysis, via an abstraction map $\alpha$ and concretization map $\gamma$. Let $t \in \mathcal{T}$ and $n \in \mathcal{L}_t$, and let $V$ be the set of variables owned by $t$ at location $n$. Let $s = \langle pc, \mu, \phi \rangle$ be a reachable state of the interleaving semantics, with $pc(t) = n$. Then there exists a state $\sigma = \langle pc, \mu, \Theta, \Lambda \rangle$ in $\gamma(\llbracket P \rrbracket_A)$ with $\phi =_V \phi_t$, where $\Theta(t) = \langle \phi_t, \nu_t \rangle$.

Proof The proof is immediate since, by Corollary 2, there is a reachable state $\sigma$ of the L-DRF semantics which coincides with $s$, modulo the restriction to $V$. The fact that $A$ is a consistent abstraction of L-DRF says that the $\gamma$ image of its LFP must contain the state $\sigma$. $\square$

For example, the facts about $x$ and $y$ inferred by each of the three analyses in Fig. 13 at point 4 are sound (since both $x$ and $y$ are owned by $t_1$ at this point). However, at point 1 the inferred facts may not be sound (and in fact they are not), since $x$ and $y$ are not owned at point 1.

Rel analysis

In this section we show that the Rel analysis is a consistent abstraction of the $A^{L}$ analysis based on L-DRF.

Claim For any program $P$, the analysis Rel is a consistent abstraction of the $A^{L}$ analysis for $P$.

Proof Consider a program $P = (\mathcal{V}, \mathcal{M}, \mathcal{T})$. We will make use of the definitions of the analysis $A^{L}$ from Sec. 5.2, and Rel from Sec. 5.4.2, and we refer the reader to them. To show that Rel is a consistent abstraction of $A^{L}$, it suffices (by Theorem 3) to exhibit an abstraction map $\alpha_{\times}$ and a concretization function $\gamma_{\times}$ satisfying the conditions of Theorem 3.

The abstraction function $\alpha_{\times}$ maps a set of L-DRF states $C \subseteq \Sigma$ to an abstract state $a_{\times} \in A_{\times}$. The abstract value $\alpha_{\times}(C)(n)$ contains the collection of $t$'s environments (where $t = tid(n)$) coming from any state $\sigma \in C$ where $t$ is at location $n$. In addition, if $n$ is a post-release point, $\alpha_{\times}(C)(n)$ also contains the contents of the buffer $\Lambda(n)$ for each state $\sigma \in C$. We define $\alpha_{\times} : \mathcal{P}(\Sigma) \to A_{\times}$, given by

$$\alpha_{\times}(C) = \lambda n. \big( \{ \phi \mid \langle pc, \mu, \Theta, \Lambda \rangle \in C \wedge tid(n) = t \wedge pc(t) = n \wedge \Theta(t) = \langle \phi, \nu \rangle \} \cup \{ \phi \mid \langle pc, \mu, \Theta, \Lambda \rangle \in C \wedge n \in \mathcal{L}^{rel} \wedge \Lambda(n) = \langle \phi, \nu \rangle \} \big).$$

The concretization function $\gamma_{\times}$ maps a cartesian state $a_{\times}$ to a set of L-DRF states $C$ in which the local state of a thread $t$, when $t$ is at program point $n \in \mathcal{L}_t$, comes from $a_{\times}(n)$ and the contents of the release buffer pertaining to the post-release location $n \in \mathcal{L}^{rel}$ also comes from $a_{\times}(n)$. We define $\gamma_{\times} : A_{\times} \to \mathcal{P}(\Sigma)$ given by:

$$\gamma_{\times}(a_{\times}) = \big\{ \langle pc, \mu, \Theta, \Lambda \rangle \in \Sigma \;\big|\; (\forall t \in \mathcal{T} : \Theta(t) = \langle \phi, \nu \rangle \wedge \phi \in a_{\times}(pc(t))) \wedge (\forall n \in \mathcal{L}^{rel} : \Lambda(n) = \langle \phi, \nu \rangle \wedge \phi \in a_{\times}(n)) \big\}.$$

Let $X \subseteq \Sigma$ be a set of states of $P$ in the L-DRF semantics. Let $\iota = (n, c, n')$ be an instruction in $P$, with $tid(n) = t$. Let

$$X' = F^{L}_{\iota}(X) = \{ \sigma' \mid \exists \sigma \in X, \sigma \Rightarrow^{L}_{t} \sigma' \}.$$

[Diagram: $X \xrightarrow{\alpha_{\times}} a_{\times}$, $X \xrightarrow{F^{L}_{(n,c,n')}} X'$, $a_{\times} \xrightarrow{F^{\times}_{(n,c,n')}} a'_{\times}$, and the dashed relation $\alpha_{\times}(X') \sqsubseteq_{\times} a'_{\times}$.]

Fig. 15

The proof obligation to show that Rel is a consistent abstraction of $A_L$. The solid lines represent given relations, while the dashed line needs to be established.

Further, let $a_\times = \alpha_\times(X)$ and $a'_\times = F^{\times}_{\iota}(a_\times)$. Then we need to show that

$$\alpha_\times(X') \sqsubseteq_\times a'_\times. \qquad (2)$$

This is depicted in Fig. 15.

We observe that for each $\sigma' = \langle pc', \mu', \Theta', \Lambda'\rangle$ in $X'$ we have $pc'(t) = n'$, and there exists a state $\sigma = \langle pc, \mu, \Theta, \Lambda\rangle \in X$ such that $pc(t) = n$, $pc' = pc[t \mapsto n']$, and for each $t' \neq t$ we have $\Theta'(t') = \Theta(t')$. Further, every environment $\phi'$ that occurs in $\Theta(t')$, where $t' \neq t$, is already present in $a_\times$ and hence in $a'_\times$. This is because (a) it is present in $\sigma$ and $\alpha_\times$ ensures that it is present in the appropriate location in $a_\times$; and (b) by the definition of the transfer function $F^{\times}_{\iota}$, every environment at location $l$ in $a_\times$ is also at location $l$ in $a'_\times$. Thus, to show that (2) holds, it suffices to show for an arbitrary $\sigma' = \langle pc', \mu', \Theta', \Lambda'\rangle$ that the environments in $\Theta'(t)$ and $\Lambda'$ are present in the appropriate locations ($n'$ and release points, respectively) in $a'_\times$.

Let us fix a $\sigma' = \langle pc', \mu', \Theta', \Lambda'\rangle \in X'$ and a $\sigma = \langle pc, \mu, \Theta, \Lambda\rangle \in X$ as above. We now show this subclaim for each command $c$.

Assignment.

Suppose $c$ is an assignment of the form $x := e$, and let $\Theta'(t) = \langle\phi', \nu'\rangle$. Then $\phi' = [\![x := e]\!]\phi$, where $\Theta(t) = \langle\phi, \nu\rangle$ for some $\nu$. Now $\phi \in a_\times(n)$ and hence, by the definition of $F^{\times}_{\iota}$, $\phi' \in a'_\times(n')$.

Further, since $\Lambda' = \Lambda$, its environments are all included in $a_\times$ and hence also in $a'_\times$.

The case of assume commands is handled similarly.

Release.

Recall that in this case $a'_\times = a_\times[n' \mapsto (a_\times(n') \cup a_\times(n))]$. Now $\phi' = \phi$ and $\phi \in a_\times(n)$, and therefore $\phi' \in a'_\times(n')$. Also, $\Lambda' = \Lambda[n' \mapsto \langle\phi, \nu\rangle]$, and we have just seen that $\phi$ belongs to $a'_\times(n')$.

Acquire.

In this case, $\Theta'(t)$ takes the value of each variable $x$ either from the versioned environment $ve$ in some relevant buffer, or from the existing thread-local environment of $t$. By the construction of $\alpha_\times$, if $ve$ was chosen from some post-release point $\bar{n}$, then this environment is guaranteed to exist in $a_\times(\bar{n})$. Likewise, if $ve$ is simply the thread-local versioned environment of $t$, then the environment is in $a_\times(n)$. Since, by the semantics of the acquire in the Rel analysis, the environments at all such $\bar{n}$ and the environment at $n$ are taken into account in the mix, and since this operation is performed for each variable $x \in V$, the environment of $\Theta'(t)$ belongs to $a'_\times(n')$.

This completes the proof of (2) and hence of the Claim. $\Box$

From Theorem 4, it now follows that the facts inferred by the Rel analysis about the owned set of variables at each location in a program $P$ are indeed sound.

L-DRF

In this section, we introduce a refined notion of data race freedom, based on data regions, and derive from it a more precise abstract analysis capable of transferring some relational information between threads at synchronization points. The objective is to modify the L-DRF semantics such that the abstract mix operates at a granularity coarser than individual variables.

7.1 Why do we need another semantics?

Fig. 11, which illustrates the operation of mix, also highlights the key issue with the L-DRF semantics: any abstract analysis derived from the L-DRF semantics must make use of an abstract mix which operates at the granularity of individual variables. Thus, even though two variables may be related in the input environments to mix (like x = y in Fig. 11), the function must necessarily forget their correlation after the mixing. This is essential for soundness, and it is what prevents us from proving the assertion x = y at line 11 in the motivating example in Fig. 2. Even though the acquire(m) in t obtains the fact x = y from both its input edges, it fails to maintain this correlation after the mix.

While the VRel analysis we saw in Sec. 5.4 had a mix operator which did better for the program in Fig. 10 (it preserved the correlation between x and y after the mix in thread t), the analysis is not practicable: it does not provide an abstraction of the versions, which may grow in an unbounded fashion.

Our solution is to make use of user-defined regions. Essentially, regions are a user-defined partitioning of the set of program variables. We call each block of the partition a region r, denote the set of regions by R, and the region of a variable x by rg(x). The semantics precisely tracks correlations between variables within regions across inter-thread communication, while abstracting away the correlations between variables across regions.
This partitioning is based on the semantics of the program: developers often write code where a group of variables forms a logical cluster. Often, some invariant holds on the variables within this cluster at specific program points. Since we make this partitioning explicit in the semantics, with suitable abstractions the tracked correlations can improve the precision of the abstract analyses for programs which conform to the notion of race freedom defined below.

A region-level data race [30] occurs when two concurrent threads access variables from the same region r (not necessarily the same variable), with at least one access being a write, and the accesses are devoid of any ordering constraints. A command x := e constitutes a write access to the region rg(x), and a read access of every region rg(y), for each variable y appearing in the expression e. Similarly, a command assume(b) constitutes a read access of every region rg(y), for each variable y appearing in the condition b. We are now in a position to introduce our notion of region-level races.

Definition 6 (Region-level races)

Let $P$ be a program and let $R$ be a region partitioning of $P$. An execution $\pi$ of $P$, in the standard interleaving semantics, has a region-level race if there exist $0 \le i < j < |\pi|$ such that $c(\pi_i)$ and $c(\pi_j)$ both access variables in a region $r \in R$, at least one access is a write, and it is not the case that $\pi_i \xrightarrow{hb}_{\pi} \pi_j$.

The problem of checking for region races can be reduced to the problem of checking for data races as follows. We introduce a fresh variable $X_r$ for each region $r \in R$. We then transform the input program $P$ into a program $P'$ with the following additions. We assume without loss of generality that assume() statements in $P$ only reference thread-local variables. For example, we replace assume(x < y) by the statements "l_x := x; l_y := y; assume(l_x < l_y)".

– We precede every assignment statement x := e, where $r_w$ is the region which is written to and $r_1, \ldots, r_n$ are the regions read, with the sequence of instructions $X_{r_w} := X_{r_1}; \ldots; X_{r_w} := X_{r_n};$.
– Statements of the form assume(b) do not need to be changed because b refers only to thread-private variables.
– The acquire and release statements do not involve the access of any variable. Thus, they remain unmodified.

Note that these modifications do not alter the semantics of the original program (for each trace of $P$ there is a corresponding trace in $P'$, and vice versa). We now check for data races on the $X_r$ variables.

7.3 The L-RegDRF semantics

The region-based version of the L-DRF semantics, which we call the L-RegDRF semantics [30], is obtained via a simple change to the L-DRF semantics: a write access to a variable x leads to incrementing the version of every variable that resides in x's region.
In other words, the semantics of the assignment command, $[\![x := e]\!] : VE \to VE$, is defined as follows: $[\![x := e]\!]\langle\phi, \nu\rangle = \langle\phi', \nu'\rangle$, where $\phi' = \phi[x \mapsto [\![e]\!]\phi]$ and $\nu'$ is given by

$$\nu'(y) = \begin{cases} \nu(y) + 1 & \text{if } rg(y) = rg(x), \\ \nu(y) & \text{otherwise.} \end{cases}$$

It is not difficult to see that versions of Theorems 1 and 2 hold for the completeness and soundness of the L-RegDRF semantics vis-a-vis the standard interleaving semantics, for programs that are region-race free. Hence, we can analyze such programs using abstractions of L-RegDRF and obtain sound results with respect to the standard interleaving semantics (Sec. 3.3).

7.4 Thread-Local Abstractions of the L-RegDRF Semantics

The cartesian abstractions defined in Sec. 5 can be extended to accommodate regions in a natural way. The only difference lies in the definition of the mix operation, which now operates at the granularity of regions, rather than variables:

$$mix : \mathcal{P}(Env) \to \mathcal{P}(Env) \ \stackrel{def}{=}\ \lambda B_\times.\ \{\phi' \mid \forall r \in R,\ \exists \phi \in B_\times \text{ s.t. } \forall x \in V \text{ s.t. } rg(x) = r \text{ we have } \phi'(x) = \phi(x)\}.$$

Mixing environments at the granularity of regions is permitted because the L-RegDRF semantics ensures that all the variables in the same region have the same version. Thus, their most up-to-date values reside in either the thread's local environment or in one of the release buffers. As before, we can obtain an effective analysis using any sequential abstraction, provided that the abstract domain supports the (more precise) region-based mix operator.

7.5 Illustrative Example

We illustrate the effect of the regions using some small examples. Consider again the situation in Fig. 11. Recall that even though the input environments maintained x = y, the mix was unable to preserve this correlation because it operated at the granularity of individual variables.
However, when mix is made aware of the region definitions, it maintains the correlation between variables within a region. Thus, in Fig. 16, the invariant x = y continues to hold in the output state.

Returning to the program in Fig. 2, consider the situation at the acquire at line 10 (illustrated in Fig. 17). It receives the invariant x = y from both its input branches. The mix in the Rel abstraction of L-DRF only outputs the correct bounds for the variables, and forgets the correlation between x and y. However, the region-aware mix preserves this invariant, which enables the region-aware version of Rel derived from L-RegDRF, which we call RegRel, to prove the assertion at line 11.
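The difference between the two granularities can be reproduced on concrete sets of environments. The following Python sketch implements the set-level mix of Sec. 7.4 directly from its definition. The function names and the two input environments are illustrative assumptions; they are not the values used in Fig. 11 or Fig. 16, and the sketch works on explicit sets rather than on an abstract domain.

```python
from itertools import product


def mix(envs, regions):
    """Region-granular mix over a set of environments.

    envs    -- iterable of dicts mapping variable name -> value
    regions -- list of disjoint sets of variable names covering all variables
    Returns a set of environments (each a sorted tuple of (var, value) pairs)
    in which the variables of every region are copied together from a single
    input environment.
    """
    envs = list(envs)
    result = set()
    # Independently for each region, choose the input environment supplying it.
    for sources in product(envs, repeat=len(regions)):
        mixed = {}
        for region, src in zip(regions, sources):
            for var in region:
                mixed[var] = src[var]
        result.add(tuple(sorted(mixed.items())))
    return result


def keeps_x_eq_y(result):
    """True if every environment in the result satisfies x == y."""
    return all(dict(env)["x"] == dict(env)["y"] for env in result)


# Hypothetical inputs: both environments satisfy x == y.
inputs = [{"x": 0, "y": 0, "z": 1}, {"x": 1, "y": 1, "z": 0}]

per_variable = mix(inputs, [{"x"}, {"y"}, {"z"}])  # every variable is its own region
per_region = mix(inputs, [{"x", "y"}, {"z"}])      # x and y share a region

print(len(per_variable), keeps_x_eq_y(per_variable))  # 8 False
print(len(per_region), keeps_x_eq_y(per_region))      # 4 True
```

With singleton regions the result contains environments such as x = 0, y = 1, so the invariant is lost; with the regions {x, y} and {z} every output environment still satisfies x = y.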

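The reduction from region races to data races described in Sec. 7.2 can also be sketched in a few lines. The Python fragment below is only an illustration under stated assumptions: programs are lists of tuples, the statement encoding and the name `instrument` are hypothetical, every variable (including thread-local ones) is assumed to have an entry in the region map `rg`, and assume statements are assumed to have already been rewritten to use thread-local variables only.

```python
def instrument(program, rg):
    """Insert region-marker assignments before every assignment statement.

    program -- list of statements, each a tuple:
               ("assign", target, reads) stands for `target := e`, where
               reads is a tuple of the variables appearing in e; any other
               tuple, e.g. ("acquire", "m"), is copied unchanged
    rg      -- dict mapping every variable to the name of its region
    """
    out = []
    for stmt in program:
        if stmt[0] == "assign":
            _, target, reads = stmt
            write_marker = "X_" + rg[target]
            # Collect the regions read by e, without duplicates, in order.
            read_regions = []
            for var in reads:
                if rg[var] not in read_regions:
                    read_regions.append(rg[var])
            # Emit X_{r_w} := X_{r_i} for each region read. Following the
            # text literally, a right-hand side that reads no variable
            # produces no marker statement.
            for region in read_regions:
                out.append(("assign", write_marker, ("X_" + region,)))
        out.append(stmt)
    return out


# Hypothetical example: x and y share region r1, z lives in region r2.
rg = {"x": "r1", "y": "r1", "z": "r2"}
program = [("acquire", "m"), ("assign", "x", ("y", "z")), ("release", "m")]
for stmt in instrument(program, rg):
    print(stmt)
```

A data race detector run on the instrumented program would then be asked about races on the X_r1 and X_r2 marker variables only.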
Fig. 16

Illustrating the operation of mix when it is aware of regions. In this example, with the regions being ⟨{x, y}, {z}⟩, the function maintains the correlation between x and y in the output.

Fig. 17

The improved precision of the region-aware mix derived from the L-RegDRF semantics allows it to prove the additional assertion at line 11 in Fig. 2.

In this section, we perform a thorough empirical evaluation of our analyses using a prototype analyzer which we have developed, called RATCOP [29], for the static intra-procedural analysis of race-free concurrent Java programs. (The project artifacts are available at https://bitbucket.org/suvam/ratcop.) RATCOP comprises around 4000 lines of Java code, and implements a variety of relational analyses based on the theoretical underpinnings described in earlier sections of this paper. Through command line arguments, each analysis can be made to use any one of the following three numerical abstract domains provided by the Apron library [19]: Convex Polyhedra (with support for strict inequalities), Octagons and Intervals. RATCOP also makes use of the Soot [34] analysis framework for Java. The tool reuses the code for fixed point computation and the graph data structures in the implementation of [11].

The tool takes as input a Java program with assertions marked at appropriate program points. We first checked all the programs in our benchmarks for data races and region races using Chord [31]. For detecting region races, we have implemented the translation scheme outlined in Sec. 7.2. RATCOP then performs the necessary static analysis on the program until a fixpoint is reached. Subsequently, the tool automatically tries to prove the assertions using the inferred facts (which translates to checking whether the inferred fact at a program point, projected to the variables owned at that point, implies the assertion condition). If it fails to prove an assertion, it records the corresponding inferred fact in a log file for manual inspection. Fig. 18 summarizes the set of operations in RATCOP.

Fig. 18

Architecture of RATCOP.
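The assertion-checking step described in the text above (project the inferred fact onto the owned variables, then test whether it implies the assertion) can be illustrated with a toy interval domain. This Python sketch is not RATCOP's implementation and does not use Apron; the representation of facts and assertions as dictionaries of bounds, and the names `project`, `implies` and `check`, are assumptions made for the example.

```python
def project(fact, owned):
    """Keep only the bounds of the variables owned at this program point."""
    return {var: bounds for var, bounds in fact.items() if var in owned}


def implies(fact, assertion):
    """True if the interval fact guarantees every asserted bound.

    fact, assertion -- dicts mapping variable -> (lo, hi), read as lo <= var <= hi
    """
    for var, (lo, hi) in assertion.items():
        if var not in fact:
            return False  # nothing is known about this variable
        f_lo, f_hi = fact[var]
        if f_lo < lo or f_hi > hi:
            return False  # the inferred interval is not contained in the asserted one
    return True


def check(fact, owned, assertion):
    """Try to prove an assertion from the fact restricted to owned variables."""
    return implies(project(fact, owned), assertion)


# Hypothetical inferred fact at some program point where only x is owned.
fact = {"x": (0, 5), "y": (0, 5)}
owned = {"x"}

print(check(fact, owned, {"x": (0, 10)}))  # True
print(check(fact, owned, {"y": (0, 10)}))  # False
```

The second query fails even though the stored bound on y would satisfy it: the fact about y is discarded by the projection because y is not owned at that point, so it cannot be relied upon.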

As benchmarks, we use a subset of concurrent programs from the SV-COMP 2015 suite [4]. We chose only those programs which we believe have interesting relational invariants. We ported the programs (which are originally in C) to Java and introduced locks appropriately to remove races. We also use a program from [26], which is an abstraction of a producer-consumer scenario. While these programs are not too large, they have challenging invariants to prove, and provide a good test for the precision of the various analyses. We ran the tool in a virtual machine with 16GB RAM and 4 cores. The virtual machine, in turn, ran on a machine with 32GB RAM and a quad-core Intel i7 processor. We evaluated five analyses on the benchmarks. The first four are based on the Rel analysis (Sec. 5.4.2), and employ the Octagon numerical abstract domain. The last is based on the ValSet analysis (Sec. 5.4.3), and uses the Interval domain. These analyses are named as follows:

1. RT1: Without regions and thread identifiers.
2. RT2: With regions, but with no thread identifiers.
3. RT3: Without regions, but with thread identifiers.
4. RT4: With regions and thread identifiers.
5. VS: The value-set analysis of [11].

In terms of the precision of the abstract domains, the analyses form the following partial order: VS ≺ RT1 ≺ RT2 ≺ RT4 and VS ≺ RT1 ≺ RT3 ≺ RT4. We use VS as the baseline.

8.2 Evaluation

Porting Sequential Analyses to Concurrent Analyses.

For the sequential commands, we performed a lightweight parsing of statements and simply re-used the built-in transformers of Apron. The only operator we needed to define afresh was the abstract mix. Since Apron exposes functions to perform each of the constituent steps, implementing the abstract mix was straightforward as well.

(By thread identifiers we are referring to the abstraction of the versions (recency information) outlined in Remark 5.5.)

Precision and Efficiency.

Table 2 summarizes the results of the experiments.

[Table 2: columns are Program, LOC, Threads, Asserts and, for each of RT1, RT2, RT3, RT4 and VS, the number of assertions proved (✓) and Time (ms).]

Table 2 Summary of the experiments. Superscript B indicates that the program has an actual bug. (C) indicates the use of Convex Polyhedra as abstract data domain. "*" indicates a program where we have altered/weakened the original assertion. The ✓ column indicates the number of assertions the tool was able to prove.

While all the analyses failed to prove the assertions in reorder 2, RT and RT were able to prove them when they used convex polyhedra instead of octagons. Since none of the analyses track arrays precisely, all of them failed to prove the original assertion in sigma (which involves checking a property involving the sum of the array elements). However, RT3 and RT4 correctly detect a potential array out-of-bounds violation in the program. The improved precision is due to the fact that RT3 and RT4 track thread identifiers in the abstract state, which avoids spurious read-write cycles in the analysis of sigma. The program twostage 3 has an actual bug, and the assertions are expected to fail. This program provides a "sanity check" of the soundness of the analyses.
Programs marked with "*" contain assertions which we have altered completely and/or weakened. In these cases, the original assertion was either expected to fail or was too precise (possibly requiring a disjunctive domain in order to prove it). In qw2004, for example, our modified assertions are of the form x = y. RT2 and RT4 perform well in this case, since we can specify a region containing x and y, which precisely tracks their correlation across threads. The imprecision in the remaining cases is mostly due to the program requiring disjunctive domains to discharge the assertions, or the presence of spurious write-write cycles which weaken the inferred facts. Abstracting our semantics to handle such cycles is an interesting direction for future work.

Of the total 40 "valid" assertions (excluding the two in twostage 3), RT4 is the most precise, being able to prove 65% of them. It is followed by RT (55%), RT (45%), RT1 (35%) and, lastly, VS (25%). Thus, the new analyses derived from L-DRF and L-RegDRF perform significantly better than the value-set analysis of [11]. Moreover, this total order respects the partial ordering between the analyses defined earlier.

With respect to the running times, the maximum time taken, across all the programs, is around 2 seconds, by RT. VS turns out to be the fastest in general, due to its lightweight abstract domain. RT2 and RT4 are typically slower than RT1 and RT3 respectively. The slowdown can be attributed to the additional tracking of regions by the former analyses. Note that for the program sigma, RT was both more precise and faster than the baseline VS.

8.3 Comparison with a recent abstract interpretation based tool

We also compared the efficiency of RATCOP with that of Batman, a tool implementing the previous state-of-the-art analyses based on abstract interpretation [27, 28] (a discussion on the precision of our analyses against those in [27] is presented in Sec. 9).
The basic structure of the benchmark programs for this experiment is as follows: each program defines a set of shared variables. A main thread then partitions the set of shared variables, and creates threads which access and modify variables in a unique partition. Thus, the sets of memory locations accessed by any two threads are disjoint. In our experiments, each thread simply performed a sequence of writes to a specific set of shared variables. In some sense, these programs represent a "best-case" scenario for concurrent program analyses because there are no interferences between threads. Unlike RATCOP, the Batman tool, in its current form, only supports a small toy language and does not provide the means to automatically check assertions. Thus, for the purposes of this experiment, we only compare the time required to reach a fixpoint in the two tools. We compare RT against Batman running with the Octagon domain and the BddApron library [18] (Bm-oct).

[Table 3: columns are RT Time (ms) and Bm-oct Time (ms).]

Table 3

Running times of RATCOP (RT) and Batman (Bm-oct) on loosely coupled threads. The number of shared variables is fixed at 6.

Fig. 19

Graphical representation of the data in Table 3 on a logarithmic scale. RATCOP performs exponentially faster, compared to Batman, on this benchmark.

The running times of the two analyses are given in Table 3. The graph in Fig. 19 plots these running times on a logarithmic scale. In the benchmarks, with increasing number of threads, RATCOP was up to 5 orders of magnitude faster than Bm-oct. The rate of increase in running time was roughly linear for RATCOP, while it was almost exponential for Bm-oct. We believe the reason for this difference in running times is that the analyses in [27, 28] compute sound facts at every program point. Thus, as the number of threads increases, these analyses have to account for data flow over an exponential number of context-switch points, which contributes to the slowdown. RATCOP, on the other hand, does not attempt to be sound at all program points. For these programs it performs no inter-thread propagation, and the time increases linearly with the total number of program points. For assertions in thread t which only involve variables in the logical partition of t, RATCOP is at least as precise as Batman, since proving such assertions does not require inter-thread reasoning.

In this paper we have presented a framework for developing intra-procedural data-flow analyses for data race free shared-memory concurrent programs, with a statically fixed number of threads, and with variables having primitive data types.

There is a rich literature on data flow analysis of concurrent programs. We refer the reader to the detailed survey by Rinard [32] which provides details of the main approaches. In this section, we proceed to compare our work with some of the relevant prior approaches.

Degree of Inter-thread Communication.

Chugh et al. [7] automatically lift a given sequential analysis to a sound analysis for concurrent programs, using a data race detector. However, data-flow facts are not communicated across threads, and this can cause a loss in precision. The work by Miné [25] allows a greater degree of inter-thread communication. Here, the overall analysis can be considered to proceed in rounds of thread-modular analyses. At the end of each round, every thread generates a set of per-thread "interferences": for each variable x, a thread t stores the set of values it writes to x when t was analyzed modularly. In the next iteration, each thread t′ ≠ t takes into account this interference information from t, whenever it reads x. This, in turn, generates more interferences for t′, and the process continues until a fixpoint is reached. Thus, the inter-thread communication is flow insensitive. Unlike our semantics, this analysis is unable to infer relational properties between variables.

Miné [27] presents an abstract interpretation formulation of the rely-guarantee proof paradigm [20, 35], and allows one to derive analyses with varying degrees of inter-thread flow sensitivity. In particular, the work in [25] is shown to be an abstraction of the semantics in [27]. The semantics in [27] involves a nested fixed-point computation, compared to our single fixed-point formulation. The resulting analysis aims to be sound at all program points (e.g., in Fig. 2 the value of y at line 9 in t), due to which many more interferences will have to be propagated than we do, leading to a less efficient analysis. The times clocked by Batman, in comparison to RATCOP, are testament to this. The work in [27] attempts to retrieve some degree of efficiency by computing "lock invariants", which are essentially summaries of each critical section. However, to make use of this, the program must be well-synchronized: every access of a shared variable must be protected by a lock, which is a stronger requirement than data race freedom.
Moreover, for certain programs, our abstract analyses are more precise. Fig. 20 shows a program which is race free, even though the conflicting accesses to x in lines 2 and 12 are not protected by a common lock. The "lock invariants" in [27] would consider these accesses as potentially racy, and would allow the read at line 12 to observe the write at line 2, thereby being unable to prove the assertion. However, our analyses would ensure that the read only observes the write at line 11, and are able to prove the assertion. The work in [15] presents an operational semantics for concurrent programs, parameterized by a relation. It makes additional assumptions about code regions which are unsynchronized (allowing only read-only shared variables and local variables in such regions). Moreover, it too computes sound facts at every point, resulting in less efficient abstractions. In this sense, De et al. [11] strike a sweet spot: by leveraging the race freedom assumption, the analysis restricts data flow facts to synchronization points alone, thereby gaining efficiency. However, this work cannot compute relational information either, being based on a cartesian value-set domain.

Control Flow Representation.

The methods described in [11, 12, 17] present concurrent data flow algorithms by building specialized concurrent flow graphs. However, the class of analyses they address is restricted: [12] handles properties expressible as Quantified Regular Expressions, [17] handles reaching definitions, while [11] only handles value-set analyses. While our analyses also make use of the sync-CFG data structure of [11], the L-DRF and L-RegDRF semantics allow us to use it in conjunction with much more expressive abstract domains. In contrast to our approach, the techniques in [13, 14] provide an approach to verifying properties of concurrent programs using data flow graphs, rather than control flow graphs like we do.

    Thread t1() {
    1:   acquire(m);
    2:   x := 1;
    3:   y := 1;
    4:   release(m);
    5: }

    Thread t2() {
    6:   while (p != 1) {
    7:     acquire(m);
    8:     p := y;
    9:     release(m);
    10:  }
    11:  x := 2;
    12:  p := x;
    13:  assert(p != 1);
    14: }

Fig. 20

Example demonstrating that a program can be DRF when the accesses of a global variable (in this case, the write and read of x at lines 11 and 12 respectively) are not directly guarded by any lock.

Resource Invariants vs. Regions.

A traditional approach to analyzing concurrent programs involves resource invariants associated with every lock (e.g. Gotsman et al. [16]). This approach depends on a locking policy where a thread only accesses global data if it holds a protecting lock. In contrast, our approach does not require a particular locking policy (e.g., see Fig. 20), and is based on a parameterized notion of data-race-freedom, which allows one to encode locking policies as a particular case. Thus, at the overhead cost of ensuring data race freedom, our new semantics provides greater flexibility to analysis writers. The analysis in [16] also works in a similar spirit to the sync-CFG: a selected part of the heap protected by a lock is made accessible to a thread only when it acquires the lock. In contrast, the synchronization edges in a sync-CFG propagate entire data flow facts. The locking policy employed by [16] is stronger than the notion of race freedom, and the class of programs the analysis can handle is a subset of what we handle in this work.

Region Races.

Our notion of region races is inspired by the notion of high-level data races [3]. The concept of splitting the state space into regions was earlier used in [23], which used these regions to perform shape analysis for concurrent programs. However, that algorithm still performs a full interleaving analysis which results in poor scalability. The notion of variable packing [5] is similar to our notion of data regions. However, variable packs constitute a purely syntactic grouping of variables, while regions are semantic in nature. A syntactic block may not access all variables in a semantic region, which would result in a region partitioning more refined than what the programmer has in mind, and hence in decreased precision.

As future work, we would like to evaluate the performance of our tool when equipped with disjunctive relational domains. In this work, we do not consider dynamically allocated memory, and extending the L-DRF semantics to account for the heap memory is interesting future work. Abstractions of such a semantics could potentially yield efficient shape analyses for race free concurrent programs.

Acknowledgements

We would like to thank the anonymous reviewers for their insightful and helpful comments which have greatly improved the quality of the presentation. We would like to thank Mooly Sagiv for his help and insights. We would also like to thank Antoine Miné and Raphaël Monat for their help with the Apron library and in setting up Batman. This publication is part of a project that has received funding from the European Research Council (ERC) under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° [321174] and under the European Union's Horizon 2020 research and innovation programme (grant agreement No [759102-SVIS]). This research was supported by Len Blavatnik and the Blavatnik Family foundation, and by the Blavatnik Interdisciplinary Cyber Research Center, Tel Aviv University.

References

1. Adve, S.V., Hill, M.D.: Weak ordering – a new definition. In: ACM SIGARCH Computer Architecture News, vol. 18, pp. 2–14. ACM (1990)
2. Adve, S.V., Hill, M.D.: A unified formalization of four shared-memory models. IEEE Trans. Parallel Distrib. Syst. (6), 613–624 (1993). DOI 10.1109/71.242161. URL https://doi.org/10.1109/71.242161
3. Artho, C., Havelund, K., Biere, A.: High-level data races. In: New Technologies for Information Systems, Proceedings of the 3rd International Workshop on New Developments in Digital Libraries (NDDL 2003), and the 1st International Workshop on Validation and Verification of Software for Enterprise Information Systems (VVEIS 2003), Angers, pp. 82–93 (2003)
4. Beyer, D.: Software verification and verifiable witnesses – report on SV-COMP 2015. In: Proc. 21st International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2015), London, pp. 401–416 (2015)
5. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A static analyzer for large safety-critical software. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2003), San Diego, vol. 38, pp. 196–207. ACM (2003)
6. Boehm, H., Adve, S.V.: Foundations of the C++ concurrency memory model. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2008), Tucson, pp. 68–78. ACM (2008)
7. Chugh, R., Voung, J.W., Jhala, R., Lerner, S.: Dataflow analysis for concurrent programs using datarace detection. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2008), Tucson, vol. 43, pp. 316–326. ACM (2008)
8. Cousot, P., Cousot, R.: Static determination of dynamic properties of programs. In: Proc. 2nd International Symposium on Programming, Paris. Dunod (1976)
9. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Proc. 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL 1977), pp. 238–252. ACM (1977)
10. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proc. 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL 1978), pp. 84–96. ACM (1978)
11. De, A., D'Souza, D., Nasre, R.: Dataflow analysis for datarace-free programs. In: Proc. 20th European Symposium on Programming (ESOP 2011), Saarbrücken, pp. 196–215. Springer (2011)
12. Dwyer, M.B., Clarke, L.A.: Data flow analysis for verifying properties of concurrent programs. In: Proc. Second ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE 1994), New Orleans, pp. 62–75 (1994)
13. Farzan, A., Kincaid, Z.: Verification of parameterized concurrent programs by modular reasoning about data and control. In: Proc. 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2012), Philadelphia, vol. 47, pp. 297–308. ACM (2012)
14. Farzan, A., Kincaid, Z., Podelski, A.: Inductive data flow graphs. In: Proc. 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2013), Rome, vol. 48, pp. 129–142. ACM (2013)
15. Ferreira, R., Feng, X., Shao, Z.: Parameterized memory models and concurrent separation logic. In: Proc. 19th European Symposium on Programming (ESOP 2010), Paphos, pp. 267–286. Springer (2010)
16. Gotsman, A., Berdine, J., Cook, B., Sagiv, M.: Thread-modular shape analysis. In: Proc. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007), San Diego, vol. 42, pp. 266–277. ACM (2007)
17. Grunwald, D., Srinivasan, H.: Data flow equations for explicitly parallel programs. In: Proc. Fourth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP 1993), San Diego, vol. 28, pp. 159–168. ACM (1993)
18. Jeannet, B.: Some experience on the software engineering of abstract interpretation tools. Electronic Notes in Theoretical Computer Science (2), 29–42 (2010)
19. Jeannet, B., Miné, A.: Apron: A library of numerical abstract domains for static analysis. In: Proc. 21st International Conference on Computer Aided Verification (CAV 2009), pp. 661–667. Springer (2009)
20. Jones, C.B.: Development methods for computer programs including a notion of interference. Oxford University Computing Laboratory (1981)
21. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 558–565 (1978)
22. Lamport, L.: How to make a correct multiprocess program execute correctly on a multiprocessor. IEEE Transactions on Computers, 779–782 (1997)
23. Manevich, R., Lev-Ami, T., Sagiv, M., Ramalingam, G., Berdine, J.: Heap decomposition for concurrent shape analysis. In: Proc. 15th International Symposium on Static Analysis (SAS 2008), Valencia, vol. 5079, pp. 363–377. Springer (2008)
24. Manson, J., Pugh, W., Adve, S.V.: The Java memory model. In: Proc. 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2005), Long Beach, pp. 378–391. ACM (2005)
25. Miné, A.: Static analysis of run-time errors in embedded critical parallel C programs. In: Proc. 20th European Symposium on Programming (ESOP 2011), Saarbrücken, pp. 398–418. Springer (2011)
26. Miné, A.: Static analysis by abstract interpretation of concurrent programs. Ph.D. thesis, Ecole Normale Supérieure de Paris-ENS Paris (2013)
27. Miné, A.: Relational thread-modular static value analysis by abstract interpretation. In: Proc. 15th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2014), San Diego, pp. 39–58. Springer (2014)
28. Monat, R., Miné, A.: Precise thread-modular abstract interpretation of concurrent programs using relational interference abstractions. In: Proc. 18th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI 2017), pp. 386–404. Springer (2017)
29. Mukherjee, S., Padon, O., Shoham, S., D'Souza, D., Rinetzky, N.: RATCOP: relational analysis tool for concurrent programs. In: Proc. 13th International Haifa Verification Conference (HVC 2017), Haifa, pp. 229–233 (2017)
30. Mukherjee, S., Padon, O., Shoham, S., D'Souza, D., Rinetzky, N.: Thread-local semantics and its efficient sequential abstractions for race-free programs. In: Proc. 24th International Symposium on Static Analysis (SAS 2017), New York, pp. 253–276 (2017)
31. Naik, M.: Chord: A program analysis platform for Java. https://bitbucket.org/psl-lab/jchord/. Accessed: 26 June 2018
32. Rinard, M.: Analysis of multithreaded programs. In: Proc. 8th International Symposium on Static Analysis (SAS 2001), Paris, pp. 1–19. Springer (2001)
33. Tarski, A.: A lattice-theoretical fixpoint theorem and its applications. Pacific Journal of Mathematics (2), 285–309 (1955)
34. Vallée-Rai, R., Co, P., Gagnon, E., Hendren, L., Lam, P., Sundaresan, V.: Soot – a Java bytecode optimization framework. In: Proc. Conference of the Centre for Advanced Studies on Collaborative Research, p. 13. IBM Press (1999)
35. Xu, Q., de Roever, W.P., He, J.: The rely-guarantee method for verifying shared variable concurrent programs. Formal Aspects of Computing 9