An abstract semantics of speculative execution for reasoning about security vulnerabilities
Robert J. Colvin and Kirsten Winter
Defence Science and Technology Group, Australia, and School of Information Technology and Electrical Engineering, University of Queensland
Abstract.
Reasoning about correctness and security of software is increasingly difficult due to the complexity of modern microarchitectural features such as out-of-order execution. A class of security vulnerabilities termed Spectre that exploits side effects of speculative, out-of-order execution was announced in 2018 and has since drawn much attention. In this paper we formalise speculative execution and its side effects with the intention of allowing speculation to be reasoned about abstractly at the program level, limiting the exposure to processor-specific or low-level semantics. To this end we encode and expose speculative execution explicitly in the programming language, rather than solely in the operational semantics; as a result the effects of speculative execution are captured by redefining the meaning of a conditional statement, and introducing novel language constructs that model transient execution of an alternative branch. We add an abstract cache to the global state of the system, and derive some general refinement rules that expose cache side effects due to speculative loads. Underlying this extension is a semantic model that is based on instruction-level parallelism. The rules are encoded in a simulation tool, which we use to analyse an abstract specification of a Spectre attack and vulnerable code fragments.
Modern multicore architectures exhibit several features to speed up execution: commands may appear to occur out of order, allowing computation to proceed past some bottleneck (e.g., loading a value from memory), several levels of faster intermediate memory (caches) to speed up repeated accesses, and in particular, speculative execution, where a branch is optimistically executed, even though local computation may not yet have determined if it is the correct branch. Such features are difficult to reason about, though there has been significant work in understanding weak memory models [37,2,36,1,15,23] and also detailed formal microarchitectural models (e.g., [4]).

Recently several significant security vulnerabilities have been found related to out-of-order execution, e.g., Meltdown [28], Foreshadow [5], and Spoiler [22]. In this paper we focus on the recently published Spectre class of attacks [25,24]. Spectre differs in that the attack may target the victim’s code to retrieve private information, while other attacks exploit processor features only. While complex to exploit, Spectre is a vulnerability present in almost all modern architectures. It allows malicious code to access the memory of a victim process, potentially reading private data, without sharing the virtual memory space. The attack works by detecting footprints in the cache left by speculative execution; for instance, a branch that includes a bounds check on an index $i$ into an array $A$ may speculatively load the element at $A[i]$, before it knows for certain that $i$ is within the bounds of $A$.
Though the speculative computations leading up to the point where the mis-speculation is detected are discarded, depending on the subsequent access patterns there may still be an effect on the cache, which is not discarded, and which can be used to infer the value in memory at out-of-bounds address $A[i]$.

In earlier work we have proposed a semantic framework to support reasoning about weak memory models [12] which is implemented in a simulation/model checking tool based on Maude [8]. In this paper we extend this framework with a model of cache behaviour and speculative execution. Although Spectre may occur in memory models that provide sequential consistency, a weak memory model framework is a natural fit for speculative execution as speculated instructions may begin out of order, i.e., before the relevant branch is reached. This enables not only a close inspection of Spectre-like attacks but also the analysis of other related potential vulnerabilities that may arise in modern hardware architectures. Our intention for the semantics is to allow analysis of vulnerability to Spectre-like attacks to be integrated within a more general, software-level reasoning framework; we do not aim to precisely model the implementation of speculative execution or caches for a particular architecture.

Speculative execution presents several challenges. Firstly, it requires a model of the cache which, for our concerns, needs to be modelled at a level that presents enough details to realistically capture the effects of speculative execution, but is abstract enough to not over-complicate reasoning. Secondly, speculative execution should allow side effects to take effect before the relevant branch is reached. Thirdly, speculation can be nested, and the target of future branches may depend on speculatively executed computations, necessitating the creation of transient state that can be easily discarded.
Finally, we want to be able to model and explore possible mitigations, e.g., memory barriers to halt speculation, such as Intel’s LFENCE instruction [21, Sect. 11.4.4.3].

The paper is structured as follows: in Sect. 2 we summarise a wide-spectrum language and its semantics for reasoning about weak memory models. In Sect. 3 we extend this with new constructs for reasoning about speculative execution, and give its semantics. We formalise some attacker and victim patterns, in particular those of Spectre, in Sect. 4. Related work is discussed in Sect. 5.
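The bounds-check scenario described above can be made concrete with a small toy model. The following Python sketch is only an illustration under stated assumptions, not part of the formal development: the flat memory layout, the function name `victim_read` and the flag `speculate_past_check` are all hypothetical, and the cache is reduced to a set of addresses that survives a discarded speculation.

```python
def victim_read(memory, cache, a_base, a_len, b_base, i, speculate_past_check):
    """Bounds-checked read of B[A[i]] over a flat memory (hypothetical layout).

    `cache` is a set of addresses; it is the only state that survives a
    mis-speculated run.
    """
    in_bounds = i < a_len
    if not in_bounds and not speculate_past_check:
        return None                       # check resolved first: nothing happens
    a_addr = a_base + i
    cache.add(a_addr)                     # footprint of loading A[i]
    value = memory[a_addr]
    cache.add(b_base + value)             # footprint of the dependent load B[A[i]]
    result = memory[b_base + value]
    return result if in_bounds else None  # mis-speculated result is discarded


# Assumed layout: A occupies 0..3, a private value sits at 4, B occupies 5..20.
memory = [0, 1, 2, 3] + [7] + [0] * 16
cache = set()
leaked = victim_read(memory, cache, a_base=0, a_len=4, b_base=5, i=4,
                     speculate_past_check=True)
```

In this run the caller receives no value, yet `cache` now holds the address of the private cell and the address of `B[7]`, so the private value 7 is encoded in which line of `B` was touched.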
IMP-ro
A wide-spectrum language for reasoning about weak memory models, IMP-ro, is introduced in [12,11]. It is essentially an imperative language with assignments, conditionals and loops, with the difference that instead of sequential composition ($c_1 ; c_2$ for $c_i$ a command) it has prefixing, $\alpha ; c$, where $\alpha$ is an instruction (as in process algebras such as CSP [20] and CCS [31]). The semantics of prefixing is defined so that either $\alpha$ may be executed, or some instruction $\beta$ from within $c$ may be executed, provided that $\beta$ can be reordered before $\alpha$ according to the rules of the memory model. To instantiate IMP-ro for a particular memory model a “reordering relation” $\overset{r}{\Leftarrow}$ on instructions is defined, stating when instructions can occur out of order; in addition, different models may also have different instruction types, for instance, memory barriers for enforcing order.

We recap IMP-ro below, before extending it to include speculative execution in later sections. We ground our work in a weak memory framework because speculation can occur before preceding instructions are executed, even when speculative execution is implemented on architectures which enforce sequential consistency. In addition, it appears that increasingly security vulnerabilities will be found due to instruction reordering on modern architectures, e.g., [22]. However, the particular reordering relation is not important for the analysis in this paper, and to avoid distraction we mostly assume sequential consistency.

The elements of IMP-ro are actions (instructions) $\alpha$, commands (programs) $c$, processes (local state and a command) $p$, and the top level system $s$, encompassing a shared state and all processes. We assume a set of variables $Var$, divided into locals (registers) and globals. By convention we use $r, r_1, r_2$, etc., to name local variables, and unless otherwise stated, $x, y, z$ for global variables. A state $\sigma$ is a mapping from $Var$ to values, with the notation $\sigma[x := v]$ representing an update of $\sigma$ to map $x$ to $v$. Below $x$ is a variable (shared or local) and $e$ an expression.

\[
\begin{array}{rcl}
\alpha & ::= & x := e \mid [e] \\
c & ::= & \mathbf{nil} \mid \alpha ; c \mid \alpha \cdot c \mid c_1 \sqcap c_2 \mid \mathbf{while}\ b\ \mathbf{do}\ c \\
p & ::= & (\mathbf{local}\ \sigma \bullet c) \\
s & ::= & (\mathbf{global}\ \sigma \bullet p_1 \parallel p_2 \parallel \ldots)
\end{array}
\]

An action may be an update $x := e$ or a guard $[e]$. For weak memory models the set of actions may also include fences (memory barriers); we introduce an abstract barrier in later sections. Commands include the terminated command $\mathbf{nil}$, prefixing, choice, and iteration. We also include the abstract command type for “true prefixing”, $\alpha \cdot c$, where reordering is forbidden, i.e., $\cdot$ is prefixing in the usual CSP [20] and CCS [31] sense. For brevity, for a command $\alpha ; \beta ; \mathbf{nil}$ we omit the trailing $\mathbf{nil}$ and just write $\alpha ; \beta$. A process encapsulates a command within a local state $\sigma$ (total on local variables), representing registers. A system is structured as the parallel composition of processes sharing a global state, each with their own values for local variables.

A relevant subset of the operational rules are given in Fig. 1. Transitions are labelled with the syntax of the transition, i.e., assignments and guards, with the addition of the silent label $\tau$, modelling an internal step of a process with no effect on the context.
For brevity and ease of explanation we tend to focus on rules involving guards of a particular form, $[x = v]$, which represents a load of $x$ when $x = v$. The more general rules are given in [12]. We omit some rules, such as terminating rules like $(\mathbf{local}\ \sigma \bullet \mathbf{nil}) \xrightarrow{\tau} \mathbf{nil}$.

Rule 1 (Prefix)
\[
(\alpha ; c) \xrightarrow{\alpha} c \quad (a)
\qquad
\frac{c \xrightarrow{\beta} c' \qquad \alpha \overset{r}{\Leftarrow} \beta_{\langle\alpha\rangle}}{(\alpha ; c) \xrightarrow{\beta_{\langle\alpha\rangle}} (\alpha ; c')} \quad (b)
\]
Rule 2 (Choice)
\[
c \sqcap d \xrightarrow{\tau} c \qquad c \sqcap d \xrightarrow{\tau} d
\]
Rule 3 (While)
\[
\mathbf{while}\ b\ \mathbf{do}\ c \xrightarrow{\tau} \mathbf{if}\ b\ \mathbf{then}\ (c ; \mathbf{while}\ b\ \mathbf{do}\ c)\ \mathbf{else}\ \mathbf{nil}
\]
Rule 4 (Locals)
\[
\frac{c \xrightarrow{r := v} c'}{(\mathbf{local}\ \sigma \bullet c) \xrightarrow{\tau} (\mathbf{local}\ \sigma[r := v] \bullet c')}
\]
Rule 5 (Locals/store)
\[
\frac{c \xrightarrow{x := r} c' \qquad \sigma(r) = v}{(\mathbf{local}\ \sigma \bullet c) \xrightarrow{x := v} (\mathbf{local}\ \sigma \bullet c')}
\]
Rule 6 (Locals/load)
\[
\frac{c \xrightarrow{r := x} c'}{(\mathbf{local}\ \sigma \bullet c) \xrightarrow{[x = v]} (\mathbf{local}\ \sigma[r := v] \bullet c')}
\]
Rule 7 (Locals/guard)
\[
\frac{c \xrightarrow{[e]} c'}{(\mathbf{local}\ \sigma \bullet c) \xrightarrow{[e_\sigma]} (\mathbf{local}\ \sigma \bullet c')}
\]
Rule 8 (Parallel)
\[
\frac{p_1 \xrightarrow{\alpha} p_1'}{p_1 \parallel p_2 \xrightarrow{\alpha} p_1' \parallel p_2}
\qquad
\frac{p_2 \xrightarrow{\alpha} p_2'}{p_1 \parallel p_2 \xrightarrow{\alpha} p_1 \parallel p_2'}
\]
Rule 9 (Globals/store)
\[
\frac{p \xrightarrow{x := e} p'}{(\mathbf{global}\ \sigma \bullet p) \xrightarrow{\tau} (\mathbf{global}\ \sigma[x := e_\sigma] \bullet p')}
\]
Rule 10 (Globals/load)
\[
\frac{p \xrightarrow{[x = v]} p' \qquad \sigma(x) = v}{(\mathbf{global}\ \sigma \bullet p) \xrightarrow{\tau} (\mathbf{global}\ \sigma \bullet p')}
\]
Fig. 1. Semantics of the language

Rule 1 is the key rule that allows later instructions to happen earlier, according to an architecture-specific reordering relation $\overset{r}{\Leftarrow}$. For instance, for TSO, the main part of the reordering relation is that loads can come before stores, i.e., $x := 1 \overset{r}{\Leftarrow} r := y$, while $\alpha \overset{r}{\nLeftarrow} \beta$ for all other instruction types. Relations for TSO, ARM and POWER are given in [12]. To avoid distraction in this paper we assume sequential consistency, i.e., $\alpha \overset{r}{\nLeftarrow} \beta$ for the basic instruction types, with the exception that $\tau$ steps can be reordered (allowing future local calculations to be executed ahead of time). In Rule 1 the notation $\beta_{\langle\alpha\rangle}$ accounts for forwarding, where in a case such as $x := 1 ; r := x$ the instruction $r := x$ can take effect before $x := 1$ provided the value 1 is forwarded to $r$, meaning that $r := 1$ is executed (rather than $r := x$, which it would not be sensible to execute before $x := 1$ from a sequential semantics perspective). Forwarding is defined straightforwardly in [12], and we do not repeat it here. The semantics for true prefixing, $\alpha \cdot c$, is given by an equivalent version of Rule 1(a).

Rule 2 is straightforward for nondeterministic choice. In Rule 3 we unfold a loop into a conditional; the definition of conditional in a speculative context is crucial and is deferred until Sect. 3.2. Rule 4 covers the case of some change to the local registers. This is an internal step of the process and is a silent $\tau$ step at the global level. Rule 5 applies when a store $x := r$ is executed by a process: the local value $v$ for $r$ is substituted so that the label $x := v$ is promoted to the global state (this rule can be generalised to cover any assignment of the form $x := e$ [10]). Rule 6 states that when a load $r := x$ instruction is executed internally it becomes a load of $x$, i.e., a guard $[x = v]$, for any value $v$.
Although there is a transition $[x = v]$ for every possible $v$, only the guard with the correct value for $x$ will be possible at the system level (via Rule 10). The loaded value becomes the new value for $r$ in the local state. Rule 7 states that a guard is evaluated with respect to the registers, and is promoted for evaluation with respect to the global state. Rule 8 gives the usual interleaving model of concurrency. Rules 9 and 10 straightforwardly update and access the global store via promoted stores (Rule 5) and loads (Rule 6).

Refinement ($\sqsubseteq$) is defined so that $c \sqsubseteq d$ iff all terminating traces of $d$ are also traces of $c$, ignoring subsequences of internal ($\tau$) steps. Terminating traces are those retrieved from the operational semantics where eventually $\mathbf{nil}$ is reached. (For simplicity we ignore non-terminating behaviours, that is, for this paper we consider only partial correctness, which is sufficient for detecting Spectre-like attacks.) Note that if a behaviour is blocked (no rules are applicable, e.g., a false guard) it is not considered terminating. This eliminates behaviours where the wrong branch is incorrectly taken (as opposed to incorrectly speculated), as discussed in more detail in [12].

We lift reasoning from the operational to refinement level via Law 11, which allows us to straightforwardly derive Law 13. More specific laws may also be straightforwardly derived, such as resolving nondeterminism via Law 12, and Law 14 that hides local effects, exposing a process’s global effect; this helps later to abstract from the details of transient speculative contexts.

\[
\begin{array}{rclr}
c \xrightarrow{\alpha} c' & \Rrightarrow & c \sqsubseteq \alpha \cdot c' & (11) \\
c_1 \sqcap c_2 & \sqsubseteq & c_1 & (12) \\
\alpha ; c & \sqsubseteq & \alpha \cdot c & (13) \\
(\mathbf{local}\ \{r \mapsto 1\} \bullet x := r) & \sqsubseteq & x := 1 & (14)
\end{array}
\]

IMP-ro-spec
From the perspective of functional correctness, speculative execution may be ignored: in the case where a process speculates along the branch that is eventually taken (after the conditional is evaluated) implementations ensure that speculated instructions are committed in a consistent order; and when speculation was [...] IMP-ro to expose them. For convenience we call the extended language IMP-ro-spec, which defines conditionals to expose (incorrect) speculative execution, and records operations on the cache in a global variable. Speculation occurs within a transient context, which is discarded if speculation is found to be incorrect.

(Footnote: In this paper we assume a multicopy atomic storage system; for memory models which lack this (e.g., POWER) the storage system described in [12] may be used.)
IMP-ro-spec
Speculative execution. We introduce three new commands to capture speculative execution in IMP-ro.

\[
\begin{array}{rclr}
\alpha & ::= & \ldots \mid \mathbf{specfence} & (15) \\
c & ::= & \ldots \mid \mathbf{spec}(c) \mid c_1 \mathbin{\triangle} c_2 \mid (\mathbf{buf}\ \sigma \bullet c) & (16) \\
\tilde{c} & \mathrel{\widehat{=}} & (\mathbf{buf}\ \varnothing \bullet (\mathbf{local}\ \sigma \bullet c)) & (17) \\
\mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2 & \mathrel{\widehat{=}} & \mathbf{spec}(\tilde{c_2}) \mathbin{\triangle} ([b] ; c_1) \;\sqcap\; \mathbf{spec}(\tilde{c_1}) \mathbin{\triangle} ([\lnot b] ; c_2) & (18)
\end{array}
\]

The instruction type $\mathbf{specfence}$ blocks load speculation; this is an abstract command type that may correspond to, for instance, the LFENCE command of Intel architectures [21]. We include it to demonstrate the relevance of reordering relations and how mitigation techniques can be considered in our framework. A speculation command $\mathbf{spec}(c)$ gives the effect of executing command $c$ speculatively, that is, no effects on the global or local state can be seen; however, there can be cache side effects based on the steps of $c$. A partial pre-execution command $c_1 \mathbin{\triangle} c_2$ partially executes $c_1$ before $c_2$ begins. The initial command $c_1$ may not execute at all, execute to completion, or partially execute. It is the well-known CSP “interrupt” operator, but we rename it in this context to avoid confusion with hardware interrupts. The transient buffer command $(\mathbf{buf}\ \sigma \bullet c)$ is used to keep track of modifications to globals executed speculatively.

We also introduce the abbreviation $\tilde{c}$ which creates the transient context for a speculative execution of $c$, that is, a (temporary) mapping of (all) registers, and an initially empty transient buffer (17). The values for the speculative copy of registers $\sigma$ created here are left unconstrained and may differ from the actual local state in the outer context; this accounts for different strategies that different architectures may take. Because the specifics of the local state are not relevant for reasoning about Spectre we do not model a specific strategy, which could be given by adding an explicit transition that sets up the local state according to the current context.
A speculative execution of code $c$ is of the form $\mathbf{spec}(\mathbf{buf}\ \sigma_b \bullet (\mathbf{local}\ \sigma_l \bullet c))$, where a copy of the locals is encapsulated in $\sigma_l$, stores to globals are encapsulated in $\sigma_b$, and the outer speculative command generates the cache side effects. An example of how they interact is given in Sect. 3.3.

Speculation is evident at branch points, and hence we model conditionals differently. Whereas in [12] a conditional $\mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2$ was defined in the standard way as $([b] ; c_1) \sqcap ([\lnot b] ; c_2)$ here we extend the definition to potentially pre-execute speculation down the alternative branches as given in (18). This says there are two possibilities: speculatively execute the second branch (ignoring the guard) up until the point where the first branch is chosen, or speculatively execute the first branch until the point the second branch is chosen. These two possibilities cover all behaviours relevant in the context of Spectre; as far as is known speculation down the eventually correct branch has no impact on the security of the system that is not already visible through other analysis techniques, e.g., information flow [33]. However, speculation down the correct branch is straightforward to capture, as discussed in Appendix A.

To explain the relevance of the transient context (initialised in (17)) consider the execution of $\mathbf{spec}(x := 1 ; r := x ; \ldots)$. The effect of $x := 1$ must not be seen globally (as it is difficult to unwind), however during speculation $r$ must use the value 1. If instead $r$ was to use a value of $x$ loaded from main memory this would violate local consistency (see [1]). This detail is especially important if $r$ is used in later (speculated) calculations, including future branches. In our approach it emerges from the semantics that $x$ is not loaded nor drawn into the cache during speculation of the above code.
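The behaviour of $\mathbf{spec}(x := 1 ; r := x ; \ldots)$ just described can be imitated with a few lines of Python. This is a minimal sketch under simplifying assumptions, not the simulation tool used in the paper: the function name `speculate` and the tuple encoding of instructions are hypothetical, and the initial register values are passed in explicitly rather than left unconstrained.

```python
def speculate(instrs, global_mem, local_regs):
    """Run instructions in a transient context.

    Returns the list of globals whose load would cause a cache fetch, and the
    transient register values. `global_mem` is never written to.
    """
    buf = {}                  # transient buffer: speculated stores to globals
    regs = dict(local_regs)   # private copy of the registers
    fetched = []
    for op, dst, src in instrs:
        if op == "store":     # global dst := src (a register name or a literal)
            buf[dst] = regs.get(src, src)
        elif op == "load":    # register dst := global src
            if src in buf:
                regs[dst] = buf[src]         # serviced by the buffer: no fetch
            else:
                fetched.append(src)          # exposed as a cache side effect
                regs[dst] = global_mem[src]
    return fetched, regs


global_mem = {"x": 0, "z": 42}
fetched, regs = speculate([("store", "x", 1), ("load", "r", "x")], global_mem, {})
```

Here `fetched` is empty and `r` holds the buffered value 1, so `x` is not drawn into the cache; a speculated load of `z` alone would instead be reported in `fetched`.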
A purely syntactic approach to determining the effect of speculative execution might conclude that $x$ is added to the cache, and hence could be overly pessimistic from a security analysis perspective.

Nested speculation, which may arise from nested conditionals or a speculated loop, is straightforward in our framework; a new, nested, transient context is created, and if an inner speculation attempts to load a global which the outer speculation has buffered then the cache effect is removed (see Rule 24(e)).

The cache.
The cache is modelled as a single global variable $cache$, kept in the shared state, which holds a set of type $Addr$, representing addresses (for this work we do not care what values are in the cache; however it is straightforward to modify the type of $cache$). We assume an uninterpreted function $\& : Var \to Addr$ such that $\&x$ returns the address of the (global) variable $x$. We introduce three operations on $cache$ to model cache side channels abstractly: cache fetching (adding something to the cache), cache clearing (clearing the (entire) cache), and cache querying (checking if an address is in the cache). Other explicit cache operations could be added, but these are sufficient for modelling the attack patterns utilised to instrument Spectre attacks [25].

\[
\begin{array}{rclr}
cache \mathrel{+}= x & \mathrel{\widehat{=}} & cache := cache \cup \{\&x\} & (19) \\
\mathbf{cclear} & \mathrel{\widehat{=}} & cache := \varnothing & (20) \\
x \in cache & \Leftrightarrow & \&x \in cache & (21)
\end{array}
\]

As these are abbreviations for updates to and guards on a global variable they fit in with the framework introduced in Sect. 2. A cache fetch represents the side effect of a speculated load. The instruction $\mathbf{cclear}$ captures abstractly flushing as well as eviction of particular cache lines as it ensures that a certain address is not present in the cache any more.

The variable $cache$ is kept in the global state and hence is shared between all processes. An alternative would be to explicitly model it as a separate construct, e.g., $(\mathbf{cch}\ C \bullet c)$ where $C$ is a set of addresses. This approach would allow more fine-grained control over cache levels, e.g., each process could have its own L1 cache, with some subset sharing an L2 cache, with the L3 cache at the top level.

\[
(\mathbf{cch}\ L3 \bullet (\mathbf{cch}\ L2_a \bullet (\mathbf{cch}\ L1_1 \bullet p_1) \parallel (\mathbf{cch}\ L1_2 \bullet p_2)) \parallel (\mathbf{cch}\ L2_b \bullet \ldots))
\]

We are interested in the worst case behaviour of the cache, where it leaks private information, and are not concerned with the specifics of how that may happen. However details of the cache, such as its update policy, may also be captured with extra machinery. In that sense our model of the cache is an abstraction of the underlying microarchitecture implementation, which could be verified using data and action refinement techniques [19,32,3].

IMP-ro-spec
Partial pre-execution.
The semantics of a partial pre-execution process is based on that of the interrupt operator from CSP [20].
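As an informal reading of the interrupt-style behaviour, the terminating traces of a partial pre-execution can be enumerated from finite trace sets of its two components. The Python sketch below is illustrative only: the function name `pre_execution_traces` is hypothetical, traces are plain tuples of labels, and it assumes every trace of the second command is non-empty (as is the case for a guarded branch), since the interruption is triggered by a step of that command.

```python
def pre_execution_traces(traces_c1, traces_c2):
    """Terminating traces of 'c1 partially pre-executed before c2'.

    Each result is some prefix of a trace of c1 (possibly empty, possibly the
    whole trace) followed by a complete trace of c2.
    Assumes every trace in traces_c2 is non-empty.
    """
    result = set()
    for t1 in traces_c1:
        for k in range(len(t1) + 1):        # c1 is cut off after k steps
            for t2 in traces_c2:
                result.add(tuple(t1[:k]) + tuple(t2))
    return result


traces = pre_execution_traces({("a", "b")}, {("g", "c")})
```

For the single traces `a, b` and `g, c` this yields three behaviours: `g, c` (no pre-execution), `a, g, c` (interrupted after one step) and `a, b, g, c` (pre-execution ran to completion).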
Rule 22 (Partial pre-execution)
\[
\frac{c_1 \xrightarrow{\alpha} c_1'}{c_1 \mathbin{\triangle} c_2 \xrightarrow{\alpha} c_1' \mathbin{\triangle} c_2} \quad (a)
\qquad
\frac{c_2 \xrightarrow{\alpha} c_2'}{c_1 \mathbin{\triangle} c_2 \xrightarrow{\alpha} c_2'} \quad (b)
\]
For commands of the form $\mathbf{spec}(c) \mathbin{\triangle} d$ the speculation of $c$ occurs for some period of time (Rule 22(a)) before discarding the computation and starting down the $d$ branch (Rule 22(b)). The arbitrariness of when $c_2$ starts captures the unknown time at which speculation may be found to be incorrect. We make use of the following law that covers the interruption occurring after a single action.

\[
(\alpha \cdot c_1) \mathbin{\triangle} c_2 \sqsubseteq \alpha \cdot c_2 \qquad (23)
\]

Transient buffers.
Transient buffers catch stores and record them in a state; recorded values may be used for speculative computations.

Rule 24 (Buffer)
\[
\frac{c \xrightarrow{x := v} c'}{(\mathbf{buf}\ \sigma \bullet c) \xrightarrow{\tau} (\mathbf{buf}\ \sigma[x := v] \bullet c')} \quad (a)
\]
\[
\frac{c \xrightarrow{[x = v]} c' \qquad (x \mapsto v) \in \sigma}{(\mathbf{buf}\ \sigma \bullet c) \xrightarrow{\tau} (\mathbf{buf}\ \sigma \bullet c')} \quad (b)
\qquad
\frac{c \xrightarrow{[x = v]} c' \qquad x \notin \mathrm{dom}(\sigma)}{(\mathbf{buf}\ \sigma \bullet c) \xrightarrow{[x = v]} (\mathbf{buf}\ \sigma \bullet c')} \quad (c)
\]
\[
\frac{c \xrightarrow{cache \mathrel{+}= x} c' \qquad x \in \mathrm{dom}(\sigma)}{(\mathbf{buf}\ \sigma \bullet c) \xrightarrow{\tau} (\mathbf{buf}\ \sigma \bullet c')} \quad (d)
\qquad
\frac{c \xrightarrow{cache \mathrel{+}= x} c' \qquad x \notin \mathrm{dom}(\sigma)}{(\mathbf{buf}\ \sigma \bullet c) \xrightarrow{cache \mathrel{+}= x} (\mathbf{buf}\ \sigma \bullet c')} \quad (e)
\]
Rule 24(a) states that (speculated) stores are recorded in the transient buffer; Rule 24(b) states that (speculated) loads are serviced by the buffer (similar to forwarding [12]) if a value is available; Rule 24(c) states that otherwise the load is promoted (to be handled by the global state via Rule 10). In cases where nested speculation has resulted in a cache fetch, Rule 24(d), similarly to Rule 24(b), hides a fetch of $x$ if a store of $x$ is in the buffer already; Rule 24(e) states that otherwise the cache fetch is promoted. In addition a transient buffer command promotes other instruction types not covered above (e.g., $\tau$, $\mathbf{specfence}$), and the rules do not need to cover registers since the transient buffer encloses a local state (17).

Speculation (down an incorrect path).
Speculation should have no observable effect on registers or globals (the “CPU state”), however in reality it may leave a footprint in the cache. The main concept is to make explicit a cache fetch with each speculated load.

Rule 25 (Speculative context)
\[
\frac{c \xrightarrow{[x = v]} c'}{\mathbf{spec}(c) \xrightarrow{cache \mathrel{+}= x} ([x = v] ; \mathbf{spec}(c'))} \quad (a)
\qquad
\frac{c \xrightarrow{\tau} c'}{\mathbf{spec}(c) \xrightarrow{\tau} \mathbf{spec}(c')} \quad (b)
\]
\[
\frac{c \xrightarrow{cache \mathrel{+}= x} c'}{\mathbf{spec}(c) \xrightarrow{cache \mathrel{+}= x} \mathbf{spec}(c')} \quad (c)
\qquad
\mathbf{spec}(\mathbf{nil}) \xrightarrow{\tau} \mathbf{nil} \quad (d)
\]
Rule 25(a) states that speculated loads of global variables have an initial side effect on the cache. The load is delayed until after the cache fetch. Rule 25(b) states that speculative execution can perform local computation. Rule 25(c) states that cache fetches are promoted (from nested speculation). Rule 25(d) states that speculation may silently complete. By omission, i.e., since there is no corresponding rule, speculation is blocked if $c$ executes a $\mathbf{specfence}$ command. We do not need to consider further action types, since speculation always encompasses a transient context out of which only loads and cache fetches are exposed.

Reordering of cache instructions.
The semantics of IMP-ro is instantiated for a particular memory model by defining the relation $\overset{r}{\Leftarrow}$, as used in Rule 1(b). We must therefore define the cases under which the new (cache-based) instruction types can be reordered. The concept of speculative execution is that loads can be initiated ahead of time, though they must still (appear to) conform to the particular memory model. However the cache fetches are not so constrained. We therefore allow cache fetch instructions to be reordered before the majority of instruction types.

\[
\begin{array}{lr}
y := e \;\overset{r}{\Leftarrow}\; cache \mathrel{+}= x \quad \text{iff } x, y \text{ distinct} & (26) \\
\mathbf{specfence} \;\overset{r}{\nLeftarrow}\; r := x \qquad \mathbf{specfence} \;\overset{r}{\nLeftarrow}\; cache \mathrel{+}= x & (27)
\end{array}
\]

Equation (26) states that a cache fetch of $x$ may occur earlier than loads, and stores of other variables (note that $x := 1 \overset{r}{\nLeftarrow} cache \mathrel{+}= x$ as the assignment will service the corresponding load, rather than memory). Equation (27) states that $\mathbf{specfence}$ instructions block loads and cache fetches. A potential mitigation for the Spectre vulnerability (short of turning off speculation entirely) is to insert (concrete) $\mathbf{specfence}$ instructions at the start of each potentially affected branch. However, this would have too great an impact on processor speed to be seriously considered as a blanket fix [30].

As an example of out-of-order execution with cache side effects consider a command of the following form, where $l_i$ are loads and $s_i$ are store instructions to distinct locations.

\[
l_1 ; l_2 ; (\mathbf{if}\ b\ \mathbf{then}\ l_3 ; s_1\ \mathbf{else}\ s_2 ; l_4)
\]

Speculation allows cache fetches to come earlier (out-of-order), although whether the loads themselves can come earlier than preceding loads depends on the architecture; ARM and POWER allow loads to be reordered, whereas TSO doesn’t [37,1]. Let $c_3$ be the cache fetch corresponding to load $l_3$. One possible behaviour, where the true branch is speculated before the false branch is executed, is given by the following sequence, which exposes the cache fetch for $l_3$.

\[
c_3 \cdot l_1 \cdot l_2 \cdot l_3 \cdot [\lnot b] \cdot s_2 \cdot l_4
\]

The cache fetch for $l_3$ occurs before the earlier loads, which, for some execution- and architecture-specific reason, have taken longer to resolve. Note $l_3$ itself occurs in an order consistent with the memory model.

For simplicity we enforce ordering on cache operations, though the framework is flexible (for instance, on Intel architectures cache flush instructions do not necessarily prevent pre-fetching [21]).

\[
\alpha \overset{r}{\nLeftarrow} \mathbf{cclear} \qquad \mathbf{cclear} \overset{r}{\nLeftarrow} \alpha \qquad x \in cache \overset{r}{\nLeftarrow} \alpha \qquad \alpha \overset{r}{\nLeftarrow} x \in cache
\]

We do not intend for these to be definitive, but rather develop a framework that is flexible enough to cope with different models.
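The reordering constraints above can be read as a predicate on pairs of instructions, from which the instructions that may execute next in a straight-line prefix sequence follow by Rule 1. The Python sketch below is one possible encoding, not the authors' Maude implementation: the tuple representation and the names `may_reorder` and `enabled` are hypothetical, forwarding and $\tau$ steps are ignored, and allowing a fetch to move before guards and other fetches is an assumption of the sketch.

```python
def may_reorder(alpha, beta):
    """True if the later instruction beta may take effect before alpha.

    Instructions are tuples: ("store", x), ("load", r, x), ("fetch", x),
    ("guard", b), ("specfence",), ("cclear",), ("query", x).
    """
    if beta[0] != "fetch":
        return False                  # sequential consistency for everything else
    kind = alpha[0]
    if kind in ("specfence", "cclear", "query"):
        return False                  # (27) and the ordering on cache operations
    if kind == "store":
        return alpha[1] != beta[1]    # (26): only past stores to other variables
    return True                       # loads; guards and fetches by assumption


def enabled(program):
    """Indices of instructions that may execute next (Rule 1, no forwarding)."""
    return [j for j, beta in enumerate(program)
            if all(may_reorder(alpha, beta) for alpha in program[:j])]


program = [("load", "r1", "a"), ("load", "r2", "b"),
           ("fetch", "z"), ("load", "r3", "z")]
next_steps = enabled(program)
```

For this program `next_steps` is `[0, 2]`: either the first load executes, or the cache fetch of `z` overtakes both pending loads, as in the trace beginning with $c_3$ above. Placing `("specfence",)` before the fetch removes the second option.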
In this section we show the particular behaviour of a conditional statement, where the true branch is (partly) speculated before the false branch begins. We construct the true branch, $branch_T$, so that it modifies some global $x$ and a register $r_1$, before loading $z$ into register $r_2$ and proceeding as $branch'_T$. A (partial) behaviour of $branch_T$ is given by (28).

\[
\begin{array}{l}
branch_T \mathrel{\widehat{=}} x := 1 ; r_1 := 2 ; r_2 := z ; branch'_T \\[4pt]
branch_T \xrightarrow{\;x := 1 \quad r_1 := 2 \quad r_2 := z\;} branch'_T \qquad (28)
\end{array}
\]

The trace ends with a load of $z$. We will take the case where globally $z$ has the value 42. Now consider speculating $branch_T$.

\[
\begin{array}{cl}
 & \mathbf{spec}(\widetilde{branch_T}) \\
= & \quad \text{Set up new transient context (17)} \\
 & \mathbf{spec}(\mathbf{buf}\ \varnothing \bullet (\mathbf{local}\ \sigma \bullet branch_T)) \\
\xrightarrow{\tau}{}^{*} & \quad \text{From (28), locally update } x \text{ by Rule 24(a) and } r_1 \text{ by Rule 4} \\
 & \mathbf{spec}(\mathbf{buf}\ \{x \mapsto 1\} \bullet (\mathbf{local}\ \sigma[r_1 := 2] \bullet r_2 := z ; branch'_T)) \\
\xrightarrow{cache \mathrel{+}= z} & \quad \text{Fetch } z \text{ (Rule 25(a)); arbitrarily assume } z \text{ is 42} \\
 & [z = 42] ; \mathbf{spec}(\mathbf{buf}\ \{x \mapsto 1\} \bullet (\mathbf{local}\ \sigma[r_1 := 2][r_2 := 42] \bullet branch'_T))
\end{array}
\]

The cache fetch has been exposed in the trace (the corresponding load $[z = 42]$ is pending). We abbreviate the remaining code as $branch''_T$, and may derive (29) by the above calculation and Law 11.

\[
\begin{array}{l}
branch''_T \mathrel{\widehat{=}} [z = 42] ; \mathbf{spec}(\mathbf{buf}\ \{x \mapsto 1\} \bullet (\mathbf{local}\ \sigma[r_1 := 2][r_2 := 42] \bullet branch'_T)) \\[4pt]
\mathbf{spec}(\widetilde{branch_T}) \sqsubseteq cache \mathrel{+}= z \cdot branch''_T \qquad (29)
\end{array}
\]

Now we show how the cache fetch in the true branch may be seen in behaviours where the false branch is taken.

\[
\begin{array}{cl}
 & \mathbf{if}\ b\ \mathbf{then}\ branch_T\ \mathbf{else}\ branch_F \\
\mathrel{\widehat{=}} & \quad \text{Definition 18} \\
 & \mathbf{spec}(\widetilde{branch_F}) \mathbin{\triangle} ([b] ; branch_T) \;\sqcap\; \mathbf{spec}(\widetilde{branch_T}) \mathbin{\triangle} ([\lnot b] ; branch_F) \\
\sqsubseteq & \quad \text{Arbitrarily choose false branch by Law 12} \\
 & \mathbf{spec}(\widetilde{branch_T}) \mathbin{\triangle} ([\lnot b] ; branch_F) \\
\sqsubseteq & \quad \text{by (29)} \\
 & (cache \mathrel{+}= z \cdot branch''_T) \mathbin{\triangle} ([\lnot b] ; branch_F) \\
\sqsubseteq & \quad \text{by Law 23} \\
 & cache \mathrel{+}= z \cdot ([\lnot b] ; branch_F)
\end{array}
\]

From the system’s perspective the speculation has had no effect: the assignment to $x$ was caught in the transient buffer, and then discarded, and the computations involving registers $r_1$ and $r_2$ became silent steps that did not affect the outer state. However, the cache has (potentially) been modified.

At the system level this gives the following behaviour, assuming global state $\sigma_g$ satisfies $\sigma_g(z) = 42$ and assuming $\sigma_g(cache) = C$ (the value for $x$ is irrelevant), and $\sigma_l$ is the local state (mapping $r_1$ and $r_2$).

\[
\begin{array}{cl}
 & (\mathbf{global}\ \sigma_g \bullet (\mathbf{local}\ \sigma_l \bullet \mathbf{if}\ b\ \mathbf{then}\ branch_T\ \mathbf{else}\ branch_F) \parallel \ldots) \\
\sqsubseteq & \quad \text{By the above derivation (note that neither } \sigma_g \text{ nor } \sigma_l \text{ are affected)} \\
 & (\mathbf{global}\ \sigma_g \bullet (\mathbf{local}\ \sigma_l \bullet cache \mathrel{+}= z \cdot ([\lnot b] ; branch_F)) \parallel \ldots) \\
\sqsubseteq & \quad \text{Execute instruction (Rule 1(a)), (19), Rule 9} \\
 & (\mathbf{global}\ \sigma_g[cache := C \cup \{\&z\}] \bullet (\mathbf{local}\ \sigma_l \bullet [\lnot b] ; branch_F) \parallel \ldots)
\end{array}
\]

The processes in ‘$\ldots$’ could include a malicious attacker that may be able to exploit the existence of $z$ in the cache. We give an example of this in the next section.

(Footnote: Note that $branch_T$ does not depend on any of the values it buffers/loads, and hence we may choose an arbitrary local $\sigma$; for other cases the choice of $\sigma$ may be important.)

\[
\begin{array}{cl}
 & \mathbf{spec}(\widetilde{r_1 := x ; r_2 := y}) \\
\sqsubseteq & cache \mathrel{+}= x \cdot [x = v_1] ; \mathbf{spec}(\widetilde{r_2 := y}) \\
\sqsubseteq & cache \mathrel{+}= x \cdot cache \mathrel{+}= y \cdot [x = v_1] ; [y = v_2] ; \mathbf{spec}(\widetilde{\mathbf{nil}})
\end{array}
\]

And hence by generalising Law 23 we may deduce the following.
\[
\begin{array}{rclr}
\mathbf{spec}(\widetilde{r_1 := x ; r_2 := y}) \mathbin{\triangle} c & \sqsubseteq & cache \mathrel{+}= x \cdot cache \mathrel{+}= y \cdot c & (30) \\
\mathbf{if}\ b\ \mathbf{then}\ r_1 := x ; r_2 := y\ \mathbf{else}\ c & \sqsubseteq & cache \mathrel{+}= x \cdot cache \mathrel{+}= y \cdot [\lnot b] ; c & (31)
\end{array}
\]

Cache-based timing attacks often utilise certain attack strategies to set up the cache as a covert or side channel to expose secret information. Generally, an attacker that shares a cache with a victim can observe through the variation in access time whether a particular memory address resides in the cache (a cache hit) and hence has been accessed previously, or not (a cache miss). To reduce noise on this covert channel, the attacker first “clears” the cache to make sure the memory address in question does not reside in the cache. This can be achieved by either flushing the cache line in question (some Intel architectures offer an instruction clflush), or by filling the cache with other content (by accessing physically congruent addresses in a large array [17]), so that due to the contention the memory addresses in question (if present) will be evicted. Both these options are captured in our model through the instruction $\mathbf{cclear}$ (as emptying the cache and filling the cache with other content amounts to the same desired effect).

For example consider the following code that iterates over the elements of an array $B$ to determine which of $B[i]$ is in the cache.

\[
Atk \mathrel{\widehat{=}} i := 0 ; (\mathbf{while}\ i < \ldots\ \mathbf{do}\ (\mathbf{if}\ B[i] \in cache\ \mathbf{then}\ r := i) ; i \mathrel{+}= 1) \qquad (32)
\]

If the attacker is trying to determine the value of some byte of data $d$, then under the assumption that $B[d] \in cache$ and, for all $i \neq d$, $B[i] \notin cache$, we have $r = d$ on termination.

The guard $B[i] \in cache$ is an abstraction of a timing attack that loads $B[i]$ and checks the amount of time against an architecture-specific threshold. For our level of analysis we do not need to explicitly model such detail; we care only that it is possible.

Flush+Reload [46] and Evict+Reload [17], two examples that follow the above pattern, can be used to target the last level cache (LLC), which is shared between cores, and hence work in cross-core as well as cross-VM settings [27]. In cases where a flush instruction is not available eviction is used to “clear” the cache. The following fundamental concepts of micro-architectures are exploited in these attack patterns [17]: 1) the LLC is shared amongst all CPUs; 2) the LLC is inclusive (i.e., contains all data that is stored in the L1 and L2 caches, hence modifications on the LLC influence caches on all other cores); 3) single cache lines are shared amongst processes on the same core; and 4) programs can map any other program binary/library into their address space.
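The clear-then-probe pattern above, with the probe loop of (32), can be mimicked over the abstract cache as a set of addresses. The following Python sketch is an illustration under stated assumptions: the names `atk` and `addr_of_b`, the loop bound `size` and the base address 1000 are hypothetical, and the victim's speculative footprint is simulated by a single insertion into the set rather than by timing a real load.

```python
def atk(cache, addr_of_b, size):
    """Abstract probe in the style of (32).

    Returns the last index i below `size` with the address of B[i] in the
    cache, or None if no probed address is present.
    """
    r = None
    i = 0
    while i < size:
        if addr_of_b(i) in cache:   # stands in for timing a load of B[i]
            r = i
        i += 1
    return r


def addr_of_b(i):
    """Hypothetical address function for B; distinct indices, distinct addresses."""
    return 1000 + i


d = 42                               # the byte the victim leaks (assumed)
cache = {addr_of_b(3), addr_of_b(200)}
cache.clear()                        # phase 1: cclear removes earlier footprints
cache.add(addr_of_b(d))              # phase 2: victim's speculative fetch of B[d]
recovered = atk(cache, addr_of_b, size=256)   # phase 3: probe
```

Because the clearing step leaves only the victim's fetch in the cache, `recovered` equals `d`; without the clearing step the stale entry for index 200 would be reported instead, which is the noise the first phase is meant to remove.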
Spectre attacks typically use an attack pattern based on those described above. In addition to setting up the cache as a channel, the attacker (mis)trains the branch predictor to speculate down the desired branch. Depending on the processor-specific branch prediction mechanism used, the training can occur by repeatedly running the code with "correct" input. When unexpectedly supplied with an "incorrect" input, the processor will (incorrectly) speculate down the desired branch, in which secret information is loaded from memory (e.g., by executing a memory access at an address that is chosen by the attacker); in other variants the attacker may leverage its own code to access the secret from within the same process, for instance, a webpage script run from within a browser process. In a third phase of the attack the timing difference between a cache hit and a cache miss is observed by the attacker, as in (32), allowing it to deduce the secret value.
An example of victim code that is susceptible to a Spectre attack is given below (following [24]). Assume that the attacker wishes to know the value of some data d, held at some address in the private space of the victim process V, which can be retrieved via variable k, i.e., d is at address &k. The attacker knows/calculates the address of k relative to the victim array A, which we will call χ, loading the value into r via an out-of-bounds index into A. That is, A[χ] = d. This private data is then used as an index into another array B.
  V ≙ r := χ ; n := A ; if r < n then ( r := A[r] ; r := B[r] )
We apply Law 31 to observe the potential effects of speculation.
  V ⊑ r := χ ; n := A ; cache += A[χ] (cid:5) cache += B[d] (cid:5) [¬(r < n)]
(We let &A[i] return a unique address for the array A at index i.)
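The footprint exposed by applying Law 31 to V can be illustrated with a small executable sketch. It is a simplification under stated assumptions, not the Maude encoding: memory is a flat Python list in which the private datum d sits directly after A, the cache is a set of integer addresses, mis-speculation is modelled by always performing the two cache fetches and discarding the loaded values when the guard fails, and all names (MEM, A_BASE, B_BASE, victim, attacker) and concrete values are hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical layout: A occupies MEM[0:4]; the private datum d follows it.
A_BASE, A_LEN = 0, 4
SECRET_D = 7                                 # d, outside the bounds of A
MEM: List[int] = [1, 2, 3, 0, SECRET_D]
B_BASE = 100                                 # address of B[0]
B: List[int] = [10 * i for i in range(16)]   # contents of B are arbitrary


def victim(chi: int, cache: Set[int]) -> Optional[int]:
    """V: r := chi; n := length of A; if r < n then (r := A[r]; r := B[r]).

    Both loads are performed transiently and leave cache footprints
    (cache += &A[chi], cache += &B[A[chi]]); their results are kept
    only if the bounds check r < n succeeds.
    """
    r, n = chi, A_LEN
    a_val = MEM[A_BASE + r]          # transient load of A[chi]
    cache.add(A_BASE + r)            # cache += &A[chi]
    b_val = B[a_val]                 # transient load of B[A[chi]]
    cache.add(B_BASE + a_val)        # cache += &B[A[chi]]
    if r < n:
        return b_val                 # correct speculation: result kept
    return None                      # guard fails: transient state dropped


def attacker(cache: Set[int]) -> Optional[int]:
    """Probe in the style of (32): find i with &B[i] in the cache."""
    r: Optional[int] = None
    for i in range(len(B)):
        if B_BASE + i in cache:
            r = i
    return r


if __name__ == "__main__":
    cache: Set[int] = set()              # cache cleared before the victim runs
    chi = A_LEN                          # out-of-bounds index that reaches d
    assert victim(chi, cache) is None    # nothing is returned functionally
    print(attacker(cache))               # prints 7
```

The victim returns no value for the out-of-bounds index, so the program is functionally unaffected, yet the address of B[d] remains in the cache and the probe recovers d.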
Let σ_l be the local state for V (A, n, r, r, r ∈ dom(σ_l)), and σ_g the global state (B ∈ dom(σ_g), σ_g(cache) = C); then we can derive the following refinement.
  ( global σ_g • ( local σ_l • V ) ∥ p ) ⊑ ( global σ_g[cache := C ∪ {&k, &B[d]}] • ( local σ′_l • nil ) ∥ p )
The data d does not appear explicitly in the shared state, but indirectly through a cache fetch. Note that the values of the variables whose addresses are in the cache are not accessible.
To infer d the attacker may perform an attack as given by Atk in (32). For simplicity here we assume
Atk and V share B, for instance if B is a read-only array of data shared by processes in a system; alternatively Atk does not need to share B, but rather needs to know where B maps to in a shared cache, and to map an array B_Atk of its own so that the addresses in the cache line up. At this level of abstraction we do not distinguish these alternatives. To establish the precondition that no element of B is in the cache, the attacker sets up the context to ensure that it executes a cclear before the victim's code is run. For instance, if the vulnerable code is in a function call provided by the server V, with the initial value of r passed as an argument,
  ( global {cache, ...} • cclear ; V(χ) ; Atk )
This pattern can be repeated; in fact χ need not be a specific address, as data from V's private space can be read consecutively byte-by-byte by incrementing χ on each attack.
Model checking.
We validated the semantics by encoding the refinement laws as an extension to the simulation tool described in [12], which is written in the Maude rewriting engine [8,42]. The refinement laws and auxiliary definitions (such as r ⇐ for cache fetches) were encoded straightforwardly. We then encoded the Spectre attacker and victim processes, extending the array A so that its contents went beyond its stated length to model an out-of-bounds index into private memory; the simulation runs showed that r = d is established in the attacker (32) in the cases where speculation is not interrupted in the victim until after the two cache fetches.
Cache side channels have been studied in the past decade (see [47,16] for an overview), and a number of tools have been developed to support the detection of vulnerabilities (e.g., [14,44,6,39]). However, these developments predate the publication of the Spectre vulnerability [25,24] and hence do not consider the effects of speculative execution. Since the effects of speculation do not affect the functional correctness of an implementation (the results of incorrect speculation are thrown away), they could be safely ignored in earlier work on the semantics of weak memory models (e.g., [12]). Detailed formal models of microarchitecture describe the interaction of the cache with processors [4], but are not readily integrated with language-level analysis techniques.
A model of speculative execution to study vulnerabilities and support the evaluation of software mitigations is presented in [30]. That work assumes a uniprocessor system and is not integrated with a weak memory model, and is designed to give a precise description of the behaviour of the microarchitecture. The work of [13] gives a model of execution that highlights speculative behaviours by explicitly modelling executions down false branches within a partially-ordered multiset graph-based model.
In contrast to our framework, they do not consider nested speculation, nor reorder speculated instructions.
A number of tools have been developed for detecting Spectre-vulnerable code and injecting fences to mitigate the danger [43,26,45], as well as information flow approaches to ensuring security in the presence of speculative execution [18,7]. The operational semantics underlying these approaches is less abstract than that presented in this paper, and the analysis is performed at the semantic level. The key difference of our work is that we encode speculative execution at the command level, and hence our framework supports algebraic, or refinement-based, reasoning.
The CheckMate tool [40] integrates a model of speculative execution into a weak memory model framework [29]. Since the work aims at the verification of microarchitectures, their model is set at that level and does not provide high-level properties such as Law 31 to support reasoning at the program level. Their tool is used to synthesise Spectre-style attacks and generate assembler test programs that can be used to determine if a particular processor is susceptible. We can potentially use these test programs to investigate the security implications within our more abstract framework. We have focused on cache effects from speculative loads; however, two variants of Meltdown and Spectre discovered by the CheckMate tool [40,41] work from speculative stores. On architectures where speculatively executed stores affect the cache we can adapt our semantics such that Rule 24(a) emits the appropriate cache-modifying action (rather than being a purely internal step).
We have captured the side effects of speculative execution down the wrong path with a relatively small extension to an existing framework for reasoning about weak memory models (out-of-order execution). To calculate speculated computations (beyond loads) we introduced a transient context, which is discarded in the case of incorrect speculation. In our semantic framework, in contrast to Plotkin-style semantics where states appear in the configuration of the operational rules [35], we expose the effect of a transition in its label. This simplifies semantic issues concerning redeclaration of variables (see [9,10] for a further discussion); operations on variables in the inner (transient) scope become silent τ steps that do not affect the variables in the outer scope, despite sharing the same names. Allowing early execution of speculated instructions was straightforward to specify in the reordering relation of IMP-ro [12].
Our intention is to allow abstract functional analysis techniques to be used alongside security analysis techniques, reusing existing tools. In particular, the information flow analysis framework in [34,33] has been extended to weak memory models [38] based on the reordering semantics of
IMP-ro [12]. We envisage a further extension of that work based on
IMP-ro-spec to find information leaks resulting from speculative execution (information flow approaches to speculative execution are also considered in [18,7]). We have aimed to provide just enough detail so that cache effects can be modelled, but not so much that the ability to derive generic algebraic laws (such as Law 31) is lost.
Acknowledgements.
We thank Samuel Chenoweth, Patrick Meiring, Mark Beaumont, Harrison Cusack and the anonymous reviewers for helping us improve the paper.
References
1. Jade Alglave, Luc Maranget, and Michael Tautschnig. Herding cats: Modelling, simulation, testing, and data mining for weak memory. ACM Trans. Program. Lang. Syst., 36(2):7:1–7:74, July 2014.
2. Jade Alglave, Luc Maranget, Susmit Sarkar, and Peter Sewell. Litmus: Running tests against hardware. In Parosh Aziz Abdulla and K. Rustan M. Leino, editors, Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, pages 41–44. Springer, 2011.
3. R. J. R. Back and J. von Wright. Trace refinement of action systems. In Bengt Jonsson and Joachim Parrow, editors, CONCUR '94: Concurrency Theory, pages 367–384, Berlin, Heidelberg, 1994. Springer Berlin Heidelberg.
4. Shiji Bijo, Einar Broch Johnsen, Ka I. Pun, and S. Lizeth Tapia Tarifa. An operational semantics of cache coherent multicore architectures. In Proceedings of the 31st Annual ACM Symposium on Applied Computing, SAC '16, pages 1219–1224, New York, NY, USA, 2016. ACM.
5. Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution. In USENIX Security Symposium, 2018.
6. Sudipta Chattopadhyay and Abhik Roychoudhury. Symbolic verification of cache side-channel freedom. CoRR, abs/1801.01203, 2018.
7. K. Cheang, C. Rasmussen, S. Seshia, and P. Subramanyan. A formal approach to secure speculation. In , pages 288–28815, June 2019.
8. Manuel Clavel, Francisco Duran, Steven Eker, Patrick Lincoln, Narciso Marti-Oliet, José Meseguer, and José F. Quesada. Maude: specification and programming in rewriting logic. Theoretical Computer Science, 285(2):187–243, 2002.
9. Robert Colvin and Ian J. Hayes. CSP with hierarchical state. In Michael Leuschel and Heike Wehrheim, editors, Integrated Formal Methods (IFM 2009), volume 5423 of Lecture Notes in Computer Science, pages 118–135. Springer, 2009.
10. Robert J. Colvin and Ian J. Hayes. Structural operational semantics through context-dependent behaviour. Journal of Logic and Algebraic Programming, 80(7):392–426, 2011.
11. Robert J. Colvin and Graeme Smith. A high-level operational semantics for hardware weak memory models. CoRR, abs/1812.00996, 2018.
12. Robert J. Colvin and Graeme Smith. A wide-spectrum language for verification of programs on weak memory models. In Klaus Havelund, Jan Peleska, Bill Roscoe, and Erik de Vink, editors, Formal Methods, pages 240–257, Cham, 2018. Springer International Publishing.
13. C. Disselkoen, R. Jagadeesan, A. Jeffrey, and J. Riely. Code that never ran: Modeling attacks on speculative evaluation. In Proc. of IEEE Symposium on Security and Privacy (S&P), 2019. To appear.
14. Goran Doychev, Boris Köpf, Laurent Mauborgne, and Jan Reineke. CacheAudit: A tool for the static analysis of cache side channels. ACM Trans. Inf. Syst. Secur., 18(1):4:1–4:32, June 2015.
15. Shaked Flur, Kathryn E. Gray, Christopher Pulte, Susmit Sarkar, Ali Sezgin, Luc Maranget, Will Deacon, and Peter Sewell. Modelling the ARMv8 architecture, operationally: Concurrency and ISA. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '16, pages 608–621, New York, NY, USA, 2016. ACM.
16. Qian Ge, Yuval Yarom, David Cock, and Gernot Heiser. A survey of microarchitectural timing attacks and countermeasures on contemporary hardware. Journal of Cryptographic Engineering, 8(1):1–27, 2018.
17. Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. Cache template attacks: Automating attacks on inclusive last-level caches. In , pages 897–912. USENIX Association, 2015.
18. Marco Guarnieri, Boris Köpf, José F. Morales, Jan Reineke, and Andrés Sánchez. SPECTECTOR: principled detection of speculative information flows. CoRR, abs/1812.08639, 2018.
19. Jifeng He, C.A.R. Hoare, and Jeff Sanders. Data refinement refined (resume). In Bernard Robinet and Reinhard Wilhelm, editors, ESOP 86, volume 213 of Lecture Notes in Computer Science, pages 187–196. Springer Berlin / Heidelberg, 1986.
20. C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1985.
21. Intel. Intel 64 and IA-32 Architectures Software Developers Manual, January 2019.
22. Saad Islam, Ahmad Moghimi, Ida Bruhns, Moritz Krebbel, Berk Gulmezoglu, Thomas Eisenbarth, and Berk Sunar. Spoiler: Speculative load hazards boost rowhammer and cache attacks, 2019.
23. Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek Dreyer. A promising semantics for relaxed-memory concurrency. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, pages 175–189, New York, NY, USA, 2017. ACM.
24. Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. CoRR, abs/1801.01203, 2018.
25. Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In , 2019.
26. P. Li, L. Zhao, R. Hou, L. Zhang, and D. Meng. Conditional speculation: An effective approach to safeguard out-of-order execution against Spectre attacks. In , pages 264–276, Feb 2019.
27. Moritz Lipp, Daniel Gruss, Raphael Spreitzer, Clémentine Maurice, and Stefan Mangard. Armageddon: Cache attacks on mobile devices. In , pages 549–564. USENIX Association, 2016.
28. Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Michael Hamburg. Meltdown: Reading kernel memory from user space. In USENIX Security Symposium, 2018.
29. Daniel Lustig, Michael Pellauer, and Margaret Martonosi. PipeCheck: Specifying and verifying microarchitectural enforcement of memory consistency models. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-47, pages 635–646, Washington, DC, USA, 2014. IEEE Computer Society.
30. Ross Mcilroy, Jaroslav Sevcik, Tobias Tebbi, Ben L. Titzer, and Toon Verwaest. Spectre is here to stay: An analysis of side-channels and speculative execution. CoRR, abs/1902.05178, 2019.
31. Robin Milner. A Calculus of Communicating Systems. Springer-Verlag New York, Inc., 1982.
32. C. Morgan and P. Gardiner. Data refinement by calculation. Acta Informatica, 27:481–503, 1990.
33. Toby C. Murray, Robert Sison, and Kai Engelhardt. Covern: A logic for compositional verification of information flow control. In , pages 16–30. IEEE, 2018.
34. Toby C. Murray, Robert Sison, Edward Pierzchalski, and Christine Rizkallah. Compositional verification and refinement of concurrent value-dependent noninterference. In IEEE 29th Computer Security Foundations Symposium, CSF 2016, pages 417–431. IEEE Computer Society, 2016.
35. Gordon D. Plotkin. A structural approach to operational semantics. J. Log. Algebr. Program., 60-61:17–139, 2004.
36. Susmit Sarkar, Peter Sewell, Jade Alglave, Luc Maranget, and Derek Williams. Understanding POWER multiprocessors. SIGPLAN Not., 46(6):175–186, June 2011.
37. Peter Sewell, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. X86-TSO: A rigorous and usable programmer's model for x86 multiprocessors. Commun. ACM, 53(7):89–97, July 2010.
38. Graeme Smith, Nicholas Coughlin, and Toby Murray. Value-dependent information-flow security on weak memory models, 2019. To appear, Formal Methods 2019.
39. Valentin Touzeau, Claire Maïza, David Monniaux, and Jan Reineke. Fast and exact analysis for LRU caches. Proc. ACM Program. Lang., 3(POPL):54:1–54:29, 2019.
40. Caroline Trippel, Daniel Lustig, and Margaret Martonosi. Checkmate: Automated synthesis of hardware exploits and security litmus tests. , pages 947–960, 2018.
41. Caroline Trippel, Daniel Lustig, and Margaret Martonosi. MeltdownPrime and SpectrePrime: Automatically-synthesized attacks exploiting invalidation-based coherence protocols. CoRR, abs/1802.03802, 2018.
42. Alberto Verdejo and Narciso Martí-Oliet. Executable structural operational semantics in Maude. Journal of Logic and Algebraic Programming, 67(1-2):226–293, 2006.
43. Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, and Abhik Roychoudhury. oo7: Low-overhead defense against spectre attacks via binary analysis. CoRR, abs/1807.05843, 2018.
44. Shuai Wang, Pei Wang, Xiao Liu, Danfeng Zhang, and Dinghao Wu. CacheD: Identifying cache-based timing channels in production software. In , pages 235–252. USENIX Association, 2017.
45. Meng Wu and Chao Wang. Abstract interpretation under speculative execution. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, pages 802–815, New York, NY, USA, 2019. ACM.
46. Yuval Yarom and Katrina Falkner. FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack. In USENIX Security Symposium (USENIX Security 14), pages 719–732. USENIX Association, 2014.
47. Yinqian Zhang. Cache side channels: State of the art and research opportunities. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS '17, pages 2617–2619. ACM, 2017.
A Speculation down the correct branch; parallel speculation
As far as is currently known, correct speculation has no security implications, and therefore we do not model such behaviours explicitly. However, if needed, we can capture this in several ways. For instance, a cache fetch can be associated with every load, whether inside or outside a speculation, similarly to Rule 25(a). Such semantics can be given by annotating each load that may exhibit this side effect.
  (r := x)^cache  --cache += x-->  r := x
Alternatively we could add the possibility of speculation down the eventually chosen branch as a choice.
  spec ( c ⊓ c ) △ ([ b ] ; c ) ⊓ spec ( c ⊓ c ) △ ([ ¬b ] ; c )
A more precise model that commits the transient context when correct speculation is found is possible, though significantly more complicated.
The concept of speculation down either branch can be extended straightforwardly to parallel speculation down multiple branches, for instance,
  ( spec ( c ) ∥ spec ( c )) △ ([ b ] ; c1