Speculative Interference Attacks: Breaking Invisible Speculation Schemes

Abstract

Recent security vulnerabilities that target speculative execution (e.g., Spectre) present a significant challenge for processor design. The highly publicized vulnerability uses speculative execution to learn victim secrets by changing cache state. As a result, recent computer architecture research has focused on invisible speculation mechanisms that attempt to block changes in cache state due to speculative execution. Prior work has shown significant success in preventing Spectre and other vulnerabilities at modest performance costs. In this paper, we introduce speculative interference attacks, which show that prior invisible speculation mechanisms do not fully block these speculation-based attacks. We make two key observations. First, misspeculated younger instructions can change the timing of older, bound-to-retire instructions, including memory operations. Second, changing the timing of a memory operation can change the order of that memory operation relative to other memory operations, resulting in persistent changes to the cache state. Using these observations, we demonstrate (among other attack variants) that secret information accessed by mis-speculated instructions can change the order of bound-to-retire loads. Load timing changes can therefore leave secret-dependent changes in the cache, even in the presence of invisible speculation mechanisms. We show that this problem is not easy to fix: Speculative interference converts timing changes to persistent cache-state changes, and timing is typically ignored by many cache-based defenses. We develop a framework to understand the attack and demonstrate concrete proof-of-concept attacks against invisible speculation mechanisms. We provide security definitions sufficient to block speculative interference attacks; describe a simple defense mechanism with a high performance cost; and discuss how future research can improve its performance.


Mohammad Behnia, Prateek Sahu ◦, Riccardo Paccagnella, Jiyong Yu, Zirui Zhao, Xiang Zou ‡, Thomas Unterluggauer ‡, Josep Torrellas, Carlos Rozas ‡, Adam Morrison †, Frank Mckeen ‡, Fangfei Liu ‡, Ron Gabor ⋄, Christopher W. Fletcher, Abhishek Basak ‡, Alaa Alameldeen ‡

University of Illinois at Urbana-Champaign, ◦ University of Texas at Austin, ‡ Intel Corporation, † Tel Aviv University, ⋄ Toga Networks

{mbehnia2,rp8,jiyongy2,ziruiz6,torrella,cwfletch}@illinois.edu, [email protected], {xiang.chris.zou,thomas.unterluggauer,carlos.v.rozas,frank.mckeen,fangfei.liu,abhishek.basak,alaa.r.alameldeen}@intel.com, [email protected], [email protected]


1. Introduction

Speculative execution attacks such as Spectre [29] and follow-on work [8, 11, 21, 28, 30, 34, 41, 51] have opened a new chapter in processor security. In these attacks, adversary-controlled transient instructions (i.e., speculative instructions bound to squash) access and then transmit potentially sensitive program data over microarchitectural covert channels (e.g., the cache [53], port contention [8]). For example, in Spectre variant 1, if (i < N) { j = A[i]; B[j]; }, speculative execution bypasses a bounds check due to a branch misprediction, accesses an out-of-bounds value (j = A[i]), and transmits that value through a cache-based covert channel (B[j]), i.e., by forcing a cache fill to occur in a set that depends on j. In this paper, we consider the illegally accessed value j to be the secret. Here, the attacker controls the value of i, thus j can be any value in program memory and the covert channel can reveal arbitrary program data.

While a variety of covert channels can be used to leak secret values under mis-speculation, cache-based covert channels [25, 29, 35, 52, 53, 54] make the fewest assumptions on the attacker and have therefore received the most attention. This is for two reasons. First, secret-dependent cache fills leave a persistent footprint in the cache which is observable long after speculation squashes. Second, certain levels of modern cache hierarchies are globally shared by all cores in the system, enabling attackers to observe said persistent state changes from other physical cores [33, 52]. By contrast, many other covert channels (e.g., arithmetic port contention [6, 8]) leave only intermittent side effects that must be monitored before the squash, and/or require that the attacker share hardware resources on the same physical core (e.g., branch predictor channels [4, 12, 13, 14]), both of which can be easily blocked (e.g., by disabling SMT).

The above view of the covert channel landscape has led to a surge of architecture-level “invisible speculation” proposals to block cache-based covert channels due to mis-speculations (e.g., InvisiSpec [51], SafeSpec [26], Delay-on-Miss [37], Conditional Speculation [31], MuonTrap [5]). Invisible speculation schemes add hardware to prevent mis-speculated loads from making persistent state changes to the memory subsystem. To maintain the performance benefits of caching, only non-speculative loads that are bound to retire are allowed to modify the cache state. To maintain the performance benefits of out-of-order execution, loads are allowed to “invisibly” execute (i.e., bring data directly to the core without filling the cache) and forward their results to dependent instructions.

In this paper we introduce speculative interference attacks, which show that invisible speculation schemes do not fully block speculation-based attacks that use the cache state. Our attacks are based on two key observations. First, mis-speculated instructions can influence the timing of older, bound-to-retire operations.
Second, if changing the timing of a memory operation changes the order of that memory operation with respect to other memory operations, the resulting re-ordering can cause persistent cache-state changes. Putting these together, we show (among other attack variants) how secret information accessed in a mis-speculated window influences the order of bound-to-retire loads, leaving secret-dependent state changes in the cache, even if invisible speculation is enabled.

To explain these ideas in more detail, consider a simple but representative invisible speculation scheme, Delay-on-Miss (DoM) [37]. DoM issues a speculative load and (a) on an L1 cache hit, forwards the load result to dependent instructions, or (b) on an L1 cache miss, delays servicing the miss and re-issues the load when it becomes non-speculative. In case (a), DoM does not update any replacement state (e.g., replacement bits) in the L1 cache until the load becomes non-speculative. For simplicity, we explain ideas assuming only branch instructions cast speculative shadows [37], i.e., a load is considered non-speculative/safe iff it is older than the oldest unresolved branch. We discuss attacks on more conservative DoM variants in § 3.

DoM’s (and other invisible speculation schemes’) stated security goal is to only focus on blocking cache state changes due to mis-speculations, while leaving other covert channels unblocked. This is reflected in DoM’s design. On the one hand, DoM prevents mis-speculated loads from directly changing the cache state. On the other hand, DoM allows mis-speculated loads to forward their results to dependent instructions, which can clearly form covert channels through intermittent state changes. For example, both whether a mis-speculated load hits or misses in the L1 cache, and the mis-speculated load’s return value, determine whether and how dependent mis-speculated instructions execute. This is exactly the basis for forming, e.g., arithmetic unit port contention covert channels [6, 8].

Figure 1: Speculative interference example. (a) Assume the code snippet is run on a processor protected by invisible speculation such as DoM and that &B[0] is cached while &B[64] is not cached. (b) This results in speculative dependent instructions conditionally contending for execution resources with non-speculative instructions, depending on the value of secret. (c), (d) If the non-speculative instructions are two loads, the contention can influence the order in which the loads are issued. Finally, the attacker can infer the secret based on the cache replacement state after the loads both issue.
    Panel (a) code:
        non-spec instrs;
        if (i < N) {                  // mispredict
            secret = A[i];            // M1
            k = B[secret*64];         // M2
            spec dependent instrs(k);
        }
    Panel (b): Frontend, Exe1, Exe2, CDB.
    Panel (c) non-spec instrs: ... = *X; ... = *Y;
    Panel (d): Y is MRU iff secret == 1.

This paper demonstrates how instructions that cause intermittent state changes can be leveraged to create persistent state changes in the cache. Consider the example in Figure 1, modeled after Spectre variant 1. Suppose this code is run on a processor using DoM. In Figure 1 (a), a mis-speculated load M1 forwards secret data secret to a second load M2 (①). A normal Spectre attack would monitor the cache state change left by M2 to deduce secret. To prevent this leak, DoM would prevent M2 from changing the cache state, specifically by allowing it to access and return data from the L1 if there is an L1 hit and delaying its execution otherwise. While this blocks the cache state change due to M2, M2 is allowed to forward its result when it completes (②), meaning that dependent instructions execute at a time that depends on secret (③). This has the potential to create a traditional non-cache-based covert channel, e.g., through execution unit port usage, which DoM ignores.

Our key observation is that secret-dependent timing effects caused by the dependent instructions can be monitored indirectly through how they interact with the execution of older non-speculative instructions: in this example, the instruction(s) before the mis-speculated branch (④). Although the non-speculative instructions come before the speculative dependent instructions in program order, out-of-order execution could have both of them executing concurrently and contending for resources as shown in Figure 1 (b).
For example, they may use EXE1 and EXE2, respectively, and contend for the common data bus in the same cycle. We call this speculative interference.

Next, we show how speculative interference can be used to bootstrap a change in the cache state. Specifically, suppose the non-speculative instructions are made up of two independent loads to addresses X and Y in different cache lines mapped to the same set, shown in Figure 1 (c). Since these loads are older than the mispredicted branch in program order, they are not protected by DoM. We show how, depending on the timing changes caused by the speculative dependent instructions, the order in which load X is issued with respect to load Y can change. That is, depending on a secret, the processor issues either the load to X followed by the load to Y, or Y followed by X.

To finish the attack (Figure 1 (d)), we show how changing the order of memory operations can be used to create persistent changes in the cache state, the intuition being that state in the cache (e.g., replacement bits) depends on not just what requests are made, but also their order.

This issue is not easy to fix. The crux of the problem is that timing changes can be converted to persistent state changes. These timing changes can arise due to interference through a large number of microarchitectural structures, through different instructions, etc. Further, while our example reorders two loads that originate from the same thread, there are many other memory address streams through which to interleave operations (e.g., interleaving instruction and data cache accesses, or accesses made across threads and security domains), which further widens the attack surface.

The rest of the paper expands on the above ideas as follows:
• We introduce and provide a framework to reason about speculative interference attacks, whereby subtle secret-dependent microarchitectural interference influences the behavior of older non-speculative instructions. We show how this can be used to create cache-based covert channels, even in the presence of invisible speculation schemes.
• We implement proof-of-concept exploits for three such attacks on a real machine, exploiting interference in the processor’s out-of-order issue logic, MSHR usage and frontend queues. All attacks are cache-based and work across physical cores.
• We provide a sufficiently strong security definition to block the attacks, and also provide a starting-point defense and discussion to set a research agenda for more efficiently and comprehensively addressing the problem.
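The intuition that cache state depends on the order of requests, and not only on which requests are made, can be illustrated with a minimal sketch. It assumes a single 2-way set with true-LRU replacement; the class `LRUSet`, the helper names, and the probe line `P` are hypothetical and only serve the illustration, since real replacement policies differ.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with true-LRU replacement (an assumed policy)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # least recently used line first

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)         # hit: line becomes MRU
        else:
            if len(self.lines) == self.ways:
                self.lines.popitem(last=False)   # miss in a full set: evict LRU
            self.lines[line] = True

    def mru(self):
        return next(reversed(self.lines))


def final_mru(order):
    """MRU line after the two loads issue in the given order."""
    cache_set = LRUSet(ways=2)
    for line in order:
        cache_set.access(line)
    return cache_set.mru()


def probe_evicts(order, probe="P"):
    """Line evicted when a receiver later touches one new line in the set."""
    cache_set = LRUSet(ways=2)
    for line in order:
        cache_set.access(line)
    before = set(cache_set.lines)
    cache_set.access(probe)
    return (before - set(cache_set.lines)).pop()


if __name__ == "__main__":
    for order in (["X", "Y"], ["Y", "X"]):
        print(order, "MRU:", final_mru(order), "evicted by probe:", probe_evicts(order))
```

Both orders leave the same two lines in the set, but they leave different replacement state, so a later probe evicts a different line. A receiver that can tell which line was evicted therefore learns the issue order.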

2. Background

We adopt the standard threat model used by invisible speculation schemes [5, 26, 31, 37, 51]. Such schemes care about preventing “persistent” side effects that are observable due to mis-speculated loads, for example, which lines are in the cache and the replacement and coherence state of each line.

Invisible speculation schemes further distinguish from where the attacker is monitoring the covert channel (i.e., where the receiver [27] runs). In particular, one of the first invisible speculation schemes, by Yan et al. [51], specifies several such attacker models:

SameThread model: Here we consider untrusted code interleaved with trusted code, as in the case of a sandbox.

CrossCore model: Here, the idea is that the system prevents untrusted code from running on a sibling hyper-thread. However, it may run on another core and monitor a cache-based covert channel through a shared cache.

We will show attacks against these models. Yan et al. [51] also specify an SMT model where the receiver runs in an adjacent hyper-thread. This gives the attacker more power, thus we will focus on the former two models.

Dynamically scheduled processors execute instructions out-of-order (OoO) to improve performance [20, 43]. An instruction is fetched by the processor frontend, dispatched to reservation stations (RS) for scheduling, issued to execution units (EUs) in the processor backend, and finally retired when it updates the machine’s architected state. Instructions proceed through the frontend, backend and retirement stages in order, possibly out of order, and in order, respectively. In-order retirement is implemented by queueing instructions in a reorder buffer (ROB) [24] in order and retiring a completed instruction when it reaches the ROB head.
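In-order retirement through the ROB can be sketched as a small model. It assumes, purely for illustration, that at most one instruction retires per cycle and that an instruction may retire one cycle after it completes; the function name `retire_schedule` and the instruction names are hypothetical.

```python
from collections import deque


def retire_schedule(program, done_at):
    """Return [(instr, retire_cycle)] for a toy ROB retiring one instruction per cycle.

    program: instruction names in program order.
    done_at: cycle at which each instruction finishes executing (any order).
    """
    rob = deque(program)      # the ROB holds instructions in program order
    last_retire = 0
    schedule = []
    while rob:
        head = rob.popleft()  # only the ROB head may retire
        # The head retires once it has completed and all older instructions have retired.
        last_retire = max(last_retire + 1, done_at[head] + 1)
        schedule.append((head, last_retire))
    return schedule


if __name__ == "__main__":
    # "add" and "mul" complete early but must wait behind the slow load.
    print(retire_schedule(["ld", "add", "mul"], {"ld": 10, "add": 2, "mul": 3}))
```

In this model a slow instruction at the ROB head holds back younger instructions that have already completed, which is why a load can execute for the first time only when it reaches the head of the ROB.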

Invisible speculation aims to block covert channels from forming through the cache due to mis-speculated loads [5, 26, 31, 37, 51]. While speculative execution attacks can leverage both cache-based and non-cache-based covert channels, invisible speculation schemes are only concerned with ones using the cache because cache-state changes are relatively simple to monitor. Specifically, cache-state changes persist after the squash and can be measured across physical cores (as in the CrossCore model, above). By contrast, non-persistent covert channels, such as those through execution units [19] and ports [8, 18], are more difficult to monitor and place additional restrictions on the attacker (e.g., limit the attacker to the SMT model [8]).

While there have been multiple invisible speculation proposals, they all share several common traits. For security, they prevent state changes to the cache due to mis-speculated loads, until those loads are deemed safe/become non-speculative. Performance hinges on several optimizations. First, mis-speculated loads should, subject to the constraint of not updating cache state, be allowed to issue and forward their results to dependent (mis-speculated) instructions. This allows these schemes to maintain efficiency in the common case that speculation turns out to be correct. Second, loads should be able to update cache state when they become non-speculative (safe). Together, these performance optimizations enable such schemes to reap the benefits of out-of-order execution and caching.

Invisible speculation schemes differ in their exact policies for allowing speculative loads to issue. We describe the “Delay-on-Miss” (DoM) [37] scheme here, as it is simple and illustrates the main ideas.

Delay-on-Miss (DoM) allows loads that hit in the L1 cache to execute and forward their results to dependent instructions (which are themselves allowed to execute). Any cache state change that would have been made as a by-product of the L1 cache hit (e.g., modifying replacement bits) is deferred until the load becomes non-speculative. If the load misses the L1 cache, it is delayed and re-executed when it becomes non-speculative.
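The DoM policy described above can be sketched as a toy L1 model. This is an illustrative sketch and not the design from [37]: the class `DelayOnMissL1`, its method names, and the line names are hypothetical, replacement state is modeled as a simple LRU-ordered list, and cache capacity is ignored.

```python
class DelayOnMissL1:
    """Toy L1 following the Delay-on-Miss rules for speculative loads."""

    def __init__(self, cached_lines):
        self.lines = list(cached_lines)  # replacement order: LRU first, MRU last
        self.deferred_hits = []          # replacement updates held back until safe
        self.delayed_misses = []         # speculative misses waiting to be re-issued

    def speculative_load(self, line):
        if line in self.lines:
            # Hit: data is forwarded to dependents, replacement update is deferred.
            self.deferred_hits.append(line)
            return "hit: data forwarded"
        # Miss: the request is not sent to the memory hierarchy yet.
        self.delayed_misses.append(line)
        return "miss: delayed"

    def squash(self):
        """Mis-speculation: pending loads vanish without touching the cache."""
        self.deferred_hits.clear()
        self.delayed_misses.clear()

    def resolve_safe(self):
        """Loads became non-speculative: apply deferred updates, re-issue misses."""
        for line in self.deferred_hits + self.delayed_misses:
            if line in self.lines:
                self.lines.remove(line)
            self.lines.append(line)      # line becomes MRU (capacity not modeled)
        self.deferred_hits.clear()
        self.delayed_misses.clear()


if __name__ == "__main__":
    l1 = DelayOnMissL1(["B0", "Z"])
    print(l1.speculative_load("B0"), "/", l1.speculative_load("B64"))
    l1.squash()
    print("after squash:", l1.lines)
```

After a squash the cache contents and replacement order are unchanged, which is the property DoM targets. The hit case still forwards data to dependent instructions, and the hit-or-miss outcome still changes when those dependents can run.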

3. Speculative Interference Attacks

While invisible speculation blocks direct cache-state changes by issued speculative loads, it does not restrict how secrets returned by those loads flow through the pipeline and impact the execution of non-memory instructions (§ 2.3). We now describe speculative interference attacks. The key insight is that secret-dependent resource usage patterns of non-memory speculative instructions can be transformed into cache-based covert channels. Specifically, a speculative interference attack exploits intermittent resource contention to influence the timing of non-speculative parts of the pipeline (§ 3.1). These timing effects are then used to make the non-speculative cache access pattern depend on the secret, creating a cache covert channel (§ 3.2).

We first present a framework for leaking a secret bit by making the secret determine the time at which an unprotected victim memory operation accesses the cache, modifying its state. See Figure 2(a) for a visualization. ① A mis-speculated access load reads a secret into the pipeline and forwards the secret to a sequence of mis-speculated instructions, called an interference gadget, which creates secret-dependent pressure on some microarchitectural resource(s). ② Contention on these resource(s) influences the timing of actions performed by a non-speculative part of the pipeline, called the interference target, making them encode the secret. ③ The target is chosen so that changing its timing creates a pipeline “ripple effect” that ultimately delays an unprotected victim memory access. That is, how quickly the target executes determines when the unprotected victim’s memory operation is issued and accesses the cache, thereby modifying the cache state.

Interference gadgets can be classified as one of three types, determined by how the secret is used to create resource contention interfering with the target’s execution. Figure 2(b) shows examples of the gadget types.

A Type 1 gadget forwards the secret to instruction(s) with operand-dependent resource usage patterns, called transmitters [56], which issue at a secret-independent time. The resulting secret-dependent resource pressure during the transmitters’ execute stage interferes with the target. Examples of transmitters include data-dependent arithmetic [19] or loads. Notice that a (speculative) load is considered a transmitter despite not modifying the cache state under invisible speculation. The reason is that its usage pattern of other resources, such as MSHRs, depends on its address operand.

In a Type 2 gadget, the secret is encoded through instructions issuing at a secret-dependent time. (This is in contrast to a Type 1 gadget, in which the contending instructions issue at secret-independent times but have secret-dependent resource usage during their execute stage.) Secret-dependent issue time is achieved by having the contending instructions be data-dependent on variable-latency instruction(s) that data-depend on the secret. Importantly, while the variable-latency instruction(s) are necessarily transmitter(s), here their secret-dependent resource usage does not interfere with the target. It is only used to influence when subsequent interfering instructions (which can be transmitters or non-transmitters) are issued for execution.

In a Type 3 gadget, whether or not the instructions interfering with the target execute depends on the secret. For example, if the gadget contains a branch with a secret-dependent predicate, then resolution of the branch “poisons” the branch predictor’s state with secret-dependent information. As a result, the branch’s prediction can become secret-dependent and determine whether the contending instructions execute in subsequent executions of the gadget.

All gadgets exploit the fact that invisible speculation does not “protect” a transmitter’s resource usage pattern, execution time, or branch prediction. Although the resulting secret-dependent execution behavior cannot be directly observed by the attacker (which monitors the cache), it can be indirectly observed through its influence on the non-speculative interference target, as discussed next.
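The three gadget types can be contrasted with a toy contention model. All cycle counts below are made-up values chosen for illustration, and `Contention`, `type1`, `type2`, `type3`, and `target_delay` are hypothetical names. The sketch follows the gadget shapes C(secret), x = f(secret); C(x), and if (secret): C() from Figure 2(b). The Type 3 case ignores the branch-predictor training the text describes and conditions execution directly on the secret.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contention:
    """How the interfering instruction(s) C occupy a shared resource."""
    executes: bool    # does C run at all?
    issue_cycle: int  # cycle at which C starts using the resource
    busy_cycles: int  # how long C holds the resource


def type1(secret):
    # C(secret): issue time is fixed, resource usage depends on the secret.
    return Contention(True, issue_cycle=5, busy_cycles=1 + 3 * secret)


def type2(secret):
    # x = f(secret); C(x): f's latency decides when C issues.
    f_latency = 2 if secret else 20
    return Contention(True, issue_cycle=5 + f_latency, busy_cycles=4)


def type3(secret):
    # if (secret): C(): the secret decides whether C executes.
    return Contention(bool(secret), issue_cycle=5, busy_cycles=4)


def target_delay(c, target_cycle=8):
    """Cycles a target that needs the resource at target_cycle has to wait."""
    if not c.executes:
        return 0
    end = c.issue_cycle + c.busy_cycles
    if c.issue_cycle <= target_cycle < end:
        return end - target_cycle
    return 0


if __name__ == "__main__":
    for name, gadget in (("Type 1", type1), ("Type 2", type2), ("Type 3", type3)):
        print(name, [target_delay(gadget(s)) for s in (0, 1)])
```

In each case the delay seen by the non-speculative target differs between secret = 0 and secret = 1, although the cause differs: how long the resource is held, when it is requested, or whether it is requested at all.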

Here, we design several interference gadget/targets, which illustrate Type 1 and 2 gadgets that delay D- and I-Cache accesses. § 4 implements the attack variants described in this section and shows that they lead to observable interference in practice. Our goal here is not to exhaustively enumerate all possible gadgets/targets, but to illustrate the problem. Exploring if (and which) other microarchitectural resources can be used to build interference gadgets is future work.

We first present two gadgets that delay an unprotected D-Cache access. We exploit the fact that in all invisible speculation designs, a load that executes only when it reaches the head of the ROB performs its D-Cache access unprotected. Our interference targets therefore arrange for the victim load’s address, or target address, to become available just as it reaches the head of the ROB. Our interference gadgets either delay the victim from reaching the ROB head or delay its execution (cache access) after it has reached the ROB head.

Next, we present a gadget that delays an unprotected I-Cache access. Such accesses are performed by InvisiSpec and DoM. We acknowledge that unprotected I-Cache access can form a cache-based covert channel in and of itself but describe this gadget due to its interesting property of interfering with the frontend and not with some instruction’s execution.

G^D_MSHR: Delay data access with MSHR contention.

This is a Type 1 gadget that delays the execution time of a load at the head of the ROB by a secret-dependent amount of time. Figure 3(a) shows the gadget and target. The target consists of the victim load, whose address operand takes Z cycles to generate. The value Z is such that the gadget’s instructions can issue while the target address is being generated. The gadget consists of M independent loads, where M is the number of L1 D-Cache miss status handling registers (MSHRs), each of which holds information on all the outstanding misses for some cache line. The gadget’s goal is to create secret-dependent MSHR pressure, to delay when the victim load obtains its data after reaching the head of the ROB and issuing. This gadget targets invisible speculation designs that issue speculative L1 D-Cache misses, i.e., InvisiSpec, SafeSpec, and MuonTrap. None of these designs specify changes to the MSHR allocation policy, so we assume they use the standard policy of allocating an MSHR to a missing load based on issue order. Figure 3(b) shows the attack timeline.

① If secret = 1, each gadget load accesses a different cache line. The attacker primes the cache so that each of these accesses is an L1 D-Cache miss. The result is that each load allocates a distinct MSHR, exhausting the available MSHRs. Thus, the victim load (assumed to be an L1 D-Cache miss) cannot issue and is delayed until one of the gadget loads completes or the mis-speculation is squashed.

② If secret = 0, all the gadget loads access the same cache line. They therefore use the same MSHR, which leaves MSHRs available for the victim load once it reaches the head of the ROB. The victim load can issue and is not delayed.

Figure 2: Speculative interference attack framework (a) and classification of interference gadgets (b). A = unprotected memory access; C = instruction(s) interfering with target.
    Panel (a) code:
        interference_target;
        if (i < N) {                        // mispredict
            secret = A[i];                  // access
            interference_gadget(secret);
        }
    Panel (b) gadget types:
        Type 1: gadget(secret): C(secret)
        Type 2: gadget(secret): x = f(secret); C(x)
        Type 3: gadget(secret): if (secret): C()

Figure 3: Delaying a load using miss status holding registers (G^D_MSHR). M is the number of L1 D-Cache MSHRs. Transmitters in the interference gadget are boxed in both the pseudo-code and timeline to show correspondence.
    Panel (a) code:
        A = f(z)                                    // takes Z cycles
        y = load(A)
        if (i < N):                                 // mispred. taken (miss on N)
            secret = load(&TargetArray[i])          // access
            // Interference Gadget
            x_0 = load(&S[secret * 64 * 0])
            x_1 = load(&S[secret * 64 * 1])
            ...
            x_{M-1} = load(&S[secret * 64 * (M-1)])
    Panel (b): timelines for secret == 1 (x_0..x_{M-1} differ; MSHRs full, load(A) stalled) and secret == 0 (x_0..x_{M-1} same; load(A) returns).

G^D_NPEU: Delay data access using non-pipelined EU.

This is a Type 2 gadget that creates a secret-dependent delay of the victim’s target address generation. Figure 4(a) shows the gadget and target. The target consists of a retirement-bound load whose operand, A, is generated by a dependent chain of instructions, denoted f. The gadget consists of a load (transmitter) and a sequence of independent instructions, denoted f', that depend on the load. (Here and elsewhere, the gadget executes in the shadow of a slow-to-resolve mispredicted branch, due to a cache miss on N.) Each instruction in f' uses the same EU as the target. This must be a non-pipelined EU, so that a gadget instruction being issued to the EU blocks a target instruction from issuing.

Figure 4(b) shows the attack timeline. The value z, on which the target address A depends, takes Z cycles to compute. The value Z is such that before z gets computed, there is enough time for the attack’s access load to read the secret, forward it to the interference gadget, and for the gadget’s transmitter load to return (if it is a cache hit).

① The transmitter load accesses a secret-dependent cache line. This load executes under invisible speculation protection, but it still retrieves the data from some level of the memory hierarchy. The attacker can therefore orchestrate for its execution time to be secret-dependent, by appropriately priming the cache prior to the attack.

② If secret = 1, the transmitter load returns quickly, just before the value z is produced. This makes the instruction sequence f' in the gadget ready, and its first instruction f'_1 is issued to the EU before f_1, the first instruction in f. Thus, when f_1 becomes ready, it is blocked from using the EU. When f'_1 completes, f_1 is issued (due to age-ordered scheduling). However, once f_1 completes, f_2 (which depends on f_1) does not immediately become ready, due to f_1’s writeback delay. In contrast, f'_2, which depends only on the transmitter, is already ready and so is issued to the EU. This creates a cascading effect in which each instruction f_i gets delayed, delaying the target address’ computation, until the mis-speculation is squashed.

③ If secret = 0, the transmitter load does not return before z is produced. (In delay-based invisible speculation designs [37], the load is never executed. In other designs [51], the load simply takes a long time to return compared to the hit case.) As a result, the target address is computed before the gadget’s interfering instructions execute (if they execute), and the victim load can issue.

Figure 4: Delaying a load using contention on a non-pipelined EU (G^D_NPEU). Instruction sequences f and f' use the same non-pipelined EU. Transmitters in the interference gadget are boxed in both the pseudo-code and timeline to show correspondence.
    Panel (a) code:
        z = ...                                     // takes Z cycles
        A = f(z)                                    // takes F cycles
        y = load(A)
        if (i < N):                                 // mispred. taken (miss on N)
            secret = load(&TargetArray[i])          // access
            // Interference Gadget
            x = load(&S[secret * 64])
            f'(x)
    Panel (b): timelines for secret == 1 (Load S[64] hits) and secret == 0 (Load S[0] miss).

Figure 5: Back-throttling the Fetch Unit by contending for RS for the G^I_RS gadget. sum += x repeats N (number of RS slots) times. Transmitters in the interference gadget are boxed in both the pseudo-code and timeline to show correspondence.
    Panel (a) code:
        if (i < N):                                 // mispredict taken (miss on N)
            secret = load(&TargetArray[i])          // access
            // Interference Gadget
            x = load(&S[secret * 64])
            // Congest RS
            sum += x; ... sum += x;                 // many times
        target instr.                               // Target instruction
    Panel (b): timelines for secret == 1 (Load S[64] miss) and secret == 0 (Load S[0] hit).

G^I_RS: Delay instruction fetch with RS contention.

This is a Type 1 gadget that fills the RS for a specific EU, which causes head-of-line blocking in the dispatch queue and forces the Fetch unit to stop I-Cache accesses for fetching instructions. Figure 5(a) shows the gadget; the target is an instruction fetch by the frontend, so it does not appear in the code. (The gadget's goal is to change when or if the target instruction is fetched.) The gadget consists of a load (transmitter) and a long sequence of arithmetic (ADD) instructions that depend on the load. Figure 5(b) shows the attack timeline.

(1) The transmitter load accesses a secret-dependent cache line, which is set up to hit or miss in the cache hierarchy, depending on the secret.

(2) If secret = 1, the transmitter load is a miss in the D-Cache. The dependent ADD instructions are fetched and fill up the RS slots, but do not issue. Once the RS is full, the pressure propagates back to the Fetch Unit and the frontend stalls. Hence, the target instruction is not fetched. Once the branch resolves, execution continues and the target instruction is fetched.

(3) If secret = 0, the transmitter load hits in the D-Cache. Its output is then quickly available to the dependent ADD instructions, which issue as soon as EU resources are available, hence freeing up the RS slots. Since the RS does not fill up, the frontend does not stop fetching. Consequently, the cache line holding the target instruction is fetched into the I-Cache.

We now show how to transform the basic attack primitive (§ 3.1), which creates a secret-dependent delay for an unprotected victim memory access, into a cache covert channel. The insight here is that we can transform a secret-dependent timing change (the delay in the victim's access) into a secret-dependent cache state change by using the delay to reorder the victim access with another (unprotected) reference memory access, which occurs at a fixed, secret-independent time. Conceptually, the reference access acts as a kind of "clock," helping the attacker to observe whether the victim load issues before or after some point in time.

Crucially, the only property required from the reference memory access is that its issue time does not depend on the secret. In particular, the reference access can be issued by the victim or by the attacker, depending on the specific attack. We use this property to determine whether an unprotected victim memory access A is delayed by determining whether A accesses the cache after or before the unprotected reference memory access, B, respectively. The cache state, σ, is determined by the sequence of memory accesses to the cache, α. We assume that σ is not commutative, i.e., $\sigma(\alpha, A, B) \neq \sigma(\alpha, B, A)$. Formally, therefore, making the order in which A and B access the cache secret-dependent makes the cache state secret-dependent, creating a cache covert channel.

Non-commutativity of the cache state holds for most cache architectures, particularly after sufficiently many past accesses, as long as both memory accesses target different cache line addresses that map to the same cache set. We denote these lines by A and B, according to the memory access that targets each of them. The memory access order impacts the set's replacement state (e.g., LRU bits), and can be observed by inducing evictions and monitoring which lines get evicted (by timing memory accesses).

Blocking replacement state-related leakage is explicitly in the scope of invisible speculation (e.g., [31, 37, 51]). However, we are not aware of such attacks being demonstrated in practice. In § 4, we demonstrate a covert channel based on the ordering of two LLC accesses on a commercial CPU with a sophisticated replacement policy. Thus, for the following discussion, we assume that achieving secret-dependent cache access order is equivalent to forming a covert channel.

Table 1: Invisible speculation designs vulnerability matrix. Columns list the pair of accesses with secret-dependent order.

Gadget   | V_D-V_D & V_I-V_D                                   | V_D-A_D                        | V_I-A_D
G^D_NPEU | InvisiSpec (Spectre), DoM (non-TSO), SafeSpec (WFB) | All                            | All
G^D_MSHR | InvisiSpec (Spectre), SafeSpec (WFB)                | InvisiSpec, SafeSpec, MuonTrap | InvisiSpec, SafeSpec, MuonTrap
G^I_RS   | –                                                   | –                              | InvisiSpec, DoM
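The non-commutativity assumption can be illustrated with a textbook LRU set. The following is a minimal sketch, not the replacement policy of any real CPU: the 4-way associativity, the line names P0, P1, E0..E2, and the helper `lru_state` are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


def lru_state(accesses, assoc=4):
    """Return the lines resident in one true-LRU set, ordered from LRU to MRU.

    `assoc` and the line names used below are illustrative assumptions.
    """
    cache = OrderedDict()
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # hit: line becomes most recently used
        else:
            if len(cache) >= assoc:
                cache.popitem(last=False)  # miss on a full set: evict the LRU line
            cache[line] = True
    return list(cache)


prefix = ["P0", "P1"]          # earlier accesses (the sequence alpha)
probe = ["E0", "E1", "E2"]     # attacker-induced evictions

state_ab = lru_state(prefix + ["A", "B"])
state_ba = lru_state(prefix + ["B", "A"])

# Same two accesses, different order: the replacement state differs.
order_is_visible = state_ab != state_ba

# After the attacker's evictions, only the later of A and B survives.
after_ab = lru_state(prefix + ["A", "B"] + probe)
after_ba = lru_state(prefix + ["B", "A"] + probe)
```

Here the attacker never observes which lines were accessed (both A and B are accessed in both runs), only which of them outlives a fixed number of evictions.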

We now combine the speculative interference gadgets (§ 3.1.1) with various types of reference memory accesses to obtain several complete attacks on different points in the invisible speculation design space. Each attack creates a cache covert channel by making the secret determine the order of two unprotected LLC accesses, which may be a victim data access (V_D), victim instruction fetch (V_I), or an attacker data access (A_D). (We use V and A to specify whether the victim or attacker thread, respectively, performs the access.) Table 1 summarizes which defenses are vulnerable to which attack combinations.

V_D-V_D ordering. This attack targets invisible speculation designs that may have multiple unprotected loads executing concurrently. For example, InvisiSpec and SafeSpec have modes that only defend against control-flow mis-speculation. In these modes, any load that becomes ready to execute when there are no unresolved branches older than it in the ROB performs an unprotected access [26, 51]. A similar case exists with DoM on architectures with a non-TSO memory consistency model. In this case, any load can execute without protection if all older branches have resolved and all older stores and loads have their addresses resolved [37].

We show how to base the attack on the G^D_MSHR or G^D_NPEU gadgets, by modifying the gadget's interference target so that the victim load A is followed (in program order) by a retirement-bound reference load B, whose issue time is not affected by the gadget. Due to space constraints, we fully describe the attack based on the G^D_NPEU gadget; the G^D_MSHR-based attack is similar. Figure 6 shows the modified target and the original gadget (Figure 4). Both A's and B's address generation depend on z. If secret = 0, A accesses the D-Cache before the reference load B, since the sequence of instructions that generates B, g(z), takes longer to complete than f(z). However, if secret = 1, there is speculative interference, so A's generation is delayed while B's is not, and load B accesses the D-Cache first.

Figure 6: Reordering victim loads by exploiting contention on a non-pipelined EU. Instruction sequences f and f' use the same non-pipelined EU. Instruction sequence g uses a different EU.

    z = ...                              // takes Z cycles
    A = f(z)                             // takes F cycles
    y = load(A)
    B = g(z)                             // takes G > F cycles
    z = load(B)
    if (i < N):                          // mispredict taken (miss on N)
        secret = load(&TargetArray[i])
        // Interference Gadget
        x = load(&S[secret * 64])        // secret=1 -> hit, secret=0 -> miss
        f'(x)

Footnote: Recent work [50] shows information leakage through cache LRU states, but its channels rely on more than the ordering of two accesses.

V_I-V_D ordering. Modifying the target in the V_D-V_D attack so that the branch condition i depends on load A makes the delay of load A also delay the branch's resolution time, i.e., when the squash occurs. This can change the order of a post-squash instruction fetch (which is unprotected, as it is on the correct execution path) with respect to load B.

V_D-A_D ordering. Many invisible speculation designs unprotect a load only when it becomes the oldest load or the oldest instruction in the ROB. This is the case in InvisiSpec's Futuristic mode [51], SafeSpec's wait-for-commit mode [26], Conditional Speculation [31], and MuonTrap [5]. These designs make it impossible to reorder unprotected victim loads, as no two such loads can execute concurrently. As noted above, however, the same effect (secret-dependent order) can be achieved if the attacker performs the reference access. For this, the attacker simply needs to issue an LLC access to the same set accessed by the V_D load from another core, at a fixed time after inducing the mis-speculation. This attack can be based on either of the G^D_MSHR or G^D_NPEU gadgets.

V_I-A_D ordering. As in the V_I-V_D case, the G^D_MSHR and G^D_NPEU gadgets can be used to target the branch condition, delaying a post-squash instruction fetch on the correct execution path. This can be measured using the attacker's LLC access as a reference clock. In contrast, the G^I_RS gadget only impacts the timing of instruction fetches in the mis-speculated path. Hence, the delay it introduces for instruction fetches can only be observed if I-Cache accesses are not protected by the invisible speculation scheme, as in InvisiSpec and DoM.
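The V_D-V_D reordering argument reduces to comparing two issue times. The sketch below is a toy timing model under stated assumptions, not a simulation of any real pipeline: the cycle counts Z, F, G and the `interference_delay` parameter are hypothetical, and the model only captures the condition that the gadget's delay must exceed the slack G - F.

```python
def issue_order(secret, Z=10, F=30, G=40, interference_delay=20):
    """Toy model of which victim load reaches the D-Cache first.

    Z, F, G mirror the cycle counts named in the pseudo-code; all numeric
    defaults and `interference_delay` are illustrative assumptions.
    """
    # A's address is ready after z (Z cycles) and f(z) (F cycles); when
    # secret == 1 the gadget's f' contends for f's EU and adds a delay.
    a_issue = Z + F + (interference_delay if secret == 1 else 0)
    # B's address comes from g(z) on a different EU, so it is unaffected.
    b_issue = Z + G
    return "A-B" if a_issue < b_issue else "B-A"


# The order flips only if the interference delay exceeds the slack G - F.
orders = {s: issue_order(s) for s in (0, 1)}
too_weak = issue_order(1, interference_delay=5)
```

With the defaults, `orders` maps secret 0 to "A-B" and secret 1 to "B-A"; with a delay smaller than G - F (the `too_weak` case) the order stays "A-B" and no bit is transmitted.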

Attack landscape summary

Every invisible speculation design we have evaluated is vulnerable to at least one of the attacks described above. Table 1 summarizes which designs are vulnerable to which attack combinations. The differences in security manifest in whether an attacker can reorder unprotected victim accesses, or must rely on its own access as a "reference clock."

We refer to the combination of an interference gadget and a target as a sender, i.e., the sending side of the cache covert channel. It is natural to ask if senders exist in "the wild," given their specific structure. There are several real-world attack settings [22] in which the attacker has some control over the instruction stream and can craft senders. These settings include (1) the in-domain setting, where a software sandbox executes attacker-controlled code, as in the case of in-browser JavaScript code or user-supplied Linux eBPF kernel extensions [38]; and (2) the domain-bypass setting, where the attacker runs its own program, attempting to use its mis-speculated execution to steal secrets from another hardware protection domain, e.g., Meltdown [32]. Finding whether speculative interference senders exist in victim programs is interesting future work.

Conceptually, however, even the fact that senders might exist creates uncertainty about security on an invisible speculation system. Users and developers cannot know if their program contains a sender without performing program analysis to verify that none exists. Having to rely on such analysis to guarantee security undermines the efficacy of invisible speculation as a software-transparent hardware defense.

4. Attack Demonstrations

In this section, we demonstrate concrete proof-of-concept (PoC) speculative interference attacks based on the ideas from § 3 on a commercial machine. Although invisible speculation schemes are not implemented today, we can emulate their behavior by arranging for loads that would be made 'invisible' to return data in secret-dependent amounts of time. At the same time, by evaluating on real hardware, we must address many details in real machines that are simplified in simulators (e.g., LLC replacement policies, RS limits).

We evaluate multiple D-Cache PoCs and a variation of the I-Cache PoC, based on the gadgets described in § 3.1.1, namely G^D_NPEU, G^D_MSHR and G^I_RS. All the PoCs were successfully implemented and the attacks successfully leak secret bits to the attacker. We only show the G^D_NPEU attack for space, and refer to it as the D-Cache PoC (§ 4.2). Of independent interest, our D-Cache PoC requires constructing a novel receiver able to read changes in replacement state for the QLRU_H11_M1_R0_U0 replacement policy (§ 4.2.2). All attacks change cache state, with a receiver (attacker) that monitors execution from another physical core (CrossCore; § 2.1).

We evaluate on an Intel Core i7-7700 Kaby Lake CPU with 4 physical cores running at a base frequency of 3.6 GHz, with hyper-threading enabled. Each core has a unified reservation station that is shared across execution units, stores up to 97 micro-ops, and has 8 execution unit ports (numbered 0 through 7). Each core has two levels of private cache (a 32 KB L1-instruction and 32 KB L1-data cache, 256 KB of combined L2) and 8 MB of shared L3 (LLC) cache [1, 2].

Tools borrowed from prior work.

We trigger branch mis-predictions by training the target branch in a given direction (similar to [29]). Likewise, we delay branch resolution by having the branch predicate be the result of a pointer chase. The attacks also use a Flush+Reload-style [53] receiver. Finally, the D-Cache PoC uses standard techniques to construct eviction sets in the LLC [33], which are sets of cache lines that map to the same LLC set in the same LLC slice. By accessing lines in an eviction set, the attacker can efficiently evict other lines whose set and slice are known.

Recall from § 3.1.1 that the key principle in the G^D_NPEU attack is for the attacker to observe the reordering of two bound-to-retire loads. Our PoC enables the attacker to measure this ordering by mapping the two loads to the same LLC set and measuring changes in replacement state.

To deploy the attack, two ingredients need to be developed. First (§ 4.2.1), an implementation of the G^D_NPEU sender, i.e., a way to reorder older bound-to-retire loads. Second (§ 4.2.2), a novel receiver capable of measuring differences in LLC replacement state. We consider both of these to be of independent interest: the sender can be adapted to reorder older non-load instructions and so perform different speculative interference attack variants, and the replacement state-based receiver can be used in entirely different attack settings.

To reorder the loads to two addresses A and B, we follow the structure from Figure 6. Namely, there are two sequences of instructions, f(z) and g(z), that generate addresses A and B respectively. An interference gadget only affects f(z). In the presence of the gadget, load A is delayed to issue after load B, whereas normally it would be issued before load B.

First consider the address generation of A and the interference gadget in isolation. We implement f(z) and f'(x) (Figure 6) as repeated sequences of the same instruction, called the target instruction and gadget instruction, respectively. We pick suitable instructions (i.e., ones that maximize the interference of the gadget on the target) as follows. We identify high-latency, low-throughput instructions that use the same execution port. Low throughput allows an issued instruction in the interference gadget to block the execution port for ready-to-schedule instructions in the interference target; high latency maximizes the time it blocks instructions in the interference target. Finally, the gadget instruction should be composed of only a few micro-ops. This allows more instructions in the interference gadget to occupy the RS simultaneously, which increases the likelihood of them being issued concurrently with the target instructions.

Based on the above process, our PoC uses the VSQRTPD instruction for both gadget and target. VSQRTPD consists of only 1 micro-op executed on the core's execution port 0 and has observed latencies of 15–16 cycles and reciprocal throughput of 9–12 cycles [15]. We also verified that the attack is functional with VDIVPD. Figure 9 shows the time from the issue of the first instruction of f(z) to the completion of the load A in the presence (interference) and absence (baseline) of the interference gadget's execution. The takeaway is that there is a clear timing difference in the interference target's execution depending on the presence or absence of the interference gadget's execution. This is the secret-dependent delay imposed by the gadget on the victim load.

With the capability to reorder two loads, the next ingredient for the attack is to translate a reordering of loads into a persistent cache state change. We achieve this using the cache replacement state. For the rest of the section, we use the notation A-B to indicate the order in time in which the loads are issued, i.e., A-B means A issues first and B-A means B issues first. We also assume access to eviction sets (EV; § 4.1).

Our attack targets the replacement state because we are only changing the order of loads. Changing the order of loads is different from changing which loads are issued, as in a normal cache-based attack. For example, a standard LLC Prime+Probe attack, without a very fine probe granularity, would observe both A and B in the cache regardless of their order, and would be unable to distinguish the A-B case from the B-A case.

Translating load issue order into a persistent replacement state change is not difficult in textbook replacement policies, such as LRU, as the ordering directly influences replacement priority ranking. However, replacement policies in modern machines, such as our target processor, are more complex. The new technical challenge for the attacker is that fresh insertions of A and B are ranked equally.

We show how this new challenge can be overcome by providing a technique to extract replacement state data from the replacement policy on the Kaby Lake machine. To identify the replacement policy on our machine, we used the CacheAnalyzer tool from nanoBench [3].
The resulting replacement policy is approximately QLRU_H11_M1_R0_U0 ("Quad-age LRU") on specific cache sets [46]. QLRU is a Static-RRIP replacement policy variant with a 2-bit field used for the age of a cache line [3, 23], summed up here:

• M1: Insertion policy. Inserts cache lines with age 1.
• H11: Hit promotion policy. Promotes a line of age 3 to age 1, age 2 to age 1, and age 1/0 to age 0 upon hit.
• R0: Eviction policy. Insert to leftmost location if cache set is not full; otherwise, evict block corresponding to the leftmost physical tag with age 3.
• U0: Age update policy. Increments age fields of all cache lines until there is a candidate ready for eviction (age = 3).

Footnote: RELOAD+REFRESH [9] also uses replacement-state manipulation principles to execute a cache-based attack. The distinction in this work is that we try to identify the victim's load issue order, whereas they try to identify the presence of a victim's access to a target address.

Attacker Receiver Protocol.

We now describe how the attacker decodes from the replacement state whether A-B or B-A occurred. At a high level, similar to a traditional cache attack, the attacker thread first primes the LLC set, waits for the victim to issue its secret-dependent ordering, and finally probes the LLC set to determine which ordering the victim issued. Due to the nature of QLRU, however, the details are different from conventional attacks. Specifically, the attacker first constructs two eviction sets of LLC_ASSOCIATIVITY-1 elements each, call these EVS1 and EVS2, which map to the same LLC set and slice as A and B. The attacker then uses the following access sequences to prime and probe the cache set:

• Prime Sequence: Access EVS1 many times + Access A
• Probe Sequence: Access EVS2

The attacker accesses EVS1 many times in order to saturate their age at 0, leaving A with an age of 3. To be able to access address A, our current PoC requires that the receiver share memory with the victim (hence the use of Flush+Reload).

For our machine, the targeted cache sets are 16-way associative. We will refer to elements in EVS1 as EV0-EV14, and elements in EVS2 as EV15-EV29. The resulting cache states for prime and probe with the A-B sequence are displayed in Figure 7. The main idea is that only one of A or B is still resident in the LLC by the end of the prime + victim accesses + probe sequence.
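The prime and probe sequences can be exercised on a small software model of one QLRU_H11_M1_R0_U0 set. This is a minimal sketch under assumptions that go beyond the text: the set is assumed to start full of unrelated lines (named X0..X15 here) with age 3, the U0 age update is assumed to run after every access, and the class and function names are hypothetical. It models the policy as summarized above, not the adaptive behavior of the real LLC.

```python
ASSOC = 16  # ways in the targeted LLC set (from the text)


class QLRUSet:
    """Software model of one QLRU_H11_M1_R0_U0 set (illustrative sketch)."""

    def __init__(self, assoc=ASSOC):
        # Assumption: the set starts full of unrelated lines, all at age 3,
        # so the "set not full" branch of R0 is never needed.
        self.tags = [f"X{i}" for i in range(assoc)]
        self.ages = [3] * assoc

    def _update_ages(self):
        # U0 (as modeled here): raise all ages until some line has age 3.
        bump = 3 - max(self.ages)
        self.ages = [a + bump for a in self.ages]

    def access(self, tag):
        """Access `tag`; return True on hit, False on miss."""
        hit = tag in self.tags
        if hit:
            i = self.tags.index(tag)
            # H11: age 3 -> 1, age 2 -> 1, age 1 or 0 -> 0.
            self.ages[i] = 1 if self.ages[i] >= 2 else 0
        else:
            i = self.ages.index(3)  # R0: leftmost line with age 3 is evicted
            self.tags[i] = tag
            self.ages[i] = 1        # M1: insert with age 1
        self._update_ages()
        return hit


def run_trial(victim_order, prime_rounds=4):
    """Prime, let the victim access A and B in `victim_order`, then probe."""
    cache = QLRUSet()
    evs1 = [f"EV{i}" for i in range(ASSOC - 1)]                   # EV0..EV14
    evs2 = [f"EV{i}" for i in range(ASSOC - 1, 2 * (ASSOC - 1))]  # EV15..EV29
    for _ in range(prime_rounds):   # Prime: access EVS1 many times...
        for tag in evs1:
            cache.access(tag)
    cache.access("A")               # ...then access A
    for tag in victim_order:        # Victim: "AB" or "BA"
        cache.access(tag)
    for tag in evs2:                # Probe: access EVS2
        cache.access(tag)
    return {"A": "A" in cache.tags, "B": "B" in cache.tags}


after_ab = run_trial("AB")
after_ba = run_trial("BA")
```

In this model the victim order A-B leaves only B resident after the probe, and B-A leaves only A resident, so residency of A versus B encodes the order.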

In this section, we present the overall D-Cache PoC. The attack steps shown in Figure 8 are explained in detail below:

(1) Attacker initializes eviction sets based on addresses A, B.
(2) Attacker primes the LLC set replacement state (§ 4.2.2) and mis-trains the victim's branch predictor.
(3) Victim issues loads to A and B, where order depends on the secret (§ 3.1.1). If secret = 0, A-B is issued, and if secret = 1, B-A is issued.
(4) Attacker probes the LLC set replacement state (§ 4.2.2), and observes the residency of A or B in the LLC set. The residency is determined by issuing a timed access to A and B and comparing it with an LLC miss threshold.

Footnote: It is likely the case that the LLC cache sets do not strictly abide by this replacement policy and have an adaptive replacement policy. However, for the purposes of this PoC, the attack strategy that creates observable replacement state changes on QLRU_H11_M1_R0_U0 also creates observable replacement state changes on our machine.

Figure 7: QLRU state for the targeted cache set. EV_N, A, B represent addresses and numbers represent the age for each cache line. (a) shows the cache state after the attacker primes the cache. (b) and (c) represent the cache states after the victim runs (with pattern A-B) and after the attacker completes the probe. A victim access pattern of B-A has analogous state changes.

    (a) After Prime Sequence:     EV0 ... EV14 (age 2), A (age 3)
    (b) Victim Access A-B:        B (age 1), EV1 ... EV14 (age 3), A (age 2)
    (c) Probe with EV15-EV29:     B (age 3), EV15 ... EV28 (age 3), EV29 (age 2)

Figure 8: An end-to-end visualization of the D-Cache attack (time flows downward).

    Attacker (Core 2):
        find_eviction_set(A, B)
        train_branch_predictor()
        prime_llc_set()

    Victim (Core 1):
        A = contention_target()
        y = load(A)
        B = fixed_latency()
        z = load(B)
        N = pointer_chase()
        if (i < N):                      // mis-spec
            secret = load(tgt[i])
            x = load(&S[secret*64])      // miss (secret=0) => A-B
                                         // hit (secret=1) => B-A
            interference_gadget()

    Attacker (Core 2):
        probe_llc_set()
        load(A), load(B)
        if (A cache_hit & B cache_miss) secret = 1
        if (A cache_miss & B cache_hit) secret = 0
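Step 4's decoding rule can be sketched as a small function over the two timed accesses. This is an illustrative sketch only: the function name, the latency units, and the example threshold of 200 are hypothetical, and a real receiver would have to calibrate the miss threshold on the target machine.

```python
def decode_bit(latency_a, latency_b, miss_threshold):
    """Map timed accesses to A and B onto a secret bit.

    Returns 1 if only A is resident (victim issued B-A), 0 if only B is
    resident (victim issued A-B), and None if the observation is
    inconclusive. `miss_threshold` is an assumed, machine-specific
    calibration value.
    """
    a_hit = latency_a < miss_threshold
    b_hit = latency_b < miss_threshold
    if a_hit and not b_hit:
        return 1
    if b_hit and not a_hit:
        return 0
    return None  # both hit or both miss: discard or retry this trial


# Hypothetical latencies (arbitrary units) with a hypothetical threshold.
example_bits = [
    decode_bit(80, 310, miss_threshold=200),
    decode_bit(305, 75, miss_threshold=200),
    decode_bit(300, 320, miss_threshold=200),
]
```

Returning `None` for inconclusive observations lets the caller repeat the trial, which is one way the number of runs per bit could trade bit rate against error rate.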

We run the PoC in a cross-core setting and evaluate the end-to-end covert channel error rate vs. throughput in Figure 10. Throughput is defined as the number of secret bits transmitted per unit time. It is represented as bits per second (bps) and evaluated by measuring the CPU cycles required to leak 1 bit. Error rate is defined as the number of incorrectly inferred bits over the total number of bits transmitted. We can trade off error rate and bit rate by changing PoC parameters, e.g., the number of times the PoC is run to leak each bit, or the amount of time spent trying to mistrain the branch predictor.

Figure 9: The average time (measured with clock thread [40, 42]) to execute the interference target changes by ∼16 clock ticks (80 rdtsc cycles) based on the presence or absence of the interference gadget. (Histogram; x-axis: Cycles, 120 to 160; y-axis: Frequency; series: baseline, interference.)

Figure 10: D-Cache PoC channel error vs. bit rate. As a representative result: choosing a rate of 50 bps, an AES-128 key can be leaked in under 2.56 s with over 80% accuracy. (x-axis: bit rate (bps); y-axis: Bit Error Probability.)
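The throughput and error-rate definitions translate into simple arithmetic. The sketch below assumes a cycle counter ticking at the 3.6 GHz base frequency quoted for the test machine; the figure of 72 million cycles per bit is a hypothetical value chosen only because it corresponds to 50 bps, and the helper names are illustrative.

```python
def bit_rate_bps(cycles_per_bit, clock_hz=3.6e9):
    """Bits per second, given the CPU cycles needed to leak one bit."""
    return clock_hz / cycles_per_bit


def leak_time_seconds(num_bits, rate_bps):
    """Time to transmit `num_bits` at a given bit rate."""
    return num_bits / rate_bps


def bit_error_rate(sent, received):
    """Incorrectly inferred bits over the total number of bits transmitted."""
    if len(sent) != len(received) or not sent:
        raise ValueError("bit sequences must be non-empty and equally long")
    wrong = sum(1 for s, r in zip(sent, received) if s != r)
    return wrong / len(sent)


rate = bit_rate_bps(cycles_per_bit=72_000_000)     # hypothetical cycles/bit
aes128_time = leak_time_seconds(128, rate_bps=50)  # 128 key bits at 50 bps
ber = bit_error_rate([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
```

The 128-bit key at 50 bps gives 128 / 50 = 2.56 s, which matches the representative result quoted in the Figure 10 caption.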

5. Related Work

Most speculative execution attacks that have been presented to date build on cache-based covert channels to leak data, being inspired by either Spectre [11, 21, 29, 30, 34, 47] or Meltdown [10, 32, 39, 44, 45, 49]. To our knowledge, only SMoTherSpectre [8] and NetSpectre [41] make use of alternative covert channels, such as port contention, for speculative execution attacks.

We provide background on invisible speculation schemes [5, 26, 31, 37, 51] in § 2. CleanupSpec [36] targets the invisible speculation goal of blocking only cache covert channels, but uses a unique approach of (1) undoing cache occupancy changes upon a squash and (2) using randomized replacement to block replacement-related leakage. CleanupSpec does not block speculative interference but makes its exploitation more challenging. We leave this as future work.

Concurrent to this work, Fustos et al. [16] also observed that younger speculative instructions can influence the timing of older bound-to-retire instructions. Yet, their SpectreRewind attack is a traditional contention attack (§ 1) and explicitly outside of the scope of invisible speculation schemes.

6. Discussion & Conclusion

Speculative interference attacks show that current invisible speculation schemes are not immune to cache attacks. Specifically, we show how an attacker can convert timing changes into persistent cache state changes. We hope our work helps set a research agenda towards more secure, efficient invisible speculation mechanisms.

We argue that a first step toward this goal is to define a formal security guarantee that an invisible speculation scheme must satisfy. As we have shown, the intuitive property that "speculative loads do not modify cache state" is too weak. We propose an ideal invisible speculation security property, which formally models the goal of eliminating all speculative execution cache covert channels. Ideal invisible speculation requires that the system's cache state is invariant of speculative execution. More formally, given an execution E of the microarchitecture, we assume that the attacker can observe the sequence of state changes in the cache hierarchy, denoted C(E). Ideal invisible speculation is the following property, akin to non-interference [17]: for any execution E, C(E) = C(NoSpec(E)), where NoSpec(E) is the execution that would have occurred if E had no mis-speculations.

A high level principle for achieving ideal invisible speculation is: a speculative instruction must not influence the execution of a non-speculative instruction. This principle does not preclude an instruction from speeding up younger instructions, which is the basis for invisible speculation's performance benefits. It does imply the microarchitectural rule of blocking loads from changing cache state while speculative, as that can affect non-speculative instructions.
As we have shown, however, more microarchitectural rules are required to realize this principle, creating a design space to explore.

One possible approach is based on the following two rules: (1) no instruction ever influences the execution time of an older instruction, and (2) any resources allocated to an instruction at the interface of the frontend and the execution engine are not deallocated until the instruction becomes non-speculative. Rule (1) is straightforward to obtain if every microarchitectural resource is perfectly pipelined, but requires further research to implement otherwise. Rule (2) makes instruction fetch rate invariant of speculation, which guarantees that speculation cannot influence when instructions that eventually retire begin executing.

Overall, we have shown that ideal invisible speculation cannot be achieved while ignoring "bandwidth" or "contention" or "intermittent" covert channels. Implementing ideal invisible speculation appears to involve complexity and efficiency challenges. Whether ideal invisible speculation can be simpler or more efficient than defenses with more comprehensive threat models [7, 48, 55, 56] is an interesting question for future work.

References

[1] 8th and 9th generation intel® core™ processor families datasheet, volume 1 of 2.
[2] Kaby lake - microarchitectures - intel - wikichip. https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake.
[3] Andreas Abel and Jan Reineke. nanobench: A low-overhead tool for running microbenchmarks on x86 systems. arXiv preprint arXiv:1911.03282, 2019.
[4] Onur Acıiçmez, Çetin Kaya Koç, and Jean-Pierre Seifert. Predicting secret keys via branch prediction. In Cryptographers' Track at the RSA Conference. Springer, 2007.
[5] Sam Ainsworth and Timothy M. Jones. Muontrap: Preventing cross-domain spectre-like attacks by capturing speculative state. In Proc. of the ACM/IEEE International Symposium on Computer Architecture (ISCA), 2020.
[6] Alejandro Cabrera Aldaya, Billy Bob Brumley, Sohaib ul Hassan, Cesar Pereida García, and Nicola Tuveri. Port contention for fun and profit. In Proc. of the IEEE Symposium on Security and Privacy (S&P). IEEE, 2019.
[7] Kristin Barber, Anys Bacha, Li Zhou, Yinqian Zhang, and Radu Teodorescu. SpecShield: Shielding Speculative Data from Microarchitectural Covert Channels. In Proc. of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2019.
[8] Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessandro Sorniotti, Babak Falsafi, Mathias Payer, and Anil Kurmus. SMoTherSpectre: Exploiting Speculative Execution through Port Contention. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2019.
[9] Samira Briongos, Pedro Malagón, José M Moya, and Thomas Eisenbarth. Reload+refresh: Abusing cache replacement policies to perform stealthy cache attacks. In Proc. of the USENIX Security Symposium (USENIX), 2020.
[10] Claudio Canella, Daniel Genkin, Lukas Giner, Daniel Gruss, Moritz Lipp, Marina Minkin, Daniel Moghimi, Frank Piessens, Michael Schwarz, Berk Sunar, Jo Van Bulck, and Yuval Yarom. Fallout: Leaking data on meltdown-resistant cpus. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2019.
[11] G. Chen, S. Chen, Y. Xiao, Y. Zhang, Z. Lin, and T. H. Lai. SgxPectre: Stealing intel secrets from sgx enclaves via speculative execution. In Proc. of the IEEE European Symposium on Security and Privacy (EuroS&P), 2019.
[12] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. Jump over ASLR: Attacking branch predictors to bypass ASLR. In Proc. of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.
[13] Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh. Understanding and mitigating covert channels through branch predictors. ACM Transactions on Architecture and Code Optimization (TACO), 13(1), 2016.
[14] Dmitry Evtyushkin, Ryan Riley, Nael Abu-Ghazaleh, and Dmitry Ponomarev. Branchscope: A new side-channel attack on directional branch predictor. In Proc. of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018.
[15] Agner Fog et al. Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for intel, amd and via cpus. Copenhagen University College of Engineering, 93:110, 2011.
[16] Jacob Fustos, Michael Bechtel, and Heechul Yun. SpectreRewind: Leaking secrets to past instructions. arXiv preprint arXiv:2003.12208, 2020.
[17] J. A. Goguen and J. Meseguer. Security policies and security models. In Proc. of the IEEE Symposium on Security and Privacy (S&P), 1982.
[18] Ben Gras, Cristiano Giuffrida, Michael Kurth, Herbert Bos, and Kaveh Razavi. ABSynthe: Automatic blackbox side-channel synthesis on commodity microarchitectures. In Proc. of the Symposium on Network and Distributed System Security (NDSS), 2020.
[19] Johann Großschädl, Elisabeth Oswald, Dan Page, and Michael Tunstall. Side-channel analysis of cryptographic software via early-terminating multiplications. In Proc. of the International Conference on Information Security and Cryptology (ICISC), 2009.
[20] John L. Hennessy and David A. Patterson. Computer Architecture, Sixth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., 6th edition, 2017.
[21] Jann Horn. Speculative execution, variant 4: speculative store bypass. https://bugs.chromium.org/p/project-zero/issues/detail?id=1528, 2018.
[22] Intel. Refined Speculative Execution Terminology. https://software.intel.com/security-software-guidance/insights/refined-speculative-execution-terminology, 2020.
[23] Aamer Jaleel, Kevin B Theobald, Simon C Steely Jr, and Joel Emer. High performance cache replacement using re-reference interval prediction (rrip). ACM SIGARCH Computer Architecture News, 38(3):60–71, 2010.
[24] Mike Johnson. Superscalar Microprocessor Design. Prentice Hall Englewood Cliffs, New Jersey, 1991.
[25] Mehmet Kayaalp, Nael Abu-Ghazaleh, Dmitry Ponomarev, and Aamer Jaleel. A high-resolution side-channel attack on the last level cache. In Proc. of the Design Automation Conference (DAC), 2016.
[26] Khaled N. Khasawneh, Esmaeil Mohammadian Koruyeh, Chengyu Song, Dmitry Evtyushkin, Dmitry Ponomarev, and Nael B. Abu-Ghazaleh. Safespec: Banishing the spectre of a meltdown with leakage-free speculation. In Proc. of the Design Automation Conference (DAC), 2019.
[27] Vladimir Kiriansky, Ilia A. Lebedev, Saman P. Amarasinghe, Srinivas Devadas, and Joel Emer. Dawg: A defense against cache timing attacks in speculative execution processors. In Proc. of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018.
[28] Vladimir Kiriansky and Carl Waldspurger. Speculative buffer overflows: Attacks and defenses. arXiv preprint arXiv:1807.03757, 2018.
[29] Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: Exploiting speculative execution. In Proc. of the IEEE Symposium on Security and Privacy (S&P), 2019.
[30] Esmaeil Mohammadian Koruyeh, Khaled N. Khasawneh, Chengyu Song, and Nael Abu-Ghazaleh. Spectre returns! speculation attacks using the return stack buffer. In Proc. of the USENIX Workshop on Offensive Technologies (WOOT), 2018.
[31] Peinan Li, Lutan Zhao, Rui Hou, Lixin Zhang, and Dan Meng. Conditional speculation: An effective approach to safeguard out-of-order execution against spectre attacks. In Proc. of the IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019.
[32] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown: Reading kernel memory from user space. In Proc. of the USENIX Security Symposium (USENIX), 2018.
[33] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-level cache side-channel attacks are practical. In Proc. of the IEEE Symposium on Security and Privacy (S&P), 2015.
[34] Giorgi Maisuradze and Christian Rossow. Ret2spec: Speculative execution using return stack buffers. In Proc. of the ACM Conference on Computer and Communications Security (CCS), 2018.
[35] Dag Arne Osvik, Adi Shamir, and Eran Tromer. Cache attacks and countermeasures: The case of aes. In Proc. of the Cryptographers' Track at the RSA Conference (CT-RSA), 2006.
[36] Gururaj Saileshwar and Moinuddin K. Qureshi. Cleanupspec: An "undo" approach to safe speculation. In Proc. of the IEEE/ACM International Symposium on Microarchitecture (MICRO), 2019.
[37] Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, and Magnus Själander. Efficient Invisible Speculative Execution Through Selective Delay and Value Prediction. In Proc. of the ACM/IEEE International Symposium on Computer Architecture (ISCA), 2019.
[38] Jay Schulist, Daniel Borkmann, and Alexei Starovoitov. Linux Socket Filtering aka Berkeley Packet Filter (BPF). 2018.
[39] Michael Schwarz, Moritz Lipp, Daniel Moghimi, Jo Van Bulck, Julian Stecklina, Thomas Prescher, and Daniel Gruss. ZombieLoad: Cross-privilege-boundary data sampling. In

Proc. of the ACM Conference onComputer and Communications Security (CCS) , 2019.[40] Michael Schwarz, Clémentine Maurice, Daniel Gruss, and StefanMangard. Fantastic timers and where to ﬁnd them: high-resolution mi-croarchitectural attacks in javascript. In

Proc. of the International Con-ference on Financial Cryptography and Data Security (FC) . Springer,2017.[41] Michael Schwarz, Martin Schwarzl, Moritz Lipp, and Daniel Gruss.Netspectre: Read arbitrary memory over network. In

Proc. of theEuropean Symposium on Research in Computer Security (ESORICS) ,2019.[42] Michael Schwarz, Samuel Weiser, Daniel Gruss, Clémentine Maurice,and Stefan Mangard. Malware guard extension: Using sgx to concealcache attacks. In

Proc. of the Conference on Detection of Intrusionsand Malware, and Vulnerability Assessment (DIMVA) , 2017.[43] Robert M Tomasulo. An efﬁcient algorithm for exploiting multiplearithmetic units.

IBM Journal of Research and Development , 11(1):25–33, 1967. [44] Jo Van Bulck, Marina Minkin, Oﬁr Weisse, Daniel Genkin, BarisKasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, YuvalYarom, and Raoul Strackx. Foreshadow: Extracting the keys to theIntel SGX kingdom with transient out-of-order execution. In

Proc. ofthe USENIX Security Symposium (USENIX) , 2018.[45] Stephan van Schaik, Alyssa Milburn, Sebastian Österlund, Pietro Frigo,Giorgi Maisuradze, Kaveh Razavi, Herbert Bos, and Cristiano Giuf-frida. RIDL: Rogue in-ﬂight data load. In

Proc. of the IEEE Symposiumon Security and Privacy (S&P) , 2019.[46] Pepe Vila, Pierre Ganty, Marco Guarnieri, and Boris Köpf. Cache-Query: Learning Replacement Policies from Hardware Caches. In

Proc. of the ACM SIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) , 2020.[47] Jack Wampler, Ian Martiny, and Eric Wustrow. Exspectre: Hidingmalware in speculative execution. In

Proc. of the Symposium onNetwork and Distributed System Security (NDSS) , 2019.[48] Oﬁr Weisse, Ian Neal, Kevin Loughlin, Thomas Wenisch, and BarisKasikci. NDA: Preventing Speculative Execution Attacks at TheirSource. In

Proc. of the IEEE/ACM International Symposium on Mi-croarchitecture (MICRO) , 2019.[49] Oﬁr Weisse, Jo Van Bulck, Marina Minkin, Daniel Genkin, BarisKasikci, Frank Piessens, Mark Silberstein, Raoul Strackx, Thomas F.Wenisch, and Yuval Yarom. Foreshadow-NG: Breaking the virtualmemory abstraction with transient out-of-order execution.

Technicalreport , 2018.[50] Wenjie Xiong and Jakub Szefer. Leaking Information Through CacheLRU States. In

Proc. of the IEEE International Symposium on HighPerformance Computer Architecture (HPCA) , 2020.[51] Mengjia Yan, Jiho Choi, Dimitrios Skarlatos, Adam Morrison, Christo-pher W. Fletcher, and Josep Torrellas. InvisiSpec: Making SpeculativeExecution Invisible in the Cache Hierarchy. In

Proc. of the IEEE/ACMInternational Symposium on Microarchitecture (MICRO) , 2018.[52] Mengjia Yan, Read Sprabery, Bhargava Gopireddy, ChristopherFletcher, Roy Campbell, and Josep Torrellas. Attack Directories, NotCaches: Side Channel Attacks in a Non-Inclusive World. In

Proc. ofthe IEEE Symposium on Security and Privacy (S&P) , 2019.[53] Yuval Yarom and Katrina Falkner. Flush+Reload: A high resolution,low noise, L3 cache side-channel attack. In

Proc. of the USENIXSecurity Symposium (USENIX) , 2014.[54] Yuval Yarom, Daniel Genkin, and Nadia Heninger. Cachebleed: atiming attack on openssl constant-time rsa.

Journal of CryptographicEngineering , 7(2):99–112, 2017.[55] Jiyong Yu, Namrata Mantri, Josep Torrellas, Adam Morrison, andChristopher W. Fletcher. Speculative Data-Oblivious Execution (SDO):Mobilizing Safe Prediction For Safe and Efﬁcient Speculative Execu-tion. In

Proc. of the ACM/IEEE International Symposium on ComputerArchitecture (ISCA) , 2020.[56] Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Tor-rellas, and Christopher W. Fletcher. Speculative Taint Tracking (STT):A Comprehensive Protection for Speculatively Accessed Data. In

Proc. of the IEEE/ACM International Symposium on Microarchitecture(MICRO) , 2019., 2019.