Determinating Timing Channels in Compute Clouds
Amittai Aviram, Sen Hu, Bryan Ford
Yale University
Ramakrishna Gummadi
University of Massachusetts Amherst
ABSTRACT
Timing side-channels represent an insidious security challenge for cloud computing, because: (a) massive parallelism in the cloud makes timing channels pervasive and hard to control; (b) timing channels enable one customer to steal information from another without leaving a trail or raising alarms; (c) only the cloud provider can feasibly detect and report such attacks, but the provider’s incentives are not to; and (d) resource partitioning schemes for timing channel control undermine statistical sharing efficiency, and, with it, the cloud computing business model. We propose a new approach to timing channel control, using provider-enforced deterministic execution instead of resource partitioning to eliminate timing channels within a shared cloud domain. Provider-enforced determinism prevents execution timing from affecting the results of a compute task, however large or parallel, ensuring that a task’s outputs leak no timing information apart from explicit timing inputs and total compute duration. Experiments with a prototype OS for deterministic cloud computing suggest that such an approach may be practical and efficient. The OS supports deterministic versions of familiar APIs such as processes, threads, shared memory, and file systems, and runs coarse-grained parallel tasks as efficiently and scalably as current timing channel-ridden systems.
1. INTRODUCTION
It is hotly debated whether individuals and companies should trust cloud providers with sensitive information, but few would suggest that a cloud customer should trust the provider and all the provider’s other customers. Yet this may soon be the cloud’s de facto security model—if it isn’t already—due to timing channels.

Timing channels are well-known and well-studied [20, 37], originally driven by military-grade security demands. They have gained broader relevance, however, in the context of commercially applicable information flow control [17, 38], and due to the discovery that computations unintentionally broadcast sensitive information via numerous timing channels in shared environments. A sensitive computation sharing a CPU core with an attacker, through either time division or hyperthreading, is akin to standing behind a transparent shower door: e.g., an attacker may steal information from the victim via the shared L1 data cache [28], shared functional units [35], the branch target cache [2], or the instruction cache [1]. Most of the above attacks were demonstrated between processes on a conventional OS, but per-customer VMs on a provider-owned machine share resources in essentially the same way, making the results theoretically applicable to clouds—especially those relying on “container-based” virtualization [32]. Timing attacks have even been demonstrated specifically on VMs commonly used in clouds [30], although it is not yet clear how easily these lab-based experiments could be replicated in a noisy commercial cloud.

Whether timing channels represent an immediate security threat or merely a hairline fracture, it is worth repeating the security adage, “attacks never get worse; they only get better.” Today’s timing-channel exploits pick low-hanging fruit, extracting information from only one high-bandwidth timing channel at a time via straightforward analysis techniques.
Shared computing environments have many other timing channels, such as L3 caches shared between cores, memory and I/O busses, and cluster interconnects. There are probably ways to extract weaker signals from stronger noise, aggregate information from low-rate leaks over time, correlate leaks across multiple channels, etc. Attack amplification techniques applicable to arbitrary timing channels have already appeared [29]. It would simply be foolish for us to expect timing attacks not to continue getting more effective and more practical over time.

In the rest of this paper, we set aside the “imminence of threat” debate and simply assume that at some point, sooner or later, timing channels will become an important cloud security issue. We focus here on understanding the basic nature of the timing channel problem in the cloud context, independent of specific channels and attacks, and on discovering potential solutions compatible with the requirements of cloud environments. We focus in particular on timing channels internal to a cloud: other side-channels, such as those derived from a client’s communication with a cloud-based service [12], are also important but beyond our present scope.

We make three main contributions. First, we identify four ways the cloud computing model amplifies timing channel security risks compared with traditional infrastructure. Second, we propose a new method of timing channel control based on provider-enforced deterministic execution, which aggregates all internal timing channels into a single controllable channel at the cloud’s border. Third, we present a proof-of-concept cloud computing OS that enforces determinism, with preliminary results suggesting that it could support parallel cloud applications efficiently without sacrificing the cloud provider’s flexibility in allocating resources to clients.
2. TIMING CHANNELS IN THE CLOUD
Current cloud privacy discussions focus on the provider’s obligation to enforce security and earn the customer’s trust. These discussions presuppose the provider’s full awareness of the security risks from which it must shield the customer [24, 27]. But exposure to malice from another customer’s software may be hard for the provider to detect or prevent without careful consideration of the cloud’s architecture. Timing channels typify such insidious risks.

Although timing channels represent an important security risk in any shared infrastructure, the cloud model exacerbates these risks in at least four specific ways, which we discuss below. The first two points are well known to some but worth repeating, while to our knowledge the latter two have not previously been discussed.
Parallelism creates pervasive timing channels.
In the days of uniprocessors and single-threaded processes, it was possible to control timing channels by limiting untrusted processes’ access to high-resolution clocks and timers, and to other I/O devices that can behave like clocks [20, 37]. But today’s increasingly parallelism-oriented hardware—especially in the massively parallel cloud context—creates numerous implicit, high-resolution clocks that have nothing to do with I/O. Hardware caches and interconnects in their many forms all represent shared resources that can be modulated [1, 2, 28, 35]. A thread running in a loop can create a high-resolution reference clock [37], as illustrated by the trivial code in Figure 1, even if the OS or VM has virtualized or disabled all “explicit” hardware clocks. Even processes with no access to explicit clocks, timers, or other devices can thus use parallelism-derived implicit clocks to exploit timing channels.

```c
#include <pthread.h>

volatile long long timer = 0;

void *timer_func(void *arg) { while (1) timer++; }

int main(void) {
    pthread_t timer_thread;
    pthread_create(&timer_thread, NULL, timer_func, NULL);
    /* ... */
    /* Read the "current time" */
    long long timestamp = timer;
    /* ... */
}
```

Figure 1: Implementing a high-resolution reference clock using threads, when no explicit hardware clocks are available.
Insider attacks become outsider attacks.
With notable exceptions [10], timing channel exploits usually require the attacker to run a sophisticated, CPU-intensive program on the victim’s machine. On private infrastructure, this usually means the attacker must be an “insider” or have already compromised the machine. But a cloud provider’s business is to run any paying customer’s computation with “no questions asked.” Since the provider may colocate arbitrary customers’ computations on a given machine without the knowledge or consent of either, a timing attack exploitable only by “insiders” on private infrastructure may be mounted by malicious “outsiders” in the cloud. An attacker may simply “fish” for secrets without even knowing the identity of the co-resident victim, by monitoring timing channels for SSH keystrokes for example, or the attacker may deliberately attempt to obtain co-residency with a specific target [30].
Cloud-based timing attacks are unlikely to be caught.
The owner of private infrastructure has the right to monitor and inspect any running software to detect malicious code. Cloud customers cannot monitor other customers’ computations to protect themselves against timing attacks, however (except by engaging in “counter-espionage” attacks themselves), and cloud providers have no prerogative to monitor their customers’ computations due to customer privacy concerns. Since a timing attack leaves no trail of compromised protection mechanisms, successful timing attacks are unlikely to raise alarms and will probably just go unnoticed. Thus, providers risk nothing by leaving timing attacks undetected and unreported, whereas monitoring customers in order to detect and report such attacks may invite privacy lawsuits.
Controlling timing channels via resource partitioning undermines the cloud’s elasticity and business model.
One general approach to controlling timing channels is to limit the rate at which one user’s demand for a shared resource may visibly affect the resource’s availability to another user, either by statically partitioning the resource or injecting noise into scheduling decisions. Recent cache partitioning proposals exemplify this approach [21]. These methods limit the provider’s ability to oversubscribe and statistically multiplex shared hardware efficiently among users, however, undermining the basic business model of cloud computing. Without statistical multiplexing, the cloud loses its elasticity, leaving the provider essentially selling only private infrastructure hosting and outsourced management services.

Figure 2: Timing-hardened cloud architecture. Gateways accept requests, dispatch deterministic jobs into the cloud, then return job results that depend only on explicit job inputs, and not on internal timing.
3. A TIMING-HARDENED CLOUD
We now explore a cloud computing architecture that closes all internal timing channels, regardless of the number and types of shared resources, leaving only one controllable timing channel at the boundary. The basic idea is to make the cloud behave like a deterministic batch job processor, reminiscent of early mainframes.

A computation needs access to two “clocks” to exploit any timing channel: a reference clock and a clock that can be modulated [37]. While standard approaches to timing channel control attempt to limit visible clock modulation, our approach is to eliminate all internal reference clocks—even in the presence of parallelism.
As illustrated in Figure 2, a set of gateway nodes at the cloud’s boundary accepts job requests, including any inputs the job requires. Upon completion, the gateway returns the job’s outputs, which depend only on explicit inputs, and not on timings of operations within the cloud. For each job, the cloud provider effectively computes a pure mathematical function, whose outputs depend only on the job’s explicit customer-provided inputs, and nothing else. The provider’s cloud OS or VMM enforces this determinism, ensuring that even malicious guest code can do nothing to make its results depend on internal timing or other implicit inputs.

To process each job, the provider’s gateway breaks the job into smaller work units and uses load-balancing algorithms controlled by the provider to distribute work among cloud servers. These servers may communicate internally while performing a job, provided communication timing cannot affect computed results.

A customer’s job may also read and write the customer’s persistent data stored in the cloud, provided any writes remain invisible both externally and to other jobs until the writing job completes. Each job in effect executes within a provider-enforced transaction. The provider may statistically multiplex different customers’ jobs freely onto shared hardware within the cloud, with no static partitioning or scheduling noise injection. Provider-enforced determinism nevertheless ensures that no timing or other nondeterministic information leaks from one guest computation to another, and only one unit of timing information per job leaks to the outside world: namely, the total time the job took to complete. This remaining timing channel leaks only heavily aggregated information that is unlikely to be easily exploitable, and the provider can limit this timing channel’s information flow rate by returning job results to customers on a periodic schedule—e.g., once per millisecond, second, or minute—rather than immediately on job completion.
The applicability of this cloud architecture depends on two questions: whether a strictly deterministic execution environment can provide a practical programming model for cloud applications, and whether such a deterministic environment can be efficient enough. We address the first question here and the second in Section 4.

This architecture may be readily applicable to many large, parallel, compute-bound applications such as scientific computing, rendering, and data analysis. Nondeterminism in parallel applications is usually undesired [9, 23], so eliminating it benefits the developer. The only common intentional nondeterminism in such applications is for internal performance optimization purposes—e.g., distributing work items to workers according to dynamic availability and load—and our architecture delegates these functions to the cloud provider. Determinism thus simplifies the customer’s programming task by eliminating pervasive heisenbugs [25], making all bugs reproducible [22], and offloading load-balancing responsibilities to the provider. Applicability thus reduces to the efficiency question.

While large compute-bound applications fit the proposed architecture most naturally, more interactive uses may be feasible as well. A deterministic cloud might host interactive web applications, for example, as follows. The provider’s gateway nodes act as generic front-end Web servers, accepting HTTP requests from remote clients and converting them into deterministic job submissions on behalf of the web application’s owner. The gateway attaches a job creation timestamp to each job’s inputs, enabling the application to “tell time” at job granularity. A job’s results can request the gateway to start a follow-up job at a future time, enabling the web application to implement timeouts, push notifications on persistent sockets, etc.
The remaining questions are whether such a “gateway-driven” web programming model can be made sufficiently familiar for customers implementing web applications, and whether the provider can support job creation and dispatch at sufficiently high rate and fine granularity to handle customer response time requirements. We believe both of these questions can be answered positively, the first using appropriate runtime libraries or virtualization mechanisms, the second via efficient deterministic execution as described later.
Our architecture requires that the provider manage scheduling and load-balancing decisions within a cloud, since enabling customers to do so would involve leaking potentially sensitive timing information into customer computations and their outputs. An important concern is whether the unavailability of this fine-grained internal timing information will make it difficult for customers to develop and optimize their parallel applications effectively: e.g., to perform detailed profiling-based analysis of their applications, or to implement application-specific dynamic optimizations or load-balancing schemes within their applications.

The unavailability of fine-grained timing information to customers may indeed present a challenge for application profiling purposes. A customer’s application need not run always or only on a shared cloud, however. The customer might perform development and testing on a smaller private cloud owned or exclusively leased by the customer. Even after deployment, the customer might distribute an application across both shared and customer-private infrastructure, giving the customer access to full timing information on the physical machines the customer owns or has leased exclusively.

Some applications may require dynamic, application-specific internal load-balancing algorithms in order to perform well. To support such applications, a provider might allow customers to supply application-specific scheduling or load-balancing “plug-ins,” as long as the provider’s OS ensures that these plug-ins can affect only the application’s performance and not its job outputs. The provider’s OS might enforce such constraints on load-balancing plug-ins via sandboxing mechanisms for untrusted kernel extensions [7], or by running the application’s load-balancing code in user space and using DIFC techniques [17, 38] to track processes that have been “tainted” with timing information, and prevent this timing information from leaking back to the customer.
4. A DETERMINISTIC CLOUD OS
Our architecture’s “magic ingredient,” obviously, is provider-enforced deterministic execution. Most cloud-oriented operating systems and virtual machine monitors replicate the inherently nondeterministic execution model provided by the underlying multiprocessor/multicore hardware. Recent application-level deterministic scheduling techniques show promise [5, 6], but they apply only within a process and do not prevent a guest from intentionally escaping its “deterministic sandbox.” The only system we are aware of that enforces determinism on multiprocessor guests does so by recording and replaying a previous (nondeterministic) execution, and imposes a high performance cost [15].

To offer evidence that the proposed architecture may be practical, we introduce Determinator, a novel OS that enforces determinism on multi-process parallel computations at moderate cost, while supporting familiar parallel programming abstractions such as fork/join synchronization, shared memory, and file systems. We describe Determinator from a more general perspective elsewhere [4], but we briefly summarize here the aspects relevant to timing channel control in the cloud.

Determinator is intended to supervise the compute nodes in a cloud architecture such as that shown in Figure 2. We believe cloud providers will have an incentive to deploy deterministic compute clouds based on an OS designed along the lines of Determinator, because of the enhanced data privacy assurance that a deterministic cloud could offer security-conscious customers. Integrating Determinator into a trusted cloud computing model [31] could further increase both real and perceived security.

Our current priority is to demonstrate the viability of OS-enforced deterministic execution of compute-bound jobs.
Determinator currently provides no persistent storage, and does not emulate hardware interfaces or host existing operating systems, although we intend to expand Determinator’s capabilities in the future.

We now outline Determinator’s basic execution environment and API, the consistency model it uses to manage state logically shared among parallel processes, and how it supports both threads interacting via (logically) shared memory and Unix-like processes interacting via a (logically) shared file system. We make no claim that this is the “right” way to implement a determinism-enforcing OS, but merely use Determinator to explore some key design challenges and solutions, and how Determinator’s design potentially addresses the goal of timing-hardened cloud computing.

Figure 3: Determinator process model. Each guest owns a hierarchy of processes/threads executing in parallel.
Determinator gives each guest an independent process hierarchy, as shown in Figure 3: it creates a root process on behalf of the customer, and existing processes can create new child processes. Unlike Unix, but as in nested process models [18], Determinator’s hierarchy strictly constrains process lifetime and inter-process communication. A process cannot outlive its parent, and a process can communicate directly only with its immediate parent and children.

Although all guest processes can execute in parallel, Determinator enforces determinism in two ways. First, from the kernel’s perspective, each process is single-threaded and shares no state with other processes. Each process has its own registers and address space, and processes cannot share read/write access to the same physical memory, thereby ensuring that each process’s internal execution is deterministic as long as the processor’s underlying instruction set is deterministic. Second, Determinator constrains the inter-process communication and synchronization of all processes to act as a Kahn process network [19], which provably yields deterministic behavior globally in spite of parallel execution.

Determinator processes can have three states: runnable, stopped, and waiting. Runnable processes can execute concurrently with all other runnable processes, according to a kernel-controlled scheduling policy, but do not interact with each other while running. (Processes could offer the kernel “scheduling hints” such as priorities, which the OS might use or ignore, but determinism precludes any explicit feedback from the OS affecting computed results.) A stopped process does nothing until its parent explicitly starts it. A waiting process is blocked until a particular child stops, at which point the waiting process becomes runnable again.

All inter-process interaction is driven by processor traps and the kernel’s three system calls: PUT, GET, and RET.
PUT waits until a designated child stops, then copies a block of virtual memory and/or register state into the child, and also optionally: (a) copies the child’s entire virtual address space into a reference snapshot associated with the child; and/or (b) (re-)starts the child. GET waits until a designated child stops, then copies or merges a block of the child’s virtual memory, and/or the child’s final register state, back into the parent. A merge is like a copy, except Determinator copies only words that differ between the child’s current and reference snapshots into the parent’s address space, leaving all other words in the parent untouched. RET explicitly stops the current process, effectively returning control to the parent. Exceptions such as divide-by-zero in any process have the effect of a RET, providing the parent a status code indicating why the child stopped.

The above interaction model ensures global determinism because processes interact only at well-defined execution points determined by each process’s internal flow: namely, when the parent does a GET or PUT and the designated child has stopped. The kernel gives ordinary processes no ability to wait for “the first child that stops,” nor to race each other to insert or remove items from message queues shared among multiple threads. See the underlying formal model [19] for more details.

If any process contains a bug causing an endless loop, other processes trying to synchronize with it might block forever. To address this risk and facilitate debugging, a process can specify an instruction limit when it starts a child: the child and its descendants collectively execute at most this many instructions before the kernel forcibly returns control to the parent. Counting instructions enables processes to regain control of errant children without violating determinism, and also allows processes to “quantize” the execution of children and implement deterministic scheduling schemes [5, 13].
Since the kernel permits processes to share no physical state, they can communicate only by copying data via GET and PUT. The kernel uses copy-on-write to optimize large virtual copies, and uses similar techniques to optimize merge operations, so merging a page that either the parent or the child has left unmodified requires only page-level remapping. Leveraging this efficient virtual copy primitive, the C library linked into each process implements logical shared state abstractions purely in user space. The C library emulates shared state by treating the guest’s process hierarchy like a distributed system. Each process maintains a replica of the shared state, and processes reconcile this state at well-defined synchronization points during program execution, as in replicated file systems [26] and distributed shared memory (DSM) systems [11].
Shared File System.
Determinator’s C library currently emulates the Unix file API by reading and writing a file system image stored in the process’s own virtual memory. (Files could alternatively be stored in child processes not used for execution, reducing address space usage and the danger of wild memory writes corrupting shared files.)

The C library also implements Unix’s fork, exec*, and wait* functions, to create and execute child processes whose virtual memory is not logically shared with the parent but whose file system is shared. The fork function clones the parent process, including its file system image, into a new child process. The exec* functions replace the current process, except for its file system image, with a new executable loaded from the file system.

The wait* functions not only synchronize with a child process as in Unix, but also use file versioning [26] to merge the parent’s and child’s file system changes. The file system implements no locking or ownership, so concurrent writes to a file cause conflicts, which the C library detects and flags. A conflict makes further file access attempts return errors, until the user resolves the conflict and explicitly clears the flag (or fixes the bug causing the conflict and reruns the job). Concurrent writes are allowed in one case, however: if all writes are append-only (O_APPEND), as with standard output or log files, reconciliation simply collects all appends without concern for file offsets or ordering, yielding effects analogous to those of asynchronous appends in Unix.
Shared Memory.
Determinator’s C library also emulates shared memory parallelism, currently via a simple thread fork/join API. The tfork function clones the entire parent process, like fork, but tjoin not only merges file system changes but also merges the child’s changes to regular process memory into the parent, using the kernel’s merge operation described in Section 4.2. The result is a deterministic analog of release-consistent DSM [11] we refer to as deterministic consistency, detailed elsewhere [3]. Unlike deterministic schedulers that emulate sequential consistency by executing threads under an artificial “round-robin” schedule [5, 6, 13], deterministic consistency need not rely on speculation to achieve parallelism and never needs to re-execute code due to misspeculation. Deterministic consistency also makes the effects of parallel execution not only precisely repeatable but also more predictable to the software developer. If two threads execute the statements x = y and y = x concurrently, for example, under deterministic consistency the result is always to swap the values of x and y, whereas under deterministic schedulers the result depends on relative code path lengths and hence on subtle program input variations. Determinator’s runtime can also provide deterministic scheduling for compatibility with legacy parallel code, though this execution mode has performance and predictability costs [4].

An early Determinator prototype currently runs on the 32-bit x86 architecture, and implements both the shared file system and shared memory parallel APIs described above atop the kernel’s deterministic “shared-nothing” processes. The prototype has no TCP/IP networking or persistent storage as yet, and merely accepts jobs from the console. The shared file system supports only 256 files, each up to 4MB in size, reflecting the limitations of a 32-bit address space.
The prototype nevertheless suggests the feasibility of providing convenient and familiar parallel programming abstractions under a regime of kernel-enforced determinism.
To offer some evidence that the timing-hardened cloud computing architecture proposed in this paper may be feasible and efficient at least for some workloads, we briefly evaluate the current Determinator prototype using several parallel benchmarks. We use the following benchmarks: md5 is an “embarrassingly parallel” brute-force MD5 password cracker; matmult is an integer matrix multiply; qsort is a recursive parallel quicksort on an integer array; blackscholes is a financial benchmark from the PARSEC suite [8]; fft is a parallel Fast Fourier Transform from SPLASH-2 [36]; and lu_cont and lu_noncont are LU-decomposition benchmarks also from SPLASH-2. We ran all benchmarks on a 12-core (2-socket) machine.

The md5, matmult, and qsort benchmarks, which perform a substantial amount of computation between inter-thread synchronization events, consistently run nearly as fast and sometimes faster on Determinator compared with Linux. The md5 benchmark surprisingly scales much better on Determinator than on Linux, achieving a substantially larger speedup over Linux on 12 cores; we have not yet determined the precise cause of this performance increase but suspect bottlenecks in Linux’s thread system [33]. The blackscholes benchmark is also “embarrassingly parallel,” but our port of this benchmark uses deterministic scheduling for compatibility with the pthreads API, incurring a constant performance overhead [4]. The more fine-grained SPLASH-2 benchmarks exhibit higher performance costs on Determinator due to their more frequent inter-thread synchronization.

Figure 4: Performance of several parallel benchmarks running deterministically on Determinator, versus nondeterministic execution on Linux.

We also examined whether we could more easily reduce (though not eliminate) timing information leaks in stock Linux kernels, simply by removing access to accurate timers in both the kernel and applications.
Disabling these high-resolution timers does not prevent processes from creating ad hoc timers via parallel threads, of course, as discussed in Section 2 and illustrated in Figure 1. Nevertheless, to test the effect of timer unavailability on a stock OS, we compiled the Linux kernel and applications to eliminate use of cycle counting instructions such as rdtsc and high-resolution timers. Interestingly, we found that the throughput of the Apache web server under load dropped by about 20% compared to the unmodified case, because the web server and the kernel TCP/IP stack rely on high-resolution timers for estimating client latency, cache sizes, etc. This result suggests that there are no simple workarounds to close timing channels while delivering high throughput.

TCP’s dependency on high-resolution timers does not present an immediate problem in our proposed cloud architecture, as long as TCP is implemented in a provider-controlled kernel or VMM: the provider’s kernel is trusted and can use high-resolution timers. Dependencies on high-resolution timers in application-level suites such as Web services, however, are likely to present a pragmatic challenge when run under any timing channel control mechanism; we leave further evaluation of these challenges to future work.
5. RELATED WORK
Timing channels are well-studied [20, 37], but only recently examined in the cloud context [12, 30]. Most proposed solutions to recent cache-based attacks [1, 2, 28, 35] involve cache partitioning [21], requiring hardware modifications and decreasing performance. Specific algorithms may be hardened [34], but the only known general solution—resource partitioning—limits statistical multiplexing and undermines the cloud business model.

Deterministic execution has been used for other purposes such as replay debugging [22] and intrusion analysis [14], and its benefits for parallel programming are well-recognized [9, 23]. Parallel languages such as SHIM [16] and DPJ [9] provide deterministic programming models for these reasons, but they cannot run legacy or multi-process parallel code. User-level deterministic schedulers [5, 6] can provide determinism within one well-behaved process, but cannot supervise multiple interacting processes or prevent misbehaved applications from escaping the deterministic environment.

Cloud providers must be able to enforce determinism in guests in order to eliminate timing channels using our architecture. The only system we know of that can enforce determinism on multiprocessor guests is SMP-ReVirt [15]. While impressive, SMP-ReVirt is designed to replay prior nondeterministic executions, rather than to execute guests deterministically “from the start,” and its performance cost is too high for everyday use.
6. CONCLUSION
We have proposed a new, general approach to combating timing channels in clouds via provider-enforced deterministic execution. The key benefit of this approach is that it eliminates the exploitability of all timing channels internal to a cloud, independent of the type of resource manifesting the channel, without undermining the cloud's elasticity through resource partitioning. Preliminary results from our determinism-enforcing OS suggest that such a timing-hardened architecture may be feasible and efficient, at least for some applications, but many questions remain. Can such an architecture support fine-grained parallel applications, interactive Web applications, or transactional storage- or communication-intensive applications? Can it offer cloud customers a rich and convenient, yet efficient, programming model in which to express such applications deterministically? Can deterministic clouds reuse legacy software and operating systems? Only further exploration will tell.
7. REFERENCES
[1] O. Acıiçmez. Yet another microarchitectural attack: Exploiting I-cache. In CSAW, Nov. 2007.
[2] O. Acıiçmez, Ç. K. Koç, and J.-P. Seifert. Predicting secret keys via branch prediction. In CT-RSA, Feb. 2007.
[3] A. Aviram and B. Ford. Deterministic consistency: A programming model for shared memory parallelism, Feb. 2010. http://arxiv.org/abs/0912.0926.
[4] A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient system-enforced deterministic parallelism. In OSDI, Oct. 2010. To appear. Preprint available at: http://arxiv.org/abs/1005.3450.
[5] T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: A compiler and runtime system for deterministic multithreaded execution. In ASPLOS, Mar. 2010.
[6] E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe multithreaded programming for C/C++. In OOPSLA, Oct. 2009.
[7] B. N. Bershad et al. Extensibility, safety and performance in the SPIN operating system. In SOSP, 1995.
[8] C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: Characterization and architectural implications. In PACT, Oct. 2008.
[9] R. L. Bocchino Jr., V. S. Adve, S. V. Adve, and M. Snir. Parallel programming must be deterministic by default. In HotPar, Mar. 2009.
[10] D. Brumley and D. Boneh. Remote timing attacks are practical. In USENIX Security, Aug. 2003.
[11] J. B. Carter, J. K. Bennett, and W. Zwaenepoel. Implementation and performance of Munin. In SOSP, Oct. 1991.
[12] S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy, May 2010.
[13] J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic shared memory multiprocessing. In ASPLOS, Mar. 2009.
[14] G. W. Dunlap, S. T. King, S. Cinar, M. A. Basrai, and P. M. Chen. ReVirt: Enabling intrusion analysis through virtual-machine logging and replay. In OSDI, Dec. 2002.
[15] G. W. Dunlap, D. G. Lucchetti, M. A. Fetterman, and P. M. Chen. Execution replay for multiprocessor virtual machines. In VEE, Mar. 2008.
[16] S. A. Edwards, N. Vasudevan, and O. Tardieu. Programming shared memory multiprocessors with deterministic message-passing concurrency: Compiling SHIM to Pthreads. In DATE, Mar. 2008.
[17] P. Efstathopoulos et al. Labels and event processes in the Asbestos operating system. In SOSP, Oct. 2005.
[18] B. Ford, M. Hibler, J. Lepreau, P. Tullmann, G. Back, and S. Clawson. Microkernels meet recursive virtual machines. In OSDI, pages 137–151, 1996.
[19] G. Kahn. The semantics of a simple language for parallel programming. In Information Processing, pages 471–475, 1974.
[20] R. A. Kemmerer. Shared resource matrix methodology: An approach to identifying storage and timing channels. TOCS, 1(3):256–277, Aug. 1983.
[21] J. Kong, O. Acıiçmez, J.-P. Seifert, and H. Zhou. Deconstructing new cache designs for thwarting software cache-based side channel attacks. In CSAW, Oct. 2008.
[22] T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with Instant Replay. IEEE Transactions on Computers, C-36(4):471–482, Apr. 1987.
[23] E. Lee. The problem with threads. Computer, 39(5):33–42, May 2006.
[24] L. Liu, E. Yu, and J. Mylopoulos. Analyzing security requirements as relationships among strategic actors. In SREIS '02: 2nd Symposium on Requirements Engineering for Information Security, Oct. 2002.
[25] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. In ASPLOS, pages 329–339, Mar. 2008.
[26] D. S. Parker, Jr. et al. Detection of mutual inconsistency in distributed systems. IEEE Transactions on Software Engineering, SE-9(3), May 1983.
[27] S. Pearson. Taking account of privacy when designing cloud computing services. In ICSE-Cloud '09, pages 44–52, May 2009.
[28] C. Percival. Cache missing for fun and profit. In BSDCan, May 2005.
[29] N. R. Potlapally et al. Satisfiability-based framework for enabling side-channel attacks on cryptographic software. In DATE, Mar. 2006.
[30] T. Ristenpart et al. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS, pages 199–212, 2009.
[31] N. Santos, K. P. Gummadi, and R. Rodrigues. Towards trusted cloud computing. In HotCloud, June 2009.
[32] S. Soltesz, H. Pötzl, M. E. Fiuczynski, A. Bavier, and L. Peterson. Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors. In EuroSys, Mar. 2007.
[33] R. von Behren et al. Capriccio: Scalable threads for internet services. In SOSP '03.
[34] C. Vuillaume and K. Okeya. Flexible exponentiation with resistance to side channel attacks. In ACNS, pages 268–283, June 2006.
[35] Z. Wang and R. B. Lee. Covert and side channels due to processor architecture. In ACSAC, Dec. 2006.
[36] S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 programs: Characterization and methodological considerations. In ISCA, pages 24–36, June 1995.
[37] J. C. Wray. An analysis of covert timing channels. In IEEE Symposium on Security and Privacy, May 1991.
[38] N. Zeldovich, S. Boyd-Wickizer, E. Kohler, and D. Mazières. Making information flow explicit in HiStar. In 7th OSDI, Nov. 2006.