SIVSHM: Secure Inter-VM Shared Memory
Shesha B. Sreenivasamurthy
Univ. of California, Santa Cruz
[email protected]
Ethan L. Miller
Univ. of California, Santa Cruz
[email protected]
Abstract
With widespread acceptance of virtualization, virtual machines (VMs) find their presence in various applications such as Network Address Translation (NAT) servers, firewall servers and MapReduce applications. Typically, in these applications a data manager collects data from the external world and distributes it to multiple workers for further processing. Currently, data managers distribute data to workers either using inter-VM shared memory (IVSHMEM) or network communication. IVSHMEM provides better data distribution throughput but sacrifices security, as all untrusted workers have full access to the shared memory region, while network communication provides better security at the cost of throughput. Secondly, IVSHMEM uses a central distributor to exchange eventfds (a file descriptor to an event queue of length one) used for inter-VM signaling. This central distributor becomes a bottleneck and increases the boot time of VMs. Secure Inter-VM Shared Memory (SIVSHM) provides both security and better throughput by segmenting inter-VM shared memory so that each worker has access only to the segment that belongs to it, thereby enabling security without sacrificing throughput. SIVSHM boots VMs in 30% less time compared to IVSHMEM by eliminating the central distributor from its architecture and enabling direct exchange of eventfds amongst VMs.
Introduction

In cloud computing, a collection of VMs provides a service. In a multi-tenant environment, each tenant runs multiple such services. MapReduce services have a manager (mapper) that farms incoming data out to multiple workers (reducers), making routing decisions, and the workers run the computation using programs provided by untrusted third-party computation providers and return the result back to the manager [18]. In the rest of this paper, manager and mapper are used interchangeably, and so are workers and reducers. The goal is to ensure that each reducer has access to only its portion of data, thereby preventing any information leak amongst the reducers. Data sharing using a traditional network provides that natural boundary whereby data of one reducer is inaccessible to another. If mapper and reducer VMs run on the same physical host, data can be shared by inter-VM shared memory [11, 13]. Although this improves performance significantly, it is vulnerable to information leakage among the VMs.

SIVSHM solves this problem by segmenting the shared memory and mapping only a segment to each reducer VM in the hypervisor. Thus, each reducer VM has access to only its segment. Any illegal access of memory by the guest kernel impacts only the adversary VM without affecting the host, the mapper or other reducer VMs.

Inter-VM interrupts are used between mapper and reducer VMs to signal data availability and task completion. This is accomplished by exchanging eventfds [20, 21] during startup with the aid of an eventfd distributor [11, 13]. One eventfd distributor per service is required for services using IVSHMEM. However, in a multi-tenant multi-service environment, the eventfd distributor becomes a bottleneck during VM startup, thereby increasing boot time. Additionally, this extra software component needs to be managed by cloud service management software and is a management overhead.

SIVSHM makes inter-VM shared memory more conducive to the cloud environment by enabling direct exchange of eventfds between mapper and reducers, thereby eliminating the eventfd distributor from the architecture. This enables SIVSHM to boot a service with 32 VMs in 30% less time compared to IVSHMEM [11, 13]. SIVSHM makes the following two contributions:

1. a secure inter-VM shared memory architecture
2. improved boot time of services

SIVSHM is built using the same underlying mechanism (a virtual-PCI device) as IVSHMEM and therefore performs similarly during data transfer, as shown in figure 3. The advantages of SIVSHM over IVSHMEM are security, improved boot time and manageability. Data between mapper and reducers is predominantly transferred using high speed network interfaces. Therefore, the throughput of SIVSHM is compared with VirtIO [1] and, to the best of our knowledge, no prior work has made such a comparative study in a MapReduce context. Compared to VirtIO, SIVSHM takes 40% less time to process 32 GB of data for a small configuration with 3 reducers and 60% less time for a large configuration with 31 reducers.

In general, MapReduce applications fall into the Recognition, Mining and Synthesis (RMS) framework proposed by Intel [10]. They find their presence in various applications such as database engines, virtual routers, virtual firewalls, load balancers, video stream processing and numerous other applications including neural prostheses [9]. There can be applications where intermediate results produced on one machine are processed on another, either simultaneously or later in time [12].
SIVSHM can be used by all applications that fall under this category.

Related Work

There have been efforts to improve the performance of MapReduce-type applications. Phoenix [2] is an implementation of MapReduce for shared-memory systems that includes a programming API and an efficient runtime system. However, the drawback of Phoenix is that both mapper and reducers run natively on the physical system, which makes it less conducive to a cloud environment. SIVSHM solves this problem by enabling mapper and reducers to run on different VMs, and also opens the opportunity for a multi-operating-system environment where reducer applications run on different types of operating systems. Additionally, failure of the mapper or a reducer VM will not affect other VMs in the system, and the failed VM can be restarted without unlinking the shared memory, thereby not losing any running VM's work.

Similar to SIVSHM, IVSHMEM can be used to share memory between VMs. Both are implemented as virtual-PCI (vPCI) devices in QEMU (explained later) and map a shared memory region to PCI device memory. Data transfer performance of SIVSHM and IVSHMEM is similar as they are built using the same underlying vPCI mechanism. However, the advantages of SIVSHM over IVSHMEM are security, improved boot time and manageability.

Non-IVSHMEM/SIVSHM MapReduce services distribute data between mapper and reducers over the network using one of the two popular virtual network devices, e1000 or VirtIO. VirtIO is a virtual network device that enables high speed data transfer between any two VMs. A simple iperf [8] test shows VirtIO has 10x the bandwidth of the virtual e1000 device (4.12 Gbps vs 406 Mbps). As VirtIO is the predominant way to transfer data between VMs, the data transfer throughput of SIVSHM is compared with VirtIO and IVSHMEM.

Several applications have previously been developed using the IVSHMEM infrastructure [15, 16, 17]. Gordon modified the Phoenix MapReduce application to use IVSHMEM so that each Phoenix MapReduce thread runs inside a VM. This enables Phoenix to use shared memory and yet be conducive to the cloud environment [14]. However, it suffers from information leakage as explained earlier.

Airavat [18] runs on SELinux [19] to provide security to MapReduce applications in a cloud environment without shared memory. Data is exchanged over the network and therefore it performs similarly to VirtIO. SIVSHM improves performance with the use of shared memory and provides security by memory segmentation.
Background

Quick EMUlator (QEMU) [3, 4] is a user-space hardware emulator that, combined with the Kernel Virtual Machine (KVM) [6], provides a complete virtualization environment in Linux. Hardware devices are emulated in QEMU, while memory management and guest instruction execution are performed by KVM. QEMU triggers guest instruction execution using a blocking KVM_RUN ioctl to KVM. When a guest I/O instruction, such as a read or write of a vPCI IO-register, is encountered by KVM, it returns from the ioctl for QEMU to emulate that hardware operation. During the time when control is within QEMU, guest instruction execution is stopped. Similarly, when a user signal is sent to the QEMU process, KVM returns from the ioctl and, after handling the signal, guest instruction execution is resumed by QEMU by issuing the KVM_RUN ioctl again. Note that a read or write of a vPCI IO-register by the guest results in a VM-exit and a context switch, which are expensive operations [25, 26].
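This interaction can be pictured as a simple execution loop in the userspace emulator. The sketch below illustrates the idea in C, assuming the /dev/kvm setup (VM and vCPU creation, and the mmap of the kvm_run area) has already been done; it is not QEMU's actual main loop, only a minimal illustration of the KVM_RUN / VM-exit cycle.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Minimal guest-execution loop: vcpu_fd is a KVM vCPU file descriptor and
 * run points at its mmap'ed struct kvm_run area. */
static void run_guest(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        /* Blocking ioctl: guest instructions execute inside KVM until some
         * event (e.g. a vPCI IO-register access) forces a VM-exit. */
        if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
            break;                       /* interrupted by a signal, etc. */

        switch (run->exit_reason) {
        case KVM_EXIT_IO:
            /* The guest touched an emulated IO register: userspace emulates
             * the device while guest execution is stopped. */
            /* emulate_pio(run); */
            break;
        case KVM_EXIT_MMIO:
            /* emulate_mmio(run); */
            break;
        default:
            return;
        }
    }
}
```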
Eventfd forms the backbone of both IVSHMEM's and SIVSHM's interrupt architecture. The eventfd() system call creates a kernel object that can be used as an event wait/notify mechanism between user-space applications and the kernel. The system call returns a file descriptor (fd) associated with the kernel object, called an eventfd, to the user application. The kernel object contains an unsigned 64-bit integer counter that is maintained by the kernel. In IVSHMEM and SIVSHM, eventfds are used by QEMU, a user-space application, and by the host kernel. Eventfds are mapped to vPCI IO-registers by QEMU, which enables the hypervisor (host kernel) to directly notify VM_b while executing guest instructions of VM_a. These fds are exchanged between VMs during startup for efficient inter-VM signaling, by placing them in the control field of a unix-socket message. Placing fds in the control field of a message is the standard Linux way of fd exchange, whereby the sender process (QEMU_a) instructs the host kernel to ensure that the receiving process (QEMU_b) receives an fd unique in its own name space. The eventfd exchange mechanism among VMs is improved in SIVSHM (explained in the next section), but the core signaling mechanism is very similar to IVSHMEM and we refer the reader to IVSHMEM [13] for a detailed explanation.
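As a small, self-contained illustration of this wait/notify mechanism (not code taken from IVSHMEM or SIVSHM), the sketch below creates an eventfd and shows the notify (write) and wait (read) sides in a single process.

```c
#include <sys/eventfd.h>
#include <stdint.h>
#include <unistd.h>

int main(void)
{
    /* Kernel-maintained 64-bit counter, initially 0. */
    int efd = eventfd(0, 0);
    if (efd < 0)
        return 1;

    /* Notify side: adds to the counter and wakes any waiter. */
    uint64_t one = 1;
    (void)write(efd, &one, sizeof(one));

    /* Wait side: blocks until the counter is non-zero, then returns
     * (and resets) the accumulated count. */
    uint64_t events;
    (void)read(efd, &events, sizeof(events));

    close(efd);
    return 0;
}
```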
All applications developed using IVSHMEM can also be developed using SIVSHM without any of the above mentioned drawbacks. In this work, a simple IO-intensive MapReduce application is chosen to demonstrate its feasibility.

Architecture

The SIVSHM architecture contains a trusted mapper VM and a farm of untrusted reducer VMs. Guest user-space mapper and reducer applications run inside the mapper and reducer VMs respectively. The SIVSHM architecture is shown in figure 1. A polling or interrupt mechanism is used between mapper and reducers to signal data availability and task completion. The mapper application stripes the data received from the external world to the respective memory slices of the reducers and signals the reducers of data availability. The reducers process the data by accessing their respective slices and inform the mapper when their task is completed. The mapper consolidates the data and presents the result. The main advantages of SIVSHM, security and improved boot performance, are explained below.

Security:
To share memory among VMs and for inter-VM signalling, we implemented a vPCI device called "sivshm" in QEMU. The mapper is always the first VM to be instantiated and gets ID 0; reducers are assigned non-zero IDs. The mapper's sivshm device (sivshm_m) creates a shared memory region and maps the entire region to its PCI device memory. In IVSHMEM, a reducer also maps the entire shared memory region, just like the mapper. In contrast, a reducer's sivshm device (sivshm_r) in SIVSHM gets the shared memory ID and the size of its memory slice from sivshm_m. It offsets into the shared memory region using its own ID as the key and maps only its slice to the PCI device memory before booting the guest. This ensures that the guest running inside a reducer VM has access to only its slice, thereby providing superior security over the IVSHMEM shared memory architecture.
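A simplified sketch of what this reducer-side slice mapping can look like on the host is shown below. The function and parameter names are illustrative assumptions rather than the actual sivshm code, and the slice size and VM ID are assumed to have already been received from sivshm_m.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map only this reducer's slice of the shared memory object so the guest
 * never sees the rest of the region. shm_name, vm_id and slice_size are
 * assumed to have been negotiated with the mapper at startup; the offset
 * must be page-aligned, which holds for the slice sizes used here. */
void *map_reducer_slice(const char *shm_name, int vm_id, size_t slice_size)
{
    int fd = shm_open(shm_name, O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* s_i = shm_start_address + (r_j * slice_size), expressed as an mmap offset. */
    off_t offset = (off_t)vm_id * (off_t)slice_size;
    void *slice = mmap(NULL, slice_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, offset);
    close(fd);                       /* the mapping stays valid after close */
    return slice == MAP_FAILED ? NULL : slice;
}
```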
Boot performance:

Inter-VM signaling between VMs is achieved by eventfds [20, 21] in both SIVSHM and IVSHMEM. In IVSHMEM, the mapper and reducers exchange eventfds through an eventfd distributor (a man in the middle). The eventfd distributor becomes a bottleneck when a service containing a large number of reducers is instantiated. SIVSHM removes this bottleneck by direct exchange of eventfds between mapper and reducer VMs. Mapper and reducer VMs exchange information such as shared memory size, number of clients and client ID via unix-socket messages. Eventfds are piggybacked on those messages by placing them in the message's control field. The format of the message exchanged is shown in figure 2. The exchanged eventfds are added to their respective poll lists, waiting to be notified by the hypervisor of any events. This design eliminates the eventfd distributor used by IVSHMEM from SIVSHM's architecture, enabling better boot performance as shown in section 5.3.

Figure 1: SIVSHM Architecture

Figure 2: Format of the message exchanged between sivshm_m and sivshm_r during startup
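For reference, the sketch below shows the standard Linux fd-passing idiom referred to here: an eventfd attached to the control (ancillary) field of a unix-socket message via SCM_RIGHTS. It is a generic illustration, not the actual sivshm message code.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send `payload` over a connected unix socket and attach `efd` (an eventfd)
 * in the ancillary control field. The kernel installs a duplicate of the
 * descriptor in the receiver's fd table. */
int send_with_eventfd(int sock, const void *payload, size_t len, int efd)
{
    struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
    char ctrl[CMSG_SPACE(sizeof(int))];
    memset(ctrl, 0, sizeof(ctrl));

    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;      /* pass a file descriptor */
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &efd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```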
Implementation

System implementation is explained by taking a bottom-up approach. Physical PCI/PCI-E devices have an on-board RAM called device memory. The address of the device memory is provided in a PCI register called the Base Address Register (BAR). The PCI standard [24] supports 6 BARs and SIVSHM uses two of them: BAR0 for the IO register space and BAR1 for mapping its own slice. A vPCI device in QEMU emulates device memory by allocating host memory. In sivshm, the vPCI device memory is a shared memory object created using the MAP_SHARED flag. The shared memory is locked to prevent it from being swapped out along with the VM by the host OS, as it is not used by just one VM but shared by many. The address of the shared memory object is provided in the emulated BAR. The address in the BAR is perceived as a physical address by the guest and is mapped to a kernel virtual address by the guest driver.

A guest kernel driver (sivshm.ko) was implemented to drive this new vPCI device; it claims the device and maps the vPCI device memory address to a guest kernel virtual address. User applications can use this device to map the device memory into user address space, retrieve device information and generate interrupts. To aid user applications, we implemented a shared library (libsivshm.so) that hides the driver interface details and exposes a simple API to user applications.

sivshm requires a shared memory ID, a unix socket path, a VM ID, a size and the maximum number of reducers as its input. Size and number of reducers are used only by the mapper. These parameters are implemented as device-specific variables, which are specified as command line arguments to QEMU. The mapper is instantiated first and is always assigned ID 0. The specified shared memory ID is deleted (if it is present in the system) and recreated; optionally, unlink=0 can be passed to instruct sivshm not to delete it. Reducers are assigned a non-zero ID. sivshm_m communicates the size of the slice to sivshm_r over the unix socket.

The shared memory region is sliced into equal sized segments: slice_size = total_shm_size / total_VMs. The total VM count includes the mapper, as it gets a slice too, which is used as a message box during polling. The design does not preclude reducers from having different sized segments; however, for the workload used in our experiments, equal sized segments were apt. sivshm_r uses its own ID (r_j) as the key to calculate the start address (s_i) of its device memory region: s_i = shm_start_address + (r_j × slice_size). s_i is mapped to BAR1 of sivshm_r, which allows reducer r_j to access only its slice. A predefined location in each reducer's slice is used as a mailbox to exchange signals between mapper and reducers in polling mode.

All guest instructions are executed by the hypervisor on the guests' behalf. A write by the guest to a vPCI IO-register mapped to an eventfd's kernel object triggers an event to be delivered to the QEMU process waiting on the corresponding user-level fd. If the mapper's guest writes to its IO-register, the hypervisor directly notifies sivshm_r, which sets the interrupt bit. An interrupt is delivered immediately to the reducer VM via the eventfd and is handled by the guest vPCI driver. MapReduce applications register a signal (SIGUSR1) to be delivered when an interrupt is handled by the vPCI kernel driver. Similarly, reducers can also send signals to the mapper.

When the VM boots, the guest kernel driver (sivshm.ko) claims the sivshm device. It maps the device memory to a guest-kernel address and creates a device in the system device tree. Mapper and reducer guest applications communicate with the guest-kernel driver using the simple APIs of SIVSHM's shared library (libsivshm.so). The mapper signals the reducers after distributing the data so that the reducers can start processing it. Similarly, after the reducers have completed their job, they signal the mapper to perform result consolidation. A predefined memory location in the shared memory region is monitored in polling mode for signal exchange.
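The polling-mode exchange can be pictured with a sketch like the one below; the mailbox layout and field names are hypothetical, since the paper does not specify them.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox placed at a predefined offset in each slice. */
struct mailbox {
    _Atomic uint32_t data_ready;   /* set by mapper, cleared by reducer */
    _Atomic uint32_t done;         /* set by reducer, cleared by mapper */
};

/* Reducer-side polling loop: wait for data, process it, report completion. */
static void reducer_poll(struct mailbox *mb, void *slice, size_t slice_size)
{
    for (;;) {
        while (!atomic_load_explicit(&mb->data_ready, memory_order_acquire))
            ;                                   /* spin until mapper signals */
        atomic_store_explicit(&mb->data_ready, 0, memory_order_relaxed);

        /* process_slice(slice, slice_size); */
        (void)slice; (void)slice_size;

        atomic_store_explicit(&mb->done, 1, memory_order_release);
    }
}
```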
In interrupt mode, mapper and reducer guest applications request the guest kernel to send inter-VM interrupts using the sivshm_notify API. These applications register a callback handler with the guest-kernel driver, which is called when an interrupt is fired.
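The application-side flow in interrupt mode might look roughly like the sketch below; sivshm_register_handler() and sivshm_notify() are stand-ins for the libsivshm.so API, whose exact names and signatures are not given here, and the stub definitions exist only so the sketch compiles on its own.

```c
#include <signal.h>
#include <unistd.h>

/* Stubs standing in for libsivshm.so; the real library talks to sivshm.ko. */
static int sivshm_register_handler(void (*h)(int)) { signal(SIGUSR1, h); return 0; }
static int sivshm_notify(int target_vm_id) { (void)target_vm_id; return 0; }

static volatile sig_atomic_t work_pending;

/* Delivered as SIGUSR1 when the guest vPCI driver handles an interrupt. */
static void on_interrupt(int signo)
{
    (void)signo;
    work_pending = 1;
}

int main(void)
{
    sivshm_register_handler(on_interrupt);   /* reducer: wait for the mapper */
    while (!work_pending)
        pause();          /* simplified; a real implementation would use sigsuspend */

    /* ... process this reducer's slice ... */

    sivshm_notify(0);                         /* signal completion to the mapper (ID 0) */
    return 0;
}
```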
Evaluation

The system is implemented using the Linux 3.2.0-23 kernel as both host and guest operating system, the KVM hypervisor and QEMU version 2.3.0 as the device emulator. The setup was tested on a system with 128 GB RAM and four 10-core processors with 2 hyper-threads per physical core, seen as 80 processing units by the operating system. Each VM was allocated 6 GB RAM. SIVSHM was experimented with one mapper VM and {1, 3, 7, 15, 31} reducer VMs. Each vCPU was bound to a physical processing unit. Reducer VMs were allocated 2 vCPUs, one dedicated to the reducer's guest application and the other to kernel activities. The mapper VM runs a multi-threaded application that distributes data to multiple reducers; hence, more vCPUs were allocated to it compared to the reducers. With 31 reducer VMs, we have 62 CPUs allocated to reducers, 2 for the host hypervisor and the remaining 16 for the mapper VM.

The entire set of experiments was repeated 10 times and the average of the transfer and response times (explained later) is shown in figure 3. The error bars denote the standard deviation of the 10 values of each experiment. In each run, a total of 32 GB of random data was processed and the performance of VirtIO, IVSHMEM and SIVSHM is compared. During VirtIO performance measurements, all VMs were networked using a Linux bridge. The maximum possible bandwidth for VirtIO interfaces is achieved with this setup as no packets leave the compute node. A 128 MB application buffer was allocated, as this was the maximum slice size that SIVSHM was experimented with.

Figure 3: Performance comparison of VirtIO, IVSHMEM and SIVSHM to process 32 GB of data with 1 GB shared memory. Xfer – time taken by all mapper threads to complete data transfer. Resp – time the mapper waits to receive responses from all reducers. 'I' denotes Interrupt, 'P' denotes Polling.

We were interested in comparing IO performance. Therefore, a non-CPU-intensive workload, counting the number of occurrences of a specified character and storing the result in the mapper's slice, was chosen as our workload. Transfer time (Xfer) in VirtIO is the time taken by all mapper threads to copy data from the application buffer to the kernel buffer. In SIVSHM and IVSHMEM, it is the time to copy data from the application buffer to the reducer slices. Response time (Resp) is the time the mapper waits to receive responses from all reducers.

In our experiments, 32 GB of data was processed with a 1 GB shared memory region. Slice-sized data is transferred by the mapper in parallel to all reducers. MapReduce performance increases as reducers are added because the amount of data processed by each reducer decreases. This in turn reduces the time taken to complete the work, especially the response time, as shown in figure 3. This effectively shows the performance improvement due to parallelism.

The response times of SIVSHM and VirtIO are very similar as the workers in both cases perform a similar task, counting the number of occurrences of a character. However, the transfer time of SIVSHM and IVSHMEM is significantly lower than VirtIO, as both avoid a data copy from the kernel buffer to the VirtIO device queue in addition to the TCP/IP stack overhead. Compared to VirtIO, SIVSHM takes (3/5)th and (2/5)th the time at low and high reducer counts, in both polling and interrupt modes.

The response time of SIVSHM is marginally better than IVSHMEM at lower reducer counts and the gap gradually decreases at higher reducer counts. This is attributed to lower interrupt latency in SIVSHM, the amount of time between when an interrupt is delivered to the device and when the application handles that interrupt. SIVSHM has a kernel vPCI driver that signals the user process (SIGUSR1) when an interrupt is delivered to it. This is faster than the IVSHMEM driver, which is implemented using the Linux UIO driver [22] infrastructure, where a user-space process that is blocked on a read of an fd has to be woken up. The interrupt latencies of SIVSHM and IVSHMEM, on average, were measured to be 55 µs and 120 µs respectively. Though the number of interrupts increases at higher reducer counts, the latency gap decreases due to interrupt coalescing.

The transfer time remains approximately the same in all methods for different numbers of reducers. This is due to network saturation in VirtIO and memory bandwidth saturation in the case of SIVSHM and IVSHMEM. In figure 4 we can notice that, at higher reducer counts, the data transfer rate of the VirtIO interface saturates at 6 Gbps. This saturation translates to constant transfer time when the reducer count is greater than one. Performance drops at multiple points as the mapper waits for all the reducers to process the data and report completion. For example, a total of 32 GB of data is transferred from the mapper to the reducers, and in the case of 31 reducers each reducer processes ∼1 GB of data.

Figure 4: VirtIO network performance (Mapper)

Figure 5: Mapper CPU concurrency performance. x-axis denotes the number of simultaneously utilized logical CPUs. Graph by Intel VTune [23].

Figure 6: Time to boot different numbers of VMs using SIVSHM and IVSHMEM. Shorter bar is better.

As shown in figure 6, SIVSHM takes ∼30% less time to boot all the VMs when compared to IVSHMEM. This is significant, especially in data centers where services are instantiated and torn down very frequently.

The cloud pricing model is based on resource consumption. Therefore, we wanted to measure SIVSHM's performance with a reduced vCPU count for the mapper. We reduced it to 8 vCPUs, half of the original count. As expected, it can be inferred from figure 7 that SIVSHM performs better with an increased vCPU count. With half the number of vCPUs, a 1.25x increase in transfer time is noticed. Improvement is seen even when the number of reducers, and hence the number of mapper threads, is less than the number of vCPUs. This is attributed to the vCPU resource sharing mechanism in the hypervisor.
However, the gap increases when the ratio of reducer count to vCPU count increases.
Figure 7: SIVSHM performance with different numbers of vCPUs. 'I' denotes Interrupt, 'P' denotes Polling; 8 and 16 denote the number of vCPUs. E.g., SIVSHM-I8 is SIVSHM in interrupt mode with 8 vCPUs.

Figure 8: CPU utilization between polling and interrupt
Another interesting observation is that SIVSHM shows similar performance in both polling and interrupt modes, as seen in figure 3. With similar performance, the motivation to implement the more complex interrupt architecture is the improvement in CPU utilization: it is 100% in polling mode and hovers between 55-60% in interrupt mode, as shown in figure 8. The remaining 40% is available for other processes, which is a huge benefit of using an interrupt-driven architecture.
Conclusion
Shared memory has long been used to significantly boost the performance of regular applications. Sharing memory among VMs, however, is a more recent development. SIVSHM is a secure inter-VM shared memory architecture that can be used to boost the performance of many cloud applications. SIVSHM takes 0.6x and 0.4x the amount of time compared to VirtIO for small and large numbers of reducers respectively, as VirtIO involves extra data copying and additional TCP/IP stack overhead.

The main restriction of SIVSHM is that the VMs should be running on the same compute node. In spite of this, applications can still benefit from SIVSHM, as many of these applications run multiple copies for performance and to insulate customers from software bugs, both of which can be achieved by running the VMs carrying these applications on the same compute node. However, we believe that we can overcome this restriction by building an architecture in which SIVSHM is used when VMs are co-located on the same compute node and RDMA [7] technology is used when VMs are on different compute nodes. This hybrid architecture could then be used in any data center or cloud environment to improve the performance of a variety of applications without the restriction that the VMs run on the same compute node.
References

[1] Russell, Rusty. "virtio: towards a de-facto standard for virtual I/O devices." ACM SIGOPS Operating Systems Review 42.5 (2008): 95-103.
[2] Ranger, Colby, et al. "Evaluating MapReduce for multi-core and multiprocessor systems." High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on. IEEE, 2007.
[3] Bellard, Fabrice. "QEMU, a Fast and Portable Dynamic Translator." USENIX Annual Technical Conference, FREENIX Track. 2005.
[4] Bartholomew, Daniel. "QEMU a Multihost Multitarget Emulator." Linux Journal 2006.145 (2006): 3.
[5] Jujjuri, Venkateswararao, et al. "VirtFS: A virtualization aware File System passthrough." Ottawa Linux Symposium (OLS). 2010.
[6] Kivity, Avi, et al. "kvm: the Linux virtual machine monitor." Proceedings of the Linux Symposium. Vol. 1. 2007.
[7] Recio, Renato, et al. "A Remote Direct Memory Access Protocol Specification." RFC 5040, October 2007.
[8] iperf. Retrieved from http://iperf.sourceforge.net on May 02, 2015.
[9] Linderman, Michael D., and Teresa H. Meng. "A low power merge cell processor for real-time spike sorting in implantable neural prostheses." Circuits and Systems, 2006. ISCAS 2006. Proceedings. 2006 IEEE International Symposium on. IEEE, 2006.
[10] Dubey, Pradeep. "Recognition, mining and synthesis moves computers to the era of tera." Technology@Intel Magazine 9.2 (2005): 1-10.
[11] IVSHMEM: Inter-VM Shared Memory. Retrieved from http://dpdk.org/doc/guides/prog_guide/ivshmem_lib.html on May 02, 2015.
[12] Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS Operating Systems Review. Vol. 37. No. 5. ACM, 2003.
[13] Macdonell, A. Cameron. Shared-memory Optimizations for Virtual Machines. Diss. University of Alberta, 2011.
[14] Gordon, Adam Wolfe. Enhancing Cloud Environments with Inter-Virtual Machine Shared Memory. Diss. University of Alberta, 2011.
[15] Ke, Xiaodi. Interprocess Communication Mechanisms with Inter-Virtual Machine Shared Memory. Diss. University of Alberta, 2011.
[16] Diakhaté, François, et al. "Efficient shared memory message passing for inter-VM communications." Euro-Par 2008 Workshops - Parallel Processing. Springer Berlin Heidelberg, 2009.
[17] Mohebbi, Hamid Reza, Omid Kashefi, and Mohsen Sharifi. "ZIVM: A zero-copy inter-VM communication mechanism for cloud computing." Computer and Information Science 4.6 (2011): p18.
[18] Roy, Indrajit, et al. "Airavat: Security and Privacy for MapReduce." NSDI. Vol. 10. 2010.
[19] McCarty, B. SELinux: NSA's Open Source Security Enhanced Linux. O'Reilly Media, 2004.
[20] eventfd. Retrieved from http://man7.org/linux/man-pages/man2/eventfd.2.html on Sep 02, 2015.
[21] Kerrisk, Michael. The Linux Programming Interface. No Starch Press, 2010.
[22] Koch, Hans J., and H. Linutronix GmbH. "Userspace I/O drivers in a realtime context." The 13th Realtime Linux Workshop. 2011.
[23] Intel VTune Amplifier 2016. Downloaded from https://software.intel.com/en-us/intel-vtune-amplifier-xe on Sep 29, 2015.
[24] PCI Local Bus Specification Revision 3.0. Downloaded on Sep 29, 2015.
[25] Kivity, Avi, et al. "kvm: the Linux virtual machine monitor." Proceedings of the Linux Symposium. Vol. 1. 2007.
[26] Zhang, Binbin, et al. "Evaluating and optimizing I/O virtualization in kernel-based virtual machine (KVM)." Network and Parallel Computing. Springer Berlin Heidelberg, 2010. 220-231.