Secure Memory Management on Modern Hardware
Reto Achermann, Nora Hossle, Lukas Humbel, Daniel Schwyn, David Cock, Timothy Roscoe
Systems Group, Department of Computer Science, ETH Zurich
Abstract
Almost all modern hardware, from phone SoCs to high-end servers with accelerators, contains memory translation and protection hardware like IOMMUs, firewalls, and lookup tables which make it impossible to reason about, and enforce, protection and isolation based solely on the processor’s MMUs. This has led to numerous bugs and security vulnerabilities in today’s system software.

In this paper we regain the ability to reason about and enforce access control using the proven concept of a reference monitor mediating accesses to memory resources. We present a fine-grained, realistic memory protection model that makes this traditional concept applicable today, and bring system software in line with the complexity of modern, heterogeneous hardware.

Our design is applicable to any operating system, regardless of architecture. We show that it not only enforces the integrity properties of a system, but does so with no inherent performance overhead, and that it is even amenable to automation through code generation from trusted hardware specifications.
Both new, fully-verified kernels and traditional production-quality operating systems rely on a model of memory addressing and protection so simple it is rarely remarked on: RAM and devices reside at unique addresses in a single, shared physical address space, and all cores have homogeneous memory management units (MMUs) which translate virtual addresses into this single physical address space.

The OS running on the platform then fulfills two roles. First, it manages resource allocation. Virtual memory makes multiplexing hardware easier by decoupling the application’s view of memory from the physical resources managed by the OS, allowing late binding of addresses. Second, it forms, alongside the MMU, a reference monitor [4]: all resource accesses (dereferences) are intercepted by the monitor (specifically the TLB), and checked against an access-control policy. This has for decades formed the basis for secure process isolation in all operating systems implementing virtual memory.

The reference monitor concept repeats throughout traditional OS design, with more sophisticated abstractions gradually built up, and their associated security properties enforced through a combination of hardware-provided monitors (e.g. MMUs), and software ones (e.g. traps and syscalls).

For example, consider name (or address) resolution and authorization checks in the mmap() syscall. A process begins with a reference to a file: its filename. The OS, meanwhile, enforces some access-control policy, e.g. UNIX-style permissions. The calling process dereferences the filename by passing it to the open() syscall, whereupon the OS validates the request against policy (permissions), and resolves the reference to another reference: the file descriptor (FD), now referring to an entry in the global open-file table. The existence of this entry, and the fact that the process may possess a reference to it, are justified by the top-level policy; the pattern of open files and FDs (the state) is a projection of something permitted by the policy.

This pattern is replicated in the VM system thanks to mmap(). Unix cannot directly interpose on memory reads and writes (to the buffer cache page mapped to the user), but does implement the initial mmap() call, and the page fault handler. The kernel builds a reference monitor by composing itself with that provided by the MMU. On an mmap() call, the kernel verifies that the FD is valid, with appropriate permissions (e.g. write), before constructing a VM region to back the mapping. The policy encoded in the region’s flags is thus a (transitive) projection of the original file system permissions. On a page fault, the kernel is again invoked to lazily populate the region (from the buffer cache). Now, it can consult the mapping parameters (e.g. writable), and translate these to flags in the page-table entry.

Thus, the page-table state (e.g. permission bits), and thence the eventual TLB state, are justified by a chain of monitors all the way back up to the system policy (file system permissions). The MMU enforces this projected policy on the OS’ behalf. Together they form, in security terms, a compound reference monitor to enforce a policy both on real hardware resources (RAM), and abstract OS-specific objects (processes, files).

This model has worked well for decades, but has been undermined by a changing hardware contract. A modern system contains not just processors and their attached MMUs, but system MMUs or IOMMUs, memory firewalls, region lookup tables, etc., all of which mediate access to and from parts of the platform. “Smart” devices like GPGPUs, co-processors, network cards, or accelerators come with their own hardware protection and translation units [20].

In such a system, the processor’s MMU alone does not form a reference monitor for memory, as it is not invoked on all accesses. Indeed, the complex address-translation topology of these systems renders even the concept of a unique physical address meaningless, raising the risk that the policy encoded into the distributed hardware reference monitor (the collection of MMUs, SMMUs, etc.) is inconsistent due to their differing views of the machine. These two problems have already led to security vulnerabilities [32, 33, 37, 41].

We identify three classes of security vulnerabilities and bugs (Table 1) that i) cause the execution of an operation without sufficient rights (a failure of policy enforcement), ii) allow a compromise of the reference monitor itself (e.g. writing translation tables, a failure of partitioning), or iii) use the wrong addresses in descriptors or pointers (a failure of name resolution). The lack of a proper reference monitor which is aware of the complex and configurable addressing network continues to result in numerous bugs and security vulnerabilities [14, 21, 42, 45, 46, 53, 61].
Confining these bugs in a kernel is hard, and they are likely to compromise the entire system [13].

In this paper we demonstrate that these whole classes of bugs can be prevented by extending the traditional OS-MMU reference monitor to cover all hardware translation and enforcement engines, allowing policy enforcement on all memory accesses; by ensuring consistent name resolution through adopting the decoding net [1, 2] as a more faithful model of modern addressing hardware; and by ensuring the secure partitioning of reference monitor state, either through a partitioned capability system, or in a traditional kernel (such as Linux) by good software engineering practice and the application of existing memory management interfaces.

Our first contribution is to identify the undermining of the traditional OS-MMU reference monitor by a changing hardware/software contract as the root cause of several large classes of critical security bugs.

Our second contribution is to adopt a faithful model of complex addressing hardware (the decoding net), and from it derive a minimal least-privilege model of memory management authority on modern hardware, covering the common functionality of all virtual memory systems (§ 4.1).

Our third contribution is the specification of an OS-agnostic reference monitor to enforce policy expressed in the above model, prototyped as an executable specification in Haskell, and abstracting the OS’s internal policy language (e.g. capabilities or ACLs) as an access-control matrix.

Our fourth contribution is to demonstrate that this reference monitor design can be implemented without invasive changes on either partitioned capability systems (e.g. seL4 or Barrelfish), or on an ACL-based UNIX-style kernel (such as Linux). Further, our benchmarks demonstrate that there is no measurable performance cost for a secure, fully-explicit, least-privilege, system-wide virtual memory authority implementation (§ 6).

Type                 CVE-...
Policy enforcement   1999-1166  2014-3601  2014-8369  2014-9888  2017-16994
                     2019-2250  2019-10538  2019-10539  2019-10540
Partitioning         2011-1898  2013-4329  2014-0972  2018-1038  2018-11994
                     2019-2182  2019-19579
Name resolution      2013-4329  2014-9932  2016-3960  2016-5349  2017-8061
                     2017-12188  2019-15099

Table 1: Classes of Security Vulnerabilities.
The difficulty of getting complex memory addressing right in an OS is shown by the steady, ongoing stream of related bugs and vulnerabilities in operating systems, for example, policy enforcement in Linux’s memory management code [25].

We identify three classes of common bugs and security vulnerabilities related specifically to the incompleteness of the current reference monitor, which would be rendered impossible under a comprehensive reference monitor that faithfully reflects the hardware:
Policy Enforcement.
These are bugs where a subject was able to change the configuration of a translation unit without having the proper rights to do so. Here the reference monitor fails to enforce the system policy:
• Mappings with holes belonging to another subject [39].
• Incorrect permissions on data pages [40].
• IOMMU configured to map too large a range [47–49].
All these bugs are impossible once the operations are performed through a (correct) reference monitor implementing the system security property.
Partitioning.
These bugs involve bypassing the reference monitor, e.g. by directly modifying its internal state:
• DMA transfers into MSI-x interrupt registers [36].
• DMA transfers into IOMMU control registers [38].
• Process modifies its own page table [44].
These are prevented once the reference monitor state is identified and partitioned by subjecting it to system policy, e.g. that no DMA engine or process may map a page table.

Name Resolution.
This class represents inconsistent interpretations of pointers (names):
• Insufficient context to identify the correct object [42].
• Resolving addresses in the wrong context [43].
These are prevented once names are dereferenced (resolved) through a monitor with a complete, accurate model of addressing.
Before presenting our authority model and the executable specification in the next section, we will briefly cover reference monitors in a little more detail, in particular the importance of consistent naming, and how complex addressing topologies make it difficult.

We also summarize the existing decoding net model, the executable specification/refinement approach which we borrow from the seL4 system, and the related work.
The reference monitor is a powerful structuring concept in access control, and is implicitly used in practically every OS. A reference monitor enforces an access-control policy, allowing a separation of concerns, and thus effort: if every access is subject to the policy, then the overall safety of the system (w.r.t. the policy) can be guaranteed independently of the correctness of the components making the accesses. This is of enormous benefit to a monolithic system (e.g. Linux), where a fault in one subsystem can easily spread to others, particularly as any subsystem can, in principle, modify translations. Even without enforcing a strict boundary between components (as in a microkernel), routing all updates via a single component responsible for safety ensures that accidental errors will no longer lead to a whole-system compromise.

The critical point for a reference monitor is that all accesses must pass through it, and that it is able to accurately identify which resources are being accessed (e.g. which DRAM address will ultimately be written) when applying its policy. Both of these are undermined in the complex address-translation networks of modern systems, but not fatally so: the hardware component of the reference monitor is now distributed among multiple system MMUs, firewalls, etc.; addresses may be rewritten after policy is applied, routing them to locations that should not be accessible.

Both of these problems are solved with an accurate model of the hardware: first, to know the complete set of access-control components that must be included in the reference monitor, and second, to guarantee that any translation below the access-control level is consistent with policy.
As established, modern platforms are composed of multiple, heterogeneous cores and devices, each of which can issue accesses to addressable resources such as DRAM, non-volatile memory or device registers. Worse, there is no single “reference” physical address space [20]. Instead, a network of address spaces or buses is connected by address translation units which “route” memory accesses. As just described, in order to securely enforce access control, it is essential to know what final resource some intermediate address (or name) refers to.

I/O memory management units (IOMMUs, or system MMUs) translate addresses generated by accelerators and DMA-capable devices into a “canonical” system-wide physical address space. This allows user-space programs to share a virtual address space with a context on the device, but imposes a further complexity burden on the underlying OS, which must now ensure that IOMMUs are always correctly programmed. This code is fraught with complexity and consequent bugs and vulnerabilities, as it is also intended to provide protection from malicious memory accesses [32–35]. The problem is likely to get worse with the proliferation of IOMMU designs built into GPUs, co-processors, and intelligent NICs.

Even memory controllers can violate the traditional model. Hillenbrand et al. [23] reconfigure memory controller configurations from system software to provide DRAM aliases for mitigating the performance effects of channel and bank interleaving. Proposals for “in-memory” or “near-data” processing [51, 56, 60] raise further questions for OS abstractions [10] and require a way to unambiguously refer to memory regardless of which module accesses it.
A systematic and accurate way to establish canonical names for access-controlled resources that may be referred to by different local names in different parts of the system is provided by the established decoding net [1, 2] model of address translation.

Decoding nets model the addressing structure of a system as a directed graph, where nodes represent (virtual or physical) address spaces or devices (including RAM), and edges the translation of AS-local addresses into other address spaces or devices. The graph is a set of nodes, defined as an abstract datatype:

    name = Name nodeid address
    node = Node accept :: { address }
                translate :: address → { name }

The model distinguishes local names (address), relative to some address space, and global names (name), which qualify a local name with its enclosing address space. Each node may accept a set of (local) addresses (e.g. RAM or memory-mapped device registers), and/or translate them to one or more global names (addresses in other address spaces, e.g. MMU or PCI bridges).

[Figure 1: Methodology Overview: Refinement steps. Layers shown: Decoding Net Model; Abstract Authorization Model; Executable Specification; Operating System Implementation. Annotations: prior work; dynamic updates, subjects/objects, authority; executable specification of a reference monitor; OS implementation.]

This approach dovetails nicely with the reference monitor concept as described above. Every translate step corresponds to a dereference operation, and any accept can be used as a canonical name: the ID of the accepting node, plus the local address at which it accepts (e.g. address within a DRAM bank).

Decoding nets have been successfully used to model a wide variety of systems of exactly the sort that is of interest to us, and give a trustworthy, precise guide to where a reference monitor is required: any configurable translation node must be treated as part of the distributed reference monitor. It must only be configured such that its local translations are a projection of the higher-level security property, exactly as for a processor’s MMU. Static configuration nodes must be configured in such a way (either by construction or static verification) that their translations are consistent with the projected policy at the point they are applied.
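The name resolution described above can be sketched in a few lines of Haskell. This is a minimal illustration under stated assumptions, not the decoding net formalization of [1, 2]: addresses and node identifiers are plain integers, sets of names are represented as lists or `Data.Set`, and the `fuel` bound, the `resolve` function and the three-node `exampleNet` are all hypothetical names introduced here.

```haskell
import qualified Data.Set as Set

type Address = Integer
type NodeId  = Int

-- A global name qualifies a local address with its enclosing address space.
data Name = Name NodeId Address
  deriving (Eq, Ord, Show)

-- A node accepts some local addresses and/or translates them onward.
data Node = Node
  { accept    :: Address -> Bool
  , translate :: Address -> [Name]
  }

-- A decoding net assigns a node to every node identifier.
type Net = NodeId -> Node

-- Follow translate edges until accepting nodes are reached. The fuel
-- argument bounds the number of steps so that a cyclic net terminates.
resolve :: Net -> Int -> Name -> Set.Set Name
resolve net fuel n@(Name nid a)
  | fuel <= 0 = Set.empty
  | otherwise = Set.union here below
  where
    node  = net nid
    here  = if accept node a then Set.singleton n else Set.empty
    below = Set.unions [resolve net (fuel - 1) m | m <- translate node a]

-- Hypothetical three-node net: MMU (0) -> system bus (1) -> DRAM (2).
exampleNet :: Net
exampleNet 0 = Node (const False) mmu
  where
    mmu a | a >= 0x1000 && a < 0x2000 = [Name 1 (0x80000000 + (a - 0x1000))]
          | otherwise                 = []
exampleNet 1 = Node (const False) bus
  where
    bus a | a >= 0x80000000 && a < 0xC0000000 = [Name 2 (a - 0x80000000)]
          | otherwise                         = []
exampleNet 2 = Node (\a -> a >= 0 && a < 0x40000000) (const [])
exampleNet _ = Node (const False) (const [])

main :: IO ()
main = print (Set.toList (resolve exampleNet 8 (Name 0 0x1234)))
```

In this toy net the virtual address 0x1234 in node 0 resolves to the canonical name `Name 2 0x234`, i.e. an offset within the DRAM node, while an unmapped address resolves to the empty set.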
We borrow our modeling technique, combining refinement with executable specification, from the successful seL4 project. We identify all relevant objects (page tables, address spaces, frames, . . . ), the subjects that manipulate them (processes, devices, . . . ), and which authority each subject exercises over each object (e.g. in mapping a frame to a virtual address). These are expressed in an access-control matrix (following Lampson [29]) which forms our abstract specification, analogous to the high-level security policy (integrity) shown to be refined (correctly implemented) all the way down to compiled binaries for seL4 [55].

Again, as in seL4 [15], we next develop an executable specification in Haskell (see § 4.2), expressing subjects, objects, and authority as first-class objects, permitting rapid prototyping without giving up strong formal semantics. Correspondence between abstract and executable models is thus far by inspection and careful construction.

Finally, we show (again with precedent [59]) that the executable model (and hence the abstract model) permits multiple high-performance implementations (see § 5): on Barrelfish, as a representative of partitioned-capability systems including seL4 (capabilities corresponding to rows in the matrix), and on Linux, as a representative UNIX-style monolithic kernel (where ACLs correspond to columns in the matrix).
The seL4 proof [28] assumed a single, fixed, physical address space and a single MMU, and provides no guarantees in the presence of other cores or DMA devices. CertiKOS [22] builds on a model of memory accesses to abstract regions of private, shared or atomic memory, but again provides no proof in the presence of other translation units or cores. Even work on verifying memory consistency in the presence of translation currently only considers the simple case of virtual-to-physical mappings [52].

Graviton [57] provides a trusted execution environment for GPUs by requiring that all updates to the page tables go through the command processor, which acts as a reference monitor for the GPU. Komodo [19] uses ARM TrustZone [6] to implement a software enclave. Both of these works are steps in the right direction, and in this work we extend this approach to the whole system.

OpenCL’s Shared Virtual Memory [27], nVidia’s CUDA [50] and HSA [24] provide a unified view of memory, ensuring addresses remain valid between CPU and GPU. VAST [30] uses compiler support to dynamically copy memory to and from the GPU, and Mosaic [8] provides support for multiple sizes of page translation in a shared virtual address space between CPU and GPU. These approaches ensure address consistency in the specific case of CPU–GPU sharing, but are again not whole-system approaches.

In DVMT [3], a customized TLB miss handler implemented as a helper thread installs entries in the TLB using specialized instructions. Similar to the MMU, the OS/hypervisor sets up data structures specifying the policy for which mappings the thread is allowed to install. Again, this solution focuses on the processor and its MMU.
A static decoding net is a snapshot of the address translation configuration of a system at a particular moment. We augment the static decoding net with a transition relation, modelling the dynamic reconfiguration of the translation hardware, such as when a page table is modified. The allowable transitions express the actions (or traces) permitted by the model.
The system consists of a set of address spaces, each having a current configuration, which corresponds to a decoding net node, that defines the translation of local addresses in this address-space context:

    configuration :: address space → node

This lets us reason about translations with the existing mechanisms available for decoding nets. Hardware constraints, e.g. an MMU that only supports the translation of naturally aligned 4 KiB blocks of addresses, are expressed as a restriction on the set of possible nodes an address space can map to. This set is the configuration space of an address space. Invariant I1 requires that every address space must have a well-defined configuration. The configuration space of a fixed address space is a singleton set.

    Nodes:                        node :: Decoding Net Node
    Objects:                      Object = { name }
    Rights:                       Right = Grant | Map | Access
    Configuration Space:          ConfSpace :: AddressSpace → { node }
    Address Space Configuration:  Configuration :: AddressSpace → node
    Access Control Matrix:        AccessControlMatrix :: Subject × Object → { Right }
    Model State:                  State = (AccessControlMatrix, Configuration)
    State Transitions:            ModifyMap :: Subject → (name → { name }) → State → State

Figure 2: Model Definition

[Figure 3: Mappings between address spaces showing grant and map rights of mapped segments. The figure shows a virtual, an intermediate, and a physical address space, with grant and map annotations on the mapped segments.]
Invariant I1 (Well-defined Configuration)
∀ a :: AddressSpace. Configuration a ∈ ConfSpace a.

Configuration Authority (Mapping).
The configuration of some address spaces can be changed. The configuration space defines the set of possible states an address space may occupy. An authority is a subset of configuration transitions, representing what configuration actions a given subject is permitted to take.

Consider Figure 3, representing the general case of an update to an intermediate address space (for example the intermediate physical address, IPA, in a two-stage translation system). We identify two distinct authorities: the MAP authority, or the authority to change the meaning of an IPA by changing its mapping; and the GRANT authority, or the right to grant ACCESS (by mapping) to some range of physical addresses. Note that the ‘virtual’ and ‘physical’ address spaces of Figure 3 can be viewed as special cases of an intermediate address space: a top-level ‘virtual’ address space is simply one to which nobody has a GRANT authority, and a ‘physical’ address space, e.g. DRAM, is one to which there exists no MAP authority.
Right R1 (Grant)
The right to insert this memory object into some address space.
Right R2 (Map)
The right to insert some memory object into this address space.
Right R3 (Access)
The right to read or write an object.
[Figure 4: Address spaces in a system with two PCI devices. Labels recovered from the figure: Xeon Phi Core, Xeon Phi Bus, Xeon Phi SMPT, GDDR, IOMMU, Registers, DMA Core, PCI Bridge Window, RAM, Processor Memory Controller, CPU Core, IOMMU.]

subject / object     DMA IOMMU   buffer
IOMMU driver         MAP
Xeon Phi process                 GRANT

Table 2: Access control matrix of the Xeon Phi example
Changing Mappings.
Consider Figure 4, showing the address space configuration of a system with two PCI devices: a DMA engine and an Intel Xeon Phi co-processor. Imagine that we wish to establish a shared mapping to allow a process on a Xeon Phi core to receive DMA transfers (e.g. network packets) into a buffer allocated on the GDDR (following the highlighted path from the DMA core to the GDDR).

The process ‘owns’ the buffer, and has the ability to call recv(), triggering a DMA transfer. In other words, the process has the right to grant ACCESS (temporarily) to the DMA core, but it clearly should not have the ability to modify the IOMMU mappings of the DMA core at will. Hence, it does not have the MAP authority on the relevant address space.

To change the mappings of an address space, an agent (a subject, in standard access-control terminology) needs both the GRANT authority on the buffer object, and the MAP authority on the address space object.

The state transition, i.e. changing the configuration and therefore how an address space translates addresses, is expressed by the operation ModifyMap(): a subject tries to change how a name is being translated by the system, and thus updates its state.
Authority Representation.
In a monolithic kernel, both these authorities are held (implicitly) by the kernel, which exercises them on behalf of the subjects. It is up to the kernel to maintain accurate bookkeeping to determine whether any such request is safe, typically using an ACL (access-control list), i.e. the object lists the subjects and their authorities on it. In a partitioned-capability system such as seL4 or Barrelfish, these authorities are represented by capabilities, handed explicitly to one subject, to authorize the operation. In this case, subjects hold the authority on the object. These are equivalent from the perspective of access control, differing only in implementation: the same two basic types of authority are present.

The standard representation of authority in systems is an access control matrix [29], such as that of Table 2. This can be read in rows: the IOMMU driver has the MAP capability to the IOMMU address space, and the process the GRANT capability to the buffer. Alternatively, reading down the columns gives the ACLs: the IOMMU records MAP permission for the driver, and the buffer records a GRANT permission for the process.

[Figure 5: Object Type Hierarchy and possible rights. Labels recovered from the figure, in order: AddressSpaceOf(), mappable objects, unmappable objects, retype(), RAM, Translation Structure, grant, Frame, grant/map, Config. Address Space.]
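The equivalence of the two readings of the matrix can be made concrete with a small Haskell sketch. It is an illustration only: the matrix is stored as a sparse list of entries, the subject and object names are strings taken from Table 2, and the helper names (`capabilities`, `acl`, `allowedByCaps`, `allowedByAcl`) are hypothetical.

```haskell
data Right = Grant | Map | Access
  deriving (Eq, Show)

type Subject = String
type Object  = String

-- The matrix as a sparse list of (subject, object, right) entries.
type Matrix = [(Subject, Object, Right)]

-- The two entries of Table 2.
table2 :: Matrix
table2 =
  [ ("IOMMU driver",     "DMA IOMMU", Map)
  , ("Xeon Phi process", "buffer",    Grant)
  ]

-- Reading a row: the capability list held by one subject.
capabilities :: Matrix -> Subject -> [(Object, Right)]
capabilities m s = [(o, r) | (s', o, r) <- m, s' == s]

-- Reading a column: the ACL stored with one object.
acl :: Matrix -> Object -> [(Subject, Right)]
acl m o = [(s, r) | (s, o', r) <- m, o' == o]

-- Both views answer the same authorization question.
allowedByCaps, allowedByAcl :: Matrix -> Subject -> Object -> Right -> Bool
allowedByCaps m s o r = (o, r) `elem` capabilities m s
allowedByAcl  m s o r = (s, r) `elem` acl m o

main :: IO ()
main = do
  print (capabilities table2 "IOMMU driver")
  print (acl table2 "buffer")
  print (allowedByCaps table2 "Xeon Phi process" "DMA IOMMU" Map)
```

Whichever representation an OS chooses, `allowedByCaps` and `allowedByAcl` agree on every query, which is the sense in which capabilities and ACLs differ only in implementation.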
Security Property.
This access control matrix is our abstract model. A system is correct (secure) statically if its current configuration is consistent with the access control matrix. It is secure dynamically if any possible transition, beginning in a secure state, must leave the system in a secure state. The access control matrix, together with the configuration space, defines the allowable state transitions. The address space must have a valid configuration supported by hardware, and the subject modifying it must have sufficient rights to do so.
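A guarded state transition of this kind might look as follows in Haskell. This is a simplified sketch rather than the model's definition of ModifyMap: a node is reduced to a finite table, a single mapping is installed per call, objects are identified by strings, and the `Update` record, the `aligned4K` configuration-space predicate and the example subjects are assumptions made for illustration. In the example the driver is assumed to have already been handed the GRANT authority on the buffer.

```haskell
import qualified Data.Map as M
import Data.Maybe (isJust, isNothing)

data Right = Grant | Map | Access
  deriving (Eq, Show)

type Subject      = String
type ObjectId     = String      -- names an address space or a memory object
type AddressSpace = ObjectId
type Address      = Integer

data Name = Name AddressSpace Address
  deriving (Eq, Show)

-- A node simplified to a finite table from local addresses to names.
type Node = M.Map Address Name

data State = State
  { matrix :: [(Subject, ObjectId, Right)]
  , config :: M.Map AddressSpace Node
  }

-- Membership test for the configuration space of an address space.
type ConfSpace = AddressSpace -> Node -> Bool

-- Request to map local address `local` of `space` to `target`,
-- where `target` lies inside the memory object `object`.
data Update = Update
  { space  :: AddressSpace
  , local  :: Address
  , target :: Name
  , object :: ObjectId
  }

-- The transition succeeds only with Map on the address space, Grant on
-- the object, and a resulting node inside the configuration space.
modifyMap :: ConfSpace -> Subject -> Update -> State -> Maybe State
modifyMap inSpace s u st
  | not (has (space u) Map)     = Nothing
  | not (has (object u) Grant)  = Nothing
  | not (inSpace (space u) new) = Nothing
  | otherwise = Just st { config = M.insert (space u) new (config st) }
  where
    has o r = (s, o, r) `elem` matrix st
    old     = M.findWithDefault M.empty (space u) (config st)
    new     = M.insert (local u) (target u) old

-- Hypothetical hardware constraint: only 4 KiB-aligned translations.
aligned4K :: ConfSpace
aligned4K _ node = all ok (M.toList node)
  where ok (a, Name _ b) = a `mod` 4096 == 0 && b `mod` 4096 == 0

start :: State
start = State
  [ ("driver", "iommu", Map), ("driver", "buffer", Grant)
  , ("process", "buffer", Grant) ]
  M.empty

main :: IO ()
main = do
  let good = Update "iommu" 0x1000 (Name "gddr" 0x4000) "buffer"
      bad  = good { local = 0x1234 }
  print (isJust    (modifyMap aligned4K "driver"  good start))
  print (isNothing (modifyMap aligned4K "process" good start))
  print (isNothing (modifyMap aligned4K "driver"  bad  start))
```

All three checks print True: the driver's aligned request is accepted, the process is rejected because it lacks MAP on the IOMMU address space, and the misaligned request is rejected because the resulting node would fall outside the configuration space.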
We refine this abstract model into an executable specification of a reference monitor [4] for ModifyMap(). When composed with the reference monitor for ACCESS, i.e. the MMU, we have our desired compound reference monitor for the fully-dynamic VM system, secure for accesses beginning at any core or device.

This specification serves as an intermediate step (Figure 1) between the abstract model and the concrete OS implementation of the next section, and also as an OS-agnostic prototype for implementation in other systems. This approach is inspired by seL4 [17], which also employed an intermediate Haskell specification to facilitate prototyping.
Explicit Translation Structures.
We now explicitly represent address translation structures (e.g. page tables, or memory-mapped device registers) as memory objects, without imposing any particular layout on them. This allows us to reason about the manner in which address translation depends on the contents of a memory object (e.g. page tables in RAM, or the contents of device registers).

Once the translation structures are explicit, and noting that these are exactly the reference monitor state we must securely partition, we can state the partitioning invariant (Invariant I2) in terms of implementation-visible objects.
Invariant I2 (Partitioning)
No subject has ACCESS to a translation object.
We model address translation structures as an opaque data type (TStructure). This allows us to maintain generality by assuming nothing about their actual inner structure:

    data Object = RAM        {base::Name, size::Natural}
                | Frame      {base::Name, size::Natural}
                | TStructure {base::Name, size::Natural}
Memory objects form a hierarchy (Figure 5 shows an excerpt) which defines how the different types of objects can be derived from each other. For example, in-memory translation structures (TStructure) are created by retyping RAM objects. RAM is the base type for untyped memory. Retyping RAM to a Frame makes it possible to map it into an address space, i.e. to GRANT access to it. Note that neither RAM nor TStructure has the GRANT right, and therefore these may never become accessible (partitioning).

An address space is derived from (and defined by) a translation structure, and is an explicit object granting the right to map this space into higher-level address spaces (e.g. a second-stage page table defining an IPA space, assigned to the guest-physical address space of a virtualized OS, as in Figure 3):
AddressSpaceOf :: Object -> AddressSpace
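The object hierarchy and its retype rule can be sketched as follows. This is one reading of the text above, not the paper's specification: `Name` is flattened to a natural number, the `ObjType` tags and the `grantable` predicate are hypothetical, and `addressSpaceOf` is made partial (returning `Maybe`) on the assumption that only translation structures define an address space.

```haskell
import Numeric.Natural (Natural)

type Name = Natural   -- simplified: a flat address stands in for a global name

data Object
  = RAM        { base :: Name, size :: Natural }
  | Frame      { base :: Name, size :: Natural }
  | TStructure { base :: Name, size :: Natural }
  deriving (Eq, Show)

-- Hypothetical tags naming the target type of a retype operation.
data ObjType = RAMType | FrameType | TStructureType
  deriving (Eq, Show)

-- Only untyped RAM can be retyped, and only into a more specific type.
retype :: ObjType -> Object -> Maybe Object
retype FrameType      (RAM b s) = Just (Frame b s)
retype TStructureType (RAM b s) = Just (TStructure b s)
retype _              _         = Nothing

-- Only frames carry the grant right; RAM and translation structures
-- can therefore never be mapped (partitioning).
grantable :: Object -> Bool
grantable (Frame _ _) = True
grantable _           = False

newtype AddressSpace = AddressSpace Object
  deriving (Eq, Show)

-- Partial variant of AddressSpaceOf: defined for translation structures only.
addressSpaceOf :: Object -> Maybe AddressSpace
addressSpaceOf o@(TStructure _ _) = Just (AddressSpace o)
addressSpaceOf _                  = Nothing

main :: IO ()
main = do
  let ram = RAM 0x1000 4096
  print (retype FrameType ram)
  print (fmap grantable (retype TStructureType ram))
  print (retype FrameType (Frame 0x1000 4096))
```

Making the unmappable types explicit in `grantable` keeps the partitioning invariant a property of the type hierarchy rather than of each individual mapping request.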
Authority and State.
The system is a set of agents, a mapping database (MDB) recording the derivation relation between objects, and a set of active address spaces:

    data KState = KState (Set Agent) MDB (Set AddrSpace)

Authority is either directly to an object, or a meta-authority, the right to grant an authority to another. In turn, a set of such authorities, coupled with an identifier, defines an agent.

    data Authority = Access Object | Map Object
                   | Grant Authority
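How a meta-authority might guard the hand-over of an authority between agents is sketched below. The `Agent` record, the `holds` helper and the rule that copying an authority `a` requires holding `Grant a` are assumptions made for this illustration; objects are reduced to labelled strings.

```haskell
newtype Object = Object String
  deriving (Eq, Show)

data Authority
  = Access Object
  | Map Object
  | Grant Authority
  deriving (Eq, Show)

-- An agent is an identifier coupled with a set of authorities
-- (a list stands in for the set here).
data Agent = Agent
  { agentId     :: Int
  , authorities :: [Authority]
  } deriving (Eq, Show)

holds :: Agent -> Authority -> Bool
holds agent a = a `elem` authorities agent

-- Hand authority `a` from `src` to `dst`; allowed only if `src` holds
-- the meta-authority `Grant a` (an assumption of this sketch).
copy :: Authority -> Agent -> Agent -> Maybe Agent
copy a src dst
  | src `holds` Grant a = Just dst { authorities = a : authorities dst }
  | otherwise           = Nothing

main :: IO ()
main = do
  let buf     = Object "buffer"
      process = Agent 1 [Access buf, Grant (Access buf)]
      driver  = Agent 2 []
  print (fmap authorities (copy (Access buf) process driver))
  print (copy (Map buf) process driver)
```

Here the process can pass access to its buffer on to the driver, but cannot conjure a `Map` authority it was never entitled to grant.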
Reference monitor.
The model exposes a set of operations that either change a configuration or access a memory address. The set of permitted operations defines the behavior of the reference monitor. We express this in Haskell as a custom state monad:

    data Operation a = Operation (State -> (a, State))
    instance Monad (Operation) where ...
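For readers unfamiliar with hand-written state monads, the following self-contained sketch shows what such instances typically look like. It is not the paper's code: `State` is replaced by a toy log of operation names, a `newtype` with a `runOperation` accessor is used for convenience, and `step` and `trace` are hypothetical helpers.

```haskell
-- Stand-in for the monitor state: a log of the operations applied so far.
type State = [String]

newtype Operation a = Operation { runOperation :: State -> (a, State) }

instance Functor Operation where
  fmap f (Operation g) = Operation $ \s ->
    let (a, s') = g s in (f a, s')

instance Applicative Operation where
  pure a = Operation $ \s -> (a, s)
  Operation f <*> Operation g = Operation $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

-- Sequencing threads the state from one operation into the next.
instance Monad Operation where
  Operation g >>= k = Operation $ \s ->
    let (a, s') = g s in runOperation (k a) s'

-- Record one step and return how many steps have run so far.
step :: String -> Operation Int
step name = Operation $ \s -> let s' = s ++ [name] in (length s', s')

trace :: Operation Int
trace = do
  _ <- step "retype RAM to Frame"
  _ <- step "retype RAM to TStructure"
  step "map Frame into TStructure"

main :: IO ()
main = print (runOperation trace [])
```

Running `trace` from the empty log yields the step count 3 together with the three recorded operation names, showing how a `do` block over `Operation` produces a sequence of states.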
The reference monitor intercepts operations and verifies that the agent performing the operation has sufficient rights to execute it. We express the changes to the system’s state as a sequence of operations on the reference monitor, e.g. retype() or map(), forming a trace of operations:

    mappingTrace = do
      ...
      -- retype a RAM object to a Frame
      res <- Model.retype RAM Agent Frame Agent
      -- retype another RAM object to a TStructure
      res <- Model.retype RAM2 Agent TStructure Agent
      -- map the frame into the translation structure
      mapping1 <- Model.map TStructure Frame Agent
      ...

Model traces are sequences of monitor states (KState), each corresponding to a static decoding net model. Operations include:
• retype() converts an existing object into an object of a permissible subtype.
• map() installs a mapping in a translation structure.
• copy() copies an authority from one subject to another.

Valid Traces.
Contained within the set T of all possible traces, there is a set of traces T_V ⊆ T that conform to all constraints enforced by the executable specification. We express these traces in the model as sequences of KStates. All other traces (T − T_V) end in a failure state (e.g. execution ended in a state not satisfying the access-control policy).

[Figure 6: Implementation Overview. Labels recovered from the figure: Policy, Mechanism, Static platform description, Hardware discovery, Model runtime state, Algorithms, state population, Access Control System, Application, model runtime, Query, Reference Monitor, operations, reference monitor.]

Summary.
The executable specification allows us to both simulate and specify sequences of operations, such as memory accesses or translation configurations, as they would be performed by a concrete OS implementing the new abstract model.
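The distinction between valid traces and traces ending in a failure state can be illustrated with a compact Haskell sketch. It deliberately departs from the specification's state monad and uses `Either String` for failure; the single-agent `KState`, the string object identifiers, and the rule that retyping confers `Grant (Access o)` for frames and `Map o` for translation structures are all assumptions of this sketch.

```haskell
import Control.Monad ((>=>))
import Data.Either (isLeft)

data ObjType = RAM | Frame | TStructure
  deriving (Eq, Show)

type ObjId = String

data Authority = Access ObjId | Map ObjId | Grant Authority
  deriving (Eq, Show)

-- Toy monitor state for a single agent.
data KState = KState
  { types    :: [(ObjId, ObjType)]   -- current type of each object
  , held     :: [Authority]          -- authorities held by the agent
  , mappings :: [(ObjId, ObjId)]     -- (translation structure, frame)
  } deriving (Eq, Show)

-- An operation either fails with a reason or yields the next state.
type Op = KState -> Either String KState

-- Retype untyped RAM and record the authority this confers.
retype :: ObjId -> ObjType -> Op
retype o t st = case lookup o (types st) of
  Just RAM | t /= RAM ->
    Right st { types = (o, t) : filter ((/= o) . fst) (types st)
             , held  = gained ++ held st }
  _ -> Left ("cannot retype " ++ o)
  where
    gained = case t of
      Frame      -> [Grant (Access o)]
      TStructure -> [Map o]
      RAM        -> []

-- Install a mapping; needs Map on the structure and Grant on the frame.
mapFrame :: ObjId -> ObjId -> Op
mapFrame ts fr st
  | lookup ts (types st) /= Just TStructure = Left (ts ++ " is not a translation structure")
  | lookup fr (types st) /= Just Frame      = Left (fr ++ " is not a frame")
  | Map ts `notElem` held st                = Left "missing Map authority"
  | Grant (Access fr) `notElem` held st     = Left "missing Grant authority"
  | otherwise = Right st { mappings = (ts, fr) : mappings st }

mappingTrace :: Op
mappingTrace =
  retype "ram1" Frame >=> retype "ram2" TStructure >=> mapFrame "ram2" "ram1"

initial :: KState
initial = KState [("ram1", RAM), ("ram2", RAM)] [] []

main :: IO ()
main = do
  print (fmap mappings (mappingTrace initial))
  print (isLeft (mapFrame "ram2" "ram1" initial))
```

The composed `mappingTrace` is a valid trace and ends with the mapping installed, whereas attempting the map without the preceding retype steps, or mapping a translation structure as if it were a frame, ends in a `Left` failure state.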
In this section, we describe the implementation of the reference monitor and runtime support libraries and services in two classes of operating systems: a complete implementation in Barrelfish/MAS as a representative of a partitioned capability system, derived from the open-source Barrelfish OS [12], and side-by-side a sketch of an implementation within Linux, as a representative of a traditional UNIX-style kernel.

Architecture Overview.
Figure 6 shows an overview of the resulting architecture. We separate policy and mechanism: (1) at the center is the runtime representation of the model (§ 5.1), which stores the memory topology and provides queries and algorithms for memory allocation policies; (2) the reference monitor enforces access control and provides the mechanisms for resource management and configuration; and (3) static platform descriptions and dynamic discovery mechanisms (§ 5.3) provide input for the policy and mechanism implementations.
We implement the runtime representation of the address space model (Figure 6, (1)) in a policy engine. On Barrelfish, this is merged into the Prolog-based system knowledge-base (SKB) [54], which already stores both static and dynamic facts about the system. On Linux, we could use a standalone Prolog instance and run it as a service, or implement the model directly along with other memory allocation policies inside the kernel. We now describe the model representation, its algorithms and potential optimizations.
Model representation. We implement the model representation by asserting facts for the accept, translate and overlay constructs of the model (see syntax in [2]). Listing 1 shows the corresponding Prolog rules. This encodes the decoding net and adds the information to the database.

    assert(translate(RegionFrom, RegionTo)).
    assert(overlay(NodeFrom, NodeTo)).
    assert(accept(Region)).
    dn_get_allocation_range(NodeSrc, NodeDst).
    dn_get_config_nodes(NodeSrc, NodeDst).
    dn_resolve_range(Node, Addr, Size).
    dn_resolve_range(NodeSrc, Addr, Size, DstSrc).

Listing 1: Prolog Model Representation

Note: MAS stands for multiple address spaces.
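To make the accept, translate and overlay constructs more concrete, the following Python sketch resolves an address through a tiny decoding net. It is not the Prolog implementation: the `Node` layout, the `resolve` helper, and the example topology (`cpu_virt`, `device`, `iommu`, `phys`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    accept: list = field(default_factory=list)     # [(base, size)]
    translate: list = field(default_factory=list)  # [(base, size, dst_node, dst_base)]
    overlay: Optional[str] = None                  # fallback node, same address

def resolve(net, node, addr, max_hops=32):
    """Follow translate/overlay edges until some node accepts the address."""
    for _ in range(max_hops):
        n = net[node]
        if any(b <= addr < b + s for b, s in n.accept):
            return node, addr
        for b, s, dst, dst_base in n.translate:
            if b <= addr < b + s:
                node, addr = dst, dst_base + (addr - b)
                break
        else:
            if n.overlay is None:
                return None  # the address decodes nowhere
            node = n.overlay
    return None  # give up: probably a loop in the net

# Hypothetical topology: a CPU and a device (behind an IOMMU) sharing one DRAM region.
net = {
    "cpu_virt": Node(translate=[(0x1000, 0x1000, "phys", 0x8000_0000)]),
    "device": Node(translate=[(0x0, 0x1000, "iommu", 0x4000)]),
    "iommu": Node(translate=[(0x4000, 0x1000, "phys", 0x8000_0000)]),
    "phys": Node(accept=[(0x8000_0000, 0x10000)]),
}
```

In this net, `resolve(net, "cpu_virt", 0x1010)` and `resolve(net, "device", 0x10)` both end at the same physical location, which is the property a query such as dn_resolve_range() relies on when it reports where a device sees a memory resource.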
Algorithms. On top of the model encoding, we implement several algorithms that are useful for making allocation and configuration policy decisions. For instance, to set up a device, the driver uses the dn_get_allocation_range() query to find a suitable address space for memory allocation, then runs dn_get_config_nodes() to get the list of address spaces which need to be configured to make the memory resource accessible, and lastly executes dn_resolve_range() to obtain the address at which the device sees the memory resource. The result of the queries is then converted into a sequence of capability operations to allocate memory, set up translation structures and perform the relevant mappings. Note that the model queries only provide a roadmap; the actual reconfiguration steps are invocations of the reference monitor, which enforces the authority and integrity of the system following the definition of the executable specification (§ 4.2).
Optimization. Running the Prolog queries on the full graph is costly. We provide a library that caches the flattened graph representation, consisting only of cores/devices, configurable address spaces and memory nodes, both in the Prolog engine and directly in C using adjacency lists. We can then run a shortest-path algorithm to perform the queries, which minimizes the number of address spaces to configure.
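A shortest-path query over adjacency lists can be sketched as a breadth-first search. The paper's library is written in C; the Python below only illustrates the idea, and the node names in `adj` are a hypothetical flattened topology, not one taken from the evaluation.

```python
from collections import deque

def config_path(adj, src, dst):
    """Breadth-first search over adjacency lists; returns a fewest-hop path or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:  # first visit is along a shortest path
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical flattened graph: device -> configurable address spaces -> memory node.
adj = {
    "xeon_phi": ["smpt"],
    "smpt": ["iommu", "pcie_window"],
    "pcie_window": ["iommu"],
    "iommu": ["dram"],
}
```

The interior of the returned path, `config_path(adj, "xeon_phi", "dram")[1:-1]`, is the list of address spaces that would need to be configured; because every edge counts as one hop, the fewest-hop path also configures the fewest address spaces.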
We now describe the implementation of the reference monitor defined by the executable specification in Linux and Barrelfish/MAS.

Resource Management. Both Linux and Barrelfish already have thorough resource management mechanisms, albeit different ones: Barrelfish manages physical resources using a distributed, partitioned capability system for naming, access control, and accounting of objects. As in seL4 [18], capabilities are typed to indicate what can be done with the memory they refer to; rules dictate valid retype operations (e.g. RAM to a Frame). Linux maintains a data structure, the page struct, for every 4 KiB page of memory. In both systems, only the kernel has direct access to those data structures and can maintain the partitioning invariant.
Reference Monitor. As with all microkernels, Barrelfish's kernel is essentially nothing but a reference monitor. It uses the capability system to express the objects in memory and the authority a process (subject) has over them. Any changes to the translation units (e.g. mapping a memory frame into the IOMMU) correspond to capability operations. The reference monitor checks the type, address spaces and rights of the capabilities.

On Linux, we can use the para-virtualization interface (PV-Ops) to implement a reference monitor inside the kernel itself. We can then extend the PV-Ops interface to include all address translation units in the system. This effectively implements a well-defined hypercall interface to request changes to the translation tables from the hypervisor acting as the reference monitor. Similarly, the nested kernel [16] integrates a privileged kernel inside the monolithic kernel which interposes on all updates to translation tables. Extending this interface to include all other translation hardware as well would be a good way to implement a reference monitor inside the Linux kernel.
Naming of Resources. Barrelfish's capabilities contain physical addresses to identify the objects they refer to. To still identify objects uniquely in the presence of multiple address spaces, we change the capability system in Barrelfish/MAS to use canonical base names, consisting of an address space identifier (ASID) and an address within that address space. We adapt the kernel to consider the ASID when performing capability operations. An operation may now fail in new ways due to incompatible address spaces of the capabilities (e.g. one cannot directly map a host physical frame to a guest virtual address).

Linux uses the physical frame number (PFN) to uniquely identify every 4 KiB page of memory. Using the sparse memory model [58] or heterogeneous memory [31], we can implement memory nodes (address spaces) with a dynamic mapping of the PFN to the underlying page struct. In this manner, we can use the PFN as the memory resource's canonical name.

On both operating systems, we need a function to dereference the canonical name of a resource into a locally valid address. We can generate such a translation function based on the platform description or the model state.
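A canonical name and its dereferencing function might look as follows. This is a minimal sketch, not the kernel implementation: the `CanonicalName` type, the `LOCAL_WINDOWS` table with its addresses, and the `to_local` and `check_map` helpers are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple

class CanonicalName(NamedTuple):
    asid: int  # address space identifier
    addr: int  # address within that address space

# Hypothetical table: local base address at which each core sees each address space.
LOCAL_WINDOWS = {
    "core0": {1: 0x0000_0000, 2: 0x8000_0000},
    "core1": {1: 0x8000_0000, 2: 0x0000_0000},
}

def to_local(core, name):
    """Dereference a canonical name into an address that is valid on `core`."""
    windows = LOCAL_WINDOWS[core]
    if name.asid not in windows:
        raise LookupError(f"ASID {name.asid} is not reachable from {core}")
    return windows[name.asid] + name.addr

def check_map(frame, expected_asid):
    """Reject a mapping whose frame lives in a different address space than expected."""
    if frame.asid != expected_asid:
        raise ValueError("incompatible address spaces")
    return True
```

The same canonical name resolves to different local addresses on the two cores, while the compatibility check fails early when a capability from the wrong address space is supplied.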
Object Types. In addition, Barrelfish/MAS introduces new capability types for all hardware translation units (not just page tables), ASID allocation, and entire physical, intermediate or virtual address spaces. Like Barrelfish, we allow a capability to refer to a memory region of arbitrary size, but require that it must not span multiple address spaces.

On Linux, we do not need typed objects as such, because the kernel does not expose handles to physical resources to user space. Internally, Linux already uses different accounting types for memory allocations.
Page Tables and Address Spaces. Barrelfish/MAS introduces distinct capability types for all hardware-defined translation structures (register sets or page table levels). Each of these capability types represents a translation structure in the sense of the executable specification. Since a page table defines an address space, we can derive an address space capability from it and use it to install mappings in other address spaces. Deleting the page table capability triggers a recursive deletion of its spanned address spaces and all possible mappings. We integrated this process into the capability system. This is effectively equivalent to revoking all descendants of the address space capability and then deleting it, which ensures that there are no mappings referring to an invalid address space.

With the implementation of para-virtualization and KVM-based virtualization, Linux has support for representing the guest address space inside the kernel. This would be one possibility for supporting different address spaces in the kernel. Alternatively, we can use the sparse memory model or HMM to create “virtual” memory nodes that correspond to an intermediate address space.

Tracking Mappings. Barrelfish/MAS uses designated mapping capabilities to track mappings. For every mapped object, there is a corresponding mapping capability, which is a descendant thereof. Therefore, the capability system is able to locate and invalidate all mappings when access to an object is revoked. Note that translation structures effectively define an address space, and hence there is no difference between mappings of multi-level page tables and mappings of actual frames.

Similar to the mapping capabilities, Linux uses the rmap data structure to store where a page of memory is mapped. This is already maintained for the page cache, as well as for guest memory pages. We can use this mechanism to track all mappings of a page in Linux.
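How mapping capabilities make revocation tractable can be sketched with a small capability tree. The `Cap` class, the `revoke` function, and the dictionary standing in for a page table are hypothetical simplifications; the real capability system tracks descendants in its mapping database rather than in per-object child lists.

```python
class Cap:
    """A capability with descendants; mapping capabilities descend from the mapped object."""
    def __init__(self, kind, name, parent=None):
        self.kind, self.name = kind, name
        self.children = []
        self.valid = True
        if parent is not None:
            parent.children.append(self)

def revoke(cap, page_table):
    """Invalidate all descendants of `cap`, clearing entries installed by mapping caps."""
    for child in cap.children:
        revoke(child, page_table)
        if child.kind == "mapping":
            page_table.pop(child.name, None)  # clear the page table entry
        child.valid = False
    cap.children.clear()

# Usage: two mappings of one frame, plus an unrelated entry that must survive.
frame = Cap("frame", "frame0")
page_table = {0x3000: "other"}
m1 = Cap("mapping", 0x1000, parent=frame)
page_table[0x1000] = "frame0"
m2 = Cap("mapping", 0x2000, parent=frame)
page_table[0x2000] = "frame0"
revoke(frame, page_table)
```

After the revocation, only the unrelated entry remains in the table and both mapping capabilities are invalid, while the frame capability itself is still valid, matching the distinction between revoking descendants and deleting the capability.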
The last part of the implementation describes how the model state is populated ((3) in Figure 6). There are two major sources of memory topology information building up the runtime representation: i) static descriptions of platforms (or parts thereof), and ii) discovery mechanisms such as PCI or ACPI, which may instantiate predefined descriptions.

Static Platform Descriptions. The memory topology of parts of the system (or, in the case of an SoC, the entire system) is fixed and known in advance: for instance, the Xeon Phi co-processor has a defined number of cores and memory. We can therefore write down a description of the memory subsystem. For this, we use a domain-specific language (DSL) which closely follows the syntax of the formal model and allows writing down the memory topology of the entire system or its sub-components. The DSL compiler then produces a set of Prolog rules, which populate the model at runtime, either fully or in response to hardware discovery events. On Linux, we can use procfs and sysfs, as well as device trees, to obtain system topology descriptions.
Using Static Descriptions: Code Generation. From the static descriptions, we can pre-compute and enumerate the address spaces of the hardware component or, in the case of SoC platforms, the entire memory topology. The DSL compiler generates a set of data structures and code used by the reference monitor to instantiate the initial set of capabilities and to verify address space compatibility in capability operations, as well as translation tables and functions to convert the canonical names into valid, local physical or virtual addresses. We evaluate this scenario in § 6.4.
Using Static Descriptions: Hardware Discovery. In general, the configuration of a platform is known only after device discovery mechanisms such as ACPI or PCI (if present) have run. During this process, the model is dynamically populated with the partial descriptions of its components: e.g. the ACPI table indicates the presence and version of an IOMMU, and in response the partial description of the IOMMU is instantiated and added to the model at runtime. A driver may update the model with more precise information, e.g. only the Xeon Phi driver knows the precise number of cores and memory size of the PCI Express attached co-processor.
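The discovery-driven population of the model can be pictured as instantiating templates on discovery events. In the sketch below, the `DESCRIPTIONS` table, the fact tuples, and the `Model` class are hypothetical stand-ins for the DSL-generated Prolog rules and the SKB; only the overall flow (discover, instantiate a partial description, let a driver refine it) follows the text.

```python
# Hypothetical partial descriptions, keyed by (kind, version); each yields model facts.
DESCRIPTIONS = {
    ("iommu", 1): lambda dev: [("translate", f"{dev}.in", f"{dev}.out")],
    ("xeon_phi", 1): lambda dev: [("accept", f"{dev}.gddr"),
                                  ("translate", f"{dev}.smpt", "pcie")],
}

class Model:
    def __init__(self):
        self.facts = set()

    def on_discovery(self, kind, version, dev):
        """Instantiate the partial description of a discovered component, if one is known."""
        template = DESCRIPTIONS.get((kind, version))
        if template is None:
            return False
        self.facts.update(template(dev))
        return True

    def refine(self, dev, key, value):
        """Let a driver add more precise information, e.g. a core count."""
        self.facts.add(("property", dev, key, value))
```

A discovery event for an IOMMU adds its translate fact to the model, an unknown component is ignored, and a driver can later record details such as the number of cores that only it knows.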
In this section, we present a quantitative and qualitative performance evaluation of the address space and least-privilege authority model in Barrelfish/MAS. The goal of this section is to establish the following:
1. The mechanism implementation results in a performant memory system (§ 6.1, § 6.2).
2. The policy implementation produces usable results within reasonable overheads (§ 6.3).
3. The resulting system is able to handle complex memory topologies, which we demonstrate qualitatively (§ 6.4).
Evaluation Platform. All performance measurements are performed on a dual-socket server consisting of two Intel Xeon E5-2670 v2 processors (Ivy Bridge micro-architecture) with 10 cores each. The machine has 256 GiB of main memory split equally into two NUMA nodes. The machine runs in “performance mode”, with simultaneous multi-threading (SMT), Intel TurboBoost technology, and Intel SpeedStep disabled to ensure consistent measurements. The machine further contains two Intel Xeon Phi co-processors 31S1, each attached as a PCI Express 3.0 device. The co-processors have 57 cores with four hardware threads per core, and 8 GiB of GDDR memory. The Intel VT-d [26] (IOMMU) is enabled. We use a vanilla Ubuntu 18.04 LTS with Linux kernel 4.15. For a fair comparison, we disable the Spectre/Meltdown mitigations, as they slow down memory operations significantly and Barrelfish does not implement them. Barrelfish and Barrelfish/MAS are compiled in release mode.
In this part of the evaluation, we quantitatively evaluate the performance of Barrelfish/MAS's virtual memory operations in comparison to vanilla Barrelfish and Linux.

[Figure 7: Measured Latency per Page for the VM Operations on Linux, Barrelfish and Barrelfish/MAS. Panels: (a) map() Operation, (b) protect() Operation, (c) unmap() Operation; y-axis: time per page; series: Linux, Barrelfish, Barrelfish/MAS.]

[Table 3: The Best Configuration of the Linux VM Operations. Columns: map(), protect(), unmap().]
Benchmark Methodology. We compare the performance of the virtual memory operations map(), protect() and unmap() for buffer sizes from 4 KiB to 64 GiB using one of the three natively supported page sizes (4 KiB, 2 MiB and 1 GiB). On Barrelfish/MAS and Barrelfish, we use the default user-level virtual memory management library, and on Linux we take the fastest of the measured techniques to map memory: anonymous memory (mmap()), shared memory objects (shmfd()) or shared memory segments (shmat()). We exclude the allocation and clearing of backing memory in this benchmark, as it affects all systems equally and would dominate the execution times.
Results. Figure 7 contains the results of this evaluation for the three operations and page sizes. The graphs show the median latency (lower is better) and standard error per modified page table entry. We scale the number of changed page table entries. For Linux, we select the best configuration as indicated in Table 3. We make the following observations:

• Amortization: The general pattern is similar: the cost per page decreases with increasing numbers of affected pages. The cost of the virtual region management, the syscall overhead, and locating the page table entry is amortized among multiple pages, whose mappings are likely to be in consecutive page table entries.

• map(). Both Barrelfish and Barrelfish/MAS have matching performance patterns, independent of the page size used. Linux is faster for mapping up to two 4 KiB pages. For larger pages, Barrelfish (as well as Barrelfish/MAS) outperforms Linux. This is not an effect of our implementation but due to Linux allocating lower-level page tables in case the super-page mapping needs to be broken up. Therefore, Linux allocates and clears memory to hold the page table. Zeroing a page can add up to 0.71 µs, which is the difference we see in the graph. Both Barrelfish and Barrelfish/MAS only have to create a new mapping capability and insert it into the MDB.

• protect(). We observe very predictable patterns for Barrelfish and Barrelfish/MAS, where vanilla Barrelfish is slightly faster because it stores an explicit pointer to the page table directly in the mapping capability, whereas Barrelfish/MAS stores the canonical name, which requires an address translation and thus more work. In both cases, the mapping capability contains all information needed to perform the operation. Linux needs to walk the page table to locate the page table entry to be protected. This is again not an effect of the MAS extension but a difference between Linux and vanilla Barrelfish.

• unmap(). For up to eight affected pages, Linux is faster than Barrelfish and Barrelfish/MAS, which both need to remove and delete the mapping capability from the MDB. This results in another syscall on Barrelfish (Barrelfish/MAS removes it when clearing the page table entry). The cost of removing the mapping capability is amortized when more pages are affected.
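The amortization pattern in these observations can be summarized with a simple cost model. This is an illustrative assumption rather than a fit to the measured data: let c_fixed denote the per-call cost (region management, syscall overhead, locating the first page table entry) and c_page the cost of updating a single entry; neither symbol appears in the original evaluation.

```latex
t_{\text{page}}(n) = \frac{c_{\text{fixed}} + n \, c_{\text{page}}}{n}
                   = c_{\text{page}} + \frac{c_{\text{fixed}}}{n},
\qquad
\lim_{n \to \infty} t_{\text{page}}(n) = c_{\text{page}}
```

Under this model the per-page cost falls monotonically with the number n of affected pages and flattens out at c_page, which is consistent with a system that has a higher fixed cost per call being slower for a few pages yet faster for many.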
Discussion. In direct comparison with Barrelfish, we observe that Barrelfish/MAS is able to match the performance in all cases. Moreover, the comparison with Linux shows that Barrelfish/MAS has comparable performance to a mainstream OS. We conclude that our least-privilege access control model with support for multiple address spaces can be implemented with fine granularity while maintaining competitive memory management performance.
The Appel-Li benchmark [5] exercises the virtual memory subsystem with operations that are relevant to tasks such as garbage collection or tracking page modifications.

Benchmark Methodology. The benchmark consists of the following three experiments:
1. prot1-trap-unprot. Randomly pick a page of memory, write-protect the page, write to it, take a trap, unprotect the page, and continue with the next page.
2. protN-trap-unprot. Write-protect 512 pages of memory at once, then write to each page of memory in turn, taking a trap and unprotecting the page.
3. trap only. Pick a protected page, write to it, take the trap, and continue with the next page without changing any permissions.

We run this benchmark on Barrelfish and Barrelfish/MAS. In addition, we compare to Linux as a frame of reference. On Barrelfish and Barrelfish/MAS, the numbers include the cost of virtual address space accounting in user space.
[Figure 8: Appel-Li Benchmark on Barrelfish/MAS and Linux. Bar groups: prot1-trap-unprot, protN-trap-unprot, trap only; y-axis: kcycles per (page | trap); series: Linux, Barrelfish, Barrelfish/MAS.]

Results. We show the benchmark results in Figure 8. Each bar corresponds to a different OS and represents the time taken per page. The three bar groups represent the three benchmark experiments. The standard error is less than 0.5%. We make the following observations:

• Barrelfish vs. Barrelfish/MAS. Direct comparison shows a slowdown of less than 5% for Barrelfish/MAS vs. Barrelfish. The trap performance of both systems is the same.

• Linux vs. Barrelfish. Barrelfish outperforms Linux in all experiments. Barrelfish can use its capability system to efficiently find the page table that has to be modified, while Linux needs to walk the page table tree. Furthermore, Barrelfish reflects the trap directly to user space without checking whether the faulting address has been previously allocated [9]. This applies to Barrelfish/MAS as well as vanilla Barrelfish and is independent of our extension.

• Batching. The protection of 512 pages in one syscall (protN-trap-unprot) amortizes the total syscall overhead, which reduces the time per page on all systems by 600-2000 cycles.
Discussion. In this evaluation, we show that Barrelfish/MAS is able to match the performance of Barrelfish with a maximum overhead of less than 5%, despite support for explicit address spaces. The comparison to Linux again shows that Barrelfish/MAS's memory operation performance is competitive with that of a mainstream OS.
In this evaluation, we investigate the overheads of the model runtime representation and the translation unit re-configuration following the principle of least privilege.

Benchmark Methodology. This benchmark models an offload scenario, where an application workload wants to make use of a co-processor attached to PCI Express. We use the Xeon Phi co-processor for this purpose. We are interested in the sequence of initialization steps to establish a shared buffer between the CPU cores and the co-processor:
1. Model Query. Evaluate the runtime representation to find a suitable memory region and the needed re-configuration steps.
2. Allocate and Map. Request memory from the allocator and map it into the application's virtual address space.
3. Program Translation Units. Re-configure the translation units indicated in the model query response. Here, this includes i) the IOMMU, and ii) the SMPT of the co-processor.
[Figure 9: Breakdown of the Offloading Scenario. Bars: Linux MMAP, Barrelfish Alloc and Map, Barrelfish/MAS Local Map, Barrelfish/MAS RPC Map; segments: I/OMMU programming, SMPT programming, Model Query, Memory Allocation and Mapping, Memset of Allocated Memory; x-axis: Time [µs].]

We profile the execution of these steps and measure the time it takes to perform each step individually. We evaluate two mechanisms to program the IOMMU: i) using capability invocations directly, and ii) using an RPC to the IOMMU service acting as a reference monitor. The buffer size used is 8 MiB. As a frame of reference, we measure the time it takes to just allocate and map memory on both Linux (using mmap()) and vanilla Barrelfish.

Results. The breakdown of the operation into the steps is shown in Figure 9. We show the numbers for both mechanisms to program the IOMMU and, for comparison, include the time it takes to just allocate and map the memory on vanilla Barrelfish and Linux. The x-axis represents the measured times in µs. We make the following observations:

• Memory Allocation and Mapping. All three OSes use about the same time to allocate and map the required memory region, which accounts for the majority of the profiled time. It is dominated by zeroing the newly allocated memory.

• Model Query. Evaluating the model at runtime accounts for less than 5% of the total runtime.

• SMPT Configuration. Programming the SMPT of the co-processor uses less than 0.3% of the runtime.

• IOMMU Programming. The configuration of the IOMMU using direct capability invocations is fast (0.2% of the runtime). Using the RPC to the IOMMU reference monitor requires capability transfers, which corresponds to about 3% of the execution time.

Overall, the resulting overhead for the model query and the address space configuration accounts for roughly 5% of the runtime in Barrelfish/MAS compared to Barrelfish and Linux.
Discussion. In this evaluation, we have shown that it is possible to efficiently implement a representation of our executable model in an operating system and reconfigure address spaces following the principle of least privilege. Moreover, subsequent allocations may use the cached results of the model query, reducing the overhead even further. Note that the query merely indicates the operations to be carried out, while the capability system enforces their integrity.
In this evaluation, we qualitatively show the application and integration of the address space model into the OS toolchain to generate low-level, platform-specific OS code and data structures. By doing so, we show that our implementation is functional even when run on simulated platforms with unusual address space topologies not supported by other systems. While these simulated platforms are extreme, they include other real systems such as those with secure co-processors.

[Figure 10: Running Barrelfish/MAS on an ARM FastModels [7] Platform Based on a Hardware Description. Components: platform description (DSL), Prolog model representation (eclipseclp, Barrelfish SKB state), platform data structures and functions (gcc compiler, Barrelfish/MAS OS image), LISA+ simulator configuration (fastmodels compiler, FastModels simulator binary), run on simulator.]

[Figure 11: FastModels Simulator Configuration. Two ARM Cortex A57 cores, each with a configurable memory map, and DRAM 0 to DRAM 3.]
Evaluation Methodology. We design and build the toolchain illustrated in Figure 10 and write a series of different platform descriptions using a DSL. These platform descriptions specify the memory topology of the simulated platforms. The DSL compiler then generates:
1. Executable Model. A runtime representation of the memory topology model, and
2. Simulator Configuration. The LISA+ hardware description that configures the ARM FastModels simulator [7].

The generated runtime representation of the topology model then acts as the initial state for the Barrelfish SKB, and is used to generate low-level OS code and data structures, which are compiled and linked into a platform-specific Barrelfish/MAS OS image.

We mention four example configurations we tested for this evaluation. Figure 11 shows an illustration of the simulated platform, which consists of two ARM Cortex A57 processors, each having a configurable local memory map which defines at which addresses they see the DRAM regions (and the rest of the system in general) in their local address space. We evaluated the following configurations:
1. Uniform. Both cores have an identical memory map.
2. Swapped. DRAM is split into two halves, where each core sees the two halves at swapped address ranges.
3. Private. One shared memory region, and each core further has a private memory region, inaccessible to the other.
4. Private Swapped. Combines the swapped and private setups: shared memory with swapped views, and private memory per core.
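The four configurations can be written down as per-core memory maps. The sketch below uses made-up region names, a made-up region size (`HALF`), and two hypothetical helpers, `canonical` and `local`; it is not the generated code from the DSL compiler, only an illustration of why a core-local address alone cannot name memory globally.

```python
# Each core maps local base addresses to named DRAM regions; all values are made up.
HALF = 0x4000_0000

CONFIGS = {
    "uniform": {
        "core0": {0: "dram0", HALF: "dram1"},
        "core1": {0: "dram0", HALF: "dram1"},
    },
    "swapped": {
        "core0": {0: "dram0", HALF: "dram1"},
        "core1": {0: "dram1", HALF: "dram0"},
    },
    "private": {
        "core0": {0: "shared", HALF: "private0"},
        "core1": {0: "shared", HALF: "private1"},
    },
    "private_swapped": {
        "core0": {0: "shared", HALF: "private0"},
        "core1": {0: "private1", HALF: "shared"},
    },
}

def canonical(config, core, addr):
    """Translate a core-local address into a (region, offset) canonical name."""
    for base, region in CONFIGS[config][core].items():
        if base <= addr < base + HALF:
            return region, addr - base
    return None

def local(config, core, name):
    """Inverse direction: local address of a canonical name, or None if inaccessible."""
    region, offset = name
    for base, reg in CONFIGS[config][core].items():
        if reg == region:
            return base + offset
    return None
```

In the swapped setup, the canonical name obtained on one core resolves to a different local address on the other core, and in the private setups a core simply has no local address for the other core's private region.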
Results. During our experiments, we managed to compile Barrelfish/MAS and run it successfully on all tested platform configurations. This includes various memory management tasks and shared-memory message passing between the cores. No programmer effort was required besides writing the platform description.
Discussion. We know of no other current OS designs which can manage memory globally in all these cases. Popcorn Linux [11] and Barrelfish have limited support for case 3, while regular Linux and seL4 only support case 1. In contrast, Barrelfish/MAS supports all four cases: it is able to boot and manage memory on all platforms without modifications, regardless of the topology.
In this evaluation, we have shown that it is possible to efficiently implement the address space model and least-privilege memory management in an OS. We have quantitatively evaluated Barrelfish/MAS's virtual memory system and the reconfiguration operations, and analyzed the space and runtime complexity of maintaining kernel state.

Moreover, we have seen that Barrelfish/MAS is able to handle complex and non-standard memory topologies by strictly using the memory object's canonical name in the capability system, together with generated translation functions which convert this canonical name to a valid local address.
In this paper, we made the case for bringing back the concept of a reference monitor to mediate access to memory resources on modern, heterogeneous platforms. We presented a fine-grained, realistic memory protection model, based on which we can extend the reference monitor to include all memory translation and protection hardware present in the system. This allows systems software to adapt its access control model and catch up with the complexity of modern hardware.

We have shown that our design is applicable to any OS, regardless of its architecture. We have developed an executable specification of a reference monitor including the state, operations and authority, on which we have based our prototype implementation in Barrelfish/MAS. Not only can this memory protection model eliminate three different classes of bugs and vulnerabilities, but there is also no inherent performance overhead in implementing it in an operating system. Moreover, based on trusted hardware specifications, we can increase the level of automation and generate low-level operating systems code. We believe that our approach can lay the foundation for both fully verified systems and more reliable memory management in existing systems.

We plan to open-source the reference monitor and Barrelfish/MAS implementations.
References [1] Reto Achermann, Lukas Humbel, David Cock, and Tim-othy Roscoe. Formalizing Memory Accesses and Inter-rupts. In
Proceedings of the 2nd Workshop on Modelsfor Formal Analysis of Real Systems , MARS 2017, pages66–116, 2017.[2] Reto Achermann, Lukas Humbel, David Cock, and Tim-othy Roscoe. Physical Addressing on Real Hardware inIsabelle/HOL. In
Proceedings of the 9th InternationalConference on Interactive Theorem Proving , ITP’18,pages 1–19, Oxford, United Kingdom, 2018. SpringerInternational Publishing.[3] Hanna Alam, Tianhao Zhang, Mattan Erez, and YoavEtsion. Do-It-Yourself Virtual Memory Translation.In
Proceedings of the 44th Annual International Sym-posium on Computer Architecture , ISCA ’17, pages457–468, New York, NY, USA, 2017. ACM.[4] James P. Anderson. Computer Security TechnologyPlanning Study. Technical Report ESD-TR-73-51, Vol.I, AD-758 206, Electronic Systems Division, Deputyfor Command and Management Systems HQ ElectronicSystems Division (AFSC), L. G. Hanscom Field, Bed-ford, Massachusetts 01730, USA, 10 1972.[5] Andrew W. Appel and Kai Li. Virtual Memory Primi-tives for User Programs. In
Proceedings of the FourthInternational Conference on Architectural Support forProgramming Languages and Operating Systems , AS-PLOS IV, pages 96–107, New York, NY, USA, 1991.ACM.[6] ARM Ltd.
ARM Security Technology - Building aSecure System using TrustZone Technology , prd29-genc-009492c edition, 4 2009.[7] ARM Ltd. Development Tools and Software:Fast Models. ,8 2019.[8] Rachata Ausavarungnirun, Joshua Landgraf, VanceMiller, Saugata Ghose, Jayneel Gandhi, Christopher J.Rossbach, and Onur Mutlu. Mosaic: A GPU MemoryManager with Application-transparent Support for Mul-tiple Page Sizes. In
Proceedings of the 50th AnnualIEEE/ACM International Symposium on Microarchitec-ture , MICRO-50 ’17, pages 136–150, New York, NY,USA, 2017. ACM.[9] Moshe Bar. The Linux Signals Handling Model.
LinuxJournal , 5 2000. .1210] Antonio Barbalace, Anthony Iliopoulos, Holm Rauch-fuss, and Goetz Brasche. It’s Time to Think About anOperating System for Near Data Processing Architec-tures. In
Proceedings of the 16th Workshop on HotTopics in Operating Systems , HotOS ’17, pages 56–61,New York, NY, USA, 2017. ACM.[11] Antonio Barbalace, Marina Sadini, Saif Ansary, Christo-pher Jelesnianski, Akshay Ravichandran, Cagil Kendir,Alastair Murray, and Binoy Ravindran. Popcorn: Bridg-ing the Programmability Gap in heterogeneous-ISA Plat-forms. In
Proceedings of the Tenth European Confer-ence on Computer Systems , EuroSys ’15, pages 29:1–29:16, New York, NY, USA, 2015. ACM.[12] Andrew Baumann, Paul Barham, Pierre-Evariste Da-gand, Tim Harris, Rebecca Isaacs, Simon Peter, Tim-othy Roscoe, Adrian Schüpbach, and Akhilesh Sing-hania. The Multikernel: A New OS Architecture forScalable Multicore Systems. In
Proceedings of the ACMSIGOPS 22nd Symposium on Operating Systems Prin-ciples , SOSP ’09, pages 29–44, New York, NY, USA,2009. ACM.[13] Simon Biggs, Damon Lee, and Gernot Heiser. Thejury is in: Monolithic os design is flawed: Microkernel-based designs improve security. In
Proceedings ofthe 9th Asia-Pacific Workshop on Systems , APSys ’18,New York, NY, USA, 2018. Association for ComputingMachinery.[14] Adam Chester. Exploiting CVE-2018-1038 - TotalMeltdown. Online. https://blog.xpnsec.com/total-meltdown-cve-2018-1038/ , 4 2018.[15] David Cock, Gerwin Klein, and Thomas Sewell. SecureMicrokernels, State Monads and Scalable Refinement.In
Proceedings of the 21st International Conference onTheorem Proving in Higher Order Logics , TPHOLs ’08,pages 167–182, Berlin, Heidelberg, 2008. Springer-Verlag.[16] Nathan Dautenhahn, Theodoros Kasampalis, Will Dietz,John Criswell, and Vikram Adve. Nested kernel: An op-erating system architecture for intra-kernel privilege sep-aration. In
Proceedings of the Twentieth InternationalConference on Architectural Support for ProgrammingLanguages and Operating Systems , ASPLOS ’15, pages191–206, New York, NY, USA, 2015. ACM.[17] Philip Derrin, Kevin Elphinstone, Gerwin Klein, DavidCock, and Manuel M. T. Chakravarty. Running theManual: An Approach to High-assurance MicrokernelDevelopment. In
Proceedings of the 2006 ACM SIG-PLAN Workshop on Haskell , Haskell ’06, pages 60–71,New York, NY, USA, 2006. ACM. [18] Dhammika Elkaduwe, Gerwin Klein, and Kevin Elphin-stone. Verified protection model of the sel4 microker-nel. In
Proceedings of the 2nd International Confer-ence on Verified Software: Theories, Tools, Experiments ,VSTTE ’08, pages 99–114, Berlin, Heidelberg, 2008.Springer-Verlag.[19] Andrew Ferraiuolo, Andrew Baumann, Chris Haw-blitzel, and Bryan Parno. Komodo: Using Verificationto Disentangle Secure-enclave Hardware from Software.In
Proceedings of the 26th Symposium on OperatingSystems Principles , SOSP ’17, pages 287–305, NewYork, NY, USA, 2017. ACM.[20] Simon Gerber, Gerd Zellweger, Reto Achermann, Ko-rnilios Kourtis, Timothy Roscoe, and Dejan Milojicic.Not Your Parents’ Physical Address Space. In
Proceed-ings of the 15th USENIX Conference on Hot Topics inOperating Systems , HOTOS’15, pages 16–16, Berkeley,CA, USA, 2015. USENIX Association.[21] Xiling Gong. Exploiting Qualcomm WLAN and Mo-dem Over the Air. In
Proceedings of the BlackHat USA2019 , 2019.[22] Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Wu,Jieung Kim, Vilhelm Sjöberg, and David Costanzo. Cer-tiKOS: An Extensible Architecture for Building Certi-fied Concurrent OS Kernels. In
Proceedings of the 12thUSENIX Conference on Operating Systems Design andImplementation , OSDI’16, pages 653–669, Berkeley,CA, USA, 2016. USENIX Association.[23] Marius Hillenbrand, Mathias Gottschlag, Jens Kehne,and Frank Bellosa. Multiple Physical Mappings: Dy-namic DRAM Channel Sharing and Partitioning. In
Proceedings of the 8th Asia-Pacific Workshop on Systems, APSys ’17, pages 21:1–21:9, Mumbai, India, 2017.
[24] HSA Foundation.
HSA Runtime Programmer’s Reference Manual, version: 1.1.4 edition, 10 2016.
[25] Jian Huang, Moinuddin K. Qureshi, and Karsten Schwan. An Evolutionary Study of Linux Memory Management for Fun and Profit. In
Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference, USENIX ATC ’16, pages 465–478, Berkeley, CA, USA, 2016. USENIX Association.
[26] Intel Corporation.
Intel Virtualization Technology for Directed I/O - Architecture Specification, d51397-011, revision 3.1 edition, 6 2019.
[27] Khronos OpenCL Working Group.
The OpenCL Specification, version: 2.1, document revision: 24 edition, 2 2018.
[28] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal Verification of an OS Kernel. In
Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pages 207–220, New York, NY, USA, 2009. ACM.
[29] Butler W Lampson. Protection.
ACM SIGOPS Operating Systems Review, 8(1):18–24, 1974.
[30] Janghaeng Lee, Mehrzad Samadi, and Scott Mahlke. VAST: The Illusion of a Large Memory Space for GPUs. In
Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pages 443–454, New York, NY, USA, 2014. ACM.
[31] Linux Kernel Documentation.
Heterogeneous Memory Management (HMM), version 5.0 edition, 4 2019.
[32] A Theodore Markettos, Colin Rothwell, Brett F Gutstein, Allison Pearce, Peter G Neumann, Simon W Moore, and Robert NM Watson. Thunderclap: Exploring Vulnerabilities in Operating System IOMMU Protection via DMA from Untrustworthy Peripherals. In
NDSS, 2019.
[33] Alex Markuze, Adam Morrison, and Dan Tsafrir. True IOMMU Protection from DMA Attacks: When Copy is Faster Than Zero Copy. In
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’16, pages 249–262, New York, NY, USA, 2016. ACM.
[34] Benoît Morgan, Eric Alata, Vincent Nicomette, and Mohamed Kaaniche. Bypassing IOMMU Protection against I/O Attacks. In , pages 145–150, 10 2016.
[35] Benoît Morgan, Eric Alata, Vincent Nicomette, and Mohamed Kaaniche. IOMMU Protection Against I/O Attacks: A Vulnerability and a Proof of Concept.
Journal of the Brazilian Computer Society, 24(1):2, 1 2018.
[36] NATIONAL VULNERABILITY DATABASE NVD. CVE-2011-1898. Online, 8 2011.
[37] NATIONAL VULNERABILITY DATABASE NVD. CVE-2013-4329. Online, 9 2013.
[38] NATIONAL VULNERABILITY DATABASE NVD. CVE-2014-0972. Online, 8 2014.
[39] NATIONAL VULNERABILITY DATABASE NVD. CVE-2014-3601. Online, 8 2014.
[40] NATIONAL VULNERABILITY DATABASE NVD. CVE-2014-9888. Online, 8 2014.
[41] NATIONAL VULNERABILITY DATABASE NVD. CVE-2015-6994. Online, 1 2017.
[42] NATIONAL VULNERABILITY DATABASE NVD. CVE-2016-5349. Online, 4 2017.
[43] NATIONAL VULNERABILITY DATABASE NVD. CVE-2017-12188. Online, 10 2017.
[44] NATIONAL VULNERABILITY DATABASE NVD. CVE-2018-1038. Online, 8 2018.
[45] NATIONAL VULNERABILITY DATABASE NVD. CVE-2015-4421. Online, 5 2019.
[46] NATIONAL VULNERABILITY DATABASE NVD. CVE-2015-4422. Online, 5 2019.
[47] NATIONAL VULNERABILITY DATABASE NVD. CVE-2019-10538 - Modem into Linux Kernel issue. Online, 8 2019.
[48] NATIONAL VULNERABILITY DATABASE NVD. CVE-2019-10539 - Compromise WLAN Issue. Online, 8 2019.
[49] NATIONAL VULNERABILITY DATABASE NVD. CVE-2019-10540 - WLAN into Modem issue. Online, 8 2019.
[50] NVIDIA Corporation.
Unified Memory in CUDA 6, 11 2013.
[51] David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. A Case for Intelligent RAM.
IEEE Micro, 17(2):34–44, 3 1997.
[52] Bogdan F. Romanescu, Alvin R. Lebeck, and Daniel J. Sorin. Specifying and Dynamically Verifying Address Translation-aware Memory Consistency. In
Proceedings of the Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV, pages 323–334, New York, NY, USA, 2010. ACM.
[53] Pierre Schnarz, Joachim Wietzke, and Ingo Stengel. Towards attacks on restricted memory areas through co-processors in embedded multi-OS environments via malicious firmware injection. In
Proceedings of the First Workshop on Cryptography and Security in Computing Systems, CS2 ’14, pages 25–30, New York, NY, USA, 2014. ACM.
[54] Adrian Schüpbach, Andrew Baumann, Timothy Roscoe, and Simon Peter. A Declarative Language Approach to Device Configuration. In
Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, pages 119–132, New York, NY, USA, 2011. ACM.
[55] Thomas Sewell, Simon Winwood, Peter Gammie, Toby Murray, June Andronick, and Gerwin Klein. seL4 Enforces Integrity. In Marko van Eekelen, Herman Geuvers, Julien Schmaltz, and Freek Wiedijk, editors,
Interactive Theorem Proving, pages 325–340, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
[56] Erik Vermij, Leandro Fiorin, Rik Jongerius, Christoph Hagleitner, Jan Van Lunteren, and Koen Bertels. An architecture for integrated near-data processors.
ACM Trans. Archit. Code Optim., 14(3):30:1–30:25, September 2017.
[57] Stavros Volos, Kapil Vaswani, and Rodrigo Bruno. Graviton: Trusted execution environments on GPUs. In
Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’18, pages 681–696, USA, 2018. USENIX Association.
[58] Andy Whitcroft. Sparsemem Memory Model. https://lwn.net/Articles/134804/, 8 2019.
[59] Simon Winwood, Gerwin Klein, Thomas Sewell, June Andronick, David Cock, and Michael Norrish. Mind the Gap. In
Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics, TPHOLs ’09, pages 500–515, Berlin, Heidelberg, 2009. Springer-Verlag.
[60] Dongping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Lifan Xu, and Michael Ignatowski. TOP-PIM: Throughput-oriented programmable processing in memory. In
Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC ’14, pages 85–98, New York, NY, USA, 2014. ACM.
[61] Zhiting Zhu, Sangman Kim, Yuri Rozhanski, Yige Hu, Emmett Witchel, and Mark Silberstein. Understanding the security of discrete GPUs. In