CapablePtrs: Securely Compiling Partial Programs Using the Pointers-as-Capabilities Principle
Akram El-Korashy, Stelios Tsampas, Marco Patrignani, Dominique Devriese, Deepak Garg, and Frank Piessens

MPI-SWS, Germany; email: [email protected], [email protected]
imec-Distrinet, KU Leuven, Belgium; email: [email protected]
Stanford University, USA and CISPA, Germany; email: [email protected]
Vrije Universiteit Brussel, Belgium; email: [email protected]
Abstract—Capability machines such as CHERI provide memory capabilities that can be used by compilers to provide security benefits for compiled code (e.g., memory safety). The C to CHERI compiler, for example, achieves memory safety by following a principle called "pointers as capabilities" (PAC). Informally, PAC says that a compiler should represent a source language pointer as a machine code capability. But the security properties of PAC compilers are not yet well understood. We show that memory safety is only one aspect, and that PAC compilers can provide significant additional security guarantees for partial programs: the compiler can provide guarantees for a compilation unit, even if that compilation unit is later linked to attacker-controlled machine code.

This paper is the first to study the security of PAC compilers for partial programs formally. We prove for a model of such a compiler that it is fully abstract. The proof uses a novel proof technique (dubbed TrICL, read trickle), which is of broad interest because it reuses and extends the compiler correctness relation in a natural way, as we demonstrate.

We implement our compiler on top of the CHERI platform and show that it can compile legacy C code with minimal code changes. We provide performance benchmarks that show how performance overhead is proportional to the number of cross-compilation-unit function calls.
I. INTRODUCTION
A. Capability machines and pointers-as-capabilities
In a conventional computer, memory locations are addressed using integers. There, a store instruction such as sw $r2, n($r0) interprets the contents of register $r0 as a virtual address (which is an integer), and adds n to it to obtain the (integer) address where to store the contents of $r2. In other words, the memory model of a conventional computer is essentially an array of integers indexed by integers. In a capability machine (i.e., a capability-based computer), by contrast, memory locations are addressed using a capability. Capabilities are a separate type of run-time values that carry more information than just a memory address. They also contain bounds information indicating a section of memory that can be accessed using this capability, and possibly other information such as access permissions. The store instruction csw $r2, n($c0) now uses a capability register $c0, and the machine will check that the store performed by this instruction is within bounds and compliant with the permissions associated with the capability in $c0. If n is too big, or if $c0 only has permission for reading, the store instruction will fail with an exception. The hardware also implements run-time type checks to ensure that integers and capabilities cannot be confused; for instance, capabilities are stored in separate register banks, and capabilities are tracked in memory by tagging memory locations that contain a capability. Hence, capability machines implement a more structured memory model where (somewhat simplified) memory is a collection of independent integer-indexed arrays, containing integers or capabilities, and every capability gives access to a contiguous segment of one of those arrays.
This more structured memory model can be used to implement fine-grained memory protection, and has the potential to provide protection against many software bugs. Unfortunately, this support for memory protection is only useful if high-level software can use it, and manually modifying large existing code bases to adopt this memory protection is not practically feasible. Hence, ideally, compilers should handle this automatically. Thus, capability machines should be designed to make it easy for compilers of existing languages (like C) to use the additional support for memory protection. A recent example of such a machine is CHERI, a mature capability machine implementation that has its own FreeBSD version and C compiler [1, 2, 3, 4, 5]. Many key design choices in CHERI were made to facilitate the use of memory protection in existing, large code bases. Specifically, CHERI supports the pointers-as-capabilities (PAC) principle, which intuitively dictates that a compiler should represent a source-level pointer as a target-level capability. To make this convenient, a CHERI capability contains (among other things) base and length addresses, and an offset relative to the base address [5]. Such a capability represents a pointer pointing to the address base + offset, and that is valid for accessing (using indexing or pointer arithmetic) addresses in the range [base, base + length). Pointer arithmetic can be implemented by manipulating the offset.

The following example illustrates how a compiler can map C pointers to such machine-level capabilities. As a typesetting convention, we use a blue, sans-serif font for source language elements and an orange, bold font for target language ones. Elements common to both languages are typeset in a black, italic font.

Example 1 (From C Pointers to CHERI Capabilities). The C compilation unit (below on the left) declares two module-scoped variables and defines a function f() using one of these variables. The assembly pseudocode (below on the right) shows how a PAC compiler could translate the body of f().

extern void send_rcv(char *buffer);
char iobuffer[512];
static int secret;
void f() {
  iobuffer[42] = 'X';
  send_rcv(iobuffer);
}

csl $c1, $ddc, 512;
li $r1, 'X';
csw $r1, 42($c1);
call send_rcv;

The default data capability register $ddc contains a capability for the global data section. The compiler knows that the variable iobuffer occupies the first 512 bytes of that global data section. Hence, the first instruction (CSetLen, set length of a capability) loads in register $c1 a copy of $ddc but with the length field reduced to 512. The next two instructions implement the assignment instruction of f(). Note that an out-of-bounds access would be trapped by the hardware. The final call instruction implements the function call in f() (assuming a calling convention where the parameter is passed in register $c1). Note that also all accesses to iobuffer performed in send_rcv will be bounds-checked, since the capability representing the pointer passed to send_rcv carries the bounds information.

B. Security properties of a PAC compiler
Clearly, a PAC compiler provides security benefits. But what security properties does it provide exactly?

First, the compiler provides spatial memory safety. Since the bounds meta-data for a pointer is stored together with the pointer address in a single capability value, it is straightforward to implement a bounds-checking compiler. For instance, an out-of-bounds access to iobuffer in the example above will not access the secret variable, but will fail instead. It is in principle also possible to have the compiler provide temporal memory safety. The compiler should then emit code to revoke a capability when the corresponding memory region is freed, but there are many challenges to do this efficiently. For instance, revocation can be implemented by searching all reachable memory and zeroing out any remaining capabilities for the memory region to be freed, but this is clearly inefficient, in particular for revoking stack-allocated memory. Several papers study approaches to efficiently achieve temporal memory safety on a capability machine, both for stack [6, 7, 8] and heap memory [9], but it is fair to say that achieving efficient temporal memory safety for C is still an open problem. Hence, we leave it out of scope for this paper.

But memory safety is not the full story. A PAC compiler can provide stronger security guarantees. Consider again the example above, but now under the assumption that the external send_rcv function is not implemented in C, but directly in assembly code. Now, we lose the guarantee that send_rcv can not access the secret variable. An assembly-level implementation can directly access $ddc. Hence, memory safety is only guaranteed for complete programs: if all code in a program is compiled by the PAC compiler, then all out-of-bounds accesses will be trapped.

Since capability machines are designed to guarantee unforgeability of capabilities even at machine code level, a PAC compiler can provide stronger guarantees than spatial memory safety for complete programs. The main contribution of this paper is that we show how a PAC compiler can provide strong security guarantees for partial programs. In particular, for the example above, the compiler we discuss in this paper provides the guarantee that secret is inaccessible, even if the external function is implemented in hand-crafted assembly code.

In order to achieve non-trivial security guarantees for partial programs, the target capability machine needs to support a mechanism to define separate protection domains within a single process. For instance, CHERI provides support for object capabilities [3, 4]. This makes it possible to put different program parts in separate protection domains. CHERI mainly uses the mechanism of object capabilities to compartmentalize programs: it offers an API to programmers to run parts of a program in a sandbox, a protection domain with reduced privileges. The current CHERI compiler, however, does not make direct use of the object capability mechanism: it can only be used through the provided API by the programmer, who has to define and set up sandboxes. The PAC compiler we propose in this paper, on the other hand, will automatically set up a separate protection domain for every compilation unit. Doing so allows the compiler to provide strong security guarantees for partial programs. In addition to the guarantees for partial programs against hand-crafted assembly, this automatic compartmentalization can significantly reduce the potentially dangerous impact of bugs in one compilation unit on the operation of other compilation units, again without any programmer effort.
C. Summary of our results

We study the security guarantees that a PAC compiler can provide for partial programs. We prove a security theorem about a simple model of a compartmentalizing PAC compiler, and discuss its implications. We implement the compiler for full C on top of the existing CHERI compiler, and we evaluate the performance cost as well as the compatibility of the compiler with existing code.

Our formulation and proof of the security guarantees of PAC models PAC as a compiler from a very simple imperative source language with pointers, to a capability machine with memory capabilities and a very basic form of protection domains/object capabilities. The security theorem we prove takes the form of a full abstraction result [10], which intuitively means that it is valid for programmers to reason about the security of their program parts in the source language. If a program part has a security property (such as the confidentiality of the secret variable in the example) that is valid according to the source language semantics (i.e., under any interaction of the program part with other source language program parts), then that property remains valid also if the compiled program part is linked with attacker-crafted machine code.

This is a very strong security property, and we only prove it under some (reasonable) restrictions. First, some syntactic restrictions need to be imposed on the (attacker-provided) machine code. These restrictions can be implemented as a code verification step by the linker. The main restriction we rely on is that machine code should not directly access the program counter capability, a restriction that is easy to check at link time, and that guarantees that machine code can not confuse a program part by providing it a code capability where it expects a data capability.

Second, to make source-code-level reasoning sound even when linking to machine code, we need to make some of the machine-code-level power of the attacker also available at source code level, essentially forcing the programmer to take this into account when reasoning about his program part at source code level. Consider for instance the following example:

static bool secret = true;
int f(int *p, int *q) {
  if ((int) p != (int) q) return 0;
  if (secret) return p[1];
  else return q[1];
}

Function f() first tests whether its two argument pointers are equal addresses. This equality implies that p[1] and q[1] always evaluate to the same value (or both fail).
Hence f() does not leak information about secret. However, a machine-code-level adversary can call f() with two capabilities that point to the same address but have different bounds information. In that case accessing p[1] could fail, while q[1] returns a value, and f() does leak information about secret. Hence, to prove our theorem, we have to extend the source language to make pointers carry bounds information. This essentially makes explicit exactly what aspects of the target language programmers need to take into account to reason soundly about partial source programs.

Full abstraction (FA) theorems are known to be hard to prove, and the same goes for our theorem. Our proof adopts a novel proof technique called TrICL (read, trickle) that relies on trace semantics for both source and target languages. While inspired by existing FA proofs, this technique is both novel and of broad interest for secure compilation since it simplifies trace-based proofs that are very common for secure compilation results (as we describe later on). TrICL simplifies trace-based proofs by providing a general technique for re-using the simulation relation that is built for compiler correctness, and for extending it into an alternating simulation relation for the trace-based security proof.

In summary, this paper makes the following contributions:
• it is the first to state and prove the security properties of a PAC compiler for partial programs. To do so, the paper makes substantial technical contributions:
  – the definition of a sound and complete trace semantics for a C-like language (Section II-A) and for a language with capabilities (Section II-B). Both languages feature a memory model that allows fine-grained memory sharing, which makes our results more interesting, our proofs more challenging and, in turn, our proof techniques more widely applicable;
  – the definition of a compiler between the aforementioned languages that embodies the pointers-as-capabilities design (Section III);
  – a proof that said compiler is fully abstract. In this proof, we use a novel proof technique called TrICL (Section IV);
• it reports on an implementation of that compiler on top of the C-to-CHERI compiler and benchmarks its efficiency and compatibility with existing C code (Sections V and VI).

Our contributions are followed by a discussion of our model's limitations (Section VII) and related work (Section VIII). Most of the formalisation presented in this paper is massaged for presentation, and many auxiliary lemmas as well as proofs are omitted; the interested reader will find the full formal details at https://github.com/capable-ptrs/tech-report. The implementation of our compiler and the related benchmarks will be made available publicly on publication of this paper.

II. MODELS OF THE SOURCE AND TARGET LANGUAGES
In this section, we introduce our models of the source and target languages: ImpMod (Section II-A) and CHERIExpress (Section II-B). But first, we explain notation that is used throughout the paper, including notation for contextual equivalence.

Given execution states s, s′, and a small-step relation → (with a reflexive transitive closure →∗), we denote by s → s′ the judgment that state s executes and transitions into state s′. Such a state s is not stuck. If, on the other hand, no s′ exists such that s → s′, then we say s is stuck. In our models, for simplicity, we treat errors the same way we do stuck states, i.e., we do not explicitly distinguish the event of an exception with any special error state. An execution that reaches a stuck state or that never terminates is referred to as diverging.

Programs and initial & terminal states: A program is a list of modules, and a module is a list of functions. Linking of a pair of programs is denoted by ⋉. We define linking to be non-symmetric because it makes our security proof easier (in Section VII, we discuss the reason for this non-symmetry). For the sake of our theorem, we will distinguish the two parts C and p of a linked program C ⋉ p as the context and the program, suggesting that the latter is the program of interest because it is the program that is or has been translated by our compiler. But notice that each of the program of interest and the context may themselves consist of more than one module (i.e., more than one compilation unit). Our results do not assume any restrictions on the number of modules that the program or the context may consist of. As usual, only whole programs with a main function can execute correctly.

The initial state of a program p is denoted by init(p). A state s is called a terminal state when it satisfies the judgment ⊢t s. If the execution of a program of interest p in a certain context C reaches a terminal state, then we say the execution converges (or instead say the program converges), and we denote this fact by C[p]⇓, which is shorthand for the following:

∃s. init(C ⋉ p) →∗ s ∧ ⊢t s

Now, with the notation for convergence in hand, we define contextual equivalence of programs as follows:

Definition 1 (Contextual equivalence of programs).

p1 ≃ctx p2 ≝ ∀C. C[p1]⇓ ⟺ C[p2]⇓

Two programs p1 and p2 whose termination behaviors (i.e., whether they converge or not) always agree whenever linked and executed with the same context are contextually equivalent (≃ctx).

A. ImpMod: An imperative language with modules, arrays, and pointers
For simplicity, and in order to focus on the "pointers to capabilities" aspect of the translation, we design ImpMod to be low-level enough so that we do not need to worry about extra correctness arguments that are otherwise pretty standard. For example, ImpMod features only unstructured control flow in the form of a JumpIfZero instruction. But it still features functions and modules.

In fact, modules are crucial to implementing the "pointers to capabilities" translation. They are the scope of module-global variables and they are also used as a unit of isolation; i.e., 1) every module gets its own data stack on which it stores the frames of alive calls to its functions, and 2) every function inside a module can access all the module-global variables, whereas any function external to the module can only call the module's functions but is by default not allowed to access its module-global variables.

For example, in Listing II.1 below, we have an example module (Main) where variables iobuffer and secret (lines 4 & 5) are both module-global variables.

 1  Module id: Main
 2  Import module: Networking
 3
 4  iobuffer[512];
 5  secret;
 6
 7  main() {
 8    Assign &iobuffer[42] 4242;
 9    Call read_secret();
10    Call encrypt();
11    Call send_rcv(&iobuffer);
12    Call decrypt();
13  }
14  read_secret() { ... }
15  encrypt() { ... }
16  decrypt() { ... }

Listing II.1: Example ImpMod module.

All of the main function and the secret-handling functions read_secret, encrypt and decrypt are defined in the same module (Main), and thus can each access both variables (iobuffer and secret). The function send_rcv, in contrast, is external: it is defined in the Networking module, which is presumably untrusted. In line 11, the untrusted Networking module gains access to the iobuffer. The Networking module, however, will not gain access to secret through &iobuffer, so long as the other trusted functions read_secret and encrypt also make sure not to copy (a pointer to) the secret to the iobuffer. Any attempt to increment and access the pointer &iobuffer beyond the bounds of the array will get stuck. To understand how we model this bounds check, we introduce the expression semantics and the memory model of ImpMod.
1) Expressions and memory model of ImpMod: Expressions are denoted with e.

e ::= Z | VarID | e ⊕ e | e[e] | &VarID | &e[e] | ∗(e)
    | start(e) | end(e) | offset(e) | capType(e) | limRange(e, e, e)

Base expressions are integers Z and variable identifiers VarID. On top of the base expressions, arithmetic expressions are generically denoted with e ⊕ e. Pointer and array expressions are: 1) the array-offset expression e[e], 2) the ampersand operator of the forms &VarID and &e[e], and 3) the star operator ∗(e). Moreover, there are low-level expressions that are necessary for reflecting the target memory model in the source language (as mentioned in Section I-B): four getters, start(e), end(e), offset(e), and capType(e); and a setter, limRange(e, e, e). These low-level expressions operate on the capability-based representation of memory addresses that we explain next.

Addresses in ImpMod are represented as capabilities. Thus, built into the memory model is a type Cap ≝ {κ, δ} × Z × Z × Z for capabilities that is distinct from integers Z. Hence, run-time values V are integers Z or capabilities Cap. And therefore, a memory Mem : Z ⇀fin V of a program's execution state in ImpMod can contain capability values. The first field of the capability value indicates its type: code (κ) or data (δ). The getter of this field is the expression capType. The next three (integer) fields are respectively the start, end, and offset of a capability. The start and end identify the memory region on which this capability authorizes a (code (κ) or data (δ)) access operation. The offset designates the one address at which an access operation is performed. The offset should be within range (checked by Rule Eval-star):

(Eval-star)
e ⇓ (δ, st, end, off)    st ≤ st + off < end    Mem(st + off) = v
―――――――――――――――――――――――――――――――――――――――
∗(e) ⇓ v

For space constraints, the full expression semantics is given in the appendix in Figure 5. Notice in the main function on line 8 the use of variable iobuffer; the syntax for l-values is more explicit than C syntax.

2) Commands and execution state of
ImpMod: Commands of ImpMod are denoted with Cmd:

Cmd ::= Assign e_l e_r | Alloc e_l e_size | Call fid e | Return | JumpIfZero e_c e_off | Exit

An execution state s of a program in ImpMod consists of the memory Mem, the stack pointers Φ of the module-local stacks, the program counter pc, the trusted control stack stk (which is shared among all modules of a program), and the memory-allocation status represented by the next-free-address nalloc. The space for dynamic memory allocation (i.e., the heap) is also shared by all the program modules (like the trusted control stack). The semantics for allocation and assignment are given in the appendix in Figure 6.

B. CHERIExpress: An imperative language with modules and capabilities
Our target language CHERIExpress is (like ImpMod) an imperative language with modules. However (unlike ImpMod), it does not feature variables, neither global nor local. Instead, it only features "capability registers" and integers as base expressions. With these capability registers in hand, the role of the compiler from ImpMod to CHERIExpress is to implement the operations on pointers by using operations on capability registers. The memory model (as explained already in Section II-A1 about the memory model of ImpMod) is capability based. Through this capability-based memory model and the capability registers, CHERIExpress models a higher-level and simplified version of CHERI assembly [11].

1) Expressions and commands in CHERIExpress: Expressions are denoted with e.

e ::= Z | getddc | getstc | e ⊕ e | inc(e, e) | deref(e)
    | start(e) | end(e) | offset(e) | capType(e) | limRange(e, e, e)

No expression allows fabrication of an arbitrary capability, thus exhibiting capability unforgeability. The getter expressions getddc and getstc are the only base expressions that evaluate to a capability value. Evaluation of the expressions getddc and getstc immediately results in the current value of the respective capability registers, ddc and stc.

(Eval-ddc)
Mem, ddc, stc, pcc ⊢ getddc ⇓ ddc

There is a third capability register that is also part of the execution state, namely, the pcc register. It holds the program counter capability, which points to and allows the execution of commands. This register is, however, eliminated from the possible expression forms in CHERIExpress. This elimination is a simple way of enacting in our model the prohibition (mentioned in Section I-B) on linking with contexts that mention the pcc register. (The choice of the names of capability registers hints at their recommended usage: ddc stands for default data capability [11], and stc, which we defined for CHERIExpress, stands for stack capability. We use ddc as a capability on the per-module data segment, and stc on the per-module stack.)

Commands in CHERIExpress are (modulo expressions) the same as commands in ImpMod. Thus, instead of presenting here the full operational semantics, we focus next on presenting the details of our PAC compiler.

III. ImpMod TO CHERIExpress: A POINTERS-AS-CAPABILITIES COMPILER
Our ImpMod to CHERIExpress compiler J·K is a PAC compiler. In this section, we present its crucial bits, namely, the translation of ImpMod expressions to CHERIExpress expressions. The translations of commands (denoted L·M) and of modules (denoted J·K), on the other hand, are not surprising due to the closeness of the source and target languages.

The translation of expressions *·+ : e → e, whose excerpts are presented below, is indexed by the syntactic information fid, modID (function id and module id) giving the scope of the expression being translated, and β giving the layout and bounds of variables.

*z+_ ≝ z
*e1 ⊕ e2+fid,modID,β ≝ *e1+fid,modID,β ⊕ *e2+fid,modID,β
*&vid+_,modID,β ≝ limRange(ddc, start(ddc) + st, start(ddc) + end)
    when β(vid, ⊥, modID) = (st, end)
*&vid+fid,modID,β ≝ limRange(stc, st + start(stc) + offset(stc), end + start(stc) + offset(stc))
    when β(vid, fid, modID) = (st, end)
*vid+fid,mid,β ≝ deref(*&vid+fid,mid,β)
*&e_arr[e_off]+fid,mid,β ≝ inc(*&e_arr+fid,mid,β, *e_off+fid,mid,β)
*e_arr[e_off]+fid,mid,β ≝ deref(*&e_arr[e_off]+fid,mid,β)

Translating the expressions start, end, offset, capType, and limRange is straightforward and, similar to *e ⊕ e+, they are homomorphisms (as expected [12]). Drawing on the translation of expressions, we come again to the ImpMod program of Listing II.1, where Line 8 is translated as shown in Example 2:
Example 2 (Translation of line 8 from Listing II.1).

L Assign &(iobuffer[42]) 4242 M
= Assign *&(iobuffer[42])+ *4242+
= Assign inc(*&iobuffer+, 42) 4242
= Assign inc(limRange(ddc, start(ddc) + 0, start(ddc) + 512), 42) 4242

Note that the compiler uses the bound information that is given in the text of the program in the declaration of the array (Line 4) to introduce explicit curbing (using limRange) of the ddc capability so that the resulting capability is of the same size as the declared array size (512) and not bigger, a curbing that in turn ensures that the assignment succeeds only when the offset is within the declared bounds of the array. In Line 8, we could already ensure the same by trivial static analysis. But more crucially, when static analysis is not feasible, this automatic curbing introduced by the compiler ensures that the run-time check that CHERIExpress provides is done against the correct bounds.

In Line 11, the main function passes a pointer to an external untrusted function (here no static analysis is feasible unless the external function were trusted). Example 3 shows how to translate Line 11.
CHERIExpress provides is doneagainst the correct bounds.In Line 11, the main function passes a pointer to an externaluntrusted function (here no static analysis is feasible unlessthe external function were trusted). Example 3 shows how totranslate Line 11.
Example 3 (Translation line 11 from Listing II.1) . L Call send _ rcv (& iobuffer ) M = Call send _ rcv ( limRange ( ddc , start ( ddc ) + , start ( ddc ) + )) Again, curbing the ddc capability is achieved by means ofthe limRange expression and it is crucial for the security proof. Without such curbing, we would not be able to provefull abstraction of our compiler. In fact without it, the untrusted send_rcv function will have access to the secret when bydefinition of the semantics of
ImpMod it should not.We next explain the end-to-end security guarantee that our
PAC compiler provides.IV. P
ROVING THE COMPILER IS SECURE
This section discusses FA, the formal criterion we prove to demonstrate that our compiler is secure (Section IV-A). Then it presents a high-level overview of the proof (Section IV-B). The proof relies on the definition of auxiliary trace semantics (Section IV-C), and on a novel proof technique (TrICL) that allows extending the compiler-correctness relation into an alternating simulation relation that is indexed by a trace (Section IV-D).

A. Fully abstract compilation, formally

Our compiler J·K is provably secure. The formal definition that we choose for compiler security is standard [10, 13], namely, full abstraction. A compiler is fully abstract when it both reflects and preserves contextual equivalence.

Theorem 1 (The compiler J·K is fully abstract).
(i) The compiler J·K reflects contextual equivalence:
    ∀p_s1, p_s2. J p_s1 K ≃ctx J p_s2 K ⟹ p_s1 ≃ctx p_s2
(ii) The compiler J·K preserves contextual equivalence:
    ∀p_s1, p_s2. J p_s1 K ≃ctx J p_s2 K ⟸ p_s1 ≃ctx p_s2

The compiler reflecting contextual equivalence (condition (i)) means that whenever it produces two contextually equivalent programs (J p_s1 K ≃ctx J p_s2 K), then the source programs themselves must also have been contextually equivalent (p_s1 ≃ctx p_s2). This condition (reflection of contextual equivalence) ensures that the transformation J·K is non-trivial. A trivial compiler might compile semantically different programs to the same output program. However, it would violate condition (i) because semantically different source programs are by definition not contextually equivalent. Notice that condition (i) usually even follows from whole-program correctness, namely, whole-program backward simulation (together with some structural lemmas about linking and compilation). Whole-program backward simulation is a standard [14, 15] formal criterion for considering a compiler bug-free with respect to a given source semantics.

Condition (ii), on the other hand, ensures that whenever two source programs, p_s1 and p_s2, are not distinguishable in the source semantics (p_s1 ≃ctx p_s2), then after compilation, they should remain indistinguishable in the target semantics (J p_s1 K ≃ctx J p_s2 K).
Loosely speaking, what this condition ensures is that no extra distinguishing power may be gained by writing distinguishing contexts in the target language (i.e., by attempting to distinguish the compiled rather than the source programs) compared to writing contexts in the source.

B. Overview of the proof
To prove that our compiler is fully abstract, we prove both the contrapositive of condition (i), and the contrapositive of condition (ii). The former follows easily from compiler correctness and its proof is fully detailed in the supplementary material. The latter, namely preservation of contextual equivalence, is depicted in Figure 1 and it follows from three main lemmas (Lemmas 1 to 3) that all rely on an auxiliary (and more manageable) definition of program equivalence called trace equivalence (denoted =T).

[Figure 1 depicts the proof as a chain of implications: J p_s1 K ≄ctx J p_s2 K implies (Lemma 1) J p_s1 K ≠T J p_s2 K, which implies (Lemma 2) p_s1 ≠T p_s2, which implies (Lemma 3) p_s1 ≄ctx p_s2; their composition is the contrapositive of (ii).]

Fig. 1: Visual decomposition of our proof of (ii).
Lemma 1 (Soundness of target trace-equivalence).
  ∀ p_s1, p_s2. ⟦p_s1⟧ ≄ctx ⟦p_s2⟧ ⟹ ⟦p_s1⟧ ≠T ⟦p_s2⟧

Lemma 2 (Compilation preserves trace equivalence).
  ∀ p_s1, p_s2. ⟦p_s1⟧ ≠T ⟦p_s2⟧ ⟹ p_s1 ≠T p_s2

Lemma 3 (Completeness of source trace-equivalence).
  ∀ p_s1, p_s2. p_s1 ≠T p_s2 ⟹ p_s1 ≄ctx p_s2

The three lemmas above, put together as in Figure 1, establish that the compiler preserves contextual equivalence (the horizontal dashed arrow). One can intuitively understand them as follows:
1) Lemma 1 ensures we can abstract contextual equivalence as trace equivalence (assuming the common ways of defining termination, and of defining contextual equivalence as equi-termination under all linkable contexts).
2) Lemma 2 ensures (in the contrapositive) that the compiler preserves trace equivalence.
3) Lemma 3 finally ensures we can concretize trace equivalence back as contextual equivalence.
Lemma 2 is the hardest, so we dedicate Section IV-D to explaining its proof. The proofs of Lemmas 1 and 3 reuse some of the ideas introduced in Section IV-D as well. While the general approach (Figure 1) that we adopt, namely the use of trace equivalence as a go-between in the proof, is not new [16, 17, 18, 19], we believe the elaboration in this paper of many details (especially the trace-indexed cross-language (TrICL) simulation) is a novelty of this work. The reason we came up with the TrICL simulation technique is to provide a principled way of re-using the simulation relation that is built for compiler correctness, by extending it into an alternating simulation relation for the trace-based security proof (namely, the proof of Lemma 2). This re-usability advantage might spark the interest of researchers in extending state-of-the-art correctness proofs for realistic compilers into security proofs (potentially by targeting architectures like CHERI).

C. Trace equivalence
The reason we define this auxiliary program equivalence called trace equivalence is to abstract over the execution steps → that are made by the target context. The way we abstract is by characterizing the behavior of a program of interest p by a set Tr(p) of informative trace prefixes α that capture all the possible interactions of the program p with any context. Each trace prefix α is a finite list of labels λ. Two programs p1 and p2 are then trace equivalent (written p1 =T p2 as noted earlier) whenever they have the same set of informative trace prefixes:

Definition 2 (Trace equivalence of programs).
  p1 =T p2 ≝ ∀ α. α ∈ Tr(p1) ⟺ α ∈ Tr(p2)

It then remains to understand how we define the set Tr(p) for a program p. To explain that, we show an excerpt of the labeled trace-step relation −λ⇀_p that we define for our target language CHERIExpress. Note that a very similar trace-step relation is defined for ImpMod programs too. The trace-step relation relates two trace states and a label λ. A trace state in CHERIExpress is written as (s, ς): it extends the normal execution state s with auxiliary information ς, which is the set of memory addresses shared so far (i.e., from the initial state and up until execution state s) between the program of interest p and the context. This auxiliary information is used to define an informative trace label λ, which records a snapshot Mem of the entire shared memory. Trace labels λ of both our languages have the following forms:

  λ ::= τ | ✓ | call(fid) v̄ ? Mem, nalloc | call(fid) v̄ ! Mem, nalloc | ret ? Mem, nalloc | ret ! Mem, nalloc
1) A silent label τ is uninformative: it abstracts over any execution step that is internal to either the program or the context.
2) A termination label ✓ indicates a terminal execution state was reached. (Once a ✓ appears, it re-appears in all subsequent trace positions.)
3) An input call label call(fid) v̄ ? Mem, nalloc indicates that at an execution state where the shared memory was Mem, and the allocator status was nalloc, the context called the program's function fid with the list of values v̄ as arguments.
4) An output call label call(fid) v̄ ! Mem, nalloc is similar except now in the opposite direction: the program called the context's function fid.
5) An input return label ret ? Mem, nalloc indicates that at an execution state where the shared memory was Mem, and the allocator status was nalloc, the context returned control to a caller function in the program.
6) An output return label ret ! Mem, nalloc is similar except the program was the one that returned control to the context.

Fig. 2: Trace semantics of CHERIExpress (excerpts). The trace-step relation is indexed with a program p.

(Return-to-program)
  s → s′    s.Mc(s.pcc) = Return
  s.pcc ⊈ dom(p.Mc)    s′.pcc ⊆ dom(p.Mc)
  ς′ = reachable_addresses_closure(ς, s′.Mem)
  ─────────────────────────────────────────────
  (s, ς) −(ret ? s′.Mem|ς′, s.nalloc)⇀_p (s′, ς′)

(Call-to-program-silent)
  s.Mc(s.pcc) = Call fid ē    s → s′
  s.pcc ⊆ dom(p.Mc)    s′.pcc ⊆ dom(p.Mc)
  ─────────────────────────────────────────────
  (s, ς) −τ⇀_p (s′, ς)

Figure 2 gives two rules of the trace-step relation −λ⇀_p: one where the label emitted is informative (namely, an input return label), and the other where the label emitted is not informative (τ). Notice that the premises of both Rules Return-to-program and Call-to-program-silent perform a check on the program counter capability pcc both before and after the execution step (i.e., in states s and s′). If this capability value is changed with respect to the program-of-interest's code memory (p.Mc) by the execution step, then this indicates that control went into (or out of) the program of interest p. We call such a change of control with respect to the program of interest a border-crossing action, and we give it an informative label that records the memory snapshot and that updates the auxiliary information (computing ς′). If, on the other hand, the capability value in pcc is not changed (with respect to p.Mc) by the execution step, then no interesting change of control took place. We call such an uninteresting action an internal action (i.e., internal to either the program p or to the context), and give it the uninformative label τ.

All uninformative labels (τ) are eventually dropped, and we concatenate only the informative labels into a trace prefix α. These technicalities (all worked out in the supplementary material) are mostly inspired by process calculi [20]. Also, the idea of a trace label containing a shared-memory dump, and of the auxiliary information ς recording a summary of the shared addresses, is inspired by Laird's [21] trace semantics for a lambda calculus with general references.

One key result of the choice of these ?- and !-decorated trace labels is that the trace prefixes formed by the concatenation of the non-silent labels are alternating. By alternating, we mean that the decorators "?" for input and "!" for output occur alternately on the trace prefix. This is true of any trace prefix α in the set of traces Tr(p) of a program p in both our target language CHERIExpress and the source one ImpMod:

Fact 1 (Traces are alternating).
  α ∈ Tr(p) ⟹ α ∈ Alt ✓*
where Alt ≝ (•? | ε) (•! •?)* (•! | ε), and •? is the set of ?-decorated labels, and similarly for •!.

D. Using TrICL simulation to prove that our
PAC compiler preserves trace equivalence
Having introduced how our auxiliary trace semantics works, we return to Lemma 2, which states (in the contrapositive) that our compiler preserves trace equivalence. Our way to prove this lemma is to reduce it to the following two lemmas by unfolding Definition 2:
Lemma 4 (No trace is omitted by compilation).
  ∀ α, p_s. α ∈ Tr(p_s) ⟹ α ∈ Tr(⟦p_s⟧)

Lemma 5 (No trace is added by compilation).
  ∀ α, p_s. α ∈ Tr(p_s) ⟸ α ∈ Tr(⟦p_s⟧)

The first (Lemma 4) follows easily by lifting the compiler forward-simulation result (which we prove about the execution steps) to trace steps. The second lemma (Lemma 5), however, is a bit involved, and requires extending the compiler forward- and backward-simulation relation into an alternating cross-language relation, namely TrICL, that is indexed by an execution trace. The alternation of the TrICL simulation relation relies on Fact 1 about the alternation of traces.

To understand how the proof of Lemma 5 works, we first illustrate what an example trace prefix α emitted by an example compilation would look like. Example 4 portrays a trace prefix that is emitted by the compiled version of the example ImpMod module of Listing II.1 (with indications of each part of the action). For simplicity, we take our program of interest to be this singleton list of modules. In general, a program of interest can consist of many modules. (The ∗ in ✓∗ in Fact 1 is because a terminal state loops forever in our model.)

Example 4 (Trace prefix of the compilation of the program in Listing II.1).

  call(send_rcv) [(δ, σ, σ+512)] ! Mem, nalloc :: ret ? Mem′, nalloc

Here the callee fid is send_rcv; the argument list v̄ is the single δ-capability (δ, σ, σ+512); Mem is a dump of the shared memory at addresses σ, …, σ+42, …, σ+511; nalloc is the first heap address; and Mem′ covers the same addresses after the context zeroed them out.

In the example, the compilation of the first three commands generates silent steps, which are dropped from traces and therefore not shown. Then the two non-silent labels illustrate the two cases of the proof of the alternating TrICL simulation:
1) the function call on Line 11 is border-crossing, so its compilation emits a non-silent label, namely, an output call label which contains the callee function id (send_rcv), the argument to the call (the δ-capability representing the pointer &iobuffer), the direction of the call (! denoting output, i.e., program-to-context), a dump of the memory shared so far (namely, the contents of the array iobuffer), and the nalloc value denoting the first heap address (the heap grows towards negative addresses).
2) the target context (in which our compiled program executes) returns control to the program after zeroing out the contents of the shared memory, emitting the input return label that is shown last in Example 4.

Notice that the goal of Lemma 5 is to show that exactly the given "target" trace prefix of Example 4 can be emitted by the source program (in some source context). We show in Listing IV.1 a source context that (when linked with the program in Listing II.1) emits the trace in Example 4.

Module id: Networking
Import module: HelperBackTranslation
current_trace_idx;
send_rcv(iob_ptr) {
    Call readAndIncrementTraceIdx(&current_trace_idx);
    Call saveArgs_send_rcv_1(iob_ptr);
    Call saveSnapshot_0();
    Call doAllocations_1();
    Call mimicMemory_1();
    Return;
}
/*******************************************************/
Module id: HelperBackTranslation
current_trace_idx;
arg_store_0_send_rcv_0;
snapshot_0_σ;
...
snapshot_0_σ+511;
mimicMemory_1() {
    Assign *(arg_store_0_send_rcv_0[0]) 0;
    ...
    Assign *(arg_store_0_send_rcv_0[511]) 0;
    Return;
}
...

Listing IV.1: Example back-translation (simplified excerpt): An ImpMod context emulating the trace of Example 4.

The emulating context consists of two modules: Networking, which implements the API function send_rcv, and HelperBackTranslation, which implements helper functions and maintains metadata. We show just one example of such a helper function, namely mimicMemory_1(). mimicMemory_1() is called (on Line 11) by send_rcv(), so that the former zeroes out the IO buffer. The way the IO buffer is accessible at all by mimicMemory_1() is through the pointer stored in the global variable arg_store_0_send_rcv_0 (Line 18). This pointer would have been stored (not shown) by the function call saveArgs_send_rcv_1(iob_ptr); on Line 8.

We briefly explain what each helper function does. First, notice that given a finite trace prefix, the context that emulates this trace prefix defines a set of helper functions per trace label. The index of the corresponding trace label appears in the identifier of a helper function (for example, mimicMemory_1() and saveSnapshot_0()).

To explain the helper functions, we follow the body of send_rcv(iob_ptr) line by line. In the beginning, the call to readAndIncrementTraceIdx keeps track of the current position in the trace. This knowledge of the current position in the trace is not used in our toy example, but it would be used if the API function (send_rcv in this case) were called at more than one trace position; at each trace position, we would use this knowledge to call the corresponding helper functions (e.g., mimicMemory_3() instead of mimicMemory_1(), which would carbon-copy to the shared memory the values that appear in trace position 3 instead of 1).

Next, on Line 8, we store the pointer iob_ptr in a global variable in the HelperBackTranslation module (as explained above). The reason we need to store it in a global variable is that we may need to use it in a future trace position, not just in the current call to send_rcv.
For the same reason (the potential future use), we save (on Line 9) a snapshot of the whole shared memory in global variables snapshot_0_σ to snapshot_0_σ+511.

Next, on Lines 10 to 12, the actual emulation of the trace action at trace position 1 is done. In our toy example, doAllocations_1() would do nothing because nalloc in Example 4 does not change. Then, mimicMemory_1() writes all the values (the zeros) to the shared memory before send_rcv eventually returns, transferring control from the emulating source context back to the program of interest.

The "vertical gap" challenge and how the TrICL relation solves it: Notice two main differences between the internal behavior of the emulating source context (that we explained above) and the behavior of the target context:
(i) the specific sequence of updates to the shared memory (before returning to the program of interest), and
(ii) the sequence of internal function calls (before returning to the program of interest).
These two differences mean that, respectively, the memory and the trusted call stack do not remain in sync (between the emulating source context and the given target context). We call this lack of sync between the execution states of the source and target contexts the vertical gap challenge.

The vertical gap is a challenge we face when attempting to define a cross-language relation between the given target execution and its emulating source execution. Notice that by defining the traces to capture only the observable behavior (and not the internal execution) of the program/context, we have created this vertical gap. However, the reason we defined the traces this way is that we want to allow more freedom in what the back-translation function is, rather than restricting this function to be the "inverse-compilation" function (because such a restriction would propagate assumptions that weaken our threat model, assumptions that would require a target context to be only a context resulting from compilation).

Having acknowledged that the vertical gap is an inherent challenge of our threat model and of the nature of our languages (languages that, unlike prior work [16, 17, 18, 22, 23], allow fine-grained memory sharing), we designed the TrICL relation, which solves the vertical gap by relying on an alternating strong/weak similarity, and by introducing a mediator execution.

The TrICL relation is defined in terms of strong (≈) and weak (∼) similarities, and in terms of the relation for whole-program compiler-correctness (≅_p). That TrICL reuses the compiler-correctness relation (≅_p) is one of the main advantages of TrICL.

Definition 3 (Trace-Indexed Cross-Language (TrICL) alternating simulation relation).

  TrICL(s_emu, s_med, s_given, ς)_{α,i,p} ≝
       s_emu ≅_p s_med
    ∧  emulate_invariants(s_emu)_{α,i,p}
    ∧  (α(i) ∈ •! ⟹ (s_med, ς) ≈_⟦p⟧ (s_given, ς))
    ∧  (α(i) ∈ •? ⟹ (s_med, ς) ∼_⟦p⟧ (s_given, ς))

The TrICL relation helps us prove Lemma 5 because we can rely on the fact that TrICL satisfies the following alternating backward-simulation condition:

Lemma 6 (TrICL step-wise alternating backward-simulation).

     α ∈ Alt  ∧  TrICL(s_emu, s_med, s_given, ς)_{α,i,p}
  ∧  (s_given, ς) −α(i)⇀_⟦p⟧ (s′_given, ς′)
  ⟹ ∃ s′_emu, s′_med.
       (s_emu, ς) −α(i)⇀_p (s′_emu, ς′)
    ∧  (s_med, ς) −α(i)⇀_⟦p⟧ (s′_med, ς′)
    ∧  TrICL(s′_emu, s′_med, s′_given, ς′)_{α,i+1,p}

Observe that the
TrICL relation is a ternary relation between two target states and a source state (the auxiliary information ς is considered part of the three trace states, hence we do not count it in the arity, and only count the three trace states):
1) the source state, which comes from the execution of C_emu[p], which is the program of interest p linked with the emulating source context C_emu (Listing IV.1 is an example C_emu),
2) the first target state, namely, the mediator state, which comes from the execution of ⟦C_emu⟧[⟦p⟧], which is the translation of both the program of interest and the emulating context, and
3) the second target state, namely, the given state, which comes from the execution of C_given[⟦p⟧], which is the translation of the program of interest, linked with a target context.

TrICL introduces a "mediator" execution (state s_med) between the execution of the back-translated source context (state s_emu) and the execution of the given target context (state s_given). This mediator execution enables us to conveniently re-use both the compiler backward- and forward-simulations. (In Definition 3 above, notice that independently of the decorator/parity of the label α(i), we need in Lemma 6 to re-establish the compiler correctness relation (≅_p). That is where we need compiler backward-simulation in one case and forward-simulation in the other.)

TrICL helped us solve the vertical gap because it reduced the lack-of-sync problem between s_emu and s_given to a lack of sync between s_med and s_given. The latter is easier because it can be captured using a same-language invariant, namely the strong/weak similarity relation. The key idea of the strong/weak similarity is to characterize the strong relation (≈_⟦p⟧) by equality of the whole reachable memory, and the weak relation (∼_⟦p⟧) by equality of the unreachable memory, i.e., the memory that is private to the program of interest ⟦p⟧.
Because the memory private to ⟦p⟧ is by definition untouchable by the emulating context, weak similarity is weak enough to allow the (compiled) emulating context to do an arbitrary sequence of memory updates (and internal function calls). Weak similarity is, however, just strong enough to be re-strengthened upon a successful emulation of the input (?) step. (Notice from Definition 3 that weak similarity holds when the context is executing, identified by the next label α(i) being an input (?) to the program of interest.)

Due to lack of space, we defer many proof details about the strong and weak similarity relations to the technical report. Most interestingly, we introduce there the idea of a successor-preserving isomorphism and use it to define the stack similarity relations that allow the internal function calls to differ between the stack of s_med and the stack of s_given.

We focused in Section IV-D on the proof of Lemma 2 because we think it is the most interesting (compared to Lemmas 1 and 3). Nevertheless, it is worth mentioning that for the proof of Lemma 1, we actually reuse the idea of the strong and weak similarity relations that we just introduced above. Also, for Lemma 3, we use a construction similar to the back-translation that we introduced in Listing IV.1.

V. IMPLEMENTING AUTOMATIC MODULE-BASED ISOLATION FOR C USING THE CHERI INFRASTRUCTURE
The CHERI compiler already implements the PAC principle, but it does not yet implement a module-based isolation scheme like our ImpMod does. This section presents how to implement such a scheme as a source-to-source C compiler that annotates C programs with compartmentalization directives that, in turn, the CHERI system (including the CHERI Clang/LLVM) [4] understands and implements. This provides a way to automatically enforce our module-based isolation for real-world C programs. The implementation of this source-to-source compiler relied on extensions to libcheri that we detail in Section V-C.

A. libcheri

In order to present the source-to-source compiler, we first briefly explain libcheri [24], CheriBSD's programmer-friendly interface to the CHERI compartmentalization features. Under libcheri, the isolated, compartmentalized parts of a program are called sandboxes, and libcheri is the API for loading and invoking sandboxes.
CHERI's in-process sandboxes are objects which are represented in CHERI assembly using invokable object capabilities [24]. The creation of these objects is the responsibility of the programmer, who groups functions into classes and creates an invokable object capability of the class type. The programmer does that by annotating function declarations as cheri_ccall so that conventional function calls are replaced by object capability invocations. Functions that are exported by the current sandbox should be annotated with the cheri_ccallee attribute. At run time, libcheri loads the executable images from the file system and creates the respective object capabilities. It is critical that, when the sandbox loading routines are called, the program is in its initial state, where it has control over its entire address space as well as file system access. Finally, constructor functions sandboxes_init must be added to the modules that have call sites to external functions. These constructor functions essentially perform a second linking phase to correctly initialize the references that modules hold to each other's exported functions.

Listing V.1 below shows how these annotations are added to (a subset of) the code snippet of Listing II.1.

struct cheri_object networking;
int iobuffer[512];
int secret;

__attribute__((constructor))
static void sandboxes_init(void) {
    networking = fetch_object("networking");
}

__attribute__((cheri_ccall))
__attribute__((cheri_method_class(networking)))
int send_rcv(int *);

__attribute__((cheri_ccallee))
__attribute__((cheri_method_class(main)))
int main(void);

__attribute__((cheri_ccallee))
__attribute__((cheri_method_class(main)))
int read_secret(void);

Listing V.1: Adding CHERI annotations to existing code.

B. Source-to-source transformations
Our compilation scheme maps each C module to a separate sandbox, i.e., it assigns a CHERI class and creates a CHERI object for each module. To do this, our compiler creates a mapping from function identifiers to C modules, which helps resolve dependencies in the next step. Then it traverses each module's AST and annotates every external function declaration it encounters as either cheri_ccall or cheri_ccallee, depending on whether the function is defined in the current translation unit or is exported (more examples are available in the Appendix, Listings A.1 to A.4). As a performance optimization over the ImpMod model, intra-module function calls do not translate to object capability invocations. Instead, they are ordinary MIPS function calls. This change has no security implications since the module is the unit of trust. The initialization of sandboxes required some extensions to libcheri, which we now describe.

C. libcheri extensions

One hurdle in using libcheri for our compilation scheme is initialization. libcheri exports two functions for sandbox creation: sandbox_class_new, which instantiates the class, and sandbox_object_new, which creates an object of a certain class. As stated in Section V-A, the calls to these two functions need to take place when the program is at its initial state with control over the entire address space. We implement this by annotating the initialization functions for all of the potentially required sandboxes as constructor functions, which are invoked at the very beginning, before executing the main function.

Concretely, we extend libcheri with a new load/initialization function, sandbox_chain_load(). This function is meant to be called only once by an initialization module, which is the only privileged part of the program (and hence can invoke system calls and create sandboxes). The initialization function sandbox_chain_load() loads the main sandbox, the module implementing the main() function, from the file system, and also any modules that main depends on (recursively).
It also creates relevant object capabilities for every sandbox and places them at the beginning of the sandbox's data segment. As a result, every sandbox has access to the object capabilities necessary to invoke exported functions from other sandboxes. Extending libcheri required considerable additions to the libcheri code base, including the definition of sandbox_chain_load(), new versions of sandbox creation routines that support sandbox dependencies, and low-level macros that expose relevant sandbox metadata to C.

VI. EXPERIMENTAL EVALUATION
Evaluating our module-based isolation scheme and its proof-of-concept implementation is a matter of application support and performance. In terms of support, it is important to establish the viability of building and executing real-world C applications under our scheme (Section VI-A), while performance measurements are necessary in order to reveal the overhead that our scheme introduces (Section VI-B).
A. Manual code changes
We ported four large open-source C libraries, carefully chosen as a heterogeneous set of examples: zlib [25], LibYAML [26], GNU-barcode [27], and libpng [28]. These programs can be built using CHERI's PAC compiler, which only translates pointers to capabilities, but the situation is trickier after we apply our source-to-source transformations. In particular, there are additional constraints on C code when using sandboxes in CHERI, and there is no guarantee that the code will adhere to them; as a result, we encountered a number of incompatible programming patterns.

TABLE I: Incompatible programming patterns

  Pattern             zlib  LibYAML  GNU-barcode  libpng
  Passed local ref       2       15            0      13
  Extern global var      3        0            0       0
  Pointer to ext fun     2        1           26       2
  Other                  0        1            1       4
  Total                  7       17           27      19
TABLE II: Porting overhead

  Metric          zlib  LibYAML  GNU-barcode  libpng
  Lines of code  11255    12762         4657   33029
  Altered lines    130      114          164      51
  Percentage      1.15     0.89          3.5    0.15
Table I summarizes the frequency of occurrence of each type of incompatible pattern that we found in each of these example libraries. Table II summarizes the amount of code change that we needed to make (manually) so that we could compile and successfully run the benchmarks for each of these example libraries. (The measurements come from the UNIX utility diff, where modifying a line counts as two lines modified: one for deleting the original and one for adding its replacement.) Overall, Tables I and II let us conclude that the amount of code change we had to make is rather small: the ratio was very low, with zlib, LibYAML, and libpng at a remarkable 1.15%, 0.89%, and 0.15% respectively, and GNU-barcode at a slightly higher 3.5%. We discuss the three most common, and interesting, incompatible patterns (in row order).

a) Pointers to local variables passed as arguments to an external function: CHERI prohibits passing local capabilities when invoking object capabilities, and this pattern will trigger the violation. The solution is simple, as a semantically equivalent result can be obtained by moving the guilty variables to the heap, either as a global variable or dynamically allocated. This is a fundamental limitation of CHERI and is reflected in our theoretical model. More on the issue of temporal safety on capability machines can be found in recent work [6, 7, 8].

b) External global variables: Our compiler supports global variables that are used in the same translation unit they are defined in, regardless of their qualifiers. It does not, however, support the declaration or usage of a global variable that is defined in a different translation unit. Instead, the importing translation unit may use a pointer to that variable, where this pointer is explicitly imported at run time (ideally during an initialization phase by calling a getter function). Full support would require additional work on the CHERI linker.

c) External function pointers: In CHERI, a C function pointer is always compiled to an executable capability. This can be problematic under our scheme in the case where the pointee function belongs to a different module. In the easy case where we can determine the pointee statically, we can do away with the function pointer altogether and replace it with the pointee. If, on the other hand, the decision of which callback function to provide is made by the provider only at runtime, then a major refactoring is needed. This refactoring modifies both the callback provider and the callback consumers. If the callback consumer is, however, an arbitrary client library, then our source-to-source transformation alone will not be able to handle this without forcing the programmer to rewrite the API (giving up function pointers) and consequently to change every client library (i.e., every use of the API).
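The two most mechanical of these patterns, (a) and the statically-resolvable case of (c), can be illustrated with small before/after sketches. The code below is ours; send_rcv borrows its name from Listing II.1 but its body here is a plain-C stub, and the cheri_ccall attribute machinery of Listing V.1 is omitted.

```c
#include <stdlib.h>

/* Stub standing in for an external, cross-sandbox API function. */
int send_rcv(int *buf) { return buf[0]; }

/* Pattern (a): a pointer to a local (stack) variable passed to an external
 * function. Plain C accepts this, but under our scheme the stack-derived
 * capability crossing an object-capability invocation triggers a CHERI
 * violation at run time. */
int incompatible_local(void) {
    int buf[4] = {0};
    return send_rcv(buf);          /* local capability crosses the boundary */
}

/* Fix: move the "guilty" variable to the heap (a global works too). */
int fixed_heap(void) {
    int *buf = calloc(4, sizeof *buf);
    if (!buf) return -1;
    int r = send_rcv(buf);
    free(buf);
    return r;
}

/* Pattern (c): a function pointer whose pointee is an external function.
 * When the pointee is statically known, drop the pointer for a direct
 * (annotated) call. */
int incompatible_ptr(int *heap_buf) {
    int (*cb)(int *) = send_rcv;   /* executable capability to another module */
    return cb(heap_buf);
}
int fixed_direct(int *heap_buf) {
    return send_rcv(heap_buf);     /* direct cross-module call */
}
```

The runtime-chosen-callback case of (c) has no such local rewrite, which is why it requires the API refactoring described above.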
B. Performance
To measure the performance of our system, we carried outbenchmarks on the software we ported to our scheme. Wecompared the secure, compartmentalized version of portedsoftware with one running in a single sandbox. This isbecause unsandboxed execution has a significantly alteredexecuting environment w.r.t when sandboxes are involved, andthat heavily affects performance. The memory allocator is atelling example, as the one used for sandboxed execution isless memory efficient but much faster. So a single-sandboxexecution environment “evens up” the playing field yield-ing measurements of the actual overhead. The benchmarkswere performed on a
CHERI virtual machine (implementing
CHERI
ISA version 5 [29]) running our modified version ofCheriBSD.Before presenting the results, it is important to brieflydiscuss the main factor that affects performance, namelyobject capability invocation. As mentioned in Section V, ourcompiler only affects declarations of external functions whileC statements remain unaltered. Consequently the transformedexecutable should differ from the original only on the invoca-tion method of external functions. The rest of the instructions,representing the parts of the compiled program that are notrelated to object invocation, are identical.The performance overhead of object capability invocationhas been covered in earlier literature [4], measuring at about500 machine cycles per invocation. This cost includes user-space clearing of unused argument registers, saving and restor-ing registers and kernel-space argument validation. Variousoptimization techniques are being explored [3] and partiallyimplemented, as the latest published version of the
CHERI specification [11] supports non-exception-based domain tran-sitions. However, the performance gains are either specula-tive [3] or otherwise unexplored.What this means is that performance boils down to thenumber of external function calls in relation to the rest ofthe program’s statements: A higher proportional number of external function calls would translate to a higher number ofobject capability invocations and a more pronounced delay.This is, of course, a characteristic of individual software andnot of our compiler, thus begging the question of whether ornot the number of external function calls in real-world softwareis sparse enough to guarantee an acceptable performanceoverhead when compiling it under our scheme.The benchmark results are summarized in Figure 3. Notethat
GNU-barcode is designed to process bar codes of limitedsize so there was no point in running benchmarks for it. Eachof the three remaining software demonstrate a distinct case ofoverhead while all of them confirm that the number of sandboxinvocations is by far the biggest delay factor. There is also asmall, constant delay of about 0.2 seconds associated with thesandbox loading routines.The first benchmark was that of zlib found in Figure 3a.The overall number of sandbox invocations was very lowthroughout, irrespective of the size of the input data, rangingfrom 43 to 441. This is reflected in the figure as the delay isconsistently minor.The case of
LibYAML proved very different. As it turns out, the number of sandbox invocations increased linearly with the size of the input data. Although the sandbox invocations started relatively low at 1,482, they reached up to 1,523,546. This is reflected in Figure 3b, where the execution time difference between the two versions increases with each iteration, while the relative delay stays consistently between 35% and 40%. The final case, libpng, was that of a program with a relatively high number of sandbox invocations (about 125k) that remains more or less constant irrespective of the input data. This translates to a perceptible, constant delay throughout, as one can see in Figure 3c. Our benchmarks confirm that the number of sandbox invocations is the most significant overhead factor. Moreover, each sandbox invocation introduces a constant amount of delay (see Figure 4). In this figure, the x-axis is the number of sandbox invocations in logarithmic scale and the y-axis is in seconds. We plot the absolute execution time difference between the two versions for each of the three test cases. For zlib, the plot is an almost flat line, as the number of sandbox invocations increases very slightly. The plot for libpng is basically a point, as neither the sandbox invocations nor the delay changes in any perceptible way. The most interesting graph is that for LibYAML, where the delay increases in a perfectly linear manner relative to the increase in sandbox invocations (this is not easy to see, as the x-axis is logarithmic). As stated earlier, whether the overhead is acceptable for a particular piece of software boils down to the number of external function calls (which are eventually compiled to sandbox invocations) relative to the rest of the code. In general, we cannot make any assumptions about that metric for existing C software, so the question of viability of a certain piece of software for our scheme can only be answered on a per-case basis.
Fig. 3: Benchmark results for zlib, LibYAML and libpng respectively. Blue lines represent the secure, compartmentalized versions while red lines represent a single-sandbox execution. All x-axes are in log scale: (a) zlib, payload size (MB); (b) LibYAML, payload size (MB); (c) libpng, image height (rows); y-axes show time (s).

Fig. 4: Execution time difference relative to the number of sandbox invocations.

VII. LIMITATIONS OF THE MODEL AND FUTURE WORK
While we believe our work is a significant step forward in the understanding of the security properties of PAC compilers, it still makes some simplifications and assumptions that would be interesting to remove in future work.
Treatment of memory management:
Memory allocation in our model is oversimplified and does not support de-allocation. This simplification allows us to represent the state of the memory allocator as just the next-free-address, which is essential in keeping our model manageable. To the best of our knowledge, nobody has yet developed fully abstract trace semantics for languages with a realistic model of deallocation. In practice, this means that our formal results do not rule out that compiled programs become distinguishable (i.e., leak information) through the way they manage memory.
Other side-channels:
Programs can leak information to their context through a whole variety of other software-exploitable side-channels. A context can measure the execution time of a program, or determine the memory locations that the program has accessed through cache-based side-channels [30]. We do not model these channels. As a consequence, our compiler is also not guaranteed to preserve resistance against such attacks [31]. There is, however, recent related work that specifically investigates how to secure compilers such that they preserve side-channel resistance [32, 33, 34].
Possibly redundant load-time checks:
Our compiler relies on the linker/loader to perform some checks on machine code before it can be linked. For instance, to avoid confusion between data and code capabilities, the loader should check that the linked code does not access pcc directly. (In our formalization, this is modeled by the absence of a getpcc primitive.) While some of these checks (like the one above) are justifiable from a security perspective, some are just artifacts of the proof strategy [35] and do not seem essential for security. One example is that we require the context's data segment to be placed in memory after the program's data segment. There is no security motivation for this check; it just makes the proof easier: the construction of the back-translated context will occupy a data segment whose size is in principle larger (due to meta-data) than the size of the data segment of the target context being back-translated. This check ensures that this increase in size does not affect the position of the program's variables in memory. A more complex proof (or a different proof technique) might remove the need for such checks.
VIII. CONCLUSION AND RELATED WORK
We presented a proof of full abstraction of the compiler
ImpMod to CHERIExpress, which we call a "pointers-as-capabilities" compiler. The compiler that we model should be understood as a specification for compilers that target architectures like
CHERI. In addition, we have provided evidence for the feasibility of implementing this specification by reporting on a proof-of-concept implementation. Our paper is the first to formally show that
PAC compilers satisfy a strong full abstraction property. But there is a rich body of related work studying security proofs for compilers, and compilation to capability machines. We briefly summarize related work and its relation to our results.

A. Compilation to capability machines
The closest work to ours is
StkTokens by Skorstengaard et al. [8]. They also model and verify a compiler transformation targeting capability architectures. But there are broadly two differences to highlight between their model and ours. First, they model and verify a compiler transformation that implements a calling convention, rather than a
PAC transformation. Second, their compiler targets a linear capability machine, meaning one that supports a special form of capabilities which is subject to a certain non-re-usability check that they assume is supported in hardware. Our model does not assume this particular hardware support. The calling convention that they implement relies carefully on linear capabilities. Instead of
CHERIExpress's stack-per-component design (for data stacks), the use of linear capabilities enables sharing the same unified control and data stack among all components, including mutually distrustful ones: the (untrusted) callee is handed a linear capability on the stack. Then, after control returns, by convention, the caller checks for the existence of the same capability, thereby detecting any (malicious or buggy) retention of the stack access permission. In contrast, in
CHERIExpress, which lacks linear capabilities, a trusted control stack is assumed to be maintained in kernel space, thereby eliminating even the need to hand out a capability on this stack to any user-space program; hence, no retention of such a stack capability by a malicious component is possible at all. For storing the data part of the call frames, every component is allocated its own data stack. The assumption that the kernel is responsible for the control stack is compatible with the current
CHERI
ISA [11] specification.

B. Trace equivalence
Laird [21] gives a fully-abstract trace semantics for a functional language with general references. Our trace labels for
CHERIExpress are inspired by this work. Laird relies on a bipartite
LTS in which nodes are partitioned between program-configurations and environment-configurations (we use the word "context" instead of environment). We, however, do not build into
CHERIExpress this explicit segregation between program and environment configurations. Instead, we perform a simple check on the status of the capability registers before and after a step in order to decide the direction of a trace action. Relying on this check, Fact 1 ensures that programs interact with the context in precisely the desired alternating fashion that a bipartition would have ensured. The fact that we chose to use trace equivalence at all (to characterize contextual equivalence) is inspired by existing work that uses it in proofs of full abstraction of compilation to Protected Module Architectures (PMAs) and tagged architectures [16, 17, 18, 22, 36]. The current
CHERI
ISA also includes an experimental design with support for linear capabilities.

C. FA Proofs that do not use trace equivalence
Back-translating traces to enable an easy cross-language simulation relation is not the only strategy that has been used in proofs of fully-abstract compilation (surveyed by Patrignani et al. [13]). Several other proof ideas exist:

• While Fournet et al. [37] also use labeled transitions, their full abstraction proof of the translation of an ML-like language to Javascript is based on bisimulation. We, on the other hand, require our TrICL (Trace-Indexed Cross-Language) relation to only satisfy the coarser simulation condition.

• In their
StkTokens work, Skorstengaard et al. [8] use another piece of machinery that does not require labeled transitions at all: a logical relation with recursive worlds.

• New et al. [38] rely on a shallow embedding of the target language (a λ-calculus with exceptions) into the source (a λ-calculus with recursive types) at a dynamic type, and use a cross-language relation that defines the contextual equivalence relation in multi-language terms.

• Devriese et al. [39] coin the term approximate back-translation to express the idea that the context that the back-translation produces only needs to behave similarly to the original (and unknown) context for a limited number of steps, namely, the number of steps in which it (the original context) terminates. They use step indexing to represent this approximate behavior in the cross-language logical relation.

When the target language is a subset of the source, as in the work by Crary [40] on fully-abstract module compilation, full abstraction of the translation is a corollary of contextual equivalence between the input term and the output term of the translation.
D. Other secure compilation criteria
Abadi [10] was the first to promote full abstraction as a criterion for compiler-enforced protection. Abadi motivated the usefulness of full abstraction as a tool for understanding the design of secure implementations of programming languages by giving examples of security violations introduced by otherwise-functionally-correct implementations, and then explaining how those insecure implementations are not fully abstract. Other criteria for assessing the security of a compilation scheme involve proving what class of hyperproperties a compiler can preserve robustly, i.e., when its code is linked against arbitrary target code, as we do [23, 41, 42, 43]. For example, if a compiler needs to preserve noninterference-like properties, it suffices to prove that it preserves hypersafety properties (noninterference being in the class of hypersafety). Or, if a compiler needs to preserve integrity of compiled code, it suffices to prove that it preserves safety properties (integrity being in the class of safety). Unlike FA, most of these criteria require reasoning about the trace behaviour of a partial program. Since we have devised traces for our languages, we believe we can explore which of these criteria is applicable to our work as well. Another active line of work [32, 33, 34] is concerned with making sure that the compiler does not undo countermeasures that the programmer of cryptographic libraries implements in order to ensure protection against timing attacks or other secret-revealing attacks. In this line of work, the guarantees that are sufficient to capture the desired security intuition are believed to be weaker but more domain-specific than the full abstraction guarantee.

APPENDIX
A. Semantics of
ImpMod expressions
In Figure 5, notice that rules Eval-arr-offset and Eval-var are just de-sugaring rules. Also notice that, for simplicity, expressions in
ImpMod do not have side effects (the evaluation context Fd, MVar, β, ∆, Σ, Mem, Φ, pc that we introduce shortly is actually read-only). The evaluation context for expressions consists of:
1) syntactic information about the program, given by the function definitions Fd, the declarations of module-global variables MVar, and the layout and bounds β of all the program's variables;
2) load-time information about the program, given by the per-module data-segment location ∆ and the per-module local-stack location Σ; and
3) execution-state information, given by the memory Mem, the stack pointers Φ of the module-local stacks, and the program counter pc.
B. The TrICL proof technique
The
TrICL relation is indexed by the given trace (α) from Lemma 5, and by a position i on this trace. The goal is to prove that the TrICL relation satisfies the step-wise backward simulation condition (Lemma 6). But first, we recall that
TrICL is defined in terms of four main relations/invariants:
1) the vanilla (whole-program) compiler-correctness relation (≅_p) between the source state and the mediator state, satisfying lifted forward and backward simulations (Lemmas 7 and 8);
2) a strong-similarity relation (≈_⟦p⟧) between the mediator state and the given state, a relation that satisfies lock-step simulation (Lemma 9);
3) a weak-similarity relation (∼_⟦p⟧), also between the mediator state and the given state, a relation that satisfies option simulation (Lemma 11) and, together with the strong similarity, satisfies both weakening (Lemma 10) and strengthening (Lemma 12);
4) emulation invariants about the source state, satisfying both adequacy (Lemma 13) and preservation by trace steps (Lemma 14).

Notation:
We write s ⇀α⇀_p s′ to denote that s is a state of the program of interest p (linked with some context), and that α is a compressed trace prefix (i.e., with uninformative (τ) labels dropped) that is emitted by the (multiple-step) execution of s until s′. We use the same notation for states of the target language, where the step relation is indexed by the compiled program ⟦p⟧.

Lemma 7 (Compiler forward-simulation lifted to compressed trace steps). s_s ≅_p s_t ∧ (s_s, ς) ⇀α⇀_p (s′_s, ς′) ⟹ ∃ s′_t. (s_t, ς) ⇀α⇀_⟦p⟧ (s′_t, ς′) ∧ s′_s ≅_p s′_t

Lemma 8 (Compiler backward-simulation lifted to compressed trace steps). s_s ≅_p s_t ∧ (s_t, ς) ⇀α⇀_⟦p⟧ (s′_t, ς′) ⟹ ∃ s′_s. (s_s, ς) ⇀α⇀_p (s′_s, ς′) ∧ s′_s ≅_p s′_t

Fig. 5: Evaluation of expressions e in ImpMod. The evaluation relation ⇓ ⊆ e × V is defined on pairs of expressions e and values V, and is indexed with an evaluation context Fd, MVar, β, ∆, Σ, Mem, Φ, pc; the context is used in rules Eval-amp-local-var, Eval-amp-module-var and Eval-star. Instead of writing Fd, MVar, β, ∆, Σ, Mem, Φ, pc ⊢ e ⇓ v, we abbreviate it as e ⇓ v. (Rules Eval-binop, Eval-arr-offset, Eval-var, Eval-const, Eval-start, Eval-end, Eval-offset, Eval-capType, Eval-limRange, Eval-amp-local-var, Eval-amp-module-var, Eval-amp-arr and Eval-star omitted.)

Fig. 6: (Excerpt) Small-step semantics of commands Cmd in ImpMod. The small-step relation → ⊆ s × s is defined on pairs of execution states s, and is indexed with an evaluation context Fd, MVar, β, ∆, Σ and an allocation limit ∇; the allocation limit ∇ is used in rule Allocate. (Rules Assign-to-var-or-arr and Allocate omitted.)

Fig. 7: (Excerpt) Evaluation of expressions e in CHERIExpress. The evaluation relation ⇓ ⊆ e × V is defined on pairs of expressions e and values V, and is indexed with an evaluation context Mem, ddc, stc, pcc, which is part of an execution state. (Rules Eval-ddc, Eval-deref and Eval-inc omitted.)

Lemma 9 (Lock-step simulation of strong similarity).
(s₁, ς) ≈_⟦p⟧ (s₂, ς) ∧ (s₁, ς) −τ⇀_⟦p⟧ (s₁′, ς) ⟹ ∃ s₂′. (s₂, ς) −τ⇀_⟦p⟧ (s₂′, ς) ∧ (s₁′, ς) ≈_⟦p⟧ (s₂′, ς)

Lemma 10 (Strong similarity is weakened by an output action). λ ∈ •! ∧ (s₁, ς) ≈_⟦p⟧ (s₂, ς) ∧ (s₁, ς) −λ⇀_⟦p⟧ (s₁′, ς′) ⟹ ∃ s₂′. (s₂, ς) −λ⇀_⟦p⟧ (s₂′, ς′) ∧ (s₁′, ς′) ∼_⟦p⟧ (s₂′, ς′)

Lemma 11 (Option simulation of weak similarity). (s₁, ς) ∼_⟦p⟧ (s₂, ς) ∧ (s₁, ς) −τ⇀_⟦p⟧ (s₁′, ς) ⟹ (s₁′, ς) ∼_⟦p⟧ (s₂, ς)

Lemma 12 (Weak similarity is strengthened by aligned input actions). λ ∈ •? ∧ (s₁, ς) ∼_⟦p⟧ (s₂, ς) ∧ (s₁, ς) −λ⇀_⟦p⟧ (s₁′, ς′) ∧ (s₂, ς) −λ⇀_⟦p⟧ (s₂′, ς′) ⟹ (s₁′, ς′) ≈_⟦p⟧ (s₂′, ς′)

Lemma 13 (Adequacy of emulate_invariants). emulate_invariants(s)_{α,i,p} ∧ α(i) ∈ •? ∪ {✓} ⟹ ∃ s′. (s, _) ⇀α(i)⇀_p (s′, _)

Lemma 14 (Preservation of emulate_invariants). emulate_invariants(s)_{α,i,p} ∧ (s, _) ⇀α(i)⇀_p (s′, _) ⟹ emulate_invariants(s′)_{α,i+1,p}

Using the lemmas above, one can prove that
TrICL satisfies step-wise backward simulation (Lemma 6).
C. Output of the source-to-source transformation

struct cheri_object main_obj;
static struct sandbox_object *main_objectp;

__attribute__((cheri_ccall))
__attribute__((cheri_method_suffix("_cap")))
__attribute__((cheri_method_class(main_obj)))
extern int main(int argc, char *argv[]);

int init(int argc, char *argv[]) {
    sandbox_chain_load("main", &main_objectp);
    main_obj = sandbox_object_getobject(main_objectp);
    main(argc, argv);
}

Listing A.1: Source-to-source compilation output. Initialization module init.c

struct cheri_object lib1;
struct cheri_object lib2;

__attribute__((cheri_ccall))
__attribute__((cheri_method_class(lib1)))
int f1(void);

__attribute__((cheri_ccall))
__attribute__((cheri_method_class(lib2)))
int f2(void);

__attribute__((cheri_ccallee))
__attribute__((cheri_method_class(main_obj)))
int main(void);

__attribute__((constructor))
static void sandboxes_init(void) {
    lib2 = fetch_object("lib2");
    lib1 = fetch_object("lib1");
}

int main(void) {
    f1();
    f2();
    return 0;
}

Listing A.2: Source-to-source compilation output. Transformed main.c

extern struct cheri_object lib1;
struct cheri_object lib2;

__attribute__((cheri_ccallee))
__attribute__((cheri_method_class(lib1)))
int f1(void);

__attribute__((cheri_ccall))
__attribute__((cheri_method_class(lib2)))
int f2(void);

__attribute__((constructor))
static void sandboxes_init(void) {
    lib2 = fetch_object("lib2");
}

int f1(void) {
    f2();
}

Listing A.3: Source-to-source compilation output. Transformed lib1.c

extern struct cheri_object lib2;

__attribute__((cheri_ccallee))
__attribute__((cheri_method_class(lib2)))
int f2(void);

int f2(void) {
    [..]
}

Listing A.4: Source-to-source compilation output. Transformed lib2.c
REFERENCES

[1] R. N. Watson, P. G. Neumann, and S. W. Moore, "Balancing disruption and deployability in the CHERI instruction-set architecture (ISA)," 2017.
[2] J. Woodruff, R. N. Watson, D. Chisnall, S. W. Moore, J. Anderson, B. Davis, B. Laurie, P. G. Neumann, R. Norton, and M. Roe, "The CHERI Capability Model: Revisiting RISC in an Age of Risk," SIGARCH Comput. Archit. News, vol. 42, no. 3, pp. 457–468, Jun. 2014. [Online]. Available: http://doi.acm.org/10.1145/2678373.2665740
[3] R. N. M. Watson, R. M. Norton, J. Woodruff, S. W. Moore, P. G. Neumann, J. Anderson, D. Chisnall, B. Davis, B. Laurie, M. Roe, N. H. Dave, K. Gudka, A. Joannou, A. T. Markettos, E. Maste, S. J. Murdoch, C. Rothwell, S. D. Son, and M. Vadera, "Fast protection-domain crossing in the CHERI capability-system architecture," IEEE Micro, vol. 36, no. 5, pp. 38–49, Sept. 2016.
[4] R. N. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson, D. Chisnall, N. Dave, B. Davis, K. Gudka, B. Laurie et al., "CHERI: A hybrid capability-system architecture for scalable software compartmentalization," in Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, 2015, pp. 20–37.
[5] D. Chisnall, C. Rothwell, R. N. Watson, J. Woodruff, M. Vadera, S. W. Moore, M. Roe, B. Davis, and P. G. Neumann, "Beyond the PDP-11: Architectural support for a memory-safe C abstract machine," in ACM SIGPLAN Notices, vol. 50, no. 4. ACM, 2015, pp. 117–130.
[6] S. Tsampas, D. Devriese, and F. Piessens, "Temporal safety for stack allocated memory on capability machines," 2019. [Online]. Available: https://lirias.kuleuven.be/retrieve/538854
[7] L. Skorstengaard, D. Devriese, and L. Birkedal, "Reasoning about a machine with local capabilities," in European Symposium on Programming. Springer, 2018, pp. 475–501.
[8] ——, "StkTokens: Enforcing well-bracketed control flow and stack encapsulation using linear capabilities," Proc. ACM Program. Lang., vol. 3, no. POPL, pp. 19:1–19:28, Jan. 2019. [Online]. Available: http://doi.acm.org/10.1145/3290332
[9] D. Chisnall, B. Davis, K. Gudka, D. Brazdil, A. Joannou, J. Woodruff, A. T. Markettos, J. E. Maste, R. Norton, S. Son, M. Roe, S. W. Moore, P. G. Neumann, B. Laurie, and R. N. Watson, "CHERI JNI: Sinking the Java Security Model into the C," in International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 2017, pp. 569–583.
[10] M. Abadi, "Protection in programming-language translations," in International Colloquium on Automata, Languages, and Programming
Machine Intelligence, vol. 7, no. 3, pp. 51–70, 1972.
[13] M. Patrignani, A. Ahmed, and D. Clarke, "Formal approaches to secure compilation: A survey of fully abstract compilation and related work," ACM Comput. Surv., vol. 51, no. 6, pp. 125:1–125:36, Feb. 2019. [Online]. Available: http://doi.acm.org/10.1145/3280984
[14] X. Leroy, "Formal verification of a realistic compiler," Communications of the ACM, vol. 52, no. 7, pp. 107–115, 2009.
[15] D. Patterson and A. Ahmed, "The next 700 compiler correctness theorems (functional pearl)," 2019.
[16] P. Agten, R. Strackx, B. Jacobs, and F. Piessens, "Secure compilation to modern processors," in CSF '12. IEEE, 2012, pp. 171–185. [Online]. Available: http://dx.doi.org/10.1109/CSF.2012.12
[17] M. Patrignani, D. Devriese, and F. Piessens, "On Modular and Fully-Abstract Compilation," in Proceedings of the 29th IEEE Computer Security Foundations Symposium, CSF 2016, Lisbon, Portugal, ser. CSF 2016, 2016.
[18] M. Patrignani, P. Agten, R. Strackx, B. Jacobs, D. Clarke, and F. Piessens, "Secure compilation to protected module architectures," ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 37, no. 2, p. 6, 2015.
[19] Y. Juglaret and C. Hritcu, "Secure compilation using micro-policies," 2015.
[20] R. Milner, Communicating and mobile systems: the pi calculus. Cambridge University Press, 1999.
[21] J. Laird, A fully abstract trace semantics for general references, ser. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer Verlag, 7 2007, pp. 667–679.
[22] Y. Juglaret, C. Hriţcu, A. Azevedo de Amorim, and B. C. Pierce, "Beyond good and evil: Formalizing the security guarantees of compartmentalizing compilation," in. IEEE Computer Society Press, Jul. 2016. [Online]. Available: http://arxiv.org/abs/1602.04503
[23] C. Abate, A. Azevedo de Amorim, R. Blanco, A. N. Evans, G. Fachini, C. Hritcu, T. Laurent, B. C. Pierce, M. Stronati, and A. Tolmach, "When good components go bad: Formally secure compilation despite dynamic compromise," CoRR
J. Cryptographic Engineering, vol. 8, no. 1, pp. 1–27, 2018.
[31] V. D'Silva, M. Payer, and D. Song, "The correctness-security gap in compiler optimization," in. IEEE, 2015, pp. 73–87.
[32] G. Barthe, B. Grégoire, and V. Laporte, "Secure compilation of side-channel countermeasures: the case of cryptographic "constant-time"," in. IEEE, 2018, pp. 328–343.
[33] F. Besson, A. Dang, and T. Jensen, "Securing compilation against memory probing," in Proceedings of the 13th Workshop on Programming Languages and Analysis for Security. ACM, 2018, pp. 29–40.
[34] C. Watt, J. Renner, N. Popescu, S. Cauligi, and D. Stefan, "CT-Wasm: type-driven secure cryptography for the web ecosystem," Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, p. 77, 2019.
[35] T. C. Murray and P. C. van Oorschot, "BP: formal proofs, the fine print and side effects," in, 2018, pp. 1–10. [Online]. Available: https://doi.org/10.1109/SecDev.2018.00009
[36] M. Patrignani, "The tome of secure compilation: Fully abstract compilation to protected modules architectures," 2015.
[37] C. Fournet, N. Swamy, J. Chen, P.-E. Dagand, P.-Y. Strub, and B. Livshits, "Fully abstract compilation to Javascript," SIGPLAN Not., vol. 48, no. 1, pp. 371–384, Jan. 2013. [Online]. Available: http://doi.acm.org/10.1145/2480359.2429114
[38] M. S. New, W. J. Bowman, and A. Ahmed, "Fully abstract compilation via universal embedding," in Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ser. ICFP 2016. New York, NY, USA: ACM, 2016, pp. 103–116. [Online]. Available: http://doi.acm.org/10.1145/2951913.2951941
[39] D. Devriese, M. Patrignani, and F. Piessens, "Fully-abstract compilation by approximate back-translation," in Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20–22, 2016, 2016, pp. 164–177. [Online]. Available: http://doi.acm.org/10.1145/2837614.2837618
[40] K. Crary, "Fully abstract module compilation," Proc. ACM Program. Lang., vol. 3, no. POPL, pp. 10:1–10:29, Jan. 2019. [Online]. Available: http://doi.acm.org/10.1145/3290323
[41] C. Abate, R. Blanco, D. Garg, C. Hritcu, M. Patrignani, and J. Thibault, "Journey beyond full abstraction: Exploring robust property preservation for secure compilation," 2018.
[42] M. Patrignani and D. Garg, "Robustly safe compilation," in Programming Languages and Systems, L. Caires, Ed. Cham: Springer International Publishing, 2019, pp. 469–498.
[43] ——, "Secure Compilation and Hyperproperties Preservation," in