Compiled Obfuscation for Data Structures in Encrypted Computing
Peter T. Breuer
Hecusys LLC, Atlanta, GA, USA
[email protected]
Abstract—Encrypted computing is an emerging technology based on a processor that 'works encrypted', taking encrypted inputs to encrypted outputs while data remains in encrypted form throughout. It aims to secure user data against possible insider attacks by the operator and operating system (who do not know the user's encryption key and cannot access it in the processor). Formally 'obfuscating' compilation for encrypted computing is such that on each recompilation of the source code, machine code of the same structure is emitted for which runtime traces also all have the same structure, but each word beneath the encryption differs from nominal with maximal possible entropy across recompilations. That generates classic cryptographic semantic security for data, relative to the security of the encryption, but it guarantees only single words, and an adversary has more than that on which to base decryption attempts. This paper extends the existing integer-based technology to doubles, floats, arrays, structs and unions as data structures, covering ANSI C. A single principle drives compiler design and improves the existing security theory to quantitative results: every arithmetic instruction that writes must vary to the maximal extent possible.
Index Terms—obfuscation, compilation, encrypted computing
I. INTRODUCTION
This article examines how to make 'formally obfuscating' compilation for encrypted computing work for the complex data structures of standard programming languages such as ANSI C [1], with its long long, float, double, array, struct (record) and union data types. How to do it for 32-bit integer-only computing was established in [2] (and is recapitulated in Section IV). Integers are enough for formal purposes, but this paper bootstraps that to cover practice and heterogeneous data structures with a simple approach that reworks all the theory.

Encrypted computing means running on a processor that 'works profoundly encrypted' in user mode (in which access is always limited to certain registers and memory), taking encrypted inputs to encrypted outputs via encrypted intermediate values in registers and memory. The processor works unencrypted in the conventional way in operator mode, which has unrestricted access to all registers and memory. Since user data exists only in encrypted form, operator-level privilege gives no 'magic' access to the decrypted form of user data (the user can interpret the data – elsewhere – as they know the key). Several prototype processors that support encrypted computing at near conventional speeds already exist (see Section II).

Platform issues such as the real randomness of random numbers or power side-channel information leaks are not at question here. Keys may be installed at manufacture, as with smartcards [3], or uploaded in public view to the write-only internal store via a Diffie-Hellman circuit [4], and are not accessible via the programming instruction interface. Key management is not an issue, via a simple argument: if (a) user B's key is still loaded when user A runs, then A's programs do not run correctly because the running encryption is wrong for them, and if (b) B's key is in the machine together with B's program when A runs, then user A cannot supply appropriate encrypted inputs nor interpret the encrypted output.
The question of security user on user essentially boils down to security for user mode against operator mode as the most powerful potential adversary, and it is proved in [5] that (i) a processor that supports encrypted computing, (ii) an appropriate machine code instruction set architecture, and (iii) a compiler with an 'obfuscating' property together formally provide classic cryptographic semantic security [6] (CSS), relative to the security of the encryption, for user data against operator mode as adversary. A translation is that encrypted computing cannot in itself further compromise the encryption, and 'good' security amounts to choosing secure encryption.

The obfuscation property (iii) for the compiler simply requires it to produce code such that an adversary cannot count on 0, 1, 2 and other low values being the most common to appear (encrypted) in a program trace. That would be the case if the program were written by a human and compiled to machine code conventionally, and it would allow statistically-based dictionary attacks [7] against the encryption. The property is that no value may appear with any higher frequency than any other, both for observations of single words and for simultaneous observations at multiple points in a trace. The property is violated, for example, in implementations [8] of fully homomorphic encryptions [9], [10] (FHE), where the output of a 1-bit AND (multiplication) operation is predictably 0 with 75% probability (see Box 1a).¹

This document will use 'the operator' for operator mode. A subverted operating system is 'the operator', as is a human with administrative privileges, perhaps obtained by physically interfering with the boot process. A scenario for an attack by the operator is where cinematographic imagery is being rendered in a server farm. The computer operators have an opportunity to pirate for profit portions of the movie before release, and they may be tempted. Another scenario is the processing in a specialised facility of satellite photos of a foreign power's military installations to spot changes. If an operator (or hacked operating system) can modify the data to show no change where there has been some, then that is an option for espionage. A successful attack by the operator is one that discovers the plaintext of user data or alters it to order.

A processor starts in operator mode when it is switched on, in order to load operating system code into reserved areas of memory from disk, and conventional application software relies on the processor to change from user mode to operator mode and back for the operating system support routines (e.g., disk I/O) as required, so the operator mode of working of the processor intrinsically presents difficulties as an adversary. Nevertheless, the CSS result of [5] means the operator cannot directly or indirectly, by deterministic or stochastic means, read a word of user data beneath the encryption, even to a probability slightly above chance.

¹That 0 is a probable outcome from multiplication in a FHE E is not an extra liability, because in 1-bit arithmetic E[x] + E[x] = E[0] with certainty from any observed encrypted value E[x]. It can also be relied on that E[1] is one of the inputs in any nontrivial calculation, because 'all-zeros' as inputs propagates through to all-zeros as output via E[0] + E[0] = E[0] ∗ E[0] = E[0].

Box 1: (a) A fully homomorphic encryption (FHE) E of 1-bit data does not have the cryptographic semantic security (CSS) property:

    E[0] ∗ E[0] = E[0]    E[0] ∗ E[1] = E[0]
    E[1] ∗ E[0] = E[0]    E[1] ∗ E[1] = E[1]

Guessing 0 as outcome is right 75% of the time. (b) A FHE program that adds 2-bit data to itself:

    E[0] + E[0] = E[0]    E[1] + E[1] = E[2]
    E[2] + E[2] = E[0]    E[3] + E[3] = E[2]

has output y = 2x that is 100% even, breaking CSS.
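The 75% bias of Box 1a can be checked by exhaustive enumeration. A minimal sketch in C, modelling only the plaintexts beneath a (hypothetical) 1-bit FHE, since the bias is a property of the plaintext arithmetic, not of any particular cipher:

```c
#include <assert.h>

/* Enumerate all 1-bit multiplications (logical AND) beneath a 1-bit FHE.
 * Whatever the encryption, the plaintext outcome distribution is fixed:
 * 3 of the 4 input pairs give 0, so guessing 0 is right 75% of the time. */
static int zero_outcomes_of_1bit_mul(void) {
    int zeros = 0;
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if ((a * b) == 0)
                zeros++;
    return zeros;
}
```

An observer of the ciphertexts who knows the program is a 1-bit AND can therefore guess the encrypted output correctly three times out of four without ever touching the key.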
Nor can user data be rewritten deliberately, even stochastically on the balance of averages, to a value beneath the encryption that is independently defined, such as π, or the encryption key (see [5] again). That is a good start on answering (positively) the question of the security of encrypted computing as a whole, but it might be, for example, that an adversary can detect that an anomaly in satellite photos has been found, though they cannot tell what it is. A simple example is a one-instruction program that adds its input to itself (Box 1b). An observer would not know what the input is nor what the output is, but can be sure that the latter is twice the former. In terms of pairs x, y of input/output values, only four of sixteen are possible, making a statistical dictionary attack feasible. Ideally, a compiler for encrypted computing should produce program codes such that biases in joint frequencies of values beneath the encryption are removed. In principle, it can do that by injecting some very noisy signal of its own that swamps any existing biases.

An 'obfuscating' compiler (iii) like that is described in [11] and is proved in [5] to generate object code that varies on recompilation of the same source code but always looks the same to an adversary, the difference consisting entirely of the encrypted constants embedded in the code (which the adversary a priori cannot read, lacking the encryption key). Runtime traces also 'look the same,' with the same instructions in the same order, the same jumps and branches, reading from and writing to the same registers. But the data beneath the encryption varies arbitrarily and independently from recompilation to recompilation at each point in the trace, subject only to the constraints that a copy instruction preserves the value, and the variations introduced by the compiler are always equal where control paths join (i.e., at either end of a loop, after conditional blocks, at subroutine returns, at either end of a goto).
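The Box 1b joint-frequency leak can likewise be checked mechanically. A sketch, where addition mod 4 stands in for the 2-bit arithmetic beneath the encryption:

```c
#include <assert.h>

/* For the one-instruction program y = x + x on 2-bit data, count the
 * (x, y) pairs that can actually occur, out of the 16 conceivable ones,
 * and check that every output is even - the statistical leak of Box 1b. */
static int possible_pairs(void) {
    int count = 0;
    for (int x = 0; x < 4; x++) {
        int y = (x + x) & 3;   /* addition mod 4, beneath the encryption */
        assert(y % 2 == 0);    /* the output is always even              */
        count++;               /* exactly one y per x: 4 pairs of 16     */
    }
    return count;
}
```

Only a quarter of the conceivable input/output pairs occur, which is exactly the kind of joint bias a dictionary attack exploits and an obfuscating compiler must drown out.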
Within those constraints, compiled codes vary as much as is possible, in a way that can be quantified precisely. A new principle subsuming that is put forward here:

    Every arithmetic instruction that writes should introduce maximal possible entropy to the program trace. (eH)

as a single driver for the approach, reworking existing theory. Entropy is measured across recompilations, so what this means is that the compiler fully exercises its possibilities for varying the trace at each opportunity in a compiled program. It does not, for example, always use 1 as the increment in an addition instruction when the possibility exists of doing something different. If two addition instructions are introduced, then both vary independently across compilations. The principle (eH) allows CSS and stronger formulations of security relative to the security of the encryption to be proved (see Section IX).

The compiler of [11] implements the principle (eH) for a minimal C-like language with 32-bit signed integers beneath the encryption as the only data type. The extension of compilation to ANSI C pointers, arrays, structs (record types) and unions, arbitrarily nested, will be described in this paper. All atomic data types (int, short int, long int, long long int, signed and unsigned, float and double float) are covered. Pointers must be declared as restricted to a named area of memory (an array), which is a limitation with respect to the standard.

Encrypted 32-bit integer arithmetic will be taken as primitive. Since hardware is not the focus here, for further convenience, encrypted 64-bit integer arithmetic will also be assumed for the target platform, carried out on two encrypted 32-bit integers representing the high and low bits respectively (that can be supported in software, as an alternative). Encrypted 32-bit floating point arithmetic will also be taken as primitive, on the same rationale. It works on encrypted 32-bit integers each encoding a 32-bit float bitwise as specified in IEEE standard 754 (ISO standard 60559; see the good commentaries on the standard in [12] and [13]). Encrypted 64-bit floating point arithmetic will be taken as primitive too, working on two encrypted 32-bit integers encoding separately the high and low bits of a 64-bit double float as per the IEEE 754 standard. All these primitives are supported by at least one of the prototype processors referenced in Section II. Coincidentally, the IEEE floating point test suite at http://jhauser.us/arithmetic/TestFloat.html, consisting of 22,000 lines of C code, is one of the compilation and execution tests for our own prototype compiler, so we can be sure that encrypted floating point arithmetic in software would work if we had to resort to it, and that our test platform's implementation in hardware is correct.

This article is organised as follows. Section II introduces extant platforms for encrypted computing and discusses known elements of the theory.
Section III introduces a modified OpenRISC (http://openrisc.io) machine code instruction set for encrypted computing first described in [11], and its abstract semantics. Section IV resumes 'obfuscating' integer-based compilation as in [11]. Section V extends it to ramified basic types such as long integers and floats, Section VI deals with arrays, Section VII with 'struct' (record) types, and Section VIII with union types. The theory is developed in Section IX, quantifying the entropy in a runtime trace for code compiled according to the principle (eH) and characterising the compilation as 'best possible' with respect to that. Section X discusses the further implications for security in this context.

NOTATION
Encryption is denoted by x^E = E[x] of plaintext value x. Decryption is x = D[x^E]. The operation on the ciphertext domain corresponding to o on the plaintext domain is written [o], where x^E [o] y^E = E[x o y].

II. BACKGROUND
Several fast processors for encrypted computing are described in [14]. Those include the 32-bit KPU [15] with 128-bit AES encryption [16], which benchmarks at approximately the speed of a 433 MHz classic Pentium, and the slightly older 16-bit HEROIC [17], [18] with 2048-bit Paillier encryption [19], which runs like a 25 KHz Pentium, as well as the recently announced CryptoBlaze [20], 10× faster.

The machine code instruction set defining the programming interface is important because a conventional instruction set is insecure against powerful insiders, who may, for example, steal an (encrypted) user datum x and put it through the machine's division instruction to get x/x encrypted, an encrypted 1. Then any desired encrypted y may be constructed by repeatedly applying the machine's addition instruction. By using the instruction set's comparator instructions (testing < z, ≤ z, . . .) on an encrypted z and subtracting on branch, z may be obtained efficiently bitwise. That is a chosen instruction attack (CIA) [21]. The instruction set has to resist such attacks, but the compiler must be involved too, else there would be known plaintext attacks (KPAs) [22] based on the idea that not only do instructions like x − x predictably favour one value over others (the result there is always x − x = 0), but human programmers intrinsically use values like 0, 1 more often. The compiler's job is to even out those statistics.

A compiler must do that even for object code consisting of a single instruction. That gives the conditions on the machine code instruction design (first described in [11]) shown in Box 2: instructions must (1) execute atomically, or recent attacks such as Meltdown [23] and Spectre [24] against Intel

Box 2: Machine code conditions. Instructions . . .
• are a black box from the perspective of the programming interface, with no intermediate states; (1)
• take encrypted inputs to encrypted outputs; (2)
• are adjustable via (encrypted) embedded constants to produce any offset in (decrypted) inputs/outputs; (3)
• are such that there are no collisions between encrypted instruction constants and runtime values. (4)

might become feasible, must (2) work with encrypted values or an adversary could read them, and (3) must be adjustable via embedded encrypted constants to offset the values beneath the encryption by arbitrary deltas. The condition (4) is for the security proofs and amounts to different padding or blinding factors for encrypted program constants and runtime values. In this document (4) will be strengthened to also:

    There are no collisions between (encrypted) constants in instructions with different opcodes, or differently positioned constants where the opcode is equal. (4∗)

Padding beneath the encryption enforces that. The aim is that experiments by the adversarial operator that transplant constants from one instruction to another cannot be performed. With (4), experiments that use a runtime encrypted data value as an instruction constant, or vice versa, are ruled out. With (4∗) an adversary can modify copied instructions even less.

The salient effect of a machine code instruction set satisfying (1-4) is proved in [5] to be:

    A machine code instruction program and its runtime trace (with encrypted data) can be interpreted arbitrarily with respect to the plaintext data beneath the encryption at any point in memory and in the control graph by any observer and experimenter who does not have the key to the encryption, with the proviso that copy instructions preserve value and the delta from nominal at start and end of a loop is the same.
(∀I)

That means that picking any one point in the trace, the word beneath the encryption there varies over a 32-bit range from recompilation to recompilation with flat probability, independently of (almost) any other point in the trace. The exceptional points that are not independent are data pairs that are the inputs to and outputs from a copy instruction, and also data measured in the same register or memory location respectively at the beginning and end of a loop must have the same deltas from nominal values beneath the encryption, whatever that delta is. To keep programs working correctly the compiler has to arrange that they are the same. The proviso actually holds wherever two control paths join in the machine code, at the beginning of a loop but also at the target of any jump or conditional branch, in particular at the label of a backward-going jump and multiple entry or exit points of a subroutine.

The rationale behind (∀I) is that an arbitrary delta from the nominal value can be introduced by the compiler in one instruction and changed again in the next instruction, via the embedded instruction constants of (3), while (1-2) prevent the adversary from knowing. Note that (1) means 'no side-channels'. The compiler's job boils down to:

Box 3: What the compiler does:
(A) change only encrypted program constants, generating via (3) an obfuscation scheme of planned offsets from nominal values for instruction inputs and outputs beneath the encryption;
(B) make runtime traces look unchanged, apart from differences in the (encrypted) instruction constants and data (A);
(C) equiprobably generate all obfuscation schemes satisfying (A), (B).
    Varying the encrypted instruction constants (3) from recompilation to recompilation so that deltas from nominal in the runtime data beneath the encryption at each point in the trace are equiprobable. (EP)

The compiler strategy in [11] does that. It is subsumed by (eH) here, but [5] shows (∀I) implies (EP), which in turn implies:

    Cryptographic semantic security (CSS) holds for user data against insiders not privy to the encryption. (⋓)

I.e., encrypted computation does not compromise encryption.

How the 'equiprobable variation' is obtained by the compiler is encapsulated in Box 3: a new obfuscation scheme is generated at each recompilation. That is a planned offset delta for the data beneath the encryption in every memory and register location per point in the program control graph. Precisely, the compiler C[−] translates an expression e that is to end up in register r at runtime into machine code mc and generates a 32-bit offset ∆e for r at the point in the program where it is loaded with the result of the expression e. That is

    C[e]r = (mc, ∆e)    (5)

The offset ∆e is the obfuscation for register r at the point where the encrypted value of the expression is written to it. Let s(r) be the content of register r in state s of the processor at runtime. The machine code mc's action changes state s to a state s′ with a ciphertext in r whose plaintext value differs by ∆e from the nominal value e:

    s --mc--> s′ where s′(r) = E[e + ∆e]    (6)

Bitwise exclusive-or, or the binary operation of another mathematical group, are alternatives to addition in the e + ∆e. The encryption E is shared with the user and the processor but not the potential adversaries: the operator and operating system. The randomly generated offsets ∆e of the obfuscation scheme are known to the user, but not the processor and not the operator and operating system. The user compiles the program and sends it to the processor to be executed and needs to know the offsets on the inputs and outputs.
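A minimal plaintext model of (6) may make the mechanism concrete. All names here are illustrative, not the paper's compiler internals, and the arithmetic is carried out directly on the values beneath the encryption (mod 2^32):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model of (5)-(6): the compiler draws a fresh 32-bit offset delta_e
 * for the result of e = x + y, and embeds the compensating constant
 * k = delta_e - delta_x - delta_y in the add instruction, so at runtime
 * the result register holds e + delta_e beneath the encryption. */
static uint32_t rand32(void) { return ((uint32_t)rand() << 16) ^ (uint32_t)rand(); }

/* One 'recompilation' of e = x + y; returns the runtime register value
 * beneath the encryption and reports the planned offset via *delta_e. */
static uint32_t compile_and_run_add(uint32_t x, uint32_t y, uint32_t *delta_e) {
    uint32_t dx = rand32(), dy = rand32();   /* offsets on the inputs       */
    *delta_e = rand32();                     /* fresh offset for the result */
    uint32_t k = *delta_e - dx - dy;         /* embedded add constant       */
    uint32_t r1 = x + dx, r2 = y + dy;       /* offset register contents    */
    return r1 + r2 + k;                      /* semantics of add r r1 r2 kE */
}
```

Across recompilations the register value varies uniformly with delta_e; only the user, who knows delta_e, recovers the nominal x + y as the register value minus delta_e.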
That allows the right inputs to be created and sent off for processing on the encrypted computing platform, and allows sense to be made of the outputs received back.

TABLE I
INTEGER PORTION OF FXA, A MACHINE CODE INSTRUCTION SET FOR ENCRYPTED WORKING – ABSTRACT SYNTAX AND SEMANTICS

 op. fields                 mnem.      semantics
 add r0 r1 r2 kE            add        r0 ← r1 [+] r2 [+] kE
 sub r0 r1 r2 kE            subtract   r0 ← r1 [−] r2 [+] kE
 mul r0 r1 r2 kE0 kE1 kE2   multiply   r0 ← (r1 [−] kE1) [∗] (r2 [−] kE2) [+] kE0
 div r0 r1 r2 kE0 kE1 kE2   divide     r0 ← (r1 [−] kE1) [÷] (r2 [−] kE2) [+] kE0
 . . .
 mov r0 r1                  move       r0 ← r1
 beq i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [=] r2 [+] kE
 bne i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [≠] r2 [+] kE
 blt i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [<] r2 [+] kE
 bgt i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [>] r2 [+] kE
 ble i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [≤] r2 [+] kE
 bge i r1 r2 kE             branch     if b then pc ← pc + i, b ⇔ r1 [≥] r2 [+] kE
 . . .
 b i                        branch     pc ← pc + i
 sw (kE)r1 r2               store      mem[r1 [+] kE] ← r2
 lw r0 (kE)r1               load       r0 ← mem[r1 [+] kE]
 jr r                       jump       pc ← r
 jal j                      jump       ra ← pc + 4; pc ← j
 j j                        jump       pc ← j
 nop                        no-op

LEGEND: r – register indices; k – 32-bit integers; pc – prog. count reg.; j – program count; '←' – assignment; ra – return addr. reg.; E[ ] – encryption; i – pc increment; r – register content; kE – encrypted value E[k]; x^E [o] y^E = E[x o y]; x^E [R] y^E ⇔ x R y.

III. FXA INSTRUCTIONS
A 'fused anything and add' (FxA) [11] instruction set architecture (ISA) is the general target here, satisfying conditions (1-4) of Section I. The integer portion is shown in Table I. It is adapted from the open standard OpenRISC instruction set v1.1, http://openrisc.io/or1k.html. That has about 200 instructions (6-bit opcode plus variable minor opcodes) separated into single and double precision integer and floating point and vector subsets, with instructions all 32 bits long, and the FxA instruction set follows that design closely. FxA instructions, like OpenRISC instructions, access up to three of 32 general purpose registers (GPRs) per instruction, designated in contiguous 5-bit plaintext specifier fields within the instruction.

To give an idea of what FxA machine code looks like 'in action', a trace of code compiled for the Ackermann function [25] is shown in Table II. For readability here, the final delta for the return value in register v0 is set to zero. That function is the most computationally complex function theoretically possible, stepping up in complexity for each increment of the first argument, so it is a good test of correct compilation.

A. Prefix Instructions
FxA instructions may need to contain 128-bit or longer encrypted constants, so some adaptation of the basic OpenRISC architecture is required for that to be possible. A 'prefix' instruction takes care of it, supplying extra bits as necessary. Each prefix instruction is 32 bits long, but several may be concatenated.

Ackermann C code:

    int A(int m, int n) {
        if (m == 0) return n+1;
        if (n == 0) return A(m-1, 1);
        return A(m-1, A(m, n-1));
    }

B. Single Precision Floating Point

In addition to the integer instructions of Table I, there may be floating point instructions addf, subf, mulf etc. paralleling the OpenRISC floating point subset. The contents of registers and memory for floating point operations are the encryptions of 32-bit integers that themselves encode floating point numbers (23 mantissa bits, 8 exponent bits, 1 sign bit) via the IEEE 754 standard encoding.

Definition 1.
Let .∗ denote the floating point multiplication on plaintext integers encoding IEEE 754 floats, and use the same convention for other arithmetic operations and relations. Let [.∗] be the corresponding operation in the ciphertext domain, following the notation convention at the end of Section I. Then the floating point multiplication instruction semantics is

    r0 ← (r1 [−] kE1) [.∗] (r2 [−] kE2) [+] kE0    (.∗)

The − and + are the ordinary plaintext integer subtraction and addition operations respectively, and [−] and [+] are the corresponding operations in the ciphertext domain (see Notation in Section I). That is, the FxA floating point multiplication takes the encrypted integers representing (in IEEE 754 format) floating point numbers that have been offset as integers, undoes the offsets, then multiplies them as floats, obtaining the IEEE 754 integer representation before offsetting as integer again. The operation is atomic, as required by (1) of Box 2, leaving no trace if aborted. The offsets ki satisfy the requirement (3).

The FxA set in use in our prototypes has two encrypted constants for a floating point test condition in branch instructions. The floating point branch-if-equal instruction calculates

    (r1 [−] kE1) [.=] (r2 [−] kE2)    (.=)

where .= is the floating point comparison on integers encoding floats via IEEE 754, and [.=] is the corresponding test in the ciphertext domain, with x^E [.=] y^E ⇔ x .= y. The subtraction is as integers on the encoding, not floating point. The operation is atomic, leaving no trace if aborted or interrupted, as required by (1) of Box 2, and all encrypted operations in the processor (should and do) take the same time and power on all operands.

C. Instruction Diddling
Condition (2) of Box 2 requires there to be one more constant physically present in each branch instruction, an encrypted bit k that decides if the 1-bit result of the test is to be inverted or not. That is because the test outcome is observable by whether the branch is taken or not, so by condition (3) it should be variable via an encrypted constant in the instruction. The bit changes equals to not-equals and vice versa, a less-than into a greater-than-or-equal-to, and so on. The bit is said to diddle the instruction. In practice, the bit is composed from the padding bits in the other constants in the instruction, so it has not been mentioned explicitly in Table I, where the branch semantics shown are after the diddle.

The opcode in the instruction is in plaintext, but which branch in the control graph is which is hidden by the diddle.

TABLE II
TRACE FOR ACKERMANN(3,1), RESULT 13

 PC   instruction                                          update
 ...
 35   add t0 a0 zer E[-86921031]                           t0 = E[-86921028]
 36   add t1 zer zer E[-327157853]                         t1 = E[-327157853]
 37   beq t0 t1 2 E[240236822]
 38   add t0 zer zer E[-1242455113]                        t0 = E[-1242455113]
 39   b 1
 41   add t1 zer zer E[-1902505258]                        t1 = E[-1902505258]
 42   xor t0 t0 t1 E[-1734761313] E[1242455113] E[1902505258]
                                                           t0 = E[-1734761313]
 43   beq t0 zer 9 E[-1734761313]
 53   add sp sp zer E[800875856]                           sp = E[1687471183]
 54   add t0 a1 zer E[-915514235]                          t0 = E[-915514234]
 55   add t1 zer zer E[-1175411995]                        t1 = E[-1175411995]
 56   beq t0 t1 2 E[259897760]
 57   add t0 zer zer E[11161509]                           t0 = E[11161509]
 ...
 143  add v0 t0 zer E[42611675]                            v0 = E[13]
 ...
 147  jr ra                                                (return E[13] in v0)

Legend: (registers) a0 = function argument; sp = stack pointer; t0, t1 = temporaries; v0 = return value; zer = null placeholder.
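The trace's final value can be cross-checked against the source: a sketch that computes the Ackermann function exactly as in the C code compiled for Table II, confirming A(3,1) = 13, the value returned as E[13]:

```c
#include <assert.h>

/* The Ackermann function, as in the C source compiled for Table II.
 * Recursion depth grows explosively in the first argument, so keep
 * test arguments small. */
static int A(int m, int n) {
    if (m == 0) return n + 1;
    if (n == 0) return A(m - 1, 1);
    return A(m - 1, A(m, n - 1));
}
```

For instance, A(3,1) evaluates to 13, matching the decrypted return register in the trace.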
D. The Debatable Equals Branch Instruction
Diddling works well to disguise less-than instructions and order inequalities in general, but not for equals versus not-equals. What the instruction is, equals or not-equals, may be guessed by what proportion of operands cause a jump at runtime. If almost all do then that is a not-equals test. If few do then that is an equality test. Trying the same operand on both sides is almost guaranteed to cause equality to fail because of the embedded constants k1, k2 in (.=), so if it succeeds instead, the equality instruction has likely been diddled to not-equals. So whether the test succeeds or not at runtime is detectable in practice for an equality/not-equals branch instruction, contradicting (2).

To beat that, the compiler described in [11] randomly changes the way it interprets the original boolean source code expression at every level, so it cannot be told if the source code, not the object code, had an equality or a not-equals test. It independently and randomly decides, as it works upwards through a boolean expression, if the source code at that point is to be interpreted by a truthteller, who says 'true' when true is meant and 'false' when false is meant, or by a liar, who says 'false' when true is meant and 'true' when false is meant. It equiprobably generates, at each level in the boolean expression, liar code and uses the branch-if-not-equal machine code instruction for an equality test, or truthteller code and uses the branch-if-equal instruction.

With that compile strategy, whether the equals branch instruction jumps or not at runtime does not relate statistically to what the boolean in the source code should be. Condition (3) of Box 2 on the output of the instruction is effectively vacuous with respect to the source, as there is no definite meaning to it jumping. An observer who sees it jump does not know if that is the result of a truthteller's interpretation of an equals test in the source code and it has come out true at runtime, or it is the result of the liar's interpretation and it has come out false. Ditto not-equals. This equates to a (structured) garbled circuit construction in the classical sense of [26]. While a structured boolean expression reveals its intermediates as outputs to an observer too, the classical result has it that no output values can be deciphered by an observer who does not already know which is being 'lied' about, and which not.

For other comparison tests, just as many operand pairs cause a branch one way as the other, which makes it indistinguishable whether the opcode is diddled or not. Still, the truthteller/liar compile strategy is used there too. An equality test cannot be recreated by an adversary as x ≤ y and y ≤ x because only x ≤ y + k is available in FxA, for unknown constant k. Reversing operands is allowed by (4∗) but produces y ≤ x + k, not y + k ≤ x. An estimate for k can be made by the proportion of pairs (x, y) that satisfy the conjunction of the inequality and the reversed inequality. In particular, whether k < 0 is signalled by the absence of pairs that satisfy both inequalities. But diddling means the conjunctions might be x > y + k and y > x + k instead, and those have no solutions when −k−1 is negative, that is, when k ≥ 0. So absence of solutions means either k < 0 or k ≥ 0, which gives nothing away.

Note for the general description below of the compiler strategy established in [11] that 'liar' amounts to adding a delta equal to 1 mod 2 to a boolean 1-bit result, and 'truthteller' amounts to adding a delta equal to 0 mod 2.

IV. OBFUSCATING COMPILATION
A compiler built to obfuscate in the sense of this article works with a database D : Loc → Off containing a (here 32-bit) integer offset ∆l of type Off for data in register or memory location l (of type Loc). The offset is a delta by which the runtime data underneath the encryption is to vary from nominal at a given point in the program, and the database D comprises the obfuscation scheme. It is varied by the compiler as it makes a pass through the source code.

The compiler (any compiler) also maintains a conventional database of type L : Var → Loc binding source variables to registers and memory locations. In our prototype an intermediate layer (RALPH: Register ALlocation in Physical Hardware) optimises the mapping, and detail of this is omitted here.
A. Expressions
In [11], a generic (non-side-effecting) integer expression compiler putting its result in register r is described with type:

    C L [ : ] r : DB × Expr → MC × Off    (7)

where MC is the type of machine code, a sequence of FxA instructions mc, and Off is the type of the integer offset ∆r from nominal that the compiler intends for the result in r beneath the encryption when the machine code is evaluated at runtime. The aim is to satisfy (EP) by varying ∆r arbitrarily and equiprobably from recompilation to recompilation.

To translate x + y, for example, where x and y are signed integer expressions, the compiler first emits machine code mc1 computing expression x in register r1 with offset ∆x. It then emits machine code mc2 computing expression y in register r2 with offset ∆y. That is

    (mc1, ∆x) = C L [D : x] r1
    (mc2, ∆y) = C L [D : y] r2

It then decides a random offset ∆e for the whole expression e = x + y and emits the FxA integer addition instruction with abstract semantics r ← r1 [+] r2 [+] kE to return the result in r:

    C L [D : x + y] r = (mc_e, ∆e)    (8)
    mc_e = mc1; mc2; add r r1 r2 kE
    k = ∆e − ∆x − ∆y

The final offset ∆e for the runtime result in r beneath the encryption may be freely chosen, as (EP) stipulates.

That carries through the global requirement for compiler constructions (eH): the code takes the opportunity of one new arithmetic instruction that writes, here add, to generate one new, independent, randomly chosen offset ∆e for the written location r. The same will be true of the compilation of the subexpressions x, y: each arithmetic machine code instruction emitted introduces an independent random delta in its target.

¹In 2s-complement arithmetic x < y is the same as x − y = z with z < 0, and exactly half of the range satisfies z < 0 and exactly half satisfies z ≥ 0.

B. Statements
Statements do not produce a result; instead they have a side-effect. Let Stat be the type of statements. The statement compiler in [11] works not by returning an offset, as for expressions, but a new scheme for offsets at multiple locations:

C^L[ : ] : DB × Stat → DB × MC    (9)

Recall that a database D of type DB holds the obfuscation scheme (the offset deltas from nominal values beneath the encryption in all locations) as the compiler works through the code, and consider an assignment z = e to a source code variable z, which the location database L says is bound in register r = L z. Let a pair in the cross product DB × MC be written D : mc for readability. First, code mc_e for evaluating expression e in temporary register t0 at runtime is emitted via the expression compiler as already described:

(mc_e, ∆e) = C^L[D : e]t0

Offset ∆e is generated by the expression compiler for the result e in t0. A short form add instruction with semantics r ← t0 [+] k^E to change offset ∆e to a new randomly chosen offset ∆′r in register r is emitted next:

C^L[D : z = e] = D′ : mc_e; add r t0 k^E    (10)
k = ∆′r − ∆e

The change to the database of offsets is at index r. An initial offset D r = ∆r changes to D′ r = ∆′r. The new offset has been freely and randomly chosen by the compiler, supporting ( EP ), and the one new arithmetic machine code instruction emitted, add, to write the expression in the target variable incorporates one new random delta, supporting ( e H ).

V. LONG BASIC TYPES
Double length (64-bit) plaintext integers x can be viewed as concatenated 32-bit integers x = x_H.x_L, the high and low 32 bits of x respectively. In the processor, the encryption of x occupies two registers or two memory locations, containing the encrypted values E[x_H], E[x_L] respectively.

Definition 2. Encryption of 64-bit integers x concatenates the encryptions of their 32-bit high and low bit components:

x^E = E[x] = E[x_H.x_L] = E[x_H].E[x_L]

The FxA instructions for dealing with encrypted 64-bit values necessarily contain (encrypted) 64-bit constants.
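A sketch of the packing in Definition 2 at the plaintext level (helper names invented): a 64-bit value is split into its high and low 32-bit component words, each of which is then encrypted separately on the platform.

```c
#include <stdint.h>

/* Plaintext view of Definition 2 (helper names invented): a 64-bit value
 * x is handled as its 32-bit component words x = xH.xL; it is these words
 * that are encrypted individually into two registers or memory cells. */

static void split64(uint64_t x, uint32_t *hi, uint32_t *lo) {
    *hi = (uint32_t)(x >> 32);   /* xH: high 32 bits */
    *lo = (uint32_t)x;           /* xL: low 32 bits  */
}

static uint64_t join64(uint32_t hi, uint32_t lo) {
    return ((uint64_t)hi << 32) | lo;
}
```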
A. Long Long Integers
The 64-bit integer type is known in C as ‘long long’.
Definition 3.
Let + and − be the two-by-two independent application of respectively 32-bit addition and 32-bit subtraction to the pairs of 32-bit plaintext integer high-bit and low-bit components of 64-bit integers, with similar notation for other binary operators. I.e. and e.g.:

(u1.l1) + (u2.l2) = (u1 + u2).(l1 + l2)

Definition 4. Let ˜∗ denote the usual plaintext multiplication on 64-bit 'long long' integers, and similarly for other operators.

The FxA 64-bit multiplication operation on operands E[x], E[y] has semantics:

E[(x − k1) ˜∗ (y − k2) + k0]    (˜∗)

where k0, k1, k2 are 64-bit plaintext integer constants embedded encrypted in the instruction as k_i^E, i = 0, 1, 2. Putting it in terms of the effect on register contents, the FxA long long multiplication instruction semantics is:

r0H.r0L ← (r1H.r1L [−] k1^E) [˜∗] (r2H.r2L [−] k2^E) [+] k0^E

For encrypted (and unencrypted) 64-bit operations the processor partitions the register set into pairs referred to by one name each. In those terms the semantics is simplified to:

r0 ← (r1 [−] k1^E) [˜∗] (r2 [−] k2^E) [+] k0^E

That is written mull r0 r1 r2 k0^E k1^E k2^E in assembler, following the 32-bit instruction pattern. The operation is atomic (1).

The other instructions for 'long long' integer arithmetic in FxA also match the architecture of the corresponding 32-bit integer instruction (Table I), with longer encrypted constants and the 'two-at-a-time' register naming convention, and an l suffix on the name in assembler. Only the different opcode and the extra prefixes distinguish the long forms 'on the wire'.

The pattern for compiled code generated for long long integer expressions and statements on the encrypted computing platform follows exactly that for 32-bit expressions and statements but using the 'l' instructions. Exactly one new (64-bit) arithmetic instruction that writes is issued with each compiler construct. It contains just one 64-bit (encrypted) constant that allows the 64-bit (i.e. 2×32-bit) offset delta in the target location to be freely chosen and generated by the compiler, supporting ( e H ). The target register or memory location pair has a different (32-bit) delta generated for each of the pair.

B. Double Floats
Double precision plaintext 64-bit floats ('double') are encoded as two (encrypted) 32-bit integers, the top and bottom bits respectively of a 64-bit IEEE 754 standard integer representation.
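For concreteness, the bit-image split that this encoding assumes can be sketched as follows (hypothetical helper, using memcpy for the type pun):

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the encoding above (helper name invented): a C 'double' is
 * taken as its 64-bit IEEE 754 bit image and split into two 32-bit words,
 * which are what get encrypted on the platform. */
static void double_words(double d, uint32_t *hi, uint32_t *lo) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);    /* raw bit image, no conversion */
    *hi = (uint32_t)(bits >> 32);      /* top 32 bits                  */
    *lo = (uint32_t)bits;              /* bottom 32 bits               */
}
```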
Definition 5.
Let ¨∗ denote the plaintext double precision floating point multiplication on the IEEE 754 encoding of double (64-bit) floats as 64-bit integers rendered as two 32-bit integers, and similarly for other operations and relations.

Let [¨∗] be the corresponding operation in the cipherspace domain on two pairs of encrypted 32-bit integers. Then the FxA multiplication instruction on encrypted 64-bit double operands in the (pairs of) registers r1, r2 respectively, writing to (the pair) register r0, has semantics:

r0 ← (r1 [−] k1^E) [¨∗] (r2 [−] k2^E) [+] k0^E    (11)

where k_i^E, i = 0, 1, 2 are encrypted 64-bit constants embedded in the instruction. That is written muld r0 r1 r2 k0^E k1^E k2^E in assembler, following the 32-bit pattern, but with a d suffix on the root of the mnemonic. The operation is atomic (1).

The pattern for the compiled code emitted for double floating point expressions and statements on the encrypted computing platform follows exactly that for 32-bit floating point expressions and statements (which follows the 32-bit integer pattern) but with these 'd' instructions instead. Exactly one new arithmetic instruction that writes is issued per compiler construct for expressions or a write to a location holding a source code variable. The instruction contains one 64-bit (encrypted) constant that allows the 64-bit (i.e. 2×32-bit) offset delta in the target location to be freely chosen and generated by the compiler, supporting ( e H ).

C. Short Basic Types and Casts
Machine code instructions that act on encrypted 'short' (16-bit) or 'char' (8-bit) integers are unneeded for C because short integers are promoted to 32-bit ones at first use. The compiler instead generates casts following the principle ( e H ) (emitting any one instruction that writes entails managing it to vary to the fullest extent possible across recompilations). For C, the 13 basic types (signed/unsigned char, short, int, long, long long integer, and float and double precision float, also the single bit bool type) have to be inter-converted. Here follows the cast for encrypted signed 32-bit 'int' to encrypted signed 16-bit 'short'. The compiler-issued code moves the integer 16 places left and then 16 places right again using one multiplication and one division (read on for improvement):

C^L[D : (short) x]r = (mc_e, ∆e)    (12)
(mc1, ∆x) = C^L[D : x]r
mc_e = mc1; mul r r E[2^16] E[∆x] k1^E; div r r E[2^16] k1^E E[∆e]

Those are short form instructions mul r0 r1 k0^E k1^E k2^E and div r0 r1 k0^E k1^E k2^E with semantics r0 ← ((r1 [−] k1^E) [∗] k0^E) [+] k2^E and r0 ← ((r1 [−] k1^E) [/] k0^E) [+] k2^E respectively. The constants k1, ∆e are freely chosen for these two 'arithmetic instructions that write', in support of ( e H ).

But (a) the compiler must avoid encryptions of 2^16 always appearing. Instead a register r1 can be loaded with the encryption of a random number k and then the full-form instructions of Table I instead of the short forms can be used, with r1 [−] k′^E in place of E[2^16], where k′ = k − 2^16. Then the encrypted constants that appear in the code are uniformly distributed. Also (b) the top 16 bits should be filled randomly, but that is taken care of in the final offset delta ∆e. That the difference between k, k′ for (a) is constant at 2^16 across recompilations does not help an adversary, as the processor arithmetic does not work on instruction constants (4).

Our FxA instruction set provides integer-to-float (and vice versa) conversion primitives for the platform.
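The plaintext effect of the mul/div pair in (12) is 16-bit sign extension; a sketch at the plaintext level, with shifts standing in for the modular multiplication and division by 2^16:

```c
#include <stdint.h>

/* Plaintext sketch of the (short) cast in (12): multiply by 2^16 (wrapping
 * mod 2^32) then come back down 16 places, which sign-extends the low 16
 * bits.  Shifts stand in here for the mul/div instructions of the text. */
static int32_t cast_to_short(int32_t x) {
    uint32_t up = (uint32_t)x << 16;   /* mul by 2^16, wraps mod 2^32   */
    return (int32_t)up >> 16;          /* back down, sign bit extended  */
}
```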
Each embeds encrypted constants that offset inputs and outputs arbitrarily beneath the encryption, as required by (3). The compiler needs just one such instruction for an integer/float cast, containing one constant allowing one arbitrary offset beneath the encryption in the target location to be generated, supporting ( e H ).

VI. ARRAYS AND POINTERS
There is a natural and there is an efficient way to bootstrap integer computation to an array A of n integers, and both will be discussed briefly. The natural way is to imagine a set of variables A0, A1, . . . for the entries of the array. That allows the compiler to translate a lookup A[i] as a compound expression '(i==0) ? A0 : ((i==1) ? A1 : . . .', while a write A[i]=x can be translated to 'if (i==0) A0=x else if (i==1) A1=x else . . .'. The entries get individual offsets from nominal ∆A0, ∆A1, . . . in the obfuscation scheme maintained by the compiler.

A. Single Shared Array Offset
While the natural approach is logically correct, it makes array access have complexity O(n). It can trivially be improved to O(log n) but that is still an overhead. So we have also explored an efficient approach: array A's entries share the same offset ∆A from their nominal value beneath the encryption. Then pointer-based access becomes easier to generate code for. At compile time where in the array the pointer will point at runtime is unknown, but the shared offset for all array entries may be relied on. Pointers p must be declared with the array:

restrict A int *p;

With this approach, the compiler constructs the dereference ∗e of an expression e that is a pointer into A as follows. It first emits code mc_e that evaluates the pointer in register r with a randomly generated offset ∆e beneath the encryption:

(mc_e, ∆e) = C^L[D : e]r

It emits a load instruction lw r (k^E) r containing (encrypted) displacement constant k = −∆e that compensates the offset ∆e in the address in r. The processor does the calculation a^E = r [−] E[∆e] that produces the encrypted address a^E and passes it as-is for lookup by the memory unit. The entry retrieved from memory has the shared offset ∆A and the compiler emits a short-form add instruction add r r k^E with semantics r ← r [+] k^E and k = ∆′r − ∆A to change it to a new, freely chosen offset ∆′r in r. The complete code emitted is:

C^L[D : ∗e]r = (mc, ∆′r)    (13)
mc = mc_e; lw r (E[−∆e]) r; add r r E[∆′r − ∆A]

An indexed array lookup A[i] is handled by dereferencing a pointer *(A+i). Does that follow the principle ( e H )? The add instruction is varied as the compiler chooses, but the load instruction is not. However, a load instruction is not an arithmetic instruction and ( e H ) refers only to those. A load instruction is a copy from RAM and should just copy. Where in RAM the read is physically mapped to is up to the hardware and should be varied by it independently.
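The arithmetic of the dereference in (13) can be modelled at the plaintext level (a toy sketch with invented names; values stand in for the 32-bit words beneath the encryption): the lw displacement cancels the pointer's offset ∆e, and the final short add retargets the entry's offset from the shared ∆A to a fresh ∆′r.

```c
#include <stdint.h>

/* Toy model of (13), names invented.  mem[] holds array entries already
 * carrying the shared offset dA; the pointer arrives carrying offset de. */

#define ASIZE 8
static uint32_t mem[ASIZE];            /* entry i stored as A[i] + dA */

static uint32_t deref(uint32_t ptr_obf, /* address a + de, as in register */
                      uint32_t de,      /* pointer's offset delta         */
                      uint32_t dA,      /* shared entry offset            */
                      uint32_t dr_new)  /* fresh offset for the result    */
{
    uint32_t a = ptr_obf - de;          /* lw displacement k = -de        */
    uint32_t entry = mem[a];            /* retrieved word is A[a] + dA    */
    return entry + (dr_new - dA);       /* add r r [dr' - dA]E            */
}
```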
A test of whether two encrypted addresses are equal based on if they retrieve the same values from RAM does not break encryption, because the lookup is of the encrypted, not the decrypted, address. The general compilation technique for dealing with this situation ('hardware aliasing'; the term originated in [29]), in which the program has different names for one RAM location, is described in [30], [31]: the memory address must be saved for reuse in reads between consecutive writes, not recalculated; in particular, the classical frame pointer register is used to save the stack pointer register on entry to a subroutine and for restoration at subroutine exit.

Writing an array entry is more problematic, because it should change the offset delta beneath the encryption. Because that is shared across the whole array, every array entry must be rewritten to the new offset whenever one is written: an O(n) 'write storm.' But the n − 1 writes to the other array entries all install the same offset. That contradicts the principle ( e H ) that each such arithmetic write must exercise the possibilities for variation to the maximum. Each instruction could vary independently, but is constrained by the convention that the offset ∆A holds array-wide. Therefore this 'efficient' approach is wrong. Nevertheless, because it is a straightforward extension of the integers-only compilation technique, it is the one presently implemented in our compiler. Although solo array reads are more efficient, blinding which array element is really being read from requires a 'read storm' like the write storm, so it is not more efficient if a compiler codes for that.

B. One Offset per Array Entry
An array may also be viewed as a single (encrypted) n × 32-bit long integer variable A, with a single n × 32-bit offset ∆A = (∆A_i), i < n, beneath the encryption. Extending Defn. 2:

Definition 6. Encryption of n × 32-bit integers x concatenates the encryptions of the 32-bit components:

x^E = E[x] = E[x_0. ... .x_{n−1}] = E[x_0]. ... .E[x_{n−1}]

The compiler must generate a 'write storm' to the whole of the array after writing one entry and changing its offset delta, because it does not know at compile time which entry A_i (and its associated offset delta ∆A_i) will be rewritten at runtime, so it must plan to rewrite all – or rewrite none, which would go against ( e H ). Each write in the write storm contributes new trace information – the new delta offset – and hence entropy. As stated above, this is the correct approach, but our own prototype compiler does not yet implement it. The software engineering perspective is not clear as to whether moving forward to single but long integer deltas, or multiple 32-bit deltas like those already used for doubles, is the least difficult development route. The 'single shared 32-bit offset' ∆A, not (∆A_i), approach for an array A is what is currently in use.

(Footnote: In our own prototype processor for encrypted computing, a frontend to the address translation lookaside buffer (TLB) memoises [27] the encrypted address to a physically backed sub-range of the full memory address space. The memoisation is changed randomly at every write through it, so a physical observer sees a random pattern approximating oblivious RAM (ORAM) [28].)

VII. STRUCTS
C 'structs' are records with fixed fields. The approach the compiler takes is to maintain a different offset per field, per variable of struct type. That is, for a variable x of struct type with fields .a and .b the compiler maintains offsets ∆x.a and ∆x.b. It is as though there were two variables, x.a and x.b.

In the case of an array A the entries of which are structs with fields .a and .b, the compiler maintains two separate sets of offsets ∆A_i.a and ∆A_i.b, and so on recursively if the fields are themselves structs. Updating one field in one entry changes the associated offset and is accompanied by a 'write storm' of adjustments over the stripe through the array consisting of that same field in all entries. That is more efficient than a storm to all fields, so for more efficient computing in this context, array entries should be split into structs whenever possible.

VIII. UNIONS
The obfuscation scheme in a union type such as

union { struct { int a; float b[2]; }; double c[2]; }

engages compatible offset schemes for the component types. The offset scheme for the struct will have the pattern (in 32-bit words) x, y0, y1, with x the offset for the int and y0, y1 the offsets for the float array entries, while the pattern for the double array will be u0, v0, u1, v1:

union { struct { int a (x); float b[2] (y0, y1); }; double c[2] (u0, v0, u1, v1); }

The resolution is x = u0 = α, y0 = v0 = β, y1 = u1 = γ, v1 = δ for a scheme α, β, γ, δ. That is the least restrictive obfuscation scheme forced by the union layout here, and it means that a write to one target field within the union can be just that.

With our compiler's present (inadequate) solution for arrays, y0 = y1 so β = γ, and u0 = u1 so α = γ, and v0 = v1 so β = δ. That gives α = β = γ = δ and the scheme α, α, α, α of offsets. That needs a write storm to update the deltas across the whole union after an update to just one field. Not only is that inefficient, but it carries no extra entropy into the trace, contradicting ( e H ).

IX. THEORY
By a trace T of a program at runtime is meant the sequence of writes to registers and memory locations. If a location is read for the first time without it having previously been written in the trace, then that is not part of the trace but an input to it. Trace T is a random variable, varying from recompilation to recompilation of the same source code by the compiler. The compiler freely chooses delta offset schemes for each point in the code as described in previous sections, and the probability distribution for T depends on the distribution of those choices. After a simple assignment to a register r, the trace is longer by one: T′ = T ⌢ ⟨r = E[v]⟩. Let H(T) be the entropy of trace T in this stochastic setting. Let f_T be the probability distribution of T; then the entropy is the expectation

H(T) = E[− log2 f_T]    (14)

The increase in entropy from T to T′ (it cannot decrease as T lengthens) is the number of bits of unpredictable information added. A flat distribution f_T = k (constant) uniquely has maximal entropy H(T) = log2(1/k). Only this fragment of information theory will be required: adding a maximal entropy signal to a random variable with any distribution at all on an n-bit space gives another maximal entropy, i.e., flat, distribution. If the offset ∆r beneath the encryption is chosen randomly and independently with flat distribution by the compiler, so it has maximal entropy, then H(T′) = H(T) + 32, because there are 32 bits of unpredictable information added in via the 32-bit delta to the 32-bit value beneath the encryption, so the 32-bit sum value plus delta varies with (32-bit) maximal entropy.

Although per instruction the compiler has free choice in accord with ( e H ), not all the register/memory write instructions issued by the compiler are jointly free as to the offset delta for the target location – it is constrained to be equal at the beginning and end of a loop, and in general at any point where two control paths join:

Definition 7.
An instruction emitted by the compiler that adjusts the offset in location l to a final value common with that in a joining control path is a trailer instruction.

Trailer instructions come in sets for each location l for a control path join, with one member per path. Each in the set for l is last to write to l in a control path before the join. An example occurs at return from a subroutine. The final offsets per location must be the same at all exit points from the subroutine, and the arithmetic instructions that write that make them so make up the trailer instruction sets.

Because running through the same instruction twice, or an instruction with the same delta offset for the target location a second time, does not add any new entropy (the delta offset is already determined for the second encounter by the first encounter), the total entropy in a trace can be counted as follows:

Lemma 1.
The entropy of a trace compiled according to ( e H ) is 32(n + m) bits, where n is the number of distinct arithmetic instructions that write in the trace, counted once only per set if they are one of a set of trailer instructions, and once each if they are not, and m is the number of input words.

Recall that 'input' is provided by those instructions that read for a first time in the trace a location not written in it earlier. Observing data at any point in the trace that has been written by a program instruction (or read from a location in memory that has not yet been written) sees variation across recompilations. The compiler principle ( e H ) guarantees that every opportunity provided by the emission of an arithmetic instruction that writes is taken by the compiler as a point at which new variation is introduced. But at 'trailer' instructions as defined above the compiler jointly organises several instructions to provide the same final delta to a location, and that is sometimes unnecessary, because that location is never read again. Then the variation the compiler has introduced is not maximal, because it could be increased by varying deltas independently among the trailer instructions.

To make the trailer instruction synchronisation necessary we consider that the code might be embedded in any surrounding code, including that which reads all locations affected. Then the trailer synchronisation is necessary and the compiler has done the best job possible in terms of introducing as much entropy as possible.

Proposition 1.
The entropy of a program trace compiled according to ( e H ) with synchronisation only at trailer instructions before different control paths join is maximal over the space of all possible variations of the constant parameters in the machine code, given that it works correctly in any context.

The proposition implies a full 32 bits of entropy in the variation beneath the encryption must exist in any location at any point in the trace where the location has been written, or, not yet having been written, is read. The datum in that location has no other option for coming to be. This is the result ( EP ) obtained by structural induction in [11]:

Corollary 1.
The probability across different compilations by a compiler that follows principle ( e H ) that any particular 32-bit value has encryption E[x] in a given register or memory location at any given point in the program at runtime is uniformly 1/2^32.

That is what formally implies the cryptographic semantic security result, relative to the security of the encryption. But a stronger result can now be obtained from the understanding in the lemma and proposition above:
Definition 8.
Two data observations in the trace are (delta) dependent if they are of the same register at the same point, are input and output of a copy instruction, or are of the same register after the last write to it in a control path before a join and before the next write.
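A toy illustration of the copy case in Definition 8 (names invented): a copy instruction reproduces the same word beneath the encryption, so its input and output observations are (delta) dependent and carry no new entropy, whereas an arithmetic write with a fresh embedded constant carries a new, independent delta.

```c
#include <stdint.h>

/* Toy model of Definition 8 (names invented).  Words model the 32-bit
 * plaintexts beneath the encryption. */

static uint32_t copy_instr(uint32_t word) {
    return word;                    /* dependent: identical trace word   */
}

static uint32_t arith_write(uint32_t word, uint32_t d_old, uint32_t d_new) {
    return word + (d_new - d_old);  /* independent: fresh delta d_new    */
}
```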
If the trace is observed at two (in general, n) independent points, the variation is the maximal possible:

Theorem 1. The probability across different compilations by a compiler that follows principle ( e H ) that any n particular 32-bit values in the trace have encryptions E[x_i], provided they are pairwise (delta) independent, is 1/2^{32n}. Each dependent pair reduces the entropy by 32 bits.

X. DISCUSSION
Theorem 1 quantifies exactly the cross-correlation that exists beneath the encryption in a trace from compiled code where the compiler is built according to the principle ( e H ) (every arithmetic instruction that writes is varied to the maximal extent possible across recompilations). It 'names and shames' the points in the trace where the induced variation is necessarily weak because of the nature of computation, and statistical influences from the original source code may show through. For example, if the code runs a loop summing the same value again and again into an accumulator, then looking at the accumulator shows an observer E[a + i∗b + δ] for a constant offset δ. That is an arithmetic series with unknown starting point and constant step, and it is likely to be one of the relatively few short-stepping paths, and that can be leveraged into a dictionary attack on the encryption.

A compiler built following the principle ( e H ) does as well as any may to avoid introducing more such weaknesses. The only way to eliminate them is to have no loops or branches in the object code. That would be a finite-length calculation or unrolled bounded loop with branches embedded as t∗x + (1−t)∗y calculations, where x and y are the potential outcomes from two branches and t is the outcome of a 1/0 test.

With respect to data structures, ( e H ) means that each entry of an array must have its own individually chosen delta offset from nominal beneath the encryption, and each write to an array must change them all, as one must change on write and the compiler does not know which it will be. The compiler must emit a 'write storm'. Reads too are necessarily more inefficient than naively may be expected. Structs (records with named fields) have different offsets per field, along the same lines, but the compiler does know which field will be accessed, so there are no write storms.
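The branch-elimination device mentioned above, with t the 1/0 outcome of a test, can be sketched directly (a minimal plaintext illustration, not the platform's emitted code):

```c
#include <stdint.h>

/* Plaintext sketch of branch elimination: with t in {0,1} the outcome of
 * a test, t*x + (1-t)*y selects between the two branch outcomes with no
 * branch appearing in the object code. */
static int32_t select_branchless(int32_t t, int32_t x, int32_t y) {
    return t * x + (1 - t) * y;
}
```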
Unions do force equalities among the delta offsets of their fields, but they are to be expected from the aliasing (whether it is worthwhile preserving 'trick' code – type punned or aliased, writing and reading different types – is another question, but it would break legacy codes not to).

This document has not touched on short data structures such as short integers, but they are a problem as their natural variation is small, so they are intrinsically a good subject for dictionary attacks. With an abundance of caution, we treat them as integers with random high bits, and a poor consequence is that strings are loosely packed. The text has also not touched unsigned integers, but the compiler's treatment is the same as for floats – that is, they are regarded as being coded as signed integers (with the same bits). The platform provides primitive arithmetic operations on them in that coding (encrypted).

The treatment of short integers raises the question of whether extra entropy could be introduced by changing to 64-bit or 128-bit plaintext words beneath the encryption, instead of 32-bit, and correspondingly sized delta offsets from nominal. We believe that is the correct logical inference. The 32-bit range of variation of standard-sized integers would be swamped by a 64-bit delta introduced by the compiler, and the looped stepping example E[a + i∗b + δ] above would have a 64-bit δ, so would have vastly many possible origin points for the path for any hypothetical step b, not just a few, which is too many to examine in a practical dictionary attack. A 256-bit encryption for 128-bit plaintext words with 128-bit deltas introduced by the compiler could be sufficient for all practical purposes, since no measurement on the trace could then have less than 128 bits of entropy (Corollary 1 makes this observation).

A particular concern is whether interactions with memory reveal too much.
One can imagine, for example, testing if two data values are equal beneath the encryption by seeing if, used as addresses in a load instruction, they pull the same values into registers. But load and store do not resolve the address beneath the encryption. Instead they pass the literal, encrypted address as-is to the memory unit (which is not privy to the encryption), so identity of the encrypted addresses is what would be tested, and that is visible already to an observer. The 'hardware aliasing' that multiple encryptions of the same address causes in use in load and store from the program's point of view is dealt with by the compiler – it emits code to save the address verbatim at first write for subsequent reuse.

At the current stage of development, our own prototype compiler (http://sf.net/p/obfusc) has near total coverage of ANSI C with GNU extensions, including statements-as-expressions and expressions-as-statements. It lacks longjmp, computed goto and global data shared across different compilation units (a linking issue).

XI. CONCLUSION
How to compile compound and nested C data structures for encrypted computing, extending existing compiler-based 'obfuscation' in this context, has been set out here. A single compiler principle is proposed – if any arithmetic instruction that writes is emitted, then it must be varied by the compiler to the maximal extent possible from recompilation to recompilation. Then the compiler is 'best possible' in terms of introducing entropy beneath the encryption in a program runtime trace, and that is what provides protection against decryption attempts in this context. The quantitative theory improves the existing 'cryptographic semantic security relative to the security of the encryption' result for encrypted computing.

REFERENCES

[1] ISO/IEC, "Programming languages – C," International Organization for Standardization, 9899:201x Tech. Report n1570, Aug. 2011, JTC 1, SC 22, WG 14.
[2] P. Breuer and J. Bowen, "A fully homomorphic crypto-processor design: Correctness of a secret computer," in Proc. Int. Symp. Eng. Sec. Softw. Sys. (ESSoS'13), ser. LNCS, no. 7781. Heidelberg/Berlin: Springer, Feb. 2013, pp. 123–138.
[3] O. Kömmerling and M. G. Kuhn, "Design principles for tamper-resistant smartcard processors," in Proc. USENIX Work. Smartcard Tech., May 1999, pp. 9–20.
[4] M. Buer, "CMOS-based stateless hardware security module," Apr. 2006, US Pat. App. 11/159,669.
[5] P. Breuer, J. Bowen, E. Palomar, and Z. Liu, "On security in encrypted computing," in Proc. 20th Int. Conf. Info. Comm. Sec. (ICICS'18), ser. LNCS, D. Naccache et al., Eds., no. 11149. Cham: Springer, Oct. 2018, pp. 192–211.
[6] S. Goldwasser and S. Micali, "Probabilistic encryption & how to play mental poker keeping secret all partial information," in Proc. 14th Ann. ACM Symp. Th. Comp., ser. STOC'82. ACM, 1982, pp. 365–377.
[7] J. Katz, A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996, ch. 10, sec. 2.2.
[8] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, "Fully homomorphic encryption over the integers," in Proc. 29th Ann. Int. Conf. Th. Appl. Crypto. Tech. (EUROCRYPT'10). Springer, 2010, pp. 24–43.
[9] R. L. Rivest, L. Adleman, and M. L. Dertouzos, "On data banks and privacy homomorphisms," Foundations of Secure Computation, Academia Press, pp. 169–179, 1978.
[10] C. Gentry, "Fully homomorphic encryption using ideal lattices," in Proc. 41st Ann. ACM Symp. Th. Comp. (STOC'09), NY, 2009, pp. 169–178.
[11] P. Breuer, J. Bowen, E. Palomar, and Z. Liu, "On obfuscating compilation for encrypted computing," in Proc. 14th Int. Conf. Sec. Crypto. (SECRYPT'17), P. Samarati, M. S. Obaidat, and E. Cabello, Eds., INSTICC. SCITEPRESS, Jul. 2017, pp. 247–254.
[12] W. J. Cody, "Analysis of proposals for the floating-point standard," Computer, no. 3, pp. 63–68, 1981.
[13] D. Goldberg, "What every computer scientist should know about floating-point arithmetic," ACM Comput. Surv., vol. 23, no. 1, pp. 5–48, Mar. 1991.
[14] P. Breuer, J. Bowen, E. Palomar, and Z. Liu, "Superscalar encrypted RISC: The measure of a secret computer," in Proc. 17th Int. Conf. Trust, Sec. & Priv. in Comp. & Comms. (TrustCom'18). IEEE Comp. Soc., Aug. 2018, pp. 1336–1341.
[15] ——, "A practical encrypted microprocessor," in Proc. 13th Int. Conf. Sec. Crypto. (SECRYPT'16), C. Callegari, M. van Sinderen, P. Sarigiannidis, P. Samarati, E. Cabello, P. Lorenz, and M. S. Obaidat, Eds., vol. 4. SCITEPRESS, Jul. 2016, pp. 239–250.
[16] J. Daemen and V. Rijmen, The Design of Rijndael: AES – The Advanced Encryption Standard. Springer, 2002.
[17] N. G. Tsoutsos and M. Maniatakos, "Investigating the application of one instruction set computing for encrypted data computation," in Proc. Int. Conf. Sec., Priv. Appl. Crypto. Eng. Springer, 2013, pp. 21–37.
[18] ——, "The HEROIC framework: Encrypted computation without shared keys," IEEE Trans. CAD IC Sys., vol. 34, no. 6, pp. 875–888, 2015.
[19] P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in Proc. Int. Conf. Th. Appl. Crypto. Tech. (EUROCRYPT'99), ser. LNCS, J. Stern, Ed., no. 1592. Heidelberg/Berlin: Springer, 1999, pp. 223–238.
[20] F. Irena, D. Murphy, and S. Parameswaran, "CryptoBlaze: A partially homomorphic processor with multiple instructions and non-deterministic encryption support," in Proc. 23rd Asia S. Pac. Des. Autom. Conf. (ASP-DAC). IEEE, 2018, pp. 702–708.
[21] S. Rass and P. Schartner, "On the security of a universal cryptocomputer: The chosen instruction attack," IEEE Access, vol. 4, pp. 7874–7882, 2016.
[22] A. Biryukov, "Known plaintext attack," in Encyclopedia of Cryptography and Security, H. C. A. van Tilborg and S. Jajodia, Eds. Boston, MA: Springer, 2011, pp. 704–705.
[23] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, "Meltdown," ArXiv e-prints, Jan. 2018.
[24] P. Kocher, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, "Spectre attacks: Exploiting speculative execution," ArXiv e-prints, Jan. 2018.
[25] Y. Sundblad, "The Ackermann function. A theoretical, computational, and formula manipulative study," BIT Num. Math., vol. 11, no. 1, pp. 107–119, Mar. 1971.
[26] A. C.-C. Yao, "How to generate and exchange secrets," in IEEE, 1986, pp. 162–167.
[27] T. Ishihara and F. Fallah, "System and method for providing a way memoization in a processing environment," Fujitsu Ltd, Apr. 27, 2006, US Patent App. 10/970,882.
[28] R. Ostrovsky, "Efficient computation on oblivious RAMs," in Proc. 22nd Ann. ACM Symp. Th. Comp. ACM, 1990, pp. 514–523.
[29] M. Barr, "Memory," in Programming Embedded Systems in C and C++, 1st ed., A. Oram, Ed. Sebastopol, CA: O'Reilly & Associates, Inc., 1998, ch. 6, pp. 64–92.
[30] P. Breuer and J. Bowen, "Certifying machine code safe from hardware aliasing: RISC is not necessarily risky," in Softw. Eng. and Formal Methods, ser. LNCS, S. Counsell and M. Núñez, Eds., no. 8368. Heidelberg: Springer, 2014, pp. 371–388, Proc. SEFM 2013 Collocated Work. (OpenCert'13).
[31] ——, "Avoiding hardware aliasing: Verifying RISC machine and assembly code for encrypted computing," in