Obfuscation using Encryption
Johannes Schneider* and Thomas Locher†
*University of Liechtenstein, Liechtenstein, {firstname.lastname}@uni.li
†ABB Corporate Research, Baden-Daettwil, Switzerland, {firstname.lastname}@ch.abb.com

Abstract—Protecting source code against reverse engineering and theft is an important problem. The goal is to carry out computations using confidential algorithms on an untrusted party while ensuring confidentiality of algorithms. This problem has been addressed for Boolean circuits, known as 'circuit privacy'. Circuits corresponding to real-world programs are impractical. Well-known obfuscation techniques are highly practicable, but provide only limited security, e.g., no piracy protection. In this work, we modify source code, yielding programs with adjustable performance and security guarantees ranging from indistinguishability obfuscators to (non-secure) ordinary obfuscation. The idea is to artificially generate 'misleading' statements. Their results are combined with the outcome of a confidential statement using encrypted selector variables. Thus, an attacker must 'guess' the encrypted selector variables to disguise the confidential source code. We evaluated our method using more than ten programmers as well as pattern mining across open source code repositories to gain insights into (micro-)coding patterns that are relevant for generating misleading statements. The evaluation reveals that our approach is effective in that it successfully preserves source code confidentiality.
I. INTRODUCTION
Intellectual property, e.g., in the form of algorithms, is costly to develop and it is always at risk of being stolen. In particular, in an untrusted cloud environment, algorithms holding valuable expert knowledge are susceptible to theft: If the cloud infrastructure is compromised, an attacker can steal (compiled) program code. Insider attacks are also a real threat. A typical approach to (partially) protect intellectual property is to obscure it in such a way that it becomes difficult to figure out its purpose and functionality. However, such obfuscation does not solve the problem satisfactorily, as the functioning of the algorithm can still be observed and analyzed, even to the point where the whole algorithm and its parameters are reconstructed. Thus, an attacker with access to the cloud can still steal an algorithm and execute it without modification. In turn, Boolean circuits can be cryptographically protected, e.g., using fully homomorphic encryption. An attacker obtaining a 'protected' circuit cannot evaluate it. However, computations on Boolean circuits alone are considered highly impractical due to the large size of circuits that are needed for complex functionality. We tackle the problem at a higher level by encrypting statements in source code instead of individual gates. Our primary focus is on encryption of source code as written by the programmer (or intermediate code). Our obfuscation technique yields guarantees analogous to those for Boolean circuits, i.e., candidate indistinguishability obfuscators, using higher-level source code primitives. Moreover, we discuss how to obtain less secure obfuscations that run much faster. Obfuscation is done by a novel technique: we transform source code into "encrypted" source code by adding a multitude of additional misleading statements (and variables) combined with selector variables. Selector variables are encrypted binary variables that "choose" the right result among misleading and confidential statements.
Since the selector variables are encrypted, it is generally non-trivial to determine which statements merely serve the purpose of misleading an attacker and which statements actually contribute to the computation of the result. In particular, current de-obfuscation methods and tools are unable to do so. The difficulty of de-obfuscation or extracting the confidential source code depends on the background knowledge of the attacker about the program to de-obfuscate as well as on the choice of the misleading statements. We also provide an analysis of several open source code programs to determine patterns that might have to be observed when creating obfuscated source code. Finally, we give a brief assessment of our method using a small-scale experiment with actual programmers whose task was to break a simple encrypted piece of code. The programmers were very far from breaking our obfuscation scheme despite the fact that we used a rather limited degree of source code encryption. Thus, the results provide some evidence that the proposed method is indeed effective.

II. MODEL AND PROBLEM
A client wishes to execute confidential source code in an untrusted environment. The data might or might not be confidential, but we assume that the source code computes on encrypted data. Typically, the clients encrypt code (and data) and send both to an untrusted server. The server performs possibly multiple computations using the encrypted code and various input data. It returns encrypted results to clients, where the results are decrypted.

Intuitively, the mechanism is secure if an adversary having full access to the server (including CPU registers, memory, hard drives and the obfuscated code) does not learn anything about the source code or about the data. However, in practice, it might be sufficient if only certain parts or aspects of a piece of software remain confidential, e.g., the call graph or the architecture. We focus on the case where an attacker should learn as little as possible about how the computation of the output works.

We assume that all encryptions are perfectly secure, i.e., an attacker cannot gain any knowledge from encrypted data. Obfuscation maps a confidential program P_C from a program class C_C to another program from a program class C_O, i.e., we can express an obfuscator O as O: C_C → C_O. As we shall discuss later, the relationship is non-injective, i.e., every program P_O ∈ C_O maps to some set of programs in C_C. Whereas operations in the given program P_C operate on plaintexts, obfuscated programs in C_O perform computations on encrypted data. For example, a program a · b with plaintext variables a, b might correspond to ENC(a) · ENC(b) on encrypted values, i.e., ENC(a), ENC(b) denote ciphertexts and the '·' operator performs multiplication of plaintexts using encrypted values only, such that DEC(ENC(a) · ENC(b)) = a · b, where DEC denotes decryption. For this example, knowing
ENC(a) · ENC(b) suffices to discover the program on plaintexts, i.e., a · b, since the operations performed on plaintexts can generally be inferred by an attacker holding the obfuscated program with operations on ciphertexts. For example, for the Goldwasser-Micali cryptosystem, multiplication of ciphertexts corresponds to an XOR of plaintexts. We assume that the attacker knows which cryptosystem is employed. For more general computations we need either fully homomorphic encryption or secure multi-party computation. Generally, merely encrypting data used for computation does not provide any security for algorithms.

The adversary might know the obfuscation algorithm and the program classes C_O and C_C. The task of the attacker is to assign to each program P from the program class C_C a probability p(P) stating his belief that the program P is the (or a) confidential program P_C. Thus, the goal of the attacker is to choose P to maximize p(P) without knowing the distribution p. However, an attacker might have some knowledge about the distribution. In particular, he might know coding patterns, e.g., covering a few lines of code, that are more likely to occur than other lines of code. Generally, finding a program P′ that differs only slightly from the confidential program P_C might also be satisfactory. An attacker is said to successfully break our scheme if he can do so using any computation taking polynomial time in the size of the largest program in C_O.

A. Indistinguishability Obfuscation
Indistinguishability obfuscation (see Definition 1 in [13]) essentially says that the obfuscations of two circuits look 'equivalent' to an attacker. This definition focusing on circuits can be extended to more general programs composed of more complex commands than Boolean gates. By contrast, we also use additional misleading input data for a program that has no impact on the result, e.g., a function f(x) = x can be extended to f(x, y) = x + 0 · y. Therefore, we assume that any program can be called with any superset of the required input to compute the result. A program class C_C is an arbitrary set of programs. We discuss program classes and their relation to our obfuscation technique later (see Section III-C). More formally, we define similarly to Definition 1 in [13]:

Definition 1 (Indistinguishability Obfuscator (iO)). A uniform Probabilistic Polynomial-Time (PPT) machine iO: C_C → C_O is called an indistinguishability obfuscator for program class C_C if the following is satisfied:
i) For all P ∈ C_C and inputs x, we have Pr[DEC(P′(ENC(x))) = P(x) : P′ ← iO(P)] = 1.
ii) For any PPT distinguisher D, there exists a negligible value ε such that for all pairs of programs P, P′ ∈ C_C: if P(x) = P′(x) for all inputs x, then |Pr[D(iO(P)) = 1] − Pr[D(iO(P′)) = 1]| ≤ ε.

Condition (i) essentially demands that the obfuscated program and the non-obfuscated program produce the same outcome. Condition (ii) says that knowing an obfuscated program is of no use to an attacker. This definition by itself is not sufficient to ensure confidentiality without using proper program classes. In other words, an obfuscator satisfying the above definition does not necessarily protect an algorithm well.
In particular, the confidential program is not protected sufficiently if the program class is not meaningful in the sense that many programs in the program class can easily be ruled out as the confidential program (see Section III-C). Furthermore, Condition (ii) in the above definition (and also in [13]) focuses only on avoiding the leakage of programs that behave identically with respect to outputs, i.e., P(x) = P′(x). In practice, any program with very similar input-output behavior to the confidential program might also be deemed confidential.

B. Program Definition
We keep to a simple yet sufficient definition. Assume that a program P of length n is a sequence of statements s_0, s_1, s_2, ..., s_{n−1}. A statement s_i is an assignment of a (simple) expression to a variable, i.e., a statement s could be r := a · b with expression a · b and result variable r. A simple expression consists of a single operation and two (input) variables. We generally write it in the form Op(Input_1, Input_2), e.g., MUL(a, b) denotes the multiplication a · b. The program returns the result, i.e., the variable, of the last statement. Later (see Definition 2), we introduce in more detail combining statements, which aggregate the results of several misleading statements and one confidential statement. Note that this simple definition of a statement also covers control-flow statements for code operating on encrypted data (only), since conditional statements must be transformed to hide the outcome of the branching conditions. For example, loop unrolling is done using an upper bound on the possible number of iterations.

For illustration, consider a program to square a number a. Let program P(a) consist of a single statement b := MUL(a, a) with input value a ∈ {0, 1, ...}. Assume the set of possible operations in the used programming language is given by S_O := {MUL, ADD, DIV} and there is just one variable a. A simple expression consists of an operation and its input variables, e.g., MUL(a, a). If we define the set of programs to be all programs with one simple expression and one input variable, we could define the program class as P = {(b := MUL(a, a)), (b := ADD(a, a)), (b := DIV(a, a))}. If we use two statements, where the first defines some variable, we get P = {(b := MUL(a, a); c := MUL(a, b)), (b := MUL(a, a); c := ADD(a, b)), ...}.
Out of these programs, three actually perform the desired functionality of squaring a number, namely (b := MUL(a, a); c := MUL(a, a)), (b := ADD(a, a); c := MUL(a, a)), and (b := DIV(a, a); c := MUL(a, a)). The number of simple expressions given |V| variables and |S_O| operations taking t input variables can be bounded by |S_O| · |V|^t. Any of the prior programs that squares a number might serve as an obfuscated program in the traditional sense (i.e., without using encryption). But standard dead code elimination would already remove the added misleading statement for obfuscation. Next, we introduce a more sophisticated technique that makes use of encrypted selector variables.

III. SOURCE CODE OBFUSCATION THROUGH ENCRYPTION
We distinguish two levels of source code encryption, differing in the unit of abstraction they seek to conceal: Program-level encryption considers programs as a whole. A finer granularity is given by hiding individual statements.
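The program-level variant (run every program in the class, pick the confidential result with encrypted selector bits) can be sketched as follows. This is a toy illustration, not the paper's implementation: the `Enc` wrapper only hides values behind an object and stands in for a homomorphic encryption scheme (a real deployment would use fully homomorphic encryption or secure multi-party computation).

```python
# Toy sketch of program-level selection: all programs in the class are
# executed on encrypted input and the confidential result is chosen via
# encrypted selector bits. "Enc" is a placeholder, not real encryption.
class Enc:
    def __init__(self, v):
        self.v = v                      # plaintext, hidden only by convention
    def __add__(self, other):
        return Enc(self.v + other.v)    # stands in for homomorphic addition
    def __mul__(self, other):
        return Enc(self.v * other.v)    # stands in for homomorphic multiplication

def dec(c):                             # decryption, done by the client only
    return c.v

programs = [lambda a: a * a,            # P_1 = P_C: MUL(a, a)
            lambda a: a + a]            # P_2: ADD(a, a), misleading
selectors = [Enc(1), Enc(0)]            # encrypted selector bits, b_1 = 1

x = Enc(5)
results = [P(x) for P in programs]      # run *all* programs on ENC(x)
r = results[0] * selectors[0] + results[1] * selectors[1]
assert dec(r) == 25                     # only P_C's output survives
```

Since the selector bits and all intermediate results stay wrapped, an observer of the execution cannot tell which of the executed programs produced `r`.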
A. Program Level
The idea to conceal the confidential program is to run (all) possible programs (not just those yielding the desired output), i.e., all programs from class C_C. Then we select the confidential result out of all results using encrypted binary selector variables. Since all programs are run and only the result is selected, an attacker obtains no information about which of the executed programs yields the result. Thus, the class C_O consists of just one program executing all programs in C_C using encrypted inputs. It is important that the order of computation of the programs is randomized, i.e., we randomly permute all programs and then evaluate them.

Let ENC(r_i) be the encrypted result of program P_i ∈ C_C for an input x. Define b_i ∈ {0, 1} to be a selector bit for program i, i.e., b_i = 1 if i = i* and 0 otherwise. Let P_{i*} = P_C be the confidential program, e.g., with the functionality to square a number. We say P_{i*}(x) = r* yields the desired result r*. The final encrypted result r* can be obtained by ENC(r*) := Σ_i ENC(b_i) · ENC(r_i), where '·' is a multiplication on encrypted data and the summation is also performed on encrypted values. The client can decrypt ENC(r*) to get r*. For example, say P_1 = P_C := {MUL(a, a)} and P_2 := {ADD(a, a)}. We set b_1 = 1 and b_2 = 0. Therefore, ENC(r*) := ENC(MUL(a, a)) · ENC(b_1) + ENC(ADD(a, a)) · ENC(b_2) = ENC(MUL(a, a)).

Since the bits b_i as well as the results are encrypted, an attacker does not know which of the results is chosen. It is immediate that such an obfuscator satisfies Definition 1, given that all secure computation mechanisms are secure: we fulfill condition (i) due to the definition of the selector variables, which results in choosing the output of the confidential program for any input among all executed programs. We fulfill (ii) since there is essentially just one obfuscated program (comprising all programs), i.e.,
any two programs P′ and P map to the same obfuscated program.

Running entire programs and selecting the result is not ideal. When choosing k programs with similar running times or computational costs, the running time increases by a factor of k. But an attacker only has to choose one out of k programs, i.e., the attacker breaks the "encryption" with probability 1/k by guessing. We would like to have at least an exponential relationship between the computational cost and the difficulty to guess, e.g., 1/2^k. The key idea to reach this goal is to apply selector bits per statement rather than picking entire programs.

B. Statement Level
An attacker should not be able to infer a specific statement, even given that he knows all prior and all following statements. Our strategy is analogous to that for (entire) programs: In order to disguise a statement s, we create a set of k − 1 'misleading' statements M_i. Then all statements {M_1, ..., M_{k−1}, s} are randomly permuted. After that, they are executed in that order. We choose the correct result, i.e., that of the confidential statement s, using binary selector variables b_i. More precisely, we compute the combining statement, which is defined as follows:

Definition 2.
The combining statement for a set of statements S = {S_C, M_1, ..., M_{k−1}} with confidential statement S_C and misleading statements M_i is given by

ENC(r*) := Σ_{s ∈ S} ENC(b_s) · ENC(r_s),

where r_s is the result of statement s and the b_s are binary (selector) variables such that b_s = 1 for s = S_C and 0 otherwise.

For example, for the statement c := MUL(a, a), we could choose a misleading statement ADD(a, a) and use the combining statement c′ := MUL(a, a) · b_1 + ADD(a, a) · b_2 with binary selector variable b_1 being 1 and b_2 = 1 − b_1 being 0. Note that since b_1 = 1 and b_2 = 0, we have c = c′. If the selector variables b_1, b_2 as well as other variables and subexpressions are encrypted, i.e., we compute ENC(c′) := ENC(MUL(a, a)) · ENC(b_1) + ENC(ADD(a, a)) · ENC(b_2) (with ·, + operating on encrypted data), then an attacker is not able to discover the true statement without knowing b_1 (or b_2).

In contrast to program-level security (Section III-A), we just execute a single but larger program. Generally, for the same level of security, this single program is much more compact (and efficient to evaluate) than the concatenation of all programs used for program-level security. Roughly speaking, for programs of length n, where each statement could be assigned one out of k expressions, the number of feasible programs is k^n. The concatenation of all misleading statements is of length k · n, since for each of the n statements there are k options. This concatenation covers all k^n feasible programs. Therefore, by executing k · n rather than n statements, an attacker has to choose from k^n options. This polynomial relationship between computational costs and the number of possible programs is much better than the linear relationship presented in the prior Section III-A.
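The cost/search-space gap argued above is easy to check numerically; a small sketch with illustrative parameter values (k and n are made-up, not taken from the paper):

```python
# Statement-level obfuscation executes k options per statement, i.e.,
# k*n statements in total for a program of length n, while the attacker
# must pick the right option at every statement and therefore faces
# k**n candidate programs.
k, n = 10, 20                  # options per statement, program length
executed = k * n               # statements actually executed
search_space = k ** n          # candidate programs the attacker faces
assert executed == 200
assert search_space == 10 ** 20
# Program-level obfuscation with a comparable budget runs ~200 whole
# programs, so the attacker guesses correctly with probability 1/200
# instead of 1/10**20.
```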
Therefore, we focus on statement-level security.

So far, we have focused on obfuscating a single confidential statement by combining its result with the results of several misleading statements. In principle, one could also generate 'misleading combining statements' consisting of misleading statements only, i.e., the entire statement is irrelevant. Thus, these combining statements merely serve the purpose of confusing the attacker (and hiding the true length of the confidential program). In fact, such 'misleading combining statements' might make comprehension more difficult. But they do not make the life of the attacker harder when it comes to obtaining a program with the same input-output behavior. The attacker does not have to guess any misleading statement correctly, since all of them have no impact on the output.

Still, the number of possible expressions k can be exceedingly large, rendering the execution of k · n statements practically infeasible. Therefore, we can only choose a subset of all possible statements, as discussed next.

C. Program Classes
There is an interesting relationship between an obfuscated program P_O and program classes C_C containing the confidential program P_C. An obfuscated program P_O using binary selector variables defines a program class C_C, i.e., a set of programs containing the confidential program, that could have been mapped by the obfuscator to create P_O.

For example, the confidential program P_C = {r := MUL(a, a)} could have been mapped to the encrypted program P_O := {r_1 := MUL(ENC(a), ENC(a)); r_2 := ADD(ENC(a), ENC(a)); r := r_1 · ENC(b_1) + r_2 · ENC(b_2)}, where MUL, ADD compute the multiplication and addition of plaintexts using encrypted values. The encrypted program P_O defines the program class C_C consisting of {MUL(a, a), ADD(a, a)}. The program class defined by the encrypted program depends on the details of the obfuscation algorithm, its parameters and the input program. Intuitively, large program classes C_C seem preferable, since they enlarge the search space for the attacker. Indistinguishability obfuscation ensures that an attacker seeing the obfuscated program cannot determine any information about which program of all programs in the program class C_C is the confidential program. But indistinguishability obfuscation alone does not necessarily give any practical security guarantees if the program class C_C is known to the attacker and there exist many programs that are close to the confidential program. For example, if all programs in the program class C_C (including the confidential program) yield (almost) the same output on any input, then an attacker can pick any program from the class to get (almost) the same functionality. Moreover, it might be possible to exclude many programs from the program class C_C through simple reasoning that might not even require background knowledge. Certain programs might be executable and yield valid outputs, but their semantics might seem unreasonable.
An example is a program that returns the same output irrespective of the input. An attacker can safely ignore such programs, thereby reducing the size of the program class containing the confidential algorithm. It is generally not clear how to best choose the program class so that the attacker cannot easily eliminate many programs from the class with more or less simple analysis or background knowledge. There are many factors that come into play. We discuss mechanisms to create obfuscated programs in the next section.

IV. OBFUSCATION METRICS
Our metrics primarily follow (but enhance) the metrics for (standard) obfuscation [10]. We introduce a new metric that characterizes the quality of a program class.
Code potency: Obfuscated code can be assessed using traditional code complexity measures, e.g., based on control flow and data access. One might also use parameters from the obfuscation process itself, which illustrate the complexity of the encrypted code. For example, if we choose (up to) k misleading statements for each statement, we might use a metric such as a "mislead factor". This is analogous to the security parameter for encryption.

Resilience: This is the ability to withstand attacks using automated tools (maybe given some background knowledge). For example, an attacker could try to reconstruct the program using known-plaintext attacks, i.e., he knows the input of the function and a few outputs. The attacker could then simply try each program from the encryption space and return all matching ones. Another example involves knowledge of coding guidelines used in the program or having unencrypted code of the same programmer at hand. Programmers typically have their own style that might be comparable to a signature identifying the programmer. Thus, an attacker could search for patterns in the encrypted code that match the coding guidelines (or the patterns matching the programmer's coding style). This could help to reduce the search space of possible non-obfuscated programs given an obfuscated program. Two metrics have been proposed in the literature [10]: i) programmer effort to construct an automatic de-obfuscator; ii) de-obfuscator effort: execution time and space required by the de-obfuscator. One might also consider a metric for measuring potency reduction, stating how effective a de-obfuscator is. For example, for statement-level security it could measure the "mislead factor reduction", i.e., the (average) number of misleading (or junk) statements that the de-obfuscator could identify per statement that has been obfuscated.
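The known-plaintext attack mentioned under Resilience can be made concrete: the attacker enumerates the disclosed program class and keeps every program consistent with the observed input/output pairs. The tiny program class below is purely illustrative:

```python
# Known-plaintext attack sketch: filter the program class by observed
# input/output behavior; the surviving set is what the attacker must
# still distinguish among. Toy class with three candidate programs.
program_class = {
    "MUL(a,a)": lambda a: a * a,
    "ADD(a,a)": lambda a: a + a,
    "DIV(a,a)": lambda a: a // a,
}
observed = [(3, 9), (5, 25)]          # known (input, output) plaintexts

surviving = [name for name, P in program_class.items()
             if all(P(x) == y for x, y in observed)]
assert surviving == ["MUL(a,a)"]      # the class collapses to one match
```

This is why the choice of the program class matters: if only one candidate survives a few plaintext pairs, the obfuscation offers little practical protection.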
Program class quality: On a high level, a program class should contain programs that are sufficiently different from the confidential program, such that knowing an obfuscated program is of no value to an attacker. Given a program from the program class, an attacker should not easily be able to judge whether or not it is the confidential program. Since our obfuscated program using encrypted selector variables discloses the program class, both aspects are important. Giving a formal definition of a program class is difficult, since it depends on the semantics of a program. However, we might reason about the attacker's capabilities and define the quality of the program class in terms of the probability estimates p(P) of a rational attacker that a specific program is confidential. A rational attacker would leverage all its knowledge about the confidential program, the obfuscation algorithm, etc. to compute a probability distribution p(P) stating the probability estimate that a program P is confidential. To determine the quality of the program class given the distribution of an attacker, we define the rank r of a program P as the number of programs P′ with p(P′) ≥ p(P). We define the quality of program class C_C with a subset S ⊆ C_C of programs, such that all of the programs in S are considered confidential, as Q(C_C) := 1 − 1/r, with r being the minimum rank of any program P ∈ S. The quality ranges between 0 and 1. It is 0 if the attacker correctly assigns a (or the) confidential program the largest probability to be confidential. It approaches 1 if the attacker believes it is the least likely. The motivation is that a very reasonable behavior for an attacker given the probability distribution p(P) is to perform an in-depth investigation of the programs sorted by their probability estimate.
For example, assuming the attacker has access to some plaintexts (input and output), he might check one program after the other, testing whether its input-output behavior matches the known plaintexts. Thus, the time needed to infer the confidential program depends directly on our quality measure. Defining the set of confidential programs S is application dependent. For example, given one confidential program P_C, one might define the set S as all programs having the same input-output behavior for all inputs, i.e., S = {P | ∀x : P(x) = P_C(x)}.

Stealth refers to the difficulty for humans to de-obfuscate programs, meaning that junk statements should not be easily discoverable by a human. The ability to discover them significantly depends on the knowledge a human has about the program (and surrounding statements).
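The rank-based quality measure Q(C_C) = 1 − 1/r introduced for program class quality can be sketched directly. The probability estimates below are made-up toy values, and rank is taken as the number of programs the attacker deems at least as likely as the target (so rank 1 means the attacker's top guess is correct, matching the stated boundary cases):

```python
def rank(p, target):
    """Number of programs considered at least as likely as `target`."""
    return sum(1 for prog in p if p[prog] >= p[target])

def quality(p, confidential_set):
    """Q(C_C) = 1 - 1/r with r the minimum rank over the confidential set."""
    r = min(rank(p, P) for P in confidential_set)
    return 1 - 1 / r

p = {"P_C": 0.2, "P_1": 0.5, "P_2": 0.3}    # attacker's estimates (toy)
assert rank(p, "P_C") == 3                   # P_C ranked last of three
assert quality(p, {"P_C"}) == 1 - 1 / 3      # high quality: attacker misled

p_bad = {"P_C": 0.9, "P_1": 0.05, "P_2": 0.05}
assert quality(p_bad, {"P_C"}) == 0          # top guess correct -> quality 0
```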
Execution cost: For instance, the overhead factor, being the ratio of the (average) execution time of the non-obfuscated and the obfuscated program.

V. PRACTICAL OBFUSCATION ON THE STATEMENT LEVEL
Executing all possible (simple) expressions for a real-world programming language instead of just a single one for each statement in a program is infeasible for even moderately large programs. Thus, we introduce a weaker form of security by adding fewer misleading statements in order to improve performance. Choosing these misleading statements in a way that is hard to disguise is non-trivial. One might have to take into account existing patterns in code and, potentially, remove these coding patterns to create a more uniform-looking encrypted code that is less sensitive to statistical attacks (Section V-A). We aim at schemes that make it possible to strike a balance between several metrics, in particular security and performance. We discuss metrics for obfuscation and security (Section IV) as well as what kind of information of a real-world program might be worth hiding (Section ??).

A. Preprocessing - Source Code Uniformization
Before obfuscating source code, we might modify the code through several preprocessing operations that do not change the semantics of the code but make it more 'uniform' in the sense that patterns consisting of potentially multiple statements are removed. We might also avoid generating highly unlikely patterns. Source code might contain patterns due to preferences of a programmer or due to the nature of the problem. Programmer preferences might be found in certain command constructs used to solve frequently reoccurring tasks. Patterns can be mined and used for statistical attacks. Assume the obfuscated statement contains two types of comparison expressions, a ≠ b and a = b, out of which one is misleading. Assume we know that a programmer almost never uses '≠' but almost always uses 'not' and '=' to express inequality, i.e., ¬(a = b). This information makes it easier to find the non-obfuscated code. In another example, a programmer might also prefer to state a loop condition involving integer I and loop variable x as x ≤ I − 1 rather than x < I. The former condition x ≤ I − 1 is expressed using two statements, J := I − 1 and x ≤ J. Therefore, when trying to determine a statement out of a set of misleading statements and seeing a condition x ≤ J, an attacker might investigate the preceding statements to check whether one of them contains a statement of the form J := I − 1; if so, this is an indication that the code contains x ≤ I − 1. Both cases are easy to detect automatically in source code of typed languages. We can enforce the use of, e.g., '≠' and x < I by rewriting the code without changing the semantics. In fact, some cases might even be covered by compiler optimizations. This might reduce the possibilities for an attacker to exploit prior knowledge about certain programming preferences.

B. Misleading Code Generation
An essential question for obfuscation relates to the choice of misleading statements to satisfy the metrics in Section IV. Several considerations apply:
Number of misleading statements: Choosing relatively few (misleading) statements might cause only little overhead due to obfuscation but put security in jeopardy.
Execution costs of a misleading statement: It may be preferable to choose misleading statements that do not require a lot of computation. For example, rather than executing a 'sort' command on a list as a misleading statement, one might remove the first element. The first command requires O(l log l) time for a list of length l, whereas the amortized time for the second command is only O(log l) [23].

Similarity of misleading and confidential statements: One strategy is to choose misleading statements independently of the existing source code. However, independently chosen statements might be easy to identify, since they might appear as outliers. For example, given a complex function of many simpler mathematical functions, such as sine, square root, division and so forth, it is not recommended to add misleading statements manipulating strings. It seems likely that a string expression is identified as not being part of the confidential program. To make the junk statements hard to identify, potentially given some knowledge about the application or source code structure, it seems favorable to choose misleading code that is similar to the actual source code.
Intermediate code vs. source code: One might add statements for intermediate code (e.g., Java bytecode) or plain source code. Generally, in our scenario only intermediate code is given to an untrusted party. An attacker typically decompiles it. Modifying intermediate code might cause decompilers to fail. Therefore, an attacker might have to work on intermediate code itself, which is considerably harder. Compilers can also generate intermediate code according to different metrics, for example with the goal to minimize the amount of code. In our case, in order to minimize the absolute performance impact of obfuscation, it seems preferable to generate short programs. As an example, if we add three junk statements per statement in a short program, the absolute increase in running time is less than if we do the same for a long program. Furthermore, translation to intermediate code might remove certain coding patterns that could help in statistical attacks (see Section V-A). Thus, if our obfuscation technique does not account for dependencies, an attacker might be able to rule out some misleading statements. In intermediate code, such patterns might be removed.
Simplifying complex expressions: One has to decide which constructs in the source code should be enhanced by misleading 'statements'. Depending on the programming language, statements are allowed to vary a lot in complexity: they could be decomposable into many simple expressions. Disguising a complex statement as a whole also requires a statement of similar complexity (otherwise an attacker might assume that very complex statements are non-junk). Trying to create complex misleading statements seems rather challenging, since a single subexpression that appears as an outlier might be sufficient for an attacker to dismiss the entire complex statement as 'misleading'. Furthermore, in the most extreme case an entire program is expressed in one statement, which results in similar disadvantages as discussed when obfuscating entire programs compared to individual statements (Section III-A). Therefore, it might be preferable to create a uniform source code representation of simple statements, such as three-address code, and obfuscate them.
Syntactic and semantic correctness: For three-address code, a junk statement comprises an operation and two operands. Clearly, the operation must be executable for the operands. This implies that the operands must have a type that is supported by the operation. But the values of the operands must also be supported: for example, when adding a division as a junk statement, the divisor should not be zero.
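A minimal sketch of such a validity check, assuming the obfuscator has a sample value for each variable (the function name and the fallback policy are illustrative, not part of our scheme):

```python
import random

def safe_junk_statement(variables, rng=random.Random(0)):
    """Pick a junk statement that is type- and value-safe.

    variables: dict name -> sample value observed for that variable.
    Only numeric variables are used as operands, and a division is
    only emitted if a provably non-zero divisor is available.
    """
    op = rng.choice(["+", "-", "*", "/"])
    numeric = [n for n, v in variables.items() if isinstance(v, (int, float))]
    a = rng.choice(numeric)
    if op == "/":
        # semantic correctness: the divisor must be non-zero
        nonzero = [n for n in numeric if variables[n] != 0]
        if nonzero:
            b = rng.choice(nonzero)
        else:
            op, b = "+", rng.choice(numeric)  # fall back to a safe operation
    else:
        b = rng.choice(numeric)
    return f"junk := {a} {op} {b}"

# y == 0, so "/ y" is never emitted; the string s is never an operand
print(safe_junk_statement({"x": 3, "y": 0, "s": "hi"}))
```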
Operation and operands obfuscation: We can create misleading statements by using the same operations but different operands, different operations with the same operands, or by varying both. E.g., for the statement a · b, we could add a + b or a − b to disguise the operation only, and x · y or e · f to disguise the operands only. In order to combine both, we might create temporary variables involving several variables with selector variables and then use these temporary variables in various operations. This yields the largest search space for an attacker. For example, we can set temporary variables t1 and t2 to be t1 := b1 · a + (1 − b1) · x and t2 := b2 · b + (1 − b2) · y for selector variables b1, b2. We can use these temporary variables in statements t1 · t2 and t1 + t2. This yields a total of eight possibilities of statements. More generally, if both temporary variables can be chosen out of k variables and the operations can be chosen out of o operations, we get k² · o possibilities.

Patterns and statement dependency: Statements are generally not independent. Given a single statement, the probability of the next statement is not uniform across all statements. Part of the dependency is due to the data type, that is, only certain statements are feasible for certain types. For example, given that we know a statement performs string concatenation, it seems more likely that among the next statements there is another string operation rather than a trigonometric function. There might be common patterns consisting of a sequence of a few statements that are more likely than others, either in general or depending on a specific application domain or programmer. Finding the optimal strategies of the attacker and the obfuscator to use (or counteract) the knowledge of patterns could be cast as the problem of finding a mixed-strategy Nash equilibrium, i.e., we view the obfuscator and the attacker as players.
A formal treatment of Nash equilibria is beyond the scope of this work and we refer to textbooks on game theory covering these concepts, e.g., [8]. To get some intuition, consider the following scenario. There are n different statements S that are not uniformly distributed. There is one frequent statement F ∈ S with (large) likelihood p_l, and every other of the n − 1 statements has the same probability (1 − p_l)/(n − 1), i.e., we have p_l ≫ 1/n. Assume the strategy of the obfuscator is to generate one misleading statement M for each statement s_C belonging to the confidential code C by choosing it uniformly at random from S \ {s_C}, i.e., it does not choose the misleading statement to be the same as the confidential statement, since this provides no protection. Say an attacker wants to choose the correct statement for a pair of two statements s_C, M, with M being the misleading statement and s_C a statement of the confidential program. He would behave as follows to maximize the expected number of correctly inferred statements: If one of the two statements s_C, M equals F, he guesses F. If neither statement equals F, he chooses randomly among the two. Let us compute the probability that an attacker correctly infers a statement given two statements s_C, M of a large program. The attacker is always correct if the true statement s_C is F, which happens for a fraction p_l of all statements. If the true statement is not F but the misleading statement is F, the attacker will always predict wrongly. If neither the misleading statement nor the true statement is F, he predicts correctly with probability 1/2. Thus, the expected fraction of correctly predicted statements is p_l + (1 − p_l) · (1 − 1/(n − 1)) · 1/2. Now, assume that our obfuscator is aware of the true distribution of statements as well, and that it always chooses F as the misleading statement if s_C ≠ F. If the attacker does not alter his strategy, the expected fraction of correctly guessed statements is now only p_l.
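The expected accuracy derived above can be checked with a short Monte-Carlo simulation (n, p_l and the trial count below are arbitrary illustrative choices):

```python
import random

def simulate(n=20, p_l=0.4, trials=200_000, rng=random.Random(1)):
    """Monte-Carlo estimate of the attacker's accuracy against the
    uniform-obfuscator strategy; statement 0 plays the role of F."""
    statements = list(range(n))
    weights = [p_l] + [(1 - p_l) / (n - 1)] * (n - 1)
    correct = 0
    for _ in range(trials):
        s_c = rng.choices(statements, weights)[0]            # confidential
        m = rng.choice([s for s in statements if s != s_c])  # misleading
        pair = [s_c, m]
        # attacker: guess F if present, otherwise pick one at random
        guess = 0 if 0 in pair else rng.choice(pair)
        correct += guess == s_c
    return correct / trials

n, p_l = 20, 0.4
analytic = p_l + (1 - p_l) * (1 - 1 / (n - 1)) * 0.5
print(round(simulate(), 3), round(analytic, 3))  # the two values agree closely
```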
However, we could do better using a different strategy.

VI. EMPIRICAL EVALUATION
In Section V-B we introduced several general aspects of misleading code generation. In this section, we discuss some points in more detail and further provide a short evaluation by programmers. A key aspect is the structure of the code: How much dependency is there across statements? Does code contain patterns that must be taken into account when generating code? The second question is discussed next.
A. Patterns in Source Code
Patterns in source code might facilitate the decoding of code obfuscated using encryption, as discussed in the beginning of Section V. There is a sheer endless number of patterns one could look for, ranging from high-level design patterns to the use of individual operators. Since we operate primarily on a statement level, we compared the frequencies of patterns on a lower level. We looked at the distribution of operators, i.e., we counted how often an operator appeared in the code, since simple statements essentially consist of an operator and one or two variables. An even more particular statement involves an integer constant and an operator. We conjectured that the usage of integer constants might vary significantly across programmers, for example the usage of ≤ n − 1 vs. < n. Therefore, we looked at how integer constants are used within binary expressions, specifically what values they have and with what operator they are used. In order to get some intuition about more complex patterns, we looked at the composition of binary and unary expressions: What are the types of expressions and the operation that constitute a binary (or unary) expression?

For comparison, we chose five data mining frameworks implemented in Java, namely WEKA, ELKI, JSAT, Java-ML and Spmf. We compared all clustering algorithms per framework together (see Table I) as well as particular algorithms, namely k-Means and OPTICS clustering (for k-Means see Table III), to get a more robust estimate. We also compared the GUI parts of the frameworks (see Table II) to detect variation across different applications.

The less uniform code is across programmers and application domains, the more important it is to take particular coding patterns into account. When looking at the standard deviation and the mean for an application domain on its own (Tables II and III) for all the structural patterns, we see that the mean is considerably larger than the standard deviation. This shows limited dispersion and a certain uniformity.
Therefore, knowing that code stems from a particular framework helps in predicting a statement, but only to a rather limited degree. Generating misleading statements according to such a distribution seems helpful, i.e., we might choose misleading statements such that the overall distribution roughly stays the same. When comparing the means for the same patterns across different application domains, we see stronger variation. For example, the mean for the increment 'posIncrement NameE' for GUI and clustering varies by a factor of two. Still, the differences are not so large that a decoding of statements would be easily possible by merely knowing that we consider a specific type of application. Judging from the usage of integer constants and operators, there are strong deviations in a few cases. For example, the plus operator occurs almost three times more often in WEKA than in JSAT or JavaML. JSAT does not use integer constants except for divisions by 2; ELKI in turn never uses divisions by 2. This hints at programmer preferences. Given the strong variation for some examples, it seems advisable to create misleading commands based on patterns observed in the code.

TABLE I: SORTED PATTERNS FOR ALL CLUSTERING ALGORITHMS WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS
TABLE II: SORTED PATTERNS FOR GUI WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS
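Our analysis targeted Java sources; as an illustration of how such operator counts can be gathered, the following sketch computes an operator histogram for Python code using the standard `ast` module:

```python
import ast
from collections import Counter

def operator_histogram(source):
    """Count occurrences of each binary, unary and comparison
    operator in the given source string."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.BinOp, ast.UnaryOp)):
            counts[type(node.op).__name__] += 1
        elif isinstance(node, ast.Compare):
            for op in node.ops:
                counts[type(op).__name__] += 1
    return counts

code = "m = a[0]\nfor x in range(1, n):\n    if a[x] > m:\n        m = a[x]\n"
print(operator_histogram(code))  # → Counter({'Gt': 1})
```

Aggregating such histograms over a whole repository yields exactly the kind of per-framework distribution compared in the tables.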
B. (De-)Obfuscation Case Study
In this section we present examples of what simple obfuscated code might look like, and we also perform a small-scale empirical evaluation of the ability of humans to break the obfuscation, i.e., to discover the confidential program. To the best of our knowledge, we are among the first to conduct such an empirical study providing some evidence on the value of obfuscation. We (manually) obfuscated code
TABLE III: SORTED PATTERNS FOR THE K-MEANS ALGORITHM WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS

for two simple tasks that are understandable by any novice programmer. The first task is to avoid division by zero by checking the value of the divisor and returning some (error) value if the divisor is zero:

if (y != 0) then r := x/y else r := -9999
The second task is to return the maximum of an array:

m := a[0]
for (x = 0; x < n; x++) if (a[x] > m) then m := a[x]
We assumed that all constants are encrypted and appear as variables. The code of task one becomes:

if (y != u) then r := x/y else r := v
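The selector mechanism behind this encoding can be illustrated in plaintext Python (in the actual scheme the selector bit and all intermediate values are encrypted; the variable names mirror the example above):

```python
# Plaintext sketch of the selector mechanism: both the confidential
# and the misleading statement are executed, and a selector bit s
# picks the result: r = s * r_true + (1 - s) * r_fake.

def select(s, r_true, r_fake):
    return s * r_true + (1 - s) * r_fake

x, y, u, v = 10, 2, 0, -9999

# confidential logic of task one, with both branches evaluated
divisor = y if y != u else 1          # keep the division well-defined
r_div = x / divisor
r_true = select(1 if y != u else 0, r_div, v)

# a misleading alternative computed alongside
r_fake = x * y

s = 1  # in the real scheme this bit is encrypted, so an attacker
       # cannot tell which of the two results is the confidential one
print(select(s, r_true, r_fake))  # → 5.0
```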
Obfuscation: There are many different options for obfuscating the code using encryption (see Section IV). Discussing multiple techniques in detail is out of scope for this work. Therefore, we focus on a case study with simple manual code generation. We considered two levels of obfuscation (L1 and L2) with a different number of misleading statements and variables.

For the lower-level obfuscation, i.e., L1, we introduced two fake variables w, z. Thus, for obfuscation we allow any operand for the first task to be one of the variables u, v, w, x, y, z. For comparison operations, we allowed either a 'non-equal' or a 'smaller than' comparison. For mathematical operations, we used multiplication or division. We did not introduce any additional fake statements. Since variables occur at 5 places, combining the options for choosing variables with those for choosing operations gives a total of 12,500 options.

For the first simple assessment by humans, we limited the number of source code options, i.e., the program class C_C, and rather presented complete code examples. More specifically, we chose 10 options for each task, yielding 10 code examples chosen from C_C. They were selected such that an ordinary software engineer could solve them without additional knowledge in a short amount of time. For example, we had an option containing a statement 'r:=z/z' that was easy to identify as most likely not being coded by a human. We figured that letting a programmer deal with encrypted code with binary selector variables is too complex.

For the more thorough obfuscation (L2) we used three misleading combining statements and five simple statements per combining statement, i.e., for each statement in the confidential code we added four more statements. We used 8 instead of 6 variables and additionally the +, − operators.
In contrast to L1, where we only listed 10 options per task, we allowed 5 options for each of the 6 statements, yielding a total of 5^6 = 15,625 options for the participants to choose from. The handouts to programmers are listed in Figure 1.

Fig. 1. Handouts to programmers for Simple Obfuscation L1
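The search-space sizes in the case study follow from multiplying the per-slot option counts; a tiny helper makes the arithmetic explicit (the values of k and o below are illustrative, not those of the handout):

```python
def search_space(options_per_slot):
    """Number of candidate programs = product of per-slot option counts."""
    total = 1
    for n in options_per_slot:
        total *= n
    return total

# L2 case study: 5 options for each of the 6 statements on the handout
print(search_space([5] * 6))  # → 15625

# generic combined obfuscation: two temporaries, each chosen from k
# variables, combined by one of o operations → k * k * o candidates
k, o = 8, 4
print(search_space([k, k, o]))  # → 256
```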
Experimental Setup: We gave the programmers a short introduction (≈ 15 minutes) about the obfuscation technique in general, including examples. The programmers were not told what problem the source code solves, only that it is a common and simple task. We conducted two separate experiments, one for L1 and one for L2 obfuscation, with different programmers.

Experiment 1: For L1 we used a total of 6 programmers. They were given a paper sheet with the source code options. They had 20 minutes to break the code. The participants had to rate how likely an option is to be the piece of code solving the problem. The answers had 5 options expressing different degrees of confidence. The perfect answer would consist of one 'Yes, that's it' and nine answers 'No, it's not'. Since all options are syntactically correct, programmers needed some kind of understanding of the semantics of the program. Even given very good understanding, they might not be able to give the perfect answer, since they lack background information about the task. Rating the source code options corresponds to the process of de-obfuscation, since programmers choose a probability distribution over all available options for confidential programs.

Experiment 2: For L2 we used 5 programmers (different from the first experiment). We only considered the first task, dealing with division by zero. The handout to programmers is given in Figure 1. In this case, it seems very difficult to identify the task. Therefore, we asked them to state unlikely and likely combinations of options for each statement. Put differently, rather than specifying just a single option for each statement, they could rate several options as likely (or unlikely). Thus, they could state sets of options using multiple choices per statement. This corresponds to fixing paths (or traces) in the code (see the example shown in Figure 1). They were also given 20 minutes to solve the task.

Results: Experiment 1: Overall, programmers solved this task well, i.e., they managed to de-obfuscate the code. Given the limited set of options this is not surprising.
Four participants were indeed able to identify the correct option for the first task. The others ranked it as likely but together with at least one other option. On average, programmers were only able to reliably exclude two options. For the second task, there were two equally valid solutions: one computing the minimum of the array and one computing the maximum. Four participants identified both options. One chose a wrong solution. One participant got one answer correct and rated several others equally likely, including the correct one. Programmers required 20 minutes or less for both tasks.

Experiment 2: Overall, participants could not significantly narrow the search space. Often their intuition was not helpful, e.g., the (statement) options rated as 'likely' did not contain any statements belonging to the confidential code. We discuss identifying unlikely options first. All participants could correctly identify unlikely options. However, they were rather specific, i.e., five or fewer pairs of unlikely options, e.g., similar to the example in Figure 1. Thus, none of the participants managed to reduce the search space by more than a factor of two. Identifying likely paths was even less successful: One participant did not mention any likely paths; one participant mentioned paths not sharing any common option with the solution. One participant mentioned paths with various options per statement, but in fact they had only a single option in common with the solution. One answer contained paths with two statements that were also part of the solution. Another answer contained paths out of which only one had one statement in common with the solution.
Thus, none of the participants was able to identify a set of paths that contained the solution.

In summary, the second task was significantly more difficult and the participants were not successful at de-obfuscating. Overall, we conclude that humans were not able to correctly and efficiently narrow down the search space to a reasonable number of program options, although this might change given more time and experience. We also want to emphasize that we used only a toy example, both with respect to the number of options generated and with respect to code length. Thus, we believe that source code encryption is effective.

VII. RELATED WORK
There is a variety of approaches to prevent piracy, reverse engineering, and tampering [21], such as license files, checksums, and obfuscation, to name but a few. We only discuss obfuscation.

(De-)Obfuscation techniques: There is a rich literature on source code obfuscation techniques, including general surveys (e.g., [3]) as well as a more recent survey focusing on malware obfuscation techniques [25] and a short discussion involving (indistinguishability) obfuscation [6]. One common technique is to introduce opaque predicates to insert dead code. The idea is to have a control statement with two branches that always executes one of the branches. However, it should not be detectable by (static) analysis that the condition always evaluates to the same value. Clearly, dynamic analysis might yield some hints whether or not the code in one branch is dead, but there are theoretical limitations (as the problem is reducible to the halting problem) [5]. Since we encrypt the condition and always execute both branches, it is impossible for code analysis to provide any information about which branch is executed. In ordinary programs, adding fake or junk statements must be done with care because the program must remain correct. Thus, typically any change to the program by a junk statement that modifies the final outcome has to be undone, e.g., by applying the inverse operation. However, little is known on how to actually create junk statements that mislead a human programmer. In the same work, obfuscation through dead code, void code, and code duplication is briefly mentioned, but with a primary focus on avoiding detection by an automated de-obfuscator (rather than by a reverse engineer). An Intermediate Level Obfuscation Method [11] gives some high-level guidelines for obfuscation: The authors mention that constants should be calculated rather than stored as literals, because they might help to identify an algorithm. They also mention dead code generation, focusing on alias analysis.
Furthermore, they state briefly that dead code should be similar to the original executable code, but without giving further insights. There is also work on adding junk to confuse a disassembler [18]. The junk statements should be unreachable and partial statements. Other than that, there is no discussion on which statements to choose. We are not limited to adding junk statements and their inverses in our approach, since we can use selector variables to control whether or not a junk statement influences program state. Generally, the effectiveness of obfuscation techniques is still subject to study [1], [9]. For example, one possible attack against obfuscation is frequency analysis using, e.g., pattern mining algorithms. Pattern mining has been employed for program comprehension [24] and reverse engineering [19]. We provide an empirical analysis of source code patterns for similar problems across developers. These patterns could be used to create obfuscated statements that are harder to identify.

Recent work has identified programmers using abstract syntax trees with a surprisingly high success rate despite obfuscation of code [7]. We do not attempt to identify programmers based on code, but we use code of the same programmer to support the reconstruction of obfuscated code. We use only small parts of the abstract syntax tree and also compute aggregate (i.e., pattern) statistics.

Obfuscation is also (ab)used by malware [22] to circumvent intrusion detection systems by using polymorphic techniques. Though obfuscation is typically associated with source or machine code, obfuscation techniques have also been applied to data [2].
Obfuscation has also been leveraged to construct an abstract state machine that enables computation on encrypted and non-encrypted data [20].

Metrics and goals: Metrics to assess code obfuscation have been discussed in the literature as well [10], namely code potency (related to code complexity measures), resilience (against an automated de-obfuscator), stealth (how easy it is for a reverse engineer to spot obfuscated code) and (additional) execution cost. We maintain the underlying ideas of these metrics and adjust them to our context. We also add a new metric, 'program class quality'. In this work, we mainly focus on obfuscating functions. However, the literature on obfuscation has discussed other directions as well, e.g., false refactoring to disguise class structure [3] or removing type information from bytecode [12].

Circuit privacy: Traditional program obfuscation attempts to disguise the computation (of a circuit) in one way or another. For instance, indistinguishability obfuscation requires that given any two equivalent circuits C1 and C2 of similar size, the obfuscations of C1 and C2 should be computationally indistinguishable [13]. In functional encryption [13], ciphertexts encrypt inputs x, and keys are issued for circuits C. Using a key for a circuit C to decrypt a ciphertext ENC(x) yields the result of the circuit C evaluated for x, but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually. Since the introduction of these concepts [13], significant progress has been made (see [16] for an overview). We argue that from a practical perspective, encryption at the source (or byte) code level is more meaningful than for Boolean circuits due to performance reasons.
Given that a processor supports n commands, each implemented using circuits of size x ≫ log n, it suffices to hide the command type rather than the circuit.

One cannot hope to obfuscate arbitrary programs [4] when requiring that a circuit should leak no information except its input and output behavior. Using a more relaxed notion of obfuscation, an obfuscated program may leak as much information as any other program with equivalent functionality [15]. However, this makes information-theoretic obfuscation impossible (in polynomial time).

Keeping the circuit private while changing any wire value in the circuit has also been investigated [17]. An attack can be detected and data can then be erased.

In his thesis, Gentry [14] gave a fully homomorphic encryption (FHE) scheme. He also discussed how to ensure circuit privacy by adding a large random error vector.

VIII. CONCLUSIONS
Computing on encrypted data is close to being practical, e.g., using secure multi-party computation. In this paper, we have looked at the next step after protecting data privacy: keeping algorithms confidential at the source code or bytecode level. We have shown that by adding misleading statements a high degree of protection can be achieved with a modest increase in computational complexity, given that computation is carried out on encrypted data. While our assessment showed the effectiveness of our approach, this is just one of the first steps, and there are many interesting directions for future work in this domain.

REFERENCES
[1] B. Anckaert, M. Madou, B. De Sutter, B. De Bus, K. De Bosschere, and B. Preneel. Program obfuscation: a quantitative approach. In Proceedings of the ACM Workshop on Quality of Protection, pages 15–20. ACM, 2007.
[2] D. E. Bakken, R. Parameswaran, D. M. Blough, A. A. Franz, and T. J. Palmer. Data obfuscation: Anonymity and desensitization of usable data sets. IEEE Security and Privacy, 2(6):34–41, 2004.
[3] A. Balakrishnan and C. Schulze. Code obfuscation literature survey. http://pages.cs.wisc.edu/~arinib/writeup.pdf, 2005.
[4] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (im)possibility of obfuscating programs. Journal of the ACM (JACM), 59(2):6, 2012.
[5] P. Beaucamps and É. Filiol. On the possibility of practically obfuscating programs towards a unified perspective of code protection. Journal in Computer Virology, 3(1):3–21, 2007.
[6] M. Beunardeau, A. Connolly, R. Geraud, and D. Naccache. Cdoe obofsucaitn: Securing software from within. IEEE Security & Privacy, 14(3):78–81, 2016.
[7] A. Caliskan-Islam, R. Harang, A. Liu, A. Narayanan, C. Voss, F. Yamaguchi, and R. Greenstadt. De-anonymizing programmers via code stylometry. In , pages 255–270, 2015.
[8] C. Camerer. Behavioral Game Theory. New Age International, 2010.
[9] M. Ceccato, M. Di Penta, P. Falcarin, F. Ricca, M. Torchiano, and P. Tonella. A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empirical Software Engineering, 19(4):1040–1074, 2014.
[10] C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 184–196. ACM, 1998.
[11] D. Dunaev and L. Lengyel. An intermediate level obfuscation method. Acta Polytechnica Hungarica, 11(7), 2014.
[12] C. Foket, B. De Sutter, and K. De Bosschere. Pushing Java type obfuscation to the limit. IEEE Transactions on Dependable and Secure Computing, 11(6):553–567, 2014.
[13] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. In Foundations of Computer Science (FOCS), pages 40–49, 2013.
[14] C. Gentry. A Fully Homomorphic Encryption Scheme. PhD thesis, Stanford University, 2009.
[15] S. Goldwasser and G. N. Rothblum. On best-possible obfuscation. In Theory of Cryptography, pages 194–213. Springer, 2007.
[16] M. Horváth. Survey on cryptographic obfuscation. IACR Cryptology ePrint Archive, 2015:412, 2015.
[17] Y. Ishai, M. Prabhakaran, A. Sahai, and D. Wagner. Private circuits II: Keeping secrets in tamperable circuits. In Advances in Cryptology – EUROCRYPT 2006, pages 308–327. Springer, 2006.
[18] C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 290–299. ACM, 2003.
[19] O. Maqbool, A. Karim, H. Babri, and M. Sarwar. Reverse engineering using association rules. In Proceedings of the International Multi Topic Conference (INMIC), pages 389–395. IEEE, 2004.
[20] O. Mazonka, N. G. Tsoutsos, and M. Maniatakos. Cryptoleq: A heterogeneous abstract machine for encrypted and unencrypted computation. IEEE Transactions on Information Forensics and Security, 2016.
[21] G. Naumovich and N. Memon. Preventing piracy, reverse engineering, and tampering. Computer, (7):64–71, 2003.
[22] P. OKane, S. Sezer, and K. McLaughlin. Obfuscation: the hidden malware. IEEE Security & Privacy, 9(5):41–47, 2011.
[23] J. Schneider. Lean and fast secure multi-party computation: Minimizing communication and local computation using a helper. , 2016.
[24] C. Tjortjis, L. Sinos, and P. Layzell. Facilitating program comprehension by mining association rules from source code. In