Obfuscation using Encryption
Johannes Schneider* and Thomas Locher†
*University of Liechtenstein, Liechtenstein, {firstname.lastname}@uni.li
†ABB Corporate Research, Baden-Daettwil, Switzerland, {firstname.lastname}@ch.abb.com

Abstract—Protecting source code against reverse engineering and theft is an important problem. The goal is to carry out computations using confidential algorithms on an untrusted party while ensuring confidentiality of algorithms. This problem has been addressed for Boolean circuits, known as 'circuit privacy'. Circuits corresponding to real-world programs are impractical. Well-known obfuscation techniques are highly practicable, but provide only limited security, e.g., no piracy protection. In this work, we modify source code, yielding programs with adjustable performance and security guarantees ranging from indistinguishability obfuscators to (non-secure) ordinary obfuscation. The idea is to artificially generate 'misleading' statements. Their results are combined with the outcome of a confidential statement using encrypted selector variables. Thus, an attacker must 'guess' the encrypted selector variables to disguise the confidential source code. We evaluated our method using more than ten programmers as well as pattern mining across open source code repositories to gain insights into (micro-)coding patterns that are relevant for generating misleading statements. The evaluation reveals that our approach is effective in that it successfully preserves source code confidentiality.
I. INTRODUCTION
Intellectual property, e.g., in the form of algorithms, is costly to develop and it is always at risk of being stolen. In particular, in an untrusted cloud environment, algorithms holding valuable expert knowledge are susceptible to theft: If the cloud infrastructure is compromised, an attacker can steal (compiled) program code. Insider attacks are also a real threat. A typical approach to (partially) protect intellectual property is to obscure it in such a way that it becomes difficult to figure out its purpose and functionality. However, such obfuscation does not solve the problem satisfactorily, as the functioning of the algorithm can still be observed and analyzed, even to the point where the whole algorithm and its parameters are reconstructed. Thus, an attacker with access to the cloud can still steal an algorithm and execute it without modification. In turn, Boolean circuits can be cryptographically protected, e.g., using fully homomorphic encryption. An attacker obtaining a 'protected' circuit cannot evaluate it. However, computations on Boolean circuits alone are considered highly impractical due to the large size of circuits that are needed for complex functionality. We tackle the problem at a higher level by encrypting statements in source code instead of individual gates. Our primary focus is on encryption of source code as written by the programmer (or intermediate code). Our obfuscation technique yields guarantees analogous to those for Boolean circuits, i.e., candidate indistinguishability obfuscators, using higher-level source code primitives. Moreover, we discuss how to obtain less secure obfuscations that run much faster. Obfuscation is done by a novel technique: we transform source code into "encrypted" source code by adding a multitude of additional misleading statements (and variables) combined with selector variables. Selector variables are encrypted binary variables that "choose" the right result among misleading and confidential statements.
Since the selector variables are encrypted, it is generally non-trivial to determine which statements merely serve the purpose of misleading an attacker and which statements actually contribute to the computation of the result. In particular, current de-obfuscation methods and tools are unable to do so. The difficulty of de-obfuscation or extracting the confidential source code depends on the background knowledge of the attacker about the program to de-obfuscate as well as on the choice of the misleading statements. We also provide an analysis of several open source code programs to determine patterns that might have to be observed when creating obfuscated source code. Finally, we give a brief assessment of our method using a small-scale experiment with actual programmers whose task was to break a simple encrypted piece of code. The programmers were very far from breaking our obfuscation scheme despite the fact that we used a rather limited degree of source code encryption. Thus, the results provide some evidence that the proposed method is indeed effective.

II. MODEL AND PROBLEM
A client wishes to execute confidential source code in an untrusted environment. The data might or might not be confidential, but we assume that the source code computes on encrypted data. Typically, the clients encrypt code (and data) and send both to an untrusted server. The server performs possibly multiple computations using the encrypted code and various input data. It returns encrypted results to clients, where the results are decrypted.

Intuitively, the mechanism is secure if an adversary having full access to the server (including CPU registers, memory, hard drives and the obfuscated code) does not learn anything about the source code or about the data. However, in practice, it might be sufficient if only certain parts or aspects of a piece of software remain confidential, e.g., the call graph or the architecture. We focus on the case where an attacker should learn as little as possible about how the computation of the output works.

We assume that all encryptions are perfectly secure, i.e., an attacker cannot gain any knowledge from encrypted data. Obfuscation maps a confidential program P_C from a program class C_C to another program from a program class C_O, i.e., we can express an obfuscator O as O: C_C → C_O. As we shall discuss later, the relationship is non-injective, i.e., every program P_O ∈ C_O maps to some set of programs in C_C. Whereas operations in the given program P_C operate on plaintexts, obfuscated programs in C_O perform computations on encrypted data. For example, a program a · b with plaintext variables a, b might correspond to ENC(a) · ENC(b) on encrypted values, i.e., ENC(a), ENC(b) denote ciphertexts and the '·' operator performs multiplication of plaintexts using encrypted values only, such that DEC(ENC(a) · ENC(b)) = a · b, where DEC denotes decryption. For this example, knowing
ENC(a) · ENC(b) suffices to discover the program on plaintexts, i.e., a · b, since the operations performed on plaintexts can generally be inferred by an attacker holding the obfuscated program with operations on ciphertexts. For example, for the Goldwasser-Micali cryptosystem, multiplication of ciphertexts corresponds to an XOR of plaintexts. We assume that the attacker knows which cryptosystem is employed. For more general computations we need either fully homomorphic encryption or secure multi-party computation. Generally, merely encrypting data used for computation does not provide any security for algorithms.

The adversary might know the obfuscation algorithm and the program classes C_O and C_C. The task of the attacker is to assign to each program P from the program class C_C a probability p(P) stating his belief that the program P is the (or a) confidential program P_C. Thus, the goal of the attacker is to choose P to maximize p(P) without knowing the distribution p. However, an attacker might have some knowledge about the distribution. In particular, he might know coding patterns, e.g., covering a few lines of code, that are more likely to occur than other lines of code. Generally, finding a program P′ that differs only slightly from the confidential program P_C might also be satisfactory. An attacker is said to successfully break our scheme if he can do so using any computation taking polynomial time in the size of the largest program in C_O.

A. Indistinguishability Obfuscation
Indistinguishability obfuscation (see Definition 1 in [13]) essentially says that the obfuscations of two circuits look 'equivalent' to an attacker. This definition focusing on circuits can be extended to more general programs composed of more complex commands than Boolean gates. By contrast, we also use additional misleading input data for a program that has no impact on the result, e.g., a function f(x) = x can be extended to f(x, y) = x + 0 · y. Therefore, we assume that any program can be called with any superset of the required input to compute the result. A program class C_C is an arbitrary set of programs. We discuss program classes and their relation to our obfuscation technique later (see Section III-C). More formally, we define similarly to Definition 1 in [13]:

Definition 1 (Indistinguishability Obfuscator (iO)). A uniform Probabilistic Polynomial-Time (PPT) machine iO: C_C → C_O is called an indistinguishability obfuscator for program class C_C if the following is satisfied:
i) For all P ∈ C_C and inputs x, we have Pr[DEC(P′(ENC(x))) = P(x) : P′ ← iO(P)] = 1.
ii) For any PPT distinguisher D, there exists a negligible value ε such that for all pairs of programs P, P′ ∈ C_C: if P(x) = P′(x) for all inputs x, then |Pr[D(iO(P)) = 1] − Pr[D(iO(P′)) = 1]| ≤ ε.

Condition (i) essentially demands that the obfuscated program and the non-obfuscated program produce the same outcome. Condition (ii) says that knowing an obfuscated program is of no use to an attacker. This definition by itself is not sufficient to ensure confidentiality without using proper program classes. In other words, an obfuscator satisfying the above definition does not necessarily protect an algorithm well.
In particular, the confidential program is not protected sufficiently if the program class is not meaningful in the sense that many programs in the program class can easily be ruled out as the confidential program (see Section III-C). Furthermore, Condition (ii) in the above definition (and also in [13]) focuses only on avoiding the leakage of programs that behave identically with respect to outputs, i.e., P(x) = P′(x). In practice, any program with very similar input-output behavior to the confidential program might also be deemed confidential.

B. Program Definition
We keep to a simple yet sufficient definition. Assume that a program P of length n is a sequence of statements s_0, s_1, s_2, ..., s_{n−1}. A statement s_i is an assignment of a (simple) expression to a variable, i.e., a statement s could be r := a · b with expression a · b and result variable r. A simple expression consists of a single operation and two (input) variables. We generally write it in the form Op(Input_1, Input_2), e.g., MUL(a, b) denotes the multiplication a · b. The program returns the result, i.e., the variable, of the last statement. Later (see Definition 2), we introduce in more detail combining statements, which aggregate the results of several misleading statements and one confidential statement. Note that this simple definition of a statement also covers control-flow statements for code operating on encrypted data (only), since conditional statements must be transformed to hide the outcome of the branching conditions. For example, loop unrolling is done using an upper bound on the possible number of iterations.

For illustration, consider a program to square a number a. Let program P(a) consist of a single statement b := MUL(a, a) with input value a ∈ {0, 1, ...}. Assume the set of possible operations in the used programming language is given by S_O := {MUL, ADD, DIV} and there is just one variable a. A simple expression consists of an operation and its input variables, e.g., MUL(a, a). If we define the set of programs to be all programs with one simple expression and one input variable, we could define the program class as P = {(b := MUL(a, a)), (b := ADD(a, a)), (b := DIV(a, a))}. If we use two statements, where the first defines some variable, we get P = {(b := MUL(a, a); c := MUL(a, b)), (b := MUL(a, a); c := ADD(a, b)), ...}.
Out of these programs, three actually perform the desired functionality of squaring a number, namely (b := MUL(a, a); c := MUL(a, a)), (b := ADD(a, a); c := MUL(a, a)), and (b := DIV(a, a); c := MUL(a, a)). The number of simple expressions given |V| variables and |S_O| operations taking t input variables can be bounded by |S_O| · |V|^t. Any of the prior programs that squares a number might serve as an obfuscated program in the traditional sense (i.e., without using encryption). But standard dead code elimination would already remove the added misleading statement for obfuscation. Next, we introduce a more sophisticated technique that makes use of encrypted selector variables.

III. SOURCE CODE OBFUSCATION THROUGH ENCRYPTION
We distinguish two levels of source code encryption, differing in the unit of abstraction they seek to conceal: Program-level encryption considers programs as a whole. A finer granularity is given by hiding individual statements.
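The program-level variant (run every program in the class, pick the confidential result with encrypted selector bits) can be sketched as follows. This is a toy illustration, not the paper's implementation: the `Enc` wrapper only hides values behind an object and stands in for a homomorphic encryption scheme (a real deployment would use fully homomorphic encryption or secure multi-party computation).

```python
# Toy sketch of program-level selection: all programs in the class are
# executed on encrypted input and the confidential result is chosen via
# encrypted selector bits. "Enc" is a placeholder, not real encryption.
class Enc:
    def __init__(self, v):
        self.v = v                      # plaintext, hidden only by convention
    def __add__(self, other):
        return Enc(self.v + other.v)    # stands in for homomorphic addition
    def __mul__(self, other):
        return Enc(self.v * other.v)    # stands in for homomorphic multiplication

def dec(c):                             # decryption, done by the client only
    return c.v

programs = [lambda a: a * a,            # P_1 = P_C: MUL(a, a)
            lambda a: a + a]            # P_2: ADD(a, a), misleading
selectors = [Enc(1), Enc(0)]            # encrypted selector bits, b_1 = 1

x = Enc(5)
results = [P(x) for P in programs]      # run *all* programs on ENC(x)
r = results[0] * selectors[0] + results[1] * selectors[1]
assert dec(r) == 25                     # only P_C's output survives
```

Since the selector bits and all intermediate results stay wrapped, an observer of the execution cannot tell which of the executed programs produced `r`.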
A. Program Level
The idea to conceal the confidential program is to run (all) possible programs (not just those yielding the desired output), i.e., all programs from class C_C. Then we select the confidential result out of all results using encrypted binary selector variables. Since all programs are run and only the result is selected, an attacker obtains no information about which of the executed programs yields the result. Thus, the class C_O consists of just one program executing all programs in C_C using encrypted inputs. It is important that the order of computation of the programs is randomized, i.e., we randomly permute all programs and then evaluate them.

Let ENC(r_i) be the encrypted result of program P_i ∈ C_C for an input x. Define b_i ∈ {0, 1} to be a selector bit for program i, i.e., b_i = 1 if i = i* and 0 otherwise. Let P_{i*} = P_C be the confidential program, e.g., with the functionality to square a number. We say P_{i*}(x) = r* yields the desired result r*. The final encrypted result r* can be obtained by ENC(r*) := Σ_i ENC(b_i) · ENC(r_i), where '·' is a multiplication on encrypted data and the summation is also performed on encrypted values. The client can decrypt ENC(r*) to get r*. For example, say P_1 = P_C := {MUL(a, a)} and P_2 := {ADD(a, a)}. We set b_1 = 1 and b_2 = 0. Therefore, ENC(r*) := ENC(MUL(a, a)) · ENC(b_1) + ENC(ADD(a, a)) · ENC(b_2) = ENC(MUL(a, a)).

Since the bits b_i as well as the results are encrypted, an attacker does not know which of the results is chosen. It is immediate that such an obfuscator satisfies Definition 1, given that all secure computation mechanisms are secure: we fulfill condition (i) due to the definition of the selector variables, which results in choosing the output of the confidential program for any input among all executed programs. We fulfill (ii) since there is essentially just one obfuscated program (comprising all programs), i.e.,
any two programs P′ and P map to the same obfuscated program.

Running entire programs and selecting the result is not ideal. When choosing k programs with similar running times or computational costs, the running time increases by a factor of k. But an attacker only has to choose one out of k programs, i.e., the attacker breaks the "encryption" with probability 1/k by guessing. We would like to have at least an exponential relationship between the computational cost and the difficulty to guess, e.g., 1/2^k. The key idea to reach this goal is to apply selector bits per statement rather than picking entire programs.

B. Statement Level
An attacker should not be able to infer a specific statement, even given that he knows all prior and all following statements. Our strategy is analogous to that for (entire) programs: In order to disguise a statement s, we create a set of k − 1 'misleading' statements M_i. Then all statements {M_1, ..., M_{k−1}, s} are randomly permuted. After that, they are executed in that order. We choose the correct result, i.e., that of the confidential statement s, using binary selector variables b_i. More precisely, we compute the combining statement, which is defined as follows:

Definition 2.
The combining statement for a set of statements S = {S_C, M_1, ..., M_{k−1}} with confidential statement S_C and misleading statements M_i is given by

ENC(r*) := Σ_{s ∈ S} ENC(b_s) · ENC(r_s),

where r_s is the result of statement s and the b_s are binary (selector) variables such that b_s = 1 for s = S_C and 0 otherwise.

For example, for the statement c := MUL(a, a), we could choose a misleading statement ADD(a, a) and use the combining statement c′ := MUL(a, a) · b_1 + ADD(a, a) · b_2 with binary selector variable b_1 being 1 and b_2 = 1 − b_1 being 0. Note that since b_1 = 1 and b_2 = 0, we have c = c′. If the selector variables b_1, b_2 as well as other variables and subexpressions are encrypted, i.e., we compute ENC(c′) := ENC(MUL(a, a)) · ENC(b_1) + ENC(ADD(a, a)) · ENC(b_2) (with ·, + operating on encrypted data), then an attacker is not able to discover the true statement without knowing b_1 (or b_2).

In contrast to program-level security (Section III-A), we just execute a single but larger program. Generally, for the same level of security, this single program is much more compact (and efficient to evaluate) than the concatenation of all programs used for program-level security. Roughly speaking, for programs of length n, where each statement could be assigned one out of k expressions, the number of feasible programs is k^n. The concatenation of all misleading statements is of length k · n, since for each of the n statements there are k options. This concatenation covers all k^n feasible programs. Therefore, by executing k · n rather than n statements, an attacker has to choose from k^n options. This polynomial relationship between computational costs and the number of possible programs is much better than the linear relationship presented in the prior Section III-A.
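The cost/search-space gap argued above is easy to check numerically; a small sketch with illustrative parameter values (k and n are made-up, not taken from the paper):

```python
# Statement-level obfuscation executes k options per statement, i.e.,
# k*n statements in total for a program of length n, while the attacker
# must pick the right option at every statement and therefore faces
# k**n candidate programs.
k, n = 10, 20                  # options per statement, program length
executed = k * n               # statements actually executed
search_space = k ** n          # candidate programs the attacker faces
assert executed == 200
assert search_space == 10 ** 20
# Program-level obfuscation with a comparable budget runs ~200 whole
# programs, so the attacker guesses correctly with probability 1/200
# instead of 1/10**20.
```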
Therefore, we focus on statement-level security.

So far, we have focused on obfuscating a single confidential statement by combining its result with the results of several misleading statements. In principle, one could also generate 'misleading combining statements' consisting of misleading statements only, i.e., the entire statement is irrelevant. Thus, these combining statements merely serve the purpose of confusing the attacker (and hiding the true length of the confidential program). In fact, such 'misleading combining statements' might make comprehension more difficult. But they do not make the life of the attacker harder when it comes to obtaining a program with the same input-output behavior. The attacker does not have to guess any misleading statement correctly, since all of them have no impact on the output.

Still, the number of possible expressions k can be exceedingly large, rendering the execution of k · n statements practically infeasible. Therefore, we can only choose a subset of all possible statements, as discussed next.

C. Program Classes
There is an interesting relationship between an obfuscated program P_O and program classes C_C containing the confidential program P_C. An obfuscated program P_O using binary selector variables defines a program class C_C, i.e., a set of programs containing the confidential program, that could have been mapped by the obfuscator to create P_O.

For example, the confidential program P_C = {r := MUL(a, a)} could have been mapped to the encrypted program P_O := {r_1 := MUL(ENC(a), ENC(a)); r_2 := ADD(ENC(a), ENC(a)); r := r_1 · ENC(b_1) + r_2 · ENC(b_2)}, where MUL, ADD compute the multiplication and addition of plaintexts using encrypted values. The encrypted program P_O defines the program class C_C consisting of {MUL(a, a), ADD(a, a)}. The program class defined by the encrypted program depends on the details of the obfuscation algorithm, its parameters and the input program. Intuitively, large program classes C_C seem preferable, since they enlarge the search space for the attacker. Indistinguishability obfuscation ensures that an attacker seeing the obfuscated program cannot determine any information about which program of all programs in the program class C_C is the confidential program. But indistinguishability obfuscation alone does not necessarily give any practical security guarantees if the program class C_C is known to the attacker and there exist many programs that are close to the confidential program. For example, if all programs in the program class C_C (including the confidential program) yield (almost) the same output on any input, then an attacker can pick any program from the class to get (almost) the same functionality. Moreover, it might be possible to exclude many programs from the program class C_C through simple reasoning that might not even require background knowledge. Certain programs might be executable and yield valid outputs, but their semantics might seem unreasonable.
An example is a program that returns the same output irrespective of the input. An attacker can safely ignore such programs, thereby reducing the size of the program class containing the confidential algorithm. It is generally not clear how to best choose the program class so that the attacker cannot easily eliminate many programs from the class with more or less simple analysis or background knowledge. There are many factors that come into play. We discuss mechanisms to create obfuscated programs in the next section.

IV. OBFUSCATION METRICS
Our metrics primarily follow (but enhance) the metrics for (standard) obfuscation [10]. We introduce a new metric that characterizes the quality of a program class.
Code potency: Obfuscated code can be assessed using traditional code complexity measures, e.g., based on control flow and data access. One might also use parameters from the obfuscation process itself, which illustrate the complexity of the encrypted code. For example, if we choose (up to) k misleading statements for each statement, we might use a metric such as a "mislead factor". This is analogous to the security parameter for encryption.

Resilience: This is the ability to withstand attacks using automated tools (maybe given some background knowledge). For example, an attacker could try to reconstruct the program using known-plaintext attacks, i.e., he knows the input of the function and a few outputs. The attacker could then simply try each program from the encryption space and return all matching ones. Another example involves knowledge of coding guidelines used in the program or having unencrypted code of the same programmer at hand. Programmers typically have their own style that might be comparable to a signature identifying the programmer. Thus, an attacker could search for patterns in the encrypted code that match the coding guidelines (or the patterns matching the programmer's coding style). This could help to reduce the search space of possible non-obfuscated programs given an obfuscated program. Two metrics have been proposed in the literature [10]: i) programmer effort to construct an automatic de-obfuscator; ii) de-obfuscator effort: execution time and space required by the de-obfuscator. One might also consider a metric for measuring potency reduction, stating how effective a de-obfuscator is. For example, for statement-level security it could measure the "mislead factor reduction", i.e., the (average) number of misleading (or junk) statements that the de-obfuscator could identify per statement that has been obfuscated.
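The known-plaintext attack mentioned under Resilience can be made concrete: the attacker enumerates the disclosed program class and keeps every program consistent with the observed input/output pairs. The tiny program class below is purely illustrative:

```python
# Known-plaintext attack sketch: filter the program class by observed
# input/output behavior; the surviving set is what the attacker must
# still distinguish among. Toy class with three candidate programs.
program_class = {
    "MUL(a,a)": lambda a: a * a,
    "ADD(a,a)": lambda a: a + a,
    "DIV(a,a)": lambda a: a // a,
}
observed = [(3, 9), (5, 25)]          # known (input, output) plaintexts

surviving = [name for name, P in program_class.items()
             if all(P(x) == y for x, y in observed)]
assert surviving == ["MUL(a,a)"]      # the class collapses to one match
```

This is why the choice of the program class matters: if only one candidate survives a few plaintext pairs, the obfuscation offers little practical protection.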
Program class quality: On a high level, a program class should contain programs that are sufficiently different from the confidential program, such that knowing an obfuscated program is of no value to an attacker. Given a program from the program class, an attacker should not easily be able to judge whether or not it is the confidential program. Since our obfuscated program using encrypted selector variables discloses the program class, both aspects are important. Giving a formal definition of a program class is difficult, since it depends on the semantics of a program. However, we might reason about the attacker's capabilities and define the quality of the program class in terms of the probability estimates p(P) of a rational attacker that a specific program is confidential. A rational attacker would leverage all its knowledge about the confidential program, the obfuscation algorithm, etc. to compute a probability distribution p(P) stating the probability estimate that a program P is confidential. To determine the quality of the program class given the distribution of an attacker, we define the rank r of a program P as the number of programs P′ with p(P′) ≥ p(P). We define the quality of program class C_C with a subset S ⊆ C_C of programs, such that all of the programs in S are considered confidential, as Q(C_C) := 1 − 1/r, with r being the minimum rank of any program P ∈ S. The quality ranges between 0 and 1. It is 0 if the attacker correctly assigns a (or the) confidential program the largest probability to be confidential. It approaches 1 if the attacker believes it is the least likely. The motivation is that a very reasonable behavior for an attacker given the probability distribution p(P) is to perform an in-depth investigation of the programs sorted by their probability estimate.
For example, assuming the attacker has access to some plaintexts (input and output), he might check one program after the other, testing whether its input-output behavior matches the known plaintexts. Thus, the time needed to infer the confidential program depends directly on our quality measure. Defining the set of confidential programs S is application dependent. For example, given one confidential program P_C, one might define the set S as all programs having the same input-output behavior for all inputs, i.e., S = {P | ∀x : P(x) = P_C(x)}.

Stealth refers to the difficulty for humans to de-obfuscate programs, meaning that junk statements should not be easily discoverable by a human. The ability to discover them significantly depends on the knowledge a human has about the program (and surrounding statements).
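The rank-based quality measure Q(C_C) = 1 − 1/r introduced for program class quality can be sketched directly. The probability estimates below are made-up toy values, and rank is taken as the number of programs the attacker deems at least as likely as the target (so rank 1 means the attacker's top guess is correct, matching the stated boundary cases):

```python
def rank(p, target):
    """Number of programs considered at least as likely as `target`."""
    return sum(1 for prog in p if p[prog] >= p[target])

def quality(p, confidential_set):
    """Q(C_C) = 1 - 1/r with r the minimum rank over the confidential set."""
    r = min(rank(p, P) for P in confidential_set)
    return 1 - 1 / r

p = {"P_C": 0.2, "P_1": 0.5, "P_2": 0.3}    # attacker's estimates (toy)
assert rank(p, "P_C") == 3                   # P_C ranked last of three
assert quality(p, {"P_C"}) == 1 - 1 / 3      # high quality: attacker misled

p_bad = {"P_C": 0.9, "P_1": 0.05, "P_2": 0.05}
assert quality(p_bad, {"P_C"}) == 0          # top guess correct -> quality 0
```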
Execution cost: For instance, the overhead factor, being the ratio of the (average) execution time of the non-obfuscated and the obfuscated program.

V. PRACTICAL OBFUSCATION ON THE STATEMENT LEVEL
Executing all possible (simple) expressions for a real-world programming language instead of just a single one for each statement in a program is infeasible for even moderately large programs. Thus, we introduce a weaker form of security by adding fewer misleading statements in order to improve performance. Choosing these misleading statements in a way that is hard to disguise is non-trivial. One might have to take into account existing patterns in code and, potentially, remove these coding patterns to create a more uniform-looking encrypted code that is less sensitive to statistical attacks (Section V-A). We aim at schemes that make it possible to strike a balance between several metrics, in particular security and performance. We discuss metrics for obfuscation and security (Section IV) as well as what kind of information of a real-world program might be worth hiding (Section ??).

A. Preprocessing - Source Code Uniformization
Before obfuscating source code, we might modify the code through several preprocessing operations that do not change the semantics of the code but make it more 'uniform' in the sense that patterns consisting of potentially multiple statements are removed. We might also avoid generating highly unlikely patterns. Source code might contain patterns due to preferences of a programmer or due to the nature of the problem. Programmer preferences might be found in certain command constructs used to solve frequently reoccurring tasks. Patterns can be mined and used for statistical attacks. Assume the obfuscated statement contains two types of comparison expressions, a ≠ b and a = b, out of which one is misleading. Assume we know that a programmer almost never uses '≠' but almost always uses 'not' and '=' to express inequality, i.e., ¬(a = b). This information makes it easier to find the non-obfuscated code. In another example, a programmer might also prefer to state a loop condition involving integer I and loop variable x as x ≤ I − 1 rather than x < I. The former condition x ≤ I − 1 is expressed using two statements, J := I − 1 and x ≤ J. Therefore, when trying to determine a statement out of a set of misleading statements and seeing a condition x ≤ J, an attacker might investigate the preceding statements to check whether one of them contains a statement of the form J := I − 1; if so, this is an indication that the code contains x ≤ I − 1. Both cases are easy to detect automatically in source code of typed languages. We can enforce the use of, e.g., '≠' and x < I by rewriting the code without changing the semantics. In fact, some cases might even be covered by compiler optimizations. This might reduce the possibilities for an attacker to exploit prior knowledge about certain programming preferences.

B. Misleading Code Generation
An essential question for obfuscation relates to the choice of misleading statements to satisfy the metrics in Section IV. Several considerations apply:
Number of misleading statements: Choosing relatively few (misleading) statements might cause only little overhead due to obfuscation but put security in jeopardy.
Execution costs of a misleading statement: It may be preferable to choose misleading statements that do not require a lot of computation. For example, rather than executing a 'sort' command on a list as a misleading statement, one might remove the first element. The first command requires O(l log l) time for a list of length l, whereas the amortized time for the second command is only O(log l) [23].

Similarity of misleading and confidential statements: One strategy is to choose misleading statements independently of the existing source code. However, independently chosen statements might be easy to identify, since they might appear as outliers. For example, given a complex function of many simpler mathematical functions, such as sine, square root, division and so forth, it is not recommended to add misleading statements manipulating strings. It seems likely that a string expression is identified as not being part of the confidential program. To make the junk statements hard to identify, potentially given some knowledge about the application or source code structure, it seems favorable to choose misleading code that is similar to the actual source code.
Intermediate code vs. source code: One might add statements for intermediate code (e.g., Java bytecode) or plain source code. Generally, in our scenario only intermediate code is given to an untrusted party. An attacker typically decompiles it. Modifying intermediate code might cause decompilers to fail. Therefore, an attacker might have to work on intermediate code itself, which is considerably harder. Compilers can also generate intermediate code according to different metrics, for example with the goal to minimize the amount of code. In our case, in order to minimize the absolute performance impact of obfuscation, it seems preferable to generate short programs. As an example, if we add three junk statements per statement in a short program, the absolute increase in running time is less than if we do the same for a long program. Furthermore, translation to intermediate code might remove certain coding patterns that could help in statistical attacks (see Section V-A). Thus, if our obfuscation technique does not account for dependencies, an attacker might be able to rule out some misleading statements. In intermediate code, such patterns might be removed.
Simplifying complex expressions: One has to decide which constructs in the source code should be enhanced by misleading 'statements'. Depending on the programming language, statements are allowed to vary a lot in complexity: they could be decomposable into many simple expressions. Disguising a complex statement as a whole also requires a statement of similar complexity (otherwise an attacker might assume that very complex statements are non-junk). Trying to create complex misleading statements seems rather challenging, since a single subexpression that appears as an outlier might be sufficient for an attacker to dismiss the entire complex statement as 'misleading'. Furthermore, in the most extreme case an entire program is expressed in one statement, which results in similar disadvantages as discussed when obfuscating entire programs compared to individual statements (Section III-A). Therefore, it might be preferable to create a uniform source code representation of simple statements, such as three-address code, and obfuscate them.
Syntactic and semantic correctness: For three-address code, a junk statement comprises an operation and two operands. Clearly, the operation must be executable for the operands. This implies that the operands must have a type that is supported by the operation. But the values of the operands must also be supported: for example, when adding a division as a junk statement, the divisor should not be zero.
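A minimal sketch of such a validity check, assuming the obfuscator has a sample value for each variable (the function name and the fallback policy are illustrative, not part of our scheme):

```python
import random

def safe_junk_statement(variables, rng=random.Random(0)):
    """Pick a junk statement that is type- and value-safe.

    variables: dict name -> sample value observed for that variable.
    Only numeric variables are used as operands, and a division is
    only emitted if a provably non-zero divisor is available.
    """
    op = rng.choice(["+", "-", "*", "/"])
    numeric = [n for n, v in variables.items() if isinstance(v, (int, float))]
    a = rng.choice(numeric)
    if op == "/":
        # semantic correctness: the divisor must be non-zero
        nonzero = [n for n in numeric if variables[n] != 0]
        if nonzero:
            b = rng.choice(nonzero)
        else:
            op, b = "+", rng.choice(numeric)  # fall back to a safe operation
    else:
        b = rng.choice(numeric)
    return f"junk := {a} {op} {b}"

# y == 0, so "/ y" is never emitted; the string s is never an operand
print(safe_junk_statement({"x": 3, "y": 0, "s": "hi"}))
```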
Operation and operands obfuscation: We can create misleading statements by using the same operations but different operands, different operations with the same operands, or by varying both. E.g., for the statement a · b, we could add a + b or a − b to disguise the operation only, and x · y or e · f to disguise the operands only. In order to combine both, we might create temporary variables involving several variables with selector variables and then use these temporary variables in various operations. This yields the largest search space for an attacker. For example, we can set temporary variables t1 and t2 to be t1 := b1 · a + (1 − b1) · x and t2 := b2 · b + (1 − b2) · y for selector variables b1, b2. We can use these temporary variables in statements t1 · t2 and t1 + t2. This yields a total of eight possibilities of statements. More generally, if both temporary variables can be chosen out of k variables and the operations can be chosen out of o operations, we get k² · o possibilities.

Patterns and statement dependency: Statements are generally not independent. Given a single statement, the probability of the next statement is not uniform across all statements. Part of the dependency is due to the data type, that is, only certain statements are feasible for certain types. For example, given that we know a statement performs string concatenation, it seems more likely that among the next statements there is another string operation rather than a trigonometric function. There might be common patterns consisting of a sequence of a few statements that are more likely than others, either in general or depending on a specific application domain or programmer. Finding the optimal strategies of the attacker and the obfuscator to use (or counteract) the knowledge of patterns could be cast as the problem of finding a mixed-strategy Nash equilibrium, i.e., we view the obfuscator and the attacker as players.
A formal treatment of Nash equilibria is beyond the scope of this work and we refer to textbooks on game theory covering these concepts, e.g., [8]. To get some intuition, consider the following scenario. There are n different statements S that are not uniformly distributed. There is one frequent statement F ∈ S with (large) likelihood p_l, and every other of the n − 1 statements has the same probability (1 − p_l)/(n − 1), i.e., we have p_l ≫ 1/n. Assume the strategy of the obfuscator is to generate one misleading statement M for each statement s_C belonging to the confidential code C by choosing it uniformly at random from S \ {s_C}, i.e., it does not choose the misleading statement to be the same as the confidential statement, since this provides no protection. Say an attacker wants to choose the correct statement for a pair of two statements s_C, M, with M being the misleading statement and s_C a statement of the confidential program. He would behave as follows to maximize the expected number of correctly inferred statements: If one of the two statements s_C, M equals F, he guesses F. If neither statement equals F, he chooses randomly among the two. Let us compute the probability that an attacker correctly infers a statement given two statements s_C, M of a large program. The attacker is always correct if the true statement s_C is F, which happens for a fraction p_l of all statements. If the true statement is not F but the misleading statement is F, the attacker will always predict wrongly. If neither the misleading statement nor the true statement is F, he predicts correctly with probability 1/2. Thus, the expected fraction of correctly predicted statements is p_l + (1 − p_l) · (1 − 1/(n − 1)) · 1/2. Now, assume that our obfuscator is aware of the true distribution of statements as well, and that it always chooses F as the misleading statement if s_C ≠ F. If the attacker does not alter his strategy, the expected fraction of correctly guessed statements is now only p_l.
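The expected accuracy derived above can be checked with a short Monte-Carlo simulation (n, p_l and the trial count below are arbitrary illustrative choices):

```python
import random

def simulate(n=20, p_l=0.4, trials=200_000, rng=random.Random(1)):
    """Monte-Carlo estimate of the attacker's accuracy against the
    uniform-obfuscator strategy; statement 0 plays the role of F."""
    statements = list(range(n))
    weights = [p_l] + [(1 - p_l) / (n - 1)] * (n - 1)
    correct = 0
    for _ in range(trials):
        s_c = rng.choices(statements, weights)[0]            # confidential
        m = rng.choice([s for s in statements if s != s_c])  # misleading
        pair = [s_c, m]
        # attacker: guess F if present, otherwise pick one at random
        guess = 0 if 0 in pair else rng.choice(pair)
        correct += guess == s_c
    return correct / trials

n, p_l = 20, 0.4
analytic = p_l + (1 - p_l) * (1 - 1 / (n - 1)) * 0.5
print(round(simulate(), 3), round(analytic, 3))  # the two values agree closely
```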
However, we could do better using a different strategy.

VI. EMPIRICAL EVALUATION
In Section V-B we introduced several general aspects of misleading code generation. In this section, we discuss some points in more detail and further provide a short evaluation by programmers. A key aspect is the structure of the code: How much dependency is there across statements? Does code contain patterns that must be taken into account when generating code? The second question is discussed next.
A. Patterns in Source Code
Patterns in source code might facilitate the decoding of code obfuscated using encryption, as discussed in the beginning of Section V. There is a sheer endless number of patterns one could look for, ranging from high-level design patterns to the use of individual operators. Since we operate primarily on a statement level, we compared the frequencies of patterns on a lower level. We looked at the distribution of operators, i.e., we counted how often an operator appeared in the code, since simple statements essentially consist of an operator and one or two variables. An even more particular statement involves an integer constant and an operator. We conjectured that the usage of integer constants might vary significantly across programmers, for example the usage of ≤ n − 1 vs. < n. Therefore, we looked at how integer constants are used within binary expressions, specifically what values they have and with what operator they are used. In order to get some intuition about more complex patterns, we looked at the composition of binary and unary expressions: What are the types of expressions and the operation that constitute a binary (or unary) expression?

For comparison, we chose five data mining frameworks implemented in Java, namely WEKA, ELKI, JSAT, Java-ML and Spmf. We compared all clustering algorithms per framework together (see Table I) as well as particular algorithms, namely k-Means and OPTICS clustering (for k-Means see Table III), to get a more robust estimate. We also compared the GUI parts of the frameworks (see Table II) to detect variation across different applications.

The less uniform code is across programmers and application domains, the more important it is to take particular coding patterns into account. When looking at the standard deviation and the mean for an application domain on its own (Tables II and III) for all the structural patterns, we see that the mean is considerably larger than the standard deviation. This shows limited dispersion and a certain uniformity.
Therefore, knowing that code stems from a particular framework helps in predicting a statement, but only to a rather limited degree. Generating misleading statements according to such a distribution seems helpful, i.e., we might choose misleading statements such that the overall distribution roughly stays the same. When comparing the means for the same patterns across different application domains, we see stronger variation. For example, the mean for the increment 'posIncrement NameE' for GUI and clustering varies by a factor of two. Still, the differences are not so large that a decoding of statements would be easily possible by merely knowing that we consider a specific type of application. Judging from the usage of integer constants and operators, there are strong deviations in a few cases. For example, the plus operator occurs almost three times more often in WEKA than in JSAT or JavaML. JSAT does not use integer constants except for divisions by 2; ELKI in turn never uses divisions by 2. This hints at programmer preferences. Given the strong variation for some examples, it seems advisable to create misleading commands based on patterns observed in the code.

TABLE I: SORTED PATTERNS FOR ALL CLUSTERING ALGORITHMS WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS
TABLE II: SORTED PATTERNS FOR GUI WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS
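Our analysis targeted Java sources; as an illustration of how such operator counts can be gathered, the following sketch computes an operator histogram for Python code using the standard `ast` module:

```python
import ast
from collections import Counter

def operator_histogram(source):
    """Count occurrences of each binary, unary and comparison
    operator in the given source string."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.BinOp, ast.UnaryOp)):
            counts[type(node.op).__name__] += 1
        elif isinstance(node, ast.Compare):
            for op in node.ops:
                counts[type(op).__name__] += 1
    return counts

code = "m = a[0]\nfor x in range(1, n):\n    if a[x] > m:\n        m = a[x]\n"
print(operator_histogram(code))  # → Counter({'Gt': 1})
```

Aggregating such histograms over a whole repository yields exactly the kind of per-framework distribution compared in the tables.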
B. (De-)Obfuscation Case Study
In this section we present examples of what simple obfuscated code might look like, and we also perform a small-scale empirical evaluation of the ability of humans to break the obfuscation, i.e., to discover the confidential program. To the best of our knowledge, we are among the first to conduct such an empirical study providing some evidence on the value of obfuscation. We (manually) obfuscated code
TABLE III: SORTED PATTERNS FOR THE K-MEANS ALGORITHM WITH RELATIVE (%) AND ABSOLUTE FREQUENCIES IN BRACKETS

for two simple tasks that are understandable by any novice programmer. The first task is to avoid division by zero by checking the value of the divisor and returning some (error) value if the divisor is zero:

if (y != 0) then r := x/y else r := -9999
The second task is to return the maximum of an array:

m := a[0]
for (x = 0; x < n; x++) if (a[x] > m) then m := a[x]
We assumed that all constants are encrypted and appear as variables. The code of task one becomes:

if (y != u) then r := x/y else r := v
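The selector mechanism behind this encoding can be illustrated in plaintext Python (in the actual scheme the selector bit and all intermediate values are encrypted; the variable names mirror the example above):

```python
# Plaintext sketch of the selector mechanism: both the confidential
# and the misleading statement are executed, and a selector bit s
# picks the result: r = s * r_true + (1 - s) * r_fake.

def select(s, r_true, r_fake):
    return s * r_true + (1 - s) * r_fake

x, y, u, v = 10, 2, 0, -9999

# confidential logic of task one, with both branches evaluated
divisor = y if y != u else 1          # keep the division well-defined
r_div = x / divisor
r_true = select(1 if y != u else 0, r_div, v)

# a misleading alternative computed alongside
r_fake = x * y

s = 1  # in the real scheme this bit is encrypted, so an attacker
       # cannot tell which of the two results is the confidential one
print(select(s, r_true, r_fake))  # → 5.0
```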
Obfuscation: There are many different options for obfuscating the code using encryption (see Section IV). Discussing multiple techniques in detail is out of scope for this work. Therefore, we focus on a case study with simple manual code generation. We considered two levels of obfuscation (L1 and L2) with a different number of misleading statements and variables.

For the lower-level obfuscation, i.e., L1, we introduced two fake variables w, z. Thus, for obfuscation we allow any operand for the first task to be one of the variables u, v, w, x, y, z. For comparison operations, we allowed either a 'non-equal' or a 'smaller than' comparison. For mathematical operations, we used multiplication or division. We did not introduce any additional fake statements. Since variables occur at 5 places, combining the options for choosing variables with those for choosing operations gives a total of 12,500 options.

For the first simple assessment by humans, we limited the number of source code options, i.e., the program class C_C, and rather presented complete code examples. More specifically, we chose 10 options for each task, yielding 10 code examples chosen from C_C. They were selected such that an ordinary software engineer could solve them without additional knowledge in a short amount of time. For example, we had an option containing a statement 'r:=z/z' that was easy to identify as most likely not being coded by a human. We figured that letting a programmer deal with encrypted code with binary selector variables is too complex.

For the more thorough obfuscation (L2) we used three misleading combining statements and five simple statements per combining statement, i.e., for each statement in the confidential code we added four more statements. We used 8 instead of 6 variables and additionally the +, − operators.
In contrast to L1, where we only listed 10 options per task, we allowed 5 options for each of the 6 statements, yielding a total of 5^6 = 15,625 options for the participants to choose from. The handouts to programmers are listed in Figure 1.

Fig. 1. Handouts to programmers for Simple Obfuscation L1
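The search-space sizes in the case study follow from multiplying the per-slot option counts; a tiny helper makes the arithmetic explicit (the values of k and o below are illustrative, not those of the handout):

```python
def search_space(options_per_slot):
    """Number of candidate programs = product of per-slot option counts."""
    total = 1
    for n in options_per_slot:
        total *= n
    return total

# L2 case study: 5 options for each of the 6 statements on the handout
print(search_space([5] * 6))  # → 15625

# generic combined obfuscation: two temporaries, each chosen from k
# variables, combined by one of o operations → k * k * o candidates
k, o = 8, 4
print(search_space([k, k, o]))  # → 256
```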
Experimental Setup: We gave the programmers a short introduction (≈ 15 minutes) about the obfuscation technique in general, including examples. The programmers were not told what problem the source code solves, only that it is a common and simple task. We conducted two separate experiments, one for L1 and one for L2 obfuscation, with different programmers.

Experiment 1: For L1 we used a total of 6 programmers. They were given a paper sheet with the source code options. They had 20 minutes to break the code. The participants had to rate how likely an option is to be the piece of code solving the problem. The answers had 5 options expressing different degrees of confidence. The perfect answer would consist of one 'Yes, that's it' and nine answers 'No, it's not'. Since all options are syntactically correct, programmers needed some kind of understanding of the semantics of the program. Even given very good understanding, they might not be able to give the perfect answer, since they lack background information about the task. Rating the source code options corresponds to the process of de-obfuscation, since programmers choose a probability distribution over all available options for confidential programs.

Experiment 2: For L2 we used 5 programmers (different from the first experiment). We only considered the first task, dealing with division by zero. The handout to programmers is given in Figure 1. In this case, it seems very difficult to identify the task. Therefore, we asked them to state unlikely and likely combinations of options for each statement. Put differently, rather than specifying just a single option for each statement, they could rate several options as likely (or unlikely). Thus, they could state sets of options using multiple choices per statement. This corresponds to fixing paths (or traces) in the code (see the example shown in Figure 1). They were also given 20 minutes to solve the task.

Results: Experiment 1: Overall, programmers solved this task well, i.e., they managed to de-obfuscate the code. Given the limited set of options this is not surprising.
Four participants were indeed able to identify the correct option for the first task. The others ranked it as likely but together with at least one other option. On average, programmers were only able to reliably exclude two options. For the second task, there were two equally valid solutions: one computing the minimum of the array and one computing the maximum. Four participants identified both options. One chose a wrong solution. One participant got one answer correct and rated several others equally likely, including the correct one. Programmers required 20 minutes or less for both tasks.

Experiment 2: Overall, participants could not significantly narrow the search space. Often their intuition was not helpful, e.g., the (statement) options rated as 'likely' did not contain any statements belonging to the confidential code. We discuss identifying unlikely options first. All participants could correctly identify unlikely options. However, they were rather specific, i.e., five or fewer pairs of unlikely options, e.g., similar to the example in Figure 1. Thus, none of the participants managed to reduce the search space by more than a factor of two. Identifying likely paths was even less successful: One participant did not mention any likely paths; one participant mentioned paths not sharing any common option with the solution. One participant mentioned paths with various options per statement, but in fact they had only a single option in common with the solution. One answer contained paths with two statements that were also part of the solution. Another answer contained paths out of which only one had one statement in common with the solution.
Thus, none of the participants was able to identify a set of paths that contained the solution.

In summary, the second task was significantly more difficult and the participants were not successful at de-obfuscating. Overall, we conclude that humans were not able to correctly and efficiently narrow down the search space to a reasonable number of program options, although this might change given more time and experience. We also want to emphasize that we used only a toy example, both with respect to the number of options generated and with respect to code length. Thus, we believe that source code encryption is effective.

VII. RELATED WORK
There is a variety of approaches to prevent piracy, reverse engineering, and tampering [21], such as license files, checksums, and obfuscation, to name but a few. We only discuss obfuscation.

(De-)Obfuscation techniques: There is a rich literature on source code obfuscation techniques, including general surveys (e.g., [3]) as well as a more recent survey focusing on malware obfuscation techniques [25] and a short discussion involving (indistinguishability) obfuscation [6]. One common technique is to introduce opaque predicates to insert dead code. The idea is to have a control statement with two branches that always executes one of the branches. However, it should not be detectable by (static) analysis that the condition always evaluates to the same value. Clearly, dynamic analysis might yield some hints whether or not the code in one branch is dead, but there are theoretical limitations (as the problem is reducible to the halting problem) [5]. Since we encrypt the condition and always execute both branches, it is impossible for code analysis to provide any information about which branch is executed. In ordinary programs, adding fake or junk statements must be done with care because the program must remain correct. Thus, typically any change to the program by a junk statement that modifies the final outcome has to be undone, e.g., by applying the inverse operation. However, little is known on how to actually create junk statements that mislead a human programmer. In the same work, obfuscation through dead code, void code, and code duplication is briefly mentioned, but with a primary focus on avoiding detection by an automated de-obfuscator (rather than by a reverse engineer). An Intermediate Level Obfuscation Method [11] gives some high-level guidelines for obfuscation: The authors mention that constants should be calculated rather than stored as literals, because they might help to identify an algorithm. They also mention dead code generation, focusing on alias analysis.
Furthermore, they state briefly that dead code should be similar to the original executable code, but without giving further insights. There is also work on adding junk to confuse a disassembler [18]. The junk statements should be unreachable and partial statements. Other than that, there is no discussion on which statements to choose. We are not limited to adding junk statements and their inverses in our approach, since we can use selector variables to control whether or not a junk statement influences program state. Generally, the effectiveness of obfuscation techniques is still subject to study [1], [9]. For example, one possible attack against obfuscation is frequency analysis using, e.g., pattern mining algorithms. Pattern mining has been employed for program comprehension [24] and reverse engineering [19]. We provide an empirical analysis of source code patterns for similar problems across developers. These patterns could be used to create obfuscated statements that are harder to identify.

Recent work has identified programmers using abstract syntax trees with a surprisingly high success rate despite obfuscation of code [7]. We do not attempt to identify programmers based on code, but we use code of the same programmer to support the reconstruction of obfuscated code. We use only small parts of the abstract syntax tree and also compute aggregate (i.e., pattern) statistics.

Obfuscation is also (ab)used by malware [22] to circumvent intrusion detection systems by using polymorphic techniques. Though obfuscation is typically associated with source or machine code, obfuscation techniques have also been applied to data [2].
Obfuscation has also been leveraged to construct an abstract state machine that enables computation on encrypted and non-encrypted data [20].

Metrics and goals: Metrics to assess code obfuscation have been discussed in the literature as well [10], namely code potency (related to code complexity measures), resilience (against an automated de-obfuscator), stealth (how easy it is for a reverse engineer to spot obfuscated code) and (additional) execution cost. We maintain the underlying ideas of these metrics and adjust them to our context. We also add a new metric, 'program class quality'. In this work, we mainly focus on obfuscating functions. However, the literature on obfuscation has discussed other directions as well, e.g., false refactoring to disguise class structure [3] or removing type information from bytecode [12].

Circuit privacy: Traditional program obfuscation attempts to disguise the computation (of a circuit) in one way or another. For instance, indistinguishability obfuscation requires that given any two equivalent circuits C1 and C2 of similar size, the obfuscations of C1 and C2 should be computationally indistinguishable [13]. In functional encryption [13], ciphertexts encrypt inputs x, and keys are issued for circuits C. Using a key for a circuit C to decrypt a ciphertext ENC(x) yields the result of the circuit C evaluated for x, but does not reveal anything else about x. Furthermore, no collusion of secret key holders should be able to learn anything more than the union of what they can each learn individually. Since the introduction of these concepts [13], significant progress has been made (see [16] for an overview). We argue that from a practical perspective, encryption at the source (or byte) code level is more meaningful than for Boolean circuits due to performance reasons.
Given that a processor supports n commands, each implemented using circuits of size x ≫ log n, it suffices to hide the command type rather than the circuit.

One cannot hope to obfuscate arbitrary programs [4] when requiring that a circuit should leak no information except its input and output behavior. Using a more relaxed notion of obfuscation, an obfuscated program may leak as much information as any other program with equivalent functionality [15]. However, this makes information-theoretic obfuscation impossible (in polynomial time).

Keeping the circuit private while changing any wire value in the circuit has also been investigated [17]. An attack can be detected and data can then be erased.

In his thesis, Gentry [14] gave a fully homomorphic encryption (FHE) scheme. He also discussed how to ensure circuit privacy by adding a large random error vector.

VIII. CONCLUSIONS
Computing on encrypted data is close to being practical, e.g., using secure multi-party computation. In this paper, we have looked at the next step after protecting data privacy: keeping algorithms confidential at the source code or bytecode level. We have shown that by adding misleading statements a high degree of protection can be achieved with a modest increase in computational complexity, given that computation is carried out on encrypted data. While our assessment showed the effectiveness of our approach, this is just one of the first steps, and there are many interesting directions for future work in this domain.

REFERENCES
[1] B. Anckaert, M. Madou, B. De Sutter, B. De Bus, K. De Bosschere, and B. Preneel. Program obfuscation: a quantitative approach. In Proceedings of the ACM Workshop on Quality of Protection, pages 15–20. ACM, 2007.
[2] D. E. Bakken, R. Parameswaran, D. M. Blough, A. A. Franz, and T. J. Palmer. Data obfuscation: Anonymity and desensitization of usable data sets. IEEE Security and Privacy, 2(6):34–41, 2004.
[3] A. Balakrishnan and C. Schulze. Code obfuscation literature survey. http://pages.cs.wisc.edu/~arinib/writeup.pdf, 2005.
[4] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang. On the (im)possibility of obfuscating programs. Journal of the ACM (JACM), 59(2):6, 2012.
[5] P. Beaucamps and É. Filiol. On the possibility of practically obfuscating programs towards a unified perspective of code protection. Journal in Computer Virology, 3(1):3–21, 2007.
[6] M. Beunardeau, A. Connolly, R. Geraud, and D. Naccache. Cdoe obofsucaitn: Securing software from within. IEEE Security & Privacy, 14(3):78–81, 2016.
[7] A. Caliskan-Islam, R. Harang, A. Liu, A. Narayanan, C. Voss, F. Yamaguchi, and R. Greenstadt. De-anonymizing programmers via code stylometry. In , pages 255–270, 2015.
[8] C. Camerer. Behavioral Game Theory. New Age International, 2010.
[9] M. Ceccato, M. Di Penta, P. Falcarin, F. Ricca, M. Torchiano, and P. Tonella. A family of experiments to assess the effectiveness and efficiency of source code obfuscation techniques. Empirical Software Engineering, 19(4):1040–1074, 2014.
[10] C. Collberg, C. Thomborson, and D. Low. Manufacturing cheap, resilient, and stealthy opaque constructs. In Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 184–196. ACM, 1998.
[11] D. Dunaev and L. Lengyel. An intermediate level obfuscation method. Acta Polytechnica Hungarica, 11(7), 2014.
[12] C. Foket, B. De Sutter, and K. De Bosschere. Pushing Java type obfuscation to the limit. IEEE Transactions on Dependable and Secure Computing, 11(6):553–567, 2014.
[13] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters. Candidate indistinguishability obfuscation and functional encryption for all circuits. In Foundations of Computer Science (FOCS), pages 40–49, 2013.
[14] C. Gentry. A Fully Homomorphic Encryption Scheme. PhD thesis, Stanford University, 2009.
[15] S. Goldwasser and G. N. Rothblum. On best-possible obfuscation. In Theory of Cryptography, pages 194–213. Springer, 2007.
[16] M. Horváth. Survey on cryptographic obfuscation. IACR Cryptology ePrint Archive, 2015:412, 2015.
[17] Y. Ishai, M. Prabhakaran, A. Sahai, and D. Wagner. Private circuits II: Keeping secrets in tamperable circuits. In Advances in Cryptology – EUROCRYPT 2006, pages 308–327. Springer, 2006.
[18] C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 290–299. ACM, 2003.
[19] O. Maqbool, A. Karim, H. Babri, and M. Sarwar. Reverse engineering using association rules. In Proceedings of the International Multi Topic Conference (INMIC), pages 389–395. IEEE, 2004.
[20] O. Mazonka, N. G. Tsoutsos, and M. Maniatakos. Cryptoleq: A heterogeneous abstract machine for encrypted and unencrypted computation. IEEE Transactions on Information Forensics and Security, 2016.
[21] G. Naumovich and N. Memon. Preventing piracy, reverse engineering, and tampering. Computer, (7):64–71, 2003.
[22] P. OKane, S. Sezer, and K. McLaughlin. Obfuscation: the hidden malware. IEEE Security & Privacy, 9(5):41–47, 2011.
[23] J. Schneider. Lean and fast secure multi-party computation: Minimizing communication and local computation using a helper. , 2016.
[24] C. Tjortjis, L. Sinos, and P. Layzell. Facilitating program comprehension by mining association rules from source code. In