SNITCH: Dynamic Dependent Information Flow Analysis for Independent Java Bytecode

Abstract

Software testing is the most commonly used technique in the industry to certify the correctness of software systems, including security properties like access control and data confidentiality. However, controlling information flow and detecting information leaks through testing is a demanding task without specialized monitoring and assessment tools. In this paper, we tackle the challenge of dynamically tracking information flow in third-party Java-based applications using dependent information flow control. Dependent security labels increase the expressiveness of traditional information flow control techniques by allowing labels to be parametrized with context-related information, which enables more detailed and fine-grained policies. Instead of the fixed security lattice used in traditional approaches, which defines a fixed set of security compartments, dependent security labels allow for a dynamic lattice that can be extended at runtime, so that new security compartments can be defined using context values. We present a specification and instrumentation approach for rewriting JVM compiled code with in-lined reference monitors. To illustrate the proposed approach we use an example and a working prototype, SNITCH. SNITCH operates over the static single assignment language Shimple, an intermediate representation for Java bytecode used in the SOOT framework.


D. Ancona and G. Pace (Eds.): Verification of Objects at RunTime EXecution 2018 (VORTEX 2018), EPTCS 302, 2019, pp. 16–31, doi:10.4204/EPTCS.302.2

© E. Geraldo & J.C. Seco. This work is licensed under the Creative Commons Attribution License.


Eduardo Geraldo, João Costa Seco

NOVA LINCS, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Portugal


Data confidentiality is central in current software engineering practices. In recent years, there have been recurrent news reports about information leaks surfacing as a result of subtle programming errors. For instance, GitHub (http://bit.ly/2XNfEEU) and Twitter (http://bit.ly/2XuMH16), both large-scale systems with impact on a large number of users, discovered and reported that their users' passwords were stored in cleartext in internal system logs, from where an ill-intended employee could have accessed them and entered the users' accounts. In the software industry, certifying the functional correctness of software systems by testing is commonly accepted as a satisfactory approximation of compliance with functional specifications and of requirement fulfilment. However, testing aspects such as data confidentiality is difficult when using traditional approaches. Properties like access control and information flow control require setting up complex testing scenarios where the symptoms of an error are hardly detectable. Typically, information leaks are only perceived at a global scale, through detailed observation of side effects.

Information flow analysis [5, 10, 20, 25] is a language-based approach for detecting information leaks in software systems. It appears in the literature in the form of static and dynamic analyses, each with its own advantages and disadvantages. Static analysis usually requires a considerable effort in code annotation or the complete refactoring of the target system. Besides, the […] A from accessing information of a user B. We follow a more expressive approach that introduces dependent types for information flow [14, 16] and access control [7], allowing for the definition of data-dependent policies. Value-dependent security labels improve the expressiveness of traditional information flow techniques.
By allowing the parametrization of security labels with context-related information, it is possible not only to define more detailed and fine-grained policies, but also to create new security compartments at runtime. Java Information Flow (JIF) [18, 19] supports dynamic labels, which differ from dependent security labels. Dynamic labels follow a decentralized security model [18] based on the notions of data ownership and authorization. According to this model, each data item has an owner, and the owner allows, or not, its data to be read or written by some entity. Dependent security labels follow a traditional security model, with a security lattice that hierarchically organizes security labels, where each datum has a security label and can only be accessed by entities with sufficient privileges. For instance, using dependent security labels, we can define policies restricting access to an employee's personal telephone number to the employee and their department manager.

In this paper, we present a strategy to specify and rewrite the intermediate Java code of applications to embed reference monitors capable of enforcing information flow policies using dependent security labels. Our low-level code rewriting approach for the Java virtual machine language is inspired by tools like SASI [?] and TaintDroid [12] that automatically monitor the confidentiality of information in compiled Java programs. To check data confidentiality in an application, TaintDroid [12] instruments the underlying runtime system (the Android Java virtual machine), while our approach instruments the application itself. We require the specification of some selected classes that make up the entry points of a system, for instance, service controllers [9] or DAO classes [2] (database access objects, i.e., datatypes that match database table schemas). Then, we use this information to introduce in-lined reference monitors and instrument the application code to taint computed values with dependent security labels. Our approach follows the style and semantics of the seminal work by Austin and Flanagan on dynamic information flow analysis [5] and is inspired by works such as the one by Lourenço and Caires [16, 17] on dependent information flow analysis and the work of Chandra and Franz [8] on hybrid information flow analysis for Java bytecode.

The rewriting process, presented in Section 3, operates over an intermediate representation in static single assignment (SSA) form [4, 26]. SSA arranges operations such that each variable is defined only once, which simplifies and improves optimizations such as constant propagation, value numbering, common sub-expression elimination, and partial redundancy elimination, among others. We present an example that illustrates the rewriting process over an intermediate representation in SSA form. Our approach is backed by a prototype tool, SNITCH, which instruments intermediate Java code. SNITCH was evaluated on small-scale web applications using Java servlets, and it uses the SOOT framework [23] (https://github.com/Sable/soot) for code rewriting. SOOT is a framework for optimizing and manipulating Java bytecode that offers multiple intermediate representations.
One such representation is Shimple, an intermediate representation in SSA form over which our prototype operates.

Our contributions can be summarized as follows:
• a rewriting system for instrumenting static single assignment instructions with in-lined reference monitors for information flow control with dependent information flow labels;
• a specification schema to define dependent information flow policies;
• a way to define dependent security labels in Java; and
• a tool capable of instrumenting third-party compiled code with an in-lined reference monitor.

We leave for future work the introduction of abstract interpretation to optimize the computation of security labels, and mechanisms like the one presented by Austin and Flanagan [6] to reduce label creeping and increase the number of accepted programs. Abstract interpretation would allow us not only to reduce the number of security-label-related computations executed at runtime, but also to achieve a gradual approach.

We start this paper by briefly presenting some concepts on dependent information flow labels in Section 2. Section 3 presents our approach and describes an example of a web application. As we present our approach, we also describe the steps required to instrument the example application to test it for information leaks. In Section 4 we illustrate the code rewriting process. In Sections 5 and 6 we validate our approach and discuss related work. Finally, in Section 7, we conclude with some remarks on how to pursue this line of work.

Language-based security [21], and in particular information flow control [10], specify and provide a platform to enforce security policies from the perspective of data creation, manipulation, and data flow operations. Information flow control allows the definition of hierarchical security compartments and the tracking of all uses of data, ensuring that higher security data does not flow (leak) to unrelated or lower security compartments.
Traditionally, security labels are organized in lattices [11] and are associated with value types at compile time [18, 24] or used to taint values at runtime [5, 8]. Information flow control allows for the detection of both explicit and implicit illegal information flows. Explicit flows result from data transfer operations such as assignments, while implicit information flows arise from the control flow of a program. Computations over high security labels can have side effects on values with lower security labels, allowing those with access to lower-labelled variables to infer the values of those computations. Such side effects go against non-interference, the property at the core of information flow control that denotes the absence of information leaks. According to non-interference, changes to values with high security labels must not be reflected in values with lower security labels, i.e., changes in the secret input of a program must not interfere with the program's public output [25].

Traditional security lattices enforce a significant degree of label squashing due to the lack of precision of the security labels used. For instance, it is usually the case that a single security label is used to represent all the users of a system, which prevents the definition of fine-grained, per-user information flow restrictions. The introduction of dependent security policies increases the precision of security specifications and introduces a higher degree of flexibility and usability. Dependent security policies are present in approaches like value-dependent information flow types [14, 16, 17] and dynamic labels [19]. The former was first introduced in the context of access control policies in [7] and then extended to the domain of static checking of information flow in [14, 16, 17]. With dependent security labels, we can express, for instance, that a given function yields values of a parametric security label $user(u)$, where $u$ is a runtime value, allowing for row-level compartmentalization of security data visibility (cf. [16]). The predefined lattice is automatically extended to capture dependent security labels like $user(u)$, $user(\top)$, and $user(\bot)$. The generic security label $user(\top)$ is the label that allows access to all users' data. The security label $user(\bot)$ is the label all users can read. We have the relations $user(\bot) \sqsubset user(u) \sqsubset user(\top)$ for any user $u$, and $user(u) \not\sqsubseteq user(v)$ for all users $u$ and $v$ such that $u \neq v$. The first relation means that, for any user $u$, data can flow from the user's security compartment $user(u)$ to label $user(\top)$, and from $user(\bot)$ to label $user(u)$; the second means that data never flows between the compartments of two distinct users.

Figure 1: Code instrumentation phases (application classes and specification files go through object tainting, parameters and method return passing, and instruction rewriting, producing instrumented classes).

Our technical approach follows the phases depicted in Figure 1. Given an application, a dependent security lattice, and a security specification for key classes of the application, our approach instruments the application with an in-lined reference monitor capable of enforcing information flow policies. We next illustrate our approach with the help of an example.

Let us consider a small web application that implements a directory for a given company, integrated in its website. The information stored by such a system for each employee includes their identifier, name, address, salary, and password. We define two kinds of employees in this example: supervisor and associate. The latter category includes extra information, namely the associate's supervisor and information about their last evaluation.
Other users include unregistered users accessing the company's website.

In this example, we consider the following access constraints on the stored information:
• only registered users can see the address of other users (employees);
• an associate employee can only access their own salary;
• a supervisor user can access all associate users' salary information;
• the information regarding who supervises whom can only be accessed by supervisor users;
• the information about the evaluation of an associate user can only be accessed by supervisor users;
• passwords are always secret, and no one but their owner should access them.

Our example implements both retrieval and insertion operations: it is possible to list the employees in the system, to retrieve the information about a specific employee, to compute the average salary, and to add new employees to the system. Only a registered user, i.e., an employee, can execute the operation that retrieves the information about any other employee. This operation exhibits different behaviours depending on who is executing it and what information is retrieved.

Figure 2: Typical architecture of a web application (components: User, Servlets, Dispatchers, Application Logic, DAO classes, Database; the boundaries to specify are highlighted).

In Figure 2 we illustrate the generalised architecture of a web application. For the sake of space and simplicity, in our evaluation example we merged the application logic and request dispatchers into a single layer. In Figure 4 we present the DAO classes that represent the company's employees. Class Associate represents associate employees and class Supervisor represents supervisor employees. Both classes, Associate and Supervisor, extend the abstract class Employee, which contains the information common to both kinds of employees. Class Supervisor is empty since it does not add any new fields to class Employee. We omit all class methods, as they are only getters and setters.

Dependent Security Labels

The instrumentation of a system starts by defining the security specification. First, it is necessary to define and implement custom security labels to extend the default security lattice provided by SNITCH, which only includes two built-in labels: the label Public, the lowest security label of all, and the label Secret, the highest security label of all. Custom security labels are implemented by extending the abstract class SecurityLevel, provided in a companion library, and by defining a required comparator method. By implementing the missing comparator method, we model the security lattice used by the reference monitor during the system's execution. Each label also requires a constructor that takes the same parameters as the security label plus, for each label parameter, one extra parameter of type int. For instance, a custom label User, parametrized by a String and a long, has a constructor with the signature User(String,int,long,int). The extra parameter tells the monitor whether the value passed to the security label's constructor stands for ⊥, for ⊤, or is to be considered as is.

In order to instrument the example, we define two custom security labels, SupervisorSL and AssociateSL, which define security compartments for supervisors and associates, respectively. Both labels have a single parameter, an employee id, and their comparator methods define the security lattice depicted in Figure 3.
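To make these label conventions concrete, here is a minimal sketch; it is not SNITCH's companion library. It assumes a stand-in abstract class `SecurityLevel` whose comparator is a hypothetical method `flowsTo`, and hypothetical int flags `AS_IS`, `BOTTOM`, and `TOP` for the extra constructor parameter. Only one dependent family, `AssociateSL`, is modelled, using the ordering $user(\bot) \sqsubset user(u) \sqsubset user(\top)$ given in Section 2; how the `AssociateSL` and `SupervisorSL` families relate to each other is defined by Figure 3 and is not reproduced here.

```java
/** Hypothetical stand-in for the companion library's abstract SecurityLevel class. */
abstract class SecurityLevel {
    /** Comparator: true if data labelled with this level may flow to {@code other}. */
    abstract boolean flowsTo(SecurityLevel other);
}

/** Built-in bottom label: flows everywhere. */
final class Public extends SecurityLevel {
    static final Public INSTANCE = new Public();
    private Public() {}
    @Override boolean flowsTo(SecurityLevel other) { return true; }
}

/** Built-in top label: flows only to itself. */
final class Secret extends SecurityLevel {
    static final Secret INSTANCE = new Secret();
    private Secret() {}
    @Override boolean flowsTo(SecurityLevel other) { return other instanceof Secret; }
}

/** Dependent label AssociateSL(id); the extra int says how to interpret the id. */
final class AssociateSL extends SecurityLevel {
    // Hypothetical encoding of the extra constructor parameter.
    static final int AS_IS = 0;   // use the id as given
    static final int BOTTOM = 1;  // AssociateSL(bot), written AssociateSL(_) in specifications
    static final int TOP = 2;     // AssociateSL(top)

    final long id;
    final int mode;

    // One label parameter (long id) plus one extra int, following the User(String,int,long,int) pattern.
    AssociateSL(long id, int mode) {
        this.id = id;
        this.mode = mode;
    }

    @Override
    boolean flowsTo(SecurityLevel other) {
        if (other instanceof Secret) return true;           // everything flows to Secret
        if (!(other instanceof AssociateSL)) return false;  // Public (and other families) are not above us here
        AssociateSL o = (AssociateSL) other;
        if (mode == BOTTOM || o.mode == TOP) return true;    // bot flows to any x; any x flows to top
        if (mode == TOP || o.mode == BOTTOM) return false;   // top flows nowhere lower; nothing else flows to bot
        return id == o.id;                                   // user(u) flows to user(v) only if u == v
    }
}
```

For example, `new AssociateSL(7, AssociateSL.AS_IS).flowsTo(new AssociateSL(8, AssociateSL.AS_IS))` is false, while both labels flow to `new AssociateSL(0, AssociateSL.TOP)`, mirroring the per-user compartments of the lattice.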

Specification files

Besides custom security labels, it is necessary to define a security layer, using a set of specification files and the security lattice (custom security labels) used to in-line the reference monitor. In Figure 2 we show the layer that needs specification: the classes that make up the boundaries of the system. This layer includes all the classes that communicate with the exterior context of the system, such as service controllers and DAO classes in a typical architecture.

Figure 3: Security lattice used to instrument the example.

```java
abstract class Employee {
    long id;
    String name;
    String address;
    double salary;
    String pwd;
    // ... (getters and setters omitted)
}

class Associate extends Employee {
    Supervisor supervisor;
    double evaluation;
    // ... (getters and setters omitted)
}

class Supervisor extends Employee {}
```

Figure 4: Classes Employee, Associate, and Supervisor

Specification files include annotations that define the security labels of class fields and method signatures. The semantics of field annotations is the following. If a security label is explicitly assigned to a class field, it is fixed (as a maximum) throughout the entire execution. Any attempt to store a value in such a field results in one of two outcomes: if the incoming value's security label is lower than the field's security label, then the incoming value's security label is upgraded; if the incoming value's security label is higher than the field's security label, the monitor signals an information leak. When a field is not annotated with a security label, its label changes according to the stored values. However, we do not let assignment operations lower the security labels of variables or fields (cf. [5]), to avoid implicit information leaks.

When defining specifications for methods, it is possible to annotate both parameters and return values. Method annotations differ from field annotations as they include one of two modifiers, ? or !. If an annotation uses the modifier ?, the reference monitor compares the security label of the annotated value with the security label used in the annotation and, if the former is higher, signals an information leak. If an annotation uses the modifier !, the monitor associates the annotation's security label with the annotated value. The modifier ! allows one to associate a security label with input from outside the system and to declassify information. Since it allows for information declassification [1], this modifier must be used carefully, as it may easily lead to incorrect specifications that result in undetected information leaks. If a parameter or return value does not have a security annotation, then the monitor propagates its security label without performing any extra operation.

```
abstract class Employee {
    long:Public id;
    String:Public name;
    String:AssociateSL(_) address;
    String:Secret pwd;
}
```
(a)

```
class Supervisor extends Employee {
    double:SupervisorSL(id) salary;
}
```
(b)

```
class Associate extends Employee {
    double:AssociateSL(id) salary;
    Supervisor:SupervisorSL(_) supervisor;
    double:SupervisorSL(supervisor) evaluation;
}
```
(c)

Figure 5: Security specifications for classes Employee, Supervisor and Associate

```java
public String associateDispatch(long requesterId, long queriedId) {
    Employee queried = EmployeeRepository.getInstance().getEmployeeById(queriedId);
    String response = "{"
        + "\"id\": \"" + queried.getId() + "\", "
        + "\"name\": \"" + queried.getName() + "\", "
        + "\"address\": \"" + queried.getAddress() + "\"";
    // The salary is only included when employees query their own record.
    if (requesterId == queriedId)
        response += ", \"salary\": \"" + queried.getSalary() + "\"";
    return response + "}";
}
```

Figure 6: Source code for the employee information dispatcher

In the current example, we define specification files for all dispatcher classes and for the classes that represent stored information about employees. The specifications for classes Employee, Supervisor, and Associate are depicted in Figure 5a, Figure 5b, and Figure 5c, respectively. Notice that all fields in the classes are annotated with a security label from the lattice in Figure 3. Security label parameters are instantiated with fields to denote a concrete dependency, or with (_) to represent ⊥. The values used to instantiate security label parameters must belong to the same object and produce a security dependency between fields. For instance, in class Supervisor we use SupervisorSL(id) as a security label dependent on the value of field id, declared in the superclass Employee. The security label AssociateSL(_) establishes a security compartment accessible to all Associate employees.

The effort required to write security specifications depends on several factors: the knowledge about the system to instrument, the constraints of the system, and the complexity of the security specification and lattice. After defining the security specifications and labels, it is possible to instrument the application.

```
class EmployeeInfoDispatcher {
    String:?AssociateSL(requesterId) associateDispatch(long requesterId, long queriedId);
}
```

Figure 7: Security specifications for the employee information dispatcher
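The field and method annotation semantics described earlier in this section can be summarised as four small monitor operations. The sketch below illustrates them under simplifying assumptions and is not SNITCH's implementation: it uses a hypothetical, totally ordered three-level `Label` enum instead of a dependent lattice, ignores the program counter, and the operation names (`writeFixedField`, `writeFreeField`, `check`, `assign`) are made up for illustration.

```java
/** Hypothetical totally ordered lattice, used only to keep the sketch small. */
enum Label {
    PUBLIC, INTERNAL, SECRET;

    boolean flowsTo(Label other) { return compareTo(other) <= 0; }

    Label join(Label other) { return compareTo(other) >= 0 ? this : other; }
}

/** Signals an illegal flow detected by the monitor. */
class InformationLeakException extends RuntimeException {
    InformationLeakException(String message) { super(message); }
}

final class AnnotationSemantics {
    private AnnotationSemantics() {}

    /** Annotated field: the label is fixed; returns the label the stored value ends up with. */
    static Label writeFixedField(Label fieldLabel, Label incoming) {
        if (!incoming.flowsTo(fieldLabel)) {
            throw new InformationLeakException(incoming + " value stored in " + fieldLabel + " field");
        }
        return fieldLabel; // a lower incoming label is upgraded to the field's label
    }

    /** Unannotated field: the label follows the stored values but is never lowered. */
    static Label writeFreeField(Label current, Label incoming) {
        return current.join(incoming);
    }

    /** Modifier '?': check the value's label against the annotation. */
    static Label check(Label annotation, Label valueLabel) {
        if (!valueLabel.flowsTo(annotation)) {
            throw new InformationLeakException(valueLabel + " exceeds " + annotation);
        }
        return valueLabel; // the value keeps its own label
    }

    /** Modifier '!': impose the annotation's label; this may declassify, so use with care. */
    static Label assign(Label annotation) {
        return annotation;
    }
}
```

Because `assign` discards the incoming label entirely, a `!` annotation on a value that carries secret data silently makes it public, which is why such annotations deserve extra scrutiny.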

Value tainting

In the first instrumentation step, we inject shadow fields in every application class. One of them holds the security label of the class instance, while the remaining ones mirror existing class fields of a primitive or library (non-instrumented) type.

Methods and Parameter passing

In order to propagate the security labels of primitive or library-type arguments and return values, we add shadow fields to each class. These shadow fields help prepare method calls by allowing the caller to store, and the callee to retrieve, the arguments' security labels. When the called method terminates its execution, the callee stores the security labels of the arguments and of the return value for the caller to retrieve.
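The caller/callee protocol just described can be sketched at the source level. This is a hand-written illustration under assumptions, not SNITCH output: the two-point `Level` enum, the `Account` class, and its `deposit` method are hypothetical, the shadow field names only imitate the `secLbl$...` naming shown in Section 4, and the callee side leaves out the program counter for brevity.

```java
/** Hypothetical two-point lattice for this sketch. */
enum Level {
    LOW, HIGH;

    Level join(Level other) { return (this == HIGH || other == HIGH) ? HIGH : LOW; }
}

/** Callee class after (hypothetical) shadow-field injection. */
class Account {
    // Shadow fields used to pass labels across calls to deposit(int).
    Level secLbl$deposit$p0 = Level.LOW;
    Level secLbl$deposit$ret = Level.LOW;

    private int balance;
    private Level secLbl$balance = Level.LOW; // shadow of the primitive field `balance`

    int deposit(int amount) {
        Level amountLbl = secLbl$deposit$p0;             // callee retrieves the argument label
        balance += amount;
        secLbl$balance = secLbl$balance.join(amountLbl); // unannotated field: label only grows
        secLbl$deposit$ret = secLbl$balance;             // callee stores the return label ...
        secLbl$deposit$p0 = amountLbl;                   // ... and the argument label for the caller
        return balance;
    }
}

/** Caller side of the protocol. */
final class Caller {
    static Level lastResultLabel = Level.LOW;
    static Level lastArgumentLabel = Level.LOW;

    static int callDeposit(Account acc, int amount, Level amountLbl, Level pc) {
        acc.secLbl$deposit$p0 = amountLbl.join(pc); // store the argument label (joined with pc) before the call
        int result = acc.deposit(amount);
        lastArgumentLabel = acc.secLbl$deposit$p0;  // re-read after the call: matters for non-instrumented objects
        lastResultLabel = acc.secLbl$deposit$ret;   // retrieve the return value's label
        return result;
    }
}
```

Depositing a `HIGH` amount and then a `LOW` one still yields a `HIGH` result on the second call, because the balance already depends on `HIGH` data and labels are never lowered.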

Instruction rewriting

The final step of the instrumentation process consists in the instrumentation of method bodies. The body of a method consists of a graph of basic blocks, where a basic block is a sequence of instructions starting with a label and terminating in a return, a branching, or a jump instruction. The instrumentation process compositionally rewrites instructions in SSA form [4, 26], according to the rules defined in Figure 8. Every rule for instructions that give rise to information flows takes into account the security label associated with the computation itself, i.e., the security label of the program counter ($pc_\ell$) [5].

The set of instructions considered is the following: loading a value into a local variable, $v = s$ or $v = o.f$; method calls $v = o.g(v_1, \dots, v_n)$, where $o$ represents the target object; object instantiation $v = \mathtt{new}\ C$; binary operations $v = op(e_1, e_2)$; phi expressions $v = \phi(v_1, v_2)$; conditional jumps $\mathtt{if}\ (e)\ \mathtt{goto}\ l$; unconditional jumps $\mathtt{goto}\ l$; and the return instruction $\mathtt{return}\ e$. A phi function is a pseudo-function used at merge points to yield one of its arguments according to the control-flow path executed.

Notation: we use $C$ to denote class names, $v$ and $o$ to denote local variables or registers, $k$ to denote value literals, $e$ to range over local variables and constants, $f$ to denote object fields, and $g$ to denote method names. A variable $v_s$ stores the security label of variable $v$, a field $f_s$ stores the security label of field $f$, a field $g_{p_i}$ stores the security label of parameter $i$ of method $g$, and a field $g_{ret}$ stores the security label of the return value of method $g$.

The rules for loading operations depicted in Figure 8 (CONST, LOCAL, and FIELD) work by combining the $pc$, the security label of the value, and the variable's security label. The dynamic modification of a field, rule FIELD_W, potentially increases the field's security label with the combination of $pc$ and the value's security label. Rule FIELD_C applies to fixed-label fields, where writes must always be lower than or equal to the field's label; otherwise the assertion fails.

To deal with a method call, we have two rules. Rule CALL handles the call to an instrumented method. It starts by copying the arguments' security labels to the target method with the help of auxiliary fields and then calls the method. Once the method completes its execution, we retrieve the result's and the arguments' security labels. It is necessary to collect the arguments' security labels after the call since they might have changed during the method's execution. Rule CALL_X accounts for the use of non-instrumented methods, where the resulting security label is the combination of all operands' security labels plus the program counter and, in the case of instance methods, the callee's security label.

$$
\begin{array}{llll}
[\![ v := k ]\!]_\ell & \triangleq & v := k;\ v_s := v_s \sqcup pc_\ell & (\text{CONST})\\
[\![ v := v' ]\!]_\ell & \triangleq & v := v';\ v_s := v_s \sqcup v'_s \sqcup pc_\ell & (\text{LOCAL})\\
[\![ v := f ]\!]_\ell & \triangleq & v := f;\ v_s := v_s \sqcup f_s \sqcup pc_\ell & (\text{FIELD})\\
[\![ f := v ]\!]_\ell & \triangleq & \mathit{assert}(v_s \sqcup pc_\ell \sqsubseteq f_s);\ f := v & (\text{FIELD}_\text{C},\ f \text{ has spec.})\\
[\![ f := v ]\!]_\ell & \triangleq & f := v;\ f_s := f_s \sqcup v_s \sqcup pc_\ell & (\text{FIELD}_\text{W},\ f \text{ has no spec.})\\
[\![ v := \mathtt{new}\ C ]\!]_\ell & \triangleq & v := \mathtt{new}\ C;\ v_s := v_s \sqcup pc_\ell & (\text{NEW})\\
[\![ v := o.g(v_1, \dots, v_n) ]\!]_\ell & \triangleq & o.f_{p_1} := v_{1s} \sqcup pc_\ell;\ \dots;\ o.f_{p_n} := v_{ns} \sqcup pc_\ell;\ v := o.g(v_1, \dots, v_n); & \\
 & & v_{1s} := o.f_{p_1};\ \dots;\ v_{ns} := o.f_{p_n};\ v_s := o.f_{return} & (\text{CALL},\ o \text{ has spec.})\\
[\![ v := o.g(v_1, \dots, v_n) ]\!]_\ell & \triangleq & v := o.g(v_1, \dots, v_n);\ v_s := v_s \sqcup o_s \sqcup v_{1s} \sqcup \dots \sqcup v_{ns} \sqcup pc_\ell & (\text{CALL}_\text{X},\ o \text{ has no spec.})\\
[\![ v := op(e_1, e_2) ]\!]_\ell & \triangleq & v := op(e_1, e_2);\ v_s := e_{1s} \sqcup e_{2s} \sqcup pc_\ell & (\text{BINOP})\\
[\![ \mathtt{if}\ (v)\ \mathtt{goto}\ k ]\!]_\ell & \triangleq & pc^{out}_\ell := pc^{in}_\ell \sqcup v_s;\ \mathtt{if}\ (v)\ \mathtt{goto}\ k & (\text{BRANCH})\\
[\![ \mathtt{goto}\ k ]\!]_\ell & \triangleq & \mathtt{goto}\ k & (\text{GOTO})\\
[\![ v := \phi(v_1, v_2) ]\!]_\ell & \triangleq & v := \phi(v_1, v_2);\ v_s := v_s \sqcup \phi(v_{1s}, v_{2s}) & (\text{PHI})\\
[\![ \mathtt{return}\ e ]\!]_\ell & \triangleq & \mathtt{this}.f_{return} := e_s \sqcup pc_\ell;\ o.f_{p_1} := v_{1s};\ \dots;\ o.f_{p_n} := v_{ns};\ \mathtt{return}\ e & (\text{RETURN})
\end{array}
$$

Figure 8: Instrumentation rules

Notice that in the case of rule BRANCH, the value of the context's security label ($pc$) increases according to the security label of the branch condition. Once the execution leaves the scope started by the branch condition, it is necessary to reinstate $pc$'s old security label. To restore the $pc$, we follow the rules depicted in Figure 9. The rule depicted in Figure 9a applies when the basic block $\ell$ has multiple predecessors $\ell_1, \dots, \ell_n$ but $e_1$ does not post-dominate a common predecessor of $\ell_1, \dots, \ell_n$, i.e., multiple control flows converge but the scope does not change. According to this rule, the security label of the context at the beginning of block $\ell$ results from the $\phi$ function over the predecessors' context security labels. The second rule, depicted in Figure 9b, applies when $\ell$ has multiple predecessors $\ell_1, \dots, \ell_n$ and $e_1$ is the first instruction to post-dominate a common predecessor $d$ of $\ell_1, \dots, \ell_n$. In this case, the context security label at the beginning of block $\ell$ ($pc^{in}_\ell$) is equal to the context security label at the beginning of $d$ ($pc^{in}_d$), i.e., $e_1$ is the first instruction to execute outside the scope created in $d$ and reinstates the value $pc$ had before entering that scope. Unconditional branches do not change any security meta-information. Rule PHI chooses the security label according to the executed predecessor.

$$
\begin{array}{ll}
\text{(a)} & \ell:\ pc_\ell := \phi(pc_{\ell_1}, \dots, pc_{\ell_n});\ [\![ e_1 ]\!] \dots [\![ e_n ]\!] \quad \text{where } \ell_1, \dots, \ell_n \text{ are predecessor nodes of } \ell\\
\text{(b)} & \ell:\ pc^{in}_\ell := pc^{in}_d;\ [\![ e_1 ]\!] \dots [\![ e_n ]\!] \quad \text{where } d \text{ is the post-dominated node where the current scope started}
\end{array}
$$

In both rules, $pc^{out}_\ell$ is equal to $pc^{in}_\ell$ if not stated otherwise.

Figure 9: Context security label rules for merging control flows

When a return instruction executes, the $pc$ stack is placed at the same label as it was when the function was called (because of ad-hoc returns at any point in the method body). Besides restoring the $pc$, it is also necessary to copy the returned value's security label and the arguments' security labels to auxiliary fields. Updating the arguments' security labels is needed to deal with cases where they are objects of non-instrumented types: the objects' security labels might change during the method's execution, in which case any changes must be propagated to the caller.

Testing phase

Once the security layer has been defined and the application instrumented, we test the example for information leaks. To do so, we need to test all available operations in the instrumented application. If an operation has an information leak, the monitor halts the system's execution, indicating an assertion violation. Strong guarantees about data confidentiality therefore depend on the test coverage achieved: leaks in code paths that no test exercises remain undetected.

In summary, our approach for the detection of illegal information flows using dependent security labels is embodied in a tool based on the SOOT framework, which instruments a target application with a reference monitor. With this approach, we believe we have improved the process of security certification for third-party systems. Despite the need for some specification effort, there is typically a set of known DAO and controller classes for which it is possible to design a specification.
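To see how the rules of Figures 8 and 9 interact, the following sketch hand-applies them, at the Java source level rather than on Shimple, to a small maximum function. It is an illustration under assumptions, not output produced by SNITCH: the two-point `Lbl` enum, the `MonitorSketch` class, its static shadow fields (the rules use fields of the target object), and the explicit pc stack are all hypothetical. The FIELD_C check is shown as a separate write to a field with a fixed Public label.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical two-point lattice: PUBLIC below SECRET. */
enum Lbl {
    PUBLIC, SECRET;

    Lbl join(Lbl o) { return (this == SECRET || o == SECRET) ? SECRET : PUBLIC; }

    boolean flowsTo(Lbl o) { return this == PUBLIC || o == SECRET; }
}

final class MonitorSketch {
    // Shadow fields for the parameters and return value of max (cf. g_{p_i} and g_{ret}).
    static Lbl max$p0 = Lbl.PUBLIC, max$p1 = Lbl.PUBLIC, max$ret = Lbl.PUBLIC;

    // A field with a fixed (specified) label, used to illustrate FIELD_C.
    static int publicField;
    static final Lbl publicField$s = Lbl.PUBLIC;

    static int max(int a, int b) {
        Deque<Lbl> pcStack = new ArrayDeque<>();
        Lbl pc = Lbl.PUBLIC;
        Lbl aS = max$p0, bS = max$p1;            // retrieve the argument labels

        boolean cond = a > b;                    // BINOP: cond_s := a_s join b_s join pc
        Lbl condS = aS.join(bS).join(pc);

        pcStack.push(pc);                        // BRANCH: remember pc_in ...
        pc = pc.join(condS);                     // ... and raise pc_out := pc_in join cond_s

        int r1 = 0, r2 = 0;
        Lbl r1S = Lbl.PUBLIC, r2S = Lbl.PUBLIC;
        if (cond) {
            r1 = a; r1S = r1S.join(aS).join(pc); // LOCAL
        } else {
            r2 = b; r2S = r2S.join(bS).join(pc); // LOCAL
        }

        pc = pcStack.pop();                      // merge point (Figure 9b): pc_in := pc_in of the branching block
        int r3 = cond ? r1 : r2;                 // PHI picks the executed predecessor ...
        Lbl r3S = cond ? r1S : r2S;              // ... and the matching label

        max$ret = r3S.join(pc);                  // RETURN: f_return := e_s join pc
        return r3;
    }

    // FIELD_C: assert(v_s join pc flows to f_s) before writing a field with a fixed label.
    static void writePublicField(int v, Lbl vS, Lbl pc) {
        if (!vS.join(pc).flowsTo(publicField$s)) {
            throw new IllegalStateException("information leak: " + vS.join(pc) + " into " + publicField$s);
        }
        publicField = v;
    }
}
```

With a secret first argument, `max(1, 5)` returns 5, which comes from the public argument `b`, yet the result is labelled `SECRET` because the branch on `a > b` raised `pc`. The implicit flow is tracked, and writing that result into the Public field fails the FIELD_C check.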

In this section we illustrate the rewriting process using a small example. Let us consider the Java class depicted in Figure 10b. As stated in Section 3, we first add new fields (which we will refer to as "label fields") to the application classes for storing security labels. We add one label field for the object's security label (`secLbl$this`), one label field for every field of a non-instrumented type (`secLbl$field0`), and, for every method, label fields for parameters and return values of non-instrumented types (`secLbl$methodA$p0`, `secLbl$methodA$p1`, and `secLbl$methodA$ret`). We show the result of field injection on class Example in Figure 10a.

Once all the necessary fields are injected, we can proceed to rewrite the methods' bodies in SSA form. To do so, we rewrite every instruction according to the rules defined in Section 3. Figure 11a depicts a possible representation of methodA in static single assignment form, and Figure 11b illustrates the result of rewriting methodA's body.

Lines 6-7 retrieve the arguments' security labels (`secLbl$a` and `secLbl$b`) from the auxiliary fields injected for the purpose (`secLbl$methodA$p0` and `secLbl$methodA$p1`, respectively).

class Example {
    SecurityLabel secLbl$this;
    SecurityLabel secLbl$field0;
    long field0;
    Example field1;
    SecurityLabel secLbl$methodA$p0;
    SecurityLabel secLbl$methodA$p1;
    int methodA (int a, int b) {...}
    SecurityLabel secLbl$methodA$ret;
    Example methodB(Example e) {...}
}
(a) Class Example after field injection.

class Example {
    long field0;
    Example field1;
    int methodA (int a, int b) {
        if(a > b)
            return a;
        return b;
    }
    Example methodB(Example e) {...}
}
(b) Method methodA of Class Example.

Figure 10: Class Example
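The field-injection naming scheme described above can be sketched with Java reflection. This is an illustrative sketch, not SNITCH's implementation (which rewrites bytecode with SOOT); the helper `labelFieldNames` and the explicit set of instrumented classes passed to it are assumptions. Overloaded methods, which would produce clashing names under this scheme, are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class LabelFieldNames {

    // Copy of the class of Figure 10b; the method bodies are irrelevant here.
    static class Example {
        long field0;
        Example field1;
        int methodA(int a, int b) { return a > b ? a : b; }
        Example methodB(Example e) { return e; }
    }

    /** Hypothetical helper: names of the label fields to inject into cls, sorted. */
    static Set<String> labelFieldNames(Class<?> cls, Set<Class<?>> instrumented) {
        Set<String> names = new TreeSet<>();
        names.add("secLbl$this");                                // label of the object itself
        for (Field f : cls.getDeclaredFields()) {
            if (f.isSynthetic()) continue;                       // skip compiler-generated fields
            if (!instrumented.contains(f.getType())) {
                names.add("secLbl$" + f.getName());              // field of a non-instrumented type
            }
        }
        for (Method m : cls.getDeclaredMethods()) {
            if (m.isSynthetic()) continue;
            Class<?>[] params = m.getParameterTypes();           // declaration order
            for (int i = 0; i < params.length; i++) {
                if (!instrumented.contains(params[i])) {
                    names.add("secLbl$" + m.getName() + "$p" + i);
                }
            }
            Class<?> ret = m.getReturnType();
            if (ret != void.class && !instrumented.contains(ret)) {
                names.add("secLbl$" + m.getName() + "$ret");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<Class<?>> instrumented = Set.of(Example.class);
        System.out.println(labelFieldNames(Example.class, instrumented));
        // [secLbl$field0, secLbl$methodA$p0, secLbl$methodA$p1, secLbl$methodA$ret, secLbl$this]
    }
}
```

On the class of Figure 10b this yields the five label fields of Figure 10a; `field1` and `methodB` get none because `Example` is itself an instrumented type.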

Lines 8-10 compute the condition's security label, then update pc, keeping its old value so that it can be restored when execution leaves the branch's scope, and finally execute the branching instruction.

Lines 11-16 compute the value to return; every operation executed is accompanied by the corresponding security-label operation. Each branch stores its result in a different version of the same variable (result_0 and result_1).

Lines 17-19 terminate the context initiated by the branching instruction (more specifically, in line 9). When execution leaves the scope of the branch instruction, pc must be restored to its previous value. Since two paths converge, φ functions are also needed to decide which version of each variable to use.

Lines 19-20 conclude the method's execution. They store the result's label (secLbl$result_2) in field secLbl$methodA$ret and update the arguments' label fields. The return instruction executes only after the labels have been stored.

To provide some validation of our approach, we developed a prototype tool, SNITCH, and then used it to instrument the web application presented in Section 3.

As defined in the approach, SNITCH instruments a system with an in-lined reference monitor based on a set of security specifications. As shown in Figure 12, which depicts SNITCH's architecture, SNITCH consists of two modules: a parser for the security specifications and an instrumentation module for bytecode rewriting. The latter makes use of the SOOT [23] framework which, as previously stated, is a framework for Java bytecode manipulation and optimization. To reduce the reference monitor's impact on the system's execution, SNITCH uses the optimization suites SOOT offers to optimize the instrumented code.

class Example {
    ...
    int methodA (int a, int b) {
        if(a > b) goto LABEL0
        result_0 = b
        goto LABEL1
      LABEL0:
        result_1 = a
      LABEL1:
        result_2 = phi(result_0, result_1)
        return result_2
    }
    ...
}
(a)

class Example {
    ...
    int methodA (int a, int b) {
        secLbl$a = this.secLbl$methodA$p0
        secLbl$b = this.secLbl$methodA$p1
        secLbl$cond = combine(secLbl$a, secLbl$b)
        secLbl$oldPC = increasePC(secLbl$cond)
        if(a > b) goto LABEL0
        result_0 = b
        secLbl$result_0 = secLbl$b
        goto LABEL1
      LABEL0:
        result_1 = a
        secLbl$result_1 = secLbl$a
      LABEL1:
        setPC(secLbl$oldPC)
        result_2 = phi(result_0, result_1)
        secLbl$result_2 = phi(secLbl$result_0, secLbl$result_1)
        this.secLbl$methodA$ret = secLbl$result_2
        this.secLbl$methodA$p0 = secLbl$a
        this.secLbl$methodA$p1 = secLbl$b
        return result_2
    }
    ...
}
(b)

Figure 11: Original (a) and instrumented (b) SSA code for method methodA of Class Example
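Figure 11b can be transcribed into ordinary Java to see the label bookkeeping at work. The sketch assumes a two-point lattice instead of dependent labels and gives `combine`, `increasePC`, and `setPC` plausible bodies (join, raise-and-save, restore); these bodies are assumptions, not taken from SNITCH. Because Java has no φ functions, a single variable stands in for result_0, result_1, and result_2. The figure does not show pc being joined into the labels assigned inside the branches; the sketch does so, which is the usual way of recording the implicit flow from the condition.

```java
public class InstrumentedExample {

    // Assumption: a two-point lattice LOW <= HIGH replaces SNITCH's dependent labels.
    enum Label {
        LOW, HIGH;
        Label join(Label other) { return (this == HIGH || other == HIGH) ? HIGH : LOW; }
    }

    // Hypothetical monitor state: the current program-counter label.
    static Label pc = Label.LOW;

    static Label combine(Label a, Label b) { return a.join(b); }

    /** Raises pc by the condition's label and returns the old pc so it can be restored. */
    static Label increasePC(Label cond) {
        Label old = pc;
        pc = pc.join(cond);
        return old;
    }

    static void setPC(Label restored) { pc = restored; }

    static class Example {
        // Label fields injected for methodA (cf. Figure 10a); the caller fills in p0 and p1.
        Label secLbl$methodA$p0 = Label.LOW;
        Label secLbl$methodA$p1 = Label.LOW;
        Label secLbl$methodA$ret = Label.LOW;

        int methodA(int a, int b) {
            Label secLbl$a = this.secLbl$methodA$p0;      // argument labels
            Label secLbl$b = this.secLbl$methodA$p1;
            Label secLbl$cond = combine(secLbl$a, secLbl$b);
            Label secLbl$oldPC = increasePC(secLbl$cond);
            int result;                                   // plays result_0, result_1, result_2
            Label secLbl$result;
            if (a > b) {
                result = a;
                secLbl$result = secLbl$a.join(pc);        // pc records the implicit flow
            } else {
                result = b;
                secLbl$result = secLbl$b.join(pc);
            }
            setPC(secLbl$oldPC);                          // leave the branch's scope
            this.secLbl$methodA$ret = secLbl$result;      // store labels before returning
            this.secLbl$methodA$p0 = secLbl$a;
            this.secLbl$methodA$p1 = secLbl$b;
            return result;
        }
    }

    public static void main(String[] args) {
        Example e = new Example();
        e.secLbl$methodA$p0 = Label.HIGH;   // argument a is secret
        e.secLbl$methodA$p1 = Label.LOW;    // argument b is public
        int r = e.methodA(1, 5);
        System.out.println(r + " " + e.secLbl$methodA$ret + " " + pc); // 5 HIGH LOW
    }
}
```

With a secret `a` and a public `b`, `methodA(1, 5)` returns the public value `b`, yet the returned label is HIGH, because which argument is returned reveals the outcome of `a > b`.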

To test the approach, we introduced information leaks into the example application. The leaks resulted from both implicit and explicit information flows; those caused by explicit flows were bad assignments or attempts to return classified information. The in-lined reference monitor was able to detect all the information leaks introduced in the example application. The example's information retrieval methods were initially implemented naively, returning all the information available on employees and disregarding any information access restrictions. We reached the final implementation of the application through a trial-and-error process in which we instrumented, tested, and fixed the application multiple times until no further information leaks were detected.

The instrumentation of the example web application also allowed us to collect some broad measurements of the reference monitor's impact on an application's execution time. Still, more applications need to be instrumented and tested to obtain more accurate values. We defined a set of five operations, which we used to measure the total execution time and each operation's average execution time in both the original and the instrumented application. Figure 13 shows how the reference monitor affects the execution time of each operation. The information retrieval operation is the one where the monitor's impact was greatest. An explanation for this is that this operation extracts the most information per employee; therefore, it executes data


Figure 12: SNITCH internal phases. [Diagram: the Security Specification Parser reads the specification files (.spec) and produces security labels (.class); the SNITCH Instrumentor takes the target's compiled Java code (.class files or a JAR) through security label injection, field shadowing, method shadowing, method body rewriting, and SOOT optimizations, producing the instrumented application (.class).]

Figure 13: Runtime overhead per application method. [Bar chart: overhead factor (x1) for the operations ListEmployees, GetEmployeeInfo, GetAvgSalary, AddNewSupervisor, and AddNewAssociate.]
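The per-operation overhead factors in Figure 13 compare the execution times of the original and instrumented applications. A minimal sketch of that computation follows; it assumes the overhead factor is the ratio of the mean instrumented time to the mean original time, and `averageNanos` and `overheadFactor` are hypothetical helpers, not part of SNITCH. The sample numbers in `main` are invented for illustration and are not the paper's measurements.

```java
import java.util.Arrays;

public class OverheadFactor {

    /** Average wall-clock time of op, in nanoseconds, over the given number of runs. */
    static double averageNanos(Runnable op, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            total += System.nanoTime() - start;
        }
        return (double) total / runs;
    }

    /** Mean instrumented time divided by mean original time (throws if either array is empty). */
    static double overheadFactor(long[] originalNanos, long[] instrumentedNanos) {
        double original = Arrays.stream(originalNanos).average().getAsDouble();
        double instrumented = Arrays.stream(instrumentedNanos).average().getAsDouble();
        return instrumented / original;
    }

    public static void main(String[] args) {
        // Invented samples for illustration only.
        long[] original = {100, 110, 90};
        long[] instrumented = {300, 330, 270};
        System.out.println(overheadFactor(original, instrumented)); // 3.0
    }
}
```

For real measurements, the JVM should be warmed up before timing (JIT compilation otherwise inflates the first runs), and each operation should be timed over many runs.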

There is a considerable amount of work on information flow analysis in the literature, ranging from axiomatic approaches [3], dynamic analyses [5, ?], and programming languages and type systems [15, 19, 22] to instrumented virtual machines [12]. Jif (Java Information Flow) [18, 19] has embedded information flow analysis capabilities and allows the definition of a form of dynamic matching between labels and principals, which can in turn be used to parametrize classes and define richer runtime policies.

TaintDroid [12] is an approach that does not extend or create a programming language but instead instruments the virtual machine on which the intermediate language executes. Sensitive data is tainted at its source (e.g., GPS), and the instrumented virtual machine propagates the taint throughout a program's execution. When tainted data reaches a sink (e.g., the network interface), the information leak is logged. An advantage of TaintDroid's approach over Jif is that application code does not need to be changed.

Austin and Flanagan [5] present a dynamic approach for information flow analysis that guarantees non-interference in dynamically-typed languages. They present and compare two approaches: Universal Labelling, where every value has an explicit (security) label, and Sparse Labelling, where all values are tracked but only some are explicitly labelled. Sparse labelling is observably equivalent to universal labelling but has significantly less overhead.

Ferreira [14] introduces the use of refinement types in information flow analysis, presenting an extension of LiveWeb/λDB [7] with type-based information flow. Security labels are expressed using first-order logic propositions that depend on runtime values. Value-dependent security labels are further developed by Lourenço and Caires [16], who present the first non-interference result for dependent information flow types.

The purpose of this work is to study the applicability of information flow analysis to the certification of third-party Java-based software systems. To provide a more usable, flexible, and expressive framework, we have adopted dependent information flow control as the preferred abstraction. This paper presents work on the development of a certification tool that attaches in-lined reference monitors to existing compiled code, based on interface specifications at observable points of the system.

We foresee some immediate follow-ups to this work: addressing the challenges of label creep, and introducing abstract interpretation to help reduce the runtime overhead beyond the optimizations that result from using an SSA intermediate language. Consider the security label combination operation $\sqcup$, which, given two security labels $\ell_A$ and $\ell_B$, yields the lowest security label that is higher than or equal to both $\ell_A$ and $\ell_B$, together with the security lattice shown in Figure 3. Computations such as $\mathit{AssociateSL}(\bot) \sqcup \mathit{AssociateSL}(\top)$ can then be removed, since their result, $\mathit{AssociateSL}(\top)$, is known beforehand. By statically analysing the code, not only could trivial computations be removed, but it would also be possible, in some instances, to detect illegal information flows statically.
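The constant-folding opportunity mentioned above can be sketched as a small rewrite over label expressions. The sketch assumes that the parameter of AssociateSL ranges over a flat lattice (⊥ below every associate id, every id below ⊤) and that AssociateSL(a) ⊔ AssociateSL(b) = AssociateSL(a ⊔ b); Figure 3's actual lattice is not reproduced here, and `Param`, `LabelExpr`, and `fold` are hypothetical names. Joins of two constants are evaluated at instrumentation time, while joins involving runtime labels are left for the monitor.

```java
import java.util.Objects;

public class LabelFolding {

    /** Parameter of AssociateSL: BOTTOM, a concrete associate id, or TOP (assumed flat lattice). */
    static final class Param {
        enum Kind { BOTTOM, ID, TOP }

        static final Param BOTTOM = new Param(Kind.BOTTOM, 0);
        static final Param TOP = new Param(Kind.TOP, 0);

        final Kind kind;
        final long id;   // meaningful only when kind == ID

        private Param(Kind kind, long id) { this.kind = kind; this.id = id; }

        static Param ofId(long id) { return new Param(Kind.ID, id); }

        /** Join in the flat lattice BOTTOM <= id <= TOP. */
        Param join(Param o) {
            if (kind == Kind.BOTTOM) return o;
            if (o.kind == Kind.BOTTOM) return this;
            if (kind == Kind.ID && o.kind == Kind.ID && id == o.id) return this;
            return TOP;  // TOP absorbs everything; two distinct ids also join to TOP
        }

        @Override public boolean equals(Object x) {
            if (!(x instanceof Param)) return false;
            Param p = (Param) x;
            return p.kind == kind && p.id == id;
        }

        @Override public int hashCode() { return Objects.hash(kind, id); }

        @Override public String toString() { return kind == Kind.ID ? Long.toString(id) : kind.name(); }
    }

    /** A label computation the instrumentor might emit. */
    interface LabelExpr {}

    /** Constant label AssociateSL(p), known at instrumentation time. */
    static final class Const implements LabelExpr {
        final Param p;
        Const(Param p) { this.p = p; }
        @Override public String toString() { return "AssociateSL(" + p + ")"; }
    }

    /** Label known only at runtime, e.g. a secLbl$ local. */
    static final class Var implements LabelExpr {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** combine(l, r), the join of two label expressions. */
    static final class Join implements LabelExpr {
        final LabelExpr l, r;
        Join(LabelExpr l, LabelExpr r) { this.l = l; this.r = r; }
        @Override public String toString() { return "combine(" + l + ", " + r + ")"; }
    }

    /** Folds joins whose operands are both constants; other joins stay for the runtime monitor. */
    static LabelExpr fold(LabelExpr e) {
        if (!(e instanceof Join)) return e;
        Join j = (Join) e;
        LabelExpr l = fold(j.l);
        LabelExpr r = fold(j.r);
        if (l instanceof Const && r instanceof Const) {
            return new Const(((Const) l).p.join(((Const) r).p));
        }
        return new Join(l, r);
    }

    public static void main(String[] args) {
        LabelExpr e = new Join(new Const(Param.BOTTOM), new Const(Param.TOP));
        System.out.println(fold(e));                               // AssociateSL(TOP)
        System.out.println(fold(new Join(new Var("secLbl$a"), e))); // combine(secLbl$a, AssociateSL(TOP))
    }
}
```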
The introduction of such mechanisms would allow our approach to evolve from a dynamic information flow control mechanism into a hybrid one. Another possible line of work would be to integrate our approach into software development frameworks as a development tool, allowing developers to test their code as they develop it. Our approach could also benefit from such integration in other ways, for example through the automatic extraction of specifications from framework annotations. Frameworks like Spring and Jersey annotate classes with information relevant to the specification files; for instance, Spring uses the @Repository annotation to flag DAO classes.

Regarding the monitor's overhead presented in Section 5, we would like to highlight that the measurements only took CPU time into account. When I/O operations are taken into account, the monitor's overhead can be considered negligible: the CPU-time measurements were on the order of microseconds, while I/O operations took milliseconds, three orders of magnitude more, and network operations took even longer.
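To make the negligibility argument concrete, the following worked equation (an illustration under stated assumptions, not a measurement) shows how I/O time dilutes a CPU-only overhead. It assumes the monitor slows only CPU work, by a factor $f$ such as those plotted per operation in Figure 13, and leaves I/O time unchanged; $t_c$ and $t_{io}$ denote CPU and I/O time and $F$ the end-to-end overhead factor. Taking I/O to be three orders of magnitude slower, $t_{io} = 10^3\,t_c$:

$$
F \;=\; \frac{f\,t_c + t_{io}}{t_c + t_{io}}
\;=\; 1 + \frac{(f-1)\,t_c}{t_c + t_{io}}
\;=\; 1 + \frac{f-1}{1001}.
$$

For a hypothetical CPU-time overhead factor of $f = 3$, this gives $F \approx 1.002$, i.e. an end-to-end slowdown of about 0.2%.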

Acknowledgements

This work was funded by NOVA LINCS UID/CEC/04516/2013, COST CA15123, and FC&T Project CLAY - PTDC/EEI-CTP/4293/2014.

References

[1] Ana Almeida Matos & Gérard Boudol (2009): On Declassification and the Non-disclosure Policy. J. Comput. Secur.
[2] Core J2EE Patterns (Core Design Series): Best Practices and Design Strategies, 2 edition. Sun Microsystems, Inc., Mountain View, CA, USA.
[3] Gregory R. Andrews & Richard P. Reitman (1980): An Axiomatic Approach to Information Flow in Programs. ACM Trans. Program. Lang. Syst.
[4] SSA is Functional Programming. SIGPLAN Not.
[5] Efficient Purely-dynamic Information Flow Analysis. In: Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security, PLAS '09, ACM, New York, NY, USA, pp. 113–124, doi:10.1145/1554339.1554353.
[6] Thomas H. Austin & Cormac Flanagan (2010): Permissive Dynamic Information Flow Analysis. In: Proceedings of the 5th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security, pp. 3:1–3:12, doi:10.1145/1814217.1814220.
[7] Luís Caires, Jorge A. Pérez, João Costa Seco, Hugo Torres Vieira & Lúcio Ferrão (2011): Type-based Access Control in Data-centric Systems. In: Proceedings of the 20th European Conference on Programming Languages and Systems: Part of the Joint European Conferences on Theory and Practice of Software, ESOP'11/ETAPS'11, Springer-Verlag, Berlin, Heidelberg, pp. 136–155, doi:10.1006/inco.1994.1093.
[8] D. Chandra & M. Franz (2007): Fine-Grained Information Flow Analysis and Enforcement in a Java Virtual Machine. In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 463–475, doi:10.1109/ACSAC.2007.37.
[9] Robert Daigneau (2011): Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services, 1 edition. Addison-Wesley Professional.
[10] Dorothy E. Denning (1976): A Lattice Model of Secure Information Flow. Commun. ACM.
[11] Dorothy E. Denning & Peter J. Denning (1977): Certification of Programs for Secure Information Flow. Commun. ACM.
[12] TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. Communications of the ACM, doi:10.1145/2494522.
[13] Úlfar Erlingsson & Fred B. Schneider (2000): SASI Enforcement of Security Policies: A Retrospective. In: Proceedings of the 1999 Workshop on New Security Paradigms, NSPW '99, ACM, New York, NY, USA, pp. 87–95, doi:10.1145/335169.335201. Available at http://doi.acm.org/10.1145/335169.335201.
[14] Paulo Jorge Abreu Duarte Ferreira (2012): Information flow analysis using data-dependent logical propositions. MSc Dissertation, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa.
[15] Jürgen Graf, Martin Hecker & Martin Mohr (2013): Using JOANA for Information Flow Control in Java Programs - A Practical Guide. In: Proceedings of the 6th Working Conference on Programming Languages (ATPS'13).
[16] Luísa Lourenço & Luís Caires (2015): Dependent Information Flow Types. SIGPLAN Not.
[17] A type system for value-dependent information flow analysis. Ph.D. thesis.
[18] Andrew C. Myers & Barbara Liskov (2003): Protecting privacy using the decentralized label model. In: Foundations of Intrusion Tolerant Systems, OASIS 2003, pp. 89–116, doi:10.1145/363516.363526.
[19] Andrew C. Myers, Lantian Zheng, Steve Zdancewic, Stephen Chong & Nathaniel Nystrom (2006): Jif 3.0: Java information flow. Available at .
[20] Andrei Sabelfeld & Andrew C. Myers (2003): Language-based information-flow security. IEEE Journal on Selected Areas in Communications.
[21] A Language-Based Approach to Security.
[22] V. Simonet (2003): The Flow Caml System (version 1.00): Documentation and user's manual. Available at .
[23] R Vallée-Rai, P Co & E Gagnon (1999): Soot - a Java bytecode optimization framework. CASCON.
[24] Stephan Arthur Zdancewic (2002): Programming Languages for Information Security. Ph.D. thesis, Ithaca, NY, USA. AAI3063751.
[25] Steve Zdancewic (2004): Challenges for information-flow security. Proceedings of the 1st International Workshop on the Programming Language Interference and Dependence (PLID'04).
[26] Jianzhou Zhao, Santosh Nagarakatte, Milo M.K. Martin & Steve Zdancewic (2013):