SNITCH: Dynamic Dependent Information Flow Analysis for Independent Java Bytecode
D. Ancona and G. Pace (Eds.): Verification of Objects at RunTime EXecution 2018 (VORTEX 2018), EPTCS 302, 2019, pp. 16–31, doi:10.4204/EPTCS.302.2
© E. Geraldo & J.C. Seco. This work is licensed under the Creative Commons Attribution License.
Eduardo Geraldo João Costa Seco
NOVA LINCS - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Portugal
Software testing is the most commonly used technique in the industry to certify the correctness of software systems. This includes security properties like access control and data confidentiality. However, information flow control and the detection of information leaks using tests is a demanding task without the use of specialized monitoring and assessment tools.

In this paper, we tackle the challenge of dynamically tracking information flow in third-party Java-based applications using dependent information flow control. Dependent security labels increase the expressiveness of traditional information flow control techniques by allowing labels to be parametrized with context-related information and by allowing the specification of more detailed and fine-grained policies. Instead of the fixed security lattice used in traditional approaches, which defines a fixed set of security compartments, dependent security labels allow for a dynamic lattice that can be extended at runtime, so that new security compartments can be defined using context values.

We present a specification and instrumentation approach for rewriting JVM compiled code with in-lined reference monitors. To illustrate the proposed approach we use an example and a working prototype, SNITCH. SNITCH operates over the static single assignment language Shimple, an intermediate representation for Java bytecode used in the SOOT framework.
Data confidentiality is central in current software engineering practices. In the past years, there have been recurrent news reports about information leaks surfacing as a result of subtle programming errors. For instance, GitHub (http://bit.ly/2XNfEEU) and Twitter (http://bit.ly/2XuMH16), both large-scale systems with impact on a large number of users, discovered and reported that their users' passwords were stored in cleartext in internal system logs, from where an ill-intended employee could have accessed them and entered the users' accounts. In the software industry, certifying the functional correctness of software systems by testing is commonly accepted as a satisfactory approximation for compliance with functional specifications and requirement fulfilment. However, testing aspects such as data confidentiality is a difficult task when using traditional approaches. Properties like access control and information flow control require setting up complex testing scenarios where the symptoms of an error are hardly detectable. Typically, information leaks are only perceived at a global scale by detailed observation of side effects.

Information flow analysis [5, 10, 20, 25] is a language-based approach for the detection of information leaks in software systems. Information flow analysis is present in the literature in the form of static and dynamic analyses, each with its own advantages and disadvantages. Static analysis usually requires a considerable effort in code annotation or the complete refactoring of the target system. Besides, the … A from accessing information of a user B. We follow a more expressive approach that introduces dependent types for information flow [14, 16] and access control [7], allowing for the definition of data-dependent policies. Value-dependent security labels improve the expressiveness of traditional information flow techniques.
By allowing the parametrization of security labels with context-related information, it is possible not only to define more detailed and fine-grained policies, but also to create new security compartments at runtime. Java Information Flow (JIF) [18, 19] supports dynamic labels, which differ from dependent security labels. Dynamic labels follow a decentralized security model [18] based on the notion of data ownership and authorizations. According to this model, each data item has an owner, and the owner allows, or not, its data to be read or written by some entity. Dependent security labels follow a traditional security model with a security lattice that hierarchically organizes security labels, where a datum has a security label and can only be accessed by entities with sufficient privileges. For instance, using dependent security labels, we can define policies restricting access to an employee's personal telephone number to the employee and their department manager.

In this paper, we present a strategy to specify and rewrite the intermediate Java code of applications to embed reference monitors capable of enforcing information flow policies using dependent security labels. Our low-level code rewriting approach for the Java virtual machine language is inspired by tools like SASI [?] and TaintDroid [12] that automatically monitor the confidentiality of information of compiled Java programs. To check data confidentiality in an application, TaintDroid [12] instruments the underlying runtime system (the Android Java virtual machine), while our approach instruments the application itself. We require the specification of some selected classes that make up the entry points of a system, for instance, service controllers [9] or DAO classes [2] (database access objects, i.e., data types that match database table schemas). Then, we use this information to introduce in-lined reference monitors and instrument the application code to taint computed values with dependent security labels. Our approach follows the style and semantics of the seminal work by Austin and Flanagan on dynamic information flow analysis [5] and is inspired by works such as the one by Lourenço and Caires [16, 17] on dependent information flow analysis and the work of Chandra and Franz [8] on hybrid information flow analysis for Java bytecode.

The rewriting process, presented in Section 3, operates over an intermediate representation in static single assignment (SSA) form [4, 26]. SSA is a way to arrange operations such that each variable is defined only once; it simplifies and improves optimizations such as constant propagation, value numbering, common sub-expression elimination and partial redundancy elimination, among others. We present an example that illustrates the rewriting process over an intermediate representation in SSA form. Our approach is backed by a prototype tool, SNITCH, to instrument intermediate Java code. SNITCH was evaluated on small-scale web applications using Java servlets, and it uses the SOOT framework [23] (https://github.com/Sable/soot) for code rewriting, a framework for optimizing and manipulating Java bytecode that offers multiple intermediate representations.
One such representation is Shimple, an intermediate representation in SSA form over which our prototype operates.

Our contributions can be summarized as follows:
• a rewriting system for instrumenting static single assignment instructions with in-line reference monitors for information flow control with dependent information flow labels;
• a specification schema to define dependent information flow policies;
• a way to define dependent security labels in Java; and
• a tool capable of instrumenting third-party compiled code with an in-line reference monitor.

We leave for future work the introduction of abstract interpretation to optimize the computation of security labels, as well as mechanisms like the one presented by Austin and Flanagan [6] to reduce label creeping and increase the number of accepted programs. Abstract interpretation would allow us not only to reduce the number of security-label-related computations executed at runtime, but also to achieve a gradual approach.

We start this paper by briefly presenting some concepts on dependent information flow labels in Section 2. Section 3 presents our approach and describes an example of a web application. As we present our approach, we also describe the steps required to instrument the example application to test it for information leaks. In Section 4 we illustrate the code rewriting process. In Sections 5 and 6 we validate our approach and discuss the related work. Finally, in Section 7, we conclude with some remarks on how to pursue this line of work.

Language-based security [21], and in particular information flow control [10], specify and provide a platform to enforce security policies from the perspective of data creation, manipulation and data flow operations. Information flow control allows the definition of hierarchical security compartments and the tracking of all uses of data, ensuring that higher security data does not flow (leak) to unrelated or lower security compartments.
Traditionally, security labels are organized in lattices [11] and are associated with value types at compile time [18, 24] or used to taint values at runtime [5, 8].

Information flow control allows for the detection of both explicit and implicit illegal information flows. Explicit flows result from data transfer operations such as assignments, while implicit information flows arise from the control flow of a program. Computations at a high security label can have side effects on values of lower security labels, allowing those with access to lower-labelled variables to infer the values of those computations. Such side effects go against the non-interference property, a property at the core of information flow control that denotes the absence of information leaks. According to non-interference, changes to values with a high security label must not be reflected in values with a lower security label, i.e., changes in the secret input of a program must not interfere with the program's public output [25].

Traditional security lattices enforce a significant degree of label squashing due to the lack of precision of the security labels used. For instance, it is usually the case that a single security label is used to represent all the users of a system, which does not allow the definition of fine-grained, per-user, information flow restrictions. The introduction of dependent security policies increases the precision of security specifications and introduces a higher degree of flexibility and usability. Dependent security policies are present in approaches like value-dependent information flow types [14, 16, 17] and dynamic labels [19]. The former was first introduced in the context of access control policies in [7] and then extended to the domain of static checking of information flow in [14, 16, 17]. With dependent security labels, we can express, for instance, that a given function yields values of a parametric security label $user(u)$, where $u$ is a runtime value, allowing for row-level compartmentalization of security data visibility (cf. [16]). The predefined lattice is automatically extended to capture dependent security labels like $user(u)$, $user(\top)$, and $user(\bot)$. The generic security label $user(\top)$ is the label that allows access to all users' data. The security label $user(\bot)$ is the label all users can read. We have the relations $user(\bot) \sqsubset user(u) \sqsubset user(\top)$ for any user $u$, and $user(u) \not\sqsubseteq user(v)$ for all users $u$ and $v$ such that $u \neq v$. The first relation means that, for any user $u$, data can flow from the user's security compartment $user(u)$ to label $user(\top)$, and from $user(\bot)$ to label $user(u)$.

Figure 1: Code Instrumentation Phases (Application classes and Specification files; Object Tainting; Parameters and method return passing; Instruction rewriting; Instrumented classes)

Our technical approach follows the phases depicted in Figure 1. Given an application, a dependent security lattice, and a security specification for key classes of the application, our approach instruments the application with an in-lined reference monitor capable of enforcing information flow policies. We next illustrate our approach with the help of an example.

Let us consider a small web application that implements a directory for a given company, integrated in their website. The information stored by such a system for each employee includes its identifier, name, address, salary, and password. We define two kinds of employees in this example: supervisor and associate. The latter category includes extra information, namely its supervisor and information about its last evaluation.
Other users include unregistered users accessing the company's website. In this example, we consider the following access constraints on the stored information:
• only registered users can see the address of other users (employees);
• an associate employee can only access their own salary;
• a supervisor user can access all associate users' salary information;
• the information regarding who supervises whom can only be accessed by supervisor users;
• the information about the evaluation of an associate user can only be accessed by supervisor users;
• passwords are always secret, and no one but their owner should access them.

Our example implements both retrieval and insertion operations. The operations are the following: it is possible to list the employees in the system, to retrieve the information about a specific employee, to compute the average salary, and to add new employees to the system. Only a registered user, an employee, can execute the operation that retrieves the information about any other employee. This operation exhibits different behaviours depending on who is executing it and what information is retrieved.
Figure 2: Typical architecture of a web application (components: User, Servlets, Dispatchers, Application Logic, DAO classes, Database; the boundaries to specify are marked)

In Figure 2 we illustrate the generalised architecture of a web application. For the sake of space and simplicity, in our evaluation example we merged the application logic and request dispatchers into a single layer. In Figure 4 we present the DAO classes that represent the company's employees. Class Associate represents associate employees and class Supervisor represents supervisor employees. Both classes, Associate and Supervisor, extend the abstract class Employee, which contains the information common to both kinds of employees. Class Supervisor is empty since it does not add any new fields to class Employee. We omit all class methods as they are only getters and setters.
Dependent Security Labels
The instrumentation of a system starts by defining the security specification. First, it is necessary to define and implement custom security labels to extend the default security lattice provided by SNITCH, which only includes two built-in labels: the label Public, the lowest security label of all, and the label Secret, the highest security label of all. Custom security labels are implemented by extending the abstract class SecurityLevel, provided in a companion library, and by defining a required comparator method. By implementing the missing comparator method, we model the security lattice used by the reference monitor during the system's execution. Each label also requires a constructor that takes the same parameters as the security label, plus one extra parameter of type int for each label parameter. For instance, a custom label User parametrized by a String and a long has a constructor with the signature User(String,int,long,int). The extra parameter tells the monitor whether the value passed to the security label's constructor stands for ⊥, stands for ⊤, or whether the corresponding parameter is to be considered as is.

In order to instrument the example, we define two custom security labels, SupervisorSL and AssociateSL, which define security compartments for supervisors and associates, respectively. Both labels have a single parameter, an employee id, and their comparator methods define the security lattice depicted in Figure 3.
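To make the constructor convention concrete, the sketch below models the $user(u)$ label of Section 2 as a Java class. It is an illustration under stated assumptions, not the companion library's code: the abstract class SecurityLevel here is a local stand-in, and the comparator name leq, the class name UserSL, and the flag constants AS_IS, BOTTOM and TOP are all hypothetical. The ordering follows the relations given in Section 2.

```java
// Hypothetical stand-in for the companion library's abstract SecurityLevel
// class; the real comparator method may have another name and signature.
abstract class SecurityLevel {
    // Returns true when data labelled with this level may flow to `other`.
    abstract boolean leq(SecurityLevel other);
}

// Hypothetical dependent label user(u) with one long parameter (a user id).
// Ordering: user(bot) <= user(u) <= user(top); user(u), user(v) incomparable for u != v.
final class UserSL extends SecurityLevel {
    // Assumed values for the extra int parameter that accompanies the id.
    static final int AS_IS = 0;  // use the id as is
    static final int BOTTOM = 1; // the parameter stands for bottom
    static final int TOP = 2;    // the parameter stands for top

    final long id;
    final int kind;

    // Label parameter followed by its extra int flag, mirroring User(String,int,long,int).
    UserSL(long id, int kind) {
        this.id = id;
        this.kind = kind;
    }

    @Override
    boolean leq(SecurityLevel other) {
        if (!(other instanceof UserSL)) {
            return false; // comparisons with other label families are left out of this sketch
        }
        UserSL o = (UserSL) other;
        if (kind == BOTTOM || o.kind == TOP) {
            return true;  // user(bot) flows anywhere; anything flows to user(top)
        }
        if (kind == TOP || o.kind == BOTTOM) {
            return false; // user(top) flows only to user(top); only user(bot) flows to user(bot)
        }
        return id == o.id; // distinct users are incomparable
    }
}
```

Under this encoding, a label parameter written as (_) in a specification file, which denotes ⊥, would correspond to passing the BOTTOM flag.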
Specification files
Besides custom security labels, it is necessary to define a security layer, using a set of specification files and the security lattice (custom security labels), which is used to in-line the reference monitor. In Figure 2 we show the layer that needs specification: the classes that make up the boundaries of the system. This layer includes all the classes that communicate with the exterior context of the system, such as service controllers and DAO classes in a typical architecture.

Figure 3: Security lattice used to instrument the example.

abstract class Employee {
    long id;
    String name;
    String address;
    double salary;
    String pwd;
    ...
}

class Associate extends Employee {
    Supervisor supervisor;
    double evaluation;
    ...
}

class Supervisor extends Employee {}

Figure 4: Classes Employee, Associate, and Supervisor
Specification files include annotations to define the security labels of class fields and method signatures. The semantics of field annotations is the following. If a security label is explicitly assigned to a class field, it is fixed (as a maximum) throughout the entire execution. Any attempt to store a value in such a field results in one of two outcomes: if the incoming value's security label is lower than the field's security label, then the incoming value's security label is upgraded; if the incoming value's security label is higher than the field's security label, the monitor signals an information leak. When a field is not annotated with a security label, its label changes according to the stored values. We do not, however, let assignment operations lower the security labels of variables or fields (cf. [5]), in order to avoid implicit information leaks.

When defining specifications for methods, it is possible to annotate both parameters and return values. Method annotations differ from field annotations as they include one of two modifiers, ? or !. If an annotation uses the modifier ?, the reference monitor compares the security label of the annotated value with the security label used in the annotation and, if the former is higher, the monitor signals an information leak. If an annotation uses the modifier !, the monitor associates the annotation's security label with the annotated value. The modifier ! allows one to associate a security label with input from outside the system and to declassify information. Since it allows for information declassification [1], this modifier must be used carefully, as it may easily lead to incorrect specifications that result in undetected information leaks. If a parameter or return value does not have a security annotation, then the monitor propagates its security label without performing any extra operation.

abstract class Employee {
    long:Public id;
    String:Public name;
    String:AssociateSL(_) address;
    String:Secret pwd;
}
(a)

class Supervisor extends Employee {
    double:SupervisorSL(id) salary;
}
(b)

class Associate extends Employee {
    double:AssociateSL(id) salary;
    Supervisor:SupervisorSL(_) supervisor;
    double:SupervisorSL(supervisor) evaluation;
}
(c)

Figure 5: Security specifications for classes Employee, Supervisor and Associate

public String associateDispatch(long requesterId, long queriedId) {
    Employee queried = EmployeeRepository.getInstance().getEmployeeById(queriedId);
    String response = "{"
        + "\"id\": \"" + queried.getId() + "\", "
        + "\"name\": \"" + queried.getName() + "\", "
        + "\"address\": \"" + queried.getAddress() + "\"";
    // The salary is only included when employees query their own record.
    if (requesterId == queriedId)
        response += ", \"salary\": \"" + queried.getSalary() + "\"";
    return response + "}";
}

Figure 6: Source code for the employee information dispatcher

In the current example, we define specification files for all dispatcher classes and for the classes that represent stored information about employees. The specifications for classes
Employee, Supervisor, and Associate are depicted in Figure 5a, Figure 5b, and Figure 5c, respectively. Notice that all fields in the classes are annotated with a security label from the lattice in Figure 3. Security label parameters are instantiated with fields, to denote a concrete dependency, or with (_), to represent ⊥. The values used to instantiate security label parameters must belong to the same object and produce a security dependency between fields. For instance, in class Supervisor we use SupervisorSL(id) as a security label dependent on the value of field id, declared in the superclass Employee. The security label AssociateSL(_) establishes a security compartment accessible to all Associate employees.

The effort required to write security specifications depends on several factors: the knowledge about the system to instrument, the constraints of the system, and the complexity of the security specification and lattice. After defining the security specifications and labels, it is possible to instrument the application.

class EmployeeInfoDispatcher {
    String:?AssociateSL(requesterId) associateDispatch(long requesterId, long queriedId);
}

Figure 7: Security specifications for the employee information dispatcher
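The two method-annotation modifiers can be read as two small runtime operations. The sketch below is a hypothetical illustration, not SNITCH's implementation: the Label interface, the Tainted wrapper, LeakError, and the AnnotationChecks methods are invented names, and the built-in Public and Secret labels are modelled as a two-point enum. Under these assumptions, the ?AssociateSL(requesterId) annotation of Figure 7 would correspond to a check on the returned value.

```java
// Minimal stand-in for a label ordering (an assumption, not SNITCH's API).
interface Label {
    boolean leq(Label other);
}

// SNITCH's two built-in labels, modelled here as a two-point lattice.
enum Level implements Label {
    PUBLIC, SECRET;

    @Override
    public boolean leq(Label other) {
        return other instanceof Level && compareTo((Level) other) <= 0;
    }
}

// A value paired with its runtime security label (hypothetical helper type).
final class Tainted<T> {
    final T value;
    final Label label;

    Tainted(T value, Label label) {
        this.value = value;
        this.label = label;
    }
}

// Raised when a '?' check fails; stands in for the monitor's leak report.
final class LeakError extends RuntimeException {
    LeakError(String message) {
        super(message);
    }
}

final class AnnotationChecks {
    // Modifier '?': the value's label must be at most the annotated label.
    static <T> Tainted<T> check(Tainted<T> v, Label annotated) {
        if (!v.label.leq(annotated)) {
            throw new LeakError(v.label + " is not below " + annotated);
        }
        return v; // the label is propagated unchanged
    }

    // Modifier '!': the annotated label replaces the value's label. This can
    // lower the label (declassification), so specifications must use it carefully.
    static <T> Tainted<T> assign(Tainted<T> v, Label annotated) {
        return new Tainted<>(v.value, annotated);
    }
}
```

The asymmetry is the point of the design: ? only ever observes a label, whereas ! overwrites it, which is why a misplaced ! can hide a leak.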
Value tainting
In the first instrumentation step, we inject shadow fields in every application class. One of them holds the security label of the class instance, while the remaining ones mirror existing class fields of a primitive or library (non-instrumented) type.
Methods and Parameter passing
In order to propagate the security labels of primitive or library-type arguments and return values, we add shadow fields to each class. These shadow fields help prepare method calls by allowing the caller to store, and the callee to retrieve, the arguments' security labels. When the called method terminates its execution, the callee stores the security labels of the arguments and of the return value for the caller to retrieve.
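A minimal sketch of this label-passing protocol, written by hand in plain Java, is shown below. The class Calc, its method add, the LabelledInt holder and the two-point Level enum are hypothetical; only the field naming scheme (secLbl$<method>$p<i>, secLbl$<method>$ret) is taken from the example in Section 4. For simplicity, the callee's own pc is assumed to be PUBLIC, since the caller's pc is already joined into the argument labels.

```java
// Two-point label lattice used only for this sketch.
enum Level {
    PUBLIC, SECRET;

    Level join(Level other) {
        return compareTo(other) >= 0 ? this : other;
    }
}

// Hypothetical instrumented class: label fields follow the secLbl$<method>$p<i>
// and secLbl$<method>$ret naming used for class Example in Section 4.
class Calc {
    Level secLbl$add$p0 = Level.PUBLIC;
    Level secLbl$add$p1 = Level.PUBLIC;
    Level secLbl$add$ret = Level.PUBLIC;

    int add(int a, int b) {
        // Callee side: retrieve the argument labels stored by the caller.
        Level secLbl$a = secLbl$add$p0;
        Level secLbl$b = secLbl$add$p1;
        int r = a + b;
        // BINOP; the callee's own pc is assumed to be PUBLIC in this sketch.
        Level secLbl$r = secLbl$a.join(secLbl$b);
        // RETURN: publish the result and argument labels for the caller.
        secLbl$add$ret = secLbl$r;
        secLbl$add$p0 = secLbl$a;
        secLbl$add$p1 = secLbl$b;
        return r;
    }
}

// An int together with its label, as held by the caller.
final class LabelledInt {
    final int value;
    final Level label;

    LabelledInt(int value, Level label) {
        this.value = value;
        this.label = label;
    }
}

final class Caller {
    // CALL: store argument labels (joined with the caller's pc), invoke the
    // method, then read the result label back from the callee's label field.
    static LabelledInt callAdd(Calc o, LabelledInt x, LabelledInt y, Level pc) {
        o.secLbl$add$p0 = x.label.join(pc);
        o.secLbl$add$p1 = y.label.join(pc);
        int v = o.add(x.value, y.value);
        // Rule CALL also copies o.secLbl$add$p0/p1 back into the argument labels;
        // that matters for objects of non-instrumented types, not for these ints.
        return new LabelledInt(v, o.secLbl$add$ret);
    }
}
```

Using fields rather than extra method parameters keeps the original method signatures intact, which is what lets instrumented and non-instrumented code keep calling each other.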
Instruction rewriting
The final step of the instrumentation process consists of the instrumentation of method bodies. The body of a method consists of a graph of basic blocks, where a basic block is a sequence of instructions starting with a label and terminating in a return, a branching, or a jump instruction. The instrumentation process compositionally rewrites instructions in SSA form [4, 26], according to the rules defined in Figure 8. Every rule for an instruction that gives rise to information flows takes into account the security label associated with the computation itself, i.e., the security label of the program counter ($pc_\ell$) [5].

The set of instructions considered is the following: loading a value into a local variable, $v = s$ or $v = o.f$; method call, $v = o.g(v_1 \ldots v_n)$, where $o$ represents the target object; object instantiation, $v = \mathtt{new}\ C$; binary operations, $v = op(e_1, e_2)$; phi expressions, $v = \phi(v_1, v_2)$; conditional jumps, $\mathtt{if}\ (e)\ \mathtt{goto}\ l$; unconditional jumps, $\mathtt{goto}\ l$; and the return instruction, $\mathtt{return}\ e$. A phi function is a pseudo-function used at merge points to yield one of its arguments according to the control-flow path executed.

Notation: we use $C$ to denote class names, $v$ and $o$ to denote local variables or registers, and $k$ to denote value literals; $e$ ranges over local variables and constants; $f$ denotes object fields and $g$ denotes method names. A variable $v_s$ stores the security label of variable $v$, a field $f_s$ stores the security label of field $f$, a field $g_{p_i}$ stores the security label of parameter $i$ of method $g$, and a field $g_{ret}$ stores the security label of the return value of method $g$.

The rules for loading operations depicted in Figure 8 (CONST, LOCAL, and
FIELD) work by combining the pc, the security label of the value, and the variable's security label. The dynamic modification of a field, rule FIELDW, potentially increases the field's security label with the combination of pc and the value's security label. Rule FIELDC applies to fixed-label fields, where the monitor asserts that the written value's label, combined with pc, is lower than or equal to the field's label.

To deal with a method call, we have two rules. Rule
CALL handles the call to an instrumented method. It starts by copying the arguments' security labels to the target method with the help of auxiliary fields and then calls the method. Once the method completes its execution, we retrieve the result's and the arguments' security labels. It is necessary to collect the argument security labels after the call since they might have changed during the method's execution. Rule CALLX accounts for the use of non-instrumented methods, where the resulting security label is the combination of all operands' security labels plus the program counter and, in the case of instance methods, the callee's security label.

$[\![ v := k ]\!]_\ell \triangleq v := k;\ v_s := v_s \sqcup pc_\ell$ (CONST)

$[\![ v := v' ]\!]_\ell \triangleq v := v';\ v_s := v_s \sqcup v'_s \sqcup pc_\ell$ (LOCAL)

$[\![ v := f ]\!]_\ell \triangleq v := f;\ v_s := v_s \sqcup f_s \sqcup pc_\ell$ (FIELD)

$[\![ f := v ]\!]_\ell \triangleq \mathtt{assert}(v_s \sqcup pc_\ell \sqsubseteq f_s);\ f := v$ ($f$ has spec.) (FIELDC)

$[\![ f := v ]\!]_\ell \triangleq f := v;\ f_s := f_s \sqcup v_s \sqcup pc_\ell$ ($f$ has no spec.) (FIELDW)

$[\![ v := \mathtt{new}\ C ]\!]_\ell \triangleq v := \mathtt{new}\ C;\ v_s := v_s \sqcup pc_\ell$ (NEW)

$[\![ v := o.g(v_1, \ldots, v_n) ]\!]_\ell \triangleq o.f_{p_1} := v_{1s} \sqcup pc_\ell;\ \ldots;\ o.f_{p_n} := v_{ns} \sqcup pc_\ell;\ v := o.g(v_1, \ldots, v_n);\ v_{1s} := o.f_{p_1};\ \ldots;\ v_{ns} := o.f_{p_n};\ v_s := o.f_{return}$ ($o$ has spec.) (CALL)

$[\![ v := o.g(v_1, \ldots, v_n) ]\!]_\ell \triangleq v := o.g(v_1, \ldots, v_n);\ v_s := v_s \sqcup o_s \sqcup v_{1s} \sqcup \ldots \sqcup v_{ns} \sqcup pc_\ell$ ($o$ has no spec.) (CALLX)

$[\![ v := op(e_1, e_2) ]\!]_\ell \triangleq v := op(e_1, e_2);\ v_s := v_s \sqcup e_{1s} \sqcup e_{2s} \sqcup pc_\ell$ (BINOP)

$[\![ \mathtt{if}\ (v)\ \mathtt{goto}\ k ]\!]_\ell \triangleq pc^{out}_\ell := pc^{in}_\ell \sqcup v_s;\ \mathtt{if}\ (v)\ \mathtt{goto}\ k$ (BRANCH)

$[\![ \mathtt{goto}\ k ]\!]_\ell \triangleq \mathtt{goto}\ k$ (GOTO)

$[\![ v := \phi(v_1, v_2) ]\!]_\ell \triangleq v := \phi(v_1, v_2);\ v_s := v_s \sqcup \phi(v_{1s}, v_{2s})$ (PHI)

$[\![ \mathtt{return}\ e ]\!]_\ell \triangleq \mathtt{this}.f_{return} := e_s \sqcup pc_\ell;\ \mathtt{this}.f_{p_1} := v_{1s};\ \ldots;\ \mathtt{this}.f_{p_n} := v_{ns};\ \mathtt{return}\ e$ (RETURN)

Figure 8: Instrumentation rules

Notice that in the case of rule BRANCH, the value of the context's security label (pc) increases according to the security label of the branch condition. Once the execution leaves the scope started by the branch condition, it is necessary to reinstate pc's old security label. To restore the pc, we follow the rules depicted in Figure 9. The rule depicted in Figure 9a is applied when there are multiple predecessors ($\ell_1, \ldots, \ell_n$) of the basic block $\ell$ but $e_1$ does not post-dominate a common predecessor of $\ell_1, \ldots, \ell_n$, i.e., multiple control flows converge but the scope does not change. According to this rule, the security label of the context at the beginning of the basic block $\ell$ results from the $\phi$ function of the predecessors' context security labels. The second rule, depicted in Figure 9b, applies when there are multiple predecessors ($\ell_1, \ldots, \ell_n$) of $\ell$ and $e_1$ is the first instruction to post-dominate a common predecessor, $d$, of $\ell_1, \ldots, \ell_n$. In this case, the context security label at the beginning of block $\ell$ ($pc^{in}_\ell$) is equal to the context security label at the beginning of $d$ ($pc^{in}_d$), i.e., $e_1$ is the first instruction to execute outside the scope created in $d$ and reinstates the value of pc from before entering the new scope. Unconditional branches do not change any security meta-information. Rule PHI chooses the security label according to the executed predecessor. When a return instruction executes, the pc stack is restored to the label it had when the function was called (because of ad-hoc returns at any point in the method body). Besides restoring the pc, it is also necessary to copy the returned value's security label and the arguments' security labels to auxiliary fields. It is necessary to update the security labels of the arguments to deal with cases where they are objects of non-instrumented types: the objects' security labels might change during the method's execution, in which case any changes must be propagated to the caller.

$\ell:\ pc_\ell := \phi(pc_{\ell_1}, \ldots, pc_{\ell_n});\ [\![ e_1 ]\!] \ldots [\![ e_n ]\!]$, where $\ell_1, \ldots, \ell_n$ are predecessor nodes of $\ell$. (a)

$\ell:\ pc^{in}_\ell := pc^{in}_d;\ [\![ e_1 ]\!] \ldots [\![ e_n ]\!]$, where $d$ is the post-dominated node where the current scope started. (b)

$pc^{out}_\ell$ is equal to $pc^{in}_\ell$ if not stated otherwise.

Figure 9: Context security label rules for merging control flows

Testing phase
Once the security layer is defined and the application is instrumented, we test the example for information leaks. To do so, we need to test all available operations in the instrumented application. If an operation has an information leak, the monitor halts the system's execution, indicating an assertion violation. Strong guarantees about data confidentiality depend on the test coverage achieved.

In summary, our approach for the detection of illegal information flows using dependent security labels is embodied in a tool based on the SOOT framework, which instruments a target application with a reference monitor. With this approach, we believe we have improved the process of security certification for third-party systems. Despite the need for some specification effort, there is typically a set of DAO and controller classes that are known and for which it is possible to design a specification.
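The two field-write rules of Figure 8, FIELDC and FIELDW, can be pictured as the following runtime helpers. This is a sketch under assumptions: the two-point Level enum, the InformationLeak exception (standing in for the assertion violation) and the FieldRules methods are hypothetical names, and pc is passed in explicitly instead of being tracked across basic blocks as in Figure 9.

```java
// Two-point label lattice used only for this sketch.
enum Level {
    PUBLIC, SECRET;

    Level join(Level other) {
        return compareTo(other) >= 0 ? this : other;
    }

    boolean leq(Level other) {
        return compareTo(other) <= 0;
    }
}

// Stands in for the assertion violation raised by the in-lined monitor.
final class InformationLeak extends RuntimeException {
    InformationLeak(String message) {
        super(message);
    }
}

// Hypothetical runtime helpers mirroring the field-write rules of Figure 8.
final class FieldRules {
    // FIELDC: the field has a specified label, which never changes;
    // assert(v_s join pc <= f_s) before the write.
    static void writeFixed(Level fieldLabel, Level valueLabel, Level pc) {
        Level flowing = valueLabel.join(pc);
        if (!flowing.leq(fieldLabel)) {
            throw new InformationLeak(flowing + " data written to a " + fieldLabel + " field");
        }
    }

    // FIELDW: the field has no specification, so its label grows:
    // f_s := f_s join v_s join pc. It is never lowered.
    static Level writeUnannotated(Level fieldLabel, Level valueLabel, Level pc) {
        return fieldLabel.join(valueLabel).join(pc);
    }
}
```

A call such as FieldRules.writeFixed(Level.PUBLIC, Level.PUBLIC, Level.SECRET) models an implicit flow: a Public value stored in a Public field while pc is Secret, i.e., inside a branch whose condition depends on secret data. FIELDC rejects it even though the stored value itself is Public.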
In this section we illustrate the rewriting process using a small example. Let us consider the Java class depicted in Figure 10b. As stated in Section 3, we first add new fields (which we will refer to as “label fields”) to the application classes for storing security labels. We add one label field for the object's security label (secLbl$this), one label field for every field of a non-instrumented type (secLbl$field0), and, for every method, label fields for parameters and return values of non-instrumented types (secLbl$methodA$p0, secLbl$methodA$p1, and secLbl$methodA$ret). We show the result of the field injection on class Example in Figure 10a.

class Example {
    SecurityLabel secLbl$this;
    SecurityLabel secLbl$field0;
    long field0;
    Example field1;
    SecurityLabel secLbl$methodA$p0;
    SecurityLabel secLbl$methodA$p1;
    int methodA (int a, int b) {...}
    SecurityLabel secLbl$methodA$ret;
    Example methodB(Example e) {...}
}
(a) Class Example after field injection.

class Example {
    long field0;
    Example field1;
    int methodA (int a, int b) {
        if(a > b)
            return a;
        return b;
    }
    Example methodB(Example e) {...}
}
(b) Method methodA of Class Example.

Figure 10: Class Example

Once all the necessary fields are injected, we can proceed to rewrite the method bodies in SSA form. To do so, we rewrite every instruction according to the rules defined in Section 3. Figure 11a depicts a possible representation of methodA in static single assignment form, and Figure 11b illustrates the result of rewriting methodA's body.

Lines 6-7 retrieve the arguments' security labels (secLbl$a and secLbl$b) from the auxiliary fields injected for that purpose (secLbl$methodA$p0 and secLbl$methodA$p1, respectively);
Lines 8-10 compute the condition’s security label, update pc while keeping its old value so that it can be restored when execution leaves the branch’s scope, and finally execute the branching instruction. Lines 11-16 compute the value to return; every executed operation is accompanied by its corresponding security operation. Each branch stores the result in a different version of the same variable (result_0 and result_1). Lines 17-19 terminate the context initiated by the branching instruction (more specifically, in line 9). When execution leaves the scope of the branch instruction, pc must be restored to its previous value. Since two paths converge, φ functions are also needed to decide which version of each variable to use. Lines 19-20 conclude the method’s execution: they store the result’s label (secLbl$result_2) in field secLbl$methodA$ret and update the arguments’ label fields. The return only executes after the labels are stored.

To provide some validation to our approach, we developed a prototype tool, SNITCH, and then used it to instrument the web application presented in Section 3. As defined in the approach, SNITCH instruments a system with an in-lined reference monitor based on a set of security specifications. As Figure 12 shows, SNITCH’s architecture consists of two modules: a parser for the security specifications and an instrumentation module for bytecode rewriting. The latter uses the SOOT [23] framework which, as previously stated, is a framework for Java bytecode manipulation and optimization. To reduce the reference monitor’s impact on the system’s execution, SNITCH applies the optimization suites offered by SOOT to the instrumented code.

    class Example {
      ...
      int methodA(int a, int b) {
        if (a > b) goto LABEL0
        result_0 = b
        goto LABEL1
      LABEL0:
        result_1 = a
      LABEL1:
        result_2 = phi(result_0, result_1)
        return result_2
      }
      ...
    }
(a)

    class Example {
      ...
      int methodA(int a, int b) {
        secLbl$a = this.secLbl$methodA$p0
        secLbl$b = this.secLbl$methodA$p1
        secLbl$cond = combine(secLbl$a, secLbl$b)
        secLbl$oldPC = increasePC(secLbl$cond)
        if (a > b) goto LABEL0
        result_0 = b
        secLbl$result_0 = secLbl$b
        goto LABEL1
      LABEL0:
        result_1 = a
        secLbl$result_1 = secLbl$a
      LABEL1:
        setPC(secLbl$oldPC)
        result_2 = phi(result_0, result_1)
        secLbl$result_2 = phi(secLbl$result_0, secLbl$result_1)
        this.secLbl$methodA$ret = secLbl$result_2
        this.secLbl$methodA$p0 = secLbl$a
        this.secLbl$methodA$p1 = secLbl$b
        return result_2
      }
      ...
    }
(b)
Figure 11: Original (left) and instrumented (right) SSA code for method methodA of Class Example
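To make the label bookkeeping of Figure 11(b) concrete, the following Java sketch mimics the instrumented methodA at runtime. It is a minimal illustration, not SNITCH’s runtime: it assumes a two-level lattice (LOW below HIGH) in place of SNITCH’s dependent labels, and the Label, Monitor, combine, increasePC, and setPC names are hypothetical stand-ins for the helpers that appear in the figure. One deliberate difference from the figure, which shows only explicit label copies, is that the sketch also joins the current pc into each assigned label; this is a common way of capturing the implicit flow from the branch condition and is an assumption here.

```java
// Hypothetical two-point lattice standing in for SNITCH's dependent security labels.
enum Label {
    LOW, HIGH;

    // Least upper bound: the lowest label that is at least as high as both operands.
    Label join(Label other) {
        return (this == HIGH || other == HIGH) ? HIGH : LOW;
    }
}

// Hypothetical runtime helpers named after the calls in Figure 11(b).
class Monitor {
    // Label of the current control context (the pc).
    static Label pc = Label.LOW;

    static Label combine(Label x, Label y) {
        return x.join(y);
    }

    // Raise pc by the condition's label; return the old pc so it can be restored.
    static Label increasePC(Label cond) {
        Label old = pc;
        pc = pc.join(cond);
        return old;
    }

    static void setPC(Label old) {
        pc = old;
    }
}

class Example {
    // Shadow fields carrying the labels of methodA's parameters and result.
    Label secLbl$methodA$p0 = Label.LOW;
    Label secLbl$methodA$p1 = Label.LOW;
    Label secLbl$methodA$ret = Label.LOW;

    int methodA(int a, int b) {
        Label secLbl$a = this.secLbl$methodA$p0;
        Label secLbl$b = this.secLbl$methodA$p1;
        Label secLbl$cond = Monitor.combine(secLbl$a, secLbl$b);
        Label secLbl$oldPC = Monitor.increasePC(secLbl$cond);
        int result;
        Label secLbl$result;
        if (a > b) {
            result = a;
            // Assumption: pc is joined in so the implicit flow from the condition is kept.
            secLbl$result = Monitor.combine(secLbl$a, Monitor.pc);
        } else {
            result = b;
            secLbl$result = Monitor.combine(secLbl$b, Monitor.pc);
        }
        // Leaving the branch scope: restore pc. Java's if/else plays the role of the phi nodes.
        Monitor.setPC(secLbl$oldPC);
        this.secLbl$methodA$ret = secLbl$result;
        this.secLbl$methodA$p0 = secLbl$a;
        this.secLbl$methodA$p1 = secLbl$b;
        return result;
    }

    public static void main(String[] args) {
        Example e = new Example();
        e.secLbl$methodA$p0 = Label.HIGH; // a is secret, b stays public
        int r = e.methodA(1, 2);
        System.out.println(r + " " + e.secLbl$methodA$ret); // prints "2 HIGH"
    }
}
```

With a = 1 labelled HIGH and b = 2 labelled LOW, the method returns b, yet the result is labelled HIGH because the choice of branch depended on a; without the pc join it would be labelled LOW.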
To test the approach, we introduced information leaks in the example application, resulting from both implicit and explicit information flows. The leaks caused by explicit flows were bad assignments or attempts to return classified information. The in-lined reference monitor was able to detect all the information leaks introduced in the example application.

The initial implementation of the example’s information retrieval methods was naive, returning all the information available on the employees regardless of any information access restrictions. We reached the final implementation of the application through a trial-and-error process in which we instrumented, tested, and fixed the application multiple times until no further information leaks were detected.

The instrumentation of the example web application also allowed us to collect some broad measurements of the reference monitor’s impact on an application’s execution time. Still, more applications need to be instrumented and tested to obtain more accurate values. We defined a set of five operations, which we used to measure the total execution time and each operation’s average execution time in both the original and the instrumented applications. Figure 13 shows how the reference monitor affects the execution time of each operation. The monitor’s impact was greatest on the information retrieval operation. An explanation for this is that this operation extracts the most information per employee; therefore, it executes data
Figure 12: SNITCH internal phases. [Diagram; recoverable components: Security Specification Parser (specification files, .spec; security labels, .class); SNITCH Instrumentor (target’s compiled Java code, .class/JAR; security label injection, field shadowing, method shadowing, method body rewriting; SOOT optimizations); instrumented application (.class).]

Figure 13: Runtime overhead per application method. [Chart of the overhead factor (x1) for the operations List Employees, Get Employee Info, Get Avg Salary, Add new Supervisor, and Add New Associate.]
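Figure 13 reports an overhead factor per operation. One plausible way to compute such a factor, assuming it is the ratio between the mean execution times of the instrumented and the original versions of an operation (the text does not spell out the formula), is sketched below. The Overhead class and its methods are hypothetical, and the sample timings in main are made-up placeholders, not the paper’s measurements.

```java
import java.util.Arrays;

class Overhead {
    // Mean of execution-time samples (e.g. in microseconds); throws if samples is empty.
    static double mean(long[] samples) {
        return Arrays.stream(samples).average().getAsDouble();
    }

    // Overhead factor: how many times longer, on average, the instrumented operation takes.
    static double overheadFactor(long[] original, long[] instrumented) {
        return mean(instrumented) / mean(original);
    }

    // Time `runs` executions of an operation with System.nanoTime (no JIT warm-up handling).
    static long[] time(Runnable op, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    public static void main(String[] args) {
        // Hypothetical sample timings, not measurements from the paper.
        long[] original = {100, 120};
        long[] instrumented = {220, 264};
        System.out.println(overheadFactor(original, instrumented));
    }
}
```

Timings gathered with a plain System.nanoTime loop include JIT compilation and garbage-collection noise; a harness such as JMH is the usual way to obtain stable numbers on the JVM.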
There is a considerable amount of work on information flow analysis in the literature, ranging from axiomatic approaches [3], dynamic analyses [5, ?], and programming languages and types [15, 19, 22] to instrumented virtual machines [12]. Java Information Flow (JIF) [18, 19] contains embedded information flow analysis capabilities and allows the definition of a form of dynamic matching between labels and principals, which can in turn be used to parametrize classes and define richer runtime policies.

TaintDroid [12] does not extend or create a programming language; instead, it instruments the virtual machine on which the intermediate language executes. Sensitive data is tainted at its source (e.g., GPS), and the instrumented virtual machine propagates the taint along the program’s execution. When tainted data reaches a sink (e.g., the network interface), the information leak is logged. An advantage of TaintDroid’s approach over JIF is that application code does not need to be changed.

Austin and Flanagan [5] present a dynamic approach for information flow analysis that guarantees non-interference in dynamically-typed languages. They present and compare two approaches: Universal Labelling, where all values have an explicit (security) label, and
Sparse Labelling, where all values are tracked but only some are explicitly labelled. Sparse labelling is observably equivalent to universal labelling, but with significantly less overhead.

Ferreira [14] introduces the use of refinement types in information flow analysis, presenting an extension of LiveWeb/λDB [7] with type-based information flow. Security labels are expressed using first-order logic propositions that depend on runtime values. Value-dependent security labels are further developed by Lourenço and Caires [16], who present the first non-interference result for dependent information flow types.

The purpose of this work is to study the applicability of information flow analysis to the certification of third-party Java-based software systems. To provide a more usable, flexible, and expressive framework, we adopted dependent information flow control as the preferred abstraction. This paper presents work on the development of a certification tool that attaches in-lined reference monitors to existing compiled code, based on interface specifications at observable points of a system.

We foresee some immediate follow-ups to this work: addressing the challenges of label creep, and introducing abstract interpretation to help reduce the runtime overhead beyond the optimizations that result from using an SSA intermediate language. Consider the security label combination operation ($\sqcup$) which, given two security labels $\ell_A$ and $\ell_B$, yields the lowest security label that is higher than or equal to both $\ell_A$ and $\ell_B$. Given the security lattice shown in Figure 3, computations such as $\mathit{AssociateSL}(\bot) \sqcup \mathit{AssociateSL}(\top)$ can be removed, as their result, $\mathit{AssociateSL}(\top)$, is known beforehand.
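The simplification described above can be illustrated with a small Java sketch of constant folding on label joins during instrumentation. It is not SNITCH’s implementation (the text presents this as future work), and Figure 3’s lattice is not reproduced here: the sketch assumes, purely for illustration, that AssociateSL labels are ordered by their parameter on a two-point lattice (BOT below TOP), so that the join of two AssociateSL labels is AssociateSL of the join of their parameters. The Level, AssociateSL, Operand, and JoinFolding names and the emitted combine(...) text are hypothetical.

```java
// Hypothetical two-point parameter lattice for AssociateSL: BOT <= TOP.
enum Level {
    BOT, TOP;

    Level join(Level other) {
        return (this == TOP || other == TOP) ? TOP : BOT;
    }
}

// Hypothetical dependent label AssociateSL(p), assumed to be ordered by its parameter p.
final class AssociateSL {
    final Level param;

    AssociateSL(Level param) { this.param = param; }

    // Under the assumed ordering, the join is taken on the parameter.
    AssociateSL join(AssociateSL other) {
        return new AssociateSL(param.join(other.param));
    }

    @Override public boolean equals(Object o) {
        return o instanceof AssociateSL && ((AssociateSL) o).param == param;
    }

    @Override public int hashCode() { return param.hashCode(); }

    @Override public String toString() { return "AssociateSL(" + param + ")"; }
}

// An operand of a label computation: a label known at instrumentation time,
// or the name of a label variable that is only known at runtime.
final class Operand {
    final AssociateSL constant; // null when only known at runtime
    final String variable;      // null when known statically

    private Operand(AssociateSL constant, String variable) {
        this.constant = constant;
        this.variable = variable;
    }

    static Operand known(AssociateSL label) { return new Operand(label, null); }

    static Operand runtime(String name) { return new Operand(null, name); }

    String render() { return constant != null ? constant.toString() : variable; }
}

class JoinFolding {
    // Emit the statement computing `target`: fold the join if both operands are
    // constants, otherwise keep a runtime combine(...) call.
    static String emitJoin(String target, Operand a, Operand b) {
        if (a.constant != null && b.constant != null) {
            return target + " = " + a.constant.join(b.constant);
        }
        return target + " = combine(" + a.render() + ", " + b.render() + ")";
    }

    public static void main(String[] args) {
        Operand bot = Operand.known(new AssociateSL(Level.BOT));
        Operand top = Operand.known(new AssociateSL(Level.TOP));
        // prints: secLbl$x = AssociateSL(TOP)
        System.out.println(emitJoin("secLbl$x", bot, top));
        // prints: secLbl$y = combine(AssociateSL(BOT), secLbl$a)
        System.out.println(emitJoin("secLbl$y", bot, Operand.runtime("secLbl$a")));
    }
}
```

Only joins whose operands are both statically known disappear; a join involving a runtime label, such as one derived from a method parameter, still has to be computed by the monitor.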
Statically analysing the code would not only allow trivial computations to be removed but would also, in some cases, make it possible to detect illegal information flows statically. Introducing such mechanisms would allow our approach to evolve from a dynamic information flow control mechanism into a hybrid one.

Another possible line of work is the integration of our approach into software development frameworks as a development tool, which would allow developers to test their code as they develop it. Integration with software development tools would bring further advantages, such as the automatic extraction of specifications from framework annotations. Frameworks like Spring and Jersey annotate classes with information relevant to the specification files; for instance, Spring applications commonly use the JPA annotation @Entity to flag persistent entity classes.

Regarding the monitor’s overhead presented in Section 5, we note that the measurements only took CPU time into account. Once I/O operations are considered, the monitor’s overhead becomes negligible. For instance, the measured CPU times were on the order of microseconds, while I/O operations took milliseconds, three orders of magnitude more, and network operations took even longer.
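As a rough order-of-magnitude check of this argument, let $\Delta t_{\mathrm{CPU}}$ be the CPU time the monitor adds to an operation (bounded by CPU times on the order of microseconds) and $t_{\mathrm{I/O}}$ the time spent in I/O (on the order of milliseconds). Only orders of magnitude are used here, not measured values:

```latex
\frac{\Delta t_{\mathrm{CPU}}}{t_{\mathrm{CPU}} + t_{\mathrm{I/O}}}
  \le \frac{\Delta t_{\mathrm{CPU}}}{t_{\mathrm{I/O}}}
  \approx \frac{10^{-6}\,\mathrm{s}}{10^{-3}\,\mathrm{s}} = 10^{-3}
```

Under these assumptions the monitor accounts for roughly 0.1% of an I/O-bound operation’s latency, and less still for operations dominated by network access.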
Acknowledgements
This work was funded by NOVA LINCS UID/CEC/04516/2013, COST CA15123 - FC&T ProjectCLAY - PTDC/EEI-CTP/4293/2014
References

[1] Ana Almeida Matos & Gérard Boudol (2009): On Declassification and the Non-disclosure Policy. J. Comput. Secur.
[2] Core J2EE Patterns (Core Design Series): Best Practices and Design Strategies, 2 edition. Sun Microsystems, Inc., Mountain View, CA, USA.
[3] Gregory R. Andrews & Richard P. Reitman (1980): An Axiomatic Approach to Information Flow in Programs. ACM Trans. Program. Lang. Syst.
[4] SSA is Functional Programming. SIGPLAN Not.
[5] Thomas H. Austin & Cormac Flanagan (2009): Efficient Purely-dynamic Information Flow Analysis. In: Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security, PLAS ’09, ACM, New York, NY, USA, pp. 113–124, doi:10.1145/1554339.1554353.
[6] Thomas H. Austin & Cormac Flanagan (2010): Permissive Dynamic Information Flow Analysis. In: Proceedings of the 5th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security, pp. 3:1–3:12, doi:10.1145/1814217.1814220.
[7] Luís Caires, Jorge A. Pérez, João Costa Seco, Hugo Torres Vieira & Lúcio Ferrão (2011): Type-based Access Control in Data-centric Systems. In: Proceedings of the 20th European Conference on Programming Languages and Systems: Part of the Joint European Conferences on Theory and Practice of Software, ESOP’11/ETAPS’11, Springer-Verlag, Berlin, Heidelberg, pp. 136–155, doi:10.1006/inco.1994.1093.
[8] D. Chandra & M. Franz (2007): Fine-Grained Information Flow Analysis and Enforcement in a Java Virtual Machine. In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 463–475, doi:10.1109/ACSAC.2007.37.
[9] Robert Daigneau (2011): Service Design Patterns: Fundamental Design Solutions for SOAP/WSDL and RESTful Web Services, 1 edition. Addison-Wesley Professional.
[10] Dorothy E. Denning (1976): A Lattice Model of Secure Information Flow. Commun. ACM.
[11] Dorothy E. Denning & Peter J. Denning (1977): Certification of Programs for Secure Information Flow. Commun. ACM.
[12] TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones. Communications of the ACM, doi:10.1145/2494522.
[13] Úlfar Erlingsson & Fred B. Schneider (2000): SASI Enforcement of Security Policies: A Retrospective. In: Proceedings of the 1999 Workshop on New Security Paradigms, NSPW ’99, ACM, New York, NY, USA, pp. 87–95, doi:10.1145/335169.335201. Available at http://doi.acm.org/10.1145/335169.335201.
[14] Paulo Jorge Abreu Duarte Ferreira (2012): Information flow analysis using data-dependent logical propositions. MSc dissertation, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa.
[15] Jürgen Graf, Martin Hecker & Martin Mohr (2013): Using JOANA for Information Flow Control in Java Programs - A Practical Guide. In: Proceedings of the 6th Working Conference on Programming Languages (ATPS’13).
[16] Luísa Lourenço & Luís Caires (2015): Dependent Information Flow Types. SIGPLAN Not.
[17] A type system for value-dependent information flow analysis. Ph.D. thesis.
[18] Andrew C. Myers & Barbara Liskov (2003): Protecting privacy using the decentralized label model. In: Foundations of Intrusion Tolerant Systems, OASIS 2003, pp. 89–116, doi:10.1145/363516.363526.
[19] Andrew C. Myers, Lantian Zheng, Steve Zdancewic, Stephen Chong & Nathaniel Nystrom (2006): Jif 3.0: Java information flow.
[20] Andrei Sabelfeld & Andrew C. Myers (2003): Language-based information-flow security. IEEE Journal on Selected Areas in Communications.
[21] A Language-Based Approach to Security.
[22] V. Simonet (2003): The Flow Caml System (version 1.00): Documentation and user’s manual.
[23] R. Vallée-Rai, P. Co & E. Gagnon (1999): Soot - a Java bytecode optimization framework. CASCON.
[24] Stephan Arthur Zdancewic (2002): Programming Languages for Information Security. Ph.D. thesis, Ithaca, NY, USA. AAI3063751.
[25] Steve Zdancewic (2004): Challenges for information-flow security. Proceedings of the 1st International Workshop on the Programming Language Interference and Dependence (PLID’04).
[26] Jianzhou Zhao, Santosh Nagarakatte, Milo M.K. Martin & Steve Zdancewic (2013):