Contextual and Granular Policy Enforcement in Database-backed Applications

Abstract

Database-backed applications rely on inlined policy checks to process users' private and confidential data in a policy-compliant manner, as traditional database access control mechanisms cannot enforce complex policies. However, application bugs due to missed checks are common in such applications, and they result in data breaches. While separating policy from code is a natural solution, many data protection policies specify restrictions based on the context in which data is accessed and how the data is used. Enforcing these restrictions automatically presents significant challenges, as the information needed to determine context requires a tight coupling between policy enforcement and an application's implementation. We present Estrela, a framework for enforcing contextual and granular data access policies. Working from the observation that API endpoints can be associated with salient contextual information in most database-backed applications, Estrela allows developers to specify API-specific restrictions on data access and use. Estrela provides a clean separation between policy specification and the application's implementation, which facilitates easier auditing and maintenance of policies. Policies in Estrela consist of pre-evaluation and post-evaluation conditions, which provide the means to modulate database access before a query is issued, and to impose finer-grained constraints on information release after the query is evaluated, respectively. We build a prototype of Estrela and apply it to retrofit several real-world applications (from 1,000 to 80k LOC) to enforce different contextual policies. Our evaluation shows that Estrela can enforce policies with minimal overheads.



Abhishek Bichhawat

Carnegie Mellon University, Pittsburgh, PA, USA

Matt Fredrikson

Carnegie Mellon University, Pittsburgh, PA, USA

Jean Yang

Carnegie Mellon University, Pittsburgh, PA, USA

Akash Trehan∗

Microsoft Vancouver, Vancouver, British Columbia, Canada


CCS CONCEPTS

• Security and privacy → Access control; Software security engineering; Web application security.

∗ Work done while interning at Carnegie Mellon University.

ASIA CCS ’20, October 5–9, 2020, Taipei, Taiwan
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6750-9/20/06...$15.00
https://doi.org/10.1145/3320269.3384759

KEYWORDS

Database-backed applications; granular access policies; contextual access control; API-specific policies

ACM Reference Format:

Abhishek Bichhawat, Matt Fredrikson, Jean Yang, and Akash Trehan. 2020. Contextual and Granular Policy Enforcement in Database-backed Applications. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security (ASIA CCS ’20), October 5–9, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3320269.3384759

Modern systems collect a plethora of personal user information [3], but their growing complexity makes it increasingly difficult to ensure compliance with data protection policies and to avoid widely publicized data breaches [20]. Application-wide policy compliance checks are a widely used way to address these concerns, but ensuring correct enforcement across component boundaries in applications is particularly challenging because it requires coordinating such inline checks across different parts of the application. Thus, enforcing compliance in applications, and updating them as new regulations are introduced, is time-consuming, expensive, and error-prone.

The policies that regulations often require, and that these applications need to enforce, include more than just access conditions: they specify how the data can flow and be used by different users, and they are contextual, or context-dependent. Other policies might require declassification, revealing partial information about sensitive data in certain cases. Both contextual and information release policies are prevalent in major privacy laws [17, 32, 45] and organizational policies [13, 19, 28].

We illustrate the need for supporting contextual and release policies using the example of the Health Insurance Portability and Accountability Act (HIPAA) privacy rule [32].

H1.

While some sections of HIPAA restrict the use or disclosure of protected health information (PHI) for marketing or research, other sections allow disclosure if the patient is in need of emergency treatment. Such contextual policies depend on the setting and circumstances of data access and use, i.e., sensitive data may need to be retrieved differently, or disallowed, depending on the conditions under which access is needed (e.g., the purpose of access). As it is essential that, in case of an emergency, a doctor in the emergency department be allowed access to the PHI of the patient, normal authorization rules might not apply.

H2.

HIPAA also includes declassification policies. For instance, hospitals are allowed to release aggregate information, such as the total number of patients diagnosed with a particular disease, as it is not PHI (although derived from it), but the individual details of affected patients may not be revealed. For declassification policies, the same set of data must be subject to different policies depending on whether it is accessed individually or in aggregate.

The main challenge in specifying and enforcing such policies lies in the fact that doing so may require detailed information about the application's internal state and architecture, as well as about the underlying database (e.g., whether the data is accessed for emergency or regular treatment, or whether it is part of some aggregate or is individually identifiable).

There are numerous existing options for enforcing data protection policies in database-backed applications. Most enterprise database systems incorporate role-based fine-grained access control mechanisms [23, 40, 41] that can enforce policies prohibiting specified users and groups from accessing certain tables, rows, columns or cells, or that provide policy-specific views of the database to different roles. However, because these policies cannot refer to information about the applications that use the database, they are not sufficient for enforcing the contextual policies needed by many applications, and they may introduce performance overheads when propagating updates for simpler policies [27].

Several instances of prior work [6, 26, 27, 35, 44, 46, 47] have explored dynamic query-transformation or query-rewriting methods to enforce different sorts of policies, but these approaches do not immediately address the challenge of managing context and application state, and do not support partial release policies that require modifications to query responses.
For instance, in the emergency access example (H1), some of the proposed approaches [6, 16, 26] augment traditional access control mechanisms with additional conditions such as purpose: the user additionally specifies the purpose of access, which is then used to determine access. This, however, requires electronic health record systems (EHRs) to trust the user requesting access, who may not always be benign [25]. Other EHRs might provide a single access point that always allows access to the data, and use after-the-fact auditing to ensure compliance.

A proper solution for enforcing policies in database-backed applications to ensure compliance with data protection laws and regulations should meet the following criteria, in addition to supporting basic cell-, row- and column-level access policies:

(1) Contextual data flow policies: The enforcement mechanism should be able to enforce different policies based on the context of data access or the current state of the application.

(2) Granular policies for complete mediation: Apart from supporting cell-, row- or column-level policies, the enforcement should be amenable to partial data disclosure, which may also modify sensitive data before disclosure.

(3) Enable a clean separation between the policy specification, the application, and its underlying database: Given the scale of modern applications, enforcing policies as inline checks can quickly become difficult to maintain and implement correctly. Thus, it is imperative that the enforcement framework be independent of the application and the database.

(4) Practicality: As a separate policy enforcement mechanism incurs overhead for policy selection and application, it should be a practical solution, offering very low or negligible performance overhead while correctly enforcing the policies.

In this work, we build on two key insights for enforcing contextual and granular policies:

1. Application-specific information needed to enforce fine-grained contextual policies can often be obtained at the API-endpoint boundaries of such applications: Typically, servers expose endpoints that a client can call into using a uniform resource identifier (URI). These endpoints map to different APIs defining distinct functionalities, which can further be mapped to what data is retrieved from the database and under what context the access occurs. As different APIs may access the same data from a database under different contexts, policies that are API-specific will reflect the necessary application context. For example, if a doctor in the emergency department of a hospital requests access to a patient's PHI through the emergency treatment endpoint (say, /emergency/), it can be reasonably inferred that the purpose of data access is an emergency. Thus, the context of data access can be determined based on the API endpoint used to access the data. Even if there is no such distinction of access, the application code can be refactored to introduce one so that the policies can be specified correctly.

2. Policies that selectively release information can be applied after query evaluation by mediating the results returned by the database: While normal access policies can be applied before data access by query modification, policies that modify the content of the data cannot be enforced via query rewriting when dealing with multiple policies (more in Section 3.4). Thus, a declarative policy specification in SQL is inadequate to enforce policies that release parts of a cell's content. However, such policies can be enforced by modifying the data returned by the database after query evaluation.

We present Estrela, a policy enforcement framework that addresses the challenge of enforcing contextual and granular policies for database-backed applications. In Estrela, developers specify policies that are explicitly associated with specific APIs provided by their application. The policies are associated with the database schema separately from the application code, without requiring any support from the back-end database. This makes it easier to implement, modify, and audit both the policy and the application.

Estrela supports fine-grained contextual policies that are factored into pre-eval and post-eval components. Database accesses throughout the application are subject to pre-eval policies, which are enforced before database access by query rewriting. Post-eval policies provide the flexibility needed by partial release or declassification requirements by modifying the results of a database query at the granularity of individual rows.

We demonstrate that Estrela can be applied to real applications to enforce a diverse range of practical data access policies.
We prototyped Estrela in Python, on top of Django, and applied our prototype to migrate and build applications with complex policy requirements: a port of the open-source Spirit forum software [43] and of the open-source social-networking site Vataxia [36], modified to enforce fine-grained data flow policies that restrict how users access topics, posts and user profiles; a conference management system built on top of Jacqueline [52], used for an academic workshop and ported to Estrela; and, as a microbenchmark, an intranet application that we built to manage the profile and compensation details of employees and to facilitate events and meetings. Using these applications, we show that Estrela incurs very low overhead compared with the original applications, in which policies are inlined throughout the code.

To summarize our contributions, we present a novel policy enforcement framework, Estrela, that supports rich contextual policies on sensitive data while maintaining strong separation between policy, code and data. Estrela factors policies into pre- and post-eval components, enabling both data access and information release restrictions in policies. Estrela does not require modifications to the application or the database, thus simplifying policy implementation and maintenance. We prototype an enforcement mechanism for Estrela policies as an integrated language runtime in Python on top of Django, and evaluate it by using it to build or migrate four applications ranging from 1,000 LOC to 80k LOC. We show that it incurs low overheads while requiring minimal changes to existing application code.

Estrela is a framework that assists the developer in building policy-compliant applications. In Estrela, a policy is specified centrally alongside the database schema, and the runtime ensures that the policy is enforced correctly across all application components. Estrela supports fine-grained, contextual policies that are factored into pre-eval and post-eval components. Policies are enforced based on which tables and fields (and their transformations) are queried.

Policies in Estrela have the following form:

$$t_1.f_1, \ldots, t_n.f_n;\ \varphi;\ A : P$$

In this expression, P is the policy body containing executable code that either rewrites a query or filters a set of results; A is an optional API identifier; $\varphi$ is either pre or post; and each $t_i.f_i$ is a column identifier from the target database schema. The policy P applies when $t_1.f_1, \ldots, t_n.f_n$ are accessed in a query. $\varphi$ indicates whether the policy is a pre-eval policy (pre) or a post-eval policy (post), which determines whether it is applied before or after the query is executed. The function P works either on an unevaluated query for pre-eval policies, i.e., before the data is actually fetched from the database, or on rows in the result-set of an evaluated query for post-eval policies. P always returns a modified query or row, respectively.

Pre-eval policies:

Database accesses throughout the application are subject to pre-eval policies, which are enforced before query evaluation by modifying queries to have additional conditions. These policies add filters to the query, either by adding conditions to the query's WHERE clause or by inserting subqueries, thereby limiting the information returned by the database and the rows accessed by the query.
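To make the shape of a policy concrete, the sketch below models the form $t_1.f_1, \ldots, t_n.f_n;\ \varphi;\ A : P$ as plain Python data, with one pre-eval body that adds a WHERE conjunct and one post-eval body that rewrites a single field of an already-fetched row. It is a minimal illustration under stated assumptions, not Estrela's implementation: `Query`, `Policy`, `own_record` and `hide_dept_from_anonymous` are hypothetical names, the current user U is passed as a plain dict, and conditions are assembled as strings for brevity (real code would bind parameters).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical unevaluated query: table, selected columns, WHERE conjuncts."""
    table: str
    columns: Tuple[str, ...]
    where: Tuple[str, ...] = ()

    def filter(self, condition: str) -> "Query":
        # Returns a new query with one more conjunct; the original is unchanged.
        return Query(self.table, self.columns, self.where + (condition,))

    def sql(self) -> str:
        text = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.where:
            text += " WHERE " + " AND ".join(self.where)
        return text


@dataclass(frozen=True)
class Policy:
    columns: FrozenSet[str]         # t_1.f_1, ..., t_n.f_n
    phase: str                      # phi: "pre" or "post"
    apis: Optional[FrozenSet[str]]  # A: None means "every API"
    body: Callable                  # P: (query, U) -> query, or (row, U) -> row


def own_record(query: Query, user: dict) -> Query:
    # Pre-eval P: narrows the rows the database will return.
    if user.get("dept") != "HR":
        query = query.filter(f"id = {int(user['id'])}")
    return query


def hide_dept_from_anonymous(row: dict, user: dict) -> dict:
    # Post-eval P: rewrites one field of a fetched row, leaving the others intact.
    if user.get("id") is None:  # unauthenticated users are treated as anonymous
        row = {**row, "dept": "(hidden)"}
    return row


name_age = Policy(frozenset({"User.name", "User.age"}), "pre", None, own_record)
dept = Policy(frozenset({"User.dept"}), "post", None, hide_dept_from_anonymous)

q = Query("User", ("name", "age"))
print(name_age.body(q, {"id": 7, "dept": "Sales"}).sql())  # SELECT name, age FROM User WHERE id = 7
print(name_age.body(q, {"id": 3, "dept": "HR"}).sql())     # SELECT name, age FROM User
print(dept.body({"id": 1, "dept": "HR"}, {"id": None}))
```

Returning a fresh `Query` from `filter` mirrors the chained, non-mutating style of ORM query builders, so several pre-eval bodies can be composed one after another.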

Post-eval policies:

Estrela supports post-eval policies that provide the necessary flexibility by associating policies with the result of an evaluated query. Such policies operate at a finer granularity with more contextual information, and are used alongside generic pre-eval policies for complete mediation. They are mainly information release policies that apply to the query's result and modify the rows in the result or the result itself. Only the field(s) for which the policy is specified are modified, while other fields in the result remain unchanged.

Policies can additionally be associated with a transformation function on a field, e.g., Avg, in which case one of the $t_i.f_i$ is replaced with $F_i(t_i.f_i)$, where $F_i$ is the transformation applied to $t_i.f_i$.

The field A contains the list of APIs to which the current policy applies. Depending on the API through which the data is accessed, the policies with A take precedence and override other policies. If A is omitted, the policy is applied on every access.

The policies additionally have access to the current user authenticated with the system and to the API that made the query to the database, both of which are extracted from the request sent by the client to the server. The current user is represented as U in the policies. If the user is not authenticated with the server, the user is treated as an anonymous user. The policy selection algorithm checks whether the current API is present in the A field of the policy and returns the policies that apply for that API.

In case policies are specified for $t_1.f_1$, $t_2.f_2$, and ($t_1.f_1, t_2.f_2$), all three policies are applied when accessing $t_1.f_1, t_2.f_2$. If no policy applies to a query, the default policy returns no rows, to account for missed policies.

Using an intranet for a large organization as a running example, the remainder of this section describes example Estrela policies and their specifications. We start with simple access control policies and proceed to more complex ones. The intranet provides different services to employees, such as viewing personal employee details and payroll information, and setting up meetings and events within the organization. The schema for the back-end database of this application is shown in Table 1. Briefly, the User table contains the personal details of the company's employees; the Payroll table stores employee salaries along with their manager's identifier; the EventCalendar table records the events or meetings organized within the company; and the Invitee table stores the list of employees invited to each event.

In the examples that follow, we use an object-oriented policy specification that employs pre-defined methods (e.g., filter, exclude) to specify policies for enforcement on SQL queries. Evaluation of a query executes it on the database and retrieves the relevant rows as the result. The pre-eval policies work on a query (query in the policy) and are applied before the query is evaluated. The post-eval policies, on the other hand, apply to the result of evaluating a queryset (result in the policy). Policies have access to the current user authenticated with the server (denoted U in the policy).

Example 1 (Access control policies):

Suppose that the site enforces a policy that allows either the name or the age of the employees to be accessed separately, but data linking the name and the age should only be accessible to the employee whose name and age are being accessed, or to an employee from the HR department. This is a basic access control policy that defines what data is accessible by a user in terms of the database state, and it can be specified as a pre-eval policy in Estrela. The annotation pre, which follows the list of columns on which the policy applies in the first line of the policy, identifies the policy as pre-eval, as shown below:

    User.name, User.age; pre:
        if U.dept != 'HR':
            query = query.filter(id=U.id)
        return query

The policy above checks whether the current user (identified by U) is in the HR department. If so, the policy does not add any filters to the original query. If not, the policy ensures that the user accesses only his/her own record. The variable query refers to the query object that will be executed on the database, while the function filter adds constraints to the query. In the above example, the function adds a clause that removes those rows whose id is not equal to the current user's id.

In certain cases, instead of denying access entirely, the application needs to release some information about a sensitive datum, such as an aggregate statistic or a derived value. These policies can be expressed as pre-eval policies in Estrela. In our running example, suppose that the policy requires that non-manager employees can only access the average for other non-manager employees:

    Avg(Payroll.salary); pre:
        mgr = Payroll.values('mgid')
        if U.id not in mgr:
            query = query.exclude(id__in=mgr)
        return query

The above policy is enforced when the average of employee salaries (Avg(Payroll.salary)) is accessed by an employee. The policy first retrieves the employee ids of all managers in the organization. The function values returns only those fields of the table (in this case, Payroll) that are specified as arguments to the function, i.e., mgid (the manager's id). If the current user is a manager, the query is left unchanged and returns the average of the salaries of all employees. Otherwise, the policy uses the exclude function to add a clause that removes the salaries of employees who are also managers before the average is computed.
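The aggregate policy above can be exercised end to end with Python's built-in sqlite3 module standing in for the Django ORM. In this sketch, which is an assumption about how the policy might be realized rather than Estrela's code, `exclude(id__in=mgr)` becomes an `id NOT IN (...)` condition added before `AVG(salary)` is evaluated; the sample salaries and the helpers `make_db`, `avg_salary_policy` and `average_salary` are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_db() -> sqlite3.Connection:
    # Hypothetical Payroll data: employee 4 manages 1, and employee 1 manages 2 and 3.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Payroll (id INTEGER PRIMARY KEY, mgid INTEGER, salary REAL)")
    conn.executemany(
        "INSERT INTO Payroll VALUES (?, ?, ?)",
        [(1, 4, 200.0), (2, 1, 100.0), (3, 1, 120.0), (4, None, 300.0)],
    )
    return conn


def avg_salary_policy(conn: sqlite3.Connection, user_id: int, conds: List[str],
                      params: List[object]) -> Tuple[List[str], List[object]]:
    """Pre-eval policy for Avg(Payroll.salary): non-managers only average non-managers."""
    # Analogue of: mgr = Payroll.values('mgid')
    mgr = [r[0] for r in conn.execute(
        "SELECT DISTINCT mgid FROM Payroll WHERE mgid IS NOT NULL")]
    if mgr and user_id not in mgr:
        # Analogue of: query = query.exclude(id__in=mgr)
        placeholders = ", ".join("?" for _ in mgr)
        conds = conds + [f"id NOT IN ({placeholders})"]
        params = params + mgr
    return conds, params


def average_salary(conn: sqlite3.Connection, user_id: int) -> float:
    # The policy rewrites the query before the aggregate is evaluated.
    conds, params = avg_salary_policy(conn, user_id, [], [])
    sql = "SELECT AVG(salary) FROM Payroll"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return conn.execute(sql, params).fetchone()[0]


conn = make_db()
print(average_salary(conn, 2))  # non-manager: 110.0
print(average_salary(conn, 1))  # manager: 180.0
```

With the sample data, employees 1 and 4 are managers, so employee 2 sees the average of employees 2 and 3, while manager 1 sees the average over everyone. Because the filter is applied before aggregation, individual manager salaries never leave the database for a non-manager.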

| Table | Fields |
|---|---|
| User | id, name, age, address, dept |
| Payroll | id, mgid, salary |
| EventCalendar | eid, date, location, orgid, event |
| Invitee | eid, empid |

Table 1: Database schema for a company's intranet

Example 2 (Context-dependent partial release):

There are other cases where the developer might want to release partial information about sensitive data, which might vary according to the context. In Estrela, these policies are most naturally expressed as post-eval policies that are applied to a set of query results using the result object. This is denoted with the post annotation at the top of the policy.

Suppose the application enforces a policy that the address of an employee is selectively visible to other employees: only the employee can see his/her complete address, employees in the Transportation department can see the neighborhood of an employee to arrange a drop-off, and all other employees can only see the city name. As pre-eval policies apply to the query and not to individual rows, they cannot perform different transformations on different rows. The policy given below demonstrates how a post-eval policy can post-process results to enforce this finer constraint:

    User.address; post:
        for row in result:
            if row.id == U.id:
                continue
            elif U.dept == "Transportation":
                row.address = getngh(row.address)
            else:
                row.address = getcity(row.address)
        return result

This example returns different values based on the context. If the current user is the employee in question, the user is allowed access to the complete address (line 3). If the user is in the Transportation department (line 5), the user can see the neighborhood of the employee's address. If neither condition holds, the user sees only the city of the employee.
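The post-eval address policy can be run on plain Python rows to show the per-row behavior that a single query rewrite cannot express. The sketch assumes, hypothetically, that addresses are stored as "street, neighborhood, city"; getngh and getcity are not defined in the text, so the versions below are illustrative stand-ins, and rows are dicts rather than ORM objects. Each row is copied so the caller's list is left untouched, a small departure from the in-place update in the policy above.

```python
from typing import Dict, List

Row = Dict[str, object]


def getngh(address: str) -> str:
    # Hypothetical helper: keep "neighborhood, city" from "street, neighborhood, city".
    parts = [p.strip() for p in address.split(",")]
    return ", ".join(parts[-2:])


def getcity(address: str) -> str:
    # Hypothetical helper: keep only the last comma-separated component.
    return address.split(",")[-1].strip()


def address_policy(result: List[Row], user: Row) -> List[Row]:
    """Post-eval policy on User.address: a different transformation per row."""
    out = []
    for row in result:
        row = dict(row)  # work on a copy of each fetched row
        if row["id"] != user["id"]:  # an employee's own address stays complete
            if user["dept"] == "Transportation":
                row["address"] = getngh(str(row["address"]))
            else:
                row["address"] = getcity(str(row["address"]))
        out.append(row)
    return out


rows = [
    {"id": 1, "address": "12 Elm St, Riverside, Springfield"},
    {"id": 2, "address": "9 Oak Ave, Hillcrest, Shelbyville"},
]
print(address_policy(rows, {"id": 1, "dept": "Sales"}))
print(address_policy(rows, {"id": 3, "dept": "Transportation"}))
```

For the Sales employee with id 1, the first row keeps its full address and the second is reduced to "Shelbyville"; for a Transportation employee, both rows are reduced to "neighborhood, city".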

Example 3 (API-specific differential access):

A more interesting scenario arises when the developer wants to specify a policy that reveals only the existence of a sensitive value in certain cases, and more information in other contexts. Such policies are not supported by any of the existing mechanisms. An example of this is a calendar application that allows users to create new events and invite other employees to them.

Consider three endpoints identified by the APIs get_events, delete_events and get_location_events, shown in Listing 1. The API get_events returns a list of events, the API delete_events returns a list of events that can be deleted, and the API get_location_events returns the list of events organized at the location identified by loc. The function all retrieves all rows from the table. The system enforces a pre-eval policy that allows a user to see only those events that the user is invited to. When events are accessed through the delete_events API, the system enforces a pre-eval policy that allows the user to see only those events that the user created. However, when viewing the list of events at a location, the user, besides getting the details of the events to which she is invited, should also see “Private event” for other events, along with their date and time, so that she does not schedule another event at the same location at the same time.

    def get_events(request):
        e = EventCalendar.all()
        ...

    def delete_events(request):
        e = EventCalendar.all()
        ...

    def get_location_events(request, loc):
        e = EventCalendar.filter(location=loc)
        ...

Listing 1: APIs exposed by event calendar service

    EventCalendar.∗; pre:
        return query.filter(eid__in=Invitee.filter(empid=U.id).values('eid'))

    EventCalendar.∗; pre; [delete_events]:
        return query.filter(orgid=U.id)

    EventCalendar.∗; pre; [get_location_events]:
        return query

    EventCalendar.∗; post; [get_location_events]:
        for row in result:
            if not Invitee.filter(eid=row.eid, empid=U.id).exists():
                row.event = "Private event"
                row.orgid = 0
        return result

Listing 2: Policies enforced by event calendar service
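The interplay of the four policies in Listing 2 can be simulated without a database. In the sketch below, which is an illustration under stated assumptions and not Estrela's runtime, a query is a list of row predicates, pre-eval bodies append predicates, and post-eval bodies rewrite fetched rows; an API-specific policy, when one exists for the calling API, replaces the general policy for the same phase. Column matching on `EventCalendar.∗` is skipped because every policy here covers the whole table, and the event data, `INVITEES`, `select`, and `run` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Any]
Query = List[Callable[[Row], bool]]  # toy query: a conjunction of row predicates

# Hypothetical EventCalendar rows and Invitee (eid, empid) pairs.
EVENTS: List[Row] = [
    {"eid": 1, "date": "2020-10-05", "location": "Room A", "orgid": 10, "event": "Design review"},
    {"eid": 2, "date": "2020-10-05", "location": "Room A", "orgid": 11, "event": "Offsite planning"},
    {"eid": 3, "date": "2020-10-06", "location": "Room B", "orgid": 12, "event": "Budget sync"},
]
INVITEES: Set[Tuple[int, int]] = {(1, 20), (3, 20)}


def invited(eid: int, empid: int) -> bool:
    return (eid, empid) in INVITEES


def pre_invited(query: Query, u: int) -> Query:        # Listing 2, line 1
    return query + [lambda r: invited(r["eid"], u)]


def pre_organizer(query: Query, u: int) -> Query:      # line 4: [delete_events]
    return query + [lambda r: r["orgid"] == u]


def pre_unfiltered(query: Query, u: int) -> Query:     # line 7: [get_location_events]
    return query


def post_private(result: List[Row], u: int) -> List[Row]:  # line 10
    out = []
    for row in result:
        row = dict(row)
        if not invited(row["eid"], u):
            row["event"] = "Private event"  # date and location stay visible
            row["orgid"] = 0                # organizer hidden
        out.append(row)
    return out


# (phase, set of APIs or None for "all APIs", body)
POLICIES = [
    ("pre", None, pre_invited),
    ("pre", {"delete_events"}, pre_organizer),
    ("pre", {"get_location_events"}, pre_unfiltered),
    ("post", {"get_location_events"}, post_private),
]


def select(phase: str, api: str) -> List[Callable]:
    specific = [b for p, apis, b in POLICIES if p == phase and apis is not None and api in apis]
    general = [b for p, apis, b in POLICIES if p == phase and apis is None]
    return specific or general  # API-specific policies override general ones


def run(api: str, u: int, query: Optional[Query] = None) -> List[Row]:
    query = list(query or [])
    pre = select("pre", api)
    if not pre:
        return []  # default policy: no rows
    for body in pre:
        query = body(query, u)
    result = [r for r in EVENTS if all(pred(r) for pred in query)]
    for body in select("post", api):
        result = body(result, u)
    return result


def at_room_a(row: Row) -> bool:
    # What get_location_events(request, "Room A") would filter on.
    return row["location"] == "Room A"


print([r["eid"] for r in run("get_events", 20)])     # [1, 3]
print([r["eid"] for r in run("delete_events", 10)])  # [1]
print([(r["eid"], r["event"]) for r in run("get_location_events", 20, [at_room_a])])
# [(1, 'Design review'), (2, 'Private event')]
```

Employee 20 sees only invited events through get_events, organizer 10 sees only his own event through delete_events, and the location view returns every event in Room A with the uninvited one masked.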

As getting a list of events is subject to the pre-eval policy that returns only those events that the user is invited to, separate policies need to be specified for the APIs that access events for deletion (e.g., delete_events) and at a location (e.g., get_location_events). The list of specified policies is shown in Listing 2. For API-specific policies, the list of APIs to which the policy applies is specified after the pre or post annotation. If this list is omitted, the policy applies to all APIs.

When the list of events is accessed via get_events, the pre-eval policy defined on line 1 is applied, and only those events that the user is invited to are returned. The ‘∗’ in the policy indicates that the policy applies every time the table is queried. When the list of events is accessed via delete_events, the pre-eval policy defined on line 4 is applied, overriding the policy defined on line 1 because it is API-specific, and only those events that the user has created are returned. When the list of events at a given location is accessed via get_location_events, all events at that location are retrieved from the database, and the post-eval policy defined on line 10 then mediates the result. It checks whether there is an entry in the Invitee table that corresponds to the current row's event and the current user (line 12). If one is found, the policy returns the details of the event to the user; if not, it returns “Private event” for the event (line 13) with the organizer's data hidden. The policy on line 7 ensures that the query is not filtered, thus returning all events registered at the given location. Without this, the policy on line 1 would filter the list to contain only those events that the user is invited to.

Figure 1: Application architecture with Estrela

Figure 1 describes the application architecture with the policy enforcement mechanism using Estrela. The overall workflow of the architecture is as follows:

(1) The server accepts incoming requests on different APIs and issues the required query to the database.

(2) Pre-eval policies associated with the data being retrieved are applied as filters on the query before the object is fetched from the database.

(3) The filtered query is then executed on the database, which returns the matching rows.

(4) Once the data is fetched, the post-eval policies that apply to the result of the query modify the query's result-set.

(5) The API performs any necessary computations on the retrieved data, and creates and sends the response to the client.
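The five steps can be traced in a small, self-contained sketch that uses sqlite3 in place of Django's database layer. Everything here is hypothetical (the `User` rows, `pre_own_record`, `post_city_only`, `enforced_select`, and the `get_profiles` endpoint); the point is only the ordering: the endpoint issues a query (1), pre-eval policies add conditions (2), the rewritten query runs (3), post-eval policies rewrite the fetched rows (4), and the endpoint serializes what remains (5).

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Where = Tuple[List[str], List[Any]]  # SQL conjuncts and their bound parameters


def pre_own_record(where: Where, user: Row) -> Where:
    # Pre-eval: non-HR users only get their own row back.
    conds, params = where
    if user["dept"] != "HR":
        conds, params = conds + ["id = ?"], params + [user["id"]]
    return conds, params


def post_city_only(rows: List[Row], user: Row) -> List[Row]:
    # Post-eval: other employees' addresses are reduced to the city.
    out = []
    for row in rows:
        row = dict(row)
        if row["id"] != user["id"]:
            row["address"] = str(row["address"]).split(",")[-1].strip()
        out.append(row)
    return out


def enforced_select(conn: sqlite3.Connection, user: Row,
                    pre: List[Callable], post: List[Callable]) -> List[Row]:
    where: Where = ([], [])
    for policy in pre:                                   # step 2: rewrite the query
        where = policy(where, user)
    conds, params = where
    sql = "SELECT id, name, address FROM User"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    cur = conn.execute(sql + " ORDER BY id", params)     # step 3: run it
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    for policy in post:                                  # step 4: mediate the result
        rows = policy(rows, user)
    return rows


def get_profiles(conn: sqlite3.Connection, request: Row) -> str:
    # Steps 1 and 5: the endpoint asks for profiles and serializes the result.
    rows = enforced_select(conn, request["user"], [pre_own_record], [post_city_only])
    return json.dumps(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT, address TEXT, dept TEXT)")
conn.executemany("INSERT INTO User VALUES (?, ?, ?, ?)", [
    (1, "Ana", "12 Elm St, Riverside, Springfield", "HR"),
    (2, "Bo", "9 Oak Ave, Hillcrest, Shelbyville", "Sales"),
])
print(get_profiles(conn, {"user": {"id": 2, "dept": "Sales"}}))
print(get_profiles(conn, {"user": {"id": 1, "dept": "HR"}}))
```

The endpoint code never mentions a policy; all mediation happens inside `enforced_select`, which is the separation between application and policy that the workflow describes.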

Estrela aims to provide a unified, practical, and robust framework for specifying and enforcing policies in database-backed applications. The primary security goal of Estrela is to ensure that the database queries and the APIs satisfy the conditions specified in the policy. Estrela mitigates authorization bugs by enforcing data flow policies that apply when the data is read from the database, and we limit the scope of this paper to such policies. We assume that developers make a good-faith effort to specify correct policies, and that the integrity of the Estrela framework is intact and uncompromised throughout the lifetime of the application. Estrela does not attempt to prevent leaks from hardware or operating-system-level side channels, or by groups of users who collude via out-of-band channels to learn more than what is specified in the policy. Likewise, as Estrela policies concern server-side data, leaks that result from vulnerabilities in the client-side browser or operating system are also out of scope for our security goal.

Algorithm 1: Algorithm to enforce policies in Estrela

    function GetPolicy(P, fields, A, def)
        policy = {}, epol = {}
        for (p : φ) in P do
            if (fields; A) in p then
                policy = policy ∪ φ
            else if (fields) in p then
                epol = epol ∪ φ
        if policy is empty then
            if epol is not empty then
                policy = epol
            else if def then
                policy = φ_D
        return policy

    function Apply(Q, A, Pre, Post)
        fields = parse(Q)
        policy = GetPolicy(Pre, fields, A, true)
        Q = applyFilters(Q, policy)
        result = executeQuery(Q)
        postpol = GetPolicy(Post, fields, A, false)
        result = applyPostPolicy(result, postpol)
        return result

Algorithm 1 shows the top-level algorithm (and the auxiliary function) used by Estrela to select and enforce policies on data access. The top-level function Apply takes the query, the API, and the sets of policies as input. The function starts by parsing the query (Q) to determine which fields it uses. Using the fields and the API information A, the function GetPolicy returns the pre-eval policies applicable to the query. These policies are enforced on the query, which is then run. The results returned by the database are modified as per the post-eval policies selected by GetPolicy for the fields and the API A. The modified result is then returned to the user.

The operation (fields; A) in p in GetPolicy checks whether the fields and APIs in the policy p contain fields and A, i.e., whether the list of fields and APIs in p is a superset of fields and A. If fields contain aggregate functions, it additionally checks for the fields used in the aggregate function in the fields of p.
If the appropriate policies for the API A are not found, the function applies general policies that apply to all APIs. The policy $\varphi_D$ is the default policy that restricts access to all the data. The variable def is set to true or false to indicate whether the default policy should apply.

Estrela enforces multiple policies associated with the (set of) fields specified in the query. It filters data based on the pre-eval policies à la existing access control mechanisms, and modifies the data in the result to release partial information as per the post-eval policies. In our prototype, instead of specifying policies in a declarative language like SQL, we specify them using object-relational mapping (ORM) methods that offer an object-oriented view of, and method-based access to, the data. Besides making it easy to write policy functions for developers who might be more comfortable with object-oriented programming, ORM-based specification has certain advantages over declarative specification when specifying complex policies:

(1) Post-eval policies cannot be specified in SQL, as they apply to a query's result and require an imperative specification.

(2) Enforcing multiple policies specified in SQL restricts the policies to access control (as they are specified as additional conditions in the WHERE clause [6, 26, 27] or as specific SQL sub-queries [26, 47]); this does not allow information release unless one uses user-defined functions already defined in the database, which, in turn, requires modifications to the application code.

(3) Enforcement of multiple data usage policies for information release (e.g., DataLawyer [46]) requires all policies to be checked before the policy-compliant data is returned, making the approach less efficient.
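GetPolicy and Apply from Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under several assumptions: the query is modeled as a list of row predicates over an in-memory table, parse(Q) is replaced by passing the queried fields explicitly, the aggregate-function check is omitted, and the fallback set epol is read, following the prose, as the general policies that carry no API list. `Policy`, `deny_all`, `get_policy`, `apply_policies`, and the `hr_dashboard` API are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List, Optional

Row = Dict[str, Any]
Query = List[Callable[[Row], bool]]  # toy query: a conjunction of row predicates


@dataclass(frozen=True)
class Policy:
    fields: FrozenSet[str]          # columns named in the policy header
    apis: Optional[FrozenSet[str]]  # A; None means the policy applies to every API
    body: Callable                  # pre: (query, user) -> query; post: (rows, user) -> rows


def deny_all(query: Query, user: Any) -> Query:
    # phi_D: the default pre-eval policy lets no row through.
    return query + [lambda row: False]


DEFAULT = Policy(frozenset(), None, deny_all)


def get_policy(policies: List[Policy], fields: FrozenSet[str], api: str,
               use_default: bool) -> List[Policy]:
    policy: List[Policy] = []
    epol: List[Policy] = []
    for p in policies:
        if not fields <= p.fields:
            continue                     # p does not cover every queried field
        if p.apis is not None and api in p.apis:
            policy.append(p)             # (fields; A) in p
        elif p.apis is None:
            epol.append(p)               # general policy, kept as a fallback
    if not policy:
        if epol:
            policy = epol
        elif use_default:
            policy = [DEFAULT]
    return policy


def apply_policies(query: Query, fields: FrozenSet[str], api: str, user: Any,
                   pre: List[Policy], post: List[Policy], table: List[Row]) -> List[Row]:
    for p in get_policy(pre, fields, api, True):
        query = p.body(query, user)                                # applyFilters
    result = [dict(r) for r in table if all(f(r) for f in query)]  # executeQuery
    for p in get_policy(post, fields, api, False):
        result = p.body(result, user)                              # applyPostPolicy
    return result


TABLE = [{"id": 1, "dept": "HR"}, {"id": 2, "dept": "Sales"}]
COLS = frozenset({"User.id", "User.dept"})


def own_row(query: Query, user: Any) -> Query:
    return query + [lambda r: r["id"] == user]


def everything(query: Query, user: Any) -> Query:
    return query


PRE = [Policy(COLS, None, own_row), Policy(COLS, frozenset({"hr_dashboard"}), everything)]

print(apply_policies([], COLS, "profile", 2, PRE, [], TABLE))       # general policy only
print(apply_policies([], COLS, "hr_dashboard", 2, PRE, [], TABLE))  # API-specific override
print(apply_policies([], frozenset({"User.age"}), "profile", 2, PRE, [], TABLE))  # default: []
```

Passing `use_default=False` for post-eval selection mirrors the def argument in Algorithm 1: a missing pre-eval policy denies everything, whereas a missing post-eval policy leaves the already-filtered result untouched.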

While the access control mechanisms enforced by enterprise database systems [23, 40, 41] provide fine-grained data access, they have certain shortcomings that make them unsuitable for applying the policies supported by Estrela:

• They lack support for policies that link two or more tables without explicitly defined views. Creating such views requires modifications to the application so that it queries for the correct data.
• Data masking either removes all rows or defaults the values in the column if the rows are inaccessible to the user. It does not support modification of specific rows in the table, as shown in Example 2 above.
• Contextual policies are application-specific, and hence cannot be implemented generically by database access control systems. Existing systems [6, 16] that take such contexts into account when allowing access require the user to specify the context explicitly in the query, which they then validate. Not only does the user need to be trusted to provide the correct context, but applications must also be modified to send the appropriate queries.

As most of the other existing approaches [6, 26, 27, 35, 44, 46, 47] work independently of the application, they do not have access to the API-specific information necessary for enforcing contextual policies. It is, however, possible to expose the path or API on which the request is made (similar to how the authenticated user is exposed) to the enforcement monitor.
This would, in turn, require the existing approaches to modify their policy specification languages to take API information into account for enforcing contextual policies. However, it is not possible to specify post-eval policies in the existing approaches (making modification of query results difficult), which is required, for instance, in Example 3 when fetching events at a particular location (get_location_events).

It is also instructive to see how the policy in Example 2 differs from prior work like Qapla [27], which also requires user-defined functions like getngh and getcity for specifying such policies. In Qapla, the developer has to invoke the correct function, e.g., getngh(address) if the user is in the Transportation department and getcity(address) if the user does not satisfy any criterion, to get access to the address. If the developer queries the wrong function, she does not get access to the address. In Estrela, the post-eval policy takes care of this; thus no modification to the application or its API is required. Moreover, with Qapla, the query needs to be repeated to get the actual value of the address and the city name when the user does not have appropriate access.

Other prior works [26, 47] can use CASE statements to specify such properties because their rewriting technique generates different data based on the conditions. However, when multiple policies apply to a query for a column, it is unclear how the policies would apply in these approaches.

Post-eval policies are at least as expressive as pre-eval policies, but enforcing pre-eval policies has certain advantages. First, as pre-eval policies are applied before the query is evaluated, by adding filter conditions or subqueries, the database is queried only once. This improves the performance of the policy framework, as post-eval policies require more database hits (the policy queries the database again). Second, pre-eval policies prevent timing-related leaks that are possible with post-eval policies. As they are applied before the database is queried, they do not reveal any information about the number of records satisfying the original unfiltered query. For instance, suppose that, in a healthcare setting, a user wants to know how many patients have a certain disease but is not allowed to access this information. With post-eval policies, the time taken to first retrieve the list of patients that have the disease and then filter the results might be significant, dominating the time to respond to the query. Therefore, if it takes a long time to respond, this may leak some information about the size of the query response to the user, even though the result itself contains no sensitive information.

Post-eval policies allow partial release of sensitive information based on the context, making them flexible enough to handle cases where different values need to be returned depending on the context. Moreover, it is not possible to enforce all policies before evaluation, as some policies need to post-process the results. Post-eval policies are also useful when some additional information that is not present in the result of the filtered query needs to be released.
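The timing argument can be illustrated with Python's built-in sqlite3 module. This sketch assumes a hypothetical patient table in which a doctor may only see her own patients, and it uses raw SQL rather than Django's ORM. The pre-eval variant folds the policy into the WHERE clause; the post-eval variant runs the unfiltered query and filters the evaluated result in the application. Both release the same rows, but the post-eval variant first fetches every matching record, so its running time tracks the size of the unfiltered result.

```python
import sqlite3

# Hypothetical table: each patient is treated by one doctor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, doctor INTEGER, disease TEXT)")
conn.executemany(
    "INSERT INTO patient (doctor, disease) VALUES (?, ?)",
    [(i % 50, "flu" if i % 3 else "measles") for i in range(3000)],
)


def pre_eval(disease, doctor):
    """Policy folded into the WHERE clause before the query is evaluated."""
    rows = conn.execute(
        "SELECT id FROM patient WHERE disease = ? AND doctor = ? ORDER BY id",
        (disease, doctor),
    ).fetchall()
    return rows, len(rows)          # (released rows, rows fetched from the database)


def post_eval(disease, doctor):
    """Unfiltered query first; the policy then filters the evaluated result."""
    rows = conn.execute(
        "SELECT id, doctor FROM patient WHERE disease = ? ORDER BY id",
        (disease,),
    ).fetchall()
    released = [(pid,) for pid, doc in rows if doc == doctor]
    return released, len(rows)


print(pre_eval("measles", 0)[1], post_eval("measles", 0)[1])  # 20 1000
```

Here the pre-eval query fetches 20 rows, while the post-eval variant fetches 1000 and discards 980 of them before releasing the same 20.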

The policies are applied considering all the fields used in the query, irrespective of where they appear in the query. While this may at first seem too conservative, it is necessary to prevent implicit information leaks, as might be the case when, for example, a query returns the names of all employees of a particular age (SELECT name FROM employees WHERE age > 45). The policies for accessing name, and for accessing name and age together, may be different: while all employees can access the names of other employees, an employee can only access his/her own age and name together, as shown below:

```
User.name; pre:
    return query

User.name, User.age; pre:
    return query.filter(id=U.id)
```

If the policy were based only on the selected name column, it would apply the relaxed first policy, which applies no filter to accesses of name, and would thereby display the names of all employees of the particular age. However, as the two columns are linked in the query, the correct policy is the second one, associated with both name and age, which reveals to the user only his/her own name and age and no additional information. To handle such leaks, all columns used in the query are considered when selecting the policy to apply to the query.
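The sketch below shows why the field set must include columns from the WHERE clause. It uses a hypothetical parsed-query dictionary and, for brevity, looks policies up by the exact set of fields rather than by the superset check of Algorithm 1; the two policies mirror the User.name and User.name, User.age policies above, and the employees are made up.

```python
import operator

OPS = {">": operator.gt, "=": operator.eq}


def fields_of(query):
    """Every column the query touches: SELECT list plus WHERE conditions."""
    return frozenset(query["select"]) | frozenset(f for f, _, _ in query["where"])


def run(rows, query):
    """Evaluate a tiny SELECT ... WHERE over a list of dict rows."""
    hits = [r for r in rows if all(OPS[op](r[f], v) for f, op, v in query["where"])]
    return [{c: r[c] for c in query["select"]} for r in hits]


# Pre-eval policies keyed by field set (exact match, a simplification).
POLICIES = {
    frozenset({"name"}): lambda rows, uid: rows,                       # User.name
    frozenset({"name", "age"}): lambda rows, uid: [r for r in rows if r["id"] == uid],
}


def evaluate(table, query, uid, select_only=False):
    key = frozenset(query["select"]) if select_only else fields_of(query)
    policy = POLICIES.get(key, lambda rows, uid: [])   # default: suppress all
    return run(policy(table, uid), query)


employees = [{"id": 1, "name": "ann", "age": 50},
             {"id": 2, "name": "bob", "age": 47},
             {"id": 3, "name": "cy", "age": 30}]
# SELECT name FROM employees WHERE age > 45
q = {"select": ["name"], "where": [("age", ">", 45)]}
```

With select_only=True, user 3 receives ann and bob and thereby learns that both are older than 45; keying on all used fields applies the name-and-age policy, so user 3 receives nothing and user 1 sees only her own row.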

Estrela provides a mechanism for automatically enforcing a given set of policies. We discuss its implementation in this section and evaluate the approach in Section 5.

Estrela is prototyped in Python and extends Django [15], a Python-based model-view-template application framework. Thus, apps written in Estrela are otherwise standard Django apps with policies in the schema. Policies are specified alongside the database schema using two class methods (for pre-eval and post-eval policies) that are inherited by all models (schemas). If these methods are not overridden by a model, a default conservative policy that suppresses all results applies. Django includes an object-relational mapping (ORM) to interact with databases, from which it constructs SQL queries represented as querysets. Evaluating a queryset object executes the query on the database and retrieves the relevant rows as the result. Django constructs a queryset as soon as an API starts querying for some data in the database; however, to reduce the number of database hits, it does not evaluate the queryset until its results are needed. The pre-eval policies work on a queryset (query in the policy), while the post-eval policies apply to the result. Policies are Python functions in our prototype.

To enforce a pre-eval policy, we augment the database interface functions provided by Django via monkey-patching, which amounts to inheriting from Django's classes and overriding the methods relevant to Django's interfacing with the database. We modified about 700 lines of Python code, including code from the original implementation. Using this approach, we achieve complete mediation: all database accesses in an Estrela application occur through the instrumented methods. These methods are responsible for invoking the policy functions and passing them the current user and API information that Estrela exposes from the request to the server.
As Django supports lazy evaluation, we apply the pre-eval policies just before the query is evaluated, to get better performance.

To enforce a post-eval policy, we modify the results returned from the database by selectively applying the relevant policy functions before the results are returned to the APIs. Selective enforcement is achieved by a case analysis on the fields involved in the query to determine which policies are relevant. When multiple policies apply to the result of an API call, we apply the policy functions to the queryset result in an arbitrary but fixed order.

Both pre- and post-eval policies need to consult the set of active API-specific policies to bypass enforcement when operating in the context of a relevant API.
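Lazy evaluation is what lets pre-eval policies run at the last moment. The sketch below is a toy, pure-Python stand-in for a queryset; it neither uses nor patches Django, and Estrela's actual instrumented methods are not named in the text. Filters accumulate without running, and the model's pre-eval policy is invoked only at the evaluation point. The Topic model and its policy, which hides all topics from anonymous users in the spirit of the Spirit topic policy from the evaluation, are hypothetical.

```python
class LazyQuery:
    """Toy queryset: filter() composes lazily; rows are produced only on iteration."""

    def __init__(self, model, rows, filters=(), context=None):
        self.model = model
        self.rows = rows
        self.filters = tuple(filters)
        self.context = context          # current user/API, e.g. set by a middleware

    def filter(self, pred):
        # Returns a new, still-unevaluated query, as QuerySet.filter does.
        return LazyQuery(self.model, self.rows, self.filters + (pred,), self.context)

    def _evaluate(self):
        # Instrumented evaluation point: the pre-eval policy runs here, after the
        # application has finished composing the query.
        q = self.model.pre_policy(self, self.context)
        return [r for r in q.rows if all(f(r) for f in q.filters)]

    def __iter__(self):
        return iter(self._evaluate())


class Topic:
    policy_calls = 0                    # counts policy invocations for the demo

    @classmethod
    def pre_policy(cls, query, ctx):
        cls.policy_calls += 1
        if not ctx["is_authenticated"]:
            return query.filter(lambda r: False)   # anonymous users see no topics
        return query


topics = [{"id": i, "public": i % 2 == 0} for i in range(6)]
anon = LazyQuery(Topic, topics, context={"is_authenticated": False})
member = LazyQuery(Topic, topics, context={"is_authenticated": True})
public_for_anon = anon.filter(lambda r: r["public"])   # no policy call yet
```

Composing public_for_anon does not touch the policy; iterating it does. This is also why, as the Spirit measurements note, work done before evaluation cannot be skipped by a policy that only runs at evaluation time.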

As Estrela is built on top of Django, migrating an existing Django application to Estrela is straightforward and does not require changes to the core application code. The developer only needs to specify the policies alongside the data models (i.e., the database schema), which are modified to inherit from Estrela's model class. The only other necessary change is to expose the request parameters of each API to the policy enforcement mechanism, which Estrela takes care of automatically using a middleware configured in the application's settings.

Although Estrela is prototyped in Python on top of Django, its principles are generic and can be extended to enforce authorization policies in other existing frameworks (in any language). Estrela requires that the models, or the database schema, be separate from the rest of the application; the policies are specified alongside them in the language in which the framework is built. Enforcing policies requires adding hooks in the query evaluation process of the framework.

An interesting question to consider when integrating Estrela with legacy applications is how to bootstrap policy specification to assist developers. Policy inference, which is orthogonal to the problem studied in this paper, has been an area of active research, where prior works have proposed approaches to mine meaningful policies using logs, traces and program specifications [1, 4, 5, 11, 22, 24, 29, 37, 48–51]. The mined policies can be used to bootstrap the initial set of policies when migrating applications to Estrela. However, the problem of policy inference is out of scope of the current paper.
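Estrela's middleware itself is not listed in the paper. As an illustration only, the sketch below follows the shape of Django's middleware interface (a class constructed with get_response and called with each request) and stores the current request in a contextvars.ContextVar so that policy code can read the user and API. Identifying the API by request.resolver_match.url_name is an assumption, the names ExposeRequestMiddleware and policy_context are hypothetical, and the demo passes a SimpleNamespace stand-in rather than a Django HttpRequest.

```python
import contextvars
from types import SimpleNamespace

# Hypothetical holder for the request currently being processed.
_current_request = contextvars.ContextVar("current_request", default=None)


class ExposeRequestMiddleware:
    """Django-style middleware: built with get_response, called once per request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        token = _current_request.set(request)
        try:
            return self.get_response(request)   # views (and their queries) run here
        finally:
            _current_request.reset(token)       # do not leak into the next request


def policy_context():
    """What a policy function could read: the current user and the API name."""
    request = _current_request.get()
    if request is None:
        return None
    # Assumption: the API is identified by the name of the matched URL pattern.
    return {"user": request.user, "api": request.resolver_match.url_name}


# Demo with a stand-in request object instead of a Django HttpRequest.
def view(request):
    return policy_context()


handler = ExposeRequestMiddleware(view)
fake = SimpleNamespace(user="alice",
                       resolver_match=SimpleNamespace(url_name="profile_view"))
inside = handler(fake)
```

In a real Django project such a class would be listed in the MIDDLEWARE setting, which matches the paper's description of a middleware "configured in the application's settings".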

We evaluate Estrela by comparing the code changes required in existing applications and the overhead it incurs due to policy checking. We demonstrate that Estrela is easy to integrate with existing applications and incurs very low overheads, showing its effectiveness and usefulness. For migration to Estrela, we consider open-source applications that are built using Django.

We used Estrela to migrate a few applications, ranging from about 1000 LOC to 80 kLOC, to enforce policies. The first is a version of Spirit [43], a forum software where users can discuss different topics, migrated to Estrela with policy enforcement. The second application is a social-networking site, Vataxia [36]. The third is a multi-user conference management system that lets users add, edit and remove papers for a conference. The fourth application is a company's intranet along the lines of the examples discussed in Sec. 2. The case studies were chosen to evaluate the effect of policy enforcement on large applications and on applications with multiple policies, and the overhead it incurs for simple and complex policies. For all the case studies, we add a middleware to expose the request details to the schema for evaluating the policies in Estrela. All other code in the applications remains the same. The baseline implementations of our case studies are all built using Django with policies included in the code, allowing us to evaluate the performance of Estrela.

We performed our experiments on a MacBook Pro with 8 GB RAM and a 3.1 GHz Intel Core i5 processor, running macOS Catalina 10.15.1. We automated the process of sending requests to the server to retrieve data from the database, and measured the average time taken for processing the request and enforcing the policies over 100 trials. The back-end database was MySQL version 8.0.18. Unless mentioned otherwise, all servers were built using Python 3.7.3 and Django 2.2.7. The server and the client run on the same machine, so the evaluation results do not include network latency or the time taken for rendering. We ran this process in the context of a particular user, who is authenticated before sending the request. The error bars in the graphs show the standard deviation.

In the following subsections, we describe the functionality of these applications, the policies that we apply, and how these policies are implemented and enforced using Estrela. We report the performance numbers for these case studies by measuring the time taken by the application to perform user operations in different scenarios. Section 5.2.1 discusses the code changes required for the implementations to enforce policies in their respective settings.

Spirit is a Django-based forum software for facilitating conversations and discussions among users. The original software contains about 80 kLOC. Users can create new conversations or post comments on existing conversations, depending on the visibility of the conversation and/or whether the user has been invited to the conversation. The schema of the actual site contains 28 tables with 165 columns. We specified pre-eval policies for various tables and measured the overhead incurred by Estrela, demonstrating the ability of Estrela to scale to large applications.

We modified the models' base class in the original software to use Estrela's model as the base class. This required a modification of about 30 LOC in the original software. Apart from the specification of policies, the only other modification required was to expose the incoming request to the models for identifying the current user and the API that requested the data, which required adding a couple of lines of code to the settings of the application.

One of the policies we enforce is associated with the topics in the forum and allows only logged-in users to view the topics. Without this policy, Spirit shows all public topics even to users who are not logged in. In Estrela, we added a policy that checks whether the current user is authenticated; if not, it does not show any topics to the user. In the original version of Spirit, this check has to be propagated to at least three different files in the codebase, all of which access a topic and display it to the user, reiterating the need for centralized specification of policies.

We added 1000 users and 1000 topics to the database and evaluated the time taken to access the topics with different users. The result of our experiment is shown in Figure 2. Estrela incurs an average overhead of about 0.[…]8% when accessing the topics as an authenticated user. When accessing the topics as an anonymous user, Estrela incurs an overhead of about 1 ms, which is mainly due to Django's lazy evaluation in Estrela. With Estrela, the policies are applied just before evaluation, after the objects have been created, even if the objects are never required later, resulting in the additional overhead. The inline check in Django can be added as early as possible in the API code, preventing the creation of the objects and thereby avoiding the additional overhead.

Figure 2: Time taken to request and access different topics in the forum with Django as the baseline. (Plot: access time in seconds per user accessing topics; Django vs. Estrela.)

Social-networking sites involve multiple users posting information and sharing it with other users. Users can track the activity of other users by “following” them. We modify an existing open-source social-networking site, Vataxia [36], by implementing additional functionality to follow users and view a user's posts. The application allows new users to be added to the system and to search for users to follow their posts. The original schema contained the following models: User, Post, PrivateMessage, Reply and Vote; we extend User by adding a field, follow, to include a list of users that the user is following. The table Post contains a field user referencing the User table, and the message that the user has posted in msg. The front-end of the application is written in ReactJS, while the back-end is developed using Django REST framework [14], Django 2.2.7 and Python 3.7.3.

We port Vataxia to Estrela by adding policies for various tables and modifying the models of the application to inherit from Estrela's base class. The original application contains around 19 kLOC in the front-end and 2 kLOC in the back-end. We modify 10 lines in the application, and include the middleware that exposes requests to the models.

We discuss the enforcement of a policy that limits the posts that a user can view based on whether the user is following the user whose posts are being accessed. If accessing only the posts, a user can see posts by only those users that she is following. However, when the user accesses the profile of a particular user whose posts she cannot see, she gets a default message asking her to follow the user to see the posts. In Estrela, this policy is implemented as a post-eval policy specific to the profile_view API, because this post-eval policy should not be enforced when posts are accessed through the posts_view API.

```python
def posts_view(request):
    posts = Post.objects.all()
    return Response(Serialize(posts))

def profile_view(request, uid):
    profile = User.objects.filter(id=uid)
    user_posts = Post.objects.filter(user=uid)
    return Response(Serialize(profile, user_posts))
```

Listing 3: Application code for returning all posts and the profile of a user

```python
def posts_view(request):
    u = request.user
    if not u.is_authenticated:
        posts = []
    else:
        f_ids = u.follow.values('id')
        posts = Post.objects.filter(user__in=f_ids)
    return Response(Serialize(posts))

def profile_view(request, uid):
    u = request.user
    if not u.is_authenticated:
        profile = None
        user_posts = []
    else:
        profile = User.objects.filter(id=uid)
        f_ids = u.follow.values('id')
        user_posts = Post.objects.filter(user=uid, user__in=f_ids)
        if not user_posts:
            user_posts = ['Follow user to see the posts']
    return Response(Serialize(profile, user_posts))
```

Listing 4: Policy specified in Django for Vataxia

```
Post.*; pre:
    if not U.is_authenticated:
        query = query.none()
    else:
        f_ids = U.follow.values('id')
        query = query.filter(user__in=f_ids)
    return query

Post.*; post; [profile_view]:
    if not result:
        result = ['Follow user to see the posts']
    return result
```

Listing 5: Policy enforcement for Vataxia in Estrela
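Listing 5 is written in Estrela's policy notation. To show how its two policies combine, the sketch below restates them as plain Python functions over a list of post dictionaries and applies them to the queries issued by the unmodified views of Listing 3. The fetch_posts driver, the dictionary representation of posts, and the SimpleNamespace users are hypothetical simplifications of Estrela's queryset-based enforcement.

```python
from types import SimpleNamespace

FOLLOW_MSG = "Follow user to see the posts"


def post_pre(posts, user):
    """Post.*; pre: nothing for anonymous users, else only posts by followed users."""
    if not user.is_authenticated:
        return []                                   # query.none()
    return [p for p in posts if p["user"] in user.follow]


def post_post(result, api):
    """Post.*; post; [profile_view]: enforced only in the context of profile_view."""
    if api == "profile_view" and not result:
        return [FOLLOW_MSG]
    return result


def fetch_posts(posts, user, api, uid=None):
    """Query issued by a Listing 3 view, with both policies applied centrally."""
    query = posts if uid is None else [p for p in posts if p["user"] == uid]
    return post_post(post_pre(query, user), api)


posts = [{"user": 1, "msg": "hello"}, {"user": 2, "msg": "hi"}]
alice = SimpleNamespace(is_authenticated=True, follow={1})
anonymous = SimpleNamespace(is_authenticated=False, follow=set())
```

Because the message is attached to profile_view only, an empty result reached through posts_view is returned unchanged, while the same empty result in profile_view becomes the follow prompt.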

For code comparison, the policy code for Vataxia is shown in Listings 4 and 5, for the policy in the views and in the schema with Estrela, respectively. Although the policies are similar across the two listings, the developer has to add the policy for every access of Post when adding checks in the API code (e.g., on line 7 and line 18 in Listing 4), while the policy specification in Estrela is much cleaner and does not require any modification to the code in Listing 3.

Figure 3: Normalized time taken to access all posts through a REST API with policy in code as the baseline. (Plot: normalized access time vs. number of posts; Django vs. Estrela.)

We added 1000 users and their details to the database for the social-networking site, with every user following 0–[…] ([…],000 posts to 256,000 posts). The posts are generated automatically before the processing starts and are related to random users. Figure 3 shows the normalized time taken in different scenarios. The average overhead for accessing all posts with Estrela when compared to the baseline of policy in the application code is around 0.[…]

| Table | Policy: user can access |
|---|---|
| User | either her own profile, or another user's profile if she is the chair or in the PC |
| Paper | any paper if she is the chair |
| Paper | non-conflicted papers if she is in the PC |
| Paper | papers which she has co-authored |
| Paper | the accepted status of a paper either if she is the chair, or if she is a co-author and the phase is final |
| Paper | the number of submitted and accepted papers in the final phase |

Table 2: Policies for the conference management system

We modify an existing conference management system built in Jacqueline [52] using Django, which has been deployed to manage the paper reviewing process for an academic workshop (PLOOC 2014), to work without information flow tracking, and migrate it to Estrela. This modified application contains about 4 kLOC, retaining the features of the existing system like creating users and conferences, adding papers and roles for users, etc., while removing its dependence on the Jeeves [53] library and the functionality to handle faceted values; the database also does not contain additional fields for handling facets. The system supports single-blind submission, handling of conflicts, and submission of reviews and comments by reviewers. Every conference has three phases (submission, review and final), which influence the policies. We enforce the policies shown in Table 2 on different tables in the schema.

Figure 4: Normalized time taken to access all papers and users with Django as the baseline. (Plot: normalized access time; Django vs. Estrela.)
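The Paper policies in Table 2 combine a pre-eval filter with a post-eval rewrite of the accepted status. One possible reading in plain Python follows. The Member record, the dictionary fields (authors, conflicts, accepted), and the choice to report a hidden status as None are assumptions made for illustration, and the aggregate policy in the last row of Table 2 is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Member:
    uid: int
    is_chair: bool = False
    on_pc: bool = False


def paper_pre(papers, u):
    """Pre-eval: the chair sees every paper, PC members non-conflicted ones,
    and every user the papers she co-authored."""
    if u.is_chair:
        return list(papers)
    return [p for p in papers
            if u.uid in p["authors"] or (u.on_pc and u.uid not in p["conflicts"])]


def paper_post(papers, u, phase):
    """Post-eval: reveal the accepted status only to the chair, or to a
    co-author once the conference is in the final phase."""
    out = []
    for p in papers:
        may_see = u.is_chair or (phase == "final" and u.uid in p["authors"])
        out.append({**p, "accepted": p["accepted"] if may_see else None})
    return out


papers = [{"id": 1, "authors": {10, 11}, "conflicts": {20}, "accepted": True},
          {"id": 2, "authors": {12}, "conflicts": set(), "accepted": False}]
chair, pc, author = Member(1, is_chair=True), Member(20, on_pc=True), Member(10)
```

Splitting the policy this way mirrors the paper's division of labor: the pre-eval part decides which rows the query may return, and the post-eval part decides how much of a returned row is released in the current phase.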

We created a dummy conference with 1000 users (25 of them on the PC) and 1000 papers submitted by randomly chosen authors with 2 co-authors each. The policy for the User table shown in Table 2 is a pre-eval policy, while Paper has a mix of both. Figure 4 shows the normalized time taken to access all papers and users in the system with Django as the baseline. When accessing the list of users, Estrela incurs an overhead of about 0.[…] […]5% as compared to Django when accessing the list of papers, due to the number of policies (both pre-eval and post-eval) associated with the Paper table.

As a microbenchmark to evaluate the performance of Estrela when enforcing the different kinds of policies shown as examples in Section 2, we implement an intranet website in Estrela that handles employees' personal and official information. The employees of the company can access their profiles, payroll information and their friends' profiles, and can schedule events or meetings within the company, as described earlier in Section 2, extended with an additional Friends table that contains a list of friends in the organization. A brief description of the policies that we enforce is shown in Table 3. The policy-compliant queries executed on the database through Django and Estrela are the same: as policies in Estrela use the same high-level functions that are used for the inline checks in Django, the modified query generated in Estrela is the same as the one generated through Django using inline checks.

For the intranet example, we added employee details for 20,000 users to the database, each having at least 5 friends. Additionally, we added 100,000 events to test the scenarios involving events (Q5 and Q6). The policies for Q4 and Q6 are post-eval policies, while those for the other examples are pre-eval. We measure the time taken to access employee and event information for the six examples shown in Table 3. The normalized execution time for the examples with policies enforced in Django and in Estrela is shown in Figure 5, with Django being the baseline. The time includes the time taken to send the request, apply the policy, execute the query, and send back the response to the client. The overhead for policy enforcement in Estrela ranges from about 0% when accessing employee details to 1% when accessing a set of events at a particular location. The additional overhead for Estrela is due to the checks to select the correct policies to apply.

| Query | Policy: employee can access |
|---|---|
| Q1 | only her friends' ages |
| Q2 | all employee details if she is in the HR department; if not, only her own details |
| Q3 | the average salary of employees who are not managers, unless she is a manager |
| Q4 | the address of her friends, but sees only the city name for other employees |
| Q5 | only those events to which she is invited |
| Q6 | events at a particular location, but sees “Private event” for events to which she is not invited |

Table 3: Microbenchmark: policies enforced for the intranet website

Figure 5: Normalized execution time for intranet examples with Django as the baseline. (Plot: normalized execution time per intranet example; Django vs. Estrela.)
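Q4 and Q6 are the post-eval policies in Table 3. A sketch of each as a function over already-evaluated rows follows. The row layouts (an address dictionary with a city key; an event with a title and a set of invitees), the friends set passed in by the caller, and the decision to keep only the id and location of private events are hypothetical.

```python
def q4_post(employees, friends):
    """Q4: full address for friends; only the city name for everyone else."""
    return [dict(e) if e["id"] in friends else {**e, "address": e["address"]["city"]}
            for e in employees]


def q6_post(events, me):
    """Q6: events at a location; events she is not invited to become 'Private event'."""
    return [dict(e) if me in e["invitees"]
            else {"id": e["id"], "location": e["location"], "title": "Private event"}
            for e in events]


employees = [{"id": 1, "address": {"street": "1 Main St", "city": "Pittsburgh"}},
             {"id": 2, "address": {"street": "2 Elm St", "city": "Boston"}}]
events = [{"id": 1, "location": "Room A", "title": "Budget review", "invitees": {1, 2}},
          {"id": 2, "location": "Room A", "title": "Offsite planning", "invitees": {3}}]
```

Both functions release something for every row the query returns, which is what distinguishes them from a pre-eval filter: the user learns that a private event or a non-friend's record exists, but only the coarse value the policy allows.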

Throughput.

To examine Estrela's impact on throughput, we measured the number of requests handled per second when issuing the queries Q1–Q6 described above. We used ApacheBench version 2.3 to measure the throughput with a concurrency level of 10. Each run issued 500 requests. Figure 6 shows the resulting degradation in throughput, which was around 1% in the worst case.

We show in Figure 7 how the complexity of a policy affects the overhead of Estrela. We create an application in Estrela with a couple of tables, each having 100 columns and 100,000 rows, and test the performance for different numbers of columns used in the policy. We ensure that all conditions in the policy are evaluated to get worst-case results, and evaluate it for both Django, with the policy in the API code, and Estrela. The overhead that Estrela adds is almost constant, and does not increase as the complexity of the policy increases.

Figure 6: Throughput of Estrela and Django for the intranet application with a concurrency level of 10. (Plot: throughput in req./sec per intranet example; Django vs. Estrela.)

Figure 7: Time taken for executing a query subject to a pre-eval policy with different numbers of columns. (Plot: time to execute the query in seconds vs. number of columns in the policy; Django vs. Estrela.)

We describe some of the works most closely related to Estrela on policy enforcement, most of which deal with enforcing policies on the server side of applications that communicate with a database to retrieve and store sensitive data.

Qapla [27] is a framework that provides fine-grained access control in database-backed applications, where the policies are specified as SQL WHERE clauses that define what information users are allowed to access. Qapla's enforcement engine modifies the low-level SQL queries made to the database by adding the policies as sub-queries. However, it does not support the specification of default values that reveal the existence of a sensitive value, and, apart from aggregation, it does not provide the flexibility of applying the release policies offered by the post-eval policies in Estrela. Extending Qapla to support post-eval policies is non-trivial because such policies cannot be specified as SQL WHERE clauses, requiring an additional policy specification mechanism. Moreover, integrating Qapla with existing applications might require modifications to the application code to query for the correct column transformations, without which it returns a more restrictive set of results.

Hails [18] is an MPVC web framework that adds policies declaratively alongside database schemas and tags every piece of data in the system with a label. These labels are carried around as the data flows in the system and checked by a trusted runtime when the data leaves the system. The focus is to control the flow of data to untrusted third-party applications by building applications using the Hails web platform. Similarly, Jacqueline [52] is a framework to track information flow in applications dynamically using the policy-agnostic design paradigm. Jacqueline relies on a modified database that stores multiple views of data based on who is allowed to access what. It additionally allows specification of default values, but does not support policies that are linked to a set of sensitive fields in the database or policies involving data aggregates. While Estrela primarily enforces server-side policies without any client-side information flow tracking, it supports contextual policies, unlike Hails and Jacqueline, without requiring modifications to the database. Estrela can be integrated with these frameworks for information flow tracking at the client side to ensure that sensitive data does not flow to unauthorized parties.

FlowWatcher [30] is a system that enforces information flow policies within a web proxy without requiring modifications to the application. However, it is difficult to enforce fine-grained policies, like the ones Estrela supports, in that system. LWeb [33] is another system that provides information flow control for web applications developed in Haskell, which supports expressive policies albeit with moderate overheads. LWeb, however, does not support contextual policies and requires building all applications from scratch. Similarly, Daisy [21] provides flow tracking in databases, supporting features like triggers and dynamic policies. However, its monitor supports only policies that can be expressed in the SQL access control model.
SELinks [10] allows both server-side and database-side enforcement of policies by compiling server-side functions to user-defined functions that can run on the database for compliance. It does not support contextual policy compliance and is limited to controlling data disclosure.

Sen et al. [42] propose the Legalease language for stating policies using Deny and Allow clauses, and enforce it on big data systems. The language is simple, allowing easy policy specification by developers, and the system does partial information flow tracking to catch policy violations, but it can express only a smaller subset of policies than Estrela. For instance, Legalease cannot return parts of sensitive information unless they are stored explicitly in the database and queried for explicitly by the application.

Ur/Web [8] is a domain-specific language for programming web applications, which includes a static information flow analysis called UrFlow [7]. Policies are expressed in the form of SQL queries and can express what information a particular user can learn, but the analysis might over-approximate, being static in nature. DataLawyer [46] is a system to analyze data usage policies that checks these policies at runtime when a query is made to the database. The policies are specified in a formal language based on SQL. It allows quite expressive policies but checks all policies whenever the database is queried. Estrela, in contrast, selects the right set of policies based on the fields used in the query, making it more efficient. CLAMP [34] protects against sensitive data leakage from web servers by isolating user sessions and instantiating a new virtual web server instance for every user session. The queries are restricted to data accessible by the current user; however, the granularity of policies is limited to per-table.

Byun and Li [6] and later Kabir et al. [16] proposed purpose-based access control to enforce purpose-based policies like the ones related to HIPAA shown in the paper. While purpose-based access control takes contextual purpose into account, the existing approaches rely on the user's trustworthiness to specify the purpose of access. In contrast, our work uses path-specific information to determine the purpose dynamically.

SIF [9] is a framework for developing web applications that respect some confidentiality policies. The framework is built on top of Jif [31], an extension of Java with information flow control, and enforces information flow control in Java Servlets. Besides being language-specific, SIF also incurs moderate overheads because of the analysis. SeLINQ [38] is another information flow control system to enforce policies across database boundaries that modifies a subset of F

We present Estrela, a framework that ensures policy compliance in database-backed applications by supporting the specification and enforcement of contextual and granular policies. The policies are specified separately from the application code, alongside the database schema, without requiring any modifications to the existing code or the database. We prototyped Estrela in Python on top of Django to specify and enforce API-specific policies. Estrela supports easy migration of legacy applications for policy compliance. We show the applicability of Estrela by building or migrating four applications on top of it, and show that it incurs low overheads while enforcing expressive policies.

ACKNOWLEDGMENTS

We would like to thank our shepherd, Qi Li, and the anonymous reviewers for their insightful comments and feedback. The work was supported in part by the Center for Machine Learning and Health (award no. PO0006063506) at Carnegie Mellon University.

REFERENCES

[1] Lujo Bauer, Scott Garriss, and Michael K. Reiter. 2011. Detecting and Resolving Policy Misconfigurations in Access-Control Systems. ACM Trans. Inf. Syst. Secur. 14, 1, Article 2 (June 2011), 28 pages.

[2] A. Blankstein and M. J. Freedman. 2014. Automating Isolation and Least Privilege in Web Services. In

Proceedings of the 24th ACM Symposium on Access Control Models and Technologies (SACMAT ’19). Association for Computing Machinery, New York, NY, USA, 161–172.

[5] Thang Bui, Scott D. Stoller, and Jiajie Li. 2017. Mining Relationship-Based Access Control Policies. In Proceedings of the 22nd ACM on Symposium on Access Control Models and Technologies (SACMAT ’17). Association for Computing Machinery, New York, NY, USA, 239–246.

[6] J. Byun and N. Li. 2008. Purpose Based Access Control for Privacy Protection in Relational Database Systems. The VLDB Journal 17, 4 (July 2008), 603–619.

[7] A. Chlipala. 2010. Static Checking of Dynamically-varying Security Policies in Database-backed Applications. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI ’10). 105–118.

[8] A. Chlipala. 2015. Ur/Web: A Simple Model for Programming the Web. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’15). 153–165.

[9] S. Chong, K. Vikram, and A. Myers. 2007. SIF: Enforcing Confidentiality and Integrity in Web Applications. In Proceedings of the 16th USENIX Security Symposium. Article 1, 16 pages.

[10] B. Corcoran, N. Swamy, and M. Hicks. 2009. Cross-tier, Label-based Security Enforcement for Web Applications. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD ’09). 269–282.

[11] C. Cotrini, T. Weghorn, and D. Basin. 2018. Mining ABAC Rules from Sparse Logs. In . 31–46.

[12] M. Dalton, C. Kozyrakis, and N. Zeldovich. 2009. Nemesis: Preventing Authentication & Access Control Vulnerabilities in Web Applications. In Proceedings of the 18th Conference on USENIX Security Symposium.

Expert Syst. Appl.

Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). USENIX, Hollywood, CA, 47–60.

[19] Google Privacy Policy 2019. https://policies.google.com/privacy.

[20] J. De Groot. 2019. The History of Data Breaches. https://digitalguardian.com/blog/history-data-breaches.

[21] M. Guarnieri, M. Balliu, D. Schoepe, D. Basin, and A. Sabelfeld. 2019. Information-Flow Control for Database-Backed Applications. In Proceedings of the 4th IEEE European Symposium on Security and Privacy (EuroS&P 2019). 79–94.

[22] Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining Frequent Patterns without Candidate Generation. SIGMOD Rec. 29, 2 (May 2000), 1–12.

[23] Patricia Huey. 2014. Oracle Database Security Guide 11g Release 1 (11.1). https://docs.oracle.com/cd/B28359_01/network.111/b28531/title.htm.

[24] Padmavathi Iyer and Amirreza Masoumzadeh. 2018. Mining Positive and Negative Attribute-Based Access Control Policy Rules. In Proceedings of the 23rd ACM on Symposium on Access Control Models and Technologies (SACMAT ’18). Association for Computing Machinery, New York, NY, USA, 161–172.

[25] Jiang JX and Bai G. 2019. Evaluation of Causes of Protected Health Information Breaches. JAMA Intern Med.

Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (VLDB ’04). VLDB Endowment, 108–119.

[27] A. Mehta, E. Elnikety, K. Harvey, D. Garg, and P. Druschel. 2017. Qapla: Policy compliance for database-backed systems. In . Vancouver, BC, 1463–1479.

[28] Microsoft Privacy Statement 2019. https://privacy.microsoft.com/en-us/privacystatement.

[29] Decebal Mocanu, Fatih Turkmen, and Antonio Liotta. 2015. Towards ABAC Policy Mining from Logs with Deep Learning. In Proc. of the 18th International Multiconference, IS 2015, Intelligent Systems.

[30] D. Muthukumaran, D. O’Keeffe, C. Priebe, D. Eyers, B. Shand, and P. Pietzuch. 2015. FlowWatcher: Defending Against Data Disclosure Vulnerabilities in Web Applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15). 603–615.

[31] A. Myers. 1999. JFlow: Practical Mostly-static Information Flow Control. In Proceedings of the 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’99).

Proc. ACM Program. Lang. 3, POPL, Article 75 (Jan. 2019), 75:1–75:30 pages.

[34] B. Parno, J. McCune, D. Wendlandt, D. Andersen, and A. Perrig. 2009. CLAMP: Practical Prevention of Large-Scale Data Leaks. In Proceedings of the 2009 30th IEEE Symposium on Security and Privacy (SP ’09). 154–169.

[35] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy. 2004. Extending Query Rewriting Techniques for Fine-grained Access Control. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD). 551–562.

[36] B. Roberts. 2017. Open source social network built with Django and Django REST framework. http://vataxia.net/.

[37] Matthew W Sanders and Chuan Yue. 2019. Mining Least Privilege Attribute Based Access Control Policies. In Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC ’19). Association for Computing Machinery, New York, NY, USA, 404–416.

[38] D. Schoepe, D. Hedin, and A. Sabelfeld. 2014. SeLINQ: Tracking Information Across Application-database Boundaries. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP ’14). 25–38.

[39] D. Schultz and B. Liskov. 2013. IFDB: Decentralized Information Flow Control for Databases. In Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys ’13).

Proceedings of the 2014 IEEE Symposium on Security and Privacy (SP ’14). 327–342.

[43] Spirit - Django based forum software 2019. https://spirit-project.com/.

[44] M. Stonebraker and E. Wong. 1974. Access Control in a Relational Data Base Management System by Query Modification. In Proceedings of the 1974 Annual Conference - Volume 1 (ACM ’74). 180–186.

[45] European Union. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. Official Journal of the European Union L119 (4 May 2016), 1–88.

[46] P. Upadhyaya, M. Balazinska, and D. Suciu. 2015. Automatic Enforcement of Data Use Policies with DataLawyer. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD ’15). 213–225.

[47] Q. Wang, T. Yu, N. Li, J. Lobo, E. Bertino, K. Irwin, and J. Byun. 2007. On the Correctness Criteria of Fine-grained Access Control in Relational Databases. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB ’07). VLDB Endowment, 555–566.

[48] Zhongyuan Xu and Scott D. Stoller. 2012. Algorithms for Mining Meaningful Roles. In Proceedings of the 17th ACM Symposium on Access Control Models and Technologies (SACMAT). ACM Press, 57–66.

[49] Zhongyuan Xu and Scott D. Stoller. 2013. Mining Parameterized Role-Based Policies. In Proceedings of the Third ACM Conference on Data and Application Security and Privacy (CODASPY ’13). Association for Computing Machinery, New York, NY, USA, 255–266.

[50] Zhongyuan Xu and Scott D. Stoller. 2014. Mining Attribute-Based Access Control Policies from Logs. In Proceedings of the 28th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy XXVIII - Volume 8566 (DBSec 2014). Springer-Verlag, Berlin, Heidelberg, 276–291.

[51] Z. Xu and S. D. Stoller. 2015. Mining Attribute-Based Access Control Policies. IEEE Transactions on Dependable and Secure Computing 12, 5 (Sep. 2015), 533–545.

[52] J. Yang, T. Hance, T. Austin, A. Solar-Lezama, C. Flanagan, and S. Chong. 2016. Precise, Dynamic Information Flow for Database-backed Applications. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). 631–647.

[53] J. Yang, K. Yessenov, and A. Solar-Lezama. 2012. A Language for Automatically Enforcing Privacy Policies. In Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’12). 85–96.

[54] A. Yip, X. Wang, N. Zeldovich, and M. F. Kaashoek. 2009. Improving Application Security with Data Flow Assertions. In