DataProVe: A Data Protection Policy and System Architecture Verification Tool

Abstract

In this paper, we propose a tool, called DataProVe, for specifying high-level data protection policies and system architectures, as well as verifying the conformance between them in a fully automated way. The syntax of the policies and the architectures is based on semi-formal languages, and the automated verification engine relies on logic and resolution based proofs. The functionality and operation of the tool are presented using different examples.


Vinh Thong Ta
Laboratory of Security and Forensic Research in Computing (SaFeR)
University of Central Lancashire (UCLan)
Preston, [email protected] 2, 2020


Under the General Data Protection Regulation (GDPR) [1], personal data is defined as "any information relating to an identified or identifiable natural person" (in the US, personally identifiable information is used with a similar interpretation [2]). The GDPR specifies the rights of living individuals whose personal data is processed, and it imposes responsibilities on the data controllers and data processors who store, process or transmit such data. Despite data protection laws, several data breach incidents have occurred, both in the past (e.g. [3–5]) and more recently, such as Facebook's Cambridge Analytica scandal [6], where the personal data of more than 87 million Facebook users was collected and used for advertising and election campaign purposes without clear consent for data usage. One of the main problems was Facebook's insufficient checking of third-party applications. Google has also faced a lawsuit over collecting personal data without permission, and has been reported to have illegally gathered the personal data of millions of iPhone users in the UK [7].

The GDPR took effect in May 2018, so designing compliant data protection policies and system architectures has become even more important for organizations that want to avoid penalties. Data protection by design, under Article 25 of the GDPR [8], requires data protection measures to be built into the development of service providers' business processes. The regulation also restricts businesses from profiling users and requires appropriate consent before personal data is collected (Article 6 of the GDPR [9]).

Unfortunately, in textual form, data protection laws are sometimes ambiguous and can be misinterpreted by policy and system designers. From the technical perspective, to the best of our knowledge, only a small number of studies in the literature investigate formal or automated methods for designing and verifying policies and architectures in the context of data protection and privacy. The main advantage of using formal approaches during system design is that data protection properties can be proved mathematically, and design flaws can be detected at an early stage, which saves time and money.

On the other hand, using formal methods for this purpose is also challenging, as it requires abstraction, which is difficult for complex laws. In this paper, we address this problem and model some simple data protection requirements of the GDPR with regard to the data collection, usage, storage, deletion, and transfer phases. We also consider privacy requirements, such as the right to have certain data and to link certain data types. We focus on the policy and architecture levels, and propose variants of a policy language and an architecture language specifically designed for specifying and verifying data protection and privacy requirements. In addition, we propose a fully automated algorithm for verifying three types of conformance relations between a policy and an architecture specified in our languages. Our theoretical methods are implemented, for demonstration purposes, in a software tool called DataProVe.

The main goals of our policy and architecture languages and software tool include helping system designers at the higher level of specification (whereas other tools mainly focus on the protocol level), such as policy and architecture design, to spot potential errors prior to the concrete, lower-level system specification. Our tool can also be used for education or research purposes. To the best of our knowledge, this is the first work that addresses the problem of fully automated conformance checking between a policy and an architecture in the context of data protection and privacy requirements.

This paper makes the following contributions:
1. We propose a variant of a privacy policy language (in Section 3).
2. We propose a variant of a privacy architecture language (in Section 4).
3. We define three conformance relations between a policy and an architecture (in Section 5), namely, the privacy, data protection, and functional conformance relations.
4. We propose a logic-based, fully automated conformance verification procedure (in Section 6) for these three conformance relations.
5. Finally, we propose a (prototype) tool, called DataProVe, based on these theoretical foundations (in Section 8).

[Figure 1 diagram: a policy (proposed privacy policy language variant) with Collection, Usage, Storage, Deletion, Transfer, Possession and Link sub-policies, each yielding an initial goal; a system architecture (proposed privacy architecture language) providing architectural elements; the proposed logic inference rule set; and the proposed logic-based proof algorithm, whose resolution steps reduce (sub-)goals until "A proof found" or "Proof failed".]

Figure 1: An overview and the intuition behind the contributions of this paper.

In Figure 1, a policy is specified using our proposed language variant, which covers seven sub-policies (data collection, usage, storage, deletion, transfer, possession and link). Each sub-policy is mapped to a logic goal that reflects the requirement in the policy. The verification engine attempts to prove each goal based on a set of logic inference rules and architectural elements (specified in our language). The proposed verification algorithm is based on a series of resolution steps, represented as a derivation tree, where the root is the goal to be proved and the leaves are the architectural elements used to prove it.

The paper is structured as follows: In Section 2, we discuss related policy and architecture languages. In Sections 3 and 4, we present our policy and architecture languages, respectively. The automated conformance verification engine is detailed in Section 6. In Section 8, we present the DataProVe tool and its operation using two simple examples. Finally, we discuss the results and conclude the paper in Sections 7 and 9.
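The resolution-based search pictured in Figure 1 can be illustrated with a small backward-chaining prover over propositional Horn clauses. This is only a sketch under simplifying assumptions: DataProVe's actual inference rules work on richer terms and are not reproduced here, so the rule and fact names (`receive(sp,name)`, `own_storage(sp)`, and so on) and the helpers `prove` and `show` are hypothetical. Facts stand in for architectural elements, and each successful proof is returned as a derivation tree whose root is the goal and whose leaves are facts.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

# A Horn rule "head <- body_1, ..., body_k"; facts play the role of architectural elements.
Rule = Tuple[str, Tuple[str, ...]]


@dataclass
class Node:
    """A derivation-tree node: a (sub-)goal and the sub-derivations that resolve it."""
    goal: str
    children: List["Node"] = field(default_factory=list)


def prove(goal: str, facts: Set[str], rules: Sequence[Rule],
          seen: FrozenSet[str] = frozenset()) -> Optional[Node]:
    """Backward chaining: return a derivation tree for `goal`, or None if no proof exists."""
    if goal in facts:                  # leaf: an architectural element
        return Node(goal)
    if goal in seen:                   # stop on cyclic rule chains
        return None
    for head, body in rules:
        if head != goal:
            continue
        children = []
        for sub in body:               # each body atom becomes a new sub-goal
            sub_proof = prove(sub, facts, rules, seen | {goal})
            if sub_proof is None:
                break                  # this rule fails; try the next one
            children.append(sub_proof)
        else:
            return Node(goal, children)
    return None                        # proof failed


def show(node: Node, depth: int = 0) -> List[str]:
    """Render a derivation tree as indented lines, root first."""
    lines = ["  " * depth + node.goal]
    for child in node.children:
        lines += show(child, depth + 1)
    return lines


# Hypothetical rules and architectural elements.
rules = [
    ("has(sp,name)", ("receive(sp,name)",)),
    ("can_store(sp,name)", ("has(sp,name)", "own_storage(sp)")),
]
facts = {"receive(sp,name)", "own_storage(sp)"}

for line in show(prove("can_store(sp,name)", facts, rules)):
    print(line)
```

Removing `receive(sp,name)` from `facts` makes `prove` return `None`, which corresponds to a failed proof for that goal.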

The Platform for Privacy Preferences (P3P) [10] enables web users to gain control over their private information on online services. Websites can express their privacy practices in a standard format that can be retrieved automatically and interpreted by web client applications. Users are notified about a website's privacy policies and have a chance to make decisions based on them. To match the privacy preferences of users against web services, the authors proposed A P3P Preference Exchange Language (APPEL) [11], integrated into web clients, with which users can express privacy preferences that are matched against the practices set by online services. According to the study [12], APPEL only lets users specify what is unacceptable in a policy. Having identified this, the authors of [12] proposed a more expressive preference language called XPref, which gives users more freedom, such as allowing them to express acceptable preferences.

The Customer Profile Exchange (CPExchange) language [13] is an XML-based policy language designed to facilitate business-to-business communication of privacy policies (i.e. the privacy-enabled global exchange of customer profile information). The eXtensible Access Control Markup Language (XACML) [14] is a de facto, XML-based policy language specifically designed for access control management in distributed systems. The latest version was approved by the OASIS standards organization as an international standard in 2017. The Enterprise Privacy Authorization Language (EPAL) of IBM [15] was designed to regulate an organisation's internal privacy policies. EPAL is partly similar to XACML; however, it focuses mainly on privacy policies rather than on the access control policies of XACML.

A-PPL [16] is an accountability policy language specifically designed for modelling data accountability (such as data retention, logging and notification) in the cloud.
A-PPL is an extension of the PrimeLife Privacy Policy Language (PPL) [17], which enables the specification of access and usage control rules for data subjects and data controllers. PPL is built upon XACML and allows users to define so-called sticky policies on personal data based on obligations. Obligations determine whether the policy language can trigger tasks that must be performed by a server and a client once some event occurs and the related condition is fulfilled. This is also referred to as the Event-Condition-Action paradigm. The Policy Description Language (PDL) [18], proposed by Bell Labs, is one of the first policy-based management languages, specifically for network administration. It is declarative and, like PPL, is based on the Event-Condition-Action paradigm. RBAC (Role-Based Access Control) [19] is one of the most well-known role-based access control policy languages. It uses roles and permissions in the enforced policies: a subject can be assigned roles, and roles can be assigned certain access control permissions. ASL (Authorization Specification Language) [20] is another role-based access control language, based on first-order logic and RBAC components. Ponder [21] is a declarative, object-oriented policy language designed for defining and modelling security policies using RBAC, and security management policies for distributed systems. Its policies are defined on roles or groups of roles. Rei [22] is a policy language based on deontic logic, designed mainly for modelling security and privacy properties of pervasive computing environments. Its syntax involves obligation and permission, where policies are defined as constraints over permitted and obligated actions on resources.

2.2 Architecture Description Languages (ADLs)

Research on the formal specification of architectures can be categorised into two groups of languages, for software and hardware architectures, respectively. Darwin [23], one of the first architecture languages, defines the interaction of components through bindings. Bindings associate the services required by a component with the services provided by others. Its semantics is based on the π-calculus [24], a process algebra that makes Darwin capable of modelling dynamic architectures. In Wright [25], components are associated via connector elements instead of bindings. Its semantics is defined in another process algebra, CSP [26], with architecture-specific port processes, which specify the external behaviour of a component, and the spec process, which specifies its internal behaviour.

Similar to Darwin, Rapide [27] defines connections between the required-service and provided-service "ports" of components. Like Wright, Rapide also supports connectors, but in a more limited way (e.g. there are no first-class connector elements); hence, the user can only specify explicit links between the required and provided services. Unlike Wright, Rapide also defines the actions in and out for asynchronous communication. The semantics of Rapide is based on the event pattern language [27] and is defined as a partially ordered set of events. Among the more recent ADLs, SOFA [28] also defines connectors, which the user can specify based on four types of communication: procedure call, messaging, streaming, and blackboard. The semantics of SOFA is based on Behaviour Protocols [29], a simplified version of CSP.

AADL [30], one of the most broadly used ADLs, is specifically designed for embedded systems. AADL defines three groups of components: one for software architectures (including thread, process, and subprogram), a second for hardware architectures (such as processor and memory), and a last group for specifying composite types.
In AADL, ports and subprogram calls are used to define the interaction between components. PRISMA [31], another recent ADL, was designed to address aspect-oriented software engineering. Similar to Wright, PRISMA defines first-class connector elements, which are specified with a set of roles (i.e. components), and the behaviour of the roles is defined by aspects. The semantics of PRISMA is defined with modal logic and the π-calculus. A recent attempt at architecture specification towards automation is proposed in the project called CONNECT [32]. The semantics of this ADL is based on the FSP (finite state process) algebra [33], which allows automated and stochastic analyses of architectures. Finally, UML has also been used to specify architectures in practice; however, it is more high-level and lacks formal semantics. We note that none of these ADLs supports the specification of data protection and privacy properties.

The main difference between the policy languages above and our work is that, for instance, P3P, APPEL, XPref and even PPL are mainly designed for web applications/services, and their policies are defined in an XML-based language with restricted options for users, while ours is designed for any type of service. In addition, our policy language variant is defined on data types (a data-type-centred language) and supports a more systematic and fine-grained policy specification, as its syntax and semantics cover seven sub-policies capturing a representative data life-cycle (from the point the data is collected until its deletion). Our language variant is inspired by those proposed in [34, 35] for biometric surveillance systems and log design. We modified and extended them to specify different data protection requirements.

Unlike the ADLs above, our architecture language variant is designed to capture data protection and privacy properties, and it also supports cryptographic primitives.
Our language is data-type centred, and its semantics does not rely on a process algebra like most of the above-mentioned ADLs; instead, it is based on the state of all the defined data types in a system. This concept was applied in some of our previous works, such as [36, 37]. The language variants in [36, 37] mainly focus on the computation and integrity verification of data based on trust relations. Unlike [36, 37], the language variant in this paper focuses primarily on data protection and privacy properties, rather than on data integrity.

Finally, to the best of our knowledge, this is the first work that studies and proposes a fully automated conformance check between the policy and architecture levels. Our verification engine is based on the syntax of our policy and architecture language variants, and on logic resolution-based proofs.

A policy is defined from the perspective of a data controller. Here, we assume that the data controllers are service providers who collect, store, use or transfer the personal data of the data subjects. The data subjects in our case are system users whose personal data is, or will be, collected and used by the data controller.

A policy of a service provider, $sp$, is defined on a finite set of different entities $\mathit{EntitySet}^{sp}_{pol} = \{E_{i_1}, \ldots, E_{i_n}\}$ and a finite set of data types $\mathit{DataTypes}^{sp}_{pol} = \{\theta_1, \ldots, \theta_m\}$ supported by the service. The entities represent any data subject, data controller, organisation, or hardware/software component.

Definition 1 (Data Protection Policy). The syntax of data protection policies is defined as the collection of seven sub-policies on a given data type, namely:

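$$\mathit{POL}_{\mathit{DataTypes}^{sp}_{pol}} = \mathit{Pol}_{Col} \times \mathit{Pol}_{Use} \times \mathit{Pol}_{Str} \times \mathit{Pol}_{Del} \times \mathit{Pol}_{Fw} \times \mathit{Pol}_{Has} \times \mathit{Pol}_{Link},$$

where:
1. $\mathit{Pol}_{Col} = \mathit{Cons}_{col} \times \mathit{CPurp}$ (Data Collection Sub-policy)
2. $\mathit{Pol}_{Use} = \mathit{Cons}_{use} \times \mathit{UPurp}$ (Data Usage Sub-policy)
3. $\mathit{Pol}_{Str} = \mathit{Cons}_{str} \times \mathit{Where}_{str}$ (Data Storage Sub-policy)
4. $\mathit{Pol}_{Del} = \mathit{FromWhere}_{del} \times \mathit{Ret}_{delay}$ (Data Retention Sub-policy)
5. $\mathit{Pol}_{Fw} = \mathit{Cons}_{fw} \times \mathit{List}_{to} \times \mathit{FwPurp}$ (Data Transfer Sub-policy)
6. $\mathit{Pol}_{Has} = \mathit{Who}_{canhave}$ (Data Possession Sub-policy)
7. $\mathit{Pol}_{Link} = \mathit{Who}_{canlink}$ (Data Connection Sub-policy)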
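To make the shape of Definition 1 concrete, the seven-component product can be mirrored by plain Python records. This is a minimal sketch, not DataProVe's internal representation: the class and field names are hypothetical, consent is modelled as a boolean (Y/N), a purpose $act_i : \theta_i$ as a string pair, and a deletion delay of `None` stands in for the non-specific value tt.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Purpose = Tuple[str, str]   # (act_i, theta_i): use the data for act_i to obtain data of type theta_i


@dataclass(frozen=True)
class Collection:            # Pol_Col = Cons_col x CPurp
    cons: bool
    cpurp: FrozenSet[Purpose]


@dataclass(frozen=True)
class Usage:                 # Pol_Use = Cons_use x UPurp
    cons: bool
    upurp: FrozenSet[Purpose]


@dataclass(frozen=True)
class Storage:               # Pol_Str = Cons_str x Where_str
    cons: bool
    where: FrozenSet[str]


@dataclass(frozen=True)
class Deletion:              # Pol_Del = FromWhere_del x Ret_delay
    fromwhere: FrozenSet[str]
    deld: Optional[float]    # None models the non-specific delay tt (assumption)


@dataclass(frozen=True)
class Transfer:              # Pol_Fw = Cons_fw x List_to x FwPurp
    cons: bool
    fwto: FrozenSet[str]
    fwpurp: FrozenSet[Purpose]


@dataclass(frozen=True)
class DataTypePolicy:        # pi_theta = (pi_col, pi_use, pi_str, pi_del, pi_fw, pi_has, pi_link)
    col: Collection
    use: Usage
    store: Storage
    delete: Deletion
    fw: Transfer
    has: FrozenSet[str]                 # Pol_Has = Who_canhave
    link: FrozenSet[Tuple[str, str]]    # Pol_Link = Who_canlink, pairs (E_i, theta_i)


# A hypothetical policy PL covering a single data type, "name".
PL: Dict[str, DataTypePolicy] = {
    "name": DataTypePolicy(
        col=Collection(cons=True, cpurp=frozenset({("create", "account")})),
        use=Usage(cons=True, upurp=frozenset({("create", "account")})),
        store=Storage(cons=False, where=frozenset({"mainstorage", "backupstorage"})),
        delete=Deletion(fromwhere=frozenset({"mainstorage", "backupstorage"}), deld=None),
        fw=Transfer(cons=True, fwto=frozenset(), fwpurp=frozenset()),
        has=frozenset({"sp"}),
        link=frozenset(),
    )
}
```

Frozen dataclasses and frozensets keep each $\pi_\theta$ immutable, which matches its reading as an element of a Cartesian product; a whole policy $PL$ is then simply a mapping from data types to such records.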

1. The data collection sub-policy specifies whether a collection consent is required ($\mathit{Cons}_{col}$) and a set of collection purposes ($\mathit{CPurp}$). These aim at capturing the consent and purpose limitation requirements in Article 6 [9] and Article 5(1)(b) [38] of the GDPR.
2. The data usage sub-policy specifies whether a usage consent is required ($\mathit{Cons}_{use}$) for using a type of data, and the purposes of the data usage ($\mathit{UPurp}$). Again, these capture Article 6 [9] and Article 30(1)(b) [39] of the GDPR, respectively.
3. The data storage sub-policy specifies whether a storage consent is required ($\mathit{Cons}_{str}$) for storing a type of data, and where the data can be stored ($\mathit{Where}_{str}$). These partly capture the storage limitation principle in Article 5(1)(e) [38] of the GDPR.
4. The data deletion sub-policy specifies from where the data can be deleted ($\mathit{FromWhere}_{del}$), alongside the corresponding retention period ($\mathit{Ret}_{delay}$). These partly capture Article 5(1)(e) and Article 17(1)(a) [40] of the GDPR.

Notation | Meaning
$\pi_\theta$ | A policy on a data type $\theta$ ($\pi_\theta = (\pi_{col}, \pi_{use}, \pi_{str}, \pi_{del}, \pi_{fw}, \pi_{has}, \pi_{link})$).
$\pi_{col}$ | A data collection sub-policy.
$\pi_{use}$ | A usage sub-policy.
$\pi_{str}$ | A storage sub-policy.
$\pi_{del}$ | A retention sub-policy.
$\pi_{fw}$ | A transfer sub-policy.
$\pi_{has}$ | A data possession sub-policy.
$\pi_{link}$ | A data connection sub-policy.
$\pi_\theta.\pi_\ast$ | A sub-policy $\pi_\ast$ of $\pi_\theta$, where $\ast \in \{col, use, str, del, fw, has, link\}$.
$\pi_\ast.arg$ | An argument of a sub-policy $\pi_\ast$, $\ast \in \{col, use, str, del, fw, has, link\}$.
$cons$ | Specifies whether a consent is required (Y for Yes, N for No).
$upurp$, $cpurp$, $fwpurp$ | A set of usage, collection, and forward purposes, respectively, where each set is of the form $\{act_1 : \theta_1, \ldots, act_n : \theta_n\}$.
$act_i : \theta_i$ | A purpose (specifies that a piece of data is used for an action $act_i$, and as a result we get a piece of data of type $\theta_i$).
$where$ | A set of places where a piece of data of type $\theta$ can be stored.
$fromwhere$ | A set of places from where a piece of data of type $\theta$ can be deleted.
$deld$ | A deletion delay value.
$fwto$ | A set of entities to which a piece of data can be transferred.
$whocanhave$ | A set of entities that have the right to have a type of data.
$whocanlink$ | A set that records which entity has the right to link which pairs of types of data.

Table 1: The notations used in the policy syntax.

5. The data transfer sub-policy specifies whether a transfer consent is required ($\mathit{Cons}_{fw}$), and all the entities to which the data can be transferred ($\mathit{List}_{to}$), with the purposes in $\mathit{FwPurp}$. These partly capture the requirements for transferring data to third-party organisations in Article 46(1) [41] of the GDPR.
6. The data possession sub-policy determines who has the right to possess this type of data.
7. The data connection sub-policy determines who has the right to link two types of data.

A policy is defined on a data type ($\theta$). Specifically, let $\pi_\theta \in \mathit{POL}_{\mathit{DataTypes}^{sp}_{pol}}$ be a policy defined on a data type $\theta$ and on the seven sub-policies $\pi_{col} \in \mathit{Pol}_{Col}$, $\pi_{use} \in \mathit{Pol}_{Use}$, $\pi_{str} \in \mathit{Pol}_{Str}$, $\pi_{del} \in \mathit{Pol}_{Del}$, $\pi_{fw} \in \mathit{Pol}_{Fw}$, $\pi_{has} \in \mathit{Pol}_{Has}$ and $\pi_{link} \in \mathit{Pol}_{Link}$, where $\pi_\theta = (\pi_{col}, \pi_{use}, \pi_{str}, \pi_{del}, \pi_{fw}, \pi_{has}, \pi_{link})$. Each sub-policy of $\pi_\theta$ is defined as follows:

1. $\pi_{col} = (cons, cpurp)$, with $cons \in \{Y, N\}$. This specifies whether consent must be collected from the data subjects (Y) or not (N) for a data type $\theta$, and $cpurp$ is a set of collection purposes. A purpose has the form $act_i : \theta_i$, which specifies that a piece of data of type $\theta$ is used for an action $act_i$ to get some data of type $\theta_i$ (e.g. name is collected for create : account).
2. $\pi_{use} = (cons, upurp)$, with a usage consent requirement $cons \in \{Y, N\}$ and $upurp$, a set of usage purposes.
3. $\pi_{str} = (cons, where)$, where $where$ is a set of places where a piece of data of type $\theta$ can be stored, for instance, on a client's machine (e.g. denoted by clientPC), at a third-party cloud service, or in the service provider's main or backup storage (denoted by mainstorage and backupstorage).
4. $\pi_{del} = (fromwhere, deld)$, with
   • $fromwhere$, which defines the locations from which a piece of data can be deleted. This strongly depends on the storage locations defined in the storage policy (point 3).
   • $deld$, the delay value for deletion. This value can be either tt, which refers to a "non-specific time", or a specific numerical time value (e.g. 1 day, 10 mins, 5 years).
5. $\pi_{fw} = (cons, fwto, fwpurp)$, where $cons$ captures the requirement for transfer consent, and $fwto$ specifies a set of entities to whom the data can be transferred. Finally, $fwpurp$ is a set of purposes for data transfer.
6. $\pi_{has} = whocanhave$, where $whocanhave$ is the set of entities in the system that have the right to have or possess a piece of data of type $\theta$. If we forbid a given entity from having a given data type, then that entity must not have it (e.g. by intercepting, eavesdropping, or calculating it).
7. $\pi_{link} = whocanlink$, where $whocanlink = \{(E_1, \theta_1), \ldots, (E_k, \theta_k)\}$ is a set of pairs of entities and data types defined in the system. Each pair $(E_i, \theta_i)$ specifies that $E_i$ has the right to link two pieces of data of types $\theta$ and $\theta_i$; for instance, it determines whether a service provider has the right to link a piece of information about someone's disease with their workplace.

Finally, let $\{\theta_1, \ldots, \theta_m\}$ be the set of all data types supported by the service of a provider $sp$. The data protection policy of a service provider $sp$ is defined by the set $PL = \{\pi_{\theta_1}, \ldots, \pi_{\theta_m}\}$.

Notation | Meaning
$(\theta, v)$ | A pair of data type $\theta$ and data value $v$.
$\theta$ | A data type value $\theta$ (not a variable).
$\theta'$ | A data type value that we get as a result of a service_spec_use_event (e.g. createat or calculateat) on a piece of data of type $\theta$ and value $v$.
$t$ | A time value at which an event takes place.
$E_{to}$ | An entity value to whom a piece of data is transferred/forwarded.
$E_{from}$ | An entity value from which a piece of data originates.
$place$ | A place where a piece of data of type $\theta$ and value $v$ is stored. It can be mainstorage, backupstorage, or some other service-specific place.

Table 2: The notations used in the policy semantics.

The semantics of the policy syntax can be defined using events that capture the actions performed by different entities during an instance of a system operation.
An event is defined by a tuple starting with an event name denoting an action performed by an entity, followed by the time of the event and any further parameters required by the action.

Our language includes the following predefined events: cconsentat, collectat, uconsentat, sconsentat, service_spec_use_event, storeat, deleteat, fwconsentat, and forwardat, defined as follows:

Ev1: $(cconsentat, t, E_{from}, \theta)$. This event specifies that a collection consent is collected at time $t$ by the service provider for a piece of data of type $\theta$ from an entity $E_{from}$.
E.g. (cconsentat, 2020.01.21.11:18, client, personalinfo)

Ev2: $(collectat, t, E_{from}, \theta, v)$. This event specifies that a piece of data of type $\theta$ and value $v$ is collected by the service provider from $E_{from}$ at time $t$.
E.g. (collectat, 2020.01.21.11:20, client, personalinfo, Peter)

Ev3: $(uconsentat, t, E_{from}, \theta)$. This event specifies that a usage consent is collected by the service provider at time $t$ from $E_{from}$.
E.g. (uconsentat, 2020.01.21.11:18, client, energyconsumption)

Ev4: $(service\_spec\_use\_event, t, E_{from}, \theta', \theta, v)$. This captures a service-specific event that specifies the usage of a piece of data, for example, using a piece of data to create or calculate some other data. A piece of data of type $\theta$ and value $v$ is used by $E_{from}$ to obtain a piece of data of type $\theta'$.
E.g. (createat, 2020.01.30.15:45, client, bill, energyconsumption, 20kWh)

Ev5: $(sconsentat, t, E_{from}, \theta)$. This event specifies that a storage consent is collected by the service provider for a piece of data of type $\theta$ from an entity $E_{from}$.
E.g. (sconsentat, 2020.01.30.15:45, client, sickness)

Ev6: $(storeat, t, E_{from}, \theta, v, place)$. This event specifies that a piece of data of type $\theta$ and value $v$ is stored at a place $place$ at time $t$. We note that, unlike the other events, which all relate to actions carried out by the service provider, this event can also capture an action performed by a different entity.
For example, if $place = clientpc$, then the event storeat can refer to a storage action performed by a client PC.
E.g. (storeat, 2020.01.30.15:45, client, sickness, leukemia, backupstorage)

Ev7: $(deleteat, t, E_{from}, \theta, v, place)$. This event specifies that at some time $t$, a service provider deletes a piece of data of type $\theta$ and value $v$ from a place $place$.
E.g. (deleteat, 2020.01.30.15:45, client, sickness, leukemia, mainstorage)

Ev8: $(fwconsentat, t, E_{to}, E_{from}, \theta)$. This event specifies that a service provider is collecting a data transfer consent for a piece of data of type $\theta$ from $E_{from}$.
E.g. (fwconsentat, 2020.01.21.11:18, insurancecompany, client, personalinfo)

Ev9: $(forwardat, t, E_{to}, E_{from}, \theta, v)$. Finally, this event captures that at time $t$, $E_{to}$ receives a piece of data transferred by a service provider. This data has type $\theta$ and value $v$, and originates from $E_{from}$.
E.g. (forwardat, 2020.01.21.11:18, insurancecompany, client, personalinfo, Peter)

We discuss policy-compliant system operations based on the events defined in Section 3.2.1. We define 11 rules (C1-C11), where each rule defines a system operation that respects a sub-policy in Definition 1 (see Figure 2 for an illustration of some of them). In the sequel, we refer to each element $e$ of a tuple $tup$ as $tup.e$; for example, we refer to $\pi_{str}$ in $\pi_\theta$ as $\pi_\theta.\pi_{str}$. In rules C1-C11, we assume that the data of type $\theta$ has not yet been deleted (between any two of the actions below).

• C1 (collection consent): If $cons = Y$ in $\pi_\theta.\pi_{col}$, then a consent must be collected before the collection of the data itself.
Formally: If during a system operation trace $\exists\,(collectat, t, E_{from}, \theta, v)$ (Ev2) for some time $t$, then $\exists\,(cconsentat, t', E_{from}, \theta)$ (Ev1) for some $t'$ in the trace, such that $t \geq t'$.

• C2 (collection purposes): If $cpurp = \{act_1 : \theta_1, \ldots, act_n : \theta_n\}$ in $\pi_\theta.\pi_{col}$, then the data of type $\theta$ must not be collected for any purpose that is not in $cpurp$.
Formally: If during a system operation trace $\exists\,(collectat, t, E_{from}, \theta, v)$ for some time $t$, then there is no instance of Ev4, namely no event $(act', t', E_{from}, \theta', \theta, v)$ with $act' : \theta' \notin cpurp$ and $t' \geq t$.

• C3 (usage consent): For $\pi_\theta$, if $cons = Y$ in $\pi_\theta.\pi_{use}$, then consent must be collected before the data is used.
Formally: If during a system operation trace $\exists\,(service\_spec\_use\_event, t, E_{from}, \theta', \theta, v)$ for some time $t$, then $\exists\,(uconsentat, t', E_{from}, \theta)$ for some $t'$, such that $t \geq t'$.

• C4 (usage purposes): If $upurp = \{act_1 : \theta_1, \ldots, act_n : \theta_n\}$ in $\pi_\theta.\pi_{use}$, then the data must not be used for any purpose not in $upurp$.
Formally: If during a system operation trace there is an instance of Ev4, $(act, t, E_{from}, \theta', \theta, v)$, for some time $t$, then $act : \theta' \in upurp$.

• C5 (storage consent): If $cons = Y$ in $\pi_\theta.\pi_{str}$, then a consent must be collected before the storage of the data itself.
Formally: If during a system operation trace $\exists\,(storeat, t, E_{from}, \theta, v, place)$ for some time $t$, then $\exists\,(sconsentat, t', E_{from}, \theta)$ for some $t'$ in the trace, such that $t \geq t'$.

• C6 (storage places): If $where = \{place_1, \ldots, place_m\}$ in $\pi_\theta.\pi_{str}$, then this data type must not be stored in any place that is not in $where$.
Formally: If during a system operation trace $\exists\,(storeat, t, E_{from}, \theta, v, place)$ for some time $t$, then $place \in where$.

• C7 (deletion places): If $fromwhere = \{place_1, \ldots, place_m\}$ in $\pi_\theta.\pi_{del}$, then this data type must be deleted from all the places in $fromwhere$.
Formally: For all the events $(deleteat, t_1, E_{from}, \theta, v, place_1), \ldots, (deleteat, t_n, E_{from}, \theta, v, place_n)$ in a system operation trace, $\{place_1, \ldots, place_n\} = fromwhere$.

• C8 (deletion delay): If $deld = delay$ in $\pi_\theta.\pi_{del}$, then this data type must be deleted within $delay$ time from the time of its collection.
Formally: If during a system operation trace $\exists\,(collectat, t, E_{from}, \theta, v)$ for some time $t$, and there exist events $(deleteat, t_1, E_{from}, \theta, v, place_1), \ldots, (deleteat, t_n, E_{from}, \theta, v, place_n)$ for some $n$, then $t + delay \geq t_1 \geq t, \ldots, t + delay \geq t_n \geq t$.

• C9 (transfer consent): If $cons = Y$ in $\pi_\theta.\pi_{fw}$, then a consent must be collected before the transfer of the data.
Formally: If during a system operation trace $\exists\,(forwardat, t, E_{to}, E_{from}, \theta, v)$ for some time $t$, then $\exists\,(fwconsentat, t', E_{to}, E_{from}, \theta)$ such that $t \geq t'$.

• C10 (transfer to): If $fwto = \{E_1, \ldots, E_n\}$ in $\pi_\theta.\pi_{fw}$, then the data must not be transferred to any entity not in $fwto$.
Formally: If during a system operation trace $\exists\,(forwardat, t, E_{to}, E_{from}, \theta, v)$ for some time $t$, then $E_{to} \in fwto$.

• C11 (transfer purposes): If $fwpurp = \{act_1 : \theta_1, \ldots, act_n : \theta_n\}$ in $\pi_\theta.\pi_{fw}$, then the data must not be transferred for any purpose not in $fwpurp$.
Formally: If during a system operation trace $\exists\,(forwardat, t, E_{to}, E_{from}, \theta, v)$ for some time $t$, then there is no instance of Ev4, namely no event $(act', t', E_{to}, \theta', \theta, v)$ with $act' : \theta' \notin fwpurp$ and $t' \geq t$.

[Figure 2 diagram: timelines starting at "system operation/service starts" for C1 (cconsentat at $t'$ before collectat at $t$), C2 (collectat at $t$ followed by $(act', t', E_{from}, \theta', \theta, v)$), C8 (collectat at $t$ followed by deleteat at $t_i$ before $t + delay$) and C11 (forwardat at $t$ followed by $(act', t', E_{to}, \theta', \theta, v)$).]

Figure 2: The illustration of some policy compliance rules.
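To see how the formal statements read operationally, the sketch below checks four of the rules (C1, C6, C8 and C10) against a finite trace. It assumes that events are Python tuples laid out exactly as Ev1-Ev9, that times are plain numbers (e.g. minutes) rather than timestamps, and that `check_trace` and its keyword parameters are hypothetical names; it illustrates the rules as stated, not DataProVe's verification engine, which relies on resolution-based proofs rather than trace inspection.

```python
from typing import List, Optional, Set, Tuple

Event = Tuple  # e.g. ("collectat", t, e_from, theta, v), laid out as in Ev1-Ev9

# Position of the data type theta in each event tuple used below.
THETA_AT = {"cconsentat": 3, "collectat": 3, "storeat": 3, "deleteat": 3, "forwardat": 4}


def about(ev: Event, name: str, theta: str) -> bool:
    """True if `ev` is a `name` event on data type `theta`."""
    return ev[0] == name and ev[THETA_AT[name]] == theta


def check_trace(trace: List[Event], theta: str, *, col_consent: bool,
                where: Set[str], delay: Optional[float],
                fwto: Set[str]) -> List[Tuple[str, Event]]:
    """Return (rule, event) pairs that violate C1, C6, C8 or C10 for data type theta."""
    violations = []
    collects = [ev for ev in trace if about(ev, "collectat", theta)]
    for ev in collects:                                        # C1: consent before collection
        _, t, e_from, _, _ = ev
        consented = any(about(c, "cconsentat", theta) and c[2] == e_from and t >= c[1]
                        for c in trace)
        if col_consent and not consented:
            violations.append(("C1", ev))
    for ev in trace:
        if about(ev, "storeat", theta) and ev[5] not in where:       # C6: allowed places
            violations.append(("C6", ev))
        if about(ev, "deleteat", theta) and delay is not None:       # C8: deletion window
            _, t_i, e_from, _, v, _ = ev
            for c in collects:
                if c[2] == e_from and c[4] == v and not (c[1] + delay >= t_i >= c[1]):
                    violations.append(("C8", ev))
        if about(ev, "forwardat", theta) and ev[2] not in fwto:      # C10: allowed recipients
            violations.append(("C10", ev))
    return violations


trace = [
    ("cconsentat", 0, "client", "personalinfo"),
    ("collectat", 2, "client", "personalinfo", "Peter"),
    ("storeat", 3, "client", "personalinfo", "Peter", "mainstorage"),
    ("storeat", 4, "client", "personalinfo", "Peter", "thirdpartycloud"),
    ("forwardat", 5, "insurancecompany", "client", "personalinfo", "Peter"),
    ("deleteat", 20, "client", "personalinfo", "Peter", "mainstorage"),
]
found = check_trace(trace, "personalinfo", col_consent=True,
                    where={"mainstorage", "backupstorage"}, delay=10, fwto=set())
print([rule for rule, _ in found])   # ['C6', 'C10', 'C8']
```

Rule C8 only constrains deletions that actually occur; checking that the data is deleted from every place in $fromwhere$ is the job of C7, which this sketch leaves out.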

The data state keeps track of how the state of each piece of data changes during a given system operation/service after an event (defined in Section 3.2.1) takes place.

Data: A piece of data is defined by a pair of data type and data subject, namely, $data = (\theta, E_{from})$.

Data States:

The semantics of policy events is defined based on local states and the global state of the data types defined in a system. Given a service provider $sp$, a local state captures the values of $data = (\theta, E_{from})$, for all $\theta \in \mathit{DataTypes}^{sp}_{pol}$, from the perspective of an entity $E \in \mathit{EntitySet}^{sp}_{pol}$. Intuitively, a local state of $E$ captures how the value of $(\theta, E_{from})$ changes from the perspective of $E$ during a system operation.

Formally, a local state of $E$ is a function of type $\mathit{StatePol}_V$ that assigns a value (possibly the undefined value $\perp$) to each data $(\theta, E_{from})$.

Local state of $E$ (denoted by $\mu_E$): $\mathit{StatePol}_V : \mathit{Var} \to \mathit{Val}_\perp$, where $\mathit{Var}$ is the set of all possible data variables and $\mathit{Val}_\perp$ is the set of all possible values, including the undefined value $\perp$.

Assume that $\mathit{EntitySet}^{sp}_{pol} = \{E_1, \ldots, E_m\}$. The global state, defined on a policy, is the collection of all the local states in the corresponding system/service $sp$, together with a time component $TT$ that records the time of the most recent event. A global state is denoted by $\mu$, where $\mu = (\mu_{E_1}, \ldots, \mu_{E_m}, TT)$.

Global state of a policy (denoted by $\mu$): $\mathit{StatePol} = \mathit{StatePol}_V^m \times \mathit{TVar}$.

The initial (global) state for a policy $PL$ is denoted by $\mu_{init}$ and is the collection of the initial states of each defined entity. Initially, all data have the undefined value $\perp$.

Initial global state: $\mu_{init} = (\mu^{init}_{E_1}, \ldots, \mu^{init}_{E_m}, TT_{init})$ with $\forall i \in [1, m]$, $\mu^{init}_{E_i} = (\perp, \ldots, \perp)$ and $TT_{init} = \perp$.

Event trace and state updates:

An event trace of a policy PL is denoted by $\tau_{PL}$, and contains a finite sequence of the events defined in Section 3.2.1 that happen during a corresponding system operation. Below we define the semantics function, denoted by $S^{pol}_T$, which defines how a trace $\tau_{PL}$ changes the global state (Figure 3). $S^{pol}_T$ relies on the function $S^{pol}_E$, which defines how an event in $\tau_{PL}$ changes the current global state of PL.

Semantics function (Policy): $S^{pol}_T : EventTrace \times StatePol \to StatePol$, $S^{pol}_E : Event \times StatePol \to StatePol$.
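The recursive definition of $S^{pol}_T$ is a left fold of $S^{pol}_E$ over the trace. The following minimal sketch models only five of the policy events and is not the tool's code. It assumes a dictionary encoding of the global state, uses None for $\bot$, and assumes that the entity $E$ whose local state is updated by collectat, cconsentat and deleteat is the collecting service provider, passed in as sp. The helper names (init_state, update, s_event, s_trace) are hypothetical.

```python
BOTTOM = None  # stands in for the undefined value ⊥
CONSENT_TYPES = ("cconsenttype", "uconsenttype", "sconsenttype", "fwconsenttype")


def init_state(entities):
    """mu_init: every local state maps all data to ⊥ (an empty dict), TT = ⊥."""
    return {"local": {e: {} for e in entities}, "TT": BOTTOM}


def update(mu, entity, assignments, t):
    """mu[mu_E / mu_E[assignments], TT / t], built without mutating mu."""
    local = {e: dict(s) for e, s in mu["local"].items()}
    local.setdefault(entity, {}).update(assignments)
    return {"local": local, "TT": t}


def s_event(event, mu, sp):
    """S_E for a subset of the policy events; sp is the collecting entity."""
    kind = event[0]
    if kind == "cconsentat":
        _, t, e_from, _theta = event
        return update(mu, sp, {("cconsenttype", e_from): "v_cconsent"}, t)
    if kind == "collectat":
        _, t, e_from, theta, v = event
        return update(mu, sp, {(theta, e_from): v}, t)
    if kind == "storeat":
        _, t, e_from, theta, v, place = event
        return update(mu, place, {(theta, e_from): v}, t)
    if kind == "deleteat":
        _, t, e_from, theta, _v, _place = event
        reset = {(theta, e_from): BOTTOM}
        reset.update({(c, e_from): BOTTOM for c in CONSENT_TYPES})
        return update(mu, sp, reset, t)
    if kind == "forwardat":
        _, t, e_to, e_from, theta, v = event
        return update(mu, e_to, {(theta, e_from): v}, t)
    return mu  # events not modelled in this sketch leave the state unchanged


def s_trace(trace, mu, sp):
    """S_T(emptytrace, mu) = mu;  S_T(event.tau, mu) = S_T(tau, S_E(event, mu))."""
    for event in trace:
        mu = s_event(event, mu, sp)
    return mu
```

Building a fresh state in update keeps every intermediate global state intact, which mirrors the substitution notation $\mu[\mu_E/\ldots]$ rather than in-place mutation. Note that deleteat resets the value and all consents in $\mu_E$ only; a copy already forwarded to another entity is unaffected.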

Definition 2 (The semantics of policies)

The semantics of a policy PL is defined as the set of global states that can be reached from the initial global state:

$$\{\mu \in StatePol \mid \exists \tau_{PL},\ S^{pol}_T(\tau_{PL}, \mu_{init}) = \mu\}.$$

$$
\begin{aligned}
S^{pol}_T(emptytrace, \mu) &= \mu\\
S^{pol}_T(event.\tau_{PL}, \mu) &= S^{pol}_T(\tau_{PL}, S^{pol}_E(event, \mu))\\
S^{pol}_E((cconsentat, t, E_{from}, \theta), \mu) &= \mu[\mu_E / \mu_E[(cconsenttype, E_{from}) : v_{cconsent}],\ TT/t]\\
S^{pol}_E((collectat, t, E_{from}, \theta, v), \mu) &= \mu[\mu_E / \mu_E[(\theta, E_{from}) : v],\ TT/t]\\
S^{pol}_E((uconsentat, t, E_{from}, \theta), \mu) &= \mu[\mu_E / \mu_E[(uconsenttype, E_{from}) : v_{uconsent}],\ TT/t]\\
S^{pol}_E((createat, t, E_{from}, \theta, \theta', v), \mu) &= \mu[\mu_E / \mu_E[(\theta, E_{from}) : v],\ TT/t]\\
S^{pol}_E((calculateat, t, E_{from}, \theta, \theta', v), \mu) &= \mu[\mu_E / \mu_E[(\theta, E_{from}) : v],\ TT/t]\\
S^{pol}_E((sconsentat, t, E_{from}, \theta), \mu) &= \mu[\mu_E / \mu_E[(sconsenttype, E_{from}) : v_{sconsent}],\ TT/t]\\
S^{pol}_E((storeat, t, E_{from}, \theta, v, place), \mu) &= \mu[\mu_{place} / \mu_{place}[(\theta, E_{from}) : v],\ TT/t]\\
S^{pol}_E((deleteat, t, E_{from}, \theta, v, place), \mu) &= \mu[\mu_E / \mu_E[(\theta, E_{from}) : \bot, (cconsenttype, E_{from}) : \bot, (uconsenttype, E_{from}) : \bot,\\
&\qquad (sconsenttype, E_{from}) : \bot, (fwconsenttype, E_{from}) : \bot],\ TT/t]\\
S^{pol}_E((fwconsentat, t, E_{to}, E_{from}, \theta), \mu) &= \mu[\mu_E / \mu_E[(fwconsenttype, E_{from}) : v_{fwconsent}],\ TT/t]\\
S^{pol}_E((forwardat, t, E_{to}, E_{from}, \theta, v), \mu) &= \mu[\mu_{E_{to}} / \mu_{E_{to}}[(\theta, E_{from}) : v],\ TT/t]
\end{aligned}
$$

where $v_{cconsent}$, $v_{uconsent}$, $v_{sconsent}$ and $v_{fwconsent}$ are the values of a collection, usage, storage and transfer consent, respectively.

Figure 3: The semantics of the policy events, where createat and calculateat are the two instances of service_spec_use_event.

Each event can either change the global state or leave it unchanged.
To capture the modification made by an event at time $t$ on the state of the variable $(\theta, E_{from})$ from the perspective of an entity $E$, we write $\mu[\mu_E/\mu_E[(\theta, E_{from}) : v],\ TT/t]$ (or $\mu[\mu_E/\mu_E[(\theta, E_{from}) : \bot],\ TT/t]$ in case of the undefined value, e.g. when a piece of data has been deleted). Intuitively, this notation captures that the old state $\mu_E$ is replaced with the new state $\mu_E[(\theta, E_{from}) : v]$ (or $\mu_E[(\theta, E_{from}) : \bot]$), in which the variable $(\theta, E_{from})$ has been given the value $v$ (or the undefined value $\bot$) as a result of the event, and the time variable $TT$ is given the time value $t$.

System architectures describe how a system is composed of components and how these components relate to each other, which the policy abstracts away. However, architectures themselves abstract away from implementation details, such as the cryptographic algorithms and the specific order and timing of the messages.

In line with the policy specification, a system architecture is defined on a set of entities (components) and data types. For a service provider $sp$, we define a finite set of entities, $EntitySet^{sp}_{arch} = \{E_{i_1}, \ldots, E_{i_n}\}$. Let $DataTypes^{sp}_{arch} = \{\theta_1, \ldots, \theta_m\}$ be the set of all the data types defined in an architecture. We assume a finite set of data variables $Var$ ($X_\theta \in Var$), time variables ($TT \in TVar$), data values $Val$ ($V_\theta \in Val$), and time and deletion delay values ($t \in TVal$, $dd \in DVal$).

Terms:

As shown in Figure 4, a term, denoted by $T$, can be:
• A variable ($X_\theta$) that represents some data of type $\theta$, or a data constant or value ($V_\theta$) of type $\theta$.
• A special term $ds$ that specifies the real identity of a data subject (this will be used for modelling pseudonyms).
• An entity $E$ that specifies any software or hardware component, organisation, data controller, or data subject.
• A special function ($SpecFunc$) that specifies the time, pseudonyms and consents.
• A time value ($T_i$).

A variable $X_\theta \in Var$ represents a piece of data of type $\theta$ supported by $sp$, such as the users' personal information, photos, videos, energy data, insurance number, etc. $X_\theta$ can be a non-function (simple) data $D_\theta$ of type $\theta$, a cryptographic or meta function ($CryptoFunc$), or any other service-specific function.
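To make the term grammar concrete, here is a minimal Python sketch (an assumed encoding, not DataProVe's data model). Simple data $D_\theta$ are represented by their type name, and function terms are represented by a hypothetical Func class. The helper visible_types lists the simple types that can be read from a term without applying any destructor: non-crypto functions such as Bill(name, address) expose their arguments, whereas Senc, Aenc, Mac and Hash (the crypto functions excluded from "Anytype" in rule P2) hide them. Treating Meta as transparent is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Func:
    """A function term such as Bill(name, address) or Senc(name, key)."""
    name: str
    args: Tuple["Term", ...]


# A plain string stands for a simple datum D_theta, named by its type.
Term = Union[str, Func]

# Crypto functions excluded from "Anytype" in rule P2.
OPAQUE = {"Senc", "Aenc", "Mac", "Hash"}


def visible_types(term: Term) -> Set[str]:
    """Simple data types readable from a term without any destructor."""
    if isinstance(term, str):
        return {term}
    if term.name in OPAQUE:
        return set()  # contents stay hidden until a destructor applies (if any)
    found: Set[str] = set()
    for arg in term.args:
        found |= visible_types(arg)
    return found
```

For example, visible_types(Func("Bill", ("name", "address"))) returns {"name", "address"}, while an Account term that embeds Hash(password) exposes only name.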

Functions:

The two groups of functions, $SpecFunc$ and $CryptoFunc$, are defined as follows:
• Function $Time(T_i)$ specifies the time with either a non-specific time value $TT$ or a numerical delay value $dd$. While $dd$ captures a numerical time value such as 3 years, 2 months, etc., the value $TT$ is not numerical, and is used to express the informal term "at some point/time". Function $P(ds)$ specifies a pseudonym of a real identity $ds$.
• $Cconsent(Data)$, $Uconsent(Data)$ and $Sconsent(Data)$, where $Data = (X_\theta, E_{from})$, specify a piece of data of type collection, usage, and storage consent, respectively, on a piece of data $X_\theta$ that is originally sent by $E_{from}$. Finally, $Fwconsent(Data, E_{to})$ specifies a transfer consent on $Data$, alongside an entity to whom the data can be transferred ($E_{to}$).
• $Meta(X_\theta)$ defines the metadata (information about other data), or information located in the header of the packets (e.g. IP address). For simplicity, both are modelled by $Meta$.
• The basic cryptographic functions:
  – $Sk(X_{pkeytype})$: This function defines a private key used in asymmetric key encryption algorithms. Its argument has the type of a public key (pkeytype).
  – $Senc(X_\theta, X_{keytype})$: This function defines a symmetric key encryption, and has two arguments, a piece of data (of type $\theta$) and a symmetric key (of type keytype).
  – $Aenc(X_\theta, X_{pkeytype})$: This is the type of the ciphertext resulting from an asymmetric key encryption, and has two arguments, a piece of data and a public key (pkeytype).
  – $Mac(X_\theta, X_{keytype})$: The type of the message authentication code, which has two arguments, a piece of data and a symmetric key.
  – $Hash(X_\theta)$: The type of the cryptographic hash, which has one argument, a piece of data of type $\theta$.

Terms:
$T ::= X_\theta \mid V_\theta \mid ds \mid E \mid SpecFunc \mid T_i$.
$X_\theta ::= D_\theta \mid CryptoFunc \mid Service\_spec\_fun(X_{\theta_1}, \ldots, X_{\theta_n})$ (where $TYPE(CryptoFunc) = \theta$ and $TYPE(Service\_spec\_fun(X_{\theta_1}, \ldots, X_{\theta_n})) = \theta$).
$T_i ::= dd \mid TT$.
$SpecFunc ::= Time(T_i) \mid P(ds) \mid Cconsent(Data) \mid Uconsent(Data) \mid Sconsent(Data) \mid Fwconsent(Data, E_{to})$.
$Data ::= (X_\theta, E_{from})$, where $E_{from}$ is the entity that originally sent the data $X_\theta$.
$CryptoFunc ::= Sk(X_{pkeytype}) \mid Senc(X_\theta, X_{keytype}) \mid Aenc(X_\theta, X_{pkeytype}) \mid Hash(X_\theta) \mid Mac(X_\theta, X_{keytype}) \mid Meta(X_\theta)$.
Destructor application on terms: $G(T_1, \ldots, T_n) \to T$.
Function that returns the type of a term $T$: $TYPE(T) = \theta$, where $\theta \in DataTypes^{sp}_{arch}$.
Function HasAccessTo: $HasAccessTo : E_i \in EntitySet^{sp}_{arch} \to \{E_j \in EntitySet^{sp}_{arch}\}$.
Figure 4: Terms, Destructors and Types.

Values: A variable $X_\theta$ will be given a specific data value $V_\theta$ during an instance of a system run (see Section 4.2). $V_\theta$ can be the value of either a simple (non-function) data or a function, and it can also be $\bot$, which denotes an undefined value (every data variable $X_\theta$ has the value $\bot$ at the start of a service).

Destructor:

This represents an evaluation of a function, used to model a verification procedure. For instance, if $X_{enc} = Senc(X_{name}, X_{Skey})$ represents the encryption of data $X_{name}$ with the server key $X_{Skey}$, and $X_{Skey}$ represents a symmetric key, then $G(X_{enc}, X_{Skey}) \to X_{name}$ is $Dec(Senc(X_{name}, X_{Skey}), X_{Skey}) \to X_{name}$. Note that not all functions have a corresponding destructor. For example, if $X_{hash}$ is a one-way cryptographic hash, $X_{hash} = Hash(X_{password})$, then due to the one-way property there is no destructor (reverse procedure) that returns $X_{password}$ from the hash $X_{hash}$.

HasAccessTo: This is a function that expects an entity as input and returns a set of other entities defined in the same architecture. It specifies which entity can have access to the data handled/stored/collected by other entities. For example, if $E_m$ and $E_p$ represent a smart meter and a digital panel, respectively, and we want to specify that the service provider $sp$ can have access to the panel and the meter, then we define the relation $HasAccessTo(sp) = \{E_m, E_p\}$. It is used for verifying the data possession and link policies.

An architecture PA is defined as a set of actions (denoted by $\{F\}$). The formal definition of architectures is given as follows:

$PA ::= \{F\}$
$F ::= OWN(E, X_\theta) \mid CALCULATEAT(E, X_\theta, Time(TT)) \mid CREATEAT(E, X_\theta, Time(TT)) \mid RECEIVEAT(E, Data, Time(TT))$
$\quad \mid RECEIVEAT(E, Cconsent(Data), Time(TT)) \mid RECEIVEAT(E, Uconsent(Data), Time(TT)) \mid RECEIVEAT(E, Sconsent(Data), Time(TT))$
$\quad \mid RECEIVEAT(E, Fwconsent(Data, E_{to}), Time(TT)) \mid STOREAT(E, Data, Time(TT)) \mid DELETEWITHIN(E, Data, Time(dd))$
$\quad \mid CALCULATE(E, X_\theta) \mid CREATE(E, X_\theta) \mid RECEIVE(E, Data) \mid STORE(E, Data)$
where $Data = (X_\theta, E_{from})$, and $X_\theta$ is originally sent by $E_{from}$.

Figure 5: The table shows the syntax of a system architecture with the defined actions between components/entities.
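The destructor idea can be illustrated with terms written as nested tuples. This sketch is an assumption-laden illustration, not the tool's code: apply_destructor is a hypothetical name, the symmetric case follows the Dec example above, the asymmetric case expects the private key $Sk(pk)$, and Mac is treated as recoverable with its key only because rule P9 in Figure 10 treats it that way. Hash deliberately has no case, reflecting the one-way property.

```python
def apply_destructor(term, key):
    """Destructor application G(term, key) -> inner data, or None.

    Terms are nested tuples, e.g. ("Senc", "name", "Skey").
    """
    if not isinstance(term, tuple):
        return None
    name = term[0]
    if name == "Senc" and term[2] == key:          # Dec(Senc(x, k), k) -> x
        return term[1]
    if name == "Aenc" and key == ("Sk", term[2]):  # opened with Sk(pk), cf. rule P10
        return term[1]
    if name == "Mac" and term[2] == key:           # treated as in rule P9
        return term[1]
    return None                                    # e.g. Hash: one-way, no destructor
```

For instance, apply_destructor(("Senc", "name", "Skey"), "Skey") yields "name", whereas any key applied to ("Hash", "password") yields None.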

• $OWN(E, X_\theta)$ captures that $E$ can own the data variable $X$ of type $\theta$ (during a service, regardless of time). Note that $X_\theta$ is the originally owned data (not data obtained/received by $E$).
• $CALCULATEAT(E, X_\theta, Time(TT))$ specifies that an entity $E$ can calculate the variable $X_\theta$ based on an equation $X_\theta = T$, for some term $T$, at a non-specific time $TT$ (e.g. $\theta$ = bill, and $X_\theta = Bill(energyconsumption, tariff)$).
• $CREATEAT(E, X_\theta, Time(TT))$ specifies that $E$ can create a piece of data of type $\theta$, based on an equation $X_\theta = T$ (e.g. $\theta$ = account, and $X_\theta = Account(name, address)$). The actions create and calculate differ only in the nature of $T$: for example, we calculate a bill, while we create an account.
• $RECEIVEAT(E, Data, Time(TT))$ means that $E$ can receive $Data$ (i.e. $(X_\theta, E_{from})$) at time $TT$.
• $RECEIVEAT(E, Cconsent(Data), Time(TT))$, $RECEIVEAT(E, Uconsent(Data), Time(TT))$, and $RECEIVEAT(E, Sconsent(Data), Time(TT))$ specify that a collection, usage and storage consent on $Data$, $Data = (X_\theta, E_{from})$, can be received by $E$ at time $TT$.
• $RECEIVEAT(E, Fwconsent(Data, E_{to}), Time(TT))$ specifies that a transfer consent on $Data$ and $E_{to}$ can be received by $E$ at time $TT$.
• $STOREAT(E, Data, Time(TT))$ specifies that $Data$ can be stored at some non-specific time $TT$ in a place $E$. A place can be mainstorage or backupstorage, which represent a collection of main storage places (such as main servers) and a collection of backup storage places (e.g. backup servers) of a service provider, respectively, or any service-specific place (e.g. clientPC).
• $DELETEWITHIN(E, Data, Time(dd))$ specifies that $Data$ must be deleted from a place $E$ within a certain time delay $dd$ (where $dd$ is a numerical time value, e.g. 10 years).
• The last four actions, CALCULATE, CREATE, RECEIVE and STORE, are the corresponding versions of the four timed actions above but without the $Time()$ construct. They capture the corresponding actions regardless of time, and their semantics is the same as that of the timed versions. They are defined for convenience, offering users the option to specify simpler actions if they only want to reason about privacy properties. The actions with the $Time()$ construct are mainly used for reasoning about data protection properties and requirements, such as whether a consent has been collected before collection, usage, or transfer.
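A short sketch can show how HasAccessTo widens what an entity can reach in an architecture. It rests on stated assumptions rather than on the tool's semantics: actions are encoded as hypothetical tuples (ACTION, entity_or_place, payload, ...), an entity reaches every payload held by itself or by an entity listed in HasAccessTo (one step, not transitive), and DELETEWITHIN does not count as holding data. The data below loosely mimics a contact-tracing record; reachable_data is a hypothetical name.

```python
# Actions as tuples: (ACTION, entity_or_place, payload, ...); hypothetical encoding.
HOLDING_ACTIONS = {
    "OWN", "RECEIVE", "RECEIVEAT", "STORE", "STOREAT",
    "CREATE", "CREATEAT", "CALCULATE", "CALCULATEAT",
}


def reachable_data(arch, has_access_to, entity):
    """Payloads held by `entity` or by any entity listed in HasAccessTo(entity)."""
    holders = {entity} | has_access_to.get(entity, set())
    return {a[2] for a in arch if a[0] in HOLDING_ACTIONS and a[1] in holders}


record = ("Positivetest(id, places)", "capp")  # Data = (X_theta, E_from)
arch = [
    ("RECEIVEAT", "server", record, "TT"),
    ("STOREAT", "mainstorage", record, "TT"),
    ("DELETEWITHIN", "mainstorage", record, "dd"),
]
has_access_to = {"sp": {"server", "mainstorage"}}
```

Here sp never receives the record itself, but reachable_data(arch, has_access_to, "sp") still returns it, because sp has access to both server and mainstorage.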

Figure 6: A simple example architecture (entities: phone, server, service provider (sp), contact tracing app (capp), mainstorage; the action shown is STOREAT(mainstorage, Positivetest(id, places), capp, Time(t))), where $Data = (X_\theta, E_{from}) = (Positivetest(id, places), capp)$.

An example architecture is shown in Figure 6, where a service provider collects positive (virus) test records sent by contact tracing apps. A record contains a unique ID and the set of places the phone has been taken to, and the record is stored in the main storage place(s) of $sp$. We also define $HasAccessTo(sp) = \{server, mainstorage\}$ so that $sp$ can have access to server and mainstorage.

Like the policy case, the semantics of an architecture is based on events and system run traces. A trace $\Gamma$ is a sequence of high-level events

$Seq(\epsilon)$ taking place during a service, as presented in Figure 7.

$\Gamma ::= Seq(\epsilon)$
$\epsilon ::= own(E, X_\theta : V_\theta, t)$, for all $t$ in any trace during a service
$\quad \mid calculateat(E, X_\theta : T, t) \mid createat(E, X_\theta : T, t) \mid receiveat(E, Data : V_{TYPE(Data)}, t)$
$\quad \mid receiveat(E, Cconsent(Data) : V_{cconsent}, t) \mid receiveat(E, Uconsent(Data) : V_{uconsent}, t)$
$\quad \mid receiveat(E, Sconsent(Data) : V_{sconsent}, t) \mid receiveat(E, Fwconsent(Data) : V_{fwconsent}, t)$
$\quad \mid storeat(E, Data : V_{TYPE(Data)}, t) \mid deletewithin(E, Data : V_{TYPE(Data)}, dd, t)$
where $Data = (X_\theta, E_{from})$, and $X_\theta$ is originally sent by $E_{from}$.

Figure 7: Events defined for architectures.

An event can be seen as an instance of an action defined in Figure 5 that happens at some specific time $t$ during a system run trace. Events are given the same names as the corresponding actions, but in lower-case letters, in order to avoid confusion.
• Event $own(E, X_\theta : V_\theta, t_{all})$ captures that $E$ owns $X_\theta$ with a value $V_\theta$ at time $t_{all}$ (where $t_{all}$ denotes "all the time" during a service). $X_\theta : V_\theta$ means that the variable $X_\theta$ is assigned a value $V_\theta$; for instance, $V_\theta$ can be a name, e.g. Peter, that is assigned to $X_\theta$ during a service/system operation.
• $calculateat(E, X_\theta : T, t)$ captures that at some time $t$, $E$ calculates a piece of data of type $\theta$ that is equal to a term $T$ (based on the equation $X_\theta = T$, e.g. $X_{hash} = Hash(X_{password})$).
• $createat(E, X_\theta : T, t)$ captures that at some time $t$, $E$ creates a piece of data of type $\theta$ that is equal to a term $T$ (e.g. $X_\theta = Account(X_{name}, X_{address})$).
• $receiveat(E, Data : V_{TYPE(Data)}, t)$ specifies that $E$ receives a piece of data of type $TYPE(Data)$ and value $V_{TYPE(Data)}$ at some specific time $t$.
• Events $receiveat(E, Cconsent(Data) : V_{cconsent}, t)$, $receiveat(E, Uconsent(Data) : V_{uconsent}, t)$, $receiveat(E, Sconsent(Data) : V_{sconsent}, t)$, and $receiveat(E, Fwconsent(Data) : V_{fwconsent}, t)$ specify that $E$ receives a (collection, usage, storage, or transfer) consent on $Data$ with a value $V_\theta$, where $\theta$ is the corresponding type of consent ($\theta \in \{cconsent, uconsent, sconsent, fwconsent\}$).
• $storeat(E, Data : V_{TYPE(Data)}, t)$ says that a piece of data of type $TYPE(Data)$ is stored in a place $E$.
• $deletewithin(E, Data : V_{TYPE(Data)}, dd, t)$ specifies that at time $t$, a piece of data of type $TYPE(Data)$ is deleted from a place $E$, where $t \le t_{collect} + dd$ and the data was collected at $t_{collect}$. (This can be extended to the time of any other action, e.g. the time when the data is stored.)

The semantics of architectures is given by a semantics function $S_T$, which specifies the impact made by each event on the states of the data variables (i.e. how the values of $X_\theta$, for all $\theta \in DataTypes^{sp}_{arch}$, change after an event takes place). For example, let $DataTypes^{sp}_{arch} = \{name, bill\}$ be the two types supported by $sp$, and $EntitySet^{sp}_{arch} = \{sp, client\}$. At the start of the service, the variable states of both $sp$ and $client$ are $(X_{name} = \bot, X_{bill} = \bot)$, where $\bot$ is an undefined (initial) value. As a result of an event $own(client, X_{name} : Peter, t_{all})$, the variable state of $sp$ remains unchanged, while the state of $client$ changes to $(X_{name} = Peter, X_{bill} = \bot)$.

States:

The semantics of events is defined based on local states and the global state of the data types defined in a system. Given a service provider $sp$, a local state captures the values of (a data variable) $X_\theta$, for all $\theta \in DataTypes^{sp}_{arch}$, from the perspective of an entity (component) $E$. Intuitively, a local state of $E$ captures how the value of $X_\theta$, $\theta \in DataTypes^{sp}_{arch}$, changes from the perspective of $E$ during a system operation. Formally, a local state of $E$ is a function $State_V$ that assigns a value (including the undefined value $\bot$) to each variable:

Local state of $E$ (denoted by $\mu_E$): $State_E : Var \to Val_\bot$, where $Var$ is the set of all possible data variables and $Val_\bot$ is the set of all possible values, including the undefined value $\bot$.

Assume that there are $m$ entities $E_1, \ldots, E_m$ defined in an architecture. The global state of an architecture is the collection of all the local states in a system. A global state is denoted by $\mu$, where $\mu = (\mu_{E_1}, \ldots, \mu_{E_m}, TT)$.

Global state of an architecture (denoted by $\mu$): $State : State_V^m \times TVar$.

The initial (global) state for an architecture PA is denoted by $\mu_{init}$, and is the collection of the initial states of each defined entity. Initially, all the variables defined in the architecture (including the time variable) have the undefined value $\bot$:

$\mu_{init}$: Initial Global State $\mu_{init} = (\mu^{init}_{E_1}, \ldots, \mu^{init}_{E_m}, TT^{init})$ with $\forall i \in [1, m],\ \mu^{init}_{E_i} = (\bot, \ldots, \bot)$, $TT^{init} = \bot$.

Event trace and state updates:

An event trace of an architecture PA is denoted by $\tau_{PA}$, and contains a finite sequence of the events defined in Figure 7 that happen during a system operation. Below we define the semantics function, denoted by $S_T$, which defines how a trace $\tau_{PA}$ changes the global state of an architecture (Figure 8). $S_T$ makes use of the function $S_E$, which defines how each event in $\tau_{PA}$ changes the current global state of PA.

Semantics function: $S_T : EventTrace \times State \to State$, $S_E : Event \times State \to State$.
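What distinguishes the architectural semantics from the policy one is that calculateat and createat store $eval(T, \mu_E)$, the term with its variables replaced by their current values. The sketch below is illustrative and not DataProVe's implementation. Events are assumed to be tuples (kind, E, var, ..., t), $Data$ is simplified to a variable name, consents are stored under hypothetical keys such as ("Cconsent", var), and None stands for $\bot$.

```python
BOTTOM = None  # the undefined value ⊥
CONSENTS = ("Cconsent", "Uconsent", "Sconsent", "Fwconsent")


def eval_term(term, local):
    """eval(T, mu_E): substitute each variable of T by its value in mu_E."""
    if isinstance(term, tuple):
        name, *args = term
        return (name, *(eval_term(a, local) for a in args))
    return local.get(term, BOTTOM)


def s_event(event, mu):
    """S_E for architectural events encoded as (kind, E, var, ..., t) tuples."""
    kind, entity = event[0], event[1]
    local = dict(mu["local"].get(entity, {}))
    if kind in ("own", "receiveat", "storeat"):
        _, _, var, value, t = event
        local[var] = value
    elif kind in ("calculateat", "createat"):
        _, _, var, term, t = event
        local[var] = eval_term(term, local)
    elif kind == "deletewithin":
        _, _, var, _value, _dd, t = event
        local[var] = BOTTOM
        for c in CONSENTS:          # consents on the deleted data are reset too
            local[(c, var)] = BOTTOM
    else:
        return mu                   # unknown events leave the state unchanged
    new_local = dict(mu["local"])
    new_local[entity] = local
    return {"local": new_local, "TT": t}


def s_trace(trace, mu):
    """S_T folds S_E over the trace, left to right."""
    for event in trace:
        mu = s_event(event, mu)
    return mu
```

Replaying the text's example, own(client, X_name : Peter, t_all) changes only the client's local state; owning a password and then calculating Hash(X_password) stores ("Hash", "secret") for the client.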

Definition 3 (The semantics of architectures)

The semantics of an architecture PA is defined as the set of global states that can be reached from the initial global state:

$$\{\mu \in State \mid \exists \tau_{PA},\ S_T(\tau_{PA}, \mu_{init}) = \mu\}.$$

$$
\begin{aligned}
S_T(emptytrace, \mu) &= \mu\\
S_T(event.\tau_{PA}, \mu) &= S_T(\tau_{PA}, S_E(event, \mu))\\
S_E(own(E, X_\theta : V_\theta, t), \mu) &= \mu[\mu_E/\mu_E[X_\theta/V_\theta],\ TT/t]\\
S_E(calculateat(E, X_\theta : T, t), \mu) &= \mu[\mu_E/\mu_E[X_\theta/eval(T, \mu_E)],\ TT/t]\\
S_E(createat(E, X_\theta : T, t), \mu) &= \mu[\mu_E/\mu_E[X_\theta/eval(T, \mu_E)],\ TT/t]\\
S_E(receiveat(E, Data : V_{TYPE(Data)}, t), \mu) &= \mu[\mu_E/\mu_E[Data/V_{TYPE(Data)}],\ TT/t]\\
S_E(receiveat(E, Cconsent(Data) : V_{cconsent}, t), \mu) &= \mu[\mu_E/\mu_E[Cconsent(Data)/V_{cconsent}],\ TT/t]\\
S_E(receiveat(E, Uconsent(Data) : V_{uconsent}, t), \mu) &= \mu[\mu_E/\mu_E[Uconsent(Data)/V_{uconsent}],\ TT/t]\\
S_E(receiveat(E, Sconsent(Data) : V_{sconsent}, t), \mu) &= \mu[\mu_E/\mu_E[Sconsent(Data)/V_{sconsent}],\ TT/t]\\
S_E(receiveat(E, Fwconsent(Data) : V_{fwconsent}, t), \mu) &= \mu[\mu_E/\mu_E[Fwconsent(Data)/V_{fwconsent}],\ TT/t]\\
S_E(storeat(E, Data : V_{TYPE(Data)}, t), \mu) &= \mu[\mu_E/\mu_E[X_\theta/V_\theta],\ TT/t]\\
S_E(deletewithin(E, Data : V_{TYPE(Data)}, dd, t), \mu) &= \mu[\mu_E/\mu_E[X_\theta/\bot, Cconsent(Data)/\bot, Uconsent(Data)/\bot,\\
&\qquad Sconsent(Data)/\bot, Fwconsent(Data)/\bot],\ TT/t]
\end{aligned}
$$

Figure 8: The semantics of architectural events.

Each event can either change the global state or leave it unchanged.
To capture the modification made by an event at time $t$ on (only) the variable state of an entity $E$, we write $\mu[\mu_E/\mu_E[X_\theta/V_\theta],\ TT/t]$ (or $\mu[\mu_E/\mu_E[X_\theta/\bot],\ TT/t]$ in case of the undefined value, e.g. when a variable has been deleted). Intuitively, this notation captures that the old state $\mu_E$ is replaced with the new state $\mu_E[X_\theta/V_\theta]$ (or $\mu_E[X_\theta/\bot]$), in which the variable $X_\theta$ has been given the value $V_\theta$ (or the undefined value $\bot$) as a result of the event, and the time variable $TT$ is given the value $t$. Here, $eval(T, \mu_E)$ is a function that evaluates the variables in $T$ with $\mu_E$.

We propose three types of conformance relation: (i) privacy conformance, (ii) conformance with regard to data protection properties (which we refer to as DPR conformance in this paper), and (iii) functional conformance. Privacy conformance compares a policy and an architecture based on the privacy properties. Specifically, if we do not give an entity the right to have or link certain types of data, then in the architecture this entity cannot have or link those types of data.

Definition 4 (Proposed privacy conformance definition)
1. If in a policy $\pi_\theta$ an entity $E$ does not have the right to have any data of type $\theta$, then $E$ cannot have this type of data in the corresponding architecture.
2. If in a policy $\pi_\theta$ an entity $E$ does not have the right to link two types of data, $\theta$ and $\theta'$, then $E$ cannot link these types of data in the corresponding architecture.

The DPR conformance relation deals with the data protection requirements (specified in the sub-policies), such as appropriate consent collection, satisfaction of the defined deletion/retention delay, and appropriate storage and transfer of a given type of data.

Definition 5 (Proposed DPR conformance definition)
1. If in a policy $\pi_\theta$, the collection of a (collection, usage, storage, or transfer) consent is required for a piece of data of a given type, then in the architecture the reception of the consent can happen before, or at the same time as, the reception of the data itself.
2. If in an architecture there is an action act (createat or calculateat) defined on a data type $\theta$, then in the policy $\pi_\theta$, there is a (collection, usage, storage, or transfer) purpose $act : \theta'$ defined for the type $\theta$ (for some $\theta'$).
3. If in an architecture a piece of data of type $\theta$ can be stored in some storage place, $strplace$, then in the policy $\pi_\theta$, $strplace \in \pi_{str}.where$ (see Table 1 for notations).
4. If in the policy $\pi_\theta$, $delplace \in \pi_{del}.fromwhere$, then in the corresponding architecture the same data type can be deleted from the place $delplace$.
5. If in an architecture, a piece of data of type $\theta$ can be deleted within a delay $dd$ (from collection), then in the corresponding policy $\pi_\theta$, $dd \le \pi_{del}.deld$. In other words, the retention delay defined in the policy must be respected in the architecture.
6. If in an architecture, a piece of data of type $\theta$ can be transferred to an entity $E$, then in the policy $\pi_\theta$, $E \in \pi_{fw}.fwto$ (again, see Table 1 for notations).
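Several clauses of Definition 5 reduce to simple membership or comparison tests once the architecture is available as a set of actions. The sketch below covers only clauses 1, 3 and 5, and is a hedged illustration rather than the tool's verification engine (which uses the resolution-based rules described next). It assumes hypothetical tuple encodings for actions, a policy dictionary with keys cconsent_required, str_where and del_deld, delays expressed as plain numbers of months, and, for clause 1, that "at the same time" means sharing the same abstract time token, which does not capture the "before" case.

```python
from typing import Dict, List, Tuple

# Hypothetical action encodings, with data = (theta, E_from):
#   ("RECEIVEAT", E, data, tt)      ("RECEIVEAT", E, ("Cconsent", data), tt)
#   ("STOREAT", place, data, tt)    ("DELETEWITHIN", place, data, dd_months)


def dpr_violations(theta: str, policy: Dict,
                   arch: List[Tuple]) -> List[Tuple[str, Tuple]]:
    """Check clauses 1, 3 and 5 of Definition 5 for one data type (a subset)."""
    actions = set(arch)
    found = []
    for a in arch:
        kind, where, payload = a[0], a[1], a[2]
        if payload[0] != theta:
            continue  # consent actions and other data types are skipped here
        if kind == "RECEIVEAT" and policy.get("cconsent_required"):
            # Clause 1: the consent must arrive with the same abstract time token.
            if ("RECEIVEAT", where, ("Cconsent", payload), a[3]) not in actions:
                found.append(("clause 1", a))
        elif kind == "STOREAT" and where not in policy["str_where"]:
            found.append(("clause 3", a))  # place not in pi_str.where
        elif kind == "DELETEWITHIN" and a[3] > policy["del_deld"]:
            found.append(("clause 5", a))  # dd exceeds pi_del.deld
    return found


data = ("name", "client")
arch = [
    ("RECEIVEAT", "sp", data, "TT1"),
    ("RECEIVEAT", "sp", ("Cconsent", data), "TT1"),
    ("STOREAT", "backupstorage", data, "TT2"),
    ("DELETEWITHIN", "mainstorage", data, 36),
]
policy = {"cconsent_required": True, "str_where": {"mainstorage"}, "del_deld": 24}
```

Against this policy the sample architecture violates clause 3 (storage in backupstorage is not allowed) and clause 5 (36 months exceeds the 24-month retention delay), while the consent requirement of clause 1 is met.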
Finally, functional conformance compares a policy and an architecture from the perspective of functionality or effectiveness. This conformance can help a system designer to find an appropriate trade-off between functionality and privacy since, in real life, a system is expected to be able to provide certain services.

Definition 6 (Proposed functional conformance definition)
1. If in a policy $\pi_\theta$, an entity $E$ has the right to have a type of data, $\theta$, then $E$ can have this type of data in the corresponding architecture.
2. If in a policy $\pi_\theta$, an entity $E$ has the right to link two types of data, $\theta$ and $\theta'$, then $E$ can link these types of data in the corresponding architecture.
3. If in a policy $\pi_\theta$, the collection of a (collection, usage, storage, or transfer) consent is not required, then no corresponding consent can be received in the corresponding architecture.
4. If in a policy $\pi_\theta$, there is a (collection, usage, storage, or transfer) purpose $act : \theta'$ defined, then in the corresponding architecture there is an action act defined on the data type $\theta$ (for some $\theta'$).
5. If in a policy $\pi_\theta$, $strplace \in \pi_{str}.where$ for some storage place $strplace$, then in the corresponding architecture this type of data can be stored in $strplace$.
6. If in an architecture a piece of data of type $\theta$ can be deleted from a storage place, $delplace$, then in the corresponding policy $\pi_\theta$, we have $delplace = \pi_{del}.fromwhere$.
7. If in the policy $\pi_\theta$, $E \in \pi_{fw}.fwto$, then in the corresponding architecture, the same type of data can be transferred to the same entity $E$.

The verification engine is based on logic and resolution-based proofs. Below, we define the inference rules that will be used in the proof process in Algorithm 1. See Table 3 for the notations used in this section.

• An inference rule: $H \leftarrow T_1, \ldots, T_n$.
• The head of a rule: $H$ (in $H \leftarrow T_1, \ldots, T_n$).
• The tail of a rule: $T_1, \ldots, T_n$ (in $H \leftarrow T_1, \ldots, T_n$). A $T_i$ is called a (sub-)goal in a proof.
• A fact: any of $H, T_1, \ldots, T_n$.
• A predicate: each fact has the form $PREDICATE(Argument_1, \ldots, Argument_m)$.
• $\theta V$: a variable that can be mapped to a data type $\theta$ in the policy/architecture.
• $EV$: a variable that can be mapped to an entity $E$ in the policy/architecture.
• $DD$: a variable that can be mapped to a deletion delay $dd$ in the policy/architecture.
• $TV$: a variable that can be mapped to a non-specific time value $TT$ in the architecture.
• $TT$, $dd$: a non-specific time value ($TT$) and a numerical time value ($dd$).
• $K$, $PK$: the variables that can be mapped to a type of symmetric key and public key, respectively.
• initgoal: a goal to be proved, which is generated from (and captures) a sub-policy.
• AG: the set of all possible initgoals (covering all seven sub-policies).
• C/U/FwPurpSet: the sets of facts that capture the collection/usage/transfer purposes, respectively.
• UniqueTypes: the set of facts that capture the unique data types ($UNIQUE(\theta)$).
• $\sigma$: a unifier or mapping, e.g. $\sigma = \{EV \mapsto E, \theta V \mapsto \theta, DD \mapsto dd, TV \mapsto TT\}$, where $E$ is an entity value (e.g. client), $\theta$ is a type value (e.g. name), and $dd$ is a delay value (e.g. 6 years).
• $T\sigma$: apply the mapping $\sigma$ to the variables in $T$.
• $Data(\theta V, EV_{from})$: $\theta V$ is the type of a piece of data, and $EV_{from}$ is the entity that originally sent this data.
• isSuccessful[(rule, goal)]: a dictionary (as used, e.g., in the Python language), with (rule, goal) as the key.

Table 3: The notations used in the automated verification engine.
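The $T\sigma$ notation in Table 3 amounts to a recursive replacement of variables. The minimal sketch below assumes that facts are nested tuples and that variables are reserved strings such as "EV" or "θV" (so a constant must never share a variable's name). It also adds a hypothetical compose helper that builds the substitution "apply σ first, then λ", which is how the product $\lambda\sigma$ in Definition 8 is read here.

```python
def apply_subst(term, sigma):
    """T sigma: replace each variable that sigma maps; leave everything else."""
    if isinstance(term, tuple):
        return tuple(apply_subst(t, sigma) for t in term)
    return sigma.get(term, term)


def compose(sigma, lam):
    """Substitution that applies sigma first and then lam."""
    composed = {v: apply_subst(t, lam) for v, t in sigma.items()}
    for v, t in lam.items():
        composed.setdefault(v, t)
    return composed


# The example mapping from Table 3.
sigma = {"EV": "client", "θV": "name", "DD": "6 years", "TV": "TT"}
fact = ("HASUPTO", "EV", "θV", ("Time", "DD"))
```

Applying sigma to the HASUPTO fact above yields ("HASUPTO", "client", "name", ("Time", "6 years")).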

Definition 7

An inference rule $R$ is denoted by $R = H \leftarrow T_1, \ldots, T_n$, where $H$ is the head of the rule and $T_1, \ldots, T_n$ is the tail of the rule. Each element $T_i$ of the tail is called a fact (or condition), and the head is called a "consequence". The rule $R$ reads as "if $T_1, \ldots, T_n$, then $H$".

Figure 9 includes the proposed rules used in the verification of the DPR conformance relations. For instance, D1 specifies that if an entity $EV$ can receive a transfer consent on $Data$, $Data = (\theta V, EV_{from})$, to $EV_{to}$ at some non-specific time $TV$, and $EV_{to}$ can receive $Data$ at the same time (or later), then we say that $EV$ can collect the transfer consent on $\theta V$ to $EV_{to}$. (This is modelled in an abstract way by using the same non-specific time value $TV$.)

Figure 10 includes the proposed rules used in the verification of the privacy conformance relation (i.e. the HAS/HASUPTO data possession property). For instance, rule P1 says that if an entity $EV$ can store $Data$, $Data = (\theta V, EV_{from})$ (as in Figure 9), and can delete $Data$ within a time delay $DD$, then the entity can have this data up to $DD$ time (more precisely, it can have the data of type $\theta V$ that belongs to $Data = (\theta V, EV_{from})$). Rule P2 says that if a trusted authority/organisation has any data that contains a pseudonym ($P(DS)$) alongside some other data, then the trusted authority can also have the same data containing the "real" identity $DS$. P3 says that if $EV$ can own a type of data (regardless of time), then it can have this type of data. Finally, rule P4 says that if $EV$ can receive $Data$ at some non-specific time $TV$, then it can have this data. The remaining rules can be interpreted in the same way. Rules P8-P10 capture the decryption of the cryptographic data types. P8 says that if $EV$ can have an encryption of $Data$ using a symmetric key $K$, and it can also have $K$, then it can have $Data$. Similarly, P9 and P10 deal with a message authentication code and an asymmetric encryption, respectively.

Figure 11 includes the proposed rules used in the verification of the privacy conformance relation (for the LINK property). For instance, rule L1/a says that if the entity $EV$ can have any data that contains two pieces of data of types $\theta V_1$ and $\theta V_3$ alongside any other data (denoted by $\theta V_4$ and $\theta V_5$), and any data that contains two pieces of data of types $\theta V_2$ and $\theta V_3$, then $EV$ is able to link the data of types $\theta V_1$ and $\theta V_2$. Note that this is not a "unique" linkability, meaning that $EV$ cannot be sure that the data of types $\theta V_1$ and $\theta V_2$ belong to the same individual (although it can narrow down the set of possible individuals to some extent). On the other hand, rule U1 says that if the type $\theta V_3$ is unique (e.g. passport numbers), then $EV$ is able to "uniquely" link the data of types $\theta V_1$ and $\theta V_2$; moreover, $EV$ can be sure that they belong to the same individual. Rule L1/b is the same as rule L1/a, but contains the $Meta()$ construct, to capture metadata and packet header information.

D1. FWCONSENTCOLLECTED(EV, θV, EV_to) ← RECEIVEAT(EV, Fwconsent(Data, EV_to), Time(TV)), RECEIVEAT(EV_to, Data, Time(TV))
D2. CCONSENTCOLLECTED(EV, θV) ← RECEIVEAT(EV, Cconsent(Data), Time(TV)), RECEIVEAT(EV, Data, Time(TV))
D3. UCONSENTCOLLECTED(EV, θV) ← RECEIVEAT(EV, Uconsent(Data), Time(TV)), CREATEAT(EV, Anytype(Data), Time(TV))
D4. UCONSENTCOLLECTED(EV, θV) ← RECEIVEAT(EV, Uconsent(Data), Time(TV)), CALCULATEAT(EV, Anytype(Data), Time(TV))
D5. STRCONSENTCOLLECTED(EV, θV) ← RECEIVEAT(EV, Sconsent(Data), Time(TV)), STOREAT(EV, Data, Time(TV))
where Data = (θV, EV_from) (θV represents a data type, and EV_from an entity that originally sent this data).

Figure 9: The proposed inference rules for DPR conformance check. The predicates and arguments of the heads and tails in the rules are in line with the architecture syntax in Figure 5.
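To see how rules such as P3, P4 and P8 combine during a backward search, here is a small, depth-bounded backward-chaining sketch with unification. It is an illustration of the general resolution technique under several assumptions, not DataProVe's Algorithm 1: facts and rules are nested tuples, variables are instances of a hypothetical Var class, there is no occurs check, rule variables are renamed on every use, and MAX_STEPS caps the number of resolution steps per branch because P8 can otherwise recurse indefinitely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Extend s so that a and b become equal; None if impossible (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


def rename(t, suffix):
    """Give a rule fresh variable names for each use."""
    if isinstance(t, Var):
        return Var(t.name + suffix)
    if isinstance(t, tuple):
        return tuple(rename(x, suffix) for x in t)
    return t


EV, TH, K, TV, FROM = Var("EV"), Var("θV"), Var("K"), Var("TV"), Var("EVfrom")
RULES = [
    (("HAS", EV, TH), [("OWN", EV, TH)]),                               # P3
    (("HAS", EV, TH), [("RECEIVEAT", EV, (TH, FROM), ("Time", TV))]),   # P4
    (("HAS", EV, TH), [("HAS", EV, ("Senc", TH, K)), ("HAS", EV, K)]),  # P8
]
MAX_STEPS = 6  # bound on resolution steps per branch; P8 could recurse forever


def prove(goals, facts, s, steps=0):
    """Yield substitutions under which every goal follows (backward search)."""
    if not goals:
        yield s
        return
    if steps > MAX_STEPS:
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:                        # close the goal with an architecture fact
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from prove(rest, facts, s2, steps)
    for i, (head, body) in enumerate(RULES):  # or resolve it against a rule head
        suffix = f"#{steps}.{i}"
        s2 = unify(goal, rename(head, suffix), s)
        if s2 is not None:
            new_goals = [rename(g, suffix) for g in body] + rest
            yield from prove(new_goals, facts, s2, steps + 1)


def has(entity, dtype, facts):
    """True if HAS(entity, dtype) can be derived from the facts."""
    return any(True for _ in prove([("HAS", entity, dtype)], facts, {}))


facts = [
    ("RECEIVEAT", "sp", (("Senc", "name", "key"), "client"), ("Time", "TT")),
    ("OWN", "sp", "key"),
]
```

With these facts, HAS(sp, name) is derived through P8: P4 yields the ciphertext and P3 yields the key. Dropping the OWN fact makes the same query fail, which matches the intent that encrypted data is only readable together with its key.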

The automated conformance verification is based on the execution of resolution steps and backward search. Resolution is well known in logic programming and is widely supported in logic programming languages. The formal definition of resolution is based on the so-called substitution and unification steps. A substitution binds some value to some variable, and we denote it by $\sigma$ in this paper.

Definition 8

A substitution $\sigma$ is the most general unifier of a set of facts $F$ if it unifies $F$, and for any unifier $\mu$ of $F$, there is a unifier $\lambda$ such that $\mu = \lambda\sigma$.

Definition 9

Given a goal (fact) $F$ and a rule $R = H \leftarrow T_1, \ldots, T_n$, where $F$ is unifiable with $H$ with the most general unifier $\sigma$, the resolution $F \circ_{(F,H)} R$ results in $T_1\sigma, \ldots, T_n\sigma$.

Definition 10

The function that generates initial (verification) goals is defined as:

$$G : Policy_{DataType^{sp}_{pol}} \to \{ColG \cup UseG \cup StoreG \cup DelG \cup TransfG \cup HasG \cup LinkG\}.$$

$G$ expects a policy as input and returns a set of seven subsets of goals to be proved in a conformance check. Each subset contains the goals capturing one of the sub-policies in Section 3.1. For a data type $\theta$, we have:

$$G(\pi_\theta) = \{G^\theta_{col} \cup G^\theta_{use} \cup G^\theta_{str} \cup G^\theta_{del} \cup G^\theta_{fw} \cup G^\theta_{has} \cup G^\theta_{link}\}$$

Goal generation rules: In the following, we provide the rules for goal generation based on the specific values of the sub-policies inside $\pi_\theta$, namely, $(\pi_{col}, \pi_{use}, \pi_{str}, \pi_{del}, \pi_{fw}, \pi_{has}, \pi_{link})$:

1. For $\pi_{col}$ with the collection purpose values $\{cp_1 : \theta_1, \ldots, cp_n : \theta_n\}$, the following verification goals are generated: $G^{\theta}_{col} = G^{\theta}_{ccons} \cup G^{\theta}_{cpurp}$, where $G^{\theta}_{ccons} = \{\text{CCONSENTCOLLECTED}(sp, \theta)\}$ and $G^{\theta}_{cpurp} = \{\text{CPURPOSE}(\theta_1, cp_1), \ldots, \text{CPURPOSE}(\theta_n, cp_n)\}$. If cons = N, then $\text{CCONSENTCOLLECTED}(sp, \theta) \notin G^{\theta}_{col}$.
2. For $\pi_{use}$ with the usage purpose values $\{up_1 : \theta_1, \ldots, up_n : \theta_n\}$, the following verification goals are generated: $G^{\theta}_{use} = G^{\theta}_{ucons} \cup G^{\theta}_{upurp}$, where $G^{\theta}_{ucons} = \{\text{UCONSENTCOLLECTED}(sp, \theta)\}$ and $G^{\theta}_{upurp} = \{\text{UPURPOSE}(\theta_1, up_1), \ldots, \text{UPURPOSE}(\theta_n, up_n)\}$. Again, if the first argument of $\pi_{use}$ is cons = N, then $\text{UCONSENTCOLLECTED}(sp, \theta) \notin G^{\theta}_{use}$.
3. For $\pi_{str}$ with the storage place values $\{E_1, \ldots, E_n\}$, the following verification goals are generated: $G^{\theta}_{str} = G^{\theta}_{scons} \cup G^{\theta}_{places}$, where $G^{\theta}_{scons} = \{\text{STRCONSENTCOLLECTED}(sp, \theta)\}$ and $G^{\theta}_{places} = \{\text{STORE}(E_1, \theta, EV_{from}), \ldots, \text{STORE}(E_n, \theta, EV_{from}), \ldots, \text{STOREAT}(E_n, \theta, EV_{from}, \text{Time}(TT))\}$.
4. If $\pi_{del} = (\{E_1, \ldots, E_n\}, dd)$, where $E_1, \ldots, E_n$ are the deletion places and $dd$ is the deletion delay, then $G^{\theta}_{del} = G^{\theta}_{hasupto} \cup G^{\theta}_{within}$, where $G^{\theta}_{hasupto} = \{\text{HASUPTO}(E_1, \theta, \text{Time}(dd)), \ldots, \text{HASUPTO}(E_n, \theta, \text{Time}(dd))\}$ and $G^{\theta}_{within} = \{\text{DELETEWITHIN}(E_1, \theta, EV_{from}, \text{Time}(dd)), \ldots, \text{DELETEWITHIN}(E_n, \theta, EV_{from}, \text{Time}(dd))\}$.
5. If $\pi_{fw} = (cons, \{E_1, \ldots, E_n\}, \{fwp_1 : \theta_1, \ldots, fwp_m : \theta_n\})$, where $E_1, \ldots, E_n$ are the entities that can receive the transferred data and $fwp_1, \ldots, fwp_m$ are the transfer purpose values, then $G^{\theta}_{fw} = G^{\theta}_{fwcons} \cup G^{\theta}_{fwto} \cup G^{\theta}_{fwpurp}$, where $G^{\theta}_{fwto} = \{\text{RECEIVE}(E_1, \theta, EV_{from}), \ldots, \text{RECEIVE}(E_n, \theta, EV_{from}), \ldots, \text{RECEIVEAT}(E_n, \theta, EV_{from}, \text{Time}(TT))\}$, $G^{\theta}_{fwcons} = \{\text{FWCONSENTCOLLECTED}(sp, \theta, E_1), \ldots, \text{FWCONSENTCOLLECTED}(sp, \theta, E_n)\}$ and $G^{\theta}_{fwpurp} = \{\text{FWPURPOSE}(\theta_1, fwp_1), \ldots, \text{FWPURPOSE}(\theta_n, fwp_m)\}$.
6. For $\pi_{has}$, if $\{E_1, \ldots, E_n\}$ is the set of all defined entities in an architecture, then $G^{\theta}_{has} = \{\text{HAS}(E_1, \theta), \ldots, \text{HAS}(E_n, \theta)\}$.
7. For $\pi_{link}$, if $\{E_1, \ldots, E_n\}$ is the set of all defined entities in an architecture and $\{\theta_1, \ldots, \theta_m\}$ is the set of all defined data types (different from θ), then $G^{\theta}_{link} = \{\text{LINK}(E_1, \theta, \theta_1), \text{LINK}(E_1, \theta_1, \theta), \ldots, \text{LINK}(E_n, \theta, \theta_m), \ldots, \text{LINKUNIQUE}(E_n, \theta_m, \theta)\}$.

P1. HASUPTO(EV, θV, Time(DD)) ← STOREAT(EV, Data, Time(TV)), DELETEWITHIN(EV, Data, Time(DD))
P2. HAS(trusted, Anytype(DS, θV)) ← HAS(trusted, Anytype(θV, P(DS))), where Anytype is not a crypto function (Anytype ∉ {Senc, Aenc, Mac, Hash})
P3. HAS(EV, θV) ← OWN(EV, θV)
P4. HAS(EV, θV) ← RECEIVEAT(EV, Data, Time(TV))
P5. HAS(EV, θV) ← STOREAT(EV, Data, Time(TV))
P6. HAS(EV, θV) ← CREATEAT(EV, θV, Time(TV))
P7. HAS(EV, θV) ← CALCULATEAT(EV, θV, Time(TV))
P8. HAS(EV, θV) ← HAS(EV, Senc(θV, K)), HAS(EV, K)
P9. HAS(EV, θV) ← HAS(EV, Mac(θV, K)), HAS(EV, K)
P10. HAS(EV, θV) ← HAS(EV, Aenc(θV, PK)), HAS(EV, Sk(PK))
P11. HASUPTO(EV, θV, Time(DD)) ← STORE(EV, Data), DELETEWITHIN(EV, Data, Time(DD))
P12. HAS(trusted, Anytype(DS, θV)) ← HAS(trusted, Anytype(P(DS), θV))
P13. HAS(trusted, Anytype(θV, DS)) ← HAS(trusted, Anytype(θV, P(DS)))
P14. HAS(trusted, Anytype(θV, DS)) ← HAS(trusted, Anytype(P(DS), θV))
P15. HAS(EV, θV) ← RECEIVE(EV, Data)
P16. HAS(EV, θV) ← STORE(EV, Data)
P17. HAS(EV, θV) ← CREATE(EV, θV)
P18. HAS(EV, θV) ← CALCULATE(EV, θV)

Figure 10: Inference rules for privacy conformance check (HAS and HASUPTO properties). P8–P10 capture cryptographic verification/decryption, i.e., the destructor application defined in Figure 4.

L1/a. LINK(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV))
L1/b. LINK(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, Meta(θV))), HAS(EV, Anytype2(θV, θV, Meta(θV)))
L2. LINK(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV))
L3. LINK(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(E, Anytype2(θV, θV, θV))
L4. LINK(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV))
L5–L8 are similar to L1–L4, respectively, but with HAS(EV, Anytype1(θV, θV, θV)) instead of HAS(EV, Anytype1(θV, θV, θV)).
U1. LINKUNIQUE(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV)), UNIQUE(θV), where Anytype1 and Anytype2 are not crypto functions.
U2. LINKUNIQUE(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV)), UNIQUE(θV)
U3. LINKUNIQUE(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV)), UNIQUE(θV)
U4. LINKUNIQUE(EV, θV, θV) ← HAS(EV, Anytype1(θV, θV, θV)), HAS(EV, Anytype2(θV, θV, θV)), UNIQUE(θV)
U5–U8 are similar to U1–U4, respectively, but with HAS(EV, Anytype1(θV, θV, θV)) instead of HAS(EV, Anytype1(θV, θV, θV)).

Figure 11: Inference rules for privacy conformance check (linkability and unique linkability).

No rule is defined for the trivial HAS, LINK, and LINKUNIQUE properties (e.g., if sp can receive Bill(name, address), then it can have name and address, and can link them); instead, the facts HAS(sp, name), …, LINK(sp, name, address) are generated directly.

Finally, let us denote the set of all goals to be proved during a conformance verification by AG, namely:

$$AG = \bigcup_{\theta \in DataTypes^{pol}_{sp}} G(\pi_\theta),$$

where $DataTypes^{pol}_{sp}$ is the set of all data types defined in the policy for a service provider sp.

The generation of purpose-facts in architectures: besides the actions defined in Figure 5, so-called purpose-facts are generated to verify DPR conformance regarding the (collection, usage, or transfer) purposes. This is based on the following purpose-fact generation rules, for a given architecture FA:
1. If CREATEAT(E, X_θ, Time(TT)) ∈ FA, then CPURPOSE(θ, createat) ∈ CPurpSet.
2. If CALCULATEAT(E, X_θ, Time(TT)) ∈ FA, then UPURPOSE(θ, calculateat) ∈ UPurpSet.
3. If RECEIVEAT(E, Fwconsent(X_θ, E_to), Time(TT)) ∈ FA and CREATEAT(E_to, X_θ, Time(TT)) ∈ FA, then FWPURPOSE(θ, createat) ∈ FwPurpSet.
4. If RECEIVEAT(E, Fwconsent(X_θ, E_to), Time(TT)) ∈ FA and CALCULATEAT(E_to, X_θ, Time(TT)) ∈ FA, then FWPURPOSE(θ, calculateat) ∈ FwPurpSet.

These rules define how the facts for the collection (point 1), usage (point 2), and transfer (points 3–4) purposes are generated from the architectural actions and added to the sets CPurpSet, UPurpSet, and FwPurpSet, respectively, to be used in Algorithm 1.

To speed up the verification process, the actions defined in an architecture are divided into four subsets: ArchTime, ArchPseudo, ArchMeta, and Arch. ArchTime includes the actions that contain the Time() construct, ArchPseudo includes the actions that contain the P() construct for pseudonyms, ArchMeta includes the actions that contain the Meta() construct for metadata, and finally, Arch is the set of actions without any of these constructs (see rules P15–P18 in Figure 10).

Finally, if the set of unique data types defined in the policy is $\{\theta_1, \ldots, \theta_n\} \subseteq DataTypes^{pol}_{sp}$, then we have the corresponding set of facts, UniqueTypes, which can be used to prove the unique linkability properties (see rule U1 in Figure 11): $UniqueTypes = \{\text{UNIQUE}(\theta_1), \ldots, \text{UNIQUE}(\theta_n)\}$. Unique data types are types that can be used to uniquely identify a living individual, e.g., passport numbers.

Let us define the following rule sets that we will use in the inference algorithms:
• DPRRules = {D1, …, D5},
• HasUpToRules = {P1, P11},
• HasRules = {P2, …, P10, P12, …, P18},
• LinkRules = {L1, …, L8} (where L1 consists of L1/a and L1/b), and
• LinkUniqueRules = {U1, …, U8}.

(…) Architecture fulfils the "initial" goal, initgoal, and returns either 1 if the proof is successful, or 0 if it fails.

Algorithm 1: ConformanceCheck(initgoal, Architecture, Rulesets, N)
Result: Proof found (1) / Proof not found (0) (see Table 3 for the notations used)

Inputs:
1. Rulesets = {DPRRules, HasUpToRules, HasRules, LinkRules, LinkUniqueRules}.
2. Architecture = {ArchTime, ArchPseudo, ArchMeta, Arch}.
3. ArchPurposes = {CPurpSet, UPurpSet, FwPurpSet}.
4. UniqueTypes.
5. Goal: initgoal, where initgoal ∈ AG.
6. Allowed layers of nested crypto functions: N.

if initgoal ∈ G^θ_places ∪ G^θ_within ∪ G^θ_fwto then
    for arch in Architecture do
        if (initgoal ∘_(initgoal, arch) arch) is successful or (initgoal == arch) then
            return 1
        end
    end
    return 0
else if the predicate of initgoal matches the predicate of a purpose-fact in AP, AP ∈ ArchPurposes then
    for purp in AP do
        if (initgoal ∘_(initgoal, purp) purp) is successful or (initgoal == purp) then
            return 1
        end
    end
    return 0
else
    if VerifyAgainstRuleset(initgoal, Architecture, UniqueTypes, Rulesets, N) == 1 then
        return 1
    else
        return 0
    end
end

Algorithm 2: VerifyAgainstRuleset(goal, Architecture, UniqueTypes, Rulesets, N)
if the predicate of goal matches the predicate of a head of a rule in RS, RS ∈ Rulesets then
    for rule in RS do
        isSuccessful[(rule, goal)] = VerifyRule(rule, goal, Architecture, UniqueTypes, Rulesets, N)
    end
    if for all rule in RS: isSuccessful[(rule, goal)] == 0 then
        return 0
    else
        return 1
    end
end

Algorithm explanation. Algorithm 1 expects as input the set of inference rules (Rulesets), a set of facts that capture the actions in an architecture (Architecture), a set of purposes defined in an architecture (ArchPurposes), a set of unique data types (UniqueTypes), and a verification goal, initgoal. N is a predefined number that denotes the maximum number of layers of nested cryptographic functions in a piece of data that the verification engine examines. A finite N is used to ensure the termination of the proof process.

Algorithm 3: VerifyUniqueTypes(rule, goal, UniqueTypes)
for unique in UniqueTypes do
    if (goal ∘_(goal, unique) unique) is successful or (goal == unique) then
        Derivation_Unique_Successful[(rule, goal, unique)] = 1   (* a dictionary *)
    else
        Derivation_Unique_Successful[(rule, goal, unique)] = 0
    end
end
if for all unique in UniqueTypes: Derivation_Unique_Successful[(rule, goal, unique)] == 0 then
    return 0
else
    return 1
end

Algorithm 4: VerifyAgainstArch(rule, goal, AS)
for arch in AS do
    if (goal ∘_(goal, arch) arch) is successful or (goal == arch) then
        Derivation_Arch_Successful[(rule, goal, arch)] = 1   (* a dictionary with the key (rule, goal, arch) *)
    else
        Derivation_Arch_Successful[(rule, goal, arch)] = 0
    end
end
if for all arch in AS: Derivation_Arch_Successful[(rule, goal, arch)] == 0 then
    return 0
else
    return 1
end
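The inputs of Algorithm 1 come from the goal-generation rules of Definition 10 and the purpose-fact generation rules. A minimal Python sketch of three of them (points 1 and 6 of Definition 10, and purpose-fact rule 1) follows. The tuple encoding, the CollectionPolicy class, and the function names are hypothetical, and data types are simplified to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPolicy:
    """Hypothetical stand-in for pi_col: consent flag and (cp, theta) purposes."""
    cons: str                                      # "Y" or "N"
    purposes: list = field(default_factory=list)   # e.g. [("createat", "energy")]


def col_goals(sp: str, theta: str, pol: CollectionPolicy) -> set:
    # Point 1 of Definition 10: purpose goals, plus the consent goal only if cons = Y
    goals = {("CPURPOSE", t, cp) for cp, t in pol.purposes}
    if pol.cons == "Y":
        goals.add(("CCONSENTCOLLECTED", sp, theta))
    return goals


def has_goals(theta: str, entities: list) -> set:
    # Point 6 of Definition 10: one HAS goal for every entity in the architecture
    return {("HAS", e, theta) for e in entities}


def cpurp_set(arch: set) -> set:
    # Purpose-fact rule 1: CREATEAT(E, X_theta, Time(TT)) yields CPURPOSE(theta, createat)
    return {("CPURPOSE", f[2], "createat") for f in arch if f[0] == "CREATEAT"}


arch = {("CREATEAT", "meter", "energy", ("Time", "tt"))}
goals = col_goals("sp", "energy", CollectionPolicy("Y", [("createat", "energy")]))
# A purpose goal is met when it appears among the generated purpose-facts
print(sorted(g for g in goals if g[0] == "CPURPOSE" and g in cpurp_set(arch)))
# [('CPURPOSE', 'energy', 'createat')]
```

Matching purpose goals by set membership mirrors the second branch of Algorithm 1, where a purpose goal is compared directly with the facts in ArchPurposes rather than derived through inference rules.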

1. First of all, if initgoal ∈ $G^{\theta}_{places} \cup G^{\theta}_{within} \cup G^{\theta}_{fwto}$ (see points 3–5 in Definition 10), then we check whether initgoal can be unified with, or is equal to, a fact in Architecture. The algorithm returns 1 if the proof was successful, and 0 otherwise.
2. If initgoal is not an action fact, then we check whether the (collection, usage, or transfer) purposes in the architecture are in line with the policy, namely, whether initgoal is in ArchPurposes. The algorithm returns 1 if the proof was successful, and 0 otherwise.
3. If initgoal is not a purpose-fact (e.g., initgoal = HAS(sp, name)), then we try to prove it using the inference rule set and the given architecture. If a proof (derivation) is found for initgoal, then 1 is returned; otherwise, 0.
4. In VerifyRule(rule, goal, Architecture, UniqueTypes, Rulesets, N), called inside Algorithm 2, we attempt to carry out resolution steps between initgoal and each rule in an appropriate RS, RS ∈ Rulesets. If the proof fails for all rules in RS, then 0 is returned (proof failed). Otherwise, if at least one rule can be used to prove the goal, then 1 is returned.
5. In Algorithm 5, a step goal ∘_(goal, head of rule) rule can be successful or unsuccessful (the latter when there is no unifier σ for goal and the head of rule). This step results in new (sub-)goals to be proved. If a new (sub-)goal contains more than N layers of nested cryptographic functions (Senc, Aenc, Mac, Hash), then we return 0, and this "branch" of the proof is unsuccessful. If a new (sub-)goal corresponds to an architectural action, then we attempt to prove it using the facts in Architecture. A proof can be seen as a derivation tree, with initgoal at the root and the facts in Architecture as the leaves.

6. Algorithm 4 specifies a proof attempt using the (action) facts in Architecture. If there is no matching action for a goal, then this branch of the proof is unsuccessful. Otherwise, this branch of the proof is successful.
7. Finally, Algorithm 3 checks goal against the set UniqueTypes. If there is no match, then this branch of the proof is unsuccessful. Otherwise, this branch of the proof is successful.

Algorithm 5 defines the verification of initgoal via the sub-goals resulting from the resolution steps.
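Points 1–7 describe a backward search from initgoal down to the architecture facts. The following Python sketch implements that idea as depth-first resolution over a small subset of the rules (P3, P4, and P8 from Figure 10), using the crypto-nesting bound N to prune branches. It is an illustration under stated assumptions, not DataProVe's code: the tuple encoding, the "?"-prefixed variables, the flattened RECEIVEAT(E, θ, EV_from, Time(t)) layout used in Example 1, and the choice to pass bindings from one sub-goal to the next (standard SLD resolution, so that the key bound while proving HAS(sp, Senc(name, K)) is reused for HAS(sp, K)) are all assumptions. Unification omits the occurs check.

```python
import itertools

CRYPTO = {"Senc", "Aenc", "Mac", "Hash"}
_fresh = itertools.count()

# Assumed subset of Figure 10 as (head, tail) pairs; "?X" marks a variable.
RULES = [
    (("HAS", "?E", "?X"), [("OWN", "?E", "?X")]),                                       # P3
    (("HAS", "?E", "?X"), [("RECEIVEAT", "?E", "?X", "?F", ("Time", "?T"))]),           # P4
    (("HAS", "?E", "?X"), [("HAS", "?E", ("Senc", "?X", "?K")), ("HAS", "?E", "?K")]),  # P8
]

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    # Follow variable bindings in substitution s
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become equal, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) and a[0] == b[0]:
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def substitute(t, s):
    # Apply s to every variable in t (T·sigma in Definition 9)
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(x, s) for x in t[1:])
    return t

def crypto_depth(t):
    # Number of nested Senc/Aenc/Mac/Hash layers in a term
    if not isinstance(t, tuple):
        return 0
    inner = max((crypto_depth(x) for x in t[1:]), default=0)
    return inner + (1 if t[0] in CRYPTO else 0)

def rename(rule):
    # Fresh variable names for every use of a rule (cf. the note in Algorithm 5)
    n = next(_fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        if isinstance(t, tuple):
            return (t[0],) + tuple(r(x) for x in t[1:])
        return t
    head, tail = rule
    return r(head), [r(x) for x in tail]

def solve(goals, s, arch, n):
    """Yield every substitution that proves all goals, depth-first."""
    if not goals:
        yield s
        return
    goal, rest = substitute(goals[0], s), goals[1:]
    if crypto_depth(goal) > n:        # prune: more than N crypto layers
        return
    for fact in arch:                 # leaves of the derivation tree
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from solve(rest, s2, arch, n)
    for rule in RULES:                # resolution with a rule head
        head, tail = rename(rule)
        s2 = unify(goal, head, s)
        if s2 is not None:
            yield from solve(tail + rest, s2, arch, n)

def conformance_check(initgoal, arch, n):
    # 1 if a derivation tree exists within the bound N, else 0
    return 1 if next(solve([initgoal], {}, arch, n), None) is not None else 0

ex1 = [("RECEIVEAT", "sp", "name", "client", ("Time", "tt"))]
ex2 = [("RECEIVEAT", "sp", ("Senc", "name", "key"), "client", ("Time", "tt")),
       ("OWN", "sp", "key")]
print(conformance_check(("HAS", "sp", "name"), ex1, 0))  # 1 (Example 1, any N)
print(conformance_check(("HAS", "sp", "name"), ex2, 1))  # 1 (Example 2: P8, then P4 and P3)
print(conformance_check(("HAS", "sp", "name"), ex2, 0))  # 0 (Senc layer exceeds N = 0)
```

The last call shows the role of N: rule P8 introduces a sub-goal with one more Senc layer, so with N = 0 that branch is pruned and the encrypted name in Example 2 is never reached.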

Algorithm 5: VerifyRule(rule, goal, Architecture, UniqueTypes, Rulesets, N)
(* Note: the variable arguments in the inference rules are renamed before they are used in a resolution. *)
GoalsToBeProved = {goal};
if goal ∘_(goal, head of rule) rule is successful then
    if ∃ fact in (goal ∘_(goal, head of rule) rule) that contains more than N nested layers of crypto functions and rule ∈ {P8, P9, P10} then
        return 0
    else
        remove goal from GoalsToBeProved;
        add the facts in (goal ∘_(goal, head of rule) rule) to GoalsToBeProved;
        for nextgoal in GoalsToBeProved do
            if nextgoal is an action, and matches the Time/P/Meta construct in AS, AS ∈ Architecture then
                if VerifyAgainstArch(rule, nextgoal, AS) == 1 then
                    isSuccessful[(rule, nextgoal)] = 1
                else
                    isSuccessful[(rule, nextgoal)] = 0
                end
            else if the predicate of nextgoal matches a fact in UniqueTypes then
                if VerifyUniqueTypes(rule, nextgoal, UniqueTypes) == 1 then
                    isSuccessful[(rule, nextgoal)] = 1
                else
                    isSuccessful[(rule, nextgoal)] = 0
                end
            else
                if VerifyAgainstRuleset(nextgoal, Architecture, UniqueTypes, Rulesets, N) == 1 then
                    isSuccessful[(rule, nextgoal)] = 1
                else
                    isSuccessful[(rule, nextgoal)] = 0
                end
            end
        end
        if for all nextgoal in GoalsToBeProved: isSuccessful[(rule, nextgoal)] == 1 then
            return 1
        else
            return 0
        end
    end
end

Example 1. Let Architecture = {RECEIVEAT(sp, name, client, Time(TT))} and initgoal = HAS(sp, name); namely, we want to prove that sp can have name. This can be proven with rule P4 in Figure 10 and a resolution step (Definition 9):
• Step 1: initgoal ∘_(initgoal, HAS(EV, θV)) P4 = RECEIVEAT(sp, name, client, Time(TT)), as initgoal can be unified with HAS(EV, θV), the head of rule P4, with the unifier σ = {EV ↦ sp, θV ↦ name, EV_from ↦ client, TV ↦ TT}. We obtain RECEIVEAT(EV, θV, EV_from, Time(TV))σ as a result, which is equal to RECEIVEAT(sp, name, client, Time(TT)).
• Step 2: As RECEIVEAT(sp, name, client, Time(TT)) ∈ Architecture, we get ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1 for any natural N.

Example 2. Let Architecture = {RECEIVEAT(sp, Senc(name, key), client, Time(TT)), OWN(sp, key)} and initgoal = HAS(sp, name). This can be proven with rule P8, then rules P3 and P4, as shown in Figure 12.

Figure 12: Two example proofs (without and with encryption, respectively). In both derivation trees, the initgoal HAS(sp, name) generated by the possession sub-policy is the root and the Architecture facts are the leaves: example 1 reaches RECEIVEAT(sp, name, client, Time(TT)) via rule P4; example 2 applies rule P8, then reaches RECEIVEAT(sp, Senc(name, key), client, Time(TT)) via P4 and OWN(sp, key) via P3.

Property 1 (Correctness). We distinguish several cases based on the value of initgoal:
1. If initgoal ∈ {HAS(E, θ), HASUPTO(E, θ, Time(dd))} and E ∈ $\pi_\theta.\pi_{has}$ at the policy level, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, Architecture functionally conforms with this requirement of the policy.
2. If initgoal ∈ {HAS(E, θ), HASUPTO(E, θ, Time(dd))} and E ∉ $\pi_\theta.\pi_{has}$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, Architecture does not privacy conform with the policy.
3. If initgoal ∈ $G^{\theta}_{link}$ and (E_i, θ_i) ∈ $\pi_\theta.\pi_{link}$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with this link policy.
4. If initgoal ∈ $G^{\theta}_{link}$ and (E_i, θ_i) ∉ $\pi_\theta.\pi_{link}$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not privacy conform with the policy.
5. If initgoal ∈ $G^{\theta}_{ccons} \cup G^{\theta}_{ucons} \cup G^{\theta}_{scons} \cup G^{\theta}_{fwcons}$, and $\pi_{col}.cons$ = Y, $\pi_{use}.cons$ = Y, $\pi_{str}.cons$ = Y, or $\pi_{fw}.cons$ = Y in $\pi_\theta$, respectively, then the architecture DPR conforms with the corresponding sub-policy whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1.
6. If initgoal = CPURPOSE(θ, cp) (i.e., initgoal ∈ $G^{\theta}_{cpurp}$) and (cp: θ ∈ $\pi_{col}.cpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy.
7. If initgoal = CPURPOSE(θ, cp) (i.e., initgoal ∈ $G^{\theta}_{cpurp}$) and (cp: θ ∉ $\pi_{col}.cpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy.
8. If initgoal = UPURPOSE(θ, up) (i.e., initgoal ∈ $G^{\theta}_{upurp}$) and (up: θ ∈ $\pi_{use}.upurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy.
9. If initgoal = UPURPOSE(θ, up) (i.e., initgoal ∈ $G^{\theta}_{upurp}$) and (up: θ ∉ $\pi_{use}.upurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy.
10. If initgoal = FWPURPOSE(θ, fwp) (i.e., initgoal ∈ $G^{\theta}_{fwpurp}$) and (fwp: θ ∈ $\pi_{fw}.fwpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy. (point 4 of Definition 4)
11. If initgoal = FWPURPOSE(θ, fwp) (i.e., initgoal ∈ $G^{\theta}_{fwpurp}$) and (fwp: θ ∉ $\pi_{fw}.fwpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy. (point 4 of Definition 4)
12. If initgoal ∈ {STORE(E, θ, EV_from), STOREAT(E, θ, EV_from, Time(TT))} (i.e., initgoal ∈ $G^{\theta}_{places}$) and E ∈ $\pi_{str}.where$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy. (point 5 of Definition 4)
13. If initgoal ∈ {STORE(E, θ, EV_from), STOREAT(E, θ, EV_from, Time(TT))} (i.e., initgoal ∈ $G^{\theta}_{places}$) and E ∉ $\pi_{str}.where$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy. (point 5 of Definition 4)
14. If initgoal = DELETEWITHIN(E, θ, EV_from, Time(dd)) (i.e., initgoal ∈ $G^{\theta}_{within}$), dd ≤ $\pi_{del}.deld$, and E ∈ $\pi_{del}.fromwhere$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy. (point 5 of Definition 3)
15. If initgoal = DELETEWITHIN(E, θ, EV_from, Time(dd)) (i.e., initgoal ∈ $G^{\theta}_{within}$), dd > $\pi_{del}.deld$, and E ∈ $\pi_{del}.fromwhere$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy. (point 5 of Definition 3)
16. If initgoal = RECEIVE(E, θ, EV_from) (initgoal ∈ $G^{\theta}_{fwto}$) and E ∈ $\pi_{fw}.fwto$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture functionally conforms with the policy. (point 7 of Definition 4)
17. If initgoal = RECEIVE(E, θ, EV_from) (initgoal ∈ $G^{\theta}_{fwto}$) and E ∉ $\pi_{fw}.fwto$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1, the architecture does not DPR conform with the policy. (point 7 of Definition 4)

Proof: ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1 means that a proof of initgoal can be found with Architecture. Whenever initgoal can be proved with a rule rule = H ← T1, …, Tn (Algorithm 5), the architectural actions that can be used to prove each of T1, …, Tn belong to the same piece of data (Algorithm 4). This is relevant because, for instance, rule D1 can be used to prove that a transfer consent on a data type θ has been collected before transferring the data. To prove this, the two RECEIVEAT facts in the tail of D1 must belong to the same data. This is fulfilled because in rules D1–D5, P1, P4–P5, P11, and P15–P16, Data is a pair of a data type and the entity that originally sent it, i.e., Data = (θV, EV_from), which can be differentiated from the other data pairs.

Therefore, in the case of points 1 and 3 (of Property 1), the first two points of Definition 6 are satisfied, respectively. In the case of points 2 and 4, the two points of Definition 4 are unsatisfied, respectively. In the case of point 5, ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1 means that the first point of Definition 5 is satisfied. Points 6, 8, and 10 of Property 1 correspond to the satisfaction of point 4 of Definition 6, while points 7, 9, and 11 mean that point 2 of Definition 5 is unsatisfied. Point 12 of Property 1 corresponds to the satisfaction of point 5 of Definition 6, while point 13 corresponds to (the unsatisfied) point 3 of Definition 5. Point 14 of Property 1 corresponds to the satisfaction of point 6 of Definition 6, while point 15 corresponds to (the unsatisfied) points 4–5 of Definition 5. Point 16 corresponds to the satisfaction of point 7 of Definition 6. Finally, point 17 corresponds to (the unsatisfied) point 6 of Definition 5. ∎

Property 2 (Termination up to N). Let N be the maximum number of nested layers of cryptographic functions that the verification engine will examine. Assuming that the defined data types have a finite number of nested layers, for a finite N the proof process never gets into an infinite loop.

Proof: The verification engine performs resolution steps between the goals and the rules in Rulesets, as well as the (action) facts in Architecture. If there were an infinite loop in the proof process, then there would be an infinite number of resolution steps. We show that the number of resolution steps is always finite during the proof of initgoal.

A resolution step goal ∘_(goal, head of rule) rule, where rule ∈ {P8, P9, P10}, yields the two new (sub-)goals in the tail of the rule (e.g., goal ∘ P8 = HAS(EV, Senc(θV, K))σ, HAS(EV, K)σ). Since the verification engine does not prove or examine any goal with more than N layers of cryptographic functions (e.g., HAS(sp, Senc(Senc(…Mac(name, key)…, key), key))), there are at most N recursive calls of the resolution step goal ∘_(goal, head of rule) rule with rule ∈ {P8, P9, P10}. Each recursive call produces two (sub-)goals; hence, N recursive calls result in at most 2N (sub-)goals to be proved. In the worst case, this means 2N·|Rulesets| resolution steps (one for each goal and rule pair, where |Rulesets| is the number of rules in Rulesets).

If rule is one of P3–P7 or P15–P18, a resolution step goal ∘_(goal, head of rule) rule generates a single goal (e.g., goal ∘_(goal, head of P4) P4 = RECEIVEAT(EV, Data, Time(TT))σ). The resulting (sub-)goals are then checked against the facts in Architecture, which yields |Architecture| + 1 resolution steps for each rule (where |Architecture| is the number of elements in Architecture).

If rule is one of D1–D5 or rule ∈ {P1, P11}, 2·|Architecture| + 1 resolution steps are carried out. For rule ∈ {P2, P12, P13, P14}, a step goal ∘_(goal, head of rule) rule generates a single (sub-)goal. The (sub-)goals are then checked against the rule set (Rulesets), including rules P3–P7 (or P15–P18), which yields 2·|Architecture| + 1 resolution steps in each case. In addition, checking these (sub-)goals against P8–P10 yields 2N·|Rulesets| resolution steps in each case.

If rule is one of L1–L8, a resolution step goal ∘_(goal, head of rule) rule generates two (sub-)goals. Each (sub-)goal is examined against every rule in Rulesets, but a resolution step can only be successful in the case of P3–P10. The resolution with each of these rules results in a finite number of further resolution steps (as argued above). Similarly, the case of U1–U8 only yields a finite number of resolution steps. ∎

The completeness property can be stated as a consequence of the termination property (Property 2), as follows:

Property 3 (Completeness). If all the data types specified in Architecture contain at most N layers of nested cryptographic functions, for some finite N, and all the defined data types contain a finite number of layers of other data types, then:
1. If initgoal ∈ {HAS(E, θ), HASUPTO(E, θ, Time(dd))} and E ∈ $\pi_\theta.\pi_{has}$ at the policy level, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy.
2. If initgoal ∈ $G^{\theta}_{link}$ and (E, θ) ∈ $\pi_\theta.\pi_{link}$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, Architecture does not functionally conform with the policy.
3. If initgoal ∈ $G^{\theta}_{ccons} \cup G^{\theta}_{ucons} \cup G^{\theta}_{scons} \cup G^{\theta}_{fwcons}$, and $\pi_{col}.cons$ = Y, $\pi_{use}.cons$ = Y, $\pi_{str}.cons$ = Y, or $\pi_{fw}.cons$ = Y in $\pi_\theta$, respectively, then the architecture does not DPR conform with the policy whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0.
4. If initgoal = CPURPOSE(θ, cp) (i.e., initgoal ∈ $G^{\theta}_{cpurp}$) and (cp: θ ∈ $\pi_{col}.cpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy.
5. If initgoal = UPURPOSE(θ, up) (i.e., initgoal ∈ $G^{\theta}_{upurp}$) and (up: θ ∈ $\pi_{use}.upurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy.
6. If initgoal = FWPURPOSE(θ, fwp) (i.e., initgoal ∈ $G^{\theta}_{fwpurp}$) and (fwp: θ ∈ $\pi_{fw}.fwpurp$), then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy. (point 4 of Definition 4)
7. If initgoal ∈ {STORE(E, θ, EV_from), STOREAT(E, θ, EV_from, Time(TT))} (i.e., initgoal ∈ $G^{\theta}_{places}$) and E ∈ $\pi_{str}.where$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy. (point 5 of Definition 4)
8. If initgoal = DELETEWITHIN(E, θ, EV_from, Time(dd)) (i.e., initgoal ∈ $G^{\theta}_{within}$), dd ≤ $\pi_{del}.deld$, and E ∈ $\pi_{del}.fromwhere$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy. (point 5 of Definition 3)
9. If initgoal = RECEIVE(E, θ, EV_from) (initgoal ∈ $G^{\theta}_{fwto}$) and E ∈ $\pi_{fw}.fwto$, then whenever ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, the architecture does not functionally conform with the policy. (point 7 of Definition 4)
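Read together, Properties 1 and 3 fix how a result of ConformanceCheck should be interpreted for a possession or link goal, depending on whether the policy permits it (E ∈ π_θ.π_has, or the pair is in π_θ.π_link). A small Python sketch of that reading follows; the function name and the verdict strings are hypothetical. The last branch (a forbidden goal that cannot be proved) is not stated explicitly in either property, so it is phrased only as "no violation found up to N", which is what the completeness argument supports under its assumptions.

```python
def interpret(proved: int, permitted: bool) -> str:
    """Hypothetical reading of ConformanceCheck for HAS/HASUPTO and LINK goals.

    proved    -- result of ConformanceCheck (1 = proof found, 0 = not found)
    permitted -- True if the policy allows the goal (e.g. E in pi_has)
    """
    if permitted:
        # Property 1, points 1 and 3 / Property 3, points 1 and 2
        return "functionally conforms" if proved == 1 else "does not functionally conform"
    # Property 1, points 2 and 4: a provable forbidden goal is a privacy violation
    return "does not privacy conform" if proved == 1 else "no violation found up to N"


pi_has = {"sp"}  # hypothetical policy: only sp may have the data type
for entity, proved in [("sp", 1), ("sp", 0), ("auth", 1), ("auth", 0)]:
    print(entity, proved, "->", interpret(proved, entity in pi_has))
```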

Property 3 says that completeness can only be "achieved" up to the maximum allowed number of nested layers of cryptographic functions, N.

Proof: If ConformanceCheck(initgoal, Architecture, Rulesets, N) == 0, then initgoal cannot be proved by any fact in Architecture, provided that all facts in Architecture contain at most N nested layers of the functions Senc, Aenc, and Mac, and a finite number of nested layers of other data types. The latter assumption is required for a resolution step to be successful, while the former is required for the verification to terminate. Otherwise, if there were a set of facts in Architecture that could be used to prove initgoal, then there would be a derivation tree, meaning that ConformanceCheck(initgoal, Architecture, Rulesets, N) == 1.

Therefore, point 1 of Property 3 does not satisfy the first point of Definition 6. Similarly, point 2 of Property 3 does not satisfy the second point of Definition 6. Point 3 of Property 3 does not satisfy the first point of Definition 5. Points 4–6 of Property 3 correspond to point 4 of Definition 6. Points 7, 8, and 9 of Property 3 correspond to points 5, 6, and 7 of Definition 6, respectively. ∎

As most of the laws and articles in the GDPR are complex, formally specifying them without simplification is either cumbersome or impossible. In this paper, we attempt to capture some basic requirements in an abstract way. There are several ways to improve or extend the proposed formal specifications. For instance, depending on the context of a consent (e.g., healthcare or education), a consent may contain different pieces of information that need to be modelled. Furthermore, our languages do specify the deletion of a consent, but only when the data itself is deleted (see the last rule in Figure 8). A more detailed study of the consent revocation process can be addressed in the future, for example, when the collected data has not been deleted yet but the consent for transfer has been revoked. This could be addressed by changing the last rule in Figure 8 so that only the consent in question is deleted.

There are also areas to improve regarding the transfer sub-policy. For example, the GDPR covers the case when personal data is transferred to a third country or an international organisation, where an appropriate agreement and arrangement must be made prior to the data transfer [41]. This agreement could be specified in the form of a sticky policy between a service provider and an international organisation. Sticky policies are used in PPL [17] to match the expectations of a client with the obligations offered by a service provider.
Regarding the deletion sub-policy, under the GDPR the data subject also has the right to request the deletion of their collected data. This can be modelled with an event/action that captures the reception of a deletion request (e.g., recvdelreq(θ, place, t)) and a corresponding deletion event within a specified delay. Finally, transparency is also an important part of the GDPR, as it captures the "right to be informed", which can be defined by a "notify" event/action that happens before data collection, usage, storage, and transfer.

To capture the (collection, usage, or transfer) purposes, for simplicity, the architecture language proposed in this paper relies on only two basic actions, create and calculate. In the same way, additional actions can be added to specify purposes such as "send some type of data" (defined by send: θ, such as send: bill) or "notify about some type of data" (e.g., notify: θ, such as notify: energyconsumption).

Besides the simplified data protection requirements, the strength of our approach lies in the data possession and data connection policies, as well as their automated verification. Although verification at the policy and architecture levels seems simpler than verifying program code, it is still important to detect design flaws at these higher levels. Manual and informal reasoning can be error-prone, especially when there are many complex data types and entities in the system.

DataProVe is written in Python and is available for download from GitHub and its website. After launching the tool, the default page shown in Figure 13 appears, where the user can specify a system architecture. DataProVe supports two types of components: main components and sub-components. Main components can represent an entire organisation, a system, or entities that consist of several smaller components, such as a service provider, a customer, or an authority (trusted third-party organisation).
Sub-components are elements of a main component; for example, a service provider can have a server, a panel, or a storage place. A main component usually has access to the data handled by its own sub-components, but this is not always the case. For instance, two main components can share a sub-component, with only one of them having access to its data. This can happen, for example, when a service provider operates a device of a trusted third party but does not have free access to the content of the data stored inside the device.

In the first version of DataProVe (v0.9), main components are represented by rectangular shapes, while sub-components are represented by circles. Examples can be seen in Figures 14–17. In this report, we use the terms entity and component interchangeably, because the term entity has been used in our theoretical papers, while the tool mostly uses the term component. They refer to the same thing in our context.

https://github.com/vinhgithub83/DataProVe
https://sites.google.com/view/dataprove/

(…) recvdmsg1, which denotes that the server receives a message called msg1. Its content (depicted in Figure 20) says that sp can receive a reading that contains the energy consumption (energy) and the customer ID (custID).

At the architecture level, we distinguish entities/components, actions, and data, where actions specify what an entity/component can do with a piece of data (it may not eventually perform this action during a low-level system run, but there are instances of the system run in which this action happens), except for DELETEWITHIN, as we will see later.

Figure 15: Adding a new main component of size 200x50.
Figure 16: Adding a new sub-component with a radius of 20.
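The distinction between including a sub-component and having access to its data can be captured with a small data structure. The following Python sketch is hypothetical (the class, field, and function names, and the component name ttp, are not DataProVe's) and models the shared-device case described above: both the service provider and a trusted third party include the device, but only the third party can read its data.

```python
from dataclasses import dataclass, field


@dataclass
class MainComponent:
    """Hypothetical model of a main component (not DataProVe's internal structure)."""
    name: str
    subcomponents: set = field(default_factory=set)  # sub-components it includes or operates
    access: set = field(default_factory=set)         # sub-components whose data it can read


def readers(sub: str, mains: list) -> set:
    # Including a sub-component does not by itself grant access to its data
    return {m.name for m in mains if sub in m.access}


sp = MainComponent("sp", {"server", "device"}, {"server"})
ttp = MainComponent("ttp", {"device"}, {"device"})
print(readers("device", [sp, ttp]))  # {'ttp'}
print(readers("server", [sp, ttp]))  # {'sp'}
```

Keeping the access relation separate from the containment relation is what allows the shared-device scenario to be expressed at all.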
Based on the definition of actions and architectures in Figure 5, we propose their corresponding formats that can be given in the text boxes/text editor of DataProVe. Actions are words/strings of all capital letters, and DataProVe supports the actions "OWN", "RECEIVE", "RECEIVEAT", "CREATE", "CREATEAT", "CALCULATE", "CALCULATEAT", "STORE", "STOREAT", "DELETE" and "DELETEWITHIN". The syntax of each action in DataProVe is as follows.

Note: no space character is allowed when specifying the actions in the bullet points below.

The reserved/pre-defined keywords are highlighted in bold, while the non-bold text can be freely defined by the user:

• OWN(component,Datatype): This action defines that a component (e.g., sp, auth, server, meter, etc.) can own a piece of data of type Datatype. For example, OWN(server,spkey) says that the server can own a piece of data of type service provider key (spkey).

Figure 17: Choosing the color for a component.
Figure 18: Specify which main component has access to the data in which sub-component (sp has access to server and meter, while auth has access to meter and socialmediapage).

• RECEIVE(component,Datatype): This action defines that a component can receive a piece of data of type Datatype. For example, RECEIVE(server,Sicknessrecord(name,insurancenumber)) says that the server can receive a sickness record that contains a piece of data of type name and a piece of data of type insurance number.

• RECEIVEAT(component,Datatype,Time(t)): This action is similar to the previous one, except that here we also need to define the time when the data can be received. Since at the architecture level we do not intend to specify a concrete time value, the generic time construct, denoted by the keyword Time(t), specifies that the component can receive a piece of data of type Datatype at some (not specific) time t. RECEIVEAT is used to define when a consent (Cconsent(Datatype), Uconsent(Datatype), Sconsent(Datatype), Fwconsent(Datatype,component)) is received.

Figure 19: Draw an arrow from the component meter to server.
Figure 20: Specify the message content of recvdmsg1 (through the action RECEIVE).

• CREATE(component,Datatype): This action defines that a component can create a piece of data of type Datatype. For instance, CREATE(sp,Account(name,address,phone)) defines that a service provider sp can create an account that contains three pieces of data of types name, address and phone number.

• CREATEAT(component,Datatype,Time(t)): This action defines that a component can create a piece of data of type Datatype at some (not specific) time t. For example, CREATEAT(sp,Account(name,address,phone),Time(t)).

• CALCULATE(component,Datatype): This action defines that a component can calculate a piece of data of type Datatype. For instance, CALCULATE(sp,Bill(energyconsumption)) defines that a service provider sp can calculate a bill using a piece of data of type energy consumption.

• CALCULATEAT(component,Datatype,Time(t)): This action defines that a component can calculate a piece of data of type Datatype at some (not specific) time t. For example, CALCULATEAT(sp,Bill(energyconsumption),Time(t)).

• STORE(storageplace,Datatype): This action defines that a service provider can store a piece of data of type Datatype in storageplace, where storageplace can be mainstorage or backupstorage. These reserved keywords define a collection of storage place(s) that can be seen as “main” storage or “backup” storage, respectively. For example, STORE(mainstorage,Account(name,address,phone)) defines that a service provider can store an account that contains a name, address and phone number in its main storage place(s).

• STOREAT(storageplace,Datatype,Time(t)): This action defines that a component can store a piece of data of type Datatype in the place(s) storageplace at some (not specific) time t. For example, STOREAT(mainstorage,Account(name,address,phone),Time(t)) defines that an account with a name, address and phone number can be stored in the main storage of the service provider at some time t.

• DELETE(storageplace,Datatype): The action delete is closely related to the action store, as it defines that a piece of data of type Datatype can be deleted from storageplace. For example, DELETE(mainstorage,Account(name,address,phone)) captures that a service provider can delete an account containing a name, address and phone number from its main storage place(s).

• DELETEWITHIN(storageplace,Datatype,Time(tvalue)): This action captures that, once the data is stored, a component must delete a piece of data of type Datatype within the given time value tvalue (tvalue is a data type for time values). Unlike the non-specific Time(t), which is a predefined construct, tvalue is defined by the user and takes specific time values such as 3 years or 2 years 6 months. For example, DELETEWITHIN(mainstorage,Account(name,address,phone),Time(2y)) defines that the service provider must delete an account from its main storage within 2 years.

COMPONENTS/ENTITY

A component can be specified by a string of all lower-case letters; for example, a service provider can be specified by sp, or a third-party authority by auth (obviously, they can be specified with any other string). DataProVe supports some pre-defined or reserved components/entities, such as sp, trusted, mainstorage and backupstorage.
• sp: this reserved keyword defines a service provider. DataProVe only allows a single service provider at a time (in the specification of a policy and architecture).
• trusted: this reserved keyword defines a trusted authority that is able to link a pseudonym to the corresponding real name.
• mainstorage: this reserved keyword defines the collection of main storage places of a service provider.
• backupstorage: this reserved keyword defines the collection of backup storage places of a service provider.
Note: An entity/component is always defined as the first argument of an action.

DataProVe supports two groups of data types, the so-called compound data types and simple data types.

• Simple data types do not have any arguments, and they are specified by strings of all lower-case letters, without any space or special character. Example simple data types include name, address, phonenumber, nhsnumber, etc.

• Compound data types have arguments, and they are specified by strings that start with a capital letter followed by lower-case letters (again without any space or special character). For example, Account(name,address,phone) is a compound data type that contains three simple data types as arguments. Another example compound data type is Hospitalrecord(name,address,insurance). Any similar compound data types can be defined by the user. We note that the space character is not allowed in compound data types. Nested compound data types are compound data types that contain other compound data types. For instance, Hospitalrec(Sicknessrec(name,disease),address,insurance) captures a hospital record that contains a sickness record (of a name and a disease), an address and, finally, an insurance number.

Note: The first version of DataProVe (v0.9) supports three layers of nested data types.
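The naming rules above can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not DataProVe's own parser: it reads a data type, or an action such as RECEIVE(server,Sicknessrecord(name,insurancenumber)), into a tree, rejects space characters, checks the lower-case/capitalised naming conventions, and counts nesting layers. The names Term, parse_term, check_datatype, depth and parse_action are hypothetical; how DataProVe counts its three layers is not stated here (the sketch assumes one layer per compound constructor on the deepest path), and time values such as Time(10y+6mo) are outside its scope.

```python
import re
from dataclasses import dataclass

# Action keywords listed in this section.
ACTIONS = {"OWN", "RECEIVE", "RECEIVEAT", "CREATE", "CREATEAT", "CALCULATE",
           "CALCULATEAT", "STORE", "STOREAT", "DELETE", "DELETEWITHIN"}
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")
SIMPLE = re.compile(r"[a-z][a-z0-9]*")    # e.g. name, nhsnumber, spkey1
COMPOUND = re.compile(r"[A-Z][a-z0-9]*")  # e.g. Account, Hospitalrec


@dataclass(frozen=True)
class Term:
    name: str
    args: tuple = ()  # nested Term objects; empty for simple data types


def parse_term(text: str) -> Term:
    """Parse e.g. 'Account(name,address)' into a Term tree (no spaces allowed)."""
    if any(ch.isspace() for ch in text):
        raise ValueError("space characters are not allowed")
    term, pos = _parse(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return term


def _parse(text: str, pos: int) -> tuple:
    match = NAME.match(text, pos)
    if not match:
        raise ValueError(f"expected a name at position {pos}")
    name, pos = match.group(), match.end()
    if pos == len(text) or text[pos] != "(":
        return Term(name), pos  # a name without arguments
    args, pos = [], pos + 1
    while True:
        arg, pos = _parse(text, pos)
        args.append(arg)
        if pos == len(text):
            raise ValueError("missing ')'")
        if text[pos] == ")":
            return Term(name, tuple(args)), pos + 1
        if text[pos] != ",":
            raise ValueError(f"expected ',' or ')' at position {pos}")
        pos += 1


def check_datatype(term: Term) -> None:
    """Simple types: all lower case; compound types: a capital, then lower case."""
    pattern = COMPOUND if term.args else SIMPLE
    if not pattern.fullmatch(term.name):
        raise ValueError(f"invalid data type name: {term.name}")
    for arg in term.args:
        check_datatype(arg)


def depth(term: Term) -> int:
    """Compound constructors on the deepest path (assumed layer count)."""
    return 1 + max(depth(a) for a in term.args) if term.args else 0


def parse_action(text: str) -> Term:
    term = parse_term(text)
    if term.name not in ACTIONS:
        raise ValueError(f"unsupported action: {term.name}")
    return term


rec = parse_term("Hospitalrec(Sicknessrec(name,disease),address,insurance)")
check_datatype(rec)
print(depth(rec))  # 2
act = parse_action("RECEIVE(server,Sicknessrecord(name,insurancenumber))")
print(act.args[0].name)  # server
```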

DataProVe has pre-defined or reserved data types, such as:
• The types of consents: Cconsent(Datatype), Uconsent(Datatype), Sconsent(Datatype) and Fwconsent(Datatype,component). We do not differentiate among the different consent formats; a consent can be, e.g., a written consent, an online consent form, or some other format.

– Cconsent(Datatype): This is a type of collection consent on a piece of data of type Datatype. For example, Cconsent(illness) and Cconsent(Account(creditcard,address)) capture the collection consent on the illness information and on the account containing a credit card number and address, respectively.
– Uconsent(Datatype): A type of usage consent on a piece of data of type Datatype. For example, Uconsent(Energy(gas,water,electricity)), Uconsent(address).
– Sconsent(Datatype): A type of storage consent on a piece of data of type Datatype. For example, Sconsent(personalinfo) and Sconsent(Account(creditcard,address)) define the types of storage consent on a type of personal information and on an account, respectively.
– Fwconsent(Datatype,component): A type of forward/transfer consent on a piece of data of type Datatype, and a component to whom the data is forwarded/transferred. For example, Fwconsent(personalinfo,auth) and Fwconsent(Account(creditcard,address),auth) define the types of forward consent on the type of personal information and on the account, respectively, as well as a third-party authority (auth) to which the given data is forwarded.
• The types of time and time value: Time(t) or Time(tvalue), where Time() is a time data type, while the pre-defined special keyword t denotes a type of non-specific time, and tvalue is a type of time value (such as 5 years, 2 hours, 1 minute, etc.). tvalue is a (recursive) type of the form

tvalue ::= y | mo | w | d | h | m | numtvalue | tvalue + tvalue

where y specifies a year, mo a month, w a week, d a day, h an hour and m a minute. Further, numtvalue denotes a number (num) written directly before a tvalue; for example, if num = 3 and tvalue = y, then numtvalue is 3y (i.e. 3 years). Additional examples include tvalue = 5y + 2mo + 1d + 5m. It is important to note that Time(tvalue) can only be used in the action DELETEWITHIN, while the actions RECEIVEAT, CREATEAT, CALCULATEAT and STOREAT must contain the non-specific time Time(t). For example, the actions
– DELETEWITHIN(sp,mainstorage,Webpage(photo,job),Time(10y+6mo)): Any webpage must be deleted from the main storage of the service provider within 10 years and 6 months.
– RECEIVEAT(sp,Cconsent(illness),Time(t)): The service provider can receive a collection consent on illness information at some non-specific time t.
– RECEIVEAT(sp,Uconsent(Webpage(photo,job)),Time(t)): The service provider can receive a usage consent on a webpage at some non-specific time t.
– STOREAT(sp,backupstorage,Webpage(photo,job),Time(t)): The service provider can store a webpage in its backup storage places at some non-specific time t.
– CREATEAT(server,Account(name,address),Time(t)): The server can create an account that contains a name and address at some non-specific time t.
– CALCULATEAT(sp,Bill(tariff,Energy(gas,water,electricity)),Time(t)): The service provider can calculate a bill from a tariff and the gas, water and electricity consumption at some non-specific time t.
• The type of metadata and meta values: Meta(Datatype). This data type defines the type of metadata (information about other data), or information located in the header of packets. Meta information often travels through a network without any encryption or protection, which may pose a privacy concern. Careful policy and system design is necessary to avoid privacy breaches caused by the analysis of metadata or header information.

Note: Meta(Datatype) is always defined as the last argument in a piece of data.

Example applications of metadata include:
– RECEIVE(sp,Sicknessrec(name,disease,Meta(ip))): This action defines that the service provider can receive a packet containing a name and disease, where the packet also includes the metadata IP address of the sender computer. We note that this syntax is simplified in that it aims to eliminate the complexity of nested data types. Specifically, it abstracts away from the definition of a so-called packet data type, and is an “abbreviation” of the lengthier RECEIVE(sp,Packet(Sicknessrec(name,disease),Meta(ip))).
– RECEIVE(sp,Sicknessrec(name,disease,Meta(Enc(ip,k)))): This action is similar to the previous one, but now the metadata IP address is encrypted with a key k.
– RECEIVEAT(sp,Sicknessrec(name,disease,Meta(ip)),Time(t)): This action is similar to the first one, but it includes the time data type at the end. It defines that the service provider receives the sickness record along with the IP address of the sender device at some non-specific time t.
Obviously, any metadata can be defined instead of the IP address in the examples above.
• The type of pseudonymous data: P(Datatype | component). This data type defines the type of pseudonymous data, for example, a pseudonym. The argument can be either a data type or a component. A pseudonym is a means of achieving a certain degree of privacy in practice, as the real identity/name and the pseudonym can only be linked by a so-called trusted authority. DataProVe also captures this property, namely, only the component trusted can link the pseudonym to the real name/identity. For example:
– RECEIVE(sp,Sicknessrec(P(name),disease)): This action defines that a service provider can receive a sickness record, but this time the name in the record is not the real name but a pseudonym; hence, the service provider cannot link a real name to a disease.
– RECEIVE(trusted,Sicknessrec(P(name),disease)): This is similar to the previous case, but the trusted authority can receive the sickness record instead of the service provider.
– RECEIVE(sp,Sicknessrec(P(name),disease,Meta(ip))): Again, this is similar to the first case, but with metadata.
– RECEIVEAT(sp,Sicknessrec(name,disease,Meta(ip)),Time(t)): This is similar to the previous case, but also includes the time data type.
Note: The component argument would be supported in versions above v0.9. In version 0.9, DataProVe reserves the keyword ds (all lower-case letters) for the data subject, and the user can define P(ds) to specify that the real data subject/identity has been pseudonymised.
• The types of cryptographic primitives and operations: DataProVe supports the basic cryptographic primitives for the architecture. Again, we provide the reserved keywords in bold.
– Private key: Sk(Pkeytype): This data type defines the type of private key used in asymmetric encryption algorithms. Its argument has a type of public key (Pkeytype). We note that public key is not a reserved data type.
– Symmetric encryption: Senc(Datatype,Keytype): This is the type of the ciphertext resulting from a symmetric encryption, and has two arguments, a piece of data and a symmetric key (Keytype). For example,
∗ RECEIVE(sp,Senc(Account(name,address),key)): This specifies that a service provider can receive a symmetric key encryption of an account using a key of type key.
∗ RECEIVE(sp,Senc(Account(Senc(name,key),address),key)): This specifies that a service provider can receive a symmetric key encryption of an account that contains another encryption of a name, using a key of type key.
∗ OWN(sp,key): This specifies that a service provider can own a key of type key.
– Asymmetric encryption: Aenc(Datatype,Pkeytype): This is the type of the ciphertext resulting from an asymmetric encryption, and has two arguments, a piece of data and a public key (Pkeytype). For example,
∗ RECEIVE(sp,Aenc(Account(name,address),pkey)): This specifies that a service provider can receive an asymmetric key encryption of an account using a public key of type pkey.
∗ CALCULATE(sp,Sk(pkey)): This specifies that a service provider can calculate the private key corresponding to the public key (of type pkey).
∗ OWN(sp,pkey): This specifies that a service provider can own a public key of type pkey.
– Message authentication code (MAC): Mac(Datatype,Keytype): This is the type of the message authentication code that has two arguments, a piece of data and a symmetric key (Keytype). For example,
∗ RECEIVE(sp,Mac(Account(name,address),key)): This specifies that a service provider can receive a message authentication code of an account using a key of type key.
– Cryptographic hash: Hash(Datatype): This is the type of the cryptographic hash that has only one argument, a piece of data. For example,
∗ RECEIVE(server,Hash(password)): This specifies that a server can receive a hash of a password.
∗ STORE(sp,mainstorage,Hash(password)): This specifies that a service provider can store a hash of a password in its main storage place(s).
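To illustrate what these primitives mean for the data an entity can have, the following Python sketch computes a simple Dolev-Yao-style closure: record fields and metadata are readable, Senc contents are readable with the symmetric key, Aenc contents with the matching Sk(...), while Mac, Hash, pseudonyms P(...) and Sk(...) are treated as opaque. This is a simplified stand-in written for this explanation, not DataProVe's resolution-based verification engine; the tuple encoding of terms and the function name derive are assumptions. The usage lines reuse the two messages of the second example presented later in this report.

```python
# Terms are encoded (hypothetically) as nested tuples ("Constructor", arg, ...);
# simple data types and keys are plain strings.
OPAQUE = {"Mac", "Hash", "P", "Sk"}  # treated as atomic: nothing is extracted


def derive(knowledge: set) -> set:
    """Fixed point of what an entity can extract from the data it can have."""
    known = set(knowledge)
    changed = True
    while changed:
        changed = False
        for term in list(known):
            if not isinstance(term, tuple):
                continue  # a simple data type or key has nothing inside it
            head, *args = term
            if head == "Senc":    # Senc(data, key): readable if the key is known
                parts = [args[0]] if args[1] in known else []
            elif head == "Aenc":  # Aenc(data, pkey): readable with Sk(pkey)
                parts = [args[0]] if ("Sk", args[1]) in known else []
            elif head in OPAQUE:
                parts = []
            else:                 # records, Meta(...), etc.: fields are readable
                parts = args
            for part in parts:
                if part not in known:
                    known.add(part)
                    changed = True
    return known


# Second example: two messages encrypted with keys that sp owns.
msg1 = ("Senc", ("Sicknessrecord", "nhsnumber", "name", ("Meta", "ip")), "spkey1")
msg2 = ("Senc", ("Socprofile", "photo", "address", ("Meta", "ip")), "spkey2")
sp_knows = derive({msg1, msg2, "spkey1", "spkey2"})
print({"nhsnumber", "name", "photo", "address", "ip"} <= sp_knows)  # True
print("photo" in derive({msg1, msg2, "spkey1"}))                     # False
```

Without spkey2, the social profile stays encrypted, so photo and address are not derived; with both keys, the shared Meta(ip) ends up next to the data of both records.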

On the data protection policy specification page, we can define a high-level data protection policy (as shown in Figure 21).

Figure 21: The Policy Specification Page.

The policy page has three parts. The top part is used to specify the entities/components in the system, such as an authority, a client, etc. On the left side, the user is expected to provide a short notation, and on the right side, the full name/description to help identify the meaning of the notation. For instance, in Figure 21, the notation is auth, and the description is third party authority. After a new entity is added, it appears in the drop-down option menu in the bottom part. Note that the entity sp (service provider) is a pre-defined entity that is added by default (hence, the user does not need to add it). The user can specify any other entities.

The middle part of the policy specification page is for defining the data groups and data types. As shown in Figure 22, the user can define a group of data types; for instance, a data group denoted by personalinfo is defined, which includes four data types: name, address, dateofbirth and phonenumber. The option menu in the middle (called “IS THIS UNIQUE”) asks the user whether the data group together with its data types can be used to uniquely identify an individual. For instance, a name alone cannot be used to uniquely identify an individual, but a name together with an address, date of birth and phone number can, so the option “Yes” was chosen. Another example is shown in Figure 23, with the data group called energy (referring to energy consumption) and its data types, gas, water and electricity consumption. This data group together with its types cannot be used to uniquely identify an individual; hence, the option “No” was chosen.

Figure 22: Specifying data groups (personalinfo) and its data types.
Figure 23: Specifying data groups (energy) and its data types.

Based on the syntax of the policy language given in Section 3.1, we follow the seven sub-policies. However, to avoid confusion, here we divide the last sub-policy, the data connection policy, into two categories, the data connection permit and data connection forbid policies. In the first one, the user can specify which data links they allow, while in the second one, which data links they forbid. A data protection policy is defined on a data group/type and an entity. In DataProVe, each policy consists of eight sub-policies, to achieve a fine-grained requirement specification (Figure 24). The users do not have to define all eight sub-policies, but they can if necessary. Both the policies and the architectures can be saved, and opened later to be modified or extended.

Figure 24: The Policy Specification Page (entities).
Figure 25: The Policy Specification Page (data groups).

The first five sub-policies (from collection to transfer) are defined only from the service provider's perspective. The remaining three sub-policies (data possession and the two data connection policies) can be specified from any entity's perspective.

Figure 26: The Policy Specification Page (choosing among data types).

The eight sub-policies are data collection, data usage, data storage, data retention, data transfer, data possession and the two data connection sub-policies. Below we only highlight four sub-policies; for the remaining four, readers are referred to the full manual in the GitHub repository (https://github.com/vinhgithub83/DataProVe).

The data collection sub-policy:

In the data collection sub-policy window, for a given entity and data group, the user can specify whether consent must be collected when the selected entity collects the selected data group (Y for Yes/N for No), and then specify the collection purposes.

Figure 27: The data collection sub-policy.

The collection purposes can be given row by row, each row with a different action, in the format:

action1:data1,data2,...,data_n

The data possession sub-policy:

The data possession sub-policy defines who can have/possess a piece of data of a given group. The users only need to specify who is allowed to have or possess a given data group; DataProVe automatically assumes that the remaining entities/components are not allowed to have/possess the selected type of data.

Figure 28: The data possession sub-policy.
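The default-deny reading of the possession sub-policy can be sketched in a few lines of Python. Here allowed maps each data group to the entities listed in the policy, and can_have is assumed to be produced by an earlier analysis of the architecture (for instance, a closure over received and owned data). The function name and the example groups are hypothetical.

```python
def possession_violations(allowed: dict[str, set[str]],
                          can_have: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Default deny: any (entity, group) pair not explicitly allowed is reported.

    allowed:  data group -> entities the policy allows to possess it
    can_have: entity -> data groups the architecture lets it possess
    """
    return sorted(
        (entity, group)
        for entity, groups in can_have.items()
        for group in groups
        if entity not in allowed.get(group, set())
    )


policy = {"energy": {"sp", "auth"}, "personalinfo": {"sp"}}
arch = {"sp": {"energy", "personalinfo"}, "auth": {"energy", "personalinfo"}}
print(possession_violations(policy, arch))  # [('auth', 'personalinfo')]
```

A group that does not appear in the policy at all is forbidden to every entity, which mirrors the statement that unlisted entities are not allowed to have the data.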

The data connection permitted sub-policy:

This sub-policy specifies which entity is permitted to connect or link two types/groups of data. In the second drop-down option menu, the user can further specify whether the selected entity is permitted to link two pieces of data uniquely, meaning that it will be able to deduce that the two pieces of data belong to the same individual. For example, in Figure 29, we specified that the service provider is permitted to link the data group energy and the data group personalinfo. However, we do not allow the service provider to uniquely link the two data groups. Obviously, if personalinfo is defined as unique, then a unique link becomes possible, so there is a chance that the architecture always violates this requirement of the policy.

The data connection forbidden sub-policy:

This sub-policy is the counterpart of the permitted policy. In the case of the data possession policy, the user only needs to specify which entity is allowed to have or possess a certain type of data, and DataProVe automatically assumes that the rest are not allowed. Here, by contrast, the user needs to explicitly specify which pairs of data types/groups an entity is forbidden to link together. For example, in Figure 30, we forbid the third-party authority from being able to link the data group personalinfo with the data group energy. Here, we forbid only the unique linkability of these two data groups for the third-party authority. If we choose “No” (Figure 31), then any ability to link two pieces of data of the given data groups is forbidden (not just a unique link). Hence, this option is stricter than the previous one.

Figure 29: The data connection permission sub-policy.
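The difference between forbidding only a unique link and forbidding any link can be expressed as follows. In this hypothetical Python sketch, each forbid rule carries a flag unique_only (the stricter “No” option corresponds to unique_only=False), and links stands for the links an architecture allows an entity to establish, assumed to be computed elsewhere. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForbidLink:
    entity: str
    groups: frozenset  # the pair of data groups
    unique_only: bool  # True: only a unique link is forbidden; False ("No"): any link


def link_violations(rules, links):
    """links: (entity, group_a, group_b, is_unique) tuples the architecture allows."""
    found = []
    for entity, a, b, is_unique in links:
        for rule in rules:
            if rule.entity == entity and rule.groups == frozenset((a, b)):
                # A non-unique link only breaks the stricter "any link" rule.
                if is_unique or not rule.unique_only:
                    found.append((entity, a, b, is_unique))
                    break  # report each link at most once
    return found


unique_rule = ForbidLink("auth", frozenset({"personalinfo", "energy"}), unique_only=True)
any_rule = ForbidLink("auth", frozenset({"personalinfo", "energy"}), unique_only=False)
weak_link = [("auth", "energy", "personalinfo", False)]
print(link_violations([unique_rule], weak_link))  # []
print(link_violations([any_rule], weak_link))     # [('auth', 'energy', 'personalinfo', False)]
```

Using a frozenset for the pair makes the rule independent of the order in which the two data groups are named.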

We define three types of conformance, namely, functional conformance, privacy conformance and the so-called DPR conformance.

The functional conformance captures whether an architecture is functionally conforming with the specified policy. Namely:
1. If in the policy we allow an entity to be able to have a piece of data of a certain data type/group, then in the architecture the same entity can have a piece of data of the same type/group.
2. If in the policy we allow an entity to be able to link/uniquely link two pieces of data of certain types/groups, then in the architecture the same entity can link/uniquely link two pieces of data of the same types/groups.
3. If in the policy the (collection, usage, storage, transfer) consent collection is not required for a piece of data of a given type/group, then in the architecture there is no consent collection.
4. If in the policy we define
(a) a storage option “Main and Backup Storage” for a piece of data of a certain type/group, then in the architecture there is a STORE or STOREAT action defined for both mainstorage and backupstorage, for the same data type/group;
(b) a storage option “Only Main Storage”, then in the architecture there is a STORE or STOREAT action defined for only mainstorage, for the same data type/group.
5. If in the policy we allow a piece of data of a certain type/group, data, to be transferred to an entity ent, then in the architecture there is RECEIVEAT(ent,data,Time(t)) or RECEIVE(ent,data).

Figure 30: The data connection permission sub-policy. The case when only unique link is forbidden.

The functional conformance is violated if:
1. In the policy, we allow an entity to be able to have a piece of data of a certain data type/group, but in the architecture the same entity cannot have a piece of data of the same type/group.
2. In the policy, we allow an entity to be able to link/uniquely link two pieces of data of certain types/groups, but in the architecture the same entity cannot link/uniquely link two pieces of data of the same types/groups.
3. In the policy, the (collection, usage, storage, transfer) consent collection is not required for a piece of data of a given type/group, but in the architecture there is a consent collection, namely, an action
• RECEIVEAT(sp,Cconsent(data),Time(t)), or
• RECEIVEAT(sp,Sconsent(data),Time(t)), or
• RECEIVEAT(sp,Uconsent(data),Time(t)), or
• RECEIVEAT(third,Fwconsent(data,third),Time(t)).
4. In the policy, we define
(a) a storage option “Main and Backup Storage” for a piece of data of a certain type/group, but in the architecture there is a STORE or STOREAT action defined for only either mainstorage or backupstorage, or no store action defined at all, for the same data type/group;
(b) a storage option “Only Main Storage”, but in the architecture there is no STORE or STOREAT action defined at all for the same data type/group.
5. In the policy, we allow a piece of data of a certain type/group, data, to be transferred to an entity ent, but in the architecture there is no RECEIVEAT(ent,data,Time(t)) or RECEIVE(ent,data) defined (i.e., data is not transferred to the entity ent).

Figure 31: The data connection permission sub-policy.
Figure 32: To verify the conformance between the specified system architecture and policy.
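Items 4 and 5 of the list above reduce to set comparisons once the relevant actions have been collected. The Python sketch below assumes that store_places and receivers have already been extracted from the STORE/STOREAT and RECEIVE/RECEIVEAT actions on one data type/group; the function name and messages are illustrative, the option strings are the labels used on the policy page, and items 1 to 3 are not covered.

```python
def functional_violations(storage_option: str, store_places: set[str],
                          allowed_recipients: set[str], receivers: set[str]) -> list[str]:
    """Check items 4 and 5 of the functional violation list for one data type/group.

    store_places: places named in STORE/STOREAT actions on the data
    receivers:    entities named in RECEIVE/RECEIVEAT actions on the data
    """
    found = []
    if storage_option == "Main and Backup Storage":
        if not {"mainstorage", "backupstorage"} <= store_places:
            found.append("4(a): not stored in both mainstorage and backupstorage")
    elif storage_option == "Only Main Storage":
        if not store_places:
            found.append("4(b): no STORE or STOREAT action for the data")
    # Item 5: every recipient allowed by the policy must actually receive the data.
    for ent in sorted(allowed_recipients - receivers):
        found.append(f"5: data is never transferred to {ent}")
    return found


print(functional_violations("Main and Backup Storage", {"mainstorage"}, {"auth"}, set()))
```

Item 4(b) follows the violation list literally: with “Only Main Storage”, only the complete absence of a store action is reported here, while storing elsewhere is left to the DPR conformance check.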

The privacy conformance captures whether an architecture satisfies the privacy requirements defined in the policy. Namely:
1. If in the policy we forbid an entity to be able to have or possess a piece of data of a certain type/group, then in the architecture the same entity cannot have or possess a piece of data of the same type/group.
2. If in the policy we forbid an entity to be able to link/uniquely link two pieces of data of certain types/groups, then in the architecture the same entity cannot link/uniquely link two pieces of data of the same types/groups.

The privacy conformance is violated if:
1. In the policy, we forbid an entity to be able to have or possess a piece of data of a certain type/group, but in the architecture the same entity is able to have or possess a piece of data of the same type/group.
2. In the policy, we forbid an entity to be able to link/uniquely link two pieces of data of certain types/groups, but in the architecture the same entity is able to link/uniquely link two pieces of data of the same types/groups.

The DPR conformance captures whether an architecture satisfies the data protection requirements defined in the policy. Namely:
1. If in the policy the (collection, usage, storage, transfer) consent collection is required for a piece of data of a given type/group, then in the architecture there is a collection of the corresponding consent.
2. If in the policy we define a (collection, usage, storage) purpose action:data for a piece of data of a certain type/group, then in the architecture the action action is defined on the compound data type data.
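The purpose check of item 2 can be sketched by reading purpose rows in the format action1:data1,data2,...,data_n introduced for the collection sub-policy. This Python sketch assumes that each row permits action on each listed data item and that the architecture's actions have been pre-extracted as (action, data) pairs; because it splits on commas, it only handles simple data names, not compound types such as Account(name,address). The function names are hypothetical.

```python
def parse_purposes(text: str) -> set[tuple[str, str]]:
    """Read rows 'action:data1,...,data_n'; each row permits `action` on each listed data."""
    allowed = set()
    for row in text.splitlines():
        row = row.strip()
        if not row:
            continue
        action, sep, data = row.partition(":")
        if not sep or not action or not data:
            raise ValueError(f"malformed purpose row: {row!r}")
        allowed |= {(action.lower(), item) for item in data.split(",") if item}
    return allowed


def purpose_violations(allowed: set, arch_actions: set) -> tuple[list, list]:
    """Return (purposes missing from the architecture, extra actions on covered data)."""
    covered = {data for _, data in allowed}
    missing = sorted(allowed - arch_actions)
    extra = sorted(p for p in arch_actions - allowed if p[1] in covered)
    return missing, extra


allowed = parse_purposes("calculate:bill\nnotify:energyconsumption")
arch_actions = {("calculate", "bill"), ("send", "bill")}
missing, extra = purpose_violations(allowed, arch_actions)
print(missing)  # [('notify', 'energyconsumption')]
print(extra)    # [('send', 'bill')]
```

The two returned lists correspond to the two ways item 2 can fail: a declared purpose that the architecture never performs, and an additional action on the same data that the policy does not allow.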

The DPR conformance is violated if:
1. In the policy, the (collection, usage, storage, transfer) consent collection is required for a piece of data of a given type/group, but in the architecture there is no collection of the corresponding consent.
2. In the policy, we define a (collection, usage, storage) purpose action:data for a piece of data of a certain type/group, but in the architecture the action action is not defined on the compound data type data, or, besides action, other actions that are not allowed in the policy are also defined on data in the architecture.
3. In the policy, we define
(a) a storage option “Main and Backup Storage” for a piece of data of a certain type/group, but in the architecture there is a STORE or STOREAT action defined for some storage place different from mainstorage and backupstorage, for the same data type/group;
(b) a storage option “Only Main Storage”, but in the architecture there is a STORE or STOREAT action defined for some storage place different from mainstorage, for the same data type/group.
4. In the policy, we define
(a) a deletion option “From Main and Backup Storage” for a piece of data of a certain data type/group, data, but in the architecture there is none of the actions
• DELETE(mainstorage,data), or
• DELETEWITHIN(mainstorage,data,Time(tvalue)), or
• DELETE(backupstorage,data), or
• DELETEWITHIN(backupstorage,data,Time(tvalue));
(b) a deletion option “Only From Main Storage” for a piece of data of a certain data type/group, data, but in the architecture there is no action DELETE(mainstorage,data) or DELETEWITHIN(mainstorage,data,Time(tvalue)).
5. In the policy, we allow a piece of data of a certain type/group, data, to be transferred to an entity ent, but in the architecture there is also an action RECEIVEAT(ent1,data,Time(t)) or RECEIVE(ent1,data) defined for some ent1 to whom we do not allow data transfer in the policy.
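The deletion checks above involve Time(tvalue) values, and comparing a DELETEWITHIN deadline against a retention delay (as in the first example below, 10 years against 8 years) requires turning tvalue strings into comparable durations. The following Python sketch parses the tvalue grammar given earlier. The conversion factors (a month of 30 days, a year of 365 days) are assumptions, since the report does not fix them, and the function names are hypothetical.

```python
import re

# Assumed unit lengths in minutes; the report does not fix month or year lengths.
MINUTES = {"m": 1, "h": 60, "d": 24 * 60, "w": 7 * 24 * 60,
           "mo": 30 * 24 * 60, "y": 365 * 24 * 60}
# One summand of the grammar: an optional number, then a unit ("mo" tried before "m").
SUMMAND = re.compile(r"(\d*)(mo|y|w|d|h|m)")


def tvalue_minutes(tvalue: str) -> int:
    """Convert a tvalue such as '10y+6mo' into minutes."""
    total = 0
    for part in tvalue.replace(" ", "").split("+"):
        match = SUMMAND.fullmatch(part)
        if not match:
            raise ValueError(f"invalid tvalue summand: {part!r}")
        count = int(match.group(1)) if match.group(1) else 1  # a bare 'y' means 1y
        total += count * MINUTES[match.group(2)]
    return total


def exceeds_retention(delete_within: str, retention_delay: str) -> bool:
    """True if DELETEWITHIN lets data stay longer than the policy's retention delay."""
    return tvalue_minutes(delete_within) > tvalue_minutes(retention_delay)


print(exceeds_retention("10y", "8y"))  # True: 10 years exceed the 8-year delay
print(tvalue_minutes("5y+2mo+1d+5m"))  # 2715845
```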

In this section, we highlight the operation of DataProVe using two very simple examples.

In this example, in the policy we specify a data group (a group of data types) called personalinfo ,which is stored centrally at the main storage places of the service provider. In the storage sub-policy, we also set that storage consent is required before the storage of personalinfo . Finally, wedo not give service provider (sp) the right to have the data of group/type personalinfo . In thedeletion policy, we set the retention delay in the main storage to 8 years (i.e. 8y in Figure 33).In the architecture level, we add an action that says a piece of data of type personalinfo mustbe deleted from the main storage within 10 years (action DELETEWITHIN, in the last line).51igure 33: We set that the data of type/group personal information must be deleted from themain storage places of the service provider within 8 year.Content of spmessages : RECEIVEAT ( sp , Sconsent (personalinfo),

Time (t))Content of storagemessages : RECEIVEAT ( mainstorage ,personalinfo, Time (t))Content of storemain : STOREAT ( mainstorage ,personalinfo, Time (t))Content of deletion : DELETEWITHIN ( mainstorage ,personalinfo, Time (10y)).In the architecture shown in Figure 34, the service provider (sp) can receive a storage consentfor personalinfo at some non-speciﬁc time t . The main storage places of sp can receive the data atsome non-speciﬁc time and store it. The data of this type/group is deleted within 10 years fromthe main storage places.As a veriﬁcation result (Figure 35), we got that the architecture violates the privacy con-formance, as the architecture allows for sp to have the data of type personalinfo after 8 years,however, in the policy we set it to only 8 years. In the last line of the veriﬁcation result window,we can also see a DPR conformance property, namely, sp collects storage consent before the datais stored. In the second simple example, we focus on the data possession and data connection sub-policies.We present the receive action with the Meta construct (metadata or "packet" header data such asIP address, source, destination addresses, etc.).In the policy, we deﬁne four data groups, nhsnumber (National Health Service number), name,photo, and address (see Figure 36).Then, we forbid (any kind of link-ability, not only unique link) for the service provider to beable to link two pieces of data of types nhsnumber, and photo (see Figure 37). Again, we alsoforbid for the service provider to be able to have all the four data types/groups.52igure 34: The service provider (sp) stores the personal information in its main storage places.Figure 35: The veriﬁcation results show the violation of the privacy and DPR conformance prop-erties. 
We also got the ﬁrst two lines of DPR conformance because in this example, we did notspecify the collection and usage sub-policies (we left them blank).Figure 36: The policy level with the four data types/groups.53igure 37: The speciﬁed data connection sub-policy for example 2.In the architecture, a service provider collects data from two phone applications (Figure 38).The "HealthXYZ" app sends the service provider a sickness record with a public IP address (anunique IP of a phone) other app, called, "SocialXYZ" also sends the social proﬁle with the sameip address (same phone). Both data types are encrypted (using symmetric key encryption) withthe service provider keys (and sp owns the two keys).Figure 38: The speciﬁed architecture for example 2.54ontent of spmessage1 in Figure 38:

RECEIVE(sp, Senc(Sicknessrecord(nhsnumber, name, Meta(ip)), spkey1))

Content of spmessage2 in Figure 38:

RECEIVE(sp, Senc(Socprofile(photo, address, Meta(ip)), spkey2))

Content of spowned in Figure 38:

OWN(sp, spkey1) OWN(sp, spkey2)

As a result (Figure 39), we found that the service provider is not only able to link the data of type nhsnumber with the data of type photo, but also has all the data of types nhsnumber, name, photo and address. The reason is that sp owns both keys, so it can decrypt both messages, have the data inside them, and link the two records through the shared IP address. Note that we only have linkability, not a unique link: because the apps can be used by different people in one family, the set of possible individuals can be narrowed down, but sp cannot be sure that nhsnumber and photo belong to the same individual.

Figure 39: The verification result for example 2.

We addressed the problem of formal specification and automated verification of data protection requirements at the policy and architecture levels. Specifically, we proposed a variant of policy and architecture languages to specify a simple set of data protection requirements based on the GDPR. In addition, we proposed DataProVe, a tool based on the syntax of our languages and a logic-based verification engine, to check the conformance between a policy and an architecture. Our language variants and tool only cover a limited set of data protection requirements in an abstract way, so there are many possibilities to extend and improve their syntax and semantics to specify more complex laws. Regarding the conformance check of the privacy properties (the right to have and link data), a possible extension would be to include the behaviour of hostile attackers (e.g., stealing personal data) in the verification. Finally, we plan to improve the effectiveness of the conformance check algorithm for data types with a large number of nested layers.

References

[1] General Data Protection Regulation (GDPR). Article 4. https://gdpr-info.eu/art-4-gdpr/

[2] Erika McCallister, Tim Grance, Karen Scarfone. Guide to Protecting the Confidentiality of Personally Identifiable Information (PII).
National Institute of Standards and Technology, US Department of Commerce, SP 800-122, 1995.

[3] Karen Kullo. Facebook sued over alleged scanning of private messages. Bloomberg, 2 January 2014.

[4] Samuel Gibbs. Belgium takes Facebook to court over privacy breaches and user tracking. The Guardian, 15 June 2015.

[5] Sean Buckley. Deleting Google Photos won't stop your phone from uploading pictures. Engadget.com, 13 July 2015.

[6] Facebook and Cambridge Analytica: What You Need to Know as Fallout Widens. The New York Times, 19 March 2018.

[7] Google faces UK suit over alleged snooping on iPhone users. Financial Times, 30 November 2017.

[8] General Data Protection Regulation (GDPR). Article 25. https://gdpr-info.eu/art-25-gdpr/

[9] General Data Protection Regulation (GDPR). Article 6. https://gdpr-info.eu/art-6-gdpr/

[10] The Platform for Privacy Preferences. P3P, 2012.

[11] The Platform for Privacy Preferences (P3P). APPEL 1.0, 2012.

[12] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. XPref: a preference language for P3P. Computer Networks, 48(5):809–827, 2005. Web Security.

[13] Kathy Bohrer and Bobby Holland. Customer Profile Exchange (CPExchange) Specification Version 1.0, 2000. http://xml.coverpages.org/cpexchangev1_0F.pdf

[14] OASIS Open. Extensible Access Control Markup Language (XACML) Version 3.0, 2017. http://docs.oasis-open.org/xacml/3.0/errata01/os/xacml-3.0-core-spec-errata01-os.html

[15] P. Ashley, S. Hada, G. Karjoth, C. Powers, and M. Schunter. Enterprise Privacy Authorization Language (EPAL 1.2), 2000.

[16] Monir Azraoui, Kaoutar Elkhiyaoui, Melek Önen, Karin Bernsmed, Anderson Santana De Oliveira, and Jakub Sendor. A-PPL: An accountability policy language. In Joaquin Garcia-Alfaro, Jordi Herrera-Joancomartí, Emil Lupu, Joachim Posegga, Alessandro Aldini, Fabio Martinelli, and Neeraj Suri, editors, Data Privacy Management, Autonomous Spontaneous Security, and Security Assurance, pages 319–326, Cham, 2015. Springer International Publishing.

[17] S. Trabelsi, Akram Njeh, Laurent Bussard, and Gregory Neven. PPL engine: A symmetric architecture for privacy policy handling. W3C Workshop on Privacy and Data Usage Control, pages 1–5, April 2010.

[18] J. Lobo, R. Bhatia, and S. Naqvi. A policy description language. In Proceedings of the 16th National Conference on Artificial Intelligence, AAAI-99, pages 291–298, Orlando, USA, 1999. ACM.

[19] R. S. Sandhu, E. J. Coyne, H. L. Feinstein, and C. E. Youman. Role-based access control models. IEEE Computer, 29(2):38–47, 1996.

[20] Sushil Jajodia, Pierangela Samarati, and V. S. Subrahmanian. A logical language for expressing authorizations. In Proceedings of the 1997 IEEE Symposium on Security and Privacy, SP'97, pages 31–46, Washington, DC, USA, 1997. IEEE Computer Society.

[21] Nicodemos Damianou, Naranker Dulay, Emil Lupu, and Morris Sloman. The Ponder policy specification language. In Proceedings of the International Workshop on Policies for Distributed Systems and Networks, POLICY '01, pages 18–38, London, UK, 2001. Springer-Verlag.

[22] Lalana Kagal, Tim Finin, and Anupam Joshi. A policy language for a pervasive computing environment. In Proceedings of the 4th IEEE International Workshop on Policies for Distributed Systems and Networks, POLICY '03, pages 63–, Washington, DC, USA, 2003. IEEE Computer Society.

[23] Jeff Magee, Naranker Dulay, Susan Eisenbach, and Jeff Kramer. Specifying distributed software architectures. In Wilhelm Schäfer and Pere Botella, editors, Software Engineering — ESEC '95, pages 137–153, Berlin, Heidelberg, 1995. Springer Berlin Heidelberg.

[24] A calculus of mobile processes, I. Information and Computation, 100(1):1–40, 1992.

[25] Robert Allen and David Garlan. A formal basis for architectural connection. ACM Transactions on Software Engineering and Methodology, 6(3):213–249, July 1997.

[26] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8):666–677, August 1978.

[27] D. C. Luckham and J. Vera. An event-based architecture definition language. IEEE Transactions on Software Engineering, 21(9):717–734, 1995.

[28] F. Plasil, D. Balek, and R. Janecek. SOFA/DCUP: architecture for component trading and dynamic updating. In Proceedings. Fourth International Conference on Configurable Distributed Systems (Cat. No.98EX159), pages 43–51, 1998.

[29] F. Plasil and S. Visnovsky. Behavior protocols for software components. IEEE Transactions on Software Engineering, 28(11):1056–1076, 2002.

[30] R. B. Franca, J. Bodeveix, M. Filali, J. Rolland, D. Chemouil, and D. Thomas. The AADL behaviour annex – experiments and roadmap. In , pages 377–382, 2007.

[31] J. Perez, I. Ramos, J. Jaen, P. Letelier, and E. Navarro. PRISMA: towards quality, aspect-oriented and dynamic software architectures. In Third International Conference on Quality Software, 2003. Proceedings., pages 59–66, 2003.

[32] Valérie Issarny, Amel Bennaceur, and Yérom-David Bromberg. Middleware-Layer Connector Synthesis: Beyond State of the Art in Middleware Interoperability, pages 217–255. Springer Berlin Heidelberg, Berlin, Heidelberg, 2011.

[33] Amelia Bădică and Costin Bădică. FSP and FLTL framework for specification and verification of middle-agents. Int. J. Appl. Math. Comput. Sci., 21(1):9–25, March 2011.

[34] Vinh-Thong Ta, Denis Butin, and Daniel Le Métayer. Formal accountability for biometric surveillance: A case study. In Bettina Berendt, Thomas Engel, Demosthenes Ikonomou, Daniel Le Métayer, and Stefan Schiffner, editors, Privacy Technologies and Policy, pages 21–37, Cham, 2016. Springer International Publishing.

[35] Denis Butin and Daniel Le Métayer. Log Analysis for Data Protection Accountability. In , volume 8442 of Lecture Notes in Computer Science, pages 163–178. Springer, 2014.

[36] Vinh-Thong Ta and Thibaud Antignac. Privacy by design: On the conformance between protocols and architectures. In Frédéric Cuppens, Joaquin Garcia-Alfaro, Nur Zincir Heywood, and Philip W. L. Fong, editors, Foundations and Practice of Security, pages 65–81, Cham, 2015. Springer International Publishing.

[37] Thibaud Antignac and Daniel Le Métayer. Privacy architectures: Reasoning about data minimisation and integrity. In Sjouke Mauw and Christian Damsgaard Jensen, editors, Security and Trust Management, pages 17–32, Cham, 2014. Springer International Publishing.

[38] General Data Protection Regulation (GDPR). Article 5. https://gdpr-info.eu/art-5-gdpr/

[39] General Data Protection Regulation (GDPR). Article 30. https://gdpr-info.eu/art-30-gdpr/

[40] General Data Protection Regulation (GDPR). Article 17. https://gdpr-info.eu/art-17-gdpr/

[41] General Data Protection Regulation (GDPR). Article 46. https://gdpr-info.eu/art-46-gdpr/