Fuzzi: A Three-Level Logic for Differential Privacy
Hengchu Zhang, Edo Roth, Andreas Haeberlen, Benjamin C. Pierce, Aaron Roth
Curators of sensitive datasets sometimes need to know whether queries against the data are differentially private [Dwork et al. 2006]. Two sorts of logics have been proposed for checking this property: (1) type systems and other static analyses, which fully automate straightforward reasoning with concepts like "program sensitivity" and "privacy loss," and (2) full-blown program logics such as apRHL (an approximate, probabilistic, relational Hoare logic) [Barthe et al. 2016], which support more flexible reasoning about subtle privacy-preserving algorithmic techniques but offer only minimal automation.

We propose a three-level logic for differential privacy in an imperative setting and present a prototype implementation called Fuzzi. Fuzzi's lowest level is a general-purpose logic; its middle level is apRHL; and its top level is a novel sensitivity logic adapted from the linear-logic-inspired type system of Fuzz, a differentially private functional language [Reed and Pierce 2010]. The key novelty is a high degree of integration between the sensitivity logic and the two lower-level logics: the judgments and proofs of the sensitivity logic can be easily translated into apRHL; conversely, privacy properties of key algorithmic building blocks can be proved manually in apRHL and the base logic, then packaged up as typing rules that can be applied by a checker for the sensitivity logic to automatically construct privacy proofs for composite programs of arbitrary size.

We demonstrate Fuzzi's utility by implementing four different private machine-learning algorithms and showing that Fuzzi's checker is able to derive tight sensitivity bounds.

Additional Key Words and Phrases: Differential privacy, typechecking, static analysis, apRHL, Fuzz, Fuzzi
ACM Reference Format:
Hengchu Zhang, Edo Roth, Andreas Haeberlen, Benjamin C. Pierce, and Aaron Roth. 2019. Fuzzi: A Three-Level Logic for Differential Privacy.
Proc. ACM Program. Lang.
1, 1 (June 2019), 42 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
Authors' addresses: Hengchu Zhang, University of Pennsylvania, USA, [email protected]; Edo Roth, University of Pennsylvania, USA, [email protected]; Andreas Haeberlen, University of Pennsylvania, USA, [email protected]; Benjamin C. Pierce, University of Pennsylvania, USA, [email protected]; Aaron Roth, University of Pennsylvania, USA, [email protected].

Differential privacy [Dwork et al. 2006] has become the gold standard for privacy-preserving statistical analysis in the academic community, and it is being adopted by a growing number of industry and government organizations, including Apple [Apple 2017], Google [Erlingsson et al. 2014], Microsoft [Microsoft 2017], and the US Census Bureau [N. Dajani et al. 2017]. Differential privacy makes minimal assumptions about an adversary's knowledge, allowing analysts to quantitatively estimate privacy loss. However, the reasoning needed to correctly achieve differential privacy can be rather subtle, as multiple errors in published algorithms attest [Chen and Machanavajjhala 2015; Lyu et al. 2016].

Barthe et al. [2016] developed the first program logic for formalizing proofs of differential privacy for imperative programs, called apRHL (Approximate, Probabilistic, Relational Hoare Logic). The abstractions provided by apRHL are expressive enough to capture the essence of many complex differentially private algorithms, allowing experts to prove differential privacy for small, tricky code sequences at a fairly high level of abstraction. However, proving differential privacy in apRHL for large programs can be a rather tedious endeavor.
Fortunately, in many proofs for larger private data analysis programs (either in apRHL or on paper), the expert knowledge of differential privacy is concentrated in the analysis of small differentially private subroutines, while the rest of the proof basically just propagates "sensitivity" information and aggregate privacy costs between subroutines. This suggests that one could considerably increase the range of possible use cases, especially for analysts who are not privacy experts, by combining a small but extensible set of building blocks (and the corresponding manual proofs) with a largely automated analysis that mechanically completes the proof for a given program.

To enable this approach, we build a new layer of abstraction over apRHL to automate the mechanical parts of this process. This layer tracks sensitivities of program variables and privacy costs of commands using Hoare-triple-style proof rules. This information about sensitivity and privacy cost has a direct translation to lower-level apRHL assertions, which allows information in the higher-level logic to seamlessly interact with expert proofs of differential privacy that have been carried out using the two lower layers. Since the top layer is entirely automated, we will often refer to it as a type system (and to its proof rules as typing rules).

We use the term mechanisms to refer to building blocks of differentially private programs. Many differentially private mechanisms can be viewed as parameterized program templates, where the differential privacy properties depend on properties of the instantiation parameters, which can themselves be program expressions or commands. In order to integrate expert reasoning about such mechanisms, we develop a framework for expressing program templates and the corresponding parameterized proofs of differential privacy.
This allows experts to extend the sensitivity type system with a specialized typing rule for each template, allowing non-expert programmers to write application programs that combine these templates in straightforward ways. This framework uses apRHL directly to give structured proofs of privacy properties, while using the general-purpose base logic to establish lower-level semantic properties that go beyond the capabilities of apRHL.

We instantiate these ideas in the design and implementation of Fuzzi, a small imperative language for differentially private queries with automatic and extensible typechecking. Following a brief review of technical background on differential privacy (Section 2) and a high-level overview of Fuzzi's design (Section 3), we offer the following contributions:

(1) We propose a high-level sensitivity logic for tracking differential privacy (Section 4). This logic is expressive enough to capture detailed sensitivity properties for a simple imperative core language; its soundness is established via a straightforward embedding into apRHL.

(2) We show how to connect manual proofs of privacy properties of algorithmic building blocks to the sensitivity logic, and we develop proofs for several mechanisms that transform private datasets, plus a mechanism that aggregates privacy costs better than straightforward composition (Section 5).

(3) Using a prototype implementation of Fuzzi (Section 6), we implement private machine learning algorithms from four different classes of learning methods (discriminative models, ensemble models, generative models, and instance-based learning) and show that Fuzzi's checker is able to derive tight sensitivity bounds (Section 7).
Section 8 discusses limitations of the current design. Sections 9 and 10 survey related and future work.
Differential privacy is an indistinguishability property of randomized programs on neighboring input datasets. Informally, a function is differentially private if removing or adding a single row in the input results in at most a small change in the output distribution.

Definition 1 (Neighboring Dataset). Two datasets are neighbors if one can be transformed into the other by adding or removing a single row of data.
Let D and D′ be two neighboring datasets, and let f be a randomized program. The output of f is a sample from some distribution parameterized by the input dataset. We write f(D) and f(D′) for these two distributions.

Definition 2 ((ϵ, δ)-Differential Privacy [Dwork et al. 2006]). The program f is (ϵ, δ)-differentially private if, for any set of possible outputs E, the probabilities of observing an output in E satisfy the relation Pr_{x ∼ f(D)}[x ∈ E] ≤ e^ϵ · Pr_{x ∼ f(D′)}[x ∈ E] + δ.

The parameters ϵ and δ quantify different aspects of the privacy cost of a differentially private computation. Informally, the value of ϵ measures the ability of an observer to distinguish whether f was run with D or D′ after observing an output in E in the "common case", while δ serves as an upper bound on the probability that f fails to provide the privacy guarantee implied by ϵ. The parameter ϵ is typically taken to be a small constant (say, 1), whereas δ must be set so that δ ≪ 1/n, where n is the number of dataset rows, in order for the privacy guarantees to be non-trivial (otherwise an algorithm that outputs a dataset row uniformly at random satisfies (0, δ)-differential privacy).

The notion of sensitivity is crucial to differential privacy. Many differentially private mechanisms release data by adding noise proportional to the sensitivity of a private value. In Fuzzi, the term "sensitivity" specifically refers to an upper bound on the distance between the values held by some variable across any two runs.

Distance may be calculated differently for values of different types. For primitive values of type int and real, distance is the magnitude of the two values' difference. However, for arrays, there are two important distance definitions for differential privacy: database distance and
L1 distance. The database distance measures the number of rows that need to be added or removed in order to make two datasets indistinguishable up to permutation, while the L1 distance measures the sum of element-wise distances between vectors. To avoid confusion in later discussions, we refer to arrays whose distance is intended to be measured as database distance as bags, and to arrays with L1 distance as vectors. When we come to defining the type system, we will write {τ} for the type of bags holding values of type τ and [τ] for vectors of τ (Figure 1). As an example, the two arrays [1, 2, 5] and [1, 3, 4] have vector distance 2, but they have bag distance 4, since we need to remove elements 2 and 5 and add elements 3 and 4 to the first bag in order to make it a permutation of the second one.

Database distance can actually be viewed as just L1 distance on a different representation of datasets. The differential privacy literature sometimes uses the "histogram representation" for datasets. For a universe of possible elements U, the histogram representation maps each x ∈ U to a count of how many times x appears, and the L1 distance of this representation corresponds to the database distance. However, in order to keep Fuzzi's semantics minimal as a core language, we choose to represent datasets as arrays, rather than maps from records to counts.

σ ::= int | real | bool
τ ::= σ | [τ] | {τ}
op ::= + | − | · | / | && | || | < | ≤ | > | ≥ | ¬
e, i ::= x | lit | e op e | e[i] | e.length
c ::= x = e | x[i] = e | x.length = e | x = L_b(e) | if e then c else c end | while e do c end | c; c

Fig. 1. Core Language Syntax
Formally, we write d_τ for the distance function at type τ, with type τ × τ → ℝ⁺ ∪ {∞}; i.e., it maps two values of type τ to a non-negative real number or infinity.

Definition 3 (Vector distance). If a₁ and a₂ are two vectors of the same length whose elements have type τ, then the vector distance is defined as d_[τ](a₁, a₂) = Σ_{i=0}^{L−1} d_τ(a₁[i], a₂[i]), where L is the length of both vectors. Vectors of different lengths are assigned the distance ∞.

Definition 4 (Bag distance).
Let a₁ and a₂ be two bags; their distance is defined as d_{τ}(a₁, a₂) = |a₁ \ a₂| + |a₂ \ a₁|. The backslash operator is multiset difference. Note that bag distance (unlike vector distance) is meaningful for bags of different sizes. This is the same database distance introduced earlier.
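To make the two metrics concrete, the following Python sketch computes vector and bag distance for collections of numbers. The function names, and the use of absolute difference as the element-level distance, are our choices for illustration, not part of Fuzzi.

```python
from collections import Counter

def vector_distance(a1, a2):
    # L1 distance (Definition 3): sum of element-wise distances;
    # vectors of different lengths are infinitely far apart.
    if len(a1) != len(a2):
        return float("inf")
    return sum(abs(x - y) for x, y in zip(a1, a2))

def bag_distance(a1, a2):
    # Bag distance (Definition 4): |a1 \ a2| + |a2 \ a1|, where \ is
    # multiset difference. Counter subtraction keeps only positive counts.
    c1, c2 = Counter(a1), Counter(a2)
    return sum(((c1 - c2) + (c2 - c1)).values())
```

Note how the two metrics disagree: `vector_distance([1, 2, 5], [1, 3, 4])` is 2, while `bag_distance([1, 2, 5], [1, 3, 4])` is 4, and two bags with the same elements in a different order are at bag distance 0.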
The Laplace mechanism is an essential tool for releasing private data with bounded sensitivities [Dwork et al. 2006; Dwork and Roth 2014]. Fuzzi provides access to the Laplace mechanism through the sampling assignment command x = L_b(e), which adds noise to the value of e and assigns the result to the variable x, with the constant literal b determining the scale of the noise. Adding noise scaled with b to a value with sensitivity s incurs a privacy cost of (s/b, 0). Fuzzi's type system will statically keep track of each usage of the Laplace mechanism and report an upper bound on the total privacy cost as part of a program's type.

The core of Fuzzi is a simple imperative programming language with while loops, conditionals, and assignments (Figure 1). It has just a few built-in data types: reals, integers, booleans, and arrays (whose elements can be reals, integers, booleans, or nested arrays). Programs can modify the length of arrays through assignments of the form x.length = e. When the value of e is less than the current length of x, the array is truncated; when the value of e is greater than the length of x, the array is padded with default values of the same data type as the elements of x. If e evaluates to a negative number, the length assignment diverges.

One slightly unusual feature of Fuzzi is that all assignments are copying assignments, including assignments to array variables. For example, if x holds an array value, then the assignment y = x sets y to a copy of x, instead of making both x and y point to the same underlying array. We make this choice to avoid reasoning about sharing, which we consider out of scope for this work.

The special command x = L_b(e) performs probabilistic assignment to x by sampling from a Laplace distribution centered at the value of e, with width equal to the value of b (which must be a real-valued literal).
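As an informal model of the sampling command, the sketch below draws Laplace noise by standard inverse-CDF sampling and computes the (s/b, 0) cost of a release. The function names are ours, and this is an illustration rather than Fuzzi's actual (discretized) implementation.

```python
import math
import random

def laplace_sample(center, width_b, rng=None):
    # Inverse-CDF sampling from a Laplace distribution with mean `center`
    # and scale `width_b`; models the effect of Fuzzi's x = L_b(e).
    rng = rng or random.Random()
    u = rng.random() - 0.5          # u in [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return center - width_b * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_cost(sensitivity_s, width_b):
    # Releasing an s-sensitive value with noise of scale b costs (s/b, 0).
    return (sensitivity_s / width_b, 0.0)
```

For example, adding noise of scale 1.0 to a 1-sensitive value costs (1.0, 0), while scale 2000.0 on a 1000-sensitive value costs (0.5, 0).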
From now on, we refer to the sensitivity logic as a type system to emphasize that it is a specialized and automated layer that tracks sensitivity and privacy cost as types. The main data structure manipulated by Fuzzi's type system is typing contexts: maps from program variables to sensitivities, represented as non-negative reals extended with infinity.

ϕ ::= r ∈ ℝ≥0 ∪ {∞}
τ ::= σ | [τ] | {τ}
Γ ::= ∅ | x :_ϕ τ, Γ

A typing context Γ should be interpreted as a relation between two different run-time stores: intuitively, the stores arising from two "neighboring" executions starting from neighboring initial states. For each variable x with sensitivity Γ(x), the values of x in the two stores must be no more than Γ(x) apart.

Typing judgments for commands have the form {Γ} c {Γ′, (ϵ, δ)}, meaning that, if the distances between the values in two run-time stores are described by the initial typing context Γ, then executing c in both stores will either diverge in both or terminate in both, with the two final stores described by Γ′, along the way incurring a privacy cost of (ϵ, δ).

For example, the typing rule for commands of the form x = e computes the sensitivity of e using the sensitivities of its free variables, and maps x to this sensitivity in the output context.

Assign:
  Γ ⊢ e ∈_s τ
  ⟹ {Γ} x = e {Γ[x ↦ s], (0, 0)}

The typechecker also computes the privacy cost (ϵ, δ) incurred by the analyzed command. In the case of assignment, no privacy cost is incurred, so the output from the typechecker after processing x = e is the updated typing context Γ[x ↦ s] and the privacy cost (0, 0), where s is the derived sensitivity of e under Γ.

A more interesting typing rule is the one for sequencing commands of the form c₁; c₂. This rule chains together the typing judgments for the two commands, using the output context Γᵢ from analyzing cᵢ as the input context for processing the next command c_{i+1}.
The privacy cost incurred by the whole program is the sum of the privacy costs (ϵᵢ, δᵢ) incurred by each cᵢ, following the "simple composition theorem" for differential privacy [Dwork et al. 2006].

Sequence:
  {Γ₀} c₁ {Γ₁, (ϵ₁, δ₁)}    {Γ₁} c₂ {Γ₂, (ϵ₂, δ₂)}
  ⟹ {Γ₀} c₁; c₂ {Γ₂, (ϵ₁ + ϵ₂, δ₁ + δ₂)}

There are also core typing rules for simple loops and conditionals that do not branch on sensitive data; we will see these in Section 4.1.
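The cost arithmetic in the Sequence rule is plain componentwise addition; a minimal sketch (the function name is ours):

```python
def compose_costs(costs):
    # Simple composition: a sequence of commands with individual costs
    # (eps_i, delta_i) incurs (sum of eps_i, sum of delta_i) overall.
    total_eps = sum(eps for eps, _ in costs)
    total_delta = sum(delta for _, delta in costs)
    return (total_eps, total_delta)
```

For instance, running a (1.0, 0) command followed by a (0.5, 0) command incurs (1.5, 0) in total.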
The privacy properties of interesting differentially private mechanisms are generally too subtle to be tracked by the core type system. In Fuzzi, such mechanisms can be defined as extensions and equipped with specialized typing rules whose soundness is proved manually. Such proofs typically involve reasoning about relational properties of distributions, as well as aggregating privacy costs. The program logic apRHL is tailored to tackle both problems, making it a good choice for rigorous manual proofs of differential privacy.

An apRHL judgment has the form ⊢ c₁ ∼_{(ϵ,δ)} c₂ : Φ ⇒ Ψ, where c₁ and c₂ are two commands to be related, and Φ and Ψ are relational assertions that state pre- and post-conditions relating the program states before and after executing c₁ and c₂. A sound apRHL judgment can be roughly interpreted as: if (1) some pair of program states satisfies the pre-condition Φ, and (2) executing c₁ in the first state terminates iff executing c₂ in the second state does, then the pair of states
after executing c₁ and c₂ will satisfy the post-condition Ψ, incurring privacy cost (ϵ, δ). To express differential privacy as a post-condition, we can simply state out⟨1⟩ = out⟨2⟩ for the output of the programs.

Given a typing context Γ, we can interpret Γ as a relation on program states. Writing ⟨1⟩ and ⟨2⟩ after variables to refer to their values in the first or second execution, we translate x :_σ τ ∈ Γ to the assertion d_τ(x⟨1⟩, x⟨2⟩) ≤ σ; the conjunction of all these pointwise distance assertions forms an apRHL assertion that corresponds to Γ.

Conversely, to connect manual proofs in apRHL with the type system, we phrase their premises and conclusions as typing judgments. Indeed, we use apRHL not only for extensions but also for the soundness proofs of the core typing rules. As a result, the privacy proofs implicitly constructed by the typechecker are combinations of apRHL proof objects, some of them generated by the typechecker, others written manually by experts.

To give a first taste of Fuzzi's differential privacy typechecking process, we present a simple program that computes a private approximation of the average income of a group through private estimates of the group's size and sum. First, we estimate the group's size with the Laplace mechanism.

size = L_{1.0}(group.length);

Assuming that group is a dataset with sensitivity 1, Fuzzi's typechecker deduces that its size is 1-sensitive. Applying the Laplace mechanism then incurs a (1.0, 0)-privacy cost.

Next, we sum the group's incomes using the mechanism bsum, pronounced "bag sum", which clips each income value so that its magnitude is at most a given constant (here 1000).

bsum(group, sum, i, temp, 1000);

This clipping step ensures the sum does not vary too much on neighboring datasets.
Without clipping, a single outlier could sway the sum substantially, revealing the outlier's existence and violating differential privacy. The parameters sum, i, and temp specify the names of variables that bsum can use for internal purposes. It is the programmer's responsibility to make sure they do not clash with variables used elsewhere in the program. (It should not be hard to fix this infelicity by making extensions themselves deal with fresh variable generation, but doing so would introduce a few additional technicalities, so we leave it for future work.)

The command bsum(...) refers to an extension that expands to a sequence of plain core-language commands implementing summation of a bag of numbers with clipping:

extension bsum(in, out, idx, t_in, bound) {
  idx = 0;
  out = 0.0;
  while idx < in.length do
    t_in = in[idx];
    if t_in < -1.0 * bound then
      out = out - bound;
    else
      if t_in > bound then
        out = out + bound;
      else
        out = out + t_in;
      end
    end;
    idx = idx + 1;
  end
};
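For intuition, here is a direct Python transliteration of bsum's expansion (ours, for illustration; Fuzzi itself operates on the core-language commands above):

```python
def bsum(in_bag, bound):
    # Sum a bag of numbers, clipping each element's contribution to the
    # range [-bound, bound], mirroring the bsum extension's expansion.
    out = 0.0
    for t_in in in_bag:
        if t_in < -1.0 * bound:
            out = out - bound
        elif t_in > bound:
            out = out + bound
        else:
            out = out + t_in
    return out
```

For example, with bound 1000, the elements 500, 2500, and −3000 contribute 500, 1000, and −1000 respectively, so the clipped sum is 500.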
This specifies the name of the extension, the names of its parameters (which range over Fuzzi expressions and commands), and how it expands to core Fuzzi commands. During typechecking, extension applications are replaced by their expansions, with extension variables substituted by the snippets of Fuzzi syntax provided as parameters.

The typing rule for bsum is:

  literal bound    bound ≥ 0    ϕ = Γ(in)    Γ_out = Γ[out ↦ ϕ · bound][i, t_in ↦ ∞]
  ⟹ {Γ} bsum(in, out, i, t_in, bound) {Γ_out, (0, 0)}

It requires the last parameter bound to be a non-negative literal value: non-negative because bound specifies the clipping magnitude, and literal because the sensitivity of the output variable depends on bound. The inference rule updates the sensitivity of the output sum variable to the product of bound and the sensitivity Γ(in). Intuitively, since up to ϕ elements may be added or removed from in, and each can contribute a value with magnitude up to bound toward the sum, the sum itself will vary by at most ϕ · bound. This intuition can be made rigorous, as we show in Appendix D.4.

The Haskell implementation of the Fuzzi typechecker is likewise extended with a piece of code implementing the typing rule as a function that transforms an input typing context into an output typing context and privacy costs.

Continuing the example, we next compute a differentially private estimate of the clipped sum and calculate the group's average income using the size and sum estimates:

noised_sum = L_{1000.0}(sum);
avg = noised_sum / size;

The sum variable is 1000-sensitive, so releasing noised_sum incurs another (1.0, 0)-privacy cost. The typechecker reports an aggregate privacy cost of (2.0, 0).

Throughout the paper, we will use the operator [[·]] to denote the semantic function for commands and expressions in Fuzzi. We use the notation ○S to denote sub-distributions over values in S.
We will use the letters M, N to stand for program states, which are finite maps from variable names to the values they hold, and the letter 𝓜 to stand for the set of all program states.

The semantics of a Fuzzi program c is a function from program states to sub-distributions over program states: [[c]] : 𝓜 → ○𝓜. Each type in Fuzzi is associated with a set of values: int with the set ℤ, real with the set ℝ, and [τ] and {τ} with the set of finite sequences of values associated with τ. The meaning of a Fuzzi expression e of type τ is a partial function from program states to associated values of that type: [[e]] : 𝓜 ⇀ τ. Partiality of expressions stems from invalid operations, such as arithmetic between incompatible values and out-of-bounds indexing. The complete definition of Fuzzi's semantics can be found in Appendix A. Recall from Section 3.1 that Fuzzi assignments are copy-assignments for all values, including vectors and bags.

Fuzzi's semantics directly follows the work of Barthe et al. [2016]. It is worth noting that the original apRHL developed by Barthe et al. [2016] only reasons about discretized Laplace distributions, and Fuzzi shares this restriction in its semantic model. A later model based on category theory enhances apRHL's proof rules for continuous distributions [Sato 2016]. However, the underlying proof method of this model is not compatible with Fuzzi's development, and only recently have new abstractions been proposed that generalize the original apRHL proof methods to continuous distributions [Sato et al. 2019].
Plus:
  Γ ⊢ e_l ∈_s τ    Γ ⊢ e_r ∈_t τ    τ = int ∨ τ = real
  ⟹ Γ ⊢ e_l + e_r ∈_{s+t} τ

Mult-L-Constant:
  literal k    Γ ⊢ e_r ∈_t τ    τ = int ∨ τ = real
  ⟹ Γ ⊢ k · e_r ∈_{k·t} τ

Mult:
  Γ ⊢ e_l ∈_s τ    Γ ⊢ e_r ∈_t τ    τ = int ∨ τ = real
  ⟹ Γ ⊢ e_l · e_r ∈_{approx(s,t)} τ

Fig. 2. Arithmetic Expression Typing Rules
Fuzzi's typing context Γ tracks both the data type and the sensitivity of variables. Typechecking involves checking data types as well as computing sensitivities. We refer to data typechecking as shape checking and to sensitivity computations as sensitivity checking. We will elide details of shape checking, since it is the standard typechecking that rules out operations between values of incompatible types. To emphasize sensitivity checking in Fuzzi, and to reduce clutter in syntax, we write Γ(x) for the sensitivity of the variable x under the typing context Γ, and we write Γ[x ↦ s] for the typing context that updates x's sensitivity to s but does not alter its data type. We overload this syntax, writing Γ[xs ↦ s] to update a set of variables xs to the same sensitivity. We also overload the notation Γ(e) to denote the derived sensitivity of expression e under typing context Γ. When we need to refer to the data type of expression e, we will use the full typing judgment Γ ⊢ e ∈_ϕ τ, which we pronounce "expression e has sensitivity ϕ and type τ under context Γ." We use the notation shape(Γ) to extract the shape-checking context from a typing context Γ, dropping all sensitivity annotations.

In order to compute sensitivity updates throughout sequences of commands, the type system needs to first compute sensitivities for expressions used within each command. We discuss the typing rules for addition and multiplication here as examples. Intuitively, if the values of two expressions e_l and e_r can each vary by 1, then their sum can vary by at most 2 (the sum of their individual sensitivities) by the triangle inequality; and if the value of e can vary by at most 1, then multiplying e by a literal constant k results in a value that can vary by at most k. The rules Plus and Mult-L-Constant in Figure 2 capture these cases.

There are also expressions for which we cannot give precise sensitivity bounds.
For instance, if one of the operands of a multiplication e_l · e_r is sensitive, then, without knowing the exact value of the other operand, we cannot know a priori how much the value of the entire product can change. This case is captured by the Mult rule, where the function approx is defined by the equations

approx(0, 0) = 0
approx(s₁, s₂) = ∞  if s₁ + s₂ > 0

which conservatively take the sensitivity to be ∞ if at least one side of the expression is sensitive.

Fuzzi provides bag and vector index operations, and Fuzzi's typechecker supports sensitivity checking for lookup expressions on bags and vectors. These typing rules use the definitions of bag and vector distance to establish sound upper bounds on the sensitivities of lookup expressions.

Vector-Index:
  Γ ⊢ e ∈_ϕ [τ]    Γ ⊢ i ∈_0 int    ϕ < ∞
  ⟹ Γ ⊢ e[i] ∈_ϕ τ

Bag-Index:
  Γ ⊢ e ∈_0 {τ}    Γ ⊢ i ∈_0 int
  ⟹ Γ ⊢ e[i] ∈_∞ τ
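The expression-level sensitivity arithmetic can be sketched as ordinary functions on the non-negative reals extended with infinity. The function names are ours; in the constant-multiplication case we take |k| so the result is a valid non-negative sensitivity.

```python
INF = float("inf")

def approx(s1, s2):
    # Product sensitivity (Mult rule): zero only when both operands are
    # non-sensitive, otherwise conservatively unbounded.
    return 0.0 if s1 + s2 == 0 else INF

def plus_sensitivity(s, t):
    # Plus rule: sensitivities add, by the triangle inequality.
    return s + t

def mult_const_sensitivity(k, t):
    # Mult-L-Constant rule: scaling by a literal k scales the sensitivity.
    return abs(k) * t
```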
Assign:
  Γ ⊢ e ∈_{ϕ′} τ
  ⟹ {Γ} x = e {Γ[x ↦ ϕ′], (0, 0)}

Assign-Vector-Index:
  Γ ⊢ x ∈_ϕ [τ]    Γ ⊢ e ∈_σ τ    ϕ < ∞    Γ ⊢ i ∈_0 int
  ⟹ {Γ} x[i] = e {Γ[x ↦ ϕ + σ], (0, 0)}

Assign-Vector-Length:
  Γ ⊢ x ∈_ϕ [τ]    Γ ⊢ e ∈_0 int
  ⟹ {Γ} x.length = e {Γ, (0, 0)}

Assign-Bag-Length:
  Γ ⊢ x ∈_ϕ {τ}    Γ ⊢ e ∈_0 int
  ⟹ {Γ} x.length = e {Γ[x ↦ ∞], (0, 0)}

Laplace:
  Γ, x :_ϕ real ⊢ e ∈_{ϕ′} real
  ⟹ {Γ, x :_ϕ real} x = L_b(e) {(Γ, x :_0 real), (ϕ′/b, 0)}

Skip:
  ⟹ {Γ} skip {Γ, (0, 0)}

Sequence:
  {Γ₀} c₁ {Γ₁, (ϵ₁, δ₁)}    {Γ₁} c₂ {Γ₂, (ϵ₂, δ₂)}
  ⟹ {Γ₀} c₁; c₂ {Γ₂, (ϵ₁ + ϵ₂, δ₁ + δ₂)}

If:
  {Γ} c_t {Γ_t, (ϵ_t, δ_t)}    {Γ} c_f {Γ_f, (ϵ_f, δ_f)}    Γ ⊢ e ∈_0 bool    ϵ′ = max(ϵ_t, ϵ_f)    δ′ = max(δ_t, δ_f)
  ⟹ {Γ} if e then c_t else c_f end {max(Γ_t, Γ_f), (ϵ′, δ′)}

While:
  {Γ} c {Γ, (0, 0)}    Γ ⊢ e ∈_0 bool
  ⟹ {Γ} while e do c end {Γ, (0, 0)}

Fig. 3. Core Typing Rules
The Vector-Index rule applies when the index expression is 0-sensitive. A 0-sensitive index value must be the same across two executions, and the distance between the two values at the same position must be bounded by the overall sensitivity of the vector itself, according to Definition 3. As an example, given two vectors [1, 2, 3] and [1, 2, 4], if we index both vectors at the last position, then the resulting values 3 and 4 are at distance 1, which is bounded by the distance between the original vectors. The premise ϕ < ∞ is necessary to ensure the indexed arrays have the same length in both executions, so that the lookup expression terminates in one execution if and only if it terminates in the other. We refer to this property as co-termination; it is discussed in Section 4.4.

It may be surprising that Fuzzi's typechecker only accepts bag lookup operations over non-sensitive bags. This is due to the requirement of co-termination and the fact that bags with non-zero sensitivities may have different lengths in neighboring runs. To see why the bag lookup expression has sensitivity ∞, consider the two bags [1, 100, 2] and [1, 2, 100]; these are at distance 0, but if we access both bags at index 1, the resulting values 100 and 2 are distance 98 apart.

The typing judgments for commands have the form {Γ} c {Γ′, (ϵ, δ)}. We can think of these judgments as a Hoare triple, where Γ is a pre-condition of the program c and Γ′ is a post-condition for c, annotated with (ϵ, δ), the total privacy cost of running c.

There are three forms of assignment in Fuzzi: (1) direct assignment to variables, (2) indexed assignment to vectors and bags, and (3) length assignment to vectors and bags. There is a separate typing rule for each form of assignment (Figure 3). The Assign rule updates the LHS variable's sensitivity to the derived sensitivity of the RHS expression.
The Assign-Vector-Index rule adds the derived sensitivity of the RHS expression to the vector's sensitivity, provided the index itself is non-sensitive. (For example, consider the vectors xs⟨1⟩ = [1, 2, 3] and xs⟨2⟩ = [1, 2, 3], at distance 0. If we perform the assignment xs[0] = e where e⟨1⟩ = 1 and e⟨2⟩ = 10, then the two vectors become [1, 2, 3] and [10, 2, 3], increasing the distance between them by 9. We require finite sensitivity of the vector variable on the left-hand side to ensure co-termination: only vectors with finite sensitivities must have the same length.)
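A concrete check of the Assign-Vector-Index bound, using the L1 distance of Definition 3 (the specific vectors and values here are ours, for illustration):

```python
def vector_distance(a1, a2):
    # L1 distance between equal-length vectors of numbers.
    if len(a1) != len(a2):
        return float("inf")
    return sum(abs(x - y) for x, y in zip(a1, a2))

# Two runs start with identical vectors, so the vector's sensitivity phi is 0.
xs1, xs2 = [1, 2, 3], [1, 2, 3]
# Assign a 9-sensitive expression (values 1 vs. 10) at a non-sensitive index:
xs1[0], xs2[0] = 1, 10
# The new distance is at most phi + sigma = 0 + 9, as the rule predicts.
```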
We have separate rules for vector and bag length updates. In the Assign-Vector-Length rule, since the RHS length expression is non-sensitive, the two vectors will be truncated or padded to the same length. In the case of truncation, the distance between the two vectors decreases or remains the same; if both vectors are padded, the pads are identical and introduce no additional distance. Thus, the LHS vector variable's sensitivity remains the same.

In the Assign-Bag-Length rule, it may be surprising that updating the length of a bag, even with a 0-sensitive new length, can result in ∞-sensitivity for the bag. Consider two sequences X and Y of the same length L, chosen so that their bag distance is 2L. The two bags XY and YX have distance 0, since they contain the same elements, but if we truncate both bags to length L, their distance grows to 2L. The Assign-Bag-Length rule must account for this worst case by setting the sensitivity of x to ∞.

The core typing rules for operations that involve bags are rather restrictive. We will see in Section 5 how to operate more flexibly over bags using extensions.

The Laplace rule computes the privacy cost of releasing a single sensitive real value. It sets the sensitivity of x to 0 after noise is added, which may seem surprising since x's value is randomized. Intuitively, the 0-sensitivity expresses that x's value is now public information and can be used in the clear. We justify 0-sensitivity as an upper bound on the distance between x⟨1⟩ and x⟨2⟩ in Appendix B; however, readers do not need to look there to understand Fuzzi's type system design.

The no-op command skip does not alter the program state at all: given any pre-condition Γ, we can expect the same condition to hold after skip. Since skip does not release any private data, it has a privacy cost of 0.
This is described by the Skip rule.

As described in Section 3, the Sequence rule chains together the intermediate Γs for two commands c₁ and c₂ and adds up the individual privacy costs of each command.

The control flow command if may modify the same variable with different RHS expressions in each branch; if we allowed expressions with arbitrary sensitivities as the branch condition, we would not be able to derive valid sensitivities for modified variables due to the diverging execution paths. Consider the following example, where e is a sensitive boolean expression: if e then x = y else x = z end. In one execution, control flow may follow the true branch, assigning y to x, while, in the other execution, control flow follows the false branch, assigning z to x. Since the typing context Γ provides no information on the distance between y and z, we cannot derive a useful upper bound on |x⟨1⟩ − x⟨2⟩| after the if statement.

On the other hand, if the branch condition is a non-sensitive boolean, then we know that control will go through the same branch in both executions. In this case, we can take the pointwise maximum of the sensitivities from the post-conditions of both branches to derive a sound sensitivity for variables modified by the if statement. Similarly, the privacy cost of the entire if statement is bounded by the maximum of the two branches' privacy costs.

The core typing rule for while loops requires Γ to be a loop invariant of the loop body c, the loop guard e to be a non-sensitive boolean under Γ, and the loop body c to incur no privacy cost. Had we allowed e to be a sensitive boolean, the while loop might diverge in one execution but terminate in the other. In order to ensure that the two executions co-terminate, we must force the values of e⟨1⟩ and e⟨2⟩ to agree before each iteration. We achieve this by checking that the invariant Γ of the loop induces a 0-sensitivity on the loop guard e.

The Simple Composition Theorem [Dwork et al.
2006] implies that the total privacy cost of a while loop is bounded by the sum of the individual privacy costs of its iterations. Even though we can ensure that two executions of a while loop co-terminate, we cannot always statically tell how many iterations both loops will run. In order to ensure the soundness of the total privacy cost estimation, we conservatively forbid loop bodies from using any commands that increase ϵ or δ.

These core typing rules place rather heavy restrictions on Fuzzi programs operating over vectors and bags, or programs using conditionals and loops; for example, the core rules grossly overestimate sensitivities for vectors and bags and forbid sensitive booleans in branch and loop conditions. The core rules are designed for chaining together blocks of differentially private mechanisms, and these typing rules are often not enough to typecheck the implementation of interesting differentially private algorithms. We will see how to teach Fuzzi's typechecker to derive more precise sensitivities for differentially private mechanisms involving all these constructs in Section 5.

There has been a rich line of work on developing type systems and language safety properties using foundational methods [Ahmed 2006; Appel and McAllester 2001; Appel et al. 2007; Frumin et al. 2018; Jung et al. 2017, etc.]. The foundational approach develops the typing rules of a language as theorems in an expressive logic. Type systems developed using foundational methods benefit from the soundness of the underlying logic: if a typing rule is proven true as a theorem, then adding new rules as theorems to the type system will not break the validity of existing rules.
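Returning briefly to the Laplace rule and simple composition described above, their semantics can be sketched as follows. This is a minimal illustration, not Fuzzi's implementation; the helper names are ours, and the Laplace sampler uses the standard inverse-CDF construction:

```python
import math
import random

# Hedged sketch of the semantics behind the Laplace rule and the
# Simple Composition Theorem; helper names are ours.
def laplace_noise(scale):
    # Sample Laplace(0, scale) by inverting the CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_release(value, sensitivity, epsilon):
    # Releasing a `sensitivity`-sensitive value with noise of scale
    # sensitivity / epsilon costs (epsilon, 0) in the privacy budget.
    return value + laplace_noise(sensitivity / epsilon)

def simple_composition(costs):
    # Total cost of a sequence of (epsilon, delta) releases: sum pointwise.
    return (sum(e for e, _ in costs), sum(d for _, d in costs))

random.seed(0)
noisy = [laplace_release(42.0, 1.0, 0.5) for _ in range(3)]
assert simple_composition([(0.5, 0.0)] * 3) == (1.5, 0.0)
```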
Most importantly, the foundational approach allows Fuzzi to mix typing rules for an automated typechecker with specialized typing rules extracted from manual proofs of differential privacy.

We choose apRHL [Barthe et al. 2016] as the foundational logic upon which to build Fuzzi's type system. The apRHL logic extends Floyd-Hoare Logic [Hoare 1969] with relational assertions, reasoning about probabilistic commands, and differential privacy cost accounting. An apRHL judgment has the form ⊢ c₁ ∼(ϵ,δ) c₂ : Ψ ⇒ Φ. The metavariables c₁ and c₂ stand for the two programs related by this judgment, the annotations ϵ and δ stand for the quantitative "cost" of establishing the relation, and Ψ and Φ are assertions over pairs of program states, standing for the pre-condition and the post-condition of the judgment, respectively.

We have seen two kinds of rules in the Fuzzi type system so far: expression typing rules and core typing rules for commands. Although both are presented in the form of inference rules, these two typing judgments are very different in nature. The expression typing rules are defined as an inductive relation, while the typing rules for commands are theorems to be proven. This choice is motivated more by practicality than by theory: foundational proofs are more difficult to work with than inductive relations, and since we do not plan to mix the typing rules for expressions with manual proofs, there is no need to use foundational methods for expressions.

Because expression typing rules of the form Γ ⊢ e ∈ ϕ τ are instances of an inductive relation, we need to prove a few soundness properties that make these expression typing rules useful in the development of the command typing rule proofs. In particular, we care about soundness with respect to sensitivity and co-termination. We elide the proofs, which are by straightforward induction.

Lemma 1 (Expression Sensitivity Sound).
Given Γ ⊢ e ∈ ϕ τ and two program states M₁ and M₂ related by Γ, if [[e]]M₁ = v₁ and [[e]]M₂ = v₂, then d_τ(v₁, v₂) ≤ ϕ.

Lemma 2 (Expression Co-termination).
Given Γ ⊢ e ∈ ϕ τ and two program states M₁ and M₂ related by Γ, evaluating [[e]]M₁ yields some value v₁ if and only if [[e]]M₂ yields some value v₂.

The command typing rules have the form {Γ} c {Γ′, (ϵ, δ)}. Earlier, we described Γ and Γ′ as pre- and post-conditions. What does it mean to treat a typing context as a pre- or post-condition?

Recall the translation from typing contexts to apRHL assertions in Section 3. This translation naturally induces a relation on each program variable: each variable's type information x : ϕ τ becomes the relation d_τ(x⟨1⟩, x⟨2⟩) ≤ ϕ. The entire typing context Γ is translated to the conjunction of the pointwise relations for all program variables. As an example, the context x : 0 int, y : 2 real corresponds to the relation d_int(x⟨1⟩, x⟨2⟩) ≤ 0 ∧ d_real(y⟨1⟩, y⟨2⟩) ≤
2. We also use the [[·]] function to denote the translation of typing contexts.

So a typing rule is in fact an apRHL judgment in disguise: ⊢ c ∼(ϵ,δ) c : [[Γ]] ⇒ [[Γ′]]. To prove these judgments valid, we use the apRHL proof rules. In fact, many of Fuzzi's core typing rules are specialized versions of the corresponding apRHL rule for that command. We list all apRHL proof rules used in this paper in Appendix B, but readers do not need to look there to understand the rest of the main body of the paper.

As an example, the soundness of the Assign rule is justified by the following lemma:

Lemma 3. Given Γ ⊢ e ∈ ϕ′ τ, the judgment ⊢ x = e ∼(0,0) x = e : [[Γ]] ⇒ [[Γ[x ↦ ϕ′]]] is true.

We define one such lemma for each of the typing rules given in Figure 3 and justify them using the corresponding apRHL proof rules.

One important technical subtlety is that the original presentation of apRHL only reasons over terminating programs. Requiring Fuzzi's typechecker to prove termination for all programs would unavoidably rule out some useful ones. Fortunately, we need only a subset of apRHL's proof rules, and these are all sound even if programs only co-terminate [Hsu 2018]; we can thus relax the "all programs terminate" assumption of apRHL in the development of Fuzzi.
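The translation [[Γ]] of typing contexts into relational assertions can be illustrated concretely. The following sketch (our own names, with contexts and paired program states encoded as dicts) checks whether two program states satisfy a translated context:

```python
# Illustrative sketch of the translation [[Gamma]] from typing contexts
# to relational assertions over paired program states; the dict encoding
# and helper names are ours, not Fuzzi's.
def satisfies(gamma, state1, state2, dist=lambda a, b: abs(a - b)):
    # [[Gamma]] holds iff d(x<1>, x<2>) <= Gamma(x) for every variable x.
    return all(dist(state1[x], state2[x]) <= phi for x, phi in gamma.items())

gamma = {"x": 0.0, "y": 2.0}   # e.g. x is 0-sensitive, y is 2-sensitive
assert satisfies(gamma, {"x": 5, "y": 1.0}, {"x": 5, "y": 2.5})
assert not satisfies(gamma, {"x": 5, "y": 1.0}, {"x": 6, "y": 1.0})
```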
Remark.
Although we carry out privacy proofs in apRHL, the logic does not fully isolate its user from the underlying semantics of the language. For example, some of the apRHL proof rules used to develop Fuzzi require us to prove termination of commands, but apRHL does not give proof rules for termination. So we develop our own sound termination typing rules, matching apRHL's termination definition, using the semantics of Fuzzi. This necessitates an even lower-level logic L to formalize the parts not specified by apRHL. In the following sections, we will explicitly call out objects defined in L.

In this section, we discuss how to integrate the core Fuzzi type system with specialized typing rules for Fuzzi extensions; we then introduce several concrete extensions that will be used later in our case studies: operations for mapping a piece of code over all the cells in a bag or vector, an operation for partitioning a bag into a collection of smaller bags according to some criterion, an operation for summing the elements of a bag, and an operation for sequencing several commands using an "advanced composition theorem" from the differential privacy literature to obtain a lower privacy cost than the one given by the plain sensitivity typing rule for sequencing.

Definition 5.
An extension is a 4-tuple (ext, f, rule, proof). The first field ext is the name of the extension. The second field is a function f that maps Fuzzi expressions or commands to a Fuzzi command; we call f the syntax expansion function. Let v₁, . . . , vᵢ be the syntactic variables bound in f; the third field is a typing rule, parameterized by the same syntactic variables v₁, . . . , vᵢ. The typing rule may contain premises over any combination of v₁, . . . , vᵢ, and the typing rule's conclusion has the shape of a Fuzzi typing triple for the expanded code of the extension. Finally, the last field proof is a proof of the soundness of the typing rule.

We will use the notation ext(p₁, p₂, . . . , pᵢ) for the syntax of invoking an extension. These extension commands are replaced by the expanded body f(p₁, p₂, . . . , pᵢ), with each pᵢ substituted for the corresponding syntax variable vᵢ.

(The proof assistant Coq [Coq Development Team 2018] is a suitable candidate for L; indeed, we have already formalized some parts of Fuzzi in Coq.)

Bag-Map
Termination, Deterministic:  Γ ⊢ c term    determ c
Should Not Modify:  t_in, in, out, i ∉ mvs c
Abbreviations:  σ = [mvs c, i, in, out ↦ ∞]    σ′ = [mvs c, i, t_in, t_out ↦ ∞]
Dependency:  {stretch Γ[t_in ↦ 0] σ} c {Γ₁, (0, 0)}    Γ₁(t_out) = 0
             {stretch Γ[t_in ↦ ∞] σ} c {Γ₂, (0, 0)}    {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Output Sensitivity:  Γ_out = Γ[out ↦ Γ(in)] σ′
------------------------------------------------------------------
{Γ} bmap(in, out, t_in, i, t_out, c) {Γ_out, (0, 0)}

Expansion:
  i = 0;
  out.length = in.length;
  while i < in.length do
    t_in = in[i];
    c;
    out[i] = t_out;
    i = i + 1;
  end

Fig. 4. Bag-Map typing rule and extension code pattern
Expressing the typing rule requires the full generality of the lowest-level logic L, since extension typing rules contain side conditions (such as termination) that are not captured by apRHL. The general shape of this theorem is

∀ v₁, . . . , vᵢ, P₁ ∧ · · · ∧ Pⱼ ⇒ {Γ} ext(v₁, . . . , vᵢ) {Γ′, (ϵ, δ)}

Each Pⱼ is a premise of the typing rule, and the premises may bind any combination of v₁, . . . , vᵢ. The conclusion of the theorem always has the shape of a Fuzzi typing triple, so that proofs of the typing rule can mix with the soundness proofs of the core typing rules introduced in Section 4.1. The final component of an extension is a soundness proof of the typing rule theorem.

Some of these premises are Fuzzi typing judgments, while others are auxiliary judgments that assert termination or describe a linear scaling relationship between the pre-condition sensitivities and the post-condition sensitivities. These two extra kinds of auxiliary judgments are defined in L; we give definitions for them as we encounter them and describe their proof rules in Appendix C.

We will provide an overview of the use case for each extension, and only a sketch of each soundness proof, to conserve space. Detailed soundness proofs for all the extensions can be found in Appendix D.

Our first extension, Bag-Map, takes an input bag variable, an output bag variable, a few auxiliary variables used by the expanded loop, and finally a "bag-map body" c that reads a single bag entry and outputs a mapped value for that entry. This command c represents a single step of the "map" operation; Bag-Map applies it uniformly to all entries of a bag.

The premises of Bag-Map's typing rule use a few new ingredients: the function mvs collects the set of variables modified by a command c, and the function stretch "expands" a typing context.
The stretch function takes a typing context Γ and, for each variable x ∈ Γ, maps Γ(x) to ∞ if Γ(x) > 0, and otherwise leaves Γ(x) as 0.

  stretch ∅ = ∅
  stretch (x : 0 τ, Γ) = x : 0 τ, stretch Γ
  stretch (x : σ τ, Γ) = x : ∞ τ, stretch Γ    if σ > 0

stretch is used when we need to verify that some output variable's sensitive value derives solely from some input variable. Since a variable can only become sensitive if its value derives from another sensitive variable, it is perhaps not surprising that Fuzzi's sensitivity type system is capable of tracking dependency as well. We will use the stretch function to uncover this dependency-analysis aspect of the sensitivity type system.

Consider any program fragment c for which we want to verify that, for a single variable t, the sensitive data held in t after executing c must come only from s. Suppose we had the following typing judgment about c: {stretch Γ[s ↦ 0]} c {Γ′, (0, 0)} where Γ′(t) =
0. Then we know that if s is non-sensitive, t is also non-sensitive. This implies that the only sensitive dependency of t is at most the singleton set {s}.

However, this typing judgment only tells us the dependency of t when all sensitive variables are ∞-sensitive before executing c. Does the same result hold when those variables have finite sensitivity? To arrive at this conclusion, we need to apply the Conseq rule from apRHL after unfolding our previous typing judgment into an apRHL judgment.

Conseq
  ⊢ c₁ ∼(ϵ′,δ′) c₂ : Φ′ ⇒ Ψ′    ⊨ Φ ⇒ Φ′    ⊨ Ψ′ ⇒ Ψ    ⊨ ϵ′ ≤ ϵ    ⊨ δ′ ≤ δ
  ------------------------------------------------------------------
  ⊢ c₁ ∼(ϵ,δ) c₂ : Φ ⇒ Ψ

The Conseq rule allows us to strengthen the pre-condition. Now, in the context stretch Γ[s ↦ 0], if we change any other variable's sensitivity to some finite value, then we have a stronger statement, since ∞-sensitivity is implied by finite sensitivity. Thus, the Conseq rule allows us to change the pre-condition to one that implies the stretched typing context.

This technique allows us to verify that t's dependencies are at most {s} through sensitivity analysis. However, the program c may modify variables other than t. In order to make sure there are no other sensitive output variables of the program fragment c, we also want to verify the absence of dependency on s. Can we re-use the sensitivity type system to check that some other modified variable v does not depend on the variable s? Indeed we can, using the stretch function again in a slightly different way. Consider the typing judgment {stretch Γ[s ↦ ∞]} c {Γ′, (0, 0)} where Γ′(v) = 0. Even when s and the other sensitive values change between two executions of c, the value held in v remains the same at the end of the execution. So indeed v does not depend on s or any other sensitive variable.
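The stretch operation and the dependency-check recipe above can be sketched as follows (illustrative dict encoding of typing contexts; names are ours, not Fuzzi's):

```python
INF = float("inf")

# Sketch of the stretch operation on typing contexts: every variable with
# positive sensitivity is mapped to infinity; 0-sensitive variables stay 0.
def stretch(gamma):
    return {x: (INF if phi > 0 else 0) for x, phi in gamma.items()}

gamma = {"s": 1.0, "t": 0.5, "u": 0}
assert stretch(gamma) == {"s": INF, "t": INF, "u": 0}

# Dependency-check recipe from the text: pin s to 0 in the stretched
# context; if a typing judgment for c then assigns t sensitivity 0,
# t's only sensitive dependency can be s.
pre = dict(stretch(gamma), s=0)
assert pre == {"s": 0, "t": INF, "u": 0}
```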
Again, the Conseq rule allows us to strengthen the ∞-sensitivity to any finite sensitivity.

The Bag-Map typing rule applies this technique to check that, on each iteration of c, the value of t_out derives only from the corresponding input bag entry t_in. The program fragment c should not access the original bag value directly, and neither should it write directly to the output bag. Furthermore, each iteration of the bag-map body should be independent of the others, so the values of its modified variables should not carry over to the next iteration. For these reasons, we set the sensitivities of the modified variables of c, the variable i, and the input and output bag variables in and out to ∞ in the judgment {stretch Γ[t_in ↦ 0] σ} c {Γ₁, (0, 0)}, and check that Γ₁(t_out) =
0. We use the letter σ to abbreviate the update [mvs c, i, in, out ↦ ∞].

(This kind of dependency analysis is one of the motivating examples for Benton's seminal work on relational Hoare logic [Benton 2004].)
We also want to verify that none of the modified variables, except for t_out, has any dependency on sensitive data. This is why we check that the only variable that can potentially hold sensitive data is t_out, in the judgment {stretch Γ[t_in ↦ ∞] σ} c {Γ₂, (0, 0)}. The variables mvs c, i, in, out are again set to ∞ to ensure these variables do not leak information across iterations, as discussed above.

We use the judgment determ c to assert that c is a deterministic Fuzzi program. It is easy to show that any program c that does not contain the Laplace mechanism is deterministic. We use the judgment Γ ⊢ c term to denote that the command c terminates on any program state that is well-typed according to shape(Γ). That is, for all M in shape(Γ), running the program c on the well-typed program state M, the distribution [[c]]M terminates with probability 1.

Since the apRHL judgment that corresponds to the conclusion of this typing rule implies co-termination of Bag-Map programs, we need to prove that the expanded while loops actually co-terminate. However, because the two while loops may run for different numbers of iterations when the bags have different sizes, thus executing c a different number of times, we cannot simply show that c co-terminates. So we take the extra step of requiring termination of c on all well-shaped inputs, which ensures that both loops always terminate.

The soundness proof for the Bag-Map typing rule applies dependency analysis to ensure that the map body c maps the input value t_in deterministically to t_out, and that c does not store sensitive data in any of its other modified variables. This ensures that in[i] maps deterministically and uniformly to out[i] through each iteration of c. Now, adding or removing any entry from the input bag will correspondingly add or remove the mapped value from the output bag. So the output bag must have the same sensitivity as the input bag.
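The core intuition of this soundness argument can be checked on small examples. The sketch below (our own helper names) treats bag distance as the number of entries that must be added or removed, and shows that a deterministic per-entry map cannot increase it:

```python
from collections import Counter

# Sketch of the Bag-Map soundness intuition (helper names are ours):
# a deterministic per-entry map cannot increase bag distance.
def bag_distance(b1, b2):
    # Number of entries that must be added or removed to turn one
    # multiset into the other.
    c1, c2 = Counter(b1), Counter(b2)
    return sum(((c1 - c2) + (c2 - c1)).values())

def bmap(f, bag):
    return [f(x) for x in bag]

b1 = [1, 2, 3]
b2 = [1, 2, 3, 7]                       # one added entry: distance 1
f = lambda x: 2 * x + 1
assert bag_distance(b1, b2) == 1
assert bag_distance(bmap(f, b1), bmap(f, b2)) <= bag_distance(b1, b2)
```

Note that the output distance can be strictly smaller when the map body sends distinct inputs to the same output, which is why the typing rule's bound is an inequality.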
Our second extension, Vector-Map, is very similar to Bag-Map in that it also requires the "map" command to restrict its flow of sensitive data from t_in to only t_out. However, Vector-Map has an additional requirement that the map body must be "linear":

Definition 6 (Linear Commands). We write kΓ to denote a typing context Γ′ where Γ′(x) = kΓ(x). A deterministic and terminating command c is linear with respect to Γ₁ and Γ₂ if, for any k > 0, the scaled typing judgment {kΓ₁} c {kΓ₂, (0, 0)} is true. We define k · ∞ = ∞ for k > 0, and 0 · ∞ = 0.

This definition tells us that the updated sensitivities in the post-condition scale linearly with respect to the sensitivities in the pre-condition. An example of a linear command is x = y + 1 with Γ₁ = x : 0 real, y : 1 real and Γ₂ = x : 1 real, y : 1 real.

A counterexample is if x > 0 then x = x + 1 else x = x + 2 end with Γ₁ = x : 1 real and Γ₂ = x : 2 real. This command conditionally increments x by a constant of 1 or 2, so if x was 1-sensitive before executing this command, then we can show in apRHL that x is 2-sensitive afterwards. Now, if we scale x's pre-condition sensitivity by 0.5, the command yields a sensitivity of 1.5 for x, whereas linearity would require 2 · 0.5 = 1. So this command is not linear with respect to the chosen Γ₁ and Γ₂. However, had we chosen Γ₂ = x : ∞ real, then this command would be linear with respect to the new post-condition, because k · ∞ = ∞, and no matter what the values of x⟨1⟩ and x⟨2⟩ are, their difference is always bounded by ∞. We present the proof rules for linear commands in Appendix C.

For vector map, we need to know the scaling relationship between the sensitivity of t_out and the sensitivity of t_in in order to derive the sensitivity of the output vector. With c being a linear command for the chosen pre-condition Γ[t_in ↦ 1] σ and post-condition Γ₃, the definition gives us: for any scale factor k > 0, if d(t_in⟨1⟩, t_in⟨2⟩) ≤ k before executing c, then d(t_out⟨1⟩, t_out⟨2⟩) ≤ sk after executing c, where s = Γ₃(t_out). Recalling the definition of vector distance and instantiating k with the actual distance d(in⟨1⟩[i], in⟨2⟩[i]) for each pair of i-th entries from the input vectors, we know
Vector-Map
Termination, Deterministic:  Γ ⊢ c term    determ c
Should Not Modify:  t_in, in, out, i ∉ mvs c
Abbreviations:  σ = [mvs c, i, in, out ↦ ∞]    σ′ = [mvs c, i, t_in, t_out ↦ ∞]
Dependency:  {stretch Γ[t_in ↦ 0] σ} c {Γ₁, (0, 0)}    Γ₁(t_out) = 0
             {stretch Γ[t_in ↦ ∞] σ} c {Γ₂, (0, 0)}    {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Linear:  {Γ[t_in ↦ 1] σ} c {Γ₃, (0, 0)} linear
Output Sensitivity:  Γ_out = Γ[out ↦ Γ(in) · Γ₃(t_out)] σ′
------------------------------------------------------------------
{Γ} vmap(in, out, t_in, i, t_out, c) {Γ_out, (0, 0)}

Expansion:
  i = 0;
  out.length = in.length;
  while i < in.length do
    t_in = in[i];
    c;
    out[i] = t_out;
    i = i + 1;
  end

Fig. 5. Vector-Map typing rule and extension code pattern

the distance between the output vectors satisfies the following condition:

  d[τ](out⟨1⟩, out⟨2⟩) = Σᵢ d_τ(out⟨1⟩[i], out⟨2⟩[i])
                       ≤ Σᵢ s · d_τ(in⟨1⟩[i], in⟨2⟩[i]) = s · d[τ](in⟨1⟩, in⟨2⟩).

This justifies the sensitivity derived by the typing rule for vector map.
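This scaling bound can be checked numerically on a small example. The sketch below uses our own helper names and a hypothetical linear map body t_out = 3 · t_in, under the sum-of-pointwise-distances metric:

```python
# Numeric check of the Vector-Map bound (names are ours): for an s-linear
# body, the output vectors' distance is at most s times the input
# vectors' distance, under the sum-of-pointwise-distances metric.
def vector_distance(xs, ys):
    return sum(abs(a - b) for a, b in zip(xs, ys))

s = 3.0                              # hypothetical sensitivity of t_out
body = lambda t_in: s * t_in         # a linear map body
in1, in2 = [1.0, 2.0, 3.0], [1.0, 2.5, 3.0]
out1 = [body(x) for x in in1]
out2 = [body(x) for x in in2]
assert vector_distance(out1, out2) <= s * vector_distance(in1, in2)
```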
Our third extension, Partition, allows programmers to break a larger bag apart into a vector of smaller bags. Partition is parameterized by an input bag variable, an output vector variable, a few auxiliary variables for storing results from intermediate computations, and finally a command that maps each input bag entry to a partition index.

The Partition extension is similar to Bag-Map in that it maps each bag item to some value, but they differ in how the output value from each iteration is used. With Bag-Map, the output value from each iteration is collected into the output bag as is. Partition uses the output value as an index into the output vector out, appending the bag entry at the current iteration to the sub-bag at out[t_idx]. As an example, if the input bag is [1.5, 2.5, 3.5], and the map operation simply rounds each value down to the nearest integer, then the output vector will be [[], [1.5], [2.5], [3.5]].

It may seem redundant that Partition takes the number of partitions as a parameter. Shouldn't Partition be able to compute the number of partitions as it processes the input bag values? It should not, because a computed number of partitions is a sensitive value that depends on the contents of the input bag. Taking the previous example, if we add a value of 100.5 to the input bag, the computed number of partitions grows to 101; the length of the output vector could thus be
Partition
Termination, Deterministic:  Γ ⊢ c term    determ c
Should Not Modify:  t_in, in, out, i, out_idx ∉ mvs c
Abbreviations:  σ = [mvs c, i, in, out_idx ↦ ∞]    σ′ = [mvs c, i, t_in, t_idx, out_idx, t_part ↦ ∞]
Number of Partitions Non-Sensitive:  Γ ⊢ nParts ∈ int    fvs nParts ∩ mvs c = ∅    i, t_in, t_idx, out_idx, t_part ∉ fvs nParts
Dependency:  {stretch Γ[t_in ↦ 0] σ} c {Γ₁, (0, 0)}    Γ₁(t_out) = 0
             {stretch Γ[t_in ↦ ∞] σ} c {Γ₂, (0, 0)}    {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Output Sensitivity:  Γ_out = Γ[out ↦ Γ(in)] σ′
------------------------------------------------------------------
{Γ} partition(in, out, t_in, i, t_out, t_idx, out_idx, t_part, nParts, c) {Γ_out, (0, 0)}

Expansion:
  i = 0;
  out.length = nParts;
  while i < nParts do
    out[i].length = 0;
    i = i + 1;
  end;
  bmap(in, out_idx, t_in, i, t_out, c);
  i = 0;
  while i < out_idx.length do
    t_idx = out_idx[i];
    if 0 <= t_idx && t_idx < out.length then
      t_part = out[t_idx];
      t_part.length = t_part.length + 1;
      t_part[t_part.length - 1] = in[i];
      out[t_idx] = t_part;
    else
      skip;
    end;
    i = i + 1;
  end

Fig. 6. Partition typing rule and extension expansion

made arbitrarily large by adding a single item to the input bag. This is why we fix the number of partitions and drop the items whose partition indices are out of range.

The soundness of the sensitivity check for Partition comes from the fact that each index is derived only from its corresponding bag entry; thus, adding or removing one bag entry can cause at most one sub-bag in the output vector to vary by distance 1. Generalizing this fact shows that the output vector has the same sensitivity as the input bag passed to Partition.
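The behavior Partition expands to can be sketched with an illustrative helper (ours, not Fuzzi's), including the out-of-range drop that keeps the output vector's length non-sensitive:

```python
import math

# Sketch of Partition (illustrative helper, mirroring the expansion in
# Fig. 6): each entry's index derives only from the entry itself, and
# out-of-range indices are dropped.
def partition(bag, n_parts, index_of):
    out = [[] for _ in range(n_parts)]
    for x in bag:
        i = index_of(x)
        if 0 <= i < n_parts:
            out[i].append(x)
    return out

# A concrete instance of the rounding-down example with 4 partitions:
assert partition([1.5, 2.5, 3.5], 4, math.floor) == [[], [1.5], [2.5], [3.5]]
# An entry whose index falls out of range is dropped, so the output
# vector's length stays fixed at n_parts:
assert partition([1.5, 2.5, 3.5, 100.5], 4, math.floor) == [[], [1.5], [2.5], [3.5]]
```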
Our fourth extension, Bag-Sum, works with bags of real-valued data and adds these values up with clipping. The clipping process truncates a value so that its magnitude is no larger than bound. This is important to ensure that the output of Bag-Sum has finite sensitivity. Recall that the distance definition on bags places no constraints on the distance between values held by the bags. If we naïvely summed the two bags [1, 2] and [1, 2, 100], although their bag distance is bounded by 1, their sums are at distance 100. Using only the bag distance, the typechecker has no information on the sensitivity of the sum. Truncating each value into the range [−bound, bound] allows us to bound
Bag-Sum

literal bound    bound ≥ 0    ϕ = Γ(in)    Γ_out = Γ[out ↦ ϕ · bound][i, t_in ↦ ∞]
------------------------------------------------------------------
{Γ} bsum(in, out, i, t_in, bound) {Γ_out, (0, 0)}

Expansion:
  i = 0;
  out = 0;
  while i < in.length do
    t_in = in[i];
    if t_in < -bound then
      out = out - bound;
    else
      if t_in > bound then
        out = out + bound;
      else
        out = out + t_in;
      end
    end;
    i = i + 1;
  end

Fig. 7. Bag-Sum typing rule and expansion
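The clipped summation that bsum expands to can be sketched in Python (the helper is ours, mirroring the expansion in Fig. 7):

```python
# Sketch of Bag-Sum's clipped summation (helper name is ours): clip each
# entry to [-bound, bound], then sum, so adding or removing one entry
# moves the result by at most `bound`.
def bsum(bag, bound):
    return sum(max(-bound, min(bound, x)) for x in bag)

bound = 5.0
b1 = [1.0, 2.0]
b2 = [1.0, 2.0, 100.0]                  # bag distance 1 from b1
assert bsum(b1, bound) == 3.0
assert abs(bsum(b2, bound) - bsum(b1, bound)) <= 1 * bound
```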
Adv-Comp
Loop body:  {Γ} c {Γ, (ϵ, δ)}
Privacy cost:  ϵ∗ = ϵ√(2n ln(1/ω)) + nϵ(e^ϵ − 1)    δ∗ = nδ + ω
Should Not Modify:  i ∉ mvs c
Adv-Comp Parameters:  ω > 0    n > 0    literal ω    literal n
------------------------------------------------------------------
{Γ} ac(i, n, ω, c) {Γ, (ϵ∗, δ∗)}

Expansion:
  i = 0;
  while i < n do
    c;
    i = i + 1;
  end

Fig. 8. Adv-Comp typing rule and expansion

the total sensitivity of the sum value: if up to ϕ bag items may be added or removed, and each can contribute up to bound towards the total sensitivity of out, then at the end of the loop, out must be (ϕ · bound)-sensitive.

Our fifth extension, Adv-Comp, simply expands to a loop that runs the supplied command c for n iterations. However, this extension provides a special privacy cost accounting mechanism known as Advanced Composition [Dwork et al. 2010]. Compared to Simple Composition, Adv-Comp gives an asymptotically better ϵ that grows at the rate of O(√n), at the cost of a small increase in δ; simple composition of the loop iterations would instead give privacy costs that grow at the rate of O(n). The programmer chooses the increase in δ by providing a positive real number ω, which is used to compute the aggregated privacy cost for the entire loop.

The Adv-Comp extension is useful for programs that iteratively release data, and it also allows programs to run for more iterations while staying within the same privacy budget. We use Adv-Comp in our implementation of logistic regression in Section 7.

It is worth noting that Adv-Comp does not always give a better privacy cost than simple composition: when the ϵ cost of c is large, the term nϵ(e^ϵ − 1) becomes the dominating term. This term again grows linearly with n, and it has a multiplicative factor of e^ϵ −
1. When Adv-Comp gives a worse privacy cost in both ϵ and δ in comparison to simple composition, the type system falls back to simple composition for the expanded loop's privacy cost accounting.

Our prototype Fuzzi checker expands extensions before typechecking, leaving hints in the expanded abstract syntax tree so that it can tell when to apply the macro typing rules that accompany each extension. The typechecking algorithm includes three major components: (1) a checker that computes sensitivities, (2) a checker for termination, and (3) a checker for linearity properties of commands.

The implementation uses three separate AST types, called Imp, ImpExt, and ImpTC, to represent a Fuzzi program in different phases of checking. The Imp AST is what the parser produces, i.e., the constructs of the core language plus extension applications. ImpExt is a convenient language for entering extension declarations for Fuzzi. Finally, ImpTC represents programs expanded from Imp: a while-language with no extension applications or extension definitions, but containing typechecker hints; the typechecker expects terms from ImpTC. The ImpTC language is not accepted by the parser, as we do not anticipate users wanting to enter typechecker hints directly. We use the extensible sum encoding described in Data Types à la Carte [Swierstra 2008] to represent these ASTs in order to avoid code duplication. We depend on the compdata package [Bahr and Hvitved 2011] to manipulate ASTs in this encoding.

The three checkers are implemented separately. A checker composition function takes the results from each checker and produces the final type information for a Fuzzi program.

To efficiently execute Fuzzi code, we compile Fuzzi programs to Python and use fast numeric operations from the numpy library [Oliphant 2015] whenever appropriate.
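The trade-off between the two accountings discussed above can be made concrete. The following sketch uses the advanced composition bound as reconstructed here (ϵ∗ = ϵ√(2n ln(1/ω)) + nϵ(e^ϵ − 1), δ∗ = nδ + ω); the helper names are ours:

```python
import math

# Hedged comparison of simple vs. advanced composition, using the
# advanced composition bound as reconstructed here:
#   eps* = eps * sqrt(2 n ln(1/omega)) + n * eps * (e^eps - 1)
#   delta* = n * delta + omega
def simple_comp(eps, delta, n):
    return n * eps, n * delta

def advanced_comp(eps, delta, n, omega):
    eps_star = (eps * math.sqrt(2 * n * math.log(1 / omega))
                + n * eps * (math.exp(eps) - 1))
    return eps_star, n * delta + omega

# For small per-iteration eps and many iterations, advanced composition
# wins on eps at the cost of a small omega added to delta:
e_simple, _ = simple_comp(0.01, 0.0, 10000)
e_adv, _ = advanced_comp(0.01, 0.0, 10000, 1e-6)
assert e_adv < e_simple
```

For a large per-iteration ϵ, the nϵ(e^ϵ − 1) term dominates and the comparison can flip, which is exactly the fallback case described in the text.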
To evaluate Fuzzi's effectiveness, we implement four differentially private learning algorithms from four diverse classes of learning methods—discriminative models, ensemble models, generative models, and instance-based learning. The algorithms and datasets are both taken from canonical sources. We want to know (1) whether Fuzzi can express these algorithms adequately, (2) whether the typechecker derives sensitivity bounds comparable to the results of a careful manual analysis, and (3) whether the final privacy costs are within a reasonable range.

We use datasets obtained from the UCI Machine Learning Repository [Dheeru and Karra Taniskidou 2017] and the MNIST database of handwritten digits [LeCun and Cortes 2010]. In these experiments, we focus on evaluating Fuzzi's usability for prototyping differentially private learning tasks, rather than trying to achieve state-of-the-art learning performance.

We find that Fuzzi can indeed express all four examples and that it correctly derives sensitivity bounds comparable to the results from a manual analysis. The examples also demonstrate that the extensions described in Section 5 are useful for real-world differential privacy programming, since each of the learning algorithms can be expressed as a straightforward combination of extensions.

On the other hand, the privacy costs that Fuzzi derives are arguably a bit disappointing. One reason for this is that we ran the experiments on fairly small datasets. A deeper reason is that Fuzzi focuses on accurate automatic inference of sensitivities, an important building block in differential privacy. Tracking sensitivities is somewhat orthogonal to the question of how to most tightly track privacy costs, which is achieved via composition theorems that sit on top of the sensitivity calculations. Our focus in this work has been mainly on tracking sensitivity; in particular, we implement only the simple composition theorem in the core type system.
The result is that Fuzzi may report a larger privacy cost than is optimal, even when it optimally computes sensitivities. However, stronger composition theorems can be added as extensions: we give an example of this by demonstrating an "advanced composition" [Dwork et al. 2010] extension in Section 5. We view adding extensions for more sophisticated methods of tracking privacy costs as future work (see Section 9 and Section 10).
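To make this trade-off concrete, the two accounting schemes can be compared numerically. The following sketch is plain Python, our own illustration rather than Fuzzi code; it implements the simple-composition formula and the advanced-composition formula ϵ∗ = ϵ√(2n ln(1/ω)) + nϵ(e^ϵ − 1), δ∗ = nδ + ω as we read it from Fig. 8:

```python
import math

def simple_composition(eps, delta, n):
    # Simple composition: per-iteration costs add up linearly.
    return n * eps, n * delta

def advanced_composition(eps, delta, n, omega):
    # Advanced composition [Dwork et al. 2010]:
    #   eps* = eps * sqrt(2 n ln(1/omega)) + n * eps * (e^eps - 1)
    #   delta* = n * delta + omega
    eps_star = (eps * math.sqrt(2 * n * math.log(1 / omega))
                + n * eps * (math.exp(eps) - 1))
    return eps_star, n * delta + omega

# With a small per-iteration eps, advanced composition is much cheaper ...
eps_adv, _ = advanced_composition(0.01, 0.0, 10000, 1e-5)
eps_sim, _ = simple_composition(0.01, 0.0, 10000)

# ... but with a large per-iteration eps the n*eps*(e^eps - 1) term
# dominates and simple composition wins, which is why the checker
# falls back to simple composition in that case.
eps_adv_big, _ = advanced_composition(1.0, 0.0, 100, 1e-5)
```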
We first investigate a binary classification problem. The dataset contains 12,665 digits (either 0 or 1) from the MNIST database. (We only work with 0 and 1 digits because it simplifies our presentation. A 10-class logistic regression model that classifies all 10 digits can be implemented using the same methods we show here.) We use 11,665 digits for training and leave 1,000 digits on the side for evaluation. Each digit is represented by a 28 × 28 grayscale image plus a label indicating whether it is a 0 or a 1. The image and its label are flattened into a 785-dimensional vector. We then use gradient descent, a simple and common machine learning technique, to train a standard logistic regression model that classifies these two shapes of digits. We apply differential privacy here to protect the privacy of each individual image of the digits. In other words, differential privacy limits an adversary's ability to tell whether a particular image was used in training the classification model. In particular, we modify gradient descent with a gradient clipping step to achieve differential privacy. Gradient clipping is a common technique for implementing differentially private gradient descent [Abadi et al. 2016; McMahan et al. 2018].

The logistic regression model is parameterized by a vector w of the same dimension as the input data and a scalar b. A "loss function" L(w, b, x_i) quantifies the mistakes the model makes given a pair of w and b and an input image x_i. In ordinary (non-private) gradient descent, we compute the gradients ∂L/∂w and ∂L/∂b for each image x_i, and we move the current parameters w and b in the direction of the average of these gradients, decreasing the value of the loss function L (i.e., improving the quality of the model parameters). To set the initial values of w and b, we take random samples from a normal distribution centered at 0 and with variance 1.

Since the gradients here are computed from private images, the model parameters modified with these gradients are also sensitive information that cannot be released directly. Instead, we release noised estimations of the average gradients and use these values to update the model parameters. We apply bmap over the input dataset, computing a bag of both gradients for each image. We then use bsum and the Laplace mechanism to release the sum of the gradients.
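The general technique described here (clip each per-example gradient, release a noised sum, and divide by a noised count) can be sketched outside Fuzzi as ordinary Python with numpy. This is our own illustrative rendering, not Fuzzi syntax; the function names and the clipping bound `clip` are assumptions for the sketch, and the three noised releases compose to give the total per-step privacy cost:

```python
import numpy as np

def dp_gradient_step(w, b, X, y, lr, clip, eps, rng):
    """One noised gradient step for logistic regression (illustrative).

    Each per-example gradient is clipped to L1 norm at most `clip`, so the
    summed gradient has L1 sensitivity `clip`; Laplace noise is calibrated
    to that sensitivity. The dataset size is released with its own noise."""
    # Per-example logistic-loss gradients.
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    grads_w = (p - y)[:, None] * X          # shape (n, d)
    grads_b = p - y                         # each entry in [-1, 1]
    # Clip each example's gradient so its L1 norm is at most `clip`.
    scale = np.maximum(1.0, np.abs(grads_w).sum(axis=1) / clip)
    grads_w = grads_w / scale[:, None]
    # Noised sum of gradients (sensitivity clip) and noised count (sensitivity 1).
    noisy_sum_w = grads_w.sum(axis=0) + rng.laplace(0.0, clip / eps, size=grads_w.shape[1])
    noisy_sum_b = grads_b.sum() + rng.laplace(0.0, 1.0 / eps)
    noisy_n = X.shape[0] + rng.laplace(0.0, 1.0 / eps)
    # Update parameters with the noised average gradient.
    return w - lr * noisy_sum_w / noisy_n, b - lr * noisy_sum_b / noisy_n
```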
We also use the Laplace mechanism to compute a noised estimate of the size of the dataset, and then update the model parameters with the noised average gradient. The bsum extension clips each gradient value so that the final sum has a bounded sensitivity.

We iterate the gradient descent calculation with the ac (advanced composition) extension. With 100 passes over the training set, we reach a training accuracy of 0.84. We measure accuracy as the fraction of images the trained model correctly classifies. The differentially private model's accuracy is comparable to the accuracy of 0.88 for a logistic regression model without differential privacy [Lecun et al. 1998]. Training the model with 100 passes incurs privacy cost ϵ = .02 and δ = − . Our ϵ privacy cost is larger than the results achieved by Abadi et al. [2016] (ϵ = . , δ = − ) on MNIST, due to our use of a simpler privacy composition theorem; Abadi et al. invented a specialized "moments accountant" method to derive tighter aggregated privacy costs given the same sensitivity analysis.

Next, we build on the logistic regression from above, together with ideas from Papernot et al. [2016], to design an ensemble model—a collection of models—that classifies 0 and 1 digits from the MNIST dataset. Papernot et al. demonstrated a general approach for providing privacy guarantees with
respect to the training data: first partition the private training data into k partitions, then apply a training procedure to each partition to build a private model for it. These k private models form a Private Aggregation of Teacher Ensembles (PATE). The PATE then predicts labels, using differential privacy, for another unlabeled public dataset. Since the resulting public dataset and its labels are not private information, they can be used to train any other model, effectively transferring knowledge from the PATE to a public model while preserving privacy. Note that we do not require the models used internally in creating the PATE to be differentially private: only the aggregation step (used to predict labels for the public dataset) involves differential privacy.

We split the training input dataset of MNIST digits into a bag of five parts. Using the extension bmap, for each part we independently train a logistic regression model with non-private gradient descent. Assuming the input dataset has sensitivity 1, only one part of the training data can change. Fuzzi correctly derives the fact that therefore at most 1 trained logistic regression model will change, resulting in a bag of model parameters with sensitivity 1.

We use the private ensemble of models to label another 100 test images; with privacy cost ϵ = .0 and δ = .0, we are able to reach an accuracy of 0.82. The large ϵ value here is related to the small size of the training set. To release a public label for a given image, the private scores are collected from the PATE with bmap, and then a noised average of the private scores is released by bsum with the Laplace mechanism. Since we only have 5 private scores for each image, the noise variance must be small so as not to destroy the utility of the scores, resulting in a big ϵ. To increase the stability of the released label (and hence decrease the privacy cost), we could increase the number of models, thus increasing the number of private scores and the scale of the summed score, thereby allowing more noise to be added to their average. However, the result would be that each model would be trained on correspondingly fewer images, resulting in worse classification performance on this dataset. On larger datasets, our Fuzzi implementation of PATE would provide the same level of classification performance with lower ϵ cost.

We next implement a simple spam detection algorithm using the Spambase dataset from the UCI Machine Learning Repository [Dheeru and Karra Taniskidou 2017]. The binary-labeled dataset (spam or non-spam) consists of 57 features, mostly word frequencies for a set of given words, with additional features describing run lengths of certain sequences. We binarize all features from the dataset to simplify the probability model described below (i.e., instead of how frequently a word appears on a scale of [0, 1], we only know whether the word was used (1) or not (0)). We could implement a more sophisticated Gaussian Naïve Bayes model that takes advantage of the frequency data using the same principles as in this experiment, but we chose to simplify the features to present a simpler model. We use 4500 samples for training and 100 samples for evaluation.
Our privacy goal in this experiment is to limit an adversary's ability to guess whether a particular document was used to train the classification model.

A key assumption of the Naïve Bayes model is that, given the class y of a data point x, all features are conditionally independent of each other. This assumption allows us to decompose the joint probability P(x, y) into the product P(y) · Π_j P(x_j | y), where x_j represents the j-th coordinate of the binary vector x. In our experimental setup, the j-th coordinate of a data point represents the presence of a word in the document. The goal of the Naïve Bayes model is to estimate the probabilities P(y = 1), P(x_j = 1 | y = 1), and P(x_j = 1 | y = 0) given the training data. Thus, when we get a new document x′, we can compare the probabilities

P(y = 1) Π_j P(x_j = x′_j | y = 1)  ≤?  P(y = 0) Π_j P(x_j = x′_j | y = 0)

to make a prediction on whether the document is spam or not (y = 1 or y = 0). In total, there are 2 · 57 + 1 = 115 parameters to estimate.

Estimating the parameter P(y = 1) simply involves adding noise to the number of spam documents in the training set and dividing that count by the noised size of the training dataset. We achieve this by first applying bmap to map each training data point to either 1 or 0 depending on its label, followed by bsum and the Laplace mechanism to get a noised count. We can get a noised size of the training set by applying the Laplace mechanism to the training set's size.

Estimating the parameters P(x_j = 1 | y = 1) and P(x_j = 1 | y = 0) follows an essentially identical procedure. We first apply bmap to map each training data point to either 1 or 0 based on the values of x_j and y, followed by a bsum operation to get the value of these conditional counts. We already computed the noised count of training samples with y = 1 when estimating P(y = 1), and we perform the same procedure to compute a noised count of y = 0. Dividing the noised conditional counts by the noised counts of y = 1 and y = 0 yields the estimated conditional probabilities.

We reach a training accuracy of 0.70 and a test accuracy of 0.69, with privacy costs ϵ = .70 and δ = 0. This classification accuracy is only slightly worse than the accuracy of 0.72 of a non-private Naïve Bayes model that we implemented using binarized features from the same dataset.
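The parameter-estimation procedure above amounts to a handful of noisy counting queries. The sketch below is our own plain-Python rendering of the technique, not Fuzzi code; each `noisy_count` call plays the role of a bmap/bsum/Laplace pipeline, and every call is a separate ϵ-DP release whose costs compose:

```python
import math

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) by inverse transform.
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def noisy_count(indicators, eps, rng):
    # A counting query has sensitivity 1, so Laplace scale 1/eps suffices.
    return sum(indicators) + laplace_noise(1.0 / eps, rng)

def fit_naive_bayes(X, y, eps, rng):
    """Estimate P(y=1) and P(x_j=1 | y=c) from binary data with noisy counts."""
    d = len(X[0])
    n_noisy = noisy_count([1 for _ in y], eps, rng)       # noised dataset size
    n1_noisy = noisy_count(y, eps, rng)                   # noised count of y=1
    n0_noisy = n_noisy - n1_noisy
    p_y1 = n1_noisy / n_noisy
    p_xj = {0: [], 1: []}
    for c in (0, 1):
        denom = n1_noisy if c == 1 else n0_noisy
        for j in range(d):
            cnt = noisy_count([1 if (xi[j] == 1 and yi == c) else 0
                               for xi, yi in zip(X, y)], eps, rng)
            # Clamp so the noised ratio stays a valid probability.
            p_xj[c].append(min(max(cnt / denom, 1e-6), 1 - 1e-6))
    return p_y1, p_xj

def predict(x, p_y1, p_xj):
    # Compare log P(y=c) + sum_j log P(x_j | y=c) for c in {0, 1}.
    def score(c, p_c):
        s = math.log(p_c)
        for j, xj in enumerate(x):
            pj = p_xj[c][j]
            s += math.log(pj if xj == 1 else 1 - pj)
        return s
    return 1 if score(1, p_y1) > score(0, 1 - p_y1) else 0
```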
Finally, we perform a K-Means clustering experiment to evaluate Fuzzi's usability for an unsupervised learning task on the iris dataset from Fisher [1936]. This dataset contains three classes of iris flowers, with 50 flowers from each class. Each flower comes with four numeric features of petal and sepal dimensions and a label representing its class. Our experiment randomly selected one data point from each of the three classes as the initial public centroids; the Fuzzi program uses partition to map each data point to its closest centroid and create partitions accordingly. Other than the three data points used to initialize centroids, we used all other data for unsupervised training. (This experiment assumes that a small part—in this case, three data points—of the training set is given as public information. Past work implementing differentially private K-Means made a similar assumption [Reed and Pierce 2010].)

On each pass over the training set, we first compute a noised sum of the data points within each partition; we also compute noised sizes of each partition. We use these values to compute each partition's average point as the new centroids for the next pass. For evaluation, we classify all points within a partition with the majority label, and we obtain the accuracy of the clustering from these classifications. We do not use the labels for unsupervised training.

We found that the performance of the clustering algorithm varies depending on the initial centroids selected: running the experiment 100 times, all within 5 passes over the dataset, we reach a lowest accuracy of 0.55 and a highest accuracy of 0.9, with a median accuracy of 0.69. Increasing the iteration count does not reduce this spread. We implemented a non-private version of the same algorithm and achieved a lowest accuracy of 0.59, a highest accuracy of 0.96, and a median accuracy of 0.59 over 100 experiments. Similar to Naïve Bayes, we see a slight drop in classification accuracy compared to the non-private implementation.

Each run has privacy cost ϵ = . and δ = .0. The large ϵ cost here is again related to the small size of the training set. In a small dataset, each data point has a larger impact on the released centroids; in order to reach a reasonable level of classification accuracy, we chose to apply the Laplace mechanism with a smaller noise level, resulting in a larger ϵ cost.

We briefly discuss some limitations and shortcomings of Fuzzi.
Limitation of sensitivity.
Fuzzi's type system interface strikes a careful balance between expressiveness and complexity. Our approach is sufficient for expressing sensitivities of primitive values such as int and real and can capture a top-level sensitivity for vectors and bags; however, typing contexts cannot express sensitivities for individual values within a vector. For example, McSherry and Mironov developed a differentially private version of the recommender system based on Netflix user ratings [McSherry and Mironov 2009], where the sensitivity of inputs to the system is defined by the changes that may happen to a single row within a matrix, rather than the whole matrix. Fuzzi currently cannot carry out automatic inter-structure sensitivity derivation and cannot provide automatic differential privacy checking for McSherry and Mironov's algorithm.
Lack of support for abstraction.
Vectors and bags are well-studied objects in the differential privacy literature, and they have first-class support in Fuzzi. However, Fuzzi does not provide facilities for specifying general abstract data types and their neighboring relations. Fuzzi must know how to translate a neighboring relation into an apRHL assertion, and this translation is not currently extensible. This limitation may force programmers to contort their code in order to represent a high-level concept through arrays. An example of an algorithm that cannot be adequately expressed in Fuzzi due to this lack of abstraction is the binary mechanism [Chan et al. 2011], which builds a tree of partial sums of the input data and accumulates a statistic whose sensitivity is proportional to the depth of the tree.
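To illustrate what the binary mechanism computes, the sketch below is our own simplified, non-streaming Python rendering of the idea from Chan et al. [2011], not Fuzzi code: noise is added to every node of a dyadic tree of partial sums, each stream element affects at most depth+1 nodes (the sensitivity proportional to tree depth mentioned above), and any prefix sum is answered from at most one noisy node per level.

```python
import math

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) by inverse transform.
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def binary_mechanism(stream, eps, rng):
    """Noisy dyadic partial sums over a length-2^k stream of 0/1 values.

    Each element contributes to at most depth+1 tree nodes, so eps is
    split across the levels when calibrating the per-node noise."""
    depth = int(math.log2(len(stream)))
    scale = (depth + 1) / eps
    noisy, level = [], list(stream)
    noisy.append([x + laplace_noise(scale, rng) for x in level])
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        noisy.append([x + laplace_noise(scale, rng) for x in level])
    return noisy  # noisy[L][i] = noised sum of block i at level L

def prefix_sum(noisy, t):
    # Answer a prefix-sum query for the first t elements using the
    # dyadic decomposition of [0, t): one noisy node per level at most.
    total, offset = 0.0, 0
    for level in range(len(noisy) - 1, -1, -1):
        block = 1 << level
        if offset + block <= t:
            total += noisy[level][offset // block]
            offset += block
    return total
```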
Potential Vulnerabilities.
Fuzzi's semantics uses real numbers as a model for the type real. However, the implementation uses floating-point numbers. As shown by Mironov [2012], using the Laplace mechanism in this setting may result in vulnerable distributions that can compromise the original sensitive data. Although Fuzzi guarantees co-termination over neighboring data, it is vulnerable to timing channel attacks [Haeberlen et al. 2011]. A Fuzzi program that uses sensitive loop conditions may result in vastly different execution durations. This side channel allows an attacker to distinguish runs with high confidence. The first issue can be alleviated by a careful implementation of the Laplace mechanism that incorporates Mironov's mitigation strategy, while the second issue is more fundamental—Fuzzi's type system would need to approximately measure execution time, which we did not address in this work.
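The floating-point issue can be made concrete. The first function below is a textbook inverse-transform Laplace sampler over doubles, exactly the style of implementation Mironov [2012] showed to be vulnerable: the set of reachable floating-point outputs differs between neighboring true values, leaking information through low-order bits. The second function is our own heavily simplified gesture at his mitigation (clamp and snap the output to a fixed grid); the real snapping mechanism also adjusts the privacy accounting, which this sketch omits:

```python
import math

def naive_laplace_mechanism(true_value, sensitivity, eps, rng):
    # Textbook Laplace mechanism over floats; vulnerable per Mironov [2012]
    # because float artifacts make some outputs reachable from only one
    # of two neighboring true values.
    scale = sensitivity / eps
    u = rng.random() - 0.5
    return true_value - scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def snapped_laplace_mechanism(true_value, sensitivity, eps, rng,
                              grid=2.0 ** -10, clamp=1e6):
    # Simplified sketch of the mitigation: clamp the noisy output and
    # round it onto a fixed grid, so neighboring inputs share the same
    # output set. (Omits the eps adjustment of the full snapping mechanism.)
    v = naive_laplace_mechanism(true_value, sensitivity, eps, rng)
    v = max(-clamp, min(clamp, v))
    return round(v / grid) * grid
```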
Performance concern due to copy assignments.
Fuzzi uses copy assignments for arrays. We have worked with relatively small datasets, and the sizes of these arrays have not caused severe performance problems in our experiments. However, today's machine learning tasks typically operate on datasets that are many orders of magnitude larger, and Fuzzi likely cannot handle computations over such datasets efficiently. To adapt Fuzzi's theory to a semantics that allows sharing, we would need to create a new flavor of apRHL that can reason about heaps. One potential direction is to integrate separation logic [Reynolds 2002] into apRHL.
Query languages.
McSherry introduced Privacy Integrated Queries (PINQ), an embedded query language extension for the C# language. Fuzzi's partition extension takes inspiration from PINQ's partition operator, adapting it to an imperative program that computes over arrays.

The FLEX framework [Johnson et al. 2018] allows programmers to run differentially private SQL queries against a private database. FLEX uses an elastic sensitivity technique to support SQL queries with joins based on equality. Fuzzi focuses on adding support for Differential Privacy to a
general-purpose imperative language; however, the theory around elastic sensitivities could inspire future extensions to the Fuzzi type system.

DJoin [Narayan and Haeberlen 2012] runs SQL queries over databases distributed across many machines. The distributed nature of the data is not just a question of size, but may be due to the fact that different databases may be owned by different organizations that do not wish to share them; there simply is no single way to get all the data in the same place. (For example, analysts may want to correlate travel data with illness diagnosis data, with the former provided by airline companies and the latter provided by hospitals.) Fuzzi does not address distributed computations: it runs on a single machine and assumes the data is already in the memory of this machine.
Fuzz and related languages.
Fuzz is a higher-order functional programming language with a sensitivity-tracking type system and differentially private primitives [Reed and Pierce 2010]. Fuzzi's sensitivity type system is inspired by Fuzz but differs in that Fuzzi separately tracks the sensitivity of each value in the store (which may change as the program assigns to variables), while Fuzz tracks only function sensitivity. Also, Fuzz's type system is restricted to (ϵ, 0)-differential privacy, while Fuzzi generalizes this to (ϵ, δ)-differential privacy.

DFuzz [Gaboardi et al. 2013] extends Fuzz with linear indexed types and dependent types, allowing programmers to abstract types over sensitivity annotations. Compared to DFuzz, both Fuzz and Fuzzi allow only purely numeric values as sensitivity annotations in types. DFuzz's additional level of expressiveness admits programs whose sensitivities and privacy costs scale with input sensitivities. Although Fuzzi does not allow such indexed types, the extension mechanism does allow language developers to add typing rules quantified over unknown constants (such as the loop count in Adv-Comp for advanced composition); this provides another way for programmers to write programs whose privacy costs scale with program constants.

AdaptiveFuzz [Winograd-Cort et al. 2017] extends Fuzz by using staged computation and stream semantics to implement a powerful composition mechanism called Privacy Filters [Rogers et al. 2016]. These give programmers the freedom to run future computations based on results released from earlier differentially private computations. This allows, for example, programmers to stop a private gradient descent loop as soon as accuracy reaches a desired threshold, rather than fixing the number of iterations ahead of time. Fuzzi implements advanced composition for improved privacy cost aggregation, but privacy filters are not yet formalized in either apRHL or Fuzzi.

Duet [Near et al. 2019] is a higher-order functional language that provides (ϵ, δ)-differential privacy. Fuzz's original type system relies on composition properties that break down when generalized to cases where δ > 0. As an example, an (ϵ, 0)-DP Fuzz function f that takes a 1-sensitive dataset as input has the property that, when f runs on a 2-sensitive dataset, the privacy cost scales accordingly to (2ϵ, 0)-DP. However, if f is a general (ϵ, δ)-DP computation on 1-sensitive datasets, it is not true that running f on a 2-sensitive dataset is (2ϵ, 2δ)-DP. Duet solves this problem by separating its type system into two disjoint parts: one that keeps track of sensitivities and allows scaling, and another that keeps track of privacy costs and disallows scaling. In Fuzzi, we have a similar separation: the typing contexts of a command sequence are strict pre-conditions and post-conditions and do not allow scaling, except for commands typechecked with the linear typing judgements that explicitly allow scaling of sensitivities in the pre- and post-condition. These linear commands are deterministic and cannot use the Laplace mechanism by definition, so scaling their typing judgements' sensitivities does not raise the same issues that Fuzz has.

Verification systems.
Albarghouthi and Hsu [2017] developed an automated differential privacy proof synthesis system based on the idea of coupling. Fuzzi and apRHL use the same mathematical device to simplify relational reasoning between non-deterministic outputs from the Laplace mechanism.
LightDP is an imperative language with semi-automatic typechecking that uses dependent types to prove differential privacy properties of a program [Zhang and Kifer 2017]. LightDP's type system also keeps track of distances of variables between executions on neighboring inputs. A major difference from Fuzzi is that LightDP's type system tracks the exact distance between variables through a dependent type system, while Fuzzi tracks upper bounds on the distance between variables. LightDP elaborates the source code into a slightly extended language that explicitly keeps track of privacy costs in a distinguished variable and then uses a MaxSMT solver to discharge the generated verification conditions in the process of typechecking. The soundness of Fuzzi's typing rules is proven ahead of time, and Fuzzi's sensitivity checking process does not generate new proof obligations. Due to this design, Fuzzi does not require a constraint solver to aid in typechecking.

The EasyCrypt toolset allows developers to construct machine-checkable proofs of relational properties for probabilistic computations [EasyCrypt Development Team 2018]. EasyCrypt has a development branch that focuses on Differential Privacy verification through apRHL. EasyCrypt provides built-in support for apRHL proof rules and also supports termination analysis through a compatible program logic, pHL (probabilistic Hoare logic). Fuzzi's development does not connect with EasyCrypt's apRHL implementation, but this is a potential future direction for rigorously checking Fuzzi's theories.
Testing Differential Privacy.
Ding et al. [2018] developed a statistical testing framework for detecting violations of differential privacy in Python programs. The framework performs static analysis on Python code and generates inputs that seem likely to violate differential privacy based on this analysis. It also repeatedly executes the program to collect statistical evidence of violations of differential privacy. This framework demonstrates the potential for a lighter-weight approach to providing differential privacy guarantees; we could potentially apply the same methodology to aid Fuzzi extension designers by testing typing rules before formally proving their soundness.
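The core idea of such statistical testing can be sketched in a few lines. The toy version below is our own, far simpler than the Ding et al. framework: run a mechanism many times on two neighboring inputs, pick an output event, and flag the mechanism when the empirical probabilities violate the e^ϵ bound by a wide margin (the slack term is an arbitrary choice to absorb sampling error):

```python
import math

def looks_like_violation(mechanism, input1, input2, event, eps, trials, rng):
    """Crude statistical test for an eps-DP claim (delta = 0).

    Estimates p1 = P[M(input1) in E] and p2 = P[M(input2) in E];
    eps-DP requires p1 <= e^eps * p2. A large empirical gap is
    evidence (not proof) of a violation."""
    p1 = sum(event(mechanism(input1, rng)) for _ in range(trials)) / trials
    p2 = sum(event(mechanism(input2, rng)) for _ in range(trials)) / trials
    return p1 > math.exp(eps) * p2 + 0.05  # generous slack for sampling error

def no_noise(x, rng):
    # A broken "mechanism" that adds no noise: trivially not eps-DP.
    return x

def laplace_mech(x, rng):
    # A correct eps=1 Laplace mechanism for a 1-sensitive input.
    u = rng.random() - 0.5
    return x - 1.0 * math.copysign(math.log(1 - 2 * abs(u)), u)
```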
High-level frameworks.
The PSI private data sharing interface [Gaboardi et al. 2016] is designed to enable non-expert users and researchers both to safely deposit private data and to apply a select set of differentially private algorithms to collect statistics from the deposited data. Fuzzi, on the other hand, is designed only for the task of implementing differentially private algorithms. It expects its users to have some familiarity with key concepts such as sensitivity and privacy budget, and it allows power users to extend its typechecker for more sophisticated programs.

The ϵKTELO framework [Zhang et al. 2018] provides a set of expressive high-level combinators for composing algorithms, with the guarantee that any algorithm composed of ϵKTELO combinators automatically satisfies differential privacy. The ϵKTELO framework allows users to customize differentially private algorithms in order to achieve higher utility from querying private data. Fuzzi, by contrast, is an attempt at building a rather low-level core language with support for Differential Privacy. A future direction could be to build a high-level framework like ϵKTELO over Fuzzi, providing both expressive combinators and automatic verification of Differential Privacy for the implementations of these combinators in the same system.
10 CONCLUSION AND FUTURE WORK
The rise of Differential Privacy calls for reliable yet familiar tools that help programmers control privacy risks. Fuzzi gives programmers a standard imperative language with automated privacy checking, which can be enriched by expert users with extensions whose privacy properties are proved in a special-purpose relational logic.

Many avenues for improvement still remain. (1) We can enrich the set of Fuzzi extensions to further increase Fuzzi's utility. For example, adding Report Noisy Max would allow an analyst to find the largest value in a vector with small privacy cost; the Exponential mechanism would
allow programs to release categorical data (as opposed to numerical) with differential privacy guarantees [Dwork and Roth 2014]. We expect both mechanisms can be formalized in apRHL [Barthe et al. 2016] and added to Fuzzi. (2) We can engineer typechecker plugins that dynamically load new extension typing rules to make Fuzzi's implementation more flexible. (At the moment, adding an extension typing rule requires editing the typechecker sources.) (3) We can formalize Privacy Filters [Rogers et al. 2016] in apRHL and add adaptive composition to Fuzzi. This would allow programmers to use the adaptive aggregation mechanism from AdaptiveFuzz in Fuzzi. (4) We can implement Fuzzi as a formalized framework in Coq [Coq Development Team 2018]. This would allow power users to write machine-checked proofs of extension typing rules. (5) We can incorporate proof synthesis techniques to automatically search for privacy proofs for extensions, following Albarghouthi and Hsu [2017], who demonstrated the effectiveness of synthesizing privacy proofs for interesting differential privacy mechanisms. Proof synthesis could streamline prototyping new Fuzzi extensions and their typing rules.
11 ACKNOWLEDGMENTS
We are grateful to Justin Hsu, David Darais, and the Penn PL Club for their comments, and we thank the anonymous ICFP reviewers for their detailed and helpful feedback. This work was supported in part by the National Science Foundation under grants CNS-1065060 and CNS-1513694.
REFERENCES
Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep Learning with Differential Privacy. In
Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16). ACM, New York, NY, USA, 308–318. https://doi.org/10.1145/2976749.2978318
Amal Ahmed. 2006. Step-Indexed Syntactic Logical Relations for Recursive and Quantified Types. In
Programming Languages and Systems, Peter Sestoft (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 69–83.
Aws Albarghouthi and Justin Hsu. 2017. Synthesizing Coupling Proofs of Differential Privacy.
Proc. ACM Program. Lang.
ACM Trans. Program. Lang. Syst.
23, 5 (Sept. 2001), 657–683. https://doi.org/10.1145/504709.504712
Andrew W. Appel, Paul-André Melliès, Christopher D. Richards, and Jérôme Vouillon. 2007. A Very Modal Model of a Modern, Major, General Type System. In
Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '07). ACM, New York, NY, USA, 109–122. https://doi.org/10.1145/1190216.1190235
Apple. 2017. Apple Differential Privacy Whitepaper. https://images.apple.com/privacy/docs/Differential_Privacy_Overview.pdf
Patrick Bahr and Tom Hvitved. 2011. Compositional Data Types. In
Proceedings of the Seventh ACM SIGPLAN Workshop on Generic Programming (WGP '11). ACM, New York, NY, USA, 83–94. https://doi.org/10.1145/2036918.2036930
Gilles Barthe, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2016. Proving Differential Privacy via Probabilistic Couplings. In
Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science (LICS '16). ACM, New York, NY, USA, 749–758. https://doi.org/10.1145/2933575.2934554
Nick Benton. 2004. Simple Relational Correctness Proofs for Static Analyses and Program Transformations. In
Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '04). ACM, New York, NY, USA, 14–25. https://doi.org/10.1145/964001.964003
T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2011. Private and Continual Release of Statistics.
ACM Trans. Inf. Syst. Secur.
14, 3, Article 26 (Nov. 2011), 24 pages. https://doi.org/10.1145/2043621.2043626
Yan Chen and Ashwin Machanavajjhala. 2015. On the Privacy Properties of Variants on the Sparse Vector Technique.
CoRR abs/1508.07306 (2015). arXiv:1508.07306 http://arxiv.org/abs/1508.07306
The Coq Development Team. 2018.
The Coq Proof Assistant Reference Manual, version 8.8. http://coq.inria.fr
Dua Dheeru and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. 2018. Detecting Violations of Differential Privacy. In
Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS '18). ACM, New York, NY, USA, 475–489. https://doi.org/10.1145/3243734.3243818
Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In
Proceedings of the Third Conference on Theory of Cryptography (TCC'06). Springer-Verlag, Berlin, Heidelberg.
Found. Trends Theor. Comput. Sci.
9, 3&
Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS '10). IEEE, Las Vegas, NV, 51–60. http://dx.doi.org/10.1109/FOCS.2010.12
The EasyCrypt Development Team. 2018.
EasyCrypt Reference Manual, version 1.x
Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS '14). ACM, New York, NY, USA, 1054–1067. https://doi.org/10.1145/2660267.2660348
R. A. Fisher. 1936. The Use of Multiple Measurements in Taxonomic Problems.
Annals of Eugenics
7, 7 (1936), 179–188.
Dan Frumin, Robbert Krebbers, and Lars Birkedal. 2018. ReLoC: A Mechanised Relational Logic for Fine-Grained Concurrency. In Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS '18). ACM, New York, NY, USA, 442–451. https://doi.org/10.1145/3209108.3209174
Marco Gaboardi, Andreas Haeberlen, Justin Hsu, Arjun Narayan, and Benjamin C. Pierce. 2013. Linear Dependent Types for Differential Privacy. In Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '13). ACM, New York, NY, USA, 357–370. https://doi.org/10.1145/2429069.2429113
Marco Gaboardi, James Honaker, Gary King, Kobbi Nissim, Jonathan Ullman, and Salil P. Vadhan. 2016. PSI (Ψ): a Private data Sharing Interface. CoRR abs/1609.04340 (2016). arXiv:1609.04340 http://arxiv.org/abs/1609.04340
Andreas Haeberlen, Benjamin C. Pierce, and Arjun Narayan. 2011. Differential Privacy Under Fire. In Proceedings of the 20th USENIX Conference on Security (SEC '11). USENIX Association, Berkeley, CA, USA, 33–33. http://dl.acm.org/citation.cfm?id=2028067.2028100
C. A. R. Hoare. 1969. An Axiomatic Basis for Computer Programming. Commun. ACM 12, 10 (Oct. 1969), 576–580. https://doi.org/10.1145/363235.363259
Justin Hsu. 2017. Probabilistic Couplings for Probabilistic Reasoning. CoRR abs/1710.09951 (2017). arXiv:1710.09951 http://arxiv.org/abs/1710.09951
Justin Hsu. 2018. Private Communication.
Noah Johnson, Joseph P. Near, and Dawn Song. 2018. Towards Practical Differential Privacy for SQL Queries. Proc. VLDB Endow. 11, 5 (Jan. 2018), 526–539. https://doi.org/10.1145/3187009.3177733
Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2017. RustBelt: Securing the Foundations of the Rust Programming Language. Proc. ACM Program. Lang.
2, POPL, Article 66 (Dec. 2017), 34 pages. https://doi.org/10.1145/3158154
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.
Min Lyu, Dong Su, and Ninghui Li. 2016. Understanding the Sparse Vector Technique for Differential Privacy. CoRR abs/1603.01699 (2016). arXiv:1603.01699 http://arxiv.org/abs/1603.01699
H. Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2018. Learning Differentially Private Recurrent Language Models. In
International Conference on Learning Representations. https://openreview.net/forum?id=BJ0hF1Z0b
Frank McSherry and Ilya Mironov. 2009. Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09). ACM, New York, NY, USA, 627–636. https://doi.org/10.1145/1557019.1557090
Frank D. McSherry. 2009. Privacy Integrated Queries: An Extensible Platform for Privacy-preserving Data Analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09).
Ilya Mironov. 2012. On Significance of the Least Significant Bits for Differential Privacy. In Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS '12).

Fig. 9. Operations over the distribution monad: bind :: ⃝A → (A → ⃝B) → ⃝B and ret :: A → ⃝A.
Arjun Narayan and Andreas Haeberlen. 2012. DJoin: Differentially Private Join Queries over Distributed Databases. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12). USENIX Association, Berkeley, CA, USA, 149–162. http://dl.acm.org/citation.cfm?id=2387880.2387895
Joseph P. Near, David Darais, Tim Stevens, Paranav Gaddamadugu, Lun Wang, Neel Somani, Mu Zhang, Nikhil Sharma, Alex Shan, and Dawn Song. 2019. Duet: An Expressive Higher-Order Language and Linear Type System for Statically Enforcing Differential Privacy. (2019). http://david.darais.com/assets/papers/duet/duet.pdf
Travis E. Oliphant. 2015. Guide to NumPy (2nd ed.). CreateSpace Independent Publishing Platform, USA.
Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. 2016. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data. arXiv:1610.05755 [cs, stat] (Oct. 2016). http://arxiv.org/abs/1610.05755
Jason Reed and Benjamin C. Pierce. 2010. Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy.
SIGPLAN Not.
45, 9 (Sept. 2010), 157–168. https://doi.org/10.1145/1932681.1863568
John C. Reynolds. 2002. Separation Logic: A Logic for Shared Mutable Data Structures. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science (LICS '02). IEEE Computer Society, Washington, DC, USA, 55–74. http://dl.acm.org/citation.cfm?id=645683.664578
Ryan M. Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. 2016. Privacy Odometers and Filters: Pay-as-you-Go Composition. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.). Curran Associates, Inc., 1921–1929. http://papers.nips.cc/paper/6170-privacy-odometers-and-filters-pay-as-you-go-composition.pdf
Tetsuya Sato. 2016. Approximate Relational Hoare Logic for Continuous Random Samplings. Electronic Notes in Theoretical Computer Science 325 (2016), 277–298. https://doi.org/10.1016/j.entcs.2016.09.043 The Thirty-second Conference on the Mathematical Foundations of Programming Semantics (MFPS XXXII).
Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata. 2019. Approximate Span Liftings. CoRR abs/1710.09010 (2019). arXiv:1710.09010 http://arxiv.org/abs/1710.09010
Wouter Swierstra. 2008. Data Types à La Carte. J. Funct. Program. 18, 4 (July 2008), 423–436. https://doi.org/10.1017/S0956796808006758
Daniel Winograd-Cort, Andreas Haeberlen, Aaron Roth, and Benjamin C. Pierce. 2017. A Framework for Adaptive Differential Privacy. Proc. ACM Program. Lang. 1, ICFP, Article 10 (Aug. 2017), 29 pages. https://doi.org/10.1145/3110254
Danfeng Zhang and Daniel Kifer. 2017. LightDP: Towards Automating Differential Privacy Proofs. SIGPLAN Not. 52, 1 (Jan. 2017), 888–901. https://doi.org/10.1145/3093333.3009884
Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, and Gerome Miklau. 2018. Ektelo: A Framework for Defining Differentially-Private Computations. In SIGMOD Conference.

A SEMANTICS
Fuzzi's semantics directly follows the work of Barthe et al. [2016], but we extend the language with operations over vectors and bags (Figure 10). Because of the possibility of out-of-bounds indexing, Fuzzi's semantics for expressions accounts for partiality by modeling expressions as partial functions. We use a list structure to model arrays in Fuzzi. The function length v returns the length of the list v. The function resize len v updates the list v so that it has length len, padding with well-shaped default values if necessary. The function update b i v returns another list whose i-th element is set to b, or ⊥ if i is out of bounds. We elide an implicit coercion from ⊥ to the special distribution distr_0. The distribution distr_0 is the empty distribution; it is used to model non-termination.

The semantics of Fuzzi commands is given by probabilistic functions from program states to sub-distributions over program states. We write ⃝A for the set of sub-distributions over A. Sub-distributions form a monad with two operators, ret and bind (Figure 9). The ret operator takes a value and produces a distribution whose entire mass is concentrated on that single value; the bind operator builds conditional distributions by composing distributions with functions from samples to distributions.

[[x = e]] = λM. let v = [[e]] M in ret (M[x ↦ v])
[[x[i] = e]] = λM. let v = [[e]] M in let idx = [[i]] M in ret (M[x ↦ update v idx M(x)])
[[x.length = e]] = λM. let len = [[e]] M in ret (M[x ↦ resize len M(x)])
[[if e then c₁ else c₂ end]] = λM. let cond = [[e]] M in if cond then [[c₁]] M else [[c₂]] M
[[while e do c end]]ₙ = λM. if n ≤ 0 then distr_0 else if [[e]] M then bind ([[c]] M) ([[while e do c end]]ₙ₋₁) else ret M
[[while e do c end]] = ⊔ₙ [[while e do c end]]ₙ

Fig. 10. Semantics of Fuzzi
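To make the ret and bind operators of Figure 9 concrete, here is a small model of finite sub-distributions as Python dicts from outcomes to probabilities; the dict representation and everything other than the names ret and bind are our illustrative choices, not Fuzzi's implementation.

```python
def ret(value):
    # Point-mass distribution: the entire mass sits on a single value.
    return {value: 1.0}

def bind(dist, kleisli):
    # Compose a sub-distribution with a function from samples to
    # sub-distributions, summing the weighted mass of each outcome.
    out = {}
    for sample, p in dist.items():
        for value, q in kleisli(sample).items():
            out[value] = out.get(value, 0.0) + p * q
    return out

# A fair coin, shifted by 1 deterministically:
coin = {0: 0.5, 1: 0.5}
assert bind(coin, lambda n: ret(n + 1)) == {1: 0.5, 2: 0.5}

# Sub-distributions may sum to less than 1, modeling non-termination
# (the empty dict plays the role of the empty distribution):
partial = bind(coin, lambda n: ret(n) if n == 0 else {})
assert sum(partial.values()) == 0.5
```

Because masses need only sum to at most 1, non-terminating branches simply drop mass, which is exactly how the while-loop semantics above treats exhausted fuel.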
B PROOF RULES FOR apRHL
Reasoning about programs that implement differentially private mechanisms involves both relational and probabilistic reasoning. The authors of apRHL combined ingredients from program logic and probability theory to create a proof system that manages the complexity of such proofs. The combination of two key ingredients, relational Hoare logic and approximate liftings, results in a proof system that is well suited to differential privacy proofs.

Benton [2004] introduced relational Hoare logic (RHL). RHL provides a system for writing down proofs of properties between two executions of deterministic while-programs. This proof system would allow us to embed a type system that tracks sensitivity (but not privacy costs) for the deterministic fragment of Fuzzi; however, RHL lacks any facility for reasoning about distributions.

Distributions over program states have complicated structures that arise from branches and loops; proofs that directly manipulate these distributions can become unwieldy. Fortunately, apRHL applies "approximate liftings" to significantly simplify such proofs.

An approximate lifting is characterized by a relation R between two distributions' supports: given a distribution µ_A in ⃝A and a distribution µ_B in ⃝B, the relation R is a subset of A × B. An approximate lifting based on R allows us to consider only elements linked by R while proving relational properties of µ_A and µ_B, and apRHL applies this abstraction to simplify proofs about probabilistic programs.

An apRHL judgment has the form ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ. The assertions Φ and Ψ are relations over program states. The pre-condition Φ is a deterministic relation that the input program states to c₁ and c₂ satisfy.
Since c₁ and c₂ are probabilistic programs, the output program states after running c₁ and c₂ are distributions over program states, and the post-condition is used to construct an approximate lifting between the two output distributions. The validity of an apRHL judgment implies a valid approximate lifting of the post-condition Ψ over [[c₁]] M₁ and [[c₂]] M₂. The
Fig. 11. Approximate lifting of R between two Laplace distributions

relations Φ and Ψ may describe properties such as x⟨1⟩ = x⟨2⟩, meaning that the value held in x is the same in both executions.

An approximate lifting of R ⊆ A₁ × A₂ that relates two distributions µ₁ : ⃝A₁ and µ₂ : ⃝A₂ is justified by two witness distributions and two cost parameters (ϵ, δ). Let ⋆ be a distinct element not in A₁ ∪ A₂, and write the extended relation R⋆ = R ∪ A₁ × {⋆} ∪ {⋆} × A₂. Let A₁⋆ and A₂⋆ be A₁ and A₂ extended with ⋆, respectively. We define the ϵ-distance between µ_L and µ_R as follows:

d_ϵ(µ_L, µ_R) = max_{S ⊆ A₁⋆ × A₂⋆} (µ_L(S) − exp(ϵ) · µ_R(S))

Then the two witness distributions µ_L and µ_R must satisfy the following properties [Hsu 2017]:

1. π₁(µ_L) = µ₁ and π₂(µ_R) = µ₂ (Marginal)
2. supp(µ_L) ∪ supp(µ_R) ⊆ R⋆ (Support)
3. d_ϵ(µ_L, µ_R) ≤ δ (Distance)

In the Marginal condition, π₁ and π₂ are the first and second projections of distributions, which produce the first and second marginal distributions on A₁⋆ × A₂⋆; this condition requires the witnesses to "cover" the original µ₁ and µ₂ on their respective projections. The Support condition requires the supports of both witnesses to reside within the relation R⋆. The Distance condition bounds the exp(ϵ) multiplicative difference and the δ additive difference between the two witness distributions; this distance definition matches the definition of privacy costs used in differential privacy.

We can gain some intuition for approximate liftings from Figure 11, which visualizes an approximate lifting of the equality relation over two Laplace distributions. The validity of the approximate lifting gives a global bound on the difference in probability of the linked elements in this plot, and this bound can be naturally interpreted as a privacy cost for differential privacy.

The proof rules of apRHL construct these witness distributions for probabilistic imperative programs.
In particular, the Seq rule allows us to treat the approximate lifting of a post-condition as the pre-condition of the next command. The resulting proof system effectively abstracts away explicit reasoning about distributions. The relational assertions of apRHL let us naturally express predicates over pairs of program states, and approximate liftings hide much of the plumbing for probabilistic reasoning. We list the subset of apRHL proof rules used in the development of Fuzzi in Figure 12 and Figure 13.

The structural apRHL proof rules relate programs that are structurally similar: assignments are related with assignments, conditionals are related with conditionals by relating their corresponding true and false branches, and loops are related by synchronizing their loop bodies. Among the structural rules, the Lap rule formalizes the Laplace mechanism: the pre-condition states that the value to be released has sensitivity k, and releasing this value with noise scaled by 1/ϵ is (kϵ, 0)-DP.

An important quirk of apRHL is that it does not have a conjunction rule of the following form:

  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ        ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Θ
  ─────────────────────────────────────────────────────────
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ ∧ Θ

Since an apRHL judgment is justified by the existence of witness distributions for the approximate lifting of the post-condition, the two judgments above the inference bar tell us there exist witnesses separately justifying the approximate liftings of Ψ and Θ. However, this does not guarantee the existence of an approximate lifting for their conjunction. Thus, this rule is not valid.
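For finite witness distributions, the ϵ-distance d_ϵ from the Distance condition can be computed directly: each pair contributes to µ_L(S) − exp(ϵ)·µ_R(S) independently, so the maximizing set S contains exactly the pairs with positive excess mass. The sketch below uses our own dict encoding of distributions over pairs; it is an illustration, not Fuzzi's machinery.

```python
import math

def d_eps(mu_l, mu_r, eps):
    # d_eps(muL, muR) = max over subsets S of (muL(S) - exp(eps) * muR(S)).
    # The maximum keeps exactly the outcomes whose excess mass is positive.
    keys = set(mu_l) | set(mu_r)
    return sum(max(mu_l.get(k, 0.0) - math.exp(eps) * mu_r.get(k, 0.0), 0.0)
               for k in keys)

# Identical witnesses are at distance 0 for any eps >= 0:
mu = {("a", "a"): 0.5, ("b", "b"): 0.5}
assert d_eps(mu, mu, 0.1) == 0.0

# Fully disjoint witnesses at eps = 0 are at distance 1:
assert d_eps({("a", "a"): 1.0}, {("b", "b"): 1.0}, 0.0) == 1.0
```

Checking the Distance condition for candidate witnesses then amounts to verifying d_eps(mu_l, mu_r, eps) <= delta.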
However, apRHL does have a Frame rule, which allows us to conjoin Θ with the pre- and post-conditions, as long as Θ does not mention any variables modified by the two related programs.

Sometimes the structural apRHL rules are not enough, since sensitive values may steer program control flow toward different sequences of code. The authors of apRHL account for reasoning about such programs through one-sided proof rules, shown in Figure 13.

The rules While-L and While-R deserve special attention: they allow us to relate a while loop with the skip command using a one-sided loop invariant. This lets us carry out standard Floyd-Hoare reasoning on a single while loop. We will use these one-sided loop rules extensively later in the proofs of the extension typing rules. These one-sided loop rules require a losslessness side-condition for the loop body: a program c is lossless if executing c results in a proper distribution, i.e., one whose probabilities sum to 1. Although apRHL gives the definition of losslessness, it does not provide proof rules for it. In the development of Fuzzi, we developed a termination type system compatible with this definition of losslessness. Details can be found in Appendix C.

C RULES FOR term
AND linear
C.1 Justifying term rules
In Section 5, we defined two auxiliary properties of Fuzzi programs, term and linear, in order to typecheck various extensions. The typing rules for these auxiliary properties share a similar design with the typing rules for sensitivity: we give the definition of each auxiliary property in the base logic L and state the typing rules as theorems to be justified in L. However, for expressions, we again use inductive relations instead of a foundational definition, since we do not plan to extend the rules for expressions.

Definition 7. The well-shaped judgment M ∈ shape(Γ) for memories is defined as: for any x ∈ σ τ in Γ, there exists some v such that M(x) = v and v ∈ τ.

Lemma 4 (Termination for expressions).
For an expression e, given the termination judgment Γ ⊢ e term, for any program state M ∈ shape(Γ), evaluating [[e]] M results in some value v.

Definition 8 (Termination for commands).
For a command c, the judgment Γ ⊢ c term is defined as: given any program state M ∈ shape(Γ), evaluating [[c]] M results in a proper distribution.

In order to prove the soundness of the termination rules for commands, we need a "preservation" lemma for well-shaped commands.

Lemma 5 (Preservation). If M ∈ shape(Γ) and c is well-shaped according to Γ, then for any M′ ∈ supp([[c]] M), the program state M′ ∈ shape(Γ).
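Definition 7 can be modeled concretely for finite memories. In the sketch below, a memory M is a dict and a context Γ maps each variable to a type-membership predicate; this encoding is ours for illustration, not Fuzzi's internal representation.

```python
def well_shaped(memory, gamma):
    # M ∈ shape(Γ): every variable x declared in Γ is bound in M to some
    # value v with v ∈ τ, where membership in τ is tested by a predicate.
    return all(x in memory and in_tau(memory[x])
               for x, in_tau in gamma.items())

gamma = {
    "x": lambda v: isinstance(v, int),    # x : int
    "xs": lambda v: isinstance(v, list),  # xs : array
}
assert well_shaped({"x": 3, "xs": [1, 2]}, gamma)
assert not well_shaped({"x": "oops", "xs": []}, gamma)  # wrong type for x
assert not well_shaped({"xs": []}, gamma)               # x unbound
```

Lemma 5 then says that running a well-shaped command from a memory satisfying well_shaped yields only memories that still satisfy it.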
Skip
  ────────────────────────────────
  ⊢ skip ∼_(0,0) skip : Φ ⇒ Φ

Assn
  ────────────────────────────────
  ⊢ x⟨1⟩ = e⟨1⟩ ∼_(0,0) x⟨2⟩ = e⟨2⟩ : Φ[e⟨1⟩, e⟨2⟩ / x⟨1⟩, x⟨2⟩] ⇒ Φ

Lap
  Φ ≜ |e⟨1⟩ − e⟨2⟩| ≤ k
  ────────────────────────────────
  ⊢ x⟨1⟩ = L_{1/ϵ}(e⟨1⟩) ∼_(kϵ,0) x⟨2⟩ = L_{1/ϵ}(e⟨2⟩) : Φ ⇒ x⟨1⟩ = x⟨2⟩

Seq
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ        ⊢ c₁′ ∼_(ϵ′,δ′) c₂′ : Ψ ⇒ Θ
  ────────────────────────────────
  ⊢ c₁; c₁′ ∼_(ϵ+ϵ′,δ+δ′) c₂; c₂′ : Φ ⇒ Θ

Cond
  ⊨ Φ ⇒ e⟨1⟩ = e⟨2⟩        ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ∧ e⟨1⟩ ⇒ Ψ        ⊢ c₁′ ∼_(ϵ,δ) c₂′ : Φ ∧ ¬e⟨1⟩ ⇒ Ψ
  ────────────────────────────────
  ⊢ if e then c₁ else c₁′ end ∼_(ϵ,δ) if e then c₂ else c₂′ end : Φ ⇒ Ψ

While*
  ⊨ Φ ⇒ e⟨1⟩ = e⟨2⟩        ⊢ c₁ ∼_(0,0) c₂ : Φ ∧ e⟨1⟩ ⇒ Φ
  ────────────────────────────────
  ⊢ while e do c₁ end ∼_(0,0) while e do c₂ end : Φ ⇒ Φ ∧ ¬e⟨1⟩

Conseq
  ⊢ c₁ ∼_(ϵ′,δ′) c₂ : Φ′ ⇒ Ψ′        ⊨ Φ ⇒ Φ′        ⊨ Ψ′ ⇒ Ψ        ⊨ ϵ′ ≤ ϵ        ⊨ δ′ ≤ δ
  ────────────────────────────────
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ

Eqiv
  ⊢ c₁′ ∼_(ϵ,δ) c₂′ : Φ ⇒ Ψ        c₁ ≡ c₁′        c₂ ≡ c₂′
  ────────────────────────────────
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ

Frame
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ        fvs Θ ∩ mvs(c₁, c₂) = ∅
  ────────────────────────────────
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ∧ Θ ⇒ Ψ ∧ Θ

Fig. 12. Proof rules for apRHL (Structural Rules)
Lemma 5 tells us that well-shaped commands do not change the data types of variables. In the sequence case, we know that both commands c₁ and c₂ terminate if executed under program states in shape(Γ). With the preservation property, we know all variables in the program state resulting from c₁ still have the same data types, so executing c₂ in that state still terminates.

For extensions, if an extension takes a command as an argument, then the termination of the expanded program relies on the termination of that argument. In the case of bag map and vector map, the while loop executes the supplied map body c for a finite number of iterations: the length of the input bag or vector. Array accesses within the while loops are safe because the index is bounded by the array length. So, requiring c to terminate makes the entire loop terminate.

For partition, the first while loop terminates because it is bounded by the value of the supplied nParts expression. The following bag map terminates by the same argument as above. The next while loop also terminates, because it is bounded by the number of entries in out_idx, and the index variable i stays in range; the modification to t_part also uses an index expression that is in range.
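The bounded-iteration argument for bag map and vector map can be seen in a small model of the expanded loop. The names here are illustrative, and the map body is modeled as a total Python function, matching its term judgment.

```python
def bmap_model(in_bag, body):
    # Model of the expanded bmap loop: i runs from 0 up to length(in), and
    # every array access uses the in-range index i, so the loop terminates
    # whenever `body` does.
    out = [None] * len(in_bag)
    i = 0
    while i < len(in_bag):        # bounded by the input bag's length
        out[i] = body(in_bag[i])  # t_in := in[i]; c; out[i] := t_out
        i += 1
    return out

assert bmap_model([1, 2, 3], lambda x: 2 * x) == [2, 4, 6]
assert bmap_model([], lambda x: x) == []
```

The same pattern covers vector map; partition only adds further loops that are bounded by nParts and by the length of out_idx.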
While-L
  ⊢ c₁ ∼_(0,0) skip : Φ ∧ e⟨1⟩ ⇒ Φ        ⊨ Φ ⇒ Φ′⟨1⟩        Φ′⟨1⟩ ⊨ while e do c₁ end lossless
  ────────────────────────────────
  ⊢ while e do c₁ end ∼_(0,0) skip : Φ ⇒ Φ ∧ ¬e⟨1⟩

While-R
  ⊢ skip ∼_(0,0) c₂ : Φ ∧ e⟨2⟩ ⇒ Φ        ⊨ Φ ⇒ Φ′⟨2⟩        Φ′⟨2⟩ ⊨ while e do c₂ end lossless
  ────────────────────────────────
  ⊢ skip ∼_(0,0) while e do c₂ end : Φ ⇒ Φ ∧ ¬e⟨2⟩

Assn-L
  ────────────────────────────────
  ⊢ x = e ∼_(0,0) skip : Φ[e⟨1⟩/x⟨1⟩] ⇒ Φ

Assn-R
  ────────────────────────────────
  ⊢ skip ∼_(0,0) x = e : Φ[e⟨2⟩/x⟨2⟩] ⇒ Φ

Cond-L
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ∧ e⟨1⟩ ⇒ Ψ        ⊢ c₁′ ∼_(ϵ,δ) c₂ : Φ ∧ ¬e⟨1⟩ ⇒ Ψ
  ────────────────────────────────
  ⊢ if e then c₁ else c₁′ end ∼_(ϵ,δ) c₂ : Φ ⇒ Ψ

Cond-R
  ⊢ c₁ ∼_(ϵ,δ) c₂ : Φ ∧ e⟨2⟩ ⇒ Ψ        ⊢ c₁ ∼_(ϵ,δ) c₂′ : Φ ∧ ¬e⟨2⟩ ⇒ Ψ
  ────────────────────────────────
  ⊢ c₁ ∼_(ϵ,δ) if e then c₂ else c₂′ end : Φ ⇒ Ψ

Fig. 13. Proof rules for apRHL (One-sided Rules)

x ∈ Γ
  ────────────────────────────────
  Γ ⊢ x term

lit ∈ int ∨ lit ∈ real ∨ lit ∈ bool
  ────────────────────────────────
  Γ ⊢ lit term

Γ ⊢ e₁ term        Γ ⊢ e₂ term
  ────────────────────────────────
  Γ ⊢ e₁ op e₂ term

Γ ⊢ e term
  ────────────────────────────────
  Γ ⊢ e.length term

  ────────────────────────────────
  Γ ⊢ skip term

Γ ⊢ e term
  ────────────────────────────────
  Γ ⊢ x = e term

Γ ⊢ c₁ term        Γ ⊢ c₂ term
  ────────────────────────────────
  Γ ⊢ c₁; c₂ term

Γ ⊢ e term        Γ ⊢ c₁ term        Γ ⊢ c₂ term
  ────────────────────────────────
  Γ ⊢ if e then c₁ else c₂ end term

Γ ⊢ c term
  ────────────────────────────────
  Γ ⊢ bmap(in, out, t_in, i, t_out, c) term

Γ ⊢ c term
  ────────────────────────────────
  Γ ⊢ vmap(in, out, t_in, i, t_out, c) term

Γ ⊢ c term        Γ ⊢ nParts term
  ────────────────────────────────
  Γ ⊢ partition(in, out, t_in, i, t_out, t_idx, out_idx, t_part, nParts, c) term

  ────────────────────────────────
  Γ ⊢ bsum(in, out, i, t_in, bound) term

Fig. 14. Rules for term
The bag sum extension terminates since its loop is bounded by the input bag's size, and the index variable i is in range when accessing in.

C.2 Justifying linear rules
We discuss the first four linear rules here and defer the linear rules for extensions to Appendix D.1, because the linear property of extensions is intimately tied to the proofs of their sensitivity properties.
{Γ} skip {Γ, (0, 0)} linear

Γ ⊢ e ∈ σ τ
  ────────────────────────────────
  {Γ} x = e {Γ[x ↦ σ], (0, 0)} linear

{Γ} c₁ {Γ₁, (0, 0)} linear        {Γ₁} c₂ {Γ₂, (0, 0)} linear
  ────────────────────────────────
  {Γ} c₁; c₂ {Γ₂, (0, 0)} linear

{Γ} c₁ {Γ₁, (0, 0)} linear        {Γ} c₂ {Γ₂, (0, 0)} linear        Γ ⊢ e ∈ 0 bool
  ────────────────────────────────
  {Γ} if e then c₁ else c₂ end {max(Γ₁, Γ₂), (0, 0)} linear

Termination, Deterministic: Γ ⊢ c term        determ c
Should Not Modify: t_in, in, out, i ∉ mvs c
Abbreviations: σ = [mvs c, i, in, out ↦ ∞]        σ′ = [mvs c, i, t_in, t_out ↦ ∞]
Dependency: {stretch Γ [t_in ↦ 1] σ} c {Γ₁, (0, 0)}        {stretch Γ [t_in ↦ ∞] σ} c {Γ₂, (0, 0)}        {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Output Sensitivity: Γ₁(t_out) = 1        Γ_out = Γ[out ↦ Γ(in)] σ′
  ────────────────────────────────
  {Γ} bmap(in, out, t_in, i, t_out, c) {Γ_out, (0, 0)} linear

Termination, Deterministic: Γ ⊢ c term        determ c
Should Not Modify: t_in, in, out, i ∉ mvs c
Abbreviations: σ = [mvs c, i, in, out ↦ ∞]        σ′ = [mvs c, i, t_in, t_out ↦ ∞]
Dependency: {stretch Γ [t_in ↦ 1] σ} c {Γ₁, (0, 0)}        {stretch Γ [t_in ↦ ∞] σ} c {Γ₂, (0, 0)}        {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Linear: {Γ[t_in ↦ 1] σ} c {Γ₁, (0, 0)} linear
Output Sensitivity: Γ_out = Γ[out ↦ Γ(in) · Γ₁(t_out)] σ′
  ────────────────────────────────
  {Γ} vmap(in, out, t_in, i, t_out, c) {Γ_out, (0, 0)} linear

Fig. 15. Rules for linear (Part 1)
Recall Definition 6: a deterministic and terminating command c is linear with respect to Γ and Γ₁ if, for any k ≥ 0, the scaled typing judgment {kΓ} c {kΓ₁, (0, 0)} is true.

The rule for skip holds because, for any fixed k, we can show {kΓ} skip {kΓ, (0, 0)} by reusing the core typing rule for skip.

For assignment, we first prove a lemma showing that the typing relation on expressions is linear.

Lemma 6. Given Γ ⊢ e ∈ σ τ and any k > 0, the judgment kΓ ⊢ e ∈ kσ τ is also true.

Now, if we scale the pre-condition Γ by k, we know e has sensitivity kσ under kΓ. Applying the core assignment rule concludes the proof.

For sequences of commands, the two commands to be sequenced are linear with respect to Γ, Γ₁ and Γ₁, Γ₂, respectively. We know that for any k > 0, both {kΓ} c₁ {kΓ₁, (0, 0)} and {kΓ₁} c₂ {kΓ₂, (0, 0)} hold. Applying the core sequence rule concludes the proof.

In the case of conditional commands, we need another lemma that allows us to weaken the post-condition in a linear typing judgment.

Lemma 7. Given {Γ} c {Γ₁, (0, 0)} linear and Γ₁ ≤ Γ₂, then {Γ} c {Γ₂, (0, 0)} linear also holds.

Proof. We need to show that for any k > 0, the typing judgment {kΓ} c {kΓ₂, (0, 0)} is true.
Termination, Deterministic: Γ ⊢ c term        determ c
Should Not Modify: t_in, in, out, i, out_idx ∉ mvs c
Abbreviations: σ = [mvs c, i, in, out_idx ↦ ∞]        σ′ = [mvs c, i, t_in, t_idx, out_idx, t_part ↦ ∞]
Number of Partitions Non-Sensitive: Γ ⊢ nParts ∈ 0 int        fvs nParts ∩ mvs c = ∅        i, t_in, t_idx, out_idx, t_part ∉ fvs nParts
Dependency: {stretch Γ [t_in ↦ 1] σ} c {Γ₁, (0, 0)}        {stretch Γ [t_in ↦ ∞] σ} c {Γ₂, (0, 0)}        {x | x ∈ mvs c ∧ Γ₂(x) > 0} ⊆ {t_out}
Output Sensitivity: Γ₁(t_out) = 1        Γ_out = Γ[out ↦ Γ(in)] σ′
  ────────────────────────────────
  {Γ} partition(in, out, t_in, i, t_out, t_idx, out_idx, t_part, nParts, c) {Γ_out, (0, 0)} linear

literal bound        bound ≥ 0        ϕ = Γ(in)        Γ_out = Γ[out ↦ ϕ · bound][i, t_in ↦ ∞]
  ────────────────────────────────
  {Γ} bsum(in, out, i, t_in, bound) {Γ_out, (0, 0)}

Fig. 16. Rules for linear (Part 2)
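The bound ϕ · bound in the bsum rule can be illustrated with a small model. We assume here, for illustration, that bsum clips each element into [-bound, bound] before summing; that is what makes each element's contribution, and hence the per-element sensitivity of the output, at most bound. The model and its names are ours, not the expansion Fuzzi generates.

```python
def bsum_model(in_bag, bound):
    # Sum a bag, clipping each element into [-bound, bound]; adding or
    # removing one element then changes the result by at most `bound`.
    return sum(max(-bound, min(bound, v)) for v in in_bag)

assert bsum_model([1, 2, 10], 5) == 8  # 10 is clipped to 5
# Neighboring bags (one element removed) differ by at most bound:
assert abs(bsum_model([1, 2, 10], 5) - bsum_model([1, 2], 5)) <= 5
```

So a bag of sensitivity ϕ (up to ϕ elements may differ) yields an output of sensitivity at most ϕ · bound, matching Γ_out in the rule.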
From the premise, we know that {kΓ} c {kΓ₁, (0, 0)} is true. Since Γ₁ ≤ Γ₂, scaling both by k preserves the pointwise order kΓ₁ ≤ kΓ₂. We are only weakening the post-condition here, so applying the Conseq rule from apRHL concludes the proof. □

Using Lemma 7, we know both branches are linear with respect to Γ and max(Γ₁, Γ₂). Then, since e is 0-sensitive under Γ, applying the Cond rule from apRHL concludes this case.

D SOUNDNESS PROOFS FOR EXTENSIONS

D.1 Bag-Map
To prove that Bag-Map's typing rule is sound, we need to show that the apRHL judgment corresponding to its conclusion is true. This apRHL judgment relates two instances of the bag map program:

bmap(in, out, t_in, i, t_out, c) ∼ bmap(in, out, t_in, i, t_out, c)

Our general strategy for proving the soundness of these extensions is to first prove that some specification f in the logic L describes the computation of the expanded extension code, and then separately reason about the sensitivity of the output from the specification f.

Our first step is to apply the Eqiv rule and rewrite the pair of related programs with an extra skip:

bmap(in, out, t_in, i, t_out, c); skip ∼ skip; bmap(in, out, t_in, i, t_out, c)

We then apply the Seq rule followed by the While-L and While-R rules to perform one-sided reasoning. Since the two one-sided cases are symmetric, we only discuss the case for bmap(in, out, t_in, i, t_out, c) ∼ skip.
Since our goal is to give a specification of bag map, a natural choice is to model bag map with the map operator over lists, which applies a function to each value in a bag and returns a new bag. However, we need to find a function f that adequately describes the semantics of c as an argument to map.

We know, from the typing rule of bmap, that c is a deterministic and terminating command. Here, we state an important lemma about deterministic and terminating commands that will help us find such a function f.

Lemma 8 (Semantics of Deterministic Terminating Programs). Given determ c and Γ ⊢ c term, there exists a total function f_[[c]] : M → M such that [[c]] M = ret (f_[[c]] M) for any program state M in shape(Γ).

From Lemma 8, we know that the semantics of c can be described by a total function f_[[c]] mapping program states to program states. However, the dependency analysis we performed reveals more about f_[[c]]. First, consider an arbitrary iteration of the bag map loop, and divide the program state right before executing c into four parts: 1) the value of t_in, 2) the values of all modified variables in c, 3) the values of all other non-sensitive variables, and 4) the values of all other sensitive variables. From the dependency analysis, we know c's modified variables have no dependency on 2) and 4). The variables that hold values from 3) are not modified, so their values remain constant throughout the entire loop and are the same in both executions.
Using this information, we can build a function f_[[c]]^spec that takes the value of t_in as its sole input but calculates the same program state as f_[[c]]: letting v be the input to f_[[c]]^spec, we first create a fictitious program state M′ by instantiating the variables of 1), 2), and 4) in M′ with well-shaped default values and copying 3) from M into M′. Now, by feeding M′[t_in ↦ v] to f_[[c]] and accessing its value at t_out, we get what c would have computed for t_out.

Knowing these properties of f_[[c]]^spec, we can choose the one-sided invariant as:

out[0 . . . i] = map f_[[c]]^spec in[0 . . . i]

where f_[[c]]^spec v = f_[[c]] M′[t_in ↦ v]. The notation v[i . . . j] selects a sub-array from v in the range [i, j); if j ≤ i, it selects an empty sub-array. Thus, at the end of both executions of bag map with c, we know the output bags are computed by mapping the semantic function of c over the values of the input array.

Having established this specification of bag map, we now need to consider its sensitivity properties. We can prove the following lemma for map:

Lemma 9. For any function f : τ → σ and two input lists x₁ and x₂, if the bag distance between x₁ and x₂ is d, then map f x₁ and map f x₂ have bag distance at most d.

Proof. By induction on d. When d =
0, the two bags must be permutations of each other, so the mapped values are also permutations of each other. Thus, the mapped values also have bag distance 0.

In the inductive case, if the two input bags have distance d +
1, without loss of generality, assume x₁ has an element that is missing from x₂. Then f applied to this element is also missing from map f x₂. From the induction hypothesis, we know the mapped values of x₁ without this extra element and those of x₂ have bag distance at most d. Thus, adding the extra element increases the bag distance to at most d + 1. □

We also need to show that the bag map extension is linear with respect to the pre-condition and post-condition in the conclusion of its typing rule. First, we note that our sensitivity analysis of the bag map specification reveals that the input and output bags have the same sensitivity. So, if
the input bag's sensitivity is scaled by k, then the output bag's sensitivity will be scaled by the same k according to this typing rule. Next, we note that all other modified variables have sensitivity ∞, and for any k >
0, multiplying k by ∞ still results in ∞. For the variables not modified by bag map, their sensitivities are unchanged by this typing rule; using the Frame rule from apRHL, we can show their sensitivities are also scaled by k. This accounts for the linear scaling of all the variables in the pre- and post-conditions.

D.2 Vector-Map
We follow the same strategy as in the proof for bag map and first build the function f_[[c]]^spec that characterizes c's behavior using just t_in and the values of non-modified, non-sensitive variables before entering the vector map loop. We also establish the same loop invariant:

out[0 . . . i] = map f_[[c]]^spec in[0 . . . i]

From the premises of the vector map typing rule, we know c is a linear command. So, by Definition 6, we know that if v⟨1⟩ and v⟨2⟩ have distance d, then f_[[c]]^spec v⟨1⟩ and f_[[c]]^spec v⟨2⟩ have distance sd, where s = Γ₁(t_out) and Γ₁ is the typing context specified in the typing rule for vector map. In fact, we give the following definition to characterize functions like f_[[c]]^spec:

Definition 9. Given a function f : τ → σ, if there is a number s ∈ ℝ>0 ∪ {∞} such that for any x₁, x₂ ∈ τ, the distance d_σ(f x₁, f x₂) ≤ s · d_τ(x₁, x₂), then we call f a linear function with scale factor s with respect to the chosen distance functions d_τ and d_σ.

We remark that functions as defined by Definition 9 are also called Lipschitz functions with Lipschitz constant s; however, in the usual mathematical definition of s-Lipschitz functions, the value of s does not include ∞.

We then prove the following lemma for such functions:

Lemma 10. If f : τ → σ is a linear function with scale factor s, then map f is also a linear function with scale factor s, with the distance functions chosen as the array distances on [τ] and [σ].

Proof. We first do a case analysis on whether the two input arrays have the same length.

If they have different lengths, then their distance is ∞, so for any positive s, the scaled distance s · ∞ = ∞ is still infinite, and we are done.

If they have the same length, we proceed by induction on the length of the arrays. When the length is 0, the two arrays have distance 0, and so do the mapped arrays; the inequality 0 ≤ s · 0 = 0 holds. In the inductive case, suppose x₁ and x₂ have length n +
1. Consider the prefix sub-arrays of length n. Let d_prefix be the distance between the two prefix sub-arrays, and let d_last = d_[τ](x₁, x₂) − d_prefix. Let the distance between the mapped prefix sub-arrays be d′_prefix. From the induction hypothesis, we know d′_prefix ≤ s · d_prefix. Now, let d′_last = d_σ(f x₁[n], f x₂[n]). Since f is a linear function with scale factor s, we know d′_last ≤ s · d_last. Combining this inequality with the previous one, we get d′_prefix + d′_last ≤ s(d_prefix + d_last). By the definition of array distances, we know the distance between the mapped arrays is at most the distance between the input arrays scaled by s. □

Applying Lemma 10 together with the loop invariant we established concludes the proof for array map.

The vector map program is also linear with respect to the pre- and post-conditions produced by its sensitivity typing rule. Since the output array has a sensitivity in the post-condition that is a multiple of the input array's sensitivity in the pre-condition, scaling the input array's sensitivity by
some k > 0 also scales the output array's sensitivity in the post-condition by k.

Proc. ACM Program. Lang., Vol. 1, No. 1, Article. Publication date: June 2019.

D.3 Partition
We use the same reasoning from bag map to show that out_idx = map f⟦c⟧spec in — that is, the partition indices are computed by mapping the semantics f⟦c⟧spec over the input bag.

For the second while loop that places each input bag element into the corresponding partition, we will specify it using the combinator foldl over bags:

foldl : (b → a → b) → b → {a} → b

The first argument to foldl will be a function that specifies how to place one element from the input bag into the output partitions. We will use the function place to build this argument:

place : [{τ}] → (int * τ) → [{τ}]
place parts (idx, elmt) =
  match nth idx parts with
    Some part → update (part ++ [elmt]) idx parts
    None → parts

The place function takes the partitions, followed by a pair of a partition index and a bag element, and produces new partitions such that, for pairs whose indices are in range, the bag element is inserted at the end of the indexed partition. Pairs whose indices are out of range are simply ignored.

The specification of the second while loop is then

out[...i] = foldl place empty (zip out_idx[...i] in[...i])

The right-hand side can be expanded into

foldl place empty (zip (map f⟦c⟧spec in[...i]) in[...i])

using the specification established for the first while loop. The empty value is the initial empty partition: an array of empty bag values whose length equals the value of the specified nParts parameter. The zip operator takes two lists and produces a list of pairs.

We prove this specification characterizes the behavior of the second loop using the one-sided apRHL rules.

Next, we need to consider the sensitivity of out using the foldl specification. Our typing rule claims that the distance between out⟨1⟩ and out⟨2⟩ is at most the distance between the input bags in⟨1⟩ and in⟨2⟩.
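The place function and the foldl specification can be mirrored directly in Python (a sketch only; the names partition_spec and n_parts are ours, and Python lists stand in for Fuzzi bags and arrays):

```python
from functools import reduce

def place(parts, idx_elmt):
    # Append elmt to the partition named by idx; pairs with
    # out-of-range indices are ignored, as in the specification.
    idx, elmt = idx_elmt
    if 0 <= idx < len(parts):
        return parts[:idx] + [parts[idx] + [elmt]] + parts[idx + 1:]
    return parts

def partition_spec(f, bag, n_parts):
    # foldl place empty (zip (map f bag) bag)
    empty = [[] for _ in range(n_parts)]
    return reduce(place, zip(map(f, bag), bag), empty)
```

For example, partitioning integers by remainder modulo 3 sends each element to the partition named by f, while an f that always produces an out-of-range index leaves every partition empty.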
Let d be the bag distance between in⟨1⟩ and in⟨2⟩. But first, we will need to establish a few more properties about values in Fuzzi.

Definition 10 (Equivalence). Two values v₁ and v₂ of type τ are equivalent if their distance d_τ(v₁, v₂) is 0.

Lemma 11.
The equivalence definition is a proper equivalence relation: it is symmetric, reflexiveand transitive.
Proof. By induction on the type. □

Lemma 12.
Given two values x₁ and x₁′ of type τ that are equivalent, for any x₂ of the same type, d_τ(x₁, x₂) = d_τ(x₁′, x₂).

Proof. By induction on the type. □
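For bags, Lemma 12 can be sanity-checked concretely: the bag distance is the size of the multiset symmetric difference, so permuting one argument leaves its distance to any third bag unchanged. A sketch (bag_distance is our name, with Python Counters standing in for Fuzzi bags):

```python
from collections import Counter

def bag_distance(b1, b2):
    # |b1 \ b2| + |b2 \ b1| on multisets; equivalent bags
    # (permutations of each other) are exactly those at distance 0.
    c1, c2 = Counter(b1), Counter(b2)
    return sum(((c1 - c2) + (c2 - c1)).values())
```

Here [1, 2, 3] and [3, 1, 2] are at distance 0, and both are at the same distance from any third bag, as the lemma requires.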
For bags, two bags are equivalent if they are permutations of each other. For arrays, two arrays are equivalent if they have the same length and each pair of elements in corresponding positions is equivalent. So, two arrays of bags are equivalent if the two arrays have the same length and the pair of bags at every position are permutations of each other.

Let us establish a notation for writing down permutations. We will use a sequence of integers to express a permutation. For example, given the bag {a, b, c}, an equivalent bag under the permutation 312 is {c, a, b}.

Next, we show that the partition specification respects the equivalence relation.

Lemma 13. Given two bags x₁ and x₂ and any total function f, if x₁ and x₂ are equivalent, then foldl place empty (zip (map f x₁) x₁) and foldl place empty (zip (map f x₂) x₂) are also equivalent.

Proof. Let σ be the permutation such that σ(x₁) = x₂. We call i, j an inversion in σ if i appears before j, but i > j. We proceed by induction on the number of inversions in σ.

In the base case, σ is the identity, so x₁ = x₂, and we are done.

In the inductive case, let there be k + 1 inversions in σ. There must be an adjacent inversion in σ. An adjacent inversion is a length-2 subsequence ij in σ such that i appears immediately before j, but i > j. If there were no such adjacent inversion, then σ would be the identity, contradicting the assumption that the number of inversions is k + 1. We build a new permutation σ′ by swapping i and j; the new permutation σ′ must have k inversions. Consider the following illustration:

σ′ = σ₁ σ₂ . . . j i . . . σₙ

For all numbers σ₁ up to σ_k right before i, their relative position did not change with respect to i or j, so we did not introduce nor eliminate inversions by swapping i and j.
The same argument goes for all numbers after j. Since ij itself is an inversion, the total number of inversions must have decreased by 1. So, by the induction hypothesis, the outputs from folding x₁ and σ′(x₁) must be equivalent.

Now, we just need to show the outputs from folding σ′(x₁) and x₂ are equivalent as well, and then we are done. Recall x₂ = σ(x₁). The only difference between these two bags is that x₁[i] and x₁[j] appear in swapped orders in σ′(x₁) and σ(x₁).

Let us do a case analysis on whether f x₁[i] = f x₁[j]. Let the partition indices computed by applying f to x₁[i] and x₁[j] be m and n.

If m ≠ n, then the outputs from foldl will in fact be identical. This is because x₁[i] is placed into the m-th output bag and x₁[j] is placed into the n-th output bag, but the order in which they are processed has not changed with respect to the other elements in their respective output bags. So the output remains identical.

If m = n, then only the m-th output bag is affected. In that output bag, the values x₁[i] and x₁[j] are swapped, since the order in which they are processed is changed. But this just permutes the m-th output bag. So, the output arrays from folding σ′(x₁) and x₂ remain equivalent. □
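Lemma 13 can be checked on a concrete instance: folding a bag and a permutation of it produces partitions that are pointwise permutations of each other (a sketch; the helper names are ours):

```python
from collections import Counter
from functools import reduce

def place(parts, idx_elmt):
    # As in the partition specification: append elmt to partition idx,
    # ignoring out-of-range indices.
    idx, elmt = idx_elmt
    if 0 <= idx < len(parts):
        return parts[:idx] + [parts[idx] + [elmt]] + parts[idx + 1:]
    return parts

def fold_partitions(f, bag, n_parts):
    empty = [[] for _ in range(n_parts)]
    return reduce(place, zip(map(f, bag), bag), empty)

def equivalent(parts1, parts2):
    # Arrays of bags are equivalent when the bags at each index
    # are permutations of each other.
    return len(parts1) == len(parts2) and all(
        Counter(p1) == Counter(p2) for p1, p2 in zip(parts1, parts2))
```

Partitioning [5, 1, 4, 2, 8] and the permuted bag [8, 2, 5, 4, 1] by parity yields output arrays that differ in element order within each partition but are equivalent in the sense of Definition 10.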
By Lemma 12 and Lemma 13, we know that for any two inputs x₁ and x₂, permuting them does not change the distance between their outputs under the specification of partition. Knowing this, we are ready to prove the sensitivity condition for partition.

Lemma 14. Given two bags x₁ and x₂ with distance d and a total function f, the distance between foldl place empty (zip (map f x₁) x₁) and foldl place empty (zip (map f x₂) x₂) is also d.

Proof. Since we know permuting x₁ and x₂ does not change the distance between the outputs, without loss of generality we can assume that x₁ and x₂ are arranged such that x₁ = p x₁′ and x₂ = p x₂′, where p is a common prefix, x₁′ is the multiset difference between x₁ and x₂, and similarly x₂′ is the multiset difference between x₂ and x₁. This implies d = |x₁′| + |x₂′|.

We proceed by induction on d. In the base case, we know x₁ = x₂ = p, and we are done. In the inductive case, assume the bag distance is d +
1, and x₁ = p x₁′ v and x₂ = p x₂′, where v is the last element in the multiset difference between x₁ and x₂. Since d + 1 = |x₁′ v| + |x₂′|, we have d = |x₁′| + |x₂′|. By the induction hypothesis, we know the distance between folding p x₁′ and x₂ is d. We just need to consider how adding the last value v to x₁ changes the distance.

Let i = f v. So v will be placed at the end of the i-th output bag. Since we are only adding v to the output from p x₁′ and nothing is added to the output from x₂, the distance must increase by 1. This concludes the proof. □

D.4 Bag-Sum
We again deploy the same strategy used so far. We can establish a specification that models bag sum with the one-sided loop invariant

out = foldl (λ sum v. sum + clip bound v) 0 in[...i]

where clip returns a value whose magnitude is within the bound set by its first argument.

We also show that this specification respects the equivalence relation on bags.

Lemma 15. Given two equivalent bags x₁ and x₂, the values foldl (λ sum v. sum + clip bound v) 0 x₁ and foldl (λ sum v. sum + clip bound v) 0 x₂ are the same.

Proof. The proof again proceeds by induction on the number of inversions in the permutation σ where x₂ = σ(x₁). In the inductive case, we apply commutativity of addition to conclude the proof. □

So, by Lemma 12 and Lemma 15, we can again permute the inputs without changing the distance between the outputs. Finally, to reason about the sensitivity of the specification, we consider the following lemma.

Lemma 16.
Given two bags x₁ and x₂ of distance d and a non-negative real number bound, the distance between foldl (λ sum v. sum + clip bound v) 0 x₁ and foldl (λ sum v. sum + clip bound v) 0 x₂ is at most d · bound.

Proof. Without loss of generality, we can again assume x₁ = p x₁′ and x₂ = p x₂′, just like we did for partition. We know d = |x₁′| + |x₂′|. We proceed by induction on d. The base case follows directly from x₁ = x₂ = p.
In the inductive case, assume the distance between x₁ and x₂ is d + 1 and x₁ = p x₁′ v, where v is the last element of x₁. Since d = |x₁′| + |x₂′|, by the induction hypothesis we know the distance between bag summing p x₁′ and x₂ is at most d · bound. The bag sum of x₁ simply adds clip bound v to the bag sum of p x₁′. But the absolute value of clip bound v is at most bound. So, the distance between the bag sums of x₁ and x₂ is at most (d + 1) · bound. This concludes the proof. □

The bag sum program is also linear with respect to the typing contexts admitted by its sensitivity rules. Scaling the pre-condition by k > 0 scales the sensitivity of out by k in the post-condition. The other modified variables have ∞-sensitivity, and the scaled post-condition also has their sensitivity as ∞.

D.5 Advanced Composition
The advanced composition rule is a straightforward application of the apRHL advanced compositionrule.
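For reference, the standard advanced composition theorem of Dwork et al. states that k runs of an (ε, δ)-differentially private mechanism are (ε′, kδ + δ′)-differentially private for any δ′ > 0, with ε′ = √(2k ln(1/δ′)) · ε + kε(e^ε − 1). A sketch of the bookkeeping (the function name is ours, and apRHL's exact rule statement may differ in constants):

```python
import math

def advanced_composition(eps, delta, k, delta_prime):
    # Total privacy cost of k-fold composition under the
    # standard advanced composition bound.
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_prime
```

For small per-iteration ε and large k, this bound is substantially tighter than the k · ε of basic sequential composition, which is why the gradient descent example below invokes ac rather than paying the linear cost.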
E FUZZI IMPLEMENTATION OF DIFFERENTIALLY PRIVATE GRADIENT DESCENT
We show the full implementation of differentially private gradient descent for logistic regression (as discussed in Section 7.1) here. The code shown here comprises three parts: (1) a bmap application that preprocesses the input data, (2) a second bmap application that computes the private gradients, and (3) a final step that releases noised gradients and updates the model parameters. The code also uses a special extension called repeat. This extension takes a loop index variable, a constant literal integer, and a Fuzzi command as parameters, and expands to a while loop that executes the command the specified number of times. The typing rule for this extension simply unrolls the loop the specified number of times but performs no special deduction on the sensitivities and privacy cost of the entire loop. We elided this extension from the main body of the paper because it only provides a better programming experience (one could simply copy the loop body the specified number of times to reach the same result) but does not provide additional insight into Fuzzi's design.
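One plausible reading of the unrolling described above can be sketched as a source-to-source elaboration (this is our illustration, not Fuzzi's actual implementation; the index variable is set before each copy of the body):

```python
def expand_repeat(j, n, body_lines):
    # Sketch of elaborating repeat(j, n, c): emit the body n times,
    # setting the index variable j before each copy.
    out = []
    for i in range(n):
        out.append(f"{j} = {i};")
        out.extend(body_lines)
    return out
```

Applied to repeat(j, 2, x[j] = y[j];), this produces the four commands j = 0; x[j] = y[j]; j = 1; x[j] = y[j]; which is why the extension adds convenience but no new typing power.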
lamb = 0.1;
rate = 0.1;
epoch = 0;
size $= lap(10.0, fc(length(db)));
/* use advanced composition for 100 total passes */
ac(epoch, 100, 1.0e-6,
  /* extend each row to account for bias */
  bmap(db, db1, trow, i, trow1,
    trow1 = zero_786;
    trow1[0] = 1.0;
    repeat(j, 785, trow1[j+1] = trow[j];);
    j = 0;
  );
  /* compute the gradient for each row */
  i = 0;
  trow1 = zero_786;
  bmap(db1, dws, trow1, i, twout,
    twout = zero_785;
    repeat(j, 785, twout[j] = trow1[j];);
    j = 0;
    dt = clip(dot(twout, w), 100.0);
    temp = exp(-1.0 * trow1[785] * dt);
    prob = 1.0 / (1.0 + temp);
    sc = (1.0 - prob) * trow1[785];
    twout = scale(sc, twout);
    dt = 0.0;
    temp = 0.0;
    prob = 0.0;
    sc = 0.0;
  );
  /* compute noised gradient and update model parameter */
  repeat(j, 785,
    i = 0; twout = zero_785; tf_out = 0.0;
    bmap(dws, dws_j, twout, i, tf_out, tf_out = twout[j];);
    i = 0; tf_out = 0.0;
    bsum(dws_j, j_sum, i, tf_out, 1.0);
    j_sum $= lap(5000.0, j_sum);
    w[j] = w[j] + (j_sum / size - 2.0 * lamb * w[j]) * rate;
  );
  /* clear aux variables */
  db1 = {}; dt = 0.0;
  dws = {}; dws_j = {};
  i = 0; j = 0;
  prob = 0.0; sc = 0.0; temp = 0.0; tf_out = 0.0;
  trow = zero_785; trow1 = zero_786; twout = zero_785;
);
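To make the per-row gradient step concrete, here is a plain Python mirror of the body of the second bmap (a sketch under our reading of the Fuzzi code: the last slot of each extended row holds the label in {−1, +1}, and clip, dot, and scale have their obvious meanings):

```python
import math

def clip(v, bound):
    # Clamp v into [-bound, bound], mirroring Fuzzi's clip.
    return max(-bound, min(bound, v))

def row_gradient(features, label, w):
    # Mirrors: dt = clip(dot(twout, w), 100.0);
    #          temp = exp(-label * dt); prob = 1 / (1 + temp);
    #          sc = (1 - prob) * label; twout = scale(sc, twout);
    dt = clip(sum(x * wi for x, wi in zip(features, w)), 100.0)
    prob = 1.0 / (1.0 + math.exp(-1.0 * label * dt))
    sc = (1.0 - prob) * label
    return [sc * x for x in features]
```

The clip on the dot product is what bounds the magnitude of each per-row gradient, which in turn is what lets the bsum step assign a finite sensitivity to the summed gradient before the Laplace noise is added.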