Approximate Span Liftings:
Compositional Semantics for Relaxations of Differential Privacy

TETSUYA SATO, University at Buffalo, SUNY, USA
GILLES BARTHE, IMDEA Software Institute, Spain
MARCO GABOARDI, University at Buffalo, SUNY, USA
JUSTIN HSU, Cornell University, USA
SHIN-YA KATSUMATA, National Institute of Informatics, Japan

We develop new abstractions for reasoning about relaxations of differential privacy: Rényi differential privacy, zero-concentrated differential privacy, and truncated concentrated differential privacy, which express different bounds on statistical divergences between two output probability distributions. In order to reason about such properties compositionally, we introduce approximate span-lifting, a novel construction extending the approximate relational lifting approaches previously developed for standard differential privacy to a more general class of divergences, and also to continuous distributions. As an application, we develop a program logic based on approximate span-liftings capable of proving relaxations of differential privacy and other statistical divergence properties.

ACM Reference Format:
Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata. 2018. Approximate Span Liftings: Compositional Semantics for Relaxations of Differential Privacy. Proc. ACM Program. Lang. 1, CONF, Article 1 (January 2018), 42 pages.
Differential privacy [Dwork et al. 2006] is a strong, statistical notion of data privacy that has attracted the attention of theoreticians and practitioners alike. One reason for its success is that differential privacy can often be proved compositionally, enabling easy construction of new private algorithms and making formal verification practical. By now, researchers have developed a wide variety of programming languages and program analysis tools to prove differential privacy [Albarghouthi and Hsu 2018; Barthe et al. 2015, 2013; Gaboardi et al. 2013; McSherry 2009; Reed and Pierce 2010; Winograd-Cort et al. 2017; Zhang and Kifer 2017] (Barthe et al. [2016c] provide a recent survey). Seeking more refined composition properties, researchers have recently proposed new relaxations of differential privacy: Rényi differential privacy (RDP) [Mironov 2017], zero-concentrated differential privacy (zCDP) [Bun and Steinke 2016], and truncated concentrated differential privacy (tCDP) [Bun et al. 2018]. Roughly speaking, standard differential privacy requires a bound on the magnitude of a random variable measuring the privacy loss, while RDP, zCDP, and tCDP model finer bounds on the moments of this random variable. (Recall that the first moment of a random variable is its average value, and the second central moment is its variance.) These relaxations capture fine-grained aspects of the privacy loss, enabling more precise privacy analyses and allowing algorithms to add less random noise to achieve the same privacy level.
Each of RDP, zCDP, and tCDP is defined in terms of
Rényi divergences [Renyi 1961], sophisticated distances on distributions originating from information theory. Inspiring our work, Barthe and Olmedo previously developed abstractions for reasoning about a family of divergences called f-divergences as part of their work on the program logic fpRHL [Barthe and Olmedo 2013; Olmedo 2014]. In particular, the semantic foundation of fpRHL is a 2-witness relational lifting for f-divergences, which tracks the f-divergence between related pairs of distributions. However, this framework is not sufficient to establish our target properties, for two reasons. First, Rényi divergences are not f-divergences, while zCDP and tCDP are properly described as supremums of Rényi divergences, rather than single divergences. (For instance, all f-divergences are jointly convex while Rényi divergences are only quasi-convex [Van Erven and Harremoës 2014].) As a result, these relaxations of differential privacy cannot be described in terms of f-divergences, nor captured in fpRHL. Accordingly, we develop new relational liftings supporting significantly more general divergences, allowing direct reasoning about RDP, zCDP, and tCDP.

A further challenge is that 2-witness relational liftings to date have only been proposed for discrete distributions, while many algorithms satisfying relaxations of differential privacy (indeed, the motivating examples of such algorithms) sample from continuous distributions, such as the Gaussian distribution. Handling these distributions requires a careful treatment of measure theory. Sato [2016] previously considered a different semantic model for standard differential privacy over continuous distributions using witness-free relational lifting based on a categorical construction called codensity lifting [Katsumata and Sato 2015], but it is not clear how to handle more general divergences with this method.

To overcome these difficulties, we generalize 2-witness liftings in two directions. First, we replace the notion of f-divergence with a more general class of divergences, identifying the basic properties needed for compositional reasoning. Second, we generalize these liftings to reason about continuous probability measures. The main challenge is establishing a sequential composition principle: the continuous case introduces further measurability requirements for composition. Accordingly, we extend the structure of 2-witness liftings to a new notion called approximate span-liftings, which carry the necessary data to ensure closure under sequential composition. Finally, we specialize our general model to Rényi divergence and to the divergences for zCDP and tCDP, establishing the categorical properties needed to build approximate span-liftings. As an extended application, we develop a relational program logic that can verify differential privacy, RDP, zCDP, and tCDP within a single logic, for programs using discrete or continuous sampling, and interpret the logic via approximate span-liftings.

After motivating the various relaxations of differential privacy and presenting the key technical challenges (Section 2), and introducing mathematical preliminaries (Section 3), we present our main contributions.

• We identify a general class of divergences supporting basic composition properties, and we show that our class can model RDP, zCDP, and tCDP (Section 4).
• We extend 2-witness relational liftings to the continuous case by introducing a novel notion of approximate span-lifting, and we show how to translate composition properties of specific divergences to their corresponding approximate span-liftings (Section 5).
• We develop a program logic supporting four flavors of differential privacy (standard DP, RDP, zCDP, and tCDP), where programs may use both discrete and continuous random sampling, and show soundness (Section 6). We demonstrate our logic on three examples (Section 7).

We survey related work (Section 8) and then conclude with promising future directions (Section 9).

To better understand the key technical challenges, we first introduce relevant background on privacy, divergences, and existing relational verification techniques. For simplicity, in this section we consider probability distributions which have associated density functions.
We first introduce differential privacy. A randomized algorithm is a measurable function A : X → Prob(Y) from a set X of inputs to the set Prob(Y) of probability distributions on a set Y of outputs.

Definition 2.1 (Differential Privacy (DP) [Dwork et al. 2006]). A randomized algorithm A : X → Prob(Y) is (ε, δ)-differentially private w.r.t. an adjacency relation Φ ⊆ X × X if for any pair of inputs (x, x′) ∈ Φ and any measurable subset S ⊆ Y, we have

    Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x′) ∈ S] + δ.
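As a tiny concrete instance (our example, not from the paper), randomized response over one bit answers truthfully with probability e^ε/(1 + e^ε); Definition 2.1 can then be checked exhaustively on its two output distributions. The code and names below are ours.

```python
import math

def randomized_response(bit: int, eps: float) -> dict:
    """Output distribution over {0, 1}: report the true bit with prob e^eps/(1+e^eps)."""
    p = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p, 1 - bit: 1 - p}

eps = 0.5
mu, nu = randomized_response(0, eps), randomized_response(1, eps)
# Check Definition 2.1 on every subset S of the finite output space, with delta = 0:
for S in [set(), {0}, {1}, {0, 1}]:
    assert sum(mu[y] for y in S) <= math.exp(eps) * sum(nu[y] for y in S)
print("randomized response is (", eps, ", 0)-DP on adjacent inputs 0 and 1")
```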
Definition 2.2 (Rényi divergence [Renyi 1961]). Let α > 1. The Rényi divergence of order α between two probability distributions µ₁ and µ₂ on a measurable space X is defined by:

    D^α_X(µ₁ ∥ µ₂) ≝ (1/(α − 1)) · log ∫_X µ₂(x) · (µ₁(x)/µ₂(x))^α dx.    (1)
Definition 2.3 (Rényi Differential Privacy (RDP) [Mironov 2017]). A randomized algorithm A : X → Prob(Y) is (α, ρ)-Rényi differentially private w.r.t. an adjacency relation Φ ⊆ X × X if for any pair of inputs (x, x′) ∈ Φ, we have D^α_Y(A(x) ∥ A(x′)) ≤ ρ.

Definition 2.4 (Zero-Concentrated Differential Privacy (zCDP) [Bun and Steinke 2016]). A randomized algorithm A : X → Prob(Y) is (ξ, ρ)-zero-concentrated differentially private w.r.t. an adjacency relation Φ ⊆ X × X if for any pair of inputs (x, x′) ∈ Φ, we have

    ∀α > 1. D^α_Y(A(x) ∥ A(x′)) ≤ ξ + αρ.    (2)

Definition 2.5 (Truncated Concentrated Differential Privacy (tCDP) [Bun et al. 2018]). A randomized algorithm A : X → Prob(Y) is (ρ, ω)-truncated concentrated differentially private w.r.t. an adjacency relation Φ ⊆ X × X if for any pair of inputs (x, x′) ∈ Φ, we have

    ∀1 < α < ω. D^α_Y(A(x) ∥ A(x′)) ≤ αρ.    (3)

While these notions may seem cryptic at first sight, they can all be understood as bounds on the privacy loss, defined for any two private inputs x, x′ by

    L_{x→x′}(y) = Pr[A(x) = y] / Pr[A(x′) = y].

Intuitively, the privacy loss measures how much information is revealed when the output of a private algorithm is seen to be y. Output values with a high privacy loss are highly revealing, since they are far more likely to result from the private input x rather than a different private input x′; but if these outputs are only seen with very small probability, then their influence can be discounted. Accordingly, the different privacy definitions bound different functions of the privacy loss, evaluated at an output y drawn from the output distribution of the private algorithm. The following table summarizes these bounds.

    Privacy notion of A | Bound on privacy loss L
    (ε, δ)-DP           | Pr_{y∼A(x)}[L_{x→x′}(y) ≤ e^ε] ≥ 1 − δ
    (α, ρ)-RDP          | E_{y∼A(x)}[L_{x→x′}(y)^{α−1}] ≤ e^{(α−1)ρ}
    (ξ, ρ)-zCDP         | ∀α > 1. E_{y∼A(x)}[L_{x→x′}(y)^{α−1}] ≤ e^{(α−1)(ξ+αρ)}
    (ρ, ω)-tCDP         | ∀1 < α < ω. E_{y∼A(x)}[L_{x→x′}(y)^{α−1}] ≤ e^{(α−1)αρ}
In particular, DP bounds the maximum value of the privacy loss, (α, ·)-RDP bounds the α-th moment, zCDP bounds all moments, and (·, ω)-tCDP bounds the moments up to some cutoff ω. Many conversions are known between these definitions; for instance, RDP, zCDP, and tCDP are known to sit between (ε, 0)- and (ε, δ)-differential privacy in terms of expressivity, up to some modification in the parameters. While this means that RDP, zCDP, and tCDP can sometimes be analyzed by reduction to standard differential privacy, converting between the different notions requires weakening the parameters, and often the privacy analysis is simplest and most precise when working with RDP, zCDP, or tCDP directly. For further details, the interested reader can refer to the original papers [Bun and Steinke 2016; Mironov 2017].

Motivating examples of mechanisms fitting these three definitions are the Gaussian mechanism and the sinh-normal mechanism, which add noise according to a Gaussian distribution and a sinh-normal distribution over the real numbers, respectively. Both distributions are given by continuous density functions.

f-divergences in the Discrete Case

Barthe and Olmedo [2013] observed that standard differential privacy can be phrased in terms of a general class of divergences, called f-divergences.

Definition 2.6. A weight function is a convex function f : R≥0 → R continuous at 0.

Definition 2.7 (f-divergence). For a weight function f, the f-divergence ∆^f between two distributions µ₁, µ₂ over a measurable space X is defined as

    ∆^f_X(µ₁, µ₂) = ∫_X µ₂(x) · f(µ₁(x)/µ₂(x)) dx.    (4)

(As is conventional [Liese and Vajda 2006], we exclude the condition f(1) = 0. We also adopt the conventions 0 · f(a/0) = lim_{t→0+} t · f(a/t) for a > 0, and 0 · f(0/0) = 0.)

In particular, differential privacy can be modeled by the f-divergence ∆^{DP(ε)} with weight function DP(ε)(t) = max(0, t − e^ε) [Barthe and Olmedo 2013; Olmedo 2014]. For any randomized algorithm A : X → Prob(Y) and adjacency relation Φ ⊆ X × X, we have

    A is (ε, δ)-DP iff (for all (x, x′) ∈ Φ, ∆^{DP(ε)}_Y(A(x), A(x′)) ≤ δ).

To verify f-divergence properties of probabilistic programs, Barthe and Olmedo introduced 2-witness relational liftings for f-divergences as a key abstraction. This construction lifts a relation R ⊆ X × Y over discrete sets X, Y to a relation R^♯(f,δ) ⊆ Dist(X) × Dist(Y) over subprobability distributions:

    R^♯(f,δ) = {(µ₁, µ₂) | ∃µ_L, µ_R ∈ Dist(R). π₁(µ_L) = µ₁, π₂(µ_R) = µ₂, ∆^f_R(µ_L, µ_R) ≤ δ}.    (5)

Above, π_i(µ) is the i-th marginal of µ, that is, (π₁(µ))(x) = Σ_{y∈Y} µ(x, y) and (π₂(µ))(y) = Σ_{x∈X} µ(x, y). The distributions µ_L and µ_R are called witness distributions, since to show that two distributions are related by a lifting, one must exhibit two appropriate witnesses. (In order to reason about possibly non-terminating programs, Barthe and Olmedo work with an extension of f-divergences to subprobability distributions.)

Barthe and Olmedo used these relational liftings as the foundation of their relational program logic fpRHL. These liftings have several attractive features. First, they reflect f-divergences:

    Eq_X^♯(f,δ) = {(x, x) | x ∈ X}^♯(f,δ) = {(µ₁, µ₂) | ∆^f_X(µ₁, µ₂) ≤ δ}.

So, they can be used to characterize differential privacy: a program A : X → Dist(Y) is (ε, δ)-differentially private w.r.t. an adjacency relation Φ if (A(x), A(x′)) ∈ Eq_Y^♯(DP(ε),δ) for every (x, x′) ∈ Φ. Second, 2-witness liftings satisfy various composition properties, enabling clean verification of probabilistic programs. However, this construction works only in the discrete case (all subprobability distributions are over countable discrete sets), and the logic fpRHL cannot reason about programs that sample from continuous distributions, like the Gaussian distribution.

Much like standard differential privacy can be viewed in terms of f-divergences, we would like to view RDP, zCDP, and tCDP as bounds on more general divergences. A natural candidate for Rényi differential privacy is the Rényi divergence D^α, as in its original definition. Indeed, we have:

    A is (α, ρ)-RDP iff (for all (x, x′) ∈ Φ, D^α_Y(A(x) ∥ A(x′)) ≤ ρ).
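To make this characterization concrete, the following small numerical check (our illustration, not from the paper; all names are ours) evaluates Definition 2.2 for the Gaussian mechanism A(x) = N(x, σ²) and compares it against the known closed form D^α(N(m₁, σ²) ∥ N(m₂, σ²)) = α(m₁ − m₂)²/(2σ²).

```python
import numpy as np

def renyi_gauss_numeric(m1, m2, sigma, alpha):
    """Evaluate D^alpha(N(m1,s^2) || N(m2,s^2)) by numerical integration of Eq. (1)."""
    xs = np.linspace(min(m1, m2) - 12 * sigma, max(m1, m2) + 12 * sigma, 200001)
    p1 = np.exp(-((xs - m1) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p2 = np.exp(-((xs - m2) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    integrand = p2 * (p1 / p2) ** alpha          # mu2(x) * (mu1(x)/mu2(x))^alpha
    return np.log(np.sum(integrand) * (xs[1] - xs[0])) / (alpha - 1)

sigma, r, alpha = 1.0, 1.0, 4.0
print(renyi_gauss_numeric(0.0, r, sigma, alpha))   # approx. 2.0
print(alpha * r**2 / (2 * sigma**2))               # closed form: 2.0
# So the Gaussian mechanism with |x - x'| <= r is (alpha, alpha*r^2/(2*sigma^2))-RDP.
```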
However, the Rényi divergence D^α(µ₁ ∥ µ₂) of order α is not an f-divergence, and so it does not fit in the 2-witness lifting framework. Likewise, zCDP [Bun and Steinke 2016] and tCDP [Bun et al. 2018] can be defined via uniform bounds on families of Rényi divergences:

    ∆^{zCDP(ξ)}_X(µ₁, µ₂) = sup_{1<α} (1/α) · (D^α_X(µ₁ ∥ µ₂) − ξ)   for 0 ≤ ξ,    (6)
    ∆^{ω-tCDP}_X(µ₁, µ₂) = sup_{1<α<ω} (1/α) · D^α_X(µ₁ ∥ µ₂)        for 1 < ω,    (7)

letting us reformulate zCDP and tCDP as

    A is (ξ, ρ)-zCDP iff (for all (x, x′) ∈ Φ, ∆^{zCDP(ξ)}_Y(A(x), A(x′)) ≤ ρ)
    A is (ρ, ω)-tCDP iff (for all (x, x′) ∈ Φ, ∆^{ω-tCDP}_Y(A(x), A(x′)) ≤ ρ).

These divergences are also not f-divergences. Furthermore, the RDP, zCDP, and tCDP divergences may take negative values when applied to subprobability distributions, which can arise from probabilistic computations that may not terminate with probability 1. Accordingly, we generalize the notion of divergence to go beyond f-divergences and to handle subprobability distributions. Starting from families of real-valued functions on pairs of distributions, we identify the basic properties needed to give good composition properties for the corresponding liftings.

In order to support the natural examples for RDP, zCDP, and tCDP, we need a framework supporting continuous distributions, such as the Gaussian, Laplace, and sinh-normal distributions. Unfortunately, extending 2-witness relational liftings to the continuous case presents further technical challenges related to composition. The relational lifting (−)^♯(DP(ε),δ) for standard differential privacy satisfies a sequential composition principle:

    If (f, g) : R → S^♯(DP(ε₂),δ₂) is a relation-preserving map,
    then (f♯, g♯) : R^♯(DP(ε₁),δ₁) → S^♯(DP(ε₁+ε₂),δ₁+δ₂) is a relation-preserving map.

Here, f♯ and g♯ are the Kleisli liftings of f and g with respect to the monad Dist of (discrete) subprobability distributions; this composition property gives 2-witness relational liftings a graded monad structure [Fujii et al. 2016; Katsumata 2014], highly useful for compositional reasoning. Since a 2-witness lifting is defined through the existence of witness distributions, for any (d₁, d₂) ∈ R^♯(DP(ε₁),δ₁) we then need witness distributions showing (f♯(d₁), g♯(d₂)) ∈ S^♯(DP(ε₁+ε₂),δ₁+δ₂). In the discrete case, these witnesses can be constructed in two steps:

(1) For any (x, y) ∈ R, there exist witnesses d′_L, d′_R ∈ Dist(S) proving (f(x), g(y)) ∈ S^♯(DP(ε₂),δ₂). By applying the axiom of choice, we obtain a selection function

    ⟨l₁, l₂⟩ : R → {(d′_L, d′_R) | ∆^{DP(ε₂)}_S(d′_L, d′_R) ≤ δ₂}.

(2) For any witnesses d_L, d_R ∈ Dist(R) proving (d₁, d₂) ∈ R^♯(DP(ε₁),δ₁), the pair (l₁♯(d_L), l₂♯(d_R)) is a pair of witness distributions proving (f♯(d₁), g♯(d₂)) ∈ S^♯(DP(ε₁+ε₂),δ₁+δ₂), by composability of ∆^{DP(ε)}.

The first step is problematic to extend to the continuous case: the witness-selecting functions l₁ and l₂ obtained by the axiom of choice may not be measurable, so the Kleisli extensions l₁♯ and l₂♯ in the second step may not be well-defined in the continuous case. To resolve this difficulty, we introduce a novel notion of approximate span-liftings.
The key idea is that morphisms between span-liftings carry a built-in measurable witness selection function, making it unnecessary to use the axiom of choice when proving sequential composition.

We briefly review some definitions from measure theory; readers should consult a textbook for more details [Rudin 1987]. Given a set X, a σ-algebra on X is a collection Σ of subsets of X including the empty set and closed under complements, countable unions, and countable intersections; a measurable space X is a set |X| with a σ-algebra Σ_X, whose members are called the measurable sets. A countable set X yields the discrete measurable space where all subsets are measurable: Σ_X = 2^X. A map f : X → Y between measurable spaces is measurable if f⁻¹(A) ∈ Σ_X for all A ∈ Σ_Y. Any subset S of a measurable space X forms a subspace where the σ-algebra is given by Σ_S = {A ∩ S | A ∈ Σ_X}; equivalently, Σ_S is the coarsest σ-algebra making the inclusion map S ↪ X measurable.

A measure on a measurable space X is a map µ : Σ_X → R≥0 ∪ {∞} such that µ(∅) = 0 and µ(∪_i X_i) = Σ_i µ(X_i) for any countable family of disjoint measurable sets X_i. Measures with µ(X) = 1 are called probability measures, and measures with µ(X) ≤ 1 are called subprobability measures. For any pair of subprobability measures µ₁ on X and µ₂ on Y, the product measure µ₁ ⊗ µ₂ of µ₁ and µ₂ is the unique measure on X × Y satisfying (µ₁ ⊗ µ₂)(A × B) = µ₁(A) · µ₂(B). For any measurable space X and element x ∈ X, we write d_x for the Dirac measure on X centered at x, defined by d_x(A) = 1 if x ∈ A, and d_x(A) = 0 otherwise.

Measurable spaces and measurable functions form the category Meas; this category has all limits and colimits, and finite products distribute over finite coproducts. We denote by Fin the full subcategory of Meas consisting of all finite discrete spaces.
The sub-Giry monad G is the subprobabilistic variant of the Giry monad [Giry 1982].

Definition 3.1. The sub-Giry monad (G, η, (−)♯) over Meas is defined as follows:
• For any X ∈ Meas, the measurable space GX is the set of subprobability measures (measures whose total mass is at most 1) on X, equipped with the coarsest σ-algebra induced by the evaluation functions ev_A : GX → [0, 1] defined by ν ↦ ν(A) for A ∈ Σ_X.
• For each f : X → Y in Meas, Gf : GX → GY is defined by (Gf)(µ) = µ(f⁻¹(−)).
• The unit η is given by the Dirac measures: η_X(x) = d_x.
• The Kleisli extension f♯ : GX → GY of f : X → GY is given, for any µ ∈ GX and A ∈ Σ_Y, by

    f♯(µ)(A) = ∫_X f(x)(A) dµ(x).
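For intuition, here is a minimal discrete analogue of G in Python (our illustration; the paper works with general measurable spaces): a subprobability distribution is a finite map from outcomes to weights with total mass at most 1, the unit is a point mass, and the Kleisli extension replaces the integral in Definition 3.1 by a weighted sum.

```python
from collections import defaultdict

def unit(x) -> dict:
    """Dirac distribution d_x: all mass on the single outcome x."""
    return {x: 1.0}

def kleisli(f, mu: dict) -> dict:
    """Discrete Kleisli extension: f#(mu)(A) = sum_x f(x)(A) * mu({x})."""
    out = defaultdict(float)
    for x, w in mu.items():
        for y, v in f(x).items():
            out[y] += w * v
    return dict(out)

# A fair coin, then a biased coin whose bias depends on the first flip:
coin = {0: 0.5, 1: 0.5}
biased = lambda b: {0: 0.9, 1: 0.1} if b == 0 else {0: 0.1, 1: 0.9}
print(kleisli(biased, coin))  # {0: 0.5, 1: 0.5}
```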
The sub-Giry monad satisfies useful properties for interpreting probabilistic programs. It is commutative and strong with respect to the Cartesian products of Meas, where the double strength dst_{X,Y} : G(X) × G(Y) → G(X × Y) is given by the product measure: dst_{X,Y}(ν₁, ν₂) = ν₁ ⊗ ν₂. The double strength is used to define the semantics of composition and to interpret typing contexts. Additionally, the sub-Giry monad provides the structure needed to interpret loops. Namely, we can introduce an ωCPO⊥ structure over measurable functions of type X → G(Y) with the following order:

    f ⊑ g ⟺ ∀x ∈ X, B ∈ Σ_Y. f(x)(B) ≤ g(x)(B)    (f, g : X → G(Y) in Meas).

(This ordering gives an ωCPO⊥-enrichment of the Kleisli category Meas_G, which is equivalent to the partial additivity of stochastic relations [Panangaden 1999].)

A graded monad [Fujii et al. 2016; Katsumata 2014] is a monad refined by indices from a preordered monoid. Let A = (A, ·, 1_A, ⪯) be a preordered monoid. An A-graded monad on a category C consists of
• a family {T_e}_{e∈A} of endofunctors T_e on C,
• a morphism η_X : X → T_{1_A}X for each X ∈ C (unit),
• a mapping (−)^{♯e₁,e₂} : C(X, T_{e₂}Y) → C(T_{e₁}X, T_{e₁·e₂}Y) for X, Y ∈ C and e₁, e₂ ∈ A (Kleisli lifting),
• a family {⊑^{e₁,e₂}}_{e₁⪯e₂} of natural transformations ⊑^{e₁,e₂} : T_{e₁} ⇒ T_{e₂} (inclusions)
satisfying the following compatibility conditions: for any f : X → T_{e₂}Y and g : Y → T_{e₃}Z,

    ⊑^{e₁·e₂, e₁·e₂′}_Y ∘ f^{♯e₁,e₂} = (⊑^{e₂,e₂′}_Y ∘ f)^{♯e₁,e₂′},
    f^{♯e₁′,e₂} ∘ ⊑^{e₁,e₁′}_X = ⊑^{e₁·e₂, e₁′·e₂}_Y ∘ f^{♯e₁,e₂},
    f^{♯1_A,e₂} ∘ η_X = f,    η_X^{♯e₁,1_A} = id_{T_{e₁}X},
    (g^{♯e₂,e₃} ∘ f)^{♯e₁,e₂·e₃} = g^{♯e₁·e₂,e₃} ∘ f^{♯e₁,e₂}.

A typical way of constructing a graded monad is by refining a plain monad with indices. An A-graded lifting of a monad (T, η^T, (−)♯) on D along a functor U : C → D is an A-graded monad {T_e}_{e∈A} on C satisfying U ∘ T_e = T ∘ U, U(f^{♯e₁,e₂}) = (Uf)♯, U(η_D) = η^T_{UD}, and U(⊑^{e₁,e₂}_D) = id_{TUD}. The functor U erases the grading of T_e, yielding the original (plain) monad T.
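Before moving on, a small Python sketch may help fix intuitions about grading (ours; it tracks grades on single values rather than on measures, so it illustrates only the bookkeeping, not the semantics). The grading monoid here is (R≥0 × R≥0, +, (0, 0)), as for the DP-graded liftings later: the unit costs the monoid unit, and Kleisli composition combines the grades in the monoid.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")

@dataclass
class Graded(Generic[A]):
    """A result tagged with a grade (eps, delta); stands in for T_(eps,delta) X."""
    eps: float
    delta: float
    value: A

def unit(x: A) -> Graded[A]:
    return Graded(0.0, 0.0, x)  # unit lands at the monoid unit 1_A = (0, 0)

def bind(m: Graded[A], f: Callable[[A], Graded[B]]) -> Graded[B]:
    out = f(m.value)
    # Kleisli lifting: T_e1 X -> T_(e1 . e2) Y, grades composed in the monoid.
    return Graded(m.eps + out.eps, m.delta + out.delta, out.value)

# Two hypothetical private steps costing (1.0, 1e-6) and (0.5, 0):
print(bind(Graded(1.0, 1e-6, 41), lambda v: Graded(0.5, 0.0, v + 1)))
```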
To extend the relational lifting approach to the continuous setting, we work with the category of spans, whose objects generalize relations by taking arbitrary functions in place of projections. Morphisms between spans will encode the information needed to ensure good compositional behavior.

Definition 3.2. The category Span(Meas) of spans in Meas consists of:
• Objects (X, Y, Φ, ρ₁, ρ₂), given by a span X ←ρ₁− Φ −ρ₂→ Y in Meas.
• Morphisms (X, Y, Φ, ρ₁, ρ₂) → (Z, W, Ψ, ρ′₁, ρ′₂), given by triples (h, k, l) of morphisms h : X → Z, k : Y → W, and l : Φ → Ψ in Meas satisfying h ∘ ρ₁ = ρ′₁ ∘ l and k ∘ ρ₂ = ρ′₂ ∘ l.

For simplicity, we often denote a Span(Meas)-object (X, Y, Φ, ρ₁, ρ₂) by Φ.

The category Span(Meas) has several useful properties. First, the category has binary products:

    (X, Y, Φ, ρ₁, ρ₂) ×̇ (Z, W, Ψ, ρ′₁, ρ′₂) = (X × Z, Y × W, Φ × Ψ, ρ₁ × ρ′₁, ρ₂ × ρ′₂).

(We will frequently use two notions of pairing on functions. For f₁ : X → Y and f₂ : X → W, we have ⟨f₁, f₂⟩ : X → Y × W, which takes a single input x and returns the pair (f₁(x), f₂(x)). For f₁ : X₁ → Y and f₂ : X₂ → W, we have f₁ × f₂ : X₁ × X₂ → Y × W, which takes a pair of inputs (x, y) and returns (f₁(x), f₂(y)).)

The category Span(Meas) also has coproducts:

    (X, Y, Φ, ρ₁, ρ₂) +̇ (X′, Y′, Φ′, ρ′₁, ρ′₂) = (X + X′, Y + Y′, Φ + Φ′, ρ₁ + ρ′₁, ρ₂ + ρ′₂).

Standard binary relations can be interpreted as spans. For X, Y ∈ Meas, any binary relation Φ ⊆ |X| × |Y| determines a span X ←π₁− Φ −π₂→ Y in Meas, where π₁ and π₂ are the projections and Φ is regarded as a subspace of X × Y.

Finally, relation-preserving maps can be interpreted as morphisms of spans. Consider two binary relations Φ ⊆ |X| × |Y| and Ψ ⊆ |Z| × |W|, interpreted as spans (X, Y, Φ, π₁, π₂) and (Z, W, Ψ, π₁, π₂) as above. If f : X → Z and g : Y → W in Meas satisfy (f(x), g(y)) ∈ Ψ for any (x, y) ∈ Φ, then we have the morphism

    (f, g, f × g|_Φ) : (X, Y, Φ, π₁, π₂) → (Z, W, Ψ, π₁, π₂)

in Span(Meas), where f × g|_Φ is the restriction of f × g to Φ (we often write just f × g). These features are crucial for interpreting probabilistic program logics, as we will see in Section 6.
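These notions are easy to experiment with on finite carriers (a small sketch of ours; Span and is_morphism are our names): a span is an apex with two legs, and a morphism of spans is a triple making both squares commute.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Span:
    """A span X <-rho1- Phi -rho2-> Y with a finite apex (cf. Definition 3.2)."""
    phi: List
    rho1: Callable
    rho2: Callable

def is_morphism(src: Span, tgt: Span, h: Callable, k: Callable, l: Callable) -> bool:
    """Check h . rho1 = rho1' . l and k . rho2 = rho2' . l on every point of the apex."""
    return all(h(src.rho1(p)) == tgt.rho1(l(p)) and
               k(src.rho2(p)) == tgt.rho2(l(p)) for p in src.phi)

# The relation {(x, x) | x in {0,1,2}} as a span whose legs are projections,
# and the relation-preserving map (succ, succ) with witness l = succ x succ:
eq3 = Span([(x, x) for x in range(3)], lambda p: p[0], lambda p: p[1])
eq4 = Span([(x, x) for x in range(1, 4)], lambda p: p[0], lambda p: p[1])
succ = lambda n: n + 1
print(is_morphism(eq3, eq4, succ, succ, lambda p: (p[0] + 1, p[1] + 1)))  # True
```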
Now that we have covered the preliminaries, our goal is to build a suitable graded monad on Span(Meas); this will be our abstraction for relational reasoning about divergences. We proceed in two stages. In this section, we introduce a general class of divergences, real-valued functions on two measures over the same space. Then, we identify important composition properties inspired by analogous properties of f-divergences [Barthe and Olmedo 2013; Liese and Vajda 2006]. We will leverage these properties to give a graded monad structure on Span(Meas) capturing these divergences in the next section. We write R̄ for the set R ∪ {−∞, +∞} of extended reals. We regard both R̄ and R̄≥0 as partially ordered additive monoids; for the former, addition is extended by ∞ + (−∞) = −∞.

Definition 4.1. A divergence is a family ∆ = {∆_X}_{X∈Meas} of functions ∆_X : |GX| × |GX| → R̄.

To describe composition of divergences, it is useful to work with indexed families of divergences; often, two divergences can be combined to give a new divergence with different indices. For instance, the notion of zCDP can be characterized by the family {∆^{zCDP(ξ)}}_{0≤ξ} of divergences ∆^{zCDP(ξ)} introduced in Section 2 (Equation 6). For this reason, we introduce the notion of graded families of divergences.

Definition 4.2. Let (A, ·, 1_A, ⪯) be a preordered monoid. An A-graded family of divergences is a family ∆ = {∆^α}_{α∈A} such that

    α ⪯ β ⟹ (∀X ∈ Meas. ∀µ₁, µ₂ ∈ GX. ∆^β_X(µ₁, µ₂) ≤ ∆^α_X(µ₁, µ₂)).

Note that the preorder on the grading is contravariant. We will regard a divergence ∆ as a singleton-graded family {∆}. We now define basic properties of graded families of divergences for a given preordered monoid (A, ·, 1_A, ⪯).

Definition 4.3. An A-graded family ∆ = {∆^α}_{α∈A} of divergences is:
reflexive: if ∆^{1_A}_X(µ, µ) ≤ 0;
functorial: if ∆^α_Y(Gk(µ₁), Gk(µ₂)) ≤ ∆^α_X(µ₁, µ₂) for any k : X → Y;
substitutive: if ∆^α_Y(f♯µ₁, f♯µ₂) ≤ ∆^α_X(µ₁, µ₂) for any f : X → GY;
additive: if ∆^{α·β}_{X×Y}(µ₁ ⊗ µ′₁, µ₂ ⊗ µ′₂) ≤ ∆^α_X(µ₁, µ₂) + ∆^β_Y(µ′₁, µ′₂);
continuous: if ∆^α_X(µ₁, µ₂) = sup{∆^α_I(Gk(µ₁), Gk(µ₂)) | I ∈ Fin, k : X → I};
composable: if ∆^{α·β}_Y(f♯µ₁, g♯µ₂) ≤ ∆^α_X(µ₁, µ₂) + sup_{x∈X} ∆^β_Y(f(x), g(x)) for any f, g : X → GY.
All functions above are assumed to be measurable.

These properties are inspired by properties from the literature on f-divergences and differential privacy. For instance, substitutivity is the generalization of the usual data-processing inequality for f-divergences [Pardo and Vajda 1997, Chapter 2], while functoriality is the special case where the data-processing function is deterministic. These two properties are also known in the differential privacy literature as resilience to post-processing [Dwork and Roth 2013, Proposition 2.1], in the randomized and deterministic case respectively. Composability corresponds to composition in differential privacy, which states that we can adaptively compose two differentially private mechanisms. Additivity corresponds to the simple instance of composition where the second mechanism does not depend on the result of the first. Continuity is the generalization of the continuity of f-divergences [Pardo and Vajda 1997, Theorem 16], which approximates divergences of continuous distributions by divergences of discrete distributions.

Reflexivity and composability are the key properties needed to give the structure of a graded monad: intuitively, reflexivity gives a unit, and composability gives a (graded) Kleisli lifting. We also need additivity to give a strength of the graded monad, allowing a lifting on real-valued distributions (often available from known results in probability theory) to be converted into a lifting on distributions over larger spaces (e.g., program memories). In some ways, composability is the central property: reflexivity is usually immediate, and additivity is a consequence.

Theorem 4.4. An A-graded family ∆ is additive if it is continuous and composable.

Although these properties have been studied before in the discrete case, there are subtleties when passing to our continuous setting. For example, in the case of discrete distributions, additivity is an instance of composability [Barthe and Olmedo 2013, Proposition 4]. In the case of continuous distributions, this may no longer hold. However, one can recover additivity from composability by using a continuity property.
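The composable inequality can be checked numerically in the discrete case (a quick sanity check of ours for the DP divergence ∆^{DP(ε)}; all names are ours, and the inequality itself is Barthe and Olmedo's composition theorem).

```python
import math, random

def dp_div(eps, mu1, mu2):
    """Delta^{DP(eps)}(mu1, mu2) = sum_y max(mu1(y) - e^eps * mu2(y), 0), discretely."""
    keys = set(mu1) | set(mu2)
    return sum(max(mu1.get(y, 0.0) - math.exp(eps) * mu2.get(y, 0.0), 0.0) for y in keys)

def kleisli(f, mu):
    out = {}
    for x, w in mu.items():
        for y, v in f[x].items():
            out[y] = out.get(y, 0.0) + w * v
    return out

random.seed(0)
def rand_dist(keys):
    ws = [random.random() for _ in keys]
    return {k: w / sum(ws) for k, w in zip(keys, ws)}

mu1, mu2 = rand_dist("ab"), rand_dist("ab")
f = {x: rand_dist("uv") for x in "ab"}
g = {x: rand_dist("uv") for x in "ab"}
eps1 = eps2 = 0.3
lhs = dp_div(eps1 + eps2, kleisli(f, mu1), kleisli(g, mu2))
rhs = dp_div(eps1, mu1, mu2) + max(dp_div(eps2, f[x], g[x]) for x in "ab")
assert lhs <= rhs + 1e-12   # Delta^{e1+e2}(f# mu1, g# mu2) bounded as in Definition 4.3
print(lhs, rhs)
```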
To prove composability, it is often easier to first establish two other properties of families of divergences: approximability and finite-composability. These properties describe divergences that are well-behaved with respect to discretization, in order to smoothly extend properties from the discrete case to the continuous case.

Definition 4.5. An A-graded family ∆ = {∆^α}_{α∈A} of divergences is:
approximable: if for any X ∈ Meas, I ∈ Fin, f, g : X → GI, and µ₁, µ₂ ∈ GX, there are J_n ∈ Fin, m∗_n : X → J_n, and m_n : J_n → X in Meas such that

    ∆^α_I(f♯(µ₁), g♯(µ₂)) = lim_{n→∞} ∆^α_I((f ∘ m_n ∘ m∗_n)♯(µ₁), (g ∘ m_n ∘ m∗_n)♯(µ₂));

finite-composable: if for any I, J ∈ Fin, f, g : I → GJ, and d₁, d₂ ∈ GI,

    ∆^{α·β}_J(f♯d₁, g♯d₂) ≤ ∆^α_I(d₁, d₂) + sup_{i∈I} ∆^β_J(f(i), g(i)).

The function m∗_n in the definition of approximability discretizes points of X to J_n, and m_n reconstructs points of X from J_n. Finite-composability of ∆ is exactly composability of ∆ in the discrete case.

These properties allow us to extend composability of divergences from the discrete case, witnessed by finite-composability, to the continuous case. Finite-composability is often known for standard divergences, or can be established by direct calculation. If ∆ is approximable and continuous, finite-composability implies composability. Formally, we have the following theorem.

Theorem 4.6. A continuous, approximable A-graded family ∆ is composable if it is finite-composable.

f-divergences

To discuss basic properties of the divergences for DP, RDP, zCDP, and tCDP, we begin with basic properties of f-divergences, since DP can be formulated by a graded family ∆^DP = {∆^{DP(ε)}}_{0≤ε} of f-divergences, and Rényi divergences are logarithms of f-divergences. An f-divergence ∆^f of subprobability measures is defined in the same way as the f-divergence of probability measures (4). The f-divergences are not necessarily positive on subprobability measures, though they are positive on proper probability measures. We can extend the continuity of f-divergences [Liese and Vajda 2006, Theorem 16] to subprobability measures.

Theorem 4.7 (Cf. Liese and Vajda [2006, Theorem 16]). For any weight function f, the f-divergence ∆^f is continuous: for any subprobability measures µ₁, µ₂ ∈ GX on X, we have

    ∆^f_X(µ₁, µ₂) = sup{ Σ_{i=1}^n µ₂(A_i) · f(µ₁(A_i)/µ₂(A_i)) | {A_i}_{i=1}^n is a measurable finite partition of X }.

(Note that a measurable finite partition {A_i}_{i=1}^n of X is equivalent to a measurable function k : X → I where I = {1, 2, ..., n}.)

As we have seen, DP can be formulated by the R≥0-graded family ∆^DP of f-divergences, while the Rényi divergences supporting RDP, zCDP, and tCDP are logarithms of f-divergences. Before proving basic properties of the divergences for DP, RDP, zCDP, and tCDP, we first need two important basic properties of f-divergences, continuity and approximability, and we show that finite-composability of f-divergences extends to (proper) composability.

Theorem 4.8. The f-divergence ∆^f is approximable for any weight function f. Therefore, any finite-composable family of f-divergences is composable.

Theorem 4.9. An A-graded family ∆ = {∆^{f_α}}_{α∈A} of f_α-divergences is composable if it is finite-composable.

We remark that any composable family of f-divergences is also additive by Theorem 4.4, since f-divergences are always continuous (Theorem 4.7).

As we have seen, DP can be formulated by the R≥0-graded family ∆^DP of f-divergences. By Theorems 4.4 and 4.9 and Barthe and Olmedo [2013, Theorem 1], we obtain the basic properties of the divergences ∆^DP for DP as follows.

Theorem 4.10 (Cf. Barthe and Olmedo [2013, Theorem 1]).
The R≥0-graded family ∆^DP = {∆^{DP(ε)}}_{0≤ε} is reflexive, continuous, approximable, composable, and additive.

Similarly, we can obtain basic properties for RDP, zCDP, and tCDP. First, by Theorems 4.7 and 4.8, the exponential exp((α − 1) · D^α) of the Rényi divergence of order α is continuous and approximable, because it is exactly the f-divergence with weight function t ↦ t^α. Since the logarithm function is monotone and continuous except at 0, the Rényi divergence is continuous and approximable too. Reflexivity and finite-composability of Rényi divergences follow by direct calculation. Theorem 4.9 yields:

Theorem 4.11. For any α > 1, the Rényi divergence D^α of order α is reflexive, continuous, approximable, composable, and additive (as a singleton-graded family).

We extend the following properties of Rényi divergences, which yield the transitivity laws of RDP and zCDP, to subprobability measures. (An analogous law for tCDP is not known.)

Proposition 4.1 (Cf. Van Erven and Harremoës [2014, Theorem 3]). We have

    1 < α ≤ β ⟹ D^α_X(µ₁ ∥ µ₂) ≤ D^β_X(µ₁ ∥ µ₂).

Proposition 4.2 (Cf. Langlois et al. [2014, Lemma 4.1]). For any α > 1, µ₁, µ₂, µ₃ ∈ GX, and p, q > 1 satisfying 1/p + 1/q = 1, we have

    D^α_X(µ₁ ∥ µ₃) ≤ ((pα − 1)/(p(α − 1))) · D^{pα}_X(µ₁ ∥ µ₂) + D^{(q/p)(pα−1)}_X(µ₂ ∥ µ₃).

As we have seen in Section 2.4, we can define divergences for zCDP and tCDP by Equations (6) and (7). Explicitly, we introduce the divergences ∆^{zCDP(ξ)} = sup_{1<α} (1/α)(D^α − ξ) for zCDP and ∆^{ω-tCDP} = sup_{1<α<ω} (1/α) D^α for tCDP. Since suprema commute in general (sup_x sup_y A(x, y) = sup_y sup_x A(x, y)), the following basic properties of the graded family for zCDP and the divergence for tCDP are obtained from Theorem 4.11.

Theorem 4.12. The R≥0-graded family ∆^{zCDP} = {∆^{zCDP(ξ)}}_{0≤ξ} for zCDP is reflexive, continuous, composable, and additive.

Theorem 4.13. For each 1 < ω, the divergence ∆^{ω-tCDP} for ω-tCDP is reflexive, continuous, composable, and additive.

Note that we may not have approximability here, but the families are still composable. These results also hold for subprobability measures, where the Rényi divergence and the divergences for zCDP and tCDP are defined in a way similar to Equations (1), (6), and (7).
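As a quick sanity check of Equation (6) (our illustration; function names are ours): for the Gaussian mechanism, D^α = αr²/(2σ²), so the supremum defining ∆^{zCDP(0)} is constant in α and equals r²/(2σ²), the zCDP guarantee of the Gaussian mechanism used later in Section 6.

```python
import numpy as np

def renyi_gauss(alpha, r, sigma):
    # Closed form for D^alpha(N(0, sigma^2) || N(r, sigma^2)).
    return alpha * r**2 / (2 * sigma**2)

def zcdp_div(xi, r, sigma):
    # Delta^{zCDP(xi)} = sup_{alpha > 1} (1/alpha) * (D^alpha - xi), cf. Equation (6).
    alphas = np.linspace(1.0 + 1e-4, 1e4, 1_000_000)
    return np.max((renyi_gauss(alphas, r, sigma) - xi) / alphas)

r, sigma = 1.0, 2.0
print(zcdp_div(0.0, r, sigma), r**2 / (2 * sigma**2))   # both 0.125
```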
We are now ready to combine graded divergences with spans, leading to our new relational liftings. Given an A-graded family ∆ = {∆^α}_{α∈A} of divergences, we introduce a graded monad on Span(Meas) called the approximate span-lifting (−)^♯(∆,α,δ) for the family ∆, where α ∈ A and δ ∈ R̄. We first define its action on objects.

Definition 5.1. We define the span-constructor (−)^♯(∆,α,δ) as follows: for any (X, Y, Φ, ρ₁, ρ₂) in Span(Meas), we define the Span(Meas)-object

    (X, Y, Φ, ρ₁, ρ₂)^♯(∆,α,δ) = (GX, GY, W(Φ, ∆, α, δ), Gρ₁ ∘ π₁, Gρ₂ ∘ π₂)
    where W(Φ, ∆, α, δ) = {(ν₁, ν₂) ∈ GΦ × GΦ | ∆^α_Φ(ν₁, ν₂) ≤ δ}.

We view W(Φ, ∆, α, δ) as a subspace of the measurable space GΦ × GΦ.

Intuitively, (X, Y, Φ, ρ₁, ρ₂)^♯(∆,α,δ) relates subprobability measures at ∆^α-distance at most δ. The set W(Φ, ∆, α, δ) contains all possible witness distributions, and π₁ and π₂ are the canonical projections from W(Φ, ∆, α, δ) to GΦ. As a special case, the approximate span-lifting (−)^♯(∆,α,δ) recovers the divergence ∆^α when applied to the equality relation.

Theorem 5.2. For any A-graded family ∆, α ∈ A, and δ ∈ R̄, we have

    (X, X, X, id_X, id_X)^♯(∆,α,δ) = (GX, GX, {(µ₁, µ₂) | ∆^α_X(µ₁, µ₂) ≤ δ}, π₁, π₂).

Here, (X, X, X, id_X, id_X) is isomorphic to the equality relation (X, X, Eq_X, π₁|_{Eq_X}, π₂|_{Eq_X}).
Next, we give approximate span-liftings the structure of a graded monad with double strength. We consider the important case where ∆ is a reflexive, composable, and additive A-graded family of divergences; in some cases, we can recover more limited versions of approximate span-liftings by dropping or weakening these properties.

Theorem 5.3. If an A-graded family ∆ is reflexive, composable, and additive, then the approximate span-liftings (−)^♯(∆,α,δ) form an A × R̄-graded monad with double strength. Namely, there are the following maps.

Functor: For any morphism (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (Z, W, Ψ, ρ′₁, ρ′₂) in the category Span(Meas) and any (α, δ) ∈ A × R̄,

    (Gh, Gk, Gl × Gl) : (X, Y, Φ, ρ₁, ρ₂)^♯(∆,α,δ) → (Z, W, Ψ, ρ′₁, ρ′₂)^♯(∆,α,δ).

Unit: For any object (X, Y, Φ, ρ₁, ρ₂) in Span(Meas),

    (η_X, η_Y, ⟨η_Φ, η_Φ⟩) : (X, Y, Φ, ρ₁, ρ₂) → (X, Y, Φ, ρ₁, ρ₂)^♯(∆,1_A,0).

Kleisli lifting: For any morphism (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (Z, W, Ψ, ρ′₁, ρ′₂)^♯(∆,α,δ) in Span(Meas) and (β, γ) ∈ A × R̄,

    (h♯, k♯, (π₁ ∘ l)♯ × (π₂ ∘ l)♯) : (X, Y, Φ, ρ₁, ρ₂)^♯(∆,β,γ) → (Z, W, Ψ, ρ′₁, ρ′₂)^♯(∆,α·β,δ+γ).

Inclusions: For any (X, Y, Φ, ρ₁, ρ₂) in Span(Meas), and any α ⪯ β and δ ≤ γ,

    (id_{GX}, id_{GY}, id_{GΦ} × id_{GΦ}) : (X, Y, Φ, ρ₁, ρ₂)^♯(∆,α,δ) → (X, Y, Φ, ρ₁, ρ₂)^♯(∆,β,γ).

Double strength: For any (X, Y, Φ, ρ₁, ρ₂) and (Z, W, Ψ, ρ′₁, ρ′₂) in Span(Meas), and parameters (α, δ) and (β, γ) in A × R̄, letting θ_i = dst_{Φ,Ψ} ∘ (π_i × π_i) for i = 1, 2,

    (dst_{X,Z}, dst_{Y,W}, ⟨θ₁, θ₂⟩) : (X, Y, Φ, ρ₁, ρ₂)^♯(∆,α,δ) ×̇ (Z, W, Ψ, ρ′₁, ρ′₂)^♯(∆,β,γ) → (Φ ×̇ Ψ)^♯(∆,α·β,δ+γ).

Proof sketch. Checking the axioms of a graded monad is straightforward, since all structure is inherited from the sub-Giry monad G. It suffices to prove the well-definedness of the above maps. For example, we check the well-definedness of the Kleisli lifting of a morphism (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (Z, W, Ψ, ρ′₁, ρ′₂)^♯(∆,α,δ) in Span(Meas). We first show that the third component (π₁ ∘ l)♯ × (π₂ ∘ l)♯ of the Kleisli lifting forms a measurable function from W(Φ, ∆, β, γ) to W(Ψ, ∆, α·β, δ+γ), using the composability of ∆; measurability is immediate since W(Φ, ∆, β, γ) and W(Ψ, ∆, α·β, δ+γ) are subspaces of GΦ × GΦ and GΨ × GΨ. Next, we show Gρ′₁ ∘ π₁ ∘ ((π₁ ∘ l)♯ × (π₂ ∘ l)♯) = h♯ ∘ Gρ₁ ∘ π₁ and Gρ′₂ ∘ π₂ ∘ ((π₁ ∘ l)♯ × (π₂ ∘ l)♯) = k♯ ∘ Gρ₂ ∘ π₂, which follow from the assumptions ρ′₁ ∘ l = h ∘ ρ₁ and ρ′₂ ∘ l = k ∘ ρ₂. Similarly, the well-definedness of the functor part and the unit is proved using the composability and reflexivity of ∆; the inclusions are obtained from the definition of A-graded family of divergences; the double strength is obtained from the additivity of ∆. □

Many composition theorems of differential privacy are based on the notion of k-fold adaptive composition [Winograd-Cort et al. 2017, Definition 2.3], [Dwork et al. 2010, Section A]. (For differential privacy there are also advanced composition theorems, such as Dwork et al. [2010, Theorem 3.3] and Dwork and Roth [2013, Theorem 3.20], which give stronger privacy guarantees.) Roughly speaking, for k programs q₁, ..., q_k, their k-fold adaptive composition q₁ ▷ q₂ ▷ ⋯ ▷ q_k computes in the following way:

(1) The first program q₁ takes an input x in X, and returns an output y₁ in Y₁.
(2) The second program q₂ takes the input x ∈ X and the output y₁ ∈ Y₁ of the previous program q₁, and returns an output y₂ ∈ Y₂.
⋮
(k) The k-th program q_k takes the input x ∈ X and the outputs y₁, ..., y_{k−1} of the previous programs q₁, ..., q_{k−1}, and returns an output y_k ∈ Y_k.

We observe that our definition of composability of divergences covers the standard composability with respect to k-fold adaptive composition. For example, adaptive composition of two randomized programs can be formulated categorically as follows: let f : X → GY and g : Y × X → GZ be two randomized programs. The adaptive composition f ▷ g : X → G(Y × Z) is defined by

    f ▷ g = (st_{Y,Z} ∘ (id_Y × g) ∘ α_{Y,Y,X})♯ ∘ st′_{Y×Y,X} ∘ (G copy_Y × id_X) ∘ (f × id_X) ∘ copy_X.

Here, st′_{Y×Y,X} is the costrength G(Y × Y) × X → G((Y × Y) × X); copy_X is the diagonal map X → X × X (x ↦ (x, x)) on X; and α_{Y,Y,X} is the associativity isomorphism (Y × Y) × X ≅ Y × (Y × X) of the Cartesian product of Meas. We show that the composability of ∆ is stronger than adaptive composability. Suppose that ∆ is reflexive, continuous, and composable. Since (−)^♯(∆,α,δ) is a graded span-lifting with a double strength, the adaptive composition of two morphisms of spans (f₁, f₂, f₃) : Φ → Ψ^♯(∆,α,δ) and (g₁, g₂, g₃) : Ψ ×̇ Φ → Ω^♯(∆,β,γ) is given by (f₁ ▷ g₁, f₂ ▷ g₂, l) : Φ → (Ψ ×̇ Ω)^♯(∆,α·β,δ+γ) (we omit the details of l).
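In the discrete Python sketch of Section 3 (our illustration, not the paper's construction), the adaptive composition f ▷ g is just a nested weighted sum; the second stage may inspect both the first output and the original input.

```python
def adaptive(f, g):
    """f |> g : X -> SubDist(Y x Z), with f : X -> SubDist(Y), g : (Y, X) -> SubDist(Z)."""
    def composed(x):
        out = {}
        for y, wy in f(x).items():
            for z, wz in g((y, x)).items():
                out[(y, z)] = out.get((y, z), 0.0) + wy * wz
        return out
    return composed

coin = lambda x: {0: 0.5, 1: 0.5}
echo = lambda yx: {("saw", yx[0]): 1.0}     # stage two inspects the first answer
print(adaptive(coin, echo)(42))             # {(0, ('saw', 0)): 0.5, (1, ('saw', 1)): 0.5}
```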
Finally, we build approximate span-liftings for DP, RDP, zCDP, and tCDP by combining Theorems 4.10, 4.11, 4.12, and 4.13 with the construction of the categorical structure of approximate span-liftings (Theorem 5.3).

Theorem 5.4 (Approximate span-lifting for DP, RDP, zCDP, tCDP). The following approximate span-liftings are graded liftings, with a double strength, of G × G along U : Span(Meas) → Meas × Meas.

    Privacy | (Graded family of) divergence              | Approximate span-lifting                      | Grading monoid
    DP      | ∆^DP = {∆^{DP(ε)}}_{0≤ε}                   | {(−)^♯(∆^DP,ε,δ)}_{0≤ε, 0≤δ}                  | R≥0 × R≥0
    RDP     | D^α (Rényi divergence; see (1))            | {(−)^♯(D^α,∗,ρ)}_{∗∈{∗}, ρ∈R̄}                 | R̄
    zCDP    | ∆^{zCDP} = {∆^{zCDP(ξ)}}_{0≤ξ} (see (6))   | {(−)^♯(∆^{zCDP},ξ,ρ)}_{0≤ξ, ρ∈R̄}              | R≥0 × R̄
    tCDP    | ∆^{ω-tCDP} (see (7))                       | {(−)^♯(∆^{ω-tCDP},∗,ρ)}_{∗∈{∗}, ρ∈R̄}          | R̄

The previous section showed that the RDP, zCDP, and tCDP relaxations of differential privacy can be captured by relational liftings with the same categorical properties enjoyed by relational liftings for standard differential privacy. As a result, we can use these liftings to give the semantic foundation for formal verification of these relaxations. To demonstrate a concrete application, we design a program logic span-apRHL that can prove DP, RDP, zCDP, and tCDP for randomized algorithms, supporting both discrete and continuous random sampling.
We take a standard, first-order language pWHILE, augmenting the usual imperative commands with a random sampling statement (we omit the grammar of expressions, which is largely standard).

    τ ::= bool | int | real | τ^d (d ∈ N) | ...                       (basic types)
    e ::= x | b ∈ B | n ∈ Z | r ∈ R | e ⊕ e | e ▷◁ e | e[e] | ...     (expressions)
    ⊕ ::= + | − | ∗ | / | min | max | ∧ | ∨
    ▷◁ ::= ≤ | ≥ | = | ≠ | < | >
    ν ::= Dirac(e) | Bern(e) | Lap(e, e) | Gauss(e, e) | ...          (probabilistic expressions)
    c ::= skip | x $← ν | c; c | if e then c else c | while e do c    (commands)

Here, b, n, and r are constants; τ is a value type; x is a variable; e is an expression; ν is a probabilistic expression; Dirac, Bern, Lap, and Gauss represent the Dirac, Bernoulli, Laplace, and Gaussian distributions, respectively; and c is a command/program. We will use the following shorthands: x ← e ≝ x $← Dirac(e) and if b then c ≝ if b then c else skip. We consider only programs that are well typed. The type system is largely standard, with three kinds of judgments: Γ ⊢t e : τ, Γ ⊢p ν : τ, and Γ ⊢ c for expressions, distributions, and programs, respectively. For details, see the Appendix.

Our assertion logic uses formulas of the form

    Φ, Ψ ::= E | Φ ∧ Ψ | Φ ∨ Ψ | ¬Φ

where E represents basic relational expressions, namely:

    E ::= e₁⟨1⟩ ▷◁ e₂⟨2⟩ | (e₁⟨1⟩ ⊕ e₂⟨2⟩) ▷◁ (e₃⟨1⟩ ⊕ e₄⟨2⟩).

As usual in relational logics, we use the tags ⟨1⟩ and ⟨2⟩ to distinguish expressions evaluated in the first and second memory, respectively. For simplicity, we consider only the relations given by the above syntax; the language can easily be extended with other constructions. In the following we will use some syntactic sugar for a constant k: (e⟨1⟩ ▷◁ k) ≝ ((e ▷◁ k)⟨1⟩ = true⟨2⟩), and (e⟨2⟩ ▷◁ k) ≝ (true⟨1⟩ = (e ▷◁ k)⟨2⟩). We consider only relational expressions Φ that are well-formed in a context Γ, and we denote this by the judgment Γ ⊢R Φ. Rules for deriving these judgments are standard, and postponed to the Appendix.

Since we use span-liftings instead of relational liftings, we interpret relational assertions as spans, that is, as Span(Meas)-objects. This can be done by first interpreting assertions Γ ⊢R Φ as binary relations ⦅Φ⦆ ⊆ ⟦Γ⟧ × ⟦Γ⟧, and then converting them to spans (⟦Γ⟧, ⟦Γ⟧, ⦅Φ⦆, π₁, π₂). We describe the semantics of relational assertions in the next section.

We will also use implications of relations Γ ⊢I Φ ⟹ Ψ, defined when Γ ⊢R Φ and Γ ⊢R Ψ hold and the implication Φ ⟹ Ψ is a tautology under the typing context Γ. For example, we have the following implication, where Γ ⊢t x : real:

    Γ ⊢I ((x⟨1⟩ ≤ x⟨2⟩) ∧ (x⟨1⟩ ≥ x⟨2⟩)) ⟹ (x⟨1⟩ = x⟨2⟩).

In span-apRHL we can prove four kinds of judgments, corresponding to differential privacy, RDP, zCDP, and tCDP. For well-typed commands Γ ⊢ c₁ and Γ ⊢ c₂ and assertions Γ ⊢R Φ and Γ ⊢R Ψ, we define judgments:

    Γ ⊢ c₁ ∼^DP_{ε,δ} c₂ : Φ ⟹ Ψ         (ε, δ)-differential privacy (DP)
    Γ ⊢ c₁ ∼^{α-RDP}_ρ c₂ : Φ ⟹ Ψ        (α, ρ)-Rényi differential privacy (RDP)
    Γ ⊢ c₁ ∼^{zCDP}_{ξ,ρ} c₂ : Φ ⟹ Ψ     (ξ, ρ)-zero-concentrated differential privacy (zCDP)
    Γ ⊢ c₁ ∼^{ω-tCDP}_ρ c₂ : Φ ⟹ Ψ       (ρ, ω)-truncated concentrated differential privacy (tCDP)

We divide the proof rules of span-apRHL into four classes: basic rules (Figure 1), rules for basic mechanisms (Figure 2), rules for reasoning about transitivity (Figure 3), and rules for conversions (Figure 4). The basic rules can be used to reason about any of differential privacy, RDP, zCDP, and tCDP. We describe the basic rules in a parametric way by considering {∼^∆_{α,δ}}_{α∈A, 0≤δ} to stand for one of the families {∼^DP_{ε,δ}}_{0≤ε, 0≤δ}, {∼^{α-RDP}_ρ}_{0≤ρ}, {∼^{zCDP}_{ξ,ρ}}_{0≤ξ, 0≤ρ}, and {∼^{ω-tCDP}_ρ}_{0≤ρ}. We give a selection of the proof rules in Figure 1; the rest of the rules are standard and we defer them to the appendix.

    [assn]  ────────────────────────────────────────────────────────────────────
            Γ ⊢ x₁ ← e₁ ∼^∆_{1_A,0} x₂ ← e₂ : Φ{e₁⟨1⟩, e₂⟨2⟩ / x₁⟨1⟩, x₂⟨2⟩} ⟹ Φ

            Γ ⊢ c₁ ∼^∆_{α,δ} c′₁ : Φ ⟹ Φ′    Γ ⊢ c₂ ∼^∆_{β,γ} c′₂ : Φ′ ⟹ Ψ
    [seq]   ────────────────────────────────────────────────────────────────────
            Γ ⊢ c₁; c₂ ∼^∆_{α·β,δ+γ} c′₁; c′₂ : Φ ⟹ Ψ

            Γ ⊢I Φ′ ⟹ Φ    Γ ⊢I Ψ ⟹ Ψ′    Γ ⊢ c₁ ∼^∆_{α,δ} c₂ : Φ ⟹ Ψ    α ⪯ β    δ ≤ γ
    [weak]  ────────────────────────────────────────────────────────────────────
            Γ ⊢ c₁ ∼^∆_{β,γ} c₂ : Φ′ ⟹ Ψ′

    Fig. 1. Selection of span-apRHL basic rules.

Here, we comment briefly on the rules. The [assn] rule for assignment is mostly standard; the only non-standard aspect is that, depending on which notion of privacy we want to use, we need to select the corresponding unit 1_A. The rule [seq] is the sequential composition of commands and takes the same form no matter which family of divergences we consider. The rule [weak] is our version of the usual rule of consequence, where we can additionally weaken the privacy parameters for each of the privacy definitions.

In Figure 2, we show rules for the basic mechanisms that we support: Bernoulli, Laplace, and Gaussian. We give several of them to show the differences, in terms of parameters, that we obtain for the same mechanism in the different logics. All of them are supported in the continuous case. We show only the DP rules for the Bernoulli and Laplace mechanisms, and postpone the other Bernoulli and Laplace mechanism rules to the Appendix.

    [DP-Bern]
    Γ ⊢ x₁ ← Bern(e₁) ∼^DP_{log max(p,1−p) − log min(p,1−p), 0} x₂ ← Bern(e₂) :
        ((e₁⟨1⟩ = p) ∧ (1 − e₁⟨1⟩ = e₂⟨2⟩)) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [DP-Bern-Eq]
    Γ ⊢ x₁ ← Bern(e₁) ∼^DP_{0,0} x₂ ← Bern(e₂) : (e₁⟨1⟩ = e₂⟨2⟩) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [DP-Lap]
    Γ ⊢ x₁ ← Lap(e₁, λ) ∼^DP_{r/λ, 0} x₂ ← Lap(e₂, λ) : (|e₁⟨1⟩ − e₂⟨2⟩| ≤ r) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [RDP-G]
    Γ ⊢ x₁ ← Gauss(e₁, σ²) ∼^{α-RDP}_{αr²/(2σ²)} x₂ ← Gauss(e₂, σ²) :
        (|e₁⟨1⟩ − e₂⟨2⟩| ≤ r) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [zCDP-G]
    Γ ⊢ x₁ ← Gauss(e₁, σ²) ∼^{zCDP}_{0, r²/(2σ²)} x₂ ← Gauss(e₂, σ²) :
        (|e₁⟨1⟩ − e₂⟨2⟩| ≤ r) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [tCDP-G]
    Γ ⊢ x₁ ← Gauss(e₁, σ²) ∼^{ω-tCDP}_{r²/(2σ²)} x₂ ← Gauss(e₂, σ²) :
        (|e₁⟨1⟩ − e₂⟨2⟩| ≤ r) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [DP-G]
    ∃c > 1 + √2. (2 log(1.25/δ) ≤ c²) ∧ (cr/ε ≤ σ)
    ────────────────────────────────────────────────────────────────────
    Γ ⊢ x₁ ← Gauss(e₁, σ²) ∼^DP_{ε,δ} x₂ ← Gauss(e₂, σ²) : (|e₁⟨1⟩ − e₂⟨2⟩| ≤ r) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [tCDP-SinhG]
    1 < 1/√ρ ≤ A/δ
    ────────────────────────────────────────────────────────────────────
    Γ ⊢ x₁ ← e₁ + A · arsinh((1/A) · Gauss(0, δ²/(2ρ))) ∼^{(A/δ)-tCDP}_ρ
        x₂ ← e₂ + A · arsinh((1/A) · Gauss(0, δ²/(2ρ))) : (|e₁⟨1⟩ − e₂⟨2⟩| ≤ δ) ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    Fig. 2. Rules for basic mechanisms for DP, RDP, zCDP, and tCDP in span-apRHL.
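As a small worked example (ours, assembled from the rules above), consider releasing two Gaussian-perturbed queries in sequence:

    c ≜ x₁ $← Gauss(e₁, σ²); x₂ $← Gauss(e₂, σ²),

assuming the precondition Φ entails the sensitivity bounds |e₁⟨1⟩ − e₁⟨2⟩| ≤ r₁ and |e₂⟨1⟩ − e₂⟨2⟩| ≤ r₂ and is stable under both samplings. By [RDP-G] (with [weak] to carry Φ along), each sampling satisfies an (α, αrᵢ²/(2σ²))-RDP judgment, and [seq] multiplies the grades in the monoid, here adding the ρ-components:

    Γ ⊢ c ∼^{α-RDP}_{α(r₁² + r₂²)/(2σ²)} c : Φ ⟹ (x₁⟨1⟩ = x₁⟨2⟩) ∧ (x₂⟨1⟩ = x₂⟨2⟩).

This matches the usual composition behavior of RDP, with no δ-style weakening at any step.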
In Figure 3, we show rules for transitivity in span-apRHL. Transitivity is important because it allows one to reason about group privacy [Dwork and Roth 2013]. The different flavors of the logic have different numeric parameters for these rules, reflecting the slight differences in group privacy [Bun and Steinke 2016; Dwork and Roth 2013; Mironov 2017].

    [DP-Trans]
    Γ ⊢ c₁ ∼^DP_{ε₁,δ₁} c₂ : Φ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)    Γ ⊢ c₁ ∼^DP_{ε₂,δ₂} c₂ : Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)
    ────────────────────────────────────────────────────────────────────
    Γ ⊢ c₁ ∼^DP_{ε₁+ε₂, max(e^{ε₂}δ₁+δ₂, e^{ε₁}δ₂+δ₁)} c₂ : Φ ∘ Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [RDP-Trans]
    Γ ⊢ c₁ ∼^{pα-RDP}_{ρ₁} c₂ : Φ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)
    Γ ⊢ c₁ ∼^{(q/p)(pα−1)-RDP}_{ρ₂} c₂ : Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)    1/p + 1/q = 1    1 < p, q
    ────────────────────────────────────────────────────────────────────
    Γ ⊢ c₁ ∼^{α-RDP}_{((pα−1)/(p(α−1)))ρ₁ + ρ₂} c₂ : Φ ∘ Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    [zCDP-Trans]
    Γ ⊢ c₁ ∼^{zCDP}_{ξ(k−1)Σ_{i=1}^{k−1}(1/i), (k−1)²ρ} c₂ : Φ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)
    Γ ⊢ c₁ ∼^{zCDP}_{ξ,ρ} c₂ : Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)    k ∈ N    1 < k
    ────────────────────────────────────────────────────────────────────
    Γ ⊢ c₁ ∼^{zCDP}_{ξk Σ_{i=1}^k (1/i), k²ρ} c₂ : Φ ∘ Ψ ⟹ (x₁⟨1⟩ = x₂⟨2⟩)

    Fig. 3. Span-apRHL transitivity rules for group privacy.

Finally, Figure 4 gives rules for converting between judgments for the different flavors of differential privacy. In some of them there is a loss in the parameters, in others there is no loss. These rules correspond to the different conversion theorems for the different logics [Bun and Steinke 2016; Mironov 2017]. Notice that most of these rules require lossless programs, because the conversion theorems have been formulated in terms of distributions rather than subdistributions.

    Γ ⊢ c₁ ∼^DP_{ε,0} c₂ : Φ ⟹ Ψ    c₁, c₂ : lossless
    [D/z]  ─────────────────────────────────────────────
    Γ ⊢ c₁ ∼^{zCDP}_{ε,0} c₂ : Φ ⟹ Ψ

    Γ ⊢ c₁ ∼^{zCDP}_{0,ρ} c₂ : Φ ⟹ Ψ    c₁, c₂ : lossless
    [z/R]  ─────────────────────────────────────────────
    ∀α > 1. Γ ⊢ c₁ ∼^{α-RDP}_{αρ} c₂ : Φ ⟹ Ψ

    Γ ⊢ c₁ ∼^{zCDP}_{ξ,ρ} c₂ : Φ ⟹ Ψ    c₁, c₂ : lossless    0 < δ < 1
    ─────────────────────────────────────────────
    Γ ⊢ c₁ ∼^DP_{ξ + ρ + 2√(ρ log(1/δ)), δ} c₂ : Φ ⟹ Ψ

    Γ ⊢ c₁ ∼^{ω-tCDP}_ρ c₂ : Φ ⟹ Ψ    c₁, c₂ : lossless    β = min(ω, 1 + √(log(1/δ)/ρ))    0 < δ < 1
    ─────────────────────────────────────────────
    Γ ⊢ c₁ ∼^DP_{ρβ + log(1/δ)/(β−1), δ} c₂ : Φ ⟹ Ψ

    Γ ⊢ c₁ ∼^{α-RDP}_ρ c₂ : Φ ⟹ Ψ    c₁, c₂ : lossless    0 < δ < 1
    ─────────────────────────────────────────────
    Γ ⊢ c₁ ∼^DP_{ρ − log δ/(α−1), δ} c₂ : Φ ⟹ Ψ

    Fig. 4. Rules for conversions between DP, RDP and zCDP in span-apRHL.
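As a numerical illustration of the zCDP-to-DP conversion above (our example; function names are ours): by [zCDP-G], the Gaussian mechanism with sensitivity r = 1 and σ = 1 is (0, 0.5)-zCDP, and the conversion rule then gives, for each target δ, an (ε, δ)-DP judgment with ε = ξ + ρ + 2√(ρ log(1/δ)).

```python
import math

def zcdp_to_dp_eps(xi: float, rho: float, delta: float) -> float:
    """epsilon produced by the zCDP-to-DP conversion:
    (xi, rho)-zCDP yields (xi + rho + 2*sqrt(rho*log(1/delta)), delta)-DP."""
    return xi + rho + 2 * math.sqrt(rho * math.log(1 / delta))

rho = 1.0**2 / (2 * 1.0**2)   # (0, 0.5)-zCDP for the Gaussian mechanism, r = sigma = 1
for delta in (1e-3, 1e-6, 1e-9):
    print(delta, zcdp_to_dp_eps(0.0, rho, delta))
```

Note how ε degrades only with √(log(1/δ)), one of the practical advantages of tracking zCDP rather than eagerly converting each step to (ε, δ)-DP.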
To prove the soundness of span-apRHL we interpret pWHILE in Meas using the sub-Giry monad G. Most of the definitions are standard. The value types are interpreted as expected. To give a semantics to expressions, distribution expressions, and commands, we interpret their associated typing/well-formedness judgments in a context Γ, which is interpreted as usual as a product. We interpret an expression judgment Γ ⊢t e : τ as a measurable function ⟦Γ ⊢t e : τ⟧ : ⟦Γ⟧ → ⟦τ⟧; for instance, the variable case Γ ⊢t x : τ is interpreted as the projection π_x : ⟦Γ⟧ → ⟦τ⟧. Note that all operators ⊕ and comparisons ▷◁ are interpreted as measurable functions ⊕ : ⟦τ⟧ × ⟦τ⟧ → ⟦τ⟧ and ▷◁ : ⟦τ⟧ × ⟦τ⟧ → ⟦bool⟧, respectively. Likewise, we interpret a distribution expression judgment Γ ⊢p ν : τ as a measurable function ⟦Γ ⊢p ν : τ⟧ : ⟦Γ⟧ → G⟦τ⟧; for instance, the Gaussian expression Γ ⊢p Gauss(e₁, e₂) : real is interpreted as the Gaussian distribution N(⟦Γ ⊢t e₁ : real⟧, ⟦Γ ⊢t e₂ : real⟧). Finally, we interpret a command judgment Γ ⊢ c as a measurable function ⟦Γ ⊢ c⟧ : ⟦Γ⟧ → G⟦Γ⟧, defined inductively by

    ⟦Γ ⊢ x $← ν⟧ = G(rw⟨Γ | x : τ⟩) ∘ st_{⟦Γ⟧,⟦τ⟧} ∘ ⟨id_{⟦Γ⟧}, ⟦ν⟧⟩
    ⟦Γ ⊢ c₁; c₂⟧ = ⟦Γ ⊢ c₂⟧♯ ∘ ⟦Γ ⊢ c₁⟧
    ⟦Γ ⊢ skip⟧ = η_{⟦Γ⟧}
    ⟦Γ ⊢ if b then c₁ else c₂⟧ = [⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧] ∘ br⟨Γ⟩ ∘ ⟨⟦Γ ⊢ b⟧, id_{⟦Γ⟧}⟩

Here, rw⟨Γ | x : τ⟩ : ⟦Γ⟧ × ⟦x : τ⟧ → ⟦Γ⟧ (for x : τ ∈ Γ) is the memory-overwriting operation ((a₁, ..., a_k, ..., a_n), b_k) ↦ (a₁, ..., b_k, ..., a_n), which is given by the Cartesian products in Meas. The function br⟨Γ⟩ : 2 × ⟦Γ⟧ → ⟦Γ⟧ + ⟦Γ⟧ comes from the canonical isomorphism 2 × ⟦Γ⟧ ≅ ⟦Γ⟧ + ⟦Γ⟧ given by the distributivity of Meas.

To interpret loops, we introduce a dummy "abort" command Γ ⊢ null, interpreted by the null/zero measure ⟦Γ ⊢ null⟧ = 0, together with the commands corresponding to the finite unrollings of the loop:

    [while b do c]₀ = if b then null else skip
    [while b do c]_{k+1} = if b then (c; [while b do c]_k)

We then define ⟦Γ ⊢ while b do c⟧ = sup_{n∈N} ⟦Γ ⊢ [while b do c]_n⟧. This is well-defined, since the family {⟦Γ ⊢ [while b do c]_n⟧}_{n∈N} is an ω-chain with respect to the ωCPO⊥-enrichment ⊑ of Meas_G.

Since we use span-liftings instead of relational liftings, we need to interpret relational expressions as spans, that is, as Span(Meas)-objects. We proceed in two steps: first interpreting expressions as binary relations, and then converting relations to spans. In the first step, we interpret a relational expression Γ ⊢R Φ as a binary relation over ⟦Γ⟧:

    ⦅Γ ⊢R e₁⟨1⟩ ▷◁ e₂⟨2⟩⦆ = {(m₁, m₂) ∈ ⟦Γ⟧ × ⟦Γ⟧ | ⟦Γ ⊢t e₁ : τ⟧(m₁) ▷◁ ⟦Γ ⊢t e₂ : τ⟧(m₂)}
    ⦅Γ ⊢R (e₁⟨1⟩ ⊕ e₂⟨2⟩) ▷◁ (e₃⟨1⟩ ⊕ e₄⟨2⟩)⦆ =
        {(m₁, m₂) ∈ ⟦Γ⟧ × ⟦Γ⟧ | ⟦Γ ⊢t e₁ : τ⟧(m₁) ⊕ ⟦Γ ⊢t e₂ : τ⟧(m₂) ▷◁ ⟦Γ ⊢t e₃ : τ⟧(m₁) ⊕ ⟦Γ ⊢t e₄ : τ⟧(m₂)}

We interpret the connectives in the expected way:

    ⦅Γ ⊢R Φ ∧ Ψ⦆ = ⦅Γ ⊢R Φ⦆ ∩ ⦅Γ ⊢R Ψ⦆
    ⦅Γ ⊢R Φ ∨ Ψ⦆ = ⦅Γ ⊢R Φ⦆ ∪ ⦅Γ ⊢R Ψ⦆
    ⦅Γ ⊢R ¬Φ⦆ = (⟦Γ⟧ × ⟦Γ⟧) \ ⦅Γ ⊢R Φ⦆

Then, we convert the binary relation ⦅Γ ⊢R Φ⦆ ⊆ ⟦Γ⟧ × ⟦Γ⟧ to the span

    ⟦Γ ⊢R Φ⟧ = (⟦Γ⟧, ⟦Γ⟧, ⦅Γ ⊢R Φ⦆, π₁|_{⦅Γ ⊢R Φ⦆}, π₂|_{⦅Γ ⊢R Φ⦆}).

We interpret the implication Γ ⊢I Φ ⟹ Ψ as the following morphism in Span(Meas):

    ⟦Γ ⊢I Φ ⟹ Ψ⟧ = (id_{⟦Γ⟧}, id_{⟦Γ⟧}, (id_{⟦Γ⟧} × id_{⟦Γ⟧})|_{⦅Γ ⊢R Φ⦆}) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧.

We say a judgment Γ ⊢ c₁ ∼^∆_{α,δ} c₂ : Φ ⟹ Ψ is valid if there exists a measurable function l : ⦅Γ ⊢R Φ⦆ → W(⟦Γ ⊢R Ψ⟧, ∆, α, δ) (called a witness function) such that

    (⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧, l) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧^♯(∆,α,δ)

is a morphism in Span(Meas). Concretely, we define validity in span-apRHL as follows:

    Γ ⊨ c₁ ∼^DP_{ε,δ} c₂ : Φ ⟹ Ψ iff ∃l. (⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧, l) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧^♯(∆^DP,ε,δ),
    Γ ⊨ c₁ ∼^{α-RDP}_ρ c₂ : Φ ⟹ Ψ iff ∃l. (⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧, l) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧^♯(D^α,∗,ρ),
    Γ ⊨ c₁ ∼^{zCDP}_{ξ,ρ} c₂ : Φ ⟹ Ψ iff ∃l. (⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧, l) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧^♯(∆^{zCDP},ξ,ρ),
    Γ ⊨ c₁ ∼^{ω-tCDP}_ρ c₂ : Φ ⟹ Ψ iff ∃l. (⟦Γ ⊢ c₁⟧, ⟦Γ ⊢ c₂⟧, l) : ⟦Γ ⊢R Φ⟧ → ⟦Γ ⊢R Ψ⟧^♯(∆^{ω-tCDP},∗,ρ).
Theorem 6.1. If Γ ⊢ c ∼ ∆ α , δ c : Φ = ⇒ Ψ is derivable in span-apRHL, then it is valid. Proof sketch. The soundness of the basic rules is derived from the unit, graded Kleisli liftings,and inclusions of the graded span-lifting {(−) ♯ ( ∆ , α , δ ) } α , δ given in Section 5. We focus here on thesoundness of the [seq] rule. Since the judgments Γ ⊢ c ∼ ∆ α , δ c ′ : Φ = ⇒ Φ ′ and Γ ⊢ c ∼ ∆ α β , δ + γ c ′ : Φ ′ = ⇒ Ψ are valid, for some witness functions l and l we have ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) : (cid:74) Φ (cid:75) → (cid:74) Φ ′ (cid:75) ♯ ( ∆ , α , δ ) , ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) : (cid:74) Φ ′ (cid:75) → (cid:74) Ψ (cid:75) ♯ ( ∆ , β , γ ) . By taking the graded Kleisli extension of the second morphism ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) , for somewitness function l given by the construction in Theorem 5.3 (Kleisli lifting), we have the followingmorphism in the category Span ( Meas ) : ( (cid:74) Γ ⊢ c (cid:75) ♯ , (cid:74) Γ ⊢ c ′ (cid:75) ♯ , l ) : (cid:74) Φ ′ (cid:75) ♯ ( ∆ , α , δ ) → (cid:74) Ψ (cid:75) ♯ ( ∆ , α β , δ + γ ) . Composing them, we conclude the validity of Γ ⊢ c ; c ∼ ∆ α β , δ + γ c ′ ; c ′ : Φ = ⇒ Ψ .The soundness of the mechanism rules are proved by interpreting known results of mechanismsfor DP, RDP, zCDP, and tCDP to span-liftings. For example, the soundness of [RDP-G] proved byinterpreting the Rényi differential privacy of Gaussian mechanism to span-liftings. First, the function f = N (− , σ ) : R → G R describing a Gaussian mechanism is measurable. From the previousresult Mironov [2017, Proposition 7] of Rényi differential privacy of the Gaussian mechanism, themeasurable function f satisfies the following implication: | x − y | ≤ r = ⇒ D α ( f ( x )|| f ( y )) ≤ αr / σ . This implies that we have the below morphism in the category
Span ( Meas ) : ( f , f , ( f × f )| Φ ) : { ( x , y ) ∈ R × R | | x − y | ≤ r } → Eq ♯ ( D α , ∗ , αr / σ ) R . From this, by straightforward calculations, we obtain the soundness of [RDP-G].Note that we need to give measurable functions l selecting witness distributions when provingthese rules—in the discrete case, these functions can be obtained by the axiom of choice. In thecase of [RDP-G], we could give the witness l = f × f directly.Similary, the soundness of the rest of mechanism rules follows from the following previousresults on DP, RDP, zCDP and tCDP: Mironov [2017, Propositions 6], Dwork et al. [2006, Proposition1], Sato [2016, Lemma 4.2] (an enhancement of Dwork and Roth [2013, Theorem 3.22]), and Bunet al. [2018, Theorem 19], and the soundness of transitive rules follows from: Olmedo [2014, Lemma4.2(iii)], Bun and Steinke [2016, Proposition 27] and Langlois et al. [2014, Lemma 4.1]. The soundnessof the conversion rules follows by applying the comparison theorems of divergences Bun andSteinke [2016, Proposition 4], Mironov [2017, Proposition 3], Bun and Steinke [2016, Lemmas 3.2,3.5], Bun et al. [2018, Lemma 8] to the following inclusion between the approximate span-liftings: ( ∆ α ≤ δ = ⇒ ∆ β ≤ γ ) = ⇒ (( id , id , id ) : ( Φ ) ♯ ( ∆ , α , δ ) → ( Φ ) ♯ ( ∆ , β , γ ) in Span ( Meas )) . □ We show how we can use the span-pRHL program logic to verify concrete programs. We stress animportant point here, since the guarantees provided by RDP, zCDP, and tCDP can all be converted inguarantees about ( ϵ , δ ) -differential privacy, one could just use the latter for analyze all the exampleswe will show. The interest however in performing as much reasoning as possible using these Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :20 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata relaxations is that one can achieve better values of the parameters. This will become particularlyevident in the last example.
As a warm up, we begin with the following classic example of a one-way marginal algorithm withadditive noise.
Algorithm 1
A mechanism estimates the attribute means procedure AttMean ( n : int , ρ : real (const.), x : bool n (dataset), i : int , y , z , w : real ) i ← y ← while i < n do y ← y + x [ i ] ; i ← i + z ← y / n ; w $ ←− Gauss ( z , / n ρ ) ;We first show the Rényi-differential privacy of AttMean . We set a typing context Γ of AttMean by x : bool n (dataset), i : int , and y , z , w : real . We show the following judgment: Γ ⊢ AttMean ∼ RDP α ρ
AttMean : adj ( x ⟨ ⟩ , x ⟨ ⟩) = ⇒ w ⟨ ⟩ = w ⟨ ⟩ . Here, the adjacent relation adj ( x ⟨ ⟩ , x ⟨ ⟩) means that two datasets x ⟨ ⟩ and x ⟨ ⟩ differs at mostin one record. Explicitly, we define it by the following relation expression: adj ( x ⟨ ⟩ , x ⟨ ⟩) = (cid:219) ≤ i ≤ n (cid:32) ( x [ i ]⟨ ⟩ (cid:44) x [ i ]⟨ ⟩) = ⇒ (cid:219) ≤ j < i , i < j ≤ n ( x [ j ]⟨ ⟩ = x [ j ]⟨ ⟩) (cid:33) . The proof of this judgment follows by splitting
AttMean into two commands
LoopAM ; NoiseG where
NoiseG = w $ ←− Gauss ( z , / n ρ ) , and LoopAM is the rest of the program. Since the loop part
LoopAM is deterministic, by standard reasoning, we obtain: Γ ⊢ LoopAM ∼ α − RDP LoopAM : adj ( x ⟨ ⟩ , x ⟨ ⟩) = ⇒ (| z ⟨ ⟩ − z ⟨ ⟩| ≤ / n ) . By applying [RDP-G], for the noise-adding step
NoiseG we have: Γ ⊢ NoiseG ∼ α − RDP α ρ
NoiseG : (| z ⟨ ⟩ − z ⟨ ⟩| ≤ / n ) = ⇒ ( w ⟨ ⟩ = w ⟨ ⟩) . Thus, by applying [seq] we complete the proof. A similar proof could have been carried out withboth the rules for differential privacy, zCDP, and tCDP. Due to the simplicity of the example (thatis,
LoopAM is deterministic), the resulting guarantee would have been the same.
Algorithm 2
A mechanism estimates the attribute means with SinhNormal noise procedure AMSinh ( n : int , ρ : real (const.), x : bool n (dataset), i : int , y , z , w : real ) i ← y ← while i < n do y ← y + x [ i ] ; i ← i + z ← y / n ; w $ ←− w + A · arsinh (cid:0) A Gauss ( , / n ρ ) (cid:1) ; Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:21
We change the noise in the algorithm
AttMean from Gaussian noise to SinhNormal noise. Explicitly,we define a new algorithm
AMSinh = LoopAM ; NoiseSinh where the noise-adding part is changedto
NoiseSinh = w $ ←− w + A · arsinh (cid:0) A Gauss ( , / n ρ ) (cid:1) , where A is a constant satisfying 1 < /√ ρ ≤ A / n . In the similar way as the previous example AttMean , for the loop part
LoopAM , weobtain: Γ ⊢ LoopAM ∼ n · A / − tCDP LoopAM : adj ( x ⟨ ⟩ , x ⟨ ⟩) = ⇒ (| z ⟨ ⟩ − z ⟨ ⟩| ≤ / n ) . By applying [tCDP-SinhG], the noise-adding part
NoiseSinh satisfies Γ ⊢ NoiseSinh ∼ n · A / − tCDP ρ NoiseSinh : (| z ⟨ ⟩ − z ⟨ ⟩| ≤ / n ) = ⇒ ( w ⟨ ⟩ = w ⟨ ⟩) . Thus, by applying [seq], we conclude that the algorithm
AMSinh is ( ρ , n · A / ) -tCDP. The following algorithm gives the histograms of dataset x over the finite set T with additive noise.We use a primitive data type T as a finite set of size T . Algorithm 3
A mechanism estimates the histogram procedure Histogram ( n int , ρ : real (const.), x : [ T ] n (dataset), y , z : real T , i : int ) i ← y ← ( , . . . , ) ; while i < n do y [ x [ i ]] ← y [ x [ i ]] + i ← i + i ← z ← ( , . . . , ) ; while i < T do z [ i ] $ ←− Gauss ( y [ i ] , / ρ ) ; i ← i + Histogram . We set a typing context Γ by x : [ T ] n (dataset), y , z : real T , and i : int . We want to prove the validity of the following judgment: Γ ⊢ Histogram ∼ zCDP , ρ Histogram : adj ( x ⟨ ⟩ , x ⟨ ⟩) = ⇒ z ⟨ ⟩ = z ⟨ ⟩ . Here, adj ( x ⟨ ⟩ , x ⟨ ⟩) is defined in the similar way as the previous algorithm. We split the algorithm Histogram into
Histogram = HGCalc ; HGNoise where
HGNoise is the second loop for adding noise,and
HGCalc is the rest of the program that calculates a histogram without noise. We can now definetwo additional assertions for 0 ≤ K (cid:44) L < T and 0 ≤ I < n : Φ I , K , L = ( x [ I ]⟨ ⟩ (cid:44) x [ I ]⟨ ⟩) ∧ ( i (cid:44) I = ⇒ x [ i ]⟨ ⟩ = x [ i ]⟨ ⟩) ∧ ( x [ I ]⟨ ⟩ = K ) ∧ ( x [ I ]⟨ ⟩ = L ) Ψ K , L = ( y [ K ]⟨ ⟩ = y [ K ]⟨ ⟩ + ) ∧ ( y [ L ]⟨ ⟩ + = y [ L ]⟨ ⟩) ∧ ( j (cid:44) K , L = ⇒ y [ j ]⟨ ⟩ = y [ j ]⟨ ⟩) . It is easy to see that adj ( x ⟨ ⟩ , x ⟨ ⟩) ⇐⇒ ∃ I , K , L . Φ I , K , L . Using this and some standard reasoning,we have Γ ⊢ HGCalc ∼ zCDP , HGCalc : Φ ( I , K , L )( x ⟨ ⟩ , x ⟨ ⟩) = ⇒ Θ ( K , L ) ∧ ( i ⟨ ⟩ = ) where Θ ( K , L ) = Ψ ( K , L )∧( z ⟨ ⟩ = z ⟨ ⟩)∧( i ⟨ ⟩ = i ⟨ ⟩) . For proving the right judgment for HGNoise we also use the following additional axiom for zCDP that concludes ( , ) -zCDP if both noises andinputs are the same (the soundness is rather straightforward): Γ ⊢ x ←− Gauss ( e , σ ) ∼ zCDP , x ←− Gauss ( e , σ ) : ( e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) . Now by using this axiom, [zCDP-G], and some basic reasoning for the loop we obtain: Γ ⊢ HGNoise ∼ zCDP , ρ HGNoise : Θ ( K , L ) ∧ ( i ⟨ ⟩ = ) = ⇒ Θ ( K , L ) ∧ ( i ⟨ ⟩ = T ) . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :22 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata
Roughly speaking, we may regard
HGNoise as a composition c [ ] ; c [ ] ; · · · ; c [ T − ] where c [ j ] isthe j -th execution of the loop body of HGNoise . For j (cid:44) K , L by using the new axiom, Γ ⊢ c [ j ] ∼ zCDP , c [ j ] : Θ ( K , L ) ∧ ( i ⟨ ⟩ = j ) = ⇒ Θ ( K , L ) ∧ ( i ⟨ ⟩ = j + ) . On the other hand, for j = K , L by applying [zCDP-G] (with σ = ρ / Γ ⊢ c [ j ] ∼ zCDP , ρ / c [ j ] : Θ ( K , L ) ∧ (⟨ ⟩ = j ) = ⇒ Θ ( K , L ) ∧ ( i ⟨ ⟩ = j + ) . Note that the second case occurs twice. The [seq] rule sums up the grading of each execution c [ j ] ,and we conclude ρ -zCDP of HGNoise . Finally, by using [seq] and some conditional computations,we complete the proof. k -fold Gaussian mechanism Consider a type
DATA of dataset and an predicate
ADJ (− , = ) of adjacency for the type DATA , andconsider K queries q ( i , −) : DATA → real (0 ≤ i < K ) with sensitivity 1, that is, ADJ ( D , D ′ ) = ⇒ | q ( i , D ) − q ( i , D ′ )| ≤ . We want now to prove private the following K -fold Gaussian mechanism. Even though standardDP can already be handled by other verification techniques, our proof applies the conversionrules between DP and zCDP along with composition in zCDP, yielding a more precise analysis forstandard DP. Algorithm 4
Sum of K Gaussian mechanisms procedure FoldG K ( K : int , σ : real (const.), D : DATA , x , y , z : real , i : int ) i ← z ← while i < K do x ← q ( i , D ) ; y $ ←− Gauss ( , σ ) ; z ← x + y + z ; i ← i + FoldG K by D : DATA , x , y , z : real , and i : int . Following sensitivity ofqueries q , for any 0 ≤ i < K we may assume Γ ⊢ x ← q ( i , D ) ∼ zCDP , x ← q ( i , D ) : ADJ ( D ⟨ ⟩ , D ⟨ ⟩) = ⇒ | x ⟨ ⟩ − x ⟨ ⟩| ≤ . Thus, for the loop body c (line 4), by applying [zCDP-G], [seq] and [assn], we have Γ ⊢ c ∼ zCDP , / σ c : ADJ ( D ⟨ ⟩ , D ⟨ ⟩) ∧ ( z ⟨ ⟩ = z ⟨ ⟩) = ⇒ z ⟨ ⟩ = z ⟨ ⟩ . Then, by applying [assn], [seq], and [while] (the proof rule for while-loop) rules, we conclude Γ ⊢ FoldG K ∼ zCDP , K / σ FoldG K : ADJ ( D ⟨ ⟩ , D ⟨ ⟩) = ⇒ z ⟨ ⟩ = z ⟨ ⟩ . Hence, the algorithm
FoldG K is ( , K / σ ) -zCDP. Furthermore, by applying [z/D], we concludethat the algorithm FoldG K is (cid:18) K σ + √ K log ( / δ ) σ , δ (cid:19) -DP for any 0 < δ < / < δ < /
2, the loop body c satisfies Γ ⊢ c ∼ DP max (( + √ )/ σ , √ ( . / δ )/ σ ) , δ c : adj ( D ⟨ ⟩ , D ⟨ ⟩) ∧ ( z ⟨ ⟩ = z ⟨ ⟩) = ⇒ z ⟨ ⟩ = z ⟨ ⟩ . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:23
Let ε = max (( + √ )/ σ , (cid:112) ( . / δ )/ σ ) . The algorithm FoldG K can be seen as K -fold adaptivecomposition of the loop body c ; · · · ; c . By applying the advanced composition theorem [Dworkand Roth 2013, Theorem 3.20], the algorithm FoldG K is (cid:16) ε · (cid:112) K log ( / δ ) + Kε , Kδ + δ (cid:17) -DP for any 0 < δ , δ < / . We compare this bound and the bound given in the avove. When δ < .
4, we have 2 log ( . / δ ) > ε > . / σ by the definition. Then, we can compute: K σ + (cid:112) K log ( / δ ) σ < K σ + (cid:112) K log ( / δ ) σ · (cid:112) ( . / δ ) ≤ ε · (cid:112) K log ( / δ ) + Kε . Hence, ε · (cid:112) K log ( / δ ) + Kε > K σ + √ K log ( / δ ) σ whenever δ = Kδ + δ and δ < . FoldG . First, in the verification via zCDP, the approximation error δ is given regardless ofthe number of queries K . Second, if the approximation error satisfies δ < . δ < . δ in the ( ε , δ ) -DP is thought as the probability of failure of ε -DP. Moreover inpractical use of ( ε , δ ) -DP, the parameter δ is usually taken to be quite small (e.g., δ ≈ − ). f -divergences As we have mentioned, our work is inspired by work on verifying probabilistic relational propertiesinvolving f -divergences by Barthe and Olmedo [2013]; we generalize their results to a broaderclass of divergences and also to handle continuous distributions. Barthe and Olmedo also consider f -divergences that satisfy a more limited version of composability, called weak composability .Roughly, these composition results only apply when corresponding pairs of distributions haveequal weight; the KL-divergence, Hellinger distance, and χ divergences only satisfy this weakerversion of composability. While we do not detail this extension, our framework can naturally handleweakly composable divergences in the continuous case.A similar approach has also been used by Barthe et al. [2016a] in the context of an higher orderfunctional language for reasoning about Bayesian inference. Their type system uses a gradedmonad to reason about f -divergences. The graded monad supports only discrete distributions andis interpreted via a set-theoretic semantics, again using the lifting by Barthe and Olmedo [2013]. Approximate relational liftings were originally proposed for program logics targeting differentialprivacy. The first such system used a one-witness definition of lifting [Barthe et al. 2013], whichwas subsequently refined to several notions of two-witness lifting [Barthe et al. 2016b; Barthe andOlmedo 2013]. Sato [2016] developed approximate liftings and a program logic for continuous distri-bution using witness-free lifting based on a categorical monad lifting [Katsumata 2005; Katsumataand Sato 2015]. A witness-free relational lifting for differential privacy was introduced by Sato[2016]. This can be seen as an application of the general construction of graded relational lifting [Katsumata 2014, Section 5] to the Giry monad, using the technique of codensity lifting [Katsumataand Sato 2015, Section 3.3] instead of ⊤⊤ -lifting. The witness-free relational lifting by Sato [2016] Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :24 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata sends a binary relation R between measurable spaces X , Y to the following one between G X , G Y : R ⊤⊤( ε , δ ) = (cid:217) ( k , l ) : R (cid:219)→ S ( ε ′ , δ ′) ( k ♯ × l ♯ ) − S ( ε + ε ′ , δ + δ ′ ) where S ( ε ′ , δ ′ ) = (cid:110) ( x , y ) ∈ G × G | x ≤ e ε ′ y + δ ′ (cid:111) . where G is the sub-Giry monad, k ♯ and l ♯ denote the Kleisli extensions of k and l respectively, (cid:219)→ denotes a relation-preserving map, and ⊤⊤ is used to denote the codensity lifting and todistinguish it from our 2-witness lifting. Here, the intersection is taken over all measurable functions k : X → G , l : Y → G R to those related by S ( ε ′ , δ ′ ) . 
We note that thebinary relation S ( ε ′ , δ ′ ) is a parameter of this witness-free lifting, and by changing it, we can deriveother graded relational liftings of G .Checking the membership for R ⊤⊤( ε , δ ) is complex: we have to test the pair ( x , y ) against everypair ( k , l ) of measurable functions such that ( k , l ) : R (cid:219)→ S ( ε , δ ) . Fortunately, since the divergence ∆ DP ( ε ) is defined by a linear inequality of measures, the witness-free lifting R ⊤⊤( ϵ , δ ) can be simplified to the following R ⊤⊤( ε , δ ) = { ( d , d ) ∈ G X × G Y | ∀ A ⊆ Σ X . d ( A ) ≤ e ε d ( R ( A )) + δ } . While we would like to generalize this lifting construction to handle more general divergences forRDP, zCDP, and tCDP, there are at least two obstacles. First, it is not clear how to find a parameter S to derive the suitable graded relational lifting for a given general divergence. Second, even if wecan find a suitable parameter S , it is awkward to work with the lifting unless we can simplify thelarge intersection into a more convenient form. In contrast, 2-witness liftings seem more concreteand easier to work with: It suffices to give witness distributions to check the membership of liftedrelations.In the discrete case, witness-free liftings are equivalent to the witness-/span-based liftingsby Barthe et al. [2017]. Recent work also considers liftings with more fine-grained parameters thatcan vary over different pairs of samples [Albarghouthi and Hsu 2018]. Rényi and zero-concentrated differential privacy were recently proposed in the differential privacyliterature; to the best of our knowledge, we are the first to verify these properties. In contrast, thereare now numerous systems targeting differential privacy using a wide range of techniques beyondprogram logics, including dynamic analyses [McSherry 2009], linear [Azevedo de Amorim et al.2014; Gaboardi et al. 2013; Reed and Pierce 2010] and dependent [Barthe et al. 2015] type systems,product programs [Barthe et al. 2014], partial evaluation [Winograd-Cort et al. 2017], and constraint-solving [Albarghouthi and Hsu 2018; Zhang and Kifer 2017]; see the recent survey [Barthe et al.2016c] for more details.
We have developed a framework for reasoning about three relaxations of differential privacy: Rényidifferential privacy, zero concentrated differential privacy, and truncated concentrated differentialprivacy. We extended the notion of divergences to a more general class, and to support subprobabilitymeasures. Additionally, we have introduced a novel notion of approximate span-lifting supportingthese divergences and continuous distributions.One promising direction for future work is to study the moment-accountant compositionmethod [Abadi et al. 2016]. This composition method tracks the moments of the privacy loss
Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:25 random variable, although it does not directly correspond to composition for RDP or zCDP. An-other interesting direction would be to analyze recently-proposed RDP mechanisms for posteriorsampling [Geumlek et al. 2017], and the GAP-Max tCDP algorithm by Bun et al. [2018].
REFERENCES
Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. DeepLearning with Differential Privacy. In
ACM SIGSAC Conference on Computer and Communications Security (CCS), Vienna,Austria . 308–318. https://doi.org/10.1145/2976749.2978318Aws Albarghouthi and Justin Hsu. 2018. Synthesizing Coupling Proofs of Differential Privacy.
Proceedings of the ACM onProgramming Languages
2, POPL, Article 58 (Jan. 2018). https://doi.org/10.1145/3158146 arXiv:cs.PL/1709.05361 Appearedat ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Los Angeles, California.Arthur Azevedo de Amorim, Marco Gaboardi, Emilio Jesús Gallego Arias, and Justin Hsu. 2014. Really natural linearindexed type-checking. In
Implementation of Functional Languages (IFL), Boston, Massachusetts . ACM Press, 5:1–5:12.http://arxiv.org/abs/1503.04522Gilles Barthe, Thomas Espitau, Justin Hsu, Tetsuya Sato, and Pierre-Yves Strub. 2017. ⋆ -Liftings for Differential Privacy.In International Colloquium on Automata, Languages and Programming (ICALP), Warsaw, Poland (Leibniz InternationalProceedings in Informatics) , Vol. 80. Schloss Dagstuhl–Leibniz Center for Informatics, 102:1–102:12. https://doi.org/10.4230/LIPIcs.ICALP.2017.102Gilles Barthe, Gian Pietro Farina, Marco Gaboardi, Emilio Jesús Gallego Arias, Andy Gordon, Justin Hsu, and Pierre-YvesStrub. 2016a. Differentially Private Bayesian Programming. In
ACM SIGSAC Conference on Computer and CommunicationsSecurity (CCS), Vienna, Austria . 68–79. https://doi.org/10.1145/2976749.2978371Gilles Barthe, Noémie Fong, Marco Gaboardi, Benjamin Grégoire, Justin Hsu, and Pierre-Yves Strub. 2016b. Advancedprobabilistic couplings for differential privacy. In
ACM SIGSAC Conference on Computer and Communications Security(CCS), Vienna, Austria . 55–67. https://arxiv.org/abs/1606.07143Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, César Kunz, and Pierre-Yves Strub. 2014. ProvingDifferential Privacy in Hoare Logic. In
IEEE Computer Security Foundations Symposium (CSF), Vienna, Austria . 411–424.https://doi.org/10.1109/CSF.2014.36 arXiv:cs.LO/1407.2988Gilles Barthe, Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, Aaron Roth, and Pierre-Yves Strub. 2015. Higher-OrderApproximate Relational Refinement Types for Mechanism Design and Differential Privacy. In
ACM SIGPLAN–SIGACTSymposium on Principles of Programming Languages (POPL), Mumbai, India . 55–68. https://doi.org/10.1145/2676726.2677000 arXiv:cs.PL/1407.6845Gilles Barthe, Marco Gaboardi, Justin Hsu, and Benjamin C. Pierce. 2016c. Programming language techniques for differentialprivacy.
ACM SIGLOG News
3, 1 (Jan. 2016), 34–53. http://siglog.hosting.acm.org/wp-content/uploads/2016/01/siglog_news_7.pdfGilles Barthe, Boris Köpf, Federico Olmedo, and Santiago Zanella-Béguelin. 2013. Probabilistic Relational Reasoning forDifferential Privacy.
ACM Transactions on Programming Languages and Systems
35, 3 (Nov. 2013), 9:1–9:49. https://doi.org/10.1145/2492061Gilles Barthe and Federico Olmedo. 2013. Beyond Differential Privacy: Composition Theorems and Relational Logic for f -Divergences between Probabilistic Programs. In International Colloquium on Automata, Languages and Programming(ICALP), Riga, Latvia (Lecture Notes in Computer Science) , Vol. 7966. Springer-Verlag, 49–60. https://doi.org/10.1007/978-3-642-39212-2_8Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke. 2018. Composable and Versatile Privacy via TruncatedCDP. In
ACM SIGACT Symposium on Theory of Computing (STOC), Los Angeles, California .Mark Bun and Thomas Steinke. 2016. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. In
IACR Theory of Cryptography Conference (TCC), Beijing, China (Lecture Notes in Computer Science) , Vol. 9985. Springer-Verlag, 635–658. https://doi.org/10.1007/978-3-662-53641-4_24 arXiv:cs.CR/1605.02065Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. 2006. Calibrating Noise to Sensitivity in Private DataAnalysis. In
IACR Theory of Cryptography Conference (TCC), New York, New York . Lecture Notes in Computer Science,Vol. 3876. Springer-Verlag, 265–284. https://doi.org/10.1007/11681878_14Cynthia Dwork and Aaron Roth. 2013. The Algorithmic Foundations of Differential Privacy.
Foundations and Trends® inTheoretical Computer Science
9, 3–4 (2013). https://doi.org/10.1561/0400000042C. Dwork, G. N. Rothblum, and S. Vadhan. 2010. Boosting and Differential Privacy. In
IEEE Symposium on Foundations ofComputer Science (FOCS), Las Vegas, Nevada . 51–60. https://doi.org/10.1109/FOCS.2010.12Soichiro Fujii, Shin-ya Katsumata, and Paul-André Melliès. 2016. Towards a Formal Theory of Graded Monads. In
Foundationsof Software Science and Computation Structures - 19th International Conference, FOSSACS 2016, Held as Part of the EuropeanJoint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings .Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :26 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata
ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), Rome, Italy .357–370. https://doi.org/10.1145/2429069.2429113Joseph Geumlek, Shuang Song, and Kamalika Chaudhuri. 2017. Renyi Differential Privacy Mechanisms for PosteriorSampling. In
Conference on Neural Information Processing Systems (NIPS), Long Beach, California . 5295–5304. http://arxiv.org/abs/1710.00892Michèle Giry. 1982. A categorical approach to probability theory. In
Categorical Aspects of Topology and Analysis , B. Ba-naschewski (Ed.). Lecture Notes in Mathematics, Vol. 915. Springer-Verlag, 68–85. https://doi.org/10.1007/BFb0092872Shin-ya Katsumata. 2005. A Semantic Formulation of TT-Lifting and Logical Predicates for Computational Metalanguage.In
International Workshop on Computer Science Logic (CSL), Oxford, England , Luke Ong (Ed.). Lecture Notes in ComputerScience, Vol. 3634. Springer-Verlag, 87–102. https://doi.org/10.1007/11538363_8Shin-ya Katsumata. 2014. Parametric Effect Monads and Semantics of Effect Systems. In
ACM SIGPLAN–SIGACT Symposiumon Principles of Programming Languages (POPL), San Diego, California . 633–645. https://doi.org/10.1145/2535838.2535846Shin-ya Katsumata and Tetsuya Sato. 2015. Codensity Liftings of Monads. In , Vol. 35. Schloss Dagstuhl–LeibnizCenter for Informatics, 156–170. https://doi.org/10.4230/LIPIcs.CALCO.2015.156Adeline Langlois, Damien Stehlé, and Ron Steinfeld. 2014. GGHLite: More Efficient Multilinear Maps from Ideal Lattices.(2014), 239–256. https://doi.org/10.1007/978-3-642-55220-5_14Friedrich Liese and Igor Vajda. 2006. On Divergences and Informations in Statistics and Information Theory.
IEEETransactions on Information Theory
52, 10 (Oct 2006), 4394–4412. https://doi.org/10.1109/TIT.2006.881731Frank McSherry. 2009. Privacy Integrated Queries. In
ACM SIGMOD International Conference on Management of Data(SIGMOD), Providence, Rhode Island . 19–30. https://doi.org/10.1145/1559845.1559850Ilya Mironov. 2017. Rényi Differential Privacy. In
IEEE Computer Security Foundations Symposium (CSF), Santa Barbara,California . 263–275. https://doi.org/10.1109/CSF.2017.11Federico Olmedo. 2014.
Approximate Relational Reasoning for Probabilistic Programs . Ph.D. Dissertation. Technical Universityof Madrid.Prakash Panangaden. 1999. The Category of Markov Kernels.
Electronic Notes in Theoretical Computer Science
22 (1999),171–187. https://doi.org/10.1016/S1571-0661(05)80602-4M. C. Pardo and I. Vajda. 1997. About distances of discrete distributions satisfying the data processing theorem of informationtheory.
IEEE Transactions on Information Theory
43, 4 (Jul 1997), 1288–1293. https://doi.org/10.1109/18.605597Jason Reed and Benjamin C. Pierce. 2010. Distance Makes the Types Grow Stronger: A Calculus for Differential Privacy.In
ACM SIGPLAN International Conference on Functional Programming (ICFP), Baltimore, Maryland . 157–168. http://dl.acm.org/citation.cfm?id=1863568Alfred Renyi. 1961. On Measures of Entropy and Information. In
Berkeley Symposium on Mathematical Statistics andProbability, Volume 1: Contributions to the Theory of Statistics . University of California Press, Berkeley, Calif., 547–561.http://projecteuclid.org:443/euclid.bsmsp/1200512181Walter Rudin. 1987.
Real and complex analysis (third ed.). McGraw-Hill Book Co., New York. xiv+416 pages.Tetsuya Sato. 2016. Approximate Relational Hoare Logic for Continuous Random Samplings.
Electronic Notes in TheoreticalComputer Science
325 (2016), 277–298. https://doi.org/10.1016/j.entcs.2016.09.043 Conference on the MathematicalFoundations of Programming Semantics (MFPS), Pittsburgh, Pennsylvania.Tim Van Erven and Peter Harremoës. 2014. Rényi Divergence and Kullback-Leibler Divergence.
IEEE Transactions onInformation Theory
60, 7 (July 2014), 3797–3820. https://doi.org/10.1109/TIT.2014.2320500Daniel Winograd-Cort, Andreas Haeberlen, Aaron Roth, and Benjamin C. Pierce. 2017. A Framework for AdaptiveDifferential Privacy.
Proceedings of the ACM on Programming Languages
1, ICFP, Article 10 (2017), 29 pages. https://doi.org/10.1145/3110254Danfeng Zhang and Daniel Kifer. 2017. LightDP: Towards Automating Differential Privacy Proofs. In
ACM SIGPLAN–SIGACTSymposium on Principles of Programming Languages (POPL), Paris, France . 888–901. https://doi.org/10.1145/3009837.3009884Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:27
A CONTINUITY OF f -DIVERGENCES OF SUBPROBABILITY MEASURES In this section we show the subprobability version of continuity of f -divergences [Liese and Vajda2006, Theorem 16] in a different way from the paper [Liese and Vajda 2006].Theorem A.1 (Theorem 4.7 / Subprobability version of[Liese and Vajda 2006, Theorem16]). For any weight function f , the f -divergence ∆ f is continuous: for any subprobability measures µ , µ ∈ G X on X , we have ∆ fX ( µ , µ ) = sup (cid:40) n (cid:213) i = µ ( A i ) f (cid:18) µ ( A i ) µ ( A i ) (cid:19) | { A i } ni = is a measurable finite partition of X (cid:41) . To prove this proposition, we introduce the singularity of measures. Two measures µ and µ on X are said to be mutually singular (written ν ⊥ ν ) if there are partition A , A ∈ Σ X of X suchthat µ i ( E ) = µ i ( A i ∩ E ) for any E ∈ Σ X ( i = , Let µ and µ be σ -finite measures on X .There are unique finite measures µ • and µ ⊥ on X such that µ • ≪ µ and µ ⊥ ⊥ µ . We recall that the f -divergence for subprobability measures is defined by for any µ , µ , µ ∈ G X such that µ , µ ≪ µ , ∆ fX ( µ , µ ) = ∫ X dµ dµ f (cid:18) dµ / dµdµ / dµ (cid:19) dµ . We remark that µ satisfying µ , µ ≪ µ always exists (e.g. ( µ + µ )/ ∆ fX ( µ , µ ) does notdepend on the choice of µ . We want to prove the continuity: ∆ fX ( µ , µ ) = sup (cid:40) n (cid:213) i = µ ( A i ) f (cid:18) µ ( A i ) µ ( A i ) (cid:19) | { A i } ni = is a finite measurable partition of X (cid:41) . We define the following restricted sum of f -divergences. For any measurable subset D ∈ Σ X , ∆ fX ( µ , µ )| D = ∫ D dµ dµ f (cid:18) dµ / dµdµ / dµ (cid:19) dµ . ∆ fX ( µ , µ )| D = sup (cid:40) n (cid:213) i = µ ( A i ) f (cid:18) µ ( A i ) µ ( A i ) (cid:19) | { A i } ni = is a finite measurable partition of D (cid:41) = sup (cid:40) (cid:213) i ∈ I µ ( k − ( i )) f (cid:18) µ ( k − ( i )) µ ( k − ( i )) (cid:19) | I ∈ Fin , k : D → I (cid:41) Of course, ∆ fX ( µ , µ ) = ∆ fX ( µ , µ )| X . We write ∆ fX ( µ , µ ) = ∆ fX ( µ , µ )| X We temporary consider a positive weight function f .Lemma A.3. If µ ≪ µ then ∆ fX ( µ , µ )| D ≤ ∆ fX ( µ , µ )| D for any D ∈ Σ X . Proof. Since µ ≪ µ , we may assume µ = µ (hence dµ / dµ = ∆ fX ( µ , µ )| D = ∫ D f ( d µ d µ ) dµ . Since f is convex, there is α ∈ R ≥ which makes that f is monotone increasing onthe interval [ , α ) and monotone decreasing on [ α , ∞) . Let { A i } ni = be an arbitrary finite partitionof D which is finer than the partition { d µ d µ − ([ , α )) ∩ D , d µ d µ − ([ α , ∞)) ∩ D } . The function f ◦ d µ d µ iseither monotone increasing or monotone decreasing, on each partition A i . Hence, inf x ∈ A i f ( d µ d µ )( x ) Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :28 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata is either f ( inf x ∈ A i d µ d µ ( x )) or f ( sup x ∈ A i d µ d µ ( x )) . From the mean-value theorem for measures, weobtain inf x ∈ A i dµ dµ ( x ) ≤ µ ( A i ) µ ( A i ) ≤ sup x ∈ A i dµ dµ ( x ) . Hence, n (cid:213) i = µ ( A i ) inf x ∈ A i f ( dµ dµ )( x ) ≤ n (cid:213) i = µ ( A i ) f ( µ ( A i ) µ ( A i ) ) . Since { A i } ni = is arbitrary, we conclude ∆ fX ( µ , µ )| D ≤ ∆ fX ( µ , µ )| D . □ Lemma A.4. If µ ≪ µ and the Radon-Nikodym derivative dµ / dµ is bounded on D then ∆ fX ( µ , µ )| D = ∆ fX ( µ , µ )| D . Proof. We fix a positive integer 1 ≤ K ∈ N such that 0 ≤ d µ d µ ≤ M . 
For given N ∈ N , we definethe partition { A i } N Ki = of D by A i = (cid:32)(cid:18) dµ dµ (cid:19) − ( B i ) (cid:33) ∩ D , B i = [ i N , i + N ) ≤ i < N { } i = N ( i − N , i N ] N < i ≤ N K . Since µ ≪ µ and 0 f ( / ) =
0, if µ ( A i ) = µ ( A i ) µ ( A i ) µ ( A i ) =
0. If µ ( A i ) > (cid:12)(cid:12)(cid:12) d µ d µ ( x ) − µ ( A i ) µ ( A i ) (cid:12)(cid:12)(cid:12) ≤ −( N − ) for all x ∈ A i , from the definition of { A i } N Ki = , i − N ≤ inf x ∈ A i dµ dµ ( x ) ≤ µ ( A i ) µ ( A i ) ≤ sup x ∈ A i dµ dµ ( x ) ≤ i + N . Consider an arbitrary ε >
0. Since f is uniformly continuous on the closed interval [ , K ] , thereare large enough N ∈ N and the corresponding partition { A i } N Ki = such that µ ( A i ) > = ⇒ (cid:12)(cid:12)(cid:12)(cid:12) inf x ∈ A i f ( dµ dµ )( x ) − f ( µ ( A i ) µ ( A i ) ) (cid:12)(cid:12)(cid:12)(cid:12) < ε Hence, for any partition { C i } ni = of D finer than { A i } N Ki = , we obtain n (cid:213) i = µ ( C i ) f (cid:18) µ ( C i ) µ ( C i ) (cid:19) ≤ n (cid:213) i = µ ( C i ) (cid:18) inf x ∈ C i f (cid:18) dµ dµ (cid:19) ( x ) (cid:19) + ε . This implies ∆ fX ( µ , µ )| D ≤ ∆ fX ( µ , µ )| D + ε . Since ε > ∆ fX ( µ , µ )| D ≤ ∆ fX ( µ , µ )| D . □ Lemma A.5.
We have ∆ fX ( µ , µ )| D = ∆ fX ( µ , µ )| D when µ ≪ µ . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:29
Proof. Let D n = (cid:18)(cid:16) d µ d µ (cid:17) − [ n , n + ) (cid:19) ∩ D ( n ∈ N ). From Jensen’s inequality, we obtain for anypartition { A i } mi = of D , m (cid:213) i = µ ( A i ) f (cid:18) µ ( A i ) µ ( A i ) (cid:19) = m (cid:213) i = ( (cid:213) n ∈ N µ ( D n ∩ A i )) f (cid:18) (cid:205) n ∈ N µ ( D n ∩ A i ) (cid:205) n ∈ N µ ( D n ∩ A i ) (cid:19) ≤ m (cid:213) i = (cid:213) n ∈ N µ ( D n ∩ A i ) f (cid:18) µ ( D n ∩ A i ) µ ( D n ∩ A i ) (cid:19) = (cid:213) n ∈ N m (cid:213) i = µ ( D n ∩ A i ) f (cid:18) µ ( D n ∩ A i ) µ ( D n ∩ A i ) (cid:19) This implies ∆ fX ( µ , µ )| D ≤ (cid:205) ∞ n = ∆ fX ( µ , µ )| D n for each n ∈ N .Since the Radon-Nikodym derivative d µ d µ is bounded on each D n , by Lemmas A.3 and A.4, ∆ fX ( µ , µ )| D n = ∆ fX ( µ , µ )| D n for each n ∈ N . Hence, ∆ fX ( µ , µ ) ≤ ∞ (cid:213) n = ∆ fX ( µ , µ )| D n = ∞ (cid:213) n = ∆ fX ( µ , µ )| D n = ∆ fX ( µ , µ ) ≤ ∆ fX ( µ , µ ) . This implies ∆ fX ( µ , µ ) = ∆ fX ( µ , µ ) . □ Theorem 4.7, Positive Case. We show that for any positive weight function f , the continuity ∆ fX ( µ , µ ) = ∆ fX ( µ , µ ) holds. Let ( µ • , µ ⊥ ) be the Lebesgue decomposition of µ with respectto µ . Since ( µ • , µ ⊥ ) is the Lebesgue decomposition of µ with respect to µ , there is A ∈ Σ X such that µ ( E ) = µ ( E \ A ) and µ ⊥ ( E ) = µ ⊥ ( E ∩ A ) for any E ∈ Σ X . The subset A also satisfies µ ( E \ A ) = µ • ( E \ A ) for any E ∈ Σ X . We then obtain ∆ fX ( µ , µ ) = ∆ fX ( µ , µ )| X \ A + ∆ fX ( µ , µ )| A = ∆ fX ( µ • , µ )| X \ A + ∆ fX ( µ ⊥ , µ )| A = ∆ fX ( µ • , µ )| X \ A + ∆ fX ( µ ⊥ , µ )| A = ∆ fX ( µ , µ )| X \ A + ∆ fX ( µ , µ )| A = ∆ fX ( µ , µ ) From Lemma A.5, ∆ fX ( µ • , µ )| X \ A = ∆ fX ( µ • , µ )| X \ A holds, and using the dual f ∗ we have ∆ fX ( µ ⊥ , µ )| A = ∫ A dµ dµ f (cid:18) dµ ⊥ / dµdµ / dµ (cid:19) dµ = ∫ A f ∗ ( ) dµ ⊥ dµ dµ = f ∗ ( ) µ ( A ) = ∆ fX ( µ ⊥ , µ )| A . □ Theorem 4.7, General case. We show the continuity of ∆ f for arbitrary weight function f . Let α , β : R ≥ → R the functions be defined by α ( t ) = a and β ( t ) = bt respectively where a , b ≥ f is convex, there are α and β that makes f + α + β positive. Hence, ∆ fX ( µ , µ ) + aµ ( X ) + bµ ( X ) = ∆ f + α + βX ( µ , µ ) = ∆ f + α + βX ( µ , µ ) = ∆ fX ( µ , µ ) + aµ ( X ) + bµ ( X ) . This completes the proof. □ Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :30 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata
B OMITTED STRUCTURES OF THE PROGRAM LOGICB.1 Typing Rules for Expressions and Programs
Before we give the semantics of programs, we first give a type system for expressions, distributions,and programs. A typing context is a finite set Γ = { x : τ , x : τ , . . . , x n : τ n } of pairs of a variableand a value type such that each variable occurs only once in the context. The type system is largelystandard, with two kinds of judgments: Γ ⊢ t e : τ states that expression e has type τ in context Γ ,while Γ ⊢ p ν : τ states that ν is a distribution over τ in context Γ . The third judgment Γ ⊢ c statesthat program c is well-typed in context Γ , e.g., all guards are booleans, assignments are well-typed,etc. The expression typing rules are as follows: x : τ ∈ ΓΓ ⊢ t x : τ Γ ⊢ t e : τ Γ ⊢ t e : τ Γ ⊢ t e ⊕ e : τ Γ ⊢ t e : τ Γ ⊢ t e : τ Γ ⊢ t e ▷◁ e : bool Γ ⊢ t e : τ d Γ ⊢ t e : int Γ ⊢ t e [ e ] : τ Γ ⊢ t e : real Γ ⊢ p Bern ( e ) : bool Γ ⊢ t e : real Γ ⊢ t e : real Γ ⊢ p Lap ( e , e ) : real Γ ⊢ t e : real Γ ⊢ t e : real Γ ⊢ p Gauss ( e , e ) : real Γ ⊢ t e : τ Γ ⊢ p Dirac ( e ) : τ Γ ⊢ skip Γ , x : τ ⊢ p ν : τ Γ , x : τ ⊢ x $ ←− ν Γ ⊢ c Γ ⊢ c Γ ⊢ c ; c Γ ⊢ t b : bool Γ ⊢ c Γ ⊢ c Γ ⊢ if b then c else c Γ ⊢ t b : bool Γ ⊢ c Γ ⊢ while b do c B.1.1 Forming Relation Expressions.
The judgment Γ ⊢ R Φ states that the relation expression Φ is well-formed in context Γ . Γ ⊢ t e ▷◁ e : bool Γ ⊢ R e ⟨ ⟩ ▷◁ e ⟨ ⟩ Γ ⊢ t ( e ⊕ e ) ▷◁ ( e ⊕ e ) : bool Γ ⊢ R ( e ⟨ ⟩ ⊕ e ⟨ ⟩) ▷◁ ( e ⟨ ⟩ ⊕ e ⟨ ⟩) Γ ⊢ R Φ Γ ⊢ R ΨΓ ⊢ R Φ ∧ Ψ Γ ⊢ R Φ Γ ⊢ R ΨΓ ⊢ R Φ ∨ Ψ Γ ⊢ R ΦΓ ⊢ R ¬ Φ B.1.2 Basic proof rules.
The basic proof rules are given in Figure 5.
B.2 mechanism rules
Figure 6 is the list of mechanism rules in span-apRHL.
B.3 Denotational Semantics of pWHILE
To prove the soundness of span-apRHL we interpret pWHILE in
Meas using the sub-Giry monad G . First, we interpret the value types bool , int , and real as the finite discrete space B = + = { true , false } , the countable discrete space Z = { , , . . . } , and the Lebesgue measurablespace R respectively. We interpret τ d as the product (cid:74) τ (cid:75) d and we interpret a typing context Γ = { x : τ , x : τ , . . . , x n : τ n } as a product (cid:74) τ (cid:75) × (cid:74) τ (cid:75) × · · · × (cid:74) τ n (cid:75) .To give a semantics to expressions, distribution expressions, and commands, we interpret theirassociated typing/well-formedness judgments in a context Γ . We interpret an expression judgment Γ ⊢ t e : τ as a measurable function (cid:74) Γ ⊢ t e : τ (cid:75) : (cid:74) Γ (cid:75) → (cid:74) τ (cid:75) ; for instance, the variable case Γ ⊢ t x : τ is interpreted as the projection π x : (cid:74) Γ (cid:75) → (cid:74) τ (cid:75) .We interpret a reference (cid:74) Γ ⊢ t e [ e ] : τ (cid:75) of an element by ref ⟨ τ , n ⟩( (cid:74) Γ ⊢ t e (cid:75) , (cid:74) Γ ⊢ t e (cid:75) ) whereref ⟨ τ , n ⟩ : (cid:74) τ (cid:75) n × Z → (cid:74) τ (cid:75) is defined by ref ⟨ τ , n ⟩(( x , . . . , x n − ) , k ) = x min ( max ( k , ) , n ) . We can describe it categorically by using products and coproducts in
Meas .Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:31 Γ ⊢ x ← e ∼ ∆ A , x ← e : Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} = ⇒ Φ [assn] Γ ⊢ c ∼ ∆ α , δ c ′ : Φ = ⇒ Φ ′ Γ ⊢ c ∼ ∆ β , γ c ′ : Φ ′ = ⇒ Ψ [seq] Γ ⊢ c ; c ∼ ∆ α β , δ + γ c ′ ; c ′ : Φ = ⇒ ΨΓ ⊢ skip ∼ ∆ A , skip : Φ = ⇒ Φ [skip] Γ ⊢ I Φ = ⇒ b ⟨ ⟩ = b ′ ⟨ ⟩ Γ ⊢ c ∼ ∆ α , δ c ′ : Φ ∧ b ⟨ ⟩ = ⇒ Ψ Γ ⊢ c ∼ ∆ α , δ c ′ : Φ ∧ ¬ b ⟨ ⟩ = ⇒ Ψ [cond] ⊢ if b then c else c ∼ ∆ α , δ if b ′ then c ′ else c ′ : Φ = ⇒ ΨΓ ⊢ t e : int Γ ⊢ I Θ = ⇒ Θ ∧ ( b ⟨ ⟩ = b ⟨ ⟩) Γ ⊢ I Θ ∧ ( e ⟨ ⟩ ≥ n ) = ⇒ Θ ∧ ¬ b ⟨ ⟩ ∀ ≤ k ≤ n − . Γ ⊢ c ∼ ∆ α k , δ k c : Θ ∧ ( e ⟨ ⟩ = k ) ∧ ( e ⟨ ⟩ ≤ n ) = ⇒ Θ ∧ ( e ⟨ ⟩ > k ) [while] Γ ⊢ while b do c ∼ ∆ (cid:206) n − k = α k , (cid:205) n − k = δ k while b do c : Θ ∧ b ⟨ ⟩ ∧ ( e ⟨ ⟩ ≥ ) = ⇒ Θ ∧ ¬ b ⟨ ⟩ Γ ⊢ c ∼ ∆ α , δ c : Φ = ⇒ Ψ Γ ⊢ c ∼ ∆ α , δ c : Φ = ⇒ Ψ [case] Γ ⊢ c ∼ ∆ α , δ c : Φ ∨ Φ = ⇒ ΨΓ ⊢ I Φ ′ = ⇒ Φ Γ ⊢ I Ψ = ⇒ Ψ ′ Γ ⊢ c ∼ ∆ α , δ c : Φ = ⇒ Ψ α ≤ β δ ≤ γ [weak] Γ ⊢ c ∼ ∆ β , γ c : Φ ′ = ⇒ Ψ ′ Fig. 5. Basic rules.
All operators ⊕ and comparisons ▷◁ are interpreted as measurable functions ⊕ : (cid:74) τ (cid:75) × (cid:74) τ (cid:75) → (cid:74) τ (cid:75) and ▷◁ : (cid:74) τ (cid:75) × (cid:74) τ (cid:75) → (cid:74) bool (cid:75) respectively. Likewise, we interpret a distribution expression judgment Γ ⊢ p ν : τ as a measurable function (cid:74) Γ ⊢ p ν : τ (cid:75) : (cid:74) Γ (cid:75) → G (cid:74) τ (cid:75) as follows: (cid:74) Γ ⊢ p Dirac ( e ) : τ (cid:75) = η (cid:74) τ (cid:75) ◦ (cid:74) Γ ⊢ t e : τ (cid:75) , (cid:74) Γ ⊢ p Bern ( e ) : bool (cid:75) = Bern ( (cid:74) Γ ⊢ t e : real (cid:75) ) , (cid:74) Γ ⊢ p Lap ( e , e ) : real (cid:75) = Lap ( (cid:74) Γ ⊢ t e : real (cid:75) , (cid:74) Γ ⊢ t e : real (cid:75) ) , (cid:74) Γ ⊢ p Gauss ( e , e ) : real (cid:75) = N ( (cid:74) Γ ⊢ t e : real (cid:75) , (cid:74) Γ ⊢ t e : real (cid:75) ) . Finally, we interpret a command judgment Γ ⊢ c inductively as a measurable function (cid:74) Γ ⊢ c (cid:75) : (cid:74) Γ (cid:75) →G (cid:74) Γ (cid:75) by (cid:74) Γ ⊢ x $ ←− ν (cid:75) = G( rw ⟨ Γ | x : τ ⟩) ◦ st (cid:74) Γ (cid:75) , (cid:74) τ (cid:75) ◦ ⟨ id (cid:74) Γ (cid:75) , (cid:74) ν (cid:75) ⟩ , (cid:74) Γ ⊢ c ; c (cid:75) = (cid:74) Γ ⊢ c (cid:75) ♯ ◦ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ skip (cid:75) = η (cid:74) Γ (cid:75) (cid:74) Γ ⊢ if b then c else c (cid:75) = [ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c (cid:75) ] ◦ br ⟨ Γ ⟩ ◦ ⟨ (cid:74) Γ ⊢ b (cid:75) , id (cid:74) Γ (cid:75) ⟩ Here, rw ⟨ Γ | x : τ ⟩ : (cid:74) Γ (cid:75) × (cid:74) x : τ (cid:75) → (cid:74) Γ (cid:75) ( x : τ ∈ Γ ) is an overwriting operation of memories mapping (( a , . . . , a k , . . . , a n ) , b k ) (cid:55)→ ( a , . . . , b k , . . . , a n ) ; this is given by the Cartesian products in Meas .The function br ⟨ Γ ⟩ : 2 × (cid:74) Γ (cid:75) → (cid:74) Γ (cid:75) + (cid:74) Γ (cid:75) comes from the canonical isomorphism 2 × (cid:74) Γ (cid:75) (cid:27) (cid:74) Γ (cid:75) + (cid:74) Γ (cid:75) from the distributivity of Meas .To interpret loops, we introduce the dummy “abort” command Γ ⊢ null that is interpreted bythe null/zero measure (cid:74) Γ ⊢ null (cid:75) =
0, and the following commands corresponding to the finite
Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :32 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata Γ ⊢ x ←− Bern ( e ) ∼ DP log max ( p , − p )− log min ( p , − p ) , x ←− Bern ( e ) : (( e ⟨ ⟩ = p ) ∧ ( − e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [DP-Bern] Γ ⊢ x ←− Bern ( e ) ∼ DP , x ←− Bern ( e ) : ( e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [DP-Bern-Eq] Γ ⊢ x ←− Bern ( e ) ∼ α − RDP α − (( − p ) − α p α + p − α ( − p ) α ) x ←− Bern ( e ) : ( e ⟨ ⟩ = p ) ∧ ( − e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [RDP-Bern] Γ ⊢ x ←− Bern ( e ) ∼ α − RDP x ←− Bern ( e ) : ( e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [RDP-Bern-Eq] Γ ⊢ x ←− Bern ( e ) ∼ zCDP log max ( p , − p )− log min ( p , − p ) , x ←− Bern ( e ) : ( e ⟨ ⟩ = p ) ∧ ( − e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [zCDP-Bern] Γ ⊢ x ←− Bern ( e ) ∼ zCDP , x ←− Bern ( e ) : ( e ⟨ ⟩ = e ⟨ ⟩) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [zCDP-Bern-Eq] Γ ⊢ x ←− Lap ( e , λ ) ∼ DP r / λ , x ←− Lap ( e , λ ) : (| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [DP-Lap] Γ ⊢ x ←− Lap ( e , λ ) ∼ α − RDP α − log { α α − e ( α − )/ λ + α − α − e − α / λ } x ←− Lap ( e , λ ) : (| e ⟨ ⟩ − e ⟨ ⟩| ≤ ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [RDP-Lap] Γ ⊢ x ←− Lap ( e , λ ) ∼ zCDP r / λ , x ←− Lap ( e , λ ) : (| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [zCDP-Lap] Γ ⊢ x ←− Gauss ( e , σ ) ∼ α − RDP αr / σ x ←− Gauss ( e , σ )(| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [RDP-G] Γ ⊢ x ←− Gauss ( e , σ ) ∼ zCDP , r / σ x ←− Gauss ( e , σ )(| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [zCDP-G] Γ ⊢ x ←− Gauss ( e , σ ) ∼ tCDP , r / σ x ←− Gauss ( e , σ )(| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [tCDP-G] ∃ c > + √ . ( ( . / δ ) ≤ c ) ∧ ( crε ≤ σ ) Γ ⊢ x ←− Gauss ( e , σ ) ∼ DP ε , δ x ←− Gauss ( e , σ ) : (| e ⟨ ⟩ − e ⟨ ⟩| ≤ r ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [DP-G]1 < /√ ρ ≤ A / δ Γ ⊢ x ←− e + A · arsinh (cid:0) A Gauss ( , δ / ρ ) (cid:1) ∼ tCDP ρ , A / δ x ←− e + A arsinh (cid:0) A Gauss ( , δ / ρ ) (cid:1) : (| e ⟨ ⟩ − e ⟨ ⟩| ≤ δ ) = ⇒ ( x ⟨ ⟩ = x ⟨ ⟩) [tCDP-SinhG] Fig. 6. Rules for basic mechanisms for DP, RDP, zCDP, and tCDP in span-apRHL.
Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:33 unrollings of the loop: [ while b do c ] n = (cid:40) if b then null else skip , if n = if b then c ; [ while b do c ] k , if n = k + (cid:74) Γ ⊢ while b do c (cid:75) = sup n ∈ N (cid:74) Γ ⊢ [ while e do c ] n (cid:75) . B.4 Proof of Soundness of the Program Logic
Lemma B.1.
The [assn] rule is sound.
Proof. We may assume x (cid:44) x without loss of generality. Let (( ϕ , a , a ) , ( ϕ , a , a )) ∈ (cid:76) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:77) where a ij is a value of variable x j ( i = , x and x are not free variables in e and e respectively, we have (( ϕ , (cid:74) Γ ⊢ t e : τ (cid:75) ( ϕ , a , a ) , a ) , ( ϕ , a , (cid:74) Γ ⊢ t e : τ (cid:75) ( ϕ , a , a )) ∈ (cid:76) Γ ⊢ R Φ (cid:77) . Therefore, ( f ( ϕ , a , a ) , f ( ϕ , a , a )) ∈ (cid:76) Γ ⊢ R Φ (cid:77) where f i = rw ⟨ Γ | x i : τ ⟩ ◦ ⟨ id (cid:74) Γ (cid:75) , (cid:74) Γ ⊢ t e i : τ (cid:75) ⟩ ( i = , (cid:74) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:75) and (cid:74) Γ ⊢ R Φ (cid:75) are binaryrelation converted to spans) , ( f , f , ( f × f )| (cid:76) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:77) ) : (cid:74) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:75) → (cid:74) Γ ⊢ R Φ (cid:75) . Letting д i = η (cid:74) Γ (cid:75) ◦ f i = (cid:74) Γ ⊢ x i ← e i (cid:75) , we conclude ( д , д , ⟨ η Φ , η Φ ⟩ ◦ ( д × д )| (cid:76) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:77) ) : (cid:74) Γ ⊢ R Φ { e ⟨ ⟩ , e ⟨ ⟩/ x ⟨ ⟩ , x ⟨ ⟩} (cid:75) → (cid:74) Γ ⊢ R Φ (cid:75) ♯ ( ∆ , A , ) . □ Lemma B.2.
The [seq] rule is sound.
Proof. Since the judgments Γ ⊢ c ∼ ∆ α , δ c ′ : Φ = ⇒ Φ ′ and Γ ⊢ c ∼ ∆ β , γ c ′ : Φ ′ = ⇒ Ψ are valid,we obtain the following two morphisms in Span ( Meas ) for witness functions l and l : ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) : (cid:74) Γ ⊢ R Φ (cid:75) → (cid:74) Γ ⊢ R Φ ′ (cid:75) ♯ ( ∆ , α , δ ) ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) : (cid:74) Γ ⊢ R Φ ′ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , β , γ ) By taking the graded Kleisli lifting of the second morphism ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l ) , for some witnessfunction l , we have a Span ( Meas ) -morphism ( (cid:74) Γ ⊢ c (cid:75) ♯ , (cid:74) Γ ⊢ c ′ (cid:75) ♯ , l ) : (cid:74) Γ ⊢ R Φ ′ (cid:75) ♯ ( ∆ , α , δ ) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α β , δ + γ ) . Composing, we have a span-morphism giving validity of Γ ⊢ c ; c ∼ ∆ α β , δ + γ c ′ ; c ′ : Φ = ⇒ Ψ : ( (cid:74) Γ ⊢ c (cid:75) ♯ ◦ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) ♯ ◦ (cid:74) Γ ⊢ c ′ (cid:75) , l ◦ l ) : (cid:74) Γ ⊢ R Φ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α β , δ + γ ) . This is well-defined, since the family { (cid:74) Γ ⊢ [ while e do c ] n (cid:75) } n ∈ N is an ω -chain with respect to the ω CPO ⊥ -enrichment ⊑ of Meas G . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :34 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata □ Lemma B.3.
The [weak] rule is sound
Proof. Since the judgment Γ ⊢ c ∼ ∆ α , δ c : Φ = ⇒ Ψ is valid, we have a witness function l : (cid:76) Γ ⊢ R Φ (cid:77) → W ( (cid:74) Γ ⊢ R Ψ (cid:75) , ∆ , α , δ ) such that ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c (cid:75) , l ) : (cid:74) Γ ⊢ R Φ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α , δ ) From the inclusions Γ ⊢ I Φ ′ = ⇒ Φ and Γ ⊢ I Ψ = ⇒ Ψ ′ of relations, we have ( id (cid:74) Γ (cid:75) , id (cid:74) Γ (cid:75) , ( id (cid:74) Γ (cid:75) × id (cid:74) Γ (cid:75) )| (cid:76) Γ ⊢ R Φ ′ (cid:77) ) : (cid:74) Γ ⊢ R Φ ′ (cid:75) → (cid:74) Γ ⊢ R Φ (cid:75) ( id (cid:74) Γ (cid:75) , id (cid:74) Γ (cid:75) , ( id (cid:74) Γ (cid:75) × id (cid:74) Γ (cid:75) )| (cid:76) Γ ⊢ R Ψ (cid:77) ) : (cid:74) Γ ⊢ R Ψ (cid:75) → (cid:74) Γ ⊢ R Ψ ′ (cid:75) . Thanks to the inclusion structure of the span-lifting (−) ♯ ( ∆ ) , we obtain (G id (cid:74) Γ (cid:75) , G id (cid:74) Γ (cid:75) , (G id (cid:74) Γ (cid:75) × G id (cid:74) Γ (cid:75) )| W ( (cid:74) Γ ⊢ R Φ (cid:75) , ∆ , α , δ ) ) : (cid:74) Γ ⊢ R Ψ ′ (cid:75) ♯ ( ∆ , α , δ ) → (cid:74) Γ ⊢ R Ψ ′ (cid:75) ♯ ( ∆ , β , γ ) Therefore, we conclude ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c (cid:75) , l | (cid:76) Γ ⊢ R Φ ′ (cid:77) ) : (cid:74) Γ ⊢ R Φ ′ (cid:75) → (cid:74) Γ ⊢ R Ψ ′ (cid:75) ♯ ( ∆ , β , γ ) □ Lemma B.4.
The [cond] rule is sound.
Proof. Since the judgments Γ ⊢ c ∼ ∆ α , δ c : Φ ∧ b ⟨ ⟩ = ⇒ Ψ and Γ ⊢ c ′ ∼ ∆ α , δ c ′ : Φ ∧ ¬ b ⟨ ⟩ = ⇒ Ψ are valid, we have two witness functions l T : (cid:76) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:77) → W ( (cid:74) Γ ⊢ R Ψ (cid:75) , ∆ , α , δ ) and l F : (cid:76) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:77) → W ( (cid:74) Γ ⊢ R Ψ (cid:75) , ∆ , α , δ ) that make the following morphisms in Span ( Meas ) : ( (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c (cid:75) , l T ) : (cid:74) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α , δ ) ( (cid:74) Γ ⊢ c ′ (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) , l F ) : (cid:74) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α , δ ) . By the coproduct structure of
Span ( Meas ) , we have the following span-morphism: ([ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) ] , [ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) ] , [ l T , l F ]) : (cid:74) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:75) (cid:219) + (cid:74) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:75) → (cid:74) Γ ⊢ R Ψ (cid:75) ♯ ( ∆ , α , δ ) . We write д = br ⟨ Γ ⟩ ◦ ⟨ (cid:74) Γ ⊢ t b (cid:75) , id (cid:74) Γ (cid:75) ⟩ and д = br ⟨ Γ ⟩ ◦ ⟨ (cid:74) Γ ⊢ t ¬ b (cid:75) , id (cid:74) Γ (cid:75) ⟩ . We construct thefollowing morphism by using Γ ⊢ I Φ = ⇒ b ⟨ ⟩ = b ′ ⟨ ⟩( д , д , H ◦ ( д × д )| (cid:76) Γ ⊢ R Φ (cid:77) ) : (cid:74) Γ ⊢ R Φ (cid:75) → (cid:74) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:75) (cid:219) + (cid:74) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:75) , (8)where H is the composition H ◦ H ◦ H of • H : ( (cid:74) Γ (cid:75) + (cid:74) Γ (cid:75) ) × ( (cid:74) Γ (cid:75) + (cid:74) Γ (cid:75) ) (cid:27) × ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) defined by ( ι i ( ϕ ) , ι j ( ϕ )) (cid:55)→ (( i , j ) , ( ϕ , ϕ )) where i , j ∈ • H : 4 × ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) → × ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) defined by (( b , b ) , ϕ , ϕ ) (cid:55)→ ( b , ϕ , ϕ ) , • H : 2 × ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) (cid:27) ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) + ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) ) defined by ( b , ( ϕ , ϕ )) (cid:55)→ ι b ( ϕ , ϕ ) . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. pproximate Span Liftings 1:35
Here, ι i are coprojections ι : A → A + B and ι : B → A + B . The bijections H and H are givenfrom the distributivity of products and coproducts in Meas , and H is given by a projection.Now, let ( ϕ , ϕ ) ∈ (cid:76) Γ ⊢ R Φ (cid:77) . Since we suppose Γ ⊢ I Φ = ⇒ b ⟨ ⟩ = b ′ ⟨ ⟩ , we have ( д ( ϕ ) , д ( ϕ )) = (cid:40) (( , ϕ ) , ( , ϕ )) ( ϕ , ϕ ) ∈ (cid:76) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:77) = (cid:76) Γ ⊢ R Φ ∧ b ′ ⟨ ⟩ (cid:77) (( , ϕ ) , ( , ϕ )) ( ϕ , ϕ ) ∈ (cid:76) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:77) = (cid:76) Γ ⊢ R Φ ∧ ¬ b ′ ⟨ ⟩ (cid:77) . We observe the role of H in the first case ( ( ϕ , ϕ ) ∈ (cid:76) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:77) ), H ( д ( ϕ ) , д ( ϕ )) = H ◦ H ◦ H (( , ϕ ) , ( , ϕ )) = H ◦ H (( , ) , ( ϕ , ϕ )) = H ( , ( ϕ , ϕ )) = ι ( ϕ , ϕ ) . In the same way, we have H ( д ( ϕ ) , д ( ϕ )) = ι ( ϕ , ϕ ) in the second case. Therefore, the measurablefunction H ◦ ( д × д )| (cid:76) Γ ⊢ R Φ (cid:77) forms a function from (cid:76) Γ ⊢ R Φ (cid:77) to (cid:76) Γ ⊢ R Φ ∧ b ⟨ ⟩ (cid:77) + (cid:76) Γ ⊢ R Φ ∧ ¬ b ⟨ ⟩ (cid:77) satisfying (8).Since (cid:74) Γ ⊢ if b then c else c ′ (cid:75) = [ (cid:74) Γ ⊢ c (cid:75) , (cid:74) Γ ⊢ c ′ (cid:75) ] ◦ br ⟨ Γ ⟩ ◦ ⟨ (cid:74) Γ ⊢ t b (cid:75) , id (cid:74) Γ (cid:75) ⟩ , we conclude thesoundness. □ Remark B.1.
Similarly, we have soundness of [case].
Remark B.2.
The soundness of the [while] rule is a consequence of the soundness of [seq], [weak],and the [case] rule since the [while] in our logic deal only with finite-loops.
Lemma B.5.
The rule [RDP-G] is sound.
Proof. We assume x (cid:44) x . First, it can be directly checked that the function f = N (− , σ ) : R →G R is measurable. From Mironov [2017, Proposition 3], the function f satisfies D α ( f ( x )|| f ( y )) ≤ αr / σ whenever | x − y | ≤ r . Hence, ( f , f , ( f × f )| Φ ) is a span-morphism Φ → Eq ♯ ( D α , ∗ , αr / σ ) R where Φ = { ( x , y ) ∈ R × R | | x − y | ≤ r } is regarded as a span.We next construct a span-morphism ( h , h , ( h × h )| Θ ) mapping Θ → Eq ♯ ( D α , ∗ , αr / σ ) R where Θ = (cid:74) Γ ⊢ R | e ⟨ ⟩ − e ⟨ ⟩| ≤ r (cid:75) and h i = (cid:74) Γ ⊢ p Gauss ( e i , σ ) : real (cid:75) ( i = , д i = (cid:74) Γ ⊢ t e i : real (cid:75) ( i = , ( д , д , ( д × д )| Θ ) is a span-morphism Θ → Φ . Since h i = f i ◦ д i ( i = , ( h , h , ( h × h )| Θ ) is a span-morphism Θ → Eq ♯ ( D α , ∗ , αr / σ ) R .Now, the triple ( id (cid:74) Γ (cid:75) × h , id (cid:74) Γ (cid:75) × h , ( id (cid:74) Γ (cid:75) × id (cid:74) Γ (cid:75) )×( h × h )| Θ ) is a morphism of spans ⊤ (cid:74) Γ (cid:75) (cid:219)× Θ →⊤ (cid:74) Γ (cid:75) (cid:219)× ( Eq R ) ♯ ( D α , ∗ , αr / σ ) where ⊤ (cid:74) Γ (cid:75) = ( (cid:74) Γ (cid:75) , (cid:74) Γ (cid:75) , (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) , π , π ) . Thanks to the unit and the dou-ble strength of the span-lifting {(−) ♯ ( D α , ∗ , ρ ) } ρ , the triple ( st (cid:74) Γ (cid:75) , R , st (cid:74) Γ (cid:75) , R , ⟨ st (cid:74) Γ (cid:75) , R ◦ ( π × π ) , st (cid:74) Γ (cid:75) , R ◦( π × π )⟩| ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) )× W ( Eq R , D α , ∗ , αr / σ ) ) is a morphism of spans ⊤ (cid:74) Γ (cid:75) (cid:219)×( Eq R ) ♯ ( D α , ∗ , αr / σ ) → (⊤ (cid:74) Γ (cid:75) (cid:219)× Eq R ) ♯ ( D α , ∗ , αr / σ ) .We write k i = rw ⟨ Γ | x i : real ⟩ ( i = , ((( ϕ , a , a ) , r ) , (( ϕ , a , a ) , r )) ∈ ⊤ (cid:74) Γ (cid:75) (cid:219)× Eq R where a ij is a value of variable x j ( i = , ( rw ⟨ Γ | x : real ⟩(( ϕ , a , a ) , r ) , rw ⟨ Γ | x : real ⟩(( ϕ , a , a ) , r )) = (( ϕ , r , a ) , ( ϕ , a , r )) ∈ (cid:76) Γ ⊢ R x ⟨ ⟩ = x ⟨ ⟩ (cid:77) Hence, the triple ( k , k , ( k × k )| ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) )× Eq R ) forms a morphism of spans (⊤ (cid:74) Γ (cid:75) (cid:219)× Eq R ) → (cid:74) Γ ⊢ R x ⟨ ⟩ = x ⟨ ⟩ (cid:75) .(Note that (⊤ (cid:74) Γ (cid:75) (cid:219)× Eq R ) and (cid:74) Γ ⊢ R x ⟨ ⟩ = x ⟨ ⟩ (cid:75) are binary relations converted to spans.)By the functoriality of the span-lifting {(−) ♯ ( D α , ∗ , ρ ) } ρ , we obtain in Span ( Meas ) , ( k , k , ( k × k )| ( (cid:74) Γ (cid:75) × (cid:74) Γ (cid:75) )× Eq R ) ♯ ( D α , ∗ , αr / σ ) : (⊤ (cid:74) Γ (cid:75) (cid:219)× Eq R ) ♯ ( D α , ∗ , αr / σ ) → (cid:74) Γ ⊢ R x ⟨ ⟩ = x ⟨ ⟩ (cid:75) ♯ ( D α , ∗ , αr / σ ) . Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. :36 Tetsuya Sato, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Shin-ya Katsumata
Since ⟦Γ ⊢ xᵢ $← Gauss(eᵢ, σ²)⟧ = G kᵢ ∘ st_{⟦Γ⟧,ℝ} ∘ ⟨id_{⟦Γ⟧}, hᵢ⟩ (i = 1, 2), we conclude

  (⟦Γ ⊢ x₁ $← Gauss(e₁, σ²)⟧, ⟦Γ ⊢ x₂ $← Gauss(e₂, σ²)⟧, l)
    = (k₁, k₂, (k₁ × k₂)|_{(⟦Γ⟧×⟦Γ⟧)×Eq_ℝ})^{♯(D^α, ∗, αr²/2σ²)}
    ∘ (st_{⟦Γ⟧,ℝ}, st_{⟦Γ⟧,ℝ}, ⟨st_{⟦Γ⟧,ℝ} ∘ (π₁ × π₁), st_{⟦Γ⟧,ℝ} ∘ (π₂ × π₂)⟩|_{(⟦Γ⟧×⟦Γ⟧)×W(Eq_ℝ, D^α, ∗, αr²/2σ²)})
    ∘ (id_{⟦Γ⟧} × h₁, id_{⟦Γ⟧} × h₂, ((id_{⟦Γ⟧} × id_{⟦Γ⟧}) × (h₁ × h₂))|_Θ)
    ∘ (⟨id_{⟦Γ⟧}, id_{⟦Γ⟧}⟩, ⟨id_{⟦Γ⟧}, id_{⟦Γ⟧}⟩, ⟨id_{⟦Γ⟧×⟦Γ⟧}|_Θ, id_Θ⟩)
    : Θ → ⟦Γ ⊢_R x₁⟨1⟩ = x₂⟨2⟩⟧^{♯(D^α, ∗, αr²/2σ²)}.  □

The soundness of the other mechanism rules follows similarly, using Mironov [2017, Propositions 5, 6, 7], Dwork et al. [2006, Proposition 1], and Sato [2016, Lemma 4.2] (a refinement of Dwork and Roth [2013, Theorem 3.22]); the soundness of the transitivity rules is proved by Olmedo [2014, Lemma 4.2(iii)], Bun and Steinke [2016, Proposition 27], and Lemma 4.2; and the soundness of the conversion rules follows from Bun and Steinke [2016, Proposition 4], Mironov [2017, Proposition 3], and Bun and Steinke [2016, Lemmas 3.2, 3.5].
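To make the quantitative content of [RDP-G] concrete, the following sketch (ours, not part of the formal development; it assumes numpy and scipy, and the helper name renyi_gauss_numeric is hypothetical) checks the closed form D^α(N(x, σ²) || N(y, σ²)) = α(x−y)²/2σ² by quadrature, and the bound αr²/2σ² whenever |x − y| ≤ r:

import numpy as np
from scipy.integrate import quad

def renyi_gauss_numeric(alpha, x, y, sigma):
    # D^alpha between N(x, sigma^2) and N(y, sigma^2), by numerical integration
    logpdf = lambda t, m: -(t - m) ** 2 / (2 * sigma ** 2) - np.log(sigma * np.sqrt(2 * np.pi))
    integrand = lambda t: np.exp(alpha * logpdf(t, x) + (1 - alpha) * logpdf(t, y))
    val, _ = quad(integrand, -50.0, 50.0, points=[x, y])
    return np.log(val) / (alpha - 1)

alpha, sigma, r = 2.0, 1.5, 1.0
for x, y in [(0.0, 0.3), (0.0, 1.0), (2.0, 1.2)]:      # all satisfy |x - y| <= r
    d = renyi_gauss_numeric(alpha, x, y, sigma)
    closed = alpha * (x - y) ** 2 / (2 * sigma ** 2)   # the closed form above
    assert abs(d - closed) < 1e-5
    assert d <= alpha * r ** 2 / (2 * sigma ** 2) + 1e-9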
C OMITTED PROOFS
Theorem C.1 (Theorem 4.4). An A-graded family ∆ is additive if it is continuous and composable.

Proof. From the continuity of ∆,

  ∆^{αβ}_{X×Y}(μ₁ ⊗ μ′₁, μ₂ ⊗ μ′₂) = sup { ∆^{αβ}_I(Gk(μ₁ ⊗ μ′₁), Gk(μ₂ ⊗ μ′₂)) | I ∈ Fin, k : X × Y → I }.

We fix k : X × Y → I. For any μ ∈ G Y, we define K_μ : X → G I by K_μ = Gk ∘ st_{X,Y} ∘ (id_X × μ) ∘ ρ⁻¹_X, where μ is the generalized element 1 → G Y assigning μ, and ρ_X is the canonical isomorphism X ≅ X × 1. We then obtain, for any μ′ ∈ G X,

  K^♯_μ(μ′) = Gk ∘ μ_{X×Y} ∘ G st_{X,Y} ∘ G(id_X × μ) ∘ G ρ⁻¹_X(μ′)
            = Gk ∘ μ_{X×Y} ∘ G st_{X,Y} ∘ G(id_X × μ) ∘ st′_{X,1} ∘ ρ⁻¹_{GX}(μ′)
            = Gk ∘ μ_{X×Y} ∘ G st_{X,Y} ∘ st′_{X,GY} ∘ (id_{GX} × μ) ∘ ρ⁻¹_{GX}(μ′)
            = Gk ∘ dst_{X,Y}(μ′, μ) = Gk(μ′ ⊗ μ).

We also obtain K_μ(x) = Gk(d_x ⊗ μ) for any x ∈ X. This implies K_μ(x) = Gk(x, −)(μ), where k(x, −) : Y → I is measurable, because (d_x ⊗ μ)(k⁻¹(A)) = μ((k⁻¹(A))|_x) = μ(k(x, −)⁻¹(A)) for any A ⊆ I. From the composability and continuity of ∆, we have

  ∆^{αβ}_I(Gk(μ₁ ⊗ μ′₁), Gk(μ₂ ⊗ μ′₂)) = ∆^{αβ}_I(K^♯_{μ′₁}(μ₁), K^♯_{μ′₂}(μ₂))
    ≤ ∆^α_X(μ₁, μ₂) + sup_{x∈X} ∆^β_I(K_{μ′₁}(x), K_{μ′₂}(x))
    = ∆^α_X(μ₁, μ₂) + sup_{x∈X} ∆^β_I(Gk(x, −)(μ′₁), Gk(x, −)(μ′₂))
    ≤ ∆^α_X(μ₁, μ₂) + ∆^β_Y(μ′₁, μ′₂).

Since k : X × Y → I is arbitrary, we conclude the additivity of ∆. □
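As a concrete instance of the additivity bound, for Rényi divergences the product inequality holds with equality on independent pairs; the following sketch (ours; finite case only, assuming numpy) checks D^α(μ₁ ⊗ μ′₁ || μ₂ ⊗ μ′₂) = D^α(μ₁||μ₂) + D^α(μ′₁||μ′₂) numerically:

import numpy as np

def renyi(p, q, alpha):
    # Renyi divergence of order alpha on finite distributions
    return np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1)

rng = np.random.default_rng(0)
def rand_dist(n):
    v = rng.random(n) + 1e-3
    return v / v.sum()

alpha = 3.0
mu1, mu2 = rand_dist(4), rand_dist(4)
nu1, nu2 = rand_dist(5), rand_dist(5)
lhs = renyi(np.outer(mu1, nu1).ravel(), np.outer(mu2, nu2).ravel(), alpha)
rhs = renyi(mu1, mu2, alpha) + renyi(nu1, nu2, alpha)
assert abs(lhs - rhs) < 1e-10   # equality here, hence the additivity bound holds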
Theorem C.2 (Theorem 4.6).
A continuous, approximable A-graded family ∆ is composable iff it is finite-composable.

Proof. Let μ₁, μ₂ ∈ G X and f, g : X → G Y. Since ∆ is continuous, approximable, and finite-composable, we obtain

  ∆^{αβ}_Y(f^♯(μ₁), g^♯(μ₂))
    ≤ sup { ∆^{αβ}_I(Gk(f^♯(μ₁)), Gk(g^♯(μ₂))) | I ∈ Fin, k : Y → I }
    ≤ sup { lim_{n→∞} ∆^{αβ}_I((Gk ∘ f ∘ m_n ∘ m*_n)^♯(μ₁), (Gk ∘ g ∘ m_n ∘ m*_n)^♯(μ₂)) | I ∈ Fin, k : Y → I }
    ≤ sup { lim_{n→∞} ∆^α_{J_n}(G m*_n(μ₁), G m*_n(μ₂)) | I ∈ Fin, k : Y → I }
      + sup { lim_{n→∞} sup_{j∈J_n} ∆^β_I(Gk ∘ f ∘ m_n(j), Gk ∘ g ∘ m_n(j)) | I ∈ Fin, k : Y → I }.

Regarding the first term of the last inequality, since m*_n : X → J_n with J_n ∈ Fin, and ∆^α is continuous, we have

  ∆^α_{J_n}(G m*_n(μ₁), G m*_n(μ₂)) ≤ ∆^α_X(μ₁, μ₂).

Concerning the second term, since m_n(j) ∈ X for any n and j ∈ J_n, k : Y → I, and ∆^β is continuous, we obtain

  sup_{j∈J_n} ∆^β_I(Gk ∘ f ∘ m_n(j), Gk ∘ g ∘ m_n(j)) ≤ sup_{x∈X} ∆^β_I(Gk ∘ f(x), Gk ∘ g(x)) ≤ sup_{x∈X} ∆^β_Y(f(x), g(x)).

This completes the proof. □
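The role of continuity here can be seen numerically: the divergence of two continuous distributions is the supremum of the divergences of their finite pushforwards. The sketch below (ours; the binning choices are arbitrary, and numpy/scipy are assumed) pushes two Gaussians along ever-finer finite partitions and watches the binned Rényi divergence approach the closed form from below:

import numpy as np
from scipy.stats import norm

def renyi(p, q, alpha):
    mask = p > 0
    return np.log(np.sum(p[mask] ** alpha * q[mask] ** (1 - alpha))) / (alpha - 1)

alpha, sigma, m = 2.0, 1.0, 0.8
closed = alpha * m ** 2 / (2 * sigma ** 2)
for bins in [4, 16, 64, 256]:
    edges = np.linspace(-8.0, 8.0, bins + 1)
    p = np.diff(norm.cdf(edges, loc=0.0, scale=sigma))   # pushforward of mu1 along a finite k
    q = np.diff(norm.cdf(edges, loc=m, scale=sigma))     # pushforward of mu2 along the same k
    d = renyi(p, q, alpha)
    assert d <= closed + 1e-9    # each finite pushforward Gk can only lose information
    print(bins, d)               # increases toward the closed form as the partition refines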
Theorem C.3 (Theorem 4.8). The f-divergence ∆^f is approximable for any weight function f.

Proof. Consider h, k : X → G I. Let |I| = N. We may regard G I ⊆ [0, 1]^N. We define a partition {C^n_{j₁…j_{2N}}}_{j₁,…,j_{2N}∈{0,1,…,n}} of X by

  C^n_{j₁…j_{2N}} = h⁻¹(B^n_{j₁…j_N}) ∩ k⁻¹(B^n_{j_{N+1}…j_{2N}}),
  where B^n_{j₁…j_N} = A^n_{j₁} × ⋯ × A^n_{j_N}, A^n_0 = {0}, and A^n_{l+1} = (l/n, (l+1)/n].

We define

  J_n = { (j₁, …, j_{2N}) | j₁, …, j_{2N} ∈ {0, 1, …, n}, C^n_{j₁…j_{2N}} ≠ ∅ }.

We next define m*_n : X → J_n and m_n : J_n → X as follows: m*_n(x) is the unique element (j₁, …, j_{2N}) ∈ J_n satisfying x ∈ C^n_{j₁…j_{2N}}, and m_n(j₁, …, j_{2N}) is a chosen element of C^n_{j₁…j_{2N}}. From the construction of {C^n_{j₁…j_{2N}}}, for any n ∈ ℕ, x ∈ X, and i ∈ I,

  |h(x)(i) − (h ∘ m_n ∘ m*_n)(x)(i)| ≤ 1/n,   |k(x)(i) − (k ∘ m_n ∘ m*_n)(x)(i)| ≤ 1/n

holds. In particular, for any i ∈ I, the sequences of functions {(h ∘ m_n ∘ m*_n)(−)(i)}_{n∈ℕ} and {(k ∘ m_n ∘ m*_n)(−)(i)}_{n∈ℕ} converge uniformly to h(−)(i) and k(−)(i) respectively. Hence, for any μ₁, μ₂ ∈ G X, we have

  h^♯(μ₁)(i) = ∫_X h(−)(i) dμ₁ = lim_{n→∞} ∫_X (h ∘ m_n ∘ m*_n)(−)(i) dμ₁ = lim_{n→∞} (h ∘ m_n ∘ m*_n)^♯(μ₁)(i),
  k^♯(μ₂)(i) = ∫_X k(−)(i) dμ₂ = lim_{n→∞} ∫_X (k ∘ m_n ∘ m*_n)(−)(i) dμ₂ = lim_{n→∞} (k ∘ m_n ∘ m*_n)^♯(μ₂)(i).
Therefore,

  ∆^f_I(h^♯(μ₁), k^♯(μ₂)) = Σ_{i∈I} k^♯(μ₂)(i) · f( h^♯(μ₁)(i) / k^♯(μ₂)(i) )
    = Σ_{i∈I} ( lim_{n→∞} (k ∘ m_n ∘ m*_n)^♯(μ₂)(i) ) · f( lim_{n→∞} (h ∘ m_n ∘ m*_n)^♯(μ₁)(i) / lim_{n→∞} (k ∘ m_n ∘ m*_n)^♯(μ₂)(i) )
    = lim_{n→∞} Σ_{i∈I} (k ∘ m_n ∘ m*_n)^♯(μ₂)(i) · f( (h ∘ m_n ∘ m*_n)^♯(μ₁)(i) / (k ∘ m_n ∘ m*_n)^♯(μ₂)(i) )
    = lim_{n→∞} ∆^f_I((h ∘ m_n ∘ m*_n)^♯(μ₁), (k ∘ m_n ∘ m*_n)^♯(μ₂)).

Remark that the third equality in the above calculation is obtained from the continuity of the weight function f. We then conclude that ∆^f is approximable. □
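The discretization in this proof can be imitated numerically. The sketch below (ours; it replaces the abstract maps m_n ∘ m*_n by flooring kernel values to a 1/n grid, which likewise moves each value by at most 1/n, and assumes numpy) shows the f-divergence of the pushforwards converging for the weight f(t) = t²:

import numpy as np

def f_div(p, q, f):
    # Sum_i q_i * f(p_i / q_i), the finite form of Delta^f
    mask = q > 0
    return float(np.sum(q[mask] * f(p[mask] / q[mask])))

f = lambda t: t ** 2                   # weight function of Delta^{R(2)}
rng = np.random.default_rng(1)
nx, ni = 200, 3
def kernel():
    v = rng.random((nx, ni)) + 1e-2
    return v / v.sum(axis=1, keepdims=True)

h, k = kernel(), kernel()
mu1 = np.full(nx, 1.0 / nx)
mu2 = rng.random(nx); mu2 /= mu2.sum()
exact = f_div(mu1 @ h, mu2 @ k, f)
errs = []
for n in [10, 100, 1000, 10000]:
    hn, kn = np.floor(h * n) / n, np.floor(k * n) / n   # each entry moves by < 1/n
    errs.append(abs(f_div(mu1 @ hn, mu2 @ kn, f) - exact))
assert errs[-1] < 1e-3 and errs[-1] <= errs[0]          # tends to 0 as n grows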
Theorem C.4 (Theorem 4.11). For any α > 1, the Rényi divergence D^α of order α is reflexive, continuous, approximable, composable, and additive (as a singleton-graded family).

Proof. By Theorems 4.7 and 4.8, the f-divergence ∆^{R(α)} of the weight function f(t) = t^α is continuous and approximable. Since the function g : ℝ_{≥0} → ℝ defined by g(t) = (1/(α−1)) log(t) is monotone and continuous, D^α = (1/(α−1)) log ∆^{R(α)} is also continuous and approximable. Thus, it suffices to show the reflexivity and finite-composability of D^α. The reflexivity is obvious: D^α_X(μ||μ) = (1/(α−1)) log μ(X) ≤ 0. We show the finite-composability. Let I, J ∈ Fin, d₁, d₂ ∈ G J, and h, k : J → G I. We calculate by Jensen's inequality:

  ∆^{R(α)}_I(h^♯ d₁, k^♯ d₂) = Σ_{i∈I} ( Σ_{j∈J} d₂(j)·k(j)(i) ) · ( (Σ_{j∈J} d₁(j)·h(j)(i)) / (Σ_{j∈J} d₂(j)·k(j)(i)) )^α
    ≤ Σ_{j∈J} d₂(j) (d₁(j)/d₂(j))^α Σ_{i∈I} k(j)(i) (h(j)(i)/k(j)(i))^α
    = Σ_{j∈J} d₂(j) (d₁(j)/d₂(j))^α · ∆^{R(α)}_I(h(j), k(j))
    ≤ ∆^{R(α)}_J(d₁, d₂) · sup_{j∈J} ∆^{R(α)}_I(h(j), k(j)).

This implies D^α_I(h^♯ d₁ || k^♯ d₂) ≤ D^α_J(d₁||d₂) + sup_{j∈J} D^α_I(h(j)||k(j)). □

Proposition C.1 (Proposition 4.1). If 1 < α ≤ β then D^α_X(μ₁||μ₂) ≤ D^β_X(μ₁||μ₂).

Proof. The proof is almost the same as Van Erven and Harremoës [2014, Theorem 3]. Since D^α and D^β are continuous, it suffices to prove the finite discrete case. We denote by |p| the sum Σ_{i∈I} p_i. We may assume |p| > 0; if |p| = 0 then D^α_I(p||q) = D^β_I(p||q) = −∞. We remark that the function t ↦ t^{(α−1)/(β−1)} is concave, and that |p| ≤ |p|^{(α−1)/(β−1)} since |p| ≤ 1. Therefore,

  (1/(α−1)) log Σ_{i∈I} p_i^α q_i^{1−α}
    = (1/(α−1)) log ( |p| Σ_{i∈I} (p_i/|p|) ((p_i/q_i)^{β−1})^{(α−1)/(β−1)} )
    ≤ (1/(α−1)) log ( |p| ( Σ_{i∈I} (p_i/|p|) (p_i/q_i)^{β−1} )^{(α−1)/(β−1)} )
    ≤ (1/(α−1)) log ( ( |p| Σ_{i∈I} (p_i/|p|) (p_i/q_i)^{β−1} )^{(α−1)/(β−1)} )
    = (1/(β−1)) log Σ_{i∈I} p_i^β q_i^{1−β}.

This completes the proof. □
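Both facts admit quick numerical checks on finite spaces. The sketch below (ours, assuming numpy) verifies the composition inequality just derived and the monotonicity of D^α in α:

import numpy as np

def renyi(p, q, a):
    return np.log(np.sum(p ** a * q ** (1 - a))) / (a - 1)

rng = np.random.default_rng(2)
def dist(n):
    v = rng.random(n) + 1e-3
    return v / v.sum()

a, J, I = 2.5, 6, 5
d1, d2 = dist(J), dist(J)
h = np.stack([dist(I) for _ in range(J)])   # h : J -> G I, one row per j
k = np.stack([dist(I) for _ in range(J)])
lhs = renyi(d1 @ h, d2 @ k, a)              # D^a(h# d1 || k# d2)
rhs = renyi(d1, d2, a) + max(renyi(h[j], k[j], a) for j in range(J))
assert lhs <= rhs + 1e-12                   # finite-composability

p, q = dist(8), dist(8)
ds = [renyi(p, q, a) for a in [1.1, 1.5, 2.0, 4.0, 16.0]]
assert all(x <= y + 1e-12 for x, y in zip(ds, ds[1:]))   # Proposition 4.1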
Proposition C.2 (Proposition 4.2). For any α > 1, μ₁, μ₂, μ₃ ∈ G X, and p, q > 1 satisfying 1/p + 1/q = 1, we have

  D^α_X(μ₁||μ₂) ≤ ((pα−1)/(p(α−1))) D^{pα}_X(μ₁||μ₃) + D^{q(pα−1)/p}_X(μ₃||μ₂).

Proof. Recall that if μ₁ ̸≪ μ₂ then D^α_X(μ₁||μ₂) = ∞. Hence, we may assume μ₁ ≪ μ₃ ≪ μ₂ without loss of generality (if not, the right-hand side is infinite). By the chain rule of Radon–Nikodym derivatives and Hölder's inequality,

  ∆^{R(α)}_X(μ₁, μ₂) = ∫_X (dμ₁/dμ₂)^α dμ₂ = ∫_X (dμ₁/dμ₃ · dμ₃/dμ₂)^α dμ₂
    = ∫_X ((dμ₁/dμ₂)/(dμ₃/dμ₂))^α · (dμ₃/dμ₂)^{1/p} · (dμ₃/dμ₂)^{α−1/p} dμ₂
    ≤ ( ∫_X ((dμ₁/dμ₂)/(dμ₃/dμ₂))^{pα} · (dμ₃/dμ₂) dμ₂ )^{1/p} · ( ∫_X (dμ₃/dμ₂)^{q(α−1/p)} dμ₂ )^{1/q}
    = ∆^{R(pα)}_X(μ₁||μ₃)^{1/p} · ∆^{R(qα−q/p)}_X(μ₃||μ₂)^{1/q}.

We then conclude D^α_X(μ₁||μ₂) ≤ ((pα−1)/(p(α−1))) D^{pα}_X(μ₁||μ₃) + D^{q(pα−1)/p}_X(μ₃||μ₂). □
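Numerically, for instance with p = q = 2 and α = 2 the inequality reads D²(μ₁||μ₂) ≤ (3/2) D⁴(μ₁||μ₃) + D³(μ₃||μ₂); the following sketch (ours, assuming numpy) checks it on random finite distributions:

import numpy as np

def renyi(p, q, a):
    return np.log(np.sum(p ** a * q ** (1 - a))) / (a - 1)

rng = np.random.default_rng(3)
def dist(n):
    v = rng.random(n) + 1e-3
    return v / v.sum()

a, p_, q_ = 2.0, 2.0, 2.0                   # 1/p + 1/q = 1
m1, m2, m3 = dist(8), dist(8), dist(8)
lhs = renyi(m1, m2, a)
rhs = (p_ * a - 1) / (p_ * (a - 1)) * renyi(m1, m3, p_ * a) \
      + renyi(m3, m2, q_ * (p_ * a - 1) / p_)
assert lhs <= rhs + 1e-12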
Theorem C.5 (Theorem 4.12). The ℝ_{≥0}-graded family ∆^{zCDP} = {∆^{zCDP(ξ)}}_{0≤ξ} is reflexive, continuous, composable, and additive.

Proof. Consider any α > 1. We consider the (ℝ_{≥0}, +, 0, ≤)-graded family ∆^{zCDP+(α)} = {∆^{zCDP+(ξ,α)}}_{ξ∈ℝ≥0} of the following divergences:

  ∆^{zCDP+(ξ,α)}_X(μ₁, μ₂) = (1/α)(D^α_X(μ₁||μ₂) − ξ).
By the previous Theorem 4.11, this family is reflexive and continuous for any α > 1. The composability of the family ∆^{zCDP+(α)} = {∆^{zCDP+(ξ,α)}}_{ξ∈ℝ≥0} is a direct consequence of the composability of the α-Rényi divergence: for any μ₁, μ₂ ∈ G X and f, g : X → G Y,

  (1/α)(D^α_Y(f^♯(μ₁)||g^♯(μ₂)) − (ξ₁ + ξ₂)) ≤ (1/α)(D^α_X(μ₁||μ₂) − ξ₁) + sup_{x∈X} (1/α)(D^α_Y(f(x)||g(x)) − ξ₂).

Since ∆^{zCDP(ξ)} = sup_{α>1} ∆^{zCDP+(ξ,α)}, the graded family ∆^{zCDP} = {∆^{zCDP(ξ)}}_{0≤ξ} is reflexive, continuous, and composable. The additivity is obtained from Theorem 4.4. □
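Since ∆^{zCDP(ξ)} is a supremum over α, the composition bound can be checked pointwise in α on a grid; the sketch below (ours; the grid is a numerical stand-in for the supremum over α > 1, and numpy is assumed) verifies composability with the grades ξ adding:

import numpy as np

def renyi(p, q, a):
    return np.log(np.sum(p ** a * q ** (1 - a))) / (a - 1)

def zcdp(p, q, xi, alphas):
    # sup over a grid of alpha of (D^alpha - xi) / alpha
    return max((renyi(p, q, a) - xi) / a for a in alphas)

rng = np.random.default_rng(4)
def dist(n):
    v = rng.random(n) + 1e-1
    return v / v.sum()

alphas = np.linspace(1.01, 50.0, 500)
xi1, xi2 = 0.05, 0.05
d1, d2 = dist(5), dist(5)
h = np.stack([dist(4) for _ in range(5)])
k = np.stack([dist(4) for _ in range(5)])
lhs = zcdp(d1 @ h, d2 @ k, xi1 + xi2, alphas)
rhs = zcdp(d1, d2, xi1, alphas) + max(zcdp(h[j], k[j], xi2, alphas) for j in range(5))
assert lhs <= rhs + 1e-9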
Definition C.6 (Functors).
If the family ∆ is functorial, then the structure of an endofunctor on Span(Meas) of the approximate span-lifting (−)^{♯(∆,α,δ)} is given as follows: for all α ∈ A, δ ∈ ℝ_{≥0}, and (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (X′, Y′, Ψ, ρ′₁, ρ′₂) in Span(Meas),

  (Gh, Gk, (Gl × Gl)|_{W(Φ,∆,α,δ)}) : (X, Y, Φ, ρ₁, ρ₂)^{♯(∆,α,δ)} → (X′, Y′, Ψ, ρ′₁, ρ′₂)^{♯(∆,α,δ)}.  (9)

Theorem C.7 (Well-definedness). If ∆ is functorial then the above structure (−)^{♯(∆,α,δ)} indeed forms an endofunctor on Span(Meas).

Proof. We first show the well-definedness of (9). We fix (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (X′, Y′, Ψ, ρ′₁, ρ′₂) in Span(Meas) and parameters α ∈ A and δ ∈ ℝ_{≥0}. Let (ν₁, ν₂) ∈ W(Φ, ∆, α, δ). The pair satisfies ∆^α_Φ(ν₁, ν₂) ≤ δ. Since the divergence ∆^α is functorial, we have ∆^α_Ψ(G(l)(ν₁), G(l)(ν₂)) ≤ δ. Thus, (Gl × Gl)|_{W(Φ,∆,α,δ)} is a measurable function from W(Φ, ∆, α, δ) to W(Ψ, ∆, α, δ). Since G is a functor on Meas, we obtain

  Gρ′₁ ∘ π₁ ∘ (Gl × Gl)|_{W(Φ,∆,α,δ)} = Gρ′₁ ∘ Gl ∘ π₁|_{W(Φ,∆,α,δ)} = Gh ∘ Gρ₁ ∘ π₁|_{W(Φ,∆,α,δ)},
  Gρ′₂ ∘ π₂ ∘ (Gl × Gl)|_{W(Φ,∆,α,δ)} = Gρ′₂ ∘ Gl ∘ π₂|_{W(Φ,∆,α,δ)} = Gk ∘ Gρ₂ ∘ π₂|_{W(Φ,∆,α,δ)}.

Thus, the construction (9) is a mapping on
Span(Meas)-morphisms. The functoriality is obvious by definition. □

Definition C.8 (Graded monad structures).
If the family ∆ is reflexive and composable, then the structure of an A × (ℝ_{≥0}, +, 0, ≤)-graded monad on Span(Meas) is given as follows.

Unit: for any span (X, Y, Φ, ρ₁, ρ₂), we define

  (η_X, η_Y, ⟨η_Φ, η_Φ⟩) : (X, Y, Φ, ρ₁, ρ₂) → (X, Y, Φ, ρ₁, ρ₂)^{♯(∆, 1_A, 0)}.  (10)

Kleisli extensions: for any morphism (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (X′, Y′, Ψ, ρ′₁, ρ′₂)^{♯(∆,α,δ)} in Span(Meas), we define

  (h^♯, k^♯, ((π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ × (π₂|_{W(Ψ,∆,α,δ)} ∘ l)^♯)|_{W(Φ,∆,β,γ)}) : (X, Y, Φ, ρ₁, ρ₂)^{♯(∆,β,γ)} → (X′, Y′, Ψ, ρ′₁, ρ′₂)^{♯(∆,αβ,δ+γ)}.  (11)

Inclusions: for any α ⪯ β, δ ≤ γ, and (X, Y, Φ, ρ₁, ρ₂) in Span(Meas), we define

  (id_{GX}, id_{GY}, (id_{GΦ} × id_{GΦ})|_{W(Φ,∆,α,δ)}) : (X, Y, Φ, ρ₁, ρ₂)^{♯(∆,α,δ)} → (X, Y, Φ, ρ₁, ρ₂)^{♯(∆,β,γ)}.  (12)

We remark here that each (−)^{♯(∆,α,δ)} is also an endofunctor, because ∆ is functorial whenever it is both reflexive and composable. We obtain these properties from the commutativity sup_{y∈Y} sup_{x∈X} f(x, y) = sup_{x∈X} sup_{y∈Y} f(x, y) of suprema; we drop the approximability, which is not given by a supremum but rather by a limit. Strictly speaking, we consider the function W(Φ, ∆, α, δ) → (Image) → W(Ψ, ∆, α, δ) factoring (Gl × Gl)|_{W(Φ,∆,α,δ)} through its image, followed by the inclusion; functoriality shows the existence of the inclusion.

Theorem C.9 (Well-definedness). If ∆ is reflexive and composable then the above structures (−)^{♯(∆,α,δ)} indeed form an A × ℝ_{≥0}-graded monad on Span(Meas).

Proof. We first prove that the components are well-defined.
Unit:
We show the well-definedness of (10). We fix (X, Y, Φ, ρ₁, ρ₂) in Span(Meas). For any ϕ ∈ Φ, we have ⟨η_Φ, η_Φ⟩(ϕ) = (d_ϕ, d_ϕ). Since ∆ is reflexive, we have ∆^{1_A}_Φ(d_ϕ, d_ϕ) ≤ 0. Thus, ⟨η_Φ, η_Φ⟩ is indeed a measurable function from Φ to W(Φ, ∆, 1_A, 0). Since η is the unit of the sub-Giry monad G, we obtain

  Gρ₁ ∘ π₁|_{W(Φ,∆,1_A,0)} ∘ ⟨η_Φ, η_Φ⟩ = Gρ₁ ∘ η_Φ = η_X ∘ ρ₁,
  Gρ₂ ∘ π₂|_{W(Φ,∆,1_A,0)} ∘ ⟨η_Φ, η_Φ⟩ = Gρ₂ ∘ η_Φ = η_Y ∘ ρ₂.

Thus (10) is well-defined.
Kleisli extensions:
We show the well-definedness of (11). We fix a
Span(Meas)-morphism (h, k, l) : (X, Y, Φ, ρ₁, ρ₂) → (X′, Y′, Ψ, ρ′₁, ρ′₂)^{♯(∆,α,δ)} and parameters β ∈ A and γ ∈ ℝ_{≥0}. For any ϕ ∈ Φ, we have

  ∆^α_Ψ(π₁|_{W(Ψ,∆,α,δ)} ∘ l(ϕ), π₂|_{W(Ψ,∆,α,δ)} ∘ l(ϕ)) ≤ δ.

Since ∆ is composable, we have, for any (ν₁, ν₂) ∈ W(Φ, ∆, β, γ),

  ∆^{αβ}_Ψ((π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯(ν₁), (π₂|_{W(Ψ,∆,α,δ)} ∘ l)^♯(ν₂)) ≤ δ + γ.

This implies that ((π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ × (π₂|_{W(Ψ,∆,α,δ)} ∘ l)^♯)|_{W(Φ,∆,β,γ)} is indeed a measurable function from W(Φ, ∆, β, γ) to W(Ψ, ∆, αβ, δ + γ). Since (−)^♯ is the Kleisli lifting of the sub-Giry monad, we obtain

  Gρ′₁ ∘ π₁|_{W(Ψ,∆,αβ,δ+γ)} ∘ ((π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ × (π₂|_{W(Ψ,∆,α,δ)} ∘ l)^♯)|_{W(Φ,∆,β,γ)}
    = Gρ′₁ ∘ (π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ ∘ π₁|_{W(Φ,∆,β,γ)}
    = (Gρ′₁ ∘ π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ ∘ π₁|_{W(Φ,∆,β,γ)}
    = h^♯ ∘ Gρ₁ ∘ π₁|_{W(Φ,∆,β,γ)},
  Gρ′₂ ∘ π₂|_{W(Ψ,∆,αβ,δ+γ)} ∘ ((π₁|_{W(Ψ,∆,α,δ)} ∘ l)^♯ × (π₂|_{W(Ψ,∆,α,δ)} ∘ l)^♯)|_{W(Φ,∆,β,γ)}
    = k^♯ ∘ Gρ₂ ∘ π₂|_{W(Φ,∆,β,γ)}.

Thus (11) is well-defined.
Inclusions:
We show the well-definedness of (12). We fix (X, Y, Φ, ρ₁, ρ₂) in Span(Meas) and parameters α ⪯ β and δ ≤ γ. Since ∆ is an A-graded family of divergences, we have ∆^β ≤ ∆^α. This implies that there is an inclusion function W(Φ, ∆, α, δ) ↪ W(Φ, ∆, β, γ) in Meas. Hence, by treating the restrictions of functions, we obtain

  id_{GX} ∘ Gρ₁ ∘ π₁|_{W(Φ,∆,α,δ)} = Gρ₁ ∘ π₁ ∘ (id_{GΦ} × id_{GΦ})|_{W(Φ,∆,α,δ)},
  id_{GY} ∘ Gρ₂ ∘ π₂|_{W(Φ,∆,α,δ)} = Gρ₂ ∘ π₂ ∘ (id_{GΦ} × id_{GΦ})|_{W(Φ,∆,α,δ)}.

Therefore (12) is well-defined.

Therefore, the components of the graded monad structures are well-defined. It is easy to check the axioms of a graded monad in Katsumata [2014, Definition 2.3] by using the monad structure of the sub-Giry monad G, since the graded monad structure of the approximate span-lifting is given by the monad structure of G and restrictions. □

Definition C.10 (Double strength).
If the family ∆ is reflexive, composable, and additive, then a double strength of the graded monad (−)^{♯(∆,α,δ)} is given as follows: for any pair (X, Y, Φ, ρ₁, ρ₂) and (X′, Y′, Ψ, ρ′₁, ρ′₂) of spans,

  (dst_{X,X′}, dst_{Y,Y′}, ⟨dst_{Φ,Ψ} ∘ (π₁ × π₁), dst_{Φ,Ψ} ∘ (π₂ × π₂)⟩|_{W(Φ,∆,α,δ)×W(Ψ,∆,β,γ)}) :
    (X, Y, Φ, ρ₁, ρ₂)^{♯(∆,α,δ)} ×̇ (X′, Y′, Ψ, ρ′₁, ρ′₂)^{♯(∆,β,γ)} → (Φ ×̇ Ψ)^{♯(∆,αβ,δ+γ)}.  (13)

Theorem C.11 (Well-definedness (Theorem 5.3)). If ∆ is reflexive, composable, and additive then the above structure indeed forms a double strength of the graded monad (−)^{♯(∆,α,δ)} on Span(Meas).

Proof. Since ∆ is reflexive and composable, (−)^{♯(∆,α,δ)} forms an A × ℝ_{≥0}-graded monad on Span(Meas). We show the well-definedness of (13). We fix spans (X, Y, Φ, ρ₁, ρ₂) and (X′, Y′, Ψ, ρ′₁, ρ′₂) and parameters α, β ∈ A and γ, δ ∈ ℝ_{≥0}. Since ∆ is additive, ⟨dst_{Φ,Ψ} ∘ (π₁ × π₁), dst_{Φ,Ψ} ∘ (π₂ × π₂)⟩|_{W(Φ,∆,α,δ)×W(Ψ,∆,β,γ)} is indeed a measurable function from W(Φ, ∆, α, δ) × W(Ψ, ∆, β, γ) to W(Φ ×̇ Ψ, ∆, αβ, δ + γ). From the binaturality of the double strength dst of the sub-Giry monad G, we have

  G(ρ₁ × ρ′₁) ∘ π₁|_{W(Φ×̇Ψ,∆,αβ,δ+γ)} ∘ ⟨dst_{Φ,Ψ} ∘ (π₁ × π₁), dst_{Φ,Ψ} ∘ (π₂ × π₂)⟩|_{W(Φ,∆,α,δ)×W(Ψ,∆,β,γ)}
    = G(ρ₁ × ρ′₁) ∘ dst_{Φ,Ψ} ∘ (π₁ × π₁)|_{W(Φ,∆,α,δ)×W(Ψ,∆,β,γ)}
    = G(ρ₁ × ρ′₁) ∘ dst_{Φ,Ψ} ∘ (π₁|_{W(Φ,∆,α,δ)} × π₁|_{W(Ψ,∆,β,γ)})
    = dst_{X,X′} ∘ ((Gρ₁ ∘ π₁|_{W(Φ,∆,α,δ)}) × (Gρ′₁ ∘ π₁|_{W(Ψ,∆,β,γ)})),
  G(ρ₂ × ρ′₂) ∘ π₂|_{W(Φ×̇Ψ,∆,αβ,δ+γ)} ∘ ⟨dst_{Φ,Ψ} ∘ (π₁ × π₁), dst_{Φ,Ψ} ∘ (π₂ × π₂)⟩|_{W(Φ,∆,α,δ)×W(Ψ,∆,β,γ)}
    = dst_{Y,Y′} ∘ ((Gρ₂ ∘ π₂|_{W(Φ,∆,α,δ)}) × (Gρ′₂ ∘ π₂|_{W(Ψ,∆,β,γ)})).

Hence, (13) is well-defined. It is easy to check the axioms of a double strength (modulo grading) by using the double strength of the sub-Giry monad G. □
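The graded structure above can be mirrored in a small executable model. The sketch below (ours; a deliberately non-categorical toy with a fixed Rényi order, assuming numpy) tracks, alongside a pair of coupled finite distributions, the grade δ at which their divergence is certified; the unit carries grade 0, as in reflexivity, and Kleisli composition adds grades, as in composability:

import numpy as np

ALPHA = 2.0

def renyi(p, q, a=ALPHA):
    m = p > 0
    return np.log(np.sum(p[m] ** a * q[m] ** (1 - a))) / (a - 1)

class Witness:
    # a pair of distributions together with a certified divergence budget
    def __init__(self, p, q, delta):
        self.p, self.q, self.delta = p, q, delta
        assert renyi(p, q) <= delta + 1e-9

def unit(i, n):
    p = np.zeros(n); p[i] = 1.0
    return Witness(p, p.copy(), 0.0)          # Dirac pair, grade 0

def bind(w, f, g, step):
    # Kleisli extension: push the two sides through kernels f, g;
    # grades add, provided every pair of rows is within the step budget
    assert all(renyi(f[j], g[j]) <= step + 1e-9 for j in range(len(f)))
    return Witness(w.p @ f, w.q @ g, w.delta + step)

w = unit(0, 3)
f = np.array([[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]])
g = np.array([[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]])
step = max(renyi(f[j], g[j]) for j in range(3))
w2 = bind(w, f, g, step)                      # certified at grade 0 + step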