A Monad for Probabilistic Point Processes
David I. Spivak and Jamie Vicary (Eds.): Applied Category Theory 2020 (ACT2020), EPTCS 333, 2021, pp. 19–32, doi:10.4204/EPTCS.333.2. © S. Dash & S. Staton. This work is licensed under the Creative Commons Attribution License.
Swaraj Dash Sam Staton
University of Oxford, Oxford, United Kingdom
[email protected]    [email protected]
A point process on a space is a random bag of elements of that space. In this paper we explore programming with point processes in a monadic style. To this end we identify point processes on a space X with probability measures of bags of elements in X. We describe this view of point processes using the composition of the Giry and bag monads on the category of measurable spaces and functions, and prove that this composition also forms a monad using a distributive law for monads. Finally, we define a morphism from a point process to its intensity measure, and show that this is a monad morphism. A special case of this monad morphism gives us Wald's Lemma, an identity used to calculate the expected value of the sum of a random number of random variables. Using our monad we define a range of point processes and point process operations, and compositionally compute their corresponding intensity measures using the monad morphism.

Point processes (e.g. [5]) are random collections of points. They serve as important tools in statistical analysis, where they are used to study various phenomena in fields as diverse as ecology, astronomy, computational neuroscience, and telecommunications, and in Bayesian analysis, where they are used for probabilistic inference (e.g. [17]). As a simple example, in Figure 1 we illustrate five draws from a Poisson point process on the unit square. A Poisson point process is defined to be one in which the numbers of points in any two disjoint regions are independent of each other. One of the core tools of point process theory is the notion of intensity measure, which assigns to each region the average number of points that will appear in the region. In a homogeneous Poisson point process like Figure 1, the average number of points in a region is proportional to the area of the region.
This is a very simple point process, but an important starting point for many models.

The centerpiece of our categorical analysis of point processes is the space G(B(X)), which we now explain.

• X is a measurable space such as the unit square I or the natural numbers N. We work in a category of measurable spaces so that we can discuss probability and integration in both the countable and uncountable settings. (See §2 for details.)

Figure 1: Five draws from a Poisson point process on the unit square with rate 10.
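Such draws are easy to simulate. The following sketch is ours, not the paper's: the function names, the rate 10.0, and the use of Knuth's Poisson sampling algorithm are illustrative choices.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's method: multiply uniforms until the product falls below e^(-lam).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def poisson_point_process(rate, rng):
    """One draw from the homogeneous Poisson point process on the unit square:
    the bag size is Poisson(rate), the locations are i.i.d. uniform."""
    n = poisson_sample(rate, rng)
    return [(rng.random(), rng.random()) for _ in range(n)]

rng = random.Random(1)
draw = poisson_point_process(10.0, rng)   # one bag of points, as in Figure 1
```

Averaging the bag sizes over many draws approximates the rate, in line with the intensity measure discussed later.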
Figure 2: Four draws from a point process on the natural numbers.

• B(X) is the space of all bags of points in X. A bag (aka multiset) is a finite unordered list of elements in X. For example, the first draw in Figure 1 is a bag of 9 points in X = I, and the last draw in Figure 2 is a bag of 6 points in X = N, with 5 overlapping points at 0 (multiplicity 5) and one at 6 (multiplicity 1). (See §3 for details.)

• G(B(X)) is the space of point processes, i.e. the space of probability measures on the space of bags of points in X. Here, G stands for Giry, who carried out early work on the category theory of spaces of probability measures.

(This space G(B(X)) is typically uncountable, but as is common in statistics, it is helpful to run a simulation, outputting a finite number of draws, as in Figures 1, 2, and 4.)

This paper has two main contributions:

• The construction G(B(X)) forms a monad (§4). This is useful because it gives us a compositional framework for point processes. One can build point processes (elements 1 → G(B(X))) by composing morphisms in the Kleisli category for the monad GB, using a syntax like Haskell's do-notation (§5). Our construction of a monad uses Beck's theory of distributive laws of monads.

• The construction assigning to each point process its intensity measure is a monad morphism (§6, Theorem 15). Thus, if we build a point process in a compositional way, we can also calculate its intensity in the same compositional way. The key idea here is to regard both G and B as submonads of a monad M of all measures, so that the intensity measure function can be defined as a composite GB ↪ MM --μ--> M.

Broader context.
The broader context of this work is the idea that category theory can be a language for organizing the structure of statistical models. At one end of the spectrum, this line of work involves a foundational categorical analysis (e.g. [6, 8, 9, 10, 11, 18, 19, 23, 27]). At the other end of the spectrum is 'probabilistic programming', a popular method of statistical modelling using programs (e.g. [13, 30, 25]); in many instances this is functional programming and so heavily inspired by category theory. This full spectrum of work plays a foundational role in probability and statistics, but also addresses a practical grand challenge of interpretability in statistical models, since category theory and programming allow us to clearly organize the structure of complicated statistical models, via composition.

Our work here appears to be the first work on point processes in this context. However, point processes are widely used in statistical models in practice. We mention two programming styles whose relationship to point processes has only recently become evident:

• probabilistic logic programming, in the style of BLOG [31], is about describing random sets of points;

• probabilistic databases are random bags of records [15].
We recall basic measure theory, which is the standard formulation for probability theory over uncountable spaces (e.g. [26]).
Definition 1. A σ-algebra on a set X is a nonempty family Σ_X of subsets of X that is closed under complements and countable unions. The pair (X, Σ_X) is called a measurable space (we just write X when Σ_X can be inferred from context).

Given (X, Σ_X), a measure is a function ν : Σ_X → R∞+ such that for all countable collections of disjoint sets A_i ∈ Σ_X, ν(⋃_i A_i) = Σ_i ν(A_i). In particular, ν(∅) = 0. It is a probability measure if ν(X) = 1. A pre-measure is defined by the same additivity condition, except that it is not necessarily defined on a σ-algebra.

Examples.
The Borel sets form the least σ-algebra Σ_R of subsets of R that contains the intervals (a, b). On a countable set X, such as N or the one-point set 1 = {⋆}, we will typically consider the discrete σ-algebra, which contains all the subsets. In this context, the measures are entirely determined by their values on singletons, ν({x}), and so a measure is the same thing as a function X → R∞+.

Definition 2.
Let (X, Σ_X) and (Y, Σ_Y) be two measurable spaces. A measurable function f : X → Y is a function such that f⁻¹(U) ∈ Σ_X when U ∈ Σ_Y. The category Meas contains as objects measurable spaces, with the morphisms being measurable functions between them.

For any measurable function f : X → R∞+, and any measure ν : Σ_X → R∞+, we can define the Lebesgue integral or expected value ∫_X f dν of f.

Definition 3.
A measurable space (X, Σ_X) is a standard Borel space if it is either measurably isomorphic to (R, Σ_R) or it is countable and discrete.

The Giry functor G : Meas → Meas sends a measurable space to the space of all possible probability measures on it [12]. By slight abuse of notation, let G(X, Σ_X) := (GX, Σ_GX). GX is the set of all probability measures ν : Σ_X → [0, 1] on X, equipped with the σ-algebra Σ_GX generated by the set of all evaluation maps ev_U : GX → [0, 1], sending ν to ν(U) (where U ∈ Σ_X). In other words, it is generated by sets of probability measures D_{U,I} = { ν : Σ_X → [0, 1] | ν(U) ∈ I }:

  Σ_GX = σ({ D_{U,I} | U ∈ Σ_X, I ∈ B([0, 1]) })

where B([0, 1]) is the Borel σ-algebra and σ is the closure operator which, when given a family of subsets, generates the required σ-algebra by closing the family under countable unions and complements. The following unit η^G_X : X → GX and multiplication μ^G_X : GGX → GX make G into a monad:

  η^G_X(x) = δ_x = λU. (1 if x ∈ U, 0 otherwise)        μ^G_X(ν) = λU. ∫_GX ev_U dν
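For intuition (this code is our illustration, not from the paper): on a countable discrete space, a probability measure is just a weight function on points, and the Giry unit and multiplication specialise to the familiar discrete distribution monad. We represent an inner distribution as a dict, and a distribution over distributions as weighted pairs (since dicts are not hashable).

```python
def eta(x):
    # Unit: the Dirac measure delta_x.
    return {x: 1.0}

def mu(xi):
    # Multiplication: average the inner distributions, weighted by the outer one.
    # This is the discrete analogue of mu(nu) = lambda U. integral of ev_U d(nu).
    out = {}
    for inner, p in xi:               # xi: list of (inner distribution, probability)
        for x, q in inner.items():
            out[x] = out.get(x, 0.0) + p * q
    return out

# A fair choice between a biased coin and a Dirac measure:
xi = [({'h': 0.2, 't': 0.8}, 0.5), (eta('h'), 0.5)]
flat = mu(xi)   # {'h': 0.6, 't': 0.4}
```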
Given measurable f : X → Y, the functorial action Gf : GX → GY sends ν ∈ GX to the push-forward measure of ν along f, ν ∘ f⁻¹ ∈ GY. An important property of push-forward measures is the change-of-variables formula ∫_Y g d(Gf)(ν) = ∫_X (g ∘ f) dν (where g : Y → [0, 1]).

The all-measures functor M : Meas → Meas is defined similarly to the Giry functor. It sends measurable spaces (X, Σ_X) to the space of all measures (MX, Σ_MX), where MX is the set of measures μ : Σ_X → R∞+ and Σ_MX = σ({ m_{U,r} | U ∈ Σ_X, r ∈ R }) with m_{U,r} = { μ : Σ_X → R∞+ | μ(U) < r }. The same unit and multiplication maps make M into a monad. (Warning: the Giry monad is strong and commutative, but the all-measures monad M is not strong, because Fubini's Theorem does not hold for arbitrary measures. So something more refined is needed for functional programming, but that is not an issue for this paper; see e.g. [28] for details.)

In this Section we discuss the construction of the bag monad in the context of measure theory. A bag (aka multiset) is a finite unordered list of elements in some set. For example, the bag [5, 8, 5, 8, 8] contains 5 twice and 8 three times, and can also be written as [5, 5, 8, 8, 8] since bags are unordered. We begin by recalling the bag endofunctor B in Set, which we show to lift to an endofunctor in
Meas by assigning the σ-algebra Σ_BX to the space of bags in §3.1 (Definition 4). In §3.2 we prove that B, which is a monad in
Meas by showing that the unit and multiplication maps extend to measurable functions. Later in §4 we will need to define probability measures on BX, i.e., functions Σ_BX → [0, 1]. Defining such functions entails having to define them on arbitrary combinations of unions and intersections of our generating sets A_{U,k}. We simplify this task by making use of Carathéodory's extension theorem in §3.3, showing that it suffices to define these functions on the generating sets of Σ_BX without needing to define them on all of Σ_BX.

Consider the well-known finite bag endofunctor B : Set → Set, where BX is the set of all finite bags with elements of X. Given a function f : X → Y, the function Bf : BX → BY applies f component-wise to its argument bag. The natural transformations η^B_X : X → BX and μ^B_X : BBX → BX, which return the singleton bag and the (multiplicity-respecting) union of bags respectively, make (B, η^B, μ^B) into a monad:

  η^B_X(x) = [x]        μ^B_X([b_1, . . . , b_n]) = ⋃_i b_i

In order to lift B : Set → Set to B : Meas → Meas, we equip our set BX with a σ-algebra Σ_BX.

Definition 4 (Measurable space of bags). Let BX be the set of bags on the measurable space X. Equip BX with the σ-algebra Σ_BX formed by the σ-closure of the generating sets A_{U,k} = { b ∈ BX | b contains exactly k elements in U }:

  Σ_BX = σ({ A_{U,k} | U ∈ Σ_X, k ∈ N })

Then (BX, Σ_BX) is the measurable space of bags of X. (Throughout this paper we use λ-notation to describe functions between sets. Note that the category Meas is not cartesian closed and so this is not intended as a formal internal language (c.f. [18]).)

The set A_{U,k} ∈ Σ_BX contains bags of X of cardinality at least k, as each bag in it contains k elements in U, in addition to possible other elements not in U. The set A_{X,k}, on the other hand, contains all the bags of X of cardinality exactly k (since X is our universal set).
Their intersection A_{X,n} ∩ A_{U,k} is then the set of bags of cardinality n with k elements in U. This can be extended to construct the set of bags of cardinality n containing k_i elements in U_i for some family of sets U_i ∈ Σ_X, which is then the intersection A_{X,n} ∩ (⋂_i A_{U_i,k_i}).

Lemma 5.
The unit and multiplication maps η^B_X : X → BX and μ^B_X : BBX → BX are measurable.

Proof.
To prove the measurability of these functions it suffices to show that the pre-images of the generating sets A_{U,k} are measurable. Consider U ∈ Σ_X and some A_{U,k} ∈ Σ_BX. The inverse image (η^B_X)⁻¹(A_{U,k}) evaluates to Ū (the complement of U) if k = 0, to U if k = 1, and to ∅ otherwise, all of which are elements of Σ_X, and so η^B_X is measurable. We now sketch why (μ^B_X)⁻¹(A_{U,k}) ∈ Σ_BBX, and later show a detailed argument for the case k = 4. Call this set P. By definition of the inverse image map, P is the collection of bags of bags such that the union of each bag of bags contains exactly k elements in U. Recall that the set of bags of cardinality n containing k_i elements in U_i, for some family of sets U_i ∈ Σ_X, is given by A_{X,n} ∩ (⋂_i A_{U_i,k_i}). By using this technique of describing collections of bags, and considering the various partitions of the number k such that the resulting union of bags will contain k elements in U, we can express P entirely using measurable sets, allowing us to conclude that μ^B_X is a measurable function.

Example 6. Let P = (μ^B_X)⁻¹(A_{U,4}). The number 4 can be partitioned in five ways: {4, 3+1, 2+2, 2+1+1, 1+1+1+1}. P is the set of bags of bags such that the union of each bag of bags contains exactly 4 elements in U. We start by considering elements of this set based on their cardinalities.

• There is only one collection of bags of cardinality 1 which are members of P. These are the bags which contain a single bag which in turn contains 4 elements in U. Denote this collection as ⟨4⟩.

• There are three collections of bags of cardinality 2 which are members of P. The first contains two bags which have 4 and 0 elements in U, the second with 3 and 1 elements in U, and the third with 2 and 2 elements in U, respectively. We write them as ⟨4, 0⟩, ⟨3, 1⟩, and ⟨2, 2⟩.

• Cardinality 3: ⟨4, 0, 0⟩, ⟨3, 1, 0⟩, ⟨2, 2, 0⟩, and ⟨2, 1, 1⟩.

• Cardinality 4: ⟨4, 0, 0, 0⟩, ⟨3, 1, 0, 0⟩, ⟨2, 2, 0, 0⟩, ⟨2, 1, 1, 0⟩, ⟨1, 1, 1, 1⟩.

• Cardinality 5: ⟨4, 0, 0, 0, 0⟩, ⟨3, 1, 0, 0, 0⟩, ⟨2, 2, 0, 0, 0⟩, ⟨2, 1, 1, 0, 0⟩, ⟨1, 1, 1, 1, 0⟩.
And so on. Each collection is definable using the generating sets, and the collections of different cardinalities are mutually disjoint. For example, ⟨2, 1, 1⟩ = A_{BX,3} ∩ (A_{B_2,1} ∩ A_{B_1,2} ∩ A_{B_0,0}) and ⟨3, 1, 0, 0⟩ = A_{BX,4} ∩ (A_{B_3,1} ∩ A_{B_1,1} ∩ A_{B_0,2}), where B_i = A_{U,i}. Finally, P is the union of all these disjoint collections:

  P = (μ^B_X)⁻¹(A_{U,4}) = ⟨4⟩ ∪ ⟨4, 0⟩ ∪ ⟨3, 1⟩ ∪ ⟨2, 2⟩ ∪ ⟨4, 0, 0⟩ ∪ . . .

Theorem 7. (B : Meas → Meas, η^B, μ^B) is a monad.

Proof. The monad laws hold as in
Set. Furthermore, η^B and μ^B are measurable (Lemma 5).

In this Section we construct a ring – a set of sets containing the empty set, closed under pairwise unions and relative complements – of the generating sets A_{U,k} of Σ_BX, in order to invoke Carathéodory's extension theorem. This allows us to define measures by defining them on just specific unions of intersections of the sets A_{U,k} of Σ_BX, rather than having to define them on all the arbitrary combinations of unions and intersections of these sets.

Theorem 8 (Carathéodory's extension). Let R be a ring and ν : R → R∞+ be a pre-measure. Then there exists a measure ν̃ : σ(R) → R∞+ such that ν̃(S) = ν(S) for all S ∈ R.

We start by defining R′ to be the set of countable intersections of our generating sets above such that their base sets are mutually disjoint:

  R′ = { ⋂_i A_{U_i,k_i} | U_i ∈ Σ_X, the U_i mutually disjoint, k_i ∈ N }

Now define R to be the closure of R′ under countable unions. The elements of R are the unions of intersections of certain generating sets. In particular, any P ∈ R can be expressed as P = ⋃_i ⋂_j A_{U_{i,j},k_{i,j}}.

Examples.
One example of such a set P is (A_α ∩ A_β) ∪ (A_β ∩ A_γ) ∪ (A_α ∩ A_β ∩ A_δ), where α, β, γ, δ ∈ Σ_X (we write A_α for a generating set A_{α,k}, suppressing the multiplicity index). Note that although α and β, β and γ, and α, β, and δ are all mutually disjoint (by definition of R′), it is still possible for α and γ to overlap. Using standard set-theoretic identities we can redefine P in terms of α \ γ (instead of just α), β, γ, δ, so that all the base sets across the unions are mutually disjoint.

Consider also the set (A_α ∩ A_β) ∪ (A_α ∩ A_γ), where α, β, γ ∈ Σ_X are mutually disjoint. We can rewrite A_α ∩ A_β as ⋃_i (A_α ∩ A_β ∩ A_{γ,i}), since ⋃_i A_{γ,i} is simply the universal set. The right half of the set above can similarly be rewritten, enabling us to reformulate it as ⋃_i (A_α ∩ A_β ∩ A_{γ,i}) ∪ ⋃_i (A_α ∩ A_{β,i} ∩ A_γ).

From the two examples above, we can assume without loss of generality that an arbitrary element P ∈ R will be of the form ⋃_i ⋂_j A_{U_j,k_{i,j}} such that all the U_j are mutually disjoint. Finally, note that any two sets ⋂_j A_{U_j,k_{m,j}} and ⋂_j A_{U_j,k_{n,j}} are disjoint unless k_{m,j} = k_{n,j} for all j. And so, every P can be viewed as the disjoint union of a set of sets. Call this the disjoint normal form (it is not unique).

Lemma 9. R contains the empty set and is closed under pairwise unions and relative complements.

Proof. It is clear that ∅ ∈ R. The set R is by definition closed under countable unions, and so is also closed under pairwise unions. Consider P and Q ∈ R with their respective disjoint normal forms. Without loss of generality, we can express both P and Q using the same set of mutually disjoint base sets U_j ∈ Σ_X. This gives us P = ⋃_i ⋂_j A_{U_j,a_{i,j}} and Q = ⋃_i ⋂_j A_{U_j,b_{i,j}}.
Since both P and Q have been formed by taking unions of a common set of disjoint sets belonging to R′, their difference P \ Q can also be expressed as the disjoint union of sets belonging to R′, and so P \ Q ∈ R.

Having shown R to be a ring, we have by Carathéodory's extension theorem that any pre-measure defined on R extends to a measure defined on σ(R). And so, in order to define a measure on BX, it will suffice to define it on sets ⋃_i ⋂_j A_{U_j,k_{i,j}}. We use this fact in the next Section.

A point process on a space X is a probability measure on bags of X. By composing the Giry and bag monads we can define GBX to be the space of point processes on X. In other words, a point process α ∈ GBX is a probability measure α : Σ_BX → [0, 1] assigning probabilities to measurable subsets of bags A_{U,k}. The probability of observing k points in the region U of the point process α is then α(A_{U,k}).

Earlier we showed that G and B both form monads. It is well known that the composition of two monads does not automatically yield a new monad. In this Section we prove that the composite functor GB admits a monadic structure by defining a natural transformation l : BG → GB, called the distributive law [1] of G over B, such that the following identities hold:

  (Triangle I)  l ∘ Bη^G = η^G B        Gη^B = l ∘ η^B G  (Triangle II)
  (Pentagon I)  l ∘ Bμ^G = μ^G B ∘ Gl ∘ lG        Gμ^B ∘ lB ∘ Bl = l ∘ μ^B G  (Pentagon II)

This distributive law l then induces the GB monad, with the unit defined as the horizontal composition η^G ∗ η^B, and the join defined as the composition of the horizontal composition μ^G ∗ μ^B with GlB:

  η^GB = η^G ∗ η^B : 1 → GB        μ^GB : GBGB --GlB--> GGBB --μ^G ∗ μ^B--> GB

The distributive law l_X : BGX → GBX is a function from bags of probability measures to probability measures on bags. We showed in §3.3 that in order to define a measure on Σ_BX it suffices to define a pre-measure on sets of the form ⋃_i ⋂_j A_{U_j,k_{i,j}}; Carathéodory's extension theorem ensures this pre-measure extends to a measure.

In the definition that follows, we consider a bag of probability measures [ν_1, . . . , ν_n] ∈ BGX and a set ⋃_i ⋂_j A_{U_j,k_{i,j}} ∈ Σ_BX. We define the application of l[ν_1, . . . , ν_n] to this disjoint union of intersections as the sum of products of the l[ν_1, . . . , ν_n](A_{U_j,k_{i,j}}). Each of these sub-terms is in turn defined as the push-forward of the product measure along K_n, a function mapping n-tuples to bags of cardinality n. An example follows in (1).

  l[ν_1, . . . , ν_n](⋃_i ⋂_j A_{U_j,k_{i,j}}) := Σ_i Π_j l[ν_1, . . . , ν_n](A_{U_j,k_{i,j}}) := Σ_i Π_j GK_n(⊗_i ν_i)(A_{U_j,k_{i,j}})

where K_n : Yⁿ → B_n Y is the measurable function that sends n-tuples to bags of cardinality n (B_n Y ⊆ BY). (K_n is measurable since K_n⁻¹ : Σ_{B_n Y} → Σ_{Yⁿ} sends sets A_{U,k} to their corresponding disjoint unions of n-fold products of U and Ū. For example, K_2⁻¹(A_{U,1}) = U × Ū ⊎ Ū × U.) In the definition above, Y has been instantiated to GX.

Intuition: the term l[ν_1, . . . , ν_n] is a point process in which the probability of observing k points in some region U ∈ Σ_X is the probability of observing a total of k points landing in U after independently sampling one point from each of the ν_i ∈ GX. The following example calculation of l[ν_1, ν_2](A_{U,1}) confirms this idea:

  l[ν_1, ν_2](A_{U,1}) = GK_2(ν_1 ⊗ ν_2)(A_{U,1}) = (ν_1 ⊗ ν_2)(K_2⁻¹(A_{U,1}))
                       = (ν_1 ⊗ ν_2)(U × Ū ⊎ Ū × U) = ν_1(U) ν_2(Ū) + ν_1(Ū) ν_2(U).   (1)
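For finitely-supported measures, the distributive law can be computed exactly by enumerating the product measure and pushing it forward along K_n. This sketch is ours, not the paper's: bags are represented as sorted tuples over a two-point space whose elements stand for "in U" and "outside U"; the calculation reproduces (1).

```python
from itertools import product

def bag(xs):
    # Canonical bag representation: a sorted tuple (elements must be comparable).
    return tuple(sorted(xs))

def distributive_law(measures):
    """l[nu_1, ..., nu_n]: independently sample one point from each nu_i and
    forget the order, i.e. push the product measure forward along K_n."""
    out = {}
    for combo in product(*(m.items() for m in measures)):
        b = bag(x for x, _ in combo)
        p = 1.0
        for _, q in combo:
            p *= q
        out[b] = out.get(b, 0.0) + p
    return out

# Two measures on {'u', 'o'} ('u' = a point in U, 'o' = a point outside U):
nu1, nu2 = {'u': 0.3, 'o': 0.7}, {'u': 0.4, 'o': 0.6}
d = distributive_law([nu1, nu2])
# d[('o', 'u')] is l[nu1, nu2](A_{U,1}) = 0.3*0.6 + 0.7*0.4 = 0.46, as in (1).
```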
Note that l[ν_1, . . . , ν_n](A_{U,k}) = 0 for k > n. Observe also that Σ_i l[ν_1, . . . , ν_n](A_{U,i}) = 1. Showing that l is measurable is a routine calculation, and is made simpler with the knowledge that sets of constant-cardinality bags are measurable.

We noted earlier that the sets A_{U,m} contain bags of varying cardinalities. In the following lemma we show that the measure l[ν_1, . . . , ν_n] acts only on the subset of these sets with cardinality n. This result is simple yet very useful in providing an intuitive understanding of the distributive law, and is instrumental in proving the second pentagon identity.

Lemma 10.
For [ν_1, . . . , ν_n] ∈ BGX, l[ν_1, . . . , ν_n](A_{X,m} ∩ W) = l[ν_1, . . . , ν_n](W) if m = n, and 0 if m ≠ n.

Proof. Consider l[ν_1, . . . , ν_n](A_{X,m}). Using the definition of l, this probability can be expressed as the sum of products of a combination of ν_i(X) and ν_j(X̄) terms (where i, j range from 1 to n). Unless m = n, each summand will contain at least one factor ν_j(X̄) = ν_j(∅) = 0, nullifying the entire sum. It is only non-zero when m = n, representing the probability of observing n points in the entire space, which has probability 1. The measure of any set A_{X,n} ∩ W is then just the measure of W.

Theorem 11. (GB : Meas → Meas, η^GB, μ^GB) is a monad via the distributive law l : BG → GB.

Proof (sketch).
We prove that the four identities for the distributive law hold. The two triangle identities follow from simple algebra. For the first pentagon identity we make use of the change-of-variables formula, as l[ν_1, . . . , ν_n] = GK_n(⊗_i ν_i) is a push-forward measure, and prove the resulting equality using standard integration identities. For the final pentagon identity we are required to work with the set (μ^B_X)⁻¹(A_{U,k}), which we decompose using the method presented in Lemma 5. Invoking Lemma 10 on these constant-cardinality decompositions allows us to simplify the resulting expression by removing sets with measure zero, and prove the final equality after some more algebraic manipulations.

The unit η^GB returns the deterministic point process η^GB(x) with the singular point x. When programming with monads it is often convenient to focus on Kleisli composition in a stylized form, using the function >>=_GB : Meas(X, GBY) → Meas(GBX, GBY) (pronounced bind); we write α >>=_GB f for >>=_GB(f)(α) [24].

Figure 3: Sampling from a composite process.

This presents us with useful intuition for programming with point processes. Let X and Y be discrete sets. Then the process of sampling points from α >>=_GB f ∈ GBY can be viewed as the following simulation, illustrated in Figure 3. (1) Sample a bag of points from α. (2) Each point x_i in this bag produces a point process f(x_i), from which we sample a bag of points in Y. (3) A sample of points from the overall point process is the union of these bags of points in Y. This intuition allows us to declaratively program with point processes, defining them simply by how they must be simulated. We make extensive use of this intuition in the next Section.

Aside about related work.
A long-term problem in programming semantics has been the combination of probability and non-determinism (e.g. [29]). In that context, it is well known that there is no distributive law between the probability monad and the powerset monad (e.g. [29, 32, 14]). It has recently become well known that, in the set-theoretic case, it is possible to find a distributive law by using a bag monad instead of a powerset monad (e.g. [22, App. A], [4, 32]). This is not a sleight of hand: the bag monad is the free commutative monoid monad, and the free commutative comonoid also plays an important role in the theory of linear logic. Recently, bag-like exponentials have arisen in models of probabilistic linear logic [3, 16]. The precise relationship to our monad and our distributive law remains to be seen.

More broadly, bags, multisets and urns play a fundamental role in statistics and arise at various points in a categorical treatment (e.g. [19, 20, 21]).

Probability distributions as point processes.
As a first example, we describe probability distributions on the natural numbers as point processes on the singleton space 1 = {⋆}, based on the observation that a bag of singletons is a natural number (B1 ≅ N). Any probability distribution d ∈ GN (so that Σ_{i=0}^∞ d_i = 1) gives a point process d ∈ GB1, where we observe k copies of ⋆ with probability d_k:

  d(A_{{⋆},k}) := d_k   (2)

Building compound probability distributions.
Using our monad we define compound distributions as point processes on the unit type. A compound probability distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a random variable. For example, given a random variable N ~ Poisson(Λ) and iid variables X_i, the random variable Y = Σ_{i=1}^N X_i forms a compound Poisson distribution.

Recall the behaviour of >>=_GB on countable sets described in §4.2. By considering N ∈ GB1 (say, the Poisson distribution) and X ∈ GB1 (the distribution of the iid X_i), we can express compound distributions as:

  γ = N >>=_GB λ⋆. X ∈ GB1.   (3)

A Poisson point process on the unit square.
The Poisson point process on the unit square I (Fig. 1) can be simulated by first sampling a random number of points from the Poisson distribution, and then uniformly distributing these points across I. Consider again a Poisson distribution N ∈ GB1 as a point process. Now consider the point process U ∈ GBI which returns a single point uniformly distributed in I. The Poisson point process π can be built using the monad:

  π = N >>=_GB λ⋆. U ∈ GBI.   (4)

Thinning a point process.
Thinning is an operation applied to the points of an underlying point process, where the points are thinned (removed) according to some probabilistic rule. Given some point process α ∈ GBX and some thinning rule t : X → GBX, such that t(x) probabilistically returns either [x] or the empty bag, we can use the monad to build the thinned point process α′ ∈ GBX as α′ = α >>=_GB λx. t(x) ∈ GBX.
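Following the sampling intuition of §4.2, bind and thinning can be sketched in code. This is our own illustration, not the paper's: we represent a point process as a sampler, a function from a random source to one bag of points.

```python
import random

def bind(alpha, f):
    """alpha >>= f, on samplers: draw a bag from alpha, draw a bag from f(x)
    for each point x, and return the multiplicity-respecting union."""
    def sampler(rng):
        return [y for x in alpha(rng) for y in f(x)(rng)]
    return sampler

def thin(alpha, keep):
    # The thinning rule t(x): return [x] with probability keep(x), else [].
    def t(x):
        return lambda rng: [x] if rng.random() < keep(x) else []
    return bind(alpha, t)

# Thin a deterministic bag of 10000 points, keeping each with probability 0.3:
alpha = lambda rng: [i / 10000 for i in range(10000)]
thinned = thin(alpha, lambda x: 0.3)(random.Random(0))
```

About 30% of the points survive, matching the intensity calculation of §6.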
Displacing a point process.
Displacement is an operation applied to the points of an underlying point process, where the points are independently randomly displaced (translated) according to some distribution. We model this distribution as a single-point point process ∆ ∈ GBR. The location of this random point is the random displacement distance. For α ∈ GBR we simulate the displaced point process α′ ∈ GBR by sampling an x from α, a displacement distance d from ∆, and then returning the displaced point:

  α′ = α >>=_GB λx. (∆ >>=_GB λd. η^GB(x + d)) ∈ GBR.

Figure 4: Two draws from a clustered point process.
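Displacement has the same shape in the sampler representation. This sketch is ours: the three-point α and the ±0.5 displacement ∆ are illustrative choices.

```python
import random

def displace(alpha, delta):
    """Displaced process: alpha >>= (x -> (delta >>= d -> eta(x + d))).
    alpha and delta are samplers returning bags; delta returns a
    single-point bag, drawn afresh for each point of alpha."""
    def sampler(rng):
        return [x + d for x in alpha(rng) for d in delta(rng)]
    return sampler

alpha = lambda rng: [0.0, 1.0, 2.0]            # a deterministic three-point process
delta = lambda rng: [rng.choice([-0.5, 0.5])]  # random displacement distance
moved = displace(alpha, delta)(random.Random(0))
```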
Clustered point processes.
Clustered point processes are useful in modelling phenomena which involve multiple points spawning from individual seeds, such as clusters of trees, galaxies, or diseases. Informally, a clustered point process is anything built using the monadic bind γ_1 >>= λx. γ_2(x), where γ_1 is a point process for the initial seeds, and γ_2 is a point process that grows from each seed, where the location x of a given seed may be a parameter. For a simple example, consider the point process in Fig. 4, consisting of small square clusters within the unit square. To simulate it we first sample the centres of these clusters from our Poisson point process π on the unit square (4), and for each cluster centre we sample another Poisson point process. To sample the second point process, we again sample from another Poisson distribution N′, whose rate now depends on the provided coordinates of the cluster – the closer to the diagonal, the higher the rate – and we uniformly distribute these points in a small square about this centre using U′, which is a location-dependent and scaled-down modification of U introduced earlier. This results in a Poisson number of clusters of Poisson processes, with those closer to the diagonal being denser than those farther away.

  β = π >>=_GB λ(x, y). (N′(x, y) >>=_GB λ⋆. U′(x, y)) ∈ GBI.

Here, π is itself defined using the monad (4). This example is quite simple, but already illustrates that we can use the monad to quickly and clearly compose point processes to build complex statistical models.

A useful characteristic for describing point processes is the expected number of points in a given region. For example, in Fig. 1 we illustrated the homogeneous Poisson process with rate 10. The expected number of points in any region is proportional to 10a, where a is the area of the region.
More generally, the intensity measure of a point process is the measure that assigns to each measurable subset the expected number of points in it.

There is a function E : GB(X) → M(X) that takes a point process to its intensity measure. In Theorem 15 we show that this function is actually a monad morphism from the point process monad (§4) to the monad of all measures (§2.3). Thus, if we build a point process using the monadic constructions (for example by composing morphisms in the Kleisli category of GB) then we can immediately read off its intensity.

The expected number of points in a region U ∈ Σ_X of a point process α ∈ GBX can be given in terms of our generating sets A_{U,k} for BX, as Σ_{k=0}^∞ k · α(A_{U,k}). We first show how to understand this in a more abstract way, by injecting both probability measures GX and bags BX into measures MX, in a measurable and natural way.

The injection i_G : G → M is straightforward, because GX ⊆ MX. The injection i_B : B → M sends bags [x_1, . . . , x_n] to Σ_{i=1}^n δ_{x_i}, the multiplicity-respecting sum of Dirac deltas centred around the elements x_i. It is measurable since its inverse image map sends the generating sets m_{U,r} ∈ Σ_MX to ⋃_{i=0}^{⌊r⌋} A_{U,i} ∈ Σ_BX. The proof that this is injective relies on X being standard Borel. This injection is familiar in point process theory; indeed many authors actually define BX as a space of integer-valued measures in the first place. We can combine the horizontal composition of these two injections (i_G ∗ i_B) with the multiplication of M in order to define E:

  E := GB --i_G ∗ i_B--> MM --μ^M--> M

In the remainder of this paper we omit ∗ when writing horizontal compositions. This definition of E does indeed return the intensity measure of a point process:

Lemma 12.
For any point process α ∈ GBX , E ( α )( U ) = ∑ k k · α ( A Uk ) .Proof. Consider α ∈ GBX and U ∈ Σ X . On expanding the horizontal composition i G i B and using thechange-of-variables formula for pushforward measures we have that E ( α )( U ) = (cid:90) MX ev U d Mi BX ( i GBX ( α )) = (cid:90) b ∈ BX i BX ( b )( U ) α ( d b ) . We separately compute this integral on the disjoint partitions A Uk ( k ∈ N ) of BX . In each partition, thevalue of i BX ( b )( U ) is equal to k (by definition). This gives us the desired infinite sum of ∑ k k · α ( A Uk ) .To show that E : GB → M is a monad morphism we need to prove that (Unit) η M = E ◦ η GB and µ M ◦ EE = E ◦ µ GB (Mult) . Our main result stems from the fact that l interacts well with i G and i B , which we prove next. Lemma 13. ( E ◦ l =) µ M ◦ i G ∗ i B ◦ l = µ M ◦ i B i G : BG → M. BGX GBXM X M XMXl X ( i B i G ) X µ MX µ MX ( i G i B ) X Proof. (Diagram chasing) Consider [ ν , . . . , ν n ] ∈ BGX and U ∈ Σ X . Go-ing from BGX to MX along the left edge and applying the resulting mapto U gives us ∑ i ν i ( U ) . Along the other edge, making use of Lemma12, we get ∑ i i · l [ ν , . . . , ν n ]( A Ui ) . Their equality can be proved by notic-ing that l [ ν , . . . , ν n ]( A Uk ) is simply the coefficient of x k in the polynomial P ( x ) = ∏ i ( ν i ( ¯ U ) + ν i ( U ) · x ) . And so equivalently P ( x ) = ∑ i l [ ν , . . . , ν n ]( A Ui ) · x i . The desired equalityis then arrived at by taking the derivative of P ( x ) at x = Lemma 14. i G : G → M and i B : B → M are monad morphisms. A Monad for Point Processes
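Lemma 12 and the generating-function step in the proof of Lemma 13 can be checked concretely on finitely-supported data. The sketch below is ours, not the paper's: bags are tuples, a point process $\alpha$ is a dict from bags to probabilities, a measurable set $U$ is a predicate, and all helper names (`i_B`, `intensity`, `pgf_coeffs`) are hypothetical.

```python
def i_B(bag):
    """The injection i_B: a bag becomes its counting measure sum_i delta_{x_i},
    here represented by the function U -> number of elements of the bag in U."""
    return lambda U: sum(1 for x in bag if U(x))

def intensity(alpha, U):
    """E(alpha)(U): integrate the counting measure i_B(b)(U) against alpha."""
    return sum(p * i_B(bag)(U) for bag, p in alpha.items())

def count_distribution(alpha, U):
    """alpha(A^U_k): probability that a sampled bag has exactly k points in U."""
    dist = {}
    for bag, p in alpha.items():
        k = i_B(bag)(U)
        dist[k] = dist.get(k, 0.0) + p
    return dist

# Lemma 12: E(alpha)(U) = sum_k k * alpha(A^U_k).
alpha = {(0, 0): 0.5, (1,): 0.3, (): 0.2}   # a distribution over three bags
U = lambda x: x == 0
lhs = intensity(alpha, U)                                        # 0.5 * 2 = 1.0
rhs = sum(k * p for k, p in count_distribution(alpha, U).items())
assert abs(lhs - rhs) < 1e-12

# Lemma 13's key step: for probability measures nu_1..nu_n with nu_i(U) = q_i,
# the number of points of l[nu_1..nu_n] in U has generating function
# P(x) = prod_i ((1 - q_i) + q_i * x), and P'(1) = sum_i q_i.
def pgf_coeffs(qs):
    coeffs = [1.0]
    for q in qs:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):   # multiply by ((1 - q) + q * x)
            new[k] += c * (1 - q)
            new[k + 1] += c * q
        coeffs = new
    return coeffs  # coefficient k = probability of exactly k points in U

qs = [0.2, 0.5, 0.9]
derivative_at_1 = sum(k * c for k, c in enumerate(pgf_coeffs(qs)))
assert abs(derivative_at_1 - sum(qs)) < 1e-12   # = 1.6
```

The final assertion is exactly the derivative-at-one step of the proof: $P'(1) = \sum_i \nu_i(U)$, the expected number of points in $U$.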
Theorem 15. The intensity measure $E : GB \to M$ is a monad morphism.

Proof. A simple calculation shows (Unit) to hold. For (Mult), consider the two diagrams below.
[Two pasting diagrams with sub-diagrams labelled (I)–(VII), witnessing the equality $\mu^M \circ EE = E \circ \mu^{GB}$.]

All the sub-diagrams above commute: (I) due to Lemma 13, (II) by naturality, (III) due to associativity of $\mu^M$, (IV) and (VII) due to $i_G$ and $i_B$ being monad morphisms (Lemma 14), and finally (V) and (VI) by naturality. Using the commutative diagrams above we prove the required equality $\mu^M \circ EE = E \circ \mu^{GB} : GBGB \to M$:
$$\begin{aligned}
\mu^M \circ EE &= \mu^M \circ M\mu^M \circ \mu^M MM \circ i_G MM i_B \circ G i_B i_G B && \text{(defn. of } E \text{ + naturality)} \\
&= \mu^M \circ M\mu^M \circ \mu^M MM \circ i_G MM i_B \circ G i_G i_B B \circ GlB && \text{(left diagram)} \\
&= \mu^M \circ M\mu^M \circ \mu^M MM \circ MM i_B i_B \circ i_G i_G BB \circ GlB && \text{(naturality)} \\
&= \mu^M \circ M i_B \circ i_G B \circ G\mu^B \circ \mu^G BB \circ GlB && \text{(right diagram)} \\
&= \mu^M \circ M i_B \circ i_G B \circ \mu^{GB} && \text{(defn. of } \mu^{GB}) \\
&= E \circ \mu^{GB} && \text{(defn. of } E)
\end{aligned}$$

Example 16. In §5 we simulated a Poisson point process by composing the Poisson distribution with a uniform singleton. We show this has the required intensity measure in a compositional way, using the monad morphism. Let $N \in GB1 \cong G\mathbb{N}$ be the Poisson distribution with mean $\Lambda$, and let $U \in GBI$ be the uniformly distributed single-point process from §5. The simulated Poisson process is $\pi = (N \mathrel{>\!\!>\!\!=}_{GB} \lambda{\star}.\,U)$, and we have
$$\begin{aligned}
E(\pi) &= E(N \mathrel{>\!\!>\!\!=}_{GB} \lambda{\star}.\,U) \\
&= E(N) \mathrel{>\!\!>\!\!=}_M \lambda{\star}.\,E(U) && \text{(Theorem 15)} \\
&= \lambda W \in \Sigma_I.\; E(N)(\star) \times E(U)(W) \\
&= \lambda W \in \Sigma_I.\; \Lambda \times |W| \;\in\; MI
\end{aligned}$$
In the penultimate step we use the fact that in the discrete case the bind for $M$, just like for $G$, computes a weighted sum, which in this case is just a single term. In the final step, we substitute in the intensity measure of $N$ and the intensity measure of the uniform point process $U$, producing the correct intensity. $W$ is a measurable subset of the unit square and $|W|$ is its area.

Example 17 (Discrete Wald's Lemma). We regard arbitrary probability distributions $N, X$ on the natural numbers as point processes in $GB1$ via (2). Wald's lemma says that the expected value of the compound distribution $\gamma$ of (3) is the product of the expectations, which is immediate from the fact that $E$ is a monad morphism:
$$\begin{aligned}
E(\gamma) &= E(N \mathrel{>\!\!>\!\!=}_{GB} \lambda{\star}.\,X) \\
&= E(N) \mathrel{>\!\!>\!\!=}_M \lambda{\star}.\,E(X) && \text{(Theorem 15)} \\
&= \lambda{\star}.\; E(N)(\star) \times E(X)(\star) \;\in\; M1
\end{aligned}$$
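Both identities can be checked by simulation. The sketch below is ours (the paper gives no code); `poisson` is Knuth's sampler, the rates and distributions are illustrative choices, and the tolerances are loose Monte Carlo bounds. It estimates $E(\pi)(W)$ for $W = [0, 0.5)^2$ against $\Lambda \cdot |W| = 10 \times 0.25$, and the mean of a compound distribution against $E(N) \cdot E(X)$.

```python
import math
import random

def poisson(rate):
    """Sample a Poisson count via Knuth's algorithm (fine for modest rates)."""
    limit, k, prod = math.exp(-rate), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

def poisson_pp(rate):
    """One draw of the simulated Poisson process pi of Example 16: a Poisson
    number of points placed independently and uniformly on the unit square."""
    return [(random.random(), random.random()) for _ in range(poisson(rate))]

def empirical_intensity(sampler, region, trials=20000):
    """Estimate E(pi)(W) as the average number of sampled points landing in W."""
    return sum(sum(1 for p in sampler() if region(p))
               for _ in range(trials)) / trials

random.seed(0)

# Example 16: E(pi)(W) = Lambda * |W|, here 10 * 0.25 = 2.5.
est = empirical_intensity(lambda: poisson_pp(10.0),
                          lambda p: p[0] < 0.5 and p[1] < 0.5)
assert abs(est - 2.5) < 0.1

# Example 17 (Wald): E(gamma) = E(N) * E(X), where gamma draws N and then
# sums N independent draws of X. Here N is a fair die (mean 3.5) and X a
# fair coin (mean 0.5), so E(gamma) = 1.75.
def compound(sample_n, sample_x):
    return sum(sample_x() for _ in range(sample_n()))

trials = 50000
mean = sum(compound(lambda: random.randrange(1, 7),
                    lambda: random.randrange(0, 2))
           for _ in range(trials)) / trials
assert abs(mean - 1.75) < 0.05
```

The second check is the discrete Wald identity computed pointwise: summing a die-roll's worth of coin flips has mean $3.5 \times 0.5$.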
We remark that a natural transformation in the opposite direction ($M_+ \to GB$) has been exhibited in [2], where $M_+(X)$ is the space of finite non-empty measures. This natural transformation takes an intensity measure to the corresponding inhomogeneous Poisson process. Since $M_+$ is not a monad, it remains to be seen whether this natural transformation can be made into a monad morphism somehow.
We have exhibited a monad $GB$ for point processes (§4), and shown that the intensity measure is a monad morphism (§6). This gives a compositional way of building and reasoning about increasingly complicated point processes (§5). This is further evidence towards the claim that applied category theory has the potential to be a useful tool for statistical modelling.
We are grateful for discussions with Peter Lindner regarding the role of point processes in his work [15]. Thanks too to Bart Jacobs and Gordon Plotkin for discussions about the role of multisets. Thanks to the anonymous reviewers and to Mathieu Huot and Dario Stein for their feedback. Finally we appreciate the opportunity to present this work at the LAFI 2020 workshop [7]. Staton's research is supported by a Royal Society University Research Fellowship.
References

[1] J. Beck (1969): Distributive laws. In B. Eckmann, editor: Seminar on Triples and Categorical Homology Theory, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 119–140, doi:10.1007/BFb0083084.
[2] F. Dahlqvist, V. Danos & I. Garnier (2016): Giry and the Machine. In: Proc. MFPS 2016, pp. 85–110, doi:10.1016/j.entcs.2016.09.033.
[3] F. Dahlqvist & D. Kozen (2020): Semantics of Higher-Order Probabilistic Programs with Conditioning. In: Proc. POPL 2020.
[4] F. Dahlqvist, L. Parlant & A. Silva (2018): Layer by Layer – Combining Monads. In: Proc. ICTAC 2018.
[5] D. Daley & D. Vere-Jones (2006): An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Probability and Its Applications, Springer New York.
[6] V. Danos & I. Garnier (2015):
Dirichlet is Natural. In: Proc. MFPS 2015, Electr. Notes Theoret. Comput. Sci.
[7] S. Dash & S. Staton (2020): A Monad for Point Processes. Talk at LAFI 2020.
[8] T. Ehrhard, M. Pagani & C. Tasson (2018): Measurable cones and stable, measurable functions: a model for probabilistic higher-order programming. Proc. ACM Program. Lang. (POPL) 2, doi:10.1145/3158147.
[9] T. Fritz (2019):
A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics. arXiv:1908.07021.
[10] T. Fritz, P. Perrone & S. Rezagholi (2019): The support is a morphism of monads. In: Proc. ACT 2019.
[11] R. Garner (2019): Hypernormalisation, linear exponential monads and the Giry tricocycloid. In: Proc. ACT 2019.
[12] M. Giry (1982): A categorical approach to probability theory. In: Categorical aspects of topology and analysis (Ottawa, Ont., 1980), Lecture Notes in Mathematics.
[13] N. Goodman, V. Mansinghka, D.M. Roy, K. Bonawitz & J.B. Tenenbaum (2008): Church: a language for generative models. In: Proc. UAI 2008, pp. 220–229.
[14] A. Goy & D. Petrisan (2020): Combining probabilistic and non-deterministic choice via weak distributive laws. In: Proc. LICS 2020.
[15] M. Grohe & P. Lindner (2019): Probabilistic Databases with an Infinite Open-World Assumption. In: Proc. PODS 2019, pp. 17–31, doi:10.1145/3294052.3319681.
[16] M. Hamano (2019):
A Linear Exponential Comonad in s-finite Transition Kernels and Probabilistic Coherent Spaces. arXiv:1909.07589.
[17] T. Herlau, M.N. Schmidt & M. Morup (2016): Completely random measures for modelling block-structured sparse networks. In: Proc. NeurIPS 2016, pp. 4260–4268, doi:10.5555/3157382.3157574.
[18] C. Heunen, O. Kammar, S. Staton & H. Yang (2017): A convenient category for higher-order probability theory. In: Proc. LICS 2017, IEEE Press, doi:10.1109/LICS.2017.8005137.
[19] B. Jacobs (2019): Structured Probabilistic Reasoning. Draft available from the author's website.
[20] B. Jacobs & S. Staton (2020): De Finetti's construction as a categorical limit. In: Proc. CMCS 2020.
[21] B. Jacobs (2019): Learning along a Channel: the Expectation part of Expectation-Maximisation. In: Proc. MFPS 2019, doi:10.1016/j.entcs.2019.09.008.
[22] K. Keimel & G. Plotkin: Mixed powerdomains for probability and nondeterminism. arXiv:1612.01005.
[23] P. McCullagh (2002): What is a statistical model?
Annals of Statistics.
[24] E. Moggi (1991): Notions of Computation and Monads. Inf. Comput.
[25] P. Narayanan, J. Carette, W. Romano, C. Shan & R. Zinkov (2016): Probabilistic inference by program transformation in Hakaru (system description). In: Proc. FLOPS 2016, Springer, pp. 62–79, doi:10.1007/978-3-319-29604-3_5.
[26] D. Pollard (2001):
A User's Guide to Measure Theoretic Probability. CUP, doi:10.1017/CBO9780511811555.
[27] A. Simpson (2017): Probability Sheaves and the Giry Monad. In: Proc. CALCO 2017, doi:10.4230/LIPIcs.CALCO.2017.1.
[28] S. Staton (2017): Commutative semantics for Probabilistic Programming. In: Proc. ESOP 2017, Lect. Notes Comput. Sci.
[29] D. Varacca & G. Winskel (2006): Distributing probability over non-determinism. Mathematical Structures in Computer Science 16, pp. 87–113, doi:10.1017/S0960129505005074.
[30] F. Wood, J.W. van de Meent & V. Mansinghka (2014): A new approach to probabilistic programming inference. In: Proc. AISTATS 2014.
[31] Y. Wu, S. Srivastava, N. Hay, S. Du & S.J. Russell (2018): Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms. In: Proc. ICML 2018, pp. 5339–5348.
[32] M. Zwart & D. Marsden (2019): No-Go Theorems for Distributive Laws. In: Proc. LICS 2019.