Families of discrete circular distributions with some novel applications
Kanti V. Mardia† and Karthik Sriram‡

Abstract
Motivated by cutting-edge circular data, such as from smart home technologies and roulette spins from online platforms and casinos, we construct some new rich classes of discrete distributions on the circle. We give four new general methods of construction, namely (i) maximum entropy, (ii) centered wrapping, (iii) marginalized and (iv) conditionalized methods. We motivate these methods on the line, then work on the circular case, and provide some properties to gain insight into these constructions. We mainly focus on the last two methods, (iii) and (iv), in the context of circular location families, as they are amenable to the development of a general methodology. We show that the marginalized and conditionalized discrete circular location families inherit some important properties from their parent continuous families. The resulting discrete families are also symmetric and again have two parameters, akin to the mean direction and concentration parameter. In particular, for the von Mises and the wrapped Cauchy as the parent distribution, we examine their properties, including the maximum likelihood estimators, the hypothesis test for uniformity, and a test of serial independence. Using our discrete circular distributions, we demonstrate how to determine a changepoint when the data arise in a sequence and how to fit mixtures of these distributions. Illustrative examples are given which triggered the work. For example, for the roulette data, we test for uniformity (unbiasedness), test for serial correlation, detect a changepoint in streaming roulette-spin data, and fit mixtures. We analyse smart home data of human activities, including fitting of our mixtures. In practice, the choice between the marginalized and conditionalized methods is driven by the type of population. Further, the choice between members of a given marginalized or conditionalized discrete family is usually data driven, but we give a rule of thumb on the different choices.
We give various extensions of the families with skewness and kurtosis, to those supported on an irregular lattice, and discuss potential extension to general manifolds by showing a construction on the torus.

Keywords: changepoint; conditionalized distributions; entropy; Kullback-Leibler divergence; marginalized distributions; mixtures; multivariate; regular lattice; roulette data; skew; smart home technology; stable distributions; torus; von Mises distribution; wrapped Cauchy distribution.

∗ This paper was presented at LASR 2019.
† University of Leeds and University of Oxford.
‡ Indian Institute of Management Ahmedabad, India.

1 Introduction

Since the discussion paper of Mardia (1975a), the subject of directional data has, as expected, grown tremendously, with the focus on "statistics on manifolds" leading to new distributions on the hypersphere, torus, Stiefel manifold, Grassmann manifold and so on. The progress in this area can be seen through several books since then: Fisher (1993), Fisher et al. (1987), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001), Ley and Verdebout (2017) and Ley and Verdebout (2018). Further, Pewsey and García-Portugués (2020) have recently given a comprehensive survey of the field. However, as far as we could trace, it seems no work on constructing families of discrete distributions has appeared, even for the circular case. This paper lays the foundation for constructing families of discrete circular distributions, which can be further extended to higher manifolds.

For circular data, there are now good choices for continuous models, but for discrete data, even such as modelling roulette wheel data (possibly biased) from gaming industries, there is a dearth of good models. The same comment applies to smart home monitoring for the elderly. In this paper, we give four plausible methods to construct families of discrete distributions on the circle.
The approaches to obtain the probability distribution under these methods can be briefly described as follows.

(i) Maximum entropy method: determine the discrete probability distribution which has the maximum Shannon entropy among all those satisfying a set of moment constraints.

(ii) Centered wrapping method: start with a discrete distribution on the line and wrap it on the circle.

(iii) Marginalized method: start with a continuous circular distribution, which we refer to as the "parent", and obtain the discrete distribution by integrating the probability density function (pdf) over pre-determined intervals on the circle.

(iv) Conditionalized method: start with a continuous circular distribution as the parent and obtain the distribution by restricting the pdf to a pre-determined lattice on the circle, and normalizing.

The maximum entropy method arises as an adaptation of such a construction for continuous distributions given in Mardia (1975a), and we show that it leads to general exponential families; the entropy definition can be modified to include non-exponential distributions. The wrapping of integer-valued distributions, which was proposed in Mardia (1972, e.g. p. 54-55), usually does not yield a natural centering parameter, but can be modified to include one. The next two methods start with a continuous distribution on the circle and then apply two different discretization approaches, viz. "marginalized" and "conditionalized". We show that the latter three approaches are interrelated through a duality theorem, wherein discretizing on the line and then wrapping on the circle leads to the same distribution as first wrapping and then discretizing.

For our discussion on inference and application to the real data examples, we select the marginalized and conditionalized approaches, as they are amenable to the development of a general methodology required for inference, as well as for more involved problems such as changepoint analysis and fitting of mixtures.
However, we note that the particular distributions we consider for our data applications also happen to arise as special cases from one of the other two methods. We describe the marginalized and conditionalized approaches in general for symmetric location families on the circle, and in particular apply them to two main circular distributions, viz. the von Mises (VM) and the wrapped Cauchy (WC). Both of these continuous distributions have two parameters, namely a "mean direction parameter" and a "concentration parameter" (equivalently, precision). These two cover contrasting features, with the von Mises in the exponential family, and the wrapped Cauchy not in the exponential family but heavy tailed, and used mainly for M-type estimators.

The choice between methods, marginalized versus conditionalized, ideally depends on the nature of the data, i.e. grouped versus naturally discrete, but we will give evidence that the two methods turn out to be close except in the case of very small lattice supports. Similarly, the choice between different distributions arising from a particular method applied to different parent distributions (e.g. VM versus WC) depends on what best fits the practical application. However, it is possible that under some conditions the distributions, albeit different, may be practically indistinguishable, leading to similar results. For conditionalized discrete distributions, we quantify such similarities by considering the Kullback-Leibler, L_1 and L_2 divergence metrics. We carry out different comparisons: between discrete versus continuous, between distributions from different methods, as well as within a given method. Similar results are obtained for the marginalized discrete distributions; in particular, we note that for our specific data examples, the marginalized and conditionalized methods happen to yield similar conclusions.

We now give the two data examples which have motivated the paper, though the subject is very broad. The data outcomes are supported on r ∈ {0, 1, ..., m − 1}, corresponding to the circular lattice {2πr/m : r = 0, 1, ..., m − 1}.

Roulette wheel data:
A typical European roulette wheel is shown in Figure 1; it has m = 37 discrete outcomes, viz. {0, 1, 2, ..., 36}. A problem of interest to a regulator, as well as to a casino, is to ensure by regular monitoring that the roulette wheel remains unbiased, and to detect a bias as quickly as possible based on streaming outcomes. For example, the UK Gambling Commission requires statistical testing to ensure fairness, and that the testing should be performed by an approved third party (see page 4 of the documentation on Testing strategy for compliance with remote gambling and software technical standards). We consider data sequences obtained from spins of three different European roulette wheels, one from an online roulette simulator and two from the casino industry.

Smart home eating activity data:
One of the main sources of smart home data is the Environmental Protection Agency's (EPA) "Consolidated Human Activity Database (CHAD)". Various studies have appeared, and Chinellato et al. (2017) have given a brief description of the CHAD database and reviewed some recent studies. In particular, they point out that normally the data are periodic (daily/hourly measurements), and they use a mixture of von Mises distributions. However, smart home sensors usually record half-hourly or hourly measurements. In smart home industries such discrete data are often recorded, e.g. HOWZ: a smart home for the elderly (HOWZ). For illustrative purposes, we study data extracted from CHAD on the eating habit activity of a household over a period of about one year, recorded at half-hour intervals during the day, hence m = 48.

The roulette application is naturally discrete, and the conditionalized method would be a natural consideration to model such outcomes. In the smart-home application, the half-hour discretization of the clock is "artificial" in the sense of aggregation, and hence it is preferable to use a model based on a parent continuous distribution, so the marginalized method would be natural.

Section 2 gives the different methods of constructing discrete circular distributions, along with some results interrelating them. Section 3 gives some properties of the marginalized and conditionalized distributions, both in general as well as some specific to the families arising from the cardioid, von Mises and wrapped Cauchy distributions. In Section 4, we study inference, including maximum likelihood estimation and the hypothesis test for uniformity, based on the marginalized and conditionalized distributions, also making some comparisons with the continuous case. Practical applications also involve changepoint detection and fitting mixtures of distributions, and we put forward Bayesian solutions for these in Section 5. Section 6 deals with two practical examples.
Our first example uses one real online as well as two casino roulette wheel data sets to determine the presence of bias, and the other example is smart home eating habit activity data from a well-established database (CHAD). In Section 7, we provide some comparative studies: between continuous versus discrete, between the types of methods (marginalized versus conditionalized), and between distributions arising from a particular method. In Section 8, we first give a construction which extends the distributions to those supported on an irregular lattice. Next, we extend the marginalized and conditionalized distributions to tractable and interpretable general families of distributions, which also allow for skewness and kurtosis. We also indicate how to construct discrete distributions on the torus, a starting point for potential future work on manifolds. We conclude the paper with a discussion in Section 9. The detailed proofs of all the results are given in the supplement.

2 Methods of construction of discrete circular distributions

In this section, we elaborate on the four different methods of construction of circular distributions mentioned in the introduction. We would like the distributions to be supported on a regular circular lattice. By way of notation, we will denote the set of real numbers by R, the non-negative real numbers by R_+, the set of integers by Z, the non-negative integers by Z_+, and the cyclic group of integers modulo (a given positive integer) m by

Z_m = {0, 1, ..., m − 1}.   (1)

The circular lattice domain of interest is Z_m, or equivalently the set of vertices of a regular polygon on the circle, which can be written as

D_m = {2πr/m : r ∈ Z_m}.   (2)

We generally use f(·) or g(·) to denote a probability density function (pdf) of a continuous distribution on the line or circle, and p(·) to denote a discrete probability function on integers.

2.1 Maximum entropy method

For a probability function {p_r, r ∈ Z_m}, with p_r denoting the probability of the point 2πr/m ∈ D_m, Shannon's entropy is defined as

−∑_{r=0}^{m−1} p_r log p_r.   (3)

Let t_1, t_2, ..., t_q be real-valued functions defined on Z_m, and suppose we are interested in discrete distributions that satisfy the moment conditions

E(t_1(r)) = a_1, E(t_2(r)) = a_2, ..., E(t_q(r)) = a_q.   (4)

Then a useful method to construct discrete distributions is to maximize the entropy among all distributions on the given support that satisfy the given moment conditions. As noted in Kemp (1997), the philosophy behind this construction is that "one should use all the given information and nothing else". The following theorem gives this construction, which is the result of Mardia (1975a) adapted here to the circular discrete case.

Theorem 1 (Maximum entropy distributions). The probability function supported on D_m that maximizes the entropy (3) subject to the constraints (4) is of the form

P(2πr/m) = e^{∑_{i=1}^q b_i t_i(r)} / ∑_{k=0}^{m−1} e^{∑_{i=1}^q b_i t_i(k)},  r ∈ Z_m,   (5)

provided there exist constants b_1, b_2, ..., b_q satisfying

∑_{r=0}^{m−1} t_j(r) e^{∑_{i=1}^q b_i t_i(r)} / ∑_{k=0}^{m−1} e^{∑_{i=1}^q b_i t_i(k)} = a_j,  j = 1, 2, ..., q.   (6)

Further, if they exist, then the distribution is unique.
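As an illustrative sketch of Theorem 1 (in Python with NumPy; the choice of the functions t_i and the values of the constants b_i below are arbitrary assumptions made purely for demonstration), the exponential-family form (5) is easy to evaluate for given b_i, and the moments a_j appearing in (6) can then be read off from the resulting probability function:

```python
import numpy as np

m = 37                                   # lattice size, as for a European roulette wheel
r = np.arange(m)

# Hypothetical choice: q = 2 with t1(r) = cos(2*pi*r/m), t2(r) = sin(2*pi*r/m)
t = np.vstack([np.cos(2 * np.pi * r / m), np.sin(2 * np.pi * r / m)])
b = np.array([1.2, 0.5])                 # assumed constants b_1, b_2

# Maximum entropy probability function, equation (5)
w = np.exp(b @ t)
p = w / w.sum()

# The moments a_j = E[t_j(r)] that this distribution satisfies, as in (6)
a = t @ p

assert np.isclose(p.sum(), 1.0)          # a valid probability function
assert np.all(p > 0)
```

With this choice of t_1 and t_2, the resulting p is exactly the discrete von Mises form of Example 1 below, with κ = √(b_1² + b_2²) and tan µ = b_2/b_1.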
We now give a few examples of maximum entropy discrete distributions.

Example 1. von Mises distribution: Suppose q = 2, t_1(r) = cos(2πr/m) and t_2(r) = sin(2πr/m); the maximum entropy distribution is of the form

p(r) = e^{κ cos(2πr/m − µ)} / ∑_{k=0}^{m−1} e^{κ cos(2πk/m − µ)},  r ∈ Z_m,   (7)

where κ = √(b_1² + b_2²) and tan µ = b_2/b_1.

Example 2. Beran distributions: A more general family than the previous example, a discrete version of the Beran family (Beran, 1979), is obtained by considering constraints on the expected values of t_k(r) = (cos(2kπr/m), sin(2kπr/m)), which leads to the probability function

p(r) ∝ e^{∑_{k=1}^q (a_k cos(2kπr/m) + b_k sin(2kπr/m))}.   (8)

Example 3. Geometric distribution: Suppose q = 1 and t_1(r) = r; the maximum entropy distribution is of the form

p(r) = (1 − p) p^r / (1 − p^m),  r ∈ Z_m,  where p = e^{b_1}.   (9)

The above three examples also arise out of the "conditionalized" construction of discrete circular distributions that we define later in this paper. Further, (9) is the "centered wrapped geometric distribution" discussed below. However, the maximum entropy construction can lead to more general subclasses of the exponential family of distributions, which we do not pursue in this paper. Further, distributions can be constructed by maximizing differential entropy, extending the work on the continuous circular case, for example the properties of wrapped Cauchy distributions given in Kato and Pewsey (2015).

2.2 Centered wrapping method

A natural construction to obtain a circular discrete distribution is to start with a discrete distribution on the line and wrap it on the circle (see Mardia 1972, p. 50). Let p(·) be a probability distribution on the integers z ∈ Z. We get the wrapped random variable z_w = (z mod m) 2π/m ∈ D_m, and the probability function of z_w is given by

P(z_w = 2πr/m) = p_w(r) = ∑_{k=−∞}^{∞} p(r + km),  r ∈ Z_m.   (10)

It follows that the characteristic function of z_w is given by

ψ_p = φ(2πp/m),   (11)

where φ(·) is the characteristic function of z.

In general, these distributions do not have a mean direction or centering parameter, and therefore we introduce a centering parameter t as follows:

P(z_w = 2πr/m) = p_w(r) = p_w(r − t + m) for r < t, and p_w(r − t) for r ≥ t,  r, t ∈ Z_m.   (12)

Choosing the domain of t as Z_m ensures the probabilities are well defined without changing the domain of the distribution. Some examples are given below.

Example 4. Centered wrapped Poisson distribution: If x has the Poisson distribution with mean λ, then from equation (10), x_w has the wrapped Poisson distribution with probability function

p_w(r) = e^{−λ} ∑_{k=0}^{∞} λ^{r+km} / (r + km)!,  r ∈ Z_m.   (13)

The probability function with centering parameter t is then given by

p_w(r) = e^{−λ} ∑_{k=0}^{∞} λ^{r−t+m+km} / (r − t + m + km)! for r < t, and e^{−λ} ∑_{k=0}^{∞} λ^{r−t+km} / (r − t + km)! for r ≥ t,  r, t ∈ Z_m.   (14)

This is a special case of distributions considered by Mastrantonio et al. (2019).

Example 5. Centered wrapped skew Laplace distribution: The probability function of the discrete skew Laplace distribution (see, for example, Jayakumar and Jacob 2012) is given by

p(k) = ((1 − p)(1 − q) / (1 − pq)) × { p^k for k = 0, 1, 2, ...;  q^{−k} for k = 0, −1, −2, ... },  p, q ∈ (0, 1).   (15)

From equation (10),

p_w(r) = ((1 − p)(1 − q) / (1 − pq)) ( p^r / (1 − p^m) + q^{m−r} / (1 − q^m) ),  r ∈ Z_m,   (16)

and the probability function with centering parameter t is given by

p_w(r) = ((1 − p)(1 − q) / (1 − pq)) ( p^{r−t+m} / (1 − p^m) + q^{t−r} / (1 − q^m) ) for r < t, and ((1 − p)(1 − q) / (1 − pq)) ( p^{r−t} / (1 − p^m) + q^{m−r+t} / (1 − q^m) ) for r ≥ t,  r, t ∈ Z_m.   (17)

Example 6. Centered wrapped geometric distribution: The probability function of the geometric distribution is given by

p(k) = p^k (1 − p),  k ∈ Z_+,  p ∈ (0, 1).   (18)

From equation (10) we have

p_w(r) = (1 − p) p^r / (1 − p^m),  r ∈ Z_m.   (19)

Mardia (1972, p. 50) proposed the wrapped geometric distribution as a model for (possibly biased) roulette outcomes. For a roulette wheel having m = 2n + 1 points with bias at θ = 0, an alternative form of the wrapped geometric model was given as

P(θ = 2π|r|/m) ∝ p^{|r|+1},  |r| = 0, 1, ..., n.   (20)

Equation (20) can be obtained by a simple change of the domain. The probability function of the centered wrapped geometric distribution is given by

p_w(r) = (1 − p) p^{r−t+m} / (1 − p^m) for r < t, and (1 − p) p^{r−t} / (1 − p^m) for r ≥ t,  r, t ∈ Z_m.   (21)

2.3 Marginalized and conditionalized discrete distributions

We first motivate the marginalized and conditionalized methods of construction on the line, and then apply them to the circle.
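To preview the two constructions concretely, here is a minimal numerical sketch (in Python with NumPy; the rate value is an arbitrary assumption) applying both discretizations to an exponential density on the line. Both recover the same geometric distribution, the coincidence formalized in Example 7 below.

```python
import numpy as np

lam = 0.7                      # rate of the parent exponential density (assumed value)
r = np.arange(0, 50)           # a truncated range of the support Z_+

# Marginalized: integrate the pdf lam*exp(-lam*x) over each unit interval [r, r+1)
marginalized = np.exp(-lam * r) - np.exp(-lam * (r + 1))

# Conditionalized: evaluate the pdf on the lattice and normalize
weights = np.exp(-lam * r)     # f(r) up to the constant lam, which cancels
conditionalized = weights / np.sum(np.exp(-lam * np.arange(0, 10_000)))

# Both coincide with the geometric distribution with p = exp(-lam)
p = np.exp(-lam)
geometric = (1 - p) * p**r
assert np.allclose(marginalized, geometric)
assert np.allclose(conditionalized, geometric)
```

For a Cauchy parent, by contrast, the same two computations give visibly different probability functions, as Example 8 shows in closed form.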
Let f(x) be a pdf on R which satisfies some regularity conditions. We call f the parent distribution, and describe two methods, namely "marginalized" and "conditionalized", as ways to construct discrete distributions from it. Both methods lead to discrete distributions supported on r ∈ Z.

Definition 1. The probability function of the marginalized discrete (MD) distribution on the line is obtained by integrating the pdf over each unit interval, and is given by

p(r) = ∫_r^{r+1} f(x) dx,  r ∈ Z.   (22)

Definition 2. The probability function of the conditionalized discrete (CD) distribution on the line is obtained by conditioning on x = r (i.e. plugging in x = r) in the pdf and normalizing, and is given by

p(r) = f(r) / ∑_{k=−∞}^{∞} f(k),  r ∈ Z.   (23)

Of course, with appropriate modification, the domain of the discrete distribution can be generalized to c + δZ, for some δ, c ∈ R. Also, the same procedure can be applied to non-negative domains. To understand these two constructions, let us consider two examples: one starting with the exponential distribution and the other with the Cauchy distribution.

Example 7. The pdf of the exponential distribution is given by f(x) = λe^{−λx}, x ≥ 0, λ > 0. The marginalized discrete exponential distribution is given by

p(r) = ∫_r^{r+1} λe^{−λx} dx = e^{−λr}(1 − e^{−λ}) = p^r(1 − p),  p = e^{−λ},  r ∈ Z_+.   (24)

The conditionalized discrete exponential distribution is given by

p(r) ∝ e^{−λr}, or p(r) = (1 − p) p^r,  p = e^{−λ},  r ∈ Z_+.   (25)

Interestingly, if we start with the exponential distribution, both the marginalized and conditionalized methods lead to the same discrete distribution, namely the geometric distribution. This leads to an interesting characterization theorem: the two approaches lead to the same distribution if and only if the parent is the exponential distribution. We also have an analogous result for the discrete circular case (see supplement).

Example 8. The pdf of the Cauchy distribution is given by f(x) = a / (π(a² + x²)), x ∈ (−∞, ∞), a > 0. The marginalized discrete Cauchy distribution is then given by

p(r) = (1/π) ( tan^{−1}((r + 1)/a) − tan^{−1}(r/a) ) = (1/π) tan^{−1}( a / (a² + r(r + 1)) ),  r ∈ Z.   (26)

The conditionalized discrete Cauchy distribution (using Jolley 1961, p. 22, equation (124)) is given by

p(r) = a tanh(aπ) / (π(a² + r²)),  r ∈ Z.   (27)

So, for the Cauchy distribution, the marginalized and conditionalized discrete distributions are not the same.

There has been some scattered literature on conditionalized discretization on the line: for example, based on the method (23), Kemp (1997) and Szabłowski (2001) discuss the conditionalized discrete normal, Inusah and Kozubowski (2006) discuss the conditionalized discrete Laplace, and Papadatos (2018) derives the characteristic function of the conditionalized discrete Cauchy. The conditionalized approach can also be described as a "plug-in" approach, whereas the marginalized approach is in fact equivalent to the well-known probit construction, usually used for univariate and multivariate normal distributions; see, for example, Joe (2014, p. 20). However, this is the first time there is a unified treatment of these approaches as a strategy to construct rich classes of discrete distributions. We have dealt with only univariate distributions, but the same constructions go through for multivariate distributions. Our focus is to give insight into these methods of construction and their ramifications, which are clear from dealing with the univariate case. Besides, our emphasis here is on circular distributions.

Following the above discussion, we now define the marginalized and conditionalized discrete families for the circular case. Let θ be a random variable with pdf f(θ), θ ∈ [0, 2π).
Definition 3. The probability function of the marginalized discrete (MD) distribution on the circle is given by

p(r) = ∫_{2πr/m}^{2π(r+1)/m} f(θ) dθ = F(2π(r + 1)/m) − F(2πr/m),  r ∈ Z_m,   (28)

where F(·) is the cumulative distribution function of f(θ).

We note that this is also the probability function of the discrete random variable ⌊mθ/(2π)⌋, where ⌊·⌋ denotes the largest integer less than or equal to the given number.

Definition 4. The probability function of the conditionalized discrete (CD) distribution on the circle is given by

p(r) = f(2πr/m) / ∑_{k=0}^{m−1} f(2πk/m),  r ∈ Z_m.   (29)

For simplicity, we will denote both probability functions by the same notation, but the choice will be obvious from the context. One question of interest is whether the marginalized and conditionalized methods applied to a pdf f can lead to the same discrete distribution. Theorem 2 shows that if both methods applied to a pdf f result in the same discrete probability function on the line, then the methods applied to f wrapped on the circle will also result in the same discrete probability function.

Theorem 2 (Marginalized and conditionalized distribution invariance). Suppose f is a pdf on [0, ∞) satisfying the condition

f(rδ) / ∑_{k=0}^{∞} f(kδ) = ∫_{rδ}^{(r+1)δ} f(x) dx,  ∀ δ > 0.   (30)

Let f_w(θ) be the pdf when f is wrapped on the circle. Then we have

f_w(2πr/m) / ∑_{k=0}^{m−1} f_w(2πk/m) = ∫_{2πr/m}^{2π(r+1)/m} f_w(θ) dθ.   (31)

As seen in the previous section, for the exponential distribution, both the marginalized and conditionalized methods lead to the same discrete distribution. It follows from Theorem 2 that this property also holds for the wrapped exponential on the circle. However, in general, the two approaches will lead to different distributions. There is also an interesting connection between the constructions on the circle and the line, which is given by the following theorem.

Theorem 3 (Duality). Let X ∼ f(·), a pdf on R. Consider the following two methods of constructing discrete circular distributions supported on Z_m.

(a) Discretize the random variable X̃ = mX/(2π), denote it by X̃_d, and further denote its wrapped version (X̃_d mod m) by X̃_dw.

(b) Wrap the random variable X as X_w = (X mod 2π), discretize the random variable X̃_w = mX_w/(2π), and denote it by X̃_wd.

For both discretization methods, viz. marginalized and conditionalized, the above two approaches lead to the same discrete distribution, i.e. X̃_dw and X̃_wd have the same distribution.

We now apply the marginalized and conditionalized methods starting with the von Mises and the wrapped Cauchy as the parent distributions. The basic circular distribution (the analogue of the normal distribution on the line) is the von Mises (VM) distribution (see, for example, Mardia and Jupp 2000, p. 36), which has the pdf

f_v(θ | κ, µ) = (1 / (2π I_0(κ))) e^{κ cos(θ − µ)},  θ ∈ [0, 2π),  µ ∈ [0, 2π),  κ ≥ 0,   (32)

where µ is the mean direction, κ is the concentration parameter, and I_0(κ) is the modified Bessel function of order 0. Using equation (28), we can now define the marginalized discrete von Mises distribution (MDVM).
Definition 5. The probability function of the marginalized discrete von Mises (MDVM) distribution with mean parameter µ and concentration parameter κ is given by

p(r | m, κ, µ) = (1 / (2π I_0(κ))) ∫_{2πr/m}^{2π(r+1)/m} e^{κ cos(θ − µ)} dθ,  r ∈ Z_m,  µ ∈ [0, 2π).   (33)

We note that this does not have a closed form. Starting with the pdf (32), we can define the conditionalized discrete von Mises (CDVM) distribution as below.

Definition 6. The probability function of the conditionalized discrete von Mises (CDVM) distribution with mean parameter µ and concentration parameter κ is given by

p(r | m, κ, µ) = (1 / L(κ, µ)) e^{κ cos(2πr/m − µ)},  r ∈ Z_m,  µ ∈ [0, 2π),   (34)

where the normalizing constant is the reciprocal of the function

L(κ, µ) = ∑_{r=0}^{m−1} e^{κ cos(2πr/m − µ)}.   (35)

Another distribution which is becoming increasingly popular (see, for example, Kato and Jones 2015, Kato and Pewsey 2015, and Kent and Tyler 1988) is the wrapped Cauchy (WC) distribution, with pdf given by (see, for example, Mardia and Jupp 2000, p. 51)

f_c(θ | ρ, µ) = (1 / 2π) (1 − ρ²) / (1 + ρ² − 2ρ cos(θ − µ)),  θ ∈ [0, 2π),  µ ∈ [0, 2π),  ρ ∈ [0, 1),   (36)

where µ is the mean direction parameter and ρ is the concentration parameter. Here the cumulative distribution function has a closed form, and based on the marginalization method in equation (28), we can define the probability function of the marginalized discrete wrapped Cauchy (MDWC) below.

Definition 7. The probability function of the marginalized discrete wrapped Cauchy (MDWC) distribution with mean parameter µ and concentration parameter ρ is given by

p(r | m, ρ, µ) = (1/2π) cos^{−1}{ ((1 + ρ²) cos(2π(r + 1)/m − µ) − 2ρ) / (1 + ρ² − 2ρ cos(2π(r + 1)/m − µ)) } − (1/2π) cos^{−1}{ ((1 + ρ²) cos(2πr/m − µ) − 2ρ) / (1 + ρ² − 2ρ cos(2πr/m − µ)) },  r ∈ Z_m.   (37)

Alternatively, this can be written as follows, the form which we will use for computational purposes:

p(r | m, ρ, µ) = (1/π) tan^{−1}{ ((1 − ρ)/(1 + ρ)) ( tan(π(r + 1)/m − µ/2) − tan(πr/m − µ/2) ) / ( 1 + ((1 − ρ)/(1 + ρ))² tan(π(r + 1)/m − µ/2) tan(πr/m − µ/2) ) },  r ∈ Z_m,  µ ∈ [0, 2π),  ρ ∈ [0, 1).   (38)

Again, using equations (29) and (36), we have the following definition.

Definition 8. The probability function of the conditionalized discrete wrapped Cauchy (CDWC) distribution with mean parameter µ and concentration parameter ρ is given by

p(r | m, ρ, µ) = ( (1 − 2ρ^m cos(mµ) + ρ^{2m})(1 − ρ²) / (m(1 − ρ^{2m})) ) × 1 / (1 − 2ρ cos(2πr/m − µ) + ρ²),  r ∈ Z_m,  µ ∈ [0, 2π).   (39)

The normalizing constant of the probability function in (39) is derived in the supplement. For simplicity of notation, while writing the marginalized or conditionalized probability functions, we will omit the subscripts v or c corresponding to von Mises or wrapped Cauchy, but they will be clear from the context and from the explicit mention of κ versus ρ as the concentration parameters.

Without major loss of information, for mathematical and computational convenience, we take the mean direction parameter µ for the discrete case as

µ = 2πt/m,  t ∈ Z_m,   (40)

and call t the "centering" parameter. However, all the data analysis in this paper can be suitably modified to allow for an unrestricted µ parameter. Accordingly, the probability function of the MDWC (38) can be written as

p(r | m, ρ, t) = (1/π) tan^{−1}{ ((1 − ρ)/(1 + ρ)) ( tan(π(r + 1 − t)/m) − tan(π(r − t)/m) ) / ( 1 + ((1 − ρ)/(1 + ρ))² tan(π(r + 1 − t)/m) tan(π(r − t)/m) ) },  r, t ∈ Z_m,  ρ ∈ [0, 1).   (41)

From equation (34), the corresponding conditionalized probability function for the CDVM is given by

p(r | m, κ, t) = (1 / L(κ)) e^{κ cos(2π(r − t)/m)},  r, t ∈ Z_m,  κ ≥ 0,   (42)

where the normalizing constant is now free of the parameter t and is the reciprocal of

L(κ) = ∑_{r=0}^{m−1} e^{κ cos(2πr/m)}.   (43)

Similarly, from equation (39), the corresponding probability function for the CDWC is given by

p(r | m, ρ, t) = ( (1 − ρ²)(1 − ρ^m) / (m(1 + ρ^m)) ) × 1 / (1 + ρ² − 2ρ cos(2π(r − t)/m)),  r, t ∈ Z_m,  ρ ∈ [0, 1).   (44)

We will write r ∼ MDVM(m, κ, t) and r ∼ MDWC(m, ρ, t) for the marginalized discrete von Mises and marginalized discrete wrapped Cauchy distributions respectively, if t ≠ 0.
If t = 0,we just omit t and write r ∼ M DV M ( m, κ ) and r ∼ M DW C ( m, ρ ). Similarly, we will write r ∼ CDV M ( m, κ, t ) and r ∼ CDW C ( m, ρ, t ) for the conditionalized discrete von Mises andconditionalized discrete wrapped Cauchy distributions respectively, if t = 0. If t = 0, we justomit t and write r ∼ CDV M ( m, κ ) and r ∼ CDW C ( m, ρ ). We will always denote the discreteprobability functions by p ( r | m, κ, t ) for von Mises or p ( r | m, ρ, t ) for wrapped Cauchy, and if t = 0, we just write p ( r | m, κ ) or p ( r | m, ρ ). The distinction between von Mises and wrappedCauchy can be made by the use of κ and ρ to denote the concentration parameters. Againfor simplicity, we avoid separate notations for marginalized versus conditionalized probabilityfunctions but their use will be clear from the context. .4 Discrete circular distributions using copulas Here, we briefly outline a strategy to construct multivariate discrete distributions usingcopulas. Suppose f ( θ ) and f ( θ ) are univariate pdfs on the circle, i.e. θ ∈ [0 , π ) and f ( θ +2 π ) = f ( θ ), f ( θ +2 π ) = f ( θ ). We consider the circular copulas of Johnson and Wehrly(1978), which has been further studied by Jones et al. (2015). Suppose F and F be the cor-responding cdfs. Then the following bivariate density will have f and f as its marginals. 
f ( θ , θ ) = 2 πf ( θ ) f ( θ ) g (2 π ( F ( θ ) − F ( θ ))) , (45)where g ( · ) is a univariate circular pdf on [0 , π ).The marginalized bivariate discrete probability function with (45) as parent can then beobtained as p ( r , r )= Z π ( r +1) /m πr /m Z π ( r +1) /m πr /m f ( θ ) f ( θ ) g (2 π ( F ( θ ) − F ( θ )) dθ dθ (46)It is easy to verify that the marginal distributions of the marginalized bivariate discreteprobability function are again the marginalized univariate probability functions.The conditionalized bi-variate discrete probability function with (45) as parent is p ( r , r ) = f (2 πr /m ) f (2 πr /m ) g (2 π ( F (2 πr /m ) − F (2 πr /m ))) P m − r =0 P m − r =0 f (2 πr /m ) f (2 πr /m ) g (2 π ( F (2 πr /m ) − F (2 πr /m ))) (47)We explore these ideas further with a specific construction on the torus in Section 8.3. We consider a general circular location family (see, for example, Mardia 1975b) with proba-bility density function given by f ( θ | τ, µ ) = g τ ( θ − µ ) , θ, µ ∈ [0 , π ) , (48)which we assume to be unimodal with mode at µ . For simplicity, we assume g τ ( θ ) = g τ (2 π − θ )for all θ ∈ [0 , π ) and also that g τ (2 π ) = g τ (0), and g τ >
0. Note that the normalizing constant will depend only on τ and not on µ. Here, τ ≥ 0, such that τ = 0 corresponds to the case of the uniform distribution and the dispersion around the mode decreases as τ increases. For example, τ = κ for the von Mises distribution (32), and τ = ρ for the wrapped Cauchy distribution (36). We will need some conditions on the first and second derivatives of g_τ(·) to consider inference problems such as maximum likelihood estimation. Here, we discuss some general properties of the marginalized and conditionalized discrete circular distributions arising from such location families.

The probability function for the marginalized discrete distribution based on the circular location family (48) is given by

p(r | m, τ, t) = ∫_{2πr/m}^{2π(r+1)/m} g_τ(θ − 2πt/m) dθ, r, t ∈ Z_m. (49)

We will call this distribution the "marginalized discrete circular location family" and summarize its properties in the following theorem.

Theorem 4.
The marginalized discrete circular location family has the following properties inherited from the parent family (48), but with some variations.

(a) It has normalizing constant free of the centering parameter t.

(b) It is not unimodal, but has two modes, at r = t − 1 and r = t.

(c) It is symmetrical about (t − 1/2), and satisfies p(m − r − 1 + 2t | m, τ, t) = p(r | m, τ, t).

(d) Its characteristic function (for integer p (mod m)) is

E[e^{ip2πr/m}] = e^{ip2πt/m} ψ_{p,m}(τ), (50)

where ψ_{p,m}(τ) is the characteristic function of the marginalized discrete distribution centered at t = 0, and is given by

ψ_{p,m}(τ) = 1, p = 0; ψ_{p,m}(τ) = e^{−iπp/m} (m sin(πp/m)/π) S_{p,m}, p ∈ Z_m \ {0}, (51)

where

S_{p,m} = φ_p/p + Σ_{l=1}^∞ ( φ_{lm+p}/(lm + p) − φ_{lm−p}/(lm − p) ), p ∈ Z_m \ {0},

and φ_p is the characteristic function of the pdf g_τ.

From the above theorem, we can write the trigonometric moments (for t = 0) for p ∈ Z_m \ {0} as

α_{p,m} = E[cos(p2πr/m)] = (m sin(2πp/m)/(2π)) S_{p,m}, (52)

β_{p,m} = E[sin(p2πr/m)] = −(m sin²(πp/m)/π) S_{p,m}. (53)

An interesting example is the cardioid distribution. The pdf of the cardioid distribution (see Mardia and Jupp 2000, p.45) is given by

f(θ) = (1/2π)(1 + 2ρ cos(θ − µ)), θ, µ ∈ [0, 2π), |ρ| < 1/2.

Note that ρ is the concentration parameter and ρ = 0 gives the circular uniform distribution. The marginalized discrete cardioid distribution will have the probability function

p(r | m, ρ, t) = 1/m + (2ρ sin(π/m)/π) cos(2π(r − (2t − 1)/2)/m), r, t ∈ Z_m. (54)

We note that the probability function (54) is also a conditionalized distribution supported on Z_m based on a cardioid distribution, but with a different concentration parameter mρ sin(π/m)/π and mean parameter π(2t − 1)/m. So, an interesting property of the cardioid distribution is that its marginalized discrete distribution is a member of the family of conditionalized discrete distributions.
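As a numerical sanity check (a sketch with our own helper names, not code from the paper), one can tabulate the marginalized pmf (49) by Simpson integration of an (unnormalized) parent density and verify both the two-mode/symmetry claims of Theorem 4 and the closed form (54) for the cardioid parent:

```python
import math

def marginalized_pmf(m, dens, t):
    """Marginalized discrete pmf (49): integrate a (possibly unnormalized)
    circular density over each arc [2*pi*r/m, 2*pi*(r+1)/m) and renormalize,
    so any constant normalizing factor of the parent cancels."""
    probs = []
    n = 200  # even number of Simpson subintervals per arc
    shift = 2 * math.pi * t / m
    for r in range(m):
        a = 2 * math.pi * r / m
        h = (2 * math.pi / m) / n
        s = dens(a - shift) + dens(a + n * h - shift)
        for k in range(1, n):
            s += (4 if k % 2 else 2) * dens(a + k * h - shift)
        probs.append(s * h / 3)
    z = sum(probs)
    return [p / z for p in probs]

def mdc_closed(m, rho, t):
    # closed-form marginalized discrete cardioid pmf, eq (54)
    return [1.0 / m + (2.0 * rho * math.sin(math.pi / m) / math.pi)
            * math.cos(2.0 * math.pi * (r - t + 0.5) / m) for r in range(m)]
```

For a von Mises parent the numerical pmf shows the two adjacent modes at t − 1 and t; for a cardioid parent it reproduces (54) to integration accuracy.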
Indeed, this property will hold for infinite mixtures of cardioid distributions, as given by the following theorem.

Theorem 5. Consider the pdf of the parent family defined by

f(θ) = (1/2π) Σ_{k=1}^∞ η_k (1 + 2ρ_k cos(θ − µ_k)), θ ∈ [0, 2π), (55)

where Σ_{k=1}^∞ η_k = 1, η_k ≥ 0 ∀k, µ_k ∈ [0, 2π), |ρ_k| < 1/2. Then the marginalized discrete distribution is also a member of the family of conditionalized discrete distributions.

To visualize marginalized discrete distributions, Figure 2 plots the probability functions on the line for (i) MDVM for κ = 0, 0.5, 1, 2.5 and (ii) MDWC for ρ = 0, 0.25, 0.50 and 0.75, with m = 10, centered at t = 5. We know analytically from Theorem 4 that the marginalized discrete distribution will have two modes, at t − 1 and t, and that the distribution is symmetric about t − 1/2, which can be seen in both graphs (i) and (ii). It can also be seen that for small values of the concentration parameters (κ for MDVM and ρ for MDWC) the distribution is diffuse and gets more concentrated as the concentration parameter increases. We selected m = 10 in the figure to also clearly see the difference between marginalized and conditionalized distributions.

Figure 2: Probability functions of marginalized discrete circular distributions plotted on the line for (i) MDVM for κ = 0, 0.5, 1, 2.5 and (ii) MDWC for ρ = 0, 0.25, 0.50 and 0.75, with m = 10, centered at t = 5.
75, with m = 10, centered at t = 5. (i) CDVM . . . . . . r p r obab ili t y f un c t i on (a) k =0(b) k = 0.5(c) k = 1(d) k = 2.5 (ii) CDWC . . . . . . . . r p r obab ili t y f un c t i on (a) r =0(b) r = 0.25(c) r = 0.50(d) r = 0.75 Figure 3: Probability functions of conditionalized discrete circular distributions plottedon the line, with m = 10, centered at t = 5. (i) CDVM for κ = 0 , . , . ρ = 0 , . , .
50 and 0 . figure to also clearly see the difference between marginalized and conditionalized distributions. he corresponding conditionalized distributions (CDVM and CDWC) are shown in Figure 3.We can see by comparing the two figures that the conditionalized distribution is more peakedthan the corresponding marginalized distribution for both von Mises and wrapped Cauchy.We note that for larger m (e.g. m = 37 as in roulette wheel data) the marginalized andconditionalized distributions are very close.Using equation (51), Table 1 shows the characteristic functions for marginalized discretedistributions obtained from von Mises, wrapped Cauchy and cardioid distributions. Table 1: Characteristic functions of marginalized discrete distributions ( m ≥
2) .
Parent φ p ψ p,m Cardioid ( , p = 0 ρ p = ± , p = 0 e − iπm mρ sin( π/m ) π , p = 1 , m − , ≤ p ≤ m − . von Mises I p ( κ ) I ( κ ) e − iπpm m sin( πp/m ) πI ( κ ) n I p ( κ ) p + P ∞ l =1 (cid:16) I lm + p ( κ )( lm + p ) − I lm − p ( κ )( lm − p ) (cid:17)o , ≤ p ≤ m − ρ p ( , p = 0 ,e − iπpm m sin( πp/m ) π R ρ x p − (1 − x m − p )1 − x m dx, ≤ p ≤ m − . The probability function of the conditionalized discrete distribution based on the circularlocation family (48) is given by p ( r | m, τ, t ) = g τ (2 π ( r − t ) /m ) P m − r =0 g τ (2 πr/m ) . (56)We will call this distribution the “conditionalized discrete circular location family” and sum-marize its properties in the following theorem. Theorem 6.
The conditionalized discrete circular location family inherits important properties from (48), namely:

(a) It has normalizing constant free of the centering parameter t.

(b) It is unimodal and the mode is at r = t.

(c) It is symmetrical about t.

(d) Its characteristic function (for integer p (mod m)) is

E[e^{ip2πr/m}] = e^{ip2πt/m} ψ_{p,m}(τ), (57)

where ψ_{p,m}(τ) is the characteristic function of the conditionalized discrete distribution centered at t = 0, and is given by

ψ_{p,m}(τ) = [φ_p + Σ_{l=1}^∞ (φ_{lm+p}(τ) + φ_{lm−p}(τ))] / [1 + 2 Σ_{l=1}^∞ φ_{lm}(τ)], p ∈ Z_m, (58)

where φ_p is the characteristic function of the pdf g_τ.

From the above theorem, the trigonometric moments (for t = 0) for p ∈ Z_m \ {0} are

α_{p,m} = E[cos(p2πr/m)] = ψ_{p,m}(τ) and β_{p,m} = E[sin(p2πr/m)] = 0. (59)

Figure 3 plots the probability functions on the line for (i) CDVM for κ = 0, 0.5, 1, 2.5 and (ii) CDWC for ρ = 0, 0.25, 0.50 and 0.75, with m = 10, centered at t = 5. As expected from Theorem 6, the distributions shown in parts (i) and (ii) of the figure are unimodal with mode at t and are symmetric about t. Similar to our observation in the case of marginalized distributions, for small values of the concentration parameters (κ for CDVM and ρ for CDWC) the distribution is diffuse and gets more concentrated as the concentration parameter increases. Using equation (58), Table 2 shows the characteristic functions for conditionalized discrete distributions obtained from the von Mises, wrapped Cauchy and cardioid distributions.

Table 2: Characteristic functions of conditionalized discrete distributions (m ≥ 2).

Parent | φ_p | ψ_{p,m}
Cardioid | 1, p = 0; ρ, p = ±1; 0 otherwise | 1, p = 0; ρ, p = 1, m − 1; 0, 2 ≤ p ≤ m − 2.
von Mises | I_p(κ)/I_0(κ) | [I_p(κ) + Σ_{l=1}^∞ (I_{lm+p}(κ) + I_{lm−p}(κ))] / [I_0(κ) + 2 Σ_{l=1}^∞ I_{lm}(κ)], 0 ≤ p ≤ m − 1.
Wrapped Cauchy | ρ^p | ρ^p (1 + ρ^{m−2p}) / (1 + ρ^m), 0 ≤ p ≤ m − 1.

We now make some additional observations on the CDVM and CDWC distributions.

(a) For the CDVM distribution, we can also obtain an alternative simplified form for the characteristic function. Let us write

L_p(κ) = Σ_{r=0}^{m−1} cos(p2πr/m) e^{κ cos(2πr/m)}. (60)

From Theorem 6, we have

E[e^{ip2πr/m}] = e^{ip2πt/m} B_p(κ), where B_p(κ) = L_p(κ)/L_0(κ). (61)

L_p(κ) behaves like the Bessel function I_p(κ); so we have the following identities (Mardia and Jupp 2000, Appendix 1) with I_p(κ) replaced by L_p(κ):

L′_p(κ) = (L_{p−1}(κ) + L_{p+1}(κ))/2 and L′_0(κ) = L_1(κ), (62)

where L′_p(κ) = dL_p(κ)/dκ. It then follows, by writing B(κ) = B_1(κ), that

B(κ) = E[cos(2πr/m)] and B′(κ) = Var(cos(2πr/m)). (63)

Since for large m CDVM tends to VM, we can show from Mardia and Jupp (2000, Appendix 1, p.349, equation A.4) that for large κ and m,

L_p(κ) ≈ (m/√(2πκ)) e^κ. (64)

Some Bessel function identities do not hold for the discrete version (see supplement). However, as expected, these identities hold for large m and κ.

(b) For the CDWC distribution, it follows from Table 2 that the p-th trigonometric moments of CDWC(m, ρ) are given by

α_{p,m} = E[cos(p2πr/m)] = ρ^p (1 + ρ^{m−2p})/(1 + ρ^m), β_{p,m} = E[sin(p2πr/m)] = 0. (65)

For p = 1, this leads to the mean resultant length

ρ_w = ρ(1 + ρ^{m−2})/(1 + ρ^m). (66)

In general, 0 ≤ ρ_w ≤ 1 and as m → ∞, ρ_w → ρ. By the property of characteristic functions, in general 0 ≤ α_{p,m} ≤ 1, and as m → ∞, α_{p,m} → ρ^p, as can be seen from equation (65), thus leading to the characteristic function of the wrapped Cauchy distribution as expected. Further, this convergence happens at an exponential rate. To see this, note that

ψ_{p,m}(ρ) − ρ^p = ρ^p (ρ^{m−2p} − ρ^m)/(1 + ρ^m) = ρ^{m−p}(1 − ρ^{2p})/(1 + ρ^m).

It follows that |ψ_{p,m}(ρ) − ρ^p| ≤ ρ^{m−p}, and hence |ψ_{p,m}(ρ) − ρ^p| = O(ρ^{m−p}). In particular, since ψ_{1,m} = ρ_w, we have |ρ_w − ρ| = O(ρ^{m−1}) (see supplement for an illustrative plot).

In this section, we will discuss maximum likelihood estimation and uniformity tests using the marginalized and conditionalized discrete location families of distributions, which were discussed in the previous section.

Suppose w = (w_1, w_2, . . . , w_n) is an iid sample from a discrete distribution (56) with support Z_m. Let n_r be the number of data points with w_i = r, where r ∈ Z_m. We will use the following notation for trigonometric moments for discrete data:

a_p = (1/n) Σ_{r=0}^{m−1} n_r cos(p2πr/m), b_p = (1/n) Σ_{r=0}^{m−1} n_r sin(p2πr/m), m′_p = a_p + i b_p = R̄_p e^{i θ̄_p}, (67)

where θ̄_p and R̄_p denote the p-th sample mean direction and p-th sample mean resultant length. In particular, for the first trigonometric moments, we will write

C̄ = Σ_{r=0}^{m−1} (n_r/n) cos(2πr/m), S̄ = Σ_{r=0}^{m−1} (n_r/n) sin(2πr/m) and R̄ = √(C̄² + S̄²). (68)

Further, the sample mean direction will be denoted by

θ̄ = tan⁻¹(S̄/C̄), (69)

where θ̄ ∈ [0, 2π) is taken from the appropriate quadrant. We will first discuss the general methodology and then some specific cases of CDVM and CDWC only, where some further simplification in the methodology is possible. We will also consider a nonparametric test (a Pearson-type chi-square statistic for circular data, i.e. U_G as defined below) for testing uniformity.
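The CDWC moment formula (65) is exact for finite m, and easy to confirm numerically by evaluating the wrapped Cauchy density on the lattice and renormalizing as in (56) (a sketch; the helper names are ours):

```python
import math

def cdwc_pmf(m, rho, t=0):
    """Conditionalized discrete wrapped Cauchy pmf (56): the wrapped Cauchy
    density evaluated at the lattice points 2*pi*r/m and renormalized (the
    constant (1 - rho^2)/(2*pi) cancels but is kept for readability)."""
    w = [(1 - rho ** 2) / (1 + rho ** 2 - 2 * rho * math.cos(2 * math.pi * (r - t) / m))
         for r in range(m)]
    z = sum(w)
    return [x / z for x in w]

def cos_moment(pmf, m, p):
    # p-th cosine moment E[cos(2*pi*p*r/m)] of a lattice pmf
    return sum(pmf[r] * math.cos(2 * math.pi * p * r / m) for r in range(m))
```

The agreement with ρ^p(1 + ρ^{m−2p})/(1 + ρ^m) is to machine precision, since (65) is an identity rather than an approximation.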
The data log-likelihood based on the marginalized discrete probability function (49) can be written as

LL(w | m, τ, t) = Σ_{r=0}^{m−1} n_r log ∫_{2πr/m}^{2π(r+1)/m} g_τ(θ − 2πt/m) dθ, (70)

and the log-likelihood based on the conditionalized discrete probability function (56) can be written as

LL(w | m, τ, t) = −n log( Σ_{r=0}^{m−1} g_τ(2πr/m) ) + Σ_{r=0}^{m−1} n_r log g_τ(2π(r − t)/m). (71)

The maximum likelihood estimate (mle) can then be obtained by maximizing LL(w | m, τ, t) over τ and t ∈ Z_m. Another problem of interest is to test for uniformity, i.e. testing the hypothesis

H_0: τ = 0 vs. H_1: τ ≠ 0.

We note that under the null hypothesis (H_0), we can assume t = 0 without any loss of generality. Here, we describe a testing procedure based on the likelihood ratio statistic. Given the data vector w of iid observations, we define the test statistic (T) as

T(w, τ̂, t̂) = −2 LL(w | m, τ = 0, t = 0) + 2 LL(w | τ̂, t̂), (72)

where (τ̂, t̂) is the mle based on the data vector w and LL(·) is the log-likelihood as in equation (71). The test rule is to reject H_0 for large values of T or small values of the p-value. If we denote the computed value of T in the data sample by T_d, then

p-value = P(T(w, τ̂, t̂) ≥ T_d).

For a comparison, we will also look at tests of uniformity based on the following "ad hoc" test statistics.

(a) Let us consider the circular Karl Pearson chi-squared statistic (see Mardia and Jupp 2000, p.117)

U_G = (1/(nm)) Σ_{j=0}^{m−1} ( S_j − (1/m) Σ_{i=0}^{m−1} S_i )², where S_j = Σ_{i=0}^{j} (O_i − E_i), j ∈ Z_m,

where O_i = observed frequency at angle 2πi/m for i ∈ Z_m and E_i = n/m. The 5% and 1% critical values for U_G under H_0 can be approximately obtained from Mardia and Jupp (2000, p.104).

(b) T_1 = 2n R̄², which under H_0 is approximately chi-squared with 2 degrees of freedom. It is used for unimodal alternatives, namely, the Rayleigh test.

(c) T_2 = 2n(R̄² + R̄_2²), where R̄_2 is obtained similarly to R̄ in equation (68), but with 4πr/m used for computation in place of 2πr/m. Under H_0, T_2 is approximately chi-squared with 4 degrees of freedom. The extra term added to T_1 allows axial alternatives.

We note that these three test statistics are known in the context of continuous circular data, but we have adapted them to the discrete case. We use parametric bootstrap to compute measures of uncertainty for the mle. Given the circular nature of the parameter t, we compute R̄ based on bootstrap replications of the mle of t as a measure of uncertainty; a larger value of R̄ implies less uncertainty around the mle of t and vice-versa. For τ̂, we compute the standard error (SE). We also use bootstrap resampling to compute the p-value under H_0 used for testing (see supplement for details).

We can expect that for any given sample size, the precision of the estimation will be higher for larger values of the concentration parameter τ, and will increase as sample size increases. Similarly, for the hypothesis test of uniformity, for larger τ, non-uniformity can be more easily detected at smaller sample sizes, but for smaller τ, a larger sample size would be needed. We have verified that this is indeed the case based on an analysis of simulated data for CDVM and CDWC, and also that the testing results were consistent for the three alternative tests mentioned above (see supplement). It is also interesting to note that, for CDVM and CDWC, there is an approximate monotonic relationship between T and R̄. However, since this is not an exact relationship, the results from tests based on T will not necessarily be the same as those based on T_1 (see supplement).
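The Rayleigh statistic T_1 in (b) above is simple enough to sketch directly for lattice data (our own helper name; the p-value uses the χ²(2) upper tail, which is exp(−T_1/2)):

```python
import math

def rayleigh_test(counts, m):
    """Rayleigh test for discrete circular data: counts[r] is the frequency
    of lattice point r. Returns (T1, approximate p-value), where T1 = 2*n*Rbar^2
    is approximately chi-squared with 2 df under uniformity."""
    n = sum(counts)
    C = sum(c * math.cos(2 * math.pi * r / m) for r, c in enumerate(counts)) / n
    S = sum(c * math.sin(2 * math.pi * r / m) for r, c in enumerate(counts)) / n
    T1 = 2 * n * (C * C + S * S)
    return T1, math.exp(-T1 / 2)  # chi-squared(2) survival function
```

Perfectly uniform counts give T_1 = 0 (p-value 1), while a heavy excess at one lattice point gives a large T_1 and a tiny p-value.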
We will next concentrate on the specifics of inference only in the case of CDVM and CDWC, where some further simplification is possible.

Suppose w = (w_1, w_2, . . . , w_n) is a vector of iid observations from CDVM(m, κ, t) with known support, i.e. known m. Then the log-likelihood in (71) can be written as

LL(w | m, κ, t) = −n log(L_0(κ)) + nκ R̄ cos(θ̄ − 2πt/m). (73)

The mle of t and κ are unique and can be obtained from the following equations:

t̂ = argmax_{t ∈ Z_m} cos(θ̄ − 2πt/m) = [m θ̄/(2π)]_m, (74)

B(κ̂) = R̄ cos(θ̄ − 2π t̂/m), (75)

where [x]_m denotes the integer closest to the number x, modulo m. Calculation of t̂ is direct since θ̄ is easily calculated from the data, and we find based on simulations that 2π t̂/m does not vary much from θ̄, getting closer as n or κ increases (see supplement for a pictorial illustration of (74) and simulation details). By equation (63), B(κ) is strictly increasing and therefore, for the given t̂, equation (75) can be inverted to get κ̂.

Suppose w = (w_1, w_2, . . . , w_n) is a vector of iid observations from CDWC(m, ρ, t) with known support, i.e. known m. Then the log-likelihood in (71) can be written as

LL(w | m, ρ, t) = n( log(1 − ρ²) + log(1 − ρ^m) − log(m) − log(1 + ρ^m) ) − Σ_{r=0}^{m−1} n_r log(1 + ρ² − 2ρ cos(2π(r − t)/m)). (76)

The mle is obtained by maximizing LL(w | m, ρ, t) over ρ ∈ [0, 1) and t ∈ Z_m. We note that the likelihood is not necessarily concave (see supplement) but the mle can be computed. For any given t, let

h(ρ, t) = (1/n) ∂LL/∂ρ = −2ρ/(1 − ρ²) − 2mρ^{m−1}/(1 − ρ^{2m}) − Σ_{r=0}^{m−1} (n_r/n) · 2(ρ − cos(2π(r − t)/m))/(1 + ρ² − 2ρ cos(2π(r − t)/m)). (77)

The above expression (for ρ ≠ 0) can be simplified as

h(ρ, t) = −(1 + ρ²)/(ρ(1 − ρ²)) − 2mρ^{m−1}/(1 − ρ^{2m}) + ((1 − ρ²)/ρ) Σ_{r=0}^{m−1} (n_r/n) / (1 + ρ² − 2ρ cos(2π(r − t)/m)). (78)

Let

ρ̂(t) = 0, if h(ρ, t) < 0 ∀ ρ ∈ [0, 1); 1, if h(ρ, t) > 0 ∀ ρ ∈ [0, 1); ρ̃, if h(ρ̃, t) = 0 for some ρ̃ ∈ [0, 1). (79)

The mle for ρ̂ is unique if

h(ρ, t) ≥ 0 for ρ ≤ ρ̂(t) and h(ρ, t) < 0 for ρ > ρ̂(t), ∀ t ∈ Z_m. (80)

Accordingly, the mle (t̂, ρ̂) should then satisfy

t̂ = argmax_{t ∈ Z_m} LL(w | m, ρ̂(t), t), (81)

h(ρ̂, t̂) = 0 except for the extreme cases ρ̂ = 0 or 1. (82)

We conjecture that the mle is unique in general and, in fact, it is the case for all the data used in this paper (see supplement for a graphical illustration of how the solution propagates for different values of t).

Due to the closed-form expression for trigonometric moments in the case of CDWC, one could consider an alternative approach for estimating ρ. Given the estimate of the centering parameter, we can use the moment estimator of ρ by inverting the function in equation (66), taking ρ̂_w = R̄ (see supplement). Note that ρ_w gets closer to ρ as ρ → 0 or ρ → 1, and this proximity will improve for larger m; indeed ρ_w → ρ as m → ∞. Furthermore, we have shown in Section 3 that CDWC tends to wrapped Cauchy for large m; so in that case one can estimate the parameters using the maximum likelihood methods for wrapped Cauchy (e.g. using the R routine given in Pewsey et al. (2013) implementing Kent and Tyler (1988)).

Here, we discuss general methodology for changepoint detection and mixtures, as motivated by two practical applications, which we are going to apply in the next section. However, we will make specific distributional choices for our data examples and will explain our reasoning fully in Sections 6 and 7.
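Before turning to these applications, the CDVM fitting equations (74)–(75) can be sketched numerically: t̂ is a rounding of the sample mean direction, and κ̂ is obtained by inverting the strictly increasing function B(κ) = L_1(κ)/L_0(κ) by bisection, with L_p(κ) the lattice sums of (60). The function names and the bisection tolerance below are our own choices, not the paper's:

```python
import math

def lattice_bessel(p, kappa, m):
    # L_p(kappa) of eq (60): lattice analogue of the Bessel function I_p
    return sum(math.cos(2 * math.pi * p * r / m) * math.exp(kappa * math.cos(2 * math.pi * r / m))
               for r in range(m))

def cdvm_mle(counts, m, kap_hi=50.0):
    """Sketch of the CDVM mle: t-hat from (74), then kappa-hat by solving
    B(kappa) = Rbar * cos(theta_bar - 2*pi*t_hat/m) as in (75) by bisection."""
    n = sum(counts)
    C = sum(c * math.cos(2 * math.pi * r / m) for r, c in enumerate(counts)) / n
    S = sum(c * math.sin(2 * math.pi * r / m) for r, c in enumerate(counts)) / n
    theta = math.atan2(S, C) % (2 * math.pi)        # sample mean direction
    t_hat = round(m * theta / (2 * math.pi)) % m    # eq (74)
    target = math.hypot(C, S) * math.cos(theta - 2 * math.pi * t_hat / m)  # eq (75)
    lo, hi = 0.0, kap_hi
    for _ in range(100):                            # B is strictly increasing, eq (63)
        mid = 0.5 * (lo + hi)
        if lattice_bessel(1, mid, m) / lattice_bessel(0, mid, m) < target:
            lo = mid
        else:
            hi = mid
    return t_hat, 0.5 * (lo + hi)
```

Feeding in the exact expected frequencies of a CDVM recovers its parameters, which is a convenient correctness check.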
When data are obtained as a sequence of streaming independent observations, it is of interest to estimate components by detecting an appropriate changepoint. Suppose we obtain a sequence of observations {w_i : i ∈ {1, 2, . . . , n}}. Our goal is to estimate a changepoint in the data. The model with a changepoint at i = K can be constructed as follows:

w_i ∼ p(· | m, τ_1, t_1) if i ≤ K, and w_i ∼ p(· | m, τ_2, t_2) if i > K, (83)

where p(· | m, τ, t) is the probability function of the discrete circular distribution as in equations (49) and (56). The likelihood for the data can be written as

L(w | (τ_1, τ_2), (t_1, t_2), K) = Π_{i=1}^{K} p(w_i | m, τ_1, t_1) × Π_{i=K+1}^{n} p(w_i | m, τ_2, t_2). (84)

Of particular interest in many applications is to detect a change from uniformity, i.e. τ_1 = 0, where t_1 can be arbitrary but we take it as 0 without loss of generality. An example is that of the gaming roulette, where the casino is interested in detecting a deviation from the uniform distribution as early as possible based on a sequence of roulette spins. We can use Bayesian estimation with non-informative flat priors on the unknown parameters, viz. τ_1, t_1, τ_2, t_2 and K. Suppose this prior is denoted by π(τ_1, τ_2, t_1, t_2, K); then the joint posterior distribution is obtained as

Π((τ_1, τ_2, t_1, t_2, K) | w) ∝ L(w | (τ_1, τ_2), (t_1, t_2), K) × π(τ_1, τ_2, t_1, t_2, K). (85)

Gibbs sampling can be used to simulate from the posterior distribution of the parameters and then generate the relevant summaries (see supplement for computational steps and a test example). Pewsey and García-Portugués (2020) have given a survey of changepoint detection with continuous angular data, but our method is for circular data which are discrete.

We now consider constructing mixtures of discrete circular distribution components. The probability function for a mixture of K distribution components can be written as

p(r | m, (τ_1, . . . , τ_K), (t_1, . . . , t_K)) = Σ_{j=1}^{K} p_j · p(r | m, τ_j, t_j), (86)

where p_j is the mixing probability, and τ_j and t_j are the concentration and centering parameters of the j-th discrete distribution component. Note that this K is not to be confused with the notation for the changepoint used in the last subsection. Suppose we have w = (w_1, w_2, . . . , w_n), a vector of iid observations from a discrete distribution mixture with a known number of components K. We would be interested in estimating the component-wise parameters (τ_j, t_j) and also the corresponding mixing probabilities p_j. The likelihood for the data can be written as

L(w | m, τ, t, p) = Π_{i=1}^{n} p(w_i | m, (τ_1, . . . , τ_K), (t_1, . . . , t_K)). (87)

We use Bayesian estimation with non-informative flat priors on the parameters τ = (τ_1, . . . , τ_K), t = (t_1, . . . , t_K) and the mixing probabilities p = (p_1, . . . , p_K), and obtain their posterior distribution conditional on the data. Suppose this prior is denoted by π(τ, t, p); then the joint posterior distribution is obtained as

Π(τ, t, p | w) ∝ L(w | m, τ, t, p) × π(τ, t, p). (88)

Gibbs sampling can be used to simulate from the posterior distribution of the parameters and then generate the relevant summaries (see supplement for computational steps and a test example).

Using the methods developed in the previous sections, we analyse real data examples of online as well as casino roulette wheel data, and smart home human eating activity data.
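To make the changepoint model (83)–(84) concrete, here is a minimal sketch that replaces the paper's Gibbs sampler with a profile-likelihood scan over K, with a uniform first segment and a CDWC second segment whose (ρ, t_2) are profiled over a coarse grid (all function names and the grid are our own choices):

```python
import math
import random

def cdwc_pmf(m, rho, t):
    # conditionalized discrete wrapped Cauchy pmf, eq (56)
    w = [(1 - rho ** 2) / (1 + rho ** 2 - 2 * rho * math.cos(2 * math.pi * (r - t) / m))
         for r in range(m)]
    z = sum(w)
    return [x / z for x in w]

def changepoint_scan(w, m, rho_grid=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)):
    """Maximize the likelihood (84) over K with segment 1 uniform on Z_m
    and segment 2 a CDWC(rho, t2); returns the best (K, rho, t2)."""
    n = len(w)
    log_u = -math.log(m)
    best_ll, best = -float('inf'), None
    for rho in rho_grid:
        for t2 in range(m):
            lp = [math.log(x) for x in cdwc_pmf(m, rho, t2)]
            suf = [0.0] * (n + 1)          # suf[i] = segment-2 loglik of w[i:]
            for i in range(n - 1, -1, -1):
                suf[i] = suf[i + 1] + lp[w[i]]
            for K in range(1, n):          # w[:K] uniform, w[K:] CDWC
                ll = K * log_u + suf[K]
                if ll > best_ll:
                    best_ll, best = ll, (K, rho, t2)
    return best
```

On a simulated stream that switches from uniform to a concentrated CDWC, the scan localizes the switch-point and recovers the post-change centering parameter.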
There has been an ongoing search for a plausible test of the unbiasedness of a roulette wheel. We described the roulette example in the introduction section and the need for a suitable test of bias as well as for detection of a changepoint when bias creeps in. The problem is now more pressing than ever before with the rise of many online gaming sites, e.g. https://10bestcasinos.co.uk/en-en_d_rl.html. Indeed, Mardia (1972, p. 50) touched on the problem and proposed two simple models: we have already described the model based on the geometric distribution in Section 2. He also proposed

P(θ = 0) = c p, P(θ = 2π|r|/m) = c p/|r|, |r| = 1, . . . , n,

where c is the normalizing constant. But these distributions are of course not as flexible as our proposed families and were not applied. To illustrate our procedure, we consider data sequences obtained from spins of three different European roulette wheels. Note that here m = 37 (as against m = 38 for American roulette).

Data 1
Our first data is of size n = 1000: a sequence of outcomes from successive spins of an online European roulette simulator.

Data 2

This data has outcomes from successive spins of a real European roulette wheel, recorded in a casino.

Data 3

This data is from the same casino as Data 2, but from a different roulette wheel.

Data 1, 2 and 3 are of sizes n = 1000, 8299 and 8106 respectively. We carry out the following three analyses.

Analysis 1: Bias testing (Section 4),
Analysis 2: Changepoint detection based on the full data sequence as well as partial data sequences (Section 5.1), and
Analysis 3: Two-component mixture where one component is forced to be uniform and the other is estimated along with its mixing probability (Section 5.2).

We carry out these analyses with the following model choices. For Analysis 1, we show results from both CDVM and CDWC, but for Analyses 2 and 3, we will just present results for CDWC, as the results would be similar under either distribution.
Analysis 1
After mapping the roulette outcome to the corresponding angle position (w), we applied the estimation and testing procedure described in Section 4. Table 3 shows the computations of the mle and the test statistic T based on CDVM as well as CDWC, along with the p-value to test

H_0: unbiased wheel vs. H_1: biased wheel.

The estimated value of the concentration parameter κ being slightly different from 0 is a preliminary indication of a biased wheel. However, we further carry out a formal test to check statistical significance based on the statistic T as in Section 4. The large p-value (0.715) for Data 1 suggests no evidence of bias, while the small p-values for Data 2 (0.039) and Data 3 (0.001) suggest weak and strong evidence of bias respectively. We also note that the mle of the t parameter for a roulette need not coincide with the mode of the frequency distribution (e.g. for Data 1, t̂ = 17 but the mode of the frequency distribution is 0). This is because the value of t for which the CDVM likelihood is maximized also depends on the value of κ.

For a comparison, we also carried out alternative tests based on the statistics U_G, T_1 and T_2 as described in Section 4 (see supplement). It is satisfying to note that these results are mostly consistent with the above observations made from Table 3.

Table 3: Results showing the mle and the test statistic T for the 3 European roulettes. For both CDVM and CDWC, the p-value suggests no evidence for bias in Data 1, weak evidence for Data 2 and strong evidence for bias in Data 3.

(a) CDVM
roulette | n | R̄ | κ̂ | SE(κ̂) | t̂ | θ̂ = 2πt̂/m | R̄(θ̂) | T | SE(T) | p-value
(i) Data 1 | 1000 | 0.018 | 0.036 | 1.357 | 17 | 2.887 | 1.000 | 0.654 | 56.239 | 0.715
(ii) Data 2 | 8299 | 0.020 | 0.040 | 0.567 | 30 | 5.094 | 1.000 | 6.688 | 129.634 | 0.039
(iii) Data 3 | 8106 | 0.029 | 0.058 | 0.585 | 31 | 5.264 | 1.000 | 13.716 | 130.800 | 0.001

(b) CDWC
roulette | n | R̄ | ρ̂ | SE(ρ̂) | t̂ | θ̂ = 2πt̂/m | R̄(θ̂) | T | SE(T) | p-value
(i) Data 1 | 1000 | 0.018 | 0.019 | 0.018 | 18 | 3.057 | 0.624 | 0.683 | 3.228 | 0.7150
(ii) Data 2 | 8299 | 0.020 | 0.020 | 0.007 | 30 | 5.094 | 0.956 | 6.675 | 7.128 | 0.0430
(iii) Data 3 | 8106 | 0.029 | 0.030 | 0.008 | 31 | 5.264 | 0.950 | 14.245 | 7.098 | 0.0020

Incidentally, we note in Table 3 part (a) for CDVM (the findings are similar for (b) CDWC) that the highest value of R̄ is 0.029, and using the mapping between R̄ and κ in Mardia and Jupp (2000, Appendix 2.4), we get κ̂ = 0.058, in agreement with the mle of κ we obtained for CDVM.
We apply the approach discussed in Section 5.1 for changepoint detection using CDWC. For this analysis, recall that the first component before the changepoint is forced to be uniform (i.e. ρ_1 = 0, t_1 = 0). We first carry out the analysis based on the full data sequence. The resulting posterior distribution summaries for the three datasets are shown in Table 4. For Data 1 and Data 2, we do not detect any changepoint, as the 95% credible interval for the possible changepoint K is very wide, spanning almost the entire range, with posterior mode close to the extreme. Also, the 95% credible interval for ρ_2 is not removed away from 0, suggesting that the second component may not be different from uniform. For Data 3, a changepoint is detected, with a likely position being 1997, as indicated by the mode of the posterior distribution of K and the much narrower 95% credible interval compared to the full range of possibilities; moreover, the 95% credible interval for ρ_2 of the second component is clearly removed from 0. These findings are consistent with the results of Analysis 1, where Data 3 showed strong evidence of bias. Here, we have gone one step further and determined the likely time-point in the data sequence when the bias may have set in.

Table 4: Results showing posterior distribution summaries for the distribution after the changepoint using CDWC, based on the full data range, for the three roulette data.

roulette | ρ_2: mean, sd, 95% interval | 2πt_2/m: mean, R̄ | K: mode, sd, 95% interval
(i) Data 1 | 0.064, 0.109, [0, 0.400] | 2.605, 0.146 | 998, 305, [42, 997]
(ii) Data 2 | 0.026, 0.063, [0, 0.100] | 5.347, 0.635 | 176, 2635, [139, 8234]
(iii) Data 3 | 0.035, 0.010, [0.010, 0.050] | 5.295, 0.951 | 1997, 869, [183, 3414]

Analysis with streaming data:
Further, to see how early we can detect a change with streaming data, we applied the changepoint detection procedure on partial data sequences of the three roulette datasets, i.e. spins 1:u for different upper bounds u, namely u ∈ {150, 250, . . . , 950} for Data 1, u ∈ {500, 1500, 2500, . . . , 7500, 8299} for Data 2, and u ∈ {500, 1500, 2500, . . . , 7500} for Data 3. Figure 4, obtained for Data 1, 2 and 3, shows the plots of the 95% credible intervals, and the posterior mean for ρ_2 and mode for K, for different choices of the upper bound. For Data 1 and Data 2, we can see that the 95% credible interval for K almost spans the full range of possibilities, i.e. from 1 to the upper bound of the sequence. Also, the 95% credible intervals for ρ_2 always contain 0, suggesting that the second component may not be different from uniform. For Data 3, we can see that, starting from an upper bound of 4500 onwards, the 95% credible interval for K starts becoming much narrower compared to the range of possibilities. Also, correspondingly, the 95% credible interval for ρ_2 is clearly removed from 0, and the posterior mode settles around 2000. So, for Data 3, while we start detecting the change weakly based on the data range 1:2500, the evidence becomes stronger starting from 1:4500. In the plot for Data 2, we note that there are a few instances at the right extreme where the mode for K is not close to the upper end of the 95% credible interval. This is due to the presence of multiple modes in the posterior distribution based on MCMC simulations, which appear as rapid fluctuations in the dotted line. Given that the posterior is diffuse and the 95% credible interval for the changepoint spans almost the entire range of possibilities, these modes do not have any significance. In fact, the posterior distribution for Data 2 based on the full data range shows a mode at either extreme (see supplement).

Analysis 3. We use the procedure described in Section 5.2 on fitting mixtures of CDWC components.
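For reference, the mixture probability function (86) with CDWC components can be sketched in a few lines (the helper names are ours):

```python
import math

def cdwc_pmf(m, rho, t):
    # conditionalized discrete wrapped Cauchy pmf, eq (56)
    w = [(1 - rho ** 2) / (1 + rho ** 2 - 2 * rho * math.cos(2 * math.pi * (r - t) / m))
         for r in range(m)]
    z = sum(w)
    return [x / z for x in w]

def mixture_pmf(m, params, probs):
    """Mixture pmf (86): params = [(rho_j, t_j)] per component,
    probs = mixing probabilities p_j (summing to 1)."""
    comps = [cdwc_pmf(m, rho, t) for rho, t in params]
    return [sum(pj * c[r] for pj, c in zip(probs, comps)) for r in range(m)]
```

An equal-weight mixture of two equally concentrated components centered at t_1 and t_2 is bimodal with equal mass at the two centers, which is a quick way to validate the implementation.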
Table 5 shows the posterior summaries from fitting a 2-component CDWC mixture to the three roulette datasets (see supplement for the posterior distribution plots). For Data 1, it is interesting to note that the 95% credible interval is removed from 0, although the span of the interval is very large (lower limit 0.01). For Data 2, the 95% credible interval for ρ_2 is not removed from 0, suggesting that the second component may not be different from uniform. This finding is again consistent with the changepoint analysis and the testing. For Data 3, the 95% credible interval for ρ_2 is very clearly removed from 0, which is suggestive of a second non-uniform component. This also is consistent with our findings from the testing as well as the changepoint analysis (Analysis 2).

Since the roulette data is a time series, it is important to assess whether there is any dependence before we apply any iid statistical methods. One common method of assessing dependence is to use the Watson and Beran (1967) serial correlation coefficient to check whether there is first-order serial dependence in the three roulette data. Recall that m = 37 and each roulette outcome w ∈ Z_m. We consider the sequence of roulette outcomes w_1, w_2, . . . , w_n and compute the following Watson-Beran modified test statistics:

C̄_1 = (1/n) Σ_{i=2}^{n} cos(2π(w_i − w_{i−1})/m), S̄_1 = (1/n) Σ_{i=2}^{n} sin(2π(w_i − w_{i−1})/m), R̄_1² = C̄_1² + S̄_1².
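The statistics just defined can be computed directly (a sketch; the helper name is ours):

```python
import math

def watson_beran(w, m):
    """First-order serial statistics in the style of Watson and Beran (1967),
    adapted to lattice data: based on the circular step differences
    w_i - w_{i-1}. Returns (C1_bar, S1_bar, R1_bar squared)."""
    n = len(w)
    C1 = sum(math.cos(2 * math.pi * (w[i] - w[i - 1]) / m) for i in range(1, n)) / n
    S1 = sum(math.sin(2 * math.pi * (w[i] - w[i - 1]) / m) for i in range(1, n)) / n
    return C1, S1, C1 * C1 + S1 * S1
```

A strongly dependent sequence whose every step is +1 (mod m) gives R̄_1² close to 1 and a huge value of 2nR̄_1², far beyond the χ²(2) 1% point of 9.21.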
Figure 4: Results of changepoint analysis based on different partial data ranges for Data 1, 2 and 3, shown in rows (i), (ii) and (iii) respectively. Each row shows posterior summaries for $\rho$ and $K$ plotted against the data range upper bound, in plots (a) and (b) respectively. In each plot, solid lines mark the 95% credible intervals; the dotted line shows the posterior mean for $\rho$ in plot (a) and the posterior mode for $K$ in plot (b).

We reject the hypothesis of independence at the 1% level for large (absolute) values of the three statistics, with $\bar{R}^2$ being the omnibus statistic. The required critical values are computed using simulations from the null distribution. We note that for large $n$ and $m$, the statistics have the following asymptotic null distributions:
$$\sqrt{n}\,\bar{C} \sim N(0, 1/2), \quad \sqrt{n}\,\bar{S} \sim N(0, 1/2), \quad 2n\bar{R}^2 \sim \chi^2(2). \tag{89}$$
Table 6 shows the computed values of the statistics in (89) for the three roulette data sets and the applicable 1% cutoffs calculated from 100,000 simulations from the null distribution, i.e. the discrete circular uniform supported on the equi-spaced lattice with $m = 37$ points. We see that for all three data sets the statistic values are well within the cutoffs. So, there is no evidence to support the existence of serial dependence in any of the roulette sequences.

Posterior summaries (mean, sd and 95% interval) for the parameters $(\rho, 2\pi t/m, p)$ of the second mixture component are reported for Data 1, 2 and 3; the null of uniformity corresponds to $\rho = 0$, $t = 0$.

We note that, as in the continuous case, the critical values in Table 6 are roughly comparable to those based on the asymptotic distribution (89).

Table 6: Computed statistic values for the three roulette data sets and the 1% critical values calculated from 100,000 simulations from the null distribution.

roulette   n      $\sqrt{n}\,\bar{C}$   $\sqrt{n}\,\bar{S}$   $n\bar{R}^2$
Data 1     1000   0.436                 -0.904                1.007

In this section, we study data taken from CHAD on the eating activity of a household over a period of about one year, recorded at half-hour intervals during the day, hence $m = 48$. The context of the data was described in the introduction section. Figure 5 shows the histogram on the line and the circular histogram for the data. The histogram clearly shows three modes and hence we fit the data with a mixture of 3 components using the Bayesian procedure described in Section 5.2. Since the eating activity data is aggregated, ideally it is more appropriate to use the marginalized discrete distributions as components rather than the conditionalized discrete distributions. However, as we discuss in Section 7, with $m = 48$ the two approaches are expected to give similar results.
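The simulation-based computation of the cutoffs described above can be sketched as follows (a minimal illustration, not the authors' code; function and variable names are ours, and the third statistic is returned as $n\bar{R}^2$ to match the values in Table 6):

```python
import numpy as np

def watson_beran_stats(r, m):
    """Modified Watson-Beran first-order serial statistics for a
    discrete circular sequence r_1, ..., r_n on Z_m."""
    n = len(r)
    d = 2 * np.pi * np.diff(r) / m            # 2*pi*(r_i - r_{i-1})/m
    C = np.cos(d).sum() / n
    S = np.sin(d).sum() / n
    return np.sqrt(n) * C, np.sqrt(n) * S, n * (C**2 + S**2)

def null_cutoffs(n, m, level=0.01, n_sim=100_000, seed=0):
    """Critical values by simulating from the null distribution, i.e. the
    discrete circular uniform on the equi-spaced lattice with m points."""
    rng = np.random.default_rng(seed)
    sims = np.array([watson_beran_stats(rng.integers(0, m, n), m)
                     for _ in range(n_sim)])
    cC = np.quantile(np.abs(sims[:, 0]), 1 - level)  # two-sided, sqrt(n)*C
    cS = np.quantile(np.abs(sims[:, 1]), 1 - level)  # two-sided, sqrt(n)*S
    cR = np.quantile(sims[:, 2], 1 - level)          # upper tail, n*R^2
    return cC, cS, cR
```

For the roulette data one would take $m = 37$ and $n$ equal to the length of the spin sequence; by (89), for large $n$ and $m$ the simulated cutoffs should be close to their asymptotic counterparts.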
We fit the data with 3 components of MDWC as well as of CDWC. Table 7 shows posterior summaries for the parameters of a 3-component mixture fitted to the data based on MDWC (part (a)) and CDWC (part (b)). We note that the results from the two distributions are similar: the posterior mean values of $(t_1, t_2, t_3)$ are close under both MDWC and CDWC. This means that the data is a mixture of three components approximately centered at 7:30 hours (Breakfast), 12:30 hours (Lunch) and 18:00 hours (Dinner), which is what can be expected for a typical household. Due to the circular nature of the parameters $(t_1, t_2, t_3)$, we report $\bar{R}$ as a measure of variation and do not compute the credible intervals. Our analysis for this data shows that there is relatively more variation in the lunch time for this household, as indicated by the smaller $\bar{R}$ corresponding to the second component.

Figure 5: Histogram on the line and circular histogram for the human eating activity data.

Table 7: Posterior summaries for the parameters of a 3-component mixture of MDWC (a) and CDWC (b) for the Eating Activity data. The components are shown in order of increasing estimated centering parameter $t_j$; for each model, the mean, sd (or $\bar{R}(2\pi t_j/m)$) and 95% interval are reported for $\rho_j$, $2\pi t_j/m$ and $p_j$, $j = 1, 2, 3$.

In this section, we carry out several comparisons to gain some insight into the choices between the families resulting from the marginalized and conditionalized methods, and the choices between candidates within a given type of family. We briefly summarize the types of comparison as follows:
• Cross comparison, i.e. continuous versus discrete models: this is to study the effect on inference if we used the continuous distribution instead of a discrete distribution. We find that inference using a continuous model can be misleading, especially when estimating parameters from discrete data with a higher concentration parameter. However, for testing uniformity on highly dispersed data, the test based on the discrete distribution and the Rayleigh test based on the continuous distribution lead to similar conclusions.
• Comparison within a particular family of discrete distributions, e.g.
among conditionalized discrete distributions using divergence measures: using Kullback-Leibler, $L_1$ and $L_2$ measures, we show that the conditionalized discrete distributions resulting from the von Mises and wrapped Cauchy can be very different from each other, so one cannot be easily approximated by the other family. In contrast, we show that the conditionalized discrete wrapped normal (defined in Section 7.2) and CDVM are very close to each other, so for practical purposes they may be interchangeable in data analysis.
• Comparison among different types of discrete models, i.e. marginalized and conditionalized: we note that, except for very small values of $m$ such as $m < 10$, inference based on the marginalized and conditionalized approaches will be similar.
• Re-examination of the Sheppard correction for circular distributions, directly based on the marginalized and conditionalized discrete distributions.
A natural question is whether using a discrete distribution is really necessary for inference with discrete circular data, i.e. whether we would reach similar conclusions if we just carried out the inference based on the approximate continuous model instead of the discrete model. To check this, we study the effect on inference, namely maximum likelihood estimation and testing, using simulated data arising from CDVM and CDWC. We expect similar findings to hold for the marginalized discrete distributions. We show below that inference using a continuous model can be misleading, especially when estimating parameters from discrete data with a high concentration parameter. Also, for example, when dealing with changepoint analysis for the roulette data, it is natural to work on the discrete support. However, for testing uniformity on highly dispersed data, the test based on the discrete distribution and the Rayleigh test based on the continuous distribution lead to similar conclusions.
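As an illustrative sketch (not the authors' code), the mle under the conditionalized discrete model can be computed by directly maximizing the discrete log-likelihood; here for CDVM with known $t = 0$, using the probability function $p(r) \propto \exp\{\kappa\cos(2\pi(r-t)/m)\}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def cdvm_logpmf(r, m, kappa, t=0):
    """Log-probability of CDVM: p(r) proportional to
    exp{kappa * cos(2*pi*(r - t)/m)}, normalized over r in Z_m."""
    lattice = np.cos(2 * np.pi * (np.arange(m) - t) / m)
    log_norm = np.log(np.exp(kappa * lattice).sum())
    return kappa * np.cos(2 * np.pi * (r - t) / m) - log_norm

def cdvm_mle_kappa(data, m, t=0, kappa_max=50.0):
    """MLE of kappa by numerical maximization of the discrete likelihood."""
    data = np.asarray(data)
    nll = lambda k: -cdvm_logpmf(data, m, k, t).sum()
    return minimize_scalar(nll, bounds=(0.0, kappa_max), method="bounded").x
```

The continuous-model fit (e.g. via the "CircStats" routines in R, as used in the paper) can then be compared against this discrete-model mle on the same simulated data.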
We simulate 1000 datasets, each of size $n = 1000$, for each choice of $m \in \{10, 20\}$, with $\kappa \in \{1, 2.5, 10\}$ for CDVM$(m, \kappa, t = 0)$ and $\rho \in \{0.5, 0.6, 0.8\}$ for CDWC$(m, \rho, t = 0)$. We compute the mle from the conditionalized discrete model and compare it with that obtained from the continuous model. Table 8 compares the bias, standard deviation (sd) and mean squared error (mse) calculated from the 1000 simulated datasets under the different scenarios of $m$ and the concentration parameter ($\kappa$ or $\rho$). Part (a) of the table compares CDVM with VM, and part (b) compares CDWC with WC. We used the "CircStats" library in R to compute the mle under the von Mises and wrapped Cauchy models.

In part (a) of Table 8 we see that for $m = 10$ the bias, sd and mse for CDVM are comparable with those of VM for $\kappa = 1, 2.5$, but CDVM has smaller bias and mse than VM for $\kappa = 10$. However, as $m$ increases to 20, the differences between CDVM and VM decrease. In part (b) of Table 8 the differences are more pronounced. For $m = 10$, the bias, sd and mse for CDWC are comparable with those of WC for $\rho = 0.5$, but the bias, sd and mse are much smaller than those of WC for $\rho = 0.6, 0.8$. With $m$ increased to 20 the differences reduce, but a larger $m$ would be required for the two to match.

In summary, using the continuous model for discrete data with possibly high concentration and moderate $m$ can lead to highly biased and inaccurate estimation of parameters. Further, how large $m$ needs to be for the continuous approximation to work depends on the concentration parameter. In problems such as mixture estimation, where the underlying concentration parameters are a priori unknown, it is therefore more appropriate to work with the discrete distribution, as inferences from the (approximate) continuous model can be misleading, i.e. biased and with higher standard deviation.

(a) MLE: CDVM vs. VM
            CDVM                     VM
m    kappa  Bias   SD     MSE       Bias    SD     MSE
10   1      0.004  0.055  0.055     0.002   0.053  0.053
10   2.5    0.002  0.092  0.092     -0.003  0.092  0.092
10   10     0.008  0.385  0.385     1.513   0.690  1.663
20   1      0.000  0.052  0.052     -0.002  0.051  0.051
20   2.5    0.005  0.095  0.095     -0.000  0.094  0.094
20   10     0.035  0.432  0.433     0.042   0.433  0.435

(b) MLE: CDWC vs. WC
            CDWC                     WC
m    rho    Bias   SD     MSE       Bias   SD     MSE
10   0.5    0.001  0.016  0.016     0.013  0.019  0.023
10   0.6    -0.000 0.013  0.013     0.058  0.024  0.063
10   0.8    -0.000 0.007  0.007     *      *      *
* The mle using mle.wrappedcauchy() in R converged to rho = 1 for this case.

Table 8: Bias, sd and mse of the mle for the concentration parameter ($\kappa$ in (a), $\rho$ in (b)), computed from 1000 datasets, each of size $n = 1000$, for each choice of $m \in \{10, 20\}$ and $\kappa \in \{1, 2.5, 10\}$ for CDVM$(m, \kappa, t = 0)$, and $\rho \in \{0.5, 0.6, 0.8\}$ for CDWC$(m, \rho, t = 0)$.

Recall that the test of uniformity for a discrete location family (56) is formulated as $H_0: \tau = 0$ vs. $H_1: \tau \neq 0$, where $\tau = \kappa$ for CDVM and $\tau = \rho$ for CDWC. This can be tested based on the discrete distribution using the test statistic $T$ discussed in Section 4; an alternative approach is the Rayleigh test based on the statistic $\bar{R}$. Table 9 shows the power of a 5% test based on either statistic at a specific alternative, viz. $\kappa = 0.05$ for CDVM and $\rho = 0.03$ for CDWC, for each choice of data size $n \in \{1000, 10000\}$ and $m \in \{10, 37\}$. The 5% critical values and the power for $T$ are computed from 10,000 simulated datasets of the given size, under the null and alternative hypotheses respectively, from the conditionalized discrete distribution (CDVM in (a) and CDWC in (b), with the given value of $m$). The 5% critical values and the power for $\bar{R}$ are computed from 10,000 simulated datasets of the given size, under the null and alternative hypotheses respectively, from the continuous distribution (VM in (a) and WC in (b)). For both CDVM and CDWC, we find that the power of the test based on $T$ is comparable to that based on $\bar{R}$ (i.e. Rayleigh).

(a) CDVM: $H_0: \kappa = 0$ vs. $H_1: \kappa \neq 0$; power of a 5% test at $\kappa = 0.05$
n      m    T       Rayleigh ($\bar{R}$)
1000   10   0.1445  0.1606
1000   37   0.1637  0.1668
10000  10   0.8976  0.8967
10000  37   0.8997  0.8992

(b) CDWC: $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$; power of a 5% test at $\rho = 0.03$
n      m    T      Rayleigh ($\bar{R}$)
1000   10   0.207  0.212
1000   37   0.214  0.209
10000  10   0.974  0.976
10000  37   0.976  0.976

Table 9: Power of a 5% test based on the statistics $T$ and $\bar{R}$ at a specific alternative, viz. $\kappa = 0.05$ for CDVM and $\rho = 0.03$ for CDWC, for each choice of data size $n \in \{1000, 10000\}$ and $m \in \{10, 37\}$. The 5% critical values and the power for $T$ are computed from 10,000 simulated datasets under the null and alternative hypotheses respectively.

We will show here that the conditionalized discrete distributions resulting from the von Mises and wrapped Cauchy can be very different from each other, so one cannot be easily approximated by the other family. In contrast, we show that the conditionalized discrete wrapped normal (CDWN) and the conditionalized discrete von Mises are very close to each other, so for practical purposes they may be interchangeable in data analysis. In order to compare the probability functions of
CDWC$(m, \rho, t)$ and CDVM$(m, \kappa, t)$, we need first to map the parameter $\rho$ to $\kappa$. We do so by matching their first trigonometric moments, given by equations (63) and (66), i.e. $B(\kappa) = \rho_w$. Figure 6 plots the probability functions for (i) $m = 10$ and (ii) $m = 37$, with $t = 5$ and $t = 16$ respectively, for a given $\rho$ and the matched $\kappa$ value. We see that the CDWC is more spiked and heavy-tailed compared to CDVM.

Figure 6: Probability functions of CDWC$(m, \rho, t)$ (triangles joined by solid lines) and CDVM$(m, \kappa, t)$ (crosses joined by dotted lines) plotted for (i) $m = 10$ and (ii) $m = 37$, with $t = 5$ and $t = 16$ respectively, for a given $\rho$ and the $\kappa$ value matched by the first trigonometric moment.

Further, to assess the differences quantitatively, we compute the maximum possible distance between CDVM and CDWC using the Kullback-Leibler divergence ($KL$) and the $L_1$ and $L_2$ norms. Suppose $f_1, f_2$ are members of the circular location family (48) with mode at 0, and let $\tilde{p}_1, \tilde{p}_2$ be the respective conditionalized discrete distributions supported on $\{2\pi r/m,\ r \in Z_m\}$. For our purpose, $\tilde{p}_1, \tilde{p}_2$ are members of the conditionalized discrete location family. Then we define
$$KL(\tilde{p}_1, \tilde{p}_2) = \sum_{r=0}^{m-1} \tilde{p}_1(2\pi r/m) \log \frac{\tilde{p}_1(2\pi r/m)}{\tilde{p}_2(2\pi r/m)}, \tag{90}$$
$$L_1(\tilde{p}_1, \tilde{p}_2) = \sum_{r=0}^{m-1} \left|\tilde{p}_1(2\pi r/m) - \tilde{p}_2(2\pi r/m)\right|, \tag{91}$$
$$L_2(\tilde{p}_1, \tilde{p}_2) = \sqrt{\sum_{r=0}^{m-1} \big(\tilde{p}_1(2\pi r/m) - \tilde{p}_2(2\pi r/m)\big)^2 \times \frac{m}{2\pi}}. \tag{92}$$
We include the constant factor $m/(2\pi)$ in the definition of the discrete $L_2$ norm to ensure that it matches the $L_2$ norm of the continuous case as $m$ tends to infinity. So we have
$$\lim_{m\to\infty} KL(\tilde{p}_1, \tilde{p}_2) = KL(f_1, f_2), \quad \lim_{m\to\infty} L_1(\tilde{p}_1, \tilde{p}_2) = L_1(f_1, f_2), \quad \lim_{m\to\infty} L_2(\tilde{p}_1, \tilde{p}_2) = L_2(f_1, f_2). \tag{93}$$
For CDWC with respect to CDVM (as base), Table 10(a) shows the maximum values of $KL$, $L_1$ and $L_2$ for $m = 10, 37, 1000, 10000$, together with the value of $\rho_w$ at which each maximum is attained. We emphasize that for $m = 1000$ and $m = 10000$ the maxima may be attained at values of $\rho_w$ larger than those tabulated. We also see that as $m$ increases, the maximum $KL$, and the $\rho_w$ at which it is attained, also increase. As a yardstick reference, we also compute $KL$, $L_1$ and $L_2$ for the conditionalized discrete wrapped normal (CDWN) distribution with CDVM as base, again by mapping the parameters based on first trigonometric moments.
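The computations in (90)-(92), together with the moment matching, can be sketched as follows (a minimal illustration; here the matching is done numerically on the lattice first trigonometric moments rather than via the closed forms (63) and (66)):

```python
import numpy as np
from scipy.optimize import brentq

def cdvm_pmf(m, kappa):
    # CDVM: proportional to exp{kappa * cos(2*pi*r/m)}, mode at r = 0
    w = np.exp(kappa * np.cos(2 * np.pi * np.arange(m) / m))
    return w / w.sum()

def cdwc_pmf(m, rho):
    # CDWC: proportional to the wrapped Cauchy kernel at the lattice points
    w = 1.0 / (1 + rho**2 - 2 * rho * np.cos(2 * np.pi * np.arange(m) / m))
    return w / w.sum()

def first_trig_moment(p):
    m = len(p)
    return (p * np.cos(2 * np.pi * np.arange(m) / m)).sum()

def match_kappa(m, rho):
    """kappa such that CDVM's first trigonometric moment equals CDWC's."""
    target = first_trig_moment(cdwc_pmf(m, rho))
    return brentq(lambda k: first_trig_moment(cdvm_pmf(m, k)) - target,
                  1e-8, 100.0)

def divergences(p1, p2):
    m = len(p1)
    kl = (p1 * np.log(p1 / p2)).sum()                      # (90)
    l1 = np.abs(p1 - p2).sum()                             # (91)
    l2 = np.sqrt(((p1 - p2)**2).sum() * m / (2 * np.pi))   # (92)
    return kl, l1, l2
```

Maximizing these quantities over $\rho_w$ on a grid then reproduces the kind of summary reported in Table 10.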
The CDWN can be constructed starting from the wrapped normal using the approach given in Section 2, equation (29), and its probability function can be shown to be
$$p(r \mid m, \rho, t) = \frac{1 + 2\sum_{q=1}^{\infty}\rho^{q^2}\cos\{q\,2\pi(r-t)/m\}}{m\big(1 + 2\sum_{k=1}^{\infty}\rho^{(km)^2}\big)}, \quad r, t \in Z_m, \ \rho \in [0, 1), \tag{94}$$
where $\rho$ is the mean resultant length parameter of the wrapped normal (WN). In fact, this is a particular case of the conditionalized discrete wrapped stable distributions, which we discuss in Section 8.4.

Table 10(b) shows the results for CDWN versus CDVM. Pewsey and Jones (2005) have observed that the von Mises and wrapped normal distributions are very close with respect to the $L_1$ and $L_2$ norms. Our findings in Table 10(b) are consistent with their work, in that the $L_1$ and $L_2$ values we compute for large $m$ ($m = 10000$) closely match their computations. Further, we can see from Table 10 that CDWN and CDVM are much closer with respect to $KL$, $L_1$ and $L_2$ than CDWC and CDVM. So, in general, CDWC is very different from CDVM. However, CDVM and CDWC can be close for small values of $\rho$, with the discrepancy growing with $\rho$ due to the heavy-tailed nature of CDWC (see the supplement). Further, as $m \to \infty$, CDVM becomes VM, CDWC becomes WC and CDWN becomes WN. The following result shows that the maximum $KL$ of WC with VM as base is infinite, whereas the $KL$ of WN with VM as base goes to zero (see the supplement for the proof).

Table 10: Maximum $KL$, $L_1$ and $L_2$, with the corresponding value of $\rho_w$, for CDWC vs CDVM as base in (a), and CDWN vs CDVM as base in (b). The latter serves as a yardstick reference.

(a) CDWC vs CDVM
           KL               L1               L2
m      max    rho_w     max    rho_w     max    rho_w
10     0.313  0.900     0.639  0.852     0.441  0.870
37     0.951  0.985     1.126  0.975     1.491  0.980
1000   *                *                *
* emphasizes that the maximum value is higher.

(b) CDWN vs CDVM
           KL               L1               L2
m      max    rho_w     max    rho_w     max    rho_w
10     0.018  0.700     0.099  0.563     0.051  0.638
37     0.017  0.698     0.107  0.609     0.051  0.634
1000   0.017  0.698     0.107  0.604     0.051  0.634
10000  0.017  0.698     0.107  0.604     0.051  0.634
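The CDWN probability function (94) can be evaluated by truncating the series, which converges very rapidly since the terms decay as $\rho^{q^2}$ (a sketch; the truncation point q_max is our choice):

```python
import numpy as np

def cdwn_pmf(m, rho, t=0, q_max=50):
    """Conditionalized discrete wrapped normal, equation (94):
    numerator 1 + 2 * sum_q rho^(q^2) * cos(q * 2*pi*(r - t)/m),
    denominator m * (1 + 2 * sum_k rho^((k*m)^2))."""
    r = np.arange(m)
    q = np.arange(1, q_max + 1)
    coef = rho ** (q.astype(float) ** 2)          # rho^(q^2), decays fast
    num = 1 + 2 * (coef[:, None] *
                   np.cos(np.outer(q, 2 * np.pi * (r - t) / m))).sum(axis=0)
    k = np.arange(1, q_max + 1, dtype=float)
    den = m * (1 + 2 * (rho ** ((k * m) ** 2)).sum())
    return num / den
```

Because the lattice sum of each cosine term vanishes unless $m$ divides $q$, the truncated numerator and denominator stay exactly consistent, so the pmf sums to one to machine precision.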
Theorem 7. Let $f_v$, $f_c$ and $f_n$ be the pdfs of the von Mises, wrapped Cauchy and wrapped normal distributions with concentration parameters $\kappa$, $\rho$ and $\sigma$ respectively. Let the parameters be mapped to each other by matching the first trigonometric moments, $A(\kappa) = \rho = e^{-\sigma^2/2}$, where $A(\kappa) = I_1(\kappa)/I_0(\kappa)$. Then for large $\kappa$ we have
(i) $KL(f_v, f_c) \approx \tfrac{1}{2}\log(2\pi) - \tfrac{1}{2} + \int_{-\infty}^{\infty}\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,\log\!\Big(\frac{1}{4\sqrt{\kappa}} + \sqrt{\kappa}\,z^2\Big)\,dz$;
(ii) $KL(f_v, f_n) \approx 0$.

Ideally, the marginalized approach is applicable to grouped data, whereas the conditionalized approach is appropriate when the data is naturally discrete. Practically, the conditionalized approach is mathematically and computationally more tractable and may be used when we do not expect much difference between the analyses based on the two distributions. We note here that, except for very small values of $m$ such as $m <$
10, inference based on the conditionalized and marginalized approaches will be similar. Figure 7 plots the probability functions of MDWC (dashed lines with crosses) and CDWC (solid lines with circles) for $m = 10, 37$ and different values of $\rho$. For small values of $m$ (e.g. 10) the two probability functions are different (mainly in peakedness), but for larger $m$ (e.g. 37) the two merge with each other across the different $\rho$ values.

In this paper we have two practical examples, one for smart homes and another for roulette data from casinos. In the former case the marginalized approach is natural, whereas for the roulette the conditionalized approach is natural. However, as noted above, for moderately large $m$ the two approaches should lead to similar conclusions. Indeed, for the human activity data (see Section 6.2), we have verified that both the conditionalized and marginalized approaches lead to the estimation of similar mixture components. However, for changepoint analysis with the roulette data, since the domain is a regular lattice, we used the conditionalized approach.

Figure 7: Comparison of the probability functions of MDWC (dashed lines with crosses) and CDWC (solid lines with circles) for $m = 10, 37$ and different values of $\rho$.

Circular discrete data arise mainly from two sources. The first is where the population is continuous but the observed values are recorded or used in an aggregated form, e.g. grouped in bins or rounded appropriately. The second, which is more natural, is where the population itself is discrete, as for the roulette data. In the first case, there has been a recent survey by Humphreys and Ruxton (2017) of the most common values of $m$ used in aggregating. It turns out that the most common values of $m$ are 4, 8, 12 and 36, which suggests that the assumption of a continuous distribution is often violated in data analysis.

Suppose we have circular data $\{\theta_i,\ i = 1, 2, \ldots, n\}$. We first recap the Sheppard correction for the $p$th resultant length defined below, which is in general a key element of any data analysis:
$$\bar{R}_p = \sqrt{\Big(\frac{1}{n}\sum_{i=1}^n \cos(p\theta_i)\Big)^2 + \Big(\frac{1}{n}\sum_{i=1}^n \sin(p\theta_i)\Big)^2}.$$
For example, the Rayleigh test for uniformity is based on $\bar{R}_1$. When the data is actually discrete, such as binned data, computing $\bar{R}_p$ by treating it like continuous data may need the Sheppard correction. For example, Humphreys and Ruxton (2017) highlight several examples in ecology where the data is binned and inference by treating it as continuous may not be appropriate. A general multiplier for this correction, when the data is discretized in bins of length $h$, is as follows (see Mardia 1972, pp. 37-38):
$$\tilde{\bar{R}}_p = a(ph)\,\bar{R}_p, \quad\text{where } a(h) = \frac{h/2}{\sin(h/2)}. \tag{95}$$
Here we study the need for such a correction for the marginalized and conditionalized wrapped Cauchy distributions, if the data is treated as if it were from a wrapped Cauchy. Table 11 shows the trigonometric moments computed for MDWC$(m, \rho, t)$ and CDWC$(m, \rho, t)$ for $\rho = 0.5$, $t = 0$, for different values of $m$. As a reference, for the wrapped Cauchy, $E\cos(\theta) = 0.5$ and $E\cos(2\theta) = 0.$
25. Discretization matters for MDWC if $m$ is less than 20, i.e. bins of angle 18 degrees. Surprisingly, the effect is a bit smaller for CDWC. The overriding message is that the Sheppard correction does not influence the conclusions based on the trigonometric moments unless the grouping is very coarse (Mardia, 1972). Our table gives a specific flavour of the effect, rather than the very broad Sheppard correction, which is generic. However, when the data is intrinsically discrete, such as in the roulette case, it is essential to use a discrete distribution for correct inference, such as in mixtures and changepoint analysis.

Table 11: Trigonometric moments computed for MDWC$(m, \rho, t)$ and CDWC$(m, \rho, t)$ for $\rho = 0.5$, $t = 0$, for different values of $m$. As a reference, for WC, $E\cos(\theta) = 0.5$ and $E\cos(2\theta) = 0.25$.

       MDWC*                     CDWC
m      E cos(theta)  E cos(2 theta)   E cos(theta)  E cos(2 theta)
3      0.159         0.159            0.667         0.667
5      0.368         0.038            0.545         0.364
10     0.466         0.190            0.501         0.254
15     0.485         0.221            0.500         0.250
20     0.493         0.232            0.500         0.250
30     0.495         0.242            0.500         0.250
50     0.497         0.248            0.500         0.250
100    0.503         0.247            0.500         0.250
500    0.499         0.248            0.500         0.250
* For MDWC the moments were computed using simulated data of size 200,000.
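The CDWC column of Table 11 can be reproduced by direct summation on the lattice (a sketch; the CDWC probability function is taken proportional to the wrapped Cauchy density at the lattice points):

```python
import numpy as np

def cdwc_trig_moments(m, rho=0.5, t=0):
    """E cos(theta) and E cos(2*theta) under CDWC(m, rho, t)."""
    theta = 2 * np.pi * np.arange(m) / m
    p = 1.0 / (1 + rho**2 - 2 * rho * np.cos(theta - 2 * np.pi * t / m))
    p /= p.sum()
    return (p * np.cos(theta)).sum(), (p * np.cos(2 * theta)).sum()
```

For $\rho = 0.5$, $t = 0$ this gives, e.g., (0.667, 0.667) at $m = 3$ and (0.545, 0.364) at $m = 5$, matching Table 11; the MDWC column instead requires integrating the wrapped Cauchy density over each bin (or simulation, as in the table).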
First, we show how the marginalized and conditionalized discrete distributions (location family) extend to an irregular lattice support. Next, we give a generalization of discrete circular distributions to a four-parameter family. Further, we give some distributions on the torus based on a new bivariate wrapped Cauchy distribution, which has both marginal as well as conditional wrapped Cauchy distributions (unlike the bivariate von Mises distributions); we check whether such a property is inherited by the corresponding discrete distribution or not. Finally, we give a generalization of conditionalized discrete distributions to conditionalized discrete stable distributions.

8.1 A construction of a location family supported on an irregular lattice

We have so far considered marginalized or conditionalized discrete distributions on the regular lattice, but the domain can be extended to an irregular lattice as follows. We can rewrite the marginalized and conditionalized discrete probability functions in equations (49) and (56) in terms of the angle $\theta \in \{2\pi r/m,\ r \in Z_m\}$ as
$$\tilde{p}(\theta \mid m, \tau, t) = \int_{\theta}^{\theta + 2\pi/m} g_\tau(u - 2\pi t/m)\,du \quad\text{or}\quad \tilde{p}(\theta \mid m, \tau, t) = \frac{g_\tau(\theta - 2\pi t/m)}{\sum_{r=0}^{m-1} g_\tau(2\pi r/m)}. \tag{96}$$
We can then consider the following two-component mixture:
$$p\,\tilde{p}(\theta \mid m_1, \tau_1, t_1) + (1 - p)\,\tilde{p}(\theta \mid m_2, \tau_2, t_2), \quad\text{where } m_1 \neq m_2. \tag{97}$$
The mixture is supported on the union of the supports of the individual discrete components, i.e.
$$\theta \in \{2\pi r/m_1,\ r = 0, 1, \ldots, m_1 - 1\} \cup \{2\pi r/m_2,\ r = 0, 1, \ldots, m_2 - 1\}.$$
With this construction, we can create new rich classes of discrete circular distributions supported on irregular lattices, possibly dense and with several modes at different locations. Here we provide a simple example to illustrate the concept. Consider a mixture of discrete distribution components with $m_1 = 4$ and $m_2 = 9$. The support for $m_1 = 4$ is on the angles $\{0, \pi/2, \pi, 3\pi/2\}$ and for $m_2 = 9$ it is $\{0, 2\pi/9, 4\pi/9, \ldots, 16\pi/9\}$.
However, their mixture gives rise to a distribution with an irregular lattice support. Figure 8 plots the points on the circle that form the support of such a distribution. The points denoted by circles are in the support of the distribution with $m_1 = 4$ and the points denoted by crosses are in the support of the distribution with $m_2 = 9$. While each component is supported on a regular lattice, the mixture is supported on an irregular lattice on the circle (see the supplement for some interesting cases).

Figure 8: Support of the mixture with $m_1 = 4$ and $m_2 = 9$; the support for $m_1 = 4$ is indicated by circles and that for $m_2 = 9$ by crosses.

We follow the general family of circular distributions of Kato and Jones (2015), which includes the wrapped Cauchy distribution. It has four parameters, which control its first four trigonometric moments, leading to unimodal symmetric as well as skew distributions as particular cases. This family also has an analytically tractable normalizing constant, and its pdf is given by
$$g_{KJ}(\theta) = \frac{1}{2\pi}\left(1 + 2\gamma\,\frac{\cos(\theta - \mu) - \rho\cos\lambda}{1 + \rho^2 - 2\rho\cos(\theta - \mu - \lambda)}\right), \quad -\pi < \theta \le \pi, \tag{98}$$
where the parameters are constrained by
$$0 \le \rho < 1, \quad 0 \le \gamma \le (1 + \rho)/2, \quad -\pi \le \mu, \lambda \le \pi, \quad \rho\gamma\cos\lambda \ge (\rho^2 + 2\gamma - 1)/2. \tag{99}$$
The marginalized discrete distribution based on (98) (see, for example, Kato and Jones 2015, supplementary file) can be shown to have a closed-form probability function, given in (100), expressible in terms of logarithmic and inverse-tangent terms, with the same constraints on the parameters as in (99). We will call this family the marginalized discrete Kato-Jones family, or briefly the MDKJ family. For $\lambda = 0$, a family of symmetric distributions is obtained, with probability function (101) of the same form but without the logarithmic term. The conditionalized discrete distribution from (98) is given by the probability function
$$p(r \mid m, \rho, \mu, \gamma, \lambda) = \frac{1}{D^\star}\left(1 + 2\gamma\,\frac{\cos(2\pi r/m - \mu) - \rho\cos\lambda}{1 + \rho^2 - 2\rho\cos(2\pi r/m - \mu - \lambda)}\right), \quad r \in Z_m, \tag{102}$$
with the same constraints on the parameters as in (99). We will call this family the conditionalized discrete Kato-Jones family, or briefly the CDKJ family. The CDWC is obtained as the special case when $\lambda = 0$ and $\gamma = \rho$.

Figure 9: Probability functions of the CDKJ distribution for $m = 40$, $t = 18$, a fixed $\rho$, and four choices of $(\gamma, \lambda)$ as described in the text: (a) and (b) with $\lambda = 0$, (c) with $\lambda = \pi/m$, and (d) with $\lambda = \pi(m-1)/m$.

Note that the constraints ensure that the probability function in (98) is positive, and hence so is its discretized version (102). The normalizing constant can be simplified as stated in the following theorem.
Theorem 8. For the probability function (102), we have
$$D^\star = m\left(1 + 2\gamma\,\frac{\rho^{m-1}\cos(m(\mu+\lambda) - \lambda) - \rho^{2m-1}\cos\lambda}{1 + \rho^{2m} - 2\rho^m\cos(m(\mu+\lambda))}\right). \tag{103}$$
The proof of this theorem is given in the supplement.

We now give in Figure 9 plots of typical particular cases of this CDKJ distribution. We have selected $m = 40$, $t = 18$ (i.e. $\mu = 2\pi t/m$), a common value of $\rho$, and four choices of $(\gamma, \lambda)$: $\lambda = 0$ in (a) and (b), $\lambda = \pi/m$ in (c) and $\lambda = \pi(m-1)/m$ in (d). Note that (a) is the CDWC. It can be seen that distribution (b) is flatter than the CDWC but still symmetric, (c) is more skewed than the CDWC, and (d) is flatter and more skewed than the CDWC. Thus the CDKJ family is quite flexible, like that of Kato and Jones (2015), including various types of symmetric as well as skewed distributions.

It is to be noted that the Beran family (Beran 1979) provides a generalization of the von Mises distribution with exponents involving trigonometric functions of higher order, and we can write down conditionalized discrete Beran distributions. However, even in the simple case of four terms only (up to second-order trigonometric functions), it does not lead to a neat interpretation of the parameters (see, for example, Gatto and Jammalamadaka 2007), though it allows for skewness.

Kato and Pewsey (2015) have given a bivariate Cauchy distribution on the torus with pdf of the form
$$g(\theta_1, \theta_2) \propto \big(c_0 - c_1\cos(\theta_1 - \mu_1) - c_2\cos(\theta_2 - \mu_2) - c_3\cos(\theta_1 - \mu_1)\cos(\theta_2 - \mu_2) - c_4\sin(\theta_1 - \mu_1)\sin(\theta_2 - \mu_2)\big)^{-1}, \tag{104}$$
$-\pi < \theta_1, \theta_2 \le \pi$, $-\pi \le \mu_1, \mu_2 \le \pi$. This is one of the only known distributions which has both marginals and conditionals of the same form, namely wrapped Cauchy. Indeed, a multivariate version of (104), which we call the "multivariate wrapped Cauchy" (MWC), will be of the form
$$g(\theta_1, \ldots, \theta_d) \propto \left(c_0 + \boldsymbol{\kappa}^T\cos(\boldsymbol{\theta} - \boldsymbol{\mu}) + \begin{bmatrix}\cos\boldsymbol{\theta} & \sin\boldsymbol{\theta}\end{bmatrix}\Sigma^{-1}\begin{bmatrix}\cos\boldsymbol{\theta} \\ \sin\boldsymbol{\theta}\end{bmatrix}\right)^{-1}, \tag{105}$$
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_d)^T$, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_d)^T$ and $\Sigma^{-1} = \begin{bmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$ is a positive definite matrix. A particular case of interest is
$$g(\theta_1, \ldots, \theta_d) \propto \Big(c_0 + \sum_{i=1}^d\sum_{j=1}^d c_{ij}\cos(\theta_i - \theta_j - \mu_{ij})\Big)^{-1}. \tag{106}$$
Furthermore, the generalized multivariate von Mises distribution (gMVM) is given by the pdf
$$g(\theta_1, \ldots, \theta_d) \propto \exp\left\{c_0 + \boldsymbol{\kappa}^T\cos(\boldsymbol{\theta} - \boldsymbol{\mu}) + \begin{bmatrix}\cos\boldsymbol{\theta} & \sin\boldsymbol{\theta}\end{bmatrix}\Sigma^{-1}\begin{bmatrix}\cos\boldsymbol{\theta} \\ \sin\boldsymbol{\theta}\end{bmatrix}\right\}. \tag{107}$$
This is a particular case of the generalized multivariate von Mises distribution given by Khatri and Mardia (1977); see also Navarro et al. (2017). For $\Sigma_{12} = \Sigma_{21} = 0$, we get the distribution of Mardia and Patrangenaru (2005). Thus from equations (105) and (107) we can obtain the corresponding marginalized and conditionalized discrete distributions, with the random variable $\theta_i$ replaced by $2\pi r_i/m$ for $i = 1, \ldots, d$.

Obtaining the marginalized discrete generalized von Mises (MDgMVM) distribution from (107) would involve multiple integration, which would need to be done numerically. However, one key advantage of this construction is that the normalizing constant is automatically taken care of, and further the marginal distributions will be univariate MDgVM. The conditionalized discrete generalized von Mises (CDgMVM) is given by
$$p(2\pi\mathbf{r}/m) \propto \exp\left\{c_0 + \boldsymbol{\kappa}^T\cos(2\pi\mathbf{r}/m - \boldsymbol{\mu}) + \begin{bmatrix}\cos(2\pi\mathbf{r}/m) & \sin(2\pi\mathbf{r}/m)\end{bmatrix}\Sigma^{-1}\begin{bmatrix}\cos(2\pi\mathbf{r}/m) \\ \sin(2\pi\mathbf{r}/m)\end{bmatrix}\right\}. \tag{108}$$
The conditional distributions, i.e. of $r_i \mid (r_1, \ldots, r_{i-1}, r_{i+1}, \ldots, r_d)$ for each $i$, will be CDVM. Also, being in the exponential family, we can write a conjugate prior following the standard approach (see, for example, Mardia 2010).

We now discuss the bivariate case and some properties. It is expected, following Kato and Pewsey (2015) and Kent and Tyler (1988), that it should be useful when we seek robust estimates. Kato and Pewsey (2015) have stated that "the bivariate Cauchy distribution should be useful when the frequency of data some distance from the mode is relatively high" (in contrast to the bivariate von Mises distribution of Mardia (1975a)).

8.3.1 Bivariate case

We start with the simplest bivariate case given in equation (5) of Kato and Pewsey (2015) (which is the distribution first introduced by Kato (2009)), i.e.
$$g(\theta_1, \theta_2) = \frac{1}{4\pi^2}\,\frac{1 - \rho^2}{1 + \rho^2 - 2|\rho|\cos(q\theta_1 - \theta_2 - \mu)}, \quad -\pi < \theta_1, \theta_2 \le \pi, \tag{109}$$
where $q = \operatorname{sgn}(\rho)$ ($q = \pm 1$), $-\pi < \mu \le \pi$ and $-1 < \rho < 1$. For simplicity take $\rho > 0$, so $q = 1$. Note that here the marginal distributions are uniform and the conditional distributions are wrapped Cauchy.

Using equation (38), the bivariate marginalized discrete distribution can be written as
$$p(r_1, r_2 \mid m, \rho, \mu) = \frac{1}{4\pi^2}\int_{2\pi r_1/m}^{2\pi(r_1+1)/m}\int_{2\pi r_2/m}^{2\pi(r_2+1)/m}\frac{1 - \rho^2}{1 + \rho^2 - 2\rho\cos(\theta_1 - \theta_2 - \mu)}\,d\theta_2\,d\theta_1.$$
Integrating out one variable using the standard antiderivative $\int \frac{1-\rho^2}{1+\rho^2-2\rho\cos\phi}\,d\phi = 2\tan^{-1}\!\big(\frac{1+\rho}{1-\rho}\tan(\phi/2)\big)$, this can be further simplified as
$$p(r_1, r_2 \mid m, \rho, \mu) = v\big(u(r_1 - r_2), u(r_1 - r_2 + 1)\big) - v\big(u(r_1 - r_2 - 1), u(r_1 - r_2)\big), \tag{110}$$
where $v$ and $u$ are elementary functions involving $\tan^{-1}$, whose exact forms follow from the substitution above. The probability function (110) does not admit a closed form and would need to be computed numerically. Note that
$$\sum_{r_2=0}^{m-1} p(r_1, r_2 \mid m, \rho, \mu) = \int_{2\pi r_1/m}^{2\pi(r_1+1)/m} \frac{1}{2\pi}\,d\theta_1 = \frac{1}{m},$$
so the marginal distributions are uniform on the lattice. The conditional distribution of $r_2 \mid r_1$ is therefore given by
$$p(r_2 \mid r_1, m, \rho, \mu) = \frac{p(r_1, r_2 \mid m, \rho, \mu)}{\sum_{r_2=0}^{m-1} p(r_1, r_2 \mid m, \rho, \mu)} = m\,p(r_1, r_2 \mid m, \rho, \mu), \quad r_2 \in Z_m,$$
which is proportional to the right-hand side of (110). So the conditional distribution is not MDWC.

The bivariate conditionalized discrete distribution from equation (109) is more easily tractable and can be written as
$$p(r_1, r_2 \mid m, \rho, \mu) = \frac{1}{D^{\star\star}}\,\frac{1}{1 + \rho^2 - 2\rho\cos(2\pi(r_1 - r_2)/m - \mu)}, \tag{111}$$
where the normalizing constant is derived in the supplement as
$$D^{\star\star} = \frac{m^2(1 - \rho^{2m})}{(1 - \rho^2)\big(1 + \rho^{2m} - 2\rho^m\cos(m\mu)\big)}. \tag{112}$$
Using the same formula it can be shown that the marginal distributions are uniform on the lattice. Also, it is easy to see that the conditional distributions are CDWC. This is also true for $q = -1$. One may transform $(\theta_1, \theta_2)$ to new variables $(\phi_1, \phi_2)$ defined by the Mobius transformation
$$\theta_i = \nu_i + 2\arctan\left\{\frac{1 - \xi_i}{1 + \xi_i}\tan\Big(\frac{\phi_i - \mu_i - \nu_i}{2}\Big)\right\}, \quad i = 1, 2. \tag{113}$$
However, the Mobius transformation is nonlinear, so $(\phi_1, \phi_2)$ obtained from $(\theta_1, \theta_2)$ are not mapped to the lattice; in particular the resulting points are not equispaced, so the mapped bivariate CDWC is not expected to have CDWC marginals. Indeed, we have the following result, starting from the conditionalized discrete version of equation (104).

Theorem 9.
The general six-parameter conditionalized discrete version of the Kato-Pewsey bivariate distribution on the torus has conditional distributions that are CDWC, but its marginal distributions cannot be CDWC, except in some trivial cases.
A similar result applies in the multivariate (CDgMWC) case, i.e., the conditionals are CDWC, but the marginals are not.
The wrapped Cauchy distribution generalises to the wrapped stable distributions (see, for example, Mardia and Jupp (2000)), which have probability density function
\[
f_s(\theta) = \frac{1}{2\pi}\left[1 + 2\sum_{q=1}^{\infty}\rho^{q^a}\cos\{q(\theta-\mu)+bq^a\}\right], \quad \theta, \mu \in [0, 2\pi),\ \rho \in [0,1),\ 0 < a \le 2. \tag{114}
\]
The wrapped stable distributions with b = 0 include the wrapped normal (a = 2) and the wrapped Cauchy (a = 1) distributions. In the case b = 0, the distribution has a unique mode at \(\mu\), as it is a particular case of the circular location family. The corresponding conditionalized discrete wrapped stable distribution of equation (114) is obtained on replacing \(\theta\) by \(\theta = 2\pi r/m\) and \(\mu\) by \(2\pi t/m\), and the probability function is given by
\[
p(r) = \frac{1 + 2\sum_{q=1}^{\infty}\rho^{q^a}\cos\{2\pi q(r-t)/m + bq^a\}}{\sum_{r=0}^{m-1}\left[1 + 2\sum_{q=1}^{\infty}\rho^{q^a}\cos\{2\pi q(r-t)/m + bq^a\}\right]}, \quad r, t \in \mathbb{Z}_m. \tag{115}
\]
For b = 0, it can be shown that we get the symmetric distribution with probability function
\[
p(r \mid m, \rho, t, a) = \frac{1 + 2\sum_{q=1}^{\infty}\rho^{q^a}\cos\{2\pi q(r-t)/m\}}{m\left(1 + 2\sum_{k=1}^{\infty}\rho^{(mk)^a}\right)}, \quad r, t \in \mathbb{Z}_m,\ \rho \in [0,1),\ a \in (0,2]. \tag{116}
\]
Its characteristic function is given by the following theorem (see the supplement for the proof).

Theorem 10. The characteristic function of the probability function (116) is given by
\[
E\left[e^{\,ip\frac{2\pi r}{m}}\right] = e^{\,ip\frac{2\pi t}{m}}\,\psi_{p,m}, \tag{117}
\]
where
\[
\psi_{p,m} = \frac{\rho^{p^a} + \sum_{k=1}^{\infty}\left(\rho^{(km-p)^a} + \rho^{(km+p)^a}\right)}{1 + 2\sum_{k=1}^{\infty}\rho^{(km)^a}}, \quad \text{for } p \not\equiv 0 \ (\mathrm{mod}\ m). \tag{118}
\]

In this paper, we have given various ways of constructing families of discrete circular distributions and have selected the marginalized and conditionalized approaches for our analysis, but the other constructions described above can be explored further. Sometimes we have selected a marginalized and sometimes a conditionalized distribution, and sometimes the von Mises and sometimes the wrapped Cauchy discrete distribution. We note that such a situation, of using two distributions for different practical applications, is not uncommon in directional statistics.
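The probability function (116) and Theorem 10 lend themselves to a quick numerical sanity check. The Python sketch below is ours, not from the paper; the truncation level Q and the parameter values are arbitrary illustrative choices (the series terms decay extremely fast for \(\rho < 1\)). It builds the discrete wrapped stable probability function with b = 0 from the truncated series and verifies that its characteristic function, computed directly from the pmf, agrees with the formula (117)-(118).

```python
import cmath
import math

def dws_pmf(m, rho, t, a, Q=200):
    """Conditionalized discrete wrapped stable pmf (116) with b = 0,
    truncating the infinite series at Q terms (ample for rho < 1)."""
    denom = m * (1 + 2 * sum(rho**((m * k)**a) for k in range(1, Q + 1)))
    return [(1 + 2 * sum(rho**(q**a) * math.cos(q * 2 * math.pi * (r - t) / m)
                         for q in range(1, Q + 1))) / denom
            for r in range(m)]

def psi(p, m, rho, a, Q=200):
    """The factor psi_{p,m} of (118), for p not divisible by m."""
    num = rho**(p**a) + sum(rho**((k * m - p)**a) + rho**((k * m + p)**a)
                            for k in range(1, Q + 1))
    return num / (1 + 2 * sum(rho**((k * m)**a) for k in range(1, Q + 1)))

m, rho, t, a, p = 7, 0.5, 2, 1.5, 3
pmf = dws_pmf(m, rho, t, a)
# Characteristic function computed directly from the pmf ...
cf = sum(pmf[r] * cmath.exp(1j * p * 2 * math.pi * r / m) for r in range(m))
# ... and via Theorem 10.
cf_theorem = cmath.exp(1j * p * 2 * math.pi * t / m) * psi(p, m, rho, a)
```

With a = 1 this reduces to the CDWC and with a = 2 to the conditionalized discrete wrapped normal, so the same sketch covers both special cases of interest.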
Indeed, Kendall (1974) made a similar case when using the von Mises versus the wrapped normal distribution (the lumped Gaussian distribution):

"For analytical, computational, and statistical purposes it is sometimes the lumped Gaussian distribution which is the more convenient, and sometimes the von Mises. As they are practically indistinguishable, we shall have (and will accept) the option of using sometimes one, and sometimes the other, and we shall change horses more than once in crossing the broad stream that lies before us."

However, the situation is a bit more involved here, since we have competing discrete distributions from the two families. In Section 7, we have carried out various comparisons to deal with these and related inferential questions.

The marginalized and conditionalized families of distributions have significant potential for further development. In fact, we have already pointed out a characterization on the line: the two approaches lead to the same distribution if and only if the parent is the exponential distribution. We have some analogous results for the discrete circular case.

In Section 8 we have introduced the conditionalized circular stable distributions, the marginalized and conditionalized Kato-Jones distributions, and bivariate discrete distributions on the torus, but further work can be done in various directions, such as dealing with inference problems and data analysis. We have extended the model to the torus with discrete support, but we have left open the problem of how to extend it, for example, to the hypersphere; this temptation has been resisted as we have not had any motivating practical applications.
Supplementary material
Supplementary materials may be requested from the authors. More details and explanation will be available in the forthcoming monograph Mardia and Sriram (2020).
Acknowledgments
We wish to thank Arthur Pewsey for some help with R, Eris Chinellato for help with the human activity data and CHAD, and John Kent, Peter Green and John Wootton for some discussions. The first author would also like to thank the Leverhulme Trust for the Emeritus Fellowship.

References
Beran, R. (1979) Exponential models for directional data. Annals of Statistics, 1162–1178.
CHAD (2002) Extracting human activity information from CHAD on the PC. U.S. Environmental Protection Agency, Washington.
Chinellato, E., Mardia, K. V., Hogg, D. and Cohn, A. (2017) An incremental von Mises mixture framework for modelling human activity streaming data. Proceedings of the International Work-Conference on Time Series (ITISE 2017), 379–389.
Downs, T. D. and Mardia, K. V. (2002) Circular regression. Biometrika, 683–698.
Fisher, N. I. (1993) Statistical Analysis of Circular Data. Cambridge University Press.
Fisher, N. I., Lewis, T. and Embleton, B. (1987) Statistical Analysis of Spherical Data. Cambridge University Press.
Gatto, R. and Jammalamadaka, S. R. (2007) The generalized von Mises distribution. Statistical Methodology, 341–353.
HOWZ (n.d.) Link to HOWZ: a smart home for the elderly.
Humphreys, R. K. and Ruxton, G. D. (2017) Consequences of grouped data for testing for departure from circular uniformity. Behavioral Ecology and Sociobiology, 167.
Inusah, S. and Kozubowski, T. J. (2006) A discrete analogue of the Laplace distribution. Journal of Statistical Planning and Inference, 1090–1102.
Jammalamadaka, S. R. and SenGupta, A. (2001) Topics in Circular Statistics. Chapman and Hall/CRC.
Jayakumar, K. and Jacob, S. (2012) Wrapped skew Laplace distribution on integers: a new probability model for circular data. Open Journal of Statistics, 106–114.
Joe, H. (2014) Dependence Modeling with Copulas. Monographs on Statistics and Applied Probability. CRC Press, Taylor & Francis Group, Chapman and Hall.
Johnson, R. A. and Wehrly, T. E. (1978) Some angular-linear distributions and related regression models. Journal of the American Statistical Association, 602–606.
Jolley, L. B. W. (1961) Summation of Series. Dover Publications Inc, New York.
Jones, M. C., Pewsey, A. and Kato, S. (2015) On a class of circulas: copulas for circular distributions. Annals of the Institute of Statistical Mathematics, 843–862.
Kato, S. (2009) A distribution for a pair of unit vectors generated by Brownian motion. Bernoulli, 898–921.
Kato, S. and Jones, M. C. (2015) A tractable and interpretable four-parameter family of unimodal distributions on the circle. Biometrika, 181–190.
Kato, S. and Pewsey, A. (2015) A Möbius transformation-induced distribution on the torus. Biometrika, 359–370.
Kemp, A. W. (1997) Characterizations of a discrete normal distribution. Journal of Statistical Planning and Inference, 223–229.
Kendall, D. G. (1974) Hunting quanta. Philosophical Transactions of the Royal Society of London, Series A, Mathematical and Physical Sciences, 231–266.
Kent, J. T. and Tyler, D. E. (1988) Maximum likelihood estimation for the wrapped Cauchy distribution. Journal of Applied Statistics, 247–254.
Khatri, C. G. and Mardia, K. V. (1977) The von Mises-Fisher matrix distribution in orientation statistics. Journal of the Royal Statistical Society B, 95–106.
Ley, C. and Verdebout, T. (2017) Modern Directional Statistics. Chapman and Hall/CRC.
— (2018) Applied Directional Statistics, Modern Methods and Case Studies. Chapman and Hall/CRC.
Mardia, K. V. (1972) Statistics of Directional Data. Academic Press, London.
— (1975a) Statistics of directional data (with discussion). Journal of the Royal Statistical Society B, 349–393.
— (1975b) Characterizations of directional distributions. In G. P. Patil, S. Kotz and J. K. Ord (eds), Statistical Distributions in Scientific Work, 365–385. Reidel, Dordrecht.
— (2010) Bayesian analysis for bivariate von Mises distributions. Journal of Applied Statistics, 515–528.
Mardia, K. V. and Jupp, P. E. (2000) Directional Statistics. Wiley.
Mardia, K. V. and Patrangenaru, V. (2005) Directions and projective shapes. The Annals of Statistics, 1666–1699.
Mardia, K. V. and Sriram, K. (2020) Circular Statistics for Discrete Data with R. Research monograph (in preparation).
Mastrantonio, G., Jona Lasinio, G., Maruotti, A. and Calise, G. (2019) Invariance properties and statistical inference for circular data. Statistica Sinica, 67–80.
Navarro, A. K. W., Frellsen, J. and Turner, R. E. (2017) The multivariate generalised von Mises distribution: inference and applications. ArXiv working paper: https://arxiv.org/abs/1602.05003.
Papadatos, N. (2018) The characteristic function of the discrete Cauchy distribution. ArXiv: https://arxiv.org/abs/1809.09443.
Pewsey, A. and García-Portugués, E. (2020) Recent advances in directional statistics. ArXiv: https://arxiv.org/abs/2005.06889.
Pewsey, A. and Jones, M. C. (2005) Discrimination between the von Mises and wrapped normal distributions: just how big does the sample size have to be? Statistics, 81–89.
Pewsey, A., Neuhäuser, M. and Ruxton, G. D. (2013) Circular Statistics in R. Oxford University Press.
Szabłowski, P. J. (2001) Discrete normal distribution and its relationship with Jacobi theta functions. Statistics and Probability Letters, 289–299.
Watson, G. S. and Beran, R. J. (1967) Testing a sequence of unit vectors for serial correlation. Journal of Geophysical Research, 5655–5659.