Collective Bias Models in Two-Tier Voting Systems and the Democracy Deficit
Werner Kirsch (FernUniversität in Hagen, Germany) and Gabor Toth (FernUniversität in Hagen, Germany)

Abstract
In this article, we analyse a two-tier voting system in which the overall population is split into several groups, and we study the democracy deficit, i.e. the expected quadratic deviation of the council vote from a referendum involving the entire population. The proposals voted on fall into the category of yes/no voting. The voting behaviour is described by a probabilistic model in which a central influence, or bias, affects the voters' decisions. We study different versions of the model, in which the correlation between voters belonging to different groups is allowed to be weaker than the correlation within groups. The two main questions we analyse are the asymptotic behaviour of the (normalised) democracy deficit and the optimal voting weights the groups receive in the council vote when these weights are chosen to minimise the democracy deficit.
Keywords. Two-tier voting systems, probabilistic voting, collective bias models, democracy deficit, optimal weights, limit theorem.

2020 Mathematics Subject Classification. 91B12, 91B14, 60F05.
We study voting in two-tier voting systems and the determination of the optimal voting weights. Suppose there is a population subdivided into $M \in \mathbb{N}$ groups, each of which sends a representative to a council; in binary votes, each representative casts a vote according to the decision reached by majority rule in their respective group. Since the groups may be of different sizes, it is natural to assign different voting weights to each one. How should the weights be determined? One objective studied in the literature is to minimise the democracy deficit (see e.g. [3, 7]), i.e. the expected quadratic deviation of the council vote from a hypothetical referendum across the entire population.

Suppose the overall population is of size $N$, whereas each group has size $N_\lambda$, where the subindex $\lambda$ stands for the group, $\lambda \in \{1, \ldots, M\}$. Let the two possible votes be $\pm 1$. We can interpret $+1$ as 'aye' and $-1$ as 'nay'. The vote cast by voter $i$ in group $\lambda$ is denoted by $X_{\lambda i}$, where the subindex $\lambda$ stands once again for the group and $i \in \{1, \ldots, N_\lambda\}$ for the voter. We assume that voting behaviour is described by a probability measure.

Definition 1.
A voting measure $P$ is a probability measure on the space of voting configurations $\{-1,1\}^N$ with the symmetry property
\[
P(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) = P(X_{11}=-x_{11},\ldots,X_{M N_M}=-x_{M N_M}) \tag{1}
\]
for all voting configurations $(x_{11},\ldots,x_{M N_M}) \in \{-1,1\}^N$. Let $E$ be the expectation with respect to $P$.

While the votes cast are assumed to be deterministic, obeying the voters' preferences, which we do not model explicitly, the proposal put before them is assumed to be randomly selected. Since each yes/no question can be posed in two opposite ways, one to which a given voter would respond 'aye' and one to which she would respond 'nay', it is reasonable to assume that each voter votes 'aye' with the same probability she votes 'nay'. We also assume that there is a sufficient range of proposals to elicit all $2^N$ possible responses from the voting population.

Each group votes on any given issue. Since we assume the majority rule, the sums of the votes within each group are of key importance:

Definition 2.
For each group $\lambda$, we define the voting margin $S_\lambda := \sum_{i=1}^{N_\lambda} X_{\lambda i}$. The overall voting margin is $S := \sum_{\lambda=1}^{M} S_\lambda$.

Each group casts a vote in the council:

Definition 3.
The council vote of group $\lambda$ is given by
\[
\chi_\lambda := \begin{cases} 1, & \text{if } S_\lambda > 0, \\ -1, & \text{otherwise.} \end{cases}
\]
It is clear that each $\chi_\lambda$ is a random variable defined on the same probability space as the voting measure. Now we can define the democracy deficit: the democracy deficit given a voting measure $P$ and a set of weights $w_1, \ldots, w_M$ is defined by
\[
\Delta := E\left(\frac{S}{\sigma} - \sum_{\lambda=1}^{M} w_\lambda \chi_\lambda\right)^2,
\]
where $\sigma$ is a normalising constant. Note that the democracy deficit depends on the voting measure. $\sum_{\lambda=1}^{M} w_\lambda \chi_\lambda$ is the council voting margin. The council vote is in favour of a proposal if $\sum_{\lambda=1}^{M} w_\lambda \chi_\lambda > 0$, i.e. if the groups voting 'aye' hold more than half of the total voting weight.
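As a concrete illustration of this definition, the following Python sketch estimates the democracy deficit by Monte Carlo for a given voting measure and set of weights. The sampler used here (independent fair voters) is only a stand-in assumption to make the example self-contained; any sampler producing vote configurations according to a voting measure could be plugged in, and the normalising constant sigma is passed in explicitly.

    import numpy as np

    def sample_votes_independent(group_sizes, rng):
        # Stand-in voting measure: every voter votes +1 or -1 independently
        # with probability 1/2 (an assumption for illustration only).
        return [rng.choice([-1, 1], size=n) for n in group_sizes]

    def democracy_deficit(sampler, group_sizes, weights, sigma, n_issues=10_000, seed=0):
        # Monte Carlo estimate of Delta = E[(S/sigma - sum_l w_l chi_l)^2].
        rng = np.random.default_rng(seed)
        deviations = np.empty(n_issues)
        for t in range(n_issues):
            votes = sampler(group_sizes, rng)
            S_lambda = np.array([v.sum() for v in votes])   # group voting margins
            S = S_lambda.sum()                              # overall voting margin
            chi = np.where(S_lambda > 0, 1, -1)             # council votes
            deviations[t] = (S / sigma - np.dot(weights, chi)) ** 2
        return deviations.mean()

    # Illustrative usage with three groups and equal weights (assumed values):
    group_sizes = [100, 200, 300]
    sigma = np.sqrt(sum(group_sizes))   # sqrt(E(S^2)) under independent voting
    weights = np.ones(3)
    print(democracy_deficit(sample_votes_independent, group_sizes, weights, sigma))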
Remark 4. We note here that, for a fixed relative quota of $1/2$, a voting system with a set of weights $w_1, \ldots, w_M$ is equivalent to a voting system with the same quota and weights $w'_1, \ldots, w'_M$ whenever there is a constant $s > 0$ such that $w'_\lambda = s\, w_\lambda$ for all $\lambda = 1, \ldots, M$.

We will be concerned with matters related to the asymptotic voting behaviour and the democracy deficit. We will define a class of voting measures called collective bias models and analyse the large $N$ limit of the model in the sense of convergence theorems of the type
\[
\left(\frac{S_1}{N_1^{\gamma_1}}, \ldots, \frac{S_M}{N_M^{\gamma_M}}\right) \underset{N\to\infty}{\Longrightarrow} \eta,
\]
where $\underset{N\to\infty}{\Longrightarrow}$ stands for convergence in distribution, each $\gamma_\lambda > 0$, and $\eta$ is some probability measure on $\mathbb{R}^M$. In the models analysed in this article, all moments of the types
\[
E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}}\right), \quad E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \frac{S_\mu}{N_\mu^{\gamma_\mu}}\right), \quad E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \chi_\mu\right) = E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \operatorname{sgn}(S_\mu)\right)
\]
for all $\lambda, \mu = 1, \ldots, M$ converge to the corresponding moments of $\eta$. If $(P_N)$ is a sequence of voting measures such that $(S_1/N_1^{\gamma_1}, \ldots, S_M/N_M^{\gamma_M})$ converges in distribution, and the above mentioned moments also converge, we will also write $P_N \to \eta$ and say '$P_N$ converges to a limiting distribution'.

The general framework of a collective bias model is given by a set of bias random variables that represent some cultural or political influence that acts on all voters. There is a central bias variable $Z$ with distribution $\mu$ which induces correlation between voters of different groups. Furthermore, there is a bias variable $Z_\lambda$ for each group. Its conditional distribution given the realisation $Z = \zeta$ is $\rho_\zeta$. The group bias variable $Z_\lambda$ induces correlation between the voters belonging to that group. Every probability measure involved has support contained in $[-1,1]$. Given $Z = \zeta$ according to $\mu$ and $Z_\lambda = \zeta_\lambda$ according to $\rho_\zeta$, all voters in group $\lambda$ cast their vote independently, with a probability of voting 'aye' equal to $(1+\zeta_\lambda)/2$. Hence, a value $\zeta_\lambda = 1$ implies that all voters belonging to group $\lambda$ vote 'aye' almost surely. Similarly, $\zeta_\lambda = -1$ means they all vote 'nay' almost surely, and $\zeta_\lambda = 0$ means there is no bias and all voters in the group vote independently.

The symmetry condition (1) requires that the distribution of the random variables $Z_\lambda$ be symmetric around the origin for all $\lambda$. For simplicity's sake, we will be assuming the sufficient condition that

1. $\mu$ is symmetric,

2. for all $\zeta \in [-1,1]$, the conditional distributions $\rho_\zeta$ satisfy, for all measurable sets $A \subset [-1,1]$, the equality $\rho_\zeta(A) = \rho_{-\zeta}(-A)$.
Since we want to study correlated voting, we impose a monotonicity condition on the conditional distributions $\rho_\zeta$:

The function $\zeta \mapsto \rho_\zeta((0,1])$ is increasing.

The monotonicity condition represents our assumption that a higher central bias increases the likelihood that voters vote in favour of a proposal. It implies that there is positive correlation between the group biases in different groups.
Lemma 5.
For all groups $\kappa, \lambda$, $E(Z_\kappa Z_\lambda) \geq 0$ holds. In fact, the strict inequality $E(Z_\kappa Z_\lambda) > 0$ holds if $\mu$ is not the Dirac measure $\delta_0$ and the function $\zeta \mapsto \rho_\zeta((0,1])$ is not constant.
Definition 6.
A Collective Bias Model (CBM) $P$ is defined by setting, for each voting configuration $(x_{11}, \ldots, x_{M N_M}) \in \{-1,1\}^N$,
\[
P(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) := \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_{\lambda=1}^{M} \prod_{i=1}^{N_\lambda} \bigl((1-p)\,\delta_{-1}(\{x_{\lambda i}\}) + p\,\delta_{1}(\{x_{\lambda i}\})\bigr)\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu,
\]
where $p = (1+\zeta_\lambda)/2$. The measures $\rho_{\zeta 1}, \ldots, \rho_{\zeta M}$ are all identical to $\rho_\zeta$. We will also write
\[
P^{(\zeta_1,\ldots,\zeta_M)}(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) := \prod_{\lambda=1}^{M} \prod_{i=1}^{N_\lambda} \bigl((1-p)\,\delta_{-1}(\{x_{\lambda i}\}) + p\,\delta_{1}(\{x_{\lambda i}\})\bigr)
\]
for the conditional product measure.

For the rest of this article, we will assume that each group becomes large as the overall population goes to infinity, $\lim_{N\to\infty} N_\lambda = \infty$. Suppose that the group sizes as fractions of the overall population converge to fixed limits:
\[
\alpha_\lambda := \lim_{N\to\infty} \frac{N_\lambda}{N}, \quad \lambda = 1, \ldots, M.
\]
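The hierarchical structure of Definition 6 translates directly into a two-stage sampling procedure: first draw the central bias, then the group biases, then the individual votes. The following Python sketch implements this sampling scheme; the particular choice of distributions for $Z$ and the conditional $\rho_\zeta$ (here an additive perturbation clipped to $[-1,1]$) is only an assumption made to keep the example concrete, not a model used in the article.

    import numpy as np

    def sample_cbm_issue(group_sizes, rng):
        # Stage 1: central bias Z, symmetric on [-1, 1] (assumed uniform here).
        z = rng.uniform(-1.0, 1.0)
        votes = []
        for n_lambda in group_sizes:
            # Stage 2: group bias Z_lambda ~ rho_zeta, here an additive
            # perturbation of Z clipped to [-1, 1] (illustrative assumption).
            zeta_lambda = np.clip(z + rng.uniform(-0.25, 0.25), -1.0, 1.0)
            # Stage 3: conditionally independent votes, P(X = +1) = (1 + zeta)/2.
            p_aye = (1.0 + zeta_lambda) / 2.0
            x = np.where(rng.random(n_lambda) < p_aye, 1, -1)
            votes.append(x)
        return votes

    rng = np.random.default_rng(1)
    group_sizes = [50, 150, 300]
    margins = np.array([v.sum() for v in sample_cbm_issue(group_sizes, rng)])
    print("group margins S_lambda:", margins, "council votes:", np.where(margins > 0, 1, -1))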
We want to set the weights so that the democracy deficit is minimal. By taking partial derivatives of $\Delta$, we obtain the linear equation system that characterises the optimal weights:
\[
\bigl(E_N(\chi_\lambda \chi_\mu)\bigr)_{\lambda,\mu=1,\ldots,M}\, \bigl(w_\nu\bigr)_{\nu=1,\ldots,M} = \frac{1}{\sigma_N} \bigl(E_N(\chi_\lambda S)\bigr)_{\lambda=1,\ldots,M}. \tag{2}
\]
We will refer to the coefficient matrix above as $A_N := (a_{\lambda\mu})_{\lambda,\mu=1,\ldots,M} := (E_N(\chi_\lambda \chi_\mu))_{\lambda,\mu=1,\ldots,M}$, write $w := (w_\nu)_{\nu=1,\ldots,M}$, and denote the vector on the right hand side by
\[
b := (b_\lambda)_{\lambda=1,\ldots,M} := \frac{1}{\sigma_N} \bigl(E_N(\chi_\lambda S)\bigr)_{\lambda=1,\ldots,M}. \tag{3}
\]
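For finite populations, the system (2) can be set up numerically by estimating the entries of $A_N$ and $b$ from simulated issues and then solving for the weights. The sketch below does this with numpy; the voting-measure sampler is left abstract (any function returning one vote configuration per call, such as the CBM sampler sketched above, can be used), and the Monte Carlo sample size is an assumption.

    import numpy as np

    def optimal_weights(sampler, group_sizes, n_issues=20_000, seed=0):
        # Estimates A_N = E(chi_l chi_m), sigma_N = sqrt(E(S^2)), b = E(chi_l S)/sigma_N
        # from simulated issues and solves the linear system (2) for the weights.
        rng = np.random.default_rng(seed)
        M = len(group_sizes)
        A = np.zeros((M, M))
        ES2 = 0.0
        EchiS = np.zeros(M)
        for _ in range(n_issues):
            votes = sampler(group_sizes, rng)
            S_lambda = np.array([v.sum() for v in votes])
            chi = np.where(S_lambda > 0, 1, -1)
            S = S_lambda.sum()
            A += np.outer(chi, chi)
            ES2 += S ** 2
            EchiS += chi * S
        A /= n_issues
        sigma = np.sqrt(ES2 / n_issues)
        b = EchiS / n_issues / sigma
        return np.linalg.solve(A, b)   # requires A to be invertible (Proposition 7)

    # e.g. optimal_weights(sample_cbm_issue, [200, 300, 500]) with the sampler above.

In practice one would check the conditioning of the estimated matrix before solving, since near the second category discussed below it approaches the singular all-ones matrix.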
The matrix $A_N$ is the covariance matrix of the random vector $(\chi_1, \ldots, \chi_M)$. It is invertible under very mild conditions. If no linear combination of the $\chi_\lambda$ is almost surely constant, i.e. if there are no constants $\alpha \in \mathbb{R}^M$, $\alpha \neq 0$, and $\beta \in \mathbb{R}$ such that $\sum_{\lambda=1}^{M} \alpha_\lambda \chi_\lambda = \beta$ a.s., then we say $\chi_1, \ldots, \chi_M$ are stochastically linearly independent. Then the following proposition holds:

Proposition 7. The covariance matrix $A_N$ is positive definite if and only if $\chi_1, \ldots, \chi_M$ are stochastically linearly independent.

The proof of this statement is straightforward (see Theorem 1.4 in [9]). A sufficient condition for stochastic linear independence is
Lemma 8.
If for all $(x_1, \ldots, x_M) \in \{-1,1\}^M$ we have $P(\operatorname{sgn}(S_\lambda) = x_\lambda,\ \lambda = 1, \ldots, M) > 0$, then $\chi_1, \ldots, \chi_M$ are stochastically linearly independent.

For finite populations, $A_N$ is positive definite unless the biases are overwhelming.

Proposition 9.
If for all $\lambda$ the bias $Z_\lambda$ satisfies $E|Z_\lambda| < 1$, then $A_N$ is invertible.

Proof. If $E|Z_\lambda| < 1$, then there is a closed interval $[-q_\lambda, q_\lambda] \subsetneq [-1,1]$ such that $P(Z_\lambda \in [-q_\lambda, q_\lambda]) > 0$. Due to the conditional independence of the $X_{\lambda i}$, given $(\zeta_1, \ldots, \zeta_M)$ with $\zeta_\lambda \in [-q_\lambda, q_\lambda]$, both $P^{(\zeta_1,\ldots,\zeta_M)}(S_\lambda < 0) > 0$ and $P^{(\zeta_1,\ldots,\zeta_M)}(S_\lambda > 0) > 0$. This follows from $|\zeta_\lambda| \leq q_\lambda < 1$, which gives $P^{(\zeta_1,\ldots,\zeta_M)}(X_{\lambda i} < 0) > 0$ and $P^{(\zeta_1,\ldots,\zeta_M)}(X_{\lambda i} > 0) > 0$ for all $i$. Since, conditionally on $(\zeta_1, \ldots, \zeta_M)$, the voting margins $S_\lambda$ are independent, we obtain the sufficient condition in Lemma 8 by integrating $P^{(\zeta_1,\ldots,\zeta_M)}$ with respect to $\rho_{\zeta 1}, \ldots, \rho_{\zeta M}$, and $\mu$.

Therefore, the optimal weights are uniquely determined provided the biases are not overwhelming. However, calculating the inverse matrix of $A_N$ and analysing its properties is very difficult for finite $N$. We instead turn to the limit $A := \lim_{N\to\infty} A_N$. For large populations, it is not sufficient that $E|Z_\lambda| < 1$.

Proposition 10.
The covariance matrix $A$ is positive definite if and only if for all $\lambda, \mu \in \{1, \ldots, M\}$, $\lambda \neq \mu$, $P(Z_\lambda Z_\mu \leq 0) > 0$ holds.

For a proof of this proposition see Proposition 6.2.3 in [11].

We now turn to the democracy deficit. In the definition of the democracy deficit, we choose the normalisation $\sigma_N := \sqrt{E(S^2)}$. Then $\Delta_N$ remains uniformly bounded for all $N$. This allows us to study the large $N$ behaviour of the democracy deficit. If we did not normalise, i.e. $\sigma_N = 1$, then the democracy deficit would grow without bound, even if we chose the weights to minimise it. Unless we state otherwise, we will be considering $\sigma_N = \sqrt{E(S^2)}$.

Lemma 11.
For any fixed set of weights $w_1, \ldots, w_M$, there is a constant $K > 0$ such that for all $N$ the democracy deficit is bounded above by $K$. Also, $E_N\bigl((S/\sigma_N)^2\bigr) = 1$ for every $N$.

Remark 12. The second fact is essential. If we normalised in such a way that $E_N\bigl((S/\sigma_N)^2\bigr) \to 0$ as $N \to \infty$, then minimising $\Delta$ would be equivalent to minimising the squared council voting margin, i.e. we would be selecting weights to ensure that the council is as evenly split on average as possible. That is not what we want.

Proof.
The second claim is obvious: $E_N\bigl((S/\sigma_N)^2\bigr) = E_N(S^2)/E_N(S^2) = 1$. For any fixed set of weights $w_1, \ldots, w_M$, the random variable $W := \sum_{\lambda=1}^{M} w_\lambda \chi_\lambda$ is bounded, so the absolute moments $E(|W|^k)$ of all orders $k \in \mathbb{N}$ exist. We calculate an upper bound for $\Delta$ that is independent of $N$:
\[
\Delta \leq E\left(\frac{S}{\sigma}\right)^2 + 2\, E\left(\left|\frac{S}{\sigma}\right| |W|\right) + E(W^2) \leq 1 + 2\sqrt{E\left(\frac{S}{\sigma}\right)^2}\sqrt{E(W^2)} + E(W^2) = \Bigl(1 + \sqrt{E(W^2)}\Bigr)^2 =: K.
\]

Next we prove two results concerning the behaviour of $\Delta$ for large $N$. As we will see later, for most models we consider in this article, the set of optimal weights is uniquely determined for finite $N$. The same cannot be said for the limit $N \to \infty$. We will call the set of optimal weights according to the limiting distribution $W_\infty$. Let $w^N \in \mathbb{R}^M$ be the uniquely determined optimal weights for overall population size $N$.

Proposition 13.
Assume $P_N$ converges to a limiting distribution. Then for all $v, w \in W_\infty$, we have $\lim_{N\to\infty} |\Delta_N(v) - \Delta_N(w)| = 0$.

This statement holds for all convergent sequences of voting measures with unique optimal weights $w^N$. We will state a stronger result next that relies on some properties of CBMs.

Proof. The democracy deficits for finite $N$, $\Delta_N$, and for the limiting distribution, $\Delta$, are polynomials of degree 2 in the weights $w_1, \ldots, w_M$. Due to the convergence of $P_N$, we have pointwise convergence $\Delta_N \to \Delta$. Also
\[
|\Delta_N(v) - \Delta_N(w)| \leq |\Delta_N(v) - \Delta(v)| + |\Delta(v) - \Delta(w)| + |\Delta(w) - \Delta_N(w)|.
\]
The first and the last term converge to 0 due to the pointwise convergence $\Delta_N \to \Delta$. The second term equals 0, since $v, w \in W_\infty$ and hence $\Delta(v) = \Delta(w)$.

Definition 14.
For any two functions $f$ and $g$ of a natural number $n$, we write $f(n) \approx g(n)$ to indicate that $\lim_{n\to\infty} f(n)/g(n) = 1$ holds.

Proposition 15.
Assume $P_N$ converges to a limiting distribution. Suppose also that

1. $|W_\infty| = 1$, or

2. the limiting covariance matrix of $(\chi_1, \ldots, \chi_M)$ is $A = (1)_{\lambda,\mu=1,\ldots,M}$ and for all $\lambda, \mu = 1, \ldots, M$, $E_N(S\chi_\lambda) \approx E_N(S\chi_\mu)$.

Then we have for all $w \in W_\infty$: $\lim_{N\to\infty} \left|\Delta_N(w^N) - \Delta_N(w)\right| = 0$.

Remark 16. As we will see later, all models considered in this article satisfy one of these two conditions. Either the limiting covariance matrix $A$ is invertible, and hence the optimal weights are uniquely determined, or else the second condition holds.

Definition 17. If $P_N$ converges to a limiting distribution, then we say that $P_N$ belongs to the first category if $|W_\infty| = 1$. If instead the second condition in Proposition 15 holds, then we say that $P_N$ belongs to the second category.

Proof of Proposition 15. We first prove the assertion if $|W_\infty| = 1$. For each $N$, the weight vector $w^N$ is the solution of the linear equation system (2). Furthermore, as $P_N$ converges to a limiting distribution, the coefficients in the equation system corresponding to $N$ converge to the coefficients of the limiting equation system. This implies that for $W_\infty = \{w\}$, we have $w^N \to w$ as $N \to \infty$. We calculate
\[
\left|\Delta_N(w^N) - \Delta_N(w)\right| \leq \left|\Delta_N(w^N) - \Delta(w^N)\right| + \left|\Delta(w^N) - \Delta(w)\right| + \left|\Delta(w) - \Delta_N(w)\right|.
\]
The second term converges to 0 since $\Delta$ is continuous and $w^N \to w$. The third term converges to 0 due to the pointwise convergence $\Delta_N \to \Delta$. As for the first term,
\[
\left|\Delta_N(w^N) - \Delta(w^N)\right| \leq \left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2} - \lim_{N\to\infty}\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2}\right| + 2 \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \left|\frac{E_N(S_\lambda \chi_\mu)}{\sigma_N} - \lim_{N\to\infty}\frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\right| \left|w^N_\mu\right| + \left|\Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2 - \Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2\right|.
\]
The last summand is 0. The first summand converges to 0 due to the convergence of $P_N$, whereas the second term converges to 0 because of the convergence of $P_N$ and the boundedness of the convergent sequence $w^N$.

Next we prove the assertion under the second condition. First note that, similarly to the first part of the proof, we have $d(w^N, W_\infty) \to 0$ as $N \to \infty$. Here $d(w^N, W_\infty)$ denotes the distance of the point $w^N$ from the set $W_\infty$, i.e. $d(w^N, W_\infty) := \inf_{w \in W_\infty} d(w^N, w)$. However, there may not be a single $w^* \in W_\infty$ such that $d(w^N, w^*) \to 0$. Let $w \in W_\infty$. Then
\[
\left|\Delta_N(w^N) - \Delta_N(w)\right| \leq \left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2} - \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2}\right| + 2\left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\, w^N_\mu - \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\, w_\mu\right| + \left|\Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2 - \Bigl(\sum_{\lambda=1}^{M} w_\lambda\Bigr)^2\right|.
\]
The first term above is 0. The last term converges to 0 because $A = (1)_{\lambda,\mu=1,\ldots,M}$ imposes a condition on the sum of the optimal weights: even if $w^N$ does not converge to $w$, $\sum_{\lambda=1}^{M} w^N_\lambda \to \sum_{\lambda=1}^{M} w_\lambda$ must hold. The middle term is equal to
\[
2\left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\bigl(w^N_\mu - w_\mu\bigr)\right| = 2\left|\sum_{\mu=1}^{M} \frac{E_N(S\chi_\mu)}{\sigma_N}\bigl(w^N_\mu - w_\mu\bigr)\right| = 2\left|\frac{E_N(S\chi_1)}{\sigma_N}\right| \left|\sum_{\mu=1}^{M}\bigl(w^N_\mu - w_\mu\bigr)\right| \to 0,
\]
where we used the boundedness of the sequence $\left|E_N(S\chi_1)/\sigma_N\right|$.

To conclude this section, we calculate the optimal weights according to the limiting distribution. Since the group bias variables $Z_\lambda$ are identically distributed, the limiting covariance matrix $A$ has diagonal entries equal to 1 and identical off-diagonal entries $a := a_{\kappa\lambda} = E(\chi_\kappa \chi_\lambda)$, $\kappa \neq \lambda$. With identical off-diagonal entries, $A$ is invertible if and only if $0 \leq a < 1$. $A^{-1}$ then has entries given by
\[
(A^{-1})_{\kappa\kappa} = \frac{1 + (M-2)a}{(1-a)\bigl((M-1)a+1\bigr)} \quad (\kappa = 1, \ldots, M), \qquad (A^{-1})_{\kappa\lambda} = \frac{-a}{(1-a)\bigl((M-1)a+1\bigr)} \quad (\kappa \neq \lambda).
\]
The factor $D := \frac{1}{(1-a)((M-1)a+1)} > 0$, and the optimal weight of group $\kappa$ is given by
\[
w_\kappa = D\Bigl(\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda \neq \kappa} b_\lambda\Bigr). \tag{4}
\]
In the next section, we present a limit theorem that will allow us to calculate the entries of $b$.

If we do not normalise, i.e. $\sigma = 1$, then the entries of $b$ will in general diverge to infinity. Since $A_N$ converges to $A$, this implies the weights will tend to infinity. Hence there is no set of optimal weights according to the limiting distribution. We can get around this by calculating relative weights:
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda \neq \kappa} b_\lambda}{(1-a) \sum_{\lambda=1}^{M} b_\lambda}. \tag{5}
\]
As we will see in the next section, the absolute weights (4) and the relative weights (5) are equivalent in the sense of Remark 4. The main result for the large $N$ behaviour of these models is the following Law of Large Numbers:

Theorem 18.
Let $(P_N)$ be a sequence of collective bias measures. Then
\[
\left(\frac{S_1}{N_1}, \ldots, \frac{S_M}{N_M}\right) \Longrightarrow (Z_1, \ldots, Z_M),
\]
and the moments of $(S_1/N_1, \ldots, S_M/N_M)$ converge to the moments of $(Z_1, \ldots, Z_M)$.

For the proof, see the Appendix.

It should be mentioned that for Dirac-distributed $Z_\lambda$, we cannot use this theorem to calculate and analyse the optimal weights. We would have to normalise with a smaller power $\gamma_\lambda < 1$. If $Z_\lambda \sim \delta_0$, then all voters are independent. It is well known that the square root law applies, i.e. the optimal weights are proportional to $\sqrt{\alpha_\lambda}$. This was first studied by Penrose [8].

However, if at least one $Z_\lambda$ is not Dirac-distributed, then we can calculate the optimal weights (4) with the help of Theorem 18. We recall from (3) that the entries of the vector $b$ are given by $b_\lambda = \frac{1}{\sigma} E(\chi_\lambda S)$. We calculate
\[
\sigma = \sqrt{E(S^2)} = \sqrt{\sum_{\kappa=1}^{M}\sum_{\lambda=1}^{M} E(S_\kappa S_\lambda)} = \sqrt{\sum_{\kappa=1}^{M} E(S_\kappa^2) + \sum_{\kappa=1}^{M}\sum_{\lambda\neq\kappa} E(S_\kappa S_\lambda)}.
\]
By Theorem 18, for all $\kappa \neq \lambda$,
\[
E(S_\kappa^2) = E\left(\left(\frac{S_\kappa}{N_\kappa}\right)^2\right) N_\kappa^2 \approx N_\kappa + E(Z_\kappa^2)\, N_\kappa^2, \qquad E(S_\kappa S_\lambda) = E\left(\frac{S_\kappa}{N_\kappa}\frac{S_\lambda}{N_\lambda}\right) N_\kappa N_\lambda \approx E(Z_\kappa Z_\lambda)\, N_\kappa N_\lambda.
\]
Due to our assumption that at least one $Z_\lambda$ is not Dirac-distributed, there must be some $\kappa$ for which $E(S_\kappa^2) \approx E(Z_\kappa^2)\, N_\kappa^2$. Since we have $N_\kappa \approx \alpha_\kappa N$,
\[
\sigma_N \approx N \sqrt{E\left(\sum_{\kappa=1}^{M} \alpha_\kappa Z_\kappa\right)^2}.
\]
Furthermore,
\[
E(\chi_\lambda S) = \sum_{\kappa=1}^{M} E(\chi_\lambda S_\kappa) = E(|S_\lambda|) + \sum_{\kappa\neq\lambda} E(\chi_\lambda S_\kappa) = E\left(\frac{|S_\lambda|}{N_\lambda}\right) N_\lambda + \sum_{\kappa\neq\lambda} E\left(\chi_\lambda \frac{S_\kappa}{N_\kappa}\right) N_\kappa \approx E|Z_\lambda|\, N_\lambda + \sum_{\kappa\neq\lambda} E\bigl(\operatorname{sgn}(Z_\lambda) Z_\kappa\bigr)\, N_\kappa
\]
\[
\approx N\Bigl(E|Z_\lambda|\, \alpha_\lambda + \sum_{\kappa\neq\lambda} E\bigl(\operatorname{sgn}(Z_\lambda) Z_\kappa\bigr)\, \alpha_\kappa\Bigr) = N\Bigl(E|Z_1|\, \alpha_\lambda + E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr)(1 - \alpha_\lambda)\Bigr) = N\Bigl(\bigl(E|Z_1| - E(\operatorname{sgn}(Z_1) Z_2)\bigr)\alpha_\lambda + E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr)\Bigr).
\]
For the rest of this article, we will write $d$ for $\sqrt{E\bigl(\sum_{\kappa=1}^{M} \alpha_\kappa Z_\kappa\bigr)^2}$, $m$ for $E|Z_1|$, and $r$ for $E(\operatorname{sgn}(Z_1) Z_2)$. Note that $m \geq r$. Then
\[
b_\lambda = \frac{(m-r)\,\alpha_\lambda + r}{d}.
\]
Substituting these into (4), we obtain after some simplification
\[
w_\kappa = D\Bigl(\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda\neq\kappa} b_\lambda\Bigr) = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr). \tag{6}
\]
The optimal weights are given by the sum of a constant term $\frac{D}{d}(r - am)$ and a term proportional to the size of the group $\alpha_\kappa$. This is similar to the electoral college in the U.S., where each state is represented by a number of electors equal to the number of senators (two for each state, independently of the population) plus the number of representatives (proportional to the state's population).

The optimal relative weights are given by
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa}{(1-a)\bigl((M-1)r + m\bigr)}. \tag{7}
\]
As we see, the optimal absolute weights (6) and the optimal relative weights (7) only differ by a constant multiplicative factor. Per Remark 4, both sets of optimal weights yield the same voting system.
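Formulas (4)-(7) reduce the asymptotic weight problem to a handful of scalar quantities. A small Python helper, sketched below under the assumption that $a$, $m$, $r$, $d$ and the group-size fractions $\alpha_\kappa$ have already been obtained (analytically or by simulation), evaluates the optimal absolute weights (6) and the relative weights (7).

    import numpy as np

    def limiting_optimal_weights(alpha, a, m, r, d):
        # alpha: limiting group-size fractions alpha_kappa (summing to 1)
        # a = E(chi_1 chi_2), m = E|Z_1|, r = E(sgn(Z_1) Z_2), d = sqrt(E((sum alpha_k Z_k)^2))
        alpha = np.asarray(alpha, dtype=float)
        M = len(alpha)
        D = 1.0 / ((1.0 - a) * ((M - 1) * a + 1.0))            # prefactor from A^{-1}
        w = (D / d) * (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha)        # formula (6)
        w_rel = (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha) \
                / ((1.0 - a) * ((M - 1) * r + m))                                # formula (7)
        return w, w_rel

    # Illustrative numbers only (not taken from the article):
    w, w_rel = limiting_optimal_weights(alpha=[0.5, 0.3, 0.2], a=0.4, m=0.5, r=0.3, d=0.45)
    print(w, w_rel, w_rel.sum())

As a consistency check, w divided by its sum coincides with w_rel, reflecting the equivalence of (6) and (7) stated above.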
Multiplicative Collective Bias Model

We now define the first of two versions of a CBM. These two CBMs were first introduced and analysed in [11]. The single-group CBM was previously studied by several authors ([10, 3, 7, 5]).
Definition 19.
Let $\mu$ be some symmetric measure on $[-1,1]$ and let $Y$ be a positive random variable with range contained in $(0,1]$. Let $Y_1, \ldots, Y_M$ be i.i.d. copies of $Y$ that are also independent of $Z$. Set the group bias variables $Z_\lambda := Z Y_\lambda$. We call a CBM as in Definition 6 with this type of group bias variables a multiplicative Collective Bias Model (m-CBM). We will refer to each $Y_\lambda$ as a group modifier.

Remark 20. The above definition fits into the general CBM framework by positing some probability measure $\rho$ on $(0,1]$. Let $\zeta$ be a realisation of $Z$ and let $A$ be a measurable set belonging to $[0 \wedge \zeta, 0 \vee \zeta]$. Then we define the conditional distribution
\[
\rho_\zeta(A) := \begin{cases} \delta_0(A) & \text{if } \zeta = 0, \\ \rho\bigl(\tfrac{1}{\zeta} A\bigr) & \text{if } \zeta \neq 0. \end{cases}
\]
So $\rho_\zeta$ is a contraction of $\rho$ with the degree of contraction given by the value $\zeta$. In the definition above, it is instead the variable $Y_\lambda$ which rescales the central bias variable $Z$.

Note that due to the symmetry of $\mu$, condition (1) holds for any m-CBM. The monotonicity of $\zeta \mapsto \rho_\zeta((0,1])$ is also satisfied. In fact, for all $\zeta \leq 0$, $\rho_\zeta((0,1]) = 0$, and for all $\zeta > 0$, $\rho_\zeta((0,1]) = 1$.

We revisit the covariance between the prevalent biases in different groups (Lemma 5).
Lemma 21.
For m-CBMs, for all groups $\kappa \neq \lambda$, $E(Z_\kappa Z_\lambda) = E(Z^2)\,(EY)^2$, which is positive if and only if $\mu \neq \delta_0$.

A simple calculation yields the claim. Next we calculate the asymptotic moments of an m-CBM.
Theorem 22.
For all groups $\nu, \nu' \in \{1, \ldots, M\}$, $\nu \neq \nu'$:

1. $E(S_\nu^2) \approx N_\nu + N_\nu^2\, E(Z^2)\, E(Y^2)$.

2. $E(S_\nu S_{\nu'}) = N_\nu N_{\nu'}\, E(Z^2)\,(EY)^2$.

3. $E(S_\nu \chi_{\nu'}) \approx N_\nu\, E(|Z|)\, E(Y) \approx E(|S_\nu|) = E(S_\nu \chi_\nu)$.

4. $E(\chi_\nu \chi_{\nu'}) \approx 1 - \mu(Z = 0)$.

Note that in the first statement the leading term has a coefficient equal to 0 if and only if $Z$ follows a $\delta_0$ distribution. In that case, the lower-order term becomes important. Since $Z \sim \delta_0$ implies that all voters are independent, we will not be pursuing this further.

Corollary 23. If $E(Z^2) > 0$, then we have $E(S_\nu^2) \approx N_\nu^2\, E(Z^2)\, E(Y^2)$.

Together with Theorem 18, this theorem shows that any sequence of m-CBMs converges to the limiting distribution $(Z_1, \ldots, Z_M) = (Z Y_1, \ldots, Z Y_M)$ given by its bias variables.

As we see by the fourth statement, the set of all m-CBMs can be partitioned into the two categories established in Definition 17. The first category consists of those cases where $\mu\{0\} > 0$. In this category, on some fraction of all possible issues, the voters make up their own minds and vote independently. This holds even in the large $N$ limit. In the second category are those cases where $\mu\{0\} = 0$. In this case, the voters tend to vote alike. In the large $N$ limit, the voting behaviour becomes strongly correlated due to Theorem 18. The distinction between the two categories has important implications for the optimal weights.

If $\mu\{0\} = 1$, then all voters cast their votes independently. This case has been extensively analysed in the past and we will not consider it. Otherwise, $\mu \neq \delta_0$ implies $E_\mu(|Z|) > 0$ and $E_\mu(Z^2) > 0$. In the second category, $A$, the limiting covariance matrix of $(\chi_1, \ldots, \chi_M)$, is singular: $A = (1)_{\lambda,\mu=1,\ldots,M}$. Hence, the optimal weights are not uniquely determined. Instead, any set of weights which sum to $b_1 = \lim_{N\to\infty} \frac{1}{\sigma_N} E_N(S\chi_1)$ is optimal. We need to show that the second part of condition two, $b_\kappa = b_\lambda$ for all $\kappa, \lambda$, holds as well.

Theorem 24.
For any m-CBM in the second category and for all groups $\lambda$,
\[
b_\lambda = \lim_{N\to\infty} \frac{1}{\sigma_N} E_N(S\chi_\lambda) = \frac{E|Z|}{\sqrt{E(Z^2)}} > 0,
\]
which is independent of the specific group $\lambda$.

Proof. We have $E(|Z|) > 0$, $E(Z^2) > 0$, and $E|Y| > 0$. Then
\[
\frac{1}{\sigma_N} E_N(S\chi_\lambda) \approx \frac{1}{\sigma_N} \sum_{\nu=1}^{M} E_N(S_\nu \chi_\lambda) \approx \frac{1}{\sigma_N} \sum_{\nu=1}^{M} E_N(|S_\nu|) \approx \frac{\sum_{\nu=1}^{M} E|Z|\, EY\, N_\nu}{\sqrt{\sum_\kappa \sum_\nu E(S_\kappa S_\nu)}} \approx \frac{N \sum_{\nu=1}^{M} E|Z|\, EY\, \alpha_\nu}{N\sqrt{\sum_\kappa \sum_\nu E(Z^2)\,(EY)^2\, \alpha_\kappa \alpha_\nu}} = \frac{E|Z|\, EY}{\sqrt{E(Z^2)}\, EY} = \frac{E|Z|}{\sqrt{E(Z^2)}}.
\]
Hence, if $\mu\{0\} = 0$, then the m-CBM satisfies condition two in Proposition 15 and therefore belongs to the second category. Thus the minimal democracy deficit, when the weights are chosen optimally, converges as $N \to \infty$. Therefore, we can approximate the minimal level of the democracy deficit for large but finite populations by solving the asymptotic problem. Since asymptotically any set of weights with a certain sum is optimal, we can choose any weights which are politically feasible. The intuition behind this result is that in the second category an overwhelming majority of voters tend to align, hence all groups vote the same almost surely. It does not matter what the voting weights in the council are if all representatives agree on almost all issues.

If $\mu\{0\} > 0$, the limiting matrix $A$ is invertible by Proposition 10, so the optimal weights are uniquely determined and condition one in Proposition 15 is satisfied. By that proposition, the minimal democracy deficit for finite but large $N$ can be well approximated by plugging in the limiting optimal weights. We can even approximate the optimal weights $w^N$ by the optimal weights in the limiting problem.

As a conclusion to the analysis of m-CBMs, we calculate the optimal weights. Let $0 \leq a := E(\chi_1 \chi_2) < 1$. By (6),
\[
w_\kappa = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr).
\]
In m-CBMs, the terms $a, m, r$ are given by
\[
a = E(\chi_1\chi_2) = 1 - \mu(Z = 0), \qquad m = E|Z_1| = E|ZY_1| = E|Z|\, EY, \qquad r = E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr) = E\bigl(\operatorname{sgn}(ZY_1)\, ZY_2\bigr) = E\bigl(\operatorname{sgn}(Z) Z\bigr) EY = E|Z|\, EY.
\]
We have $m = r$, so the part of the optimal weight that is proportional to the group's size vanishes. The optimal weights according to the limiting distribution are equal to
\[
w_\kappa = \frac{D\, E|Z|\, EY\, (1-a)}{d} = \frac{D\, E|Z|\, EY\, \mu(Z=0)}{d}.
\]
So all groups receive the same weight to minimise the democracy deficit, independently of their respective group size.
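The equal-weight conclusion can be checked numerically. The sketch below simulates a specific m-CBM (central bias uniform on $\{-1/2, 0, 1/2\}$ and group modifiers uniform on $(0,1)$, an assumption chosen only so that $\mu\{0\} > 0$ and the first category applies), estimates $a$, $m$, $r$ and $d$ from samples of the limiting bias vector $(Z_1,\ldots,Z_M)$, and confirms that $m \approx r$, so that formula (6) assigns numerically equal weights.

    import numpy as np

    rng = np.random.default_rng(2)
    M, n = 3, 200_000
    alpha = np.array([0.5, 0.3, 0.2])

    # Central bias Z: uniform on {-1/2, 0, 1/2}, so mu({0}) = 1/3 > 0 (assumed example).
    Z = rng.choice([-0.5, 0.0, 0.5], size=n)
    # Group modifiers Y_1, ..., Y_M: i.i.d. uniform on (0, 1) (assumed example).
    Y = rng.uniform(0.0, 1.0, size=(n, M))
    Zg = Z[:, None] * Y                    # limiting group biases Z_lambda = Z * Y_lambda

    # sign(0) = 0 contributes 0, matching the expectation of independent fair council votes.
    a = np.mean(np.sign(Zg[:, 0]) * np.sign(Zg[:, 1]))   # E(chi_1 chi_2) in the limit
    m = np.mean(np.abs(Zg[:, 0]))                        # E|Z_1|
    r = np.mean(np.sign(Zg[:, 0]) * Zg[:, 1])            # E(sgn(Z_1) Z_2)
    d = np.sqrt(np.mean((Zg @ alpha) ** 2))              # sqrt(E((sum alpha_k Z_k)^2))

    D = 1.0 / ((1.0 - a) * ((M - 1) * a + 1.0))
    w = (D / d) * (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha)   # formula (6)
    print("m - r =", m - r, "weights:", w)   # m is close to r, hence nearly equal weights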
Additive Collective Bias Model

The second version of a CBM we consider is specified by
Definition 25.
Let $\mu$ and $\rho$ be two symmetric measures on $[-1/2, 1/2]$ and let $Y_1, \ldots, Y_M$ be i.i.d. according to $\rho$ and independent of $Z$. Set the group bias variables $Z_\lambda := Z + Y_\lambda$. We call a CBM as in Definition 6 with this type of group bias variables an additive Collective Bias Model (a-CBM). We will refer to each $Y_\lambda$ as a group modifier.

Remark 26. The above definition fits into the general CBM framework by defining the conditional distributions $\rho_\zeta$ for each $\zeta \in [-1/2, 1/2]$ as the distribution of the random variable $Y + \zeta$. So $\rho_\zeta$ is a translation of $\rho$ by $\zeta$.

Note that due to the symmetry of $\mu$ and $\rho$, condition (1) holds for any a-CBM. The monotonicity of $\zeta \mapsto \rho_\zeta((0,1])$ is also satisfied.
Lemma 27.
For a-CBMs, for all groups $\kappa \neq \lambda$, $E(Z_\kappa Z_\lambda) = E(Z^2)$, which is positive if and only if $\mu \neq \delta_0$.

Contrary to m-CBMs, in a-CBMs the groups can be independent while still having correlation within each group. We turn to the limiting moments of an a-CBM.
Theorem 28.
For all groups $\nu, \nu' \in \{1, \ldots, M\}$, $\nu \neq \nu'$:

1. $E(S_\nu^2) \approx N_\nu + N_\nu^2\,\bigl(E(Z^2) + E(Y^2)\bigr)$.

2. $E(S_\nu S_{\nu'}) = N_\nu N_{\nu'}\, E(Z^2)$.

3. $E(|S_\nu|) \approx \sqrt{N_\nu} + N_\nu\, E|Z + Y|$.

4. $E(S_\nu \chi_{\nu'}) \approx N_\nu\, E\bigl[(Z + Y_\nu)\operatorname{sgn}(Z + Y_{\nu'})\bigr]$.

5. $E(\chi_\nu \chi_{\nu'}) \approx E\bigl[\operatorname{sgn}(Z + Y_\nu)\operatorname{sgn}(Z + Y_{\nu'})\bigr]$.

Note that in the first and third statements the leading terms have coefficients equal to 0 if and only if both $Z$ and $Y$ follow a $\delta_0$ distribution. In that case, the lower-order terms become important. Since $Z, Y \sim \delta_0$ implies that all voters are independent, we will not be pursuing this further.

Corollary 29. If $E(Z^2) > 0$ or $E(Y^2) > 0$, then we have
\[
E(S_\nu^2) \approx N_\nu^2\,\bigl(E(Z^2) + E(Y^2)\bigr), \qquad E(|S_\nu|) \approx N_\nu\, E|Z + Y|.
\]
Whereas for m-CBMs the covariance $E(\chi_\nu \chi_{\nu'})$ equals 1 if and only if $\mu(Z = 0) = 0$, the situation is more complicated for a-CBMs. We recall Proposition 10, which states that $A$ is positive definite if and only if for all $\lambda, \kappa \in \{1, \ldots, M\}$, $\lambda \neq \kappa$, $P(Z_\lambda Z_\kappa \leq 0) > 0$ holds. For a-CBMs this is equivalent to $P(|Y| > |Z|) > 0$. This condition can be interpreted in terms of almost sure (or statewise) stochastic dominance. It says that $|Z|$ does not almost surely dominate $|Y|$. Put differently, the group modifiers should be able to override the central bias on some of the issues. It seems sensible to assume that the central bias is not as extreme as the group modifiers can potentially be, due to factors such as the diversity of the overall population versus the possibly more homogeneous group populations.

In the last section we could solve the m-CBM for all possible bias distributions and group sizes and show that there are only two possibilities: if an m-CBM belongs to the first category, then all optimal weights are equal. If it belongs to the second category, then all sets of weights are optimal. At least this second part holds for a-CBMs, too.

Proposition 30.
All a-CBMs belong to either the first or the second category. An a-CBM belongs to the first category if and only if $a = E(\chi_1 \chi_2) < 1$.

Proof. We have $a = E[\operatorname{sgn}(Z + Y_1)\operatorname{sgn}(Z + Y_2)] = 1$ if and only if $\operatorname{sgn}(Z + Y_1) = \operatorname{sgn}(Z + Y_2)$ almost surely. Due to the symmetry of the measures $\mu$ and $\rho$, this last equality is equivalent to $P(|Y| > |Z|) = 0$. So $a < 1$ is equivalent to $P$ being in the first category. We only need to show that if $a = 1$, then $b_\kappa = b_\lambda$ for all $\kappa, \lambda$ holds as well. We have
\[
b_\lambda = \frac{1}{\sigma_N} E_N(S\chi_\lambda) = \frac{\sum_{\nu=1}^{M} E_N(S_\nu \chi_\lambda)}{\sqrt{\sum_\kappa \sum_\nu E(S_\kappa S_\nu)}} = \frac{E_N(|S_\lambda|) + \sum_{\nu\neq\lambda} E_N(S_\nu \chi_\lambda)}{\sqrt{\sum_\kappa E(S_\kappa^2) + \sum_\kappa \sum_{\nu\neq\kappa} E(S_\kappa S_\nu)}}
\]
\[
= \frac{N_\lambda\, E|Z+Y| + \sum_{\nu\neq\lambda} N_\nu\, E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]}{\sqrt{\sum_\kappa N_\kappa^2 \bigl(E(Z^2) + E(Y^2)\bigr) + \sum_\kappa \sum_{\nu\neq\kappa} N_\kappa N_\nu\, E(Z^2)}} \approx \frac{\alpha_\lambda\, E|Z+Y| + E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr] \sum_{\nu\neq\lambda} \alpha_\nu}{\sqrt{\bigl(E(Z^2) + E(Y^2)\bigr) \sum_\kappa \alpha_\kappa^2 + E(Z^2) \sum_\kappa \sum_{\nu\neq\kappa} \alpha_\kappa \alpha_\nu}}
\]
\[
= \frac{\alpha_\lambda\,\Bigl(E|Z+Y| - E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]\Bigr) + E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]}{\sqrt{E(Z^2) + E(Y^2)\sum_\kappa \alpha_\kappa^2}}.
\]
As we see by this calculation, the entry $b_\lambda$ depends on $\lambda$ if and only if $E|Z+Y| > E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]$, which is itself equivalent to $P(|Y| > |Z|) > 0$. Hence, if $P$ belongs to the second category, then any set of weights is optimal for large $N$.

We turn to the a-CBMs in the first category. As mentioned, if $|Z|$ almost surely dominates $|Y|$, then the a-CBM belongs to the second category. If the relation between $|Z|$ and $|Y|$ is inverted and $|Y|$ almost surely dominates $|Z|$, then the model belongs to the first category. What is more, voters belonging to different groups are independent. This is a consequence of the group modifier overriding the central bias. We have $\operatorname{sgn}(Z + Y_\lambda) = \operatorname{sgn}(Y_\lambda)$ almost surely. However, voters within each group are still correlated. In order to calculate the optimal weights, we have to determine the following quantities:
\[
a = E(\chi_1\chi_2) = 0, \qquad m = E|Z_1| = E|Z+Y| > 0, \qquad r = E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr] = 0.
\]
We note that $r - am = 0$ and $m - r = m$. Substituting these expressions into (6), we obtain
\[
w_\kappa = \frac{D}{d}\, m\, \alpha_\kappa.
\]
As we see, if the group modifier overrides the central bias, the optimal weights are proportional to the group sizes.

Next we postulate a model in which some range of biases is equally likely and the group modifiers tend to be more diverse than the central bias. Let $\mu$ be the uniform distribution on some interval $[-\beta, \beta]$ and $\rho$ the uniform distribution on $[-\gamma, \gamma]$ such that $0 < \beta \leq \gamma \leq 1/2$. This model belongs to the first category, since $a \leq 1/3$.
Now we calculate
\[
a = \frac{\beta^2}{3\gamma^2}, \qquad m = \frac{\beta^2 + 3\gamma^2}{6\gamma}, \qquad r = \frac{\beta^2}{3\gamma}.
\]
Hence,
\[
r - am = \frac{\beta^2(3\gamma^2 - \beta^2)}{18\gamma^3} > 0, \qquad m - r = \frac{3\gamma^2 - \beta^2}{6\gamma} > 0,
\]
and
\[
w_\kappa = \frac{D}{d}\left[\frac{\beta^2(3\gamma^2-\beta^2)}{18\gamma^3} + \left(1 + (M-1)\frac{\beta^2}{3\gamma^2}\right)\frac{3\gamma^2-\beta^2}{6\gamma}\,\alpha_\kappa\right] = \frac{D(3\gamma^2-\beta^2)}{18\gamma^3 d}\Bigl[\beta^2 + \bigl(3\gamma^2 + (M-1)\beta^2\bigr)\alpha_\kappa\Bigr],
\]
and it is the sum of a constant term which is the same for all groups regardless of size and a summand which is proportional to the group's size $\alpha_\kappa$. If we divide all weights by the common factor $\frac{D(3\gamma^2-\beta^2)}{18\gamma^3 d} > 0$, we do not alter the voting system. Then the constant term is $\beta^2$ and the proportional term is $\bigl(3\gamma^2 + (M-1)\beta^2\bigr)\alpha_\kappa$. Hence for small $\beta$ in relation to $\gamma$, the constant term becomes negligible and the optimal weights are close to proportional to the group sizes. That is the case when the central bias tends to be small in relation to the group modifiers. Here small groups will receive little voting weight in the council.

On the other hand, the optimal weight even a very small group receives has the lower bound $\beta^2$. The sum of all weights is at most $2(M+1)\gamma^2$. So even very small groups with $\alpha_\kappa$ close to 0 receive a fraction of at least
\[
\frac{\beta^2}{2(M+1)\gamma^2}
\]
if we normalise the sum of weights to 1. If the central bias has the same distribution as the group modifiers, i.e. $\beta = \gamma$, then even very small groups receive a fraction of at least $1/(2(M+1))$ of the total weight.

Next we investigate the complementary case where the group modifiers tend to be less diverse than the central bias: $0 < \gamma \leq \beta \leq 1/2$. Then
\[
a = 1 - \frac{2\gamma}{3\beta}, \qquad m = \frac{3\beta^2 + \gamma^2}{6\beta}, \qquad r = \frac{3\beta^2 - \gamma^2}{6\beta}.
\]
We also calculate
\[
r - am = \frac{\gamma}{9}\left(\frac{\gamma^2}{\beta^2} - \frac{3\gamma}{\beta} + 3\right) > 0, \qquad m - r = \frac{\gamma^2}{3\beta} > 0,
\]
and
\[
w_\kappa = \frac{D\gamma}{9d}\left[\left(\frac{\gamma^2}{\beta^2} - \frac{3\gamma}{\beta} + 3\right) + 3\,\frac{\gamma}{\beta}\left(M - \frac{2(M-1)\gamma}{3\beta}\right)\alpha_\kappa\right].
\]
We normalise the weights to 1 as before. A small value of $\gamma$ implies that the group modifiers tend to be small and the group bias is mostly due to the central bias. As we can see from the above formula, for $\gamma$ close to 0, the weight a small group ($\alpha_\kappa$ close to 0) receives is
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{3}{3M} = \frac{1}{M}.
\]
On the other hand, if $\gamma = \beta$, then we already know from our previous analysis that the relative weight of a small group is given by $1/(2(M+1))$. Hence a small group fares better, as far as its weight is concerned, in a situation where the central bias is much stronger than the group modifiers, compared to a situation where both have the same distribution.
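The two uniform regimes can be compared numerically. The short sketch below evaluates the closed-form expressions for $a$, $m$, $r$ derived above for uniform $\mu$ on $[-\beta,\beta]$ and $\rho$ on $[-\gamma,\gamma]$ in both parameter regimes and reports the limiting relative weight (7) of a vanishingly small group; the particular parameter values are illustrative assumptions.

    def uniform_acbm_small_group_share(beta, gamma, M):
        # Closed-form a, m, r for uniform central bias on [-beta, beta] and uniform
        # group modifiers on [-gamma, gamma], in the two regimes discussed above.
        if beta <= gamma:
            a = beta**2 / (3 * gamma**2)
            m = (beta**2 + 3 * gamma**2) / (6 * gamma)
            r = beta**2 / (3 * gamma)
        else:
            a = 1 - 2 * gamma / (3 * beta)
            m = (3 * beta**2 + gamma**2) / (6 * beta)
            r = (3 * beta**2 - gamma**2) / (6 * beta)
        # Relative weight (7) of a group with alpha_kappa -> 0.
        return (r - a * m) / ((1 - a) * ((M - 1) * r + m))

    M = 5
    print(uniform_acbm_small_group_share(beta=0.5, gamma=0.5, M=M))   # -> 1/(2(M+1))
    print(uniform_acbm_small_group_share(beta=0.5, gamma=0.01, M=M))  # -> close to 1/M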
Global Collective Bias Model

In this section, we shall consider the case where the central bias is the only influence on voting behaviour. This might be the case in a random partition of a population which is not based on any cultural differences. In the general framework, the conditional distributions $\rho_\zeta = \delta_\zeta$ given $Z = \zeta$ yield the desired setup. This special case also arises in an m-CBM if we let $Y \sim \delta_1$. Hence, all results applicable to m-CBMs hold for a global CBM $P$ as well. $P$ belongs to the first category if and only if $\mu\{0\} > 0$. As previously, we will not consider the case $\mu = \delta_0$ where all voters are independent. Therefore, any set of weights is optimal if and only if $\mu\{0\} = 0$, and equal weights for all groups are optimal if $\mu\{0\} > 0$.

One obvious extension to the general CBM framework is to allow different conditional distributions $\rho_{\zeta,\lambda}$ for each group to account for more strongly or more weakly correlated groups.

Another interesting question is under what conditions the positivity of all optimal weights can be guaranteed. Obviously, a negative optimal weight implies that the theoretical minimum of the democracy deficit cannot be achieved, due to the incentive a group with a negative voting weight faces to misrepresent its preferences.

For m-CBMs we managed to solve the problem of optimal weights according to the limiting distribution. These weights are either arbitrary (second category) or equal (first category). So for m-CBMs the optimal weights can always be chosen to be positive. This is not the case for a-CBMs and potentially other voting measures.

In Section 5, we studied an a-CBM with $\mu$ being the uniform distribution on some interval $[-\beta, \beta]$ and $\rho$ the uniform distribution on $[-\gamma, \gamma]$ such that $0 < \beta, \gamma \leq 1/2$. For this model we determined and analysed the properties of the optimal weights, which are always positive. So if the bias variables are uniformly distributed on centred intervals, there cannot be negative weights. To illustrate that this can occur with other distributions $\mu$ and $\rho$, consider the following case: let $A_1, A_2, B$ be measurable sets with the symmetry property $A_i = -A_i$, $B = -B$, and assume they are ordered such that $|a_1| < |b| < |a_2|$ holds for all $a_i \in A_i$, $i = 1, 2$, and $b \in B$. Let $\mu$ and $\rho$ be symmetric measures with $\operatorname{supp}\mu \subset B$ and $\operatorname{supp}\rho \subset A_1 \cup A_2$. We claim that $r - am < 0$ is possible. A simple example is given by $0 < a_1 < b < a_2$ with $\mu = \frac{1}{2}(\delta_{-b} + \delta_b)$ and $\rho = \frac{1}{4}(\delta_{-a_2} + \delta_{-a_1} + \delta_{a_1} + \delta_{a_2})$. However, the claim holds for arbitrary measures $\mu$ and $\rho$ with the properties described above. For the example, we have
\[
a = \frac{1}{4}, \qquad m = \frac{1}{2}E|Z| + E\bigl(|Y|\,\mathbb{1}_{A_2}\bigr) = \frac{b}{2} + \frac{a_2}{2}, \qquad r = \frac{1}{2}E|Z| = \frac{b}{2}.
\]
In the second expression above, $\mathbb{1}_{A_2}$ stands for the indicator function of the set $A_2$. The inequality $r - am \geq 0$ is equivalent to
\[
\frac{b}{2} \geq \frac{1}{4}\left(\frac{b}{2} + \frac{a_2}{2}\right) \iff 3b \geq a_2.
\]
Since $a_2 > b$, in this model $r - am$ can be positive, 0, or negative, depending on the ratio of $b$ and $a_2$. If $3b < a_2$, then $r - am$ is negative. Given that the optimal weight of each group is
\[
w_\kappa = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr),
\]
we see that for $\alpha_\kappa$ small enough $w_\kappa$ will be negative.

Appendix
Proof of Theorem 18
We prove the statement of Theorem 18 by the method of moments.
Definition 31.
Let $X$ be an $m$-dimensional real random vector, distributed according to $P_X$, and let $K = (K_1, \ldots, K_m) \in \mathbb{N}^m$. Then we define the absolute moment of order $K$ of $X$:
\[
\overline{m}_K(P_X) := \int_{\mathbb{R}^m} \left| x_1^{K_1} \cdots x_m^{K_m} \right| dP_X.
\]
If this expression is finite, then we define the $K$-th moment of $X$:
\[
m_K(P_X) := \int_{\mathbb{R}^m} x_1^{K_1} \cdots x_m^{K_m}\, dP_X.
\]
We will also write $m_K(X)$ instead of $m_K(P_X)$.

It is well known (see e.g. [6]) that for a sequence of random vectors $(X_n)_{n\in\mathbb{N}}$, each of dimension $m$, convergence in distribution is implied by the convergence of moments of all orders $K \in \mathbb{N}^m$, under some conditions on the growth of the moments as the components of $K$ go to infinity. Here we only need convergence in distribution for bounded random vectors, so the convergence of the moments $m_K(P_{X_n})$ to the corresponding moments of a fixed distribution $m_K(P_X)$ implies the convergence in distribution $X_n \Longrightarrow X$ as $n \to \infty$.

To apply the method of moments, we need to show the convergence of expectations of the shape
\[
E\left[\left(\frac{S_1}{N_1}\right)^{K_1} \cdots \left(\frac{S_M}{N_M}\right)^{K_M}\right]. \tag{8}
\]
This expectation we can express as a sum of correlations
\[
E\left[X_{11} \cdots X_{1 k_1} \cdots X_{M1} \cdots X_{M k_M}\right], \tag{9}
\]
where $k_\lambda \in \{0, 1, \ldots, K_\lambda\}$ for each $\lambda$. It suffices to consider the first $k_\lambda$ votes from each group instead of arbitrary $k_\lambda$ votes from that group, because the random variables belonging to the same group are exchangeable, so in the expectation above only the number of different variables from each group is relevant, not their identities. To deal with the task of expressing the moments (8) in terms of correlations as in (9), we need to introduce some combinatorial concepts. Let $|A|$ stand for the cardinality of the set $A$.

Definition 32.
We define a multiindex $i = (i_1, i_2, \ldots, i_L) \in \{1, 2, \ldots, N\}^L$.

1. For $j \in \{1, 2, \ldots, N\}$ we set $\nu_j(i) := |\{k \in \{1, 2, \ldots, L\} \mid i_k = j\}|$.

2. For $\ell = 0, 1, \ldots, L$ we define $\rho_\ell(i) := |\{j \mid \nu_j(i) = \ell\}|$ and $\rho(i) := (\rho_1(i), \ldots, \rho_L(i))$.

The expression $\nu_j(i)$ represents the multiplicity of each index $j \in \{1, 2, \ldots, N\}$ in the multiindex $i$, and $\rho_\ell(i)$ represents the number of indices in $i$ that occur exactly $\ell$ times. We shall call $\rho(i)$ the profile of the multiindex $i$.

Lemma 33. For all $i = (i_1, i_2, \ldots, i_L) \in \{1, 2, \ldots, N\}^L$ we have $\sum_{\ell=1}^{L} \ell\, \rho_\ell(i) = L$.

We use this basic property of profiles to define

Definition 34. Let $r = (r_1, \ldots, r_L)$ be such that $\sum_{\ell=1}^{L} \ell\, r_\ell = L$ holds. We call $r$ a profile vector. We define
\[
w_L(r) := \left| \{ i \in \{1, \ldots, N\}^L \mid \rho(i) = r \} \right|
\]
to represent the number of multiindices $i$ that have a given profile vector $r$.

We now define the set of all profile vectors for a given $L \in \mathbb{N}$.

Definition 35. Let $\Pi^{(L)} = \{ r \in \{0, 1, \ldots, L\}^L \mid \sum_{\ell=1}^{L} \ell\, r_\ell = L \}$. Some important subsets of $\Pi^{(L)}$ are $\Pi^{(L)}_k = \{ r \in \Pi^{(L)} \mid r_1 = k \}$, $\Pi^{2(L)} = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \geq 3 \}$ and $\Pi^{+(L)} = \{ r \in \Pi^{(L)} \mid r_\ell > 0 \text{ for some } \ell \geq 2 \}$. We can also combine superscripts and subscripts. Then we have, e.g., $\Pi^{2(L)}_0 = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \neq 2 \}$.

We shall write, for any $i \in \{1, 2, \ldots, N\}^L$, $X_i = X_{i_1} \cdots X_{i_L}$. For any $r \in \Pi^{(L)}$ let $j \in \{1, 2, \ldots, N\}^L$ be such that $\rho(j) = r$. Then we let $X_r$ stand for $X_j$. This definition is not problematic if we are only interested in the expectation $E(X_r) = E(X_j)$, and the random variables $X_1, \ldots, X_N$ are exchangeable. If there are $M$ sets $\{1, 2, \ldots, N_\nu\}^{L_\nu}$, and for each $\nu$, $i^\nu \in \{1, 2, \ldots, N_\nu\}^{L_\nu}$, then we set $i := (i^\nu)_\nu$ and write $X_i$ for $X_{i^1} \cdots X_{i^M}$. Similarly, if we have profile vectors $r^\nu \in \Pi^{(L_\nu)}$ and $j^\nu \in \{1, \ldots, N_\nu\}^{L_\nu}$ such that $\rho(j^\nu) = r^\nu$, then we write $X_r$ for $X_{j^1} \cdots X_{j^M}$.
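These counting objects are easy to check by brute force for small $N$ and $L$. The following Python sketch computes the profile $\rho(i)$ of a multiindex, verifies the identity of Lemma 33, and tabulates $w_L(r)$ by enumerating all multiindices; it is meant only as a sanity check of the definitions, not as part of the proof.

    from collections import Counter
    from itertools import product

    def profile(i, L):
        # nu_j(i): multiplicity of each index j; rho_ell(i): number of indices
        # occurring exactly ell times, for ell = 1, ..., L.
        nu = Counter(i)
        rho = tuple(sum(1 for j in nu if nu[j] == ell) for ell in range(1, L + 1))
        assert sum(ell * r for ell, r in enumerate(rho, start=1)) == L  # Lemma 33
        return rho

    def w_L_by_enumeration(N, L):
        # w_L(r): number of multiindices in {1,...,N}^L with profile r (Definition 34).
        counts = Counter()
        for i in product(range(1, N + 1), repeat=L):
            counts[profile(i, L)] += 1
        return counts

    print(w_L_by_enumeration(N=4, L=3))
    # e.g. the profile (3, 0, 0) counts the multiindices with three distinct entries: 4*3*2 = 24.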
Proposition 36. For $r \in \Pi^{(L)}$ set $r_0 := N - \sum_{\ell=1}^{L} r_\ell$. Then
\[
w_L(r) = \frac{N!}{r_0!\, r_1! \cdots r_L!} \cdot \frac{L!}{1!^{r_1}\, 2!^{r_2} \cdots L!^{r_L}}.
\]
If we let $N$ go to infinity, then we have
\[
w_L(r) \approx \frac{N^{\sum_{\ell=1}^{L} r_\ell}}{r_1!\, r_2! \cdots r_L!} \cdot \frac{L!}{1!^{r_1}\, 2!^{r_2} \cdots L!^{r_L}}.
\]
This result is based on Theorem 3.14 and Corollary 3.18 in [4].

Now we have the necessary concepts to prove Theorem 18. Let $K = (K_1, \ldots, K_M) \in \mathbb{N}^M$ and $K := \sum_{\nu=1}^{M} K_\nu$. We need to show that the moments $m^N_K := E_N\bigl[\bigl(\tfrac{S_1}{N_1}\bigr)^{K_1} \cdots \bigl(\tfrac{S_M}{N_M}\bigr)^{K_M}\bigr]$ converge to $m_K := m_K(Z_1, \ldots, Z_M)$. In the first sum below, for each $\lambda \in \{1, \ldots, M\}$, $i^\lambda \in \{1, 2, \ldots, N_\lambda\}^{K_\lambda}$; in the second sum, $r^\lambda \in \Pi^{(K_\lambda)}$.
\[
m_K\left(\frac{S_1}{N_1}, \ldots, \frac{S_M}{N_M}\right) = E\left(\frac{S_1^{K_1}}{N_1^{K_1}} \cdots \frac{S_M^{K_M}}{N_M^{K_M}}\right) = \frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \sum_{i^1,\ldots,i^M} E\bigl(X_{i^1} \cdots X_{i^M}\bigr) = \frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \sum_{r^1,\ldots,r^M} \prod_{\lambda=1}^{M} w_{K_\lambda}(r^\lambda)\, E\bigl(X_{r^1} \cdots X_{r^M}\bigr). \tag{10}
\]
We need to know $E(X_{r^1} \cdots X_{r^M})$. Let $k_\lambda \in \{0, 1, \ldots, K_\lambda\}$ for each $\lambda$. Then
\[
E[X_{11} \cdots X_{1k_1} \cdots X_{M1} \cdots X_{Mk_M}] = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} E^{(\zeta_1,\ldots,\zeta_M)}[X_{11} \cdots X_{1k_1} \cdots X_{M1} \cdots X_{Mk_M}]\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu
\]
\[
= \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda E^{\zeta_\lambda}[X_{\lambda 1} \cdots X_{\lambda k_\lambda}]\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{k_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu, \tag{11}
\]
where we used the conditional independence of all $X_{\lambda i}$ given $Z_\lambda = \zeta_\lambda$ and the identical conditional distribution within each group with $E^{\zeta_\lambda}[X_{\lambda i}] = \zeta_\lambda$. In particular, this correlation is independent of $N$ and therefore of each $N_\nu$. According to Definition 6, the moment $m_K$ is equal to
\[
m_K = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{K_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu.
\]
Proposition 36 says that for each $\lambda$,
\[
w_{K_\lambda}(r^\lambda) \approx \frac{N_\lambda^{\sum_{\ell=1}^{K_\lambda} r^\lambda_\ell}}{r^\lambda_1! \cdots r^\lambda_{K_\lambda}!} \cdot \frac{K_\lambda!}{1!^{r^\lambda_1} \cdots K_\lambda!^{r^\lambda_{K_\lambda}}} \leq N_\lambda^{K_\lambda} \cdot \frac{K_\lambda!}{K_\lambda!\, 1!^{K_\lambda}} = N_\lambda^{K_\lambda},
\]
where the inequality holds for large enough $N$ and hence $N_\lambda$. We conclude that, asymptotically, only a single summand of the sum (10) contributes to the moment $m_K\bigl(\tfrac{S_1}{N_1}, \ldots, \tfrac{S_M}{N_M}\bigr)$: the one where each $r^\lambda = (K_\lambda, 0, \ldots, 0)$. It equals
\[
\frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \prod_{\lambda=1}^{M} N_\lambda^{K_\lambda}\, E\bigl(X_{(K_1,0,\ldots,0)} \cdots X_{(K_M,0,\ldots,0)}\bigr) = E\bigl(X_{(K_1,0,\ldots,0)} \cdots X_{(K_M,0,\ldots,0)}\bigr) = E\bigl(X_{j^1} \cdots X_{j^M}\bigr), \tag{12}
\]
where $\rho(j^\lambda) = (K_\lambda, 0, \ldots, 0)$ for each $\lambda = 1, \ldots, M$. By (11), this correlation is equal to
\[
\int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{K_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu,
\]
which is also equal to the moment $m_K$. This concludes the proof of Theorem 18.

References