Collective Bias Models in Two-Tier Voting Systems and the Democracy Deficit
Werner Kirsch (FernUniversität in Hagen, Germany) and Gabor Toth (FernUniversität in Hagen, Germany)

Abstract
In this article, we analyse a two-tier voting system in which the overall population is split into several groups, and we study the democracy deficit, i.e. the expected quadratic deviation of the council vote from a referendum involving the entire population. The proposals voted on fall into the category of yes/no voting. The voting behaviour is described by a probabilistic model in which a central influence, or bias, affects the voters' decisions. We study different versions of the model, in which the correlation between voters belonging to different groups is allowed to be weaker than the correlation within groups. The two main questions we analyse are the asymptotic behaviour of the (normalised) democracy deficit and the optimal voting weights the groups receive in the council vote when these weights are chosen to minimise the democracy deficit.
Keywords. Two-tier voting systems, probabilistic voting, collective bias models, democracy deficit, optimal weights, limit theorem.

2020 Mathematics Subject Classification. 91B12, 91B14, 60F05.
We study voting in two-tier voting systems and the determination of the optimal voting weights. Suppose there is a population subdivided into $M \in \mathbb{N}$ groups, each of which sends a representative to a council; in binary votes, each representative casts a vote according to the decision reached by majority rule in their respective group. Since the groups may be of different sizes, it is natural to assign different voting weights to each one. How should the weights be determined? One objective studied in the literature is to minimise the democracy deficit (see e.g. [3, 7]), i.e. the expected quadratic deviation of the council vote from a hypothetical referendum across the entire population.

Suppose the overall population is of size $N$, whereas each group has size $N_\lambda$, where the subindex $\lambda$ stands for the group, $\lambda \in \{1, \ldots, M\}$. Let the two possible votes be $\pm 1$. We can interpret $+1$ as 'aye' and $-1$ as 'nay'. The vote cast by voter $i$ in group $\lambda$ is denoted by $X_{\lambda i}$, where the subindex $\lambda$ stands once again for the group and $i \in \{1, \ldots, N_\lambda\}$ for the voter. We assume that voting behaviour is described by a probability measure.

Definition 1.
A voting measure $P$ is a probability measure on the space of voting configurations $\{-1,1\}^N$ with the symmetry property
\[
P(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) = P(X_{11}=-x_{11},\ldots,X_{M N_M}=-x_{M N_M}) \tag{1}
\]
for all voting configurations $(x_{11},\ldots,x_{M N_M}) \in \{-1,1\}^N$. Let $E$ be the expectation with respect to $P$.

While the votes cast are assumed to be deterministic, obeying the voters' preferences, which we do not model explicitly, the proposal put before them is assumed to be randomly selected. Since each yes/no question can be posed in two opposite ways, one to which a given voter would respond 'aye' and one to which she would respond 'nay', it is reasonable to assume that each voter votes 'aye' with the same probability she votes 'nay'. We also assume that there is a sufficient range of proposals to elicit all $2^N$ possible responses from the voting population.

Each group votes on any given issue. Since we assume the majority rule, the sums of the votes within each group are of key importance:

Definition 2.
For each group $\lambda$, we define the voting margin $S_\lambda := \sum_{i=1}^{N_\lambda} X_{\lambda i}$. The overall voting margin is $S := \sum_{\lambda=1}^{M} S_\lambda$.

Each group casts a vote in the council:

Definition 3.
The council vote of group $\lambda$ is given by
\[
\chi_\lambda := \begin{cases} 1, & \text{if } S_\lambda > 0, \\ -1, & \text{otherwise.} \end{cases}
\]
It is clear that each $\chi_\lambda$ is a random variable defined on the same probability space as the voting measure. Now we can define the democracy deficit: the democracy deficit given a voting measure $P$ and a set of weights $w_1, \ldots, w_M$ is defined by
\[
\Delta := E\left(\frac{S}{\sigma} - \sum_{\lambda=1}^{M} w_\lambda \chi_\lambda\right)^2,
\]
where $\sigma$ is a normalising constant. Note that the democracy deficit depends on the voting measure. $\sum_{\lambda=1}^{M} w_\lambda \chi_\lambda$ is the council voting margin. The council vote is in favour of a proposal if $\sum_{\lambda=1}^{M} w_\lambda \chi_\lambda > 0$, i.e. if the groups voting 'aye' hold more than half of the total voting weight.
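As a concrete illustration of this definition, the following Python sketch estimates the democracy deficit by Monte Carlo for a given voting measure and set of weights. The sampler used here (independent fair voters) is only a stand-in assumption to make the example self-contained; any sampler producing vote configurations according to a voting measure could be plugged in, and the normalising constant sigma is passed in explicitly.

    import numpy as np

    def sample_votes_independent(group_sizes, rng):
        # Stand-in voting measure: every voter votes +1 or -1 independently
        # with probability 1/2 (an assumption for illustration only).
        return [rng.choice([-1, 1], size=n) for n in group_sizes]

    def democracy_deficit(sampler, group_sizes, weights, sigma, n_issues=10_000, seed=0):
        # Monte Carlo estimate of Delta = E[(S/sigma - sum_l w_l chi_l)^2].
        rng = np.random.default_rng(seed)
        deviations = np.empty(n_issues)
        for t in range(n_issues):
            votes = sampler(group_sizes, rng)
            S_lambda = np.array([v.sum() for v in votes])   # group voting margins
            S = S_lambda.sum()                              # overall voting margin
            chi = np.where(S_lambda > 0, 1, -1)             # council votes
            deviations[t] = (S / sigma - np.dot(weights, chi)) ** 2
        return deviations.mean()

    # Illustrative usage with three groups and equal weights (assumed values):
    group_sizes = [100, 200, 300]
    sigma = np.sqrt(sum(group_sizes))   # sqrt(E(S^2)) under independent voting
    weights = np.ones(3)
    print(democracy_deficit(sample_votes_independent, group_sizes, weights, sigma))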
Remark 4. We note here that, for a fixed relative quota of $1/2$, a voting system with a set of weights $w_1, \ldots, w_M$ is equivalent to a voting system with the same quota and weights $w'_1, \ldots, w'_M$ whenever there is a constant $s > 0$ such that $w'_\lambda = s\, w_\lambda$ for all $\lambda = 1, \ldots, M$.

We will be concerned with matters related to the asymptotic voting behaviour and the democracy deficit. We will define a class of voting measures called collective bias models and analyse the large $N$ limit of the model in the sense of convergence theorems of the type
\[
\left(\frac{S_1}{N_1^{\gamma_1}}, \ldots, \frac{S_M}{N_M^{\gamma_M}}\right) \underset{N\to\infty}{\Longrightarrow} \eta,
\]
where $\underset{N\to\infty}{\Longrightarrow}$ stands for convergence in distribution, each $\gamma_\lambda > 0$, and $\eta$ is some probability measure on $\mathbb{R}^M$. In the models analysed in this article, all moments of the types
\[
E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}}\right), \quad E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \frac{S_\mu}{N_\mu^{\gamma_\mu}}\right), \quad E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \chi_\mu\right) = E\left(\frac{S_\lambda}{N_\lambda^{\gamma_\lambda}} \operatorname{sgn}(S_\mu)\right)
\]
for all $\lambda, \mu = 1, \ldots, M$ converge to the corresponding moments of $\eta$. If $(P_N)$ is a sequence of voting measures such that $(S_1/N_1^{\gamma_1}, \ldots, S_M/N_M^{\gamma_M})$ converges in distribution, and the above mentioned moments also converge, we will also write $P_N \to \eta$ and say '$P_N$ converges to a limiting distribution'.

The general framework of a collective bias model is given by a set of bias random variables that represent some cultural or political influence that acts on all voters. There is a central bias variable $Z$ with distribution $\mu$ which induces correlation between voters of different groups. Furthermore, there is a bias variable $Z_\lambda$ for each group. Its conditional distribution given the realisation $Z = \zeta$ is $\rho_\zeta$. The group bias variable $Z_\lambda$ induces correlation between the voters belonging to that group. Every probability measure involved has support contained in $[-1,1]$. Given $Z = \zeta$ according to $\mu$ and $Z_\lambda = \zeta_\lambda$ according to $\rho_\zeta$, all voters in group $\lambda$ cast their vote independently, with a probability of voting 'aye' equal to $(1+\zeta_\lambda)/2$. Hence, a value $\zeta_\lambda = 1$ implies that all voters belonging to group $\lambda$ vote 'aye' almost surely. Similarly, $\zeta_\lambda = -1$ means they all vote 'nay' almost surely, and $\zeta_\lambda = 0$ means there is no bias and all voters in the group vote independently.

The symmetry condition (1) requires that the distribution of the random variables $Z_\lambda$ be symmetric around the origin for all $\lambda$. For simplicity's sake, we will be assuming the sufficient condition that

1. $\mu$ is symmetric,

2. for all $\zeta \in [-1,1]$, the conditional distributions $\rho_\zeta$ satisfy, for all measurable sets $A \subset [-1,1]$, the equality $\rho_\zeta(A) = \rho_{-\zeta}(-A)$.
Since we want to study correlated voting, we impose a monotonicity condition on the conditional distributions $\rho_\zeta$:

The function $\zeta \mapsto \rho_\zeta((0,1])$ is increasing.

The monotonicity condition represents our assumption that a higher central bias increases the likelihood that voters vote in favour of a proposal. It implies that there is positive correlation between the group biases in different groups.
Lemma 5.
For all groups $\kappa, \lambda$, $E(Z_\kappa Z_\lambda) \geq 0$ holds. In fact, the strict inequality $E(Z_\kappa Z_\lambda) > 0$ holds if $\mu$ is not the Dirac measure $\delta_0$ and the function $\zeta \mapsto \rho_\zeta((0,1])$ is not constant.
Definition 6.
A Collective Bias Model (CBM) $P$ is defined by setting, for each voting configuration $(x_{11}, \ldots, x_{M N_M}) \in \{-1,1\}^N$,
\[
P(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) := \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_{\lambda=1}^{M} \prod_{i=1}^{N_\lambda} \bigl((1-p)\,\delta_{-1}(\{x_{\lambda i}\}) + p\,\delta_{1}(\{x_{\lambda i}\})\bigr)\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu,
\]
where $p = (1+\zeta_\lambda)/2$. The measures $\rho_{\zeta 1}, \ldots, \rho_{\zeta M}$ are all identical to $\rho_\zeta$. We will also write
\[
P^{(\zeta_1,\ldots,\zeta_M)}(X_{11}=x_{11},\ldots,X_{M N_M}=x_{M N_M}) := \prod_{\lambda=1}^{M} \prod_{i=1}^{N_\lambda} \bigl((1-p)\,\delta_{-1}(\{x_{\lambda i}\}) + p\,\delta_{1}(\{x_{\lambda i}\})\bigr)
\]
for the conditional product measure.

For the rest of this article, we will assume that each group becomes large as the overall population goes to infinity, $\lim_{N\to\infty} N_\lambda = \infty$. Suppose that the group sizes as fractions of the overall population converge to fixed limits:
\[
\alpha_\lambda := \lim_{N\to\infty} \frac{N_\lambda}{N}, \quad \lambda = 1, \ldots, M.
\]
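The hierarchical structure of Definition 6 translates directly into a two-stage sampling procedure: first draw the central bias, then the group biases, then the individual votes. The following Python sketch implements this sampling scheme; the particular choice of distributions for $Z$ and the conditional $\rho_\zeta$ (here an additive perturbation clipped to $[-1,1]$) is only an assumption made to keep the example concrete, not a model used in the article.

    import numpy as np

    def sample_cbm_issue(group_sizes, rng):
        # Stage 1: central bias Z, symmetric on [-1, 1] (assumed uniform here).
        z = rng.uniform(-1.0, 1.0)
        votes = []
        for n_lambda in group_sizes:
            # Stage 2: group bias Z_lambda ~ rho_zeta, here an additive
            # perturbation of Z clipped to [-1, 1] (illustrative assumption).
            zeta_lambda = np.clip(z + rng.uniform(-0.25, 0.25), -1.0, 1.0)
            # Stage 3: conditionally independent votes, P(X = +1) = (1 + zeta)/2.
            p_aye = (1.0 + zeta_lambda) / 2.0
            x = np.where(rng.random(n_lambda) < p_aye, 1, -1)
            votes.append(x)
        return votes

    rng = np.random.default_rng(1)
    group_sizes = [50, 150, 300]
    margins = np.array([v.sum() for v in sample_cbm_issue(group_sizes, rng)])
    print("group margins S_lambda:", margins, "council votes:", np.where(margins > 0, 1, -1))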
We want to set the weights so that the democracy deficit is minimal. By taking partial derivatives of $\Delta$, we obtain the linear equation system that characterises the optimal weights:
\[
\bigl(E_N(\chi_\lambda \chi_\mu)\bigr)_{\lambda,\mu=1,\ldots,M}\, \bigl(w_\nu\bigr)_{\nu=1,\ldots,M} = \frac{1}{\sigma_N} \bigl(E_N(\chi_\lambda S)\bigr)_{\lambda=1,\ldots,M}. \tag{2}
\]
We will refer to the coefficient matrix above as $A_N := (a_{\lambda\mu})_{\lambda,\mu=1,\ldots,M} := (E_N(\chi_\lambda \chi_\mu))_{\lambda,\mu=1,\ldots,M}$, write $w := (w_\nu)_{\nu=1,\ldots,M}$, and denote the vector on the right hand side by
\[
b := (b_\lambda)_{\lambda=1,\ldots,M} := \frac{1}{\sigma_N} \bigl(E_N(\chi_\lambda S)\bigr)_{\lambda=1,\ldots,M}. \tag{3}
\]
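For finite populations, the system (2) can be set up numerically by estimating the entries of $A_N$ and $b$ from simulated issues and then solving for the weights. The sketch below does this with numpy; the voting-measure sampler is left abstract (any function returning one vote configuration per call, such as the CBM sampler sketched above, can be used), and the Monte Carlo sample size is an assumption.

    import numpy as np

    def optimal_weights(sampler, group_sizes, n_issues=20_000, seed=0):
        # Estimates A_N = E(chi_l chi_m), sigma_N = sqrt(E(S^2)), b = E(chi_l S)/sigma_N
        # from simulated issues and solves the linear system (2) for the weights.
        rng = np.random.default_rng(seed)
        M = len(group_sizes)
        A = np.zeros((M, M))
        ES2 = 0.0
        EchiS = np.zeros(M)
        for _ in range(n_issues):
            votes = sampler(group_sizes, rng)
            S_lambda = np.array([v.sum() for v in votes])
            chi = np.where(S_lambda > 0, 1, -1)
            S = S_lambda.sum()
            A += np.outer(chi, chi)
            ES2 += S ** 2
            EchiS += chi * S
        A /= n_issues
        sigma = np.sqrt(ES2 / n_issues)
        b = EchiS / n_issues / sigma
        return np.linalg.solve(A, b)   # requires A to be invertible (Proposition 7)

    # e.g. optimal_weights(sample_cbm_issue, [200, 300, 500]) with the sampler above.

In practice one would check the conditioning of the estimated matrix before solving, since near the second category discussed below it approaches the singular all-ones matrix.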
The matrix $A_N$ is the covariance matrix of the random vector $(\chi_1, \ldots, \chi_M)$. It is invertible under very mild conditions. If no linear combination of the $\chi_\lambda$ is almost surely constant, i.e. if there are no constants $\alpha \in \mathbb{R}^M$, $\alpha \neq 0$, and $\beta \in \mathbb{R}$ such that $\sum_{\lambda=1}^{M} \alpha_\lambda \chi_\lambda = \beta$ a.s., then we say $\chi_1, \ldots, \chi_M$ are stochastically linearly independent. Then the following proposition holds:

Proposition 7. The covariance matrix $A_N$ is positive definite if and only if $\chi_1, \ldots, \chi_M$ are stochastically linearly independent.

The proof of this statement is straightforward (see Theorem 1.4 in [9]). A sufficient condition for stochastic linear independence is
Lemma 8.
If for all $(x_1, \ldots, x_M) \in \{-1,1\}^M$ we have $P(\operatorname{sgn}(S_\lambda) = x_\lambda,\ \lambda = 1, \ldots, M) > 0$, then $\chi_1, \ldots, \chi_M$ are stochastically linearly independent.

For finite populations, $A_N$ is positive definite unless the biases are overwhelming.

Proposition 9.
If for all $\lambda$ the bias $Z_\lambda$ satisfies $E|Z_\lambda| < 1$, then $A_N$ is invertible.

Proof. If $E|Z_\lambda| < 1$, then there is a closed interval $[-q_\lambda, q_\lambda] \subsetneq [-1,1]$ such that $P(Z_\lambda \in [-q_\lambda, q_\lambda]) > 0$. Due to the conditional independence of the $X_{\lambda i}$, given $(\zeta_1, \ldots, \zeta_M)$ with $\zeta_\lambda \in [-q_\lambda, q_\lambda]$, both $P^{(\zeta_1,\ldots,\zeta_M)}(S_\lambda < 0) > 0$ and $P^{(\zeta_1,\ldots,\zeta_M)}(S_\lambda > 0) > 0$. This follows from $|\zeta_\lambda| \leq q_\lambda < 1$, which gives $P^{(\zeta_1,\ldots,\zeta_M)}(X_{\lambda i} < 0) > 0$ and $P^{(\zeta_1,\ldots,\zeta_M)}(X_{\lambda i} > 0) > 0$ for all $i$. Since, conditionally on $(\zeta_1, \ldots, \zeta_M)$, the voting margins $S_\lambda$ are independent, we obtain the sufficient condition in Lemma 8 by integrating $P^{(\zeta_1,\ldots,\zeta_M)}$ with respect to $\rho_{\zeta 1}, \ldots, \rho_{\zeta M}$, and $\mu$.

Therefore, the optimal weights are uniquely determined provided the biases are not overwhelming. However, calculating the inverse matrix of $A_N$ and analysing its properties is very difficult for finite $N$. We instead turn to the limit $A := \lim_{N\to\infty} A_N$. For large populations, it is not sufficient that $E|Z_\lambda| < 1$.

Proposition 10.
The covariance matrix $A$ is positive definite if and only if for all $\lambda, \mu \in \{1, \ldots, M\}$, $\lambda \neq \mu$, $P(Z_\lambda Z_\mu \leq 0) > 0$ holds.

For a proof of this proposition see Proposition 6.2.3 in [11].

We now turn to the democracy deficit. In the definition of the democracy deficit, we choose the normalisation $\sigma_N := \sqrt{E(S^2)}$. Then $\Delta_N$ remains uniformly bounded for all $N$. This allows us to study the large $N$ behaviour of the democracy deficit. If we did not normalise, i.e. $\sigma_N = 1$, then the democracy deficit would grow without bound, even if we chose the weights to minimise it. Unless we state otherwise, we will be considering $\sigma_N = \sqrt{E(S^2)}$.

Lemma 11.
For any fixed set of weights $w_1, \ldots, w_M$, there is a constant $K > 0$ such that for all $N$ the democracy deficit is bounded above by $K$. Also, $E_N\bigl((S/\sigma_N)^2\bigr) = 1$ for every $N$.

Remark 12. The second fact is essential. If we normalised in such a way that $E_N\bigl((S/\sigma_N)^2\bigr) \to 0$ as $N \to \infty$, then minimising $\Delta$ would be equivalent to minimising the squared council voting margin, i.e. we would be selecting weights to ensure that the council is as evenly split on average as possible. That is not what we want.

Proof.
The second claim is obvious: $E_N\bigl((S/\sigma_N)^2\bigr) = E_N(S^2)/E_N(S^2) = 1$. For any fixed set of weights $w_1, \ldots, w_M$, the random variable $W := \sum_{\lambda=1}^{M} w_\lambda \chi_\lambda$ is bounded, so the absolute moments $E(|W|^k)$ of all orders $k \in \mathbb{N}$ exist. We calculate an upper bound for $\Delta$ that is independent of $N$:
\[
\Delta \leq E\left(\frac{S}{\sigma}\right)^2 + 2\, E\left(\left|\frac{S}{\sigma}\right| |W|\right) + E(W^2) \leq 1 + 2\sqrt{E\left(\frac{S}{\sigma}\right)^2}\sqrt{E(W^2)} + E(W^2) = \Bigl(1 + \sqrt{E(W^2)}\Bigr)^2 =: K.
\]

Next we prove two results concerning the behaviour of $\Delta$ for large $N$. As we will see later, for most models we consider in this article, the set of optimal weights is uniquely determined for finite $N$. The same cannot be said for the limit $N \to \infty$. We will call the set of optimal weights according to the limiting distribution $W_\infty$. Let $w^N \in \mathbb{R}^M$ be the uniquely determined optimal weights for overall population size $N$.

Proposition 13.
Assume $P_N$ converges to a limiting distribution. Then for all $v, w \in W_\infty$, we have $\lim_{N\to\infty} |\Delta_N(v) - \Delta_N(w)| = 0$.

This statement holds for all convergent sequences of voting measures with unique optimal weights $w^N$. We will state a stronger result next that relies on some properties of CBMs.

Proof. The democracy deficits for finite $N$, $\Delta_N$, and for the limiting distribution, $\Delta$, are polynomials of degree 2 in the weights $w_1, \ldots, w_M$. Due to the convergence of $P_N$, we have pointwise convergence $\Delta_N \to \Delta$. Also
\[
|\Delta_N(v) - \Delta_N(w)| \leq |\Delta_N(v) - \Delta(v)| + |\Delta(v) - \Delta(w)| + |\Delta(w) - \Delta_N(w)|.
\]
The first and the last term converge to 0 due to the pointwise convergence $\Delta_N \to \Delta$. The second term equals 0, since $v, w \in W_\infty$ and hence $\Delta(v) = \Delta(w)$.

Definition 14.
For any two functions $f$ and $g$ of a natural number $n$, we write $f(n) \approx g(n)$ to indicate that $\lim_{n\to\infty} f(n)/g(n) = 1$ holds.

Proposition 15.
Assume $P_N$ converges to a limiting distribution. Suppose also that

1. $|W_\infty| = 1$, or

2. the limiting covariance matrix of $(\chi_1, \ldots, \chi_M)$ is $A = (1)_{\lambda,\mu=1,\ldots,M}$ and for all $\lambda, \mu = 1, \ldots, M$, $E_N(S\chi_\lambda) \approx E_N(S\chi_\mu)$.

Then we have for all $w \in W_\infty$: $\lim_{N\to\infty} \left|\Delta_N(w^N) - \Delta_N(w)\right| = 0$.

Remark 16. As we will see later, all models considered in this article satisfy one of these two conditions. Either the limiting covariance matrix $A$ is invertible, and hence the optimal weights are uniquely determined, or else the second condition holds.

Definition 17. If $P_N$ converges to a limiting distribution, then we say that $P_N$ belongs to the first category if $|W_\infty| = 1$. If instead the second condition in Proposition 15 holds, then we say that $P_N$ belongs to the second category.

Proof of Proposition 15. We first prove the assertion if $|W_\infty| = 1$. For each $N$, the weight vector $w^N$ is the solution of the linear equation system (2). Furthermore, as $P_N$ converges to a limiting distribution, the coefficients in the equation system corresponding to $N$ converge to the coefficients of the limiting equation system. This implies that for $W_\infty = \{w\}$, we have $w^N \to w$ as $N \to \infty$. We calculate
\[
\left|\Delta_N(w^N) - \Delta_N(w)\right| \leq \left|\Delta_N(w^N) - \Delta(w^N)\right| + \left|\Delta(w^N) - \Delta(w)\right| + \left|\Delta(w) - \Delta_N(w)\right|.
\]
The second term converges to 0 since $\Delta$ is continuous and $w^N \to w$. The third term converges to 0 due to the pointwise convergence $\Delta_N \to \Delta$. As for the first term,
\[
\left|\Delta_N(w^N) - \Delta(w^N)\right| \leq \left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2} - \lim_{N\to\infty}\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2}\right| + 2 \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \left|\frac{E_N(S_\lambda \chi_\mu)}{\sigma_N} - \lim_{N\to\infty}\frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\right| \left|w^N_\mu\right| + \left|\Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2 - \Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2\right|.
\]
The last summand is 0. The first summand converges to 0 due to the convergence of $P_N$, whereas the second term converges to 0 because of the convergence of $P_N$ and the boundedness of the convergent sequence $w^N$.

Next we prove the assertion under the second condition. First note that, similarly to the first part of the proof, we have $d(w^N, W_\infty) \to 0$ as $N \to \infty$. Here $d(w^N, W_\infty)$ denotes the distance of the point $w^N$ from the set $W_\infty$, i.e. $d(w^N, W_\infty) := \inf_{w \in W_\infty} d(w^N, w)$. However, there may not be a single $w^* \in W_\infty$ such that $d(w^N, w^*) \to 0$. Let $w \in W_\infty$. Then
\[
\left|\Delta_N(w^N) - \Delta_N(w)\right| \leq \left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2} - \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda S_\mu)}{\sigma_N^2}\right| + 2\left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\, w^N_\mu - \sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\, w_\mu\right| + \left|\Bigl(\sum_{\lambda=1}^{M} w^N_\lambda\Bigr)^2 - \Bigl(\sum_{\lambda=1}^{M} w_\lambda\Bigr)^2\right|.
\]
The first term above is 0. The last term converges to 0 because $A = (1)_{\lambda,\mu=1,\ldots,M}$ imposes a condition on the sum of the optimal weights: even if $w^N$ does not converge to $w$, $\sum_{\lambda=1}^{M} w^N_\lambda \to \sum_{\lambda=1}^{M} w_\lambda$ must hold. The middle term is equal to
\[
2\left|\sum_{\lambda=1}^{M}\sum_{\mu=1}^{M} \frac{E_N(S_\lambda \chi_\mu)}{\sigma_N}\bigl(w^N_\mu - w_\mu\bigr)\right| = 2\left|\sum_{\mu=1}^{M} \frac{E_N(S\chi_\mu)}{\sigma_N}\bigl(w^N_\mu - w_\mu\bigr)\right| = 2\left|\frac{E_N(S\chi_1)}{\sigma_N}\right| \left|\sum_{\mu=1}^{M}\bigl(w^N_\mu - w_\mu\bigr)\right| \to 0,
\]
where we used the boundedness of the sequence $\left|E_N(S\chi_1)/\sigma_N\right|$.

To conclude this section, we calculate the optimal weights according to the limiting distribution. Since the group bias variables $Z_\lambda$ are identically distributed, the limiting covariance matrix $A$ has diagonal entries equal to 1 and identical off-diagonal entries $a := a_{\kappa\lambda} = E(\chi_\kappa \chi_\lambda)$, $\kappa \neq \lambda$. With identical off-diagonal entries, $A$ is invertible if and only if $0 \leq a < 1$. $A^{-1}$ then has entries given by
\[
(A^{-1})_{\kappa\kappa} = \frac{1 + (M-2)a}{(1-a)\bigl((M-1)a+1\bigr)} \quad (\kappa = 1, \ldots, M), \qquad (A^{-1})_{\kappa\lambda} = \frac{-a}{(1-a)\bigl((M-1)a+1\bigr)} \quad (\kappa \neq \lambda).
\]
The factor $D := \frac{1}{(1-a)((M-1)a+1)} > 0$, and the optimal weight of group $\kappa$ is given by
\[
w_\kappa = D\Bigl(\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda \neq \kappa} b_\lambda\Bigr). \tag{4}
\]
In the next section, we present a limit theorem that will allow us to calculate the entries of $b$.

If we do not normalise, i.e. $\sigma = 1$, then the entries of $b$ will in general diverge to infinity. Since $A_N$ converges to $A$, this implies the weights will tend to infinity. Hence there is no set of optimal weights according to the limiting distribution. We can get around this by calculating relative weights:
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda \neq \kappa} b_\lambda}{(1-a) \sum_{\lambda=1}^{M} b_\lambda}. \tag{5}
\]
As we will see in the next section, the absolute weights (4) and the relative weights (5) are equivalent in the sense of Remark 4. The main result for the large $N$ behaviour of these models is the following Law of Large Numbers:

Theorem 18.
Let $(P_N)$ be a sequence of collective bias measures. Then
\[
\left(\frac{S_1}{N_1}, \ldots, \frac{S_M}{N_M}\right) \Longrightarrow (Z_1, \ldots, Z_M),
\]
and the moments of $(S_1/N_1, \ldots, S_M/N_M)$ converge to the moments of $(Z_1, \ldots, Z_M)$.

For the proof, see the Appendix.

It should be mentioned that for Dirac-distributed $Z_\lambda$, we cannot use this theorem to calculate and analyse the optimal weights. We would have to normalise with a smaller power $\gamma_\lambda < 1$. If $Z_\lambda \sim \delta_0$, then all voters are independent. It is well known that the square root law applies, i.e. the optimal weights are proportional to $\sqrt{\alpha_\lambda}$. This was first studied by Penrose [8].

However, if at least one $Z_\lambda$ is not Dirac-distributed, then we can calculate the optimal weights (4) with the help of Theorem 18. We recall from (3) that the entries of the vector $b$ are given by $b_\lambda = \frac{1}{\sigma} E(\chi_\lambda S)$. We calculate
\[
\sigma = \sqrt{E(S^2)} = \sqrt{\sum_{\kappa=1}^{M}\sum_{\lambda=1}^{M} E(S_\kappa S_\lambda)} = \sqrt{\sum_{\kappa=1}^{M} E(S_\kappa^2) + \sum_{\kappa=1}^{M}\sum_{\lambda\neq\kappa} E(S_\kappa S_\lambda)}.
\]
By Theorem 18, for all $\kappa \neq \lambda$,
\[
E(S_\kappa^2) = E\left(\left(\frac{S_\kappa}{N_\kappa}\right)^2\right) N_\kappa^2 \approx N_\kappa + E(Z_\kappa^2)\, N_\kappa^2, \qquad E(S_\kappa S_\lambda) = E\left(\frac{S_\kappa}{N_\kappa}\frac{S_\lambda}{N_\lambda}\right) N_\kappa N_\lambda \approx E(Z_\kappa Z_\lambda)\, N_\kappa N_\lambda.
\]
Due to our assumption that at least one $Z_\lambda$ is not Dirac-distributed, there must be some $\kappa$ for which $E(S_\kappa^2) \approx E(Z_\kappa^2)\, N_\kappa^2$. Since we have $N_\kappa \approx \alpha_\kappa N$,
\[
\sigma_N \approx N \sqrt{E\left(\sum_{\kappa=1}^{M} \alpha_\kappa Z_\kappa\right)^2}.
\]
Furthermore,
\[
E(\chi_\lambda S) = \sum_{\kappa=1}^{M} E(\chi_\lambda S_\kappa) = E(|S_\lambda|) + \sum_{\kappa\neq\lambda} E(\chi_\lambda S_\kappa) = E\left(\frac{|S_\lambda|}{N_\lambda}\right) N_\lambda + \sum_{\kappa\neq\lambda} E\left(\chi_\lambda \frac{S_\kappa}{N_\kappa}\right) N_\kappa \approx E|Z_\lambda|\, N_\lambda + \sum_{\kappa\neq\lambda} E\bigl(\operatorname{sgn}(Z_\lambda) Z_\kappa\bigr)\, N_\kappa
\]
\[
\approx N\Bigl(E|Z_\lambda|\, \alpha_\lambda + \sum_{\kappa\neq\lambda} E\bigl(\operatorname{sgn}(Z_\lambda) Z_\kappa\bigr)\, \alpha_\kappa\Bigr) = N\Bigl(E|Z_1|\, \alpha_\lambda + E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr)(1 - \alpha_\lambda)\Bigr) = N\Bigl(\bigl(E|Z_1| - E(\operatorname{sgn}(Z_1) Z_2)\bigr)\alpha_\lambda + E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr)\Bigr).
\]
For the rest of this article, we will write $d$ for $\sqrt{E\bigl(\sum_{\kappa=1}^{M} \alpha_\kappa Z_\kappa\bigr)^2}$, $m$ for $E|Z_1|$, and $r$ for $E(\operatorname{sgn}(Z_1) Z_2)$. Note that $m \geq r$. Then
\[
b_\lambda = \frac{(m-r)\,\alpha_\lambda + r}{d}.
\]
Substituting these into (4), we obtain after some simplification
\[
w_\kappa = D\Bigl(\bigl(1 + (M-2)a\bigr)\, b_\kappa - a \sum_{\lambda\neq\kappa} b_\lambda\Bigr) = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr). \tag{6}
\]
The optimal weights are given by the sum of a constant term $\frac{D}{d}(r - am)$ and a term proportional to the size of the group $\alpha_\kappa$. This is similar to the electoral college in the U.S., where each state is represented by a number of electors equal to the number of senators (two for each state, independently of the population) plus the number of representatives (proportional to the state's population).

The optimal relative weights are given by
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa}{(1-a)\bigl((M-1)r + m\bigr)}. \tag{7}
\]
As we see, the optimal absolute weights (6) and the optimal relative weights (7) only differ by a constant multiplicative factor. Per Remark 4, both sets of optimal weights yield the same voting system.
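Formulas (4)-(7) reduce the asymptotic weight problem to a handful of scalar quantities. A small Python helper, sketched below under the assumption that $a$, $m$, $r$, $d$ and the group-size fractions $\alpha_\kappa$ have already been obtained (analytically or by simulation), evaluates the optimal absolute weights (6) and the relative weights (7).

    import numpy as np

    def limiting_optimal_weights(alpha, a, m, r, d):
        # alpha: limiting group-size fractions alpha_kappa (summing to 1)
        # a = E(chi_1 chi_2), m = E|Z_1|, r = E(sgn(Z_1) Z_2), d = sqrt(E((sum alpha_k Z_k)^2))
        alpha = np.asarray(alpha, dtype=float)
        M = len(alpha)
        D = 1.0 / ((1.0 - a) * ((M - 1) * a + 1.0))            # prefactor from A^{-1}
        w = (D / d) * (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha)        # formula (6)
        w_rel = (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha) \
                / ((1.0 - a) * ((M - 1) * r + m))                                # formula (7)
        return w, w_rel

    # Illustrative numbers only (not taken from the article):
    w, w_rel = limiting_optimal_weights(alpha=[0.5, 0.3, 0.2], a=0.4, m=0.5, r=0.3, d=0.45)
    print(w, w_rel, w_rel.sum())

As a consistency check, w divided by its sum coincides with w_rel, reflecting the equivalence of (6) and (7) stated above.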
Multiplicative Collective Bias Model

We now define the first of two versions of a CBM. These two CBMs were first introduced and analysed in [11]. The single-group CBM was previously studied by several authors ([10, 3, 7, 5]).
Definition 19.
Let $\mu$ be some symmetric measure on $[-1,1]$ and let $Y$ be a positive random variable with range contained in $(0,1]$. Let $Y_1, \ldots, Y_M$ be i.i.d. copies of $Y$ that are also independent of $Z$. Set the group bias variables $Z_\lambda := Z Y_\lambda$. We call a CBM as in Definition 6 with this type of group bias variables a multiplicative Collective Bias Model (m-CBM). We will refer to each $Y_\lambda$ as a group modifier.

Remark 20. The above definition fits into the general CBM framework by positing some probability measure $\rho$ on $(0,1]$. Let $\zeta$ be a realisation of $Z$ and let $A$ be a measurable set belonging to $[0 \wedge \zeta, 0 \vee \zeta]$. Then we define the conditional distribution
\[
\rho_\zeta(A) := \begin{cases} \delta_0(A) & \text{if } \zeta = 0, \\ \rho\bigl(\tfrac{1}{\zeta} A\bigr) & \text{if } \zeta \neq 0. \end{cases}
\]
So $\rho_\zeta$ is a contraction of $\rho$ with the degree of contraction given by the value $\zeta$. In the definition above, it is instead the variable $Y_\lambda$ which rescales the central bias variable $Z$.

Note that due to the symmetry of $\mu$, condition (1) holds for any m-CBM. The monotonicity of $\zeta \mapsto \rho_\zeta((0,1])$ is also satisfied. In fact, for all $\zeta \leq 0$, $\rho_\zeta((0,1]) = 0$, and for all $\zeta > 0$, $\rho_\zeta((0,1]) = 1$.

We revisit the covariance between the prevalent biases in different groups (Lemma 5).
Lemma 21.
For m-CBMs, for all groups $\kappa \neq \lambda$, $E(Z_\kappa Z_\lambda) = E(Z^2)\,(EY)^2$, which is positive if and only if $\mu \neq \delta_0$.

A simple calculation yields the claim. Next we calculate the asymptotic moments of an m-CBM.
Theorem 22.
For all groups $\nu, \nu' \in \{1, \ldots, M\}$, $\nu \neq \nu'$:

1. $E(S_\nu^2) \approx N_\nu + N_\nu^2\, E(Z^2)\, E(Y^2)$.

2. $E(S_\nu S_{\nu'}) = N_\nu N_{\nu'}\, E(Z^2)\,(EY)^2$.

3. $E(S_\nu \chi_{\nu'}) \approx N_\nu\, E(|Z|)\, E(Y) \approx E(|S_\nu|) = E(S_\nu \chi_\nu)$.

4. $E(\chi_\nu \chi_{\nu'}) \approx 1 - \mu(Z = 0)$.

Note that in the first statement the leading term has a coefficient equal to 0 if and only if $Z$ follows a $\delta_0$ distribution. In that case, the lower-order term becomes important. Since $Z \sim \delta_0$ implies that all voters are independent, we will not be pursuing this further.

Corollary 23. If $E(Z^2) > 0$, then we have $E(S_\nu^2) \approx N_\nu^2\, E(Z^2)\, E(Y^2)$.

Together with Theorem 18, this theorem shows that any sequence of m-CBMs converges to the limiting distribution $(Z_1, \ldots, Z_M) = (Z Y_1, \ldots, Z Y_M)$ given by its bias variables.

As we see by the fourth statement, the set of all m-CBMs can be partitioned into the two categories established in Definition 17. The first category consists of those cases where $\mu\{0\} > 0$. In this category, on some fraction of all possible issues, the voters make up their own minds and vote independently. This holds even in the large $N$ limit. In the second category are those cases where $\mu\{0\} = 0$. In this case, the voters tend to vote alike. In the large $N$ limit, the voting behaviour becomes strongly correlated due to Theorem 18. The distinction between the two categories has important implications for the optimal weights.

If $\mu\{0\} = 1$, then all voters cast their votes independently. This case has been extensively analysed in the past and we will not consider it. Otherwise, $\mu \neq \delta_0$ implies $E_\mu(|Z|) > 0$ and $E_\mu(Z^2) > 0$. In the second category, $A$, the limiting covariance matrix of $(\chi_1, \ldots, \chi_M)$, is singular: $A = (1)_{\lambda,\mu=1,\ldots,M}$. Hence, the optimal weights are not uniquely determined. Instead, any set of weights which sum to $b_1 = \lim_{N\to\infty} \frac{1}{\sigma_N} E_N(S\chi_1)$ is optimal. We need to show that the second part of condition two, $b_\kappa = b_\lambda$ for all $\kappa, \lambda$, holds as well.

Theorem 24.
For any m-CBM in the second category and for all groups $\lambda$,
\[
b_\lambda = \lim_{N\to\infty} \frac{1}{\sigma_N} E_N(S\chi_\lambda) = \frac{E|Z|}{\sqrt{E(Z^2)}} > 0,
\]
which is independent of the specific group $\lambda$.

Proof. We have $E(|Z|) > 0$, $E(Z^2) > 0$, and $E|Y| > 0$. Then
\[
\frac{1}{\sigma_N} E_N(S\chi_\lambda) \approx \frac{1}{\sigma_N} \sum_{\nu=1}^{M} E_N(S_\nu \chi_\lambda) \approx \frac{1}{\sigma_N} \sum_{\nu=1}^{M} E_N(|S_\nu|) \approx \frac{\sum_{\nu=1}^{M} E|Z|\, EY\, N_\nu}{\sqrt{\sum_\kappa \sum_\nu E(S_\kappa S_\nu)}} \approx \frac{N \sum_{\nu=1}^{M} E|Z|\, EY\, \alpha_\nu}{N\sqrt{\sum_\kappa \sum_\nu E(Z^2)\,(EY)^2\, \alpha_\kappa \alpha_\nu}} = \frac{E|Z|\, EY}{\sqrt{E(Z^2)}\, EY} = \frac{E|Z|}{\sqrt{E(Z^2)}}.
\]
Hence, if $\mu\{0\} = 0$, then the m-CBM satisfies condition two in Proposition 15 and therefore belongs to the second category. Thus the minimal democracy deficit, when the weights are chosen optimally, converges as $N \to \infty$. Therefore, we can approximate the minimal level of the democracy deficit for large but finite populations by solving the asymptotic problem. Since asymptotically any set of weights with a certain sum is optimal, we can choose any weights which are politically feasible. The intuition behind this result is that in the second category an overwhelming majority of voters tend to align, hence all groups vote the same almost surely. It does not matter what the voting weights in the council are if all representatives agree on almost all issues.

If $\mu\{0\} > 0$, the limiting matrix $A$ is invertible by Proposition 10, so the optimal weights are uniquely determined and condition one in Proposition 15 is satisfied. By that proposition, the minimal democracy deficit for finite but large $N$ can be well approximated by plugging in the limiting optimal weights. We can even approximate the optimal weights $w^N$ by the optimal weights in the limiting problem.

As a conclusion to the analysis of m-CBMs, we calculate the optimal weights. Let $0 \leq a := E(\chi_1 \chi_2) < 1$. By (6),
\[
w_\kappa = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr).
\]
In m-CBMs, the terms $a, m, r$ are given by
\[
a = E(\chi_1\chi_2) = 1 - \mu(Z = 0), \qquad m = E|Z_1| = E|ZY_1| = E|Z|\, EY, \qquad r = E\bigl(\operatorname{sgn}(Z_1) Z_2\bigr) = E\bigl(\operatorname{sgn}(ZY_1)\, ZY_2\bigr) = E\bigl(\operatorname{sgn}(Z) Z\bigr) EY = E|Z|\, EY.
\]
We have $m = r$, so the part of the optimal weight that is proportional to the group's size vanishes. The optimal weights according to the limiting distribution are equal to
\[
w_\kappa = \frac{D\, E|Z|\, EY\, (1-a)}{d} = \frac{D\, E|Z|\, EY\, \mu(Z=0)}{d}.
\]
So all groups receive the same weight to minimise the democracy deficit, independently of their respective group size.
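The equal-weight conclusion can be checked numerically. The sketch below simulates a specific m-CBM (central bias uniform on $\{-1/2, 0, 1/2\}$ and group modifiers uniform on $(0,1)$, an assumption chosen only so that $\mu\{0\} > 0$ and the first category applies), estimates $a$, $m$, $r$ and $d$ from samples of the limiting bias vector $(Z_1,\ldots,Z_M)$, and confirms that $m \approx r$, so that formula (6) assigns numerically equal weights.

    import numpy as np

    rng = np.random.default_rng(2)
    M, n = 3, 200_000
    alpha = np.array([0.5, 0.3, 0.2])

    # Central bias Z: uniform on {-1/2, 0, 1/2}, so mu({0}) = 1/3 > 0 (assumed example).
    Z = rng.choice([-0.5, 0.0, 0.5], size=n)
    # Group modifiers Y_1, ..., Y_M: i.i.d. uniform on (0, 1) (assumed example).
    Y = rng.uniform(0.0, 1.0, size=(n, M))
    Zg = Z[:, None] * Y                    # limiting group biases Z_lambda = Z * Y_lambda

    # sign(0) = 0 contributes 0, matching the expectation of independent fair council votes.
    a = np.mean(np.sign(Zg[:, 0]) * np.sign(Zg[:, 1]))   # E(chi_1 chi_2) in the limit
    m = np.mean(np.abs(Zg[:, 0]))                        # E|Z_1|
    r = np.mean(np.sign(Zg[:, 0]) * Zg[:, 1])            # E(sgn(Z_1) Z_2)
    d = np.sqrt(np.mean((Zg @ alpha) ** 2))              # sqrt(E((sum alpha_k Z_k)^2))

    D = 1.0 / ((1.0 - a) * ((M - 1) * a + 1.0))
    w = (D / d) * (r - a * m + (1.0 + (M - 1) * a) * (m - r) * alpha)   # formula (6)
    print("m - r =", m - r, "weights:", w)   # m is close to r, hence nearly equal weights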
Additive Collective Bias Model

The second version of a CBM we consider is specified by
Definition 25.
Let $\mu$ and $\rho$ be two symmetric measures on $[-1/2, 1/2]$ and let $Y_1, \ldots, Y_M$ be i.i.d. according to $\rho$ and independent of $Z$. Set the group bias variables $Z_\lambda := Z + Y_\lambda$. We call a CBM as in Definition 6 with this type of group bias variables an additive Collective Bias Model (a-CBM). We will refer to each $Y_\lambda$ as a group modifier.

Remark 26. The above definition fits into the general CBM framework by defining the conditional distributions $\rho_\zeta$ for each $\zeta \in [-1/2, 1/2]$ as the distribution of the random variable $Y + \zeta$. So $\rho_\zeta$ is a translation of $\rho$ by $\zeta$.

Note that due to the symmetry of $\mu$ and $\rho$, condition (1) holds for any a-CBM. The monotonicity of $\zeta \mapsto \rho_\zeta((0,1])$ is also satisfied.
Lemma 27.
For a-CBMs, for all groups $\kappa \neq \lambda$, $E(Z_\kappa Z_\lambda) = E(Z^2)$, which is positive if and only if $\mu \neq \delta_0$.

Contrary to m-CBMs, in a-CBMs the groups can be independent while still having correlation within each group. We turn to the limiting moments of an a-CBM.
Theorem 28.
For all groups $\nu, \nu' \in \{1, \ldots, M\}$, $\nu \neq \nu'$:

1. $E(S_\nu^2) \approx N_\nu + N_\nu^2\,\bigl(E(Z^2) + E(Y^2)\bigr)$.

2. $E(S_\nu S_{\nu'}) = N_\nu N_{\nu'}\, E(Z^2)$.

3. $E(|S_\nu|) \approx \sqrt{N_\nu} + N_\nu\, E|Z + Y|$.

4. $E(S_\nu \chi_{\nu'}) \approx N_\nu\, E\bigl[(Z + Y_\nu)\operatorname{sgn}(Z + Y_{\nu'})\bigr]$.

5. $E(\chi_\nu \chi_{\nu'}) \approx E\bigl[\operatorname{sgn}(Z + Y_\nu)\operatorname{sgn}(Z + Y_{\nu'})\bigr]$.

Note that in the first and third statements the leading terms have coefficients equal to 0 if and only if both $Z$ and $Y$ follow a $\delta_0$ distribution. In that case, the lower-order terms become important. Since $Z, Y \sim \delta_0$ implies that all voters are independent, we will not be pursuing this further.

Corollary 29. If $E(Z^2) > 0$ or $E(Y^2) > 0$, then we have
\[
E(S_\nu^2) \approx N_\nu^2\,\bigl(E(Z^2) + E(Y^2)\bigr), \qquad E(|S_\nu|) \approx N_\nu\, E|Z + Y|.
\]
Whereas for m-CBMs the covariance $E(\chi_\nu \chi_{\nu'})$ equals 1 if and only if $\mu(Z = 0) = 0$, the situation is more complicated for a-CBMs. We recall Proposition 10, which states that $A$ is positive definite if and only if for all $\lambda, \kappa \in \{1, \ldots, M\}$, $\lambda \neq \kappa$, $P(Z_\lambda Z_\kappa \leq 0) > 0$ holds. For a-CBMs this is equivalent to $P(|Y| > |Z|) > 0$. This condition can be interpreted in terms of almost sure (or statewise) stochastic dominance. It says that $|Z|$ does not almost surely dominate $|Y|$. Put differently, the group modifiers should be able to override the central bias on some of the issues. It seems sensible to assume that the central bias is not as extreme as the group modifiers can potentially be, due to factors such as the diversity of the overall population versus the possibly more homogeneous group populations.

In the last section we could solve the m-CBM for all possible bias distributions and group sizes and show that there are only two possibilities: if an m-CBM belongs to the first category, then all optimal weights are equal. If it belongs to the second category, then all sets of weights are optimal. At least this second part holds for a-CBMs, too.

Proposition 30.
All a-CBMs belong to either the first or the second category. An a-CBM belongs to the first category if and only if $a = E(\chi_1 \chi_2) < 1$.

Proof. We have $a = E[\operatorname{sgn}(Z + Y_1)\operatorname{sgn}(Z + Y_2)] = 1$ if and only if $\operatorname{sgn}(Z + Y_1) = \operatorname{sgn}(Z + Y_2)$ almost surely. Due to the symmetry of the measures $\mu$ and $\rho$, this last equality is equivalent to $P(|Y| > |Z|) = 0$. So $a < 1$ is equivalent to $P$ being in the first category. We only need to show that if $a = 1$, then $b_\kappa = b_\lambda$ for all $\kappa, \lambda$ holds as well. We have
\[
b_\lambda = \frac{1}{\sigma_N} E_N(S\chi_\lambda) = \frac{\sum_{\nu=1}^{M} E_N(S_\nu \chi_\lambda)}{\sqrt{\sum_\kappa \sum_\nu E(S_\kappa S_\nu)}} = \frac{E_N(|S_\lambda|) + \sum_{\nu\neq\lambda} E_N(S_\nu \chi_\lambda)}{\sqrt{\sum_\kappa E(S_\kappa^2) + \sum_\kappa \sum_{\nu\neq\kappa} E(S_\kappa S_\nu)}}
\]
\[
= \frac{N_\lambda\, E|Z+Y| + \sum_{\nu\neq\lambda} N_\nu\, E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]}{\sqrt{\sum_\kappa N_\kappa^2 \bigl(E(Z^2) + E(Y^2)\bigr) + \sum_\kappa \sum_{\nu\neq\kappa} N_\kappa N_\nu\, E(Z^2)}} \approx \frac{\alpha_\lambda\, E|Z+Y| + E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr] \sum_{\nu\neq\lambda} \alpha_\nu}{\sqrt{\bigl(E(Z^2) + E(Y^2)\bigr) \sum_\kappa \alpha_\kappa^2 + E(Z^2) \sum_\kappa \sum_{\nu\neq\kappa} \alpha_\kappa \alpha_\nu}}
\]
\[
= \frac{\alpha_\lambda\,\Bigl(E|Z+Y| - E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]\Bigr) + E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]}{\sqrt{E(Z^2) + E(Y^2)\sum_\kappa \alpha_\kappa^2}}.
\]
As we see by this calculation, the entry $b_\lambda$ depends on $\lambda$ if and only if $E|Z+Y| > E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr]$, which is itself equivalent to $P(|Y| > |Z|) > 0$. Hence, if $P$ belongs to the second category, then any set of weights is optimal for large $N$.

We turn to the a-CBMs in the first category. As mentioned, if $|Z|$ almost surely dominates $|Y|$, then the a-CBM belongs to the second category. If the relation between $|Z|$ and $|Y|$ is inverted and $|Y|$ almost surely dominates $|Z|$, then the model belongs to the first category. What is more, voters belonging to different groups are independent. This is a consequence of the group modifier overriding the central bias. We have $\operatorname{sgn}(Z + Y_\lambda) = \operatorname{sgn}(Y_\lambda)$ almost surely. However, voters within each group are still correlated. In order to calculate the optimal weights, we have to determine the following quantities:
\[
a = E(\chi_1\chi_2) = 0, \qquad m = E|Z_1| = E|Z+Y| > 0, \qquad r = E\bigl[(Z+Y_1)\operatorname{sgn}(Z+Y_2)\bigr] = 0.
\]
We note that $r - am = 0$ and $m - r = m$. Substituting these expressions into (6), we obtain
\[
w_\kappa = \frac{D}{d}\, m\, \alpha_\kappa.
\]
As we see, if the group modifier overrides the central bias, the optimal weights are proportional to the group sizes.

Next we postulate a model in which some range of biases is equally likely and the group modifiers tend to be more diverse than the central bias. Let $\mu$ be the uniform distribution on some interval $[-\beta, \beta]$ and $\rho$ the uniform distribution on $[-\gamma, \gamma]$ such that $0 < \beta \leq \gamma \leq 1/2$. This model belongs to the first category, since $a \leq 1/3$.
Now we calculate
\[
a = \frac{\beta^2}{3\gamma^2}, \qquad m = \frac{\beta^2 + 3\gamma^2}{6\gamma}, \qquad r = \frac{\beta^2}{3\gamma}.
\]
Hence,
\[
r - am = \frac{\beta^2(3\gamma^2 - \beta^2)}{18\gamma^3} > 0, \qquad m - r = \frac{3\gamma^2 - \beta^2}{6\gamma} > 0,
\]
and
\[
w_\kappa = \frac{D}{d}\left[\frac{\beta^2(3\gamma^2-\beta^2)}{18\gamma^3} + \left(1 + (M-1)\frac{\beta^2}{3\gamma^2}\right)\frac{3\gamma^2-\beta^2}{6\gamma}\,\alpha_\kappa\right] = \frac{D(3\gamma^2-\beta^2)}{18\gamma^3 d}\Bigl[\beta^2 + \bigl(3\gamma^2 + (M-1)\beta^2\bigr)\alpha_\kappa\Bigr],
\]
and it is the sum of a constant term which is the same for all groups regardless of size and a summand which is proportional to the group's size $\alpha_\kappa$. If we divide all weights by the common factor $\frac{D(3\gamma^2-\beta^2)}{18\gamma^3 d} > 0$, we do not alter the voting system. Then the constant term is $\beta^2$ and the proportional term is $\bigl(3\gamma^2 + (M-1)\beta^2\bigr)\alpha_\kappa$. Hence for small $\beta$ in relation to $\gamma$, the constant term becomes negligible and the optimal weights are close to proportional to the group sizes. That is the case when the central bias tends to be small in relation to the group modifiers. Here small groups will receive little voting weight in the council.

On the other hand, the optimal weight even a very small group receives has the lower bound $\beta^2$. The sum of all weights is at most $2(M+1)\gamma^2$. So even very small groups with $\alpha_\kappa$ close to 0 receive a fraction of at least
\[
\frac{\beta^2}{2(M+1)\gamma^2}
\]
if we normalise the sum of weights to 1. If the central bias has the same distribution as the group modifiers, i.e. $\beta = \gamma$, then even very small groups receive a fraction of at least $1/(2(M+1))$ of the total weight.

Next we investigate the complementary case where the group modifiers tend to be less diverse than the central bias: $0 < \gamma \leq \beta \leq 1/2$. Then
\[
a = 1 - \frac{2\gamma}{3\beta}, \qquad m = \frac{3\beta^2 + \gamma^2}{6\beta}, \qquad r = \frac{3\beta^2 - \gamma^2}{6\beta}.
\]
We also calculate
\[
r - am = \frac{\gamma}{9}\left(\frac{\gamma^2}{\beta^2} - \frac{3\gamma}{\beta} + 3\right) > 0, \qquad m - r = \frac{\gamma^2}{3\beta} > 0,
\]
and
\[
w_\kappa = \frac{D\gamma}{9d}\left[\left(\frac{\gamma^2}{\beta^2} - \frac{3\gamma}{\beta} + 3\right) + 3\,\frac{\gamma}{\beta}\left(M - \frac{2(M-1)\gamma}{3\beta}\right)\alpha_\kappa\right].
\]
We normalise the weights to 1 as before. A small value of $\gamma$ implies that the group modifiers tend to be small and the group bias is mostly due to the central bias. As we can see from the above formula, for $\gamma$ close to 0, the weight a small group ($\alpha_\kappa$ close to 0) receives is
\[
\frac{w_\kappa}{\sum_\lambda w_\lambda} = \frac{3}{3M} = \frac{1}{M}.
\]
On the other hand, if $\gamma = \beta$, then we already know from our previous analysis that the relative weight of a small group is given by $1/(2(M+1))$. Hence a small group fares better, as far as its weight is concerned, in a situation where the central bias is much stronger than the group modifiers, compared to a situation where both have the same distribution.
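The two uniform regimes can be compared numerically. The short sketch below evaluates the closed-form expressions for $a$, $m$, $r$ derived above for uniform $\mu$ on $[-\beta,\beta]$ and $\rho$ on $[-\gamma,\gamma]$ in both parameter regimes and reports the limiting relative weight (7) of a vanishingly small group; the particular parameter values are illustrative assumptions.

    def uniform_acbm_small_group_share(beta, gamma, M):
        # Closed-form a, m, r for uniform central bias on [-beta, beta] and uniform
        # group modifiers on [-gamma, gamma], in the two regimes discussed above.
        if beta <= gamma:
            a = beta**2 / (3 * gamma**2)
            m = (beta**2 + 3 * gamma**2) / (6 * gamma)
            r = beta**2 / (3 * gamma)
        else:
            a = 1 - 2 * gamma / (3 * beta)
            m = (3 * beta**2 + gamma**2) / (6 * beta)
            r = (3 * beta**2 - gamma**2) / (6 * beta)
        # Relative weight (7) of a group with alpha_kappa -> 0.
        return (r - a * m) / ((1 - a) * ((M - 1) * r + m))

    M = 5
    print(uniform_acbm_small_group_share(beta=0.5, gamma=0.5, M=M))   # -> 1/(2(M+1))
    print(uniform_acbm_small_group_share(beta=0.5, gamma=0.01, M=M))  # -> close to 1/M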
Global Collective Bias Model

In this section, we shall consider the case where the central bias is the only influence on voting behaviour. This might be the case in a random partition of a population which is not based on any cultural differences. In the general framework, the conditional distributions $\rho_\zeta = \delta_\zeta$ given $Z = \zeta$ yield the desired setup. This special case also arises in an m-CBM if we let $Y \sim \delta_1$. Hence, all results applicable to m-CBMs hold for a global CBM $P$ as well. $P$ belongs to the first category if and only if $\mu\{0\} > 0$. As previously, we will not consider the case $\mu = \delta_0$ where all voters are independent. Therefore, any set of weights is optimal if and only if $\mu\{0\} = 0$, and equal weights for all groups are optimal if $\mu\{0\} > 0$.

One obvious extension to the general CBM framework is to allow different conditional distributions $\rho_{\zeta,\lambda}$ for each group to account for more strongly or more weakly correlated groups.

Another interesting question is under what conditions the positivity of all optimal weights can be guaranteed. Obviously, a negative optimal weight implies that the theoretical minimum of the democracy deficit cannot be achieved, due to the incentive a group with a negative voting weight faces to misrepresent its preferences.

For m-CBMs we managed to solve the problem of optimal weights according to the limiting distribution. These weights are either arbitrary (second category) or equal (first category). So for m-CBMs the optimal weights can always be chosen to be positive. This is not the case for a-CBMs and potentially other voting measures.

In Section 5, we studied an a-CBM with $\mu$ being the uniform distribution on some interval $[-\beta, \beta]$ and $\rho$ the uniform distribution on $[-\gamma, \gamma]$ such that $0 < \beta, \gamma \leq 1/2$. For this model we determined and analysed the properties of the optimal weights, which are always positive. So if the bias variables are uniformly distributed on centred intervals, there cannot be negative weights. To illustrate that this can occur with other distributions $\mu$ and $\rho$, consider the following case: let $A_1, A_2, B$ be measurable sets with the symmetry property $A_i = -A_i$, $B = -B$, and assume they are ordered such that $|a_1| < |b| < |a_2|$ holds for all $a_i \in A_i$, $i = 1, 2$, and $b \in B$. Let $\mu$ and $\rho$ be symmetric measures with $\operatorname{supp}\mu \subset B$ and $\operatorname{supp}\rho \subset A_1 \cup A_2$. We claim that $r - am < 0$ is possible. A simple example is given by $0 < a_1 < b < a_2$ with $\mu = \frac{1}{2}(\delta_{-b} + \delta_b)$ and $\rho = \frac{1}{4}(\delta_{-a_2} + \delta_{-a_1} + \delta_{a_1} + \delta_{a_2})$. However, the claim holds for arbitrary measures $\mu$ and $\rho$ with the properties described above. For the example, we have
\[
a = \frac{1}{4}, \qquad m = \frac{1}{2}E|Z| + E\bigl(|Y|\,\mathbb{1}_{A_2}\bigr) = \frac{b}{2} + \frac{a_2}{2}, \qquad r = \frac{1}{2}E|Z| = \frac{b}{2}.
\]
In the second expression above, $\mathbb{1}_{A_2}$ stands for the indicator function of the set $A_2$. The inequality $r - am \geq 0$ is equivalent to
\[
\frac{b}{2} \geq \frac{1}{4}\left(\frac{b}{2} + \frac{a_2}{2}\right) \iff 3b \geq a_2.
\]
Since $a_2 > b$, in this model $r - am$ can be positive, 0, or negative, depending on the ratio of $b$ and $a_2$. If $3b < a_2$, then $r - am$ is negative. Given that the optimal weight of each group is
\[
w_\kappa = \frac{D}{d}\Bigl(r - am + \bigl(1 + (M-1)a\bigr)(m-r)\,\alpha_\kappa\Bigr),
\]
we see that for $\alpha_\kappa$ small enough $w_\kappa$ will be negative.

Appendix
Proof of Theorem 18
We prove the statement of Theorem 18 by the method of moments.
Definition 31.
Let $X$ be an $m$-dimensional real random vector, distributed according to $P_X$, and let $K = (K_1, \ldots, K_m) \in \mathbb{N}^m$. Then we define the absolute moment of order $K$ of $X$:
\[
\overline{m}_K(P_X) := \int_{\mathbb{R}^m} \left| x_1^{K_1} \cdots x_m^{K_m} \right| dP_X.
\]
If this expression is finite, then we define the $K$-th moment of $X$:
\[
m_K(P_X) := \int_{\mathbb{R}^m} x_1^{K_1} \cdots x_m^{K_m}\, dP_X.
\]
We will also write $m_K(X)$ instead of $m_K(P_X)$.

It is well known (see e.g. [6]) that for a sequence of random vectors $(X_n)_{n\in\mathbb{N}}$, each of dimension $m$, convergence in distribution is implied by the convergence of moments of all orders $K \in \mathbb{N}^m$, under some conditions on the growth of the moments as the components of $K$ go to infinity. Here we only need convergence in distribution for bounded random vectors, so the convergence of the moments $m_K(P_{X_n})$ to the corresponding moments of a fixed distribution $m_K(P_X)$ implies the convergence in distribution $X_n \Longrightarrow X$ as $n \to \infty$.

To apply the method of moments, we need to show the convergence of expectations of the shape
\[
E\left[\left(\frac{S_1}{N_1}\right)^{K_1} \cdots \left(\frac{S_M}{N_M}\right)^{K_M}\right]. \tag{8}
\]
This expectation we can express as a sum of correlations
\[
E\left[X_{11} \cdots X_{1 k_1} \cdots X_{M1} \cdots X_{M k_M}\right], \tag{9}
\]
where $k_\lambda \in \{0, 1, \ldots, K_\lambda\}$ for each $\lambda$. It suffices to consider the first $k_\lambda$ votes from each group instead of arbitrary $k_\lambda$ votes from that group, because the random variables belonging to the same group are exchangeable, so in the expectation above only the number of different variables from each group is relevant, not their identities. To deal with the task of expressing the moments (8) in terms of correlations as in (9), we need to introduce some combinatorial concepts. Let $|A|$ stand for the cardinality of the set $A$.

Definition 32.
We define a multiindex $i = (i_1, i_2, \ldots, i_L) \in \{1, 2, \ldots, N\}^L$.

1. For $j \in \{1, 2, \ldots, N\}$ we set $\nu_j(i) := |\{k \in \{1, 2, \ldots, L\} \mid i_k = j\}|$.

2. For $\ell = 0, 1, \ldots, L$ we define $\rho_\ell(i) := |\{j \mid \nu_j(i) = \ell\}|$ and $\rho(i) := (\rho_1(i), \ldots, \rho_L(i))$.

The expression $\nu_j(i)$ represents the multiplicity of each index $j \in \{1, 2, \ldots, N\}$ in the multiindex $i$, and $\rho_\ell(i)$ represents the number of indices in $i$ that occur exactly $\ell$ times. We shall call $\rho(i)$ the profile of the multiindex $i$.

Lemma 33. For all $i = (i_1, i_2, \ldots, i_L) \in \{1, 2, \ldots, N\}^L$ we have $\sum_{\ell=1}^{L} \ell\, \rho_\ell(i) = L$.

We use this basic property of profiles to define

Definition 34. Let $r = (r_1, \ldots, r_L)$ be such that $\sum_{\ell=1}^{L} \ell\, r_\ell = L$ holds. We call $r$ a profile vector. We define
\[
w_L(r) := \left| \{ i \in \{1, \ldots, N\}^L \mid \rho(i) = r \} \right|
\]
to represent the number of multiindices $i$ that have a given profile vector $r$.

We now define the set of all profile vectors for a given $L \in \mathbb{N}$.

Definition 35. Let $\Pi^{(L)} = \{ r \in \{0, 1, \ldots, L\}^L \mid \sum_{\ell=1}^{L} \ell\, r_\ell = L \}$. Some important subsets of $\Pi^{(L)}$ are $\Pi^{(L)}_k = \{ r \in \Pi^{(L)} \mid r_1 = k \}$, $\Pi^{2(L)} = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \geq 3 \}$ and $\Pi^{+(L)} = \{ r \in \Pi^{(L)} \mid r_\ell > 0 \text{ for some } \ell \geq 2 \}$. We can also combine superscripts and subscripts. Then we have, e.g., $\Pi^{2(L)}_0 = \{ r \in \Pi^{(L)} \mid r_\ell = 0 \text{ for all } \ell \neq 2 \}$.

We shall write, for any $i \in \{1, 2, \ldots, N\}^L$, $X_i = X_{i_1} \cdots X_{i_L}$. For any $r \in \Pi^{(L)}$ let $j \in \{1, 2, \ldots, N\}^L$ be such that $\rho(j) = r$. Then we let $X_r$ stand for $X_j$. This definition is not problematic if we are only interested in the expectation $E(X_r) = E(X_j)$, and the random variables $X_1, \ldots, X_N$ are exchangeable. If there are $M$ sets $\{1, 2, \ldots, N_\nu\}^{L_\nu}$, and for each $\nu$, $i^\nu \in \{1, 2, \ldots, N_\nu\}^{L_\nu}$, then we set $i := (i^\nu)_\nu$ and write $X_i$ for $X_{i^1} \cdots X_{i^M}$. Similarly, if we have profile vectors $r^\nu \in \Pi^{(L_\nu)}$ and $j^\nu \in \{1, \ldots, N_\nu\}^{L_\nu}$ such that $\rho(j^\nu) = r^\nu$, then we write $X_r$ for $X_{j^1} \cdots X_{j^M}$.
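These counting objects are easy to check by brute force for small $N$ and $L$. The following Python sketch computes the profile $\rho(i)$ of a multiindex, verifies the identity of Lemma 33, and tabulates $w_L(r)$ by enumerating all multiindices; it is meant only as a sanity check of the definitions, not as part of the proof.

    from collections import Counter
    from itertools import product

    def profile(i, L):
        # nu_j(i): multiplicity of each index j; rho_ell(i): number of indices
        # occurring exactly ell times, for ell = 1, ..., L.
        nu = Counter(i)
        rho = tuple(sum(1 for j in nu if nu[j] == ell) for ell in range(1, L + 1))
        assert sum(ell * r for ell, r in enumerate(rho, start=1)) == L  # Lemma 33
        return rho

    def w_L_by_enumeration(N, L):
        # w_L(r): number of multiindices in {1,...,N}^L with profile r (Definition 34).
        counts = Counter()
        for i in product(range(1, N + 1), repeat=L):
            counts[profile(i, L)] += 1
        return counts

    print(w_L_by_enumeration(N=4, L=3))
    # e.g. the profile (3, 0, 0) counts the multiindices with three distinct entries: 4*3*2 = 24.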
Proposition 36. For $r \in \Pi^{(L)}$ set $r_0 := N - \sum_{\ell=1}^{L} r_\ell$. Then
\[
w_L(r) = \frac{N!}{r_0!\, r_1! \cdots r_L!} \cdot \frac{L!}{1!^{r_1}\, 2!^{r_2} \cdots L!^{r_L}}.
\]
If we let $N$ go to infinity, then we have
\[
w_L(r) \approx \frac{N^{\sum_{\ell=1}^{L} r_\ell}}{r_1!\, r_2! \cdots r_L!} \cdot \frac{L!}{1!^{r_1}\, 2!^{r_2} \cdots L!^{r_L}}.
\]
This result is based on Theorem 3.14 and Corollary 3.18 in [4].

Now we have the necessary concepts to prove Theorem 18. Let $K = (K_1, \ldots, K_M) \in \mathbb{N}^M$ and $K := \sum_{\nu=1}^{M} K_\nu$. We need to show that the moments $m^N_K := E_N\bigl[\bigl(\tfrac{S_1}{N_1}\bigr)^{K_1} \cdots \bigl(\tfrac{S_M}{N_M}\bigr)^{K_M}\bigr]$ converge to $m_K := m_K(Z_1, \ldots, Z_M)$. In the first sum below, for each $\lambda \in \{1, \ldots, M\}$, $i^\lambda \in \{1, 2, \ldots, N_\lambda\}^{K_\lambda}$; in the second sum, $r^\lambda \in \Pi^{(K_\lambda)}$.
\[
m_K\left(\frac{S_1}{N_1}, \ldots, \frac{S_M}{N_M}\right) = E\left(\frac{S_1^{K_1}}{N_1^{K_1}} \cdots \frac{S_M^{K_M}}{N_M^{K_M}}\right) = \frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \sum_{i^1,\ldots,i^M} E\bigl(X_{i^1} \cdots X_{i^M}\bigr) = \frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \sum_{r^1,\ldots,r^M} \prod_{\lambda=1}^{M} w_{K_\lambda}(r^\lambda)\, E\bigl(X_{r^1} \cdots X_{r^M}\bigr). \tag{10}
\]
We need to know $E(X_{r^1} \cdots X_{r^M})$. Let $k_\lambda \in \{0, 1, \ldots, K_\lambda\}$ for each $\lambda$. Then
\[
E[X_{11} \cdots X_{1k_1} \cdots X_{M1} \cdots X_{Mk_M}] = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} E^{(\zeta_1,\ldots,\zeta_M)}[X_{11} \cdots X_{1k_1} \cdots X_{M1} \cdots X_{Mk_M}]\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu
\]
\[
= \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda E^{\zeta_\lambda}[X_{\lambda 1} \cdots X_{\lambda k_\lambda}]\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{k_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu, \tag{11}
\]
where we used the conditional independence of all $X_{\lambda i}$ given $Z_\lambda = \zeta_\lambda$ and the identical conditional distribution within each group with $E^{\zeta_\lambda}[X_{\lambda i}] = \zeta_\lambda$. In particular, this correlation is independent of $N$ and therefore of each $N_\nu$. According to Definition 6, the moment $m_K$ is equal to
\[
m_K = \int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{K_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu.
\]
Proposition 36 says that for each $\lambda$,
\[
w_{K_\lambda}(r^\lambda) \approx \frac{N_\lambda^{\sum_{\ell=1}^{K_\lambda} r^\lambda_\ell}}{r^\lambda_1! \cdots r^\lambda_{K_\lambda}!} \cdot \frac{K_\lambda!}{1!^{r^\lambda_1} \cdots K_\lambda!^{r^\lambda_{K_\lambda}}} \leq N_\lambda^{K_\lambda} \cdot \frac{K_\lambda!}{K_\lambda!\, 1!^{K_\lambda}} = N_\lambda^{K_\lambda},
\]
where the inequality holds for large enough $N$ and hence $N_\lambda$. We conclude that, asymptotically, only a single summand of the sum (10) contributes to the moment $m_K\bigl(\tfrac{S_1}{N_1}, \ldots, \tfrac{S_M}{N_M}\bigr)$: the one where each $r^\lambda = (K_\lambda, 0, \ldots, 0)$. It equals
\[
\frac{1}{\prod_{\nu=1}^{M} N_\nu^{K_\nu}} \prod_{\lambda=1}^{M} N_\lambda^{K_\lambda}\, E\bigl(X_{(K_1,0,\ldots,0)} \cdots X_{(K_M,0,\ldots,0)}\bigr) = E\bigl(X_{(K_1,0,\ldots,0)} \cdots X_{(K_M,0,\ldots,0)}\bigr) = E\bigl(X_{j^1} \cdots X_{j^M}\bigr), \tag{12}
\]
where $\rho(j^\lambda) = (K_\lambda, 0, \ldots, 0)$ for each $\lambda = 1, \ldots, M$. By (11), this correlation is equal to
\[
\int_{[-1,1]} \int_{[-1,1]} \cdots \int_{[-1,1]} \prod_\lambda \zeta_\lambda^{K_\lambda}\, d\rho_{\zeta 1} \cdots d\rho_{\zeta M}\, d\mu,
\]
which is also equal to the moment $m_K$. This concludes the proof of Theorem 18.

References