Cutoff for Almost All Random Walks on Abelian Groups
Jonathan Hermon Sam Olesker-Taylor
Abstract
Consider the random Cayley graph of a finite group G with respect to k generators chosen uniformly at random, with 1 ≪ log k ≪ log|G|; denote it G_k. A conjecture of Aldous and Diaconis [1] asserts, for k ≫ log|G|, that the random walk on this graph exhibits cutoff. Further, the cutoff time should be a function only of k and |G|, to sub-leading order. This was verified for all Abelian groups in the '90s. We extend the conjecture to 1 ≪ k ≲ log|G|. We establish cutoff for all Abelian groups under the condition k − d(G) ≫ 1, where d(G) is the minimal size of a generating subset of G, which is almost optimal. The cutoff time is described (abstractly) in terms of the entropy of random walk on Z^k. This abstract definition allows us to deduce that the cutoff time can be written as a function only of k and |G| when d(G) ≪ log|G| and k − d(G) ≍ k ≫ 1; this is not the case when d(G) ≍ log|G| ≍ k. For certain regimes of k, we find the limit profile of the convergence to equilibrium.

Wilson [46] conjectured that Z_2^d gives rise to the slowest mixing time for G_k amongst all groups of size at most 2^d. We give a partial answer, verifying the conjecture for nilpotent groups. This is obtained via a comparison result of independent interest between the mixing times of a nilpotent group G and a corresponding Abelian group Ḡ, namely the direct sum of the Abelian quotients in the lower central series of G. We use this to refine a celebrated result of Alon and Roichman [3]: we show for nilpotent G that G_k is an expander provided k − d(Ḡ) ≳ log|G|. As another consequence, we establish cutoff for nilpotent groups with relatively small commutators, including high-dimensional special groups, such as Heisenberg groups.

The aforementioned results all hold with high probability over the random Cayley graph G_k.

Keywords: cutoff, mixing times, random walk, random Cayley graphs, entropy
MSC 2020 subject classifications:
Jonathan Hermon: [email protected], math.ubc.ca/∼jhermon/; University of British Columbia, Vancouver, Canada; supported by EPSRC EP/L018896/1 and an NSERC Grant.
Sam Olesker-Taylor: [email protected], mathematicalsam.wordpress.com; Department of Mathematical Sciences, University of Bath, UK; supported by EPSRC Grants 1885554 and EP/N004566/1.
The vast majority of this work was undertaken while both authors were at the University of Cambridge.

Introduction and Statement of Results
We analyse properties of the random walk (abbreviated RW) on a Cayley graph of a finite group. The generators of this graph are chosen independently and uniformly at random; precise definitions are given below. Let G be a finite group, let k be an integer (allowed to depend on G) and denote by G_k the Cayley graph of G with respect to k independently and uniformly random generators. We consider values of k with 1 ≪ log k ≪ log|G| for which G_k is connected with high probability (abbreviated whp), ie with probability tending to 1 as |G| grows.

Since pioneering work of Erdős, it has been understood that the typical behaviour of random objects in some class can shed valuable light on the class as a whole. Thus, when considering some class of combinatorial objects, it is natural to ask questions such as the following.
· What does a typical object in this class 'look like'?
· If an object is chosen uniformly at random, which properties hold with high probability?
Aldous and Diaconis [1] applied this philosophy to the study of random walks on groups. Aldous and Diaconis [1, 2] coined the phrase cutoff phenomenon: this occurs when the total variation (TV) distance between the law of the RW and its invariant distribution drops abruptly from close to 1 to close to 0 in a time-interval of smaller order than the mixing time. The material in this article is motivated by a conjecture of theirs regarding 'universality of cutoff' for the RW on the random Cayley graph G_k; it is given in [1, Page 40], which is an extended version of [2].

Conjecture (Aldous and Diaconis, 1985). For any group G, if k ≫ log|G| and log k ≪ log|G|, then the random walk on G_k exhibits cutoff whp. Further, the cutoff time, to leading order, is independent of the algebraic structure of the group: it can be written as a function only of k and |G|.

This conjecture spawned a large body of work, including [19, 20, 30, 31, 32, 43, 46]; see the account of previous work below. Apart from [32], which considers Z_p for prime p, and [46], which considers Z_2^d (which enforces k ≥ d = log_2|G|), focus has been on k ≫ log|G|. We establish cutoff for all Abelian groups when 1 ≪ k ≲ log|G| under almost optimal conditions in terms of group-generation. We also give simple conditions under which the cutoff time is independent of the algebraic structure of the group.

The second part of this article is motivated by a conjecture of Wilson. Wilson [46] establishes cutoff for the RW on G_k when G = Z_2^d and then conjectures that Z_2^d is the slowest amongst all groups of size at most 2^d, asymptotically as d → ∞; see [46, Theorem 1 and Conjecture 7].

Conjecture (Wilson, 1997). For all diverging d and n with n ≤ 2^d and all groups G of size n, if k − log_2 n ≫ 1 and log k ≪ log n, then t_mix(ε, G_k)/t_mix(ε′, H_k) ≤ 1 + o(1) whp for all ε, ε′ ∈ (0, 1), where H := Z_2^d; ie, the mixing time for G_k is at most that of H_k whp, up to smaller order terms.

We establish a comparison between the mixing times for nilpotent and Abelian groups.
Wilson's conjecture in the nilpotent set-up is an immediate consequence of this general comparison result. We apply our nilpotent–Abelian comparison theorem to establish cutoff for various examples of non-Abelian groups, including p-groups with 'small' commutators and Heisenberg groups.

Our focus is on mixing properties of the RW on the random Cayley graph G_k. We consider the limit as n := |G| → ∞ under the assumption that 1 ≪ log k ≪ log|G|. The condition 1 ≪ log k ≪ log|G| is necessary for cutoff on G_k^± for all nilpotent G; see Remark A.6.

We establish cutoff when G is any Abelian group, requiring only k − d(G) ≫ 1, where d(G) is the minimal size of a generating subset of G. We show that the leading-order term in the cutoff time is independent of the algebraic structure of G when d(G) ≪ log|G| and k − d(G) ≍ k, ie it depends only on k and |G|. It is the time at which the entropy of the RW on Z^k is log|G|. This extends the Aldous–Diaconis conjecture to 1 ≪ k ≲ log|G|. For certain k, we find the limit profile of the convergence to equilibrium.

We deduce Wilson's conjecture in the Abelian set-up, as a consequence of our cutoff work. We then extend this to the nilpotent set-up via the following result, which is of independent interest: to a nilpotent group G, we associate an Abelian group Ḡ of the same size, which is the direct sum of the Abelian quotients in the lower central series of G, and show that t_mix(G_k)/t_mix(Ḡ_k) ≤ 1 + o(1) whp (provided k − d(Ḡ) ≫ 1). We establish cutoff for G_k where G is a nilpotent group with a relatively small commutator. Examples of such groups include high-dimensional extra special or Heisenberg groups. Lastly, we show that the random Cayley graph of a nilpotent group G is an expander whp whenever k ≳ log|G| and k − d(Ḡ) ≍ k. (If G is Abelian, then Ḡ = G.)

Introduced by Aldous and Diaconis [1], there has been a great deal of research into these random Cayley graphs. Motivation for this model and an overview of historical work are given below.

Cayley graphs are either directed or undirected; we emphasise this by writing G_k^+ and G_k^−, respectively. When we write G_k or G_k^±, this means "either G_k^− or G_k^+", corresponding to the undirected, respectively directed, graphs with generators chosen independently and uniformly at random. Conditional on being simple, G_k^+ is uniformly distributed over the set of all simple degree-k Cayley graphs.
Up to a slightly adjusted definition of simple for undirected Cayley graphs, our results hold with G_k replaced by a uniformly chosen simple Cayley graph of degree k.

We consider sequences (G_N)_{N∈N} of finite groups with |G_N| → ∞ as N → ∞. For ease of presentation, we write statements like "let G be a group" instead of "let (G_N)_{N∈N} be a sequence of groups". Likewise, the quantities d(G) and, of course, k appearing in the statements all correspond to sequences, which need not be fixed (or bounded) unless we explicitly say otherwise. In the same vein, an event holds with high probability (abbreviated whp) if its probability tends to 1.

We use standard asymptotic notation: "≪" or "o(·)" means "of smaller order"; "≲" or "O(·)" means "of order at most"; "≍" means "of the same order"; "∼" means "asymptotically equivalent".

We analyse mixing in the total variation (abbreviated TV) distance. The uniform distribution on G, denoted π_G, is invariant for the RW. Let S = (S(t))_{t≥0} denote the RW on G_k; its law is denoted P_{G_k}(S(t) ∈ ·). For t ≥ 0, denote the TV distance between the law of S(t) and π_G by

d_{G_k}(t) := ‖P_{G_k}(S(t) ∈ ·) − π_G‖_TV := max_{A⊆G} |P_{G_k}(S(t) ∈ A) − |A|/|G||.

Throughout, unless explicitly specified otherwise, we use continuous time: t ≥ 0, ie t ∈ [0, ∞). We use standard notation and definitions for mixing and cutoff; see, eg, [34].

Definition.
A sequence (X_N)_{N∈N} of Markov chains is said to exhibit cutoff when, in a short time-interval, known as the cutoff window, the TV distance of the distribution of the chain from equilibrium drops from close to 1 to close to 0; more precisely, if there exists (t_N)_{N∈N} with

lim inf_{N→∞} d_N(t_N(1 − ε)) = 1 and lim sup_{N→∞} d_N(t_N(1 + ε)) = 0 for all ε ∈ (0, 1),

where d_N(·) is the TV distance of X_N(·) from its equilibrium distribution for each N ∈ N. We say that a RW on a sequence of random graphs (H_N)_{N∈N} exhibits cutoff around time (t_N)_{N∈N} whp if, for all fixed ε, in the limit N → ∞, the TV distance at time (1 + ε)t_N converges in distribution to 0 and at time (1 − ε)t_N to 1, where the randomness is over H_N.
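These definitions can be made concrete with a small simulation (our own illustration, not taken from the paper; the paper works in continuous time, whereas this sketch uses a lazy discrete-time walk, which behaves analogously). We take G = Z_n, sample k generators uniformly, and compute the exact TV distance d_{G_k}(t) by iterating the transition kernel:

```python
import random

def tv_distance(p, q):
    """Total variation distance between two distributions on the same state space."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def walk_distribution(n, gens, t):
    """Exact distribution after t steps of the lazy RW on the Cayley graph of Z_n
    with generator multiset `gens`: each step, hold with probability 1/2, else
    apply a uniformly chosen generator or inverse."""
    moves = [g % n for g in gens] + [(-g) % n for g in gens]
    p = [0.0] * n
    p[0] = 1.0
    for _ in range(t):
        q = [0.5 * x for x in p]                     # laziness
        for x, px in enumerate(p):
            if px:
                w = 0.5 * px / len(moves)
                for m in moves:
                    q[(x + m) % n] += w
        p = q
    return p

random.seed(1)
n, k = 101, 3
gens = [random.randrange(1, n) for _ in range(k)]    # nonzero, so they generate Z_101
uniform = [1.0 / n] * n
profile = {t: tv_distance(walk_distribution(n, gens, t), uniform)
           for t in (0, 50, 500, 2000)}
print(profile)
```

The TV distance starts at exactly 1 − 1/n and decreases monotonically towards 0; the sharpness of the drop, after rescaling time, is precisely the cutoff phenomenon that Theorem A establishes for random generators of Abelian groups.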
To extend the Aldous–Diaconis conjecture to 1 ≪ k ≲ log|G|, one needs additional assumptions. For an Abelian group G, write d(G) for the minimal size of a generating set of G. If k < d(G), then the group cannot be generated by any choice of k generators. Pomerance [42] shows that the expected number of independent, uniform generators required to generate the group is at most d(G) + 3. (That is, if Z_1, Z_2, ... ∼ iid Unif(G) and κ ∈ N is minimal with ⟨Z_1, ..., Z_κ⟩ = G, then d(G) ≤ E(κ) ≤ d(G) + 3.) Thus k − d(G) ≫ 1 suffices for G to be generated by {Z_1^{±1}, ..., Z_k^{±1}} whp (by Markov's inequality); we assume this throughout. The regime k − d(G) ≍ k is particularly relevant for the Aldous–Diaconis conjecture; see Remark A.4.

We use an entropic method, which involves defining entropic times. One introduces an auxiliary process W to generate the walk S; one then studies the entropy of the process W. Write Z = [Z_1, ..., Z_k] for the (multiset of) generators of the Cayley graph; then G_k corresponds to choosing Z_1, ..., Z_k ∼ iid Unif(G). Here, W_i(t) is, for each i, the number of times generator Z_i has been applied minus the number of times Z_i^{−1} has been applied; W is a rate-1 RW on Z^k. Then S(t) = W(t) · Z when the group is Abelian. (This auxiliary process W is key even when studying nilpotent groups.) For undirected graphs, W is the usual simple RW (abbreviated SRW): a coordinate is selected uniformly at random and incremented/decremented by 1, each with probability 1/2. For directed graphs, inverses are never applied, so a step of W is as follows: a coordinate is selected uniformly at random and incremented by 1; we term this the directed RW (abbreviated DRW).

Definition A.
For γ ∈ N ∪ {∞}, let t_γ^± := t_γ^±(k, G) be the time at which the entropy of the rate-1 RW (ie SRW or DRW, as appropriate) on Z_γ^k is log|G/γG|, where γG := {γg | g ∈ G}; we use the conventions Z_∞ := Z and ∞G := |G|G = {id}. Set

t_*^± := t_*^±(k, G) := max_{γ∈N} t_γ^±(k, G).

We establish cutoff for all Abelian groups, under almost optimal conditions on k in terms of G. This gives an affirmative answer for Abelian groups, in a strong sense, to the primary part of the conjecture (occurrence of cutoff) of Aldous and Diaconis [1, 2], as well as the informal question asked by Diaconis [13]; we discuss the secondary part (time depending only on k and |G|) in Remark A.4. Cutoff has already been established for Abelian groups when k ≫ log|G| with log k ≪ log|G|, as mentioned above; here we handle 1 ≪ k ≲ log|G|. For 1 ≪ k ≲ log|G|, only two groups had been considered previously: Z_2^d in [46] and Z_p with p prime in [32]. Recall that 1 ≪ log k ≪ log|G| is necessary for cutoff for nilpotent G, eg Abelian G; see Remark A.6. More refined statements are given in Theorems 2.4, 3.6 and 4.1.

Theorem A.
Let G be an Abelian group and k an integer with 1 ≪ k ≲ log|G|. Suppose that k − d(G) ≫ 1. Then the RW on G_k^± exhibits cutoff at time t_*^±(k, G) whp. Moreover, if k − d(G) ≍ k and d(G) ≪ log|G|, then t_*(k, G) ∼ t_∞(k, |G|) ∼ k|G|^{2/k}/(2πe). If k > d(G), then t_*(k, G) ≲ k|G|^{2/k} log k. If k − d(G) ≍ log|G|, then t_*(k, G) ≍ k|G|^{2/k}.

We now give some remarks on this theorem. Further remarks are deferred to later in the introduction.

Remark A.1.
For certain regimes of k we find the limit profile of the convergence to equilibrium: we define entropic times t_α and show that d_{G_k}(t_α) →_P Ψ(α), where Ψ is the standard Gaussian tail; see Definition 2.1, Proposition 2.2 and Theorem 2.4. This holds for any Abelian group if, for example, k − d(G) ≍ k and 1 ≪ k ≪ log|G|/log log log|G|, or k − d(G) ≫ 1 and 1 ≪ k ≪ √(log|G|/log log log|G|). The result holds for any 1 ≪ k ≪ log|G| under some constraints on the group. In [25, Theorem A] we show the same for k ≍ log|G|, again with some constraints on G. △

Remark A.2.
From the abstract entropic definition, amongst Abelian groups Z_2^d is the slowest:

max{t_*(k, G) | G an Abelian group with |G| ≤ 2^d} = t_*(k, Z_2^d).

This thus verifies Wilson's conjecture in the Abelian set-up. △

Remark A.3. The entropic time t_∞ arises naturally: we want the typical set

W_t := {w ∈ Z^k | P(W(t) = w) ≪ 1/|G|} = {w ∈ Z^k | −log P(W(t) = w) − log|G| ≫ 1}

to satisfy P(W(t) ∈ W_t) = 1 − o(1). We thus want the entropy of W(t) to be at least log|G|. The arisal of the entropic times t_γ is more delicate; we outline this later. △

One can also consider cutoff in the L_2 distance, instead of TV (ie L_1). For time t ≥ 0, define

d_{G_k}^{(2)}(t) := ‖P_{G_k}(S(t) ∈ ·) − π_G‖_{2,π_G} := (|G|^{−1} Σ_{g∈G} (|G| P_{G_k}(S(t) = g) − 1)^2)^{1/2}.

One can then define mixing and cutoff with respect to the L_2 distance analogously to TV (L_1). It turns out that the L_2 mixing time is at least a constant factor larger than the TV one. Similar considerations to those in Remark A.3 suggest that, for L_2 mixing, the key condition is P(W(2t) = 0) ≪ 1/|G|. This leads us to the following conjecture for the L_2 mixing time, which we state informally.

Conjecture A.
For γ ∈ N ∪ {∞}, let t̃_γ^± := t̃_γ^±(k, G) be the time t at which the return probability for the RW on Z_γ^k at time t is 1/|G|. Set t̃_*^±(k, G) := max_{γ∈N} t̃_γ^±(k, G). Then, under similar conditions to those of Theorem A, whp, the RW on G_k exhibits cutoff in the L_2 metric at time t̃_*^±(k, G).

We also consider cutoff in separation distance. For time t ≥ 0, define

s_{G_k}(t) := max_{g∈G} {1 − |G| P_{G_k}(S(t) = g)}.

One can then define mixing and cutoff with respect to separation distance analogously to TV. It is standard that, under reversibility, the TV and separation mixing times differ by at most a factor 2; see, eg, [34, Lemmas 6.16 and 6.17]. However, Hermon, Lacoin and Peres [22, Theorem 1.1] showed that TV and separation cutoff are not equivalent, and that neither one implies the other. We analyse the regime k − d(G) ≍ k ≳ log|G|; in this regime, we show that separation cutoff occurs whp, and moreover that the cutoff time is the same, up to subleading order, as for TV. A more refined statement is given in Theorem 5.1.

Theorem B.
Let G be an Abelian group and k an integer. Suppose that 1 ≪ log k ≪ log|G| and that k − d(G) is sufficiently large, in a sense made precise in Theorem 5.1. Then the RW on G_k exhibits cutoff in separation distance at time t_*(k, G) whp.

Remark B.
The conditions hold, in particular, in the regime k − d(G) ≍ k ≳ log|G| analysed above. Analogously to Remark A.2, the slowest amongst Abelian groups for separation mixing is Z_2^d. △

The previous results established cutoff. The next results are of a slightly different flavour. They consider nilpotent groups: these are groups G whose lower central series, ie the sequence (G_ℓ)_{ℓ≥0} defined by G_0 := G and G_ℓ := [G_{ℓ−1}, G] for ℓ ≥ 1, stabilises at the trivial group. The results compare the mixing times between different groups; these mixing times are random.
Definition.
For ε ∈ (0 , and a Cayley graph H , write t mix ( ε, H ) := inf { t ≥ | d H ( t ) ≤ ε } . For two sequences H := ( H N ) N ∈ N and H ′ := ( H ′ N ) N ∈ N of random Cayley graphs, say that t mix ( H ) /t mix ( H ′ ) ≤ o (1) whp if there exist non-random sequences ( γ N ) N ∈ N and ( δ N ) N ∈ N with lim N δ N = 0 such that, for all ε, ε ′ ∈ (0 , , we have lim N →∞ P (cid:0) t mix ( ε, H N ) ≤ (1 + δ N ) γ N (cid:1) = 1 = lim N →∞ P (cid:0) (1 − δ N ) γ N ≤ t mix ( ε ′ , H ′ N ) (cid:1) . We establish Wilson’s conjecture in the nilpotent set-up, as the following theorem describes.
Theorem C.
For all diverging d and n with n ≤ 2^d and all nilpotent groups G of size n, if k − log_2 n ≫ 1 and log k ≪ log n, then t_mix(G_k)/t_mix(H_k) ≤ 1 + o(1) whp, where H := Z_2^d.
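As a toy sanity check of this nilpotent-vs-Abelian comparison (our own illustration: the order-8 group below is far too small for any asymptotic claim, and the generating set is fixed rather than random), one can compute exact TV profiles for the mod-2 Heisenberg group, with group law (x, y, z)∘(x′, y′, z′) = (x + x′, y + y′, z + z′ + xy′), against the Abelian group Z_2^3 of the same size:

```python
import itertools

IDENTITY = (0, 0, 0)
ELEMS = list(itertools.product((0, 1), repeat=3))

def heis_mul(a, b):
    # Order-8 Heisenberg group over Z_2: step-2 nilpotent, non-Abelian.
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2, (a[2] + b[2] + a[0] * b[1]) % 2)

def abel_mul(a, b):
    # Z_2^3: an Abelian group of the same size.
    return tuple((x + y) % 2 for x, y in zip(a, b))

def tv_profile(mul, gens, t_max):
    """Exact TV distance from uniform of the lazy RW started at the identity;
    generators and their inverses are applied uniformly at random."""
    inv = {g: next(h for h in ELEMS if mul(g, h) == IDENTITY) for g in ELEMS}
    moves = list(gens) + [inv[g] for g in gens]
    p = {e: 0.0 for e in ELEMS}
    p[IDENTITY] = 1.0
    out = []
    for _ in range(t_max):
        q = {e: 0.5 * p[e] for e in ELEMS}       # hold with probability 1/2
        for x in ELEMS:
            w = 0.5 * p[x] / len(moves)
            for m in moves:
                q[mul(x, m)] += w
        p = q
        out.append(0.5 * sum(abs(p[e] - 1 / 8) for e in ELEMS))
    return out

gens = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]  # generates both groups
d_heis = tv_profile(heis_mul, gens, 100)
d_abel = tv_profile(abel_mul, gens, 100)
```

Both profiles tend to 0, and in this tiny example the non-Abelian walk equilibrates at least as fast as its Abelian counterpart, in the spirit of Theorem D below; the theorem itself is, of course, an asymptotic statement about random generators.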
As noted in Remark A.2, for Abelian groups this follows from our cutoff result and the abstract entropic definition of the cutoff time t_*(k, G) for Abelian G. The extension to nilpotent groups is then established by Theorem D below, which is of independent interest. It is quite significantly stronger than Wilson's conjecture in the nilpotent set-up. We can use it to establish cutoff for a class of nilpotent groups with 'small commutator'; see Corollary D.1.

Theorem D.
Let G be a nilpotent group. Set Ḡ := ⊕_{ℓ=1}^{L} (G_{ℓ−1}/G_ℓ), where (G_ℓ)_{ℓ≥0} is the lower central series of G and L := min{ℓ ≥ 1 | G_ℓ = {id}}. Suppose that 1 ≪ log k ≪ log|G| and k − d(Ḡ) ≫ 1. Then t_mix(G_k)/t_mix(Ḡ_k) ≤ 1 + o(1) whp.

Besides being the key ingredient in the proof of Wilson's conjecture in the nilpotent case, we demonstrate that this result is tight enough to establish cutoff for some nilpotent groups. For a group G, denote by G_com := [G, G] its commutator subgroup and by G_ab := G/G_com its Abelianisation.

Corollary D.1.
Let G be a finite, non-Abelian, nilpotent group and k such that 1 ≪ log k ≪ log|G|.
· If k ≲ log|G|, then suppose that k ≫ d([G, G]) log|[G, G]| and k − d(G_ab) ≫ d([G, G]).
· If k ≫ log|G|, then suppose only that log|[G, G]| ≪ log|G_ab|.
Then the RW on G_k exhibits cutoff at t_*(k, G_ab) whp.

For step-2 nilpotent groups, [G, G] is Abelian. The above corollary is thus particularly applicable for these groups. A particular example of such groups is special groups with small commutator. For a prime p, a p-group is special if it is step-2 and has centre Z(G), Frattini subgroup Φ(G) and commutator subgroup [G, G] all equal and elementary Abelian (ie isomorphic to Z_p^s for some s). In this case, also G_ab ≅ Z_p^r, where r := ℓ − s and ℓ := log_p|G|. Using this particular form of the Abelianisation and commutator, we can relax the conditions on k. Note that t_p(k, Z_p^r) is the time at which the entropy of the RW on Z_p^k reaches log(p^r) = log|Z_p^r|.

Corollary D.2.
Let p be prime, G be a non-Abelian, special p-group and k be such that 1 ≪ log k ≪ log|G|. Let r := log_p|G_ab|, s := log_p|G_com| and ℓ := r + s = log_p|G|. Suppose that k ≥ ℓ.
· If k ≲ log|G|, then suppose that k ≫ s log p and k − r ≫ s.
· If k ≫ log|G|, then suppose only that s ≪ r.
Then the RW on G_k exhibits cutoff at t_*(k, G_ab) = t_p(k, Z_p^r) whp, conditional on G_k being connected. If (k − r)p ≫ 1, then G_k is connected whp. If k − r ≍ k and p ≫ 1, then t_p(k, Z_p^r) ∼ t_∞(k, Z_p^r).

Special groups are ubiquitous amongst p-groups of a given size in a precise, quantitative sense. Hence Corollary D.2 is applicable to many groups. See Remark 6.11 for a precise statement as well as some asymptotic expressions. Sims [44] gives, for given (p, ℓ, s), a simple, explicit description of all special groups of size p^ℓ whose commutator is of size p^s.

Extra special groups satisfy G_com ≅ Z_p (so d(G_com) = 1) and |G| = p^{2d−1} for some integer d ≥ 3. For given d and p = 2, up to isomorphism there are only two extra special groups. One of these is the Heisenberg group, which can be defined also for p not prime. For (not necessarily prime) m, d ∈ N, the Heisenberg group H_{m,d} is the set of triples (x, y, z) ∈ Z_m^{d−1} × Z_m^{d−1} × Z_m with

(x, y, z) ∘ (x′, y′, z′) := (x + x′, y + y′, z + z′ + x · y′),

where x · y′ is the usual dot product for vectors in Z_m^{d−1}. We have H_{m,d}^ab ≅ Z_m^{2d−2} and H_{m,d}^com ≅ Z_m. For p prime, H_{p,d} with d ≫ 1 is an extra special p-group with r = 2d − 2 and s = 1. The following corollary thus focusses on H_{m,d} with m not (necessarily) prime. Note that t_∞(k, Z_m^r) is the time at which the entropy of the RW on Z^k = Z_∞^k reaches log(m^r) = log|Z_m^r|.

Corollary D.3.
Let m, d ∈ N with d ≫ 1. Suppose that k − 2d ≫ 1, k ≫ log m and log k ≪ d log m ≍ log|H_{m,d}|. Then, whp, the RW on (H_{m,d})_k exhibits cutoff at t_*(k, H_{m,d}^ab ≅ Z_m^{2d−2}). If additionally k − 2d ≍ k and m ≫ 1, then

t_*(k, Z_m^{2d−2}) = t_m(k, Z_m^{2d−2}) ∼ t_∞(k, Z_m^{2d−2}) ∼ k m^{(4d−4)/k}/(2πe).

If m is fixed (and thus d ≫ 1), the condition k ≫ log m is absorbed into k ≫ 1. Thus this corollary handles arbitrary Heisenberg groups H_{m,d} with m fixed and k − 2d ≫ 1.

Remark D.1.
Wilson's conjecture requires k − log_2|G| ≫ 1 and compares t_mix(G_k) with t_*(k, Z_2^d). We have d(G) ≤ max{ℓ ∈ N | p^ℓ divides |G| for some prime p} ≤ log_2|G|; often d(G) is much smaller than log|G|. (In fact, in some precise sense of choosing an Abelian group H uniformly, typically d(H) ≪ log|H|.) Further, t_mix(Ḡ_k) may be significantly smaller than t_*(k, Z_2^d). The bounds on t_*(k, Ḡ), for Abelian Ḡ, described in Theorem A complement the upper bound t_mix(G_k) ≤ t_mix(Ḡ_k) to give explicit bounds on t_mix(G_k) which hold whp. △

Remark D.2.
In the course of proving this theorem, we prove an exact relation between the L_2 distances for the RWs on G_k and Ḡ_k, namely E(d_{G_k}^{(2)}(t)) ≤ E(d_{Ḡ_k}^{(2)}(t)). We actually prove a more refined version of this, which allows us to compare the modified L_2 distances, ie the L_2 distances conditional on W(t) lying in some 'typical set', ie W(t) ∈ W_t; recall Remark A.3. We use precisely this modified L_2 calculation to upper bound the TV mixing time in the proof of Theorem A. The comparison of TV distances, namely Theorem D, follows since P(W(t) ∈ W_t) = 1 − o(1). △

As explained below, it is natural to conjecture that Theorem D does not require G to be nilpotent. The definition of the Abelian group Ḡ corresponding to G required G to be nilpotent. We extend this definition to allow a general group G. (The definitions are equivalent if G is nilpotent.) The following conjecture extends Theorem D; it contains, as a special case, Wilson's conjecture.

Conjecture D.
Let G be a group. Let (G_ℓ)_{ℓ≥0} be its lower central series, which stabilises at some G_L, with L := min{ℓ ≥ 1 | G_ℓ = G_{ℓ+1}}. Let the prime decomposition of |G_L| be |G_L| = ∏_{j=1}^{r} p_j. Set Ḡ := (⊕_{ℓ=1}^{L} (G_{ℓ−1}/G_ℓ)) ⊕ (⊕_{j=1}^{r} Z_{p_j}). Suppose that 1 ≪ log k ≪ log|G| and k − d(Ḡ) ≫ 1. Then t_mix(G_k)/t_mix(Ḡ_k) ≤ 1 + o(1) whp.

We show in Theorem D, for nilpotent groups, that being non-Abelian can only speed up the mixing. Finite nilpotent groups are intuitively thought of as 'almost Abelian'; this is (partially) because two elements of co-prime orders must commute. Thus removing the nilpotency assumption should only make the group 'farther from Abelian' and speed up the mixing.
Our last result considers the expansion properties of the random Cayley graph.
Definition E.
The isoperimetric constant of a finite d-regular graph G = (V, E) is defined as

Φ_* := min_{1 ≤ |S| ≤ |V|/2} Φ(S), where Φ(S) := (d|S|)^{−1} |{{a, b} ∈ E | a ∈ S, b ∈ S^c}|.

Theorem E.
Let G be a nilpotent group. Set Ḡ := ⊕_{ℓ=1}^{L} (G_{ℓ−1}/G_ℓ), where (G_ℓ)_{ℓ≥0} is the lower central series of G and L := min{ℓ ≥ 1 | G_ℓ = {id}}. For all c > 0, there exists a c′ > 0 so that if k − d(Ḡ) ≥ c log|G|, then Φ_*(G_k) ≥ c′ whp.

Remark E.
This theorem is already known when k − log|G| ≍ k, without the nilpotent restriction; it is a celebrated result of Alon and Roichman [3]. It thus suffices to consider only k ≍ log|G|. △

Here we make some remarks on Theorem A, in addition to the three given above.

Remark A.4.
When d(G) ≪ log|G| and k − d(G) ≍ k, one can check that t_*(k, G) is the same as the time at which the entropy of the rate-1 RW on Z^k is log|G|. When 1 ≪ k ≪ log|G|, this is k|G|^{2/k}/(2πe), up to smaller order terms; see Proposition 3.2b. This means that the natural extension of the Aldous–Diaconis conjecture to the regime 1 ≪ k ≲ log|G| is verified in full for Abelian groups when d(G) ≪ log|G| and k − d(G) ≍ k.

However, when k ≍ log|G| ≍ d(G), while cutoff is still exhibited whp, the cutoff time does not depend only on k and |G|. Eg, for suitable k ≍ r, the equal-size groups Z_2^{2r} and Z_4^r give rise to mixing times which differ by a constant factor. There are even regimes with 1 ≪ k ≪ log|G| where the claim does not hold, provided 1 ≪ k − d(G) ≪ k; see [25, Proposition 3.2 and Theorem 3.4], where Z_p^d is studied (with both p and d allowed to diverge). △

Remark A.5.
For k ≫ log|G| with log k ≪ log|G|, cutoff has been established for all Abelian groups at an explicit time, and this time is an upper bound on the mixing time for arbitrary (not just Abelian) groups; see the account of previous work below. This time is asymptotically equivalent to t_∞(k, |G|), which in turn is equivalent to t_*(k, |G|); see, eg, [27, Proposition B.19]. △

Remark A.6.
This article establishes cutoff in a variety of set-ups, but always in the regime 1 ≪ log k ≪ log|G|. This leaves the regimes k ≍ 1 and log k ≍ log|G|, for which there is no cutoff for any choice of generators: when k ≍ 1, this holds whenever the group is nilpotent; when log k ≍ log|G|, this holds for all groups. The former result is due to Diaconis and Saloff-Coste [16]; we give a short exposition of this, together with the case log k ≍ log|G|, in [25]. △

Remark A.7.
Our approach lifts the walk S from the Abelian Cayley graph G(Z) to a walk W on the free Abelian group with k = |Z| generators. Note that the walk W is independent of Z, ie of which k generators are used. We then study the lifted walk W, in particular its entropic profile, before projecting back from W to S. This gives us a candidate mixing time. △

Remark A.8.
The theorem is established via two distinct approaches: the first applies for k not growing too rapidly; the second can be seen as a refinement of the first, optimised for larger k, where the first breaks down. We combine the two approaches to analyse an interim regime of k. We separate the exposition of the approaches: they are given in §2, §3 and §4, respectively. In the first two, a concept of entropic times is defined. △

In this subsection, we give a fairly comprehensive account of previous work on mixing and cutoff for random walk on random Cayley graphs; we compare our results with existing ones. The occurrence of cutoff in particular has received a great deal of attention over the years. We also mention, where relevant, other results which we have proved in companion papers.

Aldous and Diaconis [1, Page 40] stated their conjecture for k ≫ log|G|. A more refined version is given by Dou [19, Conjectures 3.1.2 and 3.4.5]; see also [30, 43]. An informal, more general variant was reiterated by Diaconis in [13, Chapter 4G, Question 8]; he gave some related open questions recently in [14]. For k ≫ log|G| with log k ≪ log|G|, cutoff has been established at time

T(k, |G|) := log|G| / log(k/log|G|) = (ρ/(ρ − 1)) log_k|G|, where ρ is defined by k = (log|G|)^ρ.

(To have k ≫ log|G|, one needs ρ − 1 ≫ 1/log log|G|.) See also Dou [19] and Hildebrand [31]. There is a trivial diameter-based lower bound of log_k|G|. If ρ ≫ 1, ie k is super-polylogarithmic in |G|, then T(k, |G|) ∼ log_k|G|. Thus cutoff is established for all groups for such k.

In [24, Theorem B], using the group U_{m,d} of d × d unit upper-triangular matrices with entries in Z_m, we disprove the part of the conjecture concerning the independence of the cutoff time from the algebraic structure of the group: for suitable k ≫ log|U_{m,d}|, there is cutoff at time of order T(k, |U_{m,d}|)/d. In fact, T(k, |U_{m,d}|) does not even capture the correct order: letting d → ∞ sufficiently slowly, we still have k ≫ log|U_{m,d}|, and the cutoff time, of order T(k, |U_{m,d}|)/d, is o(T(k, |U_{m,d}|)).

There has been some investigation into the regime 1 ≪ k ≲ log|G|, but with much less success. Hildebrand [30, Theorem 4] showed that the mixing time must be super-polylogarithmic in |G|, unlike for k ≫ log|G|. Wilson [46, Theorem 1] established cutoff for Z_2^d; this naturally requires k ≥ d = log_2|G|. Regarding 1 ≪ k ≪ log|G|, a breakthrough came recently when Hough [32, Theorem 1.7] established cutoff for Z_p with 1 ≪ k ≤ log p/log log p and p a (diverging) prime. The techniques were specialised to their respective cases; we consider arbitrary Abelian groups.

In the direction of comparison of mixing times, there has been much less work. The only work of note (of which we are aware) is by Pak [39]. There he studies universal mixing bounds (ie ones valid for all groups), but his bounds are not tight; they are always at least a constant factor away from those conjectured by Wilson [46] (and by us above). A related universal bound in which Z_2^d is the worst case is given by Pak [40]. Let ϕ_k(G) := P(G_k is connected), ie the probability that the group G is generated by k uniformly chosen generators. Then Pak [40, Lecture 1, Theorem 6] proves that if |G| ≤ 2^d then ϕ_k(G) ≥ ϕ_k(Z_2^d) for all k.
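For G = Z_2^d, the quantity ϕ_k(G) admits a closed form, since k uniform elements generate Z_2^d exactly when they span F_2^d as a vector space: ϕ_k(Z_2^d) = ∏_{i=0}^{d−1}(1 − 2^{i−k}). The sketch below (our own illustration, not code from [40]) checks this formula against Monte Carlo simulation, testing the spanning property by Gaussian elimination over F_2:

```python
import random

def phi_exact(k, d):
    """P(k iid uniform vectors span F_2^d) = prod_{i=0}^{d-1} (1 - 2^(i-k))."""
    if k < d:
        return 0.0
    prob = 1.0
    for i in range(d):
        prob *= 1.0 - 2.0 ** (i - k)
    return prob

def spans(vectors, d):
    """Rank over F_2 via an XOR linear basis; vectors encoded as d-bit ints."""
    basis = {}                      # pivot bit -> basis vector
    for v in vectors:
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = v
                break
            v ^= basis[pivot]       # reduce by the existing pivot
    return len(basis) == d

rng = random.Random(0)
k, d, trials = 8, 5, 40000
hits = sum(spans([rng.randrange(1 << d) for _ in range(k)], d)
           for _ in range(trials))
estimate = hits / trials
exact = phi_exact(k, d)
```

The agreement between `estimate` and `exact` is within Monte Carlo error; the formula also makes Pak's monotone picture transparent, since each factor 1 − 2^{i−k} increases with k.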
The study of random walks on Heisenberg groups and other groups of upper triangular matrices has a rich history. We give a detailed historical account in [24], which concerns the group of d × d unit upper triangular matrices with entries in Z_m. By viewing the Heisenberg group as d × d matrices, the group of d × d unit upper triangular matrices can be seen as a supergroup of the d-dimensional Heisenberg group.
A celebrated result of Alon and Roichman [3, Corollary 1] asserts that, for any finite group G, the random Cayley graph with at least C_ε log |G| random generators is whp an ε-expander, provided C_ε is sufficiently large (in terms of ε). (A graph is an ε-expander if its isoperimetric constant is bounded below by ε; up to a reparametrisation, this is equivalent to the spectral gap of the graph being bounded below by ε.) There has been a considerable line of work building upon this general result of Alon and Roichman. (Pak [38] proves a similar result.) Their proof was simplified and extended, independently, by Loh and Schulman [35] and Landau and Russell [33]; both were able to replace log |G| by log D(G), where D(G) is the sum of the dimensions of the irreducible representations of the group G; for Abelian groups D(G) = |G|. A ‘derandomised’ argument for Alon–Roichman is given by Chen, Moore and Russell [10]. Both [10, 33] use some Chernoff-type bounds on operator-valued random variables.
Christofides and Markström [11] improve these further by using matrix martingales and proving a Hoeffding-type bound on operator-valued random variables. They also improved the quantification for C_ε, showing that one may take C_ε := 1 + c_ε, where c_ε → 0 as ε → 0; this means that, whp, the graph is an ε-expander whenever k ≥ (1 + c_ε) log D(G). They also generalise Alon–Roichman to random coset graphs. The proofs use tail bounds on the (random) eigenvalues.
It is well-known that D(G) ≥ √|G|. Thus all these results require at least k ≥ ½ log |G|. Our result, on the other hand, applies for k ≥ c log |G| for any constant c >
0, provided the underlying group is suitable—eg, this is the case if G is Abelian and d(G) ≪ log |G|; another example is given by d × d unit upper triangular matrix groups with entries in Z_m if m ≫ 1. It was shown in [32] that the order of the relaxation time of the RW on the cyclic group Z_p is p^{2/k} when 1 ≪ k ≤ log p / log log p. In [26, Theorem D], we restrict to Abelian groups under the assumption k − d(G) ≍ k and determine (via an altogether different method) the order of the relaxation time whenever 1 ≪ k ≲ log |G|: it is |G|^{2/k} whp. Thus k ≍ log |G| and k − d(G) ≍ k gives relaxation time of order 1, which is equivalent to being an expander by the Cheeger inequalities. Further, we show for ‘most G’ (in a precise sense) that k − d(G) ≍ k suffices. This extends, in the Abelian set-up, the above results.
We now put our results into a broader context. A common theme in the study of mixing times is that ‘generic’ instances often exhibit the cutoff phenomenon. In this set-up, a family of transition matrices chosen from a certain family of distributions is shown to, whp, give rise to a sequence of Markov chains which exhibits cutoff. A few notable examples include random birth and death chains [17, 45], the simple or non-backtracking random walk on various models of sparse random graphs, including random regular graphs [37], random graphs with given degrees [5, 6, 7, 8], the giant component of the Erdős–Rényi random graph [7] (where the authors consider mixing from a ‘typical’ starting point) and a large family of sparse Markov chains [8], as well as random walks on a certain generalisation of Ramanujan graphs [9] and random lifts [9, 12].
A recurring idea in the aforementioned works is that the cutoff time can be described in terms of entropy. One can look at some auxiliary random process which up to the cutoff time can be coupled with, or otherwise related to, the original Markov chain—often in the above examples this is the RW on the corresponding Benjamini–Schramm local limit.
The cutoff time is then shown to be (up to smaller order terms) the time at which the entropy of the auxiliary process equals the entropy of the invariant distribution of the original Markov chain. It is a relatively new technique, and has been used recently in [7, 8, 9, 12]. For ‘most’ regimes of k, this is the case for us too; further, for the non-Abelian groups considered in [24] we use a similar idea. As our auxiliary random process, we use a SRW, respectively DRW, in the undirected, respectively directed, case.
With the exception of the very recent [28], to the best of our knowledge, in all previous instances where the entropic method was used the graphs were tree-like. This is not the case for us: in the Abelian set-up, G_k has cycles of length 4 (potentially up to the direction of edges). Admittedly, this has less of an impact on the walk since each vertex is of diverging degree.
Consider a finite group G. Let Z be a multisubset of G. We consider geometric properties, namely through distance metrics and the spectral gap, of the Cayley graph of (
G, Z); we call Z the generators. The undirected, respectively directed, Cayley graph of G generated by Z, denoted G^−(Z), respectively G^+(Z), is the multigraph whose vertex set is G and whose edge multiset is [{g, g·z} | g ∈ G, z ∈ Z], respectively [(g, g·z) | g ∈ G, z ∈ Z]. If the walk is at g ∈ G, then a step in G^+(Z), respectively G^−(Z), involves choosing a generator z ∈ Z uniformly at random and moving to gz, respectively one of gz or gz^{−1} each with probability ½.
We focus attention on the random Cayley graph defined by choosing Z_1, ..., Z_k ∼ iid Unif(G); when this is the case, denote G_k^+ := G^+(Z) and G_k^− := G^−(Z). While we do not assume that the Cayley graph is connected (ie, Z may not generate G), in the Abelian set-up the random Cayley graph G_k is connected whp whenever k − d(G) ≫
1; see [25, Lemma 8.1]. In the nilpotent set-up, this is the case whenever k − d(G/[G, G]) ≫
1; see [24, Remark E.1].
The graph depends on the choice of Z. Sometimes it is convenient to emphasise this; we use a subscript, writing P_{G(z)}(·) if the graph is generated by the group G and multiset z. Analogously, P_{G_k}(·) stands for the random law P_{G(Z)}(·) where Z = [Z_1, ..., Z_k] with Z_1, ..., Z_k ∼ iid Unif(G).
The directed Cayley graph G^+(z) is simple if and only if no generator is picked twice, ie z_i ≠ z_j for all i ≠ j. The undirected Cayley graph G^−(z) is simple if in addition no generator is the inverse of any other, ie z_i ≠ z_j^{−1} for all i, j ∈ [k]. In particular, this means that no generator is of order 2, as any s ∈ G of order 2 satisfies s = s^{−1}—this gives a multiedge between g and gs for each g ∈ G. The RW on G^−(z) is equivalent to an adjusted RW on G^+(z) where, when a generator s ∈ z is chosen, either s or s^{−1} is applied, each with probability ½. Abusing terminology, we relax the definition of simple Cayley graphs to allow order-2 generators, ie we remove the condition z_i ≠ z_i^{−1} for all i.
Given a group G and an integer k, we are drawing the generators Z_1, ..., Z_k independently and uniformly at random. It is not difficult to see that the probability of drawing a given multiset depends only on the number of repetitions in that multiset. Thus, conditional on being simple, G_k is uniformly distributed on all simple degree-k Cayley graphs. Since k ≪ √|G|, the probability of simplicity tends to 1 as |G| → ∞. So when we say that our results hold “whp (over Z)”, we could equivalently say that the result holds “for almost all degree-k simple Cayley graphs of G”.
Our asymptotic evaluation does not depend on the particular choice of Z, so the statistics in question depend very weakly on the particular choice of generators for almost all choices. In many cases, the statistics depend only on G via |G| and d(G). This is a strong sense of ‘universality’.
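The walk S(t) = W(t) · Z described above is easy to simulate directly. The following minimal sketch (plain Python; the choice G = Z_101 with k = 5 generators is hypothetical and purely illustrative, not taken from the text) draws random generators and estimates by Monte Carlo the TV distance of the undirected walk from uniformity:

```python
import random
from collections import Counter

def sample_endpoint(n, Z, t, rng):
    """One sample of S(t) on Z_n: at each step pick a generator z from Z
    uniformly and move to s + z or s - z, each with probability 1/2."""
    s = 0
    for _ in range(t):
        z = rng.choice(Z)
        s = (s + rng.choice((z, -z))) % n
    return s

def empirical_tv(n, Z, t, rng, samples=20000):
    """Empirical TV distance between the law of S(t) and Unif(Z_n)."""
    counts = Counter(sample_endpoint(n, Z, t, rng) for _ in range(samples))
    return 0.5 * sum(abs(counts.get(x, 0) / samples - 1 / n) for x in range(n))

rng = random.Random(1)
n, k = 101, 5                              # hypothetical: G = Z_101, k = 5
Z = [rng.randrange(n) for _ in range(k)]   # Z_1, ..., Z_k iid Unif(G)
tv_early = empirical_tv(n, Z, 2, rng)
tv_late = empirical_tv(n, Z, 50, rng)
print(tv_early, tv_late)
```

Even this toy example shows the qualitative behaviour: the empirical TV distance is macroscopic after a couple of steps and drops to the level of sampling noise once the walk has mixed.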
This paper is one part of an extensive project on random Cayley graphs. There are three main articles [23, 24, 26] (including the current one [23]), a technical report [25] and a supplementary document [27].
Each main article is readable independently.
The main objective of the project is to establish cutoff for the random walk and to determine whether the cutoff time can be written in a way that, up to subleading order terms, depends only on k and |G|; we also study universal mixing bounds, valid for all, or large classes of, groups. Separately, we study the distance of a uniformly chosen element from the identity, ie typical distance, and the diameter; the main objective is to show that these distances concentrate and to determine whether the value at which these distances concentrate depends only on k and |G|.
[23] Cutoff phenomenon (and Aldous–Diaconis conjecture) for general Abelian groups; also, for nilpotent groups, expander graphs and comparison of mixing times with Abelian groups.
[26] Typical distance, diameter and spectral gap for general Abelian groups.
[24] Cutoff phenomenon and typical distance for upper triangular matrix groups.
[25] Additional results on cutoff and typical distance for general Abelian groups.
[27] Deferred technical results, mainly regarding random walk on Z and the volume of lattice balls.
In this section, we prove the first part of the upper bound on mixing for arbitrary Abelian groups. The main result of the section is Theorem 2.4. The outline of the section is as follows: we first describe the ‘entropic method’; we then define the entropic times and state a CLT; the remaining subsections state the main theorem precisely and prove the lower and upper bounds on mixing.
We use an ‘entropic method’, as mentioned above. Consider the auxiliary process (W(t))_{t≥0}, recording how many times each generator has been used: for t ≥
0, for each generator i = 1, ..., k, write W_i(t) for the number of times that it has been picked by time t. By independence, W(·) forms a rate-1 DRW on Z_+^k. For the undirected case, recall that we either apply a generator or its inverse; when we apply the inverse of generator i, increment W_i → W_i − 1 (rather than W_i → W_i + 1). In this case, W(·) is a SRW on Z^k. Since the underlying group is Abelian, the order in which the generators are applied is irrelevant and generator-inverse pairs cancel; hence we can write S(t) = ∑_{i=1}^k W_i(t) Z_i = W(t) · Z.
Recall that the invariant distribution is uniform, regardless of the group. For an Abelian group G, we propose as the mixing time the time at which the auxiliary process W obtains entropy log |G|. The reason for this is the following: take t to be slightly larger than the above entropic time; using the equivalence −log µ ≥ log |G| if and only if µ ≤ 1/|G|, ‘typically’ W(t) takes values to which it assigns probability smaller than 1/|G|; informally, this means that W(t) is ‘well spread out’. If we could immediately deduce that S(t) typically takes values to which it assigns probability approximately 1/|G|, we would be basically done. However, one could have two independent copies S and S′ (using the same generators Z) with S(t) = S′(t) but W(t) ≠ W′(t); the uniformity of the generators will show that, on average, this is unlikely. We thus deduce that S(t) is well spread out, ie well mixed. In contrast, if the entropy is much smaller than log |G|, then W(t) is not well spread out: it is highly likely to live on a set of size o(|G|). The same must then be true for S(t); hence S(t) is not mixed.
We now define precisely the notion of entropic times. Write µ_t, respectively ν_s, for the law of W(t), respectively W_1(sk); so µ_t = ν_{t/k}^{⊗k}.
Define Q_i(t) := −log ν_{t/k}(W_i(t)), and set Q(t) := −log µ_t(W(t)) = ∑_{i=1}^k Q_i(t). So E(Q_1(t)) and E(Q(t)) are the entropies of W_1(t) and W(t), respectively. Observe that t ↦ E(Q(t)) : [0, ∞) → [0, ∞) is a smooth, increasing bijection.
Definition 2.1 (Entropic and Cutoff Times). For all k, n ∈ N and all α ∈ R, define t_α := t_α(k, n) so that E(Q_1(t_α)) = (log n + α√(vk))/k and s_α := t_α/k, where v := Var(Q_1(t_0)), assuming that log n + α√(vk) ≥ 0. We call t_0 the entropic time and the {t_α}_{α∈R} cutoff times.
Direct calculation with the Poisson distribution and SRW on Z gives the following relations. These calculations are sketched below; full details are given in [27, §A].
Proposition 2.2 (Entropic and Cutoff Times; [27, Proposition A.2]). Assume that 1 ≪ k ≪ log n. For all α ∈ R, we have t_α ≈ t_0 and furthermore t_0 ≈ k n^{2/k}/(2πe) and (t_α − t_0)/t_0 ≈ α√(2/k).
Since Q = ∑_1^k Q_i is a sum of k iid random variables, Q(t_0) concentrates around log n. One can show that if the time is multiplied by a factor 1 + ξ for any constant ξ > 0, then the entropy changes by order ξk ≫ √(Var(Q(t_0))). Thus Q((1 + ξ)t_0) concentrates around this new value. In particular, the following hold:
µ_{(1+ξ)t_0}(W((1 + ξ)t_0)) = exp(−Q((1 + ξ)t_0)) ≪ 1/n whp; µ_{(1−ξ)t_0}(W((1 − ξ)t_0)) = exp(−Q((1 − ξ)t_0)) ≫ 1/n whp.
The following proposition quantifies this change in entropy and this concentration; see [27, §A].
Proposition 2.3 (CLT; [27, Proposition A.3]). Assume that 1 ≪ k ≪ log n. For all α ∈ R, we have P(Q(t_α) ≤ log n ± ω) → Ψ(α) for ω := Var(Q(t_0))^{1/4} = (vk)^{1/4}. (There is no specific reason for choosing this ω. We just need some ω with 1 ≪ ω ≪ (vk)^{1/2}.)
In this subsection, we sketch details towards a proof of Proposition 2.2. The full, rigorous details can be found in [27, Proposition A.2], where all of the approximations below are justified. Recall that t_0 is the time t at which the entropy of W_1(t), which is a rate-1/k process, is log n/k. We need to find the variance Var(Q_1(s_0 k)), as this is used in the definition of t_α, given in Definition 2.1. In the sketch below, we replace Var(Q_1(t_0)) by an approximation. For s ≥
0, denote X_s := W_1(sk), and denote the entropy of X_s by H(s). The target entropy is log n/k ≫
1, and so s ≫
1. For s ≫
1, we find that X_s has approximately the normal N(E(X_s), s) distribution. Translating the random variable has no effect on its entropy, and so we approximate the entropy of X_s, which we denoted H(s), by the entropy of a N(0, s) random variable, which we denote H̃(s). Direct calculation with the normal distribution shows that
H̃(s) = ½ log(2πes) and hence H̃′(s) = 1/(2s).
Define s̃_α as the entropic times for the approximation:
H̃(s̃_α) = (log n + α√(ṽk))/k where ṽ := Var(Q̃_1(s̃_0 k)),
where Q̃_1(sk) is the analogue of Q_1(sk), except with W_1(sk) replaced by N(0, s). Hence s̃_0 = n^{2/k}/(2πe) ≫ 1. By direct calculation, specific to the normal distribution, for s ≫ 1, Var(Q̃_1(sk)) = ½. As mentioned above, for this sketch, to ease the calculation of t_α in Definition 2.1, we replace Var(Q_1(t_0)) by its approximation ½, and assume the above normal distribution approximation. In order to find the window, assuming for the moment that α >
0, we write s̃_α − s̃_0 = ∫_0^α (ds̃_a/da) da. Again, we replace s_α with s̃_α. By definition, s̃_α satisfies
H̃(s̃_α) = log n/k + α/√(2k), and hence (ds̃_α/dα) H̃′(s̃_α) = 1/√(2k).
Using the expressions for ds̃_a/da and H̃′(s) above, we find that
s̃_α − s̃_0 = √(2/k) ∫_0^α s̃_a da ≈ √(2/k) ∫_0^α s̃_0 da = α s̃_0 √(2/k),
since s̃_a only varies by subleading order terms over a ∈ [0, α]. The argument is analogous for α < 0. This approximates the entropic times s_α by s̃_α, ie approximating W_1(sk) by N(E(X_s), s). It will turn out that this approximation is sufficiently good for the results to pass over to the original case, ie to apply to s_α and t_α = s_α k. This is made rigorous with a local CLT.
In this subsection, we state precisely the main theorem of the section. There are some simple conditions on k, in terms of d(G) and |G|, needed for the upper bound. Hypothesis A.
The sequence (k_N, G_N)_{N∈N} satisfies Hypothesis A if the following hold:
lim_{N→∞} |G_N| = ∞, lim_{N→∞} (k_N − d(G_N)) = ∞ and
(k_N − d(G_N))/k_N ≥ k_N/log |G_N| + 2 d(G_N) (log log k_N)/log |G_N| for all N ∈ N.
In Remark 2.5 below, we give some sufficient conditions for Hypothesis A to hold. Throughout the proofs, we drop the subscript-N from the notation, eg writing k or n, considering sequences implicitly. Recall that we abbreviate the TV distance from uniformity at time t as
d_{G_k,N}(t) = ‖P_{G_N([Z_1,...,Z_{k_N}])}(S(t) ∈ ·) − π_{G_N}‖_TV where Z_1, ..., Z_{k_N} ∼ iid Unif(G_N).
We now state the main theorem of this section. Recall that Ψ is the standard Gaussian tail.
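The two normal-approximation facts used in the sketch above—that H̃(s) = (log n)/k is solved by s̃_0 = n^{2/k}/(2πe), and that Var(Q̃_1) = ½ regardless of s—can be checked numerically. The sketch below uses hypothetical values of n, k and s, purely for illustration:

```python
import math, random

def H_normal(s):
    # differential entropy of N(0, s): (1/2) log(2*pi*e*s)
    return 0.5 * math.log(2 * math.pi * math.e * s)

# closed-form entropic time of the approximation: H_normal(s) = log(n)/k
n, k = 10 ** 12, 8                       # hypothetical sizes, 1 << k << log n
s0 = n ** (2 / k) / (2 * math.pi * math.e)
gap = abs(H_normal(s0) - math.log(n) / k)

# Var(-log density of N(0,s) at X) = Var(X^2)/(4 s^2) = 2 s^2/(4 s^2) = 1/2
rng, s = random.Random(0), 7.3           # any s: the answer does not depend on s
qs = [0.5 * math.log(2 * math.pi * s) + x * x / (2 * s)
      for x in (rng.gauss(0.0, math.sqrt(s)) for _ in range(200000))]
m = sum(qs) / len(qs)
var = sum((q - m) ** 2 for q in qs) / len(qs)
print(gap, round(var, 2))
```

The first check is exact up to floating-point error; the second is a Monte Carlo estimate, which concentrates near ½.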
Theorem 2.4.
Let (k_N)_{N∈N} be a sequence of positive integers and (G_N)_{N∈N} a sequence of finite, Abelian groups; for each N ∈ N, define Z^{(N)} := [Z_1, ..., Z_{k_N}] by drawing Z_1, ..., Z_{k_N} ∼ iid Unif(G_N). Suppose that the sequence (k_N, G_N)_{N∈N} satisfies Hypothesis A. For all α ∈ R and N ∈ N, write t_{α,N} := t_α(k_N, |G_N|). Let α ∈ R. Then
t_{α,N}/t_{0,N} → 1 and d^±_{G_k,N}(t_{α,N}) → Ψ(α) (in probability) as N → ∞.
That is, whp there is cutoff at time t_0 with profile given by {t_α}_{α∈R}: for all ε ∈ (0, 1), the difference in the mixing times t_mix(ε) − t_mix(½) is given, up to subleading order terms, by t_{Ψ^{−1}(ε)} − t_0. Moreover, the implicit lower bound on the TV distance holds deterministically, ie for all choices of generators.
Remark.
Using Proposition 2.2, we can write the cutoff statement in the form
(t_mix(ε) − t_0)/w → Ψ^{−1}(ε) (in probability) for all ε ∈ (0, 1),
where t_0 ≈ k|G|^{2/k}/(2πe) is the mixing time and w ≈ √k |G|^{2/k}/(√2 πe) the window. △
Remark 2.5.
Write n := |G|. Note that the final condition of Hypothesis A implies that k ≤ log n; so we are in the regime 1 ≪ k ≲ log n. Any of the following conditions imply Hypothesis A:
1 ≪ k ≲ √(log n / log log log n) and k − d ≫ 1;
1 ≪ k ≲ √(log n) and k − d ≫ log log k;
1 ≪ k ≪ log n / log log log n and k − d ≥ δk for some suitable δ = o(1);
d ≪ log n / log log log n and k − d ≍ k ≪ log n. △
Remark.
The CLT, Proposition 2.3, will give the dominating term in the TV distance:
· on the event {Q(t_α) ≤ log n − ω}, we lower bound the TV distance by 1 − o(1);
· on the event {Q(t_α) ≥ log n + ω}, we upper bound the expected TV distance by o(1).
Combined with the CLT, we deduce that d_{G_k}(t_α) → Ψ(α) in probability. △
Remark.
Observe that Hypothesis A does not cover the regime k ≳ log |G|. Under fairly mild conditions on the group we can apply a variation on the argument given below to obtain a limit profile result for any 1 ≪ log k ≪ log |G|. We do not carry out the analysis here; see [25]. △
We now give a high-level description of our approach, introducing notations and concepts along the way. No results or calculations from this section will be used in the remainder of the document. Recall the definitions from the previous sections.
In all cases we show that cutoff occurs around the entropic time. As Q(t) is a sum of iid random variables, we expect it to be concentrated around its mean. Loosely speaking, we show that the shape of the cutoff, ie the profile of the convergence to equilibrium, is determined by the fluctuations of Q(t) around its mean, which in turn, by the CLT (Proposition 2.3), are determined by Var(Q(t)), for t ‘close’ to t_0; note that Var(Q(t)) = k Var(Q_1(t)) since the Q_i are iid. Throughout this section, the underlying group G is Abelian. We now outline the proof in more detail. We often drop t-dependence from the notation.
We start by discussing the lower bound. If Q is sufficiently small, then W, and hence also S, is restricted to a small set. Indeed, Q ≤ log n − ω if and only if µ(W) ≥ n^{−1}e^{ω}, and thus if this is the case then W ∈ {w | µ(w) ≥ n^{−1}e^{ω}}. Since we generate S via W, it is thus also the case that
S ∈ E := {g ∈ G | P(S = g) ≥ n^{−1}e^{ω}}.
But clearly |E| ≤ ne^{−ω}. Choosing the time t slightly smaller than the entropic time t_0 and ω ≫ 1 appropriately, the event {Q(t) ≤ log n − ω} will hold whp. Thus, whp, S(t) is restricted to a set of size o(n). It hence cannot be mixed. This heuristic applies for any choice of generators. Precisely, we show for any ω with 1 ≪ ω ≪ log n, all t and all Z = [Z_1, ..., Z_k], that
d_{G_k}(t) ≥ P(Q(t) ≤ log n − ω) − e^{−ω}.
Observe that the probability on the right-hand side is independent of Z.
Thus we are naturally interested in the fluctuations of Q(t) for t close to t_0. Using the CLT application above, ie Proposition 2.3 with ω := Var(Q(t_0))^{1/4}, we deduce the lower bound in Theorem 2.4.
We now turn to discussing the upper bound. The lower bound was valid for any choice of generators Z. Here we exploit the independence and uniform randomness of the elements of Z. Let W′(t) be an independent copy of W(t), and let V(t) := W(t) − W′(t). Observe that, in both the undirected and directed case, the law of V(t) is that of the rate-2 SRW on Z^k, evaluated at time t. The standard L² calculation (using Cauchy–Schwarz) says that
2‖ζ − π_G‖_TV ≤ ‖ζ − π_G‖_2 = √(n ∑_{x∈G} (ζ(x) − 1/n)²),
recalling that π_G(x) = 1/n for all x ∈ G. A standard, elementary calculation shows that
‖P_{G_k}(S(t) ∈ ·) − π_G‖_2 = √(n P(V(t) · Z = 0 | Z) − 1).
Unfortunately, writing X = (X(s))_{s≥0} for a rate-1 SRW on Z, a simple calculation shows that
P(V(t_0) · Z = 0 | Z) ≥ P(V(t_0) = (0, ..., 0) ∈ Z^k) = P(X(2t_0/k) = 0)^k ≫ 1/n.
(This calculation differs amongst the regimes of k.) Moreover, the L²-mixing time can then be shown to be larger than the TV-mixing time by at least a constant factor; hence this is insufficiently precise for showing cutoff in TV. (We drop the t-dependence from the notation from now on.)
This motivates the following ‘modified L² calculation’. First let W ⊆ Z^k, and write
typ := {W, W′ ∈ W}, P̂(·) := P(· | typ) and Ê(·) := E(· | typ);
note that here we are (implicitly) averaging over Z. The set W ⊆ Z^k will be chosen later so that
P̂(V = 0) ≪ 1/n and P(W ∉ W) = o(1);
we call this typicality.
We now perform the same L² calculation, but for P̂ rather than P:
d_{G_k}(t) = ‖P_{G_k}(S ∈ ·) − π_G‖_TV ≤ ‖P_{G_k}(S ∈ · | W ∈ W) − π_G‖_TV + P(W ∉ W);
4 E(‖P_{G_k}(S ∈ · | W ∈ W) − π_G‖²_TV) ≤ E(|G| P(V · Z = 0 | Z, typ) − 1) = |G| P̂(V · Z = 0) − 1.
By conditioning on typicality and doing a modified L² calculation, we transformed the quenched estimation of the mixing time into an annealed calculation concerning the probability that a random word involving random generators is equal to the identity. This is a key step.
To have w ∈ W, we impose local and global typicality requirements. The global part says that
−log µ(w) ≥ log n + ω for all w ∈ W, where ω := (vk)^{1/4} as above;
the local part will come later. For a precise statement of the typicality requirements, see Definition 2.7. These have the property that
P(W ∉ W) = Ψ(α) + o(1) when t = t_α; see Proposition 2.8.
This has the advantage that now
P̂(V = (0, ..., 0)) ≈ P(W = W′ | W′ ∈ W) ≤ n^{−1}e^{−ω},
since −log x ≥ log n + ω if and only if x ≤ n^{−1}e^{−ω}. Of course, there are other scenarios in which we may have V · Z ≡
0. To deal with these, since linear combinations of independent uniform random variables in an Abelian group are uniform on their support, we have v · Z ∼ Unif(g_v G) where g_v := gcd(v_1, ..., v_k, n); see Lemma 2.11. Then
|G| P̂(V · Z = 0, V ≠ 0) = |G| Ê(1(V ≠ 0)/|g_V G|).
(Recall that V and Z are independent.) We use the local typicality conditions to ensure that max_i |W_i| ≤ r_∗, for some explicit r_∗ which diverges a little faster than n^{1/k}. This allows us to consider only values g ∈ [2r_∗] for the gcd. It is here where the two approaches (§2 and §
3) diverge. In the first approach (§
2) we use a rather direct approach. First, it is elementary that
|G| Ê(1(V ≠ 0)/|g_V G|) ≤ Ê(g_V^{d(G)} 1(V ≠ 0)) ≤ 1 + ∑_{γ=2}^{2r_∗} γ^{d(G)} P̂(g_V = γ);
see Lemma 2.12. Since the law of SRW on Z is unimodal, for each non-zero coordinate, the probability that γ divides it is at most 1/γ. Thus in general the probability is at most 1/γ plus the probability that the coordinate is 0, the latter of which is of order 1/n^{1/k} ≍ 1/√(t_α/k). This leads to
P̂(g_V = γ) ≲ (1/n^{1/k} + 1/γ)^k;
see Lemma 2.14. Provided at least one of d(G) or k is not too close to log n, we are able to use this inequality to control the expectation, showing Ê(g_V^{d(G)} 1(V ≠ 0)) = 1 + o(1); see Corollary 2.15. Combining these two analyses, we deduce that
n P̂(V · Z = 0) ≤ n P̂(V · Z = 0, V ≠ 0) + n P̂(V = 0) = 1 + o(1).
The modified L² calculation then says that the TV distance is roughly Ψ(α) plus a term o_P(1), ie tending to 0 in probability. This establishes a matching limiting upper bound of Ψ(α) in probability.
The second approach (§
3) analyses the term P̂(g_V = γ) and uses it to kill |G/γG| directly in
|G| Ê(1(V ≠ 0)/|g_V G|) = ∑_{γ∈N} P̂(g_V = γ, V ≠ 0) |G/γG|.
We outline the adaptation in more detail in §3.
In this subsection, we prove the lower bound on mixing, which holds for every choice of Z.
Proof of Lower Bound in Theorem 2.4.
For this proof, assume that Z is given, and suppress it. We convert the CLT, Proposition 2.3, from a concentration statement about Q into one about W: for all α ∈ R, by the CLT, we have
P(ℰ_α) ≈ Ψ(α) where ℰ_α := {µ_{t_α}(W(t_α)) ≥ n^{−1}e^{ω}} = {Q(t_α) ≤ log n − ω};
recall that ω ≫ 1. Fix α ∈ R. Consider the set
E_α := {x ∈ G | ∃ w ∈ Z^k st µ_{t_α}(w) ≥ n^{−1}e^{ω} and x = w · Z}.
Since we use W to generate S, we have P(S(t_α) ∈ E_α | ℰ_α) = 1. Every element x ∈ E_α can be realised as x = w_x · Z for some w_x ∈ Z^k with µ_{t_α}(w_x) ≥ n^{−1}e^{ω}. Hence, for all x ∈ E_α, we have
P(S(t_α) = x) ≥ P(W(t_α) = w_x) = µ_{t_α}(w_x) ≥ n^{−1}e^{ω}.
Taking the sum over all x ∈ E_α, we deduce that
1 ≥ ∑_{x∈E_α} P(S(t_α) = x) ≥ |E_α| · n^{−1}e^{ω}, and hence |E_α|/n ≤ e^{−ω} = o(1).
Finally we deduce the lower bound from the definition of TV distance:
‖P(S(t_α) ∈ · | Z) − π_G‖_TV ≥ P(S(t_α) ∈ E_α) − π_G(E_α) ≥ P(ℰ_α) − |E_α|/n ≥ Ψ(α) − o(1).
Remark.
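The mechanism of this lower bound can be seen numerically: sampling Q(t) = −log µ_t(W(t)) in the directed case (where the W_i(t) are iid Poisson(t/k)), the probability P(Q(t) ≤ log n) is close to 1 well below the entropic time and close to 0 well above it. The parameters n and k below are hypothetical, and t_0 is taken from the normal approximation of Proposition 2.2:

```python
import math, random

rng = random.Random(42)

def poisson(lam):
    # count Exp(1) arrivals before time lam
    x, s = 0, rng.expovariate(1.0)
    while s <= lam:
        x += 1
        s += rng.expovariate(1.0)
    return x

def pois_logpmf(lam, x):
    return -lam + x * math.log(lam) - math.lgamma(x + 1)

def Q(t, k):
    # Q(t) = -log mu_t(W(t)) with W_i(t) ~ iid Poisson(t/k) (directed case)
    lam = t / k
    return -sum(pois_logpmf(lam, poisson(lam)) for _ in range(k))

n, k = 10 ** 10, 10                               # hypothetical sizes
t0 = k * n ** (2 / k) / (2 * math.pi * math.e)    # approximate entropic time

def frac_below(t, reps=500):
    # empirical P(Q(t) <= log n): near 1 below t0, near 0 above it
    return sum(Q(t, k) <= math.log(n) for _ in range(reps)) / reps

print(frac_below(t0 / 4), frac_below(4 * t0))
```

The sharp change of this probability across t_0 is exactly the concentration of Q used in the proof above.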
Given a general (not necessarily Abelian) group G, one can project to its Abelianisation G^{ab} = G/[G, G], which is an Abelian group. The lower bound t_0(k, |G^{ab}|) then holds for the projected walk on G^{ab}. But projection cannot increase the TV distance. Thus t_0(k, |G^{ab}|) gives a lower bound for the mixing of the original walk on G. △
2.7 Upper Bound on Mixing
It is often easier to consider L² distances than L¹: roughly, squares are easier to deal with than absolute values; moreover, L² admits several exact formulas, eg involving return probability, the spectral decomposition and representation theory. TV has a significant advantage, though: it is uniformly bounded (by 1). As such, we can condition on high probability events and upper bound by the TV distance 1 when this event fails.
We use a ‘modified L² calculation’: first conditioning that W is ‘typical’; then using a standard L² calculation on the conditioned law. Let W′ be an independent copy of W; then S′ := W′ · Z has the same law as S and is conditionally independent of S given Z.
Lemma 2.6.
For all t ≥ 0 and all W ⊆ Z^k, the following inequalities hold:
d_{G_k}(t) = ‖P_{G_k}(S(t) ∈ ·) − π_G‖_TV ≤ ‖P_{G_k}(S(t) ∈ · | W(t) ∈ W) − π_G‖_TV + P(W(t) ∉ W);
4 E(‖P_{G_k}(S(t) ∈ · | W(t) ∈ W) − π_G‖²_TV) ≤ n P(S(t) = S′(t) | W(t), W′(t) ∈ W) − 1.
Proof.
The first claim follows immediately from the triangle inequality. For the second, using Cauchy–Schwarz, we upper bound the TV distance of the conditioned law by its L² distance:
4‖P_{G_k}(S ∈ · | W ∈ W) − π_G‖²_TV ≤ n ∑_x (P_{G_k}(S = x | W ∈ W) − 1/n)² = n ∑_x P_{G_k}(S = S′ = x | W, W′ ∈ W) − 1,
as S = W · Z and S′ = W′ · Z. The claim follows by taking expectations.
We now make the specific choice of the ‘typical’ set W; we make a different choice for each α ∈ R. The collection {W_α}_{α∈R} of sets will satisfy P(W(t_α) ∉ W_α) ≈ Ψ(α), using the CLT (Proposition 2.3); see Proposition 2.8. (Recall that Ψ is the standard Gaussian tail.) We show that the modified L² distance (given typicality) is o(1); see Proposition 2.9. Applying Lemma 2.6, we find that d_{G_k}(t_α) ≤ Ψ(α) + o(1) whp over Z. This matches the lower bound from the previous subsection. Since this holds for each α ∈ R, we are able to find the shape of the cutoff. If we only desire the order of the window, then we need only consider the limit α → ∞; in this case, P(W(t_α) ∉ W_α) ≈ Ψ(α) ≈ 0, which simplifies the choice of W_α.
The typicality conditions will be a combination of ‘local’ (coordinate-wise) and ‘global’ ones.
Definition 2.7.
For all α ∈ R, define the local and global typicality conditions, respectively:
W_{α,loc} := {w ∈ Z^k | |w_i − E(W_i(t_α))| ≤ r_∗ ∀ i = 1, ..., k} where r_∗ := n^{1/k}(log k)²;
W_{α,glo} := {w ∈ Z^k | P(W(t_α) = w) ≤ n^{−1}e^{−ω}}.
Define W_α := W_{α,loc} ∩ W_{α,glo}, and say that w ∈ Z^k is (α-)typical if w ∈ W_α.
The following proposition determines the probability that W(t_α) lies in W_α, ie of typicality.
Proposition 2.8.
For each α ∈ R, we have P(W(t_α) ∉ W_α) → Ψ(α).
Proof.
By our CLT, Proposition 2.3, the probability that the global conditions hold converges to 1 − Ψ(α). Proposition 2.2 along with [27, Definitions C.1 and C.2 and Proposition C.3] together say that the probability that a single coordinate fails the local condition is at most k^{−4/3}. By the union bound, the probability that local typicality fails to hold is then at most k^{−1/3} = o(1).
Herein, we fix α ∈ R and frequently suppress the time t_α from the notation, eg writing W for W(t_α) or W for W_α. Let V := W − W′, so {W · Z = W′ · Z} = {V · Z = 0}. Write
D := D(t_α) := n P(V(t_α) · Z = 0 | typ_α) − 1 and typ := typ_α := {W(t_α), W′(t_α) ∈ W_α}.
It remains to show that D(t_α) = o(1) for all α ∈ R. Recall Hypothesis A, the crux of which is that
(k − d)/k − 2d (log log k)/log n ≥ k/log n and k − d ≫ 1.
For r_1, ..., r_ℓ ∈ Z \ {0}, we use the convention gcd(r_1, ..., r_ℓ,
0) := gcd(|r_1|, ..., |r_ℓ|).
Proposition 2.9.
Suppose that (d, n, k) jointly satisfy Hypothesis A. (Recall that, implicitly, (d, n, k) is a sequence of triples of integers.) Write g := gcd(V_1, ..., V_k, n). Then, for all α ∈ R, we have
0 ≤ D(t_α) = ∑_{γ∈N} P(g = γ | typ) · |G|/|γG| − 1 = o(1).
Given this proposition, we can prove the upper bound in the main theorem, Theorem 2.4.
Proof of Upper Bound in Theorem 2.4 Given Proposition 2.9.
Fix α ∈ R and consider the TV distance at time t_α. Apply the modified L² calculation, ie Lemma 2.6 and Definition 2.7, at time t_α: by Proposition 2.9, the modified L² distance (given typicality) is o(1) in expectation; by Markov's inequality, it is thus o(1) whp. Proposition 2.8 says that typicality fails with probability Ψ(α) asymptotically. Combined, this all says that d_{G_k}(t_α) ≤ Ψ(α) + o(1) whp.
It remains to prove Proposition 2.9, ie to bound the modified L² distance. The remainder of the section is dedicated to this goal. To do this, we are interested in the law of V · Z. Obviously, when V = 0, we have V · Z = 0. The following auxiliary lemma controls this probability; its proof is deferred to the end of the subsection.
Lemma 2.10.
We have n P(V = 0 | typ) ≤ e^{−ω}/P(typ) ≲ e^{−ω} = o(1).
Linear combinations of independent uniform random variables in an Abelian group are themselves uniform on their support. Hence the distribution of v · Z is uniform on gcd(v_1, ..., v_k, n)G. This is proved in [27, Lemma F.1]; we state the version which we desire here.
Lemma 2.11.
For all v ∈ Z^k, we have v · Z ∼ Unif(γG) where γ := gcd(v_1, ..., v_k, n).
We thus need to control |γG|, since Lemma 2.11 implies that
P(V · Z = 0 | typ) = ∑_{γ∈N} P(g = γ | typ)/|γG| where g := gcd(V_1, ..., V_k, n).
Lemma 2.12.
For all Abelian groups G and all γ ∈ N, we have |G|/|γG| ≤ γ^{d(G)}.

Proof.
Decompose G as ⊕_{j=1}^d Z_{m_j} with d = d(G) and some m_1, ..., m_d ∈ N. Then γG can be decomposed as ⊕_{j=1}^d gcd(γ, m_j)Z_{m_j}. Hence

|γG| = Π_{j=1}^d (m_j/gcd(γ, m_j)) ≥ Π_{j=1}^d (m_j/γ) = |G|/γ^d.

These lemmas combine to produce a simple, but key, corollary.
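Lemma 2.12 is easy to sanity-check numerically. The sketch below (helper names are ours, not the paper's) fixes a decomposition G = ⊕_j Z_{m_j}, computes |γG| = Π_j m_j/gcd(γ, m_j) exactly as in the proof, and verifies |G|/|γG| ≤ γ^d over a range of γ:

```python
from math import gcd, prod

def size_gamma_G(ms, gamma):
    # G = ⊕_j Z_{m_j}; the subgroup γG decomposes as ⊕_j gcd(γ, m_j)·Z_{m_j},
    # and γ·Z_m has exactly m/gcd(γ, m) elements.
    return prod(m // gcd(gamma, m) for m in ms)

def check_lemma_2_12(ms):
    # Lemma 2.12: |G|/|γG| ≤ γ^d, where d is the number of cyclic factors.
    n, d = prod(ms), len(ms)
    return all(n // size_gamma_G(ms, g) <= g ** d for g in range(1, n + 1))

assert check_lemma_2_12([12])       # G = Z_12
assert check_lemma_2_12([2, 4, 8])  # G = Z_2 ⊕ Z_4 ⊕ Z_8
assert check_lemma_2_12([6, 30])    # G = Z_6 ⊕ Z_30
```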
Corollary 2.13.
We have n P(V · Z = 0, V ≠ 0 | typ) ≤ E(g^{d(G)} 1{V ≠ 0} | typ).

Proof.
The conditioning does not affect Z. The corollary follows from Lemmas 2.11 and 2.12.

In order to control this gcd, we need to determine the probability that an individual coordinate is a multiple of a given number. We evaluate the RW around the entropic time t. The proof of the following auxiliary lemma is deferred to the end of the subsection. This and Corollary 2.13 are the key elements to the proof of Proposition 2.9.

Lemma 2.14. For all γ ∈ N, we have P(V_1 ∈ γZ | V_1 ≠ 0) ≤ 1/γ and P(g = γ | typ) ≲ (1/γ + 2/n^{1/k})^k.

From this, using the conditions of Hypothesis A, we can deduce that E(g^{d(G)} 1{V ≠ 0} | typ) = 1 + o(1). We refer to this as a "corollary", since its proof is purely technical, not relying on any properties of the RW or the generators, just algebraic manipulation. Its proof is briefly deferred.
Corollary 2.15.
Given Hypothesis A, we have E(g^{d(G)} 1{V ≠ 0} | typ) = 1 + o(1).

Proposition 2.9 now follows immediately from Lemma 2.10 and Corollaries 2.13 and 2.15.
Proof of Proposition 2.9.
By Lemma 2.10 and Corollaries 2.13 and 2.15, we have

n P(V · Z = 0 | typ) ≤ n P(V = 0 | typ) + n P(V · Z = 0, V ≠ 0 | typ) ≤ n P(V = 0 | typ) + E(g^{d(G)} 1{V ≠ 0} | typ) = 1 + o(1).

We now give the deferred proof of Corollary 2.15.
Proof of Corollary 2.15.
Let d := d(G). By local typicality, g ≤ r_* = n^{1/k}(log k)² if V ≠ 0. Thus

E(g^d 1{V ≠ 0} | typ) = Σ_{γ∈N} γ^d P(g = γ, V ≠ 0 | typ) ≤ 1 + Σ_{γ=2}^{⌊n^{1/k}(log k)²⌋} γ^d P(g = γ | typ).

For γ ≥ 2, we use Lemma 2.14. Let δ ∈ (0, 1). For 2 ≤ γ ≤ δn^{1/k}, we use the bound

P(g = γ | typ) ≲ (1/γ + 2/(γ/δ))^k = (1 + 2δ)^k/γ^k.

For γ ≥ δn^{1/k}, we use the slightly crude bound (a + b)^k ≤ 2^k(a^k + b^k) for a, b ≥ 0:

P(g = γ | typ) ≲ 2^k(1/γ^k + 2^k/n) = 2^k/γ^k + 4^k/n.

Dividing the appropriate sum over γ into two parts according to whether or not γ ≤ δn^{1/k} and using the above inequalities, elementary algebraic manipulations can be used to deduce that

E(g^d 1{V ≠ 0} | typ) − 1 ≲ e^{2δk} 2^{d+1−k} + 2^k δ^{d+1−k} n^{(d+1−k)/k} + 4^k n^{(d+1)/k}(log k)^{2(d+1)}/n.

This is o(1), by the conditions of Hypothesis A, as we now outline. Write η := (k − d − 1)/k ∈ (0, 1). Take δ as large as possible so that the first term is o(1): set δ := η/4. With this definition, it is not difficult to see that the assumption η ≥ k/log n, which follows immediately from Hypothesis A, is sufficient to make the middle term small. Finally, the inequality in Hypothesis A is designed precisely so that the final term is o(1), noting that ηk ≥ k − d − 1 ≥ (k − d)/2.

Remark.
We have always been assuming that k − d(G) ≫ 1. Our analysis does apply if M := k − d(G) ≥ 1 is only bounded, but then cutoff need not hold in general: eg if G = Z_2^d then it does not. Our analysis shows that with probability bounded away from 0 the mixing time is of order t; by choosing M sufficiently large, this probability can be made arbitrarily close to 1, but to be 1 − o(1) one requires M ≫ 1. △

It remains to prove the auxiliary lemmas, namely Lemmas 2.10 and 2.14.
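Before the proofs, the elementary counting fact driving the first bound of Lemma 2.14, that a Unif{1, ..., Y} variable lies in γZ with probability ⌊Y/γ⌋/Y ≤ 1/γ, can be confirmed by brute force (illustrative sketch; the function name is ours):

```python
def prob_multiple(Y, gamma):
    # P(U ∈ γZ) for U ~ Unif{1, ..., Y}, by direct counting
    return sum(1 for v in range(1, Y + 1) if v % gamma == 0) / Y

# the number of multiples of γ in {1, ..., Y} is ⌊Y/γ⌋, so the probability is
# ⌊Y/γ⌋/Y ≤ 1/γ; mixing over Y (as for |V_1| given V_1 ≠ 0) preserves the bound
assert all(prob_multiple(Y, g) == (Y // g) / Y
           for Y in range(1, 200) for g in range(1, 40))
assert all(prob_multiple(Y, g) <= 1 / g
           for Y in range(1, 200) for g in range(1, 40))
```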
Proof of Lemma 2.10.
By direct calculation, since W and W′ are independent copies,

P(V = 0, typ) = P(W = W′, W ∈ W) = Σ_{w∈W} P(W = w)².

Recall global typicality: P(W = w) ≤ n^{−1} e^{−ω} for all w ∈ W. Thus

n P(V = 0 | typ) ≤ n Σ_{w∈W} P(W = w)²/P(typ) ≤ e^{−ω}/P(typ).

Proof of Lemma 2.14. Let X = (X_s)_{s≥0} be a rate-1 SRW on Z. To calculate the expectation, we use that V = W − W′ has the distribution of a SRW run at twice the speed, in particular V_i(t) ∼ X_{2t/k}, and that the coordinates of V are independent. (This holds for both the undirected and directed cases.) Clearly the distribution of X is symmetric about 0.

It is easy to see that any non-increasing distribution on N can be written as a mixture of Unif({1, ..., Y}) distributions, for different Y ∈ N. Observe that the map m ↦ P(|X_s| = m) : N → [0, 1] is non-increasing for any s ≥ 0. Hence |V_1| conditional on V_1 ≠ 0 has such a distribution. Thus |V_1| ∼ Unif{1, ..., Y} conditional on V_1 ≠ 0, where Y has some distribution. Hence we have

P(V_1 ∈ γZ | V_1 ≠ 0) = E(⌊Y/γ⌋/Y) ≤ 1/γ.

If the gcd g = γ, then V_i ∈ γZ for all i ∈ [k]. Hence, by independence of coordinates, we obtain

P(g = γ | typ) ≤ P(g = γ)/P(typ) ≲ P(V_1 ∈ γZ)^k ≤ (P(V_1 = 0) + P(V_1 ∈ γZ | V_1 ≠ 0))^k,

noting that P(typ) ≍ 1. Using Proposition 2.2 to argue that P(V_1 = 0) ≤ 2/n^{1/k}, we deduce that

P(g = γ | typ) ≲ (2/n^{1/k} + 1/γ)^k.

Recall that Theorem A is established via distinct approaches. In the previous section we used one approach to deal with the case that k is 'not too large'. In this section we use a new approach to deal with the case that k is 'not too small'. The main result of the section is Theorem 3.6. The outline of the section is roughly the same as that of the previous one.

The underlying principles of the method used in this section (§
3) are the same as those of the previous (§2). That approach broke down when d was very large. Eg consider Z_2^d. Since all elements are of order 2, instead of looking at W, a RW on Z^k, we could equally have looked at W taken modulo 2. The entropy of W mod 2 is significantly smaller than that of W at the original entropic time t.

We saw that V · Z ∼ Unif(γG) when gcd(V_1, ..., V_k, n) = γ. (This assumes that the group G is Abelian.) This motivates defining t_γ to be the time at which the entropy of W mod γ is log|G/γG|. The proposed upper bound is then given by t_* := max_{γ∈N} t_γ.

While this method will be able to handle arbitrary Abelian groups, we only get an abstract definition of the cutoff time, which is not easily calculable for many groups. For some, though, it is: eg t_*(k, Z_n) is the time at which the entropy of the RW on Z^k reaches log|G|.

As in the previous sections, not only are we interested in the entropy at this proposed mixing time t_*, but we also desire quantitative information about the rate of change of entropy at this time and the variance of the 'random entropy', denoted Q.

3.2 Entropic Times: Definition and Concentration

In this section, we redefine entropic times. There is some overlap with notation from before, but all entropic definitions from §2 are superseded by the entropic times defined here.

Let W = (W_i(t) | i ∈ [k], t ≥ 0) be a RW on Z^k, counting the uses of generators, as in the previous sections. (This can be either a SRW on Z^k or DRW on Z_+^k.) As before, S(t) = W(t) · Z. For γ ∈ N, define W_γ via W_{γ,i}(t) := W_i(t) mod γ; write W_∞ := W. Then W_γ is a RW on Z_γ^k; so W_{γ,i} := (W_{γ,i}(t))_{t≥0} forms an iid sequence (over i ∈ [k]) of rate-1/k RWs on Z_γ.

Write μ_{γ,t}, respectively ν_{γ,s}, for the law of W_γ(t), respectively W_{γ,1}(sk); so μ_{γ,t} = ν_{γ,t/k}^{⊗k}. Define

Q_γ(t) := −log μ_{γ,t}(W_γ(t)) and Q_{γ,i}(t) := −log ν_{γ,t/k}(W_{γ,i}(t));

so Q_{γ,i} forms an iid sequence over i ∈ [k] and Q_γ(t) = Σ_{i=1}^k Q_{γ,i}(t). Define

h_γ(t) := E(Q_γ(t)) and H_γ(s) := E(Q_{γ,1}(sk)).

So h_γ(t) and H_γ(s) are the entropies of W_γ(t) and W_{γ,1}(sk), respectively. Note that h_γ(t) = kH_γ(t/k) and that h_γ : [0, ∞) → [0, log(γ^k)) is a strictly increasing bijection.

Some of these expressions, such as h_γ, depend on k; we usually suppress this from the notation.

Definition 3.1.
For 1 ≤ N < γ^k, define the entropic time

t(γ, N) := h_γ^{−1}(log N) and s(γ, N) := t(γ, N)/k = H_γ^{−1}((log N)/k).

We are interested primarily in N := |G/γG|. For an Abelian group G, define

t_*(k, G) := max_{γ ≀ |G|} t(γ, |G/γG|).

Our next result determines the asymptotics of t_*. The first part is for k − d(G) ≍ k: it shows that here the mixing time is the same order as that from Approach 1, namely k|G|^{2/k}. Combining the two approaches, this means that all Abelian groups have mixing time of order kn^{2/k} when 1 ≪ k ≲ log|G| and k − d(G) ≍ k. The second part allows k − d(G) to diverge arbitrarily slowly: in this case the mixing time can be as large as order k|G|^{2/k} log k. The final part evaluates t_* up to smaller order terms when d(G) ≪ log|G| and k − d(G) ≍ k. The proofs are given in [27, §B.3.2].
Proposition 3.2a ([27, Proposition B.17]). Suppose that 1 ≪ k ≲ log|G|. The following hold: if k − d(G) ≍ k, then t_*(k, G) ≍ k|G|^{2/k}; if k − d(G) > 0, then k|G|^{2/k} ≲ t_*(k, G) ≲ k|G|^{2/k} log k.

Proposition 3.2b ([27, Proposition B.18]). Suppose that d(G) ≪ log|G| and k − d(G) ≍ k ≫ 1. Then t_*(k, G) ∼ t(∞, |G|). (Note that t(∞, |G|) = t(k, |G|) in the notation of Definition 2.1.)

Heuristics Behind Proofs.
For the RW on Z_γ, until time γ² the walk looks roughly the same as if it were on Z. In particular, the entropy growth rates are comparable. From this, we are able to see that s_* is the same order as the entropic time s from §2 when k − d ≍ k.

For k − d ≫ 1, by Lemma 2.12, we have s(γ, |G/γG|) ≤ s(γ, γ^d) = s(γ, |Z_γ^d|) = R_γ^{−1}(ζ_γ). So the worst case is studying relative entropy for the RW on Z_γ^d. In [25, §3] we analyse in detail RWs on random Cayley graphs of Z_p^d. In particular, we analyse this entropic time for 1 ≪ k − d ≪ k.

The same heuristics hold for the regime 1 ≪ k ≪ log|G|, except that now one checks that any extremal γ (in the sense of attaining the maximum in the definition of t_*) satisfies γ ≫ 1 and s(γ, |G/γG|) ≪ γ². In this case, the RW on Z_γ is almost indistinguishable from that on Z. Hence the entropic times are asymptotically equivalent.

For k ≍ log|G|, the mixing time is order k ≍ log|G|. As such one expects each generator to be picked an order 1 number of times. Separate into large γ and small γ: upper bound |G/γG| ≤ |G| in the former and |G/γG| ≤ γ^{d(G)} in the latter. Any extremal γ must be small, ie satisfy γ ≍ 1.

Recall that t(γ, |G/γG|) is a lower bound on mixing for all γ, for all Z. Throughout this section, we work under the assumption that k ≲ log|G|. (Recall that the regime k ≫ log|G| has already been handled.) As a result of this, taking γ := |G|, we see that the mixing time is at least order k. There hence exists a ς > 0 with t_* ≥ ςk. (This is true for all choices of Z, not just whp over Z.)

For technical reasons, for γ with t(γ, |G/γG|) ≤ ςk, it is convenient to replace t(γ, |G/γG|) with an adjusted entropic time t_γ defined below. Crucially, max_{γ∈N} t_γ = max_{γ∈N} t(γ, |G/γG|).

Definition 3.3.
For s ≥ 0 and γ ≥ 2, define the (adjusted) entropic time and relative entropy via

s_γ := s(γ, |G/γG|) ∨ ς, t_γ := s_γ k and R_γ(s) := log γ − H_γ(s).

The maximal entropy of a distribution on Z_γ is log γ, obtained uniquely by the uniform distribution Unif(Z_γ). Hence R_γ(s) → 0 as s → ∞, since the RW converges to Unif(Z_γ).

In the previous approach, we had a CLT for the random variable Q. Here we do not give such precise results; this means that while we show cutoff, we do not find the profile. (Even if we knew such refined information, it would be difficult to calculate max_{γ∈N} t_γ, as this is highly dependent on the group.) Instead, we determine the rate of change of the entropy around the entropic time and establish concentration estimates on the 'random entropy', ie the Q_γ random variable, at a time shortly after the entropic time.

The first lemma controls the rate of change of the entropy near the entropic time; see [27, §B].

Lemma 3.4 ([27, Lemma B.20]). There exists a continuous function c : (0, 1) → (0, 1) so that, for all γ ≥ 2, all ξ ∈ (−1, 1) \ {0} and all s ≥ ς, we have

|H_γ(s(1 + ξ)) − H_γ(s)| ≥ c_{|ξ|}(R_γ(s) ∧ 1).

Given that we know how much the entropy, ie the expectation of Q_γ, changes, we now want a concentration result, giving upper and lower tail estimates. The upper tail is used for the lower bound on mixing: it says that Q_γ is at most some value whp. Similarly, the lower tail is used for the upper bound on mixing. These are given in Proposition 3.5, which is proved in [27, §B].

Recall that t_* = max_{γ∈N} t_γ and d = d(G). For γ ∈ N, write ζ_γ := (1/k)(k − d(G)) log γ.

Proposition 3.5 ([27, Proposition B.21]). Assume that k > d. There exists a continuous function c : (0, 1) → (0, 1) so that, for all γ ≥ 2 and all ε ∈ (0, 1), the following hold:

P(Q_γ(t_*(1 + ε)) ≤ log|G/γG| + c_ε(ζ_γ ∧ 1)k) ≤ exp(−c_ε(ζ_γ ∧ 1)k);
P(Q_γ(t(1 − ε)) ≥ log|G/γG| − c_ε(ζ_γ ∧ 1)k) = o(1) for all t ≤ t_γ.

Heuristics Behind Proof.
The proof of this proposition is given in [27, §B]. We give a brief outline here. Recall that Q_γ(t) = Σ_{i=1}^k Q_{γ,i}(t) is a sum of iid terms, each of which has mean h_γ(t)/k = H_γ(t/k). Applying the entropy growth rate lemma, ie Lemma 3.4, we see, for any ξ ∈ (−1, 1) \ {0}, that the change in entropy between times s and (1 + ξ)s is order R_γ(s) ∧ 1 (for fixed |ξ|). Taking s := s(γ, |G/γG|), recalling that |G/γG| ≤ γ^{d(G)} by Lemma 2.12, gives

R_γ(s) = log γ − H_γ(s) = log γ − (log|G/γG|)/k ≥ (1/k)(k − d(G)) log γ = ζ_γ.

(We are interested in the times s_γ, not s(γ, |G/γG|); this is a minor technical complication.)

Regarding the concentration, the non-quantitative part is then an application of Chebyshev's inequality, once one has shown that the variance Var(Q_{γ,1}(sk)) is uniformly bounded over s ≥ ς. The quantitative part requires first arguing that E(Q_{γ,1}) − Q_{γ,1} is deterministically bounded (from above); we then apply a (one-sided) variant of Bernstein's inequality for a sum of iid, deterministically bounded random variables.

3.4 Precise Statement and Remarks

In this subsection, we state precisely the main theorem of the section. There are some simple conditions on k, in terms of d(G) and |G|, needed for the upper bound.

Hypothesis B.
The sequence (k_N, G_N)_{N∈N} satisfies Hypothesis B if the following hold:

lim sup_{N→∞} k_N/log|G_N| < ∞, lim inf_{N→∞} (k_N − d(G_N)) = ∞ and lim inf_{N→∞} k_N/log|H_N| = ∞,

where H_N := {γG_N | γ ≀ |G_N| and γ ∈ [2, n_{*,N}]} and n_{*,N} := ⌊|G_N|^{1/k_N}(log k_N)²⌋.

In Remark 3.7 below, we give a sufficient condition for Hypothesis B to hold. Throughout the proofs, we drop the subscript-N from the notation, eg writing k or n, considering sequences implicitly. Recall that we abbreviate the TV distance from uniformity at time t as

d_{G_k,N}(t) = ‖P_{G_N([Z_1,...,Z_{k_N}])}(S(t) ∈ ·) − π_{G_N}‖_TV where Z_1, ..., Z_{k_N} ∼ iid Unif(G_N).

We now state the main theorem of this section. Recall that t_* = max_{γ∈N} t(γ, |G/γG|).

Theorem 3.6.
Let (k_N)_{N∈N} be a sequence of positive integers and (G_N)_{N∈N} a sequence of finite, Abelian groups; for each N ∈ N, define Z^{(N)} := [Z_1, ..., Z_{k_N}] by drawing Z_1, ..., Z_{k_N} ∼ iid Unif(G_N). Suppose that the sequence (k_N, G_N)_{N∈N} satisfies Hypothesis B. Let c ∈ (−1, 1) \ {0}. Then

d_{G_k,N}^±((1 + c) t_*^±(k_N, G_N)) → 1{c < 0} (in probability) as N → ∞.

That is, whp, there is cutoff at max_γ t^±(γ, |G/γG|). Moreover, the implicit lower bound holds deterministically, ie for all choices of generators.

Remark 3.7. If k ≫ √(log n), then k ≫ log|H|, since |H| ≤ n_* ≤ n^{1/k}(log k)². △

The general outline of this approach is the same as that of the previous; see §2. That approach broke down when d or k became too large or k − d became too small. We outline here the ideas used to cover these cases.

For the lower bound, we project the walk from G to G/γG. This can only decrease the TV distance. The idea, then, is that where before we looked at a RW on Z^k and waited until it had entropy log|G|, instead we look at a RW on Z_γ^k and wait until it has entropy log|G/γG|; see Definition 3.1. We then take a worst-case over γ ∈ N.

For the upper bound, fundamentally, we still wish to bound the same expression:

D(t) = Σ_{γ∈N} P(g = γ | typ) · |G|/|γG| − 1.

In §2 we used |G|/|γG| ≤ γ^{d(G)}. In certain situations, this is too crude. Instead, observe that if g = γ then V ≡ 0 mod γ. But W_γ := W mod γ and W′_γ := W′ mod γ are simply RWs on Z_γ^k.
Just as we used the entropic time (and typicality) to get P(W = W′ | typ) ≪ 1/|G| in Lemma 2.10, here we adjust the entropic time (and typicality) so that

P(g = γ | typ) ≤ P(W_γ = W′_γ | typ) ≪ |γG|/|G|;

see Definitions 3.1 and 3.8 and the proof of Proposition 3.13.

3.6 Lower Bound on Mixing

In this subsection, we state and prove the lower bound, matching the upper bound of Theorem 3.6; it holds not only for all groups G but also for all choices of Z, not just whp over Z. The idea is to quotient out by γG, and show that the walk on this quotient is not mixed at time (1 − ε)t(γ, |G/γG|), and hence the original walk is not mixed on G either. We use the same idea as in §2: for each γ, the walk is not mixed on G/γG at time (1 − ε)t(γ, |G/γG|).

Proof of Lower Bound in Theorem 3.6.
For this proof, assume that Z is given, and suppress it. We first convert the statement from one about Q_γ to one about W_γ. Let ε ∈ (0, 1) and set t := (1 − ε)t(γ, |G/γG|). Write ζ_γ := R_γ(s(γ, |G/γG|)). From Proposition 3.5, we obtain P(E) = 1 − o(1) where

E := {μ_{γ,t}(W_γ(t)) ≥ δ_{γ,ε}^{−1}/|G/γG|} and δ_{γ,ε} := exp(−c_ε(ζ_γ ∧ 1)k).

From Lemma 2.12, we have |G/γG| ≤ γ^{d(G)}. Thus

ζ_γ = R_γ(s(γ, |G/γG|)) = log γ − (log|G/γG|)/k ≥ (1/k)(k − d(G)) log γ;

also, k − d(G) ≫ 1. Thus δ_{γ,ε} = o(1) uniformly in γ. Consider the set

A := {x ∈ G/γG | ∃ w ∈ Z_γ^k st μ_{γ,t}(w) ≥ δ_{γ,ε}^{−1}/|G/γG| and x = (w · Z) + γG}.

Define S_γ to be the projection of S to G/γG. Since we use W to generate S_γ, we have P(S_γ(t) ∈ A | E) = 1. Every element x ∈ A can be realised as x = (w_x · Z) + γG for some w_x ∈ Z_γ^k with μ_{γ,t}(w_x) ≥ δ_{γ,ε}^{−1}/|G/γG|. Hence, for all x ∈ A, we have

P(S_γ(t) = x) ≥ P(W_γ(t) = w_x) = μ_{γ,t}(w_x) ≥ δ_{γ,ε}^{−1}/|G/γG|,

recalling that S_γ lives in the quotient G/γG. Summing over x ∈ A, we deduce that

1 ≥ Σ_{x∈A} P(S_γ(t) = x) ≥ |A| · δ_{γ,ε}^{−1}/|G/γG|, and hence |A|/|G/γG| ≤ δ_{γ,ε} = o(1).

Projecting onto G/γG (which can only decrease the TV distance), we see that

‖P_{G_k}(S(t) ∈ ·) − π_G‖_TV ≥ P(S_γ(t) ∈ A) − π_{G/γG}(A) ≥ P(E) − |A|/|G/γG| = 1 − o(1).

Finally, recall that max_{γ∈N} t_γ = max_{γ∈N} t(γ, |G/γG|). This completes the proof.
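The proof above uses only that projecting a law from G onto G/γG cannot increase the TV distance to uniformity. A minimal numerical illustration of this data-processing step, with G = Z_12 and γ = 3 (the test distribution and all names are ours):

```python
from collections import Counter

def tv(p, q, states):
    # total-variation distance between two laws given as dicts
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in states)

n, gamma = 12, 3
p = {x: (x + 1) / sum(range(1, n + 1)) for x in range(n)}  # arbitrary non-uniform law on Z_12
unif = {x: 1 / n for x in range(n)}

def project(mu):
    # push a law on Z_12 forward to the quotient Z_12 / 3Z_12 ≅ Z_3
    out = Counter()
    for x, w in mu.items():
        out[x % gamma] += w
    return out

# projection can only decrease the TV distance to uniformity
assert tv(project(p), project(unif), range(gamma)) <= tv(p, unif, range(n)) + 1e-12
```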
To upper bound the mixing time, we use a 'modified L² calculation', as in the previous approach. This involves first conditioning that W has some 'typical' properties, laid out in the following definition, and then performing a standard TV–L² upper bound on the conditioned law.

Abbreviate t_{*,ε} := t_*(1 + ε). Recall that d = d(G) and ζ_γ = (1/k)(k − d) log γ; set ζ̂_γ := ζ_γ ∧ 1.

Definition 3.8.

Let ε > 0; recall the constant c_ε > 0 from Proposition 3.5. The following depend on ε; we suppress this in the notation. Define global typical sets for γ ∈ N by

W_{γ,glo} := {w ∈ Z_γ^k | P(W_γ(t_{*,ε}) = w) ≤ δ_{γ,ε}/|G/γG|} where δ_γ := δ_{γ,ε} := e^{−c_ε ζ̂_γ k}.

Also define δ_∞ := δ_{∞,ε} := e^{−c_ε k}. Define the local typicality set by

W_loc := {w ∈ Z^k | |w_i − E(W_i(t_{*,ε}))| ≤ r_* ∀ i ∈ [k]} where r_* := |G|^{1/k}(log k)².

When W′ is an independent copy of W, define typicality by

typ := {W(t_{*,ε}), W′(t_{*,ε}) ∈ W_loc} ∩ (∩_{γ∈Γ} {W_γ(t_{*,ε}), W′_γ(t_{*,ε}) ∈ W_{γ,glo}}),

where Γ is a subset of [2, |G|] to be defined below in Definition 3.11.
We are going to do a union bound over γ ∈ Γ, so desire control on Σ_{γ∈Γ} δ_γ.

Lemma 3.9.

For all Γ ⊆ N \ {1}, we have Σ_{γ∈Γ} δ_γ ≤ δ_{∞,ε}|Γ| + o(1).

Proof.

Since min Γ ≥ 2 and k − d ≫ 1, we have

Σ_{γ∈Γ} δ_γ ≤ Σ_{γ∈Γ} (e^{−c_ε k} + e^{−c_ε ζ_γ k}) = e^{−c_ε k}|Γ| + Σ_{γ∈Γ} γ^{−c_ε(k−d)} = δ_∞|Γ| + o(1).

Proposition 3.10.
For all ε > 0 and any subset Γ ⊆ N \ {1}, we have P(typ) ≥ 1 − δ_{∞,ε}|Γ| − o(1).

Proof.
Suppress the time-dependence from the notation, eg writing W for W(t_{*,ε}).

Consider global typicality. First, observe that

Q_γ = −log μ_γ(W_γ) ≥ log|G/γG| + c_ε ζ̂_γ k if and only if μ_γ(W_γ) ≤ e^{−c_ε ζ̂_γ k}/|G/γG|.

Hence, recalling that δ_γ = δ_{γ,ε} = exp(−c_ε ζ̂_γ k), by Proposition 3.5, we have

P(μ_γ(W_γ) ≥ δ_γ/|G/γG|) ≤ δ_γ, and hence P(∩_{γ∈Γ} {W_γ ∈ W_{γ,glo}}) ≥ 1 − Σ_{γ∈Γ} δ_γ,

by the union bound. Recall that ζ_γ = (1/k)(k − d) log γ. Applying Lemma 3.9, we deduce that

P(∩_{γ∈Γ} {W_γ ∈ W_{γ,glo}}) ≥ 1 − δ_{∞,ε}|Γ| − o(1) where δ_{∞,ε} = e^{−c_ε k}.

Now consider local typicality. Proposition 3.2a says that t/k ≲ |G|^{2/k} log k. Then [27, Definitions C.1 and C.2 and Proposition C.3] together give

P(∩_i {|W_i − E(W_i)| ≤ r_*}) = 1 − o(1); hence P(W ∈ W_loc) = 1 − o(1).

The claim follows by combining local and global typicality and applying the union bound.

We now choose the set Γ, to make sense of typicality. Recall that α ≀ β means that α divides β.

Definition 3.11.
Abbreviate n_* := (n − 1) ∧ ⌊r_*⌋. Define ∆ := {γ ∈ [2, n_*] | γ ≀ n}. Write H for the set of all proper subgroups H of G which can be represented as H = γG for some γ ∈ ∆:

H := {H | H = γG ≠ G for some γ ≀ n with 2 ≤ γ ≤ n_*}.

Given H ∈ H, write Γ_H := {γ ∈ ∆ | H = γG} and denote by γ_H the minimal γ ≀ n so that H = γG, ie γ_H := inf Γ_H. Finally, define

Γ := {γ_H | H ∈ H} ∪ {n}; so Γ ⊆ ∆ ∪ {n} ⊆ [2, n_*] ∪ {n}.

The following lemma, whose proof is deferred to the end of the subsection, will also be needed.
Lemma 3.12.
For all H ∈ H and all γ ∈ Γ_H, we have γ_H ≀ γ.

As shown below, we can combine our results to control the L² distance conditioned on typicality. In analogy with §2, define

D := D(t) := n P(V(t) · Z = 0 | typ) − 1.

Proposition 3.13.
Write g := gcd(V_1, ..., V_k, n). Then, for all ε ∈ (0, 1), we have

0 ≤ D(t_*(1 + ε)) = Σ_{γ∈N} P(g = γ | typ) · |G|/|γG| − 1 ≤ (δ_{∞,ε}|H| + o(1))/P(typ).

(The conditions of Hypothesis B imply immediately that this last term is o(1).)

From Propositions 3.10 and 3.13, it is straightforward to deduce the upper bound on mixing.

Proof of Upper Bound in Theorem 3.6.
We use a modified L² calculation at time (1 + ε) max_γ t_γ.
· Condition that W satisfies typicality; see Definition 3.8 and Proposition 3.10.
· Perform the standard TV–L² upper bound on the law of S conditioned that W is typical.
· Upper bound the expected L² distance by (δ_{∞,ε}|H| + o(1))/P(typ); see Proposition 3.13.
· This gives an upper bound on the expected TV distance of (δ_{∞,ε}|H| + o(1))/P(typ) + P(typ^c).
· From the definition of Γ, it is clear that |Γ| ≤ |H| + 1. Since δ_{∞,ε} = e^{−c_ε k} = o(1), with c_ε a constant, the assumed condition k ≫ log|H| gives a final bound of o(1) on the expected TV distance, recalling that P(typ) = 1 − o(1) by Proposition 3.10.
· By Markov's inequality, this means that the TV distance is o(1) whp.

We now prove Proposition 3.13. To ease exposition, while all terms are evaluated at time t_{*,ε} = (1 + ε) max_{γ∈N} t_γ, we suppress this from the notation.

Proof of Proposition 3.13.
Write V := W − W′ and g := gcd(V_{∞,1}, ..., V_{∞,k}, n). If g = γ, which must have γ ≀ n as the gcd is with n, then V · Z ∼ Unif(γG) by Lemma 2.11. Then

D = n P(V · Z = 0 | typ) − 1 = |G| Σ_{γ≀n} P(g = γ | typ)/|γG| − 1.

We consider various cases. Combining together all γ such that γG = G, we upper bound

|G| P(g ∈ {γ | γG = G} | typ)/|γG| ≤ 1.

If V_∞ = 0 in Z^k, then g = γ = n, which gives γG = {id}; using the definition of typicality,

|G| P(V_∞ = 0 | typ)/|γG| = |G| E(P(W_∞ = W′_∞ | W′_∞, typ) | typ) ≤ δ_∞/P(typ);

cf Lemma 2.10. If V_∞ ≠ 0, then, given (local) typicality, g ≤ n_* = (n − 1) ∧ ⌊r_*⌋.

So it remains to study γ ∈ ∆. As a consequence of Lemma 3.12, for any H ∈ H, we have

{V_γ = 0 for some γ ∈ Γ_H} ⊆ {V_{γ_H} = 0}.

(Recall that V_γ ∈ Z_γ^k for each γ.) This is key: it allows us to collapse the consideration of all γ ∈ Γ_H down to the single element γ_H. Indeed, using the above we have

Σ_{γ∈Γ_H} P(g = γ | typ)/|γG| = P(∪_{γ∈Γ_H} {g = γ} | typ)/|H| ≤ P(V_γ = 0 for some γ ∈ Γ_H | typ)/|H| ≤ P(V_{γ_H} = 0 | typ)/|H| ≤ (δ_{γ_H}/|G|)/P(typ),

with the final inequality using typicality, as above. We decompose Σ_{γ∈∆} into Σ_{H∈H} Σ_{γ∈Γ_H}:

|G| Σ_{γ∈∆} P(g = γ | typ)/|γG| = |G| Σ_{H∈H} Σ_{γ∈Γ_H} P(g = γ | typ)/|γG| ≤ Σ_{H∈H} δ_{γ_H}/P(typ).

(Note that every γ gives rise to a unique H such that γG = H and, by definition, H is the set of all H which can be obtained as γG for some γ; hence this decomposition neither overcounts nor undercounts γ ∈ ∆.) Combining all these and using Lemma 3.9, we deduce the proposition:

0 ≤ n P(V · Z = 0 | typ) − 1 = |G| Σ_{γ≀n} P(g = γ | typ)/|γG| − 1 ≤ (δ_∞|H| + o(1))/P(typ).

It remains to give the deferred proof of the divisibility lemma, namely Lemma 3.12.
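The objects ∆, H, Γ_H and γ_H of Definition 3.11, and the divisibility claim of Lemma 3.12, can be checked by direct enumeration in a small example (illustrative sketch; for simplicity γ ranges over all divisors of n rather than only those up to n_*):

```python
from math import gcd, prod

ms = [4, 12]                       # G = Z_4 ⊕ Z_12, so n = 48
n = prod(ms)

def subgroup(gamma):
    # γG is determined by the tuple (gcd(γ, m_j))_j
    return tuple(gcd(gamma, m) for m in ms)

Gamma_H = {}                       # H ↦ Γ_H = {γ : γG = H}, proper H only
for g in (g for g in range(2, n + 1) if n % g == 0):
    H = subgroup(g)
    if H != tuple(1 for _ in ms):  # exclude γG = G
        Gamma_H.setdefault(H, []).append(g)

for H, gammas in Gamma_H.items():
    gamma_H = min(gammas)          # γ_H := inf Γ_H
    assert all(g % gamma_H == 0 for g in gammas)  # Lemma 3.12: γ_H ≀ γ
```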
Proof of Lemma 3.12.
Consider any decomposition of G as ⊕_{j=1}^r Z_{m_j}; this does not require r = d(G). Fix some β ∈ Γ_H. Since αG = βG if and only if gcd(α, m_j) = gcd(β, m_j) for all j, we may decompose H as ⊕_{j=1}^r h_j Z_{m_j} where h_j := gcd(β, m_j) for all j. Set γ_* := lcm(h_1, ..., h_r). We show that γ_* G = H and that γ_* ≀ α for all α ∈ Γ_H; this proves the lemma.

Fix j ∈ [r]. Now, h_j ≀ γ_* = lcm(h_1, ..., h_r) and h_j ≀ m_j by assumption. Hence h_j ≀ gcd(γ_*, m_j). Conversely, if x ≀ z and y ≀ z then lcm(x, y) ≀ z, and so γ_* = lcm(h_1, ..., h_r) ≀ β since h_j ≀ β for each j. Hence gcd(γ_*, m_j) ≀ gcd(β, m_j) = h_j. Thus h_j = gcd(γ_*, m_j). Hence γ_* G = H. Now consider any α with αG = H; so h_j = gcd(α, m_j) for all j. Hence h_j ≀ α for all j, and so lcm(h_1, ..., h_r) ≀ α, ie γ_* ≀ α.

4 TV Cutoff: Combining Approaches
In this section we combine the analysis from the previous two approaches to study the regime

√(log|G|/log log log|G|) ≲ k ≲ √(log|G|) with 1 ≪ k − d(G) ≪ k.

We use the more refined notion of the entropic times from §3.2.

In this subsection, we state precisely the main theorem of the section. There are some simple conditions on k, in terms of d(G) and |G|, needed for the upper bound.

Hypothesis C.
The sequence (k_N, G_N)_{N∈N} satisfies Hypothesis C if the following hold:

lim inf_{N→∞} k_N/√(log|G_N|/log log log|G_N|) > 0, lim sup_{N→∞} k_N/√(log|G_N|) < ∞,
lim inf_{N→∞} (k_N − d(G_N)) = ∞ and lim sup_{N→∞} (k_N − d(G_N))/k_N = 0.

Throughout the proofs, we drop the subscript-N from the notation, eg writing k or n, considering sequences implicitly. Recall that we abbreviate the TV distance from uniformity at time t as

d_{G_k,N}(t) = ‖P_{G_N([Z_1,...,Z_{k_N}])}(S(t) ∈ ·) − π_{G_N}‖_TV where Z_1, ..., Z_{k_N} ∼ iid Unif(G_N).

We now state the main theorem of this section. Recall that t_* = max_{γ∈N} t(γ, |G/γG|).

Theorem 4.1.
Let (k_N)_{N∈N} be a sequence of positive integers and (G_N)_{N∈N} a sequence of finite, Abelian groups; for each N ∈ N, define Z^{(N)} := [Z_1, ..., Z_{k_N}] by drawing Z_1, ..., Z_{k_N} ∼ iid Unif(G_N). Suppose that the sequence (k_N, G_N)_{N∈N} satisfies Hypothesis C. Let c ∈ (−1, 1) \ {0}. Then

d_{G_k,N}^±((1 + c) t_*^±(k_N, G_N)) → 1{c < 0} (in probability) as N → ∞.

That is, whp, there is cutoff at max_γ t^±(γ, |G/γG|). Moreover, the implicit lower bound holds deterministically, ie for all choices of generators.

Remark 4.2.
In short, the conditions of Hypothesis C say that

√(log|G|/log log log|G|) ≲ k ≲ √(log|G|) and 1 ≪ k − d(G) ≪ k.

The regime of smaller k is covered by Approach 1 and that of larger k by Approach 2. △

Remark.
Recall that the lower bound from §3.6 applies whenever 1 ≪ k ≲ log|G| and k − d(G) ≫ 1. △

Fundamentally, we still wish to bound the same expression that we did previously:

Σ_{γ≀|G|} P(g = γ | typ) · |G|/|γG| − 1.

In §2 we used the bound |G|/|γG| ≤ γ^{d(G)}. In §3 we instead adjusted the entropic time so that

P(g = γ | typ) ≤ P(W_γ = W′_γ | typ) ≪ 1/|G/γG|.

The upper bound |G/γG| ≤ γ^{d(G)} is fairly crude. Roughly the idea here is to show, for this interim regime of k around √(log|G|), that for all but e^{o(k)} of the γ we can improve it; for the remaining γ, we use the second approach. (Before we considered |H| different γ, and so required |H| = e^{o(k)}.)

4.3 Upper Bound on Mixing

Let G be an Abelian group; set n := |G|. One can find a decomposition ⊕_{j=1}^d Z_{m_j} of G such that d = d(G), the minimal size of a generating set, and m_i ≀ m_j for all i ≤ j. (This can be proved by induction. Alternatively, write G as a direct sum of p-groups then merge the p-groups appropriately.) For the remainder of this section we fix such a decomposition.

We use the more refined concept of typicality from Approach 2. Let ε > 0 and set t := (1 + ε)t_*(k, G). We frequently suppress the t and ε dependence in the notation. Let c := c_ε > 0 be as in Proposition 3.5; define

ζ_γ := (1/k)(k − d) log γ, ζ̂_γ := ζ_γ ∧ 1 and δ_γ := e^{−c ζ̂_γ k}.

Note that k − d ≫ 1 and k ≲ √(log n), so ζ̂_n = 1; set ζ̂_∞ := 1. Recall that W is a RW on Z^k and we define W_γ by W mod γ; set W_∞ := W. We now define typicality for this section precisely.

Definition 4.3 (cf Definition 3.8). Define typical sets for γ ∈ N ∪ {∞} by the following:

W_{γ,glo} := {w ∈ Z_γ^k | P(W_γ(t) = w) ≤ δ_γ/|G/γG|} where δ_γ := e^{−c ζ̂_γ k};
W_loc := {w ∈ Z^k | |w_i − E(W_i(t))| ≤ r_* ∀ i ∈ [k]} where r_* := n^{1/k}(log k)².

Choose L to be the maximal integer in [1, d] with m_L ≤ M where M := exp(√(log n · log log n)); set

Γ := {rm | r ∈ [k^{1/2}], m ≀ m_L, rm ≀ n} \ {1}.

When W′ is an independent copy of W, define typicality by

typ := {W(t), W′(t) ∈ W_loc} ∩ (∩_{γ∈Γ} {W_γ(t), W′_γ(t) ∈ W_{γ,glo}}).

Lemma 4.4.
We have log | Γ | ≪ k. In particular, δ ∞ | Γ | = o (1) . Proof.
We have |Γ| ≤ k^{1/2} div m_L where div m is the number of divisors of m ∈ N. By [21], log div m ≲ log m/log log m uniformly in m ∈ N. By the definition of m_L and the assumption that k ≳ √(log n/log log log n), we obtain

log div m_L ≲ log M/log log M ≲ √(log n · log log n)/log log n = √(log n/log log n) ≪ k.

Thus log|Γ| ≪ k. Recall that log(1/δ_∞) ≍ k.

In §3, typicality was initially defined for a general subset of N \ {1}; see Definition 3.8. The following result is an immediate consequence of Lemmas 3.9 and 4.4 and Proposition 3.10.

Lemma 4.5 (cf Lemma 3.9 and Proposition 3.10). We have

Σ_{γ∈Γ} δ_γ = o(1) and P(typ) = 1 − o(1).

Thus, by applying the modified L² calculation, it suffices to prove the following result.

Proposition 4.6.
Let ε > 0 be fixed and set t := (1 + ε) t_∗(k, G). Then
|G| P(S = S′ | typ) − 1 ≤ ∑_{γ ∈ N} |G/γG| P(g = γ | typ) − 1 = o(1).

In order to prove this, we first show that L ≍ d ≍ k.

Lemma 4.7.
We have 0 ≤ d − L ≤ √(log n/log log n) ≪ k. In particular, L ≍ d ≍ k.

Proof.
Since n = m_1 ⋯ m_d, m_1 ≤ ⋯ ≤ m_d and m_L ≤ M: if L < d, then M^{d−L} ≤ m_{L+1}^{d−L} ≤ n, by maximality of L. Rearranging gives the inequality. Finally, recall that k ≳ √(log n/log log log n) and k ≍ d.

We prove Proposition 4.6 by separating the sum over γ into two parts according to Γ.

Proof of Proposition 4.6. Observe that |G/γG| P(g = γ | typ) ≤ 1 when γ = 1. Also, g | n. Thus
∑_{γ ∈ N} |G/γG| P(g = γ | typ) − 1 ≤ ∑_{γ ∈ Γ̄} |G/γG| P(g = γ | typ) + ∑_{γ ∈ Γ} |G/γG| P(g = γ | typ),
where Γ̄ := {γ ∈ [2, n] | γ | n} \ Γ. We analyse these sums with Approaches 1 and 2 respectively, showing that each is o(1) when t := (1 + ε) t_∗(k, G) with ε > 0 fixed.

Analysis via Approach 1.
Suppose that γ ∈ Γ̄, so γ ∉ Γ ∪ {1}. We improve the inequality |G/γG| ≤ γ^d via the following argument. For each j ∈ [L], we may write γ = r_j · gcd(γ, m_j) and m_j = r′_j · gcd(γ, m_j), where gcd(r_j, r′_j) = 1. By definition of Γ, if γ = r̃ m for some m | m_L, then r̃ > k^{1/2}. Hence gcd(γ, m_j) = γ/r_j ≤ γ/k^{1/2} for j ∈ [L], since gcd(γ, m_j) | m_j | m_L. Applying this to the first L terms of the product gives
|G/γG| = ∏_{j=1}^d gcd(γ, m_j) ≤ γ^d/k^{L/2}.
Exactly the same analysis as in the proof of Corollary 2.15 then leads us to
∑_{γ ∈ Γ̄} |G/γG| P(g = γ | typ) ≤ e^{δk} k^{d+1−k} + 2^k δ^{d+1−k} n^{(d+1−k)/k} + 4^k (log k)^{d+1}/k^{L/2},
where δ is any value in (0, δ₁) with δ₁ := (k − d − 1)/k; the calculation showing that the first two terms are o(1) if k − d ≫ 1 and 1 ≪ k ≪ log n for this choice is the same as the one from Corollary 2.15. For the third term, 4^k (log k)^{d+1}/k^{L/2} ≪ 1 since L ≍ k ≍ d; thus the final term is also o(1). We thus deduce that the sum over γ ∈ Γ̄ is o(1).

Analysis via Approach 2.
The typicality conditions set out in Definition 4.3 imply that
P(g = γ | typ) ≤ P(W_γ = W′_γ | typ) ≤ δ_γ/|G/γG|;
cf Lemma 2.10. Combining this with Lemma 4.5, we deduce that the sum over γ ∈ Γ is o(1):
∑_{γ ∈ Γ} |G/γG| P(g = γ | typ) ≤ ∑_{γ ∈ Γ} δ_γ = o(1).

5 Separation Cutoff

In this section we prove Theorem B, namely establish cutoff in the separation metric for an appropriate regime of k. Recall that, for t ≥ 0, the separation distance is defined by
s(t) := max_{x,y} {1 − P_t(x, y)/π(y)},
where P_t(x, y) is the time-t transition probability from x to y and π the invariant distribution. We write s^±_{G_k,N} when considering sequences (k_N, G_N)_{N ∈ N}, analogously to d^±_{G_k,N}. We now state the main theorem; as for the previous theorems, there are conditions on (k, G).

Hypothesis D.
The sequence (k_N, G_N)_{N ∈ N} satisfies Hypothesis D if the following hold:
lim inf_{N→∞} (k_N − d(G_N)) / max{(log |G_N|/k_N)², (log |G_N|)^{1/2}} = ∞ and lim sup_{N→∞} (log k_N)/log |G_N| = 0.

Theorem 5.1.
Let (k_N)_{N ∈ N} be a sequence of positive integers and (G_N)_{N ∈ N} a sequence of finite, Abelian groups; for each N ∈ N, define Z^{(N)} := [Z_1, ..., Z_{k_N}] by drawing Z_1, ..., Z_{k_N} ∼ iid Unif(G_N). Suppose that the sequence (k_N, G_N)_{N ∈ N} satisfies Hypothesis D. Let c ∈ (−1, ∞) \ {0}. Then
s^±_{G_k,N}((1 + c) t_∗(k_N, G_N)) → 1{c < 0} (in probability) as N → ∞.
That is, there is cutoff in the separation metric at t_∗(k, G) whp. Moreover, the implicit lower bound on the separation distance holds deterministically, ie for all choices of generators.

Remark 5.2.
It is easy to check that Hypothesis D is satisfied when k ≳ (log |G|)^{3/4}, k − d(G) ≫ (log |G|)^{1/2} and log k ≪ log |G| (simultaneously). △

The proof uses the previously established TV mixing time bound as a building block.

5.2 Lower Bound
Since TV is a lower bound on separation (see, eg, [34, Lemma 6.16]), the lower bound follows from the TV result. References for the TV result are as follows: see Theorem 3.6 for k ≲ log |G|; for k ≫ log |G|, TV cutoff had already been established at time t_∗(k, G).

5.3 Upper Bound

We analyse the upper bound in Theorem 5.1 via a sequence of lemmas. From these, the upper bound in Theorem 5.1 follows immediately. Throughout, Hypothesis D should be assumed.
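The relation used here — TV distance is dominated by separation distance — is a generic fact about Markov chains, so it can be sanity-checked numerically. Below is a minimal sketch of our own (not from the paper), using exact rational arithmetic for the lazy simple random walk on the cycle Z_8; all names are illustrative.

```python
# Check numerically that worst-case TV distance is at most separation distance,
# s(t) := max_{x,y} {1 - P^t(x,y)/pi(y)}, for the lazy SRW on the cycle Z_8.
from fractions import Fraction

N = 8
# lazy SRW on Z_N: stay w.p. 1/2, step +-1 w.p. 1/4 each
P = [[Fraction(0)] * N for _ in range(N)]
for x in range(N):
    P[x][x] += Fraction(1, 2)
    P[x][(x + 1) % N] += Fraction(1, 4)
    P[x][(x - 1) % N] += Fraction(1, 4)

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

pi = Fraction(1, N)  # uniform invariant distribution
Pt = P
for t in range(1, 11):
    # tv: worst-case TV distance at time t; sep: separation distance at time t
    tv = max(sum(max(pi - Pt[x][y], 0) for y in range(N)) for x in range(N))
    sep = max(1 - Pt[x][y] / pi for x in range(N) for y in range(N))
    assert tv <= sep  # TV is a lower bound on separation
    Pt = mat_mul(Pt, P)
```

With exact fractions the inequality holds at every step, with both distances decreasing towards 0.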
Preliminaries.
For y, z ∈ G and t ≥ 0, write P_t(y, z) := P_y(S(t) = z) for the time-t transition probability from y to z. Write n := |G|. We want to show, for fixed ξ > 0, that
min_{x ∈ G} P^±_t(0, x) ≥ n^{−1}(1 − o(1)) for some t ≤ (1 + 2ξ) t^±_∗(k, G).
Abbreviate d := d(G). Let χ = o(1) to be specified later. Throughout the proof, we impose conditions on χ; at the end of the proof, we show that these are equivalent to Hypothesis D. Set k′ := k − χ(k − d); then k′ ≍ k and k′ − d = (1 − χ)(k − d) ≍ k − d ≫ 1. Let A := [Z_1, ..., Z_{k′}] be the first k′ generators and B := [Z_{k′+1}, ..., Z_k] be the remaining k − k′ = χ(k − d). Since G is Abelian, P_t = P_{t,A} P_{t,B}, where in P_{t,A}, respectively P_{t,B}, we pick each generator of A, respectively B, at rate 1/k independently. (In words, first apply the generators from A and then those from B.)

Let ξ > 0 be fixed and set t′ := (1 + ξ) t_∗(k′, G). Since there is cutoff at t_∗(k′, G), we can then choose δ = o(1) so that t′ is larger than the δ-TV mixing time for the rate-1 RW on G(A) for a typical choice of A. In the regime k ≫ log n, simply having δ = o(1) will be sufficient. In the regime k ≲ log n, we quantify this δ; since k ≫ √(log n) by Hypothesis D, Approach 2 (§3) applies. We also compare t_∗(k′, G) and t_∗(k, G), the whp-cutoff times for G(A) and G(Z), respectively. The following two auxiliary lemmas have their proofs deferred to §5.4.

Lemma 5.3.
Assume Hypothesis D. When k ≲ log n, there exists a constant c > 0 so that we may choose δ := e^{−c(k−d)}, ie the e^{−c(k−d)}-TV mixing time of the RW on G(A) is at most t′ whp.

Lemma 5.4.
We have t_∗(k′, G) ∼ t_∗(k, G) if and only if χ(k − d) k^{−2} log n ≪ 1.

Assume that χ(k − d) k^{−2} log n ≪ 1, so that t_∗(k′, G) ∼ t_∗(k, G). To relate this to the rate-1 RW on G(Z), rescale time by k/|A| = 1/(1 − χ(k − d)/k): set t := t′/(1 − χ(k − d)/k). Thus t ∼ (1 + ξ) t_∗(k, G), as χ ≪ 1 ≤ k/(k − d); in particular, t ≤ (1 + 2ξ) t_∗(k, G). By monotonicity of the separation distance with respect to time, it thus suffices to show that
min_{x ∈ G} P_t(0, x) ≥ n^{−1}(1 − o(1)).

Lemma 5.5.
Assume that χ(k − d) k^{−2} log n ≪ 1. Suppose that we can choose δ, χ, η ≪ 1 so that, for all (deterministic) sets D ⊆ G with |G \ D| ≤ δ|G| and all x ∈ G uniformly, we have
P(Q_B(x, D) ≤ 1 − η) = o(1/|G|), where Q_B(y, z) := |B^±|^{−1} ∑_{b ∈ B^±} 1(y + b^{−1} = z) for y, z ∈ G,
where B^+ := B and B^− := B ∪ B^{−1} (as multisets). Then min_{x ∈ G} P_t(0, x) ≥ n^{−1}(1 − o(1)) whp.

Proof.
We condition on a typical realisation of A: write A := {a | t_mix(δ; G(a)) ≤ t′} and condition on A = a for a fixed a ∈ A. We have P(A ∈ A) = 1 − o(1). Given A = a ∈ A, the set
D := {z ∈ G | P_{t′,a}(0, z) ≥ n^{−1}(1 − δ)} satisfies |D| ≥ n(1 − δ).
For the undirected case (ie the RW on G^−_k), by reversibility, conditional on A, we have
P^−_t(0, x) ≥ P^−_{t,B}(x, D) · n^{−1}(1 − δ).
While G^+_k is not reversible, Cayley graphs have the special property that a step 'backwards' with a generator z corresponds to a step 'forwards' with z^{−1}. Thus P^+_t(0, x) ≥ Q^+_{t,B}(x, D) · n^{−1}(1 − δ), where Q^+_{·,B} is the heat kernel for the RW on G^+(B^{−1}), where B^{−1} := [z^{−1} | z ∈ B], rather than on G(B). For the RW on G^−_k, replacing the generators with their inverses has no effect on the graph (or RW); set Q^−_{·,B} := P^−_{·,B}. We want to show that Q_{t,B}(x, D) = 1 − o(1) uniformly in x ∈ G whp. This is a RW on G^±(B^{−1}) run for time t. By considering just the final step of this RW, we now argue that the hypothesis of the lemma is sufficient. Indeed, first note that
min_x Q_{t,B}(x, D) ≥ (1 − e^{−t|B|/k}) · min_x Q_B(x, D),
where e^{−t|B|/k} is the probability that none of the generators in B are applied by time t. We impose below the condition that χ(k − d)² ≫ log n; since k − d ≤ k ≲ log n, this in particular implies that |B| = χ(k − d) ≫ 1. Since t ≳ k, we deduce that t ≫ k/|B|, ie e^{−t|B|/k} = o(1). Thus the above failure probability allows us to perform a union bound to say, conditional on A = a ∈ A, that
P(min_x Q_{t,B}(x, D) ≤ 1 − η | A = a) = o(1),
where the randomness is over the generators B, provided η decays sufficiently slowly. For A ∈ A we have the desired lower bound on min_x P_t(0, x). Finally we average over A and use P(A ∈ A) = 1 − o(1) to show that min_x P_t(0, x) ≥ n^{−1}(1 − o(1)) whp.

We next find conditions under which the supposition of the lemma is satisfiable.

Lemma 5.6.
Assume that χ(k − d) k^{−2} log n ≪ 1 and χ(k − d)² ≫ log n. We can choose δ, χ, η ≪ 1 so that, for all (deterministic) sets D ⊆ G with |G \ D| ≤ δ|G| and all x ∈ G uniformly, we have
P(Q_B(x, D) ≤ 1 − η) = o(1/|G|), where Q_B(y, z) := |B^±|^{−1} ∑_{b ∈ B^±} 1(y + b^{−1} = z) for y, z ∈ G,
where B^+ := B and B^− := B ∪ B^{−1} (as multisets).

Proof.
Fix an arbitrary x ∈ G. We desire at least a proportion 1 − η of the generators in B to connect x to D. The generators are chosen independently, and each connects with probability |D|/|G| ≥ 1 − δ. Since there are χ(k − d) generators, it thus suffices to choose η ≪ 1 with
P(Bin(χ(k − d), 1 − δ) ≤ χ(k − d)(1 − η)) = o(1/|G|).
Let L := χ(k − d). Direct calculation, using standard inequalities, gives
P(Bin(L, 1 − δ) ≤ L(1 − η)) = P(Bin(L, δ) ≥ ηL) ≤ C(L, ηL) δ^{ηL} ≤ (Le/(ηL))^{ηL} δ^{ηL} = (δe/η)^{ηL} = (δe/η)^{ηχ(k−d)}.
We require this to be o(1/n). Here we separate the regimes k ≲ log n and k ≫ log n.

Consider first k ≫ log n; necessarily, k − d ≍ k. In this case, we do not quantify δ: we simply know that δ = o(1). Choosing η and χ to vanish sufficiently slowly (compared with δ) gives (δe/η)^{ηχ} = o(1). Raising this to the power k − d ≍ k ≫ log n, we obtain super-polynomial decay.

Consider now k ≲ log n. We use the quantification δ = e^{−c(k−d)} from Lemma 5.3. Choosing η to satisfy η ≥ e^{−c(k−d)/2+1}, we deduce that (δe/η)^{ηχ(k−d)} ≤ exp(−c η χ (k − d)²/2). Choosing η vanishing sufficiently slowly so that ηχ(k − d)² ≫ log n gives
P(Q_B(x, D) ≤ 1 − η) ≤ P(Bin(χ(k − d), δ) ≥ ηχ(k − d)) ≤ (δe/η)^{ηχ(k−d)} = o(1/|G|).
This bound is independent of x, and hence holds for all x ∈ G uniformly, as required.

It remains to show that the above conditions are satisfied under Hypothesis D.

Lemma 5.7.
Suppose that Hypothesis D holds. Then we can choose some χ = o(1) satisfying
χ(k − d) k^{−2} log n ≪ 1 and χ(k − d)² ≫ log n.
In fact, Hypothesis D is equivalent to being able to pick such a χ.

Proof.
We defer the proof of this auxiliary lemma to §5.4.

5.4 Auxiliary Lemmas

It remains to give the deferred proofs of the auxiliary lemmas.
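Before the deferred proofs, the Chernoff-type tail estimate used in the proof of Lemma 5.6 — P(Bin(L, δ) ≥ ηL) ≤ (δe/η)^{ηL}, via C(L, m) ≤ (Le/m)^m — can be spot-checked numerically. The parameters below are illustrative choices of our own.

```python
# Compare the exact binomial upper tail with the bound (delta*e/eta)^(eta*L).
from math import comb, e

def binom_upper_tail(L, delta, m):
    # exact P(Bin(L, delta) >= m)
    return sum(comb(L, j) * delta**j * (1 - delta)**(L - j) for j in range(m, L + 1))

L, delta, eta = 200, 0.01, 0.1
m = int(eta * L)  # eta*L = 20 successes
exact = binom_upper_tail(L, delta, m)
bound = (delta * e / eta) ** (eta * L)
assert exact <= bound  # the union/Stirling bound dominates the exact tail
```

Here the exact tail is of order 10^{-13} while the bound is of order 10^{-12}; the bound is crude but, as in the proof, it decays exponentially in ηL.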
Proof of Lemma 5.3.
By Hypothesis D, we have k ≳ (log n)^{2/3} and k − d ≳ (log n)^{1/2}; hence
−log(δ_∞|Γ|) ≍ k − d ≍ −log(δ_∞|H|) and log |Γ| ≤ log |H| ≪ k − d.
Also, −log δ_∞ ≍ k − d. By Proposition 3.13, this means that the TV distance conditioned on typicality is at most e^{−c(k−d)} for some constant c > 0, as desired. It remains to check that typicality holds with sufficiently high probability, ie with probability at least 1 − e^{−c(k−d)} for some constant c > 0. Following the proof of the upper bound in Theorem 3.6 then gives the quantification.

Quantifying the o(1)-error in Lemma 3.9 using the above relations, we see that global typicality fails with probability at most e^{−c(k−d)} for some constant c > 0. Lastly, we used [27, Definitions C.1 and C.2 and Proposition C.3] to say that local typicality fails with probability o(1); this can be quantified using [27, Propositions C.5 and C.6]: replace r_∗ in the definition of typicality by r_∗ log n. Then the quantified results show that the failure probability decays super-polynomially in n. Since k − d ≤ k ≲ log n, this is at most e^{−c(k−d)} for some constant c > 0. This adjustment to r_∗ increases |Γ| and |H| by a factor at most log n; this has no effect, as log log n ≪ k − d.

Proof of Lemma 5.4.
We have k ∼ k′ and k − d ∼ k′ − d. Observe that n^{2/k} ∼ n^{2/k′} if and only if
1 ≫ (1/k′ − 1/k) log n = (χ(k − d)/(k k′)) log n, ie χ(k − d) k^{−2} log n ≪ 1.
The claim follows by Proposition 3.2a for 1 ≪ k ≲ log n. On the other hand, if k ≫ log n, then t_∗(k, G) ∼ T(k, n) := (log n)/log(k/log n), and it is easy to check that T(k, n) ∼ T(k′, n) when k ∼ k′ ≫ log n.

Proof of Lemma 5.7.
Rearranging the conditions, they are equivalent to
√(log n/χ) ≪ k − d ≪ k²/(χ log n) for some 1/(k − d) ≤ χ ≪ 1.
Replacing χ with χω for a suitable ω ≫ 1, we may replace the constraint χ ≥ 1/(k − d) with χ ≫ 1/(k − d); this gives an equivalent set of conditions. Let ε ∈ (0, ∞) and set χ := ε k²/((k − d)² log n); then
√(log n/χ) = (k − d)(log n)/(√ε k).
The conditions on χ then, in terms of ε, become
(log n)²/k² ≪ ε ≪ k − d and (k − d)(log n)/k² ≪ ε ≪ (k − d)²(log n)/k².
We can find such an ε ∈ (0, ∞), implicitly a sequence, if and only if
max{(log n)²/k², (k − d)(log n)/k²} ≪ min{k − d, (k − d)²(log n)/k²}.
Some case analysis shows that this condition is equivalent to the first condition of Hypothesis D.
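The algebra in this proof can be spot-checked numerically. The sketch below uses illustrative parameters of our own choosing (think of n = e^{700}, so log n = 700) and verifies that, with χ := εk²/((k−d)² log n), the two conditions reduce to ε/(k−d) being small and εk²/(log n)² being large.

```python
# Numeric spot-check of the substitution chi := eps*k^2/((k-d)^2 * log n):
#   chi*(k-d)*log(n)/k^2 = eps/(k-d)        (should be small)
#   chi*(k-d)^2/log(n)   = eps*k^2/log(n)^2 (should be large)
logn = 700.0        # think n = e^700
k, d = 350, 150     # sample regime with k >> sqrt(log n) and k - d >> sqrt(log n)
eps = 30.0          # any eps with (log n / k)^2 << eps << k - d
chi = eps * k**2 / ((k - d) ** 2 * logn)

cond_small = chi * (k - d) * logn / k**2   # first condition, should be << 1
cond_large = chi * (k - d) ** 2 / logn     # second condition, should be >> 1

assert chi < 1
assert abs(cond_small - eps / (k - d)) < 1e-9   # algebraic identity
assert cond_small < 0.2 and cond_large > 5
```

Of course, "≪" and "≫" are asymptotic statements; at a single finite parameter choice one only sees the two ratios separated, as above.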
6 Nilpotent Groups

In this section we compare the mixing time of a general nilpotent group G with that of a 'corresponding' Abelian group Ḡ: we show that t_mix(G_k) ≤ (1 + o(1)) t_mix(Ḡ_k) whp. We apply this to upper bound the 1/n^c-mixing times, for some constant c > 0, for G_k: we show that this is of order log n, from which we deduce that the graph is an expander, whp.

6.1 Precise Statements

We compare the mixing time for G with that for Ḡ. Specifically, we prove Theorem D, which we recall here for the reader's convenience.

Theorem 6.1.
Let G be a nilpotent group. Set Ḡ := ⊕_{ℓ=1}^L (G_{ℓ−1}/G_ℓ), where (G_ℓ)_{ℓ≥0} is the lower central series of G and L := min{ℓ ≥ 1 | G_ℓ = {id}}. Suppose that 1 ≪ log k ≪ log |G| and k − d(Ḡ) ≫ 1. Let ε > 0 and let t ≥ (1 + ε) t_∗(k, Ḡ). Then d_{G_k}(t) = o(1) whp.

Remark.
An upper bound valid for all groups has already been established in the regime k ≫ log |G| at T(k, |G|) ∼ t_∗(k, G); recall Remark A.5. Thus we need only consider 1 ≪ k ≲ log |G|. △

We use this mixing time bound to show that G_k for nilpotent G is an expander whp when k − d(Ḡ) ≳ log |G|. Specifically, we prove Theorem E, which we recall here for the reader's convenience. The isoperimetric constant was defined in Definition E for d-regular graphs:
Φ_∗ := min_{1 ≤ |S| ≤ |V|/2} Φ(S) where Φ(S) := (d|S|)^{−1} |[{a, b} ∈ E | a ∈ S, b ∈ S^c]|.
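As a toy illustration of this definition (ours, not from the paper), Φ_∗ can be computed by brute force for the smallest interesting Cayley graph: Z_8 with generators {+1, −1}, a 2-regular cycle.

```python
# Brute-force isoperimetric constant of the cycle Cayley graph of Z_8,
# minimising Phi(S) over all S with 1 <= |S| <= |V|/2.
from itertools import combinations

n, deg = 8, 2
V = range(n)

def edge_boundary(S):
    # number of (undirected) edges with exactly one endpoint in S
    S = set(S)
    return sum(1 for a in S for b in ((a + 1) % n, (a - 1) % n) if b not in S)

phi = min(edge_boundary(S) / (deg * len(S))
          for r in range(1, n // 2 + 1)
          for S in combinations(V, r))
# the minimiser is an arc of length n/2: boundary 2, so Phi* = 2/(2*(n/2)) = 1/4
assert phi == 2 / (deg * (n // 2))
```

As expected for a cycle, Φ_∗ → 0 as n grows — cycles are the archetypal non-expanders, in contrast with the random Cayley graphs of Theorem E.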
Theorem 6.2.
Let G be a nilpotent group. Set Ḡ := ⊕_{ℓ=1}^L (G_{ℓ−1}/G_ℓ), where (G_ℓ)_{ℓ≥0} is the lower central series of G and L := min{ℓ ≥ 1 | G_ℓ = {id}}. Suppose that k − d(Ḡ) ≳ log |G|. Then Φ_∗(G_k) ≍ 1 whp.

The isoperimetric constant is defined more generally for Markov chains; see [34].

6.2 Outline of the Proof

Let L be the minimal integer such that G_L is the trivial group. Consider the series of quotients (Q_ℓ := G_{ℓ−1}/G_ℓ)_{ℓ=1}^L. For each ℓ ∈ [L], choose a set R_ℓ ⊆ G_{ℓ−1} of representatives for Q_ℓ = G_{ℓ−1}/G_ℓ. In order to sample Z_i ∼ Unif(G), it suffices to sample Z_{i,ℓ} ∼ Unif(R_ℓ) for each ℓ independently and then take the product: Z_i := Z_{i,1} ⋯ Z_{i,L}; see Lemma 6.3. Then Z_{i,ℓ}G_ℓ ∼ Unif(Q_ℓ), independently for each i and ℓ; see Corollary 6.4.

Suppose that M steps are taken; let σ: [M] → [k] indicate which generator is used in each step. Set S := ∏_{m=1}^M Z_{σ(m)}. For each ℓ ∈ [L], let S_ℓ := ∏_{m=1}^M Z_{σ(m),ℓ}; this is the projection of S to Q_ℓ. Then each S_ℓG_ℓ is a RW on Q_ℓ, which is an Abelian group, but all using the same choice σ. Since these are RWs on Abelian groups, the ordering in σ will not matter. For each i ∈ [k], let W_i be the number of times in σ that generator Z_i has been applied minus the number of times that Z_i^{−1} has been applied. Let σ′ be an independent copy of σ and define S′ and W′ via σ′ and Z; for each ℓ ∈ [L], define S′_ℓ := ∏_{m=1}^M Z_{σ′(m),ℓ}. Then S and S′ are iid conditional on Z.

To compare the RW on the nilpotent group with one on an Abelian group, we show that
n P(S = S′ | (W, W′)) ≤ n ∏_{ℓ=1}^L P(S_ℓG_ℓ = S′_ℓG_ℓ | (W, W′)) = |Ḡ/gḠ|,
where g := gcd(W_1 − W′_1, ..., W_k − W′_k, n); see Proposition 6.6 and Corollary 6.9. Via analysing |Ḡ/gḠ|, we showed in §§2–4 that the RW on Ḡ_k is mixed whp shortly after t_∗(k, Ḡ); see specifically Lemma 2.11.
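The representative-based sampling just described can be verified exactly on the smallest non-Abelian nilpotent example. The encoding below is our own: the Heisenberg group over Z_2 (upper unitriangular 3×3 matrices), whose lower central series is G_0 = G, G_1 = [G, G] (the centre), G_2 = {id}.

```python
# G encoded as triples (a, b, c) over Z_2 with
# (a,b,c)*(a',b',c') = (a+a', b+b', c+c'+a*b') mod 2.
from itertools import product

p = 2
def mul(x, y):
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p, (x[2] + y[2] + x[0] * y[1]) % p)

G = list(product(range(p), repeat=3))
R1 = [(a, b, 0) for a in range(p) for b in range(p)]  # representatives of G/G_1
R2 = [(0, 0, c) for c in range(p)]                    # G_1 itself

# Lemma 6.3 in miniature: Y_1 ~ Unif(R1), Y_2 ~ Unif(R2) gives Y_1*Y_2 ~ Unif(G),
# because the map (y1, y2) -> y1*y2 is a bijection R1 x R2 -> G.
products = [mul(y1, y2) for y1 in R1 for y2 in R2]
assert sorted(products) == sorted(G)  # each element of G is hit exactly once
```

The same bijection argument works for any choice of coset representatives R_ℓ, which is the content of Lemma 6.3 below.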
From this and the inequality above, we are able to deduce that the RW on G_k is mixed whp shortly after the same time.

Let L be the minimal integer such that G_L is the trivial group. Consider the series of quotients (Q_ℓ := G_{ℓ−1}/G_ℓ)_{ℓ=1}^L. For each ℓ ∈ [L], choose a set R_ℓ ⊆ G_{ℓ−1} of representatives for Q_ℓ = G_{ℓ−1}/G_ℓ, ie a set R_ℓ with |R_ℓ| = |Q_ℓ| and {rG_ℓ}_{r ∈ R_ℓ} = G_{ℓ−1}/G_ℓ = Q_ℓ. We want to sample the uniform generators by using uniform random variables on each of the quotients. In this way, projecting to one of the quotients, we get a RW on this quotient. The following two proofs are deferred to [27, Lemma F.5 and Corollary F.6], respectively.

Lemma 6.3.
For each ℓ ∈ [L], let Y_ℓ ∼ Unif(R_ℓ) independently. Then Y := Y_1 ⋯ Y_L ∼ Unif(G).

Corollary 6.4.

For each (i, ℓ) ∈ [k] × [L], sample Z_{i,ℓ} ∼ Unif(R_ℓ) independently and set Z_i := Z_{i,1} ⋯ Z_{i,L}. Then Z_1, ..., Z_k ∼ iid Unif(G). Further, Z_{i,ℓ}G_ℓ ∼ Unif(Q_ℓ), independently for each (i, ℓ).

For the remainder of the section, assume that Z is drawn in this way. The next main result (Proposition 6.6) is the key element of the proof of Theorem 6.1. Informally, it reduces the problem to a collection of Abelian calculations, the like of which were handled when we established cutoff when the underlying group was Abelian. We first need a preliminary 'worst-case' lemma. As is standard, we write 0 for the identity of an Abelian group.

Lemma 6.5.
Let H be an Abelian group. Let Z_1, ..., Z_k ∼ iid Unif(H). Let v ∈ Z^k. Then
max_{h ∈ H} P(v · Z = h) = P(v · Z = 0).

Proof.
Let h ∈ H. Write A(h) := {z ∈ H^k | v · z = h}. If w ∈ A(h), then B := {z − w | z ∈ A(h)} ⊆ A(0); also, clearly, |B| = |A(h)|, so |A(h)| ≤ |A(0)|. Hence
P(v · Z = h) = |A(h)|/|H|^k ≤ |A(0)|/|H|^k = P(v · Z = 0).

We now prove the decomposition theorem. It crucially uses the nilpotency of the group.
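Lemma 6.5 is easy to confirm exhaustively on a small example; the parameters below (H = Z_6, k = 2, v = (2, 4)) are our own illustrative choice.

```python
# Exhaustive check that h = 0 maximises P(v . Z = h) for v = (2, 4) over Z_6^2.
from itertools import product
from collections import Counter

m, v = 6, (2, 4)
counts = Counter((v[0] * z1 + v[1] * z2) % m
                 for z1, z2 in product(range(m), repeat=2))
assert counts[0] == max(counts.values())
```

With this v, the distribution is uniform on the subgroup 2Z_6 = {0, 2, 4} (so the maximum is attained at 0, among others), which is exactly the gcd phenomenon formalised in Lemma 6.7 below.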
Proposition 6.6.
Let
M, M′ ∈ N. Let σ: [M] → [k] and σ′: [M′] → [k]. Let η ∈ {±1}^M and η′ ∈ {±1}^{M′}. For ℓ ∈ [L], set
S_ℓ := ∏_{m=1}^M Z_{σ(m),ℓ}^{η_m}, S′_ℓ := ∏_{m=1}^{M′} Z_{σ′(m),ℓ}^{η′_m}, S := ∏_{m=1}^M Z_{σ(m)}^{η_m} and S′ := ∏_{m=1}^{M′} Z_{σ′(m)}^{η′_m}.
For i ∈ [k], write v_i := ∑_{m ∈ [M′]: σ′(m)=i} η′_m − ∑_{m ∈ [M]: σ(m)=i} η_m. Then
P(S = S′) ≤ ∏_{ℓ=1}^L P(S_ℓG_ℓ = S′_ℓG_ℓ) = ∏_{ℓ=1}^L P(∑_{i=1}^k v_i Z_{i,ℓ}G_ℓ = id(Q_ℓ)).

Proof.
The claimed equality follows immediately from the fact that Q_ℓ is Abelian.

We now set up a little notation. Write A_{i,ℓ} := Z_{i,1} ⋯ Z_{i,ℓ−1} and B_{i,ℓ} := Z_{i,ℓ+1} ⋯ Z_{i,L}; then Z_i = A_{i,ℓ} Z_{i,ℓ} B_{i,ℓ}. (Here, A_{i,1} := id and B_{i,L} := id.) Note that B_{j,ℓ} ∈ G_ℓ for all j ∈ [k] and ℓ ∈ [L]. Let E_ℓ := {S′S^{−1} ∈ G_ℓ}. Then P(S = S′) = ∏_{ℓ=1}^L P(E_ℓ | E_{ℓ−1}).

For all g ∈ G and h ∈ G_{ℓ−1}, we have [g, h] ∈ G_ℓ and hg = gh[h^{−1}, g^{−1}] = gh[g, h]^{−1}. We can hence write S′S^{−1} in the following way:
S′S^{−1} = M_ℓ N_ℓ · (∏_{m=1}^{M′} B_{σ′(m),ℓ}^{η′_m} C′_{σ′(m),ℓ}) · (∏_{m=1}^M B_{σ(M+1−m),ℓ}^{−η_{M+1−m}} C_{σ(M+1−m),ℓ}),
for some C_{j,ℓ}, C′_{j,ℓ} ∈ G_ℓ and with M_ℓ and N_ℓ defined as follows:
M_ℓ := (∏_{m=1}^{M′} A_{σ′(m),ℓ}^{η′_m}) · (∏_{m=1}^M A_{σ(M+1−m),ℓ}^{−η_{M+1−m}}) and N_ℓ := (∏_{m=1}^{M′} Z_{σ′(m),ℓ}^{η′_m}) · (∏_{m=1}^M Z_{σ(M+1−m),ℓ}^{−η_{M+1−m}}) ∈ G_{ℓ−1}.
We thus see that E_{ℓ−1} = {S′S^{−1} ∈ G_{ℓ−1}} holds if and only if {M_ℓ ∈ G_{ℓ−1}} holds. Crucially, this implies that the indicator 1(E_{ℓ−1}) of this event is independent of N_ℓ. We claim the following:
given that S′S^{−1} ∈ G_{ℓ−1}, we have S′S^{−1} ∈ G_ℓ if and only if M_ℓN_ℓ ∈ G_ℓ.
To prove this, first make the following observations, recalling that G_{ℓ−1}/G_ℓ is Abelian and G_ℓ is normal in G:
· for all α ∈ G_ℓ, we have αG_ℓ = G_ℓ, and (αβ)G_ℓ = (αG_ℓ)(βG_ℓ) for all α, β ∈ G_{ℓ−1};
· B_{j,ℓ}, C_{j,ℓ}, C′_{j,ℓ} ∈ G_ℓ for all j ∈ [k], and N_ℓ ∈ G_{ℓ−1};
· S′S^{−1} ∈ G_{ℓ−1} if and only if M_ℓ ∈ G_{ℓ−1}, and so M_ℓN_ℓ ∈ G_{ℓ−1}.
Assume that S′S^{−1} ∈ G_{ℓ−1}. Applying these observations in the formula above gives
S′S^{−1}G_ℓ = (M_ℓN_ℓG_ℓ) · (∏_{m=1}^{M′} (B_{σ′(m),ℓ}^{η′_m}G_ℓ)(C′_{σ′(m),ℓ}G_ℓ)) · (∏_{m=1}^M (B_{σ(M+1−m),ℓ}^{−η_{M+1−m}}G_ℓ)(C_{σ(M+1−m),ℓ}G_ℓ)) = M_ℓN_ℓG_ℓ.
Thus S′S^{−1} ∈ G_ℓ if and only if M_ℓN_ℓ ∈ G_ℓ, as claimed.

Now, M_ℓ is independent of N_ℓ, and so N_ℓ is independent also of 1(E_{ℓ−1}). Thus
P(E_ℓ | E_{ℓ−1}) = P(M_ℓN_ℓ ∈ G_ℓ | E_{ℓ−1}) ≤ max_{x ∈ G_{ℓ−1}} P(xN_ℓ ∈ G_ℓ).
Now, G_{ℓ−1}/G_ℓ is Abelian and N_ℓ is a product of generators Z_{j,ℓ} and Z_{j,ℓ}^{−1} for different j ∈ [k]. Hence we are in the set-up of Lemma 6.5. Applying said lemma, we deduce that
P(E_ℓ | E_{ℓ−1}) ≤ P(N_ℓ ∈ G_ℓ) = P(S_ℓG_ℓ = S′_ℓG_ℓ),
using the definition of N_ℓ. This proves the desired inequality.

When establishing cutoff for RWs on Abelian groups, we had to bound a very similar expression to those in the product of Proposition 6.6. In particular, since the Q_ℓ are Abelian groups, it does not matter in which order the generators are applied. So instead of considering the exact sequence σ: [M] → [k], it suffices to consider W, where W_i := ∑_{m=1}^M 1(σ(m) = i) for each i ∈ [k].

Key in analysing these Abelian-type terms are gcds: for all w, w′ ∈ Z^k, define
g_{(w,w′)} := gcd(w_1 − w′_1, w_2 − w′_2, ..., w_k − w′_k, |G|).
We use this to evaluate the right-hand side of Proposition 6.6, culminating in Corollary 6.9.
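The gcd fact that drives the next lemma can be checked exhaustively in a cyclic example (parameters our own): a linear combination of independent uniforms on Z_m is uniform on the subgroup gZ_m, with g the gcd of the coefficients and m.

```python
# Check: v_1*Z_1 + v_2*Z_2 mod m is Unif(g*Z_m), g := gcd(v_1, v_2, m).
from itertools import product
from collections import Counter
from math import gcd

m, v = 12, (4, 6)
g = gcd(gcd(v[0], v[1]), m)   # = 2 here
counts = Counter((v[0] * a + v[1] * b) % m for a, b in product(range(m), repeat=2))
support = sorted(counts)
assert support == list(range(0, m, g))            # support is exactly g*Z_m
assert len(set(counts[s] for s in support)) == 1  # and the law is uniform on it
```

This is the cyclic case of the statement cited from [27, Lemma F.1]; the general Abelian case follows coordinate-wise from the decomposition into cyclic factors.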
Lemma 6.7.
Let ℓ ∈ [L]. For all w, w′ ∈ Z^k, we have ∑_{i=1}^k v_i Z_{i,ℓ}G_ℓ ∼ Unif(g_{(w,w′)} Q_ℓ).

Proof.
Corollary 6.4 says that each Z_{i,ℓ}G_ℓ is an independent Unif(Q_ℓ). [27, Lemma F.1] in the supplementary material says that a linear combination of independent uniform random variables in an Abelian group is also uniform, but on the subgroup given by the gcd of the coefficients.

This leads us to a bound on P_{(w,w′)}(S = S′) in terms of a product of |Q_ℓ|/|γQ_ℓ| over ℓ ∈ [L], for some γ which is a suitable gcd. The following lemma controls this product.

Lemma 6.8.
For all γ ∈ N, we have ∏_{ℓ=1}^L |γQ_ℓ| = |γḠ|.

Proof.
For any Abelian groups A and B and any γ ∈ N, we have γ(A ⊕ B) = (γA) ⊕ (γB) and |A ⊕ B| = |A||B|. Since Ḡ was defined to be a direct sum of the Q_ℓ, the claim now follows.

Let (S′, W′) be an independent copy of (S, W). Combining Proposition 6.6 and Lemmas 6.7 and 6.8 gives the following corollary. For w, w′ ∈ Z^k, write P_{(w,w′)}(·) := P(· | (W, W′) = (w, w′)).

Corollary 6.9.
For all w, w′ ∈ Z^k, we have
n P_{(w,w′)}(S = S′) ≤ ∏_{ℓ=1}^L |Q_ℓ|/|g_{(w,w′)} Q_ℓ| = |Ḡ|/|g_{(w,w′)} Ḡ| = |Ḡ/g_{(w,w′)} Ḡ|.

Proof. Note that |Q_ℓ| divides |G|, and so gcd(v_1, ..., v_k, |Q_ℓ|) ≤ gcd(v_1, ..., v_k, |G|) for all v ∈ Z^k. Also, for any Abelian group H, if α | β, then βH is a subgroup of αH, so |βH| ≤ |αH|. Combined with Proposition 6.6 and Lemma 6.7, this proves the inequality. The first equality follows immediately from Lemma 6.8. The second equality follows from Lagrange's theorem.

Observe that the right-hand side of this corollary depends only on the Abelian group Ḡ. By applying the results used for Abelian groups, we can prove Theorem 6.1; we explain this now. Here, as there, we use a modified L² calculation; see Lemma 2.6.

Lemma 6.10 (Lemma 2.6). For all t ≥ 0 and all W ⊆ Z^k, the following inequalities hold:
d_{G_k}(t) = ‖P_{G_k}(S(t) ∈ ·) − π_G‖_TV ≤ ‖P_{G_k}(S(t) ∈ · | W(t) ∈ W) − π_G‖_TV + P(W(t) ∉ W);
4 E(‖P_{G_k}(S(t) ∈ · | W(t) ∈ W) − π_G‖²_TV) ≤ n P(S(t) = S′(t) | W(t), W′(t) ∈ W) − 1.

Proof of Theorem 6.1.
Let
W ⊆ Z^k be arbitrary for the moment. Set D := n P(S = S′ | typ) − 1 with typ := {W, W′ ∈ W}. Abbreviate g := g_{(W,W′)}. Applying now Corollary 6.9, we obtain
D ≤ ∑_{γ ∈ N} P(g = γ | typ) · |Ḡ/γḠ| − 1.
This latter expression is purely a statistic of the Abelian group Ḡ. We established the upper bound on mixing by looking at precisely this quantity. Bounding it was one of the main challenges. There were three different arguments for bounding it, corresponding to different regimes of k. We briefly outline these arguments now. The choice of W varies from argument to argument.
· In §2, we used the bound |Ḡ/γḠ| ≤ γ^{d(Ḡ)}; we then used unimodality to show that P(γ divides W_i | W_i ≠ 0) ≤ 1/γ, and converted this into P(g = γ | typ) ≤ (1/γ + P(W_1 = 0 | typ))^k.
· In §3, we considered (W, W′) taken modulo γ, for each γ; we then used entropic considerations to bound P(g = γ | typ) ≪ 1/|Ḡ/γḠ| in a quantitative sense.
· In §4, we interpolated between these two approaches.
The arguments required different conditions on (k, G); see Hypotheses A to C. At least one of these is satisfied if 1 ≪ k ≲ log |G| and k − d(G) ≫ 1; see Remarks 2.5, 3.7 and 4.2.

We need to choose the set W; see Definitions 2.7 and 3.8 for the respective definitions. (In those definitions, replace G with Ḡ.) See Propositions 2.9, 3.13 and 4.6 specifically for the results bounding this sum. The conclusion of these results is that
D ≤ ∑_{γ ∈ N} P(g = γ | typ) · |Ḡ/γḠ| − 1 = o(1).
Combined with the modified L² calculation of Lemma 6.10, this completes the proof.

In this subsection we prove Corollaries D.1 to D.3. Namely, we establish cutoff for the RW on a nilpotent group with small commutator subgroup—this includes high-dimensional Heisenberg groups.
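The quotient-size formula behind the first bullet — for G = ⊕_j Z_{m_j}, |G/γG| = ∏_j gcd(γ, m_j) ≤ γ^d — can be verified by brute force; the parameters below are an illustrative choice of our own.

```python
# Brute-force check of |G/gamma*G| = prod_j gcd(gamma, m_j) <= gamma^d
# for G = Z_2 + Z_4 + Z_12 and gamma = 6.
from itertools import product
from math import gcd

ms = (2, 4, 12)   # invariant factors: m_1 | m_2 | m_3
gamma = 6
G = list(product(*(range(m) for m in ms)))
gammaG = {tuple((gamma * x) % m for x, m in zip(g, ms)) for g in G}
quotient_size = len(G) // len(gammaG)   # |G/gamma*G|, by Lagrange's theorem

expected = 1
for m in ms:
    expected *= gcd(gamma, m)           # = 2 * 2 * 6 = 24
assert quotient_size == expected == 24
assert quotient_size <= gamma ** len(ms)
```

The inequality |G/γG| ≤ γ^d is attained only when γ divides every invariant factor, which is why it is crude for most γ — the observation exploited in §4.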
Proof of Corollary D.1.
The lower bound is a relatively straightforward projection argument; we outline it here, and more details are given in [24]. The Abelianisation G^{ab} is Abelian, so a lower bound of t_∗(k, G^{ab}) holds by Theorem A.

We now argue that it suffices to establish an upper bound of t_∗(k, Ḡ). Indeed, Ḡ = G^{ab} ⊕ [G, G]. The conditions of Corollary D.1 are precisely those of [27, Proposition B.30] with A := G^{ab} and B := [G, G]. Said proposition implies that t_∗(k, Ḡ) ∼ t_∗(k, G^{ab}). Hence an upper bound of t_∗(k, Ḡ) indeed suffices. We now establish this upper bound, analysing k ≫ log |G| and k ≲ log |G| separately.

Consider first k ≫ log |G|. Here it is known that T(k, |G|) = log |G|/log(k/log |G|) ∼ t_∗(k, Ḡ) gives an upper bound, regardless of the underlying group. Consider now k ≲ log |G|. The upper bound of t_∗(k, Ḡ) is immediate from Theorem D.

Recall that t_m(k, Z_m^r) is the time at which the entropy of the RW on Z_m^k reaches log(m^r) = log |Z_m^r|.

Proof of Corollary D.2.
The lower bound argument is exactly the same as for Corollary D.1. For the upper bound, we slightly refine the argument used to prove Theorem D.

First, we claim that G^{ab} ≅ Z_p^r. Indeed, the Frattini subgroup Φ(G) satisfies Φ(G) = [G, G]G^p when G is a p-group, where G^p := ⟨g^p | g ∈ G⟩. By definition of being special, Φ(G) = [G, G]; thus G^p ≤ [G, G]. In particular, the Abelianisation is of exponent p, as required. Thus Ḡ ≅ Z_p^ℓ, where ℓ := r + s, as [G, G] ≅ Z_p^s.

To prove the general upper bound of Theorem D, we cited three approaches used in establishing the upper bound for general Abelian groups in Theorem A, ie §§2–4. Here Ḡ ≅ Z_p^ℓ, not simply a general Abelian group, so we do not need the full generality of these cited approaches. We instead apply the approach of [25, Theorem B]; there we study Z_p^ℓ in detail, using exactly the same modified L² method, but now specialised to this group. The conditions for this approach are only k ≥ ℓ—rather than k − r ≫ s (ie k − ℓ ≫ s, as s ≪ r) previously.

We turn to the entropic time. We have γZ_p^r = Z_p^r unless p | γ. Thus the worst-case γ in t_∗(k, Z_p^r) = max_{γ | p^r} t_γ(k, Z_p^r) is γ = p. Thus t_∗(k, Z_p^r) = t_p(k, Z_p^r).

Regarding group generation, it is standard that to generate a nilpotent group it suffices to generate the Abelianisation, which in this case is Z_p^r. We analyse the probability of generating general Abelian groups in [25, Lemma 8.1]. From this, one sees that k − r ≫ 1 suffices when p ≫ 1.

For the Heisenberg groups H_{m,d}, we have the explicit expression H^{ab}_{m,d} ≅ Z_m^{2d−4}, even for m not prime. This allows us to evaluate t_∗(k, H^{ab}_{m,d}) even when m is not prime, provided m ≫ 1.

Proof of Corollary D.3.
Let r := 2d − 4. Since γZ_m^r = Z_{m/gcd(γ,m)}^r, by replacing γ by gcd(γ, m) we need only consider t_γ(k, Z_m^r) with γ | m. For such γ, we have |Z_m^r/γZ_m^r| = γ^r. Thus t_γ(k, Z_m^r) = t_γ(k, Z_γ^r). We analyse these times in detail in [27, Proposition B.25a]; replace d there with r. It is not difficult to see from said proposition that if k − r ≍ k and m ≫ 1, then the worst-case γ in t_∗(k, Z_m^r) = max_{γ | m^r} t_γ(k, Z_m^r) is γ = m. Thus t_∗(k, Z_m^r) = t_m(k, Z_m^r). Further, [27, Propositions A.2 and B.25a] together show that in fact t_m(k, Z_m^r) ∼ t_∞(k, Z_m^r) under these conditions.

Finally, after Corollary D.2 we mentioned that special groups are ubiquitous amongst p-groups of a given size. We elaborate on this claim in the following remark.

Remark 6.11.
In his classical work [29], Higman gave upper and lower bounds on the number of groups of size p^ℓ for a prime p. The upper bound was later refined by Sims [44]. Together they show that this number is p^{(2/27)ℓ³ + O(ℓ^{8/3})}. The lower bound p^{(2/27)ℓ³ − O(ℓ²)} is obtained from Higman [29, Theorem 2.1] by counting step-2 groups whose Frattini group is equal to the centre and is elementary Abelian of size p^s and of index p^r, where r = ℓ − s. It is classical that such a group is special if and only if it has exponent p, ie every element other than the identity has order p. Higman [29] showed that the number of such groups of size p^ℓ is between p^{sr(r+1)/2 − r² − s²} and p^{sr(r+1)/2 − s(s−1)} if s ≤ r(r+1)/2, and 0 otherwise. A small variant of his argument shows that the number of special groups of size p^{s+r} whose commutator subgroup is of size p^s is between p^{sr(r−1)/2 − r² − s²} and p^{sr(r−1)/2 − s(s−1)} for s ≤ r(r−1)/2. (The variant considers only elements of order p. See also the short argument in Sims [44, Page 152]; there the change is considering the case that b(i, j) = 0 for all 1 ≤ i ≤ r and 1 ≤ j ≤ s.) Taking r ≈ 2ℓ/3 and s ≈ ℓ/3 shows that the count of groups of size p^ℓ is dominated by special groups. △

We analyse the spectral gap via considering the 1/n^c-mixing time for some c > 0, in the regime k ≳ log |G|.

Proposition 6.12.
Let G be a nilpotent group. Suppose that k − d(Ḡ) ≍ k ≍ log |G|. Let ε > 0 and set t := (1 + ε) t^−_∗(k, Ḡ). Then there exists a constant c > 0 so that d^−_{G_k}(t) ≤ |G|^{−c} whp.

Proof.
Consider first Abelian G; here, Ḡ = G. Since Hypothesis D is satisfied, d_{G_k}(t) ≤ e^{−c′(k−d(G))} whp for some constant c′ > 0; note that k − d(G) ≍ log |G| here.

Consider now general nilpotent G. We apply our nilpotent-to-Abelian method: there we upper bounded the modified L² distance for the RW on G (at time t) by the modified L² distance for the RW on Ḡ (at time t); see specifically Proposition 6.6, Lemma 6.7 and Corollary 6.9. For Abelian groups we used the modified L² calculation (in §§2–4), so the same quantification transfers to G.

Proof of Theorem 6.2.
As noted in Remark E, it suffices to consider k ≍ log n. First, use the well-known discrete analogue of Cheeger's inequality, discovered independently by multiple authors: for a discrete-time, finite, reversible Markov chain, writing γ for its spectral gap,

γ/2 ≤ Φ∗ ≤ √(2γ);

see, eg, [34, Theorem 13.10]. The spectral gap for the discrete-time chain is, up to constants, the same as that of the associated (rate-1) continuous-time chain. Thus, to show that Φ∗ ≍ 1, it suffices to show that γ ≍ 1. For a Markov chain on a state space of size n with uniform invariant distribution, writing t_mix(·) for its mixing time and γ for its spectral gap, we have γ^{−1} ≲ t_mix(1/n^c) ≲ γ^{−1} log n for any constant c > 0. Hence it suffices to show that t_mix(1/n^c) ≲ log n for some constant c >
0. This then implies that Φ∗ ≍ 1 when k ≍ log|G|.

We consider, in turn: the case that k is a fixed constant; the case k ≫ log|G|; and cutoff in the L² and relative-entropy distances.

k Is Constant
Throughout the paper we have always been assuming that k → ∞ as |G| → ∞. It is natural to ask what happens when k does not diverge. This case has actually already been covered by Diaconis and Saloff-Coste [16], using their concept of moderate growth. There is no cutoff. Diaconis and Saloff-Coste establish this not only for Abelian groups, but also for nilpotent groups. Recall that a group G is called nilpotent of step at most L if its lower central series terminates in the trivial group after at most L steps: G_0 := G and G_ℓ := [G_{ℓ−1}, G] for ℓ ∈ N, with G_L = {id}.

For a Cayley graph G(Z), use the following notation. Write ∆ := diam G(Z) for its diameter. For the lazy simple random walk on G(Z), write t_rel := t_rel(G(Z)) for the relaxation time (ie the inverse of the spectral gap) and t_mix := t_mix(ε; G(Z)) for the (TV) ε-mixing time, for ε ∈ (0, 1). For a sequence (G_N(Z^(N)))_{N∈N}, add an N-sub/superscript. We phrase the result of Diaconis and Saloff-Coste [16] in our language.

Theorem 7.1 (cf [16, Corollary 5.3]). Let (G_N)_{N∈N} be a sequence of finite, nilpotent groups. For each N ∈ N, let Z^(N) be a symmetric generating set for G_N and write L_N for the step of G_N. Suppose that sup_N |Z^(N)| < ∞ and sup_N L_N < ∞. Then t^N_mix / k_N ≲ ∆_N² ≲ t^N_rel ≲ t^N_mix as N → ∞; in particular, (t^N_mix)_{N∈N} does not exhibit the cutoff phenomenon.
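As a concrete check on the relation between diameter and relaxation time in Theorem 7.1, consider the simplest bounded-degree example: the cycle Z_n with Z = {±1}, where the diameter is ⌊n/2⌋ and the lazy walk's spectrum is explicit. The sketch below is our illustration (not from [16]), with the eigenvalue formula for the lazy cycle hard-coded; it shows t_rel/∆² settling near the constant 4/π², so t_rel and the squared diameter agree up to constants.

```python
import numpy as np

def relaxation_time_lazy_cycle(n):
    # Lazy SRW on the cycle Z_n with Z = {+1, -1}: the transition matrix has
    # eigenvalues (1 + cos(2*pi*j/n)) / 2 for j = 0, ..., n - 1, so the
    # spectral gap is (1 - cos(2*pi/n)) / 2 and t_rel is its inverse.
    gap = (1.0 - np.cos(2.0 * np.pi / n)) / 2.0
    return 1.0 / gap

# diam G(Z) = n // 2 for the cycle; the ratio t_rel / diam^2 settles near
# 4 / pi^2 as n grows.
for n in (50, 100, 200, 400):
    print(n, relaxation_time_lazy_cycle(n) / (n // 2) ** 2)
```

No cutoff occurs for this family, in line with the theorem: mixing is governed by a single diffusive time scale of order ∆².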
We give a very brief exposition of the results of Diaconis and Saloff-Coste [16], including the definition of moderate growth, leading to this conclusion in [25].

k ≫ log|G|

In this subsection we give a very short argument upper bounding the mixing time for arbitrary groups and k ≫ log|G|; it is a small modification of Roichman's argument [43, Theorem 2], but it applies in both the undirected and directed cases. (Roichman [43, Theorem 1] deals with the directed case, but requires additional matrix algebra machinery.)

The proof proceeds as follows. Assume that k ≫ log|G| and log k ≪ log|G|; let ε > 0 and set t := (1 + ε) log|G| / log(k/log|G|). Note that 1 ≪ t ≪ k. Choose some ω ≫
1, diverging arbitrarily slowly; set t± := ⌊t(1 ± ω/√t)⌋ and L := ω⌊t²/k⌋. Whp the number of generators picked at most once is at least k − L; whp, of these, the number picked exactly once lies in [t−, t+]. Take typ to be the event that these two conditions hold for two independent copies, W and W′. We use a modified L² calculation (see, eg, Lemma 2.6), meaning that we need to control |G| P(S = S′ | typ) − 1.

Let E be the event that some generator is used once in W and not at all in W′ or vice versa, ie E := ∪_{i∈[k]} ({|W_i| = 1, |W′_i| = 0} ∪ {|W′_i| = 1, |W_i| = 0}). Then S′ · S^{−1} ∼ Unif(G) on E. Indeed, if Z ∼ Unif(G) and X, Y ∈ G are independent of Z, then XZY ∼ Unif(G); here Z corresponds to one of the generators used once in W and not in W′ or vice versa, with the obvious choice of X and Y so that XZY = S′S^{−1}. Off E, every generator picked once in W must be picked at least once in W′ and vice versa. There are at most L generators which are picked more than once in W′. Thus

P(E^c | typ) ≤ max_{a ∈ [t−, t+], b ≤ L} 1/binom(k−b, a−b) = 1/binom(k−L, t− − L).

An application of Stirling's approximation shows that this probability is o(1/|G|) when ω diverges sufficiently slowly. Combined with the modified L² calculation, this proves the upper bound.

Finally, consider the case k = |G|^α for some fixed α ∈ (0, 1). Since binom(k, t) ≫ |G| for t := ⌊1/α⌋ + 1, by the above argument we see that the walk is mixed whp after t steps. Dou proves a more general statement than this which allows the generators to be picked from a distribution other than the uniform distribution; see [19, Theorems 3.3.1 and 3.4.7].

L² and Relative Entropy

One can also consider cutoff in the L² distance. For time t ≥
0, define

d^{(2)}_{G_k}(t) := ‖P_{G_k}(S(t) ∈ ·) − π_G‖_{2, π_G} := ( |G|^{−1} Σ_{g∈G} ( |G| P_{G_k}(S(t) = g) − 1 )² )^{1/2}.

One can then define mixing and cutoff with respect to L² analogously to TV (L¹) distance.

Definition.
Let γ ∈ N ∪ {∞}. Let t̃_γ(k, G) be the time t at which the return probability for SRW on Z^k_γ at time t is |G/γG|^{−1}. Equivalently, t̃_γ(k, G) := ks where s is the unique solution to P(X_s = 0) = |G/γG|^{−1/k}, where (X_s)_{s≥0} is a rate-1 SRW on Z_γ. Set t̃_∗(k, G) := max_γ t̃_γ(k, G).

For reasons explained below, we strongly believe that the following is true, and can be proved in the framework which we have developed in this article.
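The quantity in the Definition above is straightforward to evaluate numerically for finite γ, using the spectral formula P(X_s = 0) = γ^{−1} Σ_j exp(−s(1 − cos(2πj/γ))) for the rate-1 SRW on Z_γ. The sketch below is our illustration, not part of the paper's argument; the choice G = Z_n and the small set of sampled γ are assumptions made for the example.

```python
import math

def return_prob(s, gamma):
    # P(X_s = 0) for a rate-1 continuous-time SRW on Z_gamma, via the spectral
    # decomposition: the generator has eigenvalues -(1 - cos(2*pi*j/gamma)).
    return sum(math.exp(-s * (1.0 - math.cos(2.0 * math.pi * j / gamma)))
               for j in range(gamma)) / gamma

def t_tilde(k, gamma, index):
    # t~_gamma(k, G) for finite gamma, where index plays the role of |G/gammaG|:
    # solve P(X_s = 0) = index**(-1/k) for s by bisection, then return k * s.
    # (gamma = infinity would instead need the Bessel-function return probability.)
    target = index ** (-1.0 / k)
    lo, hi = 0.0, 1.0
    while return_prob(hi, gamma) > target:
        hi *= 2.0  # the return probability decreases towards 1/gamma, so this stops
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if return_prob(mid, gamma) > target:
            lo = mid
        else:
            hi = mid
    return k * hi

# Illustration: G = Z_n with n = 2**20 and k = 25; for gamma dividing n,
# |G / gamma G| = gamma. We maximise over a few divisors only, for brevity.
n, k = 2 ** 20, 25
t_star_approx = max(t_tilde(k, g, g) for g in (2, 4, 8, 16, 32, 64))
```

Maximising over all γ (rather than the sample above) recovers t̃_∗(k, G) for this cyclic example.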
Let G be an Abelian group and suppose that 1 ≪ k ≲ log|G|. Suppose that k − d(G) ≫ 1. Then, whp, the RW on G±_k exhibits cutoff in the L² metric at time t̃_∗(k, G).

The proof should proceed via a modified L² calculation. By replacing r_∗ := |G|^{1/k} (log k) with r_∗ := |G|^{1/k} log|G|, local typicality holds with probability 1 − o(1/|G|); cf Proposition 6.12. Thus we may condition on local typicality, as this can only change the L² distance by at most an o(1) additive term. On the other hand, we no longer condition on global typicality. Instead we must handle directly terms like P(W = W′) or P(W_γ = W′_γ). For Approach 1, which applies when 1 ≪ k ≲ √(log|G|), increasing r_∗ as we have has little effect on the proof, in essence because (log n)^d = n^{o(1)}. In Approach 2, we replace |H| by |H| log|G|, but still k ≫ log|G| implies that k ≫ log(|H| log|G|). Lastly, the combination of the two approaches works when √(log|G|)/log log|G| ≪ k ≲ √(log|G|).

Using somewhat similar adaptations, we believe that cutoff in the relative entropy (abbreviated RE) distance can be established. In this case, we quantify the probability with which global typicality holds: the maximal relative entropy of a measure on G with respect to π_G is log|G|; thus, naively at least, to condition on global typicality we desire it to hold with probability 1 − o(1/log|G|) (for L² we had 1 − o(1/|G|)). Also, one should modify local typicality as previously. This gives conditions on k and d(G). Under such conditions, the RE and TV cutoff times should then be the same. We believe that with more effort these conditions can be improved via obtaining some estimates on the relative entropy given that global typicality fails.

We close the paper with some questions which are left open.
1: Does the Product Condition Imply Cutoff?
The problem of singling out abstract conditions under which the cutoff phenomenon occurs has drawn considerable attention. For a reversible Markov chain X, write t_mix(X) for its mixing time and γ_gap(X) for its spectral gap. In 2004, Peres [41] proposed a simple spectral criterion for a sequence (X_N)_{N∈N} of reversible Markov chains, known as the product condition:

cutoff is equivalent to t_mix(X_N) γ_gap(X_N) → ∞ as N → ∞.

It is well-known that the product condition is a necessary condition for cutoff; see, eg, [34, Proposition 18.4]. It is relatively easy to artificially create counter-examples, but these are not ‘natural’; see, eg, [34, §
18] where constructions due to Aldous and due to Pak are described. The product condition is widely believed to be sufficient for “most” chains.

We conjecture that the product condition implies cutoff for random Cayley graphs of Abelian groups. In fact, we conjecture this whenever G is nilpotent of bounded step (denoted step G), ie has lower central series terminating at the trivial group and this sequence is of bounded length.

Conjecture 1.
Let (G_N)_{N∈N} be a sequence of finite, nilpotent groups and (Z^(N))_{N∈N} a sequence of subsets with Z^(N) ⊆ G_N for all N ∈ N. For each N ∈ N, write t^N_mix, respectively γ^N_gap, for the mixing time, respectively spectral gap, of the SRW on G_N(Z^(N)). Suppose that lim sup_{N→∞} step G_N < ∞ and that the product condition holds, ie t^N_mix γ^N_gap → ∞ as N → ∞. Then the sequence of SRWs exhibits cutoff.

An equivalence between the product condition and cutoff has been established for birth-and-death chains by Ding, Lubetzky and Peres [18] and, more generally, for RWs on trees by Basu, Hermon and Peres [4]. It is believed to imply cutoff for the SRW on transitive expanders of bounded degree, but this is known only in the case of Ramanujan graphs, due to Lubetzky and Peres [36].
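For intuition on the necessity direction, one can check numerically that on a family without cutoff the product stays bounded: for the lazy SRW on the cycle Z_n, the product t_mix(1/4) · γ_gap does not diverge as n grows, so the product condition fails, matching the absence of cutoff for cycles. The sketch below is our illustration, not part of the evidence for the conjecture.

```python
import numpy as np

def lazy_cycle(n):
    # Transition matrix of the lazy SRW on Z_n with Z = {+1, -1}.
    P = np.zeros((n, n))
    for i in range(n):
        P[i, i] = 0.5
        P[i, (i + 1) % n] = 0.25
        P[i, (i - 1) % n] = 0.25
    return P

def tmix_times_gap(n, eps=0.25):
    P = lazy_cycle(n)
    # P is symmetric here, so eigvalsh applies; the spectral gap is 1 minus
    # the second-largest eigenvalue.
    gap = 1.0 - np.sort(np.linalg.eigvalsh(P))[-2]
    mu = np.zeros(n)
    mu[0] = 1.0
    t = 0
    while 0.5 * np.abs(mu - 1.0 / n).sum() > eps:  # TV distance to uniform
        mu = mu @ P
        t += 1
    return t * gap

# The product t_mix * gap stays bounded rather than diverging, so the
# product condition fails for cycles, and indeed cycles have no cutoff.
for n in (20, 40, 80):
    print(n, tmix_times_gap(n))
```

By contrast, for a family with cutoff (eg hypercubes) the same product diverges logarithmically.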
2: An Explicit Choice of Generators
We have shown that if one chooses the generators Z uniformly, then one obtains cutoff whp, at a time which does not depend on Z. In particular, this means that there is cutoff for almost all choices of generators at a time independent of the choice of generators. This ‘almost universal’ mixing time is given by t_∗(k, G) from Definition 3.1. A question raised to us by Diaconis [15] is to find explicit sets of generators for which cutoff occurs; see also [13, Chapter 4G, Question 2].

Open Problem 2. Let G be an Abelian group and 1 ≪ k ≲ log|G|. Find an explicit choice of generators Z so that the RW on G(Z) exhibits cutoff. Further, find generators so that the cutoff time is t_∗(k, G).

Hough [32, Theorem 1.11] shows for the cyclic group Z_p with p prime that the choice Z := {0, ±1, ±2, ±4, ..., ±2^{⌈log₂ p⌉−1}}, which he describes as “an approximate embedding of the classical hypercube walk into the cycle”, gives rise to a random walk on Z_p which has cutoff. The cutoff time is not the entropic time, however. Although the entropic time is the mixing time for ‘most’ choices of generators, finding an explicit choice of generators which gives rise to cutoff at the entropic time is still open, even for the cyclic group of prime order.

References

[1] D. Aldous and P. Diaconis (1985). Shuffling Cards and Stopping Times.
Technical Report 231, Department of Statistics, Stanford University. Available online
[2] D. Aldous and P. Diaconis (1986). Shuffling Cards and Stopping Times.
Amer. Math. Monthly .5 (333–348) MR841111 DOI
[3] N. Alon and Y. Roichman (1994). Random Cayley Graphs and Expanders. Random Structures Algorithms .2 (271–284) MR1262979 DOI
[4] R. Basu, J. Hermon and Y. Peres (2017). Characterization of Cutoff for Reversible Markov Chains. Ann. Probab. .3 (1448–1487) MR3650406 DOI
[5] A. Ben-Hamou, E. Lubetzky and Y. Peres (2018). Comparing Mixing Times on Sparse Random Graphs. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, PA (1734–1740) MR3775901 DOI
[6] A. Ben-Hamou and J. Salez (2017). Cutoff for Nonbacktracking Random Walks on Sparse Random Graphs. Ann. Probab. .3 (1752–1770) MR3650414 DOI
[7] N. Berestycki, E. Lubetzky, Y. Peres and A. Sly (2018). Random Walks on the Random Graph. Ann. Probab. .1 (456–490) MR3758735 DOI
[8] C. Bordenave, P. Caputo and J. Salez (2019). Cutoff at the “Entropic Time” for Sparse Markov Chains. Probab. Theory Related Fields .1-2 (261–292) MR3916108 DOI
[9] C. Bordenave and H. Lacoin (2018). Cutoff at the Entropic Time for Random Walks on Covered Expander Graphs. arXiv:1812.06769
[10] S. Chen, C. Moore and A. Russell (2013). Small-Bias Sets for Nonabelian Groups: Derandomizations of the Alon–Roichman Theorem. Approximation, Randomization, and Combinatorial Optimization, Lecture Notes in Comput. Sci. Springer, Heidelberg (436–451) MR3126546 DOI
[11] D. Christofides and K. Markström (2008). Expansion Properties of Random Cayley Graphs and Vertex Transitive Graphs via Matrix Martingales. Random Structures Algorithms .1 (88–100) MR2371053 DOI
[12] G. Conchon–Kerjan (2019). Cutoff for Random Lifts of Weighted Graphs. arXiv:1908.02898
[13] P. Diaconis (1988). Group Representations in Probability and Statistics. Institute of Mathematical Statistics, Hayward, CA MR964069
[14] P. Diaconis (2013). Some Things We’ve Learned (about Markov Chain Monte Carlo). Bernoulli .4 (1294–1305) MR3102552 DOI
[15] P. Diaconis (2019). Private Communication.
[16] P. Diaconis and L. Saloff-Coste (1994). Moderate Growth and Random Walk on Finite Groups. Geom. Funct. Anal. .1 (1–36) MR1254308 DOI
[17] P. Diaconis and P. M. Wood (2013). Random Doubly Stochastic Tridiagonal Matrices. Random Structures Algorithms .4 (403–437) MR3068032 DOI
[18] J. Ding, E. Lubetzky and Y. Peres (2010). Total Variation Cutoff in Birth-and-Death Chains. Probab. Theory Related Fields .1-2 (61–85) MR2550359 DOI
[19] C. Dou (1992). Studies of Random Walks on Groups and Random Graphs. Thesis, Massachusetts Institute of Technology MR2716375
[20] C. Dou and M. Hildebrand (1996). Enumeration and Random Random Walks on Finite Groups. Ann. Probab. .2 (987–1000) MR1404540 DOI
[21] G. H. Hardy and E. M. Wright (2008). An Introduction to the Theory of Numbers. Sixth ed., Oxford University Press, Oxford MR2445243
[22] J. Hermon, H. Lacoin and Y. Peres (2016). Total Variation and Separation Cutoffs Are Not Equivalent and Neither One Implies the Other. Electronic Journal of Probability (Paper No. 44, 36 pp.) MR3530321 DOI
[23] J. Hermon and S. Olesker-Taylor (2021). Cutoff for Almost All Random Walks on Abelian Groups. Available on arXiv
[24] J. Hermon and S. Olesker-Taylor (2021). Cutoff for Random Walks on Upper Triangular Matrices.
Available on arXiv [25] J. Hermon and S. Olesker-Taylor (2021). Further Results and Discussions on Random Cayley Graphs.
Available on arXiv [26] J. Hermon and S. Olesker-Taylor (2021). Geometry of Random Cayley Graphs of Abelian Groups.
Available on arXiv
[27] J. Hermon and S. Olesker-Taylor (2021). Supplementary Material for Random Cayley Graphs Project.
Available on arXiv
[28] J. Hermon, A. Sly and P. Sousi (2020). Universality of Cutoff for Graphs With an Added Random Matching. arXiv:2008.08564
[29] G. Higman (1960). Enumerating p-Groups. I: Inequalities. Proceedings of the London Mathematical Society (24–30) MR113948 DOI
[30] M. Hildebrand (1994). Random Walks Supported on Random Points of Z/nZ. Probab. Theory Related Fields .2 (191–203) MR1296428 DOI
[31] M. Hildebrand (2005). A Survey of Results on Random Random Walks on Finite Groups. Probab. Surv. (33–63) MR2121795 DOI
[32] R. Hough (2017). Mixing and Cut-Off in Cycle Walks. Electron. J. Probab. (Paper No. 90, 49 pp.) MR3718718 DOI
[33] Z. Landau and A. Russell (2004). Random Cayley Graphs Are Expanders: A Simple Proof of the Alon–Roichman Theorem. Electron. J. Combin. .1 (Research Paper 62, 6 pp.) MR2097328 DOI
[34] D. A. Levin, Y. Peres and E. L. Wilmer (2017). Markov Chains and Mixing Times. Second ed., American Mathematical Society, Providence, RI, USA MR3726904 DOI
[35] P.-S. Loh and L. J. Schulman (2004). Improved Expansion of Random Cayley Graphs. Discrete Math. Theor. Comput. Sci. .2 (523–528) MR2180056
[36] E. Lubetzky and Y. Peres (2016). Cutoff on All Ramanujan Graphs. Geom. Funct. Anal. .4 (1190–1216) MR3558308 DOI
[37] E. Lubetzky and A. Sly (2010). Cutoff Phenomena for Random Walks on Random Regular Graphs. Duke Math. J. .3 (475–510) MR2667423 DOI
[38] I. Pak (1999). Random Cayley Graphs with O(log |G|) Generators Are Expanders. Algorithms—ESA ’99 (Prague), Lecture Notes in Comput. Sci. Springer, Berlin (521–526) MR1729149 DOI
[39] I. Pak (1999). Random Walks on Finite Groups with Few Random Generators. Electron. J. Probab. (Paper No. 1, 11 pp.) MR1663526 DOI
[40] I. Pak (2001). Combinatorics, Probability, and Computations on Groups Lecture Notes. Available at
[41] Y. Peres (2004). American Institute of Mathematics Research Workshop “Sharp Thresholds for Mixing Times” (Palo Alto). Summary available at
[42] C. Pomerance (2001). The Expected Number of Random Elements to Generate a Finite Abelian Group. Period. Math. Hungar. .1-2 (191–198) MR1830576 DOI
[43] Y. Roichman (1996). On Random Random Walks. Ann. Probab. .2 (1001–1011) MR1404541 DOI
[44] C. C. Sims (1965). Enumerating p-Groups. Proceedings of the London Mathematical Society (151–166) MR169921 DOI
[45] A. Smith (2017). The Cutoff Phenomenon for Random Birth and Death Chains. Random Structures Algorithms .2 (287–321) MR3607126 DOI
[46] D. B. Wilson (1997). Random Random Walks on Z^d. Probab. Theory Related Fields .4 (441–457) MR1465637 DOI