Mixing time of fractional random walk on finite fields

Abstract

We study a random walk on \mathbb{F}_p defined by X_{n+1}=1/X_n+\varepsilon_{n+1} if X_n\neq 0, and X_{n+1}=\varepsilon_{n+1} if X_n=0, where \varepsilon_{n+1} are independent and identically distributed. This can be seen as a non-linear analogue of the Chung--Diaconis--Graham process. We show that the mixing time is of order \log p, answering a question of Chatterjee and Diaconis.


arXiv [math.PR]

MIXING TIME OF FRACTIONAL RANDOM WALK ON FINITE FIELDS

JIMMY HE, HUY TUAN PHAM, AND MAX WENQIANG XU

Abstract.

We study a random walk on $\mathbb{F}_p$ defined by $X_{n+1} = 1/X_n + \varepsilon_{n+1}$ if $X_n \neq 0$, and $X_{n+1} = \varepsilon_{n+1}$ if $X_n = 0$, where the $\varepsilon_{n+1}$ are independent and identically distributed. This can be seen as a non-linear analogue of the Chung–Diaconis–Graham process. We show that the mixing time is of order $\log p$, answering a question of Chatterjee and Diaconis [12].

1. Introduction

This paper studies a non-linear analogue of the Chung–Diaconis–Graham process, defined on $\mathbb{F}_p$ by
\[ X_{n+1} = aX_n + \varepsilon_{n+1}, \]
where $a \in \mathbb{F}_p^{\times}$ is fixed and the $\varepsilon_i$ are independent and identically distributed. The mixing time of this Markov chain has been extensively studied, and it is now known that for certain $a$ (such as $a = 2$) and almost all $p$, cutoff occurs at time $c \log p$ for an explicit constant $c$ [18].

Since simple random walk on $\mathbb{F}_p$ requires order $p^2$ steps to mix (see [28] for example), the Chung–Diaconis–Graham process provides an explicit example of a random walk where applying a deterministic bijection (in this case, $x \mapsto ax$) between steps of the walk exponentially speeds up mixing. This was studied in [12], where it was asked whether other explicit examples could be provided.

In this paper, we consider the random walk on $\mathbb{F}_p$ defined by
\[ X_{n+1} = \iota(X_n) + \varepsilon_{n+1}, \]
where $\iota(x) = 1/x$ if $x \neq 0$ and $\iota(0) = 0$, and the $\varepsilon_i$ are independent and identically distributed. Chatterjee and Diaconis asked about the order of the mixing time for this walk [12], which was originally suggested by Soundararajan. We solve this problem by showing that this random walk mixes in order $\log p$ steps. Since simple random walk on $\mathbb{F}_p$ mixes in order $p^2$ steps, this gives an exponential speedup for the mixing time, and provides an explicit example of when adding a deterministic function between steps of a Markov chain gives an exponential speedup.

The proof uses comparison theory developed by Smith [33] to relate the random walk on $\mathbb{F}_p$ to a random walk on the projective line $\mathbb{P}^1(\mathbb{F}_p)$. The walk on $\mathbb{P}^1(\mathbb{F}_p)$ is the quotient of a random walk on $\mathrm{SL}_2(\mathbb{F}_p)$ which is known to have a constant order spectral gap by results of Bourgain and Gamburd [6]. While our methods identify the order of the mixing time, they are not strong enough to identify the constant or obtain cutoff, and we leave this as an open problem.

Our result also implies bounds on the number of solutions to the congruence $xy \equiv 1 \pmod{p}$ for $x \in I$ and $y \in J$ intervals of the same length, although these bounds are weaker than the ones established in [14].

1.1. Main results.

We now formally state our main results. For two probability measures $\mu$ and $\nu$, let
\[ \|\mu - \nu\|_{TV} = \sup_A |\mu(A) - \nu(A)| \]
denote the total variation distance. Let $X_n$ be an ergodic Markov chain with transition matrix $P$ and stationary distribution $\pi$. Let $P^n(x, y)$ denote the probability that $X_n = y$ given that $X_0 = x$. Then the mixing time is defined to be
\[ t_{mix}(\varepsilon) = \inf\Big\{ n : \sup_x \|P^n(x, \cdot) - \pi\|_{TV} \le \varepsilon \Big\}. \]

Let $\iota : \mathbb{F}_p \to \mathbb{F}_p$ be the function defined by $\iota(x) = 1/x$ if $x \neq 0$ and $\iota(0) = 0$. Let the $\varepsilon_i$ be independent and identically distributed on $\mathbb{Z}$ with common distribution $\mu$. Let $K$ denote the transition matrix for the Markov chain on $\mathbb{F}_p$ defined by
\[ X_{n+1} = \iota(X_n) + \varepsilon_{n+1}. \tag{1.1} \]
We show that for this Markov chain, $t_{mix}(\varepsilon)$ is $\Theta(\log p)$, where the implied constants are allowed to depend on both $\mu$ and $\varepsilon$. Informally, this means that order $\log p$ steps are necessary and sufficient for the Markov chain to converge to its stationary distribution. The following theorems give the desired lower and upper bounds on $t_{mix}$ respectively.

Theorem 1.1 (Lower bound). Let $\mu$ denote a probability measure on $\mathbb{Z}$, and assume that
\[ H(\mu) = \sum_{x \in \mathbb{Z}} -\mu(x) \log \mu(x) < \infty. \]
Let $K$ be the transition matrix for the Markov chain on $\mathbb{F}_p$ defined by $X_{n+1} = \iota(X_n) + \varepsilon_{n+1}$, where the $\varepsilon_n$ are independent and identically distributed according to $\mu$. Let $\pi$ denote the uniform distribution on $\mathbb{F}_p$. Then for all $x$ in $\mathbb{F}_p$,
\[ \|K^n(x, \cdot) - \pi\|_{TV} \ge 1 - \frac{nH(\mu) + \log 2}{\log p}. \]

Theorem 1.2 (Upper bound). Let $\mu$ denote a probability measure on $\mathbb{Z}$ whose support contains at least two values. Let $K$ be the transition matrix for the Markov chain on $\mathbb{F}_p$ defined by $X_{n+1} = \iota(X_n) + \varepsilon_{n+1}$, where the $\varepsilon_n$ are independent and identically distributed according to $\mu$. Let $\pi$ denote the uniform distribution on $\mathbb{F}_p$. Then there exists a constant $C > 0$ depending only on $\mu$ such that for all $x$ in $\mathbb{F}_p$ and sufficiently large $p$,
\[ \|K^n(x, \cdot) - \pi\|_{TV} \le \sqrt{p}\, e^{-Cn}. \]
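The definitions above can be explored numerically for small primes. The following is a minimal sketch, not part of the paper: it evolves the exact distribution of the walk (1.1) from every starting point and records the worst-case total variation distance to uniform. The function names, the prime $p = 101$, and the choice of $\mu$ uniform on $\{-1, 0, 1\}$ are illustrative assumptions.

```python
def iota(x, p):
    """Inverse in F_p, extended by iota(0) = 0 (p is assumed to be an odd prime)."""
    x %= p
    return pow(x, p - 2, p) if x else 0


def step(dist, inv, mu, p):
    """Push a distribution on F_p through one step X -> iota(X) + eps."""
    new = [0.0] * p
    for x, mass in enumerate(dist):
        if mass == 0.0:
            continue
        for eps, weight in mu.items():
            new[(inv[x] + eps) % p] += mass * weight
    return new


def tv_to_uniform(dist):
    """Total variation distance between dist and the uniform distribution."""
    p = len(dist)
    return 0.5 * sum(abs(q - 1.0 / p) for q in dist)


def worst_case_tv(p, mu, n_steps):
    """Return [d(0), ..., d(n_steps)] with d(n) = max_x ||K^n(x, .) - pi||_TV."""
    inv = [iota(x, p) for x in range(p)]
    dists = [[1.0 if y == x else 0.0 for y in range(p)] for x in range(p)]
    out = [max(tv_to_uniform(d) for d in dists)]
    for _ in range(n_steps):
        dists = [step(d, inv, mu, p) for d in dists]
        out.append(max(tv_to_uniform(d) for d in dists))
    return out


if __name__ == "__main__":
    p = 101  # illustrative small prime
    mu = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}  # illustrative step distribution
    for n, d in enumerate(worst_case_tv(p, mu, 30)):
        print(n, round(d, 4))
```

Since the computation is exact up to floating-point error, it is only practical for small $p$; it illustrates the definitions of $K$ and $t_{mix}$ rather than the asymptotic statements of the theorems.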

The lower bound follows easily from entropy considerations. We make an assumption that the entropy of $\mu$ is finite, which cannot be removed entirely (see Remark 2.2). However, this assumption is relatively weak, and indeed all finitely supported distributions have finite entropy.

We now describe our plan to prove the upper bound. We establish the upper bound by connecting the Markov chain $X_n$ with a Markov chain on $\mathbb{P}^1(\mathbb{F}_p)$, which in turn is a projection of a Markov chain on a Cayley graph of $\mathrm{SL}_2(\mathbb{F}_p)$. We next sketch this connection.

Recall that given a group $G$ and a generating set $S$, we define the Cayley graph $\mathcal{G}(G, S)$ as the graph with vertex set $G$ in which two elements $x$ and $y$ are connected if and only if $x = \sigma y$ for some $\sigma \in S$. We use $A(\mathcal{G})$ to denote the normalised adjacency matrix of $\mathcal{G}$, whose largest eigenvalue is 1. Let $S$ be a subset of $\mathrm{SL}_2(\mathbb{Z})$ so that $S_p$ generates $\mathrm{SL}_2(\mathbb{F}_p)$, where $S_p$ is the set $S$ mod $p$. Results of Bourgain and Gamburd [6] give conditions for when $\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p)$ has a constant spectral gap, i.e., $\lambda_2(A(\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p))) \le 1 - c$ for some constant $c > 0$ independent of $p$, and thus also an $O(\log p)$ mixing time.

The random walk on $\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p)$ has a natural projection on $\mathbb{P}^1(\mathbb{F}_p)$, defined by the action
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = \frac{ax + b}{cx + d}. \]
This can also be seen as a random walk on the Schreier graph corresponding to this action. Since the spectral gaps of the walks on $\mathrm{SL}_2(\mathbb{F}_p)$ and $\mathbb{P}^1(\mathbb{F}_p)$ are related, this gives an $O(\log p)$ mixing time for the walk on $\mathbb{P}^1(\mathbb{F}_p)$.

Finally, we connect the Markov chain on $\mathbb{P}^1(\mathbb{F}_p)$ with the fractional Markov chain $X_n$ on $\mathbb{F}_p$. With an appropriate choice of the set $S$, the Markov chain on $\mathbb{P}^1(\mathbb{F}_p)$ induced by the random walk on $\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p)$ and $X_n$ have very similar transition probabilities. The mixing time bound of $X_n$ can thus be obtained from that of the chain on $\mathbb{P}^1(\mathbb{F}_p)$ via comparison theory of Markov chains on different state spaces.

A key input to the above plan is the spectral gap of random walks on Cayley graphs of $\mathrm{SL}_2(\mathbb{F}_p)$, which we discuss in further detail in Section 1.4.

1.2. Bounds on solutions to a congruence equation.

We also give an application of these ideas to bounding the number of solutions to the congruence
\[ xy \equiv 1 \pmod{p} \]
with $x \in I$ and $y \in J$ for intervals $I$ and $J$ of the same length $m \le p/2$. In particular, for large enough $p$ and $m$, we establish that
\[ |\{(x, y) \in I \times J \mid xy \equiv 1 \pmod{p}\}| \le (1 - \delta)m \]
for some absolute constant $\delta > 0$. This bound on the number of solutions to $xy \equiv 1 \pmod{p}$ is weaker than the bounds obtained in [11] and [14] for intervals that are not too large, and weaker than the standard ones coming from estimates on incomplete Kloosterman sums for intervals that are not too small. The number of solutions has also been estimated in [15, 19, 32]. In particular, the method used in [32] also relies on the $\mathrm{SL}_2$ action. While this result is not new, the proof is straightforward and we hope that our ideas can lead to further applications.

1.3. Related work.

The random walk we study may be viewed as a non-linear version of the Chung–Diaconis–Graham process, or the $ax + b$ process, which is the random walk on $\mathbb{F}_p$ (or more generally $\mathbb{Z}/n\mathbb{Z}$ for composite $n$) defined by
\[ X_{n+1} = aX_n + \varepsilon_{n+1}, \]
where $a \in \mathbb{F}_p^{\times}$ is fixed and the $\varepsilon_i$ are independent and identically distributed, drawn from some distribution. It was introduced in [13], where the case of $a = 2$ and $\varepsilon_i$ distributed uniformly on $\{-1, 0, 1\}$ was studied in detail. They showed that the mixing time was at most order $\log p \log\log p$, and that for almost all odd $p$, order $\log p$ steps were sufficient, although for infinitely many odd $p$, order $\log p \log\log p$ steps were necessary. This process was subsequently studied in [10, 18, 25–27, 29]. In particular, Eberhard and Varjú recently showed in [18] that for almost all $p$, the walk exhibits cutoff at $c \log p$ for some explicit constant $c$.

For a Markov chain on a state space with $n$ elements, almost all bijections, when applied between steps of the chain, cause the chain to mix in $O(\log n)$ steps. Our work gives another example of this phenomenon, which fits into a broader theme of studying how convergence can be sped up, via either deterministic or random means [2, 5, 24].

In [22], the first author studied a more general non-linear version of the Chung–Diaconis–Graham process, defined by
\[ X_{n+1} = f(X_n) + \varepsilon_{n+1}, \]
where $f$ is a bijection on $\mathbb{F}_p$. It was shown that for functions $f(x)$ which were extensions of rational functions, the mixing time was of order at most $p^{\varepsilon}$ for any $\varepsilon > 0$, where the implicit constant depends only on $\varepsilon$ and the degree of the polynomials appearing in $f(x)$. In particular, this applies for the function $\iota$ that we consider. However, the only known lower bound on the mixing time is of order $\log p$. Our work closes the large gap between the upper and lower bounds in this special case.

1.4. Random walk on Cayley graphs of $\mathrm{SL}_2(\mathbb{F}_p)$.

Selberg's Theorem [31] shows that if $S$ is a subset of $\mathrm{SL}_2(\mathbb{Z})$ such that $S$ generates a subgroup of $\mathrm{SL}_2(\mathbb{Z})$ of finite index, then
\[ \limsup_{p \to \infty} \lambda_2(A(\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p))) < 1. \]
Bourgain and Gamburd [6] strengthen this result and show that the above property holds if and only if $S$ generates a non-elementary subgroup of $\mathrm{SL}_2(\mathbb{Z})$. Weigel [34] gives a convenient characterization that $\langle S \rangle$ is non-elementary if and only if $\langle S_p \rangle = \mathrm{SL}_2(\mathbb{F}_p)$ for some prime $p \ge 5$.
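The generation condition $\langle S_p \rangle = \mathrm{SL}_2(\mathbb{F}_p)$ can be checked by brute force for a small prime. The sketch below is illustrative only: it takes the four matrices that appear in Lemma 4.7, with hypothetical integer parameters `a` and `b` chosen here purely as an example, closes them under multiplication mod $p$ by breadth-first search, and compares the size of the result with $|\mathrm{SL}_2(\mathbb{F}_p)| = p(p^2 - 1)$. All function names are assumptions of this sketch.

```python
from collections import deque


def mat_mul(m, n, p):
    """Multiply two 2x2 matrices, stored as tuples (a, b, c, d), modulo p."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % p, (a * f + b * h) % p,
            (c * e + d * g) % p, (c * f + d * h) % p)


def generators(a, b, p):
    """The four matrices of Lemma 4.7 reduced mod p (a and b are example inputs)."""
    return [
        (1, b % p, 0, 1),
        ((1 - a * b) % p, (-a * a * b) % p, b % p, (1 + a * b) % p),
        (1, (-b) % p, 0, 1),
        ((1 + a * b) % p, (a * a * b) % p, (-b) % p, (1 - a * b) % p),
    ]


def generated_subgroup_size(gens, p):
    """Size of the subgroup of SL_2(F_p) generated by gens, found by BFS."""
    identity = (1, 0, 0, 1)
    seen = {identity}
    queue = deque([identity])
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mat_mul(g, s, p)
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return len(seen)


def sl2_order(p):
    """Order of SL_2(F_p) for a prime p."""
    return p * (p * p - 1)


if __name__ == "__main__":
    p, a, b = 5, 1, 1  # illustrative values
    print(generated_subgroup_size(generators(a, b, p), p), sl2_order(p))
```

In a finite group the set of products of generators is automatically a subgroup, so the search does not need to add inverses explicitly. The check is only feasible for small $p$, since the group has order about $p^3$.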

We remark that a simple reduction allows us to deduce Theorem 1.2 from the special case where the distribution of $\varepsilon_n$ is supported on two points. In this case, we can construct the set $S$ corresponding to the fractional Markov chain and easily verify that $\langle S \rangle$ is non-elementary.

Golubev and Kamber [21] show that under more stringent assumptions on the set of generators $S$, the Cayley graphs $\mathcal{G}(\mathrm{SL}_2(\mathbb{F}_p), S_p)$ have a stronger property that implies cutoff for the associated projected random walk on $\mathbb{P}^1(\mathbb{F}_p)$. This property also holds with high probability for random generating sets $S_p$. We also remark that the result of Bourgain and Gamburd has been further developed in various aspects, e.g., [8, 9, 20, 21]. These results also have many applications in number theory, including sum-product problems on finite fields [7, 23].

1.5. Outline.

In Section 2, we prove Theorem 1.1. In Section 3, we explain the comparison theory for Markov chains used to relate the spectral gaps for the walks on $\mathbb{F}_p$ and $\mathbb{P}^1(\mathbb{F}_p)$. In Section 4, we give a proof of Theorem 1.2. Finally, in Section 5, we give a bound for the number of solutions to $xy \equiv 1 \pmod{p}$ in a square $I \times I \subseteq \mathbb{F}_p^2$.

1.6. Notations.

Throughout, we let $\iota : \mathbb{F}_p \to \mathbb{F}_p$ be the function defined by $\iota(x) = 1/x$ if $x \neq 0$ and $\iota(x) = 0$ if $x = 0$. We view $\mathbb{P}^1(\mathbb{F}_p)$ as a superset of $\mathbb{F}_p$ with one extra element $\infty$. We define the function $\overline{\iota} : \mathbb{P}^1(\mathbb{F}_p) \to \mathbb{P}^1(\mathbb{F}_p)$ by $\overline{\iota}(x) = 1/x$ if $x \neq 0, \infty$, and $\overline{\iota}(0) = \infty$ and $\overline{\iota}(\infty) = 0$.

If $P$ is a symmetric matrix, we let $\lambda_i(P)$ denote the $i$th largest eigenvalue of $P$.

2. Lower bound

In this section, we prove Theorem 1.1, which follows from entropy considerations. For a discrete probability measure $\mu$, we let
\[ H(\mu) = \sum_x -\mu(x) \log \mu(x) \]
denote the entropy of $\mu$, with the convention that $0 \log 0 = 0$. If $X$ is a random variable, we let $H(X)$ denote the entropy of the law of $X$. We need two basic properties of entropy. The first is that it is subadditive, in the sense that if $X$ and $Y$ are independent, then $H(X + Y) \le H(X) + H(Y)$. The second is that for any function $f$, $H(f(X)) \le H(X)$, with equality if $f$ is bijective.

The lower bound follows by noting that the entropy of the random walk at time $n$ is at most $nH(\mu)$, which is too low to be close to uniform if $n$ is too small. This idea is certainly not new, and similar arguments have appeared before, see [1, 4, 18] for example. Our argument follows [1], although we fill in some details and fix a minor error.

The following lemma shows that measures close to uniform in total variation must have large entropy. It was stated without the $\log 2$ term in [1], but this cannot be correct, as can be seen by comparing the uniform measure with a measure $\nu$ concentrated at a single point. It can be seen as a special case of Theorem 1 of [3] (in fact, it is established as part of the proof).

Lemma 2.1.

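Let $X$ be a finite set of size $n$, and let $\pi$ denote the uniform measure on $X$. Let $\nu$ be any probability measure on $X$, and let $\delta = \|\nu - \pi\|_{TV}$. Then
\[ |H(\pi) - H(\nu)| \le \delta \log(n - 1) - \delta \log \delta - (1 - \delta)\log(1 - \delta) \le \delta \log n + \log 2. \]

Proof.

The first inequality is Equation 11 of [3], and the second inequality is clear since $\log(n - 1) \le \log n$ and the binary entropy $-\delta \log \delta - (1 - \delta)\log(1 - \delta)$ is at most $\log 2$. □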
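As a sanity check on Lemma 2.1, the two inequalities can be tested numerically on random measures. This is a minimal sketch with assumed helper names (`lemma_terms`, `random_measure`), not code from the paper; it also evaluates the point-mass example, where the entropy gap $\log n$ exceeds $\delta \log n$, which is why the $\log 2$ term cannot be dropped.

```python
import math
import random


def entropy(nu):
    """Shannon entropy with natural logarithm and the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in nu if q > 0)


def tv_to_uniform(nu):
    """Total variation distance between nu and the uniform measure."""
    n = len(nu)
    return 0.5 * sum(abs(q - 1.0 / n) for q in nu)


def binary_entropy(d):
    """-d log d - (1 - d) log(1 - d), equal to 0 at the endpoints."""
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -d * math.log(d) - (1 - d) * math.log(1 - d)


def lemma_terms(nu):
    """Return (|H(pi) - H(nu)|, sharp bound, crude bound) for nu on n >= 2 points."""
    n = len(nu)
    delta = tv_to_uniform(nu)
    gap = abs(math.log(n) - entropy(nu))
    sharp = delta * math.log(n - 1) + binary_entropy(delta)
    crude = delta * math.log(n) + math.log(2)
    return gap, sharp, crude


def random_measure(n, rng):
    """A random probability vector of length n (normalised uniform weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (2, 5, 50):
        gap, sharp, crude = lemma_terms(random_measure(n, rng))
        print(n, gap <= sharp + 1e-9, sharp <= crude + 1e-9)
    # Point mass on 10 points: the gap is log n, which exceeds delta * log n.
    print(lemma_terms([1.0] + [0.0] * 9))
```

The small tolerance in the comparisons only absorbs floating-point error; for the point mass the first inequality holds with equality.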

Proof of Theorem 1.1.

Because $\iota$ is a bijection, $H(X) = H(\iota(X))$. Furthermore, if $\mu_p$ denotes the distribution given by reducing $\mu$ mod $p$, then $H(\mu_p) \le H(\mu)$. Then by subadditivity of entropy, $H(K^n(x, \cdot)) \le nH(\mu)$. Thus, Lemma 2.1 gives
\[ \|K^n(x, \cdot) - \pi\|_{TV} \ge 1 - \frac{nH(\mu) + \log 2}{\log p}. \]
□

Remark 2.2. We note that the finite entropy assumption cannot be removed entirely. Fix any constant $\delta \in (0, 1)$ and consider the measure on $\mathbb{Z}$ given by
\[ \mu(i) \propto \frac{1}{i (\log i)^{2 - \delta}}. \]
There is some constant $c = c(\delta) > 0$ such that for all $p$ and $i \le p$,
\[ \sum_{k \in \mathbb{Z}} \mu(i + kp) \ge \frac{c}{p (\log p)^{1 - \delta}}, \]
since the left-hand side is bounded below, up to a constant factor, by the sum over $k \ge 1$ of $1/(kp (\log(kp))^{2 - \delta})$. Then on $\mathbb{F}_p$, the walk defined by (1.1) satisfies
\[ \mathbb{P}(X_{n+1} = x \mid X_n = y) \ge \frac{c}{p (\log p)^{1 - \delta}} \]
for all $x$ and $y$, and so we can couple two copies of the walk, $X_n$ and $X'_n$, so that independent of the initial states,
\[ \mathbb{P}(X_n \neq X'_n) \le \left(1 - \frac{c}{(\log p)^{1 - \delta}}\right)^n. \]
Since the distance to stationarity can be controlled by this probability (see [28, Corollary 5.5]), this implies that
\[ \|K^n(x, \cdot) - \pi\|_{TV} \le \left(1 - \frac{c}{(\log p)^{1 - \delta}}\right)^n, \]
and so $t_{mix}(\varepsilon) = O_{\varepsilon}((\log p)^{1 - \delta})$, which is much smaller than $\log p$.

For positive integers $k$, define $\log^{(k)}$ iteratively by $\log^{(1)}(x) = \log x$ and $\log^{(k)}(x) = \log(\log^{(k-1)}(x))$ for $k \ge 2$. The measure
\[ \mu(i) \propto \frac{1}{i (\log i)(\log^{(2)} i) \cdots (\log^{(k-1)} i)(\log^{(k)} i)^2} \]
gives a probability distribution on $\mathbb{Z}$ with infinite entropy for which the mixing time is $t_{mix}(\varepsilon) = O_{\varepsilon}(\log^{(k)} p)$.

3. Comparison theory

Comparison theory for Markov chains was introduced by Diaconis and Saloff-Coste in [16, 17]. This theory allows the spectral gaps (and also log-Sobolev constants) of different Markov chains on the same state space to be compared. Unfortunately, since we wish to compare a Markov chain on $\mathbb{F}_p$ with one on $\mathbb{P}^1(\mathbb{F}_p)$, this theory does not immediately apply. Smith extended these ideas in [33] to the case of random walks on state spaces $X \subseteq \overline{X}$.

We first recall the relationship between the Dirichlet form and the spectral gap, and then explain the comparison theory developed by Smith for Markov chains on different state spaces. This is used to compare the spectral gaps of the random walks on $\mathbb{P}^1(\mathbb{F}_p)$ and $\mathbb{F}_p$. For a more thorough treatment and proofs, we refer the reader to [33].

3.1. The Dirichlet form.

Recall the following standard notions. Let $P$ be a reversible Markov chain on a finite space $X$ with stationary distribution $\pi$. Define the function
\[ V_{\pi}(f) = \frac{1}{2} \sum_{x, y \in X} |f(x) - f(y)|^2 \pi(x)\pi(y) \]
and the Dirichlet form
\[ \mathcal{E}_P(f, f) = \frac{1}{2} \sum_{x, y \in X} |f(x) - f(y)|^2 P(x, y)\pi(x) \]
for $f : X \to \mathbb{R}$. Then the spectral gap can be computed by
\[ 1 - \lambda_2(P) = \inf_{\substack{f : X \to \mathbb{R} \\ f \text{ not constant}}} \frac{\mathcal{E}_P(f, f)}{V_{\pi}(f)}. \tag{3.1} \]

3.2. Comparison theory for Markov chains on different state spaces.

Let $X \subseteq \overline{X}$, and let $P$ be a Markov chain on $X$ and $\overline{P}$ be a Markov chain on $\overline{X}$, with stationary distributions $\pi$ and $\overline{\pi}$ respectively. Assume that both are ergodic and reversible. The following results of Smith compare $V_{\pi}$ and $\mathcal{E}_P$ with $V_{\overline{\pi}}$ and $\mathcal{E}_{\overline{P}}$.

Lemma 3.1 ([33, Lemma 2]). Let $f : X \to \mathbb{R}$ and let $\overline{f} : \overline{X} \to \mathbb{R}$ be any extension of $f$. Then
\[ V_{\pi}(f) \le C\, V_{\overline{\pi}}(\overline{f}), \quad \text{where } C = \sup_{x \in X} \frac{\pi(x)}{\overline{\pi}(x)}. \]

The analogous comparison result for the Dirichlet form requires further setup. For each $x \in \overline{X}$, fix a probability measure $Q_x$ on $X$ such that $Q_x = \delta_x$ if $x \in X$. For any $f : X \to \mathbb{R}$, this defines an extension $\overline{f} : \overline{X} \to \mathbb{R}$ by
\[ \overline{f}(x) = \sum_{y \in X} Q_x(y) f(y). \]
Next, choose a coupling of $Q_x$ and $Q_y$ for all $x, y \in \overline{X}$ for which $\overline{P}(x, y) > 0$, and denote it by $Q_{x,y}$.

A path in $X$ from $x$ to $y$ is a sequence of $x_i \in X$ for $i = 0, \ldots, k$ for which $x_0 = x$, $x_k = y$ and $P(x_i, x_{i+1}) > 0$ for all $i$. For a path $\gamma$ from $x$ to $y$, call $x$ the initial vertex and $y$ the final vertex, and denote them by $i(\gamma)$ and $o(\gamma)$ respectively. Let $|\gamma|$ denote the length of the path.

A flow on $X$ is a function $F$ on the set of paths in $X$ whose restriction to paths from $x$ to $y$ gives a probability measure for all $x$, $y$. We require a choice of flow on $X$. Actually it suffices to define the flow only for paths from $a$ to $b$ such that there exist $x, y \in \overline{X}$ with $Q_{x,y}(a, b) > 0$. For our purposes, it will suffice to choose the flow such that when restricted to paths from $a$ to $b$, it concentrates on a single path.

The following theorem compares $\mathcal{E}_{\overline{P}}$ with $\mathcal{E}_P$ in terms of the chosen data.

Theorem 3.2 ([33, Theorem 4]). Consider the setup defined above. Then for $f : X \to \mathbb{R}$ and $\overline{f}$ the extension to $\overline{X}$ defined by the measures $Q_x$, we have
\[ \mathcal{E}_{\overline{P}}(\overline{f}, \overline{f}) \le A\, \mathcal{E}_P(f, f), \]
where
\[
\begin{aligned}
A = \sup_{P(x,y) > 0} \frac{1}{P(x, y)\pi(x)} \Bigg( & \sum_{\gamma \ni (x,y)} |\gamma| F(\gamma)\, \overline{P}(i(\gamma), o(\gamma))\, \overline{\pi}(i(\gamma)) \\
& + 2 \sum_{\gamma \ni (x,y)} |\gamma| F(\gamma) \sum_{b \notin X} Q_b(o(\gamma))\, \overline{P}(i(\gamma), b)\, \overline{\pi}(i(\gamma)) \\
& + \sum_{\gamma \ni (x,y)} |\gamma| F(\gamma) \sum_{\substack{a, b \notin X \\ \overline{P}(a,b) > 0}} Q_{a,b}(i(\gamma), o(\gamma))\, \overline{P}(a, b)\, \overline{\pi}(a) \Bigg).
\end{aligned}
\]

Note that by (3.1), an immediate consequence of these comparison results is that
\[ 1 - \lambda_2(P) \ge \frac{1}{CA} \left(1 - \lambda_2(\overline{P})\right). \]

4. Upper bound

In this section, we prove Theorem 1.2. The proof of the upper bound consists of three main steps. First, we control the mixing time by the spectral gap of a symmetrized random walk (Corollary 4.2). Next, we relate the spectral gap of the symmetrized walk on $\mathbb{F}_p$ with a walk on $\mathbb{P}^1(\mathbb{F}_p)$ (Proposition 4.4). Finally, we show that the spectral gap of the walk on $\mathbb{P}^1(\mathbb{F}_p)$ is of constant order (Proposition 4.8).

Proof of Theorem 1.2.

Let $P$ denote the Markov matrix encoding the step $X \mapsto X + \varepsilon$ and let $\Pi$ denote the Markov matrix encoding $X \mapsto \iota(X)$. Note that $P^T$ encodes the step $X \mapsto X - \varepsilon$. By Corollary 4.2, it suffices to show that $\lambda_2(P^T \Pi P^T P \Pi P) \le 1 - c$ for some constant $c > 0$, independent of $p$.

Let $a_1, a_2 \in \mathbb{Z}$ be two distinct elements in the support of $\mu$ and write $b = a_1 - a_2$. Then in the Markov chain given by the transition matrix $P^T \Pi P^T P \Pi P$, there is $u > 0$ depending only on $\mu$ such that we transition from $x$ to one of $x + b$, $x - b$, $\iota(\iota(x + a_1) + b) - a_1$ and $\iota(\iota(x + a_1) - b) - a_1$ with probability at least $u$. For example, note that $x + b = \iota(\iota(x + a_1) + a_1 - a_1) - a_2$ and so there is a positive probability of moving from $x$ to $x + b$.

Then we can write $P^T \Pi P^T P \Pi P = uL + (1 - u)L'$ where $L$ is the transition matrix for the random walk going from $x$ to one of $x + b$, $x - b$, $\iota(\iota(x + a_1) + b) - a_1$ and $\iota(\iota(x + a_1) - b) - a_1$ with equal probability, and $L'$ is a symmetric stochastic matrix. We have
\[ \lambda_2(P^T \Pi P^T P \Pi P) \le u\lambda_2(L) + 1 - u, \]
and so it suffices to show that $\lambda_2(L) \le 1 - c$ for some constant $c > 0$.

Let $\overline{\iota} : \mathbb{P}^1(\mathbb{F}_p) \to \mathbb{P}^1(\mathbb{F}_p)$ be the function defined by $\overline{\iota}(x) = 1/x$ if $x \neq 0, \infty$, and $\overline{\iota}(0) = \infty$, and $\overline{\iota}(\infty) = 0$. Let $\overline{L}$ denote the transition matrix for the random walk on $\mathbb{P}^1(\mathbb{F}_p)$ going from $x$ to one of $x + b$, $x - b$, $\overline{\iota}(\overline{\iota}(x + a_1) + b) - a_1$ and $\overline{\iota}(\overline{\iota}(x + a_1) - b) - a_1$ with equal probability. By Proposition 4.4, we can instead show
\[ \lambda_2(\overline{L}) \le 1 - c \]
for the walk defined by $\overline{L}$ instead of the walk defined by $L$, and this is exactly given by Proposition 4.8. □

4.1. Reduction to the symmetric walk.

Recall that we are interested in studying the walk $X_{n+1} = \iota(X_n) + \varepsilon_{n+1}$ on $\mathbb{F}_p$, where the $\varepsilon_n$ are independent and identically distributed according to $\mu$. This random walk is non-reversible, so the first step is to relate it to a suitable symmetrization.

Let $\Pi$ denote the transition matrix for the (deterministic) walk $X \mapsto \iota(X)$ on $\mathbb{F}_p$ and let $P$ denote the walk $X \mapsto X + \varepsilon$. Note that $P^T$ is also a Markov matrix and encodes the walk $X \mapsto X - \varepsilon$. Let $\lambda$ be the second largest eigenvalue of $P^T \Pi P^T P \Pi P$. Note that all eigenvalues of $P^T \Pi P^T P \Pi P$ are real and non-negative.

The following result from [12] relates the original walk $K = P\Pi$ to the symmetrized walk $P^T \Pi P^T P \Pi P$.

Lemma 4.1 ([12, Corollary 5.4]). Let $k \ge 2$ and let $x \in \mathbb{R}^p$ such that $\sum_i x_i = 0$. Then
\[ \|(K^T)^k x\| \le \lambda^{(k-2)/4} \|x\|. \]

This result means that if $P^T \Pi P^T P \Pi P$ can be shown to have a spectral gap of constant order, then the walk $K$ mixes in order $\log p$ steps. The following is an easy consequence of Lemma 4.1 (see Equation 5.5 in [12]).

Corollary 4.2.

Let $\pi$ denote the uniform measure on $\mathbb{F}_p$. Then for all $x \in \mathbb{F}_p$,
\[ \|K^k(x, \cdot) - \pi\|_{TV} \le \sqrt{p}\, \lambda^{(k-2)/4}. \]

4.2. Comparison of the random walks on $\mathbb{P}^1(\mathbb{F}_p)$ and $\mathbb{F}_p$.

We now explain how to apply the comparison results to bound the spectral gap of the walk on $\mathbb{F}_p$ in terms of the spectral gap of the walk on $\mathbb{P}^1(\mathbb{F}_p)$.

Recall from the proof of Theorem 1.2 that we denote by $L$ the transition matrix of the random walk on $\mathbb{F}_p$ which moves from $x \in \mathbb{F}_p$ to one of $x + b$, $x - b$, $\iota(\iota(x + a_1) + b) - a_1$ and $\iota(\iota(x + a_1) - b) - a_1$ with equal probability. We denote by $\overline{L}$ the transition matrix of the walk on $\mathbb{P}^1(\mathbb{F}_p)$ which moves from $x \in \mathbb{P}^1(\mathbb{F}_p)$ to one of $x + b$, $x - b$, $\overline{\iota}(\overline{\iota}(x + a_1) + b) - a_1$ and $\overline{\iota}(\overline{\iota}(x + a_1) - b) - a_1$ with equal probability. Here, recall that $\overline{\iota}(x) = 1/x$ if $x \neq 0, \infty$, and $\overline{\iota}(0) = \infty$, and $\overline{\iota}(\infty) = 0$. The following lemma is useful in constructing the data required in Theorem 3.2.

Lemma 4.3.

For all $x, y \in \mathbb{F}_p$ except $x = -a_1$ and $y = -a_1$, if $\overline{L}(x, y) > 0$, then $L(x, y) > 0$.

Proof. First, note that if $x \neq -a_1,\ b^{-1} - a_1,\ -b^{-1} - a_1$, then the statement is clear since in this case the functions $\iota$ and $\overline{\iota}$ are identical for the transitions involved. It can be checked that if $x = b^{-1} - a_1$ or $x = -b^{-1} - a_1$, and $y \neq \infty$ and $\overline{L}(x, y) > 0$, then $L(x, y) > 0$. Similarly, we can check that if $x = -a_1$ and $\overline{L}(x, y) > 0$, then $L(x, y) > 0$ unless $y = -a_1$. □

Proposition 4.4.

Let $L$ and $\overline{L}$ denote the transition matrices for the random walk on $\mathbb{F}_p$ and $\mathbb{P}^1(\mathbb{F}_p)$ respectively. Then there exists an absolute constant $c > 0$ such that
\[ 1 - \lambda_2(L) \ge c\,(1 - \lambda_2(\overline{L})). \]

Proof.

We begin by defining the data needed to apply Theorem 3.2. Take $X = \mathbb{F}_p$ and $\overline{X} = \mathbb{P}^1(\mathbb{F}_p)$. Define, for $x \in \mathbb{F}_p$, $Q_x = \delta_x$, and let $Q_{\infty}$ be uniform on the set of $y \in \mathbb{F}_p$ such that $\overline{L}(\infty, y) > 0$. Take all couplings to be independent, so $Q_{x,y}(x', y') = Q_x(x') Q_y(y')$. Then note that the condition that $Q_{x,y}(x', y') > 0$ for some $x, y \in \overline{X}$ is exactly that either $\overline{L}(x', y') > 0$, or $\overline{L}(x', \infty) > 0$ and $\overline{L}(y', \infty) > 0$.

By Lemma 4.3, if $\overline{L}(x', y') > 0$, then $L(x', y') > 0$ unless $x' = -a_1$ and $y' = -a_1$. Thus, if $\overline{L}(x', y') > 0$, we pick our flow $F$ to be concentrated on the path consisting of the single edge $(x', y')$, except when $x' = y' = -a_1$, where we take the path of length 2 given by $-a_1 \mapsto b^{-1} - a_1 \mapsto -a_1$. If $\overline{L}(x', \infty) > 0$ and $\overline{L}(y', \infty) > 0$, then there must be a path of length 2 connecting $x'$ to $y'$. This is because $x', y' \in \{b^{-1} - a_1,\ -b^{-1} - a_1\}$ and so we can take our flow to be concentrated on the path $x' \mapsto -a_1 \mapsto y'$. Thus, our flow is concentrated entirely on paths of lengths 1 and 2.

We can now compute the constants $C$ and $A$. It is easy to see that $C \le 2$. To bound $A$, note that for each $(x, y)$ for which $L(x, y) > 0$, there is at most one path $\gamma$ of length 1 on which $F(\gamma) \neq 0$ containing $(x, y)$, namely $\gamma = (x, y)$, and there is a bounded number of paths $\gamma$ of length 2 containing $(x, y)$ for which $F(\gamma) > 0$. Thus, for any $(x, y)$ for which $L(x, y) > 0$, there are boundedly many terms in the sums defining $A$, each of bounded size. Hence, $A$ is bounded above by a constant. This implies that
\[ 1 - \lambda_2(L) \ge c\,(1 - \lambda_2(\overline{L})) \]
for some absolute constant $c > 0$. □

Remark 4.5. For a reversible Markov chain $P$ with stationary distribution $\pi$, let
\[ \varphi(P) = \min_{\pi(S) \le 1/2} \frac{\sum_{x \in S,\, y \in S^c} \pi(x) P(x, y)}{\pi(S)} \]
denote the bottleneck ratio (sometimes also called the Cheeger constant or conductance). We note that a constant order spectral gap for $L$ could also be derived using Cheeger's inequality (see [28, Theorem 13.10] for example), which states that
\[ \frac{\varphi(P)^2}{2} \le 1 - \lambda_2(P) \le 2\varphi(P). \]
The bottleneck ratios of $L$ and $\overline{L}$ can be easily compared, which would give a uniform lower bound for $1 - \lambda_2(L)$ using the one for $1 - \lambda_2(\overline{L})$. We prefer to use comparison theory because the argument can be more easily generalized. In particular, it is more robust and would give a sharper result if $\overline{L}$ did not have a constant order spectral gap.

4.3. Spectral gap for random walk on $\mathbb{P}^1(\mathbb{F}_p)$.

In this section, we prove that the random walk on $\mathbb{P}^1(\mathbb{F}_p)$ has a constant order spectral gap.

Recall that $\mathrm{SL}_2(\mathbb{F}_p)$ has a transitive action on $\mathbb{P}^1(\mathbb{F}_p)$, viewed as lines in $\mathbb{F}_p^2$. This is through linear fractional transformations, and may be formally defined by
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x =
\begin{cases}
\dfrac{ax + b}{cx + d} & \text{if } cx + d \neq 0 \text{ and } x \in \mathbb{F}_p, \\
\infty & \text{if } cx + d = 0 \text{ and } x \in \mathbb{F}_p, \\
\dfrac{a}{c} & \text{if } x = \infty \text{ and } c \neq 0, \\
\infty & \text{if } x = \infty \text{ and } c = 0.
\end{cases}
\tag{4.1}
\]
The random walk on $\mathbb{P}^1(\mathbb{F}_p)$ given by moving from $x$ to one of $x + b$, $x - b$, $\overline{\iota}(\overline{\iota}(x + a_1) + b) - a_1$ and $\overline{\iota}(\overline{\iota}(x + a_1) - b) - a_1$ uniformly at random can be viewed as the quotient of a random walk on $\mathrm{SL}_2(\mathbb{F}_p)$. The following result of Bourgain and Gamburd gives a constant order spectral gap for random walks on $\mathrm{SL}_2(\mathbb{F}_p)$.

Theorem 4.6 ([6, Theorem 1]). Let $S \subseteq \mathrm{SL}_2(\mathbb{Z})$ be a symmetric set which generates a non-elementary subgroup of $\mathrm{SL}_2(\mathbb{Z})$. Let $S_p$ denote the set of generators mod $p$ for a prime $p$. Let $P$ denote the transition matrix for the random walk on $\mathrm{SL}_2(\mathbb{F}_p)$ defined by
\[ X_{n+1} = X_n \varepsilon_{n+1}, \]
where the $\varepsilon_i$ are independent and uniformly distributed on $S_p$. Then for all primes $p$ large enough,
\[ \lambda_2(P) \le 1 - c \]
for some constant $c > 0$ independent of $p$.

To show that this applies to the random walk we wish to study, we need the following lemma.

Lemma 4.7.

Let
\[
S = \left\{
\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 - a_1 b & -a_1^2 b \\ b & a_1 b + 1 \end{pmatrix},
\begin{pmatrix} 1 & -b \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 + a_1 b & a_1^2 b \\ -b & 1 - a_1 b \end{pmatrix}
\right\}.
\]
Then $S$ generates a non-elementary subgroup of $\mathrm{SL}_2(\mathbb{Z})$.

Proof.

By Theorem 2.5 of [30], non-elementary subgroups are the same as Zariski-dense subgroups in $\mathrm{SL}_2(\mathbb{Z})$. By a result of Weigel [34] (see also Theorem 2.2 of [30]), it suffices to show that after reducing mod $p$ for some $p \ge 5$, $S_p$ generates $\mathrm{SL}_2(\mathbb{F}_p)$.

If $b \neq 0 \pmod{p}$, then the matrices $\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & -b \\ 0 & 1 \end{pmatrix}$ generate all matrices of the form $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ for $t \in \mathbb{F}_p$.

Using $\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 - a_1 b & -a_1^2 b \\ b & a_1 b + 1 \end{pmatrix}$, we can generate all matrices of the form
\[
\begin{pmatrix} 1 & t' \\ 0 & 1 \end{pmatrix} \cdot
\begin{pmatrix} 1 - a_1 b & -a_1^2 b \\ b & a_1 b + 1 \end{pmatrix} \cdot
\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
\]
In particular, we can generate all matrices in the subset $X_b$ of $\mathrm{SL}_2(\mathbb{F}_p)$ of matrices whose lower left corner is equal to $b$.

Similarly, $S_p$ generates all matrices in the subset $X_{-b}$ of $\mathrm{SL}_2(\mathbb{F}_p)$ of matrices with lower left corner equal to $-b$, using $\begin{pmatrix} 1 + a_1 b & a_1^2 b \\ -b & 1 - a_1 b \end{pmatrix}$ instead of $\begin{pmatrix} 1 - a_1 b & -a_1^2 b \\ b & a_1 b + 1 \end{pmatrix}$.

Finally, we can easily check that $X_b$ and $X_{-b}$ generate $\mathrm{SL}_2(\mathbb{F}_p)$. Thus, $S_p$ generates $\mathrm{SL}_2(\mathbb{F}_p)$ for all $p > |b|$, and hence, $S$ generates a non-elementary subgroup of $\mathrm{SL}_2(\mathbb{Z})$. □

Since the random walk on $\mathbb{P}^1(\mathbb{F}_p)$ is a quotient of a random walk on $\mathrm{SL}_2(\mathbb{F}_p)$, we can obtain the desired bound on the spectral gap.

Proposition 4.8.

Let $\overline{L}$ denote the transition matrix for the random walk on $\mathbb{P}^1(\mathbb{F}_p)$ which moves from $x \in \mathbb{P}^1(\mathbb{F}_p)$ to one of $x + b$, $x - b$, $\overline{\iota}(\overline{\iota}(x + a_1) + b) - a_1$ and $\overline{\iota}(\overline{\iota}(x + a_1) - b) - a_1$ uniformly at random. Then
\[ \lambda_2(\overline{L}) \le 1 - c \]
for some constant $c > 0$ independent of $p$.

Proof. Note that the formulas given in (4.1) actually define a $\mathrm{GL}_2(\mathbb{F}_p)$ action. It is easy to see that addition by $a \in \mathbb{F}_p$ and $\overline{\iota}$ are both linear fractional transformations, represented respectively by the matrices
\[ \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \]
in $\mathrm{GL}_2(\mathbb{F}_p)$. Then we may write the function $x \mapsto x + b$ as a product of these matrices, and similarly for $x - b$, $\overline{\iota}(\overline{\iota}(x + a_1) + b) - a_1$ and $\overline{\iota}(\overline{\iota}(x + a_1) - b) - a_1$. This allows us to view the walk defined by $\overline{L}$ as the quotient of the walk on $\mathrm{SL}_2(\mathbb{F}_p)$ generated by the set
\[
S = \left\{
\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 - a_1 b & -a_1^2 b \\ b & a_1 b + 1 \end{pmatrix},
\begin{pmatrix} 1 & -b \\ 0 & 1 \end{pmatrix},
\begin{pmatrix} 1 + a_1 b & a_1^2 b \\ -b & 1 - a_1 b \end{pmatrix}
\right\},
\]
with transition matrix $\widetilde{L}$. This means that the spectrum of $\overline{L}$ is contained in the spectrum of $\widetilde{L}$, and in particular $\lambda_2(\overline{L}) \le \lambda_2(\widetilde{L})$. But since $S$ generates a non-elementary subgroup of $\mathrm{SL}_2(\mathbb{Z})$ by Lemma 4.7, Theorem 4.6 implies that $\lambda_2(\widetilde{L}) \le 1 - c$ for some constant $c > 0$ independent of $p$, giving us the desired bound. □

5. Points on modular hyperbolas

In this section, we prove the following theorem stating that for all subsets $A \subseteq \mathbb{F}_p$ of size at most $p/2$, the sets $A$ and $\iota(A)$ cannot both be close to intervals. As a corollary, we obtain bounds on the number of solutions to the congruence $xy \equiv 1 \pmod{p}$ lying in $I \times J \subseteq \mathbb{F}_p^2$ for $I$ and $J$ intervals of the same length.

Theorem 5.1.

Let $I$ and $J$ be two intervals in $\mathbb{F}_p$, each of length $m \le p/2$, and let $A \subseteq \mathbb{F}_p$ with $A \subseteq I$ and $\iota(A) \subseteq J$. There exists an absolute constant $\delta > 0$ such that for all $p$ and $m$ sufficiently large, $|A| \le (1 - \delta)m$.

Proof. Throughout the proof, we use the notation $[-k, k]$ to denote the set of integers $\{-k, -k + 1, \ldots, k - 1, k\}$. Let $Q = P^T \Pi P^T P \Pi P$, where $P$ denotes the transition matrix for the random walk on $\mathbb{F}_p$ generated by the uniform measure on $[-1, 1]$ and $\Pi$ encodes the bijection $\iota$. Then $Q$ is the transition matrix of a reversible Markov chain on $\mathbb{F}_p$, and from the proof of Theorem 1.2 it has a constant order spectral gap, i.e. for large enough $p$, $1 - \lambda_2(Q) \ge \gamma$ for some absolute constant $\gamma > 0$.

Assume for the sake of contradiction that there exist intervals $I$, $J$ such that $A \subseteq I$, $\iota(A) \subseteq J$, and $|A| \ge (1 - \delta)m$. We will show using Cheeger's inequality that this implies the spectral gap cannot be of constant order, which gives a contradiction.

Observe that $Q(x, y) \ge c > 0$ whenever $Q(x, y) > 0$, for some constant $c$ independent of $p$. Since the stationary distribution is uniform, the bottleneck ratio of the chain $X_n$ is given by
\[ \Phi = \min_{S \subseteq \mathbb{F}_p,\ |S| \le p/2} \frac{\sum_{x \in S,\, y \notin S} Q(x, y)}{|S|}. \]
By Cheeger's inequality, $\Phi$ is bounded below by a positive constant $\gamma'$ depending only on $\gamma$. Thus,
\[ \frac{\sum_{x \in A,\, y \notin A} Q(x, y)}{|A|} \ge \gamma'. \]

For $x \in A$, we consider $y \in \mathbb{F}_p$ for which $Q(x, y) > 0$. Then
\[ y \in \iota(\iota(A + [-1, 1]) + [-2, 2]) + [-1, 1]. \]
Note that
\[ \iota(A + [-1, 1]) \subseteq \iota(I + [-1, 1]) \]
and
\[ \iota(I + [-1, 1]) = \iota(A \cup ((I + \{-1, +1\}) \setminus A)) = \iota(A) \cup \iota((I + \{-1, +1\}) \setminus A). \]
Let $S_1 = \iota((I + \{-1, +1\}) \setminus A)$. Then $\iota(A + [-1, 1]) \subseteq \iota(A) \cup S_1$, and $|S_1| \le 2 + \delta m$. Similarly, since $\iota(A) \cup S_1 \subseteq J \cup S_1$, we have
\[
\begin{aligned}
\iota((\iota(A) \cup S_1) + [-2, 2]) &\subseteq \iota(J + [-2, 2]) \cup \iota(S_1 + [-2, 2]) \\
&\subseteq A \cup \iota((J + [-2, 2]) \setminus \iota(A)) \cup \iota(S_1 + [-2, 2]).
\end{aligned}
\]
Here, $|\iota((J + [-2, 2]) \setminus \iota(A))| \le 4 + \delta m$, and $|\iota(S_1 + [-2, 2])| \le 5|S_1|$. Letting
\[ S_2 = \iota((J + [-2, 2]) \setminus \iota(A)) \cup \iota(S_1 + [-2, 2]), \]
we have $\iota(\iota(A + [-1, 1]) + [-2, 2]) \subseteq A \cup S_2$, where $|S_2| \le 14 + 6\delta m$.

Finally, we have $y \in (A \cup S_2) + [-1, 1]$, and
\[
\begin{aligned}
(A \cup S_2) + [-1, 1] &\subseteq (I \cup S_2) + [-1, 1] \\
&\subseteq (I + [-1, 1]) \cup (S_2 + [-1, 1]) \\
&\subseteq A \cup ((I + [-1, 1]) \setminus A) \cup (S_2 + [-1, 1]).
\end{aligned}
\]
Let $S_3 = ((I + [-1, 1]) \setminus A) \cup (S_2 + [-1, 1])$. Then $|S_3| \le 44 + 19\delta m$.

Thus, if $x \in A$, $y \notin A$, and $Q(x, y) > 0$, then we must have $y \in S_3$. Hence,
\[ \frac{\sum_{x \in A,\, y \notin A} Q(x, y)}{|A|} \le \frac{\sum_{x \in A,\, y \in S_3} Q(x, y)}{|A|} \le \frac{44 + 19\delta m}{(1 - \delta)m} \le \frac{20\delta}{1 - \delta}, \]
where we assumed that $m$ is sufficiently large (in $\delta$), and used that
\[ \sum_{x \in A,\, y \in S_3} Q(x, y) \le \sum_{x \in \mathbb{F}_p,\, y \in S_3} Q(y, x) = |S_3|. \]
For $\delta$ sufficiently small, $20\delta/(1 - \delta) < \gamma'$, and we have a contradiction. □

The following corollary follows easily from Theorem 5.1 by taking $A = \{x \in I \mid \iota(x) \in J\}$, which away from 0 counts the solutions of interest.

Corollary 5.2.

Let $I$ and $J$ be two intervals in $\mathbb{F}_p$ of the same length $m \le p/2$. There exists an absolute constant $\delta > 0$ such that for sufficiently large $p$ and $m$,
\[ |\{(x,y) \in I \times J \mid xy \equiv 1 \pmod{p}\}| \le (1-\delta)m. \]

Acknowledgements

The authors thank Sourav Chatterjee, Persi Diaconis, Jacob Fox, Sean Eberhard, Ilya Shkredov, Kannan Soundararajan, Péter Varjú, and Thuy Duong Vuong for their help and comments on earlier drafts. We are greatly indebted to Péter Varjú for pointing out the many useful references to the study of spectral gap of Cayley graphs of $\mathrm{SL}_2(\mathbb{F}_p)$, in particular, [6].
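As an informal numerical companion to Corollary 5.2 (it plays no role in the argument), the quantity bounded there can be counted directly for small primes. The Python sketch below is illustrative only: the function name `count_hyperbola_points` and its parameters are hypothetical, intervals are taken to be runs of $m$ consecutive residues modulo $p$, and the inverse is computed via Fermat's little theorem, which assumes $p$ is prime.

```python
def count_hyperbola_points(p, i_start, j_start, m):
    """Count pairs (x, y) in I x J with x*y = 1 (mod p).

    Hypothetical helper: I = {i_start, ..., i_start + m - 1} and
    J = {j_start, ..., j_start + m - 1}, both reduced mod p.
    Assumes p is prime and m <= p.
    """
    count = 0
    for k in range(m):
        x = (i_start + k) % p
        if x == 0:
            continue  # 0 has no multiplicative inverse
        y = pow(x, p - 2, p)  # inverse of x mod p by Fermat's little theorem
        if (y - j_start) % p < m:  # y lies in the interval J
            count += 1
    return count


if __name__ == "__main__":
    p = 10007  # a prime chosen only for illustration
    m = p // 2
    n_points = count_hyperbola_points(p, 1, 1, m)
    print(f"p={p}, m={m}: {n_points} points, ratio {n_points / m:.3f}")
```

Each $x \in I$ has at most one $y$ with $xy \equiv 1 \pmod{p}$, so the count is trivially at most $m$; the corollary asserts a saving of a constant factor over this trivial bound for large $p$ and $m$. Such a finite computation can of course only illustrate the statement, not verify it.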

References

[1] David Aldous. Random walks on finite groups and rapidly mixing Markov chains. In Seminar on probability, XVII, volume 986 of Lecture Notes in Math., pages 243–297. Springer, Berlin, 1983.
[2] Noga Alon, Itai Benjamini, Eyal Lubetzky, and Sasha Sodin. Non-backtracking random walks mix faster. Commun. Contemp. Math., 9(4):585–603, 2007.
[3] Koenraad M. R. Audenaert. A sharp continuity estimate for the von Neumann entropy. J. Phys. A, 40(28):8127–8136, 2007.
[4] Charles Bordenave, Pietro Caputo, and Justin Salez. Cutoff at the “entropic time” for sparse Markov chains. Probab. Theory Related Fields, 173(1-2):261–292, 2019.
[5] Charles Bordenave, Yanqi Qiu, and Yiwei Zhang. Spectral gap of sparse bistochastic matrices with exchangeable rows. Ann. Inst. Henri Poincaré Probab. Stat., 56(4):2971–2995, 2020.
[6] Jean Bourgain and Alex Gamburd. Uniform expansion bounds for Cayley graphs of $\mathrm{SL}_2(\mathbb{F}_p)$. Ann. of Math. (2), 167(2):625–642, 2008.
[7] Jean Bourgain, Alex Gamburd, and Peter Sarnak. Affine linear sieve, expanders, and sum-product. Invent. Math., 179(3):559–644, 2010.
[8] Jean Bourgain and Péter P. Varjú. Expansion in $\mathrm{SL}_d(\mathbb{Z}/q\mathbb{Z})$, $q$ arbitrary. Invent. Math., 188(1):151–173, 2012.
[9] Emmanuel Breuillard and Alex Gamburd. Strong uniform expansion in SL(2, p). Geom. Funct. Anal., 20(5):1201–1209, 2010.
[10] Emmanuel Breuillard and Péter P. Varjú. Cut-off phenomenon for the ax+b Markov chain over a finite field, 2019.
[11] Tsz Ho Chan and Igor E. Shparlinski. On the concentration of points on modular hyperbolas and exponential curves. Acta Arith., 142(1):59–66, 2010.
[12] Sourav Chatterjee and Persi Diaconis. Speeding up Markov chains with deterministic jumps. Probab. Theory Related Fields, 178(3-4):1193–1214, 2020.
[13] F. R. K. Chung, Persi Diaconis, and R. L. Graham. Random walks arising in random number generation. Ann. Probab., 15(3):1148–1165, 1987.
[14] Javier Cilleruelo and Moubariz Z. Garaev. Concentration of points on two and three dimensional modular hyperbolas and applications. Geom. Funct. Anal., 21(4):892–904, 2011.
[15] Javier Cilleruelo and Ana Zumalacárregui. Saving the logarithmic factor in the error term estimates of some congruence problems. Math. Z., 286(1-2):545–558, 2017.
[16] Persi Diaconis and Laurent Saloff-Coste. Comparison techniques for random walk on finite groups. Ann. Probab., 21(4):2131–2156, 1993.
[17] Persi Diaconis and Laurent Saloff-Coste. Comparison theorems for reversible Markov chains. Ann. Appl. Probab., 3(3):696–730, 1993.
[18] Sean Eberhard and Péter P. Varjú. Mixing time of the Chung–Diaconis–Graham random process. Probab. Theory Related Fields, 2020.
[19] M. Z. Garaev. On the logarithmic factor in error term estimates in certain additive congruence problems. Acta Arith., 124(1):27–39, 2006.
[20] A. Salehi Golsefidy and Péter P. Varjú. Expansion in perfect groups. Geom. Funct. Anal., 22(6):1832–1891, 2012.
[21] Konstantin Golubev and Amitay Kamber. Cutoff on graphs and the Sarnak-Xue density of eigenvalues, 2019.
[22] Jimmy He. Markov chains on finite fields with deterministic jumps, 2020.
[23] H. A. Helfgott. Growth and generation in $\mathrm{SL}_2(\mathbb{Z}/p\mathbb{Z})$. Ann. of Math. (2), 167(2):601–623, 2008.
[24] Jonathan Hermon, Allan Sly, and Perla Sousi. Universality of cutoff for graphs with an added random matching, 2020.
[25] Martin Hildebrand. On the Chung-Diaconis-Graham random process. Electron. Comm. Probab., 11:347–356, 2006.
[26] Martin Hildebrand. A lower bound for the Chung-Diaconis-Graham random process. Proc. Amer. Math. Soc., 137(4):1479–1487, 2009.
[27] Martin Hildebrand. On a lower bound for the Chung-Diaconis-Graham random process. Statist. Probab. Lett., 152:121–125, 2019.
[28] David A. Levin and Yuval Peres. Markov chains and mixing times. American Mathematical Society, Providence, RI, 2017. Second edition, With contributions by Elizabeth L. Wilmer, With a chapter on “Coupling from the past” by James G. Propp and David B. Wilson.
[29] Richard Neville, III. On lower bounds of the Chung-Diaconis-Graham random process. ProQuest LLC, Ann Arbor, MI, 2011. Thesis (Ph.D.)–State University of New York at Albany.
[30] Igor Rivin. Zariski density and genericity. Int. Math. Res. Not. IMRN, (19):3649–3657, 2010.
[31] Atle Selberg. On the estimation of Fourier coefficients of modular forms. In Proc. Sympos. Pure Math., Vol. VIII, pages 1–15. Amer. Math. Soc., Providence, R.I., 1965.
[32] I. D. Shkredov. Modular hyperbolas and bilinear forms of Kloosterman sums. J. Number Theory, 220:182–211, 2021.
[33] Aaron Smith. Comparison theory for Markov chains on different state spaces and application to random walk on derangements. J. Theoret. Probab., 28(4):1406–1430, 2015.
[34] Thomas Weigel. On the profinite completion of arithmetic groups of split type. In Lois d'algèbres et variétés algébriques (Colmar, 1991), volume 50 of Travaux en Cours, pages 79–101. Hermann, Paris, 1996.

Department of Mathematics, Stanford University, Stanford, CA, USA
Email address: [email protected]

Department of Mathematics, Stanford University, Stanford, CA, USA