A local limit theorem for Quicksort key comparisons via multi-round smoothing
Béla Bollobás*†   James Allen Fill‡§   Oliver Riordan¶

January 16, 2017

* Department of Pure Mathematics and Mathematical Statistics, Wilberforce Road, Cambridge CB3 0WB, UK and Department of Mathematical Sciences, University of Memphis, Memphis TN 38152, USA. E-mail: [email protected].
† Research supported in part by NSF grant DMS-1301614 and EU MULTIPLEX grant 317532.
‡ Department of Applied Mathematics and Statistics, The Johns Hopkins University, Baltimore, MD 21218-2682, USA. E-mail: [email protected].
§ Research supported by the Acheson J. Duncan Fund for the Advancement of Research in Statistics.
¶ Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK. E-mail: [email protected].
Abstract
As proved by Régnier [11] and Rösler [13], the number of key comparisons required by the randomized sorting algorithm QuickSort to sort a list of n distinct items (keys) satisfies a global distributional limit theorem. Fill and Janson [5, 6] proved results about the limiting distribution and the rate of convergence, and used these to prove a result part way towards a corresponding local limit theorem. In this paper we use a multi-round smoothing technique to prove the full local limit theorem.

1 Introduction

QuickSort, a basic sorting algorithm, may be described as follows. The input is a list, of length n ≥ 0, of distinct real numbers (say). If n = 0 or n = 1, do nothing (the list is already sorted). Otherwise, pick an element of the list uniformly at random to use as the pivot, and compare each other element with the pivot. Recursively sort the two resulting sublists, and combine them in the obvious way, with the pivot in the middle. (Equivalently, one can carry out the two recursive calls in either order.)
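To make the quantity studied below concrete, here is a minimal Python sketch (ours, not part of the paper) that counts the key comparisons made by one randomized run; the function name quicksort_comparisons is our own choice. Repeated calls sample the random variable Q_n defined in the next paragraph.

```python
import random

def quicksort_comparisons(items):
    """Number of key comparisons made by one run of randomized QuickSort."""
    if len(items) <= 1:
        return 0
    pivot = random.choice(items)                  # pivot chosen uniformly at random
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    # len(items) - 1 comparisons against the pivot, plus the two recursive calls.
    return len(items) - 1 + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

if __name__ == "__main__":
    n, reps = 1000, 200
    samples = [quicksort_comparisons(list(range(n))) for _ in range(reps)]
    mean = sum(samples) / reps
    # The normalized fluctuations (Q_n - mean)/n approximate the limit law Q below.
    print(mean, sorted((q - mean) / n for q in samples)[:3])
```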
A run of QuickSort may conveniently be described by its execution tree, with one node for each call. Each node either has no children (if the corresponding list had length 0 or 1) or two children. The main quantity we study here is the random variable Q_n, the total number of comparisons used in sorting a list of n distinct items.

Régnier [11] and Rösler [13] each established, using different methods, a distributional limit theorem for Q_n, proving that (Q_n − E Q_n)/n → Q in distribution as n → ∞, where Q has a certain distribution that can be characterized in a variety of ways; to name one, as the unique fixed point of a certain distributional identity. Using that distributional identity, Fill and Janson [5] showed (among stronger results) that the distribution of Q has a continuous and strictly positive density f on R.

Fill and Janson [6] proved bounds on the rate of convergence in various metrics, including the Kolmogorov–Smirnov distance (i.e., sup-norm distance for distribution functions). Using this and their results about f from [5], they proved a 'semi-local' limit theorem for Q_n; see their Theorem 6.1, which is reproduced in large part as Theorem 14 below. They posed the question [6, Open Problem 6.2] of whether the corresponding local limit theorem (LLT) holds. Here we show that the answer is yes, using a multi-round smoothing technique developed in an initial draft of [2], but not used in the final version of that paper. This method may well be applicable to other distributions in which one can find 'smooth parts' on various different scales, including other distributions obeying recurrences of a type similar to that obeyed by Q_n. Taking the 'semi-local' limit theorem of [6] as a starting point, in this paper we shall prove the following LLT for Q_n, together with an explicit (but almost certainly not sharp) rate of convergence.

Theorem 1.
Defining Q_n and Q as above, and setting q_n := E Q_n, there exists a constant ε > 0 such that the following holds. We have

    P(Q_n = x) = n^{-1} f((x − q_n)/n) + O(n^{-1-ε})     (1)

uniformly in integers x, where f is the continuous probability density function of Q.

In fact, our proof of Theorem 1 gives a bound of the form O(n^{-13/12} log n) on the error probability in (1).

The basic idea used in our proof, that of strengthening a distributional (often normal) limit theorem to a local one by smoothing, is by now quite old. Suppose that X_n takes integer values, and that we know that

    (X_n − µ_n)/σ_n → X in distribution,     (2)

for some nice distribution X (say with continuous, strictly positive density f on R). By the corresponding LLT we mean the statement that whenever x_n is a sequence of deterministic values with x_n = µ_n + O(σ_n) then

    P(X_n = x_n) = σ_n^{-1} f((x_n − µ_n)/σ_n) + o(σ_n^{-1}).     (3)

It is not hard to see that to deduce (3) from (2), it suffices to show that 'nearby' values have similar probabilities, i.e., that if x_n, x'_n = µ_n + O(σ_n) and x_n − x'_n = o(σ_n), then

    P(X_n = x_n) = P(X_n = x'_n) + o(σ_n^{-1}).     (4)

In turn, to prove (4) we might (as in McDonald [8]) try to find a 'smooth part' within the distribution of X_n. More precisely, we might try to write X_n = A_n + B_n where, for some σ-algebra F_n, we have that A_n is F_n-measurable and the conditional distribution of B_n given F_n obeys (or nearly always obeys) a relation corresponding to (4). Then it follows easily (by first considering conditional probabilities given F_n) that (4) holds. One idea is to choose F_n so that B_n has a very well understood distribution, such as a binomial one.

In some contexts, this approach works directly. Here (as far as we can see) it does not. We can decompose Q_n as above with B_n binomial (see Lemma 12), but B_n will have variance Θ(n), whereas Var Q_n = Θ(n²). This would, roughly speaking, allow us to establish that P(Q_n = x_n) and P(Q_n = x'_n) are similar for x_n − x'_n = o(√n), but we need this relation for all x_n − x'_n = o(n).¹

The key idea, as in the draft of [2], is not to try to jump straight from the global limit theorem to the local one, but to proceed in stages.² For certain pairs of values 1 ≤ ℓ < m with m = o(n) we attempt to show that for any two length-ℓ subintervals I₁, I₂ of an interval J of length m we have

    P(Q_n ∈ I₁) = P(Q_n ∈ I₂) + o(ℓ/n).     (5)

¹ Actually, since [6] already contains a 'semi-LLT', it would suffice to consider x_n − x'_n = O(n^{3/4}).
² A related idea has recently been used (independently) by Diaconis and Hough [4], in a different context. They work with characteristic functions, rather than directly with probabilities as we do here, establishing smoothness at a range of frequency scales.

Roughly speaking, we first show that at some scale m = o(n) each interval J of length m has about the right probability, and we then use the relation above to transfer this to shorter and shorter scales, eventually ending with ℓ = 1. In establishing (5), the idea is as before to find a suitable decomposition Q_n = A_n + B_n, but we can use a different decomposition for each scale; there is no requirement that these decompositions be 'compatible' in any way. For each pair (ℓ, m) we need such a decomposition where the distribution of B_n has a property analogous to (5).

There are some complications carrying this out. Our random variables B_n will have smaller variances than the original random variables Q_n.
This means that the point probabilities P(B_n = x_n), and (as it turns out) their differences P(B_n = x_n) − P(B_n = x'_n), are too large compared with the bounds we are aiming for, and the same holds with the points x_n and x'_n replaced by intervals. For this reason we mostly work with ratios, showing under suitable conditions that P(B_n ∈ I₁) ∼ P(B_n ∈ I₂). But this is not always true: even if I₁ and I₂ are close, if both are far into a tail of B_n the ratio of the probabilities may be far from 1. To deal with this we use another trick: if for some interval I there is a significant probability p that A_n + B_n ∈ I with the translated interval I − A_n being far above the mean of B_n, say, then there is another interval J (to the left of I) such that there is a probability much larger than p that A_n + B_n ∈ J. Hence what we will actually show, for a series of scales m, is that (i) each interval of length m has about the right Q_n-probability, and (ii) no interval of length m has Q_n-probability much larger than it should. We will use (ii) at the longer scale m to show that the 'tail contributions' described above are small at scale ℓ. Thus we will be able to transfer the combined statement (i)+(ii) from longer to shorter scales.

In the particular context of QuickSort there is a very nice way to find binomial-like smooth parts: we partially expand the execution tree, looking, roughly speaking, for a way of writing the original instance as the union of Θ(s) instances of QuickSort each run on Θ(r) input values, where s = n/r. Conditioning on this partial expansion (plus a little further information) the unknown part of the distribution is then 'binomial-like': it is a sum of Θ(s) independent random variables each with 'scale' Θ(r).

The rest of the paper is organized as follows: in Section 2 we state two standard results we shall need later, and then establish the existence of the decompositions described in the previous paragraph. In Section 3 we prove some simple properties of 'binomial-like' distributions. Section 4 is the heart of the paper; here we present the core smoothing argument, showing how to transfer 'smoothness' from a scale m to a scale ℓ ≪ m under suitable conditions. In Section 5 we complete the proof of Theorem 1; this is a matter of applying the results from Section 4 with suitable parameters, taking as a starting point the 'semi-local' limit theorem established by Fill and Janson [6]. Finally, in Section 6 we outline a different way of applying the same smoothing results, taking a weaker distributional convergence result as the starting point; this may be applicable in other settings.

We shall use the Azuma–Hoeffding inequality (see [1] and [7]) in the following form (see, for example, Ross [14, Theorem 6.3.3]).
Theorem 2.
Let (Z_i)_{i≥0} be a martingale with mean µ = E Z_n, so that Z_0 = µ, and suppose that for nonnegative constants α_i, β_i, i ≥ 1, we have −α_i ≤ Z_i − Z_{i−1} ≤ β_i. Then, for any n ≥ 0 and a > 0 we have

    P(Z_n − µ ≥ a) ≤ exp{ −2a² / Σ_{i=1}^n (α_i + β_i)² },

and the same bound applies to P(Z_n − µ ≤ −a).

We shall also need Esseen's inequality, also known as the Berry–Esseen Theorem; see, for example, Petrov [10, Ch. V, Theorem 3]. We write Φ for the distribution function of the standard normal random variable.
Theorem 3.
Let Z_1, . . . , Z_t be independent random variables with ρ := Σ_{i=1}^t E(|Z_i|³) finite, and let S = Σ_{i=1}^t Z_i. Then

    sup_x | P(S ≤ x) − Φ((x − µ)/σ) | ≤ Aρ/σ³,

where µ and σ² are the mean and variance of S, and A is an absolute constant.

In this subsection we shall show that, given a parameter r, a single run of QuickSort on a list of length n will, with high probability, involve Ω(n/r) instances of QuickSort run on disjoint lists of length between r/2 and r.

Let 2 ≤ r < n be integers. We can implement QuickSort on a list of length n in two phases as follows: in the first step of Phase I, pick the random pivot dividing the original list into two sublists of total length n − 1. At each subsequent step of Phase I, if all the current sublists have length at most r, do nothing. Otherwise, pick a sublist of length at least r + 1 arbitrarily, and pick the random pivot in this sublist, dividing its remaining elements into two new sublists. After n steps, we proceed to Phase II, where we simply run QuickSort on all remaining sublists. Let X_{n,r} denote the number of sublists at the end of Phase I that have length between r/2 and r.
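The two-phase implementation is easy to simulate. The following sketch (ours; it assumes the convention that a pivot at position i splits a length-b sublist into pieces of lengths i and b − 1 − i, and that 'length between r/2 and r' is the range counted by X_{n,r}) runs Phase I and reports the number of medium-length sublists.

```python
import random

def phase_one_blocks(n, r):
    """Partially expand the QuickSort execution tree: keep splitting any
    sublist of length > r by a uniformly random pivot, and return the list
    of sublist lengths once every sublist has length <= r."""
    blocks = [n]
    while True:
        big = [b for b in blocks if b > r]
        if not big:
            return blocks
        b = big[0]
        blocks.remove(b)
        pivot = random.randrange(b)            # pivot position within the sublist
        blocks.extend([pivot, b - 1 - pivot])  # the two sublists around the pivot

def count_medium_blocks(n, r):
    """X_{n,r}: number of Phase I sublists with length between r/2 and r."""
    return sum(1 for b in phase_one_blocks(n, r) if r / 2 <= b <= r)

if __name__ == "__main__":
    n, r = 10_000, 100
    print(count_medium_blocks(n, r), n / (3 * r))  # typically well above n/(3r)
```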
Lemma 4. Let r ≥ 20 be even and n > r. Then

    P( X_{n,r} ≤ n/(3r) ) ≤ e^{−n/(400r)}.

Proof.
We have specified that r be even only for convenience. We havemade no attempt to optimize the values of the various constants; these willbe irrelevant later.Running QuickSort in two phases as above, let T be the number of‘active’ steps in Phase I, i.e., steps in which we divide a sublist into two.Clearly, T n , the first T steps of Phase I are active, and after T steps wehave T + 1 sublists of total length n − T . The idea of the proof is to showthat T is very unlikely to be larger than 20 n/r , say, that E X n,r is of order n/r , and that each decision in the first phase of our algorithm alters theconditional expectation of X n,r by at most 1. The result will then followfrom the Azuma–Hoeffding inequality.Throughout the proof we keep r >
20 fixed. Let t = ⌈ n/r ⌉ . Observe that if T > t , then after step t we have t + 1 sublists with totallength < n . Since at most 10 n/r t / r/
10, at least t / < r/
10. Let N be thenumber of sublists after t steps that have length less than r/
10, so we haveshown that P ( T > t ) P ( N > t / . In any step of Phase I, we either do nothing, or randomly divide a list ofsome length ℓ > r + 1. In the latter case, the (conditional, given the past)probability of producing a sublist of length < r/
10 is at most2 ( r/
10 + 1) ℓ r/ ℓ < , r >
20 and ℓ > r . It follows that N is stochastically dominated by abinomial distribution with parameters t and 3 /
10, so P (cid:0) T > t (cid:1) P (cid:0) N > t / (cid:1) P (cid:0) Bin( t , / > t / (cid:1) e − t / , (6)using Theorem 2, or a standard Chernoff bound, for the last step.Turning to the next part of the argument, as r is fixed throughout, let uswrite X n for the random variable X n,r . We extend the definition of X n tothe case n r by considering Phase I to end immediately (with one ‘sublist’of length n ) in this case. The sequence ( X n ) satisfies the deterministic initialconditions X = · · · = X ( r/ − = 0 ,X r/ = · · · = X r = 1 , and (considering the first step in Phase I as described above) the distribu-tional recurrence relation X n L = X U n − + X ∗ n − U n , n > r + 1 , (7)where, on the right, X j and X ∗ j are independent probabilistic copies of X j for each j = 1 , . . . , n − U n is uniformly distributed on { , . . . , n } , andis independent of all the X and X ∗ variables. Let ξ n := E X n . From (7) we have ξ n = n P n − i =0 ξ i for n > r + 1. It follows that ξ = · · · = ξ ( r/ − = 0 ,ξ r/ = · · · = ξ r = 1 , and ξ n = n + 1 r + 1 , n > r + 1 . (8)(The last equation holds also for n = r .) Define ˜ ξ n = n +1 r +1 for all n . Then˜ ξ k − + ˜ ξ n − k = ˜ ξ n always. Since | ξ n − ˜ ξ n | r/ r + 1 < , it follows that if n > r + 1 (and so ξ n = ˜ ξ n ), then − < ξ k − + ξ n − k − ξ n < k n .Let F t denote the σ -algebra corresponding to information revealed inthe first t steps of Phase I as described above. Define M t = E [ X n | F t ] , so that ( M t ) nt =0 is a (Doob) martingale. It follows from (9) that the martin-gale ( M t ), which has mean M = ξ n given by (8), satisfies − < M t − M t − < t .Let E be the event that X n n r . Since ξ n = n + 1 r + 1 > nr + 1 > n r , when E holds we have X n − ξ n − n r . After the first T steps of Phase I,nothing further happens, so M T = M T +1 = · · · = M n = X n . Hence, writing t = ⌈ n/r ⌉ as before, we have P ( E ) P ( T > t ) + P (cid:0) M t − ξ n − n r (cid:1) . By (6) and the Azuma–Hoeffding inequality (Theorem 2), it follows that P ( E ) e − t / + exp (cid:18) − n r t (cid:19) exp (cid:18) − n r (cid:19) + exp (cid:16) − n r (cid:17) exp (cid:16) − n r (cid:17) , where the penultimate inequality holds because 20 n/r t n/r , since n/r >
5, and the final inequality holds because e − x/ + e − x/ e − x/ for x > Corollary 5.
Let r ≥ 20 be even and n > r. Then we may write Q_n = A + B where, for some σ-algebra F, we have that A is F-measurable, and, with probability at least 1 − e^{−n/(400r)}, the conditional distribution of B given F is the sum of s = ⌈n/(3r)⌉ independent random variables B_1, . . . , B_s with each B_i having the distribution of Q_{r_i} for some r_i with r/2 ≤ r_i ≤ r.

Proof. Run
QuickSort in two phases as above, and define X n,r as in Lemma 4.Let E be the event that X n,r > s = ⌈ n/ (3 r ) ⌉ , so P ( E ) > − e − n/ (400 r ) byLemma 4. We now subdivide Phase II into two parts. When E holds, weselect s sublists from the end of Phase I with length between r/ r , oth-erwise we do not select any. In Phase IIa, we run QuickSort on all sublists except the selected ones. In Phase IIb, we run
QuickSort on the selectedsublists. Take the σ -algebra F to be the σ -algebra corresponding to all theinformation uncovered in Phases I and IIa, and A to be the total numberof comparisons made during Phases I and IIa. Take B i , i = 1 , . . . , s , to be(when E occurs) the number of comparisons involved in running QuickSort on the i th selected sublist. The sum of the B i above will roughly serve as our ‘binomial-like’ distribu-tion, but we would like a little more information about it. Knowing that B i has ‘scale’ roughly r i ≈ r , we shall condition on | B i − E B i | being at most2 r i . This will still keep a constant fraction of the variance, while giving usbetter control on the distribution of the sum of such random variables.Writing q n for E Q n , for n > Q ∗ n = ( Q n − q n ) /n denote the centeredand normalized form of Q n . Since Q ∗ n converges in distribution to Q , adistribution with a continuous positive density on R , we know that thereare constants n and c > n > n we have, say, P ( Q ∗ n ∈ [ − , − > c and P ( Q ∗ n ∈ [1 , > c . Hence, for n > n , P ( Q ∗ n ∈ [ − , > c (10)and, since P ( Q ∗ n ∈ I | Q ∗ n ∈ [ − , > c / c for I = [ − ,
1] and I = [1 , Q ∗ n | Q ∗ n ∈ [ − , > c . Let W ′ n denote the distribution of Q ∗ n conditioned to lie in [ − , W n := W ′ n − E W ′ n . Then | W n | W n > c . We will record theconsequences for the unrescaled distribution of Q n immediately after thefollowing definition. Definition 1.
Given r > D r denote the set of probability distributionsof random variables X with the following properties: E X = 0, | X | r ,and Var X > c ( r/ .The calculations above have the following simple consequence: for any r > n and any r ′ satisfying r/ r ′ r , we have P ( Q r ′ ∈ [ q r ′ − r ′ , q r ′ +9 r ′ ]) > c , and the conditional distribution of Q r ′ given this event is of theform z r ′ + X r ′ for some constant z r ′ and some X r ′ with law in D r . Definition 2.
Given r > s , let B r,s denote the setof s -fold convolutions of distributions from D r .In other words, X has a distribution in B r,s if we can write X = X + · · · + X s where the X i are independent and each has law in D r . The distri-butions in B r,s will be the ‘binomial-like’ ones we shall use in the smoothingargument. Remark.
More properly we should write D r,c and B r,s,c for the classesdefined in Definitions 1 and 2. In this paper we need only consider a par-ticular value of c as at the start of this subsection, but in other contextsone might consider these classes for other values of c . The results below ofcourse extend to this setting.The next lemma, a simple consequence of Corollary 5, will play a keyrole in our smoothing arguments. Lemma 6.
There are positive constants r , c and c such that the followingholds whenever n and r are positive integers with r even and r r c n :we may write Q n = A + B where, for some σ -algebra F , we have that A is F -measurable, and, with probability at least − e − c n/r , the conditionaldistribution of B given F is in the class B r,s , with s = ⌈ c n/r ⌉ .Proof. We start by taking F ′ , A ′ , and B ′ to be as in Corollary 5. Let E ′ ∈ F ′ be the event that we may write the conditional distribution of B ′ asthe sum of independent variables B ′ , . . . , B ′ t , t = ⌈ n/ (3 r ) ⌉ , with B ′ i having(conditionally given F ′ ) the distribution of Q r i for some r/ r i r .By Corollary 5 we have P ( E ′ ) > − e − Ω( n/r ) . We choose c c /
6, andset s = ⌈ c n/r ⌉ . Note that c n/r >
1, so s c n/r . We shall revealcertain extra information as described in a moment. Let E i denote theevent that B ′ i ∈ [ q r i − r i , q r i + 2 r i ], and let E denote the event that atleast s of the events E i occur. Each event E i has conditional probabilityat least 2 c by (10). Since the E i ’s are conditionally independent given F ′ ,and c t > c n/r > s , we see [from P ( E | E ′ ) > P (Bin( t, c ) > c t ) andChernoff’s inequality] that P ( E | E ′ ) > − e − Ω( t ) = 1 − e − Ω( n/r ) . Hence P ( E ) > − e − Ω( n/r ) .The extra information we reveal is as follows: firstly, which E i ’s occur,and hence whether E occurs. When E does occur, we let I be the set ofthe first s indices i such that E i occurs, otherwise we may take I = ∅ , say.10e now reveal the values of all B ′ i , i / ∈ I , and set B = P i ∈I ( B ′ i − z r i ). Let F ⊃ F ′ denote the σ -algebra generated by all information revealed so far.Then A = Q n − B = A ′ + P i ∈I z r i + P i/ ∈I B ′ i is certainly F -measurable.Also, when E occurs, the conditional distribution of B given F is in B r,s , asrequired. In this section we establish some simple properties of distributions in theclass B r,s without aiming for tight bounds. The first property is asymptoticnormality, which will give ‘smoothness’ on scales larger than r . Here and inwhat follows all constants are absolute, except in that they may depend onthe absolute constant c in the definition of D r . Lemma 7.
For any random variable X with distribution in B_{r,s} we have Var X = Θ(r²s) and

    P( X ≤ E X + x √(Var X) ) = Φ(x) + O(1/√s),

where Φ is the standard normal distribution function, and the implicit constants depend only on the constant c₀ in Definition 1.

Proof. Dividing X, and each of the s summands X_i comprising it, through by r, we may assume without loss of generality that r = 1. Then apply the Berry–Esseen Theorem (Theorem 3 above), noting that each X_i is bounded in absolute value by 4, and so has bounded third moment, and that Var X is Θ(s) (under our assumption that r = 1).

Next we establish a common tail bound for all distributions in the class B_{r,s}.
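The normal approximation in Lemma 7 is easy to check numerically. The sketch below (ours) sums s independent, centred, bounded summands; the particular summand law, uniform on {−2r, −r, r, 2r}, is just one convenient choice loosely in the spirit of Definition 1, not a distribution actually produced by QuickSort.

```python
import math, random

def sample_B(r, s):
    """One draw from a 'binomial-like' sum: s independent, centred summands,
    each bounded in absolute value by 4r (a stand-in for the class B_{r,s})."""
    return sum(random.choice((-2 * r, -r, r, 2 * r)) for _ in range(s))

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

if __name__ == "__main__":
    r, s, reps = 10, 400, 10_000
    xs = [sample_B(r, s) for _ in range(reps)]
    var = sum(x * x for x in xs) / reps          # the mean is 0 by construction
    for t in (-1.0, 0.0, 1.0):
        emp = sum(x <= t * math.sqrt(var) for x in xs) / reps
        print(t, round(emp, 3), round(Phi(t), 3))  # empirical CDF vs Phi(t)
```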
Lemma 8. There are constants c > 0 and C such that, for all X with distribution in B_{r,s}, all t > 0, and all ℓ ≥ r, we have

    P( X ∈ [t, t + ℓ] ) ≤ C (ℓ/(r√s)) e^{−ct²/(r²s)}.

Proof.
The Azuma–Hoeffding inequality gives that, uniformly for Y withdistribution in B r,s , we have P ( Y > t ) e − Ω( t / ( r s )) . (11)11eparately, for any interval I of length ℓ > r , we have P ( Y ∈ I ) = O ( ℓ/ ( r √ s )) . (12)Indeed, writing Y ′ for a Gaussian with the same mean and variance as Y ,by Lemma 7 we have P ( Y ∈ I ) = P ( Y ′ ∈ I ) + O (1 / √ s ). As Y ′ has varianceΘ( r s ) we have P ( Y ′ ∈ I ) = O ( ℓ/ ( r √ s )), and the error term is absorbedinto the main term by the lower bound on ℓ . Somewhat surprisingly, theproof can be completed by multiplying these two bounds.Indeed, in proving the claimed result, adjusting the constants if needed,we may assume that s is even. Then we may write X = Y + Z where Y and Z are independent and have distributions in the class B r,s/ . Let I = [ t, t + ℓ ]with t >
0. Since X > t implies either Y > t/ Z > t/
2, we may write P ( X ∈ I ) P (cid:0) Y + Z ∈ I, Y > t/ (cid:1) + P (cid:0) Y + Z ∈ I, Z > t/ (cid:1) . (13)We bound the first term from above by P ( Y > t/ P ( Y + Z ∈ I | Y > t/ P ( Y > t/
2) sup y P ( Y + Z ∈ I | Y = y )= P ( Y > t/
2) sup x P ( Z ∈ [ x, x + ℓ ]) . The final quantity is e − Ω( t / ( r s )) × O ( ℓ/ ( r √ s )) by (11) and (12). The secondterm in (13) may be bounded in the same way. In proving Lemma 7 we applied the Berry–Esseen Theorem to distributionsfrom B r,s ; next we shall apply the same result to exponential tilts of thesedistributions, to prove the following result. In what follows, c is the con-stant appearing in Definition 1. Lemma 9.
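As a concrete illustration of the exponential tilting defined in (15) below, the following sketch (ours) tilts a small centred distribution and prints the mean of the tilted law, which moves continuously and monotonically with the tilting parameter α; this is exactly the effect exploited in the proof of Lemma 9.

```python
import math

def tilt(pmf, alpha):
    """Exponentially tilt a finite pmf {x: p(x)}: p_alpha(x) is proportional
    to p(x) * exp(alpha * x), normalized by gamma = E[exp(alpha * X)]."""
    weights = {x: p * math.exp(alpha * x) for x, p in pmf.items()}
    gamma = sum(weights.values())
    return {x: w / gamma for x, w in weights.items()}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

if __name__ == "__main__":
    # A centred distribution supported on [-4, 4], loosely as in Definition 1.
    pmf = {-4: 0.25, 0: 0.5, 4: 0.25}
    for alpha in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(alpha, round(mean(tilt(pmf, alpha)), 4))  # mean increases with alpha
```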
Let
K > be constant. There exists a constant C ′ = C ′ ( c , K ) such that the following holds whenever λ > , m > ℓ > C ′ r and λmr √ s K. (14) Let X be a random variable with distribution in B r,s , and let I and I be subintervals, each of length ℓ , of an interval J of length m with J ⊂ [ − Kλr √ s, Kλr √ s ] . Then P ( X ∈ I ) = P ( X ∈ I ) (cid:18) O (cid:18) rℓ + λmr √ s (cid:19)(cid:19) . Y be arandom variable with bounded support. Then for any α ∈ R we may definethe tilted distribution L ( Y ( α ) ) by P ( Y ( α ) ∈ d x ) = P ( Y ∈ d x ) e αx γ , (15)where γ = γ ( L ( Y ) , a ) = E e αY ; here L ( Y ) denotes the law, or distribution,of the random variable Y .Before starting the proof of Lemma 9, we establish some elementaryproperties of tilted versions of distributions with law in the set D definedin Definition 1. Lemma 10.
There is a constant c > , depending only on c , such thatfor any Y with L ( Y ) ∈ D and any α ∈ [ − , we have Var Y ( α ) > c .Furthermore, dd α E Y ( α ) > c (16) whenever | α | .Proof. The first statement is intuitively clear: we take a distribution whosevariance is bounded from below, and ‘distort it’ by a bounded amount, sothe variance will still be bounded from below. We spell out a concreteargument, not aiming for the best possible bound.Let Y have distribution in D . Then, recalling Definition 1, for any b > c E Y b P ( Y b ) + 16 P ( Y > b ) b + 16 P ( | Y | > b ) . Take b = √ c . Then P ( | Y | > √ c ) > ( − ) c = c > c . Without loss of generality we may thus assume that P ( Y > √ c ) > c . Since E Y = 0 and Y is supported on [ − , P ( Y < > · √ c · c = c / . Y is supported on [ − , | α | γ = E e αY e , while e αx is at least e − for all x in the support of Y . Hence, P ( Y ( α ) > √ c ) > e − γ P ( Y > √ c ) > e − c . Similarly, P ( Y ( α ) < > e − c / . The last two bounds clearly imply a lower bound on Var Y that dependsonly on c .To establish (16), note that E Y ( α ) = E ( Y e αY ) E ( e αY ) , so by the quotient rule,dd α E Y ( α ) = E ( Y e αY ) E ( e αY ) − (cid:18) E ( Y e αY ) E ( e αY ) (cid:19) = Var Y ( α ) . Proof of Lemma 9.
Let X = X + · · · + X s have distribution in B r,s . We aimto bound P ( X ∈ I ) / P ( X ∈ I ), where I and I are intervals of length ℓ bothcontained in a common interval J of length m . By rescaling (considering ℓ/r and m/r in place of ℓ and m ) we may assume without loss of generalitythat r = 1.A simple calculation shows that if we tilt each X i by the same parameter α , then the independent sum of X ( α )1 , . . . , X ( α ) s has the same distribution as X ( α ) . By Lemma 10 we thus havedd α E X ( α ) = dd α s X i =1 E X ( α ) i > cs (17)for α ∈ [ − , I j = [ t j , t j + ℓ ]for j = 1 ,
2. Since r = 1 and m > C ′ r = C ′ by (14), we have | t | s Kλ √ ss = Kλ √ s KλmC ′ √ s K C ′ , C ′ largeenough, we have | t | cs . Since E X (0) = E X = 0, it follows from (17) thatthere is a unique value a ∈ [ − ,
1] such that E X ( a ) = t . Moreover, we have a = O ( t /s ) = O ( λs − / ) . (18)From now on we fix this tilting parameter, writing X ′ i for X ( a ) i , and X ′ for the independent sum X ′ + · · · + X ′ s . As noted above, X ′ has the samedistribution as X ( a ) . In other words, P ( X ′ ∈ d x ) = P ( X ∈ d x ) e ax γ , where γ = E e aX is independent of x . Since I and I lie in an interval J oflength m , it follows easily that P ( X ′ ∈ I ) P ( X ′ ∈ I ) = P ( X ∈ I ) P ( X ∈ I ) e O ( am ) . Now by (18) and (14), am = O ( λms − / ) = O (1) , so the e O ( am ) term is 1 + O ( am ). Hence P ( X ∈ I ) P ( X ∈ I ) = P ( X ′ ∈ I ) P ( X ′ ∈ I ) (cid:18) O (cid:18) λms / (cid:19)(cid:19) . (19)It remains to bound the ratio P ( X ′ ∈ I ) / P ( X ′ ∈ I ).Like the distribution of X i , the distribution of X ′ i is supported on theinterval [ − , X ′ i = O (1). But by the first part of Lemma 10,Var X ′ i > c , so Var X ′ i = Θ(1). Also, the absolute third moment E | X ′ i | isclearly at most 4 = O (1). The implicit constants in these estimates dependonly on c and K , not on i . Hence, µ := E X ′ = t ,σ := Var X ′ = s X i =1 Var X ′ i = Θ( s ) ,ρ := s X i =1 E | X ′ i | = O ( s ) . λ > λm/ √ s = O (1) we have m λm = O ( √ s ) = O ( σ ) , and hence m σ = O (cid:16) mσ (cid:17) = O (cid:16) ms / (cid:17) = O (cid:18) λms / (cid:19) . (20)Let Z ∼ N(0 ,
1) be a standard normal random variable. By the Berry–Esseen Theorem (Theorem 3), we have (cid:12)(cid:12) P ( X ′ ∈ I ) − P ( µ + σZ ∈ I ) (cid:12)(cid:12) Aρσ . Now by definition t = µ , and by assumption I and I are contained inan interval of length m . Hence, for any y ∈ I ∪ I we have | y − µ | m = O ( σ ) . (21)It follows that for j = 1 , P ( µ + σZ ∈ I j ) = P (cid:18) Z ∈ (cid:20) t j − µσ , t j − µ + ℓσ (cid:21)(cid:19) = Θ(1) × ℓσ = Θ (cid:18) ℓs / (cid:19) . Since
Aρ/σ = O ( s − / ), we thus have P ( X ′ ∈ I j ) = P ( µ + σZ ∈ I j )(1 + O (1 /ℓ )) . The implicit constant here does not depend on C ′ . Recalling that ℓ > C ′ r = C ′ , choosing C ′ large enough, the 1 + O (1 /ℓ ) factor here is at least 1 /
2, andit follows by dividing the bound for j = 2 by that for j = 1 that P ( X ′ ∈ I ) P ( X ′ ∈ I ) = P ( µ + σZ ∈ I ) P ( µ + σZ ∈ I ) (1 + O (1 /ℓ )) . (22)Let φ ( x ) = (2 π ) − / e − x / be the density function of the standard nor-mal variable. Since φ ( x ) = φ (0) e O ( x ) , from (21) we have P ( µ + σZ ∈ I j ) = P (cid:18) Z ∈ (cid:20) t j − µσ , t j − µ + ℓσ (cid:21)(cid:19) = ℓφ (0) σ e O ( m /σ ) . Hence, P ( µ + σZ ∈ I ) P ( µ + σZ ∈ I ) = e O ( m /σ ) = 1 + O ( m /σ ) = 1 + O (cid:18) λms / (cid:19) , (23)recalling (20). Together, (19), (22) and (23) complete the proof.16 The core smoothing argument
In this section we prove an ungainly lemma (Lemma 11), which is the heartof the smoothing argument. In this lemma there are many parameters; inthe next section we give a simple choice of parameters that allows us toprove Theorem 1. The reason for keeping the greater generality here is thatit seems (to us) to give a better picture of why the method works, and mayhelp in applying the method in other contexts.So far, it has not mattered whether the intervals we consider are open,closed or half-open. However, in the arguments below, at certain points wewill need to partition one interval into disjoint intervals of the same type.For this reason, from now on we consider only half-open intervals of the form( a, b ].The following definition is key to our smoothing arguments. Recall that q n = E Q n , and that ( Q n − q n ) /n d → Q , where Q has a continuous positivedensity function f on R . Given an integer n > m , ε , and Γ >
1, we say that the statement S(n, m, ε, Γ) holds if

(i) for any half-open interval I ⊂ R of length m and any x ∈ R such that q_n + nx ∈ I we have |P(Q_n ∈ I) − (m/n) f(x)| ≤ ε m/n, and

(ii) for any half-open interval I of length m we have P(Q_n ∈ I) ≤ Γ m/n.

Roughly speaking, the fact that (Q_n − q_n)/n → Q in distribution implies that S(n, m, ε, Γ) holds for some m = m(n) = o(n), some ε = ε(n) →
0, and some constant Γ.We seek to show that [property (i) of] S ( n, , ε ′ , Γ ′ ) holds for some slightlylarger ε ′ = ε ′ ( n ) and Γ ′ . Lemma 11.
There exist positive constants c, C′, C and r₀ such that the following holds. If the statement S(n, m, ε, Γ) holds, ℓ divides m, and there exist real numbers r and λ ≥ 1 such that

    r₀ ≤ r ≤ cn,   m ≥ ℓ ≥ C′r   and   λm ≤ √(rn),     (24)

then S(n, ℓ, ε′, Γ′) holds for any ε′ ≥ ε + Γη and Γ′ ≥ Γ(1 + η), where

    η = C ( e^{−cλ²} + λm/√(rn) + r/ℓ + (n/ℓ) e^{−cn/r} ).     (25)

We could work with a statement S(n, m, L, ε, Γ) where, in condition (i) only, the interval I is restricted to lie within [q_n − L, q_n + L]. In Lemmas 11 and 13, the 'input' value of L could be anything (at least m), and the 'output' value of L would be the same as the input. Nothing would change in the proofs.
Of course, we could replace C and C′ by a single constant max{C, C′}, but they play very different roles in the proof, so we keep them separate. As we shall see later, the key terms on the right in (25) are the second and third; we can choose λ = log n, say, and then the key conditions to keep η small are (roughly stated) that

    r ≪ ℓ ≤ m ≪ √(rn).     (26)

Proof.
Let J ⊂ R be any interval of length m , and let I and I be subin-tervals of J with length ℓ such that P ( Q n ∈ I ) is minimal and P ( Q n ∈ I )is maximal. We shall show that P ( Q n ∈ I ) − P ( Q n ∈ I ) η Γ ℓ/n. (27)Assuming this for the moment, let us show that the lemma follows. Toestablish property (i) of S ( n, ℓ, ε ′ , Γ ′ ), let I be any interval of length ℓ , andchoose an interval J of length m with I ⊂ J . Let x be such that q n + nx ∈ I and define I and I as above. By definition, P ( Q n ∈ I ) P ( Q n ∈ I ) P ( Q n ∈ I ). Also, since J can be partitioned into intervals of length ℓ , bysimple averaging we have P ( Q n ∈ I ) ℓm P ( Q n ∈ J ) P ( Q n ∈ I ) . (28)By assumption P ( Q n ∈ J ) is within εm/n of f ( x ) m/n . From (27) and (28)it follows that P ( Q n ∈ I ) and P ( Q n ∈ I ), and hence P ( Q n ∈ I ), are within( εℓ/n ) + ( η Γ ℓ/n ) ε ′ ℓ/n of f ( x ) ℓ/n , as required.The argument for property (ii) is very similar. Given any interval I of length ℓ , find an interval J of length m containing it, and define I and I as above. This time, by assumption, P ( Q n ∈ J ) Γ m/n , so (byaveraging) P ( Q n ∈ I ) Γ ℓ/n . But then P ( Q n ∈ I ) P ( Q n ∈ I ) (Γ ℓ/n ) + ( η Γ ℓ/n ) Γ ′ ℓ/n , as required.It remains to prove (27). Let r and λ > r slightly if necessary (and adjusting the constants in the lemma appropri-ately), we may assume that r is an even integer. Let s = ⌈ c n/r ⌉ where c isas in Lemma 6. Write Q n = A + B where A , B , and the σ -algebra F are as inLemma 6. The idea is to condition on F and use the fact that, with very highprobability, B has conditional distribution in B r,s to show that P ( Q n ∈ I )and P ( Q n ∈ I ) are similar. Let σ = √ rn . Since s = Θ( n/r ), we have r √ s = Θ( σ ), so by the first part of Lemma 7, Var[ B | F ] = Θ( r s ) = Θ( σ )(with very high probability). It will be crucial that m ≪ σ , so that I and I are not too far apart on the scale over which B varies, but that ℓ ≫ r .18n the following proof, all statements hold provided c is small enoughand C is large enough. Let E be the event that the conditional distributionof B given F is indeed in B r,s , so that by Lemma 6 P ( E ) > − e − cn/r . (29)Let E be the event that E occurs and the midpoint of I − A lies in [ − λσ, λσ ].Note that I is deterministic, and A is F -measurable, so E ∈ F . Supposefirst that E occurs. Since I and I are both subsets of J , an intervalof length m σ/λ λσ , we have that I − A and I − A are both con-tained in { x : | x | λσ } . Note that 2 λσ = O ( λr √ s ). Also, λm/ ( r √ s ) = O ( λm/ √ rn ) = O (1) by (24). Hence, the conditions of Lemma 9 hold forsome constant K , provided we take C ′ > C ′ ( c , K ). When E occurs, itthus follows from Lemma 9 that P ( B ∈ I − A | F ) = P ( B ∈ I − A | F )[1 + O ( η )] , (30)where η = ( r/ℓ ) + ( λm/σ ) . Since E is F -measurable, and Q n ∈ I i if and only if B ∈ I i − A , takingthe expectation of both sides of (30) it follows that P (cid:0) { Q n ∈ I } ∩ E (cid:1) = P (cid:0) { Q n ∈ I } ∩ E (cid:1) [1 + O ( η )] , so P (cid:0) { Q n ∈ I } ∩ E (cid:1) − P (cid:0) { Q n ∈ I } ∩ E (cid:1) = O ( η ) P ( Q n ∈ I ) = O ( η Γ ℓ/n ) , (31)since P ( Q n ∈ I ) ℓm P ( Q n ∈ J ) Γ ℓn .We now consider the ‘tail case’, where I − A (and hence I − A ) is farfrom the mean (zero) of B . We split this case further according to how far.Assuming purely for convenience that λ is an integer, for each integer y > λ let E +2 ,y be the event that E occurs and the midpoint of I − A lies in( yσ, ( y + 1) σ ]. 
Similarly, let E − ,y be the event that E holds and the midpointof I − A lies in [ − ( y + 1) σ, − yσ ). Note that E = E ∪ [ y > λ E +2 ,y ∪ [ y > λ E − ,y , that this union is disjoint, and that all the events involved are F -measurable.Fix some y > λ and suppose that E +2 ,y holds. (The argument for E − ,y will be essentially identical, of course.) Because I has length at most σ , we19ee that the left-endpoint of I − A is at least ( y − ) σ > yσ/
2. Hence, byLemma 8, P ( Q n ∈ I | F ) = P ( B ∈ I − A | F ) C ℓσ e − cy , (32)after increasing C and decreasing c if necessary. Let J y = J − yσ , an intervalof length m containing I − yσ . Note that (for a given y ) the interval J y is deterministic. Now, when E +2 ,y holds, J y − A is an interval of length m contained (recalling m σ ) in [ − σ, σ ]. By Lemma 7 it follows that P ( Q n ∈ J y | F ) = P ( B ∈ J y − A | F ) = Θ( mσ ) − O (1 / √ s ) . The implicit constants here depend on c , but not on C ′ . Recalling ourassumption m > C ′ r , we may thus choose C ′ large enough to ensure that the O (1 / √ s ) error term is at most half the main term Θ( m/σ ) = Θ( m/ ( r √ s )),so P ( Q n ∈ J y | F ) = Ω( mσ ) . (33)Combining (32) and (33), we see that when E +2 ,y holds, then P ( Q n ∈ I | F ) P ( Q n ∈ J y | F ) O (cid:0) ℓm e − cy (cid:1) . (34)Since E +2 ,y is F measurable, it follows that P (cid:0) { Q n ∈ I } ∩ E +2 ,y (cid:1) O (cid:0) ℓm e − cy (cid:1) P (cid:0) { Q n ∈ J y } ∩ E +2 ,y (cid:1) O (cid:0) ℓm e − cy (cid:1) P ( { Q n ∈ J y } ) O (cid:0) Γ ℓn e − cy (cid:1) , using property (ii) of the statement S ( n, m, ε, Γ) for the last step. A similarbound holds for E − ,y . Since P y > λ e − cy = O ( e − cλ ), summing we concludethat P (cid:0) { Q n ∈ I } ∩ ( E \ E ) (cid:1) = O (cid:0) Γ ℓn e − cλ (cid:1) . (35)Finally, recalling (29), P (cid:0) { Q n ∈ I } ∩ E c (cid:1) P ( E c ) e − cn/r . (36)From (31), (35), and (36) we conclude that P ( Q n ∈ I ) P ( Q n ∈ I ) + η Γ ℓn for some η that satisfies η = O (cid:16) η + e − cλ + n Γ ℓ e − cn/r (cid:17) = O (cid:18) e − cλ + λm √ rn + rℓ + nℓ e − cn/r (cid:19) , recalling for the last step that Γ > σ = √ rn bydefinition. This completes the proof of (27) and thus of the lemma.20emma 11 will do most of the work for us, but there is a snag. Inapplying it, we need to assume r ≪ ℓ . Since we must have r > ℓ = 1 with this method. The fundamentalproblem is using the Berry–Esseen Theorem (as in the proof of Lemma 9) totry to get a good bound on the probability that an integer-valued randomvariable is in an interval of length 1—since the assumptions of the theoremdo not distinguish between intervals such as [ k, k + 1] (which contains twointegers) and [ k − / , k + 1 /
2] (which contains one), we can’t hope for agood approximation in this way. The solution in this case was outlined nearthe start of the paper: for this part of the argument we work not with abinomial-like distribution, but with a binomial distribution. Then we cancalculate the relevant probabilities directly, avoiding the approximation inthe Berry–Esseen Theorem. This is captured in Lemma 13 below, whoseproof is a variant of the proof of Lemma 11. Before coming to this lemma,we give the decomposition result that we shall need.
Lemma 12.
There are constants c > 0 and n₀ such that for any n ≥ n₀ we may write Q_n = A + B where, for some σ-algebra F, we have that A is F-measurable, and with probability at least 1 − e^{−cn}, the conditional distribution of B given F is the binomial distribution Bin(⌈cn⌉, 2/3).

Of course, this lemma can be rephrased to say that there are independent random variables A and B, with B ∼ Bin(⌈cn⌉, 2/3), such that with probability at least 1 − e^{−cn} we have Q_n = A + B. We keep the wording above to strengthen the analogy to Lemmas 4 and 6.

Proof.
For n = 3, QuickSort needs either two comparisons (if the initial pivot happens to be the middle element) or, with probability 2/3, three comparisons. A simple variant of the proof of Lemma 4 shows that if c > 0 is small enough, then with probability at least 1 − e^{−cn} we may partially expand the execution tree of QuickSort run on a list of n elements so as to leave ⌈cn⌉ instances of QuickSort of size 3. We take B to be the number of comparisons in these instances minus 2⌈cn⌉.
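A quick numerical sanity check of this decomposition (ours, with the instance count s chosen arbitrarily): the comparison count of a size-3 instance minus 2 is a Bernoulli(2/3) variable, so summing over s independent instances gives the Bin(s, 2/3) variable B.

```python
import random
from collections import Counter

def q3():
    """Comparison count of randomized QuickSort on a 3-element list."""
    pivot = random.randrange(3)  # position of the uniformly chosen pivot
    # 2 comparisons against the pivot; one further comparison is needed in the
    # remaining size-2 sublist unless the pivot was the middle element.
    return 2 if pivot == 1 else 3

if __name__ == "__main__":
    s = 1000
    print(Counter(q3() for _ in range(10_000)))  # about 1/3 twos and 2/3 threes
    B = sum(q3() - 2 for _ in range(s))          # B ~ Bin(s, 2/3)
    print(B, 2 * s / 3)                          # B concentrates near 2s/3
```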
Lemma 13. There exist constants n₀, c and C such that the following holds. Let n ≥ n₀ be an integer and let m ≥ 1 and λ ≥ 1 be real numbers such that

    λm ≤ √n   and   λ ≤ c√n.     (37)

If S(n, m, ε, Γ) holds, then so does S(n, 1, ε′, Γ′) for any ε′ ≥ ε + Γη and Γ′ ≥ Γ(1 + η), where

    η = C ( e^{−cλ²} + λm/√n + n e^{−cn} ).     (38)

Proof. We shall show that whenever x₁ and x₂ are integers with |x₁ − x₂| ≤ m, then

    P(Q_n = x₁) − P(Q_n = x₂) ≤ ηΓ/n.     (39)

The result then follows roughly as in the proof of Lemma 11; since there is a small twist to deal with non-integer m, we briefly outline the argument. First, to establish property (ii) of S(n, 1, ε′, Γ′), let x be any integer, and pick an interval J ⊂ R of length m containing x, with ⌈m⌉ integers in J. By averaging, there is some x₂ ∈ J such that P(Q_n = x₂) ≤ P(Q_n ∈ J)/⌈m⌉ ≤ P(Q_n ∈ J)/m ≤ Γ/n, using the assumption S(n, m, ε, Γ) in the last step. Applying (39) with x₁ = x gives P(Q_n = x) ≤ Γ′/n, as required. For property (i), using the same J and the bound P(Q_n ∈ J) ≤ (f(y) + ε) m/n, where y = (x − q_n)/n, gives P(Q_n = x) ≤ (f(y) + ε′)/n. For the lower bound, consider an interval J′ of length m containing x, with ⌊m⌋ integers in J′. Then find x₂ ∈ J′ with P(Q_n = x₂) ≥ P(Q_n ∈ J′)/m and apply (39) with x₁ = x.

It remains to prove (39). We follow the proof of Lemma 11, but replacing the distribution of class B_{r,s} by a binomial distribution Bin(s, p) where s = Θ(n) and p = 2/
3. (Any p bounded away from 0 and 1 would work.) Theexistence of the relevant decomposition is given by Lemma 12; let s = ⌈ cn ⌉ where c is as in that lemma, and let E be the event that the conditionaldistribution of B is indeed Bin( s, / P ( E ) > − e − cn .Suppose that E occurs. Then for 0 k s − P ( B = k + 1 | F ) P ( B = k | F ) = (cid:0) sk +1 (cid:1)(cid:0) sk (cid:1) p − p = s − kk + 1 p − p = 1 − ( k/s )1 − p p ( k/s ) + (1 /s ) . (40)If s/ k s/ P ( B = k + 1 | F ) P ( B = k | F ) = 1 + O (cid:0) | ( k/s ) − p | + (1 /s ) (cid:1) . Recalling that s = Θ( n ), when k is within 2 λ √ n cn/ s/
10 of ps thisgives P ( B = k + 1 | F ) / P ( B = k | F ) = 1 + O ( λ/ √ n ) (uniformly in such k ).It follows that if k and k satisfy | k i − ps | λ √ n and | k − k | m , then P ( B = k | F ) = P ( B = k | F ) (cid:0) O ( λm/ √ n ) (cid:1) . (41)Let E be the event that E occurs and x − A is within λ √ n of the mean ps of B . Since m √ n/λ λ √ n , this implies that | ( x − A ) − ps | λ √ n ,say. Using (41) we see that when E holds, then P ( Q n = x | F ) = P ( Q n = x | F ) (cid:0) O ( λm/ √ n ) (cid:1) P (cid:0) { Q n = x } ∩ E (cid:1) − P (cid:0) { Q n = x } ∩ E (cid:1) = O (cid:0) ( λm/ √ n )Γ /n (cid:1) . (42)As in the proof of Lemma 11, for integer y > λ let E +2 ,y be the event that E occurs and x − A lies in ( ps + yσ, ps + ( y + 1) σ ], where now σ = √ n .By an elementary calculation [using (40) as a starting point, for example],letting k = x − A and recalling that s = Θ( n ), there exist positive constants c ′ and c such that whenever E +2 ,y holds then P ( B = x − A | F ) = P (Bin( s, p ) = k ) P (Bin( s, p ) = ⌊ ps ⌋ ) e − c ′ y σ /s Cn − / e − cy . As before (compare the definition of J y in the proof of Lemma 11), let J y bean interval of length m containing x − yσ . Then, when E +2 ,y holds, J y − A is an interval of length m contained in [ ps − σ, ps + 2 σ ], say, and it followsusing elementary properties of the binomial distribution that P ( Q n ∈ J y | F ) = P (Bin( s, p ) ∈ J y − A | F ) = Θ( m/σ ) = Θ( m/ √ n ) . It follows that when E +2 ,y holds, then P ( Q n = x | F ) P ( Q n ∈ J y | F ) O (cid:0) m e − cy (cid:1) , (43)the analogue of (34). The rest of the proof follows exactly that of Lemma 11;we omit the details, noting only that the error terms arise as follows: e − cλ from (43) [just as from (34)], λm/ √ n from (42), and ne − cn from the prob-ability that E fails (recalling that our error terms are written relative toΓ /n ).Note that in Lemma 13, there is no error term corresponding to the r/ℓ term in Lemma 11, which can be traced back to the approximation errorfrom applying the Berry–Esseen Theorem in Lemma 9. In this section we prove Theorem 1 using Lemmas 11 and 13, together withthe following result of Fill and Janson [5, 6].23 heorem 14.
Let F_n denote the distribution function of (Q_n − q_n)/n and f the continuous density function of the limiting distribution Q. There is a constant C such that for any x and any n ≥ 1 we have

    | [ F_n(x + δ_n/2) − F_n(x − δ_n/2) ] / δ_n − f(x) | ≤ C n^{−1/4},     (44)

where δ_n = 2C n^{−1/4}. Furthermore, f is differentiable on R, and we have

    |f(x)| ≤ 16   and   |f′(x)| ≤ C̃ := 2466     (45)

for all x ∈ R.

The first (main) statement is part of [6, Theorem 6.1]; the second statement is from [5, Theorem 3.3]. Of course, the particular values of the constants will not be relevant here.

Rephrased, (44) says that for any (half-open, as usual) interval I of length m = δ_n n = 2C n^{3/4}, we have

    | P(Q_n ∈ I) − (m/n) f(x_I) | ≤ C n^{−1/4} m/n,     (46)

where x_I is such that q_n + nx_I is the midpoint of I. This is almost, but not quite, condition (i) of the statement S(n, m, ε, Γ) defined before Lemma 11.
Corollary 15.
Let C and C̃ be as in Theorem 14, and set C′ = C + C̃C. If n is large enough then S(n, m, ε, Γ) holds with m = 2C n^{3/4}, ε = C′ n^{−1/4}, and Γ = 17.

Proof.
Let I be any (half-open, as always) interval of length m . To establishproperty (i) of S ( n, m, ε, Γ) we must show that | P ( Q n ∈ I ) − mn f ( x ) | C ′ n − / mn (47)for any x such that q n + nx ∈ I . Let x I be such that q n + nx I is the midpointof I . Then | x − x I | = | ( q n + nx ) − ( q n + nx I ) | n m n = Cn − / . By the Mean Value Theorem and the second bound in (45) we have | f ( x ) − f ( x I ) | e C | x − x I | e CCn − / . This, (46) and the triangle inequality imply (47).24o establish statement (ii) of S ( n, m, ε, Γ) simply note that by (47) and(45) we have P ( Q n ∈ I ) mn f ( x ) + C ′ n − / mn (16 + C ′ n − / ) mn mn , if n is large enough.We are now ready to prove Theorem 1. Proof of Theorem 1.
We will show that the theorem holds with the error term O(n^{−1−ε}) replaced by O(n^{−1−(1/12)} log n). To do this, it suffices to establish that S(n, 1, ε′, Γ′) holds for some ε′ = O(n^{−1/12} log n); condition (i) of this statement is exactly (1). We establish this by using K := ⌊(log₂ n)/2⌋ + 1 rounds of smoothing, as we now explain in some detail.
In round k, for 1 ≤ k ≤ K − 1, we will apply Lemma 11 with parameters

    (m, ε, Γ, ℓ, r, λ) = (m_k, ε_k, Γ_k, ℓ_k, r_k, log n),

to be specified in a moment. Let the constants C and C′ be as in Corollary 15. We set

    m_k := 4C n^{3/4} 2^{−k},   1 ≤ k ≤ K,
    ℓ_k := m_{k+1} = 2C n^{3/4} 2^{−k},   1 ≤ k ≤ K − 1,

and

    r_k := (m_k ℓ_k)^{2/3} n^{−1/3} = Θ( n^{2/3} 2^{−4k/3} ),   1 ≤ k ≤ K − 1.

Furthermore, we set ε_1 := C′ n^{−1/4} and Γ_1 := 17, and

    η_k := Ĉ 2^{−k/3} n^{−1/12} log n,   1 ≤ k ≤ K − 1,

for a constant Ĉ to be chosen in a moment. Then we inductively define

    ε_k := ε_{k−1} + Γ_{k−1} η_{k−1},   2 ≤ k ≤ K,   and   Γ_k := Γ_{k−1} (1 + η_{k−1}),   2 ≤ k ≤ K.

Note that ℓ_k divides m_k, and the conditions (24) of Lemma 11 are easily seen to hold for n large enough. Moreover, we have

    λ m_k / √(r_k n) = Θ( 2^{−k/3} n^{−1/12} log n )   and   r_k / ℓ_k = Θ( 2^{−k/3} n^{−1/12} ).

(Indeed, given m_k and ℓ_k, we have chosen r_k to balance these terms, ignoring the slowly varying factor λ = log n.) Since the outer two terms in (25) are superpolynomially small, we see that if Ĉ is chosen suitably large, then in each application of Lemma 11 we have η ≤ η_k. Since the statement S(n, m_1, ε_1, Γ_1) holds by Corollary 15, applying Lemma 11 K − 1 times we find that S(n, m_K, ε_K, Γ_K) holds.

Note that m_K = Θ(n^{3/4} 2^{−K}) = Θ(n^{1/4}). In the final round we apply Lemma 13 with

    (m, ε, Γ, λ) = (m_K, ε_K, Γ_K, log n).

The conditions (37) hold with room to spare (for n large enough). The quantity η appearing in (38) is O(n^{−1/4} log n) = o(n^{−1/12}), so we conclude that S(n, 1, ε′, Γ′) holds, with Γ′ = O(Γ_K) and ε′ = ε_K + o(n^{−1/12}) Γ_K.

It remains to estimate Γ_K and ε_K. First,

    Γ_K = Γ_1 ∏_{j=1}^{K−1} (1 + η_j) ≤ Γ_1 exp( Σ_{j=1}^{K−1} η_j ) ≤ Γ_1 e^{O(η_1)} ∼ Γ_1

as n → ∞, so (for n large enough) Γ_k ≤ 18 for all k. Then

    ε_K ≤ ε_1 + 18 Σ_{j=1}^{K−1} η_j = ε_1 + O(η_1) = O(n^{−1/12} log n).

It follows that ε′ = O(n^{−1/12} log n), completing the proof.
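The bookkeeping in the proof above is easy to tabulate. The following sketch (ours) iterates the recursion for ε_k and Γ_k with the parameter choices just described; the constants C and Ĉ are placeholders rather than the actual constants from Corollary 15 and Lemma 11, and the enormous values of n simply reflect that the bounds are asymptotic.

```python
import math

def smoothing_cascade(n, C=1.0, C_hat=1.0, Gamma1=17.0):
    """Track (eps_k, Gamma_k) over K - 1 rounds of smoothing, with
    eta_k = C_hat * 2**(-k/3) * n**(-1/12) * log n as in the proof sketched
    above (C and C_hat are illustrative placeholders)."""
    K = int(math.log2(n) / 2) + 1
    eps, Gamma = C * n ** (-0.25), Gamma1      # starting point from Corollary 15
    for k in range(1, K):
        eta = C_hat * 2 ** (-k / 3) * n ** (-1 / 12) * math.log(n)
        eps, Gamma = eps + Gamma * eta, Gamma * (1 + eta)
    return eps, Gamma

if __name__ == "__main__":
    # The ratio in the third column stays bounded as n grows, illustrating
    # eps_K = O(n^{-1/12} log n), while Gamma_K approaches Gamma1 = 17.
    for n in (1e30, 1e60, 1e120):
        eps_K, Gamma_K = smoothing_cascade(n)
        print(n, eps_K, eps_K / (n ** (-1 / 12) * math.log(n)), Gamma_K)
```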
We describe here an argument for a weaker version of Theorem 1. The advantage of this argument is that it requires less as input: only a distributional limit theorem, not one with the explicit rate of convergence in Theorem 14. This may be useful in other contexts.
Theorem 16.
Uniformly in integers x we have P ( Q n = x ) = n − f (( x − q n ) /n ) + o ( n − ) as n → ∞ , where f is the continuous probability density function of Q . roof. We take as our starting point that ( Q n − q n ) /n d → Q , making noassumption about the rate of convergence. We do assume certain properties,immediate from [5, Theorem 3.3], of the density function f of Q , namelythat f is bounded (by M , say) and uniformly continuous on R . The onlyother properties of Q n we use are the decompositions provided by Lemmas 6and 12. This is all we need to prove Lemmas 11 and 13 exactly as above.The difference to the argument in Section 5 is how we apply these lemmas.Let F n be the distribution function of the normalized distribution Q ∗ n =( Q n − q n ) /n , and let F be that of Q . Since Q ∗ n d → Q and F is continuous,we have F n → F in sup-norm by Poly`a’s theorem (for example, [3, Exer-cise 4.3.4]). In other words, there is some δ ( n ) → x and n we have | F n ( x ) − F ( x ) | δ ( n ) . (48)Let γ ( n ) := sup x,y : | x − y | δ ( n ) / | f ( x ) − f ( y ) | . (49)Since δ ( n ) / → f is uniformly continuous, γ ( n ) → n → ∞ .For any interval I of length | I | = δ ( n ) / , by (48) we have that P ( Q ∗ n ∈ I )is within 2 δ ( n ) of R I f ( x )d x which, by (49), is within γ ( n ) | I | of f ( x ) | I | for any x ∈ I . Thus, P ( Q ∗ n ∈ I ) is within [2 δ ( n ) / + γ ( n )] | I | of f ( x ) | I | .Replacing Q ∗ n by Q n and I by q n + nI , this says exactly that property (i)of S ( n, m , ε , Γ ) holds, where m = m ( n ) = nδ ( n ) / and ε = ε ( n ) =2 δ ( n ) / + γ ( n ). Taking Γ = M + 1, which is at least M + ε ( n ) for n largeenough, we also have property (ii).To summarize, the distributional limit theorem (plus assumptions on f )gives us that there exist m = o ( n ), ε = o (1) and Γ = O (1) such that S ( n, m , ε , Γ ) holds. We now aim to apply Lemma 11 as many times asnecessary. The key point is that if S ( n, m, ε, Γ) holds where m = n/ω , with ω → ∞ , then in one step we can roughly square ω . More precisely, set ℓ = n/ω . , say. Then to satisfy (26) we can take say r = n/ω . , so m √ rn = ω − . and rℓ = ω − . . Choosing λ = log ω , say, the e − cλ term in (25) is superpolynomially smallin ω . The term ( n/ℓ ) e − cn/r is ω . e − cω . , which tends to zero extremelyquickly as ω grows. The conclusion is that the conditions of Lemma 11will hold, with η = O ( ω − . log ω ) = O ( ω − . ), say. Applying the lemmarepeatedly, in the i th application we have ω = ω i = ω . i ; we stop when ℓ
27s no more than n . , say (and hence, provided ω n . , which we maypresume without loss of generality, is at least n . ). Since P i ω − . i = o (1)(recalling that ω → ∞ ), the sum of the error terms η is o (1), and we findthat some S ( n, m, ε, Γ) holds with n . m n . , ε = o (1) and Γ = O (1).A single application of Lemma 13, say with λ = log n , yields S ( n, , ε ′ , Γ ′ )where also ε ′ = o (1), completing the proof. References [1] K. Azuma, Weighted sums of certain dependent random variables,
Tôhoku Math. J. (1967), 357–367.

[2] B. Bollobás and O. Riordan, Counting connected hypergraphs via the probabilistic method, Combin. Probab. Comput. (2016), 21–75.

[3] K. L. Chung, A Course in Probability Theory, second edition, Academic Press, New York, 1974.

[4] P. Diaconis and B. Hough, Random walk on unipotent matrix groups, preprint (2015); arXiv:1512.06304.

[5] J. Fill and S. Janson, Smoothness and decay properties of the limiting Quicksort density function, pages 53–64 in Mathematics and Computer Science: Algorithms, Trees, Combinatorics and Probabilities (eds.: D. Gardy and A. Mokkadem), a volume in the series Trends in Mathematics, Birkhäuser Verlag (2000).

[6] J. Fill and S. Janson, Quicksort asymptotics (with an unpublished appendix), Journal of Algorithms (2002), 4–28.

[7] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. (1963), 13–30.

[8] D. R. McDonald, On local limit theorem for integer valued random variables, Teor. Veroyatnost. i Primenen. (1979), 607–614; see also Theory Probab. Appl. (1980), 613–619.

[9] R. Neininger, Recursive random variables with subgaussian distributions, Statistics & Decisions (2005), 131–146.

[10] V. V. Petrov, Sums of Independent Random Variables, translated from the Russian by A. A. Brown.