QuickXsort – A Fast Sorting Scheme in Theory and Practice∗

Stefan Edelkamp · Armin Weiß · Sebastian Wild

November 6, 2018
Abstract.
QuickXsort is a highly efficient in-place sequential sorting scheme that mixes Hoare's Quicksort algorithm with X, where X can be chosen from a wide range of other known sorting algorithms, like Heapsort, Insertionsort and Mergesort. Its major advantage is that QuickXsort can be in-place even if X is not. In this work we provide general transfer theorems expressing the number of comparisons of QuickXsort in terms of the number of comparisons of X. More specifically, if pivots are chosen as medians of (not too fast) growing size samples, the average number of comparisons of QuickXsort and X differ only by o(n)-terms. For median-of-k pivot selection for some constant k, the difference is a linear term whose coefficient we compute precisely. For instance, median-of-three QuickMergesort uses at most n lg n − 0.8358n + O(log n) comparisons.

Furthermore, we examine the possibility of sorting base cases with some other algorithm using even fewer comparisons. By doing so, the average-case number of comparisons can be reduced down to n lg n − 1.4112n + o(n) for a remaining gap of only 0.0315n comparisons to the known lower bound (while using only O(log n) additional space and O(n log n) time overall).

Implementations of these sorting strategies show that the algorithms challenge well-established library implementations like Musser's Introsort.

Contents
1. Introduction
2. QuickXsort
3. Preliminaries
4. The QuickXsort recurrence
5. Analysis for growing sample sizes
6. Analysis for fixed sample sizes
7. Analysis of QuickMergesort and QuickHeapsort
8. Variance of QuickXsort
9. QuickMergesort with base cases
10. Experiments
11. Conclusion
A. Notation
   A.1. Generic mathematics
   A.2. Stochastics-related notation
   A.3. Specific notation for algorithms and analysis

∗ Parts of this article have been presented (in preliminary form) at the International Computer Science Symposium in Russia (CSR) 2014 [12] and at the International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA) 2018 [50].
1. Introduction
Sorting a sequence of n elements remains one of the most frequent tasks carried out by computers. In the comparison model, the well-known lower bound for sorting n distinct elements says that using fewer than lg(n!) = n lg n − lg e · n ± O(log n) ≈ n lg n − 1.4427n + O(log n) comparisons is not possible, both in the worst case and in the average case. The average case refers to a uniform distribution of all input permutations (random-permutation model). (We write lg for log₂, but use log to denote an otherwise unspecified logarithm in the O-notation.)

In many practical applications of sorting, element comparisons have a similar running-time cost as other operations (e.g., element moves or control-flow logic). Then, a method has to balance costs to be overall efficient. This explains why Quicksort is generally considered the fastest general-purpose sorting method, despite the fact that its number of comparisons is slightly higher than for other methods.

There are many other situations, however, where comparisons do have significant costs, in particular, when complex objects are sorted w.r.t. an order relation defined by a custom procedure. We are therefore interested in algorithms whose comparison count is optimal up to lower order terms, i.e., sorting methods that use n lg n + o(n log n) or, better, n lg n + O(n) comparisons; moreover, we are interested in bringing the coefficient of the linear term as close as possible to the optimal −1.4427, using practical methods whose running time is competitive to standard sorting methods even when comparisons are cheap. As a consequence, expected (rather than worst-case) performance is our main concern.

We propose QuickXsort as a general template for practical, comparison-efficient internal sorting methods. (Throughout the text, we avoid the, in our context somewhat ambiguous, terms in-place and in-situ. We instead call an algorithm internal if it needs at most O(log n) words of space in addition to the array to be sorted. In particular, Quicksort is an internal algorithm, whereas standard Mergesort is not, hence called external, since it uses a linear amount of buffer space for merges.) QuickXsort uses the recursive scheme of ordinary Quicksort, but instead of doing two recursive calls, it sorts one of the two segments produced by partitioning with the method X. The key insight is that X can use the second segment as a temporary buffer area; so X can be an external method, but the resulting QuickXsort is still an internal method. QuickXsort only requires O(1) words of extra space, even when X itself requires a linear-size buffer.

We discuss a few concrete candidates for X to illustrate the versatility of QuickXsort. We provide a precise analysis of QuickXsort in the form of "transfer theorems": we express the costs of QuickXsort in terms of the costs of X, where generally the use of QuickXsort adds a certain overhead to the lower order terms of the comparison counts. Unlike previous analyses for special cases, our results give tight bounds.

A particularly promising (and arguably the most natural) candidate for X is Mergesort. Mergesort is both fast in practice and comparison-optimal up to lower order terms; but its linear extra-space requirement can make its usage impossible. With QuickMergesort we describe an internal sorting algorithm that is competitive in terms of the number of comparisons and the running time.
Outline. The remainder of this section surveys previous work and summarizes the contributions of this article. We then describe QuickXsort in detail in Section 2. In Section 3, we introduce mathematical notation and recall known results that are used in our analysis of QuickXsort. In Section 4, we postulate the general recurrence for QuickXsort and describe the distribution of subproblem sizes. Section 5 contains transfer theorems for growing sample sizes and Section 6 for constant sample sizes. In Section 7, we apply these transfer theorems to QuickMergesort and QuickHeapsort and discuss the results. Section 8 contains a transfer theorem for the variance of QuickXsort. Finally, in Section 10 we present our experimental results and conclude in Section 11 with some open questions.
We pinpoint selected relevant works from the vast literature on sorting; our overview cannot be comprehensive, though.
Comparison-efficient sorting. There is a wide range of sorting algorithms achieving the bound of n lg n + O(n) comparisons. The most prominent is Mergesort, which additionally comes with a small coefficient in the linear term. Unfortunately, Mergesort requires linear extra space. Concerning space, UltimateHeapsort [28] does better, however, at the cost of a quite large linear term. Other algorithms provide even smaller linear terms than Mergesort. Table 1 lists some milestones in the race for reducing the coefficient in the linear term. Despite the fundamental nature of the problem, little improvement has been made (w.r.t. the worst-case comparison count) over Ford and Johnson's MergeInsertion algorithm [17], which was published in 1959! MergeInsertion requires n lg n − 1.32n + O(log n) comparisons in the worst case [31].

MergeInsertion has a severe drawback that renders the algorithm completely impractical, though: in a naive implementation, the number of element moves is quadratic in n.
Table 1: Milestones of comparison-efficient sorting methods. The methods use (at most) n lg n + bn + o(n) comparisons for the given b in the worst case (b_wc) and/or average case (b_ac); some entries also list an empirically measured b_ac. Space is given in machine words (unless indicated otherwise). The methods compared are: the lower bound, Mergesort [31], Insertionsort [31], MergeInsertion [31], MI+IS [27], BottomUpHeapsort [48], WeakHeapsort [7, 9], RelaxedWeakHeapsort [8], InPlaceMergesort [41], QuickHeapsort [3], improved QuickHeapsort [4], UltimateHeapsort [28], and, introduced in this paper, QuickMergesort, also with IS, MI, and MI+IS base cases.

≤ : only an upper bound is proven in the cited source. † : assuming InPlaceMergesort as a worst-case stopper; with median-of-medians fallback pivot selection: O(1); without a worst-case stopper: ω(1). ⊥ : using the given method for small subproblems; MI = MergeInsertion, IS = Insertionsort. ⋄ : using a rope data structure and allowing additional O(n) space gives O(n log n) time.

Its running time can be improved to O(n log n) by using a rope data structure [2] (or a similar data structure which allows random access and insertions in O(log n) time) for insertion of elements (which, of course, induces additional constant-factor overhead). The same is true for Insertionsort, which, unless explicitly indicated otherwise, refers to the algorithm that inserts elements successively into a sorted prefix by finding the insertion position by binary search, as opposed to linear/sequential search in
StraightInsertionsort. Note that MergeInsertion or Insertionsort can still be used as comparison-efficient subroutines to sort base cases for Mergesort (and QuickMergesort) of size O(log n) without affecting the overall running-time complexity of O(n log n).

Reinhardt [41] used this trick (and others) to design an internal Mergesort variant that needs n lg n − 1.3n ± O(log n) comparisons in the worst case. Unfortunately, implementations of this InPlaceMergesort algorithm have not been documented. Katajainen et al.'s [29, 19, 15] work inspired by Reinhardt is practical, but the number of comparisons is larger.

Improvements over MergeInsertion have been obtained for the average number of comparisons. A combination of MergeInsertion with a variant of Insertionsort (inserting two elements simultaneously) by Iwama and Teruyama uses at most n lg n − 1.4106n comparisons on average [27]; as for MergeInsertion, the overall complexity remains quadratic (resp. Θ(n log n)), though. Notice that the analysis in [27] is based on our bound on MergeInsertion in Section 9.2.

Previous work on QuickXsort.
Cantone and Cincotti [3] were the first to explicitly name the mixture of Quicksort with another sorting method; they proposed QuickHeapsort. However, the concept of QuickXsort (without calling it like that) was first used in UltimateHeapsort by Katajainen [28]. Both versions use an external Heapsort variant in which a heap containing m elements is not stored compactly in the first m cells of the array, but may be spread out over the whole array. This makes it possible to restore the heap property with ⌈lg n⌉ comparisons after extracting some element, by introducing a new gap (we can think of it as an element of infinite weight) and letting it sink down to the bottom of the heap. The extracted elements are stored in an output buffer.

In UltimateHeapsort, we first find the exact median of the array (using a linear-time algorithm) and then partition the array into subarrays of equal size; this ensures that with the above external Heapsort variant, the first half of the array (on which the heap is built) does not contain gaps (Katajainen calls this a two-level heap); the other half of the array is used as the output buffer. QuickHeapsort avoids the significant additional effort for exact median computations by choosing the pivot as median of some smaller sample. In our terminology, it applies QuickXsort where X is external Heapsort. UltimateHeapsort is inferior to QuickHeapsort in terms of the average-case number of comparisons, although, unlike QuickHeapsort, it allows an n lg n + O(n) bound for the worst-case number of comparisons. Diekert and Weiß [4] analyzed QuickHeapsort more thoroughly and described some improvements requiring less than n lg n − 0.99n + o(n) comparisons on average (choosing the pivot as median of √n elements). However, both the original analysis of Cantone and Cincotti and the improved analysis could not give tight bounds for the average case of median-of-k QuickHeapsort.

In [15], Elmasry, Katajainen and Stenmark proposed InSituMergesort, following the same principle as UltimateHeapsort, but with Mergesort replacing ExternalHeapsort. Like UltimateHeapsort, InSituMergesort uses an expected-linear-time algorithm for the median computation.

In the conference paper [12], the first and second author introduced the name QuickXsort and first considered QuickMergesort as an application (including weaker forms of the results in Section 5 and Section 9 without proofs). In [50], the third author analyzed QuickMergesort with constant-size pivot sampling (see Section 6). A weaker upper bound for the median-of-3 case was also given by the first two authors in the preprint [14]. The present work is a full version of [12] and [50]; it unifies and strengthens these results (including all proofs), and it complements the theoretical findings with extensive running-time experiments.
Contributions. In this work, we introduce QuickXsort as a general template for transforming an external algorithm into an internal algorithm. As examples we consider QuickHeapsort and QuickMergesort. For the reader's convenience, we collect our results here (with references to the corresponding sections).

• If X is some sorting algorithm requiring x(n) = n lg n + bn ± o(n) comparisons in expectation and k(n) ∈ ω(1) ∩ o(n), then median-of-k(n) QuickXsort needs x(n) ± o(n) comparisons in the average case (Theorem 5.1).

• Under reasonable assumptions, sample sizes of √n are optimal among all polynomial-size samples.

• The probability that median-of-√n QuickXsort needs more than x_wc(n) + 6n comparisons decreases exponentially in √n (Proposition 5.5).

• We introduce median-of-medians fallback pivot selection (a trick similar to Introsort [39]) which guarantees n lg n + O(n) comparisons in the worst case while altering the average case only by o(n)-terms (Theorem 5.7).

• Let k be fixed and let X be a sorting method that needs a buffer of ⌊αn⌋ elements for some constant α ∈ [0, 1] to sort n elements and requires on average x(n) = n lg n + bn ± o(n) comparisons to do so. Then median-of-k QuickXsort needs c(n) = n lg n + (P(k, α) + b) · n ± o(n) comparisons on average, where P(k, α) is a constant depending on k and α (Theorem 6.1). We compute P(1, 1) (the case of QuickHeapsort and ping-pong QuickMergesort) and P(1, 1/2) (the case of QuickMergesort) explicitly.

• We compute the standard deviation of the number of comparisons of median-of-k QuickMergesort for some small values of k. For k = 3 and α = 1/2, the standard deviation is linear in n, with a coefficient that we determine (Section 8).

• When sorting small subarrays of size O(log n) in QuickMergesort with some sorting algorithm Z using z(n) = n lg n + (b ± ε)n + o(n) comparisons on average and other operations taking at most quadratic time, then QuickMergesort needs z(n) + o(n) comparisons on average (Corollary 9.2). In order to apply this result, we prove that
  – (binary) Insertionsort needs n lg n − (1.3863 ± 0.005)n + o(n) comparisons on average (Proposition 9.3);
  – (a simplified version of) MergeInsertion [18] needs at most n lg n − 1.3999n + o(n) comparisons on average (Theorem 9.5).
  Moreover, with Iwama and Teruyama's algorithm [27] this can be improved slightly to n lg n − 1.4112n + o(n) comparisons (Corollary 9.9).

• We run experiments confirming our theoretical (and heuristic) estimates for the average number of comparisons of QuickMergesort and its standard deviation, and verifying that the sublinear terms are indeed negligible (Section 10).

• From running-time studies comparing QuickMergesort with various other sorting methods, we conclude that our QuickMergesort implementation is among the fastest internal general-purpose sorting methods, both for the regime of cheap and of expensive comparisons (Section 10).

To simplify the arguments, in all our analyses we assume that all elements in the input are distinct. This is no severe restriction since duplicate elements can be handled well using fat-pivot partitioning (which excludes elements equal to the pivot from recursive calls and calls to X).
2. QuickXsort
In this section we give a more precise description of QuickXsort. Let X be a sorting method that requires buffer space for storing at most ⌊αn⌋ elements (for α ∈ [0, 1]) to sort n elements. The buffer may only be accessed by swaps, so that once X has finished its work, the buffer contains the same elements as before, albeit (in general) in a different order than before.

Figure 1: Schematic steps of QuickXsort ("sort by X", then "sort recursively"). The pictures show a sequence, where the vertical height corresponds to key values. We start with an unsorted sequence (top), and partition it around a pivot value (second from top). Then one part is sorted by X (second from bottom) using the other segment as buffer area (grey shaded area). Note that this in general permutes the elements there. Sorting is completed by applying the same procedure recursively to the buffer (bottom).
QuickXsort now works as follows: First, we choose a pivot element; typically we use the median of a random sample of the input. Next, we partition the array according to this pivot element, i.e., we rearrange the array so that all elements left of the pivot are less than or equal to, and all elements on the right are greater than or equal to, the pivot element. This results in two contiguous segments of J1 resp. J2 elements; we exclude the pivot here (since it will have reached its final position), so J1 + J2 = n − 1. Note that the (one-based) rank R of the pivot is random, and so are the segment sizes J1 and J2. We have R = J1 + 1 for the rank.

We then sort one segment by X using the other segment as a buffer. To guarantee a sufficiently large buffer for X when it sorts Jr (r = 1 or 2), we must make sure that J_{3−r} ≥ α·Jr. In case both segments could be sorted by X, we use the larger of the two. After one part of the array has been sorted with X, we move the pivot element to its correct position (right after/before the already sorted part) and recurse on the other segment of the array. The process is illustrated in Figure 1.

The main advantage of this procedure is that the part of the array that is not currently being sorted can be used as temporary buffer area for algorithm X. This yields fast internal variants of various external sorting algorithms such as Mergesort. We have to make sure, however, that the contents of the buffer are not lost. A simple sufficient condition is to require that X maintains a permutation of the elements in the input and buffer: whenever a data element should be moved to the external storage, it is swapped with the data element occupying that respective position in the buffer area. For Mergesort, using swaps in the merge (see Section 2.1) is sufficient. For other methods, we need further modifications.
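To make the scheme concrete, the following is a simplified Python sketch (ours, not the authors' implementation; all names are hypothetical). It partitions with a plain last-element pivot, always sorts the smaller segment with a swap-based top-down Mergesort as X (so the larger segment certainly provides enough buffer space), and then iterates on the remaining segment.

```python
def swap_merge(A, lo, mid, hi, b):
    # Merge the sorted runs A[lo:mid] and A[mid:hi]; A[b:b+(mid-lo)] serves as
    # scratch space, touched only via swaps, so its contents survive as a multiset.
    n1 = mid - lo
    for i in range(n1):                      # move the first run into the buffer
        A[lo + i], A[b + i] = A[b + i], A[lo + i]
    i1, i2, o = b, mid, lo
    while i1 < b + n1 and i2 < hi:
        if A[i1] <= A[i2]:
            A[o], A[i1] = A[i1], A[o]; i1 += 1
        else:
            A[o], A[i2] = A[i2], A[o]; i2 += 1
        o += 1
    while i1 < b + n1:                       # flush the rest of the first run
        A[o], A[i1] = A[i1], A[o]; i1 += 1; o += 1

def x_mergesort(A, lo, hi, b):
    # Top-down Mergesort of A[lo:hi], in the role of X, using A[b:...] as buffer.
    if hi - lo > 1:
        mid = (lo + hi) // 2
        x_mergesort(A, lo, mid, b)
        x_mergesort(A, mid, hi, b)
        swap_merge(A, lo, mid, hi, b)

def quick_x_sort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A)
    while hi - lo > 1:
        p = A[hi - 1]                        # plain Lomuto partition for brevity
        i = lo
        for j in range(lo, hi - 1):
            if A[j] <= p:
                A[i], A[j] = A[j], A[i]; i += 1
        A[i], A[hi - 1] = A[hi - 1], A[i]    # pivot reaches its final position i
        if i - lo <= hi - (i + 1):           # sort the smaller segment by X,
            x_mergesort(A, lo, i, i + 1)     # buffering inside the larger one,
            lo = i + 1                       # then iterate on the remainder
        else:
            x_mergesort(A, i + 1, hi, lo)
            hi = i
```

Note that the buffer requirement is satisfied automatically here: the merge only buffers the first half of a run, so a buffer of half the sorted segment (α = 1/2) suffices, and the untouched larger segment is always at least that big.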
Remark 2.1 (Avoiding unnecessary copying):
For some X, it is convenient to have the sorted sequence reside in the buffer area instead of the input area. We can avoid unnecessary swaps for such X by partitioning "in reverse order", i.e., so that large elements are left of the pivot and small elements right of the pivot.
Pivot sampling.
It is a standard strategy for Quicksort to choose pivots as the median of some sample. This optimization is also effective for QuickXsort and we will study its effect in detail. We assume that in each recursive call, we choose a sample of k elements, where k = 2t + 1, t ∈ ℕ, is an odd number. The sample can either be selected deterministically (e.g., some fixed positions) or at random. Usually for the analysis we do not need random selection; only if the algorithm X does not preserve the randomness of the buffer elements do we have to assume randomness (see Section 4). However, notice that in any case random selection might be beneficial as it protects against a potential adversary who provides a worst-case input permutation.

Unlike for Quicksort, in QuickXsort pivot selection contributes only a minor term to the overall running time (at least in the usual case that k ≪ n). The reason is that QuickXsort only makes a logarithmic number of partitioning rounds in expectation (while Quicksort always makes a linear number of partitioning rounds), since in expectation after each partitioning round a constant fraction of the input is excluded from further consideration (after sorting it with X). Therefore, we do not care about details of how pivots are selected, but simply assume that selecting the median of k elements needs s(k) = Θ(k) comparisons on average (e.g., using Quickselect [24]).

We consider both the case where k is a fixed constant and where k = k(n) is an increasing function of the (sub)problem size. Previous results in [4, 35] for Quicksort suggest that sample sizes k(n) = Θ(√n) are likely to be optimal asymptotically, but most of the relative savings for the expected case are already realized for k ≤ 10. It is quite natural to expect similar behavior in QuickXsort, and it will be one goal of this article to precisely quantify these statements.
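For illustration, the median of a sample can be found with expected Θ(k) comparisons by Quickselect; the following minimal sketch is ours (not taken from the paper's implementation):

```python
import random

def quickselect(s, r):
    # Return the element of rank r (0-based) of s, in expected linear time.
    s = list(s)
    lo, hi = 0, len(s)
    while True:
        p = s[random.randrange(lo, hi)]      # random pivot from current range
        lt = [x for x in s[lo:hi] if x < p]
        eq = [x for x in s[lo:hi] if x == p]
        gt = [x for x in s[lo:hi] if x > p]
        s[lo:hi] = lt + eq + gt              # three-way (fat-pivot) partition
        if r - lo < len(lt):
            hi = lo + len(lt)                # rank r lies among the smaller part
        elif r - lo < len(lt) + len(eq):
            return p                         # pivot has rank r
        else:
            lo = lo + len(lt) + len(eq)      # recurse into the larger part

def median_of_sample(sample):
    k = len(sample)                          # k = 2t + 1 is odd
    return quickselect(sample, k // 2)
```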
2.1. QuickMergesort

A natural candidate for X is Mergesort: it is comparison-optimal up to the linear term (and quite close to optimal in the linear term), and needs a Θ(n)-element buffer for practical implementations of merging.

Figure 2: Usual merging procedure where one of the two runs fits into the buffer (Step 1: swap; Step 2: merge).

Merging can be done in place using more advanced tricks (see, e.g., [19, 34]), but those tend not to be competitive in terms of running time with other sorting methods. By changing the global structure, a "pure" internal Mergesort variant [29] can be achieved using part of the input as a buffer (as in QuickMergesort), at the expense of occasionally having to merge runs of very different lengths.

Algorithm 1:
Simple merging procedure that uses the buffer only by swaps. We move the first run A[ℓ..m−1] into the buffer B[b..b+n1−1] and then merge it with the second run A[m..r] (still in the original array) into the empty slot left by the first run. By the time this first half is filled, we either have consumed enough of the second run to have space to grow the merged result, or the merging was trivial, i.e., all elements in the first run were smaller.

SimpleMergeBySwaps(A[ℓ..r], m, B[b..e])
  // Merges runs A[ℓ..m−1] and A[m..r] in place into A[ℓ..r], using scratch space B[b..e].
  n1 := m − ℓ; n2 := r − m + 1
  // Assumes A[ℓ..m−1] and A[m..r] are sorted, n1 ≤ n2 and n1 ≤ e − b + 1.
  for i = 0, ..., n1 − 1
    Swap(A[ℓ + i], B[b + i])
  end for
  i1 := b; i2 := m; o := ℓ
  while i1 < b + n1 and i2 ≤ r
    if B[i1] ≤ A[i2]
      Swap(A[o], B[i1]); o := o + 1; i1 := i1 + 1
    else
      Swap(A[o], A[i2]); o := o + 1; i2 := i2 + 1
    end if
  end while
  while i1 < b + n1
    Swap(A[o], B[i1]); o := o + 1; i1 := i1 + 1
  end while

Simple swap-based merge.
To be usable in QuickXsort, we use a swap-based merge procedure as given in Algorithm 1. Note that it suffices to move the smaller of the two runs to a buffer (see Figure 2); we use a symmetric version of Algorithm 1 when the second run is shorter. Using classical top-down or bottom-up Mergesort as described in any algorithms textbook (e.g., [46]), we thus get along with α = 1/2.

The code in Algorithm 1 illustrates that very simple adaptations suffice for QuickMergesort. This merge procedure leaves the merged result in the range previously occupied by the two input runs. This "in-place"-style interface comes at the price of copying one run.

"Ping-pong" merge. Copying one run can be avoided if we instead write the merge result into an output buffer (and leave it there). This saves element moves, but uses buffer space for all n elements, so we have α = 1 here. The Mergesort scaffold has to take care to correctly orchestrate the merges, using the two arrays alternatingly; this alternating pattern resembles the ping-pong game.

"Ping-pong" merge with smaller buffer. It is also possible to implement the "ping-pong" merge with α = 1/2. Indeed, the copying in Algorithm 1 can be avoided by sorting the first run with the "ping-pong" merge. This will automatically move it to the desired position in the buffer, and the merging can proceed as in Algorithm 1. Figure 3 illustrates this idea, which is easily realized with a recursive procedure. Our implementation of QuickMergesort uses this variant.
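A direct Python transcription of Algorithm 1 may be helpful (a sketch with our naming, keeping the same inclusive index conventions):

```python
def simple_merge_by_swaps(A, l, m, r, B, b):
    # Merge the sorted runs A[l..m-1] and A[m..r] (inclusive bounds) into
    # A[l..r], using B[b..] as scratch space. The buffer is touched only via
    # swaps, so B keeps the same elements (possibly permuted) when we are done.
    n1 = m - l
    for i in range(n1):                    # move the first run into the buffer
        A[l + i], B[b + i] = B[b + i], A[l + i]
    i1, i2, o = b, m, l
    while i1 < b + n1 and i2 <= r:
        if B[i1] <= A[i2]:
            A[o], B[i1] = B[i1], A[o]; i1 += 1
        else:
            A[o], A[i2] = A[i2], A[o]; i2 += 1
        o += 1
    while i1 < b + n1:                     # flush what is left of the first run
        A[o], B[i1] = B[i1], A[o]; i1 += 1; o += 1
```

For example, merging A = [1, 3, 5, 2, 4, 6] with l = 0, m = 3, r = 5 and a three-slot buffer leaves A sorted, with the buffer's contents preserved as a multiset.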
Figure 3: Mergesort with α = 1/2 using ping-pong merges (Step 1: ping-pong sort; Step 2: ping-pong sort; Step 3: merge).

Figure 4: Reinhardt's merging procedure that needs only buffer space for half of the smaller run. In the first step the two sequences are merged starting with the smallest elements until the empty space is filled. Then there is enough empty space to merge the sequences from the right into the final position.
Reinhardt's merge. A third, less obvious alternative was proposed by Reinhardt [41], which allows using an even smaller α for merges where input and buffer area form a contiguous region; see Figure 4. Assume we are given an array A with positions A[1, ..., t] being empty or containing dummy elements (to simplify the description, we assume the first case), and A[t+1, ..., t+ℓ] and A[t+ℓ+1, ..., t+ℓ+r] containing two sorted sequences. We wish to merge the two sequences into the space A[1, ..., ℓ+r] (so that A[ℓ+r+1, ..., t+ℓ+r] becomes empty). We require that r/2 ≤ t < r. First we start from the left, merging the two sequences into the empty space until there is no space left between the last element of the already merged part and the first element of the left sequence (first step in Figure 4). At this point, we know that at least t elements of the right sequence have been introduced into the merged part; so the positions t+ℓ+1 through ℓ+2t are empty now. Since ℓ+t+1 ≤ ℓ+r ≤ ℓ+2t, in particular, A[ℓ+r] is empty now and we can start merging the two sequences right-to-left into the now empty space (where the right-most element is moved to position A[ℓ+r]; see the second step in Figure 4).

In order to have a balanced merge, we need ℓ = r and so t ≥ (ℓ+r)/4. Therefore, when applying this method in QuickMergesort, we have α = 1/4.

Remark 2.2 (Even less buffer space?): Reinhardt goes even further: even with εn space, we can merge in linear time when ε is fixed, by moving one run whenever we run out of space. Even though no more comparisons are needed, this method is quickly dominated by the additional data movements when ε < 1/4, so we do not discuss it in this article.

Another approach for dealing with less buffer space is to allow imbalanced merges: for both Reinhardt's merge and the simple swap-based merge, we need only additional space for (half) the size of the smaller run. Hence, we can merge a short run into a long run with a relatively small buffer. The price of this method is that the number of comparisons increases, while the number of additional moves is better than with the previous method. We shed some more light on this approach in [10].

Avoiding Stack Space.
Mergesort uses a top-down recursiveformulation. It requires a stack of logarithmic height, which is usually deemed acceptablesince it is dwarfed by the buffer space for merging. Since
QuickMergesort removes theneed for the latter, one might prefer to also avoid the logarithmic stack space.An elementary solution is bottom-up
Mergesort , where we form pairs of runs andmerge them, except for, potentially, a lonely rightmost run. This variant occasionally mergestwo runs of very different sizes, which affects the overall performance (see Section 3.6).A simple (but less well-known) modification that we call boustrophedonic Mergesort allows us to get the best of both worlds [20]: instead of leaving a lonely rightmost rununmerged (and starting again at the beginning with the next round of merges), we start thenext merging round at the same end, moving backwards through the array. We hence beginby merging the lonely run, and so avoid ever having a two runs that differ by more than afactor of two in length. The logic for handling odd and even numbers of runs correctly ismore involved, but constant extra space can be achieved without a loss in the number ofcomparisons.
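The elementary bottom-up variant can be sketched as follows (our sketch; a plain buffered merge is used for brevity, and the boustrophedonic refinement would only change the order in which the rounds sweep over the runs):

```python
def merge(x, y):
    # Standard two-pointer merge of two sorted lists.
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i]); i += 1
        else:
            out.append(y[j]); j += 1
    return out + x[i:] + y[j:]

def bottom_up_mergesort(a):
    # Merge pairs of runs of doubling width; a lonely rightmost run is left
    # unmerged in a round, which occasionally causes merges of runs of very
    # different lengths (the issue discussed above).
    n, width = len(a), 1
    while width < n:
        for lo in range(0, n - width, 2 * width):
            mid, hi = lo + width, min(lo + 2 * width, n)
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])
        width *= 2
    return a
```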
(Boustrophedon: a type of bi-directional text seen in ancient manuscripts where lines alternate between left-to-right and right-to-left order; literally "turning like oxen in ploughing".)

2.2. QuickHeapsort

Another good option for X, and indeed the historically first one, is Heapsort.

Why Heapsort? In light of the fact that Heapsort is the only textbook method with reasonable overall performance that already sorts with constant extra space, this suggestion might be surprising: Heapsort rather appears to be the candidate least likely to profit from QuickXsort. Indeed, it is a refined variant of Heapsort that is an interesting candidate for X.

To work in place, standard Heapsort has to maintain the heap in a very rigid shape so as to store it in a contiguous region of the array. And this rigid structure comes at the price of extra comparisons. Standard Heapsort requires up to 2(h − 1) comparisons to extract the maximum from a heap of height h, for an overall 2n lg n ± O(n) comparisons in the worst case.

Comparisons can be saved by first finding the cascade of promotions (a.k.a. the special path), i.e., the path from the root to a leaf, always choosing the larger of the two children. Then, in a second step, we find the correct insertion position along this path of the element currently occupying the last position of the heap area. The standard procedure corresponds to sequential search from the root. Floyd's optimization (a.k.a. bottom-up Heapsort [48]) instead uses sequential search from the leaf. It has a substantially higher chance to succeed early (in the second phase), and is probably optimal in that respect for the average case. If a better worst case is desired, one can use binary search on the special path, or even more sophisticated methods [21].
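The two-phase extraction can be sketched in Python as follows (a minimal sketch of ours: descend along the special path with one comparison per level, then search the insertion position sequentially from the leaf):

```python
def sift_down_bottom_up(h, i, n):
    # Reinsert x = h[i] into the max-heap on h[..n-1] below position i:
    # first record the special path (always the larger child, one comparison
    # per level), then climb from the leaf to find x's insertion position.
    x = h[i]
    path, j = [], i
    while 2 * j + 1 < n:
        l = 2 * j + 1
        j = l + 1 if l + 1 < n and h[l + 1] > h[l] else l
        path.append(j)
    while path and h[path[-1]] < x:       # bottom-up search from the leaf
        path.pop()
    cur = i
    for p in path:                        # shift the winners up, insert x
        h[cur] = h[p]
        cur = p
    h[cur] = x

def bottom_up_heapsort(h):
    n = len(h)
    for i in range(n // 2 - 1, -1, -1):   # Floyd's heap construction
        sift_down_bottom_up(h, i, n)
    for end in range(n - 1, 0, -1):       # repeatedly extract the maximum
        h[0], h[end] = h[end], h[0]
        sift_down_bottom_up(h, 0, end)
    return h
```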
External Heapsort. In ExternalHeapsort , we avoid any such extra comparisons byrelaxing the heap’s shape. Extracted elements go to an output buffer and we only promotethe elements along the special path into the gap left by the maximum. This leaves a gap atthe leaf level, that we fill with a sentinel value smaller than any element’s value (in the caseof a max-heap).
ExternalHeapsort uses n lg n ± O(n) comparisons in the worst case, but requires a buffer to hold n elements. By using it as our X in QuickXsort, we can avoid the extra space requirement.

When using ExternalHeapsort as X, we cannot simply overwrite gaps with sentinel values, though: we have to keep the buffer elements intact! Fortunately, the buffer elements themselves happen to work as sentinel values. If we sort the segment of large elements with ExternalHeapsort, we swap the maximum from the heap with a buffer element, which automatically is smaller than any remaining heap element and will thus never be promoted as long as any actual elements remain in the heap. We know when to stop since we know the segment sizes; after that many extractions, the right segment is sorted and the heap area contains only buffer elements.

We use a symmetric variant (with a min-oriented heap) if the left segment shall be sorted by X. For detailed code for the above procedure, we refer to [3] or [4].
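The buffer-as-sentinel mechanism can be sketched as follows (a toy Python model under simplifying assumptions: the heap occupies its own list, the buffer supplies at least as many elements, and, matching the case where the right segment is sorted, every buffer element is smaller than every heap element):

```python
def build_max_heap(h):
    """Standard bottom-up construction of a binary max-heap."""
    n = len(h)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while 2 * j + 1 < n:
            c = 2 * j + 1
            if c + 1 < n and h[c + 1] > h[c]:
                c += 1
            if h[j] >= h[c]:
                break
            h[j], h[c] = h[c], h[j]
            j = c

def external_heapsort(heap, buffer):
    """Extract all actual elements of `heap` in decreasing order.
    Buffer elements (all smaller than every heap element) refill the
    heap; being smaller, they act as sentinels and are never promoted
    past an actual element, so no extra sentinel handling is needed."""
    m = len(heap)
    build_max_heap(heap)
    out = []
    for b in buffer[:m]:          # one buffer element per extraction
        out.append(heap[0])       # the maximum of the remaining elements
        gap = 0
        while 2 * gap + 1 < m:    # promote along the special path
            c = 2 * gap + 1
            if c + 1 < m and heap[c + 1] > heap[c]:
                c += 1            # one comparison per level
            heap[gap] = heap[c]
            gap = c
        heap[gap] = b             # the gap at the leaf takes the buffer element
    return out
```

Note that once several buffer elements have entered the heap, promotions may compare two buffer elements with each other; this is exactly the behavior that makes basic QuickHeapsort not randomness-preserving, as discussed later.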
Trading space for comparisons. Many options to further reduce the number of comparisons have been explored. Since these options demand extra space beyond an output buffer and cannot restore the contents of that extra space, using them in QuickXsort does not yield an internal sorting method, but we briefly mention these variants here.

One option is to remember the outcomes of sibling comparisons to avoid redundant comparisons in following steps [37]. In [4, Thm. 4], this is applied to QuickHeapsort together with some further improvements using extra space.

Another option is to modify the heap property itself. In a weak heap, the root of a subtree is only larger than one of its subtrees, and we use an extra bit to store (and modify) which one it is. The more liberal structure makes the construction of weak heaps more efficient: indeed, they can be constructed using n − 1 comparisons. WeakHeapsort has been introduced by Dutton [7] and applied to QuickWeakHeapsort in [8]. We introduced a refined version of ExternalWeakHeapsort in [12] that works by the same principle as ExternalHeapsort; more details on this algorithm, its application in QuickWeakHeapsort, and the relation to Mergesort can be found in our preprint [11]. Due to the additional bit-array, which is not only space-consuming but also costs time to access, WeakHeapsort and QuickWeakHeapsort are considerably slower than ordinary Heapsort, Mergesort, or Quicksort; see the experiments in [8, 12]. Therefore, we do not consider these variants here in more detail.
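Weak-heap construction with n − 1 comparisons can be made concrete with a minimal sketch (following Dutton's Join/distinguished-ancestor construction; `reverse` is the extra bit array mentioned above; illustrative code, not the authors' implementation): every position except the root is joined with its distinguished ancestor exactly once, at one comparison each, after which the maximum sits at the root.

```python
def weak_heapify(a):
    """Build a (max) weak heap in place with exactly n - 1 comparisons.
    Returns (reverse_bits, #comparisons); afterwards a[0] == max(a)."""
    n = len(a)
    reverse = [0] * n
    cmps = 0

    def d_ancestor(j):
        # climb while j is the child lying on its parent's "reversed" side
        while (j & 1) == reverse[j >> 1]:
            j >>= 1
        return j >> 1

    for j in range(n - 1, 0, -1):
        i = d_ancestor(j)
        cmps += 1                      # the single comparison of Join(i, j)
        if a[j] > a[i]:
            a[i], a[j] = a[j], a[i]
            reverse[j] ^= 1            # flip the subtree-orientation bit
    return reverse, cmps
```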
3. Preliminaries
In this section, we introduce some important notation and collect known results for reference. The reader who is only interested in the main results may skip this section.

Notation. We use Iverson's bracket [stmt] to mean 1 if stmt is true and 0 otherwise. P[E] denotes the probability of event E, and E[X] the expectation of the random variable X. We write X =D Y to denote equality in distribution.

With f(n) = g(n) ± h(n) we mean that |f(n) − g(n)| ≤ h(n) for all n, and we use similar notation f(n) = g(n) ± O(h(n)) to state asymptotic bounds on the difference |f(n) − g(n)| = O(h(n)). We remark that both use cases are examples of "one-way equalities" that are in common use for notational convenience, even though ⊆ instead of = would be formally more appropriate. Moreover, f(n) ∼ g(n) means f(n) = g(n) ± o(g(n)). Throughout, lg refers to the logarithm to base 2, while ln is the natural logarithm. Moreover, log is used for the logarithm with unspecified base (for use in O-notation). We write $a^{\underline{b}}$ (resp. $a^{\overline{b}}$) for the falling (resp. rising) factorial power a(a − 1) ⋯ (a − b + 1) (resp. a(a + 1) ⋯ (a + b − 1)).

Hölder-continuity. A function f : I → ℝ defined on a bounded interval I is Hölder-continuous with exponent η ∈ (0, 1] if
$$\exists C\; \forall x, y \in I :\; \bigl|f(x) - f(y)\bigr| \;\le\; C\,|x - y|^{\eta}.$$
Hölder-continuity is a notion of smoothness that is stricter than (uniform) continuity but slightly more liberal than Lipschitz-continuity (which corresponds to η = 1). f : [0, 1] → ℝ with f(z) = z ln(1/z) is a stereotypical function that is Hölder-continuous (for any η ∈ (0, 1)) but not Lipschitz-continuous.

Lemma 3.1 (Hölder integral bound):
Let f : [0, 1] → ℝ be Hölder-continuous with exponent η. Then
$$\int_0^1 f(x)\,dx \;=\; \frac{1}{n}\sum_{i=0}^{n-1} f(i/n) \;\pm\; O(n^{-\eta}), \qquad (n \to \infty).$$

Proof:
The proof is a simple computation. Let C be the Hölder constant of f. We split the integral into small integrals over intervals of width 1/n and use Hölder-continuity to bound the difference to the corresponding summand:
$$\left|\int_0^1 f(x)\,dx - \frac{1}{n}\sum_{i=0}^{n-1} f(i/n)\right| \;=\; \left|\sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} \bigl(f(x) - f(i/n)\bigr)\,dx\right| \;\le\; \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} \bigl|f(x) - f(i/n)\bigr|\,dx$$
$$\le\; \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} C\left|x - \frac{i}{n}\right|^{\eta} dx \;\le\; C \sum_{i=0}^{n-1} \int_{i/n}^{(i+1)/n} \Bigl(\frac{1}{n}\Bigr)^{\eta} dx \;=\; C n^{-\eta} \int_0^1 dx \;=\; O(n^{-\eta}). \qquad\Box$$

Remark 3.2 (Properties of Hölder-continuity):
We considered only the unit interval as the domain of our functions, but this is no restriction: Hölder-continuity (on bounded domains) is preserved by addition, subtraction, multiplication and composition (see, e.g., [47, Section 4.6] for details). Since any linear function is Lipschitz, the result above holds for Hölder-continuous functions f : [a, b] → ℝ.

If our functions are defined on a bounded domain, Lipschitz-continuity implies Hölder-continuity, and Hölder-continuity with exponent η implies Hölder-continuity with any exponent η′ < η. A real-valued function with bounded derivative is Lipschitz-continuous.

Chernoff bounds. We write X =D Bin(n, p) if X has a binomial distribution with n ∈ ℕ trials and success probability p ∈ [0, 1]. Since X is a sum of independent random variables with bounded influence on the result, Chernoff bounds imply strong concentration results for X. We will only need a very basic variant, given in the following lemma.

Lemma 3.3 (Chernoff Bound, Theorem 2.1 of [36]):
Let X =D Bin(n, p) and δ ≥ 0. Then
$$\mathbb{P}\left[\left|\frac{X}{n} - p\right| \ge \delta\right] \;\le\; 2\exp(-2\delta^2 n). \tag{1}$$
□

A consequence of this bound is that we can bound expectations of the form E[f(X/n)] by f(p) plus a small error term if f is "sufficiently smooth". Hölder-continuity (introduced above) is an example of such a criterion:

Lemma 3.4 (Expectation via Chernoff):
Let p ∈ (0, 1) and X =D Bin(n, p), and let f : [0, 1] → ℝ be a function that is bounded by |f(x)| ≤ A and Hölder-continuous with exponent η ∈ (0, 1] and constant C. Then it holds that
$$\mathbb{E}\left[f\Bigl(\frac{X}{n}\Bigr)\right] \;=\; f(p) \pm \rho,$$
where we have for any δ ≥ 0 that
$$\rho \;\le\; C\,\delta^{\eta}\,\bigl(1 - 2e^{-2\delta^2 n}\bigr) \;+\; 4A\,e^{-2\delta^2 n}.$$
For any fixed ε > (1 − η)/2, we obtain ρ = o(n^{−1/2+ε}) as n → ∞ for a suitable choice of δ.

Proof of Lemma 3.4:
By the Chernoff bound we have
$$\mathbb{P}\left[\left|\frac{X}{n} - p\right| \ge \delta\right] \;\le\; 2\exp(-2\delta^2 n). \tag{2}$$
To bound E[|f(X/n) − f(p)|], we divide the domain [0, 1] of X/n into the region of values with distance at most δ from p, and all others. This yields
$$\mathbb{E}\left[\left|f\Bigl(\frac{X}{n}\Bigr) - f(p)\right|\right] \;\overset{(2)}{\le}\; \sup_{\xi : |\xi| < \delta} \bigl|f(p+\xi) - f(p)\bigr| \cdot \bigl(1 - 2e^{-2\delta^2 n}\bigr) \;+\; \sup_{x} \bigl|f(x) - f(p)\bigr| \cdot 2e^{-2\delta^2 n} \;\le\; C\,\delta^{\eta}\,\bigl(1 - 2e^{-2\delta^2 n}\bigr) + 4A\,e^{-2\delta^2 n}.$$
This proves the first part of the claim.

For the second part, we assume a fixed ε > (1 − η)/2 is given, so we can write η = 1 − 2ε + 4β₁ for a constant β₁ > 0. We may further assume ε < 1/2; for larger values the claim is vacuous. We then choose δ = n^{−c} with c = (1/2 − ε + β₁)/η, so that β₂ := 1 − 2c = 2β₁/η > 0. For large n we thus have
$$\rho \cdot n^{1/2-\varepsilon} \;\le\; C\,\delta^{\eta}\, n^{1/2-\varepsilon}\bigl(1 - 2\exp(-2\delta^2 n)\bigr) + 4A\, n^{1/2-\varepsilon}\exp(-2\delta^2 n) \;=\; \underbrace{C n^{-\beta_1}}_{\to\, 0}\bigl(1 - 2\exp(-2 n^{\beta_2})\bigr) + 4A \exp\bigl(-2 n^{\beta_2} + (\tfrac12 - \varepsilon)\ln n\bigr) \;\to\; 0$$
as n → ∞, which implies the claim. □

The analysis in Section 6 makes frequent use of the beta distribution:
For λ, ρ ∈ ℝ>0, we write X =D Beta(λ, ρ) if X admits the density $f_X(z) = z^{\lambda-1}(1-z)^{\rho-1} / \mathrm{B}(\lambda, \rho)$, where $\mathrm{B}(\lambda, \rho) = \int_0^1 z^{\lambda-1}(1-z)^{\rho-1}\,dz$ is the beta function. It is a standard fact that for λ, ρ ∈ ℕ≥1 we have
$$\mathrm{B}(\lambda, \rho) \;=\; \frac{(\lambda-1)!\,(\rho-1)!}{(\lambda+\rho-1)!}. \tag{3}$$
For λ, ρ > 0 we also use the regularized incomplete beta function
$$I_{x,y}(\lambda, \rho) \;=\; \int_x^y \frac{z^{\lambda-1}(1-z)^{\rho-1}}{\mathrm{B}(\lambda, \rho)}\,dz, \qquad (\lambda, \rho \in \mathbb{R}_+,\; 0 \le x \le y \le 1). \tag{4}$$
Clearly I_{0,1}(λ, ρ) = 1.

Let us denote by h the function h : [0, 1] → ℝ≥0 with h(x) = −x lg x. We have for a beta-distributed random variable X =D Beta(λ, ρ) with λ, ρ ∈ ℕ≥1 that
$$\mathbb{E}[h(X)] \;=\; \frac{\lambda}{(\lambda+\rho)\ln 2}\,\bigl(H_{\lambda+\rho} - H_{\lambda}\bigr). \tag{5}$$
This follows directly from a well-known closed form of a "logarithmic beta integral" (see, e.g., [49, Eq. (2.30)]):
$$\int_0^1 \ln(z)\cdot z^{\lambda-1}(1-z)^{\rho-1}\,dz \;=\; \mathrm{B}(\lambda, \rho)\,\bigl(H_{\lambda-1} - H_{\lambda+\rho-1}\bigr).$$
We will make use of the following elementary properties of h later (towards applying Lemma 3.4).
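Equation (5) can be cross-checked numerically (a sketch: `expected_h_closed` transcribes the closed form stated above, while the expectation itself is evaluated by a simple midpoint-rule integration against the beta density, with B(λ, ρ) computed via Equation (3)):

```python
import math

def harmonic(m):
    """The m-th harmonic number H_m."""
    return sum(1.0 / i for i in range(1, m + 1))

def beta_fn(lam, rho):
    """B(lam, rho) for integer parameters, via Equation (3)."""
    return (math.factorial(lam - 1) * math.factorial(rho - 1)
            / math.factorial(lam + rho - 1))

def expected_h_numeric(lam, rho, steps=100000):
    """Midpoint-rule integration of E[-X lg X] for X ~ Beta(lam, rho)."""
    b = beta_fn(lam, rho)
    acc = 0.0
    for s in range(steps):
        z = (s + 0.5) / steps
        acc += (-z * math.log2(z)) * z**(lam - 1) * (1 - z)**(rho - 1)
    return acc / steps / b

def expected_h_closed(lam, rho):
    """The closed form (5): lam/((lam+rho) ln 2) * (H_{lam+rho} - H_lam)."""
    return (lam / ((lam + rho) * math.log(2))
            * (harmonic(lam + rho) - harmonic(lam)))
```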
Lemma 3.5 (Elementary Properties of h): Let h : [0, 1] → ℝ≥0 with h(x) = −x lg(x).
(a) h is bounded by 0 ≤ h(x) ≤ lg(e)/e ≈ 0.531 for x ∈ [0, 1].
(b) g(x) := −x ln x = ln(2) · h(x) is Hölder-continuous on [0, 1] for any exponent η ∈ (0, 1), i.e., there is a constant C = C_η such that |g(y) − g(x)| ≤ C_η |y − x|^η for all x, y ∈ [0, 1]. A possible choice for C_η is given by
$$C_\eta \;=\; \left(\int_0^1 \bigl|\ln(t) + 1\bigr|^{\frac{1}{1-\eta}}\, dt\right)^{1-\eta}. \tag{6}$$
For example, η = 0.99 yields C_η ≈ 37.6. □

A detailed proof of the second claim appears in [49, Lemma 2.13]. Hence, h is sufficiently smooth to be used in Lemma 3.4.

Beta-binomial distribution. Moreover, we use the beta-binomial distribution, which is a conditional binomial distribution with the success probability being a beta-distributed random variable. If X =D BetaBin(n, λ, ρ), then
$$\mathbb{P}[X = i] \;=\; \binom{n}{i}\,\frac{\mathrm{B}(\lambda + i,\; \rho + (n - i))}{\mathrm{B}(\lambda, \rho)}.$$
Beta-binomial distributions are precisely the distribution of subproblem sizes after partitioning in
Quicksort. We detail this in Section 4.3.

A property that we repeatedly use here is a local limit law showing that the normalized beta-binomial distribution converges to the beta distribution. Using Chernoff bounds after conditioning on the beta-distributed success probability shows that BetaBin(n, λ, ρ)/n converges to Beta(λ, ρ) (in a specific sense); but we obtain stronger error bounds for fixed λ and ρ by directly comparing the probability density functions (PDFs). This yields the following result (a detailed proof appears in [49, Lemma 2.38]).

Lemma 3.6 (Local Limit Law for Beta-Binomial, [49]):
Let (I^{(n)})_{n ∈ ℕ≥1} be a family of random variables with beta-binomial distribution, I^{(n)} =D BetaBin(n, λ, ρ) where λ, ρ ∈ {1} ∪ ℝ≥2, and let f_B(z) = z^{λ−1}(1 − z)^{ρ−1} / B(λ, ρ) be the density of the Beta(λ, ρ) distribution. Then we have, uniformly in z ∈ (0, 1), that
$$n \cdot \mathbb{P}\bigl[I^{(n)} = \lfloor z(n+1) \rfloor\bigr] \;=\; f_B(z) \pm O(n^{-1}), \qquad (n \to \infty).$$
That is, I^{(n)}/n converges to Beta(λ, ρ) in distribution, and the probability weights converge uniformly to the limiting density at rate O(n^{−1}).

Continuous master theorem. For solving recurrences, we build upon Roura's master theorems [43]. The relevant continuous master theorem is restated here for convenience:

Theorem 3.7 (Roura's Continuous Master Theorem (CMT)):
Let F_n be recursively defined by
$$F_n \;=\; \begin{cases} b_n, & \text{for } 0 \le n < N; \\[2pt] t_n + \displaystyle\sum_{j=0}^{n-1} w_{n,j}\, F_j, & \text{for } n \ge N, \end{cases} \tag{7}$$
where t_n, the toll function, satisfies t_n ∼ K n^σ log^τ(n) as n → ∞ for constants K ≠ 0, σ ≥ 0 and τ > −1. Assume there exists a function w : [0, 1] → ℝ≥0, the shape function, with $\int_0^1 w(z)\,dz \ge 1$ and
$$\sum_{j=0}^{n-1} \left| w_{n,j} - \int_{j/n}^{(j+1)/n} w(z)\,dz \right| \;=\; O(n^{-d}), \qquad (n \to \infty), \tag{8}$$
for a constant d > 0. With $H := 1 - \int_0^1 z^{\sigma} w(z)\,dz$, we have the following cases:
1. If H > 0, then F_n ∼ t_n / H.
2. If H = 0, then F_n ∼ (t_n ln n) / Ĥ with $\hat H = -(\tau+1)\int_0^1 z^{\sigma}\ln(z)\, w(z)\,dz$.
3. If H < 0, then F_n = O(n^c) for the unique c ∈ ℝ with $\int_0^1 z^{c} w(z)\,dz = 1$. □

Theorem 3.7 is the "reduced form" of the CMT, which appears as Theorem 1.3.2 in Roura's doctoral thesis [42], and as Theorem 18 of [35]. The full version (Theorem 3.3 in [43]) also allows us to handle sublogarithmic factors in the toll function, which we do not need here.
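To illustrate how the CMT is applied (an illustrative example, not taken from the paper): the standard Quicksort recurrence F_n = (n − 1) + (2/n) Σ_{j<n} F_j has toll t_n ∼ n (σ = 1, τ = 0) and shape function w(z) = 2; then H = 1 − ∫₀¹ 2z dz = 0, so case 2 applies with Ĥ = −∫₀¹ 2z ln z dz = 1/2, predicting F_n ∼ 2n ln n, the familiar Quicksort constant. A direct evaluation of the recurrence shows the (slow) convergence toward this prediction:

```python
import math

def exact_F(nmax):
    """Exactly evaluate F_n = (n - 1) + (2/n) * sum_{j<n} F_j with F_0 = 0."""
    F = [0.0] * (nmax + 1)
    prefix = 0.0                      # running sum of F_0 .. F_{n-1}
    for n in range(1, nmax + 1):
        F[n] = (n - 1) + 2.0 * prefix / n
        prefix += F[n]
    return F

F = exact_F(20000)

def ratio(n):
    """F_n / (2 n ln n); the CMT (case 2) predicts this tends to 1."""
    return F[n] / (2.0 * n * math.log(n))
```

The ratio approaches 1 only at rate Θ(1/ln n), since the true solution is 2n ln n − Θ(n).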
Mergesort. We recapitulate some known facts about standard Mergesort. The average number of comparisons for Mergesort has the same – optimal – leading term n lg n in the worst and best case; this is true for both the top-down and the bottom-up variant. The coefficient of the linear term of the asymptotic expansion, though, is not a constant, but a bounded periodic function with period lg n, and the functions differ for best, worst, and average case and for the variants of Mergesort [45, 16, 40, 25, 26].

For this paper, we confine ourselves to upper and lower bounds for the average case of the form x(n) = an lg n + bn ± O(n^{1−ε}) with constant b valid for all n. Setting b to the infimum resp. supremum of the periodic function, we obtain the following lower resp. upper bounds for top-down [26] and bottom-up [40] Mergesort:
$$x_{\mathrm{td}}(n) \;=\; n \lg n - \begin{cases} 1.2645\,n \\ 1.2408\,n \end{cases} + 2 \pm O(n^{-1}) \;=\; n \lg n - (1.2526 \pm 0.0119)\,n + 2 \pm O(n^{-1}) \tag{9}$$
and
$$x_{\mathrm{bu}}(n) \;=\; n \lg n - \begin{cases} 1.2408\,n \\ 0.2645\,n \end{cases} \pm O(1) \;=\; n \lg n - (0.7526 \pm 0.4882)\,n \pm O(1).$$
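The top-down average-case behavior can be observed empirically (a sketch; a single-run count on a random permutation concentrates tightly around the average, whose linear coefficient is close to 1.25 at powers of two):

```python
import random

def merge_count(n, seed=42):
    """Count comparisons of plain top-down Mergesort on a random permutation."""
    rng = random.Random(seed)
    a = list(range(n))
    rng.shuffle(a)
    cmps = 0

    def msort(xs):
        nonlocal cmps
        if len(xs) <= 1:
            return xs
        mid = len(xs) // 2
        left, right = msort(xs[:mid]), msort(xs[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            cmps += 1                      # one comparison per merge step
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    assert msort(a) == sorted(a)
    return cmps
```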
4. The QuickXsort recurrence
In this section, we set up a recurrence equation for the costs of QuickXsort. This recurrence will be the basis for our analyses below. We start with some prerequisites and assumptions about X.
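Before formalizing, the control flow being modeled can be sketched as follows (a toy Python rendering with a stand-in for X; a genuine implementation sorts X's segment in place using the other segment as swap buffer, see Section 2, and samples pivots rather than picking one element):

```python
import random

def quickXsort(a, x_sort, alpha=1.0, w=8, rng=random.Random(1)):
    """Toy QuickXsort control flow: partition, sort one segment with X,
    recurse on the other. `x_sort` stands in for X (here it may simply
    be `sorted`); `alpha` is X's buffer-to-input ratio."""
    if len(a) <= w:
        return x_sort(a)
    p = a[rng.randrange(len(a))]
    left = [v for v in a if v < p]
    right = [v for v in a if v > p]
    j1, j2 = len(left), len(right)
    thresh = (len(a) - 1) / (1 + alpha)
    if j1 <= thresh and j2 <= thresh:
        x_side = 'left' if j1 >= j2 else 'right'   # about even: X gets the larger
    else:
        x_side = 'left' if j1 <= j2 else 'right'   # uneven: X gets the smaller
    if x_side == 'left':
        return x_sort(left) + [p] + quickXsort(right, x_sort, alpha, w, rng)
    else:
        return quickXsort(left, x_sort, alpha, w, rng) + [p] + x_sort(right)
```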
For simplicity we will assume that subproblems below a constant size w (with w ≥ k in the case of constant size-k samples for pivot selection) are sorted with X (using a constant amount of extra space). Nevertheless, we could use any other algorithm for that, as this only influences the constant term of the costs. A common choice in practice is to replace X by StraightInsertionsort for sorting the small cases.

We further assume that selecting the pivot from a sample of size k costs s(k) comparisons, where we usually assume s(k) = Θ(k), i.e., an (expected-case) linear selection method is used. Now, let c(n) be the expected number of comparisons of QuickXsort on arrays of size n, where the expectation is taken over the random choices for selecting the pivots for partitioning.

Preservation of randomness?
Our goal is to set up a recurrence equation for c(n). We will justify here that such a recursive relation exists.

For the Quicksort part of QuickXsort, only the ranks of the chosen pivot elements have an influence on the costs; partitioning itself always needs precisely one comparison per element.¹ Since we choose the pivot elements randomly (from a random sample), the order of the input does not influence the costs of the Quicksort part of QuickXsort.

For general X, the sorting costs do depend on the order of the input, and we would like to use the average-case bounds for X when it is applied to a random permutation. We may assume that our initial input is indeed a random permutation of the elements,² but this is not sufficient! We also have to guarantee that the inputs for recursive calls are again random permutations of their elements.

A simple sufficient condition for this "randomness-preserving" property is that X may not compare buffer contents. This is a natural requirement, e.g., for our Mergesort variants. If no buffer elements are compared to each other and the original input is a random permutation of its elements, so are the segments after partitioning, and so will be the buffer after X has terminated. Then we can set up a recurrence equation for c(n) using the average-case cost of X. We may also replace the random sampling of pivots by choosing any fixed positions without affecting the expected costs c(n).

However, not all candidates for X meet this requirement. (Basic) QuickHeapsort does compare buffer elements to each other (see Section 2.2) and, indeed, the buffer elements are not in random order when the Heapsort part has finished. For such X, we assume that genuinely random samples are used for pivot selection. Moreover, we will have to use conservative bounds, e.g., worst- or best-case bounds, for the number of comparisons incurred by X in the recurrence for c(n), whereas for randomness-preserving methods, the expected costs can be characterized precisely by the recurrence.

¹ We remark that this is no longer true for multiway partitioning methods, where the number of comparisons per element is not necessarily the same for all possible outcomes. Similarly, the number of swaps in the standard partitioning method depends not only on the rank of the pivot, but also on how "displaced" the elements in the input are.
² It is a reasonable option to enforce this assumption in an implementation by an explicit random shuffle of the input before we start sorting. Sedgewick and Wayne, for example, do this for the implementation of Quicksort in their textbook [46].

In both cases, we use x(n) as (a bound for) the number of comparisons needed by X to sort n elements, and we will assume that
$$x(n) \;=\; a\, n\lg n + b\, n \pm O(n^{1-\varepsilon}), \qquad (n \to \infty),$$
for constants a, b and ε ∈ (0, 1].

We can now proceed to the recursive description of the expected costs c(n) of QuickXsort. The description follows the recursive nature of the algorithm. Recall that
QuickXsort tries to sort the largest segment with X for which the other segment gives sufficient buffer space. We first consider the case α = 1, in which this largest segment is always the smaller of the two segments created.

Case α = 1. Let us consider the recurrence for c(n) (which holds for both constant and growing sample sizes k = k(n)). We distinguish two cases: first, let α = 1. We obtain the recurrence
$$c(n) \;=\; x(n) \qquad (\text{for } n \le w)$$
$$c(n) \;=\; \underbrace{n - k(n)}_{\text{partitioning}} + \underbrace{s(k(n))}_{\text{pivot sampling}} + \mathbb{E}\bigl[[J_2 \le J_1]\bigl(x(J_2) + c(J_1)\bigr)\bigr] + \mathbb{E}\bigl[[J_1 < J_2]\bigl(x(J_1) + c(J_2)\bigr)\bigr] \qquad (\text{for } n > w)$$
$$\phantom{c(n)} \;=\; \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\, c(J_r)\bigr] + t(n)$$
where
$$A_1(J_1) = [J_2 \le J_1], \quad A_2(J_2) = [J_1 < J_2] \quad\text{with } J_2 = (n-1) - J_1,$$
$$t(n) \;=\; n - k + s(k) + \mathbb{E}[A_1(J_1)\, x(J_2)] + \mathbb{E}[A_2(J_2)\, x(J_1)].$$
The expectation here is taken over the choice of the random pivot, i.e., over the segment sizes J₁ resp. J₂. Note that we use both J₁ and J₂ to express the conditions in a convenient form, but actually either one is fully determined by the other via J₁ + J₂ = n − 1. We call t(n) the toll function. Note how A₁ and A₂ change roles in recursive calls and toll functions, since we always sort one segment recursively and the other segment by X.

General α. For α < 1, we obtain two cases: When the split induced by the pivot is "uneven" – namely when min{J₁, J₂} < α · max{J₁, J₂}, i.e., max{J₁, J₂} > (n−1)/(1+α) – the smaller segment is not large enough to be used as buffer. Then we can only assign the large segment as a buffer and run X on the smaller segment. If, however, the split is "about even", i.e., both segments are ≤ (n−1)/(1+α), we can sort the larger of the two segments by X. These cases also show up in the recurrence of costs:
$$c(n) \;=\; x(n) \qquad (\text{for } n \le w)$$
$$\begin{aligned} c(n) \;=\;& (n-k) + s(k) \\ &+ \mathbb{E}\Bigl[\bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\cdot[J_1 \le J_2]\cdot\bigl(x(J_2) + c(J_1)\bigr)\Bigr] + \mathbb{E}\Bigl[\bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\cdot[J_2 < J_1]\cdot\bigl(x(J_1) + c(J_2)\bigr)\Bigr] \\ &+ \mathbb{E}\Bigl[\bigl[J_1 > \tfrac{n-1}{1+\alpha}\bigr]\cdot\bigl(x(J_2) + c(J_1)\bigr)\Bigr] + \mathbb{E}\Bigl[\bigl[J_2 > \tfrac{n-1}{1+\alpha}\bigr]\cdot\bigl(x(J_1) + c(J_2)\bigr)\Bigr] \qquad (\text{for } n > w) \end{aligned}$$
$$\phantom{c(n)} \;=\; \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\, c(J_r)\bigr] + t(n)$$
where
$$A_1(J_1) \;=\; \bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\cdot[J_1 \le J_2] + \bigl[J_1 > \tfrac{n-1}{1+\alpha}\bigr] \quad\text{with } J_2 = (n-1) - J_1,$$
$$A_2(J_2) \;=\; \bigl[J_1, J_2 \le \tfrac{n-1}{1+\alpha}\bigr]\cdot[J_2 < J_1] + \bigl[J_2 > \tfrac{n-1}{1+\alpha}\bigr],$$
$$t(n) \;=\; n - k + s(k) + \mathbb{E}[A_1(J_1)\, x(J_2)] + \mathbb{E}[A_2(J_2)\, x(J_1)].$$
The above formulation actually covers α = 1 as a special case, so in both cases we have
$$c(n) \;=\; \sum_{r=1}^{2} \mathbb{E}\bigl[A_r(J_r)\, c(J_r)\bigr] + t(n) \tag{10}$$
where A₁ (resp. A₂) is the indicator random variable for the event "left (resp. right) segment sorted recursively" and
$$t(n) \;=\; n - k + s(k) + \sum_{r=1}^{2} \mathbb{E}\bigl[A_r\, x(J_{3-r})\bigr]. \tag{11}$$
We note that the expected number of partitioning rounds is only Θ(log n), and hence the expected overall number of comparisons used in all pivot sampling rounds combined is also only O(log n) when k is constant.

Recursion indicator variables.
It will be convenient to rewrite A₁(J₁) and A₂(J₂) in terms of the relative subproblem size:
$$A_1(J_1) \;=\; \left[\frac{J_1}{n-1} \in \Bigl[\frac{\alpha}{1+\alpha}, \frac12\Bigr] \cup \Bigl(\frac{1}{1+\alpha}, 1\Bigr]\right], \qquad A_2(J_2) \;=\; \left[\frac{J_2}{n-1} \in \Bigl[\frac{\alpha}{1+\alpha}, \frac12\Bigr) \cup \Bigl(\frac{1}{1+\alpha}, 1\Bigr]\right].$$
Graphically, if we view J₁/(n−1) as a point in the unit interval, the following picture shows which subproblem is sorted recursively for typical values of α (the other subproblem is sorted by X).

[Figure omitted: the unit interval of J₁/(n−1), marked at α/(1+α), 1/2, and 1/(1+α). For α < 1, A₂ = 1 on [0, α/(1+α)) and on (1/2, 1/(1+α)], while A₁ = 1 on [α/(1+α), 1/2] and on (1/(1+α), 1]; for α = 1 this degenerates to A₂ = 1 on [0, 1/2) and A₁ = 1 on [1/2, 1].]

Obviously, we have A₁ + A₂ = 1 for any choice of J₁, which corresponds to having exactly one recursive call in QuickXsort.

A vital ingredient to our analyses below is to characterize the distribution of the subproblem sizes J₁ and J₂. Without pivot sampling, we have J₁ =D U[0..n−1], uniform over all possible values. In general, we choose the pivot as the median of a random sample of k = 2t + 1 elements, where t ∈ ℕ₀; k may or may not depend on n; we write k = k(n) to emphasize a potential dependency.

By symmetry, the two subproblem sizes always have the same distribution, J₁ =D J₂. We will therefore in the following simply write J instead of J₁ when the distinction between left and right subproblem is not important.

Combinatorial model.
What is the probability P[J = j] of obtaining a certain subproblem size j? An elementary counting argument yields the result. For selecting the (j+1)-st element as pivot, the sample needs to contain t elements smaller than the pivot and t elements larger than the pivot. There are $\binom{n}{k}$ possible choices for the sample in total, and $\binom{j}{t}\binom{n-1-j}{t}$ of them will select the (j+1)-st element as pivot. Thus,
$$\mathbb{P}[J = j] \;=\; \frac{\binom{j}{t}\binom{n-1-j}{t}}{\binom{n}{k}}.$$
Note that this probability is 0 for j < t or j > n − 1 − t, so we can always write J = I + t for a random variable I ∈ [0..n−k] with P[I = i] = P[J = i + t].

The following lemma can be derived by direct elementary calculations, showing that J is concentrated around its expected value (n−1)/2.

Lemma 4.1 ([4, Lemma 2]):
Let 0 < δ < 1/2. If we choose the pivot as the median of a random sample of k = 2t + 1 elements with k ≤ n/2, then the rank of the pivot R = J₁ + 1 satisfies
$$\mathbb{P}\bigl[R \le \tfrac{n}{2} - \delta n\bigr] \;<\; k\rho^t \qquad\text{and}\qquad \mathbb{P}\bigl[R \ge \tfrac{n}{2} + \delta n\bigr] \;<\; k\rho^t, \qquad\text{where } \rho = 1 - 4\delta^2 < 1.$$
Proof: First note that the probability of choosing the r-th element as pivot satisfies
$$\binom{n}{k}\cdot\mathbb{P}[R = r] \;=\; \binom{r-1}{t}\binom{n-r}{t}.$$
We use the notation of falling factorial powers, $x^{\underline{\ell}} = x(x-1)\cdots(x-\ell+1)$; thus $\binom{x}{\ell} = x^{\underline{\ell}}/\ell!$. Hence,
$$\mathbb{P}[R = r] \;=\; \frac{k!\,(r-1)^{\underline{t}}\,(n-r)^{\underline{t}}}{(t!)^2\; n^{\underline{k}}} \;=\; \binom{2t}{t}\,\frac{k}{n-2t}\,\prod_{i=0}^{t-1}\frac{(r-1-i)(n-r-i)}{(n-2i)(n-1-2i)}.$$
For r ≤ t we have P[R = r] = 0. So let t < r ≤ n/2 − δn, and consider an index i in the product with 0 ≤ i < t:
$$\frac{(r-1-i)(n-r-i)}{(n-2i)(n-1-2i)} \;\le\; \frac{(r-i)(n-r-i)}{(n-2i)(n-2i)} \;=\; \frac{\bigl((\tfrac{n}{2}-i) - (\tfrac{n}{2}-r)\bigr)\bigl((\tfrac{n}{2}-i) + (\tfrac{n}{2}-r)\bigr)}{(n-2i)^2} \;=\; \frac{(\tfrac{n}{2}-i)^2 - (\tfrac{n}{2}-r)^2}{(n-2i)^2} \;\le\; \frac14 - \Bigl(\frac{\tfrac{n}{2} - (\tfrac{n}{2} - \delta n)}{n}\Bigr)^2 \;=\; \frac14 - \delta^2.$$
We have $\binom{2t}{t} \le 4^t$. Since k ≤ n/2, we obtain
$$\mathbb{P}[R = r] \;\le\; 4^t\,\frac{k}{n-2t}\,\Bigl(\frac14 - \delta^2\Bigr)^t \;<\; \frac{2k}{n}\,\rho^t.$$
Now we obtain the desired result:
$$\mathbb{P}\bigl[R \le \tfrac{n}{2} - \delta n\bigr] \;<\; \sum_{r=0}^{\lfloor n/2 - \delta n\rfloor} \frac{2k}{n}\,\rho^t \;\le\; k\rho^t.$$
The bound for P[R ≥ n/2 + δn] follows by symmetry. □

Uniform model.
There is a second view on the distribution of J₁ that will turn out to be convenient for our analysis. Suppose our input consists of n real numbers drawn i.i.d. uniformly from (0, 1). We can then distinguish the value P ∈ (0, 1) of the (first) pivot from its rank R ∈ [1..n]. In particular, P only depends on the values in the random sample, whereas R necessarily depends on the values of all elements in the input. It is a well-known result that the median of a sample of U(0, 1) random variates has a beta distribution: P =D Beta(t + 1, t + 1). Indeed, the density of the beta distribution is proportional to x^t (1 − x)^t, which is the probability to have t of the U(0, 1) elements ≤ x and t elements ≥ x (for a given value x of the sample median).

Now suppose the pivot value P is fixed. Then, conditional on P, all further (non-sample) elements fall into the categories "smaller than P" resp. "larger than P" independently and with probability P resp. 1 − P (almost surely there are no duplicates). Apart from the t small elements from the sample, J₁ precisely counts how many elements are less than P, so we can write J₁ = I₁ + t, where I₁ is the number of elements that turned out to be smaller than the pivot during partitioning.

Since each of the n − k non-sample elements is smaller than P with probability P, independent of all other elements, we have, conditional on P, that I₁ =D Bin(n − k, P). I₁ is said to have a mixed binomial distribution, with a beta-distributed mixer P. If we drop the conditioning on P, we obtain the so-called beta-binomial distribution: I₁ =D BetaBin(n − k, t + 1, t + 1). We can express the probability weights by "integrating P out":
$$\mathbb{P}[I_1 = i] \;=\; \mathbb{E}_P\left[\binom{n-k}{i} P^{i}(1-P)^{n-k-i}\right] \;=\; \int_0^1 \binom{n-k}{i} x^{i}(1-x)^{n-k-i}\cdot\frac{x^{t}(1-x)^{t}}{\mathrm{B}(t+1, t+1)}\,dx$$
$$=\; \frac{\binom{n-k}{i}}{\mathrm{B}(t+1, t+1)}\int_0^1 x^{t+i}(1-x)^{t+n-k-i}\,dx \;=\; \binom{n-k}{i}\,\frac{\mathrm{B}(t+1+i,\; t+1+n-k-i)}{\mathrm{B}(t+1, t+1)},$$
which yields the expression given in Section 3.4. (The last step above uses the definition of the beta function.)

Note that for t = 0, i.e., no sampling, we have t + BetaBin(n − k, t + 1, t + 1) = BetaBin(n − 1, 1, 1) =D U[0..n−1], as it must be. This point of view allows us to compute expectations involving J₁ by first conditioning on P, and then in a second step also taking expectations w.r.t. P, formally using the law of total expectation. In the first step, we can make use of the simple Chernoff bounds for the binomial distribution (Lemma 3.3) instead of Lemma 4.1. The second step is often much easier than the original problem and can use known formulas for integrals, such as the ones given in Section 3.3. For a larger collection of such properties and connections to other stochastic processes, see [49, Section 2.4.7].

Connection between models.
We obtained two expressions for P[J = j] from the two points of view above; the reader might find it reassuring that they can indeed be proven equal by elementary term rewriting (see also [49, Lemma 6.3]):
$$\mathbb{P}[J = j] \;=\; \frac{\binom{j}{t}\binom{n-1-j}{t}}{\binom{n}{k}} \;=\; \frac{k!\,(n-k)!}{n!}\cdot\frac{j!}{t!\,(j-t)!}\cdot\frac{(n-1-j)!}{t!\,(n-1-j-t)!};$$
setting j = i + t and using k = 2t + 1, we obtain
$$\cdots \;=\; \frac{k!\,(n-k)!}{n!}\cdot\frac{(i+t)!}{t!\,i!}\cdot\frac{(n-k-i+t)!}{t!\,(n-k-i)!} \;=\; \frac{(n-k)!}{i!\,(n-k-i)!}\cdot\frac{(t+i)!\,(t+n-k-i)!}{n!}\bigg/\frac{t!\,t!}{k!} \;\overset{(3)}{=}\; \binom{n-k}{i}\,\frac{\mathrm{B}(i+t+1,\; n-i-t)}{\mathrm{B}(t+1, t+1)}.$$
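The term rewriting above can also be verified mechanically for small parameters with exact rational arithmetic (a sketch; `beta_binomial` transcribes the right-hand side using Equation (3)):

```python
from fractions import Fraction
from math import comb, factorial

def combinatorial(n, j, t):
    """P[J = j] in the combinatorial model (median of k = 2t + 1)."""
    k = 2 * t + 1
    return Fraction(comb(j, t) * comb(n - 1 - j, t), comb(n, k))

def beta_binomial(n, i, t):
    """C(n-k, i) * B(t+1+i, t+1+n-k-i) / B(t+1, t+1), B as in Eq. (3)."""
    k = 2 * t + 1
    B = lambda a, b: Fraction(factorial(a - 1) * factorial(b - 1),
                              factorial(a + b - 1))
    return comb(n - k, i) * B(t + 1 + i, t + 1 + n - k - i) / B(t + 1, t + 1)
```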
5. Analysis for growing sample sizes
In this and the following section, we derive general transfer theorems that allow us to express the total cost of QuickXsort in terms of the costs of X (as if used in isolation). We can then directly use known results about X from the literature.

As in plain Quicksort, the performance of QuickXsort is heavily influenced by the method for choosing pivots (though the influence is only on the linear term of the number of comparisons). We distinguish two regimes here. The first considers the case that the median of a large sample is used; more precisely, the sample size is chosen as a growing but sublinear function of the subproblem size. This method yields optimal asymptotic results and allows a rather clean analysis. This case is covered in Section 5.

It is known for Quicksort that increasing the sample size yields rapidly diminishing marginal returns [44], and it is natural to assume that QuickXsort behaves similarly. Asymptotically, a growing sample size will eventually be better, but the evidence in Section 10 shows that a small, fixed sample size gives the best practical performance on realistic input sizes, so these variants deserve further study. This will be the purpose of Section 6.

We mainly focus on the number of key comparisons as our cost model; the transfer theorems derived here are, however, oblivious to this.

In this section, we derive general results which hold for a wide class of algorithms X. As we will show, the average number of comparisons of X and of median-of-k(n) QuickXsort differ only by an o(n)-term (if k(n) grows as n grows and under some natural assumptions). Throughout this section, we assume that the pivot is selected as the median of k = k(n) elements, where k(n) grows as n grows. The following theorem allows us to transfer an asymptotic approximation of the costs of X to an asymptotic approximation of the costs of QuickXsort. We will apply this theorem to concrete methods X in Section 7.
Theorem 5.1 (Transfer theorem (expected costs, growing k)): Let c(n) be defined by Equation (10) (the recurrence for the expected costs of QuickXsort) and assume x(n) (the costs of X) and k = k(n) (the sample size) fulfill
$$x(n) \;=\; a\,n\lg n + b\,n \pm o(n)$$
for constants a ≥ 1 and b, and k = k(n) ∈ ω(1) ∩ o(n) as n → ∞ with 1 ≤ k(n) ≤ n for all n. Then c(n) ≤ x(n) + o(n). For a = 1, the above holds with equality, i.e., c(n) = x(n) + c̃(n) with c̃(n) = o(n). Moreover, in the typical case with k(n) = Θ(n^κ) for κ ∈ (0, 1) and x(n) = an lg n + bn ± O(n^δ) with δ ∈ [0, 1), we have for any fixed ε > 0 that
$$\tilde c(n) \;=\; \Theta\bigl(n^{\max\{\kappa,\,1-\kappa\}}\bigr) \pm O\bigl(n^{\max\{\delta,\,1/2+\varepsilon\}}\bigr).$$

We note that this result considerably strengthens the error term from o(n) in versions of this theorem in earlier work to O(n^{1/2+ε}) (for k(n) = √n). Since this error term is the only difference between the costs of QuickXsort and X (for a = 1), we feel that this improved bound is not merely a technical contribution, but significantly strengthens our confidence in the utility and practicality of QuickXsort as an algorithmic template.

Remark 5.2 (Optimal sample sizes):
The experiments in [4] and the results for Quickselect in [35] suggest that sample sizes k(n) = Θ(√n) are likely to be optimal w.r.t. balancing the costs of pivot selection and the benefit of better-quality pivots within the lower-order terms.

Theorem 5.1 gives a proof of this in a special situation: assume that a = 1, the error term ξ(n) ∈ O(n^δ) for some δ ∈ [0, 1/2], and that we are restricted to sample sizes k(n) = Θ(n^κ) for κ ∈ (0, 1). In this case, Theorem 5.1 shows that κ = 1/2 is the optimal choice, i.e., k(n) = √n has the "best polynomial growth" among all feasible polynomial sample sizes.

Proof of Theorem 5.1:
Let c(n) denote the average number of comparisons performed by QuickXsort on an input array of length n, and let x(n) = an lg n + bn ± ξ(n) with ξ(n) ∈ o(n) be (upper and lower) bounds for the average number of comparisons performed by the algorithm X on an input array of length n. Without loss of generality we may assume that ξ(n) is monotone.

Let A₁ be the indicator random variable for the event "left segment sorted recursively" and A₂ = 1 − A₁ similarly for the right segment. Recall that c(n) fulfills the recurrence
$$c(n) \;=\; \sum_{r=1}^{2}\mathbb{E}\bigl[A_r\, c(J_r)\bigr] + t(n), \qquad\text{where}\qquad t(n) \;=\; n - k(n) + s\bigl(k(n)\bigr) + \sum_{r=1}^{2}\mathbb{E}\bigl[A_r\, x(J_{3-r})\bigr],$$
and J₁ and J₂ are the sizes of the left resp. right segment created in the first partitioning step, and s(k) ∈ Θ(k) is the expected number of comparisons to find the median of the sample of k elements.

Recurrence for the difference.
To prove our claim, we will bound the difference c̃(n) = c(n) − x(n); it satisfies a recurrence very similar to the one for c(n):
$$\begin{aligned} \tilde c(n) \;=\;& n - k(n) + s\bigl(k(n)\bigr) + \mathbb{E}\Bigl[A_1\cdot\bigl(\tilde c(J_1) + x(J_1) + x(J_2)\bigr)\Bigr] + \mathbb{E}\Bigl[A_2\cdot\bigl(\tilde c(J_2) + x(J_1) + x(J_2)\bigr)\Bigr] - x(n) \\ =\;& \mathbb{E}\bigl[A_1\,\tilde c(J_1)\bigr] + \mathbb{E}\bigl[A_2\,\tilde c(J_2)\bigr] + \underbrace{n - k(n) + s\bigl(k(n)\bigr) + \mathbb{E}\bigl[x(J_1)\bigr] + \mathbb{E}\bigl[x(J_2)\bigr] - x(n)}_{\tilde t(n)}. \end{aligned} \tag{12}$$
(Note how taking the difference here turns the complicated terms E[A_r x(J_{3−r})] from t(n) into the simpler E[x(J_r)] terms in t̃(n).)

Approximating the toll function.
We will eventually bound $c_\Delta(n)$; the first step is to study the (asymptotic) behavior of the residual toll function $t_\Delta(n)$.

Lemma 5.3 (Approximating $t_\Delta(n)$): Let $t_\Delta(n)$ be as in Equation (12). Then for any $\varepsilon > 0$, we have
\[ t_\Delta(n) \;=\; (1-a)\, n + \Theta\Bigl(k(n) + \frac{n}{k(n)}\Bigr) \;\pm\; O\Bigl(\xi(n) + n^{1/2+\varepsilon}\Bigr). \]
Moreover, if $a = 1$, $k(n) = \Theta(n^\kappa)$ for $\kappa \in (0,1)$ and $\xi(n) = O(n^\delta)$ for $\delta \in [0,1)$, we have
\[ t_\Delta(n) \;=\; \Theta\bigl(n^{\max\{\kappa,\,1-\kappa\}}\bigr) \;\pm\; O\bigl(n^{\max\{\delta,\,1/2+\varepsilon\}}\bigr). \]

Proof of Lemma 5.3:
We start with the simple observation that
\[ J_1 \lg J_1 \;=\; J_1 \bigl( \lg(\tfrac{J_1}{n}) + \lg n \bigr) \;=\; n \cdot \Bigl( \tfrac{J_1}{n} \lg \tfrac{J_1}{n} + \tfrac{J_1}{n} \lg n \Bigr) \;=\; \tfrac{J_1}{n}\, n \lg n + \tfrac{J_1}{n} \lg\bigl(\tfrac{J_1}{n}\bigr)\, n. \tag{13} \]
With that, we can simplify $t_\Delta(n)$ to (recall $s(k) \in \Theta(k)$)
\[\begin{aligned}
t_\Delta(n) &= n - k(n) + s\bigl(k(n)\bigr) + \sum_{r=1}^{2} E\bigl[ a J_r \lg J_r + b J_r \pm \xi(J_r) \bigr] - x(n) \\
&= n + \sum_{r=1}^{2} \Bigl( a\,E\bigl[\tfrac{J_r}{n}\bigr]\, n \lg n + a\, E\bigl[\tfrac{J_r}{n} \lg(\tfrac{J_r}{n})\bigr]\, n + b\, E[J_r] \pm \xi(n) \Bigr) - x(n) + \Theta(k(n)) \\
&= n + \bigl( a n \lg n \pm O(\log n) \bigr) + 2a\, E\bigl[\tfrac{J_1}{n} \lg(\tfrac{J_1}{n})\bigr]\, n + \bigl( b n \pm O(1) \bigr) \pm 2\xi(n) - \bigl( a n \lg n + b n \pm \xi(n) \bigr) + \Theta(k(n)) \\
&= \Bigl( 1 + 2a\, E\bigl[\tfrac{J_1}{n} \lg(\tfrac{J_1}{n})\bigr] \Bigr)\, n + \Theta\bigl(k(n)\bigr) \pm O(\xi(n)).
\end{aligned}\tag{14}\]
The expectation $E[\tfrac{J_1}{n} \lg(\tfrac{J_1}{n})] = -E[h(J_1/n)]$ is almost of the form addressed in Lemma 3.4 when we write the beta-binomial distribution of $J_1$ as the mixed distribution $J_1 = t(n) + I_1$, where $I_1 \overset{\mathcal D}{=} \mathrm{BetaBin}(n-k, t+1, t+1)$: we only have to change the argument from $-E[h(J_1/n)]$ to $-E[h(I_1/(n-k))]$. The first step is to show that this can be done with a sufficiently small error. For brevity we write $J$ (resp. $I$) instead of $J_1$ (resp. $I_1$).

Let $\delta = \delta(n) = 1/\sqrt[4]{k(n)}$. Then, by Lemma 4.1 and $1 + x \le \exp(x)$, we obtain
\[ P\bigl[J \le (\tfrac12 - \delta) n\bigr] \;\le\; k(n) \cdot \Bigl(1 - \tfrac{4}{\sqrt{k(n)}}\Bigr)^{(k(n)-1)/2} \;\le\; k(n) \cdot \exp\Bigl(- \tfrac{2(k(n)-1)}{\sqrt{k(n)}}\Bigr) \;\le\; k(n) \cdot \exp\bigl(-\sqrt{k(n)}\bigr) \;=\; O\bigl(k(n)^{-2}\bigr). \tag{15} \]
Notice that better bounds are easily possible, but do not affect the result. We need to change the argument in the expectation from $J/n$ to $I/(n-k)$ where $J = I + t$. The idea is that we split the expectation into two ranges: one for $J \in [\lfloor(\tfrac12-\delta)n\rfloor \,..\, \lceil(\tfrac12+\delta)n\rceil]$ and one outside. By Equation (15), the outer part has a negligible contribution. For the inner part, we will now show that the difference between $J/n$ and $I/(n-k)$ is very small. So let $j \in [\lfloor(\tfrac12-\delta)n\rfloor \,..\, \lceil(\tfrac12+\delta)n\rceil]$ and write $j = i + t$. Then it holds that
\[ \frac{j}{n} - \frac{i}{n-k} \;=\; \frac{j(n-k) - (j-t)n}{n(n-k)} \;=\; \frac{tn - jk}{n(n-k)} \;=\; \frac{t - k \cdot \bigl(\tfrac12 \pm (\delta + 1/n)\bigr)}{n-k} \qquad (\text{because } j = n/2 \pm (\delta n + 1)) \]
\[ \;=\; \frac{-\tfrac12 \pm k(\delta + 1/n)}{n-k} \;=\; O\Bigl(\frac{k^{3/4}}{n}\Bigr) \qquad (\text{because } k = 2t+1 \text{ and } \delta = k^{-1/4}). \]
(Note that the difference is $\Omega(k/n)$ for unrestricted values of $j$; only for the region close to $n/2$ does the above bound hold.)

Now, recall from Lemma 3.5 that $h$ is Hölder-continuous for any exponent $\eta \in (0,1)$ with Hölder constant $C_\eta/\ln 2$. Thus,
\[ |h(y) - h(z)| \;=\; O\Bigl(\bigl(k(n)^{3/4}/n\bigr)^{\eta}\Bigr) \qquad \text{for } y, z \in [0,1] \text{ with } |y - z| = O\bigl(k(n)^{3/4}/n\bigr). \]
We use this observation to show:
\[\begin{aligned}
E[-h(J/n)] &= -\sum_{j=0}^{n} P[J=j]\, h(j/n) \\
&\overset{\text{Lemma 3.5--(a)}}{=} -\sum_{j=\lfloor(1/2-\delta)n\rfloor}^{\lceil(1/2+\delta)n\rceil} P[J=j] \cdot h\Bigl(\frac{j}{n}\Bigr) \;\pm\; P\bigl[J \le (\tfrac12-\delta)n \,\vee\, J \ge (\tfrac12+\delta)n\bigr] \cdot \frac{\lg e}{e} \\
&\overset{\text{Hölder-cont.}}{=} -\sum_{j=\lfloor(1/2-\delta)n\rfloor}^{\lceil(1/2+\delta)n\rceil} P[J=j] \cdot h\Bigl(\frac{j-t}{n-k}\Bigr) \;\pm\; P[\cdots] \cdot \frac{\lg e}{e} \;\pm\; O\Bigl(\bigl(k(n)^{3/4}/n\bigr)^{\eta}\Bigr) \\
&\overset{\text{Lemma 3.5--(a)}}{=} -\sum_{j=0}^{n} P[J=j] \cdot h\Bigl(\frac{j-t}{n-k}\Bigr) \;\pm\; 2\, P[\cdots] \cdot \frac{\lg e}{e} \;\pm\; O\Bigl(\bigl(k(n)^{3/4}/n\bigr)^{\eta}\Bigr) \\
&\overset{(15)}{=} E\Bigl[\frac{I}{n-k} \lg\Bigl(\frac{I}{n-k}\Bigr)\Bigr] \;\pm\; O\Bigl(k(n)^{-2} + \bigl(k(n)^{3/4}/n\bigr)^{\eta}\Bigr).
\end{aligned}\tag{16}\]
Thus, it remains to examine $E[-h(I/(n-k))]$ further. By the definition of the beta-binomial distribution, we have $I \overset{\mathcal D}{=} \mathrm{Bin}(n - k(n), P)$ conditional on the value of the pivot $P \overset{\mathcal D}{=} \mathrm{Beta}(t(n)+1, t(n)+1)$ (see Section 4.3). So we apply Lemma 3.4 on the conditional expectation to get, for any $\zeta \ge 0$,
\[ E\Bigl[ h\bigl(\tfrac{I}{n-k}\bigr) \Bigm| P \Bigr] \;=\; h(P) \pm \rho \qquad\text{where}\quad \rho \;=\; \frac{C_\eta}{\ln 2} \cdot \zeta^\eta \Bigl(1 - e^{-2\zeta^2 (n-k(n))}\Bigr) + 4\,\frac{\lg e}{e}\, e^{-2\zeta^2 (n-k(n))}. \]
By Equation (5) and the asymptotic expansion of the harmonic numbers (see, e.g., [22, Eq. (9.89)]), we find
\[ E\Bigl[-h\bigl(\tfrac{I}{n-k}\bigr)\Bigr] \;=\; -E[h(P)] \pm \rho \;\overset{(5)}{=}\; -\frac{H_{k(n)+1} - H_{(k(n)+1)/2}}{2 \ln 2} \pm \rho \;=\; -\frac{\ln 2 - \Theta(1/k(n))}{2 \ln 2} \pm \rho \]
and, using the choice for $\zeta$ from Lemma 3.4,
\[ \;=\; -\tfrac12 + \Theta\bigl(k(n)^{-1}\bigr) \pm O\bigl(n^{-1/2+\varepsilon_2}\bigr) \]
for any fixed $\varepsilon_2 \in (0, \tfrac12)$ (recall that $\varepsilon_2$ can still be an arbitrarily small constant). Together with (14) and (16) this allows us to estimate $t_\Delta(n)$. Here, we set $\varepsilon_3 = 1 - \eta$:
\[ t_\Delta(n) \;=\; (1-a)n + \Theta\Bigl(k(n) + \frac{n}{k(n)}\Bigr) \pm O\Bigl( \xi(n) + \sqrt n \cdot n^{\varepsilon_2} + \frac{n}{k(n)^2} + k(n)^{3/4} \cdot \bigl(n/k(n)^{3/4}\bigr)^{\varepsilon_3} \Bigr); \]
replacing $\varepsilon_2$ and $\varepsilon_3$ by their maximum, we obtain for any small enough $\varepsilon > 0$ that
\[ t_\Delta(n) \;=\; (1-a)n + \Theta\Bigl(k(n) + \frac{n}{k(n)}\Bigr) \pm O\Bigl( \xi(n) + n^{\varepsilon}\bigl(\sqrt n + k(n)^{3/4}\bigr) + \frac{n}{k(n)^2} \Bigr) \;=\; (1-a)n + \Theta\Bigl(k(n) + \frac{n}{k(n)}\Bigr) \pm O\bigl( \xi(n) + n^{1/2+\varepsilon} \bigr). \]
To see the last step, let us verify that $n^\varepsilon k(n)^{3/4} = O(n^{1/2+\varepsilon}) + o(k(n))$: we write $\mathbb N = N_1 \cup N_2$ with $N_1 = \{n \in \mathbb N \mid k(n) \le \sqrt n\}$ and $N_2 = \{n \in \mathbb N \mid k(n) \ge \sqrt n\}$. For $n \in N_1$ we clearly have $n^\varepsilon k(n)^{3/4} \le n^{3/8+\varepsilon} \le n^{1/2+\varepsilon}$. For $n \in N_2$, we have $k(n)^{1/4} \ge n^{1/8} \ge n^{\varepsilon + \varepsilon'}$ for some small $\varepsilon' > 0$ (provided $\varepsilon$ is small); thus $n^\varepsilon k(n)^{3/4} \le k(n)\, n^{-\varepsilon'}$. Altogether, we obtain $n^\varepsilon k(n)^{3/4} = O(n^{1/2+\varepsilon}) + o(k(n))$.

In the case that $a = 1$, $k(n) = \Theta(n^\kappa)$ for $\kappa \in (0,1)$ and $\xi(n) = O(n^\delta)$ for $\delta \in [0,1)$, the claimed bound
\[ t_\Delta(n) \;=\; \Theta\bigl(n^{\max\{\kappa,\,1-\kappa\}}\bigr) \pm O\bigl(n^{\max\{\delta,\,1/2+\varepsilon\}}\bigr) \]
follows. $\square$

Note that $t_\Delta(n)$ can be positive or negative (depending on $x(n)$), but the $\Theta$-bound is definitely a positive term, and it will be minimal for $k(n) \sim \sqrt n$. Now that we know the order of growth of $t_\Delta(n)$, we can proceed to our recurrence for the difference $c_\Delta(n)$.

Bounding the difference.
The final step is to bound $c_\Delta(n)$ from above. Recall that by (12) we have $c_\Delta(n) = E[A_1\, c_\Delta(J_1)] + E[A_2\, c_\Delta(J_2)] + t_\Delta(n)$. For the case $a > 1$, Lemma 5.3 tells us that $t_\Delta(n)$ is eventually negative and asymptotic to $(1-a)n$. Thus $c_\Delta(n)$ is eventually negative as well, i.e., $c(n) \le x(n)$ for large enough $n$. The claim follows.

We are therefore left with the case $a = 1$. Lemma 5.3 only gives us a bound in that case, and certainly $t_\Delta(n) = o(n)$. The fact that $t_\Delta(n)$ can in general be positive or negative and need not be monotonic makes solving the recurrence for $c_\Delta(n)$ a formidable problem, but the following simpler problem can easily be solved.

Lemma 5.4: Let $\hat t : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ be monotonically increasing and consider the recurrence
\[ \hat c(n) \;=\; E\bigl[A_1\, \hat c(J_1)\bigr] + E\bigl[A_2\, \hat c(J_2)\bigr] + \hat t(n) \]
with $\hat c(n) = c_0$ for $n \le n_0$. Then for any constant $\beta \in (\tfrac12, 1)$ there is a constant $C = C(\beta) > 0$ such that
\[ \hat c(n) \;\le\; C \sum_{i=0}^{\lceil \log_{1/\beta}(n) \rceil} \hat t(n \beta^i). \]

Proof of Lemma 5.4:
Since $\hat t(n)$ is non-negative and monotonically increasing, so is $\hat c(n)$, and we can bound
\[ \hat c(n) \;\le\; E\bigl[\hat c(\max\{J_1, J_2\})\bigr] + \hat t(n). \]
Let us abbreviate $\hat J = \max\{J_1, J_2\}$. For a given constant $\beta \in (\tfrac12, 1)$ we obtain
\[\begin{aligned}
\hat c(n) &\le E[\hat c(\hat J)] + \hat t(n) \\
&= E\bigl[\hat c(\hat J) \bigm| \hat J \le \beta n\bigr] \cdot P[\hat J \le \beta n] + E\bigl[\hat c(\hat J) \bigm| \hat J > \beta n\bigr] \cdot P[\hat J > \beta n] + \hat t(n) \\
&\le P[\hat J \le \beta n] \cdot \hat c(\beta n) + P[\hat J > \beta n] \cdot \hat c(n) + \hat t(n).
\end{aligned}\]
Since $\beta > \tfrac12$, we can bound $P[\hat J \le \beta n] \ge C_1$ for a constant $C_1 > 0$ and all large enough $n$. Hence, for $n$ large enough,
\[ \hat c(n) \;\le\; \frac{P[\hat J \le \beta n]}{1 - P[\hat J > \beta n]} \cdot \hat c(\beta n) + \frac{1}{1 - P[\hat J > \beta n]} \cdot \hat t(n) \;\le\; \hat c(\beta n) + \frac{1}{C_1} \cdot \hat t(n). \]
Iterating the last inequality $\lceil \log_{1/\beta}(n) \rceil$ times, we find $\hat c(n) \le C \sum_{i=0}^{\lceil \log_{1/\beta}(n)\rceil} \hat t(n\beta^i)$. $\square$

We can apply this lemma if we replace $t_\Delta(n)$ by $\hat t(n) := \max_{m \le n} |t_\Delta(m)|$, which is both non-negative and monotone. We clearly have $t_\Delta(n) \le \hat t(n)$ by definition. Moreover, if $t_\Delta(n) = O(g(n))$ for a monotonically increasing function $g$, then also $\hat t(n) = O(g(n))$, and the same statement holds with $O$ replaced by $o$.

Now let $\hat c(n)$ be defined by the recurrence $\hat c(n) = E[A_1\, \hat c(J_1)] + E[A_2\, \hat c(J_2)] + \hat t(n)$. Then we have $|c_\Delta(n)| \le \hat c(n)$. We will now bound $\hat c(n)$.

$o(n)$ bound. We first show that $\hat c(n) = o(n)$. By Lemma 5.3 we have $t_\Delta(n) = o(n)$, and by the above argument also $\hat t(n) = o(n)$. Since $\hat t(n) \in o(n)$, we know that for every $\varepsilon_1 > 0$ there is some $N_{\varepsilon_1} \in \mathbb N$ such that for $n \ge N_{\varepsilon_1}$ we have $\hat t(n) \le \varepsilon_1 n$. Let $D_{\varepsilon_1} = \sum_{i=0}^{N_{\varepsilon_1}} \hat t(i)$. Then, for any $\beta \in (\tfrac12, 1)$, by Lemma 5.4 there is some constant $C$ such that for all $n$ we have
\[ |c_\Delta(n)| \;\le\; \hat c(n) \;\le\; C \sum_{i=0}^{\lceil \log_{1/\beta}(n)\rceil} \hat t(n\beta^i) \;\le\; C D_{\varepsilon_1} + C \sum_{i=0}^{\lceil \log_{1/\beta}(n)\rceil} \varepsilon_1 \beta^i n \;\le\; C D_{\varepsilon_1} + \varepsilon_2 n \]
for $\varepsilon_2 := \frac{C}{1-\beta} \cdot \varepsilon_1 \ge C \varepsilon_1 \sum_{i=0}^{\lceil \log_{1/\beta}(n)\rceil} \beta^i$. Since we can hence find a suitable $\varepsilon_1 = \varepsilon_1(\varepsilon_2) > 0$ for every $\varepsilon_2 > 0$, the above inequality holds for all $\varepsilon_2 > 0$, and therefore $\hat c(n) = o(n)$ holds. This proves the first part of Theorem 5.1.

Refined bound.
Now consider the case that $k(n) = \Theta(n^\kappa)$ for $\kappa \in (0,1)$ and $\xi(n) \in O(n^\delta)$ with $\delta \in [0,1)$. Then Lemma 5.3 shows $t_\Delta(n) = O(n^\gamma)$ for some $\gamma \in (0,1)$, i.e., there is a constant $C_\gamma$ such that $\hat t(n) \le C_\gamma n^\gamma + O(1)$. By Lemma 5.4, we obtain
\[ c_\Delta(n) \;\le\; \sum_{i=0}^{\lceil \log_{1/\beta}(n)\rceil} \Bigl( C_\gamma (n\beta^i)^\gamma + O(1) \Bigr) \;\le\; C_\gamma n^\gamma \sum_{i \ge 0} (\beta^\gamma)^i + O(\log n) \;=\; O(n^\gamma). \]
Moreover, if $t_\Delta(n) \in \Theta(n^\gamma)$ (the case that $\max\{\kappa, 1-\kappa\} > \max\{\delta, \tfrac12 + \varepsilon\}$ in Lemma 5.3), then also $c_\Delta(n) \in \Theta(n^\gamma)$ as $c_\Delta(n) \ge t_\Delta(n)$. This concludes the proof of the last part of Theorem 5.1. $\square$

Theorem 5.1 shows that for methods X that have optimal costs up to linear terms ($a = 1$), median-of-$k(n)$ QuickXsort with $k(n) = \Theta(n^\kappa)$ and $\kappa \in (0,1)$ is, as $n \to \infty$, also optimal up to linear terms. We obtain the best lower-order terms with median-of-$\sqrt n$ QuickXsort, namely $c(n) = x(n) \pm O(n^{1/2+\varepsilon})$, and we will in the following focus on this case.

Note that our proof actually gives slightly more information than stated in the theorem: in the case that the costs of X are not optimal in the leading-term coefficient ($a > 1$), QuickXsort uses asymptotically fewer comparisons than X, whereas for X with optimal leading-term costs, QuickXsort uses slightly more comparisons.
Does QuickXsort provide a good bound for the worst case? The obvious answer is "no". If always the $\sqrt n$ smallest elements are chosen for pivot selection, a running time of $\Theta(n^{3/2})$ is obtained. However, we can prove that such a worst case is very unlikely. In fact, let $x_{wc}(n)$ be the worst-case number of comparisons of the algorithm X. Proposition 5.5 states that the probability that QuickXsort needs more than $x_{wc}(n) + 6n$ comparisons decreases exponentially in $\sqrt[4]{n}$. (This bound is not tight, but since we do not aim for exact probabilities, Proposition 5.5 is enough for us.)

Proposition 5.5: Let $\varepsilon > 0$. The probability that median-of-$\sqrt n$ QuickXsort needs more than $x_{wc}(n) + 6n$ comparisons is less than $(3/4 + \varepsilon)^{\sqrt[4]{n}}$ for $n$ large enough.

Proof:
Let $n$ be the size of the input. We say that we are in a good case if an array of size $m$ is partitioned at a pivot in the interval $[m/4, 3m/4]$ – until the array contains only $\sqrt n$ elements. For smaller arrays, we can assume an upper bound of $(\sqrt n)^2 = n$ comparisons for the worst case. If we are always in a good case, all partitioning steps sum up to less than $n \cdot \sum_{i \ge 0} (3/4)^i = 4n$ comparisons. We also have to consider the number of comparisons required to find the pivot element. At any stage the pivot is chosen as median of at most $\sqrt n$ elements. Since the median can be determined in linear time, for all stages together this sums up to less than $n$ comparisons if we are always in a good case and $n$ is large enough. Finally, for all the sorting phases with X we need at most $x_{wc}(n)$ comparisons in total (that is only a rough upper bound which can be improved). Hence, we need at most $x_{wc}(n) + 6n$ comparisons if always a good case occurs.

Now we only have to estimate the probability that always a good case occurs. By Lemma 4.1, the probability for a good case in the first partitioning step is at least $1 - d \cdot \sqrt n \cdot (3/4)^{\sqrt[4]{n}}$ for some constant $d$. We have to choose a pivot in the interval $[m/4, 3m/4]$ at most $\log_{4/3}(n/\sqrt n) < 1.21 \lg n$ times until the array contains only $\sqrt n$ elements. We only have to consider partitioning steps where the array has size greater than $\sqrt n$ (if the size of the array is already less than $\sqrt n$, we define the probability of a good case as 1). Hence, for each of these partitioning steps we obtain that the probability for a good case is greater than $1 - d \cdot \sqrt n \cdot (3/4)^{\sqrt[4]{n}}$. Therefore, we obtain
\[ P[\text{always good case}] \;\ge\; \Bigl(1 - d \cdot \sqrt n \cdot (3/4)^{\sqrt[4]{n}}\Bigr)^{1.21 \lg n} \;\ge\; 1 - 1.21 \lg(n) \cdot d \cdot \sqrt n \cdot (3/4)^{\sqrt[4]{n}} \]
by Bernoulli's inequality. For $n$ large enough we have $1.21 \lg(n) \cdot d \cdot \sqrt n \cdot (3/4)^{\sqrt[4]{n}} \le (3/4 + \varepsilon)^{\sqrt[4]{n}}$. $\square$
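The good case is overwhelmingly likely already for moderate inputs. A quick Monte Carlo sketch (parameter choices are ours, purely illustrative): draw the ranks of a sample of size close to $\sqrt n$ and check whether the sample median – the pivot rank – lands in $[n/4, 3n/4]$.

```python
import random

random.seed(7)
n, k, trials = 40000, 201, 300        # k odd, close to sqrt(40000) = 200
good = 0
for _ in range(trials):
    sample_ranks = random.sample(range(n), k)    # ranks of the pivot sample
    median_rank = sorted(sample_ranks)[k // 2]   # rank of the chosen pivot
    if n / 4 <= median_rank <= 3 * n / 4:        # pivot in [m/4, 3m/4]
        good += 1
```

For these parameters the failure probability of a single trial is astronomically small (it decays exponentially in the sample size), so all trials come out good.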
Introsort [39]. We choose some $\delta \in (0, \tfrac12)$. Whenever the pivot is more than $\delta n$ off from the median (i.e., if $J_1 \le (\tfrac12 - \delta)n$ or $J_2 \le (\tfrac12 - \delta)n$), we choose the next pivot as median of the whole array using the median-of-medians algorithm [1].

Remark 5.6:
Notice that instead of choosing the next pivot as median, we can also switch to an entirely different sorting algorithm, as is done in Introsort – as we proposed in [12]. The advantage in [12] is that theoretically a better worst case can be achieved: indeed, we showed that the worst case is only $n + o(n)$ comparisons above the worst case of the fallback algorithm. Thus, using Reinhardt's Mergesort [41], we obtain a worst case of $n \lg n - 0.3n + o(n)$. However, here we follow a different approach for two reasons: first, we want to give an (almost) self-contained description of the algorithm; second, we are not aware of a fallback algorithm which in practice performs better than our approach: Heapsort and most internal Mergesort variants are considerably slower. Moreover, we are not even aware of an implementation of Reinhardt's Mergesort.

Theorem 5.7 (QuickXsort Worst-Case):
Let X be a sorting algorithm with at most $x(n) = n \lg n + bn + o(n)$ comparisons in the average case and $x_{wc}(n) = n \lg n + O(n)$ comparisons in the worst case, and let $k(n) \in \omega(1) \cap o(n)$ with $1 \le k(n) \le n$ for all $n$. If $k(n) = \omega(\sqrt n)$, we additionally require that always some worst-case linear-time algorithm is used for pivot selection (e.g., IntroSelect or the median-of-medians algorithm); otherwise, the worst case is allowed to be at most quadratic (e.g., using Quickselect). Then, median-of-$k(n)$ QuickXsort with median-of-medians fallback pivot selection is a sorting algorithm that performs $x(n) + o(n)$ comparisons in the average case and $n \lg n + O(n)$ comparisons in the worst case.

Thus, by applying the median-of-medians fallback pivot selection, the average case changes only in the $o(n)$-terms. Notice that the $O(n)$-term for the worst case of QuickMergesort is rather large because of the median-of-medians algorithm. Nevertheless, in [10], we elaborate the technique of median-of-medians pivot selection in more detail. In particular, we show there how to reduce the $O(n)$-term for the worst case of QuickMergesort considerably.

Proof:
It is clear that the worst case is $n \lg n + O(n)$ comparisons since there can be at most $\log_{1/(1/2+\delta)} n$ rounds of partitioning (by the additional requirement, pivot selection takes at most linear time). Thus, it remains to consider the average case – for which we follow the proof of Theorem 5.1. We say a pivot choice is "bad" if the next pivot is selected as median of the whole array (i.e., if $J_1 \le (\tfrac12-\delta)n$ or $J_2 \le (\tfrac12-\delta)n$); otherwise we call the pivot "good".

The difference to the situation in Theorem 5.1 is that now we have four segments to distinguish instead of two: let $A_1$ be the indicator random variable for the event "left segment sorted recursively" and $A_2$ similarly for the right segment – both for the case that the pivot was good. Likewise, let $A_3$ be the indicator random variable for the event "left segment sorted recursively" and $A_4 = 1 - A_1 - A_2 - A_3$ for "right segment sorted recursively" in the case that the pivot was bad. Then, $\bar A_1 = A_1 + A_3$ is the indicator random variable for the event "left segment sorted recursively" and $\bar A_2 = A_2 + A_4$ the same for the right segment.

Let $c(n)$ denote the average number of comparisons of median-of-$k(n)$ QuickXsort with median-of-medians fallback pivot selection and $\tilde c(n)$ the same but in the case that the first pivot is selected with the median-of-medians algorithm. We obtain the following recurrence:
\[\begin{aligned}
c(n) \;=\; \underbrace{n - k(n)}_{\text{partitioning}} + \underbrace{s\bigl(k(n)\bigr)}_{\text{pivot sampling}} &+ E\Bigl[ A_1 \cdot \bigl(c(J_1) + x(J_2)\bigr) + A_2 \cdot \bigl(c(J_2) + x(J_1)\bigr) \Bigr] \\
&+ E\Bigl[ A_3 \cdot \bigl(\tilde c(J_1) + x(J_2)\bigr) + A_4 \cdot \bigl(\tilde c(J_2) + x(J_1)\bigr) \Bigr] \;=\; \sum_{r=1}^{2} E[\bar A_r\, c(J_r)] + t(n),
\end{aligned}\]
where
\[ t(n) \;=\; n - k(n) + s\bigl(k(n)\bigr) + \sum_{r=1}^{2} E[\bar A_r\, x(J_{3-r})] + \sum_{r=1}^{2} E\bigl[A_{r+2}\, (\tilde c(J_r) - c(J_r))\bigr]. \]
As before, $s(k)$ is the number of comparisons to select the median from the $k$ sample elements and $J_1$ and $J_2$ are the sizes of the left resp. right segment created in the first partitioning step. Since $n \lg n - O(n) \le \tilde c(n) \le c_{wc}(n)$ and $c_{wc}(n) = n \lg n + O(n)$, it follows that $\tilde c(n) - c(n) \in O(n)$. By Lemma 4.1 we have $P[A_3], P[A_4] \in o(1)$. Thus,
\[ \zeta(n) \;:=\; \sum_{r=1}^{2} E\bigl[A_{r+2}\,(\tilde c(J_r) - c(J_r))\bigr] \;\in\; o(n). \]
As for Theorem 5.1 we now consider $c_\Delta(n) = c(n) - x(n)$, yielding
\[\begin{aligned}
c_\Delta(n) &= n - k(n) + s\bigl(k(n)\bigr) + E\Bigl[\bar A_1 \cdot \bigl(c_\Delta(J_1) + x(J_1) + x(J_2)\bigr)\Bigr] + E\Bigl[\bar A_2 \cdot \bigl(c_\Delta(J_2) + x(J_2) + x(J_1)\bigr)\Bigr] + \zeta(n) - x(n) \\
&= E\bigl[\bar A_1\, c_\Delta(J_1)\bigr] + E\bigl[\bar A_2\, c_\Delta(J_2)\bigr] + t_\Delta(n)
\end{aligned}\]
for $t_\Delta(n) = n - k(n) + s\bigl(k(n)\bigr) + E[x(J_1)] + E[x(J_2)] + \zeta(n) - x(n)$. Now the proof proceeds exactly as for Theorem 5.1. $\square$
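The fallback pivot selection relies on the classic median-of-medians selection algorithm. A minimal, self-contained sketch (ours, not the tuned variant from [10]) that finds the exact median in worst-case linear time:

```python
def select(a, i):
    """Return the i-th smallest element of a (0-based), worst-case O(len(a))."""
    if len(a) <= 10:
        return sorted(a)[i]
    # medians of groups of five
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)    # the median of medians
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    n_equal = len(a) - len(smaller) - len(larger)
    if i < len(smaller):
        return select(smaller, i)
    if i < len(smaller) + n_equal:
        return pivot
    return select(larger, i - len(smaller) - n_equal)

# Fallback pivot: the exact median of the whole array.
data = [(i * 37) % 1001 for i in range(1001)]     # a permutation of 0..1000
fallback_pivot = select(data, len(data) // 2)
```

The pivot of medians-of-five guarantees that at least roughly 30% of the elements fall on each side, which is what yields the linear worst-case bound used in Theorem 5.7.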
6. Analysis for fixed sample sizes
In this section, we consider the practically relevant version of QuickXsort, where we choose pivots as the median of a sample of fixed size $k$. We think of $k$ as a design parameter of the algorithm that we have to choose. Setting $k = 1$ corresponds to selecting pivots uniformly at random; good practical performance is often achieved for moderate values, say, $k = 3, \ldots$ When $n \le w$ for a constant $w \ge k$, we switch to another sorting method (for simplicity we can assume that such inputs are sorted directly with X). Clearly this only influences the constant term of the costs of QuickXsort. Moreover, the cost of sampling pivots is $O(\log n)$ in expectation (for constant $k$ and $w$), so how the median of the $k$ sample elements is found is immaterial.

We now state the main result of this section, the transfer theorem for median-of-$k$ QuickXsort when $k$ is fixed. Instantiations for actual X are deferred to Section 7. Recall that $I_{x,y}(\lambda, \rho)$ denotes the regularized incomplete beta function, see Equation (4) on page 15.

Theorem 6.1 (Transfer theorem (expected costs, fixed $k$)): Let $c(n)$ be defined by Equation (10) (the recurrence for the expected costs of QuickXsort) and assume $x(n)$ (the costs of X) fulfills $x(n) = a n \lg n + b n \pm O(n^{1-\varepsilon})$ for constants $\varepsilon \in (0,1]$, $a \ge 1$ and $b$. Assume further that $k$ (the sample size) is a fixed odd constant $k = 2t+1$, $t \in \mathbb N_0$. Then it holds that
\[ c(n) \;=\; x(n) + q \cdot n \;\pm\; O(n^{1-\varepsilon} + \log n), \qquad\text{where} \]
\[ q \;=\; \frac{1}{H}\biggl(1 - a\,\frac{H_{k+1} - H_{t+1}}{\ln 2}\biggr), \qquad H \;=\; I_{0,\,\alpha/(1+\alpha)}(t+2,\, t+1) + I_{1/2,\,1/(1+\alpha)}(t+2,\, t+1). \]

Before we prove Theorem 6.1, let us look at the consequences for the number of comparisons of
QuickXsort.

The QuickXsort penalty.
Since all our choices for X are optimal up to linear terms, so will QuickXsort be. We thus have $a = 1$ in Theorem 6.1; $b$ (and the allowable $\alpha$) still depend on X. We then find that going from X to QuickXsort basically adds a "penalty" $q$ in the linear term that depends on the sample size (and $\alpha$), but not on X. Table 2 shows that this penalty is $\approx n$ without sampling, but can be reduced drastically when choosing pivots from a sample of 3 or 5 elements.

                  k = 1     k = 3     k = 5     k = 7     k = 21    t → ∞
    α = 1         1.1146    0.5070    0.3210    0.2329    0.0770    0
    α = 1/2       0.9120    0.4050    0.2526    0.1815    0.0596    0
    α = 1/4       0.6480    0.2967    0.1921    0.1432    0.0550    0

Table 2: QuickXsort penalty. QuickXsort with $x(n) = n \lg n + bn$ yields $c(n) = n \lg n + (q + b)n$, where $q$, the QuickXsort penalty, is given in the table.

As we increase the sample size, we converge to the situation for growing sample sizes where no linear-term penalty is left (Section 5). That $q$ is less than $0.08$ already for a sample of 21 elements indicates that most benefits from pivot sampling are achieved for moderate sample sizes. It is noteworthy that the improvement from no sampling to median-of-3 yields a reduction of $q$ by more than 50%, which is much more than its effect on Quicksort itself (where it reduces the leading term of costs by 15% from $2 n \ln n$ to $\tfrac{12}{7} n \ln n$).

Proof of Theorem 6.1:
The proof of Theorem 6.1 fills the remainder of this section. (Although the statement of the theorem is the same as [50, Theorem 5.1], our proof here is significantly shorter than the one given there: first taking the difference $c(n) - x(n)$ turns the much more complicated terms $E[A_r\, x(J_{3-r})]$ from $t(n)$ into the simpler $E[x(J_r)]$ in $t_\Delta(n)$, which allows us to omit [50, Lemma E.1].) We start with Equation (12) on page 25, the recurrence for $c_\Delta(n) = c(n) - x(n)$. Recall that $c(n)$ denotes the expected number of comparisons performed by QuickXsort. With $x(n) = a n \lg n + b n \pm \xi(n)$ for a monotonic function $\xi(n) = O(n^{1-\varepsilon})$, the same arguments as in the proof of Theorem 5.1 lead to
\[ t_\Delta(n) \;\overset{\text{(14) revisited}}{=}\; \Bigl(1 + 2a\, E\bigl[\tfrac{J_1}{n} \lg(\tfrac{J_1}{n})\bigr]\Bigr) n + \Theta\bigl(s(k(n))\bigr) \pm O(\xi(n)) \;=\; \Bigl(1 + 2a\, E\bigl[\tfrac{J}{n} \lg(\tfrac{J}{n})\bigr]\Bigr) n \pm O(n^{1-\varepsilon}). \tag{17} \]
The main complication for fixed $k$ is that – unlike for the median-of-$\sqrt n$ case, where the pivot was very close to the overall median with high probability – $\tfrac{J}{n}$ here has significant variance. We will thus have to compute $E[\tfrac{J}{n} \lg(\tfrac{J}{n})]$ more precisely and also solve the recurrence for $c_\Delta(n)$ precisely. As a consequence, we need additional techniques over what we used in the previous section; these are established below. In terms of the result, more details of the algorithm have significant influence on the overall cost; in particular, $\alpha$ and the choice which subproblem is sorted recursively will influence the linear term of costs.

In this section, we compute certain expectations that arise, e.g., in the toll function of our recurrence. The idea is to approximate $\tfrac{J}{n}$ by a beta-distributed variable, relying on the local limit law Lemma 3.6; the conditionals translate to bounds of an integral. Carefully tracing the error of this approximation yields the following result.

Lemma 6.2 (Beta-integral approximation):
Let $J \overset{\mathcal D}{=} \mathrm{BetaBin}(n - c_1, \lambda, \rho) + c_2$ be a random variable that differs by fixed constants $c_1$ and $c_2$ from a beta-binomial variable with parameters $n \in \mathbb N$ and $\lambda, \rho \in \mathbb N_{\ge 1}$. Then for any $\eta \in (0,1)$ it holds that
\[ E\Bigl[\frac{J}{n} \ln \frac{J}{n}\Bigr] \;=\; \frac{\lambda}{\lambda+\rho}\bigl(H_\lambda - H_{\lambda+\rho}\bigr) \;\pm\; O(n^{-\eta}), \qquad (n \to \infty). \]

Proof of Lemma 6.2:
By the local limit law for beta binomials (Lemma 3.6) it is plausible to expect a reasonably small error when we replace $E[\tfrac{J}{n} \ln \tfrac{J}{n}]$ by $E[P \ln P]$, where $P \overset{\mathcal D}{=} \mathrm{Beta}(\lambda, \rho)$ is beta distributed. We bound the error in the following. We first replace $J$ by $I \overset{\mathcal D}{=} \mathrm{BetaBin}(n, \lambda, \rho)$ and argue later that this results in a sufficiently small error:
\[\begin{aligned}
E\Bigl[\frac{I}{n} \ln\frac{I}{n}\Bigr] &= \sum_{i=0}^{n} \frac{i}{n} \ln\Bigl(\frac{i}{n}\Bigr) \cdot P[I = i] \;=\; \frac1n \sum_{i=0}^{n} \frac{i}{n} \ln\Bigl(\frac{i}{n}\Bigr) \cdot n\, P[I=i] \\
&\overset{\text{Lemma 3.6}}{=} \frac1n \sum_{i=0}^{n} \frac{i}{n} \ln\Bigl(\frac{i}{n}\Bigr) \cdot \biggl( \frac{(i/n)^{\lambda-1} (1 - i/n)^{\rho-1}}{\mathrm B(\lambda,\rho)} \pm O(n^{-1}) \biggr) \\
&= -\frac{1}{\mathrm B(\lambda,\rho)} \cdot \frac1n \sum_{i=0}^{n} f(i/n) \;\pm\; O(n^{-1}),
\end{aligned}\]
where $f(z) = \ln(1/z) \cdot z^{\lambda} (1-z)^{\rho-1}$. Since the derivative is $\infty$ for $z = 0$, $f$ cannot be Lipschitz-continuous, but it is Hölder-continuous on $[0,1]$ for any exponent $\eta \in (0,1)$: $z \mapsto z \ln(1/z)$ is Hölder-continuous (Lemma 3.5 – (b)), products of Hölder-continuous functions remain so on bounded intervals, and the remaining factor of $f$ is a polynomial in $z$, which is Lipschitz- and hence Hölder-continuous. By Lemma 3.1 we then have
\[ \frac1n \sum_{i=0}^{n} f(i/n) \;=\; \int_0^1 f(z)\, dz \;\pm\; O(n^{-\eta}). \]
Recall that we can choose $\eta$ as close to 1 as we wish; this will only affect the constant inside $O(n^{-\eta})$.

Changing from $I$ back to $J$ has no influence on the given approximation: to compensate for the difference in the number of trials ($n - c_1$ instead of $n$), we use the above formulas with $n - c_1$ instead of $n$; since we let $n$ go to infinity anyway, this does not change the result. Moreover, replacing $I$ by $I + c_2$ changes the value of the argument $z = I/n$ of $f$ by $O(n^{-1})$; since $z \mapsto z \ln(1/z)$ is Hölder-continuous, this also changes $z \ln(1/z)$ by at most $O(n^{-\eta})$.

It remains to evaluate the beta integral; it is given in Equation (5). Inserting, we find
\[ E\Bigl[\frac{J}{n} \ln \frac{J}{n}\Bigr] \;=\; E\Bigl[\frac{I}{n} \ln \frac{I}{n}\Bigr] \pm O(n^{-\eta}) \;=\; \frac{\lambda}{\lambda+\rho}\bigl(H_\lambda - H_{\lambda+\rho}\bigr) \pm O(n^{-\eta}) \]
for any $\eta \in (0,1)$. $\square$

Remark 6.3 (Generalization of beta-integral approximation):
The technique above directly extends to $E[g(\tfrac{J}{n})]$ for any Hölder-continuous function $g$. For computing the variance in Section 8, we will have to deal with more complicated functions including the indicator variables $A_1(J)$ resp. $A_2(J)$. As long as $g$ is piecewise Hölder-continuous, the same arguments and error bounds apply: we can break the sums resp. integrals into several parts and apply the above approximation to each part individually. The indicator variables simply translate into restricted bounds of the integral. For example, we obtain for constants $0 \le x \le y \le 1$ that
\[ E\bigl[\mathbb 1[x n \le J \le y n] \cdot J \lg J\bigr] \;=\; \frac{\lambda}{\lambda+\rho}\, I_{x,y}(\lambda+1, \rho) \cdot n \lg n \;\pm\; O(n), \qquad (n \to \infty). \]

Building on the preparatory work from Lemma 6.2, we can easily determine an asymptotic approximation for the toll function. We find
\[\begin{aligned}
t_\Delta(n) &= \Bigl(1 + 2a\, E\bigl[\tfrac{J}{n} \lg(\tfrac{J}{n})\bigr]\Bigr) n \pm O(n^{1-\varepsilon}) \;=\; \Bigl(1 + \frac{2a\, E[\tfrac{J}{n} \ln(\tfrac{J}{n})]}{\ln 2}\Bigr) n \pm O(n^{1-\varepsilon}) \\
&\overset{\text{Lemma 6.2}}{=} \biggl(1 + \frac{2a}{\ln 2}\Bigl( \frac{t+1}{2(t+1)}\bigl(H_{t+1} - H_{2t+2}\bigr) \pm O(n^{-\eta}) \Bigr)\biggr) n \pm O(n^{1-\varepsilon}) \\
&= \underbrace{\biggl(1 - a\,\frac{H_{k+1} - H_{t+1}}{\ln 2}\biggr)}_{=:\, \hat q}\, n \;\pm\; O\bigl(n^{1-\varepsilon} + n^{1-\eta}\bigr).
\end{aligned}\tag{18}\]
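The beta integral behind Lemma 6.2 (and hence the constant in Equation (18)) can be checked numerically. A sketch using a midpoint rule for $\lambda = \rho = t + 1 = 2$, i.e., median-of-3 pivots, where the closed form gives $\tfrac{\lambda}{\lambda+\rho}(H_\lambda - H_{\lambda+\rho}) = \tfrac12(H_2 - H_4) = -\tfrac{7}{24}$:

```python
import math

lam, rho = 2, 2                          # t = 1, i.e., median-of-3 pivots
beta = math.gamma(lam) * math.gamma(rho) / math.gamma(lam + rho)  # B(2, 2)

def harmonic(m):
    return sum(1.0 / i for i in range(1, m + 1))

# midpoint rule for the limit of E[(J/n) ln(J/n)]:
#   integral of z ln(z) * z^(lam-1) * (1-z)^(rho-1) / B(lam, rho) over (0, 1)
N = 200000
total = 0.0
for i in range(N):
    z = (i + 0.5) / N
    total += z * math.log(z) * z ** (lam - 1) * (1 - z) ** (rho - 1)
integral = total / (N * beta)

closed_form = lam / (lam + rho) * (harmonic(lam) - harmonic(lam + rho))
```

The integrand vanishes at both endpoints, so the plain midpoint rule converges without special treatment of the $\ln z$ singularity in the derivative.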
The expectations $E[A_r(J_r)\, c_\Delta(J_r)]$ in Equation (12) (and in the same way for the original costs in Equation (10)) are finite sums over the values $0, \ldots, n-1$ that $J := J_1$ can attain. Recall that $J_2 = n - 1 - J$ and $A_1(J) + A_2(J) = 1$ for any value of $J$. With $J_1 \overset{\mathcal D}{=} J_2$, we find
\[ \sum_{r=1}^{2} E[A_r(J_r)\, c_\Delta(J_r)] \;=\; E\Bigl[\mathbb 1\Bigl[\tfrac{J}{n-1} \in \bigl[\tfrac{\alpha}{1+\alpha}, \tfrac12\bigr] \cup \bigl(\tfrac{1}{1+\alpha}, 1\bigr]\Bigr] \cdot c_\Delta(J)\Bigr] + E\Bigl[\mathbb 1\Bigl[\tfrac{J}{n-1} \in \bigl[\tfrac{\alpha}{1+\alpha}, \tfrac12\bigr) \cup \bigl(\tfrac{1}{1+\alpha}, 1\bigr]\Bigr] \cdot c_\Delta(J)\Bigr] \;=\; \sum_{j=0}^{n-1} w_{n,j} \cdot c_\Delta(j), \]
where
\[ w_{n,j} \;=\; \begin{cases} 2 \cdot P[J = j] & \text{if } \frac{j}{n-1} \in \bigl[\frac{\alpha}{1+\alpha}, \frac12\bigr) \cup \bigl(\frac{1}{1+\alpha}, 1\bigr], \\ 1 \cdot P[J = j] & \text{if } \frac{j}{n-1} = \frac12, \\ 0 & \text{otherwise}. \end{cases} \]
We thus have a recurrence of the form required by Roura's continuous master theorem (CMT) (see Theorem 3.7) with the weights $w_{n,j}$ from above. Figure 5 shows a specific example of what these weights look like.

[Figure: $n\, w_{n,\lfloor zn \rfloor}$ vs. $w(z)$ for $n = 51$, $k = 3$.]

Figure 5:
The weights $w_{n,j}$ (circles) for $n = 51$, $t = 1$ and $\alpha = \tfrac12$ and the corresponding shape function $w(z)$ (fat gray line); note the singular point at $j = 25$.

It remains to determine $P[J = j]$. Recall that we choose the pivot as the median of $k = 2t+1$ elements for a fixed constant $t \in \mathbb N_0$, and the subproblem size $J$ fulfills $J = t + I$ with $I \overset{\mathcal D}{=} \mathrm{BetaBin}(n-k, t+1, t+1)$. So we have for $i \in [0, n-k]$ by definition
\[ P[I = i] \;=\; \binom{n-k}{i} \frac{\mathrm B\bigl(i + t + 1,\; (n - k - i) + t + 1\bigr)}{\mathrm B(t+1, t+1)} \;=\; \binom{n-k}{i} \frac{(t+1)^{\overline{i}}\, (t+1)^{\overline{n-k-i}}}{(k+1)^{\overline{n-k}}}. \]
The first step towards applying the CMT is to identify a shape function $w(z)$ that approximates the relative subproblem size probabilities, $w(z) \approx n\, w_{n, \lfloor zn \rfloor}$ for large $n$. Now the local limit law for beta binomials (Lemma 3.6) says that the normalized beta binomial $I/n$ converges to a beta variable "in density", and the convergence is uniform. With the beta density $f_P(z) = z^t (1-z)^t / \mathrm B(t+1, t+1)$, we thus find by Lemma 3.6 that
\[ P[J = j] \;=\; P[I = j - t] \;=\; \frac1n f_P(j/n) \pm O(n^{-1}), \qquad (n \to \infty). \]
The shift by the small constant $t$ from $(j-t)/n$ to $j/n$ only changes the function value by $O(n^{-1})$ since $f_P$ is Lipschitz continuous on $[0,1]$ (see Section 3.1).

With this observation, a natural candidate for the shape function of the recurrence is
\[ w(z) \;=\; 2 \cdot \mathbb 1\Bigl[\tfrac{\alpha}{1+\alpha} < z < \tfrac12 \,\vee\, z > \tfrac{1}{1+\alpha}\Bigr]\, \frac{z^t (1-z)^t}{\mathrm B(t+1, t+1)}. \tag{19} \]
It remains to show that this is indeed a suitable shape function, i.e., that $w(z)$ fulfills Equation (8), the approximation-rate condition of the CMT. We consider the following ranges for $\frac{\lfloor zn \rfloor}{n-1} = \frac{j}{n-1}$ separately:

• $\frac{\lfloor zn\rfloor}{n-1} < \frac{\alpha}{1+\alpha}$, and $\frac12 < \frac{\lfloor zn\rfloor}{n-1} < \frac{1}{1+\alpha}$: Here $w_{n, \lfloor zn\rfloor} = 0$ and so is $w(z)$. So actual value and approximation are exactly the same.

• $\frac{\alpha}{1+\alpha} < \frac{\lfloor zn\rfloor}{n-1} < \frac12$, and $\frac{\lfloor zn\rfloor}{n-1} > \frac{1}{1+\alpha}$: Here $w_{n,j} = 2\, P[J=j]$ and $w(z) = 2 f_P(z)$, twice the density of the beta distribution $\mathrm{Beta}(t+1, t+1)$. Since $f_P$ is Lipschitz-continuous on the bounded interval $[0,1]$ (it is a polynomial), the uniform pointwise convergence from above is enough to bound the sum of $\bigl| w_{n,j} - \int_{j/n}^{(j+1)/n} w(z)\, dz \bigr|$ over all $j$ in this range by $O(n^{-1})$.

• $\frac{\lfloor zn\rfloor}{n-1} \in \bigl\{\frac{\alpha}{1+\alpha}, \frac12, \frac{1}{1+\alpha}\bigr\}$: At these boundary points, the difference between $w_{n,\lfloor zn\rfloor}$ and $w(z)$ does not vanish (in particular, $\tfrac12$ is a singular point for $w_{n,\lfloor zn\rfloor}$), but the absolute difference is bounded. Since this case only concerns 3 out of the $n$ summands, the overall contribution to the error is $O(n^{-1})$.

Together, we find that Equation (8) is fulfilled as claimed:
\[ \sum_{j=0}^{n-1} \biggl| w_{n,j} - \int_{j/n}^{(j+1)/n} w(z)\, dz \biggr| \;=\; O(n^{-1}) \qquad (n \to \infty). \tag{20} \]

[Figure: the relative recursive subproblem size as a function of $t$, for $\alpha = 1$, $\alpha = \tfrac12$ and $\alpha = \tfrac14$.]

Figure 6: $\int z\, w(z)\, dz$, the relative recursive subproblem size, as a function of $t$.

Remark 6.4 (Relative subproblem sizes):
The integral R zw ( z ) dz is precisely the ex-pected relative subproblem size for the recursive call. This is of independent interest; whileit is intuitively clear that for t → ∞ , i.e., the case of exact medians as pivots, we must havea relative subproblem size of exactly , this convergence is not obvious from the behavior forfinite t : the mass of the integral R zw ( z ) dz concentrates at z = , a point of discontinuityin w ( z ) . It is also worthy of note that for, e.g., α = , the expected subproblem size isinitially larger than ( . for t = 0 ), then decreases to ≈ . around t = 20 and thenstarts to slowly increase again (see Figure 6). This effect is even more pronounced for α = . We are now ready to apply the CMT (Theorem 3.7). Assume that a = ln 2 / ( H k +1 − H t +1 );the other (special) case will be addressed later. Then by Equation (18) our toll functionfulfills t ( n ) ∼ ˆ qn for ˆ q = (cid:0) − a ( H k +1 − H t +1 ) / ln 2 (cid:1) . Thus, we have σ = 1, τ = 0 and K = ˆ q = 0 and we compute H = 1 − Z z w ( z ) dz = 1 − Z h α α < z < ∨ z > α i z t +1 (1 − z ) t B( t + 1 , t + 1) dz = 1 − t + 1 k + 1 Z h α α < z < ∨ z > α i z t +1 (1 − z ) t B( t + 2 , t + 1) dz = 1 − (cid:16) I α α , ( t + 2 , t + 1) + I α , ( t + 2 , t + 1) (cid:17) = I , α α ( t + 2 , t + 1) + I , α ( t + 2 , t + 1) (21)For any sampling parameters, we have H >
0, so by Case 1 of Theorem 3.7, we have that

    c(n) ∼ t(n)/H ∼ q̂·n/H = q·n   (n → ∞).

Special case for a. If a = ln 2 / (H_{k+1} − H_{t+1}), i.e., q̂ = 0, then t(n) = O(n^{1−ε}). Then the claim follows from a coarser bound c(n) = O(n^{1−ε} + log n), which can be established by the same arguments as in the proof of Theorem 5.1.

Since our toll function is not given precisely, but only up to an error term O(n^{1−ε}) for a given fixed ε ∈ (0, 1], we solve the recurrence for c(n) again, but replace t(n) (entirely) by C·n^{1−ε}. If ε < 1, then ∫ z^{1−ε} w(z) dz < ∫ w(z) dz = 1, so we still find H > 0 and obtain an error contribution of O(n^{1−ε}). For ε = 1, we have H = 0 and Case 2 applies, giving an overall error term of O(log n).

This completes the proof of Theorem 6.1. □
7. Analysis of QuickMergesort and QuickHeapsort
We have analyzed the expected cost of the QuickXsort scheme in great detail. Next, we apply our transfer theorems to the concrete choices for X discussed in Section 2. Besides describing how to overcome technical complications in the analysis, we also discuss our results. Comparing with analyses and measured comparison counts from previous work, we find that our exact solutions for the QuickXsort recurrence yield more accurate predictions for the overall number of comparisons.
We use QuickMergesort here to mean the "ping-pong" variant with the smaller buffer (α = 1/2) as illustrated in Figure 3 (page 10). Among the variations of Mergesort (all usable in QuickXsort) that we discussed in Section 2.1, this is the most promising option in terms of practical performance. The analysis of the other variants is very similar.

We assume a variant of Mergesort that generates optimally balanced merges. Top-down mergesort is the typical choice for that, but there are also variations of bottom-up mergesort that achieve the same result without using logarithmic extra space for a recursion stack [20].
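The mechanics of QuickMergesort can be sketched in a few dozen lines. The following is a simplified, hypothetical Python model (not the paper's implementation): it partitions around a plain random pivot rather than a median-of-k sample, mergesorts the smaller segment using the larger one as a swap buffer, and then iterates on the larger segment. The paper's variants choose the segment to sort with Mergesort more carefully, but the key trick that makes QuickXsort in-place, merging via swaps into the buffer segment, is the same.

```python
import random

def quick_mergesort(a):
    """Simplified QuickMergesort sketch: partition, mergesort one part
    using the other part as a swap buffer, continue on the rest in place."""
    _qms(a, 0, len(a))
    return a

def _qms(a, lo, hi):
    while hi - lo > 1:
        p = _partition(a, lo, hi)            # pivot ends up at index p
        left, right = (lo, p), (p + 1, hi)
        small, large = (left, right) if p - lo <= hi - p - 1 else (right, left)
        # Mergesort the smaller part; the larger part serves as buffer
        # (it always has enough room for the runs being merged).
        _mergesort_buf(a, small[0], small[1], large[0])
        lo, hi = large                       # iterate on the larger part

def _partition(a, lo, hi):
    # Plain Lomuto partition with a random pivot (the paper uses
    # median-of-k sampling; a random pivot keeps the sketch short).
    r = random.randrange(lo, hi)
    a[r], a[hi - 1] = a[hi - 1], a[r]
    pivot, store = a[hi - 1], lo
    for i in range(lo, hi - 1):
        if a[i] < pivot:
            a[i], a[store] = a[store], a[i]
            store += 1
    a[store], a[hi - 1] = a[hi - 1], a[store]
    return store

def _mergesort_buf(a, lo, hi, buf):
    """Mergesort a[lo:hi] using a[buf:...] as swap buffer for merging."""
    if hi - lo <= 1:
        return
    mid = (lo + hi) // 2
    _mergesort_buf(a, lo, mid, buf)
    _mergesort_buf(a, mid, hi, buf)
    n1 = mid - lo
    for i in range(n1):                      # move left run into the buffer
        a[buf + i], a[lo + i] = a[lo + i], a[buf + i]
    i, j, k = 0, mid, lo
    while i < n1 and j < hi:                 # merge back via swaps
        if a[buf + i] <= a[j]:
            a[k], a[buf + i] = a[buf + i], a[k]
            i += 1
        else:
            a[k], a[j] = a[j], a[k]
            j += 1
        k += 1
    while i < n1:                            # flush the rest of the left run
        a[k], a[buf + i] = a[buf + i], a[k]
        i += 1
        k += 1
```

No comparisons are counted here; the sketch only illustrates how buffer elements are moved around by swaps and are never compared with each other, which is what preserves randomness in the analysis.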
Corollary 7.1 (Average Case QuickMergesort): The following results hold for the expected number of comparisons when sorting a random permutation of n elements.

(a) Median-of-√n QuickMergesort is an internal sorting algorithm that performs n lg n − (1.… ± 0.…)·n ± O(n^{1/2+ε}) comparisons on average for any constant ε > 0.

(b) Median-of-3 QuickMergesort (with α = 1/2) is an internal sorting algorithm that performs n lg n − (0.… ± 0.…)·n ± O(log n) comparisons on average.

Proof:
We first note that Mergesort never compares buffer elements to each other: the buffer contents are only accessed in swap operations. Therefore, QuickMergesort preserves randomness: if the original input is a random permutation, both the calls to Mergesort and the recursive call operate on a random permutation of the respective elements. The recurrence for c(n) thus gives the exact expected costs of QuickMergesort when we insert for x(n) the expected number of comparisons used by Mergesort on a random permutation of n elements. The latter is given in Equation (9) on page 17.

Note that the asymptotic approximations in Equation (9) are not of the form required for our transfer theorems; we need a constant coefficient in the linear term. But since c(n) is a monotonically increasing function in x(n), we can use upper and lower bounds on x(n) to derive upper and lower bounds on c(n). We thus apply Theorem 5.1 and Theorem 6.1 separately with x(n) replaced by

    x̲(n) = n lg n − 1.…·n − O(1)   resp.   x̄(n) = n lg n − 1.…·n + O(1).

For part (a), we find x̲(n) ± O(n^{1/2+ε}) ≤ c(n) ≤ x̄(n) ± O(n^{1/2+ε}) for any fixed ε > 0. For part (b), we compute q = 0.… and find x̲(n) + qn ± O(log n) ≤ c(n) ≤ x̄(n) + qn ± O(log n). □
We can also prove a bound for the expected performance on any input, where the expectation is taken over the random choices for pivot sampling. By using an upper bound for the worst case of Mergesort, x(n) = n lg n − 0.…·n + 1, we find that the expected number of comparisons is at most n lg n − 1.…·n ± O(n^{1/2+ε}) for median-of-√n QuickMergesort and at most n lg n − 0.…·n + O(log n) for median-of-3 QuickMergesort.

Figure 7:
Exact comparison count of Mergesort (red), median-of-3 QuickMergesort (black) and median-of-√n QuickMergesort (blue) for small input sizes, computed from the recurrence. The information-theoretic lower bound (for the average case) is also shown (gray). The x-axis shows n (logarithmic), the y-axis shows (c(n) − n lg n)/n. The horizontal lines are the supremum and infimum of the asymptotic periodic terms.

Given that the error term of our approximation for fixed k is only of logarithmic growth, we can expect very good predictive quality for our asymptotic approximation. This is indeed what we observe when comparing the exact values of c(n) with the approximation n lg n − 0.…·n for n ≥ …. The numbers are computed from the exact recurrences for Mergesort (see Section 8.3) and QuickMergesort (Equation (10)) by recursively tabulating c(n) for all n ≤ 2^13 = 8192. For the pivot sampling costs s(k), we use the average cost of finding the median with Quickselect, which is known precisely [32, p. 14]. For the numbers for median-of-√n QuickMergesort, we use k(n) = 2⌊√n/2⌋ + 1. The computations were done using Mathematica.

For standard Mergesort, the linear coefficient reaches its asymptotic regime rather quickly; this is due to the absence of a logarithmic term. For median-of-3 QuickMergesort, considerably larger inputs are needed, but for n ≥ … the approximation is accurate. Median-of-√n QuickMergesort needs substantially larger inputs than considered here to come close to Mergesort. It is interesting to note that for roughly n ≤ …, the median-of-√n version uses fewer comparisons.

Figure 7 shows the well-known periodic behavior for Mergesort. Oscillations are clearly visible also for
QuickMergesort , but compared to the rather sharp “bumps” in
Mergesort ’s cost,
QuickMergesort's costs are smoothed out. Figure 7 also confirms that the amplitude of the periodic term is very small in
QuickMergesort.

By QuickHeapsort we refer to QuickXsort using the basic ExternalHeapsort version (as described in Section 2.2) as X. We obtain the following result.
Corollary 7.3 (Expected Case QuickHeapsort): The following results hold for the expected number of comparisons, where the expectation is taken over the random choices of the pivots.

(a) Median-of-√n QuickHeapsort is an internal sorting algorithm that performs n lg n + (0.… ± 0.…)·n ± O(n^{1/2+ε}) comparisons for any constant ε > 0.

(b) Median-of-3 QuickHeapsort is an internal sorting algorithm that performs n lg n + (1.… ± 0.…)·n ± O(n^ε) comparisons for any constant ε > 0.

Proof:
ExternalHeapsort always traverses one path in the heap from the root to the bottom and does one comparison for each edge followed, i.e., ⌊lg n⌋ or ⌊lg n⌋ − 1 comparisons per extraction. Summing over all extractions yields n lg n − 0.…·n ± O(log n) comparisons for the sort-down phase (both in the best and worst case) [4, Eq. 1]; the constant of the linear term, 1 − … − lg(2 ln 2), is the supremum of the corresponding periodic function. Using the classical heap construction method adds between n − … comparisons in the best case and 1.…·n comparisons on average [6]. We therefore find the following upper bounds for the average and worst case and a lower bound for the best case of ExternalHeapsort:

    x_ac(n) = n lg n + 0.…·n ± O(n^ε)
    x_wc(n) = n lg n + 1.…·n ± O(n^ε)
    x_bc(n) = n lg n ± O(n^ε)

for any ε > 0. ExternalHeapsort does not preserve the randomness of the buffer elements. Our recurrence, Equation (10), is thus not valid for QuickHeapsort directly. We can, however, study a hypothetical method X that always uses x(n) = x_wc(n) comparisons on an input of size n, and consider the costs c(n) of QuickXsort for this method. This is clearly an upper bound for the cost of QuickHeapsort since c(n) is a monotonically increasing function in x(n). Similarly, using x(n) = x_bc(n) yields a lower bound. The results then follow by applying Theorem 5.1 and Theorem 6.1. □

We note that our transfer theorems are only applicable to worst- resp. best-case bounds for
ExternalHeapsort, but nevertheless, using the average case x_ac(n) might still give us a better (heuristic) approximation of the actual numbers.

    Instance                       observed     estimate   upper bound   CC           DW
    Fig. 4 [3], n = 10^2, k = 1    806          +67        +79           +158         +156
    Fig. 4 [3], n = 10^2, k = 3    714          +98        +110          —            +168
    Fig. 4 [3], n = 10^5, k = 1    1 869 769    −600       +11 263       +90 795      +88 795
    Fig. 4 [3], n = 10^5, k = 3    1 799 240    +9 165     +21 028       —            +79 324
    Fig. 4 [3], n = 10^6, k = 1    21 891 874   +121 748   +240 375      +1 035 695   +1 015 695
    Fig. 4 [3], n = 10^6, k = 3    21 355 988   +49 994    +168 621      —            +751 581
    Tab. 2 [4], n = 10^4, k = 1    152 573      +1 125     +2 311        +10 264      +10 064
    Tab. 2 [4], n = 10^4, k = 3    146 485      +1 136     +2 322        —            +8 152
    Tab. 2 [4], n = 10^6, k = 1    21 975 912   +37 710    +156 337      +951 657     +931 657
    Tab. 2 [4], n = 10^6, k = 3    21 327 478   +78 504    +197 131      —            +780 091

Table 3:
Comparison of estimates from this paper where we use the average case for ExternalHeapsort (estimate) and where we use the worst case for ExternalHeapsort (upper bound), with Theorem 6 of [3] (CC) and Theorem 1 of [4] (DW); shown is the difference between each estimate and the observed average.
Comparison with previously reported comparison counts. Both [3] and [4] report averaged comparison counts from running-time experiments. We compare them in Table 3 against the estimates from our results and from previous analyses. We consider both the proven upper bound from above and the heuristic estimate using ExternalHeapsort's average case. While the approximation is not very accurate for n = 100 (for all analyses), for larger n our estimate is correct up to the first three digits, whereas previous upper bounds have errors almost one order of magnitude bigger. Our provable upper bound lies somewhere in between. Note that we can expect even our estimate to still be on the conservative side, because we used the supremum of the periodic linear term for ExternalHeapsort.
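The sort-down phase of ExternalHeapsort described above is easy to model: each extraction lets a "hole" travel from the root to the bottom, promoting the larger child at a cost of one comparison per visited pair of children, and the bottom hole is filled with a sentinel (in QuickHeapsort, a buffer element plays this role). The following Python sketch is our own illustrative model, not the paper's code; it counts exactly these comparisons.

```python
def external_heapsort(a):
    """Model of ExternalHeapsort: repeatedly extract the maximum and let
    the resulting hole travel from the root to the bottom of the heap,
    promoting the larger child (one comparison per pair of children).
    The bottom hole is filled with a -inf sentinel.  Returns (sorted, #cmps)."""
    NEG_INF = float("-inf")
    h = list(a)
    n = len(h)
    cmps = 0

    def sift_down(i):
        nonlocal cmps
        while 2 * i + 1 < n:
            c = 2 * i + 1
            if c + 1 < n:
                cmps += 1                 # pick the larger child
                if h[c + 1] > h[c]:
                    c += 1
            cmps += 1                     # compare parent with that child
            if h[i] >= h[c]:
                break
            h[i], h[c] = h[c], h[i]
            i = c

    for i in range(n // 2 - 1, -1, -1):   # classical heap construction
        sift_down(i)

    out = []
    for _ in range(n):
        out.append(h[0])
        i = 0
        while 2 * i + 1 < n:              # hole moves root -> bottom
            c = 2 * i + 1
            if c + 1 < n:
                cmps += 1                 # one comparison per visited pair
                if h[c + 1] > h[c]:
                    c += 1
            h[i] = h[c]                   # promote larger child, no compare
            i = c
        h[i] = NEG_INF                    # stopper fills the bottom hole
    out.reverse()                         # extracted in decreasing order
    return out, cmps
```

Each extraction costs roughly ⌊lg n⌋ comparisons regardless of the data, which is the "best case equals worst case" behavior exploited in the proof of Corollary 7.3.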
8. Variance of QuickXsort
If an algorithm's cost regularly exceeds its expectation by far, good expected performance is not enough. In this section, we approximate the variance of the number of comparisons in QuickXsort under certain restrictions. Similar to the expected costs, we prove a general transfer theorem for the variance. We then review results on the variance of the number of comparisons in Mergesort and ExternalHeapsort, the two main methods of interest for QuickXsort, and discuss the application of the transfer theorem.

The purpose of this section is to explore what influence the distribution of the costs of X has on QuickXsort. We assume a constant sample size k in this section. Formally, our result is the following.

Theorem 8.1 (Variance of QuickXsort):
Assume X is a sorting method whose comparison cost has expectation x(n) = a·n lg n + b·n ± O(n^{1−ε}) and variance v_X(n) = a_v·n² + O(n^{2−ε}) for a constant a_v and ε > 0; the case a_v = 0 is allowed. Moreover, let QuickXsort preserve randomness. Assuming the technical conjecture t_v(n) = Θ(n²) (see below), median-of-k QuickXsort is a sorting method whose comparison cost has variance v(n) ∼ c·n² for an explicitly computable constant c that depends only on k, α and a_v.

Remark 8.2: We could confirm the conjecture mentioned above for all tried combinations of values for α and k, but were not able to prove it in the general setting, so we have to formally keep it as a prerequisite. We have no reason to believe it is not always fulfilled.

Proof of Theorem 8.1:
This transfer theorem can be proven with techniques similar to those for the expected value, but the computations become lengthier.

Distributional recurrence. We can precisely characterize the distribution of the random number of comparisons, C_n, that we need to sort an input of size n. We will generally denote random variables by capital letters (C_n) and their expectations by lowercase letters (c(n)). We describe the distribution of C_n in the form of a distributional recurrence, i.e., a recursive description of the distribution of the family of random variables (C_n)_{n∈N}. From this, we can mechanically derive recurrence equations for the moments of the distribution, and in particular for the variance. We have

    C_n  D=  [n − k + s(k) + (1 − A₁)·X_{J₁} + (1 − A₂)·X̃_{J₂}]  +  A₁·C_{J₁} + A₂·C̃_{J₂},   (n > w),   (22)

where the bracketed part is the toll T_n, and (X_n)_{n∈N} is the family of (random) numbers of comparisons to sort a random permutation of n elements with X. (C̃_n)_{n∈N} and (X̃_n)_{n∈N} are independent copies of (C_n)_{n∈N} and (X_n)_{n∈N}, respectively, and these are also independent of (J₁, J₂); in the following we omit the tildes for legibility and implicitly assume that all terms in an equation from the same family each come from their own independent copy. Base cases for small n are given by the recursion-stopper method and are immaterial for the asymptotic regime (for constant w).
Recurrence for the second moment.
We start with the elementary equation Var[C_n] = E[C_n²] − E[C_n]². Of course, E[C_n] = c(n), which we already know by Theorem 6.1. From the distributional recurrence, we can compute the second moment m(n) = E[C_n²] as follows: square both sides of Equation (22) and take expectations; that leaves m(n) on the left-hand side. To simplify the right-hand side, we use the law of total expectation to first take expectations conditional on J₁ (which also fixes J₂ = n − 1 − J₁) and then take expectations over J₁. We find

    E[C_n² | J]  =  E[(T_n + Σ_{r=1}² A_r C_{J_r})² | J]
                 =  E[T_n² | J] + Σ_{r=1}² E[A_r C_{J_r}² | J] + 2 E[A₁A₂ C_{J₁} C_{J₂} | J] + 2 Σ_{r=1}² E[T_n · A_r C_{J_r} | J],

using (A_r)² = A_r and A₁A₂ = 0. Since A₁ and A₂ are fully determined by J₁, and since T_n and C_{J_r} are conditionally independent given J₁, this is

    =  E[T_n² | J] + Σ_{r=1}² A_r m(J_r) + 2 E[T_n | J] Σ_{r=1}² A_r c(J_r).

We now take expected values also w.r.t. J₁ and exploit the symmetry J₁ D= J₂. We write A := A₁ and J := J₁; we find

    m(n)  =  2 E[A·m(J)] + E[T_n²] + 2 Σ_{r=1}² E_J[A_r E[T_n | J] c(J_r)]  =  2 E[A·m(J)] + t_m(n),

where t_m(n) collects all terms not involving m. To continue, we have to unfold t_m(n) a bit more. We start with the simplest part, the conditional expectation of T_n. For constant k, we find

    E[T_n | J]  =  E[n ± O(1) + Σ_{r=1}² (1 − A_r) X_{J_r} | J]
                =  n + Σ_{r=1}² (1 − A_r) E[X_{J_r} | J] ± O(1)
                =  n + Σ_{r=1}² (1 − A_r) x(J_r) ± O(1).
So we find for the last term in the equation for m(n):

    2 Σ_{r=1}² E_J[A_r E[T_n | J] c(J_r)]
      =  2 Σ_{r=1}² E_J[A_r (n + Σ_{ℓ=1}² (1 − A_ℓ) x(J_ℓ) ± O(1)) c(J_r)]
      =  2n Σ_{r=1}² E[A_r c(J_r)] + 2 Σ_{r=1}² E[A_r x(J_{3−r}) c(J_r)] ± O(n log n)
      =  4n E[A·c(J)] + 4 E[A·c(J)·x(n − 1 − J)] ± O(n log n).

It remains to compute the second moment of T_n:

    E[T_n²]  =  E[(n(1 ± O(n^{−1})) + Σ_{r=1}² (1 − A_r) X_{J_r})²]
             =  Σ_{r=1}² E_{J_r}[(1 − A_r) E[X_{J_r}² | J_r]] + n²(1 ± O(n^{−1})) + 2n(1 ± O(n^{−1})) Σ_{r=1}² E[(1 − A_r) x(J_r)],

where the cross term vanishes since (1 − A₁)(1 − A₂) = 0. Denoting Var[X_n] by v_X(n) and using E[X²] = E[X]² + Var[X], this is

             =  2 E[(1 − A)(x(J)² + v_X(J))] + n² + 4n E[(1 − A) x(J)] ± O(n^{2−ε})
             =  2 E[(1 − A) x(J)²] + 2 a_v E[(1 − A) J²] + 4n E[(1 − A) x(J)] + n² ± O(n^{2−ε}).

We can see here that the variance of X only influences lower-order terms of the variance of
QuickXsort when v_X(n) = o(n²).

Recurrence for the variance. We now have all ingredients together to compute an asymptotic solution of the recurrence for m(n), the second moment of the costs. However, it is more economical to first subtract c(n)² on the level of recurrences, since many terms will cancel. We thus now derive from the above results a direct recurrence for v(n) = Var[C_n]:

    v(n)  =  m(n) − c(n)²
          =  2 E[A·v(J)] + 2 E[A·c(J)²] − c(n)² + t_m(n),            (23)

where the terms after 2 E[A·v(J)] form the new toll t_v(n). For brevity, we write J̄ = n − 1 − J. We compute, using c(n) = x(n) + qn ± O(n^{1−ε}):

    t_v(n)  =  2 E[A(x(J) + qJ ± O(n^{1−ε}))²] − (x(n) + qn ± O(n^{1−ε}))²
               + 4n E[A(x(J) + qJ ± O(n^{1−ε}))] + 4 E[A(x(J) + qJ ± O(n^{1−ε})) x(J̄)]
               + 2 E[(1 − A) x(J)²] + 2 a_v E[(1 − A) J²] + 4n E[(1 − A) x(J)] + n² ± O(n^{2−ε})

            =  2 E[A·x(J)²] + 4q E[A·J·x(J)] + 2q² E[A·J²] − x(n)² − 2q·x(n)·n − q²n²
               + 4n E[A·x(J)] + 4qn E[A·J] + 4 E[A·x(J)·x(J̄)] + 4q E[A·J·x(J̄)]
               + 2 E[(1 − A) x(J)²] + 2 a_v E[(1 − A) J²] + 4n E[(1 − A) x(J)] + n² ± O(n^{2−ε} log n)

            =  2 E[x(J)²] + 4 E[A·x(J)·x(J̄)] − x(n)² + 4n E[x(J)]
               + 4q E[A·J·(x(J) + x(J̄))] − 2q·x(n)·n + n²
               + 2q² E[A·J²] + 2 a_v E[(1 − A) J²] + 4qn E[A·J] − q²n² ± O(n^{2−ε} log n).

At this point, the only route to make progress seems to be to expand all occurrences of x into x(n) = a·n lg n + b·n + O(n^{1−ε}) and compute the expectations. For that, we use the approximation by incomplete beta integrals that we introduced in Section 6.2 to compute the expectations of the form E[g(J)], where g only depends on J.
Writing z = J/n and z̄ = 1 − z = J̄/n, we can expand all occurring functions g as follows:

    J² lg²(J)        =  z²·n² lg² n + 2 z² lg z·n² lg n + z² lg² z·n²
    J² lg J          =  z²·n² lg n + z² lg z·n²
    J J̄ lg(J) lg(J̄)  =  z z̄·n² lg² n + z z̄ (lg z + lg z̄)·n² lg n + z z̄ lg(z) lg(z̄)·n²
    J J̄ lg J         =  z z̄·n² lg n + z z̄ lg z·n²

The right-hand sides are all Hölder-continuous functions in z ∈ [0, 1]. The full expression for t_v(n) is too big to state here in full, but it can easily be found and evaluated for fixed values of t by computer algebra. We provide a Mathematica notebook for this step as supplementary material [51].

The incomplete beta integrals resulting from the rewritten expectations are in principle solvable symbolically by partial integration for given values of t and can be expressed using special functions. A general closed form seems out of reach, though. We will list numeric approximations for small sample sizes below.

Solution of the recurrence.
Although the above expression for t_v(n) contains terms of order n² lg² n and n² lg n, in all examined cases these higher-order terms canceled and left t_v(n) ∼ c̃·n² for an explicitly computable constant c̃ > 0. We conjecture that this is always the case, but we did not find a simple proof. We therefore need the technical assumption that indeed t_v(n) = Θ(n²). Under that assumption, we obtain an asymptotic approximation for v(n) from Equation (23) using the CMT (Theorem 3.7) with σ = 2 and τ = 0. Note that the shape function w(z) of the recurrence is exactly the same as for the expected costs (see Section 6.4). We thus compute

    H  =  1 − ∫₀¹ z² w(z) dz
       =  1 − ∫₀¹ [z inside the recursion range] · z^{t+2}(1 − z)^t / B(t+1, t+1) dz
       =  1 − (t+2)(t+1)/((k+2)(k+1)) · ∫₀¹ [z inside the recursion range] · z^{t+2}(1 − z)^t / B(t+3, t+1) dz,

which again evaluates to a combination of incomplete beta integrals I_{·,·}(t+3, t+1).   (24)

Since (t+2)/(k+2) ≤ 1 and the integral over the entire unit interval would be exactly 1, we have H > 0 for all α and t. So by Case 1 of the CMT, the variance of QuickXsort is

    v(n) ∼ t_v(n)/H,

and in particular it is quadratic in n, and the leading coefficient can be computed symbolically. □

Below, we give the leading-term coefficient of the variance (i.e., c in the terminology of Theorem 8.1) for several values of α and k. We fix a = 1, i.e., we consider methods X with x(n) = n lg n + b·n ± O(n^{1−ε}); the constant b of the linear term in x(n) does not influence the leading term of the variance. In the results, we keep a_v as a variable, although for the methods X of most interest, namely Mergesort and ExternalHeapsort, we actually have a_v = 0.

                 k = 1               k = 3               k = 9
    α = 1        0.… + 0.…·a_v      0.… + 0.…·a_v      0.… + 0.…·a_v
    α = 1/…      0.… + 0.…·a_v      0.… + 0.…·a_v      0.… + 0.…·a_v
    α = 1/…      0.… + 0.…·a_v      0.… + 0.…·a_v      0.… + 0.…·a_v

Table 4: Leading-term coefficients of the variance of QuickXsort.

First note that since
Mergesort's costs differ by O(n) between the best and worst case, the variance is obviously in O(n²). A closer look reveals that Mergesort's costs are in fact much more concentrated and the variance is of order Θ(n): for a given size n, the overall costs are the sum of independent contributions from the individual merges, each of which has constant variance. Indeed, the only source of variability in the merge costs is that we do not need further comparisons once one of the two runs is exhausted.

More precisely, for standard top-down mergesort, X_n can be characterized by (see [16])

    X_n  D=  X_{⌈n/2⌉} + X_{⌊n/2⌋} + n − L_{⌈n/2⌉,⌊n/2⌋},
    P[L_{m,n} ≥ ℓ]  =  ((m+n−ℓ choose m) + (m+n−ℓ choose n)) / (m+n choose m)   (for ℓ ≥ 1),

where L_{m,n} is the number of elements left over in one run when the other is exhausted. Following Mahmoud [33, eq. (10.3), eq. (10.1)], we find that the variance of the cost of a single merge is constant:

    E[L_{m,n}]  =  m/(n + 1) + n/(m + 1),

and the second moment is similarly explicit, so that Var[L_{m,n}] = E[L_{m,n}²] − E[L_{m,n}]² is bounded by a constant for |m − n| ≤ 1. This gives O(n) for the variance of X_n. Precise asymptotic expansions have been computed by Hwang [26]:

    Var[X_n]  =  n·φ(lg n) ± o(n)

for a periodic function φ(x) ∈ [0.…, 0.…].
Mergesort variant that executes all merges in any case. Inparticular we assume the following folklore trick is not used: One can check (with one comparison) whetherthe two runs are already sorted prior to calling the merge routine and skip merging entirely if they are.This optimization leads to a linear best case and will increase the variance. QuickXsort – A Fast Sorting Scheme in Theory and Practice
Since the variance of Mergesort is subquadratic, Theorem 8.1 would be applied with a_v = 0, and we obtain, e.g., a variance of 0.…·n² for k = 1 and 0.…·n² for k = 3. Interestingly, these results do not depend on our choice for the constant b of the linear term of x(n).

Figure 8:
Exact values for the normalized standard deviation in QuickMergesort (computed from the exact recurrence for the second moment) and the asymptotic approximation from Table 4 (gray line). The x-axis shows the input size n (logarithmic); the y-axis is the standard deviation of the number of comparisons divided by n. The plots show different sample sizes.

The asymptotic approximations match empirical numbers quite well. There is still a noticeable difference in Figure 8, which compares them with exact values for small n computed from the recurrence. For large n, though, the accuracy is stunningly good; see Figure 14 in the experiments section.

Fine print.
Although our transfer theorem is perfectly valid and fits Monte Carlo simulations very well, it is formally not applicable to QuickMergesort. The reason for this are the tiny periodic fluctuations (w.r.t. n) in the costs of Mergesort, both in the expected costs and in their variance.

For the expected values, we could use upper and lower bounds for x(n) to derive upper and lower bounds for the costs of QuickXsort. Determining the precise influence of the fluctuations on QuickXsort's expected cost is an interesting topic for future research, but since the bounds are so close, the approach taken in this paper is certainly sufficient on practical grounds. For the variance, this is different: the variance of QuickMergesort is influenced by the periodic terms of the expected costs of Mergesort, and simple arguments do not yield rigorous bounds.

Intuitively, QuickMergesort acts as a smoothing on the costs of Mergesort, since the subproblem sizes are random. It is therefore to be expected that we find very smooth periodic influences of small amplitude. The fact that our estimate does not depend on b or on the precise variance of Mergesort at all gives hope that it is a very good approximation. But it remains a heuristic approximation.
9. QuickMergesort with base cases

In QuickMergesort, we can improve the number of comparisons even further by sorting small subarrays with yet another algorithm Z. The idea is to use Z only for tiny subproblems, so that even methods become viable that require extra space and have otherwise prohibitive cost for other operations like moves. Obvious candidates for Z are Insertionsort and MergeInsertion.

If we use O(log n) elements for the base cases of Mergesort, we have to call Z at most O(n/log n) times. In this case we can allow a quadratic running time for Z and still obtain only O((n/log n)·log² n) = O(n log n) overhead in QuickMergesort. We note that for the following result, we only need that the size of the base cases grows with n, but not faster than logarithmically.

We start by bounding the cost of Mergesort with base-case sorter Z. Reinhardt [41] proposes this idea using MergeInsertion for base cases of constant size and essentially states the following result, but does not provide a proof for it.
Theorem 9.1 (Mergesort with Base Case): Let Z be some sorting algorithm with z(n) = n lg n + (b ± ε)·n + o(n) comparisons on average and other operations taking at most O(n²) time. If base cases of size O(log n) are sorted with Z, Mergesort uses at most n lg n + (b ± ε)·n + o(n) comparisons and O(n log n) other instructions on average.

Proof:
Since Z uses z(n) = n lg n + (b ± ε)·n + o(n) comparisons on average, for every δ > 0 we have |z(n) − (n lg n + bn)| ≤ (ε + δ)·n for n large enough. Let k ≥ … and k/2 ≤ n ≤ k, and let x_k(n) denote the average-case number of comparisons of Mergesort with base cases of size (at most) k sorted with Z, i.e., x_k(n) = z(n) for n ≤ k. By induction we will show that

    |x_k(n) − (n lg n + bn)|  ≤  (ε + δ + 8/k)·n − e_k(n)

for n ≥ k/2. For k/2 ≤ n ≤ k this holds by hypothesis, so assume that n > k. We have

    x_k(n)  =  x_k(⌈n/2⌉) + x_k(⌊n/2⌋) + n − η(n)

for some η with 1 ≤ η(n) ≤ n (see e.g. [16, p. 676]). It follows that

    |x_k(n) − (n lg n + bn)|
      =  |x_k(⌈n/2⌉) + x_k(⌊n/2⌋) + n − η(n) − (n lg n + bn)|
      ≤  [inductive hypothesis]
         e_k(⌈n/2⌉) + e_k(⌊n/2⌋) + |⌈n/2⌉(lg⌈n/2⌉ + b) + ⌊n/2⌋(lg⌊n/2⌋ + b) + n − η(n) − (n lg n + bn)|
      ≤  e_k(n) − |⌈n/2⌉(lg(n/2) + b) + ⌊n/2⌋(lg(n/2) + b) + n − η(n) − (n lg n + bn)| + 2
      ≤  e_k(n) − η(n)  ≤  e_k(n).

Notice here that lg⌈n/2⌉ − lg(n/2) = O(1/n), as can easily be seen from the series expansion of the logarithm. By choosing k = lg n, the lemma follows. □
Mergesort with base cases can thus be very comparison-efficient, but it is an external algorithm. By combining it with QuickMergesort, we obtain an internal method with essentially the same comparison cost. Using the same route as in the proof of Corollary 7.1, we obtain the formal result.
Corollary 9.2 (QuickMergesort with Base Case): Let Z be some sorting algorithm with z(n) = n lg n + (b ± ε)·n + o(n) comparisons on average and other operations taking at most O(n²) time. If base cases of size Θ(log n) are sorted with Z, QuickMergesort uses at most n lg n + (b ± ε)·n + o(n) comparisons and O(n log n) other instructions on average.

Base cases of growing size always lead to a constant-factor overhead in running time if an algorithm with a quadratic number of total operations is used. Therefore, in the experiments we also consider constant-size base cases, which offer a slightly worse bound for the number of comparisons but are faster in practice. A modification of our proof above allows us to bound the impact on the number of comparisons, but we are facing a trade-off between comparisons and other operations, so the best threshold for Z depends on the type of data to be sorted and on the system on which the algorithms run.
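The construction behind Theorem 9.1 is easy to prototype: a top-down mergesort that hands subarrays below a threshold to a comparison-frugal base case Z (here binary insertion sort, as analyzed in Proposition 9.3 below). The following Python sketch is illustrative only; the names and the fixed threshold policy are our own, and comparisons are counted through a one-element list.

```python
def binary_insertion_sort(a, cnt):
    """Base case Z: insert each element by binary search, counting comparisons."""
    out = []
    for x in a:
        lo, hi = 0, len(out)
        while lo < hi:                   # binary search over the gaps
            mid = (lo + hi) // 2
            cnt[0] += 1
            if x < out[mid]:
                hi = mid
            else:
                lo = mid + 1
        out.insert(lo, x)                # quadratic moves, but few comparisons
    return out

def mergesort_z(a, threshold, cnt):
    """Top-down mergesort that sorts subarrays of size <= threshold with Z."""
    if len(a) <= threshold:
        return binary_insertion_sort(a, cnt)
    mid = len(a) // 2
    left = mergesort_z(a[:mid], threshold, cnt)
    right = mergesort_z(a[mid:], threshold, cnt)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        cnt[0] += 1
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

With a threshold of order log n, the base cases are called O(n/log n) times, so even Z's quadratic move count only adds O(n log n) work overall, while Z's better linear comparison coefficient carries over to the whole sort.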
We now study the average cost of the natural candidates for Z. We start with Insertionsort, since it is an elementary method and its analysis is used as part of our average-case analysis of MergeInsertion later. Recall that Insertionsort inserts the elements one by one into the already sorted sequence by binary search. For the average number of comparisons we obtain the following result.
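The per-insertion cost that the proof below builds on, x_Ins(k) = ⌈lg k⌉ + 1 − 2^⌈lg k⌉/k, can be cross-checked by explicitly counting binary-search comparisons for each of the k equally likely insertion positions. A small verification sketch in exact arithmetic (our own code, not the paper's):

```python
from fractions import Fraction

def binary_insert_cost(k, pos):
    """Comparisons used by binary search to find gap `pos` (0..k-1)
    among the k gaps of a sorted array holding k-1 elements."""
    lo, hi, cmps = 0, k - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        cmps += 1                 # compare the new element with element mid
        if pos <= mid:            # new element is smaller: search left half
            hi = mid
        else:
            lo = mid + 1
    return cmps

def x_ins(k):
    """Exact average over the k equally likely final positions."""
    return Fraction(sum(binary_insert_cost(k, p) for p in range(k)), k)
```

Each insertion costs ⌈lg k⌉ comparisons for some positions and ⌈lg k⌉ − 1 for the rest, which is exactly what the closed formula averages out.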
Proposition 9.3 (Average Case of Insertionsort): The sorting algorithm Insertionsort needs n lg n + c(n)·n + O(log n) comparisons on average, where c(n) ∈ [−1.389, −1.381].

Sorting base cases of logarithmic size in QuickMergesort with Insertionsort, we obtain the next result by Corollary 9.2:
Corollary 9.4 (QuickMergesort with Base Case Insertionsort): Median-of-√n QuickMergesort with Insertionsort base cases uses at most n lg n − 1.38·n + o(n) comparisons and O(n log n) other instructions on average.

Proof of Proposition 9.3:
First, we take a look at the average number of comparisons x_Ins(k) of inserting one element into a sorted array of k − 1 elements by binary insertion; this takes either ⌈lg k⌉ − 1 or ⌈lg k⌉ comparisons. There are k positions where the element to be inserted can end up, each of which is equally likely. For 2^⌈lg k⌉ − k of these positions, ⌈lg k⌉ − 1 comparisons are needed; for the other k − (2^⌈lg k⌉ − k) = 2k − 2^⌈lg k⌉ positions, ⌈lg k⌉ comparisons are needed. This means

    x_Ins(k)  =  ((2^⌈lg k⌉ − k)·(⌈lg k⌉ − 1) + (2k − 2^⌈lg k⌉)·⌈lg k⌉) / k
              =  ⌈lg k⌉ + 1 − 2^⌈lg k⌉/k.

Summing over the insertion of n elements:

    x_InsSort(n)  =  Σ_{k=1}^n x_Ins(k)
                  =  Σ_{k=1}^n (⌈lg k⌉ + 1 − 2^⌈lg k⌉/k)
                  =  [31, 5.3.1–(3)]  n·⌈lg n⌉ − 2^⌈lg n⌉ + 1 + n − Σ_{k=1}^n 2^⌈lg k⌉/k.

We examine the last sum separately. As before, we write H_n = Σ_{k=1}^n 1/k = ln n + γ ± O(1/n) for the harmonic numbers, where γ is Euler's constant.

    Σ_{k=1}^n 2^⌈lg k⌉/k
      =  1 + Σ_{i=0}^{⌈lg n⌉−2} Σ_{ℓ=1}^{2^i} 2^{i+1}/(2^i + ℓ) + Σ_{ℓ=2^{⌈lg n⌉−1}+1}^{n} 2^⌈lg n⌉/ℓ
      =  1 + Σ_{i=0}^{⌈lg n⌉−2} 2^{i+1}·(H_{2^{i+1}} − H_{2^i}) + 2^⌈lg n⌉·(H_n − H_{2^{⌈lg n⌉−1}})
      =  Σ_{i=0}^{⌈lg n⌉−2} 2^{i+1}·(ln 2^{i+1} + γ − ln 2^i − γ) + (ln n + γ − ln 2^{⌈lg n⌉−1} − γ)·2^⌈lg n⌉ ± O(log n)
      =  ln 2 · Σ_{i=0}^{⌈lg n⌉−2} 2^{i+1} + (lg n·ln 2 − (⌈lg n⌉ − 1)·ln 2)·2^⌈lg n⌉ ± O(log n)
      =  ln 2 · (2·(2^{⌈lg n⌉−1} − 1) + (lg n − ⌈lg n⌉ + 1)·2^⌈lg n⌉) ± O(log n)
      =  ln 2 · (2 + lg n − ⌈lg n⌉)·2^⌈lg n⌉ ± O(log n).

The error term O(log n) is due to the fact that each of the ⌈lg n⌉ summands 2^{i+1}·O(2^{−i}) contributes only O(1). Hence,

    x_InsSort(n)  =  n·⌈lg n⌉ − 2^⌈lg n⌉ + n − ln 2·(2 + lg n − ⌈lg n⌉)·2^⌈lg n⌉ + O(log n).

In order to obtain a numeric bound for x_InsSort(n), we compute (x_InsSort(n) − n lg n)/n and replace ⌈lg n⌉ − lg n by x. This yields the function

    x  ↦  x − 2^x + 1 − ln 2·(2 − x)·2^x,

which oscillates between −1.381 and −1.389 for 0 ≤ x < 1; see also Figure 9. For x = 0, its value is −2 ln 2 ≈ −1.386. □

MergeInsertion by Ford and Johnson [17] is one of the best sorting algorithms in terms of the number of comparisons. Applying it to sort base cases of
QuickMergesort yields even better results than Insertionsort. We give a brief description of the algorithm and analyze its average case for a simplified version. Algorithmically, MergeInsertion(s₀, ..., s_{n−1}) can be described as follows (an intuitive example for n = 21 can be found in [31]):

Figure 9: The periodic function in Insertionsort, x ↦ x − 2^x + 1 − ln 2·(2 − x)·2^x, for x = lg n − ⌊lg n⌋ ∈ [0, 1).
1. Arrange the input such that s_i ≥ s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋ with one comparison per pair. Let a_i = s_i and b_i = s_{i+⌊n/2⌋} for 0 ≤ i < ⌊n/2⌋, and b_⌊n/2⌋ = s_{n−1} if n is odd.

2. Sort the values a_0, …, a_{⌊n/2⌋−1} recursively with MergeInsertion.

3. Rename the solution as follows: b_0 ≤ a_0 ≤ a_1 ≤ ⋯ ≤ a_{⌊n/2⌋−1} and insert the elements b_1, …, b_{⌈n/2⌉−1} via binary insertion, following the ordering b_2, b_1; b_4, b_3; b_10, b_9, …, b_5; …; b_{t_{k−1}−1}, …, b_{t_{k−2}}; b_{t_k−1}, … into the main chain, where t_k = (2^{k+1} + (−1)^k)/3. This ordering guarantees at most k comparisons for the elements b_{t_{k−1}}, …, b_{t_k−1}.

While the description is simple, MergeInsertion is not easy to implement efficiently because of the different renamings, the recursion, and the insertion in the sorted list. Our proposed implementation of
MergeInsertion is based on a tournament tree representation with weak heaps as in [7, 9]. It uses quadratic time and requires n lg n + n extra bits. When inserting some of the b_i with t_{k−1} ≤ i ≤ t_k − 1, either k or k − 1 comparisons are needed. During an actual execution of the algorithm, it might happen that only k − 1 comparisons are used. In the simplified variant we analyze, the elements of the k-th block (i.e., b_{t_k−1}, b_{t_k−2}, …, b_{t_{k−1}}) are always inserted into the same number of elements. Thus, for the elements of the k-th block always k comparisons are used – except for the last block b_{⌈n/2⌉−1}, …, b_{t_{k−1}}. In our experiments we evaluate the simplified and the original variant.

Theorem 9.5 (Average Case of MergeInsertion):
Simplified MergeInsertion needs n lg n − c(n)·n + O(log n) comparisons on average, where c(n) ≥ 1.3999.

When applying MergeInsertion to sort base cases of size O(log n) in QuickMergesort, we obtain the next corollary from Corollary 9.2 and Theorem 9.5.
Corollary 9.6 (QuickMergesort with Base Case MergeInsertion):
Median-of-√n QuickMergesort with MergeInsertion for base cases needs at most n lg n − 1.3999n + o(n) comparisons and O(n log n) other instructions on average.

Instead of growing-size base cases, we can also sort constant-size base cases with MergeInsertion. When the size of the base cases is reasonably small, we can hard-code the MergeInsertion algorithm to get good practical performance combined with a lower number of comparisons than plain QuickMergesort. In our experiments we also test one variant where subarrays of up to nine elements are sorted with MergeInsertion.

Proof of Theorem 9.5:
According to Knuth [31], MergeInsertion requires at most

W(n) = n lg n − (3 − lg 3)·n + n·(y + 1 − 2^y) + O(log n)

comparisons in the worst case, where y = y(n) = ⌈lg(3n/4)⌉ − lg(3n/4) ∈ [0, 1). Let F(n) denote the average number of comparisons of the insertion steps of MergeInsertion, i.e., all comparisons minus the number of comparisons P(n) for forming pairs (during all recursion steps). It is easy to see that P(n) = n − O(log n) (indeed, P(n) = n − 1 if n is a power of two); moreover, it is independent of the actual input permutation. We obtain the recurrence relation

F(n) = F(⌊n/2⌋) + G(⌈n/2⌉), with
G(m) = (k_m − α_m)·(m − t_{k_m−1}) + Σ_{j=1}^{k_m−1} j·(t_j − t_{j−1}),

with k_m such that t_{k_m−1} ≤ m < t_{k_m} and some α_m ∈ [0, 1] (recall that t_k = (2^{k+1} + (−1)^k)/3). Inserting an element with index below t_{k_m−1} requires always the same number of comparisons. Thus, the term Σ_{j=1}^{k_m−1} j·(t_j − t_{j−1}) is independent of the data. However, inserting an element after t_{k_m−1} may either need k_m or k_m − 1 comparisons; this is where the term α_m comes from. Note that α_m only depends on m. We split F(n) into F₁(n) + F₂(n) with

F₁(n) = F₁(⌊n/2⌋) + G₁(⌈n/2⌉) and G₁(m) = (k_m − α_m)·(m − t_{k_m−1}),
F₂(n) = F₂(⌊n/2⌋) + G₂(⌈n/2⌉) and G₂(m) = Σ_{j=1}^{k_m−1} j·(t_j − t_{j−1}),

in both cases with k_m such that t_{k_m−1} ≤ m < t_{k_m}. For the average case analysis, we have that F₂(n) is independent of the data. For n ≈ (4/3)·2^k we have G₁(n) ≈ 0, and hence, F₁(n) ≈ 0. Since otherwise G₁(n) is positive, this shows that approximately for n ≈ (4/3)·2^k the average case matches the worst case and otherwise it is better.

Now, we have to estimate F₁(n) for arbitrary n. We have to consider the calls to binary insertion more closely. To insert a new element into an array of m − 1 elements, either ⌈lg m⌉ − 1 or ⌈lg m⌉ comparisons are needed. For a moment assume that the element is inserted at every position with the same probability. Under this assumption the analysis in the proof of Proposition 9.3 is valid, which states that x_Ins(m) = ⌈lg m⌉ + 1 − 2^⌈lg m⌉/m comparisons are needed on average.

The problem is that in our case the probability at which position an element is inserted is not uniformly distributed. However, it is monotonically decreasing with the index in the array (indices as in the description in Section 9.2). Informally speaking, this is because if an element is inserted further to the left, then for the following elements there are more possibilities to be inserted than if the element is inserted on the right.

Now, binary insertion can be implemented such that for an odd number of positions the next comparison is made such that the larger half of the array is the one containing the positions with lower probabilities. (In our case, this is the part with the higher indices.) That means the less probable positions lie on rather longer paths in the search tree, and hence, the average path length is better than in the uniform case. Therefore, we may assume a uniform distribution as an upper bound in the following.

In each of the recursion steps we have ⌈n/2⌉ − t_{k_{⌈n/2⌉}−1} calls to binary insertion into sets of size at most ⌈n/2⌉ + t_{k_{⌈n/2⌉}−1} − 1, where t_{k_{⌈n/2⌉}−1} ≤ ⌈n/2⌉ < t_{k_{⌈n/2⌉}}. We write u_{⌈n/2⌉} = t_{k_{⌈n/2⌉}−1}. Hence, for inserting one element, the difference between the average and the worst case is

2^{⌈lg(⌈n/2⌉ + u_{⌈n/2⌉})⌉}/(⌈n/2⌉ + u_{⌈n/2⌉}) − 1.
Summing up, we obtain for the average savings S(n) = W(n) − (F(n) + P(n)) (recall that P(n) is the number of comparisons for forming pairs) w.r.t. the worst case number W(n) the recurrence

S(n) ≥ S(⌊n/2⌋) + (⌈n/2⌉ − u_{⌈n/2⌉}) · (2^{⌈lg(⌈n/2⌉ + u_{⌈n/2⌉})⌉}/(⌈n/2⌉ + u_{⌈n/2⌉}) − 1).

For m ∈ R_{>0} we write m = 2^{ℓ_m − lg 3 + x} with ℓ_m ∈ Z and x ∈ [0, 1), and we set

f(m) = (m − 2^{ℓ_m − lg 3}) · (2^{ℓ_m}/(m + 2^{ℓ_m − lg 3}) − 1).

Recall that t_k = (2^{k+1} + (−1)^k)/3; thus, u_m = t_{k_m−1} = 2^{ℓ_m − lg 3} ± 1/3 and k_m − 1 = ℓ_m except for the case m = t_k for some odd k ∈ Z. Assume m ≠ t_k for any odd k ∈ Z; then we have

⌈lg(m + u_m)⌉ = ⌈lg(2^{ℓ_m − lg 3 + x} + 2^{ℓ_m − lg 3})⌉ = ℓ_m + ⌈lg((2^x + 1)/3)⌉ = ℓ_m

and, hence, f(m) = (m − u_m) · (2^{⌈lg(m + u_m)⌉}/(m + u_m) − 1). On the other hand, if m = t_k for some odd k ∈ Z, we have k_m = ℓ_m and

f(t_k) ≤ t_k · (2^k/(t_k + 2^k/3) − 1) = t_k · (3·2^k/(3·2^k − 1) − 1) = t_k/(3·2^k − 1) ≤ 1/4.

Altogether this implies that f(m) and (m − u_m) · (2^{⌈lg(m + u_m)⌉}/(m + u_m) − 1) differ by at most some constant (as before u_m = t_{k_m−1}). Furthermore, f(m) and f(m + 1/2) differ by at most a constant. Hence, we have:

S(n) ≥ S(n/2) + f(n/2) ± O(1).

Since we have f(n/2) = f(n)/2, this resolves to

S(n) ≥ Σ_{i>0} f(n/2^i) ± O(log n) = Σ_{i>0} f(n)/2^i ± O(log n) = f(n) ± O(log n).

Figure 10: The periodic function in MergeInsertion, x ↦ (3 − lg 3) − (2 − x − 2^{1−x}) + (1 − 2^{−x}) · (3/(2^x + 1) − 1) for x = lg(3n) − ⌊lg(3n)⌋ ∈ [0, 1).

With n = 2^{k − lg 3 + x} this means

S(n)/n = ((2^{k − lg 3 + x} − 2^{k − lg 3})/2^{k − lg 3 + x}) · (2^k/(2^{k − lg 3 + x} + 2^{k − lg 3}) − 1) ± O(log n / n)
       = (1 − 2^{−x}) · (3/(2^x + 1) − 1) ± O(log n / n).

Recall that we wish to compute F(n) + P(n) ≤ W(n) − S(n). Writing F(n) + P(n) = n lg n − c(n)·n with c(n) ∈ O(1), we obtain with [31, 5.3.1 Ex. 15]

c(n) = −(F(n) + P(n) − n lg n)/n ≥ (3 − lg 3) − (y + 1 − 2^y) + S(n)/n,

where y = ⌈lg(3n/4)⌉ − lg(3n/4) ∈ [0, 1), i.e., n = 2^{ℓ − lg 3 − y} for some ℓ ∈ Z. With y = 1 − x it follows

c(n) ≥ (3 − lg 3) − (1 − x + 1 − 2^{1−x}) + (1 − 2^{−x}) · (3/(2^x + 1) − 1) > 1.3999.   (25)

This function reaches its minimum in [0, 1) for x = lg(ln 8 − 1 + √((1 − ln 8)² − 1)) ≈ 0.571. □

Remark 9.7 (Worst n for MergeInsertion): We know that for
Mergesort the optimal input sizes are powers of two. Is the same true for MergeInsertion? We know that for the worst case, the best n are (close to) (4/3)·2^k for an integer k. For the average case, we only have the upper bound of Equation (25). Nevertheless, this should give a reasonable approximation. It is not difficult to observe that c(2^k) = 1.4: For the linear coefficient e(n) in the worst case costs, W(n) = n lg n − e(n)·n + O(log n), we have e(2^k) = (3 − lg 3) − (y + 1 − 2^y), where y = ⌈lg((3/4)·2^k)⌉ − lg((3/4)·2^k). We know that y can be rewritten as y = ⌈lg 3 + lg(2^k/4)⌉ − (lg 3 + lg(2^k/4)) = ⌈lg 3⌉ − lg 3 = 2 − lg 3. Hence, we have e(2^k) = 4/3. Finally, we are interested in the value W(n) − S(n); for n = 2^k its linear term is −(4/3)·n − (1/15)·n = −1.4·n.

Thus, for powers of two the proof of Theorem 9.5 gives almost the worst bounds, so presumably these are among the worst input sizes for MergeInsertion (which also can be seen from the plot in Figure 10).
Remark 9.8 (Better bounds?):
Can one push the coefficient −1.3999 even further? Clearly, the non-simplified version of MergeInsertion will have a coefficient below −1.3999, as we can see in our experiments in Figure 11. A formal proof is lacking, but it should not be very difficult.

For the simplified version studied here, the empirical numbers from Section 10 seem to suggest that our bound is tight. However, there is one step in the proof of Theorem 9.5 which is not tight (otherwise, we lose only O(log n)): in order to estimate the costs of the binary search, we approximated the probability distribution where the elements are inserted by a uniform distribution. We conjecture that the difference between the approximation and the real values is a very small linear term, meaning that the actual coefficient of the linear term can be still just above or below −1.3999.

Also notice that the exact number of comparisons of the algorithm depends on a small implementation detail: in the binary search it is not completely specified which element to compare with first.

Iwama and Teruyama [27] propose an improvement of
Insertionsort, which inserts a (sorted) pair of elements in one step. The main observation is that the binary searches are good only if n is close to a power of two, but become more wasteful for other n. Inserting two elements together helps in such cases. On the other hand, MergeInsertion is much better than the upper bound in Equation (25) when n is close to (4/3)·2^k for an integer k (see Figure 10). Using their new (1,2)-Insertionsort unless n is close to 4/3 times a power of two, Iwama and Teruyama obtain a portfolio algorithm "Combination", which needs n lg n − c(n)·n + O(log n) comparisons on average, where c(n) ≥ 1.4112. Their method requires extra space O(n) (in a naive implementation), so that we can also use this algorithm as a base case sorter Z.

Corollary 9.9 (QuickMergesort with Base Case Combination):
Median-of- √ n QuickMergesort with Iwama and Teruyama’s
MergeInsertion/(1,2)-Insertionsort method for base cases needs at most n lg n − 1.4112n + o(n) comparisons and O(n log n) other instructions on average.

In contrast to the original method of Iwama and Teruyama, QuickMergesort with their method for base cases is an internal sorting method with O(n log n) running time. With this present champion in terms of the average-case number of comparisons, we close our investigation of asymptotically optimal sorting methods. In the following, we will take a look at their actual running times on realistic input sizes.
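Before turning to the experiments, the three steps of MergeInsertion from Section 9.2 can be made concrete in a compact sketch. The following C++ code is our own illustration (function names are hypothetical), not the weak-heap-based implementation used in the experiments: it sorts an index array, pairs up elements, recurses on the winners, and inserts the pending elements blockwise in the t_k order. For brevity, the binary insertion searches the whole chain instead of stopping at the partner element, so it is correct but not comparison-optimal.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Block boundaries t_k = (2^(k+1) + (-1)^k) / 3 of the insertion order.
static long tk(int k) { return ((1L << (k + 1)) + (k % 2 ? -1 : 1)) / 3; }

// Sorts `idx` so that vals[idx[0]] <= vals[idx[1]] <= ... holds.
static void merge_insertion(std::vector<int>& idx, const std::vector<int>& vals) {
    const int n = (int)idx.size();
    if (n <= 1) return;
    auto less = [&](int a, int b) { return vals[a] < vals[b]; };
    // Step 1: one comparison per pair; remember each winner's partner.
    std::vector<int> winners;
    std::unordered_map<int, int> loser;
    for (int i = 0; i + 1 < n; i += 2) {
        int w = idx[i], l = idx[i + 1];
        if (less(w, l)) std::swap(w, l);
        winners.push_back(w);
        loser[w] = l;
    }
    // Step 2: sort the winners (the a_i) recursively.
    merge_insertion(winners, vals);
    // Step 3: the partner of the smallest winner goes in front for free;
    // insert the remaining b_i blockwise, each block right to left.
    std::vector<int> chain{loser[winners[0]]};
    std::vector<int> pend;                  // pend[i-2] holds b_i for i >= 2
    for (std::size_t i = 0; i < winners.size(); ++i) {
        chain.push_back(winners[i]);
        if (i > 0) pend.push_back(loser[winners[i]]);
    }
    if (n % 2) pend.push_back(idx[n - 1]);  // odd leftover acts as the last b_i
    const long m = (long)pend.size() + 1;   // number of b_i overall
    for (int k = 2; tk(k - 1) < m; ++k) {
        for (long i = std::min(tk(k), m); i > tk(k - 1); --i) {
            int b = pend[i - 2];
            // The real algorithm bounds this search at b's former partner.
            chain.insert(std::upper_bound(chain.begin(), chain.end(), b, less), b);
        }
    }
    idx = chain;
}
```

This is only a correctness sketch; achieving the comparison counts analyzed above additionally requires the bounded binary search and, for an efficient implementation, the weak-heap tournament representation of [7, 9].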
10. Experiments
In this section, we report on studies with efficient implementations of our sorting methods. We conducted two sets of experiments: First, we compare our asymptotic approximations with experimental averages for finite n to assess the influence of lower order terms for realistic input sizes. Second, we conduct an extensive running-time study to compare QuickMergesort with other sorting methods from the literature.
Experimental setup.
We ran thorough experiments with implementations in C++ with different kinds of input permutations. The experiments were run on an Intel Core i5-2500K CPU (3.30GHz, 4 cores, 32KB L1 instruction and data cache, 256KB L2 cache per core and 6MB L3 shared cache) with 16GB RAM and operating system Ubuntu Linux 64bit version 14.04.4. We used GNU's g++ (4.8.4), optimizing with flags -O3 -march=native. For time measurements, we used std::chrono::high_resolution_clock; for generating random inputs, the Mersenne Twister pseudo-random generator std::mt19937. All experiments, except those in Figure 18, were conducted with random permutations of 32-bit integers.
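The setup just described can be condensed into a small timing harness like the following. This is a hedged sketch: ns_per_element and the use of std::sort as the algorithm under test are our illustration, not the benchmark code from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Returns nanoseconds per element for one run of std::sort (stand-in for
// the sorter under test) on a random permutation of {0, ..., n-1}.
double ns_per_element(std::size_t n, std::uint32_t seed) {
    std::vector<std::int32_t> a(n);
    std::iota(a.begin(), a.end(), 0);
    std::mt19937 gen(seed);                 // deterministic seed per run
    std::shuffle(a.begin(), a.end(), gen);
    auto t0 = std::chrono::high_resolution_clock::now();
    std::sort(a.begin(), a.end());
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / (double)n;
}
```

In the actual experiments, several such runs are averaged and, for small arrays, several arrays are sorted per time measurement so that at least 128MB of data are processed.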
Implementation details.
The code of our implementation of QuickMergesort as well as the other algorithms and our running time experiments is available at https://github.com/weissan/QuickXsort . In our implementation of QuickMergesort, we use the merging procedure from [15], which avoids branch mispredictions. We use the partitioner from the GCC implementation of std::sort. For all running time experiments in QuickMergesort we sort base cases of up to 42 elements with StraightInsertionsort. When counting the number of comparisons, StraightInsertionsort is deactivated and Mergesort is used down to arrays of size two. We also test one variant where base cases of up to nine elements are sorted by a hard-coded MergeInsertion variant. The median-of-√n variants are always implemented with α = 1/2 (the specific choice of α makes very little difference as the pivot is almost always very close to the median). Moreover, they switch to pseudomedian-of-25 (resp. pseudomedian-of-9, resp. median-of-3) pivot selection for n below 20 000 (resp. 800, resp. 100).
Figure 11: Coefficient of the linear term of the number of comparisons of MergeInsertion, its simplified variant, and Insertionsort (for the number of comparisons n lg n + bn, the value of b is displayed).
The first set of experiments uses our efficient implementations to obtain empirical estimates for the number of comparisons used.
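Comparison counts like the ones reported below can be obtained by wrapping the comparator so that each invocation is counted; the quantity plotted in the following figures is then b = (comparisons − n lg n)/n. The following is a minimal sketch of that bookkeeping (CountingLess and empirical_b are hypothetical names; the instrumentation in our repository may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Comparator wrapper that counts how often it is invoked.
struct CountingLess {
    std::uint64_t* counter;
    bool operator()(int a, int b) const { ++*counter; return a < b; }
};

// Sorts a random permutation of size n and returns the empirical linear
// term b = (comparisons - n lg n) / n of the comparison count.
double empirical_b(std::size_t n, std::uint32_t seed) {
    std::vector<int> a(n);
    std::iota(a.begin(), a.end(), 0);
    std::mt19937 gen(seed);
    std::shuffle(a.begin(), a.end(), gen);
    std::uint64_t comparisons = 0;
    // std::sort serves as the sorter under test in this sketch.
    std::sort(a.begin(), a.end(), CountingLess{&comparisons});
    return ((double)comparisons - (double)n * std::log2((double)n)) / (double)n;
}
```

Averaging empirical_b over many seeds gives the data points shown in Figures 11–13.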
Base case sorters.
First, we compare the different algorithms we use as base cases:
MergeInsertion , its simplified variant, and
Insertionsort. The results can be seen in Figure 11. It shows that both Insertionsort and MergeInsertion match the theoretical estimates very well. Moreover, MergeInsertion achieves results for the coefficient of the linear term in the range of [−1.43, −1.41] (for some values of n the results are even smaller than −1.43). The curves follow the theoretical predictions for Insertionsort (as predicted in Proposition 9.3) and MergeInsertion ((25) for the simple variant).
Number of comparisons of QuickXsort variants.
We counted the number of comparisons of different QuickMergesort variants. We also include an implementation of top-down Mergesort which agrees in all relevant details with the Mergesort part of our QuickMergesort implementation. The results can be seen in Figure 12, Figure 13, and Table 5. Here each data point is the average of 400 measurements (with deterministically chosen seeds for the random generator) and for each measurement at least 128MB of data were sorted – so the values for n < 2^25 are actually averages of more than 400 runs. From the actual number of comparisons we subtract n lg n and then divide by n. Thus, we get an approximation of the linear term b in the number of comparisons n lg n + bn + o(n).

Figure 12:
Coefficient of the linear term of the number of comparisons ((comparisons − n lg n)/n). Median-of-√n QuickMergesort is always with α = 1/2.

In Table 5, we also show the theoretical values for b. We can see that the actual number of comparisons matches the theoretical estimate very well. In particular, we experimentally confirm that the sublinear terms in our estimates are negligible for the total number of

(For these experiments we use a different experimental setup: depending on the size of the arrays, the displayed numbers are averages over 10 – 10 000 runs.)
Figure 13: Detailed view of the coefficient of the linear term of the number of comparisons ((comparisons − n lg n)/n); enlarged view of the bottom part of the plot.

Algorithm                               absolute comparisons          empirical b          theoretical b
                                        n = 2^22      n = 2^28        n = 2^22  n = 2^28   (n → ∞)
QuickMergesort (no sampling, α = 1/2)   …             …               …         …          …
QuickMergesort (mo3, α = 1)             89 181 407    7 314 997 953   −0.74     −0.75      …
QuickMergesort (mo3, α = 1/2)           …             …               …         …          …
QuickMergesort (mo3, α = 1/4)           …             …               …         …          …
QuickMergesort (mo-√n)                  87 003 696    7 177 302 635   −1.26     −1.26      …
QuickMergesort (mo-√n, IS)              86 527 879    7 146 103 511   −1.37     −1.38      …
QuickMergesort (mo-√n, MI)              86 408 550    7 138 442 729   −1.40     −1.41      ≤ −1.3999

Table 5: Absolute numbers of comparisons and linear term (b = (comparisons − n lg n)/n) of QuickMergesort variants for n = 2^22 and n = 2^28. We also show the asymptotic regime for b due to Table 2, Corollary 7.1, Corollary 9.4 and Corollary 9.6. The ±-terms for the theoretical b represent our lower and upper bound. For the experimental b, the ±-terms are the standard error of the mean (standard deviation of the measurements divided by the square root of the number of measurements).

comparisons (at least for larger values of n). The experimental number of comparisons of QuickMergesort with
MergeInsertion base cases is better than the theoretical estimate because we analyzed only the simplified variant of MergeInsertion.

For constant-size samples we see that even with 400 measurements the plots still look a bit bumpy, particularly for the largest inputs. Also, the difference to the theoretical values is larger for n = 2^28 than for n = 2^22 in Table 5 – presumably because the average is taken over more measurements for the smaller size (see setup above). We note however that the deviations are still within the range we could expect from the values of the standard deviation (both established theoretically and experimentally – Table 6): for 400 runs, we obtain a standard error of approximately 0.33n/√400 ≈ 0.017n.

Median-of-√n QuickMergesort uses almost the same number of comparisons as
Mergesort for larger values of n. This shows that the error terms in Theorem 5.1 are indeed negligible for practical purposes. The difference between experimental and theoretical values for median-of-√n QuickMergesort is due to the fact that the bound holds for arbitrary n, but the average costs of Mergesort are actually minimal for powers of two.

In Figure 13 we see experimental results for problem sizes which are not powers of two. The periodic coefficients of the linear terms of Mergesort, Insertionsort and MergeInsertion can be observed – even though these algorithms are only applied in QuickXsort (and the latter two even only as base cases in QuickMergesort). The version with constant size 9 base cases seems to combine periodic terms of Mergesort and MergeInsertion. For the median-of-three version, no significant periodic patterns are visible. We conjecture that the higher variability of subproblem sizes makes the periodic behavior disappear in the noise.
Standard deviation.
Since not only the average running time (or number of comparisons) is of interest, but also how far an algorithm deviates from the mean, we also measure the standard deviation of the running time and number of comparisons of QuickMergesort. For comparison we also measured two variants of Quicksort (which have a standard deviation similar to QuickMergesort): the GCC implementation of the C++ standard sorting function std::sort (GCC version 4.8.4) and a modified version where the pivot is excluded from recursive calls and which otherwise agrees with std::sort. We call the latter variant simply Quicksort as it is the more natural way to implement Quicksort. Moreover, from both variants we remove the final StraightInsertionsort and instead use Quicksort down to arrays of size three.

In order to get a meaningful estimate of the standard deviation we need many more measurements than for the mean values. Therefore, we ran each algorithm 40 000 times (for every input size) and compute the standard deviation of these runs. Moreover, for every measurement only one array of the respective size is sorted. For each measurement we use a pseudo-random seed (generated with std::random_device). The results can be seen in Table 6 and Figure 14.

In Table 6 we also compare the experiments to the theoretical values from Table 4. Although these theoretical values are only approximate (because Theorem 8.1 is not applicable to
QuickMergesort), they match the experimental values very well. This shows that the increase in variance due to the periodic functions in the linear term of the average number of comparisons is small. In terms of the standard deviation of the number of comparisons, median-of-√n is far better than median-of-3 (for large n the standard deviation is only around one hundredth).

All algorithms have a rather large standard deviation of running times for small inputs (which is no surprise because measurement imprecisions etc. play a bigger role here). Therefore, we only show the results for larger n. Also, while QuickMergesort without pivot sampling (α = 1/2) has the largest standard deviation of the number of comparisons, it also has the largest standard deviation of the running time for large n. This is probably due to the fact that (our implementation of) Reinhardt's merging method is not as efficient as the standard merging method. Although median-of-√n QuickMergesort has the smallest standard deviation of running times, the difference is by far not as large as for the number of comparisons. This indicates that other factors than the number of comparisons are more relevant for the standard deviation of running times.

We also see that including the pivot into recursive calls in Quicksort should be avoided. It increases the standard deviation of both the number of comparisons and the running time, and also the average number of comparisons (which we do not show here).
Algorithm                               empirical          theoretical
Quicksort (mo3)                         0.3385   0.3389    0.3390
Quicksort (std::sort, no SIS)           0.3662   0.3642    –
QuickMergesort (no sampling, α = 1/2)   0.6543   0.6540    0.6543
QuickMergesort (mo3, α = 1)             0.3353   0.3355    0.3345
QuickMergesort (mo3, α = 1/2)           0.3285   0.3257    0.3268
QuickMergesort (mo3, α = 1/4)           0.2643   0.2656    0.2698
QuickMergesort (mo-√n)                  0.0172   0.00365   –

Table 6: Experimental and theoretical values for the standard deviation divided by n of QuickMergesort and Quicksort for two input sizes (theoretical value for Quicksort by [23, p. 331] and for QuickMergesort by Table 4). Recall that for QuickMergesort, the theoretical value is only a heuristic approximation as Theorem 8.1 is not formally applicable with periodic linear terms. In light of this, the high precision of all these predictions is remarkable.
We compare QuickMergesort and QuickHeapsort with Mergesort (our own implementation which is identical to our implementation of QuickMergesort, but uses an external buffer of length n/2), Wikisort [38] (in-place stable Mergesort based on [30]), std::stable_sort (a bottom-up Mergesort, from GCC version 4.8.4), InSituMergesort [15] (which is essentially QuickMergesort where always the median is used as pivot), and std::sort (median-of-three Introsort, from GCC version 4.8.4).

All time measurements were repeated with the same 100 deterministically chosen seeds – the displayed numbers are the averages of these 100 runs. Moreover, for each time measurement, at least 128MB of data were sorted – if the array size is smaller, then for this time
Figure 14: Standard deviation of the number of comparisons (left) and the running times (right). For the number of comparisons, median-of-√n QuickMergesort and QuickMergesort without pivot sampling are out of range.

measurement several arrays have been sorted and the total elapsed time measured. The results for sorting 32-bit integers are displayed in Figure 16, Figure 15, and Figure 17, which all contain the results of the same set of experiments – we use three different figures because of the large number of algorithms and the different scales on the y-axes.

Figure 15 compares different
QuickMergesort variants to Mergesort and std::sort. In particular, we compare median-of-3 QuickMergesort with different values of α. While for the number of comparisons a smaller α was beneficial, it turns out that for the running time the opposite is the case: the variant with α = 1 is the fastest. Notice, however, that the difference is smaller than 1%. The reason is presumably that partitioning is faster than merging: for large α the problem sizes sorted by Mergesort are reduced and more "sorting work" is done by the partitioning. As we could expect, our Mergesort implementation is faster than all QuickMergesort variants – because it can do simple moves instead of swaps. Except for small n, std::sort beats QuickMergesort. However, notice that for large n the difference between std::sort and QuickMergesort without sampling is only approximately 5%; thus, it can most likely be bridged with additional tuning efforts (e.g. block partitioning [13]).

In Figure 16 we compare the
QuickMergesort variants with base cases with QuickHeapsort and std::sort. While QuickHeapsort still has an acceptable speed for small n, it becomes very slow when n grows. This is presumably due to the poor locality of memory accesses in Heapsort. The variants of QuickMergesort with growing-size base cases are always quite slow. This could be improved by sorting smaller base cases with the respective algorithm – but this opposes our other aim to minimize the number of comparisons. Only the version with constant-size MergeInsertion base cases reaches a speed comparable to std::sort (as can be seen also in Figure 15).
Figure 15: Running times of QuickMergesort variants, Mergesort, and std::sort when sorting random permutations of integers.

Figure 16: Running times of QuickMergesort variants with base cases and QuickHeapsort when sorting random permutations of integers.

Figure 17: Running times when sorting random permutations of integers.
Figure 17 shows median-of-3 QuickMergesort together with the other algorithms listed above. As we see, QuickMergesort beats the other in-place Mergesort variants InSituMergesort and Wikisort by a fair margin. However, be aware that QuickMergesort (as well as InSituMergesort) neither provides a guarantee for the worst case nor is it a stable algorithm.
Other data types.
While all the previous running time measurements were for sorting 32-bit integers, in Figure 18 we also tested two other data types: (1) 32-bit integers with a special comparison function which before every comparison computes the logarithm of the operands, and (2) pointers to records of 40 bytes which are compared by the first 4 bytes. Thus, in both cases comparisons are considerably more expensive than for standard integers. Each record is allocated on the heap with new – since we do this in increasing order and only shuffle the pointers, we expect them to reside in memory in close-to-sorted order. For both data types,
QuickMergesort with constant size
MergeInsertion base casesis the fastest (except when sorting pointers for very large n ). This is plausible since it combinesthe best of two worlds: on one hand, it has an almost minimal number of comparisons,on the other hand, it does not induce the additional overhead for growing size base cases.Moreover, the bad behavior of the other QuickMergesort variants (“without” base cases)is probably because we sort base cases up to 42 elements with
StraightInsertionsort –incurring many more comparisons (which we did not count in Section 10.1).
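The two comparison-heavy data types can be modeled roughly as follows. This is our own sketch with hypothetical names; the shift by one in log_less is an assumption of this sketch to keep the logarithm's argument positive, and it preserves the ordering of nonnegative keys.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// (1) 32-bit integers compared via their logarithms: the same order as the
// plain comparison, but each comparison pays for two log evaluations.
bool log_less(std::int32_t a, std::int32_t b) {
    return std::log((double)a + 1.0) < std::log((double)b + 1.0);
}

// (2) 40-byte records referenced by pointer, compared by the first 4 bytes
// (read here as an unsigned integer key; byte order is a sketch assumption).
struct Record {
    unsigned char data[40];
};

bool record_less(const Record* a, const Record* b) {
    std::uint32_t ka, kb;
    std::memcpy(&ka, a->data, 4);
    std::memcpy(&kb, b->data, 4);
    return ka < kb;
}
```

Passing such comparators to the sorters makes the cost of a single comparison dominate, which is exactly the regime where the comparison-optimal variants pay off.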
11. Conclusion
Sorting n elements remains a fascinating topic for computer scientists both from a theoretical and from a practical point of view. With QuickXsort we have described a procedure to convert an external sorting algorithm into an internal one, introducing only a lower-order term of additional comparisons on average.

We examined QuickHeapsort and QuickMergesort as two examples for this construction. QuickMergesort is close to the lower bound for the average number of comparisons and at the same time is efficient in terms of running time, even when the comparisons are
Figure 18: Running times when sorting random permutations of ints with a special comparison function (computing the log in every comparison – left) and pointers to records (right). Wikisort did not run for sorting pointers, and QuickMergesort with Insertionsort base cases is out of range.

fast.

Using
MergeInsertion to sort base cases of growing size for QuickMergesort, we derive an upper bound of n lg n − 1.3999n + o(n) comparisons for the average case. Using the recent algorithm by Iwama and Teruyama [27], this can be improved even further to n lg n − 1.4112n + o(n), without causing the overall number of operations to become more than O(n log n). Thus, the average of our best implementation has a proven gap of at most 0.0315n + o(n) comparisons to the lower bound. Of course, there is still room for closing the gap to the lower bound of n lg n − 1.4427n + O(log n) comparisons.

This illustrates one underlying strength of the framework architecture of QuickXsort: by applying the transfer results as shown in this paper,
QuickXsort directly participates in advances to the performance of algorithm X. Moreover, our experimental results suggest that the bound of n lg n − 1.4112n + O(log n) element comparisons may be beaten at least for some values of n. This very close gap between the lower and upper bound, manifested in the second-order (linear) term, makes the sorting problem a fascinating topic and mainstay for the analysis of algorithms in general.

We were also interested in the practical performance of QuickXsort and studied variants with smaller sampling sizes for the pivot in great detail. Besides average-case analyses, variances were analyzed. The established close mapping of the theoretical results with the empirical findings should be taken as a convincing argument for the preciseness of the mathematical derivations.
Open questions.
Below, we list some possibilities for extensions of this work.

• By Theorem 5.1, sample sizes of Θ(√n) are optimal for the average number of comparisons among all polynomial-size samples. However, it remains open whether Θ(√n) sample sizes are also optimal among all (including non-polynomial) sample sizes.

• In all theorems, we only use Θ (or O) notation for sublinear terms and only give upper and lower bounds for the periodic linear terms. Exact formulas for the average number of comparisons of QuickXsort are still open and would also be a tool to find the exact optimal sample sizes.

• In this work the focus was on expected behavior. Nevertheless, in practice, guarantees for the worst case are often also desired. In Theorem 5.7, we took a first step towards such guarantees. Moreover, in [10], we examined the same approach in more detail. Still, there remain many possibilities for good worst-case guarantees to investigate.

• In Theorem 8.1 we needed the technical conjecture that the variance is in O(n²), since we could only show it for special values of k and α. Hence, it remains to find a general proof (or disproof) that the variance is always in O(n²) for constant-size samples. This issue becomes even more interesting when fluctuations in the expected costs of X are taken into account.

• What is the order of growth of the variance of QuickXsort for growing-size samples for pivot selection?

• We only analyzed the simplified variant of
MergeInsertion. The average number of comparisons of the original variant is still an open problem and seems rather difficult to attack. Nevertheless, better bounds than those for the simplified version should be within reach.

• Further future research avenues are to improve the empirical behavior for large-scale inputs and to study options for parallelization.
References

[1] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.

[2] Hans-Juergen Boehm, Russell R. Atkinson, and Michael F. Plass. Ropes: An alternative to strings.
Software: Practice and Experience, 25(12):1315–1330, 1995. doi:10.1002/spe.4380251203.

[3] D. Cantone and G. Cincotti. QuickHeapsort, an efficient mix of classical sorting algorithms. Theoretical Computer Science, 285(1):25–42, August 2002. doi:10.1016/S0304-3975(01)00288-2.

[4] Volker Diekert and Armin Weiß. QuickHeapsort: Modifications and improved analysis. Theory of Computing Systems, 59(2):209–230, August 2016. doi:10.1007/s00224-015-9656-y.

[5] NIST Digital Library of Mathematical Functions. http://dlmf.nist.gov.

[6] Ernst E. Doberkat. An average case analysis of Floyd's algorithm to construct heaps. Information and Control, 61(2):114–131, May 1984. doi:10.1016/S0019-9958(84)80053-4.

[7] Ronald D. Dutton. Weak-heap sort.
BIT, 33(3):372–381, 1993.

[8] S. Edelkamp and P. Stiegeler. Implementing HEAPSORT with n log n − 0.9n and QUICKSORT with n log n + 0.2n comparisons. ACM Journal of Experimental Algorithmics, 10(5), 2002.

[9] Stefan Edelkamp and Ingo Wegener. On the performance of Weak-Heapsort. In STACS 2000, volume 1770 of Lecture Notes in Computer Science, pages 254–266. Springer-Verlag, 2000.

[10] Stefan Edelkamp and Armin Weiß. Worst-case efficient sorting with QuickMergesort. In
ALENEX 2019 Proceedings. To appear.

[11] Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient sorting with n log n − 1.399n + o(n) comparisons on average. ArXiv e-prints, abs/1307.3033, 2013. URL: http://arxiv.org/abs/1307.3033.

[12] Stefan Edelkamp and Armin Weiß. QuickXsort: Efficient sorting with n log n − 1.399n + o(n) comparisons on average. In International Computer Science Symposium in Russia, pages 139–152. Springer, 2014. doi:10.1007/978-3-319-06686-8_11.

[13] Stefan Edelkamp and Armin Weiß. BlockQuicksort: Avoiding branch mispredictions in Quicksort. In Piotr Sankowski and Christos D. Zaroliagis, editors, 24th Annual European Symposium on Algorithms (ESA 2016), volume 57 of
LIPIcs, pages 38:1–38:16. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ESA.2016.38.

[14] Stefan Edelkamp and Armin Weiß. QuickMergesort: Practically efficient constant-factor optimal sorting, 2018. arXiv:1804.10062.

[15] Amr Elmasry, Jyrki Katajainen, and Max Stenmark. Branch mispredictions don't affect mergesort. In SEA, pages 160–171, 2012.

[16] Philippe Flajolet and Mordecai Golin. Mellin transforms and asymptotics.
Acta Informatica, 31(7):673–696, July 1994. doi:10.1007/BF01177551.

[17] Lester R. Ford, Jr. and Selmer M. Johnson. A tournament problem.
The American Mathematical Monthly, 66(5):387–389, 1959.

[18] Lester R. Ford and Selmer M. Johnson. A tournament problem.
The American Mathematical Monthly, 66(5):387, May 1959. doi:10.2307/2308750.

[19] Viliam Geffert, Jyrki Katajainen, and Tomi Pasanen. Asymptotically efficient in-place merging.
Theoretical Computer Science, 237(1-2):159–181, 2000. doi:10.1016/S0304-3975(98)00162-5.

[20] Mordecai J. Golin and Robert Sedgewick. Queue-mergesort. Information Processing Letters, 48(5):253–259, December 1993. doi:10.1016/0020-0190(93)90088-q.

[21] Gaston H. Gonnet and J. Ian Munro. Heaps on heaps.
SIAM Journal on Computing, 15(4):964–971, November 1986. doi:10.1137/0215068.

[22] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, 1994.

[23] P. Hennequin. Combinatorial analysis of quicksort algorithm.
RAIRO - Theoretical Informatics and Applications, 23(3):317–333, 1989. URL: http://eudml.org/doc/92337.

[24] C. A. R. Hoare. Algorithm 65: Find.
Communications of the ACM, 4(7):321–322, July 1961. doi:10.1145/366622.366647.

[25] Hsien-Kuei Hwang. Limit theorems for mergesort. Random Structures and Algorithms, 8(4):319–336, July 1996. doi:10.1002/(sici)1098-2418(199607)8:4<319::aid-rsa3>3.0.co;2-0.

[26] Hsien-Kuei Hwang. Asymptotic expansions of the mergesort recurrences.
Acta Informatica, 35(11):911–919, November 1998. doi:10.1007/s002360050147.

[27] Kazuo Iwama and Junichi Teruyama. Improved average complexity for comparison-based sorting. In Faith Ellen, Antonina Kolokolova, and Jörg-Rüdiger Sack, editors,
Workshop on Algorithms and Data Structures (WADS), Proceedings, volume 10389 of
Lecture Notes in Computer Science, pages 485–496. Springer, 2017. doi:10.1007/978-3-319-62127-2_41.

[28] Jyrki Katajainen. The ultimate heapsort. In
Proceedings of Computing: The 4th Australasian Theory Symposium, Australian Computer Science Communications, pages 87–96. Springer-Verlag Singapore, 1998.

[29] Jyrki Katajainen, Tomi Pasanen, and Jukka Teuhola. Practical in-place mergesort.
Nordic Journal of Computing, 3(1):27–40, 1996.

[30] Pok-Son Kim and Arne Kutzner. Ratio based stable in-place merging. In Manindra Agrawal, Ding-Zhu Du, Zhenhua Duan, and Angsheng Li, editors,
Theory and Applications of Models of Computation, 5th International Conference, TAMC 2008, Xi'an, China, April 25-29, 2008, Proceedings, volume 4978 of
Lecture Notes in Computer Science, pages 246–257. Springer, 2008. doi:10.1007/978-3-540-79228-4_22.

[31] Donald E. Knuth. The Art of Computer Programming: Searching and Sorting. Addison-Wesley, 2nd edition, 1998.

[32] Donald E. Knuth.
Selected Papers on Analysis of Algorithms, volume 102 of
CSLI Lecture Notes. Center for the Study of Language and Information Publications, 2000.

[33] Hosam M. Mahmoud.
Sorting: A Distribution Theory. John Wiley & Sons, 2000.

[34] Heikki Mannila and Esko Ukkonen. A simple linear-time algorithm for in situ merging.
Information Processing Letters, 18(4):203–208, May 1984. doi:10.1016/0020-0190(84)90112-1.

[35] Conrado Martínez and Salvador Roura. Optimal sampling strategies in Quicksort and Quickselect.
SIAM Journal on Computing, 31(3):683–705, 2001. doi:10.1137/S0097539700382108.

[36] C. J. H. McDiarmid. Concentration. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors,
Probabilistic Methods for Algorithmic Discrete Mathematics, pages 195–248. Springer, Berlin, 1998.

[37] Colin J. H. McDiarmid and Bruce A. Reed. Building heaps fast.
Journal of Algorithms, pages 352–365, 1989.

[38] Mike McFadden. WikiSort. GitHub repository, https://github.com/BonzaiThePenguin/WikiSort.

[39] David R. Musser. Introspective sorting and selection algorithms.
Software: Practice and Experience, 27(8):983–993, 1997.

[40] Wolfgang Panny and Helmut Prodinger. Bottom-up mergesort: a detailed analysis.
Algorithmica, 14(4):340–354, October 1995. doi:10.1007/BF01294131.

[41] Klaus Reinhardt. Sorting in-place with a worst case complexity of n log n − 1.3n + O(log n) comparisons and εn log n + O(1) transports. In International Symposium on Algorithms and Computation (ISAAC), pages 489–498, 1992. doi:10.1007/3-540-56279-6_101.

[42] Salvador Roura.
Divide-and-Conquer Algorithms and Data Structures. Tesi doctoral (Ph.D. thesis), Universitat Politècnica de Catalunya, 1997.

[43] Salvador Roura. Improved master theorems for divide-and-conquer recurrences.
Journal of the ACM, 48(2):170–205, 2001. doi:10.1145/375827.375837.

[44] Robert Sedgewick. The analysis of Quicksort programs.
Acta Informatica, 7(4):327–355, 1977. doi:10.1007/BF00289467.

[45] Robert Sedgewick and Philippe Flajolet.
An Introduction to the Analysis of Algorithms. Addison-Wesley-Longman, 2nd edition, 2013.

[46] Robert Sedgewick and Kevin Wayne.
Algorithms. Addison-Wesley, 4th edition, 2011.

[47] Houshang H. Sohrab.
Basic Real Analysis. Springer Birkhäuser, 2nd edition, 2014.

[48] Ingo Wegener. Bottom-up-Heapsort, a new variant of Heapsort beating, on an average, Quicksort (if n is not very small). Theoretical Computer Science, 118(1):81–98, 1993.
[49] Sebastian Wild.
Dual-Pivot Quicksort and Beyond: Analysis of Multiway Partitioning and Its Practical Potential. Doktorarbeit (Ph.D. thesis), Technische Universität Kaiserslautern, 2016. ISBN 978-3-00-054669-3. URL: http://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:hbz:386-kluedo-44682.

[50] Sebastian Wild. Average cost of QuickXsort with pivot sampling. In James Allen Fill and Mark Daniel Ward, editors,
International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms (AofA 2018), LIPIcs, 2018. doi:10.4230/LIPIcs.AofA.2018.36.

[51] Sebastian Wild. Supplementary Mathematica notebook for variance computation, October 2018. doi:10.5281/zenodo.1463020.

Appendix A. Notation
A.1. Generic mathematics

N, N0, Z, R . . . . . . . . natural numbers N = {1, 2, 3, . . .}, N0 = N ∪ {0}, integers Z = {. . . , −2, −1, 0, 1, 2, . . .}, real numbers R.
R>1, N≥3 etc. . . . . . . . restricted sets X_pred = {x ∈ X : x fulfills pred}.
ln(n), lg(n), log n . . . . natural and binary logarithm; ln(n) = log_e(n), lg(n) = log_2(n). We use log for an unspecified (constant) base in O-terms.
X . . . . . . . . . . . . . to emphasize that X is a random variable, it is capitalized.
[a, b) . . . . . . . . . . . real intervals; end points with round parentheses are excluded, those with square brackets are included.
[m..n], [n] . . . . . . . . integer intervals, [m..n] = {m, m + 1, . . . , n}; [n] = [1..n].
[stmt], [x = y] . . . . . . Iverson bracket, [stmt] = 1 if stmt is true, [stmt] = 0 otherwise.
H_n . . . . . . . . . . . . nth harmonic number; H_n = \sum_{i=1}^{n} 1/i.
x ± y . . . . . . . . . . . x with absolute error |y|; formally the interval x ± y = [x − |y|, x + |y|]; as with O-terms, we use one-way equalities z = x ± y instead of z ∈ x ± y.
\binom{n}{k} . . . . . . . . binomial coefficients; \binom{n}{k} = n^{\underline{k}}/k!.
B(λ, ρ) . . . . . . . . . . for λ, ρ ∈ R_{>0}; the beta function, B(λ, ρ) = \int_0^1 z^{λ−1}(1 − z)^{ρ−1} dz; see also Equation (3) on page 15.
I_{x,y}(λ, ρ) . . . . . . . the regularized incomplete beta function; I_{x,y}(λ, ρ) = \int_x^y z^{λ−1}(1 − z)^{ρ−1}/B(λ, ρ) dz for λ, ρ ∈ R_{>0}, 0 ≤ x ≤ y ≤ 1.
a^{\underline{b}}, a^{\overline{b}} . . factorial powers, "a to the b falling resp. rising"; e.g., x^{\underline{3}} = x(x − 1)(x − 2), x^{\underline{−3}} = 1/((x + 1)(x + 2)(x + 3)).

A.2. Stochastics-related notation

P[E], P[X = x] . . . . . . probability of an event E resp. probability for random variable X to attain value x.
E[X] . . . . . . . . . . . 
expected value of X; we write E[X | Y] for the conditional expectation of X given Y, and E_X[f(X)] to emphasize that expectation is taken w.r.t. random variable X.
X D= Y . . . . . . . . . . equality in distribution; X and Y have the same distribution.
U(a, b) . . . . . . . . . . uniformly in (a, b) ⊂ R distributed random variable.
Beta(λ, ρ) . . . . . . . . Beta-distributed random variable with shape parameters λ ∈ R_{>0} and ρ ∈ R_{>0}.
Bin(n, p) . . . . . . . . . binomially distributed random variable with n ∈ N trials and success probability p ∈ [0, 1].
BetaBin(n, λ, ρ) . . . . . beta-binomial distributed random variable; n ∈ N, λ, ρ ∈ R_{>0}.
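The beta-binomial distribution enters the analysis through pivot sampling: the number of non-sample elements falling below the median of a random k-sample is BetaBin(n − k, t + 1, t + 1) distributed. The following small Python experiment (our own helper, not part of the paper's code) sanity-checks this via the mean, which is (n − k)/2 by symmetry.

```python
import random

def segment_size_I1(n, t):
    """Draw a random permutation of {0, ..., n-1}, take the median of the
    first k = 2t + 1 elements as pivot, and return I1 = number of the
    remaining n - k elements that are smaller than the pivot."""
    k = 2 * t + 1
    perm = random.sample(range(n), n)
    pivot = sorted(perm[:k])[t]
    return sum(1 for x in perm[k:] if x < pivot)

random.seed(1)
n, t, trials = 1001, 1, 4000     # median-of-3 sampling (k = 3)
k = 2 * t + 1
mean = sum(segment_size_I1(n, t) for _ in range(trials)) / trials
# By symmetry, E[BetaBin(n - k, t + 1, t + 1)] = (n - k)/2 = 499 here.
print(mean)
```

The empirical mean should be close to (n − k)/2; higher moments (in particular the variance discussed in Theorem 8.1) can be checked the same way.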
A.3. Specific notation for algorithms and analysis

n . . . . . . . . . . . . . length of the input array, i.e., the input size.
k, t . . . . . . . . . . . sample size k ∈ N_{≥1}, odd; k = 2t + 1, t ∈ N_0; we write k(n) to emphasize that k might depend on n.
w . . . . . . . . . . . . . threshold for recursion; for n ≤ w, we sort inputs by X; we require w ≥ k − 1.
α . . . . . . . . . . . . . α ∈ [0, 1]; X is used to sort segments of ⌊αn⌋ elements.
c(n) . . . . . . . . . . . expected costs of QuickXsort; see Section 4.
x(n), a, b . . . . . . . . expected costs of X; x(n) = an lg n + bn ± o(n); see Section 4.
J_1, J_2 . . . . . . . . . (random) subproblem sizes; J_1 + J_2 = n − 1; J_1 = t + I_1.
I_1, I_2 . . . . . . . . . (random) segment sizes in partitioning; I_1 D= BetaBin(n − k, t + 1, t + 1); I_2 = n − k − I_1.
R . . . . . . . . . . . . . (one-based) rank of the pivot; R = J_1 + 1.
s(k) . . . . . . . . . . . (expected) cost for pivot sampling, i.e., cost for choosing the median of k elements.
A_1, A_2, A_3 . . . . . . . indicator random variables; A_1