Robust and adaptive search

Abstract

Binary search finds a given element in a sorted array with an optimal number of \log n queries. However, binary search fails even when the array is only slightly disordered or access to its elements is subject to errors. We study the worst-case query complexity of search algorithms that are robust to imprecise queries and that adapt to perturbations of the order of the elements. We give (almost) tight results for various parameters that quantify query errors and that measure array disorder. In particular, we exhibit settings where query complexities of \log n + ck, (1+\varepsilon)\log n + ck, and \sqrt{cnk}+o(nk) are best-possible for parameter value k, any \varepsilon>0, and constant c.


arXiv preprint [cs.DS]

Yann Disser ∗ Stefan Kratsch † September 21, 2018


1. Introduction

Imagine a large register with n files from which you wish to extract a particular file. All files are indexed by some key and the files are sorted by key value. Not knowing the distribution of the keys, you probably use binary search since looking at log n keys is best possible in the worst case. Unfortunately, however, other users have accessed files before you and have only returned the files to approximately the right place. As a result, the register is unsorted, but at least each file is within some small number k of positions of where it should be. How should you proceed? If you knew k and n, at what ratio of k vs. n should you resort to a linear search of the register? If you do not know k, can you still do reasonably well? What if the register was recently moved, by packing the files into boxes, but in the process the order of the boxes got mixed up, and now there are large blocks of files that are far away from their correct locations? What if you misread some of the keys? Situations like these are close to searching in a sorted register and there are plenty of parameters that measure closeness to a sorted array, e.g., maximum displacement or minimum block moves to sort, respectively persistent or temporary read errors. We give (almost) optimal algorithms for a large variety of these measures, and thereby establish for each of them exact regimes in which we can outperform a linear search of all elements, or even be almost as good as binary search.

More formally, we study the fundamental topic of comparison-based search, which is central to many algorithms and data structures [20, 24, 31]. In its most basic form, the search problem can be phrased in terms of locating an element e within a given array A. In order to search A efficiently, we need structure in the ordering of its elements: In general, we cannot hope to avoid querying all entries to find e. The most prominent example of an efficient search algorithm that exploits special structure is binary search for sorted arrays. Binary search is best-possible for [...]

∗ Main work done while at TU Berlin. Current address: TU Darmstadt.
† University of Bonn.

What is the best-possible search algorithm if the data may be disordered or we cannot access it reliably? In what regime of the considered measure is it better than linear search?

We provide (almost) tight bounds on the query complexity of searching an array A with n entries for an element e in a variety of settings. Each setting is characterized by bounding a different parameter k that quantifies the imperfections regarding either our access to array elements or regarding the overall disorder of the data. Note that one can always resort to linear search, which rules out lower bounds stronger than n comparisons.† Table 1 gives an overview of the parameters we analyze and our respective results. Qualitatively, our results fall into three groups of settings leading to different query complexities, and we briefly highlight each group in the following.

The first group contains the parameters k_sum, k_max, and k_inv, which quantify the summed/maximum distance of each element from its position in sorted order and the number of element pairs in the wrong relative order, respectively (detailed definitions can be found in the corresponding sections). For all of these parameters we are able to show that log n + ck queries are necessary and sufficient, for constant c. Intuitively, this is the best complexity we can hope for: We cannot do better than log(n) queries, and the impact of k on the query complexity is linear and can be isolated.

The second group of results is with respect to the parameters k_lies, k_faults, as well as multiple parameters for edit distances that measure the number of element operations needed to sort A. The parameter k_lies limits the number of queries that yield the wrong result, and k_faults limits the number of array positions that yield wrong query outcomes. For bounded values of k_lies and k_faults we show that e cannot be found with log n + ck queries using any binary-search-like algorithm.‡ On the other hand, we provide an algorithm that needs (1 + 1/c) log n + ck queries, for any c ≥ 1. For bounded edit distances, it is easy to see that we need n queries if e need not be at its correct position relative to sorted order, since e can be moved anywhere with just 2 edits, forcing us to scan the whole array. If we assume e to be at its correct location, we can carry over the results for k_lies and k_faults to obtain the same bounds for the edit-distance related parameters k_rep, k_seq, k_mov, and k_swap.

Lastly, we consider the parameter k_ainv that counts the number of adjacent elements that are in the wrong relative order, as well as several parameters measuring the number of block operations needed to sort A. Intuitively, these settings are much more difficult for a search algorithm, as it takes relatively small parameter values to introduce considerable disorder. For the case that e is guaranteed to be at the correct position, we show that √(cnk) + o(nk) queries are necessary and sufficient to locate e.

The algorithms for k_ainv and related parameters assume that the parameter value is known to the algorithm a priori. In contrast, all our other algorithms are oblivious to the parameter, in the sense that they do not require knowledge of the parameter value as long as the target element e is guaranteed to be present in the array. Note that if e need not be present and we have no bound on the disorder, we generally need to inspect every entry of the array in case we cannot find e. For the parameter k_lies, we do not even know how long we need to continue querying the same elements until we may conclude that e is not part of the array. Any of our oblivious algorithms can trade the guarantee that e ∈ A against knowledge of the parameter value k: Compute from k the maximum number m of queries that it would take without knowing k when e ∈ A. If the algorithm does not stop within m queries then it is safe to answer that e is not in A.

† Accordingly, all (lower) bounds of the form f(n, k) throughout the paper are to be understood as min{f(n, k), n}. A naive bound of n can easily be obtained by scanning the whole array.
‡ We interpret the array as a binary tree (rooted at entry n/2, with the two children n/4, 3n/4, etc.), and call an algorithm “binary-search-like” if it never queries a node (other than the root) before querying its parent.

Overall, our results point out several parameters for which a fairly large regime of k (as a function of n) allows search algorithms that are provably better than linear search. For example, while moving only a single element by a lot can lead to bounds of Ω(n) on the values of several parameters, and hence trivial guarantees, moving many elements by at most k places gives k_max = k and yields better bounds than linear search (roughly) for k < n, and as good as binary search when k = O(log n). Moving only few elements by an arbitrary number of spaces, in turn, still leads to good bounds via parameters such as k_mov or k_swap, as long as the target is in the correct place. Parameters such as k_ainv grow even more slowly, for certain types of disorder, but, on the other hand, only a small regime allows for better than trivial guarantees. While, for each individual parameter we study, there are “easily searchable” instances where the parameter becomes large and makes the corresponding bound trivial, our results often allow for good bounds by resorting to a different parameter.

Our work falls into the area of adaptive analysis of algorithms, which aims at a fine-grained analysis of polynomial-time algorithms with respect to structural parameters of the input. An objective of this field is to find algorithms whose running-time dependence on input size and the structural parameters interpolates smoothly between known (good) bounds for special cases and the worst-case bound for general inputs.
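To make the structural parameters from the introduction concrete, the following Python sketch computes four of them for a small array. It follows only the informal descriptions given above (summed and maximum displacement from the sorted position, number of inverted pairs, number of inverted adjacent pairs) and assumes distinct entries; the function names are hypothetical and the formal definitions in the later sections may differ in details.

```python
from typing import Sequence, Tuple


def displacement_measures(A: Sequence[int]) -> Tuple[int, int]:
    """Return (k_sum, k_max): summed and maximum distance of each element
    from its position in sorted order (assumes distinct entries)."""
    sorted_pos = {a: r for r, a in enumerate(sorted(A), start=1)}
    disp = [abs(p - sorted_pos[a]) for p, a in enumerate(A, start=1)]
    return sum(disp), max(disp, default=0)


def inversions(A: Sequence[int]) -> int:
    """k_inv: number of pairs in the wrong relative order (quadratic scan)."""
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n) if A[i] > A[j])


def adjacent_inversions(A: Sequence[int]) -> int:
    """k_ainv: number of adjacent pairs in the wrong relative order."""
    return sum(1 for x, y in zip(A, A[1:]) if x > y)


if __name__ == "__main__":
    swapped_blocks = [4, 5, 6, 1, 2, 3]  # two blocks in the wrong order
    print(displacement_measures(swapped_blocks))  # (k_sum, k_max)
    print(inversions(swapped_blocks), adjacent_inversions(swapped_blocks))
```

On the block-swapped example, every element is displaced by three positions while only one adjacent pair is inverted, which illustrates why a small value of k_ainv can already hide considerable disorder.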
The topic of adaptive sorting, i.e., sorting arrays that are presorted in some sense, has attracted a lot of attention, see, e.g., [4, 13, 23, 28].

We now discuss results that are specific to searching in arrays. Several authors addressed the question of how much preprocessing, i.e., sorting, helps for searching, if we take into account the total time investment [8, 22, 29]. Fredman [18] gave lower bounds on searching regarding both queries and memory accesses. A classic work of Yao [32] established that the best way of storing n elements in a table such as to minimize the number of queries for accessing an element is by keeping the elements sorted, which requires log n queries, provided that the key space is large enough. Regarding searching in (partially) unordered arrays, there is a nice result of Biedl et al. [5] about insertion sort based on repeated binary searches.

Under appropriate assumptions, namely that the array is sorted and its elements are drawn from a known distribution (e.g., searching for a name in a telephone book), one can do much better than binary search, since the distribution allows a good prediction of where the target should be located. In this case O(log log n) queries suffice on average (cf. [31]); to avoid having to query the entire array, previous work suggests combinations of algorithms that perform no worse than binary search in the worst case [10, 6]. Another interesting branch of study is related to search in arrays of more complicated objects such as (long) strings [1, 17] or abstract objects with nonuniform comparison cost [19, 2].

Many papers have studied searching in the presence of different types of errors, e.g., [7, 15, 16, 25], see [11, 27] for surveys. A popular error model for searching allows for a linear number of lies [3, 7, 12, 14, 26], for which Borgstrom and Kosaraju [7] gave an O(log n) search algorithm. In contrast, we bound the number of lies separately via the parameter k_lies. Rivest et al. [30] gave an upper bound of log n + k log log n + O(k log k) queries for this parameter. Their algorithm is based on a continuous strategy for the (equivalent) problem of finding an unknown value in [1, n], up to a given precision, using few yes-no questions. Our algorithm (Theorem 2) uses asymptotically fewer queries if k_lies = ω(log n / log log n).

The works of Finocchi and Italiano [16] and Finocchi et al. [15] consider a parameter very similar to k_faults, with the additional assumption that faults may also affect the working memory of the algorithm, except for O(1) “safe” memory words. Finocchi and Italiano [16] give a deterministic searching algorithm that needs O(log n + k²) queries. Brodal et al. [9] improve this bound to O(log n + k) and Finocchi et al. [15] provide a lower bound of Ω(log n + k) even for randomized algorithms. Our results are incomparable as our result for parameter k_faults uses only (1 + 1/c) log n + (2c + 2)k queries, getting arbitrarily close to log n + O(k) (cf. Theorem 4), but does not consider faults in the working memory; the high-level approach of balancing progress in the search with security queries is the same as in [9], but more careful counting is needed to get small constants. For parameter k_lies we give a simpler algorithm with 2 log n + 4k queries and using only O(1) words of working memory, but it is not clear whether the result can be transferred to k_faults without increasing the memory usage.

A technical report of Long [21] claims that the actual tight bound of the algorithm of Rivest et al. [30] is O(log n + k), which is consistent with our results.

Table 1: Overview of our results.† (o: even if oblivious to parameter value; c: for all c ≥ 1; t: for tree-algorithms; e: for pos(e) = rank(e))

parameter | description | bounds

Section 3 – number of imprecise queries
k_lies | wrong outcomes | lower: log n + ck [Th. 3] (c, t); upper: (1 + 1/c) log n + (2c + 2)k [Th. 2] (o, c)
k_faults | indices with wrong outcomes | lower: log n + ck [Co. 1] (c, t); upper: (1 + 1/c) log n + (2c + 2)k [Th. 4] (o, c)

Section 4.1 – displacement of elements
k_sum | total displacement | log n/k + 2k + O(1) [Th. 5,6] (o)
k_max | maximum displacement | log n/k + 3k + O(1) [Th. 7,8] (o)

Section 4.2 – number of inversions
k_inv | all inversions | lower: log n/k + 2k + O(1) [Co. 3]; upper: log n/k + 4k + O(1) [Co. 3] (o)
k_ainv | adjacent inversions | √(nk) + o(√(nk)) [Th. 9,10] (e)

Section 4.3 – element operations needed to sort the array
k_rep | element replacements | lower: log n + ck [Co. 4] (c, t, e); upper: (1 + 1/c) log n + (4c + 4)k [Th. 11] (o, e)
k_seq | n − |max ordered subseq.| | lower: log n + ck [Co. 4] (c, t, e); upper: (1 + 1/c) log n + (4c + 4)k [Th. 11] (o, e)
k_mov | element moves | lower: log n + ck [Co. 4] (c, t, e); upper: (1 + 1/c) log n + (4c + 4)k [Th. 11] (o, e)
k_swap | element swaps | lower: log n + ck [Co. 4] (c, t, e); upper: (1 + 1/c) log n + (8c + 8)k [Th. 11] (o, e)
k_aswap | adj. element swaps | lower: log n/k + 2k + O(1) [Co. 5]; upper: log n/k + 4k + O(1) [Co. 5] (o)

Section 4.4 – block operations needed to sort the array
k_bswap | block swaps | 4√(nk) + o(√(nk)) [Co. 7] [Th. 13] (e)
k_rbswap | equal size block swaps | lower: 2√(nk) + o(√(nk)) [Co. 8] (e); upper: √(nk) + o(√(nk)) [Th. 12] (e)
k_bmov | block moves | 2√(nk) + o(√(nk)) [Th. 14] (e) [Co. 6]

2. Preliminaries

In this paper we consider the following problem: Given an array A of length n and an element e, find the position of e in A or report that e ∉ A with as few queries as possible. We use A[i], i ∈ {1, . . . , n}, to denote the i-th entry of A. We allow access to the entries of A only via queries to its indices, regarding the relation of the corresponding element to e. We write query(i) for the operation of querying A at index i, and let query(i) = ‘<’ (respectively, ‘>’ or ‘=’) denote the outcome indicating that A[i] < e (respectively A[i] > e or A[i] = e). Note that in faulty settings the query outcome need not be accurate.

To keep notation simple, we generally assume the entries of A to be unique unless explicitly stated otherwise. We emphasize that none of our results relies on this assumption. We can then define pos(a) to denote the index of a in A, by setting pos(a) = i if and only if A[i] = a. Further, let rank(a) = |{i : A[i] < a}| + 1 be the “correct” position of a with respect to a sorted copy of A, irrespective of whether or not a ∈ A. We often use an element a ∈ A and its index pos(a) interchangeably, especially for the target element e. Note that, as discussed in the introduction, for oblivious algorithms we generally assume e ∈ A.
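The notation above can be sketched in a few lines of Python. This is only an illustration of the definitions of query, pos, and rank for the fault-free case; the function signatures (passing A and e explicitly) are an assumption made for the example, and indices are 1-based as in the text.

```python
from typing import Sequence


def query(A: Sequence[int], i: int, e: int) -> str:
    """Truthful query to 1-based index i: relation of A[i] to the target e."""
    x = A[i - 1]
    if x < e:
        return "<"
    if x > e:
        return ">"
    return "="


def pos(A: Sequence[int], a: int) -> int:
    """1-based index of a in A (assumes unique entries and a in A)."""
    return list(A).index(a) + 1


def rank(A: Sequence[int], a: int) -> int:
    """Position of a in a sorted copy of A, whether or not a is in A."""
    return sum(1 for x in A if x < a) + 1


if __name__ == "__main__":
    A = [10, 30, 20, 40]          # slightly disordered array
    print(pos(A, 30), rank(A, 30))  # actual vs. "correct" position
    print(query(A, 1, 30))
```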

3. Searching with imprecise queries

In this section, we consider the problem of finding the index pos(e) of an element e in a sorted array A of length n = 2^d, d ∈ N, in a setting where queries may yield erroneous results. We say that ‘<’ is a lie (the truth) for index i if A[i] ≥ e (A[i] < e), and analogously for ‘>’ and ‘=’. To quantify the number of lies, we introduce two parameters k_lies and k_faults. The first parameter k_lies simply bounds the number of queries with erroneous results, which we interpret as the number of lies allowed to an adversary. The second parameter k_faults bounds the number of indices i for which query(i) (consistently) returns the wrong result, allowing the conclusion that e ∉ A in case query(e) yields the wrong result. Equivalently, for an unsorted array A, we can require all queries to be truthful and define k_faults(e) to be the number of inversions involving e, i.e., k_faults(e) = |{i : (i < pos(e) ∧ A[i] > e) ∨ (i > pos(e) ∧ A[i] < e)}|. Observe that both definitions of k_faults are equivalent. For clarity, we write k_faults when considering the adversarial interpretation, and k_faults(e) when considering it as a measure of disorder of an unsorted array. For both k_lies and k_faults, we only allow queries to e to yield ‘=’.

The algorithms of this section operate on the binary search tree rooted at index r = n/2. We write next_>(i) and next_<(i) to denote the two successors of node i, e.g., next_>(r) = n/4 and next_<(r) = 3n/4. Similarly, we write prev(i) to denote the predecessor of i in the binary search tree, and prev_q(i) = v for the last vertex v on the unique r-i-path such that next_q(v) also lies on the r-i-path (prev_q(i) = ∅ if no such node exists). Intuitively, prev_q(i) is the last vertex corresponding to an array entry larger (if q = ‘>’) or smaller (if q = ‘<’) than A[i]. For convenience, query(∅) = ∅, prev(r) = ∅, and next_q(i) = i if i is a leaf of the tree. We further denote by d(i, j) the length of the path from node i to node j in the search tree. We say that an algorithm operates on the binary search tree if no index is queried before its predecessor in the tree.

We start by considering the parameter k_lies. If we knew the value of this parameter, we could try a regular binary search, replace every query with 2k_lies + 1 queries to the same element and use the majority outcome in each step. However, this would give (2k_lies + 1) log n queries, where ideally we should not use more than log n + f(k) queries. We first give an algorithm that achieves the separation between n and k_lies while being oblivious to the value of k_lies. Importantly, the algorithm only needs O(1) memory words, which also makes it applicable to settings where “safe” memory, which cannot be corrupted during the course of the algorithm, is limited. This algorithm still needs 2 log n + f(k) queries, but we will show later how to build on the same ideas to (almost) eliminate the factor of 2.

Algorithm 1:

Algorithm with 2 log n + 4k_lies queries (the algorithm stops once a query yields ‘=’)

    i ← n/2                                  // start at the root
    while (q ← query(i)) ≠ ‘=’ do            // by definition, ‘=’ cannot be a lie
        i′ ← prev_¬q(i)                      // ∅ if all queries on the path from the root yielded q
        while i ≠ i′ ∧ query(i′) = q do      // while query(i′) contradicts its previous outcome...
            i ← prev(i)                      // ...backtrack towards i′
        if i ≠ i′ then                       // if we did not backtrack all the way to i′...
            i ← next_q(i)                    // ...proceed according to q

Intuitively, Algorithm 1 searches the binary search tree defined above, simply proceeding according to the query outcome at each node. In addition, the algorithm invests queries to double check past decisions. We distinguish left and right turns, depending on whether the algorithm proceeds with the left or the right child. In particular, before proceeding, the algorithm queries the last vertex on the path from the root where it decided for a turn in the opposite direction. As long as an inconsistency with previous queries is detected, i.e., a query to a vertex where it turned right (or left) gives ‘>’ (or ‘<’), the algorithm backtracks one step. In this manner, the algorithm guarantees that it never proceeds along a wrong path without the adversary investing additional lies. Note that if the algorithm only ever turned right (or left), i.e., there was no previous turn in the opposing direction, it does not double check any past decisions until the query outcome changes. This is alright since either the algorithm is on the right path or the adversary needs to invest a lie in each step.
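The following Python sketch is one possible reading of Algorithm 1, not the authors' implementation. It assumes the search tree is the implicit complete binary tree on positions 1, ..., 2^d − 1 (root 2^(d−1)), so that prev_<(i) and prev_>(i) are simply the nearest smaller and larger ancestors and can be computed from i alone. The helper names and the `LyingOracle` class, which flips at most k answers at random and never lies at the target position, are hypothetical and exist only to exercise the sketch.

```python
import random


def lowbit(i):
    """Lowest set bit of i; equals the 'step size' of node i in the tree."""
    return i & -i


def parent(i):
    """prev(i): predecessor of a non-root node i in the implicit tree."""
    s = lowbit(i)
    return (i - s) | (2 * s)


def child(i, q):
    """next_q(i): go left on '>' (A[i] > e), right on '<'; leaves stay put."""
    s = lowbit(i)
    if s == 1:
        return i
    return i - s // 2 if q == ">" else i + s // 2


def prev_opposite(i, q, n):
    """prev_{not q}(i): last ancestor where the path turned against q, or None."""
    s = lowbit(i)
    v = i + s if q == "<" else i - s  # nearest larger / smaller ancestor
    return v if 0 < v < n else None


def robust_search(query, d):
    """Search positions 1..2^d - 1; stops as soon as any query yields '='."""
    n = 1 << d
    i = n // 2
    while True:
        q = query(i)
        if q == "=":
            return i
        ip = prev_opposite(i, q, n)
        while ip is not None and i != ip:
            r = query(ip)
            if r == "=":
                return ip
            if r != q:          # ip confirms its earlier outcome: stop checking
                break
            i = parent(i)       # contradiction detected: backtrack one step
        if i != ip:
            i = child(i, q)     # proceed according to q


class LyingOracle:
    """Hypothetical adversary: flips at most k answers, never at the target."""

    def __init__(self, A, e, k, seed=0):
        self.A, self.e, self.lies_left = A, e, k
        self.rng = random.Random(seed)
        self.count = 0

    def __call__(self, i):
        self.count += 1
        x = self.A[i]
        truth = "<" if x < self.e else ">" if x > self.e else "="
        if truth != "=" and self.lies_left > 0 and self.rng.random() < 0.3:
            self.lies_left -= 1
            return ">" if truth == "<" else "<"
        return truth
```

A usage example under the same assumptions: with `d = 6`, `A = [None] + [10 * j for j in range(1, 64)]` (index 0 unused) and `oracle = LyingOracle(A, A[37], k=2)`, calling `robust_search(oracle, 6)` returns the position of the target, and `oracle.count` reports how many queries were spent.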

Theorem 1.

We can find e obliviously using 2 log n + 4k_lies queries and O(1) memory.

Proof. We claim that Algorithm 1 achieves the bound of the theorem. Note that prev_¬q(i) only depends on i and not on the outcome of previous queries, therefore, we can determine it with O(1) memory words. We will show that in each iteration of the outer loop of the algorithm, the potential function Φ = 2d(i, e) + 4k decreases by at least one for each query, where k is the number of remaining lies the adversary may make. This proves the claim, since Φ ≥ 0 and, initially, Φ ≤ 2 log n + 4k_lies.

We analyze a single iteration of the outer loop. Observe that if z is the number of iterations of the inner loop, then the total number of queries is z + 2 if the inner loop terminates because query(i′) = ¬q, and z + 1 if it terminates because i = i′. If an iteration of the inner loop is caused by query(i′) being a lie, then in this iteration ∆Φ ≤ 2 − 4 = −2, and otherwise, d(i, e) is decreased by one and likewise ∆Φ = −2. Hence, the inner loop changes the potential by ∆Φ ≤ −2z overall. If the inner loop terminates because i = i′, then z ≥ 1 and ∆Φ ≤ −2z ≤ −z − 1, which is enough to cover all z + 1 queries.

Now consider the case that the inner loop terminates because query(i′) = ¬q. If ¬q is a lie for i′ or q is a lie for i, the adversary invested an additional lie, and even if the last update to i increases d(i, e), the total change in potential is bounded by ∆Φ ≤ −2z − 4 + 2 ≤ −z − 2, which is enough to cover all z + 2 queries. On the other hand, if ¬q is the truth for i′ and q is the truth for i, then e ∈ {i′, . . . , i} and i must lie on the unique r-e-path in the search tree (and i ≠ e). The final update to i thus decreases d(i, e) by 1 and the total change in potential is ∆Φ ≤ −2z − 2, again enough to cover all z + 2 queries.

Algorithm 2:

Algorithm with (1 + 1/c) log n + (2c + 2)k_lies queries (the algorithm stops once a query yields ‘=’)

    i ← n/2                                      // start at the root
    while (q ← query(i)) ≠ ‘=’ do                // by definition, ‘=’ cannot be a lie
        i′ ← prev_¬q(i)                          // ∅ if all queries on the path from the root yielded q
        while 0 < c∆_{i′} < d(i, i′) + 1 do      // while we do not have sufficient support to proceed...
            query(i′)                            // ...query i′ for support
        if ∆_{i′} = 0 then                       // if we ran out of support at i′ altogether...
            i ← i′                               // ...backtrack to i′
        else                                     // if we have sufficient support at i′...
            i ← next_q(i)                        // ...proceed according to q

We now adapt Algorithm 1 to minimize the impact of potential lies on the dependency on log n in the running time. Intuitively, instead of backing up each query q ← query(i) by a query to prev_¬q(i), we back up only one in c queries (cf. Algorithm 2). During the course of the algorithm and its analysis, we let n_{q,j} denote the number of queries (so far) to node j that resulted in q ∈ {<, >} and ∆_j := |n_{<,j} − n_{>,j}|.

Theorem 2.

For every c ≥ 1, we can find e obliviously using (1 + 1/c) log n + (2c + 2)k_lies queries.

Proof. We claim that Algorithm 2 achieves the bound of the theorem. In this algorithm, we intuitively back up every c-th query (for integral c). To capture this in our potential function, we need a term that stores potential for the next c queries. We will introduce two such terms L, T, representing the case that the algorithm's current belief of the relation between i′ and e is a lie or the truth, respectively. We need to distinguish these cases, since they lead to different behavior regarding d(i, e) and the number of remaining lies.

We need the following additional notation. For some current value of i during the execution of the algorithm, we define the type of a node j of the search tree on the r-i-path to be t_j ∈ {<, >} if next_{t_j}(j) also lies on this path. Further, we let succ_q(j) = j′ if j′ is the first node on the j-i-path with t_{j′} = q or succ_q(j) = i if no such node exists. We set succ_q(i) = ∅. To avoid special treatment of leaves, we replace each leaf of the search tree by an infinite binary tree of nodes corresponding to the original leaf, in both algorithm and analysis. If e was a leaf, then, for each new node j corresponding to e, we set d(j, e) = d(j, r_e) where r_e is the root of the subtree corresponding to e.

Intuitively, the potential of the algorithm needs to depend on c∆_{i′} − d(i, i′), since this difference captures the number of steps it can still make before it needs to use a backup query. To keep track of this difference across iterations of the algorithm, we introduce the notion of a zig-zag pair, which we will define formally below. In particular, (i, i′) always forms a zig-zag pair. Let j = next_q(i) in some iteration after which ∆_{i′} ≠ 0, i.e., i gets updated to j. If j has the same type as i in the next iteration, i′ stays the same and we can simply replace the zig-zag pair (i, i′) with (j, i′). On the other hand, if j has a different type in the next iteration, we need to introduce a new pair (j, i). Since we may backtrack later and continue differently at i, we also need to keep the pair (i, i′). Conceptually, we need to keep track of all maximal l-l′-subpaths of the current r-i-path with the property that l′ has the opposite type than all other nodes on the subpath. If the algorithm backtracks to node l at some point, then, in the next iteration, i = l and i′ = l′, and the difference c∆_{l′} − d(l, l′) captures how much potential remains to continue querying without using a backup query to i′ = l′.

Formally, we define the set of all zig-zag pairs as

    Z := {(j, j′) | ∃q ∈ {<, >}. j′ = prev_q(j) ∧ j = succ_q(j′)}.

Note that (i, i′) ∈ Z throughout the algorithm, and that every node appears at most once as the second element of a zig-zag pair, exactly if it has a different type than its successor on the unique r-i-path. For convenience, we set ∆_∅ to be equal to the number of all “queries” to ∅ in the algorithm. We define

    L = Σ_{(j,j′)∈Z} [c∆_{j′} − d(j, j′)] · Λ_{t_{j′},j′},

where Λ_{q,j} = 1 if q is a lie for j and Λ_{q,j} = 0 otherwise. Similarly, we define

    T = Σ_{(j,j′)∈Z} [d(j, j′) − c(∆_{j′} − 1)] · (1 − Λ_{t_{j′},j′}).

With this notation in place, we introduce the extended potential function

    Φ = (1 + 1/c) d(i, e) + (2 + 1/c) L + (1/c) T + (2c + 2) k,

where k is the number of lies remaining to the adversary.

We claim that L, T ≥ 0 and, hence, Φ ≥ 0, since the contribution of every pair in Z to either L or T is non-negative. To see this, first observe that in each iteration Z changes exactly by either removing the zig-zag pair (i, i′) (unless i = r), by replacing it with the pair (next_q(i), i′), or by adding a new pair (next_q(i), i). Inductively, it thus suffices to show that the contribution of (next_q(i), i′) or (next_q(i), i) in the latter cases (∆_{i′} ≠ 0) is positive. First, observe that ∆_i = 1 after the iteration, hence, if (next_q(i), i) ∈ Z, its contribution to L or T must be positive.

Now consider the case that (next_q(i), i′) ∈ Z after the iteration. By definition of the algorithm, the inner loop ensures that

    c∆_{i′} ≥ d(i, i′) + 1 = d(next_q(i), i′),

hence the contribution of (next_q(i), i′) to L is non-negative. Now consider the last iteration of the outer loop in which ∆_{i′} changed, and let j, j′ be the corresponding values of i and i′ in that iteration. Either d(i, i′) = ∆_{i′} = 1, or the last change to ∆_{i′} was because j′ = i′ and c∆_{i′} [...] 1.

[...] If q is the truth for i and ¬q is the truth for i′, then e ∈ {i, . . . , i′} and i must lie on the unique r-e-path. The update to i then decreases d(i, e) and changes the potential by ∆Φ = −(1 + 1/c) + 0 + 1/c − 0 = −1.

[...] ∆_{i′} = 0, and fix the value of ∆_{i′} before the inner loop. We may assume that no query to i′ yielded ¬q, otherwise we can balance each such query with a query that yielded q, one of the two being a lie, for a change in potential of ∆Φ = −(2c + 2) ≤ −4, which pays for both these queries. With this assumption, we have exactly ∆_{i′} ≥ 1 queries to i′ that yielded q. Note that the previous iteration of the outer loop ensured that c∆_{i′} ≥ d(i, i′). If ¬q is a lie for i′, the eventual update to i decreases d(i, e) by d(i, i′) and decreases L by c∆_{i′} − d(i, i′) (since (i, i′) is eliminated from Z). The overall change in potential then is

    ∆Φ ≤ −(1 + 1/c) · d(i, i′) − (2 + 1/c)[c∆_{i′} − d(i, i′)] + 0 + 0
       = d(i, i′) − c∆_{i′} − (c + 1)∆_{i′}
       ≤ −(c + 1)∆_{i′}
       ≤ −1 − ∆_{i′}        (since c ≥ 1),

which is enough to cover all 1 + ∆_{i′} queries. On the other hand, if ¬q is the truth for i′, the eventual update may increase d(i, e) by at most d(i, i′) and it eliminates the contribution of (i, i′) to T (since i = i′ and, hence, (i, i′) ∉ Z). The adversary invested ∆_{i′} additional lies, and the change in potential is

    ∆Φ ≤ +(1 + 1/c) d(i, i′) + 0 + 0 − (2c + 2)∆_{i′}
       ≤ 2d(i, i′) − (2c + 2)∆_{i′}        (since c ≥ 1)
       ≤ −2∆_{i′}
       ≤ −1 − ∆_{i′},

which is again enough to cover all 1 + ∆_{i′} queries.

Finally, consider the case where the inner loop is executed until c∆_{i′} ≥ d(i, i′) + 1. As before, c∆_{i′} ≥ d(i, i′) ≥ 1, and, again, we may assume that no query to i′ yielded q. Hence, as c ≥ 1, there was exactly one query to i′ that yielded ¬q, i.e., 2 queries overall. We need to show that ∆Φ ≤ −2. Assume first that ¬q is the truth for i′ and we thus decreased T by (c − 1). If q is the truth for i, we have ∆Φ ≤ −(1 + 1/c) + 0 − (1/c)(c − 1) − 0 = −2. If q is a lie for i, we have ∆Φ ≤ (1 + 1/c) + 0 − (1/c)(c − 1) − (2c + 2) = −2 − 2c + 2/c ≤ −2. Now assume that ¬q is a lie for i′ and we thus increased L by (c − 1). Then ∆Φ ≤ (1 + 1/c) + (2 + 1/c)(c − 1) + 0 − (2c + 2) = −2.

Theorem 3.

For every c ∈ N, no algorithm operating on the search tree can find e with fewer than log n + ck_lies queries in general.†

Proof.

We consider the behavior of the algorithm on the search tree for large values of n that are powers of two. We split the queries of the algorithm into phases, where phase p starts as soon as a node of depth (c + 1) · p is queried for the first time, starting with phase 0. We take the perspective of an adversary and specify the outcome to each query, ensuring that at most one lie is invested in each phase, and at most k_lies lies overall. Note that we do not have to decide immediately whether a query outcome is truthful and neither do we have to fix the position of e a priori.

Consider a fixed phase p. The first query of the phase to node i of depth (c + 1) · p yields ‘<’, all subsequent queries to positions smaller (larger) than i yield ‘>’ (‘<’). If the algorithm queries more than once a node of depth (c + 1) · p in the first c + 1 queries of the phase or any node in the left subtree of i, then the phase needs at least c + 2 queries and we do not lie, i.e., e is in the subtree rooted at the leftmost node of depth (c + 1)(p + 1) in the right subtree of i. Otherwise, we lied for the query to node i and e is in the subtree rooted at the rightmost node i′ of depth (c + 1)(p + 1) in the left subtree of i. Since no node in the left subtree of i has been queried yet, the algorithm needs an additional c queries to reach i′, for a total of 2c + 1 queries in the phase. Once all lies have been used up, we continue answering queries as before, and each phase trivially needs at least c + 1 queries.

Observe that querying every node on the path from node n/2 to e requires exactly log n queries, or c + 1 queries per phase (except maybe a last, partial phase). Now if we use up all k_lies lies, then there are k_lies phases that need c additional queries each, for a total of log n + ck_lies, as claimed. Otherwise, let P(n) > ⌊log n / (c + 1)⌋ − k_lies be the number of phases in which we did not lie. Each such phase needed c + 2 queries instead of c + 1. Overall, we have more than log n + P(n) queries. Since P(n) is unbounded with growing n while k_lies and c are constant, we have log n + P(n) ≥ log n + ck_lies for n large enough, as claimed.

Note that the construction in the proof of Theorem 3 can be applied without change to k_faults, since the adversary never gives conflicting replies. As a consequence, we immediately obtain a lower bound for k_faults.

Corollary 1.

For every $c \in \mathbb{N}$, no algorithm operating on the search tree can find $e$ with fewer than $\log n + c\,k_{\mathrm{faults}}$ queries in general.

We show how to translate any algorithm with a performance guarantee with respect to $k_{\mathrm{lies}}$ into an algorithm with the same guarantee for $k_{\mathrm{faults}}$.

Theorem 4.

Let $f : \mathbb{N}^2 \to \mathbb{N}$. If we can find $e$ with $f(n, k_{\mathrm{lies}})$ queries, then we can find $e$ with $f(n, k_{\mathrm{faults}})$ queries.

[Figure 1: diagram relating the disorder parameters; node labels as extracted: $k_{\mathrm{ainv}}$, $k_{\mathrm{bswap}}$, $k_{\mathrm{rbswap}}$, $k_{\mathrm{swap}}$, $k_{\mathrm{aswap}} = k_{\mathrm{inv}}$, $k_{\mathrm{sum}}$, $k_{\mathrm{bmov}}$, $k_{\mathrm{rep}} = k_{\mathrm{mov}} = k_{\mathrm{seq}}$, $k_{\mathrm{max}}$. A solid black path from $k$ to $k'$ means that $k \le c k'$, where $c$ is the product of the edge labels along the path ($c = 1$ for unlabeled paths). If there is no solid black path from $k$ to $k'$, then $k$ cannot be bounded by $c k'$ for any constant $c$. Every arc is proved explicitly in Appendix A (dashed red arcs correspond to unboundedness results), and all other relationships are implied.]

Proof. Assume we have an algorithm that needs $f(n, k_{\mathrm{lies}})$ queries. The difficulty when applying this algorithm for $k_{\mathrm{faults}}$ is that, in the faulty setting, there is no benefit in querying the same elements again. However, we can simulate repeated queries to the same element as follows. Say the algorithm needs to query a previously queried element $i$ with the understanding that the adversary has to pay for lying repeatedly. Let $i'$ be the first unqueried index to the left or to the right of $i$. If no such index exists, we already queried all elements and found $e$, since query($e$) is guaranteed to return the correct result. We query $i'$ instead of $i$. If $i' = e$, we are done. Otherwise, we know that no index in $[i, i']$ contains $e$, and, hence, all these elements are left of $e$ or all of them are right of $e$. Therefore the query to $i'$ is equivalent to another query to $i$ when the adversary has to pay for repeated lies. Every fault can be treated as a lie by the adversary, and we get the claimed bound.
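The redirection step used in the proof of Theorem 4 can be sketched in a few lines. This is only an illustration under assumptions made here: the helper name `nearest_unqueried`, the 0-based indexing, and the representation of queried positions as a Python set are not from the paper, and ties between the left and the right neighbour are broken toward the left, a choice the proof leaves open.

```python
def nearest_unqueried(i, queried, n):
    """Return an unqueried index closest to i (hypothetical helper), or None.

    `queried` is the set of 0-based positions already queried in an array of
    length n; position i itself is assumed to be in `queried`.
    Every index strictly between i and the returned index has been queried,
    which is the property the simulation argument relies on.
    """
    for d in range(1, n):
        for j in (i - d, i + d):  # left neighbour first, then right
            if 0 <= j < n and j not in queried:
                return j
    return None  # every other position has already been queried
```

For example, with positions 2, 3 and 4 already queried in an array of length 8, a repeated query to position 3 would be redirected to position 1.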

4. Searching disordered arrays

In this section, we consider the problem of finding the index $\mathrm{pos}(e)$ of an element $e$ in an array $A$ of length $n = 2^d$, $d \in \mathbb{N}$. In contrast to Section 3, we do not assume $A$ to be sorted but expect all queries to yield correct results. We study a variety of parameters that quantify the disorder of $A$ and provide algorithms and lower bounds with respect to the different parameters. Figure 1 gives the relationship between every pair of parameters. The proofs of these relationships can be found in Appendix A.

We now consider the two parameters $k_{\mathrm{sum}}$ and $k_{\mathrm{max}}$ that quantify the displacement of elements between $A$ and $A^\star$. More precisely, we define $k_{\mathrm{sum}} := \sum_{x \in A} |\mathrm{pos}(x) - \mathrm{rank}(x)|$ and $k_{\mathrm{max}} := \max_{x \in A} |\mathrm{pos}(x) - \mathrm{rank}(x)|$. We first derive bounds in terms of $k_{\mathrm{sum}}$.

Theorem 5.

Every search algorithm needs at least ⌊ log( n/ k sum ) ⌋ + 2 k sum + 1 queries, even if the elements other than e are in the correct relative order. † Proof.

We give a strategy for an adversary to position the elements of the array adaptively, depending on the queries of the search algorithm. The strategy maintains a range $\{l, \ldots, r\}$ of candidate indices for the searched element $e$ that never grows during the course of the strategy. In the beginning, we set $l = 1$ and $r = n$.

In the first phase of the strategy, we maintain the invariant that all queries to indices $i < l$ yield (and yielded) the result $A[i] < e$, and all queries to indices $j > r$ yield $A[j] > e$. Whenever an index $i \in \{l, \ldots, r\}$ is queried, the result depends on whether $\{l, \ldots, i\}$ is larger than $\{i, \ldots, r\}$ or not. In the former case, the query yields $A[i] > e$ and we set $r = i - 1$. In the latter case, it yields $A[i] < e$ and we set $l = i + 1$. The first phase ends after ⌊ log( n/ k sum ) ⌋ − ⌊ log( n/ k sum ) ⌋ ≤ ⌊ log( n/ (2 k sum + 2)) ⌋ queries. At this point, we have $r - l + 1 \ge 2k_{\mathrm{sum}} + 2$, hence there are still at least $2k_{\mathrm{sum}} + 2$ positions left that may contain $e$.

In the second phase of the adversarial strategy, we answer the next $2k_{\mathrm{sum}} + 1$ queries to indices $i \in \{l, \ldots, \lfloor (l+r)/2 \rfloor\}$ with $A[i] < e$ and all queries to $i \in \{\lfloor (l+r)/2 \rfloor + 1, \ldots, r\}$ with $A[i] > e$. Afterwards, at least one unqueried index in $\{l, \ldots, r\}$ remains. It is easy to see that $e$ being in this position is consistent with all queries so far. Overall, the position of $e$ cannot be found with fewer than ⌊ log( n/ k sum ) ⌋ + 2 k sum + 1 queries, as claimed. Moreover, all $<$ answers are left of $>$ answers, allowing all elements other than $e$ to be in correct relative order.

We extract the following corollary from the proof of Theorem 5.

Corollary 2.

There is a constant $c \in \mathbb{N}$, such that for every l > , the adversary can ensure that after $\log(n/l) + c$ queries, an unqueried subarray of length $l$ remains, such that all elements to the left of the subarray are smaller than $e$, while all elements to the right of it are larger than $e$.

We give an algorithm that achieves an optimal number of queries up to an additive gap of $\log k_{\mathrm{sum}} + O(1)$, while being oblivious of the value of $k_{\mathrm{sum}}$.

Theorem 6.

We can find $e$ obliviously using $\log(n/k_{\mathrm{sum}}) + 2k_{\mathrm{sum}} + O(1)$ queries.

Proof. We first perform a regular binary search for $e$, ignoring the fact that we may be misguided by elements being displaced. In $\log n + O(1)$ steps, we find $e$ or an index $i$ with $A[i] < e$ and $A[i+1] > e$. Let $\Delta_i := |\mathrm{pos}(A[i]) - \mathrm{rank}(A[i])|$. We have $\mathrm{rank}(e) > \mathrm{rank}(A[i]) \ge i - \Delta_i$ and $\mathrm{rank}(e) < \mathrm{rank}(A[i+1]) \le i + 1 + \Delta_{i+1}$. With $\mathrm{pos}(e) \in \{\mathrm{rank}(e) - \Delta_e, \ldots, \mathrm{rank}(e) + \Delta_e\}$ and $\Delta_e + \Delta_i + \Delta_{i+1} \le k_{\mathrm{sum}}$, we get

$$\mathrm{pos}(e) \in \{i - k_{\mathrm{sum}} + 1, \ldots, i + k_{\mathrm{sum}}\}.$$

We can search this range obliviously by querying the elements $i, i+1, i-1, i+2, i-2, i+3, \ldots$ in this order, until we find $e$. During the initial binary search, we already queried $\log k_{\mathrm{sum}} + O(1)$ of these elements, hence we need a total number of queries equal to

$$\log n + 2k_{\mathrm{sum}} - \log k_{\mathrm{sum}} + O(1) = \log(n/k_{\mathrm{sum}}) + 2k_{\mathrm{sum}} + O(1).$$

We now turn our attention to the parameter $k_{\mathrm{max}}$.

Theorem 7.

Every search algorithm needs at least $\log(n/k_{\mathrm{max}}) + 3k_{\mathrm{max}} + O(1)$ queries. † Proof.

By Corollary 2, the adversary can ensure without creating inversions that after usinglog n/ k max + O (1) = log n/k max + O (1) queries the element e may still be at any position of anunqueried subarray of length 4 k max . It is therefore suﬃcient to show that ﬁnding e in an arrayof length 4 k max and with maximum displacement k max may take 3 k max − L = { , . . . , k max } contain elements smaller than e , whilethe positions in R = { k max + 1 , . . . , k max } contain elements larger than e . The ﬁrst phaseends when k max − k max − L atthe beginning of the second phase. Otherwise, this is true for R and the argument proceedsanalogously. We now restrict the position of e to deﬁnitely lie in L . The second phase proceedsuntil another k max − L have been queried. All queries to positions in R ∪ { } are answered as before. For the queries to positions in L \ { } we return the inverse answer tobefore.The third phase proceeds until one more position in L is queried, which will contain e .The number of queries up to this point is at least 2( k max −

1) for the ﬁrst phase, at least k max − k max − e are on the left (right) of e while not moving elements by morethan k max positions. We ﬁx a ﬁnal ordering by requiring that the smaller (larger) elementsremain in the same relative order.If position 1 was not queried in phase 2, then there are k max − e and k max − e in L . We have rank( e ) = k max , thus e is displaced by atmost k max positions. All elements smaller than e are displaced by at most k max positions, sincethere are at most k max elements that are greater or equal to e in L . Similarly, all elementslarger than e are displaced by at most k max positions, since there are at most k max − e in L \ { } .If position 1 was queried in phase 2, then there are k max elements smaller than e and k max − e in L . The element in position 1 has rank 1 and all other elements aredisplaced by at most k max positions in L \ { } , as before.To obtain a tight upper bound, we need the following observations. Proposition 1. If A [ i ] > A [ j ] then i ≥ j − (2 k max − .Proof. If A [ i ] > A [ j ] then rank( A [ i ]) ≥ rank( A [ j ]) + 1 by deﬁnition. Using that i ≥ rank( A [ i ]) − k max and rank( A [ j ]) ≥ j − k max , both by assumption, we derive i ≥ rank( A [ i ]) − k max ≥ rank( A [ j ]) + 1 − k max ≥ j − (2 k max − , as claimed. Lemma 1.

For all i we have |{ j < i : A [ j ] > A [ i ] }| ≤ k max and symmetrically |{ j > i : A [ j ] A [ i ] for all l ∈ { , , . . . , k max + 1 } (the symmetrical case can be proven analogously).Let A [ r ] be such that rank( A [ r ]) = j k max +1 − k max = i − k max −

1. We have r < i , since i − rank( A [ r ]) = k max + 1 > k max . Also A [ r ] < A [ j ] for all j ≥ i , since rank( A [ j ]) ≥ j − k max ≥ i − k max > rank( A [ r ]). On the other hand, the number of elements A [ j ] with j < i and A [ j ] ≤ A [ i ] is at most j k max +1 − ( k max + 1) = i − k max −

2, since the number of elements A [ j ]with j < i and A [ j ] > A [ i ] is at least k max + 1 by assumption. But then, the total number ofelements that are smaller than A [ r ] < A [ i ] is at most i − k max −

3. This is a contradiction withrank( A [ r ]) = i − k max − k max . Theorem 8.

We can find $e$ obliviously using $\log(n/k_{\mathrm{max}}) + 3k_{\mathrm{max}} + O(1)$ queries.

Proof. We first use a binary search to find $e$ or an index $i$ with $a_i < e < a_{i+1}$ with $\log n + O(1)$ queries. By Proposition 1, we have $\mathrm{pos}(e) \in W = \{i - 2k_{\mathrm{max}} + 1, \ldots, i + 2k_{\mathrm{max}}\}$. We query the positions in $W$, starting from the center (positions $i$ and $i+1$ don't need to be queried again) and moving to the left whenever the number of queried elements of $W$ larger than $e$ exceeds the number of smaller elements, and moving to the right otherwise. This can be done obliviously, i.e., without knowing $k_{\mathrm{max}}$.

We claim that we are guaranteed to encounter $e$ within $3k_{\mathrm{max}} + 1$ queries. To see this, assume without loss of generality that $e$ is in the left half of $W$. In this case, we claim that we do not query any elements in $\{i + k_{\mathrm{max}} + 2, \ldots, i + 2k_{\mathrm{max}}\}$. For the sake of contradiction, assume we query positions $\{i + 1, \ldots, i + k_{\mathrm{max}} + 2\}$, i.e., at least $k_{\mathrm{max}} + 2$ positions in the right half of $W$. This means that we queried at least $k_{\mathrm{max}} + 1$ elements smaller than $e$ in $W$ before finding $e$, by construction of the algorithm. But then $|\{j > \mathrm{pos}(e) : A[j] < e\}| > k_{\mathrm{max}}$, contradicting Lemma 1.

We can refine this analysis by observing that we already queried at least $\log k_{\mathrm{max}} + O(1)$ positions among $\{i - k_{\mathrm{max}} + 1, \ldots, i + k_{\mathrm{max}}\}$ during the initial binary search.

In this section we consider the number of inversions between elements of the array $A$. More precisely, we define the number of inversions to be $k_{\mathrm{inv}} := |\{i < j : A[i] > A[j]\}|$, and the number of adjacent inversions to be $k_{\mathrm{ainv}} := |\{i : A[i] > A[i+1]\}|$.

We have $k_{\mathrm{sum}}/2 \le k_{\mathrm{inv}} \le k_{\mathrm{sum}}$ (cf. Proposition 13), therefore the results for $k_{\mathrm{sum}}$ (Theorems 5 and 6) carry over to $k_{\mathrm{inv}}$ with a gap of 2.

Corollary 3.

Every search algorithm needs at least $\log(n/k_{\mathrm{inv}}) + 2k_{\mathrm{inv}} + O(1)$ queries †, and we can find $e$ obliviously with $\log(n/k_{\mathrm{inv}}) + 4k_{\mathrm{inv}} + O(1)$ queries.

In general, we cannot hope to obtain results of similar quality for the smaller parameter $k_{\mathrm{ainv}}$. In fact, already for $k_{\mathrm{ainv}} = 1$ any search algorithm needs to query all $n$ elements.

Proposition 2.

For $k_{\mathrm{ainv}} \ge 1$, no algorithm can find $e$ with fewer than $n$ queries.

Proof. Consider the family of arrays that are obtained from $[1, \ldots, n]$ by moving $n$ to an arbitrary position (possibly leaving it in place); all arrays of this form have $k_{\mathrm{ainv}} \le 1$ (the only possible adjacent inversion is between $n$ and the succeeding element). An adversary may use this family to force any search algorithm to query all $n$ positions when searching for $e = n$: The adversary will answer the first $n - 1$ queries with $<$, maintaining that all arrays where $n$ is in any unqueried position are consistent with the answers given so far. (This is easy to see since all other elements are smaller than $e = n$.)

Fortunately, we can do much better if the target $e$ is guaranteed to be in the correct position relative to sorted order, i.e., if $\mathrm{pos}(e) = \mathrm{rank}(e)$. Note that this restriction still allows us to prove a lower bound on the necessary number of queries that is much larger than all preceding results. We will complement this lower bound by a search algorithm that matches it tightly (up to lower-order terms). Both upper and lower bound hinge on the question of how efficiently (in terms of queries) an algorithm can find a good estimate of $\mathrm{rank}(e)$ by querying the array.

Theorem 9.

Every search algorithm needs at least $\sqrt{8 n k_{\mathrm{ainv}}} - o(\sqrt{n k_{\mathrm{ainv}}})$ queries, even if $\mathrm{pos}(e) = \mathrm{rank}(e)$. †

Proof. We describe an adversary that will force any search algorithm to use at least the claimed number of queries, i.e., at least $2\sqrt{2 n k_{\mathrm{ainv}}}$ minus lower order terms. The adversary will not fix the actual contents of the array beforehand but will guarantee throughout that there exists an $n$-element array with at most $k_{\mathrm{ainv}}$ adjacent inversions that is consistent with all queries answered so far. We will consider positions of the underlying array to be numbered $1, \ldots, n$, and assume that $n$ is even for convenience.

At a high level, the adversary aims to place the target $e$ as close to the middle of the array as possible. Accordingly, his standard response to queries to the first half is $<$ whereas it is $>$ for the second half. Eventually, he will be forced to pick a concrete array and position of $e$ that is consistent with all queries. If $e$ is placed in position $x$ in the first half then previous queries between $x$ and the middle have identified elements that are smaller than the target, and that are now found to the right of it. To adhere to the restriction that $\mathrm{pos}(e) = \mathrm{rank}(e)$, the adversary needs to choose an array that has the same number of larger elements to the left of $e$. He needs to place blocks of such elements between positions left of $x$ that are already queried (and contain smaller elements) without causing too many adjacent inversions; these blocks are called hidden blocks in reference to the fact that none of their positions has been queried before. (All of this is symmetric for placement in the second half.)

Let us now give a detailed description of the adversary's strategy. The standard response is $<$ for positions $1, \ldots, n/2$ and $>$ for positions $n/2 + 1, \ldots, n$. The adversary keeps track of the following values: $p \ge 0$ is the smallest value such that position $n/2 - p$ has not been queried yet; $q \ge 0$ is the smallest value such that position $n/2 + 1 + q$ has not been queried yet; $\ell$ is the number of queries that have been made to positions $1, \ldots, n/2 - p - 1$; and $r$ is the number of queries that have been made to positions $n/2 + q + 2, \ldots, n - 1$. Initially we have $p = q = \ell = r = 0$. Observe that $p$ and $q$ will never decrease, but $\ell$ and $r$ may decrease upon increases of $p$ or $q$, respectively. Note that $p + \ell$ and $q + r$ never decrease: E.g., $p + \ell$ counts the queries on $1, \ldots, n/2 - p - 1$ and $n/2 - p + 1, \ldots, n/2$; there is never any previous query for position $n/2 - p$ by choice of $p$. Thus, we also see that $p + q + \ell + r$ is always equal to the total number of queries made so far by the search algorithm.

Concretely, the adversary plans to put the target $e$ either in position $n/2 - p$ in the first half or in position $n/2 + 1 + q$ in the second half of the array. Queries to the first, respectively second, half of the array may force him to abandon the corresponding option. Abandoning the first, say not putting the target in the first half, allows him to continue answering $<$ in the first half, and to answer $>$ in the second half until he needs to commit to a position $n/2 + 1 + q$ in the second half. In other words, the standard response can be continued. Once a standard response would force him to abandon the second option he instead needs to commit to an instantiation that puts the target in the previously still feasible half. At this point, no further queries will be guaranteed, since the current query could in principle be to the position of $e$ in the chosen instantiation. Nevertheless, we show that the number $p + q + \ell + r$ of queries of the search algorithm invested will be sufficiently large at this point. To do so, we give a lower bound on $p + \ell$ for the point when the first half is no longer feasible, and an analogous lower bound on $q + r$ for the second half.

Intuitively, the adversary aims to have the target as close as possible to the center of the array. Of course, queries to positions around $n/2$ will eventually increase the values of $p$ and $q$, which are bounds for how close the target $e$ can be to the center. Since the adversary needs to fulfill $\mathrm{pos}(e) = \mathrm{rank}(e)$, the position of $e$ forces the adversary to choose an array with the correct numbers of smaller and larger elements. In particular, if the adversary chooses for example to place the target $e$ in the current position $n/2 - p$, then all elements in positions $n/2 - p + 1, \ldots, n/2$ are smaller than the target (due to queries that were already answered); these are exactly $p$ elements. Thus, to balance out the numbers the adversary needs to choose an array such that at least $p$ elements larger than the target are in positions $1, \ldots, n/2 - p - 1$ (he can use exactly $p$ and have no additional smaller elements succeeding the target). This is hindered by the fact that $\ell$ queries were already made on this part, and the fact that every maximal block of larger elements in the first half necessarily ends with an adjacent inversion (either with a smaller element or with the target). This will eventually force the adversary to "give up" on one half of the array or (when this happens the second time) to pick a concrete instantiation with the target placed in the second half.

Now, assume that some query is made by the search algorithm. We will discuss in detail what happens for a query to a position in $1, \ldots, n/2$; queries to $n/2 + 1, \ldots, n$ are treated symmetrically. For a query to a position in $1, \ldots, n/2$ the adversary tests whether he can answer with $<$ and still maintain existence of a consistent instantiation that places $e$ in the first half. To this end, pretend that the query is answered $<$, update $p$ and $\ell$, and check whether there is a consistent instantiation with target in position $n/2 - p$ that has at most $k_{\mathrm{ainv}}$ adjacent inversions. Note that the adversary does not need to do this optimally. It suffices to have a strategy that causes the claimed number of queries, and he may give up even though there could still be a consistent instantiation.

From the perspective of a sorted array, placing the target in position $n/2 - p \in \{1, \ldots, n/2\}$ only conflicts with the queries to positions $n/2 - p + 1, \ldots, n/2$ that were already answered with $<$, of which there are exactly $p$. To balance out the total numbers of larger and smaller elements, the adversary checks for an instantiation that puts $k$ blocks of larger elements into positions $1, \ldots, n/2 - p - 1$, surrounded by smaller elements. This causes $k$ adjacent inversions between the last element of a block and the subsequent element. Moreover, one adjacent inversion exists between the target in position $n/2 - p$ and its successor, which must be a smaller element due to queries, barring the trivial case of $p = 0$ where a fully sorted array without adjacent inversions suffices. Thus, the adversary may choose $k := k_{\mathrm{ainv}} - 1$. In the interest of recycling the argument later, we will do the analysis in terms of $k$ and only plug in $k = k_{\mathrm{ainv}} - 1$ at the end.

The adversary looks for $k$ non-overlapping blocks of unqueried positions in $1, \ldots, n/2 - p - 1$ of total size at least $p$; placing $p$ larger elements in such blocks will be consistent with queries answered so far. If such a choice of blocks exists then a valid instantiation would be to place $p$ larger elements in these blocks, surrounded by smaller ones, followed by the target in position $n/2 - p$, followed by $p$ smaller elements, and finally by $n/2$ larger elements. Observe that the adversary can choose the smaller respectively larger elements freely and, hence, there are no adjacent inversions inside blocks of smaller respectively larger elements. Thus, if the $p$ larger elements that are required can be placed in $k$ blocks in positions $1, \ldots, n/2 - p - 1$, then there are at most $k + 1$ adjacent inversions ($k$ from the blocks and one between the target and its successor). With $k \le k_{\mathrm{ainv}} - 1$ the instantiation respects the bound of $k_{\mathrm{ainv}}$ adjacent inversions.

It remains to lower bound $p + \ell$ for the case that the adversary cannot find $k$ non-overlapping blocks of unqueried elements in $1, \ldots, n/2 - p - 1$ of total size at least $p$. Clearly, the total size of any $k$ maximal unqueried blocks in $1, \ldots, n/2 - p - 1$ is then less than $p$. We can relate $n$, $p$, and $k$ by counting the number $F$ of unqueried elements in $1, \ldots, n/2 - p - 1$. Since $\ell$ is the number of queries made to $1, \ldots, n/2 - p - 1$, we have

$$F = \frac{n}{2} - p - 1 - \ell. \tag{1}$$

On the other hand, letting $B_1, \ldots, B_s$ denote the maximal unqueried blocks in $1, \ldots, n/2 - p - 1$, we have

$$F = \sum_{i=1}^{s} |B_i|.$$

For convenience, we explicitly include empty blocks between adjacent queried positions, before position 1 if it is queried, and after position $n/2 - p - 1$ if it is queried, so that the number of blocks is $s = \ell + 1$. Since any $k$ of these blocks have total size less than $p$ we get $|B_1| + \ldots + |B_k| < p$, $|B_2| + \ldots + |B_{k+1}| < p$, and so on (wrapping around indices larger than $s$). Summing up these $s = \ell + 1$ inequalities we get

$$(\ell + 1) \cdot p = s \cdot p > k \cdot \sum_{i=1}^{s} |B_i| = k \cdot F.$$

Together with (1) this yields

$$(\ell + 1) \cdot p > k \cdot F = k \cdot \left(\frac{n}{2} - p - 1 - \ell\right).$$

We bring this inequality into a more convenient form to derive a lower bound for $\ell + p$:

$$(\ell + 1) \cdot p > k \cdot \left(\frac{n}{2} - p - 1 - \ell\right) \;\Leftrightarrow\; (\ell + 1) \cdot p + k \cdot (p + \ell) > k \cdot \left(\frac{n}{2} - 1\right). \tag{2}$$

The left hand side is upper bounded by

$$\left(\frac{p + \ell + 1}{2}\right)^2 + k \cdot (p + \ell) \ge (\ell + 1) \cdot p + k \cdot (p + \ell), \tag{3}$$

since, for any $x, y \ge 0$,

$$\left(\frac{x - y}{2}\right)^2 \ge 0 \;\Rightarrow\; \frac{x^2 - 2xy + y^2}{4} \ge 0 \;\Rightarrow\; \frac{x^2 + 2xy + y^2}{4} \ge xy \;\Rightarrow\; \left(\frac{x + y}{2}\right)^2 \ge xy;$$

we use it with $x = \ell + 1$ and $y = p$ to obtain (3).

Combining (2) with (3) yields

$$\left(\frac{p + \ell + 1}{2}\right)^2 + k \cdot (p + \ell) > k \cdot \left(\frac{n}{2} - 1\right)$$
$$\Leftrightarrow\; \left(\frac{p + \ell + 1}{2}\right)^2 + k \cdot (p + \ell + 1) - k - k \cdot \left(\frac{n}{2} - 1\right) > 0$$
$$\Leftrightarrow\; \left(\frac{p + \ell + 1}{2}\right)^2 + k \cdot (p + \ell + 1) - k \cdot \frac{n}{2} > 0 \tag{4}$$
$$\Leftrightarrow\; (p + \ell + 1)^2 + 4k \cdot (p + \ell + 1) - 2kn > 0. \tag{5}$$

Using $p + \ell + 1 > 0$ we arrive at

$$p + \ell + 1 > -2k + \sqrt{4k^2 + 2kn} \;\Rightarrow\; p + \ell \ge \sqrt{4k^2 + 2kn} - 2k.$$

Thus, if the proposed instantiation is not possible using $k$ blocks of larger elements then we have

$$p + \ell \ge \sqrt{4k^2 + 2kn} - 2k = \sqrt{2kn} - o(\sqrt{kn}).$$
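As a numerical sanity check of the last step, the following sketch assumes that inequality (5) reads $x^2 + 4kx - 2kn > 0$ with $x = p + \ell + 1$ (the form obtained from (4) after multiplying by 4) and searches for the smallest positive integer $x$ satisfying it. The function name and the sample values of $n$ and $k$ are illustrative choices made here.

```python
import math

def smallest_feasible_x(n, k):
    """Smallest integer x >= 1 with x^2 + 4*k*x - 2*k*n > 0 (x stands for p + l + 1)."""
    x = 1
    while x * x + 4 * k * x - 2 * k * n <= 0:
        x += 1
    return x

n, k = 1_000_000, 4
x = smallest_feasible_x(n, k)
# Positive root of the quadratic x^2 + 4kx - 2kn = 0.
root = math.sqrt(4 * k * k + 2 * k * n) - 2 * k
print(x, round(root, 2), round(math.sqrt(2 * k * n), 2))
```

For $n = 10^6$ and $k = 4$ this gives $x = 2821$, just above the root of about $2820.44$ and within $2k$ of $\sqrt{2kn} \approx 2828.43$, in line with the lower-order term in the bound.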
Similarly, if there is no feasible instantiation placing the target at $n/2 + 1 + q$ in the second half of the array then we can prove that $q + r \ge \sqrt{4k^2 + 2kn} - 2k = \sqrt{2kn} - o(\sqrt{kn})$. We give the calculations here for completeness.

Let us check first that the adversary can use the same number $k = k_{\mathrm{ainv}} - 1$ of blocks. In the corresponding instantiation, the positions $n/2 + 1, \ldots, n$ contain (in order) $q$ larger elements, the target in position $n/2 + 1 + q$, and larger elements interspersed by up to $k$ blocks of smaller elements. As before, elements within the group of larger/smaller elements can be assumed to be sorted, thus, adjacent inversions are only possible before the target (if preceded by a larger element), or before a smaller element (if preceded by the target or an element larger than the target). Clearly, we get an adjacent inversion at each of the $k$ blocks, a single inversion between the $q$ larger elements and the target, and no further adjacent inversions. Hence, $k = k_{\mathrm{ainv}} - 1$ blocks can be used.

If placing the target at position $n/2 + 1 + q$ is infeasible then in particular there are no $k$ blocks of unqueried elements in $n/2 + q + 2, \ldots, n$ of total size at least $q$. Again, using that the number $F'$ of unqueried elements in $n/2 + q + 2, \ldots, n$ is equal to $n/2 - q - 1 - r$, but also equal to the total size of the $s' = r + 1$ maximal unqueried blocks $B'_1, \ldots, B'_{s'}$ (including size-zero blocks), we get

$$(r + 1) \cdot q = s' \cdot q > k \cdot \sum_{i=1}^{s'} |B'_i| = k \cdot F' = k \cdot \left(\frac{n}{2} - q - 1 - r\right).$$

This implies

$$(r + 1) \cdot q + k \cdot (q + r) > k \cdot \left(\frac{n}{2} - 1\right).$$

At this point it is obvious that we arrive at the same lower bound for $q + r$ as we had for $p + \ell$, i.e., as claimed above.

Thus, we conclude that if the adversary gets to use instantiations with $k$ hidden blocks (as above) then he can enforce a total of at least

$$p + \ell + q + r \ge 2\sqrt{4k^2 + 2kn} - 4k = 2\sqrt{2kn} - o(\sqrt{kn})$$

queries. For the case of $k = k_{\mathrm{ainv}} - 1$ this gives the lower bound of $2\sqrt{2 k_{\mathrm{ainv}} n} - o(\sqrt{k_{\mathrm{ainv}} n})$, as claimed.

We now describe an algorithm that achieves the optimal number of queries (up to lower order terms). Note that the algorithm requires knowledge of $k_{\mathrm{ainv}}$.

Theorem 10.

We can find $e$ using $\sqrt{8 n k_{\mathrm{ainv}}} + o(\sqrt{n k_{\mathrm{ainv}}})$ queries if $\mathrm{pos}(e) = \mathrm{rank}(e)$.

Proof. For the description of our algorithm it will be convenient to take the array as having positions numbered $0, \ldots, n - 1$. As before, a query for some position $i$ will yield $<$, $>$, or $=$ depending on whether $A[i] < e$, $A[i] > e$, or $A[i] = e$. The assumption that $\mathrm{pos}(e) = \mathrm{rank}(e)$ will be crucial; the algorithm will attempt to get a good estimate for $\mathrm{rank}(e)$ and then query a certain range around the estimated position.

The algorithm first fixes a block size of the form $p = c \cdot \sqrt{n / k_{\mathrm{ainv}}}$ for some constant $c$ that we will fix later, and then queries positions $0, p + 1, 2(p + 1), \ldots$ (the multiples of $p + 1$ below $n$) as well as position $n - 1$. We refer to these positions as the grid. In this way, unqueried blocks of size (at most) $p$ remain, with every block sandwiched between two grid positions. We will refer to these blocks according to query outcomes to the adjacent grid positions: $<>$-blocks, $<<$-blocks, $><$-blocks, and $>>$-blocks. We use $\sharp(xy)$ to denote the number of $xy$-blocks for $x, y \in \{<, >\}$. We assume that the target $e$ is not at a grid position, since otherwise the claimed bound holds trivially.

Intuitively, since there are at most $k_{\mathrm{ainv}}$ adjacent inversions, there can only be a limited number of $>>$-blocks containing elements smaller than $e$ and of $<<$-blocks containing larger elements: Either constellation leads to at least one adjacent inversion inside the block. Moreover, every $><$-block must contain an adjacent inversion, which upper bounds their number by $k_{\mathrm{ainv}}$ (we will give a better bound later). The number of $<>$-blocks is at most equal to the number of $><$-blocks plus one, because there must be a $><$-block somewhere between any two $<>$-blocks.

The algorithm now tries to estimate the position of $e$ in the array. Crucially, because $\mathrm{pos}(e) = \mathrm{rank}(e)$, there must be exactly $\mathrm{pos}(e)$ elements smaller than $e$ in the array. We denote by $q(x)$ the number of queries to grid positions that returned $x$, for $x \in \{<, >, =\}$. We further denote by $\eta(xy)$, for $x, y \in \{<, >\}$, the number of adjacent inversions involving an element of an $xy$-block, including adjacent inversions between an element of the block and an adjacent grid position. We denote by $N(xy)$, for $x, y \in \{<, >\}$, the total number of positions in $xy$-blocks (not counting the queries adjacent to the block).

Observe that $\mathrm{pos}(e)$ is equal to the number of elements smaller than $e$ in the array. We refer to these elements simply as small elements, as opposed to large elements that are larger than $e$. We want to bound the number of small elements in order to obtain a range of positions that our algorithm needs to search.
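The grid and the block bookkeeping described so far can be sketched as follows. This is an illustration under assumptions made here, not the authors' implementation: positions are 0-based, the block size `p` is passed in directly as an integer instead of being computed from $c \cdot \sqrt{n/k_{\mathrm{ainv}}}$, elements are distinct, and $e$ is assumed not to sit on a grid position. The function name and the return format are hypothetical.

```python
from collections import Counter

def classify_blocks(A, e, p):
    """Query a grid with spacing p + 1 and summarise the unqueried blocks."""
    n = len(A)
    # Multiples of p + 1 below n, plus the last position n - 1.
    grid = sorted(set(range(0, n, p + 1)) | {n - 1})
    # Outcome of each grid query: '<' if A[i] < e, '>' otherwise
    # (e on a grid position is excluded by assumption).
    answer = {i: '<' if A[i] < e else '>' for i in grid}
    count = Counter()  # count[xy] plays the role of the number of xy-blocks
    size = Counter()   # size[xy] plays the role of N(xy)
    for left, right in zip(grid, grid[1:]):
        kind = answer[left] + answer[right]
        count[kind] += 1
        size[kind] += right - left - 1  # unqueried positions in between
    q_small = sum(1 for a in answer.values() if a == '<')  # q(<)
    return grid, count, size, q_small
```

On a sorted array of 20 elements with $e = 10$ and $p = 3$, the grid is 0, 4, 8, 12, 16, 19 and there is exactly one $<>$-block, the one containing $e$.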
By definition, we have $q(<)$ small elements in grid positions. We now give upper and lower bounds for the number of small elements in each type of block in terms of $\sharp(xy)$ and $\eta(xy)$. It can be observed that upper and lower bound are attained by blocks not containing $e$, hence we will tacitly ignore this case.

$<<$-blocks: The maximum number $N(<<)$ of smaller elements in $<<$-blocks is attained if there are no adjacent inversions. If there is at least one adjacent inversion in a $<<$-block, it can have anywhere between 0 and block size (at most $p$) small elements: If the large elements form a single block then there is exactly one adjacent inversion between the last element of the block and the succeeding element (possibly a grid position). Overall, the number of small elements in $<<$-blocks is between $N(<<) - \eta(<<) \cdot p$ and $N(<<)$.

$>>$-blocks: The minimum number of small elements in $>>$-blocks is attained if there are no adjacent inversions; in this case there are no smaller elements in these blocks. A $>>$-block with at least one adjacent inversion can contain any number between 0 and the block size (at most $p$) of small elements: A consecutive block of small elements causes a single adjacent inversion at its start. Overall, the number of small elements in $>>$-blocks is between 0 and $\eta(>>) \cdot p$.

$><$-blocks: Every $><$-block contains at least one adjacent inversion. In such a block, even without any further adjacent inversions, we can have between 0 and the block size (at most $p$) smaller elements: The block may contain all large elements followed by all small ones, and have a single adjacent inversion between the last large and first small element. Overall, the number of small elements in $><$-blocks is between 0 and $\sharp(><) \cdot p$. Since there is at least one adjacent inversion per $><$-block we have $\sharp(><) \le \eta(><)$ and the upper bound becomes $\eta(><) \cdot p$.

$<>$-blocks: In $<>$-blocks we can have any number of small elements followed by large ones (with the total being at most the block size $p$) without any adjacent inversions. We noted already that the number of these blocks is at most $\sharp(><) + 1$. Overall, the number of small elements in $<>$-blocks is between 0 and $\sharp(<>) \cdot p \le (\sharp(><) + 1) \cdot p \le (\eta(><) + 1) \cdot p$.

In total, we get that there are at least

$$q(<) + N(<<) - \eta(<<) \cdot p$$

and at most

$$q(<) + N(<<) + \eta(>>) \cdot p + \eta(><) \cdot p + (\eta(><) + 1) \cdot p$$

elements that are smaller than $e$. Since $\mathrm{pos}(e)$ is equal to the number of small elements in the array, the gap (plus one) between these two bounds is the number of positions that the algorithm has to query in order to find $e$ or be sure that it is not present. The gap (difference) is upper-bounded by

$$\big(q(<) + N(<<) + \eta(>>) \cdot p + \eta(><) \cdot p + (\eta(><) + 1) \cdot p\big) - \big(q(<) + N(<<) - \eta(<<) \cdot p\big) + 1$$
$$= \eta(>>) \cdot p + \eta(><) \cdot p + (\eta(><) + 1) \cdot p + \eta(<<) \cdot p + 1$$
$$= \eta(>>) \cdot p + \eta(<<) \cdot p + 2\eta(><) \cdot p + p + 1.$$

Since $\eta(>>) + \eta(<<) + \eta(><) + \eta(<>) \le k_{\mathrm{ainv}}$ and since all values are non-negative, this expression is maximized for $\eta(><) = k_{\mathrm{ainv}}$ and $\eta(<>) = \eta(<<) = \eta(>>) = 0$, i.e., if there are no adjacent inversions inside $<<$-blocks, $>>$-blocks, and $<>$-blocks. We get a range of at most $(2k_{\mathrm{ainv}} + 1) \cdot p + 1$ positions to search, which coincides with the claim of the theorem, since $p = c\sqrt{n / k_{\mathrm{ainv}}}$. Unfortunately, while the algorithm knows $k_{\mathrm{ainv}}$ and can thus compute the number of elements in the range it has to search, it does not know exactly where the range starts or ends because it does not have access to the values of $\eta$ for each block and cannot compute the value of the upper or lower bound.

Since the search range is maximized when all inversions fall in $><$-blocks it makes sense to refine our initial grid in order to get rid of $><$-blocks altogether.
We can do this by running a binary search on each $><$-block to find an adjacent inversion in at most $1 + \log p$ queries. To do this, we simply query the center element and recurse on the left subblock if it is small and on the right subblock if it is large, until we are left with a subblock containing a large element followed by a small one. By extending our initial grid by the additional query positions, we replace each $><$-block by some number of $>>$-blocks, an empty $><$-block (the adjacent inversion), and some number of $<<$-blocks. We update the values of $\sharp$, $N$, $\eta$ and $q$ accordingly.

Overall, our algorithm spends at most $\sharp(><) \cdot (1 + \log p)$ queries on the refinement of the grid, resulting in all $><$-blocks being empty. Thus, we get tighter bounds for the candidate range for $\mathrm{pos}(e)$ by setting the block size of $><$-blocks to 0 (instead of $p$), which eliminates the term $\eta(><) \cdot p$. Note that while we introduced additional $<<$-blocks and $>>$-blocks and may thus have increased $\eta(<<)$ and $\eta(>>)$, the additional blocks contain fewer than $p$ elements. Nevertheless, we can use the generous bound of $p$ for the size of all blocks, since the block size only appears negatively in the lower bound and positively in the upper bound. We retain the lower bound of $q(<) + N(<<) - \eta(<<) \cdot p$ and get a new upper bound of

$$q(<) + N(<<) + \eta(>>) \cdot p + (\eta(><) + 1) \cdot p.$$

Using that $-k_{\mathrm{ainv}} \le -\eta(<<) \le 0 \le \eta(>>) + \eta(><) \le k_{\mathrm{ainv}}$ we conclude that the number of smaller elements is lower bounded by

$$q(<) + N(<<) - k_{\mathrm{ainv}} \cdot p$$

and upper bounded by

$$q(<) + N(<<) + k_{\mathrm{ainv}} \cdot p + p.$$

Since the bounds depend only on values that are known to the algorithm, it can simply query all positions in this range. Since $\mathrm{pos}(e) = \mathrm{rank}(e)$, if the target $e$ is contained in the array then it must be in the position that equals the number of smaller elements. Thus, it suffices to query the $2k_{\mathrm{ainv}} p + p + 1$ elements between the above bounds.
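The refinement of a single $><$-block can be sketched as a binary search. The code below is an illustration rather than the authors' implementation: it assumes 0-based positions, distinct elements, and that `lo` and `hi` are already-queried positions with `A[lo] > e` and `A[hi] < e`; the function name and the return convention are chosen here for illustration.

```python
def refine_block(A, e, lo, hi):
    """Binary search between a large position lo and a small position hi.

    Returns a pair (result, queried) where result is ('found', i) if e is
    hit at position i, or ('inversion', t) with A[t] > e > A[t + 1], and
    queried lists the newly queried positions in order.
    """
    assert lo < hi and A[lo] > e > A[hi]
    queried = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queried.append(mid)
        if A[mid] == e:
            return ('found', mid), queried
        if A[mid] < e:
            hi = mid  # centre is small: recurse on the left subblock
        else:
            lo = mid  # centre is large: recurse on the right subblock
    return ('inversion', lo), queried
```

On the array `[9, 8, 7, 1, 2, 3]` with `e = 5`, the search queries positions 2 and 3 and reports the adjacent inversion between them. Each iteration halves the distance between `lo` and `hi`, which is where the logarithmic query count per block comes from.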
Note that small savings are possible here because some of the positions have been queried previously, but we will not analyze this.

Thus, the overall number of queries needed to establish the initial grid, to refine it, and to sweep the candidate range for $\mathrm{pos}(e)$ is at most
$$\frac{n}{p+1} + 2 + \sharp(><) \cdot (1 + \log p) + 2k_{\mathrm{ainv}} \cdot p + p + 1 = \frac{n}{p+1} + 2k_{\mathrm{ainv}} \cdot p + o(k_{\mathrm{ainv}}\, p), \tag{6}$$
where $\frac{n}{p+1} + 2$ upper bounds the number of queries needed to establish the initial grid that partitions the array into unqueried blocks of size at most $p$ each. Rounding up slightly, we are left with choosing $c$ in $p = c \cdot \sqrt{n/k_{\mathrm{ainv}}}$ in order to minimize
$$\frac{n}{p+1} + 2k_{\mathrm{ainv}} \cdot p \le \frac{n}{p} + 2k_{\mathrm{ainv}} \cdot p. \tag{7}$$
In other words, after plugging in $p = c \cdot \sqrt{n/k_{\mathrm{ainv}}}$, we seek $c$ that minimizes
$$\frac{n}{c \cdot \sqrt{n/k_{\mathrm{ainv}}}} + 2k_{\mathrm{ainv}} \cdot c \cdot \sqrt{\frac{n}{k_{\mathrm{ainv}}}} = \frac{1}{c}\sqrt{nk_{\mathrm{ainv}}} + 2c \cdot \sqrt{nk_{\mathrm{ainv}}} = \left(\frac{1}{c} + 2c\right) \cdot \sqrt{nk_{\mathrm{ainv}}}.$$
Choosing $c = \sqrt{1/2}$ yields $\frac{1}{c} + 2c = 2\sqrt{2}$, for a total of $2\sqrt{2} \cdot \sqrt{nk_{\mathrm{ainv}}} + o(\sqrt{nk_{\mathrm{ainv}}})$ queries for finding the target $e$.

In this section, we consider parameters that bound the number of elementary array modifications needed to sort the given array $A$. More precisely, a replacement is the operation of replacing one element with a new element, and we let $k_{\mathrm{rep}}$ be the (minimum) number of replacements needed to obtain a sorted array. A swap is the exchange of the content of two array positions, and $k_{\mathrm{swap}}$ is the number of swaps needed to sort $A$. We let $k_{\mathrm{aswap}}$ be the number of swaps of pairs of neighboring elements needed to sort $A$. A move is the operation of removing an element and re-inserting it after a given position $i$, shifting all elements between old and new position by one. We let $k_{\mathrm{mov}}$ be the number of moves needed to sort $A$.

Clearly, starting from a sorted array, we can move $e$ to any position, without using more than a single move or swap, or two replacements involving $e$. To find $e$ we then have to query the entire array.

Proposition 3.

For $k_{\mathrm{mov}} \ge 1$, $k_{\mathrm{swap}} \ge 1$, or $k_{\mathrm{rep}} \ge 2$, no algorithm can find $e$ with fewer than $n$ queries in general.

We can obtain significantly improved bounds if the element $e$ remains at its correct position relative to the sorted array. Recall that we can interpret $k_{\mathrm{faults}}$ as a measure of disorder via
$$k_{\mathrm{faults}}(e) = \bigl|\{\, i : (i < \mathrm{pos}(e) \wedge A[i] > e) \vee (i > \mathrm{pos}(e) \wedge A[i] < e) \,\}\bigr|.$$

Lemma 2. If $\mathrm{rank}(e) = \mathrm{pos}(e)$, then $k_{\mathrm{faults}}(e) \le \min\{2k_{\mathrm{rep}}, 4k_{\mathrm{swap}}, 2k_{\mathrm{mov}}\}$.

Proof. Consider an array $A$ with $k_{\mathrm{rep}} = k$. We get a modified array $A'$ with $k_{\mathrm{rep}} = k'$ by switching out all elements smaller than $e$ in $A$ with a common element $e_< < e$ and all larger elements by a common element $e_> > e$; below we use $e_< = 0$ and $e_> = 2e$. Let $r_j = (i_j, e_j)$, $j \in \{1, \dots, k\}$, be $k$ replacements that transform $A$ into a sorted array. We define $r'_j = (i_j, 0)$ if $e_j < e$ and $r'_j = (i_j, 2e)$ otherwise. Clearly, the replacements $r'_j$, $j \in \{1, \dots, k\}$, transform $A'$ into a sorted array, and, hence, $k' \le k$. Let $m$ be the number of entries left of $e$ that contain $2e$. Since $\mathrm{rank}(e) = \mathrm{pos}(e)$, we have that $m$ is also the number of entries equal to $0$ right of $e$. It is clear that $k' \ge m$, since we have $m$ disjoint pairs of elements in the wrong relative order that need to be repaired by replacing at least one of the two. On the other hand, in both $A$ and $A'$, we have $k_{\mathrm{faults}}(e) = 2m \le 2k' \le 2k$.

Now assume $A$ has $k_{\mathrm{swap}} = k$ and let $s_j = (i_j, i'_j)$, $j \in \{1, \dots, k\}$, be the $k$ swaps (given by the indices of the swapped elements) that transform $A$ into a sorted array. Clearly, the same sequence of swaps turns $A'$ into a sorted array, and, hence, $k' \le k$, where $A'$ has $k_{\mathrm{swap}} = k'$. As before, we have $m$ disjoint pairs of elements in the wrong relative order that need to be repaired by switching at least one of the two elements. Each switch can repair two such pairs, and we thus have $k' \ge m/2$. In both $A$ and $A'$, we have $k_{\mathrm{faults}}(e) = 2m \le 4k' \le 4k$.

Similarly, the transformation from $A$ to $A'$ does not increase $k_{\mathrm{mov}}$. Again, we have $m$ disjoint pairs of elements in the wrong relative order that need to be repaired by moving at least one of the two. Hence, $k_{\mathrm{faults}}(e) = 2m \le 2k_{\mathrm{mov}}$.

With this lemma, we can translate the upper bounds of any algorithm for $k_{\mathrm{lies}}$. Before we do, we introduce another measure of disorder that turns out to be closely related to $k_{\mathrm{mov}}$. We define the parameter $k_{\mathrm{seq}}$ to be such that $n - k_{\mathrm{seq}}$ is the length of a longest nondecreasing subsequence in $A$. It turns out that $k_{\mathrm{mov}} = k_{\mathrm{seq}}$ (cf. Proposition 5), and we can thus include this parameter in our upper bound.

Theorem 11.

Let $f\colon \mathbb{N}^2 \to \mathbb{N}$. If $\mathrm{rank}(e) = \mathrm{pos}(e)$, and we can find $e$ with $f(n, k_{\mathrm{lies}})$ queries, then we can find $e$ obliviously with $\min\{f(n, 2k_{\mathrm{rep}}), f(n, 4k_{\mathrm{swap}}), f(n, 2k_{\mathrm{mov}}), f(n, 2k_{\mathrm{seq}})\}$ queries.

We can also carry over the lower bound from Corollary 1.

Corollary 4.

For every $c \in \mathbb{N}$ and $\mathrm{pos}(e) = \mathrm{rank}(e)$, no algorithm operating on the search tree can find $e$ with fewer than $\log n + ck$ queries in general, for $k \in \{k_{\mathrm{rep}}, k_{\mathrm{swap}}, k_{\mathrm{mov}}, k_{\mathrm{seq}}\}$. †

Proof. Because of $\mathrm{pos}(e) = \mathrm{rank}(e)$, we group all indices in $A$ with at least one wrong query into $m$ disjoint pairs with one element left of $e$ and one element right of $e$ in each pair. Starting from a sorted array, we can clearly produce the array $A$ by swapping each pair. This requires $k_{\mathrm{faults}}/2$ swaps or $k_{\mathrm{faults}}$ element moves or replacements. We get $k_{\mathrm{faults}} \ge \max\{k_{\mathrm{rep}}, 2k_{\mathrm{swap}}, k_{\mathrm{mov}}\}$. Corollary 1 together with $k_{\mathrm{mov}} = k_{\mathrm{seq}}$ (Proposition 5) thus implies the claim.

Finally, we immediately obtain bounds for $k_{\mathrm{aswap}}$ from Corollary 3, because $k_{\mathrm{aswap}} = k_{\mathrm{inv}}$ (cf. Proposition 6).

Corollary 5. Every search algorithm needs at least $\log n/k_{\mathrm{aswap}} + 2k_{\mathrm{aswap}} + O(1)$ queries †, and we can find $e$ obliviously with $\log n/k_{\mathrm{aswap}} + 4k_{\mathrm{aswap}} + O(1)$ queries.

In this section we consider the parameters $k_{\mathrm{bswap}}$, $k_{\mathrm{rbswap}}$, and $k_{\mathrm{bmov}}$, which bound the number of block edit operations needed to sort $A$. A block is defined to be a subarray $A[i, i+1, \dots, j]$ of consecutive elements. A block swap is the operation of exchanging a subarray $A[i, \dots, j]$ with a subarray $A[i', \dots, j']$ and vice versa, where $i < j < i' < j'$. Note that a block swap may affect the positions of other elements if the two blocks are of different sizes. The parameter $k_{\mathrm{bswap}}$ bounds the number of block swaps needed to sort $A$. For $k_{\mathrm{rbswap}}$ we only allow block swaps restricted to pairs of blocks of equal sizes. Finally, for $k_{\mathrm{bmov}}$ one of the two blocks must be empty, i.e., only block moves are allowed. For all three parameters one can easily prove that, without further restrictions, search algorithms need to query all positions of an array to find the target.

Proposition 4.

For $k_{\mathrm{bswap}} \ge 1$, $k_{\mathrm{rbswap}} \ge 1$, or $k_{\mathrm{bmov}} \ge 1$, no algorithm can find $e$ with fewer than $n$ queries.

Proof. For $k_{\mathrm{bmov}}$ and $k_{\mathrm{bswap}}$ consider the family of arrays obtained from $[1, \dots, n]$ by moving $n$ to an arbitrary position; arrays of this form have $k_{\mathrm{bmov}} = k_{\mathrm{bswap}} \le 1$. An adversary may answer the first $n - 1$ queries with "<" while maintaining that placing $e = n$ in any of the unqueried positions is consistent with all given answers.

For $k_{\mathrm{rbswap}}$ consider the family of arrays obtained from $[1, \dots, n]$ by swapping $n$ with any element (or keeping it in place); such arrays have $k_{\mathrm{rbswap}} \le 1$. An adversary may again answer the first $n - 1$ queries with "<" while maintaining that placing $e = n$ in any unqueried position is consistent with the given answers.

Complementing this lower bound, for all three parameters, an upper bound of $O(\sqrt{nk})$ for finding $e$ when $\mathrm{pos}(e) = \mathrm{rank}(e)$ follows immediately from the results for $k_{\mathrm{ainv}}$ of Section 4.2 and the fact that $k_{\mathrm{ainv}} \le 2k_{\mathrm{bswap}}$ (Proposition 16) and $k_{\mathrm{bswap}} \le \min\{k_{\mathrm{rbswap}}, k_{\mathrm{bmov}}\}$ (Propositions 9 and 11). By inspecting the upper and lower bounds proved for $k_{\mathrm{ainv}}$, and adapting the proofs, we are able to obtain tight leading constants in the upper and lower bounds for $k_{\mathrm{bmov}}$ and $k_{\mathrm{bswap}}$, and leading constants within a factor of $\sqrt{2}$ for $k_{\mathrm{rbswap}}$. First, we adapt the lower bound for $k_{\mathrm{ainv}}$ (Theorem 9) to $k_{\mathrm{rbswap}}$ and $k_{\mathrm{bswap}}$.

Theorem 12.

Every search algorithm needs at least $2\sqrt{2nk_{\mathrm{rbswap}}} - o(\sqrt{nk_{\mathrm{rbswap}}})$ queries to find $e$, even if $\mathrm{pos}(e) = \mathrm{rank}(e)$. †

Proof. We use the same adversary setup as in Theorem 9, but we now need to make sure that the adversary maintains the existence of a suitable instantiation that is at most $k_{\mathrm{rbswap}}$ restricted block swaps away from being sorted. We will show that the same strategy can be used, but with $k = k_{\mathrm{rbswap}}$ hidden blocks.

We only discuss the case that the adversary commits to an instantiation with the target $e$ in the first half of the array and having $k$ blocks of large elements between already queried positions. By the analysis from the proof of Theorem 9 we know that when the adversary commits to putting the target into position $n/2 - p$ in the first half, there exist $k$ non-overlapping blocks of unqueried elements in the first half, with total size at least $p$. The instantiation consists of (in this order) small elements in positions $1, \dots, n/2 - p - 1$, interspersed with $k$ blocks of large elements of total length $p$; the target $e$ in position $n/2 - p$; $p$ small elements in positions $n/2 - p + 1, \dots, n/2$; and $n/2$ large elements in positions $n/2 + 1, \dots, n$. Clearly, this can be turned by $k = k_{\mathrm{rbswap}}$ restricted block swaps into an array with small elements, followed by the target, and followed by large elements. Since the adversary does not need to report the numerical values, the actual numbers can be chosen with the instantiation in such a way that the latter interval is sorted.

The lower bound is thus obtained by plugging $k = k_{\mathrm{rbswap}}$ into the lower bound of $2\sqrt{2kn} - o(\sqrt{kn})$ for $k$ hidden blocks, which yields the claimed lower bound.

A lower bound of $2\sqrt{2nk_{\mathrm{bswap}}} - o(\sqrt{nk_{\mathrm{bswap}}})$ now follows from $k_{\mathrm{bswap}} \le k_{\mathrm{rbswap}}$ (Proposition 9), but a better lower bound is obtained in the following theorem.

Theorem 13.

Every search algorithm needs at least $4\sqrt{nk_{\mathrm{bswap}}} - o(\sqrt{nk_{\mathrm{bswap}}})$ queries to find $e$ even if $\mathrm{pos}(e) = \mathrm{rank}(e)$. †

Proof sketch. Again we use the same adversary setup as in Theorem 9 and this time focus on (unrestricted) block swaps rather than restricted block swaps. To get a stronger lower bound, the adversary will use more hidden blocks, namely $2k_{\mathrm{bswap}} - 2$. The following claim shows that $k$ hidden blocks can be created with only $\lceil \frac{k+1}{2} \rceil$ block swaps. It is formulated for the case of placing $e$ in the second half of an array but the other case is symmetric. Positions are numbered 1 through $n$.

Claim 1.

Let $n \in \mathbb{N}$, and let $\alpha_1, \dots, \alpha_k, \beta_0, \beta_1, \dots, \beta_k \in \mathbb{N}$ with $\sum \alpha_i = q$ and $\sum \beta_i = n/2 - 2q - 1$. There is an array $A$ with $k_{\mathrm{bswap}}(A) \le \lceil \frac{k+1}{2} \rceil$ that contains (in order) the following elements: $n/2$ elements smaller than $e$, $q$ elements larger than $e$, element $e$, and an alternating sequence of blocks of larger and smaller elements of sizes $\beta_0, \alpha_1, \beta_1, \dots, \alpha_k, \beta_k$ (with $\beta_i$ the sizes of blocks of larger elements).

Proof. Start with any sorted array $A'$ that contains element $e$ in position $n/2 + q + 1$. Accordingly, with positions numbered 1 to $n$, array $A'$ has exactly $n/2 + q = n/2 + \sum \alpha_i$ elements that are smaller than $e$ and exactly $n/2 - q - 1 = q + \sum \beta_i$ elements that are larger than $e$. We will construct the desired array $A$ from $A'$ by a sequence of at most $\lceil \frac{k+1}{2} \rceil$ block swaps; we start with $A := A'$.

As a first block swap, exchange the last $q$ small elements (those directly preceding $e$) with the block of $q$ large elements that starts right after the first $\beta_0$ large elements following $e$. In $A$ we now have the following structure: $n/2$ smaller elements, $q$ larger elements, element $e$, $\beta_0$ larger elements, $q$ smaller elements, and $\beta_1 + \dots + \beta_k$ larger elements. All further operations will only be among these final two groups of elements. For convenience, we discuss the remaining operations on the subarray $\hat{A}$ containing only the final $q = \alpha_1 + \dots + \alpha_k$ smaller elements followed by $\beta_1 + \dots + \beta_k$ larger elements. Clearly, operations turning $\hat{A}$ into an alternating sequence of smaller and larger blocks of sizes $\alpha_1, \beta_1, \alpha_2, \dots, \alpha_k, \beta_k$ can also be applied to get $A$ into the required form. We show how to do this with $\lceil \frac{k-1}{2} \rceil$ block swaps.

If $k = 1$ then $\hat{A}$ already has the required form and we use $0 = \lceil \frac{k-1}{2} \rceil$ block swaps; getting $A$ into correct form thus used $1 = \lceil \frac{k+1}{2} \rceil$ block swaps. If $k = 2$ then we need to transform $\alpha_1 + \alpha_2$ small elements followed by $\beta_1 + \beta_2$ large ones into pattern $\alpha_1, \beta_1, \alpha_2, \beta_2$, which can be done by swapping the last $\alpha_2$ small elements with the first $\beta_1$ large ones; in total we use two swaps on $A$.

For $k \ge 3$ we use a single block swap to reduce to the case $k' = k - 2$. We have $\alpha_1 + \dots + \alpha_k$ small elements followed by $\beta_1 + \dots + \beta_k$ large ones, and need to reach pattern $\alpha_1, \beta_1, \alpha_2, \dots, \alpha_k, \beta_k$. We swap $\hat{A}[\alpha_1 + 1, \dots, \alpha_1 + \alpha_k]$ with the block of $\beta_1$ large elements that immediately precedes the final $\beta_k$ elements of $\hat{A}$, i.e., we swap $\alpha_k$ small elements with $\beta_1$ large ones. The result is that $\hat{A}$ now contains (in order) the following blocks: (1) $\alpha_1$ small elements (not moved this time), (2) $\beta_1$ large elements (just swapped), (3) $\alpha_2 + \dots + \alpha_{k-1}$ small elements (not swapped this time, but possibly shifted), (4) $\beta_2 + \dots + \beta_{k-1}$ large elements (not swapped), (5) $\alpha_k$ small elements (just swapped), and (6) $\beta_k$ large elements (never moved).

Observe that the remaining problem now becomes to transform a subarray $\tilde{A}$ with $\alpha_2 + \dots + \alpha_{k-1}$ small elements followed by $\beta_2 + \dots + \beta_{k-1}$ large ones into one with pattern $\alpha_2, \beta_2, \alpha_3, \dots, \alpha_{k-1}, \beta_{k-1}$. This part is situated in $\hat{A}$ between the first $\alpha_1 + \beta_1$ and the last $\alpha_k + \beta_k$ elements, and hence also in $A$, and performing the block swaps on this part does not affect the already correctly placed elements. Thus, overall we need at most $\lceil \frac{k+1}{2} \rceil$ block swaps, as claimed.

Thus, for a lower bound in terms of the number $k_{\mathrm{bswap}}$ of block swaps the adversary can use $k = 2(k_{\mathrm{bswap}} - 1)$ hidden blocks: An array instantiation with the $k$ hidden blocks in the required positions costs the adversary only $\lceil \frac{k+1}{2} \rceil = k_{\mathrm{bswap}}$ block swaps. Using this, we can plug $k = 2(k_{\mathrm{bswap}} - 1)$ into the lower bound of $2\sqrt{2kn} - o(\sqrt{kn})$ and obtain the claimed lower bound of $4\sqrt{nk_{\mathrm{bswap}}} - o(\sqrt{nk_{\mathrm{bswap}}})$.

Now, we directly get a lower bound in terms of the number $k_{\mathrm{bmov}}$ of block moves since, using $k_{\mathrm{bmov}} \le 2k_{\mathrm{bswap}}$, a more efficient search would otherwise violate the lower bound for $k_{\mathrm{bswap}}$.

Corollary 6.

Every search algorithm needs at least $2\sqrt{2nk_{\mathrm{bmov}}} - o(\sqrt{nk_{\mathrm{bmov}}})$ queries to find $e$ even if $\mathrm{pos}(e) = \mathrm{rank}(e)$. †

A matching upper bound for $k_{\mathrm{bswap}}$, in the sense of $4\sqrt{nk_{\mathrm{bswap}}} + o(\sqrt{nk_{\mathrm{bswap}}})$, follows immediately from the fact that $k_{\mathrm{ainv}} \le 2k_{\mathrm{bswap}}$ (Proposition 16). The same bound can be obtained relative to the number of block moves, using $k_{\mathrm{ainv}} \le 2k_{\mathrm{bmov}}$ or $k_{\mathrm{bswap}} \le k_{\mathrm{bmov}}$, but a tight upper bound of $2\sqrt{2nk_{\mathrm{bmov}}} + o(\sqrt{nk_{\mathrm{bmov}}})$ is proved in Theorem 14 below. For the number of restricted block swaps, i.e., swapping only blocks of the same size (never incurring any shifts), we get an upper bound of $4\sqrt{nk_{\mathrm{rbswap}}} + o(\sqrt{nk_{\mathrm{rbswap}}})$ from $k_{\mathrm{bswap}} \le k_{\mathrm{rbswap}}$ (Proposition 9); this is asymptotically tight, but not tight regarding the leading constant.

Corollary 7. We can find $e$ using $4\sqrt{nk_{\mathrm{bswap}}} + o(\sqrt{nk_{\mathrm{bswap}}})$ queries if $\mathrm{pos}(e) = \mathrm{rank}(e)$.

Corollary 8. We can find $e$ using $4\sqrt{nk_{\mathrm{rbswap}}} + o(\sqrt{nk_{\mathrm{rbswap}}})$ queries if $\mathrm{pos}(e) = \mathrm{rank}(e)$.

Theorem 14. We can find $e$ using $2\sqrt{2nk_{\mathrm{bmov}}} + o(\sqrt{nk_{\mathrm{bmov}}})$ queries if $\mathrm{pos}(e) = \mathrm{rank}(e)$.

Footnote (proof of Claim 1): The shifting of elements will not be relevant here but we mention it once to point out that it is not overlooked.

Proof. The idea for the proof is to revisit the upper bound for the number $k_{\mathrm{ainv}}$ of adjacent inversions and observe that only the partition into blocks of elements smaller or greater than $e$ matters (in addition to the single element $e$). Such a partition can have at most $2k_{\mathrm{ainv}} + 2$ blocks because every block of large elements is followed by an adjacent inversion. We can get a similar bound in terms of the number $k_{\mathrm{bmov}}$ of block moves, which allows us to conclude the analogous bound of $2\sqrt{2nk_{\mathrm{bmov}}} + o(\sqrt{nk_{\mathrm{bmov}}})$ for $k_{\mathrm{bmov}}$. (Note that any sequence of $k$ block moves that sorts the given array can be reversed into one turning a sorted array into the given one.)

Claim 2.

Starting from a sorted array containing at most a single copy of $e$, any sequence of at most $k$ block moves gives a partition into at most $2k + 2$ maximal blocks of elements smaller, respectively larger than $e$, and possibly a unit block for the target.

Proof. There is nothing to prove if there are only small or only large elements (but we give the proof independent of the presence of $e$). For convenience, assume that $A$ only contains elements $x$, $y$, and $e$ with $x < e$ and $y > e$ (i.e., possibly many copies of $x$ and $y$, and at most a single copy of $e$). We consider a single block move and show that it increases the number of maximal $x$- or $y$-blocks by at most 2. Concretely, we show that the number of alternations between blocks increases by at most two (where we also count alternations between $x$- or $y$-blocks with $e$).

Say that we have $A = (\dots, a, b, \dots, c, d, \dots, e, f, \dots)$ and we move block $(d, \dots, e)$ between $a$ and $b$, obtaining $A' = (\dots, a, d, \dots, e, b, \dots, c, f, \dots)$. In this paragraph the letters $a, \dots, f$ denote array entries (so $e$ is the fifth of them), and we write "the target" for the searched element; each of $a, \dots, f$ equals $x$, $y$, or the target. We claim that this block move cannot increase the number of block alternations by three. Assume, for contradiction, that it indeed increases this number by three. Since there are only three new adjacencies, it follows that $(a, d)$, $(e, b)$, and $(c, f)$ must be alternations, i.e., $a \ne d$, $e \ne b$, and $c \ne f$. Similarly, we may not have removed alternations (or else the increase is at most two), so $a = b$, $c = d$, and $e = f$. It follows immediately that none of $a, \dots, f$ is equal to the target since that would imply having at least two copies. Accordingly, $a, b, c, d, e, f \in \{x, y\}$ and we will use $\bar{x} = y$ and $\bar{y} = x$, which allows us to replace, e.g., $a \ne d$ by $a = \bar{d}$. Thus, we get six equalities that together yield $a = \bar{d} = \bar{c} = f = e = \bar{b} = \bar{a}$; a contradiction.
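The counting statement of Claim 2 can also be checked empirically. The following Python sketch applies random block moves to a sorted array and counts maximal runs of small, target, and large elements. It is a sanity check under assumptions made here (the helper names `block_move`, `count_blocks`, and `check`, the half-open index convention, and the random sampling are all hypothetical), not part of the proof.

```python
import random


def block_move(a, i, j, t):
    """Remove the block a[i:j] and re-insert it at index t of the remainder."""
    block, rest = a[i:j], a[:i] + a[j:]
    return rest[:t] + block + rest[t:]


def count_blocks(a, e):
    """Number of maximal runs of small (< e), target (== e), large (> e) entries."""
    kinds = [(x > e) - (x < e) for x in a]
    return sum(1 for idx, kind in enumerate(kinds)
               if idx == 0 or kind != kinds[idx - 1])


def check(n, k, trials, seed=0):
    """Return True if no sampled sequence of k block moves exceeds 2k + 3 runs."""
    rng = random.Random(seed)
    e = n // 2                       # target, present exactly once
    for _ in range(trials):
        a = list(range(n))           # sorted start: small, target, large
        for _ in range(k):
            i = rng.randrange(n)
            j = rng.randrange(i + 1, n + 1)
            t = rng.randrange(n - (j - i) + 1)
            a = block_move(a, i, j, t)
        if count_blocks(a, e) > 2 * k + 3:
            return False
    return True
```

The threshold $2k + 3$ counts the unit block of the target in addition to the at most $2k + 2$ blocks of small and large elements.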
It follows that no block move can create more than two additional block alternations, i.e., no block move can increase the number of maximal blocks by more than two.

If $e$ is present then the sorted array has small elements, followed by $e$, followed by large elements; a total of three blocks. This increases to at most $2k + 3$ blocks after $k$ block moves, one of which is $e$. If $e$ is not present then we go from 2 blocks to at most $2k + 2$. This completes the proof of the claim.

It follows that arrays that can be sorted with at most $k_{\mathrm{bmov}}$ block moves have at most $k_{\mathrm{bmov}} + 1$ pairs consisting of a large element followed by a small element: Such pairs can only occur between different blocks, neither of which is the block containing $e$. There are at most $2k_{\mathrm{bmov}} + 2$ other blocks and hence at most $2k_{\mathrm{bmov}} + 1$ alternations between such blocks. Clearly, only every second block alternation can be from a larger to a smaller element, giving the claimed number of at most $k_{\mathrm{bmov}} + 1$ adjacent alternations between an element larger than $e$ and an element smaller than $e$. Alternations of this type are the defining quantity for the algorithm given in Theorem 10 for parameter $k_{\mathrm{ainv}}$; there we used that this number is at most $k_{\mathrm{ainv}}$ since they are a special case of adjacent inversions. This yields the claimed upper bound of $2\sqrt{2nk_{\mathrm{bmov}}} + o(\sqrt{nk_{\mathrm{bmov}}})$, completing the proof.

Conclusion

We presented upper and lower bounds for the worst-case query complexity of comparison-based search algorithms that are robust to persistent and temporary read errors, or are adaptive to partially disordered input arrays. For many cases we gave algorithms that are optimal up to lower order terms. In addition, many of the algorithms are oblivious to the value of the parameter quantifying errors/disorder, assuming the target element is present in the array. In most cases, for small values of $k$, the dependence of our algorithms on the number $n$ of elements is close to $\log n$, with only additive dependency on the number of imprecisions. In other words, these results smoothly interpolate between parameter regimes where algorithms are as good as binary search and the unavoidable worst case where linear search is best possible.

That said, why should one be interested in, e.g., almost tight bounds relative to the number of block moves that take $A$ to a sorted array, as the bounds are far from binary search? The point is that only the total number of comparisons matters, and having a worse function that depends on a (in this case) much smaller parameter value can be favorable to having a much better function of a large parameter value. E.g., after a constant number of block swaps the parameters $k_{\mathrm{max}}$, $k_{\mathrm{sum}}$, etc. may have value $\Omega(n)$ and the guaranteed bound becomes trivial, while running the search algorithm for the case of few block swaps guarantees $O(\sqrt{n})$ comparisons. Similarly, having tight bounds for the various parameters gives us the exact (worst-case) regime for the chosen parameter (in terms of $n$) where a sophisticated algorithm can outperform linear search, or even be as good as binary search.

Despite already having asymptotic tightness, it would be interesting to close the gaps between coefficients of dominant terms in upper and lower bounds for some of the cases. Another question would be to find a different restriction than $\mathrm{pos}(e) = \mathrm{rank}(e)$, i.e., the target being in the correct position relative to sorted order, that avoids degenerate lower bounds of $\Omega(n)$ queries for several parameters. A relaxation to allowing a target displacement of $\ell$ and giving cost in terms of $n$, $k$, and $\ell$ seems doable in most cases, but is unlikely to be particularly insightful. Finally, it seems interesting to study whether randomization could lead to improved algorithms for some of the cases. The analysis of randomized lower bounds requires entirely new adversarial strategies since the adversary must choose an instantiation without access to the random bits of the algorithm.

Acknowledgements.

The authors are grateful to several reviewers for their helpful remarks regarding presentation and pertinent literature references.

A. Relations between measures of array disorder

A.1. Equivalences

Proposition 5. $k_{\mathrm{seq}} = k_{\mathrm{mov}} = k_{\mathrm{rep}}$.

Proof. $k_{\mathrm{seq}} \le k_{\mathrm{mov}}$: If $k_{\mathrm{mov}}$ element moves lead to a sorted array then the at least $n - k_{\mathrm{mov}}$ elements that are not moved must form a sorted subsequence.

$k_{\mathrm{mov}} \le k_{\mathrm{seq}}$: The $k_{\mathrm{seq}}$ elements that are not part of a fixed ordered subsequence of length $n - k_{\mathrm{seq}}$ can be moved to the correct positions in that sequence by $k_{\mathrm{seq}}$ element moves.

$k_{\mathrm{seq}} \le k_{\mathrm{rep}}$: Assume that $k_{\mathrm{rep}}$ element replacements suffice to reach a sorted array. It follows that at least $n - k_{\mathrm{rep}}$ elements are not replaced and their subsequence must be already sorted.

$k_{\mathrm{rep}} \le k_{\mathrm{seq}}$: Let $P$ be a set of $k_{\mathrm{seq}}$ positions such that the subsequence $S$ on the remaining $n - k_{\mathrm{seq}}$ elements is sorted. One can replace the elements in $P$ such that the entire array is sorted.

Proposition 6. $k_{\mathrm{aswap}} = k_{\mathrm{inv}}$.

Proof. If there are any inversions then there is an adjacent inversion, which can be removed by a single swap of the adjacent elements; this lowers the number of inversions by one. On the other hand, each swap of adjacent elements affects only the relative ordering of these two elements and, hence, removes at most one inversion.
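Both equivalences give a direct way to compute the parameters on a concrete array. The sketch below is an illustration under assumptions made here (the function names are hypothetical and the quadratic inversion count is chosen for brevity): it computes $k_{\mathrm{seq}}$ via a longest nondecreasing subsequence, and compares the inversion count $k_{\mathrm{inv}}$ with the number of adjacent swaps performed by a bubble sort.

```python
from bisect import bisect_right


def k_seq(a):
    """n minus the length of a longest nondecreasing subsequence of a."""
    tails = []  # tails[l] = smallest possible last element of such a subsequence of length l + 1
    for x in a:
        pos = bisect_right(tails, x)   # bisect_right allows equal elements
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(a) - len(tails)


def k_inv(a):
    """Number of pairs i < j with a[i] > a[j] (simple quadratic count)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def adjacent_swaps_bubble(a):
    """Sort a copy of a by swapping adjacent inversions; return the swap count."""
    a, swaps, changed = list(a), 0, True
    while changed:
        changed = False
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
                changed = True
    return swaps


example = [3, 1, 2, 5, 4]
```

For `example`, a longest nondecreasing subsequence has length 3 (e.g., 1, 2, 4), so `k_seq(example)` is 2, and the three inversions (3,1), (3,2), (5,4) are removed by exactly three adjacent swaps.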

A.2. Relations by similarity of operations

The following relations hold because some number of operations in terms of the first measure can be used to implement one operation of the second measure.

Proposition 7. $k_{\mathrm{swap}} \le k_{\mathrm{aswap}}$.

Proposition 8. $k_{\mathrm{rbswap}} \le k_{\mathrm{swap}}$.

Proposition 9. $k_{\mathrm{bswap}} \le k_{\mathrm{rbswap}}$.

Proposition 10. $k_{\mathrm{bmov}} \le k_{\mathrm{mov}}$.

Proposition 11. $k_{\mathrm{bswap}} \le k_{\mathrm{bmov}} \le 2k_{\mathrm{bswap}}$.

Proof. Any block move can be implemented by swapping the block with an empty block. Any block swap can be implemented by two block moves.
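The two reductions in the proof of Proposition 11 can be made concrete as follows. This is a minimal sketch with assumptions made here: blocks are given as half-open Python slices `a[i:j]` and `a[i2:j2]` with `i <= j <= i2 <= j2`, and the helper names are hypothetical.

```python
def block_move(a, i, j, t):
    """Remove the block a[i:j] and re-insert it at index t of the remainder."""
    block, rest = a[i:j], a[:i] + a[j:]
    return rest[:t] + block + rest[t:]


def block_swap(a, i, j, i2, j2):
    """Exchange the blocks a[i:j] and a[i2:j2]; elements in between may shift."""
    assert 0 <= i <= j <= i2 <= j2 <= len(a)
    return a[:i] + a[i2:j2] + a[j:i2] + a[i:j] + a[j2:]


def block_swap_via_moves(a, i, j, i2, j2):
    """The same block swap, implemented by two block moves."""
    shift = j2 - i2
    # Move 1: bring the right block in front of the left block.
    b = block_move(a, i2, j2, i)
    # Move 2: the left block now sits at b[i + shift : j + shift];
    # re-insert it behind the middle part a[j:i2].
    return block_move(b, i + shift, j + shift, i + shift + (i2 - j))


a = list(range(10))
swapped = block_swap(a, 1, 3, 6, 9)
moved_right = block_swap(a, 1, 3, 6, 6)   # swap with an empty block = block move
```

Swapping with the empty block `a[6:6]` moves the block `a[1:3]` directly in front of original position 6, which is exactly a block move.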

Proposition 12. $k_{\mathrm{rep}} \le 2k_{\mathrm{swap}}$.

Proof. A swap of two positions in the array can be implemented by two replacements.

A.3. Further relations

Proposition 13. $k_{\mathrm{inv}} \le k_{\mathrm{sum}} \le 2k_{\mathrm{inv}}$.

Proof. $k_{\mathrm{inv}} \le k_{\mathrm{sum}}$: If at least one element is displaced then at least one element $e$ occurs before $\mathrm{rank}(e)$ and at least one element $e'$ occurs after $\mathrm{rank}(e')$. If any element $e$ occurs before $\mathrm{rank}(e)$ then there must be a subsequent element $e'$, i.e., with $\mathrm{pos}(e) < \mathrm{pos}(e')$, that occurs after $\mathrm{rank}(e')$. Consider a pair $e$ and $e'$ of elements with $\mathrm{pos}(e) < \mathrm{rank}(e)$ and $\mathrm{pos}(e') > \mathrm{rank}(e')$ such that all elements $e''$ between $\mathrm{pos}(e)$ and $\mathrm{pos}(e')$ have their correct position $\mathrm{pos}(e'') = \mathrm{rank}(e'')$. (Possibly there are no such elements $e''$, but it should be clear that $e$ and $e'$ can always be found if $k_{\mathrm{sum}} > 0$.) We have $\mathrm{rank}(e) \ge \mathrm{pos}(e')$ since $\mathrm{rank}(e) > \mathrm{pos}(e)$ and the positions between $\mathrm{pos}(e)$ and $\mathrm{pos}(e')$ are already filled with non-displaced elements. Similarly, $\mathrm{rank}(e') \le \mathrm{pos}(e)$. Consider the operation of swapping $e$ and $e'$: This would lower the total displacement by $2(\mathrm{pos}(e') - \mathrm{pos}(e))$ since $\mathrm{rank}(e') \le \mathrm{pos}(e) < \mathrm{pos}(e')$ and $\mathrm{rank}(e) \ge \mathrm{pos}(e') > \mathrm{pos}(e)$. In terms of swaps of adjacent elements the swap of $e$ and $e'$ costs exactly $2(\mathrm{pos}(e') - \mathrm{pos}(e)) - 1$. The relation follows since the number of adjacent swaps is at most the decrease in terms of total displacement.

$k_{\mathrm{sum}} \le 2k_{\mathrm{inv}}$: Recall that $k_{\mathrm{inv}} = k_{\mathrm{aswap}}$. Swapping any two adjacent elements can lower the total displacement by at most two since the elements are moved a total of two positions.

Proposition 14. $k_{\mathrm{max}} \le k_{\mathrm{inv}}$.

Proof. Recall that $k_{\mathrm{inv}} = k_{\mathrm{aswap}}$. Every swap of adjacent elements moves the elements by exactly one position each. Thus, the maximum displacement is lowered by at most one.

Proposition 15. $k_{\mathrm{rep}} \le k_{\mathrm{inv}}$.

Proof. Consider the first element, say $x$, in the sequence that is in an inversion (with some later element). It follows that elements preceding $x$ are not larger than any later element. In particular, the directly preceding element, say $y$, must be strictly smaller than $x$ and not exceed any later element. Now, if some later element is equal to $y$ then replace $x$ by $y$; else, replace it by an arbitrary value that is larger than $y$ but smaller than any later element. Clearly, in both cases all inversions involving $x$ are handled (at least one), proving the bound. (Note that the replacement rules ensure that arrays of unique numbers will retain this property. Setting $x$ to the value of $y$ is only done if needed, i.e., if that value already occurs at least one more time.)

Proposition 16. $k_{\mathrm{ainv}} \le 2k_{\mathrm{bswap}}$.

Proof. This can be proved by analyzing the three different types of block swaps: (i) block moves, i.e., swapping a nontrivial block with an empty block, (ii) swapping two nontrivial, nonadjacent blocks, (iii) swapping two nontrivial, adjacent blocks. Cases (i) and (iii) can be verified to only increase the number of adjacent inversions by at most two, else leading to a simple contradiction. For (ii), assume that we start with $\dots, a, b, \dots, c, d, \dots, a', b', \dots, c', d', \dots$ and swap $b, \dots, c$ with $b', \dots, c'$ to obtain $\dots, a, b', \dots, c', d, \dots, a', b, \dots, c, d', \dots$. Note that only the eight pairs $(a, b)$, $(c, d)$, $(a', b')$, $(c', d')$, $(a, b')$, $(c', d)$, $(a', b)$, and $(c, d')$ matter for upper-bounding the increase in the number of adjacent inversions. If $a > b'$ and $a' > b$ then $a > b$ or $a' > b'$ must hold (depending on $b \ge b'$ or $b < b'$); in other words, if there are adjacent inversions at both $(a, b')$ and $(a', b)$ after the block swap then among $(a, b)$ and $(a', b')$ there was at least one adjacent inversion. Similarly, if $c' > d$ and $c > d'$ then $c > d$ or $c' > d'$ must hold, i.e., if we have inversions at both $(c', d)$ and $(c, d')$ then we had at least one adjacent inversion among $(c, d)$ and $(c', d')$. Thus, the total of adjacent inversions increases by at most two with this block swap.

Proposition 17. $k_{\mathrm{ainv}} \le k_{\mathrm{seq}}$.

Proof. Pick any $k_{\mathrm{seq}}$ positions $P$ such that the subsequence $S$ of the remaining $n - k_{\mathrm{seq}}$ positions is sorted. Clearly, any adjacent inversion must be between elements of $P$ or between an element of $P$ and an element of $S$. Consider any subsequence $x, y_1, \dots, y_p, z$ where $p \ge 1$, $y_1, \dots, y_p \in P$ and $x, z \in S$. If there is the maximum of $p + 1$ adjacent inversions then it follows that $x > y_1 > \dots > y_p > z$, violating that $x \le z$ in the sorted subsequence $S$. Else, there are at most $p$ adjacent inversions incident with $y_1, \dots, y_p \in P$. Thus, overall, we have at most $k_{\mathrm{seq}}$ adjacent inversions.

A.4. Unboundedness results

Here we give pairs of measures such that the second can be unbounded, even if the first is constant. Each such relation is represented by a dashed red arc in Figure 1. Note that for any pair of parameters not connected by a directed path in the figure, unboundedness follows, because any bound would produce some path that contradicts an unboundedness relation.

The first proposition of this type covers the comparison of all parameters other than $k_{\max}$ with $k_{\max}$ since we showed that $k_{\text{ainv}}$ is bounded whenever any other parameter (except $k_{\max}$) is bounded.

Proposition 18.

There exist arrays with $k_{\max} = 1$ and $k_{\text{ainv}} = \Omega(n)$.

Proof. For a given even integer $n$, consider the array $A = [2, 1, 4, 3, \ldots, n, n-1]$. Each element $e$ is (exactly) one position away from $\mathrm{rank}(e)$. However, we find that the array has $\frac{n}{2} = \Omega(n)$ adjacent inversions.

Proposition 19.

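There exist arrays with $k_{\text{swap}} = 1$ and $k_{\max} = \Omega(n)$.

Proof. For a given $n$, consider the array $A = [n, 2, 3, \ldots, n-2, n-1, 1]$. A single swap, namely of $n$ and $1$, suffices to sort the array, but the maximum displacement is $n - 1 = \Omega(n)$.

Proposition 20.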

There exist arrays with $k_{\text{rep}} = 1$ and $k_{\max} = \Omega(n)$.

Proof. For a given $n$, consider the array $A = [n+1, 2, 3, \ldots, n-2, n-1, n]$. A single element replacement, namely $n + 1$ by $1$, suffices to reach a sorted array, but the maximum displacement is $n - 1 = \Omega(n)$.

Proposition 21.

There exist arrays with $k_{\text{rep}} = 1$ and $k_{\text{rbswap}} = \Omega(\log n)$.

Proof. We prove by induction on $d$ that an array $A$ of size $n = 4^d$ with a large element followed by an increasing sequence of smaller elements cannot be sorted with fewer than $d$ restricted block swaps. The claim trivially holds for $d \in \{0, 1\}$. For $d > 1$, consider any shortest possible sequence of restricted block swaps to sort the array. Let $A'$ be the array after executing the first swap only and let $i$ be the index of $A[1]$ in $A'$. If $i \le n/4$, we can apply induction on $A'[i, i+1, \ldots, i + n/4 - 1]$. If $i > n/4$, let $j$ be the largest index with $A[j] = A'[j]$ and let $j'$ be the index of $A[j]$ in $A'$. We can then apply induction on $A'[j', j'+1, \ldots, j' + n/4 - 1]$ to obtain the claimed bound.
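The constructions used in Propositions 18 to 20 are easy to check numerically. The following is a minimal sketch, not code from the paper: the function names are hypothetical, positions are 0-based, elements are assumed to be distinct, and rank(e) is taken to be the index of e in the sorted array. It also brute-forces the bound of Proposition 14 on small permutations.

```python
from itertools import permutations


def max_displacement(a):
    """k_max: largest |position - rank| over all elements (assumes distinct elements)."""
    rank = {value: r for r, value in enumerate(sorted(a))}
    return max(abs(pos - rank[value]) for pos, value in enumerate(a))


def adjacent_inversions(a):
    """k_ainv: number of neighbouring pairs that are out of order."""
    return sum(1 for x, y in zip(a, a[1:]) if x > y)


def inversions(a):
    """k_inv: number of pairs i < j with a[i] > a[j] (quadratic, small inputs only)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


def prop18_array(n):
    """[2, 1, 4, 3, ..., n, n-1] for even n."""
    out = []
    for i in range(1, n, 2):
        out += [i + 1, i]
    return out


def prop19_array(n):
    """[n, 2, 3, ..., n-1, 1]: sorted after swapping the first and last element."""
    return [n] + list(range(2, n)) + [1]


def prop20_array(n):
    """[n+1, 2, 3, ..., n]: sorted after replacing the first element by 1."""
    return [n + 1] + list(range(2, n + 1))


def check_prop14(size):
    """Brute-force check of k_max <= k_inv on all permutations of the given size."""
    return all(
        max_displacement(list(p)) <= inversions(list(p))
        for p in permutations(range(size))
    )


if __name__ == "__main__":
    n = 8
    a18 = prop18_array(n)
    print(max_displacement(a18), adjacent_inversions(a18))  # 1 4
    print(max_displacement(prop19_array(n)))                # 7
    print(max_displacement(prop20_array(n)))                # 7
    print(check_prop14(5))                                  # True
```

For n = 8 the first array has maximum displacement 1 but n/2 = 4 adjacent inversions, while the other two arrays have maximum displacement n - 1 = 7 although one swap or one replacement sorts them.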

Proposition 22.

There exist arrays with $k_{\text{rbswap}} = 1$ and $k_{\text{seq}} = \Omega(n)$.

Proof. For a given even integer $n$, consider the array $A = [\frac{n}{2}, \frac{n}{2}+1, \ldots, n, 1, 2, \ldots, \frac{n}{2}-1]$. Clearly, $A$ can be turned into a sorted array by a single restricted block swap, i.e., it has $k_{\text{rbswap}} = 1$. Its longest sorted subsequence, however, has length $\frac{n}{2}+1$, implying $k_{\text{seq}} = \frac{n}{2}-1 = \Omega(n)$.

Proposition 23. There exist arrays with $k_{\text{ainv}} = 1$ and $k_{\text{bswap}} = \Omega(n)$.

Proof. For a given even integer $n$, consider the array $A = [1, 3, 5, \ldots, n-1, 2, 4, \ldots, n]$. Clearly, $A$ has a single adjacent inversion. To sort the array with block swaps, there needs to be a block swap that increases the distance between consecutive odd/even numbers. Each block swap can increase at most one such distance, hence we need at least n −−