A Framework for Searching in Graphs in the Presence of Errors
Dariusz Dereniowski, Stefan Tiegel, Przemysław Uznański, Daniel Wolleb-Graf
Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Poland
Department of Computer Science, ETH Zürich, Switzerland
Institute of Computer Science, Faculty of Mathematics and Computer Science, University of Wrocław, Poland
Abstract
We consider the problem of searching for an unknown target vertex t in a (possibly edge-weighted) graph. Each vertex-query points to a vertex v and the response either admits that v is the target or provides any neighbor s of v that lies on a shortest path from v to t. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide algorithms for the error-less case and for the independent noise model (where each query independently receives an erroneous answer with known probability p < 1/2 and a correct one with probability 1 − p).

We study this problem with both adversarial errors and the independent noise model. First, we show an algorithm that needs at most log n / (1 − H(r)) queries in case of adversarial errors, where the adversary is bounded in its rate of errors by a known constant r < 1/2. Our algorithm is in fact a simplification of previous work, and our refinement lies in invoking an amortization argument. We then show that our algorithm coupled with a Chernoff bound argument leads to a simpler algorithm for the independent noise model, with a query complexity that is both simpler and asymptotically better than the one of Emamjomeh-Zadeh et al. [STOC 2016].

Our approach has a wide range of applications. First, it improves and simplifies the Robust Interactive Learning framework proposed by Emamjomeh-Zadeh and Kempe [NIPS 2017]. Secondly, performing an analogous analysis for edge-queries (where a query to an edge e returns its endpoint that is closer to the target), we recover (as a special case) a noisy binary search algorithm that is asymptotically optimal, matching the complexity of Feige et al. [SIAM J. Comput. 1994]. Thirdly, we improve and simplify an algorithm for searching in unbounded domains due to Aslam and Dhagat [STOC 1991].

∗ Partially supported by National Science Centre (Poland) grant number 2015/17/B/ST6/01887.
1 Introduction
Consider the following game played on a simple connected graph G = (V, E): Initially, the Responder selects a target v∗ ∈ V. In each round, the Questioner asks a vertex-query by pointing to a vertex v of G, and the Responder provides a reply. The reply either states that v is the target, i.e., v = v∗, or provides an edge incident to v that lies on a shortest path to the target, with ties broken arbitrarily. A specific number of replies can be erroneous (we call them lies). The goal is to design a strategy for the Questioner that identifies v∗ using as few queries as possible.

We remark that this problem is known, among several other names, as Rényi-Ulam games [Ré61, Ula76], noisy binary search, or noisy decision trees [FRPU94, KK07, BH08]. One needs to put some restriction on how often the Responder is allowed to lie. Following earlier works, we focus on the most natural probabilistic model, in which each reply is independently correct with a certain fixed probability.

This problem has interesting applications in noisy interactive learning [Ang87, EK17, KV94, Lit87, Set12]. In general terms, the learning process occurs as a version of the following scheme. A user is presented with some information; this information reflects the current state of knowledge of the system and should take into account earlier interactions with the user (thus, the process is interactive). Then, the user responds, which provides a new piece of data to the system. In order to model such dynamics as our problem, one needs to set some rules: what the information should look like and what is allowed as a valid user response. A crucial element in those applications is the fact that the learning process (reflected by queries and responses) does not require an explicit construction of the underlying graph on which the process takes place. Instead, it is enough to argue that there exists a graph whose vertices reflect the possible states.
Moreover, this graph needs to have the property that a valid user response reveals an edge lying on a shortest path to the state that needs to be determined by the system. Specific applications pointed out in [EK17] are the following. In learning a ranking, the system aims at learning a user's preference list [RJ05, Liu11]. The information presented to the user is some list, and as a response the user swaps two consecutive elements on this list which are in the wrong order with respect to the user's target preference list. Alternatively, the response may reveal which element on the presented list has the highest rank. Both versions of the response turn out to be consistent with our graph-theoretic game over a properly defined graph whose vertex set is the set of all possible preference lists. Another application is learning a clustering, where the user's reply tells the system that in the current clustering some cluster needs to be split (the reply does not need to reveal how) or two clusters should be merged [ABV17, BB08]. Yet another application is learning a binary classifier. The strength that comes from a graph-theoretic modeling of those applications as our game is that, although the underlying graph usually has an exponential number of vertices (for learning a ranking it is l!, where l is the maximum length of the preference list), the number of required queries is asymptotically logarithmic in this size [EKS16, EK17]. Thus, the learning strategies derived from the algorithms in [EKS16] and [EK17] turn out to be quite efficient. We stress that the lies in the query game reflect the fact that the user may sometimes provide incorrect replies.
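The learning-a-ranking reduction can be sketched concretely. The following is our own illustrative sketch, not code from the cited papers: vertices are all l! preference lists, two lists are adjacent when they differ by a swap of two consecutive elements, and a truthful user response moves the shown list one such swap closer to the target. All helper names are our own.

```python
from itertools import permutations

def build_ranking_graph(l):
    """Graph for learning a ranking of l items (illustrative sketch).
    Vertices: all l! preference lists. Edges: pairs of lists differing
    by a swap of two consecutive elements (an adjacent transposition)."""
    verts = list(permutations(range(l)))
    adj = {v: [] for v in verts}
    for v in verts:
        for i in range(l - 1):
            u = list(v)
            u[i], u[i + 1] = u[i + 1], u[i]
            adj[v].append(tuple(u))
    return verts, adj

def user_response(shown, target):
    """A truthful user reply: swap some pair of consecutive elements of the
    shown list that is out of order w.r.t. the target list. Such a swap
    reduces the number of inversions by one, i.e., it moves the shown list
    one step closer to the target in the graph above."""
    rank = {item: pos for pos, item in enumerate(target)}
    for i in range(len(shown) - 1):
        if rank[shown[i]] > rank[shown[i + 1]]:
            u = list(shown)
            u[i], u[i + 1] = u[i + 1], u[i]
            return tuple(u)
    return shown  # the shown list already equals the target
```

The graph is never stored in the actual learning applications; it only needs to exist so that the query-complexity bounds (logarithmic in l!) apply.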
We also note that any improvement of those algorithms, at which we aim in this work, leads to immediate improvements in the above-mentioned applications.

In [EKS16], the authors provide an algorithm with the following query complexity, i.e., the worst-case number of vertex-queries:

  (1/(1 − H(p))) · (log n + O((1/C) · log n + C · log δ⁻¹)), where C = max((1/2 − p)·√(log log n), 1),   (1)

that identifies the target with probability at least 1 − δ, where n is the number of vertices of an input graph, H(p) = −p log p − (1 − p) log(1 − p) is the entropy, and p is the error probability of a query. It is further observed that when p < 1/2 is constant (w.r.t. n), then (1) reduces to log n/(1 − H(p)) + o(log n) + O(log δ⁻¹). However, this complexity deteriorates when 1/2 − p ≪ 1/√(log log n), and then (1) becomes O((1/(1 − H(p))) · (log n + log δ⁻¹)).

In our analysis, we first focus on an adversarial model called linearly bounded, in which a rate of lies r < 1/2 is given at the beginning of the game and the Responder is restricted so that at most r·t lies occur in a game of length t. It turns out that this model is easier to analyze and leads to the following theorem, whose proof is postponed to Section 3.3.

Theorem 1.1.
In the linearly bounded error model, with known error rate r < 1/2, the target can be found in at most log n/(1 − H(r)) vertex queries.

This bound is strong enough to make an improvement in the probabilistic model. By a simple application of a Chernoff bound, we get the following query complexity.
Theorem 1.2.
In the probabilistic error model with error probability p < 1/2, the target can be found using at most

  (1/(1 − H(p))) · (log n + O(√(log n · log δ⁻¹) · log(log n/log δ⁻¹)) + O(log δ⁻¹))

vertex queries, correctly with probability at least 1 − δ.

Simplifying the bound.
For any A, B it holds that √(AB) · log(A/B) ≤ √(AB) · log(AB) = O(A/log A) + O(B log B). We thus derive a query complexity of

  (1/(1 − H(p))) · (log n + o(log n) + O(log δ⁻¹ · log log δ⁻¹)).

Error comparison with [EKS16].
We compare, in the independent noise model, the precise query complexities of [EKS16], i.e., (1), with Theorem 1.2. Observe that

  log n · (1/C) + log δ⁻¹ · C ≥ (C⁻¹ · log n)^(1/2) · (C · log δ⁻¹)^(1/2) = √(log n · log δ⁻¹)

and that log δ⁻¹ · C ≥ log δ⁻¹ (since C ≥ 1). Thus, our bound from Theorem 1.2 asymptotically improves the one in (1) for all ranges of parameters.

Note that the compared bounds are with respect to worst-case strategy lengths. Our bounds can be made smaller in expectation, by a factor of roughly 1 − δ, using the same techniques as in [BH08] and [EKS16]. (For the inequality used in simplifying the bound: if A < B, then √(AB) ≤ B; if B is polynomially smaller than A, then √(AB) · log(AB) = O(A/log A); otherwise, A/log A + B log B ≥ √(AB · log B/log A) = Θ(√(AB) · log(AB)).)

1.2 Our Contribution — Simplified Algorithmic Techniques

The crucial underlying idea behind the algorithm from [EKS16] that reaches the query complexity in (1) is as follows. The algorithm maintains a weight function µ on the vertex set of the input graph G = (V, E) so that, at any given time, µ(v) represents the likelihood that v is the target. Initially, all vertices have the same weight. For a given µ, define the potential of a vertex v to be Φ_µ(v) = Σ_{u∈V} µ(u) · d(v, u), where d(u, v) is the distance between the vertices u and v in G. A vertex q that minimizes this potential function is called a weighted median, or a median for short: q = arg min_{v∈V} Φ_µ(v). The vertex queried in each iteration of the algorithm is a median (ties are broken arbitrarily). After each query, the weights are updated: the weight of each vertex that is compatible with the reply is multiplied by 1 − p, and the weights of the remaining vertices are multiplied by p.

The above scheme for querying subsequent vertices is the main building block of the algorithm that reaches the query complexity in (1).
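One round of this median-based scheme can be sketched in code. The following is our own simplified sketch (not the implementation of [EKS16]): `dist` is a precomputed all-pairs distance matrix, `mu` the current weight vector, `p` < 1/2 is assumed to denote the error probability, and `respond` is a callback producing the reply.

```python
def query_round(dist, mu, p, respond):
    """One round of the multiplicative-weights scheme described above
    (sketch): query a weighted median, then reweight by compatibility."""
    n = len(mu)
    # weighted median: a vertex minimizing Phi(v) = sum_u mu[u] * dist[v][u]
    q = min(range(n), key=lambda v: sum(mu[u] * dist[v][u] for u in range(n)))
    a = respond(q)  # either q itself (a yes-answer) or a neighbor of q
    for u in range(n):
        # u is compatible if u == q on a yes-answer, or if the returned
        # neighbor a lies on a shortest path from q to u
        compatible = (u == q) if a == q else (dist[q][u] == dist[q][a] + dist[a][u])
        mu[u] *= (1 - p) if compatible else p
    return q, a
```

On a path graph, for example, the first queried vertex is an interior (weighted) median, and the half of the path pointed away from the target has its weight multiplied by p.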
However, the analysis of the algorithm reveals a problematic case, namely the vertices that account for at least half of the total weight; call them heavy. On one hand, such vertices are good candidates to contain the target, so they are 'removed' from the graph to be investigated later. However, the need to investigate them in this separate way leads to an algorithm that has three phases, where the first two end by trimming the graph, leaving only the heavy vertices for the next phase. The first two phases are sequences of vertex queries performed on a median. The last phase uses yet another, majority-based technique. The durations of the first two phases are dictated by complicated formulas, which makes the algorithm difficult to analyze and understand.

We propose a simpler algorithm than the one in [EKS16]. In each step, we simply query a median until just one candidate target vertex remains. Our improvement lies in a refined analysis of how such a query technique updates the weights, which has several advantages. It not only leads to a better query complexity but also provides a much simpler proof. Moreover, it results in a better understanding of how querying a median works in general graphs. We point out that this technique is quite general: it can be successfully applied to other query models; the details can be found in the appendix.

Regarding the problem of searching in graphs without errors, many papers have been devoted to trees, mainly because it is a structure that naturally generalizes paths, which represent the classical binary search (see e.g. [LMP01] for search in a path with non-uniform query times). This query model in the case of trees is equivalent to several other problems, including vertex ranking [Der08] or tree-depth [NdM06].
There exist linear-time algorithms for finding optimal query strategies [OP06, Sch89]. A lot of effort has been put into understanding the complexity for trees with non-uniform query times. It turns out that the problem becomes hard for trees [DN06, DKUZ17]. We also refer the reader to works on a closely related query game with edge queries [CJLV12, CKL+
16, Der06, LY01, MOW08]. For general graphs, a strategy that always queries a 1-median (the minimizer of the sum of distances over all vertices) has length at most log n [EKS16].

To shift our attention to searching in graphs with errors, two works have been recently published on probabilistic models [EKS16, EK17]. These models are further generalized in [DMS17] by considering the case of identifying two targets t₁ and t₂, where each answer to a query gives an edge on a shortest path to t₁ with probability p₁ or to t₂ with probability p₂ = 1 − p₁, respectively. Furthermore, there exists a closely related model in which the search is restricted in such a way that each query performed on a vertex v must be followed by a vertex query to one of its neighbors; see [BKR18, HIKN10, HKK04, HKKK08, KK99]. In this context, errors are usually referred to as unreliable advice.

An extensive amount of work has been devoted to searching problems in the presence of lies in a non-graph-theoretic context. The main tool of analysis is the concept of volume introduced by Berlekamp [Ber68]; see also [Cic13, Dep07] for more detailed descriptions. We skip references to the very numerous works that deal with a fixed number of lies, pointing to the surveys in [Cic13, Dep07, Pel02]. For general queries, it is known [RMK+
80] that a strategy of length log n + L log log n + O(L log L) exists, where n is the size of the search space and L is an upper bound on the number of lies. An almost optimal approximation strategy can be found in [Mut94], which is actually given for a more general model of q-ary queries. For the most relevant model in our context, the probabilistic model, we remark on the early works, which bound strategy lengths by O(poly(ε⁻¹) · log n · log δ⁻¹), where p < 1/2 and ε = 1/2 − p, with confidence probability 1 − δ [Asl95, BK93]. A strategy of length O(ε⁻² · (log n + log δ⁻¹)) is given in [FRPU94]. Finally, [BH08] gives the best known bound of (1/(1 − H(p))) · (log n + O(log log n) + O(log δ⁻¹)). We note that we arrive at a strategy matching asymptotically the complexity of [FRPU94] as a by-product of our graph-theoretic analysis (presented in the appendix).

We now introduce the notation regarding the dynamics of the game. We assume an input graph with non-uniform edge lengths, and we denote said lengths by ω(e). We denote by d(u, v) the distance between two vertices u and v, which is the length of a shortest path in G between u and v. We first focus on a simplified error model where the Responder is allowed a fixed number of lies, with the upper bound denoted by L. During the game, the Questioner keeps track of a lie counter ℓ_v for each vertex v of G. The value of ℓ_v equals the number of lies that must have already occurred assuming that v is actually the target v∗. The Questioner will utilize a constant Γ > 1 that will be fixed later. The purpose of this parameter is that we can tune it in order to obtain the right asymptotics. We define the weight µ_t(v) of a vertex v at the end of a round t > 0 as

  µ_t(v) = µ₀(v) · Γ^(−ℓ_v),

where µ₀(v) is the initial weight of v. For subsets U ⊆ V, let µ(U) = Σ_{v∈U} µ(v). For brevity we write µ_t in place of µ_t(V).
For a queried vertex q and an answer v, a vertex u is compatible with the answer if u = v when q = v, or if v lies on a shortest path from q to u. As soon as there is only one vertex v left with ℓ_v ≤ L, the Questioner can successfully identify the target, v∗ = v. We will set the initial weight of each vertex v to be µ₀(v) = 1. Thus, µ₀ = n and µ_T ≥ Γ^(−L) if the strategy length is T.

Based on the weight function µ, we define the potential of a vertex v:

  Φ(v) = Σ_{u∈V} µ(u) · d(v, u).

We write Φ_t(v) to refer to the value of the potential at the end of round t. Any vertex x ∈ V minimizing Φ(x) is called a 1-median. For an edge {v, u}, denote by N(v, u) = {x | d(u, x) + ω({v, u}) = d(v, x)} the set of all vertices to which some shortest path from v leads through u. Thus, N(v, u) consists of the vertices compatible with the answer u when v has been queried. For any S ⊆ V, we write for brevity S̄ = V \ S, and for singletons {v} we further shorten to v̄. We say that a vertex v is α-heavy, for some 0 ≤ α ≤ 1, if µ(v) > α · µ(V). For a queried vertex q, if the answer is q, then such a reply is called a yes-answer; otherwise it is called a no-answer.

Algorithm VERTEX: Vertex queries for a fixed number L of lies.

  for v ∈ V do
    µ(v) = 1
    ℓ_v = 0
  while more than one vertex x ∈ V has ℓ_x ≤ L do
    q = arg min_{x∈V} Φ(x)
    query the vertex q
    for all vertices u not compatible with the answer do
      ℓ_u = ℓ_u + 1
      µ(u) = µ(u)/Γ
  return the only x such that ℓ_x ≤ L

We now formally state the search strategy for a fixed number of lies; see Algorithm VERTEX. We combine our weight function with the idea of querying a 1-median [EKS16]. As announced earlier, it turns out that our bound together with an appropriately selected weight function is strong enough that we do not need the additional stages enhanced with a majority selection used in [EKS16] in order to obtain asymptotic improvements.
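Algorithm VERTEX can be rendered in executable form. The sketch below is our own Python rendering for unweighted graphs; the BFS helper, the first-index tie-breaking for the median, and all names are our choices, not prescribed by the algorithm.

```python
from collections import deque

def all_pairs_dist(adj):
    """BFS from every vertex of an unweighted graph given as adjacency lists."""
    n = len(adj)
    dist = []
    for s in range(n):
        d = [None] * n
        d[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if d[u] is None:
                    d[u] = d[v] + 1
                    q.append(u)
        dist.append(d)
    return dist

def algorithm_vertex(dist, answer, L, Gamma=2.0):
    """Sketch of Algorithm VERTEX for a fixed number L of lies: repeatedly
    query a weighted 1-median and penalize the vertices incompatible with
    the reply, until one candidate with lie counter <= L remains."""
    n = len(dist)
    mu = [1.0] * n   # invariant: mu[v] == Gamma ** (-lies[v])
    lies = [0] * n
    while sum(1 for x in range(n) if lies[x] <= L) > 1:
        # query a weighted 1-median (ties broken by smallest vertex index)
        q = min(range(n), key=lambda v: sum(mu[u] * dist[v][u] for u in range(n)))
        a = answer(q)  # a == q is a yes-answer; otherwise a neighbor of q
        for u in range(n):
            # u is compatible with the reply if u == q on a yes-answer,
            # or if a lies on a shortest path from q to u
            compatible = (u == q) if a == q else (dist[q][u] == 1 + dist[a][u])
            if not compatible:
                lies[u] += 1
                mu[u] /= Gamma
        # (the weight update is redundant with the lie counters, but mirrors
        # the amortized-weight analysis in the text)
    return next(x for x in range(n) if lies[x] <= L)
```

For instance, on a path graph (the classical binary-search setting) with L = 0, this degenerates to querying the weighted midpoint of the remaining candidates; with L > 0 the lie counters allow a bounded number of inconsistent replies.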
We also note that we can easily introduce technical modifications to this strategy by changing the initial weight, the value of Γ, or the stopping condition. We will do this to conclude several results for various error models (see the appendix).

In this subsection we prove the following main technical contribution.
Theorem 3.1.
Algorithm VERTEX finds the target in at most

  log n / log(2Γ/(Γ + 1)) + (log Γ / log(2Γ/(Γ + 1))) · L

vertex queries.

Note that, due to the values of the initial and the final weight, it is enough to argue that the weight decreases on average, i.e., in an amortized way, by a factor of (Γ + 1)/(2Γ) per round. We first handle two cases (see Lemmas 3.2 and 3.3) in which the weight decreases appropriately after a single query. These cases are a no-answer, and a yes-answer when the queried vertex is not 1/2-heavy. In the remaining case, i.e., when the queried vertex q is 1/2-heavy, it is not necessarily true that the weight decreases by the desired factor; this in particular happens in case of a yes-answer to such a query. This case is handled by the amortized analysis: we pair such yes-answers with no-answers to the query on q and show that in each such pair the weight decreases appropriately.

Lemma 3.2.
If Algorithm VERTEX receives a no-answer in a round t + 1, then µ_{t+1} ≤ ((Γ + 1)/(2Γ)) · µ_t.

Proof. Let q be the vertex queried in round t + 1. Assume that the reply is some neighbor v of q. By [EKS16], Lemma 4, we get that µ_t(N(q, v)) ≤ µ_t/2. Moreover, because the lie counter increases by one for all vertices in N̄(q, v) and does not change for all vertices in N(q, v) in round t + 1, it follows that µ_{t+1} = µ_t(N(q, v)) + µ_t(N̄(q, v))/Γ ≤ ((Γ + 1)/(2Γ)) · µ_t.

Lemma 3.3. Suppose that Algorithm VERTEX queries in round t + 1 a vertex q that is not 1/2-heavy. If a yes-answer is received, then µ_{t+1} ≤ ((Γ + 1)/(2Γ)) · µ_t.

Proof. The lie counter increments for each vertex of G except for q and remains the same for q in round t + 1: µ_{t+1}(q) = µ_t(q) and µ_{t+1}(q̄) = µ_t(q̄)/Γ. Since q is not 1/2-heavy at the beginning of round t + 1, µ_t(q) ≤ µ_t/2. Thus, we get µ_{t+1} = µ_t(q) + µ_t(q̄)/Γ ≤ ((Γ + 1)/(2Γ)) · µ_t.

Now we turn to the proof of Theorem 3.1. Consider a maximal interval [t₁, t₂], where t₁ ≤ t₂ are integers, such that there exists a vertex q that is 1/2-heavy in each round t₁, ..., t₂, and q is not 1/2-heavy in round t₂ + 1. Call it a q-interval. Note that t₁ > 1 and q is not 1/2-heavy in round t₁ − 1. We permute the replies given by the Responder in the q-interval to obtain a new sequence of replies as follows. The replies in rounds 1, ..., t₁ − 1 and t₂ + 1 onwards are the same in both sequences. Note that in the interval [t₁, t₂] the number of yes-answers, denote it by p, is smaller than or equal to the number of no-answers. Reorder the replies in the q-interval so that the yes-answers occur in rounds t₁ + 2i for each i ∈ {0, ..., p − 1}. In other words, we pair the yes-answers with no-answers so that a yes-answer in round t₁ + 2i is paired with a no-answer in round t₁ + 2i + 1; we call such two rounds a pair. Following the pairs, the remaining no-answers, if any, follow in rounds t₁ + 2p, ..., t₂.
Perform this transformation as long as a q-interval exists for some q ∈ V. Denote by µ′ the weight function for the new sequence. Denote by t′, if it exists, the minimum integer such that for some vertex v and for each t > t′, v is 1/2-heavy at the end of round t. If no such t′ exists, then let t′ be defined as the number of rounds of the strategy.

We first analyze what happens, in the new sequence, in rounds i and i + 1 that form a pair in an arbitrary q-interval for some vertex q. After such two rounds, the lie counter for q increases by one, and the lie counter of any other vertex increases by at least one. This in particular implies that q is a 1-median throughout the entire q-interval in the new sequence. Moreover, the two replies in these rounds result in a weight decrease by a factor of at least Γ, i.e., µ′_{i+1} ≤ µ′_{i−1}/Γ. Since 1/Γ ≤ ((Γ + 1)/(2Γ))², the overall progress after the pair is as required.

We now prove that for each t ∈ {1, ..., t′ − 1} that does not belong to any pair it holds that

  µ′_{t+1} ≤ ((Γ + 1)/(2Γ)) · µ′_t.   (2)

Recall that for each t ≤ t′ that does not belong to any q-interval, µ′_t(v) = µ_t(v) for each v ∈ V. If the answer to this query is a no-answer, then (2) follows from Lemma 3.2. Lemma 3.2 also applies to no-answers of a q-interval that do not belong to any pair since, as argued above, q is a 1-median throughout the q-interval. If the answer is a yes-answer, then since the queried vertex q is not 1/2-heavy due to the choice of q-intervals, Inequality (2) follows from Lemma 3.3.

If t′ is the last round of the original search strategy, then the proof is completed. Otherwise, consider the suffix of the original sequence of replies, consisting of rounds t for t > t′. In all these rounds, by definition, some vertex q is 1/2-heavy. Also by definition, both sequences µ and µ′ are identical in this suffix.
One can check that if a vertex is 1/2-heavy at the end of some round, then in the subsequent round Algorithm VERTEX does query this vertex. Thus, the vertex q is queried in all rounds of the suffix, and hence q is the target. It is therefore enough to observe how the weight decreases on q̄ in case of a yes-answer in a round t > t′: µ′_t(q̄) = µ′_{t−1}(q̄)/Γ ≤ ((Γ + 1)/(2Γ)) · µ′_{t−1}(q̄). This completes the proof of Theorem 3.1.

We turn our attention to the model with a rate of lies bounded by a fraction r < 1/2 (the linearly bounded error model). Our result, Theorem 1.1, is obtained on the basis of Algorithm VERTEX and the precise bound from Theorem 3.1. In particular, we run Algorithm VERTEX with Γ = (1 − r)/r and with a fixed bound on the number of lies L = (log n/(1 − H(r))) · r. By Theorem 3.1, Algorithm VERTEX then asks at most

  log n / log(2(1 − r)) + (log((1 − r)/r) / log(2(1 − r))) · L = (log n/(1 − H(r))) · (1 − H(r) + r · log((1 − r)/r)) / log(2(1 − r)) = log n/(1 − H(r)) = L/r

queries. This concludes the proof, since the number of lies is within an r fraction of the strategy length.

Let ε > 0 be such that p = (1 − ε)/2. We run the strategy from Theorem 1.1 with an error rate r = (1 − ε′)/2, where ε′ = ε/(1 + √(ln δ⁻¹/ln n)). By Theorem 1.1, the strategy length is Q = log n/(1 − H(r)), which is (up to lower-order terms) 2·ln n/ε′², and we can safely lower bound it by ln n/ε′². The expected number of lies is E[L] = p · Q and, by the Hoeffding bound,

  Pr[Q − L ≤ (1 − r) · Q] ≤ exp(−
(1/2) · (r − p)² · ln n/ε′²) ≤ exp(−((ε − ε′)/ε′)² · ln n) = δ.

Asymptotic properties of entropy function.
We now proceed to bound (1 − H(p))/(1 − H(r)). For this we denote F(x) = 1 − H((1 − x)/2), and denote α = 1/(1 + √(ln δ⁻¹/ln n)), so that ε′ = α · ε. So our goal is in fact to bound F(ε)/F(α · ε).

Lemma 3.4.
For any −1 ≤ x ≤ 1 and α < 1 it holds that F(x)/F(αx) ≤ 1/F(α).

Proof.
Consider G(x) = ln F(exp(x)). It can be verified with calculus that G″(x) ≥ 0. The claim is equivalent to G(ln x) − G(ln α + ln x) ≤ G(0) − G(ln α), which follows from the convexity of G(x).

First assume α ≥ 1/2, so that ln n ≥ ln δ⁻¹. Denote η = 1 − α. We observe that η = O(√(ln δ⁻¹/ln n)). We take the Taylor expansion of 1/F(x) around x = 1, and we have that 1/F(α) = 1 + O(η · ln η⁻¹). In this case the bound is

  Q ≤ (log n/(1 − H(p))) · (1/F(α)) = (log n + O(√(ln δ⁻¹ · ln n) · (1 + ln(ln n/ln δ⁻¹)))) / (1 − H(p)).

In the second case, when α ≤ 1/2, the Taylor expansion around x = 0 gives 1/F(α) = Θ(α⁻²). So in this case the bound is

  Q ≤ (log n/(1 − H(p))) · (1/F(α)) = O(ln δ⁻¹) / (1 − H(p)).

Conclusions
We note that other query models have also been studied in the graph-theoretic context, including edge queries. In an edge query, the Questioner points to an edge and the Responder tells which endpoint of that edge is closer to the target, breaking ties arbitrarily. It turns out that edge queries are more challenging to analyze, i.e., our technique for vertex queries does not transfer without changes. This is mostly due to a possible lack of edges that subdivide the search space equally enough. This issue can be patched by treating heavy vertices in a separate way. We provide a strategy of query complexity O(ε⁻² · Δ log Δ · (log n + log δ⁻¹)). This generalizes the noisy binary search of [FRPU94] to general graphs, and has the advantage of being a weight-based strategy.

We additionally show generalizations of our strategies to searching in unbounded domains, where one is concerned with searching, e.g., the space of all positive integers with comparison queries. The goal is to minimize the number of queries as a function of N, the (unknown) position of the target. By adjusting the initial distribution of the weight to decrease polynomially with the distance from the point 0, we almost automatically obtain the desired solutions for the adversarial models. For the probabilistic error model we present a (slightly more involved) strategy of expected query complexity O(ε⁻² · (log N + log δ⁻¹)), improving over the complexity O(poly(ε⁻¹) · log N · log δ⁻¹) of [Asl95].

References

[ABV17] Pranjal Awasthi, Maria-Florina Balcan, and Konstantin Voevodski. Local algorithms for interactive clustering.
Journal of Machine Learning Research, 18:3:1–3:35, 2017.

[AD91] Javed A. Aslam and Aditi Dhagat. Searching in the presence of linearly bounded errors (extended abstract). In STOC, pages 486–493, 1991.

[Aig96] Martin Aigner. Searching with lies. J. Comb. Theory, Ser. A, 74(1):43–56, 1996.

[Ang87] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1987.

[Asl95] Javed A. Aslam. Noise tolerant algorithms for learning and searching. PhD thesis, Massachusetts Institute of Technology, 1995.

[BB08] Maria-Florina Balcan and Avrim Blum. Clustering with interactive feedback. In ALT, pages 316–328, 2008.

[Ber68] Elwyn R. Berlekamp. Block coding for the binary symmetric channel with noiseless, delayless feedback. In H.B. Mann (ed.), Error-Correcting Codes, pages 61–88. Wiley & Sons, New York, 1968.

[BH08] Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In FOCS, pages 221–230, 2008.

[BK93] Ryan S. Borgstrom and S. Rao Kosaraju. Comparison-based search in the presence of errors. In STOC, pages 130–136, 1993.

[BKR18] Lucas Boczkowski, Amos Korman, and Yoav Rodeh. Searching a tree with permanently noisy advice. In ESA, pages 54:1–54:13, 2018.

[Cic13] Ferdinando Cicalese. Fault-Tolerant Search Algorithms: Reliable Computation with Unreliable Information. Springer Publishing Company, Incorporated, 2013.

[CJLV12] Ferdinando Cicalese, Tobias Jacobs, Eduardo Sany Laber, and Caio Dias Valentim. The binary identification problem for weighted trees. Theor. Comput. Sci., 459:100–112, 2012.

[CKL+16] Ferdinando Cicalese, Balázs Keszegh, Bernard Lidický, Dömötör Pálvölgyi, and Tomás Valla. On the tree search problem with non-uniform costs. Theor. Comput. Sci., 647:22–32, 2016.

[Dep07] Christian Deppe. Coding with feedback and searching with lies. In Imre Csiszár, Gyula O. H. Katona, Gábor Tardos, and Gábor Wiener, editors, Entropy, Search, Complexity, pages 27–70. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

[Der06] Dariusz Dereniowski. Edge ranking of weighted trees. Discrete Applied Mathematics, 154(8):1198–1209, 2006.

[Der08] Dariusz Dereniowski. Edge ranking and searching in partial orders. Discrete Applied Mathematics, 156(13):2493–2500, 2008.

[DGW92] Aditi Dhagat, Péter Gács, and Peter Winkler. On playing "twenty questions" with a liar. In SODA, pages 16–22, 1992.

[DKUZ17] Dariusz Dereniowski, Adrian Kosowski, Przemysław Uznański, and Mengchuan Zou. Approximation strategies for generalized binary search in weighted trees. In ICALP, pages 84:1–84:14, 2017.

[DMS17] Argyrios Deligkas, George B. Mertzios, and Paul G. Spirakis. Binary search in graphs revisited. In MFCS, pages 20:1–20:14, 2017.

[DN06] Dariusz Dereniowski and Adam Nadolski. Vertex rankings of chordal graphs and weighted trees. Inf. Process. Lett., 98(3):96–100, 2006.

[EK17] Ehsan Emamjomeh-Zadeh and David Kempe. A general framework for robust interactive learning. In NIPS, pages 7085–7094, 2017.

[EKS16] Ehsan Emamjomeh-Zadeh, David Kempe, and Vikrant Singhal. Deterministic and probabilistic binary search in graphs. In STOC, pages 519–532, 2016.

[FRPU94] Uriel Feige, Prabhakar Raghavan, David Peleg, and Eli Upfal. Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994.

[HIKN10] Nicolas Hanusse, David Ilcinkas, Adrian Kosowski, and Nicolas Nisse. Locating a target with an agent guided by unreliable local advice: how to beat the random walk when you have a clock? In PODC, pages 355–364, 2010.

[HKK04] Nicolas Hanusse, Evangelos Kranakis, and Danny Krizanc. Searching with mobile agents in networks with liars. Discrete Applied Mathematics, 137(1):69–85, 2004.

[HKKK08] Nicolas Hanusse, Dimitris J. Kavvadias, Evangelos Kranakis, and Danny Krizanc. Memoryless search algorithms in a network with faulty advice. Theor. Comput. Sci., 402(2-3):190–198, 2008.

[KK99] Evangelos Kranakis and Danny Krizanc. Searching with uncertainty. In SIROCCO, pages 194–203, 1999.

[KK07] Richard M. Karp and Robert Kleinberg. Noisy binary search and its applications. In SODA, pages 881–890, 2007.

[KV94] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[Lit87] Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1987.

[Liu11] Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.

[LMP01] Eduardo Sany Laber, Ruy Luiz Milidiú, and Artur Alves Pessoa. On binary searching with non-uniform costs. In SODA, pages 855–864, 2001.

[Lon92] Philip M. Long. Sorting and searching with a faulty comparison oracle. Technical Report UCSC-CRL-92-15, University of California at Santa Cruz, 1992.

[LY01] Tak Wah Lam and Fung Ling Yue. Optimal edge ranking of trees in linear time. Algorithmica, 30(1):12–33, 2001.

[MOW08] Shay Mozes, Krzysztof Onak, and Oren Weimann. Finding an optimal tree searching strategy in linear time. In SODA, pages 1096–1105, 2008.

[Mut94] S. Muthukrishnan. On optimal strategies for searching in presence of errors. In SODA, pages 680–689, 1994.

[NdM06] Jaroslav Nešetřil and Patrice Ossona de Mendez. Tree-depth, subgraph coloring and homomorphism bounds. Eur. J. Comb., 27(6):1022–1041, 2006.

[OP06] Krzysztof Onak and Paweł Parys. Generalization of binary search: Searching in trees and forest-like partial orders. In FOCS, pages 379–388, 2006.

[Pel89] Andrzej Pelc. Searching with known error probability. Theor. Comput. Sci., 63(2):185–202, 1989.

[Pel02] Andrzej Pelc. Searching games with errors—fifty years of coping with liars. Theoretical Computer Science, 270(1):71–109, 2002.

[RJ05] Filip Radlinski and Thorsten Joachims. Query chains: learning to rank from implicit feedback. In SIGKDD, pages 239–248, 2005.

[RMK+80] Ronald L. Rivest, Albert R. Meyer, Daniel J. Kleitman, Karl Winklmann, and Joel Spencer. Coping with errors in binary search procedures. Journal of Computer and System Sciences, 20(3):396–404, 1980.

[Ré61] Alfréd Rényi. On a problem of information theory. MTA Mat. Kut. Int. Kozl., 6B:505–516, 1961.

[Sch89] Alejandro A. Schäffer. Optimal node ranking of trees in linear time. Inf. Process. Lett., 33(2):91–96, 1989.

[Set12] Burr Settles. Active Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2012.

[Ula76] Stanisław M. Ulam. Adventures of a Mathematician. Scribner, New York, 1976.
A More Searching Models

We recall a different format of queries called edge queries, where in each round the Questioner selects an edge {u, v} of an input graph and the Responder replies with the endpoint of {u, v} that is closer to the target. Again, ties are broken arbitrarily. The edge-query model naturally generalizes comparison queries in linearly or partially ordered data. In the case of edge queries we consider unweighted graphs.

For the limitations imposed on the Responder, we distinguish yet another model, called prefix-bounded. In this model, in each prefix of i queries there may be at most r·i lies, 0 ≤ r < 1, and, as opposed to the linearly bounded model, the length of the strategy does not need to be set in advance. It is well known that these error models are not feasible for r ≥ 1/2, even in the case of paths. They bridge the gap between the adversarial model with a fixed number of lies and the probabilistic model. We note that these models naturally reflect processes in potential applications such as communication over a noisy channel or hardware errors, since in such scenarios the errors typically accumulate over time.

In the following sections we show that our generic ideas can be applied to several other models. Our results either match or improve the existing ones, which we point out throughout. We remark that in both cases, i.e., whether we obtain an improvement or arrive at an existing result, we reach that point with a simpler analysis.

B Analysis of the Generic Strategies for Edge Queries
We start by giving the notation regarding edge queries. The degree of a vertex v, denoted by deg(v), is the number of its neighbors in G. We denote by ∆ = max_{v∈V} deg(v) the maximum degree of G. We define an edge-vertex distance d(e, v) = min(d(x, v), d(y, v)) for an edge e = {x, y}. Similarly as for vertex queries, based on a weight function µ and the distance d, we define a potential of an edge e:
  Φ(e) = Σ_{u∈V} µ(u) · d(e, u).
Again, we write Φ_t(e) to refer to this value at the end of round t. Any edge e minimizing Φ(e) is called a 1-edge-median. For an edge e = {u, v} and one of its endpoints v, let
  N(e, v) = {w | d(v, w) ≤ d(u, w)},   N_<(e, v) = {w | d(v, w) < d(u, w)}.
We call a vertex v heavy if µ(v) > µ/(∆ + 1), where µ denotes the total weight Σ_{u∈V} µ(u), and µ_t its value at the end of round t.

For edge queries we give a strategy that is a bit more complicated (see Algorithm EDGE). Intuitively, as opposed to the vertex-query case, there may be no edge in the graph that 'subdivides' the search space evenly enough. This already happens as soon as one of the vertices is heavy. If this is the case, say for a vertex v, we cyclically query the edges incident to v in an appropriate greedy order. We continue to do so until either all other vertices have been eliminated, and hence v must be the target, or v is no longer heavy. If none of the vertices is heavy, we simply query a 1-edge-median; the absence of heavy vertices essentially ensures that this decreases the weight sufficiently.

This results in a more involved proof, given below. Similarly as for vertex queries, we first provide an analysis for a fixed number of lies (see Theorem B.1) and then from this bound we derive appropriate bounds for the other models (Theorems B.2 and B.3).

Theorem B.1.
Let Γ > 1. Algorithm EDGE finds the target in at most
  (log n + L·log Γ) / log(1 + (Γ−1)/(Γ∆+1))
edge queries.

Algorithm EDGE: edge queries for a fixed number L of lies.
  for v ∈ V do µ(v) = 1; ℓ_v = 0
  while more than one vertex x satisfies ℓ_x ≤ L do
    if there exists v such that µ(v) > µ/(∆+1) then          ▷ v is heavy
      for i = 1 to deg(v) do                                  ▷ greedy ordering of neighbors
        select an edge e_i = {v, v_i} incident to v that maximizes µ(∪_{j≤i} N_<(e_j, v_j))
      i = 1
      do                                                      ▷ cyclically query edges incident to v
        query e_i
        for all vertices u not compatible with the answer do ℓ_u = ℓ_u + 1; µ(u) = µ(u)/Γ
        if the answer to the last query is v then i = (i mod deg(v)) + 1
      while µ(v) > µ/(∆+1) and more than one x satisfies ℓ_x ≤ L
    else
      e = argmin_{x∈E} Φ(x)
      query e
      for all vertices u not compatible with the answer do ℓ_u = ℓ_u + 1; µ(u) = µ(u)/Γ
  return the single vertex v with ℓ_v ≤ L

Theorem B.2.
In the linearly bounded error model with error rate r = (1−ε)/(∆+1) for some 0 < ε ≤ 1, the target can be found in at most 2ε^{−2}∆·ln n edge queries.

Theorem B.3.
In the probabilistic error model with error rate p = (1−ε)/2 for some 0 < ε ≤ 1, there exists a strategy that finds the target using at most O(ε^{−2}∆·log ∆·(log n + log δ^{−1})) edge queries, correctly with probability at least 1 − δ.

Proof of Theorem B.1
We first prove two technical lemmas and then we give the proof of the theorem.
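Before turning to the lemmas, it may help to see the quantities they manipulate in executable form. The following Python sketch is ours and not part of the original strategy; it assumes an unweighted graph given as adjacency lists, computes distances by BFS, and implements the potential Φ(e), the choice of a 1-edge-median, and the division by Γ of the weights of vertices incompatible with an answer:

```python
def bfs_dist(adj, s):
    """BFS distances from s in an unweighted graph given as adjacency lists."""
    dist = {s: 0}
    frontier = [s]
    while frontier:
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        frontier = nxt
    return dist

def edge_potential(adj, mu, e):
    """Phi(e) = sum over u of mu(u) * d(e, u), where d(e, u) is the
    distance from u to the closer endpoint of e."""
    dx, dy = bfs_dist(adj, e[0]), bfs_dist(adj, e[1])
    return sum(m * min(dx[u], dy[u]) for u, m in mu.items())

def one_edge_median(adj, mu, edges):
    """An edge minimizing the potential (a 1-edge-median)."""
    return min(edges, key=lambda e: edge_potential(adj, mu, e))

def discount(mu, incompatible, gamma):
    """The update of Algorithm EDGE: divide the weight of every vertex
    that is incompatible with the received answer by gamma."""
    for u in incompatible:
        mu[u] /= gamma
```

For example, on the path 0-1-2-3-4 with unit weights, the two middle edges {1,2} and {2,3} attain the minimum potential Φ = 4, matching the intuition that a 1-edge-median bisects the search space.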
Lemma B.4.
Let Γ > 1. Suppose that Algorithm EDGE queries in round t+1 an edge e_q incident to a vertex q such that e_q = argmin_{x∈E} Φ_t(x). If deg(q) > 1, then
  µ_t(V \ N(e_q, q)) ≥ (1/deg(q)) · (µ_t − µ_t(q)).   (3)

Proof.
Denote e_q = {q, v}. For each neighbor w of q define
  N∩_w = N(e_q, q) ∩ N_<({q, w}, w).
Consider an edge e′ = {q, w} that maximizes µ_t(N∩_w), and let X be the set of neighbors of q. Every vertex of N(e_q, q) \ {q} belongs to N_<({q, w′}, w′) for some neighbor w′ of q (take the first internal vertex of a shortest path from q), and w′ ≠ v because N(e_q, q) ∩ N_<(e_q, v) = ∅, i.e., N∩_v = ∅. Hence
  N(e_q, q) \ {q} ⊆ ∪_{w′∈X} N∩_{w′} = ∪_{w′∈X\{v}} N∩_{w′}.
Hence (since e′ maximizes µ_t(N∩_w)) we obtain that
  µ_t(N∩_w) ≥ (1/(deg(q) − 1)) · (µ_t(N(e_q, q)) − µ_t(q)).   (4)
For brevity, we extend our notation in the following way: for an edge e and a subset S of vertices, Φ_t(e, S) = Σ_{z∈S} µ_t(z)·d(e, z). Note that for any S ⊆ V and any edge e, Φ_t(e) = Φ_t(e, S) + Φ_t(e, V \ S). We obtain
  Φ_t(e′, N(e_q, q)) = Φ_t(e′, N∩_w) + Φ_t(e′, N(e_q, q) \ N∩_w)
   ≤ Σ_{u∈N∩_w} µ_t(u)·(d(q, u) − 1) + Σ_{u∈N(e_q,q)\N∩_w} µ_t(u)·d(q, u)
   = Σ_{u∈N(e_q,q)} µ_t(u)·d(q, u) − µ_t(N∩_w)
   = Φ_t(e_q, N(e_q, q)) − µ_t(N∩_w)
   ≤ Φ_t(e_q, N(e_q, q)) − (1/(deg(q) − 1))·(µ_t(N(e_q, q)) − µ_t(q)),   (5)
where we used d(e′, u) = d(q, u) − 1 for u ∈ N∩_w, d(e_q, u) = d(q, u) for u ∈ N(e_q, q), and the last inequality is due to (4). For any vertex u, d(e′, u) ≤ d(e_q, u) + 1 because e_q and e′ are adjacent. Using this fact we obtain:
  Φ_t(e′, V \ N(e_q, q)) = Σ_{u∉N(e_q,q)} µ_t(u)·d(e′, u) ≤ Σ_{u∉N(e_q,q)} µ_t(u)·d(e_q, u) + Σ_{u∉N(e_q,q)} µ_t(u) = Φ_t(e_q, V \ N(e_q, q)) + µ_t(V \ N(e_q, q)).   (6)
Finally, by (5) and (6) we get:
  Φ_t(e′) = Φ_t(e′, N(e_q, q)) + Φ_t(e′, V \ N(e_q, q)) ≤ Φ_t(e_q) + µ_t(V \ N(e_q, q)) − (µ_t(N(e_q, q)) − µ_t(q))/(deg(q) − 1).
By assumption, Φ_t(e_q) ≤ Φ_t(e′). Therefore,
  (µ_t(N(e_q, q)) − µ_t(q))/(deg(q) − 1) ≤ µ_t(V \ N(e_q, q)),
and substituting µ_t(N(e_q, q)) = µ_t − µ_t(V \ N(e_q, q)) rewrites this as (3).

Lemma B.5.
Let Γ > 1. Suppose that Algorithm EDGE queries in round t+1 an edge incident to a vertex q that is not heavy in this round, and the answer is q. Then,
  µ_{t+1} ≤ (1 − (Γ−1)/(Γ(∆+1))) · µ_t.

Proof. Let e_q = {q, v} be the edge queried in round t+1. Suppose first that deg(q) > 1. By Lemma B.4,
  µ_t(V \ N(e_q, q)) ≥ (1/deg(q))·(µ_t − µ_t(q)) ≥ (1/∆)·(µ_t − µ_t(q)).   (7)
Because e_q is the queried edge in round t+1 and the reply is q, the lie counter remains unchanged for the vertices in N(e_q, q) and increases by one in the complement V \ N(e_q, q). Hence,
  µ_{t+1} = µ_t(N(e_q, q)) + (1/Γ)·µ_t(V \ N(e_q, q)) = µ_t − ((Γ−1)/Γ)·µ_t(V \ N(e_q, q)).
Thus, by (7) and by the fact that µ_t(q) ≤ µ_t/(∆+1) for q that is not heavy in round t,
  µ_{t+1} ≤ (1 − (Γ−1)/(Γ(∆+1))) · µ_t,
which completes the proof in the case when deg(q) > 1. If deg(q) = 1, then in round t+1 the lie counter increases by one for each vertex in V \ {q} = V \ N(e_q, q). Thus, again by the fact that q is not heavy,
  µ_{t+1} = µ_t(q) + (1/Γ)·µ_t(V \ {q}) ≤ (1/(∆+1) + 1/Γ)·µ_t ≤ (1 − (Γ−1)/(Γ(∆+1)))·µ_t.

Having proved the technical lemmas, we now turn to the proof of Theorem B.1. It is enough to argue that every query, amortized, multiplies the weight by a factor of 1 − (Γ−1)/(Γ(∆+1)) = 1/(1 + (Γ−1)/(Γ∆+1)). If there is no heavy vertex, then the theorem follows from Lemma B.5. Hence suppose in the rest of the proof that there exists a heavy vertex and denote this vertex by q.

For the amortized analysis, consider a sequence of t consecutive queries to edges e_1, ..., e_t, t ≤ deg(q), performed while q is heavy; call such a sequence a segment. Suppose this sequence starts in round t′. Denote e_i = {q, v_i}, i ∈ {1, ..., t}, and let
  Q_1 = ∪_{i=1}^{t} N_<(e_i, v_i),   Q_2 = V \ (Q_1 ∪ {q}).
First we assume that the query in round t′ + t (i.e., the query that follows the sequence) does not return q as a reply, or q stops being heavy. We argue, informally speaking, that this query in round t′ + t amortizes the t queries prior to it thanks to the assumption t ≤ deg(q). Because the lie counter of q increments in round t′ + t,
  µ_{t′+t}(q) ≤ (1/Γ)·µ_{t′}(q).   (8)
We have µ_{t′+t}(Q_1) ≤ (1/Γ)·µ_{t′}(Q_1) by the formulation of Algorithm EDGE (each vertex of Q_1 is incompatible with at least one reply of the segment), and µ_{t′+t}(Q_2) ≤ µ_{t′}(Q_2). Then, Q_1 ∪ Q_2 = V \ {q} and Q_1 ∩ Q_2 = ∅ imply µ_{t′}(Q_2) = µ_{t′} − µ_{t′}(q) − µ_{t′}(Q_1), and hence
  µ_{t′+t}(Q_1) + µ_{t′+t}(Q_2) ≤ (µ_{t′} − µ_{t′}(q)) − ((Γ−1)/Γ)·µ_{t′}(Q_1).   (9)
Due to the order in which the edges {q, v_i} are queried,
  µ_{t′}(Q_2) ≤ (1 − t/deg(q))·(µ_{t′} − µ_{t′}(q)) ≤ (1 − t/∆)·(µ_{t′} − µ_{t′}(q)).   (10)
Note that µ_{t′} − µ_{t′}(q) ≤ (∆/(∆+1))·µ_{t′} since by assumption q is heavy in round t′. Since µ_{t′+t} = µ_{t′+t}(q) + µ_{t′+t}(Q_1) + µ_{t′+t}(Q_2), we get by (8), (9) and (10):
  µ_{t′+t} ≤ (1/Γ + ((Γ−1)/Γ)·(∆−t)/(∆+1))·µ_{t′} = (1 − (Γ−1)(t+1)/(Γ(∆+1)))·µ_{t′} ≤ (1 − (Γ−1)/(Γ(∆+1)))^{t+1}·µ_{t′},
where the last inequality comes from (1 − x)^k ≥ 1 − xk, for k ≥ 1 and 0 ≤ x < 1.

Consider now a maximal sequence S of rounds in which q is heavy such that q is not heavy in the round that follows the sequence. Note that Algorithm EDGE cyclically queries the edges incident to q in S. Let r′_1 ≤ ··· ≤ r′_{b′} be all rounds in S having q as an answer. Denote X = S \ {r′_1, ..., r′_{b′}}, the set of rounds in S in which q is not an answer. Let a = ⌈b′/deg(q)⌉. The lie counter of each vertex in V \ {q} increases at least a − 1 and at most a times by executing the rounds r′_1, ..., r′_{b′}; we point out that this crucial property follows from the fact that the queries in a segment are applied to the edges incident to q consecutively, cyclically modulo deg(q). Since q is heavy at the beginning of S and is not heavy right after S, the lie counter of q increases by at least a as a result of S. Hence, |X| ≥ a. Partition r′_1, ..., r′_{b′} into a minimum number of segments of length at most deg(q) each, which leads to at most a segments. Thus, we can pair these segments with rounds in X. For each such pair of at most deg(q) + 1 rounds we apply the amortized analysis performed above. Note that this approach is valid since the amortized analysis is insensitive to the order of appearance of the queries in X and the queries in S \ X.

Finally, suppose that there is a series of queries at the end of the strategy (a suffix) performed on edges incident to a heavy vertex q such that all replies point to q and q remains heavy till the end of the strategy. Note that in such a case q is the target. The vertex q had the uniquely smallest lie counter just before those queries. This in particular implies that its lie counter is strictly smaller than L.
We artificially add a sequence of pseudo-queries, each of which increments the lie counter of q, until it reaches L. This implies that the suffix of the search strategy now contains a reply (which comes from a regular query or a pseudo-query) that does not point to q. Thus, we use again the arguments from our amortized analysis: we can find a segment and pair with it the above-mentioned query pointing away from q.

Proof of Theorem B.2
Similarly as in the case of vertex queries, the generic strategy in Algorithm EDGE for edge queries and its corresponding bound for a fixed number of lies can be used to provide strong bounds for the linearly bounded and probabilistic error models.

Let Γ = 1 + ((∆+1)/∆)·ε/(1−ε); note that ε/(1−ε) = (1 − (∆+1)r)/((∆+1)r) for r = (1−ε)/(∆+1). Denote Q_min = ln n / (ln(1 + (Γ−1)/(Γ∆+1)) − r·ln Γ). We run Algorithm EDGE with the bound L = Q_min·r and the parameter Γ set as just mentioned. Then, by Theorem B.1, the length of the strategy is at most
  (1/ln(1 + (Γ−1)/(Γ∆+1)))·ln n + (ln Γ/ln(1 + (Γ−1)/(Γ∆+1)))·Q_min·r = Q_min = L/r.
To conclude the proof, we bound
  Q_min = ln n / (F(ε)/(∆+1) + F(−ε/∆)·∆/(∆+1))   (where F(x) := x + (1−x)·ln(1−x) = Σ_{i≥2} x^i/(i(i−1)))
   = ln n / (Σ_{i≥2} ε^i·(∆^{i−1} + (−1)^i)/(i(i−1)(∆+1)∆^{i−1}))
   ≤ ln n / (ε²/(2∆)) = 2ε^{−2}∆·ln n,
where the inequality keeps only the i = 2 term of the series (all terms are nonnegative).

Proof of Theorem B.3

For edge queries, we use a two-step approach: first, we repeatedly ask each query to boost its error rate from ∼1/2 down to below 1/(2(∆+1)), and then use the linearly bounded error strategy.

As a first step, we show that for p = (1−ε)/(∆+1), there exists a strategy that locates the target with high probability using O(∆·ε^{−2}·(log n + log δ^{−1})) edge queries. Indeed, assume without loss of generality that ε < 1/2. We fix ε_1 = ε/(1 + √(32·((∆+1)/∆)·ln δ^{−1}/ln n)), and use Theorem B.2 with error rate r = (1−ε_1)/(∆+1). By Theorem B.2, we obtain that the strategy length is Q = 2ε_1^{−2}∆·ln n = O(∆·ε^{−2}·(log n + log δ^{−1})). The expected number of lies is E[L] = p·Q and by the Chernoff bound,
  Pr[L ≥ r·Q] ≤ exp(−(r/p − 1)²·p·Q/3) ≤ exp(−((ε−ε_1)/ε_1)²·(2∆/(3(∆+1)))·ln n) ≤ δ,
where the last inequality follows from the choice of ε_1.

We now observe that to achieve the error rate of 1/(2(∆+1)) we can boost the query error rate by repeating the same query multiple times and taking the majority answer. By repeating each query P = O(log(2∆+2)·ε^{−2}) times, we get the correct answer with probability 1 − p′ = 1 − 1/(2∆+2), and as shown already, we only need O(∆·(log n + log δ^{−1})) queries with the error rate p′ to locate the target with probability at least 1 − δ. Thus the claimed bound follows.

As an immediate corollary we obtain a very simple strategy for noisy binary search in an integer range of complexity O(ε^{−2}·(log n + log δ^{−1})), matching [FRPU94].

C Application: Searching Unbounded Integer Ranges
Building on our generic strategies, we now obtain a general technique for searching an unbounded domain ℕ = {1, 2, ...} with comparison queries. Here the measure of complexity is the dependency on the error rate (number of lies) and on N, the (initially unknown) position of the target. The main idea is to use Algorithms VERTEX and EDGE, tweaking the initial weight distribution. We fix the initial weight of an integer n to be µ(n) = n^{−2}. The total initial weight then equals π²/6. We provide the following bounds.

Corollary C.1.
There exists a strategy that finds an integer in an unbounded integer range (ℕ) using at most
  • (log(π²/6) + 2·log N + L·log Γ)/log(2Γ/(Γ+1)) ternary queries, or
  • (log(π²/6) + 2·log N + L·log Γ)/log(3Γ/(2Γ+1)) binary (comparison) queries,
where N is the target, L is an upper bound on the number of (adversarial) lies and Γ > 1 is an arbitrarily selected coefficient.

Proof. We use Algorithm VERTEX for ternary queries; let the strategy length be Q. By the proof of Theorem 3.1, µ_Q ≤ ((Γ+1)/(2Γ))^Q · π²/6. The final weight is at least µ_Q ≥ N^{−2}·Γ^{−L}, and the bound for ternary queries follows since the number of queries is at most log((π²/6)·N²·Γ^L)/log(2Γ/(Γ+1)). The bound for binary queries is obtained analogously from Theorem B.1 (note that ∆ = 2), since we apply Algorithm EDGE for binary queries.

We note that the term ternary refers to a model in which each query selects an integer i and as a response receives information whether the target is smaller than i, equals i, or is greater than i.

Algorithm UNBOUNDED: searching an unbounded integer range in the probabilistic error model.
  δ′ = δ/2
  while true do
    n = 1/δ′
    t = SEARCH([n], δ′)
    if t ≠ n then return t
    else δ′ = (δ′)²

Simply setting
Γ = 2 yields an O(log N + L) length strategy with comparison queries on unbounded integer domains with a fixed number of L lies.

We need to restate the linearly bounded error model in the case of unbounded domains, since the Responder does not know a priori the length of the strategy. We define this error model as follows: whenever the Questioner finds the target and thus declares the search to be completed after t rounds, it is guaranteed that at most r·t lies have occurred throughout the search.

Corollary C.2.
For the linearly bounded error model with an error rate r and an unbounded integerdomain, there exists a strategy that finds the target integer N in: • O ( ε − log N ) ternary queries when r = (1 − ε ) , or • O ( ε − log N ) binary queries when r = (1 − ε ) .Proof. Consider ternary queries. We proceed analogously as in the proof of Theorem 1.1. We havethat the initial weight is π / . Run Algorithm VERTEX until there is a single n such that (cid:96) n ≤ r · Q .Any Q such that Q ≥ ln( π / / ln + 2 ln N/ ln + L ln Γ / ln is an upper bound on thelength of the strategy. We thus get an upper bound Q ≤ ε − (2 ln N + O (1)) . The binary case follows in an analogous manner.We now proceed to show an algorithm for searching the ubounded integer range in the probabilisticerror model. The challenge lies in the fact, that in all our previous algorithms we reduced theproblem to the linearly bounded error model, and we could use an upper bound on the length ofthe strategy to select a proper relation between p and r . However in this particular problem, thelinearly bounded strategy could be arbitrarily long as N → ∞ .Consider Algorithm UNBOUNDED. We assume that procedure SEARCH ( I, δ ) implements anoisy binary search, that is, given I ⊂ N , a parameter δ and a promise that the target t is in I ,the procedure returns t ∈ I correctly with probability at least − δ , assuming probabilistic errormodel (with error probability p = (1 − ε ) ). A strategy of length O ( ε − (log n + log δ − )) is givenin [FRPU94], so we can assume that SEARCH ( I, δ ) implements that particular strategy, but for ourpurposes any asymptotically optimal strategy suffices. Theorem C.3.
In the probabilistic error model, given the integer domain ℕ and an unknown target integer N, Algorithm UNBOUNDED returns N with probability at least 1 − δ using at most O(ε^{−2}·(log N + log δ^{−1})) binary queries for p = (1−ε)/2. The expected number of queries of the strategy is O(ε^{−2}·(log N + log δ^{−1})) as well.

Proof. We first show that the algorithm is correct with probability at least 1 − δ. Observe that the algorithm does a sequence of searches over an increasing sequence of integer ranges [n_0], [n_1], ..., where n_i = 1/δ_i and δ_i = (δ/2)^{2^i}. If SEARCH([n], δ′) is called with N ≥ n, the returned value is n with probability at least 1 − δ′, which continues the main loop of the algorithm. If N < n, then with probability at least 1 − δ′ the returned value is N, which correctly stops the strategy. By the union bound, the failure probability of the algorithm is upper bounded by the sum of the failure probabilities of all calls to SEARCH, which is Σ_{i≥0} δ_i = Σ_{i≥0} (δ/2)^{2^i} < Σ_{i≥1} (δ/2)^i ≤ 2·(δ/2) = δ.

We now bound the expected number of queries. Let n_j > N be chosen such that j is minimal. In the desired execution of the algorithm, the last called search is on [n_j], and it follows that either j = 0 and then n_j = 2/δ, or n_j = n_{j−1}² ≤ N². Thus log n_j = O(log N + log 1/δ). Let C be a constant (depending on ε) such that the number of queries performed by a call to SEARCH([n], δ′) is upper bounded by C·(log n + log 1/δ′). The expected number of queries is upper bounded by
  E[Q] ≤ Σ_{i≤j} C·(log n_i + log 1/δ_i) + Σ_{i>j} δ_{i−1}·C·(log n_i + log 1/δ_i)
   ≤ 2C·Σ_{i≤j} log n_i + 2C·Σ_{i>j} δ_{i−1}·log n_i
   ≤ 4C·log n_j + 2C·Σ_{k≥1} 2^k·(δ_j)^{2^{k−1}}·log n_j
   ≤ C·(4 + 2·Σ_{k≥1} 2^k·2^{−2^{k−1}})·log n_j ≤ 10C·log n_j = O(log N + log 1/δ),
where we have used that log 1/δ_i = log n_i, log n_i = 2^{i−j}·log n_j, and δ_j ≤ 1/2.

D Application: Edge Queries in the Prefix-Bounded Model
The model of prefix-bounded errors can be seen as lying in-between the adversarial linearly bounded model and the non-adversarial probabilistic one. This is reflected, e.g., in the fact that in binary search the 'feasibility' threshold for r changes from 1/2 in the linearly bounded to 1/3 in the prefix-bounded model. We utilize the ideas from [BK93] more carefully, adapting the approach to edge queries in general graphs and the prefix-bounded error model. It turns out that the feasibility threshold for r can be pushed from 1/3 to 1/2 in this case, while keeping the log n dependency on the graph size.

For the virtual advance technique that we utilize, in addition to the lie counter ℓ_v of a vertex v that we used so far, we introduce a virtual lie counter, denoted by virt(v), that is maintained by our strategy given in Algorithm PRUNING. Whenever a query is made to an edge {u, v} and the reply is u, then the virtual lie counter of u is incremented by the strategy (note that this reply results in incrementing ℓ_v but ℓ_u remains the same). We extend the notation by introducing a virtual potential Φ̃(v) for each node v, Φ̃(v) = Φ̃_0(v)·Γ^{−(ℓ_v + virt(v))}, where Φ̃_0(v) is the initial potential (in Algorithm PRUNING, Φ̃_0(v) = 1 for all v). Consequently, we define for each edge e, Ψ̃(e) = Σ_{u∈V} Φ̃(u)·d(e, u). The strategy relies on two constants Γ and H that we select while stating our lemmas below. The values of C and D in Algorithm PRUNING computed in round t are denoted by C_t and D_t, respectively. The goal of Algorithm PRUNING is to trim down the set of potential targets to at most O(∆/ε) vertices, where r = (1−ε)/2. (We note that for any r = (1−ε)/2 there already exists a strategy of exponential length (1/ε)^{O(∆ log n)}, following from a straightforward simulation of an error-less strategy by repeating queries.)

Algorithm PRUNING:
Edge queries for the prefix-bounded model.
  for v ∈ V do Φ̃(v) = 1; ℓ_v = 0; virt(v) = 0
  do
    e = argmin_{x∈E} Ψ̃(x)
    query e; let w be the answer
    for all vertices u not compatible with the answer do ℓ_u = ℓ_u + 1; Φ̃(u) = Φ̃(u)/Γ
    virt(w) = virt(w) + 1; Φ̃(w) = Φ̃(w)/Γ
    D = {u : virt(u) ≥ t/H}
    C = {u ∈ V \ D : ℓ_u ≤ r·t}
  while |C| > 1
  return C ∪ D

Theorem D.1.
Algorithm PRUNING with H = 2∆·ε^{−1} and Γ = 1 + ∆ε/(2(∆−1)) returns, within Q = O(∆·ε^{−2}·log n) edge queries, a set C ∪ D of O(∆·ε^{−1}) possible target candidates in the prefix-bounded error model with r = (1−ε)/2.

Proof. Denote ε′ = ε/2. Note that Γ = 1 + (∆/(∆−1))·ε′ and H = ∆/ε′. We prove that Algorithm PRUNING terminates in at most
  Q = (8(∆−1)·ε^{−2} + O(ε^{−1}))·ln n   (11)
edge queries.

If an edge e = {u, v} is a 1-edge-median with respect to Ψ̃ and deg(u) > 1, where u is the reply to the query in round t+1, then by Lemma B.4 applied to the minimizer e of Ψ̃,
  Φ̃_t(V \ N(e, u)) ≥ (1/deg(u))·(Φ̃_t − Φ̃_t(u)).   (12)
Note that if deg(u) = 1, then Φ̃_t(V \ N(e, u)) = Φ̃_t(V \ {u}) = Φ̃_t − Φ̃_t(u), which implies that in this case (12) also holds. Hence we obtain from (12):
  Φ̃_t((V \ N(e, u)) ∪ {u}) ≥ Φ̃_t/∆.
Thus, in each round, the decrease in the virtual potential is as follows:
  Φ̃_{t+1} = Φ̃_t(N(e, u) \ {u}) + (1/Γ)·Φ̃_t((V \ N(e, u)) ∪ {u}) = Φ̃_t − ((Γ−1)/Γ)·Φ̃_t((V \ N(e, u)) ∪ {u}) ≤ (1 − (Γ−1)/(Γ∆))·Φ̃_t.
Since Φ̃_0 = n, this implies
  Φ̃_Q ≤ n·(1 − (Γ−1)/(Γ∆))^Q.   (13)
Observe that
  (1 − (Γ−1)/(Γ∆))·Γ^{r + 1/H} = 1 − ε²/(8(∆−1)) + O(ε³).   (14)
Thus, by (11), (13) and (14), the total virtual potential after Q queries is at most
  Φ̃_Q ≤ n·(1 − ε²/(8(∆−1)) + O(ε³))^Q·Γ^{−Q(r + 1/H)} ≤ Γ^{−Q(r + 1/H)}.   (15)
Denote for brevity D′ = D_Q. Since we had Q rounds and in each round the virtual lie counter of exactly one vertex increases, there are at most H discarded vertices in D′. For all other vertices in V \ D′, the virtual lie counter does not exceed Q/H according to Algorithm PRUNING. Thus, by (15),
  Φ_Q(V \ D′) = Σ_{v∈V\D′} Φ_Q(v) ≤ Γ^{Q/H}·Σ_{v∈V\D′} Φ̃_Q(v) ≤ Γ^{Q/H}·Φ̃_Q ≤ Γ^{−Q·r}.
This means that there is at most one vertex v ∈ V \ D′ such that ℓ_v ≤ r·Q. Thus, Algorithm PRUNING indeed terminates in at most Q rounds. Additionally, in any round t, |D_t| ≤ H, which proves our claim.

Corollary D.2.
In the prefix-bounded error model with r = (1−ε)/2, the target in an integer domain can be found in O(ε^{−2}·log n) binary queries.

Proof. We first use the strategy described in Theorem D.1 to reduce, in Q = O(ε^{−2}·log n) rounds, the set of potential targets to C ∪ D, where |C| = 1 and |D| = O(ε^{−1}). In case of no further errors, C ∪ D can then be reduced in Q′ = O(log |D|) queries to a single target. The final strategy can be simulated as described in [Asl95], giving a total strategy length of O(Q + (1/(1−2r))·(1/(1−r))^{Q′}). Since 1/(1−r) ≤ 2 and 1/(1−2r) = ε^{−1}, this results in O(ε^{−2}·log n) binary queries, as claimed.

We note that the simulation argument from [Asl95] requires that for any queried edge e = {v, w} and a set of potential targets D, it holds that D ⊆ N_<(e, v) ∪ N_<(e, w). This is always true, e.g., in bipartite graphs, regardless of D.

To obtain our results for the prefix-bounded error model and general graphs, we use the 'trimming' phase provided by Theorem D.1, which is then followed by a simulation argument. The latter requires an error-less strategy whose queries are then repeated, e.g., for majority testing. The theorem below provides such an edge search strategy for an arbitrary graph.

Theorem D.3.
There exists a strategy that, in the absence of errors, finds the target in at most log(n/∆)/log(∆/(∆−1)) + ∆ edge queries in any n-node graph of maximum degree ∆.

Proof. We use Algorithm EDGE with the simplification of taking Γ → ∞. Thus we have µ(v) ∈ {0, 1}, where these values occur for ℓ_v > 0 and ℓ_v = 0, respectively. Let S_t be the set of potential targets after t queries. Note that S_t = {v | µ_t(v) = 1}. By Lemma B.4, it follows that in any step querying an edge e_q with an answer q, the discarded set of targets satisfies
  |S_t ∩ (V \ N(e_q, q))| ≥ (1/∆)·|S_t \ {q}| ≥ (1/∆)·(|S_t| − 1).
Hence |S_{t+1}| = |S_t ∩ N(e_q, q)| ≤ |S_t| − (1/∆)·(|S_t| − 1). From |S_{t+1}| − 1 ≤ (|S_t| − 1)·(∆−1)/∆ we deduce that it takes at most ⌈log(n/∆)/log(∆/(∆−1))⌉ queries to reduce the target set to size at most ∆, and then another ∆ − 1 queries to reduce it to a single target.

Theorem D.4.
In the prefix-bounded error model with r = (1−ε)/2, the target can be found in ε^{−O(∆)}·log n edge queries in general graphs.

Proof. Suppose that D is a set of potential targets, i.e., the target v belongs to D. By Theorem D.3, there exists a strategy (for the error-less case) with at most Q′ ≤ log(|D|/∆)/log(∆/(∆−1)) + ∆ edge queries that finds the target v.

First assume that ∆ ≥ 3. It follows immediately from the simulation argument from [Pel89] (in which one repeats multiple times a query of another strategy, taking the majority answer in each case; here we use the error-less strategy of length Q′ from Theorem D.3) that there exists a strategy terminating in O(Q·(1/(1−r))^{Q′}) = O(ε^{−2}∆·log n)·(1/(1−r))^{O(∆·log ε^{−1})} edge queries, where Q is the length of the strategy produced by Algorithm PRUNING. Note that the value of Q comes from Theorem D.1. Since 1/(1−r) ≤ 2, the claimed bound immediately follows.

For ∆ = 2, the only cases not covered by Corollary D.2 are in fact odd-length cycles. We deal with them as follows. The initial sequence of queries is done as previously, by executing Algorithm PRUNING, reducing the set of potential targets to D at the cost of Q rounds. We now observe that for any edge e = {u, v} of an odd cycle there is a single vertex v_e such that d(u, v_e) = d(v, v_e). Thus we can consider the following error-less strategy applied to the set of potential targets D: query edges according to an error-less edge strategy (as in Theorem D.3) and, for each queried edge e, discard the vertex v_e from the set of potential targets. At the end of this strategy, reintroduce all discarded vertices. This strategy can be simulated as in [Asl95], since we always make sure to maintain the property of properly bisecting the set of targets. Thus, our initial Q = O(ε^{−2}·log n) rounds and D_0 = |D| = O(ε^{−1}) targets give that this strategy has length Q_1 = O(Q·ε^{−1}·log D_0) = O(Q·D_0·ε^{−1}) and results in a set of D_1 ≤ D_0 targets. Iterating this procedure would give us a strategy of length ε^{−O(log* ε^{−1})}·log n. To improve its length by getting rid of the non-constant exponent, denote by E_1 the set of edges queried during the transition from D_0 to D_1. Since the strategy is basically a binary search, there are O(1) pairs of edges in E_1 that share an endpoint, and there are O(1) pairs of vertices in D_0 that share an edge. Thus, the querying strategy reducing D_0 to D_1 can always, except for O(1) queries, choose an edge e to be queried so that v_e ∉ D_0. Thus D_1 = O(1), and the proof concludes.

Corollary D.5.
In the prefix-bounded error model with r = (1−ε)/2, 0 < ε ≤ 1, the target integer in an unbounded integer domain can be found in O(ε^{−2}·log N) binary queries.

Proof. Set s = 2 and proceed first with the filtering technique by executing Algorithm PRUNING with H = 4ε^{−1} and Γ = 1 + ε, taking µ(n) = n^{−s} as the initial weights. Following the proof of Theorem D.1, we observe that a single query reduces the adjusted potential in V \ D by a factor of 1 − ε²/8 + O(ε³). After Q = (8ε^{−2} + O(ε^{−1}))·ln(ζ(s)·N^s) queries the potential of the vertices in V \ D is reduced from ζ(s) to N^{−s}, meaning that the set C_Q has only one vertex. We apply Corollary D.2 to D_Q ∪ C_Q, which is of size at most 4ε^{−1} + 1.

E Summary of Results
We conclude by grouping all bounds we have obtained in the three tables below. In each case ε is the relative difference between the assumed upper bound for r or p and this value itself, i.e., in the context of r < r_max (or p < p_max, respectively) it satisfies r = (1−ε)·r_max (respectively p = (1−ε)·p_max). For the probabilistic model, δ is the probability threshold, i.e., the target must be located with probability at least 1 − δ. Our results are compared with the best ones known to date. Keep in mind that for p = (1−ε)/2, it holds 1 − H(p) = Θ(ε²).

Table 1: Query complexity in general graphs.

  Model            | Queries | Regime            | Previous result                                                  | Our result
  fixed            | vertex  | -                 | -                                                                | O(log n + L)  (3.1)
                   | edge    | -                 | -                                                                | O(∆(log n + L))  (B.1)
  linearly bounded | vertex  | r < 1/2           | -                                                                | log n / (1 − H(r))  (1.1)
                   | edge    | r = (1−ε)/(∆+1)   | -                                                                | 2ε^{−2}∆ ln n  (B.2)
  prefix-bounded   | edge    | r = (1−ε)/2       | -                                                                | (ε^{−1})^{O(∆)} log n  (D.4)
  probabilistic    | vertex  | p < 1/2           | log n/(1−H(p)) + O(C log n + C log δ^{−1})/(1−H(p)), C = max(1, (1/2 − p)·√(log log n))  [EKS16] | (log n + o(log n) + Õ(log δ^{−1}))/(1 − H(p))  (1.2)
                   | edge    | p = (1−ε)/2       | -                                                                | O(ε^{−2}∆ log ∆·(log n + log δ^{−1}))  (B.3)

Table 2: Query complexity in linearly ordered integer ranges of length n with comparison queries, i.e., the generalizations of the classical binary search (equivalent to the edge-query model in paths).

  Model            | Queries | Regime        | Previous result                                                | Our result
  fixed            | binary  | -             | O(log n + L)  [Aig96], [Lon92]                                 | O(log n + L)  (B.1)
  linearly bounded | binary  | r = (1−ε)/3   | 8ε^{−2} log n  [DGW92]                                         | 4ε^{−2} ln n  (B.2)
                   | ternary | r < 1/2       | -                                                              | log n / (1 − H(r))  (1.1)
  prefix-bounded   | binary  | r = (1−ε)/2   | O(poly(ε^{−1}) log n)  [BK93]                                  | O(ε^{−2} log n)  (D.2)
  probabilistic    | binary  | p < 1/2       | log n/(1−H(p)) + O((log δ^{−1} + log log n)/(1−H(p)))  [BH08]  | O((log n + log δ^{−1})/(1 − H(p)))  (B.3)

Table 3: Query complexity in the unbounded integer domain ℕ = {1, 2, ...} (here N is the value of the unknown target integer).

  Model            | Queries | Regime        | Previous result                                  | Our result
  fixed            | binary  | -             | -                                                | O(log N + L)  (C.1)
  linearly bounded | binary  | r = (1−ε)/3   | -                                                | O(ε^{−2} log N)  (C.2)
                   | ternary | r = (1−ε)/2   | -                                                | O(ε^{−2} log N)  (C.2)
  prefix-bounded   | binary  | r = (1−ε)/2   | O(ε^{−2}(log N)^{O(log ε^{−1})})  [AD91]         | O(ε^{−2} log N)  (D.5)
  probabilistic    | binary  | p = (1−ε)/2   | O(poly(ε^{−1}) log N log δ^{−1})  [Asl95]        | O(ε^{−2}(log N + log δ^{−1}))  (C.3)
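To close, the flavour of the strategies summarized above can be illustrated with a toy, self-contained Python implementation (entirely ours; none of the identifiers below come from the paper) of the lie-counter and weight-halving scheme with ternary queries on an integer range, in the spirit of Algorithm VERTEX and Corollary C.1, with Γ = 2:

```python
def noisy_ternary_search(n, oracle, L, gamma=2.0):
    """Search for a target in {1, ..., n} with ternary queries, tolerating
    at most L adversarial lies in total: keep a lie counter and a weight
    per candidate, query the weighted median, and divide the weight of
    every candidate incompatible with the answer by gamma."""
    mu = {u: 1.0 for u in range(1, n + 1)}
    ell = {u: 0 for u in range(1, n + 1)}
    while sum(1 for u in ell if ell[u] <= L) > 1:
        total, acc, q = sum(mu.values()), 0.0, n
        for u in sorted(mu):                      # weighted median
            acc += mu[u]
            if acc >= total / 2:
                q = u
                break
        ans = oracle(q)                           # '<', '=' or '>'
        if ans == '<':                            # claims target < q
            bad = [u for u in mu if u >= q]
        elif ans == '>':                          # claims target > q
            bad = [u for u in mu if u <= q]
        else:                                     # claims target == q
            bad = [u for u in mu if u != q]
        for u in bad:                             # incompatible candidates
            ell[u] += 1
            mu[u] /= gamma
    return next(u for u in ell if ell[u] <= L)

def lying_oracle(target, lie_rounds):
    """A Responder that answers truthfully except in the given rounds,
    where it returns an arbitrary wrong answer."""
    rnd = 0
    def oracle(q):
        nonlocal rnd
        rnd += 1
        truth = '<' if target < q else '>' if target > q else '='
        if rnd in lie_rounds:
            return '>' if truth != '>' else '<'
        return truth
    return oracle
```

Since a lie is the only event that can increment the target's counter, the target survives any Responder that lies at most L times, and the loop stops exactly when it is the unique candidate whose counter is at most L.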