A Multi-Urn Model for Network Search
Christopher E Marks
Operations Research Center, Massachusetts Institute of Technology; Charles Stark Draper Laboratory, 555 Technology Square, Cambridge, MA
Tauhid Zaman
Department of Operations Management, Sloan School of Management, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA
We consider the problem of finding a specific target individual hiding in a social network. We propose a method for network vertex search that looks for the target vertex by sequentially examining the neighbors of a set of “known” vertices to which the target vertex may have connected. The objective is to find the target vertex as quickly as possible from amongst the neighbors of the known vertices. We model this type of search as successively drawing marbles from a set of urns, where each urn represents one of the known vertices and the marbles in each urn represent the respective vertex’s neighbors. Using a dynamic programming approach, we analyze this model and show that there is always an optimal “block” policy, in which all of the neighbors of a known vertex are examined before moving on to another vertex. Surprisingly, this block policy result holds for arbitrary dependencies in the connection probabilities of the target vertex and known vertices. Furthermore, we give precise characterizations of the optimal block policy in two specific cases: (1) when the connections between the known vertices and the target vertex are independent, and (2) when the target vertex is connected to at most one known vertex. Finally, we provide some general monotonicity properties and discuss the relevance of our findings in social media and other applications.
1. Introduction
In this paper, we examine the problem of searching a social network for a particular target individual by sequentially examining the neighbors of other known users. Social media applications enable users to connect with each other, forming social networks. For various reasons, which we discuss below, one may wish to find a target individual in the social network. One may have prior knowledge that the target individual is connected with a set of known users, so the most logical place to begin searching is among the neighbors of these known users. If querying each of these neighbors incurs some sort of cost, then the goal is to find the target with as few queries as possible.

[arXiv, math.OC]

For example, suppose Mary is searching a social media application for an account belonging to John, an old friend from school with whom she has lost contact. From what she knows about John, Mary might be able to develop a list of accounts she knows about within the social media application to which John’s account might be connected. For example, she might recall that John was good friends with Matt, who has a social media account that is known to Mary. She also might remember that John was active in a certain charity, which also maintains a social media account known to Mary.

After developing such a list, Mary could sequentially explore each account’s connections, but doing so could take a substantial amount of time. In order to find John’s account quickly (assuming John has an account), Mary might devise a search strategy. For example, she might start by looking at accounts she feels are the most likely to be connected with John’s account.
Alternatively, she might start by looking at accounts with fewer connections, because her goal is to find John’s account while minimizing the time spent exploring.

In this hypothetical scenario, Mary is performing a network vertex search, in which the sought object, or target, might be found by examining the neighbors of a finite set of known vertices. Once the search target is found, the search typically terminates. Each known vertex $i$ might have a different degree, $N_i$, and so require a different number of search queries to search exhaustively. From our scenario, we also consider that each known vertex $i$ might have a different probability of being connected to the target vertex, which we denote by $\phi_i$.

In this paper we present a probabilistic “multi-urn” model for searches of this nature, in which we represent each known vertex as an urn containing a finite number of marbles. Each marble in an urn represents one of the respective vertex’s neighbors. The search consists of successively drawing and examining individual marbles from the urns with the goal of finding a red marble, representing the search target, in the fewest draws. Figure 1 depicts the multi-urn model for a network search on three known vertices. In this example, each of the known vertices is connected to the search target, so each urn contains a single red marble. Additionally, each urn contains a number of blue marbles that represent the other neighbors of the respective vertex. Employing a dynamic programming framework, we provide some insight into the optimal search policy in this general model and under certain conditions.

Our model applies to any scenario where one must sequentially search for a target amongst a set of entities that are separated into different clusters. In our vertex search problem, the entities are vertices and the clusters are neighborhoods of the known vertices.
Figure 1. Network search representation as a multi-urn model. The three known vertices A, B, and C, each connected to the target, are represented as urns with $N_A = 3$, $N_B = 2$, and $N_C = 3$ marbles.

Our main motivation for this model is in social media applications, where the goal is often to find users who harass others, incite violence, or engage in other dangerous behaviors. Twitter has been suspending large numbers of users, many of whom support or engage in violent extremism, from its micro-blogging application for violating the site’s published rules [11]. The challenge is that these users can simply create a new account each time one is suspended. However, from historical data Twitter could predict the accounts to which the suspended user is likely to connect. Using this information, Twitter administrators could then apply our optimality criteria to efficiently locate new accounts belonging to suspended users.

There are other scenarios where this model can apply. For instance, in law enforcement and intelligence applications, the search entities could be suspects in a crime and the clusters could be geographical locations. Or, if one is examining a dump of emails from a suspect’s server in search of an incriminating email, the entities are emails and the clusters could be recipients of the emails. In both of these examples, the process of querying the entities requires a non-trivial amount of resources (interviewing a suspect, reading an email), so it is important to find the target as quickly as possible. Using an optimal or near-optimal search strategy is therefore crucial in these examples.
In this paper we develop a multi-urn search model, a new and useful methodology for analyzing searches similar to the network vertex search problem we proposed in the introduction. We employ a dynamic programming framework to analyze this model and provide theoretical results based on the nature of the cost function.

In particular, we show that in this type of search problem there always exists an optimal policy, i.e., one that minimizes the expected number of marbles drawn before finding a red marble, in which the marbles in each urn are exhausted before moving on to the next urn (Theorem 2). We refer to a policy that meets this criterion as a block policy, and show that this result holds irrespective of dependencies among the urns. This result is surprising because the presence of the red marble in the urns could have arbitrary correlations. There could be positive correlations, where if the marble is in one urn, it is more likely to be in another urn. Or the correlations could be negative, so that if the marble is in one urn, it is less likely to be in another urn. Nonetheless, our result shows that no matter what the dependency between the urns, a block policy is always an optimal policy for finding the red marble.

Building on this finding, we provide optimality conditions that enable immediate determination of an optimal policy in two specific cases:
• Each urn contains a red marble independent of other urns (Theorem 4). This case corresponds to the assumption that each known vertex is connected to the search target independent of other known vertices’ connections.
• There is at most one red marble among all of the urns (Theorem 6). This case corresponds to the assumption that the search target is connected to at most one known vertex.

Finally, we provide a detailed analysis of the system dynamics of the multi-urn search model, leading to insight into the intuition behind our findings. We provide monotonicity properties on the evolution of certain probabilities as marbles are successively drawn (Theorem 7) and establish a useful bound on how much the probability of drawing a red marble from a certain urn can change between two successive stages (Theorem 8).
Much of the work that has been done in the context of network search is focused on finding relevant vertices in a large-scale network. Google’s PageRank algorithm is probably the best-known example of these methods, of which many adaptations and generalizations exist [2]. Our work looks at an essentially different type of network search: finding a specific vertex in a network, presumably identifiable by certain features, by investigating network neighborhoods in which the vertex is likely to appear.

The network vertex search problem we have proposed could instead be formulated as a multi-arm bandit problem. Bubeck and Cesa-Bianchi [4] provide a broad survey of many variations of the multi-arm bandit problem and their respective applications. These problems are typically likened to a gambler who has a choice of playing from a set of slot machines. At each discrete stage in the process the gambler selects and plays a slot machine for a certain cost and receives a stochastic reward from an unknown distribution. The more times the gambler plays a particular machine, the more he is able to learn about its reward distribution.

In the multi-arm bandit setting, the gambler would not want to spend too much money playing low-payout slot machines just to learn their reward distributions. This quandary is the fundamental tradeoff between exploration and exploitation, which is inherent in multi-arm bandit problems. In order to make money, the gambler wants to play only the highest-payout slot machine. However, he never really knows the true distributions of any of the machines. As a result, optimal multi-arm bandit policies often include a balance of exploratory actions, in which decisions are made for the sole purpose of observing outcomes, and exploitative actions, in which decisions are made to optimize the outcomes based on what has been learned.

The multi-arm bandit problem objective is often characterized as the minimization of regret, which is essentially the difference in expectation between what the gambler earns and what he would have earned by playing the best machine. Lai and Robbins [12] provide a very well-known method for constructing adaptive multi-arm bandit policies using upper confidence bounds, for which regret grows proportionally to the logarithm of the number of plays in the limit. Auer et al. [1] show that this same bound on regret is also achievable in finite time.

The multi-urn search model we present could be cast in the context of a finite-time multi-arm bandit problem, but there are a few notable differences. Our objective, to find the search target as quickly as possible, does not immediately cast itself as minimizing regret. Gittins [10] overcomes this difficulty by augmenting the state space in the multi-arm bandit formulation with a “success” state, from which no additional costs or rewards are incurred. Building on this adaptation, Gittins describes a class of search problems that are very similar to our network search problem, and characterizes the optimal search policy based on his well-known dynamic allocation index [9].

Our search problem differs from [10], however, in that each vertex has a fixed, finite, and known number of neighbors.
In essence, we assume the reward distribution of each slot machine is known, and we only allow a fixed, finite number of plays on each machine. Unlike the bandit approach, the outcomes of successive marble draws from a single urn are not assumed to be independent observations from an unknown distribution. Instead, our model uses a known distribution on each machine but limits each machine to allowing at most a single win. Furthermore, in our approach we allow for dependencies between the urns, whereas the multi-arm bandit approach typically assumes each slot machine’s outcomes are independent of the others.

In spite of these differences, the dynamic allocation index applied in the class of search models proposed by Gittins [10] has many similarities to our development. The system dynamics in both cases are governed by Bayesian probability updates. We show that in at least two cases the optimal policy can be characterized by a priority index, which is derived directly from the system dynamics and can be interpreted as the expected rewards of decisions. Gittins [10] also mentions monotonicity properties of his dynamic allocation index in the context of search that are similar to the monotonicity properties we derive. Our method for proving optimality uses similar logic to the proofs given in [8], which are based on the original development by Gittins and Jones [9].

Like multi-arm bandit problems, urn models have been applied in many contexts, including discrete decision processes. The Pólya urn process is a well-known construct using urns that has been adapted and used in many applications [14]. This process generally consists of one or more urns, each containing certain numbers of marbles of different colors. At each stage in the process a marble is randomly drawn from an urn and its color observed.
This color then dictates an action involving placing one or more marbles of certain colors into certain urns.

Wei [15] provides a specific adaptation of the Pólya urn process to the problem of conducting medical trials in a way that is meant to exploit the use of treatments that have shown positive results in the past, which is very similar to multi-arm bandit models applied in the same context. The Pólya urn process has also been used as a preferential attachment model in the formation of networks [6]. This application can be useful in considering how links form in social networks, and is similar to our problem. We assume, however, that the links are already present in the network and are instead interested in finding the optimal way to investigate these existing links.

Downey et al. [7] employ a multi-urn model that is very similar to ours but serves a different purpose: unsupervised information extraction. The model these authors propose uses urns to represent different collections of documents. Marbles drawn from the urns represent specific documents, from which specific labels are extracted. The objective of the model is to learn which labels are the correct, or “target,” labels, and which labels are erroneous extractions.

The urn model proposed in [7] differs substantially from ours in its objective. Downey et al. have the objective of learning model parameters and, in the unsupervised case, learning which labels are correct. In the urn model we present, we assume the probability distributions and the target labels are known a priori, and we aim to find a target marble as efficiently as possible.

Our network search problem is also related to the problem of mutual information maximization. If our goal were mutual information maximization, we would not necessarily focus our search effort on trying to find the target vertex. Instead, we would examine the places that would give us the most information about where the target is likely to be.
This is similar to the goal of exploration in the multi-arm bandit problem. Chen et al. [5] analyze a sequential information maximization problem that parallels our development, using a dynamic programming approach and giving bounds on the performance of the greedy approach. The problem the authors propose involves learning about the distribution of an unknown parameter of interest by sequentially observing other variables. Each observation provides some information about the unknown parameter, and the objective is to maximize the total information gained in a fixed number of observations.

Our multi-urn search model departs most substantially from the development in [5] by imposing additional constraints and dynamics on the way observations are made. In our model, the urns are depleted over time, changing the amount of information contained in each successive marble drawn in predictable, but sometimes unintuitive, ways. Our main contributions are the characterizations of optimal search policies under various probability models, which come directly from analysis of the dynamics inherent in our multi-urn search model.

Finally, recent work on scheduling and inspection policies employs similar dynamic programming approaches to characterize optimal policies. Levi et al. [13] use dynamic programming to find policies that optimally allocate resources between information gathering and task execution. This class of models provides a natural extension to our network search problem. While we assume a probability model on a set of known vertices, using this approach we could attempt to find the optimal balance between the time spent learning a probability model on a set of known vertices and the time spent executing the search on the current known vertex set.
2. Multi-urn Search Model
We return to the context of network vertex search as presented in the introduction. Let $\mathcal{V}$ be the set of vertices to search, and let $N_i$ be the number of search queries required to exhaustively search vertex $i \in \mathcal{V}$. We assume that the neighbors of each vertex are queried in a random order, so that each individual neighbor query of a particular vertex is equally likely to be the search target, given that the target is connected to the queried vertex.

Under these assumptions, we can represent this search problem as an experiment involving randomly drawing marbles from a set of urns, where each urn represents a known vertex in the network. Each marble in urn $i \in \mathcal{V}$ represents a neighbor of vertex $i$. The degree of vertex $i$ is $N_i$, so urn $i$ initially has $N_i$ marbles. With probability $\phi_i > 0$, exactly one of the $N_i$ marbles in urn $i$ is red, indicating that known vertex $i$ is connected to the target vertex. Otherwise, all marbles in urn $i$ are blue.

Querying a random neighbor of vertex $i$ in search of the target is analogous to drawing a random marble from urn $i$ and observing its color. If the marble is red, the target vertex has been located. If the marble is blue, the target has not been found and the search continues with the remaining marbles. Note that blue marbles are not put back into the urns; once they are drawn they are discarded. Just as Mary desires to find her old friend John with as few searches as possible, the goal in this experiment is to minimize the number of blue marbles drawn before finding a red one.

We now more completely specify the probability model that accounts for how the target vertex might be connected to the set of known vertices, i.e., how red marbles might be distributed among the urns. Let $A_i$ be the event that the target vertex is connected to vertex $i$. We have already defined
\[ \phi_i = P(A_i). \]
More generally, we let
\[ \phi_U = P\Big( \bigcap_{i \in U} A_i \Big) \]
be the probability that the target vertex is connected to all vertices in set $U \subseteq \mathcal{V}$. If we were to assume that the target would connect to the members of $U$ independently, then $\phi_U = \prod_{i \in U} \phi_i$.

In general, the connections might not be independent. For example, Mary might think that if John connected with a certain musician he liked, he might be more likely to connect to other, similar musicians. In other cases, a connection to a particular vertex might imply a decrease in the probability of connection to another vertex.

In our urn model, we assume a known probability $\phi_U$ for all subsets $\{U : U \subseteq \mathcal{V}\}$, which fully specifies a probability model on the locations of the red marbles among the urns.
It allows for arbitrary correlations between urns, so that the presence of a red marble in one urn (or subset of urns) can have a positive or negative correlation with the presence of a red marble in another urn (or another subset of urns).

We note now that the empty set $\emptyset \in \{U : U \subseteq \mathcal{V}\}$. By convention, we set $\bigcap_{i \in \emptyset} A_i = \Omega$, so that $\phi_\emptyset = 1$. This term is implicitly included in summations over all subsets expressed in this paper. For example, the summation
\[ \sum_{U \subseteq \mathcal{V}} (-1)^{|U|} \phi_U \]
includes a “1” corresponding to the case in which $U = \emptyset$.

Given this set of probabilities, the probability of any specific outcome of marble locations, or vertex connections, can be determined using the well-known inclusion-exclusion formula. For example, suppose we are interested in the probability that a red marble is located in every urn in set $U$ and in no other urn. This event can be written as $\big( \bigcap_{i \in U} A_i \big) \cap \big( \bigcap_{j \in \mathcal{V} \setminus U} A_j^c \big)$, with
\[ P\Big( \Big( \bigcap_{i \in U} A_i \Big) \cap \Big( \bigcap_{j \in \mathcal{V} \setminus U} A_j^c \Big) \Big) = \sum_{S : S \subseteq \mathcal{V},\, U \subseteq S} (-1)^{|S| - |U|} \phi_S \geq 0. \tag{1} \]

Throughout this paper, we refer to the type of search described in this section as a multi-urn search problem, which we now more formally define.

Definition 1 A multi-urn search problem is a search problem that can be modeled as sequentially drawing marbles from a set of urns, $\mathcal{V}$, where
• The objective of the searcher is to find a red marble with as few draws as possible.
• Each urn $i \in \mathcal{V}$ contains at most a single red marble. Otherwise, all marbles are blue.
• Each urn $i \in \mathcal{V}$ contains a known number of marbles, $N_i$.
• For each subset of urns $U \subseteq \mathcal{V}$, the probability that a red marble is present in all urns in $U$ is $\phi_U$. Additionally, we set $\phi_\emptyset = 1$.

The network vertex search problem we used to motivate this model can be characterized as a multi-urn search problem, but this model might have other useful applications as well.
For this reason, in the remainder of this paper we provide all analyses and results in the multi-urn search context, using the language of “urns” and “marbles,” though we could immediately recover our original context by substituting “known vertices” and “neighbors,” respectively.
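The inclusion-exclusion computation in Equation (1) can be illustrated with a short sketch. The code below is not part of the original development: the helper names (`subsets`, `independent_phi`, `exact_configuration_prob`) and the marginal probabilities are hypothetical, and the independent case $\phi_U = \prod_{i \in U} \phi_i$ is assumed only to make the numbers easy to check by hand.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def independent_phi(p):
    """Build phi[U] = prod_{i in U} p[i] for every subset U (independent case)."""
    phi = {}
    for U in subsets(p):
        prob = 1.0
        for i in U:
            prob *= p[i]
        phi[U] = prob  # the empty product gives phi[frozenset()] == 1.0
    return phi


def exact_configuration_prob(phi, urns, U):
    """Equation (1): P(red marble in every urn of U and in no other urn)."""
    U = frozenset(U)
    total = 0.0
    for S in subsets(urns):
        if U <= S:  # only supersets of U contribute
            total += (-1) ** (len(S) - len(U)) * phi[S]
    return total


# Hypothetical marginals phi_i for three urns (illustrative values only).
p = {"A": 0.5, "B": 0.2, "C": 0.4}
phi = independent_phi(p)
urns = list(p)

# Under independence this should equal 0.5 * (1 - 0.2) * (1 - 0.4).
prob_only_A = exact_configuration_prob(phi, urns, {"A"})
print(prob_only_A)
```

Because the eight configurations partition the sample space, summing `exact_configuration_prob` over all subsets should return 1, which is a quick sanity check on any user-supplied table of $\phi_U$ values.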
In the search model we have defined, the decisions are carried out sequentially in discrete stages. We now take a dynamic programming approach [3] to framing this problem.

We model the search process as a discrete dynamic system of the form
\[ x(t+1) = f\big( x(t), u(t), w(x(t), u(t)) \big), \]
where $t = 0, 1, \ldots$ is the stage of the search, which we equate to the total number of marbles already drawn from the urns. The system state, $x(t)$, is a record of the total number of marbles drawn from each urn, which sufficiently characterizes the system at stage $t$. Parameter $u(t)$ is the decision made, or urn selected, at stage $t$, and $w(x(t), u(t))$ is a binary stochastic input that indicates whether a red marble is drawn from urn $u(t) \in \mathcal{V}$ in state $x(t)$.

If a red marble is drawn at stage $t$, then $w(x(t), u(t)) = 1$ and the search terminates. Otherwise, $w(x(t), u(t)) = 0$ and the search continues. Letting $x_i(t)$ be the number of marbles that have been removed from urn $i$ at time $t$, we can explicitly define the state vector
\[ x(t) = \big( x_1(t), x_2(t), \ldots, x_{|\mathcal{V}|}(t) \big). \]
If a blue marble is drawn from urn $u(t)$ in state $x(t)$, the state transition function is
\[ f(x(t), u(t), 0) = x(t) + e_{u(t)}, \]
where $e_i$ is the $i$th unit vector. If a red marble is drawn at any stage, the search terminates.

Our dynamic programming model consists of at most $N + 1$ stages, where $N = \sum_{i \in \mathcal{V}} N_i$ is the total number of marbles summed over all of the urns, and is finite.

We define a valid policy $u$ as a sequence of decisions
\[ u = \big( u(0), u(1), \ldots, u(N-1) \big), \]
where $u(t) \in \mathcal{V}$ for $t = 0, 1, \ldots, N-1$ and for which
\[ |\{ t : u(t) = i \}| = N_i \quad \forall\, i \in \mathcal{V}. \]
This final condition ensures that the policy will eventually exhaust each urn, as long as a red marble is not found, while at the same time never attempting to draw marbles from an empty urn. A searcher executing a valid policy draws a marble from urn $u(t)$ at each stage $t$ until either the target marble is found or there are no marbles remaining in any of the urns, in which case the entire policy has been executed.

We note that in this dynamic programming model there is no benefit in making policy decisions during search execution. At each stage the searcher either draws a red marble, in which case she stops looking, or draws a blue marble and keeps searching. A valid policy provides an ordering of urn queries that is essentially conditioned on not drawing a red marble, which can be considered a deterministic process governed by our simple state transition function. The expected search outcomes for such a policy can be analyzed and compared to those of other policies a priori.

Stage t Probability of Drawing a Red Marble
Building on our dynamic programming modeling assumptions, we now develop the probability distribution associated with $w(x(t), u(t))$. Recall that this function indicates whether a red marble is drawn in stage $t$: $w(x(t), u(t)) = 1$ implies a red marble is drawn from urn $u(t)$ at stage $t$, while $w(x(t), u(t)) = 0$ implies a blue marble is drawn from $u(t)$ at stage $t$.

In determining the probability distribution of $w(x(t), u(t))$, it is important to remember that in order to arrive in stage $t$ while executing policy $u$, the preceding queries $u(0), u(1), \ldots, u(t-1)$ must have been executed without drawing a red marble, so that the system arrives in state $x(t)$. For simplicity of notation, we condition an event on state $x(t)$ to imply that state $x(t)$ has been reached without having drawn a red marble. For example, $P(A_i \mid x(t))$ represents the probability urn $i$ contains a red marble, given queries $u(0), u(1), \ldots, u(t-1)$ have been executed without drawing a red marble.

Using the multiplication rule, we can write the probability
\[ P\big( w(x(t), u(t)) = 1 \big) = \left( \frac{1}{N_{u(t)} - x_{u(t)}(t)} \right) P\big( A_{u(t)} \mid x(t) \big), \tag{2} \]
which is the probability of drawing a red marble from the $N_{u(t)} - x_{u(t)}(t)$ marbles remaining in urn $u(t)$, given there is a red marble in $u(t)$, multiplied by the probability urn $u(t)$ contains a red marble given the system has arrived at state $x(t)$.

The complementary probability can be written using the law of total probability:
\[ \begin{aligned} P\big( w(x(t), u(t)) = 0 \big) &= \left( 1 - \frac{1}{N_{u(t)} - x_{u(t)}(t)} \right) P\big( A_{u(t)} \mid x(t) \big) + P\big( A_{u(t)}^c \mid x(t) \big) \\ &= 1 - \left( \frac{1}{N_{u(t)} - x_{u(t)}(t)} \right) P\big( A_{u(t)} \mid x(t) \big). \end{aligned} \tag{3} \]

Stage t Urn Probabilities
In this process, we have assumed a fully specified initial probability model on the urns, i.e., for any subset $U \subseteq \mathcal{V}$, the probability that a red marble is present in all of the urns, $\phi_U$, is known. This probability model can be thought of as a Bayesian prior, a quantification of the searcher’s beliefs on where a red marble might be found.

However, these probabilities are not static. After drawing a marble from an urn, the probabilities change as a result of the new information. If the marble drawn is red, then the probability that a red marble existed in the queried urn becomes 1. Likewise, if the marble drawn is blue, then the probability that a red marble can be found in the queried urn decreases as a function of the number of marbles remaining in the urn and the current urn probability.

As long as a red marble is not found, the evolution of urn probabilities over the course of the search is completely determined by the initial probability model and the search policy. We now provide a general expression for updated urn probabilities at stage $t$.

Theorem 1 (Urn Probabilities).
In a multi-urn search problem over a set of urns $\mathcal{V}$, suppose a red marble is not found in the first $t$ queries when executing a valid policy $u = (u(0), \ldots, u(N-1))$. Then, for any subset of urns $U \subseteq \mathcal{V}$, the probability of a red marble being in all of the urns in $U$ at stage $t$ is given by:
\[ P\Big( \bigcap_{i \in U} A_i \,\Big|\, x(t) \Big) = \frac{ \Big[ \prod_{i \in U} \Big( 1 - \frac{x_i(t)}{N_i} \Big) \Big] \sum_{\{ S \subseteq \mathcal{V} : S \supseteq U \}} (-1)^{|S| - |U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j} }{ \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j} }. \]
Proof.
See Section 4.1.

Substituting the result in Theorem 1 into Equations (2) and (3) gives us the following corollary.
Corollary 1
The probability distribution of $w(x(t), u(t))$, indicating whether a red marble is drawn at stage $t$, is
\[ P\big( w(x(t), u(t)) = k \big) = \begin{cases} \dfrac{ \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{i \in S} \frac{x_i(t+1)}{N_i} }{ \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{i \in S} \frac{x_i(t)}{N_i} }, & k = 0, \\[2ex] \dfrac{ \Big( \frac{1}{N_{u(t)}} \Big) \sum_{\{ S \subseteq \mathcal{V} : u(t) \in S \}} (-1)^{|S| - 1} \phi_S \prod_{i \in S \setminus \{u(t)\}} \frac{x_i(t)}{N_i} }{ \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{i \in S} \frac{x_i(t)}{N_i} }, & k = 1, \end{cases} \]
where $x(t+1) = x(t) + e_{u(t)}$ is the state reached after a blue marble is drawn from urn $u(t)$.

Theorem 1 provides a few important insights. First, we can see that the urn probabilities at any stage depend only on the numbers of marbles drawn from all of the urns, and not the order in which they were drawn. This path-independence property of the stage $t$ urn probabilities is somewhat intuitive: a specific set of marbles drawn gives us a fixed amount of information irrespective of the order in which we inspect the marbles. We will make use of the path-independence property in the proofs for Theorems 2, 4, and 6.

Another observation is that the form of the probability expression is similar to the well-known inclusion-exclusion formula for computing probabilities of unions of events. In fact, this probability is an application of the principle of inclusion-exclusion applied in conjunction with Bayesian updates. In Lemma 1, we explicitly define the events that are characterized by the inclusion-exclusion formulas in Theorem 1.
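As a minimal sketch of how Theorem 1 and Equation (2) could be evaluated numerically, the code below computes the updated urn probabilities from a table of $\phi_S$ values and a state $x(t)$. The function names and the two-urn instance are hypothetical and chosen only for illustration; the prior used here is deliberately not independent ($\phi_{\{A,B\}} = 0.1 \neq \phi_A \phi_B$).

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def no_red_prob(phi, N, x):
    """Denominator of Theorem 1: sum_S (-1)^|S| phi_S prod_{j in S} x_j / N_j."""
    total = 0.0
    for S in subsets(N):
        term = (-1) ** len(S) * phi[S]
        for j in S:
            term *= x[j] / N[j]
        total += term
    return total


def posterior(phi, N, x, U):
    """Theorem 1: P(red marble in every urn of U | state x, no red drawn yet)."""
    U = frozenset(U)
    lead = 1.0
    for i in U:
        lead *= 1.0 - x[i] / N[i]
    numer = 0.0
    for S in subsets(N):
        if U <= S:  # sum runs over supersets of U only
            term = (-1) ** (len(S) - len(U)) * phi[S]
            for j in S - U:
                term *= x[j] / N[j]
            numer += term
    return lead * numer / no_red_prob(phi, N, x)


def red_draw_prob(phi, N, x, u):
    """Equation (2): probability that the next marble drawn from urn u is red."""
    return posterior(phi, N, x, {u}) / (N[u] - x[u])


# Hypothetical two-urn instance with dependent connection events.
N = {"A": 3, "B": 2}
phi = {
    frozenset(): 1.0,
    frozenset({"A"}): 0.5,
    frozenset({"B"}): 0.3,
    frozenset({"A", "B"}): 0.1,
}
x = {"A": 1, "B": 1}  # one blue marble already drawn from each urn

print(posterior(phi, N, x, {"A"}))   # updated probability that urn A holds a red marble
print(red_draw_prob(phi, N, x, "A"))
```

Note that `posterior` takes only the count vector `x`, not the order of past draws, which mirrors the path-independence property discussed above.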
In many network search applications, the cost of finding and examining a (random) neighbor of a known vertex is primarily the time consumed in executing the query and reviewing the results to determine whether the neighbor is the search target. Because we have no reason to believe this time-cost would be different for different vertex-neighbor queries, we assume in our model that the cost of drawing a marble is the same for all urns. The goal of the searcher is simply to minimize the number of blue marbles drawn, or vertex-neighbor queries executed, before finding the search target.

We therefore define the cost function at stage $t$,
\[ g(t) = \begin{cases} 1, & w(x(t), u(t)) = 0, \\ 0, & \text{otherwise}, \end{cases} \]
which applies a unit cost for every blue marble drawn. Because this quantity is stochastic, we set as our objective the minimization of expected total cost. Letting random variable $C = \sum_{t=0}^{N} g(t)$, we aim to find the optimal policy $u$ to solve the following optimization problem:
\[ \underset{u}{\text{minimize}} \;\; E[C]. \]
Because $C \in \{0, 1, \ldots, N\}$ almost surely, we can write
\[ \begin{aligned} E[C] &= \sum_{k=0}^{N-1} P(C > k) \\ &= \sum_{k=0}^{N-1} \prod_{t=0}^{k} P\big( w(x(t), u(t)) = 0 \big). \end{aligned} \tag{4} \]
The product in this summation, $\prod_{t=0}^{k} P(w(x(t), u(t)) = 0)$, is exactly the probability of making it to stage $k+1$ without having found a red marble. From Corollary 1 we can find an expression for this probability.

Corollary 2
Given a multi-urn search problem and valid policy u , the probability of arriving instage k + 1 without having found a red marble is P ( C > k ) = k (cid:89) t =0 P ( w ( x ( t ) , u ( t )) = 0)= (cid:88) S ⊆V ( − | S | ϕ S (cid:89) i ∈ S x i ( k + 1) N i . arks and Zaman: Multi-Urn Model for Network Search We can therefore rewrite the cost function, E [ C ] = N − (cid:88) t =0 (cid:88) S ⊆V ( − | S | ϕ S (cid:89) i ∈ S x i ( t + 1) N i . (5)Substituting the probability from Corollary 2 into Corollary 1 reveals an interesting property ofthe dynamics of this system: P ( w ( x ( t ) , u ( t )) = 0) = P ( C > t ) P ( C > t − . (6)
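Equation (5) gives a direct way to score any valid policy: accumulate the Corollary 2 probability along the no-red state trajectory. The sketch below illustrates this under assumptions of our own: the function names `survival` and `expected_cost`, the `frozenset`-keyed dictionary for $\phi_S$, and the hypothetical two-urn instance are not from the paper.

```python
from fractions import Fraction
from itertools import combinations


def survival(phi, N, x):
    """P(no red marble found) after x[i] draws from each urn i (Corollary 2)."""
    total = Fraction(0)
    for r in range(len(N) + 1):
        for S in combinations(range(len(N)), r):
            term = Fraction(phi[frozenset(S)]) * (-1) ** r
            for i in S:
                term *= Fraction(x[i], N[i])
            total += term
    return total


def expected_cost(phi, N, policy):
    """Equation (5): sum of P(C > t) along the no-red trajectory of `policy`."""
    x = [0] * len(N)
    cost = Fraction(0)
    for u in policy:
        x[u] += 1  # state x(t + 1) after a blue draw from urn u(t)
        cost += survival(phi, N, x)
    return cost


# Hypothetical instance: two correlated urns with 2 and 3 marbles.
N = [2, 3]
phi = {
    frozenset(): Fraction(1),
    frozenset({0}): Fraction(1, 2),
    frozenset({1}): Fraction(1, 2),
    frozenset({0, 1}): Fraction(1, 3),
}
print(expected_cost(phi, N, [0, 0, 1, 1, 1]))  # exhaust urn 0, then urn 1
print(expected_cost(phi, N, [0, 1, 0, 1, 1]))  # interleaved draws
```

On this assumed instance, the policy that exhausts urn 0 first has the lower expected cost of the two.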
3. Key Results
The cost function in equation (5) is nonlinear and nonconvex. Additionally, for a solution to be feasible, the values for $x_i(t)$, $i = 1, \ldots, |\mathcal{V}|$, $t = 0, \ldots, N$, must be constrained to correspond to stages reached by a valid policy. Nonlinear, nonconvex constrained optimization is difficult in general. However, the structure of the cost function enables us to derive some useful results that characterize the optimal solution in general, and provide necessary and sufficient conditions for optimality in some specific cases.

We now provide our primary general result, in which we give a characterization of an optimal search policy in the multi-urn search problem. We begin with a definition.

Definition 2 A block policy is a valid policy $u_B$ in which each urn is queried exhaustively prior to querying another urn. A block policy can be specified as a sequence of urns $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$, $v_i \in \mathcal{V}$, implying
\[
u(t) = v_i, \qquad \sum_{j=1}^{i-1} N_{v_j} \le t < \sum_{j=1}^{i} N_{v_j}.
\]
This definition can be used to characterize the optimal policy, which we now formally state.
Theorem 2 (Block Policy Optimality).
Given a multi-urn search problem in which the objective is to minimize the expected number of searches required to find a red marble, an optimal search policy exists that is a block policy.

Proof. See Section 4.2.

This result says that an optimal policy for the multi-urn search problem can be characterized by a sequence of urns. Once this sequence is specified, one simply searches each urn until it is out of marbles or a red marble is found. The surprising part of this result is that block policy optimality holds for arbitrary correlations in the a priori connection probabilities. For instance, there can be a negative correlation between two urns, where if the red marble is more likely to be in one urn, it is less likely to be in another. In this case one may intuitively expect that after querying an urn many times and not finding a red marble, at some point it might be advantageous to search another urn which has a negative correlation with the queried urn. However, our result says that it is optimal to continue querying the current urn until it is exhausted.

While Theorem 2 allows for optimal policies that are not block policies, constructing such a case requires initial conditions that include probabilities that are zero. If $\phi_{\{i,j\}} > 0$ for all $\{i, j\} \subset \mathcal{V}$ (as in the case of independent urns), then only block policies can be optimal policies. This result follows from the proof of Theorem 2 (Section 4.2): observe that this condition implies that function $h(t)$ in equation (13) is strictly increasing in $t$, creating a contradiction in equation (14).

We have shown that for multi-urn search problems, a block policy is optimal, but we have not yet specified what the block policy is. In general it can be difficult under arbitrary correlation structures to find the optimal policy. However, under certain assumptions on the urn probability model, explicit necessary and sufficient optimality conditions can be found. We examine these conditions next.

We now consider the special case in which the red marbles are assumed to be independently present in each of the urns, so that the presence of a red marble in any urn (or group of urns) does not affect the probability of a red marble being present in any other urn (or group of urns). This probabilistic independence can be formalized mathematically.
Definition 3 An independent multi-urn search problem is a multi-urn search problem in which, for any subset of urns, $U \subseteq \mathcal{V}$,
\[
\phi_U = \prod_{i \in U} \phi_i.
\]
Intuitively, this independence property should be maintained throughout the search process for any search policy, as we now show explicitly.
Theorem 3 (Independent Urn Probabilities).
Given an independent multi-urn search problem, for any policy $u$, at any stage $t$, the independence property is maintained so that
\[
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) = \prod_{i \in U} P\big(A_i \mid x(t)\big).
\]

Proof. See Section 4.3.

It follows from the result in Theorem 3 that the probability of finding a red marble in stage $t$ is only a function of the initial conditions and the number of times $u(t)$ has been queried in the past. The number of marbles that have previously been drawn from other urns $i \neq u(t)$ does not affect $P(w(x(t), u(t)) = 1)$.

Because of the independence of the urn probabilities, we are able to obtain closed form expressions for the probability of finding a red marble and the expected cost of a block policy, which are stated in the following results.

Corollary 3
Given an independent multi-urn search problem, the probability distribution of $w(x(t), u(t))$ at any stage $t$ is
\[
P\big(w(x(t), u(t)) = 0\big) = \frac{N_{u(t)} - x_{u(t)}(t+1)\,\phi_{u(t)}}{N_{u(t)} - x_{u(t)}(t)\,\phi_{u(t)}}.
\]

Corollary 4
Given an independent multi-urn search problem and a block policy $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$, $v_i \in \mathcal{V}$, let $\tau(i) = \sum_{j=1}^{i-1} N_{v_j}$ be the first stage in which urn $v_i$ is queried. Then,
\[
\prod_{t=\tau(i)}^{\tau(i)+N_{v_i}-1} P\big(w(x(t), u(t)) = 0\big) = (1 - \phi_{v_i}),
\]
the contribution of urn $v_i$ to the total expected cost is
\[
\sum_{k=\tau(i)}^{\tau(i)+N_{v_i}-1} \prod_{t=0}^{k} P\big(w(x(t), u(t)) = 0\big) = \left(N_{v_i} - \frac{(N_{v_i}+1)\,\phi_{v_i}}{2}\right) \prod_{j=1}^{i-1} (1 - \phi_{v_j}),
\]
and the total expected cost is
\[
E[C] = \sum_{i=1}^{|\mathcal{V}|} \left(N_{v_i} - \frac{(N_{v_i}+1)\,\phi_{v_i}}{2}\right) \prod_{j=1}^{i-1} (1 - \phi_{v_j}).
\]

Independence implies that knowing the composition of marbles in urn $i$ does not provide any additional information on the compositions of marbles in any of the other urns. Drawing a marble from urn $u(t)$ in stage $t$ still results in an update to this urn's probability in stage $t+1$, but all other urn probabilities remain stationary in this state transition. This property enables us to characterize the optimal policy in the case of independent urns.

Theorem 4 (Independent Urns Optimality).
Given an independent multi-urn search problem, a block policy $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$ is optimal if and only if
\[
N_{v_i} \left(\frac{2 - \phi_{v_i}}{\phi_{v_i}}\right) \le N_{v_{i+1}} \left(\frac{2 - \phi_{v_{i+1}}}{\phi_{v_{i+1}}}\right), \qquad i = 1, 2, \ldots, |\mathcal{V}| - 1. \tag{7}
\]

Proof.
See Section 4.4.

We note that this policy is not greedy, i.e., it does not maximize the probability of finding the red marble at each stage. Rather, the optimality condition in equation (7) balances the probability of immediately drawing a red marble with the probability of finding a red marble in successive draws from the same urn.

To gain intuition, consider a two-urn example in which each urn has the same probability of containing a red marble ($\phi_1 = \phi_2$), but urn 1 has fewer marbles ($N_1 < N_2$). In this case the optimality condition in equation (7) would have us initially draw marbles from urn 1, which has the same probability of giving us the red marble as urn 2 but requires fewer draws.

Alternatively, consider the case in which the two urns have the same number of marbles but $\phi_1 < \phi_2$. In this case, the optimal policy according to equation (7) would have us draw from urn 2 first, which is more likely than urn 1 to give us a red marble in the same number of draws.

In order to more clearly distinguish between a greedy policy and an optimal one, we provide one more example. Consider the following independent multi-urn search problem with two urns. Urn 1 contains $N_1 = 1$ marble and has probability $\phi_1 = \frac{9}{16}$ of containing a red marble. Urn 2 contains $N_2 = 2$ marbles and has probability $\phi_2 = 1$ of containing a red marble. This problem admits two block policies: $u_B = (2, 1)$ and $\tilde{u}_B = (1, 2)$. Policy $\tilde{u}_B$ is a greedy policy; urn 1, which has the highest immediate probability of producing a red marble ($\frac{9}{16}$), is queried before urn 2. The expected number of blue marbles drawn using policy $\tilde{u}_B$ is
\[
E[\tilde{C}] = \frac{7}{16} + \left(\frac{7}{16}\right)\left(\frac{1}{2}\right) = \frac{21}{32}.
\]
Alternatively, if we follow policy $u_B$ and draw from urn 2 first, then the expected cost is
\[
E[C] = \frac{1}{2},
\]
which is optimal. By accepting a slightly lower probability in the first draw, this policy guarantees that the red marble is found in at most two draws. The optimality condition in Theorem 4, equation (7), provides the best balance between the immediate and long-term benefits of each query.

We now turn our attention to another special case of the multi-urn search problem in which we limit the total number of red marbles among all of the urns to a single marble. In our network search scenario, this constraint would follow from assuming that the target user is connected to at most one of the known accounts on Mary's list. This might not be a valid assumption for Mary to make, but it might apply to other search scenarios both in and out of the network context.
For example, suppose law enforcement investigators have evidence that a suspect made a single phone call from an unknown phone number during a certain period. Having obtained phone records from all likely recipients, they must efficiently search for the phone call of interest within the records of these likely recipients.

For a non-network example, suppose a hotel custodian, after servicing all of the hotel rooms, realizes he left his car keys in one of the rooms. The hotel might consist of several wings, each with different numbers of rooms, and the custodian might feel the loss was more probable in certain wings. The custodian wants to search the rooms efficiently for his keys, in order to find them before new customers begin to arrive.

This one-marble constraint imposes the strongest negative correlations between the urns: if the red marble is in urn $i$ then it cannot be in urn $j$, i.e.,
\[
P(A_j \mid A_i) = 0, \qquad i \neq j.
\]
Another way to characterize this constraint is to state that the events $A_1, A_2, \ldots, A_{|\mathcal{V}|}$ are disjoint. We now formalize this notion in a definition.

Definition 4
A single marble multi-urn search problem is a multi-urn search problem for which
\[
\phi_U = 0 \quad \forall\, U \subseteq \mathcal{V} \text{ such that } |U| > 1.
\]

We now analyze the single marble search problem. First we observe that
\[
\sum_{i \in \mathcal{V}} \phi_i \le 1.
\]
We allow for the possibility that this sum is strictly less than one, implying there is a chance that none of the urns contain the red marble. If the sum is equal to one, then the assumption is that exactly one of the urns contains one red marble. Theorem 5 specifies the single marble urn probabilities for an arbitrary state $x(t)$.

Theorem 5 (Single Marble Urn Probabilities).
Given a single marble multi-urn search problem and a search policy $u$, the probability that a red marble is in urn $i$ given state $x(t)$, and given no red marble has been found in the first $t$ queries, is
\[
P\big(A_i \mid x(t)\big) = \frac{\left(1 - \frac{x_i(t)}{N_i}\right) \phi_i}{1 - \sum_{j \in \mathcal{V}} \frac{x_j(t)\,\phi_j}{N_j}}.
\]
The probability that the red marble is in all of the urns in any subset $U \subset \mathcal{V}$, where $|U| > 1$, is
\[
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) = 0.
\]

Proof. This result follows immediately from the definition of a single marble multi-urn search problem and Theorem 1. $\square$
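As a brief sketch of why the reduction is immediate, take the general urn-probability expression of Theorem 1 (in the form used as the induction hypothesis in Section 4) with $U = \{i\}$, and set $\phi_S = 0$ whenever $|S| > 1$. The convention $\phi_\emptyset = 1$ is assumed here. Only the singleton $S = \{i\}$ survives in the numerator, and only the empty set and the singletons survive in the denominator:

```latex
\begin{align*}
P\big(A_i \mid x(t)\big)
&= \frac{\left(1 - \frac{x_i(t)}{N_i}\right)
   \sum_{\{S \subseteq \mathcal{V} : i \in S\}} (-1)^{|S|-1} \phi_S
   \prod_{j \in S \setminus \{i\}} \frac{x_j(t)}{N_j}}
  {\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}} \\
&= \frac{\left(1 - \frac{x_i(t)}{N_i}\right) \phi_i}
  {\phi_\emptyset - \sum_{j \in \mathcal{V}} \phi_j \frac{x_j(t)}{N_j}}
 = \frac{\left(1 - \frac{x_i(t)}{N_i}\right) \phi_i}
  {1 - \sum_{j \in \mathcal{V}} \frac{x_j(t)\,\phi_j}{N_j}} .
\end{align*}
```

For $|U| > 1$, every $S \supseteq U$ has $|S| > 1$, so each term of the numerator vanishes and the joint probability is zero.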
Because the red marble can only be in one urn, all joint probabilities are zero. This greatly simplifies our analysis and allows us to obtain closed form expressions for the probability of finding a red marble and the expected cost of a block policy, which are stated in the following results.

Corollary 5
Given a single marble multi-urn search problem and a valid search policy $u$, the probability distribution of $w(x(t), u(t))$, conditioned on not having found a red marble in a previous stage, is
\[
P\big(w(x(t), u(t)) = k\big) =
\begin{cases}
\dfrac{1 - \sum_{j \in \mathcal{V}} \frac{x_j(t+1)\,\phi_j}{N_j}}{1 - \sum_{j \in \mathcal{V}} \frac{x_j(t)\,\phi_j}{N_j}}, & k = 0, \\[2ex]
\dfrac{1}{N_{u(t)}} \left(\dfrac{\phi_{u(t)}}{1 - \sum_{j \in \mathcal{V}} \frac{x_j(t)\,\phi_j}{N_j}}\right), & k = 1.
\end{cases}
\]

Corollary 6
Given a single marble multi-urn problem and a block policy $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$, $v_i \in \mathcal{V}$, let $\tau(i) = \sum_{j=1}^{i-1} N_{v_j}$ be the first stage in which urn $v_i$ is queried. Then, the probability of not finding the red marble before reaching stage $\tau(i)$ is
\[
\prod_{t=0}^{\tau(i)-1} P\big(w(x(t), u(t)) = 0\big) = 1 - \sum_{j=1}^{i-1} \phi_{v_j},
\]
the contribution of urn $v_i$ to the total expected cost is
\[
\sum_{k=\tau(i)}^{\tau(i)+N_{v_i}-1} \prod_{t=0}^{k} P\big(w(x(t), u(t)) = 0\big) = N_{v_i} - \frac{(N_{v_i}+1)\,\phi_{v_i}}{2} - N_{v_i} \sum_{j=1}^{i-1} \phi_{v_j},
\]
and the total expected cost is
\[
E[C] = \sum_{i=1}^{|\mathcal{V}|} \left(N_{v_i} - \frac{(N_{v_i}+1)\,\phi_{v_i}}{2}\right) - \sum_{i=1}^{|\mathcal{V}|-1} \sum_{j=i+1}^{|\mathcal{V}|} N_{v_j}\,\phi_{v_i}.
\]

Theorem 6 characterizes the optimal solution in the single marble case.

Theorem 6 (Single Marble Optimality).
Given a single marble multi-urn search problem, a block policy $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$ is an optimal policy if and only if
\[
\frac{\phi_{v_i}}{N_{v_i}} \ge \frac{\phi_{v_{i+1}}}{N_{v_{i+1}}}, \qquad i = 1, 2, \ldots, |\mathcal{V}| - 1. \tag{8}
\]

Proof. See Section 4.5.
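The ordering rule in equation (8) and the block-policy cost of Corollary 6 are simple enough to check numerically. The sketch below is an illustration under our own assumptions: the function names and the three-urn numbers (loosely modeled on the hotel-wing scenario) are hypothetical, and the brute-force comparison only ranges over block policies.

```python
from fractions import Fraction
from itertools import permutations


def single_marble_order(phi, N):
    """Urn indices sorted by phi_i / N_i, largest first (equation (8))."""
    return sorted(range(len(N)), key=lambda i: Fraction(phi[i]) / N[i], reverse=True)


def block_cost(phi, N, order):
    """Total expected cost of a block policy, summing the Corollary 6 contributions."""
    cost = Fraction(0)
    searched_mass = Fraction(0)  # sum of phi over urns already exhausted
    for v in order:
        cost += N[v] - Fraction(N[v] + 1, 2) * phi[v] - N[v] * searched_mass
        searched_mass += phi[v]
    return cost


# Hypothetical wings: 10, 4, and 1 rooms with prior probabilities 1/2, 3/10, 1/10.
N = [10, 4, 1]
phi = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 10)]

order = single_marble_order(phi, N)
best = min(permutations(range(len(N))), key=lambda p: block_cost(phi, N, p))
print(order, block_cost(phi, N, order), list(best))
```

In this assumed instance the smallest urn is searched first even though it has the lowest prior probability, because its probability per marble is the highest.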
The optimality condition given in equation (8) leads to a greedy policy in which, at each stage, the marble that is drawn is the one that is most likely to be red. At stage $t = 0$ this is certainly true, as the probability of drawing a red marble from any urn $i$ in the first draw is $\phi_i / N_i$, which is exactly the quantity ranked by the optimality condition. In the next section we show that this condition is maintained through state transitions in an optimal policy. If drawing a marble from urn $i$ has the highest probability of producing a red marble in stage $t$, then (assuming the urn has at least one marble remaining) drawing another marble from the same urn maximizes the probability of finding a red marble in stage $t + 1$.

We now provide a few monotonicity properties that give additional insight into the dynamics of multi-urn search problems, as well as block policy optimality.
Theorem 7 (Monotonicity).
Given a multi-urn search problem and a search policy $u$, the following inequalities hold:

1. For any subset of urns $U \subseteq \mathcal{V}$ such that $u(t) \in U$,
\[
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) \ge P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t+1)\Big),
\]
with equality holding only in cases in which $P(A_{u(t)} \mid x(t)) = 1$ or $P\big(\bigcap_{i \in U} A_i \mid x(t)\big) = 0$.

2. For any stage $t$ for which urn $u(t)$ has more than one marble remaining, i.e., $N_{u(t)} - x_{u(t)}(t) > 1$,
\[
P\big(w(x(t), u(t)) = 1\big) \le P\big(w(x(t+1), u(t)) = 1\big),
\]
with equality holding only when $P(w(x(t), u(t)) = 1) = 0$.

Proof. See Section 4.6.

These monotonicity properties provide intuition into why optimal block policies exist. Suppose at stage $t$ a blue marble is drawn from urn $u(t)$. At stage $t + 1$, the probability of urn $u(t)$ containing a red marble has decreased as a result of this new information. However, the probability that the next marble drawn from urn $u(t)$ is red has increased from the previous stage. If drawing from urn $u(t)$ in stage $t$ had a high probability of returning a red marble, drawing another marble from $u(t)$ in stage $t + 1$ has an even higher probability of producing a red marble.

In the case of independent urns, this property provides justification for using a block policy. Suppose urn $u(t)$ has multiple marbles in it and is optimal at stage $t$, and a blue marble is drawn from this urn. At stage $t + 1$ the probability of drawing a red marble from urn $u(t)$ has increased, while all other urn and marble probabilities have remained unchanged from the previous stage $t$. It follows that it would continue to be optimal to draw from urn $u(t)$.
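For independent urns, both the rising draw probability and the ordering rule of equation (7) can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the draw probability is taken as the complement of the Corollary 3 expression, every $\phi_i$ is assumed strictly positive, and the two-urn numbers (one marble with probability 9/16, two marbles with probability 1) are an assumed instance.

```python
from fractions import Fraction


def red_draw_prob(phi_u, N_u, x_u):
    """P(next draw from the urn is red) after x_u blue draws: phi / (N - x * phi)."""
    return Fraction(phi_u) / (N_u - x_u * Fraction(phi_u))


def independent_order(phi, N):
    """Urn indices sorted by N_i (2 - phi_i) / phi_i, smallest first (equation (7))."""
    return sorted(range(len(N)),
                  key=lambda i: N[i] * (2 - Fraction(phi[i])) / Fraction(phi[i]))


def independent_block_cost(phi, N, order):
    """Total expected cost of a block policy from Corollary 4."""
    cost, not_found = Fraction(0), Fraction(1)
    for v in order:
        cost += (N[v] - Fraction(N[v] + 1, 2) * phi[v]) * not_found
        not_found *= 1 - Fraction(phi[v])
    return cost


# Assumed two-urn instance (0-indexed): urn 0 has 1 marble, urn 1 has 2 marbles.
N = [1, 2]
phi = [Fraction(9, 16), Fraction(1)]

print([red_draw_prob(phi[1], N[1], x) for x in range(N[1])])  # rises within the urn
print(independent_order(phi, N))
print(independent_block_cost(phi, N, [1, 0]), independent_block_cost(phi, N, [0, 1]))
```

Here urn 0 offers the higher immediate probability, yet the rule in equation (7) places urn 1 first, which is the cheaper of the two block policies on this instance.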
If we allow for correlations among the urns, however, drawing a blue marble from urn $u(t)$ might also increase the probability of drawing a red marble from other urns in the following stage. In the single marble multi-urn search problem, drawing a blue marble from urn $u(t)$ in stage $t$ increases the probability of finding a red marble in each of the other urns in stage $t + 1$. We now provide our final theoretical result, which states that the rate of probability growth in a queried urn is always at least as large as the rate of probability growth in any other urn.

Theorem 8 (Marble Probability Bound).
Given a multi-urn search problem and a search policy $u$, the following inequality holds:
\[
\frac{P\big(w(x(t+1), u(t)) = 1\big)}{P\big(w(x(t), u(t)) = 1\big)} \ge \frac{P\big(w(x(t+1), i) = 1\big)}{P\big(w(x(t), i) = 1\big)}.
\]

Proof. See Section 4.7.

Theorem 8 provides considerable intuition about why optimal block policies always exist in multi-urn search problems. It also shows that a purely greedy strategy, in which the probability of immediately drawing a red marble is maximized at each stage, will always produce a (possibly suboptimal) block policy. Finally, it provides some insight into why the optimality conditions for the single marble multi-urn search problem given in equation (8) result in a greedy policy. Equation (8) specifies that the first marble drawn is the one that maximizes the probability of immediately finding the target. It follows from Theorem 8 that subsequent draws from the same urn will continue to maximize this probability.
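The claim that a purely greedy strategy yields a block policy can be observed on a small correlated instance. The sketch below is only an illustration under stated assumptions: the helper names, the `frozenset`-keyed dictionary for $\phi_S$, the tie-breaking rule (prefer the urn just queried), and the two-urn numbers are all hypothetical, and the instance is chosen so that the no-red probability never reaches zero.

```python
from fractions import Fraction
from itertools import combinations


def survival(phi, N, x):
    """P(no red marble found) after x[i] draws from each urn i (Corollary 2)."""
    total = Fraction(0)
    for r in range(len(N) + 1):
        for S in combinations(range(len(N)), r):
            term = Fraction(phi[frozenset(S)]) * (-1) ** r
            for i in S:
                term *= Fraction(x[i], N[i])
            total += term
    return total


def greedy_policy(phi, N):
    """At each stage, draw from the urn most likely to yield a red marble."""
    x, policy = [0] * len(N), []
    for _ in range(sum(N)):
        base = survival(phi, N, x)  # assumed nonzero for this sketch

        def p_red(i):
            nxt = list(x)
            nxt[i] += 1
            return 1 - survival(phi, N, nxt) / base

        last = policy[-1] if policy else None
        open_urns = [i for i in range(len(N)) if x[i] < N[i]]
        u = max(open_urns, key=lambda i: (p_red(i), i == last))
        policy.append(u)
        x[u] += 1
    return policy


def is_block(policy):
    """True if no urn is revisited after the policy has moved to another urn."""
    seen = []
    for u in policy:
        if seen and seen[-1] == u:
            continue
        if u in seen:
            return False
        seen.append(u)
    return True


# Hypothetical instance: two correlated urns with 2 and 3 marbles.
N = [2, 3]
phi = {frozenset(): Fraction(1), frozenset({0}): Fraction(1, 2),
       frozenset({1}): Fraction(1, 2), frozenset({0, 1}): Fraction(1, 3)}
policy = greedy_policy(phi, N)
print(policy, is_block(policy))
```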
In the preceding section we examined how urn probabilities and the probability of drawing a red marble from each urn evolved as a function of the state of the system. In this section we show by example how, in general, the correlations among the urns can evolve in ways that we find to be counterintuitive.

We say that two urns $i$ and $j$ are positively correlated at stage $t$ if they have positive covariance, i.e.,
\[
P\big(A_i \cap A_j \mid x(t)\big) > P\big(A_i \mid x(t)\big)\, P\big(A_j \mid x(t)\big).
\]
Likewise, urns $i$ and $j$ are negatively correlated if their covariance is negative,
\[
P\big(A_i \cap A_j \mid x(t)\big) < P\big(A_i \mid x(t)\big)\, P\big(A_j \mid x(t)\big).
\]
In Theorem 3 we showed the somewhat intuitive result that independence among the urn probabilities is preserved through state transitions. In general, correlations can change through Bayesian updates each time a blue marble is drawn. These changes can include changes in sign.

We now provide an example scenario in which all correlations are positive in the initial conditions, but after a blue marble is drawn some correlations become negative. Suppose we have a multi-urn search problem consisting of three urns. Each urn $i$ has $N_i = 1$ marble and initial probability $\phi_i = \frac{1}{2}$ of containing a red marble. Furthermore,
\[
\phi_{\{1,2\}} = \phi_{\{1,3\}} = \phi_{\{2,3\}} = \phi_{\{1,2,3\}} = \frac{1}{3}.
\]
A quick calculation confirms that this is a valid probability model, and that a red marble exists in at least one of the three urns with probability $\frac{5}{6}$. We also verify that all correlations are positive,
\[
\phi_{\{1,2\}} > \phi_1 \phi_2, \qquad \phi_{\{1,3\}} > \phi_1 \phi_3, \qquad \phi_{\{2,3\}} > \phi_2 \phi_3.
\]
Also, we have a more general positive correlation property,
\[
\phi_{\{1,2,3\}} > \phi_1 \phi_2 \phi_3.
\]

Figure 2 Venn diagram of the probabilities in the three-urn example.

Figure 2 depicts this probability law in a Venn diagram. Note that there is no probability of exactly two urns containing red marbles. A red marble is present in zero, one, or three urns almost surely.

Now suppose at stage $t = 0$, a blue marble is drawn from urn 1. From Theorem 1, the stage 1 urn probabilities are
\begin{align*}
P(A_1 \mid x(1)) &= 0, \\
P(A_2 \mid x(1)) &= \tfrac{1}{3}, \\
P(A_3 \mid x(1)) &= \tfrac{1}{3}, \\
P(A_1 \cap A_2 \mid x(1)) &= 0, \\
P(A_1 \cap A_3 \mid x(1)) &= 0, \\
P(A_2 \cap A_3 \mid x(1)) &= 0, \\
P(A_1 \cap A_2 \cap A_3 \mid x(1)) &= 0.
\end{align*}
By eliminating the possibility that all three urns contain a red marble, the only outcomes that have positive probability in stage 1 are single-marble outcomes. The correlation between urns 2 and 3, which was positive in the initial conditions, has become negative in stage 1:
\[
P(A_2 \cap A_3 \mid x(1)) = 0 < \frac{1}{9} = P(A_2 \mid x(1))\, P(A_3 \mid x(1)).
\]

One could similarly produce examples in which correlations that were originally negative become positive after drawing blue marbles. In the single-marble and independent urn cases, for which we provided characterizations of the optimal policies in the preceding sections, correlations among the urns exhibited some stationarity with respect to stage. In general, the nature of correlations among the urns can change substantially as blue marbles are drawn and probabilities are updated. This characteristic presents a challenge to finding characterizations of the optimal policy in general.
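The sign change in this example can be reproduced numerically from the urn-probability formula of Theorem 1. The sketch below is an illustration, not the authors' code: the helper names are hypothetical, and the joint distribution over which urns hold a red marble (mass 1/6 on "none" and on each single urn, 1/3 on all three) is an assumption chosen to be consistent with the joint probabilities of 1/3 in the three-urn example.

```python
from fractions import Fraction
from itertools import combinations


def all_subsets(urns):
    """Every subset of `urns`, including the empty set, as frozensets."""
    return [frozenset(c) for r in range(len(urns) + 1) for c in combinations(urns, r)]


def phi_from_pmf(pmf, urns):
    """phi_S = P(every urn in S holds a red marble), from a pmf over red-urn sets."""
    return {S: sum((p for T, p in pmf.items() if S <= T), Fraction(0))
            for S in all_subsets(urns)}


def posterior(phi, N, x, U):
    """Theorem 1: P(all urns in U hold a red marble | state x, no red found yet)."""
    U = frozenset(U)
    frac = {i: Fraction(x[i], N[i]) for i in N}
    num = den = Fraction(0)
    for S in all_subsets(list(N)):
        term = phi[S] * (-1) ** len(S)
        for j in S:
            term *= frac[j]
        den += term
        if S >= U:
            term = phi[S] * (-1) ** (len(S) - len(U))
            for j in S - U:
                term *= frac[j]
            num += term
    for i in U:
        num *= 1 - frac[i]
    return num / den


urns = [1, 2, 3]
N = {1: 1, 2: 1, 3: 1}
sixth = Fraction(1, 6)
pmf = {frozenset(): sixth, frozenset({1}): sixth, frozenset({2}): sixth,
       frozenset({3}): sixth, frozenset({1, 2, 3}): Fraction(1, 3)}
phi = phi_from_pmf(pmf, urns)

x0 = {1: 0, 2: 0, 3: 0}  # initial state
x1 = {1: 1, 2: 0, 3: 0}  # after a blue draw from urn 1
for x in (x0, x1):
    joint = posterior(phi, N, x, {2, 3})
    product = posterior(phi, N, x, {2}) * posterior(phi, N, x, {3})
    print(joint, product)
```

The first printed pair shows a joint probability above the product of the marginals (positive correlation), and the second shows it below (negative correlation).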
4. Proofs of Theorems
In this section we provide the technical proof for each theorem.
Proof of Theorem 1
Substituting the initial condition, $x(0) = \mathbf{0}$, into the result returns the prior
\[
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(0)\Big) = \phi_U := P\Big(\bigcap_{i \in U} A_i\Big).
\]
The proof proceeds by induction. First note that in order to reach state $x(t+1)$ from stage $t$, a blue marble must have been drawn from urn $u(t)$ from state $x(t)$. We use the law of total probability to decompose this event and form a recursion:
\begin{align*}
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t+1)\Big)
&= \frac{P\Big(\big(\bigcap_{i \in U} A_i\big) \cap \{w(x(t), u(t)) = 0\} \,\Big|\, x(t)\Big)}{P\big(w(x(t), u(t)) = 0\big)} \\
&= \frac{\Big(1 - \frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\Big(\bigcap_{i \in U \cup u(t)} A_i \,\Big|\, x(t)\Big) + P\Big(A^c_{u(t)} \cap \big(\bigcap_{i \in U} A_i\big) \,\Big|\, x(t)\Big)}{1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\big(A_{u(t)} \mid x(t)\big)} \\
&= \frac{P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\Big(\bigcap_{i \in U \cup u(t)} A_i \,\Big|\, x(t)\Big)}{1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\big(A_{u(t)} \mid x(t)\big)}. \tag{9}
\end{align*}
The result in Theorem 1 forms our induction hypothesis. We use this result to form the three probabilities in the recursion given in equation (9).
\[
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) = \frac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}
\]
\[
P\big(A_{u(t)} \mid x(t)\big) = \frac{\big(1 - \frac{x_{u(t)}(t)}{N_{u(t)}}\big) \sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|-1} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}
\]
\[
P\Big(\bigcap_{i \in U \cup u(t)} A_i \,\Big|\, x(t)\Big) =
\begin{cases}
\dfrac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}, & u(t) \in U, \\[3ex]
\dfrac{\Big[\prod_{i \in (U \cup u(t))} \big(1 - \frac{x_i(t)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq (U \cup u(t))\}} (-1)^{|S|-|U|-1} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}, & u(t) \notin U.
\end{cases}
\]
As we see from these probabilities, we have two cases to consider:
1. $u(t) \in U$.
2. $u(t) \notin U$.
We now look at each of these cases individually.

Case 1: $u(t) \in U$. We begin by substituting the probabilities formed using the induction hypothesis into the recursion in equation (9).
\begin{align*}
P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t+1)\Big)
&= \frac{P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\Big(\bigcap_{i \in U \cup u(t)} A_i \,\Big|\, x(t)\Big)}{1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\big(A_{u(t)} \mid x(t)\big)} \\[1ex]
&= \frac{\Big(1 - \frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \dfrac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}}{1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \left(\dfrac{\big(1 - \frac{x_{u(t)}(t)}{N_{u(t)}}\big) \sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|-1} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}\right)} \\[1ex]
&= \frac{\Big(\frac{N_{u(t)} - x_{u(t)}(t) - 1}{N_{u(t)} - x_{u(t)}(t)}\Big) \dfrac{\Big[\Big(\frac{N_{u(t)} - x_{u(t)}(t)}{N_{u(t)}}\Big) \prod_{i \in U \setminus \{u(t)\}} \big(1 - \frac{x_i(t)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}}{1 - \Big(\frac{1}{N_{u(t)}}\Big) \left(\dfrac{\sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|-1} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}\right)}
\end{align*}
We proceed by separating the summations into terms corresponding to sets containing $u(t)$ and those that do not contain $u(t)$. We can then factor out the terms corresponding to urn $u(t)$ and make the following substitutions:
\[
x_i(t) =
\begin{cases}
x_i(t+1), & i \neq u(t), \\
x_i(t+1) - 1, & i = u(t).
\end{cases}
\]
Continuing the simplification from above,
\begin{align*}
&= \frac{\Big(\frac{N_{u(t)} - x_{u(t)}(t+1)}{N_{u(t)}}\Big) \Big[\prod_{i \in U \setminus \{u(t)\}} \big(1 - \frac{x_i(t+1)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j} - \Big(\frac{1}{N_{u(t)}}\Big) \Big(\sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|-1} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}\Big)} \\[1ex]
&= \frac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t+1)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}}{\sum_{S \subseteq \mathcal{V} \setminus u(t)} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j} + \Big(\frac{x_{u(t)}(t)}{N_{u(t)}} + \frac{1}{N_{u(t)}}\Big) \Big(\sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}\Big)} \\[1ex]
&= \frac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t+1)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}}{\sum_{S \subseteq \mathcal{V} \setminus u(t)} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t+1)}{N_j} + \sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t+1)}{N_j}} \\[1ex]
&= \frac{\Big[\prod_{i \in U} \big(1 - \frac{x_i(t+1)}{N_i}\big)\Big] \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t+1)}{N_j}}.
\end{align*}
Observe that this is the desired result for stage $t + 1$. We now provide the induction step for the case in which $u(t) \notin U$.

Case 2: $u(t) \notin U$ follows a similar set of steps. We begin by substituting the probabilities formed using the induction hypothesis into the recursion in equation (9).
\begin{align*}
&\frac{P\Big(\bigcap_{i \in U} A_i \,\Big|\, x(t)\Big) - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\Big(\bigcap_{i \in U \cup u(t)} A_i \,\Big|\, x(t)\Big)}{1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) P\big(A_{u(t)} \mid x(t)\big)} \\[1ex]
&\quad = \left[\frac{\Big(\prod_{i \in U} \big(1 - \frac{x_i(t)}{N_i}\big)\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}\right. \\
&\qquad \left. - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \frac{\Big(\prod_{i \in U \cup u(t)} \big(1 - \frac{x_i(t)}{N_i}\big)\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U \cup u(t)\}} (-1)^{|S|-|U|-1} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}\right] \\
&\qquad \times \left[1 - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \frac{\big(1 - \frac{x_{u(t)}(t)}{N_{u(t)}}\big) \sum_{\{S \subseteq \mathcal{V} : u(t) \in S\}} (-1)^{|S|-1} \phi_S \prod_{j \in S \setminus \{u(t)\}} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \phi_S \prod_{j \in S} \frac{x_j(t)}{N_j}}\right]^{-1}
\end{align*}
The denominators in the above expression reduce in exactly the same way as in the previous case in which $u(t) \in U$. In fact, the denominators in the induction hypothesis and in equation (9) do not depend on whether $u(t) \in U$. Because we have already shown the steps for reducing this denominator to the desired form, we omit these steps and only show the induction on the numerators for this case:
\begin{align*}
&\left[\Big(\prod_{i \in U} \big(1 - \tfrac{x_i(t)}{N_i}\big)\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}\right. \\
&\qquad \left. - \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \prod_{i \in U \cup u(t)} \big(1 - \tfrac{x_i(t)}{N_i}\big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U \cup u(t)\}} (-1)^{|S|-|U|-1} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t)}{N_j}\right] \\[1ex]
&= \Big(\prod_{i \in U} \big(1 - \tfrac{x_i(t)}{N_i}\big)\Big) \left[\sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}\right. \\
&\qquad \left. + \Big(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\Big) \Big(\frac{N_{u(t)} - x_{u(t)}(t)}{N_{u(t)}}\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U \cup u(t)\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t)}{N_j}\right] \\[1ex]
&= \Big(\prod_{i \in U} \big(1 - \tfrac{x_i(t)}{N_i}\big)\Big) \left[\sum_{\{S \subseteq \mathcal{V} \setminus \{u(t)\} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t)}{N_j}\right. \\
&\qquad \left. + \Big(\frac{x_{u(t)}(t)}{N_{u(t)}} + \frac{1}{N_{u(t)}}\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U \cup u(t)\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t)}{N_j}\right]
\end{align*}
We again make the substitution:
\[
x_i(t) =
\begin{cases}
x_i(t+1), & i \neq u(t), \\
x_i(t+1) - 1, & i = u(t),
\end{cases}
\]
and continue from above:
\begin{align*}
&= \Big(\prod_{i \in U} \big(1 - \tfrac{x_i(t+1)}{N_i}\big)\Big) \left[\sum_{\{S \subseteq \mathcal{V} \setminus \{u(t)\} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}\right. \\
&\qquad \left. + \Big(\frac{x_{u(t)}(t+1)}{N_{u(t)}}\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U \cup u(t)\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus (U \cup u(t))} \frac{x_j(t+1)}{N_j}\right] \\[1ex]
&= \Big(\prod_{i \in U} \big(1 - \tfrac{x_i(t+1)}{N_i}\big)\Big) \sum_{\{S \subseteq \mathcal{V} : S \supseteq U\}} (-1)^{|S|-|U|} \phi_S \prod_{j \in S \setminus U} \frac{x_j(t+1)}{N_j}.
\end{align*}
This final expression is the numerator in Theorem 1 for the stage $t + 1$ urn probabilities. $\square$

Before we provide a proof for Theorem 2, we state and prove the following Lemma.
Lemma 1
Given a multi-urn search problem on a set of urns $\mathcal{V}$, suppose a policy $u$ is executed to stage $t$ irrespective of whether a red marble is found at any stage. Let $B_i$ be the event that a red marble has been drawn from urn $i \in \mathcal{V}$ in this experiment. Then, for any subset $U \subseteq \mathcal{V}$, the probability of having drawn a red marble from each of the urns in $U$ and none of the other urns is
\[
P\left(\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcap_{j \in \mathcal{V}\setminus U} B_j^c\right)\right) = \sum_{S:\, S \subseteq \mathcal{V},\, S \supseteq U} (-1)^{|S|-|U|}\, \varphi_S \prod_{i \in S} \frac{x_i(t)}{N_i} \ge 0.
\]

Proof of Lemma 1.
This result comes from the principle of inclusion-exclusion, and follows from Equation 1. From basic set operations and De Morgan's law, we can write
\[
\bigcap_{i \in U} B_i = \left[\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcap_{j \in \mathcal{V}\setminus U} B_j^c\right)\right] \cup \left[\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcup_{j \in \mathcal{V}\setminus U} B_j\right)\right],
\]
which is a union of disjoint sets. Therefore,
\[
P\left(\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcap_{j \in \mathcal{V}\setminus U} B_j^c\right)\right) = P\left(\bigcap_{i \in U} B_i\right) - P\left(\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcup_{j \in \mathcal{V}\setminus U} B_j\right)\right). \tag{10}
\]
The probability that a red marble is drawn from all of the urns in $U$ in this experiment can be found using the multiplication rule:
\[
P\left(\bigcap_{i \in U} B_i\right) = \varphi_U \prod_{i \in U} \frac{x_i(t)}{N_i}. \tag{11}
\]
Recall that $\varphi_U$ is the probability of all of the urns in set $U$ containing a red marble, and $x_i(t)/N_i$ is simply the fraction of marbles removed from urn $i$ at stage $t$. The product in this expression implies conditional independence: given that all of the urns in $U$ contain a red marble, the probability that a red marble is drawn from each of them by stage $t$ is the product of the individual probabilities. This conditional independence follows implicitly from our search assumptions. The order of marble draws from each urn is random, and does not depend on the order of marble draws from any other urn.

We now examine $P\left(\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcup_{j \in \mathcal{V}\setminus U} B_j\right)\right)$. We first note that
\[
\left(\bigcap_{i \in U} B_i\right) \cap \left(\bigcup_{j \in \mathcal{V}\setminus U} B_j\right) = \bigcup_{j \in \mathcal{V}\setminus U} \left(\left(\bigcap_{i \in U} B_i\right) \cap B_j\right).
\]
Using the principle of inclusion-exclusion, we can find the probability of this union,
\[
\begin{aligned}
P\left(\bigcup_{j \in \mathcal{V}\setminus U} \left(\left(\bigcap_{i \in U} B_i\right) \cap B_j\right)\right)
&= \sum_{j \in \mathcal{V}\setminus U} P\left(\left(\bigcap_{i \in U} B_i\right) \cap B_j\right)
 - \sum_{j \in \mathcal{V}\setminus U}\ \sum_{k \in \mathcal{V}\setminus (U \cup \{j\})} P\left(\left(\bigcap_{i \in U} B_i\right) \cap B_j \cap B_k\right) \\
&\quad + \cdots + (-1)^{|\mathcal{V}|-|U|+1}\, P\left(\bigcap_{i \in \mathcal{V}} B_i\right).
\end{aligned} \tag{12}
\]
Substituting expressions (11) and (12) into equation (10) reduces to the desired result. The principle of inclusion-exclusion and the axioms of probability ensure that this quantity is nonnegative. □

We now provide the proof of Theorem 2.
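Before turning to that proof, the identity in Lemma 1 can be sanity-checked numerically. The sketch below is not part of the original analysis: it assumes a small, arbitrary joint distribution `p` over which urns contain a red marble (a hypothetical example with correlated urns), computes $\varphi_S$ from it, and compares the inclusion-exclusion sum with a direct calculation that conditions on the urn contents. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def phi(S, p):
    """phi_S: probability that every urn in S contains a red marble."""
    return sum((prob for R, prob in p.items() if S <= R), Fraction(0))


def lemma1_rhs(U, urns, p, x, N):
    """Inclusion-exclusion sum over all S with U <= S <= urns."""
    total = Fraction(0)
    for S in subsets(urns):
        if U <= S:
            term = phi(S, p)
            for i in S:
                term *= Fraction(x[i], N[i])
            total += (-1) ** (len(S) - len(U)) * term
    return total


def direct_probability(U, urns, p, x, N):
    """P(red drawn from exactly the urns in U), conditioning on urn contents R.

    Given R, a red marble in urn i has been drawn with probability x_i / N_i,
    independently across urns (uniformly random draw order within each urn).
    """
    total = Fraction(0)
    for R, prob in p.items():
        term = prob
        for i in urns:
            hit = Fraction(x[i], N[i]) if i in R else Fraction(0)
            term *= hit if i in U else 1 - hit
        total += term
    return total


# Hypothetical instance: three urns, correlated red-marble placement.
urns = (0, 1, 2)
N = {0: 4, 1: 5, 2: 3}   # marbles per urn
x = {0: 1, 1: 3, 2: 2}   # marbles drawn from each urn so far
# Arbitrary weights 1..8 (summing to 36) over the eight possible content sets.
p = {R: Fraction(w, 36) for w, R in enumerate(subsets(urns), start=1)}

for U in subsets(urns):
    assert lemma1_rhs(U, urns, p, x, N) == direct_probability(U, urns, p, x, N)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than a floating-point tolerance.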
Proof of Theorem 2.
Suppose we are given a multi-urn search problem on a set of urns $\mathcal{V}$, with each urn $i \in \mathcal{V}$ containing $N_i$ marbles and initial target probabilities $\varphi_U$ for all $U \subseteq \mathcal{V}$. Suppose also that valid policy $u = (u(0), \ldots, u(N-1))$ is optimal, and that $u$ is not a block policy. This implies that we can find a stage $\tau$ where
\[
u(\tau) = i, \qquad u(\tau+1), u(\tau+2), \ldots, u(\tau+\delta-1) \ne i, \qquad u(\tau+\delta) = i,
\]
for some urn $i \in \mathcal{V}$, where $\delta > 1$.

Figure 3: Comparison of policies $u$, $\hat{u}$, and $\tilde{u}$.

We now consider two alternative policies that move the queries of urn $i$ into "blocks". Policy $\hat{u}$ executes the two queries of $i$ in stages $\tau$ and $\tau+1$, then executes the rest of the queries in the subsequence. Policy $\tilde{u}$ executes the two queries of $i$ after executing the other queries in the subsequence. A visual comparison of these policies is provided in Figure 3. Formally,
\[
\hat{u}(t) = \begin{cases} u(t), & t \le \tau \text{ or } t > \tau+\delta \\ u(t-1), & t = \tau+1, \tau+2, \ldots, \tau+\delta, \end{cases}
\qquad
\tilde{u}(t) = \begin{cases} u(t), & t < \tau \text{ or } t \ge \tau+\delta \\ u(t+1), & t = \tau, \tau+1, \ldots, \tau+\delta-1. \end{cases}
\]
Let $C$ be the number of non-target queries (i.e., the cost) when using policy $u$, $\hat{C}$ be the same quantity when using policy $\hat{u}$, and $\tilde{C}$ be the same quantity when using policy $\tilde{u}$. Conditioned on not having found a red marble, it follows that the state trajectories are
\[
\hat{x}_j(t) = \begin{cases} x_j(t), & t \le \tau+1 \text{ or } t > \tau+\delta \\ x_j(t-1), & t = \tau+2, \ldots, \tau+\delta,\ j \ne i \\ x_j(t-1)+1, & t = \tau+2, \ldots, \tau+\delta,\ j = i, \end{cases}
\qquad
\tilde{x}_j(t) = \begin{cases} x_j(t), & t \le \tau \text{ or } t \ge \tau+\delta \\ x_j(t+1), & t = \tau+1, \ldots, \tau+\delta-1,\ j \ne i \\ x_j(t+1)-1, & t = \tau+1, \ldots, \tau+\delta-1,\ j = i, \end{cases}
\]
where $\hat{x}_j(t)$ and $\tilde{x}_j(t)$ are the numbers of times urn $j$ has been queried before stage $t$ under policies $\hat{u}$ and $\tilde{u}$, respectively.

Using equation (5), our assumptions imply the following sequence of inequalities:
\[
E[\hat{C}] - E[C] \ge 0,
\]
\[
\sum_{t=0}^{N-1} \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\hat{x}_j(t+1)}{N_j} - \sum_{t=0}^{N-1} \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t+1)}{N_j} \ge 0,
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\hat{x}_j(t)}{N_j} - \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left( \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\hat{x}_j(t)}{N_j} - \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} - \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left( \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \left(\frac{x_i(t)+1}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} - \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \left(\frac{x_i(t)}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} \right) \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+1)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+\delta)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left(\frac{x_i(\tau+\delta)}{N_i}\right) \left( (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j} \right) \\
&\quad + \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S}\ \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left(\frac{x_i(\tau+\delta)}{N_i}\right) \left( (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad - \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S}\ \sum_{t=\tau+2}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j}
\ \le\ \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j}.
\end{aligned}
\]
Optimality of $u$ likewise implies
\[
E[\tilde{C}] - E[C] \ge 0,
\]
\[
\sum_{t=0}^{N-1} \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\tilde{x}_j(t+1)}{N_j} - \sum_{t=0}^{N-1} \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t+1)}{N_j} \ge 0,
\]
\[
\sum_{S \subseteq \mathcal{V}} \left( \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\tilde{x}_j(t)}{N_j} - \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \ge 0,
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\tilde{x}_j(t)}{N_j} - \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left( \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{\tilde{x}_j(t)}{N_j} - \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} - \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left( \sum_{t=\tau+2}^{\tau+\delta} (-1)^{|S|} \varphi_S \left(\frac{x_i(t)-1}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} - \sum_{t=\tau+1}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \left(\frac{x_i(t)}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} \right) \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left( (-1)^{|S|} \varphi_S \left(\frac{x_i(\tau+\delta)-1}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \left(\frac{x_i(\tau+1)}{N_i}\right) \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad - \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S}\ \sum_{t=\tau+2}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} \ge 0,
\end{aligned}
\]
\[
\begin{aligned}
&\sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left(\frac{x_i(\tau+\delta)}{N_i}\right) \left( (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad - \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S}\ \sum_{t=\tau+2}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j}
\ \ge\ \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j}.
\end{aligned}
\]
In both derivations we have used the fact that $x_i(t) = x_i(\tau+\delta)$ for $t = \tau+1, \ldots, \tau+\delta$, because urn $i$ is not queried in stages $\tau+1, \ldots, \tau+\delta-1$. Now let
\[
\begin{aligned}
\alpha &= \sum_{S \subseteq \mathcal{V}\setminus\{i\}} \left( (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad + \sum_{S \subseteq \mathcal{V}:\, i \in S} \left(\frac{x_i(\tau+\delta)}{N_i}\right) \left( (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j} - (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j} \right) \\
&\quad - \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S}\ \sum_{t=\tau+2}^{\tau+\delta-1} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j},
\end{aligned}
\]
which appears in both of the expected cost inequalities we have derived. We have established that
\[
\alpha \ge \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+\delta)}{N_j},
\qquad
\alpha \le \frac{1}{N_i} \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(\tau+1)}{N_j}.
\]
We now show that for any $i \in \mathcal{V}$, for any $u(t) \in \mathcal{V}\setminus\{i\}$,
\[
h(t) = \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j} \tag{13}
\]
is a nondecreasing function that is strictly increasing when $\varphi_{\{i,u(t)\}} > 0$. We have
\[
\begin{aligned}
h(t+1) &= \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t+1)}{N_j} \\
&= \sum_{S \subseteq \mathcal{V}\setminus\{u(t)\}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j}
 + \left(\frac{x_{u(t)}(t)+1}{N_{u(t)}}\right) \sum_{S \subseteq \mathcal{V}:\, \{i,u(t)\} \subseteq S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i,u(t)\}} \frac{x_j(t)}{N_j} \\
&= h(t) + \left(\frac{1}{N_{u(t)}}\right) \sum_{S \subseteq \mathcal{V}:\, \{i,u(t)\} \subseteq S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i,u(t)\}} \frac{x_j(t)}{N_j}.
\end{aligned}
\]
From Lemma 1,
\[
\left(\frac{x_i(t)\, x_{u(t)}(t)}{N_i N_{u(t)}}\right) \sum_{S \subseteq \mathcal{V}:\, \{i,u(t)\} \subseteq S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i,u(t)\}} \frac{x_j(t)}{N_j} \ge 0
\ \Rightarrow\
\left(\frac{1}{N_{u(t)}}\right) \sum_{S \subseteq \mathcal{V}:\, \{i,u(t)\} \subseteq S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i,u(t)\}} \frac{x_j(t)}{N_j} \ge 0.
\]
This final inequality implies $h(t+1) \ge h(t)$. Because urn $i$ is not queried in stages $\tau+1, \ldots, \tau+\delta-1$, $h(t)$ is nondecreasing for $t = \tau+1, \ldots, \tau+\delta$. Therefore we can write
\[
\alpha \ge \frac{1}{N_i} h(\tau+\delta) \ge \frac{1}{N_i} h(\tau+1) \ge \alpha. \tag{14}
\]
This expression can only be satisfied with equality, which means that for any optimal policy that is not a block policy, we can maintain optimality while successively permuting the policy so that the urn queries are arranged into blocks. □

Proof of Theorem 3.
This result comes from substituting the appropriate products into the result from Theorem 1. For any urn $i \in \mathcal{V}$,
\[
\begin{aligned}
P(A_i \mid x(t)) &= \frac{\left(1 - \frac{x_i(t)}{N_i}\right) \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|-1} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j}} \\
&= \frac{\varphi_i \left(1 - \frac{x_i(t)}{N_i}\right) \sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}} \\
&= \frac{\varphi_i \left(1 - \frac{x_i(t)}{N_i}\right) \sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j} + \sum_{S \subseteq \mathcal{V}:\, i \in S} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}} \\
&= \frac{\varphi_i \left(1 - \frac{x_i(t)}{N_i}\right) \sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j} - \varphi_i \frac{x_i(t)}{N_i} \sum_{S \subseteq \mathcal{V}\setminus\{i\}} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}} \\
&= \frac{\varphi_i \left(1 - \frac{x_i(t)}{N_i}\right)}{1 - \varphi_i \left(\frac{x_i(t)}{N_i}\right)}.
\end{aligned}
\]
In a similar manner we use Theorem 1 to find the urn probability for subset $U \subseteq \mathcal{V}$ at stage $t$,
\[
\begin{aligned}
P\left(\bigcap_{i \in U} A_i\right) &= \frac{\left[\prod_{i \in U} \left(1 - \frac{x_i(t)}{N_i}\right)\right] \sum_{S \subseteq \mathcal{V}:\, S \supseteq U} (-1)^{|S|-|U|} \varphi_S \prod_{j \in S\setminus U} \frac{x_j(t)}{N_j}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{j \in S} \frac{x_j(t)}{N_j}} \\
&= \frac{\left[\prod_{i \in U} \varphi_i \left(1 - \frac{x_i(t)}{N_i}\right)\right] \sum_{S \subseteq \mathcal{V}\setminus U} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}}{\sum_{T \subseteq U} \sum_{S \subseteq \mathcal{V}\setminus U} (-1)^{|S|+|T|} \prod_{j \in S \cup T} \varphi_j \frac{x_j(t)}{N_j}} \\
&= \frac{\left[\prod_{i \in U} \varphi_i \left(1 - \frac{x_i(t)}{N_i}\right)\right] \sum_{S \subseteq \mathcal{V}\setminus U} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}}{\left[\sum_{T \subseteq U} (-1)^{|T|} \prod_{k \in T} \varphi_k \frac{x_k(t)}{N_k}\right] \left[\sum_{S \subseteq \mathcal{V}\setminus U} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}\right]} \\
&= \frac{\prod_{i \in U} \varphi_i \left(1 - \frac{x_i(t)}{N_i}\right)}{\sum_{S \subseteq U} (-1)^{|S|} \prod_{j \in S} \varphi_j \frac{x_j(t)}{N_j}} \\
&= \prod_{i \in U} P(A_i).
\end{aligned}
\]
In the final equality, we have used the property that for any $\beta_1, \ldots, \beta_M$,
\[
\prod_{i=1}^{M} (1 - \beta_i) = \sum_{S \subseteq [M]} \prod_{j \in S} (-\beta_j).
\]
We can verify this property by induction. Define $\prod_{j \in \emptyset} (-\beta_j) = 1$. Now observe
\[
\begin{aligned}
\prod_{i=1}^{M+1} (1 - \beta_i) &= (1 - \beta_{M+1}) \prod_{i=1}^{M} (1 - \beta_i) \\
&= (1 - \beta_{M+1}) \sum_{S \subseteq [M]} \prod_{j \in S} (-\beta_j) \\
&= \sum_{S \subseteq [M]} \prod_{j \in S} (-\beta_j) - \beta_{M+1} \sum_{S \subseteq [M]} \prod_{j \in S} (-\beta_j) \\
&= \sum_{S \subseteq [M+1]} \prod_{j \in S} (-\beta_j).
\end{aligned}
\]
Setting $\beta_i = \varphi_i \left(\frac{x_i(t)}{N_i}\right)$ achieves the desired result. □

Proof of Theorem 4.
First we prove, by contrapositive, that optimality implies the condition in equation (7). Let $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$ be a block policy that does not satisfy this condition, and let $i$ be an index for which
\[
N_{v_i} \left(\frac{2 - \varphi_{v_i}}{2\varphi_{v_i}}\right) > N_{v_{i+1}} \left(\frac{2 - \varphi_{v_{i+1}}}{2\varphi_{v_{i+1}}}\right).
\]
Also, we define the first stage in which urn $v_i$ is queried in this policy as $\tau = \sum_{j=1}^{i-1} N_{v_j}$.

We now construct an alternative block policy $\tilde{u}_B$, so that
\[
\tilde{v}_j = \begin{cases} v_j, & j \notin \{i, i+1\} \\ v_{i+1}, & j = i \\ v_i, & j = i+1. \end{cases}
\]
Let $E[C] = \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0)$ be the expected cost of policy $u_B$ and $E[\tilde{C}] = \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0)$ be the expected cost of policy $\tilde{u}_B$. For brevity, we also define $\gamma = \prod_{t=0}^{\tau-1} P(w(x(t), u(t)) = 0) = \prod_{j=1}^{i-1} (1 - \varphi_{v_j}) > 0$, according to Corollary 4. Now consider the difference in expected cost,
\[
\begin{aligned}
E[C] - E[\tilde{C}] &= \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0) - \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0) \\
&= \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0) - \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0) \\
&\overset{(a)}{=} \gamma \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=\tau}^{k} P(w(x(t), u(t)) = 0) - \gamma \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=\tau}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0) \\
&\overset{(b)}{=} \gamma \left( \left(N_{v_i} - \frac{(N_{v_i}+1)\varphi_{v_i}}{2}\right) + (1 - \varphi_{v_i}) \left(N_{v_{i+1}} - \frac{(N_{v_{i+1}}+1)\varphi_{v_{i+1}}}{2}\right) \right) \\
&\quad - \gamma \left( \left(N_{v_{i+1}} - \frac{(N_{v_{i+1}}+1)\varphi_{v_{i+1}}}{2}\right) + (1 - \varphi_{v_{i+1}}) \left(N_{v_i} - \frac{(N_{v_i}+1)\varphi_{v_i}}{2}\right) \right) \\
&= \gamma \left( \varphi_{v_{i+1}} \left(N_{v_i} - \frac{(N_{v_i}+1)\varphi_{v_i}}{2}\right) - \varphi_{v_i} \left(N_{v_{i+1}} - \frac{(N_{v_{i+1}}+1)\varphi_{v_{i+1}}}{2}\right) \right) \\
&= \gamma \varphi_{v_i} \varphi_{v_{i+1}} \left( \left(\frac{2N_{v_i} - N_{v_i}\varphi_{v_i} - \varphi_{v_i}}{2\varphi_{v_i}}\right) - \left(\frac{2N_{v_{i+1}} - N_{v_{i+1}}\varphi_{v_{i+1}} - \varphi_{v_{i+1}}}{2\varphi_{v_{i+1}}}\right) \right) \\
&= \gamma \varphi_{v_i} \varphi_{v_{i+1}} \left( N_{v_i} \left(\frac{2 - \varphi_{v_i}}{2\varphi_{v_i}}\right) - N_{v_{i+1}} \left(\frac{2 - \varphi_{v_{i+1}}}{2\varphi_{v_{i+1}}}\right) \right) > 0.
\end{aligned} \tag{15}
\]
Steps (a) and (b) follow immediately from Corollary 4. The difference in expected cost, $E[C] - E[\tilde{C}]$, is strictly positive, so block policy $u_B$ cannot be optimal. Therefore, the condition given in Theorem 4 is necessary for optimality.

Now we show that the same condition is sufficient for optimality, i.e., if a block policy satisfies equation (7) in Theorem 4, then it must be an optimal policy. In order to form a contradiction, suppose now that $u_B$ is a block policy that satisfies the condition but is not optimal. Let $u_B^\star$ be the optimal block policy, which Theorem 2 guarantees to exist.

We know from the above argument that $u_B^\star$ also must satisfy the condition, which implies that policies $u_B$ and $u_B^\star$ can only differ by permuting subsequences $v_i, v_{i+1}, \ldots, v_{i+\delta}$ for which
\[
N_{v_i} \left(\frac{2 - \varphi_{v_i}}{2\varphi_{v_i}}\right) = N_{v_{i+1}} \left(\frac{2 - \varphi_{v_{i+1}}}{2\varphi_{v_{i+1}}}\right) = \cdots = N_{v_{i+\delta}} \left(\frac{2 - \varphi_{v_{i+\delta}}}{2\varphi_{v_{i+\delta}}}\right).
\]
The optimal block policy $u_B^\star$ therefore can be constructed by executing a finite number of sequential pairwise exchanges in the urn ordering in block policy $u_B$, each satisfying the condition with equality. However, it follows from the chain of equalities in equation (15) that any such exchange leaves the expected policy cost unchanged. We can conclude that the expected costs of the two policies are equal, establishing the contradiction and showing $u_B$ to be an optimal policy. □

Proof of Theorem 6.
The proof is similar to that of Theorem 4. First we prove, by contrapositive, that optimality implies the condition in equation (8). Let $u_B = (v_1, v_2, \ldots, v_{|\mathcal{V}|})$ be a block policy that does not satisfy equation (8), and let $i$ be an index for which
\[
\frac{\varphi_{v_i}}{N_{v_i}} < \frac{\varphi_{v_{i+1}}}{N_{v_{i+1}}}.
\]
Also, we define the first stage in which urn $v_i$ is queried in this policy as $\tau = \sum_{j=1}^{i-1} N_{v_j}$.

We now construct an alternative block policy $\tilde{u}_B$, so that
\[
\tilde{v}_j = \begin{cases} v_j, & j \notin \{i, i+1\} \\ v_{i+1}, & j = i \\ v_i, & j = i+1. \end{cases}
\]
Let $E[C] = \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0)$ be the expected cost of policy $u_B$ and $E[\tilde{C}] = \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0)$ be the expected cost of policy $\tilde{u}_B$. Now consider the difference in expected cost,
\[
\begin{aligned}
E[C] - E[\tilde{C}] &= \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0) - \sum_{k=0}^{N-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0) \\
&= \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=0}^{k} P(w(x(t), u(t)) = 0) - \sum_{k=\tau}^{\tau+N_{v_i}+N_{v_{i+1}}-1} \prod_{t=0}^{k} P(w(\tilde{x}(t), \tilde{u}(t)) = 0) \\
&\overset{(c)}{=} \left( N_{v_i} - \frac{(N_{v_i}+1)\varphi_{v_i}}{2} - N_{v_i} \sum_{j=1}^{i-1} \varphi_{v_j} + N_{v_{i+1}} - \frac{(N_{v_{i+1}}+1)\varphi_{v_{i+1}}}{2} - N_{v_{i+1}} \sum_{j=1}^{i} \varphi_{v_j} \right) \\
&\quad - \left( N_{v_{i+1}} - \frac{(N_{v_{i+1}}+1)\varphi_{v_{i+1}}}{2} - N_{v_{i+1}} \sum_{j=1}^{i-1} \varphi_{v_j} + N_{v_i} - \frac{(N_{v_i}+1)\varphi_{v_i}}{2} - N_{v_i} \sum_{j=1}^{i-1} \varphi_{v_j} - N_{v_i} \varphi_{v_{i+1}} \right) \\
&= N_{v_i} \varphi_{v_{i+1}} - N_{v_{i+1}} \varphi_{v_i} \\
&= N_{v_i} N_{v_{i+1}} \left( \frac{\varphi_{v_{i+1}}}{N_{v_{i+1}}} - \frac{\varphi_{v_i}}{N_{v_i}} \right) > 0.
\end{aligned}
\]
Step (c) follows from substituting the results in Corollary 6. The difference in expected cost, $E[C] - E[\tilde{C}]$, is strictly positive, so block policy $u_B$ cannot be optimal. Therefore, the condition given in equation (8) is necessary for optimality.

Now we show that the same condition is sufficient for optimality, i.e., if a block policy satisfies equation (8), then it must be an optimal policy. In order to form a contradiction, suppose now that $u_B$ is a block policy that satisfies the condition but is not optimal. Let $u_B^\star$ be the optimal block policy, which Theorem 2 guarantees to exist.

We know from the above argument that $u_B^\star$ also must satisfy the condition, which implies that policies $u_B$ and $u_B^\star$ can only differ by permuting subsequences $v_i, v_{i+1}, \ldots, v_{i+\delta}$ for which
\[
\frac{\varphi_{v_i}}{N_{v_i}} = \frac{\varphi_{v_{i+1}}}{N_{v_{i+1}}} = \cdots = \frac{\varphi_{v_{i+\delta}}}{N_{v_{i+\delta}}}.
\]
The optimal block policy $u_B^\star$ therefore can be constructed by executing a finite number of sequential pairwise exchanges in the urn ordering in block policy $u_B$, in which the two urns in each exchange satisfy equation (8) with equality. However, it follows from our previous argument that any such exchange does not affect expected policy cost. We can conclude that the expected costs of the two policies are equal, establishing the contradiction and showing $u_B$ to be an optimal policy. □

Proof of Theorem 7.
The first inequality in Theorem 7 follows from Equation 9 in Appendix 4.1. Given $u(t) \in U$,
\[
P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t+1)\right) = P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t)\right) \frac{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right)}{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) P(A_{u(t)} \mid x(t))} \le P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t)\right).
\]
Note that if $P\left(\bigcap_{i \in U} A_i \mid x(t)\right) = 0$, then $P\left(\bigcap_{i \in U} A_i \mid x(t+1)\right) = 0$. Equality is likewise preserved when $P(A_{u(t)} \mid x(t)) = 1$. (We intentionally omit the case when no red marble is found after drawing all of the marbles in an urn $i$ for which $\varphi_i = 1$.) Assuming $0 < P\left(\bigcap_{i \in U} A_i \mid x(t)\right)$ and $P(A_{u(t)} \mid x(t)) < 1$,
\[
\begin{aligned}
0 < \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) P(A_{u(t)} \mid x(t)) &< \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) \le 1 \\
1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) P(A_{u(t)} \mid x(t)) &> 1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) \\
1 &> \frac{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right)}{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) P(A_{u(t)} \mid x(t))} \\
P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t)\right) &> P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t)\right) \frac{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right)}{1 - \left(\frac{1}{N_{u(t)} - x_{u(t)}(t)}\right) P(A_{u(t)} \mid x(t))} = P\left(\bigcap_{i \in U} A_i \,\Big|\, x(t+1)\right).
\end{aligned}
\]
To prove the second inequality in Theorem 7, first note that for any real numbers $\alpha, \beta$ such that $\alpha > 0$, $\alpha + \beta > 0$, and $|\beta| > 0$,
\[
(\alpha+\beta)^2 = \alpha^2 + 2\alpha\beta + \beta^2 > \alpha^2 + 2\alpha\beta = \alpha(\alpha + 2\beta),
\qquad\text{so}\qquad
\left(\frac{\alpha+\beta}{\alpha}\right) > \left(\frac{\alpha+2\beta}{\alpha+\beta}\right).
\]
Now let
\[
\alpha = \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t)}{N_i},
\qquad
\beta = \left(\frac{1}{N_{u(t)}}\right) \sum_{S \subseteq \mathcal{V}:\, u(t) \in S} (-1)^{|S|} \varphi_S \prod_{i \in S\setminus\{u(t)\}} \frac{x_i(t)}{N_i}.
\]
Observe that
\[
\alpha + \beta = \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t+1)}{N_i},
\qquad
\alpha + 2\beta = \sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t+2)}{N_i},
\]
assuming urn $u(t)$ is queried again in stage $t+1$. From Lemma 1 and Corollary 2, we can conclude that $\alpha > 0$ and $\alpha + \beta > 0$ whenever the search can continue past stage $t$ without finding a red marble. Therefore,
\[
\begin{aligned}
\left(\frac{\alpha+\beta}{\alpha}\right) &\ge \left(\frac{\alpha+2\beta}{\alpha+\beta}\right) \\
\frac{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t+1)}{N_i}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t)}{N_i}} &\ge \frac{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t+2)}{N_i}}{\sum_{S \subseteq \mathcal{V}} (-1)^{|S|} \varphi_S \prod_{i \in S} \frac{x_i(t+1)}{N_i}} \\
P(w(x(t), u(t)) = 0) &\ge P(w(x(t+1), u(t)) = 0) \\
P(w(x(t), u(t)) = 1) &\le P(w(x(t+1), u(t)) = 1).
\end{aligned}
\]
It follows from Lemma 1 that $\beta = 0$ only when there is no probability of drawing a red marble from urn $u(t)$ in stage $t$. Therefore, the inequality is strict whenever $P(w(x(t), u(t)) = 1) > 0$. □

Proof of Theorem 8.
We assume that all probabilities are positive. Note that the result in Theorem 8 can be restated
\[
\frac{P(w(x(t), i) = 1)}{P(w(x(t), u(t)) = 1)} \ge \frac{P(w(x(t+1), i) = 1)}{P(w(x(t+1), u(t)) = 1)}. \tag{16}
\]
Now let
\[
\begin{aligned}
\gamma_i &= \sum_{S \subseteq \mathcal{V}\setminus\{u(t)\}:\, i \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i\}} \frac{x_j(t)}{N_j}, \\
\gamma_{u(t)} &= \sum_{S \subseteq \mathcal{V}\setminus\{i\}:\, u(t) \in S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{u(t)\}} \frac{x_j(t)}{N_j}, \\
\gamma_{i,u(t)} &= \sum_{S \subseteq \mathcal{V}:\, \{i,u(t)\} \subseteq S} (-1)^{|S|} \varphi_S \prod_{j \in S\setminus\{i,u(t)\}} \frac{x_j(t)}{N_j}.
\end{aligned}
\]
Substituting the probability distribution from Corollary 1 into equation (16), we have
\[
\begin{aligned}
\frac{-(1/N_i)\,\gamma_i - \left(\frac{x_{u(t)}(t)}{N_i N_{u(t)}}\right) \gamma_{i,u(t)}}{-(1/N_{u(t)})\,\gamma_{u(t)} - \left(\frac{x_i(t)}{N_i N_{u(t)}}\right) \gamma_{i,u(t)}} &\ge \frac{-(1/N_i)\,\gamma_i - \left(\frac{x_{u(t)}(t)+1}{N_i N_{u(t)}}\right) \gamma_{i,u(t)}}{-(1/N_{u(t)})\,\gamma_{u(t)} - \left(\frac{x_i(t)}{N_i N_{u(t)}}\right) \gamma_{i,u(t)}} \\
-\left(\frac{x_{u(t)}(t)}{N_i N_{u(t)}}\right) \gamma_{i,u(t)} &\ge -\left(\frac{x_{u(t)}(t)+1}{N_i N_{u(t)}}\right) \gamma_{i,u(t)} \\
0 &\le \gamma_{i,u(t)},
\end{aligned}
\]
which is true from Lemma 1. □
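The block-ordering conditions established in the proofs of Theorems 4 and 6 can be checked by brute force on a small instance. The sketch below is illustrative only and rests on stated assumptions: it takes the expected cost to be the sum over stages of the probability that no red marble has been found through that stage; it assumes the independent-urn index has the form $N_i(2-\varphi_i)/(2\varphi_i)$ (sorted in increasing order), which is what the pairwise-exchange calculation yields for this cost; and it uses $\varphi_i/N_i$ (sorted in decreasing order) for the single-red-marble case. The urn sizes, probabilities, and function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_cost(order, N, miss_prob):
    """Expected number of non-target draws for a block policy.

    `order` lists urns in the order their blocks are executed; `miss_prob(x)`
    returns P(no red marble found) after x[i] draws from each urn i.
    """
    x = {i: 0 for i in N}
    cost = Fraction(0)
    for urn in order:
        for _ in range(N[urn]):
            x[urn] += 1
            cost += miss_prob(x)
    return cost


def best_block_order(N, miss_prob):
    """Exhaustively search all block orderings for the cheapest one."""
    return min(permutations(N), key=lambda order: expected_cost(order, N, miss_prob))


N = {0: 2, 1: 5, 2: 3}  # hypothetical urn sizes

# Case 1: independent urns, each containing a red marble with probability phi_ind[i].
phi_ind = {0: Fraction(1, 2), 1: Fraction(9, 10), 2: Fraction(1, 4)}


def miss_independent(x):
    prob = Fraction(1)
    for i in N:
        prob *= 1 - phi_ind[i] * Fraction(x[i], N[i])
    return prob


# Case 2: at most one red marble overall; phi_one[i] is the chance it is in urn i.
phi_one = {0: Fraction(3, 10), 1: Fraction(1, 2), 2: Fraction(1, 10)}


def miss_single(x):
    return 1 - sum(phi_one[i] * Fraction(x[i], N[i]) for i in N)


# Orderings predicted by the (assumed) index rules.
order_ind = tuple(sorted(N, key=lambda i: N[i] * (2 - phi_ind[i]) / (2 * phi_ind[i])))
order_one = tuple(sorted(N, key=lambda i: phi_one[i] / N[i], reverse=True))

assert best_block_order(N, miss_independent) == order_ind
assert best_block_order(N, miss_single) == order_one
```

Exhaustive search is only practical for a handful of urns; the point of the index rules is that sorting replaces the factorial-size search.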
5. Conclusion
We have presented a multi-urn search problem as a model for searching for a specific vertex in a network. Using this model, we have shown that there is always an optimal block policy in searches that meet the multi-urn search problem assumptions, irrespective of correlations in the probability model. We have also provided necessary and sufficient conditions for block policy optimality in two specific cases: independent urns and the single red marble scenario. Finally, we gave a few properties of the dynamics of the multi-urn search problem and commented on the challenges of finding more general optimality conditions.
Future Work
There are additional generalizations and extensions that we have not considered here, but which might also have interesting applications in modeling search. One such generalization is removing the constraint that an urn can have at most one red marble. This generalization might be an appropriate model for a network vertex search problem in which the network structure allows for multiple edges between a pair of nodes. Allowing for multiple red marbles in a single urn substantially changes the dynamics of the multi-urn search problem.

Another area of further inquiry could involve examination of the performance of different policies under various urn probability models. We have shown, for example, that a purely greedy policy is not optimal in the case of independent urns. However, the counter-example suggests that there could be some lower bounds on greedy policy performance, which might depend on the total number of urns and total number of marbles. At the very least, there appear to be limits on how suboptimal we can make a greedy policy when constructing a two-urn data set. Development in this direction could build on the results presented in [5].

We have assumed throughout our analysis that the target of the search would be easily identifiable to the searcher. In our urn model this assumption translated to clear color distinction, so we assume we know immediately whether a drawn marble is red or blue. However, we could relax this assumption in several ways. We could, for example, characterize each marble with a feature set and develop a probability model that gives us the probability that a marble is a red marble, given its set of features. Depending on the context of the search, this type of incomplete information model could evolve into a stopping problem in which the objective is to determine the best time to stop drawing new marbles in search of a red one. Multi-arm bandit models and sequential mutual information maximization [5] might offer useful approaches in this scenario.

A slight variation from this imperfect information approach would be to represent some marbles as being more "red" than others, according to some probability model. In this model, the searcher receives a reward at the end of the search that is a function of the most "red" marble found, but still has to pay a fixed cost for each draw. Like the imperfect information model, this formulation would ultimately be a stopping problem, balancing the current reward attained against the likelihood of attaining a higher reward by drawing more marbles.

We have also assumed uniform urn costs. It is plausible, however, that in some cases it might cost more to draw marbles from some urns than from others.
Or, there could be a one-time access cost in order to gain the ability to draw marbles from the urn, which could represent law enforcement having to get a warrant to obtain internet or phone records for an individual.

Our assumption that marbles are drawn in a random sequence from each urn might not be valid in some settings. The Google search engine, for example, returns the most relevant results first, so that if a user does not find what he is looking for in the first few pages of results it might make sense to try a different query rather than look through the remainder of the pages. If we changed our model so that red marbles were more likely to be drawn first in each urn, then the problem would involve deciding when to stop querying an urn and switch to one that might be more promising.

References

[1] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
[2] Pavel Berkhin. A survey on pagerank computing. Internet Mathematics, 2(1):73–120, 2005. doi: 10.1080/15427951.2005.10129098. URL http://dx.doi.org/10.1080/15427951.2005.10129098.
[3] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 2nd edition, 2000. ISBN 1886529094.
[4] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. CoRR, abs/1204.5721, 2012. URL http://arxiv.org/abs/1204.5721.
[5] Yuxin Chen, S Hamed Hassani, Amin Karbasi, and Andreas Krause. Sequential information maximization: When is greedy near-optimal? In Proc. International Conference on Learning Theory (COLT), 2015.
[6] Fan Chung, Shirin Handjani, and Doug Jungreis. Generalizations of Polya's urn problem. Annals of Combinatorics, 7(2):141–153, 2003. ISSN 0219-3094. doi: 10.1007/s00026-003-0178-y. URL http://dx.doi.org/10.1007/s00026-003-0178-y.
[7] Doug Downey, Oren Etzioni, and Stephen Soderland. A probabilistic model of redundancy in information extraction. Technical report, DTIC Document, 2006.
[8] Esther Frostig and Gideon Weiss. Four proofs of Gittins' multiarmed bandit theorem. Annals of Operations Research, 241(1):127–165, 2016. ISSN 1572-9338. doi: 10.1007/s10479-013-1523-0. URL http://dx.doi.org/10.1007/s10479-013-1523-0.
[9] JC Gittins and DM Jones. A dynamic allocation index for new-product chemical research. Report CUED/A-Mat Stud/TR13, Department of Engineering, Cambridge University, 1974.
[10] John C Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society. Series B (Methodological), pages 148–177, 1979.
[11] Jessica Guynn and Elizabeth Weis. Twitter suspends 125,000 ISIL-related accounts. USA Today, February 6, 2016. Accessed April 12, 2016.
[12] Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
[13] Retsef Levi, Thomas Magnanti, and Yaron Shaposhnik. Scheduling with testing. Technical report, Working paper, 2016.
[14] H. Mahmoud. Polya Urn Models. Chapman & Hall/CRC Texts in Statistical Science. CRC Press, 2008. ISBN 9781420059847. URL https://books.google.com/books?id=7Bizo28c2LQC.
[15] L. J. Wei. The generalized Polya's urn design for sequential medical trials. The Annals of Statistics, 7(2):291–296, 1979. ISSN 00905364.