Max-Throughput for (Conservative) k-of-n Testing

Abstract

We define a variant of k-of-n testing that we call conservative k-of-n testing. We present a polynomial-time, combinatorial algorithm for the problem of maximizing throughput of conservative k-of-n testing, in a parallel setting. This extends previous work of Kodialam and Condon et al., who presented combinatorial algorithms for parallel pipelined filter ordering, which is the special case where k=1 (or k = n). We also consider the problem of maximizing throughput for standard k-of-n testing, and show how to obtain a polynomial-time algorithm based on the ellipsoid method using previous techniques.


arXiv [cs.DS]

Lisa Hellerstein ∗, Özgür Özkan †, Linda Sellie ‡

June 4, 2018

1 Introduction

In standard $k$-of-$n$ testing, there are $n$ binary tests that can be applied to an "item" $x$. We use $x_i$ to denote the value of the $i$th test on $x$, and treat $x$ as an element of $\{0,1\}^n$. With probability $p_i$, $x_i = 1$, and with probability $1 - p_i$, $x_i = 0$. The tests are independent, and we are given $p_1, \ldots, p_n$. We need to determine whether at least $k$ of the $n$ tests on $x$ have a value of 0, by applying the tests sequentially to $x$. Once we have enough information to determine whether this is the case, that is, once we have observed $k$ tests with value 0, or $n - k + 1$ tests with value 1,¹ we do not need to perform further tests.

We define conservative $k$-of-$n$ testing the same way, except that we continue performing tests until we have either observed $k$ tests with value 0, or have performed all $n$ tests. In particular, we do not stop testing when we have observed $n - k + 1$ tests with value 1.

There are many applications where $k$-of-$n$ testing problems arise, including quality testing, medical diagnosis, and database query optimization. In quality testing, an item $x$ manufactured by a factory is tested for defects. If it has at least $k$ defects, it is discarded. In medical diagnosis, the item $x$ is a patient; patients are diagnosed with a particular disease if they fail at least $k$ out of $n$ special medical tests. A database query may ask for all tuples $x$ satisfying at least $k$ of $n$ given predicates (typically $k = 1$ or $k = n$).

For $k = 1$, standard and conservative $k$-of-$n$ testing are the same. For $k > 1$, the conservative variant is relevant in a setting where, for items failing fewer than $k$ tests, we need to know which tests they failed. For example, in quality testing, we may want to know which tests were failed by items failing fewer than $k$ tests (i.e. those not discarded) in order to repair the associated defects.

Our focus is on the MaxThroughput problem for $k$-of-$n$ testing. Here the objective is to maximize the throughput of a system for $k$-of-$n$ testing in a parallel setting where each test is performed by a separate "processor". In this problem, in addition to the probabilities $p_i$, there is a rate limit $r_i$ associated with the processor that performs test $i$, indicating that the processor can only perform tests on $r_i$ items per unit time.

∗ Polytechnic Institute of NYU. This research is supported by the NSF Grant CCF-0917153.
† Polytechnic Institute of NYU. This research is supported by US Department of Education Grant P200A090157.
‡ Polytechnic Institute of NYU. This research is supported by a CIFellows Project postdoc, sponsored by NSF and the CRA.
¹ In an alternative definition of $k$-of-$n$ testing, the task is to determine whether at least $k$ of the $n$ tests have a value of 1. Symmetric results hold for this definition.

MaxThroughput problems are closely related to MinCost problems [6, 9]. In the MinCost problem for $k$-of-$n$ testing, in addition to the probabilities $p_i$, there is a cost $c_i$ associated with performing the $i$th test. The goal is to find a testing strategy (i.e. decision tree) that minimizes the expected cost of testing an individual item. There are polynomial-time algorithms for solving the MinCost problem for standard $k$-of-$n$ testing [1, 3, 10, 11].

Kodialam was the first to study the MaxThroughput $k$-of-$n$ testing problem, for the special case where $k = 1$ [8]. He gave an $O(n^3 \log n)$ algorithm for the problem. The algorithm is combinatorial, but its correctness proof relies on polymatroid theory. Later, Condon et al. studied the problem, calling it "parallel pipelined filter ordering". They gave two $O(n^2)$ combinatorial algorithms, with direct correctness proofs [5].

Our Results.

In this paper, we extend the previous work by giving a polynomial-time combinatorial algorithm for the MaxThroughput problem for conservative $k$-of-$n$ testing. Our algorithm can be implemented to run in time $O(n^2)$, matching the running time of the algorithms of Condon et al. for 1-of-$n$ testing. More specifically, the running time is $O(n(\log n + k) + o)$, where $o$ varies depending on the output representation used; the algorithm can be modified to produce different output representations. We discuss output representations below.

The MaxThroughput problem for standard $k$-of-$n$ testing appears to be fundamentally different from its conservative variant. We leave as an open problem the task of developing a polynomial-time combinatorial algorithm for this problem. We show that previous techniques can be used to obtain a polynomial-time algorithm based on the ellipsoid method. This approach could also be used to yield an algorithm, based on the ellipsoid method, for the conservative variant.

Output Representation

For the type of representation used by Condon et al. in achieving their $O(n^2)$ bound, $o = O(n^2)$. A more explicit representation has size $o = O(n^3)$. We also describe a new, more compact output representation for which $o = O(n)$.

In giving running times, we follow Condon et al. and consider only the time taken by the algorithm to produce the output representation. We note, however, that different output representations may incur different post-processing costs when we want to use them to implement the routings. For example, the compressed representation has $o = O(n)$, but it requires spending $O(n)$ time in the worst case to extract any permutation of megaprocessors stored by the megaprocessor representation. We can reduce this complexity to $O(\log n)$ using persistent search trees [13]. In contrast, the explicit $O(n^3)$ representation gives direct access to the permutations. In practice, the choice of the best output representation can vary depending on the application and the setting.

For ease of presentation, in our pseudocode we use the megaprocessor representation, which is also used by Condon et al. [5] in their Equalizing Algorithm.

Deshpande and Hellerstein studied the MaxThroughput problem for $k = 1$, when there are precedence constraints between tests [6]. They also showed a close relationship between the exact MinCost and MaxThroughput problems for $k$-of-$n$ testing, when $k = 1$. Their results can be generalized to apply to testing of other functions.

Liu et al. [9] presented a generic, LP-based method for converting an approximation algorithm for a MinCost problem into an approximation algorithm for a MaxThroughput problem. Their results are not applicable to this paper, where we consider only exact algorithms.

Polynomial-time algorithms for the MinCost problem for standard $k$-of-$n$ testing were given by Salloum, Breuer, Ben-Dov, and Chang et al. [1, 3, 10–12].

The problem of how to best order a sequence of tests, in a sequential setting, has been studied in many different contexts, and in many different models. See for example [9] and [5] for a discussion of related work on the filter-ordering problem (i.e. the MinCost problem for $k = 1$) and its variants, and [14] for a general survey of sequential testing of functions.

2 Problem Definitions

A $k$-of-$n$ testing strategy for tests $1, \ldots, n$ is a binary decision tree $T$ that computes the $k$-of-$n$ function, $f : \{0,1\}^n \to \{0,1\}$, where $f(x) = 1$ if and only if $x$ contains fewer than $k$ 0's. Each non-leaf node of $T$ is labeled by a variable $x_i$. The left child of a node labeled with $x_i$ is associated with $x_i = 0$ (i.e., failing test $i$), and the right child with $x_i = 1$ (i.e., passing test $i$). Each $x \in \{0,1\}^n$ corresponds to a root-to-leaf path in the usual way, and the label at the leaf is $f(x)$.

A $k$-of-$n$ testing strategy $T$ is conservative if, for each root-to-leaf path leading to a leaf labeled 1, the path contains exactly $n$ non-leaf nodes, each labeled with a distinct variable $x_i$.

Given a permutation $\pi$ of the $n$ tests, we define $T^c_k(\pi)$ to be the conservative strategy described by the following procedure: Perform the tests in order of permutation $\pi$ until at least $k$ 0's have been observed, or all tests have been performed, whichever comes first. Output 0 in the first case, and 1 in the second.

Similarly, we define $T^s_k(\pi)$ to be the following standard $k$-of-$n$ testing strategy: Perform the tests in order of permutation $\pi$ until at least $k$ 0's have been observed, or until $n - k + 1$ 1's have been observed, whichever comes first. Output 0 in the first case, and 1 in the second.

Each test $i$ has an associated probability $p_i$, where $0 < p_i < 1$. Let $D_p$ denote the product distribution on $\{0,1\}^n$ defined by the $p_i$'s; that is, if $x$ is drawn from $D_p$, then $\forall i$, $\Pr[x_i = 1] = p_i$ and the $x_i$ are independent. We use $x \sim D_p$ to denote a random $x$ drawn from $D_p$. In what follows, when we use an expression of the form $\Pr[\ldots]$ involving an item $x$, we mean the probability with respect to $D_p$.

In the MinCost problem for standard $k$-of-$n$ testing, we are given $n$ probabilities $p_i$ and costs $c_i > 0$, for $i \in \{1, \ldots, n\}$, associated with the tests. The goal is to find a $k$-of-$n$ testing strategy $T$ that minimizes the expected cost of applying $T$ to a random item $x \sim D_p$. The cost of applying a testing strategy $T$ to an item $x$ is the sum of the costs of the tests along the root-to-leaf path for $x$ in $T$.

In the MinCost problem for conservative $k$-of-$n$ testing, the goal is the same, except that we are restricted to finding a conservative testing strategy.

For example, consider the MinCost 2-of-3 problem instance where $p_1 = p_2 = 1/2$, $p_3 = 1/3$, $c_1 = 1$, $c_2 = c_3 = 2$. A standard testing strategy for this problem can be described procedurally as follows: Given item $x$, begin by performing test 1. If $x_1 = 1$, follow strategy $T^s_2(\pi_1)$, where $\pi_1 = (2, 3)$. Else if $x_1 = 0$, follow strategy $T^s_1(\pi_2)$, where $\pi_2 = (3, 2)$.

Under the above strategy, which can be shown to be optimal, evaluating $x = (0, 0, 1)$ costs 5, and evaluating $x' = (1, 1, 0)$ costs 3. The expected cost of applying this strategy to a random item $x \sim D_p$ is $3\tfrac{5}{6}$.

Because the MinCost testing strategy may be a tree of size exponential in the number of tests, algorithms for the MinCost problem may output a compact representation of the output strategy.
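The cost calculations in the 2-of-3 example can be checked by brute force. The sketch below is only an illustration: the function names are hypothetical, tests are indexed from 0, and the probabilities $p_1 = p_2 = 1/2$, $p_3 = 1/3$ are an assumption of this sketch (the costs $c_1 = 1$, $c_2 = c_3 = 2$ are those of the example). It simulates permutation strategies of the form $T^s_k(\pi)$ and $T^c_k(\pi)$ and enumerates all items to obtain the expected cost.

```python
from fractions import Fraction
from itertools import product


def run_permutation(order, k, x, conservative):
    """Return the tests performed by T^c_k(order) or T^s_k(order) on item x.

    `order` lists 0-based test indices; x[i] is the outcome of test i.
    """
    zeros = ones = 0
    performed = []
    m = len(order)
    for i in order:
        if zeros >= k:
            break  # k tests with value 0 observed
        if not conservative and ones >= m - k + 1:
            break  # standard testing also stops after m - k + 1 ones
        performed.append(i)
        if x[i] == 0:
            zeros += 1
        else:
            ones += 1
    return performed


def example_strategy(x):
    """Test 1 first; then T^s_2((2,3)) if x_1 = 1, else T^s_1((3,2))."""
    if x[0] == 1:
        return [0] + run_permutation([1, 2], 2, x, conservative=False)
    return [0] + run_permutation([2, 1], 1, x, conservative=False)


def cost(strategy, x, c):
    """Sum of the costs of the tests the strategy performs on x."""
    return sum(c[i] for i in strategy(x))


def expected_cost(strategy, p, c):
    """Expected cost over the product distribution D_p, by enumeration."""
    total = Fraction(0)
    for x in product((0, 1), repeat=len(p)):
        prob = Fraction(1)
        for xi, pi in zip(x, p):
            prob *= pi if xi == 1 else 1 - pi
        total += prob * cost(strategy, x, c)
    return total


# Assumed instance (see the lead-in): probabilities and costs of the example.
p = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 3)]
c = [1, 2, 2]
```

Under these assumptions, `cost(example_strategy, (0, 0, 1), c)` and `cost(example_strategy, (1, 1, 0), c)` reproduce the two item costs quoted in the example, and `expected_cost(example_strategy, p, c)` gives the expected cost as an exact fraction. Enumeration is exponential in $n$, so this is only suitable for checking small instances.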

The Algorithm for the MinCost Problem.

In the literature, versions of the MinCost problem for 1-of-$n$ testing are studied under a variety of different names, including pipelined filter ordering, selection ordering, and satisficing search (cf. [5]).

The following is a well-known, simple algorithm for solving the MinCost problem for standard 1-of-$n$ testing (see e.g. [7]): First, sort the tests in increasing order of the ratio $c_i/(1 - p_i)$. Next, renumber the tests, so that $c_1/(1 - p_1) < c_2/(1 - p_2) < \ldots < c_n/(1 - p_n)$. Finally, output the sorted list $\pi = (1, \ldots, n)$ of tests, which is a compact representation of the strategy $T^s_1(\pi)$ (which is the same as $T^c_1(\pi)$).

The above algorithm can be applied to the MinCost problem for conservative $k$-of-$n$ testing, simply by treating $\pi$ as a compact representation of the conservative strategy $T^c_k(\pi)$. In fact, that strategy is optimal for conservative $k$-of-$n$ testing: it has minimum expected cost among all conservative strategies. This follows immediately from a lemma of Boros et al. [2].²

² The lemma of Boros et al. actually proves that the corresponding decision tree is 0-optimal. A decision tree computing a function $f$ is 0-optimal if it minimizes the expected cost of testing a random $x$, given that $f(x) = 0$. In conservative $k$-of-$n$ testing, where $f$ is the $k$-of-$n$ function, the cost of testing $x$ is the same for all $x$ such that $f(x) = 1$. Thus the problem of finding a min-cost conservative strategy for $k$-of-$n$ testing is essentially equivalent to the problem of finding a 0-optimal decision tree computing the $k$-of-$n$ function.

2.2 The MaxThroughput problem

The MaxThroughput problem for $k$-of-$n$ testing is a natural generalization of the MaxThroughput problem for 1-of-$n$ testing, first studied by Kodialam [8]. We give basic definitions and motivation here. For further information about this problem, including information relevant to its application in practical settings, see [4, 5, 8].

In the MaxThroughput problem for $k$-of-$n$ testing, as in the MinCost problem, we are given the probabilities $p_1, \ldots, p_n$ associated with the tests. Instead of costs $c_i$ for the tests, we are given rate limits $r_i > 0$.

The MaxThroughput problem arises in the following context. There is an (effectively infinite) stream of items $x$ that need to be tested. Every item $x$ must be assigned a strategy $T$ that will determine which tests are performed on it. Different items may be assigned to different strategies. Each test is performed by a separate "processor", and the processors operate in parallel. (Imagine a factory testing setting.) Item $x$ is sent from processor to processor for testing, according to its strategy $T$. Each processor can only test one item at a time. We view the problem of assigning items to strategies as a flow-routing problem.

Processor $O_i$ performs test $i$. It has rate limit (capacity) $r_i$, indicating that it can only process $r_i$ items $x$ per unit time.

The goal is to determine how many items should be assigned to each strategy $T$, per unit time, in order to maximize the number of items that can be processed per unit time, the throughput of the system. The solution must respect the rate limits of the processors, in that the expected number of items that need to be tested by processor $O_i$ per unit time must not exceed $r_i$. We assume that tests behave according to expectation: if $m$ items are tested by processor $O_i$ per unit time, then $m p_i$ of them will have the value 1, and $m(1 - p_i)$ will have the value 0.

Let $\mathcal{T}$ denote the set of all $k$-of-$n$ testing strategies and $\mathcal{T}_c$ denote the set of all conservative $k$-of-$n$ testing strategies. Formally, the MaxThroughput problem for standard $k$-of-$n$ testing is defined by the linear program below. The linear program defining the MaxThroughput problem for conservative $k$-of-$n$ testing is obtained by simply replacing the set of $k$-of-$n$ testing strategies $\mathcal{T}$ by the set of conservative $k$-of-$n$ testing strategies $\mathcal{T}_c$.

We refer to a feasible assignment to the variables $z_T$ in the LP below as a routing. We call constraints of type (1) rate constraints. The value of $F$ is the throughput of the routing.
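To make the notion of a routing concrete, the following sketch computes, for conservative permutation strategies $T^c_k(\pi)$, the probability that each test is performed on a random item, and from that the expected flow each processor receives under a routing. The function names and the data layout (dictionaries keyed by test number, a routing as a list of `(permutation, flow)` pairs) are illustrative assumptions, as is the small two-processor instance with $k = 1$, $r = (1, 2)$ and $p = (1/2, 1/4)$.

```python
from fractions import Fraction


def perform_probs(order, k, p):
    """For T^c_k(order), return {test: Pr[test is performed on a random item]}.

    A test is performed iff fewer than k earlier tests in `order` had value 0.
    dist[j] is the probability that the item is still being tested and has
    exactly j zeros so far.
    """
    dist = [Fraction(1)] + [Fraction(0)] * (k - 1)
    probs = {}
    for i in order:
        probs[i] = sum(dist)
        new = [Fraction(0)] * k
        for j, mass in enumerate(dist):
            new[j] += mass * p[i]  # test i has value 1
            if j + 1 < k:
                new[j + 1] += mass * (1 - p[i])  # value 0, still under k zeros
        dist = new
    return probs


def loads(routing, k, p):
    """Expected flow reaching each processor under a permutation routing."""
    load = {i: Fraction(0) for i in p}
    for order, flow in routing:
        for i, g in perform_probs(order, k, p).items():
            load[i] += flow * g
    return load


# Assumed two-processor instance with k = 1.
p = {1: Fraction(1, 2), 2: Fraction(1, 4)}
r = {1: Fraction(1), 2: Fraction(2)}
routing = [((1, 2), Fraction(4, 7)), ((2, 1), Fraction(12, 7))]

load = loads(routing, 1, p)
throughput = sum(flow for _, flow in routing)
saturated = [i for i in p if load[i] == r[i]]
```

A routing is feasible when `load[i] <= r[i]` for every processor; in this instance both processors are loaded exactly to their rate limits. Exact rational arithmetic is used so that saturation can be tested with equality rather than a floating-point tolerance.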
We define $g(T, i)$ as the probability that test $i$ will be performed on an item $x$ that is tested using strategy $T$, when $x \sim D_p$. For $i \in \{1, \ldots, n\}$, if $\sum_{T \in \mathcal{T}} g(T, i) z_T = r_i$, we say that the routing saturates processor $O_i$.

We will refer to the MaxThroughput problems for standard and conservative $k$-of-$n$ testing as the "SMT($k$) problem" and the "CMT($k$) problem", respectively.

As a simple example, consider the following CMT($k$) problem (equivalently, SMT($k$) problem) instance, where $k = 1$ and $n = 2$: $r_1 = 1$, $r_2 = 2$, $p_1 = 1/2$, $p_2 = 1/4$. There are only two possible strategies, $T(\pi_1)$, where $\pi_1 = (1, 2)$, and $T(\pi_2)$, where $\pi_2 = (2, 1)$. Since all flow assigned to $T(\pi_1)$ is tested by $O_1$, $g(T(\pi_1), 1) = 1$; this flow continues on to $O_2$ only if it passes test 1, which happens with probability $p_1 = 1/2$, so $g(T(\pi_1), 2) = 1/2$. Similarly, $g(T(\pi_2), 2) = 1$ while $g(T(\pi_2), 1) = 1/4$, since $p_2 = 1/4$. Consider the routing that assigns $F_1 = 4/7$ units of flow to $T(\pi_1)$, and $F_2 = 12/7$ units to $T(\pi_2)$. Then the amount of flow reaching $O_1$ is $4/7 \cdot g(T(\pi_1), 1) + 12/7 \cdot g(T(\pi_2), 1) = 1$, and the amount of flow reaching $O_2$ is $4/7 \cdot g(T(\pi_1), 2) + 12/7 \cdot g(T(\pi_2), 2) = 2$. Since $r_1 = 1$ and $r_2 = 2$, this routing saturates both processors. By the results of Condon et al. [5], it is optimal.

MaxThroughput LP: Given $r_1, \ldots, r_n > 0$ and $p_1, \ldots, p_n \in (0, 1)$, find an assignment to the variables $z_T$, for all $T \in \mathcal{T}$, that maximizes

$$F = \sum_{T \in \mathcal{T}} z_T$$

subject to the constraints:

(1) $\sum_{T \in \mathcal{T}} g(T, i) z_T \le r_i$ for all $i \in \{1, \ldots, n\}$, and
(2) $z_T \ge 0$ for all $T \in \mathcal{T}$,

where $g(T, i)$ denotes the probability that test $i$ will be performed on an item $x$ that is tested using strategy $T$, when $x \sim D_p$.

² (continued) The lemma of Boros et al. also applies to a more general class of functions $f$ that include the $k$-of-$n$ functions.

CMT($k$) problem

We begin with some useful lemmas. The algorithms of Condon et al. [5] for maximizing throughput of 1-of-$n$ testing rely crucially on the fact that saturation of all processors implies optimality. We show that the same holds for conservative $k$-of-$n$ testing.

Lemma 1.

Let $R$ be a routing for an instance of the CMT($k$) problem. If $R$ saturates all processors, then it is optimal.

Proof. Each processor $O_i$ can test at most $r_i$ items per unit time. Thus at processor $O_i$, there are at most $r_i(1 - p_i)$ tests performed per unit time that have the value 0. Let $f$ denote the $k$-of-$n$ function.

Suppose $R$ is a routing achieving throughput $F$. Since $F$ items enter the system per unit time, $F$ items must also leave the system per unit time. An item $x$ such that $f(x) = 0$ does not leave the system until it fails $k$ tests. An item $x$ such that $f(x) = 1$ does not leave the system until it has had all tests performed on it. Thus, per unit time, in the entire system, the number of tests performed that have the value 0 must be $F \cdot M$, where

$$M = k \cdot \Pr[x \text{ has at least } k \text{ 0's}] + \sum_{j=0}^{k-1} j \cdot \Pr[x \text{ has exactly } j \text{ 0's}].$$

Since at most $r_i(1 - p_i)$ tests with the value 0 can occur per unit time at processor $O_i$, $F \cdot M \le \sum_{i=1}^{n} r_i(1 - p_i)$. Solving for $F$, this gives an upper bound of $F \le \sum_{i=1}^{n} r_i(1 - p_i)/M$ on the maximum throughput. This bound is tight if all processors are saturated, and hence a routing saturating all processors achieves the maximum throughput.

In the above proof, we rely on the fact that every routing with throughput $F$ results in the same number of 0 test values being generated in the system per unit time. Note that this is not the case for standard testing, where the number of 0 test values generated can depend on the routing itself, and not just on the throughput of that routing. We now give a simple counterexample showing that, in fact, saturation does not imply optimality for the SMT($k$) problem. Consider the MaxThroughput 2-of-3 testing instance where $p_1 = 1/2$, $p_2 = 1/4$, $p_3 = 3/4$, and $r_1 = 2$, $r_2 = 1\tfrac{3}{4}$, $r_3 = 1\tfrac{3}{4}$.

The following is a 2-of-3 testing strategy: Given item $x$, perform test 1. If $x_1 = 1$, follow strategy $T^s_2(\pi_1)$, where $\pi_1 = (2, 3)$. Else if $x_1 = 0$, follow strategy $T^s_1(\pi_2)$, where $\pi_2 = (3, 2)$.

Assigning 2 units of flow to this strategy saturates the processors: $O_1$ is saturated since it receives the 2 units entering the system; $O_2$ is saturated since it receives $1 = 2 \cdot p_1$ units directly from $O_1$ and $3/4 = 2 \cdot p_3 \cdot (1 - p_1)$ units that pass through $O_1$ and then $O_3$. Similarly, $O_3$ is saturated since it receives $1 = 2 \cdot (1 - p_1)$ units directly from $O_1$ and $3/4 = 2 \cdot (1 - p_2) \cdot p_1$ units that pass through $O_1$ and then $O_2$.

We show that the routing is not optimal by giving a different routing with higher throughput. The routing uses two strategies. The first is as follows: Given item $x$, perform test 1. If $x_1 = 1$, follow strategy $T^s_2(\pi)$, where $\pi = (3, 2)$. Else, if $x_1 = 0$, follow strategy $T^s_1(\pi)$, where $\pi = (2, 3)$. The second strategy used by the routing is $T^s_2(\pi)$, where $\pi = (3, 2, 1)$. Assigning $F = 1\tfrac{1}{2}$ units to the first strategy uses $1\tfrac{1}{2}$ units of the capacity of $O_1$, $15/16 = 1\tfrac{1}{2} \cdot (1 - p_1) + 1\tfrac{1}{2} \cdot p_1 \cdot (1 - p_3)$ units of the capacity of $O_2$, and $15/16 = 1\tfrac{1}{2} \cdot p_1 + 1\tfrac{1}{2} \cdot (1 - p_1) \cdot p_2$ units of the capacity of $O_3$. This leaves $O_2$ and $O_3$ with residual capacity more than $3/4$ (since $3/4 < 1\tfrac{3}{4} - 15/16$), and $O_1$ with residual capacity $1/2 = 2 - 1\tfrac{1}{2}$. We can then assign $3/4$ units of flow to the second strategy without violating any rate constraint, for a total throughput of $2\tfrac{1}{4}$, which exceeds 2. (The resulting routing is not optimal, but illustrates our point.)

The routing produced by our algorithm for the CMT($k$) problem uses only strategies of the form $T^c_k(\pi)$, for some permutation $\pi$ of the tests (in terms of the LP, this means $z_T > 0$ only if $T = T^c_k(\pi)$ for some $\pi$). We call such a routing a permutation routing. We say that a permutation routing $R$ has a saturated suffix if for some subset $Q$ of the processors (1) $R$ saturates all processors in $Q$, and (2) for every strategy $T^c_k(\pi)$ used by $R$, the processors in $Q$ (in some order) must form a suffix of $\pi$.

With this definition, and the above lemma, we are now able to generalize a key lemma of Condon et al. to apply to conservative $k$-of-$n$ testing. The proof is essentially the same as theirs; we present it below for completeness.

Lemma 2. (Saturated Suffix Lemma) Let $R$ be a permutation routing for an instance of the CMT($k$) problem. If $R$ has a saturated suffix, then $R$ is optimal.

Proof. If $R$ saturates all processors, then the previous lemma guarantees its optimality. If not, let $L$ denote the set of processors not saturated by $R$. Imagine that we removed the rate constraints for each processor in $L$. Let $R'$ be an optimal routing for the resulting problem. We may assume that on any input $x$, $R'$ performs the tests in $L$ in some fixed arbitrary order (until and unless $k$ tests with value 0 are obtained), prior to performing any tests in $Q$. This assumption is without loss of generality, because if not, we could modify $R'$ to first perform the tests in $L$ without violating feasibility, since the processors in $L$ have no rate constraints, and performing their tests first can only decrease the load on the other processors. Only a $p_L$ fraction of the items entering the system reaches $Q$, where $p_L$ is the probability that a random $x$ will have the value 0 for fewer than $k$ of the tests in $L$ (i.e. it will not be eliminated by the tests in $L$). Thus the throughput attained by $R'$ is $T_R / p_L$, where $T_R$ denotes the maximum throughput achievable just with the processors in $Q$.

Routing $R$ also routes flow first through $L$, and then through $Q$. Since it saturates the processors in $Q$, by the previous lemma, it achieves maximum possible throughput with those processors. It follows that $R$ achieves the same throughput as $R'$, and hence is optimal for the modified instance where processors in $L$ have no rate constraints. Since removing constraints can only increase the maximum possible throughput, it follows that $R$ is also optimal for the original instance.

We begin by considering the CMT($k$) problem in the special case where the rate limits $r_i$ are equal to some constant value $r$ for all processors. Condon et al. presented a closed-form solution for this case when $k = 1$ [5]. The solution is a permutation routing that uses $n$ strategies of the form $T^c_1(\pi)$. Each permutation $\pi$ is one of the $n$ left cyclic shifts of the permutation $(1, \ldots, n)$. More specifically, for $i \in \{1, \ldots, n\}$, let $\pi_i = (i, i+1, \ldots, n, 1, 2, \ldots, i-1)$, and let $T_i = T^c_k(\pi_i)$ (so that $T_i = T^c_1(\pi_i)$ when $k = 1$). The solution assigns $r(1 - p_{i-1})/(1 - p_1 \cdots p_n)$ units of flow to each $T_i$ (where $p_0$ is defined to be $p_n$). By simple algebra, Condon et al. verified that the solution saturates all processors. Hence it is optimal.

The solution of Condon et al. is based on the fact that for the 1-of-$n$ problem, assigning $(1 - p_{i-1})$ flow to each $T_i$ equalizes the load on the processors. Surprisingly, this same assignment equalizes the load for the $k$-of-$n$ problem as well. Using this fact, we obtain a closed-form solution to the CMT($k$) problem.

Lemma 3.

Consider an instance of the CMT($k$) problem. For $i \in \{1, \ldots, n\}$, let $T_i = T^c_k(\pi_i)$ be the strategy for the cyclic shift $\pi_i$ defined above. Let $X_{a,b} = \sum_{\ell=a}^{b} (1 - x_\ell)$ and let $\alpha = \sum_{t=1}^{k} \Pr[X_{1,n} \ge t]$. Any routing that assigns a total of $t$ units of flow to the strategies $T_i$, such that the fraction of the total that is assigned to each $T_i$ is $(1 - p_{i-1}) / \sum_{j=1}^{n} (1 - p_{j-1})$, will cause each processor's residual capacity to be reduced by $t\alpha / \sum_{j=1}^{n} (1 - p_j)$ units. If all processors have the same rate limit $r$, then the routing that assigns $r(1 - p_{i-1})/\alpha$ units of flow to strategy $T_i$ saturates all processors.

Proof. We begin by considering the routing in which $(1 - p_{i-1})$ units of flow are assigned to each $T_i$. Consider the question of how much flow arrives per unit time at processor $O_1$, under this routing. For simplicity, assume now that $k = 2$. Thus as soon as an item has failed 2 tests, it is discarded. Let $q_i = (1 - p_i)$.

Of the $q_n$ units assigned to strategy $T_1$, all $q_n$ arrive at processor $O_1$. Of the $q_{n-1}$ units assigned to strategy $T_n$, all $q_{n-1}$ arrive at processor $O_1$, since they can fail either 0 or 1 test (namely test $n$) beforehand. Of the $q_{n-2}$ units assigned to strategy $T_{n-1}$, the number reaching processor $O_1$ is $q_{n-2}\beta_{n-1}$, where $\beta_{n-1}$ is the probability that an item fails either 0 or 1 of tests $n-1$ and $n$. Therefore, $\beta_{n-1} = 1 - q_{n-1}q_n$.

More generally, for $i \in \{2, \ldots, n\}$, of the $q_{i-1}$ units assigned to $T_i$, the number reaching processor $O_1$ is $q_{i-1}\beta_i$, where $\beta_i$ is the probability that a random item fails a total of 0 or 1 of tests $i, i+1, \ldots, n$. Thus, $\beta_i = \Pr[X_{i,n} = 0] + \Pr[X_{i,n} = 1]$. Writing $T_{n+1}$ for $T_1$ and taking $X_{n+1,n} = 0$ (the empty sum), it follows that the total flow arriving at processor $O_1$ is

$$\sum_{i=2}^{n+1} q_{i-1} \Pr[X_{i,n} = 0] + \sum_{i=2}^{n+1} q_{i-1} \Pr[X_{i,n} = 1].$$

Consider the second summation, $\sum_{i=2}^{n+1} q_{i-1} \Pr[X_{i,n} = 1]$. We claim that this summation is equal to $\Pr[X_{1,n} \ge 2]$, which is the probability that $x$ has at least two $x_i$'s that are 0.

To see this, consider a process where we observe the value of $x_n$, then the value of $x_{n-1}$ and so on down towards $x_1$, stopping if and when we have observed exactly two 0's. The probability that we will stop at some point, having observed two 0's, is clearly equal to the probability that $x$ has at least two $x_i$'s that are set to 0. The condition $\sum_{j=i}^{n} (1 - x_j) = 1$ is satisfied when exactly 1 of $x_n, x_{n-1}, \ldots, x_i$ has the value 0. Thus $q_{i-1} \Pr[X_{i,n} = 1]$ is the probability that we observe exactly one 0 in $x_n, \ldots, x_i$, and then we observe a second 0 at $x_{i-1}$. That is, it is the probability that we stop after observing $x_{i-1}$. Since the second summation takes the sum of $q_{i-1} \Pr[X_{i,n} = 1]$ over all $i$ between 2 and $n+1$ (the term for $i = n+1$ is zero), the summation is precisely equal to the probability of stopping at some point in the above process, having seen two 0's. This proves the claim.

An analogous argument shows that the first summation, $\sum_{i=2}^{n+1} q_{i-1} \Pr[X_{i,n} = 0]$, is equal to $\Pr[X_{1,n} \ge 1]$. It follows that the amount of flow reaching processor $O_1$ is $\Pr[X_{1,n} \ge 1] + \Pr[X_{1,n} \ge 2]$. By the symmetry of the cyclic shifts, the amount of flow reaching each processor $O_i$ is equal to this value. Thus the above routing causes all processors to receive the same amount of flow.

Scaling each assignment in the above routing by a constant factor scales the amount of flow reaching each processor by the same factor. In the above routing, the fraction of total flow assigned to each $T_i$ is $q_{i-1} / \sum_{j=1}^{n} q_j$, so each unit of input flow sent along the $T_i$ results in each processor receiving $(\Pr[X_{1,n} \ge 1] + \Pr[X_{1,n} \ge 2]) / \sum_{j=1}^{n} q_j$ units. Thus any routing that assigns a total of $t$ units of flow to the strategies $T_i$, such that the fraction assigned to each $T_i$ is $q_{i-1} / \sum_{j=1}^{n} q_j$, will cause each processor to receive $t(\Pr[X_{1,n} \ge 1] + \Pr[X_{1,n} \ge 2]) / \sum_{j=1}^{n} q_j$ units.

Thus if all processors have the same rate limit $r$, the routing that assigns $r q_{i-1} / (\Pr[X_{1,n} \ge 1] + \Pr[X_{1,n} \ge 2])$ units of flow to strategy $T_i$ will saturate all processors.

The above argument for $k = 2$ can easily be extended to arbitrary $k$. The corresponding proportional distribution of flow for arbitrary $k$ assigns a $q_{i-1} / \sum_{j=1}^{n} q_j$ fraction of the total flow to strategy $T_i$, and each unit of input flow sent along the $T_i$ according to these proportions results in $\alpha / \sum_{j=1}^{n} q_j$ units reaching each processor. The saturating routing for arbitrary $k$, when all processors have rate limit $r$, assigns $r q_{i-1}/\alpha$ units of flow to strategy $T_i$.

Our algorithm for the CMT($k$) problem is an adaptation of one of the two MaxThroughput algorithms, for the special case where $k = 1$, given by Condon et al. [5]. We begin by reviewing that algorithm, which we will call the Equalizing Algorithm. Note that when $k = 1$, it only makes sense to consider permutation routings, since an item can be discarded as soon as it fails a single test.

Consider the CMT($k$) problem for $k = 1$. View the problem as one of constructing a flow of items through the processors. The capacity of each processor is its rate limit, and the amount of flow sent along a permutation $\pi$ (i.e., assigned to strategy $T^c_1(\pi)$) is equal to the number of items sent along that path per unit time. Sort the tests by their rate limits, and re-number them so that $r_n \ge r_{n-1} \ge \ldots \ge r_1$. Assume for the moment that all rate limits $r_i$ are distinct.

The Equalizing Algorithm constructs a flow incrementally as follows. Imagine pushing flow along the single permutation $(n, \ldots, 1)$, and consider two adjacent processors $i$ and $i - 1$. As we increase the amount of flow, the residual capacity of each decreases continuously. Initially, at zero flow, the residual capacity of $i$ is greater than the residual capacity of $i - 1$. It follows by continuity that the residual capacity of $i$ cannot become less than the residual capacity of $i - 1$ without the two first becoming equal. We therefore increase the flow along $(n, \ldots, 1)$ until either (1) some processor becomes saturated, or (2) the residual capacities of at least two of the processors become equal. The second stopping condition ensures that when the flow increase is halted, permutation $(n, \ldots, 1)$ still orders the processors in decreasing order of their residual capacities. (Algorithmically, we do not increase the flow continuously, but instead directly calculate the amount of flow which triggers the stopping condition.)

If stopping condition (1) above holds when the flow increase is stopped, then the routing can be shown to have a saturated suffix, and hence it is optimal.

If stopping condition (2) holds, we keep the current flow, and then augment it by solving a new MaxThroughput problem in which we set the rate limits of the processors to be equal to their residual capacities under the current flow (their $p_i$'s remain the same).

We solve the new MaxThroughput problem as follows. We group the processors into equivalence classes according to their rate limits. We then replace each equivalence class with a single megaprocessor, with a rate limit equal to the residual capacities of the constituent processors, and probability $p_i$ equal to the product of their probabilities. We then essentially apply the procedure for the case of distinct rate limits to the megaprocessors. The one twist is the way in which we translate flow sent through a megaprocessor into flow sent through the constituent processors of that megaprocessor; we route the flow through the constituent processors so as to equalize their load. We accomplish this by dividing the flow proportionally between the cyclic shifts of a permutation of the processors, using the proportional allocation of Lemma 3. We thus ensure that the processors in each equivalence class continue to have equal residual capacity. Note that, under this scheme, the residual capacity of a processor in a megaprocessor may decrease more slowly than it would if all flow were sent directly to that processor (because some flow may first be filtered through other processors in the megaprocessor), and this needs to be taken into account in determining when the stopping condition is reached.

We illustrate the Equalizing Algorithm on the following CMT($k$) problem where $k = 1$ and $n = 3$ (since $k = 1$ this is also an SMT($k$) problem). Suppose we have 3 processors, $O_1, O_2, O_3$, with rate limits $r_1 = 3$, $r_2 = 14$, and $r_3 = 18$, and probabilities $p_1$, $p_2 = 1/2$, and $p_3 = 1/3$ (the value of $p_1$ plays no role below, because $O_1$ is last in every permutation used). When flow is sent along $O_3, O_2, O_1$, after 6 units of flow is sent we achieve a stopping condition with $O_3$ and $O_2$ having the same residual capacity of 12; the residual capacity of $O_1$ is 2.

Our algorithm then performs a recursive call where the processors $O_2$ and $O_3$ are combined into a megaprocessor $O_{2,3}$, with associated probability $p_{2,3} = 1/2 \cdot 1/3 = 1/6$. Within megaprocessor $O_{2,3}$, flow will be routed by sending $3/7$ of it along $O_3, O_2$, and the remaining $4/7$ along $O_2, O_3$; we observe that for one unit of flow sent through $O_{2,3}$, the amount of capacity used by each processor is $3/7 + 4/7 \cdot 1/2 = 5/7$. Using this internal routing for megaprocessor $O_{2,3}$, the algorithm sends flow along $O_{2,3}, O_1$; after 12 units of flow, we reach a stopping condition when $O_1$ is saturated. Even though $O_2$ and $O_3$ are not saturated (they have $12 - 12 \cdot 5/7 = 24/7$ units of residual capacity), the routing has a saturated suffix (namely $O_1$) and is therefore optimal by Lemma 2.

The output of the Equalizing Algorithm can be represented as a list of pairs $((E_m, \ldots, E_1), \hat{t})$, one for each recursive call. We call this a megaprocessor representation. The list $(E_m, \ldots, E_1)$ represents the permutation of megaprocessors $E_i$ along which flow is sent during that call. Each $E_i$ is given by the subset of original processors contained in it, and $\hat{t} > 0$ is the amount of flow sent along $(E_m, \ldots, E_1)$. Of course, flow coming into each megaprocessor should be routed so as to equalize the load on each of its constituent processors. The size of this representation is $O(n^2)$. Interpreted in a straightforward way, the representation corresponds to a routing that sends flow along an exponential number of different permutations of the original processors.

Condon et al. describe a combinatorial method to reduce the number of such permutations used to be $O(n^2)$ [5]. After such a reduction, the output can be represented explicitly as a set of $O(n^2)$ pairs of the form $(\pi, t)$, one for each permutation $\pi$ that is used, indicating that $t > 0$ units of flow should be sent along permutation $\pi$. We call such a representation a permutation representation. The size of this permutation representation, given explicitly, is $O(n^3)$. (Hellerstein and Deshpande describe a linear algebraic method for reducing the number of permutations to be at most $n$, yielding an explicit representation of size $O(n^2)$, but at the cost of higher time complexity [6].)

We also describe a variant of the megaprocessor representation called the compressed representation, where the algorithm outputs only the first permutation explicitly, and then outputs the sequence of merges, yielding a representation of size $O(n)$.

CMT($k$) problem

In this section, we prove the following theorem by presenting an algorithm. We will give an outline of the algorithm as well as its pseudocode.
We will then describe how to achieve the running time stated in the Theorem.

Theorem 4. There is a combinatorial algorithm for solving the CMT(k) problem that can be implemented to run in time $O(n(\log n + k) + o)$, where the value of $o$ depends on the output representation. For the megaprocessor representation, $o = O(n^2)$; for the permutation representation, $o = O(n^3)$; and for the compressed representation, $o = O(n)$.

Algorithm Outline

We extend the Equalizing Algorithm of Condon et al. to apply to arbitrary values of $k$. Again, we will push flow along the permutation of the processors $(n, \ldots, 1)$ (where $r_n \geq r_{n-1} \geq \ldots \geq r_1$) until one of the two stopping conditions is reached: (1) a processor is saturated, or (2) two processors have equal residual capacity. Here, however, we do not discard an item until it has failed $k$ tests, rather than discarding it as soon as it fails one test. To reflect this, we divide the flow into $k$ different types, numbered 0 through $k-1$, depending on how many tests its component items have failed. Flow entering the system is all of type 0.

When $m$ units of flow of type $\tau$ enter a processor $O_i$, $p_i m$ units pass test $i$, and $(1-p_i)m$ units fail it. So, if $\tau < k-1$, then of the $m$ incoming units of type $\tau$, $(1-p_i)m$ units will exit processor $O_i$ as type $\tau+1$ flow, and $p_i m$ will exit as type $\tau$ flow. Both types will be passed on to the next processor in the permutation, if any. If $\tau = k-1$, then $p_i m$ units will exit as type $\tau$ flow and be passed on to the next processor, and the remaining $(1-p_i)m$ will be discarded.

Algorithmically, we need to calculate the minimum amount of flow that triggers a stopping condition. This computation is only slightly more complicated for general $k$ than it is for $k = 1$. The key is to compute, for each processor $O_i$, what fraction of the flow that is pushed into the permutation will actually reach processor $O_i$ (i.e., we need to compute the quantity $g(T^c_k(\pi), i)$ in the LP).

If stopping condition (2) holds, we keep the current flow, and augment it by solving a new MaxThroughput problem in which we set the rate limits of the processors to be equal to their residual capacities under the current flow (their $p_i$'s remain the same). To solve the new MaxThroughput problem, we again group the processors into equivalence classes according to their rate limits, and replace each equivalence class with a single megaprocessor, with a rate limit equal to the rate limit of the constituent processors, and probability $p_i$ equal to the product of their probabilities.

We then want to apply the procedure for the case of distinct rate limits to the megaprocessors. To do this, we need to translate flow sent into a megaprocessor into flow sent through the constituent processors of that megaprocessor, so as to equalize their load. We do this translation separately for each type of flow entering the megaprocessor. Note that flow of type $\tau$ must be discarded as soon as it fails an additional $k - \tau$ tests. We therefore send flow of type $\tau$ into the constituent processors of the megaprocessor according to the proportional allocation of Lemma 3 for $(k-\tau)$-of-$n'$ testing, where $n'$ is the number of constituent processors of the megaprocessor.
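The type-by-type bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `loads_along_order`, the dictionary-based inputs, and the numeric values in the usage lines are assumptions made here, and the sketch handles a plain permutation of original processors (no megaprocessors) with all incoming flow of type 0.

```python
from fractions import Fraction

def loads_along_order(order, p, k):
    """Load placed on each processor by one unit of type-0 flow sent along `order`.

    order: processor indices in the order they are visited
    p:     dict mapping processor index -> probability of passing its test
    k:     an item is discarded once it has failed k tests (conservative testing)
    """
    flow = [Fraction(0)] * k          # flow[tau] = amount of type-tau flow still in the system
    flow[0] = Fraction(1)
    load = {}
    for i in order:
        load[i] = sum(flow)           # every surviving unit is tested by O_i
        nxt = [Fraction(0)] * k
        for tau, m in enumerate(flow):
            nxt[tau] += p[i] * m                  # passes test i: stays type tau
            if tau + 1 < k:
                nxt[tau + 1] += (1 - p[i]) * m    # fails test i: becomes type tau + 1
            # if tau == k - 1, the failing part (1 - p_i) * m is discarded
        flow = nxt
    return load

# Illustrative inputs (assumed): four processors, k = 2, flow split over cyclic shifts.
p = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 2), 4: Fraction(3, 4)}
shifts = {(1, 2, 3, 4): 1, (2, 3, 4, 1): 2, (3, 4, 1, 2): 2, (4, 1, 2, 3): 2}
total = {i: Fraction(0) for i in p}
for order, units in shifts.items():
    for i, load in loads_along_order(order, p, 2).items():
        total[i] += units * load
print(total)   # each processor receives a total load of 6
```

The per-processor load returned for a single ordering is exactly the fraction of the injected flow that reaches that processor, which is the quantity needed to decide when a stopping condition is triggered.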
We also need to compute how much flow of each type ends up leaving the megaprocessor (some of the incoming flow of type $\tau$ entering the megaprocessor may, for example, become outgoing flow of type $\tau + n'$), and how much its residual capacity is reduced by the incoming flow.

We give a more detailed description of the necessary computations in the pseudocode, which we discuss next. However, the pseudocode does not contain all the implementation details, and is not optimized for efficiency. It also gives the output using a megaprocessor representation. Following presentation of the pseudocode, we discuss how to implement it to achieve the running times stated in Theorem 4 for the different output representations.

Pseudocode

The main part of the pseudocode is presented below as Algorithm 1. The following information will be helpful in understanding it.

At each stage of the algorithm, the processors are partitioned into equivalence classes. The processors in each equivalence class constitute a megaprocessor. Each equivalence class consists of a contiguous subsequence of processors in the sorted sequence $O_n, \ldots, O_2, O_1$. We use $m$ to denote the number of megaprocessors (equivalence classes). The processors in each equivalence class all have the same residual capacity.

In Step 1 of the algorithm, we partition the processors into equivalence classes according to their rate limits; two processors are in the same equivalence class if and only if they have the same rate limit. We use $E_i$ to denote both the $i$th equivalence class and the $i$th megaprocessor. In some of our examples, we denote a megaprocessor containing processors $\{O_i, O_{i+1}, \ldots, O_j\}$ by $O_{i,i+1,\ldots,j}$.

In Step 2, we compute the amount of flow $\hat{t}$ that triggers one of the two stopping conditions. In order to do this, we need to know the rate at which the residual capacity of each processor within an equivalence class $E_i$ will be reduced when flow is sent down the megaprocessors in the order $E_m, \ldots, E_1$. We use $\xi(i)$ to denote the amount by which the residual capacity of the processors in $E_i$ is reduced when one unit of flow is sent in that order.

The equation for $\xi(i)$ follows from the preceding lemmas and discussion. We use $f_j(z)$ to denote the amount of flow of type $j$ that would reach processor $z$, if one unit of flow were sent down the permutation $O_n, \ldots, O_1$, where these are the original processors, not the megaprocessors. This is precisely equal to the probability that a random item $x$ has exactly $j$ 0's in tests $n, \ldots, z+1$. We compute the value of $f_j(z)$ for all $z$ and $j$ in a separate initialization routine, given below. The key here is noticing that if you send one unit of flow down the megaprocessors $E_m, \ldots, E_1$, the amount of flow of type $j$ reaching megaprocessor $E_i$ is precisely $f_j(c(i))$, where $c(i)$ is the highest index of a processor in $E_i$; the amount of flow reaching the megaprocessor depends only on how many 0's have been encountered in tests $n, \ldots, c(i)+1$, and not on the order used to perform those tests.

The quantity $\hat{t}_1$ is the amount of flow sent down $E_m, \ldots, E_1$ that would cause saturation of the processors in $E_1$. The quantity $\hat{t}_2$ is the minimum amount of flow sent down $E_m, \ldots, E_1$ that would cause the residual capacities of two megaprocessors to equalize. The stopping condition holds at the minimum of these two quantities.

MaxThroughput

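Initialization
  $f_j(z) \leftarrow 0$, $\forall z \in \{1, \ldots, n\}$, $\forall j \in \{0, \ldots, k-1\}$;
  $f_0(1) \leftarrow 1$;
  for ($z \leftarrow 2$; $z \leq n$; $z \leftarrow z+1$) do
    for ($j \leftarrow 0$; $j \leq k-1$; $j \leftarrow j+1$) do
      $f_j(z) \leftarrow q_{z-1} f_{j-1}(z-1) + p_{z-1} f_j(z-1)$;   // here $q_{z-1} = 1 - p_{z-1}$, and $f_{-1}(\cdot)$ is taken to be 0
  return SolveMaxThroughput($p_1, \ldots, p_n, r_1, \ldots, r_n$);

Example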

We illustrate our algorithm for the CMT(k) problem on the following example. Let $k = 2$ and $n = 4$. Suppose the probabilities are $p_1 = p_2 = p_3 = 1/2$, $p_4 = 3/4$, and the rate limits are $r_1 = r_2 = 12$, $r_3 = r_4 = 10$.

Our algorithm first combines processors with the same rate limits into megaprocessors; thus we combine $O_1$ and $O_2$ into megaprocessor $O_{1,2}$ with rate limit 12. It routes flow through this megaprocessor by sending 1/2 of the flow in the order $O_1, O_2$, and sending the other 1/2 in the order $O_2, O_1$. Similarly, $O_3$ and $O_4$ have the same rate limit, so they are combined into a megaprocessor $O_{3,4}$ with rate limit 10, where 1/3 of the flow is sent in the order $O_3, O_4$, and the other 2/3 in the order $O_4, O_3$.

Our megaprocessor $O_{1,2}$ has a higher rate limit than $O_{3,4}$; consequently our algorithm routes flow in the order $O_{1,2}, O_{3,4}$. We now show that the stopping condition is reached after sending 6 units of flow along this ordering. Sending 6 units of flow into $O_{1,2}$ reduces the capacity of both $O_1$ and $O_2$ in $O_{1,2}$ by 6, since $k = 2$ and thus flow cannot be discarded before it has been subject to at least two tests.

We now calculate the reduction of capacity in $O_3$ and $O_4$ caused by the 6 units of flow sent through $O_{1,2}, O_{3,4}$. Flow leaving $O_{1,2}$ has a 1/4 chance of having failed both tests in $O_{1,2}$ and exiting the system; of the flow that stays in the system to be tested by $O_{3,4}$, the part that passed both tests is type 0 flow and the part that failed exactly one test is type 1 flow. Thus, of the 6 units sent into $O_{1,2}$, $1/4 \cdot 6 = 3/2$ units enter $O_{3,4}$ as type 0 flow, and $1/2 \cdot 6 = 3$ units enter $O_{3,4}$ as type 1 flow.

Of the 3/2 units of type 0 flow entering $O_{3,4}$, all of it must undergo both test 3 and test 4, since flow is not discarded until it has failed two tests. Thus that flow reduces the capacity of both $O_3$ and $O_4$ by 3/2. Of the 3 units of type 1 flow entering $O_{3,4}$, 1/3 is tested first by $O_3$, and then by $O_4$ only if it passes test 3 (which it does with probability 1/2), and 2/3 is tested first by $O_4$, and then by $O_3$ only if it passes test 4 (which it does with probability 3/4). Thus the type 1 flow reduces the capacity by $3 \cdot (1/3 + 2/3 \cdot 3/4) = 5/2$ in $O_3$ and by $3 \cdot (2/3 + 1/3 \cdot 1/2) = 5/2$ in $O_4$. Hence the $3 + 3/2$ units of flow entering $O_{3,4}$ reduce the capacities of both $O_3$ and $O_4$ by $5/2 + 3/2 = 4$.

Thus, the 6 units of flow sent to $O_{1,2}$ and then to $O_{3,4}$ cause the residual capacities of $O_1$ and $O_2$ to be $12 - 6 = 6$, and of $O_3$ and $O_4$ to be $10 - 4 = 6$. Since these residual capacities are equal, the algorithm merges the processors in $O_{1,2}$ with the processors in $O_{3,4}$. All the processors in the resulting megaprocessor, $O_{1,2,3,4}$, have a residual capacity of 6. Using the proportional allocation of Lemma 3 to route flow sent into $O_{1,2,3,4}$, we assign 1/7 of the flow into $O_{1,2,3,4}$ to permutation $\pi_1 = (1,2,3,4)$, 2/7 to permutation $\pi_2 = (2,3,4,1)$, 2/7 to permutation $\pi_3 = (3,4,1,2)$, and 2/7 to permutation $\pi_4 = (4,1,2,3)$. By sending a total of 7 units of flow through $O_{1,2,3,4}$ according to this allocation, we send 1, 2, 2, and 2 units respectively along the four permutations, achieving the saturating routing given in Lemma 3.

Our final routing achieves a throughput of $6 + 7 = 13$, which is optimal.

Algorithm 1 SolveMaxThroughput($p_1, \ldots, p_n, r_1, \ldots, r_n$)
Input: $n$ selectivities $p_1, \ldots, p_n$; $n$ rate limits $r_1 \leq \ldots \leq r_n$
Output: representation of solution to the MaxThroughput problem for the given input parameters
  // form the equivalence classes $E_m, \ldots, E_1$;
  Let $1 \leq \ell_1 < \ldots < \ell_{m+1} = n+1$ such that, for all $y, y' \in [\ell_{i-1}, \ell_i)$ and $z, z' \in [\ell_i, \ell_{i+1})$, where $i \in [2, m]$, we have $r_y = r_{y'} < r_z = r_{z'}$
  Then, for $i \in [1, m]$, $E_i = \{O_z \mid \ell_i \leq z < \ell_{i+1}\}$, and $R_i \leftarrow r_{\ell_i}$.
  // calculate $\hat{t}$ using the following steps;
  for ($i \leftarrow 1$; $i \leq m$; $i \leftarrow i+1$) do
    $c(i) \leftarrow$ highest index of a processor in $E_i$;
    $b(i) \leftarrow$ lowest index of a processor in $E_i$;
    Recall that $X_{a,b} = \sum_{\ell=a}^{b} (1 - x_\ell)$
    $\xi(i) \leftarrow \sum_{j=0}^{k-1} f_j(c(i)) \cdot \left( \sum_{v=1}^{k-j} \Pr\left[X_{b(i),c(i)} \geq v\right] \right) \Big/ \sum_{t=b(i)}^{c(i)} (1 - p_t)$;
  $\hat{t}_1 \leftarrow R_1 / \xi(1)$;
  $\hat{t}_2 \leftarrow \min_{i \in [2,\ldots,m]} \frac{R_i - R_{i-1}}{\xi(i) - \xi(i-1)}$;
  $\hat{t} \leftarrow \min(\hat{t}_1, \hat{t}_2)$;
  // calculate the residual capacity for each processor $O_\ell$;
  for ($\ell \leftarrow 1$; $\ell \leq n$; $\ell \leftarrow \ell+1$) do
    $j \leftarrow$ index of the equivalence class $E_j$ containing processor $O_\ell$;
    $r'_\ell \leftarrow r_\ell - \xi(j)\hat{t}$;
  // store new flow and recurse if needed
  $K \leftarrow ((E_m, \ldots, E_1), \hat{t})$;
  if ($r'_1 == 0$) then   // residual capacity of equivalence class $E_1$ is 0
    return $K$;
  else
    $K' \leftarrow$ SolveMaxThroughput($p_1, \ldots, p_n, r'_1, \ldots, r'_n$);
    return $K \circ K'$;   // i.e. the concatenation of $K$ and $K'$

Achieving the running time.
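The running-time argument in this part keeps the megaprocessors in a priority queue keyed by the equalization times $Q_i$, and avoids touching every key after a merge by recording a single running total of the $\hat{t}$ values. A minimal Python sketch of that bookkeeping is given here; the class name `LazyOffsetQueue`, its method names, and the lazy-deletion scheme are illustrative assumptions rather than part of the paper's pseudocode.

```python
import heapq

class LazyOffsetQueue:
    """Min-priority queue in which all keys can be decreased by t in O(1) time."""

    def __init__(self):
        self._heap = []     # entries (stored_key, item_id); may contain stale entries
        self._live = {}     # item_id -> stored_key of the entry that is currently valid
        self._offset = 0    # sum of all t values subtracted so far

    def insert(self, item_id, key):
        stored = key + self._offset          # chosen so that stored - offset == key right now
        self._live[item_id] = stored
        heapq.heappush(self._heap, (stored, item_id))

    def remove(self, item_id):
        # Lazy deletion: the heap entry stays behind and is skipped by min().
        self._live.pop(item_id, None)

    def decrease_all(self, t):
        # Corresponds to Q_i <- Q_i - t for every stored megaprocessor at once.
        self._offset += t

    def min(self):
        # Discard stale entries, then report the smallest current key.
        while self._heap:
            stored, item_id = self._heap[0]
            if self._live.get(item_id) == stored:
                return item_id, stored - self._offset
            heapq.heappop(self._heap)
        return None

q = LazyOffsetQueue()
q.insert("E2", 5)
q.insert("E3", 3)
q.decrease_all(2)        # a merge elsewhere consumed t = 2
q.remove("E2")           # E2 took part in a merge and gets a fresh key
q.insert("E2", 4)
print(q.min())
```

Each insert and each eventual removal of a stale entry costs $O(\log n)$, and every merge triggers only a constant number of such operations, which is the property the analysis relies on.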

Let us first consider the running time of the algorithm excluding the computation of $\xi(i)$ and the time it takes to construct the output representation $K$. It is easy to see that the algorithm makes at most $n-1$ recursive calls, since there can be at most $n-1$ merges. Excluding the computation of $\xi(i)$, the time spent in each recursive call is clearly $O(n)$. However, we can implement the algorithm so as to ensure this time is $O(\log n)$, as follows. First, the maintenance of the equivalence classes can be handled in $O(1)$ time per merge by simply taking a union of the sets of adjacent processors in each megaprocessor, instead of recomputing these sets from scratch.

Second, we do not need to compute the residual capacity of each megaprocessor at every recursive call. In fact, for all megaprocessors except the first one, we only need enough information about its residual capacity to allow us to compute $\hat{t}_2 \leftarrow \min_{i \in [2,\ldots,m]} \frac{R_i - R_{i-1}}{\xi(i) - \xi(i-1)}$. This suggests that for each megaprocessor $i$ where $i \geq 2$, we keep the quantity $Q_i$, where $Q_i = \frac{R_i - R_{i-1}}{\xi(i) - \xi(i-1)}$, instead of $R_i$. The megaprocessors can be stored in a priority queue, according to their $Q_i$ values.

Consider any $i$ where neither $E_i$ nor $E_{i-1}$ is involved in a merge. Then
$$\frac{(R_i - \xi(i)\hat{t}) - (R_{i-1} - \xi(i-1)\hat{t})}{\xi(i) - \xi(i-1)} = \frac{R_i - R_{i-1}}{\xi(i) - \xi(i-1)} - \hat{t}.$$
Thus following the merge, $Q_i$ is decreased by the same amount $\hat{t}$ for all such $i$. Therefore, instead of updating the $Q_i$ for these $i$ in the priority queue, we can keep their current values, and maintain the sum of the $\hat{t}$ values computed so far; this can be subtracted from $Q_i$ if its updated value is needed. We do need to remove the two merged megaprocessors from the priority queue, insert the information about the resulting new megaprocessor, and update the $Q_i$ values for megaprocessors $i$ such that $E_i$ or $E_{i-1}$ were involved in a merge. Note that we need to change the $Q_i$ values for such megaprocessors due to the change in the $\xi(\cdot)$ value of the newly formed megaprocessor. The above operations can be performed in $O(\log n)$ time per merge, using the priority queue.

Therefore, the running time of the algorithm excluding the computation of $\xi(i)$ is $O(n \log n + o)$, where $o$ is the time required to construct the output. In the pseudocode, the output is computed using the megaprocessor representation. Since there are at most $n$ recursive calls, there are at most $n$ pairs $((E_m, \ldots, E_1), \hat{t})$ in the output, and therefore $o = O(n^2)$.

If one chose to convert this representation to a permutation representation, using the combinatorial method of Condon et al. [5], then the value of $o$ would be $O(n^3)$.

Consider instead the following more compact output representation, which we call the compressed representation. Suppose the algorithm outputs the initial permutation, then outputs the sequence of merges performed, together with the $\hat{t}$ values associated with the merges. In this case, we have $o = O(n)$.

We will next show that the computation of $\xi(i)$ throughout the algorithm can be performed in $O(nk)$ total time.

Computing $\xi(i)$.
Let $E^{(i)}_k$ be the $k$th megaprocessor in the recursive call associated with the $i$th merge. Let $b(i,k)$ be the lowest index of a processor in $E^{(i)}_k$, and let $c(i,k)$ be the highest index of a processor in $E^{(i)}_k$. Let $|E^{(i)}_k|$ denote the size of that megaprocessor, that is, the number of processors in it. Thus $|E^{(i)}_k| = c(i,k) - b(i,k) + 1$.

Let $h(i)$ and $h(i)+1$ be the indices of the megaprocessors merged by the $i$th merge (i.e., $E^{(i)}_{h(i)}$ and $E^{(i)}_{h(i)+1}$ are the megaprocessors merged by the $i$th merge).

Observe that at iteration $i$, after megaprocessor $E^{(i)}_{h(i)}$ is merged with $E^{(i)}_{h(i)+1}$, we only recompute $\xi(h(i))$. After the merge, we need to compute
$$\xi(h(i)) = \sum_{j=0}^{k-1} f_j(c(i,h(i)+1)) \cdot \left( \sum_{v=1}^{k-j} \Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v\right] \right) \Big/ \sum_{t=b(i,h(i))}^{c(i,h(i)+1)} (1 - p_t).$$

Consider the denominator $\sum_{t=b(i,h(i))}^{c(i,h(i)+1)} (1 - p_t)$ in the above expression. It is the sum of the failure probabilities of all processors contained in $E^{(i)}_{h(i)}$ and $E^{(i)}_{h(i)+1}$. To enable this computation to be performed in constant time per recursive call, we simply store, with each megaprocessor, the sum of the failure probabilities of all processors in it. In each recursive call, it only takes constant time to update this information. Recall that $f_j(c(i,h(i)+1))$ for all $j \in \{0, \ldots, k-1\}$ are computed in the initialization procedure. Let
$$D_j = \sum_{v=1}^{k-j} \Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v\right].$$
Given $D_j$ for $j \in \{0, \ldots, k-1\}$, we can compute $\xi(h(i))$ in $O(k)$ time. Observe that
$$D_j = D_{j+1} + \Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq k-j\right].$$
Therefore, given $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v\right]$ for $v \in \{1, \ldots, k\}$, we can compute $\{D_0, \ldots, D_{k-1}\}$ in $O(k)$ time.
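The $O(k)$ evaluation of $\xi$ from the stored quantities can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names `merge_pmf` and `xi_from_pmf`, the list-based layout, and the numbers in the usage lines are assumptions made here. The sketch takes the distribution $\Pr[X = v]$ of the number of failed tests inside the megaprocessor as given; the helper `merge_pmf` combines the distributions of two megaprocessors by a convolution truncated at $k$.

```python
from fractions import Fraction

def merge_pmf(a, b, k):
    """Combine Pr[X = v] lists of two megaprocessors, keeping only v = 0..k."""
    out = [Fraction(0)] * (k + 1)
    for j, aj in enumerate(a):
        for u, bu in enumerate(b):
            if j + u <= k:
                out[j + u] += aj * bu
    return out

def xi_from_pmf(f, pmf, fail_sum, k):
    """Capacity used per processor of a megaprocessor, per unit of flow sent.

    f:        f[j] = amount of type-j flow reaching the megaprocessor, j = 0..k-1
    pmf:      pmf[v] = Pr[X = v] for v = 0..k-1 (extra entries are ignored)
    fail_sum: sum of the failure probabilities (1 - p_t) in the megaprocessor
    """
    # tail[v] = Pr[X >= v], using Pr[X >= v+1] = Pr[X >= v] - Pr[X = v]
    tail = [Fraction(1)] * (k + 1)
    for v in range(k):
        tail[v + 1] = tail[v] - pmf[v]
    # D[j] = sum_{v=1}^{k-j} Pr[X >= v], using D_j = D_{j+1} + Pr[X >= k-j], D_k = 0
    D = [Fraction(0)] * (k + 1)
    for j in range(k - 1, -1, -1):
        D[j] = D[j + 1] + tail[k - j]
    return sum(f[j] * D[j] for j in range(k)) / fail_sum

# Illustrative values (assumed): two processors with pass probabilities 1/2 and 3/4, k = 2.
half = Fraction(1, 2)
pmf = merge_pmf([half, half], [Fraction(3, 4), Fraction(1, 4)], 2)
print(xi_from_pmf([Fraction(1, 4), half], pmf, Fraction(3, 4), 2))   # prints 2/3
```

Both loops in `xi_from_pmf` run in $O(k)$ time, matching the bound stated above; the cost of `merge_pmf` is what the remainder of this part accounts for.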
Finally, observe that
$$\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v\right] = \Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v+1\right] + \Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right].$$
Therefore, given $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ for $v \in \{0, \ldots, k\}$, we can compute $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} \geq v\right]$ for all $v \in \{1, \ldots, k\}$ in $O(k)$ time (starting from $\Pr[X \geq 0] = 1$). To enable these computations, we store, with each megaprocessor, the values $\Pr[X_{b,c} = v]$ for all $v \in \{0, \ldots, k\}$, where $b$ and $c$ are respectively the lowest and highest indices of the processors contained in that megaprocessor. We will analyze below the total cost of keeping these values updated.

We denote by $C$ the total cost of computing $\xi(\cdot)$ throughout the algorithm, using the implementation described. Since we will have to compute $\xi(\cdot)$ at most $n$ times throughout the algorithm, by the arguments above, $C$ is bounded by $O(nk)$ plus the cost of computing $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ for all $i \in \{1, \ldots, n-1\}$ and $v \in \{0, \ldots, k\}$. Let us denote the cost of computing $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ by $C_{i,v}$. Therefore, $C = O(nk) + \sum_{i=1}^{n-1} \sum_{v=0}^{k} C_{i,v}$. We show that $C = O(nk)$ by proving $\sum_{i=1}^{n-1} \sum_{v=0}^{k} C_{i,v} = O(nk)$.

Lemma 5. $\sum_{i=1}^{n-1} \sum_{v=0}^{k} C_{i,v} = O(nk)$.

Proof. Let $E^{(i)}_{\min} = E^{(i)}_{h(i)}$ if $|E^{(i)}_{h(i)}| \leq |E^{(i)}_{h(i)+1}|$, and $E^{(i)}_{\min} = E^{(i)}_{h(i)+1}$ otherwise. Recall that when we need to compute $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$, we have already computed and stored $\Pr\left[X_{b(i,h(i)),c(i,h(i))} = v\right]$ and $\Pr\left[X_{b(i,h(i)+1),c(i,h(i)+1)} = v\right]$ for all $v \in \{0, \ldots, k\}$.

Since $\Pr[X_{b,c} = v] = 0$ for any $b, c$ if $v > c - b + 1$, we can compute $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ using the following equality:
$$\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right] = \sum_{j=\max(0,\, v - |E^{(i)}_{h(i)+1}|)}^{\min(v,\, |E^{(i)}_{h(i)}|)} \Pr\left[X_{b(i,h(i)),c(i,h(i))} = j\right] \cdot \Pr\left[X_{b(i,h(i)+1),c(i,h(i)+1)} = v-j\right] \quad (1)$$
Thus, we perform at most one multiplication and one addition for each term in Equation 1, yielding
$$C_{i,v} < 2 \cdot \min(v+1, |E^{(i)}_{\min}|+1). \quad (2)$$

We can now bound $\sum_{i,v} C_{i,v}$ as follows. Each time two megaprocessors $E^{(i)}_{h(i)}$ and $E^{(i)}_{h(i)+1}$ merge, we charge the cost of computing $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$, for all $v \in \{0, \ldots, k\}$, to the processors in the smaller of the two megaprocessors, distributing the cost evenly among the processors in the megaprocessor. Thus, we charge $\left(\sum_{v=0}^{k} C_{i,v}\right) / |E^{(i)}_{\min}|$ to each processor $O_j \in E^{(i)}_{\min}$.

Let $\kappa_j(i,v)$ denote how much of the cost of computing $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ we charge to $O_j$ during the $i$th merge. Let $\kappa_j(v)$ denote how much of the cost of computing $\Pr\left[X_{b(i,h(i)),c(i,h(i)+1)} = v\right]$ for all $i \in \{1, \ldots, n-1\}$ we charge to processor $O_j$. Let $\kappa_j$ denote the total amount we charge to processor $O_j$. In other words,
$$\kappa_j(i,v) = \begin{cases} C_{i,v} / |E^{(i)}_{\min}| & \text{if } O_j \in E^{(i)}_{\min}, \\ 0 & \text{otherwise} \end{cases} \quad (3)$$
$$\kappa_j(v) = \sum_{i=1}^{n-1} \kappa_j(i,v) \quad (4)$$
$$\kappa_j = \sum_{v=0}^{k} \kappa_j(v) = \sum_{i=1}^{n-1} \sum_{v=0}^{k} \kappa_j(i,v) \quad (5)$$
Then, we have $\sum_{i,v} C_{i,v} = \sum_{j=1}^{n} \kappa_j$. We will bound $\sum_{i,v} C_{i,v}$ by proving an upper bound on $\kappa_j$.

Consider any processor $O_j$. We will show that $\kappa_j(v) = O(1)$. Let $i'(z) = i'(j,v,z)$ be the index of the merge in which processor $O_j$ is charged for the cost of computing $\Pr[X_{b,c} = v]$ for any $b, c$ for the $z$th time. Formally, let
$$i'(z) = i'(j,v,z) = \begin{cases} \ell & \text{if } \exists \ell \in [1, n-1] \text{ s.t. } \kappa_j(\ell,v) > 0 \wedge |\{t \mid t < \ell,\ \kappa_j(t,v) > 0\}| = z-1, \\ 0 & \text{otherwise.} \end{cases}$$
By Equation 2, $C_{i,v} < 2|E^{(i)}_{\min}| + 2$, which implies $\kappa_j(i,v) < 4$. Moreover, $O_j$ is charged a positive amount at merge $i'(1)$ only if the sum in Equation 1 is nonempty, that is, only if the megaprocessor produced by that merge contains at least $v$ processors; since that megaprocessor is contained in $E^{(i'(2))}_{\min}$, if $i'(2) > 0$ we have
$$|E^{(i'(2))}_{\min}| \geq v. \quad (6)$$
By the definition of $E^{(i)}_{\min}$ and $i'(z)$, if $i'(2) > 0$, then for $z \geq 2$ with $i'(z) > 0$ we have
$$|E^{(i'(z))}_{\min}| \geq 2 \cdot |E^{(i'(z-1))}_{\min}|. \quad (7)$$
Combining all these facts, and letting $Z = \max(x : i'(x) > 0)$, we have
$$\begin{aligned}
\kappa_j(v) &= \sum_{i=1}^{n-1} \kappa_j(i,v) && \text{by definition} \\
&< 4 + \sum_{z=2}^{Z} \kappa_j(i'(z), v) && \text{since } \kappa_j(i,v) < 4 \\
&= 4 + \sum_{z=2}^{Z} \frac{C_{i'(z),v}}{|E^{(i'(z))}_{\min}|} && \text{by Equation 3} \\
&< 4 + \sum_{z=2}^{Z} \frac{2v+2}{|E^{(i'(z))}_{\min}|} && \text{by Equation 2} \\
&\leq 4 + \sum_{z=2}^{Z} \frac{2v+2}{2^{z-2} \cdot |E^{(i'(2))}_{\min}|} && \text{by Equation 7} \\
&\leq 4 + \sum_{z=2}^{Z} \frac{4}{2^{z-2}} && \text{by Equation 6 and } |E^{(i'(2))}_{\min}| \geq 1 \\
&< 12.
\end{aligned}$$
Thus, we have $\kappa_j(v) = O(1)$. This yields $\kappa_j = \sum_{v=0}^{k} \kappa_j(v) = O(k)$. Since $\sum_{j=1}^{n} \kappa_j \leq n \cdot \max_{i \in \{1,\ldots,n\}} \kappa_i$, we have
$$\sum_{i=1}^{n-1} \sum_{v=0}^{k} C_{i,v} = \sum_{j=1}^{n} \kappa_j \leq n \cdot O(k) = O(nk).$$

Therefore, by Lemma 5, we have $C = O(nk) + \sum_{i=1}^{n-1} \sum_{v=0}^{k} C_{i,v} = O(nk)$. Thus, our algorithm runs in total time $O(n(\log n + k) + o)$.

SMT(k) problem

There is a simple and elegant algorithm that solves the

MinCost problem for standard $k$-of-$n$ testing, due to Salloum, Breuer, and (independently) Ben-Dov [1, 10, 11]. It outputs a strategy compactly represented by two permutations, one ordering the processors in increasing order of the ratio $c_i/(1-p_i)$, and the other in increasing order of the ratio $c_i/p_i$. Chang et al. and Salloum and Breuer later gave modified versions of this algorithm that output a less compact, but more efficiently evaluatable representation of the same strategy [3, 12].

We now show how to combine previous techniques to obtain a polynomial-time algorithm for the SMT(k) problem based on the ellipsoid method. The algorithm uses a technique of Deshpande and Hellerstein [6]. They showed that, for 1-of-$n$ testing, an algorithm solving the MinCost problem can be combined with the ellipsoid method to yield an algorithm for the MaxThroughput problem. In fact, as we see in the proof below, their approach is actually a generic one, and can be applied to the testing of other functions.

The ellipsoid-based algorithm for $k$-of-$n$ testing makes use of the dual of the LP for the CMT(k) problem, which is as follows:

Dual of Max-Throughput LP: Given $r_1, \ldots, r_n > 0$ and $p_1, \ldots, p_n \in (0,1)$, find an assignment to the variables $y_i$, for all $i \in \{1, \ldots, n\}$, minimizing
$$F = \sum_{i=1}^{n} r_i y_i$$
subject to the constraints:
(1) $\sum_{i=1}^{n} g(T,i)\, y_i \geq 1$ for all $T \in \mathcal{T}_c$,
(2) $y_i \geq 0$ for all $i \in \{1, \ldots, n\}$.

Theorem 6. There is a polynomial-time algorithm, based on the ellipsoid method, for solving the SMT(k) problem.

Proof. The approach of Deshpande and Hellerstein works as follows. The input consists of the $p_i$ and the $r_i$, and the goal is to solve the MaxThroughput LP in time polynomial in $n$. The number of variables of the MaxThroughput LP is not polynomial, so the LP cannot be solved directly. Instead, the idea is to solve it by first using the ellipsoid method to solve the dual LP. The ellipsoid method is run using an algorithm that simulates a separation oracle for the dual in time polynomial in $n$. During the running of the ellipsoid method, the violated constraints returned by the separation oracle are saved in a set $M$. Each constraint of the dual corresponds to an ordering $T$. When the ellipsoid method terminates, a modified version of the MaxThroughput LP is generated, which includes only the variables $z_T$ corresponding to orderings $T$ in $M$ (i.e., the other variables $z_T$ are set to 0). This modified version can then be solved directly using a polynomial-time LP algorithm. The resulting solution is an optimal solution for the original MaxThroughput LP.

The above approach requires a polynomial-time algorithm for simulating the separation oracle for the dual. Deshpande and Hellerstein's method for simulating the separation oracle relies on the following observations. In the dual LP for the MaxThroughput 1-of-$n$ testing problem, there are $n!$ constraints corresponding to the $n!$ permutations of the processors. The constraint for permutation $\pi$ is $\sum_{i=1}^{n} g(T(\pi),i)\, y_i \geq 1$. If one views $y$ as a vector of costs, where the cost of $i$ is $y_i$, then $\sum_{i=1}^{n} g(T,i)\, y_i$ is the expected cost of testing an item $x$ using ordering $T$. Thus one can determine the ordering $T$ that minimizes $\sum_{i=1}^{n} g(T,i)\, y_i$ by solving the MinCost problem with probabilities $p_1, \ldots, p_n$ and cost vector $y$. (Liu et al.'s approximation algorithm for generic MaxThroughput also relies on this observation [9].)

If the MinCost ordering $T$ has expected cost less than 1, then the constraint it corresponds to is violated. Otherwise, since the right hand side of each constraint is 1, $y$ obeys all constraints. Thus simulating the separation oracle for the dual on input $y$ can be done by first running the MinCost algorithm (with probabilities $p_i$ and costs $y_i$) to find a MinCost ordering $T$. Once $T$ is found, the values of the coefficients $g(T,i)$ are calculated. These are used to calculate $\sum_{i=1}^{n} g(T,i)\, y_i$, the expected cost of $T$. If this value is less than 1, then the constraint $\sum_{i=1}^{n} g(T,i)\, y_i \geq 1$ is returned.

To apply the above approach to MaxThroughput for standard $k$-of-$n$ testing, we observe that in the dual LP for this problem, there is a constraint, $\sum_{i=1}^{n} g(T,i)\, y_i \geq 1$, for every possible strategy $T$. We can simulate a separation oracle for the dual on input $y$ by running a MinCost algorithm for standard $k$-of-$n$ testing. We also need to be able to compute the $g(T,i)$ values for the strategy output by that algorithm. The algorithm of Chang et al. for the

MinCost standard $k$-of-$n$ testing problem is suitable for this purpose, as it can easily be modified to output the $g(T,i)$ values associated with its output strategy $T$ [3].

References

[1] Yosi Ben-Dov. Optimal testing procedure for special structures of coherent systems. Management Science, 27:1410–1420, 1981.
[2] Endre Boros and Tonguç Ünlüyurt. Diagnosing double regular systems. Annals of Mathematics and Artificial Intelligence, 26(1-4):171–191, 1999.
[3] Ming-Feng Chang, Weiping Shi, and W. Kent Fuchs. Optimal diagnosis procedures for k-out-of-n structures. IEEE Transactions on Computers, 39(4):559–564, 1990.
[4] Anne Condon, Amol Deshpande, Lisa Hellerstein, and Ning Wu. Flow algorithms for two pipelined filter ordering problems. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26-28, PODS 2006, Chicago, Illinois, USA, pages 193–202. ACM, 2006.
[5] Anne Condon, Amol Deshpande, Lisa Hellerstein, and Ning Wu. Algorithms for distributional and adversarial pipelined filter ordering problems. ACM Transactions on Algorithms, 5:24:1–24:34, March 2009.
[6] Amol Deshpande and Lisa Hellerstein. Parallel pipelined filter ordering with precedence constraints. ACM Transactions on Algorithms, 8(4):41, 2012.
[7] Michael R. Garey. Optimal task sequencing with precedence constraints. Discrete Mathematics, 4:37–56, 1973.
[8] Murali S. Kodialam. The throughput of sequential testing. In Proceedings of Integer Programming and Combinatorial Optimization, 8th International IPCO Conference, Utrecht, The Netherlands, June 13-15, 2001, volume 2081 of Lecture Notes in Computer Science, pages 280–292. Springer, 2001.
[9] Zhen Liu, Srinivasan Parthasarathy, Anand Ranganathan, and Hao Yang. A generic flow algorithm for shared filter ordering problems. In Proceedings of the Twenty-Seventh ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2008, June 9-11, 2008, Vancouver, BC, Canada, pages 79–88. ACM, 2008.
[10] Salam Salloum. Optimal testing algorithms for symmetric coherent systems. PhD thesis, University of Southern California, 1979.
[11] Salam Salloum and Melvin A. Breuer. An optimum testing algorithm for some symmetric coherent systems. Journal of Mathematical Analysis and Applications, 101(1):170–194, 1984.
[12] Salam Salloum and Melvin A. Breuer. Fast optimal diagnosis procedures for k-out-of-n:G systems. IEEE Transactions on Reliability, 46(2):283–290, June 1997.
[13] Neil Sarnak and Robert Endre Tarjan. Planar point location using persistent search trees. Communications of the ACM, 29(7):669–679, 1986.
[14] Tonguç Ünlüyurt. Sequential testing of complex systems: a review.