On optimal policy in the group testing with incomplete identification
Yaakov Malinovsky
Department of Mathematics and Statistics
University of Maryland, Baltimore County, Baltimore, MD 21250, USA
October 2, 2018
Abstract
Consider a very large (infinite) population of items, where each item, independently of the others, is defective with probability $p$ or good with probability $q = 1 - p$. The goal is to identify $N$ good items as quickly as possible. The following group testing policy (policy A) is considered: test items together in groups; if the test outcome of group $i$ of size $n_i$ is negative, accept all items in this group as good; otherwise discard the group. Then move to the next group and continue until exactly $N$ good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain $N$ good items is minimal. Recently, Gusev (2012) found an optimal group testing configuration under the assumptions of constant group size and $N = \infty$. In this note, an optimal solution under policy A for finite $N$ is provided.

Keywords: Dynamic programming; Optimal design; Partition problem; Schur-convexity
Introduction and problem formulation
Consider a subset of $x$ items, where each item is defective with probability $p$ and good with probability $q = 1 - p$, independently of the other items. Following the accepted notation in the group testing literature, we call this model a binomial model (Sobel and Groll, 1959). A group test applied to the subset of $x$ items is a binary test with two possible outcomes, positive or negative. The outcome is negative if all $x$ items are good, and positive if at least one of the $x$ items is defective.

In 1943, Robert Dorfman introduced the concept of group testing, motivated by the need to administer syphilis tests to a very large number of individuals drafted into the U.S. Army during World War II. The goal was complete identification of all drafted people. The Dorfman procedure (Dorfman, 1943) is a two-stage procedure: the group is tested in the first stage and, if the outcome is positive, individual testing is performed in the second stage. If the group test outcome is negative in the first stage, then all items in the group are accepted as good. In this simple procedure, the saving of time may be substantial, especially for small values of $p$. For example, if $p = 0.01$, the reduction in the expected number of tests compared with individual testing is about 80%.

Since Dorfman's work, group testing has found widespread applications, from communication networks (Wolf, 1985) to DNA and blood screening (Du and Hwang, 2006; Bar-Lev et al., 2017). To date, an optimal group testing procedure for complete identification under the binomial model is unknown for $p < (3 - \sqrt{5})/2$. For $p \ge (3 - \sqrt{5})/2$, individual testing is optimal (Ungar, 1960).

The incomplete identification problem was introduced by Bar-Lev et al. (1990) and extended by Bar-Lev et al. (1995). In their model, a demand $D$ of good items should be fulfilled by purchasing two kinds of items. The first kind is 100% quality items with purchasing cost $s$ per unit, and the second kind is $100q\%$ quality items with purchasing cost $c$ per unit. In addition, there is a cost $K$ for each group test of items of $100q\%$ quality, regardless of the size of the tested group. Under these constraints and assumptions, the authors found an optimal number of $100q\%$ quality items to purchase (once) and an optimal group size, chosen from the purchased items, at each stage of the testing procedure. Their problem is related to the one we discuss here, but with different assumptions and constraints.

Consider the binomial model with a very large (infinite) population of items. The goal is to identify $N$ good items as quickly as possible. This is an incomplete identification problem. We consider the following group testing policy (policy A): test items together in groups; if the test outcome of group $i$ of size $n_i$ is negative, accept all items in this group as good; otherwise discard the group. Then move to the next group and continue until exactly $N$ good items are found. The goal is to find an optimal testing configuration, i.e., group sizes, under policy A, such that the expected waiting time to obtain $N$ good items is minimal.

In recent work (Gusev, 2012), the incomplete identification problem was considered. Policy A was applied under the assumptions $N = \infty$ and a constant group size. The author found an optimal group size as a function of $q$. It can be explained as follows. Each time a group of size $n$ is tested, if the test outcome is negative, all $n$ items in the group are accepted as good; otherwise the group is discarded and the next group of size $n$ is taken, and so on.
The waiting time (the number of tests until the first good group) is a geometric random variable with expectation $1/q^n$. Therefore, the mean waiting time per good item is $1/(nq^n)$. We want to minimize this quantity, which is equivalent to maximizing the function $\mu(n, q) = nq^n$. This function is log-concave, and hence unimodal, as a function of the continuous variable $n$. However, since the feasible solution must be an integer, the maximizer is not necessarily unique. In the proposition below we present a slightly modified version of the result of Gusev (2012), who found the optimal group size as a function of $q$. We also follow the accepted notation in the group testing literature and denote by $p$ the probability of being defective, which differs from the notation of Gusev (2012).

Proposition 1 (Gusev, 2012). Define $n^{**} = 1/\ln(1/q)$. Under policy A with constant group size and $N = \infty$, the optimal group size for $q \ge 1/2$ is

$$
n^* = \begin{cases}
n^{**} & \text{if } n^{**} \text{ is an integer},\\
\lfloor n^{**} \rfloor & \text{if } \mu(\lfloor n^{**} \rfloor, q) > \mu(\lceil n^{**} \rceil, q),\\
\lceil n^{**} \rceil & \text{if } \mu(\lfloor n^{**} \rfloor, q) < \mu(\lceil n^{**} \rceil, q),\\
\lfloor n^{**} \rfloor \text{ or } \lceil n^{**} \rceil & \text{if } \mu(\lfloor n^{**} \rfloor, q) = \mu(\lceil n^{**} \rceil, q),
\end{cases} \tag{1}
$$

where $\lfloor x \rfloor$ ($\lceil x \rceil$) for $x > 0$ is defined as the largest (smallest) integer that is smaller (larger) than or equal to $x$. For $q < 1/2$, the optimal group size $n^*$ equals 1.

Comment 1 (Cut-off point). There is an analogy with Ungar's cut-off point for complete identification. It seems that for $N = 2$, policy A with a group of size 2 is the only reasonable policy for the incomplete identification problem. For $N = 2$, policy A is better than individual testing if the expected waiting time $1/q^2$ is less than the expected waiting time $2/q$ under individual testing, i.e., if $q > 1/2$. Now, following Ungar (1960), adapted to the incomplete identification case, one can show that if $q < 1/2$, then individual testing is optimal among all possible strategies for any $N$. In the boundary case $q = 1/2$, individual testing is an optimal strategy.
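The rule in Proposition 1 can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names `mu` and `optimal_constant_group_size` are hypothetical, and the tie in (1) is detected with a floating-point tolerance, which is an assumption of this illustration.

```python
import math


def mu(n: int, q: float) -> float:
    """mu(n, q) = n * q**n, the expected number of good items accepted per test."""
    return n * q ** n


def optimal_constant_group_size(q: float) -> list[int]:
    """Return the optimal constant group size(s) n* of Proposition 1.

    A list is returned because the integer maximizer of mu may not be unique.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if q < 0.5:
        return [1]  # individual testing
    n_cont = 1.0 / math.log(1.0 / q)  # continuous maximizer n**
    lo, hi = math.floor(n_cont), math.ceil(n_cont)
    if lo == hi:
        return [lo]
    m_lo, m_hi = mu(lo, q), mu(hi, q)
    # Assumption of this sketch: values within 1e-12 relative error count as a tie.
    if math.isclose(m_lo, m_hi, rel_tol=1e-12):
        return [lo, hi]
    return [lo] if m_lo > m_hi else [hi]
```

For instance, `optimal_constant_group_size(0.99)` reports both 99 and 100, since $99 q^{99} = 100 q^{100}$ when $q = 0.99$; more generally the tie occurs whenever $q = n/(n+1)$ for an integer $n$.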
The problem formulation: Finite N

Under policy A, we are interested in finding an optimal partition $\{m_1, \ldots, m_J\}$ with $m_1 + \ldots + m_J = N$ for some $J \in \{1, \ldots, N\}$ such that the expected total waiting time to obtain $N$ good items is minimal, i.e.,

$$
\{m_1, \ldots, m_J\} = \arg\min_{n_1, \ldots, n_I} \left\{ \frac{1}{q^{n_1}} + \ldots + \frac{1}{q^{n_I}} \right\}, \quad \text{subject to } \sum_{i=1}^{I} n_i = N, \; I \in \{1, \ldots, N\}. \tag{2}
$$

Denote by $n$ ($n = 1, \ldots, N$) the number of good items that remain to be identified, and by $H(n)$ the optimal expected total time to obtain $n$ good items. Then, if we test a group of size $x$ ($x = 1, \ldots, n$), one test is spent; with probability $q^x$ the group is accepted and $n - x$ items remain, and with probability $1 - q^x$ the group is discarded and $n$ items remain. Hence

$$
H(n) = 1 + q^x H(n - x) + (1 - q^x) H(n), \quad n = 2, \ldots, N; \; x = 1, \ldots, n, \tag{3}
$$

where $H(0) = 0$, $H(1) = 1/q$.

Solving (3) for $H(n)$ and minimizing over $x$, we obtain the dynamic programming (DP) algorithm:

$$
H(0) = 0, \quad H(1) = \frac{1}{q}, \qquad H(n) = \min_{x = 1, \ldots, n} \left\{ \frac{1}{q^x} + H(n - x) \right\}, \quad n = 2, \ldots, N. \tag{4}
$$

The computational complexity of calculating $H(N)$ is $O(N^2)$.

We present below two examples that help to illustrate how subgroup sizes may differ.

Example 1. $N = 250$, $p = 0.01$. In this case $n^* = 99$ or $100$. The DP algorithm under policy A gives the unique (up to permutation) solution $n_1 = 83$, $n_2 = 83$, $n_3 = 84$, with expected waiting time approximately 6.93.

Example 2. $N = 220$, $p = 0.01$. Then the optimal solution is $n_1 = n_2 = 110$, with expected waiting time approximately 6.04.

Both examples suggest that under an optimal policy A, subgroup sizes differ by at most one unit. In the following proposition we prove this conjecture. This result also allows us to reduce the computational complexity of calculating $H(N)$ from $O(N^2)$ to $O(N)$.

Proposition 2.
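For given $I$ and $n_1, \ldots, n_I$, there exist $n^*_1, \ldots, n^*_I$ with $|n^*_i - n^*_j| \le 1$ for all $i, j = 1, \ldots, I$, and $\sum_{i=1}^{I} n_i = \sum_{i=1}^{I} n^*_i$, such that

$$
\frac{I}{q^{\bar{n}_I}} \le \frac{1}{q^{n^*_1}} + \ldots + \frac{1}{q^{n^*_I}} \le \frac{1}{q^{n_1}} + \ldots + \frac{1}{q^{n_I}}, \tag{5}
$$

where $\bar{n}_I = \dfrac{n_1 + \ldots + n_I}{I}$.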
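Recursion (4) and the balanced partitions of Proposition 2 can be compared numerically. The sketch below is an illustration under stated assumptions rather than the author's implementation: the function names are hypothetical, the recursion is started from $H(0) = 0$ (so that $H(1) = 1/q$), and each power $q^{-x}$ is treated as a constant-time operation when counting complexity.

```python
def dp_optimal_partition(N: int, q: float) -> tuple[float, list[int]]:
    """Recursion (4): H(n) = min over x of { q**(-x) + H(n - x) }, H(0) = 0.

    Returns (H(N), sorted optimal group sizes). Runs in O(N^2) steps.
    """
    H = [0.0] * (N + 1)      # H[n]: minimal expected number of tests for n good items
    first = [0] * (N + 1)    # first[n]: a minimizing first group size for H[n]
    for n in range(1, N + 1):
        H[n], first[n] = min((q ** (-x) + H[n - x], x) for x in range(1, n + 1))
    sizes, n = [], N
    while n > 0:             # backtrack through the stored minimizers
        sizes.append(first[n])
        n -= first[n]
    return H[N], sorted(sizes)


def best_balanced_partition(N: int, q: float) -> tuple[float, list[int]]:
    """Scan I = 1..N using only partitions whose sizes differ by at most one.

    Returns (expected total waiting time, sorted group sizes). Runs in O(N) steps.
    """
    best_cost, best_I = float("inf"), 0
    for I in range(1, N + 1):
        base, extra = divmod(N, I)  # I - extra groups of size base, extra of size base + 1
        cost = (I - extra) * q ** (-base) + extra * q ** (-(base + 1))
        if cost < best_cost:
            best_cost, best_I = cost, I
    base, extra = divmod(N, best_I)
    return best_cost, [base] * (best_I - extra) + [base + 1] * extra
```

Calling both functions with `N = 250` and `q = 0.99` should give the same sizes, 83, 83 and 84, and the same expected waiting time up to rounding, which is the agreement that Proposition 2 asserts in general.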
Proof.
The function $f(x_1, \ldots, x_I) = \dfrac{1}{q^{x_1}} + \ldots + \dfrac{1}{q^{x_I}}$ is Schur-convex on the finite support as a function of the continuous variables $x_1, \ldots, x_I$. Since $(\bar{n}_I, \ldots, \bar{n}_I) \prec (n^*_1, \ldots, n^*_I) \prec (n_1, \ldots, n_I)$, where '$\prec$' denotes majorization (see Steele (2004)), the Schur-convexity of $f$ yields $f(\bar{n}_I, \ldots, \bar{n}_I) \le f(n^*_1, \ldots, n^*_I) \le f(n_1, \ldots, n_I)$, and Proposition 2 follows.

We can use Proposition 2 to solve problem (2) with computational complexity $O(N)$ in the following way. Fix $I \in \{1, \ldots, N\}$. If $\bar{n}_I = N/I$ is an integer, then by (5) the partition into $I$ groups of size $N/I$ is optimal for this fixed number of groups, i.e., it minimizes the expected total time. Otherwise, any partition with $|n^*_i - n^*_j| \le 1$ for all $i, j = 1, \ldots, I$, and $\sum_{i=1}^{I} n^*_i = N$ is optimal. Since this step has to be repeated for $I = 1, I = 2, \ldots, I = N$, the computational complexity is $O(N)$.

Finally, in the spirit of Gilstein (1985), we present an optimal solution for optimization problem (2). The solution uses the value of $n^*$ and reduces optimization problem (2) so that at most two partitions have to be considered. We then evaluate the expected total waiting time for each of these partitions and choose the one with the minimal expectation.

Theorem 1.
Suppose $q > 1/2$. In the case of a non-unique solution in (1), choose $n^* = \lfloor n^{**} \rfloor$. Denote $s = \lfloor N/n^* \rfloor$ and $\theta = N - s n^*$. Then an optimal partition $\{m_1, \ldots, m_J\}$ under policy A, i.e., an optimal solution of (2), is the following:

(i) If $\theta = 0$, then $m_1 = \ldots = m_J = n^*$.

(ii) If $1 \le \theta \le s$ and $n^*$ in (1) is not unique, then an optimal partition is $m_1, \ldots, m_s$ with $m_i = \lfloor n^{**} \rfloor$ or $m_i = \lceil n^{**} \rceil$ for all $i = 1, \ldots, s$.

(iii) Otherwise, an optimal partition is one of the following:

(a) Distribute $\theta$ among the $s$ groups (with initial size $n^*$) in such a way that $|m_i - m_j| \le 1$ for all $i, j \in \{1, \ldots, s\}$.

(b) Build an additional group (group $s + 1$) by taking the remainder $\theta$ and units from the above $s$ groups (with initial size $n^*$) in such a way that $|m_i - m_j| \le 1$ for all $i, j \in \{1, \ldots, s + 1\}$.

Proof. (i) Follows from the convexity of the function $1/q^x$ as a function of the continuous variable $x$ for fixed $q$.

(ii) Again using convexity, we have

$$
\frac{1}{q^{\lfloor n^{**} \rfloor}} + \frac{1}{q^{\lceil n^{**} \rceil}} < \frac{1}{q^{n_1}} + \frac{1}{q^{n_2}}
$$

for any $n_1, n_2$ such that $n_1 + n_2 = \lfloor n^{**} \rfloor + \lceil n^{**} \rceil$ and $\{n_1, n_2\} \ne \{\lfloor n^{**} \rfloor, \lceil n^{**} \rceil\}$, and the result follows.

(iii) From Proposition 2 we know that under an optimal policy A, subgroup sizes differ by at most one unit. Consider case (a), i.e., $s$ groups with $s = u_1 + u_2$, where $u_1$ is the number of subgroups of size $n^* + t$, for some $t \ge 0$, and $u_2$ is the number of subgroups of size $n^* + t + 1$, such that $u_1 (n^* + t) + u_2 (n^* + t + 1) = N$. Suppose that we partition $N$ into fewer subgroups, say $s'$ with $s' < s$. Suppose that $s' = v_1 + v_2$, where $v_1$ is the number of subgroups of size $n^* + j$, for some $j > t \ge 0$, and $v_2$ is the number of subgroups of size $n^* + j + 1$, such that $v_1 (n^* + j) + v_2 (n^* + j + 1) = N$. Denote $f(x) \equiv f(x, q) = 1/q^x$. For fixed $q$, the function $f(x)/x = 1/(x q^x)$ is a convex function of the continuous variable $x$ whose integer minimizer is $n^*$. Therefore, we have

$$
\frac{f(n^* + t)}{n^* + t} < \frac{f(n^* + t + 1)}{n^* + t + 1} \le \frac{f(n^* + j)}{n^* + j} < \frac{f(n^* + j + 1)}{n^* + j + 1}. \tag{6}
$$

Since both partitions contain $N$ items in total, (6) gives the inequality

$$
u_1 f(n^* + t) + u_2 f(n^* + t + 1) < v_1 f(n^* + j) + v_2 f(n^* + j + 1).
$$

Therefore, a partition into fewer than $s$ subgroups cannot be optimal. Consider case (b): similar arguments lead to the conclusion that a partition into more than $s + 1$ subgroups cannot be optimal.

In this work, we provide an optimal solution under policy A for an incomplete identification problem. It is a natural complement to the recent investigation by Gusev (2012). There is an interesting open question: is policy A an optimal policy overall for the incomplete identification problem when sampling from an infinite population with finite demand $N$? It is also important to note that we assume that the parameter $p$ is known. In many practical situations, the parameter is unknown or only limited knowledge is available, such as an upper or lower bound. In this case, the Bayesian methodology from the complete identification case (Sobel and Groll, 1966) or the minimax method (Malinovsky and Albert, 2015) can be adopted. Another possible direction for investigation is to remove the assumption that the tests are error-free. In this case, the expected total waiting time cannot be used as the only criterion for comparison among group testing procedures, and additional criteria must be considered (Bar-Lev et al., 2006; Malinovsky et al., 2016). We do not attempt to investigate erroneous tests in the current work and leave this direction for future investigations.

Acknowledgement
The author thanks the editor for his time and advice.
References
Bar-Lev, S. K., Boneh, A., Perry, D. (1990). Incomplete identification models for group-testable items. Nav. Res. Logist. 37, 647–659.
Bar-Lev, S. K., Parlar, M., Perry, D. (1995). Optimal sequential decisions for incomplete identification of group-testable items. Sequential Anal. 14, 41–57.
Bar-Lev, S. K., Stadje, W., Van der Duyn Schouten, F. A. (2006). Group testing procedures with incomplete identification and unreliable testing results. Applied Stochastic Models in Business and Industry 22, 281–296.
Bar-Lev, S. K., Boxma, O., Kleiner, I., Perry, D. (2017). Recycled incomplete identification procedures for blood screening. Eur. J. Oper. Res. 259, 330–343.
Dorfman, R. (1943). The detection of defective members of large populations. Ann. Math. Statist. 14, 436–440.
Du, D., Hwang, F. K. (2006). Pooling Design and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific, Singapore.
Gilstein, C. Z. (1985). Optimal partitions of finite populations for Dorfman-type group testing. J. Stat. Plan. Inf. 12, 385–394.
Gusev, A. L. (2012). The optimal number of items in a group for group testing. Statist. Probab. Lett. 82, 2083–2085.
Hwang, F. K. (1976). An optimal nested procedure in binomial group testing. Biometrics 32, 939–943.
Malinovsky, Y., Albert, P. S. (2015). A note on the minimax solution for the two-stage group testing problem. The American Statistician 69, 45–52.
Malinovsky, Y., Albert, P. S. (2017). Revisiting nested group testing procedures: new results, comparisons, and robustness. The American Statistician. In press.
Malinovsky, Y., Albert, P. S., Roy, A. (2016). Reader Reaction: A Note on the Evaluation of Group Testing Algorithms in the Presence of Misclassification. Biometrics 72, 299–302.
Sobel, M., Groll, P. A. (1959). Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Tech. J. 38, 1179–1252.
Sobel, M., Groll, P. A. (1966). Binomial group-testing with an unknown proportion of defectives. Technometrics 8, 631–656.
Steele, J. M. (2004). The Cauchy-Schwarz master class. Cambridge University Press.
Sterrett, A. (1957). On the detection of defective members of large populations. Ann. Math. Statist. 28, 1033–1036.
Ungar, P. (1960). Cutoff points in group testing. Comm. Pure Appl. Math. 13, 49–54.
Wolf, J. K. (1985). Born again group testing: multiaccess communications. IEEE Trans. Inf. Theory 31, 185–191.