# Nested Group Testing Procedures for Screening


Yaakov Malinovsky ∗ and Paul S. Albert † February 19, 2021

Abstract

This article reviews a class of adaptive group testing procedures that operate under a probabilistic model assumption as follows. Consider a set of $N$ items, where item $i$ has the probability $p$ ($p_i$ in the generalized group testing) to be defective, and the probability $1 - p$ to be non-defective, independent from the other items. A group test applied to any subset of size $n$ is a binary test with two possible outcomes, positive or negative. The outcome is negative if all $n$ items are non-defective, whereas the outcome is positive if at least one item among the $n$ items is defective. The goal is complete identification of all $N$ items with the minimum expected number of tests.

Keywords: Dynamic programming; Disease screening; Information theory; Partition problem; Optimal design

∗ Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD 21250, USA

† Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA. The work was supported by the National Cancer Institute Intramural Program.

In the last few months of the COVID-19 pandemic, the mostly-forgotten practice of group testing has been raised again in many countries as an efficient method for addressing an epidemic while facing restrictions of time and resources. During this very short period, numerous publications and reports have appeared in both scientific and non-scientific journals. The reader can easily find them, for example, in the Washington Post, NY Times, Scientific American, Science Advances, medRxiv, bioRxiv, ArXiv, and elsewhere.

The story goes back to 1943, when Robert Dorfman published a manuscript where he introduced the concept of group testing in response to the need to administer syphilis tests to millions of individuals drafted into the U.S. Army during World War II (Dorfman, 1943).

"A large number, $N$, of people are subject to a blood test. This can be administered in two ways. (i) Each person is tested separately. In this case $N$ tests are required. (ii) The blood samples of $k$ people can be pooled and analyzed together. If the test is negative, this one test suffices for the $k$ people. If the test is positive, each of the $k$ persons must be tested separately, and all $k + 1$ tests are required for the $k$ people. Assume the probability $p$ that the test is positive is the same for all and that people are stochastically independent."

Procedure (ii) is commonly referred to as the Dorfman ($D$) two-stage group testing (GT) procedure.

In general, the above setting assumes a probabilistic model where there are $N$ individuals to be tested, test outcomes are independent, and each individual has the same probability $p$ to be infected (Binomial model). In this article, we will discuss different GT procedures under this model, where comparisons are done based on the expected number of tests. The fundamental result by Peter Ungar (Ungar, 1960) shows that if $p > p_u = (3 - \sqrt{5})/2 \approx 0.38$, then individual one-by-one testing is the optimal procedure, and if $p < p_u$, then it is not optimal. However, it is important to note that despite 80 years' worth of research effort, the optimal procedure is yet unknown for $p < p_u$ and a general $N$.

This article deals with nested GT procedures, of which the Dorfman (1943) procedure is a member. A nested algorithm has the property that if a positive subset $I$ is identified, the next subset $I_1$ that we will test is a proper subset of $I$, that is, $I_1 \subset I$. This natural class of GT procedures was defined by Sobel and Groll (1959) and Sobel (1960). Procedure $D$ belongs to this class with the restriction that up to two stages are required.

Homogeneous p case

Recall that we prefer procedure/design $A$ over procedure/design $B$ if $E_A \le E_B$, where $E$ is the corresponding expected number of tests.

In a group of size $k \ge 2$, the total number of tests is 1 with probability $q^k$ ($q = 1 - p$) and $k + 1$ with probability $1 - q^k$. Therefore, the expected number of tests per person in Procedure $D$ is

$$E_D(k, p) = 1 - q^k + \frac{1}{k} \quad \text{for } k \ge 2,$$

and equals 1 otherwise. Dorfman numerically found the optimum group size $k$ when assuming that a population is large (infinite) and $p$ is fixed. For example, if $p = 0.01$, then the optimal group size is 11. Specifically, this means that for testing $N = 999{,}999$ individuals, we need only 195,571 tests in expectation. Samuels (1978) showed that an optimal value of $k$, $k^*_D(p)$, is a non-increasing function of $p$, which is 1 for $p > 1 - (1/3)^{1/3} \approx 0.31$ and otherwise is either $1 + [p^{-1/2}]$ or $2 + [p^{-1/2}]$, where $[x]$ denotes the integer part of $x$.

There is a logical inconsistency in Procedure $D$. It is clear that any "reasonable" group testing plan should satisfy the following property: "A test is not performed if its outcome can be inferred from previous test results" (Ungar, 1960). Procedure $D$ does not satisfy this property since if the group is positive and all but the last person are negative, the last person is still tested. The modified Dorfman procedure, which we define as $D'$, would not test the last individual in that case (Sobel and Groll, 1959). Also, Procedure $D'$ will be preferred over individual testing if $p$ is below Ungar's cut-off point (Malinovsky and Albert, 2019). For $k \ge 2$, the total number of tests is 1 with probability $q^k$, $k$ with probability $q^{k-1}(1 - q)$, and $k + 1$ with probability $1 - q^{k-1}$. Therefore, the expected number of tests per person in a group of size $k$ under Procedure $D'$ is

$$E_{D'}(k, p) = 1 - q^k + \frac{1}{k} - \frac{1}{k}(1 - q)\,q^{k-1}.$$

Pfeifer and Enis (1978) showed that an optimal value $k^*_{D'}$ is the smallest $k$ value that satisfies $E_{D'}(k, p) \le E_{D'}(k - 1, p)$ and $E_{D'}(k, p) < E_{D'}(k + 1, p)$. It was conjectured and empirically verified in Malinovsky and Albert (2019) that the optimal group size $k^*_{D'}(p)$ is equal to either $\lfloor p^{-1/2} \rfloor$ or $\lceil p^{-1/2} \rceil$.

Sterrett (1957) realized that one can improve the efficiency of the GT procedure by a sequential modification of Procedure $D'$. If in the first stage of Procedure $D'$ a group is positive, then individuals are tested one-by-one until the first positive individual is identified, or until all but the last person are negative; in the latter case, the positivity of the last individual follows from the positivity of the group. Otherwise, if the first individual identified as positive is not the last in the group, then the first stage of Procedure $D'$ is applied to the remaining (nonidentified) individuals. This process is repeated until all individuals are identified. A simple closed-form expression for the expected number of tests per person was provided by Sobel and Groll (1959):

$$E_S(k, p) = \frac{1}{k}\left[2k - (k - 2)\,q - \frac{1 - q^{k+1}}{1 - q}\right].$$

It was conjectured and empirically verified in Malinovsky and Albert (2019) that for $0 < p < p_u$ the optimal group size $k^*_S(p)$ is equal to $\lfloor \sqrt{2/p} \rfloor$ or $\lfloor \sqrt{2/p} \rfloor + 1$ or $\lfloor \sqrt{2/p} \rfloor + 2$. Using the above conjectures, one can verify that

$$\lim_{p \downarrow 0} \frac{E_D(k^*_D, p)}{E_S(k^*_S, p)} = \sqrt{2}.$$

Some extensions of the Sterrett ($S$) procedure were presented in Johnson et al. (1991).
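The three per-person expressions for Procedures D, D′ and S can be checked numerically. The following is a minimal Python sketch, not code from the article: the function names and the search limit `k_max` are illustrative assumptions, and the optimal group size is found by brute-force search rather than by the characterizations of Samuels (1978) or Pfeifer and Enis (1978).

```python
def expected_dorfman(k, p):
    """Expected tests per person under Procedure D with group size k."""
    q = 1.0 - p
    # For k = 1 the group test is the individual test, so the value is 1.
    return 1.0 if k < 2 else 1.0 - q**k + 1.0 / k


def expected_modified_dorfman(k, p):
    """Expected tests per person under Procedure D' (last test inferred)."""
    q = 1.0 - p
    if k < 2:
        return 1.0
    return 1.0 - q**k + 1.0 / k - (1.0 / k) * (1.0 - q) * q**(k - 1)


def expected_sterrett(k, p):
    """Expected tests per person under Procedure S (Sobel-Groll closed form)."""
    q = 1.0 - p
    # (1 - q^(k+1)) / (1 - q) written as a finite geometric sum,
    # which avoids dividing by 1 - q.
    geometric_sum = sum(q**j for j in range(k + 1))
    return (2 * k - (k - 2) * q - geometric_sum) / k


def best_group_size(expected, p, k_max=500):
    """Brute-force search for the group size minimizing tests per person.

    k_max is an arbitrary search limit chosen for this sketch.
    """
    return min(range(1, k_max + 1), key=lambda k: expected(k, p))


p = 0.01
k_d = best_group_size(expected_dorfman, p)
print(k_d, round(999_999 * expected_dorfman(k_d, p)))  # 11 195571
print(best_group_size(expected_modified_dorfman, p))
print(best_group_size(expected_sterrett, p))
```

For $p = 0.01$ the search reproduces the group size 11 and the 195,571 expected tests quoted for Procedure D. The last two printed values can be compared with the conjectured ranges for $k^*_{D'}$ and $k^*_S$.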

Finite Versus Infinite Population

A finite population of size $N$ is not necessarily divisible by $k$. Therefore, for a finite population of size $N$ and a given Procedure $A \in \{D, D', S\}$, we have to solve the following optimization problem: find the optimal partition $\{n_1, \ldots, n_I\}$ with $n_1 + \cdots + n_I = N$ for some $I \in \{1, \ldots, N\}$ such that the total expected number of tests under Procedure $A$ is minimal. A common method to solve such an optimization problem is dynamic programming (DP) (Sobel and Groll, 1959). It was conjectured by Lee and Sobel (1972) that for Procedure $D$, the optimal partition subgroup sizes differ at most by one unit. Gilstein (1985) proved a similar result for Procedure $D'$, and Malinovsky and Albert (2019) for Procedure $S$. Procedures $D$, $D'$, and $S$ fit into a larger class of procedures discussed below.

An Optimal Hierarchical Procedure

The hierarchical class procedure was introduced by Sobel and Groll (1959) and defined as follows (see also Hwang et al. (1981)): A procedure is in the hierarchical class (HC) if two units are only tested together in a group if they have an identical test history, i.e. if each previous group test contains either both of them or none of them.

It follows from this definition that a procedure in the HC is similar to the multistage Dorfman procedure. An optimal hierarchical procedure was obtained by Sobel and Groll (1959) as a dynamic programming algorithm with computational cost $O(N)$, which they called Procedure $R_2$. This was recently computationally improved by Zimmerman (2017) (see also Malinovsky (2019a) for a discussion).

An Optimal Nested Procedure

This class of GT procedures was defined by Sobel and Groll (1959) and Sobel (1960, 1967). A nested procedure requires that between any two successive tests, the $n$ units not yet classified have to be separated into only (at most) two sets. One set of size $m \ge 0$, called the "defective set," is known to contain at least one defective unit if $m \ge 1$. The other set, of size $n - m \ge 0$, is called the "binomial set." The number of possible nested algorithms is very large: for example, if $N = 5$, then there are 235,200 possible algorithms (Moon and Sobel, 1977). Sobel and Groll (1959) overcame this problem by proposing a DP algorithm that finds the optimal nested algorithm, which Sobel and Groll termed "Procedure $R_1$." There was a large research effort to reduce the $O(N)$ computational complexity of the original proposed algorithm (Sobel, 1960; Kumar and Sobel, 1971; Hwang, 1976a; Yao and Hwang, 1990). Zaman and Pippenger (2016) provided an asymptotic analysis of the optimal nested procedure; see also Malinovsky and Albert (2019) for discussion.

In addition, the connection of group testing with noiseless-coding theory was presented in the group testing literature by Sobel and Groll (1959) and further investigated in Sobel (1960, 1967). In particular, for $N = 2$ the procedures $D'$, $S$, $R_2$, and $R_1$ coincide and are the optimal GT procedures. The optimality follows from the fact that for $N = 2$ they are equivalent to the optimal prefix Huffman code (Huffman, 1952) with the expected length $L(N)$. In general, for any $N$, $L(N)$ can serve as a theoretical lower bound for the expected number of tests of an optimal GT procedure; however, the complexity of calculation of $L(N)$ is $O\left(2^N \log 2^N\right)$. Therefore, even for small $N$, obtaining the exact value of $L(N)$ is impossible. A well-known noiseless coding theorem provides the information theory bounds for $L(N)$ as $H(p) \le L(N) \le H(p) + 1$, where

$$H(p) = N\left[p \log_2 \frac{1}{p} + q \log_2 \frac{1}{q}\right]$$

is the Shannon entropy. For a comprehensive discussion, see Katona (1973).

Below, we compare Procedures $D'$, $S$, Optimal Hierarchical ($R_2$), and Optimal Nested ($R_1$) for different $p$ with respect to the expected number of tests for $N = 100$.
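As a rough illustration of how finite-population values for Procedures D′ and S, and the entropy bound $H(p)$, can be computed for $N = 100$, the sketch below runs a simple dynamic program over group sizes. It is a hedged example rather than the algorithm of Gilstein (1985) or Malinovsky and Albert (2019): the function names are hypothetical, and the recursion only uses the facts that items are exchangeable and that the expected number of tests is additive across groups.

```python
import math


def group_cost_modified_dorfman(k, p):
    """Expected total tests for one group of size k under Procedure D'."""
    q = 1.0 - p
    # k times the per-person value 1 - q^k + 1/k - (1/k)(1 - q) q^(k-1);
    # for k = 1 this expression equals 1.
    return k * (1.0 - q**k) + 1.0 - (1.0 - q) * q**(k - 1)


def group_cost_sterrett(k, p):
    """Expected total tests for one group of size k under Procedure S."""
    q = 1.0 - p
    geometric_sum = sum(q**j for j in range(k + 1))  # (1 - q^(k+1)) / (1 - q)
    return 2 * k - (k - 2) * q - geometric_sum


def optimal_partition_cost(n_items, p, group_cost):
    """Minimal expected tests over all partitions of n_items into groups.

    best[n] is the optimum for n items; the last group has some size k,
    and the remaining n - k items are partitioned optimally.
    """
    best = [0.0] * (n_items + 1)
    for n in range(1, n_items + 1):
        best[n] = min(best[n - k] + group_cost(k, p) for k in range(1, n + 1))
    return best[n_items]


def entropy_bound(n_items, p):
    """Shannon entropy H(p) of n_items independent Bernoulli(p) items, in bits."""
    q = 1.0 - p
    return n_items * (p * math.log2(1.0 / p) + q * math.log2(1.0 / q))


n, p = 100, 0.01
print(round(optimal_partition_cost(n, p, group_cost_modified_dorfman), 3))
print(round(optimal_partition_cost(n, p, group_cost_sterrett), 3))
print(round(entropy_bound(n, p), 3))
```

For $p = 0.01$ these values can be compared with the corresponding row of Table 1 (19.470 for $D'$, 15.181 for $S$, and 8.079 for $H(p)$).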
For Procedures $D'$ and $S$, the optimal configuration for finite population was found in Gilstein (1985) and in Malinovsky and Albert (2019), respectively.

Table 1: The minimal (optimal) expected number of tests per 100 individuals for Procedures $D'$, $S$, Hierarchical ($R_2$), and Nested ($R_1$) for different $p$.

| $p$ | $D'$ | $S$ | $R_2$ | $R_1$ | $H(p)$ |
|---|---|---|---|---|---|
| 0.001 | 6.278 | 4.605 | 1.9554 | 1.766 | 1.141 |
| 0.01 | 19.470 | 15.181 | 9.6872 | 8.320 | 8.079 |
| 0.05 | 41.807 | 36.018 | 32.0186 | 28.958 | 28.640 |
| 0.10 | 57.567 | 52.288 | 50.6752 | 47.375 | 46.900 |
| 0.20 | 77.872 | 74.974 | 74.974 | 72.875 | 72.192 |
| 0.25 | 84.375 | 83.875 | 83.875 | 82.191 | 81.128 |
| 0.30 | 90.500 | 90.500 | 90.500 | 88.889 | 88.129 |
| 0.35 | 96.375 | 96.375 | 96.375 | 95.633 | 93.407 |
| 0.38 | 99.780 | 99.780 | 99.780 | 99.730 | 95.804 |

Table 1 shows that for each value of $p$ there is a consistent ranking among optimal Procedures $D'$, $S$, Optimal Hierarchical ($R_2$), and Optimal Nested ($R_1$) with respect to the expected total number of tests: Procedure $R_1$ is the best, $R_2$ is the second-best, $S$ is the third-best, and $D'$ is the worst. The consistent ranking among Procedures $D'$, $R_2$, and $R_1$ follows from their definitions; $R_2$ is similar to $D'$ but without being limited in maximal number of stages as $D'$ is, and $R_1$ does not have a restriction (as $R_2$ has) that any two units only be tested together in a group if they have an identical test history. Meanwhile, Procedure $S$ also belongs to the hierarchical class, but has the restriction that in a positive group, individuals are tested one-by-one until the first positive individual is identified, or until all individuals but the last one are determined negative. Consequently, Procedure $R_2$ and therefore $R_1$ rank higher than Procedure $S$. To the best of our knowledge, no theoretical results compare optimal Procedures $D'$ and $S$. It is important to note that for some values of $p$ and $N$, ties among the procedures are possible (the case $N = 2$ was mentioned earlier in this section).

Heterogeneous p

The generalized group testing problem (GGTP), first introduced by Sobel (1960), consists of $N$ stochastically independent units $u_1, u_2, \ldots, u_N$, where unit $u_i$ has the probability $p_i$ ($0 < p_i < 1$) to be defective and the probability $q_i = 1 - p_i$ to be non-defective. We assume that the probabilities $p_1, p_2, \ldots, p_N$ are known and that we can decide the order in which the units will be tested. All units have to be classified as either non-defective or defective by group testing. Since its introduction, GGTP has seen considerable theoretical investigation (Lee and Sobel (1972); Nebenzahl and Sobel (1973); Katona (1973); Hwang (1976a); Yao and Hwang (1988a,b); Kurtz and Sidi (1988); Kealy et al. (2014); Malinovsky (2019b, 2020); Malinovsky et al. (2020)).

Dorfman and Sterrett Procedures

Ideally, under procedure $A$ ($A \in \{D, D', S\}$) we are interested in finding an optimal partition $\{m_1, \ldots, m_I\}$ with $m_1 + \cdots + m_I = N$ for some $I \in \{1, \ldots, N\}$ such that the total expected number of tests is minimal, i.e.

$$\{m_1, \ldots, m_I\} = \arg\min_{n_1, \ldots, n_J} E_A(n_1, n_2, \ldots, n_J) \quad \text{subject to } \sum_{i=1}^{J} n_i = N,\ J \in \{1, \ldots, N\},$$

where $E_A(n_1, n_2, \ldots, n_J) = E_A(1:n_1) + \cdots + E_A(1:n_J)$, and $E_A(1:n_j)$ is the total expected number of tests (under procedure $A$) in a group of size $n_j$. This task is a hard computational problem, and exhaustive search is infeasible because the total number of possible partitions of a set of size $N$ is the Bell number

$$B(N) = \left\lceil \frac{1}{e} \sum_{j=1}^{2N} \frac{j^N}{j!} \right\rceil,$$

which grows exponentially with $N$. For example, $B(13) = 27{,}644{,}437$. For Procedure $D$, however, a solution is available, due to Hwang (1975, 1981). Hwang proved that under Procedure $D$, an optimal partition is an ordered partition (i.e. each pair of subsets has the property that the numbers in one subset are all greater than or equal to every number in the other subset); he also provided a dynamic programming algorithm for finding an optimal partition with computational effort $O(N)$. However, the ordered partition is not optimal for Procedures $D'$ and $S$ (Malinovsky, 2019b), and finding an optimal partition for these procedures is a hard computational problem. That said, one can evaluate the optimal $D'$ and $S$ algorithm under a predetermined order of the $p$'s. In Table 2, we compare Procedures $D$, $D'$ and $S$ for ordered $p$'s, where the method for Procedure $D$ was developed by Hwang (1975) and those for Procedures $D'$ and $S$ by Malinovsky (2019b).

Hierarchical and Nested Procedures

For the fixed predetermined order of $p_1, p_2, \ldots, p_N$, optimal nested and hierarchical procedures with respect to the expected total number of tests were developed as DP algorithms in Kurtz and Sidi (1988) and in Malinovsky et al. (2020), respectively.

Numerical comparisons

We generated the vector $p_1, p_2, \ldots, p_{100}$ from a Beta distribution with parameters $\alpha = 1$, $\beta = (1 - p)/p$ such that the expectation equals $p$. We repeat this process $M = 1000$ times for each value of $p$. Each time, an optimal ordered partition with the corresponding expected number of tests was found for Procedures $D$, $D'$, and $S$, and the hierarchical (HL) and nested (ON) procedures were obtained using DP algorithms (based on the previously mentioned references). Also, in the GGTP, the Shannon entropy

$$\sum_{i=1}^{N} \left\{ p_i \log_2 \frac{1}{p_i} + (1 - p_i) \log_2 \frac{1}{1 - p_i} \right\}$$

can serve as the information lower bound for the expected number of tests of an optimal group testing procedure. The averages of 1000 repetitions are presented in Table 2 below.

Table 2: Comparison of Procedures $D'$, $S$, Hierarchical (HL), and Nested (ON), $N = 100$.

| $p$ | $D'$ | $S$ | HL | ON | Shannon Entropy |
|---|---|---|---|---|---|
| 0.001 | 5.738 | 3.745 | 1.867 | 1.697 | 1.081 |
| 0.01 | 17.345 | 13.121 | 8.720 | 7.730 | 7.474 |
| 0.05 | 37.095 | 31.801 | 28.212 | 25.797 | 25.653 |
| 0.10 | 50.758 | 46.105 | 43.606 | 41.192 | 40.855 |
| 0.20 | 67.536 | 64.33 | 62.69 | 61.030 | 60.11 |
| 0.30 | 77.598 | 75.358 | 74.382 | 72.611 | 70.303 |
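One repetition of such an experiment can be sketched in Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, the seed is arbitrary, and only Procedure $D$ is treated. It assumes the generalized Dorfman cost, where a group of size $m \ge 2$ needs $1 + m\,(1 - \prod_i q_i)$ tests in expectation. It restricts the search to ordered partitions of the sorted probabilities, the structure Hwang showed to be optimal for Procedure $D$. The recursion is a generic contiguous-block DP and is not claimed to be Hwang's exact algorithm.

```python
import math
import random


def sample_probabilities(n_items, mean_p, rng):
    """Draw p_1..p_n from Beta(1, (1 - mean_p) / mean_p), whose mean is mean_p."""
    return [rng.betavariate(1.0, (1.0 - mean_p) / mean_p) for _ in range(n_items)]


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) item; 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))


def shannon_bound(probs):
    """Information lower bound: sum of the item entropies."""
    return sum(binary_entropy(p) for p in probs)


def dorfman_ordered_partition_cost(probs):
    """Minimal expected tests for Procedure D over ordered partitions.

    Items are sorted by probability and only contiguous blocks are considered.
    best[j] is the optimum for the j smallest probabilities.
    """
    ps = sorted(probs)
    n = len(ps)
    best = [0.0] + [math.inf] * n
    for j in range(1, n + 1):
        prod_q = 1.0
        for i in range(j, 0, -1):  # candidate last block is ps[i-1 .. j-1]
            prod_q *= 1.0 - ps[i - 1]
            m = j - i + 1
            # A single item is tested individually; a larger group needs one
            # pooled test plus m retests whenever the pool is positive.
            cost = 1.0 if m == 1 else 1.0 + m * (1.0 - prod_q)
            best[j] = min(best[j], best[i - 1] + cost)
    return best[n]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
probs = sample_probabilities(100, 0.01, rng)
print(shannon_bound(probs), dorfman_ordered_partition_cost(probs))
```

Averaging the two printed quantities over many independent draws would mimic the kind of summary reported in Table 2. The double loop costs on the order of $N^2$ operations.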

Table 2 shows the same ranking pattern among procedures as was observed in Table 1 for the homogeneous $p$ case. This pattern can be explained along the lines of the previous discussion concerning homogeneous $p$.

Unknown p

In many practical situations, the exact value $p$ of the probability of disease prevalence is unknown or else only some limited information is available, for example a range. Since all of the above-presented procedures require knowledge of $p$, there is a need to evaluate $p$ during the process of testing. For a nested algorithm $R_1$, Sobel and Groll (1966) proposed a Bayesian approach that uses upcoming information during testing to evaluate and re-evaluate $p$. A minimax approach for Procedure $D$ was introduced by Malinovsky and Albert (2015) and for Procedures $D'$ and $S$ in Malinovsky and Albert (2019).

Errors in the Testing

In many settings, particularly in biology and medicine, tests may be subject to measurement error or misclassification. This issue occurs in individual testing but may be enhanced in group testing. In particular, for many applications, the sensitivity of a grouped test may decrease with group size (this is often referred to as dilution). Graff and Roeloffs (1972) and Hwang (1976b) recognized early that when test results are subject to misclassification, the objective function should not be the expected number of tests. Graff and Roeloffs (1972) and Burns and Mauro (1987) proposed a modification of the Dorfman procedure and searched for a design that minimized total cost as a linear function of the expected number of tests, the weighted expected number of good items misclassified as defective, and the weighted expected number of defective items misclassified as good. Hwang (1976b) studied a group testing model with the presence of a dilution effect, where a group containing a few defective items may be misidentified as one containing no such items, especially when the size of the group is large. He calculated the expected cost under the Dorfman procedure in the presence of the dilution effect and derived the optimal group sizes to minimize this cost. Malinovsky et al. (2016) characterized the optimal design in the Dorfman procedure in the presence of misclassification by maximizing the ratio between the expected number of correct classifications and the expected number of tests. Haber et al. (2021) proposed to minimize the expected number of tests while controlling overall misclassification rates. In general, since it is expected that misclassification may be related to group size, one has to be very cautious about proposing Dorfman designs with large group sizes. Alternative designs where groups are re-tested in different ways have been explored (Litvak et al., 1994, 2020).
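To make the effect of test error on the Dorfman design concrete, here is a small Python sketch. The model is an assumption made for illustration and is not taken from the cited papers: the pooled test has a fixed sensitivity and specificity that do not depend on group size (so no dilution effect), and every member of a pool that tests positive is retested individually. The function name and parameters are hypothetical.

```python
def dorfman_tests_with_error(k, p, sensitivity, specificity):
    """Expected tests per person for two-stage Dorfman testing with test error.

    Assumption of this sketch: sensitivity and specificity of the pooled
    test are constants, independent of the group size k.
    """
    if k < 2:
        return 1.0  # individual testing
    q = 1.0 - p
    # The pool tests positive either correctly (at least one defective item
    # and the test detects it) or falsely (no defective item in the pool).
    prob_pool_positive = sensitivity * (1.0 - q**k) + (1.0 - specificity) * q**k
    # One pooled test shared by k people, plus one retest each if positive.
    return 1.0 / k + prob_pool_positive


perfect = dorfman_tests_with_error(11, 0.01, 1.0, 1.0)
imperfect = dorfman_tests_with_error(11, 0.01, 0.9, 0.99)
print(perfect, imperfect)
```

With sensitivity and specificity both equal to 1, the expression reduces to $1 - q^k + 1/k$. A smaller expected number of tests under imperfect sensitivity reflects missed positive pools rather than a better design, which is why the expected number of tests alone is not a suitable objective in this setting.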

Incomplete identification

Consider a very large (infinite) population of items, where each item, independent from the others, is either defective with probability $p$ or non-defective with probability $1 - p$. The goal is to identify a certain number of non-defective items as quickly as possible. To the best of our knowledge, the incomplete identification problem was introduced by Bar-Lev et al. (1990). For recent developments and references, see Malinovsky (2018).

References

Bar-Lev, S. K., Boneh, A., Perry, D. (1990). Incomplete identification models for group-testable items. Nav. Res. Logist. 7, 647–659.

Burns, K. C., Mauro, C. A. (1987). Group testing with test error as a function of concentration. Communications in Statistics - Theory and Methods 6, 2821–2837.

Dorfman, R. (1943). The detection of defective members of large populations. The Annals of Mathematical Statistics 4, 436–440.

Feller, W. (1950). An introduction to probability theory and its application. New York: John Wiley & Sons.

Gilstein, C. Z. (1985). Optimal partitions of finite populations for Dorfman-type group testing. J. Stat. Plan. Inf. 2, 385–394.

Graff, L. E., Roeloffs, R. (1972). Group testing in the presence of test error; an extension of the Dorfman Procedure. Technometrics.

Haber et al. (2021). Preprint arXiv: https://arxiv.org/abs/2004.04837.

Huffman, D. A. (1952). A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the I.R.E. 0, 1098–1101.

Hwang, F. K. (1975). A generalized binomial group testing problem. J. Amer. Statist. Assoc. 0, 923–926.

Hwang, F. K. (1976a). An optimal nested procedure in binomial group testing. Biometrics 2, 939–943.

Hwang, F. K. (1976b). Group testing with a dilution effect. Biometrika 3, 671–680.

Hwang, F. K. (1981). Optimal Partitions. J. Optim. Theory Appl. 4, 1–10.

Hwang, F. K., Pfeifer, C. J., and Enis, P. (1981). An Optimal Hierarchical Procedure for a Modified Binomial Group-Testing Problem. J. Amer. Statist. Assoc. 6, 947–949.

Johnson, N. L., Kotz, S., Wu, X. Z. (1991). Inspection errors for attributes in quality control. Monographs on Statistics and Applied Probability, 44. Chapman & Hall, London.

Katona, G. O. H. (1973). Combinatorial search problems. In J. N. Srivastava et al., A Survey of Combinatorial Theory.

Kealy et al. (2014). Proc. 52nd Annu. Allerton Conf. Commun. Control Comput., 101–108.

Kurtz, D., and Sidi, M. (1988). Multiple access algorithms via group testing for heterogeneous population of users. IEEE Trans. Commun. 6, 1316–1323.

Kumar, S., and Sobel, M. (1971). Finding a single defective in binomial group-testing. J. Amer. Statist. Assoc. 6, 824–828.

Lee, J. K., and Sobel, M. (1972). Dorfman and $R_1$-type procedures for a generalized group testing problem. Mathematical Biosciences 5, 317–340.

Litvak, E., Dentzer, S., Pagano, M. (2020). The Right Kind of Pooled Testing for the Novel Coronavirus: First, Do No Harm. American Journal of Public Health 10, 1772–1773.

Litvak, E., Tu, X. M., Pagano, M. (1994). Screening for the presence of a disease by pooling sera samples. J. Am. Stat. Assoc. 9, 424–434.

Malinovsky, Y. (2018). On optimal policy in the group testing with incomplete identification. Statist. Probab. Lett. 40, 44–47.

Malinovsky, Y. (2019a). End Notes. Math. Mag. 2, 398.

Malinovsky, Y. (2019b). Sterrett procedure for the generalized group testing problem. Methodology and Computing in Applied Probability 1, 829–840.

Malinovsky, Y. (2020). Conjectures on Optimal Nested Generalized Group Testing Algorithm. Applied Stochastic Models in Business and Industry 6, 1029–1036.

Malinovsky, Y., Albert, P. S. (2015). A note on the minimax solution for the two-stage group testing problem. The American Statistician 9, 45–52.

Malinovsky, Y., Albert, P. S. (2019). Revisiting nested group testing procedures: new results, comparisons, and robustness. The American Statistician 3, 117–125.

Malinovsky, Y., Albert, P. S., Roy, A. (2016). Reader reaction: A note on the evaluation of group testing algorithms in the presence of misclassification. Biometrics 2, 299–302.

Malinovsky, Y., Haber, G., Albert, P. S. (2020). An optimal design for hierarchical generalized group testing. J. R. Stat. Soc. Ser. C. Appl. Stat. 9, 607–621.

Moon, J. W., Sobel, M. (1977). Enumerating a class of nested group testing procedures. Journal of Combinatorial Theory, Series B 3, 184–188.

Nebenzahl, E., and Sobel, M. (1973). Finite and infinite models for generalized group-testing with unequal probabilities of success for each item. In T. Cacoullos, ed., Discriminant Analysis and Applications, New York: Academic Press Inc., 239–284.

Pfeifer, C. G., Enis, P. (1978). Dorfman-type group testing for a modified binomial model. J. Amer. Statist. Assoc. 3, 588–592.

Samuels, S. M. (1978). The exact solution to the two-stage group-testing problem. Technometrics 0, 497–500.

Sobel, M. (1960). Group testing to classify efficiently all defectives in a binomial sample. Information and Decision Processes (R. E. Machol, ed.; McGraw-Hill, New York), pp. 127–161.

Sobel, M. (1967). Optimal group testing. Proc. Colloq. on Information Theory, Bolyai Math. Society, Debrecen, Hungary.

Sobel, M., Groll, P. A. (1959). Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Tech. J. 8, 1179–1252.

Sobel, M., Groll, P. A. (1966). Binomial group-testing with an unknown proportion of defectives. Technometrics, 631–656.

Sterrett, A. (1957). On the detection of defective members of large populations. The Annals of Mathematical Statistics 8, 1033–1036.

Ungar, P. (1960). Cutoff points in group testing. Comm. Pure Appl. Math.

Yao, Y. C., Hwang, F. K. (1988a). SIAM J. Disc. Math., 256–259.

Yao, Y. C., Hwang, F. K. (1988b). Individual testing of independent items in optimum group testing. Probab. Eng. Inform. Sci., 23–29.

Yao, Y. C., Hwang, F. K. (1990). On optimal nested group testing algorithms. J. Stat. Plan. Inf. 4, 167–175.

Zaman, N., and Pippenger, N. (2016). Asymptotic analysis of optimal nested group-testing procedures. Prob. Eng. Inform. Sci. 0, 547–552.

Zimmerman, S. (2017). Detecting deficiencies: an optimal group testing algorithm. Math. Mag. 9