A Note on Double Pooling Tests

Abstract

We present double pooling, a simple, easy-to-implement variation on test pooling, that in certain ranges for the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach (the Dorfman method).

A Note on Double Pooling Tests (Preliminary version)

Andrei Z. Broder & Ravi Kumar
Google, Mountain View, CA

April 6, 2020

Introduction

The concept of test pooling was apparently invented by Robert Dorfman [Dor43] in 1943, who suggested that it would be more effective to test WW2 would-be recruits for syphilis by mixing the blood samples of several recruits and testing the pool for antigens. If the pool tests negative then all the pool members are deemed healthy; otherwise, each member of the pool is tested separately. A simple analysis (see next section) shows that for a given probability $p$ that a recruit is infected there is an optimum pool size $s(p)$ that minimizes the expected number of needed tests. The lower the $p$, the larger the $s(p)$ and the lower the expected number of tests required. Dorfman's analysis has been further refined and generalized to deal with various problems such as false negatives [LLZA12] and studied as part of the broad topic of Combinatorial Group Testing [DH93, AJS19].

Note that some recursive and adaptive approaches dear to computer scientists, such as binary search, often may not work for this problem: there are pragmatic limitations on (a) the size of the pool beyond which dilution results in too many false negatives; (b) the number of samples available from a given specimen; and (c) the total time required to produce an answer.

Nevertheless, the emergence of the COVID-19 pandemic and the cost and scarcity of tests for the underlying virus have revived an enormous interest in test pooling. For COVID-19, test pooling has been shown to be doable with pools as large as 64 [YAST+20] and is already in use in several countries including Germany [SCea] and Israel [YAST+20].

Double pooling

The purpose of this note is to propose a simple variation on Dorfman's approach that we call double pooling; for clarity, we refer to Dorfman's method as single pooling. Double pooling works as follows:

Given a probability $p$ of a positive test, pick an optimal size $s_2(p)$ for the pool size. (The optimal $s_2(p)$ is larger than the corresponding optimal $s_1(p)$ for single pooling.) Divide the population to be tested into non-overlapping pools of size $s_2$ (the division is assumed to be random) twice. Thus, every patient now belongs to two pools and is tested in two parallel rounds, A and B. For every patient, if both pools test positive then test the patient individually. Otherwise consider that patient cleared. If the pool tests never produce false negatives the algorithm is clearly correct. (False positives only reduce efficiency.)

It turns out that double pooling is particularly advantageous for $p < 2\%$, corresponding to testing a large population of asymptomatic patients, but it is more efficient than single pooling even for $p = 10\%$.

We will discuss double pooling in more detail in the next section, but to see its advantages and build an intuitive understanding we start with an example.

Assume that $p \approx 0.011$. Then $s_1 = 10$ is the optimal size for single pooling with this $p$ and results in an expected cost of $\approx 0.206$ tests/patient, a nice improvement over testing everyone. However, using double pooling the optimum $s_2(p)$ is 23 and the expected cost further declines to about 0.145 tests/patient. To see why, consider a population of about 1000 patients: with $p \approx 0.011$ we expect about 11 positives, so we will posit we have exactly 11 positive patients.

• For single pooling, since $s_1 = 10$, we start with 100 tests. Assuming all positive cases end in separate pools (an upper bound), we will need to do another 110 tests to deal with all the suspicious cases, hence 210 total tests (which is close enough to 206).

• For double pooling, since $s_2 = 23$, we do twice 44 tests with 23 patients each (88 tests). In each round at most 11 tests will come back positive, raising suspicions about a total of $22 \times 11 = 242$ healthy patients. Thus a given healthy patient has probability 0.24 of being a suspect in round A and the same 0.24 probability in round B. These are quasi-independent, hence the probability of being suspected twice is only $0.24 \times 0.24 \approx 0.058$. We therefore expect to test individually about $0.058 \times (1000 - 11) \approx 57$ healthy patients in addition to the 11 positive ones, for a total of roughly $88 + 57 + 11 = 156$ tests (and the pools in fact cover $44 \times 23 = 1012$ patients).

In conclusion, the "magic" of double pooling comes from a paradigm that has been observed in many other situations, e.g., Bloom filters [Blo70, BM03] and balanced allocations [ABKU99, Mit01]. Although the probability of being "unlucky" in a given trial might be high, the probability of being unlucky in two or more independent trials decreases dramatically.
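The two-round procedure described above can be sketched as a small simulation. This is only an illustration under the note's assumptions (pool tests never err, pools are random and non-overlapping); the function name `double_pooling` and its parameters are hypothetical and not part of the note.

```python
import random

def double_pooling(is_positive, pool_size, rng):
    """Simulate double pooling on a list of true statuses.

    Returns (set of indices confirmed positive, total number of tests).
    Pool tests are assumed to be error-free.
    """
    n = len(is_positive)
    positive_pool_count = [0] * n   # how many of a patient's pools tested positive
    tests = 0
    for _ in range(2):              # rounds A and B
        order = list(range(n))
        rng.shuffle(order)          # a fresh random partition for each round
        for start in range(0, n, pool_size):
            pool = order[start:start + pool_size]
            tests += 1              # one test per pool
            if any(is_positive[i] for i in pool):
                for i in pool:
                    positive_pool_count[i] += 1
    # Only patients whose two pools both tested positive are tested individually.
    suspects = [i for i in range(n) if positive_pool_count[i] == 2]
    tests += len(suspects)
    confirmed = {i for i in suspects if is_positive[i]}
    return confirmed, tests

# The setting of the example: 44 pools of 23 patients, 11 of them positive.
statuses = [True] * 11 + [False] * 1001
confirmed, tests = double_pooling(statuses, 23, random.Random(0))
print(len(confirmed), "positives found using", tests, "tests")
```

Every positive patient sits in two positive pools, so none can be cleared by mistake; the test count varies from run to run with the random partitions.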

Analysis

Consider the expected cost attributable to one patient in the single pooling situation, where the size of the pool is $s_1$:

• if the patient is positive, then the cost is $1/s_1 + 1$ (the patient's share of the pool + their individual test);

• if the patient is negative, then the cost is $1/s_1 + 1 \times \left(1 - (1-p)^{s_1-1}\right)$ (the patient's share of the pool + their individual test iff not all the other patients are healthy).

Since the probability of being positive is $p$, the total expected cost per patient in this case is

$$p\left(1 + \frac{1}{s_1}\right) + (1-p)\left(1 - (1-p)^{s_1-1} + \frac{1}{s_1}\right) = \frac{1}{s_1} + 1 - (1-p)^{s_1}. \tag{1}$$

To determine the pool size that minimizes the total for a given $p$, we take the derivative of the cost with respect to $s_1$:

$$\frac{\partial}{\partial s_1}\left(\frac{1}{s_1} + 1 - (1-p)^{s_1}\right) = -(1-p)^{s_1}\ln(1-p) - \frac{1}{s_1^2}, \tag{2}$$

and set it to 0.

The solution of interest can be expressed in terms of the Lambert W function, namely

$$s_1 = 2\,W\!\left(-\frac{1}{2}\sqrt{-\log(1-p)}\right) \Big/ \log(1-p). \tag{3}$$

Let us define $p_0$ as the value of $p$ for which the optimum $s_1$ is exactly 10. It turns out that $p_0 \approx 0.0111$. Figure 1 shows the optimum $s_1$ as a function of $p$.

Figure 1: Optimum $s_1$ as a function of $p$.

Let us turn to double pooling: now each patient will be assigned to two random pools each of size $s_2$. A patient will be tested individually iff both their pools test positive.
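As a numerical check on the single pooling formulas, the sketch below evaluates the cost in equation (1), finds the best integer pool size by brute force, and evaluates the closed form in equation (3). All function names are hypothetical. Two assumptions are made here: $p = 0.0111$ is used as a rounded stand-in for $p_0$, and the "solution of interest" is taken to be the principal branch of the Lambert W function, computed with a plain Newton iteration.

```python
import math

def single_pool_cost(p, s):
    """Expected tests per patient for pool size s, equation (1)."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s

def best_single_pool_size(p, max_size=200):
    """Integer pool size minimizing equation (1), found by brute force."""
    return min(range(1, max_size + 1), key=lambda s: single_pool_cost(p, s))

def lambert_w0(z, iterations=50):
    """Principal branch of Lambert W for -1/e < z <= 0 (Newton's method)."""
    w = z
    for _ in range(iterations):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1.0))
    return w

def continuous_single_pool_size(p):
    """Real-valued optimum pool size from equation (3)."""
    log_q = math.log(1.0 - p)
    return 2.0 * lambert_w0(-0.5 * math.sqrt(-log_q)) / log_q

p = 0.0111  # assumed rounded value of p_0
print(best_single_pool_size(p), single_pool_cost(p, 10))
print(continuous_single_pool_size(p))
```

The brute-force search and the closed form should agree to within rounding; the closed form is real-valued, so it has to be compared against neighbouring integers in practice.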
Again let us look at the expected cost induced by the testing of one patient:

• if the patient is positive, then the cost is $2/s_2 + 1$ (the patient's share of the two pools + their individual test);

• if the patient is negative, then the cost is $2/s_2 + 1 \times \left(1 - (1-p)^{s_2-1}\right)^2$ (the patient's share of the pools + their individual test iff both pools test positive).

Hence the total expected cost is

$$p\left(1 + \frac{2}{s_2}\right) + q\left(\left(1 - q^{s_2-1}\right)^2 + \frac{2}{s_2}\right) = \frac{2}{s_2} + p + q\left(1 - q^{s_2-1}\right)^2, \tag{4}$$

where for brevity $q$ stands for $1 - p$.

As before, to determine the pool size that minimizes the total cost for a given $p$, we take the partial derivative of the cost with respect to $s_2$:

$$\frac{\partial}{\partial s_2}\left(\frac{2}{s_2} + p + q\left(1 - q^{s_2-1}\right)^2\right) = -\frac{2}{s_2^2} - 2q^{s_2}\left(1 - q^{s_2-1}\right)\ln q, \tag{5}$$

and set it to 0.

This has to be solved numerically at each $p$. Solving this at $p = p_0 \approx 0.0111$ yields an optimum integer pool size of $s_2 = 23$. More generally, Figure 2 shows the optimum integer $s_2$ as a function of $p$ and Figure 3 shows the expected cost per patient tested as a function of $p$ for both single and double pooling using optimal integer values of $s_1(p)$ and $s_2(p)$.

Figure 2: Optimum $s_2$ as a function of $p$.

Figure 3: Expected cost using the optimum $s_1$ and $s_2$ as a function of $p$ (curves: single pooling, double pooling).

In principle we can generalize double pooling to $k$-pooling, whereby each patient participates in $k$ independent pools in $k$ parallel rounds. The expected cost becomes

$$\frac{k}{s_k} + p + q\left(1 - q^{s_k-1}\right)^k. \tag{6}$$

Depending on $p$ this can yield further improvements but they are probably impractical, especially if $p$ gets larger. (With triple testing for $p = p_0$ and about 1000 samples we would need only 128 tests with pools of size 36, and with quadruple testing only 122 tests with pools of size 47.) Even more asymptotically efficient tests can be constructed [MT11, PR11], but it is unclear if they can be practical.

Footnote on the Lambert W function used in equation (3): https://en.wikipedia.org/wiki/Lambert_W_function

Conclusions
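As a numerical recap of the analysis, the sketch below evaluates the $k$-pooling cost of equation (6), which reduces to equation (1) for $k = 1$ and to equation (4) for $k = 2$, and minimizes it over integer pool sizes by brute force. The function names are hypothetical, and $p = 0.0111$ is an assumed rounded stand-in for $p_0$. Because the cost curve is very flat near its minimum, the integer optimum found with the rounded $p$ may differ by one from the pool sizes quoted in the analysis, while the test counts per 1000 patients should still match.

```python
def k_pool_cost(p, s, k):
    """Expected tests per patient with k pools of size s, equation (6)."""
    q = 1.0 - p
    return k / s + p + q * (1.0 - q ** (s - 1)) ** k

def best_pool_size(p, k, max_size=500):
    """Integer pool size minimizing the k-pooling cost, by brute force."""
    return min(range(2, max_size + 1), key=lambda s: k_pool_cost(p, s, k))

def savings_over_single(p):
    """Fractional savings of double over single pooling at optimal sizes."""
    single = k_pool_cost(p, best_pool_size(p, 1), 1)
    double = k_pool_cost(p, best_pool_size(p, 2), 2)
    return 1.0 - double / single

p = 0.0111  # assumed rounded value of p_0
for k in (1, 2, 3, 4):
    s = best_pool_size(p, k)
    print(k, s, round(1000 * k_pool_cost(p, s, k)))
print(savings_over_single(p), savings_over_single(0.05))
```

Brute force is used instead of solving equation (5) because the optimum has to be an integer anyway and the search space is tiny.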

We presented double pooling, a simple, easy-to-implement variation on test pooling, that in certain ranges for $p$, the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach. Figure 4 shows the percentage of savings of double pooling over single pooling as a function of $p$. We can see that double pooling is particularly advantageous for $p$ below 2%, corresponding to large scale testing of asymptomatic patients, but is still at least 10% better than single pooling all the way up to about $p = 5\%$.

Figure 4: % savings of double pooling over single pooling as a function of $p$.

Double pooling is, however, more exposed to false negatives than single pooling, for two reasons: one physical: the pools used are larger; and one mathematical: a true positive sample will be missed if either of its two pools produces a false negative. We will discuss this further in the final version.

We are reaching out to our colleagues in the medical field to find out whether double pooling is practically usable for COVID testing and will update our note with their feedback.

Presently, there is an extraordinary flurry of activity and independent work on group testing for COVID. This includes an analysis of a single pooling method [Gol20] and a proposal based on binary search [Gos20]. It might well be the case that independent researchers have already obtained the same results presented here. We are encouraging members of the community to send us their comments and feedback.

Acknowledgments

We thank our colleagues Fernando Pereira and Tamás Sarlós for many useful comments.

References

[ABKU99] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999.

[AJS19] Matthew Aldridge, Oliver Johnson, and Jonathan Scarlett. Group testing: An information theory perspective. Foundations and Trends in Communications and Information Theory, 15(3-4):196–392, 2019.

[Blo70] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. C. ACM, 13(7):422–426, 1970.

[BM03] Andrei Z. Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2003.

[DH93] Ding-Zhu Du and Frank K. Hwang. Combinatorial Group Testing and its Applications. World Scientific, 1993.

[Dor43] Robert Dorfman. The detection of defective members of large populations. Ann. Math. Stat., 14:436–440, 1943.

[Gol20] Christian Gollier. Optimal group testing to exit the COVID confinement. Toulouse School of Economics, 2020. Technical report.

[Gos20] Olivier Gossner. Group testing against COVID-19. Center for Research in Economics and Statistics, 2020. Working Papers 2020-02.

[LLZA12] Aiyi Liu, Chunling Liu, Zhiwei Zhang, and Paul S. Albert. Optimality of group testing in the presence of misclassification. Biometrika, 99:245–251, 2012.

[Mit01] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.

[MT11] Marc Mezard and Cristina Toninelli. Group testing with random pools: Optimal two-stage algorithms. IEEE Transactions on Information Theory, 57(3):1736–1745, 2011.

[PR11] Ely Porat and Amir Rothschild. Explicit nonadaptive combinatorial group testing schemes. IEEE Transactions on Information Theory, 57:7982–7989, 12 2011.

[SCea] Erhard Seifried, Sandra Ciesek, et al. Pool testing of SARS-CoV-2 samples increases test capacity.

[YAST+20] Idan Yelin, Noga Aharony, Einat Shaer-Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagam Gandali, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv, 2020.