A Note on Double Pooling Tests (Preliminary version)
Andrei Z. Broder & Ravi Kumar
Google
Mountain View, CA
April 6, 2020
Abstract
We present double pooling, a simple, easy-to-implement variation on test pooling that, in certain ranges for the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach (the Dorfman method).
Introduction
The concept of test pooling was apparently invented by Robert Dorfman [Dor43] in 1943, who suggested that it would be more effective to test WW2 would-be recruits for syphilis by mixing the blood samples of several recruits and testing the pool for antigens. If the pool tests negative then all the pool members are deemed healthy; otherwise, each member of the pool is tested separately. A simple analysis (see the Analysis section) shows that for a given probability $p$ that a recruit is infected there is an optimum pool size $s(p)$ that minimizes the expected number of needed tests. The lower the $p$, the larger the $s(p)$ and the lower the expected number of tests required. Dorfman's analysis has been further refined and generalized to deal with various problems such as false negatives [LLZA12] and studied as part of the broad topic of Combinatorial Group Testing [DH93, AJS19].

Note that some recursive and adaptive approaches dear to computer scientists, such as binary search, often may not work for this problem: there are pragmatic limitations on (a) the size of the pool beyond which dilution results in too many false negatives; (b) the number of samples available from a given specimen; and (c) the total time required to produce an answer.

Nevertheless, the emergence of the COVID-19 pandemic and the cost and scarcity of tests for the underlying virus have revived an enormous interest in test pooling. For COVID-19, test pooling has been shown to be doable with pools as large as 64 [YAST+20] and is already in use in several countries including Germany [SCea] and Israel [YAST+20].
Double pooling
The purpose of this note is to propose a simple variation on Dorfman's approach that we call double pooling; for clarity, we refer to Dorfman's method as single pooling. Double pooling works as follows:

Given a probability $p$ of a positive test, pick an optimal size $s_2(p)$ for the pool size. (The optimal $s_2(p)$ is larger than the corresponding optimal $s_1(p)$ for single pooling.) Divide the population to be tested into non-overlapping pools of size $s_2$ (the division is assumed to be random) twice. Thus every patient now belongs to two pools and is tested in two parallel rounds, A and B. For every patient, if both pools test positive then test the patient individually; otherwise consider that patient cleared. If the pool tests never produce false negatives, the algorithm is clearly correct. (False positives only reduce efficiency.)

It turns out that double pooling is particularly advantageous for $p < 2\%$, corresponding to testing a large population of asymptomatic patients, but it is more efficient than single pooling even for $p = 10\%$.

We will discuss double pooling in more detail in the next section, but to see its advantages and build an intuitive understanding we start with an example.

Assume that $p \approx 0.011$. Then $s_1 = 10$ is the optimal size for single pooling with this $p$ and results in an expected cost of $\approx 0.206$ tests/patient, a nice improvement over testing everyone. However, using double pooling the optimum $s_2(p)$ is 23 and the expected cost further declines to just about $0.145$ tests/patient. To see why, consider testing about 1000 patients with $p \approx 0.011$, so we will posit that we have exactly 11 positive patients.
• For single pooling, since $s_1 = 10$, we start with 100 tests. Assuming all positive cases end up in separate pools (an upper bound), we will need another 110 tests to deal with all the suspicious cases, hence 210 tests in total (which is close enough to 206).
• For double pooling, since $s_2 = 23$, we do twice 44 tests with 23 patients each (88 tests). In each round at most 11 tests will come back positive, raising suspicions about a total of $22 \times 11 = 242$ healthy patients. Thus a given healthy patient has probability 0.24 of being a suspect in round A and the same 0.24 probability in round B. These events are quasi-independent, hence the probability of being suspected twice is only $0.24 \times 0.24 = 0.0576$. We therefore expect to test individually the 11 positive patients plus about $0.0576 \times (1000 - 11) \approx 57$ healthy ones, for a total of about $88 + 11 + 57 = 156$ tests (and these actually cover $44 \times 23 = 1012$ patients).

In conclusion, the “magic” of double pooling comes from a paradigm that has been observed in many other situations, e.g., Bloom filters [Blo70, BM03] and balanced allocations [ABKU99, Mit01]. Although the probability of being “unlucky” in a given trial might be high, the probability of being unlucky in two or more independent trials decreases dramatically.
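The example above can also be checked empirically. The following is a minimal simulation sketch, not part of the original analysis. It assumes that each patient is positive independently with probability 0.011, that pool tests are error-free, and that the population splits evenly into pools; the population size of 46,000 is a hypothetical choice divisible by both 10 and 23, and the function name `simulate_pooling` and the seed are illustrative.

```python
import random


def simulate_pooling(n_patients, p, pool_size, rounds, rng):
    """Return tests per patient for one simulated run of `rounds`-fold pooling."""
    # Assumption: patients are positive independently with probability p,
    # and pool tests never give false negatives or false positives.
    infected = [rng.random() < p for _ in range(n_patients)]
    n_pools = n_patients // pool_size      # n_patients is assumed to divide evenly
    positive_pools = [0] * n_patients      # per patient: how many of their pools tested positive

    for _ in range(rounds):
        order = list(range(n_patients))
        rng.shuffle(order)                 # a fresh random partition for every round
        for k in range(n_pools):
            pool = order[k * pool_size:(k + 1) * pool_size]
            if any(infected[i] for i in pool):   # the pooled test comes back positive
                for i in pool:
                    positive_pools[i] += 1

    pool_tests = rounds * n_pools
    # A patient is tested individually only if every one of their pools was positive.
    individual_tests = sum(1 for c in positive_pools if c == rounds)
    return (pool_tests + individual_tests) / n_patients


rng = random.Random(2020)                  # fixed seed so that runs are repeatable
n = 230 * 200                              # 46,000 patients: divisible by 10 and by 23
single = simulate_pooling(n, 0.011, 10, 1, rng)
double = simulate_pooling(n, 0.011, 23, 2, rng)
print(f"single pooling: {single:.3f} tests/patient")
print(f"double pooling: {double:.3f} tests/patient")
```

Under these assumptions the simulated costs should land near the expected costs quoted in the example, up to sampling noise, with double pooling clearly below single pooling.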
Analysis
Consider the expected cost attributable to one patient in the single pooling situation, where the size of the pool is $s$:
• if the patient is positive, then the cost is $1/s + 1$ (the patient's share of the pool + their individual test);
• if the patient is negative, then the cost is $1/s + 1 \times \left(1 - (1-p)^{s-1}\right)$ (the patient's share of the pool + their individual test iff not all the other patients are healthy).

Since the probability of being positive is $p$, the total expected cost per patient in this case is
$$p\left(1 + \frac{1}{s}\right) + (1-p)\left(1 - (1-p)^{s-1} + \frac{1}{s}\right) = \frac{1}{s} + 1 - (1-p)^{s}. \tag{1}$$

To determine the pool size that minimizes the total for a given $p$, we take the derivative of the cost with respect to $s$:
$$\frac{\partial}{\partial s}\left(\frac{1}{s} + 1 - (1-p)^{s}\right) = -(1-p)^{s}\ln(1-p) - \frac{1}{s^{2}}, \tag{2}$$
and set it to 0.

The solution of interest can be expressed in terms of the Lambert $W$ function, namely
$$s = 2\,W\!\left(-\tfrac{1}{2}\sqrt{-\log(1-p)}\right) \Big/ \log(1-p). \tag{3}$$

Let us define $p_{10}$ as the value of $p$ for which the optimum $s_1$ is exactly 10. It turns out that $p_{10} \approx 0.0111$. Figure 1 shows the optimum $s_1$ as a function of $p$.

[Figure 1: Optimum $s_1$ as a function of $p$.]

Let us turn to double pooling: now each patient will be assigned to two random pools each of size $s$. A patient will be tested individually iff both their pools test positive.
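Before continuing with the double pooling cost, equations (1) and (2) for single pooling can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the integer optimum is found by exhaustive search rather than through the Lambert $W$ expression (3), and the bisection bracket relies on the observation that, for $s \ge 3$, the root of (2) corresponding to a minimum satisfies $-\ln(1-p) \le 1/s$.

```python
import math


def single_pool_cost(p, s):
    """Expected tests per patient for single pooling, equation (1)."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s


def best_single_pool_size(p, max_size=200):
    """Integer pool size minimising equation (1), by exhaustive search."""
    return min(range(2, max_size + 1), key=lambda s: single_pool_cost(p, s))


def cost_derivative(p, s):
    """Right-hand side of equation (2): derivative of the cost in s."""
    return -((1.0 - p) ** s) * math.log(1.0 - p) - 1.0 / s ** 2


def p_with_optimum(s, iterations=60):
    """Bisection on p so that equation (2) vanishes at the given s (s >= 3)."""
    # Assumed bracket: the derivative is negative for tiny p and non-negative
    # at p = 1 - exp(-1/s), and it increases with p in between.
    lo, hi = 1e-9, 1.0 - math.exp(-1.0 / s)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if cost_derivative(mid, s) < 0.0:
            lo = mid    # cost still decreasing at s, so p is too small
        else:
            hi = mid
    return (lo + hi) / 2.0


p10 = p_with_optimum(10)
print(f"p_10 = {p10:.4f}")
print(f"integer optimum at p_10: s_1 = {best_single_pool_size(p10)}")
print(f"cost at s_1 = 10: {single_pool_cost(p10, 10):.3f} tests/patient")
```

The bisection is used here instead of a Lambert $W$ routine only to keep the sketch free of external dependencies; a library implementation of $W$ could equally be plugged into (3).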
Again let us look at the expected cost induced by the testing of one patient:
• if the patient is positive, then the cost is $2/s + 1$ (the patient's share of the two pools + their individual test);
• if the patient is negative, then the cost is $2/s + 1 \times \left(1 - (1-p)^{s-1}\right)^{2}$ (the patient's share of the pools + their individual test iff both pools test positive).

Hence the total expected cost is
$$p\left(1 + \frac{2}{s}\right) + q\left((1 - q^{s-1})^{2} + \frac{2}{s}\right) = \frac{2}{s} + p + q\,(1 - q^{s-1})^{2}, \tag{4}$$
where for brevity $q$ stands for $1 - p$.

As before, to determine the pool size that minimizes the total cost for a given $p$, we take the partial derivative of the cost with respect to $s$:
$$\frac{\partial}{\partial s}\left(\frac{2}{s} + p + q\,(1 - q^{s-1})^{2}\right) = -\frac{2}{s^{2}} - 2\,q^{s}\,(1 - q^{s-1})\ln q, \tag{5}$$
and set it to 0.

This has to be solved numerically at each $p$. Solving this at $p = p_{10} \approx 0.0111$ yields $s_2 = 23$. More generally, Figure 2 shows the optimum integer $s_2$ as a function of $p$ and Figure 3 shows the expected cost per patient tested as a function of $p$ for both single and double pooling using optimal integer values of $s_1(p)$ and $s_2(p)$.

[Figure 2: Optimum $s_2$ as a function of $p$.]

[Figure 3: Expected cost using the optimum $s_1$ and $s_2$ as a function of $p$, for single pooling and double pooling.]

In principle we can generalize double pooling to $k$-pooling, whereby each patient participates in $k$ independent pools in $k$ parallel rounds. The expected cost becomes
$$\frac{k}{s} + p + q\,(1 - q^{s-1})^{k}. \tag{6}$$

Depending on $p$ this can yield further improvements, but they are probably impractical, especially if $p$ gets larger. (With triple testing for $p = p_{10}$ and about 1000 samples we would need only 128 tests with pools of size 36, and with quadruple testing only 122 tests with pools of size 47.) Even more asymptotically efficient tests can be constructed [MT11, PR11], but it is unclear if they can be practical.

Note on the Lambert $W$ function used in (3): https://en.wikipedia.org/wiki/Lambert_W_function
Conclusions
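The comparison summarized in this section can be reproduced directly from the cost formulas of the Analysis section. The following sketch is an illustration only: it evaluates the $k$-pooling cost (6), which reduces to (1) for $k = 1$ and to (4) for $k = 2$, searches integer pool sizes exhaustively, and tabulates the percentage savings of double over single pooling. The function names, the search bound `max_size`, and the sample values of $p$ are assumptions made for the example.

```python
def k_pool_cost(p, s, k):
    """Expected tests per patient for k-fold pooling, equation (6)."""
    q = 1.0 - p
    return k / s + p + q * (1.0 - q ** (s - 1)) ** k


def best_pool_size(p, k, max_size=500):
    """Integer pool size minimising equation (6), by exhaustive search."""
    return min(range(2, max_size + 1), key=lambda s: k_pool_cost(p, s, k))


def savings_percent(p):
    """Percentage saved by double pooling relative to single pooling."""
    single = k_pool_cost(p, best_pool_size(p, 1), 1)
    double = k_pool_cost(p, best_pool_size(p, 2), 2)
    return 100.0 * (1.0 - double / single)


# Hypothetical sample points; 0.0111 approximates p_10 from the analysis.
for p in (0.005, 0.0111, 0.02, 0.05, 0.10):
    s1, s2 = best_pool_size(p, 1), best_pool_size(p, 2)
    print(f"p={p:<7} s1={s1:<3} s2={s2:<3} "
          f"single={k_pool_cost(p, s1, 1):.3f} "
          f"double={k_pool_cost(p, s2, 2):.3f} "
          f"savings={savings_percent(p):.1f}%")

# Tests per 1000 samples for triple and quadruple pooling at p close to p_10,
# using the pool sizes quoted in the analysis.
print(round(1000 * k_pool_cost(0.0111, 36, 3)), round(1000 * k_pool_cost(0.0111, 47, 4)))
```

Exhaustive search over integer pool sizes is chosen here because the text works with optimal integer values of $s_1(p)$ and $s_2(p)$; solving (2) and (5) for real-valued $s$ would be an alternative.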
We presented double pooling, a simple, easy-to-implement variation on test pooling that, in certain ranges for $p$, the a priori probability of a positive test, is significantly more efficient than the standard single pooling approach. Figure 4 shows the percentage of savings of double pooling over single pooling as a function of $p$. We can see that double pooling is particularly advantageous for $p$ below 2%, corresponding to large-scale testing of asymptomatic patients, but is still at least 10% better than single pooling all the way up to a little over $p = 5\%$.

[Figure 4: % savings of double pooling over single pooling as a function of $p$.]

Double pooling is, however, more exposed to false negatives than single pooling, for two reasons: one physical: the pools used are larger; and one mathematical: a true positive sample will be missed if either of its two pools produces a false negative. We will discuss this further in the final version.

We are reaching out to our colleagues in the medical field to find out whether double pooling is practically usable for COVID testing and will update our note with their feedback.

Presently, there is an extraordinary flurry of activity and independent work on group testing for COVID. This includes an analysis of a single pooling method [Gol20] and a proposal based on binary search [Gos20]. It might well be the case that independent researchers have already obtained the same results presented here. We encourage members of the community to send us their comments and feedback.
Acknowledgments
We thank our colleagues Fernando Pereira and Tamás Sarlós for many useful comments.
References
[ABKU99] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. Balanced allocations. SIAM J. Comput., 29(1):180–200, 1999.
[AJS19] Matthew Aldridge, Oliver Johnson, and Jonathan Scarlett. Group testing: An information theory perspective. Foundations and Trends in Communications and Information Theory, 15(3-4):196–392, 2019.
[Blo70] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422–426, 1970.
[BM03] Andrei Z. Broder and Michael Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485–509, 2003.
[DH93] Ding-Zhu Du and Frank K. Hwang. Combinatorial Group Testing and its Applications. World Scientific, 1993.
[Dor43] Robert Dorfman. The detection of defective members of large populations. Ann. Math. Stat., 14:436–440, 1943.
[Gol20] Christian Gollier. Optimal group testing to exit the COVID confinement. Toulouse School of Economics, 2020. Technical report.
[Gos20] Olivier Gossner. Group testing against COVID-19. Center for Research in Economics and Statistics, 2020. Working Papers 2020-02.
[LLZA12] Aiyi Liu, Chunling Liu, Zhiwei Zhang, and Paul S. Albert. Optimality of group testing in the presence of misclassification. Biometrika, 99:245–251, 2012.
[Mit01] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.
[MT11] Marc Mezard and Cristina Toninelli. Group testing with random pools: Optimal two-stage algorithms. IEEE Transactions on Information Theory, 57(3):1736–1745, 2011.
[PR11] Ely Porat and Amir Rothschild. Explicit nonadaptive combinatorial group testing schemes. IEEE Transactions on Information Theory, 57:7982–7989, December 2011.
[SCea] Erhard Seifried, Sandra Ciesek, et al. Pool testing of SARS-CoV-2 samples increases test capacity.
[YAST+20] Idan Yelin, Noga Aharony, Einat Shaer-Tamar, Amir Argoetti, Esther Messer, Dina Berenbaum, Einat Shafran, Areen Kuzli, Nagam Gandali, Tamar Hashimshony, Yael Mandel-Gutfreund, Michael Halberthal, Yuval Geffen, Moran Szwarcwort-Cohen, and Roy Kishony. Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv, 2020.