Noisy Adaptive Group Testing: Bounds and Algorithms
Jonathan Scarlett
Abstract
The group testing problem consists of determining a small set of defective items from a larger set of items based on a number of possibly-noisy tests, and is relevant in applications such as medical testing, communication protocols, pattern matching, and many more. One of the defining features of the group testing problem is the distinction between the non-adaptive and adaptive settings: In the non-adaptive case, all tests must be designed in advance, whereas in the adaptive case, each test can be designed based on the previous outcomes. While tight information-theoretic limits and near-optimal practical algorithms are known for the adaptive setting in the absence of noise, surprisingly little is known in the noisy adaptive setting. In this paper, we address this gap by providing information-theoretic achievability and converse bounds under various noise models, as well as a slightly weaker achievability bound for a computationally efficient variant. These bounds are shown to be tight or near-tight in a broad range of scaling regimes, particularly at low noise levels. The algorithms used for the achievability results have the notable feature of only using two or three stages of adaptivity.
Index Terms
Group testing, sparsity, information-theoretic limits, adaptive algorithms
I. INTRODUCTION
The group testing problem consists of determining a small subset S of “defective” items within a larger set of items {1, . . . , p}, based on a number of possibly-noisy tests. This problem has a long history in medical testing [1], and has regained significant attention following new applications in areas such as communication protocols [2], pattern matching [3], and database systems [4], and connections with compressive sensing [5], [6]. In the noiseless setting, each test takes the form

    Y = ∨_{j∈S} X_j,  (1)

where the test vector X = (X_1, . . . , X_p) ∈ {0, 1}^p indicates which items are included in the test, and Y is the resulting observation. That is, the output indicates whether at least one defective item was included in the test. One wishes to design a sequence of tests X^(1), . . . , X^(n), with n ideally as small as possible, such that the outcomes can be used to reliably recover the defective set S with probability close to one.

The author is with the Department of Computer Science & Department of Mathematics, National University of Singapore (e-mail: [email protected]). This work was supported by an NUS Startup Grant.
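As a concrete illustration, the noiseless observation model (1) can be simulated with a few lines of code. This is a minimal sketch of ours, not code from the paper; all names are our own.

```python
def noiseless_test(x, S):
    """Noiseless group test (1): returns 1 iff the test pool x (a 0/1
    inclusion vector over the p items) contains at least one defective item."""
    return int(any(x[j] for j in S))

# Toy example with p = 6 items and defective set S = {1, 4}.
S = {1, 4}
assert noiseless_test([0, 1, 0, 0, 0, 0], S) == 1  # pool contains defective item 1
assert noiseless_test([1, 0, 1, 0, 0, 1], S) == 0  # pool contains only non-defectives
```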
October 5, 2018 DRAFT

One of the defining features of the group testing problem is the distinction between the non-adaptive and adaptive settings. In the non-adaptive setting, every test must be designed prior to observing any outcomes, whereas in the adaptive setting, a given test X^(i) can be designed based on the previous outcomes Y^(1), . . . , Y^(i−1). It is of considerable interest to determine the extent to which this additional freedom helps in reducing the number of tests. In the noiseless setting, a number of interesting results have been discovered along these lines:

• When the number of defectives k := |S| scales as k = O(p^{1/3}), the minimal number of tests permitting vanishing error probability scales as n = (1/log 2)(k log(p/k))(1 + o(1)) in both the adaptive and non-adaptive settings [7], [8]. Hence, at least information-theoretically, there is no asymptotic adaptivity gain.

• For scalings of the form k = Θ(p^θ) with θ ∈ (1/3, 1), the behavior n = (1/log 2)(k log(p/k))(1 + o(1)) remains unchanged in the adaptive setting [7], but it remains open as to whether this can be attained non-adaptively. For θ close to one, the best known non-adaptive achievability bounds are far from this threshold.

• Even in the first case above with no adaptivity gain, the adaptive algorithms known to achieve the optimal threshold are practical, having low storage and computation requirements [9]. In contrast, in the non-adaptive case, only computationally intractable algorithms have been shown to attain the optimal threshold [8], [10].

• It has recently been established that there is a provable adaptivity gap under certain scalings of the form k = Θ(p), i.e., the linear regime [11], [12].

Despite this progress for the noiseless setting, there has been surprisingly little work on adaptivity in noisy settings; the vast majority of existing group testing algorithms for random noise models are non-adaptive [13]–[16].
In this paper, we address this gap by providing new achievability and converse bounds for noisy adaptive group testing, focusing primarily on a widely-adopted symmetric noise model. Before outlining our contributions, we formally introduce the setup.

A. Problem Setup
Except where stated otherwise, we let the defective set S be uniform on the (p choose k) subsets of {1, . . . , p} of cardinality k. As mentioned above, an adaptive algorithm iteratively designs a sequence of tests X^(1), . . . , X^(n), with X^(i) ∈ {0, 1}^p, and the corresponding outcomes are denoted by Y = (Y^(1), . . . , Y^(n)), with Y^(i) ∈ {0, 1}. A given test is allowed to depend on all of the previous outcomes.

Generalizing (1), we consider the following widely-adopted symmetric noise model:

    Y = (∨_{j∈S} X_j) ⊕ Z,  (2)

where Z ∼ Bernoulli(ρ) for some ρ ∈ (0, 1/2), and ⊕ denotes modulo-2 addition. In Section V, we will also consider other asymmetric noise models.

Given the tests and their outcomes, a decoder forms an estimate Ŝ of S. We consider the exact recovery criterion, in which the error probability is given by

    P_e := P[Ŝ ≠ S],  (3)

and is taken over the randomness of the defective set S, the tests X^(1), . . . , X^(n) (if randomized), and the noisy outcomes Y^(1), . . . , Y^(n).

As a stepping stone towards exact recovery results, we will also consider a less stringent partial recovery criterion, in which we allow for up to d_max false positives and up to d_max false negatives, for some d_max > 0. That is, the error probability is

    P_e(d_max) := P[d(S, Ŝ) > d_max],  (4)

where

    d(S, Ŝ) = max{|S \ Ŝ|, |Ŝ \ S|}.  (5)

Understanding partial recovery is, of course, also of interest in its own right. However, the results of [8], [17] indicate that there is little or no adaptivity gain under this criterion, at least when k = o(p) and d_max = Θ(k).

Except where stated otherwise, we assume that the noise level ρ and number of defectives k are known. In Section IV, we will consider cases where k is only approximately known.

B. Related work
Non-adaptive setting.
The information-theoretic limits of group testing were first studied in the Russian literature [13], [18], [19], and have recently become increasingly well-understood [8], [10], [17], [20]–[22]. Among the existing works, the results most relevant to the present paper are as follows:

• In the adaptive setting, it was shown by Baldassini et al. [7] that if the output Y is produced by passing the noiseless outcome U = ∨_{j∈S} X_j through a binary channel P_{Y|U}, then the number of tests for attaining P_e → 0 must satisfy n ≥ (1/C)(k log(p/k))(1 − o(1)), where C is the Shannon capacity of P_{Y|U} in nats. For the symmetric noise model (2), this yields

    n ≥ ( k log(p/k) / (log 2 − H(ρ)) )(1 − o(1)),  (6)

where H(ρ) = ρ log(1/ρ) + (1 − ρ) log(1/(1 − ρ)) is the binary entropy function.

• In the non-adaptive setting with symmetric noise, it was shown that an information-theoretic threshold decoder attains the bound (6) for k = o(p) under the partial recovery criterion with d_max = Θ(k) and an arbitrarily small implied constant [8], [17]. For exact recovery, a more complicated bound was also given in [8] that matches (6) when k = Θ(p^θ) for sufficiently small θ > 0.

Several non-adaptive noisy group testing algorithms have been shown to come with rigorous guarantees. We will use two of these non-adaptive algorithms as building blocks in our adaptive methods:

• The Noisy Combinatorial Orthogonal Matching Pursuit (NCOMP) algorithm checks, for each item, the proportion of tests it was included in that returned positive, and declares the item to be defective if this number exceeds a suitably-chosen threshold. This is known to provide optimal scaling laws for the regime k = Θ(p^θ) (θ ∈ (0, 1)) [14], [15], albeit with somewhat suboptimal constants. (Here and subsequently, the function log(·) has base e, and information measures have units of nats.)

• The method of separate decoding of items, also known as separate testing of inputs [13], [16], also considers the items separately, but uses all of the tests. Specifically, a given item’s status is selected via a binary hypothesis test. This method was studied for k = O(1) in [13], and for k = Θ(p^θ) in [16]; in particular, it was shown that the number of tests is within a factor log 2 of the optimal information-theoretic threshold under exact recovery as θ → 0, and under partial recovery (with d_max = Θ(k)) for all θ ∈ (0, 1).

A different line of works has considered group testing with adversarial errors (e.g., see [23]–[25]); these are less relevant to the present paper.

Adaptive setting.
As mentioned above, adaptive algorithms are well-understood in the noiseless setting [26], [27]. To our knowledge, the first algorithm that was proved to achieve n = (1/log 2)(k log(p/k))(1 + o(1)) for all k = o(p) is Hwang’s generalized binary splitting algorithm [9], [26]. More recently, there has been interest in algorithms that only use limited rounds of adaptivity [27]–[29], and among other things, it has been shown that the same guarantee can be attained using at most four stages [27]. The two-stage setting is often considered to be of particular interest [29]–[31].

In the noisy adaptive setting, the existing work is relatively limited. In [32], an adaptive algorithm called GROTESQUE was shown to provide optimal scaling laws in terms of both samples and runtime. Our focus in this paper is only on the number of samples, but with a much greater emphasis on the constant factors. In [33, Ch. 4], noisy adaptive group testing algorithms were proposed for two different noise models based on the Z-channel and reverse Z-channel, also achieving an order-optimal required number of tests with reasonable constant factors. We discuss these noise models further in Section V.

C. Contributions
In this paper, we characterize both the information-theoretic limits and the performance of practical algorithms for noisy adaptive group testing, characterizing the asymptotic required number of tests for P_e → 0 as p → ∞. For the achievability part, we propose an adaptive algorithm whose first stage can be taken as any non-adaptive algorithm that comes with partial recovery guarantees, and whose second stage (and third stage in a refined version) improve this initial estimate. By letting the first stage use the information-theoretic threshold decoder of [8], we attain an achievability bound that is near-tight in many cases of interest, whereas by using separate decoding of items as per [13], [16], we attain a slightly weaker guarantee while still maintaining computational efficiency. In addition, we provide a novel converse bound showing that Ω(k log k) tests are always necessary, and hence, the implied constant in any scaling of the form n = Θ(k log(p/k)) with k = Θ(p^θ) must grow unbounded as θ → 1. (This is because k log k = Θ((θ/(1−θ)) k log(p/k)) when k = Θ(p^θ).)

Our results are summarized in Figure 1, where we observe a considerable gain over the best known non-adaptive guarantees, particularly when the noise level ρ is small. Although there is a gap between the achievability and converse bounds for most values of θ, the converse has the notable feature of showing that n = (k log(p/k) / (log 2 − H(ρ)))(1 + o(1)) is not always achievable, as one might conjecture based on the noiseless setting. In addition, the gap between the (refined) achievability bound and the converse bound is zero or nearly zero in at least two cases: (i) θ is small; (ii)
θ is close to one and ρ is close to zero. The algorithms used in our upper bounds have the notable feature of only using two or three rounds of adaptivity, i.e., two in the simple version, and three in the refined version.

[Figure 1: two panels plotting the asymptotic ratio of k log(p/k) to n against the value θ such that k = Θ(p^θ), with curves for Non-adaptive, Adaptive (Simple), Adaptive (Practical), Adaptive (Refined), and Converse.]

Figure 1: Asymptotic bounds on the number of tests required for vanishing error probability under the noise levels ρ = 0.11 (Left) and ρ = 10⁻⁴ (Right).

In addition to these contributions for the symmetric noise model, we provide the following results for various other observation models:

• In the noiseless case, we recover the threshold n = (1/log 2)(k log(p/k))(1 + o(1)) for all θ ∈ (0, 1) using a two-stage adaptive algorithm. Previously, the best known number of stages was four [27].

• For the Z-channel noise model (defined formally in Section V), we show that one can attain n = (1/C)(k log(p/k))(1 + o(1)) for all θ ∈ (0, 1), where C is the Shannon capacity of the channel. This matches the general converse bound given in [7], i.e., the generalized version of (6). As a result, we improve on the above-mentioned bounds of [33], which contain reasonable yet strictly suboptimal constant factors.

• For the reverse Z-channel noise model (defined formally in Section V), we prove a similar converse bound to the one mentioned above for the symmetric noise model, thus showing that one cannot match the converse bound of [7] for all θ ∈ (0, 1).

The remainder of the paper is organized as follows. For the symmetric noise model, we present the simple version of our achievability bound in Section II, the refined version in Section III, and the converse in Section IV. The other observation models mentioned above are considered in Section V, and conclusions are drawn in Section VI.

II. ACHIEVABILITY (SIMPLE VERSION)

In this section, we formally state our simplest achievability results; a more complicated but also more powerful result is given in Section III.
Using a common two-stage approach, we provide achievability bounds for both a computationally intractable information-theoretic decoder and a computationally efficient decoder.
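To fix ideas before the formal statements, the non-adaptive ingredients of the first stage can be simulated as follows. This is an illustrative sketch of ours, not the paper's implementation; in particular, the Bernoulli test density log(2)/k is an assumption on our part, chosen as a standard value from the non-adaptive literature.

```python
import math
import random

def bernoulli_design(n, p, k, rng):
    """n x p test matrix with i.i.d. Bernoulli(min(1, log(2)/k)) entries.
    (The density log(2)/k is our illustrative choice, not from this paper.)"""
    q = min(1.0, math.log(2) / k)
    return [[1 if rng.random() < q else 0 for _ in range(p)] for _ in range(n)]

def noisy_outcomes(X, S, rho, rng):
    """Symmetric noise model (2): OR of the pooled items, flipped w.p. rho."""
    ys = []
    for x in X:
        u = int(any(x[j] for j in S))       # noiseless outcome
        z = 1 if rng.random() < rho else 0  # Bernoulli(rho) noise
        ys.append(u ^ z)
    return ys

rng = random.Random(0)
p, k, rho, n = 500, 10, 0.11, 200
S = set(rng.sample(range(p), k))
Y = noisy_outcomes(bernoulli_design(n, p, k, rng), S, rho, rng)
```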
A. Information-theoretic decoder
The two-stage algorithm that we adopt is outlined informally in Algorithm 1; we describe the steps more precisely in the proof of Theorem 1 below. The idea is to use a non-adaptive algorithm with partial recovery guarantees, and then refine the solution by resolving the false negatives and false positives separately, i.e., Steps 2a and 2b. While these latter steps are stated separately in Algorithm 1, the tests that they use can be performed together in a single round of adaptivity, so that the overall algorithm is a two-stage procedure.
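The two refinement steps can be sketched in code as follows. This is our own illustrative simplification: the simple per-item threshold rule below merely stands in for the NCOMP variant described in Appendix C, and all names are ours.

```python
def ncomp_like(test_results, threshold=0.5):
    """Step-2a stand-in: declare an item defective if the fraction of the
    tests containing it that returned positive exceeds the threshold.
    (The threshold value is an arbitrary illustrative choice.)"""
    return {j for j, outcomes in test_results.items()
            if sum(outcomes) > threshold * len(outcomes)}

def individual_majority(items, test_item, n_rep):
    """Step-2b stand-in: test each item individually n_rep times and keep
    those returning positive in at least half of the repetitions."""
    return {j for j in items
            if 2 * sum(test_item(j) for _ in range(n_rep)) >= n_rep}

# Toy usage with noiseless individual tests and item 3 defective:
assert individual_majority({3, 7}, lambda j: int(j == 3), 9) == {3}
```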
Algorithm 1:
Two-stage algorithm for noisy group testing (informal).

1. Apply the information-theoretic threshold decoder of [8] (see Appendix A) to the ground set {1, . . . , p} to find an estimate Ŝ of cardinality k such that

    max{|Ŝ \ S|, |S \ Ŝ|} ≤ αk  (7)

with high probability, for some small α > 0.

2a. Apply a variation of NCOMP [14] (see Appendix C) to the reduced ground set {1, . . . , p} \ Ŝ to exactly identify the false negatives S \ Ŝ from the first step. Let these items be denoted by Ŝ′_a.

2b. Test the items in Ŝ individually ñ times (for some ñ to be specified), and let Ŝ′_b contain the items that returned positive at least ñ/2 times. The final estimate of the defective set is given by Ŝ := Ŝ′_a ∪ Ŝ′_b.

Our first main information-theoretic achievability result is as follows.

Theorem 1.
Under the symmetric noisy group testing setup with crossover probability ρ ∈ (0, 1/2), with k = Θ(p^θ) for some θ ∈ (0, 1), there exists a two-stage adaptive group testing algorithm such that

    n ≤ ( k log(p/k) / (log 2 − H(ρ)) + 2k log k / log(1/(4ρ(1−ρ))) )(1 + o(1))  (8)

and such that P_e → 0 as p → ∞.

Proof. We study the guarantees of the three steps in Algorithm 1, and the number of tests used for each one.
Step 1.
It was shown in [8] that, for an arbitrarily small constant α > 0, there exists a non-adaptive group testing algorithm returning some set Ŝ of cardinality k such that

    max{|Ŝ \ S|, |S \ Ŝ|} ≤ αk,  (9)

with probability approaching one, with the number of tests being at most

    n ≤ ( k log(p/k) / (log 2 − H(ρ)) )(1 + o(1)).  (10)

In Appendix A, we recap the decoding algorithm and its analysis. The non-adaptive test design for this stage is the ubiquitous i.i.d. Bernoulli design.

Step 2a.
Let us condition on the first step being successful, in the sense that (9) holds. We claim that there exists a non-adaptive algorithm that, when applied to the reduced ground set {1, . . . , p} \ Ŝ, returns Ŝ′_a containing precisely the set of (at most αk) defective items S \ Ŝ with probability approaching one, with the number of samples behaving as

    n = O( αk log(p/(αk)) ).  (11)

If the number of defectives k′ := |S \ Ŝ| in the reduced ground set were known, this would simply be an application of the O(k′ log p) scaling derived in [14] for the NCOMP algorithm. In Appendix C, we adapt the algorithm and analysis of [14] to handle the case that k′ is only known up to a constant factor.

In fact, in the present setting, we only know that k′ ∈ [0, αk], so we do not even know k′ up to a constant factor. To get around this, we apply a simple trick that is done purely for the purpose of the analysis: Instead of applying the modified NCOMP algorithm directly to {1, . . . , p} \ Ŝ, we apply it to the slightly larger set in which αk “dummy” defective items are included. Then, the number of defectives is in [αk, 2αk], and is hence known up to a factor of two. We do not expect that this trick would ever be useful in practice, but it is convenient for the sake of the analysis.

Step 2b.
Since we conditioned on the first step being successful, at most αk of the k items in Ŝ are non-defective. In the final step, we simply test each item in Ŝ individually ñ times, and declare the item positive if and only if at least half of the outcomes are positive.

To study the success probability, we use a well-known Chernoff-based concentration bound for Binomial random variables: If Z ∼ Binomial(N, q), then

    P[Z ≤ Nq′] ≤ e^{−N·D(q′‖q)},  q′ < q,  (12)

where D(q′‖q) = q′ log(q′/q) + (1−q′) log((1−q′)/(1−q)) is the binary KL divergence function.

Fix an arbitrary item j, and let Ñ_{j,1} be the number of its ñ tests that are positive. Since the test outcomes are distributed as Bernoulli(1−ρ) for defective j and Bernoulli(ρ) for non-defective j, we obtain from (12) that

    P[Ñ_{j,1} ≤ ñ/2] ≤ e^{−ñ·D(1/2 ‖ 1−ρ)} = e^{−ñ·D(1/2 ‖ ρ)},  j ∈ S  (13)
    P[Ñ_{j,1} ≥ ñ/2] ≤ e^{−ñ·D(1/2 ‖ ρ)},  j ∉ S.  (14)

Hence, we obtain from the union bound over the k items in Ŝ that

    P[Ŝ′_b ≠ (S ∩ Ŝ)] ≤ k · e^{−ñ·D(1/2 ‖ ρ)}.  (15)

For any η > 0, the right-hand side tends to zero as p → ∞ under the choice

    ñ = ( log k / D(1/2 ‖ ρ) )(1 + η),  (16)

which gives a total number of tests in Step 2b of

    n_b = ( k log k / D(1/2 ‖ ρ) )(1 + η).  (17)

The proof is concluded by noting that η can be arbitrarily small, and writing D(1/2 ‖ ρ) = (1/2) log(1/(4ρ(1−ρ))).
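The quantities appearing in this proof are easy to evaluate numerically. The following sketch (our code, not from the paper) computes the binary KL divergence, the repetition count ñ from (16), and checks the identity D(1/2 ‖ ρ) = (1/2) log(1/(4ρ(1−ρ))) used in the final step:

```python
import math

def binary_kl(q1, q):
    """D(q1 || q) = q1 log(q1/q) + (1 - q1) log((1 - q1)/(1 - q)), in nats."""
    return q1 * math.log(q1 / q) + (1 - q1) * math.log((1 - q1) / (1 - q))

rho, k, eta = 0.11, 100, 0.1
d_half = binary_kl(0.5, rho)

# Identity used at the end of the proof of Theorem 1:
assert abs(d_half - 0.5 * math.log(1.0 / (4 * rho * (1 - rho)))) < 1e-12

# Per-item repetitions (16), Step-2b total (17), and union bound (15):
n_tilde = math.log(k) / d_half * (1 + eta)
n_b = k * n_tilde
p_err = k * math.exp(-n_tilde * d_half)  # equals k^(-eta), vanishing in k
assert abs(p_err - k ** (-eta)) < 1e-9
```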
A weakness of Theorem 1 is that it does not achieve the threshold n = ( k log(p/k) / (log 2 − H(ρ)) )(1 + o(1)) for any value of θ > 0 (see Figure 1), even though such a threshold is achievable for sufficiently small θ even non-adaptively [8]. We overcome this limitation via a refined three-stage algorithm in Section III.

B. Practical decoder
Of the three steps given in Algorithm 1 and the proof of Theorem 1, the only one that is computationally demanding is the first, which uses an information-theoretic threshold decoder to identify S up to a distance (cf. (5)) of d(S, Ŝ) ≤ αk, for small α > 0. A similar approximate recovery result was also shown in [16] for separate decoding of items, which is computationally efficient. The asymptotic threshold on n for separate decoding of items is only a log 2 factor worse than the optimal information-theoretic threshold [16], and this fact leads to the following counterpart to Theorem 1.

Theorem 2.
Under the symmetric noisy group testing setup with crossover probability ρ ∈ (0, 1/2), and k = Θ(p^θ) for some θ ∈ (0, 1), there exists a computationally efficient two-stage adaptive group testing algorithm such that

    n ≤ ( k log(p/k) / (log 2 · (log 2 − H(ρ))) + 2k log k / log(1/(4ρ(1−ρ))) )(1 + o(1))  (18)

and P_e → 0 as p → ∞.

The proof is nearly identical to that of Theorem 1, except that the required number of tests in the first stage is multiplied by 1/log 2 in accordance with [16]. For brevity, we omit the details.

III. ACHIEVABILITY (REFINED VERSION)

As mentioned previously, a weakness of Theorem 1 is that it only achieves the behavior n ≤ ( k log(p/k) / (log 2 − H(ρ)) )(1 + o(1)) (for which a matching converse is known [7]) in the limit as θ → 0, even though this can be achieved even non-adaptively for sufficiently small θ [8]. Since adaptivity provides extra freedom in the design, we should expect the corresponding bounds to be at least as good as in the non-adaptive setting.

While we could simply take the better of Theorem 1 and the exact recovery result of [8], this is a rather unsatisfying solution, and it leads to a discontinuity in the asymptotic threshold (cf. Figure 1). It is clearly more desirable to construct an adaptive scheme that “smoothly” transitions between the two. In this section, we attain such an improvement by modifying Algorithm 1 in two ways. The resulting algorithm is outlined informally in Algorithm 2, and the modifications are as follows:

• In the first stage, instead of learning S up to a distance of αk for some constant α ∈ (0, 1), we learn it up to a distance of k^γ for some γ ∈ (0, 1). The non-adaptive partial recovery analysis of [8] requires non-trivial modifications for this purpose; we provide the details in Appendix A.

• We split Step 2b of Algorithm 1 into two stages, one comprising Step 2b in Algorithm 2, and the other comprising Step 3. The former of these identifies most of the defective items, and the latter resolves the rest.
It is worth noting that, at least using our analysis techniques, neither of the above modifications alone is enough to obtain a bound that is always at least as good as the non-adaptive exact recovery result of [8]. We will shortly see, however, that the two modifications combined do suffice.
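The new intermediate step (keeping the items with the most positive outcomes, rather than thresholding each item at half of its repetitions) can be sketched as follows; this is our own illustrative code, with all names ours:

```python
def keep_most_positive(counts, num_keep):
    """Given counts[j] = number of positive outcomes among item j's
    individual tests, keep the num_keep items with the highest counts
    (ties broken arbitrarily)."""
    ranked = sorted(counts, key=lambda j: counts[j], reverse=True)
    return set(ranked[:num_keep])

# Toy usage: keep 2 of the 3 candidates, discarding the likely non-defective.
assert keep_most_positive({5: 9, 6: 8, 7: 1}, 2) == {5, 6}
```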
Algorithm 2:
Three-stage algorithm for noisy group testing (informal).

1. Apply the information-theoretic threshold decoder of [8] (see Appendix A) to the ground set {1, . . . , p} to find an estimate Ŝ of cardinality k such that

    max{|Ŝ \ S|, |S \ Ŝ|} ≤ k^γ  (19)

with high probability, where γ ∈ (0, 1).

2a. Apply a variation of NCOMP [14] (see Appendix C) to the reduced ground set {1, . . . , p} \ Ŝ to exactly identify the false negatives from the first step. Let these items be denoted by Ŝ′_a.

2b. Test each item in Ŝ individually ň times (for some ň to be specified), and let Ŝ′_b ⊆ Ŝ contain the k − αk items that returned positive the highest number of times, for some small α > 0.

3. Test the items in Ŝ \ Ŝ′_b (of which there are αk) individually ñ times (for some ñ to be specified), and let Ŝ′_c contain the items that returned positive at least ñ/2 times. The final estimate of the defective set is given by Ŝ := Ŝ′_a ∪ Ŝ′_b ∪ Ŝ′_c.

The following theorem characterizes the asymptotic number of tests required.

Theorem 3.
Under the symmetric noisy group testing setup with crossover probability ρ ∈ (0, 1/2), under the scaling k = Θ(p^θ) for some θ ∈ (0, 1), there exists a three-stage adaptive group testing algorithm such that

    n ≤ inf_{γ∈(0,1), δ∈(0,1)} ( max{ n_MI,1, n_MI,2(γ, δ), n_Conc(γ, δ) } + n_Indiv(γ) )(1 + o(1))  (20)

and P_e → 0 as p → ∞, where:

• The standard mutual information based term is

    n_MI,1 = k log(p/k) / (log 2 − H(ρ)).  (21)

• An additional mutual information based term is

    n_MI,2(γ, δ) = [ 2 log 2 / ((1−ρ) log((1−ρ)/ρ)) ] · [ 1/(1−δ) ] · ( (1−θ)k log p + 2(1−γ)k log k ).  (22)

• The term associated with a concentration bound is

    n_Conc(γ, δ) = [ 4(1 + δ(1−ρ)) log 2 / (δ²(1−ρ)) ] · (1−γ)k log k.  (23)

• The term associated with individual testing is

    n_Indiv(γ) = γk log k / D(ρ ‖ 1−ρ).  (24)

While the theorem statement is somewhat complex, it is closely related to other simpler results on group testing:

• In the limit as γ → 0, the term max{ n_MI,1, n_MI,2(γ, δ), n_Conc(γ, δ) } corresponds to the condition for exact recovery derived in [8]. Since n_Indiv(γ) becomes negligible as γ → 0, this means that we have the above-mentioned desired property of being at least as good as the exact recovery result.

• Taking γ → 1 and δ → 0 in a manner such that (1−γ)/δ² → 0, we recover a strengthened version of Theorem 1 with D(1/2 ‖ ρ) = (1/2) log(1/(4ρ(1−ρ))) increased to D(ρ ‖ 1−ρ).

The parameter δ controls the trade-off between the concentration behavior associated with n_Conc and the mutual information based term n_MI,2.

A. Proof of Theorem 3
The proof follows similar steps to those of Theorem 1, considering the four steps of Algorithm 2 separately.
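Before going through the steps, it may help to see the bound (20) evaluated numerically. The sketch below (our code) transcribes the terms (21)–(24) and brute-forces the infimum over a (γ, δ) grid; the grid resolution and the example parameters are arbitrary choices of ours:

```python
import math

def h(r):
    """Binary entropy in nats."""
    return -r * math.log(r) - (1 - r) * math.log(1 - r)

def binary_kl(q1, q):
    return q1 * math.log(q1 / q) + (1 - q1) * math.log((1 - q1) / (1 - q))

def theorem3_bound(p, k, rho, grid=50):
    """Evaluate the right-hand side of (20) by brute force over gamma, delta."""
    theta = math.log(k) / math.log(p)
    lk, lpk = math.log(k), math.log(p / k)
    n_mi1 = k * lpk / (math.log(2) - h(rho))          # (21)
    best = float("inf")
    for i in range(1, grid):
        g = i / grid
        n_indiv = g * k * lk / binary_kl(rho, 1 - rho)  # (24)
        for j in range(1, grid):
            d = j / grid
            n_mi2 = (2 * math.log(2) / ((1 - rho) * math.log((1 - rho) / rho))
                     / (1 - d) * ((1 - theta) * k * math.log(p)
                                  + 2 * (1 - g) * k * lk))           # (22)
            n_conc = (4 * (1 + d * (1 - rho)) * math.log(2)
                      / (d ** 2 * (1 - rho)) * (1 - g) * k * lk)     # (23)
            best = min(best, max(n_mi1, n_mi2, n_conc) + n_indiv)
    return best

print(theorem3_bound(p=10**6, k=10**3, rho=0.11))
```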
Step 1.
We show in Appendix A that the approximate recovery result of [8] can be extended as follows: There exists a non-adaptive algorithm recovering an estimate Ŝ of cardinality k such that d(S, Ŝ) ≤ k^γ with probability approaching one, provided that the number of tests n satisfies

    n ≥ max{ n_MI,1, n_MI,2(γ, δ), n_Conc(γ, δ) } · (1 + o(1))  (25)

for some δ ∈ (0, 1), under the definitions in (21)–(23). This algorithm and its corresponding estimate Ŝ constitute the first step.

Step 2a.
The algorithm and analysis for this stage are identical to those of Theorem 1: We use the variation of NCOMP given in Appendix C to identify all defective items in {1, . . . , p} \ Ŝ with probability approaching one, while only using O(k^γ log p) = o(k log p) tests.

Step 2b.
For this step, we need to show that the set Ŝ′_b constructed in Algorithm 2 only contains defective items. Recall that this set is constructed by testing each item in Ŝ individually ň times, and keeping the items that returned positive the highest number of times. Since Ŝ′_b contains |Ŝ| − αk items, requiring all of these items to be defective is equivalent to requiring that the set of αk items with the smallest number of positive outcomes includes the k^γ (or fewer) non-defective items in Ŝ. For any ζ > 0, the following two conditions suffice for this purpose:

• Event A₁: All non-defective items in Ŝ return positive less than ζň times;

• Event A₂: At most αk − k^γ defective items return positive less than ζň times.

Here we assume that k is sufficiently large so that αk > k^γ, which is valid since γ < 1 and α > 0 are constant.

Fix an arbitrary item j, and let Ň_{j,1} be the number of its ň tests that are positive. Since the test outcomes are distributed as Bernoulli(1−ρ) for defective j and Bernoulli(ρ) for non-defective j, we obtain from (12) that

    P[Ň_{j,1} ≤ ζň] ≤ e^{−ň·D(ζ ‖ 1−ρ)},  j ∈ S  (26)
    P[Ň_{j,1} ≥ ζň] ≤ e^{−ň·D(ζ ‖ ρ)},  j ∉ S.  (27)

(By letting the first stage of Algorithm 2 use separate decoding of items [16], one can obtain a strengthened version of Theorem 2 with the same improvement. This result is omitted for the sake of brevity, as the main purpose of the refinements given in this section is to obtain a bound that is always at least as good as the non-adaptive information-theoretic bound of [8].)
Hence, we obtain from the union bound over the non-defective items in Ŝ that

    P[A₁ᶜ] ≤ k^γ · e^{−ň·D(ζ ‖ ρ)},  (28)

which is upper bounded by δ′ > 0 as long as

    ň ≥ log(k^γ/δ′) / D(ζ ‖ ρ).  (29)

Moreover, regarding the event A₂, the average number of defective items that return positive less than ζň times is upper bounded by k·e^{−ň·D(ζ ‖ 1−ρ)} (recall that |Ŝ| = k), and hence, Markov’s inequality gives

    P[A₂ᶜ] ≤ k·e^{−ň·D(ζ ‖ 1−ρ)} / (αk − k^γ).  (30)

This is upper bounded by k / ((log k)(αk − k^γ)) → 0 as long as ň ≥ (log log k) / D(ζ ‖ 1−ρ). This, in turn, behaves as o(log k) for any ζ < 1−ρ. Hence, we are left with only the condition on ň in (29), and choosing ζ arbitrarily close to 1−ρ means that we only need the following to hold for arbitrarily small η > 0:

    ň ≥ ( γ log k / D(1−ρ ‖ ρ) )(1 + η),  (31)

since log(k^γ/δ′) = (γ log k)(1 + o(1)) for arbitrarily small δ′. Multiplying by k (i.e., the number of items that are tested individually ň times) and noting that D(1−ρ ‖ ρ) = D(ρ ‖ 1−ρ), we deduce that the number of tests in this stage is asymptotically at most n_Indiv(γ), defined in (24).

Step 3.
This step is the same as Step 2b in Algorithm 1, but we are now working with αk items rather than k items. As a result, the number of tests required is O(αk log k), meaning that the coefficient to k log k can be made arbitrarily small by a suitable choice of α.

IV. CONVERSE
To our knowledge, the best-known existing converse bound for the symmetric noise model in the adaptive setting is the capacity-based bound of [7], shown in (6). On the other hand, the preceding achievability bounds contain k log k terms, meaning that the gap between the achievability and converse grows unbounded as θ → 1 under the scaling k = Θ(p^θ) (since k log k = Θ((θ/(1−θ)) k log(p/k))). In this section, we provide a novel converse bound revealing that Ω(k log k) behavior is unavoidable.

There is a minor caveat to this converse result: We have not been able to prove it in the case that S is known to have cardinality exactly k, but rather, only in the case that it is known to have cardinality either k or k − 1. We strongly conjecture that this distinction has no impact on the fundamental limits; we argue in Appendix B that Theorem 1 remains true even when k is only known up to a multiplicative 1 + o(1) factor, and Theorem 3 remains true when k is only known up to an additive o(k^γ) term. Since we assume that k → ∞, these assumptions are much milder than the assumption |S| ∈ {k − 1, k}.

To make the model definition more precise, fix k ≤ p, and define

    S_{k,p} = { S ⊆ {1, . . . , p} : |S| = k },  (32)

and similarly for S_{k−1,p}. We consider the following distribution for the random defective set:

    S ∼ Uniform(S_{k,p} ∪ S_{k−1,p}).  (33)

Under this slightly modified model, we have the following.

Theorem 4.
Consider the symmetric noisy group testing setup with crossover probability ρ ∈ (0, 1/2), S distributed according to (33), and k → ∞ with k ≤ p. For any adaptive algorithm, in order to achieve P_e → 0, it is necessary that

    n ≥ max{ k log(p/k) / (log 2 − H(ρ)), k log k / log((1−ρ)/ρ) }(1 − o(1)).  (34)
See Section IV-A.

The first term is precisely (6), so our novelty is in deriving the second term. This result provides the first counter-example to the natural conjecture that the optimal number of tests is ( k log(p/k) / (log 2 − H(ρ)) )(1 + o(1)) whenever k = Θ(p^θ) with θ ∈ (0, 1). Indeed, the Ω(k log k) lower bound reveals that the constant pre-factor to k log(p/k) must grow unbounded as θ → 1.

It is interesting to observe the behavior of Theorems 1, 3, and 4 in the limit as ρ → 0. As one should expect, under the scaling k = Θ(p^θ) for fixed θ ∈ (0, 1), both the achievability and converse bounds (see (8) and (34)) tend towards the noiseless limit ( k log(p/k) / log 2 )(1 + o(1)) as ρ → 0. Moreover, the achievability and converse bounds scale similarly with respect to ρ, in the sense that the k log k term is scaled by Θ( 1 / log(1/ρ) ) in both cases.

In fact, if we consider the refined achievability bound (Theorem 3), we can make a stronger claim. If we take γ → 1 and θ → 1 simultaneously, then the bound in (20) is asymptotically equivalent to n_Indiv(1), since n_{MI,1} scales as k log(p/k) ≪ k log k, whereas the constant factors in n_{MI,2} and n_Conc vanish (see (21)–(23)). Hence, we are only left with n_Indiv(1) in (24), and if ρ is small, then the denominator D(ρ ‖ 1−ρ) = ρ log(ρ/(1−ρ)) + (1−ρ) log((1−ρ)/ρ) is approximately equal to log(1/ρ). The exact same statement is true for the denominator in (34), and hence, the achievability and converse bounds exhibit matching constant factors. Specifically, this statement holds when the order of the limits is first n → ∞, then θ → 1, then ρ → 0. This fact explains the near-identical behavior of the achievability and converse in Figure 1 for θ close to one in the low-noise setting.

On the other hand, for fixed θ ∈ (0, 1), the logarithmic decay of the Θ(1/log(1/ρ)) factor to zero is quite slow, which explains the non-negligible deviation from the noiseless threshold (i.e., a straight line at height 1) in Figure 1, even in the low-noise case.

Another interesting consequence of Theorem 4 is that in the linear regime k = Θ(p), one requires n = Ω(p log p) in the presence of noise. This is in stark contrast to the noiseless setting, where individual testing trivially identifies S with only p tests.

The proof of Theorem 4 is inspired by that of a converse bound for the top-m arm identification problem from the multi-armed bandit (MAB) literature [34]. Compared with the latter, the adaptive group testing setting has a number of distinct features that are non-trivial to handle:

• In group testing, one does not necessarily test one item at a time, whereas in the MAB setting of [34], one pulls one arm at a time.
• In contrast with [34], we do not consider a minimax lower bound, but rather, a Bayesian lower bound for a given distribution on S. The latter is more difficult, in the sense that a Bayesian lower bound implies a minimax lower bound but not vice versa.
• In our setting, the status of each item is binary-valued (defective or non-defective), whereas the construction of a hard MAB problem in [34] consists of three distinct types of items (or “arms” in the MAB terminology), corresponding to high reward, medium reward, and low reward.

We now proceed with the proof.
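As a quick numerical illustration (not part of the formal development), the two terms in the lower bound (34) can be evaluated directly; the parameter values below are arbitrary sample choices, and all logarithms are natural, matching the convention of the paper.

```python
import math

def H(rho):
    """Binary entropy in nats."""
    return -rho * math.log(rho) - (1 - rho) * math.log(1 - rho)

def converse_terms(p, k, rho):
    """The two lower-bound terms in (34): the capacity-based term of [7]
    and the new k*log(k) term."""
    capacity_term = k * math.log(p / k) / (math.log(2) - H(rho))
    klogk_term = k * math.log(k) / math.log((1 - rho) / rho)
    return capacity_term, klogk_term

p, rho = 10**6, 0.01
for theta in (0.3, 0.9):
    k = int(p ** theta)
    t1, t2 = converse_terms(p, k, rho)
    print(f"theta={theta}: capacity term {t1:.0f}, k log k term {t2:.0f}")
```

For small θ the capacity-based term dominates, while for θ close to one the k log k term takes over, consistent with the discussion above.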
A. Proof of Theorem 4
We assume without loss of generality that any given test X^{(i)} is deterministic given Y^{(1)}, . . . , Y^{(i−1)}, and that the final estimate Ŝ is similarly deterministic given the test outcomes. To see that it suffices to consider this case, we note that

P[error] = E[ P[error | A] ] ≥ min_A P[error | A = A], (35)

where A denotes a randomized algorithm (i.e., combination of test design and decoder), and A is a realization of A corresponding to a deterministic algorithm.

Suppose that after S is randomly generated according to (33), a genie reveals S ∪ T to the decoder, where T is a uniformly random set of non-defective items such that |S ∪ T| = 2k (i.e., T has cardinality 2k − |S| ∈ {k, k + 1}). Hence, we are left with an easier group testing problem consisting of 2k items, k − 1 or k of which are defective. Since the prior distribution on S in (33) is uniform, we see that conditioned on the ground set of size 2k, the defective set S is uniform on the \binom{2k}{k} + \binom{2k}{k−1} possibilities.

Without loss of generality, assume that the 2k revealed items are {1, . . . , 2k}, and hence, the new distribution of S given the information from the genie is

S ∼ Uniform( S_{k,2k} ∪ S_{k−1,2k} ). (36)

We first study the error probability conditioned on a given defective set S ⊂ {1, . . . , 2k} having cardinality k. For any such fixed choice, we denote probabilities and expectations (with respect to the noisy outcomes) by P_S and E_S. Fix ε ∈ (0, 1/2), and for each j ∈ S, let N_j be the (random) number of tests containing item j and no other defective items. Since ∑_{j∈S} N_j ≤ n with probability one, we have ∑_{j∈S} E_S[N_j] ≤ n, meaning that at most (1 − ε)k of the j ∈ S have E_S[N_j] ≥ n/((1 − ε)k). For all other j, we have E_S[N_j] ≤ n/((1 − ε)k), and Markov’s inequality gives P_S[ N_j ≥ (1 + 2ε)n/k ] ≤ 1/((1 − ε)(1 + 2ε)) < 1. Denoting ψ(ε) := 1/((1 − ε)(1 + 2ε)) for brevity, we have proved the following.

Lemma 1. For any ε ∈ (0, 1/2), and any set S ⊂ {1, . . . , 2k} of cardinality k, there exist at least εk items j ∈ S such that P_S[ N_j ≥ (1 + 2ε)n/k ] ≤ ψ(ε), where ψ(ε) = 1/((1 − ε)(1 + 2ε)).
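The pigeonhole step behind Lemma 1, and the fact that ψ(ε) < 1 on (0, 1/2), can be sanity-checked numerically; the sketch below (with arbitrary sample values of k and ε) treats the means E_S[N_j] as an arbitrary nonnegative allocation summing to n.

```python
import random

def lightly_tested_items(means, eps):
    """Items j with E_S[N_j] <= n / ((1 - eps) k); the pigeonhole step in
    Lemma 1 guarantees at least eps * k such items."""
    n, k = sum(means), len(means)
    return [j for j, m in enumerate(means) if m <= n / ((1 - eps) * k)]

def psi(eps):
    """The Markov constant from Lemma 1."""
    return 1.0 / ((1 - eps) * (1 + 2 * eps))

rng = random.Random(0)
k, eps = 50, 0.1
for _ in range(1000):
    means = [rng.random() for _ in range(k)]   # any allocation of E_S[N_j]
    assert len(lightly_tested_items(means, eps)) >= eps * k
assert all(psi(i / 100.0) < 1.0 for i in range(1, 50))
```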
The following lemma, consisting of a change of measure between the probabilities under two different defective sets, will also be crucial. Recalling that we are considering test designs that are deterministic given the past samples, we see that N_j is a deterministic function of Y = (Y^{(1)}, . . . , Y^{(n)}), so we write the corresponding function as n_j(y). Moreover, we let Y_S be the set of y sequences that are decoded as S, and we write P[y] and P[Y_S] as shorthands for P[Y = y] and P[Y ∈ Y_S], respectively.

Lemma 2. Given S of cardinality k, for any j ∈ S, and any output sequence y such that n_j(y) ≤ (1 + 2ε)n/k, we have

P_{S\{j}}[y] ≥ P_S[y] ( ρ/(1−ρ) )^{(1+2ε)n/k}. (37)

Moreover, if j ∈ S is such that P_S[ N_j ≥ (1 + 2ε)n/k ] ≤ ψ(ε), then

P_{S\{j}}[Y_S] ≥ ( P_S[Y_S] − ψ(ε) ) ( ρ/(1−ρ) )^{(1+2ε)n/k}. (38)

Proof.
Again using the fact that the test designs are deterministic given the past samples, we can write

P_S[y] = ∏_{i=1}^{n} P_S[ y^{(i)} | y^{(1)}, . . . , y^{(i−1)} ] (39)
       = ∏_{i=1}^{n} P_S[ y^{(i)} | x^{(i)} ], (40)

where x^{(i)} ∈ {0, 1}^p is the i-th test. Note that (40) holds because Y^{(i)} depends on the previous samples only through X^{(i)}. An analogous expression also holds for P_{S\{j}}[y].

Due to the “or” operation in the observation model (2), the only tests for which the outcome probability changes as a result of removing j from S are those for which j was the unique defective item tested. We have at most (1 + 2ε)n/k such tests by assumption, and each of them causes the probability of y^{(i)} (given x^{(i)}) to be multiplied or divided by ρ/(1−ρ). Since ρ < 1/2, we deduce the lower bound in (37), corresponding to the case that all (1 + 2ε)n/k of them are multiplied by this factor.

To prove the second part, we write

P_{S\{j}}[Y_S] ≥ P_{S\{j}}[ Y ∈ Y_S ∩ { N_j ≤ (1 + 2ε)n/k } ] (41)
             ≥ P_S[ Y ∈ Y_S ∩ { N_j ≤ (1 + 2ε)n/k } ] ( ρ/(1−ρ) )^{(1+2ε)n/k} (42)
             ≥ ( P_S[Y_S] − ψ(ε) ) ( ρ/(1−ρ) )^{(1+2ε)n/k}, (43)

where (42) follows from the first part of the lemma, and (43) follows by writing P[A ∩ B] ≥ P[A] − P[B^c].

The idea behind applying this lemma is that if a given y is decoded to S, then it cannot be decoded to S \ {j}; hence, if a given sequence y contributes to P_S[no error], then it also contributes to P_{S\{j}}[error]. We formalize this idea as follows. Recalling that S_{k,2k} is the set of all subsets of {1, . . . , 2k} of cardinality k, we have

∑_{S′∈S_{k−1,2k}} P_{S′}[error] ≥ ∑_{S′∈S_{k−1,2k}} ∑_{j∉S′} P_{S′}[ Y_{S′∪{j}} ] (44)
  = ∑_{S′∈S_{k−1,2k}} ∑_{j∉S′} ∑_{S∈S_{k,2k}} 1{ S = S′∪{j} } P_{S′}[Y_S] (45)
  = ∑_{S′∈S_{k−1,2k}} ∑_{j=1}^{2k} ∑_{S∈S_{k,2k}} 1{ S = S′∪{j} } P_{S′}[Y_S] (46)
  = ∑_{S∈S_{k,2k}} ∑_{j∈S} ∑_{S′∈S_{k−1,2k}} 1{ S = S′∪{j} } P_{S′}[Y_S] (47)
  = ∑_{S∈S_{k,2k}} ∑_{j∈S} P_{S\{j}}[Y_S], (48)

where (44) follows since S′ differs from S′∪{j}, so that decoding to S′∪{j} constitutes an error under S′; (45) follows since the indicator function is only equal to one for S = S′∪{j}; (46) follows since the extra j included in the middle summation (i.e., j ∈ S′) also make the indicator function equal zero; (47) follows by re-ordering the summations and noting that the indicator function equals zero when j ∉ S; and (48) follows by only keeping the S′ for which the indicator function is one.

The following lemma is based on lower bounding (48) using Lemma 2.

Lemma 3. If (1/|S_{k,2k}|) ∑_{S∈S_{k,2k}} P_S[error] ≤ δ for some δ > 0, then

(1/|S_{k−1,2k}|) ∑_{S′∈S_{k−1,2k}} P_{S′}[error] ≥ (εk/2) · ( 1 − 2δ − ψ(ε) ) · ( ρ/(1−ρ) )^{(1+2ε)n/k} (49)

for any ε ∈ (0, 1/2).

Proof. Since (1/|S_{k,2k}|) ∑_{S∈S_{k,2k}} P_S[error] ≤ δ and |S_{k,2k}| = \binom{2k}{k}, there must exist at least (1/2)\binom{2k}{k} defective sets S ∈ S_{k,2k} such that P_S[error] ≤ 2δ. We lower bound the first summation in (48) by a summation over such S, and for each one, we lower bound the summation over j ∈ S by the set of size at least εk given in Lemma 1.
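As an aside, the single-sequence bound (37) that drives this argument can be checked exhaustively on a toy instance; the sketch below uses an arbitrary fixed (non-adaptive) design, in which case n_j(y) is the same value m for every y, and verifies (37) with exponent m over all outcome sequences.

```python
from itertools import product

rho = 0.3
# Toy non-adaptive design over items {0, 1, 2}; S = {0, 1} is defective.
tests = [{0}, {0, 2}, {0, 1}, {1, 2}]

def prob_y(defectives, y):
    """P[Y = y] under the symmetric noise model for a given defective set."""
    p = 1.0
    for test, yi in zip(tests, y):
        u = int(any(j in defectives for j in test))  # noiseless OR outcome
        p *= (1 - rho) if yi == u else rho           # symmetric channel
    return p

# Tests in which item 0 is the unique defective tested:
m = sum(1 for t in tests if 0 in t and 1 not in t)
for y in product([0, 1], repeat=len(tests)):
    assert prob_y({1}, y) >= prob_y({0, 1}, y) * (rho / (1 - rho)) ** m - 1e-12
```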
For the choices of S and j that are kept in this lower bound, the summand P_{S\{j}}[Y_S] is lower bounded by (1 − 2δ − ψ(ε)) ( ρ/(1−ρ) )^{(1+2ε)n/k} by the second part of Lemma 2 (with P_S[Y_S] = P_S[no error] ≥ 1 − 2δ). Putting this all together, we obtain

∑_{S′∈S_{k−1,2k}} P_{S′}[error] ≥ (1/2) \binom{2k}{k} · εk · ( 1 − 2δ − ψ(ε) ) · ( ρ/(1−ρ) )^{(1+2ε)n/k}. (50)

Using the identity \binom{2k}{k} = \binom{2k}{k−1} · (k+1)/k ≥ \binom{2k}{k−1}, this yields

(1/\binom{2k}{k−1}) ∑_{S′∈S_{k−1,2k}} P_{S′}[error] ≥ (εk/2) · ( 1 − 2δ − ψ(ε) ) · ( ρ/(1−ρ) )^{(1+2ε)n/k}, (51)

which proves the lemma.

Recalling that ψ(ε) = 1/((1 − ε)(1 + 2ε)), it is easily verified that ψ(ε) < 1 for all ε ∈ (0, 1/2). Hence, by a suitable choice of δ, we can let ε be arbitrarily small while still ensuring that 1 − 2δ − ψ(ε) > 0. Moreover, P[S ∈ S_{k,2k}] and P[S ∈ S_{k−1,2k}] are both bounded away from zero under the distribution in (33). Most importantly, the term (εk/2) ( ρ/(1−ρ) )^{(1+2ε)n/k} appearing in (49) is lower bounded by δ′ > 0 as long as n ≤ k log(εk/(2δ′)) / ( (1+2ε) log((1−ρ)/ρ) ). Since ε may be arbitrarily small and log(εk/(2δ′)) = (log k)(1 + o(1)), we deduce that the following condition is necessary for attaining arbitrarily small error probability:

n ≥ ( k log k / log((1−ρ)/ρ) ) (1 − η), (52)

where η > 0 is arbitrarily small. This completes the proof of Theorem 4.

Figure 2: Z-channel (Left) and reverse Z-channel (Right).

V. OTHER OBSERVATION MODELS
While we have focused on the symmetric noise model (2) for concreteness, most of our algorithms and analysis techniques can be extended to other observation models. In this section, we present some of the resulting bounds for three different models: The noiseless model (1), the Z-channel model,

P_{Y|U}(0|0) = 1,      P_{Y|U}(1|0) = 0, (53)
P_{Y|U}(0|1) = ρ,      P_{Y|U}(1|1) = 1 − ρ, (54)

and the reverse Z-channel model,

P_{Y|U}(0|0) = 1 − ρ,  P_{Y|U}(1|0) = ρ, (55)
P_{Y|U}(0|1) = 0,      P_{Y|U}(1|1) = 1, (56)

where in both cases we define U = ∨_{j∈S} X_j. That is, we pass the noiseless observation through the corresponding binary channel; see Figure 2 for an illustration. Under the Z-channel model, positive tests indicate with certainty that a defective item is included, whereas under the reverse Z-channel model, negative tests indicate with certainty that no defective item is included. While the two channels have the same capacity, it is interesting to ask whether one of the two is fundamentally more difficult to handle in the context of group testing. We provide a partial answer to this question in the adaptive setting; see also [35] for the non-adaptive setting.

A. Noiseless setting
In the noiseless setting, the final step of Algorithm 1 is much simpler: Simply test the items in Ŝ individually once each. This only requires k tests, and succeeds with certainty, yielding the following.
Theorem 5.
Under the scaling k = Θ(p^θ) for some θ ∈ (0, 1), there exists a two-stage algorithm for noiseless adaptive group testing that succeeds with probability approaching one, with a number of tests bounded by

n ≤ ( k log(p/k) / log 2 ) (1 + o(1)). (57)

Moreover, there exists a computationally efficient two-stage algorithm that succeeds with probability approaching one, with a number of tests bounded by

n ≤ ( 2k log(p/k) / log 2 ) (1 + o(1)). (58)

The upper bound (57) is tight, as it matches the so-called counting bound, e.g., see [36]. To our knowledge, the minimum number of stages used to attain this bound previously for all θ ∈ (0, 1) was four [27]. It is worth noting, however, that the algorithm of [27] has low computational complexity, unlike Algorithm 1.

The bound (57) does not contradict the converse bound of Mézard and Toninelli [29]; the latter states that any two-stage algorithm with zero error probability must have an average number of tests of ( k log(p/k) / log 2 )(1 + o(1)) or higher. In contrast, (57) corresponds to vanishing error probability and a fixed number of tests.

B. Z-channel model
Under the Z-channel model, the capacity-based converse bound of [7] turns out to be tight for all θ ∈ (0, 1), as stated in the following.

Theorem 6.
Under the noisy group testing model with Z-channel noise having parameter ρ ∈ (0, 1), and a number of defectives satisfying k = Θ(p^θ) for some θ ∈ (0, 1), there exists a three-stage adaptive algorithm achieving vanishing error probability with

n ≤ ( k log(p/k) / C(ρ) ) (1 + o(1)), (59)

where C(ρ) is the capacity of the Z-channel in nats.

Proof. The analysis is similar to that of the symmetric noise model (cf. Theorem 3), so we omit most of the details.

In the first stage, we use i.i.d. Bernoulli testing with parameter ν > 0 chosen to ensure that the induced distribution P_U of U = ∨_{j∈S} X_j equals the capacity-achieving input distribution of the Z-channel. Under this choice, a straightforward extension of the analysis of [8] (see the final part of Appendix A for details) reveals that we can find a set Ŝ of cardinality k such that d(S, Ŝ) ≤ αk with n satisfying (59), where d is defined in (5), and α > 0 is arbitrarily small.

The second stage is similar to Steps 2a and 2b in Algorithm 2. The modifications required in Step 2a are stated in Appendix C, and Step 2b is in fact simpler: We include a given item in Ŝ′_b if and only if any of its tests returned positive. Due to the nature of the Z-channel, no non-defectives are included in Ŝ′_b. On the other hand, the probability of a defective item returning negative on all ň tests is given by ρ^{ň}, which is asymptotically vanishing if ň = log log k (say). Hence, by Markov’s inequality, we have with probability approaching one that the number of defective items that fail to be placed in Ŝ′_b is smaller than αk. Moreover, the required number of tests is O(k log log k), which is asymptotically negligible.
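As an aside, the quantity C(ρ) appearing in (59) is easy to evaluate numerically; the sketch below maximizes I(U; Y) over the input probability q = P[U = 1] by a simple grid search, and compares against the standard closed-form expression for the Z-channel capacity (an external fact, not derived in this paper).

```python
import math

def H2(x):
    """Binary entropy in nats (0 at the endpoints)."""
    return 0.0 if x <= 0.0 or x >= 1.0 else -x * math.log(x) - (1 - x) * math.log(1 - x)

def z_capacity_numeric(rho):
    """Capacity (nats) of the Z-channel (53)-(54) by grid search over
    q = P[U = 1]; here I(U; Y) = H2(q(1 - rho)) - q * H2(rho)."""
    return max(H2((i / 10000.0) * (1 - rho)) - (i / 10000.0) * H2(rho)
               for i in range(1, 10000))

rho = 0.2
closed_form = math.log(1 + (1 - rho) * rho ** (rho / (1 - rho)))
assert abs(z_capacity_numeric(rho) - closed_form) < 1e-5
# Bound (59) for arbitrary sample parameters:
p, k = 10**6, 10**3
n59 = k * math.log(p / k) / z_capacity_numeric(rho)
assert n59 > k * math.log(p / k) / math.log(2)   # capacity is below log 2 nats
```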
In the third stage, as in Algorithm 2, we test each item individually ñ times. Here, however, we let Ŝ′ contain the items that returned positive in any test. There are again no false positives, and a given defective item is a false negative with probability ρ^{ñ}. By the union bound and the fact that there are at most O(αk) items under consideration, of which at most αk are defective, we readily deduce vanishing error probability as long as ñ = O(log k), meaning the total number of tests is O(αk log k). This is asymptotically negligible, since α is arbitrarily small.

This result shows that under Z-channel noise, the conjecture of the optimal (inverse) coefficient to k log(p/k) equaling the channel capacity (e.g., see [7]) is true for all θ ∈ (0, 1), in stark contrast to the symmetric noise model.

It is worth noting that the converse analysis of Section IV does not apply to the Z-channel model. This is because any analog of Lemma 2 is impossible: If there exists a test with outcome y_i = 1 in which j is the only defective included, then P_{S\{j}}[y] = 0, meaning we cannot hope for an inequality of the form (37).

C. Reverse Z-channel model
Under the reverse Z-channel model, we have the following analog of the converse bound in Theorem 4.
Theorem 7.
Consider the noisy group testing setup with reverse Z-channel noise having parameter ρ ∈ (0, 1), S distributed according to (33), and k → ∞ with k ≤ p/2. For any adaptive algorithm, in order to achieve P_e → 0, it is necessary that

n ≥ max{ k log(p/k) / C(ρ) , k log k / log(1/ρ) } (1 − o(1)), (60)

where C(ρ) is the capacity of the Z-channel in nats.

Proof. The first bound in (60) is the capacity-based bound from [7]. On the other hand, the second bound follows from a near-identical analysis to the proof of Theorem 4, with the only difference being that ρ/(1−ρ) is replaced by ρ in (37) and the subsequent equations that make use of (37).

We note that unlike the Z-channel, the cases where one of P_S[y] and P_{S\{j}}[y] is zero and the other is non-zero are not problematic. Specifically, this only occurs when P_S[y] = 0, and in this case, any inequality of the form (37) is trivially true.

Interestingly, this result shows that reverse Z-channel noise is more difficult to handle than Z-channel noise by an arbitrarily large factor as θ gets closer to one, even though the two channels have the same capacity.

VI. CONCLUSION
We have developed both information-theoretic limits and practical performance guarantees for noisy adaptive group testing. Some of the main implications of our results include the following:

• Under the scaling k = Θ(p^θ), for most θ ∈ (0, 1), our information-theoretic achievability guarantees for the symmetric noise model are significantly better than the best known non-adaptive achievability guarantees, and similarly when it comes to practical guarantees.
• Our converse for the symmetric noise model reveals that n = Ω(k log k) is necessary, and hence, the implied constant in n = Θ( k log(p/k) ) must grow unbounded as θ → 1. This phenomenon also holds true for the reverse Z-channel noise model, but not for the Z-channel noise model.
• Our bounds are tight or near-tight in several cases of interest, including small values of θ, and low noise levels with θ close to one. Moreover, in the noiseless case, we obtain the optimal threshold using a two-stage algorithm; previously, the smallest known number of stages was four.

It is worth noting that our two-stage (or three-stage) algorithm and its analysis remain applicable when any non-adaptive algorithm is used in the first stage, as long as it identifies a suitably high fraction of the defective set. Hence, improved practical or information-theoretic guarantees for partial recovery in the non-adaptive setting immediately transfer to improved exact recovery guarantees in the adaptive setting.

APPENDIX
A. Non-Adaptive Partial Recovery Result with d_max = Θ(k^γ)

The analysis of [8] considers the case that the maximum distance (cf. (4)–(5)) scales as d_max = Θ(k). In this section, we adapt the analysis therein to the case d_max = Θ(k^γ) for some γ ∈ (0, 1). This generalization is useful for the refined achievability bound given in Section III (cf. Theorem 3), and is also of interest in its own right.
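For concreteness, the non-adaptive random testing setup analyzed in this appendix (i.i.d. Bernoulli(ν/k) test inclusions, symmetric noise on the OR of the defective entries) can be sketched as follows; the sizes and seed are arbitrary illustrative choices.

```python
import random

def bernoulli_design(n, p, k, nu, rng):
    """i.i.d. Bernoulli(nu/k) test matrix X in {0,1}^{n x p}; the analysis
    focuses on nu = log 2."""
    q = nu / k
    return [[int(rng.random() < q) for _ in range(p)] for _ in range(n)]

def symmetric_outcomes(X, S, rho, rng):
    """Noisy outcomes: the OR of the defective entries, flipped w.p. rho."""
    return [int(any(row[j] for j in S) ^ (rng.random() < rho)) for row in X]

rng = random.Random(0)
X = bernoulli_design(n=100, p=50, k=5, nu=0.6931, rng=rng)
Y = symmetric_outcomes(X, S={0, 1, 2, 3, 4}, rho=0.1, rng=rng)
assert len(Y) == 100 and set(Y) <= {0, 1}
```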
1) Notation:
Recall that S is uniform on the set of subsets of {1, . . . , p} having a given cardinality k. As in [8], we consider non-adaptive i.i.d. Bernoulli testing, where each item is placed in a given test with probability ν/k for some ν > 0. We focus our attention on ν = log 2, though we will still write ν for the parts of the analysis that apply more generally. The test matrix is denoted by 𝐗 ∈ {0, 1}^{n×p} (i.e., the i-th row is X^{(i)}), and the notation 𝐗_s denotes the sub-matrix obtained by keeping only the columns indexed by s ⊆ {1, . . . , p}.

Next, we recall some notation from [8]. It will prove convenient to work with random variables that are implicitly conditioned on a fixed value of S, say s = {1, . . . , k}. We write P_{Y|X_s} for the conditional test outcome probability, where X_s is the subset of the test vector X indexed by s. Moreover, we write

P_{X_s Y}(x_s, y) := P_X^{k}(x_s) P_{Y|X_s}(y | x_s), (61)
𝐏_{X_s Y}(𝐱_s, 𝐲) := P_X^{n×k}(𝐱_s) P_{Y|X_s}^{n}(𝐲 | 𝐱_s), (62)

where P_{Y|X_s}^{n}(·|·) is the n-fold product of P_{Y|X_s}(·|·), and P_X^{(·)} denotes the i.i.d. Bernoulli(ν/k) distribution for a vector or matrix of the size indexed in the superscript. The random variables (X_s, Y) and (𝐗_s, 𝐘) are distributed as

(X_s, Y) ∼ P_{X_s Y}, (63)
(𝐗_s, 𝐘) ∼ 𝐏_{X_s Y}, (64)

and the remaining entries of the measurement matrix are distributed as 𝐗_{s^c} ∼ P_X^{n×(p−k)}, independent of (𝐗_s, 𝐘).
In our analysis, we consider partitions of the defective set s into two sets s_dif ≠ ∅ and s_eq. One can think of s_eq as corresponding to an overlap s ∩ s̄ between the true set s and some incorrect set s̄, with s_dif corresponding to the indices s \ s̄ in one set but not the other. For a fixed defective set s, and a corresponding pair (s_dif, s_eq), we write

P_{Y|X_{s_dif} X_{s_eq}}(y | x_{s_dif}, x_{s_eq}) := P_{Y|X_s}(y | x_s), (65)

where P_{Y|X_s} is the marginal distribution of (62). This form of the conditional observation probability allows us to introduce the marginal distribution

P_{Y|X_{s_eq}}(y | x_{s_eq}) := ∑_{x_{s_dif}} P_X^{ℓ}(x_{s_dif}) P_{Y|X_{s_dif} X_{s_eq}}(y | x_{s_dif}, x_{s_eq}), (66)

where ℓ := |s_dif| = k − |s_eq|. Using the preceding definitions, we introduce the information density [37]

ı^n(𝐱_{s_dif}; 𝐲 | 𝐱_{s_eq}) := ∑_{i=1}^{n} ı( x_{s_dif}^{(i)} ; y^{(i)} | x_{s_eq}^{(i)} ), (67)
ı(x_{s_dif}; y | x_{s_eq}) := log [ P_{Y|X_{s_dif} X_{s_eq}}(y | x_{s_dif}, x_{s_eq}) / P_{Y|X_{s_eq}}(y | x_{s_eq}) ], (68)

where (·)^{(i)} denotes the i-th entry (respectively, row) of a vector (respectively, matrix). Averaging (68) with respect to (X_s, Y) in (63) yields a conditional mutual information, which we denote by

I_ℓ := I(X_{s_dif}; Y | X_{s_eq}), (69)

where ℓ := |s_dif|; by symmetry, the mutual information for each (s_dif, s_eq) depends only on this quantity.
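The conditional mutual information (69) can be computed exactly by exhaustive enumeration for small k; the sketch below does this for the symmetric noise model with arbitrary sample parameters, and checks two side facts implied by the definitions (positivity, the capacity upper bound log 2 − H(ρ), and monotonicity in ℓ via the chain rule).

```python
import math
from itertools import product

def I_ell(k, ell, nu, rho):
    """Conditional mutual information I(X_dif; Y | X_eq) from (69) for the
    symmetric noise model, by exhaustive enumeration over a single test with
    i.i.d. Bernoulli(nu/k) inclusions."""
    q = nu / k
    px = lambda bits: math.prod(q if b else 1 - q for b in bits)
    py = lambda u, y: 1 - rho if y == u else rho
    total = 0.0
    for x in product((0, 1), repeat=k):
        dif, eq = x[:ell], x[ell:]
        for y in (0, 1):
            # P(y | x_eq): marginalize the ell "dif" coordinates
            marg = sum(px(d) * py(int(any(d + eq)), y)
                       for d in product((0, 1), repeat=ell))
            total += px(x) * py(int(any(x)), y) * math.log(py(int(any(x)), y) / marg)
    return total

rho, nu, k = 0.11, math.log(2), 6
vals = [I_ell(k, ell, nu, rho) for ell in range(1, k + 1)]
H = -rho * math.log(rho) - (1 - rho) * math.log(1 - rho)
assert all(0 < v <= math.log(2) - H + 1e-12 for v in vals)
assert vals == sorted(vals)  # more unknown items => more information revealed
```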
2) Choice of decoder:
We use the same information-theoretic threshold decoder as that in [8]: Fix the constants {γ_ℓ}_{ℓ=d_max+1}^{k}, and search for a set s of cardinality k such that

ı^n(𝐗_{s_dif}; 𝐘 | 𝐗_{s_eq}) ≥ γ_{|s_dif|},  for all (s_dif, s_eq) such that |s_dif| > d_max. (70)

If multiple such s exist, or if none exist, then an error is declared. This decoder is inspired by analogous thresholding techniques from the channel coding literature [38], [39].
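To illustrate the role of the information densities in (70), the following is a toy sketch of a closely related max-min decoder (picking the candidate set whose worst partition has the largest density) rather than the threshold rule itself; the instance sizes, noise level, and seed are all arbitrary illustrative choices, with d_max = 0.

```python
import math, random
from itertools import combinations

rho, p, k, n = 0.02, 8, 2, 500
nu = math.log(2)
q = nu / k
rng = random.Random(3)
S_true = {1, 5}
X = [[int(rng.random() < q) for _ in range(p)] for _ in range(n)]
Y = [int(any(x[j] for j in S_true) ^ (rng.random() < rho)) for x in X]

def density(s_dif, s_eq):
    """n-letter information density (67) for the symmetric noise model."""
    total = 0.0
    for x, y in zip(X, Y):
        u = int(any(x[j] for j in s_dif | s_eq))
        p_cond = 1 - rho if y == u else rho
        if any(x[j] for j in s_eq):
            p_marg = p_cond                    # OR already 1; dif irrelevant
        else:
            p1 = 1 - (1 - q) ** len(s_dif)     # P[OR of the dif bits = 1]
            p_marg = p1 * ((1 - rho) if y == 1 else rho) \
                     + (1 - p1) * (rho if y == 1 else 1 - rho)
        total += math.log(p_cond / p_marg)
    return total

def decode():
    """Max-min variant of the threshold decoder (70) with d_max = 0."""
    def score(s):
        return min(density(set(d), s - set(d))
                   for r in range(1, k + 1) for d in combinations(sorted(s), r))
    return max((set(c) for c in combinations(range(p), k)), key=score)

assert decode() == S_true
```

The true set makes every partition's density concentrate around n·I_ℓ > 0, while any incorrect set has at least one partition whose density concentrates around a negative value, which is what the max-min rule exploits.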
3) Useful existing results:
We build heavily on several intermediate results given in [17], stated as follows: • Initial bounds.
Since the analysis is the same for any defective set s of cardinality k, we assume without loss of generality that s = {1, . . . , k}. The initial non-asymptotic bound of [8] takes the form

P_e(d_max) ≤ P[ ∪_{(s_dif, s_eq) : |s_dif| > d_max} { ı^n(𝐗_{s_dif}; 𝐘 | 𝐗_{s_eq}) ≤ log \binom{p−k}{|s_dif|} + log( (k/δ_1) \binom{k}{|s_dif|} ) } ] + δ_1 (71)

for any δ_1 > 0. A simple consequence of this non-asymptotic bound is the following: For any positive constants {δ_{2,ℓ}}_{ℓ=d_max+1}^{k}, if the number of tests is at least

n ≥ max_{ℓ=d_max+1,...,k} [ log \binom{p−k}{ℓ} + log( (k/δ_1) \binom{k}{ℓ} ) ] / [ I_ℓ (1 − δ_{2,ℓ}) ], (72)

and if each information density satisfies a concentration bound of the form

P[ ı^n(𝐗_{s_dif}; 𝐘 | 𝐗_{s_eq}) ≤ n(1 − δ_{2,ℓ}) I_ℓ ] ≤ ψ_ℓ(n, δ_{2,ℓ}) (73)

for some functions {ψ_ℓ}_{ℓ=d_max+1}^{k}, then

P_e(d_max) ≤ ∑_{ℓ=d_max+1}^{k} \binom{k}{ℓ} ψ_ℓ(n, δ_{2,ℓ}) + δ_1. (74)

• Characterization of mutual information.
Under the symmetric noise model with crossover probability ρ ∈ (0, 1/2), the conditional mutual information I_ℓ behaves as follows as k → ∞:

– If ℓ/k → 0, then

I_ℓ = ( e^{−ν} ν (ℓ/k) (1 − ρ) log((1−ρ)/ρ) ) (1 + o(1)). (75)

– If ℓ/k → α ∈ (0, 1], then

I_ℓ = e^{−(1−α)ν} ( H( e^{−αν} ⋆ ρ ) − H(ρ) ) (1 + o(1)), (76)

where a ⋆ b := a(1−b) + (1−a)b denotes binary convolution.

• Concentration bounds.
The following concentration bounds provide explicit choices for ψ_ℓ satisfying (73):

– For all ℓ and δ > 0, we have

P[ | ı^n(𝐗_{s_dif}; 𝐘 | 𝐗_{s_eq}) − nI_ℓ | ≥ nδ ] ≤ 2 exp( −δ²n / (4(8 + δ)) ) (77)

for all (s_dif, s_eq) with |s_dif| = ℓ.

– If ℓ/k → 0, then for any ε > 0 and δ > 0 (not depending on p), the following holds for sufficiently large p:

P[ ı^n(𝐗_{s_dif}; 𝐘 | 𝐗_{s_eq}) ≤ nI_ℓ(1 − δ) ] ≤ exp( −n (ℓ/k) e^{−ν} ν ( δ²(1−ρ) / (4(2 + δ(1−ρ))) ) (1 − ε) ) (78)

for all (s_dif, s_eq) with |s_dif| = ℓ.

With these tools in place, we proceed by obtaining an explicit bound on the number of tests for the case d_max = Θ(k^γ).
4) Bounding the error probability:
We split the summation over ℓ in (74) into two terms:

T_1 := ∑_{ℓ=d_max+1}^{k/√(log k)} \binom{k}{ℓ} ψ_ℓ(n, δ_2^{(1)}),   T_2 := ∑_{ℓ=k/√(log k)}^{k} \binom{k}{ℓ} ψ_ℓ(n, δ_2^{(2)}), (79)

where we have let δ_{2,ℓ} equal a given value δ_2^{(1)} ∈ (0, 1) for all ℓ in the first sum, and a different value δ_2^{(2)} ∈ (0, 1) for all ℓ in the second sum.

To bound T_1, we consider ψ_ℓ(n, δ) equaling the right-hand side of (78). Letting c(δ) = e^{−ν} ν ( δ²(1−ρ) / (4(2 + δ(1−ρ))) ) (1 − ε) for brevity, we have

T_1 ≤ k max_{ℓ=d_max+1,...,k/√(log k)} \binom{k}{ℓ} e^{ −n · (ℓ/k) · c(δ_2^{(1)}) }, (80)

where we have upper bounded the summation defining T_1 by k times the maximum. Re-arranging, we find that in order to attain T_1 ≤ δ_1/2, it suffices that

n ≥ max_{ℓ=d_max+1,...,k/√(log k)} (1/c(δ_2^{(1)})) · (k/ℓ) · ( log \binom{k}{ℓ} + log(2k/δ_1) ). (81)
Writing log \binom{k}{ℓ} = ( ℓ log(k/ℓ) )(1 + o(1)), this simplifies to

n ≥ max_{ℓ=d_max+1,...,k/√(log k)} (1/c(δ_2^{(1)})) · ( k log(k/ℓ) + (k/ℓ) log(2k/δ_1) ) (82)
  = ( (1/c(δ_2^{(1)})) · (1 − γ) k log k ) (1 + o(1)), (83)

since the maximum is achieved by the smallest value d_max + 1 = Θ(k^γ), and for that value, the second term is asymptotically negligible compared to the first. Substituting the definition of c(·) and taking ε → 0, we obtain the condition

n ≥ ( 4(2 + δ_2^{(1)}(1−ρ)) / ( e^{−ν} ν (δ_2^{(1)})² (1−ρ) ) ) · ( (1 − γ) k log k ) (1 + o(1)). (84)

To bound T_2, we consider ψ_ℓ(n, δ) equaling the right-hand side of (77) with δ = δ_2^{(2)} I_ℓ. Again upper bounding the summation by k times the maximum, and defining c′(δ) = δ² / (4(8 + δ I_ℓ)), we obtain

T_2 ≤ k max_{ℓ=k/√(log k),...,k} \binom{k}{ℓ} · 2 exp( −c′(δ_2^{(2)}) I_ℓ² n ). (85)

It follows that in order to attain T_2 ≤ δ_1/2, it suffices that

n ≥ log( 2k \binom{k}{ℓ} / δ_1 ) / ( c′(δ_2^{(2)}) I_ℓ² ) (86)

for all ℓ = k/√(log k), . . . , k. By the mutual information characterizations in (75)–(76), we have c′(δ_2^{(2)}) = Θ(1) for any δ_2^{(2)} ∈ (0, 1), and I_ℓ = Θ(ℓ/k). We consider this fact in the following two cases:

• If ℓ = Θ(k), then (86) simply amounts to n = Ω(k);
• If ℓ = o(k), then also writing log( 2k \binom{k}{ℓ} / δ_1 ) = Θ( ℓ log(k/ℓ) ), we find that (86) takes the form n = Ω( (k²/ℓ) log(k/ℓ) ).
The most stringent condition is then provided by the smallest value ℓ = k/√(log k), yielding n = Ω( k · √(log k) · log log k ). Combining these two cases, we deduce that T_2 vanishes for any scaling of the form n = Ω( k log(p/k) ), since log(p/k) = Θ(log p) = Θ(log k) in the sub-linear regime k = Θ(p^θ) with θ ∈ (0, 1).
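The claim that this requirement is dominated by k log k can be illustrated numerically; the sketch below (with arbitrary sample values of k) evaluates the most stringent T_2 requirement at ℓ = k/√(log k) and compares it against k log k.

```python
import math

def stringent_T2_tests(k):
    """Most stringent T2 requirement, (k^2 / ell) * log(k / ell) evaluated at
    ell = k / sqrt(log k), i.e., Theta(k * sqrt(log k) * log log k)."""
    ell = k / math.sqrt(math.log(k))
    return (k * k / ell) * math.log(k / ell)

# The ratio against k * log k shrinks (slowly) as k grows.
ratios = [stringent_T2_tests(k) / (k * math.log(k))
          for k in (10**3, 10**6, 10**9, 10**12)]
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
```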
5) Characterizing the mutual-information based condition (72):

Recall that we require the number of tests to satisfy (72). For the values of ℓ corresponding to T_1 in (79), we have chosen δ_{2,ℓ} = δ_2^{(1)}, and the mutual information characterization (75) yields the condition

n ≥ max_{ℓ=d_max+1,...,k/√(log k)} [ k log(p/ℓ) + k log(k/ℓ) + (k/ℓ) log(k/δ_1) ] / [ ( e^{−ν} ν (1−ρ) log((1−ρ)/ρ) ) (1 − δ_2^{(1)}) ] (1 + o(1)), (87)

where we have applied log \binom{p−k}{ℓ} = ( ℓ log(p/ℓ) )(1 + o(1)) and log \binom{k}{ℓ} = ( ℓ log(k/ℓ) )(1 + o(1)) for ℓ = o(k). Writing k log(p/ℓ) + k log(k/ℓ) = k log(p/k) + 2k log(k/ℓ) and recalling that k = Θ(p^θ) and d_max = Θ(k^γ), we find that (87) simplifies to

n ≥ [ k log(p/k) + 2(1 − γ) k log k ] / [ ( e^{−ν} ν (1−ρ) log((1−ρ)/ρ) ) (1 − δ_2^{(1)}) ] (1 + o(1)), (88)

since the maximum over ℓ is achieved by the smallest value, ℓ = d_max + 1 = Θ(k^γ).
For the ℓ values corresponding to T_2 in (79), the condition (72) was already simplified in [8]. It was shown that under the choice ν = log 2, the dominant condition is that of the highest value, ℓ = k, and the resulting condition on the number of tests is

n ≥ k log(p/k) / [ (log 2 − H(ρ)) (1 − δ_2^{(2)}) ] (1 + o(1)). (89)
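As a rough numerical comparison (illustrative parameter values only), the two conditions (88) and (89) can be evaluated side by side; which one dominates depends on θ, as expected from the k log k term in (88).

```python
import math

def H(rho):
    """Binary entropy in nats."""
    return -rho * math.log(rho) - (1 - rho) * math.log(1 - rho)

def n_88(p, k, gamma, rho, delta, nu=math.log(2)):
    """Condition (88): the small-ell (T1) requirement."""
    num = k * math.log(p / k) + 2 * (1 - gamma) * k * math.log(k)
    den = math.exp(-nu) * nu * (1 - rho) * math.log((1 - rho) / rho) * (1 - delta)
    return num / den

def n_89(p, k, rho, delta):
    """Condition (89): the large-ell (T2) requirement, dominated by ell = k."""
    return k * math.log(p / k) / ((math.log(2) - H(rho)) * (1 - delta))

p, rho, delta, gamma = 10**6, 0.05, 0.1, 0.5
for theta in (0.2, 0.8):
    k = int(p ** theta)
    print(theta, n_88(p, k, gamma, rho, delta), n_89(p, k, rho, delta))
```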
6) Wrapping up:
We obtain the final condition on n by combining (84), (88), and (89). We take δ_2^{(2)} to be arbitrarily small, while renaming δ_2^{(1)} to δ and letting it remain a free parameter. Also recalling the choice ν = log 2, we obtain the following generalization of the partial recovery bound given in [8].

Theorem 8.
Under the symmetric noise model (2), in the regime k = Θ(p^θ) and d_max = Θ(k^γ) with θ, γ ∈ (0, 1), there exists a non-adaptive group testing algorithm such that P_e → 0 as p → ∞ with a number of tests satisfying

n ≤ inf_{δ∈(0,1)} max{ n_{MI,1} , n_{MI,2}(γ, δ) , n_Conc(γ, δ) } (1 + o(1)), (90)

where n_{MI,1}, n_{MI,2}, and n_Conc are defined in (21)–(23).

Variation for the Z-channel.
For general $\gamma \in (0,1)$, the preceding analysis is non-trivial to extend to the Z-channel noise model, which we consider in Section V. However, it is relatively easy to obtain a partial recovery result for the case $d_{\max} = \Theta(k)$, and such a result suffices for our purposes. We outline the required changes here. We continue to assume that the test matrix $\mathsf{X}$ is i.i.d. Bernoulli, but now the probability of a given entry being one is $\frac{\nu}{k}$ for some $\nu > 0$ to be chosen later.

As was observed in [8], the analysis is considerably simplified by the fact that we do not need to consider the case $\frac{\ell}{k} \to 0$. This means that we can rely exclusively on (77), which is known to hold for any binary-output noise model [8]. Consequently, one finds that the only requirement on $n$ is that (72) holds, with the conditional mutual information $I_\ell = I(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}})$ suitably modified due to the different noise model. By some asymptotic simplifications and the fact that $\ell = \Theta(k)$ for all $\ell$ under consideration, this condition simplifies to
$$n \ge \max_{\ell > d_{\max}} \frac{\ell \log\frac{p}{k}}{I_\ell}(1+o(1)). \quad (91)$$
Next, we note that an early result of Malyutov and Mateev [13] (see also [40]) implies that $\frac{\ell}{I_\ell}$ is maximized at $\ell = k$. For completeness, we provide a short proof. Assuming without loss of generality that $s = \{1,\dots,k\}$, and letting $X_j^{j'}$ denote the collection $(X_j,\dots,X_{j'})$ for indices $1 \le j \le j' \le k$, we have
$$\frac{I_\ell}{\ell} = \frac{1}{\ell}\, I\big(X_{k-\ell+1}^{k}; Y \mid X_1^{k-\ell}\big) \quad (92)$$
$$= \frac{1}{\ell} \sum_{j=k-\ell+1}^{k} I\big(X_j; Y \mid X_1^{j-1}\big) \quad (93)$$
$$= \frac{1}{\ell} \sum_{j=k-\ell+1}^{k} \big( H(X_j) - H(X_j \mid Y, X_1^{j-1}) \big), \quad (94)$$
where (92) follows since $I_\ell = I(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}})$ only depends on the sets $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ through their cardinalities, (93) follows from the chain rule for mutual information, and (94) follows since $X_j$ is independent of $X_1^{j-1}$.
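Before completing the proof, the monotonicity behind the Malyutov–Mateev observation can be checked numerically. The following sketch uses assumed parameters and, for concreteness, the symmetric noise model (for which $I_\ell$ has a simple closed form under i.i.d. Bernoulli$(\nu/k)$ testing); the chain-rule argument itself is channel-generic:

```python
import math

def H2(x):
    # binary entropy in nats
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

# Assumed parameters: k defectives, inclusion probability nu/k, crossover rho
k, nu, rho = 20, math.log(2), 0.05
q = nu / k

def I(l):
    # I(X_{s_dif}; Y | X_{s_eq}) for |s_dif| = l under symmetric noise:
    # the conditioning is only informative when none of the k - l known
    # defectives appear in the test, which happens w.p. (1 - q)^(k - l).
    p0 = (1 - q) ** l                      # P[no unknown defective in the test]
    py1 = p0 * rho + (1 - p0) * (1 - rho)  # P[Y = 1] given known defectives absent
    return (1 - q) ** (k - l) * (H2(py1) - H2(rho))

ratios = [I(l) / l for l in range(1, k + 1)]
print(all(ratios[i] > ratios[i + 1] for i in range(len(ratios) - 1)))  # True
```

The printed value confirms that $I_\ell/\ell$ is decreasing in $\ell$, so $\ell/I_\ell$ is indeed maximized at $\ell = k$.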
We establish the desired claim by observing that $\frac{I_\ell}{\ell}$ is decreasing in $\ell$: The term $H(X_j)$ is the same for all $j$, whereas the term $H(X_j \mid Y, X_1^{j-1})$ is smaller for higher values of $j$ because conditioning reduces entropy. Using this observation, the condition in (91) simplifies to
$$n \ge \frac{k\log\frac{p}{k}}{I_k}(1+o(1)). \quad (95)$$
We can further replace $I_k = I(X_s; Y)$ by the capacity of the Z-channel upon optimizing the i.i.d. Bernoulli parameter $\nu > 0$. The optimal value is the one that makes $P[\vee_{j \in s} X_j = 1]$ the same as $P^*_U(1)$, where $P^*_U$ is the capacity-achieving input distribution of the Z-channel $P_{Y|U}$.

B. Partial Recovery Result with Unknown k

In this section, we explain how to adapt the partial recovery analysis of [8] for the symmetric noise model (as well as that of Appendix A for $d_{\max} = \Theta(k^{\gamma})$) to the case that $k$ is only known to lie within a certain interval $\mathcal{K}$ of length $\Delta = o(d_{\max})$, where $d_{\max}$ is the partial recovery threshold. Specifically, we argue that for any defective set $s$ with $|s| \in \mathcal{K}$, there exists a decoder that knows $\mathcal{K}$ but not $|s|$, such that the error probability $P[\hat{S} \ne s \mid S = s]$ vanishes under i.i.d. Bernoulli testing, with the same requirement on $n$ as in the case of known $|s|$. Of course, this also implies that $P[\hat{S} \ne S]$ vanishes under any prior distribution on $S$ such that $|S| \in \mathcal{K}$ almost surely.

We consider the same non-adaptive setup as in Appendix A, denoting the test matrix by $\mathsf{X} \in \{0,1\}^{n \times p}$ and making extensive use of the information densities defined in (67)–(68). Since $k := |s|$ is unknown, we can no longer assume that the test matrix is i.i.d. with distribution $P_X \sim \mathrm{Bernoulli}\big(\frac{\nu}{k}\big)$, so we instead use $P_X \sim \mathrm{Bernoulli}\big(\frac{\nu}{k_{\max}}\big)$, with $k_{\max}$ equaling the maximum value in $\mathcal{K}$.

In the case of known $k$, we considered the decoder in (70), first proposed in [8].
In the present setting, we modify the decoder to consider all possible $k$, and to allow $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$ to be a strict subset of $s$. More specifically, the decoder is defined as follows. For any pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ such that $|s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ equals some constant $k'$, let $\imath^n_{k'}(\mathbf{x}_{s_{\mathrm{dif}}}; \mathbf{y} \mid \mathbf{x}_{s_{\mathrm{eq}}})$ be the information density corresponding to the case that the defective set equals $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$, with an explicit dependence on the cardinality $k'$. We consider a decoder that searches over all $s \subseteq \{1,\dots,p\}$ whose cardinality is in $\mathcal{K}$, and seeks a set such that
$$\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}, \quad \forall (s_{\mathrm{dif}}, s_{\mathrm{eq}}) \in \tilde{\mathcal{S}}_s, \quad (96)$$
where $\{\gamma_{k',\ell}\}$ is a set of constants depending on $k' := |s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ and $\ell := |s_{\mathrm{dif}}|$, and $\tilde{\mathcal{S}}_s$ is the set of pairs $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ satisfying the following:
1. $s_{\mathrm{dif}} \subseteq s$ and $s_{\mathrm{eq}} \subseteq s$ are disjoint;
2. The total cardinality $k' = |s_{\mathrm{dif}} \cup s_{\mathrm{eq}}|$ lies in $\mathcal{K}$;
3. The "distance" $\ell + k - k'$ exceeds $d_{\max}$. Specifically, if $s$ is the true defective set and $\hat{s}$ is some estimate of cardinality $k' \le k$ with $s \cap \hat{s} = s_{\mathrm{eq}}$ and $|s_{\mathrm{eq}}| = k' - \ell$, then we have $\ell + k - k'$ false negatives, and $\ell$ false positives, so that $d(s, \hat{s}) = \ell + k - k'$ under the distance function in (5).

(Footnote: In fact, this analysis applies to any binary channel $P_{Y|U}$.)
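As a small concrete check of item 3 above, the false-negative and false-positive counts follow directly from set differences; the sets and sizes below are illustrative:

```python
def counts(s, s_hat):
    # false negatives (missed defectives) and false positives
    return len(s - s_hat), len(s_hat - s)

s = set(range(10))          # true defective set, k = 10
s_hat = set(range(4, 12))   # estimate: k' = 8, overlap s_eq of size k' - l = 6
k, k_prime = len(s), len(s_hat)
l = len(s_hat - s)          # l = 2
fn, fp = counts(s, s_hat)
print(fn == l + k - k_prime and fp == l)  # True
```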
If multiple $s$ satisfy (96), then the one with the smallest cardinality $k := |s|$ is chosen, with any remaining ties broken arbitrarily. If none of the $s$ satisfy (96), an error is declared.

Under this decoder, an error occurs if the true defective set $s$ fails the threshold test (96), or if some $s'$ with $|s'| \le |s|$ and $d(s, s') > d_{\max}$ passes it. By the union bound, the first of these occurs with probability at most
$$P_e^{(1)}(s, d_{\max}) \le \sum_{(k',\ell)\,:\, k' \in \mathcal{K},\ \ell \le k' \le k,\ \ell + k - k' > d_{\max}} \binom{k}{k'}\binom{k'}{\ell}\, P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big], \quad (97)$$
where $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ is an arbitrary pair with $|s_{\mathrm{dif}}| = \ell$ and $|s_{\mathrm{dif}} \cup s_{\mathrm{eq}}| = k'$. Here the combinatorial terms arise by choosing $k'$ elements of $s$ to form $s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$, and then choosing $\ell$ of those elements to form $s_{\mathrm{dif}}$.

As for the probability of some incorrect $s'$ passing the threshold test, we have the following. Let $s_{\mathrm{eq}} = s' \cap s$ and $s_{\mathrm{dif}} = s' \setminus s$. Since only sets with $|s'| \le |s|$ can cause errors, $k' := |s'| = |s_{\mathrm{eq}} \cup s_{\mathrm{dif}}|$ is upper bounded by $k$, and since only sets with $d(s, s') > d_{\max}$ can cause errors, we can also assume that this holds. Defining $\ell = |s_{\mathrm{dif}}|$, we can upper bound the probability of $s'$ passing the test (96) for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ by the probability of passing it for the specific pair $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$.
By doing so, and summing over all possible $s'$, we find that the second error event is upper bounded as follows for any given $s$:
$$P_e^{(2)}(s, d_{\max}) \le \sum_{(k',\ell)\,:\, k' \in \mathcal{K},\ \ell \le k' \le k,\ \ell + k - k' > d_{\max}} \binom{p-k}{\ell}\binom{k}{k'-\ell}\, P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big], \quad (98)$$
where the combinatorial terms correspond to choosing $\ell$ elements of $\{1,\dots,p\} \setminus s$ to form $s_{\mathrm{dif}}$, and choosing $k' - \ell$ elements of $s$ to form $s_{\mathrm{eq}}$.

Combining the above, the overall upper bound on the error probability given $s$ is
$$P_e(s) \le P_e^{(1)}(s, d_{\max}) + P_e^{(2)}(s, d_{\max}). \quad (99)$$
Upon substituting the upper bounds in (97) and (98), we obtain an expression that is nearly the same as that when $k$ is known [8], except that we sum over a number of different $k'$, rather than only $k' = k$. We proceed by arguing that this does not affect the final bound, as long as $d_{\max} = \Theta(k^{\gamma})$ for some $\gamma \in (0,1)$, and $\Delta = o(d_{\max})$ (recall that $\Delta$ is the highest possible difference between two $k$ values).

The main additional difficulty here is that the information density $\imath_{k'}(x_{s_{\mathrm{dif}}}; y \mid x_{s_{\mathrm{eq}}}) = \log\frac{P_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})}{P_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}})}$ is defined with respect to $(s_{\mathrm{eq}}, s_{\mathrm{dif}})$ of total cardinality $k'$, whereas the output variables $\mathbf{Y}$ are distributed according to the true model in which there are $k$ defectives. The following lemma allows us to perform a change of measure to circumvent this issue.

Lemma 4.
Fix a defective set $s$ of cardinality $k$, let $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ be disjoint subsets of $s$ with total cardinality $k' \le k$, and let $P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$ be the conditional probability of $Y$ given the partial test vector $(X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}})$, in the case of a test vector with i.i.d. $\mathrm{Bernoulli}\big(\frac{\nu}{k_{\max}}\big)$ entries, where $k_{\max} = k(1+o(1))$. Similarly, let $P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}$ denote the conditional transition law when $s' = s_{\mathrm{dif}} \cup s_{\mathrm{eq}}$ is the true defective set. Then, if $|k - k'| \le \Delta = o(k)$, we have
$$\max_{x_{s_{\mathrm{dif}}},\, x_{s_{\mathrm{eq}}},\, y} \frac{P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})}{P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}})} \le 1 + O\Big(\frac{\Delta}{k}\Big). \quad (100)$$
Consequently, the corresponding $n$-letter product distributions $P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$ and $P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$ for conditionally independent observations satisfy the following:
$$\max_{\mathbf{x}_{s_{\mathrm{dif}}},\, \mathbf{x}_{s_{\mathrm{eq}}},\, \mathbf{y}} \frac{P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})}{P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}(\mathbf{y} \mid \mathbf{x}_{s_{\mathrm{dif}}}, \mathbf{x}_{s_{\mathrm{eq}}})} \le e^{O(\frac{n\Delta}{k})}. \quad (101)$$

Proof.
First observe that if $x_{s_{\mathrm{dif}}}$ or $x_{s_{\mathrm{eq}}}$ contain an entry equal to one, then the ratio in (100) equals one, as $Y = 1$ with probability $1-\rho$ in either case. Hence, it suffices to prove the claim for $x_{s_{\mathrm{dif}}}$ and $x_{s_{\mathrm{eq}}}$ having all entries equal to zero. In the denominator, we have
$$P^{(k')}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(1 \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) = \rho, \quad (102)$$
since there $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ corresponds to the entire defective set. On the other hand, in the numerator, there are $k - k'$ additional defective items, and the probability of one or more of them being included in the test is $\epsilon := 1 - \big(1 - \frac{\nu}{k_{\max}}\big)^{k-k'} = O\big(\frac{\Delta}{k_{\max}}\big)$, where we applied the assumptions $|k - k'| \le \Delta = o(k)$ and $k_{\max} = k(1+o(1))$, along with some asymptotic simplifications. Therefore, we have
$$P^{(k)}_{Y|X_{s_{\mathrm{dif}}},X_{s_{\mathrm{eq}}}}(1 \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) = (1-\epsilon)\rho + \epsilon(1-\rho) \quad (103)$$
$$= \rho + \epsilon(1-2\rho). \quad (104)$$
The ratio of (104) and (102) evaluates to $1 + O(\epsilon)$, and similarly for the conditional probabilities of $Y = 0$ obtained by taking one minus the right-hand sides. Since $\epsilon = O\big(\frac{\Delta}{k}\big)$, this proves (100). We obtain (101) by raising the right-hand side of (100) to the power of $n$, and applying $1 + \alpha \le e^{\alpha}$.

We now show how to use Lemma 4 to bound $P_e^{(1)}(s, d_{\max})$ and $P_e^{(2)}(s, d_{\max})$.
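Before doing so, the ratio bound (100) can be illustrated numerically; the following is a sketch with assumed parameter values under the symmetric noise model:

```python
import math

# Assumed illustrative parameters; Delta = k - k' plays the role of the interval length
k_max = 1000
k, k_prime, nu, rho = 1000, 990, math.log(2), 0.1

# Probability that at least one of the k - k' extra defectives is included in a test
eps = 1 - (1 - nu / k_max) ** (k - k_prime)

p_k = (1 - eps) * rho + eps * (1 - rho)   # numerator law, cf. (103)-(104)
p_kp = rho                                 # denominator law, cf. (102)
ratio = max(p_k / p_kp, (1 - p_k) / (1 - p_kp))
print(ratio)  # 1 + O(Delta/k); close to 1 since Delta/k = 0.01 here
```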
Starting with the former, we observe that $\mathbf{Y}$ in (97) is conditionally distributed according to $P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$, and hence, (101) yields
$$P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big] \le e^{O(\frac{n\Delta}{k})} \cdot P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \tilde{\mathbf{Y}} \mid \mathbf{X}_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big], \quad (105)$$
where $\tilde{\mathbf{Y}}$ is conditionally distributed according to $P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{dif}}},\mathbf{X}_{s_{\mathrm{eq}}}}$.

For $P_e^{(2)}(s, d_{\max})$, we first note that a similar bound to (101) holds when we condition on $\mathbf{X}_{s_{\mathrm{eq}}}$ alone; this is seen by simply moving the denominator to the right-hand side and averaging over $\mathbf{X}_{s_{\mathrm{dif}}}$ on both sides. Since $\mathbf{Y}$ in (98) is conditionally distributed according to $P^{(k)}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}}}$, we obtain from (101) that
$$P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \mathbf{Y} \mid \mathbf{X}_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big] \le e^{O(\frac{n\Delta}{k})} \cdot P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \tilde{\mathbf{Y}} \mid \mathbf{X}_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big], \quad (106)$$
where $\tilde{\mathbf{Y}}$ is conditionally distributed according to $P^{(k')}_{\mathbf{Y}|\mathbf{X}_{s_{\mathrm{eq}}}}$.

Next, observe that if the number of tests satisfies $n = O(k \log p)$, then we can simplify the term $e^{O(\frac{n\Delta}{k})}$ to $e^{O(\Delta \log p)}$. By doing so, and substituting (105) and (106) into (97)–(99), we obtain
$$P_e(s) \le e^{O(\Delta \log p)} \sum_{(k',\ell)\,:\, k' \in \mathcal{K},\ \ell \le k' \le k,\ \ell + k - k' > d_{\max}} \binom{k}{k'}\binom{k'}{\ell}\, P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \tilde{\mathbf{Y}} \mid \mathbf{X}_{s_{\mathrm{eq}}}) < \gamma_{k',\ell}\big]$$
$$\qquad + e^{O(\Delta \log p)} \sum_{(k',\ell)\,:\, k' \in \mathcal{K},\ \ell \le k' \le k,\ \ell + k - k' > d_{\max}} \binom{p-k}{\ell}\binom{k}{k'-\ell}\, P\big[\imath^n_{k'}(\mathbf{X}_{s_{\mathrm{dif}}}; \tilde{\mathbf{Y}} \mid \mathbf{X}_{s_{\mathrm{eq}}}) \ge \gamma_{k',\ell}\big]. \quad (107)$$
This bound is now of a similar form to that analyzed in [8], in the sense that the joint distributions of the tests and outcomes match those that define the information density. The only differences are the presence of additional $k'$ values beyond only $k' = k$, and the presence of the $e^{O(\Delta \log p)}$ terms. We conclude by explaining how these differences do not impact the final result as long as $\Delta = o(d_{\max})$ with $d_{\max} = \Theta(k^{\gamma})$ for some $\gamma \in (0,1)$:
• The term $\binom{p-k}{\ell}$ satisfies $\log\binom{p-k}{\ell} = \big(\ell \log\frac{p}{\ell}\big)(1+o(1))$, and the assumption $|k-k'| \le \Delta = o(d_{\max}) = o(k)$ implies that the term $\binom{k'}{\ell}$ satisfies $\log\binom{k'}{\ell} = \big(\ell\log\frac{k}{\ell}\big)(1+o(1))$. On the other hand, the logarithm of $e^{O(\Delta \log p)}$ is $O(\Delta \log p)$, so it is dominated by the other combinatorial terms due to the fact that $\Delta = o(d_{\max})$ and $\ell = \Omega(d_{\max})$. Similarly, the term $\binom{k}{k'} = \binom{k}{k-k'}$ satisfies $\log\binom{k}{k'} = O(\Delta \log k)$, and is dominated by $\binom{k'}{\ell}$.
• The term $\binom{k}{k'-\ell}$ simplifies to $\binom{k}{k-k'+\ell} = \binom{k}{\ell(1+o(1))}$ (by the assumption $\Delta = o(d_{\max})$), and hence, the asymptotic behavior for any $k'$ is the same as $\binom{k}{k-\ell}$, the term corresponding to $k = k'$. Similarly, the asymptotics of the tail probabilities of the information densities are unaffected by switching from $k$ to $k' = k(1+o(1))$.
• In [8], the number of $\ell$ being summed over is upper bounded by $k$, whereas here we can upper bound the number of $(k',\ell)$ being summed over by $k\Delta$.
Since it is the logarithm of this term that appears in the final expression, this difference only amounts to a multiplication by $1 + o(1)$.

C. NCOMP with Unknown Number of Defectives
Chan et al. [14] showed that Noisy Combinatorial Orthogonal Matching Pursuit (NCOMP), used in conjunction with i.i.d. Bernoulli test matrices, ensures exact recovery of a defective set $S$ of cardinality $k$ with high probability under the scaling $n = O(k \log p)$, which in turn behaves as $O\big(k\log\frac{p}{k}\big)$ when $k = O(p^{\theta})$ for some $\theta < 1$. However, the random test design and the decoding rule in [14] assume knowledge of $k$, meaning the result cannot immediately be used for our purposes in Step 2 of Algorithm 1. In this section, we modify the algorithm and analysis of [14] to handle the case that $k$ is only known up to a constant factor.

Suppose that $k \in [c_0 k_{\max}, k_{\max}]$ for some $k_{\max} = \Theta(p^{\theta})$, where $c_0 \in (0,1)$ and $\theta \in (0,1)$ do not depend on $p$. We adopt a Bernoulli design in which each item is independently placed in each test with probability $\frac{\nu}{k_{\max}}$ for fixed $\nu > 0$. It follows that for a given test vector $X = (X_1, \dots, X_p)$, we have
$$P\Big[\bigvee_{j \in S} X_j = 1\Big] = 1 - \Big(1 - \frac{\nu}{k_{\max}}\Big)^k = (1 - e^{-c\nu})(1+o(1)) \quad (108)$$
for some $c \in [c_0, 1]$, and hence, the corresponding observation $Y$ satisfies
$$P[Y = 1] = \big((1-\rho)(1 - e^{-c\nu}) + \rho e^{-c\nu}\big)(1+o(1)). \quad (109)$$
In contrast, for any $j \in S$, we have
$$P[Y = 1 \mid X_j = 1] = 1 - \rho. \quad (110)$$
The idea of the NCOMP algorithm is the following: For each item $j$, consider the set of tests in which the item is included, and define the total number as $N'_j$. If $j$ is defective, we should expect a proportion of roughly $1-\rho$ of these tests to be positive according to (110), whereas if $j$ is non-defective, we should expect the proportion to be roughly $(1-\rho)(1-e^{-c\nu}) + \rho e^{-c\nu}$ according to (109). Hence, we set a threshold in between these two values, and declare $j$ to be defective if and only if the proportion of positive tests exceeds that threshold.

We first study the behavior of $N'_j$.
Under the above Bernoulli test design, we have $N'_j \sim \mathrm{Binomial}\big(n, \frac{\nu}{k_{\max}}\big)$, and hence, standard Binomial concentration [41, Ch. 4] gives
$$P\Big[N'_j \le \frac{n\nu}{2k_{\max}}\Big] \le e^{-\Theta(1)\frac{n}{k_{\max}}} \quad (111)$$
$$\le \frac{1}{p^3}, \quad (112)$$
where (112) holds provided that $n = \Omega(k \log p)$ with a suitably-chosen implied constant (recall that $k = \Theta(k_{\max})$).

Next, we present the modified NCOMP decoding rule, and study its performance under the assumption that $N'_j = n'_j$ with $n'_j \ge \frac{n\nu}{2k_{\max}}$, for each $j \in \{1,\dots,p\}$. Observe that the gap between (109) and (110) behaves as $\Theta(1)$ for any $c \in [c_0, 1]$. Hence, for sufficiently small $\Delta > 0$, we have $P[Y=1] \le 1 - \rho - 2\Delta$. Accordingly, letting $N'_{j,1}$ be the number of the $N'_j$ tests including $j$ that returned positive, we declare $j$ to be defective if and only if $N'_{j,1} \ge (1 - \rho - \Delta)N'_j$. We then have the following:
• If $j$ is defective, then the probability of incorrectly declaring it to be non-defective given $N'_j = n'_j$ satisfies
$$P\big[N'_{j,1} < (1-\rho-\Delta)n'_j\big] \le e^{-\Theta(1)n'_j} \le e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}, \quad (113)$$
where the first inequality is standard Binomial concentration, and the second holds for $n'_j \ge \frac{n\nu}{2k_{\max}}$.
• Similarly, if $j$ is non-defective, the probability of incorrectly declaring it to be defective given $N'_j = n'_j$ satisfies
$$P\big[N'_{j,1} \ge (1-\rho-\Delta)n'_j\big] \le e^{-\Theta(1)n'_j} \le e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}. \quad (114)$$
Combining these bounds with (112) and a union bound over the $p$ items, the overall error probability $P_e = P[\hat{S} \ne S]$ of the modified NCOMP algorithm is upper bounded by
$$P_e \le \frac{1}{p^2} + p\, e^{-\Theta(1)\frac{n\nu}{2k_{\max}}}. \quad (115)$$
Since $k_{\max} = \Theta(k)$, this vanishes when $n = \Omega(k \log p)$ with a suitably-chosen implied constant, thus establishing the desired result.
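To make the modified rule concrete, here is a minimal simulation sketch under the symmetric noise model with $k$ known only up to a constant factor; all parameter values (including the gap $\Delta$) are assumed purely for illustration:

```python
import math
import random

random.seed(0)
p, k, k_max = 500, 10, 20          # k in [c0 * k_max, k_max] with c0 = 1/2
rho, nu, n = 0.05, math.log(2), 2500
q = nu / k_max                      # test-inclusion probability (uses k_max, not k)

S = set(random.sample(range(p), k))
N = [0] * p       # N'_j: number of tests containing item j
N1 = [0] * p      # N'_{j,1}: number of those tests with a positive outcome
for _ in range(n):
    t = [j for j in range(p) if random.random() < q]
    # noisy-OR outcome: flip the noiseless result with probability rho
    y = (len(set(t) & S) > 0) != (random.random() < rho)
    for j in t:
        N[j] += 1
        N1[j] += y

# Declare j defective iff its positive proportion exceeds 1 - rho - Delta,
# a threshold between (110) and the smaller background rate (109)
Delta = 0.15
S_hat = {j for j in range(p) if N[j] and N1[j] >= (1 - rho - Delta) * N[j]}
print(S_hat == S)  # True with high probability
```

Note that only $k_{\max}$ enters the design and the decoding threshold, consistent with the analysis above.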
Under the Z-channel noise model introduced in Section V, the preceding analysis is essentially unchanged. It only relied on there being a constant gap between the probabilities $P[Y=1]$ and $P[Y=1 \mid X_j=1]$, and this is still the case here: Equations (109) and (110) remain true when $(1-\rho)(1-e^{-c\nu}) + \rho e^{-c\nu}$ is replaced by $(1-e^{-c\nu}) + \rho e^{-c\nu}$ in the former.

ACKNOWLEDGMENT
The author thanks Volkan Cevher, Sidharth Jaggi, Oliver Johnson, and Matthew Aldridge for helpful discussions,and Leonardo Baldassini for sharing his PhD thesis [33]. This work was supported by an NUS startup grant.
REFERENCES
[1] R. Dorfman, “The detection of defective members of large populations,”
Ann. Math. Stats. , vol. 14, no. 4, pp. 436–440, 1943.[2] A. Fernández Anta, M. A. Mosteiro, and J. Ramón Muñoz, “Unbounded contention resolution in multiple-access channels,” in
Distributed Computing. Springer Berlin Heidelberg, 2011, vol. 6950, pp. 225–236. [3] R. Clifford, K. Efremenko, E. Porat, and A. Rothschild, “Pattern matching with don’t cares and few errors,”
J. Comp. Sys. Sci. , vol. 76,no. 2, pp. 115–124, 2010.[4] G. Cormode and S. Muthukrishnan, “What’s hot and what’s not: Tracking most frequent items dynamically,”
ACM Trans. Database Sys. ,vol. 30, no. 1, pp. 249–278, March 2005.[5] A. Gilbert, M. Iwen, and M. Strauss, “Group testing and sparse signal recovery,” in
Asilomar Conf. Sig., Sys. and Comp. , Oct. 2008, pp.1059–1063.[6] A. C. Gilbert, M. J. Strauss, J. A. Tropp, and R. Vershynin, “One sketch for all: Fast algorithms for compressed sensing,” in
Proc. ACM-SIAM Symp. Disc. Alg. (SODA), New York, 2007, pp. 237–246. [7] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in
IEEE Int. Symp. Inf. Theory , July 2013, pp.2676–2680.[8] J. Scarlett and V. Cevher, “Phase transitions in group testing,” in
Proc. ACM-SIAM Symp. Disc. Alg. (SODA) , 2016.[9] D. Du, F. K. Hwang, and F. Hwang,
Combinatorial group testing and its applications . World Scientific, 2000, vol. 12.[10] M. Aldridge, “The capacity of Bernoulli nonadaptive group testing,”
IEEE Trans. Inf. Theory , vol. 63, no. 11, pp. 7142–7148, 2017.[11] A. Agarwal, S. Jaggi, and A. Mazumdar, “Novel impossibility results for group-testing,” 2018, http://arxiv.org/abs/1801.02701.[12] M. Aldridge, “Individual testing is optimal for nonadaptive group testing in the linear regime,” 2018, http://arxiv.org/abs/1801.08590.[13] M. B. Malyutov and P. S. Mateev, “Screening designs for non-symmetric response function,”
Mat. Zametki , vol. 29, pp. 109–127, 1980.[14] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, “Non-adaptive probabilistic group testing with noisy measurements: Near-optimal boundswith efficient algorithms,” in
Allerton Conf. Comm., Ctrl., Comp. , Sep. 2011, pp. 1832–1839.[15] C. L. Chan, S. Jaggi, V. Saligrama, and S. Agnihotri, “Non-adaptive group testing: Explicit bounds and novel algorithms,”
IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 3019–3035, May 2014. [16] J. Scarlett and V. Cevher, “Near-optimal noisy group testing via separate decoding of items,” 2017, accepted to
IEEE Trans. Sel. Topics Sig. Proc. [17] ——, “Limits on support recovery with probabilistic models: An information-theoretic framework,”
IEEE Trans. Inf. Theory , vol. 63, no. 1,pp. 593–620, 2017.[18] M. Malyutov, “The separating property of random matrices,”
Math. Notes Acad. Sci. USSR , vol. 23, no. 1, pp. 84–91, 1978.[19] A. G. D’yachkov, “Error probability bounds for the symmetrical model of the design of screening experiments,”
Prob. Inf. Transm. , 1982.[20] G. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,”
IEEE Trans. Inf. Theory , vol. 58, no. 3, pp. 1880–1901,March 2012.[21] M. Aldridge, L. Baldassini, and K. Gunderson, “Almost separable matrices,”
J. Comb. Opt. , pp. 1–22, 2015.[22] J. Scarlett and V. Cevher, “Converse bounds for noisy group testing with arbitrary measurement matrices,” in
IEEE Int. Symp. Inf. Theory ,Barcelona, 2016.[23] A. J. Macula, “Error-correcting nonadaptive group testing with de-disjunct matrices,”
Disc. App. Math. , vol. 80, no. 2-3, pp. 217–222,1997.[24] H. Q. Ngo, E. Porat, and A. Rudra, “Efficiently decodable error-correcting list disjunct matrices and applications,” in
Int. Colloq. Automata,Lang., and Prog. , 2011.[25] M. Cheraghchi, “Noise-resilient group testing: Limitations and constructions,”
Disc. App. Math. , vol. 161, no. 1, pp. 81–95, 2013.[26] F. Hwang, “A method for detecting all defective members in a population by group testing,”
J. Amer. Stats. Assoc. , vol. 67, no. 339, pp.605–608, 1972.[27] P. Damaschke and A. S. Muhammad, “Randomized group testing both query-optimal and minimal adaptive,” in
Int. Conf. Current Trends in Theory and Practice of Computer Science. Springer, 2012, pp. 214–225. [28] A. J. Macula, “Probabilistic nonadaptive and two-stage group testing with relatively small pools and DNA library screening,”
J. Comb.Opt. , vol. 2, no. 4, pp. 385–397, 1998.
[29] M. Mézard and C. Toninelli, “Group testing with random pools: Optimal two-stage algorithms,”
IEEE Trans. Inf. Theory , vol. 57, no. 3,pp. 1736–1745, 2011.[30] A. De Bonis, L. Gasieniec, and U. Vaccaro, “Optimal two-stage algorithms for group testing problems,”
SIAM Journal on Computing ,vol. 34, no. 5, pp. 1253–1270, 2005.[31] D. Eppstein, M. T. Goodrich, and D. S. Hirschberg, “Improved combinatorial group testing algorithms for real-world problem sizes,”
SIAMJournal on Computing , vol. 36, no. 5, pp. 1360–1375, 2007.[32] S. Cai, M. Jahangoshahi, M. Bakshi, and S. Jaggi, “GROTESQUE: Noisy group testing (quick and efficient),” 2013,http://arxiv.org/abs/1307.2811.[33] L. Baldassini, “Rates and algorithms for group testing,” Ph.D. dissertation, Univ. Bristol, 2015.[34] S. Kalyanakrishnan, A. Tewari, P. Auer, and P. Stone, “PAC subset selection in stochastic multi-armed bandits.” in
Int. Conf. Mach. Learn. (ICML), 2012. [35] J. Scarlett and O. Johnson, “Noisy non-adaptive group testing: A (near-) definite defectives approach,” 2018, https://arxiv.org/abs/1808.09143. [36] O. Johnson, “Strong converses for group testing from finite blocklength results,”
IEEE Trans. Inf. Theory , vol. 63, no. 9, pp. 5923–5933,Sept. 2017.[37] Y. Polyanskiy, V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,”
IEEE Trans. Inf. Theory , vol. 56, no. 5, pp.2307–2359, May 2010.[38] A. Feinstein, “A new basic theorem of information theory,”
IRE Prof. Group. on Inf. Theory , vol. 4, no. 4, pp. 2–22, Sept. 1954.[39] T. S. Han,
Information-Spectrum Methods in Information Theory . Springer, 2003.[40] M. B. Malyutov, “Search for sparse active inputs: A review,” in
Inf. Theory, Comb. and Search Theory , 2013, pp. 609–647.[41] R. Motwani and P. Raghavan,
Randomized Algorithms. Chapman & Hall/CRC, 2010.