Combinatorial Quantitative Group Testing with Adversarially Perturbed Measurements
Yun-Han Li and I-Hsiang Wang
Abstract
In this paper, combinatorial quantitative group testing (QGT) with noisy measurements is studied. The goal of QGT is to detect defective items from a data set of size $n$ with counting measurements, each of which counts the number of defects in a selected pool of items. While most of the literature considers either probabilistic QGT with random noise or combinatorial QGT with noiseless measurements, our focus is on combinatorial QGT with measurements that might be adversarially perturbed by additive bounded noise. Since perfect detection is impossible, a partial detection criterion is adopted. With the adversarial noise bounded by $d_n = \Theta(n^{\delta})$ and the detection criterion being that no more than $k_n = \Theta(n^{\kappa})$ errors can be made, our goal is to characterize the fundamental limit on the number of measurements, termed pooling complexity, as well as to provide explicit constructions of measurement plans with optimal pooling complexity and efficient decoding algorithms. We first show that the fundamental limit is $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ to within a constant factor not depending on $(n, \kappa, \delta)$ for the non-adaptive setting when $0 < 2\delta \le \kappa < 1$, sharpening the previous result by Chen and Wang [1]. We also provide an explicit construction of a non-adaptive deterministic measurement plan with $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ pooling complexity up to a constant factor, matching the fundamental limit, with decoding complexity $o(n^{1+\rho})$ for all $\rho > 0$, nearly linear in $n$, the size of the data set.

I. INTRODUCTION
Group testing is the problem of identifying defective items in a large set of cardinality $n$ by taking measurements on pools (subsets) of items. The type of measurement plays a central role in the fundamental limits of detection efficiency. In the classical model by Dorfman [2], binary-valued measurements are considered, where the output is a bit indicating the existence of defective items in the measured pool. Extensive results for this model (termed traditional group testing hereafter), including algorithms and information-theoretic limits, can be found in the surveys [3], [4] and the references therein. Meanwhile, in many modern applications such as bioinformatics [5], network traffic monitoring [6], resource allocation in multi-user communication systems [7], etc., more informative measurements on the pool of items can be carried out. A natural one is the counting measurement that outputs the number of defective items in the pool. This is called the quantitative group testing (QGT) problem, or the coin weighing problem, with its roots in combinatorics dating back to Shapiro [8]. QGT with noiseless measurements has been

Y.-H. Li is with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei 10617, Taiwan (email: [email protected]). I.-H. Wang is with the Department of Electrical Engineering and the Graduate Institute of Communication Engineering, National Taiwan University, Taipei 10617, Taiwan (email: [email protected]).
February 1, 2021 DRAFT

extensively studied. In particular, it has been shown that the minimum number of measurements is asymptotically $\frac{2n}{\log n}$ [9], with explicit constructions of optimal non-adaptive measurement plans [10], [11]. These results are combinatorial in nature, as the goal is to detect the defects no matter where they are located. Hence, this setting is also called combinatorial QGT (CQGT) with noiseless measurements, in contrast to a more recent line of works taking a probabilistic approach [12], [13], termed probabilistic QGT hereafter. In practice, however, measurements might be noisy, as counting the number of defectives might be too costly to be accurate. In database applications, in order to preserve privacy, the measurement might also be perturbed intentionally [14]. While traditional group testing with noisy measurements has been extensively studied (see [4] for a survey), QGT with noisy measurements is far less understood. One line of works pertains to probabilistic QGT with random perturbation in the measurement [15]. Another line of works [1], [16]–[18] considers CQGT with adversarially perturbed measurements. It has been shown in [1] that, for $\delta, \kappa \in (0,1)$, when the perturbation is at most of order $\Theta(n^{\delta})$ and the goal is to detect the defective items to within Hamming distance of order $\Theta(n^{\kappa})$, there is a sharp phase transition in the fundamental limit: for $0 < 2\delta \le \kappa < 1$, the optimal pooling complexity is $\Theta\big(\frac{n}{\log n}\big)$, and for $0 < \kappa < 2\delta < 1$, it is $\omega(n^{p})$ for all $p \in \mathbb{N}$. This sharpened the results in previous works related to data privacy [16], [17]. Unlike the noiseless case [10], [11], however, only the existence of optimal measurement plans was shown in [1], by a probabilistic argument. The optimal explicit construction remained open. In this work, we improve upon [1] both in the characterization of the optimal pooling complexity and in the construction of algorithms for the regime $0 < 2\delta \le \kappa < 1$.
Our contribution is two-fold: one on sharpening the characterization of the information-theoretic limit, and the other on providing an explicit construction of a pooling-complexity-optimal deterministic non-adaptive measurement plan, together with a low-complexity decoding algorithm. As for the information-theoretic limit, we characterize the relationship between $(\kappa, \delta)$ and the leading coefficient of the optimal non-adaptive pooling complexity, which turns out to be $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ to within a constant factor not depending on $(\kappa, \delta)$. We further investigate the sparse CQGT (SCQGT) problem, that is, the original CQGT problem with the additional condition that the number of defective items is not greater than a threshold that we term the sparsity level. When the sparsity level is $\Theta(n^{\lambda})$, for $0 < 2\delta \le \kappa < \lambda < 1$, the optimal pooling complexity is also characterized to within a constant factor not depending on $(\kappa, \delta, \lambda)$. Achievability is proved via a probabilistic argument, and the converse proof extends that of Erdős and Rényi [9]. As for the construction of CQGT algorithms, the following contributions are made. We first provide an explicit construction of a non-adaptive measurement plan with pooling complexity $\frac{2}{\kappa-2\delta}\,\frac{n}{\log n}$ to within a constant factor (which is not optimal in the leading coefficient), along with a procedure that combines this construction with any sufficiently good SCQGT algorithm to reach a construction that has the optimal pooling complexity. The whole problem thus boils down to the design of good SCQGT algorithms, which is the key to the overall construction. In order to give an explicit non-adaptive SCQGT algorithm that meets the above-mentioned criterion, we first reduce it to a combinatorial design problem of constructing unbalanced bipartite expanders [19], [20].
While the existence of unbalanced bipartite expanders with the optimal parameters can be proved via probabilistic methods, an explicit construction that achieves the optimal parameters remains open to the best of our knowledge. Fortunately, the near-optimal construction by Guruswami et al. [20] suffices to meet our need. As a result, we provide an explicit construction of an SCQGT non-adaptive measurement plan with near-optimal pooling complexity ($o(n^{\lambda+\rho})$ for all $\rho > 0$). Consequently, it leads to an overall non-adaptive measurement plan with pooling complexity $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ to within a constant factor (optimal in the leading coefficient). For efficient decoding, the complexity is dominated by the SCQGT part, where we leverage the Sparse Matching Pursuit algorithm of [21], and the overall decoding complexity is $o(n^{1+\rho})$ for all $\rho > 0$, nearly linear in $n$. Note that in [21], the Sparse Matching Pursuit algorithm is proposed for recovering a sparse real-valued vector from measurements generated by binary sensing matrices, and hence it can be used to recover a sparse binary vector.

Related Works
There are several closely related works [18], [22]–[24] that consider CQGT with adversarially perturbed measurements. Their noise models, however, are all quite different from ours. In [18], there are three possible outcomes: the correct sum, an erroneous outcome with an arbitrary value, and an erasure symbol "?". When the total number of erroneous (or erased) outcomes is assumed to be at most a fraction of the total number of measurements, which can be viewed as an $\ell_0$-norm constraint on the perturbation vector, the optimal non-adaptive pooling complexity is characterized to within a constant factor. Another line of related works pertains to the binary multiple-access adder channel [22]–[24], where the perturbation vector is also constrained in the $\ell_0$-norm. In contrast, the noise model in our work constrains the perturbation vector in the $\ell_\infty$-norm, which makes perfect detection impossible, while in the related works mentioned above, only the perfect-detection criterion is considered.

II. PROBLEM FORMULATION
In this section, let us define the combinatorial quantitative group testing (CQGT) problem and other related notions. A CQGT problem comprises the following:

Data: for each item indexed by $j = 1, \ldots, n$, we use $x_j \in \{0,1\}$ to denote whether or not the $j$-th item is defective. Hence, the $n$-by-$1$ data vector $\mathbf{x} := [x_1\ x_2\ \ldots\ x_n]^{\intercal}$ is the target to be reconstructed from the noisy measurements.

Counting measurements: the pool of items in the $i$-th counting measurement can be represented by a $1$-by-$n$ pooling vector $\mathbf{q}_i \in \{0,1\}^{1 \times n}$, and the outcome of the counting measurement is $\mathbf{q}_i \mathbf{x}$. For a non-adaptive pooling algorithm, the measurement plan can be concisely represented by an $s$-by-$n$ pooling matrix $Q$ with its $i$-th row being the $i$-th pooling vector $\mathbf{q}_i$. Here $s$ denotes the number of measurements, termed the pooling complexity.

Perturbed outcomes: the outcome of the $i$-th measurement is $y_i = \mathbf{q}_i \mathbf{x} + n_i$, where $n_i \in [-d_n, d_n]$ denotes the bounded additive perturbation in the $i$-th measurement. The $s$ outcomes of the measurements can be written as an $s$-by-$1$ vector $\mathbf{y} = Q\mathbf{x} + \mathbf{n}$, where $\mathbf{n}$ is the perturbation vector with $\|\mathbf{n}\|_\infty \le d_n$.

Detection: for any data vector $\mathbf{x} \in \{0,1\}^{n \times 1}$, the estimate generated by the detection algorithm (denoted by $\hat{\mathbf{x}}$) should be close to $\mathbf{x}$. In particular, the Hamming distance between $\hat{\mathbf{x}}$ and $\mathbf{x}$ should not be greater than $k_n$, that is, $\|\hat{\mathbf{x}} - \mathbf{x}\|_0 \le k_n$.

Hence, a pooling matrix $Q$ solves the above CQGT problem if and only if
$$\forall\, \mathbf{x}, \mathbf{x}' \in \{0,1\}^{n \times 1} \text{ with } \|\mathbf{x} - \mathbf{x}'\|_0 > k_n, \quad \|Q\mathbf{x} - Q\mathbf{x}'\|_\infty > 2d_n. \tag{1}$$
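To make the model concrete, here is a small numeric sketch of the measurement model and of condition (1). The toy sizes, the random pooling matrix, and the helper name `is_detecting` are our own illustrations and are not part of the constructions developed later in the paper.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# toy sizes (illustrative only): n items, s pools, noise bound d_n, error budget k_n
n, s, d_n, k_n = 6, 8, 1, 2

Q = rng.integers(0, 2, size=(s, n))          # s-by-n binary pooling matrix
x = rng.integers(0, 2, size=n)               # unknown binary data vector
noise = rng.integers(-d_n, d_n + 1, size=s)  # an adversary would pick this worst-case
y = Q @ x + noise                            # perturbed counting measurements

def is_detecting(Q, k_n, d_n, n):
    """Brute-force check of condition (1): every pair of data vectors that
    differ in more than k_n positions must be separated by more than 2*d_n
    in the l_inf norm of their measurement images."""
    vecs = [np.array(v) for v in product([0, 1], repeat=n)]
    for a in vecs:
        for b in vecs:
            if np.sum(a != b) > k_n and np.max(np.abs(Q @ (a - b))) <= 2 * d_n:
                return False
    return True
```

The brute-force check is exponential in $n$ and is only meant to make the definition executable at toy scale.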
Let us introduce the following definition.
Definition 2.1: $(n, k_n, d_n)$-CQGT denotes the combinatorial quantitative group testing problem defined above. If a pooling matrix $Q$ is a solution to $(n, k_n, d_n)$-CQGT, it is called an $(n, k_n, d_n)$-detecting matrix. $s^*_{\mathrm{CQGT}}(n, k_n, d_n)$ denotes the smallest possible pooling complexity among all non-adaptive pooling algorithms, that is, the smallest height of $(n, k_n, d_n)$-detecting matrices.

Throughout our development, it turns out that CQGT with an additional sparsity constraint, which we call sparse combinatorial quantitative group testing (SCQGT), can and should be explored simultaneously, as it serves as part of our explicit construction of non-adaptive CQGT measurement plans. Let us introduce the following definition to better refer to this problem.

Definition 2.2: $(n, k_n, d_n, l_n)$-SCQGT denotes the problem $(n, k_n, d_n)$-CQGT with the additional sparsity assumption on the data vector $\mathbf{x}$, that is, $\|\mathbf{x}\|_0 \le l_n$. If a pooling matrix $Q$ is a solution to $(n, k_n, d_n, l_n)$-SCQGT, with a slight abuse of notation, it is called an $(n, k_n, d_n, l_n)$-detecting matrix. $s^*_{\mathrm{SCQGT}}(n, k_n, d_n, l_n)$ denotes the smallest possible pooling complexity among all non-adaptive pooling algorithms, that is, the smallest height of $(n, k_n, d_n, l_n)$-detecting matrices.

III. FUNDAMENTAL LIMITS
In this section, we provide the characterization of the optimal non-adaptive pooling complexity for $(n, n^\kappa, n^\delta)$-CQGT, $0 < 2\delta \le \kappa < 1$. The characterization is tight to within a constant factor that is independent of $(n, \kappa, \delta)$, as stated in the following theorem. Theorem 3.1:
For $0 < 2\delta \le \kappa < 1$, $s^*_{\mathrm{CQGT}}(n, n^\kappa, n^\delta) = \frac{2}{1-2\delta}\,\frac{n}{\log n}$ up to a constant factor that is independent of $(n, \kappa, \delta)$. Proof:
The proof comprises two parts: achievability and converse, established in the lemmas below.
Lemma 3.1 (CQGT Achievability):
For $0 < 2\delta \le \kappa < 1$, $\limsup_{n\to\infty} \frac{s^*_{\mathrm{CQGT}}(n, n^\kappa, n^\delta)}{n/\log n} \le \frac{2}{1-2\delta}$. In words, there exists a sequence of $(n, n^\kappa, n^\delta)$-detecting matrices with pooling complexity asymptotically not greater than $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ as $n \to \infty$. Lemma 3.2 (CQGT Converse):
For $0 < 2\delta \le \kappa < 1$, $s^*_{\mathrm{CQGT}}(n, n^\kappa, n^\delta) = \Omega\big(\frac{1}{1-2\delta}\,\frac{n}{\log n}\big)$, with the implied constant independent of $(n, \kappa, \delta)$. The two lemmas complete the proof of the theorem. The proof of achievability (Lemma 3.1) is in Appendix A, which uses a probabilistic argument to prove the existence of good pooling matrices. The converse (Lemma 3.2) is proved in Appendix B, based on extending a counting argument with its roots in [9]. It is interesting to note that the leading coefficient does not depend on the order of the detection criterion $\kappa$. In other words, as long as partial detection with up to $n^\kappa$ erroneously detected items is allowed, the number of pools to be measured depends only on the strength of the adversarial perturbation $n^\delta$, where $\delta \le \kappa/2$.
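As an informal sanity check of the coefficient (a back-of-the-envelope count of ours, not the proof in the appendices): a counting measurement over a pool of roughly $n$ items typically concentrates on a range of width $O(\sqrt{n})$, hence carries about $\frac{1}{2}\log n$ bits, while an adversarial perturbation of magnitude $n^{\delta}$ erases the resolution below $n^\delta$, costing about $\delta \log n$ bits. Dividing the $n$ bits of the data vector by the per-measurement rate recovers the claimed scaling:

```latex
% Back-of-the-envelope (illustrative, not the formal proof):
% bits per noiseless test  ~ (1/2) log n   (counting outcome spreads over O(sqrt(n)) values)
% bits destroyed by noise  ~ delta log n   (resolution below n^delta is lost)
\[
  \underbrace{\tfrac{1}{2}\log n}_{\text{noiseless}}
  \;-\; \underbrace{\delta\log n}_{\text{lost}}
  \;=\; \tfrac{1-2\delta}{2}\log n
  \quad\Longrightarrow\quad
  s \;\approx\; \frac{n}{\tfrac{1-2\delta}{2}\log n}
  \;=\; \frac{2}{1-2\delta}\,\frac{n}{\log n}.
\]
```

This heuristic also makes the phase transition intuitive: once $\delta > \kappa/2$, a single measurement can no longer resolve differences at the allowed error scale.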
When the defective items are sparsely populated in the data set, the number of pools that need to be measured should be smaller. The following theorem characterizes the optimal non-adaptive pooling complexity for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT when $0 < 2\delta \le \kappa < \lambda < 1$. Theorem 3.2:
For $0 < 2\delta \le \kappa < \lambda < 1$,
$$s^*_{\mathrm{SCQGT}}(n, n^\kappa, n^\delta, n^\lambda) = \begin{cases} \frac{2(1-\lambda)}{\lambda-2\delta}\, n^\lambda, & 2\delta < \kappa, \\[2pt] \frac{2(1-\lambda)}{\lambda-2\delta}\, n^\lambda \log n, & 2\delta = \kappa, \end{cases}$$
up to a constant factor that is independent of $(n, \kappa, \delta, \lambda)$. Proof:
Similar to the proof of Theorem 3.1, the following two lemmas correspond to achievability and converserespectively, and their combination completes the proof.
Lemma 3.3 (SCQGT Achievability):
For $0 < 2\delta \le \kappa < \lambda < 1$,
$$\limsup_{n\to\infty} \frac{s^*_{\mathrm{SCQGT}}(n, n^\kappa, n^\delta, n^\lambda)}{n^\lambda} \le \frac{2(1-\lambda)}{\lambda-2\delta}, \quad 2\delta < \kappa,$$
$$\limsup_{n\to\infty} \frac{s^*_{\mathrm{SCQGT}}(n, n^\kappa, n^\delta, n^\lambda)}{n^\lambda \log n} \le \frac{2(1-\lambda)}{\lambda-2\delta}, \quad 2\delta = \kappa.$$
Lemma 3.4 (SCQGT Converse):
For $0 < 2\delta \le \kappa < \lambda < 1$,
$$s^*_{\mathrm{SCQGT}}(n, n^\kappa, n^\delta, n^\lambda) = \begin{cases} \Omega\big(\frac{1-\lambda}{\lambda-2\delta}\, n^\lambda\big), & 2\delta < \kappa, \\[2pt] \Omega\big(\frac{1-\lambda}{\lambda-2\delta}\, n^\lambda \log n\big), & 2\delta = \kappa, \end{cases}$$
with implied constants independent of $(n, \kappa, \delta, \lambda)$. The proofs of the above two lemmas are similar to those of Lemmas 3.1 and 3.2 and are hence deferred to the appendix.

IV. ALGORITHMS
In this section, we first give a basic construction of a non-adaptive measurement plan for $(n, n^\kappa, n^\delta)$-CQGT whose pooling complexity has the optimal order in $n$ but a suboptimal leading coefficient in terms of $(\kappa, \delta)$, in Section IV-A. To achieve a better leading coefficient, a non-adaptive pooling algorithm for SCQGT is developed in Section IV-C. This non-adaptive pooling algorithm is then combined with the basic non-adaptive measurement plan of Section IV-A to give an explicit construction of a non-adaptive pooling algorithm with the optimal leading coefficient in the pooling complexity, along with an efficient (nearly-linear-in-$n$) decoding algorithm. A. Basic construction
The basic construction of the non-adaptive CQGT measurement plan is given below. Some necessary notation is set up first. Let $\epsilon = \kappa - 2\delta > 0$. Let $|\lceil n^{\epsilon/2} \rceil|$ denote the smallest possible width, not smaller than $n^{\epsilon/2}$, of a detecting matrix for the noiseless coin weighing problem mentioned in Section 4 of [11], and let $M_{|\lceil n^{\epsilon/2} \rceil|}$ be the corresponding detecting matrix. Let $\|\lceil n^{1-\epsilon/2} \rceil\|$ denote the smallest possible size, not smaller than $n^{1-\epsilon/2}$, of a Sylvester-type Hadamard matrix, and let $H_{\|\lceil n^{1-\epsilon/2} \rceil\|}$ be the corresponding Hadamard matrix. Let $\bar{n} = |\lceil n^{\epsilon/2} \rceil| \cdot \|\lceil n^{1-\epsilon/2} \rceil\|$ and let $P_{\bar{n}} = H_{\|\lceil n^{1-\epsilon/2} \rceil\|} \otimes M_{|\lceil n^{\epsilon/2} \rceil|}$, the Kronecker product of the two matrices.
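The matrix $P_{\bar{n}}$ above can be assembled in a few lines; the sketch below also previews the splitting of $P_{\bar{n}}$ into two binary pooling matrices used in the construction that follows. The identity matrix stands in as a placeholder for the coin-weighing detecting matrix of [11] (it is not that construction); only the Kronecker-product and splitting mechanics are illustrated.

```python
import numpy as np

def sylvester_hadamard(m):
    """Sylvester-type Hadamard matrix of size m (m must be a power of two)."""
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

# Placeholder for the noiseless coin-weighing detecting matrix of [11];
# the identity is NOT that construction, it merely plays the same role here.
M = np.eye(2, dtype=int)

H = sylvester_hadamard(4)
P = np.kron(H, M)                 # entries of P lie in {0, +1, -1}

# Split P into two binary matrices with Q1 - Q2 = P, then stack vertically:
Q1 = (P > 0).astype(int)
Q2 = (P < 0).astype(int)
Q = np.vstack([Q1, Q2])           # every row is a valid {0,1} pooling vector
assert np.array_equal(Q1 - Q2, P)
```

Measuring with the rows of $Q^{(1)}$ and $Q^{(2)}$ and subtracting the corresponding outcomes recovers the $\{0,\pm1\}$-valued measurements of $P_{\bar n}$, at the cost of doubling the number of pools.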
We are ready to give our basic construction. According to the above setup, entries of $P_{\bar{n}}$ take values in $\{0, \pm 1\}$. Let us find two $\{0,1\}$-matrices $Q^{(1)}$ and $Q^{(2)}$ such that $Q^{(1)} - Q^{(2)} = P_{\bar{n}}$, concatenate them vertically into a new matrix $Q$, and delete the last $\bar{n} - n$ columns of $Q$ to get $\hat{Q}$. The width of the matrix $\hat{Q}$ becomes $n$, and $\hat{Q}$ stands for the measurement matrix that we would like to construct. The basic construction $\hat{Q}$ turns out to be a detecting matrix for CQGT with guarantees summarized in the following theorem, the proof of which is detailed in Appendix C. It leverages the structure of the Hadamard matrix along with the detecting capability of $M$. Theorem 4.1 (Basic Construction):
For $n$ sufficiently large, $\hat{Q}$ is an $(n, n^\kappa, n^\delta)$-detecting matrix with pooling complexity no more than $\frac{2}{\kappa - 2\delta}\,\frac{n}{\log n}$ up to a constant factor, and it admits an efficient $O(n)$ decoding algorithm. Later, in Section IV-D, we provide a companion two-step decoding algorithm for this non-adaptive measurement plan with $O(n)$ time complexity. B. Outline of the improved construction
Towards improving the leading coefficient in the pooling complexity of the above basic construction, in the following we first outline the program of the improved construction. The construction turns out to be the combination of the above basic one and a sufficiently good non-adaptive measurement plan for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT. The idea of the improved construction goes as follows. First, consider a given non-adaptive measurement plan for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT with pooling matrix $Q$. Then, take the basic construction for $(n, n^\lambda, n^\delta)$-CQGT in Section IV-A with pooling matrix $M$. Finally, concatenate $M$ and $Q$ vertically to get the pooling matrix
$$R = \begin{bmatrix} M \\ Q \end{bmatrix}. \tag{2}$$
By the definition of the detecting matrices,
$$\|M\mathbf{x}\|_\infty > 2n^\delta, \quad \forall\, \mathbf{x} \in \{0,\pm1\}^n \text{ with } \|\mathbf{x}\|_0 \ge n^\lambda,$$
$$\|Q\mathbf{x}\|_\infty > 2n^\delta, \quad \forall\, \mathbf{x} \in \{0,\pm1\}^n \text{ with } n^\lambda \ge \|\mathbf{x}\|_0 \ge n^\kappa.$$
Hence, $\|R\mathbf{x}\|_\infty = \max\{\|M\mathbf{x}\|_\infty, \|Q\mathbf{x}\|_\infty\} > 2n^\delta$ for all $\mathbf{x} \in \{0,\pm1\}^n$ with $\|\mathbf{x}\|_0 \ge n^\kappa$. This shows that $R$, the vertical concatenation of $M$ and $Q$, is a detecting matrix for $(n, n^\kappa, n^\delta)$-CQGT. Its pooling complexity is the sum of those of $Q$ and $M$. Since the pooling complexity of $M$ is asymptotically no more than $\frac{2}{\lambda - 2\delta}\,\frac{n}{\log n}$, as long as that of $Q$ is $o\big(\frac{n}{\log n}\big)$, the overall pooling complexity can be made $\frac{2}{1-2\delta}\,\frac{n}{\log n}$ to within a constant factor by letting $\lambda \to 1$.

C. Explicit construction of the pooling matrix for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT

With the discussion in Section IV-B, the remaining problem is how to construct an explicit pooling matrix for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT.
To construct such a matrix with the desired pooling complexity, we propose an approach that reduces the original problem to constructing an unbalanced bipartite expander, so that once a construction of the bipartite expander is found, it leads to the construction of an $(n, n^\kappa, \sqrt{1-2\epsilon}\, n^\delta, n^\lambda)$-detecting matrix with height asymptotically
$$n^\lambda \left(\frac{\lambda \log n}{\epsilon}\right)^{2} \sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)} = o(n^{\lambda+\rho}) \text{ for all } \rho > 0,$$
satisfying the above-mentioned requirement on the height of $Q$. To describe how the reduction works, let us first introduce the notions of bipartite graphs and bipartite expanders. Definition 4.1 (Bipartite Graphs):
A bipartite graph $G = (\mathcal{L}, \mathcal{R}, E)$ consists of the set of left vertices $\mathcal{L}$, the set of right vertices $\mathcal{R}$, and the set of edges $E$, where each edge in $E$ is a pair $(i, j)$ with $i \in \mathcal{L}$ and $j \in \mathcal{R}$. Note that we follow the convention that the left and right vertex sets are equivalently represented by their index sets, that is, $\mathcal{L} \equiv [N] \triangleq \{1, \ldots, N\}$ and $\mathcal{R} \equiv [M] \triangleq \{1, \ldots, M\}$, where $|\mathcal{L}| = N$ and $|\mathcal{R}| = M$. A bipartite graph is called left-$D$-regular if each left vertex $i \in \mathcal{L}$ has exactly $D$ neighbors in the right part $\mathcal{R}$, that is, the cardinality of the neighborhood of $i$ (denoted by $\Gamma(i)$) is exactly $D$ for all $i \in \mathcal{L}$. Definition 4.2 (Bipartite Expanders):
A bipartite, left-$D$-regular graph $G = ([N], [M], E)$ is called an $(N, M, D, K, A)$-bipartite expander if $\forall\, S \subseteq [N]$ with $|S| \le K$, $|\Gamma(S)| \ge A|S|$, where $\Gamma(S) \triangleq \bigcup_{i \in S} \Gamma(i)$, the union of all neighbors of the nodes in $S$. Before we proceed, an example of a small bipartite expander with $N = 6$ left vertices is given in Figure 1.

Fig. 1: A bipartite expander on $N = 6$ left vertices and its induced matrix $B_G$.

In our construction, we heavily rely on the binary matrix (denoted by $B_G$) induced by a bipartite graph $G = (\mathcal{L}, \mathcal{R}, E)$: it is the $M$-by-$N$ matrix whose entry in row $j$ and column $i$ is
$$(B_G)_{j,i} \triangleq \begin{cases} 1, & \text{if } (i,j) \in E; \\ 0, & \text{otherwise}, \end{cases}$$
so that column $i$ of $B_G$ indicates the neighbors of left vertex $i$. Hence, by definition, for an $(N, M, D, K, A)$-bipartite expander $G$, its induced binary matrix $B_G$ satisfies the following two properties:
• Each of its columns is $D$-sparse, that is, each of them has exactly $D$ $1$'s.
• For any $k$ of its column vectors $B_{i_1}, \ldots, B_{i_k}$, $1 \le k \le K$, $\big\|\bigvee_{j = i_1, \ldots, i_k} B_j\big\|_0 \ge Ak$, where $\bigvee$ denotes the bit-wise "or" operation of binary column vectors.

We are now ready to describe the proposed reduction. Consider an $(N, M, D, K, A)$-bipartite expander $G$, with its parameters $(N, M, D, K, A)$ to be specified later. The proposed pooling matrix $Q$ takes the following form:
$$Q = \begin{bmatrix} D_1 \\ \vdots \\ D_D \end{bmatrix}, \tag{3}$$
where $D_1, \ldots, D_D$ are block matrices to be constructed from the expander $G$, as described in the following steps:
1) First, take the induced binary matrix $B_G$ of the $(N, M, D, K, A)$-bipartite expander $G$.
2) Second, decompose $B_G$ into $D$ binary matrices $B_1, \ldots, B_D$, such that each of their columns has exactly one non-zero element, and $\sum_{i=1}^{D} B_i = B_G$.
3) Third, for $i = 1, \ldots, D$, construct the $i$-th block matrix $D_i$ as $D_i = H_M B_i$, where $H_M$ is the Hadamard matrix of size $M$.

Let us illustrate the construction with an example depicted in Figure 2. In this example, we first decompose $B_G$ into $B_1 + B_2$, and then construct $D_1 = H_4 B_1$ and $D_2 = H_4 B_2$, respectively.

Fig. 2: Example of the reduction procedure.
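The three steps above can be sketched numerically. The toy left-2-regular graph below is of our own choosing (it is not the expander of [20]); only the decomposition and block-construction mechanics are illustrated.

```python
import numpy as np

def decompose(B):
    """Split a binary matrix whose columns each contain D ones into D binary
    matrices B_1, ..., B_D with exactly one 1 per column and sum(B_i) = B."""
    D = int(B.sum(axis=0)[0])              # left-degree: number of ones per column
    parts = [np.zeros_like(B) for _ in range(D)]
    for col in range(B.shape[1]):
        rows = np.flatnonzero(B[:, col])   # the D neighbors of left vertex `col`
        for i, r in enumerate(rows):
            parts[i][r, col] = 1
    return parts

# Toy induced matrix of a left-2-regular bipartite graph (illustrative only):
B_G = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 1]])
B_parts = decompose(B_G)                   # step 2: B_G = B_1 + B_2

H_M = np.array([[1,  1,  1,  1],           # Sylvester Hadamard matrix, M = 4
                [1, -1,  1, -1],
                [1,  1, -1, -1],
                [1, -1, -1,  1]])
blocks = [H_M @ B for B in B_parts]        # step 3: the block matrices D_i = H_M B_i
Q = np.vstack(blocks)                      # the pooling matrix of (3), height M*D
```

Since each $B_i$ has a single 1 per column, $D_i = H_M B_i$ simply selects (one copy of) a Hadamard column per item, which is what makes the norm computations in the proof of Theorem 4.2 tractable.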
To this end, what remains is how to choose the parameters $(N, D, M, K, A)$ of the bipartite expander so that the constructed pooling matrix $Q$ in (3) meets our need. It turns out that with the explicit construction of bipartite expanders by Guruswami et al. [20], we are able to choose
$$(N, D, M, K, A) = \left( n,\ \frac{\lambda \log n}{\epsilon} \sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)},\ \frac{D\, n^\lambda}{\sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)}},\ n^\lambda,\ (1-\epsilon) D \right) \tag{4}$$
in the explicit construction, and together with the reduction mentioned above, the desired pooling matrix $Q$ can be obtained. The following theorem summarizes the guarantees of the constructed $Q$. Theorem 4.2:
With the above construction, the pooling matrix $Q$ in equation (3) is $(n, n^\kappa, \sqrt{1-2\epsilon}\, n^\delta, n^\lambda)$-detecting, and its pooling complexity is at most $n^\lambda \left(\frac{\lambda \log n}{\epsilon}\right)^{2} \sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)}$. Proof:
Recall the form of the constructed matrix $Q$ in (3). For each $D_i$ and $\forall\, \mathbf{x}, \mathbf{x}' \in \{0,1\}^{n \times 1}$ with $\|\mathbf{x}\|_0, \|\mathbf{x}'\|_0 < n^\lambda$,
$$\|D_i(\mathbf{x} - \mathbf{x}')\|_2^2 \overset{(a)}{=} \big\|H_M (\mathbf{x} - \mathbf{x}')^i_{\mathrm{fold}}\big\|_2^2 \overset{(b)}{=} M \big\|(\mathbf{x} - \mathbf{x}')^i_{\mathrm{fold}}\big\|_2^2 \overset{(c)}{\ge} M \big|S^i_{\mathbf{x} - \mathbf{x}'}\big|,$$
where $(\mathbf{x} - \mathbf{x}')^i_{\mathrm{fold}}$ is defined as the $M \times 1$ vector with
$$(\mathbf{x} - \mathbf{x}')^i_{\mathrm{fold}}(j) := \sum_{z:\ \text{column } z \text{ of } B_i \text{ has its non-zero element in the } j\text{-th place}} (\mathbf{x} - \mathbf{x}')(z),$$
and
$$S^i_{\mathbf{x} - \mathbf{x}'} := \big\{ j \in [M] \,\big|\, \text{exactly one column of } B_i \text{ has its non-zero element in the } j\text{-th place and corresponds to a non-zero element of } \mathbf{x} - \mathbf{x}' \big\}.$$
Here $(a)$ follows from the definition of $(\mathbf{x} - \mathbf{x}')^i_{\mathrm{fold}}$, $(b)$ follows from the property of the Hadamard matrix ($\forall\, \mathbf{v} \in \mathbb{R}^M$, $\|H_M \mathbf{v}\|_2^2 = M \|\mathbf{v}\|_2^2$), and $(c)$ follows from the fact that $\mathbf{x} - \mathbf{x}'$ is a ternary vector. Then
$$\|Q(\mathbf{x} - \mathbf{x}')\|_2^2 = \sum_{i=1}^{D} \|D_i(\mathbf{x} - \mathbf{x}')\|_2^2 \ge \sum_{i=1}^{D} M \big|S^i_{\mathbf{x} - \mathbf{x}'}\big| \overset{(a)}{\ge} M \big( A \|\mathbf{x} - \mathbf{x}'\|_0 - (D - A)\|\mathbf{x} - \mathbf{x}'\|_0 \big) = M (2A - D) \|\mathbf{x} - \mathbf{x}'\|_0,$$
where $(a)$ follows from the fact that $B_G$ is the induced matrix of a bipartite expander, the definition of $S^i_{\mathbf{x} - \mathbf{x}'}$, and the fact that $\mathbf{x} - \mathbf{x}'$ is a ternary vector. Then
$$\|Q(\mathbf{x} - \mathbf{x}')\|_\infty \overset{(a)}{\ge} \sqrt{\frac{\|Q(\mathbf{x} - \mathbf{x}')\|_2^2}{\text{height of } Q}} \ge \sqrt{\frac{M(2A - D)\|\mathbf{x} - \mathbf{x}'\|_0}{DM}} = \sqrt{\frac{2A - D}{D}\,\|\mathbf{x} - \mathbf{x}'\|_0} \overset{(b)}{\ge} \sqrt{\frac{2A - D}{D}}\, n^{\kappa/2} \overset{(c)}{\ge} 2\sqrt{1-2\epsilon}\, n^\delta,$$
where $(a)$ follows since the maximum is always at least the quadratic mean, $(b)$ follows since $\|\mathbf{x} - \mathbf{x}'\|_0 \ge n^\kappa$, and $(c)$ follows from $\frac{2A - D}{D} = 1 - 2\epsilon$ (by $A = (1-\epsilon)D$), $\kappa = 2\delta + \epsilon$, and $n^{\epsilon/2} \ge 2$ for $n$ sufficiently large, which is what solving the $(n, n^\kappa, \sqrt{1-2\epsilon}\, n^\delta, n^\lambda)$-SCQGT problem requires. Then,
$$\text{height of } Q = \sum_{i=1}^{D} (\text{height of } D_i) = MD.$$
The remaining problem is how to choose good parameters $(N, D, M, K, A)$ and how to construct such a bipartite expander explicitly. It turns out that we can obtain an explicit construction of the bipartite expander from Theorem 3.5 of [20] with the parameters
$$N = n, \quad D = \frac{\lambda \log n}{\epsilon} \sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)}, \quad M = \frac{D\, n^\lambda}{\sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)}}, \quad K = n^\lambda, \quad A = (1-\epsilon) D.$$
Finally, plugging in these specific parameters, the proof is complete.

D. Decoding algorithms and complexities
Let us turn to the decoding algorithm for the measurement plan of Theorem 4.1 in Section IV-D1, and the decoding algorithm for the measurement plan of Theorem 4.2 in Section IV-D2.
1) Decoding algorithm for the basic code:
We propose a two-step $O(n)$ decoding algorithm for the basic construction:
1) Deconstruction step
2) Rounding step
For the deconstruction step, let $\mathbf{y}' = \hat{Q}\mathbf{x}' + \mathbf{n}'$, $\mathbf{x}' \in \{0,1\}^n$. Since $\hat{Q}$ is reduced from $Q$, $\mathbf{y}' = Q\mathbf{x} + \mathbf{n}'$, where $\mathbf{x} = [\mathbf{x}';\, \mathbf{0}_{\bar{n}-n}] \in \{0,1\}^{\bar{n}}$ and $\mathbf{0}_{\bar{n}-n}$ is the zero vector of size $\bar{n} - n$. Let $\epsilon = \kappa - 2\delta$. We first subtract $\mathbf{y}'_l$ (the lower half of $\mathbf{y}'$) from $\mathbf{y}'_u$ (the upper half of $\mathbf{y}'$); then
$$\mathbf{y} = \mathbf{y}'_u - \mathbf{y}'_l = \big(Q^{(1)} - Q^{(2)}\big)\mathbf{x} + \mathbf{n}'_u - \mathbf{n}'_l = P_{\bar{n}}\mathbf{x} + \mathbf{n}.$$
Note that the Sylvester-type Hadamard matrix of size $2^d$ can be written as the Kronecker product of $H_2$ with itself $d$ times, $H_{2^d} = H_2 \otimes H_2 \otimes \cdots \otimes H_2 = H_2^{\otimes d}$; hence
$$P_{\bar{n}} = H_{\lceil n^{1-\epsilon/2}\rceil} \otimes M_{\lceil n^{\epsilon/2}\rceil} = H_2 \otimes H_{\lceil n^{1-\epsilon/2}\rceil/2} \otimes M_{\lceil n^{\epsilon/2}\rceil} = H_2 \otimes P_{\bar{n}/2} = \begin{bmatrix} P_{\bar{n}/2} & P_{\bar{n}/2} \\ P_{\bar{n}/2} & -P_{\bar{n}/2} \end{bmatrix}.$$
Then we can see that
$$\mathbf{y} = \begin{bmatrix} \mathbf{y}_u \\ \mathbf{y}_l \end{bmatrix} = \begin{bmatrix} P_{\bar{n}/2} & P_{\bar{n}/2} \\ P_{\bar{n}/2} & -P_{\bar{n}/2} \end{bmatrix}\mathbf{x} + \mathbf{n}.$$
Next we do some row operations (deconstruction):
$$\frac{1}{2}\begin{bmatrix} \mathbf{y}_u + \mathbf{y}_l \\ \mathbf{y}_u - \mathbf{y}_l \end{bmatrix} = \begin{bmatrix} P_{\bar{n}/2} & 0 \\ 0 & P_{\bar{n}/2} \end{bmatrix}\mathbf{x} + \frac{1}{2}\begin{bmatrix} \mathbf{n}_u + \mathbf{n}_l \\ \mathbf{n}_u - \mathbf{n}_l \end{bmatrix}.$$
After some calculation,
$$\left\| \frac{1}{2}\begin{bmatrix} \mathbf{n}_u + \mathbf{n}_l \\ \mathbf{n}_u - \mathbf{n}_l \end{bmatrix} \right\|_2^2 = \left\| \frac{\mathbf{n}_u + \mathbf{n}_l}{2} \right\|_2^2 + \left\| \frac{\mathbf{n}_u - \mathbf{n}_l}{2} \right\|_2^2 = \frac{\|\mathbf{n}_u\|_2^2 + \|\mathbf{n}_l\|_2^2}{2} = \frac{\|\mathbf{n}\|_2^2}{2}.$$
We can see that the squared two-norm of the noise vector reduces by half after one deconstruction. Hence, after we do $\log_2\lceil n^{1-\epsilon/2}\rceil$ deconstructions,
$$R\big(\mathbf{y}, \log_2\lceil n^{1-\epsilon/2}\rceil\big) = \begin{bmatrix} M_{\lceil n^{\epsilon/2}\rceil} & & \\ & \ddots & \\ & & M_{\lceil n^{\epsilon/2}\rceil} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{\lceil n^{1-\epsilon/2}\rceil} \end{bmatrix} + R\big(\mathbf{n}, \log_2\lceil n^{1-\epsilon/2}\rceil\big). \tag{5}$$
Here we define $R(\mathbf{y}, t)$ and $R(\mathbf{n}, t)$ as the corresponding vectors after doing $t$ deconstructions on the column vectors $\mathbf{y}$ and $\mathbf{n}$. Since $\|\mathbf{n}\|_\infty \le 2n^\delta$ and $s = o(n)$,
$$\big\| R\big(\mathbf{n}, \log_2\lceil n^{1-\epsilon/2}\rceil\big) \big\|_2^2 = \frac{\|\mathbf{n}\|_2^2}{\lceil n^{1-\epsilon/2}\rceil} = \frac{o\big(n^{1+2\delta}\big)}{\lceil n^{1-\epsilon/2}\rceil} = o\big(n^{2\delta + \epsilon/2}\big). \tag{6}$$
Then we divide $R\big(\mathbf{y}, \log_2\lceil n^{1-\epsilon/2}\rceil\big) = [\mathbf{y}_1, \ldots, \mathbf{y}_{\lceil n^{1-\epsilon/2}\rceil}]$ and $R\big(\mathbf{n}, \log_2\lceil n^{1-\epsilon/2}\rceil\big) = [\mathbf{n}_1, \ldots, \mathbf{n}_{\lceil n^{1-\epsilon/2}\rceil}]$ into equal-length segments. From (5), we can see that
$$\mathbf{y}_i = M_{\lceil n^{\epsilon/2}\rceil}\,\mathbf{x}_i + \mathbf{n}_i, \quad \forall\, i.$$
For the rounding step, for each $\mathbf{y}_i$ we first do rounding, and then apply the decoding algorithm for the noiseless code mentioned in Section 4 of [11]. Since we do rounding first, if $\|\mathbf{n}_i\|_\infty < \frac{1}{2}$, then after rounding the noisy part in $\mathbf{y}_i$ vanishes, and so the decoding result $\hat{\mathbf{x}}_i = \mathbf{x}_i$. Hence, for those $i$ such that $\hat{\mathbf{x}}_i \ne \mathbf{x}_i$, $\|\mathbf{n}_i\|_2^2 \ge \frac{1}{4}$. Combined with (6), the number of segments that can possibly be wrong is smaller than
$$4\,\big\| R\big(\mathbf{n}, \log_2\lceil n^{1-\epsilon/2}\rceil\big) \big\|_2^2 = o\big(n^{2\delta + \epsilon/2}\big).$$
Since there are $\lceil n^{\epsilon/2} \rceil$ bits in each segment, the total number of erroneous bits must be smaller than
$$o\big(n^{2\delta + \epsilon/2}\big) \cdot \lceil n^{\epsilon/2} \rceil = o\big(n^{2\delta + \epsilon}\big) = o(n^\kappa).$$
Finally, each deconstruction step takes $O\big(\frac{n}{\log n}\big)$ operations, and we do $\log_2\lceil n^{1-\epsilon/2}\rceil = O(\log n)$ deconstructions, so the decoding complexity of the deconstruction step is $O(n)$. For the rounding step, it is easy to check that the decoding complexity for each data segment $\mathbf{x}_i$ is $O(\lceil n^{\epsilon/2} \rceil)$, and there are in total $\lceil n^{1-\epsilon/2}\rceil$ segments; hence the total decoding complexity of the rounding step is $O(\bar{n}) = O(n)$, so the total decoding complexity for the basic code is $O(n)$.
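A minimal sketch of one deconstruction round. The halving of the squared noise norm is exact, by the parallelogram identity; the function name is our own.

```python
import numpy as np

def deconstruct(y):
    """One 'deconstruction' round: for y = [[P, P], [P, -P]] x + n, return
    [(y_u + y_l)/2, (y_u - y_l)/2], which block-diagonalizes the system and
    exactly halves the squared l2 norm of the noise (parallelogram identity)."""
    h = len(y) // 2
    yu, yl = y[:h], y[h:]
    return np.concatenate([(yu + yl) / 2, (yu - yl) / 2])

rng = np.random.default_rng(2)
noise = rng.uniform(-1, 1, size=16)   # stands in for the perturbation vector n
out = deconstruct(noise)
# ||[(n_u+n_l)/2, (n_u-n_l)/2]||_2^2 = ||n||_2^2 / 2
assert np.isclose(np.sum(out**2), np.sum(noise**2) / 2)
```

Applying `deconstruct` recursively to each half for $O(\log n)$ rounds yields the block-diagonal system (5), with total cost linear in the vector length.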
2) Decoding algorithm for $(n, n^\kappa, n^\delta, n^\lambda)$-SCQGT: Let us turn to the decoding algorithm for the measurement plan of Theorem 4.2.
Theorem 4.3:
There is a decoding algorithm for the non-adaptive measurement plan for $(n, n^\kappa, \sqrt{1-2\epsilon}\, n^\delta, n^\lambda)$-SCQGT in Theorem 4.2 with decoding complexity
$$O\!\left( \frac{\lambda(\lambda - 2\delta)\, n \log^2 n}{\epsilon} \sqrt{\lambda \log n \, \log\!\left(\frac{\lambda \log n}{2\epsilon}\right)} \right).$$
The decoding procedure goes as follows.1) Get the pooling result y = (cid:104) y (cid:124) ... y (cid:124) D (cid:105) (cid:124) = Q x + n = (cid:104) ( D x + n ) (cid:124) ... ( D D x + n ) (cid:124) (cid:105) (cid:124) . February 1, 2021 DRAFT2
2) Compute
$$\bar y_i = 2H_M^{-1} y_i = 2H_M^{-1}(D_i x + n_i) = 2H_M^{-1}\,\tfrac{1}{2}H_M B_i x + \tfrac{2}{M}H_M n_i = B_i x + \bar n_i,$$
and then take the rounding of $\bar y_i$, namely $y_i^* = B_i x + n_i^*$.

3) Call the Sparse Matching Pursuit algorithm illustrated in Fig. 2 of [21] as a black box with input
$$y^* = \bigl[(y_1^*)^\top \cdots (y_D^*)^\top\bigr]^\top = \bigl[B_1^\top \cdots B_D^\top\bigr]^\top x + \bigl[(n_1^*)^\top \cdots (n_D^*)^\top\bigr]^\top.$$
Get the result $x^* \in \mathbb{Z}^n$ of the Sparse Matching Pursuit algorithm, with the guarantee $\|x - x^*\|_1 = O\bigl(\bigl\|[(n_1^*)^\top \cdots (n_D^*)^\top]^\top\bigr\|_1 / D\bigr)$.

4) Based on $x^*$, we do some refinements: if an element of $x^*$ is smaller than $0$, we set it to $0$; if an element of $x^*$ is greater than $1$, we set it to $1$. The resulting vector $z^*$ is our final output.

Note that $y_i^*$ is the rounding of $\bar y_i$, and the elements of $B_i$ and $x$ are all integers; together these imply that each $n_i^*$ is an integer vector with $\|n_i^*\|_1 \le 2\|\bar n_i\|_1$. Our noise model also implies $\|n\|_1 \le \frac{(1-\epsilon)n^{\delta}}{2M} \times (\text{size of } Q) = \frac{1}{2}(1-\epsilon)n^{\delta} D$. Combining these two facts with $\|H_M v\|_1 \le M\|v\|_1$, we get
$$\Bigl\|\bigl[(n_1^*)^\top \cdots (n_D^*)^\top\bigr]^\top\Bigr\|_1 = \sum_{i=1}^{D}\|n_i^*\|_1 \le 2\sum_{i=1}^{D}\|\bar n_i\|_1 = 4\sum_{i=1}^{D}\frac{\|H_M n_i\|_1}{M} \le 4\sum_{i=1}^{D}\|n_i\|_1 = 4\|n\|_1 \le 2(1-\epsilon)n^{\delta} D.$$
Since $\bigl[(n_1^*)^\top \cdots (n_D^*)^\top\bigr]^\top$ is an integer vector, we get
$$\|x - x^*\|_1 = O\Bigl(\bigl\|[(n_1^*)^\top \cdots (n_D^*)^\top]^\top\bigr\|_1 \big/ D\Bigr) = O\bigl(2(1-\epsilon)n^{\delta}\bigr).$$
In step 4, since $x$ is a binary vector,
$$\|x - z^*\|_1 \le \|x - x^*\|_1 = O\bigl(2(1-\epsilon)n^{\delta}\bigr) = O\bigl(4(1-\epsilon)n^{\kappa}\bigr),$$
where the last equality uses $\delta \le \kappa$. Hence the output $z^*$ satisfies the SCQGT guarantee.

For the decoding complexity, note that the decoding complexity of step 2 is $O(MD\log(M)) = O(n)$; for step 3, by Theorem 1 of [21], it is
$$O\Bigl(nD\log\bigl(D\|x\|_1 \big/ \bigl\|[(n_1^*)^\top \cdots (n_D^*)^\top]^\top\bigr\|_1\bigr)\Bigr) = O\left(\frac{n\,\lambda(\lambda-\delta)\log(n)}{\epsilon}\sqrt{\lambda\log(n)}\,\log\left(\frac{\lambda\log(n)}{2\epsilon}\right)\right);$$
and for step 4 it is $O(n)$. Hence the overall decoding complexity is dominated by step 3.

E. Explicit construction of the optimal CQGT pooling matrix and decoding algorithm
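Before combining the pieces, steps 2) and 4) of the decoding procedure of Section IV-D above can be sketched numerically. This is a minimal illustration, not the paper's implementation: it assumes the Sylvester ordering for $H_M$ and unit scaling, treats the Sparse Matching Pursuit call of step 3 as a black box, and the helper names `fwht`, `decode_block`, and `refine` are invented for this sketch.

```python
import numpy as np

def fwht(v):
    """Fast Walsh-Hadamard transform in Sylvester ordering, O(M log M).
    Applying it twice returns M times the input, since H_M^2 = M I."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def decode_block(y_i, M):
    """Step 2 (sketch): undo the Hadamard mixing via H_M^{-1} = H_M / M and
    round entrywise, so the residual noise vector is integer-valued."""
    return np.rint(fwht(y_i) / M).astype(int)

def refine(x_star):
    """Step 4: clip the sparse-recovery estimate entrywise back into {0, 1}."""
    return np.clip(x_star, 0, 1)
```

As long as the per-block noise keeps each entry of $\bar n_i$ below $1/2$ in magnitude, the rounding in `decode_block` recovers $B_i x$ exactly; larger noise is what the Sparse Matching Pursuit stage is there to absorb.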
We are now ready to complete the program outlined in Section IV-B, that is, to combine the non-adaptive measurement plan in Section IV-A and the non-adaptive SCQGT pooling matrix in Section IV-C to produce a non-adaptive scheme for the original CQGT problem. In particular, the following theorem summarizes the detecting capability of the constructed pooling matrix R in (2). Theorem 4.4:
In (2), if the submatrix $M$ is an $(n, n^{\lambda}, n^{\delta})$-CQGT detecting matrix based on the construction in Section IV-A and $Q$ is an $(n, n^{\kappa}, n^{\delta}, n^{\lambda})$-SCQGT detecting matrix based on the construction in Section IV-C, then the matrix $R$ is an $(n, n^{\kappa}, n^{\delta})$-CQGT detecting matrix. Moreover, there is an efficient decoding algorithm with complexity
$$O\left(\frac{n\,\lambda(\lambda-\delta)\log(n)}{\epsilon}\sqrt{\lambda\log(n)}\,\log\left(\frac{\lambda\log(n)}{2\epsilon}\right)\right).$$
Proof:
Step A: First, employ the measurement plan of the basic construction in Section IV-A for $(n, n^{\lambda}, n^{\delta})$-CQGT and use the companion decoding algorithm (described in Section IV-D). The decoded result can then be represented as
$$\hat x = x - p, \qquad (7)$$
where $x$ is the true data vector and $p$ is an $n^{\lambda}$-sparse $\{0, \pm 1\}$-vector. In this step, we make $s_A = \frac{32\,n}{(\lambda - 2\delta)\log n}$ counting measurements.

Step B: In this step, $p$, the remaining mistakes made in Step A, will be detected. Compute $\hat y = Q\hat x$; combined with $y = Qx + n$ (the part of the answer vector corresponding to $Q$, where $n$ denotes the noise vector), we get $z := y - \hat y = Qp + n$. Then call the SCQGT decoding algorithm mentioned in Theorem 4.3 to solve for $p$, and denote its output vector by $z^*$. By Theorem 4.3, $Q$ is a detecting matrix for $(n, n^{\kappa}, \frac{\sqrt{2}}{2}(1-\epsilon)n^{\delta}, n^{\lambda})$-SCQGT, and thus our final output $\hat x + z^*$ differs from the true data vector $x$ in at most $n^{\kappa}$ bits. In this step, we make
$$s_B = O\left(\frac{n^{\lambda}(\lambda-\delta)\log(n)}{\epsilon}\sqrt{\lambda\log(n)}\,\log\left(\frac{\lambda\log(n)}{2\epsilon}\right)\right)$$
counting measurements.

Note that in Step B, $p$ is a ternary vector, not binary as assumed in Theorem 4.3, but it turns out that with a little adjustment the whole algorithm still works for ternary vectors, and consequently the guarantee in Step B follows. To sum up, the total number of mistakes made at the end is at most $n^{\kappa}$, with the number of counting measurements no more than
$$s_{\mathrm{total}} = s_A + s_B = \frac{32\,n}{(\lambda-2\delta)\log n} + O\left(\frac{n^{\lambda}(\lambda-\delta)\log(n)}{\epsilon}\sqrt{\lambda\log(n)}\,\log\left(\frac{\lambda\log(n)}{2\epsilon}\right)\right).$$
Asymptotically, when $n$ is large enough, $s_{\mathrm{total}} \approx \frac{32\,n}{(\lambda-2\delta)\log n}$. Taking $\lambda \to 1$, $s_{\mathrm{total}}$ tends to $\frac{32\,n}{(1-2\delta)\log n}$. As a result, the total pooling complexity is $O\bigl(\frac{n}{(1-2\delta)\log n}\bigr)$. Moreover, the total decoding complexity is dominated by Step B, the decoding algorithm in Theorem 4.3. Hence, the decoding complexity of the overall decoding algorithm is as claimed in the Theorem.

REFERENCES

[1] W.-N. Chen and I.-H.
Wang, "Partial data extraction via noisy histogram queries: Information theoretic bounds," in Proceedings of the IEEE International Symposium on Information Theory (ISIT), 2017, pp. 2488–2492.
[2] R. Dorfman, "The detection of defective members of large populations," The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 436–440, 1943.
[3] D. Du and F. Hwang, Combinatorial Group Testing and Its Applications. World Scientific, 1993.
[4] M. Aldridge, O. Johnson, and J. Scarlett, "Group testing: An information theory perspective," Foundations and Trends in Communications and Information Theory, vol. 15, no. 3-4, pp. 196–392, 2019. [Online]. Available: http://dx.doi.org/10.1561/0100000099
[5] C.-C. Cao, C. Li, and X. Sun, "Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers," BMC Bioinformatics, vol. 15, no. 1, p. 195, June 2014.
[6] C. Wang, Q. Zhao, and C. Chuah, "Group testing under sum observations for heavy hitter detection," in Proceedings of the Information Theory and Applications Workshop (ITA), Feb 2015, pp. 149–153.
[7] G. De Marco, T. Jurdziński, and D. R. Kowalski, "Optimal channel utilization with limited feedback," in Fundamentals of Computation Theory, L. A. Gąsieniec, J. Jansson, and C. Levcopoulos, Eds. Cham: Springer International Publishing, 2019, pp. 140–152.
[8] H. S. Shapiro and N. J. Fine, "Problem E1399," The American Mathematical Monthly, vol. 67, no. 7, pp. 697–698, 1960.
[9] P. Erdős and A. Rényi, "On two problems of information theory," A Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei, vol. 8, pp. 229–243, 1963.
[10] B. Lindström, "On a combinatorial problem in number theory," Canadian Mathematical Bulletin, pp. 477–490, 1965.
[11] D. G. Cantor and W. H. Mills, "Determination of a subset from certain combinatorial properties," Canadian Journal of Mathematics, vol. 18, pp. 42–48, 1966.
[12] A. E. Alaoui, A. Ramdas, F. Krzakala, L. Zdeborová, and M. I. Jordan, "Decoding from pooled data: Sharp information-theoretic bounds," SIAM Journal on Mathematics of Data Science, vol. 1, no. 1, pp. 161–168, 2019.
[13] E. Karimi, F. Kazemi, A. Heidarzadeh, K. R. Narayanan, and A. Sprintson, "Sparse graph codes for non-adaptive quantitative group testing," in Proceedings of the IEEE Information Theory Workshop, 2019.
[14] C. Dwork, "Differential privacy," in Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), 2006.
[15] J. Scarlett and V. Cevher, "Phase transitions in the pooled data problem," in Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 377–385, 2017.
[16] I. Dinur and K. Nissim, "Revealing information while preserving privacy," in Proceedings of the ACM Symposium on Principles of Database Systems (PODS), 2003.
[17] C. Dwork and S. Yekhanin, "New efficient attacks on statistical disclosure control mechanisms," in Advances in Cryptology – CRYPTO 2008, vol. 5157, 2008, pp. 469–480.
[18] N. H. Bshouty, "On the coin weighing problem with the presence of noise," in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer Berlin Heidelberg, 2012, pp. 471–482.
[19] A. Ta-Shma, C. Umans, and D. Zuckerman, "Lossless condensers, unbalanced expanders, and extractors," Combinatorica, vol. 27, no. 2, pp. 213–240, 2007.
[20] V. Guruswami, C. Umans, and S. Vadhan, "Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes," Journal of the ACM, vol. 56, no. 4, 2009.
[21] M. Ruzic, R. Berinde, and P. Indyk, "Practical near-optimal sparse recovery in the L1 norm," in Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 198–205, 2008.
[22] S.-C. Chang and E. J. Weldon, "Coding for t-user multiple-access channels," IEEE Transactions on Information Theory, vol. 25, no. 6, pp. 684–691, November 1979.
[23] J. H. Wilson, "Error-correcting codes for a t-user binary adder channel," IEEE Transactions on Information Theory, vol. 34, no. 4, pp. 888–890, July 1988.
[24] J. Cheng, K. Kamoi, and Y. Watanabe, "User identification by signature code for noisy multiple-access adder channel," in Proceedings of the IEEE International Symposium on Information Theory, pp. 1974–1977, 2006.
APPENDIX
A. Proof of Achievability (Lemmas 3.1 and 3.3)
A probabilistic argument is used to show the existence of good pooling matrices. In particular, we are going to upper bound the probability that a randomly generated matrix with height $s$ is not an $(n, k_n, d_n)$-detecting matrix. If this probability is strictly below $1$, then the existence of $(n, k_n, d_n)$-detecting matrices is established. In words, we are going to show that choosing $s = \frac{4}{1-2\delta}\,\frac{n}{\log n}$ is sufficient for the upper bound mentioned above to be strictly less than $1$.

Let us now describe the random pooling matrix ensemble employed in this probabilistic argument. To simplify the analysis, we focus on pooling matrices with $\{\pm 1\}$-entries. Note that any pooling vector with $\{\pm 1\}$-entries can be generated by taking the difference of two pooling vectors with $\{0,1\}$-entries. Hence, at the end of our analysis, to conform with the original CQGT problem formulation, we need to double the pooling complexity upper bound. The random pooling matrix ensemble is generated as follows: each element of the matrix is drawn from $\{\pm 1\}$ uniformly at random, i.i.d. across all entries. With a slight abuse of notation, let $Q$ denote this random matrix, that is, $(Q)_{i,j} \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}(\{\pm 1\})$ for all $(i,j) \in \{1,\dots,s\}\times\{1,\dots,n\}$, and let $Q_i$ denote the $i$-th row of $Q$.

Consider the event $\mathcal{E}$ that $Q$ is not an $(n, k_n, d_n)$-detecting matrix. By definition (Definition 2.1),
$$\mathcal{E} = \bigl\{\exists\, x, x' \in \{0,1\}^{n\times 1} \text{ with } \|x - x'\|_1 > k_n \text{ and } \|Qx - Qx'\|_{\infty} \le d_n\bigr\}. \qquad (8)$$
For notational convenience, let us introduce
$$D_a^b = \bigl\{\, x - y \;\big|\; x, y \in \{0,1\}^{n\times 1},\; a < \|x - y\|_1 \le b \,\bigr\} \qquad (9)$$
to denote the set of difference vectors with $\ell_1$-norm ranging from $a$ to $b$. With the notation above, the event $\mathcal{E}$ can be succinctly written as $\mathcal{E} = \bigcup_{d \in D_{k_n}^{n}} \bigl\{\|Qd\|_{\infty} \le d_n\bigr\}$.
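The ensemble just described is easy to simulate. The sketch below (Python, with hypothetical helper names `random_pooling_matrix` and `undetected_prob`) estimates, by Monte Carlo, the per-pair failure probability $\Pr\{\|Qd\|_\infty \le d_n\}$ that the union bound controls; it is an illustration of the random model, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_pooling_matrix(s, n):
    """The ensemble of the proof: an s x n matrix with i.i.d. Unif{+1,-1} entries."""
    return rng.choice((-1, 1), size=(s, n))

def undetected_prob(d, d_n, s, trials=2000):
    """Monte Carlo estimate of Pr{ ||Q d||_inf <= d_n }: the probability that a
    fresh random Q fails to separate a pair at l1-distance ||d||_1 by more
    than the noise level d_n in every one of its s measurements."""
    hits = 0
    for _ in range(trials):
        Q = random_pooling_matrix(s, len(d))
        if np.max(np.abs(Q @ d)) <= d_n:
            hits += 1
    return hits / trials
```

Since each row gives an independent chance of separating the pair, the estimate decays exponentially in $s$, which is exactly why the union bound over all difference vectors can be beaten with $s = O(n/\log n)$ rows.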
Hence, by the union bound,
$$\Pr\{\mathcal{E}\} \le \sum_{d \in D_{k_n}^{n}} \Pr\{\|Qd\|_{\infty} \le d_n\} = \sum_{d \in D_{k_n}^{n}} \prod_{i=1}^{s} \Pr\{|Q_i d| \le d_n\}. \qquad (10)$$
Noting that the event $\{|Q_i d| \le d_n\}$ is equivalent to the event that, out of $\|d\|_1$ i.i.d. $\mathrm{Unif}(\{\pm 1\})$ random variables, the number of $+1$'s and the number of $-1$'s differ by at most $d_n$, we have
$$\Pr\{|Q_i d| \le d_n\} = \sum_{\ell:\, \frac{\|d\|_1 - d_n}{2} \le \ell \le \frac{\|d\|_1 + d_n}{2}} \binom{\|d\|_1}{\ell} 2^{-\|d\|_1} \le (d_n + 1)\binom{\|d\|_1}{\lfloor \|d\|_1/2 \rfloor} 2^{-\|d\|_1} \overset{(a)}{\le} 2d_n\left(\frac{\pi \|d\|_1}{2}\right)^{-1/2}, \qquad (11)$$

February 1, 2021 DRAFT

where $(a)$ is due to the fact that $\binom{j}{\lfloor j/2 \rfloor} \le 2^{j} (\pi j/2)^{-1/2}$ for all $j \in \mathbb{N}$. Combining (10) and (11), we get
$$\Pr\{\mathcal{E}\} \le \sum_{d \in D_{k_n}^{n}} \bigl(2d_n\sqrt{2/\pi}\bigr)^{s} \|d\|_1^{-s/2} = \sum_{\ell = \lfloor k_n \rfloor + 1}^{n} \bigl|D_{\ell-1}^{\ell}\bigr| \bigl(2d_n\sqrt{2/\pi}\bigr)^{s} \ell^{-s/2}. \qquad (12)$$
To proceed, the range of the above summation is divided into three regimes and bounded separately: the first regime is $k_n < \ell \le k_n n^{2\epsilon}$, the second regime is $k_n n^{2\epsilon} < \ell \le n^{1-2\epsilon}$, and the third regime is $n^{1-2\epsilon} < \ell \le n$, where $\epsilon$ is a positive constant smaller than $(1 - \log_n k_n)/2 = (1-\kappa)/2$.
Then,
$$(12) \overset{(a)}{\le} \bigl|D_{\lfloor k_n \rfloor}^{k_n n^{2\epsilon}}\bigr| \left(\frac{2d_n}{\sqrt{k_n}}\sqrt{2/\pi}\right)^{s} + \bigl|D_{k_n n^{2\epsilon}}^{n^{1-2\epsilon}}\bigr| \left(\frac{2d_n}{\sqrt{k_n n^{2\epsilon}}}\sqrt{2/\pi}\right)^{s} + \bigl|D_{n^{1-2\epsilon}}^{n}\bigr| \left(\frac{2d_n}{\sqrt{n^{1-2\epsilon}}}\sqrt{2/\pi}\right)^{s}$$
$$\overset{(b)}{\le} (2(n+1))^{k_n n^{2\epsilon}} \left(\frac{2d_n}{\sqrt{k_n}}\sqrt{2/\pi}\right)^{s} + (2(n+1))^{n^{1-2\epsilon}} \left(\frac{2d_n}{\sqrt{k_n n^{2\epsilon}}}\sqrt{2/\pi}\right)^{s} + 3^{n} \left(\frac{2d_n}{\sqrt{n^{1-2\epsilon}}}\sqrt{2/\pi}\right)^{s}$$
$$\overset{(c)}{=} (2(n+1))^{n^{\kappa+2\epsilon}} \bigl(2n^{\delta-\kappa/2}\sqrt{2/\pi}\bigr)^{s} \qquad (13)$$
$$\; + (2(n+1))^{n^{1-2\epsilon}} \bigl(2n^{\delta-\kappa/2-\epsilon}\sqrt{2/\pi}\bigr)^{s} \qquad (14)$$
$$\; + 3^{n} \bigl(2n^{\delta-1/2+\epsilon}\sqrt{2/\pi}\bigr)^{s}. \qquad (15)$$
$(a)$ follows from dividing the whole summation into the three regimes mentioned above and applying the trivial lower bound on $\ell$ in each regime. $(b)$ follows from applying two different upper bounds on the sizes of the difference sets:
$$|D_a^b| = \sum_{j=a+1}^{b} \binom{n}{j} 2^{j} \le 2^{b}\sum_{j=0}^{b} \binom{n}{j} \le (n+1)^{b}\, 2^{b} = (2(n+1))^{b}, \qquad (16)$$
$$|D_a^b| = \sum_{j=a+1}^{b} \binom{n}{j} 2^{j} \le \sum_{j=0}^{n} \binom{n}{j} 2^{j} = 3^{n}.$$
$(c)$ follows from plugging in $k_n = n^{\kappa}$, $d_n = n^{\delta}$. Finally, in order to ensure that all three terms (13)–(15) vanish as $n \to \infty$, since it is the most stringent to drive (15) to zero, it suffices to choose
$$s = \frac{\log(3)\, n}{(1/2 - \delta - \epsilon)\log(n)} + 1.$$
Picking a sufficiently small $\epsilon \in \bigl(0, \frac{1-\kappa}{2}\bigr)$, we immediately see that it is also sufficient to choose $s = \frac{4}{1-2\delta}\,\frac{n}{\log n}$ to ensure that (13)–(15) all vanish as $n \to \infty$. As a result, there exists a $\{\pm 1\}$-pooling matrix with height $s = \frac{4}{1-2\delta}\,\frac{n}{\log n}$. Finally, note that a $\{0,1\}$-pooling matrix can be generated by simple row operations from a $\{\pm 1\}$-pooling matrix, with an increase of the height by at most a factor of $2$.
Hence, there exists a binary pooling matrix with height $s = \frac{8}{1-2\delta}\,\frac{n}{\log n}$, and this completes the proof of Lemma 3.1.
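The single-row estimate (11) can be checked numerically. The sketch below compares the exact probability that the $+1$ and $-1$ counts among $j$ uniform signs differ by at most $d_n$ against the relaxation used in the proof; the helper names are invented for this check.

```python
from math import comb, pi, sqrt

def exact_row_prob(j, d_n):
    """Exact Pr{|Q_i d| <= d_n} for ||d||_1 = j: the number of +1's (call it l)
    among j uniform signs must satisfy |(+1 count) - (-1 count)| = |2l - j| <= d_n."""
    return sum(comb(j, l) for l in range(j + 1) if abs(2 * l - j) <= d_n) / 2 ** j

def row_prob_bound(j, d_n):
    """The relaxation behind (11): at most d_n + 1 summands, each at most the
    central binomial term, which is itself at most sqrt(2 / (pi * j))."""
    return (d_n + 1) * sqrt(2 / (pi * j))
```

The bound is loose by a constant but decays like $j^{-1/2}$ in the pool-difference size $j = \|d\|_1$, which is the $\ell^{-s/2}$ factor that drives the union bound (12) to zero.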
As for the proof of achievability for the sparse case (Lemma 3.3), we slightly modify the definition of the event $\mathcal{E}$ in (8) as follows:
$$\mathcal{E} = \bigl\{\exists\, x, x' \in \{0,1\}^{n\times 1} \text{ with } \|x\|_1, \|x'\|_1 \le l_n,\; \|x - x'\|_1 > k_n,\; \text{and } \|Qx - Qx'\|_{\infty} \le d_n\bigr\}.$$
In words, it is the event that $Q$ is not an $(n, k_n, d_n, l_n)$-detecting matrix. Then, following the same proof program, an upper bound on the probability of this event, similar to (12), can be found as follows:
$$\Pr\{\mathcal{E}\} \le \sum_{\ell = \lfloor k_n \rfloor + 1}^{2 l_n} \bigl|\tilde D_{\ell-1}^{\ell}\bigr| \bigl(2d_n\sqrt{2/\pi}\bigr)^{s} \ell^{-s/2}. \qquad (17)$$
Note that in the definition of the difference set $\tilde D_a^b$ there is now an additional condition $\|x\|_1, \|y\|_1 \le l_n$, compared to that of $D_a^b$ in (9). Next, following the steps used to upper bound (12) by (13)–(15), we derive an upper bound on (17) by dividing the range of the above summation into three regimes and bounding them separately: the first regime is $k_n < \ell \le k_n n^{2\epsilon}$, the second regime is $k_n n^{2\epsilon} < \ell \le l_n n^{-2\epsilon}$, and the third regime is $l_n n^{-2\epsilon} < \ell \le 2 l_n$, where $\epsilon$ is a positive constant smaller than $(\log_n l_n - \log_n k_n)/2 = (\lambda - \kappa)/2$.
Then,
$$(17) \le \bigl|\tilde D_{\lfloor k_n \rfloor}^{k_n n^{2\epsilon}}\bigr| \left(\frac{2d_n}{\sqrt{k_n}}\sqrt{2/\pi}\right)^{s} + \bigl|\tilde D_{k_n n^{2\epsilon}}^{l_n n^{-2\epsilon}}\bigr| \left(\frac{2d_n}{\sqrt{k_n n^{2\epsilon}}}\sqrt{2/\pi}\right)^{s} + \bigl|\tilde D_{l_n n^{-2\epsilon}}^{2 l_n}\bigr| \left(\frac{2d_n}{\sqrt{l_n n^{-2\epsilon}}}\sqrt{2/\pi}\right)^{s}$$
$$\overset{(d)}{\le} (2(n+1))^{k_n n^{2\epsilon}} \left(\frac{2d_n}{\sqrt{k_n}}\sqrt{2/\pi}\right)^{s} + (2(n+1))^{l_n n^{-2\epsilon}} \left(\frac{2d_n}{\sqrt{k_n n^{2\epsilon}}}\sqrt{2/\pi}\right)^{s} + 2^{2l_n\left(\log\left(\frac{\mathrm{e}n}{l_n}\right)+1\right)} \left(\frac{2d_n}{\sqrt{l_n n^{-2\epsilon}}}\sqrt{2/\pi}\right)^{s}$$
$$\overset{(e)}{=} (2(n+1))^{n^{\kappa+2\epsilon}} \bigl(2n^{\delta-\kappa/2}\sqrt{2/\pi}\bigr)^{s} \qquad (18)$$
$$\; + (2(n+1))^{n^{\lambda-2\epsilon}} \bigl(2n^{\delta-\kappa/2-\epsilon}\sqrt{2/\pi}\bigr)^{s} \qquad (19)$$
$$\; + 2^{n^{\lambda}\bigl((1-\lambda)\log(n)+\log(\mathrm{e})\bigr)} \bigl(2n^{\delta-\lambda/2+\epsilon}\sqrt{2/\pi}\bigr)^{s}. \qquad (20)$$
$(d)$ follows from (16), $|\tilde D_a^b| \le 2^{b}\sum_{j=0}^{b}\binom{n}{j}$, and
$$\sum_{j=0}^{b}\binom{n}{j} \le \sum_{j=0}^{b}\frac{n^{j}}{j!} = \sum_{j=0}^{b}\frac{b^{j}}{j!}\left(\frac{n}{b}\right)^{j} \le \mathrm{e}^{b}\left(\frac{n}{b}\right)^{b}.$$
$(e)$ follows from plugging in $l_n = n^{\lambda}$, $k_n = n^{\kappa}$, $d_n = n^{\delta}$, with $0 < \delta \le \kappa < \lambda < 1$. Then, following the same discussion as in the non-sparse case, in order to make (18)–(20) all vanish as $n \to \infty$, it suffices to choose a sufficiently small $\epsilon$ and
$$s = \begin{cases} \dfrac{4(1-\lambda)}{\lambda - 2\delta}\, n^{\lambda}, & 2\delta < \kappa, \\[2mm] \dfrac{4(1-\lambda)}{\lambda - 2\delta}\, n^{\lambda} \log n, & 2\delta = \kappa. \end{cases}$$
As a result, there exists a $\{0,1\}$-pooling matrix with height
$$s = \begin{cases} \dfrac{8(1-\lambda)}{\lambda - 2\delta}\, n^{\lambda}, & 2\delta < \kappa, \\[2mm] \dfrac{8(1-\lambda)}{\lambda - 2\delta}\, n^{\lambda} \log n, & 2\delta = \kappa. \end{cases}$$

B. Proof of Converse (Lemmas 3.2 and 3.4)
The proof of the converse is based on packing. It will be shown that if a pooling matrix $Q$ is $(n, k_n, d_n)$-detecting, the number of measurements $s$ (the height of $Q$) must be greater than or equal to a certain threshold. The argument goes as follows. Note that the detection criterion (1) implies that, for any $k_n$-packing $C_G$ with respect to the $\ell_1$-norm of a subset $G \subseteq \{0,1\}^{n}$, its image set after multiplying with $Q$, namely $Q[C_G] \triangleq \{Qx \mid x \in C_G\}$, must be a $d_n$-packing with respect to the $\ell_\infty$-norm of the image set $Q[G] \triangleq \{Qx \mid x \in G\}$. By properly choosing $G$, one can derive a good upper bound on the packing number of $Q[G]$, which is related to $s$, the height of $Q$. Meanwhile, a lower bound on the packing number of $G$ is also a lower bound on the packing number of $Q[G]$, which can be found by a simple counting argument. The two bounds are then combined to derive a lower bound on $s$.

Let us consider the $k_n$-packing number of $G$ with respect to the $\ell_1$-norm. Note that it is lower bounded by the $k_n$-covering number of $G$ with respect to the $\ell_1$-norm, and the covering number is further lower bounded by $|G|$ divided by the cardinality of an $\ell_1$-norm ball with radius $k_n$. Hence, there exists a maximal packing $C_G$ with
$$|C_G| \ge \frac{|G|}{\sum_{j=0}^{k_n}\binom{n}{j}} \ge \frac{|G|}{(n+1)^{k_n}}. \qquad (21)$$
The choice of the subset $G$ is a second key to the proof. Since we are going to upper bound the $d_n$-packing number with respect to the $\ell_\infty$-norm of the image set $Q[G]$, we select $G$ so that it is strictly contained in an $s$-dimensional cube of appropriate side lengths, say, $r_1, r_2, \dots, r_s$. Then, as $Q_i x$ is an integer for all $x$ and $i = 1,\dots,s$, the packing number with respect to the $\ell_\infty$-norm is upper bounded by
$$\prod_{i=1}^{s}\frac{r_i}{2 d_n} = \frac{\prod_{i=1}^{s} r_i}{(2d_n)^{s}}. \qquad (22)$$
Combining (21) and (22), it can be seen that with $d_n = n^{\delta}$ and $k_n = n^{\kappa}$,
$$\frac{\prod_{i=1}^{s} r_i}{(2d_n)^{s}} \ge \frac{|G|}{(n+1)^{k_n}} \;\Longrightarrow\; s\left(\sum_{i=1}^{s}\frac{\log r_i}{s} - \delta\log n - 1\right) \ge \log|G| - n^{\kappa}\log(n+1). \qquad (23)$$
Hence, to get the desired lower bound on $s$, we want $r_i \approx \sqrt{n}$ to within a poly-logarithmic factor and $|G| \approx 2^{n}$.

The above discussion motivates the following choice of $G$. Let $q_i \triangleq \|Q_i\|_1$, the number of $1$'s in the $i$-th counting measurement, $i = 1,\dots,s$. To this end, let us define the "atypical" sets to be excluded as follows: for $i = 1, 2, \dots, s$,
$$\text{if } q_i \ge \sqrt{n}\log n, \quad B_i \triangleq \bigl\{x \in \{0,1\}^{n} : |Q_i x - q_i/2| \ge \sqrt{q_i \log q_i}\bigr\}; \qquad (24)$$
$$\text{if } q_i < \sqrt{n}\log n, \quad B_i \triangleq \emptyset. \qquad (25)$$
The "typical" set to be considered is hence
$$G \triangleq \{0,1\}^{n}\setminus B, \quad \text{where } B \triangleq \bigcup_{i=1}^{s} B_i. \qquad (26)$$
To control the cardinality of $B_i$, let us employ Hoeffding's inequality as follows: randomize the data vector so that the $n$ elements $X_1,\dots,X_n$ are now $n$ i.i.d. $\mathrm{Ber}(1/2)$ random variables; in other words, $X = [X_1 X_2 \dots X_n]^\top \sim \mathrm{Unif}(\{0,1\}^{n})$. As a result, for (24), $|B_i| = 2^{n}\Pr\{|Q_i X - q_i/2| \ge r_i/2\}$, with $r_i = 2\sqrt{q_i\log q_i}$. Note that given a pooling vector $Q_i$, the outcome of the counting measurement, $Q_i X$, is just the sum of $q_i$ i.i.d. $\mathrm{Ber}(1/2)$ random variables. Hence, by Hoeffding's inequality, with $r_i = 2\sqrt{q_i \log q_i}$,
$$\Pr\{|Q_i X - q_i/2| \ge r_i/2\} \le 2\,\mathrm{e}^{-r_i^2/(2q_i)} = 2 q_i^{-2} \le \frac{2}{n\log^2 n}.$$
Consequently, for all $i = 1,\dots,s$, $|B_i| \le \frac{2^{n+1}}{n\log^2 n}$, and
$$|G| \ge 2^{n} - \sum_{i=1}^{s}|B_i| \ge 2^{n}\left(1 - \frac{2s}{n\log^2 n}\right) \ge 2^{n}\left(1 - \frac{2}{\log^2 n}\right), \qquad (27)$$
where in the last inequality we make use of the implicit assumption that $n \ge s$ when $n$ is sufficiently large, due to the achievability part (Lemma 3.1).

Let us now turn back to inequality (23). The choice of $G$ in (24)–(26), together with the fact that $0 \le Q_i x \le q_i$ (since it is the outcome of a counting measurement with $q_i$ items in the pool), ensures that the image set $Q[G]$ is strictly contained in an $s$-dimensional cube with side lengths not greater than $2\sqrt{n\log n}$. Hence, (23) and (27) imply
$$s\Bigl(\log\bigl(2\sqrt{n\log n}\bigr) - \delta\log n - 1\Bigr) \ge \log\Bigl(2^{n}\bigl(1 - \tfrac{2}{\log^2 n}\bigr)\Bigr) - n^{\kappa}\log(n+1).$$
As $n$ tends to infinity, we conclude that
$$\liminf_{n\to\infty}\; \frac{s}{n/\log n} \;\ge\; \frac{2}{1-2\delta}.$$
The proof for the sparse case (Lemma 3.4) largely follows that of the non-sparse case, with a slight modification of the definition of the "atypical" sets in (24) and (25): for $i = 1, 2, \dots, s$, the definition of $B_i$ in (24) is changed to
$$\Bigl\{x \in \{0,1\}^{n} : \|x\|_1 \le n^{\lambda},\; \bigl|Q_i x - \tfrac{q_i n^{\lambda}}{n}\bigr| \ge \sqrt{6\lambda n^{\lambda}\log n}\Bigr\}.$$
Accordingly, the "typical" set $G$ becomes $G \triangleq \{x \in \{0,1\}^{n} : \|x\|_1 \le n^{\lambda}\}\setminus B$, where $B \triangleq \bigcup_{i=1}^{s} B_i$. A Chernoff bound is then employed to control the cardinality of $B_i$. Note that the new definition of $B_i$ has an additional sparsity constraint $\|x\|_1 \le n^{\lambda}$. Removing the sparsity constraint, we obtain a set $\tilde B_i$ with cardinality not smaller than that of $B_i$. Now, randomize the data vector so that $X_i \overset{\text{i.i.d.}}{\sim} \mathrm{Ber}(n^{\lambda-1})$, $i = 1,\dots,n$. We first calculate $\Pr\bigl\{\bigl|Q_i X - \tfrac{q_i n^{\lambda}}{n}\bigr| \ge \sqrt{6\lambda n^{\lambda}\log n}\bigr\}$, and then relate this quantity to $|\tilde B_i|$.
$$\Pr\Bigl\{\bigl|Q_i X - \tfrac{q_i n^{\lambda}}{n}\bigr| \ge \sqrt{6\lambda n^{\lambda}\log n}\Bigr\} \overset{(a)}{\le} 2\bigl(n^{\lambda-1}\mathrm{e}^{t} + 1 - n^{\lambda-1}\bigr)^{q_i}\, \mathrm{e}^{-t\bigl(q_i n^{\lambda-1} + \sqrt{6\lambda n^{\lambda}\log n}\bigr)}$$
$$= 2\bigl(1 + n^{\lambda-1}(\mathrm{e}^{t}-1)\bigr)^{q_i}\, \mathrm{e}^{-t\bigl(q_i n^{\lambda-1} + \sqrt{6\lambda n^{\lambda}\log n}\bigr)} \le 2\,\mathrm{e}^{\,n^{\lambda-1}(\mathrm{e}^{t}-1)q_i}\, \mathrm{e}^{-t\bigl(q_i n^{\lambda-1} + \sqrt{6\lambda n^{\lambda}\log n}\bigr)} \overset{(b)}{\le} \frac{2}{n^{2\lambda}\log^2 n}.$$
$(a)$ follows from the Chernoff bound. In order to get $(b)$, it suffices to choose $t$ such that
$$t\Bigl(q_i n^{\lambda-1} + \sqrt{6\lambda n^{\lambda}\log n}\Bigr) - n^{\lambda-1}\bigl(\mathrm{e}^{t} - 1\bigr) q_i \qquad (28)$$
$$\ge 2\lambda\log(n) + 2\log(\log(n)). \qquad (29)$$
Then we have
$$(28) \overset{(a)}{=} \ln\left(1 + \frac{\sqrt{6\lambda n^{\lambda}\log n}}{n^{\lambda-1} q_i}\right)\Bigl(q_i n^{\lambda-1} + \sqrt{6\lambda n^{\lambda}\log n}\Bigr) - \sqrt{6\lambda n^{\lambda}\log n} := f(q_i)$$
$$\overset{(b)}{\ge} \ln\left(1 + \frac{\sqrt{6\lambda n^{\lambda}\log n}}{n^{\lambda}}\right)\Bigl(n^{\lambda} + \sqrt{6\lambda n^{\lambda}\log n}\Bigr) - \sqrt{6\lambda n^{\lambda}\log n}$$
$$\overset{(c)}{\ge} \left(\frac{\sqrt{6\lambda n^{\lambda}\log n}}{n^{\lambda}} - \frac{3\lambda n^{\lambda}\log n}{n^{2\lambda}}\right)\Bigl(n^{\lambda} + \sqrt{6\lambda n^{\lambda}\log n}\Bigr) - \sqrt{6\lambda n^{\lambda}\log n}$$
$$= \frac{1}{2}\cdot\frac{6\lambda n^{\lambda}\log n}{n^{\lambda}}\left(1 - \frac{\sqrt{6\lambda n^{\lambda}\log n}}{n^{\lambda}}\right) \ge 2\lambda\log n + 2\log(\log n).$$
$(a)$ follows from choosing $t = \ln\left(1 + \frac{\sqrt{6\lambda n^{\lambda}\log n}}{n^{\lambda-1} q_i}\right)$. $(b)$ follows from the fact that $f(q_i)$ is a decreasing function of $q_i$ and $0 \le q_i \le n$.
$(c)$ follows from $\ln(1+x) \ge x - \frac{x^2}{2}$.

Consequently, for all $i = 1,\dots,s$, $|B_i| \overset{(a)}{\le} \binom{n}{n^{\lambda}}\, n^{\lambda}\,\frac{2}{n^{2\lambda}\log^2 n} = \binom{n}{n^{\lambda}}\frac{2}{n^{\lambda}\log^2 n}$, and
$$|G| \ge \binom{n}{n^{\lambda}} - \sum_{i=1}^{s}|B_i| \ge \binom{n}{n^{\lambda}}\left(1 - \frac{2s}{n^{\lambda}\log^2 n}\right) \qquad (30)$$
$$\overset{(b)}{\ge} n^{(1-\lambda)n^{\lambda}}\left(1 - \Theta\Bigl(\frac{1}{\log n}\Bigr)\right). \qquad (31)$$
$(a)$ follows from the fact that among all $x \in \{0,1\}^{n}$ with $\|x\|_1 \le n^{\lambda}$, those with $\|x\|_1 = n^{\lambda}$ have the smallest probability $\bigl(\tfrac{n^{\lambda}}{n}\bigr)^{n^{\lambda}}\bigl(1 - \tfrac{n^{\lambda}}{n}\bigr)^{n-n^{\lambda}}$, together with $\binom{n}{n^{\lambda}}\bigl(\tfrac{n^{\lambda}}{n}\bigr)^{n^{\lambda}}\bigl(1 - \tfrac{n^{\lambda}}{n}\bigr)^{n-n^{\lambda}} \ge n^{-\lambda}$; in $(b)$, we make use of the implicit assumption that $s = O(n^{\lambda}\log n)$ when $n$ is sufficiently large, due to the achievability part (Lemma 3.3), and the fact that $\binom{n}{n^{\lambda}} \ge n^{(1-\lambda)n^{\lambda}}$.

Finally, combining (23) and (30)–(31), we get
$$s\Bigl(\log\bigl(2\sqrt{6\lambda n^{\lambda}\log n}\bigr) - \delta\log n - 1\Bigr) \ge \log\left(n^{(1-\lambda)n^{\lambda}}\Bigl(1 - \Theta\bigl(\tfrac{1}{\log n}\bigr)\Bigr)\right) - n^{\kappa}\log(n+1).$$
As $n$ tends to infinity, we conclude that
$$s \ge \begin{cases} \dfrac{1-\lambda}{2(\lambda - 2\delta)}\, n^{\lambda}, & 2\delta < \kappa, \\[2mm] \dfrac{1-\lambda}{2(\lambda - 2\delta)}\, n^{\lambda} \log n, & 2\delta = \kappa. \end{cases}$$

C. Proof of Theorem 4.1
Let $\epsilon \in (0, 1)$, and let $\lceil n^{\epsilon} \rceil$ denote the smallest width of a noiseless detecting matrix, corresponding to the non-adaptive pooling algorithm mentioned in Section 4 of [11], that is greater than or equal to $n^{\epsilon}$. Let $\lceil n^{1-\epsilon} \rceil$ denote the smallest size of a Sylvester-type Hadamard matrix that is greater than or equal to $n^{1-\epsilon}$. Let $\bar n = \lceil n^{1-\epsilon} \rceil \lceil n^{\epsilon} \rceil$. One can easily get that $n^{\epsilon} \le \lceil n^{\epsilon} \rceil \le 2n^{\epsilon}$ and $n^{1-\epsilon} \le \lceil n^{1-\epsilon} \rceil \le 2n^{1-\epsilon}$, and consequently, $n \le \bar n \le 4n$.

In order to prove that $\hat Q$ is an $(n, n^{\kappa}, n^{\delta})$-detecting matrix, we need to show that for all $a \ne b \in \{0,1\}^{n}$ with $\|a - b\|_1 \ge n^{\kappa}$, we have $\|\hat Q(a - b)\|_{\infty} \ge n^{\delta}$. Since $\hat Q$ is reduced (by deleting the last $\bar n - n$ columns) from $Q$, and $Q$ is the concatenation of $Q_{\bar n, 1}$, $Q_{\bar n, 2}$, and $P_{\bar n} = Q_{\bar n, 1} - Q_{\bar n, 2}$, it suffices to show that for any $a \ne b \in \{0,1\}^{\bar n}$ with $\|a - b\|_1 \ge n^{\kappa}$, $\|P_{\bar n}(a - b)\|_{\infty} \ge n^{\delta}$. Let $d = a - b = [d_1, d_2, \dots, d_{\lceil n^{1-\epsilon} \rceil}]$ be the equal-length division of the difference vector, where $a \ne b \in \{0,1\}^{\bar n}$, and let $y = [y_1, y_2, \dots, y_{\lceil n^{1-\epsilon} \rceil}] = P_{\bar n}\, d$ be the corresponding equal-length division of the result vector.
Since the rows of a Hadamard matrix form an orthogonal basis,
$$\|y\|_2^2 = \|P_{\bar n}\, d\|_2^2 = \sum_{i=1}^{\lceil n^{1-\epsilon}\rceil} \lceil n^{1-\epsilon}\rceil\, \bigl\|M_{\lceil n^{\epsilon}\rceil}\, d_i\bigr\|_2^2,$$
and $M_{\lceil n^{\epsilon}\rceil}$ is a noiseless detecting matrix, so for any $d_i \ne 0$, $M_{\lceil n^{\epsilon}\rceil}\, d_i \ne 0$; combined with the fact that $M_{\lceil n^{\epsilon}\rceil}\, d_i$ is an integer vector, this gives $\|M_{\lceil n^{\epsilon}\rceil}\, d_i\|_2^2 \ge 1$. But in our setting, $\|d\|_1 \ge n^{\kappa}$, so there exist at least $\frac{n^{\kappa}}{\lceil n^{\epsilon}\rceil}$ segments $d_i \ne 0$; hence
$$\|y\|_2^2 = \sum_{i=1}^{\lceil n^{1-\epsilon}\rceil} \lceil n^{1-\epsilon}\rceil\, \bigl\|M_{\lceil n^{\epsilon}\rceil}\, d_i\bigr\|_2^2 \ge \lceil n^{1-\epsilon}\rceil\,\frac{n^{\kappa}}{\lceil n^{\epsilon}\rceil} \ge \frac{1}{2}\, n^{1+\kappa-2\epsilon}.$$
Finally, remember that the height of $y$ equals the height of $M_{\lceil n^{\epsilon}\rceil}$ times the height of $H_{\lceil n^{1-\epsilon}\rceil}$, which is
$$\left(\frac{2\lceil n^{\epsilon}\rceil}{\log\bigl(\lceil n^{\epsilon}\rceil\bigr)} + O\left(\frac{\lceil n^{\epsilon}\rceil \log\bigl(\log\bigl(\lceil n^{\epsilon}\rceil\bigr)\bigr)}{\log^2\bigl(\lceil n^{\epsilon}\rceil\bigr)}\right)\right) \lceil n^{1-\epsilon}\rceil \qquad (32)$$
$$\le \frac{8n}{\log\bigl(n^{\epsilon}\bigr)} + O\left(\frac{n \log(\log(n))}{\log^2(n)}\right) \qquad (33)$$
$$\overset{(1)}{\le} \frac{16\, n}{\epsilon\log(n)} \overset{(2)}{=} \frac{32\, n}{(\kappa - 2\delta)\log(n)} = o\bigl(n^{1+\rho}\bigr) \text{ for every } \rho > 0. \qquad (34)$$
When $n$ is large enough, inequality $(1)$ follows. Equality $(2)$ follows from choosing $\epsilon = \frac{\kappa - 2\delta}{2}$. So we have
$$\|y\|_{\infty} \ge \sqrt{\frac{\|y\|_2^2}{n/2}} \ge \sqrt{n^{(1+\kappa-2\epsilon)-1}} = \sqrt{n^{\kappa - 2\epsilon}} \overset{(1)}{=} n^{\delta},$$
where the first inequality holds because, by (34), the height of $y$ is at most $n/2$ when $n$ is large enough, and equality $(1)$ again follows from choosing $\epsilon = \frac{\kappa - 2\delta}{2}$.
Hence we can see that, for any $a \ne b \in \{0,1\}^{\bar n}$ with $\|a - b\|_1 \ge n^{\kappa}$, we have $\|P_{\bar n}(a - b)\|_{\infty} \ge n^{\delta}$, which implies that $\hat Q$ is an $(n, n^{\kappa}, n^{\delta})$-detecting matrix; and by (34), the pooling complexity of this construction is no more than $\frac{32\, n}{(\kappa - 2\delta)\log(n)}$.
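The orthogonality fact driving the energy lower bound in the proof above, namely that for a Sylvester-type Hadamard matrix $\|H_M d\|_2^2 = M\|d\|_2^2$, can be verified numerically. The sketch below (the helper name `sylvester_hadamard` is ours, not the paper's) builds $H_M$ by the standard doubling recursion.

```python
import numpy as np

def sylvester_hadamard(M):
    """Sylvester-type Hadamard matrix H_M (M a power of two), built by the
    recursion H_{2m} = [[H_m, H_m], [H_m, -H_m]] starting from H_1 = [1]."""
    assert M > 0 and M & (M - 1) == 0, "M must be a power of two"
    H = np.array([[1]])
    while H.shape[0] < M:
        H = np.block([[H, H], [H, -H]])
    return H
```

Because $H_M H_M^\top = M I$, multiplying any nonzero integer block $d_i$ by $H_M$ spreads at least $M$ units of squared energy across the measurements, which is exactly the per-segment contribution $\lceil n^{1-\epsilon}\rceil\,\|M_{\lceil n^{\epsilon}\rceil} d_i\|_2^2$ counted in the proof.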