Sublinear decoding schemes for non-adaptive group testing with inhibitors
Thach V. Bui∗, Minoru Kuribayashi‡, Tetsuya Kojima§, and Isao Echizen∗†
∗ SOKENDAI (The Graduate University for Advanced Studies), Hayama, Kanagawa
‡ Graduate School of Natural Science and Technology, Okayama University, Okayama
§ National Institute of Technology, Tokyo College, Hachioji
† National Institute of Informatics, Tokyo
Abstract
Identification of up to d defective items and up to h inhibitors in a set of n items is the main task of non-adaptive group testing with inhibitors. To reduce the cost of this Herculean task, subsets of the n items are formed and then tested. This is called group testing. A test outcome on a subset of items is positive if the subset contains at least one defective item and no inhibitors, and negative otherwise. We present two decoding schemes for efficiently identifying the defective items and the inhibitors in the presence of up to e erroneous outcomes in time poly(d, h, e, log n), which is sublinear in the number of items n. This decoding complexity significantly improves on the state-of-the-art schemes, whose decoding time is linear in the number of items n, i.e., poly(d, h, e, n). Moreover, each column of the measurement matrices associated with the proposed schemes can be nonrandomly generated in time polynomial in the number of rows. As a result, one can save space for storing them. Simulation results confirm our theoretical analysis. When the number of items is sufficiently large, the decoding time of our proposed scheme is the smallest in comparison with existing work. In addition, when some erroneous outcomes are allowed, the number of tests in the proposed scheme is often smaller than the number of tests in existing work.

I. INTRODUCTION
Group testing was proposed by the economist Robert Dorfman, who tried to solve the problem of identifying which draftees had syphilis [1] in WWII. Nowadays, it is known as the problem of finding up to d defective items in a colossal number of items n by testing t subsets of the n items. It can also be viewed as the classification of up to d defective items and at least n − d negative items in a set of n items. The meanings of "items," "defective items," and "tests" depend on the context. Normally, a test on a subset of items (a test for short) is positive if the subset has at least one defective item, and negative otherwise. For test design, there are two main approaches: adaptive and non-adaptive designs. In adaptive group testing, the design of a test depends on the earlier tests. With this approach, the number of tests can be theoretically optimized [2]. However, performing such sequential tests takes a long time. Therefore, non-adaptive group testing (NAGT) [3], [2] is preferable: all tests are designed in advance and performed in parallel. The proliferation of NAGT applications in various fields, such as DNA library screening [4], DNA hybridization [5], multiple-access channels [6], data streaming [7], compressed sensing [8], similarity searching [9], and neuroscience [10], has made it more attractive recently. We thus focus on NAGT in this work.

The development of NAGT applications in molecular biology led to the introduction of another type of item: the inhibitor. An item is considered to be an inhibitor if it interferes with the identification of defective items in a test, i.e., a test containing at least one inhibitor returns a negative outcome. In this "Group Testing with Inhibitors (GTI)" model, the outcome of a test on a subset of items is positive iff the subset has at least one defective item and no inhibitors. Due to its great potential for use in applications, the GTI model has been intensively studied over the last two decades [11], [12], [13], [14].

In NAGT using the GTI model (NAGTI), if t tests are needed to identify up to d defective items and up to h inhibitors among n items, these tests form a t × n measurement matrix. The procedure for obtaining the matrix is called the construction procedure. The procedure for obtaining the outcomes of the t tests using the matrix is called the encoding procedure, and the procedure for obtaining the defective items and the inhibitor items from the t outcomes is called the decoding procedure. Since noise typically occurs in biological experiments, we assume that there are up to e erroneous outcomes among the test outcomes. The objective of NAGTI is to design a scheme such that all items are "efficiently" identified by the encoding and decoding procedures in the presence of noise.

There are two approaches when using NAGTI. One is to identify defective items only. Chang et al. [15] proposed a scheme using O((d + h + e) log n) tests to identify all defective items in time O((d + h + e) n log n). Using a probabilistic scheme, Ganesan et al. [16] reduced the number of tests to O((d + h) log n) and the decoding time to O((d + h) n log n). However, their scheme is applicable only in a noise-free setting, which is restrictive in practice. The second approach is to identify both defective items and inhibitors. Chang et al. [15] proposed a scheme using O(e(d + h) log n) tests to classify n items in time O(e(d + h) n log n). Without considering noise in the test outcomes, Ganesan et al. [16] used O((d + h) log n) tests to identify at most d defective items and at most h inhibitor items in time O((d + h) n log n).

A. Problem definition
We address two problems. The first is how to efficiently identify defective items from the test outcomes in the presence of noise. The second is how to efficiently identify both defective items and inhibitor items from the test outcomes in the presence of noise. Let z be an odd integer and e = (z − 1)/2 be the maximum number of errors in the test outcomes.

Problem 1.
There are n items, including up to d defective items and up to h inhibitor items. Is there a measurement matrix such that
• all defective items can be identified in time poly(d, h, e, log n) in the presence of up to e erroneous outcomes, where the number of rows in the measurement matrix is much smaller than n?
• each column of the matrix can be nonrandomly generated in time polynomial in the number of rows?
Problem 2.
There are n items, including up to d defective items and up to h inhibitor items. Is there a measurement matrix such that
• all defective items and inhibitor items can be identified in time poly(d, h, e, log n) in the presence of up to e erroneous outcomes, where the number of rows in the measurement matrix is much smaller than n?
• each column of the matrix can be nonrandomly generated in time polynomial in the number of rows?
We note that some previous works, such as [17], [18], do not consider inhibitor items. In this case, Problems 1 and 2 reduce to the same problem by eliminating all terms related to "inhibitor items."
B. Problem model
We model NAGTI as follows. Suppose that there are up to d defectives and up to h inhibitors among n items. Let x = (x_1, …, x_n)^T ∈ {0, 1, −∞}^n be the vector representation of the n items. Item j is defective iff x_j = 1, is an inhibitor iff x_j = −∞, and is negative iff x_j = 0. Note that the number of defective items must be at least one; otherwise, all the designed tests would yield negative outcomes. Suppose that there are at most d 1's in x, i.e., |D| ≤ d with D = {j | x_j = 1, for j = 1, …, n}, and at most h −∞'s in x, i.e., |H| ≤ h with H = {j | x_j = −∞, for j = 1, …, n}.

Let Q = (q_ij) be a q × n binary measurement matrix used to identify the defectives and inhibitors among the n items. Item j is represented by column j of Q (Q_j) for j = 1, …, n. Test i is represented by row i, in which q_ij = 1 iff item j belongs to test i, and q_ij = 0 otherwise, where i = 1, …, q. Then the outcome vector using the measurement matrix Q is

$$\mathbf{r} = \mathcal{Q} \otimes \mathbf{x} = \begin{bmatrix} r_1 \\ \vdots \\ r_q \end{bmatrix}, \qquad (1)$$

where ⊗ is called the NAGTI operator, and the test outcome r_i = 1 iff Σ_{j=1}^{n} q_ij x_j ≥ 1, and r_i = 0 otherwise, for i = 1, …, q. Note that we assume 0 × (−∞) = 0 and that there may be at most e erroneous outcomes in r.

Given l binary vectors y_w = (y_{1w}, y_{2w}, …, y_{Bw})^T for w = 1, …, l and some integer B ≥ 1, the union of y_1, …, y_l is defined as the vector y = ∨_{i=1}^{l} y_i = (∨_{i=1}^{l} y_{1i}, …, ∨_{i=1}^{l} y_{Bi})^T, where ∨ is the OR operator. Then, when the vector x is binary, i.e., there is no inhibitor among the n items, (1) can be written as

$$\mathbf{r} = \mathcal{Q} \otimes \mathbf{x} = \bigvee_{j=1}^{n} x_j \mathcal{Q}_j = \bigvee_{j \in D} \mathcal{Q}_j. \qquad (2)$$

Our objective is to design the matrix Q such that the vector x can be recovered from r in time poly(q) = poly(d, h, e, log n).

C. Our contributions
Overview:
Our objective is to reduce the decoding complexity of identifying up to d defectives and/or up to h inhibitors in the presence of up to e erroneous test outcomes. We present two deterministic schemes that solve Problems 1 and 2 with probability 1. These schemes rely on two basic ideas: each column of a t × n (d + h, r; z]-disjunct matrix (defined later) must be generated in time poly(t), and this matrix is combined with a special signature matrix through the tensor product (defined later). These ideas reduce the decoding complexity to poly(t). Moreover, the measurement matrices used in our proposed schemes are nonrandom, i.e., their columns can be nonrandomly generated in time polynomial in the number of rows. As a result, one can save space for storing the measurement matrices. Simulation results confirm our theoretical analysis. When the number of items is sufficiently large, the decoding time of our proposed scheme is the smallest in comparison with existing work. In addition, when some erroneous outcomes are allowed, the number of tests in the proposed scheme is often smaller than the number of tests in existing work.

Comparison:
We compare our proposed schemes with existing schemes in Table I. There are six criteria to be considered here. The first one is construction type, which defines how a measurement matrix is obtained. It also affects how defectives and inhibitors are identified. The most common construction type is random, i.e., the measurement matrix is generated randomly. All schemes evaluated here use random construction except for our proposed schemes.

The second criterion is decoding type: "Deterministic" means that the decoding objectives are always achieved, i.e., with probability 1, while "Randomized" means that they are achieved with some high probability. Ganesan et al. [16] used randomized decoding schemes to identify defectives and inhibitors. The schemes in [15] and our proposed schemes use deterministic decoding.

The remaining criteria are: identification of defective items only, identification of both defective items and inhibitor items, error tolerance, the number of tests, and the decoding complexity. The only advantage of the schemes proposed by Ganesan et al. [16] is that they use fewer tests than ours. Our schemes outperform the existing schemes in the other criteria, namely error tolerance, decoding type, and decoding complexity. The number of tests of our proposed schemes for identifying defective items only or both defective items and inhibitor items is slightly larger than that of the two schemes proposed by Chang et al. [15]. However, the decoding complexity of our proposed schemes is much lower than theirs.

TABLE I: Comparison with existing schemes. "Deterministic" and "Randomized" are abbreviated as "Det." and "Rnd.". Notation log stands for log_2. The √ sign means that the criterion holds for that scheme, while the × sign means that it does not. We set e = (z − 1)/2 and λ = (d + h) ln n / W((d + h) ln n) + z. Note that W(x)e^{W(x)} = x and W(x) = Θ(ln x − ln ln x).

| Scheme | Construction type | Decoding type | Max. no. of errors | Identifies defectives | Identifies inhibitors | Number of tests (t) | Decoding complexity |
|---|---|---|---|---|---|---|---|
| Chang et al. [15] | Random | Det. | e | √ | × | O((d + h + e) log n) | O(tn) |
| Ganesan et al. [16] | Random | Rnd. | 0 | √ | × | O((d + h) log n) | O(tn) |
| Proposed (Theorem 4) | Nonrandom | Det. | e | √ | × | Θ(λ^2 log n) | O(λ^5 log n / (d + h)) |
| Chang et al. [15] | Random | Det. | e | √ | √ | O(e(d + h) log n) | O(tn) |
| Ganesan et al. [16] | Random | Rnd. | 0 | √ | √ | O((d + h) log n) | O(tn) |
| Proposed (Theorem 5) | Nonrandom | Det. | e | √ | √ | see Theorem 5 | see Theorem 5 |

II. PRELIMINARIES
Notation is defined here for consistency. We use capital calligraphic letters for matrices, non-capital letters for scalars, bold letters for vectors, and capital letters for sets. Capital letters with an asterisk denote multisets, in which elements may appear multiple times; for example, a set S contains each element at most once, whereas a multiset S* may contain the same element several times.

Here we assume 0 × (−∞) = 0. We also list some frequent notations as follows:
• n; d: number of items; maximum number of defective items. For simplicity, we suppose that n is a power of 2.
• | · |: the weight, i.e., the number of non-zero entries of the input vector, or the cardinality of the input set.
• ⊗, ⊛: operator for NAGTI and tensor product, respectively (to be defined later).
• [n]: {1, 2, …, n}.
• S: s × n measurement matrix used to identify at most one defective item or one inhibitor item, where s = 2 log n.
• M = (m_ij): m × n disjunct matrix, where the integer m ≥ 1 is the number of tests.
• T = (t_ij): t × n measurement matrix used to identify at most d defective items, where the integer t ≥ 1 is the number of tests.
• x; y: representation of n items; binary representation of the test outcomes.
• S_j, M_j, M_{i,∗}: column j of matrix S, column j of matrix M, and row i of matrix M.
• D; H: index set of defective items; index set of inhibitor items. For example, D = {2, 6} means that items 2 and 6 are defectives, and H = {10, 11} means that items 10 and 11 are inhibitors.
• supp(c): support set of vector c = (c_1, …, c_k), i.e., supp(c) = {j | c_j ≠ 0}. Entries equal to −∞ are non-zero and therefore belong to the support.
• diag(M_{i,∗}) = diag(m_i1, m_i2, …, m_in): diagonal matrix constructed from the input vector M_{i,∗} = (m_i1, m_i2, …, m_in).
• e, log, ln: base of the natural logarithm, logarithm of base 2, and natural logarithm.
• ⌈x⌉; ⌊x⌋: ceiling function of x; floor function of x.
• W(x): the Lambert W function, in which W(x)e^{W(x)} = x and W(x) = Θ(ln x − ln ln x).

A. Tensor product
Let ⊛ denote the tensor product. Note that the tensor product defined here is not the usual tensor product used in linear algebra. Given an a × n matrix A = (a_ij) and an s × n matrix S = (s_ij), the r × n tensor product R = (r_ij) is defined as

$$\mathcal{R} = \mathcal{A} \circledast \mathcal{S} := \begin{bmatrix} \mathcal{S} \times \mathrm{diag}(\mathcal{A}_{1,*}) \\ \vdots \\ \mathcal{S} \times \mathrm{diag}(\mathcal{A}_{a,*}) \end{bmatrix} = \begin{bmatrix} a_{11}\mathcal{S}_1 & \cdots & a_{1n}\mathcal{S}_n \\ \vdots & \ddots & \vdots \\ a_{a1}\mathcal{S}_1 & \cdots & a_{an}\mathcal{S}_n \end{bmatrix}, \qquad (3)$$

where diag(·) is the diagonal matrix constructed from the input vector, and A_{h,∗} = (a_h1, …, a_hn) is the h-th row of A for h = 1, …, a. The size of R is r × n, where r = a × s. For example, suppose that a = 3, s = 2, and n = 4. Then R = A ⊛ S is a 6 × 4 matrix in which rows 2h − 1 and 2h form the block S × diag(A_{h,∗}) for h = 1, 2, 3; for a binary A, column j of this block is S_j if a_hj = 1 and the all-zero column if a_hj = 0.

B. Reed-Solomon codes
Let n_1, r, Λ, q be positive integers. Let Σ be a finite field, which is called the alphabet of the code, with |Σ| = q. From now on, we set Σ = F_q. Each codeword is considered as a vector of F_q^{n_1 × 1}. An (n_1, r, Λ)_q code C is a subset of Σ^{n_1} such that: (i) Λ = min_{x, y ∈ C, x ≠ y} Δ(x, y), where Δ(x, y) is the number of positions in which the corresponding entries of x and y differ; and (ii) the cardinality of C, i.e., |C|, is at least q^r. The parameters (n_1, r, Λ, q) represent the block length, dimension, minimum distance, and alphabet size of C. Assume that for any y ∈ C, there exists a message x ∈ F_q^r such that y = Gx, where G is a full-rank n_1 × r matrix over F_q. Then C is called a linear code with minimum distance Λ = min_{y ∈ C, y ≠ 0} |supp(y)| and is denoted as [n_1, r, Λ]_q. Let M_C denote the n_1 × q^r matrix whose columns are the codewords in C.

An [n_1, r, Λ]_q-Reed-Solomon (RS) code [19] is an [n_1, r, Λ]_q code with Λ = n_1 − r + 1. Since the parameter Λ can be obtained from n_1 and r, we usually refer to an [n_1, r, Λ]_q-RS code as an [n_1, r]_q-RS code.

C. Disjunct matrix
Superimposed code was introduced by Kautz and Singleton [20] and then generalized by D’yachkovet al. [21] and Stinson and Wei [22]. A superimposed code is defined as follows.
Definition 1. An m × n binary matrix M is called a (d, r; z]-superimposed code if for any two disjoint subsets S_1, S_2 ⊂ [n] such that |S_1| = d and |S_2| = r, there exist at least z rows in which all the columns in S_2 have 1's while all the columns in S_1 have 0's, i.e.,

$$\left| \left( \bigcap_{j \in S_2} \mathrm{supp}(\mathcal{M}_j) \right) \setminus \left( \bigcup_{j \in S_1} \mathrm{supp}(\mathcal{M}_j) \right) \right| \geq z.$$

Such an M is usually referred to as a (d, r; z]-disjunct matrix. Illustration: for the r columns indexed by S_2 and the d columns indexed by S_1, there are z rows in which the r columns of S_2 are all 1's and the d columns of S_1 are all 0's.

The parameter e = ⌊(z − 1)/2⌋ is usually referred to as the error tolerance of a disjunct matrix. It is clear that for any d' ≤ d, r' ≤ r, and z' ≤ z, a (d, r; z]-disjunct matrix is also a (d', r'; z']-disjunct matrix.

Let M = (m_ij) be an m × n binary (d, r; z]-disjunct matrix and x = (x_1, …, x_n)^T ∈ {0, 1}^n be the binary representation vector of n items, where |x| ≤ d. From (2), the outcome vector of the m tests using M and x is defined as follows:

$$\mathbf{y} = \mathcal{M} \otimes \mathbf{x} = \bigvee_{j=1}^{n} x_j \mathcal{M}_j = \bigvee_{j \in D} \mathcal{M}_j, \qquad (5)$$

where D = supp(x) = {j | x_j ≠ 0} = {j | x_j = 1}. The procedure to get y is called the encoding procedure. It includes the construction procedure, which is to obtain the measurement matrix M. The procedure to recover x from y and M is called the decoding procedure. Our objective is to recover x when the outcome vector y and the matrix M are given. The naive decoding, given an outcome vector, is to scan all columns. If a column is not contained in the outcome vector, i.e., supp(M_j) ⊄ supp(y), the item corresponding to that column is negative.
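The column-scanning decoder, together with a direct check of Definition 1, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from the cited works: matrices are lists of 0/1 rows, items are 0-indexed, and the helper names is_disjunct, encode, and naive_decode are hypothetical. The brute-force check enumerates every pair (S_1, S_2), so it is only meant for tiny matrices.

```python
from itertools import combinations

def is_disjunct(M, d, r, z):
    """Brute-force check of Definition 1: for all disjoint S1 (|S1| = d) and
    S2 (|S2| = r), at least z rows are all 1 on S2 and all 0 on S1."""
    n = len(M[0])
    for S2 in combinations(range(n), r):
        rest = [j for j in range(n) if j not in S2]
        for S1 in combinations(rest, d):
            good_rows = sum(
                1 for row in M
                if all(row[j] == 1 for j in S2) and all(row[j] == 0 for j in S1)
            )
            if good_rows < z:
                return False
    return True

def encode(M, D):
    """Outcome vector (5): bitwise OR of the columns indexed by the defective set D."""
    return [1 if any(row[j] for j in D) else 0 for row in M]

def naive_decode(M, y):
    """Keep item j iff supp(M_j) is a subset of supp(y); every other item is
    declared negative. One pass over all m*n entries, i.e. O(mn) time."""
    n = len(M[0])
    return {j for j in range(n)
            if all(y[i] == 1 for i, row in enumerate(M) if row[j] == 1)}

if __name__ == "__main__":
    M = [[1, 1, 0],
         [1, 0, 1],
         [0, 1, 1]]
    print(is_disjunct(M, 1, 1, 1), is_disjunct(M, 2, 1, 1))   # True False
    print(naive_decode(M, encode(M, {1})))                     # {1}
```

The brute-force check grows combinatorially in d and r, so it serves only as a sanity check on small examples; naive_decode itself is the O(mn) scan described above.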
Once the negative items are identified, the remaining items can be taken as defectives. With this naive decoding, up to r − 1 false positives are identified in time O(mn). Moreover, at most |x| + r − 1 (and at least |x|) defective items are identified.

The number of rows in an m × n (d, r; z]-disjunct matrix is usually exponential in d [18], [23]. Cheraghchi [24] proposed a nonrandom construction of (d, r; z]-disjunct matrices in which the number of tests is larger than in existing works as d or r increases.

Theorem 1 (Lemma 29 [24]). For any positive integers d, r, z and n with d + r ≤ n, there exists an m × n nonrandom (d, r; z]-disjunct matrix where m = O((rd ln n + z)^{r+1}). Moreover, each column of the matrix can be generated in time poly(m).

A (d, r; z]-disjunct matrix is called a (d; z]-disjunct matrix when r = 1, and a d-disjunct matrix when r = z = 1. For efficient decoding in the NAGTI model, we pay attention only to m × n binary (d, r; z]-disjunct matrices in which each column can be generated in time poly(m). Cheraghchi [25] presented a matrix that can handle at most e false positives and e false negatives in the outcome vector. However, the reconstructed vector may differ from the original vector x in O(d) positions; i.e., there is no guarantee that the measurement matrix is d-disjunct. Therefore, it is unsuitable for efficient decoding in NAGTI. The t × n d-disjunct matrix proposed in [26] can be used to obtain a (d; z]-disjunct matrix by stacking it z times. Each column of the resulting matrix can be generated in time poly(t). However, the number of tests is zd log n, which is large. Moreover, the construction in [26] is random, which is restrictive in practice, especially in biological screening.

D. Bui et al.'s scheme
In this section, we describe the scheme proposed by Bui et al. [17]. Its main contribution is that, given any m × n (d − 1)-disjunct matrix, a bigger t × n measurement matrix can be generated such that up to d defective items (in a set of n items having only defective and negative items) can be identified in time O(t) = O(m log n), where t = 2m log n.

Encoding procedure:
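Let S be an s × n measurement matrix:

$$\mathcal{S} := \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \\ \overline{\mathbf{b}}_1 & \overline{\mathbf{b}}_2 & \cdots & \overline{\mathbf{b}}_n \end{bmatrix} = \begin{bmatrix} \mathcal{S}_1 & \cdots & \mathcal{S}_n \end{bmatrix}, \qquad (6)$$

where s = 2 log n, b_j is the (log n)-bit binary representation of the integer j − 1, \bar{b}_j is the complement of b_j, and S_j := [b_j; \bar{b}_j] for j = 1, 2, …, n. Item j is characterized by column S_j, and the weight of every column in S is s/2 = log n. Furthermore, the index j is uniquely identified by b_j.

For example, if we set n = 8, then s = 2 log n = 6, and the matrix in (6) becomes

$$\mathcal{S} = \begin{bmatrix} 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \\ 1&1&1&1&0&0&0&0 \\ 1&1&0&0&1&1&0&0 \\ 1&0&1&0&1&0&1&0 \end{bmatrix}. \qquad (7)$$

Given an m × n (d − 1)-disjunct matrix M, the new t × n measurement matrix is constructed as follows:

$$\mathcal{T} = \mathcal{M} \circledast \mathcal{S}, \qquad (8)$$

where ⊛ is the tensor product defined in Section II-A and t = ms. For any binary input vector x, its outcome using the measurement matrix T is

$$\mathbf{y} = \mathcal{T} \otimes \mathbf{x} = \begin{bmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_s \\ \vdots \\ y_{(m-1)s+1} \\ \vdots \\ y_t \end{bmatrix}, \qquad (9)$$

where \mathbf{y}_i = (S × diag(M_{i,∗})) ⊗ x = ∨_{j=1}^{n} x_j m_ij S_j for i = 1, …, m.

Decoding procedure: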
The decoding procedure is simple. We scan all y_i for i = 1, …, m. If wt(y_i) = log n, exactly one defective item participates in the i-th group of tests (no defective gives weight 0, and the union of two or more distinct columns of weight log n has weight larger than log n), and this item is identified from the first half of y_i. Otherwise, no defective item is identified from y_i. The procedure is described in Algorithm 1. This scheme can be summarized in the following theorem:

Theorem 2.
Let an m × n matrix M be (d − 1)-disjunct. Suppose that a set of n items has up to d defectives and no inhibitors. Then there exists a t × n matrix T constructed from M that can be used to identify up to d defective items in time O(t), where t = m × 2 log n. Further, suppose that each column of M can be computed in time β. Then every column of T can be computed in time 2 log n × β = O(β log n).

Algorithm 1 is modified and denoted as
GetDefectives*(y, n) if we substitute S by a multiset S*; i.e., the output of GetDefectives*(·) may contain duplicated items, which are used to handle erroneous outcomes in Sections IV and V. Line 8 is then interpreted as "Add d to the multiset S*."

Algorithm 1 GetDefectives(y, n): detection of up to d defective items.
Input: number of items n; outcome vector y
Output: defective items
1: s = 2 log n.
2: S = ∅.
3: Let t be the number of entries in y.
4: Divide y into m = t/s smaller vectors y_1, …, y_m such that y = (y_1, …, y_m)^T and each has size s.
5: for i = 1 to m do
6:   if wt(y_i) = log n then
7:     Get defective item d by checking the first half of y_i.
8:     S = S ∪ {d}.
9:   end if
10: end for
11: return S.

III. IMPROVED INSTANTIATION OF NONRANDOM (d, r; z]-DISJUNCT MATRICES
We first state a useful nonrandom construction of (d, r; z]-disjunct matrices, which is an instance of Theorem 1:

Theorem 3 (Lemma 29 [24]). Let 1 ≤ d, r, z < n be integers and C be an [n_1 = q − 1, k]_q-RS code. For any d < (n_1 − z)/(r(k − 1)) = (q − 1 − z)/(r(k − 1)) and n ≤ q^k, there exists a t × n nonrandom (d, r; z]-disjunct matrix where t = O(q^{r+1}). Moreover, each column of the matrix can be constructed in time O(q^{r+2}/(r^2 d)).

Let W(x) be the Lambert W function, in which W(x)e^{W(x)} = x for any x ≥ −1/e. An approximation of W(x) [27] is ln x − ln ln x ≤ W(x) ≤ ln x − (1/2) ln ln x for any x ≥ e. Then an improved instantiation of nonrandom (d, r; z]-disjunct matrices is stated as follows:

Corollary 1.
Let 1 ≤ r, d + z ≤ n be integers. Then there exists a t × n nonrandom (d, r; z]-disjunct matrix where

$$t = \Theta\left( \left( \frac{r d \ln n}{W(d \ln n)} + z \right)^{r+1} \right).$$

Moreover, each column of the matrix can be constructed in time

$$O\left( \frac{1}{r^2 d} \left( \frac{r d \ln n}{W(d \ln n)} + z \right)^{r+2} \right).$$

Proof.
From Theorem 3, we only need to find an [n_1 = q − 1, k]_q-RS code such that d < (n_1 − z)/(r(k − 1)) = (q − 1 − z)/(r(k − 1)) and q^k ≥ n. One chooses

$$q = \begin{cases} \dfrac{r d \ln n}{W(d \ln n)} + z + 1 & \text{if } \dfrac{r d \ln n}{W(d \ln n)} + z + 1 \text{ is a power of } 2, \\ 2^{\eta + 1} & \text{otherwise}, \end{cases} \qquad (10)$$

where η is an integer satisfying 2^η < rd ln n / W(d ln n) + z + 1 < 2^{η+1}. We have q = Θ(rd ln n / W(d ln n) + z) in both cases because

$$\frac{r d \ln n}{W(d \ln n)} + z + 1 \le q < 2\left( \frac{r d \ln n}{W(d \ln n)} + z + 1 \right).$$

Then we set k = ⌈(q − z − 1)/(rd)⌉ ≥ ln n / W(d ln n). Note that the condition on d in Theorem 3 always holds because

$$k = \left\lceil \frac{q - z - 1}{r d} \right\rceil \implies k < \frac{q - z - 1}{r d} + 1 \implies d < \frac{q - 1 - z}{r(k - 1)} = \frac{n_1 - z}{r(k - 1)}.$$

Finally, our task is to prove that n ≤ q^k. Since d ln n / W(d ln n) = e^{W(d ln n)} by the definition of W, we have

$$q^k \ge \left( \frac{r d \ln n}{W(d \ln n)} + z + 1 \right)^{\frac{\ln n}{W(d \ln n)}} \ge \left( \frac{d \ln n}{W(d \ln n)} \right)^{\frac{\ln n}{W(d \ln n)}} = \left( e^{W(d \ln n)\, e^{W(d \ln n)}} \right)^{1/d} = \left( e^{d \ln n} \right)^{1/d} = n.$$

This completes our proof.

The number of tests in our construction is smaller than that in Theorem 1. Furthermore, there is no decoding scheme associated with the matrices in this corollary except the naive one when the input is a binary vector. However, when r = z = 1, the scheme in [17] achieves the same number of tests and has an efficient decoding algorithm.

IV. IDENTIFICATION OF DEFECTIVE ITEMS
In this section, we answer Problem 1 by showing that there exists a t × n measurement matrix such that it can handle at most e errors in the test outcomes, each of its columns can be nonrandomly generated in time poly(t), and all defective items can be identified in time poly(d, h, e, log n), where there are up to d defective items and up to h inhibitor items among n items. The main idea is to use Algorithm 1 to identify all potential defective items. Then a sanitization procedure is performed to remove all false defective items.

Theorem 4.
Let 1 ≤ z, d + h ≤ n be integers, z be odd, and λ = (d + h) ln n / W((d + h) ln n) + z. A set of n items includes up to d defective items and up to h inhibitors. Then there exists a nonrandom t × n matrix such that up to d defective items can be identified in time O(λ^5 log n / (d + h)) with up to e = (z − 1)/2 errors in the test outcomes, where t = Θ(λ^2 log n). Moreover, each column of the matrix can be generated in time poly(t).

The proof is given in the following sections.
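The concrete parameters behind Theorem 4 can be computed by following the proof of Corollary 1 with r = 1 and d replaced by d + h. The sketch below is our own illustration, not the authors' code: lambert_w is a hypothetical helper that evaluates the principal branch by Newton's method (a library routine such as SciPy's scipy.special.lambertw could be used instead), and theorem4_parameters returns λ, the field size q chosen as in (10), and the RS dimension k = ⌈(q − z − 1)/(d + h)⌉.

```python
import math

def lambert_w(x, tol=1e-12):
    """Principal branch of the Lambert W function for x > 0 (Newton on w*e^w = x)."""
    w = math.log1p(x)                 # W(x) <= ln(1 + x) for x >= 0, so we start above the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1))
        w -= step
        if abs(step) < tol:
            break
    return w

def theorem4_parameters(n, d, h, z):
    """lambda, q and k from the proof of Corollary 1 with r = 1 and d -> d + h."""
    dh = d + h
    lam = dh * math.log(n) / lambert_w(dh * math.log(n)) + z
    q = 2 ** math.ceil(math.log2(lam + 1))   # (10): smallest power of 2 that is >= lambda + 1
    k = math.ceil((q - z - 1) / dh)          # k = ceil((q - z - 1) / (r d)) with r = 1
    return lam, q, k

if __name__ == "__main__":
    n, d, h, z = 2 ** 20, 5, 3, 3
    lam, q, k = theorem4_parameters(n, d, h, z)
    print(round(lam, 1), q, k)               # 35.0 64 8
    assert q ** k >= n                        # n <= q^k
    assert d + h < (q - 1 - z) / (k - 1)      # condition of Theorem 3 with r = 1
```

For n = 2^20, d = 5, h = 3, and z = 3, the sketch gives λ = 35, q = 64, and k = 8, and both conditions required by Theorem 3 hold; it says nothing about the hidden constants in t = Θ(λ^2 log n).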
A. Encoding procedure
We set e = (z − 1)/2 and λ = (d + h) ln n / W((d + h) ln n) + z. Let an m × n matrix M be a (d + h; z]-disjunct matrix from Corollary 1 (with r = 1 and d replaced by d + h), where

$$m = \Theta\left( \left( \frac{(d + h) \ln n}{W((d + h) \ln n)} + z \right)^2 \right) = O(\lambda^2).$$

Each column of M can be generated in time t_1, where t_1 = O(λ^3/(d + h)). Then the final t × n measurement matrix T is

$$\mathcal{T} = \mathcal{M} \circledast \mathcal{S}, \qquad (11)$$

where the s × n matrix S is defined in (6) and t = ms = Θ(λ^2 log n). It is then easy to see that each column of matrix T can be generated in time t_1 × s = poly(t).

Any input vector x = (x_1, …, x_n)^T ∈ {0, 1, −∞}^n contains at most d 1's and at most h −∞'s, as described in Section I-B. Recall that D and H are the index sets of the defective items and the inhibitor items, respectively. Then the binary outcome vector using the measurement matrix T is

$$\mathbf{y} = \mathcal{T} \otimes \mathbf{x} = \begin{bmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_m \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_s \\ \vdots \\ y_{(m-1)s+1} \\ \vdots \\ y_t \end{bmatrix}, \qquad (12)$$

where

$$\mathbf{y}_i = (\mathcal{S} \times \mathrm{diag}(\mathcal{M}_{i,*})) \otimes \mathbf{x} = \begin{bmatrix} y_{(i-1)s+1} \\ \vdots \\ y_{is} \end{bmatrix}, \qquad (13)$$

and y_{(i−1)s+l} = 1 iff Σ_{j=1}^{n} m_ij s_lj x_j ≥ 1, and y_{(i−1)s+l} = 0 otherwise, for i = 1, …, m and l = 1, …, s. We assume that there are at most e incorrect outcomes in the outcome vector y.

B. Decoding procedure
Given the outcome vector y = (y_1, …, y_m)^T, we can identify all defective items by using Algorithm 2. Step 1 identifies all potential defectives and puts them in the multiset S*. Steps 3 to 8 then keep every candidate that appears at least e + 1 times in S* and remove its duplicates, producing the potential defective set S. After that, Steps 9 to 17 remove all false defectives. Finally, Step 18 returns the defective set.
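Steps 1 to 8 of Algorithm 2, together with the constructions they rely on (the matrix S of (6), the tensor product (3), and the NAGTI operator (1)), can be sketched in Python as follows. This is a toy illustration under stated assumptions: items are 0-indexed (item j is encoded by the bits of j rather than j − 1), n is a power of 2, all function names are hypothetical, and the (d + h; z]-disjunct matrix M is replaced by z stacked copies of the n × n identity matrix, which is trivially (n − 1; z]-disjunct but has zn rows instead of the Θ(λ^2) rows of Corollary 1. The false-defective check of Steps 9 to 17 is not included.

```python
import math
from collections import Counter

def signature_matrix(n):
    """Matrix S of (6): column j stacks the bits of j (MSB first) on their complements."""
    L = int(math.log2(n))
    top = [[(j >> (L - 1 - l)) & 1 for j in range(n)] for l in range(L)]
    return top + [[1 - v for v in row] for row in top]

def tensor(A, S):
    """R = A (tensor) S as in (3): block h is S x diag(A_h), i.e. column j scaled by a_hj."""
    return [[S[l][j] * A[h][j] for j in range(len(S[0]))]
            for h in range(len(A)) for l in range(len(S))]

def nagti_outcome(T, x):
    """r_i = 1 iff sum_j t_ij x_j >= 1; x[j] is 1, 0 or float('-inf').
    Summing only over items in the test implements 0 * (-inf) = 0
    (plain Python would give 0 * float('-inf') == nan)."""
    return [1 if sum(x[j] for j, t in enumerate(row) if t) >= 1 else 0 for row in T]

def get_defectives_star(y, n):
    """GetDefectives*: one candidate per block of weight log n, kept as a multiset (list)."""
    L = int(math.log2(n))
    s = 2 * L
    out = []
    for i in range(0, len(y), s):
        block = y[i:i + s]
        if sum(block) == L:
            out.append(int("".join(map(str, block[:L])), 2))   # read the first half
    return out

def candidate_defectives(y, n, e):
    """Steps 1-8 of Algorithm 2: keep items reported by at least e + 1 blocks."""
    counts = Counter(get_defectives_star(y, n))
    return {item for item, c in counts.items() if c >= e + 1}

if __name__ == "__main__":
    n, z, e = 8, 3, 1
    # Toy (n - 1; z]-disjunct matrix: z stacked copies of the n x n identity.
    M = [[int(i == j) for j in range(n)] for _ in range(z) for i in range(n)]
    T = tensor(M, signature_matrix(n))
    x = [0] * n
    x[2], x[5], x[6] = 1, 1, float("-inf")   # defectives 2 and 5, inhibitor 6
    y = nagti_outcome(T, x)
    y[12] ^= 1                               # one erroneous outcome (e = 1)
    print(candidate_defectives(y, n, e))     # {2, 5}
```

In this run, the flipped outcome hides item 2 in one of its three blocks, but the other two blocks still report it, which meets the e + 1 = 2 threshold; the inhibitor (item 6) makes all outcomes in its own blocks negative and is never reported.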
C. Correctness of decoding procedure
Since matrix M is a (d + h; z]-disjunct matrix, for any j ∈ D there are at least z rows i such that m_ij = 1 and m_ij' = 0 for all j' ∈ (D ∪ H) \ {j}; in the absence of errors, the corresponding vector y_i equals S_j. Since up to e = (z − 1)/2 errors may appear in the test outcome y, there are at least z − e = e + 1 vectors y_i such that the condition in Step 6 of Algorithm 1 holds and item j is returned. Consequently, each value j ∈ D appears at least e + 1 times in S*. Therefore, Steps 1 to 8 return a set S containing all defective items and possibly some false defectives.

Steps 9 to 17 remove the false defectives. For any index j ∉ D in S, since there are at most e = (z − 1)/2 erroneous outcomes, there is at least one row i with a correct outcome such that t_ij = 1 and t_ij' = 0 for all j' ∈ (D ∪ H) \ {j}. Because item j ∉ D, the outcome of that row (test) is negative (0). Therefore, Step 13 checks whether an item in S is non-defective. Finally, Step 18 returns the set of defective items.

D. Decoding complexity
The time to run Step 1 is O(t). Since |S*| ≤ m, it takes m time to run Steps 3 to 8. Because |S*| ≤ m, the cardinality of S is at most m. The loop at Step 9 runs at most m times. Steps 11 and 13 take time s × m^{1.5}/(d + h) and t, respectively. The total decoding time is

$$O(t) + m + m \times \left( s \times \frac{m^{1.5}}{d + h} + t \right) = O\left( \frac{s\, m^{2.5}}{d + h} \right) = O\left( \frac{\lambda^5 \log n}{d + h} \right) = O\left( \frac{\log n}{d + h} \left( \frac{(d + h) \ln n}{W((d + h) \ln n)} + z \right)^5 \right).$$

Algorithm 2 GetDefectivesWOInhibitors(y, n, e): detection of up to d defective items without identifying inhibitors.
Input: a function to generate the t × n measurement matrix T; outcome vector y; maximum number of errors e
Output: defective items
1: S* = GetDefectives*(y, n). ▷ Identify all potential defectives.
2: S = ∅. ▷ Defective set.
3: foreach x ∈ S* do
4:   if x appears in S* at least e + 1 times then
5:     S = S ∪ {x}.
6:     Remove all elements that equal x from S*.
7:   end if
8: end foreach
9: for all x ∈ S do ▷ Remove false defectives.
10:   ▷ Get the column corresponding to defective item x.
11:   Generate column T_x = M_x ⊛ S_x.
12:   ▷ Condition for a false defective.
13:   if ∃ i ∈ [t] : t_ix = 1 and y_i = 0 then
14:     S = S \ {x}. ▷ Remove false defectives.
15:     continue ▷ Move on to the next candidate.
16:   end if
17: end for
18: return S. ▷ Return the set of defective items.

V. IDENTIFICATION OF DEFECTIVES AND INHIBITORS
In this section, we answer Problem 2 by showing that there exists a v × n measurement matrix such that it can handle at most e errors in the test outcomes, each of its columns can be nonrandomly generated in time poly(v), and all defective items and inhibitor items can be identified in time poly(d, h, e, log n), where there are up to d defective items and up to h inhibitor items among n items.

Theorem 5.
Let 1 ≤ z, d + h ≤ n be integers, z be odd, and λ = (d + h) ln n / W((d + h) ln n) + z. A set of n items includes up to d defective items and up to h inhibitors. Then there exists a nonrandom v × n matrix such that up to d defective items and up to h inhibitor items can be identified in time O(dλ × max{λ (d + h), }) with up to e = (z − 1)/2 errors in the test outcomes, where v = Θ(λ log n). Moreover, each column of the matrix can be generated in time poly(v).

To detect up to h inhibitors and up to d defectives, we use two types of matrices: a (d + h; z]-disjunct matrix and a (d + h − , z]-disjunct matrix. The main idea is as follows. We first identify all defective items. Then all potential inhibitors are located by using the (d + h − , z]-disjunct matrix. The final procedure removes all false inhibitor items.

A. Identification of an inhibitor
Let ∨ denote the union of the column corresponding to a defective item and the column corresponding to an inhibitor item. Suppose that there is an outcome $\mathbf{o} := (o_1, \ldots, o_s)^T = \mathcal{S}_a \vee \mathcal{S}_b$, where a is the defective item, b is the inhibitor item, and $\mathcal{S}_a$ and $\mathcal{S}_b$ are two columns of the s × n matrix $\mathcal{S}$ in (6). Note that $o_i = 1$ iff $s_{ia} = 1$ and $s_{ib} = 0$, and $o_i = 0$ otherwise, for $i = 1, \ldots, s$. Assume that the defective item a is already known. The inhibitor item b is then identified as in Algorithm 3.

Algorithm 3 GetInhibitorFromADefective(o, S_a, n): identification of an inhibitor when a defective item and the union of the corresponding columns are known.
Input: outcome vector o := (o_1, ..., o_s) = S_a ∨ S_b; number of items n; vector S_a corresponding to defective item a
Output: inhibitor item b
1: s = 2 log n.
2: Set S_b = (s_{1b}, ..., s_{sb})^T = (−1, −1, ..., −1)^T.
3: for i = 1 to s do ▷ Obtain s/2 entries of S_b.
4:   if s_{ia} = 1 and o_i = 1 then s_{ib} = 0.
5:   end if
6:   if s_{ia} = 1 and o_i = 0 then s_{ib} = 1.
7:   end if
8: end for
9: for i = 1 to s/2 do ▷ Obtain the s/2 remaining entries of S_b.
10:   if s_{ib} = −1 then s_{ib} = 1 − s_{i+s/2,b}.
11:   end if
12:   if s_{ib} = 0 then s_{i+s/2,b} = 1.
13:   end if
14:   if s_{ib} = 1 then s_{i+s/2,b} = 0.
15:   end if
16: end for
17: Get index b by checking the first half of S_b.
18: return b. ▷ Return the inhibitor item.

The correctness of the algorithm is as follows. Step 2 initializes the column of inhibitor b in $\mathcal{S}$. Since column $\mathcal{S}_a$ has exactly s/2 ones, Steps 3 to 8 recover s/2 entries of $\mathcal{S}_b$. Since the first half of $\mathcal{S}_a$ is the complement of its second half, there do not exist two indices $i_1$ and $i_2$ with $|i_1 - i_2| = \log n$ such that $s_{i_1 a} = s_{i_2 a} = 1$. As a result, after Step 8 there do not exist two indices $i_1$ and $i_2$ with $|i_1 - i_2| = \log n$ such that $s_{i_1 b} = s_{i_2 b} = -1$. Moreover, the first half of $\mathcal{S}_b$ is the complement of its second half. Therefore, the remaining s/2 entries of $\mathcal{S}_b$ can be obtained by Steps 9 to 16.
The index of inhibitor b can be identified by checking the first half of $\mathcal{S}_b$, which is done in Step 17. Finally, Step 18 returns the index of the inhibitor. It is easy to verify that the decoding complexity of Algorithm 3 is O(s).
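A minimal Python sketch of Algorithm 3 follows. The matrix $\mathcal{S}$ of (6) is not reproduced in this section, so `s_column` assumes one layout that is consistent with the worked example in the text: the first log n entries of column j are the binary digits of j − 1 (most significant bit first) and the second half is their complement. The names `s_column`, `combine`, and `get_inhibitor_from_a_defective` are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def s_column(j: int, n: int) -> List[int]:
    """Column S_j of a hypothetical s x n matrix S with s = 2 log2(n).

    Assumption: first half = binary digits of j - 1 (MSB first),
    second half = their complement.
    """
    k = int(math.log2(n))
    if 2 ** k != n or not 1 <= j <= n:
        raise ValueError("n must be a power of two and 1 <= j <= n")
    bits = [((j - 1) >> (k - 1 - r)) & 1 for r in range(k)]
    return bits + [1 - bit for bit in bits]


def combine(s_a: Sequence[int], s_b: Sequence[int]) -> List[int]:
    """o = S_a v S_b: o_i = 1 iff s_ia = 1 (defective) and s_ib = 0 (inhibitor)."""
    return [1 if a == 1 and b == 0 else 0 for a, b in zip(s_a, s_b)]


def get_inhibitor_from_a_defective(o: Sequence[int], s_a: Sequence[int], n: int) -> int:
    s = 2 * int(math.log2(n))          # Step 1
    half = s // 2
    s_b = [-1] * s                     # Step 2: -1 marks an unknown entry
    for i in range(s):                 # Steps 3-8: rows where s_ia = 1 are observable
        if s_a[i] == 1:
            s_b[i] = 0 if o[i] == 1 else 1
    for i in range(half):              # Steps 9-16: fill the rest by complementarity
        if s_b[i] == -1:
            s_b[i] = 1 - s_b[i + half]
        s_b[i + half] = 1 - s_b[i]
    index = 0                          # Step 17: read the first half as a binary number
    for bit in s_b[:half]:
        index = 2 * index + bit
    return index + 1                   # Step 18: items are numbered from 1


# Worked example from the text: n = 8, known defective a = 3, unknown inhibitor b = 1.
o = combine(s_column(3, 8), s_column(1, 8))
print(o)                                                    # [0, 1, 0, 0, 0, 0]
print(get_inhibitor_from_a_defective(o, s_column(3, 8), 8))  # 1
```

Under this layout, each row pair (i, i + log n) of S_a contains exactly one 1, so every pair of entries of S_b is fixed by one observed outcome plus complementarity; this is why a single pass over the s rows suffices.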
For example, let $\mathcal{S}$ be the matrix in (7), i.e., n = 8 and s = 6. Suppose that item 1 is the unknown inhibitor, that item 3 is the known defective item, and that the observed vector is $\mathbf{o} = (0, 1, 0, 0, 0, 0)^T$. The column corresponding to the defective item is $\mathcal{S}_3$. We set $\mathcal{S}_b = (-1, -1, -1, -1, -1, -1)^T$. Steps 3 to 8 give $\mathcal{S}_b = (-1, 0, -1, 1, -1, 1)^T$, and Steps 9 to 16 give the complete column $\mathcal{S}_b = (0, 0, 0, 1, 1, 1)^T$. Because the first half of $\mathcal{S}_b$ is $(0, 0, 0)^T$, the index of the inhibitor is 1.

B. Encoding procedure
We set $e = (z-1)/2$ and $\lambda = \frac{(d+h)\ln n}{W((d+h)\ln n)} + z$. Let an m × n matrix M and a g × n matrix G be a (d + h; z]-disjunct matrix and a (d + h − 2, 2; z]-disjunct matrix in Corollary 1, respectively, where
m = Θ(((d + h) ln n / W((d + h) ln n) + z)) = Θ(λ),
g = Θ(((d + h) ln n / W((d + h) ln n) + z)) = Θ(λ).
Each column in M and G can be generated in time t_1 and t_2, respectively, where
t_1 = O(λ (d + h)), (14)
t_2 = O(λ (d + h)). (15)
The final v × n measurement matrix V is
$$\mathbf{V} = \begin{bmatrix} \mathbf{M} \circledcirc \mathcal{S} \\ \mathbf{G} \circledcirc \mathcal{S} \\ \mathbf{G} \end{bmatrix} = \begin{bmatrix} \mathbf{T} \\ \mathbf{H} \\ \mathbf{G} \end{bmatrix}, \tag{16}$$
where $\mathbf{T} = \mathbf{M} \circledcirc \mathcal{S}$ and $\mathbf{H} = \mathbf{G} \circledcirc \mathcal{S}$. The matrix T has t = ms = 2m log n rows and the matrix H has gs = 2g log n rows. Note that the matrix T is the same as the one in (11). The number of tests of the measurement matrix V is v = ms + gs + g = O((m + g)s) = Θ(λ log n). It is then easy to see that each column of V can be generated in time (t_1 + t_2) × s + t_2 = poly(v).

Any input vector $\mathbf{x} = (x_1, \ldots, x_n)^T \in \{0, 1, -\infty\}^n$ contains at most d 1's and at most h −∞'s, as described in Section I-B. The outcome vector using the measurement matrix T, i.e., $\mathbf{y} = \mathbf{T} \otimes \mathbf{x}$, is the same as the one in Section IV-A. The binary outcome vector using the measurement matrix H is
$$\mathbf{h} = \mathbf{H} \otimes \mathbf{x} = \begin{bmatrix} \mathbf{h}_1 \\ \vdots \\ \mathbf{h}_g \end{bmatrix} = \begin{bmatrix} h_1 \\ \vdots \\ h_s \\ \vdots \\ h_{(g-1)s+1} \\ \vdots \\ h_{gs} \end{bmatrix}, \tag{17}$$
where $\mathbf{h}_i = (\mathcal{S} \times \mathrm{diag}(\mathbf{G}_{i,*})) \otimes \mathbf{x}$, $h_{(i-1)s+l} = 1$ iff $\sum_{j=1}^{n} g_{ij} s_{lj} x_j \geq 1$, and $h_{(i-1)s+l} = 0$ otherwise, for $i = 1, \ldots, g$ and $l = 1, \ldots, s$. Therefore, the outcome vector using the measurement matrix V in (16) is
$$\mathbf{v} = \mathbf{V} \otimes \mathbf{x} = \begin{bmatrix} \mathbf{T} \\ \mathbf{H} \\ \mathbf{G} \end{bmatrix} \otimes \mathbf{x} = \begin{bmatrix} \mathbf{T} \otimes \mathbf{x} \\ \mathbf{H} \otimes \mathbf{x} \\ \mathbf{G} \otimes \mathbf{x} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{h} \\ \mathbf{g} \end{bmatrix}, \tag{18}$$
where y is the same as in Section IV-A, h is defined in (17), and $\mathbf{g} = \mathbf{G} \otimes \mathbf{x} = (r_1, \ldots, r_g)^T$. We assume that $0 \times (-\infty) = 0$ and that there are at most $e = (z-1)/2$ incorrect outcomes in the outcome vector v.

C. Decoding procedure
Given the outcome vector v, the number of items n, the number of tests in matrix M, the number of tests in matrix G, the maximum number of errors e, and functions to generate the matrices V, G, M, and S, the proposed scheme is described in Algorithm 4. Steps 1 and 2 divide the outcome vector v into three smaller vectors y, h, and g as in (18). Step 3 obtains the defective set. All potential inhibitors are identified in Steps 5 to 12. Steps 14 to 23 then remove most of the false inhibitors. Since there may be some duplicate inhibitors and some remaining false inhibitors in the inhibitor set, Steps 25 to 31 remove the remaining false inhibitors and make each element of the inhibitor set unique. Finally, Step 32 returns the defective set and the inhibitor set.
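The encoding quantities and Steps 2 and 5 of Algorithm 4 can be sketched in a few Python functions. This is a sketch under stated assumptions, not the authors' implementation: λ is computed with a hand-rolled Newton iteration for the principal branch of the Lambert W function (the paper does not say how W is evaluated; `scipy.special.lambertw` would be an alternative); an item vector uses 1 for a defective, 0 for a negative item, and −∞ for an inhibitor; a test is positive iff the sum over its items is at least 1 with 0 × (−∞) = 0; and `tensor_column` builds column j of M ⊚ S with entry (i − 1)s + l equal to m_{ij}s_{lj}, matching (17). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")  # x_j = 1: defective, 0: negative item, -inf: inhibitor


def lambert_w(x: float, tol: float = 1e-12) -> float:
    """Principal branch W(x) for x >= 0, by Newton's method on w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only covers x >= 0")
    w = math.log1p(x)  # log(1 + x) >= W(x), so the iterates decrease towards W(x)
    for _ in range(200):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def lam(d: int, h: int, n: int, z: int) -> float:
    """lambda = (d + h) ln n / W((d + h) ln n) + z, for d + h >= 1 and n >= 2."""
    if d + h < 1 or n < 2:
        raise ValueError("need d + h >= 1 and n >= 2")
    a = (d + h) * math.log(n)
    return a / lambert_w(a) + z


def pool_outcome(row: Sequence[int], x: Sequence[float]) -> int:
    """1 iff sum_j row_j * x_j >= 1, with the convention 0 * (-inf) = 0."""
    total = sum(x_j for r_j, x_j in zip(row, x) if r_j == 1)  # zero entries skipped
    return 1 if total >= 1 else 0


def tensor_column(m_col: Sequence[int], s_col: Sequence[int]) -> List[int]:
    """Column j of M ⊚ S: entry (i - 1) * s + l equals m_ij * s_lj."""
    return [m_ij * s_lj for m_ij in m_col for s_lj in s_col]


def split_outcome(v: Sequence[int], m: int, g: int, s: int) -> Tuple[List[int], List[List[int]], List[int]]:
    """Steps 2 and 5: v = (y, h, g) with ms, gs, g entries; h is cut into g blocks of size s."""
    if len(v) != (m + g) * s + g:
        raise ValueError("length of v must be ms + gs + g")
    y = list(v[: m * s])
    h = list(v[m * s : (m + g) * s])
    r = list(v[(m + g) * s :])
    return y, [h[i * s : (i + 1) * s] for i in range(g)], r


items = [1, 0, NEG_INF]                # item 1 defective, item 2 negative, item 3 inhibitor
print(pool_outcome([1, 1, 0], items))  # 1: a defective and no inhibitor in the test
print(pool_outcome([1, 0, 1], items))  # 0: the inhibitor masks the defective
```

Skipping the zero entries of a row, instead of multiplying them by −∞, is what implements the convention 0 × (−∞) = 0 without producing a floating-point NaN.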
Because of the construction of V, the three vectors split from the outcome vector v in Step 2 are $\mathbf{y} = \mathbf{T} \otimes \mathbf{x}$, $\mathbf{h} = \mathbf{H} \otimes \mathbf{x}$, and $\mathbf{g} = \mathbf{G} \otimes \mathbf{x}$. Therefore, the set D obtained in Step 3 is the defective set, as analyzed in Section IV.

Let H be the true inhibitor set, which we want to identify. Since G is a (d + h − 2, 2; z]-disjunct matrix, for any $j_1 \in H$ (H is not yet known) and $j_2 \in D$, there exist at least z rows i such that $g_{ij_1} = g_{ij_2} = 1$ and $g_{ij'} = 0$ for all $j' \in (D \cup H) \setminus \{j_1, j_2\}$. Then, since there are at most $e = (z-1)/2$ errors in v, there exist at least $z - e = (z+1)/2 = e + 1$ indices i such that $\mathbf{h}_i = \mathcal{S}_{j_2} \vee \mathcal{S}_{j_1}$. As analyzed in Section V-A, for any vector that is the union of the column corresponding to a defective item and the column corresponding to an inhibitor item, the inhibitor item is always identified if the defective item is known. Therefore, the multiset H* obtained from Steps 7 to 12 contains every inhibitor at least e + 1 times and may contain some false inhibitors. Our next goal is to remove the false inhibitors.

To remove the false inhibitors, we first remove all defective items from the multiset H* in Step 16. Therefore, only inhibitors and negative items remain in H* after Step 16. We then exploit the property that an inhibitor makes a test outcome negative whenever the test contains at least one inhibitor and at least one defective. We pick an arbitrary defective item y ∈ D and generate its corresponding column $\mathbf{G}_y$ of the matrix G. Since G is a (d + h − 2, 2; z]-disjunct matrix and there are at most $e = (z-1)/2$ errors in v, for any $j_1 \in H$ and y ∈ D there exist at least $z - e = e + 1$ rows i such that $g_{ij_1} = g_{iy} = 1$ and $g_{ij'} = 0$ for all $j' \in (D \cup H) \setminus \{j_1, y\}$. The outcomes of these tests are negative.
Therefore, Steps 14 to 23 remove most of the false inhibitors. Note that since there are at most e errors, there are at most e false inhibitors, and each of them appears at most e times in the multiset H*. Steps 25 to 31 then completely remove the false inhibitors and make each element of the inhibitor set unique. Finally, Step 32 returns the sets of defective items and inhibitor items.
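Steps 14 to 31 of Algorithm 4 can be sketched in Python as below. The function name, the use of a `Counter` to represent the multiset H*, and the `column_of_g` callback (standing in for the nonrandom generator of the columns of G) are assumptions made for illustration. Following the pseudocode, a candidate is discarded as soon as it shares a test with the chosen defective y whose observed outcome is positive, and each survivor is kept only if it occurs at least e + 1 times.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Set


def remove_false_inhibitors(
    candidates: Iterable[int],                    # multiset H* from Steps 5-12
    defectives: Set[int],                         # D from Step 3
    r: Sequence[int],                             # outcomes g = (r_1, ..., r_g), Step 14
    column_of_g: Callable[[int], Sequence[int]],  # hypothetical generator of column G_x
    e: int,                                       # maximum number of erroneous outcomes
) -> Set[int]:
    if not defectives:
        raise ValueError("this sketch needs at least one known defective")
    y = min(defectives)                           # Step 15: any y in D will do
    g_y = column_of_g(y)
    counts = Counter(x for x in candidates if x not in defectives)  # Step 16
    for x in list(counts):                        # Steps 17-23
        g_x = column_of_g(x)
        # In an error-free test, a positive outcome that contains the defective y
        # means the test holds no inhibitor, so x is a false inhibitor.
        if any(a == 1 and b == 1 and r_i == 1 for a, b, r_i in zip(g_x, g_y, r)):
            del counts[x]
    # Steps 25-31: keep each remaining item once if it appears at least e + 1 times.
    return {x for x, c in counts.items() if c >= e + 1}


# Toy run: item 1 is defective, item 2 an inhibitor, items 3 and 4 are negative.
G_COLUMNS = {1: (1, 1, 0), 2: (1, 0, 1), 3: (0, 1, 1), 4: (0, 0, 1)}
print(remove_false_inhibitors([1, 2, 2, 3, 3, 4], {1}, (0, 1, 0), G_COLUMNS.__getitem__, e=1))  # {2}
```

In the toy run, item 3 is dropped because it shares the positive second test with the defective item 1, and item 4 survives the filter but is discarded by the e + 1 vote because it appears only once.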
E. Decoding complexity
First, we find all potential inhibitors. Step 2 takes time O(v). The time to get the defective set D is O(sm . (d + h)) = O(λ log n (d + h)), as analyzed in Theorem 4. Steps 7 and 8 have up to g and |D| ≤ d iterations, respectively. Since Step 9 takes time O(s), the running time of Steps 7 to 12 is O(gds), and the cardinality of H* is at most gd.

Second, we analyze the complexity of removing false inhibitors. Step 15 generates a column of G and thus takes time t_2 as in (15). Since |H*| ≤ gd, the loop at Step 17 runs at most gd times. Step 18 takes time t_2 as in (15), and Steps 19 to 22 take time O(g). As a result, Steps 14 to 23 take time O(t_2 + gd(t_2 + g)).

Finally, Steps 25 to 31 remove duplicate inhibitors and build the inhibitor set H. This takes time O(gd) because |H*| ≤ gd.

Algorithm 4 GetInhibitors(v, n, e, m, g): identification of up to d defectives and up to h inhibitors.
Input: outcome vector v; number of items n; number of tests in matrix M; number of tests in matrix G; maximum number of errors e; and functions to generate the matrices V, G, M, and S
Output: defective items and inhibitor items
1: s = 2 log n. ▷ Number of rows in the matrix S.
2: Divide vector v into three smaller vectors y, h, and g such that v = (y^T, h^T, g^T)^T and the numbers of entries in y, h, and g are ms, gs, and g, respectively.
3: D = GetDefectivesWOInhibitors(y, n, e). ▷ Defective set.
4: ▷ Find all potential inhibitors.
5: Divide vector h into g smaller vectors h_1, ..., h_g such that h = (h_1^T, ..., h_g^T)^T and each has size s.
6: H* = ∅. ▷ Initialize the inhibitor multiset.
7: for i = 1 to g do ▷ Scan all outcomes in h.
8:   foreach x ∈ D do
9:     b = GetInhibitorFromADefective(h_i, S_x, n).
10:    Add item b to multiset H*.
11:  end foreach
12: end for
13: ▷ Remove most of the false inhibitors.
14: Assign (r_1, ..., r_g)^T = g.
15: Generate a column G_y for any y ∈ D. ▷ Get the column of a defective.
16: H* = H* \ D.
17: foreach x ∈ H* do ▷ Scan all potential inhibitors.
18:   Generate column G_x.
19:   if ∃ i ∈ [g]: g_{ix} = g_{iy} = 1 and r_i = 1 then
20:     Remove all elements that equal x in H*. ▷ Remove the false inhibitor.
21:     break;
22:   end if
23: end foreach
24: ▷ Completely remove false inhibitors and duplicate inhibitors.
25: H = ∅.
26: foreach x ∈ H* do
27:   if x appears in H* at least e + 1 times then
28:     H = H ∪ {x}.
29:     Remove all elements that equal x in H*.
30:   end if
31: end foreach
32: return D and H. ▷ Return the sets of defective items and inhibitor items.

In summary, the decoding complexity is:
O(sm . (d + h)) + O(gds) + O(t_2 + gd × (t_2 + g)) + O(gd)
= O(sm . (d + h)) + O(gd(t_2 + g))
= O(λ log n (d + h)) + O(dλ × (λ (d + h) + λ))
= O(dλ × max{λ (d + h), 1}).
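Steps 5 to 12 of Algorithm 4, which account for the O(gds) term and the bound |H*| ≤ gd, can be sketched as follows. Algorithm 3 is passed in as the hypothetical callback `decode` (for instance, a decoder with n fixed), and `column_of_s` stands in for the generator of the columns of S; both names are assumptions. The usage line relies on a stub decoder that only illustrates the loop structure, not a real decoding.

```python
from collections import Counter
from typing import Callable, Sequence, Set


def collect_candidate_inhibitors(
    h_blocks: Sequence[Sequence[int]],                      # h_1, ..., h_g from Step 5
    defectives: Set[int],                                   # D from Step 3
    column_of_s: Callable[[int], Sequence[int]],            # hypothetical generator of S_x
    decode: Callable[[Sequence[int], Sequence[int]], int],  # hypothetical wrapper of Algorithm 3
) -> Counter:
    """Steps 6-12: every (block, defective) pair contributes one candidate to H*."""
    candidates: Counter = Counter()                         # Step 6: multiset H*
    for block in h_blocks:                                  # Step 7: g iterations
        for x in defectives:                                # Step 8: |D| <= d iterations
            b = decode(block, column_of_s(x))               # Step 9: O(s) per call
            candidates[b] += 1                              # Step 10
    return candidates                                       # total count = g * |D| <= g * d


# Stub decoder for illustration only: position of the first 1 plus the defective's label.
H_STAR = collect_candidate_inhibitors(
    [[1, 0], [0, 1], [1, 1]], {5, 6}, lambda x: [x], lambda blk, col: list(blk).index(1) + col[0]
)
print(sum(H_STAR.values()))  # 6 = g * |D|
```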
Fig. 1: Number of tests versus number of defectives and number of inhibitors for identifying only defective items when there is no error in test outcomes.
Fig. 2: Number of tests versus number of defectives and number of inhibitors for identifying only defective items in the presence of erroneous outcomes.

VI. SIMULATION
In this section, we visualize the number of tests and the decoding times in Table I. We evaluated variations of our proposed scheme by simulation, using values of d starting from d = 2, h set to a fixed fraction of d, and n set to a power of two, in Matlab R2015a on an HP Compaq Pro 8300SF desktop PC with a 3.4-GHz Intel Core i7-3770 processor and 16-GB memory. Two scenarios are considered: identification of defective items (corresponding to Section IV) and identification of defectives and inhibitors (corresponding to Section V). For each scenario, two noise models are considered for the test outcomes: a noiseless setting and a noisy setting. In the noisy setting, the number of errors is set to 100 times the sum of the number of defective items and the number of inhibitor items. In some special cases, the number of items and the number of errors are changed.

All figures are plotted in three dimensions, in which the x-axis (on the right of the figures), the y-axis (in the middle of the figures), and the z-axis (the vertical axis) represent the number of defectives, the number of inhibitors, and the number of tests or the decoding time, respectively. The proposed scheme, Ganesan et al.'s scheme, and Chang et al.'s scheme are plotted in red with circle markers, green with pentagram markers, and blue with asterisk markers, respectively. In the noisy setting, Ganesan et al.'s scheme is not plotted because its authors did not consider noisy settings.

Since our proposed scheme is nonrandom, its number of tests is slightly larger than those of Ganesan et al. and Chang et al. However, owing to the nonrandom construction, there is no need to store huge measurement matrices (millions of GBs) as in the existing works. As for decoding time, when the number of items is sufficiently large, the decoding time of our proposed scheme is the smallest in comparison with those of Chang et al.'s scheme and Ganesan et al.'s scheme.

A. Identification of defective items
We illustrate the number of tests and the decoding time when defective items are the only items that we want to recover.
1) Number of tests:
When there is no error in test outcomes, i.e., in the noiseless setting, the number of tests of Ganesan et al.'s scheme is the lowest. The number of tests in our proposed scheme is larger than those of Ganesan et al. and Chang et al., as illustrated in Fig. 1. However, when there are some erroneous outcomes, i.e., in the noisy setting, the number of tests in our proposed scheme is the lowest, as illustrated in Fig. 2.
Fig. 3: Decoding time versus number of defectives and number of inhibitors for identifying only defective items when there is no error in test outcomes.
Fig. 4: Decoding time versus number of defectives and number of inhibitors for identifying only defective items in the presence of erroneous outcomes.
2) Decoding time:
When there is no error in test outcomes, as shown in Fig. 3, the decoding time of our proposed scheme is the lowest. Since the decoding times of our proposed scheme and Ganesan et al.'s scheme are nearly equal, only one line is visible in the left subfigure of Fig. 3. Therefore, we zoomed in on that line to see how close these two decoding times are. As plotted in the right subfigure of Fig. 3, when the numbers of defective items and inhibitor items are not too large, the decoding time of our proposed scheme is always smaller than that of Ganesan et al.'s scheme. As the numbers of defective items and inhibitor items increase, the decoding time of our proposed scheme first becomes larger than that of Ganesan et al.'s scheme, though it becomes smaller in the end. We note that if the numbers of defective items and inhibitor items are fixed while the number of items is sufficiently large, the decoding time of our proposed scheme is always smaller than those of Chang et al.'s scheme and Ganesan et al.'s scheme.

When some erroneous outcomes are allowed, the decoding time of our proposed scheme is always smaller than that of Chang et al.'s scheme, as shown in Fig. 4.
B. Identification of defectives and inhibitors
We illustrate the number of tests and the decoding time for classifying all items. Because of the presence of inhibitor items and the requirement of exact classification, the number of tests is larger than the number of items in both Chang et al.'s scheme and the proposed scheme. The only exception is Ganesan et al.'s scheme, whose number of tests is smaller than the number of items.

(a) Normal scale. (b) Magnified scale.
Fig. 5: Number of tests versus number of defectives and number of inhibitors for classifying items when there is no error in test outcomes.
Fig. 6: Number of tests versus number of defectives and number of inhibitors for classifying items when the number of erroneous outcomes is 10 times the total number of defective items and inhibitor items.
Fig. 7: Number of tests versus number of defectives and number of inhibitors for classifying items when the number of erroneous outcomes is 100 times the total number of defective items and inhibitor items.
1) Number of tests:
When there is no error in test outcomes, i.e., in the noiseless setting, the number of tests of Ganesan et al.'s scheme is the lowest and that of our proposed scheme is the largest, as illustrated in Fig. 5. When there are some erroneous outcomes, i.e., in the noisy setting, the number of tests in our proposed scheme is smaller or larger than that of Chang et al.'s scheme, depending on the number of erroneous outcomes. For example, if the number of erroneous outcomes is 10 times the total number of defective items and inhibitor items, the number of tests in our proposed scheme is smaller than that of Chang et al.'s scheme, as illustrated in Fig. 6. However, when the number of erroneous outcomes is 100 times the total number of defective items and inhibitor items, the number of tests in our proposed scheme is larger than that of Chang et al.'s scheme, as shown in Fig. 7.
2) Decoding time:
In principle, the decoding complexity of our proposed scheme is the smallest in comparison with those of Chang et al.'s scheme and Ganesan et al.'s scheme when the number of items is sufficiently large. When there are no errors in test outcomes, the decoding time of the proposed scheme is the smallest once the number of items is large enough, as shown in subfigure (b) of Fig. 8. When some erroneous outcomes are allowed, the decoding time of our proposed scheme is always smaller than that of Chang et al.'s scheme once the number of items is large enough, as shown in subfigure (b) of Fig. 9.

Fig. 8: Decoding time versus number of defectives and number of inhibitors for classifying items when there is no error in test outcomes. Subfigures (a) and (b) correspond to two different numbers of items n.

Fig. 9: Decoding time versus number of defectives and number of inhibitors for classifying items when there are some erroneous outcomes. Subfigures (a) and (b) correspond to two different numbers of items n.

VII. CONCLUSION
We have presented two schemes for efficiently identifying up to d defective items and up to h inhibitors in the presence of e erroneous outcomes in time poly(d, h, e, log n). This decoding complexity is substantially lower than that of state-of-the-art schemes, in which the decoding complexity is linear in the number of items n, i.e., poly(d, h, e, n). However, the number of tests in our proposed schemes is slightly higher. Moreover, we have not considered the inhibitor complex model [15], in which each inhibitor in this work would be replaced by a bundle of inhibitors. Such a model would be much more complicated and is left for future work.

REFERENCES
[1] R. Dorfman, "The detection of defective members of large populations," The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 436–440, 1943.
[2] D. Du, F. K. Hwang, and F. Hwang, Combinatorial group testing and its applications, vol. 12. World Scientific, 2000.
[3] A. G. D'yachkov and V. V. Rykov, "Bounds on the length of disjunctive codes," Problemy Peredachi Informatsii, vol. 18, no. 3, pp. 7–13, 1982.
[4] H. Q. Ngo and D.-Z. Du, "A survey on combinatorial group testing algorithms with applications to DNA library screening," Discrete mathematical problems with medical applications, vol. 55, pp. 171–182, 2000.
[5] F. Y. Chin, H. C. Leung, and S.-M. Yiu, "Non-adaptive complex group testing with multiple positive sets," TCS, vol. 505, pp. 11–18, 2013.
[6] A. D'yachkov, N. Polyanskii, V. Shchukin, and I. Vorobyev, "Separable codes for the symmetric multiple-access channel," in , pp. 291–295, IEEE, 2018.
[7] G. Cormode and S. Muthukrishnan, "What's hot and what's not: tracking most frequent items dynamically," ACM TODS, vol. 30, no. 1, pp. 249–278, 2005.
[8] G. K. Atia and V. Saligrama, "Boolean compressed sensing and noisy group testing," IEEE Trans. on Information Theory, vol. 58, no. 3, pp. 1880–1901, 2012.
[9] A. Iscen, M. Rabbat, and T. Furon, "Efficient large-scale similarity search using matrix factorization," in Proceedings of the IEEE CVPR, pp. 2073–2081, 2016.
[10] T. V. Bui, M. Kuribayashi, M. Cheraghchi, and I. Echizen, "A framework for generalized group testing with inhibitors and its potential application in neuroscience," arXiv preprint arXiv:1810.01086, 2018.
[11] M. Farach, S. Kannan, E. Knill, and S. Muthukrishnan, "Group testing problems with sequences in experimental molecular biology," in Compression and Complexity of Sequences 1997. Proceedings, pp. 357–367, IEEE, 1997.
[12] A. De Bonis and U. Vaccaro, "Improved algorithms for group testing with inhibitors," Information Processing Letters, vol. 67, no. 2, pp. 57–64, 1998.
[13] A. De Bonis, L. Gasieniec, and U. Vaccaro, "Optimal two-stage algorithms for group testing problems," SIAM J. on Comp., vol. 34, no. 5, pp. 1253–1270, 2005.
[14] F. K. Hwang and Y. Liu, "Error-tolerant pooling designs with inhibitors," Journal of Computational Biology, vol. 10, no. 2, pp. 231–236, 2003.
[15] H. Chang, H.-B. Chen, and H.-L. Fu, "Identification and classification problems on pooling designs for inhibitor models," Journal of Computational Biology, vol. 17, no. 7, pp. 927–941, 2010.
[16] A. Ganesan, S. Jaggi, and V. Saligrama, "Non-adaptive group testing with inhibitors," in ITW, pp. 1–5, IEEE, 2015.
[17] T. V. Bui, T. Kojima, M. Kuribayashi, R. Haghvirdinezhad, and I. Echizen, "Efficient (nonrandom) construction and decoding for non-adaptive group testing," arXiv preprint arXiv:1804.03819, 2018.
[18] T. V. Bui, M. Kuribayashi, M. Cheraghchi, and I. Echizen, "Efficiently decodable non-adaptive threshold group testing," in ISIT, pp. 2584–2588, IEEE, 2018.
[19] I. S. Reed and G. Solomon, "Polynomial codes over certain finite fields," JSIAM, vol. 8, no. 2, pp. 300–304, 1960.
[20] W. Kautz and R. Singleton, "Nonrandom binary superimposed codes," IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 363–377, 1964.
[21] A. D'yachkov, P. Vilenkin, D. Torney, and A. Macula, "Families of finite sets in which no intersection of ℓ sets is covered by the union of s others," Journal of Combinatorial Theory, Series A, vol. 99, no. 2, pp. 195–218, 2002.
[22] D. R. Stinson and R. Wei, "Generalized cover-free families," Discrete Mathematics, vol. 279, no. 1-3, pp. 463–477, 2004.
[23] H.-B. Chen, H.-L. Fu, and F. K. Hwang, "An upper bound of the number of tests in pooling designs for the error-tolerant complex model," Optimization Letters, vol. 2, no. 3, pp. 425–431, 2008.
[24] M. Cheraghchi, "Improved constructions for non-adaptive threshold group testing," Algorithmica, vol. 67, no. 3, pp. 384–417, 2013.
[25] M. Cheraghchi, "Noise-resilient group testing: Limitations and constructions," Discrete Applied Mathematics, vol. 161, no. 1-2, pp. 81–95, 2013.
[26] H. Q. Ngo, E. Porat, and A. Rudra, "Efficiently decodable error-correcting list disjunct matrices and applications," in International Colloquium on Automata, Languages, and Programming, pp. 557–568, Springer, 2011.
[27] A. Hoorfar and M. Hassani, "Inequalities on the Lambert W function and hyperpower function," J. Inequal. Pure and Appl. Math, vol. 9, no. 2, pp. 5–9, 2008.