A Compressed Sensing Approach to Pooled RT-PCR Testing for COVID-19 Detection
Sabyasachi Ghosh, Rishi Agarwal, Mohammad Ali Rehan, Shreya Pathak, Pratyush Agrawal, Yash Gupta, Sarthak Consul, Nimay Gupta, Ritika, Ritesh Goenka, Ajit Rajwade, Manoj Gopalkrishnan
May 19, 2020
Abstract
We propose Tapestry, a novel approach to pooled testing with application to COVID-19 testing with quantitative Polymerase Chain Reaction (PCR) that can result in shorter testing time and conservation of reagents and testing kits. Tapestry combines ideas from compressed sensing and combinatorial group testing with a novel noise model for PCR. Unlike Boolean group testing algorithms, the input is a quantitative readout from each test, and the output is a list of viral loads for each sample. While other pooling techniques require a second confirmatory assay, Tapestry obtains individual sample-level results in a single round of testing. When testing n samples with t tests, as many as k = O(t / log n) infected samples can be identified at clinically acceptable false positive and false negative rates. This makes Tapestry viable even at prevalence rates as high as 10%. Tapestry has been validated in simulations as well as in wet lab experiments with oligomers. Clinical trials with COVID-19 samples are underway. An accompanying Android application, Byom Smart Testing, which makes the Tapestry protocol straightforward to implement in testing centres, is available for free download.

Index terms—
Compressed sensing, group testing, pooled testing, sensing matrix design, expander matrices,Steiner triples, mutual coherence, restricted isometry constant, LASSO, OMP, sparse Bayesian learning, COMP,Covid-19, coronavirus
1 Introduction

The coronavirus disease of 2019 (COVID-19) crisis has led to widespread lockdowns in several countries, and has had a major negative impact on the economy. Early identification of infected individuals can enable quarantining of the individuals and thus control the spread of the disease. Infected individuals are often asymptomatic for many days. Widespread testing with the RT-PCR (reverse transcription polymerase chain reaction) method can help identify the infected individuals. However, widespread testing is not an available option in many countries due to constraints on resources such as time, basic equipment, skilled manpower and reagents.

The current low rate of COVID-19 infection in the world population means that most samples tested are not infected, so that most tests are wasted on uninfected samples. Group testing is a process of pooling together samples of n different people into multiple pools, and testing the pools instead of each individual sample. A negative result on a pool implies that all samples contributing to it are negative. This saves a huge amount of testing resources, especially at low infection rates. Group testing for medical applications has a long history dating back to the 1940s, when it was proposed for testing of blood samples for syphilis [1]. Simple group testing schemes have already been applied in the field by several research labs [2, 3, 4] for COVID-19 testing.

Simple group testing schemes require pooling of samples and a second round of RNA extraction for all samples in positive pools. This second round of RNA extraction can increase the time to result and be laborious to perform in laboratories where the RNA extraction process is done manually. In situations where the result needs to be delivered fast or a second round of RNA extraction must be avoided, these schemes are less attractive.

Tapestry has a number of salient features which we enumerate below.

1. 
Tapestry delivers results in a single round of testing, without the need for a second confirmatory round.

∗ AR acknowledges support from SERB Matrics grant MTR/2019/000691.
† Dept. of Computer Science and Engg., IIT Bombay, India; {sghosh, ajitvr}@cse.iitb.ac.in.
‡ Dept. of Electrical Engineering, IIT Bombay; [email protected].
2. The number m of required tests is O(k log n) as per compressed sensing theory [5, 6]. In the targeted use cases where the number of infected samples k ≪ n, we see that m ≪ n. Consequently, we obtain significant savings in testing time as well as testing resources such as number of tests, quantity of reagents, and manpower.

3. Tapestry reconstructs viral loads, i.e., the number of copies of the virus per sample. This additional information may be clinically relevant. For example, some recent research has shown a positive correlation between severity of COVID-19 infection and viral load, which also affects the period of viral shedding [7].

4. Tapestry can estimate the number of infected samples k. This gives Tapestry a graceful failure mode. When it encounters a batch where the number of infected samples turns out to be much larger than the anticipated number, Tapestry first recognizes that this is the case by estimating the number of infected samples. It then returns a list of samples that are suspected to be positive. This list comes with the promise that with high probability it includes all the true positives. At the same time, the algorithm tries to keep the size of this list as small as possible. In such cases (which should be infrequent in the field given reasonable estimates of prevalence rate), a second round of testing is required on the suspected positive samples to get confirmed decisions. In these rare cases, Tapestry turns into an adaptive test. Even in this regime, Tapestry saves on the total number of tests, and still compares favorably with simple two-round pooling tests.

5. When the number of positives is higher than Tapestry is able to decode, Tapestry returns far fewer false positives than traditional group testing algorithms such as COMP [8], while maintaining clinically acceptable false negative rates.

6. 
Because each sample is tested in three pools, Tapestry can detect and correct for some degree of noise in terms of cross-contamination of samples and pipetting errors.

7. Tapestry allows PCR test measurements to be noisy. We develop a novel noise model to describe noise in PCR experiments. Our algorithms are tested on this noise model in simulation.

8. All tuning parameters for execution of the algorithms are inferred on the fly in a data-driven fashion.

9. Each sample goes to exactly three pools, and each pool has the same number of samples. This simplifies the experimental design, conserves sample, keeps pipetting overhead to a minimum, and makes sure that dilution due to pool size is in a manageable regime.

The organization of the paper is as follows. We first present a brief overview of the RT-PCR method in Sec. 2. The precise mathematical definition of the computational problem being solved in this paper is then put forth in Sec. 3.1. We describe traditional and CS-based group-testing algorithms for this problem in Sec. 3.2, 3.3 and 3.4. The sensing matrix design problem is described in Sec. 3.5. Results on synthetic data are presented in Sec. 4. This is followed by results on a limited amount of data from lab experiments performed with oligomers to mock the clinical situation as closely as possible. In Sec. 5, we compare our work to two recent related approaches. We conclude in Sec. 6 with a glance through different scenarios where our work could be deployed.

2 The RT-PCR Method

In the RT-PCR method [9], a sample is collected from a patient from a biological matrix that would with high likelihood contain copies of the virus should the patient be infected. For COVID-19, nasal swabs are most common, but other biological matrices like saliva, oropharyngeal swabs, bronchial lavage, stool, blood samples, etc. may also be collected in certain cases. The sample is then dispersed into a liquid medium.
The RNA molecules of the virus present in this liquid medium are converted into complementary DNA (cDNA) via a process called reverse transcription. DNA fragments called primers, complementary to cDNA from the viral genome, are then added. They attach themselves to specific sections of the cDNA from the viral genome if the virus is present in the sample. The cDNA of these specific viral genes then undergoes a process of exponential amplification in an RT-PCR machine. Here, it is put through several cycles of alternate heating and cooling in the presence of Taq polymerase and appropriate reagents. This triggers the creation of many new identical copies of specific portions of the target DNA, roughly doubling in number with every cycle of heating and cooling. The reaction volume contains sequence-specific fluorescent markers which report on the total amount of amplified DNA of the appropriate sequence. The resulting fluorescence is measured, and the increase can be observed on a computer screen in real time. The cycle at which the amount of fluorescence exceeds the threshold level is known as the threshold cycle C_t, and is a quantitative readout from the experiment. A smaller C_t indicates a greater number of copies of the virus. Usually C_t takes values anywhere between 16 and 32 cycles in real experiments. PCR can detect even single molecules. A single molecule typically has a C_t value of around 37 cycles, but there can be wide variation. The test takes about 3-4 hours to run.
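Since amplification roughly doubles the amplicon count each cycle, the C_t readout can be mapped to a relative copy number. A minimal sketch (our own illustrative code, assuming idealized perfect doubling and using the single-molecule reference C_t of about 37 cycles mentioned above):

```python
def relative_load(ct, ct_single=37.0):
    """Idealized viral load relative to a single starting molecule.

    Assumes the amplicon count exactly doubles every cycle, so a sample
    that crosses the fluorescence threshold d cycles earlier than the
    single-molecule reference started with about 2**d copies.
    """
    return 2.0 ** (ct_single - ct)

# A sample with Ct = 27 crossed the threshold 10 cycles before the
# single-molecule reference, i.e. roughly 1024 starting copies.
print(relative_load(27.0))   # -> 1024.0
```

In practice the doubling efficiency is below 2 and varies between runs, which is exactly the stochasticity the noise model of the next section captures.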
3 Testing Methods

3.1 Problem Definition

Let x denote a vector of n elements where x_i is the viral load (or virus concentration) of the i-th person. Throughout this paper we assume that only one sample per person is extracted. Hence x contains the viral loads corresponding to n different people. Due to the low infection rate for COVID-19 as yet, x is considered to be a sparse vector with at most k ≪ n positive-valued elements. Note that x_i = 0 implies that the i-th person is not infected. In nonadaptive group testing, small and equal volumes of the samples of a subset of these n people are pooled together according to a sensing or pooling matrix A = (A_ji)_{m×n} whose entries are either 0 or 1. The viral loads of the pools are given by:

z_j = Σ_{i=1}^{n} A_ji x_i, 1 ≤ j ≤ m, 1 ≤ i ≤ n,   (1)

where A_ji = 1 if a portion of the sample of the i-th person is included in the j-th pool. In all, some m < n pools are created and individually tested using RT-PCR, so that 1 ≤ j ≤ m. We now have the following relationship:

z = Ax,   (2)

where z is the m-element vector of viral loads in the mixtures, and A denotes an m × n binary ‘pooling matrix’ (also referred to as a ‘sensing matrix’ in the compressed sensing literature). In particular, z_j = A_j x, where A_j is the j-th row of A and placing the two vectors adjacent to each other denotes matrix multiplication of the row vector with the column vector x. Note that each positive RT-PCR test yields a noisy version of z_j, which we refer to as y_j. The relation between the ‘clean’ and noisy versions is given by:

y_j = z_j (1 + q)^{e_j} = (1 + q)^{e_j} A_j x,   (3)

where e_j ∼ N(0, σ²) and q ∈ (0, 1) are system-specific constants. The factor (1 + q)^{e_j} reflects the stochasticity in the growth of the numbers of DNA molecules during PCR. Here σ is known and constant; we have found a number around 0.1 to be reasonable. For q, we find a number around 0.95 to be typical. Equivalently, for positive tests, we have:

log y_j = log(A_j x) + e_j log(1 + q).   (4)

In case of negative tests, y_j as well as z_j are 0-valued, and no logarithms need be computed. The core computational problem is to estimate x given y and A. It should be noted that though we have treated each element of x as a fixed quantity, it is in reality a random variable of the form x_i ∼ Poisson(λ_i) where λ_i ≥ 0. If matrix A contains only ones and zeros, this implies that z_j ∼ Poisson(A_j x), making use of the fact that the sum of independent Poisson random variables is also a Poisson random variable.

3.2 Combinatorial Orthogonal Matching Pursuit (COMP)

Combinatorial Orthogonal Matching Pursuit (COMP) is a Boolean nonadaptive group testing method. It uses the simple idea that if a mixture y_j tests negative, then any sample x_i for which A_ji = 1 must be negative. In other words, all samples contributing to a mixture that tests negative must be negative (non-infected). All other samples are considered to be positive. This algorithm guarantees that there are no ‘false negatives’. However, it can produce a very large number of ‘false positives’. For example, a sample x_k will be falsely reported as positive if every mixture y_j it is part of also contains at least one (other) genuinely positive sample. The COMP algorithm is largely insensitive to noise. Moreover, a small variant of it can also produce a list of ‘sure positives’, after identifying the sure negatives. This happens when a positive mixture y_j contains only one sample x_i, not counting the other samples which were declared sure negatives in the earlier step. (Such a step of identifying ‘sure positives’ is included in the so-called ‘Definite Defectives’ (DD) algorithm. However, DD labels all remaining items as negative, potentially leading to a large number of false negatives.) The performance guarantees for COMP have been analyzed in [8], which shows that COMP requires ek(1 + δ) log n tests for an error probability less than n^{−δ}. This analysis has been extended to include the case of noisy test results as well [8]. However, COMP can result in a large number of false positives, and it also does not predict viral loads.

3.3 Compressed Sensing

Group testing is intimately related to the field of compressed sensing (CS) [10], which has emerged as a significant sub-area of signal and image processing [5]. In compressed sensing, an image or a signal x with n elements is directly acquired in compressed form via m linear measurements of the form y = Ax + η. Here, the measurement vector y has m elements, A is a matrix of size m × n, and η is a vector of noise values. If x is a sparse vector with k ≪ n non-zero entries, and A obeys the so-called restricted isometry property (RIP), then exact recovery of x from y, A is possible [11] if η = 0. In the case of measurement noise, the recovery of x produces a solution that is provably close to the original x. A typical recovery problem P0 consists of optimizing the following cost function:

min ‖x‖₀ s.t. ‖y − Ax‖₂ ≤ ε,   (5)

where ε is an upper bound (possibly a high-probability upper bound) on ‖η‖₂, and ‖x‖₀ is the number of non-zero elements in x. In the absence of noise, a unique and exact solution to this problem is possible with as few as 2k measurements in y if x has k non-zero elements [11]. Unfortunately, this optimization problem P0 is NP-hard, and the algorithm requires brute-force subset enumeration. Instead, the following problem P1 (often termed ‘Basis Pursuit Denoising’ or BPDN) is solved in practice:

min ‖x‖₁ s.t. ‖y − Ax‖₂ ≤ ε.   (6)

P1 is a convex optimization problem which yields the same solution as the earlier problem (under similar conditions on x, A) at significantly lower computational cost, albeit with O(k log n) measurements (i.e., typically greater than 2k) [5, 11]. The order-k restricted isometry constant (RIC) of a matrix A is defined as the smallest constant δ_k for which the following relationship holds for all k-sparse vectors x (i.e., all vectors with at most k non-zero entries):

(1 − δ_k) ‖x‖₂² ≤ ‖Ax‖₂² ≤ (1 + δ_k) ‖x‖₂².   (7)

The matrix A is said to obey the order-k restricted isometry property if δ_k is close to 0. This property essentially implies that no k-sparse vector (other than the zero vector) can lie in the null-space of A. Unique recovery of k-sparse signals requires that no 2k-sparse vector lies in the nullspace of A [11].
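The pooling model of Eqns. 1-3 and the COMP decoding rule can be sketched together in a few lines (our own illustrative code; the matrix, viral loads, and noise draw below are made-up examples, with q = 0.95 and σ = 0.1 as suggested above):

```python
import random

def pool(A, x):
    """z_j = sum_i A[j][i] * x[i]  (Eqn. 1)."""
    return [sum(aji * xi for aji, xi in zip(row, x)) for row in A]

def pcr_noise(z, q=0.95, sigma=0.1, rng=random):
    """y_j = z_j * (1 + q)**e_j with e_j ~ N(0, sigma^2)  (Eqn. 3);
    negative (zero-valued) tests stay noiseless."""
    return [zj * (1.0 + q) ** rng.gauss(0.0, sigma) if zj > 0 else 0.0
            for zj in z]

def comp(A, y):
    """COMP: every sample in a negative pool is surely negative;
    all remaining samples are reported as (suspected) positives."""
    n = len(A[0])
    sure_neg = {i for row, yj in zip(A, y) if yj == 0
                for i in range(n) if row[i] == 1}
    return [i for i in range(n) if i not in sure_neg]

# Toy example: 3 pools, 4 samples, only sample 1 infected.
A = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
x = [0.0, 5.0, 0.0, 0.0]
y = pcr_noise(pool(A, x))
print(comp(A, y))   # -> [0, 1]  (no false negatives; sample 0 is a false positive)
```

Sample 0 is a COMP false positive precisely because both pools it belongs to also contain the genuinely positive sample 1, illustrating why a subsequent CS stage is useful.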
A matrix A which obeys RIP of order 2k satisfies this property. It has been proved that matrices with entries randomly and independently drawn from distributions such as Rademacher or Gaussian obey the RIP of order k with high probability [12], provided they have at least O(k log n) rows. The solutions to the optimization problems P0 and P1 in Eqns. 5 and 6 respectively are provably robust to noise [5], and the recovery error worsens with increase in noise magnitude. The error bounds for P0 in Eqn. 5 are of the form, for solution x̂ [13]:

ε / √(1 + δ_k) ≤ ‖x − x̂‖₂ ≤ ε / √(1 − δ_k),   (8)

whereas those for P1 in Eqn. 6 have the form [13]:

‖x − x̂‖₂ ≤ ε C(δ_k).   (9)

Here C(δ_k) is a monotonically increasing function of δ_k ∈ (0, 1) and has a small value in practice. Over the years, a variety of different techniques for compressive recovery have been proposed. We use some of these for our experiments in Sec. 3.4. These algorithms use different forms of sparsity and incorporate different types of constraints on the solution.
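For tiny instances, problem P0 can be solved exactly by the brute-force subset enumeration mentioned above (this is what is later called COMP-BF). A self-contained sketch (our own code, fitting each candidate support by least squares via the normal equations):

```python
import itertools, math

def _lstsq_on_support(A, y, S):
    """Least-squares fit of y on columns S of A via the normal
    equations G c = b, solved by Gauss-Jordan elimination.
    Returns None if the system is (near-)singular."""
    m, k = len(A), len(S)
    G = [[sum(A[r][S[i]] * A[r][S[j]] for r in range(m)) for j in range(k)]
         for i in range(k)]
    b = [sum(A[r][S[i]] * y[r] for r in range(m)) for i in range(k)]
    M = [G[i] + [b[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, k + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][k] / M[i][i] for i in range(k)]

def p0_brute_force(A, y, k_max, eps=1e-9):
    """min ||x||_0  s.t.  ||y - Ax||_2 <= eps  (Eqn. 5), by trying all
    supports of size 0, 1, ..., k_max and returning the first that fits."""
    m, n = len(A), len(A[0])
    def residual(x):
        return math.sqrt(sum((y[j] - sum(A[j][i] * x[i] for i in range(n))) ** 2
                             for j in range(m)))
    for k in range(k_max + 1):
        for S in itertools.combinations(range(n), k):
            c = [] if k == 0 else _lstsq_on_support(A, y, S)
            if c is None:
                continue
            x = [0.0] * n
            for idx, i in enumerate(S):
                x[i] = c[idx]
            if residual(x) <= eps:
                return x
    return None

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
y = [2.0, 2.0, 0.0]          # = A @ [0, 2, 0], noiseless
print(p0_brute_force(A, y, k_max=2))   # -> [0.0, 2.0, 0.0]
```

The exponential number of candidate supports is what makes this approach viable only for the small problem sizes mentioned later.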
3.4 Reconstruction Algorithms

Our approach toward group testing for COVID-19 involves a two-stage procedure; note that this two-stage procedure is purely algorithmic, and does not require two consecutive rounds of testing in a lab. In the first stage, we apply the COMP algorithm described in Sec. 3.2 to identify the sure negatives (if any) in x, forming a set X. Let Y be the set of zero-valued measurements in y (i.e., negative tests). Moreover, we define X̄, Ȳ as the complement sets of X, Y respectively. Also, let y_Ȳ be the vector of m − |Y| measurements which yielded a positive result. Let x_X̄ be the vector of n − |X| samples, which does not include the |X| surely negative samples. Let A_{X̄,Ȳ} be the submatrix of A, having size (m − |Y|) × (n − |X|), which excludes rows corresponding to zero-valued measurements in y and columns corresponding to negative elements in x. In the second stage, we apply a CS algorithm to recover x_X̄ from y_Ȳ, A_{X̄,Ȳ}. To avoid symbol clutter, we henceforth simply use the notation y, x, A, even though they respectively refer to y_Ȳ, x_X̄, A_{X̄,Ȳ}.

Note that the CS stage following COMP is very important for the following reasons:

1. COMP typically produces a large number of false positives. The CS algorithms help reduce the number of false positives, as we shall see in later sections.
2. COMP does not estimate viral loads, unlike CS algorithms.

3. In fact, unlike CS algorithms, COMP treats the measurements in y as also being binary, thus discarding a lot of useful information.

However, the COMP step prior to applying the CS algorithm is also very important, for the following reasons:

1. It identifies the sure negatives in x from the negative measurements in y. Therefore, it effectively reduces the size of the problem to be solved by the CS step from (m, n) to (m − |Y|, n − |X|).

2. It can be seen from Eqn. 4 that the multiplicative noise we have in y is essentially heteroscedastic, because the 0-valued measurements in y are noiseless while the others are noisy. In such a case, direct application of CS algorithms without the preceding COMP step would be fraught with challenges. It is instead easier to discard the obvious negatives before applying the CS step.

For CS recovery, we employ the following algorithms: the non-negative LASSO (NN-LASSO), orthogonal matching pursuit (OMP), and Sparse Bayesian Learning (SBL). For problems of small size, we also apply a brute-force search algorithm to solve problem P0 from Eqn. 5 combinatorially.

3.4.1 The Non-Negative LASSO (NN-LASSO)

The LASSO (least absolute shrinkage and selection operator) is a penalized version of the constrained problem P1 in Eqn. 6. It is widely used in statistics and signal processing, and seeks to minimize the following cost function:

J_lasso(x; y, A) := ‖y − Ax‖₂² + λ‖x‖₁.   (10)

Here λ is a regularization parameter which imposes sparsity in x. The LASSO has rigorous theoretical guarantees for recovery of x as well as recovery of the support of x (i.e., recovery of the set of non-zero indices of x). We refer the reader to chapter 11 of [14] for the relevant theorems. Given the non-negative nature of x, we implement a variant of the LASSO with a non-negativity constraint, leading to the following optimization problem:

J_nnlasso(x; y, A) := ‖y − Ax‖₂² + λ‖x‖₁ s.t. x ≥ 0.   (11)

Selection of λ: There are criteria defined in [14] for selection of λ under i.i.d. Gaussian noise, so as to guarantee statistical consistency. However, in practice, cross-validation (CV) can be used for the choice of λ. For this, the measurements in y are divided into two randomly chosen disjoint sets: one for reconstruction (R) and the other for validation (V). The NN-LASSO is executed independently with multiple values of λ from a candidate set Λ. For each λ value, an estimate x̂_λ is produced using measurements only from R, and the CV error v_e(λ) := Σ_{i∈V} (y_i − A_i x̂_λ)² is computed. The value of λ which yields the least value of v_e(λ) is chosen, and an estimate of x is obtained by executing the LASSO on all measurements from R ∪ V. If V is large enough, then v_e(λ) is a good estimate of the actual error ‖x − x̂_λ‖₂², as has been shown for Gaussian noise [15]. It should be noted that CV is a method of choice for parameter selection in CS even under a variety of other noise models such as Poisson [16], and we have experimentally observed that it works well even in the case of our noise model in Eqn. 4. However, in this particular application, we may potentially deal with situations where m and n are small, for example m = 16, n = 40. In such a scenario, CV will yield unstable results since |V| will be very small. In such a case, one resorts to the so-called ‘discrepancy principle’ (DP) [17]. As per our noise model in Eqn. 4, the expected value of R(y, Ax̂) := ‖log y − log Ax̂‖₂ should be close to σ log(1 + q) √m. Moreover, its variance is quite small and independent of m. Hence, as per the discrepancy principle, we seek to find λ ∈ Λ such that |R(y, Ax̂_λ) − σ log(1 + q) √m| is minimized. Previous work in [18] has employed such a technique in the context of image deblurring under Poisson-Gaussian noise, with a square-root based data-fidelity term of the form ‖√(y + 3/8) − √(Ax̂_λ + 3/8)‖₂.
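Eqn. 11 can be minimized by cyclic coordinate descent with a non-negative soft-threshold update; a minimal sketch (our own illustrative implementation, not the solver used in the paper):

```python
def nn_lasso_cd(A, y, lam, n_iters=500):
    """Minimize ||y - Ax||_2^2 + lam * ||x||_1 subject to x >= 0
    by cyclic coordinate descent: each coordinate update is the
    exact one-dimensional minimizer, clipped at zero."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    col_sq = [sum(A[j][i] ** 2 for j in range(m)) for i in range(n)]
    for _ in range(n_iters):
        for i in range(n):
            # residual with column i's current contribution removed
            r = [y[j] - sum(A[j][k] * x[k] for k in range(n) if k != i)
                 for j in range(m)]
            rho = sum(A[j][i] * r[j] for j in range(m))
            x[i] = max(0.0, (2.0 * rho - lam) / (2.0 * col_sq[i]))
    return x

# Toy example: true signal [0, 3]. A small lam shrinks the estimate
# slightly and keeps the spurious first coordinate at exactly zero.
A = [[1, 0], [0, 1], [1, 1]]
y = [0.0, 3.0, 3.0]
x_hat = nn_lasso_cd(A, y, lam=0.1)
print(x_hat)   # approximately [0.0, 2.975]
```

In the CV procedure described above, one would run such a solver once per λ in the candidate set Λ on the reconstruction measurements R and score each estimate on the held-out validation measurements V.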
3.4.2 Orthogonal Matching Pursuit (OMP)

Orthogonal Matching Pursuit (OMP) [19] is a greedy approximation algorithm to solve the optimization problem in Eqn. 5. Rigorous theoretical guarantees for OMP have been established in [20]. OMP proceeds by maintaining a set H of ‘selected coefficients’ in x corresponding to columns of A. In each round, a column of A is picked greedily, based on the criterion of maximum absolute correlation with a residual vector r := y − Σ_{k∈H} A_k x̂_k. Each time a column is picked, all the coefficients selected so far (i.e., in set H) are updated. This is done by computing the orthogonal projection of y onto the subspace spanned by the columns in H. The OMP algorithm can be quite expensive computationally. Moreover, in order to maintain non-negativity of x, the orthogonal projection step would require the solution of a non-negative least squares problem, further adding to computational costs. However, a fast implementation of a non-negative version of OMP (NN-OMP) has been developed in [21], which is the implementation we adopt here. For the choice of ε in Eqn. 5, we can use CV or DP as described in Sec. 3.4.1.

3.4.3 Sparse Bayesian Learning (SBL)

Sparse Bayesian Learning (SBL) [22, 23] is a non-convex optimization algorithm based on Expectation-Maximization (EM) that has empirically shown superior reconstruction performance to most other CS algorithms with manageable computation cost [24]. In SBL, we consider the case of Gaussian noise in y and a Gaussian prior on the elements of x, leading to:

p(y|x) = exp(−‖y − Ax‖₂² / (2σ²)) / (2πσ²)^{m/2}   (12)

p(x_i; ϕ_i) = exp(−x_i² / (2ϕ_i)) / √(2πϕ_i); ϕ_i ≥ 0.   (13)

Since both x and ϕ (the vector of the {ϕ_i}_{i=1}^{n} values) are unknown, the optimization for these quantities can be performed using an EM algorithm. In the following, we denote Φ := diag(ϕ). Moreover, we use the notation Φ^(l) for the estimate of Φ in the l-th iteration.
The E-step of the EM algorithm here involves computing Q(Φ | Φ^(l)) := E_{x|y; Φ^(l)} log p(y, x; Φ). It is to be noted that the posterior distribution p(x | y; Φ^(l)) has the form N(µ, Σ), where µ = Σ A^T y / σ² and Σ = (A^T A / σ² + (Φ^(l))^{−1})^{−1}. The M-step involves maximization of Q(Φ | Φ^(l)), leading to the update Φ^(l+1) = diag(µ_i² + Σ_ii). The E-step and M-step are executed alternately until convergence. Convergence to a fixed point is guaranteed, though the fixed point may or may not be a local minimum. However, all local minima are guaranteed to produce sparse solutions for x (even in the presence of noise), because most of the ϕ_i values shrink towards 0. The SBL procedure can also be modified to dynamically update σ², the Gaussian noise variance, if it is unknown. All these results can be found in [23]. Unlike NN-LASSO or OMP, the SBL algorithm from [23] expressly requires Gaussian noise. However, we use it as is in this paper for the simplicity it affords, and choose the standard deviation of the noise (required for the SBL updates) via CV or DP as described in Sec. 3.4.1. Unlike NN-OMP or NN-LASSO, there is no explicit non-negativity constraint imposed in the basic SBL algorithm. In our implementation, non-negativity is simply imposed at the end of the optimization by setting to 0 any negative-valued elements in µ, though more principled, albeit more computationally heavy, approaches can be adopted [25].

3.5 Sensing Matrix Design

The sensing matrix A must obey some properties specific to this application, such as being non-negative. For ease of pipetting, it is desirable that the entries of A be binary, where A_ji = 0 indicates that sample i did not contribute to pool j, and A_ji = 1 indicates that a fixed unit volume of sample i was pipetted into pool j. Also, for ease of pipetting, including saving pipetting time, it is desirable that A be sparse.
This additionally ensures that not too many samples contribute to a pool, and that a single sample does not contribute to too many pools. The former is important because typically the volume of sample that is added in a PCR reaction is fixed: increasing pool size means each sample contributes a smaller fraction of that volume. This leads to dilution, which manifests as a shift of the C_t value towards larger numbers. If care is not taken in this regard, this can affect the power of PCR to discriminate between positive and negative samples. The latter is important because contribution of one sample to a large number of pools could lead to depletion of sample.

The Restricted Isometry Property (RIP) of sensing matrices is a sufficient condition for good CS recovery, as described in Sec. 3.3. However, matrices which obey the aforementioned physical constraints are not guaranteed to obey RIP. Instead, we consider sensing matrices which are adjacency matrices of expander graphs. A left-regular bipartite graph G((V_I, V_O), E ⊆ V_I × V_O), with the degree of each vertex in V_I being d, is said to be a (k, 1 − α) expander graph for some integer k > 0 and α ∈ (0, 1) if for every subset S ⊆ V_I with |S| ≤ k, we have |N(S)| ≥ (1 − α) d |S|. Here N(S) denotes the union set of neighbors of all nodes in S. Intuitively, a bipartite graph is an expander if every ‘not too large’ subset has a ‘large’ boundary. It can be proved that a randomly generated left-regular bipartite graph is an expander with high probability [26]. Moreover, it has been shown in [6] that the adjacency matrix A of a (k, 1 − α) expander graph obeys the RIP-1 property (a version of the RIP, but with the ℓ1 norm). That is, for any k-sparse vector x, the following relationship holds if A obeys the RIP-1 property of order k:

(1 − 2α) d ‖x‖₁ ≤ ‖Ax‖₁ ≤ d ‖x‖₁.   (14)

This property again implies that the null-space of A cannot contain vectors that are ‘too sparse’ (apart from the zero vector). This summarizes the motivation behind the use of expanders in compressive recovery of sparse vectors, and also in group testing.

Although randomly generated left-regular bipartite graphs are expanders with high probability, we would need to verify whether a particular such graph is a good expander, and prove theorems for it. In the application at hand, this can prove to be a critical limitation, since matrices of various sizes may have to be served, depending on the number of samples arriving in a batch at the testing centre and the number of tests available to be performed. Hence, we have chosen to employ deterministic procedures to design such matrices, based on objects from combinatorial design theory known as Kirkman triples [27]. Generalizations of these kinds of objects, under the name of the ‘social golfer problem’, are an active area of research.

We first recall Kirkman triple systems, which are Steiner triple systems with an extra property. A Steiner triple system can be represented as a binary matrix with m rows and n = m(m − 1)/6 columns, with entries either 0 or 1, such that each column has exactly three 1s, and no two columns have dot product more than 1 [28]. If, in addition, the columns can be ordered so that the sum of columns i to i + m/3 − 1 equals the all-ones vector in R^m for every i ≡ 1 (mod m/3), the system is a Kirkman triple system. Another relevant property of a sensing matrix is its mutual coherence, defined as:

µ(A) := max_{i ≠ j} |A_i^T A_j| / (‖A_i‖₂ ‖A_j‖₂),   (15)

where A_i refers to the i-th column of A. Matrices with lower µ(A) values have lower worst-case upper bounds on the reconstruction error [29]. These bounds are looser than those based on the RIC that we saw in previous sections. However, unlike the RIC, the mutual coherence is efficiently computable. Another benefit is that the bipartite graph of samples and tests described by a Kirkman matrix has high girth, i.e., its shortest cycle is long. Hence the graph is locally treelike, and has good expansion properties. As a result, the RIP-1 property holds and the matrix has good reconstruction properties under compressed sensing.

A practical benefit of Kirkman triples that is not shared by Steiner triples is that Kirkman triples can be served for numbers of samples far less than n = m(m − 1)/6: we can choose n to be any integer multiple of m/3, and ensure that every pool gets the same number of samples. This allows us to characterize the properties of the full Kirkman matrix, and use that analysis to predict how it will behave in the clinical situation, where the pooling matrix to be served may require very specific values of m, n depending on the prevalence rate.

As mentioned earlier, the mutual coherence is efficient to compute and optimize over. Hence, there is a large body of literature on designing CS matrices by minimizing µ(A) w.r.t. A, for example [30]. We followed such a procedure for designing sensing matrices for some of our experimental results in Sec. 4.2. For this, we use simulated annealing to update the entries of A, starting from an initial condition where A is a random binary matrix. For synthetic experiments, we compared such matrices with random matrices, biregular random sparse graphs, and Kirkman matrices. We found that matrices of Kirkman triples perform very well empirically in the regime of sizes we are interested in, and hence the results are reported using only Kirkman or Steiner triple matrices.

In the class of ‘adaptive group testing’ techniques, the n samples are distributed into two or more groups, each of smaller size, and the smaller groups are then individually tested. In one particular adaptive method called generalized binary splitting (GBS) [31], this procedure is repeated (in a binary search fashion) until a single infected sample is identified. This requires O(log n) sequential tests per identified positive, and a total of only O(k log n) tests, if k is the number of infected samples. However, such a multi-stage method is impractical to deploy due to its sequential nature, since each RT-PCR stage requires nearly 3-4 hours. Moreover, each mixture that is tested contains contributions from as many as O(n) samples, which can lead to significant dilution or may be difficult to implement in the lab.
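The binary-search step of GBS is easy to sketch: repeatedly test one half of the candidate set and keep the half that must contain a positive (our own illustrative code, assuming a noiseless pool test):

```python
def binary_split(items, pool_is_positive):
    """Find one infected sample among `items` (assumed to contain at
    least one) using about log2(n) sequential pooled tests.
    `pool_is_positive(subset)` models a noiseless RT-PCR test on a pool."""
    while len(items) > 1:
        left = items[:len(items) // 2]
        # If the left half pools positive, an infection is there;
        # otherwise there must be one in the right half.
        items = left if pool_is_positive(left) else items[len(items) // 2:]
    return items[0]

# 16 samples, sample 11 infected; count the sequential tests used.
infected = {11}
n_tests = 0
def pcr_pool_test(pool):
    global n_tests
    n_tests += 1
    return any(i in infected for i in pool)

found = binary_split(list(range(16)), pcr_pool_test)
print(found, n_tests)   # -> 11 4
```

Each of those four tests must finish before the next pool can even be formed, which is exactly the sequential bottleneck criticized in the text.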
Hence, in this work, we do not pursue this particular approach further. Such an approach may be very useful if each individual test had a quick turn-around time.

4 Results

In this section, we show a suite of experimental results on synthetic data as well as on real data.
Signal/Measurement Generation:
For the case of synthetic data, we generated signal vectors x of dimension n = 105, with varying levels of sparsity. In all cases, m = 45 noisy measurements in y were simulated following the noise model in Eqn. 4 with σ = 0.1. The Poisson nature of the elements of x in Eqn. 4 was ignored. This approximation was based on the principle that if X ∼ Poisson(λ), then Std. Dev.(X)/E(X) = √λ/λ = 1/√λ, which becomes smaller and smaller as λ increases. Kirkman triple sensing matrices were used in these experiments for generating the measurements. The recovery algorithms were tested on Q = 100 randomly generated signals. The different signals had different supports, and the magnitudes of the non-zero elements were uniformly randomly generated in the range [1, ·].

[Table 1: Results of COMP for 45 × 105 and 93 × 961 Kirkman triple matrices. For each criterion, mean and standard deviation values are reported, across 100 signals.]

Algorithms tested:
The following algorithms were compared:
1. COMP (see Table 1)
2. COMP followed by NN-LASSO (see Table 2)
3. COMP followed by SBL (see Table 3)
4. COMP followed by NN-OMP (see Table 4)
5. COMP-BF, i.e. COMP followed by brute-force search, only for small sample sizes (see Tables 5 and 7)
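The COMP step shared by all the algorithms above can be sketched as below. This is a minimal NumPy illustration; the threshold `tau` for declaring a pool negative is an assumption, set to 0 for noiseless readouts:

```python
import numpy as np

def comp_decode(A, y, tau=0.0):
    """COMP: any sample appearing in a pool with a (near-)zero readout is
    declared negative; every remaining sample stays a candidate positive.
    A: (m, n) binary pooling matrix; y: (m,) quantitative pool readouts."""
    negative_pools = y <= tau
    # A sample is ruled out if it participates in at least one negative pool.
    ruled_out = (A[negative_pools] > 0).any(axis=0)
    return ~ruled_out  # boolean mask over the n samples

# Tiny example: 3 pools, 4 samples, only sample 2 is infected.
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]], dtype=float)
x = np.array([0.0, 0.0, 5.0, 0.0])
y = A @ x  # noiseless pool readouts
# COMP never misses a true positive, but may retain extra candidates
# (here sample 3 survives because it only appears in positive pools);
# the follow-up CS algorithms then prune these and estimate viral loads.
print(comp_decode(A, y))  # [False False  True  True]
```

This is why COMP is used as a preprocessing step: it shrinks the effective n that the subsequent NN-LASSO/SBL/NN-OMP/brute-force stage must handle, at no risk of introducing false negatives under the noiseless-pool assumption.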
Comparison Criteria:
In the following, x̂ denotes the estimate of x. Most numerical algorithms do not produce vectors that are exactly sparse, due to issues such as the choice of convergence criterion. Instead, the recovered signals contain many entries which have a tiny magnitude. Since in this application support recovery is of paramount importance (to identify which samples in x were infected), we employ the following post-processing step: all entries in x̂ whose magnitude falls below a threshold τ, set to a fixed fraction of x_min, are set to zero, yielding a vector x̃. Here x_min refers to the least possible value of the viral load, and this can be obtained offline from practical experiments (even on single samples). In these synthetic experiments, we simply set x_min := 1. The various algorithms were compared with respect to the following criteria:
1. RMSE := ‖x − x̃‖₂ / ‖x‖₂
2. Number of false positives (fp) := |{i : x_i = 0, x̃_i > 0}|
3. Number of false negatives (fn) := |{i : x_i > 0, x̃_i = 0}|
4. Sensitivity (also called Recall or True Positive Rate) := ratio of the number of correctly detected positives to the number of actual positives
5. Specificity (also called True Negative Rate) := ratio of the number of correctly detected negatives to the number of actual negatives.

It should be noted that all algorithms were evaluated on 100 randomly generated sparse signals, given the same sensing matrix. The average value as well as the standard deviation of all quality measures (over the 100 signals) are reported in Tables 1, 2, 3, 4, 5 and 7. A comparison of Table 1 to Tables 2, 3, 4, 5 and 7 indicates that COMP followed by NN-LASSO/SBL/NN-OMP/BF significantly reduces the false positives at the cost of a rare false negative. The RMSE is also significantly improved, since COMP does not estimate viral loads. We note that the experimental results reported in these tables are quite encouraging, since these experiments are challenging due to the small m and fairly large k, n, albeit with testing on synthetic data.

[Tables 2, 3 and 4: Results of COMP followed by NN-LASSO, SBL and NN-OMP respectively, for 45 × 105 and 93 × 961 Kirkman triple matrices. For each criterion, mean and standard deviation values are reported, across 100 signals.]

Brute-force method: We refer to COMP-BF as a method where we apply COMP followed by a brute-force search to solve problem P0 in Eqn. 5. This is computationally feasible only if C(n, k) is 'reasonable' (note that the effective n is often reduced after application of COMP), and so we employ it only for small-sized matrices. The method essentially enumerates all possible supports of x which have size k. For each such candidate support set Z, the following cost function is minimized using the fmincon routine of MATLAB, which implements an interior-point optimizer:

J(x_Z) := ‖log y − log(A_Z x_Z)‖² such that x_Z ≥ 0.   (16)

Results with the COMP-BF method are shown in Table 5. The special advantage of the brute-force method is that it requires only m = 2k mixtures, which is often less than O(k log n). However, such a method requires prior knowledge of k, or an estimate thereof. We employ a method to estimate k directly from y and A; this is described in Sec. 4.3. The results in Table 5 assume that the exact k was known, or that the estimator predicted the exact k. However, we observed that the estimator from Sec. 4.3 can sometimes over-estimate k. Hence, we also present results with COMP-BF where the brute-force search assumed that the sparsity was (over-estimated to be) k + 1 instead of k. These are shown in Table 7. A comparison of Tables 5 and 7 shows that the RMSE deteriorates if k is incorrectly estimated. However, there is no adverse effect on the number of false negatives, and only a small adverse effect on the number of false positives.

[Table 5: Results of COMP-BF solving P0 (Eqn. 5) for 16 × 40 Steiner triple matrices, with different values of k, assuming that the true value of k was known. Results for both methods are reported on synthetic data.]

[Table 6: Number of tests required by two-stage Dorfman pooling for n = 105 and n = 961, for various k. Note that the compressed sensing methods we have adopted here typically require much fewer tests (45 and 93 respectively), and do not require two rounds of testing.]

Comparison with Dorfman Pooling:
We also performed a comparison of our algorithms with the popular two-stage Dorfman pooling method (an adaptive method), with regard to the number of tests required. In the first stage of the Dorfman pooling technique, the n samples are divided into √(nk) pools, each of size √(n/k). Each of these pools is tested, and a negative result leads to all members of that pool being considered negative (i.e. non-infected). However, the pools that test positive are passed on to a second stage, where all members of those pools are individually tested. The comparison w.r.t. the number of tests is shown in Table 6, assuming (a) that the number of infected samples k is known in advance, and (b) that the k infected samples are distributed across distinct pools (a worst-case situation). Comparisons of Tables 1, 2, 3 and 4 with the two-stage Dorfman pooling results in Table 6 show that our methods require much fewer tests, albeit with a slight increase in the number of false negatives. Moreover, all our methods are single-stage methods, unlike the Dorfman method which requires two stages of testing. If the value of k is unknown, which will usually be the case in the field, then the method from Sec. 4.3 can be used.

We acquired real data in the form of test results on pooled samples from two labs: one at the National Centre for Biological Sciences (NCBS) in India, and the other at the Wyss Institute at Harvard Medical School, USA. In both cases, viral RNA was artificially injected into k of the n samples, where k ≪ n. From these n samples, a total of m mixtures was created. For the datasets obtained from NCBS that we experimented with, we had m = 16, n = 40, k ∈ { , , , }. For the data from the Wyss Institute, we had m = 24, n = 60, k = 3. The results for all these datasets are presented in Table 8. The pooling matrices in both cases were obtained by performing a simulated annealing procedure to minimize the mutual coherence (see Sec. 3.5), starting with a random sparse binary matrix as the initial condition. We see that the CS algorithms reduce the false positives, albeit with the introduction of occasional false negatives for higher values of k. We also refer the reader to our work in [32] for a more in-depth description of results on real experimental data.

(fmincon algorithm choices: https://in.mathworks.com/help/optim/ug/choosing-the-algorithm.html)

[Table 7: Results of COMP-BF solving P0 (Eqn. 5) for 16 × 40 Steiner triple matrices, with different values of k, assuming that the sparsity value was estimated to be k + 1. Results for both methods are reported on synthetic data.]

CS algorithms require O(k log n) measurements for successful recovery, assuming RIP-1 or RIP-2 properties of the sensing matrix. However, in practice k is always unknown, which leads to the question of how many measurements are needed as a minimum for a particular problem instance. To address this, we adopt the technique from [34] to estimate k on the fly from the compressive measurements. This technique does not require signal recovery for estimating k. The relative error in the estimate of k is shown to be O(√(log m / m)) [35], which diminishes as m increases (irrespective of the true k). The advantage of this estimate of k is that it can drive the COMP-BF algorithm, as well as act as an indicator of whether there exist any false negatives.

There exists very little literature on applying CS to COVID-19 testing. While preparing this manuscript, we came across two references on arXiv: [36] and [37]. Both of these references adopt a non-adaptive CS-based approach.
However, compared to them, our work differs in the following ways (also see [32]):
1. Real/Synthetic data: The work in [36] is largely theoretical, whereas our work as well as that in [37] have tested results on real data.
2. Noise model: Our work uses the physically-derived noise model in Eqn. 4 (as opposed to only Gaussian noise). This noise model is not considered in [36, 37].
3. Algorithms: The work in [36] adopts the BPDN technique (i.e. P1 from Eqn. 6) as well as the brute-force search method for reconstruction. The work in [37] uses the LASSO, albeit without explicit imposition of a non-negativity constraint. On the other hand, we use the LASSO with a non-negativity constraint, the brute-force method, as well as other techniques such as SBL and NN-OMP, all in combination with COMP. The work in [36] assumes knowledge of the (Gaussian) noise variance for selection of ε in the estimator in Eqn. 6, whereas we use cross-validation for all our estimators. The technique in [37] uses a slightly different form of cross-validation for selection of the regularization parameter in LASSO.
4. Sensing matrix design: [36] uses randomly generated expander graphs, whereas we use Kirkman triple matrices. The work in [37] uses randomly generated sparse Bernoulli matrices or Reed-Solomon codes. Each sample in our matrix participates in 3 pools as opposed to the 6 pools used in [37], which is advantageous from the point of view of pipetting time. (For deterministic binary sensing matrices, the sample complexity is O(max(k², √n)) in the worst case, but in practice it is often seen that fewer samples are required. In fact, the phase transition of binary matrices is experimentally seen to be on par with that of random Gaussian matrices [33], which provably require O(k log n) samples for guaranteed reconstruction.)
[Table 8: Results of lab experiments with each algorithm. For each dataset, the columns list the number of correctly detected positives, false negatives, and false positives.]
Dataset, k = 2: COMP 2, 0, 1; COMP-SBL 2, 0, 1; COMP-NNOMP 2, 0, 0; COMP-NNLASSO 2, 0, 1
Dataset, k = 0: COMP 0, 0, 0; COMP-SBL 0, 0, 0; COMP-NNOMP 0, 0, 0; COMP-NNLASSO 0, 0, 0
Dataset, k = 1: COMP 1, 0, 0; COMP-SBL 1, 0, 0; COMP-NNOMP 1, 0, 0; COMP-NNLASSO 1, 0, 0
Dataset, k = 2: COMP 2, 0, 0; COMP-SBL 2, 0, 0; COMP-NNOMP 2, 0, 0; COMP-NNLASSO 2, 0, 0
Dataset, k = 3: COMP 3, 0, 1; COMP-SBL 2, 1, 1; COMP-NNOMP 2, 1, 0; COMP-NNLASSO 2, 1, 1; COMP-BF 2, 1, 1
Dataset, k = 4: COMP 4, 0, 3; COMP-SBL 3, 1, 2; COMP-NNOMP 2, 2, 2; COMP-NNLASSO 2, 2, 3; COMP-BF 2, 2, 2
5. Sparsity estimation:
Our work uses an explicit sparsity estimator and does not rely on any assumption regarding the prevalence rate.
6. Numerical comparisons:
The method in [37] can correctly identify up to 5/384 (1.3%) infected samples with 48 tests, with an average number of false positives of less than 2.75, and an average number of false negatives of less than 0.33. In synthetic simulations with their 48 × 384 Reed-Solomon code based matrix (released by the authors), on a total of 100 x vectors with ℓ0 norm equal to 5, using COMP-NNLASSO we obtained 1.51 false positives and 0.02 false negatives on average, with standard deviations of 1.439 and 0.14 respectively. Using COMP-SBL instead of COMP-NNLASSO, with all other settings remaining the same, we obtained 1.4 false positives and 0.0 false negatives on average, with standard deviations of 1.6 and 0.1 respectively. As such, a direct numerical comparison between our work and that in [37] is not possible due to the lack of available real data; however, these numbers yield some indication of performance.

We have presented a non-adaptive, single-round technique for prediction of infected samples as well as their viral loads, from an array of n samples, using a compressed sensing approach. We have empirically shown, on synthetic data as well as on some real lab acquisitions, that our technique can correctly predict the positive samples with a very small number of false positives and false negatives. Moreover, we have presented techniques for appropriate design of the mixing matrix. Our single-round testing technique can be deployed in many different scenarios, such as the following:
1. Testing of 105 symptomatic individuals in 45 tests.
2. Testing of 195 asymptomatic individuals in 45 tests, assuming a low rate of infection. A good use case for this is airport security personnel, delivery personnel, or hospital staff.
3. Testing of 399 individuals in 63 tests. This can be used to test students coming back to campuses, or police forces, or asymptomatic people in housing blocks and localities currently under quarantine.
4. Testing of 961 people in 93 tests, assuming a low infection rate. This might be suitable for airports and other places where samples can be collected and tested immediately, and where it might be possible to obtain liquid handling robots.

Outputs:
We have designed an Android app named Byom Smart Testing to make our Tapestry protocol easy to deploy in the future. The app can be accessed at the following link: https://drive.google.com/file/d/1tfrzKfbewVqwOOKhpCb1WplMfsUfAOe0/view?usp=sharing. We are also sharing our code and some amount of data at: https://github.com/atoms-to-intelligence/tapestry.

Future work:
Future work will involve extensive testing on real data, and implementation of a wider variety of algorithms for sensing matrix design as well as signal recovery, keeping in mind the accurate statistical noise model.
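The brute-force fit of Eqn. 16 described in Sec. 4 can also be prototyped outside MATLAB. The sketch below uses SciPy's bound-constrained L-BFGS-B solver in place of fmincon's interior-point optimizer; the `eps` regularizer inside the logarithms and the all-ones initial guess are our assumptions:

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def cost(xz, AZ, y, eps=1e-8):
    """Log-domain data-fidelity term of Eqn. 16: ||log y - log(A_Z x_Z)||^2."""
    r = np.log(y + eps) - np.log(AZ @ xz + eps)
    return float(r @ r)

def comp_bf(A, y, k, candidates):
    """Enumerate all size-k supports drawn from the candidate indices that
    survive COMP, fit each support under the constraint x_Z >= 0, and keep
    the support with the smallest objective value."""
    best = (np.inf, None, None)
    for Z in itertools.combinations(candidates, k):
        AZ = A[:, list(Z)]
        res = minimize(cost, x0=np.ones(k), args=(AZ, y),
                       method="L-BFGS-B", bounds=[(0.0, None)] * k)
        if res.fun < best[0]:
            best = (res.fun, Z, res.x)
    return best  # (objective value, support, estimated viral loads)

# Small instance: 4 pools, 4 samples, sample 2 infected with load 3.
A = np.array([[1., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 1.],
              [1., 0., 0., 1.]])
y = A @ np.array([0., 0., 3., 0.])
val, support, loads = comp_bf(A, y, k=1, candidates=range(4))
print(support, loads)  # support (2,) with fitted load close to 3
```

Restricting `candidates` to the COMP survivors is what keeps the C(n, k) enumeration tractable, mirroring the effective-n reduction noted in Sec. 4.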
References

[1] R. Dorfman, "The detection of defective members of large populations," The Annals of Mathematical Statistics.
[5] …, IEEE Signal Processing Magazine, 2008.
[6] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff, and M. J. Strauss, "Combining geometry and combinatorics: A unified approach to sparse signal recovery," 2008, pp. 798–805.
[7] Y. Liu, L.-M. Yan, L. Wan, T.-X. Xiang, A. Le, J.-M. Liu, M. Peiris, L. L. M. Poon, and W. Zhang, "Viral dynamics in mild and severe cases of covid-19," The Lancet Infectious Diseases, 2020.
[8] C. L. Chan, P. H. Che, S. Jaggi, and V. Saligrama, "Non-adaptive probabilistic group testing with noisy measurements: Near-optimal bounds with efficient algorithms," in Asilomar Conference on Signals, Systems and Computers, 2008, pp. 1059–1063.
[11] E. Candes, "The restricted isometry property and its implications for compressive sensing," Comptes Rendus Mathematique, 2008.
[12] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constr. Approx., vol. 28, pp. 253–263, 2008.
[13] M. Davenport, M. Duarte, Y. Eldar, and G. Kutyniok, Introduction to Compressed Sensing. Cambridge University Press, 2012, pp. 1–64.
[14] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The LASSO and Generalizations. CRC Press, 2015.
[15] J. Zhang, L. Chen, P. T. Boufounos, and Y. Gu, "On the theoretical analysis of cross validation in compressive sensing," in ICASSP, 2014, pp. 3370–3374.
[16] Y. Li and G. Raskutti, "Minimax optimal convex methods for Poisson inverse problems under lq-ball sparsity," IEEE Trans. Information Theory, vol. 64, no. 8, 2018.
[17] T. Bonesky, "Morozov's discrepancy principle and Tikhonov-type functionals," Inverse Problems, vol. 25, no. 1, 2008.
[18] P. Bohra, D. Garg, K. S. Gurumoorthy, and A. Rajwade, "Variance-stabilization-based compressive inversion under Poisson or Poisson–Gaussian noise with analytical bounds," Inverse Problems, vol. 35, no. 10, 2019.
[19] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with application to wavelet decomposition," in Asilomar Conf. on Signals, Systems and Computing, 1993, pp. 40–44.
[20] T. T. Cai and L. Wang, "Orthogonal matching pursuit for sparse signal recovery with noise," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4680–4688, 2011.
[21] M. Yaghoobi, D. Wu, and M. Davies, "Fast non-negative orthogonal matching pursuit," IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1229–1233, 2015.
[22] M. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, 2001.
[23] D. Wipf and B. D. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. Signal Processing, vol. 52, no. 8, 2004.
[24] E. Crespo Marques, N. Maciel, L. Naviner, H. Cai, and J. Yang, "A review of sparse recovery algorithms," IEEE Access, vol. 7, pp. 1300–1322, 2019.
[25] A. Nalci, I. Fedorov, M. Al-Shoukairi, T. T. Liu, and B. D. Rao, "Rectified Gaussian scale mixtures and the sparse non-negative least squares problem," IEEE Transactions on Signal Processing, vol. 66, no. 12, pp. 3124–3139, 2018.
[26] M. Lotfi and M. Vidyasagar, "A fast noniterative algorithm for compressive sensing using binary measurement matrices," IEEE Transactions on Signal Processing, vol. 66, no. 15, 2018.
[27] "Kirkman's schoolgirl problem," https://mathworld.wolfram.com/KirkmansSchoolgirlProblem.html.
[28] "Steiner triple systems," https://en.wikipedia.org/wiki/Steiner_system.
[29] …, Applied and Computational Harmonic Analysis, vol. 37, no. 1, pp. 12–35, 2014.
[30] V. Abdoghasemi, S. Ferdowsi, B. Makkiabadi, and S. Sanei, "On optimization of the measurement matrix for compressive sensing," in EUSIPCO, 2010.
[31] D. Du, F. K. Hwang, and F. Hwang, Combinatorial Group Testing and Its Applications. World Scientific, 2000.
[32] S. Ghosh, A. Rajwade, S. Krishna, N. Gopalkrishnan, T. E. Schaus, A. Chakravarthy, S. Varahan, V. Appu, R. Ramakrishnan, S. Ch, M. Jindal, V. Bhupathi, A. Gupta, A. Jain, R. Agarwal, S. Pathak, M. A. Rehan, S. Consul, Y. Gupta, N. Gupta, P. Agarwal, R. Goyal, V. Sagar, U. Ramakrishnan, S. Krishna, P. Yin, D. Palakodeti, and M. Gopalkrishnan, "Tapestry: A single-round smart pooling technique for covid-19 testing," medRxiv.
[34] …, in ICASSP, 2015, pp. 3801–3805.
[35] C. Ravazzi, S. Fosson, T. Bianchi, and E. Magli, "Sparsity estimation from compressive projections via sparse random matrices."