A Coding Theory Perspective on Multiplexed Molecular Profiling of Biological Tissues

Abstract

High-throughput and quantitative experimental technologies are experiencing rapid advances in the biological sciences. One important recent technique is multiplexed fluorescence in situ hybridization (mFISH), which enables the identification and localization of large numbers of individual strands of RNA within single cells. Core to that technology is a coding problem: with each RNA sequence of interest being a codeword, how to design a codebook of probes, and how to decode the resulting noisy measurements? Published work has relied on assumptions of uniformly distributed codewords and binary symmetric channels for decoding and, to a lesser degree, for code construction. Here we establish that both of these assumptions are inappropriate in the context of mFISH experiments and that substantial decoding performance gains can be obtained by using more appropriate, less classical, assumptions. We propose a more appropriate asymmetric channel model that can be readily parameterized from data and use it to develop maximum a posteriori (MAP) decoders. We show that the false discovery rate for rare RNAs, which is the key experimental metric, is vastly improved with MAP decoders even when employed with the existing sub-optimal codebook. Using an evolutionary optimization methodology, we further show that by permuting the codebook to better align with the prior, which is an experimentally straightforward procedure, significant further improvements are possible.


Luca D’Alessio*, Broad Institute, Cambridge, MA

Litian Liu*, MIT, Cambridge, MA

Ken Duffy, Maynooth University, Ireland

Yonina C. Eldar, Weizmann Institute of Science, Israel

Muriel Médard, MIT, Cambridge, MA

Mehrtash Babadi, Broad Institute, Cambridge, MA


I. INTRODUCTION

In recent years, the field of single-cell biology has witnessed transformative advances in experimental and computational methods. Of particular interest is the recent advent of multiplexed fluorescence in situ (in-place) hybridization (mFISH) microscopy techniques that allow molecular profiling of hundreds of thousands of cells without disturbing their complex arrangement in space. This highly informative data modality paves the way to transformative progress in many areas of biology, including understanding morphogenesis, tissue regeneration, and disease at molecular resolution.

Luca D’Alessio and Mehrtash Babadi acknowledge funding and support from Data Sciences Platform (DSP), Broad Institute. Litian Liu acknowledges financial support from Klarman Family Foundation. Yonina Eldar acknowledges funding from NIMH grant 1RF1MH121289-0. The authors thank Samouil L. Farhi for beneficial discussions, and Aviv Regev for supporting this project. * These two authors contributed equally to the work.

One of the major challenges in designing such experiments is the vastness of functional bio-molecules. For example, the human genome encodes nearly 30k non-redundant types of RNA molecules, many of which translate to proteins with specific functions. Modern data-driven biology heavily relies on our ability to measure as many different types of functional molecules as possible. Clearly, a sequential imaging approach is impractical. Fortunately, a typical cell produces a rather sparse set of all molecules, and some of the most promising mFISH techniques exploit molecular sparsity in space together with coding ideas in order to multiplex the measurements into fewer imaging rounds [1]–[5].

In brief, the mFISH technique involves assigning binary codes to RNA molecules of interest, chemically synthesizing and “hybridizing” these codes to the molecules, and measuring them in space one bit at a time via sequential fluorescence microscopy. A more detailed account of one such pioneering technique known as MERFISH (“multiplexed error-robust fluorescence in situ hybridization”) [1] is given in Sec. I-A (also, cf. Fig. 1). An important part of the MERFISH protocol is the utilization of sparse codes with large minimum distance to allow error correction. Referred to as MHD4 codes [1]–[3], these 16-bit codes have minimum Hamming distance 4 and contain 4 ones and 12 zeros each. The bit imbalance is motivated by the empirically observed ∼ …× higher signal fallout rate compared to false alarm. There are only 140 such codes and therefore one is limited to measuring at most 140 distinct molecules. These codes are randomly assigned to the RNA molecules of interest.
The decoding method in current use relies on quantization, Hamming error correction, and rejection of ambiguous sequences.

We point out that the assumptions motivating the codebook construction and decoding, tacitly yet heavily, rely on source uniformity and to a certain extent on the binary symmetric channel paradigm, both of which are violated in the context of molecular profiling. For channel coding in communication, the source can be readily assumed to be uniformly distributed thanks to compression in source coding and the separation theorem [6]. In molecular profiling, however, source compression is not applicable and the distribution of RNA molecules is extremely non-uniform. Moreover, fluorescence microscopy is established to be highly asymmetric in terms of fallout and false alarm. These violated assumptions become a source of potential problems when directly applying communication encoding and decoding paradigms. For example, the false discovery rate of rare molecules is found to be unacceptably high in replicate experiments [1], [2], which we later show to be a consequence of the assumed source uniformity. Accurate quantification of rare RNA molecules (e.g. transcription factors) is particularly important for data-driven biological discovery since rare molecules often signal rare events, transient cell states, etc. This motivates our primary goal in this paper: to incorporate the prior non-uniformity in the decoding process in a principled way in order to control the false discovery rate of rare molecules. In practice, accurate priors are either known, can be estimated from the data, or can be measured cheaply and effortlessly (e.g. using bulk RNA sequencing [1]).

(arXiv preprint, cs.IT)

Fig. 1. A schematic overview of a typical mFISH experiment. (a) codebook and probe design; (b) sequential imaging; (c) image processing and decoding. [Panel labels: RNA and codeword for THSB1, FLNA, FBN2; readout sequences R1, R3, R5, R15; readout rounds 1, 2, 3, ...; spot intensities.]

This paper is organized as follows: we review the MERFISH protocol in Sec. I-A and propose a generative model for the data in Sec. II-A, along with a model fitting algorithm and a procedure to derive a more tractable binary asymmetric channel (BAC) formulation from the fitted model. The BAC framework allows us to evaluate the performance of different encoding and decoding schemes. We incorporate the prior non-uniformity into the decoding algorithm by developing a maximum a posteriori (MAP) decoder with a tunable rejection threshold in Sec. II-D.
We show that the false discovery rate of rare RNAs, which is the key experimental metric, is vastly improved compared to the presently used MLE-based decoding method [1]–[3], even when employed with the existing sub-optimal MHD4 codebook. Finally, we take a first step in data-driven code construction in Sec. III. Using an evolutionary optimization methodology, we show that by permuting the codebook to better align with the prior, which is an experimentally straightforward procedure, significant further improvements are possible. We conclude the paper in Sec. IV with an outline of follow-up research directions.
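The MHD4 parameters quoted in this introduction (16 bits, four ones per codeword, minimum Hamming distance 4, 140 codewords) can be reproduced with a short enumeration. The sketch below assumes one standard construction, namely the weight-4 words of the length-16 extended Hamming code, which are exactly the 4-subsets of positions 0..15 whose labels XOR to zero. It is not claimed to reproduce the bit ordering of the codebook published in [1], and the function names are illustrative.

```python
from itertools import combinations

L = 16  # codeword length (number of imaging rounds)
W = 4   # number of ones per codeword


def mhd4_like_codebook():
    """Enumerate weight-4 codewords whose four 'on' positions XOR to zero."""
    supports = [s for s in combinations(range(L), W)
                if s[0] ^ s[1] ^ s[2] ^ s[3] == 0]
    # Represent each codeword as a tuple of L bits.
    return [tuple(1 if l in s else 0 for l in range(L)) for s in supports]


def hamming(x, y):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(x, y))


codebook = mhd4_like_codebook()
min_dist = min(hamming(c1, c2) for c1, c2 in combinations(codebook, 2))
print(len(codebook), min_dist)  # 140 4
```

Any three positions determine the fourth, so two distinct codewords share at most two "on" positions, which is why the minimum distance is 4.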

A. A brief overview of the MERFISH protocol

In this section, we briefly review the MERFISH protocol [1], recount different sources of noise and nuisance, and motivate a generative process for MERFISH data. Fig. 1 shows a schematic overview of the MERFISH technique. This protocol consists of four main steps:

Step 1. A unique binary codeword of length $L = 16$ is assigned to each of the RNA molecules of interest;

Step 2. The specimen is stained with carefully designed short RNA sequences called encoding probes. The middle part of the encoding probes binds with high specificity to a single RNA type while their flanking heads and tails contain a subset of $L$ artificial sequences, $\{R_1, \ldots, R_L\}$, called readout sequences. The choice of readout sequences reflects the intended binary codeword. For instance, if the code for a certain RNA type contains “1” at positions 1, 3, 5, and 15, the encoding probes are designed to have $R_1$, $R_3$, $R_5$, and $R_{15}$ flanking sequences (see Fig. 1a);

Step 3. The prepared tissue undergoes $L$ rounds of imaging. Imaging round $l$ begins with attaching fluorescent readout probes for round $l$ to the prepared tissue. These probes bind to the flanking readout sequences and contain a fluorescent dye that emits light upon excitation. The round ends with bleaching the dye. In effect, imaging round $l$ reveals the position of all RNA molecules having “1” in their binary codeword at position $l$.

Step 4. Finally, the positions of RNA molecules, which appear as bright spots, are identified using conventional image processing operations (see Fig. 2). The data is summarized as an $N \times L$ intensity matrix ($N$ being the number of identified spots) and is ultimately decoded according to the codebook.

MERFISH measurements are affected by several independent sources of noise. These include factors that are intrinsic to individual molecules, such as (1) stochasticity in the hybridization of encoding and readout probes, (2) random walk of molecules between imaging rounds, and (3) CCD camera shot noise. These factors modulate the intensity measurements independently in each round and are largely uncorrelated across rounds. Extreme multiplexing (e.g. as in the seqFISH+ protocol [5]) further leads to interference noise due to signal mixing between nearby molecules. This nuisance, however, is rather negligible in the MERFISH protocol.

II. METHODOLOGY AND RESULTS

A. A generative model for mFISH data

In this section, we present a simple generative model for MERFISH spot intensity data, fit the model to real data, and evaluate the goodness of fit. This model will serve as a foundation for developing a MAP decoder. Fig. 2 shows a typical example of MERFISH data from [2]. We formalize the data generating process as follows: let $C \subset \{0, 1\}^L$ be a set of codewords with cardinality $|C| = K$ which are assigned to $G \le K$ molecules, let $a : \tilde{C} \to \{1, \ldots, G\}$ be the bijective code assignment map where $\tilde{C} \subset C$, $|\tilde{C}| = G$ is the set of used codes, and let $\pi_{1:G}$ be the prior distribution of molecules. Setting aside interference effects, we model the fluorescence intensity series $I_{1:L} \in [0, \infty)^L$ measured for an arbitrary molecule as follows:

$$ g \sim \mathrm{Categorical}(\pi), \qquad c = a^{-1}(g), \qquad \log I_l \mid c_l \sim \mathcal{N}(\mu_l[c_l], \sigma_l[c_l]). \tag{1} $$

As discussed earlier, the intrinsic spot intensity noise (1) is multiplicative, (2) results from a multitude of independent sources, and (3) is uncorrelated across imaging rounds, motivating factorizing $I_{1:L} \mid c_{1:L}$ in $l$ and modeling each conditional as a Gaussian in the logarithmic space. The well-known heteroscedasticity of fluorescence noise is reflected in having two different $\sigma[c]$ for $c \in \{0, 1\}$ for the two binary states.

Fig. 2. Extraction of isolated spots from MERFISH images (data from [2]). (a) local peak finding; (b) identification of isolated spots; (c) intensity series from 10 random spots (rows); the leftmost 16 columns show the intensity measurements; the last two columns show the summed intensity and nearest-neighbor cross-correlations and are used for filtering of poorly localized spots.

Fig. 3. Modeling spot intensities as a two-component Gaussian mixture for each data dimension (i.e. readout round and color channel). (a) model fitting (black and red lines) and empirical histograms (gray); the green lines indicate the quantization thresholds for the ensuing BAC approximation; (b) QQ-plots for each data dimension; the labels shown in the sub-panels indicate the hybridization round and color channel.

B. Image processing and model fitting
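As a concrete reading of the generative model in Eq. (1), the following minimal sketch draws one spot: a molecule from the prior, its codeword, and one log-normal intensity per round. All numerical values, the toy codebook, and the name `sample_spot` are hypothetical; molecules are indexed from 0 here, and `sigma` is treated as a standard deviation, which is an assumption about the parameterization.

```python
import math
import random


def sample_spot(prior, codes, mu, sigma, rng):
    """Draw one spot from Eq. (1): molecule g, its codeword c, intensities I.

    prior     : list of G probabilities (pi)
    codes     : list of G codewords; codes[g] plays the role of a^{-1}(g)
    mu, sigma : mu[l][b], sigma[l][b] for round l and bit value b in {0, 1}
                (sigma is treated as a standard deviation, an assumption)
    """
    g = rng.choices(range(len(prior)), weights=prior, k=1)[0]
    c = codes[g]
    # Independent Gaussian noise in log space for every imaging round.
    log_i = [rng.gauss(mu[l][c[l]], sigma[l][c[l]]) for l in range(len(c))]
    return g, c, [math.exp(v) for v in log_i]


# Toy setup: 3 molecules, 4 rounds, made-up channel parameters.
rng = random.Random(0)
prior = [0.90, 0.09, 0.01]
codes = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]
mu = [(0.0, 2.0)] * 4     # (mean for bit 0, mean for bit 1) in log space
sigma = [(0.5, 0.3)] * 4  # different widths for the two bit states

g, c, intensities = sample_spot(prior, codes, mu, sigma, rng)
```

Repeating the draw N times and stacking the intensities gives a synthetic N × L intensity table of the kind the fitting step operates on.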

The most straightforward way to fit the generative model to empirical data is by observing that marginalizing the (discrete) molecule identity variable $g$ yields a two-component Gaussian mixture model (GMM) for $\log I_l$, with weights determined by the prior $\pi$, codebook $C$, and the assignment $a$. The model parameters $\{\mu_{1:L}[0], \sigma_{1:L}[0], \mu_{1:L}[1], \sigma_{1:L}[1]\}$ can be readily estimated by ML GMM fitting to each column of the spot intensity table (cf. Fig. 1c), which can be performed efficiently using the conventional EM algorithm. In order to decouple the intrinsic and extrinsic spot noise in the raw data, we censor the dataset to only spatially isolated molecules. In brief, we process the images as described in [1], subtract the background, identify the position of molecules by local peak-finding, censor dense regions (e.g. cell nuclei), and retain local peaks that are separated from one another by at least ∼ …, which is a few multiples of the diffraction limit. We perform additional filtering based on the spot intensity pattern and nearest-neighbor Pearson correlation (cf. Fig. 2) and only retain peaks with a symmetric appearance. This procedure yields ∼ …

C. Quantization, channel model and estimation
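The per-column fit described in Sec. II-B can be illustrated with a plain EM loop for a two-component one-dimensional Gaussian mixture. This is a minimal sketch on synthetic log-intensities with made-up parameters, not the authors' fitting code; the initialisation and the fixed iteration count are simplifying assumptions, and `fit_gmm2` is an illustrative name.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm2(xs, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture; returns (w, mu, sigma)."""
    xs = sorted(xs)
    half = len(xs) // 2
    # Crude initialisation: lower half -> component 0, upper half -> component 1.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sigma = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each point.
        r1 = []
        for x in xs:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            r1.append(p1 / (p0 + p1))
        # M-step: weighted means, standard deviations, and mixing weights.
        for k, resp in ((0, [1.0 - r for r in r1]), (1, r1)):
            n_k = sum(resp)
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / n_k
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / n_k
            sigma[k] = math.sqrt(max(var, 1e-12))
            w[k] = n_k / len(xs)
    return w, mu, sigma


# Synthetic log-intensities for one imaging round (made-up parameters):
# 900 spots whose bit is 0 and 300 spots whose bit is 1.
rng = random.Random(1)
xs = ([rng.gauss(0.0, 0.5) for _ in range(900)]
      + [rng.gauss(2.0, 0.3) for _ in range(300)])
w, mu, sigma = fit_gmm2(xs)
```

In this reading, component 0 corresponds to the "bit 0" state and component 1 to the "bit 1" state, and the fit would be repeated independently for each of the L columns.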

The generative model specified by Eq. (1) readily yields the posterior distribution $\Pr(g \mid I; \pi, C, a)$ and can form the basis of an intensity-based MAP decoder. To make the formulation more amenable to computational and theoretical investigation, as well as to make a connection to the currently used decoding method, we derive an approximate binary asymmetric channel (BAC) model from Eq. (1) through quantization. The optimal quantization thresholds $\theta_{1:L}$ are determined for each $l$ to be the point of equal responsibility between the two Gaussian components, i.e.

$$ \sum_{g=1}^{G} \pi_g \, a^{-1}(g)[l] \; \mathcal{N}(\theta_l \mid \mu_l[1], \sigma_l[1]) = \sum_{g=1}^{G} \pi_g \left[ 1 - a^{-1}(g)[l] \right] \mathcal{N}(\theta_l \mid \mu_l[0], \sigma_l[0]), $$

which admits a closed-form solution. Here, $a$ and $\pi$ correspond to the known code assignment and prior distribution of the data used for fitting. The fallout $p^{1 \to 0}$ and false alarm $p^{0 \to 1}$ rates are given by the integrated probability weights of the two Gaussian components below and above the threshold (cf. Fig. 3a), i.e. $p^{1 \to 0}_l = \Phi[(\theta_l - \mu_l[1]) / \sigma_l[1]]$ and $p^{0 \to 1}_l = \Phi[(\mu_l[0] - \theta_l) / \sigma_l[0]]$, where $\Phi(\cdot)$ is the CDF of the standard normal distribution. We find $p^{1 \to 0}_l$ and $p^{0 \to 1}_l$ to be … and … (mean in $l$), respectively, for the data given in Ref. [2], which is in agreement with the estimates reported therein. We, however, observed significant round-to-round variation in the channel parameters and as such, refrained from further simplifying the channel model to a single BAC for all imaging rounds $l$. We refer to the bundle of estimated BAC parameters as $\theta_{\mathrm{BAC}}$.

D. Decoding: MAP and MLE decoders
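The quantization step of Sec. II-C can be sketched for a single round as follows. Taking logarithms of the equal-responsibility condition gives a quadratic in the threshold, and the root lying between the two component means is kept; the two error rates then follow from the normal CDF. The parameter values and function names are hypothetical, and `sigma` is again assumed to be a standard deviation.

```python
import math
from statistics import NormalDist


def equal_responsibility_threshold(w1, mu0, sigma0, mu1, sigma1):
    """Solve w1*N(t|mu1,sigma1) = (1-w1)*N(t|mu0,sigma0) for mu0 < t < mu1."""
    w0 = 1.0 - w1
    # Coefficients of a*t^2 + b*t + c = 0, obtained by taking logs of both sides.
    a = 0.5 / sigma0**2 - 0.5 / sigma1**2
    b = mu1 / sigma1**2 - mu0 / sigma0**2
    c = (0.5 * mu0**2 / sigma0**2 - 0.5 * mu1**2 / sigma1**2
         + math.log(w1 * sigma0 / (w0 * sigma1)))
    if abs(a) < 1e-12:  # equal widths: the equation is linear
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            raise ValueError("no real crossing point")
        roots = [(-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    inside = [t for t in roots if mu0 < t < mu1]
    if len(inside) != 1:
        raise ValueError("no unique crossing point between the two means")
    return inside[0]


def bac_rates(theta, mu0, sigma0, mu1, sigma1):
    """Return (p10, p01): fallout and false alarm rates for one round."""
    phi = NormalDist().cdf
    p10 = phi((theta - mu1) / sigma1)  # a "1" falls below the threshold
    p01 = phi((mu0 - theta) / sigma0)  # a "0" rises above the threshold
    return p10, p01


# Toy example: weight of the "bit 1" component in round l from prior and codes.
prior = [0.90, 0.09, 0.01]
codes = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]
l = 0
w1 = sum(p * c[l] for p, c in zip(prior, codes))
mu0, sigma0, mu1, sigma1 = 0.0, 0.5, 2.0, 0.3  # made-up fitted parameters
theta = equal_responsibility_threshold(w1, mu0, sigma0, mu1, sigma1)
p10, p01 = bac_rates(theta, mu0, sigma0, mu1, sigma1)
```

Running this per round yields one (p10, p01) pair per imaging round, which matches the choice in the text of keeping a separate BAC for each round rather than a single shared one.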

A gratifying property of the BAC approximation of Eq. (1) is that it allows us to evaluate the performance of various decoding strategies without resorting to time-consuming simulations or further analytical approximations. In the BAC model, the likelihood of a binary sequence $x_{1:L} \in \{0, 1\}^L$ conditioned on the codeword $c \in C$ is given as:

$$ \log \Pr(x \mid c, \theta_{\mathrm{BAC}}) = \sum_{l=1}^{L} \sum_{i,j \in \{0,1\}} \delta_{c_l, i} \, \delta_{x_l, j} \, \log p^{i \to j}_l, \tag{2} $$

where $\delta_{\cdot,\cdot}$ is the Kronecker delta function. We define the posterior Voronoi set for each codeword $c \in C$ as:

$$ V(c \mid a, \omega, C, \theta_{\mathrm{BAC}}) = \left\{ x \in \{0,1\}^L \;\middle|\; \forall c' \in C,\ c \neq c' : \omega_{a(c)} \Pr(x \mid c, \theta_{\mathrm{BAC}}) > \omega_{a(c')} \Pr(x \mid c', \theta_{\mathrm{BAC}}) \right\}, \tag{3} $$

where $\omega_{1:G}$ is the prior distribution assumed by the decoder. The Voronoi sets are mutually exclusive by construction, can be obtained quickly for short codes by exhaustive enumeration, and determine the optimal codeword for an observed binary sequence. The MLE decoder corresponds to using a uniform prior, i.e. $\omega \leftarrow 1/G$, whereas the MAP decoder corresponds to using the actual (non-uniform) prior governing the data, $\omega \leftarrow \pi$. We additionally introduce a MAP$_q$ decoder, which is a MAP decoder obtained from depleting the Voronoi sets from binary sequences for which the posterior probability of the best candidate code is below a set threshold $q$. Intuitively, the MAP$_q$ decoder is a Bayesian decoder with reject option that trades precision gain for sensitivity loss by filtering dubious sequences from the Voronoi sets. The decoding algorithm introduced by Ref. [1]–[3] can be thought of as the MLE decoder with a rejection subspace given by $S_{\mathrm{rej}} = \{ x \mid \exists\, c, c',\ c \neq c' \in C : d_H(c, x) = d_H(c', x) = d^*(x, C) \}$, where $d_H(\cdot, \cdot)$ is the Hamming distance and $d^*(x, C) = \inf_{c \in C} d_H(c, x)$. We refer to this decoder as Moffitt (2016).
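The relationship between the MLE, MAP, and MAP$_q$ decoders can be made concrete with a small sketch of Eq. (2) and the decision rule behind Eq. (3). The two-codeword codebook, the per-round rates, and the function names are made up for illustration. Exact ties, which belong to no Voronoi set under the strict inequality of Eq. (3), are resolved here by taking the first maximum; that is a simplification.

```python
import math


def bac_loglik(x, c, p01, p10):
    """Eq. (2): log Pr(x | c) with per-round rates p01[l] (0->1), p10[l] (1->0)."""
    total = 0.0
    for l, (xl, cl) in enumerate(zip(x, c)):
        if cl == 1:
            total += math.log(p10[l] if xl == 0 else 1.0 - p10[l])
        else:
            total += math.log(p01[l] if xl == 1 else 1.0 - p01[l])
    return total


def decode(x, codes, omega, p01, p10, q=0.0):
    """Index of the best codeword under decoder prior omega, or None if rejected.

    Uniform omega gives the MLE decoder, omega equal to the true prior gives
    the MAP decoder, and q > 0 gives MAP_q (reject if best posterior < q).
    """
    scores = [math.log(w) + bac_loglik(x, c, p01, p10)
              for w, c in zip(omega, codes)]
    m = max(scores)
    post = [math.exp(s - m) for s in scores]  # unnormalised posterior weights
    best = max(range(len(codes)), key=lambda g: scores[g])
    return best if post[best] / sum(post) >= q else None


# Toy channel with round-to-round variation in the fallout rate.
codes = [(1, 1, 0, 0), (0, 1, 1, 0)]
prior = [0.9, 0.1]
p01 = [0.04, 0.04, 0.04, 0.04]
p10 = [0.05, 0.10, 0.20, 0.10]
x = (0, 1, 0, 0)  # one bit away from both codewords

mle = decode(x, codes, [0.5, 0.5], p01, p10)
map_ = decode(x, codes, prior, p01, p10)
map_q = decode(x, codes, prior, p01, p10, q=0.9)
print(mle, map_, map_q)  # 1 0 None
```

In this toy case the likelihood alone favours the rare codeword, the prior tips the decision to the abundant one, and the best posterior (about 0.69) is too low to pass a threshold of q = 0.9.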
We remark that the acceptance criterion of Moffitt (2016) is extremely stringent: for MHD4 codes, $|S_{\mathrm{acc}}| = 9100$, which is only ∼14% of all $2^{16}$ possible sequences (here, $S_{\mathrm{acc}}$ is the complement of $S_{\mathrm{rej}}$). In all cases, the confusion matrix $T(c \mid c')$, i.e. the probability that a molecule coded with $c'$ is decoded to $c$, can be immediately calculated:

$$ T(c \mid c'; \pi, \omega, \theta_{\mathrm{BAC}}) = \sum_{x \in V(c \mid \omega, \ldots)} \pi_{a(c')} \Pr(x \mid c', \theta_{\mathrm{BAC}}), \tag{4} $$

from which the marginal true positive rates $\mathrm{TPR}_{1:G}$ and false discovery rates $\mathrm{FDR}_{1:G}$ can be readily calculated.

E. Comparing the performance of different decoders
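The marginal rates $\mathrm{TPR}_g$ and $\mathrm{FDR}_g$ follow from the confusion matrix of Eq. (4). The text does not spell out the formulas, so the block below assumes the standard definitions and writes $c_g \equiv a^{-1}(g)$. Because Eq. (4) carries the factor $\pi_{a(c')}$, $T(c \mid c')$ is read here as the joint weight of "true codeword $c'$, decoded as $c$".

```latex
% Assumed standard definitions, with c_g = a^{-1}(g):
\mathrm{TPR}_g = \frac{T(c_g \mid c_g)}{\pi_g},
\qquad
\mathrm{FDR}_g = 1 - \frac{T(c_g \mid c_g)}{\sum_{c' \in \tilde{C}} T(c_g \mid c')} .
```

Under this reading, a rare molecule (small $\pi_g$) can have a high $\mathrm{FDR}_g$ even when every codeword is decoded accurately, because the denominator collects misdecoded spots from far more abundant molecules.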

Developments in previous sections allow us to compare the performance of MLE, MAP, MAP$_q$, and Moffitt (2016). We use the BAC parameters obtained from the data in [2], 16-bit MHD4 codes with random assignment, and two different previously estimated and published source priors with different degrees of non-uniformity. As a first step, we compare the performance of our proposed MAP and MLE decoders separately inside $S_{\mathrm{acc}}$ and $S_{\mathrm{rej}}$, the acceptance and rejection subspaces of Moffitt (2016), in Fig. 4a, b (middle, bottom). The priors are shown on the top, including the estimated Dirichlet concentration $\alpha$. Marginal performance metrics for different molecules are color-coded according to their prior rank from red (most abundant) to blue (least abundant). The MLE decoder inside $S_{\mathrm{acc}}$ is equivalent to Moffitt (2016). Both decoders perform well in this subspace. While the MLE decoder performs poorly inside $S_{\mathrm{rej}}$, providing a sound basis for rejection as in Moffitt (2016), the MAP decoder yields acceptable FDR, hinting that $S_{\mathrm{acc}}$ is too stringent for the MAP decoder and that better performance can be expected from MAP$_q$. It is also noticed that the MAP decoder controls FDR much better than MLE inside $S_{\mathrm{acc}}$ for the more non-uniform prior. We explore this observation more systematically in panel c. We sample $\pi$ from a symmetric Dirichlet distribution with concentration $\alpha$ and calculate the distribution of the marginal FDRs (bottom) as well as the uniform mismatch rate (top). We notice that as the prior gets more concentrated, $\log \alpha \to -\infty$, the MAP decoder behaves progressively better whereas the MLE decoder degrades and exhibits a bi-modal behavior: extremely low (high) FDR on abundant (rare) codes. As the prior gets more uniform, $\log \alpha \to +\infty$, MLE and MAP become indistinguishable. The green and red symbols show the biological priors used in panels a and b, respectively, together with their estimated $\alpha$, in agreement with the trend of the Dirichlet prior model.
Finally, panel d compares the performance of the MAP$_q$ decoder at different rejection thresholds $q$ with Moffitt (2016). The prior used here is the same as in panel b. It is noticed that the MAP$_q$ decoder is remarkably effective at controlling FDR for all codes whereas Moffitt (2016) degrades in FDR for rare codes, as expected from the source uniformity assumption. This finding explains the reportedly lower correlation between rare molecules in replicate experiments [1], [2]. The smaller panels at the top of panel c show mean TPR, FDR, and rejection rate across all molecules. The MAP$_q$ decoder with $q =$ … has similar sensitivity to Moffitt (2016) while yielding ∼ … lower FDR on average and remarkably ∼ … lower 5-95 FDR percentile range, implying significant improvement in reducing the mean and variance of false positives for both abundant and rare molecules.

III. DATA-DRIVEN CODE CONSTRUCTION

The results presented so far were obtained by randomly assigning a fixed set of MHD4 codes. Constructing codes to better reflect channel asymmetry and prior non-uniformity is another attractive opportunity for improving the performance of mFISH protocols. Constructing application-specific codes for mFISH is outside the scope of the present paper and is a topic for future research. Here, we continue to tread on the theme of utilizing prior non-uniformity and show that optimizing the assignment of even the sub-optimal codes to molecules with respect to prior abundance can significantly reduce FDR. This is to be expected given the rather wide performance outcomes shown in Fig. 4 that result from random code assignment. Explicitly, we seek to optimize the scalarized metric $\overline{\mathrm{FDR}}(a, \pi) = G^{-1} \sum_{g=1}^{G} \mathrm{FDR}_g(a, \pi)$ over the assignment operator $a$ for a given prior $\pi$ through an evolutionary optimization process. We start with a population of $N = 5000$ random code assignments, mutate the population via pairwise permutations with a small probability of … per molecule per assignment, and select the fittest $N$ offspring using $\overline{\mathrm{FDR}}$ as the measure of fitness. We do not use a crossover operation here. We hypothesize that a relevant surrogate for the optimality of $\overline{\mathrm{FDR}}$ is the concordance between the Hamming distance $d_H$ and the prior distance $d_\pi(c, c') \equiv |\pi_{a(c)} - \pi_{a(c')}|$. We investigate the emergence of this order by monitoring the following order parameter during the evolution:

$$ \chi(a, \pi) \equiv \frac{1}{G} \sum_{g=1}^{G} \rho_s \left[ d_H\!\left(a^{-1}(g), C_a\right),\; d_\pi\!\left(a^{-1}(g), C_a\right) \right], \tag{5} $$

where $\rho_s[\cdot, \cdot]$ denotes the Spearman correlation and $C_a$ is the ordered list of all codes used by $a$ over which the correlation is calculated. We refer to the population average of $\chi(a, \pi)$ as $\bar{\chi}$.

Fig. 4. Comparing the performance of different decoding schemes for randomly assigned MHD4 codes. (a) and (b) correspond to the prior distributions for RNA molecules selected in [2] and [3], respectively. The top panels show the rank-ordered prior distribution and the estimated Dirichlet concentration parameter $\alpha$; the middle and bottom panels show the marginal TPR and FDR for each molecule type conditioned on the $S_{\mathrm{acc}}$ and $S_{\mathrm{rej}}$ subspaces (cf. Sec. II-D); markers are color-coded according to the prior rank of their corresponding molecules. Shaded regions indicate the 5-95 percentile range as a matter of random code assignment; (c) the effect of prior non-uniformity on the performance of MLE and MAP decoders for randomly assigned MHD4 codes. The top panels show the uniform mismatch rate. The bottom panels show the histogram of marginal FDRs vs. Dirichlet prior concentration $\alpha$ in grayscale. The orange lines and regions indicate the median and 5-95 percentile ranges; (d) performance of MAP decoders with reject at different acceptance thresholds compared to the method in [2].
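The mutation-and-selection loop described in this section (pairwise permutations, no crossover, keep the fittest N) can be sketched in a few lines. This is a stand-alone illustration and does not use PyMOO. Retaining parents alongside offspring during selection is an assumption, and the fitness function below is a deliberately trivial placeholder rather than the mean FDR, which would require the Voronoi-set computation of Sec. II-D. All names and numbers are hypothetical.

```python
import random


def mutate(assignment, p_swap, rng):
    """Pairwise-permutation mutation: swap each position with a random one w.p. p_swap."""
    child = list(assignment)
    for i in range(len(child)):
        if rng.random() < p_swap:
            j = rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]
    return child


def evolve(fitness, n_items, pop_size, n_gen, p_swap, rng):
    """Minimise fitness(assignment) over permutations by mutation and selection."""
    population = [rng.sample(range(n_items), n_items) for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        offspring = [mutate(a, p_swap, rng) for a in population]
        # Parents compete with offspring, so the best fitness never gets worse.
        population = sorted(population + offspring, key=fitness)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


# Placeholder fitness: assignment[g] is the index of the code given to molecule g,
# and each code carries a made-up "cost". This is NOT the FDR of the paper.
prior = [0.50, 0.25, 0.15, 0.07, 0.03]
cost = [5.0, 4.0, 3.0, 2.0, 1.0]


def toy_fitness(assignment):
    return sum(p * cost[code] for p, code in zip(prior, assignment))


rng = random.Random(0)
best, history = evolve(toy_fitness, n_items=5, pop_size=20, n_gen=30,
                       p_swap=0.2, rng=rng)
```

Swapping `toy_fitness` for a function that evaluates the mean FDR of an assignment would turn this into the kind of search described in the text, at a far higher cost per evaluation.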

Fig. 5. Evolutionary optimization of code assignment for MHD4 codes (for the channel model described in Fig. 3 and the prior distribution from [3]). (a) bottom: mean FDR vs. generation; top: $d_H$–$d_\pi$ matching order parameter vs. generation (see Eq. 5); (b) the performance of the MAP decoder for randomly assigned codes (squares) vs. optimized assignment (circles).

We implement the evolutionary algorithm using the PyMOO package [7] and vectorize the calculation of Voronoi sets with GPU acceleration. Fig. 5 shows the results obtained by running the evolutionary optimization for three days (NVIDIA Tesla P100 GPU, MHD4 codes, prior from [3]). Panel a shows the monotonic decline of the mean FDR to ∼ … of its initial value (random assignment). This trend proceeds concurrently with a monotonic upturn in the population average of $\chi$, providing evidence for the hypothesized matching order between $d_H$ and $d_\pi$. Panel b compares the performance metrics of the MAP decoder between the first and last population of code assignments. It is noticed that the optimized code assignment predominantly reduces the FDR of rare molecules, the mean FDR of which reduces to ∼ … of that of randomly assigned codes. The possibility to reduce the FDR of rare molecules is a particularly favorable outcome in practice.

IV. CONCLUSION AND OUTLOOK

In this paper, we reviewed multiplexed molecular profiling experiments from the perspective of coding theory and proposed a motivated generative model for the data, based on which we derived an approximate parallel BAC model for the system. We showed that the exact MAP decoder of the BAC model vastly outperforms the decoding algorithm in current use in terms of controlling the FDR of rare molecules, the key experimental metric. This is achieved by taking into account the non-uniformity of the source prior, a “non-classical” aspect of multiplexed molecular profiling viewed as a noisy channel. We also took the first step in data-driven code construction and showed that optimizing the assignment of existing sub-optimal codes is another effective method for reducing false positives.

Attractive directions for follow-up research include constructing application-specific codes to increase the throughput of the mFISH experiments, theoretical progress in understanding the optimal assignment of existing codes (e.g. by investigating the geometry of Voronoi sets), extending the generative model and the ensuing channel description to $q$-ary codes (e.g. as in the seqFISH and seqFISH+ experimental protocols [4], [5]), and taking into account spatial interference and color channel cross-talk in the data generating process.

REFERENCES

[1] K. H. Chen, A. N. Boettiger, J. R. Moffitt, S. Wang, and X. Zhuang, “Spatially resolved, highly multiplexed RNA profiling in single cells,” Science, vol. 348, no. 6233, p. aaa6090, 2015.

[2] J. R. Moffitt, J. Hao, G. Wang, K. H. Chen, H. P. Babcock, and X. Zhuang, “High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization,” Proceedings of the National Academy of Sciences, vol. 113, no. 39, pp. 11046–11051, 2016.

[3] J. R. Moffitt, J. Hao, D. Bambah-Mukku, T. Lu, C. Dulac, and X. Zhuang, “High-performance multiplexed fluorescence in situ hybridization in culture and tissue with matrix imprinting and clearing,” Proceedings of the National Academy of Sciences, vol. 113, no. 50, pp. 14456–14461, 2016.

[4] S. Shah, E. Lubeck, W. Zhou, and L. Cai, “seqFISH accurately detects transcripts in single cells and reveals robust spatial organization in the hippocampus,” Neuron, vol. 94, no. 4, pp. 752–758, 2017.

[5] C.-H. L. Eng, M. Lawson, Q. Zhu, R. Dries, N. Koulena, Y. Takei, J. Yun, C. Cronin, C. Karp, G.-C. Yuan et al., “Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+,” Nature, vol. 568, no. 7751, pp. 235–239, 2019.

[6] C. E. Shannon, “A mathematical theory of communication,”