[PDF] Semiparametric point process modelling of blinking artifacts in PALM

Abstract

Photoactivated localization microscopy (PALM) is a powerful imaging technique for characterization of protein organization in biological cells. Due to the stochastic blinking of fluorescent probes, and camera discretization effects, each protein gives rise to a cluster of artificial observations. These blinking artifacts are an obstacle for qualitative analysis of PALM data, and tools for their correction are in high demand. We develop the Independent Blinking Cluster point process (IBCpp) family of models, and present results on the mark correlation function. We then construct the PALM-IBCpp - a semiparametric IBCpp tailored for PALM data. We describe a procedure for estimation of parameters, which can be used without parametric assumptions on the spatial organization of proteins. The parameters include the kinetic rates that control blinking, and as such can be used to correct subsequent data analysis. The method is demonstrated on real data, and in a simulation study, where blinking artifacts were precisely quantified in a range of realistic settings.

Full PDF

SS EMIPARAMETRIC POINT PROCESS MODELLING OF BLINKINGARTIFACTS IN

PALM P REPRINT

Louis G. Jensen*

Department of MathematicsAarhus UniversityAarhus CDenmark

David J. Williamson

Randall Division for Cell and Molecular BiophysicsKing’s College LondonLondonUK

Ute Hahn

Department of MathematicsAarhus UniversityAarhus CDenmarkFebruary 1, 2021 A BSTRACT

Photoactivated localization microscopy (PALM) is a powerful imaging technique for characteriza-tion of protein organization in biological cells. Due to the stochastic blinking of ﬂuorescent probes,and camera discretization effects, each protein gives rise to a cluster of artiﬁcial observations. Theseblinking artifacts are an obstacle for qualitative analysis of PALM data, and tools for their correctionare in high demand. We develop the Independent Blinking Cluster point process (IBCpp) family ofmodels, and present results on the mark correlation function. We then construct the PALM-IBCpp- a semiparametric IBCpp tailored for PALM data. We describe a procedure for estimation of pa-rameters, which can be used without parametric assumptions on the spatial organization of proteins.The parameters include the kinetic rates that control blinking, and as such can be used to correctsubsequent data analysis. The method is demonstrated on real data, and in a simulation study, whereblinking artifacts were precisely quantiﬁed in a range of realistic settings. K eywords Photoactivated localization microscopy · Multiple blinking · Spatio-temporal point patterns · Markcorrelation function · Moment-based estimation · Second-order characteristics

In their 2006 paper (Betzig et al. 2006), Eric Betzig and his collaborators showed how it is possible to localize multipleﬂuorescing proteins within a diffraction limited spot, using photoactiveable ﬂuorescent proteins (PA-FP). PA-FPs canbe activated, read, and permanently photobleached in stochastic fashion. By splitting up the ﬂuorescent signal comingfrom different PA-FP into separate time intervals, each protein can be independently localized with high precision,even when they are closely located in space (Yamanaka et al. 2014).Unfortunately, it is the nature of PA-FPs to stochastically enter and reemerge from dark states a number of timesbefore permanently bleaching, leading to multiple appearances of the same protein (Annibale et al. 2011 a , Fricke et al.2015). For analysis of the spatial organization of molecules, these reappearances lead to erroneous conclusions, unlessexplicitly dealt with (Shivanandan et al. 2014). In addition, camera discretization effects complicate and exacerbate themultiple appearance problem (Grifﬁ´e et al. 2020, Patel et al. 2019), and an understanding of both PA-FP photophysicsand camera discretization effects is thus required to properly remedy the situation.Although such artifacts are best understood by considering the spatio-temporal behavior of PA-FPs, little has been doneto exploit both the spatial and temporal dimensions when analyzing blinking artifacts. In methods such as (Andersenet al. 2018, Sengupta et al. 2011), the spatial data alone is used, and require a model for protein behavior. Othermethods use the temporal ﬂuorescence traces to estimate the number of proteins in local regions (Hummer et al. 2016, * Corresponding author Email address: [email protected] a r X i v : . [ s t a t . A P ] J a n REPRINT - F

EBRUARY

1, 2021Karathanasis et al. 2017, Lin et al. 2015), which require either manual segmentation or external calibration samples.More recently, complex descriptions of PA-FP photophysics have been modeled by means of Hidden Markov Models(HMM) (Staudt et al. 2020, Patel et al. 2019). In (Patel et al. 2019), estimation is carried out by means of a calibrationsample of well-separated ﬂuorophores. In (Staudt et al. 2020), the authors model the conglomerate ﬂuorescent intensitytrace over a sequence of time points, as originating from some unknown number of PA-FP. This means that additionalparameters have to be estimated, and the information in the spatial dimension is not exploited. Although these methodsare powerful, and offer highly realistic modeling of the ﬂuorescent probes, they rely on intricate numerical routines tooptimize the (pseudo)-likelihood function, and are perhaps most relevant when the PA-FP themselves are of primaryinterest.In this paper, we introduce a modeling and estimation framework for analyzing data from PALM microscopy, with afocus on quick estimation of blinking artifacts. The results and methods are built on a class of realistic, semiparametriccluster point processes for the observed spatio-temporal data, and estimation is carried out using moment methods.Our approach leads to estimates of the kinetic rates that govern photoblinking, which can be used to quantify the effectof blinking artifacts on a given sample. Blinking rates are estimated without a need for parametric modeling of thespatial organization of proteins, allowing the methods to be applied to essentially any PA-FP experiment.The paper is organized as follows. In Section 2, we brieﬂy go over the needed point process theory that will be used formodeling or estimation, and we give a quick rundown of the principles of PALM imaging, and how camera artifactscome into play. In Section 3, we deﬁne the family of Independent Blinking Cluster point processes, and presentresults on the mark correlation function. We then propose a particular model from the family, the PALM-IBCpp, formodeling of PALM data, and motivate the construction in terms of a discretized, 4-state PA-FP blinking model. InSection 4, we describe a procedure for estimation of the kinetic rates in the PA-FP blinking model. Section 5 considersa dataset expressing LAT-mEos3.2 PA-FP, and demonstrates the ability of the PALM-IBCpp to capture the spatio-temporal clustering behavior observed in real data. Finally, in Section 6, we simulate PA-FP with a range of differentspatial organizations and blinking behaviors, and illustrate the ability of our estimation methods to precisely recoverthe blinking rates. We also include a misspeciﬁed blinking model, and show that important PA-FP descriptors, such asthe total number of reappearances and time to bleaching, are still accurately recovered. Appendix A contains proofsof moment results hidden from the main text. Appendix B contains derivations of asymptotically exact, approximateexpressions for discretized moment statistics needed for parameter estimation. In Appendix C we argue for the validityof using our estimation procedures, without modiﬁcation, on samples with arbitrary protein organization. Finally, wepropose a way of selecting informative query points for summary functions in Appendix D.

In this section we present the notation and main point process concepts that we will be needing below, includingmoment measures, mark distributions, and the mark correlation function. We also explain some of the modeling difﬁ-culties that arise in super-resolution ﬂuorescence microscopy experiments, namely those associated with discretizationof the temporal information and background noise. For the general exposition, we work with processes on R d × R + .For a more rigorous introduction to point process theory, we refer to (Daley & Vere-Jones 2007). For more on markdistributions, see (Stoyan 1984). Finally, more on the acquisition and preparation of super-resolution ﬂuorescencemicroscopy data can be found in (Deschout et al. 2014). For the purpose of this paper, a spatio-temporal point process, V = { ( v i , t v i ) } ∞ i =1 , is a random, locally ﬁnite pointconﬁguration with distinct points in R d × R + . We assume the moment measures in the following are σ -ﬁnite.Write ↓ V = { v i } ∞ i =1 ( ground V ) for the random object obtained by stripping V of its times. Assume ↓ V is well-deﬁnedas a spatial point process on R d , having ﬁnite intensity function λ ↓ V and second-order product density λ (2) ↓ V . Then wecompute the intensity measure, Λ ↓ V , and second-order factorial moment measure, α (2) ↓ V , as Λ ↓ V ( A ) = E  (cid:88) v ∈ ↓ V A ( v )  = (cid:90) A λ ↓ V ( v ) dvα (2) ↓ V ( A × A ) = E  (cid:54) = (cid:88) ( v ,v ) ∈ ↓ V A × A ( v , v )  = (cid:90) A × A λ (2) ↓ V ( v , v ) d ( v , v ) REPRINT - F

EBRUARY

1, 2021working everywhere on Borel sets, and we deﬁne the pair correlation function g ↓ V in the usual way g ↓ V ( v , v ) = λ (2) ↓ V ( v , v ) λ ↓ V ( v ) λ ↓ V ( v ) We deﬁne the , M (1) V | v , as the conditional probability measure on R + satisfying Λ V ( A × B ) = E  (cid:88) ( v,t v ) ∈ V A ( v ) B ( t v )  = (cid:90) A M (1) V | v ( B ) d Λ ↓ V ( v ) for arbitrary Borel sets A and B . Similarly, the , M (2) V | ( v ,v ) , satisﬁes α (2) V ( × k =1 [ A k × B k ]) = E  (cid:54) = (cid:88) ( v ,t v ) , ( v ,t v ) ∈ V A × A ( v , v ) B × B ( t v , t v )  = (cid:90) A × A M (2) V | ( v ,v ) ( B × B ) dα (2) ↓ V ( v , v ) We then deﬁne the mark correlation function , k fV , as k fV ( v , v ) = (cid:82) f ( t v , t v ) dM (2) V | ( v ,v ) ( t v , t v ) (cid:82) (cid:82) f ( t v , t v ) dM (1) V | v ( t v ) dM (1) V | v ( t v ) (1)for f : R (cid:55)→ R + a non-negative Borel function of two times. In the following, we will only need to work with markdistributions that are independent of the conditioning points, ( v, v , v ) , and we simply write M (1) V and M (2) V . To understand how PALM works, consider ﬁrst a single ﬂuorescent emitter located at the position x . Wheneverﬂuorescence is emitted, it is captured by the camera, and the signal is integrated over the acquisition time lasting 1frame. Based on the intensity proﬁle observed on pixels, the position x is estimated, by assuming a shape for thepoint spread function (PSF) (Small & Stahlheber 2014, Ober et al. 2015), which models the blurry shape observedon a camera when imaging a point-source of light. The localization uncertainty associated with the estimate of x canthen be computed, and is included in the dataset for each localization. This procedure was made possible because weassumed only a single, isolated ﬂuorescent emitter. In a real biological sample, there can be several emitters at nearlythe same position, and the assumption of an isolated signal is thus often violated. However, if we only receive a signalof ﬁnite length from each emitter, in non-overlapping windows of time, their spatial proximity becomes irrelevant,and we can again determine the position of each emitter. In PALM, this temporal separation is made possible usingPA-FPs, which activate at different times, and turn off permanently after ﬁnite emission of ﬂuorescence. In this way,at most a few emitters should be activate at a given space-time location, and they can then be individually localized.Note that, using the procedure outlined above, each emitter will give rise to several localizations. To see why thisis true, assume again that a single ﬂuorescent emitter at position x sends out a (sufﬁciently bright) signal lasting intotal T seconds, and the frame acquisition time is ∆ seconds. We can then expect the signal to result in roughly T ∆ − estimates of x , all of which will be included in the sample as separate localizations. Depending on the totalﬂuorescence observed from the PA-FPs, and the camera framerate, this can lead to a large number of reappearances foreach PA-FP. It is natural to think that this problem can be solved by grouping localization that are close in space-time,but this is a difﬁcult problem in general. The localization uncertainty varies, and will occasionally take on large values.In addition, PA-FPs can enter temporary dark periods during which no ﬂuorescence is emitted for a prolonged periodof time. For these reasons, localizations arising from the same emitter can be relatively far removed from eachother,and can thus be easily confused with those arising from a nearby, or even nonexistent, emitter. Further complicatingsuch procedures is the lack of speciﬁc, apriori knowledge about the temporal behavior of the PA-FPs in the sample,making it hard to know how far in time it is reasonable to look for reappearances, or how many localizations to expectper grouping.In addition to reappearances, background noise will invariably affect the dataset. Each time ﬂuorescence is observedon the camera, it must be attributed as spurious background or coming from a PA-FP emission event, by means ofa separating threshold. Since we cannot set the threshold too high without losing the signal of real PA-FP, somebackground noise points will always be present in PALM recordings.3 REPRINT - F

EBRUARY

1, 2021

In this section we introduce the IBCpp family of models, which is a subset of clustered spatio-temporal point processeswith a particular spatio-temporal clustering structure that is natural for modeling of super resolution microscopy data.We consider some moment results and summary statistics of interest for IBCpps, and ﬁnally construct the PALM-IBCpp, which is a semiparametric IBCpp model tailored for PALM data.

In specifying the IBCpp model, there are a few layers to cover. It is most natural to start by deﬁning the spatio-temporalclusters, so let C = ( t x , G, { w i } i ∈ N , { (cid:15) i } i ∈ N ) be a collection of random objects associated with clustering. Here, t x is a random variable on R + referred to as the activation time , with marginal distribution denoted M (1) X . G is a ﬁniterandom variable on N , { w i } i ∈ N is a stochastic process on R + , and { (cid:15) i } i ∈ N is a collection of i.i.d. random variableson R d with radially symmetric, continuous density function h (cid:15) . We allow for dependence between the variables in ( t x , G, { w i } i ∈ N ) , but { (cid:15) i } i ∈ N and ( t x , G, { w i } i ∈ N ) are assumed independent. Using these quantities, we deﬁne the typical cluster at ( x, t x ) ∈ R d × R + , by the expression Y ( x,t x ) = { y i , t y i } Gi =1 = { x + (cid:15) i , t x + w i } Gi =1 (2)Since we will need clusters at countably many locations, let C k = ( t x k , G k , { w ki } i ∈ N , { (cid:15) ki } i ∈ N ) be deﬁned such thatthe collection { C k } k ∈ N consists of i.i.d. collections, each having the same distribution as C .Now, let ↓ X = { x i } ∞ i =1 be a point process on R d . Using C k , we mark each x k ∈ ↓ X with the associate t x k ∈ C k , sothat X = { x k , t x k } ∞ k =1 (3)and we refer to X as the protein process . Similarly, using again C k we deﬁne the blinking cluster at ( x k , t x k ) ∈ X via Y ( x k ,t xk ) = { x k + (cid:15) ki , t x k + w ki } G k i =1 (4)By combining X with its clusters, we obtain the blinking process by Z = (cid:91) ( x k ,t xk ) ∈ X Y ( x k ,t xk ) (5)Finally, let E = { e i , t e i } ∞ i =1 be a Poisson process (the noise process ) independent of Z , with intensity function on theform λ E ( e, t e ) = λ ↓ E ( t e ∈ [0 , b ]) b (6)for < b < ∞ , ≤ λ ↓ E < ∞ . Then, we deﬁne the Independent Blinking Cluster point process , O , by thesuperposition O = Z (cid:91) E (7)As a remark on the deﬁnition, note that including t x k both in the clusters, and as the marks of X , is done primarilybecause it gives X the nice interpretation of being a process of origins, both in time and space, from which theclusters grow. It also explains our choice of notation for the distribution of t x , M (1) X , which is then precisely the markdistribution of X . Throughout, let O be an IBCpp. Various simpliﬁcations of moment-based summary functions are available for O .Importantly, the mark correlation function takes a simple form. Although the estimation procedures of Section 4 canbe used on general ↓ X , the basis of these procedures is most clearly seen in the case where ↓ X is motion-invariant(stationary and rotation invariant) with well-deﬁned intensity and pair correlation function, so we will assume thathere, and refer to Appendix C for the general case. For clarity of exposition, we hide the derivations of the identitiesin this section - they can be seen in Appendix A. 4 REPRINT - F

EBRUARY

1, 2021Assume that G has ﬁnite ﬁrst and second moments, and that f is a symmetric function of 2 arrival times. We also(trivially) assume that we are not only looking at noise, so that λ ↓ X > . Set n c = E (cid:2) G (cid:3) E [ G ] − η = λ ↓ Z λ ↓ O The intensity and pair correlation functions are then easily found λ ↓ Z = λ ↓ X E [ G ] (8) λ ↓ O = λ ↓ Z + λ ↓ E (9) g ↓ O ( r ) = η (cid:0) g ↓ Z ( r ) − (cid:1) + 1 (10) g ↓ Z ( r ) = n c λ ↓ Z ( h (cid:15) ∗ h (cid:15) )( r ) + ( h (cid:15) ∗ g ↓ X )( r ) (11)with ( h (cid:15) ∗ h (cid:15) )( r ) = (cid:90) h (cid:15) ( o − x ) h (cid:15) ( o − x ) dx ( h (cid:15) ∗ g ↓ X )( r ) = (cid:90) (cid:90) g ↓ X ( || x − x || ) h (cid:15) ( o − x ) h (cid:15) ( o − x ) dx dx for arbitrary ( o , o ) satisfying || o − o || = r . For the mark correlation function, we consider ﬁrst Z , for which wehave k fZ ( r ) g ↓ Z ( r ) = γ ( f ) γ ( f ) n c λ ↓ Z ( h (cid:15) ∗ h (cid:15) )( r ) + ( h (cid:15) ∗ g ↓ X )( r ) (12)where γ ( f ) = (cid:90) (cid:90) f ( t , t ) dM (2) Y ( x ,tx ) ( t , t ) dM (1) X ( t x ) (13) γ ( f ) = (cid:90) (cid:90) f ( t , t ) dM (1) Z ( t ) dM (1) Z ( t ) (14)for arbitrary x , and f such that γ ( f ) > . More explicitly, we have γ ( f ) = E (cid:104)(cid:80) G ( i,j )=1 ( i (cid:54) = j ) f ( t x + w i , t x + w j ) (cid:105) E [ G ( G − (15) γ ( f ) = E (cid:104)(cid:80) Gi =1 (cid:80) G (cid:48) i =1 f ( t x + w i , t (cid:48) x + w (cid:48) j ) (cid:105) E [ G ] (16)where ( t (cid:48) x , G (cid:48) , { w (cid:48) j } ∞ j =1 ) is an independent copy of the typical cluster quantity ( t x , G, { w i } ∞ i =1 ) . Similarly, for theobserved IBCpp, O , we have k fO ( r ) g ↓ O ( r ) = η γ ( f ) γ O ( f ) (cid:104) k fZ ( r ) g ↓ Z ( r ) − (cid:105) + 1 (17)where γ O ( f ) = η γ ( f ) + (1 − η ) γ E ( f ) + 2(1 − η ) ηγ EZ ( f ) γ E ( f ) = (cid:90) (cid:90) f ( t , t ) dM (1) E ( t ) dM (1) E ( t ) γ EZ ( f ) = (cid:90) (cid:90) f ( t , t ) dM (1) E ( t ) dM (1) Z ( t ) for f with γ O ( f ) > . Expanding the above, we see that γ O ( f ) k fO ( r ) g ↓ O ( r ) = γ ( f ) (cid:20) ηλ ↓ O n c ( h (cid:15) ∗ h (cid:15) )( r ) (cid:21) + γ ( f ) (cid:2) η (( h (cid:15) ∗ g ↓ X )( r ) − (cid:3) + γ O ( f ) (18)5 REPRINT - F

EBRUARY

1, 2021And we have the important alternative characterization γ O ( f ) k fO ( r ) g ↓ O ( r ) = ( γ ( f ) − γ ( f )) (cid:20) ηλ ↓ O n c ( h (cid:15) ∗ h (cid:15) )( r ) (cid:21) + γ ( f ) (cid:2) g ↓ O ( r ) − (cid:3) + γ O ( f ) (19)obtained by simply combining (18) with (10) and (11). In (19), the inﬂuence of the protein locations, ↓ X , is entirelycontained in g ↓ O , which can be estimated nonparametrically. This will allow us to estimate the kinetic blinking rateswithout specifying a model for the proteins in Section 4. In order to use the IBCpp family in practice, we get more speciﬁc about the construction of the typical cluster, andthe noise process. The choices we make here are based on realistic models for ﬂuorophore photophysics, cameradiscretization effects, and localization errors, and lead to the PALM-IBCpp model. In the following, we will assumethat the spatial dimension is d = 2 everywhere. Of course, our methods can still be used for 3D-PALM recordingsby simply discarding the z -coordinates before estimation. The reason we do not directly model 3D data is that thelocalization uncertainty is generally asymmetrical in the z dimension (Shtengel et al. 2009), and a radially symmetricnoise proﬁle is then no longer a valid assumption.Dealing ﬁrst with the spatial components, we need to specify the localization error density, h (cid:15) . On the camera, a pointsource of light appears as a blurry spot, the shape of which is described by the PSF. We model the PSF associatedwith each localization using a symmetric Gaussian density function with variance ξ . As ξ depends on the number ofphotons detected, and various other stochastic nuisance factors that can vary over time and space, it makes sense totreat ξ as random. As a simple approximation, we assume that ξ is drawn i.i.d. for each observation. Thus, denotingby N (0 , ξ ) the density of the centered, symmetric Gaussian distribution in R with variance ξ , we set h (cid:15) ( x ) = E (cid:2) N (0 , ξ )( x ) (cid:3) (20)where the mean is taken with respect to ξ . The use of Gaussian approximate PSFs is standard practice, and generallyprovides highly accurate results (Zhang et al. 2007).Moving on to G and the temporal behavior of clusters, we take as basis a well-established 4-state model for continuoustime ﬂuorophore behavior (Grifﬁ´e et al. 2020, Rollins et al. 2015, Coltharp et al. 2012). We imagine the PA-FP areindependently following a Markov processes, with a single ﬂuorescent state (F), and 3 non-ﬂuorescent states, seeFigure 1. The processes always begins in the inactive state (I), and eventually moves to the (F) state. From here, it caneither go dark (D) temporarily, or permanently photobleach (B).Unfortunately, we cannot observe the process in continuous time. To describe the ﬂuorescent signal that is ultimatelyobserved from a PA-FP during the experiment, write ∆ for the length of 1 camera frame, and e ( t ) = (cid:26) if the PA-FP is in state F at time t otherwiseIf no ﬂuorescence was missed, we would then observe the discretized signal on camera frame k precisely when e ( t ) = 1 for any duration on frame k . In practice, there is a non-zero threshold on the amount of signal that must beobserved during a given integration length period, but this threshold is generally set low when recording PALM imagesin practice (Patel et al. 2019), so we ignore it and simply model the activity on frame k with the idealized expression ˜ e ( k ) = (0 , ∆] (cid:32)(cid:90) ∆ k ∆( k − e ( t ) dt (cid:33) and the observed timepoints are then k ∆ for k with ˜ e ( k ) = 1 . Thinking of t x as the temporal origin of a cluster, wetherefore specify the cluster quantities ( t x , G, { w i } ∞ i =1 ) as G = ∞ (cid:88) k =1 ˜ e ( k ) (21) t x = ∆ min { s : s (cid:88) k =1 ˜ e ( k ) = 1 } (22) w i = (cid:26) ∆ min { s : (cid:80) sk =1 ˜ e ( k ) = i } − t x if ≤ i ≤ G ∞ otherwise (23)where t x , w i , and G are understood to be computed from the same trace, e ( t ) , i.e. they are dependent. In this way,the timepoints of a typical cluster correspond precisely to the discretized signal in ˜ e ( k ) , see Figure 2. Finally, for thenoise process, we simply need to specify b as the time at which the PALM recording was terminated.6 REPRINT - F

EBRUARY

1, 2021

I F BDr F r R r D r B Figure 1: The transition diagram for the continuous time, photophysical model of ﬂuorophores. Transitions are Marko-vian, with rates indicated next to the relevant transition arrows.Figure 2: Camera discretization transforms the continuous process e ( t ) (in black) into the discrete process ˜ e ( k ) (inred). The observed timepoints are k ∆ for k with ˜ e ( k ) = 1 ; in this example there are such k , observed on frames { , , , , , } , and we thus have G = 6 , t x = ∆ , { w i } ∞ i =1 = (0 , ∆ , , , , , ∞ , ∞ , ... ) . In the following, let O = { o i , t o i } i ∈ N be a PALM-IBCpp model, observed with N points in the window W × [0 , b ] ⊂ R × R + . There are several ways in which estimation of parameters of O could proceed. Here, we suggest an approachthat estimates the cluster term auto-convolution, ( h (cid:15) ∗ h (cid:15) ) , and the ’signal fraction’, η , before the kinetic rates. We willuse approximate expressions for the theoretical moment quantities used in this section. The approximations are basedon simpliﬁed discretization assumptions, and are asymptotically exact as ∆ → ; we refer to Appendix B for details.For clarity of exposition, we describe our methods here on the assumption that ↓ X be motion-invariant, but we pointout again that the same methods can be used on general ↓ X without modiﬁcation. We will be using moment methods, based on second-order statistics of the location and time behavior of the process.We write d ( t , t ) = t + t P u ( t , t ) = ( | t − t | ≤ u ) and we set S fO ( r ) = γ O ( f ) k fO ( r ) g ↓ O ( r ) (24)where we may estimate γ O ( f ) k fO as the (unnormalized) mark correlation function of O , and g ↓ O is estimated also from O . Both quantities can be obtained using standard kernel estimators - for this work, we have used the implementationsin the R package Spatstat (Baddeley et al. 2015). Combining two such estimators we get an estimate for S fO , whichwe will denote ˆ S fO . In the following, we will work with S fO for f = P u for various u , and we assume that we have theestimate ˆ S fO ( r ) for r in a set R = { r , .., r m } . Standard kernel estimator implementations have reasonable defaultsfor the set of r -values, but we describe a rule for selecting a set that is more appropriate for our needs in Appendix D.7 REPRINT - F

EBRUARY

1, 2021

Localization software outputs an uncertainty for each localized emitter, and we will use this output to ﬁnd the clusterautoconvolution, (cid:100) ( h (cid:15) ∗ h (cid:15) ) . Recall that a PA-FP at location x ∈ R is observed with noise, so that the observedlocalization is instead given by x + (cid:15) , where the distribution of (cid:15) does not depend on x or any other observation in thesample. Given the deﬁnition of h (cid:15) in (20), we condition on ( ξ , ξ ) to obtain ( h (cid:15) ∗ h (cid:15) )( r ) = (cid:90) h (cid:15) ( o − x ) h (cid:15) ( o − x ) dx = E  e − r ξ ξ π ( ξ + ξ )  for || o − o || = r . Since our dataset includes an estimate of the localization uncertainty for each x i , we estimate theautoconvolution by (cid:100) ( h (cid:15) ∗ h (cid:15) )( r ) = E  e − r ξ ξ π ( ˆ ξ + ˆ ξ )  (25)where ˆ ξ and ˆ ξ are drawn from the observed distribution of localization uncertainties. Although there are generally only a small fraction of background noise points in a given region of interest (ROI), theway standard PALM imaging is conducted allows us to obtain a better estimate than the optimistic ˆ η = 1 . Typically,a part of bare imaging coverslip will be included in such a PALM recording, both because it adds a reference for thelocation of the ROI, and because the imaging window may be larger than the cell being imaged, see Figure . Weassume that the only ﬂuorescent signal coming from the bare part of the coverslip is attributed to background noise,and that the intensity of the noise process is constant across the imaging window.Let ↓ O noise be the noise process locations observed in a bare part of the coverslip. Under the assumptions above, wethen have that O noise is a stationary Poisson process with intensity λ ↓ E . Write W noise for the region in which O noise is observed. The standard estimator of λ ↓ E is then ˆ λ ↓ E = | ( O noise ∩ W noise ) || W noise | where | ( O noise ∩ W noise ) | denotes the number of points O noise has in W noise , and | W noise | is the area of W noise .Using this estimator of λ ↓ E , we obtain ˆ η = 1 − ˆ λ ↓ E ˆ λ ↓ O (26)where ˆ λ ↓ O is the estimated intensity of O . As a basis for estimation of blinking rates, ˆ S P u O is ﬁrst computed for u in U ⊂ R + . A good choice for the set U isdiscussed in Appendix D. We will also need an estimate of γ O ( P u ) - the standard estimator is obtained by setting ˆ γ O ( P u ) = 1 N ( N − (cid:54) = (cid:88) i,j P u ( t o i , t o j ) where the sum is over all distinct pairs of timepoints in the observation window (Gelfand et al. 2010, p. 393).In addition, we will need ˆ γ ( P u ) . From (14) we see that γ ( f ) is simply the mean of f under 2 arrival times indepen-dently following the 1-point mark distribution of Z , M (1) Z . Thus, by estimating M (1) Z , it is trivial to obtain γ ( f ) forany f . To do this, write M (1) Z ( u ) = (cid:82) u dM (1) Z ( x ) for the CDF associated with M (1) Z , and M (1) O ( u ) similarly the CDFassociated with the 1-point mark distribution of O , M (1) O . Then, it is easy to show that (see Appendix A) M (1) O ( u ) = ηM (1) Z ( u ) + (1 − η ) ub REPRINT - F

EBRUARY

1, 2021Using ˆ η and replacing M (1) O ( u ) with the observed arrival time CDF, we obtain ˆ M (1) Z ( u ) = N − (cid:80) Ni =1 ( t o i ≤ u ) − (1 − ˆ η ) ub ˆ η (27)from which ˆ γ ( P u ) is obtained numerically or by sampling. Now, write ζ u = ( γ ( P u ) − γ ( P u )) n c and, solving a minimum contrast problem cf. the theoretical form of S P u O in (19) with estimated quantities in place oftheoretical, we deﬁne the estimator ˆ ζ u = ˆ λ ↓ O ˆ η (cid:80) r ∈ R (cid:104) ˆ S P u O ( r ) − ˆ γ ( P u )(ˆ g ↓ O ( r ) − − ˆ γ O ( P u ) (cid:105) (cid:104) (cid:100) ( h (cid:15) ∗ h (cid:15) )( r ) (cid:105)(cid:80) r ∈ R (cid:104) (cid:100) ( h (cid:15) ∗ h (cid:15) )( r ) (cid:105) (28)for u ∈ U .Using ˆ ζ u , we set up the following minimization problem for 3 of the rates in the 4-state model: min r D ,r R ,r B (cid:88) u ∈ U (ˆ ζ u − ( γ ( P u ) − ˆ γ ( P u )) n c ) (29)where the rates control the values of γ ( P u ) and n c . As we will see below, r F has no impact on the minimizationproblem, and therefore does not appear. To optimize over the rates, we need expressions for γ ( P u ) and n c . Closed-form expressions are not available due to complicated discretization effects, and simulation is generally too prohibitiveto be used as part of an optimization regime. For these reasons, we will employ various approximations, we againrefer to Appendix B for their derivations.Deﬁne the following quantities, associated with the behavior of a PA-FP following the 4-state model p = r B r D + r B N b ∼ Geom ( p ) W F ∼ Exp ( r D + r B ) W D ∼ Exp ( r R ) W I ∼ Exp ( r F ) φ R ( v ) = E (cid:2) e ivW R (cid:3) φ F ( v ) = E (cid:2) e ivW F (cid:3) φ ( F + R ) ( v ) = E (cid:2) e ivW F (cid:3) E (cid:2) e ivW R (cid:3) where by Geom we mean a geometric distribution starting from 1. Then, recall again that ∆ is the frame integrationlength, and denote A ( v ) = 2 E [ N b ] (cid:16) φ F ( v ) e − iv ∆ + (cid:16) E [ W F ]∆ − (cid:17) ( e − ∆ iv − − (cid:17) (1 − e − ∆ iv ) B ( v ) = φ R ( v ) (cid:0) E (cid:2) φ ( F + R ) ( v ) N b (cid:3) − − E [ N b ] ( φ ( F + R ) ( v ) − (cid:1) C ( v ) = 2 e − iv ∆2 (1 − e − ∆ iv ) (cid:32) φ F ( v ) e iv ∆ − φ ( F + R ) ( v ) − (cid:33) D = E (cid:2) N b (cid:3) (cid:18) E [ W F ]∆ + 12 (cid:19) + E [ N b ] (cid:34) E (cid:2) W F (cid:3) − E [ W F ] ∆ − E [ W F ]∆ − (cid:35) and the CDF u (cid:55)→ γ ( P u ) then has associated characteristic function given as approximately φ ( v ) ≈ A ( v ) + B ( v ) C ( v ) D (30)and we can obtain γ ( P u ) by an inversion of φ ( v ) . Since the approximations are based on continuous random variables,we use the simple numerical inversion scheme of (Davies 1973) for continuous distributions.9 REPRINT - F

EBRUARY

1, 2021For n c , we base our expression on an approximate distribution for G . We have E [ G ] ≈ E [ N b ] (cid:18) E [ W F ]∆ + 1 (cid:19) − E [ N b − µ R E (cid:2) G (cid:3) ≈ E (cid:2) N b (cid:3) (cid:18) E [ W F ]∆ + 1 (cid:19) + E [ N b ] E (cid:2) W F (cid:3) − E [ W F ] ∆ + E (cid:2) ( N b − (cid:3) ( µ R ) + E [ N b − (cid:0) µ R − ( µ R ) (cid:1) − E [ N b ( N b − (cid:18) E [ W F ]∆ + 1 (cid:19) µ R with µ R = (cid:90) (1 − x ) dP WR ∆ ( x ) µ R = (cid:90) (1 − x ) dP WR ∆ ( x ) where P WR ∆ denotes the distribution of the scaled random variable W R ∆ . Now, since n c = E (cid:2) G (cid:3) E [ G ] − we obtain ˆ n c by substituting in the moments.Minimizing (29) is done numerically, and results in estimates of ( r D , r R , r B ) . The minimization problem is well-behaved and requires little supervision, but upper bounds can be utilized to speed up the process, and to avoid theexploration of extreme kinetic rates for which the approximations above can break down. Since r F cannot be estimatedin this way, we consider the relation E [ W I ] ≈ γ ( d ) − A − B where A = E [ W F ] + E [ W F ] + E [ W F ]∆ + B = ( E [ W F ]∆ + )( E [ N b ( N b − E [ W F ] + E [ W R ]) + E [ N b ] ∆ ) E [ N b ] ( E [ W F ]∆ + ) Write ˆ A and ˆ B for the quantities A and B computed using the estimated (ˆ r D , ˆ r R , ˆ r B ) . Finally, by estimating γ ( d ) directly from the observed timepoints and ˆ η , we obtain an estimator for r F as ˆ r F = (cid:32) N (cid:80) Ni =1 t o i − (1 − ˆ η ) b ˆ η − ˆ A − ˆ B (cid:33) − (31)So far, we have silently assumed that all PA-FP have been imaged by the end of the experiment, that is, within thetime window [0 , b ] . In practice, a recording is often allowed to run for a predetermined period of time, say 30-60minutes, so this assumption is rarely true. Barring edge effects, this changes nothing about the way estimation shouldbe carried out, or the validity of the estimated ( η, r D , r R , r B ) and derived quantities, but the interpretation of ˆ r F asthe rate of W I ∼ Exp ( r F ) no longer holds. Rather, ˆ r − F is then estimating the mean of the conditional distribution ( W I | W I < b ) . A corrected estimate can be found by numerically equating this mean with its theoretical counterpart,i.e. solving e r cF b − r cF b − r cF ( e r cF b − − ˆ r − F = 0 in r cF . Among other things, this corrected estimate can be used to estimate the total number of proteins that wouldhave been imaged in the ROI W , had the experiment been allowed to run to completion. If ˆ r cF is the corrected estimateand N b = ˆ η ˆ λ ↓ O | W | (cid:100) E [ G ] REPRINT - F

EBRUARY

1, 2021Figure 3: The full dataset (left) with green region of interest magniﬁed (center). The blue regions were used forestimating the intensity of the noise process. After discarding the ﬁrst 300 seconds of the recording, a reduced datasetis obtained (right).Figure 4: The observed data (center) with simulations from the ﬁtted PALM-IBCpp on the left and right. The arrivaltimes are plotted against the x-axis, illustrating the spatio-temporal clustering behavior.is the estimated number of proteins imaged in W during the time interval [0 , b ] , we get the estimated total number ofproteins in W (hidden and observed) as N ∞ = N b − e − ˆ r cF b (32) We consider data from a 2D-PALM experiment. The sample is a Jurkat T cell which has been activated by, and formsynapses against, antibody coated glass. It is expressing LAT-mEos3.2, and was imaged by TIRFM. The dataset wasrecorded at a framerate of ∆ − = 25 hz , and was then resolved and corrected for drift using ThunderSTORM (Ovesn`yet al. 2014). Finally, the frame numbers were divided by the framerate, so that the arrival times are measured inseconds.For analysis we ﬁrst subset out a region of interest of manageable size. We further subset out 2 large regions from thecoverslip outside the cell, which were used for estimation of η , see Figure 3. In addition, since the localization errorswere slightly larger in the beginning of the experiment, we removed the ﬁrst 300 seconds of the recording, at whichtime the errors had stabilized. The resulting sample had 3924 localizations in the ROI, and 1097 localizations in the(combined) noise region. After removing the ﬁrst 300 seconds, all timepoints had 300 seconds subtracted, so that thenew sample begins at time 0, and r F can be correctly estimated. Note that the relative amount of background noisepoints will be higher in the later parts of the experiment, where most PA-FPs have activated and bleached. In our case,11 REPRINT - F

EBRUARY

1, 2021we have ˆ η = 0 . for the subset data, whereas for the entire ROI we obtain ˆ η = 0 . . This difference is important,and the latter estimate should be used if subsequent analysis is carried out on the entire ROI.We ﬁt the PALM-IBCpp model to the reduced sample, the results of which can be seen in Table 1, where we alsoinclude various important descriptors related to the blinking behavior, and results from a simulation reﬁtting study.Simulations from the ﬁtted model can be seen in Figure 4. The estimated parameters indicate that most of the clusteringin this ROI can be explained by blinking clusters, so in simulations we have chosen a homogeneous Poisson processfor X , conditioned on there being | X | = 3924 · . (cid:100) E [ G ] ≈ real, underlying proteins. The localization uncertainties were drawn i.i.d. from the observed uncertainties.Est Avg Sd η r F · r B r D r R E [ G ] p q . q . q . q . simulations from the ﬁtted model. Included derived quantitiesare: the mean number of reappearances, E [ G ] , the bleaching probability, p , and the (0 . , . , . , . -quantiles ( q . , q . , q . , q . ) of the total PA-FP lifetime distribution (time in seconds from activation to bleaching). We evaluate the performance of our estimation methods under different protein distributions and kinetic blinking rates.We will also consider what happens when the blinking model is misspeciﬁed. We consider 3 different cases of proteindistributions: complete spatial randomness (CSR), spherical clustering, and ﬁbrous structures, see Figure 5. We ﬁx thenumber of proteins at | X | = 500 for all simulations, with localization uncertainties drawn i.i.d. from the Gamma(6.5,0.375) distribution (shape and rate parameterization), which is the maximum likelihood ﬁt to the observed uncertaintiesin the real data example of Section 5, and we consider η = 1 known. For the ﬁbers, ↓ X is neither stationary nor rotationinvariant.For the kinetic rates, we consider short and long lived PA-FP. Additionally, in a misspeciﬁed case, we use a modelwith 3 distinct dark states, each selected with the same probability, but with very different holding time distributions.For the values of the kinetic rates in the 3 PA-FP models, see Figure 6. We simulated realizations from eachcombination of spatial organization and blinking behavior, and discretized signals according to a framerate of 25hz.The results of the simulation study can be seen in Table 2. For the short and long lived PA-FP, we see that thereis close correspondence between the true parameter values and their estimates, especially for the smaller rates andall derived blinking statistics. The mean number of reappearances is well estimated, as is the bleaching probability, p , and the total lifetime quantiles. Some bias appears to exist for the dark-state return rate, r D , which also has thehighest uncertainty of the rates. This is likely due to the bias in the utilized approximations for low framerate to rateratios. Importantly, for the misspeciﬁed 3 dark-states model, the number of reappearances and the lifetime quantilesare again well estimated. Unsurprisingly, both r B and p are biased in this case, as the model attempts to ﬁt to anaverage blinking cycle, and cannot exactly capture the nuances of having 3 different dark states. Overall, the effect ofthe protein distribution is small compared to the effect of different PA-FP models, with a slight increase in variancefor more clustered conditions. 12 REPRINT - F

EBRUARY

1, 2021Figure 5: Typical simulations from each of the 3 protein conﬁgurations (CSR, clusters, ﬁbers) in the columns, beforeand after adding blinking clusters in the top and bottom rows, respectively. The CSR data is simulated as 500 i.i.d.uniform points in the ROI. The clusters data consists of 100 CSR points and further 20 uniformly located Gaussianclusters with standard deviation , each having 20 points. Finally, for the ﬁber data, 450 points are sampled uniformlyalong the edges of a ﬁxed ﬁber structure, and 50 CSR points are added to the background. FI BD (a) Short lived PA-FP.

FI BD (b) Long lived PA-FP.

FI BD D D

44 1 100.0040.25 4 2.5 (c) 3 dark-state PA-FP.

Figure 6: The three models of PA-FP photophysics considered in simulations.13

REPRINT - F

EBRUARY

1, 2021CSRShort lived Long lived 3 dark-statesTruth Avg Sd Truth Avg Sd Truth Avg Sd r F · r B r D r R E [ G ] p q . q . q . q . r F · r B r D r R E [ G ] p q . q . q . q . r F · r B r D r R E [ G ] p q . q . q . q . REPRINT - F

EBRUARY

1, 2021

In the present paper, we have established the IBCpp family of spatio-temporal clustered point processes, and we haveprovided results on the mark correlation functions for these models. Using an IBCpp particularly well-tailored toPALM data, the PALM-IBCpp, we described a procedure for estimation of parameters, allowing for a quantiﬁcationof data artifacts. We considered a real dataset expressing LAT-mEos3.2 PA-FP, where the PALM-IBCpp model ﬁtresulted in a close reconstruction of the spatio-temporal clustering behavior. From the reﬁtting and simulation studieswe saw that we can precisely estimate kinetic blinking rates under a variety of different protein and blinking conditions,and further provide valuable information about the number of reappearances and total lifetimes of PA-FP, even whenthe blinking model is misspeciﬁed.The special structure of the mark correlation function in the IBCpp family allows the moment-based approach to becarried out on any arbitrary ROI, without having to specify a model for the proteins. This is perhaps the most importantfeature of our method, because it guarantees that the estimated kinetic rates are directly applicable to the ROI beinganalyzed. The well-known sensitivity of PA-FP photodynamics to the experimental conditions (Annibale et al. 2011 b ,Staudt et al. 2020) means that kinetic rates obtained via a calibration sample may not be entirely relevant in anothersample, further emphasizing the importance of being able to directly estimate data artifacts from a given ROI.As our focus has been on estimation on data artifacts, we opted for a simpliﬁed disretization scheme and kineticphotoswitching model in the PALM-IBCpp, and we used an estimation scheme based on functional data summariesand moment methods. Such tools are unlikely to be sufﬁcient for ﬁtting more complex descriptions of photoblinking,but on the other hand are fast to compute, and involve only low-dimensional optimization problems due to the step-wise procedure. As a result, it is possible to quickly analyze many ROI as part of a larger analysis, or for lowering thevariance of the kinetic rate estimates by means of averaging. Acknowledgements

This work was supported by the Centre for Stochastic Geometry and Advanced Bioimaging, funded by grant 8721from the Villum Foundation. We acknowledge the use of the Nikon Imaging Facility (NIC) at King’s College Londonfor data acquisition.

Appendix A Moment results for IBCpp models

Let O be an IBCpp with motion-invariant protein process ↓ X . Deriving the results of Section 3.2 is perhaps mosteasily done by taking as starting point the f -weighted second factorial moment measure , α (2) f , given as α (2) f ( A ) = E  (cid:54) = (cid:88) ( o ,t o ) , ( o ,t o ) ∈ ↓ O A ( o , o ) f ( t o , t o )  (A.1)for A ∈ R d × R d a Borel set. By use of a Cambell theorem we obtain α (2) f ( A ) = (cid:90) A (cid:20)(cid:90) f ( t o , t o ) dM (2) O | ( o ,o ) ( t o , t o ) (cid:21) dα (2) O ( o , o ) (A.2)so that, comparing the above with the deﬁnition of the mark correlation function, we get the alternative characterization k fO ( o , o ) = 1 (cid:82) (cid:82) f ( t o , t o ) dM (1) O | o ( t o ) dM (1) O | o ( t o ) ∂α (2) f ∂α (2) O ( o , o ) (A.3)15 REPRINT - F

EBRUARY

1, 2021and we need merely compute the involved factors. We ﬁrst compute the 1-point mark distributions. Let A ⊂ R d and B ⊂ R + be Borel sets, then we obtain Λ O ( A × B ) = E  (cid:88) ( x,t x ) ∈ X (cid:88) ( y,t y ) ∈ Y ( x,tx ) A ( y ) B ( t y )  + E  (cid:88) ( e,t e ) ∈ E A ( e ) B ( t e )  (A.4) = E  (cid:88) ( x,t x ) ∈ X | Y ( x,tx ) | (cid:88) i =1 A ( x + (cid:15) i ) B ( t x + w i )  + E  (cid:88) ( e,t e ) ∈ E A ( e ) B ( t e )  (A.5) = (cid:82) E (cid:104)(cid:80) Gi =1 B ( t x + w i ) (cid:105) dM (1) X ( t x ) E [ G ] (cid:90) A λ ↓ Z dz + (cid:90) B ( t ∈ [0 , b ]) b dt (cid:90) A λ ↓ E de (A.6) = (cid:90) A  η (cid:82) E (cid:104)(cid:80) Gi =1 B ( t x + w i ) (cid:105) dM (1) X ( t x ) E [ G ] + (1 − η ) (cid:90) B ( t ∈ [0 , b ]) b dt  d Λ O ( o ) (A.7)where we averaged out the (cid:15) i , w i , and t x by conditioning on ↓ X , when going from the second to the third line. Fromthe above, we see that all involved mark distributions are independent of locations, with M (1) Z ( B ) = (cid:82) E (cid:104)(cid:80) Gi =1 B ( t x + w i ) (cid:105) dM (1) X ( t x ) E [ G ] (A.8) M (1) E ( B ) = (cid:90) B ( t ∈ [0 , b ]) b dt (A.9) M (1) O ( B ) = ηM (1) Z ( B ) + (1 − η ) M (1) E ( B ) (A.10)based on which the normalization for k fO is easily found (cid:90) (cid:90) f ( t o , t o ) dM (1) O ( t o ) dM (1) O ( t o ) = η γ ( f ) + (1 − η ) γ E ( f ) + 2(1 − η ) ηγ EZ ( f ) = γ O ( f ) (A.11)Next, we consider the second factorial moment measure of the typical cluster, α (2) Y ( x,tx ) , which will be needed below.For arbitrary ( x, t x ) and Borel sets A ⊂ R d × R d , B ⊂ R + × R + , we have α (2) Y ( x,tx ) ( A × B ) = E  G (cid:88) ( i,j )=1 ( i (cid:54) = j ) A ( x + (cid:15) i , x + (cid:15) j ) B ( t x + w i , t x + w j )  (A.12) = E  G (cid:88) ( i,j )=1 ( i (cid:54) = j ) B ( t x + w i , t x + w j )  (cid:90) A h (cid:15) ( x − x ) h (cid:15) ( x − x ) d ( x , x ) (A.13)obtained by averaging out the (cid:15) i by conditioning on ( t x , G, { w i } i ∈ N ) , from which follows α (2) ↓ Y ( x,tx ) ( A ) = E [ G ( G − (cid:90) A h (cid:15) ( x − x ) h (cid:15) ( x − x ) d ( x , x ) (A.14) M (2) Y ( x,tx ) = E (cid:104)(cid:80) G ( i,j )=1 ( i (cid:54) = j ) B ( t x + w i , t x + w j ) (cid:105) E [ G ( G − (A.15)and we see that M (2) Y ( x,tx ) is independent of x and, using a standard proof, we further obtain the identity in (15). Theidentity in (16) is obtained similarly, considering instead (Λ Y ( x,tx ) ) . Finally, for the density of α (2) f , we split the16 REPRINT - F

EBRUARY

1, 2021summation according to the process memberships of each pair: α (2) f ( A ) = E  (cid:88) ( x,t x ) ∈ X | Y ( x,tx ) | (cid:88) ( i,j )=1 ( i (cid:54) = j ) A ( x + (cid:15) i , x + (cid:15) j ) f ( t x + w i , t x + w j )  (A.16) + E  (cid:54) = (cid:88) ( x ,t x ) , ( x ,t x ) ∈ X | Y ( x ,tx | (cid:88) i =1 | Y ( x ,tx | (cid:88) j =1 A ( x + (cid:15) i , x + (cid:15) (cid:48) j ) f ( t x + w i , t x + w (cid:48) j )  (A.17) + E  (cid:88) ( z,t z ) ∈ Z (cid:88) ( e,t e ) ∈ E A ( z, e ) f ( t z , t e )  + E  (cid:88) ( z,t z ) ∈ Z (cid:88) ( e,t e ) ∈ E A ( e, z ) f ( t e , t z )  (A.18) + E  (cid:54) = (cid:88) ( e ,t e ) , ( e ,t e ) ∈ E A ( e , e ) f ( t e , t e )  (A.19)Using (A.14) and (A.15), recalling that E is a Poisson process independent of Z , and using that f is symmetrical, wesee that α (2) f ( A ) = γ ( f ) (cid:90) α (2) ↓ Y (0 , ( A − ( x, x )) λ ↓ X dx (A.20) + γ ( f ) (cid:90) (Λ ↓ Y , ) ( A − ( x , x )) dα (2) ↓ X ( x , x ) (A.21) + 2 γ EZ ( f ) (cid:90) A λ ↓ Z λ ↓ E d ( o , o ) (A.22) + γ E ( f ) (cid:90) A λ ↓ E d ( o , o ) (A.23) = γ ( f ) E [ G ( G − λ ↓ X (cid:90) A (cid:90) h (cid:15) ( o − x ) h (cid:15) ( o − x ) dxd ( o , o ) (A.24) + γ ( f ) λ ↓ X E [ G ] (cid:90) A (cid:90) g ↓ X ( || x − x || ) h ( o − x ) h ( o − x ) d ( x , x ) d ( o , o ) (A.25) + 2 γ EZ ( f ) λ ↓ Z λ ↓ E (cid:90) A d ( o , o ) (A.26) + γ E ( f ) λ ↓ E (cid:90) A d ( o , o ) (A.27)Write m for the Lebesgue measure on R d . Then, using the rotational symmetry of h (cid:15) and g ↓ X , it follows that ∂α (2) f ∂m ( o , o ) depends only on r = || o − o || , and ∂α (2) f ∂m ( r ) = γ ( f ) n c λ ↓ Z ( h (cid:15) ∗ h (cid:15) )( r ) + γ ( f ) λ ↓ Z ( h (cid:15) ∗ g ↓ X )( r ) + 2 γ EZ ( f ) λ ↓ Z λ ↓ E + γ E ( f ) λ ↓ E (A.28)and in particular γ O ( f ) g ↓ O k fO ( r ) = λ − ↓ O ∂α (2) f ∂m ( r ) (A.29) = γ ( f ) ηλ ↓ O n c ( h (cid:15) ∗ h (cid:15) )( r ) + γ ( f ) η ( h (cid:15) ∗ g ↓ X )( r ) + 2 γ EZ ( f ) η (1 − η ) + γ E ( f )(1 − η ) (A.30) = γ ( f ) ηλ ↓ O n c ( h (cid:15) ∗ h (cid:15) )( r ) + γ ( f ) η (( h (cid:15) ∗ g ↓ X )( r ) −

1) + γ O ( f ) (A.31)which is (18), from which the remaining results of Section 3.2 follow easily as special cases of f and η .17 REPRINT - F

EBRUARY

1, 2021

Appendix B Approximate discretized statistics

B.1 Approximate φ ( v ) The mean value to compute is formally φ ( v ) := E (cid:104)(cid:80) (cid:54) = j ,j ∈{ ,..,G } e iv | m j − m j | (cid:105) E [ G ( G − (B.1)where we have dropped the heavier notation of timepoints above, so that ( m j , m j ) are arrival times j and j inthe typical blinking cluster. Denote by N b the number of F -state visits (number of blinks), and by F s the observedtimepoints between the entrance to the s ’th and ( s +1) ’th F -state visits for s < N b , and F N b are all observed timepointsafter the last entrance to the F -state. Below, we will assume that m j < m j for ( j > j ) when ( m j , m j ) ∈ F s ,that is, we have the arrival times sorted within each F s set. We can split the summation according to whether m j and m j are from the same F s , and otherwise how many F -state visits are separating them. Thus φ ( v ) = E (cid:104)(cid:80) N b s =1 (cid:80) (cid:54) =( m j ,m j ) ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − (B.2) + E (cid:104)(cid:80) N b s =1 (cid:80) N b s =1 ( s (cid:54) = s ) (cid:80) m j ∈ F s (cid:80) m j ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − (B.3)To compute these terms, referred to as the ”non-separated” and ”separated” terms, respectively, we write the involvedquantities in terms of a continuous part, and an error part, and demonstrate that the errors vanish asymptotically, andin particular can be ignored for a given framerate as a valid approximation.First, we consider the number of timepoints in F s , | F s | . Since only those frames that do not fully overlap the signalfrom the s ’th F -visit (of which there are at most 2) cause discretization effects, we can write | F s | = W F s ∆ + E Fs (B.4)where W F s is the waiting time that was spent on the s ’th visit to the F state, and E Fs is an error term with P ( E Fs ∈ ( − , (B.5)and in particular, we obtain for G G = N b (cid:88) s =1 | F s | = N b (cid:88) s =1 W F s ∆ + N b (cid:88) s =1 E Fs (B.6)Next, consider the inner sum from the non-separated term: (cid:54) = (cid:88) ( m j ,m j ) ∈ F s e iv | m j − m j | (B.7)Here, note that the ﬁrst observed timepoint in F s , m s , can be written as m s = E m s + W I + s − (cid:88) k =1 ( W F k + W R k ) (B.8)since there is always a waiting time of W I spent in the inactive state, and ( s − visits in and out of the F state beforethe s ’th visit. Here, E m s is again a discretization error, with magnitude P ( E m s ∈ (0 , ∆)) = 1 (B.9)Since each member of F s is a whole number of ∆ -increments away from m s , this in particular means that, for ( m j , m j ) ∈ F s with j > j : | m j − m j | = ( j − j )∆ (B.10)18 REPRINT - F

EBRUARY

1, 2021and any discretization effects, and the time spent in the I -state, can be seen to disappear here. We can now expand onthe non-separate term enumerator: E  N b (cid:88) s =1 (cid:54) = (cid:88) ( m j ,m j ) ∈ F s e iv | m j − m j |  = (B.11) E  N b (cid:88) s =1 (cid:54) = (cid:88) ( j ,j ) ∈{ , ,.., | F s |} e iv ( j ∨ j − j ∧ j )∆  = (B.12) E  N b (cid:88) s =1 | F s |− (cid:88) j =1 ( | F s | − j ) e ivj ∆  = (B.13) E (cid:34) N b (cid:88) s =1 e iv ∆( | F s |− + e − iv ∆ ( | F s | − − | F s | ( e − iv ∆ − (cid:35) = (B.14) E (cid:34) N b (cid:88) s =1 e ivW Fs e iv ∆( E Fs − + e − iv ∆ ( W Fs ∆ + E Fs − − W Fs ∆ − E Fs ( e − iv ∆ − (cid:35) (B.15)At this point, consider what happens in the limit as ∆ → for the complete non-separate term: lim ∆ → E (cid:104)(cid:80) N b s =1 (cid:80) (cid:54) =( m j ,m j ) ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − (B.16) E (cid:104) lim ∆ → (cid:80) N b s =1 e ivW Fs e iv ∆( E Fs − + W Fs ∆ ( e − iv ∆ −

1) + E Fs ( e − iv ∆ − − e − iv ∆ (cid:105) lim ∆ → ( e − iv ∆ − (cid:20) E (cid:20)(cid:16)(cid:80) N b s =1 W Fs ∆ + (cid:80) N b s =1 E Fs (cid:17) (cid:21) − E (cid:104)(cid:80) N b s =1 W Fs ∆ + (cid:80) N b s =1 E Fs (cid:105)(cid:21) = (B.17) E (cid:104)(cid:80) N b s =1 ivW F s − e iW Fs (cid:105) v E (cid:20)(cid:16)(cid:80) N b s =1 W F s (cid:17) (cid:21) = (B.18) E [ N b ] (1 + iv E [ W F ] − φ F ( u )) v ( E [ N b ] E [ W F ] + E [ N b ( N b − E [ W F ] ) (B.19)Predictably the rounding errors play no role in the limit, and as a simple approximation we therefore set E Fs = tothe midpoint of its domain for all s , to obtain the asymptotically exact approximation: E (cid:104)(cid:80) N b s =1 (cid:80) (cid:54) =( m j ,m j ) ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − ≈ (B.20) E (cid:104)(cid:80) N b s =1 e ivW Fs e − iv ∆ + e − iv ∆ ( W Fs ∆ − ) − W Fs ∆ − (cid:105) ( e − iv ∆ − (cid:20) E (cid:20)(cid:16)(cid:80) N b s =1 W Fs ∆ + (cid:17) (cid:21) − E (cid:104)(cid:80) N b s =1 W Fs ∆ + (cid:105)(cid:21) = (B.21) E [ N b ] ( φ F ( v ) e − iv ∆2 + e − iv ∆ ( E [ W F ]∆ − ) − E [ W F ]∆ − )( e − iv ∆ − (cid:18) E [ N b ] (cid:16) E [ W F ]∆ + (cid:17) + E [ N b ] (cid:20) E [ W F ] − E [ W F ] ∆ − E [ W F ]∆ − (cid:21)(cid:19) (B.22)which is A ( v ) D .Now, consider the separate summation enumerator. We use similar techniques as before. Note that for F s and F s there are | s − s − | W F waiting times, and | s − s | W R waiting times, separating the closest pair in F s × F s , upto rounding error. Thus, if we enumerate the timepoints in F s instead starting from the end (so that m (cid:48) j ∈ F s is the j ’th largest value in F s , j ≥ ), the differences in timepoints m (cid:48) j ∈ F s and m j ∈ F s , with s > s , can be writtenon the form. | m (cid:48) j − m j | = W s + s − s (cid:88) k =1 ( W R s k + W F s k ) + ( j + j − E ( s ,s ) (B.23)19 REPRINT - F

EBRUARY

1, 2021where E ( s ,s ) only depends on ( s , s ) and has P ( E ( s ,s ) ∈ ( − ∆ , ∆)) = 1 (B.24)Therefore: N b (cid:88) s =1 N b (cid:88) s =1 ( s (cid:54) = s ) (cid:88) m j ∈ F s (cid:88) m j ∈ F s e iv | m j − m j | = (B.25) N b − (cid:88) s =1 N b (cid:88) s = s +1 (cid:88) m j ∈ F s (cid:88) m j ∈ F s e iv | m j − m j | = (B.26) N b − (cid:88) s =1 N b (cid:88) s = s +1 e ivW s e iv (cid:80) s − s k =1 ( W Rs k + W Fs k ) e ivE ( s ,s e − iv ∆2 | F s | (cid:88) j =1 | F s | (cid:88) j =1 e iv ( j + j )∆ = (B.27) N b − (cid:88) s =1 N b (cid:88) s = s +1 e ivW s e iv (cid:80) s − s k =1 ( W Rs k + W Fs k ) e ivE ( s ,s e − iv ∆2 ( e iv ∆ | F s | − e iv ∆ | F s | − e iv ∆ − (B.28)At this point, it should be clear that discretization effects again have no impact in the limit. For the sake of completion,we compute also this asymptotic value: lim ∆ → E (cid:104)(cid:80) N b s =1 (cid:80) N b s =1 ( s (cid:54) = s ) (cid:80) m j ∈ F s (cid:80) m j ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − (B.29) − E (cid:20) (cid:80) N b − s =1 (cid:80) N b s = s +1 e ivW s e iv (cid:80) s − s k =1 ( W Rs k + W Fs k ) ( e ivW Fs − e ivW Fs − (cid:21) v ( E [ N b ] E [ W F ] + E [ N b ( N b − E [ W F ] ) = (B.30) (cid:16) φ F ( v ) − φ ( F + R ) ( v ) − (cid:17) φ R ( v ) (cid:0) E [ N b ] ( φ ( F + R ) ( v ) − − E (cid:2) φ ( F + R ) ( v ) N b (cid:3)(cid:1) v ( E [ N b ] E [ W F ] + E [ N b ( N b − E [ W F ] ) (B.31)Thus, replacing again all discretization errors with the midpoints of their domains ( E Fs = , E s ,s = 0 ), we get anasymptotically exact approximation: E (cid:104)(cid:80) N b s =1 (cid:80) N b s =1 ( s (cid:54) = s ) (cid:80) m j ∈ F s (cid:80) m j ∈ F s e iv | m j − m j | (cid:105) E [ G ( G − ≈ (B.32) e − iv ∆2 E (cid:20)(cid:80) N b − s =1 (cid:80) N b s = s +1 e ivW s e iv (cid:80) s − s k =1 ( W Rs k + W Fs k ) ( e ivW Fs e iv ∆ − e ivW Fs e iv ∆ − (cid:21) ( e − iv ∆ − (cid:18) E [ N b ] (cid:16) E [ W F ]∆ + (cid:17) + E [ N b ] (cid:20) E [ W F ] − E [ W F ] ∆ − E [ W F ]∆ − (cid:21)(cid:19) = (B.33) e − iv ∆2 (cid:18) φ F ( v ) e iv ∆ 12 − φ ( F + R ) ( v ) − (cid:19) φ R ( v ) (cid:0) E (cid:2) φ ( F + R ) ( v ) N b (cid:3) − − E [ N b ] ( φ ( F + R ) ( v ) − (cid:1) ( e − iv ∆ − (cid:18) E [ N b ] (cid:16) E [ W F ]∆ + (cid:17) + E [ N b ] (cid:20) E [ W F ] − E [ W F ] ∆ − E [ W F ]∆ − (cid:21)(cid:19) (B.34)which is B ( v ) C ( v ) D . B.2 Approximate n c We wish to compute n c = E (cid:2) G (cid:3) E [ G ] − (B.35)20 REPRINT - F

EBRUARY

1, 2021Instead of approximating the moments directly, we ﬁrst approximate the distribution of G , from which the momentscan be obtained. We can write somewhat loosely G = N b (cid:88) s =1 frames hit by the s’th F-signal ) − N b − (cid:88) s =1 ( F-signals s and s+1 share a frame ) (B.36)where by ”sharing” we mean that the continuous time signals emitted from the 2 F -state visits hit the same frame. Now,computing the distribution of G from this representation is made intractable due to the dependence and complicatedbehavior in the summands caused by disretization to the ﬁxed grid ∆ Z . Instead, we replace the summands with theirmean under disretization to grids ∆ Z + U , where U ∼ U ni (0 , ∆) . Write E U [ · ] for this mean, and let (cid:98) a (cid:99) and { a } denote the integer and fractional parts, respectively, of a number a . Write T Is and T Os for the entrance and exit times,respectively, for the s ’th F -state visit, and D s for the distance from T Is to the nearest gridpoint larger than T Is . Then,we obtain for any s E U [ frames hit by the s’th F-signal )] = (B.37) (cid:98) W F s ∆ − (cid:99) + E U (cid:2) ( D s < { W F s ∆ − } ) + ( D s > { W F s ∆ − } ) (cid:3) = (B.38) (cid:98) W F s ∆ − (cid:99) + 2 { W F s ∆ − } + (1 − { W F s ∆ − } ) = (B.39) W F s ∆ − + 1 (B.40)Now, for the second sum, we get E U [ ( F-signals s and s+1 share a frame )] = (B.41) − E U (cid:2) ( there is a gridpoint between T Os and T Is +1 ) (cid:3) = (B.42) − ( W R s ∆ − ( W R s ≤ ∆) + ( W R s > ∆)) = (B.43) ( W R s ∆ − ≤ − W R s ∆ − ) (B.44)and our approximation for G is thus G ≈ N b (cid:88) s =1 W F s ∆ + 1 − N b − (cid:88) s =1 ( W R s ∆ ≤ − W R s ∆ ) (B.45)from which we obtain E [ G ] ≈ E [ N b ] (cid:18) E [ W F ]∆ + 1 (cid:19) − E [ N b − µ R (B.46)where µ R = (cid:82) (1 − x ) dP WR ∆ ( x ) , and E (cid:2) G (cid:3) ≈ E (cid:2) N b (cid:3) (cid:18) E [ W F ]∆ + 1 (cid:19) + E [ N b ] E (cid:2) W F (cid:3) − E [ W F ] ∆ (B.47) + E (cid:2) ( N b − (cid:3) ( µ R ) + E [ N b − (cid:0) µ R − ( µ R ) (cid:1) (B.48) − E [ N b ( N b − (cid:18) E [ W F ]∆ + 1 (cid:19) µ R (B.49)with µ R = (cid:82) (1 − x ) dP WR ∆ ( x ) .If we write n c (∆) for the approximation of n c given a framerate of ∆ − , we have that n c (∆) is asymptotically exactin the sense that, after appropriate normalization, we have lim ∆ → ∆ n c (∆) = lim ∆ → ∆ n c (B.50)where this asymptotic value is given as lim ∆ → ∆ n c = E [ N b ] E (cid:2) W F (cid:3) + E [ N b ( N b − E [ W F ] E [ N b ] E [ W F ] (B.51)21 REPRINT - F

EBRUARY

1, 2021

B.3 Approximate γ ( d ) By deﬁnition, we have γ ( d ) = E (cid:104)(cid:80) Gk =1 (cid:80) G (cid:48) j =1 d ( m k , m (cid:48) j ) (cid:105) E [ G ] = E (cid:104)(cid:80) Gk =1 (cid:80) G (cid:48) j =1 m k + m (cid:48) j (cid:105) E [ G ] (B.52)where we again drop the drop the heavier time point notation, such that e.g. m k is arrival time k in a typical cluster,and m (cid:48) j is mark j in an independent copy of the typical cluster. Clearly, then, γ ( d ) = E (cid:104)(cid:80) Gk =1 m k (cid:105) E [ G ] (B.53)Now, write T Is for the (continuous) entrance time to the s ’th F -state visit. Then the ﬁrst observed timepoint in F s canbe written as T Is + E s , where E s is a discretization error with P (0 ≤ E s ≤ ∆) = 1 . Note further, that T Is = W I + s − (cid:88) i =1 ( W F i + W R i ) (B.54)and we arrive at the expression γ ( d ) = E (cid:104)(cid:80) N b s =1 | F s | ( T Is + E s ) + (cid:80) | F s | k =1 k ∆ (cid:105) E [ G ] (B.55) = E (cid:104)(cid:80) N b s =1 | F s | ( T Is + E s ) (cid:105) E [ G ] + ∆ E (cid:104)(cid:80) N b s =1 | F s | ( | F s | + 1) (cid:105) E [ G ] (B.56)Now, setting everywhere | F s | = W Fs ∆ + as in Appendix B.1, and similarly setting all E s = ∆ , we get E (cid:104)(cid:80) N b s =1 (cid:80) | F s | k =1 k ∆ (cid:105) E [ G ] = E [ N b ] ( E [ W F ] + E [ W F ] + ) E [ G ] (B.57)and E (cid:104)(cid:80) N b s =1 | F s | ( T Is + E s ) (cid:105) E [ G ] = E [ W I ] + E (cid:104)(cid:80) N b s =1 ( W Fs ∆ + )( (cid:80) s − i =1 ( W F i + W R i ) + ∆ ) (cid:105) E [ G ] (B.58) = E [ W I ] + ( E [ W F ]∆ + )( E [ N b ( N b − E [ W F ] + E [ W R ]) + E [ N b ] ∆ ) E [ G ] (B.59)so that setting E [ G ] ≈ E [ N b ] (cid:16) E [ W F ]∆ + (cid:17) yields the approximation. Again, the approximation is asymptoticallyexact, with limiting value lim ∆ → γ ( d ) = E [ W I ] + E [ W F ] ( E [ N b ( N b − E [ W F ] + E [ W R ]) E [ N b ] E [ W F ] + E (cid:2) W F (cid:3) E [ W F ] (B.60) Appendix C Use on general ↓ X For general ↓ X , we can nevertheless use the estimation procedures of Section 4 and obtain meaningful estimates. Tosee why this is true, we consider the means of standard kernel-estimators used above.Assume again that the IBCpp O is observed in W × [0 , b ] . Standard estimators of γ O ( f ) k fO and g ↓ O , when O ismotion-invariant, are given as ˆ γ O ( f )ˆ k fO ( r ) = (cid:80) i (cid:54) = j f ( t o i , t o j ) κ ( || o i − o j || − r ) w ( o i , o j ) (cid:80) i (cid:54) = j κ ( || o i − o j || − r ) w ( o i , o j ) (C.1) ˆ g ↓ O ( r ) = c ( r ) (cid:88) i (cid:54) = j κ ( || o i − o j || − r ) w ( o i , o j ) (C.2)22 REPRINT - F

EBRUARY

1, 2021Here, c ( r ) = (2 πr ) − N − | W | , κ is a smoothing kernel, and w ( x, y ) are edge correction weights, see e.g. (Gelfandet al. 2010, p. 308, 393). We think of the sum as running over all observed pairs in O , and for simplicity the w arejust indicator functions that the location pairs are in W . The kernel and weights should be chosen identical for bothestimators, as indicated by the notation. Our estimator of S fO is then ˆ S fO ( r ) = ˆ g ↓ O ( r )ˆ γ O ( f )ˆ k fO ( r ) = c ( r ) (cid:88) i (cid:54) = j f ( t o i , t o j ) κ ( || o i − o j || − r ) w ( o i , o j ) (C.3)Now, rather than computing the mean of ˆ S fO directly, we write ˜ c ( r ) = N c ( r ) , and compute the mean of N ˆ S fO ( r ) ,which yields slightly more elegant computations. By splitting the summation according to the cluster and processrelationships of pairs, and using the symmetry of f , we obtain E (cid:104) N ˆ S fO ( r ) (cid:105) = E  ˜ c ( r ) (cid:88) ( x,t x ) ∈ X (cid:54) = (cid:88) ( y ,t y ) , ( y ,t y ) ∈ Y ( x,tx ) f ( t y , t y ) κ ( || y − y || − r ) w ( y , y )  (C.4) + E  ˜ c ( r ) (cid:54) = (cid:88) ( x ,t x ) , ( x ,t x ) ∈ X (cid:88) ( y ,t y ) ∈ Y ( x ,tx (cid:88) ( y ,t y ) ∈ Y ( x ,tx f ( t y , t y ) κ ( || y − y || − r ) w ( y , y )  (C.5) + 2 E  ˜ c ( r ) (cid:88) ( x,t x ) ∈ X (cid:88) ( y,t y ) ∈ Y ( x,tx ) (cid:88) ( e,t e ) ∈ E f ( t y , t e ) κ ( || y − e || − r ) w ( y, e )  (C.6) + E  ˜ c ( r ) (cid:54) = (cid:88) ( e ,t e ) , ( e ,t e ) ∈ E f ( t e , t e ) κ ( || e − e || − r ) w ( e , e )  (C.7)Using the independence structures of our model, we average out the clusters to arrive at E (cid:104) N ˆ S fO ( r ) (cid:105) = γ ( f ) n c E [ G ] E  ˜ c ( r ) (cid:88) ( x,t x ) ∈ X ( h (cid:15) ∗ h (cid:15) ) w,κx ( r )  (C.8) + γ ( f ) E [ G ] E  ˜ c ( r ) (cid:54) = (cid:88) ( x ,t x ) , ( x ,t x ) ∈ X ( h (cid:15) ∗ h (cid:15) ) w,κx ,x ( r )  (C.9) + 2 γ EZ ( f ) E [ G ] λ ↓ E E  ˜ c ( r ) (cid:88) ( x,t x ) ∈ X ( h (cid:15) ∗ h e ) w,κx ( r )  (C.10) + γ E ( f ) λ ↓ E E [˜ c ( r )( h e ∗ h e ) w,κx ( r )] (C.11)where ( h (cid:15) ∗ h (cid:15) ) w,κx ( r ) = (cid:90) κ ( || t − t || − r ) h (cid:15) ( t ) h (cid:15) ( t ) W ( x + t , x + t ) dt dt (C.12) ( h (cid:15) ∗ h (cid:15) ) w,κx ,x ( r ) = (cid:90) κ ( || t + x − t − x || − r ) h (cid:15) ( t ) h (cid:15) ( t ) W ( x + t , x + t ) dt dt (C.13) ( h (cid:15) ∗ h e ) w,κx ( r ) = (cid:90) κ ( || x + t − t || − r ) h (cid:15) ( t ) W ( x + t , t ) dt dt (C.14) ( h e ∗ h e ) w,κx ( r ) = (cid:90) κ ( || t − t || − r ) W ( t , t ) dt dt (C.15)23 REPRINT - F

EBRUARY

1, 2021By considering what happens for f = 1 (in which case γ ( f ) = γ ( f ) = γ E ( f ) = γ EZ ( f ) = 1 ), we see that we canrewrite the above as E (cid:104) N ˆ S fO ( r ) (cid:105) = ( γ ( f ) − γ ( f )) n c E [ G ] E  ˜ c ( r ) (cid:88) x ∈ ↓ X ( h (cid:15) ∗ h (cid:15) ) w,κx ( r )  (C.16) + γ ( f ) E (cid:2) N ˆ g ↓ O ( r ) (cid:3) (C.17) + 2( γ EZ ( f ) − γ ( f )) E [ G ] λ ↓ E E  ˜ c ( r ) (cid:88) x ∈ ↓ X ( h (cid:15) ∗ h e ) w,κx ( r )  (C.18) + ( γ E ( f ) − γ ( f )) λ ↓ E E [˜ c ( r )( h e ∗ h e ) w,κx ( r )] (C.19)and we already have a very similar expression to the motion-invariant case. The obstacle to further exact computationscome from edge and kernel smoothing biases. For the pure cluster term, consider ﬁrst x ∈ ↓ X which is positionedcentrally in W , such that the edge correction weights can be disregarded. Then we would have ( h (cid:15) ∗ h (cid:15) ) w,κx ( r ) ≈ (cid:90) κ ( || t − t || − r ) h (cid:15) ( t ) h (cid:15) ( t ) dt dt (C.20) = (cid:90) κ ( || t || − r ) h (cid:15) ( t + t ) dt h (cid:15) ( t ) dt (C.21) = (cid:90) lκ ( l − r ) h (cid:15) ( l [ cos ( θ ) , sin ( θ )] + t ) dldθh (cid:15) ( t ) dt (C.22) = 2 π (cid:90) lκ ( l − r )( h (cid:15) ∗ h (cid:15) )( l ) dl (C.23)obtained by polar integration. Thus, for central x , we have a kernel-smoothed version of the cluster autoconvolution.For x closer to the border of W (on either side), not all pairs will be considered. As a simple approximation for all x ,we therefore assume an edge-affected cluster contributes the same as a central cluster, but weighted with the expectedfraction of points observed in W , (cid:82) W − x h (cid:15) ( t ) dt , so that E [ G ] E  ˜ c ( r ) (cid:88) x ∈ ↓ X ( h (cid:15) ∗ h (cid:15) ) w,κx ( r )  ≈ | W | Λ Z ( W ) r (cid:90) lκ ( l − r )( h (cid:15) ∗ h (cid:15) )( l ) dl (C.24)In particular, for small kernel bandwidths, we have E [ G ] E  ˜ c ( r ) (cid:88) x ∈ ↓ X ( h (cid:15) ∗ h (cid:15) ) w,κx ( r )  ≈ | W | Λ Z ( W )( h (cid:15) ∗ h (cid:15) )( r ) (C.25)Using the same tricks for the mixed term, we have for central x ( h (cid:15) ∗ h e ) w,κx ( r ) ≈ (cid:90) κ ( || x + t − t || − r ) h (cid:15) ( t ) dt dt (C.26) = (cid:90) κ ( || t || − r ) dt (cid:90) h (cid:15) ( t ) dt (C.27)so that, for small kernel bandwidths and all x , we have E [ G ] E  ˜ c ( r ) (cid:88) x ∈ ↓ X ( h (cid:15) ∗ h e ) w,κx ( r )  ≈ | W | Λ Z ( W ) (C.28)Finally, for the pure noise term, note that λ ↓ E E [˜ c ( r )( h e ∗ h e ) w,κx ( r )] = E (cid:2) | E ∩ W | ˆ g ↓ E ( r ) (cid:3) (C.29)where ˆ g ↓ E ( r ) is the estimator of the pair correlation function of a stationary Poisson process, so that we can reasonablyexpect λ ↓ E E [˜ c ( r )( h e ∗ h e ) w,κx ( r )] ≈ E (cid:2) | E ∩ W | (cid:3) (C.30)24 REPRINT - F

EBRUARY

1, 2021Thus, assuming the kernel bandwidth is small, and the majority of points in ↓ X are not close to the boundary of W relative to the effective cluster size, we obtain E (cid:104) N ˆ S fO ( r ) (cid:105) ≈ ( γ ( f ) − γ ( f )) n c | W | Λ Z ( W )( h (cid:15) ∗ h (cid:15) )( r ) (C.31) + γ ( f ) E (cid:2) N ˆ g ↓ O ( r ) (cid:3) (C.32) + 2( γ EZ ( f ) − γ ( f ))Λ E ( W )Λ Z ( W ) (C.33) + ( γ E ( f ) − γ ( f )) E (cid:2) | E ∩ W | (cid:3) (C.34)Finally, using simple Taylor approximations for the mean values, we have E (cid:104) ˆ S fO ( r ) (cid:105) ≈ ( γ ( f ) − γ ( f )) n c η ( W )Λ O ( W ) | W | − ( h (cid:15) ∗ h (cid:15) )( r ) (C.35) + γ ( f ) E (cid:2) ˆ g ↓ O ( r ) (cid:3) (C.36) + 2( γ EZ ( f ) − γ ( f )) η ( W )(1 − η ( W )) (C.37) + ( γ E ( f ) − γ ( f ))(1 − η ( W )) (C.38)or E (cid:104) ˆ S fO ( r ) (cid:105) ≈ ( γ ( f ) − γ ( f )) n c η ( W )Λ O ( W ) | W | − ( h (cid:15) ∗ h (cid:15) )( r ) + γ ( f )( E (cid:2) ˆ g ↓ O ( r ) (cid:3) −

1) + γ O ( f, W ) (C.39)where η ( W ) = Λ Z ( W )Λ O ( W ) (C.40) γ O ( f, W ) = η ( W ) γ ( f ) + (1 − η ( W )) γ E ( f ) + 2 η ( W )(1 − η ( W )) γ EZ ( f ) (C.41)Thus, whether X is motion-invariant or not, the mean of the involved summary statistics take approximately the sameshape, and the estimation procedures can be carried out in the same way regardless. Of course, for pathological caseswith all points lining the ROI edge, the above approximations can be bad, but this is a general problem and not uniqueto the methods presented here. Such cases can be avoided in practice, as the ROI can be chosen freely within the cellto obtain a favorable situation with relatively few points near edges. Appendix D Selection of R and U The sets R and U must be chosen in a way that reﬂects the size of blinking clusters, both in time and space. Since weare interested in the weights on the blinking-cluster term ( h (cid:15) ∗ h (cid:15) )( r ) , we should pick R to reﬂect where this term isinformative (different from ), and explore this region well. Similarly for U , we need to explore the informative rangeof values where the CDF γ ( P u ) moves between and .As ( h (cid:15) ∗ h (cid:15) )( r ) can be estimated using only the observed localization uncertainties, it is straightforward to ﬁnd a rangeof r -values where ( h (cid:15) ∗ h (cid:15) )( r ) exceeds some small threshold, say ( h (cid:15) ∗ h (cid:15) )( r ) > . . In addition, since standardkernel estimators of the mark and pair correlation functions can have considerable bias (and variance) for small valuesof R , it is advisable to exclude small values of r . The values of r that should be excluded depends primarily on thekernel bandwidth and the observed pairwise point distances. For all ﬁtting in this article, we have chosen R as anevenly spaced grid between twice the kernel bandwidth and the ﬁrst time (cid:100) ( h (cid:15) ∗ h (cid:15) )( r ) dips under . , with a stepsize of the smallest observed nearest-neighbor distance in the dataset.For U we are at a slight disadvantage, as we do not apriori know where γ ( P u ) takes informative values. Thus, wepropose the following iterative method for determining U : ﬁrst, pick ˜ U as a wide-spanning grid of u values, withinwhich interval we can say with a large degree of certainty most of the ﬂourophores have bleached. Typically, thiswould be measured on a time-scale of a minute or two at most, but more qualiﬁed guesses can be obtained by simplyplotting the arrival times in the dataset. Using ˜ U , we ﬁt the PALM-IBCpp model to data, and obtain rate estimates. If ˜ U was chosen poorly, the estimates will be of lower quality, but will still provide information about the timescale ofblinking. Thus, using the estimated ˆ γ ( P u ) on the basis of ˜ U , we pick a new U as an evenly spaced grid between and the ﬁrst u such that ˆ γ ( P u ) > . , and the model is ﬁt a ﬁnal time with this U . For simulations and analysis ofthe LAT-data, we chose the initial ˜ U as an evenly spaced grid between and minute, with gridpoints. The ﬁnal U was then picked as suggested above, with U having gridpoints.25 REPRINT - F

EBRUARY

1, 2021

References

Andersen, I. T., Hahn, U., Arnspang, E. C., Nejsum, L. N. & Jensen, E. B. V. (2018), ‘Double Cox cluster processes— with applications to photoactivated localization microscopy’,

Spatial Statistics , 58–73.Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. (2011 a ), ‘Identiﬁcation of clustering artifactsin photoactivated localization microscopy’, Nature methods (7), 527.Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. (2011 b ), ‘Quantitative photo activated local-ization microscopy: unraveling the effects of photoblinking’, PloS one (7), e22678.Baddeley, A., Rubak, E. & Turner, R. (2015), Spatial Point Patterns , Apple Academic Press Inc.Betzig, E., Patterson, G. H., Sougrat, R., Lindwasser, O. W., Olenych, S., Bonifacino, J. S., Davidson, M. W.,Lippincott-Schwartz, J. & Hess, H. F. (2006), ‘Imaging intracellular ﬂuorescent proteins at nanometer resolution’,

Science (5793), 1642–1645.Coltharp, C., Kessler, R. P. & Xiao, J. (2012), ‘Accurate construction of photoactivated localization microscopy (palm)images for quantitative measurements’,

PLoS One (12), e51725.Daley, D. J. & Vere-Jones, D. (2007), An introduction to the theory of point processes: volume II: general theory andstructure , Springer Science & Business Media.Davies, R. B. (1973), ‘Numerical inversion of a characteristic function’,

Biometrika (2), 415–417.Deschout, H., Zanacchi, F. C., Mlodzianoski, M., Diaspro, A., Bewersdorf, J., Hess, S. T. & Braeckmans, K. (2014),‘Precisely and accurately localizing single emitters in ﬂuorescence microscopy’, Nature methods (3), 253.Fricke, F., Beaudouin, J., Eils, R. & Heilemann, M. (2015), ‘One, two or three? probing the stoichiometry of membraneproteins by single-molecule localization microscopy’, Scientiﬁc reports , 14072.Gelfand, A. E., Diggle, P., Guttorp, P. & Fuentes, M. (2010), Handbook of spatial statistics , CRC press.Grifﬁ´e, J., Pham, T., Sieben, C., Lang, R., Cevher, V., Holden, S., Unser, M., Manley, S. & Sage, D. (2020), ‘Virtual-SMLM, a virtual environment for real-time interactive SMLM acquisition’.Hummer, G., Fricke, F. & Heilemann, M. (2016), ‘Model-independent counting of molecules in single-moleculelocalization microscopy’,

Molecular biology of the cell (22), 3637–3644.Karathanasis, C., Fricke, F., Hummer, G. & Heilemann, M. (2017), ‘Molecule counts in localization microscopy withorganic ﬂuorophores’, ChemPhysChem (8), 942–948.Lin, Y., Long, J. J., Huang, F., Duim, W. C., Kirschbaum, S., Zhang, Y., Schroeder, L. K., Rebane, A. A., Velasco, M.G. M., Virrueta, A. et al. (2015), ‘Quantifying and optimizing single-molecule switching nanoscopy at high speeds’, PloS one (5), e0128135.Ober, R. J., Tahmasbi, A., Ram, S., Lin, Z. & Ward, E. S. (2015), ‘Quantitative aspects of single-molecule microscopy:Information-theoretic analysis of single-molecule data’, IEEE Signal Processing Magazine (1), 58–69.Ovesn`y, M., Kˇr´ıˇzek, P., Borkovec, J., ˇSvindrych, Z. & Hagen, G. M. (2014), ‘Thunderstorm: a comprehensive imagejplug-in for palm and storm data analysis and super-resolution imaging’, Bioinformatics (16), 2389–2390.Patel, L., Gustafsson, N., Lin, Y., Ober, R., Henriques, R., Cohen, E. et al. (2019), ‘A hidden markov model approachto characterizing the photo-switching behavior of ﬂuorophores’, The Annals of Applied Statistics (3), 1397–1429.Rollins, G. C., Shin, J. Y., Bustamante, C. & Press´e, S. (2015), ‘Stochastic approach to the molecular counting problemin superresolution microscopy’, Proceedings of the National Academy of Sciences (2), E110–E118.Sengupta, P., Jovanovic-Talisman, T., Skoko, D., Renz, M., Veatch, S. L. & Lippincott-Schwartz, J. (2011), ‘Probingprotein heterogeneity in the plasma membrane using palm and pair correlation analysis’,

Nature methods (11), 969.Shivanandan, A., Deschout, H., Scarselli, M. & Radenovic, A. (2014), ‘Challenges in quantitative single moleculelocalization microscopy’, FEBS letters (19), 3595–3602.Shtengel, G., Galbraith, J. A., Galbraith, C. G., Lippincott-Schwartz, J., Gillette, J. M., Manley, S., Sougrat, R., Wa-terman, C. M., Kanchanawong, P., Davidson, M. W., Fetter, R. D. & Hess, H. F. (2009), ‘Interferometric ﬂuorescentsuper-resolution microscopy resolves 3d cellular ultrastructure’,

Proceedings of the National Academy of Sciences (9), 3125–3130.Small, A. & Stahlheber, S. (2014), ‘Fluorophore localization algorithms for super-resolution microscopy’,

Naturemethods (3), 267.Staudt, T., Aspelmeier, T., Laitenberger, O., Geisler, C., Egner, A., Munk, A. et al. (2020), ‘Statistical molecule count-ing in super-resolution ﬂuorescence microscopy: Towards quantitative nanoscopy’, Statistical Science (1), 92–111. 26 REPRINT - F

EBRUARY

1, 2021Stoyan, D. (1984), ‘On correlations of marked point processes’,

Mathematische Nachrichten (1), 197–207.Yamanaka, M., Smith, N. I. & Fujita, K. (2014), ‘Introduction to super-resolution microscopy’,

Microscopy (3), 177–192.Zhang, B., Zerubia, J. & Olivo-Marin, J.-C. (2007), ‘Gaussian approximations of ﬂuorescence microscope point-spread function models’, Applied optics46