Estimation of the Diffusion Constant from Intermittent Trajectories with Variable Position Uncertainties
Peter K. Relich, Mark J. Olah, Patrick J. Cutler, Keith A. Lidke
August 24, 2015
Abstract
The movement of a particle described by Brownian motion is quantified by a single parameter, D, the diffusion constant. The estimation of D from a discrete sequence of noisy observations is a fundamental problem in biological single particle tracking experiments, since it can report on the environment and/or the state of the particle itself via the hydrodynamic radius. Here we present a method to estimate D that takes into account several effects that occur in practice, that are important for correct estimation of D, and that have hitherto not been combined together for estimation of D. These effects are motion blur from the finite integration time of the camera, intermittent trajectories, and time-dependent localization uncertainty. Our estimation procedure, a maximum likelihood estimation, follows directly from the likelihood expression for a discretely observed Brownian trajectory that explicitly includes these effects. The manuscript begins with the formulation of the likelihood expression and then presents three methods to find the exact solution. Each method has its own advantages in either computational robustness, theoretical insight, or the estimation of hidden variables. We then compare our estimator to previously published estimators using a squared log loss function to demonstrate the benefit of including these effects.

Introduction
Single Particle Tracking (SPT) is a method to observe and classify the motion of individual particles as trajectories: estimates of a particle's position in a sequence of discrete measurement times. In the field of biological microscopy, SPT has been used for finding and analyzing protein motion in heterogeneous environments like the cellular membrane (1, 2) and cytoplasm (3, 4). The SPT trajectory information can be used to resolve variations in the individual motion of molecules that would otherwise be lost in ensemble imaging techniques.

In the analysis of trajectories, the pure Brownian motion model is often the first model used to describe a trajectory in the absence of prior information about the movement. The behavior of a single particle dominated by Brownian motion can be described by a normal distribution with the variance term proportional to a single physical scale parameter D, the diffusion constant, which makes Brownian motion the simplest model for describing stochastic motion. More complicated behavior could potentially be modeled as Brownian motion with discrete changes in the diffusion constant that could be identified with change point analysis (5). Therefore, the estimation of the diffusion constant of a particle from discrete, noisy, and possibly short particle trajectories is a fundamental problem in single particle tracking.

In this manuscript, we focus on the likelihood distribution of D. We present a maximum-likelihood-based approach for estimating the diffusion constant of a particle given an SPT trajectory that includes the individual localization error for each position in the trajectory, the time of the observation, and the camera integration time. Our approach is based on a direct solution to the likelihood equation for the observation of a particular trajectory. The need for such an estimation procedure has evolved out of the rapid progress that has been made in SPT analysis techniques over the last few years (6–10).
In particular, some emitter localization techniques can not only accurately resolve the location of an emitter to tens of nanometers, but can also reliably estimate the localization error (11). Because the signal to noise ratio of a particle can vary significantly from frame to frame in an image sequence (e.g. from varying background, or photobleaching of the probe), the localization error reported for each observation in a trajectory can also vary significantly from frame to frame. We have therefore developed an estimator that takes this information into account.

Historically, one of the primary techniques for estimating the diffusion constant from trajectories relied on a linear regression of the mean-squared-displacement (MSD) of the tracked particle coordinates as a function of time lag (12). In the absence of measurement errors, the observed MSD for pure Brownian motion scales linearly with time lag and intersects at the origin, allowing the direct recovery of the diffusion constant from a linear regression on the well sampled data points. It has been shown that a regression of the MSD with an offset parameter can be interpreted to account for the cumulative effects of static (13) and dynamic measurement errors (14). If the MSD is built using the same data points for multiple time lags, the correlation between MSD values must also be taken into account in the regression (12, 15, 16). Although it seems theoretically possible to include individual localization error into the MSD regression, to date this has not been described.

A separate line of work has focused on maximum likelihood approaches to the estimation procedure. A maximum likelihood estimator works by finding the maximum of a likelihood function L(D) = P(O | D) that gives the probability of observing a particular trajectory O, given a diffusion constant D. Ideally this probability should incorporate both the variable localization errors of the trajectory and the effect of motion blur.
The motion-blur effect arises from the fact that each localization is performed on data that is acquired over some non-zero exposure time. Typically camera sensors integrate the signal over the exposure time, resulting in a blurring of the particle image. This blurring has important numerical effects on the likelihood function (17). A specific solution to the likelihood function has been accurately derived that incorporates the effects of motion blur, but with the caveat that only a single global localization error estimate is used as an input or estimated (16, 18). This estimator is a more robust alternative to the MSD-based estimators because it can use all trajectory information without incurring systematic error when the data is not well conditioned for a linear regression. Subsequent work has extended this approach to deal with non-uniformly spaced or intermittent trajectories (19); however, the particular implementation in (19) did not account for motion blur. Maximum likelihood estimators are not the only class of diffusion estimators that have evolved recently; continued development on displacement-based estimators has resulted in an estimator that incorporates the effects of covariances between sequentially observed displacements (20).

In this work we provide a generalized solution to the likelihood function, incorporating variable localization errors and variable displacement periods, which results in an improvement in estimation accuracy for short trajectories, trajectories with large variations in localization accuracy, and trajectories with intermittently spaced measurements. In Sec. 2 we formulate the diffusion likelihood function to directly incorporate the effects of motion blur, variable localization errors, and intermittent or non-uniformly spaced observations in time. We present three independent solutions to this likelihood function. The first derivation, the recursive method (Sec.
3), is a sequential integration of the nuisance parameters and provides the fastest numerical implementation. The second derivation, the Laplace method (Sec. 4), utilizes a second order Taylor expansion to express the likelihood as a multivariate Gaussian in the basis of integration. The Laplace method additionally returns the maximum likelihood values of the true positions given a sampled D. The third derivation, the Markov method (Sec. 5), calculates the characteristic function in order to express the likelihood in the basis of displacements. The Markov method allows us to verify that the generalized form of the expression derived in (18) is the same distribution as the expressions derived in this manuscript. The
Markov method was also instrumental in determining the coefficients necessary to reduce the computational complexity of all the methods (Sec. 13.2). Each of these derivations leads to an independent, numerically accurate computational algorithm for estimating the likelihood of D (Sec. 6), making full use of all the information contained in a noisy trajectory. The resulting likelihood calculation allows for robust computations in specific problems, such as a maximum likelihood estimator, a maximum a posteriori estimate, or change point analysis. We compare the results of our maximum likelihood estimator (MLE) to the current state of the art estimation software (16) with the squared log loss function and demonstrate that the additional information provided from the localization errors allows for better estimates of D with trajectories parameterized by any non-constant, but otherwise arbitrary distribution of localization variances.

If a diffusing particle is accurately and exactly observed at a discrete sequence of N + 1 positions X = \{x_i\}_{i=1}^{N+1} at times t_i, then P(X | D), the probability of sequence X given diffusion constant D, is

P(X | D) = \prod_{i=1}^{N} P(x_{i+1} | x_i).   (1)

In Eq. 1, P(x_{i+1} | x_i) = P(x_{i+1} | x_i, D) is the probability density of each discrete jump from x_i → x_{i+1} over time step δt_i = t_{i+1} − t_i, given diffusion constant D. When measured experimentally, however, the true positions X are never known exactly, but are related to N observed positions O by some distribution P(o_i | x_i, x_{i+1}), where the dependence on both x_i and x_{i+1} arises from the effects of exposure time integration by the observation apparatus, which will be dealt with in detail later.
Under this experimental model, P(O, X | D), the combined likelihood of the observed positions O and the actual positions X, is a product of the observation probability densities P(o_i | x_i, x_{i+1}) and the diffusion transition probability densities P(x_{i+1} | x_i) for each of the N observed positions and displacements,

P(O, X | D) = \prod_{i=1}^{N} P(o_i | x_i, x_{i+1}) P(x_{i+1} | x_i).   (2)

Since X is unknown for experimental data, we integrate Eq. 2 over all possible X to marginalize out the dependence on X, and write the diffusion likelihood as an integral over the space of all X-values,

P(O | D) = \int dX \, P(O, X | D) = \int dX \prod_{i=1}^{N} P(o_i | x_i, x_{i+1}) P(x_{i+1} | x_i).

Experimental data typically involves trajectories with two or three spatial dimensions. For diffusion in an isotropic medium and particle uncertainties given as normal distributions with no covariance among the spatial dimensions, the probability distribution of a particular displacement in each dimension is separable. Thus, if Υ is the number of dimensions, then

P(O | D) = \prod_{n=1}^{Υ} P(O_n | D).   (3)

Hence, it is sufficient to only consider the estimation problem in the one-dimensional (1D) case O = \{o_i\}_{i=1}^{N}, and

P(O | D) = \int_{\mathbb{R}^{N+1}} dX \prod_{i=1}^{N} P(o_i | x_i, x_{i+1}) P(x_{i+1} | x_i).   (4)

Equation 4 is the fundamental description of the likelihood of diffusion constant D given observations O. Unfortunately, solving for this expression explicitly is difficult because every o_i term is dependent on both x_i and x_{i+1}.
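For intuition, the error-free likelihood of Eq. 1 is straightforward to evaluate numerically, since each displacement x_{i+1} − x_i is Gaussian with mean 0 and variance 2Dδt_i. The sketch below is our illustration (the function name and interface are not from the paper):

```python
import numpy as np

def log_likelihood_exact_positions(x, t, D):
    """Log-likelihood of Eq. 1 for exactly observed 1D positions x at times t:
    each displacement x[i+1] - x[i] is Gaussian with mean 0 and variance 2*D*dt."""
    dx = np.diff(np.asarray(x, dtype=float))
    dt = np.diff(np.asarray(t, dtype=float))
    var = 2.0 * D * dt
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + dx**2 / var)))
```

For a single unit step over unit time with D = 0.5, the variance is 1 and the expression reduces to −½(ln 2π + 1).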
This is because the estimate of o_i's position is typically made from data collected over an exposure time 0 < t_ε ≤ t_{i+1} − t_i. If the observational apparatus is a camera sensor, the signal will be integrated over the frame, resulting in a motion-blurred image of the moving particle; hence the observed location is conditional upon the particle's true position at the beginning (x_i) and end (x_{i+1}) of the frame.

In the case where exposure time t_ε goes to 0, but δt_i = t_{i+1} − t_i remains constant, the motion-blur effect is no longer present, so the observed location o_i depends only on position x_i,

P(O | D) = \int_{\mathbb{R}^N} dX \prod_{i=1}^{N} P(o_i | x_i) \prod_{j=1}^{N−1} P(x_{j+1} | x_j).   (5)

Without the additional dependence on x_{i+1}, the methods required to solve the integral in Eq. 5 are simpler. In order to use this simpler representation, we will transform Eq. 4 into a form which resembles Eq. 5, and seek functions M(o_i, x_i) and T(x_{j+1}, x_j) such that

P(O | D) = \int_{\mathbb{R}^{N+1}} dX \prod_{i=1}^{N} P(o_i | x_i, x_{i+1}) P(x_{i+1} | x_i) = \int_{\mathbb{R}^N} dX \prod_{i=1}^{N} M(o_i, x_i) \prod_{j=1}^{N−1} T(x_{j+1}, x_j).   (6)

The function T(x_{i+1}, x_i) stands for the transition probability; it is simply the probability of a particle diffusing with constant D moving from x_i to x_{i+1} over time δt_i. The function M(o_i, x_i) stands for the measurement probability, and it encapsulates the net effect of both the measurement localization error and the motion blur. The details of the representation equivalence of Eq. 6 are important for correctness, but they also unnecessarily complicate the exposition, and so can be found in Sec. 13. Other authors (14, 18) have investigated the motion-blur effects of exposure time integration, and found that the effect can be approximated by an effective decrease in variance of the measurement localization error, dependent on diffusion constant D and exposure time t_ε. Our derivations in Sec. 13 agree with the effective correction factor in (14, 18), and more importantly provide a form for the diffusion likelihood that is directly amenable to the solution techniques we employ in Secs. 3, 4, and 5.

The result of the transformation of Eq. 6 is that the effective measurement function M_i and the transition function T_i take the form of normalized Gaussians. We use the notation

N(a, a_0, η) = \frac{1}{\sqrt{2πη}} \exp\left[ −\frac{(a − a_0)^2}{2η} \right]

to represent the normalized Gaussian function with variance η = σ^2 centered around mean a_0, considered as a function of a, a_0, and η. Using this notation, we can succinctly represent the measurement and transition functions as

T_i = T_i(x_{i+1}, x_i) = N(x_{i+1}, x_i, ω_i(D)), for 1 ≤ i ≤ N − 1, and   (7)

M_i = M_i(o_i, x_i) = N(o_i, x_i, ε_i(D)), for 1 ≤ i ≤ N.   (8)

The transition functions (Eq. 46) are unaffected by the motion blur transformation, and their Gaussian representation follows directly from the normally distributed displacements of diffusive processes; hence the variance is ω_i(D) = 2Dδt_i. For the measurement functions (Eq. 45), the variance ε_i(D) is the variance due to measurement error, v_i, combined with a correction for the effect of motion blur that is dependent on diffusion constant D and exposure time t_ε,

ε_i(D) = v_i − Dt_ε/3,

where the factor of 1/3 comes from the continuous limit integration of photon emissions for averaged Brownian trajectories (Sec. 13.1).
It is important to note that the independence of t_ε and δt_i allows for gaps in the trajectories, where δt_i could span a duration of multiple frames but t_ε is the exposure time of a single frame. The result is that Eq. 6 allows us to express the likelihood function exactly in a simple form that deals directly with variable localization error, motion-blur effects, and missing or irregularly spaced trajectory localizations,

P(O | D) = \int_{\mathbb{R}^N} dX \prod_{i=1}^{N} M_i \prod_{j=1}^{N−1} T_j.   (9)

Recursive Method

The notation for the transition and measurement functions allows us to define the likelihood function L(D) by writing Eq. 9 in a form that emphasizes the dependencies on each marginalized position x_i,

L(D) = P(O | D) = \int dx_N M_N \int dx_{N−1} M_{N−1} T_{N−1} \ldots \int dx_2 M_2 T_2 \int dx_1 M_1 T_1.   (10)

The form of Eq. 10 leads to a direct recursive solution, taking into account the properties of integrals over products of normalized Gaussian functions. Define L_i as the sub-integrand of L(D) considering only the first i observations,

L_1(D, x_2) = \int M_1 T_1 \, dx_1,
L_i(D, x_{i+1}) = \int M_i T_i L_{i−1} \, dx_i, for 2 ≤ i ≤ N − 1,
L(D) = L_N(D) = \int M_N L_{N−1} \, dx_N.

Now, consider that the integral of a product of two normalized Gaussians sharing a parameter x, with means and variances denoted by c_i and ϕ_i respectively, is itself a normalized Gaussian (Sec. 11),

\int dx \prod_{i=1}^{2} N(x, c_i, ϕ_i) = N(c_1, c_2, ϕ_1 + ϕ_2).   (11)

Hence, L_1 is a normalized Gaussian in parameter x_2,

L_1(D, x_2) = \int dx_1 \, M_1 T_1 = N(o_1, x_2, ε_1 + ω_1).

This implies that L_2(D, x_3), which is an integral over positions x_2, can now be written as an integral over three normalized Gaussians, all of which share parameter x_2,

L_2(D, x_3) = \int dx_2 \, M_2 T_2 \int dx_1 \, M_1 T_1 = \int dx_2 \, M_2 T_2 L_1.   (12)

Similarly, the integral of a product of three normalized Gaussians sharing the integrated parameter is itself a product of two normalized Gaussians (Sec.
11),

\int dx \prod_{i=1}^{3} N(c_i, x, ϕ_i) = N(c_1, c_2, ϕ_1 + ϕ_2) \, N(c_3, c', γ),   (13)

where

c' = \frac{c_1 ϕ_2 + c_2 ϕ_1}{ϕ_1 + ϕ_2}, and γ = \frac{ϕ_1 ϕ_2 + ϕ_1 ϕ_3 + ϕ_2 ϕ_3}{ϕ_1 + ϕ_2}.

Hence, applying Eq. 13 to Eq. 12, we find that

L_2 = \int dx_2 \, M_2 T_2 L_1 = N(o_1, o_2, (ε_1 + ω_1) + ε_2) \, N(x_3, μ_2, η_2)

is a product of two normalized Gaussians, one of which depends on x_3 and the other is a constant with respect to X. The variables μ_2 and η_2 follow from Eq. 11 and Eq. 13, and are the second pair of values in a recursive solution. Since all subsequent integrals, barring the last integral, can be expressed as a product of three normalized Gaussians, we can express the recursion variables as

μ_1 = o_1, η_1 = ε_1 + ω_1, and α_1 = η_1 + ε_2,   (14)

and, for 2 ≤ i ≤ N − 1,

μ_i = \frac{μ_{i−1} ε_i + η_{i−1} o_i}{α_{i−1}}, η_i = \frac{η_{i−1} ε_i}{α_{i−1}} + ω_i, and α_i = η_i + ε_{i+1}.   (15)

Finally, this allows us to express our integrands L_i as

L_1 = N(x_2, μ_1, η_1),
L_i = N(x_{i+1}, μ_i, η_i) \prod_{k=1}^{i−1} N(o_{k+1}, μ_k, α_k), for 2 ≤ i ≤ N − 1,
L(D) = L_N = \prod_{k=1}^{N−1} N(o_{k+1}, μ_k, α_k).   (16)

Equation 16 is the final form of the recursive solution for L(D), which is simply the product of N − 1 normalized Gaussians, each of which has parameters that come from a recursive relationship on o_i, ε_i, and ω_i. The value of D that maximizes L(D) is the maximum likelihood estimate.

Laplace Method

An independent solution for Eq. 9 can be obtained using the Laplace method, which is based on integrating the second moment of the Taylor expansion of the exponential component of a function. Given that the second moment of a Taylor expansion is quadratic, this ensures that the function under the integral is always a Gaussian function (21). Another caveat to the Laplace method is that the Taylor expansion has to occur about the peak of the exponential, so that the first moment of the Taylor expansion goes to 0.
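The standard multivariate Gaussian integral at the heart of the Laplace method, ∫ dX exp[−½ XᵀMX] = √((2π)^N / det M) for positive definite M, can be checked numerically on a small example (the 2×2 matrix here is an arbitrary choice of ours):

```python
import numpy as np

# Numerical check of the Gaussian integral used by the Laplace method:
# the integral of exp(-0.5 * y^T M y) over R^N equals sqrt((2*pi)^N / det M)
# for positive definite M. The 2x2 matrix below is an arbitrary example.
M = np.array([[2.0, -0.5],
              [-0.5, 1.5]])
closed_form = np.sqrt((2.0 * np.pi) ** 2 / np.linalg.det(M))

# Brute-force Riemann sum on a grid wide enough that the integrand has decayed.
g = np.linspace(-8.0, 8.0, 801)
h = g[1] - g[0]
X, Y = np.meshgrid(g, g)
pts = np.stack([X.ravel(), Y.ravel()], axis=1)
numeric = np.exp(-0.5 * np.einsum('ni,ij,nj->n', pts, M, pts)).sum() * h * h
```

The Riemann sum converges rapidly here because the integrand and all its derivatives vanish at the grid boundary.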
To perform the Laplace method, we express our likelihood L(D) = P(O | D) in terms of exponential and non-exponential components,

L(D) = \int dX \, f(X) = \int dX \, h(X) \exp[−g(X)],

where f(X) is simply the integrand of Eq. 9,

f(X) = h(X) \exp[−g(X)] = \prod_{i=1}^{N} M_i \prod_{j=1}^{N−1} T_j.   (17)

Thus, using equations 45 and 46, we see that h = h(X) is independent of X and g(X) is quadratic in X,

h = \prod_{i=1}^{N} \frac{1}{\sqrt{2πε_i}} \prod_{i=1}^{N−1} \frac{1}{\sqrt{2πω_i}},
g(X) = \sum_{i=1}^{N} \frac{(o_i − x_i)^2}{2ε_i} + \sum_{i=1}^{N−1} \frac{(x_{i+1} − x_i)^2}{2ω_i}.

The maximum likelihood estimate \hat{X} of the actual positions X, given D and O, will be wherever the integrand is maximized, and since g(X) ≥ 0,

\hat{X} = \mathrm{argmax}_X f(X) = \mathrm{argmin}_X g(X).

Now, given that g(X) is quadratic, a second order Taylor expansion of g about \hat{X} is exact, and the Laplace method will provide an exact solution for L(D), as the integral can be shown to take the form of a standard Gaussian integral. To see this, we write out the second order Taylor expansion

\int dX \, f(X) = h \int dX \exp\left[ −g(\hat{X}) − ∇g(\hat{X})(X − \hat{X}) − \frac{1}{2}(X − \hat{X})^⊤ ∇∇g(\hat{X})(X − \hat{X}) \right].   (18)

Since \hat{X} is the minimum of g(X), the gradient ∇g(\hat{X}) = 0, and we can rearrange Eq. 18 to extract all terms independent of X,

\int dX \, f(X) = f(\hat{X}) \int dX \exp\left[ −\frac{1}{2}(X − \hat{X})^⊤ ∇∇g(\hat{X})(X − \hat{X}) \right].   (19)

Furthermore, since h is independent of X, we know that −∇∇ \ln f(X) = ∇∇g(X) = M, where M can be thought of as the inverse of the covariance matrix for the multivariate Gaussian, or equivalently as the Hessian matrix of −\ln f(X). Substituting M for ∇∇g(X) in Eq. 19, we are left with a Gaussian integral with the solution

L(D) = f(\hat{X}) \int dX \exp\left[ −\frac{1}{2}(X − \hat{X})^⊤ M (X − \hat{X}) \right] = f(\hat{X}) \sqrt{\frac{(2π)^N}{\det M}}.   (20)

The Hessian matrix M is independent of X and is symmetric tri-diagonal with non-zero elements

M_{1,1} = −\frac{∂^2 \ln f}{∂x_1^2} = \frac{1}{ε_1} + \frac{1}{ω_1},
M_{i,i} = −\frac{∂^2 \ln f}{∂x_i^2} = \frac{1}{ε_i} + \frac{1}{ω_i} + \frac{1}{ω_{i−1}}, for 2 ≤ i ≤ N − 1,
M_{N,N} = −\frac{∂^2 \ln f}{∂x_N^2} = \frac{1}{ε_N} + \frac{1}{ω_{N−1}},
M_{i,i+1} = M_{i+1,i} = −\frac{∂^2 \ln f}{∂x_i ∂x_{i+1}} = −\frac{1}{ω_i}, for 1 ≤ i ≤ N − 1.   (21)

We also require \hat{X} to compute Eq. 20, which can be solved for with the relation −∇\ln f(\hat{X}) = 0 (Sec. 14), giving

\hat{X} = M^{−1} Θ,   (22)

where Θ = \{θ_i = o_i/ε_i\}_{i=1}^{N}.

Markov Method
In this section we present a derivation of the likelihood function L(D) utilizing a technique developed by Andrey Markov (22) and generalized by Chandrasekhar (23). Markov's method allows us to transform P(O | D) (Eq. 9) from a function of the N observed positions O = \{o_i\}_{i=1}^{N} into a function of the N − 1 discrete steps (displacements) between subsequent observations, S = \{s_i = o_{i+1} − o_i\}_{i=1}^{N−1}.

This is possible because the spatial invariance of diffusion means L(D) depends not on the absolute spatial positions O, but only their relative displacements, S. Thus we should expect that P(O | D) can also be expressed as P(S | D); however Eq. 10, which defines P(O | D), cannot be directly transformed into a function on S. This is where Markov's method allows us to solve for a function P(S | D) = P(O | D) for a given D value. For a particular fixed S and dS of interest, the value of P(S | D) dS gives the probability that variable S' is within the bounds

S − \frac{1}{2} dS ≤ S' ≤ S + \frac{1}{2} dS.   (23)

More formally, P(S | D) dS is the integral over the volume dS around the point of interest S, and we let S' represent the variable integrated over,

P(S | D) dS = \int_{S − \frac{1}{2}dS}^{S + \frac{1}{2}dS} dS' \, P(O | D).

The issue remains that P(O | D) is expressed in a basis of O rather than of S, and integrating with respect to bounds in a different basis is non-trivial. In order to circumvent this issue, Markov utilized a product of Dirichlet integrals, Ξ(S') = \prod_{k=1}^{N−1} ξ_k(s'_k), to expand the limits of integration to all space,

P(S | D) dS = \int dS' \, Ξ(S') P(O | D).   (24)

The idea is that for each dimension of S, the Dirichlet integral ξ_k(s'_k) acts like a continuous indicator function determining if s'_k is within the bounds of Eq. 23,

ξ_k(s'_k) = \frac{1}{π} \int dρ_k \frac{\sin(\frac{1}{2} ds_k ρ_k)}{ρ_k} \exp[ıρ_k(s'_k − s_k)],

so that

ξ_k(s'_k) = \begin{cases} 1 & s_k − \frac{1}{2}ds_k ≤ s'_k ≤ s_k + \frac{1}{2}ds_k \\ 0 & \text{otherwise}. \end{cases}

Therefore Ξ(S') is the indicator function acting over the whole space of S', determining if S' is within the volume dS around our point of interest S,

Ξ(S') = \prod_{k=1}^{N−1} ξ_k(s'_k) = \begin{cases} 1 & \bigwedge_{k=1}^{N−1} s'_k ∈ [s_k − \frac{1}{2}ds_k, \, s_k + \frac{1}{2}ds_k] \\ 0 & \text{otherwise}. \end{cases}

This puts Eq. 24 in the form

P(S | D) dS = \int dS' \frac{1}{π^{N−1}} \int dρ \left[ \prod_{k=1}^{N−1} \frac{\sin(\frac{1}{2} ds_k ρ_k)}{ρ_k} \right] \exp[ı ρ^⊤ (S' − S)] P(O | D),   (25)

where ρ = \{ρ_k\}_{k=1}^{N−1} is a vector of conjugate coordinates to S'. We can then rearrange Eq. 25 to move the integral over dS' and all factors dependent on S' into a function Λ(ρ),

P(S | D) dS = \frac{1}{π^{N−1}} \int dρ \left[ \prod_{k=1}^{N−1} \frac{\sin(\frac{1}{2} ds_k ρ_k)}{ρ_k} \right] \exp[−ı ρ^⊤ S] Λ(ρ).
(26)

Now, we can interpret Λ(ρ) as the characteristic function of P(O | D) in the S' basis, and it has the form

Λ(ρ) = \int dS' \exp[ı ρ^⊤ S'] P(O | D).   (27)

The form of Eq. 27 implies that Λ(ρ) is the inverse Fourier transform of P(O | D). Due to the properties of the Fourier transform, we assert Λ(ρ) is a bounded function with a finite integral because P(O | D) is expressed as a finite product of Gaussians with non-zero variance, hence it is a continuous function (24). Given that \int dρ \, Λ(ρ) is bounded and dS is small, we can approximate the product of sinc functions in Eq. 26 as

\prod_{k=1}^{N−1} \frac{\sin(\frac{1}{2} ds_k ρ_k)}{ρ_k} ≈ \prod_{k=1}^{N−1} \frac{ds_k}{2} = \frac{dS}{2^{N−1}}.

Thus Eq. 26 becomes the Fourier transform of Λ(ρ),

P(S | D) dS = \frac{dS}{(2π)^{N−1}} \int dρ \exp[−ı ρ^⊤ S] Λ(ρ).   (28)

We are now interested in evaluating Λ(ρ) explicitly. To do so we note that \int dS' \, P(O | D) = 1, since P(O | D) is a probability distribution that, as we have argued, can be equivalently expressed in the S basis as P(S | D) (Sec. 14). With this understanding we evaluate Λ(ρ) by expanding the exponential under integration in Eq. 27 as a Taylor series about the origin,

\exp[ı ρ^⊤ S'] = 1 + ı \sum_{j=1}^{N−1} ρ_j s'_j − \frac{1}{2} \sum_{j=1}^{N−1} ρ_j^2 s'^2_j − \sum_{j=1}^{N−1} \sum_{k=j+1}^{N−1} ρ_j ρ_k s'_j s'_k + O(ρ^3).

Solving the integral for Λ(ρ) with the Taylor expanded exponential term, given that we know P(O | D) is a normalized probability density under S', allows us to write Λ(ρ) in terms of expected values for s_k (using ⟨·⟩ to represent expectation),

Λ(ρ) = 1 + ı \sum_{j=1}^{N−1} ρ_j ⟨s'_j⟩ − \frac{1}{2} \sum_{j=1}^{N−1} ρ_j^2 ⟨s'^2_j⟩ − \sum_{j=1}^{N−1} \sum_{k=j+1}^{N−1} ρ_j ρ_k ⟨s'_j s'_k⟩ + O(ρ^3).   (29)

We take the approach of solving for the expectation values on S' to see if those results will simplify the expression in Eq. 29. If we set one of the observations, o_i, as a constant, we find that we can marginalize all of P(O | D) except for the terms that are independently represented by the basis we are interested in. In other words, we find that

⟨s'_i⟩ = \int ds'_i \, s'_i \, N(s'_i, 0, ε_i + ε_{i+1} + ω_i) = 0,
⟨s'^2_i⟩ = \int ds'_i \, s'^2_i \, N(s'_i, 0, ε_i + ε_{i+1} + ω_i) = ε_i + ε_{i+1} + ω_i,
⟨s'_i s'_{i+1}⟩ = \int ds'_i \, ds'_{i+1} \, s'_i s'_{i+1} \, N(S_b, 0, Σ_b) = −ε_{i+1},

where the substitution of variables required for integrating the expectation of ⟨s'_i s'_{i+1}⟩ induces a bivariate Gaussian function with location parameters S_b = [s_i, s_{i+1}]^⊤ and covariance matrix

Σ_b = \begin{bmatrix} ω_i + ε_i + ε_{i+1} & −ε_{i+1} \\ −ε_{i+1} & ω_{i+1} + ε_{i+1} + ε_{i+2} \end{bmatrix}.

Furthermore, we find that the separability of two non-adjacent displacements results in the relation ⟨s'_i s'_{i+k}⟩ = ⟨s'_i⟩⟨s'_{i+k}⟩ = 0 for k > 1. Since P(O | D) is a multivariate Gaussian with 0 mean, we can apply Isserlis's theorem to solve for all of the moments of the distribution (25) given knowledge of the second moments of the Fourier transform on P(O | D), which can be expressed as a covariance matrix. This allows us to express Eq. 29 as

Λ(ρ) = \exp\left[ −\frac{1}{2} ρ^⊤ Σ ρ \right].   (30)

The covariance matrix Σ is symmetric tri-diagonal, with non-zero elements

Σ_{i,i} = ω_i + ε_i + ε_{i+1},
Σ_{i,i+1} = Σ_{i+1,i} = −ε_{i+1}.   (31)

The expression in Eq. 30 is well known as the characteristic function of a multivariate Gaussian. Substituting Eq.
30 into Eq. 28 and factoring out dS gives

L(D) = P(S | D) = \frac{1}{\sqrt{(2π)^{N−1} \det Σ}} \exp\left[ −\frac{1}{2} S^⊤ Σ^{−1} S \right].   (32)

We have presented three independent solutions to the experimental diffusion likelihood L(D) (Eq. 10): the recursive method (Sec. 3), the Laplace method (Sec. 4), and the Markov method (Sec. 5). While each method requires separate consideration, several features are common to all of the implementations. The separability of the problem allows us to estimate diffusion constants for inputs of any dimension using the 1D algorithms (Eq. 3). The inputs to the algorithms are: (1) the observed particle locations, O = \{o_i\}_{i=1}^{N}; (2) the observation times, T = \{t_i\}_{i=1}^{N}; (3) the measurement variance for each observation, V = \{v_i\}_{i=1}^{N}; (4) the exposure of each frame, t_ε; and (5) one or more diffusion constants D at which to evaluate the likelihood. The output for each D value is \ln(L(D)). The logarithm of the likelihood makes the computation of products and exponentials much faster, and avoids the problem of numerical underflow for very small values of L(D). Additionally, because the logarithm is a strictly monotonically increasing function, \mathrm{argmax}_D L(D) = \mathrm{argmax}_D \ln(L(D)), so the maximum likelihood estimate is identical for the log-likelihood.

The recursive algorithm follows directly from the recursively defined variables (Eqs. 14 and 15) and the expression of L(D) as a product of Gaussians (Eq. 16). The recursive expressions for α_i, η_i, and μ_i are causal (the i-terms depend only on the (i − 1)-terms), enabling their computation in a simple for loop over N. Noting that the logarithm of a normalized Gaussian is

\ln N(a, b, v) = −\frac{1}{2} \left[ \ln(2π) + \ln(v) + \frac{(a − b)^2}{v} \right],   (33)

we apply Eq. 33 directly to Eq. 16 to arrive at a computationally efficient form for the recursive solution of the log-likelihood,

\ln L(D) = \sum_{i=1}^{N−1} \ln N(o_{i+1}, μ_i, α_i) = −\frac{1}{2} \left[ (N − 1)\ln(2π) + \sum_{i=1}^{N−1} \ln(α_i) + \sum_{i=1}^{N−1} \frac{(o_{i+1} − μ_i)^2}{α_i} \right].

Of all the methods, the recursive method is the simplest to implement and the most computationally efficient and numerically stable.
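As a sketch of how the recursion of Eqs. 14 and 15 translates into code (our illustration; the variable names and interface are ours, and the blur-corrected variances v_i − Dt_ε/3 must remain positive for the inputs used):

```python
import numpy as np

def recursive_log_likelihood(o, v, dt, t_eps, D):
    """Sketch of the recursive evaluation of ln L(D) (Eqs. 14, 15, 33).

    o     : observed 1D positions (length N)
    v     : localization variances for each observation (length N)
    dt    : time steps between observations (length N-1)
    t_eps : exposure time of a single frame
    D     : diffusion constant at which to evaluate the log-likelihood
    """
    o, v, dt = map(np.asarray, (o, v, dt))
    eps = v - D * t_eps / 3.0       # blur-corrected measurement variances eps_i(D)
    omega = 2.0 * D * dt            # transition variances omega_i(D)
    mu, eta = o[0], eps[0] + omega[0]
    ll = 0.0
    for i in range(len(o) - 1):
        alpha = eta + eps[i + 1]    # predictive variance of o_{i+1}
        ll -= 0.5 * (np.log(2.0 * np.pi * alpha) + (o[i + 1] - mu) ** 2 / alpha)
        if i + 1 < len(o) - 1:      # update the recursion variables (Eq. 15)
            mu = (mu * eps[i + 1] + eta * o[i + 1]) / alpha
            eta = eta * eps[i + 1] / alpha + omega[i + 1]
    return float(ll)
```

Because the i-terms depend only on the (i − 1)-terms, the loop runs in O(N) time with constant memory per evaluated D value.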
The computational core of the Laplace method centers around the Hessian matrix M (Eq. 21). This matrix is symmetric tri-diagonal, which means all non-zero elements are on the main diagonal and the diagonals immediately above and below. Using M we can solve the linear system \hat{X} = M^{−1}Θ (Eq. 22) to obtain the maximum likelihood estimates \hat{X} for the true particle locations. Typically, solving large linear systems is expensive, but since M is tri-diagonal there are algorithms to solve this system in linear time (26). We refer the reader to Sec. 15 for the details of tri-diagonal matrix algorithms and our implementation.

Given a solution for \hat{X}, we can use the definition of f(X) in Eq. 17 along with Eq. 33 to compute

\ln f(\hat{X}) = −\frac{1}{2} \left[ \sum_{i=1}^{N} \ln(2πε_i) + \sum_{i=1}^{N} \frac{(o_i − \hat{x}_i)^2}{ε_i} + \sum_{i=1}^{N−1} \ln(2πω_i) + \sum_{i=1}^{N−1} \frac{(\hat{x}_{i+1} − \hat{x}_i)^2}{ω_i} \right].   (34)

Finally we can compute the log-likelihood using the Laplace solution of Eq. 20, finding that

\ln L(D) = \ln f(\hat{X}) + \frac{N}{2}\ln(2π) − \frac{1}{2} \ln \det M = −\frac{1}{2} \left[ (N − 1)\ln(2π) + \sum_{i=1}^{N−1} \ln(ω_i) + \sum_{i=1}^{N−1} \frac{(\hat{x}_{i+1} − \hat{x}_i)^2}{ω_i} + \sum_{i=1}^{N} \ln(ε_i) + \sum_{i=1}^{N} \frac{(o_i − \hat{x}_i)^2}{ε_i} + \ln \det M \right].   (35)

Finally, the Markov method computation, like the Laplace method, is centered around matrix computations. In this case, the matrix of interest is the (N − 1)-dimensional covariance matrix Σ (Eq. 31), which also happens to be symmetric tri-diagonal, so the same linear-time algorithms used in the Laplace method are applicable (Sec. 15). For the Markov method computation we first solve the linear system Φ = Σ^{−1}S, then apply this solution along with the tri-diagonal log-determinant algorithm to compute the logarithm of the likelihood expression from Eq. 32, giving

\ln L(D) = −
12 [( N −
1) ln(2 π ) + S (cid:124) Φ + ln det Σ] . To demonstrate the benefits of including individual localization errors in the estimation, we opt for a loss function toevaluate the quality of the maximum likelihood estimators (MLE) under consideration. The mean squared error is apopular choice for evaluating estimators, but it is not a sufficient loss function for the estimation of D , where the meansquared error shows over penalization for higher estimations as it has a bounded loss at ˆ D = 0 and an unbounded lossat ˆ D = ∞ . Instead, we will use the squared log loss function (27) (cid:96) ( D, ˆ D ) = (cid:16) ln( D ) − ln( ˆ D ) (cid:17) . The squared log loss function is similar to the mean squared error, except that the squared distance is between thelogarithms of D and ˆ D . From (28), we see that the variance term, D , scales with the observed data as a logarithm,hence the choice of the squared log loss function represents a metric between the expected data given by D and ˆ D .For one set of SPT simulations, the true trajectory coordinates were first generated with the pure diffusion model.Full frame motion blur was accounted for and localization errors were independently and identically drawn fromeither a Uniform or a Gamma distribution to test the effects of variable localization errors without attributing successto a particular choice of distribution. As a control, a set of simulations with a constant localization error for allobservations was generated to show that the likelihood distribution represented by the class of diffusion estimators thatonly recognize a Scalar Localization Error (SLE) was exactly the same as the distribution represented by either of ourthree derived methods which recognize a Vector of Localization Errors (VLE). 
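As a concrete illustration, the Markov-method likelihood can be evaluated in a few lines. The sketch below is not the authors' C++/MATLAB implementation; the function and variable names are ours, and it assumes the variance forms given in the text and supporting material (\omega_i = 2D\delta t_i and the motion-blur-corrected \varepsilon_i = v_i - Dt_\epsilon/3). The displacement covariance \Sigma is symmetric tri-diagonal, so both \Phi = \Sigma^{-1}S and \ln\det\Sigma come from linear-time banded routines.

```python
import numpy as np
from scipy.linalg import solveh_banded, cholesky_banded

def markov_loglik(D, o, v, dt, t_eps):
    """ln L(D) via the Markov method: the displacements s_i = o_{i+1} - o_i
    are jointly normal with a symmetric tri-diagonal covariance Sigma, so
    Phi = Sigma^{-1} S and ln det Sigma are both O(N) banded computations."""
    o, v, dt = (np.asarray(a, float) for a in (o, v, dt))
    s = np.diff(o)                        # displacement vector S, length N-1
    eps = v - D * t_eps / 3.0             # blur-corrected localization variances
    omega = 2.0 * D * dt                  # pure-diffusion variances, length N-1
    ab = np.zeros((2, s.size))            # upper banded storage of Sigma
    ab[1] = omega + eps[:-1] + eps[1:]    # diagonal: <s_i^2>
    ab[0, 1:] = -eps[1:-1]                # off-diagonal: <s_i s_{i+1}> = -eps_{i+1}
    phi = solveh_banded(ab, s)            # Phi = Sigma^{-1} S
    logdet = 2.0 * np.log(cholesky_banded(ab)[1]).sum()
    return -0.5 * (s.size * np.log(2.0 * np.pi) + s @ phi + logdet)
```

A bounded scalar minimizer applied to -\ln L(D) (e.g. scipy.optimize.minimize_scalar) then plays the role that MATLAB's fminbnd plays in the paper's implementation.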
We then used the Gamma- and Uniform-distributed localization errors to compare the new VLE-based estimators, which account for individually measured localization errors, against the SLE-based estimators, for which we input the square root of the mean localization variance as the best representation of the scalar error for all trajectory observations. We performed 10,000 trajectory trials for each data point to estimate the risk of using a particular estimator, where risk is defined as

R(D, \hat{D}) = \big\langle \ell(D, \hat{D}) \big\rangle.

The squared log risk is the variance of \ln(\hat{D}) for an unbiased estimator.

The simulations with the gamma-distributed localization errors were generated according to

\sqrt{V} = \sigma \sim \mathrm{Gamma}\big(4, \langle\sqrt{V}\rangle/4\big),

where the standard gamma distribution p.d.f. is \mathrm{Gamma}(k, \theta) = \theta^{-k}\sqrt{V}^{\,k-1}\exp(-\sqrt{V}/\theta)/\Gamma(k), and \langle\sqrt{V}\rangle represents the mean error. The simulations with the uniform-distributed localization errors were generated according to

\sqrt{V} = \sigma \sim \mathrm{Uniform}\big(\tfrac{1}{2}\langle\sqrt{V}\rangle, \tfrac{3}{2}\langle\sqrt{V}\rangle\big),

where the uniform distribution p.d.f. is \mathrm{Uniform}(a, b) = 1/(b-a) for \sqrt{V} \in [a, b] and \mathrm{Uniform}(a, b) = 0 for all other values of \sqrt{V}.
Figure 1: Comparison of estimation risk of the Vector of Localization Errors (VLE) and Scalar Localization Error (SLE) MLEs with localization errors drawn from Gamma or Uniform distributions. The MLE was found with the fminbnd command from MATLAB on the calculated likelihood distributions with a fixed lower bound. Log-log plots (A) and (C) show the risk of using a particular estimator on trajectories of various lengths, with the other parameters set to typical experimental values and standard errors drawn from gamma (A) or uniform (C) distributions. Log-log plots (B) and (D) show the risk of using a particular estimator on trajectories of length 50 with various standard errors drawn from gamma (B) or uniform (D) distributions; all other parameters were set to 1 to study the effects of relative localization error. In both (A) and (C), a fiducial line shows that after 30 observations the risk for the methods decreases approximately \propto 1/N with trajectory length; in this regime the SLE estimators perform worse by a constant factor (to simulation precision) relative to the VLE estimators. In (B) and (D), the risk for the SLE estimators increases at a faster rate than for the VLE estimators, indicating that the localization errors provide valuable information for improving estimator reliability.

Figures 1A and C are shown in log space to show that, in the presence of sufficient information, the risk of the VLE estimators is less than the risk of the SLE estimators by a constant proportionality factor when a scalar localization error is used to parameterize a continuous distribution of localization errors.
In other words, for a given parameterized scalar localization error, the number of observations required to generate a \hat{D} estimate of the same quality is always less by a proportional factor for the VLE estimators than for the SLE estimators. Figures 1B and D are shown in log space to show how each estimator begins to fail in the presence of increasing relative localization error given a fixed number of trajectory observations (50). For these subplots, the VLE estimators show a noticeable improvement in estimator reliability when the relative error is equal to or greater than the true underlying D.

In an experimental trajectory, the distribution that parameterizes the localization error is typically a function of several environmental variables, so it can often appear arbitrary or specific to a particular experimental trial. In this manuscript, we focus on two simple distributions to provide a fair metric for validation over thousands of simulated trajectories. It is worthwhile to further investigate the precision increase of the VLE estimators over their SLE counterparts with localization error distributions of varying error variances. We do so by performing trials on trajectories with localization errors parameterized by the Gamma and Uniform distributions, but this time we vary the parameters that characterize the variance of these distributions without altering the value of the mean localization variance. Specifically, we run simulations where the shape parameter, k, of the gamma distribution is varied,

\sqrt{V} = \sigma \sim \mathrm{Gamma}\big(k, \langle\sqrt{V}\rangle/k\big),

and where the bounds of the uniform distribution are varied,

\sqrt{V} = \sigma \sim \mathrm{Uniform}\big([1-b]\langle\sqrt{V}\rangle, [1+b]\langle\sqrt{V}\rangle\big).
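These two mean-preserving error models are easy to reproduce. A short numpy sketch (the target mean \langle\sqrt{V}\rangle here is an arbitrary illustrative value, not one of the paper's simulation settings):

```python
import numpy as np

rng = np.random.default_rng(1)
mean_err = 0.1      # illustrative <sqrt(V)>; the paper's figures use their own values
n = 200_000

# Gamma(k, theta = <sqrt(V)>/k): mean k*theta = <sqrt(V)>, variance <sqrt(V)>^2/k,
# so varying k changes the spread without moving the mean.
for k in (1, 4, 13):
    sigma = rng.gamma(shape=k, scale=mean_err / k, size=n)
    print(f"gamma   k={k:2d}: mean={sigma.mean():.4f} var={sigma.var():.6f}")

# Uniform([1-b]<sqrt(V)>, [1+b]<sqrt(V)>): mean <sqrt(V)>, variance (<sqrt(V)>*b)^2/3.
for b in (0.1, 0.99):
    sigma = rng.uniform((1 - b) * mean_err, (1 + b) * mean_err, size=n)
    print(f"uniform b={b:.2f}: mean={sigma.mean():.4f} var={sigma.var():.6f}")
```

The printed sample means all sit at \langle\sqrt{V}\rangle while the sample variances track \langle\sqrt{V}\rangle^2/k and (\langle\sqrt{V}\rangle b)^2/3, respectively.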
Figure 2: Risk ratio of the Scalar Localization Error (SLE) and Vector of Localization Errors (VLE) MLEs for trajectories with localization errors drawn from Gamma and Uniform distributions of a single varying parameter but the same mean \langle\sqrt{V}\rangle. Log-log plot (A) shows the risk ratio of SLE and VLE for 5 Gamma distributions with the same mean localization error and different shape parameters, k. Increasing k reduces the variance of the gamma distribution; k = 1 is an exponential distribution and k = 13 approaches a Gaussian distribution. All of the gamma distributions in (A) have a variance of \langle\sqrt{V}\rangle^2/k. Log-log plot (B) shows the risk ratio of SLE and VLE for 5 Uniform distributions with the same mean localization error and different sampling boundaries, b. Reducing b reduces the variance of the uniform distribution; a given b in plot (B) corresponds to a variance of (\langle\sqrt{V}\rangle b)^2/3, so b = 0.1 gives a variance near 0. As long as the variance of \sqrt{V} is greater than 0, the VLE estimators outperform the SLE estimators.

We see in Fig. 2 that increasing the variance of the localization errors relative to their mean in these test distributions results in a growing disparity between the two classes of estimators. From Fig. 1, increasing \langle\sqrt{V}\rangle sets a minimum trajectory length beyond which estimates become reasonable; e.g., the linear decrease in risk for the gamma distribution of Fig.
1A is seen at shorter trajectories than for the uniform distribution of Fig. 1C, even though both the gamma and uniform distributions have the same variance, because the uniform distribution has a larger \langle\sqrt{V}\rangle. In Fig. 2, the constant precision improvement of the VLE estimators is seen in the regime where both estimators report risk values that scale linearly with trajectory length. Before that regime, the VLE estimators already perform significantly better at shorter trajectories, which is represented by the peaks seen in Fig. 2. These simulations, while overly simplified versions of real SPT trajectories, highlight the importance of characterizing each localization error individually for its associated localization in a trajectory.

Discussion and Conclusion
Starting from the fundamental diffusion likelihood expression (Eq. 9), we presented three independent solutions, each of which has different benefits and leads to a computational algorithm with different advantages. The recursive method provides the simplest solution and is numerically more stable than the other methods for estimating likelihoods when D approaches 0. The Laplace method has the advantage that the expected true positions \hat{X} are computed along with the likelihood, which may be useful in some applications. The Markov method was crucial for deriving the terms \varepsilon_i in the components M_i given knowledge of the true underlying probability distribution; this generality is the Markov method's main advantage. In terms of numerical accuracy and computational efficiency, the Markov method is better than the Laplace method, especially for very small D values, but it remains computationally inferior to the recursive method. For practical implementations, we recommend the recursive method unless the MLE of the true positions is also desired.

The method described here naturally allows for trajectories with missing or irregularly spaced localizations by decoupling the observation times t_i from the exposure time t_\epsilon. This is important when some localizations are missed: the gap between t_i and t_{i+1} becomes larger, but the exposure time remains the same. Thus, to correctly account for the motion blur, the effective weighting of the dependence of the observed position o_i on x_i and x_{i+1} changes, and our technique directly incorporates this effect.
Trajectory intermittency has been accounted for in prior studies (19), and in those same studies extensions to dynamic errors were suggested, but a convenient computational framework for an estimator that seamlessly factors in both trajectory intermittencies and dynamic error had not been explicitly worked out until now.

Numerical implementations of the likelihood forms resulting from the three derivations were tested to verify the equivalence among the likelihood forms. Since all three derivations began from the same set of first principles, the three likelihood calculations are essentially equivalent. We note, however, that our implementation remains unit agnostic, so a trajectory with a very small D value can easily be rescaled to appropriate units such that the value of D is closer to 1, where the numerical calculations are more robust.

Variable localization uncertainties can occur in practice from variable background intensities or photobleaching of the fluorescent label. We compared the performance of our VLE estimator to the current state-of-the-art SLE estimator using the squared log loss function and found a clear performance benefit when trajectories had variable localization uncertainties.

PKR and MJO contributed equally to this work. KAL, PJC and PKR conceived the project. PJC initiated the formulation of the estimation problem as an MLE given a set of observations. PKR derived the recursive, Markov, and Laplace methods. MJO derived the efficient algorithms for the three solution methods and the C++ and MATLAB implementations of the methods, generated the estimation accuracy results, and helped to simplify the presentation. All authors contributed to the writing and editing of the manuscript.
10 Acknowledgments
We wish to acknowledge Stan Steinberg and Michael Wester for reading our manuscript and providing enlightening discussions and helpful comments. Financial support for this work was provided primarily by the National Science Foundation grant 0954836. Additional support was provided by The New Mexico Spatiotemporal Modeling Center: NIH P50GM085273 (KAL), NIH grant 1R01GM100114 (KAL, MJO) and NIH grant 1R01NS071116 (KAL, MJO).
References

[1] Michael J Saxton and Ken Jacobson. Single-particle tracking: applications to membrane dynamics. Annual Review of Biophysics and Biomolecular Structure, 26:373–399, 1997.
[2] Michael J Saxton. Two-dimensional continuum percolation threshold for diffusing particles of nonzero radius. Biophysical Journal, 99(5):1490–1499, 2010.
[3] Arash Sanamrad, Fredrik Persson, Ebba G Lundius, David Fange, Arvid H Gynnå, and Johan Elf. Single-particle tracking reveals that free ribosomal subunits are not excluded from the Escherichia coli nucleoid. Proceedings of the National Academy of Sciences, 111(31):11413–11418, 2014.
[4] Christopher P Calderon, Michael A Thompson, Jason M Casolari, Randy C Paffenroth, and W E Moerner. Quantifying transient 3D dynamical phenomena of single mRNA particles in live yeast cell measurements. The Journal of Physical Chemistry B, 117(49):15701–15713, 2013.
[5] Nilah Monnier, Syuan-Ming Guo, Masashi Mori, Jun He, Péter Lénárt, and Mark Bathe. Bayesian approach to MSD-based analysis of particle motion in live cells. Biophysical Journal, 103(3):616–626, 2012.
[6] Khuloud Jaqaman, Dinah Loerke, Marcel Mettlen, Hirotaka Kuwata, Sergio Grinstein, Sandra L Schmid, and Gaudenz Danuser. Robust single-particle tracking in live-cell time-lapse sequences. Nature Methods, 5(8):695–702, 2008.
[7] Arnauld Sergé, Nicolas Bertaux, Hervé Rigneault, and Didier Marguet. Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nature Methods, 5(8):687–694, 2008.
[8] Nicolas Chenouard, Ihor Smal, Fabrice De Chaumont, Martin Maška, Ivo F Sbalzarini, Yuanhao Gong, Janick Cardinale, Craig Carthel, Stefano Coraluppi, Mark Winter, et al. Objective comparison of particle tracking methods. Nature Methods, 2014.
[9] Alexander D Mont, Christopher P Calderon, and Aubrey B Poore. A new computational method for ambiguity assessment of solutions to assignment problems arising in target tracking. In SPIE Defense + Security, pages 90920J–90920J. International Society for Optics and Photonics, 2014.
[10] Ji Won Yoon, Andreas Bruckbauer, William J Fitzgerald, and David Klenerman. Bayesian inference for improved single molecule fluorescence tracking. Biophysical Journal, 94(12):4932–4947, 2008.
[11] Carlas S Smith, Nikolai Joseph, Bernd Rieger, and Keith A Lidke. Fast, single-molecule localization that achieves theoretically minimum uncertainty. Nature Methods, 7(5):373–375, May 2010.
[12] H Qian, M P Sheetz, and E L Elson. Single particle tracking. Analysis of diffusion and flow in two-dimensional systems. Biophysical Journal, 60(4):910–921, October 1991.
[13] Douglas S Martin, Martin B Forstner, and Josef A Käs. Apparent subdiffusion inherent to single particle tracking. Biophysical Journal, 83(4):2109–2117, 2002.
[14] Thierry Savin and Patrick S Doyle. Static and dynamic errors in particle tracking microrheology. Biophysical Journal, 88(1):623–638, January 2005.
[15] Xavier Michalet. Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium. Physical Review E, 82(4):041914, October 2010.
[16] Xavier Michalet and Andrew J Berglund. Optimal diffusion coefficient estimation in single-particle tracking. Physical Review E, 85(6 Pt 1):061916, June 2012.
[17] D Montiel, H Cang, and H Yang. Quantitative characterization of changes in dynamical behavior for single-particle tracking studies. Journal of Physical Chemistry B, 110:19763–19770, 2006.
[18] A J Berglund. Statistics of camera-based single-particle tracking. Physical Review E, 82:011917, 2010.
[19] Bo Shuang, Chad P Byers, Lydia Kisley, Lin-Yung Wang, Julia Zhao, Hiroyuki Morimura, Stephan Link, and Christy F Landes. Improved analysis for determining diffusion coefficients from short, single-molecule trajectories with photoblinking. Langmuir, 29(1):228–234, January 2013.
[20] Christian L Vestergaard, Paul C Blainey, and Henrik Flyvbjerg. Optimal estimation of diffusion coefficients from single-particle trajectories. Physical Review E, 89(2):022726, 2014.
[21] P S de Laplace. Mémoire sur les suites récurro-récurrentes et sur leurs usages dans la théorie des hasards. Mém. Acad. Roy. Sci. Paris, 6:353–371, 1774.
[22] A A Markov. Wahrscheinlichkeitsrechnung. BG Teubner, 1912.
[23] S Chandrasekhar. Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15:1–89, January 1943.
[24] Kevin Cahill. Physical Mathematics. Cambridge University Press, 2013.
[25] L Isserlis. On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12(1/2):134–139, 1918.
[26] Moawwad E A El-Mikkawy. On the inverse of a general tridiagonal matrix. Applied Mathematics and Computation, 150(3):669–679, 2004.
[27] L Brown. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. The Annals of Mathematical Statistics, pages 29–48, 1968.
[28] Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. Bayesian Data Analysis, volume 2. Taylor & Francis, 2014.
[29] Sheldon M Ross et al. Stochastic Processes, volume 2. John Wiley & Sons, New York, 1996.
[30] Conte de Boor. Elementary Numerical Analysis. McGraw-Hill, 1972.
[31] Moawwad E A El-Mikkawy. On the inverse of a general tridiagonal matrix. Applied Mathematics and Computation, 150(3):669–679, 2004.
SUPPORTING MATERIAL: Estimation of the Diffusion Constant from Intermittent Trajectories with Variable Position Uncertainties
Contents
13 Explicit Derivations on the Problem Formulation
14 Method Component Derivations
15 Implementation
The derivations presented in the main text make heavy use of normalized Gaussian functions, which we denote as a function of three arguments,

N(a, b, v) = \frac{1}{\sqrt{2\pi v}} \exp\left[-\frac{(a-b)^2}{2v}\right].

The Gaussian function defined this way is symmetric with respect to the first two position parameters, so that N(a, b, v) = N(b, a, v), and the variance of the function is given by the third parameter. Also, the normalization factor ensures that the Gaussian integrated over all space with respect to either of its position parameters is unity,

\int_{-\infty}^{\infty} da\, N(a, b, v) = \int_{-\infty}^{\infty} db\, N(a, b, v) = 1.

As a corollary, a normalized Gaussian has a useful scaling identity: for q > 0,

N(a, b, v) = q\, N(qa, qb, q^2 v).

Next, consider the case of the product of two normalized Gaussians sharing a common position parameter, which can be rewritten as a product of two normalized Gaussians in which the common parameter appears in only one of the two,

N(x, \mu_1, \eta_1)\, N(x, \mu_2, \eta_2) = N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(x, \mu', \eta'), (36)

where

\mu' = \frac{\mu_1 \eta_2 + \mu_2 \eta_1}{\eta_1 + \eta_2} \quad\text{and}\quad \eta' = \frac{\eta_1 \eta_2}{\eta_1 + \eta_2}.

Using Eq. 36, the integral of the product of two normalized Gaussians over a shared position parameter is itself a normalized Gaussian in the other two (unintegrated) position parameters,

\int dx\, N(x, \mu_1, \eta_1)\, N(x, \mu_2, \eta_2) = N(\mu_1, \mu_2, \eta_1 + \eta_2) \int dx\, N(x, \mu', \eta') = N(\mu_1, \mu_2, \eta_1 + \eta_2). (37)

With two successive applications of Eq. 36, the product of three Gaussians which share a common position parameter becomes

N(x, \mu_1, \eta_1)\, N(x, \mu_2, \eta_2)\, N(x, \mu_3, \eta_3) = N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(x, \mu', \eta')\, N(x, \mu_3, \eta_3)
= N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(\mu_3, \mu', \eta' + \eta_3)\, N(x, \mu'', \eta''),

where

\mu'' = \frac{\mu' \eta_3 + \mu_3 \eta'}{\eta' + \eta_3} \quad\text{and}\quad \eta'' = \frac{\eta' \eta_3}{\eta' + \eta_3}. (38)

Finally, Eq.
38 allows the integral of three normalized Gaussians over a shared position parameter to reduce to

\int dx\, N(x, \mu_1, \eta_1)\, N(x, \mu_2, \eta_2)\, N(x, \mu_3, \eta_3) = N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(\mu_3, \mu', \eta' + \eta_3) \int dx\, N(x, \mu'', \eta'')
= N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(\mu_3, \mu', \eta' + \eta_3)
= N(\mu_1, \mu_2, \eta_1 + \eta_2)\, N(\mu_3, \mu', \gamma),

where

\gamma = \frac{\eta_1\eta_2 + \eta_1\eta_3 + \eta_2\eta_3}{\eta_1 + \eta_2}. (39)

In the main manuscript, the likelihood distribution for a 1D random walk given a set of observations, O = \{o_i\}_{i=1}^{N}, is described as an integral that marginalizes over the N+1 unknown true positions X = \{x_i\}_{i=1}^{N+1},

P(O|D) = \int_{\mathbb{R}^{N+1}} dX \prod_{i=1}^{N} P(o_i | x_i, x_{i+1})\, P(x_{i+1} | x_i). (40)

We seek to rigorously define our formalism to justify the analysis used in the rest of the manuscript.

Under typical biological SPT experiments, the temporal resolution of a probe is high enough that the effects of diffraction are much greater than the effects of particle motion in generating the point spread function of the image. Following the assumption that the variance due to diffraction is significantly greater than the variance due to motion, the point spread function can be defined as a stationary Gaussian function centered about the average position of the particle during a single frame, with some background offset. With these considerations, we now set out to define P(o_i, x_i, x_{i+1} | D): the probability of an observation and the particle's true start and end points in a frame, given the free diffusion model. Since the maximum likelihood estimator for a Gaussian function returns the peak (the maximum likelihood) and variance (the error) of a Gaussian distribution, there is sufficient information from the estimator to build a probability distribution relating the localization to the true averaged position of a particle.
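The product identity in Eq. 36 is the workhorse of these derivations, and it is easy to verify numerically. A minimal sketch (the numeric values are arbitrary):

```python
import numpy as np

def N(a, b, v):
    # normalized Gaussian of the text: symmetric in a and b, variance v
    return np.exp(-(a - b) ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

mu1, eta1, mu2, eta2 = 0.3, 1.5, -1.1, 0.7
mu_p = (mu1 * eta2 + mu2 * eta1) / (eta1 + eta2)   # mu' of Eq. 36
eta_p = eta1 * eta2 / (eta1 + eta2)                # eta' of Eq. 36

x = np.linspace(-8.0, 8.0, 10001)
lhs = N(x, mu1, eta1) * N(x, mu2, eta2)
rhs = N(mu1, mu2, eta1 + eta2) * N(x, mu_p, eta_p)
print(np.allclose(lhs, rhs))   # Eq. 36 holds pointwise -> True

# Eq. 37: integrating the product over x collapses to a single Gaussian value
dx = x[1] - x[0]
print((lhs * dx).sum(), N(mu1, mu2, eta1 + eta2))
```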
The probability of obtaining a localized position, o_i, given the true averaged position of the particle, \bar{y}_i, and the estimator variance, v_i, is

P(o_i | \bar{y}_i) = N(o_i, \bar{y}_i, v_i). (41)

We incorporate more information on \bar{y}_i by relating it to the start positions of the considered frame, i, and the subsequent frame, i+1, where it is assumed that a frame begins immediately after the prior frame ends. The probability distribution of o_i with frame start coordinates x_i and x_{i+1} is therefore expressed as

P(o_i, x_i, x_{i+1} | D) = \int d\bar{y}_i\, P(o_i | \bar{y}_i)\, P(\bar{y}_i | x_i, x_{i+1})\, P(x_{i+1} | x_i)\, P(x_i),

where the diffusion constant, D, is implicitly included in the probability distributions. The probabilities P(x_{i+1} | x_i) represent a pure diffusion process,

P(x_{i+1} | x_i) = N(x_{i+1}, x_i, \omega_i(D)), (42)

where \omega_i(D) = 2D\delta t_i and \delta t_i = t_{i+1} - t_i.

In section 13 we will explicitly derive P(\bar{y}_i | x_i, x_{i+1}), but here we state the result,

P(\bar{y}_i | x_i, x_{i+1}) = N\left(\bar{y}_i, \left(1 - \frac{t_\epsilon}{2\delta t_i}\right) x_i + \frac{t_\epsilon}{2\delta t_i} x_{i+1}, 2Dt_\epsilon\left[\frac{1}{3} - \frac{t_\epsilon}{4\delta t_i}\right]\right),

where t_\epsilon is the exposure time of a frame; in other words, the last photon observed in one frame can be arbitrarily spaced from the first photon in the next frame. Hence, trajectory intermittencies can be accounted for by redefining \delta t_i so that frame i+1 is the next frame that observes a photon from the particle under consideration, omitting all frames that do not provide measurement information. Given that P(\bar{y}_i | x_i, x_{i+1}) is in the form of a Gaussian function with \bar{y}_i as one of the location parameters, \bar{y}_i can be effectively marginalized in P(o_i, x_i, x_{i+1} | D), so that the expression reduces to

P(o_i, x_i, x_{i+1} | D) = P(o_i | x_i, x_{i+1})\, P(x_{i+1} | x_i)\, P(x_i).
When obtaining the global probability of all O given D, the probabilities P(x_i) will be conditioned on prior observations, so it becomes necessary to define the expression

P(o_i, x_{i+1} | x_i, D) = P(o_i | x_i, x_{i+1})\, P(x_{i+1} | x_i).

We now wish to obtain the full expression for P(O | D), which has no dependency on the true positions, X. To do this, first define the expression

P(O, X | D) = P(x_1) \prod_{i=1}^{N} P(o_i, x_{i+1} | x_i, D) = P(x_1) \prod_{i=1}^{N} P(o_i | x_i, x_{i+1})\, P(x_{i+1} | x_i).

Since X are nuisance parameters, we integrate the X variables over all configuration space and set P(x_1) = 1, because x_1 must be treated as known: the inference of a diffusion probability must have an origin to which all subsequent coordinates are related.

P(O | D) = \int_{\mathbb{R}^{N+1}} dX \prod_{i=1}^{N} P(o_i | x_i, x_{i+1})\, P(x_{i+1} | x_i). (43)

We will see in section 13 that the following expression, which is the focus of the main manuscript, is equivalent and easier to evaluate,

P(O | D) = \int_{\mathbb{R}^{N}} dX \prod_{i=1}^{N} N(o_i, x_i, \varepsilon_i(D)) \prod_{j=1}^{N-1} N(x_{j+1}, x_j, \omega_j(D)) = \int_{\mathbb{R}^{N}} dX \prod_{i=1}^{N} M_i \prod_{j=1}^{N-1} T_j, (44)

where

M_i = M_i(o_i, x_i) = N(o_i, x_i, \varepsilon_i(D)), for 1 \le i \le N, and (45)
T_i = T_i(x_{i+1}, x_i) = N(x_{i+1}, x_i, \omega_i(D)), for 1 \le i \le N-1. (46)
13 Explicit Derivations on the Problem Formulation
There were a few results presented in section 12 that are clarified in this section. The probability distribution of an averaged position given the start and end points of a frame is discussed and analytically solved in the continuous limit. Then the relation between the representation of P(O|D) with the probability distribution and the reduced measurement functions is investigated.

Given D, the probability density of a transition from point a to point b separated by a time T is

P(b | a) = N(b, a, 2DT).

If an intermediate point, y(t), sampled at a time t < T is considered, the joint probability density of a transition from a to y(t) and then from y(t) to b is

P(y(t), a, b) = P(a)\, N(y(t), a, 2Dt)\, N(b, y(t), 2D(T-t)) = P(a)\, N(b, a, 2DT)\, N\left(y(t), a\left(1 - \frac{t}{T}\right) + b\frac{t}{T}, 2D\frac{t}{T}(T-t)\right).

The probability density of the variable y(t), preconditioned on the end points a and b, is then

P(y(t) | b, a) = N\left(y(t), a\left(1 - \frac{t}{T}\right) + b\frac{t}{T}, 2D\frac{t}{T}(T-t)\right). (47)

Eq. 47 is the probability density for what is known as a Brownian bridge (29), or Brownian motion with preconditioned end points. It has mean and covariance

\langle y(t) \rangle = a\left(1 - \frac{t}{T}\right) + b\frac{t}{T}, \qquad \mathrm{cov}[y(t), y(s)] = 2D\left(s - \frac{st}{T}\right) \quad \text{for } s \le t \le T.

It is now of interest to find the probability density for a quantity that describes the integrated average of y(t),

\bar{y} = \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} dt\, y(t).

Since each y(t) is a normally distributed random variable, y(t) is a Gaussian process, and the time-averaged integral of a Gaussian process, \bar{y}, is also a normally distributed random variable. Therefore, from Isserlis' theorem (25), only the first two moments of \bar{y} are needed to determine its probability distribution.
The first moment is

\langle \bar{y} \rangle = \left\langle \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} dt\, y(t) \right\rangle = \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} dt\, \langle y(t) \rangle = \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} dt \left[ a\left(1 - \frac{t}{T}\right) + b\frac{t}{T} \right] = a\left(1 - \frac{t_\epsilon}{2T}\right) + b\frac{t_\epsilon}{2T} = (1-\alpha)a + \alpha b,

where for notational convenience we define \alpha = t_\epsilon/(2T). It follows that the second moment is

\langle \bar{y}^2 \rangle = \left\langle \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} dt\, y(t)\, \frac{1}{t_\epsilon} \int_{0}^{t_\epsilon} ds\, y(s) \right\rangle = \frac{1}{t_\epsilon^2} \int_{0}^{t_\epsilon} dt \int_{0}^{t_\epsilon} ds\, \langle y(t) y(s) \rangle
= \frac{1}{t_\epsilon^2} \int_{0}^{t_\epsilon} dt \int_{0}^{t_\epsilon} ds \big( \mathrm{cov}[y(t), y(s)] + \langle y(t) \rangle \langle y(s) \rangle \big) = \langle \bar{y} \rangle^2 + \frac{1}{t_\epsilon^2} \int_{0}^{t_\epsilon} dt \int_{0}^{t_\epsilon} ds\, \mathrm{cov}[y(t), y(s)].

Therefore, the variance of \bar{y} is

\langle \bar{y}^2 \rangle - \langle \bar{y} \rangle^2 = \frac{1}{t_\epsilon^2} \int_{0}^{t_\epsilon} dt \left[ \int_{t}^{t_\epsilon} ds\, 2D\left(t - \frac{st}{T}\right) + \int_{0}^{t} ds\, 2D\left(s - \frac{st}{T}\right) \right] = 2Dt_\epsilon \left[ \frac{1}{3} - \frac{t_\epsilon}{4T} \right] = 2Dt_\epsilon \left[ \frac{1}{3} - \frac{\alpha}{2} \right].

Given that \bar{y} must be normally distributed and its first two moments are known,

P(\bar{y} | a, b) = N(\bar{y}, \langle \bar{y} \rangle, \langle \bar{y}^2 \rangle - \langle \bar{y} \rangle^2) = N\left(\bar{y}, (1-\alpha)a + \alpha b, 2Dt_\epsilon\left[\frac{1}{3} - \frac{\alpha}{2}\right]\right).

Furthermore, it was established in Eq.
41 that a time averaged position was related to a localized observation by anormal Gaussian function so that P ( o | a, b ) = (cid:90) d¯ y P ( o | ¯ y ) P (¯ y | a, b ) = N (cid:18) o, (1 − α ) a + αb, v + 2 Dt ε (cid:20) − α (cid:21)(cid:19) (48)22 In the limit where the camera exposure time goes to 0, the probability of o i is dependent on one coordinate, x i .However, as the camera exposure time becomes non-negligible with respect to the time spacing between frames, theprobability of o i becomes increasingly dependent on the subsequent coordinate, x i +1 . The o i dependence on both x i and x i +1 make a direct approach to solving the integral computationally difficult. In the main manuscript, Markov’smethod approach showed the following relationship for a multivariate gaussian Σ i,j = (cid:104) s i s j (cid:105) , where Σ is the covariance matrix of a multivariate Gaussian function describing the vector of random variables S = { s i = o i +1 − o i } N − i =1 . Given that our probability distribution is obtained by integrating several Gaussian functions,the result of the distribution is a Gaussian function. Therefore, if the moments of S are known, the parameter (cid:15) i forthe simpler expression can be derived given that (cid:104) s i s i +1 (cid:105) = − (cid:15) i +1 was previously shown to be true for the simpler expression in the main manuscript. Starting from our derived proba-bility expression in Eq. 48 for an arbitrary o i P ( o i | x i , x i +1 ) = N (cid:18) o i , (1 − α i ) x i + α i x i +1 , v i + 2 Dt (cid:15) (cid:20) − α i (cid:21)(cid:19) where v i is defined as the localization variance and α i = t (cid:15) δt i . From the properties of a Gaussian function with 0 mean (cid:104) o i − (1 − α i ) x i − α i x i +1 (cid:105) = 0 , (cid:10) ( o i − (1 − α i ) x i − α i x i +1 ) (cid:11) = v i + 2 Dt (cid:15) (cid:20) − α i (cid:21) . 
(49)
This implies
\[
\langle o_i - x_i \rangle = 0, \qquad
\langle o_i - x_i \mid X \rangle = \alpha_i (x_{i+1} - x_i), \qquad
\big\langle (o_i - x_i)^2 \big\rangle = v_i + \frac{2}{3}Dt_\epsilon. \tag{50}
\]
Also,
\[
\langle s_i \rangle = \langle o_{i+1} - o_i \rangle = \langle o_{i+1} - x_{i+1} \rangle - \langle o_i - x_i \rangle + \langle x_{i+1} - x_i \rangle = 0. \tag{51}
\]
Given the relations in Eq. 50 and Eq. 51,
\[
\begin{aligned}
\langle s_i s_{i+1} \rangle &= \langle (o_{i+2} - o_{i+1})(o_{i+1} - o_i) \rangle = -\epsilon_{i+1} \\
&= \langle (o_{i+2} - o_{i+1} + x_{i+2} - x_{i+2} + x_{i+1} - x_{i+1})(o_{i+1} - o_i + x_{i+1} - x_{i+1} + x_i - x_i) \rangle \\
&= \langle (o_{i+1} - x_{i+1})(x_{i+2} - x_{i+1}) \rangle - \big\langle (o_{i+1} - x_{i+1})^2 \big\rangle + \cdots \\
&= \alpha_{i+1}(2D\delta t_{i+1}) - \frac{2}{3}Dt_\epsilon - v_{i+1}
= \frac{1}{3}Dt_\epsilon - v_{i+1},
\end{aligned}
\]
where all of the other terms in the ellipsis ($\cdots$) go to 0. Additionally, it follows that
\[
\big\langle s_i^2 \big\rangle = \omega_i + \epsilon_i + \epsilon_{i+1} = 2D\delta t_i - \frac{2}{3}Dt_\epsilon + v_i + v_{i+1}.
\]
Therefore,
\[
\epsilon_i(D) = v_i - \frac{Dt_\epsilon}{3},
\]
which is the variance correction discovered in earlier diffusion estimation papers (14, 17, 18).
The following subsections derive some of the relations that were explicitly stated, without proof, to complete the derivations in the main text.
Recalling the objective function in the Laplace method,
\[
-\ln f(X) = \sum_{i=1}^{N} \left[ \frac{1}{2}\ln(2\pi\varepsilon_i) + \frac{(o_i - x_i)^2}{2\varepsilon_i} \right] + \sum_{i=1}^{N-1} \left[ \frac{1}{2}\ln(2\pi\omega_i) + \frac{(x_{i+1} - x_i)^2}{2\omega_i} \right], \tag{52}
\]
the gradient of Eq. 52 is
\[
\begin{aligned}
-\frac{\partial \ln f}{\partial x_1} &= \frac{x_1 - o_1}{\varepsilon_1} + \frac{x_1 - x_2}{\omega_1}, \\
-\frac{\partial \ln f}{\partial x_i} &= \frac{x_i - o_i}{\varepsilon_i} + \frac{x_i - x_{i-1}}{\omega_{i-1}} + \frac{x_i - x_{i+1}}{\omega_i}, \\
-\frac{\partial \ln f}{\partial x_N} &= \frac{x_N - o_N}{\varepsilon_N} + \frac{x_N - x_{N-1}}{\omega_{N-1}},
\end{aligned} \tag{53}
\]
where $2 \le i \le N-1$. The Hessian $M = -\nabla\nabla \ln f(\widehat{X})$ of Eq. 52 has the non-zero elements
\[
\begin{aligned}
M_{1,1} &= -\frac{\partial^2 \ln f}{\partial x_1^2} = \frac{1}{\varepsilon_1} + \frac{1}{\omega_1}, \\
M_{i,i} &= -\frac{\partial^2 \ln f}{\partial x_i^2} = \frac{1}{\varepsilon_i} + \frac{1}{\omega_{i-1}} + \frac{1}{\omega_i}, \\
M_{N,N} &= -\frac{\partial^2 \ln f}{\partial x_N^2} = \frac{1}{\varepsilon_N} + \frac{1}{\omega_{N-1}}, \\
M_{i,i+1} &= -\frac{\partial^2 \ln f}{\partial x_i \partial x_{i+1}} = -\frac{1}{\omega_i}, \\
M_{i,i-1} &= -\frac{\partial^2 \ln f}{\partial x_i \partial x_{i-1}} = -\frac{1}{\omega_{i-1}}.
\end{aligned}
\]
Setting the gradient in Eq. 53 equal to 0 and moving the constants to the left-hand side of the equation gives
\[
\begin{aligned}
\frac{o_1}{\varepsilon_1} &= \frac{\widehat{x}_1}{\varepsilon_1} + \frac{\widehat{x}_1 - \widehat{x}_2}{\omega_1}, \\
\frac{o_i}{\varepsilon_i} &= \frac{\widehat{x}_i}{\varepsilon_i} + \frac{\widehat{x}_i - \widehat{x}_{i-1}}{\omega_{i-1}} + \frac{\widehat{x}_i - \widehat{x}_{i+1}}{\omega_i}, \\
\frac{o_N}{\varepsilon_N} &= \frac{\widehat{x}_N}{\varepsilon_N} + \frac{\widehat{x}_N - \widehat{x}_{N-1}}{\omega_{N-1}}.
\end{aligned}
\]
With additional factoring, the expression becomes
\[
\begin{aligned}
\frac{o_1}{\varepsilon_1} &= \widehat{x}_1 \left( \frac{1}{\varepsilon_1} + \frac{1}{\omega_1} \right) + \widehat{x}_2 \left( -\frac{1}{\omega_1} \right), \\
\frac{o_i}{\varepsilon_i} &= \widehat{x}_i \left( \frac{1}{\varepsilon_i} + \frac{1}{\omega_{i-1}} + \frac{1}{\omega_i} \right) + \widehat{x}_{i-1} \left( -\frac{1}{\omega_{i-1}} \right) + \widehat{x}_{i+1} \left( -\frac{1}{\omega_i} \right), \\
\frac{o_N}{\varepsilon_N} &= \widehat{x}_N \left( \frac{1}{\varepsilon_N} + \frac{1}{\omega_{N-1}} \right) + \widehat{x}_{N-1} \left( -\frac{1}{\omega_{N-1}} \right).
\end{aligned}
\]
The right-hand side is $M \cdot \widehat{X}$. We invert the Hessian matrix to bring it to the other side of the equation, so that the resulting expression for the maximum likelihood is
\[
\widehat{X} = M^{-1} \Theta,
\]
where the components of $\Theta$ are $\theta_i = o_i/\varepsilon_i$. Starting from the probability component formalism,
\[
f(X) = \prod_{i=1}^{N} P(o_i \mid x_i, x_{i+1})\, P(x_{i+1} \mid x_i) = \prod_{i=1}^{N} \mathcal{N}\big(o_i,\, (1-\alpha_i)x_i + \alpha_i x_{i+1},\, q_i\big)\, \mathcal{N}(x_{i+1},\, x_i,\, \omega_i),
\]
where $q_i$ is the variance due to the observation and $\alpha_i = t_\epsilon/2\delta t_i$.
We solve for the Hessian of our objective function, $M = -\nabla\nabla \ln f(X)$:
\[
\begin{aligned}
M_{1,1} &= -\frac{\partial^2 \ln f}{\partial x_1^2} = \frac{(1-\alpha_1)^2}{q_1} + \frac{1}{\omega_1}, \\
M_{i,i} &= -\frac{\partial^2 \ln f}{\partial x_i^2} = \frac{(1-\alpha_i)^2}{q_i} + \frac{\alpha_{i-1}^2}{q_{i-1}} + \frac{1}{\omega_i} + \frac{1}{\omega_{i-1}}, \quad 2 \le i \le N, \\
M_{N+1,N+1} &= -\frac{\partial^2 \ln f}{\partial x_{N+1}^2} = \frac{\alpha_N^2}{q_N} + \frac{1}{\omega_N}, \\
M_{i,i+1} &= M_{i+1,i} = -\frac{\partial^2 \ln f}{\partial x_i \partial x_{i+1}} = \frac{(1-\alpha_i)\alpha_i}{q_i} - \frac{1}{\omega_i}, \quad 1 \le i \le N.
\end{aligned}
\]
The form is similar to our earlier functions, so we can solve for the maximum likelihood in the same fashion,
\[
\widehat{X} = M^{-1} \Theta,
\]
where the components of $\Theta$ are modified as
\[
\theta_i = \frac{(1-\alpha_i)\, o_i}{q_i} + \frac{\alpha_{i-1}\, o_{i-1}}{q_{i-1}},
\]
and for completeness we set $\alpha_0 = 0$ and $\alpha_{N+1} = 0$.

To understand how the integration of $S'$ over $P(O \mid D)$ behaves, let us first consider the integration of $O$ over $P(O \mid D)$ with one $o_i$ held constant, which is equivalent to multiplying $P(O \mid D)$ by a delta distribution $\delta(o'_i - o_i)$. The integral of $P(O \mid D)$ times the delta distribution with respect to $O$ is of the form
\[
\int \mathrm{d}O\, \delta(o'_i - o_i)\, P(O \mid D) = \int \mathrm{d}O\, \mathrm{d}X\, \delta(o'_i - o_i) \prod_{j=1}^{N} M_j \prod_{k=1}^{N-1} T_k
= \int \mathrm{d}X\, \mathrm{d}O\, \delta(o'_i - o_i) \prod_{j=1}^{N} \mathcal{N}(o_j, x_j, \varepsilon_j) \prod_{k=1}^{N-1} \mathcal{N}(x_{k+1}, x_k, \omega_k).
\]
Ensuring that the $O$ basis is integrated first allows us to marginalize all of $O$ except for $o_i$. There are then $N$ terms of $X$ which can be effectively marginalized:
\[
\int \mathrm{d}X\, \mathrm{d}O\, \delta(o'_i - o_i) \prod_{j=1}^{N} \mathcal{N}(o_j, x_j, \varepsilon_j) \prod_{k=1}^{N-1} \mathcal{N}(x_{k+1}, x_k, \omega_k)
= \int \mathrm{d}X\, \mathcal{N}(o'_i, x_i, \varepsilon_i) \prod_{k=1}^{N-1} \mathcal{N}(x_{k+1}, x_k, \omega_k) = 1.
\]
Now we wish to perform a similar integration, but this time on the basis of $S'$. The integration of $P(O \mid D)$ with respect to $O$ is a completely different function than an integration with respect to $S'$, but we wish to show that the integration over $S'$ yields results analogous to the integration over $O$ with one $o_i$ held constant.
If $o_i = o'_i$ is held constant, then we can directly express $s'_i = o_{i+1} - o'_i$ in terms of one variable, $o_{i+1}$, if $i < N$. We can also express $s'_{i-1}$ in terms of $o_{i-1}$ if $i > 1$. Analogously, every $o_{i+k}$ can be expressed as
\[
o_{i+k} = \sum_{j=i}^{i+k-1} s'_j + o'_i = s'_{i+k-1} + o'_i + g(s'_{i+k-2}, i),
\]
and every $o_{i-l}$ can be expressed as
\[
o_{i-l} = -\sum_{j=i-l}^{i-1} s'_j + o'_i = -s'_{i-l} + o'_i + h(s'_{i-l+1}, i),
\]
where $g$ and $h$ collect the remaining displacement terms. We perform this substitution with an arbitrary $o_i$ held fixed to express the integral over $S'$ as
\[
\begin{aligned}
\int \mathrm{d}S'\, P(O \mid D) &= \int \mathrm{d}S'\, \mathrm{d}X \prod_{j=1}^{N} M_j \prod_{k=1}^{N-1} T_k \\
&= \int \mathrm{d}X\, \mathrm{d}S'\, \mathcal{N}(o'_i, x_i, \varepsilon_i)\, \mathcal{N}(s'_i + o'_i, x_{i+1}, \varepsilon_{i+1}) \prod_{j=i+2}^{N} \mathcal{N}\big(s'_{j-1} + o'_i + g(s'_{j-2}, i), x_j, \varepsilon_j\big) \\
&\qquad \times \prod_{k=1}^{i-1} \mathcal{N}\big({-s'_k} + o'_i + h(s'_{k+1}, i), x_k, \varepsilon_k\big) \prod_{l=1}^{N-1} \mathcal{N}(x_{l+1}, x_l, \omega_l).
\end{aligned}
\]
We can then shuffle the order of integration so that we can iteratively integrate the components of the $S'$ basis, essentially the components furthest from $o'_i$, that appear in only one of the univariate Gaussian functions comprising $P(O \mid D)$, effectively performing a sequential marginalization:
\[
\begin{aligned}
&\int \mathrm{d}X\, \mathrm{d}S'\, \mathcal{N}(o'_i, x_i, \varepsilon_i)\, \mathcal{N}(s'_i + o'_i, x_{i+1}, \varepsilon_{i+1}) \prod_{j=i+2}^{N} \mathcal{N}\big(s'_{j-1} + o'_i + g(s'_{j-2}, i), x_j, \varepsilon_j\big) \prod_{k=1}^{i-1} \mathcal{N}\big({-s'_k} + o'_i + h(s'_{k+1}, i), x_k, \varepsilon_k\big) \prod_{l=1}^{N-1} \mathcal{N}(x_{l+1}, x_l, \omega_l) \\
&\quad = \int \mathrm{d}X\, \mathrm{d}s'_i\, \mathrm{d}s'_{i-1}\, \mathcal{N}(o'_i, x_i, \varepsilon_i)\, \mathcal{N}(s'_i + o'_i, x_{i+1}, \varepsilon_{i+1})\, \mathcal{N}({-s'_{i-1}} + o'_i, x_{i-1}, \varepsilon_{i-1}) \prod_{l=1}^{N-1} \mathcal{N}(x_{l+1}, x_l, \omega_l) \\
&\quad = \int \mathrm{d}X\, \mathcal{N}(o'_i, x_i, \varepsilon_i) \prod_{l=1}^{N-1} \mathcal{N}(x_{l+1}, x_l, \omega_l) = 1. \tag{54}
\end{aligned}
\]
From this result, we see that holding a single $o_i$ fixed is quite arbitrary, as the term is eventually marginalized by its associated $x_i$. It is also apparent that fixing a particular $o_i$ allows complete isolation of a particular $s_i$ basis element if all other $s_k$ bases are marginalized. However, if we wish to evaluate the $s_i$ and $s_{i-1}$ components with $o_i$ fixed, we see in the integral expression that there will be some correlation between adjacent displacements. Most importantly, we see that $P(O \mid D)$ is a normalized probability density under $S'$. We shall now solve for the expectation values of $S'$ over $P(O \mid D)$. In order to do so, we will use the trick in Eq. 54, where we hold one observation constant until the appropriate substitutions are made.
Solving for $\langle s_i \rangle$ in this spirit yields
\[
\begin{aligned}
\langle s_i \rangle &= \int \mathrm{d}S'\, \mathrm{d}X\, s'_i \prod_{j=1}^{N} M_j \prod_{k=1}^{N-1} T_k
= \int \mathrm{d}X\, \mathrm{d}s'_i\, s'_i\, \mathcal{N}(s'_i + o_i, x_{i+1}, \varepsilon_{i+1})\, \mathcal{N}(o_i, x_i, \varepsilon_i) \prod_{j=1}^{N-1} \mathcal{N}(x_{j+1}, x_j, \omega_j) \\
&= \int \mathrm{d}X\, \mathrm{d}s'_i\, s'_i\, \mathcal{N}(s'_i, 0, \varepsilon_i + \varepsilon_{i+1} + \omega_i)\, \mathcal{N}(x_i, \beta_i, \gamma_i)\, \mathcal{N}(x_{i+1}, \beta_{i+1}, \gamma_{i+1}) \prod_{\substack{j=1 \\ j \ne i}}^{N-1} \mathcal{N}(x_{j+1}, x_j, \omega_j) \\
&= \int \mathrm{d}s'_i\, s'_i\, \mathcal{N}(s'_i, 0, \varepsilon_i + \varepsilon_{i+1} + \omega_i) = 0, \tag{55}
\end{aligned}
\]
where the terms $\beta_i$ and $\gamma_i$ are intermediate variables generated from rearranging Gaussian functions. These variables are immediately marginalized by integration over $X$, so we omit their explicit representation. Aside from the fact that the expectation values of the displacement means are all 0, we also discover another striking property from Eq. 55: the marginalization of all other $s'$ and all of $X$ isolates the expectation calculation to components of $s'_i$ that are independent of all other marginalized variables. The same results from Eq. 55 are used so that the expectation of the variance is
\[
\begin{aligned}
\langle s_i^2 \rangle &= \int \mathrm{d}S'\, \mathrm{d}X\, s'^2_i \prod_{j=1}^{N} M_j \prod_{k=1}^{N-1} T_k
= \int \mathrm{d}X\, \mathrm{d}s'_i\, s'^2_i\, \mathcal{N}(s'_i + o_i, x_{i+1}, \varepsilon_{i+1})\, \mathcal{N}(o_i, x_i, \varepsilon_i) \prod_{j=1}^{N-1} \mathcal{N}(x_{j+1}, x_j, \omega_j) \\
&= \int \mathrm{d}X\, \mathrm{d}s'_i\, s'^2_i\, \mathcal{N}(s'_i, 0, \varepsilon_i + \varepsilon_{i+1} + \omega_i)\, \mathcal{N}(x_i, \beta_i, \gamma_i)\, \mathcal{N}(x_{i+1}, \beta_{i+1}, \gamma_{i+1}) \prod_{\substack{j=1 \\ j \ne i}}^{N-1} \mathcal{N}(x_{j+1}, x_j, \omega_j) \\
&= \int \mathrm{d}s'_i\, s'^2_i\, \mathcal{N}(s'_i, 0, \varepsilon_i + \varepsilon_{i+1} + \omega_i) = \varepsilon_i + \varepsilon_{i+1} + \omega_i.
\end{aligned}
\]
Now we move on to the covariance terms, starting with adjacent displacements $\langle s_i s_{i+1} \rangle$. As noted above, if a given $o_i$ is fixed, there is a dependence between adjacent displacements.
With the techniques implemented in Eq. 55, a bivariate Gaussian function is isolated from the rest of the integral expression:
\[
\begin{aligned}
\langle s'_i s'_{i+1} \rangle &= \int \mathrm{d}S'\, \mathrm{d}X\, s'_i s'_{i+1} \prod_{j=1}^{N} M_j \prod_{k=1}^{N-1} T_k \\
&= \int \mathrm{d}X\, \mathrm{d}s'_i\, \mathrm{d}s'_{i+1}\, s'_i s'_{i+1}\, \mathcal{N}({-s'_i} + o_{i+1}, x_i, \varepsilon_i)\, \mathcal{N}(o_{i+1}, x_{i+1}, \varepsilon_{i+1})\, \mathcal{N}(s'_{i+1} + o_{i+1}, x_{i+2}, \varepsilon_{i+2}) \prod_{j=1}^{N-1} \mathcal{N}(x_{j+1}, x_j, \omega_j) \\
&= \int \mathrm{d}s'_i\, \mathrm{d}s'_{i+1}\, s'_i s'_{i+1}\, \mathcal{N}(S_b, 0, \Sigma_b) = -\epsilon_{i+1}.
\end{aligned}
\]
Holding $o_{i+1}$ fixed while evaluating $\langle s'_i s'_{i+1} \rangle$ induces a bivariate Gaussian function with location parameters $S_b = [s_i, s_{i+1}]^\intercal$ and covariance matrix
\[
\Sigma_b = \begin{bmatrix} \omega_i + \epsilon_i + \epsilon_{i+1} & -\epsilon_{i+1} \\ -\epsilon_{i+1} & \omega_{i+1} + \epsilon_{i+1} + \epsilon_{i+2} \end{bmatrix}.
\]
Conversely, when evaluating $\langle s_i s_{i+k} \rangle$ for $k > 1$, it is necessary to temporarily fix $o_i$ as well as $o_{i+k}$; this isolates two univariate Gaussian functions, which results in
\[
\langle s'_i s'_{i+k} \rangle = \int \mathrm{d}S'\, \mathrm{d}X\, s'_i s'_{i+k} \prod_{j=1}^{N} M_j \prod_{l=1}^{N-1} T_l
= \int \mathrm{d}s'_i\, \mathrm{d}s'_{i+k}\, s'_i s'_{i+k}\, \mathcal{N}(s'_i, 0, \varepsilon_i + \varepsilon_{i+1} + \omega_i)\, \mathcal{N}(s'_{i+k}, 0, \varepsilon_{i+k} + \varepsilon_{i+k+1} + \omega_{i+k}) = 0.
\]
15 Implementation
A common algorithmic problem shared by each of the three methods is the need to compute logarithms of products. A straightforward approach is to utilize the identity
\[
\ln\left( \prod_{i=1}^{N} a_i \right) = \sum_{i=1}^{N} \ln a_i, \tag{56}
\]
the left side of which requires one logarithm and $N-1$ multiplications, while the right side requires $N$ logarithms and $N-1$ additions. Directly using the left side of Eq. 56 in computations can lead to numerical over- or underflow, while the $N$ logarithms required for the right side can dominate the computational costs, taking up the majority of the computational cycles for each of the three algorithms. Thus, to minimize the number of logarithm computations, yet still maintain numerical accuracy, we utilize a hybrid log-product implementation to evaluate forms like Eq. 56. The log-product method builds up a product of $a_i$ values, only taking a logarithm when multiplying by the next $a_i$ would lead to loss of precision, overflow, or underflow.

Given the inputs, each of the methods first computes the $N$ variance terms due to measurement, $\varepsilon_i(D) = v_i - Dt_\epsilon/3$, and the $N-1$ variance terms due to diffusion, $\omega_i(D) = 2D\delta t_i$. Any of these variance terms can be arbitrarily close to zero, so to prevent numerical instabilities we bound these terms away from zero by at least machine epsilon, while preserving the sign, which can be negative for $\varepsilon_i(D)$.

Listing 1: The recursive algorithm for the log-likelihood calculation in C++. This fundamental algorithm can be implemented with just the log function from the C++ standard math library. We rely on the variances being computed as described in Sec. 15.2, and the log-product computation as described in Sec. 15.1.

    #include <cmath>
    #include <vector>

    // Hybrid log-product, as described in Sec. 15.1.
    template <class FloatT> FloatT logProduct(int n, const FloatT a[]);

    template <class FloatT>
    FloatT recursiveLLH(int N,              // number of observations (N >= 2)
                        const FloatT Obs[], // localized positions o_i
                        const FloatT dT[],  // frame intervals delta-t_i (variances are precomputed from these)
                        const FloatT vD[],  // diffusion variances omega_i = 2*D*dt_i (Sec. 15.2)
                        const FloatT vM[])  // measurement variances eps_i = v_i - D*t_eps/3 (Sec. 15.2)
    {
        std::vector<FloatT> alpha(N - 1);   // predictive variance of each innovation
        FloatT LLH = 0;
        FloatT mu = Obs[0];                 // predictive mean for the next observation
        FloatT eta = vM[0] + vD[0];         // predictive variance, before adding vM[n]
        for (int n = 1; n < N - 1; n++) {
            FloatT temp_alpha = vM[n] + eta;
            alpha[n - 1] = temp_alpha;
            FloatT temp_diff = Obs[n] - mu;
            LLH += temp_diff * temp_diff / temp_alpha;
            mu = (mu * vM[n] + Obs[n] * eta) / temp_alpha; // posterior mean update
            eta = vM[n] * eta / temp_alpha + vD[n];        // posterior variance plus diffusion
        }
        alpha[N - 2] = vM[N - 1] + eta;
        FloatT temp_diff = Obs[N - 1] - mu;
        LLH += temp_diff * temp_diff / alpha[N - 2];
        LLH += logProduct(N - 1, alpha.data());            // sum of log variances (Sec. 15.1)
        return -0.5 * (LLH + (N - 1) * std::log(2 * 3.141592653589793));
    }

The Laplace and Markov methods both require solving linear systems of the form $Ax = b$, where $A$ is symmetric tri-diagonal (all non-zero terms are on the main diagonal or the diagonals immediately above and below it). A naive solution based on inverting the matrix $A$ has computational complexity $O(N^3)$, where $N$ is the length of the trajectory. However, because $A$ is tri-diagonal, there are established algorithms (30) that use the Gaussian elimination strategy to solve the system in time $O(N)$. Furthermore, the determinant of the matrix can be shown to follow a recurrence relation (31) that leads to a linear-time computation. Combined with the log-product computation described in Sec. 15.1, this leads to a fast algorithm for computing $\log(\det(A))$ with a minimum number of calls to the logarithm function, and time complexity $O(N)$. We make use of these algorithms for the Laplace and Markov method implementations.

The DST algorithm code is based on the Matlab code provided by the authors (16). Our implementation is as faithful as possible to the original, but there are a few caveats that should be mentioned. First, because we assume that the localization variances are known, we use the form of the estimator that takes the mean localization variance as an input parameter. Because the DST can only incorporate a single localization variance, we use the mean of the given localization variances. For the $R$ parameter we use $t_\epsilon/(6\delta t)$, as was derived for uniform exposure intervals (16). The original implementation also leaves off the constant term $-\frac{N-1}{2}\log(2\pi)$.