Fitting a function to time-dependent ensemble averaged data

Karl Fogelmark, Michael A. Lomholt, Anders Irbäck, and Tobias Ambjörnsson

Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, 223 62 Lund, Sweden
Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
* [email protected]

May 9, 2018

Abstract
Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general purpose function fitting methods, the correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least squares fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide publicly available WLS-ICE software.
Introduction
Time-dependent ensemble averages appear in several scientific fields. Examples include: particle tracking experiments where mean square displacements (MSD) are measured at different sampling times [1], human travel dynamics where dispersal distance as a function of time is measured [2], single-molecule pulling experiments [3], applications of fluctuation theorems [4] such as the Jarzynski equality [5], measurements of the time-dependence of donor-acceptor distance dynamics [6], tracer particle dynamics in complex systems [7] and correlation functions in spin systems and lattice gauge theories [8]. The final step when interpreting ensemble averages is often to fit a function to these averages in order to extract parameters.

Fitting a function to data is done so readily in science that one seldom considers the correctness of the standard go-to solution of the (linear and non-linear) weighted least squares (WLS) method [9, 10, 11]. One of the crucial implicit assumptions of the “standard” version of this method is that the fluctuations around mean values are independent. However, since for time-dependent ensemble averages the data is sampled along trajectories, this independence assumption is in general not satisfied when analyzing ensemble averages; heuristically, if in one trajectory an observable, such as the square displacement, was smaller than its ensemble averaged value at some time, it is typically still so at the next time step. For an illustrative example, see Figure S1 in Supplementary Information, which shows the time-evolution in simulations of fractional Brownian motion (FBM). Thus, the fluctuations around an ensemble averaged (time-dependent) observable will in general exhibit temporal correlations.
Herein, the term trajectory is used in its widest sense: an observable (such as squared displacement) is chosen, and a trajectory is then measurements of this observable at different consecutive sampling times.

The question now arises of how severe the consequences of neglecting the temporal correlations in least squares fitting are. We demonstrate that such neglect leads to unreliable error estimation for parameters and can in some cases lead to underestimated errors for fitted parameters (such as diffusion constants) by more than one order of magnitude for our prototype systems (see below). The unreliability of the estimated errors can have detrimental effects when statistically interpreting the data: the 1σ (2σ) rule for Gaussian statistics states that 68% (95%) of the observed data should (on average) fall within ±1σ (±2σ) from the estimated mean. For this rule to be meaningful one must have a correct estimator for the variance, σ², in estimated parameters.

To our knowledge, the only previous method for dealing fully with correlation in data for function fitting to ensemble averages is the correlated chi-square method (CCM) [12, 13]. This method is known to the lattice quantum chromodynamics community, but does not seem to have found widespread use. This could partly be because, while the method is mathematically sound, numerical robustness issues have been identified [14, 15]. We here carefully examine the CCM method and demonstrate that it in general only provides correct parameter estimation in a small region of the "phase space" (N, M), where N is the number of sampling times and M is the number of trajectories.
Thus, it appears that the CCM is of limited general purpose use for fitting of time-dependent ensemble averages to a model function.

Although the least squares and WLS methods are common techniques for parameter estimation from ensemble averages, alternative methods exist, e.g., for inferring parameters from trajectories for biological systems [16, 17, 18]. In particular, for Brownian motion (BM) an optimal estimator for the diffusion constant has recently been derived [19, 20, 21]. Bayesian methods [11, 22, 23, 24, 25, 26] have also been used for parameter estimation for certain classes of systems. In general, when they apply, these methods give more precise parameter estimates than the WLS method. However, these newer approaches require as input a full stochastic model of the process, and we refer to this type of approach as model matching methods. By a full stochastic model we here refer to a model from which (in principle) any probability or average of a measured observable can be calculated. A simple example is BM, where the time-evolution is described by a Langevin equation with a noise term for which the statistics is fully specified. In contrast, the WLS and CCM methods are parametric function fitting [27] type methods, which can be used even if a full stochastic model is not available to describe the data at hand. An example from single-particle tracking, where function fitting is useful, is if one wants to determine a power-law exponent for the scaling of the mean-square displacement with time. In this situation, a function fitting procedure such as WLS can be used without making any assumption about the underlying dynamics. Also, even if a full stochastic model is indeed available, it might be impractical to carry out a full model matching procedure.

In this article, we derive a mathematically rigorous expression for the variance and covariance of estimated parameters in WLS fitting.
Our new error estimation formula for fitted WLS parameters takes into account the temporal correlations which are intrinsic to ensemble averages based on trajectories. To avoid confusion, we term the “standard” WLS method [9, 10, 11] (i.e., weighted least squares neglecting correlation) WLS-ECE (Weighted Least Squares Excluding Correlation in Error estimation), whereas our new approach is referred to as WLS-ICE (Weighted Least Squares Including Correlation in Error estimation). In figures and discussion where we only consider parameter values and not the associated errors, we only use the term WLS. In contrast to the previous two methods (WLS-ECE and CCM), our new method has the desirable unique features of providing both (1) robust parameter estimates in the full phase space (N, M), with mean parameter values in agreement with theory for our prototype systems; and (2) error estimates that reproduce the observed spreads in our fitted parameters.

As prototype models we use BM, damped harmonic oscillation (DHO) in a heat bath, FBM and continuous time random walks (CTRW). These have been identified as important model systems in a wide range of settings. BM is of interest to many fields of science [28, 29, 30]. Variants of DHO appear in physics, engineering and chemistry [31]. FBM has been applied, for instance, to protein dynamics [6], in financial modeling [32], for analyzing climate time series [33], to describe tracer particle diffusion [7, 34] and for modeling earthquake phenomena [35]. Recent applications of CTRW [28, 36] include modeling of human travel patterns [2] and of molecular motions in cells and cell membranes [37, 34]. However, we point out that our model systems are merely convenient examples for illustrating our WLS-ICE function fitting procedure, which can be applied to arbitrary fit functions. Our four model systems provide ideal test beds for our method, because the functions to be fitted, the mean position and MSD, are known analytically for these systems.
Moreover, trajectories are fast to generate for these systems, which facilitates stringent testing of the fitting methods based on a relatively large number of trajectories.

We finally point out two restrictions on the scope of our study: First, we do not concern ourselves with the model selection problem [38, 11], i.e., how to choose the “best” model or “best” form for the fit function. Second, in single particle tracking (one of the application fields of our results), it is common to separate between time-averaged observables (such as the time-averaged MSD) and ensemble averaged observables [39, 40]. In certain cases, these averages are described by the same functional form, but this is not always so [40]. In this study our sole focus is on ensemble averaged observables.

Methods
In what follows, we provide a ready-to-use method, which is further motivated and detailed in Section A in Supplementary Information.
The WLS-ICE procedure
In experiments or simulations one records a set of trajectories, here indexed by m. The task at hand is to fit some functional form $f(t_i; \boldsymbol{\theta}) = f_i(\boldsymbol{\theta})$, with K free fitting parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K)$, to some ensemble averaged observable $\bar{y}(t_i) = \bar{y}_i$ over the trajectories, i.e., to a sample mean of the form

$$\bar{y}_i = \frac{1}{M} \sum_{m=1}^{M} y_i^{(m)}, \qquad (1)$$

where the index i runs over the N sampling times $T_1, \ldots, T_N$ (with N ≥ K). Herein, we use bold symbols to denote vectors or matrices. For BM, FBM and CTRW (see Results), which are all zero mean processes, the observable used is the squared displacement, i.e., $y_i^{(m)} = |\boldsymbol{x}^{(m)}(T_i) - \boldsymbol{x}^{(m)}(0)|^2$, where $\boldsymbol{x}^{(m)}(t)$ is the position (a vector with d components, where d is the number of spatial dimensions) at process time t for trajectory m, and the start time for the simulation/experiment is t = 0. For DHO, our non-zero-mean prototype process, we instead use the position directly as the relevant observable, $y_i^{(m)} = x^{(m)}(T_i)$. It is important to point out, however, that in the fitting procedure the quantity $y_i^{(m)}$ can be any observable for trajectory m at sampling time $T_i$. We shall consistently use a 'bar' to denote a sample estimator (we only make use of sample means and sample covariances). The challenge in function fitting procedures [10] is to fit some function $f_i(\boldsymbol{\theta})$ to the data $\bar{y}_i$ and thereby extract the model parameters, $\boldsymbol{\theta}$. This problem has previously been tackled using the WLS-ECE or CCM methods (reviewed in Section B in Supplementary Information).

Our approach, the WLS-ICE method, extends the WLS-ECE procedure with a correct error estimation formula which takes correlations in fluctuations around ensemble averages into account (see Introduction). For completeness and ease of application, we here provide the full details of the proposed WLS-ICE fitting procedure.
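As a concrete illustration, the sample mean of equation (1) is simply a per-time-point average over trajectories. The following sketch (our own array-shape convention, not part of the published software) computes $\bar{y}_i$ from an M × N array of observables:

```python
import numpy as np

def ensemble_average(y):
    """Sample mean over trajectories, equation (1).

    y : array of shape (M, N), the observable y_i^(m) for M trajectories
        recorded at N common sampling times T_1, ..., T_N.
    Returns an array of shape (N,): the ensemble average at each time.
    """
    return np.asarray(y, dtype=float).mean(axis=0)

# Toy example: "squared displacements" of M = 4 trajectories at N = 3 times
y = np.array([[1.0, 2.0, 4.0],
              [3.0, 2.0, 2.0],
              [1.0, 4.0, 6.0],
              [3.0, 4.0, 4.0]])
print(ensemble_average(y))  # -> [2. 3. 4.]
```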
We start by introducing a cost function, $\chi^2$, based on the difference between the sample average and the fit function, $\Lambda_i = f_i(\boldsymbol{\theta}) - \bar{y}_i$, for all time points, according to

$$\chi^2 = \boldsymbol{\Lambda}^T \boldsymbol{R} \boldsymbol{\Lambda}, \qquad (2)$$

where $\boldsymbol{R}$ is a symmetric positive definite matrix. This cost function is to be minimized with respect to $\boldsymbol{\theta}$ in order to determine the best parameter values, $\hat{\theta}_a$ (a = 1, ..., K) [41]. We use a 'hat' to denote parameters which have been estimated through minimization of the $\chi^2$ cost function above, and for the estimated (co)variance of such parameters. In the WLS method one uses the weights $R_{ij} = \delta_{i,j}/\bar{C}_{ij}$, where $\delta_{i,j}$ is the Kronecker delta, and the (unbiased) sample “covariance matrix of the mean” is defined as $\bar{C}_{ij} = \bar{Q}_{ij}/M$, with $\bar{\boldsymbol{Q}}$ being the sample covariance matrix

$$\bar{Q}_{ij} = \frac{1}{M-1} \sum_{m=1}^{M} (y_i^{(m)} - \bar{y}_i)(y_j^{(m)} - \bar{y}_j). \qquad (3)$$

While this specific choice of $\boldsymbol{R}$ is used in our applications, we note that the results in this section, including the new error formula below, are valid for arbitrary choices of $\boldsymbol{R}$. In Section A in Supplementary Information we elaborate on one "non-conventional" choice of $\boldsymbol{R}$ particularly adapted for BM.

The parameters $\hat{\theta}_a$ obtained by minimizing $\chi^2$ in equation (2) have a (co)variance $\Delta_{ab} = \langle (\hat{\theta}_a - \theta_a^*)(\hat{\theta}_b - \theta_b^*) \rangle$, where $\langle \ldots \rangle$ denotes ensemble average. Throughout this study we use a 'star' to denote exact parameter values, i.e., estimated values as $M \to \infty$. The variances of the fitted parameters are $\sigma_a^2 = \Delta_{aa}$. As noted in the Introduction, this covariance depends on the temporal correlations. For a stationary process, it is well known how to estimate the variance of a mean in the presence of temporal correlations, typically by expressing the variance in terms of the sum or integral of the auto-correlation function [42, 43].
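Equations (2) and (3) translate directly into code. A minimal sketch with our own helper names (the published software at the URL in Code availability is the reference implementation):

```python
import numpy as np

def sample_covariance(y):
    """Unbiased sample covariance matrix Q of equation (3); y has shape (M, N)."""
    M = y.shape[0]
    d = y - y.mean(axis=0)            # fluctuations around the ensemble average
    return d.T @ d / (M - 1)

def wls_weights(y):
    """Diagonal WLS weight matrix R_ij = delta_ij / C_ii, with C = Q / M."""
    C = sample_covariance(y) / y.shape[0]
    return np.diag(1.0 / np.diag(C))

def chi2(theta, f, t, ybar, R):
    """Cost function chi^2 = Lambda^T R Lambda of equation (2)."""
    lam = f(t, theta) - ybar
    return lam @ R @ lam

# Small check of equation (3) on toy data
y = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
Q = sample_covariance(y)              # -> [[1., 2.], [2., 4.]]
```

The resulting `chi2` can be handed to any numerical minimizer (e.g., `scipy.optimize.minimize`) to obtain $\hat{\boldsymbol{\theta}}$.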
In the present context, such an estimation corresponds to fitting to a constant, $f_i(\theta) = \theta$, and assuming all correlation functions depend only on time differences.

We here extend the above-mentioned results to non-stationary processes and arbitrary fit functions by deriving the analogous expression for $\hat{\Delta}_{ab}$ using the full multivariate probability density for the fluctuations around mean values. Briefly, the covariance for the estimated parameters is defined as $\hat{\Delta}_{ab} = \langle (\hat{\theta}_a - \theta_a^*)(\hat{\theta}_b - \theta_b^*) \rangle$, where $\langle F(\bar{\boldsymbol{y}}) \rangle = \int F(\bar{\boldsymbol{y}})\, \rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*)\, d\bar{y}_1 d\bar{y}_2 \cdots d\bar{y}_N$ denotes an average over the multivariate probability density $\rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*)$. We note that the dependence of the estimated parameters $\hat{\boldsymbol{\theta}}$ on $\bar{\boldsymbol{y}}$ is implicitly determined by the minimization condition $\partial \chi^2/\partial \theta_a = 0$. Now, because all $\bar{y}_i$ are averages over M identically distributed random numbers, for large M it immediately follows from the multivariate central limit theorem that the function $\rho$ takes the Gaussian form $\rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*) = Z^{-1} \exp\!\left(-(\bar{\boldsymbol{y}} - \boldsymbol{y}^*)^T \boldsymbol{C}^{*-1} (\bar{\boldsymbol{y}} - \boldsymbol{y}^*)/2\right)$ with normalization constant $Z = (2\pi)^{N/2} \sqrt{\det(\boldsymbol{C}^*)}$ [44]. Two complications that occur in evaluating $\hat{\Delta}_{ab}$ in closed form are that the $\bar{\boldsymbol{y}}$-dependence of $\hat{\boldsymbol{\theta}}$ is implicit, and, in general, non-linear. Both of these challenges are solved by making a Taylor series expansion of $\hat{\theta}_a - \theta_a^*$ in terms of $\bar{\boldsymbol{y}} - \boldsymbol{y}^*$ and implicitly using the minimization condition. The full derivation is given in Section A in Supplementary Information.
The final result is the following estimator:

$$\hat{\Delta}_{ab} = \frac{\hat{\phi}_{ab}}{M}, \qquad (4a)$$

$$\hat{\phi}_{ab} = 4 \sum_{c,d} \sum_{i,j} (\hat{h}^{-1})_{ac} \left.\frac{\partial f_i(\boldsymbol{\theta})}{\partial \theta_c}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} (\boldsymbol{R}^T \bar{\boldsymbol{Q}} \boldsymbol{R})_{ij} \left.\frac{\partial f_j(\boldsymbol{\theta})}{\partial \theta_d}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} (\hat{h}^{-1})_{db}, \qquad (4b)$$

and

$$\hat{h}_{ab} = 2 \sum_{i,j} \left.\frac{\partial^2 f_i(\boldsymbol{\theta})}{\partial \theta_a \partial \theta_b}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} R_{ij} \Lambda_j + 2 \sum_{i,j} \left.\frac{\partial f_i(\boldsymbol{\theta})}{\partial \theta_a}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} R_{ij} \left.\frac{\partial f_j(\boldsymbol{\theta})}{\partial \theta_b}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}, \qquad (4c)$$

where the indices a, b = 1, ..., K. Equation (4) gives a mathematically rigorous expression (to lowest order in 1/M) for the covariance of the estimated parameters, and is our key result. It allows us to accurately estimate the covariance of any parameter fitted by minimizing the cost function in equation (2). Notice that the correlations in fluctuations around mean values enter through the quantity $\bar{\boldsymbol{Q}}$, which is estimated using the usual sample estimate above. In practice, our general formula, equation (4), is simple to implement and computationally fast.

The new error estimation formula, equation (4), reduces to previously known results in specific limits. (i) Neglecting the off-diagonal elements of $\bar{\boldsymbol{Q}}$ above, we recover the WLS-ECE error estimation formula [9]. (ii) By setting $\boldsymbol{R} = \bar{\boldsymbol{C}}^{-1}$ above, we recover the covariance estimation formula for CCM [12, 10]. (iii) For a stationary process one seeks to fit a constant, $f_i(\theta) = \theta$, to the data. For such a case, the minimization procedure (solving $\partial \chi^2/\partial \theta = 0$ with $R_{ij} = (1/\sigma^2)\delta_{i,j}$, where $\sigma^2$ is the time-independent variance) yields $\hat{\theta} = (1/N)\sum_i \bar{y}_i$, i.e., the parameter estimate is the mean of the data. The error estimation equation (4) then reduces to the usual result [42, 43], $\hat{\Delta} = (1/M)\sum_{i,j} \bar{Q}_{ij}/N^2$, used, for instance, in analyzing Monte Carlo and molecular dynamics simulations.
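Equation (4) is straightforward to implement once the gradient and Hessian of the fit function at $\hat{\boldsymbol{\theta}}$ are available. A self-contained sketch with our own function and argument names (the published WLS-ICE software is the reference implementation):

```python
import numpy as np

def wlsice_covariance(grad_f, hess_f, lam, R, Q, M):
    """Covariance of fitted parameters, equation (4).

    grad_f : (N, K) array of df_i/dtheta_a evaluated at theta-hat
    hess_f : (N, K, K) array of d^2 f_i/(dtheta_a dtheta_b) at theta-hat
    lam    : (N,) residuals Lambda_i = f_i(theta-hat) - ybar_i
    R, Q   : (N, N) weight matrix and sample covariance matrix, eq. (3)
    M      : number of trajectories
    """
    # h_ab of equation (4c)
    h = (2.0 * np.einsum('iab,ij,j->ab', hess_f, R, lam)
         + 2.0 * np.einsum('ia,ij,jb->ab', grad_f, R, grad_f))
    hinv = np.linalg.inv(h)
    # phi_ab of equation (4b); Delta-hat = phi-hat / M is equation (4a)
    phi = 4.0 * hinv @ grad_f.T @ (R.T @ Q @ R) @ grad_f @ hinv
    return phi / M
```

For the linear one-parameter fit $f_i = \theta t_i$ the Hessian vanishes and the expression collapses to a scalar, which provides a convenient sanity check.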
(iv) For linear fit functions, $f_i(\theta) = \theta t_i$, equation (4) reduces to previously known expressions (equation 5.253 in van den Bos [10]).

Validation procedure
We tested the different fitting procedures on simulation data for our four prototype systems (generated as described in Section D in Supplementary Information). Estimated parameters, $\hat{\theta}_a$, were compared to their known exact values $\theta_a^*$ (see Section C in Supplementary Information). For BM, the MSD behaves as $\langle [x(t) - x(0)]^2 \rangle = f(\theta, t) = \theta t$. The corresponding expression for FBM and CTRW is $\langle [x(t) - x(0)]^2 \rangle = f(\boldsymbol{\theta}, t) = \theta_1 t^{\theta_2}$. For DHO (at critical damping and with the initial conditions $x(0) = x_0$ and $v(0) = 0$), the mean position has the form $\langle x(t) \rangle = f(\theta, t) = x_0 (1 + \theta t) \exp(-\theta t)$.

For validating the WLS-ICE estimator for $\Delta_{ab}$, we generated S simulation sets (with S = 500), each consisting of M trajectories. Using these S × M trajectories, we obtained S parameter estimates $\hat{\theta}_a$. From these S estimates we calculated the covariance $\Delta_{ab}$ (using sample estimators), which then serves as the true $\Delta_{ab}$ (“ground truth”). This true $\Delta_{ab}$ is then compared to estimates based on the WLS-ICE error formula, equation (4) (which requires only one set of simulations), and the corresponding error estimates for WLS-ECE and CCM.

Code availability
Computer codes (Python, Octave/Matlab, and Lisp) which perform the associated fitting (determining $\hat{\theta}_a$) and error estimation (calculating $\hat{\Delta}_{ab}$), using a set of measured observables for different trajectories and at different times as input, are freely available under the GNU General Public License (GPL) [45] at http://cbbp.thep.lu.se/activities/wlsice/.

Results
Our first test of the fitting methods involves comparing histograms of fitted parameters for our four prototype systems (the number of trajectories, M, and the number of sampling times, N, were kept fixed). For both CCM and WLS, the S fitted values of a given parameter were binned into a histogram, see Fig. 1, and compared to a Gaussian centered on the mean of the estimated parameters, with a variance from the average of the error estimates using either the WLS-ECE or WLS-ICE method. For WLS, the histogram of fitted parameters is centered close to the true value (see also Figure S3 in Supplementary Information). However, only the WLS-ICE method gives a correct error estimation, equation (4), as the predicted width of the WLS-ECE method, see Section B in Supplementary Information, is much too narrow. Clearly, the new error estimation of the WLS-ICE method performs extremely well. By contrast, the WLS-ECE method does not provide correct errors of the estimated parameters; this result extends beyond the chosen parameters for (N, M) in Fig. 1, and holds true under rather general conditions, see Fig. 2 (the exception is the prefactor for CTRW for very small M). Notice that while the parameters from the WLS-ICE and WLS-ECE methods are centered on the analytical prediction, this is not true for parameters from the CCM method, which show a strong bias (Fig. 1) for BM, FBM and CTRW (not for DHO). Thus, WLS-ICE is the only method which yields an acceptable bias and correct error estimation for all model systems. Note that for the ensemble size used in Figure S2 in Supplementary Information, the distribution of fitted parameters is well described by a Gaussian; see Section F in Supplementary Information for a discussion on this topic. For a smaller ensemble size there are deviations from a Gaussian distribution, see Figure S2 in Supplementary Information, in particular for the prefactor for CTRW. From Fig.
2 we notice that the variance in the estimated parameter does not approach zero as N → ∞. Hence, the only way to decrease the variance in estimated parameters further is to increase M (the WLS estimator is consistent with respect to M).

Figure 1: Histograms of fitted parameters for two WLS methods and CCM compared to theoretical predictions.
Each method is tested on: (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). In each test, we generate S = 500 data sets, each consisting of M = 1000 trajectories sampled at N = 75 time points (histograms). Panel (a) shows the MSD prefactor (proportional to the diffusion constant) for BM, panel (b) shows the DHO fitting parameter θ, while in panels (c–f) the left and right panels show the MSD prefactor θ₁ and the exponent θ₂, respectively. For comparing WLS-ICE and WLS-ECE to the histograms based on the S data sets, we place Gaussian functions with their center positions at the mean of the WLS-fitted parameters. The widths of the Gaussians correspond to the parameter uncertainty estimated by the fit method (averaged over the S fits). The CCM fits for BM and CTRW exhibit a strong bias in the parameter value (not centered on the analytical prediction), and the WLS-ECE fit gives an error estimation, see Section B in Supplementary Information, that is much too small. The new WLS-ICE procedure (Methods) works well, i.e., exhibits negligible bias for all model systems and yields correct error estimation, equation (4). The rather large number of trajectories (M = 1000) was used in order to avoid ill-conditioning and major bias issues for the CCM fitting, compare to Fig. 3. Results for a smaller ensemble size are found in Figure S2 in Supplementary Information, where we see that also for FBM there can be pronounced bias effects for CCM fitting. For simulation parameters, see Section D.5 in Supplementary Information.

Figure 2: Error estimation.
Standard deviation from the WLS-ECE and WLS-ICE parameter fits as a function of the number of sampling points, N, used in the fitting procedure (log scale on the horizontal axis for visibility). Each method is applied to S = 500 realizations of data from (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). In conjunction we show the true standard deviation of each of these methods, computed from the parameters from the fit (lines), i.e., the width seen in Fig. 1, but for an extended range of N. It is evident that the standard deviation from the WLS-ECE fit is far too small for almost all N. Error bars show standard error of the mean. For panels a–d there are small biases for M = 20 and M = 80 in the observable $\hat{\sigma}$, as compared to the actual standard deviation. These biases can be removed using the jackknife procedure applied to equation (4b), see Section G in Supplementary Information. For panel e, M = 20, there is a discrepancy between the WLS-ICE estimate $\hat{\sigma}$ and the actual standard deviation; we assign this to slow convergence towards the asymptotic form of the multivariate distribution ρ (see Methods) for CTRW (see also Figure S2 in Supplementary Information). For simulation parameters, see Section D.5 in Supplementary Information.

As we have seen (Fig. 1), the CCM method gives a pronounced bias in the parameter estimate for a specific choice of the number of sampling times N and trajectories M for the BM, FBM and CTRW systems, but not for DHO. In order to understand the generality of these findings, we numerically quantified the bias for an extended range of (N, M) values, and find that the pronounced bias for BM, FBM and CTRW (and lack of bias for DHO) is rather general, see Figure S3 in Supplementary Information. In Section E in Supplementary Information we investigate the expected bias for the CCM method further by analytical means.
Indeed, we find that the parameter estimate from CCM fitting is unbiased for DHO. Mathematically, this result follows from the fact that the observable (mean position) used for the fitting is a linear function of the noise (this is in contrast to BM, FBM and CTRW, where the squared displacements are used as relevant observables). For BM, our analytical calculation in Section E in Supplementary Information shows that for large N the bias for CCM fitting becomes $\langle \hat{\theta} \rangle = \theta^* + DG(N)/M$, where $G(N) \approx -N/(\ln N + \gamma + 2\ln 2)$ and $\gamma \approx 0.577$ is the Euler-Mascheroni constant. Thus, with increasing number of sampling points N, the bias increases as $N/\ln N$ (see Figure S3 in Supplementary Information). The bias for CCM appears also in the FBM and CTRW systems, as seen in Fig. 1 and Figure S3 in Supplementary Information. A similar calculation for the WLS parameter estimate, see Section E in Supplementary Information, yields only a minor, essentially N-independent, bias for BM.

In order to further investigate the practical implications of the pronounced bias for CCM fitting, as well as other known issues with the CCM method [14, 15], we quantified in which parts of the phase space (N, M) the CCM and WLS-ICE fitting procedures provide “acceptable” (see below) parameter estimation, see Fig. 3. First, we find that for large N and moderate to small M, the sample estimate for the covariance matrix $\bar{\boldsymbol{C}}$ is ill-conditioned (the condition number is larger than the machine precision). In practice this means that it cannot be numerically inverted, as required in the CCM parameter estimation procedure, without uncontrollable numerical errors. Second, for parts of phase space where ill-conditioning is not an issue, we, rather generously, defined an acceptable fit as one where the bias is smaller than 10% (compared to the analytic value, $\theta_a^*$). We find that for BM, FBM and CTRW there is indeed a thin region of the (N, M) phase space (large M and small N) where CCM works.
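A numerical bias quantification of the kind used above (generate S independent data sets, fit each, and compare the mean fitted value with the exact one) can be sketched in a few lines. The simulation parameters below are our own toy choices, and the fit is a closed-form one-parameter WLS fit to the BM MSD, not the general minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def bm_squared_displacements(M, T, D=0.5):
    """Squared displacements of M one-dimensional BM trajectories at times T.

    Steps have variance 2*D*dt, so the exact MSD is theta* * t with
    theta* = 2*D = 1 for the chosen D.
    """
    dt = np.diff(T, prepend=0.0)
    steps = rng.normal(0.0, 1.0, size=(M, len(T))) * np.sqrt(2.0 * D * dt)
    return np.cumsum(steps, axis=1) ** 2

def wls_prefactor(y, T):
    """Closed-form WLS fit of f(t) = theta * t to the ensemble-averaged MSD."""
    ybar = y.mean(axis=0)
    C_diag = y.var(axis=0, ddof=1) / y.shape[0]   # diagonal of C = Q / M
    w = 1.0 / C_diag
    return np.sum(w * T * ybar) / np.sum(w * T**2)

S, M = 200, 50
T = np.arange(1.0, 11.0)
estimates = [wls_prefactor(bm_squared_displacements(M, T), T) for _ in range(S)]
bias = np.mean(estimates) - 1.0       # exact prefactor is theta* = 1
```

Repeating this over a grid of (N, M) values, and swapping in the CCM minimizer, reproduces the kind of phase-space map shown in Fig. 3.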
For DHO, the bias effect is negligible, as previously noted. However, the ill-conditioning issue is as pronounced for DHO as for BM, FBM and CTRW. In contrast, for WLS ill-conditioning is not a problem (no matrix inversion is required in this procedure), and the bias in the parameter estimation is acceptable for most parts of the phase space.

The bias inherent in the CCM method (for observables which are not linear functions of the noise, i.e., the MSD for BM, FBM and CTRW) can be reduced by applying the common jackknife procedure [46], which removes bias terms proportional to 1/M, see Section G in Supplementary Information. By applying the (first-order) jackknife procedure to BM, FBM and CTRW (Fig. 3), we find that the bias is reduced, which somewhat expands the region of the phase space where the CCM method may be used reliably. Note that the computational time is a factor g (i.e., the number of groups into which the trajectories are pooled) larger for the first-order jackknife procedure compared to the non-jackknife case. Finally, the jackknifing procedure can be extended to remove higher order bias terms (proportional to $1/M^n$, with n = 2, 3, ...) [46]. However, for the present case there is no guarantee that these higher order terms have this functional form with respect to M, see Section E in Supplementary Information. Also, our results show that the second-order jackknife increased, rather than decreased, the bias in the parameter estimations for most parts of the phase spaces (Fig. 3). For BM, Figure S4 in Supplementary Information indicates that the reason for this is that the third order term (the term proportional to $1/M^3$) is generally larger in amplitude (but of opposite sign) than the second order one. Higher order bias reduction comes at a computational price, since the number of numerical evaluations required for the second order jackknife is g(g+1)/2 times that of non-jackknifed parameter estimation.
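The first-order (delete-one-group) jackknife mentioned above has a compact generic form. A sketch with our own grouping convention; any estimator mapping a trajectory array to a scalar can be plugged in:

```python
import numpy as np

def jackknife_first_order(estimator, y, g):
    """First-order jackknife, removing the bias term proportional to 1/M.

    estimator : function mapping a (m, N) trajectory array to a scalar
    y         : (M, N) array of per-trajectory observables
    g         : number of groups the trajectories are pooled into
    """
    groups = np.array_split(np.arange(y.shape[0]), g)
    full = estimator(y)
    # estimates with one group of trajectories left out at a time
    loo = np.array([estimator(np.delete(y, idx, axis=0)) for idx in groups])
    return g * full - (g - 1) * loo.mean()

# Toy exercise of the formula: mean(y)^2 is a biased estimator of mu^2
# (its sampling bias is ~ sigma^2/M); here we just check the arithmetic.
y = np.arange(12.0).reshape(12, 1)
corrected = jackknife_first_order(lambda a: a.mean() ** 2, y, g=12)
```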
Due to these findings and the lack of a formal functional form for the bias beyond the 1/M term (see above), we do not recommend applying the jackknife procedure beyond first order. Finally, we point out that the new error estimation formula, equation (4), remains valid also for jackknifed parameters, see Section G in Supplementary Information.

In Figure S5 in Supplementary Information we investigate the "goodness of fit" for the WLS and CCM procedures using a standard R² metric (see Section I in Supplementary Information). Examples of fitted curves are found in Figure S6 in Supplementary Information. A good fit is characterized by R² ≈ 1. We find that, in this sense, the new method provides "good" fits. In contrast, the CCM method provides "bad" fits for BM, FBM and CTRW, with R² ≪ 1 for large N.

Figure 3: Phase space of reliable parameter estimation for CCM and WLS. For each of our example systems, (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW), we investigate for which number of sampling times N, and number of trajectory realizations M, the fitting is more than 10% off from its analytical value, averaged over S = 500 simulations. As indicated, CCM is only reliable in a limited region (large M, small N), which can be extended by a first order jackknife correction. For BM we also include the boundary at which the analytically predicted first order bias term for CCM, G(N), see Section E in Supplementary Information, gives a bias that is 10% of the true parameter value. We also show the boundary beyond which more than half of the S generated covariance matrices become ill-conditioned. Interestingly, for the CCM a second order jackknife generally does more harm than good compared to the first order, which we elaborate on in Figure S4 in Supplementary Information. In contrast to CCM (non-jackknifed), the parameter estimations for the WLS method are acceptable for most (N, M) (the region above the green curve), and can be extended even further using a jackknife approach (data not shown). For simulation parameters, see Section D.5 in Supplementary Information.
We point out that for the present type of data, R² is only a heuristic goodness-of-fit metric; its distributional properties are not known for general fit functions and correlated data.

When computational times are not a concern, error estimation using bootstrap resampling (or the related jackknife error estimation procedure) is a common method (see Section H in Supplementary Information) [47]. We here find that bootstrap resampling performs in general as well as WLS-ICE for our four models (jackknife error estimation is slightly worse), see Figure S7 in Supplementary Information. Thus, our numerical results indicate that for the type of observables and fit functions used in our model systems, the bootstrap can be used for calculating the variance of parameters estimated through $\chi^2$ minimization. However, we point out that such resampling techniques require us to repeat the $\chi^2$ minimization several (herein, 100) times (the WLS-ICE method requires only one $\chi^2$ minimization). Such minimization can be computationally costly, especially when the number of unknown parameters is large. Moreover, one must bear in mind that the bootstrap method is in general a heuristic method (there are cases where it does not apply [47]).

As a final alternative to the WLS-ICE method, we now briefly turn to error estimation using subsampling [43]. Subsampling refers to the method of choosing sampling times sufficiently sparsely in order to make the data points essentially uncorrelated (the “brute force” method in Figure S1 in Supplementary Information is an extreme case of subsampling where only one data point per trajectory is kept). After subsampling, error analysis is performed using standard error analysis for independent data. In order to properly choose N within this method, N is systematically decreased until the variance saturates to a constant, which is assumed to be the true variance [43].
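The subsampling strategy just described is easy to express in code: keep every k-th sampling time, then apply standard independent-data error analysis. A sketch for a one-parameter linear fit (our own helper; the simulated data below are deliberately uncorrelated and serve purely to exercise it):

```python
import numpy as np

def subsampled_fit(y, T, keep_every):
    """WLS fit of f(t) = theta * t after decimating the sampling times.

    Retains every `keep_every`-th time point (assumed ~uncorrelated) and
    returns (theta_hat, sigma_hat) from standard independent-data WLS.
    """
    idx = np.arange(0, len(T), keep_every)
    ys, ts = y[:, idx], T[idx]
    var_mean = ys.var(axis=0, ddof=1) / y.shape[0]   # variance of the mean
    w = 1.0 / var_mean
    theta = np.sum(w * ts * ys.mean(axis=0)) / np.sum(w * ts**2)
    sigma = np.sqrt(1.0 / np.sum(w * ts**2))         # independent-data error
    return theta, sigma

rng = np.random.default_rng(3)
T = np.arange(1.0, 21.0)
y = T + rng.normal(0.0, 1.0, size=(100, 20))         # uncorrelated toy data
theta, sigma = subsampled_fit(y, T, keep_every=4)
```

For genuinely correlated trajectory data the returned error remains unreliable unless the decimation is aggressive enough, which is exactly the difficulty discussed above.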
Notice that for stationary time series, rather than reducing the number of sampling times, one can make full use of the data through the blocking method [42]. For non-stationary processes, however, the blocking method cannot be used. Fig. 2 shows how estimated errors from our WLS-ECE and WLS-ICE analyses depend on the number of data points used, N. We find that temporal correlations are so strong that the WLS-ECE method underestimates the errors down to very small N. Moreover, finding a sufficiently small N is difficult, since the error does not in general saturate to a constant level as N is reduced. These problems are circumvented by instead using the error estimation from the WLS-ICE method (i.e., using equation (4) instead of the WLS-ECE equations in Section B in Supplementary Information).

As a final test of our method, we now turn to "real world" data. To that end, we use particle tracking data from a competition for testing particle tracking software, in which 14 teams world-wide participated [48]. We choose to analyze this data set for two reasons. First, it has served as standard benchmark data within the particle tracking community. Second, since these movies are based on noisified and pixelated simulations (aiming to mimic actual experimental data), we know the values of the underlying model parameters. We used their Supplementary Videos 1 (medium particle density), 5 (low particle density) and 6 (high particle density). All these movies correspond to BM of vesicles, for which the expected MSD is ⟨[x(t) − x(0)]²⟩ = f_BM(θ, t) = θt, with θ = 2dD = 8. For particle detection in the movies and linking of particle positions into trajectories we used Method 1 [48], i.e., the tracking method described by Sbalzarini et al. [49], implemented as the ImageJ plugin "Particle Tracker" by the MOSAIC group [50]. Parameter settings for the plug-in are listed in Section J in Supplementary Information.
For each video we extracted trajectories, which were subsequently cut into shorter trajectories consisting of 7 discrete process times (there is no memory in BM, so the start time is inessential). Notice that for the higher particle density, fewer sufficiently long trajectories were produced as compared to the low density scenario (values for M are listed in Table 1). We subsequently divided the trajectories for each movie into two data sets, each with M trajectories. For the fitting procedures the first process time point, t = 0, was discarded (since at t = 0 the position is precisely known, the variance is 0 and cannot be used as a weight in equation (2)), thus leaving us with N = 6 sampling times. Results for the estimated parameters, θ̂, and associated standard deviations, σ̂, are found in Table 1. We notice that the CCM method fails at predicting the correct parameter for high and medium particle densities. This finding is simply due to the smaller ensemble size for these cases, which, in turn, is a result of the tracking software's inability to track and link particles in high and medium density settings. Comparing the WLS-ECE and WLS-ICE methods, we see that the WLS-ECE underestimates the error, with not all of its parameter estimates falling within 2σ̂ (confidence level 95%) of the expected result (θ = 8). In contrast, for the WLS-ICE all six observed parameter estimates for θ fall within 2σ̂ of the expected value.

Description              Low density   Medium density   High density
Video                    S5            S1               S6
Number of trajectories   M = 310       M = 16           M = 5
Method / Observable      θ̂   σ̂        θ̂   σ̂           θ̂   σ̂
WLS-ICE

Table 1: Results of the three fitting methods for "real world" particle tracking data. Particle trajectories were extracted from the "Vesicle" Supplementary videos from the article by Chenouard et al. [48] using the "Particle Tracker" software (MOSAIC group). The trajectories were cut into shorter trajectories, all of length 7 discrete process times. The short trajectories were then divided into two independent sets of size M. We then performed fitting using the WLS-ICE, WLS-ECE and CCM methods for BM, discarding the first process time point, resulting in N = 6 sampling times. The expected parameter value is θ = 8 (data are noisified and pixelized simulations with known properties). Since M was very small for video S6, we applied the jackknife procedure both in parameter and error estimation (all videos). Results before jackknifing are found in Table S1 in Supplementary Information. We notice that the CCM method gives ill-conditioning issues for the high density movie, where few trajectories could be extracted. The WLS-ECE method underestimates the error as compared to the WLS-ICE method.

Let us finally briefly discuss how well one is expected to be able to estimate a parameter based on experimental/simulation data. For model matching procedures (see Introduction), the Cramér-Rao bound is useful by providing an expression for the smallest possible variance in the estimated parameter [10]. For the case of BM, optimal estimators (i.e., estimators which reach the Cramér-Rao bound) based on the measured displacements have been derived for model matching type fitting [19, 20, 21]. For function fitting, the question is rather whether an optimal cost function, i.e., an optimal weight matrix R, can be found (see equation (2)).
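For a linear fit function with a known covariance matrix, one classical candidate for such an optimal weight matrix is the inverse of the true covariance (generalized least squares). The following is a hedged sketch with assumed toy BM data, using the Wick-theorem covariance of squared displacements; it is an illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (assumed): M Brownian-motion trajectories, ensemble-averaged MSD.
M, N, D, dt = 100, 10, 1.0, 1.0
t = dt * np.arange(1, N + 1)
x = np.cumsum(rng.normal(0.0, np.sqrt(2 * D * dt), size=(M, N)), axis=1)
y = (x ** 2).mean(axis=0)

# True covariance of the squared displacement for 1D BM (Wick's theorem):
# Cov(x(t_i)^2, x(t_j)^2) = 2 * (2 D min(t_i, t_j))^2; the mean divides by M.
C = 2.0 * (2.0 * D * np.minimum.outer(t, t)) ** 2 / M

# Generalized least squares for the linear model f(theta, t) = theta * t:
# theta_hat = (t^T C^-1 y) / (t^T C^-1 t), Var(theta_hat) = (t^T C^-1 t)^-1.
Cinv_t = np.linalg.solve(C, t)
theta_gls = (Cinv_t @ y) / (Cinv_t @ t)
sigma_gls = 1.0 / np.sqrt(Cinv_t @ t)
print(theta_gls, sigma_gls)          # true value theta* = 2 D = 2
```

Since the true covariance matrix is required as input, this is a hybrid between model matching and function fitting, as discussed next.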
If the covariance matrix for the process is independent of the inferred parameters (up to a proportionality constant), and for linear fit functions, then the generalized least squares method can be shown to be optimal among unbiased WLS methods [51]. Since the generalized least squares method requires as input the inverse of the true covariance matrix, it can be viewed as a hybrid method between model matching and function fitting. In Figure S8 in Supplementary Information we show results for generalized least squares for BM (we use the term BMALS, Brownian motion adapted least squares), where we see that, indeed, the variance in the estimated parameter value is smaller for BMALS as compared to WLS-ICE, although the difference is not dramatic. Also notice that for M and N values where the CCM "works" (acceptable bias, see Fig. 3) the variances in estimated parameters for CCM and BMALS agree, as they should.

Discussion, conclusion and outlook
A common task in many fields of science is that of fitting a model to the time-evolving mean of some observable. Since fluctuations around observed mean values, calculated based on trajectories, are in general correlated in time, the error estimates provided by a "standard" weighted least squares (WLS-ECE) fit can be more than one order of magnitude too small, see Fig. 2. Further, the correlated chi-square method (CCM), involving numerical inversion of a noisy covariance matrix, often shows numerical instabilities (ill-conditioning) or a strong bias in the fitted parameters, see Fig. 3. To overcome these problems, we derived a new error estimation formula, see equation (4), for weighted least squares fitting, which does not require inversion of a noisy covariance matrix. With this formula at hand, a simple, yet accurate, function fitting procedure, WLS-ICE, can be followed: (A) perform a weighted least squares fit to the data, (B) use the new formula to estimate the errors. We demonstrated on four simulated prototype systems that the WLS-ICE method provides robust results, with a negligible bias in the fitted parameters and accurate error estimates. Our method's estimated errors are comparable to errors estimated using bootstrap and jackknife resampling for the four model systems. A strength of our method is that the fitting procedure does not have to be repeated multiple times.

We distinguished between two types of parameter estimation procedures: model matching, where a full stochastic model is matched to the data, and function fitting, in which a full stochastic model is not known and one rather seeks to fit a function to the chosen ensemble-averaged observables. The weighted least squares method is a procedure of function fitting type.

We have in this study not discussed methods for dealing with experimental errors, such as missing data. Such errors depend on the experimental setup and typically have to be dealt with in different ways depending on the setup.
For the single-particle tracking field (one of the application fields of our results), two major sources of experimental error are: effects due to the finite size of pixels in cameras used to record the trajectory, and motional blur effects (in a single time frame, a fluorescent molecule moves while being imaged). Methods for correcting these types of errors are discussed by Savin et al. [52], Martin et al. [53], Berglund [19] and Calderon [54].

Parameter estimation through χ² minimization is ubiquitous throughout many fields of science, and we hope that our method and publicly available software will be found useful in these fields.

References

[1] Saxton, M. J. Single-particle tracking: connecting the dots.
Nature Methods , 671–672 (2008).[2] Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature , 462–465(2006).[3] de Souza, N. Pulling on single molecules.
Nature Methods, 873–877 (2012). [4] Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics, 126001 (2012). [5] Jarzynski, C. Nonequilibrium equality for free energy differences. Physical Review Letters, 2690 (1997). [6] Kou, S. & Xie, X. S. Generalized Langevin equation with fractional Gaussian noise: subdiffusion within a single protein molecule. Physical Review Letters, 180603 (2004). [7] Szymanski, J. & Weiss, M. Elucidating the origin of anomalous diffusion in crowded fluids. Physical Review Letters, 038102 (2009). [8] Rothe, H. J.
Lattice gauge theories: an introduction, 4th ed. , vol. 74 (World Scientific, 2012).[9] Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P.
Numerical Recipes 3rd Edition:The Art of Scientific Computing (Cambridge University Press, New York, NY, USA, 2007), 3rdedn.[10] Van den Bos, A.
Parameter estimation for scientists and engineers (John Wiley & Sons, 2007).[11] Sivia, D. & Skilling, J.
Data analysis: a Bayesian tutorial (OUP Oxford, 2006).[12] Gottlieb, S., Liu, W., Renken, R. L., Sugar, R. L. & Toussaint, D. Hadron masses with two quarkflavors.
Physical Review D, 2245–2265 (1988). [13] Michael, C. Fitting correlated data. Physical Review D, 2616–2619 (1994). [14] Seibert, D. Undesirable effects of covariance matrix techniques for error analysis. Physical Review D, 6240–6243 (1994). [15] Yoon, B., Jang, Y.-C., Jung, C. & Lee, W. Covariance fitting of highly-correlated data in lattice QCD. Journal of the Korean Physical Society, 145–162 (2013). [16] Meroz, Y. & Sokolov, I. M. A toolbox for determining subdiffusive mechanisms. Physics Reports, 1–29 (2015). [17] Höfling, F. & Franosch, T. Anomalous transport in the crowded world of biological cells.
Reports on Progress in Physics, 046602 (2013). [18] Norregaard, K., Metzler, R., Ritter, C. M., Berg-Sørensen, K. & Oddershede, L. B. Manipulation and motion of organelles and single molecules in living cells. Chemical Reviews, 4342–4375 (2017). [19] Berglund, A. J. Statistics of camera-based single-particle tracking.
Physical Review E , 011917(2010).[20] Michalet, X. & Berglund, A. J. Optimal diffusion coefficient estimation in single-particle tracking. Physical Review E , 061916 (2012).[21] Vestergaard, C. L., Blainey, P. C. & Flyvbjerg, H. Optimal estimation of diffusion coefficients fromsingle-particle trajectories. Physical Review E , 022726 (2014).[22] Persson, F., Lindén, M., Unoson, C. & Elf, J. Extracting intracellular diffusive states and transitionrates from single-molecule tracking data. Nature Methods , 265–269 (2013).[23] Monnier, N. et al. Inferring transient particle transport dynamics in live cells.
Nature Methods, 838–840 (2015). [24] El Beheiry, M., Dahan, M. & Masson, J.-B. InferenceMAP: mapping of single-molecule dynamics with Bayesian inference. Nature Methods, 594–595 (2015). [25] Robson, A., Burrage, K. & Leake, M. C. Inferring diffusion in single live cells at the single-molecule level. Phil. Trans. R. Soc. B, 20120029 (2013). [26] Krog, J. & Lomholt, M. A. Bayesian inference with information content model check for Langevin equations.
Physical Review E, 062106 (2017). [27] Gershenfeld, N. A. The nature of mathematical modeling (Cambridge University Press, 1999). [28] Metzler, R. & Klafter, J. The random walk's guide to anomalous diffusion: a fractional dynamics approach.
Physics Reports, 1–77 (2000). [29] Pigeon, S., Fogelmark, K., Söderberg, B., Mukhopadhyay, G. & Ambjörnsson, T. Tracer particle diffusion in a system with hardcore interacting particles.
Journal of Statistical Mechanics: Theory and Experiment, 123209 (2017). [30] Mehrer, H. & Stolwijk, N. A. Heroes and highlights in the history of diffusion.
Diffusion Fundamentals, 1–32 (2009). [31] Bloch, S. C. Introduction to Classical and Quantum Harmonic Oscillators (John Wiley & Sons, 2013). [32] Bouchaud, J.-P. & Sornette, D. The Black-Scholes option pricing problem in mathematical finance: generalization and extensions for a large class of stochastic processes.
Journal de Physique I, 863–881 (1994). [33] Yuan, N., Fu, Z. & Liu, S. Extracting climate memory using fractional integrated statistical model: A new perspective on climate prediction. Scientific Reports (2014). [34] Barkai, E., Garini, Y. & Metzler, R. Strange kinetics of single molecules in living cells. Physics Today, 29 (2012). [35] Tsai, C.-C. Slip, stress drop and ground motion of earthquakes: A view from the perspective of fractional Brownian motion. Pure and Applied Geophysics, 689–706 (1997). [36] Metzler, R. & Klafter, J. The restaurant at the end of the random walk: recent developments in the description of anomalous transport by fractional dynamics.
Journal of Physics A: Mathematical and General, R161 (2004). [37] Weigel, A. V., Simon, B., Tamkun, M. M. & Krapf, D. Ergodic and nonergodic processes coexist in the plasma membrane as observed by single-molecule tracking. Proceedings of the National Academy of Sciences, 6438–6443 (2011). [38] Machta, B. B., Chachra, R., Transtrum, M. K. & Sethna, J. P. Parameter space compression underlies emergent theories and predictive models.
Science, 604–607 (2013). [39] Kepten, E., Bronshtein, I. & Garini, Y. Improved estimation of anomalous diffusion exponents in single-particle tracking experiments.
Physical Review E , 052713 (2013).[40] Metzler, R., Jeon, J.-H., Cherstvy, A. G. & Barkai, E. Anomalous diffusion models and theirproperties: non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking. Physical Chemistry Chemical Physics , 24128–24164 (2014).[41] Transtrum, M. K., Machta, B. B. & Sethna, J. P. Why are nonlinear fits to data so challenging? Physical Review Letters , 060201 (2010).[42] Flyvbjerg, H. & Petersen, H. G. Error estimates on averages of correlated data.
The Journal of Chemical Physics, 461–466 (1989). [43] Berg, B. A. & Billoire, A. Markov chain Monte Carlo simulations (Wiley Online Library, 2008). [44] Van Kampen, N. G.
Stochastic processes in physics and chemistry, vol. 1 (Elsevier, 1992). [45] GNU General Public License.
Biometrika , 1–15 (1974).[47] Efron, B. & Tibshirani, R. J. An introduction to the bootstrap (CRC press, 1994).[48] Chenouard, N. et al.
Objective comparison of particle tracking methods.
Nature Methods , 281(2014).[49] Sbalzarini, I. F. & Koumoutsakos, P. Feature point tracking and trajectory analysis for videoimaging in cell biology. Journal of Structural Biology , 182–195 (2005).[50] Sbalzarini, I. F. & Koumoutsakos, P. Particletracker (2016). URL http://imagej.net/Particle_Tracker . Version November 2016.[51] Kariya, T. & Kurata, H.
Generalized least squares (John Wiley & Sons, 2004).[52] Savin, T. & Doyle, P. S. Static and dynamic errors in particle tracking microrheology.
Biophysical Journal, 623–638 (2005). [53] Martin, D. S., Forstner, M. B. & Käs, J. A. Apparent subdiffusion inherent to single particle tracking. Biophysical Journal, 2109–2117 (2002). [54] Calderon, C. P. Motion blur filtering: A statistical approach for extracting confinement forces and diffusivity from a single blurred trajectory. Physical Review E, 053303 (2016). [55] Chaichian, M. & Demichev, A. Path integrals in physics, vol. 1: Stochastic processes and quantum mechanics. IOP, Bristol, UK (2001). [56] Nørrelykke, S. F. & Flyvbjerg, H. Harmonic oscillator in heat bath: Exact simulation of time-lapse-recorded data and exact analytical benchmark statistics.
Physical Review E, 041103 (2011). [57] Qian, H. Fractional Brownian motion and fractional Gaussian noise. In Processes with Long-Range Correlations, 22–33 (Springer, 2003). [58] Mandelbrot, B. B. & Van Ness, J. W. Fractional Brownian motions, fractional noises and applications.
SIAM Review , 422–437 (1968).[59] Davies, R. B. & Harte, D. Tests for Hurst effect. Biometrika , 95–101 (1987).[60] Chambers, M. The simulation of random vector time series with given spectrum. Mathematical andComputer Modelling , 1–6 (1995).[61] Quenouille, M. H. Notes on bias in estimation. Biometrika , 353–360 (1956).[62] Gradshteyn, I. & Ryzhik, I. Table of integrals, series and products (corrected and enlarged editionprepared by A. Jeffrey and D. Zwillinger). Academic Press, New York (2000).[63] Anderson, T. W.
An introduction to multivariate statistical analysis, 3rd ed. (Wiley New York,2003).[64] Schucany, W., Gray, H. & Owen, D. On bias reduction in estimation.
Journal of the American Statistical Association, 524–533 (1971). [65] Efron, B. & Stein, C. The jackknife estimate of variance. The Annals of Statistics.
Acknowledgments
We are grateful to Bo Söderberg and Björn Linse for fruitful discussions. T.A. was supported by theSwedish Research Council (grant nos 2009-2924 and 2014-4305). K.F. was supported by the SwedishResearch Council (grant no 2010-5219). M.A.L. acknowledges funding from the Danish council forIndependent Research-Natural Sciences (FNU), grant number 4002-00428B.
Author contributions statement
M.A.L. and T.A. conceived the idea of the project. All authors contributed to the conceptual design ofthe WLS-ICE method. K.F. performed the simulations and wrote the analysis software supervised byT.A. K.F. prepared all figures. T.A. and K.F. wrote the manuscript with help from A.I. and M.A.L.T.A. derived the new error estimation formula (with and without jackknife). M.A.L. derived the biascorrection prediction for BM with input from K.F. and T.A. A.I. suggested the use of jackknife for CCMfitting. T.A. coordinated the project.
Competing interests
The authors declare no competing interests.

Supplementary Figures
Supplementary Figure S1:
Correlations in fluctuations around ensemble averages for real trajectories compared to uncorrelated fluctuations (synthetic data).
The displacement squared, y_i^(m) = [x^(m)(t_i) − x^(m)(0)]², for fractional Brownian motion (FBM) as a function of time, t, for two trajectories, labeled by m, and the mean of a large ensemble (M = 10 ) of trajectories. Panel (a) shows actual trajectories, which exhibit strong temporal correlation, meaning: if we are above the mean at some time point on a trajectory, we are likely to still be above the mean at nearby time points (circled). In panel (b) we have constructed "synthetic" trajectories for comparison by using only one data point from each real trajectory and "throwing away" the rest, resulting in (computationally expensive) uncorrelated data. That is, within this "brute force" method, to generate a single uncorrelated trajectory of N sampling points, we need to use the same number of real trajectories and discard all data points save one. Data was generated from a one-dimensional FBM simulation with Hurst parameter H = 0. , see Supplementary Methods section C.3.

Supplementary Figure S2: Histograms of fitted parameters for two WLS methods and CCM compared to theoretical predictions for a small ensemble size.
All panels are identical to those in Figure 1 in the main text, except that we here only used M = 150 trajectories (instead of M = 1000). In panel e (the CTRW prefactor), the CCM fitting procedure gave a vastly incorrect parameter estimate (⟨θ̂⟩/θ* ≈ 13) and the associated histogram is therefore not displayed. Due to the smaller M value used here as compared to Figure 1 in the main text, the histograms of fitted parameters are non-Gaussian for panel e; see Supplementary Methods Sec. F for a discussion of this topic. The other panels converged to normal distributions for smaller M values. Examples of parameter fits to the MSD data are shown in Fig. S6.

Supplementary Figure S3: Bias in the parameter fit.
The residual bias in the fit (multiplied by the number of trajectories M) as a function of the number of sampling points, N (log-scale on the horizontal axis for visibility), averaged over parameters from fitting to S = 500 realizations of the mean. (a) For the Brownian motion (BM) CCM fit, the analytical prediction, G(N) (full line), for the first order bias follows the observed bias for M = 10 data (Supplementary Methods section E.3). For (b) damped harmonic oscillation (DHO) the biases in CCM and WLS are both small, but for (c–d) fractional Brownian motion (FBM) and (e–f) continuous time random walk (CTRW), the bias term in CCM is much larger than the WLS bias. The bias can be alleviated to some degree by a jackknife procedure. Error bars show the standard error of the mean. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S4: High order bias contribution in CCM fitting for BM.
The bias in theparameter estimation is commonly assumed to be of the form ˆ θ = θ ∗ + a/M + b/M + c/M + O ( M − ) ,see Supplementary Methods section E. In panel ( a ) the vertical axis shows the (negative) second orderbias term − b/M , and in ( b ) the (positive) third order term, c/M , for three different number ofsampling times N . Note that these are of comparable magnitude, but opposite sign. Thus a secondorder jackknife, which removes terms proportional to a/M and b/M , may yield more unfavorable resultsthan a first order jackknife, which only removes the a/M term. We note that the slope of the secondorder bias term approximately corresponds to M − , and the third order is slightly more. For panel (a)the second order bias was extracted combining equation (S122) and equation (S124), to give − b/M =2 θ (0 , , J + θ (0 , J − θ ∗ , and for panel (b) we have ( θ (0 , , J − θ ∗ ) = c/M , which follows immediately fromequation (S124). For simulation parameters, see Supplementary Methods section D.5.19upplementary Figure S5: Heuristic goodness-of-fit using the the coefficient of determination, R . The quality of the CCM and WLS fits are heuristically quantified by the coefficient of determination, R , as a function of sampling points, N (horizontal axis on log-scale for visibility), for our four prototypesystems: ( a – b ) Brownian motion (BM), ( c – d ) damped harmonic oscillation (DHO), ( e – f ) fractionalBrownian motion (FBM), and ( g – h ) continuous time random walk (CTRW). A perfect fit yields unitvalue, while a bad fit results in R (cid:28) (see Supplementary Methods, Sec. I). The number of trajectoriesused in the ensemble average was either M = 10 (left), or M = 80 (right). All data was averaged over S = 500 realizations, with standard deviation given by the error bars. For panels ( b , f ) only a few datapoints could be obtained, due to numerical instability of CCM, and for panels ( g , h ) R < for larger N . 
For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S6: Example of a fit to the mean of ensemble data for the WLS and CCM methods.
An illustrative example of a typical fit to average ensemble trajectory data, based on M = 150 trajectories, for (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c) fractional Brownian motion (FBM), and (d) continuous time random walk (CTRW). The model parameters were fitted to the data using either the WLS or the CCM fitting procedure, for N = 75, M = 150. For CCM fitting to the FBM data, we see that although the exponent is almost the same, the prefactor is inaccurate. For CCM fitting to CTRW data, both exponent and prefactor are poor. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S7: Error estimation using bootstrap resampling and jackknife error estimation.
Standard deviations of the parameter fits as a function of the number of sampling points, N, used in the fitting procedure. Each method is applied to S = 500 realizations of data from (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). The associated standard deviations in the parameter estimates serve as "actual" standard deviations. These actual values are compared to estimates using bootstrap resampling and jackknife error estimation procedures, see Supplementary Methods, Sec. H. We see that the bootstrap method gives rather reliable error estimates, similar to those of the WLS-ICE procedure; compare to Figure 2 in the main text. However, note that the bootstrap method is associated with a substantially larger computational time than the WLS-ICE. The jackknife error estimation performs worse than bootstrap resampling in general. For the jackknife error estimation, we used groups. For the bootstrap results, trajectories were resampled with replacement and the χ² minimization performed 100 times. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S8: Bias and variance of Brownian motion adapted least squares (BMALS) compared to the WLS-ICE and CCM methods.
We show the bias in the parameter fit (left panels) and its variance compared to estimates from the fitting procedure (right panels), as a function of the number of sampling times, N. MSDs based on two different data sizes, M (number of trajectories), were considered: (a, b) M = 10 and (c, d) M = 80, averaged over S = 500 realizations. Notice the lower variance of BMALS as compared to WLS-ICE, and that as M is increased the CCM variance approaches that of BMALS. BMALS is a hybrid between model matching and function fitting procedures, as it requires the true covariance matrix as input. Error bars show standard errors of the mean. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Tables

Description              Low density   Medium density   High density
Video                    S5            S1               S6
Number of trajectories   M = 310       M = 16           M = 5
Method / Observable      θ̂   σ̂        θ̂   σ̂           θ̂   σ̂
WLS-ICE

Supplementary Table S1: Results of the three fitting methods for "real world" particle tracking data, without jackknife. Results shown are for the same data as in Table 1 in the main text, but here before the jackknife procedures were applied. Comparing to Table 1 in the main text, we see that the biases are rather large for video S6 (few trajectories) but minor for video S5 (large number of trajectories).
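The first order (delete-one) jackknife used above, for both bias reduction and error estimation, can be sketched generically as follows; the estimator and data are assumed toy choices, not the paper's fitting procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

def jackknife(data, estimator):
    """Delete-one jackknife. Returns the first order bias-corrected
    estimate, theta_J = M*theta - (M-1)*mean(theta_i), together with the
    jackknife error, computed from the spread of leave-one-out estimates."""
    m = len(data)
    theta = estimator(data)
    theta_i = np.array([estimator(np.delete(data, i)) for i in range(m)])
    theta_j = m * theta - (m - 1) * theta_i.mean()
    err = np.sqrt((m - 1) / m * np.sum((theta_i - theta_i.mean()) ** 2))
    return theta_j, err

# Toy check (assumed estimator): (sample mean)^2 estimates mu^2 = 4
# with an O(1/M) bias, which the first order jackknife removes.
data = rng.normal(loc=2.0, scale=1.0, size=100)
theta_j, err = jackknife(data, lambda d: d.mean() ** 2)
print(theta_j, err)
```

In the tables above, the same delete-one idea is applied with the full χ² fit playing the role of the estimator.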
Abbreviation   Comment
WLS-ICE        weighted least squares including correlations in error estimation (new method)
WLS-ECE        weighted least squares excluding correlations in error estimation (old method)
CCM            correlated chi-square method (old method)
BM             Brownian motion: zero-mean process without memory
DHO            damped harmonic oscillation: process with a time-dependent mean
FBM            fractional Brownian motion: zero-mean process with memory
CTRW           continuous time random walk: zero-mean, ageing process

Supplementary Table S2: List of abbreviations.

Supplementary Methods

Contents
A Weighted Least Squares Including Correlations in Error Estimation (WLS-ICE)
  A.1 Parameter estimation
  A.2 Error estimation
B Review of previous fitting procedures
  B.1 WLS-ECE fitting
    B.1.1 General fit functions
    B.1.2 Linear fit functions
  B.2 CCM fitting
    B.2.1 General fit functions
    B.2.2 Linear fit functions
C Prototypical model systems
  C.1 Brownian motion
  C.2 Damped harmonic oscillation in a heat bath (DHO)
  C.3 Fractional Brownian motion
  C.4 Continuous time random walk (CTRW)
D Simulation procedures
  D.1 Brownian motion (BM)
  D.2 Damped harmonic oscillation (DHO)
  D.3 Fractional Brownian motion (FBM)
  D.4 Continuous time random walk (CTRW)
  D.5 Simulation parameters
E Bias effects in parameter estimation
  E.1 The origin of bias
  E.2 Bias in parameter estimation of CCM for linear fit functions
  E.3 Bias in parameter estimation of CCM for BM
    E.3.1 Asymptotic expansion
  E.4 Bias in parameter estimation of WLS for BM
  E.5 Lack of bias for BMALS
  E.6 Lack of bias in parameter estimation of CCM for DHO
F Approximate distribution for the estimated parameters
G Jackknife bias reduction
  G.1 First order jackknife bias reduction
  G.2 Second order jackknife bias reduction
  G.3 Variance for jackknife-bias-reduced estimators
    G.3.1 First order jackknife bias reduction
    G.3.2 Second order jackknife bias reduction
H Estimation of errors on estimated parameters, using jackknife and bootstrap procedures
  H.1 Jackknife error estimation
  H.2 Bootstrap error estimation
I Coefficient of determination
J Settings in "Particle Tracker" plug-in
In this Supplementary Methods, details of the derivations, simulations and methods are provided.For convenience, Table S2 lists all abbreviations used herein.
A Weighted Least Squares Including Correlations in Error Estimation (WLS-ICE)
We here describe our new fitting procedure, the WLS-ICE method, in detail. As demonstrated in the main text, the previous standard methods for fitting ensemble averages, the WLS-ECE and CCM procedures (section B), are of limited general applicability for fitting correlated data: the WLS-ECE method assumes that data points are independent, resulting in flawed error estimation, whereas the CCM method (involving inversion of a noisy sample covariance matrix) produces ill-conditioned results or a strong bias in the parameter estimation. We here formulate the problem at hand as the minimization of a "cost function", χ², which can be chosen rather generally. Minimizing this cost function provides an estimate, θ̂, for the model parameters of interest. However, unlike the WLS-ECE fitting procedure, where fluctuations around mean values are assumed to be independent, we use the full multivariate probability density function for the mean values, eq. (S8) (which is Gaussian due to the multivariate central limit theorem), when estimating the standard error and covariance of the fitted parameters. This provides a mathematically rigorous way of avoiding the problems with previous fitting methods.

A.1 Parameter estimation
The cost function used herein is a $\chi^2$ functional (eq. (2) in the main text) of the form
$$\chi^2 = (\mathbf{f} - \bar{\mathbf{y}})^T R\, (\mathbf{f} - \bar{\mathbf{y}}), \qquad (S1)$$
where $\bar{\mathbf{y}} = (\bar y_1, \ldots, \bar y_N)^T$, $\mathbf{f} = (f_1, \ldots, f_N)^T$, $f_i = f(T_i; \theta)$ with sampling times $T_i$ ($i = 1, \ldots, N$), and $(\ldots)^T$ denotes transpose. We find the best parameters $\hat\theta$ by minimizing $\chi^2$, i.e., by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = 2 \sum_{i,j} \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ij}\, (f_j(\hat\theta) - \bar y_j), \qquad (S2)$$
where $a = 1, \ldots, K$. As in the main text, a 'bar' denotes a sample estimator, a 'hat' denotes parameters obtained through $\chi^2$ minimization, and a 'star' denotes the true value of a parameter. For a linear fit function, $f_i(\theta) = \theta T_i$, eq. (S2) can be solved analytically:
$$\hat\theta = \frac{\bar{\mathbf{y}}^T R\, \mathbf{T}}{\mathbf{T}^T R\, \mathbf{T}}. \qquad (S3)$$
Note that the positive definite symmetric matrix $R$ in eq. (S1) could potentially be custom made for particular applications. In the main text the observables $\bar y_i$ are mean positions or mean square displacements at different sampling times, $T_i$. We note, however, that our WLS-ICE procedure is valid for any type of ensemble averaged observable (the matrices $\bar C$ and $\bar Q$ below are then the covariance matrices for those particular observables).

For the matrix $R$, we consider three main choices:
1. Correlated Chi-Square Method (CCM):
Here we make use of the full covariance matrix (see section B.2):
$$R = R^{[\mathrm{CCM}]} = \bar C^{-1}, \qquad (S4)$$
where $\bar C$ is the covariance matrix of the mean, $\bar C = \bar Q / M$, as defined in eq. (3) in the main text.

2. Weighted least squares (WLS): Here we only make use of the diagonal elements,
$$R_{ij} = R^{[\mathrm{WLS}]}_{ij} = \delta_{i,j} / \bar C_{ii}, \qquad (S5)$$
where $\delta_{i,j}$ is the Kronecker delta.
3. Brownian motion adapted least squares (BMALS):
Finally we probe our fitting method with the following choice:
$$R = R^{[\mathrm{BMALS}]} = \left( \frac{1}{M} Q^*_{\mathrm{BM}} \right)^{-1}, \qquad (S6)$$
where $Q^*_{\mathrm{BM}}$ is the exact covariance matrix for BM, see eq. (S38). For a comparison of the BMALS method to WLS-ICE, see Supplementary Figure S8.

A.2 Error estimation
The covariance of the estimated parameters (i.e., the parameters $\hat\theta$ obtained by solving eq. (S2)) is defined as
$$\hat\Delta_{ab} = \langle (\hat\theta_a - \theta^*_a)(\hat\theta_b - \theta^*_b) \rangle, \qquad (S7)$$
where $\langle F(\bar{\mathbf{y}}) \rangle = \int F(\bar{\mathbf{y}})\, \rho(\bar{\mathbf{y}}; \theta^*)\, d\bar y_1 d\bar y_2 \cdots d\bar y_N$ denotes an average over the multivariate probability density $\rho(\bar{\mathbf{y}}; \theta^*)$. Due to the multivariate central limit theorem (note that $\bar{\mathbf{y}}$ is a sum of $M$ identically distributed random contributions), for large $M$ this probability density is a multivariate Gaussian:
$$\rho(\bar{\mathbf{y}}; \theta^*) = Z^{-1} \exp\left( -\frac{1}{2} (\bar{\mathbf{y}} - \mathbf{y}^*)^T C^{*-1} (\bar{\mathbf{y}} - \mathbf{y}^*) \right), \qquad (S8)$$
with normalization constant $Z = (2\pi)^{N/2} \sqrt{\det(C^*)}$ [44] and $C^* = Q^*/M$, where $Q^*$ is the exact covariance matrix.

In order to derive an explicit expression for $\hat\Delta_{ab}$ we follow the lines of thought of Gottlieb et al. [12] and make a first order Taylor series expansion of the estimated parameter values in terms of deviations of the estimated $\bar{\mathbf{y}}$ from their true values:
$$\hat\theta_a - \theta^*_a = \sum_k \left.\frac{\partial \hat\theta_a}{\partial \bar y_k}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*} (\bar y_k - y^*_k) + O[(\bar y_k - y^*_k)(\bar y_l - y^*_l)]. \qquad (S9)$$
Substituting this expression into eq. (S7) and using the definition of the covariance matrix, $C^*_{kl} = \langle (\bar y_k - y^*_k)(\bar y_l - y^*_l) \rangle$ [this result follows from eq. (S8)], we find, to first order,
$$\hat\Delta_{ab} = \sum_{k,l} \left.\frac{\partial \hat\theta_a}{\partial \bar y_k}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*} C^*_{kl} \left.\frac{\partial \hat\theta_b}{\partial \bar y_l}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*}. \qquad (S10)$$
In order to obtain an explicit expression for $\partial\hat\theta_a/\partial\bar y_k$ we differentiate eq. (S2) with respect to $\bar y_k$. This yields
$$\sum_b \hat h_{ab} \frac{\partial \hat\theta_b}{\partial \bar y_k} - 2 \sum_i \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ik} = 0, \qquad (S11)$$
where we introduced
$$\hat h_{ab} = 2 \sum_{i,j} \left.\frac{\partial^2 f_i(\theta)}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta} R_{ij}\, (f_j(\hat\theta) - \bar y_j) + 2 \sum_{i,j} \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ij} \left.\frac{\partial f_j(\theta)}{\partial \theta_b}\right|_{\theta=\hat\theta}. \qquad (S12)$$
Solving eq. (S11) we obtain
$$\frac{\partial \hat\theta_a}{\partial \bar y_k} = 2 \sum_i \sum_b (\hat h^{-1})_{ab} \left.\frac{\partial f_i(\theta)}{\partial \theta_b}\right|_{\theta=\hat\theta} R_{ik}, \qquad (S13)$$
which, when substituted into eq. (S10), yields the following expression for the covariance of the estimated parameters $\hat\theta$:
$$\hat\Delta_{ab} = 4 \sum_{c,d} \sum_{j,k,l,m} (\hat h^{-1})_{ac} \frac{\partial f_j(\theta)}{\partial \theta_c}\, R_{jk}\, C^*_{kl}\, R_{lm}\, \frac{\partial f_m(\theta)}{\partial \theta_d}\, (\hat h^{-1})_{db}, \qquad (S14)$$
where all derivatives (and $\hat h$) are evaluated at $\theta = \hat\theta$, $\bar{\mathbf{y}} = \mathbf{y}^*$.

We finally replace all exact quantities above by the corresponding sample estimators (and use $\bar C = \bar Q / M$), giving the key result, eq. (4) in the main text. The replacement of exact ensemble averages by sample estimates introduces bias terms which, to first order, are proportional to $1/M$, where $M$ is the number of trajectories, see section E.1. For the WLS-ICE/WLS-ECE procedures, we find that the bias is in practice often negligible (see main text). Just as the parameter estimates $\hat\theta_a$ are typically biased, so too will the quantity $\hat\phi_{ab}$ in eq. (4) in the main text be, since it is a nonlinear function of sample estimates, see section E.1. This bias can be reduced by applying the jackknife procedure to $\hat\phi_{ab}$ (see section G).

B Review of previous fitting procedures
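Before reviewing these earlier methods, we note that for a linear fit function the WLS-ICE recipe of section A, i.e., eq. (S3) with the diagonal weights of eq. (S5) for the parameter estimate followed by eq. (S14) with the full covariance matrix for the error estimate, collapses to a few lines of code. The following Python sketch (illustrative only; it is not the published WLS-ICE software, and all names and parameter values are our own) applies it to simulated BM data, where the true slope is $\theta^* = 2D$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, D, eps = 500, 20, 0.5, 1.0          # trajectories, sampling times, diffusion constant, time step
t = eps * np.arange(1, N + 1)             # sampling times T_i

# M one-dimensional BM trajectories; y^(m)_i is the squared displacement (eq. S32)
x = np.cumsum(rng.normal(0.0, np.sqrt(2.0 * D * eps), size=(M, N)), axis=1)
y = x**2

ybar = y.mean(axis=0)                     # ensemble average at each sampling time
Qbar = np.cov(y, rowvar=False, ddof=1)    # sample covariance matrix, eq. (S39)
Cbar = Qbar / M                           # covariance of the mean, C-bar = Q-bar/M

# Parameter estimate: eq. (S3) with diagonal weights R = diag(1/C_ii), eq. (S5)
R = np.diag(1.0 / np.diag(Cbar))
theta_hat = (ybar @ R @ t) / (t @ R @ t)  # fit of f_i = theta*T_i; true value theta* = 2D = 1

# WLS-ICE error estimate: eq. (S14) with the full C-bar; for a one-parameter
# linear fit it reduces to the "sandwich" form (t^T R C R t) / (t^T R t)^2
var_ice = (t @ R @ Cbar @ R @ t) / (t @ R @ t) ** 2

# Error estimate ignoring correlations (WLS-ECE, eq. S23b), for comparison
var_ece = 1.0 / (t @ R @ t)

print(theta_hat, np.sqrt(var_ice), np.sqrt(var_ece))
```

For correlated data such as BM, the ICE standard error is typically substantially larger than the (overly optimistic) ECE one, reflecting the positive temporal correlations in the fluctuations.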
In this section we review the two previously ubiquitous $\chi^2$ methods for model fitting, namely WLS-ECE (uncorrelated $\chi^2$) fitting and CCM (correlated $\chi^2$) fitting.

B.1 WLS-ECE fitting
The hitherto most common method for fitting a function to data is the "standard" weighted least squares method (WLS-ECE in the main text), i.e., uncorrelated $\chi^2$ fitting, which is reviewed in this section. In this method, one assumes that all fluctuations around mean values are uncorrelated.

B.1.1 General fit functions
In the WLS-ECE method one maximizes the probability for the function $f(T_i; \theta) = f_i(\theta)$ to be a good fit to the data:
$$P(\bar{\mathbf{y}}; \theta) \propto \prod_{i=1}^{N} \exp\left( -\frac{(\bar y_i - f_i(\theta))^2}{2 \bar\sigma_i^2} \right). \qquad (S15)$$
Note that this probability is a product over the observations $\bar y_i$; hence the data are assumed to be statistically independent. Within this assumption, the unbiased estimator of the variance of the mean is
$$\bar\sigma_i^2 = \frac{1}{M(M-1)} \sum_{m=1}^{M} (y_i^{(m)} - \bar y_i)^2. \qquad (S16)$$
Maximizing the probability $P$ is equivalent to minimizing
$$\chi^2 = \sum_{i=1}^{N} \frac{(\bar y_i - f_i(\theta))^2}{\bar\sigma_i^2}, \qquad (S17)$$
from which we get the estimated parameters $\hat\theta$ by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = 2 \sum_i \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} \frac{f_i(\hat\theta) - \bar y_i}{\bar\sigma_i^2}. \qquad (S18)$$
For $\chi^2$ close to the estimated parameter set $\hat\theta$ we have the Taylor expansion
$$\chi^2 = \chi^2\big|_{\hat\theta} + \sum_{a=1}^{K} (\theta_a - \hat\theta_a) \left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} + \frac{1}{2} \sum_{a,b=1}^{K} (\theta_a - \hat\theta_a)(\theta_b - \hat\theta_b) \left.\frac{\partial^2 \chi^2}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta}, \qquad (S19)$$
which we can insert back into the expression for $P$, eq. (S15) [note that $P \propto \exp(-\chi^2/2)$], to yield
$$P(\theta) = W \exp\left( -\frac{1}{4} \sum_{a,b=1}^{K} \hat H_{ab} (\theta_a - \hat\theta_a)(\theta_b - \hat\theta_b) \right), \qquad (S20)$$
where $W$ is a normalization constant and
$$\hat H_{ab} = \left.\frac{\partial^2 \chi^2}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta} \qquad (S21)$$
is the Hessian matrix, and we used $\partial\chi^2/\partial\theta_a|_{\theta=\hat\theta} = 0$. From eq. (S20) we find that
$$\hat\Delta_{ab} \equiv \langle (\hat\theta_a - \theta^*_a)(\hat\theta_b - \theta^*_b) \rangle = 2 (\hat H^{-1})_{ab}, \qquad (S22)$$
i.e., the inverse of the Hessian matrix determines the covariances of the estimated parameters.

B.1.2 Linear fit functions
For the case that the fit function is linear, i.e., $f_i(\theta) = \theta T_i$, eq. (S18) can be solved analytically (Press et al. [9]). The same can be done for the variance, $\hat\sigma^2$, of the estimated parameter. We have
$$\hat\theta = \frac{\sum_i \bar y_i T_i / \bar\sigma_i^2}{\sum_i T_i^2 / \bar\sigma_i^2}, \qquad (S23a)$$
$$\hat\sigma^2 = \hat\Delta = \frac{1}{\sum_i T_i^2 / \bar\sigma_i^2}. \qquad (S23b)$$

B.2 CCM fitting
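The WLS-ECE linear-fit formulas, eqs. (S23a)-(S23b), and the CCM formulas derived below, eqs. (S29a)-(S29b), differ only in whether the diagonal or the full (inverse) covariance matrix of the mean enters as weight. A side-by-side numerical sketch (Python; the data generation and variable names are illustrative, not taken from this work):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 400, 10
T = np.arange(1.0, N + 1)

# Synthetic correlated data: BM-like squared displacements with true slope theta* = 2
x = np.cumsum(rng.normal(0.0, np.sqrt(2.0), size=(M, N)), axis=1)
y = x**2                                   # <y_i> = 2 t_i
ybar = y.mean(axis=0)
C = np.cov(y, rowvar=False, ddof=1) / M    # covariance matrix of the mean
sig2 = np.diag(C)                          # sigma_i^2 of eq. (S16)

# WLS-ECE, eqs. (S23a)-(S23b): diagonal weights only
theta_ece = np.sum(ybar * T / sig2) / np.sum(T**2 / sig2)
var_ece = 1.0 / np.sum(T**2 / sig2)

# CCM, eqs. (S29a)-(S29b): full inverse covariance matrix as weight
Cinv = np.linalg.inv(C)
theta_ccm = (ybar @ Cinv @ T) / (T @ Cinv @ T)
var_ccm = 1.0 / (T @ Cinv @ T)

print(theta_ece, theta_ccm, var_ece, var_ccm)
```

Note that inverting the noisy sample estimate `C` is exactly the step that makes CCM ill-conditioned or biased when $N$ is large or $M$ is small (sections B.2 and E); here $M \gg N$, so the inversion is benign.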
In this section we review the CCM (correlated chi-square method) fitting procedure [10, 12, 14, 13].
B.2.1 General fit functions
Where a WLS-ECE fit only makes use of the diagonal (the variances) of the covariance matrix, CCM makes use of the full matrix, defined as in eq. (S39), whose diagonal contains the squares of the standard errors of the mean, $s_i^2 = \bar Q_{ii} / M$. The task of fitting a function $f(T_i; \theta)$ reduces to maximizing the probability, which is taken as the multivariate Gaussian
$$P(\bar{\mathbf{y}}; \theta) = Z^{-1} \exp\left( -\frac{1}{2} (\bar{\mathbf{y}} - \mathbf{f}(\theta))^T \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\theta)) \right), \qquad (S24)$$
where (for a good fit, $\mathbf{y}^* \approx \mathbf{f}$) $\bar C = \bar Q / M$ can be estimated through eq. (3) in the main text, the normalization constant is $Z = (2\pi)^{N/2} \sqrt{\det(\bar C)}$ [44], $\bar{\mathbf{y}} = (\bar y_1, \ldots, \bar y_N)^T$, $\mathbf{f} = (f_1, \ldots, f_N)^T$ with $f_i = f(T_i; \theta)$, and $(\ldots)^T$ denotes transpose. For uncorrelated data the covariance matrix estimator $\bar C$ will be diagonal, eq. (S24) reduces to eq. (S15), and the WLS-ECE method is recovered.

As for WLS-ECE, maximizing $P$ is equivalent to minimizing the cost function
$$\chi^2 = (\bar{\mathbf{y}} - \mathbf{f}(\theta))^T \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\theta)). \qquad (S25)$$
Thus, we get our estimated parameters $\hat\theta_a$ ($a = 1, \ldots, K$) by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = -\left.\frac{\partial \mathbf{f}^T}{\partial \theta_a}\right|_{\theta=\hat\theta} \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\hat\theta)) + (\bar{\mathbf{y}} - \mathbf{f}(\hat\theta))^T \bar C^{-1} \left( -\left.\frac{\partial \mathbf{f}}{\partial \theta_a}\right|_{\theta=\hat\theta} \right) = 2 \left.\frac{\partial \mathbf{f}^T}{\partial \theta_a}\right|_{\theta=\hat\theta} \bar C^{-1} (\mathbf{f}(\hat\theta) - \bar{\mathbf{y}}), \qquad (S26)$$
where in the last step we used the symmetry of $\bar C$, i.e., that $\bar C_{ij} = \bar C_{ji}$.

The derivation of the covariance, $\hat\Delta_{ab}$, of the CCM-estimated parameters $\hat\theta_a$ follows along identical lines as for WLS-ECE (previous section). Hence, $\hat\Delta_{ab}$ is given by eq. (S22), where $\hat\theta_a$ is now obtained by solving eq. (S26) (instead of eq. (S18) as for WLS-ECE).

We finally note that the CCM is a maximum likelihood estimation procedure "asymptotically". More precisely, if $M$ is large enough that the $\bar y_i$ are Gaussian by the multivariate central limit theorem, if the fit is "good" in the sense that $\mathbf{y}^* \approx \mathbf{f}$, and if the errors on the estimated elements of the covariance matrix are negligible, then the CCM is a maximum likelihood estimation method.

B.2.2 Linear fit functions
For fitting a linear function, $f_i(\theta) = \theta T_i$, to data one can determine the minimum of the CCM $\chi^2$ function, eq. (S25), analytically. In particular, such a fit function is of relevance for BM (section C.1). Eq. (S26) becomes
$$\frac{\partial \chi^2}{\partial \theta} = -2 (\bar{\mathbf{y}} - \theta \mathbf{T})^T \bar C^{-1} \mathbf{T}. \qquad (S27)$$
Taking the second derivative we get
$$\frac{\partial^2 \chi^2}{\partial \theta^2} = 2\, \mathbf{T}^T \bar C^{-1} \mathbf{T}. \qquad (S28)$$
From these results, as well as from eq. (S21) and eq. (S22), we get the estimated value of the parameter $\theta$ and its variance $\hat\sigma^2$ as
$$\hat\theta = \frac{\bar{\mathbf{y}}^T \bar C^{-1} \mathbf{T}}{\mathbf{T}^T \bar C^{-1} \mathbf{T}}, \qquad (S29a)$$
$$\hat\sigma^2 = \hat\Delta = \frac{1}{\mathbf{T}^T \bar C^{-1} \mathbf{T}}. \qquad (S29b)$$

C Prototypical model systems
In the main text we provide results for different parameter estimation procedures. As prototype systems we use four processes for which the true parameter values are known, namely: (i) Brownian motion (BM), (ii) damped harmonic oscillation (DHO), (iii) fractional Brownian motion (FBM), and (iv) continuous time random walks (CTRW). For BM and CTRW in $d$ spatial dimensions, steps in different directions are independent. Therefore, without loss of generality, all simulations for these systems are here performed in one dimension, $d = 1$. Also, for consistency, we use $d = 1$ in our FBM simulations.

C.1 Brownian motion
Our first example is simple BM, which can be used to describe, e.g., single particle diffusion in one dimension. The mean square displacement (MSD) at time $t$, for dimension $d$ and diffusion constant $D$, is
$$\langle (x(t) - x(0))^2 \rangle = \langle y(t) \rangle = \theta t, \qquad (S30)$$
where
$$\theta = 2 d D \qquad (S31)$$
and
$$y(t) = [x(t) - x(0)]^2. \qquad (S32)$$
In all simulations in the main text we use one-dimensional simulations, i.e., $d = 1$.

For one-dimensional BM, the full covariance matrix of the displacement is known [55]. Choosing $x(0) = 0$ and discretizing time into process times $t_i = i\epsilon$ ($i = 1, \ldots, N$), with time step $\epsilon$, we have
$$V^*_{ij} = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle = 2 D \min(t_i, t_j), \qquad (S33)$$
where $x_i = x(t_i)$ and $D$ is the diffusion constant. In matrix form:
$$V^* = 2 D \epsilon \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & N \end{pmatrix}. \qquad (S34)$$
Of interest here is also the covariance matrix of the square displacements:
$$Q^*_{ij} = \langle (y_i - \langle y_i \rangle)(y_j - \langle y_j \rangle) \rangle. \qquad (S35)$$
Using Wick's (Isserlis') theorem for zero-mean Gaussian processes, we can calculate any moment of a multivariate Gaussian according to
$$\langle x_1 x_2 \cdots x_n \rangle = \sum \prod \langle x_i x_j \rangle, \qquad (S36)$$
where the sum is over all distinct ways of partitioning $x_1, \ldots, x_n$ into pairs $x_i x_j$. Using eq. (S36) we obtain the following relation between $Q^*$ and $V^*$:
$$Q^*_{ij} = 2 (V^*_{ij})^2. \qquad (S37)$$
In matrix form:
$$Q^* = 8 (D\epsilon)^2 \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 4 & 4 & \cdots & 4 \\ 1 & 4 & 9 & \cdots & 9 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 4 & 9 & \cdots & N^2 \end{pmatrix}. \qquad (S38)$$
The standard unbiased sample estimator of $Q$ is
$$\bar Q_{ij} = \frac{1}{M-1} \sum_m (y_i^{(m)} - \bar y_i)(y_j^{(m)} - \bar y_j), \qquad (S39)$$
where $m$ labels trajectories, see main text.

For BM, the inverse of the $Q^*$ matrix is a tridiagonal matrix with zero column sums, except for the first column. Explicitly,
$$Q^{*-1} = \frac{1}{8(D\epsilon)^2} \begin{pmatrix} 1 + \tfrac{1}{3} & -\tfrac{1}{3} & & & \\ -\tfrac{1}{3} & \tfrac{1}{3} + \tfrac{1}{5} & -\tfrac{1}{5} & & \\ & -\tfrac{1}{5} & \tfrac{1}{5} + \tfrac{1}{7} & \ddots & \\ & & \ddots & \ddots & -\tfrac{1}{2N-1} \\ & & & -\tfrac{1}{2N-1} & \tfrac{1}{2N-1} \end{pmatrix}, \qquad (S40)$$
which can be written as
$$(Q^{*-1})_{ij} = \frac{1}{8(D\epsilon)^2} \left[ \left( \frac{1}{2i-1} + \frac{1 - \delta_{i,N}}{2i+1} \right) \delta_{i,j} - \frac{1}{2i+1} \delta_{i,j-1} - \frac{1}{2i-1} \delta_{i,j+1} \right], \qquad (S41)$$
where $\delta_{i,j}$ is the Kronecker delta ($\delta_{i,j} = 1$ if $i = j$; $\delta_{i,j} = 0$ if $i \neq j$). It is straightforward to show that the matrix above indeed satisfies $Q^{*-1} \cdot Q^* = I$, where $I$ is the identity matrix. Note that the results above for $Q^{*-1}$ assume that the first sampling time equals the spacing between subsequent sampling times. In general, this choice of sampling times may not be optimal. In such situations one can evaluate $Q^{*-1}$ by numerical inversion of the $Q^*$ given by eqs. (S37) and (S33).

C.2 Damped Harmonic Oscillation in a heat bath (DHO)

Following Nørrelykke and Flyvbjerg [56] we consider the dynamics of a damped harmonic oscillation in a heat bath (DHO). Physically, this process corresponds to the motion of a particle in a harmonic potential (i.e., the particle experiences a restoring force proportional to the displacement from the bottom of the potential) in a viscous liquid. Besides exerting friction on the particle, the molecules of the viscous liquid act as a noise source by giving the particle thermal kicks. The equation of motion is
$$m \frac{d^2 x(t)}{dt^2} + \gamma \frac{dx(t)}{dt} + \kappa x(t) = F_{\mathrm{therm}}(t), \qquad (S42)$$
where $x(t)$ is the particle position at time $t$, $m$ is the mass, $\gamma$ is the friction constant, $\kappa$ is the spring constant and $F_{\mathrm{therm}} = (2 k_B T \gamma)^{1/2} \eta(t)$ is the thermal noise, which is assumed to be zero-mean Gaussian and delta-correlated, i.e.,
$$\langle \eta(t) \rangle = 0 \qquad (S43)$$
and
$$\langle \eta(t) \eta(t') \rangle = \delta(t - t'). \qquad (S44)$$
Above, $k_B$ is the Boltzmann constant, $T$ is the temperature of the heat bath and $\delta(z)$ is the Dirac delta function.
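Equation (S42), together with the noise statistics (S43)-(S44), can be integrated numerically with a simple Euler-Maruyama scheme. The sketch below (our own illustrative parameter values; the simulations in this work instead use the exact updating scheme of Nørrelykke and Flyvbjerg [56], see section D.2) checks a single weakly driven trajectory at critical damping against the mean solution, eq. (S53):

```python
import numpy as np

rng = np.random.default_rng(3)
m, gamma, kappa, kBT = 1.0, 2.0, 1.0, 1e-2   # critical damping: gamma**2 == 4*m*kappa
x0, dt, nsteps = 1.0, 1e-3, 20000

x, v = x0, 0.0                                # initial conditions, eqs. (S45)-(S46)
xs = np.empty(nsteps)
for n in range(nsteps):
    # Euler-Maruyama step for m*dv = (-gamma*v - kappa*x)*dt + sqrt(2*kBT*gamma)*dW
    dW = rng.normal(0.0, np.sqrt(dt))
    a_det = -(gamma / m) * v - (kappa / m) * x
    x += v * dt
    v += a_det * dt + np.sqrt(2.0 * kBT * gamma) / m * dW
    xs[n] = x

t = dt * np.arange(1, nsteps + 1)
theta = gamma / (2.0 * m)                     # theta = 1/(2*tau), eq. (S52)
mean_pred = x0 * (1.0 + theta * t) * np.exp(-theta * t)   # eq. (S53)
print(np.max(np.abs(xs - mean_pred)))         # small, since kBT is small here
```

With the small thermal energy chosen, a single trajectory stays close to the deterministic relaxation; the equilibrium fluctuations have standard deviation $\sqrt{k_B T / \kappa} = 0.1$ in these units.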
The equation of motion is completed by initial conditions for the position and velocity. We restrict ourselves to
$$x(t=0) = x_0, \qquad (S45)$$
$$v(t=0) = \left.\frac{dx(t)}{dt}\right|_{t=0} = 0, \qquad (S46)$$
i.e., the particle is at the initial time displaced a distance $x_0$ from its equilibrium position and then let go without any imposed initial velocity (no external pushing or pulling).

Based on eq. (S42) it is straightforward to derive an expression for the expected position, $\langle x(t) \rangle$, at time $t$. By taking the ensemble average of eq. (S42) and then making the ansatz $\langle x(t) \rangle = \exp(i \Omega t)$, we arrive at a second order algebraic equation for $\Omega$ with two solutions:
$$\Omega_\pm = \frac{i}{2\tau} \pm \omega, \qquad (S47)$$
where
$$\tau = \frac{m}{\gamma}, \qquad (S48)$$
$$\omega^2 = \omega_0^2 - \frac{1}{4\tau^2} \qquad (S49)$$
and
$$\omega_0 = \sqrt{\frac{\kappa}{m}}. \qquad (S50)$$
Thus, for the case $\omega^2 < 0$ the solution for $\langle x(t) \rangle$ is an exponentially damped function. For the case $\omega^2 > 0$, the solution is a complex-valued exponential which can be written in terms of real-valued exponentials multiplied by sine and cosine functions. Also incorporating the initial conditions used here, eqs. (S45) and (S46), we find the solution for the mean to be
$$\langle x(t) \rangle = x_0 \left( \cos(\omega t) + \frac{\theta}{\omega} \sin(\omega t) \right) \exp(-\theta t), \qquad (S51)$$
with
$$\theta = \frac{1}{2\tau}. \qquad (S52)$$
The case $\omega = 0$ (i.e., $\omega_0 = 1/(2\tau)$) is referred to as critical damping. For this case we can obtain the solution from eq. (S51) by taking the limit $\omega \to 0$, which yields
$$\langle x(t) \rangle = x_0 (1 + \theta t) \exp(-\theta t). \qquad (S53)$$
The case of critical damping is used in the simulations in the main text, where $\theta$ is used as a fitting parameter.

Using the full stochastic eq. (S42), we can also derive an explicit expression for the covariance matrix $C^*(t, \tilde t) = \langle [x(t) - \langle x(t) \rangle][x(\tilde t) - \langle x(\tilde t) \rangle] \rangle$. For simplicity we limit ourselves to the case $\omega^2 \geq 0$. We start by rewriting eq.
(S42) as a set of two coupled first order equations [56]:
$$\frac{d}{dt} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = -\mathcal{M} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} + \begin{pmatrix} 0 \\ \frac{\sqrt{2D}}{\tau} \eta(t) \end{pmatrix}, \qquad (S54)$$
with $D = k_B T / \gamma$ being the particle diffusion constant and
$$\mathcal{M} = \begin{pmatrix} 0 & -1 \\ \omega_0^2 & \frac{1}{\tau} \end{pmatrix}, \qquad (S55)$$
which has the formal solution
$$\begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = \begin{pmatrix} \langle x(t) \rangle \\ \langle v(t) \rangle \end{pmatrix} + \frac{\sqrt{2D}}{\tau} \int_0^t \exp(-\mathcal{M}(t - t')) \begin{pmatrix} 0 \\ \eta(t') \end{pmatrix} dt', \qquad (S56)$$
where
$$\begin{pmatrix} \langle x(t) \rangle \\ \langle v(t) \rangle \end{pmatrix} = \exp(-\mathcal{M} t) \begin{pmatrix} x_0 \\ v_0 \end{pmatrix} \qquad (S57)$$
is the solution for the mean of eq. (S54) (using $\langle \eta(t) \rangle = 0$). The covariance matrix now becomes
$$C^*(t, \tilde t) = \frac{2D}{\tau^2} \int_0^t dt' \int_0^{\tilde t} dt'' \left[ \exp(-\mathcal{M}(t - t')) \right]_{12} \left[ \exp(-\mathcal{M}(\tilde t - t'')) \right]_{12} \langle \eta(t') \eta(t'') \rangle. \qquad (S58)$$
Without loss of generality, we assume $t < \tilde t$ and carry out the integral over $t''$ above to find
$$C^*(t, \tilde t) = \frac{2D}{\tau^2} \int_0^t dt' \left[ \exp(-\mathcal{M}(t - t')) \right]_{12} \left[ \exp(-\mathcal{M}(\tilde t - t')) \right]_{12}. \qquad (S59)$$
Using, for $\omega^2 > 0$, the explicit form of the matrix exponential above as provided by Nørrelykke et al. [56],
$$\exp(-\mathcal{M} t) = \exp(-\theta t) [\cos(\omega t) I + \sin(\omega t) J], \qquad (S60)$$
with $I$ the 2-by-2 identity matrix and
$$J = \begin{pmatrix} \frac{\theta}{\omega} & \frac{1}{\omega} \\ -\frac{\omega^2 + \theta^2}{\omega} & -\frac{\theta}{\omega} \end{pmatrix}, \qquad (S61)$$
eq. (S59) becomes
$$C^*(t, \tilde t) = \frac{8 D \theta^2}{\omega^2} \int_0^t dt' \exp(-\theta(t - t')) \exp(-\theta(\tilde t - t')) \sin[\omega(t - t')] \sin[\omega(\tilde t - t')]. \qquad (S62)$$
Carrying out the integral above we arrive at our final expression for the covariance matrix of the DHO:
$$C^*(t, \tilde t) = \frac{2 D \theta}{\omega (\theta^2 + \omega^2)} \exp(-\theta |\tilde t - t|) \left\{ \omega \cos[\omega(\tilde t - t)] + \theta \sin[\omega |\tilde t - t|] \right\} + \frac{2 D \theta}{\omega^2} \exp(-\theta(\tilde t + t)) \left( \frac{\theta^2}{\theta^2 + \omega^2} \cos[\omega(\tilde t + t)] - \cos[\omega(\tilde t - t)] - \frac{\omega \theta}{\theta^2 + \omega^2} \sin[\omega(\tilde t + t)] \right). \qquad (S63)$$
In the limit $\omega \to 0$ (critical damping) we have
$$C^*(t, \tilde t) = \frac{2D}{\theta} \left[ \exp(-\theta(\tilde t - t)) \left( 1 + \theta(\tilde t - t) \right) - \exp(-\theta(\tilde t + t)) \left( 1 + \theta(\tilde t + t) + 2 \theta^2 \tilde t t \right) \right].
(S64)

To arrive at this result we made a Taylor series expansion of the general expression to second order in $\omega$. We refrain from attempting to obtain an analytic expression for the inverse covariance matrix of the DHO, as this appears a daunting task beyond the scope of the current study.

C.3 Fractional Brownian motion
Our third example is one-dimensional FBM, which is a zero-mean Gaussian process with autocorrelation function [57]
$$v_{ij} = \langle x(t_i) x(t_j) \rangle = c \left( t_i^{2H} + t_j^{2H} - |t_i - t_j|^{2H} \right) \qquad (S65)$$
at discrete times $t_i = i\epsilon$, where $H$ denotes the Hurst parameter [58]. For $H = 1/2$, FBM becomes standard BM. Indeed, if we set $H = 1/2$ in eq. (S65) we find that $v_{ij} = c[(t_i + t_j) - |t_i - t_j|] = 2c \min(t_i, t_j)$, which is identical to eq. (S33) if we choose $c = D$. The inverse of the covariance matrix in eq. (S65) is (currently) not known analytically.

From eq. (S65) we get the MSD, for $t_i = t_j$ (and $x(0) = 0$), as
$$\langle x^2(t) \rangle = \theta_1 t^{\theta_2}, \qquad (S66)$$
where $\theta_1 = 2c$ and $\theta_2 = 2H$, i.e., the MSD has a sublinear (for $H < 1/2$) or superlinear (for $H > 1/2$) dependence on time, $t$.

C.4 Continuous time random walk (CTRW)
Our last example is the CTRW in one dimension. Such a process is defined through a waiting time density, $\psi(\tau)$, and a jump length probability density, $\zeta(\ell)$ [28]. In our case we choose
$$\psi(\tau) = \frac{\alpha}{\tau^*} (1 + \tau/\tau^*)^{-1-\alpha} \qquad (S67)$$
with $0 < \alpha < 1$, so that the average waiting time $\langle \tau \rangle$ is infinite. The jump length probability density is chosen to be a Gaussian,
$$\zeta(\ell) = \frac{1}{\sqrt{2\pi}\, a} \exp\left( -\frac{\ell^2}{2a^2} \right), \qquad (S68)$$
with variance $a^2$. For such a process, the MSD follows (for long times) [28]
$$\langle x^2(t) \rangle = \theta_1 t^{\theta_2} \qquad (S69)$$
(with $x(0) = 0$), where
$$\theta_1 = \frac{a^2}{\Gamma(1 + \alpha) \Gamma(1 - \alpha) (\tau^*)^\alpha} \qquad (S70)$$
and
$$\theta_2 = \alpha. \qquad (S71)$$

D Simulation procedures
In this section we provide details about the methods used to generate the data for the prototypical example systems introduced in section C. Simulations ran to a stop time $t_{\mathrm{stop}}$. All simulation parameters are listed in section D.5.

D.1 Brownian motion (BM)

BM in one dimension is simulated using random jump lengths drawn from a normal distribution. In some detail, we start by taking the cumulative sum of $N$ random numbers from a Gaussian distribution with zero mean and variance $a^2$, and square each element of the sum. Each step increments time by $\epsilon$. This is repeated $M$ times, summed and averaged. In short, the MSD was computed as
$$\bar y_i = \frac{1}{M} \sum_{m=1}^{M} \left[ \sum_{n=1}^{i} r_n^{(m)} \right]^2, \qquad (S72)$$
where $r_n^{(m)}$ is a random number drawn from a normal distribution, associated with the length of the $n$th jump of trajectory $m$. The diffusion constant for this type of process is $D = a^2 / (2\epsilon)$.

D.2 Damped harmonic oscillation (DHO)
When simulating the harmonic oscillation in a heat bath, see eq. (S42), we follow the procedure described by Nørrelykke and Flyvbjerg [56] (at critical damping, $\omega = 0$).

D.3 Fractional Brownian motion (FBM)
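For moderate $N$, FBM trajectories with the covariance of eq. (S65) can also be generated exactly by Cholesky factorization of the covariance matrix. This $O(N^3)$ approach is a simple alternative to the faster Davies-Harte algorithm used for the production runs described below; the sketch uses our own illustrative parameter values (e.g. the Hurst exponent):

```python
import numpy as np

rng = np.random.default_rng(4)
N, H, c, eps = 256, 0.3, 1.0, 1.0          # illustrative values
t = eps * np.arange(1, N + 1)

# FBM covariance matrix, eq. (S65)
ti, tj = np.meshgrid(t, t, indexing="ij")
V = c * (ti**(2 * H) + tj**(2 * H) - np.abs(ti - tj)**(2 * H))

# Exact sampling: x = L z, with V = L L^T and z standard normal
L = np.linalg.cholesky(V)
M = 2000
x = rng.standard_normal((M, N)) @ L.T      # M trajectories, each of length N

msd = (x**2).mean(axis=0)
# Compare with <x^2(t)> = 2 c t^{2H}, eq. (S66)
print(np.max(np.abs(msd / (2 * c * t**(2 * H)) - 1.0)))
```

Since each row of `x` has covariance $L L^T = V$ by construction, the sampled MSD should match eq. (S66) to within the statistical error of the $M$ trajectories.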
For the FBM simulations we used an algorithm by Davies and Harte [59, 60]. When fitting the model in eq. (S66), we include only time points $t \geq T_1$, since this model prediction for the MSD, as for CTRW (see section D.4), is only valid for large simulation times.

D.4 Continuous time random walk (CTRW)
To generate the CTRW data we move a "particle" randomly, with a step length drawn from the Gaussian probability density, eq. (S68), at each step, and increment time by a waiting time $\tau$ drawn from the power-law density in eq. (S67). In more detail: while the process time, $t$, is smaller than the designated stop time, we repeat the following procedure to generate one trajectory $m$:

1. Draw a random waiting time, $\tau$, from the power-law density in eq. (S67).
2. Move the particle by adding to the current displacement a random number $r$ drawn from a normal distribution.
3. Update the time $t$ by $\tau$.

The procedure is repeated $M$ times and averaged over, to yield the MSD. Since the prediction in eq. (S69) is only valid for $t \gg \tau^*$, we include, for fitting purposes, only time points $t \geq T_1$ in the $\chi^2$ expression, eq. (S1), and in the associated parameter covariance estimation formula, eq. (4) in the main text.

D.5 Simulation parameters
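The three-step CTRW procedure of section D.4 can be sketched as follows, recording the displacement on a fixed grid of sampling times. The waiting times are drawn from eq. (S67) by inverse transform sampling; all parameter values here are illustrative, not those used for the figures:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, tau_star, a = 0.7, 1.0, 1.0        # illustrative parameter values
t_stop, M = 1000.0, 300
t_sample = np.linspace(10.0, t_stop, 50)  # sampling times (only t >> tau* is used for fitting)

msd = np.zeros_like(t_sample)
for m in range(M):
    t, x, pos = 0.0, 0.0, np.zeros_like(t_sample)
    while t < t_stop:
        # 1. waiting time from eq. (S67): inverting the CDF 1 - (1 + tau/tau*)^(-alpha)
        tau = tau_star * (rng.uniform() ** (-1.0 / alpha) - 1.0)
        # the particle sits at x during [t, t + tau); record it at covered sampling times
        pos[(t_sample >= t) & (t_sample < t + tau)] = x
        # 2. Gaussian jump, eq. (S68); 3. advance the clock
        x += rng.normal(0.0, a)
        t += tau
    msd += pos**2
msd /= M
print(msd[-1])   # ensemble-averaged squared displacement at the last sampling time
```

For long times the recorded MSD should grow as $\theta_1 t^\alpha$, eq. (S69), with large trajectory-to-trajectory fluctuations characteristic of CTRW.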
Below we list the simulation parameters used in all simulations in the main text and for the Supplementary Figures. We also give the value of the first sampling time, $T_1$, used in the fit procedure (some of the functional forms used for fitting are only valid for "large" times).

• BM. Time increment, $\epsilon = 1$ (dimensionless). Step length variance, $a^2 = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\epsilon$. First sampling time, $T_1 = \epsilon$.

• DHO. Spring constant, $\kappa = 1$ (dimensionless). Mass, $m = 1$ (dimensionless). Initial position, $x_0 = 1$ (dimensionless). Thermal energy, $k_B T = 10^{-}$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 20\,\omega_0^{-1}$ (with $\omega_0 = \sqrt{\kappa/m} = 1$). First sampling time, $T_1 = \omega_0^{-1}$.

• FBM. Hurst exponent, $H = 1/2$, unless stated otherwise. Time increment, $\epsilon = 1$ (dimensionless). Prefactor in the covariance matrix, $c = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\epsilon$. First sampling time, $T_1 = 200\epsilon$.

• CTRW. Power-law exponent, $\alpha = 0.$ . Step length variance, $a^2 = 1$ (dimensionless). Characteristic time scale, $\tau^* = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\tau^*$. First sampling time, $T_1 = 10\tau^*$.

E Bias effects in parameter estimation
In this section, we provide analytical expressions for the bias in the parameter (diffusion constant) estimation for BM. We find that for BM the CCM method has a bias which increases strongly with the number of sampling times, $N$. In contrast, the WLS method has a (small) bias which is independent of $N$ for large $N$. To keep the notation compact, we leave summations over repeated indices implicit in this section (where no confusion can occur).

E.1 The origin of bias
In general the bias, i.e., the expected difference between some observable based on sample estimates and the "true" value of that observable, can be written as a series expansion in $1/M$, where $M$ is the number of trajectories [61]. To understand why this is so in the present context, we recall that any sample mean or sample covariance, $\bar Q_{ijk\ldots}$ (where $i$, $j$, $k$, etc. label sampling times), is an average (normalized sum) over the $M$ trajectories. The multivariate central limit theorem tells us that for large $M$ we can, for such averages, write $\bar Q_{ijk\ldots} = Q^*_{ijk\ldots} + \gamma_{ijk\ldots}/\sqrt{M}$, where $\gamma_{ijk\ldots}$ is a zero-mean "noise". Therefore any observable, $O$, which is a function of one or several such sample estimates (the optimal parameters $\hat\theta$ and their associated covariance matrix $\hat\Delta$, see previous sections, are examples of such observables) will (schematically) have a Taylor series expansion of the form
$$O = O^* + \sum_{k=1}^{\infty} \frac{A_k}{\sqrt{M}\, M^{k-1}} + \sum_{k=1}^{\infty} \frac{B_k}{M^k} \qquad (S73)$$
for large $M$. The first term in the Taylor expansion is the sought quantity, $O^*$. Considering the remaining terms, we note that, by construction, $\langle A_1 \rangle = 0$, and hence the first non-zero term in the expectation value of the expression above is $\langle B_1 \rangle / M \propto 1/M$. For the case that the observable, $O$, is a function of more than one independent sample estimate, we have $\langle A_k \rangle = 0$ for all $k$. Note, however, that if $O$ is a function of several sample estimates which are dependent, then in general $\langle A_k \rangle \neq 0$ for $k \geq 2$. We can safely remove the first bias term with a jackknife procedure [46], see section G. Higher order bias terms can also be removed formally; however, already at the second order bias reduction level the computational costs become considerable.

E.2 Bias in parameter estimation of CCM for linear fit functions
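The first-order $1/M$ bias term discussed in section E.1 can be removed in practice with the standard delete-one jackknife (the detailed form used in this work is given in section G). A generic sketch, assuming only that the estimator is a function mapping the set of trajectories to a number (the example estimator is our own illustration):

```python
import numpy as np

def jackknife_bias_reduced(data, estimate):
    """First-order jackknife: M * O(all data) - (M-1) * mean of delete-one estimates."""
    M = data.shape[0]
    full = estimate(data)
    loo = np.array([estimate(np.delete(data, m, axis=0)) for m in range(M)])
    return M * full - (M - 1) * loo.mean()

def plug_in(d):
    # Plug-in estimate of (variance)^2; as a nonlinear function of sample
    # estimates it carries an O(1/M) bias, cf. eq. (S73)
    return d.var() ** 2

rng = np.random.default_rng(6)
data = rng.normal(0.0, 1.0, size=(2000, 1))   # true (variance)^2 = 1
print(plug_in(data), jackknife_bias_reduced(data, plug_in))
```

The same wrapper can be applied directly to $\hat\phi_{ab}$ of eq. (4) in the main text, by letting `estimate` carry out the full fit on the retained trajectories.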
Consider equations (S1) and (S4). We write the sample estimator of the covariance matrix, eq. (S39), and the exact one, $Q^*$, as related by
$$\bar Q_{ij} = Q^*_{ij} + \eta_{ij}, \qquad (S74)$$
where $\eta$ represents their deviation. We seek the "noise" in the inverse, $(\bar Q^{-1})_{ij}$. Using the normalization condition, and writing
$$(\bar Q^{-1})_{ij} = (Q^{*-1})_{ij} + \xi_{ij}, \qquad (S75)$$
we get
$$I = \bar Q \bar Q^{-1} = (Q^* + \eta)(Q^{*-1} + \xi) = I + \eta Q^{*-1} + Q^* \xi + \eta \xi. \qquad (S76)$$
Thus, to first order, $\eta Q^{*-1} + Q^* \xi = 0$, and by the definition $\eta = \bar Q - Q^*$:
$$\xi = Q^{*-1} - Q^{*-1} \bar Q\, Q^{*-1}. \qquad (S77)$$
Using eq. (S75) in eqs. (S1) and (S4) yields
$$\hat\theta = \frac{\bar{\mathbf{y}}^T (Q^{*-1} + \xi) \mathbf{t}}{\mathbf{t}^T (Q^{*-1} + \xi) \mathbf{t}} \approx \frac{1}{\mathbf{t}^T Q^{*-1} \mathbf{t}} \left( \bar{\mathbf{y}}^T Q^{*-1} \mathbf{t} + \bar{\mathbf{y}}^T \xi \mathbf{t} - \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}}\, \mathbf{t}^T \xi \mathbf{t} \right), \qquad (S78)$$
where we made a series expansion to first order in $\xi$. Using eq. (S77) we get
$$\hat\theta = \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}} \underbrace{ -\, \frac{\bar{\mathbf{y}}^T Q^{*-1} \bar Q\, Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}} + \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2}\, \mathbf{t}^T Q^{*-1} \bar Q\, Q^{*-1} \mathbf{t} }_{\text{bias} \,=\, B}. \qquad (S79)$$
Note that the expectation value of the first term on the right hand side evaluates to $\theta^*$; hence the additional terms constitute the bias, whose expectation value, $\langle B \rangle$, we now seek. It is convenient to write eq. (S79) in component form (repeated indices are summed over), with $B = B_1 + B_2$, where
$$B_1 = -\frac{\bar y_k (Q^{*-1})_{ki} \bar Q_{ij} (Q^{*-1})_{jl} t_l}{\mathbf{t}^T Q^{*-1} \mathbf{t}}, \qquad (S80a)$$
$$B_2 = \frac{\bar y_i (Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} \bar Q_{ml} (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2} \qquad (S80b)$$
(the component form of the quantity appearing in the denominators above is $\mathbf{t}^T Q^{*-1} \mathbf{t} = t_p (Q^{*-1})_{pq} t_q$). We thus see that the expected bias, $\langle B \rangle$, is determined by expectation values of the form ($a$, $b$, $c$, ... label trajectories)
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \left\langle \sum_{a=1}^{M} y_k^{(a)} \left[ \sum_{b=1}^{M} y_i^{(b)} y_j^{(b)} - \frac{1}{M} \sum_{b=1}^{M} y_i^{(b)} \sum_{c=1}^{M} y_j^{(c)} \right] \right\rangle. \qquad (S81)$$

E.3 Bias in parameter estimation of CCM for BM
Let us now consider the expected bias for CCM fitting of BM, using the formal expressions of section E.2. We have
$$\langle y_k^{(a)} \rangle = \left\langle \left[ x_k^{(a)} - x^{(a)}(0) \right]^2 \right\rangle = \sigma^{*2}_k \equiv V^*_{kk}, \qquad (S82)$$
where in the last step we used eq. (S33). Also, $\langle x_i^{(a)} - x^{(a)}(0) \rangle = 0$, and since different realizations (trajectories) are independent we have
$$\langle x_i^{(a)} x_j^{(b)} \rangle = \delta_{a,b} V^*_{ij}. \qquad (S83)$$
Higher order moments can be calculated using Wick's theorem, eq. (S36) (for large $i$, $x_i^{(a)}$ is a sum of many small increments, and from the central limit theorem it follows that the $x_i^{(a)}$ are Gaussian). We have
$$\langle y_i^{(a)} y_j^{(b)} \rangle = \langle (x_i^{(a)})^2 (x_j^{(b)})^2 \rangle = \langle x_i^{(a)} x_i^{(a)} \rangle \langle x_j^{(b)} x_j^{(b)} \rangle + 2 \langle x_i^{(a)} x_j^{(b)} \rangle^2 = \sigma^{*2}_i \sigma^{*2}_j + 2 (V^*_{ij})^2 \delta_{a,b}. \qquad (S84)$$
In the same way, for higher order terms we get
$$\langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle = \langle x_k^{(a)} x_k^{(a)} x_i^{(b)} x_i^{(b)} x_j^{(c)} x_j^{(c)} \rangle = [\text{tedious enumeration of all pairings}] = \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 \sigma^{*2}_k (V^*_{ij})^2 \delta_{b,c} + 2 \sigma^{*2}_j (V^*_{ki})^2 \delta_{a,b} + 2 \sigma^{*2}_i (V^*_{kj})^2 \delta_{a,c} + 8 V^*_{ki} V^*_{kj} V^*_{ij} \delta_{a,b} \delta_{b,c} \delta_{a,c} \qquad (S85)$$
(no summation over repeated indices here). Eq. (S81) now becomes
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \Big[ \underbrace{\sum_{a,b} \langle y_k^{(a)} y_i^{(b)} y_j^{(b)} \rangle}_{U_1} - \frac{1}{M} \underbrace{\sum_{a,b,c} \langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle}_{U_2} \Big]. \qquad (S86)$$
Using eq. (S85) we get
$$U_1 = \sum_{a,b} \langle y_k^{(a)} y_i^{(b)} y_j^{(b)} \rangle = M^2 \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 M^2 \sigma^{*2}_k (V^*_{ij})^2 + 2 M \sigma^{*2}_j (V^*_{ki})^2 + 2 M \sigma^{*2}_i (V^*_{kj})^2 + 8 M V^*_{ki} V^*_{ij} V^*_{kj}, \qquad (S87a)$$
$$U_2 = \sum_{a,b,c} \langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle = M^3 \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 M^2 \left[ \sigma^{*2}_k (V^*_{ij})^2 + \sigma^{*2}_j (V^*_{ki})^2 + \sigma^{*2}_i (V^*_{kj})^2 \right] + 8 M V^*_{ki} V^*_{ij} V^*_{kj}. \qquad (S87b)$$
Combining eq. (S87) with eq. (S86) results in
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \left[ (2M^2 - 2M)\, \sigma^{*2}_k (V^*_{ij})^2 + (8M - 8)\, V^*_{ki} V^*_{ij} V^*_{kj} \right] = 2 \sigma^{*2}_k (V^*_{ij})^2 + \frac{8}{M} V^*_{ki} V^*_{ij} V^*_{kj}. \qquad (S88)$$
Using eq. (S88) in eq. (S80a) we find
$$\langle B_1 \rangle = -\frac{\sigma^{*2}_k (Q^{*-1})_{ki} \delta_{i,l} t_l + \frac{8}{M} V^*_{ki} V^*_{ij} V^*_{kj} (Q^{*-1})_{ki} (Q^{*-1})_{jl} t_l}{\mathbf{t}^T Q^{*-1} \mathbf{t}}, \qquad (S89)$$
where we used that $Q^*_{ij} (Q^{*-1})_{jl} = \delta_{i,l}$. Now consider $B_2$, eq. (S80b). We write eq. (S88) according to (see also eq. (S37))
$$\langle \bar y_i \bar Q_{ml} \rangle = \sigma^{*2}_i Q^*_{ml} + \frac{8}{M} V^*_{im} V^*_{ml} V^*_{li}. \qquad (S90)$$
Eq. (S80b) now becomes
$$\langle B_2 \rangle = \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} \left[ \sigma^{*2}_i Q^*_{ml} + \frac{8}{M} V^*_{im} V^*_{ml} V^*_{li} \right] (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2} = \frac{\sigma^{*2}_i (Q^{*-1})_{ik} t_k}{\mathbf{t}^T Q^{*-1} \mathbf{t}} + \frac{8}{M} \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} V^*_{im} V^*_{ml} V^*_{li} (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2}. \qquad (S91)$$
Combining $B_1$ and $B_2$ we arrive at an expression for the predicted first order bias (eq. (S79)) for the choice $R = R^{[\mathrm{CCM}]}$ (note the cancellation of the first terms):
$$\langle B \rangle = \frac{8}{M\, \mathbf{t}^T Q^{*-1} \mathbf{t}} \left( \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} V^*_{im} V^*_{ml} V^*_{li} (Q^{*-1})_{ln} t_n}{\mathbf{t}^T Q^{*-1} \mathbf{t}} - V^*_{ki} V^*_{ij} V^*_{jk} (Q^{*-1})_{ki} (Q^{*-1})_{jl} t_l \right), \qquad (S92)$$
which can be evaluated analytically. With this in mind we use eq. (S33), with $t_i = i\epsilon$, and eq. (S41), in eq. (S92). When evaluating the associated sums over repeated indices in eq. (S92), one uses
$$\min(i,j) = \begin{cases} i, & \text{if } i \leq j, \\ j, & \text{if } i > j, \end{cases} \qquad (S93)$$
and then splits the sums accordingly.
This splitting leads to sums of the form
\[ I(m,p) = \sum_{k=1}^{N} \frac{k^m}{(2k-1)^p}, \quad (S94) \]
where m and p are positive integers. These sums are rewritten according to
\[ I(m,p) = \frac{1}{2^m} \sum_{k=1}^{N} \frac{\left[ (2k-1) + 1 \right]^m}{(2k-1)^p} = \frac{1}{2^m} \sum_{q=0}^{m} \binom{m}{q} \sum_{k=1}^{N} (2k-1)^{q-p}, \quad (S95) \]
where we used the binomial theorem. The full calculation is tedious but straightforward. The final result is
\[ \langle B \rangle = \frac{D}{M} G(N), \quad (S96a) \]
\[ G(N) = \frac{b - a}{d}, \quad (S96b) \]
\[ a = N s_2 - s_1^2, \quad (S96c) \]
\[ b = \frac{1}{16} \left( 3 s_1 - s_3 \right), \quad (S96d) \]
\[ d = s_1 s_2, \quad (S96e) \]
\[ s_n = \sum_{k=1}^{N} \frac{1}{(2k-1)^n}. \quad (S96f) \]

E.3.1 Asymptotic expansion
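The large-N expansion below relies on the standard asymptotics of the harmonic sum, eq. (S98); as a quick numerical sanity check (plain Python, illustrative values of N):

```python
import math

# Check sum_{k=1}^N 1/k = gamma + ln N + 1/(2N) + O(1/N^2)  (eq. (S98)).
gamma = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(N):
    # partial sum of the harmonic series
    return sum(1.0 / k for k in range(1, N + 1))

for N in (10, 100, 1000):
    approx = gamma + math.log(N) + 1.0 / (2 * N)
    print(N, harmonic(N) - approx)   # residual shrinks like 1/N^2
```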
Let us now investigate eq. (S96) for large N. To that end, we write \( s_n \), defined above, according to
\[ s_n = \sum_{k=1}^{N} \frac{1}{(2k-1)^n} = \sum_{k=1}^{2N} \frac{1}{k^n} - \frac{1}{2^n} \sum_{k=1}^{N} \frac{1}{k^n}. \quad (S97) \]
In eq. (S96) there are three sums, \( s_1 \), \( s_2 \) and \( s_3 \). Out of these, only \( s_1 \) grows with N (\( s_2 \) and \( s_3 \) converge), and hence the contribution involving \( s_1 \) is the only one that needs to be kept for large N. From eq. (0.131) in Gradshteyn et al. [62] we have
\[ \sum_{k=1}^{N} \frac{1}{k} = \gamma + \ln N + \frac{1}{2N} + O\!\left( \frac{1}{N^2} \right), \quad (S98) \]
where \( \gamma \approx 0.5772 \) is the Euler-Mascheroni constant. Combining the result above with eq. (S97) and eq. (S96) we arrive at the asymptotic expression
\[ G(N) \approx - \frac{2N}{\ln N + \gamma + 2 \ln 2}, \quad (S99) \]
where we used \( \ln ab = \ln a + \ln b \). For large N, eq. (S99) is a good approximation to the exact bias, eq. (S96), see Supplementary Figure S3.

E.4 Bias in parameter estimation of WLS for BM

Let us now consider the second case, eq. (S5), of choosing R. According to eqs. (S3) and (S5) we have
\[ \hat{\theta} = \frac{ \bar{y}^T Q^{-1}_{\mathrm{new}} t }{ t^T Q^{-1}_{\mathrm{new}} t }, \quad (S100) \]
where
\[ Q_{\mathrm{new},ij} = Q_{ij} \delta_{i,j}, \quad (S101) \]
\[ Q^*_{\mathrm{new},ij} = Q^*_{ij} \delta_{i,j}, \quad (S102) \]
\[ (Q^{-1}_{\mathrm{new}})_{ij} = \delta_{i,j} / Q_{ij}, \quad (S103) \]
\[ (Q^{*-1}_{\mathrm{new}})_{ij} = \delta_{i,j} / Q^*_{ij}. \quad (S104) \]
The calculation starting from eq. (S77) and leading to eq. (S79) is identical to before; one just replaces Q by \( Q_{\mathrm{new}} \), and similarly for the exact quantities. Since the new matrices are diagonal, eq. (S80) becomes (we here reintroduce explicit sums for the sake of clarity)
\[ B_1 = - \frac{ \sum_k \bar{y}_k Q_{kk}\, t_k / (Q^*_{kk})^2 }{ \sum_q t_q^2 / Q^*_{qq} }, \quad (S105a) \]
\[ B_2 = \frac{ \sum_{j,k} \left[ \bar{y}_k t_k / Q^*_{kk} \right] t_j^2\, Q_{jj} / (Q^*_{jj})^2 }{ \left( \sum_q t_q^2 / Q^*_{qq} \right)^2 }. \quad (S105b) \]
Also the calculation from eq. (S81) leading up to eq. (S88) is identical. From eq. (S105) we see that we need
\[ \langle \bar{y}_k Q_{jj} \rangle = 2 \sigma^*_k (V^*_{jj})^2 + \frac{8}{M} (V^*_{kj})^2 V^*_{jj} \quad (j, k \text{ fixed}), \quad (S106a) \]
\[ \langle \bar{y}_k Q_{kk} \rangle = 2 (\sigma^*_k)^3 + \frac{8}{M} (\sigma^*_k)^3 \quad (k \text{ fixed}). \quad (S106b) \]
Substituting eq. (S106b) into eq. (S105a), and using eq. (S37), \( Q^*_{kk} = 2 (V^*_{kk})^2 = 2 (\sigma^*_k)^2 \), together with \( \sigma^*_k = 2 D t_k \), we get (with sums explicitly written)
\[ \langle B_1 \rangle = - \frac{ \sum_k \left[ 2 (\sigma^*_k)^3 + \frac{8}{M} (\sigma^*_k)^3 \right] t_k / (Q^*_{kk})^2 }{ \sum_q t_q^2 / Q^*_{qq} } = - \frac{ \left( 1 + \frac{4}{M} \right) \sum_k \frac{1}{4D} }{ \sum_k \frac{1}{8D^2} } = -2D \left( 1 + \frac{4}{M} \right). \quad (S107) \]
In much the same way, we insert eq. (S106a) into eq. (S105b):
\[ \langle B_2 \rangle = \frac{ \sum_{j,k} \left[ 2 \sigma^*_k (\sigma^*_j)^2 + \frac{8}{M} (V^*_{kj})^2 \sigma^*_j \right] t_k t_j^2 / \left[ Q^*_{kk} (Q^*_{jj})^2 \right] }{ \left( \sum_q t_q^2 / Q^*_{qq} \right)^2 } = \frac{ \sum_{j,k} \left[ \frac{1}{32 D^3} + \frac{ (V^*_{kj})^2 }{ 32 M D^5\, t_k t_j } \right] }{ N^2 / (64 D^4) } = 2D + \frac{2}{M D N^2} \underbrace{ \sum_{j=1}^{N} \sum_{k=1}^{N} \frac{ (V^*_{kj})^2 }{ t_k t_j } }_{I}. \quad (S108) \]
Consider the double sum, I, in eq. (S108). With time steps \( t_j = \epsilon j \) and \( V^*_{ij} = 2 D \epsilon \min(i,j) \), separating the sums into \( j = k \) and \( j \ne k \) gives
\[ I = 4 D^2 \sum_{k=1}^{N} \sum_{j=1}^{N} \frac{ [\min(j,k)]^2 }{ jk } = 4 D^2 \sum_{k=1}^{N} \left( 1 + \frac{2}{k} \sum_{j=1}^{k-1} j \right) = 4 D^2 \sum_{k=1}^{N} \left( 1 + \frac{2}{k} \frac{k(k-1)}{2} \right) = 4 D^2 \sum_{k=1}^{N} k = 2 D^2 N (N+1), \quad (S109) \]
which inserted into eq. (S108) yields
\[ \langle B_2 \rangle = 2D + \frac{4D}{M} \frac{N+1}{N}, \quad (S110) \]
from which, together with eq. (S107), we get the complete full bias:
\[ \langle B \rangle = \langle B_1 \rangle + \langle B_2 \rangle = \frac{4D}{M} \left( \frac{N+1}{N} - 2 \right). \quad (S111) \]
Thus,
\[ \hat{\theta} - \theta^* = - \frac{4D}{M} \left( 1 - \frac{1}{N} \right). \quad (S112) \]
Note that the bias is independent of N for large N.

E.5 Lack of bias for BMALS
We now consider our third and final choice of R-matrix for BM. Since \( \langle \bar{y}_i \rangle = y^*_i \) and R here is a true inverse covariance matrix (and hence not a sample estimate, see eq. (S6)), it follows immediately, by taking the expectation value of eq. (S3), that the BMALS parameter estimate is unbiased.

E.6 Lack of bias in parameter estimation of CCM for DHO
For the DHO problem we choose as our observable the particle position, i.e., we use \( y_k^{(m)} = x_k^{(m)} \), where m labels different trajectories. For a good fit, the DHO parameter estimates \( \hat{\theta} \) are unbiased for CCM. To see this, consider the CCM minimization criterion, eq. (S26), for DHO, which we write as
\[ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{-1} \left( f(\hat{\theta}) - y^* \right) = \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{-1} \left( \bar{y} - y^* \right). \quad (S113) \]
As in previous subsections, we then expand the inverse sample covariance matrix around its true value, i.e., we write \( Q^{-1} = Q^{*-1} + \xi \), where \( \xi \) is given in eq. (S77). By expanding the right-hand side of eq. (S113) in \( f(\hat{\theta}) - y^* \), \( \bar{y} - y^* \) and \( \xi \), we arrive at
\[ \underbrace{ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{*-1} \left( f(\hat{\theta}) - y^* \right) }_{F_a(\hat{\theta})} \quad (S114) \]
\[ = \underbrace{ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} \left[ \left( Q^{*-1} - Q^{*-1} Q Q^{*-1} \right) \left( f(\hat{\theta}) - y^* \right) + Q^{*-1} \left( \bar{y} - y^* \right) - \left( Q^{*-1} - Q^{*-1} Q Q^{*-1} \right) \left( \bar{y} - y^* \right) \right] }_{G_a(\hat{\theta})}. \quad (S115) \]
Since \( F_a(\hat{\theta}) \) involves only the true covariance matrix and \( y^* \), the solution to \( F_a(\hat{\theta}) = 0 \) yields the true parameter value, i.e., we have \( F_a(\theta^*) = 0 \). If the fit is good, we obtain the solution to eq. (S114) using a Taylor expansion, i.e., we write \( F_a(\hat{\theta}) \approx F_a(\theta^*) + \sum_b w_{ab} (\hat{\theta}_b - \theta^*_b) = \sum_b w_{ab} (\hat{\theta}_b - \theta^*_b) \), where \( w_{ab} = \partial F_a(\hat{\theta}) / \partial \hat{\theta}_b |_{\hat{\theta} = \theta^*} \). Inserting this into eq. (S114) and solving for \( \hat{\theta}_a \), we get
\[ \hat{\theta}_a = \theta^*_a + \underbrace{ \sum_b (w^{-1})_{ab}\, G_b(\hat{\theta}) }_{B_a}. \quad (S116) \]
Thus, the bias in the estimated parameter \( \hat{\theta}_a \) is determined by the expectation value of \( B_a \). An application of Wick's theorem for Gaussian variables yields
\[ \langle \bar{y}_k Q_{ij} \rangle = \langle \bar{y}_k \rangle Q^*_{ij}. \quad (S117) \]
This result is a direct consequence of the fact that the positions at different times for the DHO process are distributed according to a multivariate Gaussian. This Gaussianity, in turn, follows from the fact that the harmonic oscillator position is a linear function of the imposed Gaussian noise, see eq. (S42). Using eq. (S117) and the facts that \( \langle \bar{y}_k \rangle = y^*_k \) and \( \langle Q_{ij} \rangle = Q^*_{ij} \), we find that \( \langle G_a(\hat{\theta}) \rangle = 0 \) and thereby that indeed \( \langle B_a \rangle = 0 \), i.e., the CCM parameter estimate for DHO does not suffer from the bias problems discussed in the previous subsections.

F Approximate distribution for the estimated parameters
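The linearization argument of this section can be illustrated with a short simulation; a minimal sketch with an arbitrary smooth, hypothetical function F (all numbers are illustrative):

```python
import numpy as np

# Illustration: a smooth nonlinear function of a sample mean becomes
# approximately Gaussian as M grows (the linearization of eq. (S119)).
rng = np.random.default_rng(1)

def skewness_of_F(M, n_rep=20000):
    # ybar = y* + eta/sqrt(M), with y* = 2.0 and Var(eta) = 1 (illustrative)
    ybar = 2.0 + rng.normal(0.0, 1.0, size=n_rep) / np.sqrt(M)
    F = np.exp(ybar)                 # hypothetical nonlinear estimator F(ybar)
    c = F - F.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5  # sample skewness

print(skewness_of_F(10), skewness_of_F(1000))   # skewness shrinks as M grows
```

A Gaussian has zero skewness, so the shrinking sample skewness signals the approach to normality claimed in the text.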
In the main text we saw that if M (the number of trajectories) is large enough, the distribution of the estimated parameters is approximately Gaussian, see Figure 1 in the main text. To understand why this is so, we note that a set of random numbers \( \bar{y}_i \) (i = 1, ..., N) from the Gaussian distribution in eq. (S8) can be generated using
\[ \bar{y}_i = y^*_i + \frac{1}{\sqrt{M}} \eta_i, \quad (S118) \]
where \( \eta \) is a zero-mean Gaussian random vector with (M-independent) covariance matrix Q. Consider now a function \( F(\bar{y}) \), and note that the estimated parameters \( \hat{\theta}_a \) are functions of this type. We then Taylor-expand:
\[ F(\bar{y}) \approx F(y^*) + \frac{1}{\sqrt{M}} A \cdot \eta + O\!\left( \frac{1}{M} \right), \quad (S119) \]
where A is a matrix containing partial derivatives. Assuming that the second term on the right-hand side above is non-zero, that the matrix A has full rank, and that all terms of order 1/M or higher can be neglected, the distribution of F is another Gaussian. This follows from the fact that \( A \cdot \eta \) is normally distributed if \( \eta \) is drawn from a multivariate Gaussian [63].

G Jackknife bias reduction
Through data resampling, bias in data fitting can often be reduced. Let O be a parameter estimator based on a data set with M trajectories. The associated true parameter is denoted by O*. Here, we choose O as either the estimated parameters \( \hat{\theta} \), obtained by minimizing eq. (S1), or the associated covariance matrix \( \hat{\phi} \), eq. (4b) in the main text. As outlined in section E.1, one often expects such a finite data set to yield bias contributions of the form
\[ O = O^* + \frac{a}{M} + \frac{b}{M^2} + \frac{c}{M^3} + O\!\left( \frac{1}{M^4} \right). \quad (S120) \]
The bias terms can be reduced by increasing the number of data samples, M, or by using the jackknife method [46]. Let us split the sample into g groups, each of size h, and define \( O^{[-j]} \) as the parameter fitted to the data sample with the jth group removed.

G.1 First order jackknife bias reduction
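A minimal concrete illustration of the first order scheme, using the textbook example of the plug-in variance estimator, whose O(1/M) bias the jackknife removes exactly (here g = M groups of size h = 1; the data are illustrative):

```python
import numpy as np

# First order jackknife bias reduction, eq. (S121):
#   O^(1)   = (1/g) sum_j O^[-j]
#   O^(0,1) = g*O - (g-1)*O^(1)
rng = np.random.default_rng(2)
x = rng.normal(0.0, 3.0, size=50)

def plugin_var(z):
    # biased plug-in estimator; bias = -sigma^2/len(z), i.e. O(1/M)
    return np.mean((z - np.mean(z)) ** 2)

g = len(x)                                     # g groups of size h = 1
O_full = plugin_var(x)
O_loo = np.array([plugin_var(np.delete(x, j)) for j in range(g)])
O_corrected = g * O_full - (g - 1) * O_loo.mean()

print(O_full, O_corrected)   # corrected value equals the unbiased (ddof=1) variance
```

For this particular estimator the correction is exact for every sample, not just in expectation, which makes it a convenient unit test for a jackknife implementation.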
The first order bias term can be removed through repeated fitting and averaging over the sampled data set:
\[ O^{(1)} = \frac{1}{g} \sum_{j=1}^{g} O^{[-j]}, \quad (S121a) \]
\[ O^{(0,1),J} = g O - (g-1) O^{(1)}. \quad (S121b) \]
Using eq. (S120), which has bias terms proportional to \( 1/M = 1/(hg) \) for the full fit, O, and proportional to \( 1/[h(g-1)] \) for the reduced-sample estimators in eq. (S121a), we see that we are left with
\[ O^{(0,1),J} = O^* - \frac{b}{h^2 g (g-1)} - \frac{c}{h^3} \left( \frac{1}{(g-1)^2} - \frac{1}{g^2} \right) + O(g^{-4}) \approx O^* - \frac{b}{M^2} - \frac{2c}{M^3}, \quad (S122) \]
lacking the first order bias term. Although the higher order terms remain, their contribution is often smaller than that of the first order term.

G.2 Second order jackknife bias reduction
For further bias reduction we can apply a second order correction. In a similar spirit to the first order jackknife, we split the data into g groups, and define \( O^{[-j,-j']} \) as the parameter estimator based on a data set with the jth and j'th groups, each of size h, removed. Following Schucany et al. [64], we introduce
\[ O^{(2)} = \frac{2}{g(g-1)} \sum_{j < j'} O^{[-j,-j']}, \quad (S123a) \]
\[ O^{(1),J} = (g-1) O^{(1)} - (g-2) O^{(2)}, \quad (S123b) \]
\[ O^{(0,2),J} = \frac{g}{2} O^{(0,1),J} - \frac{g-2}{2} O^{(1),J}, \quad (S123c) \]
for which both the first and second order bias terms in eq. (S120) cancel, leaving
\[ O^{(0,2),J} = O^* + O(g^{-3}) \approx O^* + \frac{c}{M^3}. \quad (S124) \]

G.3 Variance for jackknife-bias-reduced estimators

In this section, we use eq. (S9) to show that \( \hat{\theta}_a - \theta^*_a \) is insensitive (to lowest order in 1/M) to the jackknifing procedure. As a consequence, the covariance estimation formula, eq. (4) in the main text, remains valid also for jackknifed parameter estimates. For later convenience, we denote the derivative in eq. (S9) by
\[ A_{a,i} = \frac{\partial \hat{\theta}_a}{\partial \bar{y}_i} \bigg|_{\bar{y} = y^*}, \quad (S125) \]
which we will use in the following.

G.3.1 First order jackknife bias reduction

To first order, the jackknife estimator is obtained by dividing the M trajectories into g groups of size h. Define the observable \( \bar{y}_{[-j],i} \) as the estimate for the observable, at point i, with group j removed. In particular,
\[ \bar{y}_{[-j],i} = \frac{1}{M-h} \sum_{m \ne m_j} y_i^{(m)} = \frac{1}{M-h} \left[ \sum_{m=1}^{M} y_i^{(m)} - \sum_{m_j} y_i^{(m_j)} \right]. \quad (S126) \]
The corresponding non-jackknifed estimator is
\[ \bar{y}_i = \frac{1}{M} \sum_{m=1}^{M} y_i^{(m)}. \quad (S127) \]
The bias of the first order jackknife estimator of \( \theta^*_a \) within the WLS-ICE method (see section A) is
\[ \theta^{(0,1),J}_a - \theta^*_a = g \hat{\theta}_a - \frac{g-1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a - \theta^*_a = \frac{1}{h} \left[ M \hat{\theta}_a - (M-h) \frac{1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a \right] - \theta^*_a \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ M (\bar{y}_i - y^*_i) - (M-h) \frac{1}{g} \sum_{j=1}^{g} \left( \bar{y}_{[-j],i} - y^*_i \right) \right] \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) - \frac{1}{g} \sum_{j=1}^{g} \left( \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) - \sum_{m_j} (y_i^{(m_j)} - y^*_i) \right) \right] \]
\[ = \sum_i A_{a,i} \frac{1}{gh} \sum_{j=1}^{g} \sum_{m_j} (y_i^{(m_j)} - y^*_i) = \sum_i A_{a,i} \frac{1}{M} \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) = \hat{\theta}_a - \theta^*_a, \quad (S128) \]
where we used eq. (S9) to get to the second and last rows, and eqs. (S126)-(S127) for the third row. Thus
\[ \theta^{(0,1),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a. \quad (S129) \]
Hence, jackknifing a parameter estimate does not change the (co)variance:
\[ \left( \theta^{(0,1),J}_a - \theta^*_a \right) \left( \theta^{(0,1),J}_b - \theta^*_b \right) = \left( \hat{\theta}_a - \theta^*_a \right) \left( \hat{\theta}_b - \theta^*_b \right). \quad (S130) \]

G.3.2 Second order jackknife bias reduction

For the second order bias removal, the M trajectories are again divided into g groups. We define, as before, \( \bar{y}_{[-j,-j'],i} \) as the estimate for the observable, at point i, with groups j and j' removed. In particular,
\[ \bar{y}_{[-j,-j'],i} = \frac{1}{M-2h} \sum_{m \ne m_j, m_{j'}} y_i^{(m)} = \frac{1}{M-2h} \left[ \sum_{m=1}^{M} y_i^{(m)} - \sum_{m_j} y_i^{(m_j)} - \sum_{m_{j'}} y_i^{(m_{j'})} \right]. \quad (S131) \]
The average over all groups for \( \theta_a \) is
\[ \theta^{(2)}_a = \frac{1}{g(g-1)} \sum_{j \ne j'} \theta^{[-j,-j']}_a. \quad (S132) \]
The second order jackknife is now (as given by eq. (S123c))
\[ \theta^{(0,2),J}_a = \frac{g}{2} \theta^{(0,1),J}_a - \frac{g-2}{2} \theta^{(1),J}_a. \quad (S133) \]
Using eq. (S123b) we note
\[ \theta^{(1),J}_a - \theta^*_a = \frac{1}{h} \left[ (M-h) \frac{1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a - (M-2h) \frac{1}{g(g-1)} \sum_{j \ne j'} \theta^{[-j,-j']}_a \right] - \theta^*_a \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ \left( 1 - \frac{1}{g} \right) - \left( 1 - \frac{2}{g} \right) \right] \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) = \sum_i A_{a,i} \frac{1}{M} \sum_{m=1}^{M} (y_i^{(m)} - y^*_i), \quad (S134) \]
where we again used eq. (S9) together with eqs. (S126)-(S127) and (S131). Thus
\[ \theta^{(1),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a, \quad (S135) \]
and, since the coefficients in eq. (S133) sum to one,
\[ \theta^{(0,2),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a. \quad (S136) \]
Thus the second order jackknife estimator has the same variance and covariance as the non-jackknifed estimator.

H Estimation of errors on estimated parameters, using jackknife and bootstrap procedures

H.1 Jackknife error estimation

In heuristic jackknife error estimation one makes use of the quantities \( O^{[-j]} \), see section G, and calculates [65, 47]
\[ \sigma_J^2 = \frac{g-1}{g} \sum_{j=1}^{g} \left[ O^{[-j]} - O^{(1)} \right]^2, \quad (S137) \]
where \( O^{(1)} \) is given in eq. (S121a). Then \( \sigma_J \) serves as an estimate of the error on the estimated parameter. Note that, in contrast to jackknife bias reduction, which is mathematically justified (based on the expected fluctuations around estimated mean values using the central limit theorem), there is in the general case no corresponding simple justification for the jackknife error estimation procedure for the present type of data.
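For the simple case where the estimator is a sample mean, with g = M groups of size h = 1, eq. (S137) reduces to the usual standard error; a minimal sketch with illustrative data:

```python
import numpy as np

# Jackknife error estimate, eq. (S137):
#   sigma_J^2 = (g-1)/g * sum_j ( O^[-j] - O^(1) )^2,  O^(1) = (1/g) sum_j O^[-j]
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=200)

g = len(x)                                                    # g groups of size h = 1
O_loo = np.array([np.delete(x, j).mean() for j in range(g)])  # estimator: sample mean
sigma_J = np.sqrt((g - 1) / g * np.sum((O_loo - O_loo.mean()) ** 2))

# For the sample mean the jackknife error equals the usual standard error exactly:
print(sigma_J, x.std(ddof=1) / np.sqrt(g))
```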
H.2 Bootstrap error estimation

In bootstrap error estimation, the scheme is:

• First, bootstrap [66, 9, 47] the original M trajectories, i.e., pick M trajectories from the original data with replacement (the same trajectory may be picked several times). Denote by \( (\tilde{y}_i^{(m)}, t_i) \) the associated observables and compute the synthetic mean value of the chosen observable, \( \bar{y}_i = M^{-1} \sum_m \tilde{y}_i^{(m)} \).

• Make a weighted least squares (WLS) fit to the synthetic MSDs with respect to the fitting parameters. This fitting yields parameters \( \tilde{\theta}_i \).

By repeating the two steps above many times (here, 100 times) we get a set of fit parameters \( \tilde{\theta}_i \) (i = 1, 2, ..., 100). From this set we simply compute the standard deviation as an estimator of the error on the fit parameters [66, 9].

I Coefficient of determination

We determine the goodness of fit by using the R² coefficient of determination, defined as
\[ R^2 = 1 - \frac{S_{\mathrm{res}}}{S_{\mathrm{tot}}}. \quad (S138) \]
The method is based on sums of squares over the N sampling points of, in our case, the mean positions or the MSD, \( \bar{y} \); hence, measuring the deviation from the sample mean in time,
\[ \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i, \quad (S139) \]
\[ S_{\mathrm{tot}} = \sum_{i=1}^{N} \left( \bar{y}_i - \bar{Y} \right)^2, \quad (S140) \]
\[ S_{\mathrm{res}} = \sum_{i=1}^{N} \left( f(t_i; \theta) - \bar{y}_i \right)^2. \quad (S141) \]
Heuristically, a model that fits the data perfectly has R² = 1, while a model that does not fit at all has R² ≪ 1, see Supplementary Figure S5.

J Settings in "Particle Tracker" plug-in

For detecting and linking particles into trajectories from the Supplementary movies S1, S5 and S6 of the study by Chenouard et al. [48], we used the ImageJ plug-in "Particle Tracker" [50] (November 2016 version) with the following settings:

• radius: 3
• cutoff: 3
• radius: 0.1
• LinkRange: 1 (default: 2)
• displacement: 10.00
• Dynamics: Brownian

and the following advanced options:

• Object features: 1.000
• dynamics: 1.000
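The bootstrap scheme of section H.2 can be sketched as follows, on synthetic Brownian-motion data; for brevity an unweighted least squares fit stands in for the WLS fit of the main text, and all parameter values are illustrative:

```python
import numpy as np

# Bootstrap error estimate for a fit parameter (scheme of section H.2):
# resample trajectories with replacement, rebuild the ensemble-averaged MSD,
# refit, and take the standard deviation of the refitted parameters.
rng = np.random.default_rng(4)
D, eps, N, M, n_boot = 0.5, 1.0, 20, 400, 100

t = eps * np.arange(1, N + 1)
x = np.cumsum(rng.normal(0.0, np.sqrt(2 * D * eps), size=(M, N)), axis=1)
y = x**2                                   # squared displacements, <y_k> = 2*D*t_k

def fit_theta(ybar):
    # least squares fit of ybar ~ theta * t (R = identity, i.e. unweighted)
    return (ybar @ t) / (t @ t)

theta_hat = fit_theta(y.mean(axis=0))      # fit to the full ensemble average
boot = np.array([fit_theta(y[rng.integers(0, M, size=M)].mean(axis=0))
                 for _ in range(n_boot)])  # resample trajectories, not time points
theta_err = boot.std(ddof=1)

print(theta_hat, theta_err)                # true value theta* = 2*D = 1.0
```

Resampling whole trajectories, rather than individual time points, is what preserves the temporal correlations that this supplement is concerned with.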