Fitting a function to time-dependent ensemble averaged data

Karl Fogelmark, Michael A. Lomholt, Anders Irbäck, and Tobias Ambjörnsson

Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, 223 62 Lund, Sweden
Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
* [email protected]

May 9, 2018

Abstract
Time-dependent ensemble averages, i.e., trajectory-based averages of some observable, are of importance in many fields of science. A crucial objective when interpreting such data is to fit these averages (for instance, squared displacements) with a function and extract parameters (such as diffusion constants). A commonly overlooked challenge in such function fitting procedures is that fluctuations around mean values, by construction, exhibit temporal correlations. We show that the only available general purpose function fitting methods, the correlated chi-square method and the weighted least squares method (which neglects correlation), fail at either robust parameter estimation or accurate error estimation. We remedy this by deriving a new closed-form error estimation formula for weighted least squares fitting. The new formula uses the full covariance matrix, i.e., rigorously includes temporal correlations, but is free of the robustness issues inherent to the correlated chi-square method. We demonstrate its accuracy in four examples of importance in many fields: Brownian motion, damped harmonic oscillation, fractional Brownian motion and continuous time random walks. We also successfully apply our method, weighted least squares including correlation in error estimation (WLS-ICE), to particle tracking data. The WLS-ICE method is applicable to arbitrary fit functions, and we provide publicly available WLS-ICE software.
Introduction
Time-dependent ensemble averages appear in several scientific fields. Examples include: particle tracking experiments where mean square displacements (MSD) are measured at different sampling times [1], human travel dynamics where dispersal distance as a function of time is measured [2], single-molecule pulling experiments [3], applications of fluctuation theorems [4] such as the Jarzynski equality [5], measurements of the time-dependence of donor-acceptor distance dynamics [6], tracer particle dynamics in complex systems [7] and correlation functions in spin systems and lattice gauge theories [8]. The final step when interpreting ensemble averages is often to fit a function to these averages in order to extract parameters.

Fitting a function to data is done so readily in science that one seldom considers the correctness of the standard go-to solution of the (linear and non-linear) weighted least squares (WLS) method [9, 10, 11]. One of the crucial implicit assumptions of the “standard” version of this method is that the fluctuations around mean values are independent. However, since for time-dependent ensemble averages the data is sampled along trajectories, this independence assumption is in general not satisfied when analyzing ensemble averages; heuristically, if in one trajectory an observable, such as the square displacement, was smaller than its ensemble averaged value at some time, it is typically still so at the next time step. For an illustrative example, see Figure S1 in Supplementary Information, which shows the time-evolution in simulations of fractional Brownian motion (FBM). Thus, the fluctuations around an ensemble averaged (time-dependent) observable will in general exhibit temporal correlations.
Herein, the term trajectory is used in its widest sense: an observable (such as squared displacement) is chosen, and a trajectory is then measurements of this observable at different consecutive sampling times.

The question now arises of how severe the consequences of neglecting the temporal correlations in least squares fitting are. We demonstrate that such neglect leads to unreliable error estimation for parameters and can in some cases lead to underestimated errors for fitted parameters (such as diffusion constants) by more than one order of magnitude for our prototype systems (see below). The unreliability of the estimated errors can have detrimental effects when statistically interpreting the data: the 1σ (2σ) rule for Gaussian statistics states that 68% (95%) of the observed data should (on average) fall within ±1σ (±2σ) from the estimated mean. For this rule to be meaningful one must have a correct estimator for the variance, σ², in estimated parameters.

To our knowledge, the only previous method for dealing fully with correlation in data for function fitting to ensemble averages is the correlated chi-square method (CCM) [12, 13]. This method is known to the lattice quantum chromodynamics community, but does not seem to have found widespread use. This could partly be because, while the method is mathematically sound, numerical robustness issues have been identified [14, 15]. We here carefully examine the CCM method and demonstrate that it in general only provides correct parameter estimation in a small region of the "phase space" (N, M), where N is the number of sampling times and M is the number of trajectories.
Thus, it appears that the CCM is of limited general purpose use for fitting of time-dependent ensemble averages to a model function.

Although the least squares and WLS methods are common techniques for parameter estimation from ensemble averages, alternative methods exist, e.g., for inferring parameters from trajectories for biological systems [16, 17, 18]. In particular, for Brownian motion (BM) an optimal estimator for the diffusion constant has recently been derived [19, 20, 21]. Bayesian methods [11, 22, 23, 24, 25, 26] have also been used for parameter estimation for certain classes of systems. In general, when they apply, these methods give more precise parameter estimates than the WLS method. However, these newer approaches require as input a full stochastic model of the process, and we refer to this type of approach as model matching methods. By a full stochastic model we here refer to a model from which (in principle) any probability or average of a measured observable can be calculated. A simple example is BM, where the time-evolution is described by a Langevin equation with a noise term for which the statistics is fully specified. In contrast, the WLS and CCM methods are parametric function fitting [27] type methods, which can be used even if a full stochastic model is not available to describe the data at hand. An example from single-particle tracking, where function fitting is useful, is if one wants to determine a power-law exponent for the scaling of the mean-square displacement with time. In this situation, a function fitting procedure such as WLS can be used without making any assumption about the underlying dynamics. Also, even if a full stochastic model is indeed available, it might be impractical to carry out a full model matching procedure.

In this article, we derive a mathematically rigorous expression for the variance and covariance of estimated parameters in WLS fitting.
Our new error estimation formula for fitted WLS parameters takes into account the temporal correlations which are intrinsic to ensemble averages based on trajectories. To avoid confusion, we term the “standard” WLS method [9, 10, 11] (i.e., weighted least squares neglecting correlation) WLS-ECE (Weighted Least Squares Excluding Correlation in Error estimation), whereas our new approach is referred to as WLS-ICE (Weighted Least Squares Including Correlation in Error estimation). In figures and discussion where we only consider parameter values and not the associated errors, we only use the term WLS. In contrast to the previous two methods (WLS-ECE and CCM), our new method has the desirable unique features of providing both (1) robust parameter estimates in the full phase space (N, M), with mean parameter values in agreement with theory for our prototype systems; and (2) error estimates that reproduce the observed spreads in our fitted parameters.

As prototype models we use BM, damped harmonic oscillation (DHO) in a heat bath, FBM and continuous time random walks (CTRW). These have been identified as important model systems in a wide range of settings. BM is of interest to many fields of science [28, 29, 30]. Variants of DHO appear in physics, engineering and chemistry [31]. FBM has been applied, for instance, to protein dynamics [6], in financial modeling [32], for analyzing climate time series [33], to describe tracer particle diffusion [7, 34] and for modeling earthquake phenomena [35]. Recent applications of CTRW [28, 36] include modeling of human travel patterns [2] and of molecular motions in cells and cell membranes [37, 34]. However, we point out that our model systems are merely convenient examples for illustrating our WLS-ICE function fitting procedure, which can be applied to arbitrary fit functions. Our four model systems provide ideal test beds for our method, because the functions to be fitted, the mean position and MSD, are known analytically for these systems.
Moreover, trajectories are fast to generate for these systems, which facilitates stringent testing of the fitting methods based on a relatively large number of trajectories.

We finally point out two restrictions on the scope of our study: First, we do not concern ourselves with the model selection problem [38, 11], i.e., how to choose the “best” model or “best” form for the fit function. Second, in single particle tracking (one of the application fields of our results), it is common to separate between time-averaged observables (such as the time-averaged MSD) and ensemble averaged observables [39, 40]. In certain cases, these averages are described by the same functional form, but this is not always so [40]. In this study our sole focus is on ensemble averaged observables.

Methods
In what follows, we provide a ready-to-use method, which is further motivated and detailed in Section A in Supplementary Information.
The WLS-ICE procedure
In experiments or simulations one records a set of trajectories, here indexed by m. The task at hand is to fit some functional form $f(t_i; \boldsymbol{\theta}) = f_i(\boldsymbol{\theta})$, with K free fitting parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_K)$, to some ensemble averaged observable $\bar{y}(t_i) = \bar{y}_i$ over the trajectories, i.e., to a sample mean of the form

$$\bar{y}_i = \frac{1}{M} \sum_{m=1}^{M} y_i^{(m)}, \qquad (1)$$

where the index i runs over the N sampling times $T_1, \ldots, T_N$ (with N ≥ K). Herein, we use bold symbols to denote vectors or matrices. For BM, FBM and CTRW (see Results), which are all zero mean processes, the observable used is the squared displacement, i.e., $y_i^{(m)} = |\boldsymbol{x}^{(m)}(T_i) - \boldsymbol{x}^{(m)}(0)|^2$, where $\boldsymbol{x}^{(m)}(t)$ is the position (a vector with d components, where d is the number of spatial dimensions) at process time t for trajectory m, and the start time for the simulation/experiment is t = 0. For DHO, our non-zero-mean prototype process, we instead use the position directly as the relevant observable, $y_i^{(m)} = x^{(m)}(T_i)$. It is important to point out, however, that in the fitting procedure the quantity $y_i^{(m)}$ can be any observable for trajectory m at sampling time $T_i$. We shall consistently use a 'bar' to denote a sample estimator (we only make use of sample means and sample covariances). The challenge in function fitting procedures [10] is to fit some function $f_i(\boldsymbol{\theta})$ to the data $\bar{y}_i$ and thereby extract the model parameters, $\boldsymbol{\theta}$. This problem has previously been tackled using the WLS-ECE or CCM methods (reviewed in Section B in Supplementary Information).

Our approach, the WLS-ICE method, extends the WLS-ECE procedure with a correct error estimation formula which takes correlations in fluctuations around ensemble averages into account (see Introduction). For completeness and ease of application, we here provide the full details of the proposed WLS-ICE fitting procedure.
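As a concrete illustration, the sample mean of equation (1) is simply a per-time-point average over trajectories. The following sketch (our own array-shape convention, not part of the published software) computes $\bar{y}_i$ from an M × N array of observables:

```python
import numpy as np

def ensemble_average(y):
    """Sample mean over trajectories, equation (1).

    y : array of shape (M, N), the observable y_i^(m) for M trajectories
        recorded at N common sampling times T_1, ..., T_N.
    Returns an array of shape (N,): the ensemble average at each time.
    """
    return np.asarray(y, dtype=float).mean(axis=0)

# Toy example: "squared displacements" of M = 4 trajectories at N = 3 times
y = np.array([[1.0, 2.0, 4.0],
              [3.0, 2.0, 2.0],
              [1.0, 4.0, 6.0],
              [3.0, 4.0, 4.0]])
print(ensemble_average(y))  # -> [2. 3. 4.]
```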
We start by introducing a cost function, $\chi^2$, based on the difference between the sample average and the fit function, $\Lambda_i = f_i(\boldsymbol{\theta}) - \bar{y}_i$, for all time points, according to

$$\chi^2 = \boldsymbol{\Lambda}^T \boldsymbol{R} \boldsymbol{\Lambda}, \qquad (2)$$

where $\boldsymbol{R}$ is a symmetric positive definite matrix. This cost function is to be minimized with respect to $\boldsymbol{\theta}$ in order to determine the best parameter values, $\hat{\theta}_a$ (a = 1, ..., K) [41]. We use a 'hat' to denote parameters which have been estimated through minimization of the $\chi^2$ cost function above, and for the estimated (co)variance of such parameters. In the WLS method one uses the weights $R_{ij} = \delta_{i,j}/\bar{C}_{ij}$, where $\delta_{i,j}$ is the Kronecker delta, and the (unbiased) sample “covariance matrix of the mean” is defined as $\bar{C}_{ij} = \bar{Q}_{ij}/M$, with $\bar{\boldsymbol{Q}}$ being the sample covariance matrix

$$\bar{Q}_{ij} = \frac{1}{M-1} \sum_{m=1}^{M} (y_i^{(m)} - \bar{y}_i)(y_j^{(m)} - \bar{y}_j). \qquad (3)$$

While this specific choice of $\boldsymbol{R}$ is used in our applications, we note that the results in this section, including the new error formula below, are valid for arbitrary choices of $\boldsymbol{R}$. In Section A in Supplementary Information we elaborate on one "non-conventional" choice of $\boldsymbol{R}$ particularly adapted for BM.

The parameters $\hat{\theta}_a$ obtained by minimizing $\chi^2$ in equation (2) have a (co)variance $\Delta_{ab} = \langle (\hat{\theta}_a - \theta_a^*)(\hat{\theta}_b - \theta_b^*) \rangle$, where $\langle \ldots \rangle$ denotes ensemble average. Throughout this study we use a 'star' to denote exact parameter values, i.e., estimated values as $M \to \infty$. The variances of the fitted parameters are $\sigma_a^2 = \Delta_{aa}$. As noted in the Introduction, this covariance depends on the temporal correlations. For a stationary process, it is well known how to estimate the variance of a mean in the presence of temporal correlations, typically by expressing the variance in terms of the sum or integral of the auto-correlation function [42, 43].
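Equations (2) and (3) translate directly into code. A minimal sketch with our own helper names (the published software at the URL in Code availability is the reference implementation):

```python
import numpy as np

def sample_covariance(y):
    """Unbiased sample covariance matrix Q of equation (3); y has shape (M, N)."""
    M = y.shape[0]
    d = y - y.mean(axis=0)            # fluctuations around the ensemble average
    return d.T @ d / (M - 1)

def wls_weights(y):
    """Diagonal WLS weight matrix R_ij = delta_ij / C_ii, with C = Q / M."""
    C = sample_covariance(y) / y.shape[0]
    return np.diag(1.0 / np.diag(C))

def chi2(theta, f, t, ybar, R):
    """Cost function chi^2 = Lambda^T R Lambda of equation (2)."""
    lam = f(t, theta) - ybar
    return lam @ R @ lam

# Small check of equation (3) on toy data
y = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
Q = sample_covariance(y)              # -> [[1., 2.], [2., 4.]]
```

The resulting `chi2` can be handed to any numerical minimizer (e.g., `scipy.optimize.minimize`) to obtain $\hat{\boldsymbol{\theta}}$.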
In the present context, such an estimation corresponds to fitting to a constant, $f_i(\theta) = \theta$, and assuming all correlation functions depend only on time differences.

We here extend the above-mentioned results to non-stationary processes and arbitrary fit functions by deriving the analogous expression for $\hat{\Delta}_{ab}$ using the full multivariate probability density for the fluctuations around mean values. Briefly, the covariance for the estimated parameters is defined as $\hat{\Delta}_{ab} = \langle (\hat{\theta}_a - \theta_a^*)(\hat{\theta}_b - \theta_b^*) \rangle$, where $\langle F(\bar{\boldsymbol{y}}) \rangle = \int F(\bar{\boldsymbol{y}})\, \rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*)\, d\bar{y}_1 d\bar{y}_2 \cdots d\bar{y}_N$ denotes an average over the multivariate probability density $\rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*)$. We note that the dependence of the estimated parameters $\hat{\boldsymbol{\theta}}$ on $\bar{\boldsymbol{y}}$ is implicitly determined by the minimization condition $\partial \chi^2/\partial \theta_a = 0$. Now, because all $\bar{y}_i$ are averages over M identically distributed random numbers, for large M it immediately follows from the multivariate central limit theorem that the function $\rho$ takes the Gaussian form $\rho(\bar{\boldsymbol{y}}; \boldsymbol{\theta}^*) = Z^{-1} \exp\!\left(-(\bar{\boldsymbol{y}} - \boldsymbol{y}^*)^T \boldsymbol{C}^{*-1} (\bar{\boldsymbol{y}} - \boldsymbol{y}^*)/2\right)$ with normalization constant $Z = (2\pi)^{N/2} \sqrt{\det(\boldsymbol{C}^*)}$ [44]. Two complications that occur in evaluating $\hat{\Delta}_{ab}$ in closed form are that the $\bar{\boldsymbol{y}}$-dependence of $\hat{\boldsymbol{\theta}}$ is implicit, and, in general, non-linear. Both of these challenges are solved by making a Taylor series expansion of $\hat{\theta}_a - \theta_a^*$ in terms of $\bar{\boldsymbol{y}} - \boldsymbol{y}^*$ and implicitly using the minimization condition. The full derivation is given in Section A in Supplementary Information.
The final result is the following estimator:

$$\hat{\Delta}_{ab} = \frac{\hat{\phi}_{ab}}{M}, \qquad (4a)$$

$$\hat{\phi}_{ab} = 4 \sum_{c,d} \sum_{i,j} (\hat{h}^{-1})_{ac} \left.\frac{\partial f_i(\boldsymbol{\theta})}{\partial \theta_c}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} (\boldsymbol{R}^T \bar{\boldsymbol{Q}} \boldsymbol{R})_{ij} \left.\frac{\partial f_j(\boldsymbol{\theta})}{\partial \theta_d}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} (\hat{h}^{-1})_{db}, \qquad (4b)$$

and

$$\hat{h}_{ab} = 2 \sum_{i,j} \left.\frac{\partial^2 f_i(\boldsymbol{\theta})}{\partial \theta_a \partial \theta_b}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} R_{ij} \Lambda_j + 2 \sum_{i,j} \left.\frac{\partial f_i(\boldsymbol{\theta})}{\partial \theta_a}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} R_{ij} \left.\frac{\partial f_j(\boldsymbol{\theta})}{\partial \theta_b}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}}, \qquad (4c)$$

where the indices a, b = 1, ..., K. Equation (4) gives a mathematically rigorous expression (to lowest order in 1/M) for the covariance of the estimated parameters, and is our key result. It allows us to accurately estimate the covariance of any parameter fitted by minimizing the cost function in equation (2). Notice that the correlations in fluctuations around mean values enter through the quantity $\bar{\boldsymbol{Q}}$, which is estimated using the usual sample estimate above. In practice, our general formula, equation (4), is simple to implement and computationally fast.

The new error estimation formula, equation (4), reduces to previously known results in specific limits. (i) Neglecting the off-diagonal elements of $\bar{\boldsymbol{Q}}$ above, we recover the WLS-ECE error estimation formula [9]. (ii) By setting $\boldsymbol{R} = \bar{\boldsymbol{C}}^{-1}$ above, we recover the covariance estimation formula for CCM [12, 10]. (iii) For a stationary process one seeks to fit a constant, $f_i(\theta) = \theta$, to the data. For such a case, the minimization procedure (solving $\partial \chi^2/\partial \theta = 0$ with $R_{ij} = (1/\sigma^2)\delta_{i,j}$, where $\sigma^2$ is the time-independent variance) yields $\hat{\theta} = (1/N)\sum_i \bar{y}_i$, i.e., the parameter estimate is the mean of the data. The error estimation equation (4) then reduces to the usual result [42, 43], $\hat{\Delta} = (1/M)\sum_{i,j} \bar{Q}_{ij}/N^2$, used, for instance, in analyzing Monte Carlo and molecular dynamics simulations.
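Equation (4) is straightforward to implement once the gradient and Hessian of the fit function at $\hat{\boldsymbol{\theta}}$ are available. A self-contained sketch with our own function and argument names (the published WLS-ICE software is the reference implementation):

```python
import numpy as np

def wlsice_covariance(grad_f, hess_f, lam, R, Q, M):
    """Covariance of fitted parameters, equation (4).

    grad_f : (N, K) array of df_i/dtheta_a evaluated at theta-hat
    hess_f : (N, K, K) array of d^2 f_i/(dtheta_a dtheta_b) at theta-hat
    lam    : (N,) residuals Lambda_i = f_i(theta-hat) - ybar_i
    R, Q   : (N, N) weight matrix and sample covariance matrix, eq. (3)
    M      : number of trajectories
    """
    # h_ab of equation (4c)
    h = (2.0 * np.einsum('iab,ij,j->ab', hess_f, R, lam)
         + 2.0 * np.einsum('ia,ij,jb->ab', grad_f, R, grad_f))
    hinv = np.linalg.inv(h)
    # phi_ab of equation (4b); Delta-hat = phi-hat / M is equation (4a)
    phi = 4.0 * hinv @ grad_f.T @ (R.T @ Q @ R) @ grad_f @ hinv
    return phi / M
```

For the linear one-parameter fit $f_i = \theta t_i$ the Hessian vanishes and the expression collapses to a scalar, which provides a convenient sanity check.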
(iv) For linear fit functions, $f_i(\theta) = \theta t_i$, equation (4) reduces to previously known expressions (equation 5.253 in van den Bos [10]).

Validation procedure
We tested the different fitting procedures on simulation data for our four prototype systems (generated as described in Section D in Supplementary Information). Estimated parameters, $\hat{\theta}_a$, were compared to their known exact values $\theta_a^*$ (see Section C in Supplementary Information). For BM, the MSD behaves as $\langle [x(t) - x(0)]^2 \rangle = f(\theta, t) = \theta t$. The corresponding expression for FBM and CTRW is $\langle [x(t) - x(0)]^2 \rangle = f(\boldsymbol{\theta}, t) = \theta_1 t^{\theta_2}$. For DHO (at critical damping and with the initial conditions $x(0) = x_0$ and $v(0) = 0$), the mean position has the form $\langle x(t) \rangle = f(\theta, t) = x_0 (1 + \theta t) \exp(-\theta t)$.

For validating the WLS-ICE estimator for $\Delta_{ab}$, we generated S simulation sets (with S = 500), each consisting of M trajectories. Using these S × M trajectories, we obtained S parameter estimates $\hat{\theta}_a$. From these S estimates we calculated the covariance $\Delta_{ab}$ (using sample estimators), which then serves as the true $\Delta_{ab}$ (“ground truth”). This true $\Delta_{ab}$ is then compared to estimates based on the WLS-ICE error formula, equation (4) (which requires only one set of simulations), and the corresponding error estimates for WLS-ECE and CCM.

Code availability
Computer codes (Python, Octave/Matlab, and Lisp) which perform the associated fitting (determining $\hat{\theta}_a$) and error estimation (calculating $\hat{\Delta}_{ab}$), using a set of measured observables for different trajectories and at different times as input, are freely available under the GNU General Public License (GPL) [45] at http://cbbp.thep.lu.se/activities/wlsice/.

Results
Our first test of the fitting methods involves comparing histograms of fitted parameters for our four prototype systems (the number of trajectories, M, and the number of sampling times, N, were kept fixed). For both CCM and WLS, the S fitted values of a given parameter were binned into a histogram, see Fig. 1, and compared to a Gaussian centered on the mean of the estimated parameters, with a variance from the average of the error estimates using either the WLS-ECE or WLS-ICE method. For WLS, the histogram of fitted parameters is centered close to the true value (see also Figure S3 in Supplementary Information). However, only the WLS-ICE method gives a correct error estimation, equation (4), as the predicted width of the WLS-ECE method, see Section B in Supplementary Information, is much too narrow. Clearly, the new error estimation of the WLS-ICE method performs extremely well. By contrast, the WLS-ECE method does not provide correct errors of the estimated parameters; this result extends beyond the chosen parameters for (N, M) in Fig. 1, and holds true under rather general conditions, see Fig. 2 (the exception is the prefactor for CTRW for very small M). Notice that while the parameters from the WLS-ICE and WLS-ECE methods are centered on the analytical prediction, this is not true for parameters from the CCM method, which show a strong bias (Fig. 1) for BM, FBM and CTRW (not for DHO). Thus, WLS-ICE is the only method which yields an acceptable bias and correct error estimation for all model systems. Note that for the ensemble size used in Figure S2 in Supplementary Information, the distribution of fitted parameters is well described by a Gaussian; see Section F in Supplementary Information for a discussion on this topic. For a smaller ensemble size there are deviations from a Gaussian distribution, see Figure S2 in Supplementary Information, in particular for the prefactor for CTRW. From Fig.
2 we notice that the variance in the estimated parameter does not approach zero as N → ∞. Hence, the only way to decrease the variance in estimated parameters further is to increase M (the WLS estimator is consistent with respect to M).

Figure 1: Histograms of fitted parameters for two WLS methods and CCM compared to theoretical predictions.
Each method is tested on: (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). In each test, we generate S = 500 data sets, each consisting of M = 1000 trajectories sampled at N = 75 time points (histograms). Panel (a) shows the MSD prefactor (proportional to the diffusion constant) for BM, panel (b) shows the DHO fitting parameter θ, while in panels (c–f) the left and right panels show the MSD prefactor θ₁ and the exponent θ₂, respectively. For comparing WLS-ICE and WLS-ECE to the histograms based on the S data sets, we place Gaussian functions with their center positions at the mean of the WLS-fitted parameters. The widths of the Gaussians correspond to the parameter uncertainty estimated by the fit method (averaged over the S fits). The CCM fits for BM and CTRW exhibit a strong bias in the parameter value (not centered on the analytical prediction), and the WLS-ECE fit gives an error estimation, see Section B in Supplementary Information, that is much too small. The new WLS-ICE procedure (Methods) works well, i.e., exhibits negligible bias for all model systems and yields correct error estimation, equation (4). The rather large number of trajectories (M = 1000) was used in order to avoid ill-conditioning and major bias issues for the CCM fitting, compare to Fig. 3. Results for a smaller ensemble size are found in Figure S2 in Supplementary Information, where we see that also for FBM there can be pronounced bias effects for CCM fitting. For simulation parameters, see Section D.5 in Supplementary Information.

Figure 2: Error estimation.
Standard deviation from the WLS-ECE and WLS-ICE parameter fits as a function of the number of sampling points, N, used in the fitting procedure (log scale on the horizontal axis for visibility). Each method is applied to S = 500 realizations of data from (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). In conjunction we show the true standard deviation of each of these methods, computed from the parameters from the fit (lines), i.e., the width seen in Fig. 1, but for an extended range of N. It is evident that the standard deviation from the WLS-ECE fit is far too small for almost all N. Error bars show standard error of the mean. For panels a–d there are small biases for M = 20 and M = 80 in the observable $\hat{\sigma}$, as compared to the actual standard deviation. These biases can be removed using the jackknife procedure applied to equation (4b), see Section G in Supplementary Information. For panel e, M = 20, there is a discrepancy between the WLS-ICE estimate $\hat{\sigma}$ and the actual standard deviation; we assign this to slow convergence towards the asymptotic form of the multivariate distribution ρ (see Methods) for CTRW (see also Figure S2 in Supplementary Information). For simulation parameters, see Section D.5 in Supplementary Information.

As we have seen (Fig. 1), the CCM method gives a pronounced bias in the parameter estimate for a specific choice of the number of sampling times N and trajectories M for the BM, FBM and CTRW systems, but not for DHO. In order to understand the generality of these findings, we numerically quantified the bias for an extended range of (N, M) values, and find that the pronounced bias for BM, FBM and CTRW (and lack of bias for DHO) is rather general, see Figure S3 in Supplementary Information. In Section E in Supplementary Information we investigate the expected bias for the CCM method further by analytical means.
Indeed, we find that the parameter estimate from CCM fitting is unbiased for DHO. Mathematically, this result follows from the fact that the observable (mean position) used for the fitting is a linear function of the noise (this is in contrast to BM, FBM and CTRW, where the squared displacements are used as relevant observables). For BM, our analytical calculation in Section E in Supplementary Information shows that for large N the bias for CCM fitting becomes $\langle \hat{\theta} \rangle = \theta^* + DG(N)/M$, where $G(N) \approx -N/(\ln N + \gamma + 2\ln 2)$ and $\gamma \approx 0.577$ is the Euler-Mascheroni constant. Thus, with increasing number of sampling points N, the bias increases as $N/\ln N$ (see Figure S3 in Supplementary Information). The bias for CCM appears also in the FBM and CTRW systems, as seen in Fig. 1 and Figure S3 in Supplementary Information. A similar calculation for the WLS parameter estimate, see Section E in Supplementary Information, yields only a minor, essentially N-independent, bias for BM.

In order to further investigate the practical implications of the pronounced bias for CCM fitting, as well as other known issues with the CCM method [14, 15], we quantified in which parts of the phase space (N, M) the CCM and WLS-ICE fitting procedures provide “acceptable” (see below) parameter estimation, see Fig. 3. First, we find that for large N and moderate to small M, the sample estimate for the covariance matrix $\bar{\boldsymbol{C}}$ is ill-conditioned (the condition number is larger than the machine precision). In practice this means that it cannot be numerically inverted, as required in the CCM parameter estimation procedure, without uncontrollable numerical errors. Second, for parts of phase space where ill-conditioning is not an issue, we, rather generously, defined an acceptable fit as one where the bias is smaller than 10% (compared to the analytic value, $\theta_a^*$). We find that for BM, FBM and CTRW there is indeed a thin region of the (N, M) phase space (large M and small N) where CCM works.
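A numerical bias quantification of the kind used above (generate S independent data sets, fit each, and compare the mean fitted value with the exact one) can be sketched in a few lines. The simulation parameters below are our own toy choices, and the fit is a closed-form one-parameter WLS fit to the BM MSD, not the general minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def bm_squared_displacements(M, T, D=0.5):
    """Squared displacements of M one-dimensional BM trajectories at times T.

    Steps have variance 2*D*dt, so the exact MSD is theta* * t with
    theta* = 2*D = 1 for the chosen D.
    """
    dt = np.diff(T, prepend=0.0)
    steps = rng.normal(0.0, 1.0, size=(M, len(T))) * np.sqrt(2.0 * D * dt)
    return np.cumsum(steps, axis=1) ** 2

def wls_prefactor(y, T):
    """Closed-form WLS fit of f(t) = theta * t to the ensemble-averaged MSD."""
    ybar = y.mean(axis=0)
    C_diag = y.var(axis=0, ddof=1) / y.shape[0]   # diagonal of C = Q / M
    w = 1.0 / C_diag
    return np.sum(w * T * ybar) / np.sum(w * T**2)

S, M = 200, 50
T = np.arange(1.0, 11.0)
estimates = [wls_prefactor(bm_squared_displacements(M, T), T) for _ in range(S)]
bias = np.mean(estimates) - 1.0       # exact prefactor is theta* = 1
```

Repeating this over a grid of (N, M) values, and swapping in the CCM minimizer, reproduces the kind of phase-space map shown in Fig. 3.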
For DHO, the bias effect is negligible, as previously noted. However, the ill-conditioning issue is as pronounced for DHO as for BM, FBM and CTRW. In contrast, for WLS ill-conditioning is not a problem (no matrix inversion is required in this procedure), and the bias in the parameter estimation is acceptable for most parts of the phase space.

The bias inherent in the CCM method (for observables which are not linear functions of the noise, i.e., the MSD for BM, FBM and CTRW) can be reduced by applying the common jackknife procedure [46], which removes bias terms proportional to 1/M, see Section G in Supplementary Information. By applying the (first-order) jackknife procedure to BM, FBM and CTRW (Fig. 3), we find that the bias is reduced, which somewhat expands the region of the phase space where the CCM method may be used reliably. Note that the computational time is a factor g (i.e., the number of groups into which the trajectories are pooled) larger for the first-order jackknife procedure compared to the non-jackknife case. Finally, the jackknifing procedure can be extended to remove higher order bias terms (proportional to $1/M^n$, with n = 2, 3, ...) [46]. However, for the present case there is no guarantee that these higher order terms have this functional form with respect to M, see Section E in Supplementary Information. Also, our results show that the second-order jackknife increased, rather than decreased, the bias in the parameter estimations for most parts of the phase spaces (Fig. 3). For BM, Figure S4 in Supplementary Information indicates that the reason for this is that the third order term (the term proportional to $1/M^3$) is generally larger in amplitude (but of opposite sign) than the second order one. Higher order bias reduction comes at a computational price, since the number of numerical evaluations required for the second order jackknife is g(g+1)/2 times that of non-jackknifed parameter estimation.
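The first-order (delete-one-group) jackknife mentioned above has a compact generic form. A sketch with our own grouping convention; any estimator mapping a trajectory array to a scalar can be plugged in:

```python
import numpy as np

def jackknife_first_order(estimator, y, g):
    """First-order jackknife, removing the bias term proportional to 1/M.

    estimator : function mapping a (m, N) trajectory array to a scalar
    y         : (M, N) array of per-trajectory observables
    g         : number of groups the trajectories are pooled into
    """
    groups = np.array_split(np.arange(y.shape[0]), g)
    full = estimator(y)
    # estimates with one group of trajectories left out at a time
    loo = np.array([estimator(np.delete(y, idx, axis=0)) for idx in groups])
    return g * full - (g - 1) * loo.mean()

# Toy exercise of the formula: mean(y)^2 is a biased estimator of mu^2
# (its sampling bias is ~ sigma^2/M); here we just check the arithmetic.
y = np.arange(12.0).reshape(12, 1)
corrected = jackknife_first_order(lambda a: a.mean() ** 2, y, g=12)
```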
Due to these findings and the lack of a formal functional form for the bias beyond the 1/M term (see above), we do not recommend applying the jackknife procedure beyond first order. Finally, we point out that the new error estimation formula, equation (4), remains valid also for jackknifed parameters, see Section G in Supplementary Information.

In Figure S5 in Supplementary Information we investigate the "goodness of fit" for the WLS and CCM procedures using a standard R² metric (see Section I in Supplementary Information). Examples of fitted curves are found in Figure S6 in Supplementary Information. A good fit is characterized by R² ≈ 1. We find that, in this sense, the new method provides "good" fits. In contrast, the CCM method provides "bad" fits for BM, FBM and CTRW, with R² ≪ 1 for large N.

Figure 3: Phase space of reliable parameter estimation for CCM and WLS. For each of our example systems, (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW), we investigate for which number of sampling times N, and number of trajectory realizations M, the fitting is more than 10% off from its analytical value, averaged over S = 500 simulations. As indicated, CCM is only reliable in a limited region (large M, small N), which can be extended by a first order jackknife correction. For BM we also include the boundary at which the analytically predicted first order bias term for CCM, G(N), see Section E in Supplementary Information, gives a bias that is 10% of the true parameter value. We also show the boundary beyond which more than half of the S generated covariance matrices become ill-conditioned. Interestingly, for the CCM a second order jackknife generally does more harm than good compared to the first order, which we elaborate on in Figure S4 in Supplementary Information. In contrast to CCM (non-jackknifed), the parameter estimations for the WLS method are acceptable for most (N, M) (the region above the green curve), and can be extended even further using a jackknife approach (data not shown). For simulation parameters, see Section D.5 in Supplementary Information.
We point out that for the present type of data, R² is only a heuristic goodness-of-fit metric; its distributional properties are not known for general fit functions and correlated data.

When computational times are not a concern, error estimation using bootstrap resampling (or the related jackknife error estimation procedure) is a common method (see Section H in Supplementary Information) [47]. We here find that bootstrap resampling performs in general as well as WLS-ICE for our four models (jackknife error estimation is slightly worse), see Figure S7 in Supplementary Information. Thus, our numerical results indicate that for the type of observables and fit functions used in our model systems, the bootstrap can be used for calculating the variance of parameters estimated through $\chi^2$ minimization. However, we point out that such resampling techniques require us to repeat the $\chi^2$ minimization several (herein, 100) times (the WLS-ICE method requires only one $\chi^2$ minimization). Such minimization can be computationally costly, especially when the number of unknown parameters is large. Moreover, one must bear in mind that the bootstrap method is in general a heuristic method (there are cases where it does not apply [47]).

As a final alternative to the WLS-ICE method, we now briefly turn to error estimation using subsampling [43]. Subsampling refers to the method of choosing sampling times sufficiently sparsely in order to make the data points essentially uncorrelated (the “brute force” method in Figure S1 in Supplementary Information is an extreme case of subsampling where only one data point per trajectory is kept). After subsampling, error analysis is performed using standard error analysis for independent data. In order to properly choose N within this method, N is systematically decreased until the variance saturates to a constant, which is assumed to be the true variance [43].
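The subsampling strategy just described is easy to express in code: keep every k-th sampling time, then apply standard independent-data error analysis. A sketch for a one-parameter linear fit (our own helper; the simulated data below are deliberately uncorrelated and serve purely to exercise it):

```python
import numpy as np

def subsampled_fit(y, T, keep_every):
    """WLS fit of f(t) = theta * t after decimating the sampling times.

    Retains every `keep_every`-th time point (assumed ~uncorrelated) and
    returns (theta_hat, sigma_hat) from standard independent-data WLS.
    """
    idx = np.arange(0, len(T), keep_every)
    ys, ts = y[:, idx], T[idx]
    var_mean = ys.var(axis=0, ddof=1) / y.shape[0]   # variance of the mean
    w = 1.0 / var_mean
    theta = np.sum(w * ts * ys.mean(axis=0)) / np.sum(w * ts**2)
    sigma = np.sqrt(1.0 / np.sum(w * ts**2))         # independent-data error
    return theta, sigma

rng = np.random.default_rng(3)
T = np.arange(1.0, 21.0)
y = T + rng.normal(0.0, 1.0, size=(100, 20))         # uncorrelated toy data
theta, sigma = subsampled_fit(y, T, keep_every=4)
```

For genuinely correlated trajectory data the returned error remains unreliable unless the decimation is aggressive enough, which is exactly the difficulty discussed above.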
Notice that for stationary time series, rather than reducing the number of sampling times, one can make full use of the data through the blocking method [42]. For non-stationary processes, however, the blocking method cannot be used. Fig. 2 shows how estimated errors from our WLS-ECE and WLS-ICE analyses depend on the number of data points used, N. We find that temporal correlations are so strong that the WLS-ECE method underestimates the errors down to very small N. Moreover, finding a sufficiently small N is difficult, since the error does not in general saturate to a constant level as N is reduced. These problems are circumvented by instead using the error estimation from the WLS-ICE method (i.e., using equation (4) instead of the WLS-ECE equations in Section B in Supplementary Information).

As a final test of our method, we now turn to "real world" data. To that end, we use particle tracking data from a competition for testing particle tracking software, in which 14 teams world-wide participated [48]. We choose to analyze this data set for two reasons. First, it has served as standard benchmark data within the particle tracking community. Second, since these movies are based on noisified and pixelated simulations (aiming to mimic actual experimental data), we know the values of the underlying model parameters. We used their Supplementary Videos 1 (medium particle density), 5 (low particle density) and 6 (high particle density). All these movies correspond to BM of vesicles, for which the expected MSD is ⟨[x(t) − x(0)]²⟩ = f_BM(θ, t) = θt, with θ = 2dD = 8. For particle detection in the movies and linking of particle positions into trajectories we used Method 1 [48], i.e., the tracking method described by Sbalzarini et al. [49], implemented as the ImageJ plugin "Particle Tracker" by the MOSAIC group [50]. Parameter settings for the plug-in are listed in Section J in Supplementary Information.
For each video we extracted trajectories, which were subsequently cut into shorter trajectories consisting of 7 discrete process times (there is no memory in BM, so the start time is inessential). Notice that for the higher particle density, fewer sufficiently long trajectories were produced as compared to the low density scenario (values for M are listed in Table 1). We subsequently divided the trajectories for each movie into two data sets, each with M trajectories. For the fitting procedures the first process time point, t = 0, was discarded (since at t = 0 the position is precisely known, the variance is 0 and cannot be used as a weight in equation (2)), thus leaving us with N = 6 sampling times. Results for the estimated parameters, θ̂, and associated standard deviations, σ̂, are found in Table 1. We notice that the CCM method fails at predicting the correct parameter for high and medium particle densities. This finding is simply due to the smaller ensemble size for these cases, which, in turn, is a result of the tracking software's inability to track and link particles in high and medium density settings. Comparing the WLS-ECE and WLS-ICE methods, we see that the WLS-ECE underestimates the error, with not all of its parameter estimates falling within 2σ̂ (confidence level 95%) of the expected result (θ = 8). In contrast, for the WLS-ICE all six observed parameter estimates for θ fall within 2σ̂ of the expected value.

Description              Low density   Medium density   High density
Video                    S5            S1               S6
Number of trajectories   M = 310       M = 16           M = 5
Method / Observable      θ̂   σ̂        θ̂   σ̂           θ̂   σ̂
WLS-ICE

Table 1: Results of the three fitting methods for "real world" particle tracking data. Particle trajectories were extracted from the "Vesicle" Supplementary videos from the article by Chenouard et al. [48] using the "Particle Tracker" software (MOSAIC group). The trajectories were cut into shorter trajectories, all of length 7 discrete process times. The short trajectories were then divided into two independent sets of size M. We then performed fitting using the WLS-ICE, WLS-ECE and CCM methods for BM, discarding the first process time point, resulting in N = 6 sampling times. The expected parameter value is θ = 8 (data are noisified and pixelized simulations with known properties). Since M was very small for video S6, we applied the jackknife procedure both in parameter and error estimation (all videos). Results before jackknifing are found in Table S1 in Supplementary Information. We notice that the CCM method gives ill-conditioning issues for the high density movie, where few trajectories could be extracted. The WLS-ECE method underestimates the error as compared to the WLS-ICE method.

Let us finally briefly discuss how well one is expected to be able to estimate a parameter based on experimental/simulation data. For model matching procedures (see Introduction), the Cramér-Rao bound is useful by providing an expression for the smallest possible variance in the estimated parameter [10]. For the case of BM, optimal estimators (i.e., estimators which reach the Cramér-Rao bound) based on the measured displacements have been derived for model matching type fitting [19, 20, 21]. For function fitting, the question is rather whether an optimal cost function, i.e., an optimal weight matrix R, can be found (see equation (2)).
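For a linear fit function with a known covariance matrix, one classical candidate for such an optimal weight matrix is the inverse of the true covariance (generalized least squares). The following is a hedged sketch with assumed toy BM data, using the Wick-theorem covariance of squared displacements; it is an illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data (assumed): M Brownian-motion trajectories, ensemble-averaged MSD.
M, N, D, dt = 100, 10, 1.0, 1.0
t = dt * np.arange(1, N + 1)
x = np.cumsum(rng.normal(0.0, np.sqrt(2 * D * dt), size=(M, N)), axis=1)
y = (x ** 2).mean(axis=0)

# True covariance of the squared displacement for 1D BM (Wick's theorem):
# Cov(x(t_i)^2, x(t_j)^2) = 2 * (2 D min(t_i, t_j))^2; the mean divides by M.
C = 2.0 * (2.0 * D * np.minimum.outer(t, t)) ** 2 / M

# Generalized least squares for the linear model f(theta, t) = theta * t:
# theta_hat = (t^T C^-1 y) / (t^T C^-1 t), Var(theta_hat) = (t^T C^-1 t)^-1.
Cinv_t = np.linalg.solve(C, t)
theta_gls = (Cinv_t @ y) / (Cinv_t @ t)
sigma_gls = 1.0 / np.sqrt(Cinv_t @ t)
print(theta_gls, sigma_gls)          # true value theta* = 2 D = 2
```

Since the true covariance matrix is required as input, this is a hybrid between model matching and function fitting, as discussed next.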
If the covariance matrix for the process is independent of the inferred parameters (up to a proportionality constant), and for linear fit functions, then the generalized least squares method can be shown to be optimal among unbiased WLS methods [51]. Since the generalized least squares method requires as input the inverse of the true covariance matrix, it can be viewed as a hybrid method between model matching and function fitting. In Figure S8 in Supplementary Information we show results for generalized least squares for BM (we use the term BMALS, Brownian motion adapted least squares), where we see that, indeed, the variance in the estimated parameter value is smaller for BMALS as compared to WLS-ICE, although the difference is not dramatic. Also notice that for M and N values where the CCM "works" (acceptable bias, see Fig. 3) the variances in estimated parameters for CCM and BMALS agree, as they should.

Discussion, conclusion and outlook
A common task in many fields of science is that of fitting a model to the time-evolving mean of some observable. Since fluctuations around observed mean values, calculated based on trajectories, are in general correlated in time, the error estimates provided by a "standard" weighted least squares (WLS-ECE) fit can be more than one order of magnitude too small, see Fig. 2. Further, the correlated chi-square method (CCM), involving numerical inversion of a noisy covariance matrix, often shows numerical instabilities (ill-conditioning) or a strong bias in the fitted parameters, see Fig. 3. To overcome these problems, we derived a new error estimation formula, see equation (4), for weighted least squares fitting, which does not require inversion of a noisy covariance matrix. With this formula at hand, a simple, yet accurate, function fitting procedure, WLS-ICE, can be followed: (A) perform a weighted least squares fit to the data, (B) use the new formula to estimate the errors. We demonstrated on four simulated prototype systems that the WLS-ICE method provides robust results, with a negligible bias in the fitted parameters and accurate error estimates. Our method's estimated errors are comparable to errors estimated using bootstrap and jackknife resampling for the four model systems. A strength of our method is that the fitting procedure does not have to be repeated multiple times.

We distinguished between two types of parameter estimation procedures: model matching, where a full stochastic model is matched to the data, and function fitting, in which a full stochastic model is not known and one rather seeks to fit a function to the chosen ensemble-averaged observables. The weighted least squares method is a procedure of function fitting type.

We have in this study not discussed methods for dealing with experimental errors, such as missing data. Such errors depend on the experimental setup and typically have to be dealt with in different ways depending on the setup.
For the single-particle tracking field (one of the application fields of our results), two major sources of experimental error are: effects due to the finite size of pixels in cameras used to record the trajectory, and motional blur effects (in a single time frame, a fluorescent molecule moves while being imaged). Methods for correcting these types of errors are discussed by Savin et al. [52], Martin et al. [53], Berglund [19] and Calderon [54].

Parameter estimation through χ² minimization is ubiquitous throughout many fields of science, and we hope that our method and publicly available software will be found useful in these fields.

References

[1] Saxton, M. J. Single-particle tracking: connecting the dots.
Nature Methods , 671–672 (2008).[2] Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature , 462–465(2006).[3] de Souza, N. Pulling on single molecules.
Nature Methods, 873–877 (2012). [4] Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics, 126001 (2012). [5] Jarzynski, C. Nonequilibrium equality for free energy differences. Physical Review Letters, 2690 (1997). [6] Kou, S. & Xie, X. S. Generalized Langevin equation with fractional Gaussian noise: subdiffusion within a single protein molecule. Physical Review Letters, 180603 (2004). [7] Szymanski, J. & Weiss, M. Elucidating the origin of anomalous diffusion in crowded fluids. Physical Review Letters, 038102 (2009). [8] Rothe, H. J.
Lattice gauge theories: an introduction, 4th ed. , vol. 74 (World Scientific, 2012).[9] Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P.
Numerical Recipes 3rd Edition:The Art of Scientific Computing (Cambridge University Press, New York, NY, USA, 2007), 3rdedn.[10] Van den Bos, A.
Parameter estimation for scientists and engineers (John Wiley & Sons, 2007).[11] Sivia, D. & Skilling, J.
Data analysis: a Bayesian tutorial (OUP Oxford, 2006).[12] Gottlieb, S., Liu, W., Renken, R. L., Sugar, R. L. & Toussaint, D. Hadron masses with two quarkflavors.
Physical Review D, 2245–2265 (1988). [13] Michael, C. Fitting correlated data. Physical Review D, 2616–2619 (1994). [14] Seibert, D. Undesirable effects of covariance matrix techniques for error analysis. Physical Review D, 6240–6243 (1994). [15] Yoon, B., Jang, Y.-C., Jung, C. & Lee, W. Covariance fitting of highly-correlated data in lattice QCD. Journal of the Korean Physical Society, 145–162 (2013). [16] Meroz, Y. & Sokolov, I. M. A toolbox for determining subdiffusive mechanisms. Physics Reports, 1–29 (2015). [17] Höfling, F. & Franosch, T. Anomalous transport in the crowded world of biological cells.
Reports on Progress in Physics, 046602 (2013). [18] Norregaard, K., Metzler, R., Ritter, C. M., Berg-Sørensen, K. & Oddershede, L. B. Manipulation and motion of organelles and single molecules in living cells. Chemical Reviews, 4342–4375 (2017). [19] Berglund, A. J. Statistics of camera-based single-particle tracking.
Physical Review E , 011917(2010).[20] Michalet, X. & Berglund, A. J. Optimal diffusion coefficient estimation in single-particle tracking. Physical Review E , 061916 (2012).[21] Vestergaard, C. L., Blainey, P. C. & Flyvbjerg, H. Optimal estimation of diffusion coefficients fromsingle-particle trajectories. Physical Review E , 022726 (2014).[22] Persson, F., Lindén, M., Unoson, C. & Elf, J. Extracting intracellular diffusive states and transitionrates from single-molecule tracking data. Nature Methods , 265–269 (2013).[23] Monnier, N. et al. Inferring transient particle transport dynamics in live cells.
Nature Methods, 838–840 (2015). [24] El Beheiry, M., Dahan, M. & Masson, J.-B. InferenceMAP: mapping of single-molecule dynamics with Bayesian inference. Nature Methods, 594–595 (2015). [25] Robson, A., Burrage, K. & Leake, M. C. Inferring diffusion in single live cells at the single-molecule level. Phil. Trans. R. Soc. B, 20120029 (2013). [26] Krog, J. & Lomholt, M. A. Bayesian inference with information content model check for Langevin equations.
Physical Review E, 062106 (2017). [27] Gershenfeld, N. A. The nature of mathematical modeling (Cambridge University Press, 1999). [28] Metzler, R. & Klafter, J. The random walk's guide to anomalous diffusion: a fractional dynamics approach.
Physics Reports, 1–77 (2000). [29] Pigeon, S., Fogelmark, K., Söderberg, B., Mukhopadhyay, G. & Ambjörnsson, T. Tracer particle diffusion in a system with hardcore interacting particles.
Journal of Statistical Mechanics: Theory and Experiment, 123209 (2017). [30] Mehrer, H. & Stolwijk, N. A. Heroes and highlights in the history of diffusion.
Diffusion Fundamentals, 1–32 (2009). [31] Bloch, S. C. Introduction to Classical and Quantum Harmonic Oscillators (John Wiley & Sons, 2013). [32] Bouchaud, J.-P. & Sornette, D. The Black-Scholes option pricing problem in mathematical finance: generalization and extensions for a large class of stochastic processes.
Journal de Physique I, 863–881 (1994). [33] Yuan, N., Fu, Z. & Liu, S. Extracting climate memory using fractional integrated statistical model: A new perspective on climate prediction. Scientific Reports (2014). [34] Barkai, E., Garini, Y. & Metzler, R. Strange kinetics of single molecules in living cells. Physics Today, 29 (2012). [35] Tsai, C.-C. Slip, stress drop and ground motion of earthquakes: A view from the perspective of fractional Brownian motion. Pure and Applied Geophysics, 689–706 (1997). [36] Metzler, R. & Klafter, J. The restaurant at the end of the random walk: recent developments in the description of anomalous transport by fractional dynamics.
Journal of Physics A: Mathematical and General, R161 (2004). [37] Weigel, A. V., Simon, B., Tamkun, M. M. & Krapf, D. Ergodic and nonergodic processes coexist in the plasma membrane as observed by single-molecule tracking. Proceedings of the National Academy of Sciences, 6438–6443 (2011). [38] Machta, B. B., Chachra, R., Transtrum, M. K. & Sethna, J. P. Parameter space compression underlies emergent theories and predictive models.
Science, 604–607 (2013). [39] Kepten, E., Bronshtein, I. & Garini, Y. Improved estimation of anomalous diffusion exponents in single-particle tracking experiments.
Physical Review E , 052713 (2013).[40] Metzler, R., Jeon, J.-H., Cherstvy, A. G. & Barkai, E. Anomalous diffusion models and theirproperties: non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking. Physical Chemistry Chemical Physics , 24128–24164 (2014).[41] Transtrum, M. K., Machta, B. B. & Sethna, J. P. Why are nonlinear fits to data so challenging? Physical Review Letters , 060201 (2010).[42] Flyvbjerg, H. & Petersen, H. G. Error estimates on averages of correlated data.
The Journal of Chemical Physics, 461–466 (1989). [43] Berg, B. A. & Billoire, A. Markov chain Monte Carlo simulations (Wiley Online Library, 2008). [44] Van Kampen, N. G.
Stochastic processes in physics and chemistry, vol. 1 (Elsevier, 1992). [45] GNU General Public License.
Biometrika , 1–15 (1974).[47] Efron, B. & Tibshirani, R. J. An introduction to the bootstrap (CRC press, 1994).[48] Chenouard, N. et al.
Objective comparison of particle tracking methods.
Nature Methods , 281(2014).[49] Sbalzarini, I. F. & Koumoutsakos, P. Feature point tracking and trajectory analysis for videoimaging in cell biology. Journal of Structural Biology , 182–195 (2005).[50] Sbalzarini, I. F. & Koumoutsakos, P. Particletracker (2016). URL http://imagej.net/Particle_Tracker . Version November 2016.[51] Kariya, T. & Kurata, H.
Generalized least squares (John Wiley & Sons, 2004).[52] Savin, T. & Doyle, P. S. Static and dynamic errors in particle tracking microrheology.
Biophysical Journal, 623–638 (2005). [53] Martin, D. S., Forstner, M. B. & Käs, J. A. Apparent subdiffusion inherent to single particle tracking. Biophysical Journal, 2109–2117 (2002). [54] Calderon, C. P. Motion blur filtering: A statistical approach for extracting confinement forces and diffusivity from a single blurred trajectory. Physical Review E, 053303 (2016). [55] Chaichian, M. & Demichev, A. Path integrals in physics, vol. 1: Stochastic processes and quantum mechanics. IOP, Bristol, UK (2001). [56] Nørrelykke, S. F. & Flyvbjerg, H. Harmonic oscillator in heat bath: Exact simulation of time-lapse-recorded data and exact analytical benchmark statistics.
Physical Review E, 041103 (2011). [57] Qian, H. Fractional Brownian motion and fractional Gaussian noise. In Processes with Long-Range Correlations, 22–33 (Springer, 2003). [58] Mandelbrot, B. B. & Van Ness, J. W. Fractional Brownian motions, fractional noises and applications.
SIAM Review , 422–437 (1968).[59] Davies, R. B. & Harte, D. Tests for Hurst effect. Biometrika , 95–101 (1987).[60] Chambers, M. The simulation of random vector time series with given spectrum. Mathematical andComputer Modelling , 1–6 (1995).[61] Quenouille, M. H. Notes on bias in estimation. Biometrika , 353–360 (1956).[62] Gradshteyn, I. & Ryzhik, I. Table of integrals, series and products (corrected and enlarged editionprepared by A. Jeffrey and D. Zwillinger). Academic Press, New York (2000).[63] Anderson, T. W.
An introduction to multivariate statistical analysis, 3rd ed. (Wiley New York,2003).[64] Schucany, W., Gray, H. & Owen, D. On bias reduction in estimation.
Journal of the American Statistical Association, 524–533 (1971). [65] Efron, B. & Stein, C. The jackknife estimate of variance. The Annals of Statistics.
Acknowledgments
We are grateful to Bo Söderberg and Björn Linse for fruitful discussions. T.A. was supported by theSwedish Research Council (grant nos 2009-2924 and 2014-4305). K.F. was supported by the SwedishResearch Council (grant no 2010-5219). M.A.L. acknowledges funding from the Danish council forIndependent Research-Natural Sciences (FNU), grant number 4002-00428B.
Author contributions statement
M.A.L. and T.A. conceived the idea of the project. All authors contributed to the conceptual design ofthe WLS-ICE method. K.F. performed the simulations and wrote the analysis software supervised byT.A. K.F. prepared all figures. T.A. and K.F. wrote the manuscript with help from A.I. and M.A.L.T.A. derived the new error estimation formula (with and without jackknife). M.A.L. derived the biascorrection prediction for BM with input from K.F. and T.A. A.I. suggested the use of jackknife for CCMfitting. T.A. coordinated the project.
Competing interests
The authors declare no competing interests.

Supplementary Figures
Supplementary Figure S1:
Correlations in fluctuations around ensemble averages for real trajectories compared to uncorrelated fluctuations (synthetic data).
The displacement squared, y_i^(m) = [x^(m)(t_i) − x^(m)(0)]², for fractional Brownian motion (FBM) as a function of time, t, for two trajectories, labeled by m, and the mean of a large ensemble (M = 10 ) of trajectories. Panel (a) shows actual trajectories, which exhibit strong temporal correlation, meaning: if we are above the mean at some time point on a trajectory, we are likely to still be above the mean at nearby time points (circled). In panel (b) we have constructed "synthetic" trajectories for comparison by using only one data point from each real trajectory and "throwing away" the rest, resulting in (computationally expensive) uncorrelated data. That is, within this "brute force" method, to generate a single uncorrelated trajectory of N sampling points, we need to use the same number of real trajectories and discard all data points save one. Data was generated from a one-dimensional FBM simulation with Hurst parameter H = 0. , see Supplementary Methods section C.3.

Supplementary Figure S2: Histograms of fitted parameters for two WLS methods and CCM compared to theoretical predictions for a small ensemble size.
All panels are identical to those in Figure 1 in the main text, except that we here only used M = 150 trajectories (instead of M = 1000). In panel e (the CTRW prefactor), the CCM fitting procedure gave a vastly incorrect parameter estimate (⟨θ̂⟩/θ* ≈ 13) and the associated histogram is therefore not displayed. Due to the smaller M value used here as compared to Figure 1 in the main text, the histograms of fitted parameters are non-Gaussian for panel e; see Supplementary Methods Sec. F for a discussion of this topic. The other panels converged to normal distributions for smaller M values. Examples of parameter fits to the MSD data are shown in Fig. S6.

Supplementary Figure S3: Bias in the parameter fit.
The residual bias in the fit (multiplied by the number of trajectories M) as a function of the number of sampling points, N (log-scale on the horizontal axis for visibility), averaged over parameters from fitting to S = 500 realizations of the mean. (a) For the Brownian motion (BM) CCM fit, the analytical prediction, G(N) (full line), for the first order bias follows the observed bias for M = 10 data (Supplementary Methods section E.3). For (b) damped harmonic oscillation (DHO) the biases in CCM and WLS are both small, but for (c–d) fractional Brownian motion (FBM) and (e–f) continuous time random walk (CTRW), the bias term in CCM is much larger than the WLS bias. The bias can be alleviated to some degree by a jackknife procedure. Error bars show the standard error of the mean. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S4: High order bias contribution in CCM fitting for BM.
The bias in theparameter estimation is commonly assumed to be of the form ˆ θ = θ ∗ + a/M + b/M + c/M + O ( M − ) ,see Supplementary Methods section E. In panel ( a ) the vertical axis shows the (negative) second orderbias term − b/M , and in ( b ) the (positive) third order term, c/M , for three different number ofsampling times N . Note that these are of comparable magnitude, but opposite sign. Thus a secondorder jackknife, which removes terms proportional to a/M and b/M , may yield more unfavorable resultsthan a first order jackknife, which only removes the a/M term. We note that the slope of the secondorder bias term approximately corresponds to M − , and the third order is slightly more. For panel (a)the second order bias was extracted combining equation (S122) and equation (S124), to give − b/M =2 θ (0 , , J + θ (0 , J − θ ∗ , and for panel (b) we have ( θ (0 , , J − θ ∗ ) = c/M , which follows immediately fromequation (S124). For simulation parameters, see Supplementary Methods section D.5.19upplementary Figure S5: Heuristic goodness-of-fit using the the coefficient of determination, R . The quality of the CCM and WLS fits are heuristically quantified by the coefficient of determination, R , as a function of sampling points, N (horizontal axis on log-scale for visibility), for our four prototypesystems: ( a – b ) Brownian motion (BM), ( c – d ) damped harmonic oscillation (DHO), ( e – f ) fractionalBrownian motion (FBM), and ( g – h ) continuous time random walk (CTRW). A perfect fit yields unitvalue, while a bad fit results in R (cid:28) (see Supplementary Methods, Sec. I). The number of trajectoriesused in the ensemble average was either M = 10 (left), or M = 80 (right). All data was averaged over S = 500 realizations, with standard deviation given by the error bars. For panels ( b , f ) only a few datapoints could be obtained, due to numerical instability of CCM, and for panels ( g , h ) R < for larger N . 
For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S6: Example of a fit to the mean of ensemble data for the WLS and CCM methods.
An illustrative example of a typical fit to average ensemble trajectory data, based on M = 150 trajectories, for (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c) fractional Brownian motion (FBM), and (d) continuous time random walk (CTRW). The model parameters were fitted to the data using either the WLS or the CCM fitting procedure, for N = 75, M = 150. For CCM fitting to the FBM data, we see that although the exponent is almost the same, the prefactor is inaccurate. For CCM fitting to CTRW data, both exponent and prefactor are poor. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S7: Error estimation using bootstrap resampling and jackknife error estimation.
Standard deviations of the parameter fits as a function of the number of sampling points, N, used in the fitting procedure. Each method is applied to S = 500 realizations of data from (a) Brownian motion (BM), (b) damped harmonic oscillation (DHO), (c–d) fractional Brownian motion (FBM), and (e–f) continuous time random walk (CTRW). The associated standard deviations in the parameter estimates serve as "actual" standard deviations. These actual values are compared to estimates using bootstrap resampling and jackknife error estimation procedures, see Supplementary Methods, Sec. H. We see that the bootstrap method gives rather reliable error estimates, similar to those of the WLS-ICE procedure; compare to Figure 2 in the main text. However, note that the bootstrap method is associated with a substantially larger computational time than the WLS-ICE. The jackknife error estimation performs worse than bootstrap resampling in general. For the jackknife error estimation, we used groups. For the bootstrap results, trajectories were resampled with replacement and the χ² minimization performed 100 times. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Figure S8: Bias and variance of Brownian motion adapted least squares (BMALS) compared to the WLS-ICE and CCM methods.
We show the bias in the parameter fit (left panels) and its variance compared to estimates from the fitting procedure (right panels), as a function of the number of sampling times, N. MSDs based on two different data sizes, M (number of trajectories), were considered: (a, b) M = 10 and (c, d) M = 80, averaged over S = 500 realizations. Notice the lower variance of BMALS as compared to WLS-ICE, and that as M is increased the CCM variance approaches that of BMALS. BMALS is a hybrid between model matching and function fitting procedures, as it requires the true covariance matrix as input. Error bars show standard errors of the mean. For simulation parameters, see Supplementary Methods section D.5.

Supplementary Tables

Description              Low density   Medium density   High density
Video                    S5            S1               S6
Number of trajectories   M = 310       M = 16           M = 5
Method / Observable      θ̂   σ̂        θ̂   σ̂           θ̂   σ̂
WLS-ICE

Supplementary Table S1: Results of the three fitting methods for "real world" particle tracking data, without jackknife. Results shown are for the same data as in Table 1 in the main text, but here before the jackknife procedures were applied. Comparing to Table 1 in the main text, we see that the biases are rather large for video S6 (few trajectories) but minor for video S5 (large number of trajectories).
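The first order (delete-one) jackknife used above, for both bias reduction and error estimation, can be sketched generically as follows; the estimator and data are assumed toy choices, not the paper's fitting procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

def jackknife(data, estimator):
    """Delete-one jackknife. Returns the first order bias-corrected
    estimate, theta_J = M*theta - (M-1)*mean(theta_i), together with the
    jackknife error, computed from the spread of leave-one-out estimates."""
    m = len(data)
    theta = estimator(data)
    theta_i = np.array([estimator(np.delete(data, i)) for i in range(m)])
    theta_j = m * theta - (m - 1) * theta_i.mean()
    err = np.sqrt((m - 1) / m * np.sum((theta_i - theta_i.mean()) ** 2))
    return theta_j, err

# Toy check (assumed estimator): (sample mean)^2 estimates mu^2 = 4
# with an O(1/M) bias, which the first order jackknife removes.
data = rng.normal(loc=2.0, scale=1.0, size=100)
theta_j, err = jackknife(data, lambda d: d.mean() ** 2)
print(theta_j, err)
```

In the tables above, the same delete-one idea is applied with the full χ² fit playing the role of the estimator.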
Abbreviation   Comment
WLS-ICE        weighted least squares including correlations in error estimation (new method)
WLS-ECE        weighted least squares excluding correlations in error estimation (old method)
CCM            correlated chi-square method (old method)
BM             Brownian motion: zero-mean process without memory
DHO            damped harmonic oscillation: process with a time-dependent mean
FBM            fractional Brownian motion: zero-mean process with memory
CTRW           continuous time random walk: zero-mean, ageing process

Supplementary Table S2: List of abbreviations.

Supplementary Methods

Contents
A Weighted Least Squares Including Correlations in Error Estimation (WLS-ICE)
  A.1 Parameter estimation
  A.2 Error estimation
B Review of previous fitting procedures
  B.1 WLS-ECE fitting
    B.1.1 General fit functions
    B.1.2 Linear fit functions
  B.2 CCM fitting
    B.2.1 General fit functions
    B.2.2 Linear fit functions
C Prototypical model systems
  C.1 Brownian motion
  C.2 Damped harmonic oscillation in a heat bath (DHO)
  C.3 Fractional Brownian motion
  C.4 Continuous time random walk (CTRW)
D Simulation procedures
  D.1 Brownian motion (BM)
  D.2 Damped harmonic oscillation (DHO)
  D.3 Fractional Brownian motion (FBM)
  D.4 Continuous time random walk (CTRW)
  D.5 Simulation parameters
E Bias effects in parameter estimation
  E.1 The origin of bias
  E.2 Bias in parameter estimation of CCM for linear fit functions
  E.3 Bias in parameter estimation of CCM for BM
    E.3.1 Asymptotic expansion
  E.4 Bias in parameter estimation of WLS for BM
  E.5 Lack of bias for BMALS
  E.6 Lack of bias in parameter estimation of CCM for DHO
F Approximate distribution for the estimated parameters
G Jackknife bias reduction
  G.1 First order jackknife bias reduction
  G.2 Second order jackknife bias reduction
  G.3 Variance for jackknife-bias-reduced estimators
    G.3.1 First order jackknife bias reduction
    G.3.2 Second order jackknife bias reduction
H Estimation of errors on estimated parameters, using jackknife and bootstrap procedures
  H.1 Jackknife error estimation
  H.2 Bootstrap error estimation
I Coefficient of determination
J Settings in "Particle Tracker" plug-in
In this Supplementary Methods, details of the derivations, simulations and methods are provided.For convenience, Table S2 lists all abbreviations used herein.
A Weighted Least Squares Including Correlations in Error Estimation (WLS-ICE)
We here describe our new fitting procedure, the WLS-ICE method, in detail. As demonstrated in the main text, the previous standard methods for fitting ensemble averages, the WLS-ECE and CCM procedures (section B), are of limited general applicability for fitting correlated data: the WLS-ECE method assumes that data points are independent, resulting in flawed error estimation, whereas the CCM method (involving inversion of a noisy sample covariance matrix) produces ill-conditioned results or a strong bias in the parameter estimation. We here formulate the problem at hand as the minimization of a "cost function", χ², which can be chosen rather generally. Minimizing this cost function provides an estimate, θ̂, for the model parameters of interest. However, unlike the WLS-ECE fitting procedure, where fluctuations around mean values are assumed to be independent, we use the full multivariate probability density function for the mean values, eq. (S8) (which is Gaussian due to the multivariate central limit theorem), when estimating the standard error and covariance of the fitted parameters. This provides a mathematically rigorous way of avoiding the problems with previous fitting methods.

A.1 Parameter estimation
The cost function used herein is a $\chi^2$ functional (eq. (2) in the main text) of the form
$$\chi^2 = (\mathbf{f} - \bar{\mathbf{y}})^T R\, (\mathbf{f} - \bar{\mathbf{y}}), \qquad (S1)$$
where $\bar{\mathbf{y}} = (\bar y_1, \ldots, \bar y_N)^T$, $\mathbf{f} = (f_1, \ldots, f_N)^T$, $f_i = f(T_i; \theta)$ with sampling times $T_i$ ($i = 1, \ldots, N$), and $(\ldots)^T$ denotes transpose. We find the best parameters $\hat\theta$ by minimizing $\chi^2$, i.e., by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = 2 \sum_{i,j} \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ij}\, (f_j(\hat\theta) - \bar y_j), \qquad (S2)$$
where $a = 1, \ldots, K$. As in the main text, a 'bar' denotes a sample estimator, a 'hat' denotes parameters obtained through $\chi^2$ minimization, and a 'star' denotes the true value of a parameter. For a linear fit function, $f_i(\theta) = \theta T_i$, eq. (S2) can be solved analytically:
$$\hat\theta = \frac{\bar{\mathbf{y}}^T R\, \mathbf{T}}{\mathbf{T}^T R\, \mathbf{T}}. \qquad (S3)$$
Note that the positive definite symmetric matrix $R$ in eq. (S1) could potentially be custom made for particular applications. In the main text the observables $\bar y_i$ are mean positions or mean square displacements at different sampling times, $T_i$. We note, however, that our WLS-ICE procedure is valid for any type of ensemble averaged observable (the matrices $\bar C$ and $\bar Q$ below are then the covariance matrices for those particular observables).

For the matrix $R$, we consider three main choices:
1. Correlated Chi-Square Method (CCM):
Here we make use of the full covariance matrix (see section B.2):
$$R = R^{[\mathrm{CCM}]} = \bar C^{-1}, \qquad (S4)$$
where $\bar C$ is the covariance matrix of the mean, $\bar C = \bar Q / M$, as defined in eq. (3) in the main text.

2. Weighted least squares (WLS): Here we only make use of the diagonal elements,
$$R_{ij} = R^{[\mathrm{WLS}]}_{ij} = \delta_{i,j} / \bar C_{ii}, \qquad (S5)$$
where $\delta_{i,j}$ is the Kronecker delta.
3. Brownian motion adapted least squares (BMALS):
Finally we probe our fitting method with the following choice:
$$R = R^{[\mathrm{BMALS}]} = \left( \frac{1}{M} Q^*_{\mathrm{BM}} \right)^{-1}, \qquad (S6)$$
where $Q^*_{\mathrm{BM}}$ is the exact covariance matrix for BM, see eq. (S38). For a comparison of the BMALS method to WLS-ICE, see Supplementary Figure S8.

A.2 Error estimation
The covariance of the estimated parameters (i.e., the parameters $\hat\theta$ obtained by solving eq. (S2)) is defined as
$$\hat\Delta_{ab} = \langle (\hat\theta_a - \theta^*_a)(\hat\theta_b - \theta^*_b) \rangle, \qquad (S7)$$
where $\langle F(\bar{\mathbf{y}}) \rangle = \int F(\bar{\mathbf{y}})\, \rho(\bar{\mathbf{y}}; \theta^*)\, d\bar y_1 d\bar y_2 \cdots d\bar y_N$ denotes an average over the multivariate probability density $\rho(\bar{\mathbf{y}}; \theta^*)$. Due to the multivariate central limit theorem (note that $\bar{\mathbf{y}}$ is a sum of $M$ identically distributed random contributions), for large $M$ this probability density is a multivariate Gaussian:
$$\rho(\bar{\mathbf{y}}; \theta^*) = Z^{-1} \exp\left( -\frac{1}{2} (\bar{\mathbf{y}} - \mathbf{y}^*)^T C^{*-1} (\bar{\mathbf{y}} - \mathbf{y}^*) \right), \qquad (S8)$$
with normalization constant $Z = (2\pi)^{N/2} \sqrt{\det(C^*)}$ [44] and $C^* = Q^*/M$, where $Q^*$ is the exact covariance matrix.

In order to derive an explicit expression for $\hat\Delta_{ab}$ we follow the lines of thought of Gottlieb et al. [12] and make a first order Taylor series expansion of the estimated parameter values in terms of deviations of the estimated $\bar{\mathbf{y}}$ from their true values:
$$\hat\theta_a - \theta^*_a = \sum_k \left.\frac{\partial \hat\theta_a}{\partial \bar y_k}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*} (\bar y_k - y^*_k) + O[(\bar y_k - y^*_k)(\bar y_l - y^*_l)]. \qquad (S9)$$
Substituting this expression into eq. (S7) and using the definition of the covariance matrix, $C^*_{kl} = \langle (\bar y_k - y^*_k)(\bar y_l - y^*_l) \rangle$ [this result follows from eq. (S8)], we find, to first order,
$$\hat\Delta_{ab} = \sum_{k,l} \left.\frac{\partial \hat\theta_a}{\partial \bar y_k}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*} C^*_{kl} \left.\frac{\partial \hat\theta_b}{\partial \bar y_l}\right|_{\bar{\mathbf{y}}=\mathbf{y}^*}. \qquad (S10)$$
In order to obtain an explicit expression for $\partial\hat\theta_a/\partial\bar y_k$ we differentiate eq. (S2) with respect to $\bar y_k$. This yields
$$\sum_b \hat h_{ab} \frac{\partial \hat\theta_b}{\partial \bar y_k} - 2 \sum_i \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ik} = 0, \qquad (S11)$$
where we introduced
$$\hat h_{ab} = 2 \sum_{i,j} \left.\frac{\partial^2 f_i(\theta)}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta} R_{ij}\, (f_j(\hat\theta) - \bar y_j) + 2 \sum_{i,j} \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} R_{ij} \left.\frac{\partial f_j(\theta)}{\partial \theta_b}\right|_{\theta=\hat\theta}. \qquad (S12)$$
Solving eq. (S11) we obtain
$$\frac{\partial \hat\theta_a}{\partial \bar y_k} = 2 \sum_i \sum_b (\hat h^{-1})_{ab} \left.\frac{\partial f_i(\theta)}{\partial \theta_b}\right|_{\theta=\hat\theta} R_{ik}, \qquad (S13)$$
which, when substituted into eq. (S10), yields the following expression for the covariance of the estimated parameters $\hat\theta$:
$$\hat\Delta_{ab} = 4 \sum_{c,d} \sum_{j,k,l,m} (\hat h^{-1})_{ac} \frac{\partial f_j(\theta)}{\partial \theta_c}\, R_{jk}\, C^*_{kl}\, R_{lm}\, \frac{\partial f_m(\theta)}{\partial \theta_d}\, (\hat h^{-1})_{db}, \qquad (S14)$$
where all derivatives (and $\hat h$) are evaluated at $\theta = \hat\theta$, $\bar{\mathbf{y}} = \mathbf{y}^*$.

We finally replace all exact quantities above by the corresponding sample estimators (and use $\bar C = \bar Q / M$), giving the key result, eq. (4) in the main text. The replacement of exact ensemble averages by sample estimates introduces bias terms which, to first order, are proportional to $1/M$, where $M$ is the number of trajectories, see section E.1. For the WLS-ICE/WLS-ECE procedures, we find that the bias is in practice often negligible (see main text). Just as the parameter estimates $\hat\theta_a$ are typically biased, so too will the quantity $\hat\phi_{ab}$ in eq. (4) in the main text be, since it is a nonlinear function of sample estimates, see section E.1. This bias can be reduced by applying the jackknife procedure to $\hat\phi_{ab}$ (see section G).

B Review of previous fitting procedures
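Before reviewing these earlier methods, we note that for a linear fit function the WLS-ICE recipe of section A, i.e., eq. (S3) with the diagonal weights of eq. (S5) for the parameter estimate followed by eq. (S14) with the full covariance matrix for the error estimate, collapses to a few lines of code. The following Python sketch (illustrative only; it is not the published WLS-ICE software, and all names and parameter values are our own) applies it to simulated BM data, where the true slope is $\theta^* = 2D$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, D, eps = 500, 20, 0.5, 1.0          # trajectories, sampling times, diffusion constant, time step
t = eps * np.arange(1, N + 1)             # sampling times T_i

# M one-dimensional BM trajectories; y^(m)_i is the squared displacement (eq. S32)
x = np.cumsum(rng.normal(0.0, np.sqrt(2.0 * D * eps), size=(M, N)), axis=1)
y = x**2

ybar = y.mean(axis=0)                     # ensemble average at each sampling time
Qbar = np.cov(y, rowvar=False, ddof=1)    # sample covariance matrix, eq. (S39)
Cbar = Qbar / M                           # covariance of the mean, C-bar = Q-bar/M

# Parameter estimate: eq. (S3) with diagonal weights R = diag(1/C_ii), eq. (S5)
R = np.diag(1.0 / np.diag(Cbar))
theta_hat = (ybar @ R @ t) / (t @ R @ t)  # fit of f_i = theta*T_i; true value theta* = 2D = 1

# WLS-ICE error estimate: eq. (S14) with the full C-bar; for a one-parameter
# linear fit it reduces to the "sandwich" form (t^T R C R t) / (t^T R t)^2
var_ice = (t @ R @ Cbar @ R @ t) / (t @ R @ t) ** 2

# Error estimate ignoring correlations (WLS-ECE, eq. S23b), for comparison
var_ece = 1.0 / (t @ R @ t)

print(theta_hat, np.sqrt(var_ice), np.sqrt(var_ece))
```

For correlated data such as BM, the ICE standard error is typically substantially larger than the (overly optimistic) ECE one, reflecting the positive temporal correlations in the fluctuations.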
In this section we review the two previously ubiquitous $\chi^2$ methods for model fitting, namely WLS-ECE (uncorrelated $\chi^2$) fitting and CCM (correlated $\chi^2$) fitting.

B.1 WLS-ECE fitting
The hitherto most common method for fitting a function to data is the "standard" weighted least squares method (WLS-ECE in the main text), i.e., uncorrelated $\chi^2$ fitting, which is reviewed in this section. In this method, one assumes that all fluctuations around mean values are uncorrelated.

B.1.1 General fit functions
In the WLS-ECE method one maximizes the probability for the function $f(T_i; \theta) = f_i(\theta)$ to be a good fit to the data:
$$P(\bar{\mathbf{y}}; \theta) \propto \prod_{i=1}^{N} \exp\left( -\frac{(\bar y_i - f_i(\theta))^2}{2 \bar\sigma_i^2} \right). \qquad (S15)$$
Note that this probability is a product over the observations $\bar y_i$; hence the data are assumed to be statistically independent. Within this assumption, the unbiased estimator of the variance of the mean is
$$\bar\sigma_i^2 = \frac{1}{M(M-1)} \sum_{m=1}^{M} (y_i^{(m)} - \bar y_i)^2. \qquad (S16)$$
Maximizing the probability $P$ is equivalent to minimizing
$$\chi^2 = \sum_{i=1}^{N} \frac{(\bar y_i - f_i(\theta))^2}{\bar\sigma_i^2}, \qquad (S17)$$
from which we get the estimated parameters $\hat\theta$ by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = 2 \sum_i \left.\frac{\partial f_i(\theta)}{\partial \theta_a}\right|_{\theta=\hat\theta} \frac{f_i(\hat\theta) - \bar y_i}{\bar\sigma_i^2}. \qquad (S18)$$
For $\chi^2$ close to the estimated parameter set $\hat\theta$ we have the Taylor expansion
$$\chi^2 = \chi^2\big|_{\hat\theta} + \sum_{a=1}^{K} (\theta_a - \hat\theta_a) \left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} + \frac{1}{2} \sum_{a,b=1}^{K} (\theta_a - \hat\theta_a)(\theta_b - \hat\theta_b) \left.\frac{\partial^2 \chi^2}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta}, \qquad (S19)$$
which we can insert back into the expression for $P$, eq. (S15) [note that $P \propto \exp(-\chi^2/2)$], to yield
$$P(\theta) = W \exp\left( -\frac{1}{4} \sum_{a,b=1}^{K} \hat H_{ab} (\theta_a - \hat\theta_a)(\theta_b - \hat\theta_b) \right), \qquad (S20)$$
where $W$ is a normalization constant and
$$\hat H_{ab} = \left.\frac{\partial^2 \chi^2}{\partial \theta_a \partial \theta_b}\right|_{\theta=\hat\theta} \qquad (S21)$$
is the Hessian matrix, and we used $\partial\chi^2/\partial\theta_a|_{\theta=\hat\theta} = 0$. From eq. (S20) we find that
$$\hat\Delta_{ab} \equiv \langle (\hat\theta_a - \theta^*_a)(\hat\theta_b - \theta^*_b) \rangle = 2 (\hat H^{-1})_{ab}, \qquad (S22)$$
i.e., the inverse of the Hessian matrix determines the covariances of the estimated parameters.

B.1.2 Linear fit functions
For the case that the fit function is linear, i.e., $f_i(\theta) = \theta T_i$, eq. (S18) can be solved analytically (Press et al. [9]). The same can be done for the variance, $\hat\sigma^2$, of the estimated parameter. We have
$$\hat\theta = \frac{\sum_i \bar y_i T_i / \bar\sigma_i^2}{\sum_i T_i^2 / \bar\sigma_i^2}, \qquad (S23a)$$
$$\hat\sigma^2 = \hat\Delta = \frac{1}{\sum_i T_i^2 / \bar\sigma_i^2}. \qquad (S23b)$$

B.2 CCM fitting
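The WLS-ECE linear-fit formulas, eqs. (S23a)-(S23b), and the CCM formulas derived below, eqs. (S29a)-(S29b), differ only in whether the diagonal or the full (inverse) covariance matrix of the mean enters as weight. A side-by-side numerical sketch (Python; the data generation and variable names are illustrative, not taken from this work):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 400, 10
T = np.arange(1.0, N + 1)

# Synthetic correlated data: BM-like squared displacements with true slope theta* = 2
x = np.cumsum(rng.normal(0.0, np.sqrt(2.0), size=(M, N)), axis=1)
y = x**2                                   # <y_i> = 2 t_i
ybar = y.mean(axis=0)
C = np.cov(y, rowvar=False, ddof=1) / M    # covariance matrix of the mean
sig2 = np.diag(C)                          # sigma_i^2 of eq. (S16)

# WLS-ECE, eqs. (S23a)-(S23b): diagonal weights only
theta_ece = np.sum(ybar * T / sig2) / np.sum(T**2 / sig2)
var_ece = 1.0 / np.sum(T**2 / sig2)

# CCM, eqs. (S29a)-(S29b): full inverse covariance matrix as weight
Cinv = np.linalg.inv(C)
theta_ccm = (ybar @ Cinv @ T) / (T @ Cinv @ T)
var_ccm = 1.0 / (T @ Cinv @ T)

print(theta_ece, theta_ccm, var_ece, var_ccm)
```

Note that inverting the noisy sample estimate `C` is exactly the step that makes CCM ill-conditioned or biased when $N$ is large or $M$ is small (sections B.2 and E); here $M \gg N$, so the inversion is benign.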
In this section we review the CCM (correlated chi-square method) fitting procedure [10, 12, 14, 13].
B.2.1 General fit functions
Where a WLS-ECE fit only makes use of the diagonal (the variances) of the covariance matrix, CCM makes use of the full matrix, defined as in eq. (S39), whose diagonal contains the squares of the standard errors of the mean, $s_i^2 = \bar Q_{ii} / M$. The task of fitting a function $f(T_i; \theta)$ reduces to maximizing the probability, which is taken as the multivariate Gaussian
$$P(\bar{\mathbf{y}}; \theta) = Z^{-1} \exp\left( -\frac{1}{2} (\bar{\mathbf{y}} - \mathbf{f}(\theta))^T \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\theta)) \right), \qquad (S24)$$
where (for a good fit, $\mathbf{y}^* \approx \mathbf{f}$) $\bar C = \bar Q / M$ can be estimated through eq. (3) in the main text, the normalization constant is $Z = (2\pi)^{N/2} \sqrt{\det(\bar C)}$ [44], $\bar{\mathbf{y}} = (\bar y_1, \ldots, \bar y_N)^T$, $\mathbf{f} = (f_1, \ldots, f_N)^T$ with $f_i = f(T_i; \theta)$, and $(\ldots)^T$ denotes transpose. For uncorrelated data the covariance matrix estimator $\bar C$ will be diagonal, eq. (S24) reduces to eq. (S15), and the WLS-ECE method is recovered.

As for WLS-ECE, maximizing $P$ is equivalent to minimizing the cost function
$$\chi^2 = (\bar{\mathbf{y}} - \mathbf{f}(\theta))^T \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\theta)). \qquad (S25)$$
Thus, we get our estimated parameters $\hat\theta_a$ ($a = 1, \ldots, K$) by solving
$$\left.\frac{\partial \chi^2}{\partial \theta_a}\right|_{\theta=\hat\theta} = 0 = -\left.\frac{\partial \mathbf{f}^T}{\partial \theta_a}\right|_{\theta=\hat\theta} \bar C^{-1} (\bar{\mathbf{y}} - \mathbf{f}(\hat\theta)) + (\bar{\mathbf{y}} - \mathbf{f}(\hat\theta))^T \bar C^{-1} \left( -\left.\frac{\partial \mathbf{f}}{\partial \theta_a}\right|_{\theta=\hat\theta} \right) = 2 \left.\frac{\partial \mathbf{f}^T}{\partial \theta_a}\right|_{\theta=\hat\theta} \bar C^{-1} (\mathbf{f}(\hat\theta) - \bar{\mathbf{y}}), \qquad (S26)$$
where in the last step we used the symmetry of $\bar C$, i.e., that $\bar C_{ij} = \bar C_{ji}$.

The derivation of the covariance, $\hat\Delta_{ab}$, of the CCM-estimated parameters $\hat\theta_a$ follows along identical lines as for WLS-ECE (previous section). Hence, $\hat\Delta_{ab}$ is given by eq. (S22), where $\hat\theta_a$ is now obtained by solving eq. (S26) (instead of eq. (S18) as for WLS-ECE).

We finally note that the CCM is a maximum likelihood estimation procedure "asymptotically". More precisely, if $M$ is large enough that the $\bar y_i$ are Gaussian by the multivariate central limit theorem, if the fit is "good" in the sense that $\mathbf{y}^* \approx \mathbf{f}$, and if the errors on the estimated elements of the covariance matrix are negligible, then the CCM is a maximum likelihood estimation method.

B.2.2 Linear fit functions
For fitting a linear function, $f_i(\theta) = \theta T_i$, to data one can determine the minimum of the CCM $\chi^2$ function, eq. (S25), analytically. In particular, such a fit function is of relevance for BM (section C.1). Eq. (S26) becomes
$$\frac{\partial \chi^2}{\partial \theta} = -2 (\bar{\mathbf{y}} - \theta \mathbf{T})^T \bar C^{-1} \mathbf{T}. \qquad (S27)$$
Taking the second derivative we get
$$\frac{\partial^2 \chi^2}{\partial \theta^2} = 2\, \mathbf{T}^T \bar C^{-1} \mathbf{T}. \qquad (S28)$$
From these results, as well as from eq. (S21) and eq. (S22), we get the estimated value of the parameter $\theta$ and its variance $\hat\sigma^2$ as
$$\hat\theta = \frac{\bar{\mathbf{y}}^T \bar C^{-1} \mathbf{T}}{\mathbf{T}^T \bar C^{-1} \mathbf{T}}, \qquad (S29a)$$
$$\hat\sigma^2 = \hat\Delta = \frac{1}{\mathbf{T}^T \bar C^{-1} \mathbf{T}}. \qquad (S29b)$$

C Prototypical model systems
In the main text we provide results for different parameter estimation procedures. As prototype systems we use four processes for which the true parameter values are known, namely: (i) Brownian motion (BM), (ii) damped harmonic oscillation (DHO), (iii) fractional Brownian motion (FBM), and (iv) continuous time random walks (CTRW). For BM and CTRW in $d$ spatial dimensions, steps in different directions are independent. Therefore, without loss of generality, all simulations for these systems are here performed in one dimension, $d = 1$. Also, for consistency, we use $d = 1$ in our FBM simulations.

C.1 Brownian motion
Our first example is simple BM, which can be used to describe, e.g., single particle diffusion in one dimension. The mean square displacement (MSD) at time $t$, for dimension $d$ and diffusion constant $D$, is
$$\langle (x(t) - x(0))^2 \rangle = \langle y(t) \rangle = \theta t, \qquad (S30)$$
where
$$\theta = 2 d D \qquad (S31)$$
and
$$y(t) = [x(t) - x(0)]^2. \qquad (S32)$$
In all simulations in the main text we use one-dimensional simulations, i.e., $d = 1$.

For one-dimensional BM, the full covariance matrix of the displacement is known [55]. Choosing $x(0) = 0$ and discretizing time into process times $t_i = i\epsilon$ ($i = 1, \ldots, N$), with time step $\epsilon$, we have
$$V^*_{ij} = \langle (x_i - \langle x_i \rangle)(x_j - \langle x_j \rangle) \rangle = 2 D \min(t_i, t_j), \qquad (S33)$$
where $x_i = x(t_i)$ and $D$ is the diffusion constant. In matrix form:
$$V^* = 2 D \epsilon \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & N \end{pmatrix}. \qquad (S34)$$
Of interest here is also the covariance matrix of the square displacements:
$$Q^*_{ij} = \langle (y_i - \langle y_i \rangle)(y_j - \langle y_j \rangle) \rangle. \qquad (S35)$$
Using Wick's (Isserlis') theorem for zero-mean Gaussian processes, we can calculate any moment of a multivariate Gaussian according to
$$\langle x_1 x_2 \cdots x_n \rangle = \sum \prod \langle x_i x_j \rangle, \qquad (S36)$$
where the sum is over all distinct ways of partitioning $x_1, \ldots, x_n$ into pairs $x_i x_j$. Using eq. (S36) we obtain the following relation between $Q^*$ and $V^*$:
$$Q^*_{ij} = 2 (V^*_{ij})^2. \qquad (S37)$$
In matrix form:
$$Q^* = 8 (D\epsilon)^2 \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 4 & 4 & \cdots & 4 \\ 1 & 4 & 9 & \cdots & 9 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 4 & 9 & \cdots & N^2 \end{pmatrix}. \qquad (S38)$$
The standard unbiased sample estimator of $Q$ is
$$\bar Q_{ij} = \frac{1}{M-1} \sum_m (y_i^{(m)} - \bar y_i)(y_j^{(m)} - \bar y_j), \qquad (S39)$$
where $m$ labels trajectories, see main text.

For BM, the inverse of the $Q^*$ matrix is a tridiagonal matrix with zero column sums, except for the first column. Explicitly,
$$Q^{*-1} = \frac{1}{8(D\epsilon)^2} \begin{pmatrix} 1 + \tfrac{1}{3} & -\tfrac{1}{3} & & & \\ -\tfrac{1}{3} & \tfrac{1}{3} + \tfrac{1}{5} & -\tfrac{1}{5} & & \\ & -\tfrac{1}{5} & \tfrac{1}{5} + \tfrac{1}{7} & \ddots & \\ & & \ddots & \ddots & -\tfrac{1}{2N-1} \\ & & & -\tfrac{1}{2N-1} & \tfrac{1}{2N-1} \end{pmatrix}, \qquad (S40)$$
which can be written as
$$(Q^{*-1})_{ij} = \frac{1}{8(D\epsilon)^2} \left[ \left( \frac{1}{2i-1} + \frac{1 - \delta_{i,N}}{2i+1} \right) \delta_{i,j} - \frac{1}{2i+1} \delta_{i,j-1} - \frac{1}{2i-1} \delta_{i,j+1} \right], \qquad (S41)$$
where $\delta_{i,j}$ is the Kronecker delta ($\delta_{i,j} = 1$ if $i = j$; $\delta_{i,j} = 0$ if $i \neq j$). It is straightforward to show that the matrix above indeed satisfies $Q^{*-1} \cdot Q^* = I$, where $I$ is the identity matrix. Note that the results above for $Q^{*-1}$ assume that the first sampling time equals the spacing between subsequent sampling times. In general, this choice of sampling times may not be optimal. In such situations one can evaluate $Q^{*-1}$ by numerical inversion of the $Q^*$ given by eqs. (S37) and (S33).

C.2 Damped Harmonic Oscillation in a heat bath (DHO)

Following Nørrelykke and Flyvbjerg [56] we consider the dynamics of a damped harmonic oscillation in a heat bath (DHO). Physically, this process corresponds to the motion of a particle in a harmonic potential (i.e., the particle experiences a restoring force proportional to the displacement from the bottom of the potential) in a viscous liquid. Besides exerting friction on the particle, the molecules of the viscous liquid act as a noise source by giving the particle thermal kicks. The equation of motion is
$$m \frac{d^2 x(t)}{dt^2} + \gamma \frac{dx(t)}{dt} + \kappa x(t) = F_{\mathrm{therm}}(t), \qquad (S42)$$
where $x(t)$ is the particle position at time $t$, $m$ is the mass, $\gamma$ is the friction constant, $\kappa$ is the spring constant and $F_{\mathrm{therm}} = (2 k_B T \gamma)^{1/2} \eta(t)$ is the thermal noise, which is assumed to be zero-mean Gaussian and delta-correlated, i.e.,
$$\langle \eta(t) \rangle = 0 \qquad (S43)$$
and
$$\langle \eta(t) \eta(t') \rangle = \delta(t - t'). \qquad (S44)$$
Above, $k_B$ is the Boltzmann constant, $T$ is the temperature of the heat bath and $\delta(z)$ is the Dirac delta function.
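Equation (S42), together with the noise statistics (S43)-(S44), can be integrated numerically with a simple Euler-Maruyama scheme. The sketch below (our own illustrative parameter values; the simulations in this work instead use the exact updating scheme of Nørrelykke and Flyvbjerg [56], see section D.2) checks a single weakly driven trajectory at critical damping against the mean solution, eq. (S53):

```python
import numpy as np

rng = np.random.default_rng(3)
m, gamma, kappa, kBT = 1.0, 2.0, 1.0, 1e-2   # critical damping: gamma**2 == 4*m*kappa
x0, dt, nsteps = 1.0, 1e-3, 20000

x, v = x0, 0.0                                # initial conditions, eqs. (S45)-(S46)
xs = np.empty(nsteps)
for n in range(nsteps):
    # Euler-Maruyama step for m*dv = (-gamma*v - kappa*x)*dt + sqrt(2*kBT*gamma)*dW
    dW = rng.normal(0.0, np.sqrt(dt))
    a_det = -(gamma / m) * v - (kappa / m) * x
    x += v * dt
    v += a_det * dt + np.sqrt(2.0 * kBT * gamma) / m * dW
    xs[n] = x

t = dt * np.arange(1, nsteps + 1)
theta = gamma / (2.0 * m)                     # theta = 1/(2*tau), eq. (S52)
mean_pred = x0 * (1.0 + theta * t) * np.exp(-theta * t)   # eq. (S53)
print(np.max(np.abs(xs - mean_pred)))         # small, since kBT is small here
```

With the small thermal energy chosen, a single trajectory stays close to the deterministic relaxation; the equilibrium fluctuations have standard deviation $\sqrt{k_B T / \kappa} = 0.1$ in these units.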
The equation of motion is completed by initial conditions for the position and velocity. We restrict ourselves to
$$x(t=0) = x_0, \qquad (S45)$$
$$v(t=0) = \left.\frac{dx(t)}{dt}\right|_{t=0} = 0, \qquad (S46)$$
i.e., the particle is at the initial time displaced a distance $x_0$ from its equilibrium position and then let go without any imposed initial velocity (no external pushing or pulling).

Based on eq. (S42) it is straightforward to derive an expression for the expected position, $\langle x(t) \rangle$, at time $t$. By taking the ensemble average of eq. (S42) and then making the ansatz $\langle x(t) \rangle = \exp(i \Omega t)$, we arrive at a second order algebraic equation for $\Omega$ with two solutions:
$$\Omega_\pm = \frac{i}{2\tau} \pm \omega, \qquad (S47)$$
where
$$\tau = \frac{m}{\gamma}, \qquad (S48)$$
$$\omega^2 = \omega_0^2 - \frac{1}{4\tau^2} \qquad (S49)$$
and
$$\omega_0 = \sqrt{\frac{\kappa}{m}}. \qquad (S50)$$
Thus, for the case $\omega^2 < 0$ the solution for $\langle x(t) \rangle$ is an exponentially damped function. For the case $\omega^2 > 0$, the solution is a complex-valued exponential which can be written in terms of real-valued exponentials multiplied by sine and cosine functions. Also incorporating the initial conditions used here, eqs. (S45) and (S46), we find the solution for the mean to be
$$\langle x(t) \rangle = x_0 \left( \cos(\omega t) + \frac{\theta}{\omega} \sin(\omega t) \right) \exp(-\theta t), \qquad (S51)$$
with
$$\theta = \frac{1}{2\tau}. \qquad (S52)$$
The case $\omega = 0$ (i.e., $\omega_0 = 1/(2\tau)$) is referred to as critical damping. For this case we can obtain the solution from eq. (S51) by taking the limit $\omega \to 0$, which yields
$$\langle x(t) \rangle = x_0 (1 + \theta t) \exp(-\theta t). \qquad (S53)$$
The case of critical damping is used in the simulations in the main text, where $\theta$ is used as a fitting parameter.

Using the full stochastic eq. (S42), we can also derive an explicit expression for the covariance matrix $C^*(t, \tilde t) = \langle [x(t) - \langle x(t) \rangle][x(\tilde t) - \langle x(\tilde t) \rangle] \rangle$. For simplicity we limit ourselves to the case $\omega^2 \geq 0$. We start by rewriting eq.
(S42) as a set of two coupled first order equations [56]:
$$\frac{d}{dt} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = -\mathcal{M} \begin{pmatrix} x(t) \\ v(t) \end{pmatrix} + \begin{pmatrix} 0 \\ \frac{\sqrt{2D}}{\tau} \eta(t) \end{pmatrix}, \qquad (S54)$$
with $D = k_B T / \gamma$ being the particle diffusion constant and
$$\mathcal{M} = \begin{pmatrix} 0 & -1 \\ \omega_0^2 & \frac{1}{\tau} \end{pmatrix}, \qquad (S55)$$
which has the formal solution
$$\begin{pmatrix} x(t) \\ v(t) \end{pmatrix} = \begin{pmatrix} \langle x(t) \rangle \\ \langle v(t) \rangle \end{pmatrix} + \frac{\sqrt{2D}}{\tau} \int_0^t \exp(-\mathcal{M}(t - t')) \begin{pmatrix} 0 \\ \eta(t') \end{pmatrix} dt', \qquad (S56)$$
where
$$\begin{pmatrix} \langle x(t) \rangle \\ \langle v(t) \rangle \end{pmatrix} = \exp(-\mathcal{M} t) \begin{pmatrix} x_0 \\ v_0 \end{pmatrix} \qquad (S57)$$
is the solution for the mean of eq. (S54) (using $\langle \eta(t) \rangle = 0$). The covariance matrix now becomes
$$C^*(t, \tilde t) = \frac{2D}{\tau^2} \int_0^t dt' \int_0^{\tilde t} dt'' \left[ \exp(-\mathcal{M}(t - t')) \right]_{12} \left[ \exp(-\mathcal{M}(\tilde t - t'')) \right]_{12} \langle \eta(t') \eta(t'') \rangle. \qquad (S58)$$
Without loss of generality, we assume $t < \tilde t$ and carry out the integral over $t''$ above to find
$$C^*(t, \tilde t) = \frac{2D}{\tau^2} \int_0^t dt' \left[ \exp(-\mathcal{M}(t - t')) \right]_{12} \left[ \exp(-\mathcal{M}(\tilde t - t')) \right]_{12}. \qquad (S59)$$
Using, for $\omega^2 > 0$, the explicit form of the matrix exponential above as provided by Nørrelykke et al. [56],
$$\exp(-\mathcal{M} t) = \exp(-\theta t) [\cos(\omega t) I + \sin(\omega t) J], \qquad (S60)$$
with $I$ the 2-by-2 identity matrix and
$$J = \begin{pmatrix} \frac{\theta}{\omega} & \frac{1}{\omega} \\ -\frac{\omega^2 + \theta^2}{\omega} & -\frac{\theta}{\omega} \end{pmatrix}, \qquad (S61)$$
eq. (S59) becomes
$$C^*(t, \tilde t) = \frac{8 D \theta^2}{\omega^2} \int_0^t dt' \exp(-\theta(t - t')) \exp(-\theta(\tilde t - t')) \sin[\omega(t - t')] \sin[\omega(\tilde t - t')]. \qquad (S62)$$
Carrying out the integral above we arrive at our final expression for the covariance matrix of the DHO:
$$C^*(t, \tilde t) = \frac{2 D \theta}{\omega (\theta^2 + \omega^2)} \exp(-\theta |\tilde t - t|) \left\{ \omega \cos[\omega(\tilde t - t)] + \theta \sin[\omega |\tilde t - t|] \right\} + \frac{2 D \theta}{\omega^2} \exp(-\theta(\tilde t + t)) \left( \frac{\theta^2}{\theta^2 + \omega^2} \cos[\omega(\tilde t + t)] - \cos[\omega(\tilde t - t)] - \frac{\omega \theta}{\theta^2 + \omega^2} \sin[\omega(\tilde t + t)] \right). \qquad (S63)$$
In the limit $\omega \to 0$ (critical damping) we have
$$C^*(t, \tilde t) = \frac{2D}{\theta} \left[ \exp(-\theta(\tilde t - t)) \left( 1 + \theta(\tilde t - t) \right) - \exp(-\theta(\tilde t + t)) \left( 1 + \theta(\tilde t + t) + 2 \theta^2 \tilde t t \right) \right].
(S64)

To arrive at this result we made a Taylor series expansion of the general expression to second order in $\omega$. We refrain from attempting to obtain an analytic expression for the inverse covariance matrix of the DHO, as this appears a daunting task beyond the scope of the current study.

C.3 Fractional Brownian motion
Our third example is one-dimensional FBM, which is a zero-mean Gaussian process with autocorrelation function [57]
$$v_{ij} = \langle x(t_i) x(t_j) \rangle = c \left( t_i^{2H} + t_j^{2H} - |t_i - t_j|^{2H} \right) \qquad (S65)$$
at discrete times $t_i = i\epsilon$, where $H$ denotes the Hurst parameter [58]. For $H = 1/2$, FBM becomes standard BM. Indeed, if we set $H = 1/2$ in eq. (S65) we find that $v_{ij} = c[(t_i + t_j) - |t_i - t_j|] = 2c \min(t_i, t_j)$, which is identical to eq. (S33) if we choose $c = D$. The inverse of the covariance matrix in eq. (S65) is (currently) not known analytically.

From eq. (S65) we get the MSD, for $t_i = t_j$ (and $x(0) = 0$), as
$$\langle x^2(t) \rangle = \theta_1 t^{\theta_2}, \qquad (S66)$$
where $\theta_1 = 2c$ and $\theta_2 = 2H$, i.e., the MSD has a sublinear (for $H < 1/2$) or superlinear (for $H > 1/2$) dependence on time, $t$.

C.4 Continuous time random walk (CTRW)
Our last example is the CTRW in one dimension. Such a process is defined through a waiting time density, $\psi(\tau)$, and a jump length probability density, $\zeta(\ell)$ [28]. In our case we choose
$$\psi(\tau) = \frac{\alpha}{\tau^*} (1 + \tau/\tau^*)^{-1-\alpha} \qquad (S67)$$
with $0 < \alpha < 1$, so that the average waiting time $\langle \tau \rangle$ is infinite. The jump length probability density is chosen to be a Gaussian,
$$\zeta(\ell) = \frac{1}{\sqrt{2\pi}\, a} \exp\left( -\frac{\ell^2}{2a^2} \right), \qquad (S68)$$
with variance $a^2$. For such a process, the MSD follows (for long times) [28]
$$\langle x^2(t) \rangle = \theta_1 t^{\theta_2} \qquad (S69)$$
(with $x(0) = 0$), where
$$\theta_1 = \frac{a^2}{\Gamma(1 + \alpha) \Gamma(1 - \alpha) (\tau^*)^\alpha} \qquad (S70)$$
and
$$\theta_2 = \alpha. \qquad (S71)$$

D Simulation procedures
In this section we provide details about the methods used to generate the data for the prototypical example systems introduced in section C. Simulations ran to a stop time $t_{\mathrm{stop}}$. All simulation parameters are listed in section D.5.

D.1 Brownian motion (BM)

BM in one dimension is simulated using random jump lengths drawn from a normal distribution. In some detail, we start by taking the cumulative sum of $N$ random numbers from a Gaussian distribution with zero mean and variance $a^2$, and square each element of the sum. Each step increments time by $\epsilon$. This is repeated $M$ times, summed and averaged. In short, the MSD was computed as
$$\bar y_i = \frac{1}{M} \sum_{m=1}^{M} \left[ \sum_{n=1}^{i} r_n^{(m)} \right]^2, \qquad (S72)$$
where $r_n^{(m)}$ is a random number drawn from a normal distribution, associated with the length of the $n$th jump of trajectory $m$. The diffusion constant for this type of process is $D = a^2 / (2\epsilon)$.

D.2 Damped harmonic oscillation (DHO)
When simulating the harmonic oscillation in a heat bath, see eq. (S42), we follow the procedure described by Nørrelykke and Flyvbjerg [56] (at critical damping, $\omega = 0$).

D.3 Fractional Brownian motion (FBM)
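For moderate $N$, FBM trajectories with the covariance of eq. (S65) can also be generated exactly by Cholesky factorization of the covariance matrix. This $O(N^3)$ approach is a simple alternative to the faster Davies-Harte algorithm used for the production runs described below; the sketch uses our own illustrative parameter values (e.g. the Hurst exponent):

```python
import numpy as np

rng = np.random.default_rng(4)
N, H, c, eps = 256, 0.3, 1.0, 1.0          # illustrative values
t = eps * np.arange(1, N + 1)

# FBM covariance matrix, eq. (S65)
ti, tj = np.meshgrid(t, t, indexing="ij")
V = c * (ti**(2 * H) + tj**(2 * H) - np.abs(ti - tj)**(2 * H))

# Exact sampling: x = L z, with V = L L^T and z standard normal
L = np.linalg.cholesky(V)
M = 2000
x = rng.standard_normal((M, N)) @ L.T      # M trajectories, each of length N

msd = (x**2).mean(axis=0)
# Compare with <x^2(t)> = 2 c t^{2H}, eq. (S66)
print(np.max(np.abs(msd / (2 * c * t**(2 * H)) - 1.0)))
```

Since each row of `x` has covariance $L L^T = V$ by construction, the sampled MSD should match eq. (S66) to within the statistical error of the $M$ trajectories.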
For the FBM simulations we used an algorithm by Davies and Harte [59, 60]. When fitting the model in eq. (S66), we include only time points $t \geq T_1$, since this model prediction for the MSD, as for CTRW (see section D.4), is only valid for large simulation times.

D.4 Continuous time random walk (CTRW)
To generate the CTRW data we move a "particle" randomly, with a step length drawn from the Gaussian probability density, eq. (S68), at each step, and increment time by a waiting time $\tau$ drawn from the power-law density in eq. (S67). In more detail: while the process time, $t$, is smaller than the designated stop time, we repeat the following procedure to generate one trajectory $m$:

1. Draw a random waiting time, $\tau$, from the power-law density in eq. (S67).
2. Move the particle by adding to the current displacement a random number $r$ drawn from a normal distribution.
3. Update the time $t$ by $\tau$.

The procedure is repeated $M$ times and averaged over, to yield the MSD. Since the prediction in eq. (S69) is only valid for $t \gg \tau^*$, we include, for fitting purposes, only time points $t \geq T_1$ in the $\chi^2$ expression, eq. (S1), and in the associated parameter covariance estimation formula, eq. (4) in the main text.

D.5 Simulation parameters
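The three-step CTRW procedure of section D.4 can be sketched as follows, recording the displacement on a fixed grid of sampling times. The waiting times are drawn from eq. (S67) by inverse transform sampling; all parameter values here are illustrative, not those used for the figures:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, tau_star, a = 0.7, 1.0, 1.0        # illustrative parameter values
t_stop, M = 1000.0, 300
t_sample = np.linspace(10.0, t_stop, 50)  # sampling times (only t >> tau* is used for fitting)

msd = np.zeros_like(t_sample)
for m in range(M):
    t, x, pos = 0.0, 0.0, np.zeros_like(t_sample)
    while t < t_stop:
        # 1. waiting time from eq. (S67): inverting the CDF 1 - (1 + tau/tau*)^(-alpha)
        tau = tau_star * (rng.uniform() ** (-1.0 / alpha) - 1.0)
        # the particle sits at x during [t, t + tau); record it at covered sampling times
        pos[(t_sample >= t) & (t_sample < t + tau)] = x
        # 2. Gaussian jump, eq. (S68); 3. advance the clock
        x += rng.normal(0.0, a)
        t += tau
    msd += pos**2
msd /= M
print(msd[-1])   # ensemble-averaged squared displacement at the last sampling time
```

For long times the recorded MSD should grow as $\theta_1 t^\alpha$, eq. (S69), with large trajectory-to-trajectory fluctuations characteristic of CTRW.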
Below we list the simulation parameters used in all simulations in the main text and for the Supplementary Figures. We also give the value of the first sampling time, $T_1$, used in the fit procedure (some of the functional forms used for fitting are only valid for "large" times).

• BM. Time increment, $\epsilon = 1$ (dimensionless). Step length variance, $a^2 = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\epsilon$. First sampling time, $T_1 = \epsilon$.

• DHO. Spring constant, $\kappa = 1$ (dimensionless). Mass, $m = 1$ (dimensionless). Initial position, $x_0 = 1$ (dimensionless). Thermal energy, $k_B T = 10^{-}$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 20\,\omega_0^{-1}$ (with $\omega_0 = \sqrt{\kappa/m} = 1$). First sampling time, $T_1 = \omega_0^{-1}$.

• FBM. Hurst exponent, $H = 1/2$, unless stated otherwise. Time increment, $\epsilon = 1$ (dimensionless). Prefactor in the covariance matrix, $c = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\epsilon$. First sampling time, $T_1 = 200\epsilon$.

• CTRW. Power-law exponent, $\alpha = 0.$ . Step length variance, $a^2 = 1$ (dimensionless). Characteristic time scale, $\tau^* = 1$ (dimensionless). Simulation stop time, $t_{\mathrm{stop}} = 10\tau^*$. First sampling time, $T_1 = 10\tau^*$.

E Bias effects in parameter estimation
In this section, we provide analytical expressions for the bias in the parameter (diffusion constant) estimation for BM. We find that for BM the CCM method has a bias which increases strongly with the number of sampling times, $N$. In contrast, the WLS method has a (small) bias which is independent of $N$ for large $N$. To keep the notation compact, we leave summations over repeated indices implicit in this section (where no confusion can occur).

E.1 The origin of bias
In general the bias, i.e., the expected difference between some observable based on sample estimates and the "true" value of that observable, can be written as a series expansion in $1/M$, where $M$ is the number of trajectories [61]. To understand why this is so in the present context, we recall that any sample mean or sample covariance, $\bar Q_{ijk\ldots}$ (where $i$, $j$, $k$, etc. label sampling times), is an average (normalized sum) over the $M$ trajectories. The multivariate central limit theorem tells us that for large $M$ we can, for such averages, write $\bar Q_{ijk\ldots} = Q^*_{ijk\ldots} + \gamma_{ijk\ldots}/\sqrt{M}$, where $\gamma_{ijk\ldots}$ is a zero-mean "noise". Therefore any observable, $O$, which is a function of one or several such sample estimates (the optimal parameters $\hat\theta$ and their associated covariance matrix $\hat\Delta$, see previous sections, are examples of such observables) will (schematically) have a Taylor series expansion of the form
$$O = O^* + \sum_{k=1}^{\infty} \frac{A_k}{\sqrt{M}\, M^{k-1}} + \sum_{k=1}^{\infty} \frac{B_k}{M^k} \qquad (S73)$$
for large $M$. The first term in the Taylor expansion is the sought quantity, $O^*$. Considering the remaining terms, we note that, by construction, $\langle A_1 \rangle = 0$, and hence the first non-zero term in the expectation value of the expression above is $\langle B_1 \rangle / M \propto 1/M$. For the case that the observable, $O$, is a function of more than one independent sample estimate, we have $\langle A_k \rangle = 0$ for all $k$. Note, however, that if $O$ is a function of several sample estimates which are dependent, then in general $\langle A_k \rangle \neq 0$ for $k \geq 2$. We can safely remove the first bias term with a jackknife procedure [46], see section G. Higher order bias terms can also be removed formally; however, already at the second order bias reduction level the computational costs become considerable.

E.2 Bias in parameter estimation of CCM for linear fit functions
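The first-order $1/M$ bias term discussed in section E.1 can be removed in practice with the standard delete-one jackknife (the detailed form used in this work is given in section G). A generic sketch, assuming only that the estimator is a function mapping the set of trajectories to a number (the example estimator is our own illustration):

```python
import numpy as np

def jackknife_bias_reduced(data, estimate):
    """First-order jackknife: M * O(all data) - (M-1) * mean of delete-one estimates."""
    M = data.shape[0]
    full = estimate(data)
    loo = np.array([estimate(np.delete(data, m, axis=0)) for m in range(M)])
    return M * full - (M - 1) * loo.mean()

def plug_in(d):
    # Plug-in estimate of (variance)^2; as a nonlinear function of sample
    # estimates it carries an O(1/M) bias, cf. eq. (S73)
    return d.var() ** 2

rng = np.random.default_rng(6)
data = rng.normal(0.0, 1.0, size=(2000, 1))   # true (variance)^2 = 1
print(plug_in(data), jackknife_bias_reduced(data, plug_in))
```

The same wrapper can be applied directly to $\hat\phi_{ab}$ of eq. (4) in the main text, by letting `estimate` carry out the full fit on the retained trajectories.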
Consider equations (S1) and (S4). We write the sample estimator of the covariance matrix, eq. (S39), and the exact one, $Q^*$, as related by
$$\bar Q_{ij} = Q^*_{ij} + \eta_{ij}, \qquad (S74)$$
where $\eta$ represents their deviation. We seek the "noise" in the inverse, $(\bar Q^{-1})_{ij}$. Using the normalization condition, and writing
$$(\bar Q^{-1})_{ij} = (Q^{*-1})_{ij} + \xi_{ij}, \qquad (S75)$$
we get
$$I = \bar Q \bar Q^{-1} = (Q^* + \eta)(Q^{*-1} + \xi) = I + \eta Q^{*-1} + Q^* \xi + \eta \xi. \qquad (S76)$$
Thus, to first order, $\eta Q^{*-1} + Q^* \xi = 0$, and by the definition $\eta = \bar Q - Q^*$:
$$\xi = Q^{*-1} - Q^{*-1} \bar Q\, Q^{*-1}. \qquad (S77)$$
Using eq. (S75) in eqs. (S1) and (S4) yields
$$\hat\theta = \frac{\bar{\mathbf{y}}^T (Q^{*-1} + \xi) \mathbf{t}}{\mathbf{t}^T (Q^{*-1} + \xi) \mathbf{t}} \approx \frac{1}{\mathbf{t}^T Q^{*-1} \mathbf{t}} \left( \bar{\mathbf{y}}^T Q^{*-1} \mathbf{t} + \bar{\mathbf{y}}^T \xi \mathbf{t} - \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}}\, \mathbf{t}^T \xi \mathbf{t} \right), \qquad (S78)$$
where we made a series expansion to first order in $\xi$. Using eq. (S77) we get
$$\hat\theta = \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}} \underbrace{ -\, \frac{\bar{\mathbf{y}}^T Q^{*-1} \bar Q\, Q^{*-1} \mathbf{t}}{\mathbf{t}^T Q^{*-1} \mathbf{t}} + \frac{\bar{\mathbf{y}}^T Q^{*-1} \mathbf{t}}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2}\, \mathbf{t}^T Q^{*-1} \bar Q\, Q^{*-1} \mathbf{t} }_{\text{bias} \,=\, B}. \qquad (S79)$$
Note that the expectation value of the first term on the right hand side evaluates to $\theta^*$; hence the additional terms constitute the bias, whose expectation value, $\langle B \rangle$, we now seek. It is convenient to write eq. (S79) in component form (repeated indices are summed over), with $B = B_1 + B_2$, where
$$B_1 = -\frac{\bar y_k (Q^{*-1})_{ki} \bar Q_{ij} (Q^{*-1})_{jl} t_l}{\mathbf{t}^T Q^{*-1} \mathbf{t}}, \qquad (S80a)$$
$$B_2 = \frac{\bar y_i (Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} \bar Q_{ml} (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2} \qquad (S80b)$$
(the component form of the quantity appearing in the denominators above is $\mathbf{t}^T Q^{*-1} \mathbf{t} = t_p (Q^{*-1})_{pq} t_q$). We thus see that the expected bias, $\langle B \rangle$, is determined by expectation values of the form ($a$, $b$, $c$, ... label trajectories)
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \left\langle \sum_{a=1}^{M} y_k^{(a)} \left[ \sum_{b=1}^{M} y_i^{(b)} y_j^{(b)} - \frac{1}{M} \sum_{b=1}^{M} y_i^{(b)} \sum_{c=1}^{M} y_j^{(c)} \right] \right\rangle. \qquad (S81)$$

E.3 Bias in parameter estimation of CCM for BM
Let us now consider the expected bias for CCM fitting of BM, using the formal expressions of section E.2. We have
$$\langle y_k^{(a)} \rangle = \left\langle \left[ x_k^{(a)} - x^{(a)}(0) \right]^2 \right\rangle = \sigma^{*2}_k \equiv V^*_{kk}, \qquad (S82)$$
where in the last step we used eq. (S33). Also, $\langle x_i^{(a)} - x^{(a)}(0) \rangle = 0$, and since different realizations (trajectories) are independent we have
$$\langle x_i^{(a)} x_j^{(b)} \rangle = \delta_{a,b} V^*_{ij}. \qquad (S83)$$
Higher order moments can be calculated using Wick's theorem, eq. (S36) (for large $i$, $x_i^{(a)}$ is a sum of many small increments, and from the central limit theorem it follows that the $x_i^{(a)}$ are Gaussian). We have
$$\langle y_i^{(a)} y_j^{(b)} \rangle = \langle (x_i^{(a)})^2 (x_j^{(b)})^2 \rangle = \langle x_i^{(a)} x_i^{(a)} \rangle \langle x_j^{(b)} x_j^{(b)} \rangle + 2 \langle x_i^{(a)} x_j^{(b)} \rangle^2 = \sigma^{*2}_i \sigma^{*2}_j + 2 (V^*_{ij})^2 \delta_{a,b}. \qquad (S84)$$
In the same way, for higher order terms we get
$$\langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle = \langle x_k^{(a)} x_k^{(a)} x_i^{(b)} x_i^{(b)} x_j^{(c)} x_j^{(c)} \rangle = [\text{tedious enumeration of all pairings}] = \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 \sigma^{*2}_k (V^*_{ij})^2 \delta_{b,c} + 2 \sigma^{*2}_j (V^*_{ki})^2 \delta_{a,b} + 2 \sigma^{*2}_i (V^*_{kj})^2 \delta_{a,c} + 8 V^*_{ki} V^*_{kj} V^*_{ij} \delta_{a,b} \delta_{b,c} \delta_{a,c} \qquad (S85)$$
(no summation over repeated indices here). Eq. (S81) now becomes
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \Big[ \underbrace{\sum_{a,b} \langle y_k^{(a)} y_i^{(b)} y_j^{(b)} \rangle}_{U_1} - \frac{1}{M} \underbrace{\sum_{a,b,c} \langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle}_{U_2} \Big]. \qquad (S86)$$
Using eq. (S85) we get
$$U_1 = \sum_{a,b} \langle y_k^{(a)} y_i^{(b)} y_j^{(b)} \rangle = M^2 \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 M^2 \sigma^{*2}_k (V^*_{ij})^2 + 2 M \sigma^{*2}_j (V^*_{ki})^2 + 2 M \sigma^{*2}_i (V^*_{kj})^2 + 8 M V^*_{ki} V^*_{ij} V^*_{kj}, \qquad (S87a)$$
$$U_2 = \sum_{a,b,c} \langle y_k^{(a)} y_i^{(b)} y_j^{(c)} \rangle = M^3 \sigma^{*2}_k \sigma^{*2}_i \sigma^{*2}_j + 2 M^2 \left[ \sigma^{*2}_k (V^*_{ij})^2 + \sigma^{*2}_j (V^*_{ki})^2 + \sigma^{*2}_i (V^*_{kj})^2 \right] + 8 M V^*_{ki} V^*_{ij} V^*_{kj}. \qquad (S87b)$$
Combining eq. (S87) with eq. (S86) results in
$$\langle \bar y_k \bar Q_{ij} \rangle = \frac{1}{M(M-1)} \left[ (2M^2 - 2M)\, \sigma^{*2}_k (V^*_{ij})^2 + (8M - 8)\, V^*_{ki} V^*_{ij} V^*_{kj} \right] = 2 \sigma^{*2}_k (V^*_{ij})^2 + \frac{8}{M} V^*_{ki} V^*_{ij} V^*_{kj}. \qquad (S88)$$
Using eq. (S88) in eq. (S80a) we find
$$\langle B_1 \rangle = -\frac{\sigma^{*2}_k (Q^{*-1})_{ki} \delta_{i,l} t_l + \frac{8}{M} V^*_{ki} V^*_{ij} V^*_{kj} (Q^{*-1})_{ki} (Q^{*-1})_{jl} t_l}{\mathbf{t}^T Q^{*-1} \mathbf{t}}, \qquad (S89)$$
where we used that $Q^*_{ij} (Q^{*-1})_{jl} = \delta_{i,l}$. Now consider $B_2$, eq. (S80b). We write eq. (S88) according to (see also eq. (S37))
$$\langle \bar y_i \bar Q_{ml} \rangle = \sigma^{*2}_i Q^*_{ml} + \frac{8}{M} V^*_{im} V^*_{ml} V^*_{li}. \qquad (S90)$$
Eq. (S80b) now becomes
$$\langle B_2 \rangle = \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} \left[ \sigma^{*2}_i Q^*_{ml} + \frac{8}{M} V^*_{im} V^*_{ml} V^*_{li} \right] (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2} = \frac{\sigma^{*2}_i (Q^{*-1})_{ik} t_k}{\mathbf{t}^T Q^{*-1} \mathbf{t}} + \frac{8}{M} \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} V^*_{im} V^*_{ml} V^*_{li} (Q^{*-1})_{ln} t_n}{(\mathbf{t}^T Q^{*-1} \mathbf{t})^2}. \qquad (S91)$$
Combining $B_1$ and $B_2$ we arrive at an expression for the predicted first order bias (eq. (S79)) for the choice $R = R^{[\mathrm{CCM}]}$ (note the cancellation of the first terms):
$$\langle B \rangle = \frac{8}{M\, \mathbf{t}^T Q^{*-1} \mathbf{t}} \left( \frac{(Q^{*-1})_{ik} t_k\; t_j (Q^{*-1})_{jm} V^*_{im} V^*_{ml} V^*_{li} (Q^{*-1})_{ln} t_n}{\mathbf{t}^T Q^{*-1} \mathbf{t}} - V^*_{ki} V^*_{ij} V^*_{jk} (Q^{*-1})_{ki} (Q^{*-1})_{jl} t_l \right), \qquad (S92)$$
which can be evaluated analytically. With this in mind we use eq. (S33), with $t_i = i\epsilon$, and eq. (S41), in eq. (S92). When evaluating the associated sums over repeated indices in eq. (S92), one uses
$$\min(i,j) = \begin{cases} i, & \text{if } i \leq j, \\ j, & \text{if } i > j, \end{cases} \qquad (S93)$$
and then splits the sums accordingly.
This splitting leads to sums of the form
\[ I(m,p) = \sum_{k=1}^{N} \frac{k^m}{(2k-1)^p}, \quad (S94) \]
where m and p are positive integers. These sums are rewritten according to
\[ I(m,p) = \frac{1}{2^m} \sum_{k=1}^{N} \frac{\left[ (2k-1) + 1 \right]^m}{(2k-1)^p} = \frac{1}{2^m} \sum_{q=0}^{m} \binom{m}{q} \sum_{k=1}^{N} (2k-1)^{q-p}, \quad (S95) \]
where we used the binomial theorem. The full calculation is tedious but straightforward. The final result is
\[ \langle B \rangle = \frac{D}{M} G(N), \quad (S96a) \]
\[ G(N) = \frac{b - a}{d}, \quad (S96b) \]
\[ a = N s_2 - s_1^2, \quad (S96c) \]
\[ b = \frac{1}{16} \left( 3 s_1 - s_3 \right), \quad (S96d) \]
\[ d = s_1 s_2, \quad (S96e) \]
\[ s_n = \sum_{k=1}^{N} \frac{1}{(2k-1)^n}. \quad (S96f) \]

E.3.1 Asymptotic expansion
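The large-N expansion below relies on the standard asymptotics of the harmonic sum, eq. (S98); as a quick numerical sanity check (plain Python, illustrative values of N):

```python
import math

# Check sum_{k=1}^N 1/k = gamma + ln N + 1/(2N) + O(1/N^2)  (eq. (S98)).
gamma = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(N):
    # partial sum of the harmonic series
    return sum(1.0 / k for k in range(1, N + 1))

for N in (10, 100, 1000):
    approx = gamma + math.log(N) + 1.0 / (2 * N)
    print(N, harmonic(N) - approx)   # residual shrinks like 1/N^2
```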
Let us now investigate eq. (S96) for large N. To that end, we write \( s_n \), defined above, according to
\[ s_n = \sum_{k=1}^{N} \frac{1}{(2k-1)^n} = \sum_{k=1}^{2N} \frac{1}{k^n} - \frac{1}{2^n} \sum_{k=1}^{N} \frac{1}{k^n}. \quad (S97) \]
In eq. (S96) there are three sums, \( s_1 \), \( s_2 \) and \( s_3 \). Out of these, only \( s_1 \) grows with N (\( s_2 \) and \( s_3 \) converge), and hence the contribution involving \( s_1 \) is the only one that needs to be kept for large N. From eq. (0.131) in Gradshteyn et al. [62] we have
\[ \sum_{k=1}^{N} \frac{1}{k} = \gamma + \ln N + \frac{1}{2N} + O\!\left( \frac{1}{N^2} \right), \quad (S98) \]
where \( \gamma \approx 0.5772 \) is the Euler-Mascheroni constant. Combining the result above with eq. (S97) and eq. (S96) we arrive at the asymptotic expression
\[ G(N) \approx - \frac{2N}{\ln N + \gamma + 2 \ln 2}, \quad (S99) \]
where we used \( \ln ab = \ln a + \ln b \). For large N, eq. (S99) is a good approximation to the exact bias, eq. (S96), see Supplementary Figure S3.

E.4 Bias in parameter estimation of WLS for BM

Let us now consider the second case, eq. (S5), of choosing R. According to eqs. (S3) and (S5) we have
\[ \hat{\theta} = \frac{ \bar{y}^T Q^{-1}_{\mathrm{new}} t }{ t^T Q^{-1}_{\mathrm{new}} t }, \quad (S100) \]
where
\[ Q_{\mathrm{new},ij} = Q_{ij} \delta_{i,j}, \quad (S101) \]
\[ Q^*_{\mathrm{new},ij} = Q^*_{ij} \delta_{i,j}, \quad (S102) \]
\[ (Q^{-1}_{\mathrm{new}})_{ij} = \delta_{i,j} / Q_{ij}, \quad (S103) \]
\[ (Q^{*-1}_{\mathrm{new}})_{ij} = \delta_{i,j} / Q^*_{ij}. \quad (S104) \]
The calculation starting from eq. (S77) and leading to eq. (S79) is identical to before; one just replaces Q by \( Q_{\mathrm{new}} \), and similarly for the exact quantities. Since the new matrices are diagonal, eq. (S80) becomes (we here reintroduce explicit sums for the sake of clarity)
\[ B_1 = - \frac{ \sum_k \bar{y}_k Q_{kk}\, t_k / (Q^*_{kk})^2 }{ \sum_q t_q^2 / Q^*_{qq} }, \quad (S105a) \]
\[ B_2 = \frac{ \sum_{j,k} \left[ \bar{y}_k t_k / Q^*_{kk} \right] t_j^2\, Q_{jj} / (Q^*_{jj})^2 }{ \left( \sum_q t_q^2 / Q^*_{qq} \right)^2 }. \quad (S105b) \]
Also the calculation from eq. (S81) leading up to eq. (S88) is identical. From eq. (S105) we see that we need
\[ \langle \bar{y}_k Q_{jj} \rangle = 2 \sigma^*_k (V^*_{jj})^2 + \frac{8}{M} (V^*_{kj})^2 V^*_{jj} \quad (j, k \text{ fixed}), \quad (S106a) \]
\[ \langle \bar{y}_k Q_{kk} \rangle = 2 (\sigma^*_k)^3 + \frac{8}{M} (\sigma^*_k)^3 \quad (k \text{ fixed}). \quad (S106b) \]
Substituting eq. (S106b) into eq. (S105a), and using eq. (S37), \( Q^*_{kk} = 2 (V^*_{kk})^2 = 2 (\sigma^*_k)^2 \), together with \( \sigma^*_k = 2 D t_k \), we get (with sums explicitly written)
\[ \langle B_1 \rangle = - \frac{ \sum_k \left[ 2 (\sigma^*_k)^3 + \frac{8}{M} (\sigma^*_k)^3 \right] t_k / (Q^*_{kk})^2 }{ \sum_q t_q^2 / Q^*_{qq} } = - \frac{ \left( 1 + \frac{4}{M} \right) \sum_k \frac{1}{4D} }{ \sum_k \frac{1}{8D^2} } = -2D \left( 1 + \frac{4}{M} \right). \quad (S107) \]
In much the same way, we insert eq. (S106a) into eq. (S105b):
\[ \langle B_2 \rangle = \frac{ \sum_{j,k} \left[ 2 \sigma^*_k (\sigma^*_j)^2 + \frac{8}{M} (V^*_{kj})^2 \sigma^*_j \right] t_k t_j^2 / \left[ Q^*_{kk} (Q^*_{jj})^2 \right] }{ \left( \sum_q t_q^2 / Q^*_{qq} \right)^2 } = \frac{ \sum_{j,k} \left[ \frac{1}{32 D^3} + \frac{ (V^*_{kj})^2 }{ 32 M D^5\, t_k t_j } \right] }{ N^2 / (64 D^4) } = 2D + \frac{2}{M D N^2} \underbrace{ \sum_{j=1}^{N} \sum_{k=1}^{N} \frac{ (V^*_{kj})^2 }{ t_k t_j } }_{I}. \quad (S108) \]
Consider the double sum, I, in eq. (S108). With time steps \( t_j = \epsilon j \) and \( V^*_{ij} = 2 D \epsilon \min(i,j) \), separating the sums into \( j = k \) and \( j \ne k \) gives
\[ I = 4 D^2 \sum_{k=1}^{N} \sum_{j=1}^{N} \frac{ [\min(j,k)]^2 }{ jk } = 4 D^2 \sum_{k=1}^{N} \left( 1 + \frac{2}{k} \sum_{j=1}^{k-1} j \right) = 4 D^2 \sum_{k=1}^{N} \left( 1 + \frac{2}{k} \frac{k(k-1)}{2} \right) = 4 D^2 \sum_{k=1}^{N} k = 2 D^2 N (N+1), \quad (S109) \]
which inserted into eq. (S108) yields
\[ \langle B_2 \rangle = 2D + \frac{4D}{M} \frac{N+1}{N}, \quad (S110) \]
from which, together with eq. (S107), we get the complete full bias:
\[ \langle B \rangle = \langle B_1 \rangle + \langle B_2 \rangle = \frac{4D}{M} \left( \frac{N+1}{N} - 2 \right). \quad (S111) \]
Thus,
\[ \hat{\theta} - \theta^* = - \frac{4D}{M} \left( 1 - \frac{1}{N} \right). \quad (S112) \]
Note that the bias is independent of N for large N.

E.5 Lack of bias for BMALS
We now consider our third and final choice of R-matrix for BM. Since \( \langle \bar{y}_i \rangle = y^*_i \) and R here is a true inverse covariance matrix (and hence not a sample estimate, see eq. (S6)), it follows immediately, by taking the expectation value of eq. (S3), that the BMALS parameter estimate is unbiased.

E.6 Lack of bias in parameter estimation of CCM for DHO
For the DHO problem we choose as our observable the particle position, i.e., we use \( y_k^{(m)} = x_k^{(m)} \), where m labels different trajectories. For a good fit, the DHO parameter estimates \( \hat{\theta} \) are unbiased for CCM. To see this, consider the CCM minimization criterion, eq. (S26), for DHO, which we write as
\[ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{-1} \left( f(\hat{\theta}) - y^* \right) = \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{-1} \left( \bar{y} - y^* \right). \quad (S113) \]
As in previous subsections, we then expand the inverse sample covariance matrix around its true value, i.e., we write \( Q^{-1} = Q^{*-1} + \xi \), where \( \xi \) is given in eq. (S77). By expanding the right-hand side of eq. (S113) in \( f(\hat{\theta}) - y^* \), \( \bar{y} - y^* \) and \( \xi \), we arrive at
\[ \underbrace{ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} Q^{*-1} \left( f(\hat{\theta}) - y^* \right) }_{F_a(\hat{\theta})} \quad (S114) \]
\[ = \underbrace{ \frac{\partial f}{\partial \theta_a} \bigg|_{\theta = \hat{\theta}} \left[ \left( Q^{*-1} - Q^{*-1} Q Q^{*-1} \right) \left( f(\hat{\theta}) - y^* \right) + Q^{*-1} \left( \bar{y} - y^* \right) - \left( Q^{*-1} - Q^{*-1} Q Q^{*-1} \right) \left( \bar{y} - y^* \right) \right] }_{G_a(\hat{\theta})}. \quad (S115) \]
Since \( F_a(\hat{\theta}) \) involves only the true covariance matrix and \( y^* \), the solution to \( F_a(\hat{\theta}) = 0 \) yields the true parameter value, i.e., we have \( F_a(\theta^*) = 0 \). If the fit is good, we obtain the solution to eq. (S114) using a Taylor expansion, i.e., we write \( F_a(\hat{\theta}) \approx F_a(\theta^*) + \sum_b w_{ab} (\hat{\theta}_b - \theta^*_b) = \sum_b w_{ab} (\hat{\theta}_b - \theta^*_b) \), where \( w_{ab} = \partial F_a(\hat{\theta}) / \partial \hat{\theta}_b |_{\hat{\theta} = \theta^*} \). Inserting this into eq. (S114) and solving for \( \hat{\theta}_a \), we get
\[ \hat{\theta}_a = \theta^*_a + \underbrace{ \sum_b (w^{-1})_{ab}\, G_b(\hat{\theta}) }_{B_a}. \quad (S116) \]
Thus, the bias in the estimated parameter \( \hat{\theta}_a \) is determined by the expectation value of \( B_a \). An application of Wick's theorem for Gaussian variables yields
\[ \langle \bar{y}_k Q_{ij} \rangle = \langle \bar{y}_k \rangle Q^*_{ij}. \quad (S117) \]
This result is a direct consequence of the fact that the positions at different times for the DHO process are distributed according to a multivariate Gaussian. This Gaussianity, in turn, follows from the fact that the harmonic oscillator position is a linear function of the imposed Gaussian noise, see eq. (S42). Using eq. (S117) and the facts that \( \langle \bar{y}_k \rangle = y^*_k \) and \( \langle Q_{ij} \rangle = Q^*_{ij} \), we find that \( \langle G_a(\hat{\theta}) \rangle = 0 \) and thereby that indeed \( \langle B_a \rangle = 0 \), i.e., the CCM parameter estimate for DHO does not suffer from the bias problems discussed in the previous subsections.

F Approximate distribution for the estimated parameters
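The linearization argument of this section can be illustrated with a short simulation; a minimal sketch with an arbitrary smooth, hypothetical function F (all numbers are illustrative):

```python
import numpy as np

# Illustration: a smooth nonlinear function of a sample mean becomes
# approximately Gaussian as M grows (the linearization of eq. (S119)).
rng = np.random.default_rng(1)

def skewness_of_F(M, n_rep=20000):
    # ybar = y* + eta/sqrt(M), with y* = 2.0 and Var(eta) = 1 (illustrative)
    ybar = 2.0 + rng.normal(0.0, 1.0, size=n_rep) / np.sqrt(M)
    F = np.exp(ybar)                 # hypothetical nonlinear estimator F(ybar)
    c = F - F.mean()
    return np.mean(c**3) / np.mean(c**2) ** 1.5  # sample skewness

print(skewness_of_F(10), skewness_of_F(1000))   # skewness shrinks as M grows
```

A Gaussian has zero skewness, so the shrinking sample skewness signals the approach to normality claimed in the text.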
In the main text we saw that if M (the number of trajectories) is large enough, the distribution of the estimated parameters is approximately Gaussian, see Figure 1 in the main text. To understand why this is so, we note that a set of random numbers \( \bar{y}_i \) (i = 1, ..., N) from the Gaussian distribution in eq. (S8) can be generated using
\[ \bar{y}_i = y^*_i + \frac{1}{\sqrt{M}} \eta_i, \quad (S118) \]
where \( \eta \) is a zero-mean Gaussian random vector with (M-independent) covariance matrix Q. Consider now a function \( F(\bar{y}) \), and note that the estimated parameters \( \hat{\theta}_a \) are functions of this type. We then Taylor-expand:
\[ F(\bar{y}) \approx F(y^*) + \frac{1}{\sqrt{M}} A \cdot \eta + O\!\left( \frac{1}{M} \right), \quad (S119) \]
where A is a matrix containing partial derivatives. Assuming that the second term on the right-hand side above is non-zero, that the matrix A has full rank, and that all terms of order 1/M or higher can be neglected, the distribution of F is another Gaussian. This follows from the fact that \( A \cdot \eta \) is normally distributed if \( \eta \) is drawn from a multivariate Gaussian [63].

G Jackknife bias reduction
Through data resampling, bias in data fitting can often be reduced. Let O be a parameter estimator based on a data set with M trajectories. The associated true parameter is denoted by O*. Here, we choose O as either the estimated parameters \( \hat{\theta} \), obtained by minimizing eq. (S1), or the associated covariance matrix \( \hat{\phi} \), eq. (4b) in the main text. As outlined in section E.1, one often expects such a finite data set to yield bias contributions of the form
\[ O = O^* + \frac{a}{M} + \frac{b}{M^2} + \frac{c}{M^3} + O\!\left( \frac{1}{M^4} \right). \quad (S120) \]
The bias terms can be reduced by increasing the number of data samples, M, or by using the jackknife method [46]. Let us split the sample into g groups, each of size h, and define \( O^{[-j]} \) as the parameter fitted to the data sample with the jth group removed.

G.1 First order jackknife bias reduction
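A minimal concrete illustration of the first order scheme, using the textbook example of the plug-in variance estimator, whose O(1/M) bias the jackknife removes exactly (here g = M groups of size h = 1; the data are illustrative):

```python
import numpy as np

# First order jackknife bias reduction, eq. (S121):
#   O^(1)   = (1/g) sum_j O^[-j]
#   O^(0,1) = g*O - (g-1)*O^(1)
rng = np.random.default_rng(2)
x = rng.normal(0.0, 3.0, size=50)

def plugin_var(z):
    # biased plug-in estimator; bias = -sigma^2/len(z), i.e. O(1/M)
    return np.mean((z - np.mean(z)) ** 2)

g = len(x)                                     # g groups of size h = 1
O_full = plugin_var(x)
O_loo = np.array([plugin_var(np.delete(x, j)) for j in range(g)])
O_corrected = g * O_full - (g - 1) * O_loo.mean()

print(O_full, O_corrected)   # corrected value equals the unbiased (ddof=1) variance
```

For this particular estimator the correction is exact for every sample, not just in expectation, which makes it a convenient unit test for a jackknife implementation.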
The first order bias term can be removed through repeated fitting and averaging over the sampled data set:
\[ O^{(1)} = \frac{1}{g} \sum_{j=1}^{g} O^{[-j]}, \quad (S121a) \]
\[ O^{(0,1),J} = g O - (g-1) O^{(1)}. \quad (S121b) \]
Using eq. (S120), which has bias terms proportional to \( 1/M = 1/(hg) \) for the full fit, O, and proportional to \( 1/[h(g-1)] \) for the reduced-sample estimators in eq. (S121a), we see that we are left with
\[ O^{(0,1),J} = O^* - \frac{b}{h^2 g (g-1)} - \frac{c}{h^3} \left( \frac{1}{(g-1)^2} - \frac{1}{g^2} \right) + O(g^{-4}) \approx O^* - \frac{b}{M^2} - \frac{2c}{M^3}, \quad (S122) \]
lacking the first order bias term. Although the higher order terms remain, their contribution is often smaller than that of the first order term.

G.2 Second order jackknife bias reduction
For further bias reduction we can apply a second order correction. In a similar spirit to the first order jackknife, we split the data into g groups, and define \( O^{[-j,-j']} \) as the parameter estimator based on a data set with the jth and j'th groups, each of size h, removed. Following Schucany et al. [64], we introduce
\[ O^{(2)} = \frac{2}{g(g-1)} \sum_{j < j'} O^{[-j,-j']}, \quad (S123a) \]
\[ O^{(1),J} = (g-1) O^{(1)} - (g-2) O^{(2)}, \quad (S123b) \]
\[ O^{(0,2),J} = \frac{g}{2} O^{(0,1),J} - \frac{g-2}{2} O^{(1),J}, \quad (S123c) \]
for which both the first and second order bias terms in eq. (S120) cancel, leaving
\[ O^{(0,2),J} = O^* + O(g^{-3}) \approx O^* + \frac{c}{M^3}. \quad (S124) \]

G.3 Variance for jackknife-bias-reduced estimators

In this section, we use eq. (S9) to show that \( \hat{\theta}_a - \theta^*_a \) is insensitive (to lowest order in 1/M) to the jackknifing procedure. As a consequence, the covariance estimation formula, eq. (4) in the main text, remains valid also for jackknifed parameter estimates. For later convenience, we denote the derivative in eq. (S9) by
\[ A_{a,i} = \frac{\partial \hat{\theta}_a}{\partial \bar{y}_i} \bigg|_{\bar{y} = y^*}, \quad (S125) \]
which we will use in the following.

G.3.1 First order jackknife bias reduction

To first order, the jackknife estimator is obtained by dividing the M trajectories into g groups of size h. Define the observable \( \bar{y}_{[-j],i} \) as the estimate for the observable, at point i, with group j removed. In particular,
\[ \bar{y}_{[-j],i} = \frac{1}{M-h} \sum_{m \ne m_j} y_i^{(m)} = \frac{1}{M-h} \left[ \sum_{m=1}^{M} y_i^{(m)} - \sum_{m_j} y_i^{(m_j)} \right]. \quad (S126) \]
The corresponding non-jackknifed estimator is
\[ \bar{y}_i = \frac{1}{M} \sum_{m=1}^{M} y_i^{(m)}. \quad (S127) \]
The bias of the first order jackknife estimator of \( \theta^*_a \) within the WLS-ICE method (see section A) is
\[ \theta^{(0,1),J}_a - \theta^*_a = g \hat{\theta}_a - \frac{g-1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a - \theta^*_a = \frac{1}{h} \left[ M \hat{\theta}_a - (M-h) \frac{1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a \right] - \theta^*_a \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ M (\bar{y}_i - y^*_i) - (M-h) \frac{1}{g} \sum_{j=1}^{g} \left( \bar{y}_{[-j],i} - y^*_i \right) \right] \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) - \frac{1}{g} \sum_{j=1}^{g} \left( \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) - \sum_{m_j} (y_i^{(m_j)} - y^*_i) \right) \right] \]
\[ = \sum_i A_{a,i} \frac{1}{gh} \sum_{j=1}^{g} \sum_{m_j} (y_i^{(m_j)} - y^*_i) = \sum_i A_{a,i} \frac{1}{M} \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) = \hat{\theta}_a - \theta^*_a, \quad (S128) \]
where we used eq. (S9) to get to the second and last rows, and eqs. (S126)-(S127) for the third row. Thus
\[ \theta^{(0,1),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a. \quad (S129) \]
Hence, jackknifing a parameter estimate does not change the (co)variance:
\[ \left( \theta^{(0,1),J}_a - \theta^*_a \right) \left( \theta^{(0,1),J}_b - \theta^*_b \right) = \left( \hat{\theta}_a - \theta^*_a \right) \left( \hat{\theta}_b - \theta^*_b \right). \quad (S130) \]

G.3.2 Second order jackknife bias reduction

For the second order bias removal, the M trajectories are again divided into g groups. We define, as before, \( \bar{y}_{[-j,-j'],i} \) as the estimate for the observable, at point i, with groups j and j' removed. In particular,
\[ \bar{y}_{[-j,-j'],i} = \frac{1}{M-2h} \sum_{m \ne m_j, m_{j'}} y_i^{(m)} = \frac{1}{M-2h} \left[ \sum_{m=1}^{M} y_i^{(m)} - \sum_{m_j} y_i^{(m_j)} - \sum_{m_{j'}} y_i^{(m_{j'})} \right]. \quad (S131) \]
The average over all groups for \( \theta_a \) is
\[ \theta^{(2)}_a = \frac{1}{g(g-1)} \sum_{j \ne j'} \theta^{[-j,-j']}_a. \quad (S132) \]
The second order jackknife is now (as given by eq. (S123c))
\[ \theta^{(0,2),J}_a = \frac{g}{2} \theta^{(0,1),J}_a - \frac{g-2}{2} \theta^{(1),J}_a. \quad (S133) \]
Using eq. (S123b) we note
\[ \theta^{(1),J}_a - \theta^*_a = \frac{1}{h} \left[ (M-h) \frac{1}{g} \sum_{j=1}^{g} \theta^{[-j]}_a - (M-2h) \frac{1}{g(g-1)} \sum_{j \ne j'} \theta^{[-j,-j']}_a \right] - \theta^*_a \]
\[ = \frac{1}{h} \sum_i A_{a,i} \left[ \left( 1 - \frac{1}{g} \right) - \left( 1 - \frac{2}{g} \right) \right] \sum_{m=1}^{M} (y_i^{(m)} - y^*_i) = \sum_i A_{a,i} \frac{1}{M} \sum_{m=1}^{M} (y_i^{(m)} - y^*_i), \quad (S134) \]
where we again used eq. (S9) together with eqs. (S126)-(S127) and (S131). Thus
\[ \theta^{(1),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a, \quad (S135) \]
and, since the coefficients in eq. (S133) sum to one,
\[ \theta^{(0,2),J}_a - \theta^*_a = \hat{\theta}_a - \theta^*_a. \quad (S136) \]
Thus the second order jackknife estimator has the same variance and covariance as the non-jackknifed estimator.

H Estimation of errors on estimated parameters, using jackknife and bootstrap procedures

H.1 Jackknife error estimation

In heuristic jackknife error estimation one makes use of the quantities \( O^{[-j]} \), see section G, and calculates [65, 47]
\[ \sigma_J^2 = \frac{g-1}{g} \sum_{j=1}^{g} \left[ O^{[-j]} - O^{(1)} \right]^2, \quad (S137) \]
where \( O^{(1)} \) is given in eq. (S121a). Then \( \sigma_J \) serves as an estimate of the error on the estimated parameter. Note that, in contrast to jackknife bias reduction, which is mathematically justified (based on the expected fluctuations around estimated mean values using the central limit theorem), there is in the general case no corresponding simple justification for the jackknife error estimation procedure for the present type of data.
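For the simple case where the estimator is a sample mean, with g = M groups of size h = 1, eq. (S137) reduces to the usual standard error; a minimal sketch with illustrative data:

```python
import numpy as np

# Jackknife error estimate, eq. (S137):
#   sigma_J^2 = (g-1)/g * sum_j ( O^[-j] - O^(1) )^2,  O^(1) = (1/g) sum_j O^[-j]
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=200)

g = len(x)                                                    # g groups of size h = 1
O_loo = np.array([np.delete(x, j).mean() for j in range(g)])  # estimator: sample mean
sigma_J = np.sqrt((g - 1) / g * np.sum((O_loo - O_loo.mean()) ** 2))

# For the sample mean the jackknife error equals the usual standard error exactly:
print(sigma_J, x.std(ddof=1) / np.sqrt(g))
```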
H.2 Bootstrap error estimation

In bootstrap error estimation, the scheme is:

• First, bootstrap [66, 9, 47] the original M trajectories, i.e., pick M trajectories from the original data with replacement (the same trajectory may be picked several times). Denote by \( (\tilde{y}_i^{(m)}, t_i) \) the associated observables and compute the synthetic mean value of the chosen observable, \( \bar{y}_i = M^{-1} \sum_m \tilde{y}_i^{(m)} \).

• Make a weighted least squares (WLS) fit to the synthetic MSDs with respect to the fitting parameters. This fitting yields parameters \( \tilde{\theta}_i \).

By repeating the two steps above many times (here, 100 times) we get a set of fit parameters \( \tilde{\theta}_i \) (i = 1, 2, ..., 100). From this set we simply compute the standard deviation as an estimator of the error on the fit parameters [66, 9].

I Coefficient of determination

We determine the goodness of fit by using the R² coefficient of determination, defined as
\[ R^2 = 1 - \frac{S_{\mathrm{res}}}{S_{\mathrm{tot}}}. \quad (S138) \]
The method is based on sums of squares over the N sampling points of, in our case, the mean positions or the MSD, \( \bar{y} \); hence, measuring the deviation from the sample mean in time,
\[ \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{y}_i, \quad (S139) \]
\[ S_{\mathrm{tot}} = \sum_{i=1}^{N} \left( \bar{y}_i - \bar{Y} \right)^2, \quad (S140) \]
\[ S_{\mathrm{res}} = \sum_{i=1}^{N} \left( f(t_i; \theta) - \bar{y}_i \right)^2. \quad (S141) \]
Heuristically, a model that fits the data perfectly has R² = 1, while a model that does not fit at all has R² ≪ 1, see Supplementary Figure S5.

J Settings in "Particle Tracker" plug-in

For detecting and linking particles into trajectories from the Supplementary movies S1, S5 and S6 of the study by Chenouard et al. [48], we used the ImageJ plug-in "Particle Tracker" [50] (November 2016 version) with the following settings:

• radius: 3
• cutoff: 3
• radius: 0.1
• LinkRange: 1 (default: 2)
• displacement: 10.00
• Dynamics: Brownian

and the following advanced options:

• Object features: 1.000
• dynamics: 1.000
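The bootstrap scheme of section H.2 can be sketched as follows, on synthetic Brownian-motion data; for brevity an unweighted least squares fit stands in for the WLS fit of the main text, and all parameter values are illustrative:

```python
import numpy as np

# Bootstrap error estimate for a fit parameter (scheme of section H.2):
# resample trajectories with replacement, rebuild the ensemble-averaged MSD,
# refit, and take the standard deviation of the refitted parameters.
rng = np.random.default_rng(4)
D, eps, N, M, n_boot = 0.5, 1.0, 20, 400, 100

t = eps * np.arange(1, N + 1)
x = np.cumsum(rng.normal(0.0, np.sqrt(2 * D * eps), size=(M, N)), axis=1)
y = x**2                                   # squared displacements, <y_k> = 2*D*t_k

def fit_theta(ybar):
    # least squares fit of ybar ~ theta * t (R = identity, i.e. unweighted)
    return (ybar @ t) / (t @ t)

theta_hat = fit_theta(y.mean(axis=0))      # fit to the full ensemble average
boot = np.array([fit_theta(y[rng.integers(0, M, size=M)].mean(axis=0))
                 for _ in range(n_boot)])  # resample trajectories, not time points
theta_err = boot.std(ddof=1)

print(theta_hat, theta_err)                # true value theta* = 2*D = 1.0
```

Resampling whole trajectories, rather than individual time points, is what preserves the temporal correlations that this supplement is concerned with.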