Bayesian model selection with fractional Brownian motion
Jens Krog, Lars H. Jacobsen, Frederik W. Lund, Daniel Wüstner, Michael A. Lomholt
MEMPHYS, Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, DK-5230 Odense M, Denmark
Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
(Dated: April 5, 2018)

We implement Bayesian model selection and parameter estimation for the case of fractional Brownian motion with measurement noise and a constant drift. The approach is tested on artificial trajectories and shown to make estimates that match well with the underlying true parameters, while for model selection the approach has a preference for simple models when the trajectories are finite. The approach is applied to observed trajectories of vesicles diffusing in Chinese hamster ovary cells. Here it is supplemented with a goodness-of-fit test, which is able to reveal statistical discrepancies between the observed trajectories and model predictions.
I. INTRODUCTION
The nature of the motion of particles in biological cells is often found to deviate significantly from Brownian motion [1-4]. However, the most common method for analyzing the motion, estimation of the time-averaged mean square displacement (TA-MSD), cannot always distinguish whether the diffusion is non-Brownian. For example, the TA-MSD is linear in time for both Brownian motion and continuous time random walks with long-tailed power-law distributed waiting times [5, 6]. Other estimators have been suggested, which are able to distinguish between some but not all classes of non-Brownian diffusion. Examples are p-variation tests [7], first passage times [8], and mean maximal excursions [9]. See for instance [10] for a review. As further models of diffusion are introduced, the number and complexity of estimators increase, and it becomes unclear how to systematically compare the models when the data is limited. Is there a systematic approach?

In this article we approach the problem from a Bayesian perspective [11-13]. Bayesian statistics provides a systematic framework for comparing different probabilistic models to select the one that best describes a given data set. All subjectivity is clearly stated in the specification of the prior probability distributions for the model parameters. The framework relies on the computation of the likelihood function, and can be applied to the raw trajectory, without filtering out information by computing other estimators. However, sometimes the computational complexity involved is an obstacle to a full Bayesian approach. We tackle this using the nested sampling approach introduced by Skilling [14].

This work focuses on the specific model of anomalous diffusion called fractional Brownian motion (FBM) [15], in which the steps are correlated with long-term memory, leading to an MSD which is non-linear in time. In addition to using the Bayesian framework to distinguish pure and fractional Brownian motion, we also compare models featuring measurement noise and constant drift. Our implementation of the Bayesian inference is written mainly in Matlab, while some calculations are implemented in C to increase performance. The scripts are publicly available at GitHub [16].

While the Bayesian approach provides rankings for a set of candidate models, it does not address the question of whether the best of these models actually describes the data well. We therefore supplement the Bayesian approach with a goodness-of-fit test in the form of the information content model check of [17], which calculates a p value giving a measure of how typical the observed data is for a specific model. If extreme p values are found, the analysis reveals that the corresponding model does not describe the data well, and that some important physical feature has been overlooked. There are other tests that one could have used alternatively: for instance a test of how the TA-MSD matches the model prediction [18] or investigation of the detrending moving average statistic [19]. The information content check does not rely on the existence of an estimator like the MSD, but utilizes only the likelihood function, which is available from the Bayesian analysis.

A number of other works on single particle tracking from a Bayesian perspective already exist, e.g. inference with hidden Markov models (HMM) [20-22] and regular diffusion in a potential energy landscape [23]. As all these approaches utilize the likelihood function and no other estimator, combining their analysis with ours is a straightforward exercise.
Such a combination, possibly also with future works, would yield a systematic model selection among a large set of models for single particle tracking data. We also note that a number of works on parameter estimation for FBM by Bayesian methods have been published previously, for instance [24-26].

The article is organized as follows. In Sec. II we introduce Bayesian inference and the model of fractional Brownian motion with measurement errors and drift. The nested sampling framework enabling Bayesian inference for such models is also outlined. We perform parameter estimation and model comparison for artificial and experimental data sets in Sec. III, and then move on to supplement with independent goodness-of-fit tests in Sec. IV. We offer our conclusions on the results in Sec. V.
II. BAYESIAN INFERENCE AND FRACTIONAL BROWNIAN MOTION
When using Bayesian inference [11-13] to select the most probable model among a number of models, M_1, M_2, ..., based on some data, the starting point is Bayes' formula

    P(M_i \mid \text{data}) = \frac{P(\text{data} \mid M_i)\, P(M_i)}{P(\text{data})} .    (1)

The probability on the left hand side is the posterior probability of the model given the data, while P(data | M_i) is the likelihood or evidence of the model and P(M_i) is the prior probability of the model. The model independent probability, P(data), drops out when one compares the probability of different models by taking their ratio:

    \frac{P(M_i \mid \text{data})}{P(M_j \mid \text{data})} = \frac{P(\text{data} \mid M_i)}{P(\text{data} \mid M_j)} \frac{P(M_i)}{P(M_j)} .    (2)

If the models are considered equally probable before taking the data into account, i.e., P(M_i) = P(M_j), the models are simply ranked by their evidence Z_i = P(data | M_i).

Usually a model includes a set of parameters \theta, the values of which are to be inferred. For these parameters, a prior probability \pi(\theta) = P(\theta | M_i) must be assigned according to the knowledge (or lack of knowledge) of the parameters in the absence of the data. The evidence can then be calculated as an integral over the parameter space

    Z_i = \int L(\theta)\, \pi(\theta)\, d\theta ,    (3)

where we have introduced the likelihood of the parameters L(\theta) = P(data | \theta, M_i) as the probability of the data given the parameters of the considered model.

For a specific model M_i the posterior probability distribution of the parameters,

    P(\theta \mid M_i, \text{data}) = \frac{L(\theta)\, \pi(\theta)}{Z_i} ,    (4)

specifies the estimated parameter values and their uncertainties. In order to do so, however, we must be able to calculate the likelihood of the data, L(\theta).
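Since nested sampling (Sec. II D below) returns estimates of ln Z_i rather than Z_i, the posterior model probabilities of Eqs. (1) and (2) are in practice best computed with a log-sum-exp shift. The following minimal Python sketch illustrates this bookkeeping; it is our illustration, not part of the published Matlab/C scripts [16], and the function and variable names are ours.

import numpy as np

def model_probabilities(log_Z, log_prior=None):
    """Posterior model probabilities from log-evidences, cf. Eqs. (1)-(2).

    log_Z: log-evidences ln Z_i, one entry per candidate model.
    log_prior: optional log prior model probabilities ln P(M_i);
               uniform model priors are assumed if omitted.
    """
    log_post = np.asarray(log_Z, dtype=float)
    if log_prior is not None:
        log_post = log_post + np.asarray(log_prior, dtype=float)
    log_post = log_post - log_post.max()  # shift to avoid overflow in exp
    w = np.exp(log_post)
    return w / w.sum()                    # normalization replaces P(data)

# Example: three models whose log-evidences differ by a few units.
print(model_probabilities([-250.3, -248.1, -252.9]))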
A. The likelihood function for FBM

The class of models analysed in this work is that of fractional Brownian motion. FBM in one dimension is a zero-mean Gaussian process with stationary increments, where the displacements \tilde{x}_k and \tilde{x}_n from the starting point \tilde{x}_0 = 0 at two later times k\tau and n\tau are correlated such that [15]

    \langle \tilde{x}_k \tilde{x}_n \rangle = D_H \left[ (k\tau)^{2H} + (n\tau)^{2H} - |k\tau - n\tau|^{2H} \right] .    (5)

If the data consists of N + 1 observations \tilde{x}_n, n = 0, 1, 2, ..., N, at evenly spaced time intervals of length \tau, then the corresponding one-step displacements \Delta\tilde{x}_n = \tilde{x}_n - \tilde{x}_{n-1} will have autocovariance function

    \gamma(k) = \langle \Delta\tilde{x}_n \Delta\tilde{x}_{n+k} \rangle = D_H \tau^{2H} \left[ |k+1|^{2H} + |k-1|^{2H} - 2|k|^{2H} \right] .    (6)

Collecting the displacements in a column vector \Delta\tilde{\mathbf{x}}_N with transpose \Delta\tilde{\mathbf{x}}_N^T = [\Delta\tilde{x}_1, \Delta\tilde{x}_2, ..., \Delta\tilde{x}_N], we can write the likelihood function, i.e., the probability of observing \Delta\tilde{\mathbf{x}}_N, as

    L_x(\theta) = \frac{1}{(2\pi)^{N/2} |\Gamma_N|^{1/2}} \exp\left( -\tfrac{1}{2} \Delta\tilde{\mathbf{x}}_N^T \Gamma_N^{-1} \Delta\tilde{\mathbf{x}}_N \right) ,    (7)

where \Gamma_N^{-1} is the inverse of the N x N covariance matrix with elements

    \Gamma_{N,mn} = \langle \Delta\tilde{x}_m \Delta\tilde{x}_n \rangle = \gamma(m - n)    (8)

and determinant |\Gamma_N|. For a specific choice of D_H and H, the likelihood function can then be calculated via Eqs. (8) and (6).

For a large data set, conventional inversion of \Gamma_N can be computationally demanding. We circumvent this difficulty by rewriting the likelihood expression using the fact that

    P(\Delta\tilde{\mathbf{x}}_N | \theta) = P(\Delta\tilde{x}_N | \Delta\tilde{\mathbf{x}}_{N-1}, \theta) \, P(\Delta\tilde{\mathbf{x}}_{N-1} | \theta)
        = P(\Delta\tilde{x}_N | \Delta\tilde{\mathbf{x}}_{N-1}, \theta) \, P(\Delta\tilde{x}_{N-1} | \Delta\tilde{\mathbf{x}}_{N-2}, \theta) \cdots P(\Delta\tilde{x}_1 | \theta) ,    (9)

where each conditional likelihood can be expressed as

    P(\Delta\tilde{x}_n | \Delta\tilde{\mathbf{x}}_{n-1}, \theta) = \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left( -\frac{(\Delta\tilde{x}_n - \Delta\tilde{\mu}_n)^2}{2\sigma_n^2} \right) .    (10)

The mean \Delta\tilde{\mu}_n and variance \sigma_n^2 in this expression are calculated iteratively using the Durbin-Levinson algorithm [27]. Setting \Delta\tilde{\mu}_1 = 0 and \sigma_1^2 = \gamma(0), we iterate from n = 1 to N - 1:

    \Delta\tilde{\mu}_{n+1} = \sum_{j=1}^{n} \phi_{n,j} \Delta\tilde{x}_{n+1-j} ,    (11)
    \sigma_{n+1}^2 = \sigma_n^2 (1 - \phi_{n,n}^2) ,    (12)

where

    \phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1} \gamma(n-j)\, \phi_{n-1,j}}{\sigma_n^2} ,    (13)
    \phi_{n,i} = \phi_{n-1,i} - \phi_{n-1,n-i}\, \phi_{n,n} \quad \text{for } 1 \le i < n ,    (14)

with \phi_{1,1} = \gamma(1)/\gamma(0). For completeness we include a derivation of Eqs. (11)-(14) in Appendix A.

We have defined FBM for zero-mean processes, but we can straightforwardly relax this and allow for a constant average drift with velocity v_x. Labeling the actual measured displacements as \Delta x_n, we simply subtract the average trend by defining \Delta\tilde{x}_n = \Delta x_n - v_x \tau.

In order to apply the model to single-particle tracking data obtained with a microscope producing two-dimensional images, we must generalize to two dimensions corresponding to two sets of coordinates, (\tilde{x}_n, \tilde{y}_n). By assuming that the motion in the two dimensions is independent, i.e. \langle \tilde{x}_n \tilde{y}_n \rangle = 0, the likelihood function for two-dimensional fractional Brownian motion is given by

    L(D_H, H, v_x, v_y) = L_x(D_H, H, v_x)\, L_y(D_H, H, v_y) ,    (15)

where L_y(D_H, H, v_y) = P(\Delta\tilde{\mathbf{y}}_N | D_H, H, v_y) is also given by Eq. (7).
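As a concrete illustration of Eqs. (6) and (10)-(14), the following Python sketch (ours, not the published Matlab/C implementation [16]; all names are our own) evaluates the log-likelihood of a sequence of increments through the Durbin-Levinson recursion, avoiding the explicit inversion of \Gamma_N:

import numpy as np

def fbm_gamma(k, D_H, H, tau):
    """Autocovariance of FBM increments at integer lags k, Eq. (6)."""
    k = np.abs(np.asarray(k, dtype=float))
    return D_H * tau**(2 * H) * ((k + 1)**(2 * H)
                                 + np.abs(k - 1)**(2 * H) - 2 * k**(2 * H))

def durbin_levinson_loglike(dx, gamma):
    """Log-likelihood of increments dx[0..N-1] of a zero-mean stationary
    Gaussian sequence with autocovariance gamma[0..N-1], Eqs. (10)-(14)."""
    dx = np.asarray(dx, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    N = len(dx)
    phi = np.zeros(N)       # phi[j-1] holds the current phi_{n,j}
    sigma2 = gamma[0]       # sigma_1^2 = gamma(0)
    mu = 0.0                # Delta mu_1 = 0
    loglike = -0.5 * (np.log(2 * np.pi * sigma2) + (dx[0] - mu)**2 / sigma2)
    for n in range(1, N):
        # Eq. (13): new coefficient phi_{n,n}
        phi_nn = (gamma[n] - np.dot(gamma[n - 1:0:-1], phi[:n - 1])) / sigma2
        if n > 1:           # Eq. (14): update phi_{n,i} for 1 <= i < n
            phi[:n - 1] -= phi[n - 2::-1] * phi_nn
        phi[n - 1] = phi_nn
        sigma2 *= 1.0 - phi_nn**2                # Eq. (12)
        mu = np.dot(phi[:n], dx[n - 1::-1])      # Eq. (11)
        loglike += -0.5 * (np.log(2 * np.pi * sigma2)
                           + (dx[n] - mu)**2 / sigma2)
    return loglike

For FBM one would pass gamma = fbm_gamma(np.arange(N), D_H, H, tau). Each pass through the loop corresponds to one factor of the chain rule in Eq. (9), so the cost is O(N^2) instead of the O(N^3) of a direct matrix inversion.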
B. Measurement noise

When analysing single particle tracking data it can be necessary to distinguish between the experimentally observed particle position and the actual position, the difference being that the actual position can be obscured by inaccuracies in the measurement process [28, 29]. We include the possibility of Gaussian measurement noise via the Durbin-Levinson algorithm. Labelling the actual particle x-positions as x_n^{clean} and the measurement noise for the positions \eta_n, the observed positions become

    x_n = x_n^{\text{clean}} + \eta_n .    (16)

Assuming that the noise is memoryless with zero mean and variance \langle \eta_n^2 \rangle = \sigma_{mn}^2, we find that the displacements \Delta\tilde{x}_n = x_n - x_{n-1} - v_x\tau have autocovariance function \gamma(k) = \langle \Delta\tilde{x}_n \Delta\tilde{x}_{n+k} \rangle with

    \gamma(n) = \begin{cases} \gamma_{\text{clean}}(0) + 2\sigma_{mn}^2 , & \text{for } n = 0 ,\\ \gamma_{\text{clean}}(1) - \sigma_{mn}^2 , & \text{for } n = 1 ,\\ \gamma_{\text{clean}}(n) , & \text{for } n > 1 , \end{cases}    (17)

where \gamma_{clean}(n) is the autocovariance in the absence of measurement noise, corresponding to the underlying FBM. Thus the inclusion of this measurement noise is a simple modification of the Durbin-Levinson algorithm by the addition of one extra parameter, \sigma_{mn}. It would be a straightforward matter to generalize the model of the noise to include correlations among the \eta_n and calculate the corresponding autocovariance function \gamma(n). Such correlations may arise due to the finite image acquisition time [28, 29] or due to conformational changes of the observed objects.
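In code, the modification of Eq. (17) only touches the first two entries of the autocovariance passed to the recursion above (again our illustrative sketch, with sigma_mn as an assumed parameter name):

def noisy_gamma(gamma_clean, sigma_mn):
    """Apply Eq. (17): memoryless Gaussian measurement noise of standard
    deviation sigma_mn added on top of the clean increment autocovariance."""
    gamma = gamma_clean.copy()
    gamma[0] += 2 * sigma_mn**2
    if len(gamma) > 1:
        gamma[1] -= sigma_mn**2
    return gamma

The full one-dimensional likelihood with drift and noise is then obtained as durbin_levinson_loglike(dx - v_x * tau, noisy_gamma(fbm_gamma(np.arange(N), D_H, H, tau), sigma_mn)).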
C. Prior distributions

The Bayesian formalism, outlined in Eqs. (1)-(4), requires that a prior distribution \pi(\theta) over the possible parameter values of the model is specified. The prior represents the knowledge, or lack of knowledge, about the system before the data is known. We will use uniform priors

    \pi(\theta) = \begin{cases} \frac{1}{\theta_{\max} - \theta_{\min}} , & \text{if } \theta_{\min} \le \theta \le \theta_{\max} ,\\ 0 , & \text{otherwise} \end{cases}    (18)

for parameters such as H that are unknown within some given interval from \theta_{min} to \theta_{max}, and for parameters, such as the measurement noise or the components of the drift velocity, that may be zero. For the diffusion constant, which is always positive but where even the order of magnitude of the value could be unknown, we will use the so-called Jeffreys prior [13]

    \pi(\theta) = \begin{cases} \frac{1}{\ln(\theta_{\max}/\theta_{\min})} \frac{1}{\theta} , & \text{for } \theta_{\min} \le \theta \le \theta_{\max} ,\\ 0 , & \text{otherwise.} \end{cases}    (19)

Since D_H changes units when H changes, we will in practice use the standard deviation for a single step, which we label \sigma_H, as the parameter for which we specify a Jeffreys prior, and then define D_H = \sigma_H^2 / (2\tau^{2H}).
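In nested sampling (Sec. II D) it is convenient to sample walkers through the cumulative transform u = F(\theta) of each prior. A sketch of the two inverse transforms implied by Eqs. (18) and (19), under our own naming conventions:

def uniform_ppf(u, theta_min, theta_max):
    """Inverse CDF of the uniform prior, Eq. (18)."""
    return theta_min + u * (theta_max - theta_min)

def jeffreys_ppf(u, theta_min, theta_max):
    """Inverse CDF of the Jeffreys prior, Eq. (19): ln(theta) is uniform
    between ln(theta_min) and ln(theta_max)."""
    return theta_min * (theta_max / theta_min)**u

For example, jeffreys_ppf(0.5, 1e-2, 1e2) returns 1.0, the geometric midpoint of the interval, as expected for a prior that is uniform on a logarithmic scale.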
D. Nested sampling

Besides the calculation of the likelihood function, the main practical obstacle in Bayesian inference is the calculation of the evidence of a given model, since this often includes multidimensional integrals. We use the nested sampling procedure, as introduced by Skilling [14], to accomplish this task. We briefly introduce the method and specific details of our implementation below, while we refer to [13] for further details. The essence of the nested sampling procedure is to rewrite Eq. (3) in the form (suppressing the index i of the model)

    Z = \int_0^1 L(X)\, dX .    (20)

Here, the function L(X) (distinguished from L(\theta) by the variable name) is defined as the inverse of the function

    X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\, d\theta .    (21)

The variable X, which we will loosely call the prior mass, is the proportion of the parameter space with likelihood greater than \lambda. When implementing nested sampling, the integral in Eq. (20) is estimated by the sum

    Z \approx \sum_{i=1}^{i_{\max}} L_i w_i .    (22)

Here the likelihood values L_i are computed by generating K 'walkers' in the parameter space, denoted by \theta_k, where k = 1, ..., K. The walkers are independently chosen points in parameter space, distributed randomly according to the prior \pi(\theta) (in our implementation we set K = 200). From these K walkers the one, \theta_k \equiv \theta_{\min}^{(1)}, with the smallest likelihood L(\theta_k) \equiv L_1 is chosen for the i = 1 term in Eq. (22).

For i > 1, \theta_{\min}^{(i)} and L_i are generated iteratively in the same manner, except that the K walkers are now restricted to the part of the parameter space where the likelihood is larger than L_{i-1}. This means that for each new i the considered prior mass will shrink on average by a factor K/(K+1), since the K walkers are distributed uniformly on the prior mass interval from X = 0 to X = X(L_{i-1}), with the largest value being X(L(\theta_{\min}^{(i)})). Accordingly the weights w_i in the sum are chosen to shrink as w_i = w_{i-1} K/(K+1), with w_1 = 1/(K+1) corresponding to the average prior mass discarded after the first iteration.

After each iteration we are left with K - 1 walkers with likelihood larger than L_i = L(\theta_{\min}^{(i)}). To supplement these with an independent sample, we choose one of the K - 1 remaining walkers at random, copy it, and let the copy perform a random walk in parameter space, one parameter coordinate \theta at a time:

- Defining u = F(\theta) \equiv \int_{\theta_{\min}}^{\theta} \pi(\theta')\, d\theta', a u* is randomly chosen uniformly within the interval of length l_u centered on u (we use periodic boundary conditions at u* = 0 and u* = 1, and initially l_u = 1), and \theta* = F^{-1}(u*) is calculated.
- A trial position \theta* in parameter space is constructed by supplementing \theta* with the current values of the other parameters for the walker.
- If L(\theta*) > L_i then the attempted move is accepted and the walker moves to this position in parameter space. If not, the move is rejected and the walker remains at the original position.

The procedure is repeated for each parameter coordinate and the random walk then ends after N_sweeps = 30 attempted jumps for each parameter. To adjust the length of the moves as the prior mass shrinks we change the value of l_u after each finished walk. If R denotes the fraction of rejected moves during the N_sweeps attempts, then we update

    l_u \rightarrow \min\left( l_u\, e^{1/4 - R},\, 1 \right)    (23)

for that particular direction, which is then used for the next iteration. Our l_u adjustment aims at obtaining a rejection fraction around 25%, while one could argue that the best fraction would be 50%. However, since the fraction of rejected steps is estimated from a walk possibly performed in a different area of parameter space than the following walk, we choose to be a bit conservative when updating l_u, such as to limit the number of walks where all moves in a specific direction are rejected.

Thus, the procedure differs from Skilling's in two aspects: (i) We consider each parameter on its own and store a separate l_u for each, reflecting that some parameters are more tightly constrained by the likelihood requirement than others. (ii) The lengths are adjusted more conservatively, to decrease the risk of losing independence between the samples.

To obtain a termination criterion at some iteration i = i_stop, we estimate the remainder of the integral as Z_remain = w_i \sum_{k=1}^{K} L(\theta_k). If Z_remain divided by the current estimate of the evidence Z = \sum_{j=1}^{i} L_j w_j is smaller than some small fixed number, then we terminate the algorithm. At this point, we are left with K walkers which are included in the evidence by adding the final Z_remain = w_{i_stop} \sum_{k=1}^{K} L(\theta_k) to Z. In Eq. (22) we thus set i_max = i_stop + K, and define L_{i_stop + k} = L(\theta_k) and w_{i_stop + k} = w_{i_stop} for k = 1, ..., K.

With regards to the uncertainty on the estimated evidence, we follow Skilling [14] and estimate the uncertainty of ln Z as \sqrt{\mathcal{H}/K}, where the information

    \mathcal{H} = \int_0^1 \frac{L(X)}{Z} \ln\frac{L(X)}{Z}\, dX    (24)

is estimated as
    \mathcal{H} \approx \sum_{i=1}^{i_{\max}} \frac{L_i w_i}{Z} \ln\frac{L_i}{Z} .    (25)

Finally, the mean and variance of the parameters, or more generally the posterior average of any function f(\theta), can be estimated as

    \langle f(\theta) \rangle \approx \sum_{i=1}^{i_{\max}} f(\theta_{\min}^{(i)}) \frac{L_i w_i}{Z} .    (26)

For large data sets, we reduce the number of stored samples \theta_{\min}^{(i)} to decrease the required computer memory. The number of samples is kept below a threshold by letting a new sample, produced by the nested sampling algorithm, compete with a randomly chosen old one. Of the pair, one is selected randomly with probabilities proportional to the samples' posterior probabilities. The surviving sample is then assigned a new posterior probability equal to the sum of the two. This sum is then used instead of L_i w_i / Z in Eq. (26) when calculating posterior averages.
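The complete procedure can be summarized in a bare-bones Python sketch (ours, not the published scripts [16]): K walkers, per-coordinate moves through the prior transforms of Sec. II C, the shrinkage weights w_i, and the step-length update of Eq. (23). The functions loglike and prior_ppf stand for the likelihood of Secs. II A-B and the inverse prior transforms; the termination tolerance tol is an assumed value.

import numpy as np

def _logsumexp(a):
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def nested_sampling(loglike, prior_ppf, ndim, K=200, n_sweeps=30,
                    tol=1e-6, rng=None):
    """Nested sampling run as described in Sec. II D.

    prior_ppf maps a point of the unit cube [0,1]^ndim to parameter space;
    returns (log-evidence, posterior samples, their log-weights ln(L_i w_i))."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random((K, ndim))                 # walkers in the unit cube
    logL = np.array([loglike(prior_ppf(ui)) for ui in u])
    l_u = np.ones(ndim)                       # per-coordinate step lengths
    logw = -np.log(K + 1.0)                   # w_1 = 1/(K+1)
    logZ = -np.inf
    samples, logwts = [], []
    while True:
        # Terminate when the remaining prior mass cannot matter, then
        # include the K surviving walkers with the current weight.
        if logw + _logsumexp(logL) - logZ < np.log(tol):
            for k in range(K):
                samples.append(prior_ppf(u[k]))
                logwts.append(logw + logL[k])
                logZ = np.logaddexp(logZ, logw + logL[k])
            return logZ, samples, np.array(logwts)
        worst = int(np.argmin(logL))          # theta_min^(i) with L_i
        logZ = np.logaddexp(logZ, logw + logL[worst])
        samples.append(prior_ppf(u[worst]))
        logwts.append(logw + logL[worst])
        # Replace the worst walker by an evolved copy of another one.
        L_min = logL[worst]
        u[worst] = u[rng.choice([k for k in range(K) if k != worst])]
        rejected = np.zeros(ndim)
        for _ in range(n_sweeps):
            for d in range(ndim):             # one coordinate at a time
                trial = u[worst].copy()
                trial[d] = (trial[d] + l_u[d] * (rng.random() - 0.5)) % 1.0
                if loglike(prior_ppf(trial)) > L_min:
                    u[worst] = trial          # accepted move
                else:
                    rejected[d] += 1
        # Eq. (23): aim at a ~25% per-coordinate rejection fraction.
        l_u = np.minimum(l_u * np.exp(0.25 - rejected / n_sweeps), 1.0)
        logL[worst] = loglike(prior_ppf(u[worst]))
        logw += np.log(K / (K + 1.0))         # w_{i+1} = w_i K/(K+1)

Posterior averages, Eq. (26), then follow as sums over samples with weights np.exp(logwts - logZ).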
III. PARAMETER INFERENCE AND MODEL SELECTION TESTS

To demonstrate the strengths and weaknesses of the Bayesian approach, we have tested the implementation on a range of artificial trajectories as well as a set of experimental data where anomalous diffusion is suspected to appear.

In Table I the results of our analysis are shown for an artificially generated track. The underlying process is quite complex, and features positive correlations between steps (H = 3/4)
as well as some measurement noise, but zero drift. As shown in the final column, the correct model (i = 7) holds by far the largest evidence, as expected. Among the 8 models, the true model is thus correctly identified in this case, while the simpler model (i = 4), without measurement noise, is the next best candidate. The reason why the model without measurement noise does not yield an even smaller posterior probability lies in the broad prior on the measurement noise relative to the size of the true value. This broad prior reflects a large uncertainty in the model (i = 7) with the measurement noise, which indeed weakens the model and thus decreases its evidence. Either increasing the length of the trajectory or narrowing the prior would lead to the true model being selected with even higher certainty (data not shown).

Table II shows the results for another simulated example, featuring instead negative correlations between the steps, as well as measurement noise and a constant drift in the x-direction. In this case our analysis does not correctly distinguish between the negative correlations arising from the fractional Brownian nature and the similar contributions from the measurement noise. As before, a narrower prior on the measurement noise or additional data can remedy this effect. The analysis does not have problems discerning the drift, though, although it offers as an unlikely alternative, with around 1% probability, that the process is superdiffusive with larger measurement noise than the true value.

The lesson from the examples in Tables I and II is that the conclusions drawn with respect to model comparison are sensitive to the amount of data and the prior assignments. When the data becomes sparse or the prior widths increase, the analysis yields relatively higher evidence for the simpler models as compared to complex ones. It is thus crucial for model selection to assign priors corresponding to the uncertainty about each parameter taking everything except the data into account.

In addition to the self consistency tests, the framework was applied to a large ensemble of experimental data. Using a fluorescent analog of cholesterol enabled the tracking of sterol rich vesicles in Chinese hamster ovary cells, a system known to exhibit anomalous diffusion. For two different temperatures, an ensemble of vesicles was tracked, yielding 111 two-dimensional trajectories at 25 °C and 170 trajectories at 37 °C [30].

We show a sample trajectory along with the evidence for each diffusion model and the behavior of the likelihood function around the maximum likelihood point for the best model in Fig. 1. The model evidences are dominated by that of fractional Brownian motion, and indeed the Hurst parameter estimate lies clearly above the pure Brownian value H_pure = 1/2.

FIG. 1. (a) A sample experimental trajectory besides the estimated evidences for each model. In (b)-(e) we display the evolution of the likelihood function around the maximum likelihood point for model M_4, i.e., pure fractional Brownian motion without drift.
Note that a key feature of the Bayesian framework is that it yields a probability distribution for each inferred parameter, and not just a mean and an error, although these are readily available. For completeness, we show the complete parameter mean and error outputs of the analysis in Table III.

To evaluate the degree of superdiffusion in the ensemble, we calculate the mean of the posterior distribution for each trajectory and combine them in the histograms in Fig. 2. A comparison between the histograms indicates that the superdiffusion is more pronounced at 37 °C, as also found in [30].
[Table I about here; the numerical entries are not recoverable.]
TABLE I. Results obtained by applying the nested sampling algorithm on a trajectory artificially generated with N = 200 time steps and parameters as given by the last line in the table. The different models are labelled with i and correspond to the 8 possible combinations of the following single point or uniform priors: H = 1/2 or 0 ≤ H ≤ 1; v_x = v_y = 0 or uniform priors on v_x\tau and v_y\tau over a symmetric interval; \sigma_mn = 0 or a uniform prior on \sigma_mn over a positive interval (the numerical bounds are not recoverable here). The prior on \sigma_H is a Jeffreys prior with \theta_min = 1 and \theta_max = 10. The estimated parameter values for the broad priors are given with uncertainty (as mean ± standard deviation). L_max is the maximal L_i found during the nested sampling run. The probabilities of the models are computed from the estimated means of the evidences as P(M_i | data) = Z_i / \sum_{j=1}^{8} Z_j, i.e., from Bayes' formula using P(M_i) = 1/8 and P(data) = \sum_{j=1}^{8} Z_j / 8.
[Table II about here; the numerical entries are not recoverable.]
TABLE II. Results obtained by applying the nested sampling algorithm on a subdiffusive trajectory generated with N = 200 time steps and parameters as given by the last line in the table. The models and priors are the same as in Table I.

Turning to the model selection aspect of the Bayesian framework, we compare the model selection results from the experimental data with those of an ensemble of artificial trajectories. To mirror the experimental data, 170 artificial trajectories were simulated, where the underlying model was chosen with equal probability from the 8 possible models. Model parameters were then drawn from the prior distributions given below Table I. The model selection procedure was then performed as for the experimental data with nested sampling, and the results are displayed in Fig. 3.

As shown in the upper left of Fig. 3, the nested sampling algorithm selected Model 1 (pure Brownian motion) as the most probable for about half of the trajectories at 25 °C, while Model 4 (fractional Brownian motion) was selected for the other half. The distribution in the upper right of Fig. 3 shows that the selection of fractional Brownian motion is more pronounced at 37 °C, confirming the previous results. In the lower left plot we illustrate the results of a test measuring the model selection effectiveness on the artificial data. The bars indicate how many times each model was selected as the most probable, the lower part indicating correct choices, while the upper part represents incorrect ranking. The correct model was ranked highest 123 out of 170 times. The reason for the discrepancy is indicated in the histogram in the lower right, where the x-axis represents the true underlying model, while the bars indicate how often it was ranked highest. The plot illustrates the well-known fact that the Bayesian approach includes an Occam's razor effect: data from the simple model 1 was correctly classified each time, while data from the complex model 8 was quite often assigned to one of the simpler models. This can also be viewed as an instance of the Jeffreys-Lindley paradox [31].
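For completeness, we sketch how such artificial trajectories can be generated. The increments form a stationary Gaussian sequence with the autocovariance of Eqs. (6) and (17), so for the moderate lengths used here (N = 200) a Cholesky factorization of the Toeplitz covariance matrix suffices. This is our own illustration, reusing the fbm_gamma and noisy_gamma helpers defined in Sec. II; it is not necessarily the generator behind the published figures.

import numpy as np

def simulate_track(N, D_H, H, tau, v=0.0, sigma_mn=0.0, rng=None):
    """Simulate N observed increments of one-dimensional FBM with drift v
    and memoryless measurement noise, via Eqs. (5)-(6) and (16)-(17)."""
    rng = np.random.default_rng() if rng is None else rng
    gamma = noisy_gamma(fbm_gamma(np.arange(N), D_H, H, tau), sigma_mn)
    idx = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
    C = np.linalg.cholesky(gamma[idx])           # Toeplitz covariance factor
    return C @ rng.standard_normal(N) + v * tau  # correlated steps + drift

A two-dimensional track is obtained from two independent calls with the same D_H and H, cf. Eq. (15).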
IV. GOODNESS-OF-FIT TEST
The main shortcoming of the Bayesian analysis is that it yields only relative model rankings and parameter values for the candidate models. To investigate how well the models describe the data, we supplement the Bayesian analysis with a goodness-of-fit test. This test checks whether the probability of the observed trajectory \Delta\mathbf{x}_N is typical for the inferred model. We take our starting point in the p value [17]

    p = \left\langle \Theta\left[ P(\Delta\mathbf{x}_N^* \mid \theta^*, M_i) - P(\Delta\mathbf{x}_N \mid \theta^*, M_i) \right] \right\rangle ,    (27)

where \Theta[\cdot] is the Heaviside step function and the average is over the joint posterior distribution P(\Delta\mathbf{x}_N^*, \theta^* | M_i) for the parameters \theta^* and hypothetical trajectories \Delta\mathbf{x}_N^*.
[Table III about here; the numerical entries are not recoverable.]
TABLE III. The complete parameter means and uncertainties obtained for the sample experimental trajectory of Fig. 1 for each of the 8 models.

FIG. 2. Histograms of the estimated H for each experimental trajectory. In (a) results for trajectories captured at 25 °C are shown, while (b) depicts the distribution for trajectories captured at 37 °C.

For each model M_i, we estimate p by drawing N_repl = 100 sets of parameter values \theta_k^*, k = 1, ..., N_repl, among the samples \theta_{\min}^{(i)} according to their posterior probabilities L_i w_i / Z. For each of the chosen parameter samples a replicated trajectory \Delta\mathbf{x}_N^* is generated according to the corresponding stochastic model, and then the p value is estimated by averaging \Theta[P(\Delta\mathbf{x}_N^* | \theta^*, M_i) - P(\Delta\mathbf{x}_N | \theta^*, M_i)] over the N_repl replicated trajectories.

The reasoning behind such a p value is as follows: If the model M_i and parameters \theta^* were a perfect description of the data, then the probability distribution for the p value would be uniform between zero and one. Thus, if the p value turns out to be improbably close to the extremities zero or one, then one can conclude that the model probably does not describe the data well. However, as proven in an appendix of [17], the p value as calculated above for the full trajectory will be valued close to 0.5 when the same data has also been used for the parameter inference. The test is therefore extended by considering how well the model describes the data at longer steps in time. Thus, the estimator used for the check is different from that which served as score for the parameter inference. The longer time steps are achieved simply by removing data points, keeping only every nth position along the trajectory for some integer n. We thus construct the steps in the x-direction of a trajectory \Delta\mathbf{x}_{N'}^{(n)} by keeping only the x-coordinates x_i^{(n)} = x_{ni} for the possible integers i = 0, ..., N', and construct the corresponding steps \Delta x_i^{(n)} = x_i^{(n)} - x_{i-1}^{(n)}. The replicated trajectories \Delta\mathbf{x}_{N'}^{(n)*} are constructed similarly, and the p values are estimates of

    p = \left\langle \Theta\left[ P(\Delta\mathbf{x}_{N'}^{(n)*} \mid \theta^*, M_i) - P(\Delta\mathbf{x}_{N'}^{(n)} \mid \theta^*, M_i) \right] \right\rangle ,    (28)

obtained in the same way as described above but using the time step n \times \tau.

FIG. 3. (a) Histogram of the most probable model for each trajectory for the data at 25 °C. (b) Corresponding histogram for data at 37 °C. (c) Corresponding histogram for 170 artificial trajectories generated using parameters drawn from the priors given in Table I, with each of the 8 models chosen with probability 1/8. [Remainder of the caption is not recoverable.]

FIG. 4. Histograms of the p values defined in Eq. (28) for the same artificial trajectories as used in the lower plots of Fig. 3. The values are for n = 1 (top left), n = 2 (top right), n = 4 (bottom left) and n = 16 (bottom right) and the model with the highest evidence for the trajectory.

FIG. 5. Same as Fig. 4 but for the experimental data at 25 °C.

For the artificial trajectories corresponding to the lower plots in Fig. 3, the distribution of the p values is shown in Fig. 4. Note how the values cluster around p = 0.5 for n = 1, but become increasingly uniform as n is increased. As a reference, the experimental data at 25 °C give rise to a different pattern, see Fig. 5. When n becomes larger than one, the p values do not spread out uniformly. At n = 2 the distribution moves towards p = 1, which means that trajectories replicated from the models tend to be more probable than the observed trajectories. As n is increased even further, the p-value distribution tends towards p = 0 instead, which indicates that the experimental data is more probable than what is expected for the model.
An explanation of this behavior could be that the correlations between steps at short and long time scales differ from those described by a simple power law. Thus a reasonable conclusion from the p values is that the set of candidate models should be expanded to find models that describe the data better. Note that this conclusion could not have been reached by considering only the evidences of the candidate models, since these only tell you how to rank the candidate models relative to each other. The corresponding plots at 37 °C are qualitatively similar to Fig. 5.

V. CONCLUSION
We implemented Bayesian model selection and parameter estimation for fractional Brownian motion with drift and measurement noise. The approach was tested on artificial trajectories and found to make estimates of the parameters and uncertainties that are consistent with the true underlying parameters. With limited data the Bayesian approach was able to select the true underlying model, except when the deviation from a simpler model was small compared with the broadness of the prior.

For the analysis of experimental data from vesicles tracked in Chinese hamster ovary cells, the approach favored both regular and anomalous diffusion, with an increased tendency towards superdiffusion for intact cells, consistent with previous findings [30]. However, a model check regarding the typicality of the steps observed at different time scales revealed shortcomings of the fractional Brownian motion model. We observed tendencies for the experimental trajectories to be untypically improbable at short time scales and untypically probable at longer time scales. This suggests that the set of candidate models does not fully cover the kind of stochastic process that governs the data, and thus that the set of candidate models should be expanded for these particular data sets. We note that the present approach could easily be modified to include models of stationary Gaussian processes with autocovariance functions different from the one for FBM. Thus we expect that the approach can be applied to single particle tracking in many different situations.
VI. ACKNOWLEDGEMENT
JK and MAL acknowledge funding from the Danish Council for Independent Research - Natural Sciences (FNU), grant number 4002-00428B.
Appendix A: Durbin-Levinson algorithm
Our starting point for the derivation of the Durbin-Levinson algorithm is an expression for \Delta\tilde{x}_{n+1} given all the previous steps:

    \Delta\tilde{x}_{n+1} = \sum_{j=1}^{n} \phi_{n,j} \Delta\tilde{x}_{n+1-j} + z_{n+1} = \Phi_{n,n}^T \Delta\hat{\tilde{\mathbf{x}}}_n + z_{n+1} .    (A1)

Here the sequence of coefficients \Phi_{n,n}^T = [\phi_{n,1}, \phi_{n,2}, ..., \phi_{n,n}] controls the correlation with the previous steps, the hat means that the order of the elements in the vector has been reversed, \Delta\hat{\tilde{\mathbf{x}}}_n^T = [\Delta\tilde{x}_n, \Delta\tilde{x}_{n-1}, ..., \Delta\tilde{x}_1], and z_{n+1} is a zero-mean Gaussian noise which is independent of the previous displacements \Delta\tilde{\mathbf{x}}_n.

The coefficients \phi_{n,i} can be determined by multiplying both sides of Eq. (A1) by \Delta\tilde{x}_i, i < n + 1, and averaging to get

    \gamma(n + 1 - i) = \sum_{j=1}^{n} \phi_{n,j}\, \gamma(n + 1 - j - i) .    (A2)

With \mathbf{c}_n^T = [\gamma(1), \gamma(2), ..., \gamma(n)] this equation can be written in matrix form as \Gamma_n \Phi_{n,n} = \mathbf{c}_n, or

    \Phi_{n,n} = \Gamma_n^{-1} \mathbf{c}_n .    (A3)

In principle this determines the coefficients \phi_{n,i} in terms of the autocovariance function \gamma(n). However, it involves the inverse of \Gamma_n, which we would like to avoid calculating. Instead we aim at obtaining an iterative procedure for calculating \phi_{n,i} step by step.

Before going further with that, let us find expressions for the conditional mean

    \Delta\tilde{\mu}_{n+1} = \int \Delta\tilde{x}_{n+1}\, P(\Delta\tilde{x}_{n+1} | \Delta\tilde{x}_1, ..., \Delta\tilde{x}_n)\, d\Delta\tilde{x}_{n+1}    (A4)

of \Delta\tilde{x}_{n+1} given \Delta\tilde{x}_1, ..., \Delta\tilde{x}_n and the corresponding variance \sigma_{n+1}^2. For the conditional mean we see directly from Eq. (A1) that

    \Delta\tilde{\mu}_{n+1} = \Phi_{n,n}^T \Delta\hat{\tilde{\mathbf{x}}}_n ,    (A5)

which is Eq. (11). From Eq. (A1) we also see that the variance \sigma_{n+1}^2 of \Delta\tilde{x}_{n+1} given \Delta\tilde{x}_1, ..., \Delta\tilde{x}_n is the same as the variance of z_{n+1}. Thus we find

    \sigma_{n+1}^2 = \langle z_{n+1}^2 \rangle = \langle (\Delta\tilde{x}_{n+1} - \Phi_{n,n}^T \Delta\hat{\tilde{\mathbf{x}}}_n)^2 \rangle    (A6)
        = \gamma(0) - 2\Phi_{n,n}^T \mathbf{c}_n + \Phi_{n,n}^T \langle \Delta\hat{\tilde{\mathbf{x}}}_n \Delta\hat{\tilde{\mathbf{x}}}_n^T \rangle \Phi_{n,n}    (A7)
        = \gamma(0) - \Phi_{n,n}^T \mathbf{c}_n .    (A8)

Let us now return to finding iterative expressions for \phi_{n,i}. First note that \Gamma_{n+1} can be written in the block structure

    \Gamma_{n+1} = \begin{bmatrix} \Gamma_n & \hat{\mathbf{c}}_n \\ \hat{\mathbf{c}}_n^T & \gamma(0) \end{bmatrix} ,    (A9)

where \hat{\mathbf{c}}_n^T = [\gamma(n), \gamma(n-1), ..., \gamma(1)]. This we can use to write \Gamma_{n+1} \Phi_{n+1,n+1} = \mathbf{c}_{n+1} as

    \begin{bmatrix} \Gamma_n & \hat{\mathbf{c}}_n \\ \hat{\mathbf{c}}_n^T & \gamma(0) \end{bmatrix} \begin{bmatrix} \Phi_{n+1,n} \\ \phi_{n+1,n+1} \end{bmatrix} = \begin{bmatrix} \mathbf{c}_n \\ \gamma(n+1) \end{bmatrix} ,    (A10)

where \Phi_{n+1,n}^T = [\phi_{n+1,1}, \phi_{n+1,2}, ..., \phi_{n+1,n}]. Writing this out as two equations we obtain

    \Gamma_n \Phi_{n+1,n} + \hat{\mathbf{c}}_n \phi_{n+1,n+1} = \mathbf{c}_n ,    (A11a)
    \hat{\mathbf{c}}_n^T \Phi_{n+1,n} + \gamma(0)\, \phi_{n+1,n+1} = \gamma(n+1) .    (A11b)

This can be rearranged to find

    \Phi_{n+1,n} = \Gamma_n^{-1} (\mathbf{c}_n - \hat{\mathbf{c}}_n \phi_{n+1,n+1}) ,    (A12a)
    \hat{\mathbf{c}}_n^T \Gamma_n^{-1} (\mathbf{c}_n - \hat{\mathbf{c}}_n \phi_{n+1,n+1}) + \gamma(0)\, \phi_{n+1,n+1} = \gamma(n+1) ,    (A12b)

and further, exploiting Eq. (A3), to obtain

    \Phi_{n+1,n} = \Phi_{n,n} - \hat{\Phi}_{n,n} \phi_{n+1,n+1} ,    (A13a)
    \phi_{n+1,n+1} = \frac{\gamma(n+1) - \hat{\mathbf{c}}_n^T \Phi_{n,n}}{\gamma(0) - \hat{\mathbf{c}}_n^T \Gamma_n^{-1} \hat{\mathbf{c}}_n} .    (A13b)

Since \hat{\mathbf{c}}_n^T \Gamma_n^{-1} \hat{\mathbf{c}}_n = \mathbf{c}_n^T \Gamma_n^{-1} \mathbf{c}_n, we see that the denominator of Eq. (A13b) is the variance given in Eq. (A8). Thus we arrive at (after shifting the index n by 1)

    \phi_{n,i} = \phi_{n-1,i} - \phi_{n-1,n-i}\, \phi_{n,n} \quad \text{for } i < n ,    (A14a)
    \phi_{n,n} = \frac{\gamma(n) - \sum_{j=1}^{n-1} \gamma(n-j)\, \phi_{n-1,j}}{\sigma_n^2} ,    (A14b)

which gives the iterative procedure described in Eqs. (13) and (14).

Returning to the variance, we can now simplify it using Eqs. (A13a) and (A13b) to

    \sigma_{n+1}^2 = \gamma(0) - \mathbf{c}_n^T \Phi_{n,n}
        = \gamma(0) - [\mathbf{c}_{n-1}^T, \gamma(n)] \begin{bmatrix} \Phi_{n,n-1} \\ \phi_{n,n} \end{bmatrix}
        = \gamma(0) - (\mathbf{c}_{n-1}^T \Phi_{n,n-1} + \gamma(n)\, \phi_{n,n})
        = \gamma(0) - (\mathbf{c}_{n-1}^T (\Phi_{n-1,n-1} - \hat{\Phi}_{n-1,n-1} \phi_{n,n}) + \gamma(n)\, \phi_{n,n})
        = \sigma_n^2 - \sigma_n^2 \phi_{n,n} \frac{\gamma(n) - \mathbf{c}_{n-1}^T \hat{\Phi}_{n-1,n-1}}{\sigma_n^2}
        = \sigma_n^2 (1 - \phi_{n,n}^2) ,    (A15)

which is Eq. (12). This concludes our derivation of the Durbin-Levinson algorithm.
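The recursion can be verified numerically against the direct multivariate-Gaussian evaluation of Eq. (7). A small self-check of ours, using the illustrative fbm_gamma and durbin_levinson_loglike functions from Sec. II A:

import numpy as np

# Compare the Durbin-Levinson log-likelihood with the direct
# expression of Eq. (7) on a small synthetic example.
N, D_H, H, tau = 50, 1.0, 0.75, 1.0
gamma = fbm_gamma(np.arange(N), D_H, H, tau)
idx = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
cov = gamma[idx]                              # Toeplitz matrix Gamma_N
rng = np.random.default_rng(1)
dx = np.linalg.cholesky(cov) @ rng.standard_normal(N)

sign, logdet = np.linalg.slogdet(cov)
ll_direct = -0.5 * (N * np.log(2 * np.pi) + logdet
                    + dx @ np.linalg.solve(cov, dx))
assert np.isclose(ll_direct, durbin_levinson_loglike(dx, gamma))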
[1] M. J. Saxton and K. Jacobson, Annu. Rev. Biophys. Biomol. Struct. 26, 373 (1997).
[2] I. Golding and E. C. Cox, Phys. Rev. Lett. 96, 098102 (2006).
[3] M. Weiss, H. Hashimoto, and T. Nilsson, Biophys. J. 84, 4043 (2003).
[4] J.-H. Jeon, V. Tejedor, S. Burov, E. Barkai, C. Selhuber-Unkel, K. Berg-Sørensen, L. Oddershede, and R. Metzler, Phys. Rev. Lett. 106, 048103 (2011).
[5] A. Lubelski, I. M. Sokolov, and J. Klafter, Phys. Rev. Lett. 100, 250602 (2008).
[6] Y. He, S. Burov, R. Metzler, and E. Barkai, Phys. Rev. Lett. 101, 058101 (2008).
[7] M. Magdziarz, A. Weron, K. Burnecki, and J. Klafter, Phys. Rev. Lett. 103, 180602 (2009).
[8] S. Condamin, V. Tejedor, R. Voituriez, O. Bénichou, and J. Klafter, Proc. Natl. Acad. Sci. USA 105, 5675 (2008).
[9] V. Tejedor, O. Bénichou, R. Voituriez, R. Jungmann, F. Simmel, C. Selhuber-Unkel, L. B. Oddershede, and R. Metzler, Biophys. J. 98, 1364 (2010).
[10] Y. Meroz and I. M. Sokolov, Phys. Rep. 573, 1 (2015).
[11] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, 2003).
[12] E. T. Jaynes and G. L. Bretthorst, Probability Theory: The Logic of Science (Cambridge University Press, 2003).
[13] D. S. Sivia and J. Skilling, Data Analysis: A Bayesian Tutorial (Oxford University Press, 2006).
[14] J. Skilling, AIP Conf. Proc. 735, 395 (2004).
[15] B. B. Mandelbrot and J. W. Van Ness, SIAM Review 10, 422 (1968).
[16] https://github.com/mlomholt/fbm
[17] J. Krog and M. A. Lomholt, Phys. Rev. E 96, 062106 (2017).
[18] G. Sikora, K. Burnecki, and A. Wyłomańska, Phys. Rev. E 95, 032110 (2017).
[19] G. Sikora, arXiv:1803.08553 [physics.data-an] (2018).
[20] R. Das, C. W. Cairo, and D. Coombs, PLoS Comput. Biol. 5, e1000556 (2009).
[21] F. Persson, M. Lindén, C. Unoson, and J. Elf, Nat. Meth. 10, 265 (2013).
[22] N. Monnier, Z. Barry, H. Y. Park, K.-C. Su, Z. Katz, B. P. English, A. Dey, K. Pan, I. M. Cheeseman, R. H. Singer, and M. Bathe, Nat. Meth. 12, 838 (2015).
[23] J.-B. Masson, D. Casanova, S. Türkcan, G. Voisinne, M. R. Popoff, M. Vergassola, and A. Alexandrou, Phys. Rev. Lett. 102, 048103 (2009).
[24] A. Beskos, J. Dureau, and K. Kalogeropoulos, Biometrika 102, 809 (2015).
[25] Y. Kozachenko, A. Melnikov, and Y. Mishura, Statistics 49, 35 (2015).
[26] A. R. Taheriyoun and M. Moghimbeygi, Scientific Reports 7 (2017).
[27] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, 2nd ed. (Springer, 1991).
[28] T. Savin and P. S. Doyle, Biophys. J. 88, 623 (2005).
[29] A. J. Berglund, Phys. Rev. E 82, 011917 (2010).
[30] F. W. Lund, M. A. Lomholt, L. M. Solanko, R. Bittman, and D. Wüstner, BMC Biophysics 5, 20 (2012).
[31] R. D. Cousins, Synthese 194, 395 (2017).