Classification of particle trajectories in living cells: machine learning versus statistical testing hypothesis for fractional anomalous diffusion
Joanna Janczura, Patrycja Kowalek, Hanna Loch-Olszewska, Janusz Szwabiński, Aleksander Weron
Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, 50-370 Wrocław, Poland
Abstract
Single-particle tracking (SPT) has become a popular tool to study the intracellular transport of molecules in living cells. Inferring the character of their dynamics is important, because it determines the organization and functions of the cells. For this reason, one of the first steps in the analysis of SPT data is the identification of the diffusion type of the observed particles. The most popular method to identify the class of a trajectory is based on the mean square displacement (MSD). However, due to its known limitations, several other approaches have already been proposed. With the recent advances in algorithms and the developments of modern hardware, classification attempts rooted in machine learning (ML) are of particular interest. In this work, we adapt two ML ensemble algorithms, i.e. random forest and gradient boosting, to the problem of trajectory classification. We present a new set of features used to transform the raw trajectory data into the input vectors required by the classifiers. The resulting models are then applied to real data for G protein-coupled receptors and G proteins. The classification results are compared to recent statistical methods going beyond MSD.
Keywords: single particle tracking, anomalous diffusion, time series classification, machine learning

I. INTRODUCTION

Single-particle tracking (SPT) has become an important tool in the biophysical community in recent years. It was first carried out on proteins diffusing in the cell membrane [1, 2]. Since then, it has been successfully used to study different transport processes in the intracellular environment, providing valuable information about the mechano-structural characteristics of living cells. For instance, it has already helped to unveil the details of the movement of molecular motors inside cells [3, 4] or of the target search mechanisms of nuclear proteins [5].

Living cells belong to the class of active systems [6], in which the particles undergo simultaneous active and thermally driven transport. It has been shown that the dynamics of proteins in cells determines their organization and functions [7]. This is the reason why it is crucial to identify the type of motion of the observed particles in order to deduce their driving forces [8–11].

Over the last decades, a number of stochastic models have been proposed to describe the intracellular transport of molecules [11, 12]. Within those models, the dynamics of molecules usually alternates between distinct types of diffusion, each of which may be associated with a different physical scenario. Brownian motion [13] models a particle that diffuses freely, i.e. it neither meets any obstacles in its path nor interacts with other molecules in its surrounding. Subdiffusion is appropriate to represent trapped particles [11, 14], particles which encounter fixed or moving obstacles [8, 15], or particles slowed down due to the viscoelastic properties of the cytoplasm [16]. Finally, superdiffusion models the motion driven by molecular motors: the particles move faster than in the free diffusion case and in a specific direction [17].
The sub- and superdiffusion together are often referred to as anomalous diffusion.

The standard method of classification of individual trajectories into those three types of diffusion is based on the mean square displacement (MSD) [12]. Within this approach, one fits the theoretical MSD curves for various models to the data and then selects the best fit with statistical analysis [18]. A linear MSD curve indicates free diffusion, a sublinear (superlinear) one indicates subdiffusion (superdiffusion). However, there are some issues related to this method. In many cases, the experimental trajectories are too short to extract meaningful information from the MSD. Moreover, the finite localization precision adds a term to the MSD, which is known to limit the interpretation of the data [9, 12, 19, 20]. As a result, several methods improving or going beyond the MSD have been introduced to overcome these problems. For instance, Michalet [12] used an iterative method called the optimal least-squares fit to determine the optimal number of points to obtain the best fit to the MSD in the presence of localization errors. Weiss [21] used a resampling approach that eliminates localization errors in the time-averaged MSD of subdiffusive fractional Brownian motion processes. The trajectory spread in space calculated through the radius of gyration [22], the Van Hove displacement distributions and deviations from Gaussian statistics [23], the self-similarity of a trajectory measured with different powers of the displacement [24], the velocity autocorrelation function [25, 26] or the time-dependent directional persistence of trajectories [27] can be combined with the output of the MSD to improve the classification results. The distribution of directional changes [28], the mean maximum excursion method [29] and the fractionally integrated moving average (FIMA) framework [30] may efficiently replace the MSD estimator for classification purposes.
Hidden Markov Models (HMM) have been proposed to check the heterogeneity within single trajectories [31, 32]. They have proven to be quite useful in the detection of confinement [33]. Last but not least, classification based on hypothesis testing, both relying on the MSD and going beyond this statistic, has been shown to be quite successful as well [20, 34].

An alternative, very promising approach to SPT data analysis is rooted in computer science. Namely, classification of trajectories may be seen as a subject of machine learning (ML) [35]. In the ML context, classification relies on available data, because its goal is to identify to which category a new observation belongs on the basis of a training data set containing observations with a known category membership.

There is already a number of attempts to analyze particle trajectories with machine learning methods. Among them, Bayesian approaches [18, 36, 37], random forests [38–40], neural networks [41] and deep neural networks [39, 42–44] have gained a lot of attention and popularity. While some of the works have focused just on the identification of the diffusion modes [38, 39, 41], others went beyond the classification of diffusion and tried to extract quantitative information about the trajectories (e.g. the anomalous exponent [40, 42]).

Recently, we presented a comparison of the performance of two different classes of methods: traditional feature-based algorithms (random forest and gradient boosting) and a modern deep learning approach based on convolutional neural networks [39]. The latter constitutes nowadays the state-of-the-art technology for automatic data classification and is much simpler to use from the perspective of the end-user, because it operates on raw data and does not require any preprocessing effort from human experts [45]. In contrast, the traditional methods require a representation of trajectories by a set of human-engineered features or attributes [46].
In most applications, the deep learning approach outperforms the traditional methods. However, in some situations it is still worth using the latter, because they usually work better on small data sets, are computationally cheaper and easier to interpret. From our results it follows that both approaches achieve excellent (and very similar) accuracies on synthetic data. However, they turned out to perform poorly in terms of transfer learning. This concept refers to a situation in which a classifier is trained in one setting and then applied to a different one. The classifiers from Ref. [39] were not able to successfully classify trajectories generated with methods different from the ones used for the training set.

In this paper, we are going to present an improved version of the traditional classifiers presented in Ref. [39]. We will propose a new set of training data as well as a new collection of features describing a trajectory. Both are inspired by a recent statistical analysis of anomalous diffusion [34]. To illustrate the transfer learning abilities of the new classifiers, we will apply them to the data from a single-particle tracking experiment on G protein-coupled receptors and G proteins [47]. Results of classification from Ref. [34] will be used as a benchmark.

The paper is organized as follows. In Sec. II, we briefly introduce the different modes of diffusion and the methods of their analysis. Sec. III contains a short description of the machine learning methods used in this work. Stochastic models of diffusion for the generation of synthetic data are presented in Sec. IV. The data itself is characterized in Sec. V. The set of features used as input to the classifiers is introduced in Sec. VI. Our results are presented in Sec. VII, followed by some concluding remarks.
II. DIFFUSION MODES AND THEIR ANALYSIS
As already mentioned in the introduction, the identification of the diffusion modes of particles within living cells is important, because they reflect the interactions of those particles with their surrounding. For instance, if a particle is driven by free diffusion (Brownian motion) [13], we expect that it does not meet any obstacles in its path and does not undergo any relevant interactions with other particles. Deviations from Brownian motion are called anomalous diffusion and can be divided into two distinct classes. Subdiffusion is slower than the normal one. It usually occurs in crowded or constrained domains and can be associated with different physical mechanisms including immobile obstacles, cytoplasm viscosity, crowding, trapping and heterogeneities [48–50]. Superdiffusion represents active transport along the cytoskeleton, assisted by molecular motors [17]. Particles undergoing that type of motion move faster than those freely diffusing and usually do not come back to previous positions.

Although different scenarios for both classes of anomalous diffusion are possible [11, 49, 51–54], for the purpose of this work we will limit ourselves to the three basic types mentioned above: free, sub- and superdiffusion.

The most popular method of deducing a particle's type of motion from its trajectory is based on the analysis of the mean square displacement (MSD) [55],
\[
MSD(t) = \mathbb{E}\left( \left\| X_{t+t_0} - X_{t_0} \right\|^2 \right), \tag{1}
\]

where $(X_t)_{t>0}$ is a particle trajectory, $\|\cdot\|$ is the Euclidean norm and $\mathbb{E}$ is the expectation over the probability space. Since in many experiments only a limited number of trajectories is observed, the time-averaged MSD (TAMSD) calculated from a single trajectory is usually used as the estimator of the MSD,

\[
\widehat{MSD}(n\Delta t) = \frac{1}{N-n+1} \sum_{i=0}^{N-n} \left\| X_{t_{i+n}} - X_{t_i} \right\|^2. \tag{2}
\]

The trajectory is assumed to be given in the form of $N$ consecutive two-dimensional positions $X_i = (x_i, y_i)$ ($i = 0, \ldots, N$) recorded with a constant time interval $\Delta t$, and $n$ is the time lag between the initial and the final positions of the particle. If the underlying process is ergodic and has stationary increments, the TAMSD converges to the theoretical MSD [51].

The TAMSD as a function of the time lag for normal diffusion converges asymptotically to a linear function [9], i.e. for large $N$:

\[
\widehat{MSD}(n\Delta t) \sim D (n\Delta t), \tag{3}
\]

with $D$ being the diffusion coefficient. For subdiffusion, being slower than diffusion, the behaviour of the TAMSD is sublinear, while for superdiffusion, being faster than diffusion, the behaviour is superlinear. Thus, for pure trajectories with no localization errors one could easily determine their diffusion type by fitting a function $\alpha \log(n\Delta t) + \beta$ to the estimated $\log[\widehat{MSD}(n\Delta t)]$ curve. If $\alpha < 1$, the trajectory is classified as subdiffusive; if $\alpha > 1$, as superdiffusive. Although theoretically this approach allows for an uncomplicated distinction of the diffusion types, there are several issues related to it as a classification method. First, real trajectories are usually noisy, which makes the fitting of a mathematical model a challenging task, even in the simplest case of normal diffusion [12, 21]. Secondly, according to Eq. (2), only the values of $\widehat{MSD}$ corresponding to small time lags are well averaged. The larger the lag, the smaller the number of displacements contributing to the averages, resulting in fluctuations increasing with the lag. Selecting a suitable lag is, by the way, a well-known problem in biophysics [20, 56, 57]. Since many real trajectories are short, we are forced to concentrate on short times (small lags). This induces another problem in a classification method based only on MSD curves, as in this case the different power laws look alike even in the absence of noise.
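The TAMSD of Eq. (2) and the exponent fit described above can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' code; the function names and the toy Brownian trajectory are our own.

```python
import numpy as np

def tamsd(traj, max_lag):
    """Time-averaged MSD, Eq. (2), for a trajectory of shape (N+1, 2)."""
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(max_lag)
    for k, n in enumerate(lags):
        disp = traj[n:] - traj[:-n]            # X_{t_{i+n}} - X_{t_i}
        msd[k] = np.mean(np.sum(disp**2, axis=1))
    return lags, msd

def fit_alpha(lags, msd, dt=1.0):
    """Fit alpha*log(n*dt) + beta to log(TAMSD); alpha<1: sub-, alpha>1: superdiffusion."""
    alpha, _beta = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return alpha

# A free-diffusion trajectory (cumulative sum of Gaussian steps) should yield alpha near 1.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(1000, 2)), axis=0)
lags, msd = tamsd(traj, max_lag=10)
alpha = fit_alpha(lags, msd)
```

Note how `max_lag` is kept small relative to the trajectory length, in line with the observation above that only small-lag TAMSD values are well averaged.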
III. MACHINE LEARNING APPROACH
Several different procedures have already been proposed to circumvent the limitations of the MSD [12, 20, 22–24, 27–29, 31–34], including the use of machine learning methods [18, 36–40, 42, 43]. Recently, we discussed the applicability of three different machine learning algorithms to classification, including two feature-based methods and a deep learning one [39]. The results of that study were ambiguous. On the one hand, all of the methods performed excellently on the test data; on the other, they failed to transfer their knowledge to data coming from unseen physical models. The latter finding practically disqualified them as candidates for a reliable classification tool.

In this paper, we are going to continue the analysis started in Ref. [39] and present improved versions of the classifiers, which perform much better in terms of transfer learning. We will focus on the traditional machine learning methods: the random forest (RF) [58, 59] and the gradient boosting (GB) [60, 61]. Both methods are feature-based, meaning that each instance in the data set is described by a set of human-engineered attributes [46]. And both belong to the class of ensemble methods, which combine multiple base classifiers to form a better one. In each case, decision trees [62] are used as the base classifiers.

A decision tree is built by splitting the original dataset (trajectories with known classes), constituting the root node of the tree, into subsets, which represent the successor children. The splitting is based on a set of rules utilizing the values of the features. This process is repeated on each derived subset in a recursive manner. The recursion is completed when the subset at a node has all samples belonging to the same class (i.e. the node is pure) or when splitting no longer adds value to the classification. At each step, a feature that best splits the data is chosen.
Two metrics are typically used to measure the quality of the split: Gini impurity and information gain [35].

Gini impurity tells us how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in that set. It is given by

\[
I_G = \sum_{i=1}^{J} p_i (1 - p_i), \tag{4}
\]

where $J$ is the number of classes ($J = 3$ in our case) and $p_i$ is the fraction of items labeled with class $i$ in the set.

The information gain related to a split is simply the reduction of the information entropy [63], calculated as the difference between the entropy of a parent node in the tree and a weighted sum of the entropies of its children nodes. The entropy itself is given as

\[
H = -\sum_{i=1}^{J} p_i \log p_i, \tag{5}
\]

where $p_1, p_2, \ldots, p_J$ are the fractions of each class present in the node.

Decision trees are often used for classification purposes, because they are easy to understand and interpret. However, single trees are unstable in the sense that a small variation in the data may lead to a completely different tree [64]. They also have a tendency to overfit, i.e. they model the training data too well and learn noise or random fluctuations as meaningful concepts, which limits their accuracy on unseen data [65]. That is why they are rather used as building blocks of ensembles and not as stand-alone classifiers.

In a random forest, multiple decision trees are constructed independently from the same training data. The predictions of the individual trees are aggregated and their mode is taken as the final output. In gradient boosting, the trees are not independent. Instead, they are built sequentially by learning from the mistakes committed by the ensemble.
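Equations (4) and (5) and the resulting information gain can be checked directly. The sketch below is our own illustration (base-2 logarithm chosen for the entropy; the base only rescales the gain and does not affect which split is best):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity, Eq. (4): sum_i p_i * (1 - p_i) over the classes in the node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    """Information entropy, Eq. (5), here with log base 2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted sum of the children's entropies."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

pure = ["sub"] * 6                       # a pure node: zero impurity and entropy
mixed = ["sub", "normal", "super"] * 2   # a balanced 3-class node: maximal impurity
```

A pure node gives $I_G = 0$ and $H = 0$, while the balanced three-class node gives $I_G = 2/3$ and $H = \log_2 3$; any split that separates the classes therefore has positive information gain.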
In many applications, gradient boosting is expected to have a better performance than random forest. However, it is usually not the better choice in case of very noisy data.

FIG. 1. Workflow of our classification method. The training set is composed of a large number of synthetic trajectories (Sec. V B). The preprocessing phase consists in the extraction of the features introduced in Sec. VI.
A workflow of our classification method is shown in Fig. 1. The training set consists of a large number of synthetic trajectories and their labels (diffusion modes). The trajectories were generated with various kinds of theoretical models of diffusion (see Sections IV and V B for further details). In the preprocessing phase, the raw data is cleaned and transformed into the form required as input by the classifier. Many traditional classifiers, including random forest and gradient boosting, work much better with vectors of features characterizing each trajectory instead of raw data. The features used in this work are introduced in Sec. VI. Some authors normalize the trajectories before further processing [40]. However, we omitted this step, as our preliminary analysis indicated a significant decrease in the performance of the classifiers induced by normalization. The ensembles of trees were inferred from the feature vectors and their labels. Once trained, they may be used to classify new trajectories, including the experimental ones.
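The workflow of Fig. 1 (generate labeled trajectories, extract feature vectors, train the ensembles, predict) can be sketched with scikit-learn. Everything below is a minimal stand-in: the two toy features and the toy trajectory generator are ours and only mimic the pipeline; the actual feature set is the one defined in Sec. VI.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_traj(mode, n=200):
    """Toy trajectory generator: free diffusion vs. drifted (DBM-like) motion."""
    steps = rng.normal(size=(n, 2))
    if mode == "super":
        steps += 0.5                      # a constant drift mimics directed motion
    return np.cumsum(steps, axis=0)

def extract_features(traj):
    """Stand-in preprocessing: two toy features (the real set is defined in Sec. VI)."""
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    net = np.linalg.norm(traj[-1] - traj[0])
    return [steps.mean(), net / steps.sum()]   # mean step length, straightness

labels = ["free", "super"] * 200
X = np.array([extract_features(make_traj(m)) for m in labels])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

accuracies = {}
for clf in (RandomForestClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    accuracies[type(clf).__name__] = clf.fit(X_train, y_train).score(X_test, y_test)
```

On this easily separable toy problem both ensembles reach near-perfect accuracy; the hard part in practice, as discussed above, is designing features that also transfer to trajectories from unseen models.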
IV. STOCHASTIC MODELS OF DIFFUSION
The most popular theoretical models of diffusion are: the continuous-time random walk (CTRW) [11], obstructed diffusion (OD) [8, 66], random walks on random walks (RWRW) [67], random walks on percolating clusters (RWPC) [68, 69], fractional Brownian motion (FBM) [70–72], fractional Lévy α-stable motion (FLSM) [73], the fractional Langevin equation (FLE) [74] and the autoregressive fractionally integrated moving average (ARFIMA) [75]. They are applicable to different physical environments: trapping and crowded environments (CTRW, FFPE); labyrinthine environments (OD, RWPC, RWRW); viscoelastic systems (FBM, FLSM, FLE, ARFIMA); systems with time-dependent diffusion (scaled FBM, ARFIMA). Following Refs. [20, 34], we will focus on three stochastic processes known to generate different kinds of fractional diffusion: fractional Brownian motion, directed Brownian motion (DBM) [76] and the Ornstein-Uhlenbeck process (OU) [77].

FBM is the solution of the stochastic differential equation

\[
dX_t^i = \sigma\, dB_t^{H,i}, \quad i = 1, 2, \tag{6}
\]

where the parameter $\sigma > 0$ is the scale of the process ($\sigma = \sqrt{2D}$), $H$ is the Hurst parameter ($H = \alpha/2$) and $B_t^H$ is a continuous-time Gaussian process that starts at zero, has zero expectation and has the following covariance function:

\[
\mathbb{E}\left( B_t^H B_s^H \right) = \frac{1}{2} \left( |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right). \tag{7}
\]

For $H < 1/2$ (i.e. $\alpha < 1$), FBM generates subdiffusive motion, and it reduces to free diffusion for $H = 1/2$. For $H > 1/2$, FBM generates superdiffusive motion (Fig. 2a).

The directed Brownian motion, also known as diffusion with drift, is the solution to

\[
dX_t^i = v_i\, dt + \sigma\, dB_t^{1/2,i}, \quad i = 1, 2, \tag{8}
\]

where $v = (v_1, v_2) \in \mathbb{R}^2$ is the drift parameter. This process generates superdiffusion related to the active transport of particles driven by molecular motors. The velocity of the motors is modeled by the parameter $v$ (Fig. 2b). For $v = 0$, the process reduces to normal diffusion.

The Ornstein-Uhlenbeck process is known to model confined diffusion, which is a subclass of subdiffusion (Fig. 2c). It corresponds to a particle inside a potential well and is a solution to the following stochastic differential equation:

\[
dX_t^i = -\lambda_i \left( X_t^i - \theta_i \right) dt + \sigma\, dB_t^{1/2,i}, \quad i = 1, 2, \quad \theta_i \in \mathbb{R}. \tag{9}
\]

Here, $\theta = (\theta_1, \theta_2)$ is the equilibrium position of the particle and $\lambda_i$ measures the strength of the interaction. For $\lambda_i = 0$, OU reduces to normal diffusion as well.
[FIG. 2: sample trajectories (x, y in μm) of the three models and the corresponding MSD curves, log(MSD(t)) vs. log(t) [log(ms)].]