Classification of particle trajectories in living cells: machine learning versus statistical testing hypothesis for fractional anomalous diffusion
Joanna Janczura, Patrycja Kowalek, Hanna Loch-Olszewska, Janusz Szwabiński, Aleksander Weron
Faculty of Pure and Applied Mathematics, Hugo Steinhaus Center, Wrocław University of Science and Technology, 50-370 Wrocław, Poland
Abstract
Single-particle tracking (SPT) has become a popular tool to study the intracellular transport of molecules in living cells. Inferring the character of their dynamics is important, because it determines the organization and functions of the cells. For this reason, one of the first steps in the analysis of SPT data is the identification of the diffusion type of the observed particles. The most popular method to identify the class of a trajectory is based on the mean square displacement (MSD). However, due to its known limitations, several other approaches have already been proposed. With the recent advances in algorithms and the developments of modern hardware, classification attempts rooted in machine learning (ML) are of particular interest. In this work, we adapt two ML ensemble algorithms, i.e. random forest and gradient boosting, to the problem of trajectory classification. We present a new set of features used to transform the raw trajectory data into the input vectors required by the classifiers. The resulting models are then applied to real data for G protein-coupled receptors and G proteins. The classification results are compared to recent statistical methods going beyond MSD.
Keywords: single particle tracking, anomalous diffusion, time series classification, machine learning

I. INTRODUCTION

Single-particle tracking (SPT) has become an important tool in the biophysical community in recent years. It was first carried out on proteins diffusing in the cell membrane [1, 2]. Since then, it has been successfully used to study different transport processes in the intracellular environment, providing valuable information about the mechano-structural characteristics of living cells. For instance, it has already helped to unveil the details of the movement of molecular motors inside cells [3, 4] or of the target search mechanisms of nuclear proteins [5].

Living cells belong to the class of active systems [6], in which the particles undergo simultaneous active and thermally driven transport. It has been shown that the dynamics of proteins in cells determines their organization and functions [7]. This is the reason why it is crucial to identify the type of motion of the observed particles in order to deduce their driving forces [8–11].

Over the last decades, a number of stochastic models have been proposed to describe the intracellular transport of molecules [11, 12]. Within those models, the dynamics of molecules usually alternates between distinct types of diffusion, each of which may be associated with a different physical scenario. Brownian motion [13] models a particle that diffuses freely, i.e. it neither meets any obstacles in its path nor interacts with other molecules in its surrounding. Subdiffusion is appropriate to represent trapped particles [11, 14], particles which encounter fixed or moving obstacles [8, 15], or particles slowed down due to the viscoelastic properties of the cytoplasm [16]. Finally, superdiffusion models the motion driven by molecular motors: the particles move faster than in the free diffusion case and in a specific direction [17].
The sub- and superdiffusion together are often referred to as anomalous diffusion.

The standard method of classification of individual trajectories into those three types of diffusion is based on the mean square displacement (MSD) [12]. Within this approach, one fits the theoretical MSD curves for various models to the data and then selects the best fit with statistical analysis [18]. A linear MSD curve indicates free diffusion, a sublinear (superlinear) one indicates subdiffusion (superdiffusion). However, there are some issues related to this method. In many cases, the experimental trajectories are too short to extract meaningful information from the MSD. Moreover, the finite localization precision adds a term to the MSD, which is known to limit the interpretation of the data [9, 12, 19, 20]. As a result, several methods improving or going beyond the MSD have been introduced to overcome these problems. For instance, Michalet [12] used an iterative method called the optimal least-squares fit to determine the optimal number of points to obtain the best fit to the MSD in the presence of localization errors. Weiss [21] used a resampling approach that eliminates localization errors in the time-averaged MSD of subdiffusive fractional Brownian motion processes. The trajectory spread in space calculated through the radius of gyration [22], the Van Hove displacement distributions and deviations from Gaussian statistics [23], the self-similarity of a trajectory measured with different powers of the displacement [24], the velocity autocorrelation function [25, 26] or the time-dependent directional persistence of trajectories [27] can be combined with the output of the MSD to improve the classification results. The distribution of directional changes [28], the mean maximum excursion method [29] and the fractionally integrated moving average (FIMA) framework [30] may efficiently replace the MSD estimator for classification purposes.
Hidden Markov Models (HMM) have been proposed to check the heterogeneity within single trajectories [31, 32]. They have proven to be quite useful in the detection of confinement [33]. Last but not least, classification based on hypothesis testing, both relying on the MSD and going beyond this statistic, has been shown to be quite successful as well [20, 34].

An alternative, very promising approach to SPT data analysis is rooted in computer science. Namely, classification of trajectories may be seen as a subject of machine learning (ML) [35]. In the ML context, classification relies on available data, because its goal is to identify to which category a new observation belongs on the basis of a training data set containing observations with a known category membership.

There is already a number of attempts to analyze particle trajectories with machine learning methods. Among them, Bayesian approaches [18, 36, 37], random forests [38–40], neural networks [41] and deep neural networks [39, 42–44] have gained a lot of attention and popularity. While some of the works have focused just on the identification of the diffusion modes [38, 39, 41], others went beyond the classification of diffusion and tried to extract quantitative information about the trajectories (e.g. the anomalous exponent [40, 42]).

Recently, we presented a comparison of the performance of two different classes of methods: traditional feature-based algorithms (random forest and gradient boosting) and a modern deep learning approach based on convolutional neural networks [39]. The latter constitutes nowadays the state-of-the-art technology for automatic data classification and is much simpler to use from the perspective of the end-user, because it operates on raw data and does not require any preprocessing effort from human experts [45]. In contrast, the traditional methods require a representation of trajectories by a set of human-engineered features or attributes [46].
In most applications, the deep learning approach outperforms the traditional methods. However, in some situations it is still worth using the latter, because they usually work better on small data sets, are computationally cheaper and easier to interpret. From our results it follows that both approaches achieve excellent (and very similar) accuracies on synthetic data. However, they turned out to perform poorly in terms of transfer learning. This concept refers to a situation in which a classifier is trained in one setting and then applied to a different one. The classifiers from Ref. [39] were not able to successfully classify trajectories generated with methods different from the ones used for the training set.

In this paper, we are going to present an improved version of the traditional classifiers presented in Ref. [39]. We will propose a new set of training data as well as a new collection of features describing a trajectory. Both are inspired by a recent statistical analysis of anomalous diffusion [34]. To illustrate the transfer learning abilities of the new classifiers, we will apply them to the data from a single-particle tracking experiment on G protein-coupled receptors and G proteins [47]. Results of classification from Ref. [34] will be used as a benchmark.

The paper is organized as follows. In Sec. II, we briefly introduce the different modes of diffusion and the methods of their analysis. Sec. III contains a short description of the machine learning methods used in this work. Stochastic models of diffusion for the generation of synthetic data are presented in Sec. IV. The data itself is characterized in Sec. V. The set of features used as input to the classifiers is introduced in Sec. VI. Our results are presented in Sec. VII, followed by some concluding remarks.
II. DIFFUSION MODES AND THEIR ANALYSIS
As already mentioned in the introduction, the identification of the diffusion modes of particles within living cells is important, because they reflect the interactions of those particles with their surrounding. For instance, if a particle is driven by free diffusion (Brownian motion) [13], we expect that it does not meet any obstacles in its path and does not undergo any relevant interactions with other particles. Deviations from Brownian motion are called anomalous diffusion and can be divided into two distinct classes. Subdiffusion is slower than the normal one. It usually occurs in crowded or constrained domains and can be associated with different physical mechanisms including immobile obstacles, cytoplasm viscosity, crowding, trapping and heterogeneities [48–50]. Superdiffusion represents active transport along the cytoskeleton, assisted by molecular motors [17]. Particles undergoing that type of motion move faster than those freely diffusing and usually do not come back to previous positions.

Although different scenarios for both classes of anomalous diffusion are possible [11, 49, 51–54], for the purpose of this work we will limit ourselves to the three basic types mentioned above: free, sub- and superdiffusion.

The most popular method of deducing a particle's type of motion from its trajectory is based on the analysis of the mean square displacement (MSD) [55],
\[
MSD(t) = \mathbb{E}\left( \left\| X_{t+t_0} - X_{t_0} \right\|^2 \right), \tag{1}
\]

where $(X_t)_{t>0}$ is a particle trajectory, $\|\cdot\|$ is the Euclidean norm and $\mathbb{E}$ is the expectation over the probability space. Since in many experiments only a limited number of trajectories is observed, the time-averaged MSD (TAMSD) calculated from a single trajectory is usually used as the estimator of the MSD,

\[
\widehat{MSD}(n\Delta t) = \frac{1}{N-n+1} \sum_{i=0}^{N-n} \left\| X_{t_{i+n}} - X_{t_i} \right\|^2. \tag{2}
\]

The trajectory is assumed to be given in the form of $N$ consecutive two-dimensional positions $X_i = (x_i, y_i)$ ($i = 0, \ldots, N$) recorded with a constant time interval $\Delta t$, and $n$ is the time lag between the initial and the final positions of the particle. If the underlying process is ergodic and has stationary increments, the TAMSD converges to the theoretical MSD [51].

The TAMSD as a function of the time lag for normal diffusion converges asymptotically to a linear function [9], i.e. for large $N$:

\[
\widehat{MSD}(n\Delta t) \sim D (n\Delta t), \tag{3}
\]

with $D$ being the diffusion coefficient. For subdiffusion, being slower than diffusion, the behaviour of the TAMSD is sublinear, while for superdiffusion, being faster than diffusion, the behaviour is superlinear. Thus, for pure trajectories with no localization errors one could easily determine their diffusion type by fitting a function $\alpha \log(n\Delta t) + \beta$ to the estimated $\log[\widehat{MSD}(n\Delta t)]$ curve. If $\alpha < 1$, the trajectory is classified as subdiffusive; if $\alpha > 1$, as superdiffusive. Although theoretically this approach allows for an uncomplicated distinction of the diffusion types, there are several issues related to it as a classification method. First, real trajectories are usually noisy, which makes the fitting of a mathematical model a challenging task, even in the simplest case of normal diffusion [12, 21]. Secondly, according to Eq. (2), only the values of $\widehat{MSD}$ corresponding to small time lags are well averaged. The larger the lag, the smaller the number of displacements contributing to the averages, resulting in fluctuations increasing with the lag. Selecting a suitable lag is, by the way, a well-known problem in biophysics [20, 56, 57]. Since many real trajectories are short, we are forced to concentrate on short times (small lags). This induces another problem in a classification method based only on MSD curves, as in this case the different power laws look alike even in the absence of noise.
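The TAMSD of Eq. (2) and the exponent fit described above can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' code; the function names and the toy Brownian trajectory are our own.

```python
import numpy as np

def tamsd(traj, max_lag):
    """Time-averaged MSD, Eq. (2), for a trajectory of shape (N+1, 2)."""
    lags = np.arange(1, max_lag + 1)
    msd = np.empty(max_lag)
    for k, n in enumerate(lags):
        disp = traj[n:] - traj[:-n]            # X_{t_{i+n}} - X_{t_i}
        msd[k] = np.mean(np.sum(disp**2, axis=1))
    return lags, msd

def fit_alpha(lags, msd, dt=1.0):
    """Fit alpha*log(n*dt) + beta to log(TAMSD); alpha<1: sub-, alpha>1: superdiffusion."""
    alpha, _beta = np.polyfit(np.log(lags * dt), np.log(msd), 1)
    return alpha

# A free-diffusion trajectory (cumulative sum of Gaussian steps) should yield alpha near 1.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=(1000, 2)), axis=0)
lags, msd = tamsd(traj, max_lag=10)
alpha = fit_alpha(lags, msd)
```

Note how `max_lag` is kept small relative to the trajectory length, in line with the observation above that only small-lag TAMSD values are well averaged.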
III. MACHINE LEARNING APPROACH
Several different procedures have already been proposed to circumvent the limitations of the MSD [12, 20, 22–24, 27–29, 31–34], including the use of machine learning methods [18, 36–40, 42, 43]. Recently, we discussed the applicability of three different machine learning algorithms to classification, including two feature-based methods and a deep learning one [39]. The results of that study were ambiguous. On the one hand, all of the methods performed excellently on the test data; on the other, they failed to transfer their knowledge to data coming from unseen physical models. The latter finding practically disqualified them as candidates for a reliable classification tool.

In this paper, we are going to continue the analysis started in Ref. [39] and present improved versions of the classifiers, which perform much better in terms of transfer learning. We will focus on the traditional machine learning methods: the random forest (RF) [58, 59] and the gradient boosting (GB) [60, 61]. Both methods are feature-based, meaning that each instance in the data set is described by a set of human-engineered attributes [46]. And both belong to the class of ensemble methods, which combine multiple base classifiers to form a better one. In each case, decision trees [62] are used as the base classifiers.

A decision tree is built by splitting the original dataset (trajectories with known classes), constituting the root node of the tree, into subsets, which represent the successor children. The splitting is based on a set of rules utilizing the values of the features. This process is repeated on each derived subset in a recursive manner. The recursion is completed when the subset at a node has all samples belonging to the same class (i.e. the node is pure) or when splitting no longer adds value to the classification. At each step, a feature that best splits the data is chosen.
Two metrics are typically used to measure the quality of the split: Gini impurity and information gain [35].

Gini impurity tells us how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in that set. It is given by

\[
I_G = \sum_{i=1}^{J} p_i (1 - p_i), \tag{4}
\]

where $J$ is the number of classes ($J = 3$ in our case) and $p_i$ is the fraction of items labeled with class $i$ in the set.

The information gain related to a split is simply the reduction of the information entropy [63], calculated as the difference between the entropy of a parent node in the tree and a weighted sum of the entropies of its children nodes. The entropy itself is given as

\[
H = -\sum_{i=1}^{J} p_i \log p_i, \tag{5}
\]

where $p_1, p_2, \ldots, p_J$ are the fractions of each class present in the node.

Decision trees are often used for classification purposes, because they are easy to understand and interpret. However, single trees are unstable in the sense that a small variation in the data may lead to a completely different tree [64]. They also have a tendency to overfit, i.e. they model the training data too well and learn noise or random fluctuations as meaningful concepts, which limits their accuracy on unseen data [65]. That is why they are rather used as building blocks of ensembles and not as stand-alone classifiers.

In a random forest, multiple decision trees are constructed independently from the same training data. The predictions of the individual trees are aggregated and their mode is taken as the final output. In gradient boosting, the trees are not independent. Instead, they are built sequentially by learning from the mistakes committed by the ensemble.
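Equations (4) and (5) and the resulting information gain can be checked directly. The sketch below is our own illustration (base-2 logarithm chosen for the entropy; the base only rescales the gain and does not affect which split is best):

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity, Eq. (4): sum_i p_i * (1 - p_i) over the classes in the node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1 - p)))

def entropy(labels):
    """Information entropy, Eq. (5), here with log base 2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted sum of the children's entropies."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

pure = ["sub"] * 6                       # a pure node: zero impurity and entropy
mixed = ["sub", "normal", "super"] * 2   # a balanced 3-class node: maximal impurity
```

A pure node gives $I_G = 0$ and $H = 0$, while the balanced three-class node gives $I_G = 2/3$ and $H = \log_2 3$; any split that separates the classes therefore has positive information gain.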
In many applications, gradient boosting is expected to have a better performance than random forest. However, it is usually not the better choice in case of very noisy data.

FIG. 1. Workflow of our classification method. The training set is composed of a large number of synthetic trajectories (Sec. V B). The preprocessing phase consists in the extraction of the features introduced in Sec. VI.
A workflow of our classification method is shown in Fig. 1. The training set consists of a large number of synthetic trajectories and their labels (diffusion modes). The trajectories were generated with various kinds of theoretical models of diffusion (see Sections IV and V B for further details). In the preprocessing phase, the raw data is cleaned and transformed into the form required as input by the classifier. Many traditional classifiers, including random forest and gradient boosting, work much better with vectors of features characterizing each trajectory instead of raw data. The features used in this work are introduced in Sec. VI. Some authors normalize the trajectories before further processing [40]. However, we omitted this step, as our preliminary analysis indicated a significant decrease in the performance of the classifiers induced by normalization. The ensembles of trees were inferred from the feature vectors and their labels. Once trained, they may be used to classify new trajectories, including the experimental ones.
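The workflow of Fig. 1 (generate labeled trajectories, extract feature vectors, train the ensembles, predict) can be sketched with scikit-learn. Everything below is a minimal stand-in: the two toy features and the toy trajectory generator are ours and only mimic the pipeline; the actual feature set is the one defined in Sec. VI.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_traj(mode, n=200):
    """Toy trajectory generator: free diffusion vs. drifted (DBM-like) motion."""
    steps = rng.normal(size=(n, 2))
    if mode == "super":
        steps += 0.5                      # a constant drift mimics directed motion
    return np.cumsum(steps, axis=0)

def extract_features(traj):
    """Stand-in preprocessing: two toy features (the real set is defined in Sec. VI)."""
    steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    net = np.linalg.norm(traj[-1] - traj[0])
    return [steps.mean(), net / steps.sum()]   # mean step length, straightness

labels = ["free", "super"] * 200
X = np.array([extract_features(make_traj(m)) for m in labels])
y = np.array(labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

accuracies = {}
for clf in (RandomForestClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    accuracies[type(clf).__name__] = clf.fit(X_train, y_train).score(X_test, y_test)
```

On this easily separable toy problem both ensembles reach near-perfect accuracy; the hard part in practice, as discussed above, is designing features that also transfer to trajectories from unseen models.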
IV. STOCHASTIC MODELS OF DIFFUSION
The most popular theoretical models of diffusion are: the continuous-time random walk (CTRW) [11], obstructed diffusion (OD) [8, 66], random walks on random walks (RWRW) [67], random walks on percolating clusters (RWPC) [68, 69], fractional Brownian motion (FBM) [70–72], fractional Lévy α-stable motion (FLSM) [73], the fractional Langevin equation (FLE) [74] and the autoregressive fractionally integrated moving average (ARFIMA) [75]. They are applicable to different physical environments: trapping and crowded environments (CTRW, FFPE); labyrinthine environments (OD, RWPC, RWRW); viscoelastic systems (FBM, FLSM, FLE, ARFIMA); systems with time-dependent diffusion (scaled FBM, ARFIMA). Following Refs. [20, 34], we will focus on three stochastic processes known to generate different kinds of fractional diffusion: fractional Brownian motion, directed Brownian motion (DBM) [76] and the Ornstein-Uhlenbeck process (OU) [77].

FBM is the solution of the stochastic differential equation

\[
dX_t^i = \sigma\, dB_t^{H,i}, \quad i = 1, 2, \tag{6}
\]

where the parameter $\sigma > 0$ is the scale of the process ($\sigma = \sqrt{2D}$), $H$ is the Hurst parameter ($H = \alpha/2$) and $B_t^H$ is a continuous-time Gaussian process that starts at zero, has zero expectation and has the following covariance function:

\[
\mathbb{E}\left( B_t^H B_s^H \right) = \frac{1}{2} \left( |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right). \tag{7}
\]

For $H < 1/2$ (i.e. $\alpha < 1$), FBM generates subdiffusive motion, and it reduces to free diffusion for $H = 1/2$. For $H > 1/2$, FBM generates superdiffusive motion (Fig. 2a).

The directed Brownian motion, also known as diffusion with drift, is the solution to

\[
dX_t^i = v_i\, dt + \sigma\, dB_t^{1/2,i}, \quad i = 1, 2, \tag{8}
\]

where $v = (v_1, v_2) \in \mathbb{R}^2$ is the drift parameter. This process generates superdiffusion related to the active transport of particles driven by molecular motors. The velocity of the motors is modeled by the parameter $v$ (Fig. 2b). For $v = 0$, the process reduces to normal diffusion.

The Ornstein-Uhlenbeck process is known to model confined diffusion, which is a subclass of subdiffusion (Fig. 2c). It corresponds to a particle inside a potential well and is a solution to the following stochastic differential equation:

\[
dX_t^i = -\lambda_i \left( X_t^i - \theta_i \right) dt + \sigma\, dB_t^{1/2,i}, \quad i = 1, 2, \quad \theta_i \in \mathbb{R}. \tag{9}
\]

Here, $\theta = (\theta_1, \theta_2)$ is the equilibrium position of the particle and $\lambda_i$ measures the strength of the interaction. For $\lambda_i = 0$, OU reduces to normal diffusion as well.
[FIG. 2: sample trajectories (x, y in μm) of the three models and the corresponding MSD curves, log(MSD(t)) vs. log(t) [log(ms)].]