[PDF] Screening by changes in stereotypical behavior during cell motility

Abstract

Stereotyped behaviors are series of postures that show very little variability between repeats. They have been used to classify the dynamics of individuals, groups and species without reference to the lower-level mechanisms that drive them. Stereotypes are easily identified in animals due to strong constraints on the number, shape, and relative positions of anatomical features, such as limbs, that may be used as landmarks for posture identification. In contrast, the identification of stereotypes in single cells poses a significant challenge as the cell lacks these landmark features, and finding constraints on cell shape is a non-trivial task. Here, we use the maximum caliber variational method to build a minimal model of cell behavior during migration. Without reference to biochemical details, we are able to make behavioral predictions over timescales of minutes using only changes in cell shape over timescales of seconds. We use drug treatment and genetics to demonstrate that maximum caliber descriptors can discriminate between healthy and aberrant migration, thereby showing potential applications for maximum caliber methods in automated disease screening, for example in the identification of behaviors associated with cancer metastasis.

Full PDF

SScreening by changes in stereotypical behaviorduring cell motility

Luke Tweedy , Patrick Witzel , Doris Heinrich

3, 4 , Robert H. Insall , and Robert G.Endres

1, * Department of Life Sciences and Centre for Integrative Systems Biology and Bioinformatics, Imperial College,London, United Kingdom CRUK Beatson Institute, Glasgow G61 1BD, Scotland, UK Fraunhofer Institute for Silicate Research ISC, Neunerplatz 2, 97082, W ¨urzburg, Germany Leiden Institute of Physics, LION, Leiden University * Corresponding Author

ABSTRACT

Stereotyped behaviors are series of postures that show very little variability between repeats. They have been used to classifythe dynamics of individuals, groups and species without reference to the lower-level mechanisms that drive them. Stereotypesare easily identiﬁed in animals due to strong constraints on the number, shape, and relative positions of anatomical features,such as limbs, that may be used as landmarks for posture identiﬁcation. In contrast, the identiﬁcation of stereotypes in singlecells poses a signiﬁcant challenge as the cell lacks these landmark features, and ﬁnding constraints on cell shape is a non-trivialtask. Here, we use the maximum caliber variational method to build a minimal model of cell behavior during migration. Withoutreference to biochemical details, we are able to make behavioral predictions over timescales of minutes using only changes incell shape over timescales of seconds. We use drug treatment and genetics to demonstrate that maximum caliber descriptorscan discriminate between healthy and aberrant migration, thereby showing potential applications for maximum caliber methodsin automated disease screening, for example in the identiﬁcation of behaviors associated with cancer metastasis.

Introduction

Moving in the right way at the right time can be a matter of life and death. Whether avoiding a predator or searching forfood, choosing the correct movements in response to speciﬁc stimuli is a crucial part of how an organism interacts with itsenvironment. The repetitive, highly coordinated movements that make up behavioral stereotypes have been shown to beentwined with survival strategies in a number of species, for example the incredible correlation in posture between prey captureevents in raptors and the escape response of C. elegans when exposed to extreme heat . Understanding these stereotypes isvital to creating a full picture of a species’ interactions with its environment. If stereotypes represent evolved, selection-drivenbehavior in animals, might the same not be true for single-celled organisms?This point of view may be particularly useful in understanding chemotaxis, the guided movement of a cell in response toa chemical gradient. During chemotaxis, eukaryotic cells change their shape through the repeated splitting and extension ofactin-rich structures called pseudods . Though this behavior is well known, the study of chemotaxis has traditionally focusedon the signaling events that regulate cytoskeletal remodeling. Even where pseudopods are acknowledged to be relevant, thefocus is on the biochemical mechanisms that generate and regulate them . These mechanisms are, however, staggeringlycomplex and the way chemotaxis emerges from these lower-level processes remains largely unknown. Rather than delvingdeeper into the network of biochemical interactions, we can instead learn from the shape changes and movements that thisintricate machine has evolved to produce. Such an approach, also known as morphological proﬁling, shows great promise inbiomedicine .Here, we explore this question using Dictyostelium discoideum , a model for chemotaxis noted for its reproducible directionalmigration towards cyclic adenosine monophosphate (cAMP) , which it senses using typical G-protein coupled receptors. Tocapture cell shape (or posture) at any given point in time, we employ Fourier shape descriptors, a rotationally invariant methodof quantifying shapes used previously to show that cell shape and environment are intrinsically linked (Fig. 1A). These shapedata are naturally of high dimensionality, making further analysis difﬁcult. We reduce their dimensionality using principalcomponent analysis (PCA), a method used previously to obtain the key directions of variability from the shapes of cells and animals (Fig. 1B) . Our ﬁnal challenge (and the focus of this paper) is to quantify behavior, which we deﬁne as themovement between shapes. There are many potential ways to do so , however we have adapted the variational maximum a r X i v : . [ q - b i o . CB ] J un aliber (MaxCal) approach to this end. These methods have several advantages over conventional alternatives: Firstly,Fourier descriptors capture all available information on shape, and the subsequent PCA provides a natural quantitative means ofdiscarding its less informative elements. Easier methods, such as measuring aspect ratio or eccentricity, require us to assumethat they provide a useful description a priori , and cannot inform us how much (or what) we have discarded for the sake ofsimplicity. Secondly, our chosen methods are blind to the researcher’s preconceptions, as well as to previous descriptions ofshape and behavior. Any behavioral modes identiﬁed have emerged from the data without our deciding on (and imposing)them, as we might if using supervised machine learning or ﬁtting parameters to a preconceived biochemical model. Finally,the minimal model we construct using maximum caliber makes no reference to any underlying biochemistry and thereforecannot make potentially incorrect assumptions about it. We demonstrate the usefulness of these methods by showing that theysuccessfully discriminate between the behavior of drug-treated or genetically altered cells and their parental strains. Results

Maximum caliber approach to behavioral classiﬁcation

Cells continuously change shape as they migrate, creating trajectories in the space of shapes that are speciﬁc to their situation.For example, we have previously shown that cells follow different shape trajectories in environments with low and highchemoattractant signal-to-noise ratios , here deﬁned as the local gradient squared over the background concentration (Fig.1C). In this example, it is important to note that the distributions of cell shape for each condition overlap signiﬁcantly. Thismeans that it is not always possible to accurately determine the cell’s condition from a static snapshot of its shape. In contrast,the dynamics of shape change in each condition are clearly distinct. Our aim here is to quantify the details of these shapechanges, making a small set of values that can act as a signature for a given mode of behavior. We can then use such signaturesto quantitatively compare, or to discriminate between, various conditions or genotypes. To this end, we employ the MaxCalmethod (Fig. 2A).MaxCal was originally proposed by E. T. Jaynes to treat dynamical problems in physics, and is much like his better-knownmaximum entropy (MaxEnt) method used in equilibrium physics. The motivation is the same for both; we wish to ﬁnd theprobability of a given structure for a system in a way that neither assumes something we do not know, nor contradicts somethingwe do know, i.e. an observation of the system’s behavior. In the case of MaxEnt, this is achieved by ﬁnding the maximumShannon entropy with respect to the probabilities of the set of possible conﬁgurations the system can take . MaxCal usesthe probabilities of the possible trajectories a dynamical system can follow instead. In this case, the entropy is replaced bythe caliber C , , so called because the ﬂow rate in a pipe relates to its caliber, or internal diameter. In essence, the methodextracts the degree to which different rates or events within the system are coupled, or co-occur beyond random chance. Thismethod has previously been used to sucessfully predict the dynamics of neuronal spiking, the ﬂocking behavior of birds andgene regulatory networks .Generally, the caliber takes the form C ( { p j } ) = − ∑ j p j ln ( p j ) + µ ∑ j p j + ∑ n λ n ∑ j S n , j p j , (1)where p j is the (potentially time-dependent) probability of the j th trajectory. The ﬁrst term on the right-hand side of Eq. (1)represents a Shannon-entropy like quantity and the second ensures that the p j are normalized. The third constrains the averagevalues of some properties S n , j of the trajectories j to the values of some macroscopically observed quantities (cid:104) S n (cid:105) , making surewe do not contradict any known information.By maximizing the caliber, the probabilities of the trajectories p j = Q − exp (cid:18) ∑ n λ n S n , j (cid:19) (2)are found, where Q = ∑ j exp ( ∑ n λ n S n , j ) is the dynamical partition function and { λ n } are a set of Lagrange multipliers. ThisBoltzmann-type distribution fulﬁls detailed balance, even for non-equilibrium problems. Practically, the problem is to ﬁndthese Lagrange multipliers (and hence, the partition function). To this end, we exploit their relations to the externally observedaverage values of some quantities (cid:104) S n (cid:105) = ∂ ln Q ∂ λ n , (3)where the values of the (cid:104) S n (cid:105) are determined from experiment. This training process is equivalent to maximum-likelihood ﬁttingto observed data. s our interest is in cell shape and motility, we derive our values for the (cid:104) S n (cid:105) from the shape dynamics of migrating Dictyostelium cells. In order to effectively parameterise our model, we must constrain the continuum of possible shape changesto a much smaller set of discrete unit changes in our principal components (PCs). We therefore build our model from discretizedvalues of the shape measures PC 1 and 2, assigning them to the variables N and N , respectively. Their values are analagous toparticle numbers in a chemical reaction. The switching between continuous and discrete variables is possible as σ x (cid:104) N x (cid:105) ≈ .

035 issmall with x = , PC x and σ x the standard deviation. We reduce the size of the time-step δ t until we no longer observechanges greater than 1 in a single δ t (similar to deviations of master equations). As PC 2 accounts for less overall variation thanPC 1, (see Fig. 1B), we naturally reach this minimal change for a much larger value of δ t , which is undesirable because by thetime δ t is small enough for PC 1, changes in PC 2 are almost never observed, making correlations between the two PCs difﬁcultto detect. We therefore scale all changes in PC 2 by a factor of σ / σ in order that unit changes are observed in both PCs for asimilar value of δ t . Practically, our training data yielded a δ t of 0.1875s (as each 3s frame in the video data was divided into 16sections, in which the intermediate PC values were linearly interpolated).The advantage of limiting the possible macroscopic shape changes in δ t to the following: an increase, a decrease, or nochange in each PC. As changes in each PC can be considered independently, this gives us a total of 3x3 = 9 cases (that is, nochange form the current position, or a move to any of the 8 neighbouring spaces, see Fig. 2A inset). These macroscopic casesare taken to be the observable effects of an underlying microscopic structure. From our analogy of a chemical reaction, we treatincreases to be particle creation and denote the microscopic variable for an increase in trajectory j as S x + , j , where x ∈ { , } correspond to PC 1 and 2, respectively. For small δ t this variable is binary, taking the value 1 when N x increases over a singletime-step and taking the value 0 otherwise. Decreases will be treated as particle decay, with N x separate variables { S x , i − , j } areused to denote decays for the i th particle, with 1 ≤ i ≤ N x . These { S x , i − , j } are equal to 1 if the i th particle decays in δ t and equalto 0 otherwise. Hence, in each δ t there are N x + N x (Fig. 2B). We choose such a ﬁrst-order decay over a zeroth-order decay in order to introduce avirtual force, bringing the system back toward the mean (see Fig. S2). As the two components may change independently, thereare ( N + )( N + ) possible microtrajectories in a single δ t over PC 1 and 2. Applying Eq. 3, we constrain the probabilitiesof these microtrajectories such that they agree with the macroscopically observed rates (cid:104) S x α (cid:105) , with α ∈ { + , −} an increase ordecrease in component x , respectively.We then expand the model to include a following time-step, allowing us to capture short-term correlations between events.This increases the number of possible trajectories substantially. The number of microtrajectories in a given time-step dependson N x at time t + δ t , and this quantity is different dependent on the pathway taken in the ﬁrst time-step, so we must include thishistory dependence. For example, a reduction in component x can happen in N x ways, and will cause N x to go down by one.This change is followed by ( N x − ) + N x (cid:0) N x + (cid:1) microtrajectories in which there is a decrease in the ﬁrst time-step. Accounting for the effectof the changing values of N and N over interval t , t + δ t in each microtrajectory on the interval starting at time t + δ t , thenumber of microtrajectories over 2 δ t is ( N + N + )( N + N + ) . Each observable has a corresponding value in trajectory j of S xy αβ , j , which is 1 if the correlation is observed and 0 otherwise. We can reduce this to 10 time-correlated observables byassuming symmetry under order-reversal, i.e. S xy αβ , j ≡ S yx βα , j (Fig. 2C). This assumption is justiﬁed: if we consider a negativelycorrelated movement between PC1 and PC2, we may see transitions in the order 1 + , − , + . Here the two couplets 1 + , − and2 − , + both represent the same phenomenon (see Fig S3). This leads to an additional 16 observables (cid:104) S αβ xy (cid:105) , where x , y ∈ { , } are shape PCs and α , β ∈ {− , + } denote a change in the component displayed above. We constrain our analysis to the ﬁrsttwo shape components only, as further components account for relatively little residual variance in shape, whilst increasingcomputational complexity geometrically.As an example, we show the partition function in a single shape component, in which there are 5 observables, {(cid:104) S + (cid:105) , (cid:104) S ++ (cid:105) , (cid:104) S + − (cid:105) , (cid:104) S − (cid:105) , (cid:104) S −− (cid:105)} : Q N = γ + (cid:2) γ + γ ++ + + (cid:0) N + (cid:1) γ − γ + − (cid:3) + N γ − (cid:2) γ + γ + − + + (cid:0) N − (cid:1) γ − γ −− (cid:3) + γ + + + N γ − , (4)where γ α = e λ α corresponds to a rate (when divided by δ t ), with λ α the Lagrange multiplier associated with observable (cid:104) S α (cid:105) . The ﬁrst line in Eq. (4) shows all possible transitions that begin with an increase over the ﬁrst time-step, and so thewhole line shares the factor γ + , the rate of increase. A subsequent increase contributes a further γ + , as well as a couplingterm γ ++ which allows us to capture the likelihood of adjacent transitions beyond the naive probability γ + γ + . A subsequentdecrease can happen in N + γ − . The term γ + − is a coupling constant, controling thelikelihood of an adjacent increase and decrease beyond the naive probability γ + γ − . Finally, the +1 allows for the possibility of o transition occurring in the subsequent time-step. The second and third lines correspond to a decrease in the ﬁrst time-step,and no transition occurring in the ﬁrst time-step, respectively.The Lagrange multipliers corresponding to observables are found using Eq. (3), which yields a set of equations to be solvedsimultaneously (see supplementary material for details). In the case of a single component, these equations are (cid:104) S + (cid:105) = γ + (cid:20) γ + γ ++ + + ( N + ) γ − γ + − (cid:21) (5a) (cid:104) S − (cid:105) = γ − (cid:20) ( N + ) γ + γ + − + N + N ( N − ) γ − γ −− (cid:21) (5b) (cid:104) S ++ (cid:105) = γ + γ + γ ++ (5c) (cid:104) S + − (cid:105) = N γ + γ − γ + − (5d) (cid:104) S −− (cid:105) = N ( N − ) γ − γ − γ −− . (5e)The equations for the two-component partition function and Lagrange multipliers can be found in the SI. This method effectivelyallows us to build a map of the commonality of complex, correlated behaviors relative to basic rates of shape change (asquantiﬁed using principal components). For a given Lagrange multiplier governing a particular correlation, a value less thanzero indicates a behavior that is less common than expected, and a value greater than zero represents a behavior that is morecommon. Stereotypical behavior without biochemical details

After training our model on

Dictyostelium shape trajectories, we conﬁrmed that the method had adequately captured theobserved correlations by using them to simulate the shape changes of untreated cells responding to cAMP. In order to illustratethe importance of the correlations, we also ran control simulations trained only on the basic rates of increase and decreasein each PC without these correlations. We compared the activities of the uncorrelated and correlated simulations against theobserved data. The uncorrelated model acts entirely proportional to the observed rates (though, interestingly, did not matchthem; Fig. 2D). In contrast, individual cells from the experimental data show very strong anticorrelation, with increases inone component coupled with decreases in the other. This behavior is clearly replicated by the correlated simulations, in bothcases appearing in the plot as a red diagonal from the bottom left to the top right. Furthermore, we see suppression of turningbehavior in both PCs, with the most poorly represented activity (relative to chance) being a switch in direction in either PC (forexample 1+ followed by 1-). This too is reﬂected in the correlated simulations.The predictive power of MaxCal simulations goes beyond those correlations on which they were directly trained: We testedthe simulations’ ability to predict repetition of any given transition. These patterns took the form of N transitions in T timesteps, e.g. ﬁve 1+ transitions in ten time-steps. The MaxCal model predicted frequencies of appearance for these patterns thatclosely resembled the real data (Fig. 3A, model in red, real data in black). In contrast, the uncorrelated model predicted patternsat a much lower rate, for example there are runs of 5 consecutive increases in PC 1 in the real data at a rate of around one in1.35 minutes. The correlated model predicts this pattern rate to be one in every three minutes. The uncorrelated model predictsthe same pattern at a rate of one in 6.67 hours. This result indicates that no higher-order correlations are required to recapitulatethe data, allowing us to avoid the huge increase in model complexity that their inclusion would entail.The greater predictive power of the MaxCal model is reﬂected by its lower Jensen-Shannon divergence from the observeddata for these kinds of pattern (Fig. 3B). The MaxCal model also more closely matches the observed probabilities of generatinga given number of transitions in a row, with predictions almost perfect up to 4 transitions in a row (twice the length-scale of themeasured correlations), and far stronger predictive power than the uncorrelated model over even longer timescales (Fig. 3C). A real world application of MaxCal methods to discriminate between genotypes

We wondered whether the MaxCal methods would accurately discriminate between biologically relevant conditions. Toinvestigate this, we used two comparisons. First, we compared shape data from control AX2 cells against the same cellstreated with two drugs targeting cytoskeletal function: the phospholipase A2 inhibitor p-bromophenacyl bromide (BPB) and thephosphoinositide 3-kinase inhibitor LY294002 (LY) (for details, see ). Second, we compared a stable myosin heavy-chainknockout against its parent strain (again AX2) (Fig. 4A). We ﬁrst looked at the effects of these conditions on the distributionof cell shapes, to see whether their effect could be identiﬁed from shape, rather than behavior. The drug treatment causeda substantial change in the distribution within the population (Fig. 4B), but still left a substantial overlap. In contrast, the hcA − cells showed no substantial difference to their parent in shape distribution (Fig 4C). In both cases, the identiﬁcation of acondition from a small sample of shape data would not be feasible.We then compared the behavioral Lagrange multipliers of each condition, found by MaxCal, producing distributions for theestimated values of these by bootstrapping our data (sampling with replacement). The values of γ + − and γ + − are lower inthe untreated condition than those in drug-treated condition, indicating the persistence of shape change in WT cells (Fig. 4D).The anticorrelation between PCs 1 and 2 through pseudopod splitting is reﬂected in γ + − and γ − + , both of which have valuesgreater than 1 in WT cells. In comparison, the drug-treated cells have only a moderate anticorrelation. In the mhcA − strain, thedifferences in the values of γ + − , γ + − and γ − + when compared with their parent show similar changes to those observed indrug treatment (Fig. 4E). In both cases, the differences highlighted by these dynamical measurements are striking.We then applied the MaxCal model to the task of classiﬁcation. We settled upon classiﬁcation using k -nearest-neighbors(kNN). In order to see how the strength of our prediction improved with more data, we classiﬁed based on the preferred class of N repeats, all known to come from the same data set. We estimated the classiﬁcation power of our methods by cross validation,dividing the drug-treated data and its control into three sets containing different cells, and dividing the mhcA − and its parentby the day of the experiment. We ﬁrst performed the classiﬁcation by shape alone, taking small subsamples of frames fromeach cell and projecting them into their shape PCs, with our distance measure for the kNN being the Euclidean distance inthese PCs. With one, two or 3 PCs, we were able to achieve reasonable classiﬁcation of the drug-treated cells against theirparent as data set size increased, with the accuracy of classiﬁcation leveling off at around 0.85 (with 1 being perfect and 0.5being no better than random chance, Fig. 5A-C, blue). In contrast, classiﬁcation of mhcA − cells was little better than randomchance, even with relatively many data (Fig. 5A-C, green). This is unsurprising given the similarity of the distributions ofthese two conditions. We then calculated our MaxCal multipliers for subsamples of each of these groups, bootstrapping 100estimates from 20% of each set. We then repeat our kNN classiﬁcation, instead using for a distance measure the two MaxCalvalues that best separate our training classes. As the test data come from entirely separated sources (in the case of drug-treatedcells coming from different cells, and in the case of the mhcA − being taken on different days), we can be conﬁdent that we donot conﬂate our training and test data. In both the drug-treated case and the mhcA − mutant, the dynamics differ very cleanlybetween our test and control data. As such, our classiﬁcation is close to perfect even for only a few samples (Fig. 5D).As the two Lagrange multipliers that best classiﬁed the data both encoded correlations between adjacent time-steps, weguessed that this short-term memory might be key to recapitulating the dynamic properties of cell shape change. A key aspect ofthe shape dynamics of AX2 cells is the anticorrelation between the ﬁrst two PCs at the single-cell level (which is deﬁnitionallyabsent at the population level, as PCA produces uncorrelated axes). To see if memory is vital to recapitualting this dynamicalaspect of cell shape change, we constructed two versions of the master equation for our MaxCal system (see SI for details). Theﬁrst is Markovian (that is, at a given time the probabilities of each possible next event only depend on the current state of thesystem). We ran Gillespie simulations corresponding to this master equation, and compared the correlations of trajectories fromthese simulations with those from real data. The expected anticorrelation is clearly observed in the data (Fig 5E, black line), butthe trajectories of our Markovian Gillespie simulations fail to recapitulate it (Fig. 5E, blue line).We then introduced a memory property to the simulations, allowing the probabilities of each possible event to depend on thenature of the previous event (with the very ﬁrst event taken using the uncorrelated probabilities). The model has nine possiblestates (with each state containing its own set of event probabilities), corresponding to the nine possible events that might havepreceeded it. These are an increase, a decrease, or no change in each PC indepenently (3x3 events). These non-Maroviansimulations recovered the distribution of correlations observed in the data (Fig. 5E, red line). This indicates that such featuresof cell shape change can only be addressed by methods that acknowledge a dependency on past events. Discussion

Eukaryotic chemotaxis emerges from a vast network of interacting molecular species. Here, instead of examining the moleculardetails of chemotaxis in

Dictyostelium discoideum , we have inferred properties that capture cell behavior from observationsof shape alone. For this purpose, we quantiﬁed shape using Fourier shape descriptors, reduced these shape data to a small,tractable dimensionality by principal component analysis, and built a minimal model of behavior using the maximum calibervariational method. Unlike conventional modeling approaches, such as master equations and their simpliﬁcations, our methodis intrinsically non-Markovian, capturing memory effects and short-term history in the values of the behavioral signature ityields (see SI for further discussion of memory, and a comparison to the master equation). Our approach has the advantageof ease, requiring only the observation of what a cell naturally does , without tagging or genetic manipulation, as well as ofgenerality, being independent from the speciﬁc and poorly understood biochemistry of any one cell type. This is important tounderstanding chemotaxis, as the biochemistry governing this process can vary greatly: for example, the spermatazoa of C.elegans chemotax and migrate with no actin at all , but strategies for accurate chemotaxis might be shared among biologicalsystems and cell types.A number of recent studies have demonstrated the importance of pairwise or short-scale correlations in determining complex ehaviors both in space and time. The behavior of large ﬂocks of starlings can be predicted from the interactions of individualbirds with their nearest neighbors , and the pairwise correlations between neurons can be used to replicate activity at muchhigher coupling orders, correctly reproducing the coordinated ﬁring patterns of large groups of cells . Furthermore, cells inmany circumstances use short-range spatial interactions to organise macroscopically . Interestingly, these systems appearto exhibit self-organized criticality , in which the nature of their short-range interactions leads to periods of quiescencepuntuated by sudden changes. This could indicate the coupling strengths inherent to a system (such as the temporal correlationsin our shape modes in Dictyostelium cells) are crucial for complex behavior. Absence of this behavior could be an indicator ofdisease as illustrated by both of our aberrant cell types.Here, we employ a very simple classiﬁer to demonstrate the usefulness of our MaxCal multipliers as a measurementby which we can classify cell behaviours. We choose MaxCal because it is a minimal, statistical approach to modelling acomplex phenomenon, allowing high descriptive power with no assumptions made about the underlying mechanism. As ourunderstanding of the molecular biology controlling cell shape improves, an interesting alternative would be to use our data intraining recurrent neural network (RNN) auto-encoders, a self-supervised method in which the neural network trains a model toaccurately represent the input data. In particular, long short-term memory RNNs have recently been used to accurately identifymitotic events in cell video data and classes of random migration based on cell tracks . The two approaches are not mutuallyexclusive; MaxCal can provide a neat, compressed basis in which to identify behavioural states of cells, whilst RNNs could beused to learn time-series rules for transitions between behavioural states.It is increasingly clear that cell shape is a powerful discriminatory tool . For example, diffeomorphic methods of shapeanalysis have the power to discriminate between healthy and diseased nuclei . Shape characteristics can also be used asan indicator of gene expression : an automated, shape-based classiﬁcation of Drosophila haemocytes recently showed thatshape characteristics correlate with gene expression levels, speciﬁcally that depletion of the tumor suppressor PTEN leads topopulations with high numbers of rounded and elongated cells. Of particular note is the observation from this study that genesregulate shape transitions as opposed to the shapes themselves, illustrating the importance of tools to quantify behavior as wellas shape. This may be an appropriate approach to take if, for example, creating automated assistance for pathologists whenclassifying melanocytic lesions (a task which has already proved tractable to automated image analyses ), as classes are fewin number, predeﬁned and extensive training data are available. A drawback of the method used by is that their classes aredecided in advance, and the divisions between them are arbitrary. This means that the method cannot ﬁnd novel importantfeatures of shape by deﬁnition, as it can only pick between classes decided upon by a person in advance.A stronger alternative would be to take some more general description of shape and behavior (such as the one we detailhere), which could be used to give biopsied cells a quantitative signature. Training would then map these data not onto discreteclasses, but onto measured outcomes based on the long-term survival of patients. It will be important for any such methodto account for the heterogeneity of primary tissue samples as small sub-populations, lost in gross measurements, may be keydeterminants of patient outcomes. Such an approach would allow a classiﬁer to identify signs of disease and metastatic potentialnot previously observed or conceived of by the researchers themselves. As machine learning advances, it will be vital to specifythe problem without the insertion of our own biases. Then, behavioral quantiﬁcation will become a powerful tool for medicine. Cell culture.

The cells used in our experiments are either of the

Dictyostelium discoideum

AX2 strain, or a stable myosinheavy-chain knockout ( mhcA − ) in an AX2 background. Cells are grown in a medium containing 10 µ g / mL Geneticin 418disulfate salt (G418) (Sigma-Aldrich) and 10 µ g / mL Blasticidine S hydrochloride (Sigma-Aldrich). Cells are concentrated to c = × cells / mL in shaking culture (150 rpm). Five hours prior to the experiment, cells are washed with 17 mM K-NaPBS pH 6.0 (Sigma-Aldrich). Four hours prior to the experiment, cells are pulsed every 6 minutes with 200nM cAMP, and areintroduced into the microﬂuidic chamber at c = . × cells / mL . Measurements are performed with cells starved for 5-7 h.Drug-treated cells were exposed to 200 pM p-bromophenacyl bromide and 50 nM LY294002. No. cells sampled in AX2 control,drug-treated, AX2 parent, mhcA − are, respectively, 313, 23, 858,198. Microﬂuidics and imaging.

The microﬂuidic device is made of a µ -slide 3-in-1 microﬂuidic chamber (Ibidi) as describedin (12), with three 0 . × . mm inﬂows that converge under an angle of α = ◦ to the main channel of dimension 0 . × . × . mm . Both side ﬂows are connected to reservoirs, built from two 50 ml syringes (Braun Melsungen AG), separatelyconnected to a customized suction control pressure pump (Nanion). Two micrometer valves (Upchurch Scientiﬁc) reduce theﬂow velocities at the side ﬂows. The central ﬂow is connected to an infusion syringe pump (TSE Systems), which generates astable ﬂow of 1 ml / h . Measurements were performed with an Axiovert 135 TV microscope (Zeiss), with LD Plan-Neoﬂuarobjectives 20 x / . N . A . and 40 x / . N . A . (Zeiss) in combination with a DV2 DualView system (Photometrics). A solutionof 1 µ M Alexa Fluor 568 hydrazide (Invitrogen) was used to characterize the concentration proﬁle of cAMP (Sigma-Aldrich)because of their comparable molecular weight. mage preprocessing.

We extracted a binary mask of each cell from the video data using Canny edge detection, thresh-olding, and binary closing and ﬁlling. The centroid of each mask was extracted to track cell movement. Overlapping masksfrom multiple cells were discarded in order to avoid unwanted contact effects, such as distortions through contact pressure andcell-cell adhesion. For each binary mask, the coordinates with respect to the centroid of 64 points around the perimeter wereencoded in a complex number, with each shape therefore recorded as a 64 dimensional vector of the form S = x + i y . Thesevectors were passed through a fast Fourier transform in order to create Fourier shape descriptors. Principal component analysiswas performed on the power spectra (with the power spectrum P ( f ) = | s ( f ) | for the frequency-domain signal s ( f ) ) to ﬁndthe dominant modes of variation. This approach is superior to simple descriptors such as circularity and elongation, as keydirections of variability within the high-dimensional shape data cannot be known a-priori. As we have previously reported ,90% of Dictyostelium shape variation can be accounted for using the ﬁrst three principal components (PCs), corresponding tothe degree of cell elongation (PC 1), pseudopod splitting (PC 2) and polarization in the shape (PC 3) (Fig. 1B), with around85% of variability accounted for in just two, and 80% in one.

We are grateful to Börn Meier for sharing his data, and to both André Brown, Linus Schumacher and Peter Thomason fora critial reading of the manuscript. This work was supported by Cancer Research UK core funding (L.T.), the DeutscheForschungs-gemeinschaft (DFG fund HE5958/2-1), the Volkswagen Foundation grant I/85100 (D.H), and the BBSRC grantBB/N00065X/1 (R.G.E.) well as the ERC Starting Grant 280492-PPHPI (R.G.E.).

Author contributions statement

LT and RGE designed the study. LT and PW performed the experiments, and LT conducted data analysis and modelling. Allauthors (LT, PW, DH, RHI, RGE) analyzed results and data, and wrote the paper.

Additional information

Competing interests

All authors declare that there is no conﬂict of interest, neither ﬁnancial nor non-ﬁnancial.

References Csermely D, Mainardi D, Agostini N (1989) The predatory behaviour of captive wild kestrel, Falco tinnunculus L.

Bull Zool Stephens G J, Johnson-Kerner B, Bialek W, Ryu W S (2008) Dimensionality and dynamics in the behavior of C. elegans.

PLoS Comput Biol Andrew N, Insall R H (2007) Chemotaxis in shallow gradients is mediated independently of PtdIns 3-kinase by biasedchoices between random protrusions.

Nat Cell Biol Neilson MP et al (2011) Chemotaxis: A feedback-based computational model robustly predicts multiple aspects of real cellbehaviour.

PLoS Biol van Haastert P J M (2010) A model for a correlated random walk based on the ordered extension of pseudopodia. PLoSComput Biol Otsuji M, Terashima Y, Ishihara S, Kuroda S, Matsushima K (2010) A conceptual molecular network for chemotacticbehaviors characterized by feedback of molecules cycling between the membrane and the cytosol.

Sci Signal Westendorf C et al (2013) Actin cytoskeleton of chemotactic amoebae operates close to the onset of oscillations.

Proc NatAcad Sci USA Davidson A J, Amato C, Thomason P A, Insall RH (2017) WASP family proteins and formins compete in pseudopod- andbleb-based migration.

J Cell Biol jcb.201705160 Swaney K F , Huang C H , Devreotes P N (2010) Eukaryotic chemotaxis: a network of signaling pathways controls motility,directional sensing, and polarity.

Ann Rev Biophys

Marklein R A, Lam J, Guvendiren M, Sung K E, Bauer S R (2017) Functionally-Relevant Morphological Proﬁling: A Toolto Assess Cellular Heterogeneity.

Trends Biotech van Haastert P J M , Postma M (2007) Biased random walk by stochastic ﬂuctuations of chemoattractant-receptorinteractions at the lower limit of detection.

Biophys J Tweedy L, Knecht D A, Mackay G M, Insall R H (2016) Self-Generated Chemoattractant Gradients: Attractant DepletionExtends the Range and Robustness of Chemotaxis.

PLoS Biol

Tweedy L, Meier B, Stephan J, Heinrich D, Endres R G (2013) Distinct cell shapes determine accurate chemotaxis.

Sci.Rep.

Keren K et al (2008) Mechanism of shape determination in motile cells. it Nature 453:475-480.

Yin Z et al (2013) A screen for morphological complexity identiﬁes regulators of switch-like transitions between discretecell shapes.

Nat Cell Biol

Broekmans O D, Rodgers J B, Ryu W S, Stephens G J (2016) Resolving coiled shapes reveals new reorientation behaviorsin C. elegans. eLife

Gyenes B, Brown A E X (2016) Deriving Shape-Based Features for C. elegans Locomotion Using DimensionalityReduction Methods.

Front Behav Neurosci

Valletta J J, Torney C, Kings M, Thornton A, Madden J (2017) Applications of machine learning in animal behaviourstudies.

Animal Behav.

Gomez-Marin A, Stephens G J, Brown A E X (2016) Hierarchical compression of Caenorhabditis elegans locomotionreveals phenotypic differences in the organization of behaviour

Front Behav Neurosci

Pressé S, Ghosh K, Phillips R, Dill K A (2010) Dynamical ﬂuctuations in biochemical reactions and cycles.

Phys Rev E

Pressé S, Ghosh K, Lee J, and Dill KA (2013) Principles of maximum entropy and maximum caliber in statistical physics.

Rev Mod Phys

Jaynes E T (1980) The minimum entropy production principle.

Ann Rev Phys Chem

Schneidman E, Berry M J, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network statesin a neural population.

Nature

Cavagna, A et al (2014) Dynamical maximum entropy approach to ﬂocking.

Phys Rev E

Vasquez J C, Marre O, Palacios A G, Berry II M J, Cessac B (2012) Gibbs distribution analysis of temporal correlationsstructure in retina ganglion cells

J Physiology-Paris

Firman T, Balázsi G, Ghosh K (2017) Building Predictive Models of Genetic Circuits Using the Principle of MaximumCaliber

J Biophys J

Meier B et al (2011) Chemotactic cell trapping in controlled alternating gradient ﬁelds.

Proc Natl Acad Sci USA

Nelson G A, Roberts T M, Ward S (1982) Caenorhabditis elegans spermatozoan locomotion: amoeboid movement withalmost no actin.

J Cell Biol

Toner J, Tu Y (1995) Long-range order in a two-dimensional dynamical XY model: how birds ﬂy together.

Phys Rev Lett

Bialek W et al (2012) Statistical mechanics for natural ﬂocks of birds.

Proc Nat Acad Sci USA

De Palo G, Yi D, Endres RG (2017) A critical-like collective state leads to long-range cell communication in Dictyosteliumdiscoideum aggregation

PLoS Biol

Mora T, Bialek W (2011) Are biological systems poised at criticality?

J Stat Phys

Chialvo D R (2010) Emergent complex neural dynamics.

Nat Phys

Phan H T H, Kumar A, Feng D, Fulham M, Kim J (2018) Unsupervised two-path neural network for cell event detectionand classiﬁcation using spatio-temporal patterns.

IEEE T Med Imaging doi: 10.1109/TMI.2018.2885572.

Kimmel J C, Brack A S, Marshall W F (2019) Deep convolutional and recurrent neural networks for cell motilitydiscrimination. bioRxiv doi: 10:1101/159202.

Rohde G K, Ribeiro A J S, Dahl, K N, Murphy R F (2008) Deformation-based nuclear morphometry: capturing nuclearshape variation in HeLa cells.

Cytom Part A

Esteva A et al (2017) Dermatologist-level classiﬁcation of skin cancer with deep neural networks

Nature 542:115–118. (s)y(s)y x n = 5 A R e s i dua l v a r i an c e μ m1 3 590% n = 12 s PC 1“Elongation” PC 2“Splitting” PC 3“Polarization” B P C PC 1 PC 1 C Shallow attractant gradient Steepattractantgradient

Figure 1.

Fourier shape descriptors reveal low-dimensional shape space. (A) (left) The outline of a live cell is convertedto a set of coordinates, which are a function of the distance traveled around the perimeter, s . (right) The original outline can bereconstructed with increasing accuracy by increasing the number n of included Fourier components. We record 64 componentsfor each shape and use all of them in our analyses. (B) Principal component analysis (PCA) performed on Fourier spectra ofcell shapes from 900 cells in a wide range of chemical gradients reveals that 90(83)% of shape variability in

D. discoideum canbe accounted for in the ﬁrst three (two) principal components, corresponding to elongation, splitting and polarization in thespatial domain. The inset picture above each PC shows its reconstruction in the spatial domain (i.e. after reverse Fouriertransformation). Each is added to (solid line) and subtracted from (dashed line) the mean cell shape descriptor. In order toguarantee their invariance when rotated or ﬂipped we use only the power spectrum of the Fourier component, which renderstheir reconstructions symmetric. We show shapes here that are two standard deviations above (solid lines) and below (dashedlines) the mean shape in each PC. For Fourier contributions to each PC, see Fig. S1. For details on data collection and analysis,see Materials and Methods and . (C) Example trajectories in the PC 1 and PC 2 shape space for one low- and onehigh-signal-to-noise ratio cell. Example cell outlines from the two trajectories are superimposed in their correct positions. + δ t t+2 δ tN(t) Particles

N(t)+1N(t)-1N(t)-2N(t)+2 N ( t ) N ( t ) + N ( t ) N ( t )- t Time B t+ δ t t t+ δ t

1+ 1- 2+ 2- t+ δ t

1+ 1- 2+ 2-1+1-2+2- 1+ 1- 2+ 2- CD λ λ λ λ λ λ λ λ λ λ λ

11- - λ λ

12- - λ λ λ λ λ

22- - λ γ λ λ λ - - λ λ λ λ λ λ λ λ -+ λ No change λ λ A PC 2 PC 1 λ ++1 λ +-1 λ - -1 λ ++2 λ +-22 λ - -2 λ ++1 λ +-12 λ - + λ - -1 Coupled multiplier values calculated

Uncorrelated SimulatedObserved

Possible movesin 1 δ t Figure 2.

MaxCal trained simulations reproduce local correlations. (A)

The panel shows the trajectory of a cell in shapespace over time as it shortens, splits pseudopods, commits to one pseudopod and lengthens again. Our aim is to distil thiscomplex behavioral information into a small, quantiﬁable signature for this behavor in a manner that will yield similarsignatures for similar behaviors. We subdivide the shape space, and register speciﬁc small events when cells cross boundaries.The elements of our signature are a series of multipliers that are determined by the rates at which particular events are observed.We introduce multipliers for two types of events: simple events, which encode the average rate of increase or decrease in eachprincipal component of shape, and correlated events, which encode how often we see certain combinations of simple events inquick succession. For every pair of simple events there is a speciﬁc multiplier (see C). As multipliers for simple events encodethe average rate at which transitions between squares occur in any particular direction, the input data for these events are anytwo adjacent time points, for example the ﬁrst two red spots. Here we see that PC 1 has decreased, increasing the average valueof the multiplier λ − , and PC 2 has increased, thereby increasing the value of λ + . In contrast, the ﬁrst two yellow points occupythe same square, and as we are encoding the likelihood of transitions, the values of all simple event multipliers are lowered torepresent this inactivity. Multipliers for correlated events involve three adjacent time points, as they encode the likelihood ofany one simple event following another particular simple event. For example, the three red dots increase the value of threecorrelated multipliers. λ −− due to a decrease in PC 1 being adjacent to another decrease in PC 1, λ ++ due to an increase in PC2 being adjacent to another increase in PC 2, and λ − + due to a decrease in PC 1 being adjacent to an increase in PC 2. The setof yellow dots does not increase the likelihood of any such correlated events as there are no transitions in the ﬁrst time step.The contributions of the whole trajectory can eventually comprise a quantitative signature in which correlated events appearmore or less frequently than would be expected by chance (shown below the main diagram, with blue indicating a higher (andred a lower) likelihood. The calculation of the magnitudes of these contributions is complicated and makes up much of thispaper. Note that, though the example trajectory shown here is real, the sub-divisions and time points are instructional only.( Inset ) In a single timestep (1 δ t ), a cell may be observed to move to any of the eight neighbouring squares, or may move tonone of them, resulting in nine possible observed trajectories. (B) Diagram of all possible trajectories for a single shapecomponent over time interval 2 δ t . The redundancy of each path is indicated by the number above. (C) Correlation parametersinferred from data. Lagrange multipliers are found corresponding to increases ( λ + ) and decreases ( λ − ) in PC 1 and increases( λ + ) and decreases ( λ − ) in PC 2. Additional Lagrange multipliers controlling the rates at which these events occur inneighboring time-steps are shown in the table, and rates are assumed to be symmetric (i.e. event A followed by event B isequivalent to event B followed by event A). (D) Co-occurrences of transitions in neighboring time-steps are shown forsimulations based only on naive probabilities (without correlations, left), for data (centre) and for simulations that includeshort-term temporal correlations (right). All are scaled relative to the naive rates of transition observed in the data. + ObservedCorrelatedUncorrelated .01 .02 .05 .1 .2 .5 -7 -5 -3 P r obab ili t y D en s i t y P r obab ili t y P r ed i c t ed P r obab ili t y Observed probability Divergence (bits)

A BC

CorrelatedUncorrelated CorrelatedUncorrelated - - - - - - Time (s)

Figure 3.

MaxCal model reproduces long-term cell behavior (A) (top) Example track of naive transitions. Possiblepatterns are highlighted: three 2+ in a row and four 1- in a row are given as examples. This panel is illustrative- though thetrack shown does correspond to the shape transition in the cartoon, there are spaces in between all transitions here, and actual3-in-a-row transitions would be even denser. (Bottom) Observed versus predicted probabilities of various patterns of transitionsfor simulations containing (red) or ignoring (blue) short-term temporal correlations. The black line corresponds to perfectmatching of predicted and observed probabilities. Patterns are always for a single transition type (e.g. 1+ only), and are of theform “N transitions in T time-steps” for varying N and T, e.g. three 1+ transitions in four time-steps. T runs from 10 to 20 inunit steps, N runs from T-8 to T in unit steps. (B)

Jensen-Shannon divergence between patterns of transitions observed in thedata and both correlated (red) and uncorrelated (blue) simulations. (C)

Probability of observing repeated transitions of a singletype under 3 models- the correlated model (red), the uncorrelated model (blue), and when observed (black). Results shown for1+ (left) and 1- (right). The x-axis gives the number of repeats in a row of this transition. arentalDrug-treated mhcA - 3 - 2 - 1 0 1020406080100 λ λ λ λ DE -100 0 100 2000.000.050.100.150.20 -10 0 10 200.000.050.100.15 PC 1 PC 2 C PC 1 PC 2 A AX2 Parent mhcA - - AX2 DIC AX2 Phase mhcA - Drug treated B Figure 4.

Maximum caliber coupling strengths reﬂect behavioral differences. (A)

Representative phase contrast imagesof AX2 (L) cells and mhcA − (R) cells. Scale bar is 10 µ m . (B) Outlines of cells as they move for all conditions in the study. Weimaged drug-treated cells paired with an AX2 parental control in differential interference contrast (DIC) and the mhcA − controlwith an AX2 parent in phase contrast microscopy, so we present these controls separately. Here, for clarity, we show outlines atintervals of one minute (or thirty frames). (C) Distributions in shape space of the BPB/LY treated cells (red) and the mhcA − strain (green) vs their respective controls (blue). Darker red and green areas show where the compared conditions overlap. (D) Lagrange multipliers λ x = log γ x from MaxCal model trained on untreated cells (blue) and p-bromophenacyl bromide (BPB)and LY294002 treated cells (red). Untreated cells have coupling values for λ + − and λ + − much lower than 0, indicating thepersistence in the direction of cell shape change (or more accurately, the rarity of reversals). In contrast, treatment with causescells to have couplings values for λ + − and λ + − that are not signiﬁcantly different to zero, indicating a lack of such persistence.Untreated cells also have much higher values than drug-treated for λ + − and λ − + , indicating a stronger anticorrelation betweenthe two components. Remaining distributions of rates can be found in Fig. S4. The distributions of all 14 rates differsigniﬁcantly upon drug treatment α = .

001 (Kolmogorov-Smirnov test). (E)

The distributions for the same parameters for mhcA − cells (green) against their parental control (blue). The changes in λ + − , λ + − and λ − + show similar changes to thoseobserved in drug treatment. These experiments used phase contrast microscopy rather than the DIC used in the drug-treatedcase, leading to the differences in the extremity of these coupling coefﬁcients. l a ss i f i c a t i on sc o r e C

10 20 30 40 505 10 15 20

AX2 vs BPB/LYAX2 vs mhc -

Using 3 shape PCs Using MaxCal -1.0 -0.5 0.5 1.00.20.40.60.81.01.2Correlation (PC 1, PC2)PDF DataMemoryNo memory E C l a ss i f i c a t i on sc o r e A Using one shape PC B Using two shape PCs

10 20 30 40 500.50.60.70.80.91.0 10 20 30 40 500.50.60.70.80.91.0 D Figure 5.

Discriminators trained on MaxCal behavioral descriptors perform better than shape alone. (A-C)