[PDF] A framework for studying behavioral evolution by reconstructing ancestral repertoires

Abstract

Although extensive behavioral changes often exist between closely related animal species, our understanding of the genetic basis underlying the evolution of behavior has remained limited. Here, we propose a new framework to study behavioral evolution by computational estimation of ancestral behavioral repertoires. We measured the behaviors of individuals from six species of fruit flies using unsupervised techniques and identified suites of stereotyped movements exhibited by each species. We then fit a Generalized Linear Mixed Model to estimate the suites of behaviors exhibited by ancestral species, as well as the intra- and inter-species behavioral covariances. We found that much of intraspecific behavioral variation is explained by differences between individuals in the status of their behavioral hidden states, what might be called their "mood." Lastly, we propose a method to identify groups of behaviors that appear to have evolved together, illustrating how sets of behaviors, rather than individual behaviors, likely evolved. Our approach provides a new framework for identifying co-evolving behaviors and may provide new opportunities to study the genetic basis of behavioral evolution.

Full PDF

AA framework for studying behavioral evolution by reconstructing ancestral repertoires

Dami´an G. Hern´andez,

1, 2, ∗ Catalina Rivera, ∗ Jessica Cande, Baohua Zhou,

1, 4

David L. Stern, and Gordon J. Berman

1, 5, † Department of Physics, Emory University Department of Medical Physics, Centro At´omico Bariloche and Instituto Balseiro Janelia Research Campus Department of Molecular, Cellular and Developmental Biology, Yale University Department of Biology, Emory University (Dated: July 21, 2020)Although extensive behavioral changes often exist between closely related animal species, ourunderstanding of the genetic basis underlying the evolution of behavior has remained limited. Here,we propose a new framework to study behavioral evolution by computational estimation of ancestralbehavioral repertoires. We measured the behaviors of individuals from six species of fruit ﬂies usingunsupervised techniques and identiﬁed suites of stereotyped movements exhibited by each species.We then ﬁt a Generalized Linear Mixed Model to estimate the suites of behaviors exhibited byancestral species, as well as the intra- and inter-species behavioral covariances. We found thatmuch of intraspeciﬁc behavioral variation is explained by diﬀerences between individuals in thestatus of their behavioral hidden states, what might be called their “mood.” Lastly, we propose amethod to identify groups of behaviors that appear to have evolved together, illustrating how sets ofbehaviors, rather than individual behaviors, likely evolved. Our approach provides a new frameworkfor identifying co-evolving behaviors and may provide new opportunities to study the genetic basisof behavioral evolution.

Behavior is one of the most rapidly evolving phe-notypes, with notable diﬀerences even between closely-related species [1, 2]. Variable behaviors and rapid be-havioral evolution likely allows species to adapt rapidlyto new or varying environments [3, 4]. Despite the im-portance of animal behavior, progress in revealing thegenetic basis of behavioral evolution has been slow [5–8].In contrast, recent decades have seen signiﬁcant progressin understanding the genetic causes of morphological evo-lution [9–12].While there are many potential reasons for the dis-crepancy between studies of behavioral and morpholog-ical evolution, including the lack of a fossil record forbehavior, a key diﬃculty is identifying which aspects ofan animal’s development and physiology are responsiblefor the observed changes in animals’ actions. Changes inbehavior along a phylogeny could emerge from alterationsin the developmental patterning of neural circuitry (e.g.,brain networks, descending commands, central patterngenerators), hormonal regulation that aﬀects the expres-sion of behaviors, or the gross morphology of an animal’sbody or limbs [13]. Each of these possibilities could resultin behavioral eﬀects at diﬀerent, yet overlapping, scales –from muscle twitches to stereotyped behaviors to longer-lived states like foraging or courtship or aging that maycontrol the relative frequency of a given behavior – mak-ing it diﬃcult to identify the precise aspects of behaviorthat are changing.To address these diﬃculties, the standard approach inthe study of behavioral evolution has been to identifyfocal behaviors that exhibit robust diﬀerences betweenspecies, such as courtship behavior in fruit ﬂies [14–16]or burrow formation in deermice [17, 18]. With these ro- bust diﬀerences in phenotype, it is possible to performanalyses that isolate regions of the genome that corre-late with quantitative changes in the performance of thefocal behavior. However, there tend to be multiple suchregions identiﬁed, each containing many genes. Giventhe large number of putative genes involved, combinedwith the possibility of epistatic interactions between loci,identiﬁcation of the contributions of individual genes tobehavioral evolution has moved slowly.An alternative approach to focusing on single behav-iors is to examine the full repertoire of movements thatan animal performs. By identifying sets of behaviorsthat evolve together, it may be possible to identify reg-ulators of these suites of behaviors. This approach hasbeen made possible by recent progress in unsupervisedidentiﬁcation of animal behaviors across length and timescales [20, 21]. In this study, we introduce a quantita-tive framework for studying the evolutionary dynamicsof large suites of behavior. We have focused initiallyon fruit ﬂies, which provide a convenient model for thisproblem because they exhibit a wide range of complex be-haviors and unsupervised approaches can be used to mapall of the animal movements captured in video recordings[19, 22, 23].We recorded movies of isolated male ﬂies from sixspecies in a nearly stimulus-free environment. Becausewe did not record ﬂies experiencing social and otherenvironmental cues, we did not observe many charis-matic natural behaviors, such as courtship and aggres-sion. Nevertheless, we found that the behaviors they per-formed, including walking and grooming, contain species-speciﬁc information. We thus hypothesized that ourquantitative representations of behaviors could be stud- a r X i v : . [ q - b i o . P E ] J u l B D. melanogaster D. mauritiana D. simulansD. sechellia D. santomea D. yakuba - - D e v i a t i o n s f r o m m e a n b e h a v i o r ( L o g ) -4 PosteriorMovementsAnteriorMovements

Slow

Idle/SlowLocomotion

Fast A B e h a v i o r a l d e n s i t y FIG. 1. Behavioral repertoires of

Drosophila . A: The behavioral space probability density function, obtained using theunsupervised approach described in [19] on the entire data set of 561 individuals across all species. Coarse grained behaviorscorresponding to the diﬀerent types of movements exhibited in the map are shown as well. B: The relative performance ofeach of the 134 stereotyped behaviors for each of the six species. Each region here represents a behavior, and the color scaleindicates the logarithm of the fraction of time that species performs the speciﬁed behavior divided by the average across allspecies. ied in an evolutionary context. To infer the evolution-ary trajectories of behavioral evolution, we estimated an-cestral behavioral repertoires with a Generalized LinearMixed Model (GLMM) approach [24], which builds uponFelsenstein’s approach to reconstructing ancestral states[25, 26]. Using these results, we develop a frameworkthat allows us to model the behavioral traits that co-varyboth within a species and along the phylogeny. We ﬁndthat within-species variance is related primarily to long-lasting internal states of the animal, what might be calleda ﬂy’s mood, and that inter-species variance can capturehow disparate behaviors may evolve together. This lat-ter ﬁnding points towards the presence of higher-orderbehavioral traits that may be amenable to further evolu-tionary and genetic analysis.

EXPERIMENTS AND BEHAVIORALQUANTIFICATION

We captured video recordings of all behaviors per-formed by single ﬂies isolated in a largely featurelessenvironment for multiple individuals from six species ofthe

Drosophila melanogaster species subgroup:

D. mau-ritiana , D. melanogaster , D. santomea , D. sechellia , D. simulans , and

D. yakuba [22]. Although the animalscould not jump or ﬂy in these chambers and were notexpected to exhibit social or feeding behaviors, the ﬂiesdisplayed a variety of complex behaviors, including loco-motion and grooming. Each of these behaviors involvesmultiple body parts that move at varying time scales.The species studied here were chosen because their phy-logenetic relationships are well understood [27–30] (sum-marized in the tree seen in Fig. 3), and genetic toolsare available for most of these species [31]. Since a sin- gle strain represents a genomic snapshot of each species,we assayed individuals from multiple strains from eachspecies to attempt to capture species-speciﬁc diﬀerences,and not variation speciﬁc to particular strains (see Ma-terials and Methods). In total, we collected data from561 ﬂies, each measured for an hour at a sampling rateof 100 Hz.While previous studies have identiﬁed diﬀerences inspeciﬁc behaviors, such as courtship behavior, betweenthese species [14, 16, 32, 33], here we assayed the fullrepertoire of behaviors the ﬂies performed in the arena,with the aim of identifying combinations of behaviorsthat may be evolving together. To measure this reper-toire, we used a previously-described behavior mappingmethod [19, 22] that starts from raw video images andattempts to ﬁnd each animals stereotyped movements inan unsupervised manner. The output of this methodis a two-dimensional probability density function (PDF)that contains many peaks and valleys (Fig. 1A), whereeach peak corresponds to a diﬀerent stereotyped behavior(e.g., right wing grooming, proboscis extension, running,etc).Brieﬂy, to create the density plots, raw video imageswere rotationally and translationally aligned to createan egocentric frame for the ﬂy. The transformed imageswere decomposed using Principal Components Analysisinto a low-dimensional set of time series. For each of thesepostural mode time series, a Morlet wavelet transformwas applied, obtaining a local spectrogram between 1 Hzand 50 Hz (the Nyquist frequency). After normalization,each point in time was mapped using t-SNE [34] into atwo dimensional plane. Finally, convolving these pointswith a two-dimensional gaussian and applying the water-shed transform [35], produced 134 diﬀerent regions, eachof these containing a single local maximum of probabil-

FIG. 2. Classiﬁcation of ﬂy species based on behavioralrepertoires. A: A t-SNE embedding of the behavioral reper-toires shows that behavioral repertoires contain some species-speciﬁc information. Each dot represents one individual ﬂy,with diﬀerent colors representing diﬀerent species and diﬀer-ent symbols with the same color representing diﬀerent strainswithin the same species. The distance matrix (561 by 561)used to create the embedding is the Jensen-Shannon diver-gence between the behavioral densities of individual ﬂies. B:Confusion matrix for the logistic regression with each row nor-malized. All the values are averaged from 100 diﬀerent trials.The standard error is less than 0.01 for the diagonal elementsand less than 0.005 for each of the oﬀ-diagonal elements. ity density that corresponds to a particular stereotypicalbehavior. Thus, by integrating the density of the regionfor a particular ﬂy, we can associate to each of thema 134-dimensional real-valued vector that represents theprobability of the ﬂy performing a certain stereotypedbehavior at a given time. We will refer to this quantityas the animal’s behavioral vector (cid:126)P .The behavioral map averaged across all six species isshown in Fig. 1A and displays a pattern of movementssimilar to those we found in previous work, where lo-comotion, idle/slow, anterior/posterior movements, etc.are segregated into diﬀerent regions [19, 22]. Averaging across all individuals of each species, we found the meanbehavioral vector for each species (Fig. 1B) and observedthat each species performs certain behaviors with diﬀer-ent probabilities. For example,

D. mauritiana individ-uals spend more time performing fast locomotion thanall other species on average, and

D. yakuba individualsspend much of their time performing an almost species-unique type of slow locomotion, but little time runningquickly.These average probability maps provide some insightinto potential species diﬀerences, but to identify species-speciﬁc behaviors, we also need to account for variationin the probability that individuals of each species performeach behavior. One way to address this problem is to askwhether an individual’s species identity can be predictedsolely from its multi-dimensional behavioral vector. Toexplore this question, we ﬁrst used t-SNE to project all561 individuals into a 2 dimensional plane (Fig. 2A), us-ing the Jensen-Shannon divergence as the distance met-ric between individual behavioral vectors. In this plot,diﬀerent colors represent diﬀerent species, and diﬀerentsymbols with the same color represent diﬀerent strainswithin the same species. Although there is not a clearsegregation of all species in this plane, the distributionof species is far from random, with individuals from thesame species tending to group near to each other.To quantify this observation, we applied a multinomiallogistic regression classiﬁer that performed a six-way clas-siﬁcation based solely on the high-dimensional behavioralvectors. After training, the classiﬁer correctly classiﬁed89 ± .

2% of vectors (using a randomly-selected test set of30% of the entire data set). Moreover, the confusion ma-trix (Fig. 2B) revealed no systematic misclassiﬁcationsbias amongst the species. Note that we have used a rel-atively simple classiﬁer compared to modern deep learn-ing methods [36], so these results likely represent a lowerbound on the distinguishability of the behavioral vectors.Thus, behavioral vectors appear to contain considerablespecies-speciﬁc information. We therefore proceeded toexplore how these behavioral vectors may have evolvedalong the phylogeny.

RECONSTRUCTING ANCESTRALBEHAVIORAL REPERTOIRES

Multiple methods have been proposed for reconstruct-ing ancestral states solely from data collected from extantspecies [25, 37]. These methods generally fall into twocamps: parsimony reconstruction, which attempts to re-construct evolutionary history with the fewest number ofevolutionary changes [38], and diﬀusion-processes, whichmodel evolution as a random walk on a multi-dimensionallandscape [39]. Given the high-dimensional behavioralvectors that we are attempting to model, a diﬀusion pro-cess is more likely to capture the inter-trait correlations

FIG. 3. Reconstructed behavioral repertoires using the GLMM. Inferred probabilities of the behavioral traits for the ancestralstates are plotted in logarithmic scale. Except for the ancestral root, other ancestral states are plotted with respect to theclosest ancestor. Here, all but the root ancestor are plotted with respect to their closest ancestral state. Therefore, for eachbehavioral trait, i , we show: log( ¯ P i ) − log( ¯ P Anci ), where ¯

P i and ¯ P Anci correspond to the inferred mean behavioral trait for thegiven ancestor and its closest ancestor, respectively. that we would like to understand. Thus, we focus on adiﬀusion-based model here.Given a phylogeny for a collection of species, we mod-eled how species-speciﬁc complexes of behaviors mighthave emerged. Speciﬁcally, we assumed that each behav-ior is a quantitative trait, that is, each behavioral dif-ference results from the additive eﬀects of many geneticloci, each of small eﬀect. We do not, however, assumethat all behaviors evolve independently of each other.Thus, we are interested in predicting (1) how behaviorsco-vary and (2) whether intra- and inter-species variationcan be separated to identify independently evolving setsor linear combinations of behaviors.We assumed that the ﬂies’ behaviors evolved via a dif-fusion process, where initially the process starts at thecommon ancestor behavioral representation and eventu-ally each individual’s trajectory performs a random walkwith Gaussian noise along the known phylogenetic tree.Note that this is a less stringent assumption than neutral-ity, as multiple traits under selection may evolve in a cor-related manner. More precisely, we ﬁt a Multi-responseGeneralized Linear Mixed Model (GLMM) to the data,using the approach described in [24]: (cid:126)l = (cid:126)µ + (cid:126)ρ + (cid:126)e (1)where (cid:126)l = ( l , ..., l K =134 ) denotes the logarithm of the behavioral vector (cid:126)P for each individual, (cid:126)µ is the meanbehavior of the common ancestor (treated as the ﬁxedeﬀects of this model), and (cid:126)ρ and (cid:126)e are the randomeﬀects corresponding to the phylogenetic and individ-ual variability, respectively. We assume that these ran-dom eﬀects are generated from the multi-dimensional normal distributions N ( (cid:126) , A ⊗ V ( a ) ) (phylogenetic) and N ( (cid:126) , I ⊗ V ( e ) ) (individual). Here, the matrix A repre-sents the information contained in the phylogenetic tree,with A ij being proportional to the length of the pathfrom the most recent common ancestor of species i and j to the main ancestor. This matrix is normalized sothat the diagonal elements are all equal to 1. I is theidentity matrix, and V ( a ) and V ( e ) are the covariancematrices that govern the process. We ﬁt µ , V ( a ) , and V ( e ) using Markov Chain Monte Carlo (MCMC) simu-lation (see Materials and Methods). We checked thatthe MCMC converged using the Gelman-Rubin diagnos-tic (see Materials and Methods, Fig. S1). In addition tothe inferred behavioral states corresponding to the com-mon ancestor, ¯ P Anc , we also reconstructed the mean be-havioral representations for the intermediate ancestors(Fig. 3). Further validation of our results correspondingto the current species behavior is shown in Fig. S2.

INDIVIDUAL VARIABILITY AND LONGTIMESCALE CORRELATIONS

While it is not possible to directly test the accuracy ofour ancestral state reconstructions, the inferred covari-ance matrices generate predictions about genetic corre-lations that are, in principle, testable. We therefore focuson our ﬁtted covariance matrix, V ( e ) ∈ (cid:60) × , whichaccounts for within-species random eﬀects.We ﬁrst note that V ( e ) exhibits a modular structure(Fig. 4A). After rearranging the behavior order via aninformation-based clustering procedure [40], we see thata block diagonal pattern emerges, with positive correla- - - A BC C o v a r i an c e FIG. 4. The structure of variability between ﬂies of the same species relates to long timescale transitions in behavior. A:The intra-species behavioral covariance matrix ( V ( e ) ), with columns and rows ordered via an information-based clusteringalgorithm [40]. The black squares represent behaviors that are grouped together in the three cluster solution. B: Behavioralmap representation of the clustering solutions. The two, three, and six cluster solutions are shown on top (colors on the threecluster solution match those above the plot in A). The clusters are all spatially contiguous and break down hierarchically (seeFig. S3 for more examples). C: Clustering structure of the behavioral space obtained ﬁnding the optimally predictive groupsof behaviors (see text for details). Note how these clusterings are nearly the same as the clusterings in B, despite having beenderived from an entirely independent measure. tions lying within the blocks and negative correlationslying oﬀ the diagonal. This clustering approach mini-mizes the functional F = (cid:104) d (cid:105) + βI ( C ; b ), where (cid:104) d (cid:105) is theaverage within-cluster distance between behaviors (de-ﬁned here as d ij = [1 − V ( e ) ij / (cid:113) V ( e ) ii V ( e ) jj ]), I ( C ; b ) isthe mutual information between cluster assignment andbehavior number, and β modulates the relative impor-tance of the two terms (see Materials and Methods). Thismodular structure emerges when applying other cluster-ing methods as well (Fig. S3). Quantifying the matrix’smodularity, we ﬁnd that (cid:104) d (cid:105) ≈ .

30 and 0 .

22 for the3 and 6-cluster solutions respectively. These values aresigniﬁcantly smaller than the average distances obtainedusing random cluster assignments ( (cid:104) d (cid:105) = 0 . ± .

03 and0 . ± .

04 for 3 and 6 clusters respectively, see Fig. S5).Strikingly, these clusters are spatially contiguous in thebehavioral map – implying that similar behaviors explainmuch of the intra-species variance [23]. Moreover, newclusters emerge in a hierarchical fashion, where coarse-grained behaviors sub-divide into new clusters (Fig. 4B),a feature that is not guaranteed by the information-basedclustering algorithm.This hierarchical structure of the behavioral space isreminiscent of the hierarchical temporal structure of be-havior that was hypothesized originally by ethologists[41] and was observed to optimally explain the longtimescale structure of

Drosophila melanogaster behav-ioral transitions [23]. To explore this connection fur-ther, we found coarse-grainings of the behavioral spacethat are optimally predictive of the future behaviors thatthe ﬂies perform via the Deterministic Information Bot-tleneck (DIB) [42]. Similar to the previously described information-based clustering method, this approach min-imizes a functional, J τ = − I ( b ( t ); Z ( t + τ )) + γ H ( Z ),where b ( t ) is a ﬂy’s behavior at time t , Z ( t + τ ) is thecoarse-grained behavior visited at time t + τ , τ = 50, I ( b ( t ); Z ( t + τ )) is the mutual information between thesequantities, γ is a positive constant, and H ( Z ) is the en-tropy of the coarse-grained representation (see Materialand Methods). As γ is increased, progressively coarserrepresentations are found.Applying this method to the data pooled across allsix species (Figs. 4C, S4), we again found the same typeof hierarchical division in the behavioral space that wasobserved for freely moving D. melanogaster [23]. More-over, we found that the structure of the space using thisapproach closely mirrors the structure found via cluster-ing V ( e ) (Fig. 4C). We quantify the similarity betweenboth clustering partitions by calculating the WeightedSimilarity Index (WSI), a modiﬁcation of the Rand In-dex [43] (Materials and Methods). The WSI between theinformation-based clustering method and the predictiveinformation bottleneck for three clusters is W SI = 0 . W SI = 0 .

87 for six clusters. For random clusterings,we would expect to observe 0 . ± .

02 and 0 . ± .

01 for3 and 6 clusters, respectively, indicating a non-randomoverlap between these two partitions. Fig. S3, shows thatthis result is independent of the clustering method andthe number of clusters.The overlap between these two coarse-grainings indi-cates that most individual variability in the behaviors weobserve results from non-stationarity in behavioral mea-surements, rather than from individual-speciﬁc variation.That is, much of the intraspeciﬁc variation appears to ( z b ) ij = μ i - μ j σ i + σ j ( z b ) ij f o rr eno r m a li z edp r obab ili t i e s - - - - ( p ) - - - - ( p )- - - - ( p ) - - - - ( p )- - - - ( p ) - - - - - ( p )- - - - ( p ) - - - ( p )- - - - ( p ) - - - - ( p )- - - ( p ) - - - ( p ) D. sanD. yak

B C

Many individuals within a species An individual over long-time scales

State 1 State 3

1 2 3

Behavioral repertoire

Hidden states Coarse-grained postural behaviors

Fly … State 2 time Fly A FIG. 5. Variability within a species, long timescale transitions, and hidden states modulating behavior. A: A cartoon of thehypothesized relation between individual variability within a species and long timescale transitions through hidden states. B:Accounting for the long timescale dynamics - by adjusting for the amount of time spent in each coarse-grained region (here,the six cluster solution at the top right of Fig. 4C) - aﬀects the measured behavioral distributions between

D. santomea and

D. yakuba . Shown is the comparison of the Mahalanobis distance (( z b ) ij ) between behavioral distributions before (x-axis) andafter (y-axis) adjusting. C: Kernel density estimates of the distributions for the circled behaviors in B) on the left before (left)and after (right) adjustments. Solid lines represent D. santomea and dashed lines represent

D. yakuba . reﬂect ﬂies recorded when they were experiencing diﬀer-ent hidden behavioral states (i.e. moods), rather thanreﬂecting ﬁxed (environmental or genetic) diﬀerences be-tween ﬂies. This variation may have arisen because, al-though we controlled many variables (e.g., ﬂy age, circa-dian cycle, temperature, and humidity), it is not possibleto control for all internal factors (e.g., hunger, arousal,etc.) that aﬀect an animals behavioral patterns [44].The temporal coarse-graining of the behavioral spacethat we found via the DIB, gives insight into these non-stationarities, as they are optimally-predictive of the ﬂysfuture behaviors. Given the contiguous nature of theseregions, this result means that ﬂies tended to stay withinspeciﬁc regions of the behavioral space much longer thanone would assume from a Markov model.This observation implies that variation in behavior ob-served among individuals, especially in non-manipulatedsettings, is likely to often reﬂect a large component ofhidden behavior states (Fig. 5A). Thus, it may be pos- sible to improve upon behavioral measurements in manysettings by controlling for the variability associated withthese hidden states. For example, just because one ﬂyperforms less anterior grooming than another may reﬂectthat the animal is in a diﬀerent long timescale behavioralstate, rather than that the animal has a genetically en-coded preference for reduced grooming.A potential method for accounting for these artifactsis to normalize each individual’s behavioral density suchthat the amount of time that the animal spends in each ofthe coarse-grained regions is equalized. In other words,the amount of time spent anterior grooming, locomot-ing, etc. are set to be the same for all animals, thus ac-counting for the variability associated our inferred hiddenstates. Mathematically, if P i is the probability of observ-ing behavior i , and C i is the clustering assignment of this D F E A B C

Eigenvector 1 Eigenvector 3 Eigenvector 5 Eigenvector 6 Eigenvector 4 Eigenvector 2

Eigenvector components C o v a r i an c e D. sanD. yak - P D F D. simD. sech - P D F D. simD. mau - P D F E x p l a i ned v a r i an c e - - - - FIG. 6. Phylogenetic variability and behavioral meta-traits. A: (top) Clustering the phylogenetic covariance matrix (usingthe same information-based clustering method from Fig. 4), we observe that the clusters are no longer spatially contiguous.(bottom) The phylogenetic covariance matrix reordered according to four clusters (colors corresponding to the four-cluster mapabove). B: Fraction of variance explained by the largest eigenvalues of the phylogenetic covariance matrix. C: The eigenvectorscorresponding to the largest six eigenvalues. D: Distributions of the projections of individual density vectors from

D. santomea and

D. yakuba onto eigenvector 3. E: Same as in D but using projections of individuals from

D. sechellia and

D. simulans ontoeigenvector 4. F: Same as in D but using projections of individuals from

D. simulans and

D. mauritiana onto eigenvector 5. behavior, we can deﬁne a normalized probability, ˆ P i , viaˆ P i = ¯ P ( C i ) P ( C i ) i P i , (2)where P ( C ) i = (cid:80) k ∈ C P k is the total density in cluster C for an individual ﬂy and ¯ P ( C ) is the average across allanimals.We found that applying this normalization to our dataoften results in substantial changes in the inferred dis-tributions of behavioral densities. For example, Fig. 5Bdisplays how the diﬀerence in behavioral density between D. santomea and

D. yakuba (as measured by the Ma-halanobis distance between the distributions) alters asa result of normalization. For some behaviors, the sig-nal increases (red points), and in some cases, it reverses(blue points). Thus, it is important to take these non-stationary eﬀects into account when estimating how oftensingle behaviors are performed in studies of behavioralevolution. To measure these non-stationary eﬀects, manybehaviors must be measured, not just a focal behavior.

IDENTIFYING PHYLOGENETICALLY LINKEDBEHAVIORS

One of the advantages of our approach is that we sep-arate variations in behavior corresponding to evolution- ary patterns, the phylogenetic variability, from variationsamong individuals of the same species. By studying theproperties of the phylogenetic covariance matrix ( V ( a ) ),we can identify behaviors that may be evolving together.We ﬁrst characterized the coarse-grained structurewithin V ( a ) through the information-based clustering de-scribed in the previous section [40]. As seen in Fig. 6A,these clusters are not spatially contiguous in the behav-ioral space. This pattern contrasts to the spatial con-tiguity we observed for the individual covariance matrix(Fig. 4B). For example, the two-cluster solution (Fig. 6A,left) separates the behavioral space into side legs move-ments (middle) and certain locomotion gaits (far left)from the rest of behaviors. Similarly, non-localized struc-ture is also observed when the matrix is clustered into alarger numbers of clusters as well.One possible interpretation of these discontinuous clus-ters is that at the neural level, each of these groups ofmovements may reﬂect a motor response to shared up-stream commands [22]. For example, diﬀerent types of lo-comotion might be controlled through the same descend-ing neural circuitry, but due to evolutionary changes, thesame commands could lead to diﬀerent behavioral out-puts, as has been observed in ﬂy courtship patterns [16].Thus, examination of phylogenetically course-grained re-gions such as these may provide a more biologically real-istic view of suites of evolving behaviors than does focuson single behaviors.To quantify these patterns as traits, we decomposed V ( a ) via an eigendecomposition. As seen in Fig. 6B, al-most all of the variance within the matrix can be ex-plained with only the ﬁrst six eigenmodes. These eigen-vectors (Fig. 6C) share similar non-local structure to theclusterings described above. By projecting individual be-havioral vectors onto these eigenvectors, the resulting dotproducts represent a meta-trait that is a linear combina-tion of phylogenetically linked behaviors.These evolving meta-traits may be suitable targets forfurther neurobiological or genetic studies. Three exam-ples of these distributions are shown in in Fig. 6D forseveral pairs of closely related species. These three ex-amples were not chosen at random, but instead becausethey showed signiﬁcant diﬀerentiation between species.The aim of this analysis is not to show that all meta-traitswould diﬀer between all pairs of species, which strikes usas unlikely, but rather that it is possible to identify syn-thetic meta-traits that could be further interrogated withexperimental methods. DISCUSSION

We have developed a quantitative framework to studythe evolution of behavioral repertoires, using fruit ﬂies(

Drosophila ) as a model system. We started with ob-servations of 561 individuals from six extant species be-having in an unremarkable environment. This assay didnot include social behaviors, such as courtship and ag-gression, nor many foraging behaviors. Thus, at ﬁrstglance, it might seem like we had excluded most species-speciﬁc behaviors from the analysis. Nonetheless, wefound that other complex behaviors, like walking, run-ning, and grooming, exhibit species-speciﬁc features thatcan be used to reliably assign individuals to the correctspecies. Thus, the motor patterns of behaviors that arenot normally investigated for their species-speciﬁc fea-tures are clearly evolving between even closely relatedspecies. It is not clear if these diﬀerences reﬂect naturalselection or genetic drift on the details of these motorpatterns. But, all of these behaviors would seem to becritical to individual survival, so it is possible that thesebehaviors have evolved, at least in part, in response tonatural selection. It is clear, however, that the underly-ing neural circuitry controlling these behaviors must haveevolved.Inspired by these observations, we estimated pat-terns of behavioral evolution in the context of a well-understood phylogeny. We ﬁt a Generalized Mixed Lin-ear Model to our behavioral measurements and the givenphylogeny to reconstruct ancestral behavioral repertoiresand the intra- and inter-species covariance matrices. Wefound that the patterns of intra-species variability aresimilar to long timescale behavioral dynamics. This suggests that much of the intraspeciﬁc variability thatemerged by sampling ﬂies under well-controlled condi-tions reﬂects variability in the hidden behavioral statesof individual ﬂies. This variability is a clear confound forevolutionary and experimental studies of behavior and wetherefore propose a method to control for these internalstates and improve the accuracy of behavioral phenotyp-ing. We showed that controlling for these internal statescan dramatically alter estimates of the heritable elementsof behavior.Given our estimates for how suites of behaviorsevolved, we examined whether the inter-species covari-ance matrix could be used to identify behavioral meta-traits that might be subjected to further evolutionaryand experimental analysis. We identiﬁed multiple suitesof behaviors that diﬀered between closely related species,providing a starting point for further analysis of how themechanismns underlying these suites of behaviors haveevolved.The analysis framework introduced here represents theﬁrst attempt to analyze full behavioral repertoires to gaininsight into evolution. In principle, this approach couldbe applied to any data set where a large number of be-haviors have been sampled in many species. We envisionseveral areas where future improvements may yield moredetailed, comprehensive, and biologically meaningful re-sults. First, we recorded behavior from only six speciesof ﬂies. Adding additional species would place more con-straints on the evolutionary dynamics, likely resulting inless variance in the ancestral state estimations and poten-tially adding more structure to the relatively low rank co-variance matrices. Additionally, further work is requiredto determine the balance between sampling within andbetween strains and species that optimizes estimates ofevolutionary dynamics.Second, our framework assumes that all evolutionarychanges in behavior resemble a diﬀusion process. Al-though this assumption is a reasonable initial hypothesis[25], it may be possible to test this assumption. For ex-ample, deeper sampling of additional species may allowidentiﬁcation of speciﬁc behaviors on particular lineageswhere neutrality can be rejected [45].In addition, all of our analyses involved measuringthe fraction of behaviors performed during the record-ing time, ignoring the temporal structure and sequencesof movements. While we show here that much of thisinformation can be related to the structure of the intra-species covariance matrix, the order in which behaviorsoccured may also provide important biological informa-tion. It should be possible to incorporate temporal struc-ture directly into the regression. Deciding exactly whichquantities to measure and how they should be incorpo-rated, however, are complex questions that are outsidethe scope of this initial study.Lastly, capturing the full range of animal behaviors fora large number of animals presents a number of techno-logical challenges, which is why we focused on measuringbehavior in a highly simpliﬁed environment. However, amore complete understanding of the structure of behav-ior will require more sophisticated ways to capture be-havioral dynamics in more naturalistic settings and dur-ing complex social arrangements. While modern deeplearning methods have made tracking animals in morerealistic settings increasingly plausible [46, 47], there arestill considerable hurdles to translating this informationinto a form that can be subjected to the kind of analysiswe propose here.Despite these limitations, this work represents a newway to quantitatively characterize the evolution of com-plex behaviors, which may provide new phenotypes thatcan be subjected to experimental analysis. In the ab-sence of a behavioral fossil record, reconstructing ances-tral behaviors requires an inferential approach like theone we present here. In addition, more complex modelscould be built to test assumptions underlying this ini-tial, diﬀusion-based, model. Finally, a strength of ourapproach is that it makes falsiﬁable predictions abouthow behaviors are linked mechanistically, providing pre-dictions that can be tested experimentally to provide fur-ther insight in the genetic and neurobiological structureof behavior.

MATERIALS AND METHODSData collection

All imaging of ﬂy behavior followed the procedures de-scribed in [22], but without any red light stimulation.In total, we collected data from 561 individual from 18strains and six species. These included three strains of

D.mauritiana ( mau29 : 29 ﬂies, mau317 : 35 ﬂies, mau318 :32 ﬂies), four strains of D. melanogaster ( Canton-S : 31ﬂies,

Oregon-R : 33 ﬂies, mel54 : 34 ﬂies, mel56 : 31 ﬂies),three strains of

D. santomea ( san00 : 29 ﬂies, san1482 :33 ﬂies, STO OBAT : 22 ﬂies), three strains of

D. sechel-lia ( sech28 : 32 ﬂies, sech340 : 25 ﬂies, sech349 : 33 ﬂies),three strains of D. simulans ( sim5 : 33 ﬂies, sim199 : 30ﬂies, Oxnard : 34 ﬂies), and two strains of

D. yakuba ( yak01 : 34 ﬂies, CYO2 : 31 ﬂies).

Generalized Linear Mixed Model

We ﬁt our GLMM (Eq. 1) using the software in-troduced in [24]. The covariance matrices V ( e ) and V ( a ) ∈ (cid:60) K × K , K = 134 and the mean vector (cid:126)µ ∈ (cid:60) K × ,were inferred from the posterior distribution via MCMCsampling. Prior distributions for the covariance matri-ces were given by Inverse Wishart Distributions (conju-gate priors for the multi-Gaussian model) with K degreesof freedom and K +1 I + J as scale matrix, with J and I the unit and identity matrices respectively. Tree branchlength were estimated from [30]. Gelman-Rubin convergence diagnostic

This test evaluates MCMC convergence by analyzingthe diﬀerence between several Markov chains. Conver-gence is evaluated by comparing the estimated between-chains and within-chain variances for each parameter ofthe model. Large diﬀerences between these variancesindicate non-convergence [48]. Let θ be the model pa-rameter of interest and { θ m } Nt =1 be the m th simulatedchain, m = 1 , , ..., M . Denote, ˆ θ m and ˆ σ m be the sam-ple posterior mean and variance of the m th chain. Ifˆ θ = M (cid:80) Mm =1 ˆ θ m is the overall posterior mean estimator,the between-chains and within-chain variances are givenby: B = NM − M (cid:88) m =1 (ˆ θ m − ˆ θ ) , W = 1 M M (cid:88) m =1 ˆ σ m . (3)In reference [48], it is shown that the following weightedaverage of W and B is an unbiased estimator of themarginal posterior variance of θ : ˆ V = N − N W + M +1 NM B .The ratio ˆ V /W should get close to one as the M chainsconverge to the target distribution with N → ∞ . In ref-erence [49] this ratio known as the potential scale reduc-tion factor (PSRF) was corrected to account for the thesampling variability using R c = (cid:113) d +3 d +1 ˆ VW , where d is thedegrees of freedom estimate of a t-distribution. Values ofPSRF for all model parameters such that R c < . Information-based clustering

Minimizes the distance between elements within clus-ters while compressing the original representation. Themethod minimizes the functional F = (cid:104) d (cid:105) + T I ( C ; i ),where I ( C ; i ) = (cid:80) Ni =1 (cid:80) N c C =1 P ( C ; i ) log[ P ( C | i ) P ( C ) ] is themutual information between the original behavioral vari-able i and the representation C . (cid:104) d (cid:105) = (cid:80) N c C =1 P ( C ) d ( C ),and d ( C ) is the average distance of elements chosen outof a single cluster: d ( C ) = N (cid:88) i N (cid:88) i P ( i | C ) P ( i | C ) d ( i , i ) , (4)with d ( i , i ) being the distance measure between a pairof elements and P ( i | C ) being the probability to ﬁndelement i in cluster C .0Given | C | = N c , T and a random initial condi-tion for P ( C | i ), a solution is obtained by iteratingthe following self-consistent equations until the criteria F t −F t +1 F t < − is satisﬁed. We chose 40 ,

000 diﬀerentrandom values of T ∈ [0 . , N c between 2 and 20,and performed the optimization in each case until theconvergence criterion was met. We deﬁned the Paretofront as the set of solutions P ( C | i ) such that no othersolution presents a smaller (cid:104) d (cid:105) and a smaller I ( C ; i ). Fi-nally, for each number of clusters we selected the solutionwith the lowest (cid:104) d (cid:105) .For each number of clusters, the signiﬁcance of theoptimal value found for (cid:104) d (cid:105) is shown by comparing itto the average distance corresponding to random clusterassignments. These assignments are made in such a waythat the amount of elements per cluster is conserved byrandomly shuﬄing the vector that assigns each behaviorto a particular cluster. The values presented in the maintext correspond to the mean and standard deviation of (cid:104) d (cid:105) over 50 diﬀerent random trials. Deterministic Information Bottleneck

Here we use the Deterministic Information Bottleneck(DIB) method to ﬁnd coarse-grainings of the behavioralspace that optimally predict future states [42]. Inspiredby the Information Bottleneck [50], DIB replaces thecompression measure I ( X, Z ) with the entropy H ( Z ),thus emphasizing constraints on the representation. DIBminimizes the functional: L α = H ( Z ) − αH ( Z | X ) − βI ( Z ; Y ) , (5)with respect to p ( z ∈ Z | x ∈ X ) and takes the limit as α → b ( n ) that can take on N = 134 diﬀerent integer values at each discrete time n . Here, we relate the joint distributions of b ( n ) ( X inEqn. 5) and b ( n + τ ) ( Y ) through a coarse-grained clus-tering of the behavioral states ( Z ). We chose 10 , β between 0 . and N c between 2 and 30 clusters. Given N c , β anda random initial condition for p ( t | x ), we ﬁnd a solutionby iterating the self-consistent equations [42] until theconvergence criteria | L t − L t +1 | < − is satisﬁed. Ifany cluster has its probability become zero at any itera-tion, then that cluster is dropped for all future iterations,thus N c is the maximum number of clusters that can bereturned. Of these 10 ,

000 solutions, we keep all solutionsthat are on the Pareto front (i.e., no other solution hasboth a higher I ( Y ; T ) and a smaller H ( T )). The dis-played clusters are the solutions on the Pareto front withthe largest I ( Y ; Z ) for a given number of clusters. Weighted Similarity Index

We quantify the similarity between clustering par-titions by calculating the Weighted Similarity Index(WSI), a modiﬁcation of the Rand Index [43] such thatbehaviors contribute the index according to their overallprobability. Speciﬁcally,WSI = (cid:88) i,j ∈ S a W ij + (cid:88) k,l ∈ S b W kl , W ij = P i P k (cid:80) kl P k P l , (6)where S a ( S b ) is the set of pairs of behaviors that belongto the same (diﬀerent) cluster in the two partitions and P k is the probability of observing behavior k .We thank Ilya Nemenman and Daniel Weissman fortheir helpful comments on the manuscript. D.G.H. wassupported by Programa Raices from the MinCyT. C.R.was supported by the NSF Physics of Living Systems Stu-dent Research Network (1806833). G.J.B. was supportedby NIMH R01 MH115831-01, the Human Frontier Sci-ence Program (RGY0076/2018), and a Cottrell ScholarAward, a program of the Research Corporation for Sci-ence Advancement (25999). J.C., D.L.S., and G.J.B.were supported by the Howard Hughes Medical Instituteand the Janelia visiting researcher program. ∗ D.G.H. and C.R. contributed equally to this work. † To whom correspondence should be addressed: [email protected].[1] K. Z. Lorenz, Scientiﬁc American , 67 (1958).[2] E. P. Martins and E. L. P. Martins,

Phylogenies and thecomparative method in animal behavior (Oxford Univer-sity Press, 1996).[3] F. Baier and H. E. Hoekstra, Proceedings of the RoyalSociety B , 20191697 (2019).[4] M. J. West-Eberhard,

Developmental plasticity and evo-lution (Oxford University Press, Oxford, U.K., 2003).[5] J. M. Gleason and M. G. Ritchie, Genetics , 1303(2004).[6] D. Yamamoto and Y. Ishikawa, Journal of Neurogenetics , 130 (2013).[7] C. Ellison, C. Wiley, and K. Shaw, Journal of Evolution-ary Biology , 1110 (2011).[8] K. L. Shaw and S. C. Lesnick, Proceedings of the Na-tional Academy of Sciences , 9737 (2009).[9] T. M. Williams and S. B. Carroll, Nature Reviews Ge-netics , 797 (2009).[10] N. Shubin, C. Tabin, and S. B. Carroll, Nature , 818(2009).[11] M. Levine and E. H. Davidson, Proceedings of the Na-tional Academy of Sciences , 4936 (2005).[12] D. L. Stern and N. Frankel, Philosophical Transactions ofthe Royal Society B: Biological Sciences , 20130028(2013).[13] B. S. Baker, B. J. Taylor, and J. Hall, Cell , 13(2001).[14] J. Cande, P. Andolfatto, B. Prud’homme, D. L. Stern,and N. Gompel, PLoS One , 1 (2012). [15] J. Cande, D. L. Stern, T. Morita, B. Prudhomme, andN. Gompel, Cell reports , 363 (2014).[16] Y. Ding, J. L. Lillvis, J. Cande, G. J. Berman, B. J.Arthur, M. Xu, B. J. Dickson, and D. L. Stern, CurrentBiology , 1089 (2019).[17] J. N. Webert, B. K. Peterson, and H. E. Hoekstra, Na-ture , 402 (2013).[18] C. K. Hu and H. E. Hoekstra, Seminars in cell & devel-opmental biology , 107 (2016).[19] G. J. Berman, D. M. Choi, W. Bialek, and J. W. Shae-vitz, Journal of The Royal Society Interface , 20140672(2014).[20] G. J. Berman, BMC Biology , 23 (2018).[21] A. E. X. Brown and B. de Bivort, Nature Physics , 410(2018).[22] J. Cande, S. Namiki, J. Qiu, W. Korﬀ, G. M. Card, J. W.Shaevitz, D. L. Stern, and G. J. Berman, eLife , e34275(2018).[23] G. J. Berman, W. Bialek, and J. W. Shaevitz, Proceed-ings of the National Academy of Sciences , 11943(2016).[24] J. Hadﬁeld, Journal of Statistical Software , 1 (2010).[25] J. Felsenstein, The American Naturalist , 1 (1985).[26] J. D. Hadﬁeld and S. Nakagawa, Journal of evolutionarybiology , 494 (2010).[27] Drosophila

12 Genomes Consortium, Nature , 203(2007).[28] D. J. Obbard, J. Maclennan, K.-W. Kim, A. Rambaut,P. M. O’Grady, and F. M. Jiggins, Molecular Biologyand Evolution , 3459 (2012).[29] S. Chyb and N. Gompel, Atlas of Drosophila Morphology:Wild-type and classical mutants (Academic Press, 2013).[30] A. S. Seetharam and G. W. Stuart, PeerJ , e226 (2013).[31] D. L. Stern, J. Crocker, Y. Ding, N. Frankel, G. Kappes,E. Kim, R. Kuzmickas, A. Lemire, J. D. Mast, and S. Pi-card, G3 , 1339 (2017).[32] D. Yamamoto and Y. Ishikawa, Journal of Neurogenetics , 130 (2013).[33] T. O. Auer and R. Benton, Current opinion in neurobi-ology , 18 (2016). [34] L. van der Maaten and G. Hinton, Journal of MachineLearning Research , 2579 (2008).[35] F. Meyer, Signal processing , 113 (1994).[36] I. Goodfellow, Y. Bengio, and A. Courville, Deep learn-ing (MIT press, Cambridge, MA, 2016).[37] Z. Yang et al. , Computational molecular evolution , Vol.284 (Oxford University Press Oxford, Oxford, U.K.,2006).[38] C. W. Cunningham, K. E. Omland, and T. H. Oakley,Trends in Ecology & Evolution , 361 (1998).[39] J. Hadﬁeld and S. Nakagawa, Journal of evolutionary bi-ology , 494 (2010).[40] N. Slonim, G. S. Atwal, G. Tkaik, andW. Bialek, Proceedings of the NationalAcademy of Sciences The Study of Instinct (Oxford UniversityPress, Oxford, U. K., 1951).[42] D. Strouse and D. J. Schwab, Neural Computation ,1611 (2017).[43] W. M. Rand, Journal of the American Statistical associ-ation , 846 (1971).[44] D. J. Anderson, Nature Reviews Neuroscience , 692(2016).[45] F. Tajima, Genetics , 599 (1993).[46] T. D. Pereira, D. E. Aldarondo, L. Willmore, M. Kislin,S. S. H. Wang, M. Murthy, and J. W. Shaevitz, NatureMethods , 117 (2018).[47] M. W. Mathis and A. Mathis, Current Opinion in Neu-robiology , 1 (2020).[48] A. Gelman and D. B. Rubin, Statistical Science , 457(1992).[49] S. P. Brooks and A. Gelman, Journal of computationaland graphical statistics , 434 (1998).[50] N. Tishby, F. C. Pereira, and W. Bialek, in Proceedingsof the 37th Annual Allerton Conference on Communica-tion, Control and Computing (University of Illinois Press,Urbana-Champaign, IL, 1999) pp. 368–377. FIG. S1. Gelman Rubin diagnostic for model parameters inferred using MCMC. A: PSRF for the 134 ancestral behaviorsinferred in the GLMM. 20 MCMC chains with diﬀerent initial conditions were used. B: PSRF for the phylogenetic covariancematrix elements corresponding to the 10% most common behaviors performed by the measured ﬂies. C: PSRF for the individualcovariance matrix elements corresponding to the 10% most common behaviors performed by the measured ﬂies. The PSRFvalues for all of these inferred parameters indicate that the MCMC chains are converging. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - D. yakuba D. santomea D. sechellia D. simulans D. mauritiana D. melanogaster

Mean species behavior (Log) M e a n o f i n f e rr e d s p e c i e s b e h a v i o r ( L o g ) FIG. S2. Comparison between measured and inferred behaviors (in log scale) for each of the extant species. The mean of themeasured behavioral repertoires for all the individuals of a particular species is taken in the log scale. Each measured behavioralmean gets compared to the mean obtained from the components of the MCMC samples corresponding to that particular speciesand behavioral mode (i.e., the inferred behavioral repertories from the GLMM). The biggest diﬀerences occur mostly in thelow probability behaviors. WSIShuffleWSI W S I W S I W S I BAC

WSIShuffleWSIWSIShuffleWSI

FIG. S3. Behaviors clustered according to information of the individual covariance matrix using three diﬀerent clusteringmethods. A: Results using k-medoids clustering method with distance matrix d ij = (1 − ρ ij ) /  A B ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ■ ■ ■■ ■ ■ H ( Z ) I ( b ( t ) , Z ( t + τ )) Pareto Front

FIG. S4. Coarse-grained behavioral representations that are optimally predictive of the future behavior states via DIB. A:Behavioral representation with 2,3,...,7 clusters using τ = 50 in Eq. 5. B: Optimal trade-oﬀ curve (Pareto Front) betweencomplexity of coarse grained description against predictive power. For each number of clusters, representations in A correspondto points (in red) in this curve with the highest predictive information < d > < d >< d > Shuffle

FIG. S5. Modularity measure of the intra-species behavioral covariance matrix using information based clustering. < d >< d >