A framework for studying behavioral evolution by reconstructing ancestral repertoires
Damián G. Hernández, Catalina Rivera, Jessica Cande, Baohua Zhou, David L. Stern, Gordon J. Berman
AA framework for studying behavioral evolution by reconstructing ancestral repertoires
Dami´an G. Hern´andez,
1, 2, ∗ Catalina Rivera, ∗ Jessica Cande, Baohua Zhou,
1, 4
David L. Stern, and Gordon J. Berman
1, 5, † Department of Physics, Emory University Department of Medical Physics, Centro At´omico Bariloche and Instituto Balseiro Janelia Research Campus Department of Molecular, Cellular and Developmental Biology, Yale University Department of Biology, Emory University (Dated: July 21, 2020)Although extensive behavioral changes often exist between closely related animal species, ourunderstanding of the genetic basis underlying the evolution of behavior has remained limited. Here,we propose a new framework to study behavioral evolution by computational estimation of ancestralbehavioral repertoires. We measured the behaviors of individuals from six species of fruit flies usingunsupervised techniques and identified suites of stereotyped movements exhibited by each species.We then fit a Generalized Linear Mixed Model to estimate the suites of behaviors exhibited byancestral species, as well as the intra- and inter-species behavioral covariances. We found thatmuch of intraspecific behavioral variation is explained by differences between individuals in thestatus of their behavioral hidden states, what might be called their “mood.” Lastly, we propose amethod to identify groups of behaviors that appear to have evolved together, illustrating how sets ofbehaviors, rather than individual behaviors, likely evolved. Our approach provides a new frameworkfor identifying co-evolving behaviors and may provide new opportunities to study the genetic basisof behavioral evolution.
Behavior is one of the most rapidly evolving phe-notypes, with notable differences even between closely-related species [1, 2]. Variable behaviors and rapid be-havioral evolution likely allows species to adapt rapidlyto new or varying environments [3, 4]. Despite the im-portance of animal behavior, progress in revealing thegenetic basis of behavioral evolution has been slow [5–8].In contrast, recent decades have seen significant progressin understanding the genetic causes of morphological evo-lution [9–12].While there are many potential reasons for the dis-crepancy between studies of behavioral and morpholog-ical evolution, including the lack of a fossil record forbehavior, a key difficulty is identifying which aspects ofan animal’s development and physiology are responsiblefor the observed changes in animals’ actions. Changes inbehavior along a phylogeny could emerge from alterationsin the developmental patterning of neural circuitry (e.g.,brain networks, descending commands, central patterngenerators), hormonal regulation that affects the expres-sion of behaviors, or the gross morphology of an animal’sbody or limbs [13]. Each of these possibilities could resultin behavioral effects at different, yet overlapping, scales –from muscle twitches to stereotyped behaviors to longer-lived states like foraging or courtship or aging that maycontrol the relative frequency of a given behavior – mak-ing it difficult to identify the precise aspects of behaviorthat are changing.To address these difficulties, the standard approach inthe study of behavioral evolution has been to identifyfocal behaviors that exhibit robust differences betweenspecies, such as courtship behavior in fruit flies [14–16]or burrow formation in deermice [17, 18]. With these ro- bust differences in phenotype, it is possible to performanalyses that isolate regions of the genome that corre-late with quantitative changes in the performance of thefocal behavior. However, there tend to be multiple suchregions identified, each containing many genes. Giventhe large number of putative genes involved, combinedwith the possibility of epistatic interactions between loci,identification of the contributions of individual genes tobehavioral evolution has moved slowly.An alternative approach to focusing on single behav-iors is to examine the full repertoire of movements thatan animal performs. By identifying sets of behaviorsthat evolve together, it may be possible to identify reg-ulators of these suites of behaviors. This approach hasbeen made possible by recent progress in unsupervisedidentification of animal behaviors across length and timescales [20, 21]. In this study, we introduce a quantita-tive framework for studying the evolutionary dynamicsof large suites of behavior. We have focused initiallyon fruit flies, which provide a convenient model for thisproblem because they exhibit a wide range of complex be-haviors and unsupervised approaches can be used to mapall of the animal movements captured in video recordings[19, 22, 23].We recorded movies of isolated male flies from sixspecies in a nearly stimulus-free environment. Becausewe did not record flies experiencing social and otherenvironmental cues, we did not observe many charis-matic natural behaviors, such as courtship and aggres-sion. Nevertheless, we found that the behaviors they per-formed, including walking and grooming, contain species-specific information. We thus hypothesized that ourquantitative representations of behaviors could be stud- a r X i v : . [ q - b i o . P E ] J u l B D. melanogaster D. mauritiana D. simulansD. sechellia D. santomea D. yakuba - - D e v i a t i o n s f r o m m e a n b e h a v i o r ( L o g ) -4 PosteriorMovementsAnteriorMovements
Slow
Idle/SlowLocomotion
Fast A B e h a v i o r a l d e n s i t y FIG. 1. Behavioral repertoires of
Drosophila . A: The behavioral space probability density function, obtained using theunsupervised approach described in [19] on the entire data set of 561 individuals across all species. Coarse grained behaviorscorresponding to the different types of movements exhibited in the map are shown as well. B: The relative performance ofeach of the 134 stereotyped behaviors for each of the six species. Each region here represents a behavior, and the color scaleindicates the logarithm of the fraction of time that species performs the specified behavior divided by the average across allspecies. ied in an evolutionary context. To infer the evolution-ary trajectories of behavioral evolution, we estimated an-cestral behavioral repertoires with a Generalized LinearMixed Model (GLMM) approach [24], which builds uponFelsenstein’s approach to reconstructing ancestral states[25, 26]. Using these results, we develop a frameworkthat allows us to model the behavioral traits that co-varyboth within a species and along the phylogeny. We findthat within-species variance is related primarily to long-lasting internal states of the animal, what might be calleda fly’s mood, and that inter-species variance can capturehow disparate behaviors may evolve together. This lat-ter finding points towards the presence of higher-orderbehavioral traits that may be amenable to further evolu-tionary and genetic analysis.
EXPERIMENTS AND BEHAVIORALQUANTIFICATION
We captured video recordings of all behaviors per-formed by single flies isolated in a largely featurelessenvironment for multiple individuals from six species ofthe
Drosophila melanogaster species subgroup:
D. mau-ritiana , D. melanogaster , D. santomea , D. sechellia , D. simulans , and
D. yakuba [22]. Although the animalscould not jump or fly in these chambers and were notexpected to exhibit social or feeding behaviors, the fliesdisplayed a variety of complex behaviors, including loco-motion and grooming. Each of these behaviors involvesmultiple body parts that move at varying time scales.The species studied here were chosen because their phy-logenetic relationships are well understood [27–30] (sum-marized in the tree seen in Fig. 3), and genetic toolsare available for most of these species [31]. Since a sin- gle strain represents a genomic snapshot of each species,we assayed individuals from multiple strains from eachspecies to attempt to capture species-specific differences,and not variation specific to particular strains (see Ma-terials and Methods). In total, we collected data from561 flies, each measured for an hour at a sampling rateof 100 Hz.While previous studies have identified differences inspecific behaviors, such as courtship behavior, betweenthese species [14, 16, 32, 33], here we assayed the fullrepertoire of behaviors the flies performed in the arena,with the aim of identifying combinations of behaviorsthat may be evolving together. To measure this reper-toire, we used a previously-described behavior mappingmethod [19, 22] that starts from raw video images andattempts to find each animals stereotyped movements inan unsupervised manner. The output of this methodis a two-dimensional probability density function (PDF)that contains many peaks and valleys (Fig. 1A), whereeach peak corresponds to a different stereotyped behavior(e.g., right wing grooming, proboscis extension, running,etc).Briefly, to create the density plots, raw video imageswere rotationally and translationally aligned to createan egocentric frame for the fly. The transformed imageswere decomposed using Principal Components Analysisinto a low-dimensional set of time series. For each of thesepostural mode time series, a Morlet wavelet transformwas applied, obtaining a local spectrogram between 1 Hzand 50 Hz (the Nyquist frequency). After normalization,each point in time was mapped using t-SNE [34] into atwo dimensional plane. Finally, convolving these pointswith a two-dimensional gaussian and applying the water-shed transform [35], produced 134 different regions, eachof these containing a single local maximum of probabil-
FIG. 2. Classification of fly species based on behavioralrepertoires. A: A t-SNE embedding of the behavioral reper-toires shows that behavioral repertoires contain some species-specific information. Each dot represents one individual fly,with different colors representing different species and differ-ent symbols with the same color representing different strainswithin the same species. The distance matrix (561 by 561)used to create the embedding is the Jensen-Shannon diver-gence between the behavioral densities of individual flies. B:Confusion matrix for the logistic regression with each row nor-malized. All the values are averaged from 100 different trials.The standard error is less than 0.01 for the diagonal elementsand less than 0.005 for each of the off-diagonal elements. ity density that corresponds to a particular stereotypicalbehavior. Thus, by integrating the density of the regionfor a particular fly, we can associate to each of thema 134-dimensional real-valued vector that represents theprobability of the fly performing a certain stereotypedbehavior at a given time. We will refer to this quantityas the animal’s behavioral vector (cid:126)P .The behavioral map averaged across all six species isshown in Fig. 1A and displays a pattern of movementssimilar to those we found in previous work, where lo-comotion, idle/slow, anterior/posterior movements, etc.are segregated into different regions [19, 22]. Averaging across all individuals of each species, we found the meanbehavioral vector for each species (Fig. 1B) and observedthat each species performs certain behaviors with differ-ent probabilities. For example,
D. mauritiana individ-uals spend more time performing fast locomotion thanall other species on average, and
D. yakuba individualsspend much of their time performing an almost species-unique type of slow locomotion, but little time runningquickly.These average probability maps provide some insightinto potential species differences, but to identify species-specific behaviors, we also need to account for variationin the probability that individuals of each species performeach behavior. One way to address this problem is to askwhether an individual’s species identity can be predictedsolely from its multi-dimensional behavioral vector. Toexplore this question, we first used t-SNE to project all561 individuals into a 2 dimensional plane (Fig. 2A), us-ing the Jensen-Shannon divergence as the distance met-ric between individual behavioral vectors. In this plot,different colors represent different species, and differentsymbols with the same color represent different strainswithin the same species. Although there is not a clearsegregation of all species in this plane, the distributionof species is far from random, with individuals from thesame species tending to group near to each other.To quantify this observation, we applied a multinomiallogistic regression classifier that performed a six-way clas-sification based solely on the high-dimensional behavioralvectors. After training, the classifier correctly classified89 ± .
2% of vectors (using a randomly-selected test set of30% of the entire data set). Moreover, the confusion ma-trix (Fig. 2B) revealed no systematic misclassificationsbias amongst the species. Note that we have used a rel-atively simple classifier compared to modern deep learn-ing methods [36], so these results likely represent a lowerbound on the distinguishability of the behavioral vectors.Thus, behavioral vectors appear to contain considerablespecies-specific information. We therefore proceeded toexplore how these behavioral vectors may have evolvedalong the phylogeny.
RECONSTRUCTING ANCESTRALBEHAVIORAL REPERTOIRES
Multiple methods have been proposed for reconstruct-ing ancestral states solely from data collected from extantspecies [25, 37]. These methods generally fall into twocamps: parsimony reconstruction, which attempts to re-construct evolutionary history with the fewest number ofevolutionary changes [38], and diffusion-processes, whichmodel evolution as a random walk on a multi-dimensionallandscape [39]. Given the high-dimensional behavioralvectors that we are attempting to model, a diffusion pro-cess is more likely to capture the inter-trait correlations
FIG. 3. Reconstructed behavioral repertoires using the GLMM. Inferred probabilities of the behavioral traits for the ancestralstates are plotted in logarithmic scale. Except for the ancestral root, other ancestral states are plotted with respect to theclosest ancestor. Here, all but the root ancestor are plotted with respect to their closest ancestral state. Therefore, for eachbehavioral trait, i , we show: log( ¯ P i ) − log( ¯ P Anci ), where ¯
P i and ¯ P Anci correspond to the inferred mean behavioral trait for thegiven ancestor and its closest ancestor, respectively. that we would like to understand. Thus, we focus on adiffusion-based model here.Given a phylogeny for a collection of species, we mod-eled how species-specific complexes of behaviors mighthave emerged. Specifically, we assumed that each behav-ior is a quantitative trait, that is, each behavioral dif-ference results from the additive effects of many geneticloci, each of small effect. We do not, however, assumethat all behaviors evolve independently of each other.Thus, we are interested in predicting (1) how behaviorsco-vary and (2) whether intra- and inter-species variationcan be separated to identify independently evolving setsor linear combinations of behaviors.We assumed that the flies’ behaviors evolved via a dif-fusion process, where initially the process starts at thecommon ancestor behavioral representation and eventu-ally each individual’s trajectory performs a random walkwith Gaussian noise along the known phylogenetic tree.Note that this is a less stringent assumption than neutral-ity, as multiple traits under selection may evolve in a cor-related manner. More precisely, we fit a Multi-responseGeneralized Linear Mixed Model (GLMM) to the data,using the approach described in [24]: (cid:126)l = (cid:126)µ + (cid:126)ρ + (cid:126)e (1)where (cid:126)l = ( l , ..., l K =134 ) denotes the logarithm of the behavioral vector (cid:126)P for each individual, (cid:126)µ is the meanbehavior of the common ancestor (treated as the fixedeffects of this model), and (cid:126)ρ and (cid:126)e are the randomeffects corresponding to the phylogenetic and individ-ual variability, respectively. We assume that these ran-dom effects are generated from the multi-dimensional normal distributions N ( (cid:126) , A ⊗ V ( a ) ) (phylogenetic) and N ( (cid:126) , I ⊗ V ( e ) ) (individual). Here, the matrix A repre-sents the information contained in the phylogenetic tree,with A ij being proportional to the length of the pathfrom the most recent common ancestor of species i and j to the main ancestor. This matrix is normalized sothat the diagonal elements are all equal to 1. I is theidentity matrix, and V ( a ) and V ( e ) are the covariancematrices that govern the process. We fit µ , V ( a ) , and V ( e ) using Markov Chain Monte Carlo (MCMC) simu-lation (see Materials and Methods). We checked thatthe MCMC converged using the Gelman-Rubin diagnos-tic (see Materials and Methods, Fig. S1). In addition tothe inferred behavioral states corresponding to the com-mon ancestor, ¯ P Anc , we also reconstructed the mean be-havioral representations for the intermediate ancestors(Fig. 3). Further validation of our results correspondingto the current species behavior is shown in Fig. S2.
INDIVIDUAL VARIABILITY AND LONGTIMESCALE CORRELATIONS
While it is not possible to directly test the accuracy ofour ancestral state reconstructions, the inferred covari-ance matrices generate predictions about genetic corre-lations that are, in principle, testable. We therefore focuson our fitted covariance matrix, V ( e ) ∈ (cid:60) × , whichaccounts for within-species random effects.We first note that V ( e ) exhibits a modular structure(Fig. 4A). After rearranging the behavior order via aninformation-based clustering procedure [40], we see thata block diagonal pattern emerges, with positive correla- - - A BC C o v a r i an c e FIG. 4. The structure of variability between flies of the same species relates to long timescale transitions in behavior. A:The intra-species behavioral covariance matrix ( V ( e ) ), with columns and rows ordered via an information-based clusteringalgorithm [40]. The black squares represent behaviors that are grouped together in the three cluster solution. B: Behavioralmap representation of the clustering solutions. The two, three, and six cluster solutions are shown on top (colors on the threecluster solution match those above the plot in A). The clusters are all spatially contiguous and break down hierarchically (seeFig. S3 for more examples). C: Clustering structure of the behavioral space obtained finding the optimally predictive groupsof behaviors (see text for details). Note how these clusterings are nearly the same as the clusterings in B, despite having beenderived from an entirely independent measure. tions lying within the blocks and negative correlationslying off the diagonal. This clustering approach mini-mizes the functional F = (cid:104) d (cid:105) + βI ( C ; b ), where (cid:104) d (cid:105) is theaverage within-cluster distance between behaviors (de-fined here as d ij = [1 − V ( e ) ij / (cid:113) V ( e ) ii V ( e ) jj ]), I ( C ; b ) isthe mutual information between cluster assignment andbehavior number, and β modulates the relative impor-tance of the two terms (see Materials and Methods). Thismodular structure emerges when applying other cluster-ing methods as well (Fig. S3). Quantifying the matrix’smodularity, we find that (cid:104) d (cid:105) ≈ .
30 and 0 .
22 for the3 and 6-cluster solutions respectively. These values aresignificantly smaller than the average distances obtainedusing random cluster assignments ( (cid:104) d (cid:105) = 0 . ± .
03 and0 . ± .
04 for 3 and 6 clusters respectively, see Fig. S5).Strikingly, these clusters are spatially contiguous in thebehavioral map – implying that similar behaviors explainmuch of the intra-species variance [23]. Moreover, newclusters emerge in a hierarchical fashion, where coarse-grained behaviors sub-divide into new clusters (Fig. 4B),a feature that is not guaranteed by the information-basedclustering algorithm.This hierarchical structure of the behavioral space isreminiscent of the hierarchical temporal structure of be-havior that was hypothesized originally by ethologists[41] and was observed to optimally explain the longtimescale structure of
Drosophila melanogaster behav-ioral transitions [23]. To explore this connection fur-ther, we found coarse-grainings of the behavioral spacethat are optimally predictive of the future behaviors thatthe flies perform via the Deterministic Information Bot-tleneck (DIB) [42]. Similar to the previously described information-based clustering method, this approach min-imizes a functional, J τ = − I ( b ( t ); Z ( t + τ )) + γ H ( Z ),where b ( t ) is a fly’s behavior at time t , Z ( t + τ ) is thecoarse-grained behavior visited at time t + τ , τ = 50, I ( b ( t ); Z ( t + τ )) is the mutual information between thesequantities, γ is a positive constant, and H ( Z ) is the en-tropy of the coarse-grained representation (see Materialand Methods). As γ is increased, progressively coarserrepresentations are found.Applying this method to the data pooled across allsix species (Figs. 4C, S4), we again found the same typeof hierarchical division in the behavioral space that wasobserved for freely moving D. melanogaster [23]. More-over, we found that the structure of the space using thisapproach closely mirrors the structure found via cluster-ing V ( e ) (Fig. 4C). We quantify the similarity betweenboth clustering partitions by calculating the WeightedSimilarity Index (WSI), a modification of the Rand In-dex [43] (Materials and Methods). The WSI between theinformation-based clustering method and the predictiveinformation bottleneck for three clusters is W SI = 0 . W SI = 0 .
87 for six clusters. For random clusterings,we would expect to observe 0 . ± .
02 and 0 . ± .
01 for3 and 6 clusters, respectively, indicating a non-randomoverlap between these two partitions. Fig. S3, shows thatthis result is independent of the clustering method andthe number of clusters.The overlap between these two coarse-grainings indi-cates that most individual variability in the behaviors weobserve results from non-stationarity in behavioral mea-surements, rather than from individual-specific variation.That is, much of the intraspecific variation appears to ( z b ) ij = μ i - μ j σ i + σ j ( z b ) ij f o rr eno r m a li z edp r obab ili t i e s - - - - ( p ) - - - - ( p )- - - - ( p ) - - - - ( p )- - - - ( p ) - - - - - ( p )- - - - ( p ) - - - ( p )- - - - ( p ) - - - - ( p )- - - ( p ) - - - ( p ) D. sanD. yak
B C
Many individuals within a species An individual over long-time scales
State 1 State 3
1 2 3
Behavioral repertoire
Hidden states Coarse-grained postural behaviors
Fly … State 2 time Fly A FIG. 5. Variability within a species, long timescale transitions, and hidden states modulating behavior. A: A cartoon of thehypothesized relation between individual variability within a species and long timescale transitions through hidden states. B:Accounting for the long timescale dynamics - by adjusting for the amount of time spent in each coarse-grained region (here,the six cluster solution at the top right of Fig. 4C) - affects the measured behavioral distributions between
D. santomea and
D. yakuba . Shown is the comparison of the Mahalanobis distance (( z b ) ij ) between behavioral distributions before (x-axis) andafter (y-axis) adjusting. C: Kernel density estimates of the distributions for the circled behaviors in B) on the left before (left)and after (right) adjustments. Solid lines represent D. santomea and dashed lines represent
D. yakuba . reflect flies recorded when they were experiencing differ-ent hidden behavioral states (i.e. moods), rather thanreflecting fixed (environmental or genetic) differences be-tween flies. This variation may have arisen because, al-though we controlled many variables (e.g., fly age, circa-dian cycle, temperature, and humidity), it is not possibleto control for all internal factors (e.g., hunger, arousal,etc.) that affect an animals behavioral patterns [44].The temporal coarse-graining of the behavioral spacethat we found via the DIB, gives insight into these non-stationarities, as they are optimally-predictive of the flysfuture behaviors. Given the contiguous nature of theseregions, this result means that flies tended to stay withinspecific regions of the behavioral space much longer thanone would assume from a Markov model.This observation implies that variation in behavior ob-served among individuals, especially in non-manipulatedsettings, is likely to often reflect a large component ofhidden behavior states (Fig. 5A). Thus, it may be pos- sible to improve upon behavioral measurements in manysettings by controlling for the variability associated withthese hidden states. For example, just because one flyperforms less anterior grooming than another may reflectthat the animal is in a different long timescale behavioralstate, rather than that the animal has a genetically en-coded preference for reduced grooming.A potential method for accounting for these artifactsis to normalize each individual’s behavioral density suchthat the amount of time that the animal spends in each ofthe coarse-grained regions is equalized. In other words,the amount of time spent anterior grooming, locomot-ing, etc. are set to be the same for all animals, thus ac-counting for the variability associated our inferred hiddenstates. Mathematically, if P i is the probability of observ-ing behavior i , and C i is the clustering assignment of this D F E A B C
Eigenvector 1 Eigenvector 3 Eigenvector 5 Eigenvector 6 Eigenvector 4 Eigenvector 2
Eigenvector components C o v a r i an c e D. sanD. yak - P D F D. simD. sech - P D F D. simD. mau - P D F E x p l a i ned v a r i an c e - - - - FIG. 6. Phylogenetic variability and behavioral meta-traits. A: (top) Clustering the phylogenetic covariance matrix (usingthe same information-based clustering method from Fig. 4), we observe that the clusters are no longer spatially contiguous.(bottom) The phylogenetic covariance matrix reordered according to four clusters (colors corresponding to the four-cluster mapabove). B: Fraction of variance explained by the largest eigenvalues of the phylogenetic covariance matrix. C: The eigenvectorscorresponding to the largest six eigenvalues. D: Distributions of the projections of individual density vectors from
D. santomea and
D. yakuba onto eigenvector 3. E: Same as in D but using projections of individuals from
D. sechellia and
D. simulans ontoeigenvector 4. F: Same as in D but using projections of individuals from
D. simulans and
D. mauritiana onto eigenvector 5. behavior, we can define a normalized probability, ˆ P i , viaˆ P i = ¯ P ( C i ) P ( C i ) i P i , (2)where P ( C ) i = (cid:80) k ∈ C P k is the total density in cluster C for an individual fly and ¯ P ( C ) is the average across allanimals.We found that applying this normalization to our dataoften results in substantial changes in the inferred dis-tributions of behavioral densities. For example, Fig. 5Bdisplays how the difference in behavioral density between D. santomea and
D. yakuba (as measured by the Ma-halanobis distance between the distributions) alters asa result of normalization. For some behaviors, the sig-nal increases (red points), and in some cases, it reverses(blue points). Thus, it is important to take these non-stationary effects into account when estimating how oftensingle behaviors are performed in studies of behavioralevolution. To measure these non-stationary effects, manybehaviors must be measured, not just a focal behavior.
IDENTIFYING PHYLOGENETICALLY LINKEDBEHAVIORS
One of the advantages of our approach is that we sep-arate variations in behavior corresponding to evolution- ary patterns, the phylogenetic variability, from variationsamong individuals of the same species. By studying theproperties of the phylogenetic covariance matrix ( V ( a ) ),we can identify behaviors that may be evolving together.We first characterized the coarse-grained structurewithin V ( a ) through the information-based clustering de-scribed in the previous section [40]. As seen in Fig. 6A,these clusters are not spatially contiguous in the behav-ioral space. This pattern contrasts to the spatial con-tiguity we observed for the individual covariance matrix(Fig. 4B). For example, the two-cluster solution (Fig. 6A,left) separates the behavioral space into side legs move-ments (middle) and certain locomotion gaits (far left)from the rest of behaviors. Similarly, non-localized struc-ture is also observed when the matrix is clustered into alarger numbers of clusters as well.One possible interpretation of these discontinuous clus-ters is that at the neural level, each of these groups ofmovements may reflect a motor response to shared up-stream commands [22]. For example, different types of lo-comotion might be controlled through the same descend-ing neural circuitry, but due to evolutionary changes, thesame commands could lead to different behavioral out-puts, as has been observed in fly courtship patterns [16].Thus, examination of phylogenetically course-grained re-gions such as these may provide a more biologically real-istic view of suites of evolving behaviors than does focuson single behaviors.To quantify these patterns as traits, we decomposed V ( a ) via an eigendecomposition. As seen in Fig. 6B, al-most all of the variance within the matrix can be ex-plained with only the first six eigenmodes. These eigen-vectors (Fig. 6C) share similar non-local structure to theclusterings described above. By projecting individual be-havioral vectors onto these eigenvectors, the resulting dotproducts represent a meta-trait that is a linear combina-tion of phylogenetically linked behaviors.These evolving meta-traits may be suitable targets forfurther neurobiological or genetic studies. Three exam-ples of these distributions are shown in in Fig. 6D forseveral pairs of closely related species. These three ex-amples were not chosen at random, but instead becausethey showed significant differentiation between species.The aim of this analysis is not to show that all meta-traitswould differ between all pairs of species, which strikes usas unlikely, but rather that it is possible to identify syn-thetic meta-traits that could be further interrogated withexperimental methods. DISCUSSION
We have developed a quantitative framework to studythe evolution of behavioral repertoires, using fruit flies(
Drosophila ) as a model system. We started with ob-servations of 561 individuals from six extant species be-having in an unremarkable environment. This assay didnot include social behaviors, such as courtship and ag-gression, nor many foraging behaviors. Thus, at firstglance, it might seem like we had excluded most species-specific behaviors from the analysis. Nonetheless, wefound that other complex behaviors, like walking, run-ning, and grooming, exhibit species-specific features thatcan be used to reliably assign individuals to the correctspecies. Thus, the motor patterns of behaviors that arenot normally investigated for their species-specific fea-tures are clearly evolving between even closely relatedspecies. It is not clear if these differences reflect naturalselection or genetic drift on the details of these motorpatterns. But, all of these behaviors would seem to becritical to individual survival, so it is possible that thesebehaviors have evolved, at least in part, in response tonatural selection. It is clear, however, that the underly-ing neural circuitry controlling these behaviors must haveevolved.Inspired by these observations, we estimated pat-terns of behavioral evolution in the context of a well-understood phylogeny. We fit a Generalized Mixed Lin-ear Model to our behavioral measurements and the givenphylogeny to reconstruct ancestral behavioral repertoiresand the intra- and inter-species covariance matrices. Wefound that the patterns of intra-species variability aresimilar to long timescale behavioral dynamics. This suggests that much of the intraspecific variability thatemerged by sampling flies under well-controlled condi-tions reflects variability in the hidden behavioral statesof individual flies. This variability is a clear confound forevolutionary and experimental studies of behavior and wetherefore propose a method to control for these internalstates and improve the accuracy of behavioral phenotyp-ing. We showed that controlling for these internal statescan dramatically alter estimates of the heritable elementsof behavior.Given our estimates for how suites of behaviorsevolved, we examined whether the inter-species covari-ance matrix could be used to identify behavioral meta-traits that might be subjected to further evolutionaryand experimental analysis. We identified multiple suitesof behaviors that differed between closely related species,providing a starting point for further analysis of how themechanismns underlying these suites of behaviors haveevolved.The analysis framework introduced here represents thefirst attempt to analyze full behavioral repertoires to gaininsight into evolution. In principle, this approach couldbe applied to any data set where a large number of be-haviors have been sampled in many species. We envisionseveral areas where future improvements may yield moredetailed, comprehensive, and biologically meaningful re-sults. First, we recorded behavior from only six speciesof flies. Adding additional species would place more con-straints on the evolutionary dynamics, likely resulting inless variance in the ancestral state estimations and poten-tially adding more structure to the relatively low rank co-variance matrices. Additionally, further work is requiredto determine the balance between sampling within andbetween strains and species that optimizes estimates ofevolutionary dynamics.Second, our framework assumes that all evolutionarychanges in behavior resemble a diffusion process. Al-though this assumption is a reasonable initial hypothesis[25], it may be possible to test this assumption. For ex-ample, deeper sampling of additional species may allowidentification of specific behaviors on particular lineageswhere neutrality can be rejected [45].In addition, all of our analyses involved measuringthe fraction of behaviors performed during the record-ing time, ignoring the temporal structure and sequencesof movements. While we show here that much of thisinformation can be related to the structure of the intra-species covariance matrix, the order in which behaviorsoccured may also provide important biological informa-tion. It should be possible to incorporate temporal struc-ture directly into the regression. Deciding exactly whichquantities to measure and how they should be incorpo-rated, however, are complex questions that are outsidethe scope of this initial study.Lastly, capturing the full range of animal behaviors fora large number of animals presents a number of techno-logical challenges, which is why we focused on measuringbehavior in a highly simplified environment. However, amore complete understanding of the structure of behav-ior will require more sophisticated ways to capture be-havioral dynamics in more naturalistic settings and dur-ing complex social arrangements. While modern deeplearning methods have made tracking animals in morerealistic settings increasingly plausible [46, 47], there arestill considerable hurdles to translating this informationinto a form that can be subjected to the kind of analysiswe propose here.Despite these limitations, this work represents a newway to quantitatively characterize the evolution of com-plex behaviors, which may provide new phenotypes thatcan be subjected to experimental analysis. In the ab-sence of a behavioral fossil record, reconstructing ances-tral behaviors requires an inferential approach like theone we present here. In addition, more complex modelscould be built to test assumptions underlying this ini-tial, diffusion-based, model. Finally, a strength of ourapproach is that it makes falsifiable predictions abouthow behaviors are linked mechanistically, providing pre-dictions that can be tested experimentally to provide fur-ther insight in the genetic and neurobiological structureof behavior.
MATERIALS AND METHODSData collection
All imaging of fly behavior followed the procedures de-scribed in [22], but without any red light stimulation.In total, we collected data from 561 individual from 18strains and six species. These included three strains of
D.mauritiana ( mau29 : 29 flies, mau317 : 35 flies, mau318 :32 flies), four strains of D. melanogaster ( Canton-S : 31flies,
Oregon-R : 33 flies, mel54 : 34 flies, mel56 : 31 flies),three strains of
D. santomea ( san00 : 29 flies, san1482 :33 flies, STO OBAT : 22 flies), three strains of
D. sechel-lia ( sech28 : 32 flies, sech340 : 25 flies, sech349 : 33 flies),three strains of D. simulans ( sim5 : 33 flies, sim199 : 30flies, Oxnard : 34 flies), and two strains of
D. yakuba ( yak01 : 34 flies, CYO2 : 31 flies).
Generalized Linear Mixed Model
We fit our GLMM (Eq. 1) using the software in-troduced in [24]. The covariance matrices V ( e ) and V ( a ) ∈ (cid:60) K × K , K = 134 and the mean vector (cid:126)µ ∈ (cid:60) K × ,were inferred from the posterior distribution via MCMCsampling. Prior distributions for the covariance matri-ces were given by Inverse Wishart Distributions (conju-gate priors for the multi-Gaussian model) with K degreesof freedom and K +1 I + J as scale matrix, with J and I the unit and identity matrices respectively. Tree branchlength were estimated from [30]. Gelman-Rubin convergence diagnostic
This test evaluates MCMC convergence by analyzingthe difference between several Markov chains. Conver-gence is evaluated by comparing the estimated between-chains and within-chain variances for each parameter ofthe model. Large differences between these variancesindicate non-convergence [48]. Let θ be the model pa-rameter of interest and { θ m } Nt =1 be the m th simulatedchain, m = 1 , , ..., M . Denote, ˆ θ m and ˆ σ m be the sam-ple posterior mean and variance of the m th chain. Ifˆ θ = M (cid:80) Mm =1 ˆ θ m is the overall posterior mean estimator,the between-chains and within-chain variances are givenby: B = NM − M (cid:88) m =1 (ˆ θ m − ˆ θ ) , W = 1 M M (cid:88) m =1 ˆ σ m . (3)In reference [48], it is shown that the following weightedaverage of W and B is an unbiased estimator of themarginal posterior variance of θ : ˆ V = N − N W + M +1 NM B .The ratio ˆ V /W should get close to one as the M chainsconverge to the target distribution with N → ∞ . In ref-erence [49] this ratio known as the potential scale reduc-tion factor (PSRF) was corrected to account for the thesampling variability using R c = (cid:113) d +3 d +1 ˆ VW , where d is thedegrees of freedom estimate of a t-distribution. Values ofPSRF for all model parameters such that R c < . Information-based clustering
Minimizes the distance between elements within clus-ters while compressing the original representation. Themethod minimizes the functional F = (cid:104) d (cid:105) + T I ( C ; i ),where I ( C ; i ) = (cid:80) Ni =1 (cid:80) N c C =1 P ( C ; i ) log[ P ( C | i ) P ( C ) ] is themutual information between the original behavioral vari-able i and the representation C . (cid:104) d (cid:105) = (cid:80) N c C =1 P ( C ) d ( C ),and d ( C ) is the average distance of elements chosen outof a single cluster: d ( C ) = N (cid:88) i N (cid:88) i P ( i | C ) P ( i | C ) d ( i , i ) , (4)with d ( i , i ) being the distance measure between a pairof elements and P ( i | C ) being the probability to findelement i in cluster C .0Given | C | = N c , T and a random initial condi-tion for P ( C | i ), a solution is obtained by iteratingthe following self-consistent equations until the criteria F t −F t +1 F t < − is satisfied. We chose 40 ,
000 differentrandom values of T ∈ [0 . , N c between 2 and 20,and performed the optimization in each case until theconvergence criterion was met. We defined the Paretofront as the set of solutions P ( C | i ) such that no othersolution presents a smaller (cid:104) d (cid:105) and a smaller I ( C ; i ). Fi-nally, for each number of clusters we selected the solutionwith the lowest (cid:104) d (cid:105) .For each number of clusters, the significance of theoptimal value found for (cid:104) d (cid:105) is shown by comparing itto the average distance corresponding to random clusterassignments. These assignments are made in such a waythat the amount of elements per cluster is conserved byrandomly shuffling the vector that assigns each behaviorto a particular cluster. The values presented in the maintext correspond to the mean and standard deviation of (cid:104) d (cid:105) over 50 different random trials. Deterministic Information Bottleneck
Here we use the Deterministic Information Bottleneck(DIB) method to find coarse-grainings of the behavioralspace that optimally predict future states [42]. Inspiredby the Information Bottleneck [50], DIB replaces thecompression measure I ( X, Z ) with the entropy H ( Z ),thus emphasizing constraints on the representation. DIBminimizes the functional: L α = H ( Z ) − αH ( Z | X ) − βI ( Z ; Y ) , (5)with respect to p ( z ∈ Z | x ∈ X ) and takes the limit as α → b ( n ) that can take on N = 134 different integer values at each discrete time n . Here, we relate the joint distributions of b ( n ) ( X inEqn. 5) and b ( n + τ ) ( Y ) through a coarse-grained clus-tering of the behavioral states ( Z ). We chose 10 , β between 0 . and N c between 2 and 30 clusters. Given N c , β anda random initial condition for p ( t | x ), we find a solutionby iterating the self-consistent equations [42] until theconvergence criteria | L t − L t +1 | < − is satisfied. Ifany cluster has its probability become zero at any itera-tion, then that cluster is dropped for all future iterations,thus N c is the maximum number of clusters that can bereturned. Of these 10 ,
000 solutions, we keep all solutionsthat are on the Pareto front (i.e., no other solution hasboth a higher I ( Y ; T ) and a smaller H ( T )). The dis-played clusters are the solutions on the Pareto front withthe largest I ( Y ; Z ) for a given number of clusters. Weighted Similarity Index
We quantify the similarity between clustering par-titions by calculating the Weighted Similarity Index(WSI), a modification of the Rand Index [43] such thatbehaviors contribute the index according to their overallprobability. Specifically,WSI = (cid:88) i,j ∈ S a W ij + (cid:88) k,l ∈ S b W kl , W ij = P i P k (cid:80) kl P k P l , (6)where S a ( S b ) is the set of pairs of behaviors that belongto the same (different) cluster in the two partitions and P k is the probability of observing behavior k .We thank Ilya Nemenman and Daniel Weissman fortheir helpful comments on the manuscript. D.G.H. wassupported by Programa Raices from the MinCyT. C.R.was supported by the NSF Physics of Living Systems Stu-dent Research Network (1806833). G.J.B. was supportedby NIMH R01 MH115831-01, the Human Frontier Sci-ence Program (RGY0076/2018), and a Cottrell ScholarAward, a program of the Research Corporation for Sci-ence Advancement (25999). J.C., D.L.S., and G.J.B.were supported by the Howard Hughes Medical Instituteand the Janelia visiting researcher program. ∗ D.G.H. and C.R. contributed equally to this work. † To whom correspondence should be addressed: [email protected].[1] K. Z. Lorenz, Scientific American , 67 (1958).[2] E. P. Martins and E. L. P. Martins,
Phylogenies and thecomparative method in animal behavior (Oxford Univer-sity Press, 1996).[3] F. Baier and H. E. Hoekstra, Proceedings of the RoyalSociety B , 20191697 (2019).[4] M. J. West-Eberhard,
Developmental plasticity and evo-lution (Oxford University Press, Oxford, U.K., 2003).[5] J. M. Gleason and M. G. Ritchie, Genetics , 1303(2004).[6] D. Yamamoto and Y. Ishikawa, Journal of Neurogenetics , 130 (2013).[7] C. Ellison, C. Wiley, and K. Shaw, Journal of Evolution-ary Biology , 1110 (2011).[8] K. L. Shaw and S. C. Lesnick, Proceedings of the Na-tional Academy of Sciences , 9737 (2009).[9] T. M. Williams and S. B. Carroll, Nature Reviews Ge-netics , 797 (2009).[10] N. Shubin, C. Tabin, and S. B. Carroll, Nature , 818(2009).[11] M. Levine and E. H. Davidson, Proceedings of the Na-tional Academy of Sciences , 4936 (2005).[12] D. L. Stern and N. Frankel, Philosophical Transactions ofthe Royal Society B: Biological Sciences , 20130028(2013).[13] B. S. Baker, B. J. Taylor, and J. Hall, Cell , 13(2001).[14] J. Cande, P. Andolfatto, B. Prud’homme, D. L. Stern,and N. Gompel, PLoS One , 1 (2012). [15] J. Cande, D. L. Stern, T. Morita, B. Prudhomme, andN. Gompel, Cell reports , 363 (2014).[16] Y. Ding, J. L. Lillvis, J. Cande, G. J. Berman, B. J.Arthur, M. Xu, B. J. Dickson, and D. L. Stern, CurrentBiology , 1089 (2019).[17] J. N. Webert, B. K. Peterson, and H. E. Hoekstra, Na-ture , 402 (2013).[18] C. K. Hu and H. E. Hoekstra, Seminars in cell & devel-opmental biology , 107 (2016).[19] G. J. Berman, D. M. Choi, W. Bialek, and J. W. Shae-vitz, Journal of The Royal Society Interface , 20140672(2014).[20] G. J. Berman, BMC Biology , 23 (2018).[21] A. E. X. Brown and B. de Bivort, Nature Physics , 410(2018).[22] J. Cande, S. Namiki, J. Qiu, W. Korff, G. M. Card, J. W.Shaevitz, D. L. Stern, and G. J. Berman, eLife , e34275(2018).[23] G. J. Berman, W. Bialek, and J. W. Shaevitz, Proceed-ings of the National Academy of Sciences , 11943(2016).[24] J. Hadfield, Journal of Statistical Software , 1 (2010).[25] J. Felsenstein, The American Naturalist , 1 (1985).[26] J. D. Hadfield and S. Nakagawa, Journal of evolutionarybiology , 494 (2010).[27] Drosophila
12 Genomes Consortium, Nature , 203(2007).[28] D. J. Obbard, J. Maclennan, K.-W. Kim, A. Rambaut,P. M. O’Grady, and F. M. Jiggins, Molecular Biologyand Evolution , 3459 (2012).[29] S. Chyb and N. Gompel, Atlas of Drosophila Morphology:Wild-type and classical mutants (Academic Press, 2013).[30] A. S. Seetharam and G. W. Stuart, PeerJ , e226 (2013).[31] D. L. Stern, J. Crocker, Y. Ding, N. Frankel, G. Kappes,E. Kim, R. Kuzmickas, A. Lemire, J. D. Mast, and S. Pi-card, G3 , 1339 (2017).[32] D. Yamamoto and Y. Ishikawa, Journal of Neurogenetics , 130 (2013).[33] T. O. Auer and R. Benton, Current opinion in neurobi-ology , 18 (2016). [34] L. van der Maaten and G. Hinton, Journal of MachineLearning Research , 2579 (2008).[35] F. Meyer, Signal processing , 113 (1994).[36] I. Goodfellow, Y. Bengio, and A. Courville, Deep learn-ing (MIT press, Cambridge, MA, 2016).[37] Z. Yang et al. , Computational molecular evolution , Vol.284 (Oxford University Press Oxford, Oxford, U.K.,2006).[38] C. W. Cunningham, K. E. Omland, and T. H. Oakley,Trends in Ecology & Evolution , 361 (1998).[39] J. Hadfield and S. Nakagawa, Journal of evolutionary bi-ology , 494 (2010).[40] N. Slonim, G. S. Atwal, G. Tkaik, andW. Bialek, Proceedings of the NationalAcademy of Sciences The Study of Instinct (Oxford UniversityPress, Oxford, U. K., 1951).[42] D. Strouse and D. J. Schwab, Neural Computation ,1611 (2017).[43] W. M. Rand, Journal of the American Statistical associ-ation , 846 (1971).[44] D. J. Anderson, Nature Reviews Neuroscience , 692(2016).[45] F. Tajima, Genetics , 599 (1993).[46] T. D. Pereira, D. E. Aldarondo, L. Willmore, M. Kislin,S. S. H. Wang, M. Murthy, and J. W. Shaevitz, NatureMethods , 117 (2018).[47] M. W. Mathis and A. Mathis, Current Opinion in Neu-robiology , 1 (2020).[48] A. Gelman and D. B. Rubin, Statistical Science , 457(1992).[49] S. P. Brooks and A. Gelman, Journal of computationaland graphical statistics , 434 (1998).[50] N. Tishby, F. C. Pereira, and W. Bialek, in Proceedingsof the 37th Annual Allerton Conference on Communica-tion, Control and Computing (University of Illinois Press,Urbana-Champaign, IL, 1999) pp. 368–377. FIG. S1. Gelman Rubin diagnostic for model parameters inferred using MCMC. A: PSRF for the 134 ancestral behaviorsinferred in the GLMM. 20 MCMC chains with different initial conditions were used. B: PSRF for the phylogenetic covariancematrix elements corresponding to the 10% most common behaviors performed by the measured flies. C: PSRF for the individualcovariance matrix elements corresponding to the 10% most common behaviors performed by the measured flies. The PSRFvalues for all of these inferred parameters indicate that the MCMC chains are converging. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - D. yakuba D. santomea D. sechellia D. simulans D. mauritiana D. melanogaster
Mean species behavior (Log) M e a n o f i n f e rr e d s p e c i e s b e h a v i o r ( L o g ) FIG. S2. Comparison between measured and inferred behaviors (in log scale) for each of the extant species. The mean of themeasured behavioral repertoires for all the individuals of a particular species is taken in the log scale. Each measured behavioralmean gets compared to the mean obtained from the components of the MCMC samples corresponding to that particular speciesand behavioral mode (i.e., the inferred behavioral repertories from the GLMM). The biggest differences occur mostly in thelow probability behaviors. WSIShuffleWSI W S I W S I W S I BAC
WSIShuffleWSIWSIShuffleWSI
FIG. S3. Behaviors clustered according to information of the individual covariance matrix using three different clusteringmethods. A: Results using k-medoids clustering method with distance matrix d ij = (1 − ρ ij ) / A B ●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ■ ■ ■■ ■ ■ H ( Z ) I ( b ( t ) , Z ( t + τ )) Pareto Front
FIG. S4. Coarse-grained behavioral representations that are optimally predictive of the future behavior states via DIB. A:Behavioral representation with 2,3,...,7 clusters using τ = 50 in Eq. 5. B: Optimal trade-off curve (Pareto Front) betweencomplexity of coarse grained description against predictive power. For each number of clusters, representations in A correspondto points (in red) in this curve with the highest predictive information < d > < d >< d > Shuffle
FIG. S5. Modularity measure of the intra-species behavioral covariance matrix using information based clustering. < d >< d >