Multi-scale modeling of animal movement and general behavior data using hidden Markov models with hierarchical structures
Vianey Leos-Barajas, Eric Gangloff, Timo Adam, Roland Langrock, Floris M. van Beest, Jacob Nabe-Nielsen, Juan M. Morales
MMulti-scale modeling of animal movement and generalbehavior data using hidden Markov models withhierarchical structures
Vianey Leos-Barajas ∗ , Eric J. Gangloff , Timo Adam , Roland Langrock ,Floris M. van Beest , Jacob Nabe-Nielsen and Juan M. Morales Iowa State University, USA Bielefeld University, Germany Aarhus University, Denmark INIBIOMA-CONICET, Argentina
Abstract
Hidden Markov models (HMMs) are commonly used to model animal movementdata and infer aspects of animal behavior. An HMM assumes that each data pointfrom a time series of observations stems from one of N possible states. The states areloosely connected to behavioral modes that manifest themselves at the temporal res-olution at which observations are made. However, due to advances in tag technology,data can be collected at increasingly fine temporal resolutions. Yet, inferences at timescales cruder than those at which data are collected, and which correspond to larger-scale behavioral processes, are not yet answered via HMMs. We include additional a r X i v : . [ s t a t . M E ] F e b ierarchical structures to the basic HMM framework in order to incorporate multipleMarkov chains at various time scales. The hierarchically structured HMMs allow forbehavioral inferences at multiple time scales and can also serve as a means to avoidcoarsening data. Our proposed framework is one of the first that models animal be-havior simultaneously at multiple time scales, opening new possibilities in the area ofanimal movement modeling. We illustrate the application of hierarchically structuredHMMs in two real-data examples: (i) vertical movements of harbor porpoises observedin the field, and (ii) garter snake movement data collected as part of an experimentaldesign. Keywords: animal behavior, bio-logging, experimental design, latent process, state-switchingmodel, temporal resolution
Hidden Markov models (HMMs) and related state-switching models are prevalent in the fieldof animal movement modeling, where they provide a flexible framework to infer aspects ofanimal behavior from various types of movement data (Morales et al. , 2004; Patterson et al. ,2009; Langrock et al. , 2012, 2014). They are very natural models for time series data relatedto animal movement, as they account for the serial dependence typically observed, yet alsoallow to (loosely) connect each observation to distinct underlying behavioral modes. A basicHMM for movement data consists of two stochastic processes: an observed movement processand an underlying state process, the latter of which can be related to distinct behavioralmodes, at least in the sense of serving as a proxy of the actual behavioral process (Patterson et al. , 2009; Langrock et al. , 2012). Applications of HMMs to movement data often focus on2nvestigating the effect of individual and environmental covariates on state occupancy, andthus ultimately on the dynamics of the variation in behavioral modes in response to internaland external drivers.Generally, movement data are analyzed such that the observation process is assumedto stem from a single (behavioral) state process. It may however be the case that thereare two (connected) behavioral processes that occur at distinct time scales. For instance,so-called hierarchical HMMs have been used to process data on handwriting in order todistinguish between distinct letters but also to recognize a word, defined as a sequence ofwritten letters (Fine et al. , 1998). However, these versatile extensions of HMMs have notyet been applied to movement data, even in light of the intuitive idea that distinct behaviorsmanifest themselves at different time scales (hereafter referred to as multi-scale behaviors).A motivating example to have in mind is a central-place forager such as the southern elephantseal. These animals exhibit large-scale migration movements (from land colonies to eitherthe sea ice zone around Antarctica or into open-ocean pelagic zones, and back), but alsomovement patterns where much more frequent changes take place between behavioral modes,e.g. “foraging” and “resting” modes (Michelot et al. , 2016). The modeling framework wepropose regards such data as stemming from two behavioral processes, which operate ondifferent time scales: the first process determines the behavioral mode at the cruder timescale (e.g., whether or not an elephant seal is performing a migratory trip, and also whatkind of migratory trip), while the second process, at the finer time scale, determines thebehavioral mode nested within the large-scale mode (e.g., whether an elephant seal is restingor foraging, given it is close to the sea ice zone, or whether it is traveling or foraging, givenit is on a migratory trip).For multi-scale modeling of animal movement data, we propose an extension to the stan-3ard HMM that allows for a hierarchical state process, where two (or more) different Markovchains, operating at different time scales, will be tied together. To illustrate the applicationof hierarchically structured HMMs in a real-data setting, we model vertical movements ofa harbor porpoise (
Phocoena phocoena ) throughout its natural habitat in the northeasternpart of the North Sea. While the data were collected at a dive-by-dive resolution, the aimhere is to infer dive patterns at two different temporal scales: an hourly scale to infer thegeneral behavioral mode (e.g. resting or traveling), which may persist for a large numberof consecutive dives, and a fine-scale process to infer more nuanced state transitions at adive-by-dive resolution given the general behavioral mode. As a second real-data example,we model baby garter snake (
Thamnophis elegans ) movement data produced in a controlledexperimental design context. The hierarchically structured HMM has two Markov chains,where one Markov chain models the transitions among three types of movements (distancetraveled in 1/2 s) and the second Markov chain models transitions across the six time se-ries produced per snake. As this group of garter snakes was assigned to the control group,indicating no treatment effect, we use the second Markov chain to investigate personalityand repeatability in their movement patterns. That is, we attempt to answer if the gartersnakes adapt their general movement strategies or if they have tendencies to exhibit thesame general movement pattern across multiple time series.A conceptual challenge with HMMs, and in fact any discrete-time models for behavioraldata, is that the temporal resolution of the observations being analyzed (e.g., hourly, daily,etc.) determines what kind of behaviors may be inferred at all. Strictly speaking, this is not aproblem arising from the model applied, but rather from the sampling protocol, i.e., the data.For instance, Towner et al. (2016) processed white shark location data, collected every fiveminutes, into distance traveled and turning angle and subsequently connected each bivariate4bservation to “area-restricted search” and “transiting” behavior. If the shark’s locationwere observed once per day, we could not infer the same behaviors as switches between thesebehavioral modes occur at a much finer temporal scale. The hierarchically structured HMMswill not solve the conceptual challenges associated with data processing or data collectionrequired to infer multi-scale behaviors. However, it does offer new opportunities in theanalysis of animal movement data, allowing for identification of general behavioral patternsthat are a composition of fine-scale observations and inferences to be made at multiple time-scales.
In Section 2.1 we first detail the basic HMM framework in order to introduce the necessarynotation that will be used throughout the paper. In Section 2.2, we introduce the hierarchicalmodel formulation, distinguishing between two types of latent states, production states andinternal states, which occur at distinct time scales.
A basic HMM is composed of two stochastic processes: an observable state-dependent process { Y t } Tt =1 and an unobservable state process { S t } Tt =1 taking on a finite number of states. Herewe call the state a production state (as it produces an observation), in order to differentiateit from other forms of the latent states which we introduce in Section 2.2. As is generalpractice, we assume a first-order Markov process at the production state level, such that5 t , the production state at time t , given the states at all previous times, will only dependon S t − . We further assume Y t , t = 1 , . . . , T , to be conditionally independent of past andfuture observations and production states, given the production state S t , such that theproduction states effectively select from which of finitely many possible distributions eachobservation is drawn. Due to the Markov property, the evolution of the production statesover time is governed by the transition probability matrix (t.p.m.), Γ = ( γ ij ), where γ ij =Pr( S t = j | S t − = i ) for i, j = 1 , ..., N , with N denoting the number of production states.The initial distribution, δ , is a vector of probabilities with entries δ i = Pr( S = i ), of thefirst observation y belonging to one of the N production states. It is common to assumethe initial distribution to be the stationary distribution, defined as the solution to Γ δ = Γ .However, δ can also be estimated. In order to ensure identifiability when estimating theentries of the t.p.m., we map the entries of each row onto the real line with the use of themultinomial logit link and set the diagonal entries of the matrix as the reference categoriesin the following manner, γ ij = exp( η ij ) (cid:80) Nk =1 exp( η ik ) , where η ij = β ( ij ) if i (cid:54) = j ;0 otherwise . We similarly use a multinomial logit link transformation for the initial distribution, if es-timated rather than assumed to the be the stationary distribution. The state-dependentdistributions for Y t will be represented in terms of probability density or mass functions f ( y t | S t = i ) = f i ( y t ); i = 1 , . . . , N . If the observations are multivariate, in which case wewrite Y t = ( Y t , . . . , Y Rt ), we can either formulate a joint distribution f i ( y t ) or assume con-temporaneous conditional independence by allowing the joint distribution to be represented6s a product of marginal densities, f i ( y t )= f i ( y t ) f i ( y t ) · · · f Ri ( y Rt ). While parametric fam-ilies are usually chosen for the f i , such as a Gaussian or gamma distribution, we can alsoestimate the distribution nonparametrically by expressing it as a linear combination of alarge number of basis functions (Langrock et al. , 2015). The likelihood of an individual timeseries can be expressed concisely as a matrix product, L p ( y , . . . , y T ) = δ T P ( y ) T (cid:89) t =2 ΓP ( y t ) , (1)where P ( y t ) = diag ( f ( y t ) , . . . , f N ( y t )), δ denotes the initial distribution, and is a columnvector of ones. Calculation of the matrix product given above — the computational cost ofwhich notably is only linear in T — is equivalent to applying the forward algorithm, whichis an efficient recursive scheme for calculating the likelihood of an HMM (Zucchini et al. ,2016). The framework for the basic HMM accounts for switches at the production state level. Ina movement modeling analysis, the production states are generally thought to be proxiesfor behavior occurring at the time scale at which the data were collected (or processed).However, as outlined in the elephant seal example in the introduction, production statesalone may not be sufficient to encompass complex multi-scale behavioral processes. Morespecifically, there may be crude-scale behavioral processes (e.g. migration) that manifestthemselves as a sequence of production states and associated observations. Intuitively, wewould then connect a behavior occurring at a cruder time scale to one of K internal states ,7uch that each internal state generates a distinct HMM, with the corresponding N productionstates producing the actual observations.Akin to the basic HMM framework, we can think of a fine-scale sequence of observations, y m = ( y ,m , . . . , y T,m ) — with one such sequence for each m = 1 , . . . , M — to be producedby a sequence of production states, S ,m , . . . , S T,m , during a given time frame (namely the m th of M time frames). However, in addition we now assume that the way in which thesequence of production states is generated depends on which of K possible internal statesis active during the current time frame. The length of the sequence of production statesproduced by the k th internal state can be dictated by the data collection process or imposedby the analysis. The corresponding K -state internal state process, { H m } Mm =1 , is such that H m serves as a proxy for a behavior occurring at a cruder time scale, namely throughout the m th time frame. In this manner, we account for differences observed across the time series y m , m = 1 , . . . , M , by connecting each with one of K crude-scale behavioral processes whilestill modeling the transitions among production states at the time scale at which the datawere collected. Supposing that there are multiple time series per individual, we can modelthe manner in which an animal switches among the K internal states (behavioral processes).We assume a Markov process at the individual time series level, i.e. Pr( H m | H m − , . . . , H ) =Pr( H m | H m − ), such that the m th internal state is conditionally independent of all otherinternal states given the internal state at the ( m − th time point. The K × K t.p.m. for theinternal states { H m } Mm =1 examines persistence in the internal states, as well as the mannerin which an animal will switch among them. Figure 1 displays the dependence structure ofhierarchically structured HMMs with two Markov chains, one at the level of the productionstates, S t,m , and the other at the level of the internal states, H m .Let δ ( I ) denote a K − vector of initial probabilities for the internal states, let Γ ( I ) denote8 · · H m − H m H m +1 · · · S t,m S t − ,m S t +1 ,m Y t − ,m Y t,m Y t +1 ,m · · · · · ·· · · · · ·· · · · · · observablehiddenFigure 1: Dependence structure in hierarchically structured HMMs.the K × K t.p.m. for the internal state process, and define P ( I ) ( y m ) = diag (cid:0) L p ( y m | H m = 1) , . . . , L p ( y m | H m = K ) (cid:1) . The likelihoods L p ( y m | H m = k ), k = 1 , . . . , K , have the form as given in (1), andonly vary across k in terms of the implied production-level t.p.m. (and potentially alsothe production state-dependent distributions). Then the likelihood for the hierarchicallystructured HMM is, L h = δ ( I ) P ( I ) ( y ) M (cid:89) m =2 Γ ( I ) P ( I ) ( y m ) As the estimated production states are generally proxies for behaviors, allowing for only thet.p.m. to vary across the K HMMs leads to an interpretation of the K internal states (looselyconnected to K behavioral processes) as distinct manners in which an animal will persist andswitch among the production states. As long as the individual time series’ likelihoods, L p ,can be evaluated in an efficient manner, we can evaluate the likelihood of the hierarchicallystructured HMM via the forward algorithm, since the general structure does not differ fromthat of the basic HMM, and thus maximize it directly (Zucchini et al. , 2016). The Viterbialgorithm can be used for global state decoding, i.e. finding the sequence of the most likely9nternal and production states, respectively, given the observations. . To illustrate the application of hierarchically structured HMMs, we modelvertical movements of a harbor porpoise ( Phocoena phocoena ) throughout its natural habitatin the northeastern part of the North Sea. From a time-depth recorder (LAT1800ST, Lotek,Ontario, Canada), we obtained observations of the dive depth every second. Assuming a“dive” to be any vertical movement deeper than two meters below the surface, we used the R package diveMove (Luque, 2007) to process the raw data into measures of the dive duration,the maximum depth and the dive wiggliness (as represented by the absolute vertical distancecovered at the bottom of each dive) to characterize the porpoise’s vertical movements at adive-by-dive resolution. Previous applications of HMMs, though not hierarchically structuredHMMs, with dive-by-dive data of marine mammals have been presented in Hart et al. (2010)and DeRuiter et al. (2016). Overall, we consider 275 hours of observations, comprising 7,585dives in total (hence, about 28 dives per hour). . Behavioral modes of marine mammals, e.g., rest-ing, foraging and traveling, do not necessarily manifest themselves at a dive-by-dive resolu-tion. For example, foraging behavior typically coincides with a large proportion of extensive,wiggly dive sequences. However, foraging sequences may be interspersed by short periodsof resting behavior (shallow and smooth dives) even though the dominant behavioral modemay still be foraging. Such patterns are especially likely to occur in harbor porpoise divedata, a species that needs to feed almost continuously to meet energy requirements (Wis-10iewska et al. , 2016). In these cases, hierarchically structured HMMs have strong potentialto infer the movement strategies adopted over time, by modeling the transitions betweendistinct dive patterns (as represented by multiple HMMs) rather than modeling dive-by-diveobservations using a single HMM. Thus, to draw a more detailed picture of the behavioraldynamics at multiple time scales, we use hierarchical HMMs, where a crude-scale K -stateMarkov chain selects which of K fine-scale HMMs describes the dive pattern observed atany point in time. Intuitively, the crude-scale process describes the general behavioral mode(e.g. resting or traveling) — which may persist for a large number of consecutive dives —while the fine-scale process captures more nuanced state transitions at the dive-by-dive level,given the general behavioral mode.In terms of the crude time scale, we segmented the time series into hourly intervals andallowed each segment to be connected to one of K = 2 HMMs with N = 3 (dive-by-dive-level)states each. This somewhat arbitrary time scale was chosen based on exploratory analysisof the data set, which suggested that a certain dive pattern is typically adopted for severalhours before switching to another one (c.f. Figure 3). As comprehensively discussed in Pohle et al. (2017), model selection criteria such as Akaike’s Information Criterion (AIC) or theBayesian Information Criterion (BIC) typically tend to favor models with larger number ofstates than are biologically sensible. This is indeed a well-known and notorious problem inapplications of HMMs to ecological data (see also Langrock et al. , 2015, DeRuiter et al. ,2016, Li and Bolker, 2017). Thus, following Pohle et al. (2017), instead of relying on formalmodel selection procedures, the number of states was chosen pragmatically, with particularemphasis on model parsimony and biological intuition.The state-dependent distributions were kept the same across the two dive-level HMMs,which were instead allowed to differ only by the t.p.m.s. This assumption implies that any11f the three types of dives — as generated by the three different production states — couldin principle occur in both crude-level behavioral modes, but will not occur equally often,on average, due to the different Markov chains active at the dive-by-dive level. The initialstate distributions, both for the internal and for the production state process, were assumedto be the stationary distributions of the respective Markov chains. We assumed gammadistributions for each of the three dive variables (dive duration, maximum depth and divewiggliness), with an additional point mass on zero in case of dive wiggliness to account forthe zeros observed. We assumed contemporaneous conditional independence, i.e. for anygiven dive, the three variables observed are conditionally independent given the productionstate active at the time of the dive (Zucchini et al. , 2016). These assumptions could in factbe relaxed if deemed necessary. However, for this case study we decided that in order toillustrate the key concepts, it would be best to focus on a relatively simple yet biologicallyinformative model structure.We computed the likelihood using the forward algorithm (Zucchini et al. , 2016) and usedthe R function nlm (R Core Team, 2016) to obtain maximum likelihood estimates via directnumerical likelihood maximization. . The fitted (dive-level) state-dependent distribu-tions displayed in Figure 2 suggest three distinct dive types: State 1 captures the shortest(lasting less than 25 seconds), shallowest (less than 10 meters deep) and smoothest (lessthan 8 meters absolute vertical distance covered) dives with small variance. State 2 capturesmoderately long (10-60 seconds), moderately deep (5-25 meters) and moderately wiggly (5-30 meters) dives with moderate variance. State 3 captures the longest (40-180 seconds),deepest (10-80 meters) and wiggliest (10-80 meters) dives with high variance.In the next section, we discuss the ( K = 2) distinct dive-level switching patterns among12 roduction state 1production state 2production state 3 seconds d i v e du r a t i on ( den s i t y ) production state 1production state 2production state 3 meters m a x i m u m dep t h ( den s i t y ) production state 1production state 2production state 3Pr(X=0|S=1)=0.309Pr(X=0|S=2)=0.008Pr(X=0|S=3)=0.000 meters d i v e w i gg li ne ss ( den s i t y ) Figure 2: Fitted state-dependent distributions for the dive duration, the maximum depthand the dive wiggliness, the latter together with the estimated point mass on zero.the ( N = 3) states discussed here, as well as the crude-level process that selects which of thedive-level switching patterns is active in any given hour. . The t.p.m. of the crude-level Markov chain,which selects among the dive-level HMMs, and the t.p.m.s of those two dive-level HMMs,which describe the switching between different types of dives, were estimated as follows: • crude level: ˆ Γ ( I ) = .
789 0 . .
219 0 . • dive level: ˆ Γ = .
406 0 .
443 0 . .
240 0 .
600 0 . .
196 0 .
366 0 . and ˆ Γ = .
277 0 .
153 0 . .
124 0 .
248 0 . .
057 0 .
087 0 . The corresponding stationary distributions are (0 . , . . , . , . . , . , . Γ ( I ) ,there is fairly strong persistence in the crude-level states, indicating that the porpoise typ-ically remains in any given internal state for several hours before switching to the otherinternal state. This is also confirmed by Figure 3, which displays the first 25% of the de-coded observations. In particular, Figure 3 shows that there are bouts of several hours whereproduction states 1 and 2 are dominant (yet still interspersed with occasional dives generatedby production state 3), but also such where production state 3 is dominant. Bouts of theformer type are assigned to internal state 1, while the latter are assigned to internal state 2.This again highlights the need to apply hierarchically structured HMMs, here effectively asa means to account for temporal heterogeneity in the state-switching pattern exhibited bythe porpoise.At the dive level, when the first HMM is active, then in the long run about 28%, 51%and 22% of the observations are generated in state 1, 2 and 3, respectively, whereas when thesecond HMM is active, in the long run about 8%, 11% and 81% of the dives are generatedin the respective states. Furthermore, if the first HMM is active, then switching takes placeprimarily between states 1 and 2, and additionally from state 3 to state 2. If the secondHMM is active, then state 3 is dominant, with fairly strong persistence and lots of switchesfrom states 1 and 2 to state 3. . The second HMM is indicative of foraging behavior, particularlydue to the extensive wiggliness during dives, which often indicates prey-chasing. The inter-pretation of the first HMM, which involves a large proportion of relatively short, shallow andsmooth dives, could indicate a resting and/or a traveling behavior. Indeed, traveling fromone area to another while remaining close to the water surface is likely the most efficientstrategy. However, a more detailed interpretation of the first HMM would require inclusion14 d i v e du r a t i on ( s e c ond s ) m a x i m u m dep t h ( m e t e r s ) d i v e w i gg li ne ss ( m e t e r s )
1 2 1 250 500 750 1000 1250 1500 1750 dive index s t a t e ( c r ude l e v e l ) Figure 3: Exemplary sequence of the first 25% of the observations of the dive duration, themaximum depth and the dive wiggliness. Hourly segments are indicated by vertical greylines. The states were decoded using the Viterbi algorithm.of other variables such as the step length, which may prove useful to distinguish betweenresting and traveling. . We model the movements of 19 juvenile garter snakes (
Thamnophis elegans )in repeated trials that are a subset from a larger experiment quantifying behaviors in theoffspring of wild-caught females across experimental treatments, manuscript in preparation.Using EthoVision XT 8.5 (Noldus Information Technology, Wageningen, The Netherlands),15e extracted movement data for each of the snakes across three trials. In brief, snakes wereplaced in a novel test arena (circular enclosure with diameter of 24.5 cm) for 120 s. Trialswere videorecorded and divided into two segments, each lasting 50 s, to account for anypossible disturbances of snakes by observer movement at the beginning and end of eachtrial. This resulted in a total of six recorded segments for each snake. The individualsincluded here represent the control group, which was not exposed to any additional in thetest arena during the first two trials and was exposed to a novel object after the first minuteof the third trial (that is, between tracks 5 and 6). . The snakes displayed a variety of general move-ment strategies, from the extreme of remaining motionless to moving rapidly around thetest arena for the duration of the trial. We calculated the distance moved within 1/2 sand subsequently applied a square root transformation to deal with extreme values present.We assumed that each observed distance conditional on one of three production states wasgenerated by a state-dependent gamma density. Further, to investigate habituation andbehavioral plasticity over the course of the six time series per snake, we assumed that eachtime series was generated by one of three internal state-dependent HMMs. The completehierarchically structured HMM fitted to the observed distances was composed of three pro-duction states, kept the same across the internal states, and three internal states. In thismanner, we investigated whether there was persistence at the internal state level, i.e. if thegarter snakes tended to repeat the same general movement patterns across time series orswitch strategies. . The fitted state-dependent gamma distributionsfor the three production states, shown in Figure 4, correspond to three general types ofmovement strategies: motionless (or nearly so), slow exploratory, and rapid escape, which16he video recordings demonstrate. The estimated average distance traveled in productionstates 1–3 are: 0.0148, 0.459 and 1.891 cm per 1/2 s, respectively. The largest amountof variability in observed step lengths corresponds to production state 3, with a standarddeviation of 0.487 cm / . production state 1production state 2 production state 30246 0 1 2 3 distance den s i t y Figure 4: Fitted state-dependent distributions for distance traveled. . • crude level: ˆ Γ ( I ) = .
166 0 .
578 0 . .
680 0 .
226 0 . .
157 0 .
208 0 . ˆ δ ( I ) = . . . movement level: ˆ Γ = .
947 0 .
047 0 . .
018 0 .
919 0 . ∼ .
244 0 . ˆ δ = . . . ˆ Γ = .
806 0 .
144 0 . .
019 0 .
657 0 . ∼ .
185 0 . ˆ δ = . . . ˆ Γ = .
994 0 . ∼ .
003 0 . ∼ ∼ .
018 0 . ˆ δ = . . . The three estimated t.p.m.s at the movement level, corresponding to the three internalstates, can generally be interpreted as representing three different levels of behavioral flexi-bility. When the first internal state is active, characterized by ˆ Γ , individuals are showingmore persistence in the motionless and slow exploratory behavioral states overall, and arelikely to transition from the rapid escape state to the exploratory state. When the secondinternal state is active, ˆ Γ demonstrates that individuals are switching regularly betweenbehavioral states. When the third internal state is active, ˆ Γ demonstrates that individualsseldom transition between states. Thus, the three internal states reflect a continuum ofbehavioral flexibility within a short, but ecologically relevant time scale in the context of apotential predation event ( < Γ ( I ) ) indicates that individuals are readily switching amongthe three movement level HMMs across trials and in fact are more likely to switch between18 eadow672−5 Meadow673−3 Meadow674−2 Meadow676−2Meadow661−1 Meadow662−3 Meadow663−2 Meadow664−3 Meadow667−2Lakeshore698−8 Meadow26−3 Meadow6511−1 Meadow658−2 Meadow660−1Lakeshore6771−1 Lakeshore697−3 Lakeshore697−6 Lakeshore698−2 Lakeshore698−32 4 6 2 4 6 2 4 6 2 4 6 2 4 6123123123123 Time Series I n t e r na l S t a t e Figure 5: Internal state decodings by garter snake. Each garter snake is from one of twoecotypes: Lakeshore or Meadow.the movement HMMs representing the greatest behavioral flexibility ( ˆ Γ and ˆ Γ ). At thecrude time scale, we observe persistence in the the movement-level HMM describing snakesthat are behaviorally inflexible in their movements ( ˆ Γ ). Overall, these results indicate that,at the broader time scale, many individuals are readily altering their level of behavioralflexibility while some individuals remain persistent in their behaviors both within and acrosstrials (i.e., at both the movement and crude levels). . HMMs are not yet commonly applied to animal movement datafrom experimental designs, even though they typically produce multiple time series perindividual. Introducing multiple Markov chains in the HMM formulation lends itself tocharacterizing the consistency of individual behaviors and variation among individuals atdifferent time scales. We show that individuals employing multiple movement strategies in anarrow time frame are more likely to switch between strategies at a crude time scale, whileindividuals consistent in their behaviors at the movement time scale are also consistent at the19rude time scale. Furthermore, these patterns are independent of the behavioral strategiesexhibited: individuals may consistently remain in any one of the three behavioral states. HMMs have proven to be useful statistical tools for modeling animal movement data, pro-viding a framework to infer drivers of variation in movement patterns, and thus behavior.The basic HMM, however, has so far been used to infer aspects of animal behavior onlywhen a single data point can be thought to stem from one of N possible (production) states,which are loosely connected to behavioral modes that manifest themselves at the temporalresolution at which observations are made. Yet, thanks to advances in tag technology andbattery life, data can be collected at finer temporal resolutions and over longer periods oftime. Inferences at time scales cruder than those at which data are collected, and which cor-respond to larger-scale behavioral processes, are not yet answered via HMMs. We providea corresponding extension to incorporate multiple Markov chains in an HMM, allowing formulti-scale behavioral inferences. The extension is straightforward in the sense that likeli-hood inference via application of the forward algorithm is essentially analogous as in caseof basic HMMs. The hierarchically structured HMMs can also be used to avoid coarseningdata, such as acceleration data that can be collected many times per second (Leos-Barajas et al. , 2016). As this is, as of yet, an area of movement ecology that has received little atten-tion, our proposed framework is one of the first that models animal behavior simultaneouslyat multiple time scales.In this manuscript, we did not discuss how to implement model selection and modelchecking for hierarchically structured HMMs. In principle, since we are fitting the models20sing maximum likelihood, model selection could be conducted using standard informationcriteria. However, while conceptually this is completely straightforward, in practice thisprocedure is notoriously error-prone already for basic HMMs, due to the strong tendency ofinformation criteria to favor models with many more states than are biologically reasonable(Langrock et al. , 2015; Pohle et al. , 2017; Li and Bolker, 2017). Given the additional stateprocess, this issue will be exacerbated within hierarchically structured HMMs as presentedin this work, since the number of states both for the production process and for the internalstate process needs to be chosen. We cannot currently offer a satisfactory solution to thisproblem, except by saying that biological a priori expert knowledge ought to be taken intoaccount. For general advice regarding the issue of model selection in HMMs, see (Pohle et al. ,2017). For model checking, possible avenues are (i) simulation-based model assessment and(ii) analyses of pseudo-residuals. Regarding (i), the fundamental concept is the idea thatthe fitted model should generate data similar to the observed data in all important aspects.Quantification of aspects of the data patterns should reflect key behaviors believed to beimportant to the problem. Pseudo-residuals, as discussed for example in Patterson et al. (2009), Langrock et al. (2012) and in Zucchini et al. (2016), can be calculated also forhierarchically structured HMMs, most easily by conditioning on Viterbi-decoded internalstates, hence calculating the pseudo-residuals at the production level, given the (fixed) mostlikely internal state sequence. Both model selection and model checking needs to be exploredfurther before these models may become a tool that is routinely applied in the analysis ofanimal behavior data.Using ad hoc choices of the exact model formulations (yet such that are grounded inbiological theory), in Section 3 we demonstrated how the hierarchically structured HMMs,applied to movement data collected on harbor porpoises and garter snakes, respectively, pro-21ided new insights into the behavior of these species. However, a hierarchically structuredHMM not only allows for new inferences to be made from movement studies — it can alsobe applied to the study of behavior in general. Being able to characterize persistence ofmovement patterns at multiple time scales allows us to learn about personality, individualspecialization, and cognition, among other things. Several studies across a wide range of taxahave shown that individual animals behave differently from other individuals and that thesedifferences are maintained through time (R´eale et al. , 2007; Sih et al. , 2004; Dingemanse &R´eale, 2005; Biro & Stamps, 2008). These observations have given rise to the burgeoningfield of animal personality which explores the ecology and evolutionary significance of suchbehavioral differences among individuals. Such studies have included a variety of behavioralmeasures but have only recently incorporated models of movement as a behavioral trait (e.g.Schliehe-Dieks et al. , 2012; McKellar et al. , 2015; Spiegel et al. , 2017). Importantly, the an-imal personality framework has recently incorporated an understanding of how individualsdiffer in their behavioral plasticity (reviewed in Mathot & Dingemanse, 2015; Stamps, 2016),which requires more specific theoretical models as well as more sophisticated statistical ap-proaches (Dingemanse & Dochtermann, 2013; Kleun & Brommer, 2013; Japyass´u & Malange,2014). Thus, the field is attempting to address two fundamental questions: (1) how do be-haviors differ among individuals and (2) how do individual behaviors change over time orcontext? Addressing these questions therefore requires analysis at two levels: (1) to identifyand categorize behavioral states (production states) and (2) to identify patterns of changesin behavioral states (internal states). In the HMM framework, the internal states may reflectgeneral movement patterns associated with endogenous behavioral plasticity ( sensu Stamps,2016) or personality which allows for further examination of persistence or switching amongthem at the cruder time-scale. 22he addition of multiple Markov chains in the HMM framework to conduct multi-scalebehavioral inferences necessitates the selection of the temporal resolution at two time scales:the observation level and the level of the individual time series. The selected temporalresolution at the level of the internal states will need to be tied to the specific biologicalquestion of interest. There may be a natural manner in which the data are segmentedthat produces time series of unequal length. However, this need not be an issue as long aseach time series is reflective of some general behavioral process irrespective of the length ofthe time series. Formulating the hierarchically structured HMM, in terms of selecting thenumber of production states and internal states, will need to be done in a pragmatic fashionin order to balance model complexity with biological intuition. Due to the HMM’s inherentflexibility, the internal states may be formulated in a few manners, e.g. a single HMM (suchas has been described in Section 2), assuming a distribution of HMMs, or allowing for longerstate dwell times via the hidden semi-Markov model, in order to account for unexplainedvariability in the state processes. In particular, as the number of production states, N ,increases, so will the number of ways in which two HMM’s t.p.m.s Γ i and Γ j can differ. Toaccount for all of these possibilities may require a large number of internal states, if eachinternal state is assumed to only correspond to one t.p.m. for the HMM.Adding hierarchical structures to the HMM opens new possibilities for modeling multi-scale behaviors and provides an avenue to study animal personality and general behaviorfrom movement studies. In this manner, environmental covariates can also be included tounderstand their effects on state occupancy and dynamics of variation in behavioral modes atbroader time-scales than that at which the data are collected. Further, this framework maybe adapted for simultaneous modeling of multiple animal behavior data streams collected atdistinct temporal resolutions. The internal states can be adapted to generate a sequence of23ne-scale observations as well as one observation from a distinct data stream. REFERENCES
Biro, P.A. & Stamps, J.A. (2008) Are animal personality traits linked to life-history produc-tivity?
Trends in Ecology Evolution , , 361—368.DeRuiter, S.L., Langrock, R., Skirbutas, T., Goldbogen, J.A., Calambokidis, J., Friedlaen-der, A.S. & Southall, B.L. (in press) A multivariate mixed hidden Markov model for bluewhale behaviour and responses to sound exposure. Annals of Applied Statistics , —.Dingemanse, N.J. & R´eale, D. (2005) Natural selection and animal personality.
Behaviour , , 1159–1184.Dingemanse, N.J. & Dochtermann, N.A. (2013) Quantifying individual variation in be-haviour: mixed-effect modelling approaches. Journal of Animal Ecology , , 39–54.24ine, S., Singer, Y. & Tishby N. (1998) The hierarchical hidden Markov model: Analysisand applications. Machine Learning , , 41–62.Hart, T., Mann, R., Coulson, T., Pettorelli, N. & Trathan, P.N. (2010) Behavioural switchingin a central place forager: patterns of diving behaviour in the macaroni penguin ( Eudypteschrysolophus ). Marine Biology , , 1543–1553.Japyass´u, H.F. & Malange, J. (2014) Plasticity, stereotypy, intra-individual variability andpersonality: handle with care. Behavioural Processes , , 40–47.Kleun, E. & Brommer, J.E. (2013) Context-specific repeatability of personality traits in awild bird: A reaction-norm perspective. Behavioral Ecology , , 650–658.Langrock, R., King, R., Matthiopoulos, J., Thomas, L., Fortin, D. & Morales, J.M. (2012)Flexible and practical modeling of animal telemetry data: hidden Markov models andextensions. Ecology , , 2336–2342.Langrock, R. & Zucchini, W. (2012) Hidden Markov models with arbitrary state dwell-timedistributions. Computational Statistics and Data Analysis , , 715–724.Langrock, R., Marques, T.A., Baird, R.W. & Thomas, L. (2014) Modeling the diving behav-ior of whales: a latent-variable approach with feedback and semi-Markovian components. Journal of Agricultural, Biological and Environmental Statistics , , 82–100.Langrock, R., Kneib, T., Sohn, A. & DeRuiter, S.L. (2015) Nonparametric inference inhidden Markov models using P-splines. Biometrics , , 520–528.Leos-Barajas, V., Photopoulou, T., Langrock, R., Patterson, T.A., Murgatroyd, M., Watan-25be, Y.Y. & Papastamatiou, Y.P. (2016) Analysis of accelerometer data using hiddenMarkov models. Methods in Ecology and Evolution , doi:10.1111/2041-210X.12657.Li, M. & Bolker, B.M. (2017) Incorporating periodic variability in hidden Markov modelsfor animal movement
Movement Ecology , doi:10.1186/s40462-016-0093-6.Luque, S.P. (2007) Diving Behaviour Analysis in R. An Introduction to the diveMove Pack-age.
R News , , 8—14.Maruotii, A. & Ryden, T. (2009) A semiparametric approach to hidden Markov modelsunder longitudinal observations. Statistics and Computing , , 381–393.Mathot, K.J. & Dingemanse, N.J. (2015) Plasticity and Personality. In Integrative Organis-mal Biology , pp. 55-69, John Wiley & Sons, NJ, Hoboken.McKellar, A.E., Langrock, R., Walters, J.R. & Kesler, D.C. (2015) Using mixed hiddenMarkov models to examine behavioural states in a cooperatively breeding bird.
BehavioralEcology , , 148–157.Michelot, T., Langrock, R., Bestley, S., Jonsen, I.D., Photopoulou, T. & Patterson,T.A. (2016) Estimation and simulation of foraging trips in land-based marine predators. arXiv :1610.06953.Morales, J.M., Haydon, D.T., Frair, J., Holsinger, K.E. & Fryxell, J.M. (2004) Extractingmore out of relocation data: building movement models as mixtures of random walks. Ecology , , 2436–2445.Patterson, T.A., Basson, M., Bravington, M.V. & Gunn, J.S. (2009) Classifying movement26ehaviour in relation to environmental conditions using hidden Markov models. Journalof Animal Ecology , , 1113–1123.Pohle, J., Langrock, R., van Beest, F.M. & Schmidt, N.M. (2017) Selecting the number ofstates in hidden Markov models —– pitfalls, practical challenges and pragmatic solutions. arXiv :1701.08673.R Core Team (2016) R: A language and environment for statistical computing
Biological Reviews , , 291–318.Schliehe-Dieks, S., Kappeler, P.M. & Langrock, R. (2012) On the application of mixed hiddenMarkov models to multiple behavioural time series. Interface Focus , , 180–189.Sih, A., Bell, A. & Johnson, J.C. (2004) Behavioral syndromes: an ecological and evolution-ary overview. Trends in Ecology and Evolution , , 372–378.Spiegel, O., Leu, S.T., Bull, C.M. & Sih, A. (2017) What’s your move? Movement as alink between personality and spatial dynamics in animal populations. Ecology Letters , ,3–18. doi:10.1111/ele.12708Stamps, J.A. (2016) Individual differences in behavioural plasticities. Biological Reviews , ,534–567.Towner, A., Leos-Barajas, V., Langrock, R., Schick, R., Smale, M., Taschke, T., Jewell,O. & Papastamatiou, Y.P. (2016) Sex-specific and individual specialization for huntingstrategies in white sharks. Functional Ecology , In press, DOI: 10.1111/1365-2435.12613.27isniewska, D. M., M. Johnson, J. Teilmann, L. Rojano-Do˜nate, J. Shearer, S. Sveegaard, L.A. Miller, U. Siebert & P. T. Madsen (2016) Ultra-high foraging rates of harbor porpoisesmake them vulnerable to anthropogenic disturbance.