[PDF] Hierarchical animal movement models for population-level inference

Abstract

New methods for modeling animal movement based on telemetry data are developed regularly. With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated. Despite a need for population-level inference, animal movement models are still predominantly developed for individual-level inference. Most efforts to upscale the inference to the population-level are either post hoc or complicated enough that only the developer can implement the model. Hierarchical Bayesian models provide an ideal platform for the development of population-level animal movement models but can be challenging to fit due to computational limitations or extensive tuning required. We propose a two-stage procedure for fitting hierarchical animal movement models to telemetry data. The two-stage approach is statistically rigorous and allows one to fit individual-level movement models separately, then resample them using a secondary MCMC algorithm. The primary advantages of the two-stage approach are that the first stage is easily parallelizable and the second stage is completely unsupervised, allowing for a completely automated fitting procedure in many cases. We demonstrate the two-stage procedure with two applications of animal movement models. The first application involves a spatial point process approach to modeling telemetry data and the second involves a more complicated continuous-time discrete-space animal movement model. We fit these models to simulated data and real telemetry data arising from a population of monitored Canada lynx in Colorado, USA.

Full PDF

PPopulation-Level Inference for Animal Movement

Hierarchical Animal Movement Models forPopulation-Level Inference

Mevin B. Hooten a ∗ Frances E. Buderman b , Brian M. Brost b ,Ephraim M. Hanks c , and Jacob S. Ivan d Summary:

New methods for modeling animal movement based on telemetry data are developed regularly.With advances in telemetry capabilities, animal movement models are becoming increasingly sophisticated.Despite a need for population-level inference, animal movement models are still predominantly developed forindividual-level inference. Most eﬀorts to upscale the inference to the population-level are either post hoc orcomplicated enough that only the developer can implement the model. Hierarchical Bayesian models providean ideal platform for the development of population-level animal movement models but can be challengingto ﬁt due to computational limitations or extensive tuning required. We propose a two-stage procedurefor ﬁtting hierarchical animal movement models to telemetry data. The two-stage approach is statisticallyrigorous and allows one to ﬁt individual-level movement models separately, then resample them using asecondary MCMC algorithm. The primary advantages of the two-stage approach are that the ﬁrst stage iseasily parallelizable and the second stage is completely unsupervised, allowing for a completely automatedﬁtting procedure in many cases. We demonstrate the two-stage procedure with two applications of animalmovement models. The ﬁrst application involves a spatial point process approach to modeling telemetry dataand the second involves a more complicated continuous-time discrete-space animal movement model. We ﬁtthese models to simulated data and real telemetry data arising from a population of monitored Canada lynxin Colorado, USA.

Keywords:

Hierarchical model, resource selection model, spatial statistics, telemetry data, trajectories. a U.S. Geological Survey, Colorado Cooperative Fish and Wildlife Research Unit; Departments of Fish, Wildlife, & ConservationBiology and Statistics, Colorado State University, Fort Collins, CO 80523 b Department of Fish, Wildlife, and Conservation Biology, Colorado State University c Department of Statistics, Pennsylvania State University d Colorado Parks and Wildlife ∗ Correspondence to: M.B. Hooten, E-mail: [email protected] a r X i v : . [ s t a t . M E ] J un .B. Hooten et al.

1. INTRODUCTION

The ﬁeld of movement ecology is booming, in large part, because of the increased availabilityof telemetry data sources (Cagnacci et al. et al.

In Press). Forexample, most individual-based statistical models for telemetry data fall into one of threeclasses: point process models, discrete-time models, or continuous-time models, with eachbeing appropriate in certain settings (McClintock et al. et al. y j ∼ [ y j | β j , θ j ] , (1) β j ∼ [ β j | µ β , Σ β ] , (2) µ β ∼ [ µ β ] , (3) Σ − β ∼ [ Σ − β ] , (4) θ j ∼ [ θ j ] , (5)where y j are measurements associated with each individual j ( j = 1 , . . . , J ) and we use‘[ . . . ]’ to denote a probability distribution or mass/density function as necessary (Gelfand andSmith 1990). The priors in (3)–(5) are for the auxiliary data-level parameters θ j , population-level coeﬃcients µ β , and precision matrix Σ − β , forming the familiar three-level hierarchicalmodel (Berliner 1996). The hierarchical model in (1)–(5) provides a straightforward andintuitive means for obtaining inference for µ β , which is the ultimate goal of many animalmovement studies. Similar hierarchical models have become popular, and now standard, toolsfor obtaining upscaled inference in many other ﬁelds such as atmospheric science (Cressieand Wikle 2011), ecology (Hobbs and Hooten 2015), and sociology (Gelman and Hill 2006).The complexity of modern animal movement models makes implementation challenging.Furthermore, increases in the quantity of data resulting from newer telemetry devices hasoutpaced computational methods for ﬁtting animal movement models. Animal ecologistsmay wish to extend individual-level models to provide statistically rigorous population-level inference, but, in many cases, the algorithms required to ﬁt such models becomeprohibitively challenging to program or are too slow in settings with large data sets and/ormany individuals. For example, Hanks et al. (2011) performed a post hoc meta-analysis toobtain population-level inference for northern fur seals ( Callorhinus ursinus ) because theimplementation of a full hierarchical movement model was not computationally feasible.3.B. Hooten et al.Furthermore, in the Bayesian setting, Markov Chain Monte Carlo (MCMC) algorithms formost animal movement models require tuning from the user due to lack of conjugacy. In caseswhere data sets from tens or hundreds of individuals are available, it may not be feasible totune individual-level Metropolis-Hastings updates for all parameters.We present a statistically rigorous two-stage procedure for economizing hierarchical animalmovement models to provide exact population-level inference using a sequence of algorithmsthat are fast, stable, and require little or no tuning by the user. Our approach is simple.First, we ﬁt individual-level models (1) independently using a preferred stochastic samplingalgorithm. Independent model ﬁts in the ﬁrst stage allow for parallel processing, leading to animprovement in computational eﬃciency that scales with the number of processors. Second,we obtain exact population-level inference using a secondary MCMC algorithm that requiresno tuning. The secondary algorithm is based on a little-known technique for Bayesian meta-analysis proposed by Lunn et al. (2013). We found that our two-stage procedure providessubstantial computational improvements in both speed and ease of use in cases with largedata sets and/or complicated data models.In what follows, we present a general two-stage procedure for ﬁtting a broad class ofhierarchical animal movement models. We then demonstrate the approach for a basic pointprocess model for telemetry data (i.e., resource selection function model) and verify it usingsimulation. In our second application, we show how the approach can be applied to acontinuous-time discrete-space (CTDS) animal movement model using telemetry data withcomplicated error structure. We apply the CTDS model to satellite telemetry data from apopulation of Canada lynx (

Lynx canadensis ) in Colorado, USA. Finally, we close with asummary and discussion of the approach and future directions.4opulation-Level Inference for Animal Movement

2. TWO-STAGE PROCEDURE

Many animal movement models have been constructed solely for individual-level inference(e.g., Jonsen et al. et al. et al. et al. et al. et al. (2013), wepropose a simple two-stage procedure for obtaining population-level inference under thefull hierarchical model. The two-stage procedure only requires independent individual-levelmodel ﬁts and an unsupervised resampling algorithm to obtain population-level inferencewithout any user tuning.The ﬁrst stage in the procedure involves ﬁtting a data model like (1) independently foreach individual j ( j = 1 , . . . , J ). In addition to the prior for auxiliary data-level parameters θ j from (5), we also specify a prior for the individual-level parameters β j as β j ∼ [ β j ] (wherethe priors for θ j and β j can diﬀer by individual). The priors for β j are only used in theﬁrst stage of the two-stage procedure and do not aﬀect the ﬁnal inference. The posteriordistribution for individual j is[ θ j , β j | y j ] = [ y j | β j , θ j ][ β j ][ θ j ] (cid:82) (cid:82) [ y j | β j , θ j ][ β j ][ θ j ] d β j d θ j . (6)In principle, any stochastic sampling algorithm can be used to obtain samples from theposterior distribution in (6), but those relying on MCMC are most commonly applied in5.B. Hooten et al.the animal movement literature. However, because we treat the models in (6) for all J individuals independently in the ﬁrst stage, they can be ﬁt in parallel using readily availablesoftware (e.g., the ‘parallel’ R package; R Core Team 2016). Additionally, if we choose asampling algorithm for ﬁtting the models in (6) that is unsupervised (i.e., requiring nosupervised tuning), then the entire two-stage procedure can be automated. An unsupervisedﬁtting procedure will be used much more often by ecologists in situations where data existfor a large number of individuals. Thus, automatic MCMC algorithms like BUGS (Lunn et al. et al. β j ] canalso lead to fully automatic and parallelizable ﬁrst-stage algorithms. For example, if the datamodel (1) is Poisson (i.e., y j ∼ Pois(exp( X j β j )), where X j is a design matrix of covariatesfor the j th individual), then θ j is empty because the Poisson does not have a separatedispersion parameter. A multivariate log-gamma prior distribution (Crooks 2010; Bradley et al. β j facilitates the use of a Monte Carlo sampler to obtain posterior samplesfrom (6). For non-conjugate priors, adaptively tuned MCMC algorithms (e.g., Givens andHoeting 2012) are straightforward to implement and provide a way to obtain unsupervisedstage-one samples for β j .The second stage in the two-stage procedure involves an MCMC algorithm resemblingthat used to ﬁt the full hierarchical model, but with a critical simpliﬁcation. To ﬁt the fullhierarchical model in (1)–(5), we sequentially sample from the full-conditional distributions[ β j |· ] for j = 1 , . . . , J , [ µ β |· ], and [ Σ − β |· ], using an MCMC algorithm. In our second stagealgorithm, we use the MCMC algorithm for the full hierarchical model as a template, butmodify the updates for β j . Updates for the individual-level auxiliary parameters, θ j , areautomatically coupled with those from β j , but are only necessary if we desire inference for θ j . In fact, if θ j are considered nuisance parameters, it is not necessary to store samples for6opulation-Level Inference for Animal Movementthem in our two-stage procedure.The full-conditional distributions for population-level parameters µ β and Σ − β in the secondstage model remain the same as in the MCMC algorithm to ﬁt the full hierarchical modelin (1)–(5): [ µ β |· ] ∝ (cid:32) J (cid:89) j =1 [ β j | µ β , Σ β ] (cid:33) [ µ β ] , (7)[ Σ − β |· ] ∝ (cid:32) J (cid:89) j =1 [ β j | µ β , Σ β ] (cid:33) [ Σ − β ] . (8)If the model for β j and prior for µ β are multivariate Gaussian and the prior for Σ − β isWishart, then the full-conditional distributions in (7) and (8) are multivariate Gaussianand Wishart, respectively. These speciﬁc distributions are commonly used in many animalmovement models for population-level parameters and permit conjugate Gibbs updates inour second stage algorithm.The joint full-conditional distribution for the data-level auxiliary parameters, θ j , andindividual-level parameters, β j , is[ θ j , β j |· ] ∝ [ y j | β j , θ j ][ β j | µ β , Σ β ][ θ j ] , (9)which, depending on the form of data model [ y j | β j , θ j ], would normally require a Metropolis-Hastings update. In this case, the Metropolis-Hastings ratio for the joint update of θ j and β j is r j = [ y j | β ∗ j , θ ∗ j ][ β ∗ j | µ kβ , Σ kβ ][ θ ∗ j ][ θ k − j , β k − j | θ ∗ j , β ∗ j ][ y j | β k − j , θ k − j ][ β k − j | µ kβ , Σ kβ ][ θ k − j ][ θ ∗ j , β ∗ j | θ k − j , β k − j ] , (10)where, the ‘ ∗ ’ superscript represents the proposal for β j and the ‘ k ’ and ‘ k −

1’ superscriptscorrespond to the MCMC sample for the k or k − k = 2 , . . . , K ). Typically, the proposal distribution, [ θ ∗ j , β ∗ j | θ k − j , β k − j ], is chosen to be amultivariate Gaussian random walk such that ( θ ∗ j , β ∗ j ) (cid:48) ∼ N(( θ k − j , β k − j ) (cid:48) , ˜ Σ j ) which requires7.B. Hooten et al.tuning for each individual j by adjusting ˜ Σ j using trial and error or an adaptive MCMCapproach (e.g., Roberts and Rosenthal 2009).However, if we use the posterior samples for θ j and β j from the ﬁrst stage (6) as theproposal in the second stage update for β j , then the proposal distribution is[ θ ∗ j , β ∗ j | θ k − j , β k − j ] ≡ [ y j | β ∗ j , θ ∗ j ][ β ∗ ][ θ ∗ j ] (cid:82) (cid:82) [ y j | β j , θ j ][ β j ][ θ j ] d β j d θ j , (11)which does not depend on the previous θ k − j and β k − j . The Metropolis-Hastings ratio from(10) simpliﬁes to r j = [ β ∗ j | µ kβ , Σ kβ ][ β k − j ][ β k − j | µ kβ , Σ kβ ][ β ∗ j ] , (12)while the updates for µ β and Σ − β remain unchanged. Thus, we keep the samples for θ ∗ j and β ∗ j , from the ﬁrst stage, with probability min( r j , θ j ) in the ﬁrst or second stages if wedesire inference on them because r j , from (12), does not depend on θ j . Furthermore, Lunn et al. (2013) note that, when the stage one priors for β j are diﬀuse, the ratio simpliﬁes furtherto r j = [ β ∗ j | µ kβ , Σ kβ ] / [ β k − j | µ kβ , Σ kβ ], a mere quotient involving the individual-level processdistributions. However, we retain the form in (12) so that we can use prior informationwhen available. Because there is no Markov dependence in the proposal for β j , we select β ∗ j (and θ ∗ j , if desired) uniformly at random from the output resulting from the ﬁrst stagemodel ﬁts. More importantly, the Metropolis-Hastings ratios ( r j , for j = 1 , . . . , J ) in (12) donot contain a tuning parameter, resulting in unsupervised updates. Paired with the Gibbsupdates for µ β and Σ − β , the second stage algorithm is fully automatic, and samples fromthe full-conditional for β j can be obtained in parallel (within the broader second stageMCMC algorithm) creating the potential for additional computational eﬃciency. Critically,the Metropolis-Hastings ratio, r j in (12), is not a function of the data. Therefore, complicateddata models do not need to be reconsidered in the second stage algorithm. The utility of the8opulation-Level Inference for Animal Movementsimple two-stage procedure is that it is intuitive, facilitates parallelization, and can result inalgorithms that are fully automatic.In what follows, we provide two example applications where the two-stage procedure forobtaining population-level animal movement inference is valuable. The ﬁrst applicationinvolves a spatial point process modeling approach for telemetry data commonly referredto as “resource selection function” (RSF) analysis (e.g., Manly et al. et al. (2010a) and Hanks et al. (2015a).

3. APPLICATIONS3.1. Hierarchical Point Process Models

Perhaps the most common model ﬁt to temporally independent telemetry data is the RSFmodel. The RSF model is a heterogeneous point process model that conditions on the numberof telemetry observations. Assuming there is no measurement error associated with thetelemetry data s ij (typically a 2 × i = 1 , . . . , n j and individuals j = 1 , . . . , J , the data model takes the form of a weighted distribution (Patil and Rao 1977)such that s ij ∼ [ s ij | β j ] and [ s ij | β j ] ≡ g ( x ( s ij ) , β j ) f ( s ij ) (cid:82) g ( x ( s ) , β j ) f ( s ) d s , (13)where, g ( x ( s ) , β j ) is the “selection” function and f ( s ) is the “availability” function. Thus, theanimal movement interpretation of (13) is that inference for β j provides insight about howindividual j selects resources (i.e., covariates, x ) from those available to it. The selectionfunction is often chosen to be exponential (i.e., g ( x ( s ij ) , β j ) ≡ exp( x ( s ij ) (cid:48) β j )) and theavailability function is typically assumed to be uniform on the support of the point process(i.e., f ( s ij ) ≡ unif( S ) for s ij ∈ S ⊂ (cid:60) × (cid:60) ). 9.B. Hooten et al.Warton and Shepherd (2010) and Aarts et al. (2012) showed that the RSF model in (13)can be ﬁt using a variety of approaches, including a Poisson likelihood. The Poisson likelihoodcan be considered by ﬁrst preprocessing the data such that y j ≡ ( y ,j , . . . , y m,j ) (cid:48) representscounts of telemetry locations in grid cells corresponding to a discretization of the support S .As the grid cell size decreases with respect to the resolution of the covariates x , a Poissondata model coincides with the point process model. Thus, the corresponding hierarchicalmodel y j ∼ Pois(exp( X j β j )) , (14) β j ∼ N( µ β , Σ β ) , (15) µ β ∼ N( µ , Σ ) , (16) Σ − β ∼ Wish(( S ν ) − , ν ) , (17)assumes the same form as (1)–(5) and allows for population-level resource selection inferenceon µ β . To ﬁt the full hierarchical model directly using MCMC, we sample from the full-conditional distributions for β j , µ β , and Σ − β , sequentially. Standard Metropolis-Hastingsupdates for β j require tuning, but the model can be ﬁt using a single MCMC algorithm formoderately sized data sets. Alternatively, the weighted least squares proposal approach ofGamerman (1997) could be used to acquire samples for β j from the posterior distribution.However, to adequately approximate the point process model, the grid cells often need tobe quite small, resulting in a ﬁne-scale discretization of the support S and increasing thecomputational burden.The two-stage procedure we described in the previous Section can easily be employed to ﬁtthe hierarchical model in (14)–(17). For the ﬁrst stage, we can use an MCMC or HamiltonianMonte Carlo algorithm (via BUGS, JAGS, or STAN; Lunn et al. et al. y j ∼ Pois(exp( X j β j )) , (18) β j ∼ N( µ , Σ ) , (19)for j = 1 , . . . , J , independently. Note that the individual-level parameter model in (19) isan exchangeable prior for all j = 1 , . . . , J . Also, if the individual data sets y j and X j areso large that they are diﬃcult to store in memory simultaneously for all J individuals, theﬁrst stage model ﬁtting can be fully distributed among separate machines or performed insequence. This highlights another primary advantage of the two-stage procedure.The second stage algorithm for obtaining population-level inference is an MCMC algorithmwith Gibbs updates for µ β and Σ − β as described in the previous Section, and updates for β j using Metropolis-Hastings based on the acceptance ratio in (12), which becomes r j = N( β ∗ j | µ kβ , Σ kβ )N( β k − j | µ , Σ )N( β k − j | µ kβ , Σ kβ )N( β ∗ j | µ , Σ ) . (20)Within the second stage MCMC algorithm, the updates for β j can also be parallelizedbecause they are independent, although this model is simple enough that parallelization isnot necessary in the second stage algorithm. Thus, the data, y j for j = 1 , . . . , J , which couldinclude counts for 10s or 100s of thousands of grid cells and 100s of individuals, do notappear in the second stage algorithm. The absence of y j leads to a more computationallyeﬃcient second stage algorithm than the original algorithm to ﬁt the full hierarchical modeldirectly.We simulated point process data from 20 individuals (Figure 1), resulting in approximately30 simulated telemetry ﬁxes per individual, and ﬁt the hierarchical RSF model using: 1.) asingle MCMC algorithm, and 2.) our two-stage procedure. We compared the population-levelresults from the ﬁts resulting from each procedure.11.B. Hooten et al.[Figure 1 about here.]For the ﬁrst-stage algorithm in our two-stage procedure, we ﬁt the individual-level modelsindependently using an adaptive MCMC algorithm in parallel using R (R Core Team 2016)and assumed N( , · I ) priors for β j , a N( , · I ) prior for µ β , and a Wish((3 · I ) − , Σ − β . Our ﬁrst-stage algorithm uses a multivariate Gaussian proposal for β j andadapts the tuning using a single variance parameter, resulting in an unsupervised algorithmfor the individual-level model ﬁts. We could have also used BUGS or JAGS to ﬁt the ﬁrst-stage models, but our adaptive MCMC algorithm required less computing time.The single MCMC algorithm to ﬁt the full hierarchical model required 2.62 minutes toobtain 20,000 MCMC samples in R, whereas the ﬁrst-stage algorithm required 0.57 minutesto obtain the same number of samples using an adaptive MCMC algorithm in parallel for the20 individuals. The second-stage algorithm required only 1.49 minutes in R, which impliesthat the total compute time to ﬁt the model using the two-stage procedure was 2.06 minutes(0.56 minutes less than the single MCMC algorithm). Also, the two-stage procedure requiresno tuning and results in much larger eﬀective MCMC sample sizes for parameters. Theeﬀective MCMC sample sizes for µ β and β j were 8560 and 1398 (averaged across individuals)for the single MCMC algorithm, but were 17590 and 15184 for the two-stage algorithm (outof 20,000 total samples). Thus, to obtain the same eﬀective MCMC sample size using MCMCfor all parameters, we would need an order of magnitude more samples from the single MCMCalgorithm.Figure 2 illustrates the similarities in inference for the slope parameters µ β and β j for j = 1 , . . . ,

20 when ﬁtting the hierarchical RSF model using a single MCMC algorithm (black)versus the two-stage procedure (gray).[Figure 2 about here.]Notice that the single MCMC algorithm and the two-stage procedure provide very similarinference. In terms of inference, there exists some variability among individuals, but the12opulation-Level Inference for Animal Movementpopulation-level inference (Figure 2, top) suggests a consistent overall positive populationresponse to the covariate.

The previous application, involving spatial point process models, involves a commonlyused model speciﬁcation and desired type of inference in ecological research, but morecontemporary methods have been developed to explicitly model the dynamics of animalmovement based on temporally dependent telemetry data with observations close in time.Among these methods are discrete-time and continuous-time approaches to modeling theindividual animal trajectories (McClintock et al. et al. et al. (2008b) in the context of Gaussian processes with covariance structure induced by temporalbasis functions. Buderman et al. (2016) used a simpliﬁed basis function parameterization tomodel Canada lynx (

Lynx canadensis ) movement while accounting for measurement error inthe telemetry data. Buderman et al. (2016) refer to their model as a “functional movementmodel” and use it to provide inference for the true underlying continuous position process(i.e., µ ( t ), for time t ) of an individual.The approach developed by Buderman et al. (2016) assumes that the telemetry data s ij are observed with error. In fact, for the Canada lynx in our study, the bivariate measurement13.B. Hooten et al.error follows an unusual X-shaped pattern because the telemetry data are collected by ServiceArgos (Costa et al. et al. (2015)and Buderman et al. (2016) developed a measurement error model based on a mixturedistribution to account for the X-shaped Argos pattern (see Appendix A for details). Properlyaccounting for measurement error adds another level to the hierarchical model in (1)–(5) suchthat s ij ∼ [ s ij | µ j ( t i ) , φ j ] , (21) y j ∼ [ y j | β j , θ j ] , (22) β j ∼ [ β j | µ β , Σ β ] , (23) µ β ∼ [ µ β ] , (24) Σ − β ∼ [ Σ − β ] , (25) θ j ∼ [ θ j ] , (26) φ j ∼ [ φ j ] , (27)for j = 1 , . . . , J individuals, and where y j is an m j × { µ j ( t ) , ∀ t } by a deterministic functional h such that y j = h ( { µ j ( t ) , ∀ t } ), and φ j are measurement error covariance parameters.Hooten et al. (2010a) developed an individual-level animal movement model based on(21) and (22) where the latent variables y j represent a sequential multinomial processindicating transitions among grid cells on a discretization of the spatial support S . The latentprocess model in (22) relies on a continuous-time discrete-space (CTDS) representation of theposition process. However, because the functional h ( · ), that links the position process withthe data, is non-invertible in their model, Hooten et al. (2010a) proposed a Bayesian multipleimputation procedure to account for uncertainty in the true position process when makinginference on β j . The multiple imputation procedure used by Hooten et al. (2010a) diﬀers14opulation-Level Inference for Animal Movementfrom the two-stage procedure we described herein because it does not allow for feedback fromthe individual-level parameters β j to the position process { µ j ( t ) , ∀ t } or measurement errorparameters φ j . Hooten et al. (2010a) used an imputation model to interpolate the positionprocess and then integrated over the uncertainty in the position process while ﬁtting (22) toprovide posterior inference for the individual-level parameters β j .Hanks et al. (2015a) showed that the multinomial process of Hooten et al. (2010a) could bereparameterized such that [ y j | β j , θ j ] can be modeled using Poisson regression. Speciﬁcally,let τ cj represent the amount of time individual j remains in a grid cell for the c th “stay/move”pair associated with the discretization of the individual’s path through a landscape (for c = 1 , . . . , n j ). Then let y clj ∼ Pois( τ cj exp( x (cid:48) clj β j )) where the index l = 1 , , , l = 3 isnot necessary because corresponds to the middle cell which is captured by τ cj ) denotesmoves to neighboring grid cells in each cardinal direction (i.e., north, east, south, west).That is, if individual j moved north for “stay/move” pair c , then the data point y c j = 1and y c j = y c j = y c j = 0 (see Appendix B for details). The Poisson reparametrizationdramatically improves computational eﬃciency at the individual level because the totalnumber of observations used in the model (4 m j ) is a function of the grid cell size ratherthan the position process discretization as used in Hooten et al. (2010a). Thus, Hanks et al. (2015a) were able to ﬁt the CTDS model to large telemetry data sets in a fraction of the timerequired by the multinomial method developed by Hooten et al. (2010a). However, neitherHooten et al. (2010a) nor Hanks et al. (2015a) attempted to ﬁt a hierarchical model like thatin (21)—(27) to obtain population level inference for µ β .In our application involving population-level inference for Canada lynx, we use the modeldeveloped by Buderman et al. (2016) to obtain the imputation distribution for the trueindividual-level position process { ˜ µ j ( t ) , ∀ t } , and hence ˜ y clj for all c , l , and j , while accountingfor the complicated nature of Argos telemetry error (see Appendix A for details). In whatfollows, we combine all ˜ y clj into a single vector representing the latent process ˜ y j and use ˜ y j β j ∼ N( µ , Σ ). We use an adaptively-tuned MCMC algorithm to obtainsamples from the posterior distributions[ β j |{ s ij , ∀ i, j } ] = (cid:90) [ β j | ˜ y j ][˜ y j |{ s ij , ∀ i, j } ] d ˜ y j , (28)for j = 1 , . . . , J , and where, [˜ y j |{ s ij , ∀ i, j } ] represents the imputation distribution forthe latent Poisson process. To perform the integration in (28), we simply sample ˜ y kj ∼ [˜ y j |{ s ij , ∀ i, j } ] on the k th MCMC iteration and then let the Metropolis-Hastings update β kj depend on ˜ y kj as described in Hooten et al. (2010a) and Hanks et al. (2015a). As in theﬁrst application, we can ﬁt the J models for all individuals in parallel, dramatically reducingthe required computational time.For the second stage of the two-stage procedure, we use the posterior samples for { β j , ∀ j } ,from the ﬁrst stage, as proposals in the MCMC algorithm to ﬁt the hierarchical model in (22)–(25). In doing so, we update { β j , ∀ j } , µ β , and Σ − β sequentially in a completely unsupervisedsecond-stage MCMC algorithm. Recall that the Metropolis-Hastings acceptance ratio for β j is identical to that used in the previous application (20). As a result of the two-stage implementation and the adaptive tuning in the ﬁrst-stage algorithm, the procedure iscompletely automatic after the data are preprocessed to obtain the imputation distribution,and population-level inference for µ β can easily be obtained.Using telemetry data from J = 18 individual Canada lynx in Colorado, USA (Figure 3a),we applied the two-stage procedure to ﬁt the hierarchical model in (22)–(25).[Figure 3 about here.]16opulation-Level Inference for Animal MovementWe used the functional movement model of Buderman et al. (2016) to obtain the imputedpath distribution (Figure 3b,c) for each individual and used nearly continuous imputed pathrealizations to create the latent Poisson data realizations ˜ y kj (resulting in approximately 450discrete-space transitions per individual, n j ≈ et al. et al. β j ∼ N( , I ) forall j = 1 , . . . ,

18. We used µ β ∼ N( , I ) and Σ − β ∼ Wish((3 · I ) − ,

3) as priors for thepopulation-level parameters and precision matrix. See Appendix B for additional details onthe CTDS animal movement model.We ﬁt the overall hierarchical model using the two-stage procedure and the resultingalgorithms required 0.86 minutes for the ﬁrst stage (using an adaptive MCMC algorithmin parallel) and 1.62 minutes for the second stage. Figure 5 shows the results of the modelﬁt in terms of posterior means and 50% and 95% credible intervals for the population-levelparameters µ β and individual-level parameters β j .[Figure 5 about here.]While there exists substantial variability among individual Canada lynx, with someindividuals exhibiting clear relationships with the covariates (e.g., individuals 2, 4, and 5), theposterior distributions for µ did not indicate a population-level eﬀect for either covariate atthe 95% level (but both did at the 50% level). For the individuals that did show evidence of aneﬀect (i.e., 95% credible intervals not overlapping zero), the negative response to elevationindicates that overall motility decreases at higher elevations, leading to greater residencetimes in those regions, as opposed to lower elevations (Figure 5a). Similarly, for individuals17.B. Hooten et al.with signiﬁcant eﬀects related to distance from forest we see positive inﬂuence on motilityimplying that those Canada lynx have higher motility (and hence lower residence time)in regions farther from forest (Figure 5b). Thus, the inference in our application involvingCanada lynx agrees with that obtained in other studies (e.g., McKelvey et al.

4. CONCLUSION

Our ﬁndings indicate that the two-stage procedure we described herein holds tremendousvalue for ﬁtting hierarchical animal movement models to telemetry data for population-level inference. We applied the two-stage procedure to two types of commonly used animalmovement models of varying complexity and found that it worked well in both cases.The spatial point process modeling approach we described in the ﬁrst application is acommonly used model, but still fairly simple. Much more complicated spatio-temporal pointprocess models have been used to model temporally correlated telemetry data (e.g., Johnson et al. et al. et al. et al. (2015) developed amodel with a time-varying dynamic availability component that depended on an additionalsmoothness parameter. Thus, the data model developed by Brost et al. (2015) requiredsubstantially more computation time than the simulated example we presented in Section3.1 and would beneﬁt from a two-stage implementation where individual-level models couldbe ﬁt independently on separate processors and then recombined using the second stageMCMC algorithm to yield population-level inference for µ β .In our example involving Canada lynx, the continuous-time discrete-space reparameter-ization developed by Hanks et al. (2015a) already provides signiﬁcant improvements incomputational eﬃciency over the motivating model developed by Hooten et al. (2010a).However, additional computational gains can be achieved using the two-stage ﬁtting18opulation-Level Inference for Animal Movementprocedure to provide population-level inference.Despite the wide range of potential applications to many types of hierarchical models,we found it surprising that the two-stage ﬁtting procedure of Lunn et al. (2013) is notmore well known. For our situations with large amounts of telemetry data and potentiallycomplicated data models, we found the two-stage procedure works very well and is trivial toimplement. We also found it very helpful to be able to use diﬀerent data models, ﬁrst-stageﬁtting algorithms, and easy parallelization. As a potential caveat, the two-stage proceduredescribed by Lunn et al. (2013) may not be very eﬃcient when the population inducesextreme amounts of shrinkage in the individual-level parameters. Thus, in these cases, moresamples would be needed in the ﬁrst stage algorithm. However, in a preliminary simulationstudy, we found that the two-stage procedure performs poorly only for data sets with verysmall amounts of data (i.e., <

20 observations for a subset of individuals).Animal movement models have also been developed to account for more mechanisticinteractions among individuals (e.g., Russell et al. et al. et al. et al. et al. et al.

ACKNOWLEDGEMENTS

Support for this research was provided by NSF 1614392, NSF EEID 1414296, CPW T01304,and NOAA AKC188000. The authors thank The International Environmetrics Society andthe editors of Environmetrics for their support and assistance with this work. The authorsalso thank Walt Piegorsch, Mindy Rice, Devin Johnson, Peter Craigmile, Erin Peterson, RonSmith, and the other organizers of the TIES 2016 annual meeting. Any use of trade, ﬁrm,or product names is for descriptive purposes only and does not imply endorsement by theU.S. Government.

REFERENCES

Aarts G, Fieberg J, Matthiopoulos J, 2012. Comparative interpretation of count, presence-absence, and pointmethods for species distribution models.

Methods in Ecology and Evolution : 177–187.Berliner L, 1996. Hierarchical Bayesian time series models. In Hanson K, Silver R (eds.), Maximum Entropyand Bayesian Methods , Kluwer Academic Publishers, 15–22.Blackwell P, 1997. Random diﬀusion models for animal movement.

Ecological Modelling : 87–102.Bradley J, Holan S, Wikle C, 2015. Computationally eﬃcient distribution theory for Bayesian inference ofhigh-dimensional dependent count-valued data. arXiv preprint arXiv:1512.07273 .Brost B, Hooten M, Hanks E, Small R, 2015. Animal movement constraints improve resource selectioninference in the presence of telemetry error.

Ecology : 2590–2597.Buderman F, Hooten M, Ivan J, Shenk T, 2016. A functional model for characterizing long distance movementbehavior. Methods in Ecology and Evolution : 264–273.Cagnacci F, Boitani L, Powell RA, Boyce MS, 2010. Animal ecology meets gps-based radiotelemetry: aperfect storm of opportunities and challenges. Philosophical Transactions of the Royal Society of LondonB: Biological Sciences : 2157–2162.Carpenter B, Gelman A, Hoﬀman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, RiddellA, 2016. Stan: a probabilistic programming language.

Journal of Statistical Software .Costa D, Robinson P, Arnould J, Harrison AL, Simmons SE, Hassrick JL, Hoskins AJ, Kirkman SP,Oosthuizen H, Villegas-Amtmann S, Crocker DE, 2010. Accuracy of Argos locations of pinnipeds at-sea estimated using Fastloc GPS.

PLoS One : e8677.Cressie N, Wikle C, 2011. Statistics for Spatio-Temporal Data . John Wiley and Sons, New York, New York,USA.Crooks G, 2010. The amoroso distribution. arXiv preprint arXiv:1005.3274 .Dunn J, Gipson P, 1977. Analysis of radio-telemetry data in studies of home range.

Biometrics : 85–101.Gamerman D, 1997. Sampling from the posterior distribution in generalized linear mixed models. Statisticsand Computing : 57–68.Gelfand A, Smith A, 1990. Sampling-based approaches to calculating marginal densities. Journal of theAmerican Statistical Association : 398–409.Gelman A, Hill J, 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models . CambridgeUniversity Press, Cambridge, United Kingdom.Givens G, Hoeting J, 2012.

Computational Statistics , volume 710. John Wiley & Sons.Hanks E, Hooten M, Alldredge M, 2015a. Continuous-time discrete-space models for animal movement.

Annals of Applied Statistics : 145–165.Hanks E, Hooten M, Johnson D, Sterling J, 2011. Velocity-based movement modeling for individual andpopulation level inference. PLoS One : e22795.Hobbs N, Hooten M, 2015. Bayesian Models: A Statistical Primer for Ecologists . Princeton University Press,Princeton, New Jersey, USA.Hooten M, Johnson D, 2016a. Basis function models for nonstationary continuous-time trajectories.

Journalof the American Statistical Association : In Revision.Hooten M, Johnson D, Hanks E, Lowry J, 2010a. Agent-based inference for animal movement and selection.

Journal of Agricultural, Biological and Environmental Statistics : 523–538.Hooten M, Johnson D, McClintock B, Morales J, In Press. Animal Movement: Statistical Models forTelemetry Data . Chapman & Hall/CRC.Illian J, Martino S, Sørbye S, Gallego-Fern´andez J, Zunzunegui M, Esquivias M, Travis J, 2013. Fittingcomplex ecological point process models with integrated nested Laplace approximation.

Methods in Ecologyand Evolution : 305–315.Illian J, Sorbye S, Rue H, Hendrichsen D, 2012. Using INLA to ﬁt a complex point process model withtemporally varying eﬀects - a case study. Journal of Environmental Statistics : 1–25. Johnson D, Hooten M, Kuhn C, 2013. Estimating animal resource selection from telemetry data using pointprocess models.

Journal of Animal Ecology : 1155–1164.Johnson D, London J, Lea M, Durban J, 2008b. Continuous-time correlated random walk model for animaltelemetry data. Ecology : 1208–1215.Johnson D, Thomas D, Ver Hoef J, Christ A, 2008a. A general framework for the analysis of animal resourceselection from telemetry data. Biometrics : 968–976.Jonsen I, 2016. Joint estimation over multiple individuals improves behavioural state inference from animalmovement data. Scientiﬁc Reports : 20625.Jonsen I, Flemming J, Myers R, 2005. Robust state-space modeling of animal movement data. Ecology :589–598.Lunn D, Barrett J, Sweeting M, Thompson S, 2013. Fully Bayesian hierarchical modelling in two stages,with application to meta-analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) : 551–572.Lunn D, Spiegelhalter D, Thomas A, Best N, 2009. The BUGS project: Evolution, critique and futuredirections. Statistics in Medicine : 3049–3067.Manly B, McDonald L, Thomas D, McDonald T, Erickson W, 2007. Resource Selection by Animals: StatisticalDesign and Analysis for Field Studies . Springer Science & Business Media.McClintock B, Johnson D, Hooten M, Ver Hoef J, Morales J, 2014. When to be discrete: the importance oftime formulation in understanding animal movement.

Movement Ecology : 21.McKelvey K, Aubry K, Ortega Y, 2000. History and distribution of lynx in the contiguous United States.In Ruggiero L, Squires J, Buskirk S, Aubry K, McKelvey K, Koehler G, Krebs C (eds.), Ecology andConservation of Lynx in the United States , University Press of Colorado, Boulder, Colorado, USA, 207–264.Murray L, 2013. Bayesian state-space modelling on high-performance hardware using libBi. arXiv preprintarXiv:1306.3277 .Patil G, Rao C, 1977. The weighted distributions: a survey of their applications. In Krishnaiah P (ed.),

Applications of Statistics , North Holland Publishing Company.Plummer M, 2003. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In

Proceedings of the 3rd international workshop on distributed statistical computing , volume 124, TechnischeUniversit at Wien Wien, Austria, 125.

R Core Team, 2016.

R: A Language and Environment for Statistical Computing . R Foundation for StatisticalComputing, Vienna, Austria.Roberts G, Rosenthal J, 2009. Examples of adaptive MCMC.

Journal of Computational and GraphicalStatistics : 349–367.Rue H, Martino S, Chopin N, 2009. Approximate Bayesian inference for latent Gaussian models by usingintegrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (StatisticalMethodology) (2): 319–392.Ruiz-C´ardenas R, Krainski E, Rue H, 2012. Direct ﬁtting of dynamic models using integrated nested Laplaceapproximations INLA. Computational Statistics & Data Analysis : 1808–1828.Russell JC, Hanks EM, Haran M, Hughes DP, 2016. A spatially-varying stochastic diﬀerential equationmodel for animal movement. arXiv preprint arXiv:1603.07630 .Scharf H, Hooten M, Fosdick B, Johnson D, London J, Durban J, 2015. Dynamic social networks based onmovement. arXiv : 1512.07607.Warton D, Shepherd L, 2010. Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. Annals of Applied Statistics : 1383–1402. APPENDIXAPPENDIX A: THE IMPUTATION DISTRIBUTION

Buderman et al. (2016) developed a phenomenological statistical model for estimatingan individual’s underlying continuous-time path based on Argos telemetry data and asemiparametric regression using temporal basis functions. We used this model to precalculatean imputation distribution for the true path. For the j th individual, the FMM developed by23.B. Hooten et al.Buderman et al. (2016) is s ij ∼  N( µ j ( t i ) , Σ i ) with prob. p N( µ j ( t i ) , HΣ i H (cid:48) ) with prob. 1 − p , (A.1) µ j ( t i ) = W j ( t i ) α , (A.2) α ∼ N( , Σ α ) , (A.3)where, s ij represent the i th telemetry observation, µ j ( t i ) is the true individual position attime t i , Σ i is an error covariance matrix on the ﬁrst axis, and HΣ i H (cid:48) is the error covariancematrix on a rotated axis ( H is a rotation matrix). The probability p allows the telemetry datato arise from a bivariate Gaussian mixture that captures the X-shaped error pattern inherentto Argos data. The matrix W j ( t i ) contains basis vectors (i.e., b-spline basis vectors) at time t i for individual j , and α is a set of regression coeﬃcients corresponding to the temporal basisfunctions. Buderman et al. (2016) set Σ α ≡ Diag( σ α ) and tuned σ α to induce regularizationin the model and improve predictive ability (i.e., ridge regression).The imputed path distribution is obtained by sampling from the posterior predictivedistribution of [ µ j ( t ) |{ s ij , ∀ i, j } ] for a large, but ﬁnite, set of times t ∈ T to obtain posteriorrealizations µ kj ( t ) for k = 1 , . . . , K MCMC iterations. Figure A.1a shows an example set ofpath realizations (lines) that could result from ﬁtting the FMM from Buderman et al. (2016)to telemetry data (points). [Figure A.1 about here.]Figure A.1b shows a zoomed in section of the path realizations that highlight the temporaldiscretization. At a ﬁner spatial resolution, we can see that the path realizations cross throughan example grid cell and its associated neighborhood (Figure A.1c). This idea is critical forprocessing the path realizations for use with the CTDS model.24opulation-Level Inference for Animal Movement

APPENDIX B: CTDS MODEL

For each individual j in the original CTDS model, each segment (between points) inFigure A.1c served as a multinomial data vector y ij ≡ ( y i , y i , y i , y i , y i ) (cid:48) j where y ij ∼ MN(1 , p ij ) (Hooten et al. y ij = h ( { µ j ( t ) , ∀ t } ) based on the imputed path realizations by coding a transitionas either a stay or a move in a certain direction according to the schematic in Figure A.2.[Figure A.2 about here.]Hanks et al. (2015a) reparameterized the multinomial imputation data using suﬃcientstatistics. They denoted residence time as τ lj (approximated by ∆ t times the numberof consecutive stays in the current grid cell, Figure A.1c) for l = 1 , . . . , L “stay–move”pairs and then deﬁned the probability of staying in the current grid cell for time τ lj as p τ lj / ∆ t ij = (1 − p lj, move ) τ lj / ∆ t , where p lj, move is the probability of moving. Hanks et al. (2015a)let p lj, move = ∆ t · λ lj, move and ∆ t → ∆ t → (1 − p j, move ) τ lj / ∆ t = e − τ lj λ lj, move , (A.4)which, implies that τ lj ∼ Exp( λ lj, move ).Similarly, Hanks et al. (2015a) showed that the movement probability to neighboring gridcell c is p clj /p lj, move = λ clj /λ lj, move . Thus, combing the residence probability model with themovement probability yields a likelihood for the suﬃcient statistic ( τ lj , y lj , y lj , y lj , y lj ) (cid:48) equal to (cid:81) Ll =1 (cid:81) c (cid:54) =3 λ clj exp( − τ lj λ clj ). The likelihood for this reparameterized CTDS modelcoincides with a Poisson where λ clj is the movement rate to neighboring cell c and τ lj is anoﬀset. Thus, any software capable of ﬁtting a Poisson generalized linear model with an oﬀsetcan ﬁt the CTDS model if the true path is observed at a ﬁne enough temporal resolution.Hanks et al. (2015a) used a multiple imputation approach to account for the uncertaintyin the path distribution based on (28). The movement rates can then be linked to the25.B. Hooten et al.environmental covariates by a log-linear link λ clj = x (cid:48) clj β j , where the covariates x (cid:48) clj canbe speciﬁed in several meaningful ways to capture either diﬀerential movement rates (i.e.,motility) or gradient-based directional bias in movement relative to environmental covariates(see Hanks et al. et al. (2015a) is much more computationally eﬃcient than that of Hooten et al. (2010a) becausethe dimensionality of the data 4 L depends on the grid cell size instead of the temporaldiscretization of the path. 26IGURES a.) y b.) Figure 1. a.) Simulated animal positions (points) based on a spatial point process (13) with one simulated covariate (backgroundimage, dark shading represents larger values). b.) A zoomed in spatial map (from inset white box in panel a) showing positions from 5individual animals as diﬀerent point types (i.e., (cid:5) , (cid:52) , (cid:53) , +, × ). llll llll llll ll ll llll llll llll llllll llll llll b b b b b b b b b b b b b b b b b b b b m b Figure 2.

Posterior means (points) and 50% and 95% credible intervals for µ β and β j for j = 1 , . . . ,

20. Single MCMC algorithmresults are shown in black and two-stage procedure results are shown in gray.

Denver N o r t h i ng ( k m ) Easting (km) a.) b.) b.) c.) c.) b.) b.) c.) c.)

Figure 3. a.) Colorado, USA, with major highways and the city of Denver shown. The telemetry data spanning a year of time for 18individual Canada lynx are shown as points. A shaded relief map is shown as the background image to illustrate the topography of thearea. b.) and c.) Close up views of the predicted paths for two individual Canada lynx. For clarity, only the posterior mean path isshown.

200 250 300 350 40041504200425043004350 N o r t h i ng ( k m ) Easting (km)a.) 200 250 300 350 40041504200425043004350 N o r t h i ng ( k m ) Easting (km)b.)

Figure 4.

Images of covariates with telemetry observations overlaid as black points: a.) Elevation and b.) distance to forest. Lightshading corresponds to larger values. a.) Elevation l l lll ll lll l ll ll ll l l −0.5 0.0 0.5 b b b b b b b b b b b b b b b b b b m b b.) Distance to forest llll lll l l ll ll lll l ll b b b b b b b b b b b b b b b b b b m b Figure 5.

Posterior estimates for the population-level parameters µ β and individual-level parameters β j . The posterior mean is shownas a central point and the 50% and 95% credible intervals are shown as the thick and thin black lines. Panel a shows the results for theelevation covariate and panel b shows the results for the distance from forest covariate. a b c Figure A.1. a.) The telemetry data (large points) and imputation distribution (lines) for the individual’s path { µ kj ( t ) , ∀ k, j, t } usingthe posterior predictive distribution of the FMM (Buderman et al. a b c d e Figure A.2.

Discrete set of possible transitions at any time t , used to create the multinomial vector y ( t ), based on the function h ( { µ j ( t ) , ∀ t } ). a.) move up: y ( t ) = (1 , , , , (cid:48) , b.) move right: y ( t ) = (0 , , , , (cid:48) , c.) stay: y ( t ) = (0 , , , , (cid:48) , d.) move down: y ( t ) = (0 , , , , (cid:48) , e.) move left: y ( t ) = (0 , , , , (cid:48) ..