The Importance of Prior Choice in Model Selection: a Density Dependence Example

Abstract

We perform a Bayesian analysis on abundance data for ten species of North American duck, using the results to investigate the evidence in favour of biologically motivated hypotheses about the causes and mechanisms of density dependence in these species. We explore the capabilities of our methods to detect density dependent effects, both by simulation and through analyses of real data. The effect of the prior choice on predictive accuracy is also examined. We conclude that our priors, which are motivated by considering the dynamics of the system of interest, offer clear advances over the priors used by previous authors for the duck data sets. We use this analysis as a motivating example to demonstrate the importance of careful parameter prior selection if we are to perform a balanced model selection procedure. We also present some simple guidelines that can be followed in a wide variety of modelling frameworks where vague parameter prior choice is not a viable option. These will produce parameter priors that not only greatly reduce bias in selecting certain models, but improve the predictive ability of the resulting model-averaged predictor.


James D. Lawrence, Robert B. Gramacy, Len Thomas and Stephen T. Buckland

November 7, 2018

arXiv [stat.ME], Aug

Prior Choice for Model Selection

Density dependence within a species is usually the primary means of numerical self-regulation, the mechanism by which a species can maintain a steady population trajectory in an environment that produces unexpected events of both beneficial and harmful natures. Turchin (1995), in a synthesis of several other sources, states that density dependence is necessary for a regulated population. That is, a population without it is almost certain to be numerically unstable, with an undefined carrying capacity.
It is important to discern the magnitude of density dependence a species exhibits, as well as the time lag over which it operates. Knowledge of a species’ likely response to natural as well as synthetic shocks will assist in effective species management. Statistically this is a challenging problem which does not usually admit closed-form mathematical analysis.

The debate over the relevance of density dependence has been at times acrimonious, as summarised in Turchin (1995). The quote from that paper which we take as our starting point on this issue is that available evidence “is entirely consistent with the universal applicability of the density dependence model” (Turchin, 1995, p. 31). As such, we seek to make what statistical inferences we can about the magnitude and time period of such effects.

There are several biological hypotheses as to the causes of density dependence, both in general and in the specific case of North American ducks, our motivating example. These have differing implications for the likely degree of density dependence to be expected in such species. We analyze ten species of duck, including both diving and dabbling ducks, between which there is reason to expect a distinction in density dependence profile. The hypothesis tested (and to an extent borne out) by Jamieson and Brooks (2004) was that diving ducks might, in response to a poor year (low habitat and/or food availability), delay breeding for a year. This would imply a delayed density dependence in diving ducks that would not be present in dabbling ducks.

In contrast, Sargeant et al. (1984) looked at red fox (Vulpes vulpes) predation on both diving and dabbling ducks, and concluded that dabbling ducks are significantly more vulnerable to predation of this kind. The red fox is only one predator of ducks in North America, but it is one of the primary predators and, in common with many other duck predators, it is a generalist. A hypothesis of Bjørnstad et al. (1995), tested in Viljugrein et al. (2005), suggests that this would induce more immediate density dependence in the affected species, since both ducks and eggs are potential predatory targets. This would imply both first and second order density dependence in dabbling ducks; less so in diving ducks.

It is apparent that there are hypotheses that produce differing predictions as to the nature of density dependence in these species. We aim to provide a thorough statistical analysis using historical count data provided by US Fish and Wildlife Service (2010). We will take a Bayesian standpoint when analyzing these data. This is not because a classical analysis is impossible, but rather because we believe that common sense can be translated into a meaningful, informative parameter prior. Inference about the degree of density dependence under this framework is a Bayesian model selection problem. Link and Barker (2006) illustrate the principles and some of the issues inherent to this class of problem. We demonstrate that choosing an informative prior (using simple rules which we will describe) is both necessary for a balanced model selection procedure, and improves the accuracy with which we can predict future population levels.

The outline for the paper is as follows. First we summarise a widely used model for density dependence in the following subsection. Then in section 2 we consider the problem of choosing a Bayesian prior to use in our analysis. Section 3 is a simulation study to exhibit the improvements we offer over previous approaches, before we analyze real data in section 4. We finish with a discussion of our results and lessons learned that can usefully be applied to a wider class of problems than the specific case of density dependence in North American ducks.

We consider the density dependence model of Dennis and Taper (1994). Let $x_t$ be the log-population size in year $t$.
The evolution of $x_t$ over time is governed by the stochastic update

$$x_t = x_{t-1} + b_0 + \sum_{i=1}^{k} b_i e^{x_{t-i}} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2). \tag{1}$$

The parameters are interpreted as

$k$: degree (maximum time lag) of density dependence, in years.
$b_0$: uninhibited exponential growth rate.
$b_{1:k}$: density dependence effects at different time lags.
$\sigma^2$: species (and unmodelled covariate) volatility.

The number of $b$ parameters is $k + 1$, so $k$ is a model order parameter. If $k = 0$, then this process simplifies to a random walk with drift. Also, if several different mechanisms induce a density dependence effect at the same time lag, then the appropriate component of $b$ will in effect be a summary statistic measuring the sum of all effects at that time lag.

We do not in general observe a true and accurate count of the species abundance. We observe data $y_t$ which will include noise which may vary in intensity from year to year. We assume that this observation process is Gaussian, i.e.

$$y_t \sim N(x_t, S_t) \tag{2}$$

and we assume that $S_t$ is known for each year $t = 1, \ldots, T$. The full model as specified by equations 1 and 2 is thus a state space formulation.

To perform a full Bayesian analysis and fit of this model, we need to specify a prior for each parameter that is not directly specified by the model itself. We give a uniform prior to $k$, over $\{0, \ldots, 5\}$. We believe it is implausible that density dependent effects could operate on a longer timescale than this. In particular, the hypotheses that we wish to assess are only concerned with density dependence up to second order. Our prior gives no preference to one time lag over another in this range, so that we can assess the evidence provided by the data in favour of each model. This is similar to a Bayes Factor, which can be used to compare the fit of different models (Kass and Raftery, 1995).

An improper inverse gamma (0, 0) prior is assigned to $\sigma^2$. This is mostly for reasons of Bayesian conjugacy: the rate of learning is high for this parameter and the prior shape makes little difference.

The distribution of the first few $x_t$ might not be specified by the model (depending on $k$: if $k = 2$, for example, then we need to specify the distribution of $x_1$ and $x_2$, and the model gives us the distribution of $x_3$). In order to have a consistent likelihood across all models, we consider the observed likelihood function $p(y_{1:5} \mid x_{1:5})$ as a (density) function of $x_{1:5}$ and treat it as our prior. Naturally we do not count it twice, so it is removed from the likelihood, as well as those systemic terms relating to the evolution of $x_{1:5}$. Thus, for all models, the first model-driven term in the likelihood is $p(x_6 \mid x_{1:5}, \sigma^2, b)$.

The final parameter that requires a prior is $b$, but there is a pitfall to be aware of before we make our choice.

Lindley’s Paradox

It has been known for some time (Lindley, 1957) that choosing a vague (high-variance) prior for within-model parameters (except for parameters common to all of them, such as $\sigma^2$) will bias the model selection routine in favour of simple models. This is discussed in depth in Link and Barker (2006). In the limiting case where an improper flat prior is used, the posterior model probabilities will always be degenerate in favour of the model with the fewest parameters. Lindley’s paradox therefore implies that we cannot take a diffuse Normal prior for $b$, since this would lead to selecting $k = 0$, even if the data produced a likelihood that was higher for other models (hence the paradox). In light of this it is clear we must choose an informative prior, but the question arises as to how to choose an informative prior when one has, apparently, no information. We now show that an informative choice can be reached just by excluding certain pathological cases that we would not expect to arise in the biological systems in question.
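Lindley's paradox can be seen in closed form in a much simpler setting than the density dependence model. The sketch below is an illustration only, under assumptions that are not part of the paper: a single Normal mean with known unit variance, a point null $\theta = 0$, and an alternative $\theta \sim N(0, \tau^2)$. The function names are hypothetical. Both marginal likelihoods are Gaussian in the sample mean, so the Bayes factor in favour of the null is $\sqrt{1 + n\tau^2}\,\exp\{-\tfrac{1}{2} z^2\, n\tau^2 / (1 + n\tau^2)\}$ with $z = \sqrt{n}\,\bar{y}$.

```python
import math


def bayes_factor_null(z, n, tau2):
    """Bayes factor B01 for M0: theta = 0 against M1: theta ~ N(0, tau2).

    The data are n unit-variance Normal observations, summarised by
    z = sqrt(n) * ybar. Under M0, ybar ~ N(0, 1/n); under M1,
    ybar ~ N(0, tau2 + 1/n). B01 is the ratio of these two densities.
    """
    r = n * tau2  # prior variance relative to the sampling variance of ybar
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z * z * r / (1.0 + r))


def posterior_prob_null(z, n, tau2):
    """Posterior probability of M0 when both models have prior weight 1/2."""
    b01 = bayes_factor_null(z, n, tau2)
    return b01 / (1.0 + b01)


if __name__ == "__main__":
    # Fixed data (z = 2.5, which a classical test would call significant),
    # with an increasingly vague prior under the alternative.
    for tau2 in (1.0, 100.0, 1e6):
        print(tau2, posterior_prob_null(2.5, 50, tau2))
```

With the data held fixed, the posterior probability of the simpler model rises towards 1 as the prior variance grows, which is the bias towards simple models described above.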

The population evolution model is simple to simulate from. When one does so one notices that for certain parameter values, the population ﬂuctuates wildly or grows very rapidly until the computer suffers numerical overﬂow. However, for other values, the population reaches a stable threshold after a period of time (regardless of its starting value) and then does not move too far from this.
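The simulation just described can be sketched as follows. This is a minimal illustration of the process equation 1 without the observation layer, not the authors' code; the function name, the divergence threshold of 50 on the log scale, and the example parameter values are assumptions made here.

```python
import math
import random


def simulate(b, sigma2, n_years, x_init, seed=0, diverge_at=50.0):
    """Simulate x_t = x_{t-1} + b[0] + sum_i b[i] * exp(x_{t-i}) + eps_t.

    b          -- (b_0, b_1, ..., b_k); k = len(b) - 1 is the model order
    sigma2     -- process variance of eps_t
    x_init     -- starting log-population values, at least max(k, 1) of them
    diverge_at -- stop early once |x_t| exceeds this (hypothetical guard
                  that avoids the numerical overflow mentioned in the text)
    """
    k = len(b) - 1
    if len(x_init) < max(k, 1):
        raise ValueError("need at least max(k, 1) starting values")
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    x = list(x_init)
    for _ in range(n_years):
        # x[-i] is the value i years before the one being generated
        step = b[0] + sum(b[i] * math.exp(x[-i]) for i in range(1, k + 1))
        x.append(x[-1] + step + rng.gauss(0.0, sd))
        if abs(x[-1]) > diverge_at:
            break
    return x


if __name__ == "__main__":
    settled = simulate((0.5, -0.5), 0.01, 200, [1.0])
    runaway = simulate((-0.5, 0.5), 0.01, 200, [0.5])
    print(len(settled), settled[-1])
    print(len(runaway), runaway[-1])
```

The first parameter choice settles near a fixed level whatever the starting value, while the second leaves any bounded range within a few steps.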

We refer to this level as the carrying capacity, since it is the maximum level for which the expected population trajectory is not downwards. We would like to restrict our parameters to values that produce a (finite) carrying capacity (exempt from this is the null model $k = 0$, as it can never have a carrying capacity). We will demonstrate that a diffuse independent Normal prior does not always lead to the stable scenario, but there are other priors that do (at least much more often).

Consider the deterministic analog of the model equation 1 with no measurement error, and suppose that we observe a string of $k$ years where the population is at a constant level, $x_{1:k} = x$. Then

$$x_{k+1} = x + b_0 + \sum_{i=1}^{k} b_i e^{x}.$$

If $b_0$ and $\sum_{i=1}^{k} b_i$ are of different sign (and $k$ is at least 1), then we can solve for when $x_{k+1} = x$, and we find that this corresponds to

$$x = x^* = \log\left(\frac{-b_0}{\sum_{i=1}^{k} b_i}\right). \tag{3}$$

This exposes an inherent asymmetry in this model: $b_0$ and the sum of the other components of $b$ need to be of different sign to produce stable populations. This is not captured in an independent Normal prior. In addition, it raises the problem of estimating the carrying capacity $x^*$. We are constructing a prior, so “peeking” at the data should be avoided where possible.

The approach we suggest is to center the observed data (on the log scale) so that the carrying capacity should correspond approximately to $x^* = 0$. Thus if you have data on a well-established and stable species, you should center to the mean across all of the time series, whereas if you are analyzing a population that (say) only achieves a stable level in the last fifteen years of a 50-year study, then it should be centered so that the mean of the last fifteen years is zero on the log scale. This is equivalent to multiplying the data so that the geometric mean over that time period is 1.
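As a small worked illustration of equation 3 and of the centering step, the sketch below computes the carrying capacity implied by a parameter vector and centers a series of counts over a chosen window of years. The function names and the example counts are hypothetical.

```python
import math


def carrying_capacity(b):
    """Return x* = log(-b_0 / sum(b_1..b_k)) from equation 3.

    Returns None when the capacity is undefined: k = 0, a zero sum, or
    b_0 and the sum of the remaining components having the same sign.
    """
    b0, s = b[0], sum(b[1:])
    if len(b) < 2 or s == 0.0 or -b0 / s <= 0.0:
        return None
    return math.log(-b0 / s)


def center_log_counts(counts, window=None):
    """Log-transform counts and subtract the mean log count over `window`.

    `window` is a slice selecting the years assumed to be at carrying
    capacity; by default the whole series is used.
    """
    logs = [math.log(c) for c in counts]
    ref = logs[window] if window is not None else logs
    mean_log = sum(ref) / len(ref)
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    print(carrying_capacity((0.5, -0.5)))         # b_0 = -sum(b_i), so x* = 0
    print(carrying_capacity((1.0, -0.25, -0.25)))
    print(center_log_counts([100.0, 200.0, 400.0], slice(1, 3)))
```

After centering, the log counts in the reference window average to zero, so their geometric mean on the original scale is 1.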
Optionally, the carrying capacity could be introduced as another parameter and given a prior, but that is not an approach we consider, because it is difficult to get an independent estimate that might inform such a prior.

We have suggested centering data; however, the model is not invariant to such a transformation. For large populations, a density dependent effect of a particular magnitude will require a smaller $b$ than for smaller populations. This is because the $i$-th density dependent effect is equal to $b_i e^{x_{t-i}}$. If we do not center the data, we must incorporate some measure of the overall magnitude of the data into the prior (as done in Jamieson and Brooks (2004)). If we center, then we do not need to look at the data in order to inform our prior.

If we take the carrying capacity to be $x^* = 0$, then, rearranging 3, we get

$$b_0 = -\sum_{i=1}^{k} b_i. \tag{4}$$

Thus $b_0$ is perfectly negatively correlated with the sum of the other components of $b$. If we take an independent Normal$(0, \sigma_b^2)$ prior for $b_{1:k}$ then this suggests that the joint prior for $b_{0:k}$ should be the degenerate Normal

$$b_{0:k} \sim N\left(0,\ \begin{pmatrix} k\sigma_b^2 & -\sigma_b^2 & \cdots & -\sigma_b^2 \\ -\sigma_b^2 & \sigma_b^2 & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2 & 0 & & \sigma_b^2 \end{pmatrix}\right). \tag{5}$$

This is degenerate in the sense that the covariance matrix does not have full rank, and only those values of $b$ for which 4 holds will have nonzero prior density. In practice this only applies to the deterministic model, and a small amount $h$ would be added to the variance of $b_0$ to allow for mis-estimation of $x^*$. This is because there will always be probabilistic drift towards the carrying capacity, and by allowing some additional variation in $b_0$, we introduce the requisite additional flexibility into the model.

The choice of $h$ also dictates the prior under the null $k = 0$ model, so a reasonable value might be obtained by considering the variance of symmetric Gaussian random walks over time. For example, a value of h = 0.[…] corresponds to a process that is as likely as not to at least halve or double in five years. In other words, if $Z \sim N(0, 5h)$ then $P(Z \in [-\log 2, \log 2]) = 1/2$. This is the value we use in all of our priors which have $h$ as a parameter.

We now consider the effect of small perturbations about carrying capacity. We will see that this restricts even further the set of parameter values that yield a dynamical system we might expect to see in a natural population.
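The halving-or-doubling criterion for $h$ pins the value down numerically. The sketch below assumes the criterion is read exactly as $P(|Z| \le \log 2) = 1/2$ with $Z \sim N(0, 5h)$ and solves for $h$ by bisection. The function names are hypothetical, and the result is the value implied by that reading rather than a figure quoted from the paper.

```python
import math


def prob_within_factor_two(h, years=5):
    """P(|Z| <= log 2) for Z ~ N(0, years * h).

    Z is the change in log-population of a driftless Gaussian random walk
    after `years` steps of variance h each, so |Z| <= log 2 means the
    population has neither halved nor doubled.
    """
    sd = math.sqrt(years * h)
    return math.erf(math.log(2.0) / (sd * math.sqrt(2.0)))


def solve_h(target=0.5, years=5, lo=1e-6, hi=10.0, tol=1e-10):
    """Bisection for h; the probability above is decreasing in h."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_within_factor_two(mid, years) > target:
            lo = mid  # still too likely to stay within a factor of two
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    h = solve_h()
    print(h, prob_within_factor_two(h))
```

Under this reading the criterion gives $h$ of roughly 0.21; a rounder value such as 0.2 satisfies it only approximately.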

Suppose that $x_{1:k} = (0, \ldots, 0, \delta)$. Then we may be in one of several scenarios (equation 3 is assumed to hold):

(a) $\sum_{i=1}^{k} b_i$ is positive. In this case, regardless of the sign of $\delta$, the population is unstable and will diverge from 0. Carrying capacity is undefined.

(b) $-1 < \sum_{i=1}^{k} b_i < 0$. The population returns monotonically towards capacity.

(c) $-2 < \sum_{i=1}^{k} b_i < -1$. The population oscillates around capacity, with decreasing magnitude.

(d) $\sum_{i=1}^{k} b_i < -2$. The population oscillates around zero, but usually with much greater magnitude than above. If all of $b_{1:k}$ are negative, then the oscillations will quickly reach a consistent (perhaps large) magnitude, but if any of $b_{1:k}$ are positive, then the population is probabilistically unbounded, i.e. with probability 1, as $t \to \infty$, $x_t \to \infty$. In the latter case, capacity is again undefined.

Plots of simulated population trajectories for all four cases are given in figure 1. We contend that the second of these is most likely to be characteristic of a natural population, but that perhaps some allowance might be made for the third. The first and fourth are considered unlikely to arise in the natural world.
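The four scenarios can be checked numerically in the simplest case. The sketch below assumes $k = 1$ with $b_0 = -b_1$ (equation 4) and no process noise, so that near capacity a perturbation $\delta$ is multiplied by approximately $1 + b_1$ each year. The thresholds $0$, $-1$ and $-2$ follow from that linearisation; the function names are hypothetical.

```python
import math


def classify(sum_b):
    """Label the scenario (a)-(d) from the value of sum(b_1..b_k)."""
    if sum_b > 0.0:
        return "a"  # diverges from capacity
    if -1.0 < sum_b < 0.0:
        return "b"  # monotone return to capacity
    if -2.0 < sum_b < -1.0:
        return "c"  # damped oscillation around capacity
    if sum_b < -2.0:
        return "d"  # large oscillations or divergence
    return "boundary"


def perturbation_path(b1, delta=0.1, steps=10):
    """Deterministic k = 1 trajectory started at delta, with b_0 = -b1."""
    x = [delta]
    for _ in range(steps):
        x.append(x[-1] - b1 + b1 * math.exp(x[-1]))
    return x


if __name__ == "__main__":
    for b1 in (0.5, -0.5, -1.5, -2.5):
        print(classify(b1), perturbation_path(b1, steps=5))
```

For $k > 1$ the sum of the lagged coefficients plays the role of $b_1$ in this classification, but individual lags can alter the detail of the trajectory, as scenario (d) notes.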

This means that had we chosen a prior of the form 5 then we would unintentionally be making a strong prior assumption about the model order. For example, if $k = 1$, then $\sum_{i=1}^{k} b_i$ is a $N(0, \sigma_b^2)$ random variable, with a corresponding probability of lying in the prior-plausible interval for this sum. If $k = 2$, then $\sum_{i=1}^{k} b_i$ has a $N(0, 2\sigma_b^2)$ distribution, with correspondingly reduced probability of lying in this interval. This could be thought of as a manifestation of Lindley’s paradox. If for example $\sigma_b = 1$, then the chance of being in the prior-plausible region under $k = 1$ would be […]. Under $k = 5$, that chance shrinks to […]. The difference is even more pronounced if $\sigma_b$ is higher. Thus, we would be accidentally favouring simple models.

A logical refinement of 5 is to keep the distribution of $\sum_{i=1}^{k} b_i$ constant, and to restrict to cases where $\sum_{i=1}^{k} b_i$ lies in the prior-plausible interval. This is

$$b_{0:k} \sim N\left(0,\ \begin{pmatrix} \sigma_b^2 + h & -\sigma_b^2/k & \cdots & -\sigma_b^2/k \\ -\sigma_b^2/k & \sigma_b^2/k & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2/k & 0 & & \sigma_b^2/k \end{pmatrix}\right) \tag{6}$$

restricted to the aforementioned set. This is easy and quick to sample from by rejection sampling. This prior also has the attractive property that the marginal distribution of $b_0$ is the same under all models except $k = 0$, so we are equally willing to entertain density dependence effects at different $k$, since the prior probability of a model where carrying capacity is defined is the same for all $k > 0$.

[Figure 1: four panels of simulated trajectories, $x$ against Index.]

Figure 1: Simulations from the autoregressive model for four choices of $b$. Note that the last of these is a stable exception to the usually unstable case. $\sigma^2$ is the same for all of these, with the process driving the greatly increased variance for the last simulation. There is no measurement error, and we observe from $t = 100$ to $t = 150$.
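Both points in this passage can be illustrated numerically. The sketch below first computes the probability that $\sum b_i$ falls in a given interval under the independent prior of equation 5, and then draws from the restricted prior of equation 6 by rejection. The interval $(-1, 0)$ used as the default is an assumption taken from scenario (b); the function names and the value of `h` in the example are likewise hypothetical.

```python
import math
import random


def normal_cdf(x, var):
    """CDF at x of a zero-mean Normal with variance var."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * var)))


def prob_sum_in_interval(k, sigma_b2, lo=-1.0, hi=0.0):
    """P(lo < sum b_i < hi) when b_1..b_k are independent N(0, sigma_b2),
    so that the sum is N(0, k * sigma_b2) as under equation 5."""
    var = k * sigma_b2
    return normal_cdf(hi, var) - normal_cdf(lo, var)


def sample_restricted_prior(k, sigma_b2, h, lo=-1.0, hi=0.0, rng=random):
    """One draw of (b_0, ..., b_k) from equation 6 restricted to lo < sum < hi.

    b_1..b_k are independent N(0, sigma_b2 / k), so their sum is
    N(0, sigma_b2) for every k; b_0 = -sum + N(0, h) then has variance
    sigma_b2 + h and covariance -sigma_b2 / k with each b_i.
    """
    sd_i = math.sqrt(sigma_b2 / k)
    while True:
        b = [rng.gauss(0.0, sd_i) for _ in range(k)]
        s = sum(b)
        if lo < s < hi:  # accept only draws inside the plausible interval
            b0 = -s + rng.gauss(0.0, math.sqrt(h))
            return [b0] + b


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, prob_sum_in_interval(k, 1.0))
    print(sample_restricted_prior(3, 1.0, 0.2, rng=random.Random(1)))
```

Because the restriction involves only $b_{1:k}$, drawing $b_0$ after acceptance is equivalent to rejecting on the full vector, and the acceptance rate is the same for every $k > 0$.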
The principle of shrinkage derives from the classical problem of estimating the mean of a multivariate Normal distribution, subject to assumptions about its variance. It can be shown (Stein, 1955) […]

In other words, a shrunk estimate will provide better (in terms of mean square error) predictions of future observations drawn from the same distribution. We use this idea to motivate an alternative choice of prior, which will have an artiﬁcially reduced variance.

It must be noted that the improved predictive power shrinkage allows is at the cost of bias. Such bias-variance tradeoffs are common in model selection problems.
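The gain from shrinkage can be demonstrated by simulation. The sketch below uses the James-Stein estimator, a standard concrete instance of shrinkage that the text does not name, and compares its total squared error with that of the raw observation for a Normal mean vector with identity covariance. The dimension, the true mean and the function names are assumptions chosen for illustration.

```python
import random


def james_stein(x):
    """Shrink the observation x towards the origin.

    Uses the factor 1 - (p - 2) / ||x||^2, which assumes unit variance
    in each coordinate and dimension p >= 3.
    """
    p = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm2
    return [shrink * v for v in x]


def compare_mse(theta, n_rep=2000, seed=0):
    """Average total squared error of the raw and shrunk estimates of theta."""
    rng = random.Random(seed)
    err_raw = 0.0
    err_js = 0.0
    for _ in range(n_rep):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(x)
        err_raw += sum((a - t) ** 2 for a, t in zip(x, theta))
        err_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return err_raw / n_rep, err_js / n_rep


if __name__ == "__main__":
    raw, shrunk = compare_mse([0.5] * 10)
    print(raw, shrunk)
```

The shrunk estimate is biased towards zero in every coordinate, yet its total squared error is smaller, which is the bias-variance tradeoff referred to above.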

Before we look at observed abundance data, we analyze some simulations of populations which follow the specified dynamics. We have two simulated datasets with the parameters (1) $k = 1$, b = (0.[…], −0.[…]) and (2) $k = 2$, b = (0.[…], −0.[…], −0.[…]). Both simulations share the parameters $\sigma^2 = .05 = S_t$ for each $t$. Both series have 501 years of data (this is considerably longer than the real survey, so we can see how much we can expect to learn about the model parameters in the future).

We consider ﬁve prior choices for b :

1. Independent Normal, variance 5 (primarily as an illustration of Lindley’s Paradox).

2. Independent Normal, variance 1 (a baseline for comparison).

3. Multivariate Normal with covariance matrix from the modified version of 5, and $\sigma_b = 1$, h = 0.[…].

4. A shrinkage-inspired prior: Normal with covariance matrix based on 6:

$$b_{0:k} \sim N\left(0,\ \begin{pmatrix} \sigma_b^2 + h & -\sigma_b^2/k & \cdots & -\sigma_b^2/k \\ -\sigma_b^2/k & \sigma_b^2/k & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2/k & 0 & & \sigma_b^2/k \end{pmatrix}\right) \tag{7}$$

and again $\sigma_b = 1$, h = 0.[…].

5. As (4), but with smaller variance ascribed to later components of $b$:

$$b_{0:k} \sim N\left(0,\ \begin{pmatrix} \sigma_b^2 + h & -\sigma_b^2 d & \cdots & -\sigma_b^2 d/k \\ -\sigma_b^2 d & \sigma_b^2 d & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2 d/k & 0 & & \sigma_b^2 d/k \end{pmatrix}\right) \tag{8}$$

where the $i$-th density dependence component has variance $\sigma_b^2 d / i$ and $d$ is defined so that the sum of variances of $b_{1:k}$ is $\sigma_b^2$. Under this restriction, $d = 1 / \sum_{j=1}^{k} (1/j)$ in equation 8 for $k \geq 1$.

Notice that both priors (4) and (5) have the same total variance for $b_{1:k}$, as long as $k > 0$. This is deliberate, as discussed earlier. The choice between the last two priors largely depends on whether one considers the assumption that longer lags tend to be smaller in size to be suitable a priori. We will see that they do not provide substantially different estimates or predictions, but then we only consider simulations for low values of $k$.

[…] Jump MCMC (Green, 1995) to produce a sample from the posterior for each simulation. This produces a weighted sample from the posterior distribution of models, parameters and hidden states.
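The covariance matrices of priors (4) and (5) in the list above can be built and checked as follows. This is a sketch under one stated assumption: the covariances among $b_1, \ldots, b_k$ are taken to be zero. The function names are hypothetical.

```python
def harmonic_scale(k):
    """d = 1 / sum_{j=1}^{k} 1/j, so that the variances sigma_b2 * d / i
    of b_1..b_k add up to sigma_b2."""
    return 1.0 / sum(1.0 / j for j in range(1, k + 1))


def shrinkage_cov(k, sigma_b2, h, decreasing=False):
    """Covariance of (b_0, ..., b_k) for prior (4), or prior (5) if
    `decreasing` is True. Off-diagonal terms among b_1..b_k are assumed 0."""
    if decreasing:
        d = harmonic_scale(k)
        variances = [sigma_b2 * d / i for i in range(1, k + 1)]
    else:
        variances = [sigma_b2 / k] * k
    n = k + 1
    cov = [[0.0] * n for _ in range(n)]
    cov[0][0] = sigma_b2 + h  # Var(b_0) = Var(sum of b_i) + h
    for i, v in enumerate(variances, start=1):
        cov[i][i] = v
        cov[0][i] = cov[i][0] = -v  # Cov(b_0, b_i) = -Var(b_i)
    return cov


if __name__ == "__main__":
    for row in shrinkage_cov(3, 1.0, 0.2):
        print(row)
    for row in shrinkage_cov(3, 1.0, 0.2, decreasing=True):
        print(row)
```

In both cases the diagonal entries for $b_{1:k}$ sum to $\sigma_b^2$, which is the equal-total-variance property noted in the text.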

We are also able to chart the posterior as it evolves over time, as more data are added.

The evolution of the posterior for $k$ in the $k = 1$ simulation is shown in figure 2. The results for the second simulation are not pictured; they are qualitatively similar (except that the majority of the posterior mass is on $k = 2$ instead of $k = 1$, after a similar period of time). This figure makes it […] $k$, we can only do so slowly and, given that we only have fifty or so years of duck abundance data, we cannot expect a conclusive model selection posterior. This makes it particularly important that we choose a parameter prior that will not influence the model selection process, since the signal from the data is quite weak.

[Figure 2: four panels, each titled “Evolution of Model Posterior with Time”, showing cumulative posterior probability for $k = 0, \ldots, 5$ against time: (a) Independent prior, variance 5; (b) Independent prior, variance 1; (c) Correlated prior, variance 1; (d) Shrinkage prior, variance $1/k$.]

Figure 2: Evolution of the posterior model distribution over a long time span when $k = 1$, for four different prior choices (1)–(4). The fifth posterior is almost identical to the fourth.

Another point of interest is the posterior at $t = 6$, i.e. after only one residual is taken into account. This is the first point at which we have a model posterior, and we can see for the independent priors that this posterior is far from uniform. This is one quantification of the model selection bias induced by the independent priors.

In total, eleven species are analyzed. Seven of these are dabbling ducks: Mallard (Anas platyrhynchos), American Wigeon (Anas americana), Gadwall (Anas strepera), Green-Winged Teal (Anas crecca), Blue-Winged Teal (Anas discors), Northern Shoveler (Anas clypeata) and Northern Pintail (Anas acuta). The remaining four are diving ducks, two of which are amalgamated: Redhead (Aythya americana), Canvasback (Aythya valisineria) and Greater and Lesser Scaup (Aythya marila and Aythya affinis). The data, as supplied by US Fish and Wildlife Service (2010), include both an estimated annual count and an estimate of the observation error. We treat the observation error as exact. The posterior model probabilities for each species, using the shrinkage prior (4), are summarised in table 3.

None of the posteriors are conclusive as to the order of density dependence. We expect this from the simulation study; even with data that we know follows a particular instance of the model, we can only expect perhaps a 60% posterior probability for that model after this length of time. It would be optimistic to expect the same level of agreement with real data.

Species       k = 0    k = 1    k = 2    k = 3    k = 4    k = 5
Mallard       0.1718   0.1882   0.208    0.0783   0.2399   0.1138
A. Wigeon     0.0238   0.4192   0.2637   0.1337   0.0601   0.0995
Gadwall       0.6805   0.1664   0.0553   0.0191   0.045    0.0338
G.W. Teal     0.6816   0.0982   0.052    0.053    0.0588   0.0563
B.W. Teal     0.4422   0.3197   0.1347   0.0582   0.0276   0.0176
N. Shoveler   0.4906   0.0756   0.2494   0.0942   0.0421   0.0481
N. Pintail    0.2733   0.2319   0.2707   0.1057   0.0322   0.0862
Redhead       0.3239   0.0671   0.2005   0.1489   0.1355   0.124
Canvasback    0.0299   0.5284   0.192    0.0939   0.0922   0.0636
Scaup         0.5764   0.1454   0.1349   0.0684   0.0418   0.0331

Figure 3: Posterior model probabilities for each duck species, using a shrinkage prior.

It is impossible to assess the quality of the $k$ posterior, since we have nothing with which to compare it. We can however look at the ability of the posterior at a given time point to make predictions of future numbers. These can then be compared with our best guess of the truth for that year (of which the predictions were made without knowledge). A simple quantity that measures predictive accuracy is the one step ahead Mean Square Error

$$\mathrm{MSE}(t) = E(\hat{x}_t - \tilde{x}_t)^2,$$

where $\hat{x}_t$ is the prediction of $x_t$ from the particle set at time $t - 1$ and $\tilde{x}_t$ is the “smoothed” state estimated from the particle set at time $T$. We seek to minimize MSE.

As a typical example of the relative performance of each prior, figure 4 shows the evolution of the MSE over time, using the $N(0, 5)$ prior as a baseline, for the American Wigeon data. After a certain time, the MSE becomes approximately equal for all priors. This shows that the data have overwhelmed the prior in terms of information. Before then, there is significant disparity in MSE for the different priors, and while the correlated prior offers a mild improvement over the independent one, the shrinkage priors clearly outperform the others for up to 30 years.
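The MSE calculation and the scaling against a baseline prior can be sketched as follows. The expectation is interpreted here as a weighted average over the particle set at time $t - 1$, which is an assumption about the implementation; the function names and the example numbers are hypothetical.

```python
def one_step_mse(particles, weights, smoothed_state):
    """Weighted mean of (x_hat - x_tilde)^2 over the predictive particles.

    particles      -- predicted values of x_t, one per particle
    weights        -- non-negative particle weights (need not sum to 1)
    smoothed_state -- the smoothed estimate x_tilde_t for the same year
    """
    total = sum(weights)
    squared = sum(w * (p - smoothed_state) ** 2
                  for p, w in zip(particles, weights))
    return squared / total


def relative_mse_percent(mse, baseline_mse):
    """Express each year's MSE as a percentage of the baseline prior's MSE."""
    return [100.0 * m / b for m, b in zip(mse, baseline_mse)]


if __name__ == "__main__":
    print(one_step_mse([1.0, 3.0], [1.0, 1.0], 2.0))
    print(relative_mse_percent([0.5, 1.0], [1.0, 1.0]))
```

A value below 100% in a given year means the prior in question predicted that year better than the baseline did.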

[Figure 4: plot titled “Predictive Accuracy Relative to Normal(0,5)”, showing MSE % against $t$ for the Normal(0,5), Normal(0,1), Correlated, Shrinkage (eql var) and Shrinkage (decr var) priors.]

Figure 4: Mean squared predictive error for different priors, scaled so that the Normal(0,5) prior is at 100%. American Wigeon data.

The MSE can sometimes be slightly misleading, since predictions are correlated (as are the quantities they are predicting). One measure to correct this is the Mahalanobis distance (Mahalanobis, 1936). This is based on taking a Gaussian approximation to the predictive distribution and calculating the expected total squared error over the whole time series. It is given by

$$D_M = (\hat{x} - \tilde{x})^T S^{-1} (\hat{x} - \tilde{x}). \tag{9}$$

The Mahalanobis distance is not a function of time; it measures performance from start to finish. A low Mahalanobis distance is indicative of good overall predictive accuracy. When we calculate the Mahalanobis distance under each choice of prior for each species, we obtain table 5. The story is broadly the same for all the species, as follows. The independent priors have much more predictive error than the correlated ones (the high-variance prior being worst). The shrinkage priors, as expected, offer improvements over all the others; however, there is little difference in accuracy between the two types of shrinkage prior.
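Equation 9 can be evaluated without forming $S^{-1}$ explicitly. The sketch below uses a Cholesky factorisation and one forward substitution, in plain Python; the function names are hypothetical and $S$ is assumed to be symmetric positive definite.

```python
import math


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("S must be positive definite")
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return L


def mahalanobis(pred, truth, S):
    """D_M = (pred - truth)^T S^{-1} (pred - truth), as in equation 9."""
    d = [p - t for p, t in zip(pred, truth)]
    L = cholesky(S)
    # Solve L z = d by forward substitution; then D_M = z . z
    z = []
    for i in range(len(d)):
        z.append((d[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return sum(v * v for v in z)


if __name__ == "__main__":
    print(mahalanobis([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```

When $S$ is the identity the quantity reduces to the total squared error, which is why it can be read as a correlation-adjusted counterpart of the MSE.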

We see that for most species, we cannot discount the possibility that $k = 0$. This can be interpreted in a few different ways. The simplest explanation (which is also the least likely to be true in the authors’ opinion) is that the species do not show density dependent dynamics. It is also possible that the species are in fact far from carrying capacity, so that the density dependent effects are too small to be measured. In that case a hypothesis must be made as to what is keeping the species from reaching capacity, and that is beyond the scope of this study. It might be possible that the numerical nature of the density dependence cannot be projected onto this class of models. If we were to observe the species over a longer time period, or where it were closer to capacity, these differences would likely present themselves in the form of evidence for $k > 0$ in the posterior.

It is interesting to note that the Mallard (which is the only species for which the posterior-preferred model is greater than $k = 2$) is also the species with the highest population count. This is potentially indicative that intra-species competition is a major factor in dabbling ducks, as there is a strong negative correlation between total count and posterior probability that $k = 0$ for dabbling ducks. This correlation also could be taken as evidence of the generalist predator hypothesis, which would argue that changes in duck recruitment (i.e. changes to $b_0$) would be met with immediate responses from the predator (so that in fact $b_0$ might change from year to year, but in a way that is probabilistically equivalent to the $k = 0$ model with the variance being added to $\sigma^2$ instead).

The picture is somewhat different for diving ducks. The aforementioned correlation between raw numbers and apparent density dependence is not evident here. Again, this is consistent with the generalist predator hypothesis which, taken in conjunction with the reports from Sargeant et al. (1984) about diving ducks being much less vulnerable to this kind of predation, would suggest a different density dependent structure from that of dabbling ducks. Even here though, there is still appreciable posterior probability that $k = 0$ in two out of three cases.

The hypothesis of Jamieson and Brooks (2004), that diving ducks were in general more density dependent than dabbling ducks, is not really borne out by this analysis. The authors of that paper used independent priors with different variance for each species. As one example, for the Blue-Winged Teal, the authors had an independent prior variance of 3, and came to a posterior that was 73% in favour of $k = 0$, and almost all the rest of the mass was for $k = 1$. We have demonstrated that this is largely an artifact of Lindley’s Paradox and our posterior is much less conclusive.

We hope that we have demonstrated the importance of a considered choice of prior. A default choice is rarely safe in model selection problems, and we have shown how, by considering whether the carrying capacity is well-defined and trying to exclude cases where it isn’t, we can arrive at an informative prior without peeking at the data.

A more general principle is that of excluding so-called ‘unphysical’ possibilities from the prior, that is, not allowing parameters to take values which would produce behaviour we know does not happen. We excluded models which did not give rise to a well-deﬁned carrying capacity; the precise nature of the prior restrictions will vary from problem to problem.

It is important to consider how a parameter’s prior varies between models: a parameter

In our example $b_0$ typically had a prior that was different under the null model $k = 0$ than in more complex cases. This mirrored the fact that in the null model $b_0$ was interpreted as an overall drift, whereas otherwise it was the counterbalance to the density dependence effects. When we exercise such caution in choosing our parameter priors, we are in a position to judge much more effectively whether the data provide evidence in favour of our hypotheses or not.

Acknowledgements

JDL is funded by an Engineering and Physical Sciences Research Council (EPSRC) grant, number (to follow). This work was also partially funded by EPSRC grant EP/D065704/1 to RBG. Most of this research was conducted while RBG was a Lecturer at the Statistical Laboratory, University of Cambridge.

References

Bjørnstad, O., Falck, W., and Stenseth, N. (1995). “A Geographic Gradient in Small Rodent Density Fluctuations: A Statistical Modelling Approach.” Proceedings: Biological Sciences.

Carvalho, C., Johannes, M., Lopes, H., and Polson, N. (2010). “Particle learning and smoothing.” Statistical Science, 25, 1, 88–106.

Dennis, B. and Taper, M. (1994). “Density Dependence in Time Series Observations of Natural Populations: Estimation and Testing.” Ecological Monographs, 64, 205–224.

Green, P. (1995). “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.” Biometrika, 82, 4, 711–732.

Jamieson, L. and Brooks, S. (2004). “Density Dependence in North American Ducks.” Animal Biodiversity and Conservation, 27, 1, 113–128.

Kass, R. and Raftery, A. (1995). “Bayes factors.” Journal of the American Statistical Association, 90, 430.

Lindley, D. (1957). “A Statistical Paradox.” Biometrika, 44, 187–192.

Link, W. and Barker, R. (2006). “Model Weights and the Foundations of Multimodel Inference.” Ecology, 87, 10, 2626–2635.

Mahalanobis, P. (1936). “On the Generalised Distance in Statistics.” Proc. of the Nat’l. Inst. Sci. in India, 12, 49–55.

Sargeant, A. B., Allen, S. H., and Eberhardt, R. T. (1984). “Red Fox Predation on Breeding Ducks in Midcontinent North America.” Wildlife Monographs, 89, 3–41.

Stein, C. (1955). “Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution.” In Proceedings of the Third Berkeley Symposium, ed. J. Neyman, 197–206. University of California Press.

Turchin, P. (1995). “Population regulation: old arguments and a new synthesis.” In Population dynamics: new approaches and synthesis, eds. N. Cappuccino and P. Price, 19–40. Academic Press, San Diego CA.

US Fish and Wildlife Service (2010). “Waterfowl Population Status 2010.”

Viljugrein, H., Stenseth, N. C., Smith, G. W., and Steinbakk, G. H. (2005). “Density Dependence in North American Ducks.” Ecology, 86, 1, 245–254.