[PDF] Robust discrete choice models with t-distributed kernel errors

Abstract

Inferences of robust behavioural and statistical models are insensitive to outlying observations resulting from aberrant behaviour, misreporting and misclassification. Standard discrete choice models such as logit and probit lack robustness to outliers due to their rigid kernel error distributions. In this paper, we analyse two robust alternatives to the multinomial probit (MNP) model. The two models belong to the family of robit models whose kernel error distributions are heavy-tailed t-distributions which moderate the influence of outlying observations. The first model is the multinomial robit (MNR) model, in which a generic degrees of freedom parameter controls the heavy-tailedness of the kernel error distribution. The second model, the generalised multinomial robit (Gen-MNR) model, is more flexible than MNR, as it allows for distinct heavy-tailedness in each dimension of the kernel error distribution. For both models, we derive efficient Gibbs sampling schemes, which also allow for a straightforward inclusion of random parameters. In a simulation study, we illustrate the excellent finite sample properties of the proposed Bayes estimators and show that MNR and Gen-MNR produce more exact elasticity estimates if the choice data contain outliers through the lens of the non-robust MNP model. In a case study on transport mode choice behaviour, MNR and Gen-MNR outperform MNP by substantial margins in terms of in-sample fit and out-of-sample predictive accuracy. We also find that the benefits of the more flexible kernel error distributions underlying MNR and Gen-MNR are maintained in the presence of random heterogeneity.

Full PDF

RRobust discrete choice models with t-distributedkernel errors

14 September 2020R

ICO K RUEGER (corresponding author)Transport and Mobility LaboratoryEcole Polytechnique Fédérale de Lausanne, Switzerlandrico.krueger@epﬂ.chP

RATEEK B ANSAL

Transport Strategy Centre, Department of Civil and Environmental EngineeringImperial College London, [email protected]

ICHEL B IERLAIRE

Transport and Mobility LaboratoryEcole Polytechnique Fédérale de Lausanne, Switzerlandmichel.bierlaire@epﬂ.chT

HOMAS G ASÓS

Transport and Mobility LaboratoryEcole Polytechnique Fédérale de Lausanne, Switzerlandthomas.gasos@epﬂ.ch a r X i v : . [ ec on . E M ] S e p bstract Models that are robust to aberrant choice behaviour have received limited attention in discrete choiceanalysis. In this paper, we analyse two robust alternatives to the multinomial probit (MNP) model.Both alternative models belong to the family of robit models, whose kernel error distributions areheavy-tailed t-distributions. The ﬁrst model is the multinomial robit (MNR) model in which a genericdegrees of freedom parameter controls the heavy-tailedness of the kernel error distribution. Thesecond alternative, the generalised multinomial robit (Gen-MNR) model, has not been studied inthe literature before and is more ﬂexible than MNR, as it allows for alternative-speciﬁc marginalheavy-tailedness of the kernel error distribution. For both models, we devise scalable and gradient-freeBayes estimators. We compare MNP, MNR and Gen-MNR in a simulation study and a case study ontransport mode choice behaviour. We ﬁnd that both MNR and Gen-MNR deliver signiﬁcantly betterin-sample ﬁt and out-of-sample predictive accuracy than MNP. Gen-MNR outperforms MNR due to itsmore ﬂexible kernel error distribution. Also, Gen-MNR gives more reasonable elasticity estimatesthan MNP and MNR, in particular regarding the demand for under-represented alternatives in aclass-imbalanced dataset.

Keywords: robustness, probit, robit, Bayesian estimation, discrete choice, transport mode choice . Introduction

Random utility maximisation is by far the most widely adopted decision paradigm in the formulationof discrete choice models. Random utility theory (McFadden, 1981) posits that a rational decision-maker chooses the option with the highest utility from a ﬁnite set of mutually-exclusive alternatives.Since the utility of an alternative depends both on observed factors as well as on factors that theanalyst does not or cannot observe, the conditional indirect utility of an alternative contains a randomerror term. Typically, the random error terms of alternatives in a choice set are assumed to be eitherindependent and identically Gumbel distributed (logit kernel) or jointly Gaussian distributed (probitkernel).The Gumbel and the Gaussian distributions have restrictive shapes, which limit the explanatory andpredictive powers of the resulting logit and probit choice models. Whereas the Gaussian distributionhas a symmetric bell shape with light tails, the Gumbel distribution is right-skewed with a righttail that is slightly heavier than that of the Gaussian distribution. In recent years, researchers haveexplored various departures from standard kernel error distributions (see Paleti, 2019, for a review).These advancements include negative exponential (Alptekino˘glu and Semple, 2016), negative Weibull(Castillo et al., 2008), generalised exponential (Fosgerau and Bierlaire, 2009) and q-generalisedreverse Gumbel (Chikaraishi and Nakayama, 2016) kernel error distributions, additive combinationsof Gumbel and exponential error terms (Del Castillo, 2016), a class of asymmetric distributions(Brathwaite and Walker, 2018), copulas with Gumbel marginals (Del Castillo, 2020). However, theseextensions do not aim at enhancing the robustness of choice models.The concept of robustness is well established in statistics, with the notion that a robust modelsafeguards inferences against the inﬂuence of outliers and violations of modelling assumptions(e.g. Gelman et al., 2013). In discrete choice analysis, the need for robust models arises in varioussituations to address aberrant utility differences. For example, utility differences can be aberrant fromthe analyst’s point-of-view, if the analyst possesses little information about the factors inﬂuencingchoices. In this scenario, the contribution of the random disturbance to the conditional indirectutility can be relatively large for some observations. Utility differences also contain outliers if thepostulated decision paradigm (such as random utility maximisation) does not accurately represent thedecision protocols governing some of the observed choices. Furthermore, aberrant utility differencesare a concern in class-imbalanced datasets, which are frequently encountered in non-experimentalsettings. This is because in class-imbalanced data, the utility differences involving under-representedalternatives are outliers relative to utility differences involving well-represented options.Lange et al. (1989) advocate the use of the heavy-tailed t-distribution as a means to increaserobustness in regression models. Compared to the Gaussian distribution, the t-distribution has onemore parameter which controls the heavy-tailedness of the distribution to moderate outlying datapoints. In the context of generalised linear models, Liu (2004) proposes the binary robit model,which is built on a t-distribution with unknown degrees of freedom (DOF), as a robust alternativeto logistic and probit regression models. Furthermore, Ding (2014) constructs a robust Heckmanselection model using the t-distribution as kernel error distribution. Jiang and Ding (2016) formulateHeckman selection and multivariate robit models based on t-distributions with different marginalDOF.Robustness has received limited attention in multinomial choice analysis. Dubey et al. (2020)present the ﬁrst multinomial robit (MNR) model, i.e. a multinomial choice model deﬁned through1 t-distributed kernel error with an estimable DOF. Dubey et al. (2020) make a strong empiricalcase to adopt the MNR model over the multinomial probit (MNP) model. First, the estimates of theMNP model are inconsistent, if the kernel errors in the data generating process are heavy-tailed.Second, the robustness of MNR results in superior in-sample ﬁt and out-of-sample predictive abilityfor class-imbalanced data sets. In another study, Peyhardi (2020) formulates an MNR model in thecontext of generalised linear models and shows that MNR can help in identifying artiﬁcial aspects ofthe design of stated preference experiments.We identify two research gaps in the formulation and estimation of MNR models. First, Dubeyet al. (2020) and Peyhardi (2020) constrain the ﬂexibility of the kernel error distribution by assumingthat a single, generic DOF parameter controls the heavy-tailedness of the kernel error distribution.This modelling assumption implies that the same level of utility aberrance applies to all alternatives.Second, the estimation approaches employed in both studies are not scalable. Dubey et al. (2020)are unable to derive analytical gradients of the MNR model and thus rely on computationally-expensive numerical gradient approximations during the maximisation of the simulated log-likelihoodof the model. Peyhardi (2020) estimates the DOF parameter by performing a grid search, whichrequires the model to be estimated at multiple values of the DOF parameter and suffers from thecurse of dimensionality if the underlying kernel distribution had multiple DOF parameters. Besides,incorporating representations of unobserved heterogeneity is computationally expensive in bothstudies, as it necessitates an additional layer of simulation in the computation of the log-likelihood.In this paper, we address the ﬁrst limitation of existing MNR models (i.e. generic heavy-tailedness)by formulating a generalised MNR (Gen-MNR) model with alternative-speciﬁc DOF parameters. Tothat end, we adopt the non-elliptical contoured t-distribution (Jiang and Ding, 2016) as kernel errordistribution. To tackle the second limitation (i.e. computationally-expensive estimation), we devisegradient-free Bayesian estimation approaches for both the MNR and the Gen-MNR models. In theconstruction of the Bayesian estimation approaches, we exploit the hierarchical normal mixturerepresentation of the t-distribution. To bypass complex likelihood computations in the estimationof the MNR and the Gen-MNR models, we employ a combination of Bayesian data augmentationtechniques used in the estimation of MNP models (Albert and Chib, 1993; McCulloch and Rossi, 1994)as well as of non-multinomial robit models (Ding, 2014; Jiang and Ding, 2016). Bayesian estimationalso facilitates accommodating ﬂexible semi-parametric representations of unobserved preferenceheterogeneity (Krueger et al., 2020).We ﬁrst use simulated data to investigate the properties of the proposed models and their estimationmethods in terms of parameter recovery and elasticity estimates. Subsequently, we compare MNP,MNR and Gen-MNR in a case study on transport mode choice behaviour using revealed preferencedata from London, UK. In the real data application, we contrast willingness to pay and elasticityestimates as well as in-sample ﬁt and out-of-sample predictive accuracy of the three models.The remainder of the paper is organised, as follows: First, we present the mathematical formulationsof the MNP, MNR and Gen-MNR models (Section 2). Then, we outline the estimation approaches andsuccinctly discuss the adopted data augmentation techniques (Section 3). In Sections 4 and 5, wepresent the simulation and case studies, respectively. Finally, we conclude and identify avenues forfuture research (Section 6). 2 . Model formulations

In this section, we present the formulations of the MNP, MNR and Gen-MNR models.

We consider a standard random utility model in which an agent i =

1, . . . , N chooses from a setof J mutually exclusive alternatives. In principle, utility is not identiﬁed at an absolute level.Therefore, the MNP model is deﬁned through a J − w i = { w i j , . . . , w i , J − } (McCulloch and Rossi, 1994). The elements of w i correspond to the utilitydifferences with respect to the base alternative J . The observed choice y i ∈ {

1, . . . , J } is assumed toarise from y i ( w i ) =  j if max ( w i ) = w i j > J if max ( w i ) <

0, for i =

1, . . . , N . (1)The latent variable w i is represented as w i = X i β + (cid:34) i with (cid:34) i ∼ N ( , Σ ) , for i =

1, . . . , N . (2)Here, X i is a ( J − ) × K matrix of differenced predictors, i.e. X i =  X i ... X i , J −  =  X obs i − X obs iJ ... X obs i , J − − X obs iJ  ,where X obs i j is the observed attribute vector of alternative j for agent i . β is a K vector of tasteparameters. Σ is a ( J − ) × ( J − ) covariance matrix. The latent variable representation (2) is notidentiﬁed, because w i can be multiplied by any positive scalar c without changing the likelihood (1),i.e. y i ( w i ) = y i ( c w i ) . Therefore, we must set the scale. We follow Burgette and Nordheim (2012)and impose a trace restriction on Σ , i.e. tr ( Σ ) = J −

1. To complete the speciﬁcation of the MNPmodel, we place a normal prior on β , i.e. β ∼ N ( ζ , B ), and an Inverse-Wishart prior on Σ , i.e. Σ ∼ I W ( ρ , S ) . Predictions under the Bayesian formulation of the MNP model can be sensitive to theselection of the base alternative J (Burgette and Nordheim, 2012). The MNR model assumes a t-distributed kernel error for the latent variable w i , i.e. w i = X i β + (cid:34) i with (cid:34) i ∼ t ( , Σ , ν ) , for i =

1, . . . , N , (3)where Σ is a ( J − ) × ( J − ) covariance matrix and ν is scalar degree of freedom (DOF). Thet-distribution also has the following normal mixture representation (Ding, 2014): (cid:34) i ∼ N ( , Σ / q i ) with q i ∼ χ ν /ν , for i =

1, . . . , N . (4)The latent variables q = { q , . . . , q N } allow for heavy-tailedness in the distribution of the kernel errorby increasing the variability of (cid:34) i across different i . Figure 1 illustrates the relationship betweenthe χ -distribution (which controls the distribution of q ) and a t -distribution (which controls thedistribution of (cid:34) ) with unit variance for different DOF ν . For small ν <

30, the t-distribution exhibits3eavy tails. As ν approaches ∞ , the t -distribution converges to the normal distribution. We use thesame priors for β and Σ as in MNP. For identiﬁcation, we also maintain the trace restriction on Σ . Weplace a Gamma prior on ν with ν ∼ Gamma ( α , β ) . Predictions under the Bayesian formulation ofthe MNR model can be sensitive to the selection of the base alternative in the same way as predictionsunder the Bayesian formulation of the MNP model. x f ( x ) -distribution = 1= 2= 3= 4= 6= 8 x f ( x ) t -distribution = 1= 2= 3= 4= 6= 8= Figure 1:

Relationship between χ - and t-distributions for different degrees of freedom ν We generalise MNR by allowing for different marginal heavy-tailedness in the distribution of the latentvariable w i . The Gen-MNR model assumes that the kernel error of w i is drawn from a non-ellipticalcontoured t -distribution (NECT; Jiang and Ding, 2016). We have w i = X i β + (cid:34) i with (cid:34) i ∼ NECT p ( , Σ , ν ) , for i =

1, . . . , N , (5)where Σ is a ( J − ) × ( J − ) covariance matrix and ν = { ν , . . . , ν S } is S vector of DOF with1 < S ≤ J − p = { p , . . . , p S } is a S vector giving the number of dimensions that are associated witheach DOF ν s . We have p s ∈ (cid:78) \ { } and (cid:80) Ss = p s = J −

1. The NECT distribution has the followingnormal mixture representation (Jiang and Ding, 2016): (cid:34) i = Q − / i Σ / Z i , Z i ∼ N ( , I J − ) , for i =

1, . . . , N , (6)where Q i = diag ( q i I p , . . . , q iS I p S } is a ( J − ) × ( J − ) block-diagonal matrix with q is ∼ χ ν s /ν s for s =

1, . . . , S . I l is a l × l identity matrix. Each marginal component of a NECT-distributed randomvariable follows a univariate t-distribution with the respective DOF, i.e. if (cid:34) ∼ N EC T p ( , Σ , ν ) , then (cid:34) j ∼ t ( Σ j j , ν s ( j ) ) , where s ( j ) maps dimension j onto its associated DOF. In the rest of this work,we assume that S = J − ν j ∼ Gamma ( α , β ) for j =

1, . . . , J −

1. We also maintainthe trace restriction on Σ for identiﬁcation. Predictions of the Gen-MNR model can be sensitive to theselection of the base alternative in the same way as predictions of the MNP and MNR models.4 . Inference For the estimation of the MNP, MNR and Gen-MNR models, we employ Markov chain Monte Carlomethods in the form of Gibbs sampling (Robert and Casella, 2013). The sampling schemes for thethree models are presented in Algorithms 1, 2 and 3.Algorithm 1 corresponds to the Gibbs sampler proposed by Burgette and Nordheim (2012) withoutthe marginal data augmentation scheme devised by Imai and Van Dyk (2005). Algorithm 2 is based onthe Gibbs sampler proposed by Ding (2014) for the robust Heckman selection model, and Algorithm3 is most closely related to the Gibbs sampler proposed by Jiang and Ding (2016) for the multivariaterobit model.All samplers involve data augmentation (Tanner and Wong, 1987) to facilitate their construction.The central idea of Bayesian data augmentation is to treat latent variables as unknown modelparameters, which are imputed in additional sampling steps. Each of the samplers uses the dataaugmentation scheme developed by Albert and Chib (1993) and McCulloch and Rossi (1994) toimpute the latent variable w (see Appendix A.1 for details). The samplers for the MNR and Gen-MNRmodels additionally incorporate the data augmentation schemes devised by Ding (2014) and Jiangand Ding (2016), respectively, to impute the latent variable q . Data augmentation circumventscomplex likelihood calculations in the estimation of the MNP, MNR and Gen-MNR models. Thisis because conditional on w and q (if applicable), the models reduce to standard Bayesian linearmodels.The full conditional distribution of ν in the MNR model as well as the full conditional distributionsof ν j and q i j in the Gen-MNR model are nonstandard. To draw from these intricate distributions,we use Metropolised Independence samplers (Liu, 2008) with approximate Gamma proposals, asdevised by Ding (2014) and Jiang and Ding (2016) (see Appendices A.2 and A.3 for details).We implement Algorithms 1, 2 and 3 in Julia (Bezanson et al., 2017). In the subsequent applications,the Gibbs samplers are executed with a single chain consisting of 300,000 draws including a warm-upperiod of 200,000 draws. A thinning factor of 10 is applied to the post warm-up draws. Convergenceis assessed with the help of the potential scale reduction factor (Gelman et al., 1992).5 lgorithm 1 Gibbs sampler for the MNP model

Step 0:

Initialise parameters β , Σ , w . for t =

1, . . . , T doStep 1: Update w i j given β , Σ , ν , w i , − j . for i =

1, . . . , N dofor j =

1, . . . , J − do Draw w i j | β , Σ , w i , − j ∼ TN ( µ i j , τ i j ) as explained in Appendix A.1. end forend forStep 2: Update β given Σ , w .Set ˆ B = (cid:128)(cid:80) Ni = X (cid:62) i Σ − X i + B (cid:138) − .Set ˆ β = ˆ B (cid:128)(cid:80) Ni = X (cid:62) i Σ − w i (cid:138) .Draw β | Σ , w ∼ N ( ˆ β , ˆ B ) . Step 3:

Update Σ given β , w .Set z i = w i − X i β , for i =

1, . . . , N .Draw ˜ Σ | β , w ∼ I W (cid:128) N + ρ , S + (cid:80) Ni = z i z (cid:62) i (cid:138) .Set α = tr ( ˜ Σ ) / ( J − ) .Set Σ = ˜ Σ /α .Set w i = ( z i + α X i β ) /α . end forreturn β , Σ lgorithm 2 Gibbs sampler for the MNR model

Step 0:

Initialise parameters β , Σ , ν , w , q . for t =

1, . . . , T doStep 1: Update w i j given β , Σ , w i , − j , q i . for i =

1, . . . , N dofor j =

1, . . . , J − do Draw w i j | β , Σ , w i , − j , q i ∼ TN ( µ i j , τ i j ) as explained in Appendix A.1. end forend forStep 2: Update q i j given β , Σ , ν j w i . for i =

1, . . . , N do Set z i = w i − X i β .Draw q i | β , Σ , w i ∼ χ ν + J − / (cid:0) z (cid:62) i Σ − z i (cid:1) . end forStep 3: Update β given Σ , w , q .Set ˆ B = (cid:128)(cid:80) Ni = q i X (cid:62) i Σ − X i + B (cid:138) − .Set ˆ β = ˆ B (cid:128)(cid:80) Ni = q i X (cid:62) i Σ − w i (cid:138) .Draw β | Σ , w , q ∼ N ( ˆ β , ˆ B ) . Step 4:

Update Σ given β , w , q .Set z i = w i − X i β , for i =

1, . . . , N .Draw ˜ Σ | β , w , q ∼ I W (cid:128) N + ρ , S + (cid:80) Ni = q i z i z (cid:62) i (cid:138) .Set α = tr ( ˜ Σ ) / ( J − ) .Set Σ = ˜ Σ /α .Set w i = ( z i + α X i β ) /α . Step 5:

Update ν given q .Calculate α ∗ , β ∗ as explained in Appendix A.2.Draw proposal ν (cid:48) ∼ Gamma ( α ∗ , β ∗ ) .Accept the proposal with probability min (cid:8)

1, exp (cid:0) l ( ν (cid:48) ) − h ( ν (cid:48) ) − l ( ν ) + h ( ν ) (cid:1)(cid:9) , where l ( ν ) and h ( ν ) are deﬁned in (9) and (10), respectively. end forreturn β , Σ , ν lgorithm 3 Gibbs sampler for the Gen-MNR model

Step 0:

Initialise parameters β , Σ , ν , w , q . for t =

1, . . . , T doStep 1: Update w i j given β , Σ , w i , − j , q i . for i =

1, . . . , N dofor j =

1, . . . , J − do Draw w i j | β , Σ , w i , − j , q i ∼ TN ( µ i j , τ i j ) as explained in Appendix A.1. end forend forStep 2: Update q i j given β , Σ , w i , q i , − j . for i =

1, . . . , N dofor j =

1, . . . , J − do Calculate α ∗ , β ∗ as explained in Appendix A.3.Draw proposal q (cid:48) i j ∼ Gamma ( α ∗ , β ∗ ) .Accept the proposal with probability min (cid:166)

1, exp (cid:128) f ( q (cid:48) i j ) − g ( q (cid:48) i j ) − f ( q i j ) + g ( q i j ) (cid:138)(cid:169) ,where f ( q i j ) and g ( q i j ) are deﬁned in (15) and (16), respectively. end forend forStep 3: Update β given Σ , w , q .Set ˆ B = (cid:128)(cid:80) Ni = X (cid:62) i Q / i Σ − Q / i X i + B (cid:138) − .Set ˆ β = ˆ B (cid:128)(cid:80) Ni = X (cid:62) i Q / i Σ − Q / i w i (cid:138) .Draw β | Σ , w , q ∼ N ( ˆ β , ˆ B ) . Step 4:

Update Σ given β , w , q .Set z i = w i − X i β , for i =

1, . . . , N .Draw ˜ Σ | β , w , q ∼ I W (cid:128) N + ρ , S + (cid:80) Ni = Q / i z i z (cid:62) i Q / i (cid:138) .Set α = tr ( ˜ Σ ) / ( J − ) .Set Σ = ˜ Σ /α .Set w i = ( z i + α X i β ) /α . Step 5:

Update ν j given q j . for j =

1, . . . , J − do Calculate α ∗ , β ∗ as explained in Appendix A.2.Draw proposal ν (cid:48) j ∼ Gamma ( α ∗ , β ∗ ) .Accept the proposal with probability min (cid:166)

1, exp (cid:128) l ( ν (cid:48) j ) − h ( ν (cid:48) j ) − l ( ν j ) + h ( ν j ) (cid:138)(cid:169) , where l ( ν ) and h ( ν ) are deﬁned in (9) and (10), respectively. end forend forreturn β , Σ , ν . Simulation study We conduct a simulation study consisting of two examples to investigate the properties of the proposedmodels and their estimation methods. The simulation study has two speciﬁc objectives. First, weaim to assess the ability of the proposed Gibbs samplers to recover model parameters in ﬁnitesamples. Second, we aim to quantify the effects of ignoring non-normality and different marginalheavy-tailedness of the kernel error distribution on ﬁt and elasticity estimates.

In the ﬁrst example, data are generated according to the MNR model. We let N =

40, 000 and J = β = ( −

2, 1, 1, −

1, 1, − ) (cid:62) and Σ = D Ω D , where Ω =   and D = diag (cid:0) (cid:112) σ (cid:1) with σ = ( ) (cid:62) . Furthermore, we let ν =

2. Here, the ﬁrst three predictors are alternative-speciﬁc constants. The remaining predictors are alternative-speciﬁc attributes. We draw X obs i jk ∼ U (

0, 2 ) for i =

1, . . . , N , j =

1, . . . , J and k =

4, . . . , 7. The fourth alternative is set as reference alternative indata generation and model estimation. For the sake of simplicity, we do not perform a search on thespeciﬁcation of the reference alternative. The simulated data intentionally exhibit signiﬁcant classimbalance. The realised market shares of the choice alternatives are 42.2%, 3.3%, 40.4% and 14.1%.Figure 2 shows the posterior distribution of the DOF parameter ν of the MNR model along withthe corresponding true parameter value used in the generation of the data. It can be seen thatAlgorithm 2 performs well at recovering the DOF parameter of the MNR model. From Figures 4 and5 in Appendix B.1, we can conclude that Algorithm 2 also does an excellent job at recovering theremaining parameters β and Σ .Table 1 compares the in-sample ﬁt of the MNP, MNR and Gen-MNR models in terms of the quadraticloss (QL), which is deﬁned as QL = (cid:80) Ni = (cid:80) Jj = (cid:0) p nj − ˆ p nj (cid:1) , where p nj is the true choice probabilitysimulated at the true parameter values that y n = j is realised, and where ˆ p nj is the correspondingﬁtted choice probability. The MNR model offers the best ﬁt to the data. Interestingly, the Gen-MNRmodel with multiple DOF parameters performs slightly worse than the simpler MNR model. A possibleexplanation for the inability of the Gen-MNR model to perform as well as the MNR model is that theestimation of multiple DOF parameters incurs a greater simulation error. Nonetheless, both the MNRand Gen-MNR models outperform the MNP model by a substantial margin.Finally, we contrast the elasticity estimates of the three models by considering two scenarios. Table2 enumerates the aggregate arc elasticities computed for each of the three models along with thecorresponding true aggregate arc elasticities. In the ﬁrst scenario, we manipulate the ﬁrst alternative-speciﬁc attribute of the under-represented second alternative. We observe that the MNR and Gen-MNRmodels produce direct aggregate arc elasticities, which are much closer to the truth than the directaggregate arc elasticities of the MNP model. For example, for a 10% increase in the consideredattribute, the true direct aggregate arc elasticity is 1.08. Whilst the MNR and Gen-MNR models givedirect elasticities of 1.04 and 1.02, respectively, the MNP model produces a substantially lower directelasticity of 0.85. In the second scenario, we manipulate the ﬁrst alternative-speciﬁc attributes of thewell-represented ﬁrst alternative. In this scenario, the models perform equally well at recovering thetrue aggregate arc elasticities. 9 .6 1.8 2.0 2.2 2.401234 TruthPosterior meanPosterior Figure 2:

Estimated posterior distribution and true value of the degree of freedom parameter ν for the MNR model in simulation example IModel Loss MNP 104.4MNR 8.9Gen-MNR 16.9Table 1:

Quadratic loss in simulation example I ruth MNP MNR Gen-MNRScenario Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 x obs n ,2,4 ∀ n =

1, . . . , N increased by 5% -0.03 1.05 -0.04 -0.05 -0.03 0.84 -0.03 -0.04 -0.03 1.01 -0.04 -0.04 -0.03 0.97 -0.04 -0.03increased by 10% -0.03 1.08 -0.04 -0.05 -0.03 0.85 -0.03 -0.04 -0.03 1.04 -0.04 -0.05 -0.03 1.02 -0.04 -0.04increased by 25% -0.04 1.16 -0.04 -0.06 -0.03 0.90 -0.03 -0.04 -0.04 1.12 -0.04 -0.05 -0.04 1.12 -0.05 -0.05 x obs n ,1,4 ∀ n =

1, . . . , N increased by 5% 0.41 -0.27 -0.28 -0.38 0.41 -0.26 -0.28 -0.37 0.41 -0.27 -0.28 -0.38 0.41 -0.31 -0.28 -0.38increased by 10% 0.42 -0.29 -0.29 -0.39 0.41 -0.28 -0.29 -0.38 0.41 -0.30 -0.29 -0.39 0.41 -0.30 -0.29 -0.39increased by 25% 0.43 -0.31 -0.32 -0.42 0.43 -0.31 -0.31 -0.42 0.43 -0.31 -0.31 -0.42 0.43 -0.32 -0.31 -0.42 Table 2:

Aggregate arc elasticities in simulation example I

Truth MNP MNR Gen-MNRScenario Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 Alt. 1 Alt. 2 Alt. 3 Alt. 4 x obs n ,2,4 ∀ n =

1, . . . , N increased by 5% -0.04 1.14 -0.03 -0.06 -0.04 0.98 -0.03 -0.05 -0.04 1.07 -0.03 -0.06 -0.04 1.11 -0.03 -0.06increased by 10% -0.04 1.16 -0.04 -0.07 -0.04 1.00 -0.03 -0.05 -0.04 1.11 -0.03 -0.07 -0.04 1.13 -0.03 -0.07increased by 25% -0.05 1.25 -0.04 -0.08 -0.04 1.06 -0.03 -0.06 -0.05 1.19 -0.04 -0.08 -0.05 1.20 -0.04 -0.08 x obs n ,1,4 ∀ n =

1, . . . , N increased by 5% 0.42 -0.37 -0.24 -0.43 0.42 -0.36 -0.24 -0.43 0.42 -0.36 -0.24 -0.43 0.42 -0.34 -0.24 -0.43increased by 10% 0.42 -0.37 -0.24 -0.44 0.42 -0.36 -0.24 -0.44 0.42 -0.36 -0.24 -0.44 0.42 -0.36 -0.24 -0.45increased by 25% 0.43 -0.39 -0.27 -0.48 0.44 -0.39 -0.27 -0.48 0.44 -0.39 -0.27 -0.47 0.44 -0.39 -0.26 -0.48 Table 3:

Aggregate arc elasticities in simulation example II .2. Example II: Data generated according to Gen-MNR In the second example, data are generated according to Gen-MNR. The data generating process isessentially same as before. The only difference is that we allow for different marginal heavy-tailednessby setting ν = (

5, 3, 1 ) (cid:62) . Furthermore, we let β = − ν , ν and ν alongwith their corresponding true parameter values. It can be seen that Algorithm 3 performs well atrecovering the DOF parameters of Gen-MNR. From Figures 6 and 7 in Appendix B.2, we can concludethat Algorithm 3 also does an excellent job at recovering β and Σ .Table 4 compares the in-sample ﬁt of MNP, MNR and Gen-MNR in terms of the quadratic loss.As expected, Gen-MNR provides the best ﬁt to the data, followed by MNR. MNR and Gen-MNRoutperform MNP by a signiﬁcant margin.Finally, Table 3 enumerates the aggregate arc elasticities of the three models along with their truecounterparts for the exact same scenarios as in the ﬁrst simulation example. In the ﬁrst scenario, wemanipulate the ﬁrst alternative-speciﬁc attribute of the under-represented second alternative. Weobserve that MNR and Gen-MNR produce direct aggregate arc elasticities, which are closer to thetruth than the corresponding estimates of MNP. For example, the true direct aggregate arc elasticityfor a 10% increase in the considered attribute is 1.16. MNR and Gen-MNR produce direct aggregatearc elasticities of 1.10 and 1.13, respectively. By contrast, MNP returns a markedly lower directaggregate arc elasticity of 1.00. Overall, the differences between MNR and Gen-MNR are minor. Inthe second scenario, we manipulate the ﬁrst alternative-speciﬁc attribute of the well-representedﬁrst alternative. In this scenarios, all models perform equally well at recovering the true direct andindirect arc elasticities. Figure 3:

Estimated posterior distributions and true values of the degree of freedom parameters ν , ν and ν for Gen-MNR in simulation example II odel Loss MNP 68.3MNR 25.7Gen-MNR 9.0Table 4:

Quadratic loss in simulation example II

5. Case study

We apply MNP, MNR and Gen-MNR in a case study on transport mode choice behaviour.

Revealed preference data for the case study are sourced from the London Passenger Mode Choice(LPMC) dataset, which was compiled and made publicly available by Hillel et al. (2018). The LPMCdataset consists of trip records from the London Travel Demand Survey, which was conducted from2012 to 2015. For each trip record, Hillel et al. (2018) imputed tailored choice sets includingthe attributes of the chosen and the non-chosen alternatives using an online directions applicationprogramming interface. For more information about the LPMC dataset, the reader is directed to Hillelet al. (2018). In this case study, we restrict our analysis to home-based trips reported by individualswho are at least 12 years old. The resulting dataset comprises 58,584 observations. There are fourmode choice alternatives, namely walking, cycling, transit and driving with observed market sharesof 16.6%, 3.2%, 37.4% and 42.8%, respectively. We hold out 10% of the data corresponding to 5,858observations for out-of-sample validation.Each of the three models uses the same speciﬁcation of the systematic utility, as shown in Table5. The variable “trafﬁc variability” is a measure of the driving travel time uncertainty for the givenorigin-destination pair. It is deﬁned as the difference between the travel times in a pessimistic trafﬁcscenario and in an optimistic one divided by the travel time in a typical, best-guess trafﬁc scenario(see Hillel et al., 2018). The drive alternative is set as reference alternative in the estimation ofall models. We performed a search over the speciﬁcation of the reference alternative but found nosubstantive differences in parameter estimates, in-sample ﬁt and out-of-sample predictive accuracyfor different speciﬁcations of the reference alternative.13 ariable Walk Cycle Transit Drive

Alternative-speciﬁc constants β asc, cycle β asc, transit β asc, drive Alternative-speciﬁc attributesCost [ GBP ] β cost β cost Out-of-vehicle time (ovtt) [ hours ] β ovtt β ovtt β ovtt In-vehicle time (ivtt) [ hours ] β ivtt β ivtt No. of transfers β transfers Trafﬁc variability (tv) β tv Individual- and context-speciﬁc attributesFemale traveller β female, cycle β female, transit β female, drive Traveller age <

18 years β ( age <

18 years ) ∨ ( age ≥

65 years ) , cycle β age <

18 years, transit β age <

18 years, drive

Traveller age ≥

65 years β ( age <

18 years ) ∨ ( age ≥

65 years ) , cycle β age ≥

65 years, transit β age ≥

65 years, drive

Travel during winter period (Nov–Mar) β winter, cycle No. of household cars β cars, drive Table 5:

Utility speciﬁcation by alternative in MNP, MNR and Gen-MNR for the case study

Table 6 compares the in-sample ﬁt and the out-of-sample predictive ability of MNP, MNR and Gen-MNR. We calculate Brier scores on the training and test data for each of the models. The Brier score(BS; Brier, 1950) is deﬁned as BS = (cid:80) Ni = (cid:80) Jj = (cid:0) { y n = j } − ˆ p nj (cid:1) , where { y n = j } is an indicator,which equals one if the condition inside the braces is true and zero otherwise, and where ˆ p nj is thepredicted probability that y n = j is observed. The Brier score is a strictly proper scoring rule, becauseit is uniquely minimised by the true predictive choice probabilities (Gneiting and Raftery, 2007).Closely followed by MNR, Gen-MNR provides the best ﬁt to the training data and exhibits superiorout-of-sample predictive accuracy. Both MNR and Gen-MNR outperform MNP by a signiﬁcant margin. Brier scoreModel Train Test

MNP 22002.7 2436.8MNR 21696.1 2404.9Gen-MNR 21681.8 2401.9

Table 6:

In-sample ﬁt and out-of-sample predictive ability of MNP, MNR and Gen-MNR for thecase study5.2.2. Parameters estimates

Table 7 presents the estimates of the parameters of the MNP, MNR and Gen-MNR models. For eachparameter, we report the posterior mean, the posterior standard deviation and the bounds of the 95%credible interval.First, we examine the estimates of the DOF parameters in MNR and Gen-MNR. For both models,we ﬁnd evidence of sizeable heavy-tailedness. For instance, the posterior mean of the genericDOF parameter ν in MNR is 2.031, which is indicative of substantial heavy-tailedness. We detect14ronounced and distinct marginal heavy-tailedness in Gen-MNR. As expected, heavy-tailedness is mostsubstantial for the utility differences involving the under-represented walking and cycling alternatives.The posterior means of the associated DOF parameters equal 2.583 and 4.315, respectively. By contrast,tails are only moderately heavy for the utility difference between transit and the drive alternatives,since the posterior mean of the associated DOF parameter is 10.647. The credible intervals of thethree DOF parameters of Gen-MNR also do not overlap, which indicates that heavy-tailedness in eachdimension of the kernel error distribution is statistically different.In sum, the estimates of the DOF parameters suggest that utility aberrance characterises a non-negligible proportion of the observed transport mode choices. The Gen-MNR model also providesseveral interesting behavioural insights which the MNR model fails to reveal. As the DOF parameterof the utility difference between the passive and frequently-chosen transit and drive alternatives isthe largest of all DOF parameters, the utilities of these alternatives are comparatively less aberrantthan the utilities of the other modes. However, aberrance is noticeably stronger for utility differencesinvolving the active and less frequently-chosen walking and cycling alternatives.Next, we compare the estimates of the taste parameters β . Since the scale of β is not necessarilythe same in each of the three models, we contrast the sensitivities to alternative-speciﬁc attributes interms of their implied willingness to pay (WTP), which is given by the ratio of a non-price coefﬁcientof interest and the price coefﬁcient. WTP is scale-free and allows for a money-metric representationof sensitivities. WTP for reductions in out-of-vehicle and in-vehicle time is slightly larger in MNRand Gen-MNR than in MNP. To be precise, WTP for a reduction in out-of-vehicle time is 33.8 GBP / h,35.1 GBP / h and 37.0 GBP / h, and WTP for a reduction in in-vehicle time is 18.4 GBP / h, 19.9 GBP / h,20.6 GBP / h in MNP, MNR and Gen-MNR, respectively. Interestingly, MNR gives a lower WTP fora reduction in trafﬁc variability than the two other models. Whereas the implied value of a 10%reduction in trafﬁc variability is 2.0 GBP in the MNP model and 1.9 GBP in Gen-MNR, the impliedvalue of a 10% reduction in trafﬁc variability is only 1.7 GBP in MNR. MNR also implies a lowertransfer penalty. The implied transfer penalties in MNP and Gen-MNR are 1.4 GBP and 1.5 GBP,respectively, per transfer. By contrast, the implied transfer penalty in MNR is 1.2 GBP per transfer.On the whole, the WTP estimates show that the three models can produce different economicvaluations of non-price attributes. Strikingly, the WTP estimates for reductions in trafﬁc variabilityand transfers of MNP and Gen-MNR resemble each other closely and differ noticeably from thecorresponding WTP estimates of MNR.The models also provide insights into the inﬂuence of individual- and context-speciﬁc attributeson mode choice propensities. Interestingly, there are no substantive differences between the threemodels. For example, all models suggest that female travellers are relatively less likely to cycle andrelatively more likely to use transit. No gender differences in the propensity to use the driving modeare detected. Old age reduces the propensity to cycle but increases the propensities to use transit andthe driving mode. Furthermore, travel during winter months reduces the propensity to cycle. Higherlevels of car ownership increase the propensity to select the driving mode. Table 8 enumerates aggregate arc elasticities for various policy-relevant scenarios. Our ﬁrst observationis that the elasticities of demand for cycling in response to changes in the out-of-vehicle travel timeof the cycling alternative differ markedly across the three models. Whereas MNP and MNR suggest15hat demand is inelastic, Gen-MNR indicates that demand is elastic. For example, the aggregate arcelasticity for a 10% decrease in the out-of-vehicle time of the cycling alternative is − − − − − NP MNR Gen-MNRParameter Mean Std. dev. [ ] Mean Std. dev. [ ] Mean Std. dev. [ ] β asc, cycle -1.986 0.053 -2.089 -1.878 -2.887 0.183 -3.242 -2.563 -1.907 0.103 -2.098 -1.662 β asc, transit -0.296 0.011 -0.318 -0.275 -0.598 0.022 -0.640 -0.557 -0.587 0.020 -0.627 -0.548 β asc, drive -0.832 0.023 -0.877 -0.786 -1.654 0.051 -1.757 -1.563 -1.621 0.048 -1.733 -1.527 β cost -0.057 0.002 -0.062 -0.052 -0.118 0.006 -0.129 -0.106 -0.097 0.004 -0.106 -0.089 β ovtt -1.934 0.041 -2.006 -1.850 -4.136 0.119 -4.396 -3.962 -3.605 0.083 -3.786 -3.430 β ivtt -1.052 0.036 -1.124 -0.981 -2.348 0.086 -2.543 -2.200 -2.003 0.065 -2.132 -1.873 β tv -1.184 0.039 -1.260 -1.108 -2.049 0.075 -2.196 -1.906 -1.817 0.060 -1.939 -1.699 β transfers -0.080 0.008 -0.094 -0.065 -0.139 0.015 -0.167 -0.110 -0.142 0.012 -0.166 -0.117 β female, cycle -0.660 0.037 -0.734 -0.588 -1.850 0.193 -2.270 -1.499 -1.041 0.097 -1.258 -0.866 β winter, cycle -0.149 0.030 -0.207 -0.091 -0.342 0.085 -0.513 -0.178 -0.189 0.045 -0.282 -0.102 β ( age <

18 years ) ∨ ( age ≥

65 years ) , cycle -0.519 0.049 -0.616 -0.425 -2.041 0.290 -2.641 -1.507 -1.025 0.121 -1.270 -0.800 β female, transit β age <

18 years, transit β age ≥

65 years, transit β female, drive β age <

18 years, drive -0.445 0.023 -0.490 -0.400 -0.989 0.047 -1.085 -0.903 -0.790 0.037 -0.863 -0.720 β age ≥

65 years, drive β cars, drive Σ walk − drive,walk − drive Σ walk − drive,cycle − drive -0.109 0.069 -0.248 0.028 -0.490 0.192 -0.849 -0.115 0.541 0.075 0.387 0.683 Σ walk − drive,transit − drive Σ cycle − drive,cycle − drive Σ cycle − drive,transit − drive Σ transit − drive,transit − drive ν ν walk − drive ν cycle − drive ν transit − drive Table 7:

Estimated parameters of MNP, MNR and Gen-MNR for the case study NP MNR Gen-MNRScenario Walk Cycle Transit Drive Walk Cycle Transit Drive Walk Cycle Transit Drive

Cycling out-of-vehicle travel time decreased by 5% -0.06 -0.87 0.06 0.04 -0.08 -0.70 0.05 0.05 -0.07 -1.06 0.07 0.04decreased by 10% -0.03 -0.86 0.06 0.03 -0.04 -0.71 0.04 0.04 -0.02 -1.08 0.07 0.04decreased by 25% -0.01 -0.83 0.06 0.03 -0.01 -0.71 0.03 0.04 0.00 -1.08 0.07 0.04Walking out-of-vehicle travel time decreased by 5% -1.41 0.03 0.48 0.16 -1.63 0.06 0.56 0.15 -1.63 0.31 0.55 0.15decreased by 10% -1.35 0.02 0.48 0.16 -1.56 0.04 0.57 0.15 -1.56 0.30 0.55 0.15decreased by 25% -1.26 0.02 0.53 0.17 -1.42 0.04 0.62 0.17 -1.43 0.32 0.61 0.17Transit out-of-vehicle travel time decreased by 5% 0.25 0.31 -0.40 0.23 0.34 0.14 -0.44 0.25 0.33 0.32 -0.45 0.25decreased by 10% 0.28 0.29 -0.39 0.22 0.37 0.13 -0.44 0.24 0.37 0.29 -0.45 0.25decreased by 25% 0.29 0.28 -0.36 0.21 0.39 0.12 -0.40 0.23 0.38 0.27 -0.41 0.23Transit in-vehicle travel time decreased by 5% 0.10 0.27 -0.33 0.24 0.11 0.15 -0.37 0.27 0.10 0.31 -0.37 0.27decreased by 10% 0.12 0.24 -0.33 0.23 0.15 0.12 -0.36 0.26 0.14 0.28 -0.36 0.26decreased by 25% 0.13 0.22 -0.29 0.21 0.16 0.11 -0.33 0.24 0.15 0.25 -0.33 0.24Transit fares increased by 5% 0.14 0.09 -0.13 0.05 0.17 0.02 -0.15 0.06 0.17 0.05 -0.14 0.06increased by 10% 0.11 0.09 -0.13 0.06 0.13 0.03 -0.14 0.07 0.13 0.07 -0.13 0.06increased by 25% 0.09 0.11 -0.13 0.07 0.11 0.05 -0.14 0.07 0.11 0.09 -0.14 0.07Driving cost increased by 5% 0.07 0.01 0.05 -0.08 0.10 0.03 0.04 -0.08 0.09 0.00 0.04 -0.08increased by 10% 0.04 0.02 0.06 -0.07 0.05 0.05 0.05 -0.07 0.05 0.02 0.06 -0.07increased by 25% 0.02 0.02 0.07 -0.07 0.03 0.05 0.06 -0.07 0.03 0.03 0.06 -0.07Driving in-vehicle travel time increased by 5% 0.12 0.12 0.25 -0.28 0.15 0.18 0.29 -0.32 0.15 0.13 0.28 -0.32increased by 10% 0.09 0.13 0.27 -0.28 0.11 0.20 0.30 -0.32 0.10 0.16 0.29 -0.32increased by 25% 0.08 0.13 0.28 -0.30 0.09 0.22 0.31 -0.34 0.08 0.18 0.31 -0.34Driving travel time uncertainty decreased by 5% 0.06 0.19 0.31 -0.31 0.03 0.27 0.29 -0.28 0.03 0.24 0.30 -0.29decreased by 10% 0.09 0.18 0.31 -0.31 0.07 0.25 0.28 -0.28 0.07 0.22 0.28 -0.28decreased by 25% 0.10 0.17 0.29 -0.29 0.09 0.23 0.26 -0.26 0.09 0.20 0.27 -0.26

Table 8:

Aggregate arc elasticities for MNP, MNR and Gen-MNR for the case study . Conclusion Models that are robust to violations of modelling assumptions and safeguard inferences againstaberrant choice behaviour have received limited attention in discrete choice analysis. In this paper,we present Bayesian formulations of two robust alternatives to the multinomial probit (MNP) model.These alternatives belong to the family of robit models whose kernel error distributions are heavy-tailed t-distributions. The ﬁrst alternative is the multinomial robit model, in which a single, genericdegrees of freedom (DOF) parameter controls the heavy-tailedness of the kernel error distribution.The second alternative is a generalised multinomial robit (Gen-MNR) model, whose kernel errordistribution is a t-distribution with alternative-speciﬁc DOF parameters. The kernel error distributionof Gen-MNR is more ﬂexible than the kernel error distribution of MNR, as it allows for differentmarginal heavy-tailedness. To the best of our knowledge, Gen-MNR has not been studied in theliterature before. For both models, we devise scalable and gradient-free Gibbs samplers, which addressthe limitations of estimation approaches of existing robit choice models.We contrast MNP, MNR and Gen-MNR in a simulation study and a case study on transport modechoice behaviour. The simulation study illustrates the excellent ﬁnite-sample properties of theproposed Bayes estimators. We also show that MNR and Gen-MNR yield more faithful elasticityestimates if the true data generating process involves a heavy-tailed kernel error distribution. In thecase study, we demonstrate that both MNR and Gen-MNR outperform MNP by a signiﬁcant margin interm of in-sample ﬁt and out-of-sample predictive ability. More speciﬁcally, Gen-MNR delivers the bestin-sample ﬁt and out-of-sample predictive due to its more ﬂexible kernel error distribution. Gen-MNRalso produces more plausible elasticity estimates than MNP and MNR, in particular regarding thedemand for under-represented alternatives in a class-imbalanced data set.On the whole, our analysis suggests that Gen-MNR is a useful addition to the choice modeller’stoolbox due to its robustness properties. In general, Gen-MNR should be preferred over the previously-studied MNR model because of its more ﬂexible kernel error distribution. In practice, the non-ellipticalcontoured t-distribution used in the formulation of Gen-MNR can also be speciﬁed in a way suchthat one DOF parameter controls the heavy-tailedness of more than one marginal of the kernel errordistribution. Analysts can exploit this feature of Gen-MNR to achieve more parsimonious modelspeciﬁcations.Our work suggests several directions for future research. First, the hierarchical Bayesian modellingparadigm can be leveraged to accommodate ﬂexible parametric and semi-parametric representationsof unobserved taste heterogeneity (see Krueger et al., 2020) into the MNR and Gen-MNR models.Incorporating these representations only requires adding another layer to the proposed Gibbs samplingschemes. Second, ﬂexible nonlinear speciﬁcations of the systematic utility can be incorporated intothe MNR and Gen-MNR models to enhance their expressiveness and predictive abilities. For example,Kindo et al. (2016) propose a MNP model in which the systematic utilities are represented usingthe Bayesian additive regression trees (BART) model of Chipman et al. (2010). BART automaticallypartitions a large predictor space to capture interaction effects and nonlinearities. As BART hasfoundations in the Bayesian inferential paradigm, BART components can be incorporated into MNRand Gen-MNR with relative ease. Third, the proposed MNR and Gen-MNR models can be extendedto skew-t-distributed kernel errors which can also account for asymmetric error distributions (Kimet al., 2008; Lee and Mclachlan, 2014). 19 uthor contribution statement

RK: conception and design, method development and implementation, data processing and analysis,manuscript writing and editing, supervision.PB: conception and design, manuscript writing and editing.MB: conception and design, manuscript editing, supervision.TG: conception and design, method development and implementation, manuscript writing andediting. 20 eferences

Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data.

Journal of the American statistical Association , 88(422):669–679.Alptekino˘glu, A. and Semple, J. H. (2016). The exponomial choice model: A new alternative forassortment and price optimization.

Operations Research , 64(1):79–93.Bezanson, J., Edelman, A., Karpinski, S., and Shah, V. B. (2017). Julia: A fresh approach to numericalcomputing.

SIAM review , 59(1):65–98.Brathwaite, T. and Walker, J. L. (2018). Asymmetric, closed-form, ﬁnite-parameter models of multi-nomial choice.

Journal of choice modelling , 29:78–112.Brier, G. W. (1950). Veriﬁcation of forecasts expressed in terms of probability.

Monthly weather review ,78(1):1–3.Burgette, L. F. and Nordheim, E. V. (2012). The trace restriction: An alternative identiﬁcation strategyfor the bayesian multinomial probit model.

Journal of Business & Economic Statistics , 30(3):404–410.Castillo, E., Menéndez, J. M., Jiménez, P., and Rivas, A. (2008). Closed form expressions for choiceprobabilities in the weibull case.

Transportation Research Part B: Methodological , 42(4):373–380.Chikaraishi, M. and Nakayama, S. (2016). Discrete choice models with q-product random utilities.

Transportation Research Part B: Methodological , 93:576–595.Chipman, H. A., George, E. I., McCulloch, R. E., et al. (2010). Bart: Bayesian additive regressiontrees.

The Annals of Applied Statistics , 4(1):266–298.Del Castillo, J. (2016). A class of rum choice models that includes the model in which the utility haslogistic distributed errors.

Transportation Research Part B: Methodological , 91:1–20.Del Castillo, J. (2020). Choice probabilities of random utility maximization models when the errorsdistribution is a polynomial copula with gumbel marginals.

Transportmetrica A: Transport Science ,16(3):439–472.Dill, J. and Rose, G. (2012). Electric bikes and transportation policy: Insights from early adopters.

Transportation research record , 2314(1):1–6.Ding, P. (2014). Bayesian robust inference of sample selection using selection-t models.

Journal ofMultivariate Analysis , 124:451–464.Dubey, S., Bansal, P., Daziano, R. A., and Guerra, E. (2020). A generalized continuous-multinomialresponse model with a t-distributed error kernel.

Transportation Research Part B: Methodological ,133:114–141.Fosgerau, M. and Bierlaire, M. (2009). Discrete choice models with multiplicative error terms.

Transportation Research Part B: Methodological , 43(5):494–505.Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013).

Bayesiandata analysis . CRC press. 21elman, A., Rubin, D. B., et al. (1992). Inference from iterative simulation using multiple sequences.

Statistical science , 7(4):457–472.Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation.

Journalof the American statistical Association , 102(477):359–378.Hillel, T., Elshaﬁe, M. Z., and Jin, Y. (2018). Recreating passenger mode choice-sets for transportsimulation: A case study of london, uk.

Proceedings of the Institution of Civil Engineers-SmartInfrastructure and Construction , 171(1):29–42.Imai, K. and Van Dyk, D. A. (2005). A bayesian analysis of the multinomial probit model usingmarginal data augmentation.

Journal of econometrics , 124(2):311–334.Jiang, Z. and Ding, P. (2016). Robust modeling using non-elliptically contoured multivariate tdistributions.

Journal of Statistical Planning and Inference , 177:50–63.Kim, S., Chen, M.-H., and Dey, D. K. (2008). Flexible generalized t-link models for binary responsedata.

Biometrika , 95(1):93–106.Kindo, B. P., Wang, H., and Peña, E. A. (2016). Multinomial probit bayesian additive regression trees.

Stat , 5(1):119–131.Krueger, R., Rashidi, T. H., and Vij, A. (2020). A dirichlet process mixture model of discrete choice:Comparisons and a case study on preferences for shared automated vehicles.

Journal of ChoiceModelling , 36:100229.Lange, K. L., Little, R. J., and Taylor, J. M. (1989). Robust statistical modeling using the t distribution.

Journal of the American Statistical Association , 84(408):881–896.Lee, S. and Mclachlan, G. J. (2014). Finite mixtures of multivariate skew t-distributions: some recentand new results.

Statistics and Computing , 24(2):181–202.Liu, C. (2004). Robit regression: a simple robust alternative to logistic and probit regression.

AppliedBayesian Modeling and Casual Inference from Incomplete-Data Perspectives , pages 227–238.Liu, J. S. (2008).

Monte Carlo strategies in scientiﬁc computing . Springer Science & Business Media.McCulloch, R. and Rossi, P. E. (1994). An exact likelihood analysis of the multinomial probit model.

Journal of Econometrics , 64(1-2):207–240.McFadden, D. (1981). Econometric models of probabilistic choice.

Structural analysis of discrete datawith econometric applications , 198272.Paleti, R. (2019). Discrete choice models with alternate kernel error distributions.

Journal of theIndian Institute of Science , pages 1–10.Peyhardi, D. J. (2020). Robustness of student link function in multinomial choice models.

Journal ofChoice Modelling , 36:100228.Rayaprolu, H. S., Llorca, C., and Moeckel, R. (2020). Impact of bicycle highways on commutermode choice: A scenario analysis.

Environment and Planning B: Urban Analytics and City Science ,47(4):662–677. 22obert, C. and Casella, G. (2013).

Monte Carlo statistical methods . Springer Science & Business Media.Scarinci, R., Markov, I., and Bierlaire, M. (2017). Network design of a transport system based onaccelerating moving walkways.

Transportation Research Part C: Emerging Technologies , 80:310–328.Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation.

Journal of the American statistical Association , 82(398):528–540.23 ppendix A Gibbs sampling details

A.1 Sampling w To update w , we iteratively sample from univariate truncated normal distributions. We have w i j ∼ T N ( µ i j , τ i j ) , for i =

1, . . . , N , j =

1, . . . , J −

1. (7)For MNP, we have µ i j = X (cid:62) i j β + Σ j , − j Σ − − j , − j ( w i , − j − X i , − j β ) and τ i j = Σ j j − Σ j , − j Σ − − j , − j Σ − j , j . For MNR,we have µ i j = X (cid:62) i j β + Σ j , − j Σ − − j , − j ( w i , − j − X i , − j β ) and τ i j = ( Σ j j − Σ j , − j Σ − − j , − j Σ − j , j ) / q i . For Gen-MNR,we have µ i j = X (cid:62) i j β + Q − / i j j Σ j , − j Σ − − j , − j Q / i , − j , − j ( w i , − j − X i , − j β ) and τ i j = ( Σ j j − Σ j , − j Σ − − j , − j Σ − j , j ) / q i j .Here, the index − l denotes the vector without the l th element. For all models, the constraint on w i j is w i j ≥ max { w i , − j } , if y i j = j ; w i j <

0, if y i j = J ; w i j ≤ max { w i j (cid:48) } , if y i j = j (cid:48) (cid:54) = j . A.2 Sampling ν The full conditional distribution of ν is nonstandard. Ding (2014) shows that p ( ν |· ) ∝ exp (cid:167) N ν (cid:16) ν (cid:17) − N log Γ (cid:16) ν (cid:17) + ( α − ) log ν − ξν (cid:170) , (8)where ξ = β + (cid:80) Ni = q i − (cid:80) Ni = log q i . Γ ( x ) denotes the Gamma function. Ding (2014) proposesto sample from (8) using a Metropolised Independence sampler (Liu, 2008) with an approximateGamma proposal. The shape parameter α ∗ and the rate parameter β ∗ of the proposal density areobtained as follows. The log conditional density of ν up to an additive constant is l ( ν ) = N ν (cid:16) ν (cid:17) − N log Γ (cid:16) ν (cid:17) + ( α − ) log ν − ξν . (9)The log density of the Gamma proposal is h ( ν ) = ( α ∗ − ) log ν − β ∗ ν . (10)The ﬁrst and second derivates of l ( ν ) and h ( ν ) are l (cid:48) ( ν ) = N (cid:104) log (cid:16) ν (cid:17) + − ψ (cid:16) ν (cid:17)(cid:105) + α − ν − ξ , h (cid:48) ( ν ) = α ∗ − ν − β ∗ , (11) l (cid:48)(cid:48) ( ν ) = N (cid:149) ν − ψ (cid:48) (cid:16) ν (cid:17)(cid:152) + α − ν , h (cid:48)(cid:48) ( ν ) = − α ∗ − ν , (12)where ψ ( x ) and ψ (cid:48) ( x ) are the di- and trigamma functions, respectively. The mode of h ( ν ) is α ∗ − β ∗ andthe corresponding curvature is ( β ∗ ) α ∗ − . We numerically ﬁnd the mode ν ∗ of l ( ν ) and its correspondingcurvature l ∗ = l (cid:48)(cid:48) ( ν ∗ ) . Ultimately, we match the modes and the corresponding curvatures of l ( ν ) and h ( ν ) to obtain α ∗ = − ( ν ∗ ) l ∗ , β ∗ = − ν ∗ l ∗ . (13)24 .3 Sampling q i j The full conditional distribution of q i j is nonstandard. Jiang and Ding (2016) show that p ( q i j |· ) ∝ exp (cid:26) − q i j u i j − (cid:112) q i j c i j + ν j −

12 log q i j (cid:27) , (14)where u i j = ν j +( Σ − ) j j ( w i j − X (cid:62) i j β ) and c i j = ( w i j − X (cid:62) i j β ) (cid:80) j (cid:48) (cid:54) = j (cid:128)(cid:113) q i j (cid:48) ( Σ − ) j j (cid:48) ( w i j − X (cid:62) i j β ) (cid:138) . Jiangand Ding (2016) propose to sample from (14) using a Metropolised Independence sampler (Liu,2008) with an approximate Gamma proposal. The shape parameter α ∗ and the rate parameter β ∗ ofthe proposal density are obtained as follows. For ν j ≤

1, we set α ∗ = β ∗ = u i j . For ν j > α ∗ and β ∗ are obtained through matching the modes and the corresponding curvatures of the target andthe proposal densities. The log conditional density of q i j up to an additive constant is f ( q i j ) = − q i j u i j − (cid:112) q i j c i j + ν j −

12 log q i j . (15)The log density of the Gamma proposal is g ( q i j ) = ( α ∗ − ) log q i j − β ∗ q i j . (16)The mode of (16) and its corresponding curvature are α ∗ − β ∗ = m ∗ i j and ( β ∗ ) α ∗ − = l ∗ i j , respectively. Theﬁrst and second derivatives of (15) are f (cid:48) ( q i j ) = − u i j − c i j (cid:112) q i j + ν j − q i j , f (cid:48)(cid:48) ( q i j ) = c i j (cid:199) q i j − ν j − q i j . (17)The mode of (15) is m ∗ i j = (cid:130) ci j + (cid:200)(cid:128) ci j (cid:138) + u i j ( ν j − ) ν j − (cid:140) − , and the corresponding curvature is l ∗ i j = f (cid:48)(cid:48) ( m ∗ i j ) .After matching the modes and corresponding curvatures of the log target and the log proposaldensities, we obtain α ∗ = − ( m ∗ i j ) l ∗ i j , β ∗ = − m ∗ i j l ∗ i j . (18)25 ppendix B Additional results for the simulation study B.1 Example I Figure 4:

Estimated posterior distribution and true values of the taste parameters { β , β , β , β } for MNR in simulation example I Figure 5:

Estimated posterior distribution and true values of the unique elements of the covari-ance matrix Σ for MNR in simulation example I .2 Example II Figure 6:

Estimated posterior distribution and true values of the taste parameters { β , β , β , β } for the Gen-MNR model in simulation example II Figure 7:

Estimated posterior distribution and true values of the unique elements of the covari-ance matrix Σ for the Gen-MNR model in simulation example IIfor the Gen-MNR model in simulation example II