[PDF] Analyzing Client Behavior in a Syringe Exchange Program

Abstract

Multiple syringe exchange programs serve the Chicago metropolitan area, providing support for drug users to help prevent infectious diseases. Using data from one program over a ten-year period, we study the behavior of its clients, focusing on the temporal process governing their visits to service locations and on their demographics. We construct a phase-type distribution with an affine relationship between model parameters and features of an individual client. The phase-type distribution governs inter-arrival times between reoccurring visits of each client and is informed by characteristics of a client including age, gender, ethnicity, and more. The inter-arrival time model is a sub-model in a simulation that we construct for the larger system, which allows us to provide a personalized prediction regarding the client's time-to-return to a service location so that better intervention decisions can be made with the help of simulation.

Full PDF

AAnalyzing Client Behavior in a Syringe Exchange Program

July 25, 2019

Haoxiang Yang , Yue Hu , David P. Morton Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston IL, [email protected], [email protected] Decision, Risk, and Operations, Columbia Business School, New York NY, [email protected]

Abstract

Multiple syringe exchange programs serve the Chicago metropolitan area, providing supportfor drug users to help prevent infectious diseases. Using data from one program over a ten-yearperiod, we study the behavior of its clients, focusing on the temporal process governing theirvisits to service locations and on their demographics. We construct a phase-type distributionwith an aﬃne relationship between model parameters and features of an individual client. Thephase-type distribution governs inter-arrival times between reoccurring visits of each client andis informed by characteristics of a client including age, gender, ethnicity, and more. The inter-arrival time model is a sub-model in a simulation that we construct for the larger system, whichallows us to provide a personalized prediction regarding the client’s time-to-return to a servicelocation so that better intervention decisions can be made with the help of simulation.

Keywords: syringe exchange program, personalized prediction, phase-type distribution, discrete-event simulation

Three major agencies provide syringe exchange programs (SEPs) in the Chicago metropolitanarea: Community Outreach Intervention Projects (COIP), Chicago Recovery Alliance (CRA), andTest Positive Aware Network (TPAN). Each agency oﬀers equipment and educational services andconducts research on drug users. With the goal of supporting persons who inject drugs (PWIDs)and helping prevent the spread of infectious diseases, they provide services including street outreach,counseling and training for preventing HIV and hepatitis C, case management for persons living withHIV, assistance in entering treatment for substance use, and HIV medical, mental, and pharmacycare. Their locations include storefronts and mobile vans, which may operate according to a ﬂexibleschedule. The beneﬁts of an SEP are twofold. First, multiple studies have shown that SEPs areeﬀective in reducing risk behavior such as sharing syringes [6, 7, 9, 22, 24], thereby lowering ratesof HIV and hepatitis C transmission. Second, higher utilization of an SEP provides PWIDs withmore opportunities to learn about treatment programs, which further reduces drug use [14, 23].This paper focuses on one of the SEPs in Chicago, and we refer to program participantsas clients.

Service locations, including storefronts and mobile vans, accept used syringes and, inexchange, provide clients with new syringes along with other devices that help prevent the spreadof disease, such as condoms, cookers, puriﬁed water, and bleach. On their ﬁrst visit to a servicelocation, clients are asked to take a voluntary survey involving demographic information and thenature of their drug use (frequency, types of drugs, etc.), which we detail in Section 2. Once a client1 a r X i v : . [ s t a t . A P ] J u l s established in the system, the SEP keeps a record of frequency of drug use, health condition,and general living condition. During a visit, an SEP employee will have a personal conversationwith a client, e.g., about recent life changes, employment status, and family situation. The SEPwill further provide the client with information to help with health issues, and seek to introducethe client to drug treatment programs. Evidence has shown that this type of personal interactioncan help clients obtain peer support to recover from substance addiction [13, 21, 25].Nationwide, about 681,000 Americans aged 12 years or older reported using heroin in 2013 [1],and the number of reported users grew every year from 2007 to 2013, with new users growingabout 70% from 2002 to 2013. The volume of heroin seized by oﬃcials, and the number of heroinoverdoses, both grew over the same period in Chicago [1]. The contrast between the signiﬁcantgrowth in the use of heroin and slightly lower use of services at the target SEP (see Section 2)motivates our study. In order to promote its services and tailor them to individual clients, theagency needs a better understanding of client behavior in using the SEP.The arrival process of clients to SEP sites is key to understanding their use of SEP services, andwe have data on arrival times and locations over a ten-year period. Our main focus is to develop a contextual understanding of the arrival process; i.e., we want to understand inter-arrival times given features of an individual client obtained, in part, from the voluntary demographic survey. Our datasuggest some clients “establish care” with the SEP, returning consistently, while others use SEPservices once and never return. We seek to model and understand the “life cycle” of an individualclient from initiation, reoccurring visits, and termination with the program. Through interviewswith SEP staﬀ, we learned that ethnicity, gender, age, geographic location, and drug history aﬀecthow a client will use SEP services. For example, African Americans are less active in SEP programsbecause they tend to prefer not to disclose their drug habits, and because African Americans whouse drugs tend to snort rather than inject. A client’s interaction with the SEP is aﬀected by lifeevents; e.g., a person who moves farther from a service location may visit less frequently. Theseare snapshots of a pervasive phenomenon that suggest many factors can inﬂuence client behaviorwhen using SEP services.To address these issues, we propose a model for how a client engages with the SEP using threeintegrated sub-models for initiation, reoccurring visits, and termination. The latter two sub-modelsare fully integrated and involve a phase-type distribution, which may be viewed as a continuous-time Markov chain with hidden states. To build a contextual model, we express the Markov chain’sparameters as a function of client features using linear and logistic regression. Analysis withour model can help the SEP understand the importance of diﬀerent features and hence estimatethe distribution governing a speciﬁc client’s next arrival time. This, in turn, can help SEP staﬀbetter allocate limited resources to improve the program’s eﬀectiveness. Armed with a probabilitydistribution for the timing of a client’s next visit, SEP staﬀ can be alerted when a speciﬁc client has2ot used SEP services for an unusual period of time, which may point to risky drug-use behavior.SEP employees can then contact the client (e.g., via a text message), or dispatch a mobile van tospeciﬁc locations and message nearby clients. Using simulation, we illustrate the potential value ofusing our model in this manner.Modeling arrival processes plays a key role in many application domains; see [26] for a review ofrelevant literature in healthcare. Homogeneous and nonhomogeneous Poisson processes are widelyused to model an arrival process [2, 18, 20], but do not address our primary goal, i.e., to provideinsights regarding a client’s return time based on individual predictors associated with that client.In principle, we could partition clients into diﬀerent categories based on their features and ﬁt sucharrival processes based on these categories. However, such an approach scales poorly given thenumber of features we consider.As we will discuss, our data on inter-arrival times of clients to SEP sites have heavier tails thanthat of an exponential distribution, and this is true even when we develop models that condition onsub-populations of the clients. This may arise for two related reasons: (i) the “establishing care”nature of some clients discussed above, and (ii) clients eﬀectively transitioning between hiddenstates, which capture active and passive engagement with the SEP. The three sub-model approachthat we propose allows us to capture these eﬀects, and its contextual nature captures heterogeneity.We use a negative binomial model for initiation, which is consistent with an over-dispersed mixtureof Poisson distributions that captures heterogeneity. We use a two-state continuous-time Markovchain model with unobservable states to capture reoccurring visits and termination. These twointegrated sub-models allow us to reasonably represent issues (i) and (ii) and their conditionaltransition- and system-exit probabilities allow us to capture heterogeneity.Hidden Markov chain models have been used to model scenarios in which an agent transitionsbetween a modest number of states, each associated with certain patterns of behavior. Paddocket al. [30] construct a Markov chain model to understand trajectories of a marijuana user with thegoal of using simulation to assess alternative treatment and prevention policies. Liu et al. [27] learna continuous-time hidden Markov chain model for disease progression in glaucoma and Alzheimer’sdisease. Chehrazi et al. [11, Appendix C] discuss using a two-state Markov chain to model therepayment behavior associated with delinquent credit card accounts, in which the two states rep-resent high and low repayment rates, with the goal of directing credit collection eﬀorts. HiddenMarkov chains enable modelers to capture plausible, but unobservable, transitions of an agent.Our approach can further enhance such models by allowing the parameters of the Markov chainto depend on agent-speciﬁc features using regression. Given requisite data, the types of modelsjust sketched could beneﬁt from our approach, increasing model ﬁdelity and insights from analysis,by linking agent heterogeneity to the Markov chain model. A technical challenge that we addressin this paper—at least for the family of serial Coxian models that we detail—involves parameter3stimation when using regression to map the features of an agent to the Markov chain’s parameters.We ﬁrst provide, in Section 2, an overview of the demographic and arrival data collected bythe SEP between 2005 and 2014. In Section 3, we provide details of the derivation of the threesub-models: initiation, reoccurring visits, and termination, which together capture individual fea-tures associated with clients. We present our computational techniques and results in Section 4,with model validation and an example of active intervention that our model can recommend. Weconclude the paper in Section 5 by summarizing the model and insights from our analysis. The data from the SEP consist of results from a survey and records of individual syringe exchangetransactions. The transaction data were collected from January 2001 to November 2014 with139,488 entries. The survey data were collected between July 2005 and November 2014 with 6,843surveys. Each survey entry corresponds to a unique client. When a new client arrives at a servicesite, the client is assigned a unique study number (henceforth, client ID) and is asked to completean enrollment survey. This client ID is then used throughout the client’s sojourn in the system.We use the data from July 2005 to November 2014 because the surveys are aligned consistentlywith the transaction records in this period. After removing incomplete and contaminated records,we combine the transaction and survey data to obtain a merged dataset with 63,960 entries, eachwith 50 data ﬁelds, which we detail in the following section.

The survey contains 31 questions, which yield 33 predictors, covering basic demographic informa-tion, ZIP code of residence, and the client’s drug use habits. These predictors are categorized asfollows: • Basic personal information: age, ethnicity, gender; • Length of time using/injecting drugs; • Frequency of using/injecting drugs in the past 30 days; • Frequency of reusing one’s own syringes in the past 30 days; • Frequency of sharing syringes with others in the past 30 days; • Involvement in group injection in the past 30 days; • Sources of syringes; • Types of drugs used; • Drug treatment program participation in the past six months;4

Reasons and frequency of being in the area with an SEP location in the past 30 days.Among 5,903 clients in our merged dataset, 4,101 are male, 1,800 are female, and two are trans-gender. The demographics of these clients are summarized in Tables 1 and 2.Ethnicity Number of clients Percentage (%)White 3,045 51.58African American 1,384 23.45Puerto Rican 873 14.78Mexican 364 6.17Other Latino 72 1.22Other 165 2.80Total 5,903 100Table 1: Ethnicity of clientsAge (years) Number of clients Percentage (%) <

15 1 0.0216 −

30 2,602 44.0831 −

45 2,108 35.7146 −

60 1,111 18.82 >

60 81 1.37Total 5,903 100Table 2: Age of clientsThe average age of ﬁrst drug use among all clients is 23.5 years. About 67.9% of clients injectdrugs daily, and 81.7% injected drugs more than 20 out of 30 days before taking the survey. Forall reasonable responses to the survey question regarding injection frequency (deﬁned as at most10 injections per day), the average number of injections per day is 2.77. About 82.6% of clientsreported that they did not use someone else’s syringe in the past 30 days, and 77.9% reported thatno one used their syringes over the same time period. Among 986 clients who injected with syringesothers had used in the past 30 days, 558 shared with their spouse, 53 shared with a family member,350 shared with a friend, 24 shared with an acquaintance, and 11 shared with a stranger. About72.1% of the clients did not share cookers, cotton or water during injection, and 8.6% of the clientsstated that they injected drugs in a shooting gallery within the 30 days before they began using thesyringe exchange service. About 69.2% of the clients were in the neighborhood of the SEP locationwhere they took the survey more than 20 days in a typical 30-day month, 25.1% of whom lived inthe neighborhood with an SEP location, 54.5% of whom had come mainly to buy drugs, 2.0% ofwhom had come mainly to exchange syringes, and 4.6% of whom had come to visit friends. Thevoluntary nature of the survey could lead to a non-response bias in the results; see, for example,5igure 1: The time series of transactions from six SEP service locationsthe illustrated cases and discussion in [12, 28]. Because every SEP transaction is labeled with aclient ID, we can compute the fraction of clients for whom we have a completed survey, which is98.6%. Discussions with SEP staﬀ conﬁrmed that although the survey is voluntary, almost everynew client takes the survey upon the recommendation of SEP staﬀ, and so the overall eﬀect ofnon-response bias is reasonably small in our study.

The transaction data include details of syringe exchanges that occurred in multiple SEP storefrontsand on mobile van routes. During each transaction, SEP employees record the client ID, numberof syringes exchanged, number of other preventive devices distributed, size of the group comingwith the client, and type(s) of health education material given to the client. In between July 2005and November 2014, the SEP distributed 3,647,384 syringes, 160,895 male and female condoms,and 63,667 sets of educational material. The mean size of the group coming with the client duringa single transaction was 1.89 with a standard deviation of 2.37. The mean number of syringesexchanged in one transaction was 57.03 with a standard deviation of 110.88. Our discussions withthe SEP suggested that there are ample staﬀ to process clients’ visits, and that the SEP alwayshad enough syringes and rarely ran out of other drug-injection equipment. Thus we see the levelof censoring in the demand data as minimal.We present a time series of the number of transactions from July 2005 to November 2014 in6igure 2: The time series of aggregated monthly transactions and syringes exchangedFigure 1. The ﬁgure shows transactions at four storefronts and mobile distribution in two areas,where a mobile van was dispatched to certain locations with a ﬂexible schedule. According toSEP staﬀ, the schedule for the van was communicated to clients during their visits and at someshooting galleries. Clients could also inquire about the schedule through phone calls. The serviceat Location 6 was terminated in January 2010, and some of its clients started visiting Location 4afterward, consistent with the increasing trend for Location 4. Figure 1 also suggests a decliningnumber of visits for Location 2. Figure 2 shows the ﬂuctuation of monthly aggregated transactions.The ﬁgure suggests a slight decrease in the number of monthly transactions over the ten-year timespan. The number of syringes exchanged surged between 2009 and 2012 but appeared to decreaseafterward.

We seek to develop a predictive model for a client’s arrival process based on a number of predictorssuch as race, gender, age, and more. To this end, we segment a client’s experience in the system intothree sub-processes: initiation, reoccurring visits, and termination. Features of the client are usedas covariates to estimate parameters of the model of reoccurring visits and termination. We canachieve two objectives with our model. First, we can forecast the next arrival of a speciﬁc client,given that client’s features and most recent arrival time. Second, we can simulate the system andperform sensitivity analysis on speciﬁc model parameters. The former can help the SEP identify7igure 3: Empirical distribution ﬁt of number of daily initiations to a negative binomial distributionirregular behavior and take prompt intervention measures. The latter can guide initiatives toimprove system-wide performance.The overall structure of the model we formulate is that we build sub-models of these threeindividual sub-processes. While we have analytical models of these sub-components, our overallmodel, which combines these sub-models, can only be executed as a simulation, as we describe aftercharacterizing the sub-models.

The ﬁrst time that a client uses the SEP is called the initiation of the client’s arrival process. Therecorded initiation data are clear, and so we focus on how to simulate new initiations. We examinethe distribution of the number of initiations per day, i.e., the arrivals generated by clients who havenever previously visited a service location. Since our SEP started recording survey data four yearsafter recording transactions, some returning clients were asked to complete the survey starting inJuly 2005, even though their true initiation was earlier. In an attempt to avoid inﬂating someinitiation counts, we use data starting from January 2007 in order to ﬁt a distribution to estimatethe initiation process. The blue bars in Figure 3 show the empirical distribution of initiations perday. We ﬁt a negative binomial distribution with parameter (3 , . We track the history of clients who visit the SEP service sites multiple times and plot the distri-bution of inter-arrival times. The inter-arrival time is deﬁned here as the duration between twoconsecutive visits made by the same client. Figure 4 suggests that the distribution has a heavytail, i.e., it shrinks to zero more slowly than an exponential.As we suggest in Section 1, using a Poisson process to model inter-arrival times does not helpwith our main goal, which is to provide contextual, i.e., client-speciﬁc predictions. Putting thisaside for a moment, we tested the goodness of ﬁt associated with a Poisson process for our aggregatedataset, and for datasets associated with sub-populations of clients based on ethnicity, age, andgender. Moreover, we investigated both homogeneous and non-homogeneous (e.g., piecewise con-stant arrival rate by week) Poisson processes. For the models we assessed, statistical tests yielded p -values that were vanishingly small, suggesting that such models fail to provide an adequate rep-resentation. This is consistent with our observation from Figure 4, giving further evidence thatmodeling inter-arrival times using an exponential distribution may not be appropriate.Figure 4: Log-log relationship between frequency and inter-arrival time. The logarithms in theﬁgure are base 10, and underlying inter-arrival times are in days.We model client inter-arrival times with a phase-type distribution, because of its goodness ofﬁt and its potential interpretability. The distribution of any nonnegative random variable can9e approximated with high accuracy using a phase-type distribution; see, e.g., [3]. A phase-typedistribution can be expressed as the time required for a continuous-time Markov chain (CTMC) toenter an absorbing state (say, state 0) from a randomly selected transient state, { , , . . . , n } ; see,e.g., [10]. A probability mass function, denoted by α = ( α i ) i =1 ,...,n , governs the initial state of theCTMC. The inﬁnitesimal generator is constructed in the following manner: (cid:20) a Q (cid:21) , (1)where a is an n -dimensional vector specifying the transition rates from the transient states tothe absorbing state, and Q is an n × n matrix specifying the transition rates among transientstates, where Q ( i, j ) denotes the rate of transitioning from state i to state j , and Q ( i, i ) = − [ (cid:80) j (cid:54) = i,j =1 ,...,n Q ( i, j ) + a ( i )]. The ﬁrst row in the generator of equation (1) corresponds to thetransition rates from the absorbing state to any other transient state, which are always 0. The prob-ability density function (pdf) of the phase-type distribution can be characterized as f ( t ) = αe Qt a ,and the cumulative distribution function (cdf) is given by F ( t ) = 1 − αe Qt , where is the n -dimensional vector of all 1’s.There are multiple ways to ﬁt a phase-type distribution to data; see, e.g., [29]. Here, weformulate a nonlinear optimization model rooted in maximum-likelihood estimation (MLE), coupledwith a regression model that uses covariates of the clients. We ﬁrst describe the MLE approach inthe context of the Coxian distribution, a special case of phase-type distributions, and then coupleit with client-speciﬁc predictors.For a Coxian distribution, the embedded Markov chain has n + 1 states, as shown in Fig-ure 5. The stochastic process starts in state 1, i.e., α = (1 , , . . . , i = 1 , . . . , n −

1, we can transition to only the adjacent transient state, i + 1 or to the absorbingstate, 0. The rate at which we depart state i is γ i , and we transition to the absorbing state withprobability q i and to the adjacent transient state with probability 1 − q i . As Figure 5 also depicts,transient state n can only transition to the absorbing state.Figure 5: CTMC depiction of the Coxian distribution10igure 6: An equivalent CTMC to the Coxian distribution’s model in Figure 5A Coxian distribution is more parsimonious than a general phase-type distribution. The formermodel contains 2 n − n + n . Bobbio and Cumani [8]show that the Coxian model in Figure 5 is equivalent to another CTMC model that is depictedin Figure 6. The latter formulation is helpful in our setting because it oﬀers a linear structure forcapturing client-speciﬁc features as we describe below. The relationship between the q i parametersin the ﬁrst model and the β i parameters in the second model is given by the following equations: q = β = 0 β i = q i i − (cid:89) k =0 (1 − q k ) i = 1 , , . . . , nq i = β i − (cid:80) i − k =0 β k i = 1 , , . . . , n. A CTMC lends itself to interpretation by associating transitions in the model with assumedphases of a client using syringes after leaving a service location. In addition to the referencesmentioned in Section 1, which use CTMCs with hidden states [11, 27, 30], we note that such anapproach has also been used to model the length of stay of hospital patients [15, 16, 17], includingwork in which serial (Coxian) CTMCs are employed. We use a similar philosophy by inferringtransition rates from unobservable states and, moreover, connecting them to features of a client.We can interpret a Coxian distribution with n = 2 in our setting as follows. After visiting an SEPlocation, the client enters an (unobservable) “active state” with probability β , and subsequentlyreturns to an SEP site after an exponentially distributed delay with parameter γ . Alternatively,with probability β = 1 − β the client enters a “passive state.” Returning to an SEP site is thenthe sum of two independent exponential random variables with rates γ and γ , where we expect γ > γ . The passive state could correspond to the client temporarily seeking another source ofsyringes, for example.Bobbio and Cumani [8] present an MLE procedure to ﬁt the parameters for the Coxian dis-tribution. Their method, however, is not directly applicable when we express β and γ as aﬃnefunctions of predictors associated with clients. A result of [5] allows us to express the pdf and cdfof a sum of independent exponential random variables, and we can use this result to write the pdf11nd cdf of a Coxian random variable as: f ( t ) = n (cid:88) i =1 β n +1 − i f i ( t ) = n (cid:88) i =1  β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j )  (2) F ( t ) = n (cid:88) i =1 β n +1 − i F i ( t ) = n (cid:88) i =1  β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 (1 − e − γ j t ) /γ j (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j )  . (3)Here f i and F i are the pdf and cdf of the inter-arrival time, conditioned on beginning in state i . Weindex the collection of inter-arrival times by S , and denote each inter-arrival time by t s , s ∈ S , andsimilarly denote censored inter-arrival times by t u , u ∈ U . For every client, the time from the lastarrival to the end of the observation horizon can be considered a right-censored inter-arrival time.The likelihood function is the product of the pdf of each inter-arrival time and the complement ofthe cdf of each censored inter-arrival time. Maximizing the log-likelihood function then leads tothe following problem:max β,γ ≥ (cid:88) s ∈S log( f ( t s )) + (cid:88) u ∈U log (1 − F ( t u )) (4a)s.t. n (cid:88) i =1 β i = 1 (4b) f, F deﬁned as in (2) and (3), ∀ t s , s ∈ S and t u , u ∈ U , respectively . (4c)The ﬁrst term in the objective function of model (4) corresponds to observed inter-arrival times,and the second term corresponds to right-censored data in which we do not know the inter-arrivaltime, only that it exceeds, t u ; see, e.g., [31] for such treatments of right-censored data. We notethat a limiting analysis shows that equations (2)-(3) remain valid even when rates at distinct statesare identical [5]. That said, this can cause numerical diﬃculties and we return to this issue below.So far, the described ﬁtting procedure assumes all clients behave according to the same model.As discussed above, we seek to incorporate the features of clients when we construct the parametersof the Coxian distribution. Here we use an aﬃne relationship to connect those features to theparameters of the Coxian distribution. We use V to represent the set of clients, and we use j = 1 , , . . . , m to index the characteristics of clients. We use x j,v , ∀ j = 1 , . . . , m, v ∈ V , to denotethese predictors. We also use v ( s ) ∈ V to specify the client associated with the s -th inter-arrivaltime, and we similarly deﬁne v ( u ) for the client associated with the u -th censored inter-arrival time.Our extension of model (4) to incorporate client-speciﬁc predictors is given by: max β,γ,b,g,(cid:15) (cid:88) s ∈S log( f ( t s )) + (cid:88) u ∈U log (1 − F ( t u )) − η β (cid:107) (cid:15) β (cid:107) (5a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (5b) i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (5c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (5d) f ( t s ) = n (cid:88) i =1  β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t s (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1)  ∀ s ∈ S (5e) F ( t u ) = n (cid:88) i =1  β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1)  ∀ u ∈ U (5f) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (5g) γ i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V . (5h) The idea behind model (5) is that we have predictors associated with each client, and con-straints (5c) and (5d) express the parameters of the Coxian model as an aﬃne function of thesepredictors. Constraint (5b) replicates constraint (4b), and constraints (5e) and (5f) deﬁne the pdfand cdf terms that appear in the log-likelihood in the ﬁrst two terms of the objective function.Model (5) combines elements of regression and maximum likelihood estimation. The ﬁrst twoterms in the objective function maximize log-likelihood in the spirit of model (4). Constraints (5c)-(5d) deﬁne the regression model. Parameters β i,v and γ i,v are unobservable, and so, in principle,we could have no residual term in the regression model. However, constraint (5b) requires that theconditional probabilities that we return to states 1 , , . . . , n sum to 1. So, to maintain feasibilitywe add a residual, (cid:15) βv , in equation (5b) and penalize its two-norm using a positive weight η β inthe ﬁnal term in the objective function (5a). While not explicit in model (5)’s statement, to helpprevent overﬁtting, we regularize the regression parameters b and g by adding terms − η b (cid:107) b (cid:107) and − η g (cid:107) g (cid:107) to the objective function (5a).While a limiting analysis shows the validity of equation (2)-(3) even when some of the compo-nents of γ are identical, allowing this when optimizing can cause numerical problems. Moreover,our motivation, sketched above, includes the idea that the sojourn times should be larger in thepassive state than in the active state, and for both reasons we add the following constraint tomodel (5): γ i,v − γ i +1 ,v ≥ δ, ∀ i = 1 , , . . . , n − , v ∈ V , where δ > . (6) We assume that each client has a possibility to exit the system after each visit to the SEP.Speciﬁcally, as soon as the CTMC hits the absorbing state, we assume that, with probability p v , client v ∈ V will stop visiting our SEP.However, we cannot observe a client leaving the system because we only observe their visits. Ifa client visits service locations multiple times, then we know that for every visit before the last one,13he client is still in the system, and so the likelihood function is conditioned on the client remainingin the system. After the last visit, a client may stay in the system or may leave. We need toincorporate this information in the likelihood function. Conditioned on the client remaining in thesystem, the likelihood function is 1 − F ( t u ), where t u , u ∈ U , is the time between the client’s lastvisit and the end of the observation horizon. As a result, the log-likelihood function can be revisedas: (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) . (7)We again model parameter p v via a functional relationship with client v ’s covariates. Instead of anaﬃne relationship, we use a logistic function as follows: p v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − , (8)where we will optimize the ﬁt via parameters ρ j and ρ . Given that p v ∈ (0 , p by combining the results of (5), (7), and (8). Since thelikelihood function is conditioned on whether the client has exited the system, we need to solve anonlinear optimization problem as follows, which ﬁts parameters for both reoccurring visits andtermination, integrating our latter two sub-models: max β,γ,b,g,ρ,p,(cid:15) (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) − η β (cid:107) (cid:15) β (cid:107) (9a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (9b) β i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (9c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (9d) p v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − ∀ v ∈ V (9e) f ( t s ) = n (cid:88) i =1  β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t s (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1)  ∀ s ∈ S (9f) F ( t u ) = n (cid:88) i =1  β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1)  ∀ u ∈ U (9g) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (9h) γ i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V . (9i) Solving problem (9) maximizes the log-likelihood function for the combination of reoccurringvisits and termination. Similar to ﬁtting the reoccurring visits, we also add a regularization term14 η ρ (cid:107) ρ (cid:107) in the objective function to prevent overﬁtting. Constraint (9e) models the logistic rela-tionship between the parameter p and the covariates x . Given the ﬁt value of ρ ∗ and the featuresof client v , we can calculate the termination probability of that client, p v . In this section, we ﬁrst discuss how we solve model (9) and its simpler variants, along with pre-liminary results in which we do not use the covariates of the clients. Then we present the resultsof the ﬁt model, provide insights as to how diﬀerent features predict client behavior, test elementsof model validity, and show an example of how our personalized arrival model can guide activeintervention.

We use a Coxian model with n = 2 transient states. We interpret one phase as the client being inan active state with the SEP, i.e., with more frequent visits to exchange syringes, and we interpretthe other phase as a passive state with less frequent visits.Model (9) and its variants are computationally challenging nonconvex optimization problems.We use Ipopt 3.12.1 [32], with linear solver MA27, to solve instances of these optimization problems.Due to nonconvexity, we only obtain locally optimal solutions. In addition, because numerical issuescan arise, we brieﬂy sketch ways in which we “help” the solver.We scale all continuous data, i.e., client predictor data, so that it is normalized. As discussedat the end of Section 3.2, we enforce γ ,v ≥ γ ,v + δ , and we use δ = 0 .

005 in our computation. Fornumerical reasons, we also bound the γ -parameters away from zero, by enforcing γ ≥ ¯ γ ≡ . β, γ , and p . We do this for two reasons. First, it provides insight regarding typicalvalues of these parameters, and second, as we discuss in further detail below, it helps provide agood initial solution for model (9). In particular we solve:max β,γ,p (cid:88) s ∈S (log( f ( t s )) + log(1 − p )) + (cid:88) u ∈U log ((1 − F ( t u ))(1 − p ) + p ) (10a)s.t. n (cid:88) i =1 β i = 1 (10b) f ( t s ) = n (cid:88) i =1  β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t s (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j )  ∀ s ∈ S (10c) F ( t u ) = n (cid:88) i =1  β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t u /γ j (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j )  ∀ u ∈ U (10d) γ i − γ i +1 ≥ δ ∀ i = 1 , , . . . , n − i ≥ ∀ i = 1 , , . . . , n (10f) γ n ≥ ¯ γ (10g)0 ≤ p ≤ . (10h)Solving model (10) leads to parameters for a “featureless” client, as follows:ˆ β = 0 . , ˆ β = 0 . γ = 0 . , ˆ γ = 0 . p = 0 . . This result suggests that after each visit, the featureless client has a 9 .

8% chance of exiting theSEP system. Conditional on the client visiting an SEP site again, the client returns via the activestate (frequent visits) with a probability of about 0 .

82. The mean return time to the SEP fromthis state is 1 /γ ≈

19 days. With probability about 0 .

18, the client returns via the passive state,and the expected time to visit the SEP is then 1 /γ + 1 /γ ≈

350 days.Rather than optimizing over the intercept terms, b , g and ρ in model (9), we ﬁxed theseterms as: b , = ˆ β b , = ˆ β g , = ˆ γ g , = ˆ γ ρ = ln (cid:18) ˆ p − ˆ p (cid:19) . Fixing the intercept terms in this way allows us to interpret parameters b i,j , g i,j , and ρ j for j =1 , , . . . , m as deviations from the featureless client. Moreover, ﬁxing these parameters helps improvethe numerical performance of Ipopt when solving the nonconvex problem, in part by eﬀectivelyproviding a good initial solution.After adding the regularization terms for b , g and ρ described in Section 3.2 and 3.3, and ﬁxingthe value of b , g and ρ , we solve the following nonlinear program: max β,γ,b,g,ρ,p,(cid:15) (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) − η β (cid:107) (cid:15) β (cid:107) − η b (cid:107) b (cid:107) − η g (cid:107) g (cid:107) − η ρ (cid:107) ρ (cid:107) (11a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (11b) β i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (11c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (11d) v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − ∀ v ∈ V (11e) f ( t s ) = n (cid:88) i =1  β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1)  ∀ s ∈ S (11f) F ( t u ) = n (cid:88) i =1  β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1)  ∀ u ∈ U (11g) γ i,v − γ i +1 ,v ≥ δ ∀ i = 1 , , . . . , n − , v ∈ V (11h) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (11i) γ n,v ≥ ¯ γ ∀ v ∈ V (11j) b i, = ˆ β i , g i, = ˆ γ i , ρ = ln (cid:18) ˆ p − ˆ p (cid:19) ∀ i = 1 , , . . . , n. (11k) With modest tuning eﬀort, we select the following weights on the regularization terms: η β = 100 , η b = 100 , η g = 1000 , η ρ = 10 . The results from ﬁtting the parameters using the method in Section 4.1 are displayed in Table 3.The estimators are represented by ρ , b , and g . A positive ρ j , j = 1 , , . . . , m , indicates that havingfeature j increases the probability that the client will exit the system. Given that the client staysin the system, a positive value of parameter b ,j increases the probability that the client returns tothe active state. We do not report b ,j because its coeﬃcient diﬀers from b ,j ’s by a sign. Positivecoeﬃcients g ,j and g ,j lead to increased frequencies, i.e., shorter mean times, associated with theactive and passive states, respectively. The coeﬃcients in Table 3 are given in either regular font orgray font. The former category is signiﬁcant, and the latter is not, where “signiﬁcant” is deﬁned ashaving at least 90% of bootstrapped replications having the same sign, as detailed in Appendix B.Column ∆ in Table 3 shows the amount by which the conditional expected inter-arrival timechanges (in days) if a client has that row’s feature but is otherwise a featureless client. Column T similarly shows the magnitude by which the expected sojourn time in the system changes due toa single feature. For context, the mean sojourn time of a featureless client is 806 . T ).17actor ( j ) ρ j b ,j g ,j g ,j ∆ T Gallery 0.5448 -0.0387 0.0027 0.0003 5.9 -268.2From Other Locations 0.2047 -0.0156 0.0023 0.0004 -3.2 -162.2From Other SEP -0.3156 0.0248 -0.0063 0.0000 -6.2 186.3From Friends -0.1194 -0.0045 -0.0011 -0.0007 20.9 330.1From Strangers -0.2697 0.0142 0.0020 0.0010 -19.3 -26.8Speedball -0.0293 -0.0704 0.0146 -0.0008 48.1 524.9Heroin -0.0437 -0.0258 0.0061 -0.0008 31.4 365.6In Treatment -0.3254 0.0243 -0.0126 0.0002 -4.7 215.7Been in Treatment -0.1137 -0.0178 0.0063 -0.0001 5.9 154.8Female 0.0154 0.0217 -0.0076 0.0008 -14.8 -159.8White 0.0000 0.0218 -0.0005 0.0002 -9.9 -101.0African American 0.3623 -0.0073 -0.0026 0.0001 1.6 -209.6Puerto Rican -0.7594 0.0084 0.0101 0.0001 -7.4 673.3Mexican -0.3383 0.0014 0.0014 0.0010 -15.6 76.0Other -0.1994 0.0088 -0.0054 0.0000 -0.7 151.9Age of First Drug Use 1.1219 0.0117 0.0007 0.0003 -8.8 -525.7Drug Use Span 1.8506 -0.0042 -0.0030 -0.0001 4.4 -602.3FUD 0.0332 0.0073 -0.0011 -0.0001 0.4 -19.6FROS 0.0380 -0.0051 0.0000 0.0000 1.7 -10.5FBSA -0.1968 -0.0033 0.0058 0.0002 -5.2 95.4

Note: Here, ρ j , b ,j , and g ,j / g ,j are factor-speciﬁc regression coeﬃcients for the probability of exiting the system,probability of returning to the active state, and mean transition times in the CTMC, respectively. Based on thefactor in each row, parameter ∆ denotes the amount by which the conditional expected inter-arrival time changes,and T similarly denotes changes in the system sojourn time, both in days. Table 3: Fitted parameters of the Coxian process2. Clients who can obtain syringes from other locations, such as pharmacies, are more likely toexit the system, perhaps because they are not reliant on SEP services. On the other hand,if the client obtains syringes from other sources (other SEPs, friends, and strangers), whichmay not be as reliable, it is more likely for the client to remain in the system.3. The b -coeﬃcient associated with speedball (a type of drug mixing cocaine with heroin ormorphine) is strongly negative, meaning the client is less likely to stay in the active state,leading to an increase in expected inter-arrival time.4. A client in a treatment program is more likely to stay in the SEP system, as indicated by anegative value of ρ and a positive value of T .5. A female client is more likely to visit frequently, but remains in the system for a shorterperiod of time.6. The probability of exiting the system diﬀers signiﬁcantly according to ethnicity: African-American clients are more likely to exit the system while the opposite is true for PuertoRican and Mexican clients. 18. Not surprisingly, clients who are more frequently near an SEP site (i.e., have larger values ofFBSA) are less likely to leave the system, are more likely to visit a site frequently, and overallhave longer sojourn times in the system.Such observations from the model we ﬁt may allow our SEP to tailor promotion of their servicesto speciﬁc target populations. For example, the starkly diﬀerent behavior of African-Americanclients may warrant special attention from the SEP in early encounters, relative to PWIDs of otherethnicities. Bean [4] states that African-American drug users are less likely to inject drugs thanWhite drug users. However, among all clients, it is not clear why African Americans are less likelyto seek the syringe exchange services oﬀered by our SEP. Investigating whether African Americansare more likely to quickly abandon injection drug use, versus continue use but not seek SEP services,would likely be needed to guide such strategies. Our results show that it would be beneﬁcial toincrease the frequency of clients being in the service location area, and one possible solution is toincrease the frequency of the mobile van service in certain locations to improve accessibility. The model of Section 3 quantiﬁes client behavior. In this section, we perform statistical tests andcompare results of the simulation outputs with observed data in order to assess model validity.We begin with further details on the implementation of the simulation model. We simulate thearrival of clients to the SEP over 2,310 days, equivalent to the number of days that our SEP wasopen between July 2005 and November 2014. We also simulate an initial 5,000 days as a warm-upperiod, since our SEP was established more than 15 years before 2005. We assume SEP staﬀ arealways available to serve a visiting client, there are no shortages of syringes or other resources thatwould alter client behavior, and the location and operating schedule of storefronts and mobile vansare ﬁxed. In other words, we assume the nature of client visits is governed solely by the features ofthe clients, in a manner consistent with historical data, and is not aﬀected by service or resourceavailability. This matches our understanding of the actual system, as we discuss in Section 2. Inour simulation of 7,310 days in total, for each day we ﬁrst simulate the number of new clientsaccording to the negative binomial distribution. We assign features to these clients by drawing aclient at random, with replacement, from the list of 5,903 clients described in Section 2. Given thefeatures of a client, x , and the parameters, ρ , b , and g , obtained from model (11), we can calculatethe parameters of the CTMC, β i , γ i , i = 1 ,

2, and p , by: β i = m (cid:88) j =1 b i,j x j i = 1 , γ i = m (cid:88) j =1 g i,j x j i = 1 , a) Coxian Inter-arrival Model (b) Exponential Inter-arrival Model Figure 7: Part (a) of the ﬁgure shows the log-log relationship between frequency and the Coxianinter-arrival time model. Part (b) shows the analogous relationship for the exponential inter-arrivaltime model. In both subplots the (simulated) model values are shown with red dots and observeddata are shown with blue dots. The logarithms are base 10 and the underlying inter-arrival timesin days. p = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j ) (cid:17) − . (12c)With the CTMC built for every client, upon the arrival of a speciﬁc client, we simulate whetherthis client exits the system using p from equation (12c). And, if the client does not exit the system,we simulate the time of the next arrival using the β and γ values from equations (12a)-(12b). Afterthe warm-up period and simulating the 2,310 days of interest, we obtain summary statistics asoutputs based on inter-arrival times, sojourn times, and number of visits of each client.We perform statistical tests to assess the quality of our sub-models for initiation, reoccurringvisits, and termination. For initiation, we selected the negative binomial distribution for reasonsthat we discuss in Section 3.1. Using a Pearson’s chi-squared goodness-of-ﬁt test we obtain a p -value of 0.250, suggesting that we should not reject the null hypothesis that the data are consistentwith the ﬁt distribution. For comparison, we also ﬁt other commonly used distributions (geometric,binomial, uniform, Poisson, and hyper-geometric), and we performed the same goodness-of-ﬁt tests.None of those distributions had a p -value that exceeded a 0.05 level of signiﬁcance.For reoccurring visits, we compare the distribution of inter-arrival times obtained from thesimulation model with an exponential distribution with a mean of 67 .

50 days, which is the meanof observed inter-arrival times. Figure 7 illustrates such a comparison. Results from the simulatedCoxian process are shown on the left, and those from the exponential distribution are on theright, both in red. In addition, we plot actual observations in blue. The ﬁgure suggests that ourCoxian-based simulation model provides a better ﬁt to the observed data.Since we do not directly observe whether a client has left the system, we test the right-censoredsojourn time of clients in the system. We run a Pearson’s chi-squared procedure to test the null20igure 8: The log-log (base 10) relationship between frequency and simulated sojourn time is shownwith red dots, and the same relationship between frequency and observed sojourn time is shownwith blue dots.hypothesis that the simulated distribution is consistent with observed data. The p -value of the testis 0 . Our simulation model can facilitate analysis to provide SEP staﬀ with insights regarding: (i) speciﬁcclients who are likely to enter a passive state or exit the system, and (ii) dispatch policies for themobile van. We discuss both of these in turn.

While not immediate from Table 3, our discrete-event simulation model can be used to estimatethat a 40-year old Puerto Rican male, with a history of using drugs for 21 years, who injects heroin10 times per day, wants treatment, uses syringes after others once every 30 days, and frequentsthe area of an SEP site, has a 95% chance of having entered the passive state if he does not visita service location within 56 days, assuming he has not already terminated contact with the SEPsystem. As a result, SEP staﬀ could send a text message to such a client as a reminder if he hasnot returned within two months.

The value of actively reaching out to clients based on insights from our simulation model may befurther enhanced by mobile exchange of syringes. Here, we simulate mobile van dispatch, witha personalized notiﬁcation push, to show its potential to improve current SEP operations. Thecorresponding simulation model has the following constructs and assumptions.

Clients.

We simulate the initiation and reoccurring visits of clients in the same way describedin Section 4.3 with the following exception: We assume that when a client exits the system, thereis a probability, denoted by p r , that the client is eligible to return to the system with activeintervention. We say that a client is at risk if they reuse syringes, either their own or the syringesof another, and we note that 22.4% of our 5,903 clients are at risk from the survey data. Mobile van.

We assume the SEP has one van, and each weekday the van is dispatched toone of ﬁve ZIP codes. We further assume that any client that the SEP contacts within a ﬁve-mile22adius of that ZIP code is eligible to be served by the van. We selected the ﬁve ZIP codes by solvinga facility-location model, which maximizes coverage of at-risk clients. The van visits each of theseﬁve ZIP codes in turn, Monday-Friday, each week over the simulation horizon.

Risky behavior.

We assume only at-risk clients engage in risky behavior. And, we assumean at-risk client does not engage in risky behavior if the client is in the CTMC’s active state, butotherwise the client does exhibit risky behavior.

Intervention.

The SEP cannot observe a client’s state or behavior, and hence interventiondecisions are made knowing the time since the client’s last visit and the client’s predictors. Inparticular, γ is the rate associated with the exponential distribution governing the return time, if the client is in the active state. We assume the SEP contacts a client if: (i) the time since the lastvisit exceeds the 0.9-level quantile for the active state’s exponential return-time distribution, and(ii) the client’s ZIP code is within ﬁve miles of the ZIP code the van is visiting that day. Re-engaging a client.

An intervention can be successful if the client is in the active state,passive state, or has exited but is eligible to return (with probability p r ). Among these clients, welet p s denote the probability that a contacted client will visit the van, and hence re-engage withthe SEP. If a client ignores the notiﬁcation, we assume the client stays in the same state: active,passive, or exited. We further assume the SEP stops contacting a client after three notiﬁcationattempts have been ignored.Using the simulation model, we compare active intervention with current SEP operations, andwe estimate the relative eﬀectiveness by examining: (i) the number of additional arrivals to thesystem, and (ii) the number of times an SEP intervention re-engages a client who would otherwisebe engaging in risky behavior. We estimate p r as follows. The 0.975-level quantile of the observedinter-arrival time is 552 days, and we use this as a proxy for whether the client has exited thesystem. We denote the number of inter-arrival times exceeding 552 days by N r , and the number ofclients who have not visited any service location after 552 days by N e . We estimate p r via: p r = N r N r + N e . With our data, N r = 1354 and N e = 4281, and so p r ≈ .

24. As we indicate above, we use the 0.9-level quantile for client-speciﬁc return-time distributions from the active state to make interventiondecisions, and Figure 9 shows a histogram of these quantiles among the surveyed clients.We run the simulation model under two diﬀerent system designs: (i) with active intervention viavan dispatch and client notiﬁcation, and (ii) without active intervention, approximating current SEPoperations. The results are shown in Table 5 as we parametrically vary the success probability, p s .After the warm-up period of 5,000 days, the average number of daily arrivals in the last 2,310 dayshas a mean of 24.6 and a 95% conﬁdence interval halfwidth of 0.9. For each value of p s in the table,we run the simulation 20 times and present 95% conﬁdence intervals. The “Total” column under23igure 9: Histogram of the 90% client-speciﬁc quantile values for the return-time distributions fromthe CTMC’s active state for the 5,903 surveyed clients.“No. of Interventions” shows the number of times over 2,310 days that a notiﬁed client visited themobile van, and the “Risky” column indicates the subset of those successful interventions involvingat-risk clients in the passive or exited state. p s Average Daily No. of InterventionsArrivals Risky Total0 24 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . p s = 0) and SEP with active interventionvia van dispatch and client notiﬁcation. Here, p s denotes the success probability associated withclient notiﬁcation, and the ± values represent 95% conﬁdence interval halfwidths.Under current SEP operations ( p s = 0), we see about 56,800 client arrivals over the 2,310-dayhorizon. Table 5 shows that a success probability of p s = 0 .

05 leads to about one interventionper day over that time horizon, i.e., less than 4% of the nominal total number of arrivals wouldbe the immediate result of an intervention. However, because a successful intervention leads to are-engaged client returning to the active state, we see a signiﬁcant 20% growth of about ﬁve arrivalsper day relative to the p s = 0 case. And, this corresponds to about one client per week who stoppedrisky behavior because of the intervention. Having only one client per day visiting the van may seemlow, but a regularly scheduled van would also attract other clients not included in our model. More24mportantly, a 20% growth in average daily arrivals would be a welcome improvement counteringsome of the trends we point to in Section 2.2. We see these results as suggesting that there isvalue in the SEP investigating an active intervention scheme similar to our example. Moreover, oursimulation model makes it possible to exploit the contextual nature of our model of inter-arrivaltimes to determine a personalized threshold for push notiﬁcation. In this paper, we examine the survey and transaction data of one major syringe exchange serviceprovider in the Chicago metropolitan area. We ﬁnd there is a discrepancy between a slightlydecreasing trend in the number of client transactions with our SEP and an increasing numberof heroin users in Chicago and the United States. We also discover signiﬁcant diﬀerences in thebehavior of clients in terms of how they engage with the SEP based on demographic attributes andfurther personal characteristics. Based on our observations, we focus on producing personalizedpredictions for clients that can aid the SEP in improving the system such as intervention initiativesfor clients with certain attributes.Standard stochastic models, such as Poisson processes, fail to accurately capture the observedinter-arrival process. Therefore, we formulate a CTMC-based simulation model to represent aclient’s path through the system. Our model consists of three sub-models: initiation, reoccurringvisits, and termination, with their parameters learned from linear and logistic regression modelsintegrated into the procedure by which we estimate the model’s parameters. With the aid of thismodel, SEP staﬀ and researchers can analyze the system parameters to draw useful conclusions forgroups with diﬀerent traits, so that proper actions can be taken towards a speciﬁc target group,or even the individual PWID. The quantitative model, combined with the personal interactionwith each client can inform SEP staﬀ of timely intervention opportunities. Such personalizedrecommendations may be particularly useful when the SEP faces challenges in tracking a largenumber of clients. Our simulation model can also help SEP staﬀ evaluate the eﬀectiveness ofcandidate initiatives.In the future, our method could be enhanced by ﬁnding an algorithm to ﬁt the parametersfor higher ﬁdelity Markov chain models since our optimization model for parameter estimationdepends on the closed-form representation of the Coxian distribution. Other functional forms forthe predictive models could be integrated into the ﬁtting procedure as well. Further sensitivityanalysis could be performed to evaluate other potential initiatives beyond dispatching the mobilevan. Optimization over the location and route of the van could also be investigated using oursimulation platform. 25 cknowledgments

We thank Dr. Lawrence J. Ouellet from the University of Illinois at Chicago, Dr. Alexander Gut-fraind from Loyola University, Chicago, Dan Bigg from CRA and many other colleagues fromCOIP and CRA who provided insights and access to data that greatly assisted the research. Weare grateful to two referees and an Associate Editor, whose suggestions signiﬁcantly improved thepaper.

References [1] Abuse, Substance and Administration, Mental Health Services (2014) Results from the 2013national survey on drug use and health: Summary of national ﬁndings.

NSDUH Series H-48,HHS Publication No. (SMA) 14-4863. [2] Aksin, Z., Armony, M., and Mehrotra, V. (2007) The modern call center: a multi-disciplinaryperspective on operations management research.

Production and Operations Management ,16(6):665–688.[3] Asmussen, S. (2003) Applied probability and queues.

Stochastic Modelling and Applied Prob-ability , 51(1):355–426.[4] Bean, P. (1993)

Cocaine and Crack: Supply and Use . Palgrave Macmillan, London.[5] Bibinger, M. (2013). Notes on the sum and maximum of independent exponentially distributedrandom variables with diﬀerent scale parameters. arXiv:1307.3945. URL https://arxiv.org/pdf/1307.3945.pdf .[6] Bluthenthal, R. N., Anderson, R., Flynn, N. M., and Kral, A. H. (2007) Higher syringe coverageis associated with lower odds of HIV risk and does not increase unsafe syringe disposal amongsyringe exchange program clients.

Drug and Alcohol Dependence , 89(2):214–222.[7] Bluthenthal, R. N., Kral, A. H., Gee, L., Erringer, E. A., and Edlin, B. R. (2000) The eﬀect ofsyringe exchange use on high-risk injection drug users: a cohort study.

AIDS , 14(5):605–611.[8] Bobbio, A. and Cumani, A. (1992) ML estimation of the parameters of a PH distribution intriangular canonical form.

Computer Performance Evaluation , 22:33–46.[9] Braine, N., Des Jarlais, D. C., Ahmad, S., Purchase, D., and Turner, C. (2004) Long-termeﬀects of syringe exchange on risk behavior and HIV prevention.

AIDS Education and Pre-vention , 16(3):264–275. 2610] Buchholz, P., Kriege, J., and Felko, I. (2014)

Input Modeling with Phase-Type Distributionsand Markov Models: Theory and Applications , chapter 2. Springer Briefs in Mathematics.[11] Chehrazi, N., Glynn, P. W., and Weber, T. A. (2019) Dynamic credit-collections optimization.

Management Science , 65(6):2737–2769.[12] Cheung, K. L., Klooster, P. M., Smit, C., de Vries, H., and Pieterse, M. E. (2017) The impactof non-response bias due to sampling in public health studies: A comparison of voluntaryversus mandatory recruitment in a Dutch national survey on adolescent health.

BMC PublicHealth , 17(1):276–286.[13] Clarke, K., Harris, D., Zweiﬂer, J.A., Lasher, M., Mortimer, R. B., and Hughes, S. (2016) Thesigniﬁcance of harm reduction as a social and health care intervention for injecting drug users:An exploratory study of a needle exchange program in Fresno, California.

Social Work PublicHealth , 31(5):398–407.[14] DeSimone, J. (2005) Needle exchange programs and drug injection behavior.

Journal of PolicyAnalysis and Management , 24(3):559–577.[15] Faddy, M., Graves, N., and Pettitt, A. (2009) Modeling length of stay in hospital and otherright skewed data: comparison of phase-type, gamma and log-normal distributions.

Value inHealth , 12(2):309–314.[16] Faddy, M. and McClean, S. (1999) Analysing data on lengths of stay of hospital patients usingphase-type distributions.

Applied Stochastic Models in Business and Industry , 15(4):311–317.[17] Faddy, M. and McClean, S. (2005) Markov chain modelling for geriatric patient care.

Methodsof Information in Medicine , 44(3):369–373.[18] Fomundam, S. and Herrmann, J. W. (2007) A survey of queuing theory applications inhealthcare. Technical report, Institute for Systems Research, University of Maryland. URL https://drum.lib.umd.edu/bitstream/handle/1903/7222/tr_2007-24.pdf .[19] Gardner, W., Mulvey, E. P., and Shaw, E. C. (1995) Regression analyses of counts andrates: Poisson, overdispersed Poisson, and negative binomial models.

Psychological Bulletin ,118(3):392–404.[20] Govil, M. K. and Fu, M. C. (1999) Queueing theory in manufacturing: a survey.

Journal ofManufacturing Systems , 18(3):214–240.[21] Hay, B., Henderson, C., Maltby, J., and Canales, J. J. (2017) Inﬂuence of peer-based needleexchange programs on mental health status in people who inject drugs: A nationwide NewZealand study.

Frontiers in Psychiatry , 7:211.2722] Holtzman, D., Barry, V., Ouellet, L. J., Des Jarlais, D. C., Vlahov, D., Golub, E. T., Hudson,S. M., and Garfein, R. S. (2009) The inﬂuence of needle exchange programs on injection riskbehaviors and infection with hepatitis C virus among young injection drug users in select citiesin the United States, 1994-2004.

Preventive Medicine , 49(1):68–73.[23] Huo, D., Bailey, S. L., and Ouellet, L. J. (2006) Cessation of injection drug use and change ininjection frequency: the Chicago needle exchange evaluation study.

Addiction , 101(11):1606–1613.[24] Huo, D. and Ouellet, L. J. (2007) Needle exchange and injection-related risk behaviors inChicago: a longitudinal study.

Journal of Acquired Immune Deﬁciency Syndromes , 45(1):108–114.[25] Kidorf, M. and King, V. L. (2008) Expanding the public health beneﬁts of syringe exchangeprograms.

The Canadian Journal of Psychiatry , 53(8):487–495.[26] Lakshmi, C. and Iyer, S. A. (2013) Application of queueing theory in health care: a literaturereview.

Operations Research for Health Care , 2(1):25–39.[27] Liu, Y., Li, S., Li, F., Song, L., and Rehg, J. M. (2015) Eﬃcient learning of continuous-timehidden Markov models for disease progression. In

Advances in Neural Information ProcessingSystems , pages 3600–3608.[28] Locker, D., Wiggins, R., Sittampalam, Y., and Patrick, D. L. (1981) Estimating the prevalenceof disability in the community: the inﬂuence of sample design and response bias.

Journal ofEpidemiology & Community Health , 35(3):208–212.[29] Nelson, B. L. and Gerhardt, I. (2010) On capturing dependence in point processes: matchingmoments and other techniques. Technical report, Northwestern University. URL http://users.iems.northwestern.edu/~nelsonb/Publications/GerhardtNelsonSurvey.pdf .[30] Paddock, S. M., Kilmer, B., Caulkins, J. P., Booth, M. J., and Pacula, R. L. (2012) Anepidemiological model for examining marijuana use over the life course.

Epidemiology ResearchInternational , volume 2012, article ID 520894.[31] Papaioannou, T. (2014) Censoring. In Balakrishnan, N., Colton, T., Everitt, B., Piegorsch,W., Ruggeri, F., and Teugels, J. L., editors,

Wiley StatsRef: Statistics Reference Online .American Cancer Society.[32] W¨achter, A. and Biegler, L. T. (2006) On the implementation of an interior-point ﬁlterline-search algorithm for large-scale nonlinear programming.

Mathematical Programming ,106(1):25–57. 28

Covariates for the PWIDs Population

When a client initiation occurs in the simulation, we draw a client at random, with replacement,from our dataset of 5,903 unique clients. In this appendix we detail those attributes, beginningwith the fraction of the population with each categorical attribute:GenderMale Female Transgender0.6947 0.3049 0.0004EthnicityWhite African American Puerto Rican Mexican Other Latino Other0.5158 0.2345 0.1478 0.0617 0.0122 0.0280Snort before injectionYes No0.3446 0.6554Participation inshooting galleriesYes No0.0864 0.9136Participation in treatment programsCurrently in Been in Tried to get into Interested inYes 0.1011 0.1870 0.0923 0.4738No 0.8989 0.8130 0.9077 0.5362Drugs used in the past 30 daysSpeedball Heroin Cocaine Ritalin Heroin OtherYes 0.0486 0.9582 0.0581 0.0005 0.0185No 0.9514 0.0418 0.9419 0.9995 0.9815Source of syringesFamily Friends Acquaintance Strangers Other SEP OtherYes 0.0586 0.2529 0.0530 0.0163 0.1360 0.6131No 0.9414 0.7471 0.9470 0.9837 0.8640 0.3869Reuse own syringesYes No0.1579 0.8521Use syringesbehind othersYes No0.1972 0.8028For each of the continuous factors, we present descriptive statistics and histograms of their distri-butions. We use µ to denote the mean of the factor and σ to denote the standard deviation.29 Age: µ = 34 . , σ = 11 . • Age of clients at their ﬁrst injection: µ = 23 . , σ = 7 . • Length of drug injection history: µ = 11 . , σ = 11 . • Number of daily drug injections: µ = 2 . , σ = 1 . • Number of times reusing own syringes in 30 days: µ = 1 . , σ = 6 . • Number of times using others’ used syringes in a 30 days: µ = 0 . , σ = 1 . • Number of times visiting the area of service locations in 30 days: µ = 23 . , σ = 9 . B Statistical Signiﬁcance Results of Fitted Parameters