Analyzing Client Behavior in a Syringe Exchange Program
AAnalyzing Client Behavior in a Syringe Exchange Program
July 25, 2019
Haoxiang Yang , Yue Hu , David P. Morton Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston IL, [email protected], [email protected] Decision, Risk, and Operations, Columbia Business School, New York NY, [email protected]
Abstract
Multiple syringe exchange programs serve the Chicago metropolitan area, providing supportfor drug users to help prevent infectious diseases. Using data from one program over a ten-yearperiod, we study the behavior of its clients, focusing on the temporal process governing theirvisits to service locations and on their demographics. We construct a phase-type distributionwith an affine relationship between model parameters and features of an individual client. Thephase-type distribution governs inter-arrival times between reoccurring visits of each client andis informed by characteristics of a client including age, gender, ethnicity, and more. The inter-arrival time model is a sub-model in a simulation that we construct for the larger system, whichallows us to provide a personalized prediction regarding the client’s time-to-return to a servicelocation so that better intervention decisions can be made with the help of simulation.
Keywords: syringe exchange program, personalized prediction, phase-type distribution, discrete-event simulation
Three major agencies provide syringe exchange programs (SEPs) in the Chicago metropolitanarea: Community Outreach Intervention Projects (COIP), Chicago Recovery Alliance (CRA), andTest Positive Aware Network (TPAN). Each agency offers equipment and educational services andconducts research on drug users. With the goal of supporting persons who inject drugs (PWIDs)and helping prevent the spread of infectious diseases, they provide services including street outreach,counseling and training for preventing HIV and hepatitis C, case management for persons living withHIV, assistance in entering treatment for substance use, and HIV medical, mental, and pharmacycare. Their locations include storefronts and mobile vans, which may operate according to a flexibleschedule. The benefits of an SEP are twofold. First, multiple studies have shown that SEPs areeffective in reducing risk behavior such as sharing syringes [6, 7, 9, 22, 24], thereby lowering ratesof HIV and hepatitis C transmission. Second, higher utilization of an SEP provides PWIDs withmore opportunities to learn about treatment programs, which further reduces drug use [14, 23].This paper focuses on one of the SEPs in Chicago, and we refer to program participantsas clients.
Service locations, including storefronts and mobile vans, accept used syringes and, inexchange, provide clients with new syringes along with other devices that help prevent the spreadof disease, such as condoms, cookers, purified water, and bleach. On their first visit to a servicelocation, clients are asked to take a voluntary survey involving demographic information and thenature of their drug use (frequency, types of drugs, etc.), which we detail in Section 2. Once a client1 a r X i v : . [ s t a t . A P ] J u l s established in the system, the SEP keeps a record of frequency of drug use, health condition,and general living condition. During a visit, an SEP employee will have a personal conversationwith a client, e.g., about recent life changes, employment status, and family situation. The SEPwill further provide the client with information to help with health issues, and seek to introducethe client to drug treatment programs. Evidence has shown that this type of personal interactioncan help clients obtain peer support to recover from substance addiction [13, 21, 25].Nationwide, about 681,000 Americans aged 12 years or older reported using heroin in 2013 [1],and the number of reported users grew every year from 2007 to 2013, with new users growingabout 70% from 2002 to 2013. The volume of heroin seized by officials, and the number of heroinoverdoses, both grew over the same period in Chicago [1]. The contrast between the significantgrowth in the use of heroin and slightly lower use of services at the target SEP (see Section 2)motivates our study. In order to promote its services and tailor them to individual clients, theagency needs a better understanding of client behavior in using the SEP.The arrival process of clients to SEP sites is key to understanding their use of SEP services, andwe have data on arrival times and locations over a ten-year period. Our main focus is to develop a contextual understanding of the arrival process; i.e., we want to understand inter-arrival times given features of an individual client obtained, in part, from the voluntary demographic survey. Our datasuggest some clients “establish care” with the SEP, returning consistently, while others use SEPservices once and never return. We seek to model and understand the “life cycle” of an individualclient from initiation, reoccurring visits, and termination with the program. Through interviewswith SEP staff, we learned that ethnicity, gender, age, geographic location, and drug history affecthow a client will use SEP services. For example, African Americans are less active in SEP programsbecause they tend to prefer not to disclose their drug habits, and because African Americans whouse drugs tend to snort rather than inject. A client’s interaction with the SEP is affected by lifeevents; e.g., a person who moves farther from a service location may visit less frequently. Theseare snapshots of a pervasive phenomenon that suggest many factors can influence client behaviorwhen using SEP services.To address these issues, we propose a model for how a client engages with the SEP using threeintegrated sub-models for initiation, reoccurring visits, and termination. The latter two sub-modelsare fully integrated and involve a phase-type distribution, which may be viewed as a continuous-time Markov chain with hidden states. To build a contextual model, we express the Markov chain’sparameters as a function of client features using linear and logistic regression. Analysis withour model can help the SEP understand the importance of different features and hence estimatethe distribution governing a specific client’s next arrival time. This, in turn, can help SEP staffbetter allocate limited resources to improve the program’s effectiveness. Armed with a probabilitydistribution for the timing of a client’s next visit, SEP staff can be alerted when a specific client has2ot used SEP services for an unusual period of time, which may point to risky drug-use behavior.SEP employees can then contact the client (e.g., via a text message), or dispatch a mobile van tospecific locations and message nearby clients. Using simulation, we illustrate the potential value ofusing our model in this manner.Modeling arrival processes plays a key role in many application domains; see [26] for a review ofrelevant literature in healthcare. Homogeneous and nonhomogeneous Poisson processes are widelyused to model an arrival process [2, 18, 20], but do not address our primary goal, i.e., to provideinsights regarding a client’s return time based on individual predictors associated with that client.In principle, we could partition clients into different categories based on their features and fit sucharrival processes based on these categories. However, such an approach scales poorly given thenumber of features we consider.As we will discuss, our data on inter-arrival times of clients to SEP sites have heavier tails thanthat of an exponential distribution, and this is true even when we develop models that condition onsub-populations of the clients. This may arise for two related reasons: (i) the “establishing care”nature of some clients discussed above, and (ii) clients effectively transitioning between hiddenstates, which capture active and passive engagement with the SEP. The three sub-model approachthat we propose allows us to capture these effects, and its contextual nature captures heterogeneity.We use a negative binomial model for initiation, which is consistent with an over-dispersed mixtureof Poisson distributions that captures heterogeneity. We use a two-state continuous-time Markovchain model with unobservable states to capture reoccurring visits and termination. These twointegrated sub-models allow us to reasonably represent issues (i) and (ii) and their conditionaltransition- and system-exit probabilities allow us to capture heterogeneity.Hidden Markov chain models have been used to model scenarios in which an agent transitionsbetween a modest number of states, each associated with certain patterns of behavior. Paddocket al. [30] construct a Markov chain model to understand trajectories of a marijuana user with thegoal of using simulation to assess alternative treatment and prevention policies. Liu et al. [27] learna continuous-time hidden Markov chain model for disease progression in glaucoma and Alzheimer’sdisease. Chehrazi et al. [11, Appendix C] discuss using a two-state Markov chain to model therepayment behavior associated with delinquent credit card accounts, in which the two states rep-resent high and low repayment rates, with the goal of directing credit collection efforts. HiddenMarkov chains enable modelers to capture plausible, but unobservable, transitions of an agent.Our approach can further enhance such models by allowing the parameters of the Markov chainto depend on agent-specific features using regression. Given requisite data, the types of modelsjust sketched could benefit from our approach, increasing model fidelity and insights from analysis,by linking agent heterogeneity to the Markov chain model. A technical challenge that we addressin this paper—at least for the family of serial Coxian models that we detail—involves parameter3stimation when using regression to map the features of an agent to the Markov chain’s parameters.We first provide, in Section 2, an overview of the demographic and arrival data collected bythe SEP between 2005 and 2014. In Section 3, we provide details of the derivation of the threesub-models: initiation, reoccurring visits, and termination, which together capture individual fea-tures associated with clients. We present our computational techniques and results in Section 4,with model validation and an example of active intervention that our model can recommend. Weconclude the paper in Section 5 by summarizing the model and insights from our analysis. The data from the SEP consist of results from a survey and records of individual syringe exchangetransactions. The transaction data were collected from January 2001 to November 2014 with139,488 entries. The survey data were collected between July 2005 and November 2014 with 6,843surveys. Each survey entry corresponds to a unique client. When a new client arrives at a servicesite, the client is assigned a unique study number (henceforth, client ID) and is asked to completean enrollment survey. This client ID is then used throughout the client’s sojourn in the system.We use the data from July 2005 to November 2014 because the surveys are aligned consistentlywith the transaction records in this period. After removing incomplete and contaminated records,we combine the transaction and survey data to obtain a merged dataset with 63,960 entries, eachwith 50 data fields, which we detail in the following section.
The survey contains 31 questions, which yield 33 predictors, covering basic demographic informa-tion, ZIP code of residence, and the client’s drug use habits. These predictors are categorized asfollows: • Basic personal information: age, ethnicity, gender; • Length of time using/injecting drugs; • Frequency of using/injecting drugs in the past 30 days; • Frequency of reusing one’s own syringes in the past 30 days; • Frequency of sharing syringes with others in the past 30 days; • Involvement in group injection in the past 30 days; • Sources of syringes; • Types of drugs used; • Drug treatment program participation in the past six months;4
Reasons and frequency of being in the area with an SEP location in the past 30 days.Among 5,903 clients in our merged dataset, 4,101 are male, 1,800 are female, and two are trans-gender. The demographics of these clients are summarized in Tables 1 and 2.Ethnicity Number of clients Percentage (%)White 3,045 51.58African American 1,384 23.45Puerto Rican 873 14.78Mexican 364 6.17Other Latino 72 1.22Other 165 2.80Total 5,903 100Table 1: Ethnicity of clientsAge (years) Number of clients Percentage (%) <
15 1 0.0216 −
30 2,602 44.0831 −
45 2,108 35.7146 −
60 1,111 18.82 >
60 81 1.37Total 5,903 100Table 2: Age of clientsThe average age of first drug use among all clients is 23.5 years. About 67.9% of clients injectdrugs daily, and 81.7% injected drugs more than 20 out of 30 days before taking the survey. Forall reasonable responses to the survey question regarding injection frequency (defined as at most10 injections per day), the average number of injections per day is 2.77. About 82.6% of clientsreported that they did not use someone else’s syringe in the past 30 days, and 77.9% reported thatno one used their syringes over the same time period. Among 986 clients who injected with syringesothers had used in the past 30 days, 558 shared with their spouse, 53 shared with a family member,350 shared with a friend, 24 shared with an acquaintance, and 11 shared with a stranger. About72.1% of the clients did not share cookers, cotton or water during injection, and 8.6% of the clientsstated that they injected drugs in a shooting gallery within the 30 days before they began using thesyringe exchange service. About 69.2% of the clients were in the neighborhood of the SEP locationwhere they took the survey more than 20 days in a typical 30-day month, 25.1% of whom lived inthe neighborhood with an SEP location, 54.5% of whom had come mainly to buy drugs, 2.0% ofwhom had come mainly to exchange syringes, and 4.6% of whom had come to visit friends. Thevoluntary nature of the survey could lead to a non-response bias in the results; see, for example,5igure 1: The time series of transactions from six SEP service locationsthe illustrated cases and discussion in [12, 28]. Because every SEP transaction is labeled with aclient ID, we can compute the fraction of clients for whom we have a completed survey, which is98.6%. Discussions with SEP staff confirmed that although the survey is voluntary, almost everynew client takes the survey upon the recommendation of SEP staff, and so the overall effect ofnon-response bias is reasonably small in our study.
The transaction data include details of syringe exchanges that occurred in multiple SEP storefrontsand on mobile van routes. During each transaction, SEP employees record the client ID, numberof syringes exchanged, number of other preventive devices distributed, size of the group comingwith the client, and type(s) of health education material given to the client. In between July 2005and November 2014, the SEP distributed 3,647,384 syringes, 160,895 male and female condoms,and 63,667 sets of educational material. The mean size of the group coming with the client duringa single transaction was 1.89 with a standard deviation of 2.37. The mean number of syringesexchanged in one transaction was 57.03 with a standard deviation of 110.88. Our discussions withthe SEP suggested that there are ample staff to process clients’ visits, and that the SEP alwayshad enough syringes and rarely ran out of other drug-injection equipment. Thus we see the levelof censoring in the demand data as minimal.We present a time series of the number of transactions from July 2005 to November 2014 in6igure 2: The time series of aggregated monthly transactions and syringes exchangedFigure 1. The figure shows transactions at four storefronts and mobile distribution in two areas,where a mobile van was dispatched to certain locations with a flexible schedule. According toSEP staff, the schedule for the van was communicated to clients during their visits and at someshooting galleries. Clients could also inquire about the schedule through phone calls. The serviceat Location 6 was terminated in January 2010, and some of its clients started visiting Location 4afterward, consistent with the increasing trend for Location 4. Figure 1 also suggests a decliningnumber of visits for Location 2. Figure 2 shows the fluctuation of monthly aggregated transactions.The figure suggests a slight decrease in the number of monthly transactions over the ten-year timespan. The number of syringes exchanged surged between 2009 and 2012 but appeared to decreaseafterward.
We seek to develop a predictive model for a client’s arrival process based on a number of predictorssuch as race, gender, age, and more. To this end, we segment a client’s experience in the system intothree sub-processes: initiation, reoccurring visits, and termination. Features of the client are usedas covariates to estimate parameters of the model of reoccurring visits and termination. We canachieve two objectives with our model. First, we can forecast the next arrival of a specific client,given that client’s features and most recent arrival time. Second, we can simulate the system andperform sensitivity analysis on specific model parameters. The former can help the SEP identify7igure 3: Empirical distribution fit of number of daily initiations to a negative binomial distributionirregular behavior and take prompt intervention measures. The latter can guide initiatives toimprove system-wide performance.The overall structure of the model we formulate is that we build sub-models of these threeindividual sub-processes. While we have analytical models of these sub-components, our overallmodel, which combines these sub-models, can only be executed as a simulation, as we describe aftercharacterizing the sub-models.
The first time that a client uses the SEP is called the initiation of the client’s arrival process. Therecorded initiation data are clear, and so we focus on how to simulate new initiations. We examinethe distribution of the number of initiations per day, i.e., the arrivals generated by clients who havenever previously visited a service location. Since our SEP started recording survey data four yearsafter recording transactions, some returning clients were asked to complete the survey starting inJuly 2005, even though their true initiation was earlier. In an attempt to avoid inflating someinitiation counts, we use data starting from January 2007 in order to fit a distribution to estimatethe initiation process. The blue bars in Figure 3 show the empirical distribution of initiations perday. We fit a negative binomial distribution with parameter (3 , . We track the history of clients who visit the SEP service sites multiple times and plot the distri-bution of inter-arrival times. The inter-arrival time is defined here as the duration between twoconsecutive visits made by the same client. Figure 4 suggests that the distribution has a heavytail, i.e., it shrinks to zero more slowly than an exponential.As we suggest in Section 1, using a Poisson process to model inter-arrival times does not helpwith our main goal, which is to provide contextual, i.e., client-specific predictions. Putting thisaside for a moment, we tested the goodness of fit associated with a Poisson process for our aggregatedataset, and for datasets associated with sub-populations of clients based on ethnicity, age, andgender. Moreover, we investigated both homogeneous and non-homogeneous (e.g., piecewise con-stant arrival rate by week) Poisson processes. For the models we assessed, statistical tests yielded p -values that were vanishingly small, suggesting that such models fail to provide an adequate rep-resentation. This is consistent with our observation from Figure 4, giving further evidence thatmodeling inter-arrival times using an exponential distribution may not be appropriate.Figure 4: Log-log relationship between frequency and inter-arrival time. The logarithms in thefigure are base 10, and underlying inter-arrival times are in days.We model client inter-arrival times with a phase-type distribution, because of its goodness offit and its potential interpretability. The distribution of any nonnegative random variable can9e approximated with high accuracy using a phase-type distribution; see, e.g., [3]. A phase-typedistribution can be expressed as the time required for a continuous-time Markov chain (CTMC) toenter an absorbing state (say, state 0) from a randomly selected transient state, { , , . . . , n } ; see,e.g., [10]. A probability mass function, denoted by α = ( α i ) i =1 ,...,n , governs the initial state of theCTMC. The infinitesimal generator is constructed in the following manner: (cid:20) a Q (cid:21) , (1)where a is an n -dimensional vector specifying the transition rates from the transient states tothe absorbing state, and Q is an n × n matrix specifying the transition rates among transientstates, where Q ( i, j ) denotes the rate of transitioning from state i to state j , and Q ( i, i ) = − [ (cid:80) j (cid:54) = i,j =1 ,...,n Q ( i, j ) + a ( i )]. The first row in the generator of equation (1) corresponds to thetransition rates from the absorbing state to any other transient state, which are always 0. The prob-ability density function (pdf) of the phase-type distribution can be characterized as f ( t ) = αe Qt a ,and the cumulative distribution function (cdf) is given by F ( t ) = 1 − αe Qt , where is the n -dimensional vector of all 1’s.There are multiple ways to fit a phase-type distribution to data; see, e.g., [29]. Here, weformulate a nonlinear optimization model rooted in maximum-likelihood estimation (MLE), coupledwith a regression model that uses covariates of the clients. We first describe the MLE approach inthe context of the Coxian distribution, a special case of phase-type distributions, and then coupleit with client-specific predictors.For a Coxian distribution, the embedded Markov chain has n + 1 states, as shown in Fig-ure 5. The stochastic process starts in state 1, i.e., α = (1 , , . . . , i = 1 , . . . , n −
1, we can transition to only the adjacent transient state, i + 1 or to the absorbingstate, 0. The rate at which we depart state i is γ i , and we transition to the absorbing state withprobability q i and to the adjacent transient state with probability 1 − q i . As Figure 5 also depicts,transient state n can only transition to the absorbing state.Figure 5: CTMC depiction of the Coxian distribution10igure 6: An equivalent CTMC to the Coxian distribution’s model in Figure 5A Coxian distribution is more parsimonious than a general phase-type distribution. The formermodel contains 2 n − n + n . Bobbio and Cumani [8]show that the Coxian model in Figure 5 is equivalent to another CTMC model that is depictedin Figure 6. The latter formulation is helpful in our setting because it offers a linear structure forcapturing client-specific features as we describe below. The relationship between the q i parametersin the first model and the β i parameters in the second model is given by the following equations: q = β = 0 β i = q i i − (cid:89) k =0 (1 − q k ) i = 1 , , . . . , nq i = β i − (cid:80) i − k =0 β k i = 1 , , . . . , n. A CTMC lends itself to interpretation by associating transitions in the model with assumedphases of a client using syringes after leaving a service location. In addition to the referencesmentioned in Section 1, which use CTMCs with hidden states [11, 27, 30], we note that such anapproach has also been used to model the length of stay of hospital patients [15, 16, 17], includingwork in which serial (Coxian) CTMCs are employed. We use a similar philosophy by inferringtransition rates from unobservable states and, moreover, connecting them to features of a client.We can interpret a Coxian distribution with n = 2 in our setting as follows. After visiting an SEPlocation, the client enters an (unobservable) “active state” with probability β , and subsequentlyreturns to an SEP site after an exponentially distributed delay with parameter γ . Alternatively,with probability β = 1 − β the client enters a “passive state.” Returning to an SEP site is thenthe sum of two independent exponential random variables with rates γ and γ , where we expect γ > γ . The passive state could correspond to the client temporarily seeking another source ofsyringes, for example.Bobbio and Cumani [8] present an MLE procedure to fit the parameters for the Coxian dis-tribution. Their method, however, is not directly applicable when we express β and γ as affinefunctions of predictors associated with clients. A result of [5] allows us to express the pdf and cdfof a sum of independent exponential random variables, and we can use this result to write the pdf11nd cdf of a Coxian random variable as: f ( t ) = n (cid:88) i =1 β n +1 − i f i ( t ) = n (cid:88) i =1 β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j ) (2) F ( t ) = n (cid:88) i =1 β n +1 − i F i ( t ) = n (cid:88) i =1 β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 (1 − e − γ j t ) /γ j (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j ) . (3)Here f i and F i are the pdf and cdf of the inter-arrival time, conditioned on beginning in state i . Weindex the collection of inter-arrival times by S , and denote each inter-arrival time by t s , s ∈ S , andsimilarly denote censored inter-arrival times by t u , u ∈ U . For every client, the time from the lastarrival to the end of the observation horizon can be considered a right-censored inter-arrival time.The likelihood function is the product of the pdf of each inter-arrival time and the complement ofthe cdf of each censored inter-arrival time. Maximizing the log-likelihood function then leads tothe following problem:max β,γ ≥ (cid:88) s ∈S log( f ( t s )) + (cid:88) u ∈U log (1 − F ( t u )) (4a)s.t. n (cid:88) i =1 β i = 1 (4b) f, F defined as in (2) and (3), ∀ t s , s ∈ S and t u , u ∈ U , respectively . (4c)The first term in the objective function of model (4) corresponds to observed inter-arrival times,and the second term corresponds to right-censored data in which we do not know the inter-arrivaltime, only that it exceeds, t u ; see, e.g., [31] for such treatments of right-censored data. We notethat a limiting analysis shows that equations (2)-(3) remain valid even when rates at distinct statesare identical [5]. That said, this can cause numerical difficulties and we return to this issue below.So far, the described fitting procedure assumes all clients behave according to the same model.As discussed above, we seek to incorporate the features of clients when we construct the parametersof the Coxian distribution. Here we use an affine relationship to connect those features to theparameters of the Coxian distribution. We use V to represent the set of clients, and we use j = 1 , , . . . , m to index the characteristics of clients. We use x j,v , ∀ j = 1 , . . . , m, v ∈ V , to denotethese predictors. We also use v ( s ) ∈ V to specify the client associated with the s -th inter-arrivaltime, and we similarly define v ( u ) for the client associated with the u -th censored inter-arrival time.Our extension of model (4) to incorporate client-specific predictors is given by: max β,γ,b,g,(cid:15) (cid:88) s ∈S log( f ( t s )) + (cid:88) u ∈U log (1 − F ( t u )) − η β (cid:107) (cid:15) β (cid:107) (5a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (5b) i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (5c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (5d) f ( t s ) = n (cid:88) i =1 β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t s (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1) ∀ s ∈ S (5e) F ( t u ) = n (cid:88) i =1 β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1) ∀ u ∈ U (5f) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (5g) γ i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V . (5h) The idea behind model (5) is that we have predictors associated with each client, and con-straints (5c) and (5d) express the parameters of the Coxian model as an affine function of thesepredictors. Constraint (5b) replicates constraint (4b), and constraints (5e) and (5f) define the pdfand cdf terms that appear in the log-likelihood in the first two terms of the objective function.Model (5) combines elements of regression and maximum likelihood estimation. The first twoterms in the objective function maximize log-likelihood in the spirit of model (4). Constraints (5c)-(5d) define the regression model. Parameters β i,v and γ i,v are unobservable, and so, in principle,we could have no residual term in the regression model. However, constraint (5b) requires that theconditional probabilities that we return to states 1 , , . . . , n sum to 1. So, to maintain feasibilitywe add a residual, (cid:15) βv , in equation (5b) and penalize its two-norm using a positive weight η β inthe final term in the objective function (5a). While not explicit in model (5)’s statement, to helpprevent overfitting, we regularize the regression parameters b and g by adding terms − η b (cid:107) b (cid:107) and − η g (cid:107) g (cid:107) to the objective function (5a).While a limiting analysis shows the validity of equation (2)-(3) even when some of the compo-nents of γ are identical, allowing this when optimizing can cause numerical problems. Moreover,our motivation, sketched above, includes the idea that the sojourn times should be larger in thepassive state than in the active state, and for both reasons we add the following constraint tomodel (5): γ i,v − γ i +1 ,v ≥ δ, ∀ i = 1 , , . . . , n − , v ∈ V , where δ > . (6) We assume that each client has a possibility to exit the system after each visit to the SEP.Specifically, as soon as the CTMC hits the absorbing state, we assume that, with probability p v , client v ∈ V will stop visiting our SEP.However, we cannot observe a client leaving the system because we only observe their visits. Ifa client visits service locations multiple times, then we know that for every visit before the last one,13he client is still in the system, and so the likelihood function is conditioned on the client remainingin the system. After the last visit, a client may stay in the system or may leave. We need toincorporate this information in the likelihood function. Conditioned on the client remaining in thesystem, the likelihood function is 1 − F ( t u ), where t u , u ∈ U , is the time between the client’s lastvisit and the end of the observation horizon. As a result, the log-likelihood function can be revisedas: (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) . (7)We again model parameter p v via a functional relationship with client v ’s covariates. Instead of anaffine relationship, we use a logistic function as follows: p v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − , (8)where we will optimize the fit via parameters ρ j and ρ . Given that p v ∈ (0 , p by combining the results of (5), (7), and (8). Since thelikelihood function is conditioned on whether the client has exited the system, we need to solve anonlinear optimization problem as follows, which fits parameters for both reoccurring visits andtermination, integrating our latter two sub-models: max β,γ,b,g,ρ,p,(cid:15) (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) − η β (cid:107) (cid:15) β (cid:107) (9a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (9b) β i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (9c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (9d) p v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − ∀ v ∈ V (9e) f ( t s ) = n (cid:88) i =1 β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t s (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1) ∀ s ∈ S (9f) F ( t u ) = n (cid:88) i =1 β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1) ∀ u ∈ U (9g) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (9h) γ i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V . (9i) Solving problem (9) maximizes the log-likelihood function for the combination of reoccurringvisits and termination. Similar to fitting the reoccurring visits, we also add a regularization term14 η ρ (cid:107) ρ (cid:107) in the objective function to prevent overfitting. Constraint (9e) models the logistic rela-tionship between the parameter p and the covariates x . Given the fit value of ρ ∗ and the featuresof client v , we can calculate the termination probability of that client, p v . In this section, we first discuss how we solve model (9) and its simpler variants, along with pre-liminary results in which we do not use the covariates of the clients. Then we present the resultsof the fit model, provide insights as to how different features predict client behavior, test elementsof model validity, and show an example of how our personalized arrival model can guide activeintervention.
We use a Coxian model with n = 2 transient states. We interpret one phase as the client being inan active state with the SEP, i.e., with more frequent visits to exchange syringes, and we interpretthe other phase as a passive state with less frequent visits.Model (9) and its variants are computationally challenging nonconvex optimization problems.We use Ipopt 3.12.1 [32], with linear solver MA27, to solve instances of these optimization problems.Due to nonconvexity, we only obtain locally optimal solutions. In addition, because numerical issuescan arise, we briefly sketch ways in which we “help” the solver.We scale all continuous data, i.e., client predictor data, so that it is normalized. As discussedat the end of Section 3.2, we enforce γ ,v ≥ γ ,v + δ , and we use δ = 0 .
005 in our computation. Fornumerical reasons, we also bound the γ -parameters away from zero, by enforcing γ ≥ ¯ γ ≡ . β, γ , and p . We do this for two reasons. First, it provides insight regarding typicalvalues of these parameters, and second, as we discuss in further detail below, it helps provide agood initial solution for model (9). In particular we solve:max β,γ,p (cid:88) s ∈S (log( f ( t s )) + log(1 − p )) + (cid:88) u ∈U log ((1 − F ( t u ))(1 − p ) + p ) (10a)s.t. n (cid:88) i =1 β i = 1 (10b) f ( t s ) = n (cid:88) i =1 β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t s (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j ) ∀ s ∈ S (10c) F ( t u ) = n (cid:88) i =1 β n +1 − i (cid:32) n +1 − i (cid:89) l =1 γ l (cid:33) n +1 − i (cid:88) j =1 e − γ j t u /γ j (cid:81) n +1 − ik =1 ,k (cid:54) = j ( γ k − γ j ) ∀ u ∈ U (10d) γ i − γ i +1 ≥ δ ∀ i = 1 , , . . . , n − i ≥ ∀ i = 1 , , . . . , n (10f) γ n ≥ ¯ γ (10g)0 ≤ p ≤ . (10h)Solving model (10) leads to parameters for a “featureless” client, as follows:ˆ β = 0 . , ˆ β = 0 . γ = 0 . , ˆ γ = 0 . p = 0 . . This result suggests that after each visit, the featureless client has a 9 .
8% chance of exiting theSEP system. Conditional on the client visiting an SEP site again, the client returns via the activestate (frequent visits) with a probability of about 0 .
82. The mean return time to the SEP fromthis state is 1 /γ ≈
19 days. With probability about 0 .
18, the client returns via the passive state,and the expected time to visit the SEP is then 1 /γ + 1 /γ ≈
350 days.Rather than optimizing over the intercept terms, b , g and ρ in model (9), we fixed theseterms as: b , = ˆ β b , = ˆ β g , = ˆ γ g , = ˆ γ ρ = ln (cid:18) ˆ p − ˆ p (cid:19) . Fixing the intercept terms in this way allows us to interpret parameters b i,j , g i,j , and ρ j for j =1 , , . . . , m as deviations from the featureless client. Moreover, fixing these parameters helps improvethe numerical performance of Ipopt when solving the nonconvex problem, in part by effectivelyproviding a good initial solution.After adding the regularization terms for b , g and ρ described in Section 3.2 and 3.3, and fixingthe value of b , g and ρ , we solve the following nonlinear program: max β,γ,b,g,ρ,p,(cid:15) (cid:88) s ∈S (cid:0) log( f ( t s )) + log(1 − p v ( s ) ) (cid:1) + (cid:88) u ∈U log (cid:0) (1 − F ( t u )) (1 − p v ( u ) ) + p v ( u ) (cid:1) − η β (cid:107) (cid:15) β (cid:107) − η b (cid:107) b (cid:107) − η g (cid:107) g (cid:107) − η ρ (cid:107) ρ (cid:107) (11a)s.t. n (cid:88) i =1 β i,v = 1 ∀ v ∈ V (11b) β i,v = m (cid:88) j =1 b i,j x j,v + b i, + (cid:15) βi,v ∀ i = 1 , , , . . . , n, v ∈ V (11c) γ i,v = m (cid:88) j =1 g i,j x j,v + g i, ∀ i = 1 , , . . . , n, v ∈ V (11d) v = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j,v ) (cid:17) − ∀ v ∈ V (11e) f ( t s ) = n (cid:88) i =1 β n +1 − i,v ( s ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( s ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( s ) t (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( s ) − γ j,v ( s ) (cid:1) ∀ s ∈ S (11f) F ( t u ) = n (cid:88) i =1 β n +1 − i,v ( u ) (cid:32) n +1 − i (cid:89) l =1 γ l,v ( u ) (cid:33) n +1 − i (cid:88) j =1 e − γ j,v ( u ) t u /γ j,v ( u ) (cid:81) n +1 − ik =1 ,k (cid:54) = j (cid:0) γ k,v ( u ) − γ j,v ( u ) (cid:1) ∀ u ∈ U (11g) γ i,v − γ i +1 ,v ≥ δ ∀ i = 1 , , . . . , n − , v ∈ V (11h) β i,v ≥ ∀ i = 1 , , . . . , n, v ∈ V (11i) γ n,v ≥ ¯ γ ∀ v ∈ V (11j) b i, = ˆ β i , g i, = ˆ γ i , ρ = ln (cid:18) ˆ p − ˆ p (cid:19) ∀ i = 1 , , . . . , n. (11k) With modest tuning effort, we select the following weights on the regularization terms: η β = 100 , η b = 100 , η g = 1000 , η ρ = 10 . The results from fitting the parameters using the method in Section 4.1 are displayed in Table 3.The estimators are represented by ρ , b , and g . A positive ρ j , j = 1 , , . . . , m , indicates that havingfeature j increases the probability that the client will exit the system. Given that the client staysin the system, a positive value of parameter b ,j increases the probability that the client returns tothe active state. We do not report b ,j because its coefficient differs from b ,j ’s by a sign. Positivecoefficients g ,j and g ,j lead to increased frequencies, i.e., shorter mean times, associated with theactive and passive states, respectively. The coefficients in Table 3 are given in either regular font orgray font. The former category is significant, and the latter is not, where “significant” is defined ashaving at least 90% of bootstrapped replications having the same sign, as detailed in Appendix B.Column ∆ in Table 3 shows the amount by which the conditional expected inter-arrival timechanges (in days) if a client has that row’s feature but is otherwise a featureless client. Column T similarly shows the magnitude by which the expected sojourn time in the system changes due toa single feature. For context, the mean sojourn time of a featureless client is 806 . T ).17actor ( j ) ρ j b ,j g ,j g ,j ∆ T Gallery 0.5448 -0.0387 0.0027 0.0003 5.9 -268.2From Other Locations 0.2047 -0.0156 0.0023 0.0004 -3.2 -162.2From Other SEP -0.3156 0.0248 -0.0063 0.0000 -6.2 186.3From Friends -0.1194 -0.0045 -0.0011 -0.0007 20.9 330.1From Strangers -0.2697 0.0142 0.0020 0.0010 -19.3 -26.8Speedball -0.0293 -0.0704 0.0146 -0.0008 48.1 524.9Heroin -0.0437 -0.0258 0.0061 -0.0008 31.4 365.6In Treatment -0.3254 0.0243 -0.0126 0.0002 -4.7 215.7Been in Treatment -0.1137 -0.0178 0.0063 -0.0001 5.9 154.8Female 0.0154 0.0217 -0.0076 0.0008 -14.8 -159.8White 0.0000 0.0218 -0.0005 0.0002 -9.9 -101.0African American 0.3623 -0.0073 -0.0026 0.0001 1.6 -209.6Puerto Rican -0.7594 0.0084 0.0101 0.0001 -7.4 673.3Mexican -0.3383 0.0014 0.0014 0.0010 -15.6 76.0Other -0.1994 0.0088 -0.0054 0.0000 -0.7 151.9Age of First Drug Use 1.1219 0.0117 0.0007 0.0003 -8.8 -525.7Drug Use Span 1.8506 -0.0042 -0.0030 -0.0001 4.4 -602.3FUD 0.0332 0.0073 -0.0011 -0.0001 0.4 -19.6FROS 0.0380 -0.0051 0.0000 0.0000 1.7 -10.5FBSA -0.1968 -0.0033 0.0058 0.0002 -5.2 95.4
Note: Here, ρ j , b ,j , and g ,j / g ,j are factor-specific regression coefficients for the probability of exiting the system,probability of returning to the active state, and mean transition times in the CTMC, respectively. Based on thefactor in each row, parameter ∆ denotes the amount by which the conditional expected inter-arrival time changes,and T similarly denotes changes in the system sojourn time, both in days. Table 3: Fitted parameters of the Coxian process2. Clients who can obtain syringes from other locations, such as pharmacies, are more likely toexit the system, perhaps because they are not reliant on SEP services. On the other hand,if the client obtains syringes from other sources (other SEPs, friends, and strangers), whichmay not be as reliable, it is more likely for the client to remain in the system.3. The b -coefficient associated with speedball (a type of drug mixing cocaine with heroin ormorphine) is strongly negative, meaning the client is less likely to stay in the active state,leading to an increase in expected inter-arrival time.4. A client in a treatment program is more likely to stay in the SEP system, as indicated by anegative value of ρ and a positive value of T .5. A female client is more likely to visit frequently, but remains in the system for a shorterperiod of time.6. The probability of exiting the system differs significantly according to ethnicity: African-American clients are more likely to exit the system while the opposite is true for PuertoRican and Mexican clients. 18. Not surprisingly, clients who are more frequently near an SEP site (i.e., have larger values ofFBSA) are less likely to leave the system, are more likely to visit a site frequently, and overallhave longer sojourn times in the system.Such observations from the model we fit may allow our SEP to tailor promotion of their servicesto specific target populations. For example, the starkly different behavior of African-Americanclients may warrant special attention from the SEP in early encounters, relative to PWIDs of otherethnicities. Bean [4] states that African-American drug users are less likely to inject drugs thanWhite drug users. However, among all clients, it is not clear why African Americans are less likelyto seek the syringe exchange services offered by our SEP. Investigating whether African Americansare more likely to quickly abandon injection drug use, versus continue use but not seek SEP services,would likely be needed to guide such strategies. Our results show that it would be beneficial toincrease the frequency of clients being in the service location area, and one possible solution is toincrease the frequency of the mobile van service in certain locations to improve accessibility. The model of Section 3 quantifies client behavior. In this section, we perform statistical tests andcompare results of the simulation outputs with observed data in order to assess model validity.We begin with further details on the implementation of the simulation model. We simulate thearrival of clients to the SEP over 2,310 days, equivalent to the number of days that our SEP wasopen between July 2005 and November 2014. We also simulate an initial 5,000 days as a warm-upperiod, since our SEP was established more than 15 years before 2005. We assume SEP staff arealways available to serve a visiting client, there are no shortages of syringes or other resources thatwould alter client behavior, and the location and operating schedule of storefronts and mobile vansare fixed. In other words, we assume the nature of client visits is governed solely by the features ofthe clients, in a manner consistent with historical data, and is not affected by service or resourceavailability. This matches our understanding of the actual system, as we discuss in Section 2. Inour simulation of 7,310 days in total, for each day we first simulate the number of new clientsaccording to the negative binomial distribution. We assign features to these clients by drawing aclient at random, with replacement, from the list of 5,903 clients described in Section 2. Given thefeatures of a client, x , and the parameters, ρ , b , and g , obtained from model (11), we can calculatethe parameters of the CTMC, β i , γ i , i = 1 ,
2, and p , by: β i = m (cid:88) j =1 b i,j x j i = 1 , γ i = m (cid:88) j =1 g i,j x j i = 1 , a) Coxian Inter-arrival Model (b) Exponential Inter-arrival Model Figure 7: Part (a) of the figure shows the log-log relationship between frequency and the Coxianinter-arrival time model. Part (b) shows the analogous relationship for the exponential inter-arrivaltime model. In both subplots the (simulated) model values are shown with red dots and observeddata are shown with blue dots. The logarithms are base 10 and the underlying inter-arrival timesin days. p = (cid:16) e − ( ρ + (cid:80) mj =1 ρ j x j ) (cid:17) − . (12c)With the CTMC built for every client, upon the arrival of a specific client, we simulate whetherthis client exits the system using p from equation (12c). And, if the client does not exit the system,we simulate the time of the next arrival using the β and γ values from equations (12a)-(12b). Afterthe warm-up period and simulating the 2,310 days of interest, we obtain summary statistics asoutputs based on inter-arrival times, sojourn times, and number of visits of each client.We perform statistical tests to assess the quality of our sub-models for initiation, reoccurringvisits, and termination. For initiation, we selected the negative binomial distribution for reasonsthat we discuss in Section 3.1. Using a Pearson’s chi-squared goodness-of-fit test we obtain a p -value of 0.250, suggesting that we should not reject the null hypothesis that the data are consistentwith the fit distribution. For comparison, we also fit other commonly used distributions (geometric,binomial, uniform, Poisson, and hyper-geometric), and we performed the same goodness-of-fit tests.None of those distributions had a p -value that exceeded a 0.05 level of significance.For reoccurring visits, we compare the distribution of inter-arrival times obtained from thesimulation model with an exponential distribution with a mean of 67 .
50 days, which is the meanof observed inter-arrival times. Figure 7 illustrates such a comparison. Results from the simulatedCoxian process are shown on the left, and those from the exponential distribution are on theright, both in red. In addition, we plot actual observations in blue. The figure suggests that ourCoxian-based simulation model provides a better fit to the observed data.Since we do not directly observe whether a client has left the system, we test the right-censoredsojourn time of clients in the system. We run a Pearson’s chi-squared procedure to test the null20igure 8: The log-log (base 10) relationship between frequency and simulated sojourn time is shownwith red dots, and the same relationship between frequency and observed sojourn time is shownwith blue dots.hypothesis that the simulated distribution is consistent with observed data. The p -value of the testis 0 . Our simulation model can facilitate analysis to provide SEP staff with insights regarding: (i) specificclients who are likely to enter a passive state or exit the system, and (ii) dispatch policies for themobile van. We discuss both of these in turn.
While not immediate from Table 3, our discrete-event simulation model can be used to estimatethat a 40-year old Puerto Rican male, with a history of using drugs for 21 years, who injects heroin10 times per day, wants treatment, uses syringes after others once every 30 days, and frequentsthe area of an SEP site, has a 95% chance of having entered the passive state if he does not visita service location within 56 days, assuming he has not already terminated contact with the SEPsystem. As a result, SEP staff could send a text message to such a client as a reminder if he hasnot returned within two months.
The value of actively reaching out to clients based on insights from our simulation model may befurther enhanced by mobile exchange of syringes. Here, we simulate mobile van dispatch, witha personalized notification push, to show its potential to improve current SEP operations. Thecorresponding simulation model has the following constructs and assumptions.
Clients.
We simulate the initiation and reoccurring visits of clients in the same way describedin Section 4.3 with the following exception: We assume that when a client exits the system, thereis a probability, denoted by p r , that the client is eligible to return to the system with activeintervention. We say that a client is at risk if they reuse syringes, either their own or the syringesof another, and we note that 22.4% of our 5,903 clients are at risk from the survey data. Mobile van.
We assume the SEP has one van, and each weekday the van is dispatched toone of five ZIP codes. We further assume that any client that the SEP contacts within a five-mile22adius of that ZIP code is eligible to be served by the van. We selected the five ZIP codes by solvinga facility-location model, which maximizes coverage of at-risk clients. The van visits each of thesefive ZIP codes in turn, Monday-Friday, each week over the simulation horizon.
Risky behavior.
We assume only at-risk clients engage in risky behavior. And, we assumean at-risk client does not engage in risky behavior if the client is in the CTMC’s active state, butotherwise the client does exhibit risky behavior.
Intervention.
The SEP cannot observe a client’s state or behavior, and hence interventiondecisions are made knowing the time since the client’s last visit and the client’s predictors. Inparticular, γ is the rate associated with the exponential distribution governing the return time, if the client is in the active state. We assume the SEP contacts a client if: (i) the time since the lastvisit exceeds the 0.9-level quantile for the active state’s exponential return-time distribution, and(ii) the client’s ZIP code is within five miles of the ZIP code the van is visiting that day. Re-engaging a client.
An intervention can be successful if the client is in the active state,passive state, or has exited but is eligible to return (with probability p r ). Among these clients, welet p s denote the probability that a contacted client will visit the van, and hence re-engage withthe SEP. If a client ignores the notification, we assume the client stays in the same state: active,passive, or exited. We further assume the SEP stops contacting a client after three notificationattempts have been ignored.Using the simulation model, we compare active intervention with current SEP operations, andwe estimate the relative effectiveness by examining: (i) the number of additional arrivals to thesystem, and (ii) the number of times an SEP intervention re-engages a client who would otherwisebe engaging in risky behavior. We estimate p r as follows. The 0.975-level quantile of the observedinter-arrival time is 552 days, and we use this as a proxy for whether the client has exited thesystem. We denote the number of inter-arrival times exceeding 552 days by N r , and the number ofclients who have not visited any service location after 552 days by N e . We estimate p r via: p r = N r N r + N e . With our data, N r = 1354 and N e = 4281, and so p r ≈ .
24. As we indicate above, we use the 0.9-level quantile for client-specific return-time distributions from the active state to make interventiondecisions, and Figure 9 shows a histogram of these quantiles among the surveyed clients.We run the simulation model under two different system designs: (i) with active intervention viavan dispatch and client notification, and (ii) without active intervention, approximating current SEPoperations. The results are shown in Table 5 as we parametrically vary the success probability, p s .After the warm-up period of 5,000 days, the average number of daily arrivals in the last 2,310 dayshas a mean of 24.6 and a 95% confidence interval halfwidth of 0.9. For each value of p s in the table,we run the simulation 20 times and present 95% confidence intervals. The “Total” column under23igure 9: Histogram of the 90% client-specific quantile values for the return-time distributions fromthe CTMC’s active state for the 5,903 surveyed clients.“No. of Interventions” shows the number of times over 2,310 days that a notified client visited themobile van, and the “Risky” column indicates the subset of those successful interventions involvingat-risk clients in the passive or exited state. p s Average Daily No. of InterventionsArrivals Risky Total0 24 . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . p s = 0) and SEP with active interventionvia van dispatch and client notification. Here, p s denotes the success probability associated withclient notification, and the ± values represent 95% confidence interval halfwidths.Under current SEP operations ( p s = 0), we see about 56,800 client arrivals over the 2,310-dayhorizon. Table 5 shows that a success probability of p s = 0 .
05 leads to about one interventionper day over that time horizon, i.e., less than 4% of the nominal total number of arrivals wouldbe the immediate result of an intervention. However, because a successful intervention leads to are-engaged client returning to the active state, we see a significant 20% growth of about five arrivalsper day relative to the p s = 0 case. And, this corresponds to about one client per week who stoppedrisky behavior because of the intervention. Having only one client per day visiting the van may seemlow, but a regularly scheduled van would also attract other clients not included in our model. More24mportantly, a 20% growth in average daily arrivals would be a welcome improvement counteringsome of the trends we point to in Section 2.2. We see these results as suggesting that there isvalue in the SEP investigating an active intervention scheme similar to our example. Moreover, oursimulation model makes it possible to exploit the contextual nature of our model of inter-arrivaltimes to determine a personalized threshold for push notification. In this paper, we examine the survey and transaction data of one major syringe exchange serviceprovider in the Chicago metropolitan area. We find there is a discrepancy between a slightlydecreasing trend in the number of client transactions with our SEP and an increasing numberof heroin users in Chicago and the United States. We also discover significant differences in thebehavior of clients in terms of how they engage with the SEP based on demographic attributes andfurther personal characteristics. Based on our observations, we focus on producing personalizedpredictions for clients that can aid the SEP in improving the system such as intervention initiativesfor clients with certain attributes.Standard stochastic models, such as Poisson processes, fail to accurately capture the observedinter-arrival process. Therefore, we formulate a CTMC-based simulation model to represent aclient’s path through the system. Our model consists of three sub-models: initiation, reoccurringvisits, and termination, with their parameters learned from linear and logistic regression modelsintegrated into the procedure by which we estimate the model’s parameters. With the aid of thismodel, SEP staff and researchers can analyze the system parameters to draw useful conclusions forgroups with different traits, so that proper actions can be taken towards a specific target group,or even the individual PWID. The quantitative model, combined with the personal interactionwith each client can inform SEP staff of timely intervention opportunities. Such personalizedrecommendations may be particularly useful when the SEP faces challenges in tracking a largenumber of clients. Our simulation model can also help SEP staff evaluate the effectiveness ofcandidate initiatives.In the future, our method could be enhanced by finding an algorithm to fit the parametersfor higher fidelity Markov chain models since our optimization model for parameter estimationdepends on the closed-form representation of the Coxian distribution. Other functional forms forthe predictive models could be integrated into the fitting procedure as well. Further sensitivityanalysis could be performed to evaluate other potential initiatives beyond dispatching the mobilevan. Optimization over the location and route of the van could also be investigated using oursimulation platform. 25 cknowledgments
We thank Dr. Lawrence J. Ouellet from the University of Illinois at Chicago, Dr. Alexander Gut-fraind from Loyola University, Chicago, Dan Bigg from CRA and many other colleagues fromCOIP and CRA who provided insights and access to data that greatly assisted the research. Weare grateful to two referees and an Associate Editor, whose suggestions significantly improved thepaper.
References [1] Abuse, Substance and Administration, Mental Health Services (2014) Results from the 2013national survey on drug use and health: Summary of national findings.
NSDUH Series H-48,HHS Publication No. (SMA) 14-4863. [2] Aksin, Z., Armony, M., and Mehrotra, V. (2007) The modern call center: a multi-disciplinaryperspective on operations management research.
Production and Operations Management ,16(6):665–688.[3] Asmussen, S. (2003) Applied probability and queues.
Stochastic Modelling and Applied Prob-ability , 51(1):355–426.[4] Bean, P. (1993)
Cocaine and Crack: Supply and Use . Palgrave Macmillan, London.[5] Bibinger, M. (2013). Notes on the sum and maximum of independent exponentially distributedrandom variables with different scale parameters. arXiv:1307.3945. URL https://arxiv.org/pdf/1307.3945.pdf .[6] Bluthenthal, R. N., Anderson, R., Flynn, N. M., and Kral, A. H. (2007) Higher syringe coverageis associated with lower odds of HIV risk and does not increase unsafe syringe disposal amongsyringe exchange program clients.
Drug and Alcohol Dependence , 89(2):214–222.[7] Bluthenthal, R. N., Kral, A. H., Gee, L., Erringer, E. A., and Edlin, B. R. (2000) The effect ofsyringe exchange use on high-risk injection drug users: a cohort study.
AIDS , 14(5):605–611.[8] Bobbio, A. and Cumani, A. (1992) ML estimation of the parameters of a PH distribution intriangular canonical form.
Computer Performance Evaluation , 22:33–46.[9] Braine, N., Des Jarlais, D. C., Ahmad, S., Purchase, D., and Turner, C. (2004) Long-termeffects of syringe exchange on risk behavior and HIV prevention.
AIDS Education and Pre-vention , 16(3):264–275. 2610] Buchholz, P., Kriege, J., and Felko, I. (2014)
Input Modeling with Phase-Type Distributionsand Markov Models: Theory and Applications , chapter 2. Springer Briefs in Mathematics.[11] Chehrazi, N., Glynn, P. W., and Weber, T. A. (2019) Dynamic credit-collections optimization.
Management Science , 65(6):2737–2769.[12] Cheung, K. L., Klooster, P. M., Smit, C., de Vries, H., and Pieterse, M. E. (2017) The impactof non-response bias due to sampling in public health studies: A comparison of voluntaryversus mandatory recruitment in a Dutch national survey on adolescent health.
BMC PublicHealth , 17(1):276–286.[13] Clarke, K., Harris, D., Zweifler, J.A., Lasher, M., Mortimer, R. B., and Hughes, S. (2016) Thesignificance of harm reduction as a social and health care intervention for injecting drug users:An exploratory study of a needle exchange program in Fresno, California.
Social Work PublicHealth , 31(5):398–407.[14] DeSimone, J. (2005) Needle exchange programs and drug injection behavior.
Journal of PolicyAnalysis and Management , 24(3):559–577.[15] Faddy, M., Graves, N., and Pettitt, A. (2009) Modeling length of stay in hospital and otherright skewed data: comparison of phase-type, gamma and log-normal distributions.
Value inHealth , 12(2):309–314.[16] Faddy, M. and McClean, S. (1999) Analysing data on lengths of stay of hospital patients usingphase-type distributions.
Applied Stochastic Models in Business and Industry , 15(4):311–317.[17] Faddy, M. and McClean, S. (2005) Markov chain modelling for geriatric patient care.
Methodsof Information in Medicine , 44(3):369–373.[18] Fomundam, S. and Herrmann, J. W. (2007) A survey of queuing theory applications inhealthcare. Technical report, Institute for Systems Research, University of Maryland. URL https://drum.lib.umd.edu/bitstream/handle/1903/7222/tr_2007-24.pdf .[19] Gardner, W., Mulvey, E. P., and Shaw, E. C. (1995) Regression analyses of counts andrates: Poisson, overdispersed Poisson, and negative binomial models.
Psychological Bulletin ,118(3):392–404.[20] Govil, M. K. and Fu, M. C. (1999) Queueing theory in manufacturing: a survey.
Journal ofManufacturing Systems , 18(3):214–240.[21] Hay, B., Henderson, C., Maltby, J., and Canales, J. J. (2017) Influence of peer-based needleexchange programs on mental health status in people who inject drugs: A nationwide NewZealand study.
Frontiers in Psychiatry , 7:211.2722] Holtzman, D., Barry, V., Ouellet, L. J., Des Jarlais, D. C., Vlahov, D., Golub, E. T., Hudson,S. M., and Garfein, R. S. (2009) The influence of needle exchange programs on injection riskbehaviors and infection with hepatitis C virus among young injection drug users in select citiesin the United States, 1994-2004.
Preventive Medicine , 49(1):68–73.[23] Huo, D., Bailey, S. L., and Ouellet, L. J. (2006) Cessation of injection drug use and change ininjection frequency: the Chicago needle exchange evaluation study.
Addiction , 101(11):1606–1613.[24] Huo, D. and Ouellet, L. J. (2007) Needle exchange and injection-related risk behaviors inChicago: a longitudinal study.
Journal of Acquired Immune Deficiency Syndromes , 45(1):108–114.[25] Kidorf, M. and King, V. L. (2008) Expanding the public health benefits of syringe exchangeprograms.
The Canadian Journal of Psychiatry , 53(8):487–495.[26] Lakshmi, C. and Iyer, S. A. (2013) Application of queueing theory in health care: a literaturereview.
Operations Research for Health Care , 2(1):25–39.[27] Liu, Y., Li, S., Li, F., Song, L., and Rehg, J. M. (2015) Efficient learning of continuous-timehidden Markov models for disease progression. In
Advances in Neural Information ProcessingSystems , pages 3600–3608.[28] Locker, D., Wiggins, R., Sittampalam, Y., and Patrick, D. L. (1981) Estimating the prevalenceof disability in the community: the influence of sample design and response bias.
Journal ofEpidemiology & Community Health , 35(3):208–212.[29] Nelson, B. L. and Gerhardt, I. (2010) On capturing dependence in point processes: matchingmoments and other techniques. Technical report, Northwestern University. URL http://users.iems.northwestern.edu/~nelsonb/Publications/GerhardtNelsonSurvey.pdf .[30] Paddock, S. M., Kilmer, B., Caulkins, J. P., Booth, M. J., and Pacula, R. L. (2012) Anepidemiological model for examining marijuana use over the life course.
Epidemiology ResearchInternational , volume 2012, article ID 520894.[31] Papaioannou, T. (2014) Censoring. In Balakrishnan, N., Colton, T., Everitt, B., Piegorsch,W., Ruggeri, F., and Teugels, J. L., editors,
Wiley StatsRef: Statistics Reference Online .American Cancer Society.[32] W¨achter, A. and Biegler, L. T. (2006) On the implementation of an interior-point filterline-search algorithm for large-scale nonlinear programming.
Mathematical Programming ,106(1):25–57. 28
Covariates for the PWIDs Population
When a client initiation occurs in the simulation, we draw a client at random, with replacement,from our dataset of 5,903 unique clients. In this appendix we detail those attributes, beginningwith the fraction of the population with each categorical attribute:GenderMale Female Transgender0.6947 0.3049 0.0004EthnicityWhite African American Puerto Rican Mexican Other Latino Other0.5158 0.2345 0.1478 0.0617 0.0122 0.0280Snort before injectionYes No0.3446 0.6554Participation inshooting galleriesYes No0.0864 0.9136Participation in treatment programsCurrently in Been in Tried to get into Interested inYes 0.1011 0.1870 0.0923 0.4738No 0.8989 0.8130 0.9077 0.5362Drugs used in the past 30 daysSpeedball Heroin Cocaine Ritalin Heroin OtherYes 0.0486 0.9582 0.0581 0.0005 0.0185No 0.9514 0.0418 0.9419 0.9995 0.9815Source of syringesFamily Friends Acquaintance Strangers Other SEP OtherYes 0.0586 0.2529 0.0530 0.0163 0.1360 0.6131No 0.9414 0.7471 0.9470 0.9837 0.8640 0.3869Reuse own syringesYes No0.1579 0.8521Use syringesbehind othersYes No0.1972 0.8028For each of the continuous factors, we present descriptive statistics and histograms of their distri-butions. We use µ to denote the mean of the factor and σ to denote the standard deviation.29 Age: µ = 34 . , σ = 11 . • Age of clients at their first injection: µ = 23 . , σ = 7 . • Length of drug injection history: µ = 11 . , σ = 11 . • Number of daily drug injections: µ = 2 . , σ = 1 . • Number of times reusing own syringes in 30 days: µ = 1 . , σ = 6 . • Number of times using others’ used syringes in a 30 days: µ = 0 . , σ = 1 . • Number of times visiting the area of service locations in 30 days: µ = 23 . , σ = 9 . B Statistical Significance Results of Fitted Parameters