Parameter inference from hitting times for perturbed Brownian motion
M. Tamborrino · S. Ditlevsen · P. Lansky

Abstract
A latent internal process describes the state of some system, e.g. the social tension in a political conflict, the strength of an industrial component or the health status of a person. When this process reaches a predefined threshold, the process terminates and an observable event occurs, e.g. the political conflict finishes, the industrial component breaks down or the person has a heart attack. Imagine that an intervention, e.g. a political decision, maintenance of a component or a medical treatment, is initiated on the process before the event occurs. How can we evaluate whether the intervention had an effect? To answer this question, we describe the effect of the intervention through parameter changes of the law governing the internal process. The time interval between the start of the process and the final event is then divided into two subintervals: the time from the start to the instant of intervention, denoted by $S$, and the time between the intervention and the threshold crossing, denoted by $R$. The first question studied here is: what is the joint distribution of $(S, R)$? The theoretical expression is provided and serves as a basis to answer the main question: can we estimate the parameters of the model from observations of $S$ and $R$ and compare them statistically? Maximum likelihood estimators are illustrated on simulated data under the assumption that the process before and after the intervention is described by the same type of model, i.e. a Brownian motion, but with different parameters.

P. Lansky
Institute of Physiology, Academy of Sciences of the Czech Republic, Videnska 1083, 142 20 Prague 4, Czech Republic
E-mail: [email protected]

S. Ditlevsen
Department of Mathematical Sciences, Copenhagen University, Universitetsparken 5, DK-2100 Copenhagen, Denmark
Tel.: +45 35320785
Fax: +45 35320704
E-mail: [email protected], [email protected]
Keywords: first passage times; maximum likelihood estimation; Wiener process; degradation process; reliability; effect of intervention
1 Introduction

Statistical inference for univariate stochastic processes from observations of hitting times, i.e. epochs when the process attains a boundary for the first time, is a common problem, see Lee and Whitmore (2006) and references therein. Here we investigate its specific variant for perturbed stochastic processes and discuss it in a general setting, presenting some of the fields in which this methodology can be applied. At a known time instant, either controlled by an experimentalist or induced by an independent external condition, an intervention is initiated, and the time to a given event following the intervention is measured. Assume that the intervention causes a change in the parameters of the underlying process. This scenario can be found in many fields, such as reliability theory, social sciences, finance, biology or medicine. The time course of the intervention can be interpreted as a time-varying explanatory factor in a threshold regression.

For analysing the reliability of technical systems it is important to investigate damage processes. A common model is the Wiener process (Whitmore, 1995; Whitmore and Schenkelberg, 1997; Whitmore et al., 1998, 2012; Kahle and Lehmann, 1998). In Pieper et al. (1997), changing drifts of Wiener processes describe various stress levels for a damage process. Doksum and Hoyland (1992) use a Gaussian process and the inverse Gaussian distribution (IGD) to discuss a lifetime model under a step-stress accelerated life test. Nelson (2008) discusses practical issues when conducting an accelerated life test. Yu (2003) proposed a systematic approach to the classification problem where the products' degradation paths follow Wiener processes. Our model fits into the above framework as follows. The degradation of a component is modeled by a Wiener process, with failure corresponding to the first crossing of a certain level. The time of maintenance is independent of the time since the last repair, and the maintenance changes the parameters of the Wiener process.
Then, from measurements of the time from the last repair to the time of maintenance, and from the maintenance to the failure, we deduce the effect of the maintenance on the system. Similarly to technical systems, a degradation process in a medical context is commonly modeled as an intrinsic, but not observable, diffusion process. With this interpretation, our model takes into account an abrupt change of medication or lifestyle before an observable event takes place. For example, in Commenges and Hejblum (2013) the event is myocardial infarction or coronary heart disease and the degradation is the atheromatous process, which is modeled as a Brownian motion with drift, where the drift is a function of explanatory variables.

Lancaster (1972) makes effective use of the IGD in describing data on the duration of strikes in the UK between 1965 and 1972. The rationale is based on the idea of an underlying Wiener process. Although alternative distributions have been proposed (Kennan, 1985; Lawrence, 1984; Newby and Winterton, 1983), the approach via the first passage time (FPT) of the Wiener process remains one of the alternatives (Harrison and Stewart, 1993; Desmond and Yang, 2011). Linden (2000) extends Lancaster's approach by deriving the strike duration density from a controlled Wiener process. The FPT distribution of a controlled Wiener process is related to the IGD, and it is shown that, since the maximum likelihood estimates of expected strike duration with the FPT density from a controlled process and with the IGD are the same, the IGD offers a simple and valid approach to the analysis of strike duration. Again, the model studied in this paper fits this setting. Imagine that during a strike an important offer is made to the strikers. Then the time afterwards may evolve on a different scale.

In neuroscience, the interval between two consecutive action potentials is often studied, as it is related to information transfer in neurons.
The Wiener process is sometimes chosen to model the subthreshold membrane potential evolution of the neuron (Gerstein and Mandelbrot, 1964), and parameter estimation has been investigated (Lansky and Ditlevsen, 2008). Moreover, estimation from observations of the last action potential before the intervention and the next one following it, also in the presence of a delayed response to the stimulus, has been investigated (Tamborrino et al., 2012, 2013). The current model also fits this framework.

The aim of this paper is to solve two problems. The first is the investigation of the joint distribution of the subintervals up to the instant of intervention, and between the intervention and the first crossing after it. This is needed for the second problem, namely the estimation of the parameters of the process before and after the intervention, and testing their equality. Obviously, the two subintervals are dependent, and the statistical inference is complicated by not observing the position of the process at the time of intervention. The main contributions of the paper are the solutions to these questions in the case of a perturbed Brownian motion. A detailed guideline on how to carry out both simulation of the data and parameter estimation in the computing environment R (R Development Core Team, 2011) is presented (see Appendices B and C). Using the derived theoretical expressions, estimation could be carried out for more complicated diffusion processes.

In Section 2 the character of the experimental data together with a description of the involved quantities and variables is presented. In Section 3 we describe the model, mathematically define the quantities of interest and derive the probability densities for a general diffusion process. The Brownian motion model under different assumptions on its parameters is treated in Section 4. The performance of maximum likelihood estimators and testing the difference between parameters are illustrated in Section 5 on simulated data.
Fig. 1 Schematic illustration of a single trial. At time 0, an intervention is initiated, dividing the observed interval into two subintervals: the time $S$ up to the instant of intervention, and the time $R$ between the intervention and the first crossing after it. The random position of the process at time 0 is denoted by $X(0)$
2 Experimental data

The character of the experimental data and the description of the involved quantities are illustrated in Fig. 1. At a time independent of when the process started, an intervention is initiated, and both the time the process has run and the time to an event after the intervention are measured. The time of the intervention is set to 0 for convenience. The intervention divides the observed interval into two subintervals: the time from the start of the process to the instant of intervention, denoted by $S$, and the time between the intervention and an event after it, denoted by $R$. Thus, the observed interval has length $S + R$. The experiment is repeated $n$ times. This yields $n$ independent and identically distributed pairs of intervals $(S_i, R_i)$, for $i = 1, \ldots, n$. Note that $S_i$ and $R_i$ are not independent.

3 The model

We describe the dynamics of the system by a diffusion process $X(t)$, starting at some initial value $x_0$. An event occurs when $X$ exceeds a threshold $B > x_0$ for the first time, which by assumption has not happened before time 0. The (unobserved) position of the process at the time of the intervention is $X(0)$. Thus, $t$ runs in the interval $[-S, R]$ with $S, R > 0$, and we assume $X(t)$ given as the solution to a stochastic differential equation
\[
\begin{cases}
dX(t) = \nu(X(t), t)\,dt + \sigma(X(t), t)\,dW(t),\\
X(-S) = x_0, \quad X(R) = B, \quad X(t) < B \ \text{for } t \in [-S, R),
\end{cases}
\]
where $W(t)$ is a standard (driftless) Wiener process. We consider $\nu(X(t), t) = \nu_1(X(t))$ and $\sigma(X(t), t) = \sigma_1(X(t))$ for $t < 0$, and assume that the intervention causes a change in the parameters of the underlying process to $\nu(X(t), t) = \nu_2(X(t))$, and likewise for $\sigma(X(t), t)$. If there is no intervention, the standard approach is to study the FPT of $X(t)$ through the constant boundary $B$, denoted by $T$. This is the same as the intervention having no effect. Thus, define
\[
T = S + \inf\{t > 0 : X(t) \geq B \mid \nu = \nu_1, \sigma = \sigma_1\}.
\]
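The mechanics of the model can be illustrated with a naive Euler–Maruyama simulation: run the process with the pre-intervention parameters for a given lag $s$ (kept fixed here for simplicity; in the model $S$ is random), reject paths that cross $B$ before time 0, then switch parameters and run until the first crossing. This is an illustrative sketch under assumed parameter values, not the paper's own (R-based) procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
B, x0 = 10.0, 0.0                 # threshold and starting point
mu1, sig1 = 1.0, np.sqrt(0.4)     # pre-intervention drift / diffusion (assumed)
mu2, sig2 = 0.5, np.sqrt(0.4)     # post-intervention parameters (assumed)
dt = 1e-3

def simulate_pair(s, max_t=1000.0):
    """Simulate X on [-s, R]: reject paths that cross B before the
    intervention at time 0, then run with (mu2, sig2) until crossing."""
    n_pre = int(s / dt)
    while True:  # rejection step: condition on T > s
        incr = mu1 * dt + sig1 * np.sqrt(dt) * rng.standard_normal(n_pre)
        path = x0 + np.cumsum(incr)
        if path.max() < B:
            break
    x_at_0 = path[-1]             # the unobserved position X(0)
    x, t = x_at_0, 0.0
    while x < B and t < max_t:    # post-intervention evolution
        x += mu2 * dt + sig2 * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return x_at_0, t              # t plays the role of the observed R

x0_sim, r_sim = simulate_pair(s=5.0)
print(x0_sim, r_sim)
```

The rejection step implements the conditioning $X(t) < B$ for $t < 0$; the exact sampling scheme used in the paper is described in Appendix B.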
Here T isnot observed, but we can still consider its distribution. arameter inference from hitting times 5 S , X (0) , R and ( S, R ) It is well known from the theory of point processes that the backward recur-rence time S is length biased, and the density is a functional of the distributionof T . In particular, the probability density function (pdf) of S is given by (Coxand Lewis, 1966), f S ( s ) = ¯ F T ( s ) E [ T ] (1)where ¯ F T ( s ) = 1 − F T ( s ) = P ( T > s ) denotes the survival function, and E [ T ] is the mean of T . The first two moments of S are given by, (Cox and Lewis,1966), E [ S ] = E [ T ]2 E [ T ] ; Var [ S ] = 4 E [ T ] E [ T ] − E [ T ] E [ T ] . (2)The conditional density of X (0) given that B has not been crossed up to time is, (Aalen and Gjessing, 2001), f X (0) ( x | s ) = ∂∂x P ( X (0) < x, T > s ) P ( T > s ) = f aX (0) ( x, s )¯ F T ( s ) , (3)where f aX (0) ( x, s ) denotes the pdf of the process at time in presence of aconstant absorbing boundary and given that X ( − S ) = 0 . The unconditionaldensity of X (0) is given by f X (0) ( x ) = (cid:90) ∞ f X (0) ( x | s ) f S ( s ) ds = 1 E [ T ] (cid:90) ∞ f aX (0) ( x, s ) ds, (4)where we used (1) and (3). The variable R coincides with the FPT of X throughthe boundary B , when the process starts in the random position X (0) < B with conditional density f R | X (0) ( r | x ) . The unconditional pdf of R is given by f R ( r ) = (cid:90) B −∞ f R | X (0) ( r | x ) f X (0) ( x ) dx. 
(5)

The joint pdf of $(S, R)$ is
\[
f_{(S,R)}(s, r) = \frac{1}{E[T]} \int_{-\infty}^B f_{R \mid X(0)}(r \mid x)\, f^a_{X(0)}(x, s)\,dx \tag{6}
\]
since
\[
F_{(S,R)}(s, r) = \int_0^s P(R < r \mid S = u) f_S(u)\,du
= \int_0^s \int_{-\infty}^B P(R < r \mid X(0) = x, S = u) f_{X(0)}(x \mid u) f_S(u)\,dx\,du \\
= \int_0^s \int_{-\infty}^B \int_0^r f_{R \mid X(0)}(t \mid x) f_{X(0)}(x \mid u) f_S(u)\,dt\,dx\,du
= \frac{1}{E[T]} \int_0^s \int_{-\infty}^B \int_0^r f_{R \mid X(0)}(t \mid x) f^a_{X(0)}(x, u)\,dt\,dx\,du,
\]
where we condition on $X(0)$, then use the Markov property, and finally insert (1) and (3).

4 The Brownian motion model

Consider a Wiener process $X$ with $\nu_1(X(t)) = \mu_1 > 0$ and $\sigma_1(X(t)) = \sigma_1 > 0$ for $t < 0$, and assume that the intervention causes a change in the parameters of the underlying process to $\mu_2, \sigma_2 > 0$. Because of the space homogeneity, set $x_0 = 0$ without loss of generality. Since $X$ is a Wiener process with positive drift, $T$ follows an IGD, $T \sim IG(B/\mu_1, B^2/\sigma_1^2)$, with mean $E[T] = B/\mu_1$ and variance $\operatorname{Var}[T] = B\sigma_1^2/\mu_1^3$ (Chhikara and Folks, 1989). The pdf of $S$ follows from (1),
\[
f_S(s) = \frac{\mu_1}{B} \left\{ \Phi\!\left( \frac{B - \mu_1 s}{\sqrt{\sigma_1^2 s}} \right) - \exp\!\left[ \frac{2\mu_1 B}{\sigma_1^2} \right] \Phi\!\left( \frac{-B - \mu_1 s}{\sqrt{\sigma_1^2 s}} \right) \right\}, \tag{7}
\]
where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal distribution. Inserting the first three moments of $T$ into (2), we get
\[
E[S] = \frac{B\mu_1 + \sigma_1^2}{2\mu_1^2}; \quad \operatorname{Var}[S] = \frac{1}{3}\left( \frac{B\mu_1 + 3\sigma_1^2}{2\mu_1^2} \right)^2; \quad CV(S) = \frac{B\mu_1 + 3\sigma_1^2}{\sqrt{3}\,(B\mu_1 + \sigma_1^2)}, \tag{8}
\]
where $CV(S)$ denotes the coefficient of variation of $S$, defined as the ratio between the standard deviation and the mean. The pdf of $X(0)$ in the presence of a constant absorbing boundary $B$ is (Aalen and Gjessing, 2001; Cox and Miller, 1977; Giraudo et al., 2011; Sacerdote and Giraudo, 2013)
\[
f^a_{X(0)}(x, s) = \frac{1}{\sqrt{2\pi\sigma_1^2 s}} \left\{ \exp\!\left[ -\frac{(x - \mu_1 s)^2}{2\sigma_1^2 s} \right] - \exp\!\left[ \frac{2\mu_1 B}{\sigma_1^2} - \frac{(x - 2B - \mu_1 s)^2}{2\sigma_1^2 s} \right] \right\}, \tag{9}
\]
for $x \in (-\infty, B)$.
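Equations (1), (7) and (8) can be cross-checked numerically. The following Python/SciPy sketch is ours (the paper's computations are in R); the values $B = 10$, $\mu_1 = 1$, $\sigma_1^2 = 0.4$ are those used in the simulation study of Section 5.1.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

B, mu1, s1sq = 10.0, 1.0, 0.4

# T ~ IG(B/mu1, B^2/s1sq). SciPy's invgauss(mu, scale) has mean mu*scale
# and shape parameter scale, so use mu = mean/shape, scale = shape.
shape = B**2 / s1sq
T = stats.invgauss(mu=(B / mu1) / shape, scale=shape)

def f_S(s):
    """Backward-recurrence density of S, eq. (1): survival of T over E[T]."""
    return T.sf(s) / T.mean()

def f_S_closed(s):
    """Closed form (7) for the Wiener case."""
    z = np.sqrt(s1sq * s)
    return (mu1 / B) * (stats.norm.cdf((B - mu1 * s) / z)
                        - np.exp(2 * mu1 * B / s1sq)
                        * stats.norm.cdf((-B - mu1 * s) / z))

mass, _ = quad(f_S, 0, np.inf)                    # should be 1
ES, _ = quad(lambda s: s * f_S(s), 0, np.inf)     # should match eq. (8)
print(mass, ES, (B * mu1 + s1sq) / (2 * mu1**2))  # (B mu1 + s1sq)/(2 mu1^2) = 5.2
print(max(abs(f_S(s) - f_S_closed(s)) for s in (0.5, 2.0, 10.0, 30.0)))
```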
Inserting (9) into (4), we get
\[
f_{X(0)}(x) = \frac{1}{B}\left[ \exp\!\left( \frac{\mu_1 (x - |x|)}{\sigma_1^2} \right) - \exp\!\left( \frac{2\mu_1 (x - B)}{\sigma_1^2} \right) \right]. \tag{10}
\]
The mean and variance of $X(0)$ are given by
\[
E[X(0)] = \frac{B\mu_1 - \sigma_1^2}{2\mu_1}, \qquad \operatorname{Var}[X(0)] = \frac{B^2\mu_1^2 + 3\sigma_1^4}{12\mu_1^2}. \tag{11}
\]
The distribution of $R$ conditioned on $X(0) = x$ is $R \mid X(0) \sim IG\!\left( (B-x)/\mu_2, (B-x)^2/\sigma_2^2 \right)$. Plugging this and (10) into (5), we obtain
\[
f_R(r) = \frac{\mu_2}{B}\left[ \Phi\!\left( \frac{B - \mu_2 r}{\sigma_2 \sqrt{r}} \right) - \Phi\!\left( \frac{-\mu_2 \sqrt{r}}{\sigma_2} \right) \right]
+ \frac{\mu_2\sigma_1^2 - 2\mu_1\sigma_2^2}{B\sigma_1^2} \exp\!\left( \frac{2\mu_1 r (\mu_1\sigma_2^2 - \mu_2\sigma_1^2)}{\sigma_1^4} \right) \\
\times \left[ \exp\!\left( \frac{2\mu_1 B}{\sigma_1^2} \right) \Phi\!\left( \frac{-B\sigma_1^2 + \mu_2 r\sigma_1^2 - 2\mu_1 r\sigma_2^2}{\sigma_1^2 \sigma_2 \sqrt{r}} \right) - \Phi\!\left( \frac{\mu_2 r\sigma_1^2 - 2\mu_1 r\sigma_2^2}{\sigma_1^2 \sigma_2 \sqrt{r}} \right) \right].
\]
Finally, using (9) and $f_{R \mid X(0)}$ in (6), we get
\[
f_{(S,R)}(s, r) = \frac{\mu_1}{B\sqrt{2\pi(\sigma_1^2 s + \sigma_2^2 r)^3}} \exp\!\left\{ -\frac{(B - \mu_1 s - \mu_2 r)^2}{2(\sigma_1^2 s + \sigma_2^2 r)} \right\} \\
\times \Bigg\{ \left[ (B - \mu_1 s)\sigma_2^2 + s\mu_2\sigma_1^2 \right] \Phi\!\left( \frac{\sqrt{r}\,[(B - \mu_1 s)\sigma_2^2 + s\mu_2\sigma_1^2]}{\sigma_1\sigma_2\sqrt{s(\sigma_1^2 s + \sigma_2^2 r)}} \right) \\
- \exp\!\left\{ \frac{2rB(\mu_1\sigma_2^2 - \mu_2\sigma_1^2)}{\sigma_1^2(\sigma_1^2 s + \sigma_2^2 r)} \right\} \left[ (-B - \mu_1 s)\sigma_2^2 + \mu_2\sigma_1^2 s \right] \Phi\!\left( \frac{[(-B - \mu_1 s)\sigma_2^2 + \mu_2\sigma_1^2 s]\sqrt{r}}{\sigma_1\sigma_2\sqrt{s(\sigma_1^2 s + \sigma_2^2 r)}} \right) \Bigg\}. \tag{12}
\]
No closed expressions for $CV(R)$ and for the covariance and correlation of $S$ and $R$ are available, except for $\sigma_i^2 = k\mu_i$, $k > 0$, as described below. In Fig. 2 we illustrate $CV(S)$ given by (8) and numerically approximated $CV(R)$, $\operatorname{Cov}(S, R)$ and $\operatorname{Corr}(S, R)$ for the parameter values used in Section 5. Note that when $\mu_2 \to \infty$, the expected time to an event after the intervention goes to zero, $E[R] \to 0$. Also $\operatorname{Var}[R] \to 0$, whereas $CV(R)$ does not, as shown in Fig. 2.

4.1 Special case: squared diffusion coefficients proportional to the drifts

Up to now, we made no assumptions on the relation between changes in the drift and changes in the variance of the Wiener process. However, in many applications larger values of a variable are followed by a larger variance.
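The density (10) and the moments (11) can be verified by direct numerical integration. As above, this is an illustrative Python sketch with the Section 5.1 parameter values, not the paper's R code.

```python
import numpy as np
from scipy.integrate import quad

B, mu1, s1sq = 10.0, 1.0, 0.4

def f_X0(x):
    """Unconditional density (10) of the position X(0) at the intervention."""
    return (np.exp(mu1 * (x - abs(x)) / s1sq)
            - np.exp(2 * mu1 * (x - B) / s1sq)) / B

def moment(p):
    # integrate x^p f(x) over (-inf, 0] and [0, B] separately (kink at 0)
    g = lambda x: x**p * f_X0(x)
    return quad(g, -np.inf, 0)[0] + quad(g, 0, B)[0]

mass, m1, m2 = moment(0), moment(1), moment(2)
print(mass)                                       # should be 1
print(m1, (B * mu1 - s1sq) / (2 * mu1))           # mean from eq. (11): 4.8
print(m2 - m1**2, (B**2 * mu1**2 + 3 * s1sq**2) / (12 * mu1**2))
```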
This is formalized, for example, by the well-known psychophysical Weber's law, claiming that the standard deviation of the signal is proportional to its strength (Laming, 1986). Applying this law to the IGD by relating mean and standard deviation, given prior to eq. (7), we obtain that $\sigma^2$ is proportional to $\mu$. An analogous result can be derived from the diffusion approximation procedure (Lansky and Sacerdote, 2001). We therefore assume the squared diffusion coefficients proportional to the drift coefficients, i.e. $\sigma_i^2 = k\mu_i$, for $k > 0$, $i = 1, 2$. The above expressions simplify to
\[
E[S] = \frac{B + k}{2\mu_1}, \quad \operatorname{Var}[S] = \frac{(B + 3k)^2}{12\mu_1^2}, \quad CV(S) = \frac{B + 3k}{\sqrt{3}\,(B + k)}, \tag{13}
\]
\[
E[X(0)] = \frac{B - k}{2}, \quad \operatorname{Var}[X(0)] = \frac{B^2 + 3k^2}{12},
\]
\[
f_R(r) = \frac{\mu_2}{B}\left\{ \Phi\!\left( \frac{B - \mu_2 r}{\sqrt{k\mu_2 r}} \right) - \exp\!\left( \frac{2B}{k} \right) \Phi\!\left( \frac{-B - \mu_2 r}{\sqrt{k\mu_2 r}} \right) \right\} = \frac{\bar F_{T^*}(r)}{E[T^*]},
\]
where $T^*$ denotes the FPT through $B$ of the Wiener process starting in 0 with drift $\mu_2$ and diffusion coefficient $\sqrt{k\mu_2}$. Note that $R$ is distributed as the forward recurrence time of $T^*$, just as $S$ is distributed as the backward recurrence time of $T$. Thus
\[
E[R] = \frac{B + k}{2\mu_2}, \quad \operatorname{Var}[R] = \frac{(B + 3k)^2}{12\mu_2^2}, \quad CV(R) = \frac{B + 3k}{\sqrt{3}\,(B + k)}. \tag{14}
\]
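The identification of $R$ with the recurrence time of $T^*$, and the moments in (14), can be checked numerically. The sketch below is ours, with assumed illustrative values of $k$ and $\mu_2$ (not taken from the paper's tables).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

B, k, mu2 = 10.0, 0.5, 2.0   # illustrative assumptions

# T* ~ IG(B/mu2, B^2/(k*mu2)); SciPy: invgauss(mu=mean/shape, scale=shape).
shape = B**2 / (k * mu2)
Tstar = stats.invgauss(mu=(B / mu2) / shape, scale=shape)

# R is the (forward) recurrence time of T*: f_R(r) = sf_{T*}(r) / E[T*].
f_R = lambda r: Tstar.sf(r) / Tstar.mean()
mass = quad(f_R, 0, np.inf)[0]
ER = quad(lambda r: r * f_R(r), 0, np.inf)[0]
ER2 = quad(lambda r: r**2 * f_R(r), 0, np.inf)[0]
cv_numeric = np.sqrt(ER2 - ER**2) / ER

print(mass)                                               # should be 1
print(ER, (B + k) / (2 * mu2))                            # eq. (14): 2.625
print(cv_numeric, (B + 3 * k) / (np.sqrt(3) * (B + k)))   # equals CV(S) too
```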
Fig. 2
Theoretical CVs of $S$ and $R$ and $\operatorname{Corr}(S, R)$ as functions of $\mu_1$, $\mu_2$ and $k$. Panel A) No further assumptions are made; $\mu_1 = 1$, with $\sigma_1^2$ and $\sigma_2^2$ fixed and $\mu_2$ varying. Panel B) Equal variances $\sigma_1^2 = \sigma_2^2$, for several values of $\sigma^2$; $\mu_1 = 1$ with $\mu_2$ varying, and $\mu_2 = 1$ with $\mu_1$ varying. Panel C) The variances are proportional to the drifts, i.e. $\sigma_i^2 = k\mu_i$, $k > 0$; the parameters are $\mu_1 = 1$ and $\mu_2 = 2$. Note that in this case $CV(S)$, $CV(R)$ and $\operatorname{Corr}(S, R)$ are the same for any values of $\mu_1$ and $\mu_2$, since they do not depend on $\mu_1$ and $\mu_2$ (see Section 4.1)

Interestingly, $CV(S) = CV(R)$, and they depend only on $k$ and not on the specific values of the drift coefficients. The joint pdf of $S$ and $R$ is
\[
f_{(S,R)}(s, r) = \frac{\mu_1 \mu_2}{\sqrt{2\pi k (\mu_1 s + \mu_2 r)^3}} \exp\!\left( -\frac{(B - \mu_1 s - \mu_2 r)^2}{2k(\mu_1 s + \mu_2 r)} \right) = \frac{\mu_1 \mu_2}{B}\, f_{IG(B,\, B^2/k)}(\mu_1 s + \mu_2 r), \tag{15}
\]
and the covariance and correlation of $S$ and $R$ are
\[
\operatorname{Cov}(S, R) = E[SR] - E[S]E[R] = \frac{3k^2 - B^2}{12\mu_1\mu_2}, \tag{16}
\]
\[
\operatorname{Corr}(S, R) =
\frac{\operatorname{Cov}(S, R)}{\sqrt{\operatorname{Var}[S]\operatorname{Var}[R]}} = \frac{3k^2 - B^2}{(B + 3k)^2}, \tag{17}
\]
see Appendix A. Note that the correlation can be negative, null or positive, depending on whether $0 < k < B/\sqrt{3}$, $k = B/\sqrt{3}$ or $k > B/\sqrt{3}$, respectively. Moreover, $\operatorname{Corr}(S, R) \to -1$ as $k \to 0$, i.e. $\sigma_i^2 \to 0$, while $CV(S) = CV(R) \to \sqrt{3}$ and $\operatorname{Corr}(S, R) \to 1/3$ as $k \to \infty$, i.e. $\sigma_i^2 \to \infty$, $i = 1, 2$.

5 Maximum likelihood estimation

The aim of this paper is the estimation of the parameters of $X$ from a sample $\{(s_i, r_i)\}_{i=1}^n$ of $n$ independent observations of $(S, R)$, and testing whether the intervention has an effect through the hypothesis $H_0: \mu_1 = \mu_2$. Three scenarios are considered: no information about the parameters is available; we assume equal variances $\sigma_1^2 = \sigma_2^2 = \sigma^2$; or we assume $\sigma_i^2 = k\mu_i$, as in Section 4.1. That is, we want to estimate either $\phi = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$, $\phi = (\mu_1, \mu_2, \sigma^2)$ or $\phi = (\mu_1, \mu_2, k)$. Since the $(s_i, r_i)$'s, $i = 1, \ldots, n$, are independent and identically distributed, the log-likelihood is $l_{(s,r)}(\phi) = \sum_{i=1}^n \log f_{(S,R)}(s_i, r_i)$. The maximum likelihood estimator $\hat\phi$ is found by maximizing $l_{(s,r)}$ numerically (see Appendix C). An approximate 95% confidence interval (CI) for $\phi_i$ is given by $\hat\phi_i \pm 1.96\,SE(\hat\phi_i)$, where $SE$ is the asymptotic standard error given by $SE(\hat\phi_i) = \sqrt{(I(\hat\phi)^{-1})_{ii}/n}$, where $I(\phi)$ is the Fisher information matrix (Cramer, 1946), which we approximate numerically (see Appendix C). To test the hypothesis $H_0: \mu_1 = \mu_2$ we perform a likelihood ratio test at a 5% significance level, evaluating the test statistic in a chi-squared distribution with one degree of freedom. We reject $H_0$ if $-2\log[L_0(\hat\phi_0)/L_{\mathrm{full}}(\hat\phi)] > 3.84$, where $L_0$ and $L_{\mathrm{full}}$ denote the likelihood functions of the null and full (alternative) models, evaluated in the estimated parameters $\hat\phi_0 = (\hat\mu, \hat\sigma^2)$ and $\hat\phi = (\hat\mu_1, \hat\mu_2, \hat\sigma^2)$ under the hypotheses $\mu_1 = \mu_2 = \mu$ and $\mu_1 \neq \mu_2$, respectively. This test can be applied in all the considered scenarios, but for simplicity we only report results for the case of equal variances.
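The likelihood ratio test just described amounts to comparing $2(l_{\mathrm{full}} - l_0)$ with the 95% quantile of a chi-squared distribution with one degree of freedom. A minimal Python sketch (the maximized log-likelihood values below are hypothetical, for illustration only):

```python
from scipy.stats import chi2

# Reject H0: mu1 = mu2 when -2 log(L0/Lfull) = 2*(l_full - l_null) exceeds
# the 95% chi-square(1) quantile.
threshold = chi2.ppf(0.95, df=1)
print(threshold)                     # about 3.84

# Hypothetical maximized log-likelihoods (illustrative numbers only):
l_null, l_full = -120.3, -117.1
lr_stat = 2.0 * (l_full - l_null)
print(lr_stat, lr_stat > threshold, chi2.sf(lr_stat, df=1))
```

In practice $l_0$ and $l_{\mathrm{full}}$ come from the two numerical maximizations described in Appendix C.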
Results for the other cases are similar. We assume both the parametric form of the underlying process and the relations between parameters, if any, to be known. It can be discussed whether these assumptions are realistic. Equality of diffusion coefficients, or the assumption of variance proportional to the mean, can be checked by a likelihood ratio test.

5.1 Monte Carlo simulation study

For the simulations, parameter values are chosen such that the mean of $T$ in the case of no intervention is five times its standard deviation. This is obtained

Table 1
Averages, empirical and asymptotic SEs and CPs (in percentage) over 1000 estimates of $\phi = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ for $n = 100$, when $\mu_1 = 1$, $\sigma_1^2 = 0.4$, $\mu_2$ is fixed, and $\sigma_2^2$ is varied to yield a range of values of $CV(R)$. In all cases, $CV(S) \approx 0.62$.

by setting $B = 10$, $\mu_1 = 1$, $\sigma_1^2 = 0.4$, yielding $E[T] = 10$ and $\operatorname{Var}[T] = 4$. Then $\mu_2$ and $\sigma_2^2$ are varied to investigate different regimes of the model; the effect of the intervention is thus varied through the parameters $\mu_2$ and $\sigma_2^2$. Samples of size $n = 100$ are simulated, and for each set of parameter values we repeat the simulation of the data set and the estimation 1000 times, obtaining 1000 statistically independent trials.

We calculated coverage probabilities (CPs), defined as the probability that the CI covers the true value, to evaluate the performance of the CIs. The CP should be close to $1 - \alpha$, where $\alpha = 0.05$ is the significance level, and the CI should be narrow for a reliable estimator.

The computing environment R has been used to carry out both the simulations of $(s_i, r_i)$ and the parameter estimation. A description of the simulation procedure is reported in Appendix B.

No further assumptions on the parameters

We choose $\mu_1 = 1$, $\sigma_1^2 = 0.4$, and thus $CV(S) \approx 0.62$. First we fix $\mu_2$ and vary $\sigma_2^2$; then we fix $\sigma_2^2$ and let $\mu_2$ vary. In the first case, we fix $\mu_2 < \mu_1$, implying that the intervention slows down the process; $\sigma_2^2$ is then chosen to yield a range of values of $CV(R)$.

Averages and empirical SEs of the estimates, as well as medians of the asymptotic SEs and the CPs of the CIs, are reported in Table 1. All estimators appear unbiased and with acceptable SEs. The empirical and asymptotic SEs are approximately equal, suggesting that $n = 100$ is sufficient for asymptotics to be valid. Not surprisingly, the performance improves when the CV of $R$ decreases.
This holds also for $\hat\mu_1$ and $\hat\sigma_1^2$, highlighting the dependence between $S$ and $R$: a large variability after the intervention deteriorates the estimation of the parameters governing the process before the intervention. All CPs are close to the desired 95%. The CPs of $\mu_1$ and $\mu_2$ are higher than those of $\sigma_1^2$ and $\sigma_2^2$. This phenomenon disappears for larger $n$, when all CPs are around 95% (results not shown).

Fig. 3
Empirical and asymptotic SEs over 1000 estimates of $(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$ for $n = 100$ as a function of $\mu_2$, when no assumptions on the parameters are made. The parameters are $\mu_1 = 1$ and $\sigma_1^2 = 0.4$, yielding an approximate $CV(S) = 0.62$. Full lines: empirical SEs. Dashed lines: asymptotic SEs. Colors correspond to the SEs of the estimators obtained by either maximizing $l_{(S,R)}$ (black lines) or maximizing $\log f_S$ (gray lines), respectively

In the second case, we let $\mu_2$ vary and fix $\sigma_2^2$. Here the response to the intervention either slows down or accelerates the process, depending on whether $\mu_2 < \mu_1$ or $\mu_1 < \mu_2$, respectively.

A relevant question is how much, if at all, the estimators of $\mu_1$ and $\sigma_1^2$ improve by considering the more complicated likelihood based on eq. (12), compared to the simple likelihood based on eq. (7), where the information from $R$ is ignored. All estimators appear unbiased (figures not shown). The estimates of $\mu_1$ and $\sigma_1^2$ obtained from observations of $(S, R)$ outperform those obtained only from observations of $S$, as can be seen by comparing both their empirical and asymptotic SEs in Fig. 3. When $\mu_2$ increases, the performance of $\hat\mu_1$ and $\hat\sigma_1^2$ improves, while that of $\hat\mu_2$ and $\hat\sigma_2^2$ gets worse, even if the CV of $R$ decreases. Moreover, the empirical and asymptotic SEs for $\hat\mu_2$ and $\hat\sigma_2^2$ are quite different for large $\mu_2$, meaning that there $n = 100$ is not sufficient for asymptotics to be valid. In the other cases the empirical and asymptotic SEs are approximately equal, and thus in the following we only report the asymptotic values.

Fig. 4
Asymptotic SEs over 1000 estimates of $(\mu_1, \mu_2, \sigma^2)$ for $n = 100$ as a function of $\mu_2$ (upper panels) and of $\mu_1$ (lower panels) for equal variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. In both cases four values of $\sigma^2$ are shown (full, dashed, dotted and dotted-dashed lines, in increasing order of $\sigma^2$). In the upper panels $\mu_1 = 1$, and in the lower panels $\mu_2 = 1$

Fig. 5
Percentage of rejections of the null hypothesis $H_0: \mu_1 = \mu_2$, using the likelihood ratio test at the 5% significance level, as a function of $\mu_2$ for equal variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The parameters are $\mu_1 = 1$, with $\sigma^2 = 0.4$ (dashed line), $1$ (dotted line) and $2$ (dashed-dotted line); the full line corresponds to the smallest value of $\sigma^2$ considered

Equal variances

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we consider four values of $\sigma^2$ (the same as in Fig. 5), with either $\mu_1 = 1$ and $\mu_2$ varying, or $\mu_2 = 1$ and $\mu_1$ varying. The variability of the estimators for different values of $\mu_1$ and $\mu_2$ is reported in Fig. 4, where the SEs of the estimators are plotted. The estimators appear unbiased (results not shown). All of them improve when $\sigma^2$ decreases, since that reduces the variability of both $S$ and $R$. The performance of $\hat\mu_i$ improves while that of $\hat\mu_j$ gets worse when $\mu_j$ increases, for $i, j = 1, 2$ and $i \neq j$. Interestingly, the performance of $\hat\sigma^2$ seems to be constant with respect to $\mu_2$, unless $\sigma^2$ is large.

A likelihood ratio test is performed for testing the hypothesis $H_0: \mu_1 = \mu_2$ at the 5% significance level, and the percentage of rejections of $H_0$ as a function of $\mu_2$ is reported in Fig. 5. If $\mu_1 = \mu_2$, we want the percentage to be around 5%, while if $\mu_1 \neq \mu_2$, the percentage represents the power of the test, i.e. the probability of correctly rejecting the null hypothesis, and we want it as high as possible. When $\mu_1 = \mu_2 = 1$, this percentage is around 5% for the three smallest values of $\sigma^2$, suggesting that $n = 100$ is sufficient for asymptotics to be valid. When $\sigma^2 = 2$, the percentage is larger, and a larger $n$ should then be considered. Not surprisingly, the power of the test decreases when $\sigma^2$ increases, but it is worthwhile noting that the power is already high for moderate differences $|\mu_1 - \mu_2|$, indicating a satisfactory performance of the test.

Variance proportional to the mean

Now assume $\sigma_i^2 = k\mu_i$, for $k > 0$. A range of values of $k$ and of $\mu_1, \mu_2$ is considered. The performance of the estimators is reported in Fig.
6, where $SE(\hat\mu_i)/\mu_i$ and $SE(\hat k)$ are plotted against $k$. Also in this case, the estimators appear unbiased (results not shown). As expected from the theoretical results in Section 4.1, the performance of $\hat\mu_1$ and $\hat\mu_2$ appears similar, and it does not depend on $\mu_1$ and $\mu_2$, respectively. Interestingly, the asymptotic SE of $\hat k$ depends neither on $\mu_1$ nor on $\mu_2$, but only on $k$. This may be due to the fact that neither the CVs of $S$ and $R$ nor their correlation depend on $\mu_1$ and $\mu_2$, see eqs. (13), (14) and (17).

6 Conclusions

When any intervention is applied, the most natural question arising is about its effect. Here, the effect is reflected in the change of the time to an observable event. However, there is no information available about what such a time would be if no action were taken. In this paper we solve the problem by comparing the time to the intervention and the time to the final event. The parameters of the underlying system are both identified and statistically compared to judge the presence of an effect. The method represents a potential tool in all experimental situations where direct measurements are not available, but only qualitative changes are observable.

Fig. 6
Asymptotic SEs over 1000 estimates of $\mu_i$ and $k$ for $n = 100$, rescaled by $\mu_i$, as a function of $k$ when the variance is proportional to the mean, $\sigma_i^2 = k\mu_i$, $i = 1, 2$. The parameters are $\mu_1 = 1$ and $\mu_2 = 2$. The results for $SE(\hat\mu_1)/\mu_1$ and $SE(\hat\mu_2)/\mu_2$ are almost indistinguishable. The same results hold for other combinations of $(\mu_1, \mu_2)$ and are therefore not reported

Acknowledgements
S.D. was supported by the Danish Council for Independent Research | Natural Sciences. P.L. was supported by grant No. AV0Z50110509. The work is part of the Dynamical Systems Interdisciplinary Network, University of Copenhagen.
Appendix

A Covariance and correlation of $S$ and $R$ when $\sigma_i^2 = k\mu_i$

Let $P \sim IG(B, B^2/k)$, and thus $E[P] = B$. Then, using (15), we have
\[
E[SR] = \int_0^\infty \int_0^\infty s\,r f_{(S,R)}(s,r)\,dr\,ds
= \int_0^\infty s \frac{\mu_1}{B} \int_0^\infty \mu_2 r f_P(\mu_1 s + \mu_2 r)\,dr\,ds \\
= \int_0^\infty s \frac{\mu_1}{B\mu_2} \int_{\mu_1 s}^\infty (t - \mu_1 s) f_P(t)\,dt\,ds
= \frac{1}{\mu_1\mu_2 B} \int_0^\infty u \int_u^\infty (t - u) f_P(t)\,dt\,du \\
= \frac{1}{\mu_1\mu_2 B} \int_0^\infty u \int_u^\infty t f_P(t)\,dt\,du - \frac{1}{\mu_1\mu_2 B} \int_0^\infty u^2 \bar F_P(u)\,du. \tag{18}
\]
Calculating the integral in $dt$ by parts, we get
\[
\int_u^\infty t f_P(t)\,dt = \left[ -t \bar F_P(t) \right]_u^\infty + \int_u^\infty \bar F_P(t)\,dt = u \bar F_P(u) + \int_u^\infty \bar F_P(t)\,dt, \tag{19}
\]
where $-t\bar F_P(t) \to 0$ as $t \to \infty$ because $\bar F_P(t) = o(t^{-1})$ as $t \to \infty$. Define now a variable $Q$ by
\[
f_Q(t) = \frac{\bar F_P(t)}{E[P]} = \frac{\bar F_P(t)}{B}.
\]
Then, inserting (19) into (18) and simplifying the resulting expression, we obtain
\[
E[SR] = \frac{1}{\mu_1\mu_2 B} \int_0^\infty u \int_u^\infty \bar F_P(t)\,dt\,du
= \frac{1}{\mu_1\mu_2} \int_0^\infty u \int_u^\infty f_Q(t)\,dt\,du
= \frac{1}{\mu_1\mu_2} \int_0^\infty u \bar F_Q(u)\,du. \tag{20}
\]
Similarly, let $Z$ be a variable defined by $f_Z(u) = \bar F_Q(u)/E[Q]$. Then (20) becomes
\[
E[SR] = \frac{E[Q]}{\mu_1\mu_2} \int_0^\infty u \frac{\bar F_Q(u)}{E[Q]}\,du = \frac{E[Q]}{\mu_1\mu_2} E[Z], \tag{21}
\]
where
\[
E[Z] = \frac{1}{2} E[Q] + \frac{1}{2} \frac{\operatorname{Var}[Q]}{E[Q]},
\]
see eqs. (1) and (2). Mimicking the calculations done for $S$ in (13), we obtain $E[Q] = (B + k)/2$ and $\operatorname{Var}[Q] = (B + 3k)^2/12$. Plugging them first into $E[Z]$ and then into (21), and simplifying the resulting expression, we get
\[
E[SR] = \frac{B^2 + 3Bk + 3k^2}{6\mu_1\mu_2}.
\]
Finally, (16) follows using (13) and (14).
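The correlation formula (17) can also be checked by Monte Carlo, using the sampling scheme detailed in Appendix B below: draw $S$ by inverse transform from (1), draw $X(0)$ by inverting $F^a_{X(0)}(x,s)/\bar F_T(s)$, then draw $R$ from the conditional inverse Gaussian distribution. The Python sketch below is ours (the paper works in R) and the parameter values are illustrative assumptions for the proportional case $\sigma_i^2 = k\mu_i$.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

rng = np.random.default_rng(3)
B, k, mu1, mu2 = 10.0, 0.5, 1.0, 2.0     # illustrative assumptions
s1sq, s2sq = k * mu1, k * mu2            # proportional case of Section 4.1

shape = B**2 / s1sq
T = stats.invgauss(mu=(B / mu1) / shape, scale=shape)  # T ~ IG(B/mu1, B^2/s1sq)

# Tabulate F_S (integral of sf_T / E[T]) on a fine grid for inverse transform.
s_grid = np.linspace(1e-9, 80.0, 16001)
F_S = np.cumsum(T.sf(s_grid)) * (s_grid[1] - s_grid[0]) / T.mean()

def F_a(x, s):
    """P(X(0) < x, T > s): the integral of (9), cf. Appendix B."""
    z = np.sqrt(s1sq * s)
    # exp(2*mu1*B/s1sq) is huge; combine it with logcdf to stay finite.
    log_corr = 2 * mu1 * B / s1sq + stats.norm.logcdf((x - 2 * B - mu1 * s) / z)
    return stats.norm.cdf((x - mu1 * s) / z) - np.exp(log_corr)

def sample_pair():
    s = np.interp(rng.uniform(), F_S, s_grid)                # S
    v = rng.uniform()
    x0 = brentq(lambda x: F_a(x, s) / T.sf(s) - v, B - 80.0, B)  # X(0) | S=s
    lam = (B - x0)**2 / s2sq                                 # IG shape for R
    r = stats.invgauss.rvs(((B - x0) / mu2) / lam, scale=lam, random_state=rng)
    return s, r

pairs = np.array([sample_pair() for _ in range(400)])
corr_emp = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
corr_theo = (3 * k**2 - B**2) / (B + 3 * k)**2               # eq. (17)
print(corr_emp, corr_theo)
```

With these values $k < B/\sqrt{3}$, so the empirical correlation should be strongly negative, close to the theoretical value of about $-0.75$.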
B Simulation in R
To simulate $(s_i, r_i)$, $i = 1, \ldots, n$, we proceed as follows. We simulate $s_i$ by applying inverse transform sampling to the cumulative distribution function of $S$, which is obtained by numerically integrating (1) using the function integrate in R. We obtain $s_i$ by simulating $u_i$ from a uniform distribution on $[0, 1]$ and solving $F_S(s_i) - u_i = 0$ with respect to $s_i$ by means of the function uniroot in R. To obtain an observation $r_i$ from $R$, we first simulate $x_0$, i.e. the position $X(0)$ of the process at the time of intervention. We apply inverse transform sampling to the distribution of $X(0)$, obtained by integrating (3) with respect to $x$, i.e. $F_{X(0)}(x \mid s) = F^a_{X(0)}(x, s)/P(T > s)$. Because $X$ is a Wiener process, $F^a_{X(0)}(x, s)$ is given by the integral of (9),
\[
F^a_{X(0)}(x, s) = \Phi\!\left( \frac{x - \mu_1 s}{\sqrt{\sigma_1^2 s}} \right) - \exp\!\left[ \frac{2\mu_1 B}{\sigma_1^2} \right] \Phi\!\left( \frac{x - 2B - \mu_1 s}{\sqrt{\sigma_1^2 s}} \right).
\]
Using $x_0$, an observation $r_i$ from $R$ is then drawn from $IG((B - x_0)/\mu_2, (B - x_0)^2/\sigma_2^2)$.

C Estimation of $\phi$ and $I(\phi)$ in R

Since all parameter values need to be positive, maximizing the log-likelihood is a constrained optimization problem. However, the estimated parameters are always positive when estimating $\phi$ simply by minimizing $-l_{(s,r)}$ by means of the function optim.

Since $l_{(s,r)}$ is a complicated function of $\phi$, it can frequently happen that it has several local maxima. To find the global maximum, sensible starting values are paramount. The starting value $\phi^0$ for the iterations is chosen by the following strategy. Obtain $\mu_1^*, \sigma_1^{2*}$ by maximizing the log-likelihood $\log f_S$ from the $s_i$, $i = 1, \ldots, n$, with starting values given by moment estimation of $S$; plug $\mu_1^*, \sigma_1^{2*}$ into (11) to estimate the expected position at the time of intervention, i.e. $\hat x_0 = \widehat{E[X(0)]}$; using the $r_i$ and $\hat x_0$, obtain $\mu_2^*, \sigma_2^{2*}$ as moment estimators for $\mu_2$ and $\sigma_2^2$ when $R \mid X(0) \sim IG((B - \hat x_0)/\mu_2, (B - \hat x_0)^2/\sigma_2^2)$, i.e.
\[
\mu_2^* = \frac{B - \hat x_0}{\bar r}, \qquad \sigma_2^{2*} = \operatorname{emp.var}(R)\,\frac{(\mu_2^*)^3}{B - \hat x_0}, \qquad (22)
\]
where $\bar r$ denotes the average of the observations $r_i$. Alternatively, $\mu_2^*$ and $\sigma_2^{2*}$ may be the maximum likelihood estimators (Chhikara and Folks, 1989). Then $\phi_0 = (\mu_1^*, \sigma_1^{2*}, \mu_2^*, \sigma_2^{2*})$ is the starting value. When the variances are equal, the starting value is $\phi_0 = (\mu_1^*, \sigma_1^{2*}, \mu_2^*)$. When the variance is proportional to the mean, obtain $\mu_1^*, k^*$ by maximizing the log-likelihood $\log f_S$ from $s_i$, $i = 1, \ldots, n$, with starting values given by moment estimation of $S$ through (13); obtain $\mu_2^*$ as moment estimator for $\mu_2$ from (14), i.e. $\mu_2^* = (B + k^*)/(2\bar r)$. Then set $\phi_0 = (\mu_1^*, \mu_2^*, k^*)$.

To reduce the influence of the starting value on the optimization procedure, we proceed as follows. Once $\phi_0$ has been computed, we carry out the estimation procedure, and then we use the obtained estimate $\hat\phi$ as a new starting value $\phi_0$. We repeat this procedure until $\phi_0$ and the estimated parameters yield approximately the same value of $-\log f_{(S,R)}$.

Often an explicit expression for the inverse of the Fisher information $I(\phi)^{-1}$ is not available, but it can be numerically evaluated. We calculate the $d \times d$ matrix $I(\phi)/n$, for $d = 4$ when no assumptions are made and $d = 3$ when $\sigma_1^2 = \sigma_2^2$ or $\sigma_i^2 = k\mu_i$, using the option hessian=TRUE in the optim function. Since $I(\phi)$ is a symmetric, positive definite matrix, we invert it by means of its Cholesky decomposition. We first use the function chol to compute the Cholesky factorization and then chol2inv to invert it.
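The simulation of $X(0)$ and $R$ (Appendix B) and the moment estimators (22) can be sketched as follows. The sketch is in Python rather than R, with scipy's brentq and invgauss playing the roles of uniroot and the inverse Gaussian sampler; the threshold $B$, the intervention time $s$, the fixed position $\hat x_0$ and all parameter values are hypothetical. Note that scipy parameterizes $IG(\nu, \lambda)$ as invgauss(nu/lam, scale=lam).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm, invgauss

rng = np.random.default_rng(1)
B = 10.0                   # threshold (hypothetical value)
mu1, sig1sq = 1.0, 1.0     # drift and variance before the intervention
mu2, sig2sq = 2.0, 0.5     # drift and variance after the intervention

def F_a(x, s):
    """Sub-distribution F^a(x, s) of the absorbed Wiener process, eq. (9)."""
    sd = np.sqrt(sig1sq * s)
    return (norm.cdf((x - mu1 * s) / sd)
            - np.exp(2.0 * mu1 * B / sig1sq) * norm.cdf((x - 2.0 * B - mu1 * s) / sd))

def sample_x0(s):
    """Inverse transform sampling of X(0) | T > s, F_X(0)(x|s) = F^a(x,s)/P(T>s)."""
    surv = F_a(B, s)       # P(T > s) = F^a(B, s)
    u = rng.uniform()
    return brentq(lambda x: F_a(x, s) / surv - u, -50.0 * B, B)

def sample_r(x0):
    """Residual time R | X(0) = x0 ~ IG((B - x0)/mu2, (B - x0)^2/sig2sq)."""
    nu, lam = (B - x0) / mu2, (B - x0) ** 2 / sig2sq
    return invgauss.rvs(nu / lam, scale=lam, random_state=rng)

x0 = sample_x0(1.0)        # position at the (hypothetical) intervention time s = 1
r0 = sample_r(x0)          # one residual time

# Check the moment estimators (22) on residual times drawn for a fixed
# (hypothetical) estimated position xhat
xhat = 4.0
nu, lam = (B - xhat) / mu2, (B - xhat) ** 2 / sig2sq
r = invgauss.rvs(nu / lam, scale=lam, size=200_000, random_state=rng)
mu2_star = (B - xhat) / r.mean()
sig2sq_star = r.var() * mu2_star ** 3 / (B - xhat)
print(mu2_star, sig2sq_star)   # close to mu2 = 2.0 and sig2sq = 0.5
```

With a large sample the estimators recover the generating values closely, which is why (22) provides sensible starting values for the likelihood optimization.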
References

O.O. Aalen and H.K. Gjessing. Understanding the shape of the hazard rate: A process point of view. Stat. Science, 16:1-22, 2001.
R.S. Chhikara and J.L. Folks. The Inverse Gaussian Distribution: Theory, Methodology, and Applications. Marcel Dekker, New York, 1989.
D. Commenges and B.P. Hejblum. Evidence synthesis through a degradation model applied to myocardial infarction. Lifetime Data Analysis, 19(1):1-18, 2013.
D.R. Cox and P.A.W. Lewis. The Statistical Analysis of Series of Events. Methuen, London, 1966.
D.R. Cox and H.D. Miller. The Theory of Stochastic Processes. Chapman and Hall, 1977.
H. Cramer. Mathematical Methods of Statistics. Princeton University Press, 1946.
A.F. Desmond and Z.L. Yang. Score tests for inverse Gaussian mixtures. Applied Stochastic Models in Business and Industry, 27(6):633-648, 2011.
K.A. Doksum and A. Hoyland. Models for variable-stress accelerated life testing experiments based on Wiener processes and the inverse Gaussian distribution. Technometrics, 34(1):74-82, 1992.
G.L. Gerstein and B. Mandelbrot. Random walk models for the spike activity of a single neuron. Biophys. J., 4:41-68, 1964.
M.T. Giraudo, P.E. Greenwood, and L. Sacerdote. How sample paths of leaky integrate-and-fire models are influenced by the presence of a firing threshold. Neural Comput., 23:1743-1767, 2011.
A. Harrison and M. Stewart. Strike duration and strike size. Canadian Journal of Economics, 26(4):830-849, 1993.
W. Kahle and A. Lehmann. Parameter estimation in damage processes: Dependent observations of damage increments and first passage time. In Advances in Stochastic Models for Reliability, Quality and Safety, pages 139-152. Birkhauser, Boston, 1998.
J. Kennan. The duration of contract strikes in United States manufacturing. Journal of Econometrics, 28(1):5-28, 1985.
D. Laming. Sensory Analysis. Academic Press, London, 1986.
T. Lancaster. A stochastic model for the duration of a strike. Journal of the Royal Statistical Society Series A, 135:257-271, 1972.
P. Lansky and S. Ditlevsen. A review of the methods for signal estimation in stochastic diffusion leaky integrate-and-fire neuronal models. Biol. Cybern., 99:253-262, 2008.
P. Lansky and L. Sacerdote. The Ornstein-Uhlenbeck neuronal model with signal-dependent noise. Physics Letters A, 285:132-140, 2001.
R.J. Lawrence. The lognormal distribution of the duration of strikes. Journal of the Royal Statistical Society Series A, 147(3):464-483, 1984.
M.-L.T. Lee and G.A. Whitmore. Threshold regression for survival analysis: Modeling event times by a stochastic process reaching a boundary. Statistical Science, 21(4):501-513, 2006.
M. Linden. Modelling strike duration distribution: A controlled Wiener process approach. Applied Stochastic Models in Business and Industry, 16(1):35-45, 2000.
W. Nelson. Accelerated Degradation, pages 521-548. John Wiley & Sons, Inc., 2008. ISBN 9780470316795. doi: 10.1002/9780470316795.ch11. URL http://dx.doi.org/10.1002/9780470316795.ch11.
M. Newby and J. Winterton. The duration of industrial stoppages. Journal of the Royal Statistical Society Series A, 146(1):62-70, 1983.
V. Pieper, M. Domine, and P. Kurth. Level crossing problems and drift reliability. Mathematical Methods of Operations Research, 45(3):347-354, 1997.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2011. ISBN 3-900051-07-0.
L. Sacerdote and M.T. Giraudo. Leaky integrate and fire models: a review on mathematical methods and their applications. In Stochastic Biomathematical Models with Applications to Neuronal Modeling, volume 2058 of Lecture Notes in Mathematics, pages 95-148. Springer, 2013.
M. Tamborrino, S. Ditlevsen, and P. Lansky. Identification of noisy response latency. Phys. Rev. E, 86:021128, 2012.
M. Tamborrino, S. Ditlevsen, and P. Lansky. Parametric inference of neuronal response latency in presence of a background signal. BioSystems, 112:249-257, 2013.
G.A. Whitmore, T. Ramsay, and S.D. Aaron. Recurrent first hitting times in Wiener diffusion under several observation schemes. Lifetime Data Analysis, 18(2):157-176, 2012.
G.A. Whitmore. Estimating degradation by a Wiener diffusion process subject to measurement error. Lifetime Data Analysis, 1:307-319, 1995.
G.A. Whitmore and F. Schenkelberg. Modelling accelerated degradation data using Wiener diffusion with a time scale transformation. Lifetime Data Analysis, 3:27-45, 1997.
G.A. Whitmore, M.J. Crowder, and J.F. Lawless. Failure inference from a marker process based on a bivariate Wiener model. Lifetime Data Analysis, 4(3):229-251, 1998.
H.F. Yu. Optimal classification of highly-reliable products whose degradation paths satisfy Wiener processes.