A penalized likelihood approach for efficiently estimating a partially linear additive transformation model with current status data
Yan Liu, Minggen Lu
School of Community Health Sciences, University of Nevada, Reno

Christopher S. McMahan
Department of Mathematical Sciences, Clemson University

April 25, 2019
Abstract
Current status data are commonly encountered in medical and epidemiological studies in which the failure time for study units is the outcome variable of interest. Data of this form are characterized by the fact that the failure time is not directly observed but rather is known relative to an observation time; i.e., the failure times are either left- or right-censored. Due to its structure, the analysis of such data can be challenging. To circumvent these challenges and to provide for a flexible modeling construct which can be used to analyze current status data, herein, a partially linear additive transformation model is proposed. In the formulation of this model, constrained B-splines are employed to model the monotone transformation function and nonlinear covariate effects. To provide for more efficient estimates, a penalization technique is used to regularize the estimation of all unknown functions. An easy to implement hybrid algorithm is developed for model fitting, and a simple estimator of the large-sample variance-covariance matrix is proposed. It is shown theoretically that the proposed estimators of the finite-dimensional regression coefficients are root-n consistent, asymptotically normal, and achieve the semi-parametric information bound, while the estimators of the nonparametric components attain the optimal rate of convergence. The finite-sample performance of the proposed methodology is evaluated through extensive numerical studies and is further demonstrated through the analysis of uterine leiomyomata data.

Keywords: B-splines; Current status data; Isotonic regression; Partially linear additive transformation model; Penalized estimation.

1 Introduction
Current status data are characterized by the fact that the failure/event time cannot be directly measured, but rather is known relative to a single observation time. That is, the failure time is known to be smaller or larger than an observation time, leading to left- or right-censored data, respectively. Data of this structure commonly arise as a part of medical and epidemiological studies, among many others, in which study participants are observed only once due to destructive/invasive testing, resource limitations, etc. For example, as part of the Right From The Start (RFTS) study, participants completed an ultrasound examination as a means to determine their fibroid status (Laughlin et al., 2009). In this study, fibroid onset time could not be directly observed but was known relative to the participant's age at the ultrasound examination time.

For the regression analysis of current status data, many parametric and semi-parametric methods have been proposed. In particular, many of these efforts have focused on the proportional hazards (Cox, 1972) and proportional odds models (Dabrowska and Doksum, 1988); for a review see McMahan et al. (2013), Lu and Li (2017), and the references therein. To provide a more general modeling framework, Gu et al. (2005), Sun and Sun (2005), Zhang et al. (2005), Zhang and Zhao (2013), and Zeng et al. (2016) developed estimation techniques under linear transformation models. A potentially prohibitive assumption made in all of the aforementioned works is that the covariate effects on the failure time are "linear." To relax this assumption, Cheng et al. (2011) developed a semi-parametric additive transformation model and Lu and McMahan (2018) presented a partially linear PH model for current status data. For a comprehensive review of methods to analyze interval-censored data see Sun (2007) and Banerjee (2012).

In this article, we seek to develop a novel methodology which can be used to conduct the regression analysis of current status data.
In particular, a very general partially linear additive transformation model is proposed. The proposed model makes use of constrained B-splines to approximate all unknown functions; i.e., the transformation and regression functions. In order to provide for more efficient estimators, a penalization strategy is utilized to regularize the estimation of the unknown functions. To complete model fitting, a computationally efficient and easy to implement hybrid algorithm is developed. Theoretically, it is shown that the penalized estimators of the regression coefficients are asymptotically normal and efficient; i.e., they attain the semi-parametric information bound. Moreover, we establish that the estimators of the nonparametric components achieve the optimal rate of convergence for an appropriately chosen order of the smoothing parameters. Finite-sample techniques for determining the smoothing parameters and estimating the asymptotic covariance matrix of the estimated regression coefficients are provided and evaluated.

2 Model
Consider a study which examines n subjects for a failure time of interest. Let T_i denote the time that the i-th subject experiences the failure, for i = 1, ..., n. To provide for modeling flexibility, we assume that T_i obeys a partially linear additive transformation model which is given by
$$\eta(T_i) = -Z_i^\top \beta - \sum_{j=1}^{J} \phi_j(W_{ij}) + \epsilon_i, \qquad (1)$$
where η(·) is an unknown monotone increasing transformation function on (0, ∞) with η(0) = −∞ and ε_i is a random variable having cumulative distribution function (CDF) H(·). In the above specification, β is a q-dimensional vector of regression coefficients corresponding to the covariates Z_i = (Z_i1, ..., Z_iq)^⊤ and φ_j(·) is an unknown smooth regression function taking the covariate W_ij as its argument, for j = 1, ..., J. Throughout, we assume that ε_i ⊥ ε_j, for i ≠ j, and ε_i ⊥ X_i, where ⊥ denotes statistical independence and X_i = (Z_i^⊤, W_i1, ..., W_iJ)^⊤. This formulation provides for a very general modeling construct that holds several popular models as special cases. For example, specifying H(·) to be the CDF of the extreme value distribution causes (1) to reduce to the standard proportional hazards (PH) model, while taking H(·) to be the CDF of the standard logistic distribution results in the proportional odds (PO) model.

Following the work of Murphy et al. (1997), Sun and Sun (2005), and Cheng et al. (2011), we note that (1) can be rewritten in terms of the conditional CDF of T_i, given the covariate vector X_i, as
$$F(t \mid X_i) = g^{-1}\Big\{ Z_i^\top \beta + \eta(t) + \sum_{j=1}^{J} \phi_j(W_{ij}) \Big\}, \qquad (2)$$
where g(·) is a known, smooth, and strictly increasing link function on (0, 1). Though equivalent to (1), the representation provided in (2) is more convenient for reasons that will shortly become apparent.
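The equivalence of (1) and (2) can be verified in one line; writing H for the CDF of ε_i and using the monotonicity of η,
```latex
F(t \mid X_i) = \Pr\{\eta(T_i) \le \eta(t) \mid X_i\}
             = \Pr\Big\{\epsilon_i \le \eta(t) + Z_i^\top\beta + \sum_{j=1}^{J}\phi_j(W_{ij}) \,\Big|\, X_i\Big\}
             = H\Big\{\eta(t) + Z_i^\top\beta + \sum_{j=1}^{J}\phi_j(W_{ij})\Big\},
```
so that (2) holds with g = H^{-1}.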
Here g(·) assumes the role of H(·) with respect to determining the final form of the proposed model; e.g., specifying g(x) = log{−log(1 − x)} provides for the PH model and g(x) = log{x/(1 − x)} results in the PO model.

A key feature of current status data is that the failure times (i.e., T_i, for i = 1, ..., n) are not directly observed but rather are known relative to observation times; i.e., T_i is either left- or right-censored, indicating that T_i occurs before or after the observation time, respectively. To mathematically frame the structure of these data, let Y_i denote the observation time for the i-th individual and let Δ_i = I(T_i < Y_i) denote the corresponding censoring indicator, where I(·) is the usual indicator function. Note, Δ_i = 1(0) represents the event that the i-th individual's failure time is left(right)-censored. Thus, the observed data consist of {(Y_i, Δ_i, X_i)}_{i=1}^n. Following the works of Mammen and van de Geer (1997), Ma et al. (2005), and Ma (2009), we propose to estimate the unknown parameters via the following penalized log-likelihood
$$l_\lambda(\tau) = \sum_{i=1}^{n} \big[ \Delta_i \log F(Y_i \mid X_i) + (1 - \Delta_i) \log\{1 - F(Y_i \mid X_i)\} \big] - \lambda_0 J(\eta) - \sum_{j=1}^{J} \lambda_j J(\phi_j),$$
where τ = (β, η, φ_1, ..., φ_J) represents the collection of unknown model parameters, λ = (λ_0, ..., λ_J)^⊤ is a vector of smoothing parameters, such that λ_j ≥ 0 for all j, and
$$J(\eta) = \int \{\eta^{(r)}(t)\}^2 \, dt, \qquad J(\phi_j) = \int \{\phi_j^{(r)}(W)\}^2 \, dW.$$
In the penalties above, η^(r) and φ_j^(r) denote the r-th derivative of η(·) and φ_j(·), respectively, r ≥ 1. This penalized log-likelihood is derived under the assumption that the covariates are time-invariant and that the failure and censoring times are conditionally independent, given the covariates. These assumptions are common in the literature; e.g., see Ma et al. (2005), Cheng et al. (2011), Banerjee (2012), and the references therein.
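To make the two special-case links and the likelihood contribution concrete, here is a minimal Python sketch (our own illustration; the function names are not from the paper):

```python
import numpy as np

# g maps a probability in (0,1) to the real line; G = g^{-1} maps the linear
# predictor Z'beta + eta(t) + sum_j phi_j(W_j) back to F(t | X).
def g_ph(u):
    # complementary log-log link: yields the proportional hazards model
    return np.log(-np.log(1.0 - u))

def G_ph(x):
    return 1.0 - np.exp(-np.exp(x))

def g_po(u):
    # logit link: yields the proportional odds model
    return np.log(u / (1.0 - u))

def G_po(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik_term(delta, F):
    # one subject's contribution to the unpenalized log-likelihood:
    # Delta log F(Y | X) + (1 - Delta) log{1 - F(Y | X)}
    return delta * np.log(F) + (1.0 - delta) * np.log(1.0 - F)
```

Since each G is strictly increasing, F(t | X) in (2) inherits monotonicity in t from the monotone transformation η.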
3 Asymptotic properties

Let τ_0 = (β_0, η_0, φ_01, ..., φ_0J) and τ̂ = (β̂, η̂, φ̂_1, ..., φ̂_J) denote the true value and the penalized maximum likelihood estimator of τ, respectively. We establish the asymptotic properties of the penalized estimator τ̂ under the following regularity conditions:

Condition 1. The true parameter β_0 is an interior point of a compact subset of R^q.

Condition 2. The r-th derivatives of η_0 and φ_0j, for 1 ≤ j ≤ J, satisfy the Lipschitz condition on [c, d] for c > 0 and on [a_j, b_j], respectively, and η_0 is strictly increasing on [c, d].

Condition 3. The support of the observation times is an interval within [l_y, u_y] with 0 < l_y < u_y < inf{t : F(t) = 1}, where F is the CDF of the observation times.

Condition 4. The covariate vector Z has bounded support.

Condition 5. The first derivative of the link function g is bounded away from 0 and the second derivative of g is bounded on (0, 1).

Condition 6. The joint density of (Y, Z, W, Δ) is bounded away from 0 and ∞.

Condition 7. For any β ≠ β_0, Pr(Z^⊤β ≠ Z^⊤β_0) > 0 and E{φ_j(W_j)} = 0, for 1 ≤ j ≤ J.

Condition 8. The smoothing parameters are specified such that λ_j = o_p(n^{−1/2}) and λ_j^{−1} = O_p(n^{2r/(2r+1)}), for 0 ≤ j ≤ J.

Condition 9. The efficient information I(β_0) defined in Appendix 1 of the Supplementary Material is non-singular.

Remark 1. Condition 1 is a standard assumption in semi-parametric estimation. The smoothness assumption on η_0 and φ_0j in Condition 2 guarantees that the unknown nonparametric components can be well approximated by B-splines. Conditions 3–6 are necessary for the entropy calculations used to derive the rate of convergence of τ̂ and the asymptotic normality of β̂. Condition 7 is required to establish the identifiability of the proposed semi-parametric model. Condition 8 is a typical assumption used to derive the asymptotic properties of penalized estimators. Finally, Condition 9 is common for the asymptotic normality and efficiency of β̂.

Using modern empirical process theory, we establish the following asymptotic properties:
Theorem 1 (Consistency and rate of convergence). Under Conditions 1–8, the penalized estimator β̂ is asymptotically consistent for β_0, ‖η̂‖_∞ = O_p(1), ‖φ̂_j‖_∞ = O_p(1), and ‖η̂ − η_0‖_2 = ‖φ̂_j − φ_0j‖_2 = O_p(n^{−r/(1+2r)}), for 1 ≤ j ≤ J.

Theorem 2 (Asymptotic normality and efficiency). Under Conditions 1–9, n^{1/2}(β̂ − β_0) converges in distribution to N{0, I^{−1}(β_0)}, as n → ∞.

The proofs of these results are relegated to Appendix 1 of the Supplementary Material. It is worthwhile to point out that if the smoothing parameters are chosen to be of the right order (see Condition 8), our functional estimates attain the optimal rate of convergence; i.e., O_p(n^{−r/(1+2r)}). Moreover, β̂ is efficient in the semi-parametric sense since it achieves the information bound.

4 Penalized spline estimation

In general, estimating β, η, and φ_j directly from the penalized log-likelihood can be a tumultuous task. Thus, motivated by Xiang and Wahba (1997), Ma et al. (2005), and Lu and Li (2017), we propose to strike a balance between modeling flexibility and complexity by approximating all unknown functions in (2) through the use of constrained B-spline functions. In particular,
$$\eta(\cdot) \approx \eta_n(\cdot \mid \gamma) = \sum_{k=1}^{p_0} \gamma_k b_k(\cdot) \quad \text{and} \quad \phi_j(\cdot) \approx \phi_{nj}(\cdot \mid \alpha_j^*) = \sum_{k=1}^{p_j} \alpha_{jk}^* b_{jk}(\cdot),$$
where b_k(·) and b_jk(·) are B-spline basis functions with unknown coefficients γ = (γ_1, ..., γ_{p_0})^⊤ and α*_j = (α*_j1, ..., α*_jp_j)^⊤, respectively, 1 ≤ j ≤ J. Once a knot set is chosen and the degree of the spline function is specified, these basis functions are uniquely determined; for further discussion see Schumaker (2007). Constraints are placed on the spline coefficients to ensure the monotonicity of the transformation function and the identifiability of the regression functions. The monotonicity of η_n(· | γ) can be ensured by requiring that γ_1 ≤ · · ·
≤ γ_{p_0}; see Theorem 5.9 of Schumaker (2007). As with all additive models, φ_j(·) is only identifiable up to an additive constant. Thus, sum-to-zero constraints are imposed such that Σ_{i=1}^n φ_nj(W_ij | α*_j) = 0, for 1 ≤ j ≤ J.

For implementation purposes, the sum-to-zero constraints can be expressed as B*_j^⊤ α*_j = 0, where B*_j = Σ_{i=1}^n B*_ji and B*_ji = {b_j1(W_ij), ..., b_jp_j(W_ij)}^⊤. Motivated by Wood (2017), the model is reparameterized such that the constrained set of parameters α*_j can be uniquely determined by an unconstrained reduced set α_j = (α_j1, ..., α_{j,p_j−1})^⊤. In particular, let Q_j denote a p_j × (p_j − 1) semi-orthogonal matrix whose columns are orthogonal to B*_j. Obviously, B*_j^⊤ Q_j α_j = 0 for all α_j in R^{p_j − 1}, and for any α*_j satisfying the sum-to-zero constraint there exists α_j such that α*_j = Q_j α_j. Note, Q_j can easily be found via a QR-decomposition of B*_j. Thus, the log-likelihood under the proposed spline model is obtained by replacing F(Y_i | X_i) in the penalized log-likelihood function by
$$F_n(Y_i \mid X_i) = g^{-1}\Big\{ Z_i^\top \beta + M_i^\top \gamma + \sum_{j=1}^{J} B_{ji}^\top \alpha_j \Big\}, \qquad (3)$$
where M_i = {b_1(Y_i), ..., b_{p_0}(Y_i)}^⊤ and B_ji = Q_j^⊤ B*_ji, for 1 ≤ i ≤ n. Thus, the set of unknown parameters to be estimated is given by θ = (β^⊤, γ^⊤, α^⊤)^⊤, where α = (α_1^⊤, ..., α_J^⊤)^⊤.

To further alleviate computational burdens, we follow the proposal of Eilers and Marx (1996) and replace the penalty terms J(η) and J(φ_j) by discrete approximations.
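Before turning to the discrete penalties, the QR-based reparameterization described above can be sketched as follows (our own illustration, not the authors' code; a cubic basis with a handful of interior knots at equally-spaced quantiles, mirroring the settings described in Section 5):

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
w = np.sort(rng.uniform(-1.0, 1.0, size=200))     # covariate values W_ij

# Cubic B-spline basis with interior knots at equally-spaced quantiles.
deg, n_interior = 3, 6
interior = np.quantile(w, np.linspace(0.0, 1.0, n_interior + 2)[1:-1])
knots = np.r_[[w.min()] * (deg + 1), interior, [w.max()] * (deg + 1)]
p_j = len(knots) - deg - 1                        # number of basis functions

design = BSpline.design_matrix(w, knots, deg).toarray()   # n x p_j
b_star = design.sum(axis=0)                       # constraint vector B*_j

# Complete QR of B*_j (viewed as a p_j x 1 matrix): columns 2..p_j of Q are
# orthogonal to B*_j and form the semi-orthogonal matrix Q_j.
Q, _ = np.linalg.qr(b_star[:, None], mode="complete")
Q_j = Q[:, 1:]                                    # p_j x (p_j - 1)

# Any unconstrained alpha_j now satisfies the sum-to-zero constraint.
alpha_j = rng.normal(size=p_j - 1)
alpha_star = Q_j @ alpha_j
```

The constraint holds by construction: B*_j^⊤ Q_j is exactly the zero row vector up to floating-point rounding.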
Under this strategy, the penalized spline log-likelihood function is given by
$$l_{n,\lambda}(\theta) = \sum_{i=1}^{n} \big[ \Delta_i \log F_n(Y_i \mid X_i) + (1 - \Delta_i) \log\{1 - F_n(Y_i \mid X_i)\} \big] - \lambda_0 \|D_0^{(r)} \gamma\|^2 - \sum_{j=1}^{J} \lambda_j \|D_j^{(r)} \alpha_j\|^2, \qquad (4)$$
where D_j^{(r)} is the discrete difference operator of order r (Eilers and Marx, 1996; Tibshirani et al., 2014) and ‖·‖ is the Euclidean norm. This operator is best defined recursively as D_j^{(r+1)} = D_j^{(1,r)} D_j^{(r)}, where
$$D_j^{(1,r)} = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix} \in \mathbb{R}^{(p_j - r - 1) \times (p_j - r)},$$
and D_j^{(1)} = D_j^{(1,0)}. Proceeding in this fashion for r = 1 and r = 2 yields penalties of the form
$$\|D_0^{(1)} \gamma\|^2 = \sum_{k=2}^{p_0} (\gamma_k - \gamma_{k-1})^2, \qquad \|D_0^{(2)} \gamma\|^2 = \sum_{k=3}^{p_0} (\gamma_k - 2\gamma_{k-1} + \gamma_{k-2})^2,$$
respectively. Through extensive numerical studies (see Section 5), we have found that setting r = 2 is sufficient when implementing the proposed approach.

Due to these approximations, the proposed penalized spline estimator is different from the penalized estimator, but as long as the number of basis functions for each of the unknown functions grows at least at the rate O(n^{1/3}), this difference will vanish and the statistical performance of the two estimators should be indistinguishable; for further discussion see Eilers and Marx (1996), Ma et al. (2005), Wood (2017), and the references therein. That is, the penalized spline estimator should inherit the theoretical properties of the penalized maximum likelihood estimator. This assertion is supported by the results of the numerical experiments reported in Section 5. To ensure the accuracy of the approximation, it is generally advised that the number of knots for each function be chosen in a neighborhood of n^{1/3}; for further discussion see Section 5.
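The difference-operator recursion above is easy to check numerically; a short sketch (our own, with made-up coefficient values):

```python
import numpy as np

def diff_matrix(p, r):
    # order-r difference operator D^{(r)}, shape (p - r, p), obtained by
    # repeatedly applying the first-difference operator
    D = np.eye(p)
    for _ in range(r):
        D = D[1:, :] - D[:-1, :]
    return D

gamma = np.array([0.2, 0.5, 0.9, 1.0, 1.6])       # made-up spline coefficients
D2 = diff_matrix(len(gamma), 2)
penalty = float(np.sum((D2 @ gamma) ** 2))        # ||D^{(2)} gamma||^2

# agrees with sum_{k=3}^{p} (gamma_k - 2 gamma_{k-1} + gamma_{k-2})^2
direct = sum((gamma[k] - 2.0 * gamma[k - 1] + gamma[k - 2]) ** 2
             for k in range(2, len(gamma)))
```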
To complete model fitting, a computationally efficient hybrid algorithm is developed which can be used to identify both the penalized spline estimators and the smoothing parameters via a nested iterative process. The inner process is aimed at identifying the value of θ that maximizes (4) for a fixed value of λ, while adhering to the monotonicity constraints; i.e., the inner step identifies θ̃_λ = argmax_θ l_{n,λ}(θ) subject to γ_1 ≤ · · · ≤ γ_{p_0}. To solve this constrained optimization problem, an iterative algorithm is developed. In each iteration of this algorithm, the unconstrained updates of the parameters, say θ*, are first determined based on the current parameter values, say θ^(m), via the following Fisher-scoring step
$$\theta^* = \theta^{(m)} + I_{n,\lambda}^{-1}(\theta^{(m)}) \nabla l_{n,\lambda}(\theta^{(m)}),$$
where ∇l_{n,λ}(θ^(m)) and I_{n,λ}(θ^(m)) are the gradient and the expected value of the negative Hessian matrix of (4) evaluated at θ^(m), respectively. Closed-form expressions for these quantities are given by
$$\nabla l_{n,\lambda}(\theta) = X^\top \Omega(\theta) \Delta(\theta) - S\theta, \qquad I_{n,\lambda}(\theta) = X^\top \Omega(\theta) X + S,$$
where X = (Z, M, B_1, ..., B_J), Z^⊤ = (Z_1, ..., Z_n), M^⊤ = (M_1, ..., M_n), and B_j^⊤ = (B_j1, ..., B_jn), and S = Σ_{j=0}^J λ_j S_j, with S_j = diag{0_{r_1j × r_1j}, D_j^{(r)⊤} D_j^{(r)}, 0_{r_2j × r_2j}}, r_1j = q + p_0 + · · · + p_{j−1} − (j − 1), and r_2j = p_{j+1} + · · · + p_J − (J − j).
In the expressions above, the transformed response vector and weight matrix for a specific value of θ are given by
$$\Delta(\theta) = \bigg\{ \frac{\Delta_1 - \pi_1}{G'(X_1^\top \theta)}, \ldots, \frac{\Delta_n - \pi_n}{G'(X_n^\top \theta)} \bigg\}^\top, \qquad \Omega(\theta) = \operatorname{diag}\bigg\{ \frac{G'(X_1^\top \theta)^2}{\pi_1(1 - \pi_1)}, \ldots, \frac{G'(X_n^\top \theta)^2}{\pi_n(1 - \pi_n)} \bigg\},$$
where X_i is the i-th row of X, G'(·) is the derivative of G(·) = g^{−1}(·), and π_i = g^{−1}(X_i^⊤θ).

The updated values of γ available in θ* do not necessarily satisfy the monotonicity constraints; see Section 2. To enforce these conditions, isotonic regression is implemented to project the unconstrained estimates into the constrained space; i.e., into G = {γ : γ_1 ≤ · · · ≤ γ_{p_0}}. This is accomplished by solving the following minimization problem
$$\gamma^{(m+1)} = \operatorname{argmin}_{\gamma \in \mathcal{G}} \bigg\{ \sum_{k=1}^{p_0} \sigma_k^{-1} (\gamma_k - \gamma_k^*)^2 \bigg\},$$
where σ_k is the diagonal element of I_{n,λ}^{−1}(θ*) corresponding to γ_k, for k = 1, ..., p_0. This problem can be solved using the pava function in the R package Iso, which takes γ* and the weights (σ_1^{−1}, ..., σ_{p_0}^{−1})^⊤ as inputs. Substituting γ^(m+1) for γ* in θ*, one obtains the constrained update θ^(m+1). These two steps are completed in turn until convergence is attained; i.e., until the difference between θ^(m) and θ^(m+1) is less than some specified stopping criterion. At the point of convergence of this inner process, one obtains θ̃_λ; i.e., the maximizer of the penalized spline log-likelihood function for a fixed value of λ. Once the inner process is complete, the outer process updates the smoothing parameters and the inner process is restarted.
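The isotonic projection step above can be sketched with a small weighted pool-adjacent-violators routine (a Python stand-in for the pava call from the R package Iso; the numbers below are made up):

```python
import numpy as np

def pava(y, w):
    # weighted isotonic regression: minimizes sum_k w_k (f_k - y_k)^2
    # subject to f_1 <= ... <= f_p, by pooling adjacent violators
    blocks = []                                   # each block: [mean, weight, count]
    for v, wt in zip(y, w):
        blocks.append([float(v), float(wt), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for v, _, c in blocks:
        fit.extend([v] * c)
    return np.array(fit)

gamma_star = np.array([0.1, 0.5, 0.3, 0.9])       # unconstrained update (made up)
weights = np.array([1.0, 1.0, 2.0, 1.0])          # e.g. inverse-variance weights
gamma_new = pava(gamma_star, weights)             # monotone projection of gamma*
```

Here the violating pair (0.5, 0.3) is pooled into a single weighted mean, leaving a nondecreasing sequence.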
To obtain the updated smoothing parameters, say λ* = (λ*_0, ..., λ*_J)^⊤, a generalized Fellner-Schall approach (Wood and Fasiolo, 2017) is adopted, which makes the update as
$$\lambda_j^* = \frac{\operatorname{tr}\{S^- S_j\} - \operatorname{tr}\{I_{n,\lambda}(\tilde{\theta}_\lambda)^{-1} S_j\}}{\tilde{\theta}_\lambda^\top S_j \tilde{\theta}_\lambda} \, \lambda_j, \quad \text{for } 0 \leq j \leq J,$$
where S^− is the generalized inverse of S. Using the updated smoothing parameters (i.e., λ*), the inner process is then utilized to identify θ̃_{λ*}. This nested iterative process continues until the difference between θ̃_λ and θ̃_{λ*} is less than some specified stopping criterion. It is worthwhile to point out that the update of λ_j is guaranteed to be positive, which is required, as long as I_{n,λ}(θ̃_λ) is positive definite (Wood and Fasiolo, 2017), which is the case for the proposed model. At convergence of the proposed hybrid algorithm, let θ̃ = (β̃^⊤, γ̃^⊤, α̃^⊤)^⊤ denote the final update of θ, with η̃(·) = η_n(· | γ̃) and φ̃_j(·) = φ_nj(· | α̃_j) being the proposed penalized spline estimators of η(·) and φ_j(·), respectively.

For the purposes of conducting large-sample inference, an estimator of the variance-covariance matrix of β̃ was developed. This estimator is given by the upper q × q submatrix of
$$\Sigma(\tilde{\theta}) = \{X^\top \Omega(\tilde{\theta}) X + S\}^{-1} X^\top \Omega(\tilde{\theta}) X \{X^\top \Omega(\tilde{\theta}) X + S\}^{-1}.$$
By applying the least-squares based variance estimation approach (Zhang et al., 2010), it can be shown that the proposed estimator consistently estimates I^{−1}(β_0). Using this estimator, one can conduct standard Wald-type inference about the regression coefficients.

5 Simulation

To assess the finite-sample performance of the proposed methodology, the following simulation study was conducted.
In this study, the true failure time T was generated according to the following model
$$g_\alpha\{F(t \mid X)\} = Z_1 \beta_1 + Z_2 \beta_2 + \eta(t) + \phi_1(W_1) + \phi_2(W_2), \qquad (5)$$
where W_j ~ U[−1, 1], Z_1 ~ Bernoulli(0.5), and Z_2 ~ N(0, 1). To provide for a broad range of examples, the link function g_α(·) was assumed to belong to a family of links indexed by a parameter α:
$$g_\alpha(u) = \begin{cases} \log[\{(1 - u)^{-\alpha} - 1\}/\alpha] & \text{if } \alpha > 0, \\ \log\{-\log(1 - u)\} & \text{if } \alpha = 0. \end{cases}$$
In particular, α = 0 leads to the PH model while α = 1 corresponds to a PO model. To examine different forms of the unknown functions and regression coefficients, three scenarios were considered. In Scenario 1 (S1) the three functions were specified as η(t) = log(2t), φ_1(w) = exp(w + 0.5) − {exp(1.5) − exp(−0.5)}/2, and φ_2(w) = 2 sin(−πw), and the regression coefficients were chosen to be β_1 = 0.5 and β_2 = −0.5; in Scenario 2 (S2) the three functions were specified as η(t) = log{1.5t − log(1 + 1.5t)}, φ_1(w) = 2 sin(−πw), and φ_2(w) = 4w² − 4/3, and the regression coefficients were set to be β_1 = 0.5 and β_2 = 0.5; in Scenario 3 (S3) the three functions were specified as η(t) = log{log(1 + t/
10) + √(t/10)}, φ_1(w) = 4w² − 4/3, and φ_2(w) = exp(w + 0.5) − {exp(1.5) − exp(−0.5)}/2, and the regression coefficients were set to be β_1 = −0.5 and β_2 = −0.5. In each of S1–S3, we consider values of α ∈ {0, 0.5, 1}. Note, each of the specified functions adheres to the identifiability constraint; i.e., they integrate to zero over the support of the corresponding covariates. To create current status data, observation times (Y) were sampled from an exponential distribution with a mean of 2, 2, and 1 under S1, S2, and S3, respectively, and the censoring indicator was determined as Δ = I(T < Y). These specifications were made to yield a variety of right-censoring rates; for example, when α = 0 (1) we have right-censoring rates of 27% (36%), 44% (51%), and 77% (81%) under S1, S2, and S3, respectively. For each simulation configuration, this data generating process was repeated 500 times to create 500 independent data sets consisting of n = 400 observations, which is approximately one fourth the sample size available in our motivating data application.

The proposed methodology was used to analyze each of the simulated data sets. For purposes of comparison, we compare the results of the proposed approach to those of the spline estimator that arises from maximizing (4) after dropping the penalty terms. Note, this competing approach is a generalization of the technique outlined by Lu and McMahan (2018), which was shown to outperform that of Cheng et al. (2011). To compute both the penalized and unpenalized spline estimators, cubic B-splines were used to approximate all unknown functions, with interior knots being placed at equally-spaced quantiles of the covariate values.
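As an illustration of the S1 data-generating process under the PH link (α = 0): with η(t) = log(2t) and the complementary log-log link, T | X is exponential with rate 2 exp{Z^⊤β + φ_1(W_1) + φ_2(W_2)}. A sketch of one simulated data set (our own; the specific coefficient values below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2019)
n = 400

# covariates, as specified for the simulation scenarios
w1 = rng.uniform(-1.0, 1.0, n)
w2 = rng.uniform(-1.0, 1.0, n)
z1 = rng.binomial(1, 0.5, n).astype(float)
z2 = rng.normal(0.0, 1.0, n)

# mean-zero regression functions on [-1, 1] (Scenario 1 forms)
phi1 = np.exp(w1 + 0.5) - (np.exp(1.5) - np.exp(-0.5)) / 2.0
phi2 = 2.0 * np.sin(-np.pi * w2)

beta1, beta2 = 0.5, -0.5                    # illustrative coefficient values
lin_pred = beta1 * z1 + beta2 * z2 + phi1 + phi2

# Under the cloglog link with eta(t) = log(2t), the survival function is
# S(t | X) = exp{-2 t exp(lin_pred)}, i.e. T | X ~ Exponential(rate = 2 e^lp).
t = rng.exponential(scale=1.0 / (2.0 * np.exp(lin_pred)))

# current status observation scheme: a single observation time per subject
y = rng.exponential(scale=2.0, size=n)      # mean-2 observation times (S1)
delta = (t < y).astype(int)                 # 1 = left-censored, 0 = right-censored
```

The observed data are then {(y_i, delta_i, x_i)}, with the failure times t discarded.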
For each data set, the proposed approach made use of a single knot set consisting of ⌈n^{1/3}⌉ interior knots, while the unpenalized estimator was computed using multiple knot set configurations, with the number of interior knots ranging from ⌈n^{1/3}⌉ − 3 to ⌈n^{1/3}⌉ + 3, and the final unpenalized estimator being selected based on the BIC criterion; for further discussion see Lu and McMahan (2018).

Table 1 summarizes the results of estimating the regression coefficients across all considered simulation configurations for both the penalized and unpenalized spline estimators. Provided in this summary are the empirical bias, sample standard deviation, and mean squared error of the 500 point estimates, as well as the average of the 500 estimated standard errors and the empirical coverage probability associated with 95% confidence intervals. From these results it is apparent that the proposed approach works well; i.e., the penalized spline estimators of the regression coefficients exhibit little bias, the standard deviations of the point estimates are in agreement with the average estimated standard errors, and the empirical coverage probabilities are at their nominal level. The latter two findings tend to suggest that the proposed method of estimating the variance-covariance matrix works well and that large-sample inference is possible even for relatively small sample sizes. Further, the proposed penalized spline estimator outperformed its unpenalized counterpart in virtually every aspect, even though this competing estimator was chosen as the best from amongst multiple candidates. Moreover, the performance of this competing technique varied dramatically across the different knot set configurations; i.e., the unpenalized spline estimator was found to be sensitive to the specification of the number and placement of the basis functions.
In contrast, the penalized estimator was robust to the specification of the basis functions.

One of the primary goals of this study was to assess the performance of the proposed method with respect to estimating the unknown functions. To this end, Figure 1 summarizes the estimates of the unknown functions under S2, for all considered values of α, that were obtained from both approaches. This summary includes the 0.025, 0.5, and 0.975 point-wise quantiles of the estimates. The analogous figures under S1 and S3 can be found in Appendix 2 of the Supplementary Material. These results suggest that the proposed methodology can be used to accurately estimate all of the unknown functions; i.e., very little bias can be observed when one compares the median of the functional estimates to the true underlying function. Additionally, these findings demonstrate that the penalized spline estimator results in functional estimates that have smaller variability when compared to the unpenalized spline estimator. These findings reinforce the assertion that the proposed approach performs better with respect to both estimation and inference than its unpenalized counterpart, and moreover, that it is robust to the specification of the spline functions.

6 Data application

For further illustration, the proposed methodology is now used to analyze uterine leiomyomata (fibroid) data collected as a part of a prospective cohort study concerned with early pregnancy. Uterine fibroids are noncancerous growths that appear during childbearing years and have been found to be associated with complications in pregnancy and delivery (Coronado et al., 2000). As a part of this study, a patient's fibroid status was determined during an ultrasound examination; i.e., during this examination fibroids were either found to be present or absent, indicating that the fibroid onset time was smaller or larger than the examination time, respectively.
Thus, the fibroid onset time is either left- or right-censored, with the patient's age at examination being the censoring time. Of the n = 1604 participants considered in this analysis, 216 were found to be fibroid positive at the time of their examination, leading to a left-censoring rate of approximately 13%.

In this analysis, we focus on four risk factors purported to be related to fibroid development; i.e., race (Z_1 = 1 if African American and Z_1 = 0 if Caucasian), parity status (Z_2 = 1 if the participant had given birth before and Z_2 = 0 otherwise), obesity status (Z_3 = 1 indicating obesity and Z_3 = 0 otherwise), and standardized age of menarche (W). To relate these risk factors to the fibroid onset time, the following model was specified
$$g\{F(t \mid X)\} = Z_1 \beta_1 + Z_2 \beta_2 + Z_3 \beta_3 + \eta(t) + \phi(W),$$
where g(u) = log{−log(1 − u)}. This link function leads to the usual PH model. Cubic B-splines with a knot set consisting of ⌈n^{1/3}⌉ = 12 interior knots were used to approximate all unknown functions. Model fitting was completed using the hybrid algorithm outlined in Section 4. The estimator of the variance-covariance matrix of β̃ developed in Section 4 was used to provide standard errors and Wald confidence intervals for the estimated regression coefficients.

Table 2 summarizes the regression parameter estimates, along with their estimated standard errors and corresponding 95% confidence intervals. The results suggest that race and parity are significant factors, while obesity is not. In particular, these results suggest that African American women have a higher hazard of developing fibroids when compared to Caucasian women; likewise, women who had not given birth before appeared to have a higher hazard of developing fibroids when compared to those who had. These results and conclusions reinforce those presented in Wang and Dunson (2011) and Lu and McMahan (2018).
Figure 2 provides the estimates of η(·) and φ(·), along with 95% point-wise confidence intervals. The upper and lower limits of the 95% point-wise confidence intervals were obtained as the 0.025 and 0.975 point-wise quantiles of 1000 bootstrap estimates. The estimate of φ(·) clearly exhibits a nonlinear pattern; i.e., standardized age of menarche has a nonlinear impact on the development of fibroids. To examine the effect of the assumed model form, the aforementioned analysis was also conducted under the PO model by specifying g(x) = log{x/(1 − x)}. The estimates of the regression coefficients under the PO model are presented in Table 2. These results provide no contradiction to the results obtained under the PH model. A figure summarizing the estimates of η(·) and φ(·) under the PO model, along with 95% point-wise confidence intervals, is provided in Appendix 2 of the Supplementary Material, and these estimated functions exhibit similar features to those obtained under the PH model.

7 Discussion

In this work, we developed a computationally efficient penalized approach for fitting flexible partially linear additive transformation models to current status data. The proposed procedure demonstrates appealing properties, both numerical and theoretical. The hybrid algorithm developed for model fitting is easy to implement and robust to initialization. Through the use of penalization, the proposed approach attains superior finite-sample performance when compared to the standard spline alternative. A simple variance-covariance estimation procedure was established and was shown to provide reliable Wald-type inference.
In addition to attractive finite-sample performance, the estimators of the nonparametric components are shown to attain the optimal rate of convergence, and we prove that the estimators of the regression coefficients are asymptotically normal and efficient in the semi-parametric sense.

Given the successes of this approach, future work will be directed at generalizing this methodology to allow for the analysis of interval-censored data. A key obstacle in the development of this methodology will be the model fitting strategy, which will be far more complicated due to the complex nature of interval-censored data. Another direction for future research could involve further extending this work to allow for multiple end points, both for current status and interval-censored data. Lastly, and potentially most challenging, would be the development of goodness-of-fit tests that could be used to guide the specification of the link function. This is a notoriously hard problem in the transformation model setting, and it is only made harder when one considers censored data and/or additive models.
Acknowledgement
This research was supported by the National Institutes of Health grant AI121351 and National Science Foundation grant OIA-1826715.
Supplementary material
Technical details and proofs of Theorems 1 and 2 are provided in the Supplementary Material, along with the additional results referenced in Sections 5 and 6.
Table 1: Summary of the results that were obtained using the spline estimator and the penalized spline estimator when α = 0 (PH model), α = 0., and α = 1 (PO model). Presented results include the empirical bias (Bias), sample standard deviation (SD) and mean squared error (MSE) of the 500 point estimates, as well as the average of the 500 estimated standard errors (SE) and empirical coverage probabilities associated with 95% confidence intervals (CP).

                  Spline approach                        Penalized approach
                  Bias    SD     MSE    SE     CP        Bias    SD     MSE    SE     CP
α = 0
S1   β           -0.078  0.174  0.036  0.139  88.0      -0.014  0.129  0.017  0.122  94.2
S2   β           -0.064  0.327  0.111  0.284  91.8      -0.030  0.292  0.086  0.259  93.4
     β           -0.071  0.177  0.036  0.149  90.6      -0.034  0.158  0.026  0.136  92.0
α = 0.
S1   β           -0.060  0.166  0.031  0.147  91.8      -0.013  0.145  0.021  0.135  93.6
S2   β           -0.073  0.356  0.132  0.324  93.0      -0.041  0.325  0.107  0.298  93.2
     β           -0.061  0.177  0.035  0.169  94.4      -0.025  0.159  0.026  0.155  94.4
α = 1
S1   β           -0.050  0.158  0.027  0.159  95.4      -0.015  0.144  0.021  0.149  96.6
S2   β           -0.035  0.393  0.156  0.363  94.2      -0.007  0.351  0.123  0.333  93.4
     β           -0.064  0.194  0.042  0.188  93.8      -0.021  0.173  0.030  0.172  95.8

Table 2: Uterine fibroid data: Presented results include the parameter estimates, estimated standard errors (SE) and 95% Wald-type confidence intervals (95% C.I.). The results were obtained using the penalized spline estimator under both the PH and PO models.
Model   Parameter   Estimate    SE      95% C.I.
PH      Race          1.479    0.165   (1.154, 1.803)
        Parity       -0.305    0.144   (-0.588, -0.022)
        Obesity       0.114    0.174   (-0.227, 0.455)
PO      Race          1.648    0.200   (1.256, 2.040)
        Parity       -0.341    0.163   (-0.661, -0.022)
        Obesity       0.155    0.200   (-0.237, 0.547)

Figure 1: Estimates of the unknown functions η(t) and ϕ(W) under scenario S2 for all three models (from top to bottom: α = 0, α = 0., and α = 1). The solid grey curves correspond to the true values of the functions. Dashed lines are the medians (black) and 95% percentiles (grey) of the estimates obtained from the penalized spline approach. Dotted curves are the medians (black) and 95% percentiles (grey) of the estimates obtained from the unpenalized spline approach.

Figure 2: Penalized functional estimates of η(·) and ϕ(·) (the latter plotted against age of menarche), along with 95% point-wise confidence intervals.
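The 95% Wald-type intervals in Table 2 have the form estimate ± z₀.₉₇₅ × SE. The sketch below reproduces two of them from the published (rounded) estimates; the computed endpoints agree with the reported intervals only up to rounding of the inputs.

```python
Z_975 = 1.959964  # 0.975 quantile of the standard normal distribution


def wald_ci(estimate, se, z=Z_975):
    """95% Wald-type confidence interval: estimate +/- z * SE."""
    return (estimate - z * se, estimate + z * se)


# Race effect under the PH model (estimate 1.479, SE 0.165 from Table 2)
lo, hi = wald_ci(1.479, 0.165)
print(round(lo, 3), round(hi, 3))  # close to the reported (1.154, 1.803)

# Parity effect under the PH model (estimate -0.305, SE 0.144)
lo2, hi2 = wald_ci(-0.305, 0.144)  # close to the reported (-0.588, -0.022)
```

The small discrepancies (on the order of 0.001–0.002) arise because the published point estimates and standard errors are themselves rounded to three decimals.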