Anthony Y. C. Kuk
National University of Singapore
Publications
Featured research published by Anthony Y. C. Kuk.
Statistics & Probability Letters | 2000
Anthony Y. C. Kuk; David J. Nott
The method of pairwise likelihood is investigated for analyzing clustered or longitudinal binary data. The pairwise likelihood is a product of bivariate likelihoods for within-cluster pairs of observations, and its maximizer is the maximum pairwise likelihood estimator. We discuss the computational advantages of pairwise likelihood relative to competing approaches, present some efficiency calculations, and argue that when cluster sizes are unequal, a weighted pairwise likelihood should be used for the marginal regression parameters, whereas the unweighted pairwise likelihood should be used for the association parameters.
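As a concrete illustration (not the authors' implementation), the following Python sketch fits a logistic marginal model with a common within-cluster odds ratio, building the bivariate probabilities with the Plackett formula. The data, function names, and optimizer settings are invented, and the cluster-size weighting recommended above is omitted.

```python
# Minimal sketch of maximum pairwise likelihood for clustered binary data,
# assuming logistic marginals and a common odds ratio (Plackett construction).
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def joint_p11(p1, p2, psi):
    """P(Y_j=1, Y_k=1) given marginals p1, p2 and odds ratio psi."""
    if abs(psi - 1.0) < 1e-12:
        return p1 * p2
    a = 1.0 + (p1 + p2) * (psi - 1.0)
    return (a - np.sqrt(a * a - 4.0 * psi * (psi - 1.0) * p1 * p2)) / (2.0 * (psi - 1.0))

def neg_pairwise_loglik(theta, clusters):
    beta, psi = theta[:-1], np.exp(theta[-1])
    nll = 0.0
    for X, y in clusters:                      # one (design, response) per cluster
        p = expit(X @ beta)
        for j in range(len(y)):
            for k in range(j + 1, len(y)):     # all within-cluster pairs
                p11 = joint_p11(p[j], p[k], psi)
                probs = {(1, 1): p11,
                         (1, 0): p[j] - p11,
                         (0, 1): p[k] - p11,
                         (0, 0): 1.0 - p[j] - p[k] + p11}
                nll -= np.log(max(probs[(int(y[j]), int(y[k]))], 1e-12))
    return nll

# Toy data: 50 clusters of size 4, one covariate.
rng = np.random.default_rng(0)
clusters = []
for _ in range(50):
    X = np.column_stack([np.ones(4), rng.normal(size=4)])
    y = (rng.random(4) < expit(X @ np.array([-0.5, 1.0]))).astype(int)
    clusters.append((X, y))

fit = minimize(neg_pairwise_loglik, x0=np.zeros(3), args=(clusters,), method="BFGS")
print("beta_hat:", fit.x[:-1], " psi_hat:", np.exp(fit.x[-1]))
```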
Biometrics | 1997
Jennifer S. K. Chan; Anthony Y. C. Kuk
The probit-normal model for binary data (McCulloch, 1994, Journal of the American Statistical Association 89, 330-335) is extended to allow correlated random effects. To obtain maximum likelihood estimates, we use the EM algorithm with its M-step greatly simplified under the assumption of a probit link and its E-step made feasible by Gibbs sampling. Standard errors are calculated by inverting a Monte Carlo approximation of the information matrix rather than via the SEM algorithm. A method is also suggested that accounts for the Monte Carlo variation explicitly. As an illustration, we present a new analysis of the famous salamander mating data. Unlike previous analyses, we find it necessary to introduce different variance components for different species of animals. Finally, we consider models with correlated errors as well as correlated random effects.
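The sketch below illustrates the flavour of the approach under simplifying assumptions: a single independent random intercept per cluster rather than the paper's correlated random effects, a short Gibbs run per E-step, and invented toy data. Under the probit link the M-step is a closed-form least-squares update, which is the simplification the abstract refers to.

```python
# Monte Carlo EM (Gibbs E-step) for a probit random-intercept model: a sketch.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# Toy data: 40 clusters of size 5.
m, n = 40, 5
X = np.column_stack([np.ones(m * n), rng.normal(size=m * n)])
cluster = np.repeat(np.arange(m), n)
b_true = rng.normal(0, 1.0, size=m)
y = ((X @ np.array([0.3, 0.8]) + b_true[cluster] + rng.normal(size=m * n)) > 0).astype(int)

beta, sigma2 = np.zeros(2), 1.0
XtX_inv = np.linalg.inv(X.T @ X)
for it in range(30):                        # MCEM iterations
    S_zb = np.zeros(m * n)                  # running mean of z - b[cluster]
    S_b2 = np.zeros(m)                      # running mean of b_i^2
    b = np.zeros(m)
    n_draws, burn = 60, 20
    for t in range(n_draws):                # Gibbs E-step
        mu = X @ beta + b[cluster]
        lo = np.where(y == 1, -mu, -np.inf) # latent z > 0 if y=1, z < 0 if y=0
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=m * n, random_state=rng)
        prec = n + 1.0 / sigma2             # posterior precision of each b_i
        b_mean = np.bincount(cluster, weights=z - X @ beta) / prec
        b = b_mean + rng.normal(0, np.sqrt(1.0 / prec), size=m)
        if t >= burn:
            S_zb += (z - b[cluster]) / (n_draws - burn)
            S_b2 += b**2 / (n_draws - burn)
    beta = XtX_inv @ X.T @ S_zb             # closed-form M-step under probit link
    sigma2 = S_b2.mean()
print("beta_hat:", beta, " sigma2_hat:", sigma2)
```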
Journal of the Royal Statistical Society, Series B (Statistical Methodology) | 2002
Kelvin K. W. Yau; Anthony Y. C. Kuk
Generalized linear mixed models (GLMMs) are widely used to analyse non-normal response data with extra-variation, but non-robust estimators are still routinely used. We propose robust methods for maximum quasi-likelihood and residual maximum quasi-likelihood estimation to limit the influence of outlying observations in GLMMs. The estimation procedure parallels the development of robust estimation methods in linear mixed models, but with adjustments in the dependent variable and the variance component. The methods proposed are applied to three data sets and a comparison is made with the nonparametric maximum likelihood approach. When applied to a set of epileptic seizure data, the methods proposed have the desired effect of limiting the influence of outlying observations on the parameter estimates. Simulation shows that one of the residual maximum quasi-likelihood proposals has a smaller bias than those of the other estimation methods. We further discuss the equivalence of two GLMM formulations when the response variable follows an exponential family. Their extensions to robust GLMMs and their comparative advantages in modelling are described. Some possible modifications of the robust GLMM estimation methods are given to provide further flexibility for applying the method.
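A minimal sketch of the robustification idea, reduced to a fixed-effects Poisson GLM: a Huber psi function bounds the Pearson residuals inside the quasi-likelihood score. The Fisher-consistency correction and the variance-component adjustments of the full GLMM proposal are omitted, and all names and data are illustrative.

```python
# Robust quasi-likelihood score for a Poisson GLM (log link): a sketch.
import numpy as np
from scipy.optimize import root

def huber_psi(r, c=1.345):
    return np.clip(r, -c, c)               # bounds the influence of large residuals

def robust_score(beta, X, y, c=1.345):
    """Huber-psi applied to Pearson residuals; for Poisson with log link,
    psi(r) / sqrt(V(mu)) * dmu/dbeta = psi(r) * sqrt(mu) * x.
    (Fisher-consistency correction omitted for brevity.)"""
    mu = np.exp(X @ beta)
    r = (y - mu) / np.sqrt(mu)             # Pearson residuals
    return X.T @ (huber_psi(r, c) * np.sqrt(mu))

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.7])))
y[:5] += 30                                # inject a few outliers

fit = root(robust_score, x0=np.zeros(2), args=(X, y))
print("robust beta_hat:", fit.x)
```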
Journal of Statistical Computation and Simulation | 1997
Anthony Y. C. Kuk; Yuk W. Cheng
It is shown that the Monte Carlo Newton-Raphson algorithm is a viable alternative to the Monte Carlo EM algorithm for finding maximum likelihood estimates based on incomplete data. Both Monte Carlo procedures require simulations from the conditional distribution of the missing data given the observed data with the aid of methods like Gibbs sampling and rejective sampling. The Newton-Raphson algorithm is computationally more efficient than the EM algorithm as it converges faster. We further refine the procedure to make it more stable numerically. Our stopping criterion is based on a chi-square test for zero gradient. We control the type II error by working out the number of Monte Carlo replications required to make the non-centrality parameter sufficiently large. The procedure is validated and illustrated using three examples involving binary, survival and count data. In the last example, the Monte Carlo Newton-Raphson procedure is eight times faster than a modified version of the Monte Carlo EM algorithm.
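The following self-contained sketch applies the idea to right-censored exponential data, where the conditional distribution of a censored lifetime given the observed data is a shifted exponential by memorylessness. It uses the complete-data information in the Newton step and a chi-square test for zero gradient as the stopping rule; choosing the number of Monte Carlo replications to control the type II error, as the paper does, is not shown.

```python
# Monte Carlo Newton-Raphson for censored exponential data: a sketch.
import numpy as np

rng = np.random.default_rng(3)

# Observe x = min(t, c) and the event indicator delta.
n, lam_true = 300, 0.5
t = rng.exponential(1.0 / lam_true, size=n)
c = rng.exponential(2.0, size=n)
x, delta = np.minimum(t, c), (t <= c)

lam, m = 1.0, 200                           # starting value, MC replications per step
for step in range(50):
    # Impute censored lifetimes from their conditional law: c + Exp(lam).
    g = np.empty(m)
    for k in range(m):
        t_imp = np.where(delta, x, x + rng.exponential(1.0 / lam, size=n))
        g[k] = n / lam - t_imp.sum()        # complete-data score at lam
    g_bar, g_var = g.mean(), g.var(ddof=1)
    info = n / lam**2                       # complete-data Fisher information
    lam += g_bar / info                     # Monte Carlo Newton-Raphson update
    # Stop when a chi-square test cannot reject "zero gradient":
    # under H0, m * g_bar^2 / s^2 is approximately chi-square(1).
    if m * g_bar**2 / g_var < 3.84:         # 95% point of chi-square(1)
        break
print(f"lam_hat = {lam:.4f} after {step + 1} steps")
```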
Journal of the American Statistical Association | 1997
K. F. Lam; Anthony Y. C. Kuk
A marginal likelihood approach is proposed for estimating the parameters in a frailty model using clustered survival data. To overcome the analytic intractability of the marginal likelihood function, we propose a Monte Carlo approximation using the technique of importance sampling. Implementation is by means of simulations from the uniform distribution. The suggested method can cope with censoring and unequal cluster sizes and can be applied to any frailty distribution with an explicit Laplace transform. We concentrate on a two-parameter family that includes the gamma, inverse Gaussian, and positive stable distributions as special cases. The method is illustrated using data from an animal carcinogenesis experiment and validated in a simulation study.
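A minimal sketch under assumed choices: a constant baseline hazard and a gamma frailty with mean 1 (a case where the marginal likelihood is actually available in closed form, which makes it convenient for checking). Frailties are generated from a fixed set of uniforms via the inverse CDF, so the Monte Carlo objective stays smooth in the parameters.

```python
# Monte Carlo marginal likelihood for a shared-frailty survival model: a sketch.
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(4)
U = rng.random(2000)                               # common uniforms, reused across
                                                   # evaluations of the objective

def cluster_marglik(times, events, rho, k):
    """MC marginal likelihood of one cluster: shared gamma(k, 1/k) frailty
    (mean 1), constant baseline hazard rho, frailties via the inverse CDF."""
    w = gamma.ppf(U, k, scale=1.0 / k)
    loglik = (events[None, :] * np.log(w[:, None] * rho)
              - w[:, None] * rho * times[None, :]).sum(axis=1)
    return np.exp(loglik).mean()

def neg_loglik(theta, data):
    rho, k = np.exp(theta)                         # keep both parameters positive
    return -sum(np.log(cluster_marglik(t, d, rho, k)) for t, d in data)

# Toy clustered survival data: 30 clusters of size 4 with random censoring.
data = []
for _ in range(30):
    w = rng.gamma(2.0, 0.5)                        # true frailty, mean 1
    t = rng.exponential(1.0 / (w * 0.8), size=4)
    c = rng.exponential(3.0, size=4)
    data.append((np.minimum(t, c), (t <= c).astype(float)))

fit = minimize(neg_loglik, x0=np.zeros(2), args=(data,), method="Nelder-Mead")
print("rho_hat, k_hat:", np.exp(fit.x))
```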
The Journal of Clinical Psychiatry | 2010
Anthony Y. C. Kuk; Jialiang Li; A. John Rush
OBJECTIVE There are currently no clinically useful assessments that can reliably predict, early in treatment, whether a particular depressed patient will respond to a particular antidepressant. We explored the possibility of using baseline features and early symptom change to predict which patients will and which will not respond to treatment. METHOD Participants were 2,280 outpatients enrolled in the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study who had complete 16-item Quick Inventory of Depressive Symptomatology-self-report (QIDS-SR16) records at baseline, week 2, and week 6 (primary outcome) of treatment with citalopram. Response was defined as a ≥ 50% reduction in QIDS-SR16 score by week 6. Using a recursive subsetting algorithm that we developed, we combined baseline variables and change in QIDS-SR16 scores from baseline to week 2 to predict response or nonresponse to treatment for as many patients as possible with controlled accuracy, while reserving judgment for the rest. RESULTS Baseline variables by themselves were not clinically useful predictors, whereas symptom change from baseline to week 2 identified 280 nonresponders, of whom 227 were true nonresponders. By subsetting recursively according to both baseline features and symptom change, we were able to identify 505 nonresponders, of whom 403 were true nonresponders, achieving a clinically meaningful negative predictive value of 0.8 that was upheld in cross-validation analyses. CONCLUSIONS Recursive subsetting based on baseline features and early symptom change allows predictions of nonresponse that are sufficiently certain for clinicians to spare identified patients from prolonged exposure to ineffective treatment, thereby personalizing depression management and saving time and cost. TRIAL REGISTRATION clinicaltrials.gov Identifier: NCT00021528.
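The recursive subsetting idea can be caricatured in a few lines: within each baseline-defined subset, find the most generous cutoff on early symptom change that still labels predicted nonresponders with the target negative predictive value, and reserve judgment for everyone else. The data below are synthetic, and the single binary baseline feature is a stand-in for the paper's richer feature set.

```python
# Schematic recursive subsetting with an NPV constraint: a sketch on fake data.
import numpy as np

def npv_threshold(change, responder, target_npv=0.8):
    """Largest cutoff on early improvement such that patients below it can be
    called nonresponders with NPV >= target_npv; -inf if none qualifies."""
    order = np.argsort(change)
    cum_true_nonresp = np.cumsum(~responder[order])
    npv = cum_true_nonresp / np.arange(1, len(change) + 1)
    ok = np.where(npv >= target_npv)[0]
    return change[order][ok[-1]] if len(ok) else -np.inf

rng = np.random.default_rng(5)
n = 1000
baseline = rng.integers(0, 2, size=n)                     # one binary baseline feature
change = rng.normal(0.2 + 0.2 * baseline, 0.3, size=n)    # week-2 improvement
responder = rng.random(n) < np.clip(change + 0.3, 0, 1)   # week-6 outcome

# Recursive step: split on the baseline feature, threshold within each subset.
labelled = 0
for g in (0, 1):
    idx = baseline == g
    cut = npv_threshold(change[idx], responder[idx])
    called = change[idx] <= cut
    labelled += called.sum()
    if called.sum():
        print(f"subset {g}: call {called.sum()} nonresponders, "
              f"NPV = {(~responder[idx][called]).mean():.2f}")
print(f"{labelled}/{n} patients labelled; judgment reserved for the rest")
```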
Journal of Statistical Computation and Simulation | 1999
Anthony Y. C. Kuk
It is well known that the standard Laplace approximation of the integrated marginal likelihood function of a random effects model may be invalid if the dimension of the integral increases with the sample size, in which case the resulting parameter estimates, especially those of the variance components, are biased towards zero. Bias-correction factors have been proposed in the literature, but they are asymptotically correct only in the case of small variance components. Techniques for modifying the standard Laplace expansion have also been proposed, but they are highly technical and problem-specific and hence unsuitable for routine use. Monte Carlo approximations of the marginal likelihood function typically make use of Markov chain Monte Carlo (MCMC) sampling, the convergence of which is difficult to check. We propose an importance sampling method where the importance function is chosen with the aid of the Laplace expansion. Since the expansion is only used to suggest an appropriate importance function, its accuracy is not critical to the validity of the resulting estimates.
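A sketch of the idea for one cluster of a random-intercept logistic model: a few Newton steps locate the mode and curvature of the integrand (the Laplace step), and a normal density centred there, with an inflated standard deviation, serves as the importance function. All settings are illustrative.

```python
# Laplace-guided importance sampling of a cluster marginal likelihood: a sketch.
import numpy as np
from scipy.special import expit
from scipy.stats import norm

rng = np.random.default_rng(6)

def log_integrand(b, y, eta0, sigma):
    """log of f(y | b) * phi(b; 0, sigma^2) for a random-intercept logit cluster."""
    p = expit(eta0 + b)
    return (y * np.log(p) + (1 - y) * np.log1p(-p)).sum() + norm.logpdf(b, 0, sigma)

def marg_lik_is(y, eta0, sigma, n_mc=2000, inflate=1.5):
    # Laplace step: locate the mode and curvature of the integrand by Newton.
    b = 0.0
    for _ in range(25):
        p = expit(eta0 + b)
        grad = (y - p).sum() - b / sigma**2
        hess = -(p * (1 - p)).sum() - 1 / sigma**2
        b -= grad / hess
    sd = inflate * np.sqrt(-1 / hess)       # inflate to cover the tails
    # Importance sampling from N(mode, sd^2), guided by the Laplace fit.
    bs = rng.normal(b, sd, size=n_mc)
    logw = np.array([log_integrand(bb, y, eta0, sigma) for bb in bs])
    logw -= norm.logpdf(bs, b, sd)
    return np.exp(logw).mean()

y = np.array([1, 1, 0, 1, 0])               # one cluster of binary responses
print("cluster marginal likelihood:", marg_lik_is(y, eta0=0.2, sigma=1.0))
```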
Statistics and Computing | 1999
Anthony Y. C. Kuk; Yuk W. Cheng
We consider the use of Monte Carlo methods to obtain maximum likelihood estimates for random effects models and distinguish between the pointwise and functional approaches. We explore the relationship between the two approaches and compare them with the EM algorithm. The functional approach is more ambitious, but the approximation is local in nature, which we demonstrate graphically using two simple examples. A remedy is to obtain successively better approximations of the relative likelihood function near the true maximum likelihood estimate. To save computing time, we use only one Newton iteration to approximate the maximiser of each Monte Carlo likelihood and show that this is equivalent to the pointwise approach. The procedure is applied to fit a latent process model to a set of polio incidence data. The paper ends with a comparison between the marginal likelihood and the recently proposed hierarchical likelihood, which avoids integration altogether.
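The local nature of the functional approximation is easy to exhibit with the same censored-exponential toy problem used earlier: simulate the missing data once at a reference value and trace the Monte Carlo relative log-likelihood over a grid. The approximation is reliable near the reference value and degrades away from it, which is what motivates the successive re-approximation described above.

```python
# Functional Monte Carlo approximation of a relative likelihood: a sketch.
import numpy as np

rng = np.random.default_rng(7)

# Censored exponential data, as in the Newton-Raphson sketch.
n, lam_true = 300, 0.5
t = rng.exponential(1.0 / lam_true, size=n)
c = rng.exponential(2.0, size=n)
x, delta = np.minimum(t, c), (t <= c)

lam0, m = 1.0, 500
# Draw the missing lifetimes once, at lam0 (the functional approach).
t_imp = np.stack([np.where(delta, x, x + rng.exponential(1.0 / lam0, size=n))
                  for _ in range(m)])
sums = t_imp.sum(axis=1)                     # sufficient statistic per draw

def mc_rel_loglik(lam):
    """log of the Monte Carlo estimate of L(lam) / L(lam0):
    the complete-data ratio is (lam/lam0)^n * exp(-(lam - lam0) * sum(t))."""
    logratio = n * np.log(lam / lam0) - (lam - lam0) * sums
    return np.log(np.exp(logratio - logratio.max()).mean()) + logratio.max()

# The estimate is accurate near lam0 and increasingly noisy away from it.
for lam in np.linspace(0.4, 1.6, 7):
    print(f"lam = {lam:.2f}: MC rel. loglik = {mc_rel_loglik(lam):.3f}")
```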
Statistics in Medicine | 2005
Anthony Y. C. Kuk; Stefan Ma
The incubation period of SARS is the time between infection and the onset of symptoms. Knowledge about the distribution of incubation times is crucial in determining the length of the quarantine period and is an important parameter in modelling the spread and control of SARS. As the exact time of infection is unknown for most patients, the incubation time cannot be determined. What is observable is the serial interval, which is the time from the onset of symptoms in an index case to the onset of symptoms in a subsequent case infected by the index case. By constructing a convolution likelihood based on the serial interval data, we are able to estimate the incubation distribution, which is assumed to be Weibull, and justifications are given to support this choice over other distributions. The method is applied to data provided by the Ministry of Health of Singapore and the results justify the choice of a ten-day quarantine period. The indirect estimate obtained using the method of convolution likelihood is validated by means of comparison with a direct estimate obtained from a subset of patients for whom the incubation time can be ascertained. Despite its name, the proposed indirect estimate is actually more precise than the direct estimate because serial interval data are recorded for almost all patients, whereas exact incubation times can be determined for only a small subset. It is possible to obtain an even more efficient estimate by using the combined data, but the improvement is not substantial.
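A sketch of a convolution likelihood of this kind, with loudly assumed ingredients: the time from the index case's symptom onset to transmission is taken to be exponential with a two-day mean (the paper's actual specification may differ), and the incubation distribution is Weibull as in the abstract. The serial-interval density is then the numerical convolution of the two.

```python
# Convolution likelihood for serial intervals with a Weibull incubation: a sketch.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import weibull_min, expon

rng = np.random.default_rng(8)

# Assumed: time from index onset to transmission ~ Exp(mean 2 days).
trans = expon(scale=2.0)

def serial_density(s, k, lam):
    """Convolution of the transmission-time and Weibull incubation densities."""
    f = lambda t: trans.pdf(t) * weibull_min.pdf(s - t, k, scale=lam)
    return quad(f, 0.0, s)[0]

def neg_loglik(theta, serials):
    k, lam = np.exp(theta)                       # keep shape and scale positive
    dens = [max(serial_density(s, k, lam), 1e-300) for s in serials]
    return -np.log(dens).sum()

# Toy serial intervals generated under the model (incubation: k=2.5, lam=6).
serials = (rng.exponential(2.0, 200)
           + weibull_min.rvs(2.5, scale=6.0, size=200, random_state=rng))

fit = minimize(neg_loglik, x0=np.log([2.0, 5.0]), args=(serials,),
               method="Nelder-Mead")
k_hat, lam_hat = np.exp(fit.x)
print(f"Weibull incubation: shape = {k_hat:.2f}, scale = {lam_hat:.2f}")
print("95th percentile (quarantine check):",
      weibull_min.ppf(0.95, k_hat, scale=lam_hat))
```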
Bioinformatics | 2009
Anthony Y. C. Kuk; Han Zhang; Yaning Yang
MOTIVATION Pooling large numbers of DNA samples is a common practice in association studies, especially for initial screening. However, the use of expectation-maximization (EM)-type algorithms in estimating haplotype distributions for even moderate pool sizes is hampered by the computational complexity involved. A novel constrained EM algorithm called PoooL has been proposed recently to bypass the difficulty via the use of asymptotic normality of the pooled allele frequencies. The resulting estimates are, however, not maximum likelihood estimates and hence not optimal. Furthermore, the assumption of Hardy-Weinberg equilibrium (HWE) made may not be realistic in practice. METHODS Rather than carrying out constrained maximization as in PoooL, we revert to the usual EM algorithm but make it computationally feasible by using normal approximations. The resulting algorithm is much simpler to implement than PoooL because there is no need to invoke sophisticated iterative scaling methods as in PoooL. We also develop an estimating equation analogue of the EM algorithm for the case of Hardy-Weinberg disequilibrium (HWD) by conditioning on the haplotypes of both chromosomes of the same individual. Incorporated into the method is a way of estimating the inbreeding coefficient by relating it to overdispersion. RESULTS Simulation studies assuming HWE show that our simplified implementation of the EM algorithm leads to estimates with substantially smaller SDs than PoooL estimates. Further simulations show that ignoring HWD will induce biases in the estimates. Our extended method, with estimation of the inbreeding coefficient incorporated, is able to reduce the bias, leading to estimates with substantially smaller mean square errors. We also present results to suggest that our method can cope with a certain degree of locus-specific inbreeding as well as additional overdispersion not caused by inbreeding. AVAILABILITY http://staff.ustc.edu.cn/~ynyang/aem-aes
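For orientation, here is the classical EM algorithm for two-locus haplotype frequencies in the individual-genotype (pool size 1) case under HWE; it is this computation that becomes infeasible for large pools and that the paper makes tractable with normal approximations (and extends to HWD). The data and frequencies are invented.

```python
# Baseline EM for two-locus haplotype frequencies from unphased genotypes.
import numpy as np

rng = np.random.default_rng(9)

# Haplotypes over two biallelic SNPs: index 0=00, 1=01, 2=10, 3=11.
HAP_ALLELES = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
p_true = np.array([0.4, 0.1, 0.2, 0.3])

# Simulate unphased genotypes under HWE (allele counts per locus).
n = 2000
h1 = rng.choice(4, n, p=p_true)
h2 = rng.choice(4, n, p=p_true)
geno = HAP_ALLELES[h1] + HAP_ALLELES[h2]

p = np.full(4, 0.25)                            # uniform start
for _ in range(200):
    counts = np.zeros(4)
    for g, m in zip(*np.unique(geno, axis=0, return_counts=True)):
        # E-step: weight each phase-compatible haplotype pair by its probability.
        pairs = [(a, b) for a in range(4) for b in range(4)
                 if np.array_equal(HAP_ALLELES[a] + HAP_ALLELES[b], g)]
        w = np.array([p[a] * p[b] for a, b in pairs])
        w /= w.sum()
        for (a, b), wt in zip(pairs, w):
            counts[a] += m * wt
            counts[b] += m * wt
    p_new = counts / counts.sum()               # M-step: renormalize
    if np.abs(p_new - p).max() < 1e-8:
        break
    p = p_new
print("haplotype frequency estimates:", np.round(p, 3), "(true:", p_true, ")")
```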