Kelvin K. W. Yau
City University of Hong Kong
Publication
Featured research published by Kelvin K. W. Yau.
Statistical Methods in Medical Research | 2006
Andy H. Lee; Kui Wang; Jane A. Scott; Kelvin K. W. Yau; Geoffrey J. McLachlan
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
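To make the EM idea concrete, here is a minimal sketch for the simplest possible case: a ZIP model with no covariates and no random effects. This is a deliberate simplification and not the multi-level procedure of the paper. The function name `fit_zip_em` and the parameter names `p` (probability of a structural zero) and `lam` (Poisson mean) are assumptions made for illustration.

```python
import math


def fit_zip_em(counts, n_iter=500, tol=1e-12):
    """Fit an intercept-only ZIP model by EM (illustrative sketch).

    Returns (p, lam): p is the probability of a structural zero and
    lam is the Poisson mean of the count component.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")

    # Starting values: attribute half of the observed zeros to the
    # structural-zero component.
    p = 0.5 * n_zero / n
    lam = total / n

    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        z0 = p / (p + (1.0 - p) * math.exp(-lam))
        # M-step: closed-form updates given the expected memberships.
        p_new = n_zero * z0 / n
        lam_new = total / (n - n_zero * z0)
        converged = abs(p_new - p) < tol and abs(lam_new - lam) < tol
        p, lam = p_new, lam_new
        if converged:
            break
    return p, lam


# Hypothetical data with many more zeros than a Poisson fit would expect.
counts = [0] * 60 + [1] * 15 + [2] * 15 + [3] * 7 + [4] * 3
p_hat, lam_hat = fit_zip_em(counts)
print(round(p_hat, 3), round(lam_hat, 3))
```

In the regression and multi-level settings described in the abstract, the two closed-form M-step updates would be replaced by weighted model fits, and the variance components would need their own estimating equations.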
Accident Analysis & Prevention | 2013
Guangnan Zhang; Kelvin K. W. Yau; Guanghan Chen
With the recent economic boom in China, vehicle volume and the number of traffic accident fatalities have become the highest in the world. Meanwhile, traffic accidents have become the leading cause of death in China. Systematically analyzing road safety data from different perspectives and applying empirical methods/implementing proper measures to reduce the fatality rate will be an urgent and challenging task for China in the coming years. In this study, we analyze the traffic accident data for the period 2006-2010 in Guangdong Province, China. These data, extracted from the Traffic Management Sector-Specific Incident Case Data Report, are the only officially available and reliable source of traffic accident data (with a sample size > 7000 per year). In particular, we focus on two outcome measures: traffic violations and accident severity. Human, vehicle, road and environmental risk factors are considered. First, the results establish the role of traffic violations as one of the major risks threatening road safety. An immediate implication is: if the traffic violation rate could be reduced or controlled successfully, then the rate of serious injuries and fatalities would be reduced accordingly. Second, specific risk factors associated with traffic violations and accident severity are determined. Accordingly, to reduce traffic accident incidence and fatality rates, measures such as traffic regulations and legislation (targeting different vehicle types/driver groups with respect to the various human, vehicle and environment risk factors) are needed. Such measures could include road safety programs for targeted driver groups, focused enforcement of traffic regulations and road/transport facility improvements. Data analysis results arising from this study will shed light on the development of similar (adjusted) measures to reduce traffic violations and/or accident fatalities and injuries, and to promote road safety in other regions.
Journal of The Royal Statistical Society Series B-statistical Methodology | 2002
Kelvin K. W. Yau; Anthony Y. C. Kuk
Generalized linear mixed models (GLMMs) are widely used to analyse non-normal response data with extra-variation, but non-robust estimators are still routinely used. We propose robust methods for maximum quasi-likelihood and residual maximum quasi-likelihood estimation to limit the influence of outlying observations in GLMMs. The estimation procedure parallels the development of robust estimation methods in linear mixed models, but with adjustments in the dependent variable and the variance component. The methods proposed are applied to three data sets and a comparison is made with the nonparametric maximum likelihood approach. When applied to a set of epileptic seizure data, the methods proposed have the desired effect of limiting the influence of outlying observations on the parameter estimates. Simulation shows that one of the residual maximum quasi-likelihood proposals has a smaller bias than those of the other estimation methods. We further discuss the equivalence of two GLMM formulations when the response variable follows an exponential family. Their extensions to robust GLMMs and their comparative advantages in modelling are described. Some possible modifications of the robust GLMM estimation methods are given to provide further flexibility for applying the method.
Accident Analysis & Prevention | 2002
Andy H. Lee; Mark Stevenson; K. Wang; Kelvin K. W. Yau
Much of the data collected on motor vehicle crashes is count data. The standard Poisson regression approach used to model this type of data does not take into account the fact that there are few crash events and hence, many observed zeros. In this paper, we applied the zero-inflated Poisson (ZIP) model (which adjusts for the many observed zeros) and the negative binomial (NB) model to analyze young driver motor vehicle crashes. The results of the ZIP regression model are comparable to those from fitting an NB regression model for general over-dispersion. The findings highlight that driver confidence/adventurousness and the frequency of driving prior to licensing are significant predictors of crash outcome in the first 12 months of driving. We encourage researchers, when analyzing motor vehicle crash data, to consider the empirical frequency distribution first and to apply the ZIP and NB models in the presence of extra zeros due, for example, to under-reporting.
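One simple way to look at the empirical frequency distribution first is to compare the observed share of zeros with the share a plain Poisson model with the same mean would predict. The sketch below is only an informal check and not the authors' procedure; the function name `zero_excess` is hypothetical.

```python
import math


def zero_excess(counts):
    """Return (observed, expected) zero proportions for count data.

    'expected' is exp(-mean), the probability of a zero under a
    Poisson distribution whose mean equals the sample mean.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("counts must not be empty")
    mean = sum(counts) / n
    observed = sum(1 for y in counts if y == 0) / n
    expected = math.exp(-mean)
    return observed, expected


# Hypothetical crash counts per driver.
observed, expected = zero_excess([0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 3])
print(observed > expected)
```

An observed proportion well above the Poisson-expected one suggests that a ZIP or NB model may be worth considering; a formal decision would still call for a proper test.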
Statistics in Medicine | 1998
Kelvin K. W. Yau; C. A. McGilchrist
In the study of multiple failure times for the same subjects, for example, recurrent infections for patients with a given disease, there are often subject effects, that is, subjects have different risks that cannot be explained by known covariates. Standard methods, which ignore subject effects, lead to overestimation of precision. The frailty model for subject effects is better, but can be insufficient, because it assumes that subject effects are constant over time. Experience has shown that the dependence between different time periods often decreases with distance in time. Such a model is presented here, assuming that the frailty is no longer constant, but time varying, with one value for each spell. The main example is a first-order autoregressive process. This is applied to a data set of 128 patients with chronic granulomatous disease (CGD), participating in a placebo-controlled randomized trial of gamma interferon (gamma-IFN), suffering between 0 and 7 infections. It is shown that the time varying frailty model gives a significantly better fit than the constant frailty model.
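The decay of dependence with distance in time can be illustrated with the correlation structure of a stationary first-order autoregressive process, where the correlation between the frailties of spells s and t is rho raised to the power |s - t|. The sketch below assumes that standard form; the paper's exact parametrization is not given in the abstract, and the function name `ar1_correlation` is hypothetical.

```python
def ar1_correlation(n_spells, rho):
    """Correlation matrix of a stationary AR(1) sequence of frailties.

    Entry [s][t] is rho ** |s - t|, so dependence weakens as spells
    lie further apart in time.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(s - t) for t in range(n_spells)]
            for s in range(n_spells)]


# Print the matrix for four spells with an illustrative rho of 0.5.
for row in ar1_correlation(4, 0.5):
    print(row)
```

A constant frailty corresponds to the limiting case in which every off-diagonal correlation equals one, which is why the time-varying model can be viewed as a relaxation of it.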
Biometrical Journal | 2001
Andy H. Lee; Kui Wang; Kelvin K. W. Yau
When analyzing Poisson count data, a high frequency of extra zeros is sometimes observed. The Zero-Inflated Poisson (ZIP) model is a popular approach to handle zero-inflation. In this paper we generalize the ZIP model and its regression counterpart to accommodate the extent of individual exposure. Empirical evidence drawn from an occupational injury data set confirms that the incorporation of exposure information can exert a substantial impact on the model fit. Tests for zero-inflation are also considered. Their finite sample properties are examined in a Monte Carlo study.
Computer Methods and Programs in Biomedicine | 2002
Kui Wang; Kelvin K. W. Yau; Andy H. Lee
With the increasing trend of same-day procedures and operations performed for hospital admissions, it is important to analyze those Diagnosis Related Groups (DRGs) consisting of mainly same-day separations. A zero-inflated Poisson (ZIP) mixed model is presented to identify health- and patient-related characteristics associated with length of stay (LOS) and to model variations in LOS within such DRGs. Random effects are introduced to account for inter-hospital variations and the dependence of clustered LOS observations via the generalized linear mixed models (GLMM) approach. Parameter estimation is achieved by maximizing an appropriate log-likelihood function using the EM algorithm to obtain approximate residual maximum likelihood (REML) estimates. An S-Plus macro is developed to provide a unified ZIP modeling approach. The determination of pertinent factors would help hospital administrators and clinicians manage LOS and expenditures efficiently.
Computational Statistics & Data Analysis | 2003
Kelvin K. W. Yau; Andy H. Lee; Angus S.K. Ng
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of the generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation.
Statistics in Medicine | 2011
Liming Xiang; Xiangmei Ma; Kelvin K. W. Yau
The mixture cure model is an effective tool for analysis of survival data with a cure fraction. This approach integrates the logistic regression model for the proportion of cured subjects and the survival model (either the Cox proportional hazards or accelerated failure time model) for uncured subjects. Methods based on the mixture cure model have been extensively investigated in the literature for data with exact failure/censoring times. In this paper, we propose a mixture cure modeling procedure for analyzing clustered and interval-censored survival time data by incorporating random effects in both the logistic regression and PH regression components. Under the generalized linear mixed model framework, we develop the REML estimation for the parameters, as well as an iterative algorithm for estimation of the survival function for interval-censored data. The estimation procedure is implemented via an EM algorithm. A simulation study is conducted to evaluate the performance of the proposed method in various practical situations. To demonstrate its usefulness, we apply the proposed method to analyze the interval-censored relapse time data from a smoking cessation study whose subjects were recruited from 51 zip code regions in the southeastern corner of Minnesota.
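For orientation, the usual form of a mixture cure model with cluster-level random effects can be written as follows. The notation is an assumption chosen for illustration and may differ from the paper's: \pi is the probability of being uncured, S_u is the survival function of uncured subjects under a proportional hazards (PH) model with baseline S_0, and u_i, v_i are random effects for cluster i.

```latex
\begin{aligned}
S_{\mathrm{pop}}(t \mid x_{ij}, z_{ij})
  &= 1 - \pi(z_{ij}) + \pi(z_{ij})\, S_u(t \mid x_{ij}), \\
\operatorname{logit} \pi(z_{ij})
  &= \gamma^{\top} z_{ij} + v_i, \\
S_u(t \mid x_{ij})
  &= S_0(t)^{\exp(\beta^{\top} x_{ij} + u_i)} .
\end{aligned}
```

As t grows, S_u tends to zero and the population survival function levels off at 1 - \pi, which is the cure fraction.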
Emergency Medicine Journal | 2015
Mina Cheng; Moon-Tong Cheung; Kin-Yan Lee; Kin-Bong Lee; Susan-C H Chan; Amy-C Y Wu; Yu-Fat Chow; A. Chang; Hiu-Fai Ho; Kelvin K. W. Yau
Background: The mortality rate in patients with haemodynamically unstable pelvic fractures is as high as 40–60%. In recent years, angioembolisation and pelvic packing have been introduced as part of a multimodality treatment for these patients. Protocol-driven management has been shown to improve outcomes. Patients and methods: This is a Level III retrospective cohort study of patients suffering from unstable pelvic fractures from 1 January 1996 to 30 September 2011. The aim of the study was to review our results, particularly in terms of mortality through the evolution of three phases of treatment protocols: preangiography, angiography and pelvic packing. Results: The overall 30-day mortality rate for all patients was 47.2%, with a rate of 63.5% in the preangiography phase, 42.1% in the angiography phase and 30.6% in the pelvic packing phase. Multivariate logistic regression analysis identified the use of retroperitoneal packing as a significant independent predictive factor for 24 h mortality. Conclusions: Our results showed an improvement in patient survival with sequential protocols over the study period, during which we incorporated a multidisciplinary approach to managing these complicated pelvic fractures. The results strongly suggest that retroperitoneal packing should be highly recommended for bleeding subsequent to pelvic fracture, in addition to other modalities of treatment.