Kam-Wah Tsui
University of Wisconsin-Madison
Publication
Featured research published by Kam-Wah Tsui.
Journal of Computational Biology | 2001
Michael A. Newton; Christina Kendziorski; Craig Richmond; Frederick R. Blattner; Kam-Wah Tsui
We consider the problem of inferring fold changes in gene expression from cDNA microarray data. Standard procedures focus on the ratio of measured fluorescent intensities at each spot on the microarray, but to do so is to ignore the fact that the variation of such ratios is not constant. Estimates of gene expression changes are derived within a simple hierarchical model that accounts for measurement error and fluctuations in absolute gene expression levels. Significant gene expression changes are identified by deriving the posterior odds of change within a similar model. The methods are tested via simulation and are applied to a panel of Escherichia coli microarrays.
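The abstract's key observation is that the variation of spot-level intensity ratios is not constant. A toy simulation (our own, not the paper's hierarchical model) makes this concrete: with count-like measurement noise, the ratio of the two channels fluctuates far more for weakly expressed genes than for strongly expressed ones, even when the true fold change is 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: true expression levels from low to high intensity,
# with Poisson measurement noise on each channel; the true fold change
# is 1 everywhere, so all variation in the ratio is noise.
mu = np.logspace(1.5, 4, 400)
red = rng.poisson(mu).astype(float)
green = rng.poisson(mu).astype(float)
log_ratio = np.log(red / green)        # true log fold change is 0 everywhere

low_sd = log_ratio[mu < 100].std()
high_sd = log_ratio[mu > 1000].std()
print(low_sd > high_sd)                # True: ratio noise shrinks with intensity
```

This intensity-dependent noise is exactly what motivates shrinking ratio-based fold-change estimates within a hierarchical model rather than using the raw ratios.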
Journal of the American Statistical Association | 1989
Kam-Wah Tsui; Samaradasa Weerahandi
Abstract This article examines some problems of significance testing for one-sided hypotheses of the form H_0: θ ≤ θ_0 versus H_1: θ > θ_0, where θ is the parameter of interest. In the usual setting, let x be the observed data and let T(X) be a test statistic such that the family of distributions of T(X) is stochastically increasing in θ. Define C_x as {X : T(X) − T(x) ≥ 0}. The p value is p(x) = sup_{θ ≤ θ_0} Pr(X ∈ C_x | θ). In the presence of a nuisance parameter η, there may not exist a nontrivial C_x with a p value independent of η. We consider tests based on generalized extreme regions of the form C_x(θ, η) = {X : T(X; x, θ, η) ≥ T(x; x, θ, η)}, and conditions on T(X; x, θ, η) are given such that the p value p(x) = sup_{θ ≤ θ_0} Pr(X ∈ C_x(θ, η)) is free of the nuisance parameter η, where T is stochastically increasing in θ. We provide a solution to the problem of testing hypotheses about the differences in means of two independent exponential distributions, a problem for which the fixed-level testing approach...
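In the nuisance-free setting the abstract describes, the supremum in p(x) = sup_{θ ≤ θ_0} Pr(T(X) ≥ T(x) | θ) is attained at the boundary θ_0 when T(X) is stochastically increasing in θ. A small Monte Carlo sketch (our own illustration, not the paper's exponential-differences example) shows the computation for testing the mean θ of an exponential sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Test H_0: theta <= theta0 against H_1: theta > theta0 for the mean
# theta of an exponential sample. T(X) = sum of the observations is
# stochastically increasing in theta, so the sup over theta <= theta0
# is attained at theta0 and the p value is Pr(T(X) >= T(x) | theta0).
theta0 = 1.0
x = rng.exponential(1.5, size=20)      # data drawn with true theta = 1.5
t_obs = x.sum()

# Monte Carlo evaluation of Pr(T(X) >= t_obs | theta0).
t_null = rng.exponential(theta0, size=(100_000, 20)).sum(axis=1)
p_value = (t_null >= t_obs).mean()
print(round(p_value, 3))
```

The paper's generalized extreme regions extend this recipe to cases where a nuisance parameter η would otherwise leave the probability depending on η.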
Journal of the American Statistical Association | 1996
Tom Y. M. Chiu; Tom Leonard; Kam-Wah Tsui
Abstract A flexible method is introduced to model the structure of a covariance matrix C and study the dependence of the covariances on explanatory variables by observing that for any real symmetric matrix A, the matrix exponential transformation, C = exp (A), is a positive definite matrix. Because there is no constraint on the possible values of the upper triangular elements on A, any possible structure of interest can be imposed on them. The method presented here is not intended to replace the existing special models available for a covariance matrix, but rather to provide a broad range of further structures that supplements existing methodology. Maximum likelihood estimation procedures are used to estimate the parameters, and the large-sample asymptotic properties are obtained. A simulation study and two real-life examples are given to illustrate the method introduced.
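The key fact the abstract relies on can be verified in a few lines: for any real symmetric A, C = exp(A) is positive definite, so the upper-triangular entries of A are unconstrained modelling parameters. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any real symmetric matrix A, with no constraint on its entries.
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2

# Matrix exponential via the eigendecomposition A = V diag(w) V'.
w, V = np.linalg.eigh(A)
C = V @ np.diag(np.exp(w)) @ V.T       # C = exp(A)

eigvals_C = np.linalg.eigvalsh(C)
print(np.all(eigvals_C > 0))           # True: C is positive definite

# A is recovered as the matrix logarithm of C, so structure imposed on A
# translates back and forth between the two scales.
w2, V2 = np.linalg.eigh(C)
A_back = V2 @ np.diag(np.log(w2)) @ V2.T
print(np.allclose(A, A_back))          # True
```

Because the map is one-to-one on symmetric matrices, regression structure placed on the elements of A induces a valid covariance matrix C automatically.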
PLOS Genetics | 2005
Qiaoning Guan; Wei Zheng; Shijie Tang; Xiaosong Liu; Robert A. Zinkel; Kam-Wah Tsui; Brian S. Yandell; Michael R. Culbertson
Nonsense-mediated mRNA decay (NMD) is a eukaryotic mechanism of RNA surveillance that selectively eliminates aberrant transcripts coding for potentially deleterious proteins. NMD also functions in the normal repertoire of gene expression. In Saccharomyces cerevisiae, hundreds of endogenous RNA Polymerase II transcripts achieve steady-state levels that depend on NMD. For some, the decay rate is directly influenced by NMD (direct targets). For others, abundance is NMD-sensitive but without any effect on the decay rate (indirect targets). To distinguish between direct and indirect targets, total RNA from wild-type (Nmd+) and mutant (Nmd−) strains was probed with high-density arrays across a 1-h time window following transcription inhibition. Statistical models were developed to describe the kinetics of RNA decay. 45% ± 5% of RNAs targeted by NMD were predicted to be direct targets with altered decay rates in Nmd− strains. Parallel experiments using conventional methods were conducted to empirically test predictions from the global experiment. The results show that the global assay reliably distinguished direct versus indirect targets. Different types of targets were investigated, including transcripts containing adjacent, disabled open reading frames, upstream open reading frames, and those prone to out-of-frame initiation of translation. Known targeting mechanisms fail to account for all of the direct targets of NMD, suggesting that additional targeting mechanisms remain to be elucidated. 30% of the protein-coding targets of NMD fell into two broadly defined functional themes: those affecting chromosome structure and behavior and those affecting cell surface dynamics. Overall, the results provide a preview for how expression profiles in multi-cellular eukaryotes might be impacted by NMD. 
Furthermore, the methods for analyzing decay rates on a global scale offer a blueprint for new ways to study mRNA decay pathways in any organism where cultured cell lines are available.
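The kinetic idea behind the decay-rate comparison can be sketched in a hedged toy form (the rate, noise level, and time points below are illustrative, not the study's): after transcription shut-off an mRNA decays as m(t) = m0·exp(−kt), so log-abundance is linear in time and the decay rate k (hence the half-life) falls out of a regression across the time window.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed per-minute decay rate and sampling times after shut-off.
k_true = 0.05
t = np.array([0., 10., 20., 30., 45., 60.])
m = 1000 * np.exp(-k_true * t) * rng.lognormal(0, 0.05, size=t.size)

# Linear fit on the log scale recovers the decay rate.
slope, intercept = np.polyfit(t, np.log(m), 1)
k_hat = -slope
half_life = np.log(2) / k_hat
print(round(k_hat, 3), round(half_life, 1))
```

Comparing k fitted in Nmd+ versus Nmd− strains is what distinguishes direct targets (altered decay rate) from indirect ones (altered abundance only).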
Journal of the American Statistical Association | 1989
Tom Leonard; John S. J. Hsu; Kam-Wah Tsui
Abstract A method is proposed for approximating the marginal posterior density of a continuous function of several unknown parameters, thus permitting inferences about any parameter of interest for nonlinear models when the sample size is finite. Possibly tedious numerical integrations are replaced by conditional maximizations, which are shown to be quite accurate in a number of special cases. There are similarities with the profile likelihood ideas originated by Kalbfleisch and Sprott (1970), and the method is contrasted with a Laplacian approximation recommended by Kass, Tierney, and Kadane (1988, in press), referred to here as the “KTK procedure.” The methods are used to approximate the marginal posterior densities of the log-linear interaction effects and an overall measure of association in a two-way contingency table. Snee's (1974) hair/eye color data are reanalyzed, and adjustments are proposed to Goodman's (1964) analysis for the full-rank interaction model. Another application concerns marginaliz...
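The "conditional maximization instead of integration" idea can be illustrated with a toy example of our own (not the paper's contingency-table application): to approximate the marginal posterior of a normal mean μ, maximize the joint posterior over σ at each fixed μ rather than integrating σ out numerically.

```python
import numpy as np

rng = np.random.default_rng(5)

# With a flat prior on (mu, sigma), the joint posterior is proportional to
# sigma^(-n) exp(-sum((y - mu)^2) / (2 sigma^2)); at fixed mu it is
# maximized by sigma_hat(mu)^2 = mean((y - mu)^2). Plugging that back in
# gives the conditional-maximization (profile) approximation over a grid.
y = rng.normal(2.0, 1.0, size=30)
n = y.size
mu_grid = np.linspace(y.mean() - 2, y.mean() + 2, 401)

s2 = np.array([np.mean((y - m) ** 2) for m in mu_grid])
log_post = -0.5 * n * np.log(s2) - 0.5 * n   # up to an additive constant

dens = np.exp(log_post - log_post.max())
dens /= dens.sum()                           # normalize on the grid
mode = mu_grid[dens.argmax()]
print(round(mode, 2))                        # mode sits at the sample mean
```

The maximization step is a cheap closed form here; the paper's point is that the same device remains accurate in settings where the exact integral is tedious.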
Biometrics | 2011
Yuan Ji; Yanxun Xu; Qiong Zhang; Kam-Wah Tsui; Yuan Yuan; Clift Norris; Shoudan Liang; Han Liang
Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a downloadable C++ program that is being packaged into user-friendly software.
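A toy sketch of the idea (the model and numbers below are illustrative, not the paper's exact formulation): a multiread aligns to several candidate loci, and the posterior probability of each location combines the locus expression level, acting as a prior, with a mismatch-based alignment likelihood.

```python
import numpy as np

# One multiread with three candidate locations.
read_len = 50
mismatches = np.array([0, 1, 1])       # mismatches at each candidate locus
theta = np.array([0.2, 0.7, 0.1])      # relative expression of each locus (assumed)
eps = 0.01                             # assumed per-base sequencing error rate

# Likelihood of the observed read given each location, then Bayes' rule.
lik = eps**mismatches * (1 - eps)**(read_len - mismatches)
post = theta * lik
post /= post.sum()
print(np.round(post, 3))
```

Even though locus 1 is the most highly expressed, the perfect alignment at locus 0 dominates the posterior; reads are then allocated fractionally by these probabilities instead of being discarded.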
Journal of Computational and Graphical Statistics | 2013
Xinwei Deng; Kam-Wah Tsui
For statistical inferences that involve covariance matrices, it is desirable to obtain an accurate covariance matrix estimate with a well-structured eigen-system. We propose to estimate the covariance matrix through its matrix logarithm based on an approximate log-likelihood function. We develop a generalization of the Leonard and Hsu log-likelihood approximation that no longer requires a nonsingular sample covariance matrix. The matrix log-transformation provides the ability to impose a convex penalty on the transformed likelihood such that the largest and smallest eigenvalues of the covariance matrix estimate can be regularized simultaneously. The proposed method transforms the problem of estimating the covariance matrix into the problem of estimating a symmetric matrix, which can be solved efficiently by an iterative quadratic programming algorithm. The merits of the proposed method are illustrated by a simulation study and two real applications in classification and portfolio optimization. Supplementary materials for this article are available online.
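Why the log scale helps can be shown with a simplified stand-in (not the paper's penalized estimator): shrinking the log-eigenvalues of a sample covariance toward their mean pulls the largest eigenvalue down and the smallest up at the same time, improving the conditioning of the estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

# Noisy sample covariance with dimension close to the sample size.
p, n = 10, 15
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)

# Shrink on the matrix-logarithm scale: contract the log-eigenvalues
# toward their mean, then map back. alpha is an assumed shrinkage weight.
w, V = np.linalg.eigh(S)
log_w = np.log(w)
alpha = 0.5
log_w_shrunk = (1 - alpha) * log_w + alpha * log_w.mean()
C_hat = V @ np.diag(np.exp(log_w_shrunk)) @ V.T

print(np.linalg.cond(C_hat) < np.linalg.cond(S))   # True: better conditioned
```

The paper replaces this crude uniform contraction with a convex penalty on the transformed likelihood, but the mechanism, regularizing both eigenvalue extremes through the matrix logarithm, is the same.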
Annals of the Institute of Statistical Mathematics | 1990
Mohamed Madi; Kam-Wah Tsui
We consider the estimation of the ratio of the scale parameters of two independent two-parameter exponential distributions with unknown location parameters. It is shown that the best affine equivariant estimator (BAEE) is inadmissible under any loss function from a large class of bowl-shaped loss functions. Two new classes of improved estimators are obtained. Some values of the risk functions of the BAEE and two improved estimators are evaluated for two particular loss functions. Our results are parallel to those of Zidek (1973, Ann. Statist., 1, 264–278), who derived a class of estimators that dominate the BAEE of the scale parameter of a two-parameter exponential distribution.
Journal of the American Statistical Association | 1984
Kam-Wah Tsui
Abstract For estimating the means of several independent Poisson distributions, Clevenson and Zidek (1975) were the first to propose a class of estimators that are better than the usual one under normalized squared error loss. This class of estimators was subsequently enlarged by others. This article examines the sensitivity of the superiority of the Clevenson-Zidek (CZ)-type means estimators over the usual one with respect to the exactness of the Poisson distribution assumption. It is shown that many of the CZ-type means estimators dominate the usual estimator even when the underlying distributions are negative binomial, whether or not they are close to the Poisson. This broadens the scope of use of CZ-type estimators.
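The domination result is easy to see by simulation. The sketch below uses one standard member of the Clevenson-Zidek class, δ_i(X) = (1 − (p−1)/(ΣX_j + p − 1)) X_i, and compares its risk to the usual estimator X under normalized squared error loss; the true means are an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)

# p independent Poisson means, estimated jointly under the loss
# L(lambda, d) = sum_i (d_i - lambda_i)^2 / lambda_i.
p = 5
lam = np.ones(p)                       # true means (illustrative)
reps = 50_000
X = rng.poisson(lam, size=(reps, p)).astype(float)

# One member of the Clevenson-Zidek class: shrink all coordinates
# toward zero by a factor depending on the total count Z.
Z = X.sum(axis=1, keepdims=True)
cz = (1 - (p - 1) / (Z + p - 1)) * X

risk = lambda d: (((d - lam) ** 2) / lam).sum(axis=1).mean()
print(round(risk(X), 3), round(risk(cz), 3))   # CZ risk is smaller
```

The usual estimator's risk is exactly p under this loss, while the shrinkage estimator does strictly better for every λ; the article's contribution is that this advantage survives negative binomial departures from the Poisson assumption.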
Journal of the American Statistical Association | 2008
Kjell A. Doksum; Shijie Tang; Kam-Wah Tsui
We consider regression experiments involving a response variable Y and a large number of predictor variables X1, …, Xd, many of which may be irrelevant for the prediction of Y and thus must be removed before Y can be predicted from the X’s. We consider two procedures that select variables by using importance scores that measure the strength of the relationship between predictor variables and a response and keep those variables whose importance scores exceed a threshold. In the first of these procedures, scores are obtained by randomly drawing subregions (tubes) of the predictor space that constrain all but one predictor and in each subregion computing a signal-to-noise ratio (efficacy) based on a nonparametric univariate regression of Y on the unconstrained variable. The subregions are adapted to boost weak variables iteratively by searching (hunting) for the subregions in which the efficacy is maximized. The efficacy can be viewed as an approximation to a one-to-one function of the probability of identifying features. By using importance scores based on averages of maximized efficacies, we develop a variable selection algorithm called EARTH (efficacy adaptive regression tube hunting) based on examining the conditional expectation of the response given all but one of the predictor variables for a collection of randomly, adaptively, and iteratively selected regions. The second importance score method (RFVS) is based on using random forest importance values to select variables. Computer simulations show that EARTH and RFVS are successful variable selection methods compared with other procedures in nonparametric situations with a large number of irrelevant predictor variables, and that when each is combined with the model selection and prediction procedure MARS, the tree-based prediction procedure GUIDE, and the random forest method, the combinations lead to improved prediction accuracy for certain models with many irrelevant variables.
We give conditions under which a version of the EARTH algorithm selects the correct model with probability tending to 1 as the sample size n tends to infinity even if d → ∞ as n → ∞. We include the analysis of a real data set in which we show how a training set can be used to find a threshold for the EARTH importance scores.
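The score-then-threshold workflow can be sketched with a much-simplified stand-in (not EARTH itself): score each predictor with a crude squared-correlation proxy for efficacy, estimate a threshold from known-irrelevant noise columns, and keep predictors whose scores exceed it.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data: only columns 0 and 3 carry signal.
n, d = 500, 20
X = rng.normal(size=(n, d))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

# Squared correlation with the response as a crude importance score.
scores = np.array([np.corrcoef(X[:, j], y)[0, 1] ** 2 for j in range(d)])

# Threshold from pure-noise reference columns, mimicking the idea of
# calibrating the cutoff against variables known to be irrelevant.
noise = rng.normal(size=(n, d))
null_scores = np.array([np.corrcoef(noise[:, j], y)[0, 1] ** 2 for j in range(d)])
keep = np.where(scores > null_scores.max())[0]
print(keep)                            # columns 0 and 3 should be among those kept
```

A single noise replication gives a loose cutoff (an occasional irrelevant column may slip through), which is why the article calibrates the EARTH threshold on a training set instead.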