Publication


Featured research published by Wenguang Sun.


Journal of the American Statistical Association | 2007

Oracle and Adaptive Compound Decision Rules for False Discovery Rate Control

Wenguang Sun; T. Tony Cai

We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p value–based, are inefficient, and propose an adaptive procedure based on the z values. The z value–based adaptive procedure asymptotically attains the performance of the z value oracle procedure and is more efficient than the conventional p value–based methods. We investigate the numerical performance of the adaptive procedure using both simulated and real data. In particular, we demonstrate our method in an analysis of the microarray data from a human immunodeficiency virus study that involves testing a large number of hypotheses simultaneously.
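Viewed operationally, the adaptive z value procedure ranks hypotheses by their estimated local false discovery rate (Lfdr) and rejects the most significant ones whose running average stays below the FDR level. A minimal sketch of that thresholding step, assuming the Lfdr values have already been estimated from the z values (the estimation step itself is not shown):

    import numpy as np

    def lfdr_stepup(lfdr, alpha=0.10):
        """Reject the hypotheses with the smallest estimated Lfdr values whose
        running average does not exceed alpha (the nominal FDR level)."""
        lfdr = np.asarray(lfdr, dtype=float)
        order = np.argsort(lfdr)                               # rank by Lfdr, most significant first
        running_avg = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
        below = np.nonzero(running_avg <= alpha)[0]
        k = below[-1] + 1 if below.size else 0                 # size of the largest admissible rejection set
        reject = np.zeros(lfdr.size, dtype=bool)
        reject[order[:k]] = True
        return reject

    # Hypothetical usage: lfdr_hat estimated from the z values, e.g., by a two-group mixture fit
    # reject = lfdr_stepup(lfdr_hat, alpha=0.05)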


Statistics in Medicine | 2009

A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data.

Justine Shults; Wenguang Sun; Xin Tu; Hanjoo Kim; Jay D. Amsterdam; Joseph Hilbe; Thomas TenHave

The method of generalized estimating equations (GEE) models the association between the repeated observations on a subject with a patterned correlation matrix. Correct specification of the underlying structure is a potentially beneficial goal, in terms of improving efficiency and enhancing scientific understanding. We consider two sets of criteria that have previously been suggested, respectively, for selecting an appropriate working correlation structure and for ruling out particular structures, in the GEE analysis of longitudinal studies with binary outcomes. The first selection criterion chooses the structure for which the model-based and the sandwich-based estimators of the covariance matrix of the regression parameter estimator are closest, while the second selection criterion chooses the structure that minimizes the weighted error sum of squares. The rule-out criterion deselects structures for which the estimated correlation parameter violates standard constraints for binary data that depend on the marginal means. In addition, we remove structures from consideration if their estimated parameter values yield an estimated correlation structure that is not positive definite. We investigate the performance of the two sets of criteria using both simulated and real data, in the context of a longitudinal trial that compares two treatments for major depressive episode. Practical recommendations are also given on using these criteria to aid in the efficient selection of a working correlation structure in GEE analysis of longitudinal binary data.
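The first selection criterion can be read as: fit the GEE model under each candidate working structure, then pick the structure whose model-based (naive) and sandwich (robust) covariance estimates of the regression coefficients agree most closely. A minimal sketch, assuming those two matrices are already available from the GEE output for each candidate; the scaled Frobenius-norm discrepancy is one reasonable closeness measure, not necessarily the exact statistic used in the paper:

    import numpy as np

    def covariance_discrepancy(model_based, sandwich):
        """Relative Frobenius-norm difference between the model-based and sandwich
        covariance estimates of the regression coefficient estimator."""
        diff = np.linalg.norm(np.asarray(model_based) - np.asarray(sandwich), ord="fro")
        return diff / np.linalg.norm(np.asarray(sandwich), ord="fro")

    def select_structure(fits):
        """`fits` maps a candidate structure name (e.g., 'independence', 'exchangeable',
        'AR(1)') to its (model_based, sandwich) covariance pair from a GEE fit."""
        return min(fits, key=lambda name: covariance_discrepancy(*fits[name]))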


Journal of the American Statistical Association | 2009

Simultaneous Testing of Grouped Hypotheses: Finding Needles in Multiple Haystacks

T. Tony Cai; Wenguang Sun

In large-scale multiple testing problems, data are often collected from heterogeneous sources and hypotheses form groups that exhibit different characteristics. Conventional approaches, including the pooled and separate analyses, fail to efficiently utilize the external grouping information. We develop a compound decision theoretic framework for testing grouped hypotheses and introduce an oracle procedure that minimizes the false nondiscovery rate subject to a constraint on the false discovery rate. It is shown that both the pooled and separate analyses can be uniformly improved by the oracle procedure. We then propose a data-driven procedure that is shown to be asymptotically optimal. Simulation studies show that our procedures enjoy superior performance and yield the most accurate results in comparison with both the pooled and separate procedures. A real-data example with grouped hypotheses is studied in detail using different methods. Both theoretical and numerical results demonstrate that exploiting external information about the sample can greatly improve the efficiency of a multiple testing procedure. The results also provide insight into how the grouping information is incorporated for optimal simultaneous inference.
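The key step the abstract describes is that the grouping enters through group-specific Lfdr estimates, which are then pooled and thresholded by a single global rule rather than thresholded separately within each group. A rough sketch, assuming the within-group Lfdr values have already been estimated (group-specific null proportions and densities are not shown):

    import numpy as np

    def grouped_lfdr_stepup(lfdr_by_group, alpha=0.10):
        """Pool Lfdr values estimated separately within each group and apply one
        global threshold: reject the hypotheses with the smallest pooled Lfdr
        values whose running average does not exceed alpha."""
        keys, lfdr = [], []
        for group, values in lfdr_by_group.items():
            for i, v in enumerate(np.asarray(values, dtype=float)):
                keys.append((group, i))
                lfdr.append(v)
        lfdr = np.asarray(lfdr)
        order = np.argsort(lfdr)
        running_avg = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
        below = np.nonzero(running_avg <= alpha)[0]
        k = below[-1] + 1 if below.size else 0
        return [keys[j] for j in order[:k]]                    # (group, within-group index) of rejections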


Journal of the American Statistical Association | 2011

Multiple Testing for Pattern Identification, With Applications to Microarray Time-Course Experiments

Wenguang Sun; Zhi Wei

In time-course experiments, it is often desirable to identify genes that exhibit a specific pattern of differential expression over time and thus gain insights into the mechanisms of the underlying biological processes. Two challenging issues in the pattern identification problem are: (i) how to combine the simultaneous inferences across multiple time points and (ii) how to control the multiplicity while accounting for the strong dependence. We formulate a compound decision-theoretic framework for set-wise multiple testing and propose a data-driven procedure that aims to minimize the missed set rate subject to a constraint on the false set rate. The hidden Markov model proposed in Yuan and Kendziorski (2006) is generalized to capture the temporal correlation in the gene expression data. Both theoretical and numerical results are presented to show that our data-driven procedure controls the multiplicity, provides an optimal way of combining simultaneous inferences across multiple time points, and greatly improves the conventional combined p-value methods. In particular, we demonstrate our method in an application to a study of systemic inflammation in humans for detecting early and late response genes.
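As a concrete illustration of the set-wise inference, the quantity thresholded for each gene is the posterior probability that its hidden state sequence matches the pattern of interest, computed under the fitted HMM. A simplified sketch with a two-state Gaussian-emission HMM whose parameters are assumed to be already estimated (the paper's model and its fitting are richer than this):

    import numpy as np
    from scipy.stats import norm

    def pattern_posterior(x, pattern, pi, A, means, sds):
        """Posterior probability that one gene's hidden state sequence equals `pattern`
        (e.g., differentially expressed at early time points only), given its observed
        summaries x under a two-state Gaussian HMM with initial distribution pi,
        transition matrix A, and emission parameters (means, sds)."""
        x = np.asarray(x, dtype=float)
        pattern = np.asarray(pattern, dtype=int)
        pi = np.asarray(pi, dtype=float)
        A = np.asarray(A, dtype=float)
        emit = norm.pdf(x[:, None], loc=means, scale=sds)      # T x 2 emission likelihoods
        # Joint likelihood of the data along the specified state path
        path = pi[pattern[0]] * emit[0, pattern[0]]
        for t in range(1, x.size):
            path *= A[pattern[t - 1], pattern[t]] * emit[t, pattern[t]]
        # Marginal likelihood P(x) via the forward recursion
        forward = pi * emit[0]
        for t in range(1, x.size):
            forward = (forward @ A) * emit[t]
        return path / forward.sum()

    # One minus this posterior acts as the set-wise analogue of the local fdr and can be
    # thresholded across genes with the same running-average rule used for single tests.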


Journal of the American Statistical Association | 2012

Multiple Testing of Composite Null Hypotheses in Heteroscedastic Models

Wenguang Sun; Alexander C. McLain

In large-scale studies, the true effect sizes often range continuously from zero to small to large, and are observed with heteroscedastic errors. In practical situations where the failure to reject small deviations from the null is inconsequential, specifying an indifference region (or forming composite null hypotheses) can greatly reduce the number of unimportant discoveries in multiple testing. The heteroscedasticity issue poses new challenges for multiple testing with composite nulls. In particular, the conventional framework in multiple testing, which involves rescaling or standardization, is likely to distort the scientific question. We propose the concept of a composite null distribution for heteroscedastic models and develop an optimal testing procedure that minimizes the false nondiscovery rate, subject to a constraint on the false discovery rate. The proposed approach is different from conventional methods in that the effect size, statistical significance, and multiplicity issues are addressed integrally. External information on the heteroscedastic errors is incorporated for optimal simultaneous inference. The new features and advantages of our approach are demonstrated using both simulated and real data. The numerical studies demonstrate that our new procedure enjoys superior performance with greater accuracy and better interpretability of results.
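To make the role of the heteroscedastic errors concrete: for each test the relevant quantity is the posterior probability that the true effect lies inside the indifference region, which depends on both the observation and its own standard error. A simplified sketch that replaces the paper's estimated effect-size distribution with a hypothetical N(0, tau^2) prior; delta (the indifference region half-width) and tau are placeholder inputs:

    import numpy as np
    from scipy.stats import norm

    def composite_null_probability(x, sigma, delta, tau):
        """Posterior probability that |mu_i| <= delta, given x_i ~ N(mu_i, sigma_i^2) with
        known, possibly unequal sigma_i and an assumed N(0, tau^2) prior on mu_i."""
        x = np.asarray(x, dtype=float)
        sigma = np.asarray(sigma, dtype=float)
        shrink = tau**2 / (tau**2 + sigma**2)                  # posterior shrinkage factor
        post_mean = shrink * x
        post_sd = np.sqrt(shrink) * sigma                      # posterior sd of mu_i given x_i
        return norm.cdf((delta - post_mean) / post_sd) - norm.cdf((-delta - post_mean) / post_sd)

    # These probabilities play the role of the local fdr for the composite null and can be
    # thresholded with the same running-average rule used for point-null testing.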


Biometrical Journal | 2009

A Note on the Use of Unbiased Estimating Equations to Estimate Correlation in Analysis of Longitudinal Trials

Wenguang Sun; Justine Shults; Mary B. Leonard

Longitudinal trials can yield outcomes that are continuous, binary (yes/no), or are realizations of counts. In this setting we compare three approaches that have been proposed for estimation of the correlation in the framework of generalized estimating equations (GEE): quasi-least squares (QLS), pseudo-likelihood (PL), and an approach we refer to as Wang-Carey (WC). We prove that WC and QLS are identical for the first-order autoregressive AR(1) correlation structure. Using simulations, we then develop guidelines for selection of an appropriate method for analysis of data from a longitudinal trial. In particular, we argue that no method is uniformly superior for analysis of unbalanced and unequally spaced data with a Markov correlation structure. Choice of the best approach will depend on the degree of imbalance and variability in the temporal spacing of measurements, value of the correlation, and type of outcome, e.g. binary or continuous. Finally, we contrast the methods in analysis of a longitudinal study of obesity following renal transplantation in children.


Journal of the American Statistical Association | 2018

Weighted False Discovery Rate Control in Large-Scale Multiple Testing

Pallavi Basu; T. Tony Cai; Kiranmoy Das; Wenguang Sun

The use of weights provides an effective strategy to incorporate prior domain knowledge in large-scale inference. This article studies weighted multiple testing in a decision-theoretic framework. We develop oracle and data-driven procedures that aim to maximize the expected number of true positives subject to a constraint on the weighted false discovery rate. The asymptotic validity and optimality of the proposed methods are established. The results demonstrate that incorporating informative domain knowledge enhances the interpretability of results and the precision of inference. Simulation studies show that the proposed method controls the error rate at the nominal level, and the gain in power over existing methods is substantial in many settings. An application to a genome-wide association study is discussed.
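In estimated form, the weighted FDR of a rejection set is a weight-weighted average of the Lfdr values of the rejected hypotheses. A minimal sketch that ranks hypotheses by Lfdr and keeps the largest nested set whose estimated weighted FDR stays below the level; this ranking is a simplification, since the paper's oracle procedure also lets the weights influence the ordering:

    import numpy as np

    def weighted_fdr_stepup(lfdr, weights, alpha=0.10):
        """Reject the hypotheses with the smallest Lfdr values such that the estimated
        weighted FDR, sum(w_i * Lfdr_i) / sum(w_i) over the rejected set, is at most alpha."""
        lfdr = np.asarray(lfdr, dtype=float)
        weights = np.asarray(weights, dtype=float)
        order = np.argsort(lfdr)
        w = weights[order]
        wfdr_est = np.cumsum(w * lfdr[order]) / np.cumsum(w)   # estimated weighted FDR of nested sets
        below = np.nonzero(wfdr_est <= alpha)[0]
        k = below[-1] + 1 if below.size else 0
        reject = np.zeros(lfdr.size, dtype=bool)
        reject[order[:k]] = True
        return reject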


Biometrics | 2010

Design and Analysis of Multiple Events Case-Control Studies

Wenguang Sun; Marshall M. Joffe; Jinbo Chen; Steven M. Brunelli

In case-control research where there are multiple case groups, standard analyses fail to make use of all available information. Multiple events case-control (MECC) studies provide a new approach to sampling from a cohort and are useful when it is desired to study multiple types of events in the cohort. In this design, subjects in the cohort who develop any event of interest are sampled, as well as a fraction of the remaining subjects. We show that a simple case-control analysis of data arising from MECC studies is biased and develop three general estimating-equation-based approaches to analyzing data from these studies. We conduct simulation studies to compare the efficiency of the various MECC analyses with each other and with the corresponding conventional analyses. It is shown that the gain in efficiency by using the new design is substantial in many situations. We demonstrate the application of our approach to a nested case-control study of the effect of oral sodium phosphate use on chronic kidney injury with multiple case definitions.
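The sampling scheme itself is straightforward to express: take every cohort member who develops any of the events of interest, plus a random fraction of the event-free members. A small sketch with hypothetical column names and sampling fraction:

    import pandas as pd

    def mecc_sample(cohort, event_cols, control_fraction=0.10, seed=0):
        """Multiple events case-control sampling: keep every subject with any event of
        interest plus a random fraction of the remaining (event-free) subjects.
        `cohort` has one row per subject; `event_cols` are 0/1 event indicators."""
        any_event = cohort[event_cols].any(axis=1)
        cases = cohort[any_event]
        controls = cohort[~any_event].sample(frac=control_fraction, random_state=seed)
        return pd.concat([cases, controls])

    # Hypothetical usage: mecc_sample(cohort_df, ["aki_event", "ckd_event"], control_fraction=0.05)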


Journal of the Royal Statistical Society, Series B (Statistical Methodology) | 2009

Large‐scale multiple testing under dependence

Wenguang Sun; T. Tony Cai


American Journal of Epidemiology | 2007

On the Estimation and Use of Propensity Scores in Case-Control and Case-Cohort Studies

Roger Månsson; Marshall M. Joffe; Wenguang Sun; Sean Hennessy

Collaboration


Dive into Wenguang Sun's collaborations.

Top Co-Authors

T. Tony Cai, University of Pennsylvania
Justine Shults, University of Pennsylvania
Zhi Wei, New Jersey Institute of Technology
Xin Tu, University of Liverpool
Jay D. Amsterdam, University of Pennsylvania
Marshall M. Joffe, University of Pennsylvania
Alexander C. McLain, University of South Carolina
Armin Schwartzman, North Carolina State University
Brian J. Reich, North Carolina State University