
Publication


Featured research published by Peng Ding.


arXiv: Statistics Theory | 2015

To Adjust or Not to Adjust? Sensitivity Analysis of M-Bias and Butterfly-Bias

Peng Ding; Luke Miratrix

Abstract “M-Bias,” as it is called in the epidemiologic literature, is the bias introduced by conditioning on a pretreatment covariate due to a particular “M-Structure” between two latent factors, an observed treatment, an outcome, and a “collider.” This potential source of bias, which can occur even when the treatment and the outcome are not confounded, has been a source of considerable controversy. We here present formulae for identifying under which circumstances biases are inflated or reduced. In particular, we show that the magnitude of M-Bias in linear structural equation models tends to be relatively small compared to confounding bias, suggesting that it is generally not a serious concern in many applied settings. These theoretical results are consistent with recent empirical findings from simulation studies. We also generalize the M-Bias setting (1) to allow for the correlation between the latent factors to be nonzero and (2) to allow for the collider to be a confounder between the treatment and the outcome. These results demonstrate that mild deviations from the M-Structure tend to increase confounding bias more rapidly than M-Bias, suggesting that choosing to condition on any given covariate is generally the superior choice. As an application, we re-examine a controversial example between Professors Donald Rubin and Judea Pearl.
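The M-structure described above can be illustrated with a small simulation (a sketch, not the paper's code): two independent latent factors U1 and U2, where U1 drives the treatment T, U2 drives the outcome Y, and both drive the collider M. The true effect of T on Y is zero, so the unadjusted estimate is unbiased, while conditioning on M induces a small bias. The coefficient values below are illustrative choices.

```python
import numpy as np

def m_bias_demo(n=200_000, a=0.5, b=0.5, c=0.5, d=0.5, seed=0):
    rng = np.random.default_rng(seed)
    u1 = rng.standard_normal(n)
    u2 = rng.standard_normal(n)
    t = a * u1 + rng.standard_normal(n)           # treatment driven by U1
    m = b * u1 + c * u2 + rng.standard_normal(n)  # collider driven by U1 and U2
    y = d * u2 + rng.standard_normal(n)           # outcome driven by U2; true effect of T is 0

    # Unadjusted: regress Y on T (with intercept)
    X0 = np.column_stack([np.ones(n), t])
    unadjusted = np.linalg.lstsq(X0, y, rcond=None)[0][1]

    # Adjusted: regress Y on T and the collider M
    X1 = np.column_stack([np.ones(n), t, m])
    adjusted = np.linalg.lstsq(X1, y, rcond=None)[0][1]
    return unadjusted, adjusted

unadj, adj = m_bias_demo()
print(unadj, adj)  # unadjusted is near 0; adjusting for M introduces a small negative bias
```

With these coefficients the induced M-bias is roughly -0.03, consistent with the abstract's point that M-bias in linear structural equation models tends to be small relative to typical confounding bias.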


Journal of the American Statistical Association | 2017

General Forms of Finite Population Central Limit Theorems with Applications to Causal Inference

Xinran Li; Peng Ding

ABSTRACT Frequentists’ inference often delivers point estimators associated with confidence intervals or sets for parameters of interest. Constructing the confidence intervals or sets requires understanding the sampling distributions of the point estimators, which, in many but not all cases, are related to asymptotic Normal distributions ensured by central limit theorems. Although previous literature has established various forms of central limit theorems for statistical inference in super population models, we still need general and convenient forms of central limit theorems for some randomization-based causal analyses of experimental data, where the parameters of interest are functions of a finite population and randomness comes solely from the treatment assignment. We use central limit theorems for sample surveys and rank statistics to establish general forms of the finite population central limit theorems that are particularly useful for proving asymptotic distributions of randomization tests under the sharp null hypothesis of zero individual causal effects, and for obtaining the asymptotic repeated sampling distributions of the causal effect estimators. The new central limit theorems hold for general experimental designs with multiple treatment levels, multiple treatment factors and vector outcomes, and are immediately applicable for studying the asymptotic properties of many methods in causal inference, including instrumental variable, regression adjustment, rerandomization, cluster-randomized experiments, and so on. Previously, the asymptotic properties of these problems were often based on heuristic arguments, which in fact rely on general forms of finite population central limit theorems that had not been established before. Our new theorems fill this gap by providing a more solid theoretical foundation for asymptotic randomization-based causal inference. Supplementary materials for this article are available online.
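The finite population view in the abstract can be made concrete with a short simulation (an illustrative sketch under simple assumptions: a completely randomized experiment, fixed potential outcomes, and a constant individual effect chosen for clarity). All randomness comes from the treatment assignment, and the randomization distribution of the difference-in-means estimator is centered at the true average effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n1 = 100, 50
y0 = rng.standard_normal(n)   # fixed potential outcomes under control
y1 = y0 + 1.0                 # constant individual effect of 1 (for illustration)
true_ate = np.mean(y1 - y0)

# Randomization distribution: re-draw the assignment many times,
# keeping the potential outcomes fixed
estimates = []
for _ in range(20_000):
    treated = rng.permutation(n) < n1  # completely randomized: exactly n1 treated
    est = y1[treated].mean() - y0[~treated].mean()
    estimates.append(est)
estimates = np.asarray(estimates)

print(estimates.mean())  # close to 1.0: unbiased over the randomization distribution
```

A histogram of `estimates` would look approximately Normal, which is the kind of behavior the finite population central limit theorems in this paper make rigorous.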


Journal of the American Statistical Association | 2016

Identifiability of Normal and Normal Mixture Models with Nonignorable Missing Data

Wang Miao; Peng Ding; Zhi Geng

ABSTRACT Missing data problems arise in many applied research studies. They may jeopardize statistical inference of the model of interest, if the missing mechanism is nonignorable, that is, the missing mechanism depends on the missing values themselves even conditional on the observed data. With a nonignorable missing mechanism, the model of interest is often not identifiable without imposing further assumptions. We find that even if the missing mechanism has a known parametric form, the model is not identifiable without specifying a parametric outcome distribution. Although it is fundamental for valid statistical inference, identifiability under nonignorable missing mechanisms is not established for many commonly used models. In this article, we first demonstrate identifiability of the normal distribution under monotone missing mechanisms. We then extend it to the normal mixture and t mixture models with nonmonotone missing mechanisms. We discover that models under the Logistic missing mechanism are less identifiable than those under the Probit missing mechanism. We give necessary and sufficient conditions for identifiability of models under the Logistic missing mechanism, which sometimes can be checked in real data analysis. We illustrate our methods using a series of simulations, and apply them to a real-life dataset. Supplementary materials for this article are available online.


Proceedings of the National Academy of Sciences of the United States of America | 2018

Asymptotic theory of rerandomization in treatment–control experiments

Xinran Li; Peng Ding; Donald B. Rubin

Significance Rerandomization refers to experimental designs that enforce covariate balance. This paper studies the asymptotic properties of the difference-in-means estimator under rerandomization, based on the randomness of the treatment assignment without imposing any parametric modeling assumptions on the covariates or outcome. The non-Gaussian asymptotic distribution allows for constructing large-sample confidence intervals for the average treatment effect and demonstrates the advantages of rerandomization over complete randomization. Although complete randomization ensures covariate balance on average, the chance of observing significant differences between treatment and control covariate distributions increases with many covariates. Rerandomization discards randomizations that do not satisfy a predetermined covariate balance criterion, generally resulting in better covariate balance and more precise estimates of causal effects. Previous theory has derived finite sample theory for rerandomization under the assumptions of equal treatment group sizes, Gaussian covariate and outcome distributions, or additive causal effects, but not for the general sampling distribution of the difference-in-means estimator for the average causal effect. We develop asymptotic theory for rerandomization without these assumptions, which reveals a non-Gaussian asymptotic distribution for this estimator, specifically a linear combination of a Gaussian random variable and truncated Gaussian random variables. This distribution follows because rerandomization affects only the projection of potential outcomes onto the covariate space but does not affect the corresponding orthogonal residuals. We demonstrate that, compared with complete randomization, rerandomization reduces the asymptotic quantile ranges of the difference-in-means estimator. Moreover, our work constructs accurate large-sample confidence intervals for the average causal effect.
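A minimal sketch of the rerandomization idea described above: redraw the complete-randomization assignment until the Mahalanobis distance between treated and control covariate means falls below a balance threshold. The threshold value here is an illustrative choice, not the paper's recommendation.

```python
import numpy as np

def rerandomize(x, n1, threshold, rng):
    """Draw assignments until the Mahalanobis distance between the
    treated and control covariate means is at most `threshold`."""
    n, k = x.shape
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    scale = n / (n1 * (n - n1))  # variance factor for the covariate mean difference
    while True:
        treated = rng.permutation(n) < n1
        diff = x[treated].mean(axis=0) - x[~treated].mean(axis=0)
        m = diff @ cov_inv @ diff / scale
        if m <= threshold:
            return treated, m

rng = np.random.default_rng(2)
x = rng.standard_normal((100, 3))  # 3 covariates for 100 units
treated, m = rerandomize(x, n1=50, threshold=2.0, rng=rng)
print(m)  # the accepted assignment satisfies the balance criterion
```

Discarding unbalanced assignments in this way is what produces the truncated Gaussian component in the estimator's asymptotic distribution derived in the paper.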


Statistics in Medicine | 2016

Exact confidence intervals for the average causal effect on a binary outcome

Xinran Li; Peng Ding

Based on the physical randomization of completely randomized experiments, in a recent article in Statistics in Medicine, Rigdon and Hudgens propose two approaches to obtaining exact confidence intervals for the average causal effect on a binary outcome. They construct the first confidence interval by combining, with the Bonferroni adjustment, the prediction sets for treatment effects among treatment and control groups, and the second one by inverting a series of randomization tests. With sample size n, their second approach requires performing O(n⁴) randomization tests. We demonstrate that the physical randomization also justifies other ways of constructing exact confidence intervals that are more computationally efficient. By exploiting recent advances in hypergeometric confidence intervals and the stochastic order information of randomization tests, we propose approaches that either do not need to invoke Monte Carlo or require performing at most O(n²) randomization tests. We provide technical details and R code in the Supporting Information.
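For background, the building block that these interval constructions invert or bound is a randomization test of Fisher's sharp null (zero effect for every unit). Below is a plain Monte Carlo version for a binary outcome, on made-up data; it is a baseline sketch, not one of the paper's more efficient exact procedures.

```python
import numpy as np

def randomization_test(y, treated, draws=10_000, seed=3):
    """Two-sided Monte Carlo p-value for the sharp null of no effect,
    using the difference in proportions as the test statistic."""
    rng = np.random.default_rng(seed)
    n, n1 = len(y), treated.sum()
    observed = y[treated].mean() - y[~treated].mean()
    count = 0
    for _ in range(draws):
        perm = rng.permutation(n) < n1  # re-draw the assignment
        stat = y[perm].mean() - y[~perm].mean()
        if abs(stat) >= abs(observed) - 1e-12:
            count += 1
    return count / draws

# Made-up data: 18/20 successes under treatment vs 9/20 under control
y = np.array([1] * 18 + [0] * 2 + [1] * 9 + [0] * 11)
treated = np.array([True] * 20 + [False] * 20)
p = randomization_test(y, treated)
print(p)  # a small p-value: the sharp null is implausible for these data
```

Inverting such tests over a grid of hypothesized effects is what costs O(n⁴) tests in the approach being improved upon; the paper's methods avoid Monte Carlo entirely or cut the count to O(n²).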


Journal of the American Statistical Association | 2018

Decomposing Treatment Effect Variation

Peng Ding; Avi Feller; Luke Miratrix

ABSTRACT Understanding and characterizing treatment effect variation in randomized experiments has become essential for going beyond the “black box” of the average treatment effect. Nonetheless, traditional statistical approaches often ignore or assume away such variation. In the context of randomized experiments, this article proposes a framework for decomposing overall treatment effect variation into a systematic component explained by observed covariates and a remaining idiosyncratic component. Our framework is fully randomization-based, with estimates of treatment effect variation that are entirely justified by the randomization itself. Our framework can also account for noncompliance, which is an important practical complication. We make several contributions. First, we show that randomization-based estimates of systematic variation are very similar in form to estimates from fully interacted linear regression and two-stage least squares. Second, we use these estimators to develop an omnibus test for systematic treatment effect variation, both with and without noncompliance. Third, we propose an R²-like measure of treatment effect variation explained by covariates and, when applicable, noncompliance. Finally, we assess these methods via simulation studies and apply them to the Head Start Impact Study, a large-scale randomized experiment. Supplementary materials for this article are available online.
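A hedged sketch of the systematic/idiosyncratic split described above: estimate unit-level effects from a fully interacted linear regression of the outcome on treatment, a covariate, and their interaction, then take the variance of the fitted effects as the systematic part. This mirrors the regression form the paper connects to, not its exact randomization-based estimator; the data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.standard_normal(n)
t = rng.integers(0, 2, n).astype(float)
tau = 1.0 + 0.5 * x                      # effect varies systematically with x
y = x + t * tau + rng.standard_normal(n)

# Fully interacted regression: y ~ 1 + t + x + t:x
X = np.column_stack([np.ones(n), t, x, t * x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

fitted_effects = beta[1] + beta[3] * x   # estimated effect as a function of x
systematic_var = fitted_effects.var()
print(systematic_var)  # near 0.25 = var(0.5 * x), the variation explained by x
```

The remaining, idiosyncratic component is what the effect variance leaves unexplained after this projection onto the covariates, and the paper's R²-like measure is the ratio of the systematic part to the total.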


arXiv: Statistics Theory | 2017

Bridging Finite and Super Population Causal Inference

Peng Ding; Xinran Li; Luke Miratrix

Abstract There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite population, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the treatment assignment. These two views differ conceptually and mathematically, resulting in different sampling variances of the usual difference-in-means estimator of the average causal effect. Practically, however, these two views result in identical variance estimators. By recalling a variance decomposition and exploiting a completeness-type argument, we establish a connection between these two views in completely randomized experiments. This alternative formulation could serve as a template for bridging finite and super population causal inference in other scenarios.
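The variance gap between the two views can be checked numerically (a sketch, not the paper's derivation): for fixed potential outcomes, the finite population sampling variance of the difference-in-means is S₁²/n₁ + S₀²/n₀ − Sτ²/n, which is smaller than the super-population-style quantity S₁²/n₁ + S₀²/n₀ whenever individual effects vary. The brute-force simulation below confirms the formula.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n1 = 60, 30
n0 = n - n1
y0 = rng.standard_normal(n)
y1 = y0 + rng.standard_normal(n)  # heterogeneous individual effects
tau = y1 - y0

S1_sq = y1.var(ddof=1)
S0_sq = y0.var(ddof=1)
S_tau_sq = tau.var(ddof=1)
finite_var = S1_sq / n1 + S0_sq / n0 - S_tau_sq / n
super_style = S1_sq / n1 + S0_sq / n0  # what the usual variance estimator targets

# Brute-force check of the finite population formula
draws = []
for _ in range(100_000):
    treated = rng.permutation(n) < n1
    draws.append(y1[treated].mean() - y0[~treated].mean())
mc_var = np.var(draws)

print(finite_var, mc_var, super_style)  # finite_var matches mc_var, both below super_style
```

The −Sτ²/n correction term is unidentifiable from the data, which is why both views end up with the same (conservative, under the finite population view) variance estimator in practice.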


Journal of the American Statistical Association | 2018

Randomization Inference for Peer Effects

Xinran Li; Peng Ding; Qian Lin; Dawei Yang; Jun S. Liu

Abstract Many previous causal inference studies require no interference, that is, the potential outcomes of a unit do not depend on the treatments of other units. However, this no-interference assumption becomes unreasonable when a unit interacts with other units in the same group or cluster. In a motivating application, a top Chinese university admits students through two channels: the college entrance exam (also known as Gaokao) and recommendation (often based on Olympiads in various subjects). The university randomly assigns students to dorms, each of which hosts four students. Students within the same dorm live together and have extensive interactions. Therefore, it is likely that peer effects exist and the no-interference assumption does not hold. It is important to understand peer effects, because they give useful guidance for future roommate assignment to improve the performance of students. We define peer effects using potential outcomes. We then propose a randomization-based inference framework to study peer effects with arbitrary numbers of peers and peer types. Our inferential procedure does not assume any parametric model on the outcome distribution. Our analysis gives useful practical guidance for policy makers of the university. Supplementary materials for this article are available online.


Journal of The Royal Statistical Society Series B-statistical Methodology | 2016

Randomization inference for treatment effect variation

Peng Ding; Avi Feller; Luke Miratrix


arXiv: Statistics Theory | 2017

Overlap in Observational Studies with High-Dimensional Covariates

Alexander D'Amour; Peng Ding; Avi Feller; Lihua Lei; Jasjeet S. Sekhon

Collaboration


Dive into Peng Ding's collaboration.

Top Co-Authors

Avi Feller

University of California

Lihua Lei

University of California
