Featured Research

Methodology

Regression Models for Order-of-Addition Experiments

The purpose of order-of-addition (OofA) experiments is to identify the best order in a sequence of m components in a system or treatment. Such experiments may be analysed by various regression models, the most popular ones being based on pairwise ordering (PWO) factors or on component-position (CP) factors. This paper reviews these models and their extensions and proposes a new class of models based on response surface (RS) regression using component position numbers as predictor variables. Using two published examples, it is shown that RS models can be quite competitive. In the case of model uncertainty, we advocate the use of model averaging for the analysis. The averaging idea leads naturally to a design approach based on a compound optimality criterion that assigns weights to each candidate model.
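As a concrete illustration of the PWO idea, the sketch below builds pairwise-ordering factors for a set of tested addition orders and fits them by ordinary least squares. The data are synthetic and the component count is arbitrary; this is only a minimal sketch of the PWO model, not of the RS or model-averaging analysis proposed in the paper.

```python
# Minimal sketch of the pairwise-ordering (PWO) regression model for
# order-of-addition data: synthetic orders and responses for illustration.
from itertools import combinations
import numpy as np

def pwo_row(order):
    """PWO factors: +1 if component j is added before component k (j < k), else -1."""
    pos = {c: i for i, c in enumerate(order)}
    return [1.0 if pos[j] < pos[k] else -1.0 for j, k in combinations(sorted(order), 2)]

rng = np.random.default_rng(0)
m = 4
orders = [rng.permutation(m) for _ in range(20)]          # tested addition orders
X = np.column_stack([np.ones(len(orders)),                # intercept
                     np.array([pwo_row(o) for o in orders])])
y = rng.normal(size=len(orders))                          # placeholder responses
beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # OLS fit of the PWO model
print(beta)
```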

Read more
Methodology

Regression analysis for covariate-adaptive randomization: A robust and efficient inference perspective

Linear regression is arguably the most fundamental statistical model; however, the validity of its use in randomized clinical trials, despite being common practice, has never been crystal clear, particularly when stratified or covariate-adaptive randomization is used. In this paper, we investigate several of the most intuitive and commonly used regression models for estimating and inferring the treatment effect in randomized clinical trials. By allowing the regression model to be arbitrarily misspecified, we demonstrate that all these regression-based estimators robustly estimate the treatment effect, albeit with possibly different efficiency. We also propose consistent non-parametric variance estimators and compare their performance with that of the model-based variance estimators that are readily available in standard statistical software. Based on the results, and taking into account both theoretical efficiency and practical feasibility, we make recommendations for the effective use of regression under various scenarios. For equal allocation, it suffices to use regression adjustment for the stratum covariates and additional baseline covariates, if available, with the usual ordinary-least-squares variance estimator. For unequal allocation, regression with treatment-by-covariate interactions should be used, together with our proposed variance estimators. These recommendations apply to simple randomization, stratified randomization, and minimization, among other schemes. We hope this work helps to clarify and promote the use of regression in randomized clinical trials.
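A hedged sketch of the kind of regression adjustment discussed here is given below, assuming a data frame with an outcome y, a treatment indicator treat, a stratum label, and one baseline covariate x (all names illustrative). A standard heteroskedasticity-robust covariance from statsmodels stands in for the paper's proposed variance estimators.

```python
# Minimal sketch of regression adjustment in a stratified randomized trial,
# with simulated data and a robust covariance as a stand-in variance estimator.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({"stratum": rng.integers(0, 4, n), "x": rng.normal(size=n)})
df["treat"] = rng.integers(0, 2, n)                      # simple 1:1 randomization
df["y"] = 0.5 * df["treat"] + df["x"] + rng.normal(size=n)
df["x_c"] = df["x"] - df["x"].mean()                     # centre the covariate so the treat
                                                         # coefficient targets the average effect

# Equal allocation: adjust for stratum and baseline covariates.
fit_main = smf.ols("y ~ treat + C(stratum) + x_c", data=df).fit(cov_type="HC1")

# Unequal allocation: add treatment-by-covariate interactions
# (interactions with the stratum dummies are omitted here for brevity).
fit_inter = smf.ols("y ~ treat + C(stratum) + x_c + treat:x_c", data=df).fit(cov_type="HC1")

print(fit_main.params["treat"], fit_inter.params["treat"])
```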

Read more
Methodology

Regression discontinuity design: estimating the treatment effect with standard parametric rate

Regression discontinuity design models are widely used for the assessment of treatment effects in psychology, econometrics and biomedicine, specifically in situations where treatment is assigned to an individual based on their characteristics (e.g. a scholarship is allocated based on merit) instead of being allocated randomly, as is the case, for example, in randomized clinical trials. Popular methods that have been widely employed to date for the estimation of such treatment effects suffer from slow rates of convergence (i.e. slower than √n). In this paper, we present a new model and method that allows estimation of the treatment effect at the √n rate in the presence of fairly general forms of confounding. Moreover, we show that our estimator is also semi-parametrically efficient in certain situations. We analyze two real datasets via our method and compare our results with those obtained by using previous approaches. We conclude this paper with a discussion on some possible extensions of our method.
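For orientation, the sketch below implements the conventional sharp regression discontinuity estimator on simulated data: local linear fits on each side of the cutoff, with the treatment effect read off as the jump at the cutoff. It illustrates the standard setup the paper improves upon, not the paper's new model or estimator.

```python
# Minimal sharp RD sketch: treatment switches on at a cutoff of the running
# variable, and the jump in the conditional mean is estimated by local linear fits.
import numpy as np

rng = np.random.default_rng(2)
n, cutoff, tau, h = 2000, 0.0, 1.0, 0.25            # sample size, cutoff, true effect, bandwidth
r = rng.uniform(-1, 1, n)                           # running variable (e.g. a merit score)
d = (r >= cutoff).astype(float)                     # sharp treatment assignment
y = 0.8 * r + tau * d + rng.normal(scale=0.5, size=n)

def side_fit(mask):
    X = np.column_stack([np.ones(mask.sum()), r[mask] - cutoff])
    beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
    return beta[0]                                  # fitted mean at the cutoff

left = (r < cutoff) & (r > cutoff - h)
right = (r >= cutoff) & (r < cutoff + h)
print("estimated jump at the cutoff:", side_fit(right) - side_fit(left))   # should be near tau
```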

Read more
Methodology

Regression-based causal inference with factorial experiments: estimands, model specifications, and design-based properties

Factorial designs are widely used due to their ability to accommodate multiple factors simultaneously. Factor-based regression with main effects and some interactions is the dominant strategy for downstream data analysis, delivering point estimators and standard errors via a single regression. Justifying these convenient estimators from the design-based perspective requires quantifying their sampling properties under the assignment mechanism, conditioning on the potential outcomes. To this end, we derive the sampling properties of the factor-based regression estimators from both saturated and unsaturated models, and demonstrate the appropriateness of the robust standard errors for Wald-type inference. We then quantify the bias-variance trade-off between the saturated and unsaturated models from the design-based perspective, and establish a novel design-based Gauss-Markov theorem that ensures the latter's gain in efficiency when the omitted nuisance effects indeed do not exist. As a byproduct, we unify the definitions of factorial effects across the literature and propose a location-shift strategy for their direct estimation from factor-based regressions. Our theory and simulations suggest using factor-based inference for general factorial effects, preferably with parsimonious specifications in accordance with prior knowledge of zero nuisance effects.
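The sketch below illustrates factor-based regression for a simulated 2x2 factorial with +/-1 coded factors, comparing a saturated specification (main effects plus interaction) with an unsaturated main-effects specification, and using robust standard errors for Wald-type inference. It is only an illustration of the regression strategy discussed, not of the paper's design-based derivations.

```python
# Saturated vs. unsaturated factor-based regression for a simulated 2x2 factorial,
# with robust (HC) standard errors for Wald-type inference.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "a": rng.choice([-1.0, 1.0], n),                # factor A, +/-1 coded
    "b": rng.choice([-1.0, 1.0], n),                # factor B, +/-1 coded
})
df["y"] = 1.0 + 0.5 * df["a"] + 0.2 * df["b"] + rng.normal(size=n)

saturated = smf.ols("y ~ a * b", data=df).fit(cov_type="HC2")     # includes the a:b interaction
unsaturated = smf.ols("y ~ a + b", data=df).fit(cov_type="HC2")   # omits the interaction

# With +/-1 coding, a factor's coefficient equals half the usual factorial effect
# (the difference in mean response between its high and low levels).
print(saturated.params, unsaturated.params, sep="\n")
```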

Read more
Methodology

Regularizing Double Machine Learning in Partially Linear Endogenous Models

We estimate the linear coefficient in a partially linear model with confounding variables. We rely on double machine learning (DML) and extend it with an additional regularization and selection scheme. We allow for more general dependence structures among the model variables than have been investigated previously, and we prove that this DML estimator remains asymptotically Gaussian and converges at the parametric rate. The DML estimator has a two-stage least squares interpretation and may produce overly wide confidence intervals. To address this issue, we propose the regularization-selection method regsDML, which leads to narrower confidence intervals. It is fully data-driven and optimizes an estimated asymptotic mean squared error of the coefficient estimate. Empirical examples demonstrate our methodological and theoretical developments. Software code for our regsDML method will be made available in the R package dmlalg.
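To fix ideas, the sketch below shows the standard DML partialling-out step for a partially linear model with cross-fitting, using random forests as the machine-learning components. The paper's endogenous (instrumental-variable) setting and the regsDML regularization-selection scheme are not implemented here.

```python
# Standard DML partialling-out sketch for Y = theta*D + g(X) + e with cross-fitting.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n, theta = 1000, 1.0
X = rng.normal(size=(n, 5))
D = X[:, 0] + rng.normal(size=n)                    # treatment depends on confounders
Y = theta * D + np.sin(X[:, 0]) + rng.normal(size=n)

res_Y, res_D = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    mY = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
    mD = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
    res_Y[test] = Y[test] - mY.predict(X[test])     # out-of-fold outcome residuals
    res_D[test] = D[test] - mD.predict(X[test])     # out-of-fold treatment residuals

theta_hat = np.sum(res_D * res_Y) / np.sum(res_D ** 2)
print("DML estimate of theta:", theta_hat)          # should be close to 1.0
```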

Read more
Methodology

Rejoinder: On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning

This is the rejoinder to the discussion by Kennedy, Balakrishnan and Wasserman on the paper "On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning" published in Statistical Science.

Read more
Methodology

Reverse-Bayes methods: a review of recent technical advances

It is now widely accepted that the standard inferential toolkit used by the scientific research community -- null-hypothesis significance testing (NHST) -- is not fit for purpose. Yet despite the threat posed to the scientific enterprise, there is no agreement concerning alternative approaches. This lack of consensus reflects long-standing issues concerning Bayesian methods, the principal alternative to NHST. We report on recent work that builds on an approach to inference put forward over 70 years ago to address the well-known "Problem of Priors" in Bayesian analysis, by reversing the conventional prior-likelihood-posterior ("forward") use of Bayes's Theorem. Such Reverse-Bayes analysis allows priors to be deduced from the likelihood by requiring that the posterior achieve a specified level of credibility. We summarise the technical underpinning of this approach, and show how it opens up new approaches to common inferential challenges, such as assessing the credibility of scientific findings, setting them in appropriate context, estimating the probability of successful replications, and extracting more insight from NHST while reducing the risk of misinterpretation. We argue that Reverse-Bayes methods have a key role to play in making Bayesian methods more accessible and attractive to the scientific community. As a running example we consider a recently published meta-analysis from several randomized controlled clinical trials investigating the association between corticosteroids and mortality in hospitalized patients with COVID-19.
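A stylized numerical example of the Reverse-Bayes idea is sketched below under a normal-normal model: given a point estimate and its standard error, it solves for the scale of a mean-zero normal prior that makes the posterior probability of a positive effect reach a required credibility level. The numbers and the model are illustrative only and are not taken from the paper or from the COVID-19 meta-analysis.

```python
# Stylized Reverse-Bayes sketch: deduce the prior scale from a required
# posterior credibility level under a normal-normal model.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def posterior_prob_positive(tau, est, se):
    """P(theta > 0 | data) under a N(0, tau^2) prior and N(theta, se^2) likelihood."""
    post_var = 1.0 / (1.0 / se**2 + 1.0 / tau**2)
    post_mean = post_var * est / se**2
    return norm.cdf(post_mean / np.sqrt(post_var))

est, se, credibility = 0.30, 0.10, 0.95             # illustrative estimate, SE, target level
# Solve for the prior standard deviation at which the posterior just reaches the target.
tau_star = brentq(lambda t: posterior_prob_positive(t, est, se) - credibility, 1e-6, 1e3)
print("required prior standard deviation:", tau_star)
```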

Read more
Methodology

Revisiting Identifying Assumptions for Population Size Estimation

The problem of estimating the size of a population based on a subset of individuals observed across multiple data sources is often referred to as capture-recapture or multiple-systems estimation. This is fundamentally a missing data problem, where the number of unobserved individuals represents the missing data. As with any missing data problem, multiple-systems estimation requires users to make an untestable identifying assumption in order to estimate the population size from the observed data. Approaches to multiple-systems estimation often do not emphasize the role of the identifying assumption during model specification, which makes it difficult to decouple the specification of the model for the observed data from the identifying assumption. We present a re-framing of the multiple-systems estimation problem that decouples the specification of the observed-data model from the identifying assumptions, and discuss how log-linear models and the associated no-highest-order interaction assumption fit into this framing. We present an approach to computation in the Bayesian setting which takes advantage of existing software and facilitates various sensitivity analyses. We demonstrate our approach in a case study of estimating the number of civilian casualties in the Kosovo war. Code used to produce this manuscript is available at this https URL.
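The role of the identifying assumption can be seen in the simplest two-list case, sketched below: under the no-highest-order-interaction (here, independence) assumption, the unobserved cell of the capture-recapture table is imputed from the observed cells. The counts are illustrative and unrelated to the Kosovo case study.

```python
# Two-list multiple-systems estimation: the unobserved cell n00 is identified
# only through an untestable assumption (here, independence of the two lists).
n11, n10, n01 = 60, 40, 50                           # in both lists / only list 1 / only list 2

n00_hat = n10 * n01 / n11                            # identifying assumption: odds ratio = 1
N_hat = n11 + n10 + n01 + n00_hat                    # estimated population size
print("estimated number never observed:", n00_hat)
print("estimated population size:", N_hat)
```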

Read more
Methodology

Robust Approximate Bayesian Computation: An Adjustment Approach

We propose a novel approach to approximate Bayesian computation (ABC) that seeks to cater for possible misspecification of the assumed model. The new approach can be applied equally to rejection-based ABC and to popular regression-adjustment ABC. We demonstrate that it mitigates the poor performance of regression-adjusted ABC that can eventuate when the model is misspecified. In addition, the new adjustment approach allows us to detect which features of the observed data cannot be reliably reproduced by the assumed model. A series of simulated and empirical examples illustrate the new approach.
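For context, the sketch below implements plain rejection ABC for the mean of a normal model using summary statistics. It shows only the baseline that adjustment approaches build on; the paper's misspecification-robust adjustment itself is not implemented.

```python
# Plain rejection ABC for the mean of a normal model, using mean and SD as summaries.
import numpy as np

rng = np.random.default_rng(5)
y_obs = rng.normal(loc=2.0, scale=1.0, size=100)     # "observed" data
s_obs = np.array([y_obs.mean(), y_obs.std()])        # observed summary statistics

n_sims, keep = 50_000, 500
theta = rng.uniform(-5, 5, n_sims)                   # draws from a uniform prior on the mean
sims = rng.normal(loc=theta[:, None], scale=1.0, size=(n_sims, 100))
s_sim = np.column_stack([sims.mean(axis=1), sims.std(axis=1)])

dist = np.linalg.norm(s_sim - s_obs, axis=1)         # distance between summaries
accepted = theta[np.argsort(dist)[:keep]]            # keep the draws closest to the data
print("ABC posterior mean:", accepted.mean())
```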

Read more
Methodology

Robust Bayesian Inference for Big Data: Combining Sensor-based Records with Traditional Survey Data

Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume amplifies the relative contribution of selection bias to total error. Existing bias-adjustment approaches assume that the conditional mean structures have been correctly specified for the selection indicator or key substantive measures. In the presence of a reference probability sample, these methods rely on a pseudo-likelihood method to account for the sampling weights of the reference sample, which is parametric in nature. Under a Bayesian framework, handling the sampling weights is an even bigger hurdle. To further protect against model misspecification, we expand the idea of double robustness so that more flexible non-parametric methods, as well as Bayesian models, can be used for prediction. In particular, we employ Bayesian additive regression trees, which not only capture non-linear associations automatically but also permit direct quantification of the uncertainty of point estimates through their posterior predictive draws. We apply our method to sensor-based naturalistic driving data from the second Strategic Highway Research Program, using the 2017 National Household Travel Survey as a benchmark.
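The prediction-based idea behind such adjustments can be sketched as follows: fit a flexible outcome model on the large non-probability sample, predict for units in a weighted reference probability sample, and average the predictions with the survey weights. In this hedged sketch, gradient boosting stands in for BART, the data are simulated, and neither the double robustness nor the Bayesian uncertainty quantification of the paper is attempted.

```python
# Prediction-based bias adjustment sketch: flexible model fit on a non-probability
# sample, projected onto a weighted reference probability sample.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)

# Non-probability "big data" sample: covariates and outcome, selection depends on x.
x_big = rng.normal(loc=0.5, size=(5000, 2))
y_big = x_big[:, 0] + 0.5 * x_big[:, 1] ** 2 + rng.normal(size=5000)

# Reference probability sample: covariates and survey weights, no outcome required.
x_ref = rng.normal(size=(800, 2))
w_ref = rng.uniform(0.5, 2.0, 800)                   # illustrative sampling weights

model = GradientBoostingRegressor().fit(x_big, y_big)
y_pred = model.predict(x_ref)
estimate = np.average(y_pred, weights=w_ref)         # weighted projection to the population
print("weighted prediction estimate:", estimate)
```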

Read more
