Featured Research

Methodology

A Note on Debiased/Double Machine Learning Logistic Partially Linear Model

It is of particular interest in many applied fields to draw doubly robust inference for a logistic partially linear model, with the predictor specified as the combination of a targeted low dimensional linear parametric function and a nuisance nonparametric function. Recently, Tan (2019) proposed a simple and flexible doubly robust estimator for this purpose. They introduced two nuisance models, i.e. the nonparametric component in the logistic model and the conditional mean of the exposure covariates given the other covariates and fixed response, and specified them as fixed dimensional parametric models. Their framework could potentially be extended to the machine learning or high dimensional nuisance modelling exploited recently, e.g. in Chernozhukov et al. (2018a,b), Smucler et al. (2019) and Tan (2020). Motivated by this, we derive the debiased/double machine learning logistic partially linear model in this note. For construction of the nuisance models, we separately consider the use of high dimensional sparse parametric models and general machine learning methods. By deriving certain moment equations to calibrate the first order bias of the nuisance models, we preserve a model double robustness property for high dimensional ultra-sparse nuisance models. We also discuss and compare the underlying assumptions of our method with those of the debiased LASSO (van de Geer et al., 2014). To implement the machine learning proposal, we design a full model refitting procedure that allows the use of any black-box conditional mean estimation method in our framework. Under the machine learning setting, our method is rate doubly robust in a similar sense to Chernozhukov et al. (2018a).
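
For orientation only, here is a minimal cross-fitting sketch in the spirit of the framework described above, assuming a binary outcome y, a scalar exposure d, and a covariate matrix X supplied as NumPy arrays. The gradient-boosting nuisance learners and the crude plug-in for the nonparametric component g(X) are illustrative placeholders, not the estimators analyzed in the note.

```python
# Hedged sketch: cross-fitted estimation of theta in logit P(Y=1 | D, X) = theta*D + g(X),
# with nuisances g(X) and m(X) = E[D | X, Y=0] fitted out-of-fold.
import numpy as np
from scipy.optimize import brentq
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_logistic_plm(y, d, X, n_folds=5, seed=0):
    n = len(y)
    g_hat, m_hat = np.zeros(n), np.zeros(n)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # Crude placeholder for g(X): logit of P(Y=1 | X), fitted without D.
        clf = GradientBoostingClassifier().fit(X[train], y[train])
        p = np.clip(clf.predict_proba(X[test])[:, 1], 1e-3, 1 - 1e-3)
        g_hat[test] = np.log(p / (1 - p))
        # Placeholder for m(X) = E[D | X, Y=0], fitted on control observations only.
        ctrl = train[y[train] == 0]
        reg = GradientBoostingRegressor().fit(X[ctrl], d[ctrl])
        m_hat[test] = reg.predict(X[test])

    # Doubly robust moment equation in theta, solved by root finding.
    def score(theta):
        return np.mean((d - m_hat) * (y * np.exp(-theta * d - g_hat) - (1 - y)))

    return brentq(score, -10.0, 10.0)  # assumes the score changes sign on this bracket
```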

Methodology

A Note on Using Discretized Simulated Data to Estimate Implicit Likelihoods in Bayesian Analyses

This article presents a Bayesian inferential method for settings where the likelihood of a model is unknown but data can easily be simulated from the model. We discretize simulated (continuous) data to estimate the implicit likelihood in a Bayesian analysis employing a Markov chain Monte Carlo algorithm. Three examples are presented, along with a small study of some of the method's properties.
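
To make the idea concrete, here is a minimal sketch under assumed details that are not stated in the abstract: a placeholder Gamma forward simulator, a fixed binning of the sample space, smoothed simulated bin frequencies standing in for the implicit likelihood, and a random-walk Metropolis-Hastings sampler with a flat prior on a positive parameter.

```python
# Hedged sketch: Metropolis-Hastings with a binned, simulation-based likelihood estimate.
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, size):
    # Placeholder forward simulator: Gamma(shape=theta, scale=1) observations.
    return rng.gamma(theta, 1.0, size)

def binned_loglik(theta, y_obs, bins, n_sim=5_000):
    # Estimate bin probabilities at theta from simulated data, then score the
    # discretized observations against those (smoothed) frequencies.
    counts, _ = np.histogram(simulate(theta, n_sim), bins=bins)
    probs = (counts + 0.5) / (n_sim + 0.5 * (len(bins) - 1))
    idx = np.clip(np.digitize(y_obs, bins) - 1, 0, len(bins) - 2)
    return np.sum(np.log(probs[idx]))

def mh_sampler(y_obs, n_iter=2_000, step=0.3):
    bins = np.linspace(0.0, 2.0 * np.max(y_obs), 30)
    theta, ll = 2.0, -np.inf
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        if prop > 0:  # flat prior on theta > 0 assumed
            ll_prop = binned_loglik(prop, y_obs, bins)
            if np.log(rng.uniform()) < ll_prop - ll:
                theta, ll = prop, ll_prop
        draws.append(theta)
    return np.array(draws)

posterior_draws = mh_sampler(simulate(3.0, 200))
```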

Methodology

A Particle Method for Solving Fredholm Equations of the First Kind

Fredholm integral equations of the first kind are the prototypical example of ill-posed linear inverse problems. They model, among other things, the reconstruction of distorted noisy observations and indirect density estimation, and they also appear in instrumental variable regression. However, their numerical solution remains a challenging problem. Many techniques currently available require a preliminary discretization of the domain of the solution and make strong assumptions about its regularity. For example, the popular expectation maximization smoothing (EMS) scheme requires the assumption of piecewise constant solutions, which is inappropriate for most applications. We propose here a novel particle method that circumvents these two issues. This algorithm can be thought of as a Monte Carlo approximation of the EMS scheme that not only performs an adaptive stochastic discretization of the domain but also yields smooth approximate solutions. We analyze the theoretical properties of the EMS iteration and of the corresponding particle algorithm. Compared to standard EMS, we show experimentally that our particle method provides state-of-the-art performance on realistic systems, including motion deblurring and the reconstruction of cross-section images of the brain from positron emission tomography.
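
For reference, a hedged sketch of the iteration the abstract builds on: writing the Fredholm equation of the first kind as $h(y) = \int g(y \mid x)\, f(x)\, \mathrm{d}x$, with $h$ the density of the observations and $g$ the known kernel, the EM update and its smoothed (EMS) variant take roughly the form

```latex
f_{n+1}(x) = f_n(x)\int \frac{g(y \mid x)\, h(y)}{\int g(y \mid z)\, f_n(z)\, \mathrm{d}z}\, \mathrm{d}y,
\qquad
f_{n+1}^{\mathrm{EMS}}(x) = \int K_\epsilon(x, x')\, f_{n+1}(x')\, \mathrm{d}x',
```

where $K_\epsilon$ is a smoothing kernel. As described above, the particle method replaces $f_n$ by a weighted particle approximation and evaluates these integrals by Monte Carlo.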

Methodology

A Practical Two-Sample Test for Weighted Random Graphs

Network (graph) data analysis is a popular research topic in statistics and machine learning. In applications, one is frequently confronted with the graph two-sample hypothesis testing problem, where the goal is to test for a difference between two graph populations. Several statistical tests have been devised for this purpose in the context of binary graphs. However, many practical networks are weighted, and existing procedures cannot be applied directly to weighted graphs. In this paper, we study the weighted graph two-sample hypothesis testing problem and propose a practical test statistic. We prove that the proposed test statistic converges in distribution to the standard normal distribution under the null hypothesis and analyze its power theoretically. A simulation study shows that the proposed test performs satisfactorily and substantially outperforms its existing counterpart in the binary graph case. A real data application is provided to illustrate the method.
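
As a purely illustrative toy, and not the test statistic proposed in the paper (whose form is not given in the abstract), the sketch below aggregates edge-wise Welch z-scores from two samples of weighted graphs on a common vertex set into a quantity that is approximately standard normal under the null, assuming independent edge weights and moderately many graphs per sample.

```python
# Toy illustration of the two-sample setup for weighted graphs (not the paper's statistic).
import numpy as np

def toy_weighted_two_sample_stat(A1, A2):
    """A1, A2: arrays of shape (m1, n, n) and (m2, n, n) of weighted adjacency matrices."""
    n = A1.shape[1]
    iu = np.triu_indices(n, k=1)
    x = A1[:, iu[0], iu[1]]            # per-graph upper-triangle edge weights, sample 1
    y = A2[:, iu[0], iu[1]]            # same edges, sample 2
    m1, m2 = x.shape[0], y.shape[0]
    se = np.sqrt(x.var(axis=0, ddof=1) / m1 + y.var(axis=0, ddof=1) / m2)
    z = (x.mean(axis=0) - y.mean(axis=0)) / se          # edge-wise Welch z-scores
    return np.sum(z**2 - 1.0) / np.sqrt(2.0 * z.size)   # approx N(0, 1) under H0

rng = np.random.default_rng(0)
A1 = rng.gamma(2.0, 1.0, size=(40, 30, 30))
A2 = rng.gamma(2.0, 1.0, size=(40, 30, 30))
print(toy_weighted_two_sample_stat(A1, A2))
```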

Methodology

A Review of Generalizability and Transportability

When assessing causal effects, determining the target population to which the results are intended to generalize is a critical decision. Randomized and observational studies each have strengths and limitations for estimating causal effects in a target population. Estimates from randomized data may have internal validity but are often not representative of the target population. Observational data may better reflect the target population, and hence be more likely to have external validity, but are subject to potential bias due to unmeasured confounding. While much of the causal inference literature has focused on addressing internal validity bias, both internal and external validity are necessary for unbiased estimates in a target population. This paper presents a framework for addressing external validity bias, including a synthesis of approaches for generalizability and transportability, the assumptions they require, and tests for the heterogeneity of treatment effects and for differences between study and target populations.

Methodology

A Robust Spearman Correlation Coefficient Permutation Test

In this work, we show that the test of Spearman's correlation coefficient for $H_0: \rho_s = 0$ found in most statistical software packages is theoretically incorrect and performs poorly when the bivariate normality assumption is not met or the sample size is small. The historical works on these tests make the unverifiable assumption that approximate bivariate normality of the original data justifies the classical approximations. In general, there is a common misconception that tests of $\rho_s = 0$ are robust to deviations from bivariate normality. In fact, we found that under certain scenarios, violation of the bivariate normality assumption has severe effects on type I error control for the most commonly used tests. To address this issue, we developed a robust permutation test for the general hypothesis $H_0: \rho_s = 0$. The proposed test is based on an appropriately studentized statistic. We show that the test is asymptotically valid in the general setting where the two paired variables are uncorrelated but dependent. This desired property is demonstrated across a range of distributional assumptions and sample sizes in simulation studies, where the proposed test exhibits robust type I error control across a variety of settings, even when the sample size is small. We demonstrate the application of this test on real-world examples: transcriptomic data from TCGA breast cancer patients and a data set of PSA levels and age.
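
Below is a minimal sketch of a Spearman permutation test using the plain Spearman statistic; the studentization that is the paper's actual contribution, and which delivers validity when the variables are dependent but uncorrelated, is omitted here (permuting one variable is exact only under full independence).

```python
# Hedged sketch: permutation test for H0: rho_s = 0 with the plain Spearman statistic.
import numpy as np
from scipy.stats import spearmanr

def spearman_perm_test(x, y, n_perm=10_000, seed=0):
    rng = np.random.default_rng(seed)
    rho_obs, _ = spearmanr(x, y)
    null = np.empty(n_perm)
    for b in range(n_perm):
        null[b], _ = spearmanr(x, rng.permutation(y))
    # Two-sided p-value with the +1 correction that keeps the permutation test valid.
    p_value = (1 + np.sum(np.abs(null) >= np.abs(rho_obs))) / (n_perm + 1)
    return rho_obs, p_value
```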

Methodology

A Selective Review of Negative Control Methods in Epidemiology

Purpose of Review: Negative controls are a powerful tool to detect and adjust for bias in epidemiological research. This paper introduces negative controls to a broader audience and provides guidance on principled design and causal analysis based on a formal negative control framework. Recent Findings: We review and summarize causal and statistical assumptions, practical strategies, and validation criteria that can be combined with subject matter knowledge to perform negative control analyses. We also review existing statistical methodologies for detection, reduction, and correction of confounding bias, and briefly discuss recent advances towards nonparametric identification of causal effects in a double negative control design. Summary: There is great potential for valid and accurate causal inference leveraging contemporary healthcare data in which negative controls are routinely available. Design and analysis of observational data leveraging negative controls is an area of growing interest in health and social sciences. Despite these developments, further effort is needed to disseminate these novel methods to ensure they are adopted by practicing epidemiologists.

Methodology

A Survival Mediation Model with Bayesian Model Averaging

Determining the extent to which a patient is benefiting from cancer therapy is challenging. Criteria for quantifying the extent of "tumor response" observed within a few cycles of treatment have been established for various types of solid as well as hematologic malignancies. These measures comprise the primary endpoints of phase II trials. Regulatory approvals of new cancer therapies, however, are usually contingent upon the demonstration of superior overall survival with randomized evidence acquired in a phase III trial comparing the novel therapy to an appropriate standard of care treatment. With nearly two thirds of phase III oncology trials failing to achieve statistically significant results, researchers continue to refine and propose new surrogate endpoints. This article presents a Bayesian framework for studying relationships among treatment, patient subgroups, tumor response, and survival. Combining classical components of mediation analysis with Bayesian model averaging (BMA), the methodology is robust to model misspecification of the various possible relationships among the observable entities. Posterior inference is demonstrated via application to a randomized controlled phase III trial in metastatic colorectal cancer. Moreover, the article details posterior predictive distributions of survival and statistical metrics for quantifying the extent of direct and indirect (i.e., tumor-response-mediated) treatment effects.
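
The sketch below is not the paper's fully Bayesian machinery, only the generic model-averaging step it relies on: assuming each candidate mediation model supplies a maximized log-likelihood, a parameter count, and an estimate of the quantity of interest, posterior model probabilities are approximated via BIC with equal prior model probabilities.

```python
# Hedged sketch: BIC-approximated Bayesian model averaging over candidate models.
import numpy as np

def bma_weights(log_liks, n_params, n_obs):
    """Approximate posterior model probabilities (equal prior model probabilities assumed)."""
    bic = -2.0 * np.asarray(log_liks, float) + np.asarray(n_params, float) * np.log(n_obs)
    w = np.exp(-0.5 * (bic - bic.min()))
    return w / w.sum()

def bma_average(estimates, log_liks, n_params, n_obs):
    """Model-averaged estimate of a quantity, e.g., an indirect, response-mediated effect."""
    w = bma_weights(log_liks, n_params, n_obs)
    return float(np.dot(w, np.asarray(estimates, float))), w
```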

Methodology

A Wavelet-Based Independence Test for Functional Data with an Application to MEG Functional Connectivity

Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, model-based methods rely on a model that is subject to the risk of misspecification, while model-free methods only provide a correlation measure, which is inadequate for testing independence. In this paper, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) to measure the dependency between two random functions. We develop a two-step procedure that first pre-smooths each function based on its discrete and noisy measurements and then applies the HSIC to the recovered functions. To ensure compatibility between the two steps, so that the effect of the pre-smoothing error on the subsequent HSIC is asymptotically negligible, we propose to use wavelet soft-thresholding for pre-smoothing and Besov-norm-induced kernels for the HSIC. We also provide the corresponding asymptotic analysis. The superior numerical performance of the proposed method over existing ones is demonstrated in a simulation study. Moreover, in a magnetoencephalography (MEG) data application, the functional connectivity patterns identified by the proposed method are more anatomically interpretable than those identified by existing methods.
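
For orientation, here is a minimal sketch of the second step only (HSIC with a permutation p-value), with Gaussian kernels and raw discretized curves standing in for the paper's wavelet soft-thresholding pre-smoothing and Besov-norm-induced kernels.

```python
# Hedged sketch: HSIC between two sets of discretized curves, with a permutation p-value.
import numpy as np

def _gaussian_gram(Z, bandwidth=None):
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    if bandwidth is None:                              # median heuristic
        bandwidth = np.sqrt(0.5 * np.median(sq[sq > 0]))
    return np.exp(-sq / (2.0 * bandwidth**2))

def hsic_perm_test(X, Y, n_perm=1000, seed=0):
    """X, Y: (n, p) and (n, q) arrays; row i holds the discretized curves of subject i."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    K, L = _gaussian_gram(X), _gaussian_gram(Y)
    obs = np.trace(K @ H @ L @ H) / n**2               # (biased) HSIC estimate
    null = np.empty(n_perm)
    for b in range(n_perm):
        p = rng.permutation(n)
        null[b] = np.trace(K @ H @ L[np.ix_(p, p)] @ H) / n**2
    return obs, (1 + np.sum(null >= obs)) / (n_perm + 1)
```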

Methodology

A flexible and efficient algorithm for joint imputation of general data

Imputation of data with general structure (e.g., data with continuous, binary, unordered categorical, and ordinal variables) is commonly performed with fully conditional specification (FCS) rather than joint modeling. A key drawback of FCS is that it does not invoke an appropriate data augmentation mechanism, and as such, convergence of the resulting Markov chain Monte Carlo procedure is not assured. Methods that use joint modeling do not share these drawbacks but have not been efficiently implemented for data of general structure. We address these issues by developing a new method, the GERBIL algorithm, which draws imputations from a latent joint multivariate normal model underpinning the generally structured data. This model is constructed using a sequence of flexible conditional linear models, which enables the resulting procedure to be implemented efficiently on high dimensional datasets in practice. Simulations show that GERBIL performs well compared to methods that utilize FCS. Furthermore, the new method is computationally efficient relative to existing FCS procedures.
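
The sketch below covers only the final draw step under an already-fitted joint multivariate normal; the paper's construction of that latent model from a sequence of conditional linear models, and its handling of binary, ordinal, and categorical variables, are not reproduced here, and the function name is hypothetical.

```python
# Hedged sketch: impute missing entries of one record from its conditional normal
# distribution, given the observed entries and a fitted N(mu, Sigma).
import numpy as np

def mvn_conditional_impute(row, mu, Sigma, rng):
    miss = np.isnan(row)
    if not miss.any():
        return row
    obs = ~miss
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(miss, obs)]
    resid = np.linalg.solve(S_oo, row[obs] - mu[obs])
    cond_mean = mu[miss] + S_mo @ resid
    cond_cov = Sigma[np.ix_(miss, miss)] - S_mo @ np.linalg.solve(S_oo, S_mo.T)
    out = row.copy()
    out[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return out
```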

