Featured Research

Methodology

Epidemiological Forecasting with Model Reduction of Compartmental Models. Application to the COVID-19 pandemic

We propose a forecasting method for predicting epidemiological health series on a two-week horizon at regional and interregional resolution. The approach is based on model order reduction of parametric compartmental models and is designed to accommodate small amounts of health data. The efficiency of the method is shown in predicting the number of infected and removed people during the two pandemic waves of COVID-19 in France, which took place approximately between February and November 2020. Numerical results illustrate the promising potential of the approach.
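The reduction idea can be illustrated with a toy proper orthogonal decomposition (POD) of simulated SIR trajectories; the parameter grid, Euler stepping, and energy threshold below are hypothetical choices for illustration, not the authors' setup.

```python
import numpy as np

def sir_trajectory(beta, gamma, days=60, i0=1e-3):
    """Forward-Euler simulation of a basic SIR model; returns the infected curve."""
    s, i = 1.0 - i0, i0
    infected = []
    for _ in range(days):
        new_inf = beta * s * i
        new_rec = gamma * i
        s, i = s - new_inf, i + new_inf - new_rec
        infected.append(i)
    return np.array(infected)

# Snapshot matrix: one column per (beta, gamma) combination on a coarse grid
snapshots = np.column_stack([
    sir_trajectory(beta, gamma)
    for beta in np.linspace(0.2, 0.5, 8)
    for gamma in np.linspace(0.05, 0.2, 8)
])

# POD via SVD: the leading modes capture almost all snapshot variance,
# so forecasts can be fit in a low-dimensional coefficient space
U, svals, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(svals**2) / np.sum(svals**2)
n_modes = int(np.searchsorted(energy, 0.999)) + 1
```

With 64 snapshot trajectories, `n_modes` comes out much smaller than 64, which is the dimensionality reduction that a forecasting method of this kind exploits.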

Read more
Methodology

Equivalence class selection of categorical graphical models

Learning the structure of dependence relations between variables is a pervasive issue in the statistical literature. A directed acyclic graph (DAG) can represent a set of conditional independences, but different DAGs may encode the same set of relations and are indistinguishable using observational data. Equivalent DAGs can be collected into classes, each represented by a partially directed graph known as an essential graph (EG). Structure learning conducted directly on the EG space, rather than on the allied space of DAGs, leads to theoretical and computational benefits. Still, the majority of efforts in the literature have been dedicated to Gaussian data, with less attention to methods designed for multivariate categorical data. We therefore propose a Bayesian methodology for structure learning of categorical EGs. Combining a constructive parameter prior elicitation with a graph-driven likelihood decomposition, we derive a closed-form expression for the marginal likelihood of a categorical EG model. Asymptotic properties are studied, and an MCMC sampling scheme is developed for approximate posterior inference. We evaluate our methodology on both simulated scenarios and real data, with appreciable performance in comparison with state-of-the-art methods.
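Markov equivalence can be checked numerically: the two chains X → Y → Z and X ← Y ← Z encode the same conditional independence (X ⊥ Z | Y), so their maximized log-likelihoods coincide on any categorical dataset. The simulation below is a minimal illustration with made-up probabilities, not the paper's methodology.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)
n = 3000

# Sample from a chain X -> Y -> Z over binary variables
x = rng.binomial(1, 0.4, n)
y = rng.binomial(1, np.where(x == 1, 0.8, 0.3))
z = rng.binomial(1, np.where(y == 1, 0.7, 0.2))
data = list(zip(x, y, z))

def loglik(data, factorization):
    """Maximized log-likelihood of a DAG factorization from empirical counts.
    factorization: list of (child, parents) index tuples into each sample."""
    total = 0.0
    for child, parents in factorization:
        joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
        marg = Counter(tuple(row[p] for p in parents) for row in data)
        for (pa, _), c in joint.items():
            total += c * np.log(c / marg[pa])
    return total

# Two Markov-equivalent DAGs: X -> Y -> Z and X <- Y <- Z
chain_fwd = [(0, ()), (1, (0,)), (2, (1,))]
chain_rev = [(2, ()), (1, (2,)), (0, (1,))]
ll_fwd = loglik(data, chain_fwd)
ll_rev = loglik(data, chain_rev)
```

Because observational data cannot separate equivalent DAGs, scoring one representative per class, the essential graph, avoids redundant work in structure learning.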

Read more
Methodology

Estimating Average Treatment Effects with Support Vector Machines

The support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature. We demonstrate that SVM can be used to balance covariates and estimate average causal effects under the unconfoundedness assumption. Specifically, we adapt the SVM classifier as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups while simultaneously maximizing the effective sample size. We also show that SVM is a continuous relaxation of the quadratic integer program for computing the largest balanced subset, establishing its direct relation to the cardinality matching method. Another important feature of SVM is that its regularization parameter controls the trade-off between covariate balance and effective sample size. As a result, the existing SVM path algorithm can be used to compute the balance-sample size frontier. We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to existing kernel balancing methods. Finally, we conduct simulation and empirical studies to evaluate the performance of the proposed methodology and find that SVM is competitive with state-of-the-art covariate balancing methods.
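The balancing target, the maximum mean discrepancy (MMD) between weighted treatment and control samples, is straightforward to compute directly. The sketch below uses a hypothetical exponential-tilting reweighting purely to show that better weights shrink the MMD; it is not the paper's SVM-based procedure, and all data and tilt parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2(X, Y, w_x=None, w_y=None):
    """Squared MMD between weighted samples X and Y (weights sum to 1)."""
    w_x = np.full(len(X), 1 / len(X)) if w_x is None else w_x
    w_y = np.full(len(Y), 1 / len(Y)) if w_y is None else w_y
    return (w_x @ rbf_kernel(X, X) @ w_x
            - 2 * w_x @ rbf_kernel(X, Y) @ w_y
            + w_y @ rbf_kernel(Y, Y) @ w_y)

# Treated units are shifted relative to controls: covariates are imbalanced
treated = rng.normal(0.8, 1.0, size=(200, 2))
control = rng.normal(0.0, 1.0, size=(300, 2))

unweighted = mmd2(treated, control)

# Crude reweighting of controls toward the treated distribution (illustration only)
scores = np.exp(control @ np.array([0.8, 0.8]))
w = scores / scores.sum()
reweighted = mmd2(treated, control, w_y=w)
```

The reweighting concentrates mass on fewer control units, which is exactly the balance-versus-effective-sample-size trade-off the abstract describes.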

Read more
Methodology

Estimating Individual Treatment Effects using Non-Parametric Regression Models: a Review

Large observational datasets are increasingly available in disciplines such as the health, economic, and social sciences, where researchers are interested in causal questions rather than prediction. In this paper, we investigate the problem of estimating heterogeneous treatment effects using non-parametric regression-based methods. First, we introduce the setup and the issues related to conducting causal inference with observational or non-fully randomized data, and show how these issues can be tackled with the help of statistical learning tools. Then, we provide a review of state-of-the-art methods, with a particular focus on non-parametric modeling, and cast them under a unifying taxonomy. After a brief overview of the problem of model selection, we illustrate the performance of some of the methods on three different simulation studies and on a real-world example investigating the effect of participation in school meal programs on health indicators.
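One of the simplest regression-based strategies such reviews cover is the T-learner: fit separate outcome models in the treated and control arms and take the difference of their predictions as the individual effect estimate. The sketch below uses linear models on synthetic confounded data; the data-generating process and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic observational data with a heterogeneous effect tau(x) = 1 + x
n = 2000
x = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-x))          # treatment depends on x (confounding)
t = rng.binomial(1, propensity)
tau = 1.0 + x
y = 0.5 * x + t * tau + rng.normal(scale=0.5, size=n)

def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; returns a prediction function."""
    A = np.column_stack([np.ones_like(xs), xs])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return lambda q: coef[0] + coef[1] * q

# T-learner: separate outcome models for treated and control arms
mu1 = fit_linear(x[t == 1], y[t == 1])
mu0 = fit_linear(x[t == 0], y[t == 0])
cate = mu1(x) - mu0(x)                     # estimated individual effects
```

Because both arm-specific models are correctly specified here, the estimated effects track the true tau(x) closely; the reviewed non-parametric learners relax exactly that specification assumption.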

Read more
Methodology

Estimating Perinatal Critical Windows to Environmental Mixtures via Structured Bayesian Regression Tree Pairs

Maternal exposure to environmental chemicals during pregnancy can alter birth outcomes and children's health. Research seeks to identify critical windows, time periods when the exposures can change future health outcomes, and to estimate the exposure-response relationship. Existing statistical approaches focus on estimating the association between maternal exposure to a single environmental chemical observed at high temporal resolution, such as weekly throughout pregnancy, and children's health outcomes. Extending to multiple chemicals observed at high temporal resolution poses a dimensionality problem, and statistical methods are lacking. We propose a tree-based model for mixtures of exposures that are observed at high temporal resolution. The proposed approach uses an additive ensemble of structured tree pairs that define structured main effects and interactions between time-resolved predictors, together with variable selection that removes predictors unrelated to the outcome. We apply our method in a simulation study and in an analysis of the relationship between five exposures measured weekly throughout pregnancy and resulting birth weight in a Denver, Colorado birth cohort. We identified critical windows during which fine particulate matter, sulfur dioxide, and temperature are negatively associated with birth weight, as well as an interaction between fine particulate matter and temperature. Software is available in the R package dlmtree.

Read more
Methodology

Estimating Sibling Spillover Effects with Unobserved Confounding Using Gain-Scores

A growing area of research in epidemiology is the identification of health-related sibling spillover effects, or the effect of one individual's exposure on their sibling's outcome. The health and health care of family members may be inextricably confounded by unobserved factors, rendering identification of spillover effects within families particularly challenging. We demonstrate a gain-score regression method for identifying exposure-to-outcome spillover effects within sibling pairs in a linear fixed effects framework. The method can identify the exposure-to-outcome spillover effect if only one sibling's exposure affects the other's outcome, and it identifies the difference between the spillover effects if both siblings' exposures affect each other's outcomes. The method fails in the presence of outcome-to-exposure spillover and outcome-to-outcome spillover. Analytic results and Monte Carlo simulations demonstrate the method and its limitations. To illustrate the method, we estimate the spillover effect of a child's preterm birth on an older sibling's literacy skills, measured by the Phonological Awareness Literacy Screening-Kindergarten test. We analyze 20,010 sibling pairs from a population-wide, Wisconsin-based (United States) birth cohort. Without covariate adjustment, we estimate that preterm birth modestly decreases an older sibling's test score (-2.11 points; 95% confidence interval: -3.82, -0.40 points). In conclusion, gain-scores are a promising strategy for identifying exposure-to-outcome spillovers in sibling pairs while controlling for sibling-invariant unobserved confounding in linear settings.
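The identifying trick is that differencing sibling outcomes cancels any shared, sibling-invariant confounder. A minimal simulation, with invented effect sizes and a one-directional spillover, shows a gain-score regression recovering the spillover effect despite the unobserved family factor.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Shared unobserved family confounder drives both siblings' exposures
u = rng.normal(size=n)
a1 = (u + rng.normal(size=n) > 0).astype(float)   # older sibling's exposure
a2 = (u + rng.normal(size=n) > 0).astype(float)   # younger sibling's exposure

beta, delta = 2.0, -1.0   # direct effect and spillover (a2 -> older's outcome)
y1 = beta * a1 + delta * a2 + u + rng.normal(size=n)   # older sibling's outcome
y2 = beta * a2 + u + rng.normal(size=n)                # younger sibling's outcome

# Gain-score regression: differencing outcomes removes the shared confounder u;
# gain = beta*a1 + (delta - beta)*a2 + noise
gain = y1 - y2
A = np.column_stack([np.ones(n), a1, a2])
coef, *_ = np.linalg.lstsq(A, gain, rcond=None)
beta_hat = coef[1]                # direct-effect estimate
delta_hat = coef[1] + coef[2]    # recovered spillover effect
```

A naive regression of y1 on (a1, a2) would be biased here because the exposures are correlated with the omitted confounder u; the gain score sidesteps this without observing u.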

Read more
Methodology

Estimating the Proportion of Signal Variables Under Arbitrary Covariance Dependence

Estimating the proportion of signals hidden among a large number of noise variables is of interest in many scientific inquiries. In this paper, we consider realistic but theoretically challenging settings with arbitrary covariance dependence between variables. We define the mean absolute correlation (MAC) to measure the overall dependence level and investigate a family of estimators over the full range of MAC. We make explicit the joint effect of MAC dependence and signal sparsity on the performance of this family of estimators and discover that no single estimator in the family is most powerful across all MAC dependence levels. Informed by this theoretical insight, we propose a new estimator that better adapts to arbitrary covariance dependence. The proposed method compares favorably to several existing methods in extensive finite-sample settings, with strong to weak covariance dependence and real dependence structures from genetic association studies.
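The MAC itself is simple to compute from a sample correlation matrix. The sketch below evaluates it on an equicorrelated design; the dimensions and correlation level are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 50, 200

# Equicorrelated design: every pair of variables has correlation rho
rho = 0.3
cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Mean absolute correlation (MAC): average |corr| over distinct variable pairs
R = np.corrcoef(X, rowvar=False)
off = R[np.triu_indices(p, k=1)]
mac = np.abs(off).mean()
```

Under this design the sample MAC concentrates near rho, so the statistic cleanly summarizes the overall dependence level that governs which estimator in the family performs best.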

Read more
Methodology

Estimating the number of communities by Stepwise Goodness-of-Fit

Given a symmetric network with n nodes, estimating the number of communities K is a fundamental problem. We propose Stepwise Goodness-of-Fit (StGoF) as a new approach to estimating K. For m = 1, 2, …, StGoF alternates between a community detection step (pretending m is the correct number of communities) and a goodness-of-fit step. We use SCORE \cite{SCORE} for community detection, and propose a new goodness-of-fit measure. Denote the goodness-of-fit statistic in step m by ψ_n^(m). We show that as n → ∞, ψ_n^(m) → N(0,1) when m = K and ψ_n^(m) → ∞ in probability when m < K. Therefore, with a proper threshold, StGoF terminates at m = K as desired. We consider a broad setting that allows severe degree heterogeneity, a wide range of sparsity, and especially weak signals. In particular, we propose a measure of the signal-to-noise ratio (SNR) and show that there is a phase transition: when SNR → 0 as n → ∞, consistent estimates of K do not exist, and when SNR → ∞, StGoF is consistent, uniformly over a broad class of settings. In this sense, StGoF achieves the optimal phase transition. Stepwise testing algorithms of this kind (e.g., \cite{wang2017likelihood, ma2018determining}) are known to face analytical challenges. We overcome these challenges by using a different design in the stepwise algorithm and by deriving sharp results in the under-fitting case (m < K) and the null case (m = K). The key to our analysis is to show that SCORE has the Non-Splitting Property (NSP). The NSP is non-obvious, so in addition to rigorous proofs, we also provide an intuitive explanation.

Read more
Methodology

Estimating the reciprocal of a binomial proportion

As a classic parameter of the binomial distribution, the binomial proportion has been well studied in the literature owing to its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, even though it also plays an important role in various fields including clinical studies and random sampling. The maximum likelihood estimator of the inverse proportion suffers from the zero-event problem, and alternative methods have been developed in the literature to overcome it. Nevertheless, there is little work addressing the optimality of the existing estimators or comparing their practical performance. Motivated by this, we further advance the literature by developing an optimal estimator for the inverse proportion within a family of shrinkage estimators. We also derive explicit and approximate formulas for the optimal shrinkage parameter under different settings. Simulation studies show that our new estimator performs better than, or as well as, the existing competitors in most practical settings. Finally, to illustrate the usefulness of the new method, we revisit a recent meta-analysis of COVID-19 data assessing the relative risks of coronavirus infection under physical distancing, in which six out of seven studies encounter the zero-event problem.
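The zero-event problem and the benefit of shrinkage are easy to see by simulation. The sketch below compares a naive plug-in fix against the classical shrinkage-type estimator (n + 1)/(X + 1); this is one familiar member of the shrinkage family, not the paper's optimal estimator, and the sample size and proportion are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 0.05          # small p: zero events are common
true_inverse = 1 / p     # target: reciprocal of the binomial proportion

x = rng.binomial(n, p, size=100_000)

# Naive plug-in n/x is undefined at x = 0 (the zero-event problem);
# a common ad hoc fix replaces x with max(x, 1)
naive = n / np.maximum(x, 1)

# A shrinkage-type estimator, (n + 1) / (x + 1), is finite for every x
shrink = (n + 1) / (x + 1)

mse_naive = np.mean((naive - true_inverse) ** 2)
mse_shrink = np.mean((shrink - true_inverse) ** 2)
```

In this regime roughly one in eight samples has zero events, and the shrinkage estimator achieves a noticeably smaller mean squared error than the truncated plug-in.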

Read more
Methodology

Estimating the treatment effect for adherers using multiple imputation

Randomized controlled trials are considered the gold standard for evaluating a treatment effect (estimand) for efficacy and safety. According to the recent International Council for Harmonisation (ICH) E9 addendum (R1), intercurrent events (ICEs) need to be considered when defining an estimand, and the principal stratum strategy is one of five strategies for handling ICEs. Qu et al. (2020, Statistics in Biopharmaceutical Research 12:1-18) proposed estimators of the adherer average causal effect (AdACE) for estimating the treatment difference among those who adhere to one or both treatments within the causal-inference framework, and demonstrated the consistency of those estimators. Due to the complexity of the estimators, however, no variance estimation formula was provided. In addition, the computational intensity of the estimation procedure makes it difficult to evaluate the performance of bootstrap confidence intervals (CIs). The current research implements estimators for the AdACE using multiple imputation (MI) and constructs CIs through bootstrapping. A simulation study shows that the MI-based estimators are consistent and that the CIs for the treatment difference attain nominal coverage probability for the adherent populations of interest. Application to a real dataset is illustrated by comparing two basal insulins for patients with type 1 diabetes.
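The nonparametric bootstrap step can be sketched in a few lines: resample each arm with replacement and read off percentile limits. The data here are synthetic stand-ins, and the sketch omits the multiple-imputation layer and the causal adjustments that the actual AdACE procedure requires.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical adherent-subgroup outcomes under two treatments
y_trt = rng.normal(1.0, 2.0, size=150)
y_ctl = rng.normal(0.0, 2.0, size=160)

point = y_trt.mean() - y_ctl.mean()

# Nonparametric bootstrap: resample each arm, recompute the difference
boots = []
for _ in range(2000):
    bt = rng.choice(y_trt, size=len(y_trt), replace=True)
    bc = rng.choice(y_ctl, size=len(y_ctl), replace=True)
    boots.append(bt.mean() - bc.mean())
lo, hi = np.percentile(boots, [2.5, 97.5])   # percentile 95% CI
```

For a complex estimator like the AdACE, each bootstrap replicate repeats the whole estimation pipeline, which is exactly why the abstract flags computational intensity as a practical obstacle.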

Read more
