Featured Research

Methodology

Interpretable Sensitivity Analysis for Balancing Weights

Assessing sensitivity to unmeasured confounding is an important step in observational studies, which typically estimate effects under the assumption that all confounders are measured. In this paper, we develop a sensitivity analysis framework for balancing weights estimators, an increasingly popular approach that solves an optimization problem to obtain weights that directly minimize covariate imbalance. In particular, we adapt a sensitivity analysis framework using the percentile bootstrap for a broad class of balancing weights estimators. We prove that the percentile bootstrap procedure can, with only minor modifications, yield valid confidence intervals for causal effects under restrictions on the level of unmeasured confounding. We also propose an amplification that allows for interpretable sensitivity parameters in the balancing weights framework. We illustrate our method through extensive real data examples.
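
To make the bootstrap component concrete, the following Python sketch shows a plain percentile bootstrap for a generic weighted effect estimator. It is an illustration only, not the authors' procedure: in the paper, the balancing weights would be re-solved within each bootstrap sample, and the point estimate would be replaced by its extrema under the assumed bound on unmeasured confounding. All function names here are hypothetical.

    import numpy as np

    def weighted_ate(y, t, w):
        # Hajek-style weighted difference in means between treated
        # (t == 1) and control (t == 0) units.
        return (np.average(y[t == 1], weights=w[t == 1])
                - np.average(y[t == 0], weights=w[t == 0]))

    def percentile_bootstrap_ci(y, t, w, B=2000, alpha=0.05, seed=0):
        # Resample units with replacement and take empirical quantiles
        # of the re-estimated effect. A faithful implementation would
        # also re-solve the balancing weights on each resample.
        rng = np.random.default_rng(seed)
        n = len(y)
        est = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n)
            while t[idx].min() == t[idx].max():  # keep both arms present
                idx = rng.integers(0, n, size=n)
            est[b] = weighted_ate(y[idx], t[idx], w[idx])
        return np.quantile(est, [alpha / 2, 1 - alpha / 2])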

Methodology

Intrinsic Riemannian Functional Data Analysis for Sparse Longitudinal Observations

A novel framework is developed to intrinsically analyze sparsely observed Riemannian functional data. It features four innovative components: a frame-independent covariance function, a smooth vector bundle termed the covariance vector bundle, a parallel transport, and a smooth bundle metric on the covariance vector bundle. The introduced intrinsic covariance function links estimation of the covariance structure to smoothing problems that involve raw covariance observations derived from sparsely observed Riemannian functional data, while the covariance vector bundle provides a rigorous mathematical foundation for formulating these smoothing problems. The parallel transport and the bundle metric together make it possible to measure fidelity of fit to the covariance function; they also play a critical role in quantifying the quality of estimators for the covariance function. As an illustration, based on the proposed framework, we develop a local linear smoothing estimator for the covariance function, analyze its theoretical properties, and provide numerical demonstrations via simulated and real datasets. The intrinsic feature of the framework makes it applicable not only to Euclidean submanifolds but also to manifolds without a canonical ambient space.
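
Schematically (in our notation, not necessarily the paper's), the local linear covariance smoother lifts the classical raw-covariance smoothing objective of Euclidean functional data analysis to the bundle: with raw covariance observations C_{ijl} built from subject i at time pairs (t_{ij}, t_{il}), one solves

    \min_{A_0, A_1, A_2} \sum_i \sum_{j \neq l}
      K_h(t_{ij}-s)\, K_h(t_{il}-t)\,
      \big\| \mathcal{P}(C_{ijl}) - A_0 - A_1 (t_{ij}-s) - A_2 (t_{il}-t) \big\|_g^2,

where \mathcal{P} denotes parallel transport of each raw observation into the fiber of the covariance vector bundle at (s, t) and \|\cdot\|_g is the bundle metric; this is precisely where the transport and the metric enter as measures of fidelity of fit.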

Methodology

Invited Discussion of "A Unified Framework for De-Duplication and Population Size Estimation"

Invited discussion of "A Unified Framework for De-Duplication and Population Size Estimation," published in Bayesian Analysis. My discussion focuses on two main themes: providing a more nuanced picture of the costs and benefits of joint models for record linkage and the "downstream task" (i.e., whatever we might want to do with the linked and de-duplicated files), and considering how we should measure performance.

Methodology

Joint Estimation of Location and Scatter in Complex Elliptical Distributions: A robust semiparametric and computationally efficient R-estimator of the shape matrix

The joint estimation of the location vector and the shape matrix of a set of independent and identically Complex Elliptically Symmetric (CES) distributed observations is investigated from both the theoretical and computational viewpoints. This joint estimation problem is framed in the original context of semiparametric models, allowing us to handle the (generally unknown) density generator as an infinite-dimensional nuisance parameter. In the first part of the paper, a computationally efficient and memory-saving implementation of the robust and semiparametric efficient R-estimator for shape matrices is derived. Building upon this result, in the second part, a joint estimator, relying on Tyler's M-estimator of location and on the R-estimator of shape matrix, is proposed, and its Mean Squared Error (MSE) performance is compared with the Constrained Semiparametric Cramér-Rao Bound (CSCRB).
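
For orientation, the classical fixed-point iteration for Tyler's joint M-estimation of location and (trace-normalized) shape, which the proposed R-estimator builds on and refines, can be sketched as follows. This is illustrative Python for complex-valued data, not the paper's implementation:

    import numpy as np

    def tyler_joint(X, max_iter=200, tol=1e-9):
        # X: (n, p) array of observations (rows), possibly complex.
        # Returns Tyler's joint estimates of location and shape, with
        # the shape matrix normalized so that trace(V) = p.
        n, p = X.shape
        mu = X.mean(axis=0)
        V = np.eye(p, dtype=X.dtype)
        for _ in range(max_iter):
            Vinv = np.linalg.inv(V)
            Z = X - mu
            # Mahalanobis-type distances d_i = sqrt(z_i^H V^{-1} z_i)
            d = np.sqrt(np.real(
                np.einsum('ij,jk,ik->i', Z.conj(), Vinv, Z)))
            d = np.maximum(d, 1e-12)
            # Location step: weighted mean with weights 1 / d_i
            mu = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
            Z = X - mu
            # Shape step: (p/n) * sum_i z_i z_i^H / d_i^2, renormalized
            V_new = (p / n) * (Z / d[:, None] ** 2).T @ Z.conj()
            V_new *= p / np.real(np.trace(V_new))
            if np.linalg.norm(V_new - V) < tol:
                V = V_new
                break
            V = V_new
        return mu, V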

Methodology

Joint Variable Selection of both Fixed and Random Effects for Gaussian Process-based Spatially Varying Coefficient Models

Spatially varying coefficient (SVC) models are a type of regression model for spatial data in which covariate effects vary over space. If there are several covariates, a natural question is which covariates have a spatially varying effect and which do not. We present a new variable selection approach for Gaussian process-based SVC models. It relies on penalized maximum likelihood estimation (PMLE) and allows variable selection with respect to both fixed effects and Gaussian process random effects. We validate our approach in both a simulation study and a real-world data set. Our novel approach shows good selection performance in the simulation study. In the real data application, our proposed PMLE yields sparser SVC models and achieves a smaller information criterion than classical MLE. In a cross-validation applied to the real data, we show that the sparser PML-estimated SVC models are on par with ML-estimated SVC models with respect to predictive performance.
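
Schematically, the PMLE trades off fit against the complexity of both effect types. With fixed effects \beta_j and Gaussian process prior variances \sigma_j^2 for the spatially varying parts, an objective of the form

    \min_{\beta, \sigma^2, \vartheta} \; -\ell(\beta, \sigma^2, \vartheta; y)
      + \lambda_1 \sum_j p_1(|\beta_j|) + \lambda_2 \sum_j p_2(\sigma_j^2)

is minimized, so that shrinking \beta_j to zero drops the j-th fixed effect while shrinking \sigma_j^2 to zero removes the corresponding Gaussian process random effect. (The exact penalties p_1, p_2 and their tuning are as specified in the paper; this display is only meant to fix ideas.)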

Methodology

Kernel Ordinary Differential Equations

Ordinary differential equations (ODEs) are widely used to model biological and physical processes in science. In this article, we propose a new reproducing kernel-based approach for estimation and inference of ODEs given noisy observations. We do not restrict the functional forms in the ODE to be linear or additive, and we allow pairwise interactions. We perform sparse estimation to select individual functionals and construct confidence intervals for the estimated signal trajectories. We establish the estimation optimality and selection consistency of kernel ODE under both low-dimensional and high-dimensional settings, where the number of unknown functionals can be smaller or larger than the sample size. Our proposal builds upon the smoothing spline analysis of variance (SS-ANOVA) framework, but tackles several important problems that are not yet fully addressed there, and thus also extends the scope of existing SS-ANOVA methods. We demonstrate the efficacy of our method through numerous ODE examples.
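
In symbols, the model class can be pictured as a system

    \frac{d x_j(t)}{dt} = F_j(x(t)), \qquad
    F_j(x) = b_j + \sum_k f_{jk}(x_k) + \sum_{k < l} f_{jkl}(x_k, x_l),
    \qquad j = 1, \dots, p,

with each component function lying in a reproducing kernel Hilbert space of the SS-ANOVA decomposition; sparse estimation then selects which main-effect and pairwise-interaction functionals are nonzero. (This display is a schematic reading of the abstract, not a verbatim equation from the paper.)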

Methodology

Kernel learning approaches for summarising and combining posterior similarity matrices

When using Markov chain Monte Carlo (MCMC) algorithms to perform inference for Bayesian clustering models, such as mixture models, the output is typically a sample of clusterings (partitions) drawn from the posterior distribution. In practice, a key challenge is how to summarise this output. Here we build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models. A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings, and otherwise exploit the information summarised by PSMs. For example, if we have multiple PSMs, each corresponding to a different dataset on a common set of statistical units, we may use standard methods for combining kernels in order to perform integrative clustering. Moreover, we may embed PSMs within predictive kernel models in order to perform outcome-guided data integration. We demonstrate the performance of the proposed methods through a range of simulation studies as well as two real data applications. R code is available at this https URL.
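
The PSM itself is simple to compute from MCMC output. The following Python sketch (ours, not the authors' released R code) builds a PSM from sampled partitions, averages PSMs from several datasets into a combined kernel, and feeds the result to an off-the-shelf kernel method that accepts a precomputed affinity:

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def posterior_similarity_matrix(partitions):
        # partitions: (S, n) integer array; entry [s, i] is the cluster
        # label of unit i in MCMC sample s. The (i, j) entry of the PSM
        # is the fraction of samples in which i and j co-cluster, so the
        # PSM is an average of positive semi-definite co-clustering
        # indicator matrices and is itself positive semi-definite.
        S, n = partitions.shape
        psm = np.zeros((n, n))
        for z in partitions:
            psm += (z[:, None] == z[None, :])
        return psm / S

    def integrative_clustering(psm_list, n_clusters):
        # Combine PSMs from several datasets on the same units by
        # averaging (an unweighted multiple-kernel combination), then
        # summarize with spectral clustering on the combined kernel.
        K = np.mean(psm_list, axis=0)
        model = SpectralClustering(n_clusters=n_clusters,
                                   affinity='precomputed')
        return model.fit_predict(K)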

Methodology

Kernel-Distance-Based Covariate Balancing

A common concern in observational studies is how to properly evaluate the causal effect, which usually refers to the average treatment effect or the average treatment effect on the treated. In this paper, we propose a data preprocessing method, Kernel-Distance-Based Covariate Balancing, for observational studies with binary treatments. The proposed method yields a set of unit weights for the treatment and control groups, respectively, such that the reweighted covariate distributions satisfy a set of pre-specified balance conditions. This preprocessing can effectively reduce the confounding bias of subsequent causal effect estimation. We demonstrate the implementation and performance of Kernel-Distance-Based Covariate Balancing with Monte Carlo simulation experiments and a real data analysis.
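
One way to picture the idea is to choose weights that minimize a kernel distance, such as the maximum mean discrepancy (MMD), between the reweighted control covariates and the treated covariates. The Python sketch below is an illustration under our own simplifying choices (an RBF kernel, weights on the control group only), not the paper's exact formulation:

    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(A, B, gamma=1.0):
        # Gaussian kernel matrix between the rows of A and of B.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def kernel_balancing_weights(Xc, Xt, gamma=1.0):
        # Weights on control units so that the weighted control
        # covariate distribution matches the treated one in MMD.
        Kcc = rbf_kernel(Xc, Xc, gamma)
        Kct = rbf_kernel(Xc, Xt, gamma)
        nc, nt = Xc.shape[0], Xt.shape[0]
        u = np.full(nt, 1.0 / nt)

        def mmd2(w):
            # Squared MMD up to a constant term u' Ktt u, which does
            # not depend on w and so does not affect the minimizer.
            return w @ Kcc @ w - 2.0 * (w @ Kct @ u)

        w0 = np.full(nc, 1.0 / nc)
        res = minimize(mmd2, w0, method='SLSQP',
                       bounds=[(0.0, None)] * nc,
                       constraints={'type': 'eq',
                                    'fun': lambda w: w.sum() - 1.0})
        return res.x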

Methodology

Kullback-Leibler-Based Discrete Relative Risk Models for Integration of Published Prediction Models with New Dataset

The existing literature on prediction of time-to-event data has primarily focused on risk factors from an individual dataset. However, such analyses may suffer from small sample sizes, high dimensionality, and low signal-to-noise ratios. To improve prediction stability and better understand the risk factors associated with outcomes of interest, we propose a Kullback-Leibler-based discrete relative risk modeling procedure. Simulations and a real data analysis are conducted to show the advantage of the proposed methods compared with those based solely on the local dataset or on prior models.
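
Schematically, with \ell(\beta; D) the discrete relative risk log-likelihood on the new dataset D and \tilde{p} the published prediction model, estimation takes the penalized form

    \hat{\beta} = \arg\max_{\beta} \; \ell(\beta; D)
      - \eta \, \mathrm{KL}\!\left( \tilde{p} \,\|\, p_{\beta} \right),

where \eta \geq 0 controls how strongly the fit is pulled toward the prior model: \eta = 0 recovers the purely local fit, while large \eta defers to the published model. (This display is a schematic of the stated idea, not the paper's exact estimator.)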

Methodology

Latent Causal Socioeconomic Health Index

This research develops a model-based LAtent Causal Socioeconomic Health (LACSH) index at the national level. We build upon the latent health factor index (LHFI) approach that has been used to assess unobservable ecological/ecosystem health. This framework integratively models the relationship between the metrics, the latent health, and the covariates that drive the notion of health. In this paper, the LHFI structure is integrated with spatial modeling and statistical causal modeling so as to evaluate the impact of a continuous policy variable (mandatory maternity leave days and government expenditure on healthcare, respectively) on a nation's socioeconomic health, while formally accounting for spatial dependency among the nations. A novel visualization technique for evaluating covariate balance is also introduced for the case of a continuous policy (treatment) variable. We apply our LACSH model to countries around the world using data on various metrics and potential covariates pertaining to different aspects of societal health. The approach is structured in a Bayesian hierarchical framework, and results are obtained by Markov chain Monte Carlo techniques.
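
One plausible schematic of the hierarchy (ours, for orientation only, not the paper's exact specification) is

    Y_{ik} \mid H_i \sim f_k(\lambda_k H_i), \qquad
    H_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \tau Z_i + \phi_i, \qquad
    \boldsymbol{\phi} \sim \text{spatial prior},

where Y_{ik} are the observed societal-health metrics for nation i, H_i is the latent socioeconomic health, Z_i is the continuous policy (treatment) variable of interest, and \phi_i captures spatial dependence among the nations; posterior inference then proceeds by MCMC.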

