Featured Research

Methodology

Robust Clustering with Normal Mixture Models: A Pseudo β-Likelihood Approach

As in other estimation scenarios, likelihood-based estimation in the normal mixture set-up is highly non-robust against model misspecification and the presence of outliers (apart from being an ill-posed optimization problem). We propose a robust alternative to the ordinary likelihood approach for this estimation problem that performs simultaneous estimation and data clustering and enables subsequent anomaly detection. To invoke robustness, we follow, in spirit, the methodology based on the minimization of the density power divergence (or, equivalently, the maximization of the β-likelihood) under suitable constraints. An iteratively reweighted least squares approach is used to compute our estimators of the component means (equivalently, cluster centers) and component dispersion matrices, which leads to simultaneous data clustering. We also suggest some exploratory techniques for anomaly detection, a problem of great importance in statistics and machine learning. Existence and consistency of the estimators are established under the aforesaid constraints. We validate our method with simulation studies under different set-ups; it performs competitively with or better than popular existing methods such as K-means and TCLUST, especially when the mixture components (i.e., the clusters) share regions of significant overlap or outlying clusters exist with small but non-negligible weights. Two real datasets are also used to illustrate the performance of our method in comparison with others, along with an application in image processing. Our method detects the clusters with lower misclassification rates and successfully flags the outlying (anomalous) observations in these datasets.
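As a minimal illustration of how the β-likelihood induces robustness, the sketch below (a one-dimensional, single-component toy, not the paper's full mixture algorithm; the data and the tuning value β = 0.5 are made up) weights each observation by its fitted density raised to the power β before averaging, which is the kind of reweighting step an iteratively reweighted scheme repeats:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def beta_weighted_mean(data, mu, sigma, beta):
    """One reweighting step: each point gets weight f(x)^beta, so
    low-density points (outliers) are down-weighted; beta = 0 gives
    every point weight 1, i.e., the ordinary mean update."""
    weights = [gaussian_pdf(x, mu, sigma) ** beta for x in data]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, data)) / total

data = [0.1, -0.2, 0.3, 0.0, 25.0]    # bulk near 0 plus one gross outlier
robust = beta_weighted_mean(data, mu=0.0, sigma=1.0, beta=0.5)
plain = sum(data) / len(data)          # = 5.04, dragged toward the outlier
```

Setting β = 0 recovers the maximum likelihood (equal-weight) update; larger β discounts low-density points more aggressively, trading some efficiency for robustness.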

Methodology

Robust Differential Abundance Test in Compositional Data

Differential abundance tests for compositional data are essential and fundamental tasks in various biomedical applications, such as single-cell, bulk RNA-seq, and microbiome data analysis. Despite recent developments in these fields, differential abundance analysis for compositional data remains a complicated and unsolved statistical problem because of the compositional constraint and the prevalence of zero counts in such datasets. This paper introduces a new differential abundance test, referred to as the robust differential abundance (RDB) test, to address these challenges. Compared with existing methods, the RDB test 1) is simple and computationally efficient, 2) is robust to prevalent zero counts in compositional datasets, 3) takes the data's compositional nature into account, and 4) has a theoretical guarantee of false discovery control in a general setting. Furthermore, in the presence of observed covariates, the RDB test can work with covariate balancing techniques to remove potential confounding effects and draw reliable conclusions. To demonstrate its practical merits, we apply the new test to several numerical examples using both simulated and real datasets.
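The compositional constraint the abstract refers to can be seen in a tiny sketch (hypothetical taxon counts; this illustrates the constraint itself, not the RDB test):

```python
def closure(counts):
    """Map raw counts to relative abundances on the simplex.
    Compositional data carry only relative information: the parts of
    each sample sum to 1, so an absolute increase in one taxon shows
    up as apparent relative decreases in all the others."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical 3-taxon sample: only taxon 0 changes in absolute terms,
# yet every relative abundance moves after closure.
before = closure([100, 50, 50])   # [0.50, 0.25, 0.25]
after = closure([300, 50, 50])    # [0.75, 0.125, 0.125]
```

It is exactly this coupling between parts (plus zero counts, which break log-ratio transforms) that makes naive per-taxon testing unreliable.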

Methodology

Robust Extrinsic Regression Analysis for Manifold-Valued Data

Recently, there has been a growing need to analyze data on manifolds owing to their important role in diverse fields of science and engineering. To date, however, only a few works in the manifold-valued data analysis literature have addressed robustness of estimation against noise, outliers, and other sources of perturbation. In this regard, we introduce a novel extrinsic framework for analyzing manifold-valued data in a robust manner. First, by extending the notion of the geometric median, we propose a new robust location parameter on manifolds, which we call the extrinsic median. A robust extrinsic regression method is also developed by incorporating the conditional extrinsic median into the classical local polynomial regression method. We present Weiszfeld's algorithm for implementing the proposed methods. The promising performance of our approach against existing methods is illustrated through simulation studies.
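Weiszfeld's algorithm for the geometric median is classical; a minimal Euclidean sketch (the extrinsic approach would first embed the manifold in a Euclidean space; the data here are made up) looks like:

```python
import math

def weiszfeld(points, iters=100, eps=1e-9):
    """Weiszfeld's algorithm: fixed-point iteration for the geometric
    median, the point minimizing the sum of Euclidean distances to the
    data. Each step is a distance-inverse weighted average, so distant
    outliers receive small weights."""
    d = len(points[0])
    # start at the coordinate-wise mean
    y = [sum(p[k] for p in points) / len(points) for k in range(d)]
    for _ in range(iters):
        num = [0.0] * d
        den = 0.0
        for p in points:
            dist = math.dist(p, y)
            if dist < eps:          # iterate landed on a data point
                return list(p)
            w = 1.0 / dist
            den += w
            for k in range(d):
                num[k] += w * p[k]
        y = [num[k] / den for k in range(d)]
    return y

# Four points on the unit square plus one gross outlier: the geometric
# median stays near the bulk, while the mean is pulled toward (100, 100).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (100.0, 100.0)]
med = weiszfeld(pts)
```

The breakdown point of the geometric median is 1/2, versus 0 for the mean, which is the robustness property the extrinsic median inherits.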

Methodology

Robust Functional Principal Component Analysis for Non-Gaussian Continuous Data to Analyze Physical Activity Data Generated by Accelerometers

Motivated by energy expenditure observations measured by a wearable device, we propose two functional principal component analysis (FPCA) methods, Spearman FPCA and Kendall FPCA, for non-Gaussian continuous data. Energy expenditure records measured during physical activity can be modeled as functional data. They often exhibit non-Gaussian features, under which classical FPCA can be invalid. To handle this issue, we develop two robust FPCA estimators within the framework of rank statistics. Via extensions of Spearman's rank correlation estimator and Kendall's τ correlation estimator, a robust algorithm for FPCA is developed to fit the model. The two estimators are applied to analyze physical activity data collected by a wearable accelerometer monitor. The effectiveness of the proposed methods is also demonstrated through a comprehensive simulation study.
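The rank-statistic idea can be sketched for scalar samples (a pure-Python toy illustrating Spearman's ρ, not the functional estimator itself; the data are made up):

```python
def ranks(xs):
    """Midranks: 1-based ranks, with tied values sharing the average."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1      # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks. Only ranks
    enter, so a single gross outlier cannot dominate the estimate."""
    return pearson(ranks(a), ranks(b))

x = [1.0, 2.0, 3.0, 4.0, 1000.0]   # monotone, with one extreme value
y = [2.0, 4.0, 6.0, 8.0, 10.0]
rho = spearman(x, y)                # 1.0: perfectly monotone relation
```

Replacing the sample covariance with such rank-based correlation surfaces is what makes the resulting eigenanalysis insensitive to heavy tails.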

Methodology

Robust Functional Principal Component Analysis for Non-Gaussian Longitudinal Data

Functional principal component analysis is essential in functional data analysis, but its inferences become unconvincing when non-Gaussian characteristics such as heavy tails and skewness occur. The focus of this paper is to develop a robust functional principal component analysis methodology for non-Gaussian longitudinal data, for which sparsity and irregularity, along with non-negligible measurement errors, must be considered. We introduce a Kendall's τ function whose particular properties make it a nice proxy for the covariance function in the eigenequation when handling non-Gaussian cases. Moreover, the estimation procedure is presented and the asymptotic theory is established. We further demonstrate the superiority and robustness of our method through simulation studies and apply it to longitudinal CD4 cell count data from an AIDS study.
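Kendall's τ for scalar pairs, the building block behind such a τ function, can be sketched as follows (a toy illustration, not the functional estimator; the data are made up):

```python
def kendall_tau(a, b):
    """Kendall's tau: the average sign of concordance over all pairs.
    Each pair contributes only +1, -1, or 0, so an extreme value can
    never contribute more than any other observation."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            da = (a[i] > a[j]) - (a[i] < a[j])   # sign of a[i] - a[j]
            db = (b[i] > b[j]) - (b[i] < b[j])
            s += da * db
    return 2.0 * s / (n * (n - 1))

x = [1.0, 2.0, 3.0, 1000.0]        # monotone, with one extreme value
y = [0.1, 0.2, 0.3, 0.4]
tau = kendall_tau(x, y)            # 1.0 despite the extreme value in x
```

Because it depends on the data only through pairwise signs, τ has bounded influence, which is exactly the property that makes it a robust covariance proxy in the eigenequation.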

Methodology

Robust Functional Principal Component Analysis via Functional Pairwise Spatial Signs

Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well in the presence of outliers. To address this challenge, we introduce a new robust FPCA approach based on the functional pairwise spatial sign (PASS) operator, termed PASS FPCA, and propose estimation procedures for both eigenfunctions and eigenvalues, with and without measurement error. Compared to existing robust FPCA methods, the proposed one requires weaker distributional assumptions to conserve the eigenspace of the covariance function. In particular, we introduce a class of distributions called the weakly functional coordinate symmetric (weakly FCS) class, which allows for severe asymmetry and is strictly larger than the class of functional elliptical distributions, the latter being widely used in the robust statistics literature. The robustness of PASS FPCA is demonstrated via simulation studies and analyses of accelerometry data from a large-scale epidemiological study of physical activity in older women that partly motivates this work.
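The pairwise spatial sign idea is easiest to see for finite-dimensional vectors (the functional version replaces vectors with curves; the data below are made up):

```python
import math

def pass_operator(points):
    """Pairwise spatial sign operator: the average outer product of the
    unit vectors (x_i - x_j)/||x_i - x_j|| over all pairs. Normalizing
    each difference to unit length caps any single pair's influence,
    while the principal directions of the data are preserved."""
    d = len(points[0])
    S = [[0.0] * d for _ in range(d)]
    npairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            diff = [a - b for a, b in zip(points[i], points[j])]
            norm = math.sqrt(sum(c * c for c in diff))
            if norm == 0.0:        # skip coincident points
                continue
            u = [c / norm for c in diff]
            for r in range(d):
                for c in range(d):
                    S[r][c] += u[r] * u[c]
            npairs += 1
    return [[v / npairs for v in row] for row in S]

# Points spread mostly along the first axis, one far away: the operator
# still concentrates on that axis because every pair is unit-normalized.
pts = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.05), (100.0, 0.0)]
S = pass_operator(pts)
```

Each unit outer product has trace 1, so the operator has trace 1 regardless of how extreme individual observations are; its eigenvectors estimate the same eigenspace as the covariance under the stated distributional assumptions.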

Methodology

Robust Hypothesis Testing and Model Selection for Parametric Proportional Hazard Regression Models

The semi-parametric Cox proportional hazards regression model has been widely used for many years in several applied sciences. However, a fully parametric proportional hazards model, if appropriately assumed, can often lead to more efficient inference. To tackle the extreme non-robustness of the traditional maximum likelihood estimator in the presence of outliers under such fully parametric proportional hazards models, a robust estimation procedure extending the concept of the minimum density power divergence estimator (MDPDE) has recently been proposed in this set-up. In this paper, we consider the problem of statistical inference under the parametric proportional hazards model and develop robust Wald-type hypothesis testing and model selection procedures using the MDPDEs. We also derive the necessary asymptotic results used to construct the testing procedure for general composite hypotheses and study its asymptotic powers. The claimed robustness properties are studied theoretically via appropriate influence function analyses. We study the finite-sample level and power of the proposed MDPDE-based Wald-type test through extensive simulations, where comparisons are also made with existing semi-parametric methods. The important issue of selecting an appropriate robustness tuning parameter is also discussed. The practical usefulness of the proposed robust testing and model selection procedures is finally illustrated through three interesting real data examples.
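A generic scalar Wald-type test, into which a robust estimate such as an MDPDE and its standard error would be plugged, can be sketched as follows (all numbers are hypothetical, and this is the textbook Wald construction, not the paper's full composite-hypothesis machinery):

```python
def wald_statistic(theta_hat, theta_null, std_err):
    """Scalar Wald-type statistic W = ((theta_hat - theta_null)/se)^2,
    compared against a chi-square(1) critical value. With a robust
    estimator and its standard error plugged in, outliers perturb W
    far less than with the maximum likelihood estimator."""
    return ((theta_hat - theta_null) / std_err) ** 2

CHI2_1_95 = 3.841  # 95th percentile of the chi-square(1) distribution

# Hypothetical numbers: estimate 0.9 against H0: theta = 0, se = 0.25
W = wald_statistic(0.9, 0.0, 0.25)
reject = W > CHI2_1_95
```

The robustness tuning parameter of the MDPDE enters through both theta_hat and std_err; larger tuning values stabilize W under contamination at some cost in power.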

Methodology

Robust Model-Based Clustering

We propose a new class of robust and Fisher-consistent estimators for mixture models. These estimators can be used to construct robust model-based clustering procedures. We study in detail the case of multivariate normal mixtures and propose a procedure that uses S-estimators of multivariate location and scatter. We develop an algorithm, quite similar to the EM algorithm, to compute the estimators and build the clusters. An extensive Monte Carlo simulation study shows that our proposal compares favorably with other robust and non-robust model-based clustering procedures. We apply our procedure and several alternatives to a real data set and again find that our proposal gives the best results.

Methodology

Robust inference of conditional average treatment effects using dimension reduction

Robust inference of the conditional average treatment effect from observational data is important but becomes challenging when the confounder is multivariate or high-dimensional. In this article, we propose a double dimension reduction method, which mitigates the curse of dimensionality as much as possible while retaining the nonparametric merit. We identify the central mean subspace of the conditional average treatment effect using dimension reduction. A nonparametric regression with prior dimension reduction is also used to impute counterfactual outcomes. This step helps improve the stability of the imputation and leads to a better estimator than existing methods. We then propose an effective bootstrap procedure, which does not bootstrap the estimated central mean subspace, to make valid inference.

Methodology

Robust optimal estimation of location from discretely sampled functional data

Estimating location is a central problem in functional data analysis, yet most current estimation procedures either unrealistically assume completely observed trajectories or lack robustness to the many kinds of anomalies one can encounter in the functional setting. To remedy these deficiencies, we introduce the first class of optimal robust location estimators based on discretely sampled functional data. The proposed method is based on M-type smoothing spline estimation with repeated measurements and is suitable for both commonly and independently observed trajectories subject to measurement error. We show that, under suitable assumptions, the proposed family of estimators is minimax rate optimal for both commonly and independently observed trajectories, and we illustrate its highly competitive performance and practical usefulness in a Monte Carlo study and a real-data example involving recent COVID-19 data.
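The M-type idea underlying the method can be illustrated in its simplest form, a Huber location estimate computed by iterative reweighting (a toy sketch, not the paper's smoothing-spline estimator; the data and the conventional cutoff c = 1.345 are illustrative):

```python
def huber_location(data, c=1.345, iters=50):
    """M-type location estimate with Huber's loss, computed by
    iteratively reweighted averaging: residuals within c of the
    current estimate get weight 1, larger residuals get weight c/|r|,
    which bounds each observation's influence."""
    mu = sorted(data)[len(data) // 2]   # start at the sample median
    for _ in range(iters):
        weights = []
        for x in data:
            r = abs(x - mu)
            weights.append(1.0 if r <= c else c / r)
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu

data = [0.2, -0.1, 0.4, 0.1, -0.3, 0.0, 50.0]   # bulk near 0, one outlier
mu_robust = huber_location(data)
mu_mean = sum(data) / len(data)                  # ruined by the outlier
```

In the smoothing-spline setting the same bounded-influence loss replaces the squared error in the penalized least squares criterion, which is what protects the estimated mean function from anomalous trajectories.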
