Featured Research

Methodology

Nested Group Testing Procedures for Screening

This article reviews a class of adaptive group testing procedures that operate under the following probabilistic model. Consider a set of N items, where item i is defective with probability p (p_i in generalized group testing) and non-defective with probability 1 − p, independently of the other items. A group test applied to any subset of n items is a binary test with two possible outcomes, positive or negative: the outcome is negative if all n items are non-defective, and positive if at least one of the n items is defective. The goal is complete identification of all N items with the minimum expected number of tests.
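As a concrete illustration, here is a minimal Python sketch of a two-stage (Dorfman-type) procedure, one of the simplest members of the nested class; the prevalence p and the pool-size search range are illustrative, and the procedures reviewed in the article are more general.

```python
# Two-stage (Dorfman-type) group testing: test a pool of size n;
# if the pool is positive, retest each of its n items individually.
def expected_tests_per_item(p: float, n: int) -> float:
    q = (1.0 - p) ** n                   # P(pool of n is entirely non-defective)
    return (1.0 + n * (1.0 - q)) / n     # 1 pool test + n retests if positive

# choose the pool size that minimizes the expected number of tests per item
p = 0.01
best_n = min(range(2, 101), key=lambda n: expected_tests_per_item(p, n))
print(best_n, expected_tests_per_item(p, best_n))   # pool size ~11, ~0.2 tests/item
```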

Read more
Methodology

New estimation approaches for graphical models with elastic net penalty

In the context of undirected Gaussian graphical models, we introduce three estimators of the underlying dependence graph based on the elastic net penalty. Our goal is to estimate the sparse precision matrix, from which both the underlying conditional dependence graph and the partial correlation graph can be retrieved. The first estimator is derived from direct penalization of the precision matrix in the likelihood function, while the second uses conditional penalized regressions to estimate the precision matrix. Finally, the third estimator relies on a two-stage procedure that estimates the edge set first and then the precision matrix elements. Through simulations, we investigate the performance of the proposed methods on a large set of well-known network structures. Empirical results on simulated data show that the two-stage procedure outperforms all other estimators with respect to estimating both the sparsity pattern in the graph and the edge weights. Finally, using real-world data on US economic sectors, we estimate dependencies and show the impact of the COVID-19 pandemic on the network strength.
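The regression-based idea behind the second estimator can be sketched as follows; this is a minimal illustration using scikit-learn's ElasticNet for the conditional regressions, with illustrative tuning values, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def neighborhood_selection(X, alpha=0.1, l1_ratio=0.5):
    """Recover the conditional dependence graph by regressing each variable
    on all the others with an elastic net penalty; an edge is kept when the
    corresponding coefficient is non-zero (tuning values are illustrative)."""
    n, d = X.shape
    adj = np.zeros((d, d), dtype=bool)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        fit = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X[:, others], X[:, j])
        adj[j, others] = fit.coef_ != 0
    return adj | adj.T   # symmetrize with the OR rule

X = np.random.default_rng(0).standard_normal((200, 10))
print(neighborhood_selection(X).sum() // 2)   # number of estimated edges
```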

Read more
Methodology

New randomized response technique for estimating the population total of a quantitative variable

In this paper, a new randomized response technique aimed at protecting respondents' privacy is proposed. It is designed for estimating the population total, or the population mean, of a quantitative characteristic. It provides a high degree of protection to the interviewed individuals, so it may be favorably perceived by them and increase their willingness to cooperate. Instead of revealing the true value of the characteristic under investigation, a respondent only states whether the value is greater (or smaller) than a number which is selected by him/her at random and is unknown to the interviewer. For each respondent this number, a sort of individual threshold, is generated as a pseudorandom number from the uniform distribution. Further, two modifications of the proposed technique are presented. The first modification assumes that the interviewer also knows the generated random number. The second modification deals with the issue that, for certain variables such as income, it may be embarrassing for respondents to report either high or low values. Thus, depending on the value of the pseudorandom lower bound, the respondent is asked different questions to avoid embarrassment. The suggested approach is worked out in detail for simple random sampling without replacement, but it can also be applied to many currently used sampling schemes, including cluster sampling, two-stage sampling, etc. Results of simulations illustrate the behavior of the proposed procedure.
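A minimal simulation of the core idea, assuming for illustration that the variable lies in a known interval [0, M] and that the private threshold is uniform on that interval: since P(Y > U) = E[Y]/M, the mean of the binary answers rescaled by M estimates the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100.0                              # assumed known upper bound of the variable
y = rng.uniform(10, 90, size=5000)     # true values, never revealed

u = rng.uniform(0, M, size=y.size)     # private threshold drawn by each respondent
answers = (y > u).astype(float)        # only the indicator 1{Y > U} is reported

# P(answer = 1) = E[Y] / M, so rescaling the answer mean estimates the true mean
print(M * answers.mean(), y.mean())
```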

Read more
Methodology

Non-parametric regression for networks

Network data are becoming increasingly available, and so there is a need to develop suitable methodology for statistical analysis. Networks can be represented as graph Laplacian matrices, which are a type of manifold-valued data. Our main objective is to estimate a regression curve from a sample of graph Laplacian matrices conditional on a set of Euclidean covariates, for example in dynamic networks where the covariate is time. We develop an adapted Nadaraya-Watson estimator which has uniform weak consistency for estimation using Euclidean and power Euclidean metrics. We apply the methodology to the Enron email corpus to model smooth trends in monthly networks and highlight anomalous networks. Another motivating application is given in corpus linguistics, which explores trends in an author's writing style over time based on word co-occurrence networks.
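A minimal sketch of the adapted Nadaraya-Watson estimator under the plain Euclidean metric (the power Euclidean case adds a matrix-power transform, omitted here); the function name and the toy data are illustrative.

```python
import numpy as np

def nw_laplacian(t0, times, laplacians, h):
    """Nadaraya-Watson estimate at covariate value t0: a kernel-weighted
    Euclidean mean of graph Laplacians (a convex combination of Laplacians
    is again a valid weighted-graph Laplacian)."""
    w = np.exp(-0.5 * ((times - t0) / h) ** 2)    # Gaussian kernel weights
    return np.tensordot(w / w.sum(), laplacians, axes=1)

# toy usage: 24 monthly networks on 5 nodes, smoothed at month 12.5
rng = np.random.default_rng(1)
A = np.triu((rng.random((24, 5, 5)) < 0.3).astype(float), 1)
A = A + A.transpose(0, 2, 1)                               # adjacency matrices
L = np.einsum('tij->ti', A)[:, :, None] * np.eye(5) - A   # L = D - A
print(nw_laplacian(12.5, np.arange(24), L, h=2.0))
```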

Read more
Methodology

Nonintrusive Uncertainty Quantification for automotive crash problems with VPS/Pamcrash

Uncertainty Quantification (UQ) is a key discipline for the computational modeling of complex systems, enhancing the reliability of engineering simulations. In crashworthiness, an accurate assessment of model uncertainty allows reducing the number of prototypes and the associated costs. Carrying out UQ in this framework is especially challenging because it requires highly expensive simulations. In this context, surrogate models (metamodels) drastically reduce the computational cost of the Monte Carlo process. Different techniques to describe the metamodel are considered: Ordinary Kriging, Polynomial Response Surfaces, and a novel strategy (based on Proper Generalized Decomposition) denoted Separated Response Surface (SRS). A large number of uncertain input parameters may jeopardize the efficiency of the metamodels. Thus, prior to defining a metamodel, kernel Principal Component Analysis (kPCA) is found to be effective in simplifying the description of the model outcome. A benchmark crash test is used to show the efficiency of combining metamodels with kPCA.
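A minimal sketch of the combined workflow using scikit-learn stand-ins (KernelPCA for kPCA, a Gaussian process for Ordinary Kriging); the toy function below replaces the expensive VPS/Pamcrash simulations, and all sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(50, 3))        # 50 expensive runs, 3 uncertain inputs
Y = np.sin(X @ rng.normal(size=(3, 200)))   # toy stand-in for crash outputs

# 1) simplify the high-dimensional outcome description with kernel PCA
kpca = KernelPCA(n_components=5, kernel="rbf", fit_inverse_transform=True)
Z = kpca.fit_transform(Y)

# 2) fit one Kriging (Gaussian process) metamodel per retained component
gps = [GaussianProcessRegressor().fit(X, Z[:, k]) for k in range(Z.shape[1])]

# 3) run the Monte Carlo loop through the cheap surrogate, not the solver
X_mc = rng.uniform(-1, 1, size=(10_000, 3))
Z_mc = np.column_stack([gp.predict(X_mc) for gp in gps])
Y_mc = kpca.inverse_transform(Z_mc)         # back-map to the physical outputs
print(Y_mc.mean(axis=0)[:3])                # e.g., output means under uncertainty
```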

Read more
Methodology

Nonparametric Analysis of Delayed Treatment Effects using Single-Crossing Constraints

Clinical trials involving novel immuno-oncology (IO) therapies frequently exhibit survival profiles that violate the proportional hazards assumption due to a delay in treatment effect; in such settings, the survival curves of the two treatment arms may cross before eventually separating. To flexibly model such scenarios, we describe a nonparametric approach for estimating the arm-specific survival functions that constrains the two curves to cross at most once, without making any additional assumptions about how they are related. A main advantage of our approach is that it provides an estimate of the crossing time if such a crossing exists; moreover, our method generates interpretable measures of treatment benefit, including crossing-conditional survival probabilities and crossing-conditional estimates of restricted residual mean life. We demonstrate the use and effectiveness of our approach with a large simulation study and an analysis of reconstructed outcomes from a recent combination-therapy trial.
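To fix ideas, the following sketch locates a candidate crossing from unconstrained Kaplan-Meier curves on a toy example; the paper's estimator, which enforces the at-most-one-crossing constraint, is more involved.

```python
import numpy as np

def km(time, event, grid):
    """Unconstrained Kaplan-Meier survival estimate evaluated on a grid."""
    ts = np.unique(time[event == 1])
    n_risk = np.array([(time >= t).sum() for t in ts])
    d = np.array([((time == t) & (event == 1)).sum() for t in ts])
    factors = 1.0 - d / n_risk
    return np.array([factors[ts <= g].prod() for g in grid])

rng = np.random.default_rng(3)
t1, t2 = rng.exponential(1.0, 200), rng.weibull(2.0, 200)  # curves cross near t = 1
e = np.ones(200, dtype=int)
grid = np.linspace(0.05, 3.0, 60)
diff = km(t1, e, grid) - km(t2, e, grid)
crossings = grid[np.nonzero(np.diff(np.sign(diff)))[0]]
print(crossings)   # candidate crossing time(s) before any constraint is imposed
```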

Read more
Methodology

Nonparametric C- and D-vine based quantile regression

Quantile regression is a field of steadily growing importance in statistical modeling. It complements linear regression, since computing a range of conditional quantile functions provides more accurate modeling of the stochastic relationship among variables, especially in the tails. We introduce a novel, non-restrictive, and highly flexible nonparametric quantile regression approach based on C- and D-vine copulas. Vine copulas allow for separate modeling of the marginal distributions and the dependence structure in the data, and can be expressed through a graph-theoretic model given by a sequence of trees. In this way we obtain a quantile regression model that overcomes typical issues of quantile regression, such as quantile crossings, collinearity, and the need for transformations and interactions of variables. Our approach incorporates a two-step-ahead ordering of variables, maximizing the conditional log-likelihood of the tree sequence while taking into account the next two tree levels. Further, we show that the nonparametric conditional quantile estimator is consistent. The performance of the proposed methods is evaluated in both low- and high-dimensional settings using simulated and real-world data. The results support the superior prediction ability of the proposed models.
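The copula mechanics can be illustrated in the single-predictor case, where the conditional quantile is obtained by inverting the pair-copula h-function; a Gaussian pair copula stands in below for the paper's nonparametric pair-copulas, so this is a sketch of the general recipe rather than the proposed method.

```python
import numpy as np
from scipy.stats import norm, pearsonr

def copula_quantile(y, x, x0, alpha):
    """Conditional quantile via an inverted pair-copula h-function:
    q(alpha | x0) = F_y^{-1}( h^{-1}(alpha | F_x(x0)) )."""
    n = len(y)
    zx = norm.ppf((np.argsort(np.argsort(x)) + 1) / (n + 1))   # normal scores of x
    zy = norm.ppf((np.argsort(np.argsort(y)) + 1) / (n + 1))   # normal scores of y
    rho = pearsonr(zx, zy)[0]                                  # copula parameter
    u = (np.searchsorted(np.sort(x), x0) + 0.5) / (n + 1)      # pseudo-obs. of x0
    # inverse h-function of the Gaussian copula
    v = norm.cdf(norm.ppf(alpha) * np.sqrt(1 - rho**2) + rho * norm.ppf(u))
    return np.quantile(y, v)                                   # empirical F_y^{-1}

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)
print(copula_quantile(y, x, x0=1.0, alpha=0.9))   # roughly 2 + 1.28 here
```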

Read more
Methodology

Nonparametric causal mediation analysis for stochastic interventional (in)direct effects

Causal mediation analysis has historically been limited in two important ways: (i) the focus has traditionally been on binary treatments and static interventions, and (ii) the direct and indirect effect decompositions pursued have been identifiable only in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions applied jointly to the treatment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether the treatment is categorical or continuous, and they remain well-defined even in the presence of intermediate confounders affected by treatment. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of the nonparametric efficiency theory relevant to the construction of flexible, multiply robust estimators of our (in)direct effects, avoiding undue restrictions induced by parametric models of the nuisance functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and we discuss open-source software implementing the proposed methodology.
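Schematically, and using notation assumed here for illustration (A_delta for the stochastically intervened treatment, M for the mediator), the decomposition of the population intervention effect takes the form:

```latex
% Schematic (in)direct decomposition of the population intervention effect
\mathbb{E}\,Y(A_\delta, M(A_\delta)) - \mathbb{E}\,Y
  = \underbrace{\mathbb{E}\,Y(A_\delta, M(A_\delta)) - \mathbb{E}\,Y(A_\delta, M)}_{\text{indirect, through } M}
  + \underbrace{\mathbb{E}\,Y(A_\delta, M) - \mathbb{E}\,Y}_{\text{direct}}
```

where E[Y] = E[Y(A, M(A))] is the mean outcome with no intervention; the paper's formal definitions and identification conditions are more general than this sketch.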

Read more
Methodology

Nonparametric estimation of directional highest density regions

The goal of set estimation theory is the reconstruction of sets from a random sample of points intimately related to them. Within this context, a particular problem is the reconstruction of density level sets and, specifically, of those with high probability content, namely highest density regions. We define highest density regions for directional data and provide a plug-in estimator based on kernel smoothing. A suitable bootstrap bandwidth selector is provided for the practical implementation of the proposal. An extensive simulation study shows the performance of the proposed plug-in estimator with the bootstrap bandwidth selector and with other bandwidth selectors specifically designed for circular and spherical kernel density estimation. The methodology is applied to analyze two real data sets in animal orientation and seismology.
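A minimal sketch of a plug-in highest density region for circular data, using a von Mises kernel density estimate and the standard rule that thresholds the density at a quantile of its own sample values; the concentration parameter plays the role of the bandwidth, and the bootstrap selector proposed in the paper is omitted.

```python
import numpy as np
from scipy.special import i0

def vonmises_kde(theta, grid, kappa):
    """Kernel density estimate on the circle with a von Mises kernel;
    the concentration kappa acts as the (inverse) bandwidth."""
    k = np.exp(kappa * np.cos(grid[:, None] - theta[None, :]))
    return k.mean(axis=1) / (2 * np.pi * i0(kappa))

rng = np.random.default_rng(5)
theta = rng.vonmises(0.0, 4.0, size=500)     # toy directional sample
grid = np.linspace(-np.pi, np.pi, 360)

# plug-in HDR: threshold the density at the tau-quantile of its own sample
# values, so the region {f >= level} has coverage close to 1 - tau
tau = 0.2
level = np.quantile(vonmises_kde(theta, theta, kappa=8.0), tau)
in_hdr = vonmises_kde(theta, grid, kappa=8.0) >= level
print(grid[in_hdr].min(), grid[in_hdr].max())   # arc forming the 80% HDR
```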

Read more
Methodology

Novel Bayesian Procrustes Variance-based Inferences in Geometric Morphometrics & Novel R package: BPviGM1

Compared to the abundant classical-statistics literature, very little Bayesian literature exists to date on Procrustes shape analysis in Geometric Morphometrics, probably because it is a relatively new branch of statistical research and because of the inherent computational difficulty associated with Bayesian analysis. Yet a plethora of novel inferences may be obtained from Bayesian Procrustes analysis of shape parameter distributions. In this paper, we propose to regard the posterior of the Procrustes shape variance as a morphological variability indicator. We propose novel Bayesian methodologies for Procrustes shape analysis based on an isotropic variance assumption for landmark data, together with a Bayesian statistical test for model validation of new species discovery using the morphological variation reflected in the posterior distribution of the landmark variance of objects studied under Geometric Morphometrics. We consider models for Procrustes analysis based on the Gaussian distribution and on the heavy-tailed t distribution. To date, we are not aware of any direct R package for Bayesian Procrustes analysis for landmark-based Geometric Morphometrics. Hence, we introduce a novel, simple R package, BPviGM1 ("Bayesian Procrustes Variance-based inferences in Geometric Morphometrics 1"), which contains R implementations of the computations for the proposed models and methodologies, such as a function running Markov Chain Monte Carlo (MCMC) to draw samples from the posterior of the parameters of concern, and a function for the proposed Bayesian test of model validation based on significant morphological variation. As an application, we show quantitatively that male primate faces may be subject to greater shape variation than female faces.
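The variance inference at the heart of the proposal can be sketched as follows, assuming Gaussian landmark noise with a conjugate Inverse-Gamma prior on the isotropic variance; the function below is a hypothetical Python illustration, not part of BPviGM1, and it presumes the shapes are already registered to a common mean shape.

```python
import numpy as np

def posterior_variance_samples(shapes, mean_shape, a0=1.0, b0=1.0, draws=4000, seed=0):
    """Posterior draws of the isotropic landmark variance sigma^2 under a
    Gaussian model with a conjugate Inverse-Gamma(a0, b0) prior."""
    resid = shapes - mean_shape                   # registered shapes minus mean shape
    a_n = a0 + resid.size / 2.0
    b_n = b0 + 0.5 * np.sum(resid ** 2)
    rng = np.random.default_rng(seed)
    return b_n / rng.gamma(a_n, 1.0, size=draws)  # Inverse-Gamma(a_n, b_n) draws

# hypothetical comparison of shape variability between two groups
mean_shape = np.zeros((10, 2))                    # 10 landmarks in 2D
rng = np.random.default_rng(6)
males = mean_shape + 0.15 * rng.normal(size=(30, 10, 2))
females = mean_shape + 0.10 * rng.normal(size=(30, 10, 2))
s_m = posterior_variance_samples(males, mean_shape, seed=1)
s_f = posterior_variance_samples(females, mean_shape, seed=2)
print((s_m > s_f).mean())   # posterior probability of greater male shape variance
```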

Read more
