Featured Research

Methodology

Adaptive shrinkage of smooth functional effects towards a predefined functional subspace

In this paper, we propose a new horseshoe-type prior hierarchy for adaptively shrinking spline-based functional effects towards a predefined vector space of parametric functions. Instead of shrinking each spline coefficient towards zero, we use an adapted horseshoe prior to control the deviation from the predefined vector space. For this purpose, the modified horseshoe prior is set up with one scale parameter per spline rather than one per coefficient. The proposed prior allows for a large number of basis functions to capture all kinds of functional effects while preventing the estimated functional effect from degenerating into a highly oscillating overfit. We achieve this by integrating a smoothing penalty similar to the random walk prior commonly applied for Bayesian P-splines. In a simulation study, we demonstrate the properties of the new prior specification and compare it to other approaches from the literature. Furthermore, we showcase the applicability of the proposed method by estimating energy consumption in Germany over the course of a day. For inference, we rely on Markov chain Monte Carlo simulations, combining Gibbs sampling for the spline coefficients with slice sampling for all scale parameters in the model.
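
As a rough illustration of the idea (not the authors' code), the sketch below assembles a Gaussian prior precision for one spline effect: the deviation of the coefficients from a parametric subspace is scaled by a single horseshoe-type parameter, and a random-walk difference penalty supplies the smoothing term. The names `B`, `X0`, `tau`, and `smooth_weight` are illustrative assumptions.

```python
import numpy as np

def difference_penalty(k, order=2):
    # Random-walk (P-spline) penalty matrix D'D of the given order.
    D = np.diff(np.eye(k), n=order, axis=0)
    return D.T @ D

def prior_precision(B, X0, tau, smooth_weight=1.0):
    # B: n x k B-spline design matrix, X0: n x q basis of the parametric
    # subspace (e.g., intercept and linear trend), tau: the single
    # horseshoe-type scale for this spline (one per spline, not per coefficient).
    k = B.shape[1]
    G = np.linalg.pinv(B) @ X0      # parametric subspace in coefficient space
    P = G @ np.linalg.pinv(G)       # projector onto that subspace
    Q = np.eye(k) - P               # deviation from the subspace
    K_dev = (Q.T @ Q) / tau**2      # shrink the deviation, horseshoe-scaled
    K_smooth = smooth_weight * difference_penalty(k)
    return K_dev + K_smooth         # Gaussian prior precision for the coefficients
```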

Read more
Methodology

Additive Models for Symmetric Positive-Definite Matrices, Riemannian Manifolds and Lie groups

In this paper, an additive regression model is proposed for a symmetric positive-definite matrix-valued response and multiple scalar predictors. The model exploits the abelian group structure inherited from either the Log-Cholesky metric or the Log-Euclidean framework, which turns the space of symmetric positive-definite matrices into a Riemannian manifold and, further, a bi-invariant Lie group. The additive model for responses in the space of symmetric positive-definite matrices with either of these metrics is shown to connect to an additive model on a tangent space. This connection not only yields an efficient algorithm to estimate the component functions but also allows the proposed additive model to be generalized to general Riemannian manifolds that might not have a Lie group structure. Optimal asymptotic convergence rates and asymptotic normality of the estimated component functions are also established. Numerical studies show that the proposed model enjoys superior numerical performance, especially when there are multiple predictors. The practical merits of the proposed model are demonstrated by analyzing diffusion tensor brain imaging data.
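
A minimal sketch of the Log-Cholesky transform that underlies the tangent-space connection: each SPD response is mapped to a Euclidean representation, on which additive components could be fitted coordinate-wise with any standard smoother, and fitted values are mapped back. This illustrates the geometry only, not the authors' estimation code.

```python
import numpy as np

def log_cholesky(S):
    # Map an SPD matrix to its Log-Cholesky representation:
    # keep the strictly lower triangle of the Cholesky factor,
    # take the log of its diagonal.
    L = np.linalg.cholesky(S)
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))

def inv_log_cholesky(M):
    # Inverse map: exponentiate the diagonal and rebuild S = L L'.
    L = np.tril(M, -1) + np.diag(np.exp(np.diag(M)))
    return L @ L.T
```

A full fit would regress the vectorized `log_cholesky` images additively on the scalar predictors (e.g., by backfitting) and map predictions back through `inv_log_cholesky`.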

Read more
Methodology

Addressing patient heterogeneity in disease predictive model development

This paper addresses patient heterogeneity associated with prediction problems in biomedical applications. We propose a systematic hypothesis testing approach to determine whether a patient subgroup structure exists and, if so, the number of subgroups in the patient population. A mixture of generalized linear models is used to model the relationship between the disease outcome and patient characteristics and clinical factors, including targeted biomarker profiles. We construct a test statistic based on the expectation-maximization (EM) algorithm and derive its asymptotic distribution under the null hypothesis. An important computational advantage of the test is that the parameter estimates involved under the complex alternative hypothesis can be obtained through a small number of EM iterations rather than by fully optimizing the objective function. We demonstrate the finite-sample performance of the proposed test in terms of type-I error rate and power using extensive simulation studies. The applicability of the proposed method is illustrated through an application to a multi-center prostate cancer study.
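
The sketch below illustrates, in heavily simplified form, the computational idea of running only a few EM sweeps under a two-component alternative before forming a likelihood-ratio-type statistic. The paper's actual test statistic, penalization, and null calibration differ in detail; the helper assumes binary 0/1 outcomes and uses scikit-learn's logistic regression as a stand-in for the weighted M-step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_test_statistic(X, y, n_em_steps=5, seed=0):
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=int)
    rng = np.random.default_rng(seed)
    n = len(y)
    # Null model: one homogeneous logistic regression.
    null = LogisticRegression().fit(X, y)
    eta = X @ null.coef_.ravel() + null.intercept_[0]
    ll0 = -np.sum(np.log1p(np.exp(-(2 * y - 1) * eta)))
    # Alternative: two components, responsibilities refined by a few EM sweeps.
    r = rng.uniform(0.3, 0.7, size=n)                 # soft initial assignment
    comps = [LogisticRegression(), LogisticRegression()]
    for _ in range(n_em_steps):
        w = np.column_stack([r, 1 - r])
        for j, m in enumerate(comps):                 # M-step: weighted fits
            m.fit(X, y, sample_weight=w[:, j])
        p = np.column_stack([m.predict_proba(X)[np.arange(n), y] for m in comps])
        pi = w.mean(axis=0)
        r = pi[0] * p[:, 0] / (p @ pi)                # E-step: responsibilities
    ll1 = np.sum(np.log(p @ pi))
    return 2 * (ll1 - ll0)                            # LR-type statistic after k EM steps
```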

Read more
Methodology

Adjusting the Benjamini-Hochberg method for controlling the false discovery rate in knockoff assisted variable selection

This paper revisits the knockoff-based multiple testing setup considered in Barber & Candes (2015) for variable selection applied to a linear regression model with n ≥ 2d, where n is the sample size and d is the number of explanatory variables. The BH method based on ordinary least squares estimates of the regression coefficients is adjusted to this setup, making it a valid p-value based FDR controlling method that does not rely on any specific correlation structure of the explanatory variables. Simulations and real data applications demonstrate that our proposed method, in its original form and in its data-adaptive version incorporating the estimated proportion of truly unimportant explanatory variables, is a powerful competitor of the FDR controlling methods in Barber & Candes (2015).
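
For concreteness, a generic (adaptive) Benjamini-Hochberg step-up procedure is sketched below. In the paper's setting the p-values would come from OLS statistics on the knockoff-augmented design, a construction not reproduced here; `pi0` plays the role of the estimated proportion of truly unimportant variables in the data-adaptive version.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1, pi0=1.0):
    # Step-up BH: reject the k smallest p-values, where k is the largest index i
    # such that p_(i) <= i * q / (m * pi0). pi0 = 1 gives the ordinary BH method.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / (m * pi0)
    passed = np.nonzero(p[order] <= thresh)[0]
    if len(passed) == 0:
        return np.array([], dtype=int)
    k = passed.max()
    return np.sort(order[: k + 1])     # indices of selected (rejected) hypotheses

# Example (hypothetical inputs):
# selected = benjamini_hochberg(pvals, q=0.1, pi0=estimated_null_proportion)
```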

Read more
Methodology

Agglomerative Hierarchical Clustering for Selecting Valid Instrumental Variables

We propose an instrumental variable (IV) selection procedure which combines the agglomerative hierarchical clustering method and the Hansen-Sargan overidentification test to select valid instruments for IV estimation from a large set of candidate instruments. Some of the instruments may be invalid in the sense that they fail the exclusion restriction. We show that, under the plurality rule, our method can achieve oracle selection and estimation results. Compared to previous IV selection methods, our method has the advantages that it can deal with the weak instruments problem effectively and can be easily extended to settings with multiple endogenous regressors and heterogeneous treatment effects. We conduct Monte Carlo simulations to examine the performance of our method and compare it with two existing methods, the Hard Thresholding method (HT) and the Confidence Interval method (CIM). The simulation results show that our method achieves oracle selection and estimation results in both single and multiple endogenous regressor settings in large samples when all the instruments are strong. Our method also works well when some of the candidate instruments are weak, outperforming HT and CIM. We apply our method to the estimation of the effect of immigration on wages in the US.
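
A rough sketch of the selection idea (not the authors' implementation): compute a just-identified IV estimate from each candidate instrument, cluster these estimates agglomeratively, and take the largest cluster as the valid set under the plurality rule. The fixed cut height `t` below is a stand-in for the step where the Hansen-Sargan test would guide the choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def select_valid_instruments(y, d, Z, t=0.5):
    # y: outcome (n,), d: single endogenous regressor (n,), Z: n x L instruments.
    # Just-identified estimate from each instrument: beta_j = (z_j'y) / (z_j'd).
    betas = (Z.T @ y) / (Z.T @ d)
    link = linkage(betas.reshape(-1, 1), method="average")   # agglomerative clustering
    labels = fcluster(link, t=t, criterion="distance")
    largest = np.bincount(labels).argmax()                   # plurality rule
    return np.flatnonzero(labels == largest), betas          # indices of selected instruments
```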

Read more
Methodology

An Adaptive Algorithm based on High-Dimensional Function Approximation to obtain Optimal Designs

Algorithms which compute locally optimal continuous designs often rely on a finite design space or on repeatedly solving a complex non-linear program. Both approaches require extensive evaluations of the Jacobian Df of the underlying model, which presents a heavy computational burden. Based on the Kiefer-Wolfowitz Equivalence Theorem, we present a novel design-of-experiments algorithm which computes optimal designs in a continuous design space. For this iterative algorithm, we combine an adaptive Bayes-like sampling scheme with Gaussian process regression to approximate the directional derivative of the design criterion. The approximation allows us to adaptively select new design points on which to evaluate the model. This adaptive selection requires significantly fewer evaluations of Df and reduces the runtime of the computations. We show the viability of the new algorithm on two examples from chemical engineering.
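
One iteration of the surrogate-assisted selection might look like the sketch below, where `sensitivity(x, design)` is a hypothetical function returning the (expensive, Jacobian-based) directional derivative of the design criterion, and a Gaussian process fitted to past evaluations proposes the next candidate point. This is a conceptual sketch of the adaptive step, not the paper's algorithm, and assumes a one-dimensional design space.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def adaptive_design_step(sensitivity, design, x_evaluated, x_candidates):
    # Fit a GP surrogate to the directional-derivative values evaluated so far.
    y = np.array([sensitivity(x, design) for x in x_evaluated])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), normalize_y=True)
    gp.fit(np.atleast_2d(x_evaluated).T, y)
    # Predict on cheap candidate points and pick where the (optimistic) surrogate
    # suggests the equivalence-theorem condition is violated most.
    mu, sd = gp.predict(np.atleast_2d(x_candidates).T, return_std=True)
    ucb = mu + 1.96 * sd
    x_new = np.asarray(x_candidates)[np.argmax(ucb)]
    return x_new, gp
```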

Read more
Methodology

An Aligned Rank Transform Procedure for Multifactor Contrast Tests

Data from multifactor HCI experiments often violates the normality assumption of parametric tests (i.e., nonconforming data). The Aligned Rank Transform (ART) is a popular nonparametric analysis technique that can find main and interaction effects in nonconforming data, but leads to incorrect results when used to conduct contrast tests. We created a new algorithm called ART-C for conducting contrasts within the ART paradigm and validated it on 72,000 data sets. Our results indicate that ART-C does not inflate Type I error rates, unlike contrasts based on ART, and that ART-C has more statistical power than a t-test, Mann-Whitney U test, Wilcoxon signed-rank test, and ART. We also extended a tool called ARTool with our ART-C algorithm for both Windows and R. Our validation had some limitations (e.g., only six distribution types, no mixed factorial designs, no random slopes), and data drawn from Cauchy distributions should not be analyzed with ART-C.
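
As a very simplified, between-subjects illustration of the align-then-rank idea behind ART-C (the full algorithm, implemented in ARTool, handles arbitrary contrasts, interactions, and within-subjects designs), one might align out a nuisance factor's estimated effect, rank the aligned responses, and test the contrast on the ranks:

```python
import numpy as np
from scipy import stats

def artc_like_contrast(y, cell, nuisance):
    # y: responses; cell: label of the concatenated factor levels being contrasted
    # (exactly two labels expected); nuisance: labels of another factor whose
    # estimated main effect is removed before ranking.
    y, cell, nuisance = map(np.asarray, (y, cell, nuisance))
    grand = y.astype(float).mean()
    nuis_eff = {g: y[nuisance == g].mean() - grand for g in np.unique(nuisance)}
    aligned = y - np.array([nuis_eff[g] for g in nuisance])   # align step
    ranks = stats.rankdata(aligned)                           # rank step
    a, b = np.unique(cell)
    return stats.ttest_ind(ranks[cell == a], ranks[cell == b])
```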

Read more
Methodology

An Approximation Scheme for Multivariate Information based on Partial Information Decomposition

We consider an approximation scheme for multivariate information which assumes that synergistic information, appearing only in higher-order joint distributions, is suppressed; this may hold in large classes of systems. Our approximation scheme gives a practical way to evaluate information among random variables and is expected to be applicable to feature selection in machine learning. The truncation order of the approximation is given by the order of synergy. To classify the information terms, we use the partial information decomposition of the original multivariate information. The resulting multivariate information is expected to be reasonable if higher-order synergy is suppressed in the system. In addition, it can be calculated relatively easily if the truncation order is not too large. We also perform numerical experiments to check the validity of our approximation scheme.
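
A toy two-source partial information decomposition is sketched below using the simple "minimum mutual information" redundancy measure (one common choice; the paper builds on PID in general, not on this particular measure). Dropping the synergy term corresponds to the lowest-order truncation described above.

```python
import numpy as np

def mutual_info(pxy):
    # Mutual information (in bits) from a 2-D joint pmf.
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def pid_two_sources(p):
    # p: 3-D joint pmf over (x1, x2, y).
    i1 = mutual_info(p.sum(axis=1))                  # I(X1; Y)
    i2 = mutual_info(p.sum(axis=0))                  # I(X2; Y)
    i12 = mutual_info(p.reshape(-1, p.shape[2]))     # I(X1, X2; Y)
    red = min(i1, i2)                                # MMI redundancy
    unique1, unique2 = i1 - red, i2 - red
    synergy = i12 - red - unique1 - unique2
    # Truncating at "no synergy" approximates I(X1,X2;Y) by red + unique1 + unique2.
    return dict(redundancy=red, unique1=unique1, unique2=unique2, synergy=synergy)
```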

Read more
Methodology

An Early Stopping Bayesian Data Assimilation Approach for Mixed-Logit Estimation

The mixed-logit model is a flexible tool in transportation choice analysis which provides valuable insights into inter- and intra-individual behavioural heterogeneity. However, applications of mixed-logit models are limited by the high computational and data requirements for model estimation. When estimating on small samples, the Bayesian estimation approach becomes vulnerable to over- and under-fitting. This is problematic for investigating the behaviour of specific population sub-groups or market segments with low data availability. Similar challenges arise when transferring an existing model to a new location or time period, e.g., when estimating post-pandemic travel behaviour. We propose an Early Stopping Bayesian Data Assimilation (ESBDA) simulator for mixed-logit estimation which combines a Bayesian statistical approach with machine learning methodologies. The aim is to improve the transferability of mixed-logit models and to enable the estimation of robust choice models with low data availability. This approach can provide new insights into choice behaviour where traditional estimation of mixed-logit models was not possible due to low data availability, and it can open up new opportunities to support investment and planning decisions. The ESBDA estimator is benchmarked against the Direct Application approach, a basic Bayesian simulator with random starting parameter values, and a Bayesian Data Assimilation (BDA) simulator without early stopping. The ESBDA approach is found to effectively overcome under- and over-fitting and non-convergence issues in simulation, and its resulting models clearly outperform those of the reference simulators in predictive accuracy. Furthermore, models estimated with ESBDA tend to be more robust, with significant parameters whose signs and values are consistent with behavioural theory, even when estimated on small samples.
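
The early-stopping loop itself can be summarized by the skeleton below, in which `gibbs_step` and `validation_loglik` are hypothetical stand-ins for the mixed-logit sampler and a held-out predictive check. The sketch conveys only the stopping logic, not the assimilation of the previously estimated model.

```python
import numpy as np

def esbda(gibbs_step, validation_loglik, init_params, max_iter=5000, patience=200):
    # init_params would come from posterior draws assimilated from an existing model;
    # sampling continues on the new (small) data set and stops once the held-out
    # log-likelihood has not improved for `patience` iterations.
    params, best, best_ll, since_best = init_params, init_params, -np.inf, 0
    for _ in range(max_iter):
        params = gibbs_step(params)            # one MCMC sweep on the new data
        ll = validation_loglik(params)         # held-out predictive check
        if ll > best_ll:
            best, best_ll, since_best = params, ll, 0
        else:
            since_best += 1
        if since_best >= patience:             # early stopping rule
            break
    return best
```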

Read more
Methodology

An Independence Test Based on Recurrence Rates. An empirical study and applications to real data

In this paper we propose several variants of an independence test between two random elements based on recurrence rates, and we show how to calculate the test statistic in each case. Simulations show that, in high dimensions, our test clearly outperforms the other widely used competitors in almost all cases. The test was also applied to two data sets covering small and large sample sizes, and we show that in both cases it allows us to obtain interesting conclusions.
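
One plausible variant of such a test is sketched below: it compares the joint recurrence rate with the product of the marginal recurrence rates and calibrates the statistic by permutation. The choice of recurrence radii as distance quantiles, and the form of the statistic itself, are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def recurrence_independence_test(X, Y, q=0.2, n_perm=500, seed=0):
    # X, Y: (n, d) arrays of paired observations.
    rng = np.random.default_rng(seed)
    dx, dy = pdist(X), pdist(Y)
    rx, ry = np.quantile(dx, q), np.quantile(dy, q)      # recurrence radii
    ax, ay = dx <= rx, dy <= ry                          # recurrence indicators per pair
    stat = np.mean(ax & ay) - np.mean(ax) * np.mean(ay)  # joint minus product rate
    n = len(X)
    null = []
    for _ in range(n_perm):
        ayp = pdist(Y[rng.permutation(n)]) <= ry         # break the pairing with X
        null.append(np.mean(ax & ayp) - np.mean(ax) * np.mean(ayp))
    pval = (1 + np.sum(np.abs(null) >= abs(stat))) / (1 + n_perm)
    return stat, pval
```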

Read more
