Featured Research

Methodology

Bayesian inference with tmbstan for a state-space model with VAR(1) state equation

When using the R package tmbstan for Bayesian inference, the built-in Laplace approximation to the marginal likelihood, with random effects integrated out, can be switched on and off. There is no guideline on whether the Laplace approximation should be used to achieve better efficiency, especially when the statistical model under consideration is complicated. To answer this question, we conducted simulation studies under different scenarios with a state-space model employing a VAR(1) state equation. We found that turning on the Laplace approximation in tmbstan tends to lower computational efficiency; only when there is a good amount of data is it worth trying tmbstan both with and without the Laplace approximation, since in that case the approximation is more likely to be accurate and may also lead to slightly higher computational efficiency. The transition and scale parameters of a VAR(1) process are hard to estimate accurately, and increasing the sample size at each time point does not help: only additional time points contain more information on these parameters, letting the likelihood dominate the posterior and thus yield accurate estimates.
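The data-generating process behind this study can be sketched in its simplest form. The snippet below simulates a univariate AR(1) special case of the VAR(1) state equation with replicate observations at each time point; all parameter names (`phi`, `sigma_state`, `sigma_obs`, `n_per_t`) are illustrative, not taken from the paper.

```python
import random

def simulate_ssm(T, n_per_t=3, phi=0.7, sigma_state=1.0, sigma_obs=0.5, seed=0):
    """Simulate a toy state-space model with an AR(1) state equation
    (the univariate special case of VAR(1)):
        state:       x_t = phi * x_{t-1} + eps_t,    eps_t   ~ N(0, sigma_state^2)
        observation: y_{t,i} = x_t + eta_{t,i},      eta_{t,i} ~ N(0, sigma_obs^2)
    n_per_t replicate observations per time point mirror the abstract's
    'sample size at each time point'; information on phi accrues only
    across time points T, not across replicates."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(T):
        x = phi * x + rng.gauss(0, sigma_state)
        states.append(x)
        obs.append([x + rng.gauss(0, sigma_obs) for _ in range(n_per_t)])
    return states, obs
```

With a long enough series, the transition parameter is recoverable from a lag-1 regression of the states on themselves, which is exactly the information that extra replicates within a time point cannot supply.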

Read more
Methodology

Bayesian model averaging for analysis of lattice field theory results

Statistical modeling is a key component in the extraction of physical results from lattice field theory calculations. Although the general models used are often strongly motivated by physics, many model variations can frequently be considered for the same lattice data. Model averaging, which amounts to a probability-weighted average over all model variations, can incorporate systematic errors associated with model choice without being overly conservative. We discuss the framework of model averaging from the perspective of Bayesian statistics, and give useful formulae and approximations for the particular case of least-squares fitting, commonly used in modeling lattice results. In addition, we frame the common problem of data subset selection (e.g. choice of minimum and maximum time separation for fitting a two-point correlation function) as a model selection problem and study model averaging as a straightforward alternative to manual selection of fit ranges. Numerical examples involving both mock and real lattice data are given.
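The probability-weighted average described above can be sketched with information-criterion weights, a standard approximation to Bayesian model probabilities. The weighting scheme and variance formula below are generic illustrations of model averaging, not the paper's exact formulae.

```python
import math

def aic_weights(aics):
    """Akaike-style model weights: w_i proportional to exp(-AIC_i / 2),
    normalized to sum to one (shifted by the minimum for stability)."""
    m = min(aics)
    raw = [math.exp(-(a - m) / 2.0) for a in aics]
    s = sum(raw)
    return [r / s for r in raw]

def model_average(estimates, variances, aics):
    """Probability-weighted average of a parameter across model variations.
    The averaged variance adds the spread of estimates across models, so the
    systematic error associated with model choice is propagated."""
    w = aic_weights(aics)
    mean = sum(wi * e for wi, e in zip(w, estimates))
    var = sum(wi * (v + (e - mean) ** 2)
              for wi, e, v in zip(w, estimates, variances))
    return mean, var
```

When all models agree, the average reduces to the common estimate; when they disagree, the between-model spread inflates the quoted uncertainty rather than being silently dropped.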

Read more
Methodology

Bayesian statistical analysis of hydrogeochemical data using point processes: a new tool for source detection in multicomponent fluid mixtures

Hydrogeochemical data may be seen as a point cloud in a multi-dimensional space. Each dimension of this space represents a hydrogeochemical parameter (e.g. salinity, solute concentration, concentration ratio, isotopic composition). While the composition of many geological fluids is controlled by mixing between multiple sources, a key question for hydrogeochemical data sets is the detection of these sources. By treating hydrogeochemical data as spatial data, this paper presents a new solution to the source detection problem based on point processes. Results are shown on simulated and real data from geothermal fluids.

Read more
Methodology

Beyond unidimensional poverty analysis using distributional copula models for mixed ordered-continuous outcomes

Poverty is a multidimensional concept often comprising a monetary outcome and other welfare dimensions such as education, subjective well-being, or health that are measured on an ordinal scale. In applied research, multidimensional poverty is ubiquitously assessed by studying each poverty dimension independently in univariate regression models or by combining several poverty dimensions into a scalar index. This inhibits a thorough analysis of the potentially varying interdependence between the poverty dimensions. We propose a multivariate copula generalized additive model for location, scale and shape (copula GAMLSS or distributional copula model) to tackle this challenge. By relating the copula parameter to covariates, we specifically examine whether certain factors determine the dependence between poverty dimensions. Furthermore, specifying the full conditional bivariate distribution allows us to derive several features, such as poverty risks and dependence measures, coherently from one model for different individuals. We demonstrate the approach by studying two important poverty dimensions: income and education. Since the level of education is measured on an ordinal scale while income is continuous, we extend the bivariate copula GAMLSS to the case of mixed ordered-continuous outcomes. The new model is integrated into the GJRM package in R and applied to data from Indonesia. Particular emphasis is given to the spatial variation of the income-education dependence and to groups of individuals at risk of being simultaneously poor in both the education and income dimensions.
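The core mechanism here, letting the copula (dependence) parameter vary with covariates, can be sketched with a Gaussian copula and a tanh link. This is an illustrative stand-in only; the paper's GJRM implementation has its own link functions and copula families, and `beta0`/`beta1` below are hypothetical coefficients.

```python
import math
import random

def sample_gauss_copula(rho, rng):
    """One draw (u1, u2) from a bivariate Gaussian copula with correlation rho:
    correlate two standard normals, then map each through the normal CDF."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = z1
    x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return phi(x1), phi(x2)

def dependence_given_covariate(x, beta0=-1.0, beta1=2.0):
    """Distributional-copula idea: the copula parameter is itself a regression
    on a covariate x, with a tanh link keeping rho inside (-1, 1)."""
    return math.tanh(beta0 + beta1 * x)
```

Sampling pairs at different covariate values then produces visibly different dependence strengths, which is exactly the kind of covariate-driven interdependence the model is designed to capture.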

Read more
Methodology

Bi-Smoothed Functional Independent Component Analysis for EEG Artifact Removal

Motivated by mapping adverse artifactual events caused by body movements in electroencephalographic (EEG) signals, we present a functional independent component analysis based on the spectral decomposition of the kurtosis operator of a smoothed principal component expansion. A discrete roughness penalty is introduced in the orthonormality constraint of the covariance eigenfunctions in order to obtain the smoothed basis for the proposed independent component model. To select the tuning parameters, a cross-validation method that incorporates shrinkage is used to enhance the performance on functional representations with large basis dimension. This method provides an estimation strategy to determine the penalty parameter and the optimal number of components. Our independent component approach is applied to real EEG data to estimate genuine brain potentials from a contaminated signal. As a result, it is possible to control high-frequency remnants of neural origin overlapping artifactual sources to optimize their removal from the signal. An R package implementing our methods is available at CRAN.

Read more
Methodology

Bi-factor and second-order copula models for item response data

Bi-factor and second-order models based on copulas are proposed for item response data, where the items can be split into non-overlapping groups such that there is a homogeneous dependence within each group. Our general models include the Gaussian bi-factor and second-order models as special cases and can lead to more probability in the joint upper or lower tail compared with the Gaussian bi-factor and second-order models. Details on maximum likelihood estimation of parameters for the bi-factor and second-order copula models are given, as well as model selection and goodness-of-fit techniques. Our general methodology is demonstrated with an extensive simulation study and illustrated for the Toronto Alexithymia Scale. Our studies suggest that there can be a substantial improvement over the Gaussian bi-factor and second-order models both conceptually, as the items can have interpretations of latent maxima/minima or mixtures of means in comparison with latent means, and in fit to data.

Read more
Methodology

Bias Reduction as a Remedy to the Consequences of Infinite Estimates in Poisson and Tobit Regression

Data separation is a well-studied phenomenon that can cause problems in the estimation and inference from binary response models. Complete or quasi-complete separation occurs when there is a combination of regressors in the model whose value can perfectly predict one or both outcomes. In such cases, and such cases only, the maximum likelihood estimates and the corresponding standard errors are infinite. It is less widely known that the same can happen in other microeconometric models. One of the few works in the area is Santos Silva and Tenreyro (2010), who note that the finiteness of the maximum likelihood estimates in Poisson regression depends on the data configuration and propose a strategy to detect and overcome the consequences of data separation. However, their approach can lead to notable bias in the parameter estimates when the regressors are correlated. We illustrate how bias-reducing adjustments to the maximum likelihood score equations can overcome the consequences of separation in Poisson and Tobit regression models.
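A minimal numerical illustration of the separation problem described above, on toy data not taken from the paper: when a binary regressor perfectly predicts zero counts, the Poisson profile log-likelihood keeps increasing as that coefficient goes to minus infinity, so no finite maximum likelihood estimate exists.

```python
import math

# Toy data with quasi-complete separation in Poisson regression:
# whenever the binary regressor x is 1, the count y is 0.
x = [0, 0, 0, 0, 1, 1]
y = [2, 3, 1, 2, 0, 0]

def poisson_loglik(a, b):
    """Poisson log-likelihood with intercept a and slope b on x,
    dropping the constant -log(y!) terms."""
    ll = 0.0
    for xi, yi in zip(x, y):
        eta = a + b * xi
        ll += yi * eta - math.exp(eta)
    return ll

# Intercept MLE from the x = 0 group: log of the mean count there.
a_hat = math.log(sum(yi for xi, yi in zip(x, y) if xi == 0) / 4.0)

# The profile log-likelihood keeps rising as b decreases without bound:
print([round(poisson_loglik(a_hat, b), 3) for b in (-1.0, -5.0, -20.0)])
```

Each step toward a more negative slope improves the fit to the zero counts, which is precisely why the unadjusted score equations have no finite solution and a bias-reducing adjustment is needed.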

Read more
Methodology

Biconvex Clustering

Convex clustering has recently garnered increasing interest due to its attractive theoretical and computational properties, but its merits become limited in the face of high-dimensional data. In such settings, pairwise affinity terms that rely on k -nearest neighbors become poorly specified and Euclidean measures of fit provide weaker discriminating power. To surmount these issues, we propose to modify the convex clustering objective so that feature weights are optimized jointly with the centroids. The resulting problem becomes biconvex, and as such remains well-behaved statistically and algorithmically. In particular, we derive a fast algorithm with closed form updates and convergence guarantees, and establish finite-sample bounds on its prediction error. Under interpretable regularity conditions, the error bound analysis implies consistency of the proposed estimator. Biconvex clustering performs feature selection throughout the clustering task: as the learned weights change the effective feature representation, pairwise affinities can be updated adaptively across iterations rather than precomputed within a dubious feature space. We validate the contributions on real and simulated data, showing that our method effectively addresses the challenges of dimensionality while reducing dependence on carefully tuned heuristics typical of existing approaches.
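The joint optimization idea can be sketched with a deliberately simplified alternating scheme. This is NOT the paper's algorithm (which works with the convex clustering objective and comes with convergence guarantees); it is a toy in the same spirit, where centroids and per-feature weights are updated in turn so that informative features gain weight as the clustering sharpens. The softmax weight update and `tau` temperature are assumptions of this sketch.

```python
import math

def biconvex_toy(X, k=2, iters=10, tau=1.0):
    """Toy alternating scheme: assign points under a feature-weighted distance,
    recompute centroids, then reweight features by (softmaxed negative)
    within-cluster dispersion, so noisy features are down-weighted."""
    n, p = len(X), len(X[0])
    w = [1.0 / p] * p                                  # uniform initial weights
    centers = [list(X[(i * n) // k]) for i in range(k)]  # deterministic init
    for _ in range(iters):
        # assign each point to the nearest centroid under weighted distance
        lab = [min(range(k),
                   key=lambda c: sum(w[j] * (X[i][j] - centers[c][j]) ** 2
                                     for j in range(p)))
               for i in range(n)]
        # centroid update: per-cluster means (keep old center if cluster empties)
        for c in range(k):
            pts = [X[i] for i in range(n) if lab[i] == c] or [centers[c]]
            centers[c] = [sum(q[j] for q in pts) / len(pts) for j in range(p)]
        # weight update: features with small within-cluster dispersion gain weight
        disp = [sum((X[i][j] - centers[lab[i]][j]) ** 2 for i in range(n)) / n
                for j in range(p)]
        e = [math.exp(-dj / tau) for dj in disp]
        w = [ej / sum(e) for ej in e]
    return lab, w
```

On data where only one feature carries cluster structure, the learned weights concentrate on that feature, mirroring the feature selection behavior described in the abstract.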

Read more
Methodology

Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel

Survey scientists increasingly face the problem of high-dimensionality in their research as digitization makes it much easier to construct high-dimensional (or "big") data sets through tools such as online surveys and mobile applications. Machine learning methods are able to handle such data, and they have been successfully applied to solve predictive problems. However, in many situations, survey statisticians want to learn about causal relationships to draw conclusions and be able to transfer the findings of one survey to another. Standard machine learning methods provide biased estimates of such relationships. We introduce into survey statistics the double machine learning approach, which gives approximately unbiased estimators of causal parameters, and show how it can be used to analyze survey nonresponse in a high-dimensional panel setting.
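The double machine learning recipe mentioned above can be sketched for the partially linear model y = theta*d + g(x) + noise: fit nuisance models for E[y|x] and E[d|x] on held-out folds (cross-fitting), then regress the outcome residuals on the treatment residuals. For self-containment this sketch uses a one-dimensional confounder and simple least-squares nuisance learners; in practice the nuisances would be flexible machine learning models, and the function names are hypothetical.

```python
import random

def ols_1d(xs, ys):
    """Slope and intercept of a simple least-squares line (our stand-in
    nuisance learner; any ML regressor could be plugged in here)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
             / sum((xi - mx) ** 2 for xi in xs))
    return slope, my - slope * mx

def dml_theta(x, d, y, folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in
    y = theta*d + g(x) + noise: nuisances are fit on training folds,
    residuals are formed on the held-out fold, then pooled."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    num = den = 0.0
    for k in range(folds):
        test = set(idx[k::folds])
        tr = [i for i in idx if i not in test]
        bg, ag = ols_1d([x[i] for i in tr], [y[i] for i in tr])  # E[y|x]
        bm, am = ols_1d([x[i] for i in tr], [d[i] for i in tr])  # E[d|x]
        for i in test:
            ry = y[i] - (ag + bg * x[i])   # outcome residual
            rd = d[i] - (am + bm * x[i])   # treatment residual
            num += rd * ry
            den += rd * rd
    return num / den
```

Because the nuisances are estimated out-of-fold, their estimation error enters the final stage only at second order, which is what delivers the approximately unbiased causal estimate.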

Read more
Methodology

Binary Outcome Copula Regression Model with Sampling Gradient Fitting

Using copulas to model the dependence between variables extends the multivariate Gaussian assumption. In this paper we first empirically study a copula regression model with a continuous response; both a simulation study and a real-data study are given. We then propose a novel copula regression model with a binary outcome, together with a score-gradient estimation algorithm to fit it; again, both a simulation study and a real-data study are provided for our model and fitting algorithm.
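The generic building block behind sampling-based score-gradient fitting can be sketched on a Bernoulli toy problem: estimate the gradient of an expectation by weighting sampled function values with the score (gradient of the log-probability). This is only the standard score-function (likelihood-ratio) estimator as an illustration, not the authors' algorithm.

```python
import random

def score_gradient(theta, f, n=200_000, seed=0):
    """Monte Carlo score-function estimate of
        d/dtheta E_{z ~ Bernoulli(theta)}[f(z)]
    using grad_theta log p(z; theta) = z/theta - (1 - z)/(1 - theta).
    The exact answer for Bernoulli is f(1) - f(0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = 1 if rng.random() < theta else 0
        total += f(z) * (z / theta - (1 - z) / (1 - theta))
    return total / n
```

The appeal for binary-outcome models is that the estimator needs only samples and log-probability gradients, never a derivative of `f` itself, so it applies even when the objective is not differentiable in the outcome.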

Read more
