Featured Researches

Methodology

A Frequency Domain Bootstrap for General Multivariate Stationary Processes

For many relevant statistics of multivariate time series, no valid frequency domain bootstrap procedures exist. This is mainly due to the fact that the distribution of such statistics depends on the fourth-order moment structure of the underlying process in nearly every scenario, except for some special cases like Gaussian time series. In contrast to the univariate case, even additional structural assumptions such as linearity of the multivariate process or a standardization of the statistic of interest do not solve the problem. This paper focuses on integrated periodogram statistics as well as functions thereof and presents a new frequency domain bootstrap procedure for multivariate time series, the multivariate frequency domain hybrid bootstrap (MFHB), to fill this gap. Asymptotic validity of the MFHB procedure is established for general classes of periodogram-based statistics and for stationary multivariate processes satisfying rather weak dependence conditions. A simulation study is carried out which compares the finite sample performance of the MFHB with that of the moving block bootstrap.
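The validity gap the abstract describes is easiest to see against the classical univariate procedure that the MFHB generalizes. Below is a minimal stdlib-Python sketch (function names are made up) of that classical frequency-domain bootstrap for a spectral-mean statistic: each periodogram ordinate is rescaled by an i.i.d. standard-exponential weight and the statistic is recomputed. This is the scheme that is valid for spectral means of univariate linear processes but, as the paper notes, breaks down for general multivariate statistics with nontrivial fourth-order structure; it is not the MFHB itself.

```python
import cmath
import math
import random

def periodogram(x):
    """Periodogram ordinates I(l_j) at the Fourier frequencies l_j = 2*pi*j/n."""
    n = len(x)
    xbar = sum(x) / n
    out = []
    for j in range(1, n // 2 + 1):
        d = sum((x[t] - xbar) * cmath.exp(-2j * math.pi * j * t / n)
                for t in range(n))
        out.append(abs(d) ** 2 / (2 * math.pi * n))
    return out

def fdb_draws(x, stat, b=200, seed=0):
    """Classical frequency-domain bootstrap: rescale each periodogram
    ordinate by an i.i.d. standard-exponential weight and recompute the
    statistic on the bootstrapped ordinates."""
    rng = random.Random(seed)
    per = periodogram(x)
    return [stat([i * rng.expovariate(1.0) for i in per]) for _ in range(b)]

# Spectral mean: 2*pi times the average ordinate; roughly the process
# variance for white noise.
spectral_mean = lambda per: 2 * math.pi * sum(per) / len(per)

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(256)]
draws = fdb_draws(x, spectral_mean)
boot_mean = sum(draws) / len(draws)
```

Because the exponential weights have mean one, the bootstrap distribution is centered near the observed statistic; the multivariate failure the paper addresses concerns its variance, which misses the fourth-order cumulant contribution.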

Read more
Methodology

A General Bayesian Model for Heteroskedastic Data with Fully Conjugate Full-Conditional Distributions

Models for heteroskedastic data are relevant in a wide variety of applications ranging from financial time series to environmental statistics. However, the topic of modeling the variance function conditionally has not received nearly as much attention as modeling the mean. Volatility models have been used in specific applications, but these models can be difficult to fit in a Bayesian setting due to posterior distributions that are challenging to sample from efficiently. In this work, we introduce a general model for heteroskedastic data. This approach models the conditional variance in a mixed-model framework as a function of any desired covariates or random effects. We rely on new distribution theory to construct priors that yield fully conjugate full-conditional distributions. Thus, our approach can easily be fit via Gibbs sampling. Furthermore, we extend the model to a deep learning approach that can provide highly accurate estimates for time-dependent data. We also provide an extension for heavy-tailed data. We illustrate our methodology via three applications: the first utilizes a high-dimensional soil dataset with inherent spatial dependence, the second involves modeling of asset volatility, and the third focuses on clinical trial data for creatinine.
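The practical payoff of fully conjugate full conditionals is that Gibbs sampling needs no tuning or accept/reject steps. As a hedged illustration only (this is a toy homoskedastic normal model, not the paper's conditional-variance mixed model), here is a sampler in which both full conditionals are available in closed form:

```python
import random

def gibbs_normal(y, iters=2000, burn=500, tau2=100.0, a=2.0, b=2.0, seed=0):
    """Gibbs sampler for y_i ~ N(mu, sig2) with conjugate priors
    mu ~ N(0, tau2) and sig2 ~ Inverse-Gamma(a, b).  Both full
    conditionals are closed-form, so every update is a direct draw."""
    rng = random.Random(seed)
    n, s = len(y), sum(y)
    mu, sig2 = 0.0, 1.0
    mus, sig2s = [], []
    for it in range(iters):
        # mu | sig2, y  ~  N(m, v)
        v = 1.0 / (n / sig2 + 1.0 / tau2)
        m = v * s / sig2
        mu = rng.gauss(m, v ** 0.5)
        # sig2 | mu, y  ~  Inverse-Gamma(a + n/2, b + SSE/2)
        sse = sum((yi - mu) ** 2 for yi in y)
        sig2 = 1.0 / rng.gammavariate(a + n / 2.0, 1.0 / (b + sse / 2.0))
        if it >= burn:
            mus.append(mu)
            sig2s.append(sig2)
    return mus, sig2s

rng = random.Random(42)
y = [rng.gauss(5.0, 2.0) for _ in range(200)]
mus, sig2s = gibbs_normal(y)
post_mu = sum(mus) / len(mus)
post_sig2 = sum(sig2s) / len(sig2s)
```

The paper's contribution is precisely to recover this kind of direct-draw structure when the variance itself is regressed on covariates and random effects, where conjugacy is normally lost.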

Read more
Methodology

A General Framework of Online Updating Variable Selection for Generalized Linear Models with Streaming Datasets

In big data research, an important issue is how to recover the sequentially changing set of true features when data sets arrive sequentially. This paper presents a general framework for online updating variable selection and parameter estimation in generalized linear models with streaming datasets. The framework is a type of online updating penalized likelihood that accommodates both differentiable and non-differentiable penalty functions. An online updating coordinate descent algorithm is proposed to solve the resulting optimization problem, and a tuning parameter selection procedure that also operates in an online updating fashion is suggested. Selection and estimation consistency and the oracle property are established theoretically. Our methods are further examined and illustrated by various numerical examples from both simulation experiments and a real data analysis.
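The core idea behind online updating for penalized linear-type problems is that the objective depends on the stream only through low-dimensional summary statistics, so raw batches can be discarded after absorption. A minimal sketch for the lasso special case (the paper covers GLMs and general penalties; all names here are illustrative, not the paper's algorithm):

```python
import random

def soft(z, lam):
    """Soft-threshold operator for the l1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

class OnlineLasso:
    """Absorb the stream into the sufficient statistics A = X'X and
    c = X'y batch by batch; coordinate descent then runs on (A, c) alone,
    never revisiting earlier data."""
    def __init__(self, p):
        self.p = p
        self.A = [[0.0] * p for _ in range(p)]
        self.c = [0.0] * p

    def update(self, X, y):
        for xi, yi in zip(X, y):
            for j in range(self.p):
                self.c[j] += xi[j] * yi
                for k in range(self.p):
                    self.A[j][k] += xi[j] * xi[k]

    def solve(self, lam, sweeps=200):
        b = [0.0] * self.p
        for _ in range(sweeps):
            for k in range(self.p):
                r = self.c[k] - sum(self.A[k][j] * b[j]
                                    for j in range(self.p) if j != k)
                b[k] = soft(r, lam) / self.A[k][k]
        return b

rng = random.Random(0)
model = OnlineLasso(p=3)
for _ in range(3):                       # three arriving batches
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
    y = [2.0 * xi[0] - 1.5 * xi[1] + rng.gauss(0, 0.5) for xi in X]
    model.update(X, y)
beta = model.solve(lam=60.0)             # inactive feature is shrunk to zero
```

For GLMs the summaries are no longer exact sufficient statistics, which is where the paper's online updating approximations come in.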

Read more
Methodology

A Generative Approach to Joint Modeling of Quantitative and Qualitative Responses

In many scientific areas, data with quantitative and qualitative (QQ) responses are commonly encountered together with a large number of predictors. To exploit the association between QQ responses, existing approaches often consider a joint model of the QQ responses given the predictor variables. However, the dependence among the predictor variables also provides useful information for modeling QQ responses. In this work, we propose a generative approach that models the joint distribution of the QQ responses and the predictors. The proposed generative model admits efficient parameter estimation under a penalized likelihood framework, and it achieves accurate classification for the qualitative response and accurate prediction for the quantitative response with efficient computation. Because of the generative framework, the asymptotic optimality of the proposed method's classification and prediction can be established under some regularity conditions. The performance of the proposed method is examined through simulations and real case studies in material science and genetics.
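To make the generative idea concrete, here is a deliberately tiny joint model of one qualitative response, one quantitative response, and one predictor (this is a toy stand-in with invented names, not the paper's penalized high-dimensional model): the full joint distribution is fitted, classification comes from Bayes' rule, and prediction averages class-conditional means under the posterior.

```python
import math
import random

def npdf(x, m, s):
    """Normal density N(m, s^2) evaluated at x."""
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

class ToyGenerativeQQ:
    """Joint model: z ~ Bernoulli(pi), x | z ~ N(mx[z], sx[z]^2),
    y | z ~ N(my[z], .).  Classification uses P(z | x); prediction uses
    E[y | x] = sum_z P(z | x) * my[z]."""
    def fit(self, data):
        g = {0: [], 1: []}
        for z, x, y in data:
            g[z].append((x, y))
        n = len(data)
        self.pi, self.mx, self.sx, self.my = {}, {}, {}, {}
        for z, pts in g.items():
            xs = [p[0] for p in pts]
            ys = [p[1] for p in pts]
            m = sum(xs) / len(xs)
            self.pi[z] = len(pts) / n
            self.mx[z] = m
            self.sx[z] = max(1e-6, (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5)
            self.my[z] = sum(ys) / len(ys)
        return self

    def posterior(self, x):
        w = {z: self.pi[z] * npdf(x, self.mx[z], self.sx[z]) for z in self.pi}
        tot = sum(w.values())
        return {z: wz / tot for z, wz in w.items()}

    def classify(self, x):
        p = self.posterior(x)
        return max(p, key=p.get)

    def predict_y(self, x):
        p = self.posterior(x)
        return sum(p[z] * self.my[z] for z in p)

rng = random.Random(7)
data = []
for _ in range(500):
    z = rng.randrange(2)
    data.append((z, rng.gauss(3.0 * z, 1.0), rng.gauss(5.0 + 4.0 * z, 1.0)))
model = ToyGenerativeQQ().fit(data)
```

Modeling the predictor distribution (here `x | z`) is exactly the extra information a conditional-only model discards.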

Read more
Methodology

A Hierarchical Meta-Analysis for Settings Involving Multiple Outcomes across Multiple Cohorts

Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioral deficits. However, there is virtually no information in the scientific literature regarding the levels of PAE associated with an increased risk of clinically significant adverse effects. Between 1975 and 1993, several prospective longitudinal cohort studies were conducted in the U.S. in which maternal reports regarding alcohol use were obtained during pregnancy and the cognitive development of the offspring was assessed from early childhood through early adulthood. The sample sizes in these cohorts did not provide sufficient power to examine effects associated with different levels and patterns of PAE. To address this critical public health issue, we have developed a hierarchical meta-analysis to synthesize information regarding the effects of PAE on cognition, integrating data on multiple endpoints from six U.S. longitudinal cohort studies. Our approach involves estimating the dose-response coefficients for each endpoint and then pooling these correlated dose-response coefficients to obtain an estimated `global' effect of exposure on cognition. In the first stage, we use individual participant data to derive estimates of the effects of PAE by fitting regression models that adjust for potential confounding variables using propensity scores. The correlation matrix characterizing the dependence between the endpoint-specific dose-response coefficients estimated within each cohort is then estimated, while accommodating incomplete information on some endpoints. We also compare and discuss inferences based on the proposed approach relative to inferences based on a full multivariate analysis.
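The pooling step can be illustrated with the standard generalized-least-squares combination of correlated estimates: given endpoint-specific coefficients with covariance matrix Sigma, the common-effect estimate is theta = (1' Sigma^-1 1)^-1 1' Sigma^-1 beta, with variance (1' Sigma^-1 1)^-1. This is a minimal sketch of that textbook formula, not the paper's full hierarchical model (which also handles missing endpoints and the cross-cohort level):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def pooled_effect(beta, Sigma):
    """GLS pooling of correlated endpoint-specific dose-response
    coefficients into one 'global' effect."""
    w = solve(Sigma, [1.0] * len(beta))       # Sigma^-1 1
    denom = sum(w)                            # 1' Sigma^-1 1
    theta = sum(wi * bi for wi, bi in zip(w, beta)) / denom
    return theta, 1.0 / denom

# Independent endpoints: inverse-variance weighting.
theta, var = pooled_effect([1.0, 2.0], [[1.0, 0.0], [0.0, 4.0]])
# Positively correlated endpoints: pooled variance exceeds the naive one.
theta_c, var_c = pooled_effect([2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]])
```

Ignoring the correlation between endpoints (using only the diagonal of Sigma) would understate the variance of the pooled effect, which is why estimating that correlation matrix is a central step in the approach.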

Read more
Methodology

A Joint MLE Approach to Large-Scale Structured Latent Attribute Analysis

Structured Latent Attribute Models (SLAMs) are a family of discrete latent variable models widely used in education, psychology, and epidemiology to model multivariate categorical data. A SLAM assumes that multiple discrete latent attributes explain the dependence of observed variables in a highly structured fashion. Usually, the maximum marginal likelihood estimation approach is adopted for SLAMs, treating the latent attributes as random effects. The increasing scope of modern assessment data involves large numbers of observed variables and high-dimensional latent attributes. This poses challenges to classical estimation methods and requires new methodology and understanding of latent variable modeling. Motivated by this, we consider the joint maximum likelihood estimation (MLE) approach to SLAMs, treating latent attributes as fixed unknown parameters. We investigate estimability, consistency, and computation in the regime where sample size, number of variables, and number of latent attributes all can diverge. We establish the statistical consistency of the joint MLE and propose efficient algorithms that scale well to large-scale data for several popular SLAMs. Simulation studies demonstrate the superior empirical performance of the proposed methods. An application to real data from an international educational assessment gives interpretable findings of cognitive diagnosis.

Read more
Methodology

A Kernel-Based Neural Network for High-dimensional Genetic Risk Prediction Analysis

Risk prediction capitalizing on emerging human genome findings holds great promise for new prediction and prevention strategies. While the large amounts of genetic data generated from high-throughput technologies offer us a unique opportunity to study a deep catalog of genetic variants for risk prediction, the high dimensionality of genetic data and the complex relationships between genetic variants and disease outcomes bring tremendous challenges to risk prediction analysis. To address these challenges, we propose a kernel-based neural network (KNN) method. KNN inherits features from both linear mixed models (LMM) and classical neural networks and is designed for high-dimensional risk prediction analysis. To deal with datasets with millions of variants, KNN summarizes genetic data into kernel matrices and uses these kernel matrices as inputs. Based on the kernel matrices, KNN builds a single-layer feedforward neural network, which makes it feasible to consider complex relationships between genetic variants and disease outcomes. The parameter estimation in KNN is based on MINQUE, and we show that, under certain conditions, the average prediction error of KNN can be smaller than that of LMM. Simulation studies also confirm these results.
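The dimension-reduction step — collapsing an n × p genotype matrix with huge p into an n × n kernel — can be sketched directly. The sketch below pairs a linear genetic-relatedness kernel with kernel-ridge fitting, whose fitted values coincide with the BLUP of a linear mixed model with genetic covariance proportional to K; this is a hedged stand-in for the LMM side of KNN, not the paper's MINQUE-based network training.

```python
import random

def linear_kernel(G):
    """Genetic relatedness kernel K = G G' / p from an n x p genotype
    matrix (entries 0/1/2 minor-allele counts); p variants collapse into
    an n x n input."""
    n, p = len(G), len(G[0])
    return [[sum(G[i][k] * G[j][k] for k in range(p)) / p for j in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kernel_fit(K, y, lam):
    """alpha = (K + lam I)^-1 y; fitted values K alpha (kernel ridge,
    i.e. the BLUP of an LMM with genetic covariance proportional to K)."""
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, y)
    fitted = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    return alpha, fitted

rng = random.Random(5)
n, p = 40, 30
G = [[rng.choice([0, 1, 2]) for _ in range(p)] for _ in range(n)]
y = [0.8 * G[i][0] - 0.6 * G[i][1] + rng.gauss(0, 0.5) for i in range(n)]
K = linear_kernel(G)
alpha, fitted = kernel_fit(K, y, lam=0.5)
ybar = sum(y) / n
sse_fit = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sse_mean = sum((yi - ybar) ** 2 for yi in y)
```

KNN then feeds such kernel matrices into a single-layer feedforward network rather than stopping at the linear BLUP, which is what allows nonlinear variant-outcome relationships.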

Read more
Methodology

A Latent Variable Model for Relational Events with Multiple Receivers

Directional relational event data, such as email data, often include multiple receivers for each event. Statistical methods for adequately modeling such data, however, are limited. In this article, a multiplicative latent factor model is proposed for relational event data with multiple receivers. For a given event (or message), every potential receiver actor is given a suitability score, and when this score exceeds a sender-specific threshold value, the actor is added to the receiver set. The suitability score of a receiver actor for a given message can depend on observed sender- and receiver-specific characteristics and on the latent variables of the sender, the receiver, and the message. One way to view these latent variables is as the degrees to which specific unobserved topics are present — topics on which an actor can be active as a sender or a receiver, or which are relevant for a given message. Bayesian estimation of the model is relatively straightforward due to the Gaussian distribution of the latent suitability scale. The applicability of the model is illustrated on simulated data and on Enron email data, for which about a third of the messages have at least two receivers.
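The threshold rule described above is simple to state in code. Here is a minimal deterministic sketch (all latent vectors, covariate effects, and names are invented for illustration; in the model itself these are random quantities estimated in a Bayesian way):

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def receiver_set(sender_latent, msg_latent, threshold, candidates):
    """candidates: {actor: (receiver_latent, covariate_effect)}.  An actor
    joins the receiver set when its suitability score -- the sender-receiver
    and message-receiver latent products plus an observed-covariate term --
    exceeds the sender-specific threshold."""
    out = set()
    for name, (v, x) in candidates.items():
        score = dot(sender_latent, v) + dot(msg_latent, v) + x
        if score > threshold:
            out.add(name)
    return out

who = receiver_set(
    sender_latent=(1.0, 0.0),
    msg_latent=(0.0, 1.0),
    threshold=1.0,
    candidates={
        "alice": ((0.2, 0.1), 0.0),   # score 0.3 -> excluded
        "bob":   ((0.8, 0.5), 0.2),   # score 1.5 -> included
        "carol": ((0.1, 1.2), 0.0),   # score 1.3 -> included
    },
)
```

Because any number of candidates can clear the threshold, the model naturally produces receiver sets of size one, two, or more for the same event.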

Read more
Methodology

A New Framework for Inference on Markov Population Models

In this work we construct a joint Gaussian likelihood for approximate inference on Markov population models. We demonstrate that Markov population models can be approximated by a system of linear stochastic differential equations with time-varying coefficients, and we show that this system of stochastic differential equations converges to a set of ordinary differential equations. We derive our proposed joint Gaussian deterministic limiting approximation (JGDLA) model from the limiting system of ordinary differential equations. The result is a method for inference on Markov population models that relies solely on the solution to a system of deterministic equations. We show that our method requires no stochastic infill and exhibits improved predictive power in comparison to the Euler-Maruyama scheme on simulated susceptible-infected-recovered (SIR) data sets. We use the JGDLA to fit a stochastic susceptible-exposed-infected-recovered (SEIR) system to the Diamond Princess COVID-19 cruise ship data set.
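The limiting relationship the abstract invokes — a diffusion approximation to the Markov jump process whose noise vanishes as the population size N grows, leaving an ODE — can be illustrated on the SIR example. The sketch below integrates the deterministic limit alongside an Euler-Maruyama path of the diffusion approximation (the comparator scheme in the paper, not JGDLA itself); the shared noise increments keep S + I + R = 1 exactly.

```python
import math
import random

def sir_paths(beta=0.5, gamma=0.25, N=10000, i0=0.01, dt=0.05, T=100, seed=3):
    """Euler scheme for the deterministic (ODE) limit of the stochastic SIR
    model, alongside an Euler-Maruyama path of its diffusion approximation
    with event-driven noise of order 1/sqrt(N)."""
    rng = random.Random(seed)
    s, i, r = 1.0 - i0, i0, 0.0        # ODE state (proportions)
    S, I, R = s, i, r                  # stochastic state
    for _ in range(int(T / dt)):
        # Deterministic limit: ds = -a dt, di = (a - b) dt, dr = b dt
        a, b = beta * s * i, gamma * i
        s, i, r = s - a * dt, i + (a - b) * dt, r + b * dt
        # Euler-Maruyama: same drift, plus noise tied to each transition
        a2 = beta * max(S, 0.0) * max(I, 0.0)
        b2 = gamma * max(I, 0.0)
        dW1 = rng.gauss(0.0, math.sqrt(dt))   # infection events
        dW2 = rng.gauss(0.0, math.sqrt(dt))   # recovery events
        dS = -a2 * dt - math.sqrt(a2 / N) * dW1
        dI = (a2 - b2) * dt + math.sqrt(a2 / N) * dW1 - math.sqrt(b2 / N) * dW2
        dR = b2 * dt + math.sqrt(b2 / N) * dW2
        S, I, R = S + dS, I + dI, R + dR
    return (s, i, r), (S, I, R)

ode, em = sir_paths()
```

Note that the Euler-Maruyama path must be simulated at every infill time point, whereas a method built on the deterministic limit needs only the ODE solution — the practical advantage claimed for the JGDLA.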

Read more
Methodology

A New Method to Determine the Presence of Continuous Variation in Parameters of Biological Growth Curve Models

Quantitative assessment of the growth of biological organisms has produced many mathematical equations, and much effort has been devoted to statistically identifying the correct growth model from experimental data. Every growth equation is unique in terms of its mathematical structure; however, one model may serve as a close approximation of another under an appropriate choice of the parameter(s). It remains a challenging problem to select the best estimating model from a set of model equations whose shapes are similar in nature. Our aim in this manuscript is to develop methodology that reduces the effort involved in model selection. This is achieved by utilizing an existing model selection criterion in an innovative way that substantially reduces the number of model fitting exercises. We show that one model can be obtained from another by choosing a suitable continuous transformation of the parameters. This idea builds an interconnection between many equations that are scattered in the literature. We also obtain several new growth equations, a large number of which can be derived from a few key models. Given a set of training data points and the key models, we utilize the idea of the interval-specific rate parameter (ISRP) proposed by Bhowmick et al. (2014) to obtain a suitable mathematical model for the data. The ISRP profile of the parameters of simpler models indicates the nature of the variation in parameters over time, thus enabling the experimenter to extrapolate the inference to more complex models. Our proposed methodology significantly reduces the effort involved in model fitting exercises. The proposed idea is verified using simulated and real data sets. In addition, theoretical justifications are provided by investigating the statistical properties of the estimators.
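The ISRP diagnostic can be sketched for the simplest case: under the exponential model x(t) = x0 * exp(r t), the interval-specific rate over each unit interval reduces to log(x[t+1] / x[t]) / dt. A flat profile supports the exponential model; a systematically drifting profile indicates continuous variation in the parameter and points toward a richer model. This is a hedged one-parameter illustration of the idea, not the full methodology:

```python
import math

def isrp_exponential(x, dt=1.0):
    """Interval-specific rate parameter (ISRP) of the exponential model:
    r_hat(t) = log(x[t+1] / x[t]) / dt for each observation interval.
    Constant profile -> exponential growth; decreasing profile ->
    continuously varying rate (e.g. logistic-type saturation)."""
    return [math.log(x[i + 1] / x[i]) / dt for i in range(len(x) - 1)]

# Exact exponential data: the ISRP profile is flat at r = 0.3.
exp_data = [2.0 * math.exp(0.3 * t) for t in range(10)]
# Logistic data: the exponential-model ISRP declines steadily with time.
log_data = [100.0 / (1.0 + 9.0 * math.exp(-0.5 * t)) for t in range(10)]
r_exp = isrp_exponential(exp_data)
r_log = isrp_exponential(log_data)
```

Reading the shape of such profiles is what lets the experimenter choose among similarly shaped candidate curves without fitting every model in the catalog.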

Read more
