Featured Research

Applications

Assortativity measures for weighted and directed networks

Assortativity measures the tendency of vertices in a network to connect to other vertices with similar (or dissimilar) values of some vertex-specific feature. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteristics and structure of weighted and directed networks more precisely. The vertex-to-vertex strength correlation is used as an example, but the proposed measure can be applied to any pair of vertex-specific features. The effectiveness of the proposed measure is assessed through extensive simulations based on prevalent random network models, in comparison with existing assortativity measures. In an application to World Input-Output Networks, the new measures reveal interesting insights that would not be obtained with existing ones. An implementation is publicly available in the R package "wdnet".
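One natural way to build a strength-based assortativity coefficient for a weighted, directed network is an edge-weighted Pearson correlation between a feature of each edge's source vertex and a feature of its target vertex (for example, source out-strength against target in-strength; the four source/target combinations out-out, out-in, in-out, in-in are all possible). The sketch below is a minimal illustration under that assumption; it is not claimed to match the exact definition in the paper or the implementation in "wdnet", and the function names are hypothetical.

```python
import math
from collections import defaultdict

def strengths(edges):
    """Out- and in-strength of every vertex from (source, target, weight) triples."""
    out_s, in_s = defaultdict(float), defaultdict(float)
    for u, v, w in edges:
        out_s[u] += w
        in_s[v] += w
    return out_s, in_s

def weighted_corr(xs, ys, ws):
    """Weighted Pearson correlation of paired values xs, ys with weights ws."""
    total = sum(ws)
    mx = sum(w * x for x, w in zip(xs, ws)) / total
    my = sum(w * y for y, w in zip(ys, ws)) / total
    cov = sum(w * (x - mx) * (y - my) for x, y, w in zip(xs, ys, ws))
    vx = sum(w * (x - mx) ** 2 for x, w in zip(xs, ws))
    vy = sum(w * (y - my) ** 2 for y, w in zip(ys, ws))
    if vx == 0 or vy == 0:
        raise ValueError("feature has zero weighted variance across edges")
    return cov / math.sqrt(vx * vy)

def weighted_assortativity(edges, source_feature="out", target_feature="in"):
    """Correlate a strength feature of each edge's source with one of its target,
    weighting every edge by its own weight (hypothetical sketch)."""
    out_s, in_s = strengths(edges)
    pick = {"out": out_s, "in": in_s}
    xs = [pick[source_feature][u] for u, _, _ in edges]
    ys = [pick[target_feature][v] for _, v, _ in edges]
    ws = [w for _, _, w in edges]
    return weighted_corr(xs, ys, ws)

edges = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "a", 0.5), ("a", "c", 1.5)]
r_out_in = weighted_assortativity(edges, "out", "in")
```

Weighting by edge weight means heavy edges dominate the coefficient, which is the main behavioral difference from the classical unweighted, degree-based coefficient.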

Applications

Asymmetric Tobit analysis for correlation estimation from censored data

Contamination of water resources with pathogenic microorganisms excreted in human feces is a worldwide public health concern. Surveillance of fecal contamination is commonly performed by routinely monitoring one or a few types of microorganisms. To design a feasible routine for periodic monitoring and to control the risk of exposure to pathogens, reliable statistical algorithms for inferring correlations between the concentrations of microorganisms in water need to be established. Moreover, because pathogens are often present in low concentrations, some concentrations are likely to fall below the detection limit. This yields a pairwise left-censored dataset and complicates the computation of correlation coefficients. Correlation estimation errors can be reduced by imputing the undetected values more accurately. To obtain better imputations, we utilize side information and develop a new technique, the asymmetric Tobit model, an extension of the Tobit model that allows domain knowledge to be exploited effectively when fitting the model to a censored dataset. The empirical results demonstrate that imputation with domain knowledge is effective for this task.
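The asymmetric Tobit model extends the standard Tobit model, so it helps to see what the standard one does with a detection limit. The sketch below is a minimal, assumption-labeled illustration of the ordinary left-censored normal (Tobit) log-likelihood and of the conditional-mean imputation E[Y | Y < limit] for an undetected value; it does not reproduce the asymmetric extension or its use of side information, and the function names are hypothetical.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def tobit_loglik(values, censored, mu, sigma, limit):
    """Log-likelihood of a left-censored normal (standard Tobit) sample.

    values[i] is used only when censored[i] is False; a censored entry only
    says the true value lies below `limit` (the detection limit).
    """
    ll = 0.0
    for y, c in zip(values, censored):
        if c:
            ll += math.log(norm_cdf((limit - mu) / sigma))
        else:
            ll += norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return ll

def impute_below_limit(mu, sigma, limit):
    """Conditional mean E[Y | Y < limit] for Y ~ N(mu, sigma^2), used to fill an undetected value."""
    a = (limit - mu) / sigma
    return mu - sigma * math.exp(norm_logpdf(a)) / norm_cdf(a)
```

Plugging fitted (mu, sigma) into `impute_below_limit` gives a value strictly below the detection limit instead of an ad hoc constant such as the limit itself or zero.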

Applications

Automated Vehicle Crash Sequences: Patterns and Potential Uses in Safety Testing

With safety being one of the primary motivations for developing automated vehicles (AVs), extensive field and simulation tests are being carried out to ensure that AVs can operate safely on roadways. Since 2014, the California DMV has been collecting AV collision and disengagement reports, which are valuable data sources for studying AV crash patterns. In this study, crash sequence data extracted from California AV collision reports were used to investigate crash patterns and how they may be used to develop AV test scenarios. Using sequence analysis, this study evaluated 168 AV crashes from 2015 to 2019 in which the AV was in automatic driving mode before disengagement or collision. Subsequence analysis showed that the most representative pattern in AV crashes was a collision following an AV stop. Event-transition analysis showed that disengagement, which occurred in 24 percent of all studied AV crash sequences, was followed immediately by a collision with a transition probability of 68 percent. Cluster analysis sorted AV crash sequences into seven groups with distinctive crash dynamics. Cross-tabulation analysis showed that the sequence groups were significantly associated with variables measuring crash outcomes and describing environmental conditions. Crash sequences are useful for developing AV test scenarios. Based on these findings, a scenario-based AV safety testing framework was proposed with the sequence of events embedded as a core component.
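A transition probability such as "disengagement is followed immediately by a collision 68 percent of the time" is a first-order estimate: count how often event A is immediately followed by event B, then divide by how often A is followed by anything. The sketch below shows that calculation; the event labels and sequences are hypothetical and are not taken from the California DMV reports, and the study's own event coding may differ.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """First-order transition probabilities P(next event | current event)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each event with the one that immediately follows it.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }

# Hypothetical crash sequences, one list of events per crash.
crashes = [
    ["moving", "disengagement", "collision"],
    ["moving", "stop", "collision"],
    ["disengagement", "stop", "collision"],
]
P = transition_probabilities(crashes)
```

Here `P["disengagement"]["collision"]` is 0.5, because one of the two events following a disengagement in these toy sequences is a collision.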

Applications

BVAR-Connect: A Variational Bayes Approach to Multi-Subject Vector Autoregressive Models for Inference on Brain Connectivity Networks

In this paper we propose BVAR-Connect, a variational inference approach to a Bayesian multi-subject vector autoregressive (VAR) model for inference on effective brain connectivity from resting-state functional MRI data. The modeling framework uses a Bayesian variable selection approach that flexibly integrates multi-modal data, in particular structural diffusion tensor imaging (DTI) data, into the prior construction. The variational inference approach we develop makes the method scalable, so that subject- and group-level brain connectivity networks can be estimated over whole-brain parcellations of the data. We briefly describe a user-friendly MATLAB GUI released for public use. We assess performance on simulated data, where we show that the proposed inference method can achieve accuracy comparable to the sampling-based Markov chain Monte Carlo (MCMC) approach at a much lower computational cost. We also address the case of subject groups with imbalanced sample sizes. Finally, we illustrate the methods on resting-state functional MRI and structural DTI data from children with a history of traumatic injury.
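In a VAR(1) model, y_t = A y_{t-1} + e_t, a nonzero entry A[i, j] means that the past activity of region j helps predict region i, which is how effective (directed) connectivity is read off the model. The sketch below simulates a two-region VAR(1) and recovers A by ordinary least squares with NumPy. This is a plain frequentist baseline, not the Bayesian variable selection or variational inference used by BVAR-Connect, and the 0.1 threshold for declaring an edge is an ad hoc assumption.

```python
import numpy as np

def simulate_var1(A, T, noise_sd=1.0, seed=0):
    """Simulate y_t = A y_{t-1} + e_t with independent Gaussian noise."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    Y = np.zeros((T, p))
    for t in range(1, T):
        Y[t] = A @ Y[t - 1] + noise_sd * rng.standard_normal(p)
    return Y

def fit_var1_ols(Y):
    """Least-squares estimate of A from the regression Y[1:] ~ Y[:-1] @ A.T."""
    past, now = Y[:-1], Y[1:]
    At, *_ = np.linalg.lstsq(past, now, rcond=None)
    return At.T

A_true = np.array([[0.5, 0.3],
                   [0.0, 0.4]])  # region 2 drives region 1; no edge from 1 to 2
Y = simulate_var1(A_true, T=5000)
A_hat = fit_var1_ols(Y)
edges = np.abs(A_hat) > 0.1  # ad hoc threshold, not a Bayesian selection rule
```

A Bayesian variable selection prior replaces the hard threshold with posterior inclusion probabilities for each entry of A, which is where structural information such as DTI can enter.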

Applications

Bang the Can Slowly: An Investigation into the 2017 Houston Astros

This manuscript is a statistical investigation into the 2017 Major League Baseball scandal involving the Houston Astros, who won the World Series that same year. The Astros were alleged to have stolen their opponents' pitching signs in order to give their batters a potentially unfair advantage. This work finds compelling evidence that the Astros' on-field performance was significantly affected by their sign-stealing ploy and quantifies the effects. The three main findings in the manuscript are: 1) when the sign was stolen, the Astros' odds of swinging at a pitch were reduced by approximately 27% (OR: 0.725, 95% CI: (0.618, 0.850)); 2) when an Astros player swung at a non-fastball pitch, the odds of making contact with the ball increased by roughly 80% (OR: 1.805, 95% CI: (1.342, 2.675)); and 3) when the Astros made contact on a pitch for which the sign was known, the ball's exit velocity (launch speed) increased by an average of 2.386 miles per hour (95% CI: (0.334, 4.451)).
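The percentages above follow from the odds ratios: the percent change in odds is (OR - 1) x 100, so OR 0.725 is a 27.5% reduction and OR 1.805 an 80.5% increase. The sketch below shows that conversion together with an unadjusted odds ratio and Wald 95% interval from a 2x2 table. The counts are hypothetical and this is not the manuscript's model, which may adjust for other factors; it only illustrates how such numbers are read.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a, b: events / non-events in the exposed group (e.g. sign stolen)
    c, d: events / non-events in the unexposed group
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)

def percent_change_in_odds(or_):
    """Express an odds ratio as a percent change in the odds."""
    return (or_ - 1) * 100

# Hypothetical counts: 10 swings / 20 takes with the sign stolen, 20 / 20 without.
or_hat, (lo, hi) = odds_ratio_wald_ci(10, 20, 20, 20)
```

Because the Wald interval is symmetric on the log scale, it is asymmetric around the OR itself, which is also why reported odds-ratio intervals usually look lopsided.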

Applications

Bayesian Beta-Binomial Prevalence Estimation Using an Imperfect Test

Following [Diggle 2011, Greenland 1995], we give a simple formula for the Bayesian posterior density of a prevalence parameter based on unreliable testing of a population. This problem is particularly important when the false positive rate of the test is close to the prevalence in the population being tested. An efficient Monte Carlo algorithm for approximating the posterior density is presented and applied to estimating the Covid-19 infection rate in Santa Clara County, CA, using the data reported in [Bendavid 2020]. We show that the true Bayesian posterior places considerably more mass near zero, resulting in a prevalence estimate of 5,000–70,000 infections (median: 42,000; 2.17%, 95% CI 0.27%–3.63%), compared to the estimate of 48,000–81,000 infections derived in [Bendavid 2020] using the delta method. A demonstration, with code and additional examples, is available at this http URL.
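With sensitivity Se and specificity Sp, a randomly tested person is positive with probability q = p·Se + (1 − p)(1 − Sp), so the binomial likelihood of k positives in n tests depends on the prevalence p only through q. The paper presents a Monte Carlo algorithm; the sketch below instead evaluates a Beta(a, b) prior times this likelihood on a simple grid and treats Se and Sp as known, both of which are simplifying assumptions. The numbers used are hypothetical, not the Santa Clara data.

```python
import math

def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    if x == 0:
        return 0.0
    return x * math.log(y) if y > 0 else -math.inf

def prevalence_posterior(k, n, sens, spec, a=1.0, b=1.0, grid_size=2001):
    """Grid posterior for prevalence p given k positives in n imperfect tests."""
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes a, b >= 1")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    logpost = []
    for p in grid:
        q = p * sens + (1 - p) * (1 - spec)           # apparent positive rate
        lp = _xlogy(a - 1, p) + _xlogy(b - 1, 1 - p)  # Beta(a, b) prior, up to a constant
        lp += _xlogy(k, q) + _xlogy(n - k, 1 - q)     # binomial likelihood, up to a constant
        logpost.append(lp)
    m = max(logpost)
    w = [math.exp(lp - m) for lp in logpost]          # subtract max to avoid underflow
    z = sum(w)
    return grid, [x / z for x in w]

# Hypothetical survey: 5 positives in 1000 tests, false positive rate 0.5%.
grid, post_imperfect = prevalence_posterior(5, 1000, sens=0.8, spec=0.995)
_, post_perfect = prevalence_posterior(5, 1000, sens=1.0, spec=1.0)
```

When the false positive rate (1 − Sp) is close to the observed positive fraction, the imperfect-test posterior piles up near p = 0, which is the effect the abstract describes.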

Applications

Bayesian Bi-clustering Methods with Applications in Computational Biology

Bi-clustering is a useful approach for analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach to tackling bi-clustering problems in moderate to high dimensions, and propose three Bayesian bi-clustering models for categorical data, of increasing complexity in how they model the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: from situations where the data are distinguishable into clusters only on a small subset of features, masked by a large amount of noise, to situations where different groups of data are identified by different sets of features or where the data exhibit hierarchical structures. Through simulation studies, we show that our methods outperform existing (bi-)clustering methods both in identifying clusters and in recovering feature distributional patterns across bi-clusters. We apply our methods to two genetic datasets, although the methods are applicable more broadly.
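A recurring building block in Bayesian models for categorical data is deciding whether a feature's distribution differs across clusters or is shared noise. The sketch below compares the two hypotheses with Dirichlet-multinomial marginal likelihoods and reports a log Bayes factor. It is a generic illustration under the assumption of a symmetric Dirichlet(alpha) prior, not any of the three models proposed in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter

def dm_log_marginal(labels, categories, alpha=1.0):
    """Log marginal likelihood of categorical labels under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(labels)
    k, n = len(categories), len(labels)
    out = math.lgamma(k * alpha) - math.lgamma(n + k * alpha)
    for c in categories:
        out += math.lgamma(counts[c] + alpha) - math.lgamma(alpha)
    return out

def feature_log_bayes_factor(feature, clusters, categories, alpha=1.0):
    """log BF of 'feature distribution differs by cluster' vs 'one shared distribution'."""
    by_cluster = {}
    for value, g in zip(feature, clusters):
        by_cluster.setdefault(g, []).append(value)
    separate = sum(dm_log_marginal(v, categories, alpha) for v in by_cluster.values())
    shared = dm_log_marginal(feature, categories, alpha)
    return separate - shared

clusters = [0, 0, 0, 0, 1, 1, 1, 1]
informative = ["A", "A", "A", "A", "B", "B", "B", "B"]  # tracks the clusters
noise = ["A", "B", "A", "B", "A", "B", "A", "B"]        # unrelated to the clusters
bf_informative = feature_log_bayes_factor(informative, clusters, ["A", "B"])
bf_noise = feature_log_bayes_factor(noise, clusters, ["A", "B"])
```

A positive log Bayes factor favors treating the feature as cluster-distinguishing; in a full bi-clustering sampler this kind of comparison would be made jointly with updating the cluster assignments.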

Applications

Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer

Graphical models are commonly used to discover associations within gene or protein networks for complex diseases such as cancer. Most existing methods estimate a single graph for a population, while in many cases researchers are interested in characterizing the heterogeneity of individual networks across subjects with respect to subject-level covariates. Examples include assessing how the network varies with patient-specific prognostic scores, or comparing tumor and normal graphs while accounting for tumor purity as a continuous predictor. In this paper, we propose a novel edge regression model for undirected graphs, which estimates conditional dependencies as a function of subject-level covariates. Bayesian shrinkage algorithms are used to induce sparsity in the underlying graphical models. We assess model performance through simulation studies that compare tumor and normal graphs while adjusting for tumor purity, and through a case study of how blood protein networks in hepatocellular carcinoma patients vary with disease severity, as measured by HepatoScore, a novel biomarker signature of disease severity.
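In a Gaussian graphical model, the conditional dependence between variables i and j is encoded by the precision-matrix entry ω_ij, with partial correlation ρ_ij = −ω_ij / √(ω_ii ω_jj); the edge is absent when ω_ij = 0. The sketch below illustrates the idea of an edge that depends on a subject-level covariate by letting ω_ij vary linearly with a covariate x (for example, tumor purity). The linear form, the coefficient values, and the fixed unit diagonal are hypothetical assumptions, not the paper's parameterization.

```python
import math

def edge_precision(x, beta0, beta1):
    """Hypothetical linear edge regression: off-diagonal precision entry at covariate x."""
    return beta0 + beta1 * x

def partial_correlation(omega_ij, omega_ii=1.0, omega_jj=1.0):
    """Partial correlation implied by a 2x2 precision block; requires positive definiteness."""
    if omega_ii <= 0 or omega_jj <= 0 or omega_ii * omega_jj - omega_ij ** 2 <= 0:
        raise ValueError("precision block is not positive definite")
    return -omega_ij / math.sqrt(omega_ii * omega_jj)

# Hypothetical coefficients: the dependence strengthens as the covariate increases.
beta0, beta1 = -0.2, -0.3
rho = {x: partial_correlation(edge_precision(x, beta0, beta1)) for x in (0.0, 0.5, 1.0)}
```

Under this toy form, a shrinkage prior that pulls both beta0 and beta1 to zero removes the edge for every subject, while shrinking only beta1 keeps a covariate-independent edge.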

Applications

Bayesian Functional Registration of fMRI Data

Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large inter-individual differences in both brain anatomy and functional localization after anatomical alignment remain a major limitation in conducting group analyses and performing population-level inference. This paper addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common reference map. Our proposed Bayesian functional registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. It combines intensity-based and feature-based information into an integrated framework and allows inference to be performed on the transformation via the posterior samples. We evaluate the method in a simulation study and apply it to data from a study of thermal pain. We find that the proposed approach provides increased sensitivity for group-level inference.
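The combination of intensity-based and feature-based information can be illustrated in a deliberately tiny setting. The sketch below places a posterior over one-dimensional circular shifts of a subject's signal, combining an intensity term (sum of squared differences to the template) with a feature term (distance between predicted and observed peak positions) under a flat prior. The real method works with spatial transformations of brain maps and posterior samples; the 1-D shift model, the noise scales `sigma` and `tau`, and the function name are hypothetical assumptions.

```python
import math

def shift_posterior(template, subject, shifts, sigma=0.5, tau=1.0):
    """Posterior over circular shifts combining intensity and landmark evidence (flat prior)."""
    n = len(template)
    t_peak = max(range(n), key=lambda i: template[i])
    s_peak = max(range(n), key=lambda i: subject[i])
    logpost = []
    for s in shifts:
        # Intensity term: undo shift s and compare values point by point.
        aligned = [subject[(i + s) % n] for i in range(n)]
        ssd = sum((a - t) ** 2 for a, t in zip(aligned, template))
        # Feature term: circular distance between observed and predicted peak.
        d = (s_peak - (t_peak + s)) % n
        d = min(d, n - d)
        logpost.append(-ssd / (2 * sigma ** 2) - d ** 2 / (2 * tau ** 2))
    m = max(logpost)
    w = [math.exp(lp - m) for lp in logpost]
    z = sum(w)
    return [x / z for x in w]

template = [0, 0, 1, 3, 1, 0, 0, 0]
subject = [template[(i - 2) % 8] for i in range(8)]  # template shifted by 2
post = shift_posterior(template, subject, list(range(8)))
```

Because the output is a distribution over transformations rather than a single best alignment, downstream group-level analyses can carry registration uncertainty forward, which is the motivation for a Bayesian treatment.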

Applications

Bayesian Hierarchical Models for High-Dimensional Mediation Analysis with Coordinated Selection of Correlated Mediators

We consider Bayesian high-dimensional mediation analysis to identify, among a large set of correlated potential mediators, the active ones that mediate the effect of an exposure variable on an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include activated voxels within connected regions in brain imaging data, regulatory signals driven by gene networks in genomic data, and correlated exposure data from the same source. When active mediators are correlated, a mediation analysis that fails to account for this correlation can be suboptimal and may lose power to identify active mediators. Building on a recent high-dimensional mediation analysis framework, we propose two Bayesian hierarchical models: one with a Gaussian mixture prior that enables correlated mediator selection, and one with a Potts mixture prior that accounts for the correlation among active mediators. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods effectively identify correlated active mediators that could be missed by existing methods assuming prior independence among active mediators. Applied to the LIFECODES birth cohort and the Multi-Ethnic Study of Atherosclerosis (MESA), the proposed methods identified new active mediators with important biological implications.
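In the common product-of-coefficients formulation (assumed here as the underlying framework), mediator j carries the effect of exposure x on outcome y through α_j (x to M_j) and β_j (M_j to y, adjusting for x and the other mediators), and it is active only when α_j·β_j ≠ 0. The sketch below simulates data and estimates each α_j·β_j with two least-squares fits in NumPy. It is a frequentist baseline with independent mediator noise, not the Gaussian or Potts mixture priors proposed in the paper; the effect sizes are hypothetical.

```python
import numpy as np

def product_of_coefficients(x, M, y):
    """Per-mediator indirect effects alpha_j * beta_j from two least-squares fits.

    Mediator model: M[:, j] = a0_j + alpha_j * x + noise
    Outcome model:  y = b0 + gamma * x + sum_j beta_j * M[:, j] + noise
    """
    n = len(x)
    X1 = np.column_stack([np.ones(n), x])
    alpha = np.linalg.lstsq(X1, M, rcond=None)[0][1]  # slope row, shape (p,)
    Z = np.column_stack([np.ones(n), x, M])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    gamma, beta = coef[1], coef[2:]  # direct effect, mediator effects
    return alpha * beta, gamma

rng = np.random.default_rng(1)
n = 5000
alpha_true = np.array([0.5, 0.5, 0.0, 0.4, 0.0])
beta_true = np.array([0.6, 0.0, 0.7, 0.5, 0.0])
x = rng.standard_normal(n)
M = np.outer(x, alpha_true) + rng.standard_normal((n, 5))
y = 0.3 * x + M @ beta_true + rng.standard_normal(n)
indirect, direct = product_of_coefficients(x, M, y)
```

Only mediators 1 and 4 are active here; mediator 2 responds to the exposure but not the outcome, and mediator 3 the reverse. Priors such as the paper's mixture priors aim to make that active/inactive call with shrinkage and to share information among correlated mediators, which this baseline does not do.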
