Featured Research

Applications

"Old Techniques for New Times": the RMaCzek package for producing Czekanowski's Diagrams

Inspired by the MaCzek Visual Basic program, we provide an R package, RMaCzek, that produces Czekanowski's diagrams. Our package supports any seriation and distance method the user provides. In this paper we focus on the OLO and QAP_2SUM methods from the seriation package. We illustrate the possibilities of our package with three anthropological studies, one socio-economic study, and a phylogenetically motivated simulation study.
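
A Czekanowski diagram orders observations so that similar ones sit together, then renders the permuted distance matrix with glyphs whose size reflects similarity. Below is a minimal Python sketch of the idea (the RMaCzek package itself is written in R; the hierarchical-clustering leaf order here is only a stand-in for the OLO and QAP_2SUM seriation methods mentioned above, and all data are synthetic).

```python
# Minimal Python sketch of a Czekanowski-style diagram (illustrative only;
# the RMaCzek package itself is R and uses OLO/QAP_2SUM from 'seriation').
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))              # toy observations

D = squareform(pdist(X))                  # pairwise Euclidean distances
order = leaves_list(linkage(pdist(X), method="average"))  # stand-in for OLO
D_ord = D[np.ix_(order, order)]

# A Czekanowski diagram bins distances and prints larger glyphs for
# more similar (closer) pairs of observations.
glyphs = np.array(list(" .oO@"))          # '@' = most similar, ' ' = least
bins = np.quantile(D_ord[np.triu_indices_from(D_ord, 1)], [0.2, 0.4, 0.6, 0.8])
idx = 4 - np.digitize(D_ord, bins)        # small distance -> large glyph
for row in glyphs[idx]:
    print("".join(row))
```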

Applications

A Bayesian Hierarchical Network for Combining Heterogeneous Data Sources in Medical Diagnoses

Computer-Aided Diagnosis has shown stellar performance in providing accurate medical diagnoses across multiple testing modalities (medical images, electrophysiological signals, etc.). While this field has typically focused on fully harvesting the signal provided by a single (and generally extremely reliable) modality, fewer efforts have utilized imprecise data lacking reliable ground truth labels. In this unsupervised, noisy setting, robustifying the diagnosis and quantifying its uncertainty become paramount, thus posing a new challenge: how can we combine multiple sources of information, often themselves with vastly varying levels of precision and uncertainty, to provide a diagnosis estimate with confidence bounds? Motivated by a concrete application in antibody testing, we devise a Stochastic Expectation-Maximization algorithm that allows the principled integration of heterogeneous, and potentially unreliable, data types. Our Bayesian formalism is essential in (a) flexibly combining these heterogeneous data sources and their corresponding levels of uncertainty, (b) quantifying the degree of confidence associated with a given diagnosis, and (c) dealing with the missing values that typically plague medical data. We quantify the potential of this approach on simulated data, and showcase its practicality by deploying it on a real COVID-19 immunity study.
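
A Stochastic EM algorithm of the general kind described above alternates between imputing the latent diagnosis for each subject given the current parameters (the stochastic step) and re-estimating the parameters from the completed data (the maximization step). The sketch below applies this to a deliberately simplified setting, two binary tests with unknown sensitivity and specificity and no missing data, with all numbers invented; the paper's model is far richer.

```python
# Hedged sketch of a Stochastic EM for a latent binary diagnosis observed
# through two noisy binary tests (illustrative; the paper's model also handles
# continuous modalities, missing values, and full uncertainty quantification).
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z_true = rng.random(n) < 0.3                       # latent immunity status
sens, spec = np.array([0.9, 0.8]), np.array([0.95, 0.85])
tests = np.column_stack([
    np.where(z_true, rng.random(n) < sens[k], rng.random(n) < 1 - spec[k])
    for k in range(2)
]).astype(int)

pi, se, sp = 0.5, np.array([0.7, 0.7]), np.array([0.7, 0.7])
for _ in range(200):
    # Posterior probability of z_i = 1 given both test results
    lik1 = pi * np.prod(se**tests * (1 - se)**(1 - tests), axis=1)
    lik0 = (1 - pi) * np.prod((1 - sp)**tests * sp**(1 - tests), axis=1)
    p1 = lik1 / (lik1 + lik0)
    z = rng.random(n) < p1                         # S-step: impute latent labels
    # M-step: update parameters from the completed data
    pi = z.mean()
    se = tests[z].mean(axis=0)
    sp = 1 - tests[~z].mean(axis=0)

print(f"estimated prevalence {pi:.2f}, "
      f"sensitivities {se.round(2)}, specificities {sp.round(2)}")
```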

Applications

A Bayesian Model of Cash Bail Decisions

The use of cash bail as a mechanism for detaining defendants pre-trial is an often-criticized system that many have argued violates the presumption of "innocent until proven guilty." Many studies have sought to understand both the long-term effects of cash bail's use and the disparate rate of cash bail assignments along demographic lines (race, gender, etc.). However, such work is often susceptible to problems of infra-marginality: the data we observe can only describe average outcomes, not the outcomes associated with the marginal decision. In this work, we address this problem by creating a hierarchical Bayesian model of cash bail assignments. Specifically, our approach models cash bail decisions as a probabilistic process whereby judges balance the relative cost of assigning cash bail against the cost of defendants potentially skipping court dates, with the skip probabilities estimated from features of the individual case. We then use Monte Carlo inference to sample the distribution over these costs for different magistrates and across different races. We fit this model to a data set we have collected of over 50,000 court cases in Allegheny and Philadelphia counties in Pennsylvania. Our analysis of 50 separate judges shows that they are uniformly more likely to assign cash bail to black defendants than to white defendants, even given identical likelihoods of skipping a court appearance. This analysis raises further questions about the equity of the practice of cash bail, irrespective of its underlying legal justification.
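
As a toy illustration of the decision process described above, the sketch below encodes the assumed cost-balancing rule: bail is assigned when the expected cost of a skipped court date exceeds the cost of assigning bail, with the skip probability given by a logistic model on case features. All coefficients, costs, and data are hypothetical; the paper infers the cost distributions per judge via Monte Carlo methods, which this sketch does not do.

```python
# Minimal sketch of a cost-balancing bail decision rule (assumed form).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=(n, 3))              # case features (prior record, severity, ...)
beta = np.array([0.8, -0.5, 0.3])        # skip-probability coefficients (hypothetical)
p_skip = sigmoid(x @ beta - 1.0)

c_skip, c_bail = 10.0, 2.0               # judge-specific relative costs (latent in the paper)
assign_bail = p_skip * c_skip > c_bail   # bail assigned when expected skip cost dominates

print(f"bail assigned in {assign_bail.mean():.0%} of simulated cases")
```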

Applications

A Bayesian Nonparametric Analysis of the 2003 Outbreak of Highly Pathogenic Avian Influenza in the Netherlands

Infectious diseases on farms pose both public and animal health risks, so understanding how they spread between farms is crucial for developing disease control strategies to prevent future outbreaks. We develop novel Bayesian nonparametric methodology to fit spatial stochastic transmission models in which the infection rate between any two farms is a function that depends on the distance between them, but without assuming a specified parametric form. Making nonparametric inference in this context is challenging, since the likelihood function of the observed data is intractable because the underlying transmission process is unobserved. We adopt a fully Bayesian approach by assigning a transformed Gaussian Process prior distribution to the infection rate function, and then develop an efficient data augmentation Markov Chain Monte Carlo algorithm to perform Bayesian inference. We use the posterior predictive distribution to simulate the effect of different disease control methods and their economic impact. We analyse a large outbreak of Avian Influenza in the Netherlands and infer the between-farm infection rate, as well as the unknown infection status of farms that were pre-emptively culled. We use our results to analyse ring-culling strategies, and conclude that although effective, ring-culling has limited impact in high-density areas.
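
The prior at the heart of this model can be sketched directly: draw a function from a Gaussian process over distance and exponentiate it, so the infection rate is positive without committing to a parametric decay. The kernel, mean function, and numbers below are illustrative assumptions; the actual model fits this function with a data augmentation MCMC algorithm, which is not reproduced here.

```python
# Illustrative draw from a transformed Gaussian process prior: a GP over
# between-farm distance, pushed through exp() so the infection rate is
# positive and can decay flexibly with distance.
import numpy as np

def sq_exp_kernel(d1, d2, variance=1.0, lengthscale=2.0):
    return variance * np.exp(-0.5 * (d1[:, None] - d2[None, :])**2 / lengthscale**2)

rng = np.random.default_rng(3)
dist = np.linspace(0.0, 10.0, 50)                   # between-farm distance (km)
K = sq_exp_kernel(dist, dist) + 1e-8 * np.eye(50)   # jitter for stability
f = rng.multivariate_normal(mean=-1.0 - 0.3 * dist, cov=K)  # GP with decaying mean
rate = np.exp(f)                                    # transformed GP: positive rate

for d, r in zip(dist[::10], rate[::10]):
    print(f"distance {d:4.1f} km -> infection rate {r:.4f}")
```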

Applications

A Bayesian cohort component projection model to estimate adult populations at the subnational level in data-sparse settings

Accurate estimates of subnational populations are important for policy formulation and for monitoring population health indicators. For example, estimates of the number of women of reproductive age are important for understanding the population at risk of maternal mortality and unmet need for contraception. However, in many low-income countries, data on population counts and components of population change are limited, so subnational levels and trends are unclear. We present a Bayesian constrained cohort component model for the estimation and projection of subnational populations. The model builds on a cohort component projection framework, incorporates census data and estimates from the United Nations' World Population Prospects, and uses characteristic mortality schedules to obtain estimates of population counts and the components of population change, including internal migration. The data required as inputs to the model are minimal and available across a wide range of countries, including most low-income countries. The model is applied to estimate and project populations by county in Kenya for 1979-2019, and validated against the 2019 Kenyan census.
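
The cohort component bookkeeping that the Bayesian model builds on is simple to state: each cohort is survived forward one period, net migration is added, and births feed the youngest age group. The sketch below shows one deterministic projection step with made-up numbers; the paper's contribution is placing priors and constraints around exactly these quantities.

```python
# One projection step of a deterministic cohort component model.
# All numbers are invented for illustration.
import numpy as np

pop = np.array([120., 110., 100., 90., 70.])        # population by 5-year age group
survival = np.array([0.99, 0.98, 0.97, 0.95, 0.90]) # period survival probabilities
net_migration = np.array([2., 5., 4., 1., -1.])     # net in-migrants per group
fertility = np.array([0.0, 0.05, 0.10, 0.04, 0.0])  # births per person per period

def project_one_step(pop, survival, net_migration, fertility):
    nxt = np.empty_like(pop)
    nxt[1:] = pop[:-1] * survival[:-1]              # cohorts age into the next group
    nxt[-1] += pop[-1] * survival[-1]               # open-ended oldest age group
    births = (pop * fertility).sum()
    nxt[0] = births * survival[0]                   # newborns surviving the period
    return nxt + net_migration

print(project_one_step(pop, survival, net_migration, fertility).round(1))
```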

Applications

A Bayesian hierarchical model to estimate land surface phenology parameters with harmonized Landsat 8 and Sentinel-2 images

We develop a Bayesian Land Surface Phenology (LSP) model and examine its performance using Enhanced Vegetation Index (EVI) observations derived from the Harmonized Landsat Sentinel-2 (HLS) dataset. Building on previous work, we propose a double logistic function that, once couched within a Bayesian model, yields posterior distributions for all LSP parameters. We assess the efficacy of the Normal, Truncated Normal, and Beta likelihoods to deliver robust LSP parameter estimates. Two case studies are presented and used to explore aspects of the proposed model. The first, conducted over forested pixels within an HLS tile, explores choice of likelihood and space-time varying HLS data availability for long-term average LSP parameter point and uncertainty estimation. The second, conducted on a small area of interest within the HLS tile on an annual time-step, further examines the impact of sample size and choice of likelihood on LSP parameter estimates. Results indicate that while the Truncated Normal and Beta likelihoods are theoretically preferable when the vegetation index is bounded, all three likelihoods performed similarly when the number of index observations is sufficiently large and values are not near the index bounds. Both case studies demonstrate how pixel-level LSP parameter posterior distributions can be used to propagate uncertainty through subsequent analysis. As a companion to this article, we provide an open-source R package, rsBayes, and supplementary data and code used to reproduce the analysis results. The proposed model specification and software implementation deliver computationally efficient, statistically robust, and inferentially rich LSP parameter posterior distributions at the pixel level across massive raster time series datasets.
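
A common double logistic form for vegetation index trajectories is a baseline plus a greenup rise and a senescence decline; the paper's exact parameterization may differ, so the version below is an assumption. This sketch fits it to synthetic EVI observations by nonlinear least squares; the Bayesian model instead yields full posterior distributions for each parameter.

```python
# Point fit of an assumed double logistic LSP curve to synthetic EVI data.
import numpy as np
from scipy.optimize import curve_fit

def double_logistic(t, vmin, vamp, t_green, k_green, t_brown, k_brown):
    # Baseline plus a greenup rise and a senescence decline
    return vmin + vamp * (1 / (1 + np.exp(-k_green * (t - t_green)))
                          - 1 / (1 + np.exp(-k_brown * (t - t_brown))))

rng = np.random.default_rng(4)
doy = np.arange(1, 366, 8)                          # 8-day observation cadence
true = (0.2, 0.5, 120, 0.08, 280, 0.06)
evi = double_logistic(doy, *true) + rng.normal(0, 0.02, doy.size)

popt, _ = curve_fit(double_logistic, doy, evi, p0=(0.2, 0.4, 100, 0.05, 260, 0.05))
print("greenup day ~ %.0f, senescence day ~ %.0f" % (popt[2], popt[4]))
```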

Applications

A Bayesian spatio-temporal nowcasting model for public health decision-making and surveillance

As COVID-19 spread through the United States in 2020, states began to set up alert systems to inform policy decisions and serve as risk communication tools for the general public. Many of these systems, such as Ohio's, included indicators based on an assessment of trends in reported cases. However, when cases are indexed by date of disease onset, reporting delays complicate the interpretation of trends. Despite a foundation of statistical literature addressing this problem, these methods have not been widely applied in practice. In this paper, we develop a Bayesian spatio-temporal nowcasting model for assessing trends in county-level COVID-19 cases in Ohio. We compare the performance of our model to the current approach used in Ohio and to the approach recommended by the Centers for Disease Control and Prevention. Our model demonstrates gains in performance while still retaining interpretability. In addition, we are able to fully account for uncertainty in both the time series of cases and the reporting process. While we cannot eliminate all of the uncertainty in public health surveillance and subsequent decision-making, we must use approaches that embrace these challenges and deliver more accurate and honest assessments to policymakers.
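
The core of any nowcast is the correction for cases that have occurred but not yet been reported. The toy sketch below scales up recent onset-date counts by an estimated probability of having been reported so far; the reporting distribution and counts are invented, and the paper's model performs this correction jointly with spatial and temporal smoothing and full uncertainty quantification.

```python
# Toy nowcast: inflate partially reported onset-date counts by the
# estimated fraction reported so far (invented numbers).
import numpy as np

# P(reported within d days of onset), estimated from historical reporting data
report_cdf = np.array([0.30, 0.60, 0.80, 0.90, 0.95, 0.98, 1.00])

observed = np.array([50, 48, 55, 60, 52, 40, 18])   # counts by onset date, newest last
days_since_onset = np.arange(len(observed))[::-1]   # 6, 5, ..., 0 days ago
frac_reported = report_cdf[np.minimum(days_since_onset, len(report_cdf) - 1)]

nowcast = observed / frac_reported                  # scale up recent days
for d, obs, est in zip(days_since_onset, observed, nowcast):
    print(f"{d} days ago: observed {obs:3d}, nowcast {est:5.1f}")
```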

Applications

A Bias Correction Method in Meta-analysis of Randomized Clinical Trials with no Adjustments for Zero-inflated Outcomes

Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero-inflated nature of such outcomes is often ignored in analyses, which leads to biased estimates and, consequently, a biased estimate of the overall intervention effect in a meta-analysis. The current study proposes a novel statistical approach, the Zero-inflation Bias Correction (ZIBC) method, that accounts for the bias introduced when a Poisson regression model is used despite a high rate of zeros in the outcome distribution of randomized clinical trials. This correction method utilizes summary information from individual studies to correct intervention effect estimates as if they had been appropriately estimated with zero-inflated Poisson regression models. Simulation studies and real data analyses show that the ZIBC method performs well in correcting zero-inflation bias in many situations. This method provides a methodological solution for improving the accuracy of meta-analysis results, which is important for evidence-based medicine.
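
The bias being corrected is easy to reproduce: when the two arms have different proportions of structural zeros, a plain Poisson analysis of the marginal means conflates abstinence with low counts and distorts the rate ratio among at-risk patients. The simulation below (with invented numbers) shows the gap; it demonstrates the problem only, not the ZIBC correction itself, which uses study-level summary information.

```python
# Simulation of zero-inflation bias in a Poisson analysis of trial outcomes.
import numpy as np

rng = np.random.default_rng(5)
n = 5000
lam_ctrl, lam_trt = 6.0, 4.0            # true Poisson means among "at-risk" patients
p0_ctrl, p0_trt = 0.30, 0.50            # structural-zero (e.g. abstinent) proportions

def zip_sample(lam, p0, n):
    at_risk = rng.random(n) > p0
    return np.where(at_risk, rng.poisson(lam, n), 0)

y_ctrl = zip_sample(lam_ctrl, p0_ctrl, n)
y_trt = zip_sample(lam_trt, p0_trt, n)

naive_rr = y_trt.mean() / y_ctrl.mean()  # what a plain Poisson model targets
true_rr = lam_trt / lam_ctrl             # effect among at-risk patients
print(f"naive rate ratio {naive_rr:.2f} vs at-risk rate ratio {true_rr:.2f}")
```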

Applications

A Causal Inference Approach to Measure the Vulnerability of Urban Metro Systems

Transit operators need vulnerability measures to understand the level of service degradation under disruptions. This paper contributes to the literature with a novel causal inference approach for estimating station-level vulnerability in metro systems. The empirical analysis is based on large-scale data on historical incidents and population-level passenger demand, and thus obviates the need for the assumptions made by previous studies about human behaviour and disruption scenarios. We develop four empirical vulnerability metrics based on the causal impact of disruptions on travel demand, average travel speed, and passenger flow distribution. Specifically, the proposed metrics based on the irregularity in passenger flow distribution extend the scope of vulnerability measurement to the entire trip distribution, instead of just analysing the disruption impact on the entry or exit demand (that is, moments of the trip distribution). Unbiased estimates of disruption impact are obtained by adopting a propensity score matching method, which adjusts for the confounding biases caused by the non-random occurrence of disruptions. An application of the proposed framework to the London Underground indicates that the vulnerability of a metro station depends on its location, topology, and other characteristics. We find that, in 2013, central London stations were more vulnerable in terms of travel demand loss. However, the loss of average travel speed and the irregularity in relative passenger flows reveal that passengers from outer London stations suffer longer individual delays due to a lack of alternative routes.
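
Propensity score matching, the identification strategy mentioned above, can be sketched in a few lines: fit a model for the probability that a station-day experiences a disruption given confounders, match each disrupted observation to its nearest undisrupted neighbour on that score, and average the outcome differences. Everything below (features, effect size, data) is synthetic.

```python
# Bare-bones propensity score matching on synthetic station-day data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 3000
X = rng.normal(size=(n, 3))                       # confounders (weather, time of day, ...)
treated = rng.random(n) < 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
demand = 100 + 5 * X[:, 0] - 8 * treated + rng.normal(0, 3, n)  # true effect: -8

ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
t_idx, c_idx = np.flatnonzero(treated), np.flatnonzero(~treated)
# Nearest-neighbour match on the propensity score for each treated unit
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

att = (demand[t_idx] - demand[matches]).mean()
print(f"estimated disruption impact on demand: {att:.1f}")
```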

Applications

A Comparative Study of Parametric Regression Models to Detect Breakpoint in Traffic Fundamental Diagram

A speed threshold is a crucial parameter in breakdown and capacity distribution analysis, as it defines the boundary between the free-flow and congested regimes. However, literature on approaches to establishing the breakpoint value for detecting breakdown events is limited. Most existing studies rely on either visual observation or predefined thresholds, which may not be reliable given the variations associated with field data. Thus, this study compared the performance of two data-driven methods, the logistic function (LGF) and two-regime models, used to establish the breakpoint from traffic flow variables. The two models were calibrated using urban freeway traffic data. The models' performance results revealed that, with less computational effort, the LGF has slightly better prediction accuracy than the two-regime model. Although the two-regime model had relatively lower performance, it can be useful in identifying the transitional state.
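
One plausible reading of the LGF approach (assumed here, since the abstract does not give the model form) is to model the probability that traffic is in the congested regime as a logistic function of speed and take the breakpoint as the speed at which that probability crosses 0.5. The sketch below fits such a curve to synthetic labelled observations by least squares; it is an illustration, not the paper's calibration procedure.

```python
# Fit a logistic curve for P(congested | speed) and read off the breakpoint.
import numpy as np
from scipy.optimize import curve_fit

def logistic(v, v50, k):
    return 1.0 / (1.0 + np.exp(k * (v - v50)))   # P(congested) falls as speed rises

rng = np.random.default_rng(7)
speed = rng.uniform(20, 75, 500)                  # observed speeds (mph)
p_true = logistic(speed, 45.0, 0.4)
congested = (rng.random(500) < p_true).astype(float)

(v50, k), _ = curve_fit(logistic, speed, congested, p0=(50.0, 0.1))
print(f"estimated breakpoint speed: {v50:.1f} mph (logistic slope {k:.2f})")
```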
