Featured Research

Applications

A Unified Evaluation of Two-Candidate Ballot-Polling Election Auditing Methods

Counting votes is complex and error-prone. Several statistical methods have been developed to assess election accuracy by manually inspecting randomly selected physical ballots. Two 'principled' methods are risk-limiting audits (RLAs) and Bayesian audits (BAs). RLAs use frequentist statistical inference while BAs are based on Bayesian inference. Until recently, the two have been thought of as fundamentally different. We present results that unify and shed light upon 'ballot-polling' RLAs and BAs (which only require the ability to sample uniformly at random from all cast ballot cards) for two-candidate plurality contests, which are building blocks for auditing more complex social choice functions, including some preferential voting systems. We highlight the connections between the methods and explore their performance. First, building on a previous demonstration of the mathematical equivalence of classical and Bayesian approaches, we show that BAs, suitably calibrated, are risk-limiting. Second, we compare the efficiency of the methods across a wide range of contest sizes and margins, focusing on the distribution of sample sizes required to attain a given risk limit. Third, we outline several ways to improve performance and show how the mathematical equivalence explains the improvements.
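To make the idea of a ballot-polling audit concrete, here is a minimal sketch of a BRAVO-style sequential test (Wald's sequential probability ratio test) for a two-candidate contest. It is an illustration only, not necessarily the procedure evaluated in the paper. It assumes ballots are sampled with replacement, that invalid ballots are ignored, and that the function name `ballot_polling_audit` and its parameters are hypothetical.

```python
def ballot_polling_audit(ballots, reported_winner_share, risk_limit):
    """Sequentially test the reported outcome of a two-candidate contest.

    ballots: iterable of booleans, True when the sampled ballot shows a
             vote for the reported winner (sampling with replacement is
             assumed).
    reported_winner_share: reported vote share of the winner, in (0.5, 1).
    risk_limit: maximum chance of confirming a wrong outcome, in (0, 1).

    Returns (confirmed, ballots_examined).
    """
    if not 0.5 < reported_winner_share < 1.0:
        raise ValueError("reported winner share must be in (0.5, 1)")
    if not 0.0 < risk_limit < 1.0:
        raise ValueError("risk limit must be in (0, 1)")

    threshold = 1.0 / risk_limit
    ratio = 1.0  # likelihood ratio: reported share versus an exact tie
    examined = 0
    for examined, for_winner in enumerate(ballots, start=1):
        if for_winner:
            ratio *= reported_winner_share / 0.5
        else:
            ratio *= (1.0 - reported_winner_share) / 0.5
        if ratio >= threshold:
            return True, examined  # strong evidence the winner really won
    return False, examined  # sample exhausted without confirmation


# Usage: a 60% reported share, a 5% risk limit, and a sample in which
# every ballot happens to favor the reported winner.
confirmed, n = ballot_polling_audit([True] * 100, 0.6, 0.05)
```

The number of ballots examined before stopping is random, which is why comparisons between methods focus on the distribution of sample sizes rather than a single value.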

Applications

A Vector Autoregression Prediction Model for COVID-19 Outbreak

Since two people in a county north of Seattle tested positive for COVID-19 (coronavirus disease 2019) in 2019, the total number of cases in the United States (U.S.) has risen to over 12 million. Predicting the pandemic trend using informative variables is crucial to finding a way to control the epidemic. Based on the available literature, we propose a validated Vector Autoregression (VAR) time series model to predict positive COVID-19 cases. A prediction for the U.S. is provided based on real U.S. coronavirus data. The key message from our study is that the pandemic will get worse if there is no effective control.

Applications

A Zero-State Coupled Markov Switching Poisson Model for Spatio-temporal Infectious Disease Counts

Spatio-temporal counts of infectious disease cases often contain an excess of zeros. Existing zero-inflated Poisson models applied to such data do not adequately capture the switching of the disease between periods of presence and absence over time. As an alternative, we develop a new zero-state coupled Markov switching Poisson model, under which the disease switches between periods of presence and absence in each area through a series of partially hidden nonhomogeneous Markov chains coupled between neighboring locations. When the disease is present, an autoregressive Poisson model generates the cases, with a possible 0 representing the disease being undetected. Bayesian inference and prediction are illustrated using spatio-temporal counts of dengue fever cases in Rio de Janeiro, Brazil.
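The switching mechanism can be illustrated with a heavily simplified simulation. The sketch below covers a single area only, uses constant (homogeneous) transition probabilities, and omits the coupling between neighboring locations, so it is not the model from the paper. All names and parameter values (`simulate_area`, `p_appear`, `p_persist`, `baseline`, `ar_coef`) are assumptions made for illustration.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw a Poisson count with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_area(n_steps, p_appear, p_persist, baseline, ar_coef, seed=0):
    """Simulate presence/absence states and case counts for one area.

    p_appear:  probability the disease appears when currently absent.
    p_persist: probability the disease persists when currently present.
    baseline, ar_coef: the Poisson mean when the disease is present is
        baseline + ar_coef * (previous count).
    """
    rng = random.Random(seed)
    present, previous = False, 0
    states, counts = [], []
    for _ in range(n_steps):
        present = rng.random() < (p_persist if present else p_appear)
        if present:
            # The disease is present, yet the count can still be 0
            # (present but undetected).
            count = poisson_draw(rng, baseline + ar_coef * previous)
        else:
            count = 0  # zero state: absence forces a zero count
        states.append(present)
        counts.append(count)
        previous = count
    return states, counts


states, counts = simulate_area(200, p_appear=0.2, p_persist=0.9,
                               baseline=2.0, ar_coef=0.5, seed=1)
```

A zero in `counts` is therefore ambiguous: it can come from the absence state or from an undetected presence, which is why the chain is only partially hidden.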

Applications

A critical assessment of conformal prediction methods applied in binary classification settings

In recent years there has been an increase in the number of scientific papers that suggest using conformal predictions in drug discovery. We argue that some versions of conformal predictions applied in binary settings have pitfalls that are not obvious at first sight, and that it is important to inform the scientific community about them. In the paper we first introduce the general theory of conformal predictions and then explain the versions currently dominant in drug discovery research. Finally, we provide cases for their critical assessment in binary classification settings.
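For readers unfamiliar with the technique, the following is a minimal sketch of inductive (split) conformal prediction for a binary classifier. It is a generic textbook construction, not the specific versions assessed in the paper. It assumes a nonconformity score of one minus the predicted probability of the candidate label, and the function names are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of scores (calibration plus the test point itself)
    that are at least as nonconforming as the test score."""
    at_least = sum(1 for score in calibration_scores if score >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, prob_positive, significance):
    """Return the labels in {0, 1} that are not rejected.

    calibration_scores: nonconformity scores (1 - predicted probability
        of the true label) computed on a held-out calibration set.
    prob_positive: predicted probability of label 1 for the new object.
    significance: tolerated error rate, in (0, 1).
    """
    labels = set()
    for label, prob in ((1, prob_positive), (0, 1.0 - prob_positive)):
        score = 1.0 - prob  # nonconformity if `label` were the true label
        if conformal_p_value(calibration_scores, score) > significance:
            labels.add(label)
    return labels


calibration = [0.1, 0.2, 0.3, 0.4]
confident = prediction_set(calibration, 0.9, 0.25)   # single label
uncertain = prediction_set(calibration, 0.9, 0.10)   # both labels
rejected = prediction_set(calibration, 0.5, 0.25)    # empty set
```

In a binary setting the output is one of only four sets (empty, {0}, {1}, or both), so how empty and two-label predictions are counted has a large effect on any reported performance figure.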

Applications

A data-driven prospective study of incident dementia among older adults in the United States

We conducted a prospective analysis of incident dementia and its association with 65 sociodemographic, early-life, economic, health and behavioral, social, and genetic risk factors in a sample of 7,908 adults over the age of 50 from the nationally representative US-based Health and Retirement Study. We used traditional survival analysis methods (Fine-Gray models) and a data-driven approach (random survival forests for competing risks) which allowed us to account for the competing risk of death with up to 14 years of follow-up. Overall, the top five predictors across all groups were lower education, loneliness, lower wealth and income, and lower self-reported health. However, we observed variation in the leading predictors of dementia across racial/ethnic and gender groups. Our ranked lists may be useful for guiding future observational and quasi-experimental research that investigates understudied domains of risk and emphasizes life course economic and health conditions as well as disparities therein.

Applications

A decision integration strategy for short-term demand forecasting and ordering for red blood cell components

Blood transfusion is one of the most crucial and commonly administered therapeutics worldwide. The need for more accurate and efficient ways to manage blood demand and supply is an increasing concern. Building a technology-based, robust blood demand and supply chain that can achieve the goals of reducing ordering frequency, inventory level, wastage and shortage, while maintaining the safety of blood usage, is essential in modern healthcare systems. In this study, we summarize the key challenges in current demand and supply management for red blood cells (RBCs). We combine ideas from statistical time series modeling, machine learning, and operations research in developing an ordering decision strategy for RBCs, through integrating a hybrid demand forecasting model using clinical predictors and a data-driven multi-period inventory problem considering inventory and reorder constraints. We have applied the integrated ordering strategy to the blood inventory management system in Hamilton, Ontario using a large clinical database from 2008 to 2018. The proposed hybrid demand forecasting model provides robust and accurate predictions, and identifies important clinical predictors for short-term RBC demand forecasting. Compared with the actual historical data, our integrated ordering strategy reduces the inventory level by 40% and decreases the ordering frequency by 60%, with low incidence of shortages and wastage due to expiration. If implemented successfully, our proposed strategy can achieve significant cost savings for healthcare systems and blood suppliers. The proposed ordering strategy is generalizable to other blood products or even other perishable products.

Applications

A deterministic matching method for exact matchings to compare the outcome of different interventions

Statistical matching methods are widely used in the social and health sciences to estimate causal effects using observational data. Often the objective is to find comparable groups with similar covariate distributions in a dataset, with the aim of reducing bias in a random experiment. We aim to develop a foundation for deterministic methods which provide results with low bias, while retaining interpretability. The proposed method matches on the covariates and calculates all possible maximal exact matches for a given dataset without adding numerical errors. Notable advantages of our method over existing matching algorithms are that all available information for exact matches is used, that no additional bias is introduced, that it can be combined with other matching methods for inexact matching to reduce pruning, and that the result is calculated in a fast and deterministic way. For a given dataset the result is therefore provably unique for exact matches in the mathematical sense. We provide proofs, instructions for implementation, and a numerical example calculated for comparison on a complete survey.
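The general idea of exact matching can be sketched by grouping units on their full covariate tuple and keeping only the groups that contain both treated and control units. This is a simplified illustration under the assumption that covariates are hashable values compared for strict equality; it is not the algorithm from the paper, and the name `exact_match_strata` is hypothetical.

```python
from collections import defaultdict


def exact_match_strata(units):
    """Group units that agree exactly on every covariate.

    units: iterable of (unit_id, treated, covariates), where treated is a
           bool and covariates is a tuple of hashable values.

    Returns a dict mapping each covariate tuple to a pair
    (treated_ids, control_ids), keeping only strata that contain at
    least one treated and one control unit. No randomness is involved,
    so the same input always yields the same strata.
    """
    strata = defaultdict(lambda: ([], []))
    for unit_id, treated, covariates in units:
        treated_ids, control_ids = strata[tuple(covariates)]
        (treated_ids if treated else control_ids).append(unit_id)
    return {
        covariates: (treated_ids, control_ids)
        for covariates, (treated_ids, control_ids) in strata.items()
        if treated_ids and control_ids
    }


units = [
    ("a", True, ("F", 30)),
    ("b", False, ("F", 30)),
    ("c", False, ("F", 30)),
    ("d", True, ("M", 40)),   # no control with identical covariates
    ("e", False, ("M", 41)),  # no treated unit with identical covariates
]
matches = exact_match_strata(units)
```

Every unit that has an exact counterpart is retained, and units "d" and "e" are pruned because no unit in the other group shares their covariates exactly.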

Applications

A functional-data approach to the Argo data

The Argo data is a modern oceanography dataset that provides unprecedented global coverage of temperature and salinity measurements in the upper 2,000 meters of depth of the ocean. We study the Argo data from the perspective of functional data analysis (FDA). We develop spatio-temporal functional kriging methodology for mean and covariance estimation to predict temperature and salinity at a fixed location as a smooth function of depth. By combining tools from FDA and spatial statistics, including smoothing splines, local regression, and multivariate spatial modeling and prediction, our approach provides advantages over current methodology that considers pointwise estimation at fixed depths. Our approach naturally leverages the irregularly sampled data in space, time, and depth to fit a space-time functional model for temperature and salinity. The developed framework provides new tools to address fundamental scientific problems involving the entire upper water column of the oceans, such as the estimation of ocean heat content, stratification, and thermohaline oscillation. For example, we show that our functional approach yields more accurate ocean heat content estimates than ones based on discrete integral approximations in pressure. Further, using the derivative function estimates, we obtain a new product of a global map of the mixed layer depth, a key component in the study of heat absorption and nutrient circulation in the oceans. The derivative estimates also reveal evidence for density inversions in areas distinguished by mixing of particularly different water masses.

Applications

A heavy-tailed and overdispersed collective risk model

Insurance data can be asymmetric with heavy tails, causing commonly applied models to fit poorly. To deal with this issue, hierarchical models for collective risk are proposed that allow heavy-tailed claim distributions and also take into account overdispersion in the number of claims. In particular, the logarithm of the aggregate value of claims is assumed to follow a Student-t distribution. Additionally, to incorporate possible overdispersion, the number of claims is modeled as having a negative binomial distribution. Bayesian decision theory is invoked to calculate the fair premium based on the modified absolute deviation utility. An application to a health insurance dataset is presented together with some diagnostic measures to identify excess variability. The variability measures are analyzed using the marginal posterior predictive distribution of the premiums according to some competing models. Finally, a simulation study is carried out to assess the predictive capability of the model and the adequacy of the Bayesian estimation procedure. Keywords: Continuous ranked probability score (CRPS); decision theory; insurance data; marginal posterior predictive; tail value at risk; value at risk.

Applications

A hierarchical spatio-temporal model to analyze relative risk variations of COVID-19: a focus on Spain, Italy and Germany

The novel coronavirus disease (COVID-19) has spread rapidly across the world in a short period of time and with a heterogeneous pattern. Understanding the underlying temporal and spatial dynamics in the spread of COVID-19 can result in informed and timely public health policies. In this paper, we use a spatio-temporal stochastic model to explain the temporal and spatial variations in the daily number of new confirmed cases in Spain, Italy and Germany from late February to mid-September 2020. Using a hierarchical Bayesian framework, we found that the temporal trends of the epidemic in the three countries rapidly reached their peaks, slowly started to decline at the beginning of April, and then increased and reached their second maximum in August. However, the decline and increase of the temporal trend seem to be sharper in Spain and smoother in Germany. The spatial heterogeneity of the relative risk of COVID-19 is also more pronounced in Spain than in Italy and Germany.
