Featured Research

Applications

Anomaly Detection in Energy Usage Patterns

Energy usage monitoring on higher education campuses is an important step for providing satisfactory service, lowering costs and supporting the move to green energy. We present a collaboration between the Department of Statistics and Facilities Operations at an R1 research university to develop statistically based approaches for monitoring monthly energy usage and proportional yearly usage for several hundred utility accounts on campus. We compare the interpretability and power of model-free and model-based methods for detection of anomalous energy usage patterns in statistically similar groups of accounts. Ongoing conversation between the academic and operations teams enhances the practical utility of the project and enables implementation for the university. Our work highlights an application of thoughtful and continuing collaborative analysis using easy-to-understand statistical principles for real-world deployment.
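
The abstract does not specify the model-free procedure. The following is only a minimal sketch of one possible approach, not the authors' method. Each account's monthly usage is converted to proportions of its yearly total and compared with the month-wise median profile of its group. An account is flagged when its L1 distance from that profile exceeds the group's median distance plus k times the median absolute deviation (MAD). The function names, the dict-of-accounts input, and the default k = 3 are assumptions.

```python
from statistics import median

def monthly_proportions(usage):
    """Convert 12 monthly usage values into proportions of the yearly total."""
    total = sum(usage)
    return [u / total for u in usage]

def flag_profile_anomalies(accounts, k=3.0):
    """Flag accounts whose monthly-proportion profile is far from their group's.

    accounts: dict mapping a (hypothetical) account id to its 12 monthly usage
    values; all accounts are assumed to belong to one statistically similar group.
    """
    profiles = {a: monthly_proportions(u) for a, u in accounts.items()}
    n_months = len(next(iter(profiles.values())))
    # Month-wise median profile of the group (robust to a few anomalous accounts).
    center = [median(p[m] for p in profiles.values()) for m in range(n_months)]
    # L1 distance of each account's profile from the group median profile.
    dist = {a: sum(abs(p[m] - center[m]) for m in range(n_months))
            for a, p in profiles.items()}
    d_med = median(dist.values())
    mad = median(abs(d - d_med) for d in dist.values())
    return sorted(a for a, d in dist.items() if d > d_med + k * mad)

# Usage: hypothetical accounts; E uses twice as much energy as A-D with the same
# seasonal shape, while F concentrates its usage in three summer months.
group = {name: [100] * 12 for name in "ABCD"}
group["E"] = [200] * 12
group["F"] = [50] * 5 + [300] * 3 + [50] * 4
print(flag_profile_anomalies(group))  # ['F']
```

Because the comparison uses proportions of yearly usage, a larger account with a normal seasonal shape (E) is not flagged. Only an unusual shape (F) is flagged.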


Anomaly Detection on Seasonal Metrics via Robust Time Series Decomposition

The stability and persistence of web services are important to Internet companies for improving user experience and business performance. To monitor numerous metrics and report abnormal situations, time series anomaly detection methods are developed and applied by various departments in companies and institutions. In this paper, we propose a robust anomaly detection algorithm (MEDIFF) to monitor online business metrics in real time. Specifically, a decomposition method based on a robust statistic, the median, is applied to the time series to decouple the trend and seasonal components. Components capturing the effects of daylight saving time (DST) shifts and holidays are also decomposed from the time series. The residual after decomposition is then tested with a generalized statistical method to detect outliers in the time series. We compare the proposed MEDIFF algorithm with two open-source algorithms (SH-ESD and DONUT) on our labeled internal business metrics. The results demonstrate the effectiveness of the proposed MEDIFF algorithm.
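
As a rough illustration of median-based decomposition, here is a minimal sketch. It is not the published MEDIFF implementation. The trend is a centered rolling median, and the seasonal component is the median detrended value at each position in the cycle. Residuals are then flagged with a robust median/MAD z-score. That MAD rule is a simple stand-in for the generalized test mentioned above, and DST and holiday components are omitted. The function names, the window choice, and the cutoff k = 3.5 are assumptions.

```python
from statistics import median

def median_decompose(series, period, window):
    """Split a series into trend, seasonal, and residual parts using medians only.

    trend:    centered rolling median (the window is truncated at the edges)
    seasonal: median of the detrended values at each position in the cycle
    residual: series - trend - seasonal
    """
    n = len(series)
    half = window // 2
    trend = [median(series[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [y - t for y, t in zip(series, trend)]
    profile = [median(detrended[p::period]) for p in range(period)]
    seasonal = [profile[i % period] for i in range(n)]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual

def flag_outliers(residual, k=3.5):
    """Return indices whose robust z-score |r - median| / (1.4826 * MAD) exceeds k."""
    med = median(residual)
    mad = median(abs(r - med) for r in residual)
    if mad == 0:  # at least half the residuals equal the median: flag any deviation
        return [i for i, r in enumerate(residual) if r != med]
    scale = 1.4826 * mad  # rescales MAD to match the standard deviation under normality
    return [i for i, r in enumerate(residual) if abs(r - med) / scale > k]

# Usage: a clean period-4 pattern with one injected spike at index 13.
series = [10, 20, 30, 20] * 6
series[13] += 100
trend, seasonal, residual = median_decompose(series, period=4, window=9)
print(flag_outliers(residual))  # [13]
```

A rolling-median window of roughly two seasonal periods keeps a single spike from leaking into the trend. A mean-based decomposition would instead spread the spike into the neighbouring trend values.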


Application of Bayesian Dynamic Linear Models to Random Allocation Clinical Trials

Random allocation models used in clinical trials help researchers determine which of several treatments provides the best results by reducing bias between groups. Often, however, this determination leaves researchers facing the ethical issue of assigning patients to unfavorable treatments. Methods such as the Play the Winner and Randomized Play the Winner rules have historically been used to determine patient allocation; however, these methods are prone to increased assignment of unfavorable treatments. Recently, a new Bayesian method using decreasingly informative priors was proposed by \citep{sabo2014adaptive} and later by \citep{donahue2020allocation}. However, this method can be time-consuming when MCMC methods are required. We propose a new method that uses a Dynamic Linear Model (DLM) \citep{harrison1999bayesian} to increase allocation speed while also reducing the number of patients needed to identify the more favorable treatment. Furthermore, a sensitivity analysis is conducted on multiple parameters. Finally, a Bayes factor is calculated to determine the proportion of unused patient budget remaining at a specified cutoff, and this is used to determine decisive evidence in favor of the better treatment.
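
The abstract does not give the model equations. The following minimal sketch assumes the simplest DLM: a local-level model per treatment arm with known observation variance V and evolution variance W, updated by the standard Kalman recursions. The allocation rule is also an assumption for illustration, not necessarily the rule used in the paper: the next patient goes to arm A with the posterior probability that arm A has the higher mean response. All function names are hypothetical.

```python
import random
from statistics import NormalDist

def local_level_update(m, C, y, V, W):
    """One Kalman step of a local-level DLM for one arm's mean response.

    Observation: y_t = mu_t + v_t,      v_t ~ N(0, V)
    Evolution:   mu_t = mu_{t-1} + w_t, w_t ~ N(0, W)
    (m, C) is the current posterior mean and variance of mu; V and W are assumed known.
    """
    R = C + W              # prior variance after the evolution step
    Q = R + V              # one-step forecast variance
    A = R / Q              # adaptive coefficient (Kalman gain)
    m_new = m + A * (y - m)
    C_new = R - A * A * Q  # algebraically equal to R * V / Q
    return m_new, C_new

def prob_a_better(mA, CA, mB, CB):
    """P(mu_A > mu_B) when the two arms have independent normal posteriors."""
    diff = NormalDist(mA - mB, (CA + CB) ** 0.5)
    return 1.0 - diff.cdf(0.0)

def allocate_next(mA, CA, mB, CB, rng):
    """Hypothetical rule: send the next patient to arm A with probability P(mu_A > mu_B)."""
    return "A" if rng.random() < prob_a_better(mA, CA, mB, CB) else "B"

# Usage: update each arm with one observed response, then allocate the next patient.
mA, CA = local_level_update(0.0, 10.0, y=1.5, V=1.0, W=0.01)
mB, CB = local_level_update(0.0, 10.0, y=0.2, V=1.0, W=0.01)
print(allocate_next(mA, CA, mB, CB, random.Random(0)))
```

Each update is closed form, which is the speed advantage over MCMC that the abstract refers to. The Bayes factor stopping analysis is not shown.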


Application of Dynamic Linear Models to Random Allocation Clinical Trials with Covariates

A recent method using Dynamic Linear Models to improve the preferred treatment allocation budget in random allocation models was proposed by Lee, Boone, et al. (2020). However, this model did not include the impact of covariates such as smoking and gender on model performance. The current paper addresses random allocation to treatments using the DLM in Bayesian Adaptive Allocation Models with a single covariate. We show a reduced treatment allocation budget along with a reduced time to identify the preferred treatment. Furthermore, a sensitivity analysis is performed on mean and variance parameters, and a power analysis is conducted using the Bayes factor. This power analysis is used to determine the proportion of unallocated patient budgets above a specified cutoff value. Additionally, a sensitivity analysis is conducted on the covariate coefficients.
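
One way to add a single covariate to a DLM is a dynamic regression with state vector (intercept, covariate coefficient). The sketch below shows one Kalman update for such a model. It assumes random-walk evolution with variance W on each coefficient and a known observation variance V. This is a generic DLM form for illustration, not the paper's code. The function name and the 0 to 4 covariate score are hypothetical.

```python
def regression_dlm_update(m, C, x, y, V, W):
    """One Kalman step of a dynamic regression DLM with one covariate.

    Observation: y_t = F' theta_t + v_t, F = [1, x_t], v_t ~ N(0, V)
    Evolution:   theta_t = theta_{t-1} + w_t,        w_t ~ N(0, W * I)
    m: [intercept, coefficient] posterior mean; C: 2x2 covariance as nested lists.
    """
    F = [1.0, x]
    R = [[C[i][j] + (W if i == j else 0.0) for j in range(2)] for i in range(2)]
    RF = [R[i][0] * F[0] + R[i][1] * F[1] for i in range(2)]
    f = m[0] * F[0] + m[1] * F[1]          # one-step forecast mean
    Q = F[0] * RF[0] + F[1] * RF[1] + V    # one-step forecast variance
    A = [RF[i] / Q for i in range(2)]      # adaptive vector (Kalman gain)
    e = y - f                              # forecast error
    m_new = [m[i] + A[i] * e for i in range(2)]
    C_new = [[R[i][j] - A[i] * A[j] * Q for j in range(2)] for i in range(2)]
    return m_new, C_new

# Usage: noise-free responses y = 2 + 3x with a hypothetical 0-4 covariate score.
m, C = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for t in range(50):
    x = t % 5
    m, C = regression_dlm_update(m, C, x, 2.0 + 3.0 * x, V=1.0, W=0.0)
print([round(v, 2) for v in m])  # [2.0, 3.0]
```

With W = 0 the recursion reduces to sequential Bayesian linear regression. A positive W lets the coefficients drift over the course of the trial.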


Application of the Cox Regression Model for Analysis of Railway Safety Performance

The assessment of in-service safety performance is an important task, not only in railways. For example, it is important to identify deviations early, in particular a possible deterioration of safety performance, so that corrective actions can be applied early. On the other hand, the assessment should be fair and objective and rely on sound and proven statistical methods. A popular means for this task is trend analysis. This paper defines a model for trend analysis and compares different approaches, e.g., classical and Bayesian approaches, on real data. The examples show that, in particular for small sample sizes (e.g., when individual railway operators are to be assessed), the Bayesian prior may influence the results significantly.
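
The trend model itself is not given in the abstract. To illustrate only the point about prior influence, this minimal sketch swaps in a simpler conjugate gamma-Poisson model for an operator's event rate. Exposure is counted in hypothetical units such as millions of train-kilometres. The classical estimate is events / exposure. The Bayesian posterior mean under a Gamma(shape, rate) prior is (shape + events) / (rate + exposure). The prior values are hypothetical.

```python
def classical_rate(events, exposure):
    """Maximum-likelihood estimate of a Poisson event rate per unit of exposure."""
    return events / exposure

def bayes_rate(events, exposure, prior_shape, prior_rate):
    """Posterior mean of a Poisson rate under a conjugate Gamma(shape, rate) prior.

    The posterior is Gamma(prior_shape + events, prior_rate + exposure).
    """
    return (prior_shape + events) / (prior_rate + exposure)

# Usage: the same observed rate of 2 events per unit exposure, with a prior of mean 5.
print(classical_rate(2, 1.0), bayes_rate(2, 1.0, 5.0, 1.0))                    # 2.0 3.5
print(classical_rate(200, 100.0), round(bayes_rate(200, 100.0, 5.0, 1.0), 3))  # 2.0 2.03
```

With little exposure the posterior mean (3.5) sits far from the classical estimate (2.0). With 100 times the exposure the two nearly agree (2.03 vs 2.0).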


Applications of Clustering with Mixed Type Data in Life Insurance

Death benefits are generally the largest cash flow item affecting the financial statements of life insurers, yet some insurers still do not have a systematic process to track and monitor death claims experience. In this article, we explore data clustering to examine and understand how actual death claims differ from expected, an early stage of developing a monitoring system crucial for risk management. We extend the k-prototypes clustering algorithm to draw inference from a life insurance dataset using only the insured's characteristics and policy information, without regard to known mortality. This clustering can efficiently handle categorical, numerical, and spatial attributes. The optimal clusters, selected using the gap statistic, are then used to compare actual to expected death claims experience of the life insurance portfolio. Our empirical data contain observations during 2014 of approximately 1.14 million policies with a total insured amount of over 650 billion dollars. For this portfolio, the algorithm produced three natural clusters, each with lower actual-to-expected death claims but differing variability. The analytical results provide management with a process to identify the policyholder attributes that dominate significant mortality deviations, and thereby enhance decision making about necessary actions.
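
The following is a minimal sketch of the basic k-prototypes iteration that the article extends. The spatial-attribute extension and the gap-statistic choice of the number of clusters are not shown. Numeric attributes contribute squared Euclidean distance, and categorical attributes contribute gamma times the number of mismatches. Prototypes are updated to numeric means and categorical modes. The record layout (numeric fields first), the weight gamma, and the sample policy attributes are assumptions.

```python
from collections import Counter

def dissimilarity(a, b, n_num, gamma):
    """Squared Euclidean distance on the first n_num (numeric) fields plus
    gamma times the number of mismatched categorical fields."""
    numeric = sum((a[i] - b[i]) ** 2 for i in range(n_num))
    mismatches = sum(a[i] != b[i] for i in range(n_num, len(a)))
    return numeric + gamma * mismatches

def k_prototypes(records, init_prototypes, n_num, gamma, max_iter=20):
    """Basic k-prototypes: alternate nearest-prototype assignment and prototype
    updates (numeric means, categorical modes) until assignments stop changing."""
    prototypes = [tuple(p) for p in init_prototypes]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(prototypes)),
                          key=lambda k: dissimilarity(r, prototypes[k], n_num, gamma))
                      for r in records]
        if new_labels == labels:
            break
        labels = new_labels
        updated = []
        for k, proto in enumerate(prototypes):
            members = [r for r, lab in zip(records, labels) if lab == k]
            if not members:  # keep the prototype of an empty cluster unchanged
                updated.append(proto)
                continue
            means = [sum(r[i] for r in members) / len(members) for i in range(n_num)]
            modes = [Counter(r[i] for r in members).most_common(1)[0][0]
                     for i in range(n_num, len(proto))]
            updated.append(tuple(means + modes))
        prototypes = updated
    return labels, prototypes

# Usage: hypothetical policies as (issue age, face amount in $100k, sex, smoker).
policies = [(30, 1.0, "M", "N"), (32, 1.2, "M", "N"), (31, 0.9, "M", "N"),
            (60, 5.0, "F", "Y"), (62, 5.5, "F", "Y"), (61, 4.8, "F", "N")]
labels, protos = k_prototypes(policies, [policies[0], policies[3]], n_num=2, gamma=1.0)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice numeric attributes are usually standardized first. Otherwise gamma has to be chosen relative to their raw scale.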


Assessing Vaccine Durability in Randomized Trials Following Placebo Crossover

Randomized vaccine trials are used to assess vaccine efficacy and to characterize the durability of vaccine-induced protection. If efficacy is demonstrated, the treatment of placebo volunteers becomes an issue. For COVID-19 vaccine trials, there is broad consensus that placebo volunteers should be offered a vaccine once efficacy has been established. This will likely lead to most placebo volunteers crossing over to the vaccine arm, complicating the assessment of long-term durability. We show how to analyze durability following placebo crossover and demonstrate that the vaccine efficacy profile that would be observed in a placebo-controlled trial is recoverable in a trial with placebo crossover. This result holds no matter when the crossover occurs and requires no assumptions about the form of the efficacy profile. We require only that the vaccine efficacy profile applies to the newly vaccinated irrespective of the timing of vaccination. We develop different methods to estimate efficacy within the context of a proportional hazards regression model and use simulation to explore the implications of placebo crossover for estimating vaccine efficacy under different efficacy dynamics and study designs. We apply our methods to simulated COVID-19 vaccine trials with durable and waning vaccine efficacy and a total follow-up of two years.
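
The recoverability claim can be seen in a noise-free, discrete-time sketch. This is an illustration only, not the authors' proportional hazards estimators. It assumes that all placebo volunteers cross over at a single month c and that both arms share the same calendar-time baseline hazard. It also assumes that vaccination multiplies the hazard by 1 - VE(tau), where tau is the time since vaccination. Before crossover, the arm-level hazard ratio is 1 - VE(t). Afterwards it is (1 - VE(t)) / (1 - VE(t - c)), so the profile can be rebuilt month by month. The function names are hypothetical.

```python
def hazard_ratios(ve, c):
    """Hazard ratio (original vaccine arm / original placebo arm) in each month t,
    given ve[tau] = efficacy tau months after vaccination and crossover at month c."""
    ratios = []
    for t in range(len(ve)):
        if t < c:
            ratios.append(1.0 - ve[t])                        # placebo arm still unvaccinated
        else:
            ratios.append((1.0 - ve[t]) / (1.0 - ve[t - c]))  # crossover arm is t - c months out
    return ratios

def recover_ve(ratios, c):
    """Rebuild ve[tau] month by month: directly before crossover, and afterwards via
    1 - VE(t) = ratio(t) * (1 - VE(t - c)), where VE(t - c) was recovered earlier."""
    ve = []
    for t, r in enumerate(ratios):
        ve.append(1.0 - r if t < c else 1.0 - r * (1.0 - ve[t - c]))
    return ve

# Usage: a hypothetical waning profile over 24 months with crossover at month 6.
true_ve = [0.9 - 0.03 * t for t in range(24)]
recovered = recover_ve(hazard_ratios(true_ve, 6), 6)
print(max(abs(a - b) for a, b in zip(true_ve, recovered)) < 1e-9)  # True
```

No functional form for VE(tau) is imposed. The recursion relies only on the profile applying to the crossover group from its own vaccination date, which mirrors the assumption stated in the abstract.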


Assessing the causal effects of a stochastic intervention in time series data: Are heat alerts effective in preventing deaths and hospitalizations?

We introduce a new causal inference framework for time series data aimed at assessing the effectiveness of heat alerts in reducing mortality and hospitalization risks. We address the following question: how many deaths and hospitalizations could be averted if we were to increase the frequency of issuing heat alerts in a given location? In time series data, the overlap assumption (each unit must have a positive probability of receiving the treatment) is often violated. This is because, in a given location, issuing a heat alert is a rare event on an average-temperature day, as heat alerts are almost always issued on extremely hot days. To overcome this challenge, we first introduce a new class of causal estimands under a stochastic intervention (i.e., increasing the odds of issuing a heat alert) for a single time series corresponding to a given location. We develop the theory to show that these causal estimands can be identified and estimated under a weaker version of the overlap assumption. Second, we propose nonparametric estimators based on time-varying propensity scores and derive pointwise confidence bands for these estimators. Third, we extend this framework to multiple time series corresponding to multiple locations. Via simulations, we show that the proposed estimator performs well with respect to bias and root mean squared error. We apply our proposed method to estimate the causal effects of increasing the odds of issuing heat alerts in reducing deaths and hospitalizations among Medicare enrollees in 2817 U.S. counties. We found weak evidence of a causal link between increasing the odds of issuing heat alerts during the warm seasons of 2006-2016 and a reduction in deaths and cause-specific hospitalizations across the 2817 counties.
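
The abstract describes the intervention only as increasing the odds of issuing a heat alert. The following minimal sketch assumes an incremental propensity score intervention that multiplies each day's alert odds by delta, with propensities already estimated. It illustrates one reason such estimands can get by with a weaker overlap condition. The inverse-probability weights stay bounded even when a day's alert propensity is 0 or 1. Serial dependence, propensity estimation, and confidence bands are not handled. The function names and inputs are hypothetical.

```python
def shifted_propensity(p, delta):
    """Probability of an alert after multiplying its odds p / (1 - p) by delta."""
    return delta * p / (delta * p + 1.0 - p)

def ipw_incremental_mean(outcomes, alerts, propensities, delta):
    """Inverse-probability-weighted mean outcome under the odds-shifting intervention.

    A day with alert indicator a and propensity p gets weight q / p if a == 1 and
    (1 - q) / (1 - p) if a == 0, where q = shifted_propensity(p, delta); both cases
    reduce to (delta * a + 1 - a) / (delta * p + 1 - p), which stays finite even
    when p is 0 or 1.
    """
    total = 0.0
    for y, a, p in zip(outcomes, alerts, propensities):
        total += (delta * a + 1 - a) * y / (delta * p + 1.0 - p)
    return total / len(outcomes)

# Usage with hypothetical daily death counts, alert indicators, and propensities.
deaths = [3, 5, 2, 4]
alerts = [0, 1, 0, 0]
props = [0.0, 0.6, 0.05, 0.1]
print(ipw_incremental_mean(deaths, alerts, props, delta=1.0))
print(ipw_incremental_mean(deaths, alerts, props, delta=2.0))
```

Setting delta = 1 leaves the alert policy unchanged, so the estimate reduces to the plain sample mean. Comparing delta > 1 with delta = 1 gives the estimated effect of issuing alerts more readily.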


Assessing the contagiousness of mass shootings with nonparametric Hawkes processes

Gun violence and mass shootings are high-profile epidemiological issues facing the United States, and questions about their contagiousness have gained prominence in the news media. Using nonparametric Hawkes processes, we examine the evidence for contagiousness within a catalog of mass shootings and highlight the broader benefits of nonparametric point process models for modeling the occurrence of such events.
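
For intuition about the model class, here is a minimal sketch of the conditional intensity of a Hawkes process with a nonparametric triggering kernel. The kernel is a step function (a histogram with fixed bin width), one common way to avoid assuming a parametric decay. Estimating the bin heights (for example by an EM algorithm) is not shown. The function names and example values are hypothetical. The branching ratio, the integral of the kernel, is the expected number of events directly triggered by each event. Values near zero indicate little contagion.

```python
import bisect

def hawkes_intensity(t, events, mu, bin_width, bin_heights):
    """Conditional intensity lambda(t) = mu + sum of g(t - t_i) over past events t_i < t.

    g is a step function: g(s) = bin_heights[k] for k * bin_width <= s < (k + 1) * bin_width,
    and 0 beyond the last bin. `events` must be sorted in increasing order.
    """
    support = bin_width * len(bin_heights)
    lo = bisect.bisect_left(events, t - support)  # first event that can still contribute
    hi = bisect.bisect_left(events, t)            # events strictly before t
    total = mu
    for ti in events[lo:hi]:
        k = int((t - ti) // bin_width)
        if k < len(bin_heights):
            total += bin_heights[k]
    return total

def branching_ratio(bin_width, bin_heights):
    """Integral of the kernel: expected number of events directly triggered by one event."""
    return bin_width * sum(bin_heights)

# Usage with hypothetical event times (in days) and kernel heights per 1-day bin.
events = [1.0, 2.0, 4.0]
heights = [0.4, 0.2]
print(round(hawkes_intensity(3.0, events, 0.5, 1.0, heights), 3))  # 0.7
print(round(branching_ratio(1.0, heights), 3))                     # 0.6
```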


Assessment of COVID-19 hospitalization forecasts from a simplified SIR model

We propose the SH model, a simplified version of the well-known SIR compartmental model of infectious diseases. With optimized parameters and initial conditions, this time-invariant, two-parameter, two-dimensional model fits COVID-19 hospitalization data over several months with high accuracy (mean absolute percentage error below 15%). Moreover, we observe that when the model is trained on a suitable two-week period around the hospitalization peak for Belgium, it forecasts the subsequent three-month decrease with a mean absolute percentage error below 10%. However, when it is trained during the increasing phase, it is less successful at forecasting the subsequent evolution.
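
The accuracy figures above are stated as mean absolute percentage error (MAPE). Here is a minimal sketch of its standard definition, 100/n times the sum of |(actual - forecast) / actual|, applied to hypothetical hospitalization counts. The SH model equations are not given in the abstract, so they are not reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error in percent; every actual value must be nonzero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Usage with hypothetical daily hospitalization counts and model forecasts.
observed = [100, 200]
forecast = [110, 180]
print(mape(observed, forecast))  # 10.0
```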
