Featured Research

Applications

A Comparison of Aggregation Methods for Probabilistic Forecasts of COVID-19 Mortality in the United States

The COVID-19 pandemic has placed forecasting models at the forefront of health policy making. Predictions of mortality and hospitalization help governments meet planning and resource allocation challenges. In this paper, we consider the weekly forecasting of cumulative COVID-19 mortality at the national and state level in the U.S. Optimal decision-making requires a forecast of a probability distribution, rather than just a single point forecast. Interval forecasts are also important, as they can support decision making and provide situational awareness. We consider the case where probabilistic forecasts have been provided by multiple forecasting teams, and we aggregate the forecasts to extract the wisdom of the crowd. With only limited information available regarding the historical accuracy of the teams, we consider aggregation (i.e., combining) methods that do not rely on a record of past accuracy. In this empirical paper, we evaluate the accuracy of aggregation methods that have previously been proposed for interval forecasts and predictions of probability distributions. These include the simple average, the median, and trimming methods; the latter enable robust estimation and can reduce the impact of the forecasting teams' tendency to be under- or overconfident. We use data made publicly available by the COVID-19 Forecast Hub. While the simple average performed well for the high-mortality series, we obtained greater accuracy using the median and certain trimming methods for the low- and medium-mortality series. It will be interesting to see whether this remains the case as the pandemic evolves.
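
As a rough illustration of the aggregation rules compared in the paper, the sketch below combines quantile forecasts from several teams using the simple average, the median, and a symmetric trimmed mean. All numbers, the quantile levels, and the trimming rule are made up for illustration; the paper evaluates several specific interior and exterior trimming schemes.

```python
import numpy as np

# Illustrative 5%, 50%, and 95% quantile forecasts of cumulative deaths
# from five hypothetical teams (all numbers made up).
team_forecasts = np.array([
    [1200.0, 1500.0, 2100.0],
    [1100.0, 1400.0, 1900.0],
    [1300.0, 1600.0, 2400.0],
    [ 900.0, 1350.0, 1800.0],
    [1250.0, 1550.0, 2200.0],
])

def mean_combine(q):
    """Simple average across teams at each quantile level."""
    return q.mean(axis=0)

def median_combine(q):
    """Median across teams at each quantile level (robust to outlying teams)."""
    return np.median(q, axis=0)

def trimmed_combine(q, k=1):
    """Drop the k smallest and k largest values at each quantile level,
    then average the rest; trimming tempers the influence of over- or
    underconfident teams on the aggregate."""
    s = np.sort(q, axis=0)
    return s[k:s.shape[0] - k].mean(axis=0)

print("mean:   ", mean_combine(team_forecasts))
print("median: ", median_combine(team_forecasts))
print("trimmed:", trimmed_combine(team_forecasts))
```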

Applications

A Compartment Model of Human Mobility and Early Covid-19 Dynamics in NYC

In this paper, we build a mechanistic system to understand the relation between a reduction in human mobility and Covid-19 spread dynamics within New York City. To this end, we propose a multivariate compartmental model that jointly models smartphone mobility data and case counts during the first 90 days of the epidemic. Parameter calibration is achieved through the formulation of a general Bayesian hierarchical model to provide uncertainty quantification of resulting estimates. The open-source probabilistic programming language Stan is used for the requisite computation. Through sensitivity analysis and out-of-sample forecasting, we find our simple and interpretable model provides evidence that reductions in human mobility altered case dynamics.
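
The paper's model is a multivariate Bayesian compartmental model calibrated in Stan; the sketch below is only a deterministic toy version of the core mechanism, an SIR-type model whose transmission rate is scaled by a mobility index. The population size, rates, and mobility curve are all illustrative assumptions, not the paper's estimates.

```python
import numpy as np

# Minimal deterministic SIR sketch: transmission scales with a mobility
# index m(t) in [0, 1]. All parameter values below are illustrative.
N = 8.4e6                   # approximate NYC population
beta0, gamma = 0.65, 0.2    # baseline transmission and recovery rates
days = 90

# Hypothetical mobility index: full mobility, then a sharp drop near day 20.
mobility = np.where(np.arange(days) < 20, 1.0, 0.35)

S, I, R = N - 100.0, 100.0, 0.0
daily_infections = []
for t in range(days):
    beta_t = beta0 * mobility[t]        # mobility-modulated transmission
    new_inf = beta_t * S * I / N
    S, I, R = S - new_inf, I + new_inf - gamma * I, R + gamma * I
    daily_infections.append(new_inf)

print(f"peak daily infections: {max(daily_infections):,.0f} "
      f"on day {int(np.argmax(daily_infections))}")
```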

Applications

A Copula-based Fully Bayesian Nonparametric Evaluation of Cardiovascular Risk Markers in the Mexico City Diabetes Study

Cardiovascular disease is the leading cause of death worldwide, and several studies have been carried out to understand and explore cardiovascular risk markers in normoglycemic and diabetic populations. In this work, we explore the association structure between hyperglycemic markers and cardiovascular risk markers, controlling for triglycerides, body mass index, age, and gender, in the normoglycemic population of The Mexico City Diabetes Study. Understanding the association structure could contribute to the assessment of additional cardiovascular risk markers in this low-income urban population with a high prevalence of classic cardiovascular risk biomarkers. The association structure is measured by conditional Kendall's tau, defined through conditional copula functions. The latter are in turn modeled under a fully Bayesian nonparametric approach, which allows the complete shape of the copula function to vary across values of the controlled covariates.
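
For intuition, Kendall's tau has a closed form for many one-parameter copula families, e.g. tau = theta / (theta + 2) for the Clayton copula. The sketch below lets the copula parameter, and hence the conditional tau, vary with covariates; the parametric Clayton family and the linear covariate effect are illustrative stand-ins for the paper's fully nonparametric copula model.

```python
import numpy as np

def clayton_tau(theta):
    """Kendall's tau for a Clayton copula: tau = theta / (theta + 2)."""
    return theta / (theta + 2.0)

def conditional_tau(bmi, age):
    """Hypothetical conditional association: the copula parameter (and
    hence Kendall's tau) changes with the covariates. The log-linear
    form and coefficients below are invented for illustration; the paper
    instead models the copula function itself nonparametrically."""
    theta = np.exp(0.1 + 0.03 * (bmi - 25) + 0.01 * (age - 50))
    return clayton_tau(theta)

for bmi, age in [(22, 40), (28, 55), (34, 65)]:
    print(f"BMI={bmi}, age={age}: conditional tau = "
          f"{conditional_tau(bmi, age):.3f}")
```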

Applications

A Degradation Performance Model With Mixed-type Covariates and Latent Heterogeneity

Successful modeling of degradation performance data is essential for accurate reliability assessment and failure prediction of highly reliable product units. Degradation performance measurements over time are highly heterogeneous. Such heterogeneity can be partially attributed to external factors, such as accelerated/environmental conditions, and to internal factors, such as the material microstructure characteristics of product units. Latent heterogeneity due to unobserved/unknown factors shared within each product unit may also exist and needs to be considered as well. Existing degradation models often fail to consider (i) the influence of both external accelerated/environmental conditions and internal material information, and (ii) the influence of unobserved/unknown factors within each unit. In this work, we propose a generic degradation performance modeling framework with mixed-type covariates and latent heterogeneity that accounts for the influences of observed internal and external factors as well as unobserved factors. An effective estimation algorithm is also developed to jointly quantify the influences of the mixed-type covariates and the individual latent heterogeneity, and to examine potential interactions between the mixed-type covariates. Functional data analysis and data augmentation techniques are employed to address a series of estimation issues. A real case study is provided to demonstrate the superior performance of the proposed approach over several alternative modeling approaches. In addition, the proposed degradation performance modeling framework provides interpretable findings.
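
A common way to encode unit-level latent heterogeneity is a random effect on the degradation rate. The toy simulation below generates degradation paths driven by a continuous covariate, a binary covariate, and a per-unit random effect; all parameter values are invented, and the paper's functional-data framework is far more general than this linear sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units, n_times = 5, 20
times = np.linspace(0, 10, n_times)

# Mixed-type covariates (illustrative): a continuous stress level and a
# binary material-type indicator per unit.
stress = rng.uniform(0.5, 1.5, n_units)
material = rng.integers(0, 2, n_units)

# Latent heterogeneity: a per-unit random effect on the degradation rate,
# standing in for unobserved/unknown factors shared within each unit.
unit_effect = rng.normal(0.0, 0.2, n_units)

beta_stress, beta_material = 0.8, 0.3
paths = np.empty((n_units, n_times))
for i in range(n_units):
    rate = beta_stress * stress[i] + beta_material * material[i] + unit_effect[i]
    noise = rng.normal(0.0, 0.1, n_times)   # measurement error
    paths[i] = rate * times + noise

print("final degradation levels:", np.round(paths[:, -1], 2))
```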

Applications

A Dynamic Choice Model with Heterogeneous Decision Rules: Application in Estimating the User Cost of Rail Crowding

Crowding valuation of subway riders is an important input to various supply-side decisions of transit operators. The crowding cost perceived by a transit rider is generally estimated by capturing the trade-off that the rider makes between crowding and travel time while choosing a route. However, existing studies rely on static compensatory choice models and fail to account for inertia and the learning behaviour of riders. To address these challenges, we propose a new dynamic latent class model (DLCM) which (i) assigns riders to latent compensatory and inertia/habit classes based on different decision rules, (ii) enables transitions between these classes over time, and (iii) adopts instance-based learning theory to account for the learning behaviour of riders. We use the expectation-maximisation algorithm to estimate the DLCM, and the most probable sequence of latent classes for each rider is retrieved using the Viterbi algorithm. The proposed DLCM can be applied in any choice context to capture the dynamics of the decision rules used by a decision-maker. We demonstrate its practical advantages in estimating the crowding valuation of an Asian metro's riders. To calibrate the model, we recover the daily route preferences and in-vehicle crowding experiences of regular metro riders using two months of smart card and vehicle location data. The results indicate that the average rider follows the compensatory rule on only 25.5% of route choice occasions. DLCM estimates also show a 47% increase in metro riders' valuation of travel time under extremely crowded conditions relative to uncrowded conditions.
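
The Viterbi step mentioned above decodes the most probable sequence of latent classes from per-period class likelihoods and a transition matrix. A minimal sketch with two classes (compensatory vs. inertia/habit) and made-up probabilities:

```python
import numpy as np

def viterbi(log_lik, log_trans, log_init):
    """Most probable latent-class sequence.
    log_lik:   (T, K) per-period log-likelihood of each class
    log_trans: (K, K) log transition probabilities between classes
    log_init:  (K,)   log initial class probabilities"""
    T, K = log_lik.shape
    delta = np.empty((T, K))                # best log-score ending in class k
    psi = np.empty((T, K), dtype=int)       # backpointers
    delta[0] = log_init + log_lik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Two classes: 0 = compensatory, 1 = inertia/habit (illustrative numbers).
lik = np.log(np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]]))
trans = np.log(np.array([[0.8, 0.2], [0.3, 0.7]]))
init = np.log(np.array([0.5, 0.5]))
print("decoded classes per period:", viterbi(lik, trans, init))
```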

Applications

A Framework for Crop Price Forecasting in Emerging Economies by Analyzing the Quality of Time-series Data

Accurate crop price forecasting is important because it enables supply chain planners and government bodies to take appropriate actions by estimating market factors such as demand and supply. In emerging economies such as India, crop prices at marketplaces are manually entered every day, a process prone to human-induced errors such as entering incorrect data or entering no data for many days. In addition to such errors, fluctuations in the prices themselves make building a stable and robust forecasting solution a challenging task. Considering these complexities, in this paper we present techniques to build robust crop price prediction models using various features, including (i) the historical price and market arrival quantity of crops, (ii) historical weather data that influence crop production and transportation, and (iii) data quality-related features obtained through statistical analysis. We additionally propose a framework for context-based model selection and retraining that considers factors such as model stability, data quality metrics, and trend analysis of crop prices. To show the efficacy of the proposed approach, we present experimental results for two crops, tomato and maize, across 14 marketplaces in India and demonstrate that the proposed approach not only improves accuracy metrics significantly compared with standard forecasting techniques but also yields more robust models.
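
Data quality-related features of the kind used in item (iii) can be computed with simple statistics over the raw price series. The particular features below (share of missing days, runs of repeated values, spike counts) are plausible examples of such indicators, not the paper's exact feature list.

```python
import numpy as np

def quality_features(prices):
    """Simple data-quality indicators for a daily price series.
    `prices` is a float sequence with NaN for days with no entry.
    These particular features are illustrative; the paper derives its
    own quality metrics via statistical analysis."""
    p = np.asarray(prices, dtype=float)
    missing_share = float(np.mean(np.isnan(p)))
    obs = p[~np.isnan(p)]
    # Runs of identical consecutive values can indicate stale manual entry.
    diffs = np.diff(obs)
    repeat_share = float(np.mean(diffs == 0)) if diffs.size else 0.0
    # Day-over-day jumps beyond 3 robust z-scores flag likely entry errors.
    mad = np.median(np.abs(diffs - np.median(diffs))) if diffs.size else 0.0
    spikes = int(np.sum(np.abs(diffs) > 3 * 1.4826 * mad)) if mad > 0 else 0
    return {"missing_share": missing_share,
            "repeat_share": repeat_share,
            "spike_count": spikes}

series = [100, 100, np.nan, 104, 400, 103, np.nan, 103, 103, 105]
print(quality_features(series))
```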

Applications

A Fully Bayesian, Logistic Regression Tracking Algorithm for Mitigating Disparate Misclassification

We develop a fully Bayesian logistic regression tracking algorithm that aims to provide classification results that are unbiased when applied uniformly to individuals with differing values of a sensitive variable. Here, we consider bias in the form of differences in false prediction rates between the sensitive variable groups. Because the method is fully Bayesian, it is well suited to situations where group parameters or logistic regression coefficients are dynamic quantities. We illustrate our method, in comparison to others, on both simulated datasets and the well-known ProPublica COMPAS dataset.
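
The disparity targeted here can be measured as the gap in false positive and false negative rates across sensitive-variable groups. A minimal sketch with made-up labels and predictions:

```python
import numpy as np

def false_rates(y_true, y_pred):
    """False positive and false negative rates for binary outcomes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fpr = float(np.mean(y_pred[y_true == 0] == 1))
    fnr = float(np.mean(y_pred[y_true == 1] == 0))
    return fpr, fnr

# Illustrative predictions split by a binary sensitive attribute.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 1, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

for g in (0, 1):
    fpr, fnr = false_rates(y_true[group == g], y_pred[group == g])
    print(f"group {g}: FPR={fpr:.2f}, FNR={fnr:.2f}")
```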

Applications

A Generalized One Parameter Polynomial Exponential Generator Family of Distributions

A new class of distributions, called the Generalized One Parameter Polynomial Exponential-G family of distributions, is proposed for modelling lifetime data. An account of the structural and reliability properties of the new class is presented. Maximum likelihood estimation of the parameters of the class is described, and results of a simulation study are reported. Two data sets are analyzed to illustrate the applicability of the new family.
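
Since the abstract does not give the family's density, the sketch below only illustrates the general maximum likelihood workflow for a one-parameter lifetime model, with the plain exponential distribution standing in for the proposed family.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(lam, data):
    """Negative log-likelihood of an exponential(rate=lam) sample.
    This density is a stand-in; the Generalized One Parameter Polynomial
    Exponential-G density would replace it in practice."""
    if lam <= 0:
        return np.inf
    return -(len(data) * np.log(lam) - lam * np.sum(data))

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)   # true rate = 0.5

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), args=(data,),
                      method="bounded")
print(f"MLE of rate: {res.x:.3f} (analytic MLE: {1 / data.mean():.3f})")
```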

Applications

A Geospatial Functional Model For OCO-2 Data with Application on Imputation and Land Fraction Estimation

Data from NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite is essential to many carbon management strategies. A retrieval algorithm is used to estimate CO2 concentration using the radiance data measured by OCO-2. However, due to factors such as cloud cover and cosmic rays, the spatial coverage of the retrieval algorithm is limited in some areas of critical importance for carbon cycle science. Mixed land/water pixels along the coastline are also excluded from retrieval processing due to the lack of valid ancillary variables, including land fraction. We propose an approach to modeling spatial spectral data that addresses these two problems through radiance imputation and land fraction estimation. The spectral observations are modeled as spatially indexed functional data with footprint-specific parameters and are reduced to much lower dimensions by functional principal component analysis. The principal component scores are modeled as random fields to account for spatial dependence, and missing spectral observations are imputed by kriging the principal component scores. The proposed method is shown to impute spectral radiance with high accuracy for observations over the Pacific Ocean. An unmixing approach based on this model provides much more accurate land fraction estimates in our validation study along the coastline of Greece.
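
A minimal sketch of the reduce-then-impute idea: project spectra onto a few principal components, fill in a missing footprint's scores from nearby footprints, and reconstruct the spectrum. Plain PCA and nearest-neighbour score averaging here stand in for the paper's functional PCA and kriging, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "spectra": 50 footprints x 30 channels, smooth signal + noise.
grid = np.linspace(0, 1, 30)
locs = rng.uniform(0, 1, (50, 2))               # footprint coordinates
signal = np.sin(4 * grid)[None, :] * (1 + locs[:, :1])
spectra = signal + 0.05 * rng.normal(size=(50, 30))

# PCA via SVD of the centered data (a stand-in for functional PCA).
mean = spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
k = 2
scores = U[:, :k] * s[:k]                       # PC scores per footprint

# "Impute" footprint 0 by averaging the scores of its 5 nearest
# neighbours (the paper instead krigs the scores as random fields).
d = np.linalg.norm(locs - locs[0], axis=1)
nn = np.argsort(d)[1:6]
imputed_scores = scores[nn].mean(axis=0)
reconstructed = mean + imputed_scores @ Vt[:k]

err = np.linalg.norm(reconstructed - spectra[0]) / np.linalg.norm(spectra[0])
print(f"relative reconstruction error: {err:.3f}")
```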

Applications

A Graph-Theoretic Approach for Spatial Filtering and Its Impact on Mixed-type Spatial Pattern Recognition in Wafer Bin Maps

Statistical quality control in semiconductor manufacturing hinges on effective diagnostics of wafer bin maps, wherein a key challenge is to detect how defective chips tend to spatially cluster on a wafer, a problem known as spatial pattern recognition. Recently, there has been growing interest in mixed-type spatial pattern recognition, in which multiple defect patterns of different shapes co-exist on the same wafer. Mixed-type spatial pattern recognition entails two central tasks: (1) spatial filtering, to distinguish systematic patterns from random noise; and (2) spatial clustering, to group filtered patterns into distinct defect types. Observing that spatial filtering is instrumental to high-quality mixed-type pattern recognition, we propose to use a graph-theoretic method, called adjacency-clustering, which leverages the spatial dependence among adjacent defective chips to effectively filter raw wafer maps. Tested on real-world data and compared against a state-of-the-art approach, our proposed method achieves at least a 46% gain in internal cluster validation quality (i.e., validation without external class labels) and about a 5% gain in Normalized Mutual Information, an external cluster validation metric based on external class labels. Interestingly, the margin of improvement appears to be a function of pattern complexity, with larger gains achieved for more complex-shaped patterns.
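
The intuition behind adjacency-based filtering, that a defective chip is more credible as part of a systematic pattern when its neighbours are also defective, can be sketched with a simple neighbour-count filter. This majority-of-neighbours rule is a simplified stand-in for the paper's graph-theoretic adjacency-clustering method.

```python
import numpy as np

def neighbor_filter(wafer, min_neighbors=2):
    """Keep a defective chip (1) only if at least `min_neighbors` of its
    4-connected neighbours are also defective; isolated defects are
    treated as random noise. A simplified stand-in for the paper's
    adjacency-clustering filter."""
    padded = np.pad(wafer, 1)
    counts = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
              padded[1:-1, :-2] + padded[1:-1, 2:])
    return wafer * (counts >= min_neighbors)

# Toy wafer bin map: a small defect cluster plus scattered noise.
wafer = np.array([
    [0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
])
print(neighbor_filter(wafer))
```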
