Featured Research

Applications

A Hybrid Framework for Topology Identification of Distribution Grid with Renewables Integration

Topology identification (TI) is a key task for state estimation (SE) in distribution grids, especially those with high-penetration renewables. The uncertainties introduced by the time-series behavior of renewables will almost certainly lead to poor TI results without proper treatment. These uncertainties are analytically intractable under conventional frameworks: they are usually jointly spatially and temporally dependent, and hence cannot simply be treated as white noise. To this end, this paper proposes a hybrid framework that handles these uncertainties in a systematic and theoretically grounded way; in particular, big data analytics are employed to harness their joint spatial-temporal statistical properties. Using some prior knowledge, a model bank is first built to store the countably many typical network configurations; the difference between the SE output of each bank model and the observations can then be defined as a matrix variate, the so-called random matrix. Gaining insight into this random matrix requires a well-designed metric space: an auto-regression (AR) model, factor analysis (FA), and random matrix theory (RMT) are tied together to design the metric space, and a joint temporal-spatial analysis of the matrices is then conducted in a high-dimensional (vector) space. Under the proposed framework, big data analytics and theoretical results are obtained to improve TI performance. The framework is validated on an IEEE standard distribution network with field data.
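
To make the random-matrix idea concrete, the minimal Python sketch below (my own construction, not the authors' implementation) scores each candidate in a hypothetical model bank by how closely its standardized residual matrix behaves like pure noise, using the mean spectral radius as an illustrative criterion; all data, the bank entries, and the noise levels are assumptions.

```python
# Hedged sketch: pick the bank model whose residual matrix looks most like
# white noise, judged by a simple spectral statistic from random matrix
# theory. All quantities below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
N, T = 60, 240                       # buses (rows) x time samples (columns)

def mean_spectral_radius(residual):
    """Mean eigenvalue modulus of the standardized square residual block."""
    Z = residual - residual.mean(axis=1, keepdims=True)
    Z /= Z.std(axis=1, keepdims=True) + 1e-12
    X = Z[:, :N] / np.sqrt(N)        # square N x N block, variance-normalized
    return np.abs(np.linalg.eigvals(X)).mean()

# Hypothetical model bank: SE outputs under three candidate topologies.
truth = rng.normal(size=(N, T)).cumsum(axis=1) * 0.2
observation = truth + 0.1 * rng.normal(size=(N, T))
bank = {
    "topology_A": truth,                  # correct configuration
    "topology_B": 1.2 * truth,            # mis-scaled line parameters
    "topology_C": np.roll(truth, 3, axis=0),  # mis-wired buses
}

# For the correct topology the residual is (near-)white, so its mean
# spectral radius is closest to ~2/3, the circular-law value for an
# i.i.d. standardized random matrix.
CIRC = 2.0 / 3.0
scores = {k: mean_spectral_radius(observation - v) for k, v in bank.items()}
print(min(scores, key=lambda k: abs(scores[k] - CIRC)), scores)
```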

Read more
Applications

A Joint Spatial Conditional Auto-Regressive Model for Estimating HIV Prevalence Rates Among Key Populations

Ending the HIV/AIDS pandemic is among the Sustainable Development Goals for the next decade. In order to close the gap between the need for care and the available resources, a better understanding of HIV epidemics is needed to guide policy decisions, especially for key populations that are at higher risk of HIV infection. Accurate HIV epidemic estimates for key populations have been difficult to obtain because their HIV surveillance data are very limited. In this paper, we propose a joint spatial conditional auto-regressive model for estimating HIV prevalence rates among key populations. Our model borrows information from both neighboring locations and dependent populations. As illustrated in the real data analysis, it provides more accurate estimates than independently fitting a sub-epidemic model for each key population. In addition, we provide a study revealing the conditions under which our proposal gives better predictions. The study combines theoretical investigation and numerical experiments, revealing the strengths and limitations of our proposal.
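
As a hedged illustration of how such a joint prior can borrow strength, the sketch below builds a Kronecker-structured joint CAR precision over a toy chain of locations and two coupled populations; the graph, the coupling, and all parameters are assumptions chosen for exposition, not the authors' exact specification.

```python
# Minimal sketch of a joint CAR prior: two key populations share one spatial
# graph, and a Kronecker structure lets each location borrow strength from
# its neighbours AND from the other population. Toy 1-D chain of 6 sites.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

R = 6
W = np.zeros((R, R))
for i in range(R - 1):               # chain adjacency: i ~ i+1
    W[i, i + 1] = W[i + 1, i] = 1
D = np.diag(W.sum(axis=1))
rho, tau = 0.9, 2.0                  # spatial dependence and precision
Q_space = tau * (D - rho * W)        # proper CAR precision (positive definite)

Lam = np.array([[1.0, -0.6],         # between-population precision: negative
                [-0.6, 1.0]])        # off-diagonal => positively coupled fields
Q_joint = np.kron(Lam, Q_space)      # (2R x 2R) joint precision

# Posterior mean of logit-prevalence given sparse noisy observations y:
# standard Gaussian conjugate update, observation precision only at sites
# where surveillance data exist (population 2 has a single observed site).
rng = np.random.default_rng(1)
obs_prec = np.diag(np.r_[np.array([4, 0, 0, 4, 0, 4.0]),   # population 1
                         np.array([0, 4, 0, 0, 0, 0.0])])  # population 2
y = rng.normal(-2.0, 0.5, size=2 * R) * (np.diag(obs_prec) > 0)
post_prec = Q_joint + obs_prec
mean = cho_solve(cho_factor(post_prec), obs_prec @ y)
print(mean.reshape(2, R))            # population 2 is informed by population 1
```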

Read more
Applications

A Latent Mixture Model for Heterogeneous Causal Mechanisms in Mendelian Randomization

Mendelian Randomization (MR) is a popular method in epidemiology and genetics that uses genetic variation as instrumental variables for causal inference. Existing MR methods usually assume that most genetic variants are valid instrumental variables identifying a common causal effect. There is a general lack of awareness that this effect-homogeneity assumption can be violated when multiple causal pathways are involved, even if all the instrumental variables are valid. In this article, we introduce MR-PATH, a latent mixture model that groups together instruments yielding similar causal effect estimates. We develop a Monte-Carlo EM algorithm to fit this mixture model, derive approximate confidence intervals for uncertainty quantification, and adopt a modified Bayesian Information Criterion (BIC) for model selection. We verify the efficacy of the Monte-Carlo EM algorithm, confidence intervals, and model selection criterion using numerical simulations. We identify potential mechanistic heterogeneity when applying our method to estimate the effect of high-density lipoprotein cholesterol on coronary heart disease and the effect of adiposity on type II diabetes.
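
The simplified sketch below illustrates the mixture idea with a plain EM algorithm: instruments are clustered by the causal effect they imply. Unlike the paper's Monte-Carlo EM, this hedged approximation treats the SNP-exposure effects beta_x as fixed (no measurement error), and all data are simulated.

```python
# Toy EM for a two-pathway instrument mixture (simplified stand-in for
# MR-PATH). Model per instrument j: beta_y_j ~ N(theta_k * beta_x_j, se_y_j^2)
# with latent pathway k and mixture weights pi_k.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
J, K = 40, 2
beta_x = rng.normal(0.3, 0.1, J)
true_theta = np.array([0.0, 0.8])                # two causal pathways
z = rng.integers(0, K, J)
se_y = np.full(J, 0.05)
beta_y = rng.normal(true_theta[z] * beta_x, se_y)

theta, pi = np.array([-0.5, 0.5]), np.full(K, 1 / K)
for _ in range(200):
    # E-step: responsibility of pathway k for instrument j
    logp = norm.logpdf(beta_y[:, None], theta[None, :] * beta_x[:, None],
                       se_y[:, None]) + np.log(pi)[None, :]
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: weighted-least-squares slope per component, updated weights
    w = r / se_y[:, None] ** 2
    theta = (w * beta_x[:, None] * beta_y[:, None]).sum(0) / \
            (w * beta_x[:, None] ** 2).sum(0)
    pi = r.mean(axis=0)
print(np.sort(theta), pi)                        # ~ {0.0, 0.8}
```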

Read more
Applications

A Latent Survival Analysis Enabled Simulation Platform For Nursing Home Staffing Strategy Evaluation

Nursing homes are critical facilities for caring for frail older adults with round-the-clock formal care and personal assistance. To ensure quality care for nursing home residents, an adequate staffing level is of great importance. Current nursing home staffing practice is based mainly on experience and regulation. The objective of this paper is to investigate the viability of experience-based and regulation-based strategies, as well as alternative staffing strategies that minimize labor costs subject to the heterogeneous service demand of nursing home residents under various census scenarios. We propose a data-driven analysis framework that models the heterogeneous service demand of nursing home residents and identifies appropriate staffing strategies by combining survival models, computer simulation techniques, and domain knowledge. Specifically, we develop an agent-based simulation tool consisting of four main modules: an individual length-of-stay predictor, an individual daily staff time generator, a facility-level staffing strategy evaluator, and a graphical user interface. We use real nursing home data to validate the proposed model and demonstrate that the identified staffing strategy significantly reduces the total labor cost of certified nursing assistants compared to the benchmark strategies. Additionally, the proposed length-of-stay predictive model, which considers multiple discharge dispositions, exhibits superior accuracy and yields better staffing decisions than models without that consideration. Further, we construct different census scenarios of nursing home residents to demonstrate the capability of the proposed framework to help nursing home administrators adjust staffing decisions in various realistic settings.
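
The toy Python loop below (my own construction, not the paper's simulation platform) mirrors the described pipeline at a very small scale: sample each resident's length of stay from a survival distribution, generate daily care minutes, and cost out candidate staffing levels. Every distribution and parameter here is an illustrative assumption, not a fitted value.

```python
# Hedged sketch of the simulate-then-evaluate staffing loop.
import numpy as np

rng = np.random.default_rng(3)
DAYS, CENSUS = 365, 100
SHIFT_MIN, WAGE = 7.5 * 60, 18.0         # CNA shift minutes and hourly wage

def simulate_daily_demand():
    """Aggregate care minutes demanded per day across all residents."""
    demand = np.zeros(DAYS)
    for _ in range(CENSUS):
        los = int(rng.weibull(1.3) * 120) + 1            # length of stay (days)
        start = int(rng.integers(0, DAYS))               # admission day
        care = rng.lognormal(np.log(90), 0.4, size=los)  # care minutes/day
        end = min(start + los, DAYS)
        demand[start:end] += care[: end - start]
    return demand

def evaluate(staff_per_day, demand):
    """Annual labor cost and total unmet care minutes for a staffing level."""
    capacity = staff_per_day * SHIFT_MIN
    unmet = np.maximum(demand - capacity, 0).sum()
    cost = staff_per_day * DAYS * (SHIFT_MIN / 60) * WAGE
    return cost, unmet

demand = simulate_daily_demand()
for staff in range(10, 31, 5):
    cost, unmet = evaluate(staff, demand)
    print(f"{staff} CNAs/day: ${cost:,.0f}, unmet care {unmet:,.0f} min")
```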

Read more
Applications

A Mortality Model for Multi-populations: A Semi-Parametric Approach

Mortality differs across countries, states, and regions. Several empirical studies, however, reveal that mortality trends exhibit a common pattern and similar structures across populations. The key element in analyzing mortality rates is a time-varying indicator curve. Our main interest lies in validating the existence of common trends among these curves, similar gender differences, and their variability in location among the curves at the national level. Motivated by these empirical findings, we study the estimation and forecasting of mortality rates using a semi-parametric approach applied to multiple curves with shape-related nonlinear variation. This approach allows us to capture the common features contained in the curve functions while characterizing the nonlinear variation via a few deviation parameters. These parameters carry an instructive summary of the time-varying curve functions and can further be used for suggestive forecast analyses for countries with sparse data sets. The model is illustrated with mortality rates of Japan and China and extended to incorporate more countries.
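
A rough numerical sketch of the common-trend-plus-deviations idea follows: one shared time-varying curve is extracted from several populations' log-mortality series, and each population is then summarized by two deviation parameters (level and scale) around it. The data are synthetic, and this SVD-based extraction is a crude stand-in for the paper's semi-parametric estimator.

```python
# Hedged sketch: common curve via SVD, per-population deviation parameters
# via least squares. Not the paper's estimator; illustration only.
import numpy as np

rng = np.random.default_rng(4)
T, P = 50, 4
kappa = np.linspace(0, -1.5, T) + 0.05 * rng.normal(size=T)   # common trend
level = np.array([-3.0, -3.4, -2.8, -3.1])                    # deviation a_i
scale = np.array([1.0, 0.8, 1.3, 1.1])                        # deviation b_i
logm = level[:, None] + scale[:, None] * kappa[None, :] \
       + 0.02 * rng.normal(size=(P, T))

# Step 1: common curve = leading right-singular vector of centered curves.
centered = logm - logm.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
common = Vt[0] * s[0] / np.sqrt(P)

# Step 2: per-population deviation parameters by ordinary least squares.
X = np.c_[np.ones(T), common]
coef = np.linalg.lstsq(X, logm.T, rcond=None)[0]   # rows: level, scale
print(coef.T)   # recovered (a_i, b_i), up to the scale/sign of 'common'
```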

Read more
Applications

A Multi-Stage Stochastic Programming Approach to Epidemic Resource Allocation with Equity Considerations

Existing compartmental models in epidemiology are limited in their ability to optimize resource allocation for controlling an epidemic outbreak under disease-growth uncertainty. In this study, we address this core limitation by presenting a multi-stage stochastic programming compartmental model that integrates uncertain disease progression and resource allocation to control an infectious disease outbreak. The proposed multi-stage stochastic program involves various disease-growth scenarios and optimizes the distribution of treatment centers and resources while minimizing the total expected number of new infections and funerals. We define two new equity metrics, namely infection equity and capacity equity, and explicitly consider equity in allocating treatment funds and facilities over multiple time stages. We also study the multi-stage value of the stochastic solution (VSS), which demonstrates the superiority of the proposed stochastic programming model over its deterministic counterpart. We apply the proposed formulation to control the Ebola Virus Disease (EVD) in Guinea, Sierra Leone, and Liberia in West Africa and determine optimal and fair resource-allocation strategies. Our model balances the proportion of infections across regions even without the infection equity or prevalence equity constraints. The results also show that allocating treatment resources in proportion to population is sub-optimal; enforcing such a policy may adversely affect the total number of infections and deaths, resulting in a high price to pay for fairness. Our multi-stage stochastic epidemic-logistics model is practical and can be adapted to control other infectious diseases in meta-populations and dynamically evolving situations.
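
The deliberately tiny two-stage analogue below (brute-force enumeration, not the paper's multi-stage mixed-integer program) captures the core trade-off: split treatment beds across regions to minimize expected new infections over growth scenarios, subject to a simple capacity-equity constraint. All parameters, the averting factor, and the equity threshold are made-up assumptions.

```python
# Hedged sketch of scenario-based, equity-constrained resource allocation.
import itertools
import numpy as np

BEDS, REGIONS = 60, 3
pop = np.array([4.0, 2.5, 1.5])               # population (millions)
infected = np.array([300.0, 220.0, 180.0])    # current cases per region
growth = {"low": (0.9, 0.3), "mid": (1.2, 0.4), "high": (1.6, 0.3)}
#          scenario: (growth multiplier, probability)

def expected_new_infections(beds):
    total = 0.0
    for mult, prob in growth.values():
        # assume each bed averts ~12 cases; surplus capacity is wasted
        untreated = np.maximum(infected * mult - 12.0 * np.asarray(beds), 0)
        total += prob * untreated.sum()
    return total

best = None
for alloc in itertools.product(range(BEDS + 1), repeat=REGIONS):
    if sum(alloc) != BEDS:
        continue
    # capacity equity: no region's bed share may exceed its population
    # share by more than 20 percentage points (illustrative threshold)
    if np.any(np.array(alloc) / BEDS > pop / pop.sum() + 0.2):
        continue
    val = expected_new_infections(alloc)
    if best is None or val < best[0]:
        best = (val, alloc)
print(best)   # in this toy, the optimum differs from a proportional split
```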

Read more
Applications

A Multivariate Methodology for Analysing Students' Performance Using Register Data

We present a new method for jointly modelling students' results in the university's admission exams and their performance in subsequent courses at the university. The case considered involves all students enrolled in 2014 at the University of Campinas in evening-study programs in branches related to the exact sciences. We collected the number of attempts each student needed to pass the university geometry course and their results in the seven disciplines of the admission exams. The method combines multivariate generalised linear mixed models (GLMMs) with graphical models representing the covariance structure of the random components, allowing us to discuss associations between quantities of very different natures. We used a Gaussian GLMM to model performance in the admission exams and a frailty discrete-time Cox proportional hazards model, represented as a GLMM, to describe the number of attempts needed to pass geometry. The analyses were stratified into two populations: students who received a bonus giving advantages in the university's admission process to compensate for social and racial inequalities, and those who did not. The two populations presented different patterns. Using general properties of graphical models, we argue that, on the one hand, the predicted performance in the Mathematics admission exam could be used as the sole predictor of performance in geometry for students who received the bonus. On the other hand, the predicted performance in the Portuguese admission exam could be used as a single predictor of performance in geometry for students who did not receive the bonus.
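
The stripped-down sketch below illustrates just one ingredient of this methodology: the discrete-time survival model for "number of attempts to pass geometry", fitted as a logistic regression on expanded student-attempt records. The frailty terms and the joint GLMM/graphical-model structure are omitted, and all data and coefficients are synthetic.

```python
# Hedged sketch: discrete-time hazard of passing, person-attempt expansion.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n = 300
math_exam = rng.normal(0, 1, n)                 # admission exam score
# assumed truth: P(pass at attempt a) = logistic(-0.5 + 0.2*a + 0.9*score)
events, scores, attempt_no = [], [], []
for i in range(n):
    for a in range(1, 7):                       # at most 6 attempts observed
        p = 1 / (1 + np.exp(-(-0.5 + 0.2 * a + 0.9 * math_exam[i])))
        passed = rng.random() < p
        attempt_no.append(a)
        scores.append(math_exam[i])
        events.append(passed)
        if passed:
            break                               # later attempts never occur

X = np.c_[np.ones(len(events)), attempt_no, scores]
y = np.array(events, float)

def negloglik(beta):
    """Logistic negative log-likelihood on the expanded records."""
    eta = X @ beta
    return np.sum(np.logaddexp(0, eta)) - y @ eta

fit = minimize(negloglik, np.zeros(3), method="BFGS")
print(fit.x)    # ~ [-0.5, 0.2, 0.9]
```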

Read more
Applications

A New Spatial Count Data Model with Time-varying Parameters

Recent crash frequency studies incorporate spatiotemporal correlations, but these studies have two key limitations: i) none of them accounts for temporal variation in model parameters; and ii) the Gibbs sampler suffers from convergence issues due to non-conjugacy. To address the first limitation, we propose a new count data model that identifies the underlying temporal patterns of the regression parameters while simultaneously allowing for time-varying spatial correlation. The model is also extended to incorporate heterogeneity in non-temporal parameters across spatial units. We tackle the second shortcoming by deriving a Gibbs sampler that ensures conditionally conjugate posterior updates for all model parameters. To this end, we take advantage of Pólya-Gamma data augmentation and the forward filtering backward sampling (FFBS) algorithm. After validating the properties of the Gibbs sampler in a Monte Carlo study, the advantages of the proposed specification are demonstrated in an empirical application uncovering relationships between pavement characteristics and crash frequency spanning nine years. Model parameters exhibit practically significant temporal patterns (i.e., temporal instability); for example, the safety benefits of better pavement ride quality are estimated to increase over time.
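
The self-contained sketch below shows the FFBS building block mentioned above, for a toy univariate dynamic linear model y_t = theta_t + v_t, theta_t = theta_{t-1} + w_t. The Pólya-Gamma augmentation layer, which makes the count model conditionally Gaussian so that FFBS applies, is not reproduced here; all variances are illustrative.

```python
# Hedged sketch of forward filtering backward sampling for a local-level DLM.
import numpy as np

rng = np.random.default_rng(6)
T, V, W = 100, 0.5, 0.05                 # length, obs noise var, state var
theta = np.cumsum(rng.normal(0, np.sqrt(W), T))
y = theta + rng.normal(0, np.sqrt(V), T)

# Forward filtering (Kalman): filtered means m_t and variances C_t
m, C = np.zeros(T), np.zeros(T)
m_prev, C_prev = 0.0, 10.0
for t in range(T):
    R = C_prev + W                       # one-step-ahead state variance
    K = R / (R + V)                      # Kalman gain
    m[t] = m_prev + K * (y[t] - m_prev)
    C[t] = (1 - K) * R
    m_prev, C_prev = m[t], C[t]

# Backward sampling: draw theta_T, then theta_t | theta_{t+1}
draw = np.zeros(T)
draw[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
for t in range(T - 2, -1, -1):
    B = C[t] / (C[t] + W)                # smoothing gain
    mean = m[t] + B * (draw[t + 1] - m[t])
    var = C[t] * (1 - B)
    draw[t] = rng.normal(mean, np.sqrt(var))
print(np.corrcoef(draw, theta)[0, 1])    # sampled path tracks the true state
```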

Read more
Applications

A Novel Algorithm for Optimized Real Time Anomaly Detection in Timeseries

Observations in data which differ significantly from their neighbouring points but cannot be classified as noise are known as anomalies or outliers. These anomalies are a cause for concern, and a timely warning about their presence can be valuable. In this paper, we evaluate and compare the performance of popular algorithms from machine learning and statistics in detecting anomalies on both offline and real-time data. Our aim is an algorithm that handles all types of seasonal and non-seasonal data effectively and is fast enough to be of practical utility in real time. It is important to detect not only global anomalies but also those that are anomalous owing to their local surroundings; such outliers can be termed contextual anomalies, as they derive their context from the neighbouring observations. We also require a methodology to automatically determine whether seasonality is present in the given data. For detecting seasonality, the proposed algorithm takes a curve-fitting approach rather than model-based anomaly detection. The proposed model also introduces a filter that assesses the relative significance of local outliers and removes those deemed insignificant. Since the proposed model fits polynomials to buckets of time-series data, it does not suffer from problems such as heteroskedasticity and breakouts, unlike statistical alternatives such as ARIMA, SARIMA, and Holt-Winters. Experimental results show that the proposed algorithm performs better on both real-time and artificially generated datasets.
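
A condensed sketch of the bucketed curve-fitting idea follows: fit a low-degree polynomial per bucket of the series and flag points whose residual is extreme relative to a robust (MAD-based) scale. The full method's seasonality detection and significance filter for local outliers are omitted, and the bucket size, degree, and threshold below are illustrative choices.

```python
# Hedged sketch: per-bucket polynomial fit + robust residual thresholding.
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 300) + 0.1 * rng.normal(size=t.size)
series[[123, 401]] += 2.5                       # injected anomalies

BUCKET, DEGREE, K = 100, 3, 6.0
anomalies = []
for start in range(0, series.size, BUCKET):
    idx = np.arange(start, min(start + BUCKET, series.size))
    x = idx - idx.mean()                        # center for conditioning
    coefs = np.polyfit(x, series[idx], DEGREE)
    resid = series[idx] - np.polyval(coefs, x)
    mad = np.median(np.abs(resid - np.median(resid))) + 1e-12
    # 1.4826 * MAD ~ one standard deviation under Gaussian noise
    anomalies.extend(idx[np.abs(resid) > K * 1.4826 * mad].tolist())
print(anomalies)                                # ~ [123, 401]
```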

Read more
Applications

A Practical Model-based Segmentation Approach for Accurate Activation Detection in Single-Subject functional Magnetic Resonance Imaging Studies

Functional Magnetic Resonance Imaging (fMRI) maps cerebral activation in response to stimuli, but this activation is often difficult to detect, especially in low-signal contexts and single-subject studies. Accurate activation detection can be guided by two facts: very few voxels are, in reality, truly activated, and activated voxels are spatially localized; it is challenging, however, to incorporate both. We provide a computationally feasible and methodologically sound model-based approach, implemented in the R package MixfMRI, that bounds the a priori expected proportion of activated voxels while also incorporating spatial context. Results on simulation experiments for different levels of activation-detection difficulty are uniformly encouraging. The value of the methodology in low-signal and single-subject fMRI studies is illustrated on a sports imagination experiment. Concurrently, we also extend the potential use of fMRI as a clinical tool to, for example, detect awareness and improve treatment for individual patients in a persistent vegetative state, such as traumatic brain injury survivors.
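
The simplified 1-D illustration below (not the MixfMRI implementation) combines the two key ideas in miniature: an EM-fitted two-component mixture on voxel z-scores whose activated proportion is capped a priori, followed by a crude spatial smoothing of the posterior activation probabilities. The cap, effect size, and smoothing window are illustrative assumptions.

```python
# Hedged sketch: proportion-bounded mixture EM + neighbourhood smoothing.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n = 2000
active = np.zeros(n, bool)
active[900:960] = True                          # small, contiguous region
z = rng.normal(0, 1, n)
z[active] += rng.normal(3.0, 0.5, active.sum())

PI_MAX = 0.05                       # a priori bound on activated proportion
pi1, mu1, sd1 = 0.02, 2.0, 1.0
for _ in range(100):
    # E-step: posterior probability each voxel is activated
    p1 = pi1 * norm.pdf(z, mu1, sd1)
    p0 = (1 - pi1) * norm.pdf(z, 0, 1)
    g = p1 / (p0 + p1)
    # M-step, with the proportion bound enforced by truncation
    pi1 = min(g.mean(), PI_MAX)
    mu1 = (g * z).sum() / g.sum()
    sd1 = np.sqrt((g * (z - mu1) ** 2).sum() / g.sum()) + 1e-6

# spatial context: average each voxel's activation probability with its
# neighbours before thresholding (a crude stand-in for the paper's approach)
g_smooth = np.convolve(g, np.ones(5) / 5, mode="same")
detected = g_smooth > 0.5
print(detected.sum(), (detected & active).sum())  # detected vs true hits
```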

Read more
