Featured Research

Applications

Data-driven Analytical Models of COVID-2019 for Epidemic Prediction, Clinical Diagnosis, Policy Effectiveness and Contact Tracing: A Survey

The widely spread Coronavirus Disease 2019 (COVID-19) is one of the worst infectious disease outbreaks in history and has become an emergency of primary international concern. As the pandemic evolves, academic communities have been actively involved in various capacities, including accurate epidemic estimation, fast clinical diagnosis, policy effectiveness evaluation and the development of contact tracing technologies. There are more than 23,000 academic papers on the COVID-19 outbreak, and this number is doubling every 20 days while the pandemic is still ongoing [1]. The literature, however, is at an early stage and lacks a comprehensive survey from a data analytics perspective. In this paper, we review the latest models for analyzing COVID-19 related data, conduct post-publication model evaluations and cross-model comparisons, and collect data sources from different projects.

Applications

Defining Estimands Using a Mix of Strategies to Handle Intercurrent Events in Clinical Trials

Randomized controlled trials (RCTs) are the gold standard for evaluating the efficacy and safety of investigational interventions. If every patient in an RCT were to adhere to the randomized treatment, one could simply analyze the complete data to infer the treatment effect. However, intercurrent events (ICEs), such as the use of concomitant medication for unsatisfactory efficacy or treatment discontinuation due to adverse events or lack of efficacy, may lead to interventions that deviate from the original treatment assignment. Therefore, defining the appropriate estimand (the parameter to be estimated) based on the primary objective of the study is critical prior to determining the statistical analysis method and analyzing the data. The International Council for Harmonisation (ICH) E9 (R1), published on November 20, 2019, provides five strategies for defining the estimand: treatment policy, hypothetical, composite variable, while on treatment, and principal stratum. In this article, we propose an estimand that uses a mix of strategies to handle ICEs. This estimand is an average of the null treatment difference for patients with ICEs potentially related to safety and the treatment difference for the remaining patients had they completed the assigned treatments. Two examples from clinical trials evaluating anti-diabetes treatments are provided to illustrate the estimation of this proposed estimand and to compare it with estimates for estimands that use the hypothetical and treatment policy strategies to handle ICEs.
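
As a concrete illustration of how such a mixed-strategy estimate could be computed, here is a minimal Python sketch. The column names (arm, y, safety_ice), the toy data and the assumption that the outcome has already been handled under the hypothetical strategy for non-safety ICEs are all illustrative choices; this is not the estimator used in the paper.

    import numpy as np
    import pandas as pd

    def mixed_strategy_estimate(df):
        """Illustrative estimate of a mixed-strategy estimand.

        Hypothetical columns:
          arm        -- 1 for test treatment, 0 for control
          y          -- outcome (e.g., change in HbA1c), assumed already imputed
                        under the hypothetical "as if completed" scenario
          safety_ice -- 1 if the patient had an ICE potentially related to safety
        """
        p_safety = df["safety_ice"].mean()          # weight of the null component
        others = df[df["safety_ice"] == 0]
        diff = (others.loc[others["arm"] == 1, "y"].mean()
                - others.loc[others["arm"] == 0, "y"].mean())
        # Average of a null difference (weight p_safety) and the treatment
        # difference for the remaining patients (weight 1 - p_safety).
        return p_safety * 0.0 + (1.0 - p_safety) * diff

    # toy usage with simulated data
    rng = np.random.default_rng(0)
    toy = pd.DataFrame({
        "arm": rng.integers(0, 2, 200),
        "y": rng.normal(-0.5, 1.0, 200),
        "safety_ice": (rng.random(200) < 0.15).astype(int),
    })
    print(mixed_strategy_estimate(toy))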

Applications

Demand forecasting in hospitality using smoothed demand curves

Forecasting demand is one of the fundamental components of a successful revenue management system in hospitality. The industry requires interpretable models that allow a revenue management department to adapt and make data-driven decisions. Data analysis and forecasts play an essential role over the time until the check-in date, which differs per day of the week. This paper provides a new model, inspired by cubic smoothing splines, that yields smooth demand curves per rate class over the time until the check-in date. The model regulates the error between data points and a smooth curve, and is therefore able to capture natural guest behavior. The forecast is obtained by solving a linear programming model, which enables the incorporation of industry knowledge in the form of constraints. On data from a major hotel chain, the model achieves a lower error and 13.3% more revenue.
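
To make the idea concrete, the sketch below fits a smooth, non-negative demand curve by linear programming, minimising the absolute deviation from observed bookings plus a penalty on discrete second differences (an L1 analogue of the smoothing-spline idea). The smoothing weight and the toy booking data are assumptions for illustration; the paper's actual formulation and industry constraints are not reproduced here.

    import numpy as np
    from scipy.optimize import linprog

    def smooth_demand_curve(y, lam=5.0):
        """Fit a smooth non-negative curve f to booking counts y (indexed by days
        before check-in) by minimising  sum|y - f| + lam * sum|second differences|.
        An illustrative L1 analogue of cubic smoothing splines; lam is a made-up
        smoothing weight."""
        y = np.asarray(y, dtype=float)
        n = len(y)
        I = np.eye(n)
        D2 = np.zeros((n - 2, n))
        for i in range(n - 2):                       # discrete second differences
            D2[i, i:i + 3] = [1.0, -2.0, 1.0]
        Z1 = np.zeros((n, n - 2))
        Z2 = np.zeros((n - 2, n))
        # variables: x = [f (n), e (n), r (n-2)], all >= 0 (linprog default bounds)
        A_ub = np.block([
            [ I, -I,  Z1],                           #  f - e <=  y
            [-I, -I,  Z1],                           # -f - e <= -y
            [ D2, Z2, -np.eye(n - 2)],               #  D2 f - r <= 0
            [-D2, Z2, -np.eye(n - 2)],               # -D2 f - r <= 0
        ])
        b_ub = np.concatenate([y, -y, np.zeros(2 * (n - 2))])
        c = np.concatenate([np.zeros(n), np.ones(n), lam * np.ones(n - 2)])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        return res.x[:n]

    # toy booking curve for one rate class, 30 days out to check-in
    rng = np.random.default_rng(1)
    days = np.arange(30)
    bookings = np.maximum(0, 20 * np.exp(-days / 10) + rng.normal(0, 2, 30))
    print(np.round(smooth_demand_curve(bookings), 1))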

Applications

Deriving information from missing data: implications for mood prediction

The availability of mobile technologies has enabled the efficient collection of prospective, longitudinal, ecologically valid, self-reported mood data from psychiatric patients. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis as well as for predicting future mood states, enabling earlier intervention. However, missing responses are common in such datasets, and there is little consensus as to how they should be dealt with in practice. A signature-based method was used to capture different elements of self-reported mood alongside missing data, both to classify diagnostic group and to predict future mood in patients with bipolar disorder, borderline personality disorder and healthy controls. The missing-response-incorporated signature-based method achieves roughly 66% correct diagnosis, with F1 scores of 59% (bipolar disorder), 75% (healthy controls) and 61% (borderline personality disorder) for the three clinical groups. This was significantly more efficient than a naive model that excluded missing data. Accuracies in predicting subsequent mood states and scores were also improved by the inclusion of missing responses. The signature method provides an effective approach to the analysis of prospectively collected mood data where missing data are common, and should be considered for other similar datasets.
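
As a sketch of the general idea (not the authors' exact pipeline), the snippet below computes the level-1 and level-2 terms of the path signature for a mood stream in which missingness is encoded as an extra channel rather than discarded; the resulting feature vector could then be fed to any classifier. The toy mood series, the mean fill-in and the channel layout are assumptions made for illustration.

    import numpy as np

    def level2_signature(path):
        """Level-1 and level-2 terms of the path signature of a piecewise-linear
        path (T x d array): the increments and the iterated integrals."""
        dx = np.diff(path, axis=0)                      # segment increments
        s1 = dx.sum(axis=0)                             # level 1
        prev = np.cumsum(np.vstack([np.zeros(path.shape[1]), dx]), axis=0)[:-1]
        s2 = prev.T @ dx + 0.5 * dx.T @ dx              # level 2 iterated integrals
        return np.concatenate([s1, s2.ravel()])

    # Hypothetical example: a weekly mood score in [1, 7] with missing entries.
    mood = np.array([4.0, np.nan, 5.0, 3.0, np.nan, np.nan, 2.0, 4.0])
    t = np.arange(len(mood), dtype=float)
    missing = np.isnan(mood).astype(float)              # missingness as its own channel
    filled = np.where(np.isnan(mood), np.nanmean(mood), mood)   # simple fill-in
    path = np.column_stack([t, filled, np.cumsum(missing)])
    features = level2_signature(path)                   # feed into any classifier
    print(features.shape)                               # (3 + 9,) = (12,)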

Applications

Designing group sequential clinical trials when a delayed effect is anticipated: A practical guidance

A common feature of many recent trials evaluating the effects of immunotherapy on survival is that non-proportional hazards can be anticipated at the design stage. This raises the possibility of using a statistical method tailored to testing the purported long-term benefit, rather than applying the more standard log-rank test and/or Cox model. Many such proposals have been made in recent years, but there remains a lack of practical guidance on implementation, particularly in the context of group-sequential designs. In this article, we aim to fill this gap. We discuss how the POPLAR trial, which compared immunotherapy versus chemotherapy in non-small-cell lung cancer, might have been redesigned to be more robust to the presence of a delayed effect. We then provide step-by-step instructions on how to analyse a hypothetical realisation of the trial based on this new design. Basic theory on weighted log-rank tests and group-sequential methods is covered, and an accompanying R package (including a vignette) is provided.
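
For readers who want a feel for the weighted log-rank statistic referred to above, here is a small self-contained Python sketch of a Fleming-Harrington G(rho, gamma) test, with G(0, 1) up-weighting late events as would suit an anticipated delayed effect. The accompanying R package is the recommended route; this sketch and its simulated toy data are only illustrative.

    import numpy as np

    def fh_weighted_logrank(time, event, arm, rho=0.0, gamma=1.0):
        """Fleming-Harrington G(rho, gamma) weighted log-rank Z statistic.
        G(0, 1) up-weights late events, which suits an anticipated delayed effect.
        time/event/arm are 1-d arrays; event = 1 for an observed event, arm in {0, 1}."""
        time, event, arm = map(np.asarray, (time, event, arm))
        s_pooled = 1.0                        # left-continuous pooled Kaplan-Meier
        num, var = 0.0, 0.0
        for t in np.unique(time[event == 1]):
            at_risk = time >= t
            n, n1 = at_risk.sum(), (at_risk & (arm == 1)).sum()
            d = ((time == t) & (event == 1)).sum()
            d1 = ((time == t) & (event == 1) & (arm == 1)).sum()
            w = s_pooled ** rho * (1.0 - s_pooled) ** gamma
            num += w * (d1 - d * n1 / n)      # observed minus expected in arm 1
            if n > 1:
                var += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
            s_pooled *= 1.0 - d / n
        return num / np.sqrt(var)

    # toy example: two arms of 150 patients with censoring
    rng = np.random.default_rng(1)
    arm = np.repeat([0, 1], 150)
    t = np.where(arm == 1, rng.exponential(14, 300), rng.exponential(10, 300))
    cens = rng.uniform(0, 24, 300)
    time, event = np.minimum(t, cens), (t <= cens).astype(int)
    print(fh_weighted_logrank(time, event, arm))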

Applications

Detecting Change Signs with Differential MDL Change Statistics for COVID-19 Pandemic Analysis

We are concerned with the issue of detecting changes and their signs from a data stream. For example, given a time series of COVID-19 cases in a region, we may raise early warning signals of outbreaks by detecting signs of changes in the cases. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for scoring change signs. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets, and apply it to detecting early warning signals of the COVID-19 epidemic. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as a significant increase or decrease in cases. Remarkably, for about 64% of the events of significant case increase across the 37 studied countries, our method detects warning signals on average nearly six days before the events, buying considerable time to respond. We further relate the warning signals to the basic reproduction number R0 and the timing of social distancing. The results show that our method can effectively monitor the dynamics of R0 and confirm the effectiveness of social distancing at containing the epidemic in a region. We conclude that our method is a promising approach to pandemic analysis from a data science viewpoint. The software for the experiments is available at this https URL. An online detection system is available at this https URL.
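
The snippet below is a simplified, Gaussian-model sketch of the MDL change-score idea: the score at a candidate split point is the code length saved by encoding the two segments separately, and differentiating such scores over time is what D-MDL does to flag signs of change. The model, penalty and toy case series are assumptions for illustration, not the authors' exact D-MDL implementation.

    import numpy as np

    def gaussian_code_length(x):
        """Two-part code length (nats) of x under a Gaussian with plug-in MLEs
        plus a (k/2) log n parametric-complexity penalty (k = 2 parameters)."""
        n = len(x)
        var = max(x.var(), 1e-8)
        nll = 0.5 * n * np.log(2 * np.pi * var) + 0.5 * n
        return nll + 0.5 * 2 * np.log(n)

    def mdl_change_scores(x):
        """MDL change score at each split point of a window x: code length saved
        by encoding the two segments separately.  A simplified sketch in the
        spirit of D-MDL, which differentiates such scores to flag change signs."""
        total = gaussian_code_length(x)
        scores = []
        for k in range(2, len(x) - 2):
            scores.append(total - (gaussian_code_length(x[:k])
                                   + gaussian_code_length(x[k:])))
        return np.array(scores)

    # toy daily-case series with a growth change part-way through
    cases = np.concatenate([np.full(20, 10.0), np.linspace(10, 80, 20)])
    cases += np.random.default_rng(2).normal(0, 2, cases.size)
    scores = mdl_change_scores(cases)
    print(int(np.argmax(scores)) + 2, scores.max())     # most likely change point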

Applications

Detection of foraging behavior from accelerometer data using U-Net type convolutional networks

The narwhal is one of the most mysterious marine mammals, due to its isolated habitat in the Arctic region. Tagging is a technology with the potential to explore the activities of this species, as behavioral information can be collected from instrumented individuals, including accelerometer, diving and acoustic data as well as GPS positioning. An essential element in understanding the ecological role of toothed whales is characterizing their feeding behavior and estimating the amount of food consumption. Buzzes are sounds emitted by toothed whales that are directly related to foraging behavior. It is therefore of interest to measure or estimate the rate of buzzing to estimate prey intake. The main goal of this paper is to find a way to detect prey capture attempts directly from accelerometer data, and thus to estimate food consumption without the need for the more demanding acoustic data. We develop three automated buzz detection methods based solely on accelerometer and depth data. We use a dataset from five narwhals instrumented in East Greenland in 2018 to train, validate and test a logistic regression model and the machine learning algorithms random forest and deep learning, using the buzzes detected from acoustic data as the ground truth. The deep learning algorithm performed best among the tested methods. We conclude that reliable buzz detectors can be derived from high-frequency-sampling, back-mounted accelerometer tags, thus providing an alternative tool for studies of the foraging ecology of marine mammals in their natural environments. We also compare buzz detection with movement patterns used in other marine mammal species to estimate prey capture, such as sudden changes in acceleration (jerks). We find that narwhals do not seem to make large jerks when foraging and conclude that their hunting patterns in that respect differ from those of other marine mammals.
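
As an illustration of the kind of deep learning model one might use for this task, here is a minimal 1-D U-Net-style network in PyTorch that maps windows of accelerometer plus depth channels to per-sample buzz logits. The channel count, layer sizes and window length are assumptions for the sketch and do not reproduce the architecture used in the paper.

    import torch
    import torch.nn as nn

    class TinyUNet1D(nn.Module):
        """Minimal 1-D U-Net-style sketch for per-sample buzz labelling from tag
        data (channels: 3-axis acceleration + depth).  Sizes are illustrative."""
        def __init__(self, in_ch=4):
            super().__init__()
            self.enc1 = self._block(in_ch, 16)
            self.enc2 = self._block(16, 32)
            self.pool = nn.MaxPool1d(2)
            self.up = nn.ConvTranspose1d(32, 16, kernel_size=2, stride=2)
            self.dec1 = self._block(32, 16)
            self.head = nn.Conv1d(16, 1, kernel_size=1)   # buzz logit per sample

        @staticmethod
        def _block(c_in, c_out):
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(c_out, c_out, kernel_size=5, padding=2), nn.ReLU())

        def forward(self, x):                  # x: (batch, channels, time)
            e1 = self.enc1(x)
            e2 = self.enc2(self.pool(e1))
            d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))   # skip connection
            return self.head(d1)               # (batch, 1, time) logits

    # toy forward pass: 8 windows of 256 samples, 4 channels
    model = TinyUNet1D()
    logits = model(torch.randn(8, 4, 256))
    print(logits.shape)                        # torch.Size([8, 1, 256])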

Applications

Determination and estimation of optimal quarantine duration for infectious diseases with application to data analysis of COVID-19

Quarantine is a commonly used non-pharmaceutical intervention during outbreaks of infectious diseases. A key problem in implementing quarantine is determining its duration. In this paper, a policy with optimal quarantine duration is developed. The policy assigns a different quarantine duration to each individual according to his or her characteristics. The policy is optimal in the sense that it minimizes the average quarantine duration of uninfected people, subject to the constraint that the probability of symptom presentation for infected people attains a given value close to 1. The optimal quarantine duration is obtained and estimated with statistical methods, and applied to the analysis of COVID-19 data.
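
The core calculation can be illustrated in a few lines: for a given incubation-period distribution, the required quarantine duration is the (1 - eps) quantile of that distribution, where eps is the tolerated probability that an infected person has not yet shown symptoms by release. The log-normal form and its parameters below are illustrative assumptions, not the estimates derived in the paper, which also tailors the duration to individual characteristics.

    import math
    from scipy.stats import lognorm

    def quarantine_duration(mu=1.62, sigma=0.42, eps=0.01):
        """Smallest d with P(incubation <= d) >= 1 - eps, assuming a log-normal
        incubation period with log-mean mu and log-sd sigma (illustrative values,
        not the paper's estimates)."""
        return lognorm.ppf(1.0 - eps, s=sigma, scale=math.exp(mu))

    for eps in (0.05, 0.01):
        print(f"eps = {eps}: quarantine for {quarantine_duration(eps=eps):.1f} days")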

Applications

Diagnosis Prevalence vs. Efficacy in Machine-learning Based Diagnostic Decision Support

Many recent studies use machine learning to predict a small number of ICD-9-CM codes. In practice, however, physicians have to consider a broader range of diagnoses. This study aims to put these previously incongruent evaluation settings on a more equal footing by predicting ICD-9-CM codes based on electronic health record properties and demonstrating the relationship between diagnosis prevalence and system performance. We extracted patient features from the MIMIC-III dataset for each admission and trained and evaluated 43 different machine learning classifiers. Among this pool, the most successful classifier was a multi-layer perceptron. In accordance with general machine learning expectations, we observed the F1 scores of all classifiers drop as disease prevalence decreased: scores fell from 0.28 for the 50 most prevalent ICD-9-CM codes to 0.03 for the 1000 most prevalent ICD-9-CM codes. Statistical analyses showed a moderate positive correlation between disease prevalence and efficacy (0.5866).
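
A compact way to reproduce the qualitative relationship between code prevalence and per-code F1 is sketched below using synthetic multi-label data; MIMIC-III itself requires credentialed access, so the data generator, model size and label count here are stand-in assumptions rather than the study's setup.

    import numpy as np
    from sklearn.datasets import make_multilabel_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import f1_score

    # Synthetic stand-in for "predict many diagnosis codes from record features".
    X, Y = make_multilabel_classification(n_samples=2000, n_features=50,
                                          n_classes=30, n_labels=3, random_state=0)
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    clf.fit(X_tr, Y_tr)                       # multi-label: Y is a binary indicator matrix

    per_code_f1 = f1_score(Y_te, clf.predict(X_te), average=None, zero_division=0)
    prevalence = Y_tr.mean(axis=0)            # how common each "code" is
    print("correlation(prevalence, F1) =",
          np.corrcoef(prevalence, per_code_f1)[0, 1].round(3))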

Applications

Difference-in-Differences for Ordinal Outcomes: Application to the Effect of Mass Shootings on Attitudes toward Gun Control

The difference-in-differences (DID) design is widely used in observational studies to estimate the causal effect of a treatment when repeated observations over time are available. Yet almost all existing methods assume linearity in the potential outcome (the parallel trends assumption) and target the additive effect. In social science research, however, many outcomes of interest are measured on an ordinal scale. This makes the linearity assumption inappropriate because the difference between two ordinal potential outcomes is not well defined. In this paper, I propose a method to draw causal inferences for ordinal outcomes under the DID design. Unlike existing methods, the proposed method utilizes the latent variable framework to handle the non-numeric nature of the outcome, enabling identification and estimation of causal effects based on an assumption about the quantiles of the latent continuous variable. The paper also proposes an equivalence-based test to assess the plausibility of the key identification assumption when additional pre-treatment periods are available. The proposed method is applied to a study estimating the causal effect of mass shootings on the public's support for gun control. I find little evidence for a uniform shift toward pro-gun-control policies as found in the previous study, but I find that the effect is concentrated on left-leaning respondents who experienced a shooting for the first time in more than a decade.
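
To convey the latent-variable idea in its simplest form, the sketch below maps each group-period cell of an ordinal outcome onto a latent probit scale through the inverse normal CDF of the cumulative probability at one cutpoint, and then takes the usual double difference; the threshold cancels under the assumption that it is common across cells. This is a toy illustration with made-up category counts, not the estimator or equivalence test proposed in the paper.

    import numpy as np
    from scipy.stats import norm

    def latent_did(pre_ctrl, post_ctrl, pre_trt, post_trt, cut=1):
        """Toy latent-scale DID for an ordinal outcome.

        Each argument is an array of category counts for one group-period cell.
        Under a probit latent-variable model with a threshold common to all
        cells, Phi^{-1} of P(Y <= cut) identifies the latent mean relative to
        that threshold, which cancels in the double difference."""
        def latent_location(counts):
            p = np.asarray(counts, dtype=float)
            p_le = p[:cut + 1].sum() / p.sum()        # P(Y <= cut)
            return -norm.ppf(p_le)                    # latent mean minus threshold
        d_trt = latent_location(post_trt) - latent_location(pre_trt)
        d_ctrl = latent_location(post_ctrl) - latent_location(pre_ctrl)
        return d_trt - d_ctrl

    # hypothetical 4-category support-for-gun-control counts (oppose ... strongly support)
    print(latent_did([40, 30, 20, 10], [38, 31, 20, 11],
                     [41, 29, 21,  9], [30, 28, 25, 17]))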

