Featured Research

Applications

Copas' method is sensitive to different mechanisms of publication bias

Copas' method corrects a pooled estimate from an aggregated data meta-analysis for publication bias. Its performance has been studied for one particular mechanism of publication bias. We show through simulations that Copas' method is not robust against other realistic mechanisms. This calls the usefulness of Copas' method into question, since the publication bias mechanism is typically unknown in practice.
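
As an illustration of the kind of simulation involved, the sketch below (our own, not the authors' code) generates a meta-analysis under two different publication mechanisms, a Copas-style selection on study precision and a significance filter, and compares the resulting naive pooled estimates; the parameters and selection rules are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, n_studies = 0.2, 2000
se = rng.uniform(0.05, 0.5, n_studies)        # within-study standard errors
y = rng.normal(true_effect, se)               # observed study effects

# Mechanism A (Copas-style): publish when a latent propensity, increasing
# in study precision and correlated with the estimation error, is positive.
a, b, rho = -0.5, 0.15, 0.5
eps = (y - true_effect) / se                  # standardized estimation error
delta = rho * eps + np.sqrt(1 - rho**2) * rng.normal(size=n_studies)
published_A = a + b / se + delta > 0

# Mechanism B: one-sided significance filter, with a 20% chance of
# publishing a non-significant result anyway.
published_B = (y / se > 1.645) | (rng.random(n_studies) < 0.2)

def pooled(keep):
    w = 1.0 / se[keep] ** 2                   # fixed-effect inverse-variance weights
    return np.sum(w * y[keep]) / np.sum(w)

print("truth: %.2f  mechanism A: %.3f  mechanism B: %.3f"
      % (true_effect, pooled(published_A), pooled(published_B)))
```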

Read more
Applications

Correlated power time series of individual wind turbines: A data driven model approach

Wind farms can be regarded as complex systems that are, on the one hand, coupled to the nonlinear, stochastic characteristics of weather and, on the other hand, strongly influenced by supervisory control mechanisms. One crucial problem in this context today is the predictability of wind energy as an intermittent renewable resource with an additionally non-stationary nature. Here, we analyze the power time series measured in an offshore wind farm for a total period of one year with a time resolution of 10 min. Applying detrended fluctuation analysis, we characterize the autocorrelation of power time series and find a Hurst exponent in the persistent regime with cross-over behavior. To enrich the modeling perspective of complex large wind energy systems, we develop a stochastic reduced-form model of power time series. The observed transitions between two dominating power generation phases are reflected by a bistable deterministic component, while correlated stochastic fluctuations account for the identified persistence. The model succeeds in qualitatively reproducing several empirical characteristics, such as the autocorrelation function and the bimodal probability density function.
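
A minimal sketch of such a reduced-form model, with an assumed double-well drift and Ornstein-Uhlenbeck noise standing in for the paper's fitted components:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n = 1.0, 50_000                     # 10-min steps, roughly one year
tau, sigma = 50.0, 0.02                 # noise correlation time and amplitude (assumed)
x = np.empty(n)
x[0], eta = 0.2, 0.0                    # normalized power and OU noise state

def drift(p, lo=0.1, hi=0.9, k=4.0):
    """Double-well drift with stable phases at low and rated power."""
    return -k * (p - lo) * (p - 0.5 * (lo + hi)) * (p - hi)

for t in range(1, n):
    eta += -eta / tau * dt + sigma * np.sqrt(dt) * rng.normal()
    x[t] = np.clip(x[t - 1] + drift(x[t - 1]) * dt + eta * dt, 0.0, 1.0)

# x should show a bimodal histogram (two generation phases) and a slowly
# decaying autocorrelation, mimicking the persistence found via DFA.
```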

Read more
Applications

Cost-sensitive Multi-class AdaBoost for Understanding Driving Behavior with Telematics

Powered with telematics technology, insurers can now capture a wide range of data, such as distance traveled, how drivers brake, accelerate or make turns, and travel frequency each day of the week, to better decode drivers' behavior. Such additional information helps insurers improve risk assessments for usage-based insurance (UBI), an increasingly popular industry innovation. In this article, we explore how to integrate telematics information to better predict claims frequency. For motor insurance during a policy year, we typically observe a large proportion of drivers with zero claims, a smaller proportion with exactly one claim, and far fewer with two or more claims. We introduce the use of a cost-sensitive multi-class adaptive boosting (AdaBoost) algorithm, which we call SAMME.C2, to handle such imbalances. To calibrate the SAMME.C2 algorithm, we use empirical data collected from a telematics program in Canada and find improved assessment of driving behavior with telematics relative to traditional risk variables. We demonstrate that our algorithm can outperform other models that handle class imbalances: SAMME, SAMME with SMOTE, RUSBoost, and SMOTEBoost. The telematics data sample covers observations from 2013-2016, of which 50,301 are used for training and another 21,574 for testing. Broadly speaking, the additional information derived from vehicle telematics helps refine the risk classification of drivers for UBI.
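
The sketch below illustrates the general mechanics of cost-sensitive multi-class boosting with SAMME-style learner weights; it is an illustrative variant, not the authors' exact SAMME.C2 algorithm, and the class costs are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_samme(X, y, cost, n_rounds=50):
    """y must hold integer labels 0..K-1; cost[c] up-weights errors on class c."""
    K = len(np.unique(y))
    w = cost[y].astype(float)
    w /= w.sum()                                   # cost-weighted initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(K - 1)   # SAMME learner weight
        if alpha <= 0:
            break
        w *= np.exp(alpha * miss * cost[y])        # cost-scaled re-weighting
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)

def predict(learners, alphas, X, K=3):
    votes = np.zeros((len(X), K))
    for stump, a in zip(learners, alphas):
        votes[np.arange(len(X)), stump.predict(X)] += a
    return votes.argmax(axis=1)

# e.g. with classes {0 claims, 1 claim, 2+ claims}:
# learners, alphas = cost_sensitive_samme(X, y, cost=np.array([1.0, 2.0, 4.0]))
```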

Read more
Applications

Coupling physical understanding and statistical modeling to estimate ice jam flood frequency in the northern Peace-Athabasca Delta under climate change

The Peace-Athabasca Delta (PAD) of northwestern Alberta is one of the largest inland freshwater deltas in the world, lying at the confluence of the Peace and Athabasca Rivers. The PAD is recognized as having unique ecological significance, and periodic ice jam flooding from both rivers is an important feature of its current ecology. Past studies have debated whether a change in ice jam flood (IJF) frequency on the Peace River has recently occurred, and what factors might be driving any perceived changes. This study contributes to this debate by addressing two questions: (1) what factors are most predictive of Peace River IJFs, and (2) how might climate change impact IJF frequency? This work starts with a physically-based conceptual model of the necessary conditions for a large Peace River IJF, and the factors that indicate whether those conditions are met. Logistic regression is applied to the historical flood record to determine which combination of hydroclimatic and riverine factors best predict IJFs and the uncertainty in those relationships given the available data. Winter precipitation and temperature are most predictive of Peace River IJFs, while freeze-up elevation contains little predictive power and is not closely related to IJF occurrence. The best logistic regression model is forced with downscaled climate change scenarios from multiple climate models to project IJF frequency for a variety of plausible futures. Parametric uncertainty in the best logistic regression model is propagated into the projections using a parametric bootstrap to sample many plausible statistical models. Although there is variability across emissions scenarios and climate models, all projections indicate that the frequency of Peace River IJFs is likely to decrease substantially in the coming decades, and that average waiting times between future IJFs will likely surpass recent experience.
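
A minimal sketch of the statistical core on synthetic data (variable values and the climate scenario are assumptions): fit a logistic regression of IJF occurrence on winter precipitation and temperature, then propagate parametric uncertainty by sampling coefficients from their asymptotic distribution.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
# toy stand-in for the historical record: winter precipitation (mm), temperature (C)
precip = rng.normal(150, 40, 60)
temp = rng.normal(-20, 3, 60)
X = sm.add_constant(np.column_stack([precip, temp]))
p_true = 1 / (1 + np.exp(-(-1.0 + 0.02 * (precip - 150) - 0.3 * (temp + 20))))
ijf = rng.binomial(1, p_true)             # 1 = large ice jam flood that year

fit = sm.Logit(ijf, X).fit(disp=0)

# parametric bootstrap: sample coefficient vectors and push one climate
# scenario through each draw to get a distribution of flood probabilities
beta = rng.multivariate_normal(fit.params, fit.cov_params(), size=2000)
scenario = np.array([1.0, 120.0, -16.0])  # a warmer, drier winter (assumed)
proj = 1 / (1 + np.exp(-beta @ scenario))
print("projected IJF probability: median %.2f, 90%% interval (%.2f, %.2f)"
      % (np.median(proj), *np.percentile(proj, [5, 95])))
```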

Read more
Applications

Covid-19 risk factors: Statistical learning from German healthcare claims data

We analyse prior risk factors for severe, critical or fatal courses of Covid-19 based on a retrospective cohort using claims data of the AOK Bayern. As our main methodological contribution, we avoid prior grouping and pre-selection of candidate risk factors. Instead, fine-grained hierarchical information from medical classification systems for diagnoses, pharmaceuticals and procedures is used, resulting in more than 33,000 covariates. Our approach has better predictive ability than well-specified morbidity groups but does not need prior subject-matter knowledge. The methodology and estimated coefficients are made available to decision makers to prioritize protective measures for vulnerable subpopulations and to researchers who wish to adjust for a large set of confounders in studies of individual risk factors.
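
One plausible reading of this strategy, sketched below on synthetic data, is an L1-penalized logistic regression over sparse indicators for every fine-grained code; the penalized logit is our assumption of a suitable learner, not necessarily the paper's exact estimator.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 5000, 33000                        # insurees x fine-grained codes
X = sparse_random(n, p, density=0.001, format="csr", random_state=3)
X.data[:] = 1.0                           # binary indicators for observed codes
beta = np.zeros(p)
beta[:20] = 2.0                           # a few truly relevant codes
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 2.0))))

# the saga solver handles sparse inputs and L1 penalties at this scale;
# the penalty selects risk factors without any prior grouping of codes
clf = LogisticRegression(penalty="l1", solver="saga", C=0.1, max_iter=200)
clf.fit(X, y)
print("codes with non-zero coefficients:", np.sum(clf.coef_ != 0))
```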

Read more
Applications

Credit Crunch: The Role of Household Lending Capacity in the Dutch Housing Boom and Bust 1995-2018

What causes house prices to rise and fall? Economists identify household access to credit as a crucial factor. "Loan-to-Value" and "Debt-to-GDP" ratios are the standard measures for credit access. However, these measures fail to explain the depth of the Dutch housing bust after the 2009 Financial Crisis. This work is the first to model household lending capacity based on the formulas that Dutch banks use in the mortgage application process. We compare the ability of regression models to forecast housing prices when different measures of credit access are utilised. We show that our measure of household lending capacity is a forward-looking, highly predictive variable that outperforms "Loan-to-Value" and debt ratios in forecasting the Dutch crisis. Sharp declines in lending capacity foreshadow the market deceleration.
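
A minimal sketch of the forecasting comparison on synthetic series (the real lending-capacity variable is built from Dutch mortgage-underwriting formulas, which we do not reproduce here):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 96                                    # quarterly observations, 1995-2018
capacity = rng.normal(0.01, 0.02, T)      # growth in household lending capacity
ltv = rng.normal(0.0, 0.02, T)            # growth in the loan-to-value measure
# assume price growth follows lagged lending-capacity growth plus noise
prices = 0.8 * np.roll(capacity, 1) + rng.normal(0, 0.01, T)

def rolling_rmse(x, y, start=40):
    """One-step-ahead forecasts from a regression refit on the history."""
    errs = []
    for t in range(start, T):
        coef = np.polyfit(x[:t], y[:t], 1)
        errs.append(np.polyval(coef, x[t]) - y[t])
    return np.sqrt(np.mean(np.square(errs)))

print("RMSE, lagged lending capacity:", rolling_rmse(np.roll(capacity, 1), prices))
print("RMSE, lagged loan-to-value:   ", rolling_rmse(np.roll(ltv, 1), prices))
```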

Read more
Applications

Critical Risk Indicators (CRIs) for the electric power grid: A survey and discussion of interconnected effects

The electric power grid is a critical societal resource connecting multiple infrastructural domains such as agriculture, transportation, and manufacturing. The electrical grid as an infrastructure is shaped by human activity and public policy in terms of demand and supply requirements. Further, the grid is subject to changes and stresses due to solar weather, climate, hydrology, and ecology. The emerging interconnected and complex network dependencies make such interactions increasingly dynamic, potentially causing large swings, thus presenting new challenges for managing the coupled human-natural system. This paper provides a survey of models and methods that seek to explore the significant interconnected impact of the electric power grid and interdependent domains. We also provide relevant critical risk indicators (CRIs) across diverse domains that may influence electric power grid risks, including climate, ecology, hydrology, finance, space weather, and agriculture. We discuss the convergence of indicators from individual domains to explore possible systemic risk, i.e., holistic risk arising from cross-domain interconnections. Our study provides an important first step towards data-driven analysis and predictive modeling of risks in these coupled, interconnected systems. Further, we propose a compositional approach to risk assessment that incorporates diverse domain expertise and information, data science, and computer science to identify domain-specific CRIs and their union in systemic risk indicators.
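
One way to read the proposed compositional approach, sketched below with hypothetical domains and weights: normalize each domain-specific indicator against its own history and aggregate the scores into a single systemic risk index.

```python
import numpy as np

def systemic_risk_index(indicators, weights):
    """indicators: dict domain -> recent history; weights: dict domain -> weight."""
    scores = {}
    for domain, series in indicators.items():
        series = np.asarray(series, dtype=float)
        # percentile of the latest value within its own history (0 = calm, 1 = extreme)
        scores[domain] = (series[:-1] < series[-1]).mean()
    total = sum(weights.values())
    return sum(weights[d] * scores[d] for d in scores) / total, scores

index, per_domain = systemic_risk_index(
    {"space_weather": [1, 2, 1, 3, 9],
     "hydrology": [5, 4, 6, 5, 5],
     "agriculture": [2, 2, 3, 2, 4]},
    {"space_weather": 2.0, "hydrology": 1.0, "agriculture": 1.0})
print(round(index, 2), per_domain)
```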

Read more
Applications

Customer Price Sensitivities in Competitive Automobile Insurance Markets

Insurers are increasingly adopting more demand-based strategies to incorporate the indirect effect of premium changes on their policyholders' willingness to stay. However, since in practice both insurers' renewal premia and customers' responses to these premia typically depend on the customer's level of risk, it remains challenging in these strategies to determine how to properly control for this confounding. We therefore consider a causal inference approach in this paper to account for customer price sensitivities and to deduce optimal, multi-period profit maximizing premium renewal offers. More specifically, we extend the discrete treatment framework of Guelman and Guillén (2014) with Extreme Gradient Boosting, or XGBoost, and with multiple imputation to better account for the uncertainty in the counterfactual responses. We additionally introduce the continuous treatment framework with XGBoost to the insurance literature to allow identification of the exact optimal renewal offers and account for any competition in the market by including competitor offers. The application of the two treatment frameworks to a Dutch automobile insurance portfolio suggests that a policy's competitiveness in the market is crucial for a customer's price sensitivity and that XGBoost describes this better than traditional logistic regression. Moreover, an efficient frontier for both frameworks indicates that substantially more profit can be gained on the portfolio than is currently realized, even with less churn, in particular if we allow for continuous rate changes. A multi-period renewal optimization confirms these findings and demonstrates that competitiveness enables temporal feedback of previous rate changes on future demand.
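
A minimal sketch of the continuous-treatment idea on a synthetic portfolio (the features, costs, and profit function are assumptions): learn churn probability as a function of the proposed rate change and the policy's competitiveness with XGBoost, then scan rate changes for the profit-maximizing renewal offer.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(5)
n = 20000
rate_change = rng.uniform(-0.1, 0.2, n)   # historical renewal offers
competitiveness = rng.normal(0.0, 0.1, n) # premium relative to cheapest competitor
churn_p = 1 / (1 + np.exp(-(-2 + 8 * rate_change + 6 * competitiveness)))
churn = rng.binomial(1, churn_p)

model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(np.column_stack([rate_change, competitiveness]), churn)

def best_offer(comp, premium=600.0, cost=450.0, grid=np.linspace(-0.1, 0.2, 61)):
    """Scan the continuous treatment (rate change) for maximal expected profit."""
    X = np.column_stack([grid, np.full_like(grid, comp)])
    retain = 1 - model.predict_proba(X)[:, 1]
    profit = retain * (premium * (1 + grid) - cost)
    return grid[profit.argmax()]

print("optimal rate change for a competitive policy:", best_offer(-0.05))
```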

Read more
Applications

CytOpT: Optimal Transport with Domain Adaptation for Interpreting Flow Cytometry data

The automated analysis of flow cytometry measurements is an active research field. We introduce a new algorithm, referred to as CytOpT, using regularized optimal transport to directly estimate the different cell population proportions from a biological sample characterized by flow cytometry measurements. We rely on the regularized Wasserstein metric to compare cytometry measurements from different samples, thus accounting for possible mis-alignment of a given cell population across samples (due to technical variability in the measurement technology). In this work, we rely on a supervised learning technique based on the Wasserstein metric to estimate an optimal re-weighting of class proportions in a mixture model from a source distribution (with known segmentation into cell sub-populations) to fit a target distribution with unknown segmentation. Due to the high dimensionality of flow cytometry data, we use stochastic algorithms to approximate the regularized Wasserstein metric and solve the optimization problem involved in estimating the optimal weights representing the cell population proportions in the target distribution. Several flow cytometry data sets are used to illustrate the performance of CytOpT, which is also compared to that of existing algorithms for automatic gating based on supervised learning.
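
A minimal sketch of the core idea using the POT library (this is not the CytOpT package itself, and propagating labels through the transport plan is a simplification of the paper's re-weighting estimator):

```python
import numpy as np
import ot                                  # the POT optimal-transport library

rng = np.random.default_rng(6)
# two cell sub-populations in the source, shifted in the target (mis-alignment)
src = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(4, 1, (200, 2))])
y_src = np.repeat([0, 1], [300, 200])
tgt = np.vstack([rng.normal(0.5, 1, (100, 2)), rng.normal(4.5, 1, (400, 2))])

M = ot.dist(src, tgt)                      # squared Euclidean cost matrix
a = np.full(len(src), 1 / len(src))        # uniform sample weights
b = np.full(len(tgt), 1 / len(tgt))
G = ot.sinkhorn(a, b, M / M.max(), reg=0.05)   # entropy-regularized plan

# each target cell inherits the labels of the source mass it receives
mass_to_class = np.vstack([G[y_src == c].sum(axis=0) for c in (0, 1)])
est = (mass_to_class / mass_to_class.sum(axis=0)).mean(axis=1)
print("estimated target proportions:", est.round(2), "(truth: 0.2, 0.8)")
```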

Read more
Applications

Data-adaptive Dimension Reduction for US Mortality Forecasting

Forecasting accuracy of mortality data is important for the management of pension funds and the pricing of life insurance in actuarial science. Age-specific mortality forecasting in the US poses a challenging problem in high-dimensional time series analysis. Prior attempts utilize traditional dimension reduction techniques to avoid the curse of dimensionality, and mortality forecasting is then achieved by forecasting the extracted features. However, a method of reducing dimension pertinent to ideal forecasting has been elusive. To address this, we propose a novel approach that pursues features which not only represent the original data well but also capture as much time-serial dependence as possible. The proposed method is adaptive to the US mortality data and enjoys good statistical performance. In comparison, our method performs better than existing approaches, especially the Lee-Carter model, the standard benchmark in mortality analysis. Based on the forecasting results, we generate more accurate estimates of future life expectancies and life annuity prices, which can have a great financial impact on life insurers and social security programs compared with using the Lee-Carter model. Furthermore, various simulations illustrate scenarios in which our method has advantages, and help interpret its good performance on the mortality data.
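
For reference, a minimal sketch of the Lee-Carter benchmark on a toy log-mortality surface: factor the surface with an SVD and forecast the period index as a random walk with drift. (The paper's own method replaces this dimension-reduction step with a forecast-aware one.)

```python
import numpy as np

rng = np.random.default_rng(7)
ages, years = 100, 60
# toy log-mortality surface: age profile plus a declining period trend
a_x = np.linspace(-8, -1, ages)
k_t = np.cumsum(rng.normal(-0.3, 0.1, years))
log_m = (a_x[:, None] + np.linspace(0.2, 1.0, ages)[:, None] * k_t
         + rng.normal(0, 0.02, (ages, years)))

alpha = log_m.mean(axis=1)                 # average age profile a_x
U, s, Vt = np.linalg.svd(log_m - alpha[:, None], full_matrices=False)
b_x, k_hat = U[:, 0], s[0] * Vt[0]         # leading factor = Lee-Carter b_x, k_t

drift = np.diff(k_hat).mean()              # random walk with drift for k_t
h = 20
k_fcst = k_hat[-1] + drift * np.arange(1, h + 1)
log_m_fcst = alpha[:, None] + b_x[:, None] * k_fcst   # forecast surface
print("projected change in k_t over %d years: %.2f" % (h, drift * h))
```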

Read more
