Featured Research

Applications

Calibration methods for spatial data

In an environmental framework, extreme values of certain spatio-temporal processes, for example wind speeds, are the main cause of severe damage to property and infrastructure, such as electrical networks and transport and agricultural systems. Accurate data on such processes are therefore highly important in risk analysis, and in particular in producing probability maps showing the spatial distribution of damage risk. Typically, as is the case for wind speeds, observations are available at only a few stations and contain many missing values, so simulated data, which are available at high spatial and temporal resolutions, are often used to augment the information. However, simulated data often mismatch observed data, particularly in the tails, so calibrating them against observations may offer practitioners more reliable and richer data sources. Although the calibration methods described in this manuscript apply equally to other environmental variables, we present them specifically with reference to wind data and the damage extreme winds cause. Since most damage is caused by extreme winds, it is particularly important to calibrate the right tail of the simulated data against observations. The relationships between the extremes of simulated and observed data are by nature highly non-linear and non-Gaussian, so data fusion techniques available for spatial data may not be adequate for this purpose. After a brief description of standard calibration and data fusion methods for updating simulated data based on observed data, we propose and describe in detail a specific conditional quantile matching calibration method and show how our wind speed data can be calibrated with it.
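
To give a flavour of the idea, the R sketch below performs plain empirical quantile mapping: each simulated value is mapped to the observed value at the same empirical quantile, so the calibrated tail follows the observations. This is only a minimal stand-in for the conditional quantile matching method proposed in the paper, and all data in it are synthetic.

```r
# Minimal empirical quantile-mapping sketch (not the authors' full
# conditional method). All data below are illustrative.
set.seed(1)
obs <- rweibull(500, shape = 2.0, scale = 6)   # observed wind speeds
sim <- rweibull(5000, shape = 2.4, scale = 5)  # simulated, biased in the tail

calibrate_qm <- function(sim, obs, n_q = 99) {
  p  <- (1:n_q) / (n_q + 1)
  qs <- quantile(sim, p)                       # simulated quantiles
  qo <- quantile(obs, p)                       # observed quantiles
  # piecewise-linear transfer function; rule = 2 clamps beyond the ends
  approx(qs, qo, xout = sim, rule = 2)$y
}

sim_cal <- calibrate_qm(sim, obs)
quantile(sim, 0.99); quantile(sim_cal, 0.99); quantile(obs, 0.99)
```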

Applications

Can we trust the standardized mortality ratio? A formal analysis and evaluation based on axiomatic requirements

Background: The standardized mortality ratio (SMR) is often used to assess and compare hospital performance. While it has been recognized that hospitals may differ in their SMRs due to differences in patient composition, there is a lack of rigorous analysis of this and other, largely unrecognized, properties of the SMR. Methods: This paper proposes five axiomatic requirements for adequate standardized mortality measures: strict monotonicity, case-mix insensitivity, scale insensitivity, the equivalence principle, and the dominance principle. Given these requirements, the effects of variations in patient composition, hospital size, and actual and expected mortality rates on the SMR were examined using basic algebra and calculus. We distinguished between standardization using expected mortality rates derived from a different dataset (external standardization) and standardization based on a dataset that includes the considered hospitals (internal standardization). Results: Under external standardization, the SMR fulfills the requirements of strict monotonicity and scale insensitivity but violates case-mix insensitivity, the equivalence principle, and the dominance principle. All requirements violated under external standardization are also violated under internal standardization; in addition, the SMR under internal standardization is scale sensitive and violates strict monotonicity. Conclusions: The SMR fulfills only two of the five proposed axiomatic requirements under external standardization and none under internal standardization. In general, the SMRs of hospitals are affected differently by variations in case mix and in actual and expected mortality rates unless the hospitals are identical in these characteristics. These properties hamper valid assessment and comparison of hospital performance based on the SMR.
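
To make the case-mix issue concrete, here is a toy R calculation under external standardization: two hospitals with identical stratum-specific mortality rates but different patient mixes end up with different SMRs. The rates and counts are invented for illustration.

```r
# SMR = observed deaths / expected deaths. Both hospitals below have
# identical stratum-specific mortality, yet their SMRs differ because
# their patient mixes differ. All numbers are illustrative.
ref   <- c(low = 0.01, high = 0.10)   # external expected mortality rates
rates <- c(low = 0.02, high = 0.11)   # actual rates, same in both hospitals

smr <- function(n) sum(n * rates) / sum(n * ref)
smr(c(low = 900, high = 100))  # hospital A, mostly low-risk:  ~1.53
smr(c(low = 100, high = 900))  # hospital B, mostly high-risk: ~1.11
```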

Applications

Capacity Value of Solar Power and Other Variable Generation

This paper reviews methods that are used for adequacy risk assessment considering solar power and for assessment of the capacity value of solar power. The properties of solar power are described as seen from the perspective of the power-system operator, comparing differences in energy availability and capacity factors with those of wind power. Methodologies for risk calculations considering variable generation are surveyed, including the probability background, statistical-estimation approaches, and capacity-value metrics. Issues in incorporating variable generation in capacity markets are described, followed by a review of applied studies considering solar power. Finally, recommendations for further research are presented.
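
As one concrete example of a capacity-value metric covered by such surveys, the R sketch below approximates the effective load-carrying capability (ELCC) of a solar fleet: the extra constant load the system can carry with solar while keeping the loss-of-load risk at its no-solar level. The demand and solar profiles are toy assumptions, not data from the paper.

```r
# Toy ELCC calculation on synthetic hourly data. The solar profile here
# deliberately peaks with demand, which favours a high capacity value.
hours <- 8760
th    <- 2 * pi * (1:hours) / 24
load  <- 3000 + 800 * sin(th)                 # hourly demand, MW
conv  <- 3600                                 # firm conventional capacity, MW
solar <- pmax(0, 500 * sin(th))               # toy solar output, MW

# LOLE proxy: share of hours with generation below load (+ added load d)
lole <- function(d, gen) mean(gen < load + d)

target <- lole(0, conv)                       # risk without solar
elcc <- uniroot(function(d) lole(d, conv + solar) - target,
                c(0, 500))$root
elcc   # roughly 375 MW of "perfect" capacity for the 500 MW solar fleet
```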

Applications

Categorical Exploratory Data Analysis: Multiclass Classification and Response Manifold Analytics perspectives on baseball pitching dynamics

From two coupled perspectives, Multiclass Classification (MCC) and Response Manifold Analytics (RMA), we develop Categorical Exploratory Data Analysis (CEDA) on the PITCHf/x database to extract the information content of Major League Baseball (MLB) pitching dynamics. The MCC and RMA information contents are represented, respectively, by a collection of multi-scale pattern categories derived from mixing geometries and a collection of global-to-local geometric localities derived from response-covariate manifolds. Together these collections shed light on pitching dynamics and map out the uncertainty of popular machine learning approaches. In the MCC setting, a label embedding tree based on an indirect distance measure reveals asymmetries in the mixing geometries of the labels' point clouds. A selected chain of complementary covariate feature groups collectively brings out multi-order mixing-geometric pattern categories, and these categories reveal the true nature of MCC predictive inference. In the RMA setting, multiple response features are coupled with major covariate features to exhibit manifolds, bearing physical principles, with a lattice of natural localities. With the heterogeneous effects of minor features identified locally, these localities jointly weave their focal characteristics into an understanding of the system and provide a platform for RMA predictive inference. Our CEDA works for universal data types, accommodates non-linear associations, and facilitates efficient feature selection and inference.
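
The R fragment below gives a loose illustration of one ingredient, a label embedding tree grown from pairwise distances between label point-clouds. The symmetrized mean cross-cloud distance used here is a simple stand-in for the authors' indirect distance measure, and the built-in iris data stand in for PITCHf/x.

```r
# Toy label embedding tree: cluster the class labels themselves using
# distances between their point-clouds (stand-in distance, toy data).
X <- as.matrix(iris[, 1:4]); y <- iris$Species
labs <- levels(y)
D <- matrix(0, length(labs), length(labs), dimnames = list(labs, labs))
for (i in seq_along(labs)) for (j in seq_along(labs)) {
  A <- X[y == labs[i], , drop = FALSE]
  B <- X[y == labs[j], , drop = FALSE]
  # squared cross-distances via the |a|^2 + |b|^2 - 2 a.b expansion
  S <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
  D[i, j] <- mean(sqrt(pmax(S, 0)))           # mean cross-cloud distance
}
plot(hclust(as.dist((D + t(D)) / 2)), main = "Label embedding tree (toy)")
```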

Applications

Causal Mediation Analysis for Sparse and Irregular Longitudinal Data

Causal mediation analysis seeks to investigate how the treatment effect of an exposure on outcomes is mediated through intermediate variables. Although many applications involve longitudinal data, the existing methods are not directly applicable to settings where the mediator and outcome are measured on sparse and irregular time grids. We extend the existing causal mediation framework from a functional data analysis perspective, viewing the sparse and irregular longitudinal data as realizations of underlying smooth stochastic processes. We define causal estimands of direct and indirect effects accordingly and provide corresponding identification assumptions. For estimation and inference, we employ a functional principal component analysis approach for dimension reduction and use the first few functional principal components instead of the whole trajectories in the structural equation models. We adopt the Bayesian paradigm to accurately quantify the uncertainties. The operating characteristics of the proposed methods are examined via simulations. We apply the proposed methods to a longitudinal data set from a wild baboon population in Kenya to investigate the causal relationships between early adversity, strength of social bonds between animals, and adult glucocorticoid hormone concentrations. We find that early adversity has a significant direct effect (a 9-14% increase) on females' glucocorticoid concentrations across adulthood, but find little evidence that these effects were mediated by weak social bonds.
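
A minimal R sketch of the FPC-score idea, assuming the fdapace package for sparse functional PCA: each subject's irregular mediator trajectory is reduced to a leading score, which then enters simple mediator and outcome models. The paper's Bayesian machinery is considerably more involved, and everything below is simulated.

```r
library(fdapace)                 # assumed: CRAN package for sparse FPCA (PACE)
set.seed(1)
n   <- 100
trt <- rbinom(n, 1, 0.5)                                       # exposure
Lt  <- lapply(1:n, function(i) sort(runif(sample(3:6, 1))))    # sparse grids
Ly  <- lapply(1:n, function(i)                                 # mediator obs
  sin(2 * pi * Lt[[i]]) + 0.5 * trt[i] + rnorm(length(Lt[[i]]), sd = 0.2))

fp <- FPCA(Ly, Lt)                          # FPC scores summarize trajectories
sc <- fp$xiEst[, 1, drop = FALSE]           # first score per subject
y  <- as.numeric(1 + 0.8 * trt + 0.5 * sc + rnorm(n, sd = 0.3))

summary(lm(sc ~ trt))                       # mediator model: exposure -> score
summary(lm(y ~ trt + sc))                   # outcome model given the score
```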

Applications

Causal Meta-Mediation Analysis: Inferring Dose-Response Function From Summary Statistics of Many Randomized Experiments

It is common in the internet industry to use offline-developed algorithms to power online products that contribute to the success of a business. Offline-developed algorithms are guided by offline evaluation metrics, which often differ from online business key performance indicators (KPIs). To maximize business KPIs, it is important to pick a north star among all available offline evaluation metrics. Noting that online products can be measured by online evaluation metrics, the online counterparts of offline evaluation metrics, we decompose the problem into two parts. Because the offline A/B testing literature works out the first part, counterfactual estimators of offline evaluation metrics that move the same way as their online counterparts, we focus on the second: the causal effects of online evaluation metrics on business KPIs. The north star among offline evaluation metrics should be the one whose online counterpart causes the largest lift in the business KPI. We model the online evaluation metric as a mediator and formalize its causal relationship with the business KPI as a dose-response function (DRF). Our novel approach, causal meta-mediation analysis, leverages summary statistics of many existing randomized experiments to identify, estimate, and test the mediator DRF. It is easy to implement and scale up, and it has many advantages over existing mediation-analysis and meta-analysis methods. We demonstrate its effectiveness by simulation and by implementation on real data.
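
For intuition only, the experiment-level regression below (in R, on simulated summary statistics) shows why many randomized experiments can identify a dose-response slope: randomization shifts the mediator by different amounts across experiments, and the corresponding KPI lifts trace out the DRF. The paper's estimator is more general than this linear toy.

```r
# Simulated per-experiment summary statistics; true linear DRF slope = 2.
set.seed(1)
n_exp <- 200
dm <- rnorm(n_exp, 0, 0.05)              # per-experiment online-metric lift
dy <- 2.0 * dm + rnorm(n_exp, 0, 0.02)   # per-experiment business-KPI lift
meta <- lm(dy ~ dm)                      # meta-regression across experiments
summary(meta)$coef["dm", ]               # recovers a slope near 2
```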

Applications

Causal analysis of Covid-19 spread in Germany

In this work, we study the causal relations among German regions in terms of the spread of Covid-19 since the beginning of the pandemic, taking into account the restriction policies applied by the different federal states. We propose and prove a new theorem for a causal feature selection method for time series data that is robust to latent confounders, and we subsequently apply it to Covid-19 case numbers. We present findings about the spread of the virus in Germany and the causal impact of restriction measures, discussing the role of various policies in containing the spread. Since our results are based on rather limited target time series (only the numbers of reported cases), care should be exercised in interpreting them. It is encouraging, however, that even such limited data seem to contain causal signals. This suggests that, as more data become available, our causal approach may contribute towards a meaningful causal analysis of political interventions on the development of Covid-19, and thus also towards the development of rational and data-driven methodologies for choosing interventions.
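
For orientation only, the R sketch below runs a plain Granger-style lagged-regression test between two simulated regional series. It is not the confounder-robust method proposed in the paper, merely the kind of baseline such methods improve upon.

```r
# Does region x's lagged series help predict region y beyond y's own lag?
set.seed(1)
n <- 300
x <- as.numeric(arima.sim(list(ar = 0.5), n))   # "cause" region (simulated)
y <- numeric(n)
for (t in 2:n) y[t] <- 0.3 * y[t - 1] + 0.4 * x[t - 1] + rnorm(1)

d <- data.frame(y = y[2:n], ylag = y[1:(n - 1)], xlag = x[1:(n - 1)])
full  <- lm(y ~ ylag + xlag, d)                 # with the other region's lag
restr <- lm(y ~ ylag, d)                        # own lag only
anova(restr, full)                              # F-test of the lagged effect
```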

Applications

Causal inference methods for small non-randomized studies: Methods and an application in COVID-19

The usual development cycles are too slow for the development of vaccines, diagnostics, and treatments in pandemics such as the ongoing SARS-CoV-2 pandemic. Given the pressure in such a situation, there is a risk that the findings of early clinical trials are overinterpreted despite their limitations in size and design. Motivated by a non-randomized open-label study investigating the efficacy of hydroxychloroquine in patients with COVID-19, we describe in a unified fashion various alternative approaches to the analysis of non-randomized studies. A widely used tool for reducing the impact of treatment-selection bias is the class of so-called propensity score (PS) methods: conditioning on the propensity score allows one to mimic the design of a randomized controlled trial, conditional on observed covariates. Extensions include the g-computation approach, which is applied less frequently, in particular in clinical studies, and doubly robust estimators, which provide additional advantages. In a simulation study, we investigate the properties of propensity-score-based methods, including three variants of doubly robust estimators, in the small-sample settings typical of early trials. R code for the simulations is provided.
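
As a pointer to what such estimators look like, here is a minimal augmented inverse-probability-weighting (AIPW) doubly robust estimate in base R on simulated small-sample data. The paper compares three doubly robust variants; this sketch is not its exact specification.

```r
# Simulated confounded data; the true average treatment effect is 1.
set.seed(1)
n <- 120                                   # small sample, as in early trials
x <- rnorm(n)                              # observed covariate
a <- rbinom(n, 1, plogis(0.8 * x))         # treatment depends on covariate
y <- a + x + rnorm(n)

ps <- fitted(glm(a ~ x, family = binomial))                  # PS model
m1 <- predict(lm(y ~ x, subset = a == 1), newdata = data.frame(x))
m0 <- predict(lm(y ~ x, subset = a == 0), newdata = data.frame(x))

# AIPW combines the outcome models with inverse-probability weighting
aipw <- mean(a * (y - m1) / ps + m1) -
        mean((1 - a) * (y - m0) / (1 - ps) + m0)
aipw   # doubly robust ATE estimate, close to 1
```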

Applications

Claiming trend in toxicological and pharmacological dose-response studies: an overview of statistical methods and related R software

There are very different statistical methods for demonstrating a trend in pharmacological experiments. Here, the focus is on sparse models with only one parameter to be estimated and interpreted: the slope in the regression model and the difference from control in the contrast model. Both provide p-values and confidence intervals for an appropriate effect size. A combined test consisting of the Tukey regression approach and the multiple contrast test according to Williams is recommended; it can be generalized to the generalized linear (mixed-effects) model, so that the numerous variable types occurring in pharmacology/toxicology can be evaluated adequately. Software is available through CRAN packages. The most significant limitation of this approach concerns designs with very small sample sizes, which are common in pharmacology/toxicology.
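
For the Williams-type multiple contrast test, the CRAN package multcomp can be used roughly as follows; the dose-response data here are simulated, and the combined procedure recommended in the paper additionally involves the Tukey regression approach, which is not shown.

```r
# Williams-type multiple contrast test via multcomp on simulated data.
library(multcomp)
set.seed(1)
dose <- factor(rep(c(0, 10, 50, 250), each = 6))            # dose groups
resp <- rnorm(24, mean = rep(c(0, 0.1, 0.8, 1.5), each = 6))  # response

fit <- aov(resp ~ dose)
# Williams-type contrasts compare dose groups against the control
summary(glht(fit, linfct = mcp(dose = "Williams")))
confint(glht(fit, linfct = mcp(dose = "Williams")))         # effect sizes
```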

Applications

Classification of chemical compounds based on the correlation between in vitro gene expression profiles

Toxicity evaluation of chemical compounds has traditionally relied on animal experiments; however, the demand for non-animal-based methods for predicting compound toxicology is increasing worldwide. Our aim was to provide a classification method for compounds based on in vitro gene expression profiles. The in vitro gene expression data analyzed in the present study were obtained from our previous study and concern nine compounds typically employed in chemical management. We used agglomerative hierarchical clustering to classify the compounds; however, a statistical difficulty had to be overcome first: selecting the RNAs to use for clustering from more than 30,000 candidates. To do so, we formulated a combinatorial optimization problem over both gene expression levels and the correlations between gene expression profiles, and used the simulated annealing algorithm to obtain a good solution. As a result, the nine compounds were divided into two groups using 1,000 extracted RNAs. Our proposed methodology enables read-across, one of the frameworks for toxicity prediction, based on in vitro gene expression profiles.
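
A loose R sketch of the pipeline, with a toy objective standing in for the paper's optimization problem: a gene subset is chosen by simulated annealing (the "SANN" method of optim), and the compounds are then clustered hierarchically on the correlations of the selected profiles. All data and sizes are invented.

```r
# Simulated-annealing gene selection followed by hierarchical clustering.
set.seed(1)
G <- 500; n <- 9                                 # genes x compounds (toy)
expr <- matrix(rnorm(G * n), G, n,
               dimnames = list(NULL, paste0("cmpd", 1:n)))

score <- function(sel) {                         # lower is better for optim
  idx <- which(sel > 0.5)
  if (length(idx) < 10) return(1e6)              # keep subsets usable
  sub <- expr[idx, , drop = FALSE]
  # toy objective: high expression magnitude and strong profile correlation
  -(mean(abs(sub)) + mean(abs(cor(sub))))
}
flip <- function(sel) {                          # SANN move: flip one gene
  i <- sample(G, 1); sel[i] <- 1 - sel[i]; sel
}

fit <- optim(rbinom(G, 1, 0.05), score, gr = flip, method = "SANN",
             control = list(maxit = 2000))
sel <- which(fit$par > 0.5)
plot(hclust(as.dist(1 - cor(expr[sel, ]))), main = "Compound clustering (toy)")
```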

