Featured Research

Applications

A joint Bayesian space-time model to integrate spatially misaligned air pollution data in R-INLA

In air pollution studies, dispersion models provide estimates of concentration at grid level covering the entire spatial domain, and are then calibrated against measurements from monitoring stations. However, these different data sources are misaligned in space and time. If misalignment is not accounted for, it can bias the predictions. We aim to demonstrate how the combination of multiple data sources, such as dispersion model outputs, ground observations and covariates, leads to more accurate predictions of air pollution at grid level. We consider nitrogen dioxide (NO2) concentration in Greater London and surroundings for the years 2007-2011, and combine two different dispersion models. Different sets of spatial and temporal effects are included in order to obtain the best predictive capability. Our proposed model sits between calibration and Bayesian melding techniques for data fusion. Unlike other examples, we jointly model the response (concentration level at monitoring stations) and the dispersion model outputs on different scales, accounting for the different sources of uncertainty. Our spatio-temporal model allows us to reconstruct the latent fields of each model component, and to predict daily pollution concentrations. We compare the predictive capability of our proposed model with other established methods to account for misalignment (e.g. bilinear interpolation), showing that in our case study the joint model is a better alternative.
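The paper's full specification is not reproduced here, but a joint model of this kind can be sketched roughly as follows (a minimal illustration in the spirit of Bayesian melding, with hypothetical notation, not the authors' exact formulation): point-level monitoring data and areal dispersion-model output are linked through a shared latent spatio-temporal field.

```latex
\begin{aligned}
y(s_i, t) &= \beta_0 + \mathbf{x}(s_i, t)^{\top}\boldsymbol{\beta} + z(s_i, t) + \varepsilon_i(t),
  & \varepsilon_i(t) &\sim N(0, \sigma^2_{y}),\\
\hat{c}(B_j, t) &= \alpha_0 + \alpha_1 \, \frac{1}{|B_j|} \int_{B_j} z(s, t)\, ds + \eta_j(t),
  & \eta_j(t) &\sim N(0, \sigma^2_{c}),
\end{aligned}
```

Here y(s_i, t) is the measured concentration at station s_i on day t, \hat{c}(B_j, t) is the dispersion-model output for grid cell B_j, and z(s, t) is a shared latent spatio-temporal field; the second equation links the areal output to the average of the latent field over the cell, so the two data sources are modelled jointly on their native spatial supports.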

Read more
Applications

A mathematical take on the competitive balance of a football league

Competitive balance in a football league is extremely important from the perspective of economic growth of the industry. Many researchers have previously proposed different measures of competitive balance, which are primarily adapted from standard economic theory. However, these measures fail to capture the finer nuances of the game. In this work, we discuss a new framework which is more suitable for a football league. First, we present a mathematical proof of an ideal situation in which a football league becomes perfectly balanced. Next, a goal-based index for competitive balance is developed. We present relevant theoretical results and show how the proposed index can be used to formally test for the presence of imbalance. The methods are applied to data from the top five European leagues, showing that the new approach can better explain the changes in the seasonal competitive balance of the leagues. Further, using appropriate panel data models, we show that the proposed index is more suitable for analyzing the variability in total revenues of the football leagues.
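For context, the economic-theory measures that the paper argues are too coarse are straightforward to compute; the hypothetical sketch below illustrates two of the most common ones (the Herfindahl-Hirschman index of season point shares and the Noll-Scully ratio of actual to idealised standard deviation of win percentages), not the goal-based index proposed in the paper.

```python
import numpy as np

def hhi(points):
    """Herfindahl-Hirschman index of end-of-season point shares."""
    shares = np.asarray(points, dtype=float) / np.sum(points)
    return float(np.sum(shares ** 2))

def noll_scully(win_pct, games_per_team):
    """Ratio of the actual standard deviation of win percentages to the
    idealised value 0.5 / sqrt(games) of a perfectly balanced league."""
    actual = np.std(win_pct)
    ideal = 0.5 / np.sqrt(games_per_team)
    return float(actual / ideal)

# Hypothetical six-team league: season points and win percentages.
points = [86, 74, 71, 60, 45, 30]
win_pct = [0.75, 0.64, 0.61, 0.52, 0.39, 0.26]
print(hhi(points), noll_scully(win_pct, games_per_team=38))
```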

Read more
Applications

A new generalized newsvendor model with random demand

The newsvendor problem is an extensively researched topic in inventory management. In this class of inventory problems, shortage and excess costs are considered to be proportional to the quantity lost. However, for critical goods or commodities, an excess or shortage may lead to losses far greater than the total cost alone. Such problems have not been discussed much in the literature. Moreover, the majority of the existing literature assumes the demand distribution to be completely known. In this paper, we propose a generalization of the newsvendor problem for critical goods or commodities with higher shortage and excess losses of the same degree. We also assume that the parameters of the demand distribution are unknown. We then discuss different estimators of the optimal order quantity based on a random sample of demand. In particular, we provide estimators based on (i) the full sample and (ii) broken sample data (i.e., a single order statistic). We also report a comparison of the estimators using simulated bias and mean squared error (MSE).
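In the classical version of the problem that the paper generalises, the optimal order quantity is the critical-fractile quantile q* = F^{-1}(c_u / (c_u + c_o)) of the demand distribution; when the distribution is unknown, a simple full-sample estimator replaces F with the empirical distribution of observed demands. The sketch below illustrates that baseline with hypothetical costs and data, not the paper's generalised loss or its broken-sample estimators.

```python
import numpy as np

def critical_fractile(c_under, c_over):
    """Optimal service level for the classical newsvendor loss."""
    return c_under / (c_under + c_over)

def order_quantity_full_sample(demand_sample, c_under, c_over):
    """Plug-in estimate of q* = F^{-1}(c_u / (c_u + c_o)) using the
    empirical quantile of an observed demand sample."""
    q = critical_fractile(c_under, c_over)
    return float(np.quantile(demand_sample, q))

# Hypothetical daily demand observations and unit shortage/excess costs.
rng = np.random.default_rng(0)
demand = rng.poisson(lam=40, size=200)
print(order_quantity_full_sample(demand, c_under=5.0, c_over=2.0))
```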

Read more
Applications

A note on post-treatment selection in studying racial discrimination in policing

We discuss some causal estimands used to study racial discrimination in policing. A central challenge is that not all police-civilian encounters are recorded in administrative datasets available to researchers. One possible solution is to consider the average causal effect of race conditional on the civilian already being detained by the police. We find that such an estimand can be quite different from the more familiar ones in causal inference and needs to be interpreted with caution. We propose using an estimand that is new in this context, the causal risk ratio, which has a more transparent interpretation and requires weaker identification assumptions. We demonstrate this through a reanalysis of the NYPD Stop-and-Frisk dataset. Our reanalysis shows that the naive estimator that ignores the post-treatment selection in administrative records may severely underestimate the disparity in police violence between minorities and whites in these and similar data.
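As a concrete illustration of the estimand, the sketch below computes a naive risk ratio of force being used, conditional on a recorded stop, for two groups. The counts are hypothetical placeholders, not figures from the NYPD data, and the abstract's point is precisely that such naive conditional comparisons can understate the true disparity when the stops themselves are selected on race.

```python
def risk(force_cases, stops):
    """Probability of force being used among recorded stops."""
    return force_cases / stops

# Hypothetical counts (placeholders, not NYPD figures).
minority = {"stops": 10_000, "force": 320}
white = {"stops": 10_000, "force": 210}

rr = risk(minority["force"], minority["stops"]) / risk(white["force"], white["stops"])
rd = risk(minority["force"], minority["stops"]) - risk(white["force"], white["stops"])
print(f"naive risk ratio: {rr:.2f}, naive risk difference: {rd:.4f}")
```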

Read more
Applications

A note on the g and h control charts

In this note, we revisit the g and h control charts that are commonly used for monitoring the number of conforming cases between two consecutive appearances of nonconformities. The process parameter of these charts is usually unknown and is estimated using either the maximum likelihood estimator or the minimum variance unbiased estimator. However, the minimum variance unbiased estimator used in these charts has been applied inappropriately in the quality engineering literature. This observation motivates us to provide the correct minimum variance unbiased estimator and to investigate the theoretical and empirical biases of the estimators under consideration. Given that these charts are built on the underlying assumption that samples from the process are balanced, which is often not satisfied in practice, we also propose a method for constructing these charts with unbalanced samples.
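For context, the g chart's process parameter is typically the event probability p of a geometric distribution for the count of conforming cases between nonconformities. The sketch below shows the standard maximum likelihood estimate and the usual three-sigma-style limits built from the estimated mean, as a hypothetical baseline only; it is not the corrected minimum variance unbiased estimator discussed in the paper.

```python
import numpy as np

def g_chart_limits(counts):
    """Individual g chart limits from the number of conforming cases
    observed between consecutive nonconformities (geometric model,
    counting conforming cases before a nonconformity)."""
    x = np.asarray(counts, dtype=float)
    n = x.size
    p_mle = n / (x.sum() + n)          # standard MLE of the event probability
    mean = (1.0 - p_mle) / p_mle       # geometric mean (1 - p) / p
    sd = np.sqrt(mean * (mean + 1.0))  # geometric s.d., since var = mean * (mean + 1)
    lcl = max(0.0, mean - 3.0 * sd)
    ucl = mean + 3.0 * sd
    return p_mle, lcl, mean, ucl

# Hypothetical counts of conforming cases between nonconformities.
counts = [12, 30, 7, 55, 21, 9, 40, 18]
print(g_chart_limits(counts))
```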

Read more
Applications

A nowcasting approach to generate timely estimates of Mexican economic activity: An application to the period of COVID-19

In this paper, we present a new approach based on dynamic factor models (DFMs) to nowcast the annual percentage variation of the Mexican Global Economic Activity Indicator (IGAE in Spanish). The procedure consists of the following steps: i) build a timely and correlated database using economic and financial time series and real-time variables such as social mobility and relevant topics extracted from Google Trends; ii) estimate the common factors using the two-step methodology of Doz et al. (2011); iii) use the common factors in univariate time-series models for test data; and iv) based on the best results from the previous step, combine the nowcasts that are statistically indistinguishable from the best one (according to the Diebold-Mariano test) to generate the final nowcast. We obtain timely and accurate nowcasts for the IGAE, including those for the current phase of drastic drops in the economy related to the COVID-19 sanitary measures. Additionally, the approach allows us to identify the key variables in the DFM by estimating confidence intervals for both the factor loadings and the factor estimates. This approach can be used in official statistics to obtain preliminary estimates of the IGAE up to 50 days before the official results.
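A highly simplified version of the factor-extraction and nowcasting idea is sketched below: principal-component factors are extracted from a standardised panel of indicators and then regressed against the target. This is a hypothetical stand-in for the first step only, not the full two-step Doz et al. (2011) estimator or the forecast-combination scheme used in the paper.

```python
import numpy as np

def pca_factors(panel, n_factors=2):
    """Principal-component factors from a standardised panel of
    indicators (T x N), a simplified stand-in for the first step
    of a dynamic factor model."""
    z = (panel - panel.mean(axis=0)) / panel.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_factors].T          # N x r loadings
    return z @ loadings                  # T x r common factors

def nowcast_target(factors, target):
    """Nowcast the first unpublished value of the target (e.g. the
    IGAE annual variation) by OLS of the target on the factors."""
    t = len(target)                      # target observed only up to period t < T
    X = np.column_stack([np.ones(t), factors[:t]])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    x_now = np.concatenate(([1.0], factors[t]))
    return float(x_now @ beta)

# Hypothetical panel of 20 indicators over 60 periods; target known for 59.
rng = np.random.default_rng(1)
panel = rng.normal(size=(60, 20))
target = rng.normal(size=59)
print(nowcast_target(pca_factors(panel), target))
```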

Read more
Applications

A pragmatic adaptive enrichment design for selecting the right target population for cancer immunotherapies

One of the challenges in the design of confirmatory trials is dealing with uncertainty regarding the optimal target population for a novel drug. Adaptive enrichment designs (AEDs), which allow for a data-driven selection of one or more pre-specified biomarker subpopulations at an interim analysis, have been proposed in this setting, but practical case studies of AEDs are still relatively rare. We present the design of an AED with a binary endpoint in the highly dynamic setting of cancer immunotherapy. The trial was initiated as a conventional trial in early triple-negative breast cancer but amended to an AED based on emerging data external to the trial suggesting that PD-L1 status could be a predictive biomarker. Operating characteristics are discussed, including the concept of a minimal detectable difference, that is, the smallest observed treatment effect that would lead to a statistically significant result in at least one of the target populations at the interim or final analysis.
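The minimal detectable difference for a binary endpoint can be illustrated with a small search over possible observed results: for fixed arm sizes and a fixed control result, find the smallest observed difference in response rates that reaches significance. The sketch below uses a pooled two-proportion z-test purely as a hypothetical placeholder; the trial's actual test, arm sizes, and alpha allocation across interim and final analyses are not taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def z_test_pvalue(x_trt, n_trt, x_ctl, n_ctl):
    """One-sided pooled two-proportion z-test (normal approximation)."""
    p1, p0 = x_trt / n_trt, x_ctl / n_ctl
    p_pool = (x_trt + x_ctl) / (n_trt + n_ctl)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_trt + 1 / n_ctl))
    return 1.0 if se == 0 else float(norm.sf((p1 - p0) / se))

def minimal_detectable_difference(n_trt, n_ctl, x_ctl, alpha=0.025):
    """Smallest observed difference in response rates that is
    significant at level alpha, for a fixed control-arm result."""
    for x_trt in range(x_ctl, n_trt + 1):
        if z_test_pvalue(x_trt, n_trt, x_ctl, n_ctl) < alpha:
            return x_trt / n_trt - x_ctl / n_ctl
    return None

# Hypothetical arm sizes and an assumed control response count.
print(minimal_detectable_difference(n_trt=100, n_ctl=100, x_ctl=40))
```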

Read more
Applications

A probabilistic risk-based decision framework for structural health monitoring

The ability to make informed decisions regarding the operation and maintenance of structures provides a major incentive for the implementation of structural health monitoring (SHM) systems. Probabilistic risk assessment (PRA) is an established methodology that allows engineers to make risk-informed decisions regarding the design and operation of safety-critical and high-value assets in industries such as nuclear and aerospace. The current paper formulates a risk-based decision framework for structural health monitoring that combines elements of PRA with the existing SHM paradigm. As an apt tool for reasoning and decision-making under uncertainty, probabilistic graphical models serve as the foundation of the framework. The framework involves modelling failure modes of structures as Bayesian network representations of fault trees and then assigning costs or utilities to the failure events. The fault trees allow information to pass from probabilistic classifiers to influence diagram representations of decision processes, whilst also providing nodes within the graphical model that may be queried to obtain marginal probability distributions over local damage states within a structure. Optimal courses of action for structures are selected by determining the strategies that maximise expected utility. The risk-based framework is demonstrated on a realistic truss-like structure and supported by experimental data. Finally, the risk-based approach is discussed and further challenges pertaining to decision-making in the context of SHM are identified.
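The expected-utility decision step can be illustrated with a deliberately tiny example: a two-component fault tree (the system fails if either component has failed, an OR gate) and a choice among maintenance actions. All probabilities, costs, and action names below are hypothetical placeholders, not values or structures from the paper.

```python
# Marginal beliefs over component damage states, e.g. from probabilistic classifiers.
P_FAIL = {"component_a": 0.05, "component_b": 0.12}
COST_FAILURE = -100_000  # utility of a system failure event
COST_REPAIR = {"do_nothing": 0, "repair_b": -5_000, "repair_both": -12_000}
REPAIRED = {"do_nothing": set(),
            "repair_b": {"component_b"},
            "repair_both": {"component_a", "component_b"}}

def p_system_failure(repaired):
    """OR-gate fault tree: the system survives only if every unrepaired
    component survives (independence assumed for simplicity)."""
    p_survive = 1.0
    for comp, p in P_FAIL.items():
        if comp not in repaired:
            p_survive *= (1.0 - p)
    return 1.0 - p_survive

def expected_utility(action):
    return COST_REPAIR[action] + p_system_failure(REPAIRED[action]) * COST_FAILURE

best = max(COST_REPAIR, key=expected_utility)
print({a: round(expected_utility(a)) for a in COST_REPAIR}, "->", best)
```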

Read more
Applications

A psychometric modeling approach to fuzzy rating data

Modeling fuzziness and imprecision in human rating data is a crucial problem in many research areas, including applied statistics and the behavioral, social, and health sciences. Because of the interplay between cognitive, affective, and contextual factors, the process of answering survey questions is a complex task which standard (crisp) rating responses can only partially capture. Fuzzy rating scales have progressively been adopted to overcome some of the limitations of standard rating scales, including their inability to disentangle decision uncertainty from individual responses. The aim of this article is to provide a novel fuzzy scaling procedure which uses Item Response Theory trees (IRTrees) as a psychometric model for the stage-wise latent response process. In so doing, the fuzziness of rating data is modeled using the rater's overall pattern of responses rather than a single-item-based approach. This offers a consistent system for interpreting fuzziness in terms of individual-based decision uncertainty. A simulation study and two empirical applications are used to assess the characteristics of the proposed model, and they provide converging evidence of its effectiveness in modeling fuzziness and imprecision in rating data.
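The IRTree idea underlying the proposal can be illustrated with the commonly used decomposition of a 5-point rating into binary pseudo-items (midpoint or not, direction, extremity), each treated as a separate stage-wise decision. The mapping below is this standard ordinary-IRTree coding, not the authors' fuzzy extension.

```python
# Standard IRTree pseudo-item coding for a 5-point rating scale
# (1 = strongly disagree ... 5 = strongly agree). Node 1: midpoint or not;
# node 2: direction (agree vs disagree); node 3: extremity.
# None marks a node that is not reached for that response.
IRTREE_MAP = {
    1: (0, 0, 1),        # not midpoint, disagree, extreme
    2: (0, 0, 0),        # not midpoint, disagree, moderate
    3: (1, None, None),  # midpoint
    4: (0, 1, 0),        # not midpoint, agree, moderate
    5: (0, 1, 1),        # not midpoint, agree, extreme
}

def to_pseudo_items(responses):
    """Recode raw ratings into the binary pseudo-items that an IRTree
    model treats as separate stage-wise decisions."""
    return [IRTREE_MAP[r] for r in responses]

print(to_pseudo_items([5, 3, 2, 4]))
```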

Read more
Applications

A rapidly updating stratified mix-adjusted median property price index model

Homeowners, first-time buyers, banks, governments and construction companies are highly interested in following the state of the property market. Currently, property price indexes are published several months out of date and hence do not offer the up-to-date information which housing market stakeholders need in order to make informed decisions. In this article, we present an updated version of a central price tendency-based property price index which uses geospatial property data and stratification to compare similar houses. Expanding the algorithm to include additional parameters, made possible by a new data structure implementation and a richer dataset, allows for the construction of a far smoother and more robust index than the original algorithm produced.
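The core of a stratified mix-adjusted median index is simple to state: within each stratum (e.g. a location by property-type cell), take the median sale price for the period relative to a base-period median, then average across strata with fixed base-period weights. The sketch below is a minimal hypothetical illustration of that calculation, not the paper's full algorithm or data structure.

```python
import numpy as np

def mix_adjusted_median_index(sales, base_weights, base_medians):
    """Stratified mix-adjusted median index for one period: the weighted
    average, over strata, of each stratum's median sale price relative
    to its base-period median, scaled so the base period equals 100."""
    index = 0.0
    for stratum, prices in sales.items():
        relative = np.median(prices) / base_medians[stratum]
        index += base_weights[stratum] * relative
    return 100.0 * index

# Hypothetical strata, fixed base-period weights (summing to one),
# base-period medians, and one period's observed sale prices.
base_weights = {"city_apartment": 0.4, "suburb_house": 0.6}
base_medians = {"city_apartment": 250_000, "suburb_house": 320_000}
sales = {"city_apartment": [255_000, 262_000, 248_000],
         "suburb_house": [330_000, 325_000, 340_000]}
print(mix_adjusted_median_index(sales, base_weights, base_medians))
```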

Read more
