Featured Research

Methodology

Modeling massive highly-multivariate nonstationary spatial data with the basis graphical lasso

We propose a new modeling framework for highly-multivariate spatial processes that synthesizes ideas from recent multiscale and spectral approaches with graphical models. The basis graphical lasso writes a univariate Gaussian process as a linear combination of basis functions weighted with entries of a Gaussian graphical vector whose graph is estimated by optimizing an ℓ1-penalized likelihood. This paper extends the setting to a multivariate Gaussian process where the basis functions are weighted with Gaussian graphical vectors. We motivate a model where the basis functions represent different levels of resolution and the graphical vectors for each level are assumed to be independent. Using an orthogonal basis grants linear complexity and memory usage in the number of spatial locations, the number of basis functions, and the number of realizations. An additional fusion penalty encourages a parsimonious conditional independence structure in the multilevel graphical model. We illustrate our method on a large climate ensemble from the National Center for Atmospheric Research's Community Atmosphere Model that involves 40 spatial processes.
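
As a rough illustration of the ℓ1-penalized graph estimation at the heart of the basis graphical lasso, here is a minimal sketch (not the paper's implementation; the coefficient data and penalty value are made up) using scikit-learn's GraphicalLasso on simulated basis coefficients:

```python
# Minimal sketch: l1-penalized precision (graph) estimation for basis
# coefficients; illustrative data, not the paper's method or code.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Hypothetical setup: n_real realizations of m coefficients, as would be
# obtained by projecting each spatial field onto an orthogonal basis.
n_real, m = 200, 50
coeffs = rng.standard_normal((n_real, m))  # stand-in for real projections

# Maximize the l1-penalized Gaussian likelihood to get a sparse precision
# matrix; its nonzero off-diagonal entries encode the graph's edges.
model = GraphicalLasso(alpha=0.1).fit(coeffs)
n_edges = ((np.abs(model.precision_) > 1e-8).sum() - m) // 2  # each edge counted twice
print("estimated number of graph edges:", n_edges)
```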

Methodology

Modeling partitions of individuals

Despite the central role of self-assembled groups in animal and human societies, statistical tools to explain their composition are limited. We introduce a statistical framework for cross-sectional observations of groups with exclusive membership to illuminate the social and organizational mechanisms that bring people together. Drawing from stochastic models for networks and partitions, the proposed framework introduces an exponential family of distributions for partitions. We derive its main mathematical properties and suggest strategies to specify and estimate such models. A case study on hackathon events applies the developed framework to the study of mechanisms underlying the formation of self-assembled project teams.
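
Hedging on notation, since the abstract does not give the exact parameterization, an exponential family of distributions over partitions typically takes the form

```latex
P_\theta(p) \;=\; \exp\bigl\{\theta^\top s(p) - \psi(\theta)\bigr\},
\qquad
\psi(\theta) \;=\; \log \sum_{p' \in \mathcal{P}_n} \exp\bigl\{\theta^\top s(p')\bigr\},
```

where s(p) is a vector of sufficient statistics of the partition p (for example, counts of groups of each size) and the normalizer ψ(θ) sums over the set P_n of all partitions of the n individuals.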

Methodology

Modeling short-ranged dependence in block extrema with application to polar temperature data

The block maxima approach is an important method in univariate extreme value analysis. While assuming that block maxima are independent results in straightforward analysis, the resulting inferences may be invalid when a series of block maxima exhibits dependence. We propose a model, based on a first-order Markov assumption, that incorporates dependence between successive block maxima through the use of a bivariate logistic dependence structure while maintaining generalized extreme value (GEV) marginal distributions. Modeling dependence in this manner allows us to better estimate extreme quantiles when block maxima exhibit short-ranged dependence. We demonstrate via a simulation study that our first-order Markov GEV model performs well when successive block maxima are dependent, while still being reasonably robust when maxima are independent. We apply our method to two polar annual minimum air temperature data sets that exhibit short-ranged dependence structures, and find that the proposed model yields modified estimates of high quantiles.
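
For concreteness, these are standard forms rather than the paper's own notation: a first-order Markov model factorizes the joint density of block maxima M_1, ..., M_n through consecutive pairs, and the bivariate logistic dependence structure on the unit Fréchet scale is

```latex
f(m_1,\dots,m_n) \;=\; f(m_1) \prod_{i=2}^{n} \frac{f(m_{i-1}, m_i)}{f(m_{i-1})},
\qquad
G(z_1, z_2) \;=\; \exp\!\Bigl\{-\bigl(z_1^{-1/\alpha} + z_2^{-1/\alpha}\bigr)^{\alpha}\Bigr\},
\quad \alpha \in (0, 1],
```

where α = 1 recovers independence between successive maxima and smaller α gives stronger dependence; the GEV margins are preserved by transforming to and from the Fréchet scale.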

Methodology

Modelling Extremes of Spatial Aggregates of Precipitation using Conditional Methods

Inference on the extremal behaviour of spatial aggregates of precipitation is important for quantifying river flood risk. Previous approaches fall into two classes: one fails to ensure self-consistency in inference across different regions of aggregation, while the other requires highly inflexible assumptions about the marginal and spatial dependence structure. To overcome these issues, we propose a model for high-resolution precipitation data, from which we can simulate realistic fields and explore the behaviour of spatial aggregates. Recent developments in the spatial extremes literature have seen promising progress with spatial extensions of the Heffernan and Tawn (2004) model for conditional multivariate extremes, which can handle a wide range of dependence structures. Our contribution is twofold: new parametric forms for the dependence parameters of this model, and a novel framework for deriving aggregates that addresses edge effects and sub-regions without rain. We apply our modelling approach to gridded precipitation data for East Anglia, UK. Return-level curves for spatial aggregates over regions of various sizes are estimated and shown to fit the data well.
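
As background, a standard formulation of the spatial conditional extremes model (not necessarily the paper's exact parameterization) describes the field at site s given a large value at a conditioning site s_0, with Y on a standardized marginal scale:

```latex
\bigl\{ Y(s) \mid Y(s_0) = y \bigr\}
\;=\; \alpha(s - s_0)\, y \;+\; y^{\beta(s - s_0)}\, Z(s),
\qquad y > u,
```

where α(·) and β(·) are the dependence parameter functions for which new parametric forms are proposed, and Z(·) is a residual process independent of Y(s_0).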

Methodology

Modelling Time-Varying Rankings with Autoregressive and Score-Driven Dynamics

We develop a new statistical model to analyse time-varying ranking data. The model can be used with a large number of ranked items, accommodates exogenous time-varying covariates and partial rankings, and is estimated via maximum likelihood in a straightforward manner. Rankings are modelled using the Plackett-Luce distribution with time-varying worth parameters that follow a mean-reverting time series process. To capture the dependence of the worth parameters on past rankings, we utilize the conditional score in the fashion of the generalized autoregressive score (GAS) models. Simulation experiments show that small-sample properties of the maximum-likelihood estimator improve rapidly with the length of the time series and suggest that statistical inference relying on conventional Hessian-based standard errors is usable even for medium-sized samples. As an illustration, we apply the model to the results of the Ice Hockey World Championships. We also discuss applications to rankings based on underlying indices, repeated surveys, and non-parametric efficiency analysis.
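
A minimal sketch of the score-driven mechanics follows (illustrative only; the GAS parameter values and rankings below are hypothetical, and the scaling of the score is taken as the identity for simplicity):

```python
# Plackett-Luce log-likelihood and score, plus a GAS-style update of the
# log-worth parameters; a sketch, not the paper's estimation code.
import numpy as np
from scipy.special import logsumexp

def pl_loglik_and_score(f, ranking):
    """Log-likelihood of `ranking` (item indices, best to worst) under
    log-worths `f`, together with its gradient (the score)."""
    ll, score = 0.0, np.zeros_like(f)
    for k in range(len(ranking) - 1):   # last stage is degenerate
        rest = ranking[k:]
        lse = logsumexp(f[rest])
        ll += f[ranking[k]] - lse
        probs = np.exp(f[rest] - lse)
        score[ranking[k]] += 1.0        # observed "winner" at this stage
        score[rest] -= probs            # minus expected choice indicator
    return ll, score

omega, a, b = 0.0, 0.2, 0.95            # hypothetical GAS parameters
f = np.zeros(5)                          # log-worths for 5 ranked items
rankings = [np.array([2, 0, 1, 4, 3]),   # toy ranking data
            np.array([2, 1, 0, 3, 4])]
for r in rankings:
    _, s = pl_loglik_and_score(f, r)
    f = omega + b * f + a * s            # mean-reverting, score-driven step
```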

Methodology

Modelling multi-scale state-switching functional data with hidden Markov models

Data sets composed of sequences of curves sampled at high frequencies in time are increasingly common in practice, but they can exhibit complicated dependence structures that cannot be modelled using common methods of Functional Data Analysis (FDA). We detail a hierarchical approach that treats the curves as observations from a hidden Markov model (HMM). The distribution of each curve is then defined by another fine-scale model, which may involve auto-regression and require data transformations using moving-window summary statistics or Fourier analysis. This approach is broadly applicable to sequences of curves exhibiting intricate dependence structures. As a case study, we use this framework to model the fine-scale kinematic movement of a northern resident killer whale (Orcinus orca) off the coast of British Columbia, Canada. Through simulations, we show that our model produces more interpretable state estimation and more accurate parameter estimates compared to existing methods.
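
Schematically, with notation that is ours rather than the paper's, the hierarchy couples a coarse-scale hidden state sequence with a fine-scale model per curve:

```latex
S_t \mid S_{t-1} \;\sim\; \operatorname{Markov}(\Gamma),
\qquad
Y_t(\cdot) \mid S_t = j \;\sim\; f_j,
```

where Γ is the state transition probability matrix and f_j is a state-specific fine-scale model for the curve, for instance an autoregression applied to a moving-window or Fourier transformation of Y_t.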

Methodology

Modelling wind speed with a univariate probability distribution depending on two baseline functions

Properly characterizing the wind speed distribution is essential for assessing the potential energy production of wind farms, and mixture models are usually employed to describe such data. However, some mixture models have the undesirable property of non-identifiability. In this work, we present an alternative distribution that is able to fit wind speed data adequately. The new model, called Normal-Weibull-Weibull, is identifiable, and its cumulative distribution function is written as a composition of two baseline functions. We discuss structural properties of the class that generates the proposed model, such as the linear representation of the probability density function, moments, and the moment generating function. We perform a Monte Carlo simulation study to investigate the behavior of the maximum likelihood estimates of the parameters. Finally, we present applications of the new distribution to wind speed data measured in five different cities of the Northeastern Region of Brazil.
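
The Normal-Weibull-Weibull form itself is not reproduced here, so the sketch below only illustrates the kind of Monte Carlo maximum-likelihood check the abstract describes, with a plain two-parameter Weibull standing in for the proposed model:

```python
# Monte Carlo study of maximum-likelihood estimates for a wind-speed
# distribution; a plain Weibull is a stand-in, not the NWW model.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)
true_shape, true_scale = 2.0, 8.0        # hypothetical wind regime (m/s)

estimates = []
for _ in range(500):                     # 500 Monte Carlo replicates
    sample = weibull_min.rvs(true_shape, scale=true_scale,
                             size=200, random_state=rng)
    shape_hat, _, scale_hat = weibull_min.fit(sample, floc=0.0)
    estimates.append((shape_hat, scale_hat))

bias = np.mean(estimates, axis=0) - np.array([true_shape, true_scale])
print("bias of (shape, scale) MLEs:", bias)
```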

Methodology

Monitoring SEIRD model parameters using MEWMA for the COVID-19 pandemic with application to the State of Qatar

During the current COVID-19 pandemic, decision makers are tasked with implementing and evaluating strategies for both treatment and disease prevention. To make effective decisions, they need to simultaneously monitor various attributes of the pandemic, such as the transmission and infection rates for disease prevention, the recovery rate, which indicates treatment effectiveness, and the mortality rate. This work presents a technique for monitoring the pandemic via a Susceptible, Exposed, Infected, Recovered, Dead (SEIRD) model whose parameters are regularly estimated by an augmented particle Markov chain Monte Carlo scheme, with the posterior distribution samples monitored via multivariate exponentially weighted moving average (MEWMA) process monitoring. The approach is illustrated on COVID-19 data for the State of Qatar.
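
For orientation, the two standard ingredients can be sketched as follows (textbook forms with hypothetical values; the paper's augmented particle MCMC step is not shown):

```python
# SEIRD dynamics plus a MEWMA statistic over parameter draws; a sketch
# of the monitored quantities, not the paper's estimation scheme.
import numpy as np

def seird_step(state, beta, sigma, gamma, mu, dt=1.0):
    """One Euler step of the SEIRD compartmental model."""
    S, E, I, R, D = state
    N = state.sum()
    new_exposed = beta * S * I / N
    derivs = np.array([-new_exposed,                 # dS/dt
                       new_exposed - sigma * E,      # dE/dt
                       sigma * E - (gamma + mu) * I, # dI/dt
                       gamma * I,                    # dR/dt
                       mu * I])                      # dD/dt
    return state + dt * derivs

def mewma(samples, lam=0.2):
    """MEWMA T^2 statistics for a sequence of parameter vectors."""
    mean, cov = samples.mean(axis=0), np.cov(samples, rowvar=False)
    z, stats = np.zeros(samples.shape[1]), []
    for t, x in enumerate(samples, start=1):
        z = lam * (x - mean) + (1 - lam) * z
        # exact covariance of the MEWMA vector z at time t
        cz = cov * lam * (1 - (1 - lam) ** (2 * t)) / (2 - lam)
        stats.append(float(z @ np.linalg.solve(cz, z)))
    return np.array(stats)

# Toy usage with made-up numbers: one dynamics step, then monitoring
# stand-in posterior draws of (beta, sigma, gamma, mu).
state = seird_step(np.array([2.7e6, 1e3, 1e3, 0.0, 0.0]),
                   beta=0.4, sigma=0.2, gamma=0.1, mu=0.01)
draws = np.random.default_rng(2).standard_normal((100, 4)) * 0.01 + 0.2
print(mewma(draws)[:5])
```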

Methodology

Mortality Forecasting using Factor Models: Time-varying or Time-invariant Factor Loadings?

Many existing mortality models follow the framework of classical factor models, such as the Lee-Carter model and its variants. Latent common factors in factor models are defined as time-related mortality indices (such as κ_t in the Lee-Carter model). Factor loadings, which capture the linear relationship between age variables and latent common factors (such as β_x in the Lee-Carter model), are assumed to be time-invariant in the classical framework. This assumption is usually too restrictive in reality, as mortality datasets typically span a long period of time. Driving forces such as medical improvements for certain diseases, environmental changes and technological progress may significantly influence the relationships between different variables. In this paper, we first develop a factor model with time-varying factor loadings (the time-varying factor model) as an extension of the classical factor model for mortality modelling. Two forecasting methods to extrapolate the factor loadings, the local regression method and the naive method, are proposed for the time-varying factor model. From the empirical data analysis, we find that the new model can capture the empirical feature of time-varying factor loadings and improve mortality forecasting over different horizons and countries. Further, we propose a novel approach based on change point analysis to estimate the optimal 'boundary' between short-term and long-term forecasting, which are favoured by the local regression method and the naive method, respectively. Additionally, simulation studies are provided to show the performance of the time-varying factor model under various scenarios.
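
For reference, the classical Lee-Carter specification and, schematically, the time-varying-loading extension (the paper's exact form may differ) are

```latex
\log m_{x,t} \;=\; a_x + \beta_x \kappa_t + \varepsilon_{x,t}
\qquad\longrightarrow\qquad
\log m_{x,t} \;=\; a_x + \beta_{x,t}\, \kappa_t + \varepsilon_{x,t},
```

where m_{x,t} is the central death rate at age x in year t, a_x is the age-specific average log-mortality, κ_t is the period mortality index, and the loading β_{x,t} is now allowed to evolve over time.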

Methodology

Most Powerful Test Sequences with Early Stopping Options

Sequential likelihood ratio testing is found to be most powerful in sequential studies with early stopping rules when grouped data come from the one-parameter exponential family. First, to obtain this elusive result, the probability measure of a group sequential design is constructed with support for all possible outcome events, as is useful for designing an experiment prior to having data. This construction identifies impossible events that are not part of the support. The overall probability distribution is dissected into stage specific components. These components are sub-densities of interim test statistics first described by Armitage, McPherson and Rowe (1969) that are commonly used to create stopping boundaries given an α-spending function and a set of interim analysis times. Likelihood expressions conditional on reaching a stage are given to connect pieces of the probability anatomy together. The reduction of the support caused by the adoption of an early stopping rule induces sequential truncation (not nesting) in the probability distributions of possible events. Multiple testing induces mixtures on the adapted support. Even asymptotic distributions of inferential statistics are mixtures of truncated distributions. In contrast to the classical result on local asymptotic normality (Le Cam 1960), statistics that are asymptotically normal without stopping options have asymptotic distributions that are mixtures of truncated normal distributions under local alternatives with stopping options; under fixed alternatives, asymptotic distributions of test statistics are degenerate.
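
As a concrete anchor, the recursion behind the Armitage, McPherson and Rowe (1969) sub-densities can be written in a standard form (for a test statistic with transition density φ_k between analyses; this is background, not the paper's general construction):

```latex
f_1(z) \;=\; \phi_1(z),
\qquad
f_k(z) \;=\; \int_{C_{k-1}} f_{k-1}(u)\, \phi_k(z \mid u)\, du, \quad k \ge 2,
```

where C_{k-1} is the continuation region at analysis k-1, so each f_k carries mass only on sample paths that have not yet stopped; integrating f_k over the stage-k rejection region gives the α spent at that analysis.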
