Publication


Featured research published by David S. Matteson.


Journal of the American Statistical Association | 2014

A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data

David S. Matteson; Nicholas A. James

Change point analysis has applications in a wide variety of fields. The general problem concerns the inference of a change in distribution for a set of time-ordered observations. Sequential detection is an online version in which new data are continually arriving and are analyzed adaptively. We are concerned with the related, but distinct, offline version, in which retrospective analysis of an entire sequence is performed. For a set of multivariate observations of arbitrary dimension, we consider nonparametric estimation of both the number of change points and the positions at which they occur. We do not make any assumptions regarding the nature of the change in distribution or any distribution assumptions beyond the existence of the αth absolute moment, for some α ∈ (0, 2). Estimation is based on hierarchical clustering and we propose both divisive and agglomerative algorithms. The divisive method is shown to provide consistent estimates of both the number and the location of change points under standard regularity assumptions. We compare the proposed approach with competing methods in a simulation study. Methods from cluster analysis are applied to assess performance and to allow simple comparisons of location estimates, even when the estimated number differs. We conclude with applications in genetics, finance, and spatio-temporal analysis. Supplementary materials for this article are available online.
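
To illustrate the kind of divergence a nonparametric change point method can maximize, the following Python sketch scores every candidate split of a sequence with an energy-distance-style statistic, using an exponent alpha in (0, 2), and returns the best single split. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `energy_divergence` and `best_split`, the `min_size` parameter, and the toy data are hypothetical, and the hierarchical (divisive or agglomerative) search for multiple change points and any significance testing are omitted.

```python
import math
from itertools import combinations


def energy_divergence(x, y, alpha=1.0):
    """Energy-distance-style divergence between two samples of points.

    x and y are lists of equal-length tuples; alpha should lie in (0, 2).
    Each sample needs at least two points for the within-sample terms.
    """
    n, m = len(x), len(y)
    between = sum(math.dist(a, b) ** alpha for a in x for b in y)
    within_x = sum(math.dist(a, b) ** alpha for a, b in combinations(x, 2))
    within_y = sum(math.dist(a, b) ** alpha for a, b in combinations(y, 2))
    return (2.0 * between / (n * m)
            - within_x / math.comb(n, 2)
            - within_y / math.comb(m, 2))


def best_split(z, alpha=1.0, min_size=2):
    """Return (tau, score) for the single split z[:tau] | z[tau:] with the largest score."""
    best_tau, best_score = None, -math.inf
    for tau in range(min_size, len(z) - min_size + 1):
        left, right = z[:tau], z[tau:]
        scale = len(left) * len(right) / (len(left) + len(right))
        score = scale * energy_divergence(left, right, alpha)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score


# Toy univariate sequence (hypothetical data): the level jumps from about 0 to about 5.
z = [(float(i % 3) * 0.1,) for i in range(10)] + \
    [(5.0 + float(i % 3) * 0.1,) for i in range(10)]
tau, score = best_split(z)
print(tau, round(score, 3))
```

Because the points are plain tuples, the same functions accept observations of any dimension. A divisive procedure would apply `best_split` recursively to the resulting segments and stop once a split is no longer statistically significant.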


The Annals of Applied Statistics | 2011

Forecasting emergency medical service call arrival rates

David S. Matteson; Mathew W. McLean; Dawn B. Woodard; Shane G. Henderson

We introduce a new method for forecasting emergency call arrival rates that combines integer-valued time series models with a dynamic latent factor structure. Covariate information is captured via simple constraints on the factor loadings. We directly model the count-valued arrivals per hour, rather than using an artificial assumption of normality. This is crucial for the emergency medical service context, in which the volume of calls may be very low. Smoothing splines are used in estimating the factor levels and loadings to improve long-term forecasts. We impose time series structure at the hourly level, rather than at the daily level, capturing the fine-scale dependence in addition to the long-term structure. Our analysis considers all emergency priority calls received by Toronto EMS between January 2007 and December 2008 for which an ambulance was dispatched. Empirical results demonstrate significantly reduced error in forecasting call arrival volume. To quantify the impact of reduced forecast errors, we design a queueing model simulation that approximates the dynamics of an ambulance system. The results show better performance as the forecasting method improves. This notion of quantifying the operational impact of improved statistical procedures may be of independent interest.


The Annals of Applied Statistics | 2013

Travel time estimation for ambulances using Bayesian data augmentation

Bradford S. Westgate; Dawn B. Woodard; David S. Matteson; Shane G. Henderson

We introduce a Bayesian model for estimating the distribution of ambulance travel times on each road segment in a city, using Global Positioning System (GPS) data. Due to sparseness and error in the GPS data, the exact ambulance paths and travel times on each road segment are unknown. We simultaneously estimate the paths, travel times, and parameters of each road segment travel time distribution using Bayesian data augmentation. To draw ambulance path samples, we use a novel reversible jump Metropolis-Hastings step. We also introduce two simpler estimation methods based on GPS speed data. We compare these methods to a recently published travel time estimation method, using simulated data and data from Toronto EMS. In both cases, out-of-sample point and interval estimates of ambulance trip times from the Bayesian method outperform estimates from the alternative methods. We also construct probability-of-coverage maps for ambulances. The Bayesian method gives more realistic maps than the recently published method. Finally, path estimates from the Bayesian method interpolate well between sparsely recorded GPS readings and are robust to GPS location errors.


Journal of the American Statistical Association | 2011

Dynamic Orthogonal Components for Multivariate Time Series

David S. Matteson; Ruey S. Tsay

We introduce dynamic orthogonal components (DOC) for multivariate time series and propose a procedure for estimating and testing the existence of DOCs for a given time series. We estimate the dynamic orthogonal components via a generalized decorrelation method that minimizes the linear and quadratic dependence across components and across time. We then use Ljung–Box type statistics to test the existence of dynamic orthogonal components. When DOCs exist, univariate analysis can be applied to build a model for each component. Those univariate models are then combined to obtain a multivariate model for the original time series. We demonstrate the usefulness of dynamic orthogonal components with two real examples and compare the proposed modeling method with other dimension-reduction methods available in the literature, including principal component and independent component analyses. We also prove consistency and asymptotic normality of the proposed estimator under some regularity conditions. We provide some technical details in online Supplementary Materials.


Electronic Journal of Statistics | 2011

Stationarity of generalized autoregressive moving average models

Dawn B. Woodard; David S. Matteson; Shane G. Henderson

Time series models are often constructed by combining nonstationary effects such as trends with stochastic processes that are believed to be stationary. Although stationarity of the underlying process is typically crucial to ensure desirable properties or even validity of statistical estimators, there are numerous time series models for which this stationarity is not yet proven. A major barrier is that the most commonly-used methods assume φ-irreducibility, a condition that can be violated for the important class of discrete-valued observation-driven models. We show (strict) stationarity for the class of Generalized Autoregressive Moving Average (GARMA) models, which provides a flexible analogue of ARMA models for count, binary, or other discrete-valued data. We do this from two perspectives. First, we show stationarity and ergodicity of a perturbed version of the GARMA model, and show that the perturbed model yields parameter estimates that are arbitrarily close to those of the original model. This approach utilizes the fact that the perturbed model is φ-irreducible. Second, we show that the original GARMA model has a unique stationary distribution (so is strictly stationary when initialized in that distribution).


Journal of the American Statistical Association | 2017

Independent Component Analysis via Distance Covariance

David S. Matteson; Ruey S. Tsay

This article introduces a novel statistical framework for independent component analysis (ICA) of multivariate data. We propose methodology for estimating mutually independent components, and a versatile resampling-based procedure for inference, including misspecification testing. Independent components are estimated by combining a nonparametric probability integral transformation with a generalized nonparametric whitening method based on distance covariance that simultaneously minimizes all forms of dependence among the components. We prove the consistency of our estimator under minimal regularity conditions and detail conditions for consistency under model misspecification, all while placing assumptions on the observations directly, not on the latent components. U-statistics of certain Euclidean distances between sample elements are combined to construct a test statistic for mutually independent components. The proposed measures and tests are based on both necessary and sufficient conditions for mutual independence. We demonstrate the improvements of the proposed method over several competing methods in simulation studies, and we apply the proposed ICA approach to two real examples and contrast it with principal component analysis.
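
For readers unfamiliar with distance covariance, the Python sketch below computes a squared sample distance covariance from double-centered pairwise distance matrices. This is an assumption-laden illustration only: it uses the simpler V-statistic form rather than the U-statistics mentioned in the abstract, the function names and toy data are hypothetical, and it does not perform the ICA estimation itself.

```python
import math


def centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centered by row, column, and grand means."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means because d is symmetric
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared sample distance covariance (V-statistic form) of paired samples x and y."""
    n = len(x)
    a, b = centered_distances(x), centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


# Hypothetical toy data: y = x^2 is uncorrelated with x on a symmetric grid,
# yet clearly dependent on it.
x = [(float(v),) for v in range(-3, 4)]
y = [(p[0] ** 2,) for p in x]
constant = [(1.0,)] * len(x)

print(distance_covariance_sq(x, y) > 0)     # dependence is detected
print(distance_covariance_sq(x, constant))  # a constant sample carries no dependence
```

The example shows why a dependence measure of this kind is attractive for ICA: it responds to nonlinear dependence that ordinary covariance misses.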


Journal of the American Statistical Association | 2015

A Spatio-Temporal Point Process Model for Ambulance Demand

Zhengyi Zhou; David S. Matteson; Dawn B. Woodard; Shane G. Henderson; Athanasios C. Micheas

Ambulance demand estimation at fine time and location scales is critical for fleet management and dynamic deployment. We are motivated by the problem of estimating the spatial distribution of ambulance demand in Toronto, Canada, as it changes over discrete 2 hr intervals. This large-scale dataset is sparse at the desired temporal resolutions and exhibits location-specific serial dependence, daily, and weekly seasonality. We address these challenges by introducing a novel characterization of time-varying Gaussian mixture models. We fix the mixture component distributions across all time periods to overcome data sparsity and accurately describe Toronto’s spatial structure, while representing the complex spatio-temporal dynamics through time-varying mixture weights. We constrain the mixture weights to capture weekly seasonality, and apply a conditionally autoregressive prior on the mixture weights of each component to represent location-specific short-term serial dependence and daily seasonality. While estimation may be performed using a fixed number of mixture components, we also extend to estimate the number of components using birth-and-death Markov chain Monte Carlo. The proposed model is shown to give higher statistical predictive accuracy and to reduce the error in predicting emergency medical service operational performance by as much as two-thirds compared to a typical industry practice.


Journal of the American Statistical Association | 2017

A Bayesian Multivariate Functional Dynamic Linear Model

Daniel R. Kowal; David S. Matteson; David Ruppert

We present a Bayesian approach for modeling multivariate, dependent functional data. To account for the three dominant structural features in the data—functional, time dependent, and multivariate components—we extend hierarchical dynamic linear models for multivariate time series to the functional data setting. We also develop Bayesian spline theory in a more general constrained optimization framework. The proposed methods identify a time-invariant functional basis for the functional observations, which is smooth and interpretable, and can be made common across multivariate observations for additional information sharing. The Bayesian framework permits joint estimation of the model parameters, provides exact inference (up to MCMC error) on specific parameters, and allows generalized dependence structures. Sampling from the posterior distribution is accomplished with an efficient Gibbs sampling algorithm. We illustrate the proposed framework with two applications: (1) multi-economy yield curve data from the recent global recession, and (2) local field potential brain signals in rats, for which we develop a multivariate functional time series approach for multivariate time–frequency analysis. Supplementary materials, including R code and the multi-economy yield curve data, are available online.


International Conference on Big Data | 2016

Leveraging cloud data to mitigate user experience from ‘breaking bad’

Nicholas A. James; Arun Kejariwal; David S. Matteson

Low latency and high availability of an app or a web service are key, amongst other factors, to the overall user experience (which in turn directly impacts the bottom line). Exogenic and/or endogenic factors often give rise to breakouts in cloud data, which makes maintaining high availability and delivering high performance very challenging. Existing breakout detection techniques are not suitable for cloud data owing to not being robust in the presence of anomalies. To this end, we developed a novel statistical technique to automatically detect breakouts in cloud data. This technique employs Energy Statistics to detect breakouts in both app and system metrics. Further, the technique uses robust statistical metrics, viz., medians, and estimates the statistical significance of a breakout through a permutation test. To the best of our knowledge, this is the first work which addresses breakout detection in the presence of anomalies. We demonstrate the efficacy of the proposed technique using production data and report precision, recall, and F-measure. The proposed technique is 3.5× faster than a state-of-the-art technique for breakout detection and is currently being used on a daily basis at Twitter Inc.
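
The role of medians and permutation testing described above can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the technique from the paper: it replaces the energy statistic with a plain difference of segment medians, tests one fixed candidate split instead of searching for breakouts, and all names, parameters, and data are hypothetical.

```python
import random
from statistics import median


def median_shift(series, split):
    """Absolute difference between the medians of series[:split] and series[split:]."""
    return abs(median(series[:split]) - median(series[split:]))


def permutation_p_value(series, split, n_perm=500, seed=0):
    """Permutation p-value for the median shift at a fixed candidate split."""
    rng = random.Random(seed)
    observed = median_shift(series, split)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any time ordering
        if median_shift(shuffled, split) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical metric: the level shifts from about 1 to about 6, with one anomaly.
series = [1.0 + 0.01 * i for i in range(10)] + [6.0 + 0.01 * i for i in range(10)]
series[3] = 1000.0  # a single spike barely moves the segment median

print(round(median_shift(series, 10), 3))
print(permutation_p_value(series, 10) < 0.05)
```

A mean-based statistic would be dominated by the single spike at index 3, whereas the median-based shift stays close to the true level change. That is the robustness property the abstract emphasizes.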


Knowledge Discovery and Data Mining | 2015

Predicting Ambulance Demand: a Spatio-Temporal Kernel Approach

Zhengyi Zhou; David S. Matteson

Predicting ambulance demand accurately at fine time and location scales is critical for ambulance fleet management and dynamic deployment. Large-scale datasets in this setting typically exhibit complex spatio-temporal dynamics and sparsity at high resolutions. We propose a predictive method using spatio-temporal kernel density estimation (stKDE) to address these challenges, and provide spatial density predictions for ambulance demand in Toronto, Canada as it varies over hourly intervals. Specifically, we weight the spatial kernel of each historical observation by its informativeness to the current predictive task. We construct spatio-temporal weight functions to incorporate various temporal and spatial patterns in ambulance demand, including location-specific seasonalities and short-term serial dependence. This allows us to draw out the most helpful historical data, and exploit spatio-temporal patterns in the data for accurate and fast predictions. We further provide efficient estimation and customizable prediction procedures. stKDE is easy to use and interpret by non-specialized personnel from the emergency medical service industry. It also has significantly higher statistical accuracy than the current industry practice, with a comparable amount of computational expense.
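
The idea of weighting each historical observation's spatial kernel can be sketched as a weighted kernel density estimate. The Python code below is an illustrative sketch only: the weight function (exponential recency decay plus a boost for observations at the same hour of day), its parameters, the Gaussian kernel, the bandwidth, and the toy events are all assumptions for illustration, not the weight functions or estimation procedure from the paper.

```python
import math


def temporal_weight(lag_hours, decay=0.01, daily_boost=1.0):
    """Hypothetical weight: recent events count more, same-hour-of-day events get a boost."""
    recency = math.exp(-decay * lag_hours)
    same_hour = daily_boost if lag_hours % 24 == 0 else 0.0
    return recency * (1.0 + same_hour)


def gaussian_kernel_2d(dx, dy, bandwidth):
    """Isotropic bivariate Gaussian density evaluated at offset (dx, dy)."""
    h2 = bandwidth * bandwidth
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h2)) / (2.0 * math.pi * h2)


def weighted_kde(query, events, now, bandwidth=1.0):
    """Predictive spatial density at `query` from events given as (x, y, time_in_hours)."""
    weights = [temporal_weight(now - t) for (_, _, t) in events]
    total = sum(weights)
    density = sum(w * gaussian_kernel_2d(query[0] - x, query[1] - y, bandwidth)
                  for w, (x, y, _) in zip(weights, events))
    return density / total  # normalized weights keep the result a proper density


# Hypothetical events: a recent call near the origin and an old call far away.
events = [(0.0, 0.0, 99.0), (10.0, 10.0, 0.0)]
print(weighted_kde((0.0, 0.0), events, now=100.0) >
      weighted_kde((10.0, 10.0), events, now=100.0))
```

Normalizing the weights means the estimate integrates to one over space, so it can be read as a predictive density for where the next call will occur.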
