Publication


Featured research published by Matthias Katzfuss.


Journal of Time Series Analysis | 2011

Spatio‐temporal smoothing and EM estimation for massive remote‐sensing data sets

Matthias Katzfuss; Noel A Cressie

The use of satellite measurements in climate studies promises many new scientific insights if those data can be efficiently exploited. Due to sparseness of daily data sets, there is a need to fill spatial gaps and to borrow strength from adjacent days. Nonetheless, these satellites are typically capable of conducting on the order of 100,000 retrievals per day, which makes it impossible to apply traditional spatio‐temporal statistical methods, even in supercomputing environments. To overcome these challenges, we make use of a spatio‐temporal mixed‐effects model. For each massive daily data set, dimension reduction is achieved by essentially modelling the underlying process as a linear combination of spatial basis functions on the globe. The application of a dynamical autoregressive model in time, over the reduced space, allows rapid sequential computation of optimal smoothing predictions via the Kalman smoother; this is known as Fixed Rank Smoothing (FRS). The dimension‐reduced mixed‐effects model contains a number of unknown parameters, including covariance and propagator matrices, which describe the spatial and temporal dependence structure in the reduced‐dimensional process. We take an empirical‐Bayes approach to inference, which involves estimating the parameters and substituting them into the optimal predictors. Method‐of‐moments (MM) parameter estimation (currently used in FRS) is typically inefficient compared to maximum likelihood (ML) estimation and can result in large sampling variability. Here, we develop ML estimation via an expectation‐maximization (EM) algorithm, which offers stable computation of valid estimators and makes efficient use of spatial and temporal dependence in the data. The two parameter‐estimation approaches, MM and ML, are compared in a simulation study. 
We also apply our methodology to global satellite CO measurements: We optimally smooth the sparse daily CO maps obtained by the Atmospheric InfraRed Sounder (AIRS) instrument on the Aqua satellite; then, using FRS with EM‐estimated parameters, a complete sequence of the daily global CO fields can be obtained, together with their associated prediction uncertainties.
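The sequential smoothing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' Fixed Rank Smoothing code: it assumes a single scalar coefficient (a reduced dimension of one) that follows an AR(1) model and is observed directly with noise, and the function name `kalman_smoother` and all parameter values are hypothetical. Days without data are passed as `None`, and the backward pass borrows strength from later days to fill them.

```python
def kalman_smoother(ys, a, q, r, m0, p0):
    """Scalar Kalman filter followed by a Rauch-Tung-Striebel smoother.

    Assumed model (illustration only):
        state_t = a * state_{t-1} + N(0, q),   y_t = state_t + N(0, r)
    ys may contain None for days with no observation.
    (m0, p0) is the prior mean and variance of the state on the first day.
    """
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate the previous filtered estimate one day ahead.
            m, p = a * m, a * a * p + q
        m_pred.append(m)
        p_pred.append(p)
        if y is not None:
            # Standard Kalman update; skipped on days without data.
            gain = p / (p + r)
            m, p = m + gain * (y - m), (1.0 - gain) * p
        m_filt.append(m)
        p_filt.append(p)

    # Backward pass: revise each day using information from later days.
    m_smooth, p_smooth = m_filt[:], p_filt[:]
    for t in range(len(ys) - 2, -1, -1):
        j = a * p_filt[t] / p_pred[t + 1]
        m_smooth[t] = m_filt[t] + j * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + j * j * (p_smooth[t + 1] - p_pred[t + 1])
    return m_filt, p_filt, m_smooth, p_smooth


# Hypothetical data: six days, three of them without any retrievals.
ys = [1.0, None, 1.4, None, None, 2.0]
m_filt, p_filt, m_smooth, p_smooth = kalman_smoother(
    ys, a=0.9, q=0.1, r=0.2, m0=0.0, p0=1.0
)
```

In this sketch the smoothed variance on a gap day is smaller than the filtered variance, because the smoother also uses the observations that arrive afterwards.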


Journal of the American Statistical Association | 2017

A Multi-Resolution Approximation for Massive Spatial Datasets

Matthias Katzfuss

Automated sensing instruments on satellites and aircraft have enabled the collection of massive amounts of high-resolution observations of spatial fields over large spatial regions. If these datasets can be efficiently exploited, they can provide new insights on a wide variety of issues. However, traditional spatial-statistical techniques such as kriging are not computationally feasible for big datasets. We propose a multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space. The M-RA process is specified as a linear combination of basis functions at multiple levels of spatial resolution, which can capture spatial structure from very fine to very large scales. The basis functions are automatically chosen to approximate a given covariance function, which can be nonstationary. All computations involving the M-RA, including parameter inference and prediction, are highly scalable for massive datasets. Crucially, the inference algorithms can also be parallelized to take full advantage of large distributed-memory computing environments. In comparisons using simulated data and a large satellite dataset, the M-RA outperforms a related state-of-the-art method. Supplementary materials for this article are available online.


Environmetrics | 2013

Bayesian nonstationary spatial modeling for very large datasets

Matthias Katzfuss

With the proliferation of modern high-resolution measuring instruments mounted on satellites, planes, ground-based vehicles, and monitoring stations, a need has arisen for statistical methods suitable for the analysis of large spatial datasets observed on large spatial domains. Statistical analyses of such datasets provide two main challenges: first, traditional spatial-statistical techniques are often unable to handle large numbers of observations in a computationally feasible way; second, for large and heterogeneous spatial domains, it is often not appropriate to assume that a process of interest is stationary over the entire domain. We address the first challenge by using a model combining a low-rank component, which allows for flexible modeling of medium-to-long-range dependence via a set of spatial basis functions, with a tapered remainder component, which allows for modeling of local dependence using a compactly supported covariance function. Addressing the second challenge, we propose two extensions to this model that result in increased flexibility: first, the model is parameterized on the basis of a nonstationary Matérn covariance, where the parameters vary smoothly across space; second, in our fully Bayesian model, all components and parameters are considered random, including the number, locations, and shapes of the basis functions used in the low-rank component. Using simulated data and a real-world dataset of high-resolution soil measurements, we show that both extensions can result in substantial improvements over the current state-of-the-art.


Technometrics | 2014

Spatio-Temporal Data Fusion for Very Large Remote Sensing Datasets

Hai Nguyen; Matthias Katzfuss; Noel A Cressie; Amy Braverman

Developing global maps of carbon dioxide (CO2) mole fraction (in units of parts per million) near the Earth’s surface can help identify locations where major amounts of CO2 are entering and exiting the atmosphere, thus providing valuable insights into the carbon cycle and mitigating the greenhouse effect of atmospheric CO2. Existing satellite remote sensing data do not provide measurements of the CO2 mole fraction near the surface. Japan’s Greenhouse gases Observing SATellite (GOSAT) is sensitive to average CO2 over the entire column, and NASA’s Atmospheric InfraRed Sounder (AIRS) is sensitive to CO2 in the middle troposphere. One might expect that lower-atmospheric CO2 could be inferred by differencing GOSAT column-average and AIRS mid-tropospheric data. However, the two instruments have different footprints, measurement-error characteristics, and data coverages. In addition, the spatio-temporal domains are large, and the AIRS dataset is massive. In this article, we describe a spatio-temporal data-fusion (STDF) methodology based on reduced-dimensional Kalman smoothing. Our STDF is able to combine the complementary GOSAT and AIRS datasets to optimally estimate lower-atmospheric CO2 mole fraction over the whole globe. Further, it is designed for massive remote sensing datasets and accounts for differences in instrument footprint, measurement-error characteristics, and data coverages. This article has supplementary material online.


Statistics and Computing | 2017

Parallel inference for massive distributed spatial data using low-rank models

Matthias Katzfuss; Dorit Hammerling

Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.
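The claim that distributed inference can be exact for low-rank models can be sketched in a few lines. The example below is an illustration under simplifying assumptions, not the paper's algorithm: it uses two hypothetical basis functions `[1, s]`, treats the fine-scale variation plus measurement error as independent noise with known variance, and gives the basis weights a zero-mean Gaussian prior with precision `K_inv`. The helper names `basis`, `local_summary`, and `combine` are made up for this sketch. Each node sends only a 2x2 matrix and a length-2 vector, so the communication cost does not grow with the number of data points.

```python
def basis(locations):
    """Hypothetical basis functions evaluated at 1-D locations: [1, s]."""
    return [[1.0, s] for s in locations]


def local_summary(B, y, noise_var):
    """Per-node summaries: B'B / noise_var (2x2) and B'y / noise_var (length 2)."""
    A = [[sum(b[i] * b[k] for b in B) / noise_var for k in range(2)]
         for i in range(2)]
    z = [sum(b[i] * yi for b, yi in zip(B, y)) / noise_var for i in range(2)]
    return A, z


def combine(summaries, K_inv):
    """Posterior mean of the two basis weights from the summed node summaries."""
    # Posterior precision = prior precision + sum of per-node matrices.
    P = [[K_inv[i][k] + sum(A[i][k] for A, _ in summaries) for k in range(2)]
         for i in range(2)]
    z = [sum(zv[i] for _, zv in summaries) for i in range(2)]
    # Solve the 2x2 system P m = z directly.
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [(P[1][1] * z[0] - P[0][1] * z[1]) / det,
            (P[0][0] * z[1] - P[1][0] * z[0]) / det]


# Hypothetical data held at two separate nodes.
s1, y1 = [0.0, 0.5, 1.0], [1.0, 1.4, 2.1]
s2, y2 = [1.5, 2.0], [2.4, 3.1]
noise_var = 0.25
K_inv = [[1.0, 0.0], [0.0, 1.0]]

distributed = combine(
    [local_summary(basis(s1), y1, noise_var),
     local_summary(basis(s2), y2, noise_var)],
    K_inv,
)
# Reference: the same computation with all data pooled at one node.
pooled = combine([local_summary(basis(s1 + s2), y1 + y2, noise_var)], K_inv)
```

Up to floating-point rounding, `distributed` and `pooled` agree, which is the sense in which the divide-and-conquer result matches a non-distributed analysis in this simplified setting.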


The American Statistician | 2016

Understanding the Ensemble Kalman Filter

Matthias Katzfuss; Jonathan R. Stroud; Christopher K. Wikle

The ensemble Kalman filter (EnKF) is a computational technique for approximate inference in state-space models. In typical applications, the state vectors are large spatial fields that are observed sequentially over time. The EnKF approximates the Kalman filter by representing the distribution of the state with an ensemble of draws from that distribution. The ensemble members are updated based on newly available data by shifting instead of reweighting, which allows the EnKF to avoid the degeneracy problems of reweighting-based algorithms. Taken together, the ensemble representation and shifting-based updates make the EnKF computationally feasible even for extremely high-dimensional state spaces. The EnKF is successfully used in data-assimilation applications with tens of millions of dimensions. While it implicitly assumes a linear Gaussian state-space model, it has also turned out to be remarkably robust to deviations from these assumptions in many applications. Despite its successes, the EnKF is largely unknown in the statistics community. We aim to change that with the present article, and to entice more statisticians to work on this topic.
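The "shifting instead of reweighting" update can be made concrete with a minimal sketch of one common variant, the stochastic (perturbed-observation) EnKF. The setup is an assumption chosen for brevity: a scalar state observed directly with Gaussian noise, whereas real applications involve very high-dimensional states. The function name `enkf_update` and all numbers are hypothetical.

```python
import random


def enkf_update(ensemble, y, obs_var, rng):
    """One stochastic EnKF update for a scalar state observed directly.

    Each member is shifted toward a perturbed copy of the observation,
    using a Kalman gain estimated from the ensemble itself. No member
    receives a weight, so there is nothing to degenerate.
    """
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)  # forecast variance
    gain = var / (var + obs_var)  # ensemble estimate of the Kalman gain
    obs_sd = obs_var ** 0.5
    return [x + gain * (y + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


rng = random.Random(0)
# Forecast ensemble drawn from a standard normal prior (illustration only).
prior = [rng.gauss(0.0, 1.0) for _ in range(2000)]
posterior = enkf_update(prior, y=2.0, obs_var=1.0, rng=rng)

post_mean = sum(posterior) / len(posterior)
post_var = sum((x - post_mean) ** 2 for x in posterior) / (len(posterior) - 1)
```

For this linear Gaussian toy problem the exact Kalman filter posterior has mean 1.0 and variance 0.5, so `post_mean` and `post_var` should land close to those values, with the gap shrinking as the ensemble grows.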


Monthly Weather Review | 2017

A Bayesian Adaptive Ensemble Kalman Filter for Sequential State and Parameter Estimation

Jonathan R. Stroud; Matthias Katzfuss; Christopher K. Wikle

This paper proposes new methodology for sequential state and parameter estimation within the ensemble Kalman filter. The method is fully Bayesian and propagates the joint posterior distribution of states and parameters over time. To implement the method, the authors consider three representations of the marginal posterior distribution of the parameters: a grid-based approach, a Gaussian approximation, and a sequential importance sampling (SIR) approach with kernel resampling. In contrast to existing online parameter estimation algorithms, the new method explicitly accounts for parameter uncertainty and provides a formal way to combine information about the parameters from data at different time periods. The method is illustrated and compared to existing approaches using simulated and real data.


Geophysical Research Letters | 2017

A Bayesian hierarchical model for climate change detection and attribution

Matthias Katzfuss; Dorit Hammerling; Richard L. Smith

Regression-based detection and attribution methods continue to take a central role in the study of climate change and its causes. Here we propose a novel Bayesian hierarchical approach to this problem, which allows us to address several open methodological questions. Specifically, we take into account the uncertainties in the true temperature change due to imperfect measurements, the uncertainty in the true climate signal under different forcing scenarios due to the availability of only a small number of climate model simulations, and the uncertainty associated with estimating the climate variability covariance matrix, including the truncation of the number of empirical orthogonal functions (EOFs) in this covariance matrix. We apply Bayesian model averaging to assign optimal probabilistic weights to different possible truncations and incorporate all uncertainties into the inference on the regression coefficients. We provide an efficient implementation of our method in a software package and illustrate its use with a realistic application.


Environmetrics | 2012

Bayesian hierarchical spatio-temporal smoothing for very large datasets

Matthias Katzfuss; Noel A Cressie


Environmetrics | 2011

Spatio-temporal models for large-scale indicators of extreme weather

Matthew J. Heaton; Matthias Katzfuss; Shahla Ramachandar; Kathryn Pedings; Eric Gilleland; Elizabeth Mannshardt-Shamseldin; Richard L. Smith

Collaboration


Explore Matthias Katzfuss's collaborations.

Top Co-Authors

Dorit Hammerling

National Center for Atmospheric Research

Noel A Cressie

University of Wollongong

Amy Braverman

California Institute of Technology

Richard L. Smith

University of North Carolina at Chapel Hill

Anna M. Michalak

Carnegie Institution for Science
