Cari G. Kaufman
University of California, Berkeley
Publications
Featured research published by Cari G. Kaufman.
Bayesian Analysis | 2010
Cari G. Kaufman; Stephan R. Sain
Functional analysis of variance (ANOVA) models partition a functional response according to the main effects and interactions of various factors. This article develops a general framework for functional ANOVA modeling from a Bayesian viewpoint, assigning Gaussian process prior distributions to each batch of functional effects. We discuss the choices to be made in specifying such a model, advocating the treatment of levels within a given factor as dependent but exchangeable quantities, and we suggest weakly informative prior distributions for higher level parameters that may be appropriate in many situations. We discuss computationally efficient strategies for posterior sampling using Markov chain Monte Carlo algorithms, and we emphasize useful graphical summaries based on the posterior distribution of model-based analogues of traditional ANOVA decompositions of variance. We illustrate this process of model specification, posterior sampling, and graphical posterior summaries in two examples. The first considers the effect of geographic region on the temperature profiles at weather stations in Canada. The second example examines sources of variability in the output of regional climate models from a designed experiment.
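As a rough illustration of the ingredients described in this abstract, the sketch below draws one batch of exchangeable functional effects from a Gaussian process prior on a one-dimensional domain and forms a pointwise ANOVA-style variance summary. The squared exponential covariance, its parameters, and the sum-to-zero centering are illustrative assumptions, not the paper's exact specification.

```python
# A minimal sketch of drawing exchangeable functional effects from a
# Gaussian process prior and summarizing the variance attributable to
# a factor. Covariance choices and names here are assumptions.
import numpy as np

def sq_exp_cov(t, variance=1.0, length_scale=0.2):
    """Squared exponential covariance matrix on a 1-D grid t."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)                 # common functional domain
K = sq_exp_cov(t) + 1e-8 * np.eye(t.size)      # jitter for stability

# Grand mean function and one batch of exchangeable level effects.
mu = rng.multivariate_normal(np.zeros(t.size), K)
n_levels = 4
alpha = rng.multivariate_normal(np.zeros(t.size), K, size=n_levels)
alpha -= alpha.mean(axis=0)                    # sum-to-zero across levels

# Pointwise "functional ANOVA" variance attributable to the factor.
var_alpha = (alpha ** 2).mean(axis=0)
print("average effect variance over the domain:", var_alpha.mean())
```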
The Annals of Applied Statistics | 2011
Cari G. Kaufman; Derek Bingham; Salman Habib; Katrin Heitmann; Joshua A. Frieman
Statistical emulators of computer simulators have proven to be useful in a variety of applications. The widely adopted model for emulator building, using a Gaussian process model with strictly positive correlation function, is computationally intractable when the number of simulator evaluations is large. We propose a new model that uses a combination of low-order regression terms and compactly supported correlation functions to recreate the desired predictive behavior of the emulator at a fraction of the computational cost. Following the usual approach of taking the correlation to be a product of correlations in each input dimension, we show how to impose restrictions on the ranges of the correlations, giving sparsity, while also allowing the ranges to trade off against one another, thereby giving good predictive performance. We illustrate the method using data from a computer simulator of photometric redshift with 20,000 simulator evaluations and 80,000 predictions.
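The sparsity mechanism the abstract describes can be seen in a few lines: a compactly supported correlation applied per input dimension and multiplied together yields exact zeros in the covariance matrix once any coordinate distance exceeds its range. The Wendland-type function and the per-dimension ranges below are assumptions for illustration, not the paper's fitted values.

```python
# A sketch of a product of compactly supported correlations, one per
# input dimension, producing a sparse correlation matrix.
import numpy as np

def wendland(d, theta):
    """C^2 Wendland-type correlation: zero for distances beyond theta."""
    r = np.clip(d / theta, 0.0, 1.0)
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r)

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 3))             # 500 inputs in 3 dimensions
ranges = np.array([0.15, 0.25, 0.35])      # assumed per-dimension ranges

R = np.ones((X.shape[0], X.shape[0]))
for j, theta in enumerate(ranges):
    d = np.abs(X[:, j, None] - X[None, :, j])
    R *= wendland(d, theta)                # product correlation

print("fraction of exact zeros:", np.mean(R == 0.0))
```

Shrinking any one range increases sparsity in that dimension, which is the trade-off between computational cost and predictive flexibility the abstract refers to.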
Journal of Statistical Software | 2015
Christopher J. Paciorek; Benjamin Lipshitz; Wei Zhuo; Prabhat; Cari G. Kaufman; Rollin C. Thomas
We consider parallel computation for Gaussian process calculations to overcome computational and memory constraints on the size of datasets that can be analyzed. Using a hybrid parallelization approach that uses both threading (shared memory) and message-passing (distributed memory), we implement the core linear algebra operations used in spatial statistics and Gaussian process regression in an R package called bigGP that relies on C and MPI. The approach divides the covariance matrix into blocks such that the computational load is balanced across processes while communication between processes is limited. The package provides an API enabling R programmers to implement Gaussian process-based methods by using the distributed linear algebra operations without any C or MPI coding. We illustrate the approach and software by analyzing an astrophysics dataset with n = 67,275 observations.
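The block-division idea can be mimicked in a few lines: cut the symmetric covariance matrix into tiles and deal the lower-triangular tiles out so each process receives a similar amount of work. This toy sketch illustrates the load-balancing concept only; it is not the bigGP API, which handles the distribution in C and MPI behind its R interface.

```python
# Round-robin assignment of lower-triangular covariance tiles to
# processes, a toy stand-in for the package's block partitioning.
n, block, n_proc = 1000, 100, 6
n_blocks = n // block
tiles = [(i, j) for i in range(n_blocks) for j in range(i + 1)]

assignment = {p: [] for p in range(n_proc)}
for k, tile in enumerate(tiles):
    assignment[k % n_proc].append(tile)    # round-robin over tiles

for p, owned in assignment.items():
    print(f"process {p}: {len(owned)} tiles")
```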
International Conference on Data Mining | 2011
Wei Zhuo; Prabhat; Christopher J. Paciorek; Cari G. Kaufman; Wes Bethel
We investigate the problem of kriging analysis for estimating quantities at unknown locations given a set of observations. Widely known in the geostatistical community, kriging bases spatial prediction on a closed-form model for the spatial covariances between observations, deriving interpolation parameters that minimize variance. While kriging produces predictions with high accuracy, a standard implementation based on maximum likelihood involves repeated covariance factorization, forward-solve, and inner product operations. The resulting computational complexity renders the method infeasible for application to large datasets on a single node. To facilitate large-scale kriging analysis, we develop and implement a distributed version of the algorithm that can utilize multiple computational nodes as well as multiple cores on a single node. We apply kriging analysis for making predictions from a medium-sized weather station dataset, and demonstrate our parallel implementation on a much larger synthetic dataset consisting of 65,536 points using 512 cores.
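A serial sketch of the computational kernel the abstract names, one covariance factorization (Cholesky) followed by forward/back solves and inner products, shows exactly which operations a distributed implementation must parallelize. The exponential covariance and its parameters are assumptions for illustration.

```python
# The factorization / forward-solve / inner-product pattern at the
# heart of maximum likelihood kriging, in serial form.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 2))                      # observation sites
d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
K = np.exp(-d / 0.3) + 1e-6 * np.eye(X.shape[0])    # exponential cov
y = rng.multivariate_normal(np.zeros(X.shape[0]), K)

c, low = cho_factor(K)                   # the expensive factorization
alpha = cho_solve((c, low), y)           # forward/back solves
nll = 0.5 * (y @ alpha                   # inner product
             + 2.0 * np.sum(np.log(np.diag(c)))
             + X.shape[0] * np.log(2.0 * np.pi))
print("negative log-likelihood:", nll)

# Kriging prediction at new sites reuses the same factorization.
Xs = rng.uniform(size=(5, 2))
ks = np.exp(-np.linalg.norm(Xs[:, None] - X[None, :], axis=-1) / 0.3)
print("predictions:", ks @ alpha)
```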
The Annals of Applied Statistics | 2016
Benjamin A. Shaby; Brian J. Reich; Daniel Cooley; Cari G. Kaufman
Heat waves merit careful study because they inflict severe economic and societal damage. We use an intuitive, informal working definition of a heat wave, a persistent event in the tail of the temperature distribution, to motivate an interpretable latent state extreme value model. A latent variable with dependence in time indicates membership in the heat wave state. The strength of the temporal dependence of the latent variable controls the frequency and persistence of heat waves. Within each heat wave, temperatures are modeled using extreme value distributions, with extremal dependence across time accomplished through an extreme value Markov model. One important virtue of interpretability is that model parameters directly translate into quantities of interest for risk management, so that questions like whether heat waves are becoming longer, more severe, or more frequent are easily answered by querying an appropriate fitted model. We demonstrate the latent state model on two recent, calamitous examples: the European heat wave of 2003 and the Russian heat wave of 2010.
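A simulation sketch conveys the latent-state idea: a persistent two-state Markov chain switches a heat wave indicator on and off, and days in the heat wave state draw exceedances over a threshold from a generalized Pareto distribution. The transition probabilities, threshold, and distribution parameters below are illustrative assumptions, and the extremal time dependence within waves is omitted for brevity.

```python
# Simulating a latent heat-wave indicator with temporal persistence,
# with generalized Pareto exceedances on heat-wave days.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
n_days, p_enter, p_stay = 365, 0.02, 0.80
threshold, shape, scale = 35.0, 0.1, 2.0     # degrees Celsius

state = np.zeros(n_days, dtype=bool)
for t in range(1, n_days):
    p = p_stay if state[t - 1] else p_enter  # persistence via Markov chain
    state[t] = rng.uniform() < p

temps = rng.normal(25.0, 4.0, size=n_days)   # ordinary days
temps[state] = threshold + genpareto.rvs(
    shape, scale=scale, size=state.sum(), random_state=rng)

print("heat wave days:", state.sum())
```

Raising p_stay lengthens waves without making them more frequent, which is the kind of directly interpretable question the model is built to answer.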
Methods in Ecology and Evolution | 2016
Danielle Svehla Christianson; Cari G. Kaufman
1. Environmental heterogeneity, an important influence on organisms and ecological processes, can be quantified by the variance of an environmental characteristic over all locations within a study extent. However, on landscapes with autocorrelation and gradient patterns, estimating this variance from a sample of locations may lead to errors that cannot be corrected with statistical techniques. 2. We analytically derived the relative expected sampling error of sample designs on landscapes with particular gradient pattern and autocorrelation features. We applied this closed-form approach to temperature observations from an existing study. The expected heterogeneity differed, both in magnitude and direction, amongst sample designs over the study site's likely range of autocorrelation and gradient features. 3. We conducted a simulation study to understand the effects of (i) landscape variability and (ii) design variability on average sampling error. On 10,000 simulated landscapes with varying gradient and autocorrelation features, we compared estimates of variance from a variety of structured and random sample designs. While gradient patterns and autocorrelation cause large errors for some designs, others yield near-zero average sampling error. Sample location spacing is a key factor in sample design performance. Random designs have a larger range of possible sampling errors than structured designs due to the potential for sample arrangements that over- and under-sample certain areas of the landscape. 4. When implementing a new sample design to quantify environmental heterogeneity via variance, we recommend using a simple structured design with appropriate sample spacing. For existing designs, we recommend calculating the relative expected sampling error via our analytical derivation.
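A small simulation in the spirit of this study: landscapes with a linear gradient plus exponentially autocorrelated noise are sampled by an evenly spaced (structured) design and by simple random designs, and each design's sample variance is compared to the variance over all locations. The gradient slope, correlation range, and sample size below are illustrative assumptions.

```python
# Comparing sampling error of variance estimates from structured vs.
# random designs on simulated 1-D landscapes.
import numpy as np

rng = np.random.default_rng(4)
n_loc, n_sim, n_samp = 200, 500, 20
x = np.linspace(0.0, 1.0, n_loc)
K = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)  # autocorrelation

struct_idx = np.linspace(0, n_loc - 1, n_samp).astype(int)
err_struct, err_rand = [], []
for _ in range(n_sim):
    field = 3.0 * x + rng.multivariate_normal(np.zeros(n_loc), K)
    true_var = field.var()                 # variance over all locations
    err_struct.append(field[struct_idx].var(ddof=1) - true_var)
    rand_idx = rng.choice(n_loc, size=n_samp, replace=False)
    err_rand.append(field[rand_idx].var(ddof=1) - true_var)

print("structured design: mean error %.3f, sd %.3f"
      % (np.mean(err_struct), np.std(err_struct)))
print("random design:     mean error %.3f, sd %.3f"
      % (np.mean(err_rand), np.std(err_rand)))
```

Consistent with the paper's finding, the random design typically shows a wider spread of sampling errors, since random placement can over- or under-sample regions of the landscape.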
Biostatistics | 2018
Minjeong Jeon; Cari G. Kaufman; Sophia Rabe-Hesketh
We propose the Monte Carlo local likelihood (MCLL) method for approximating maximum likelihood estimation (MLE). MCLL initially treats model parameters as random variables, sampling them from the posterior distribution as in a Bayesian model. The likelihood function is then approximated up to a constant by fitting a density to the posterior samples and dividing the approximate posterior density by the prior. In the MCLL algorithm, the posterior density is estimated using local likelihood density estimation, in which the log-density is locally approximated by a polynomial function. We also develop a new method that allows users to efficiently compute standard errors and the Bayes factor. Two empirical and three simulation studies are provided to demonstrate the performance of the MCLL method.
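The core identity, likelihood proportional to posterior density divided by prior, can be demonstrated on a conjugate toy model where the posterior can be sampled exactly. For brevity, a Gaussian kernel density estimate stands in below for the paper's local likelihood density estimation; the model and all parameter values are illustrative assumptions.

```python
# A sketch of the MCLL idea: estimate the posterior density from
# samples, divide by the prior, and maximize the resulting
# approximate likelihood.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(5)
y = rng.normal(1.5, 1.0, size=50)         # data with known unit variance
prior = norm(0.0, 10.0)                   # N(0, 10^2) prior on the mean

# Exact conjugate posterior, used here in place of an MCMC sampler.
post_var = 1.0 / (1.0 / 10.0**2 + y.size)
post_mean = post_var * y.sum()
draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)

kde = gaussian_kde(draws)                 # density fit to posterior draws
grid = np.linspace(post_mean - 1.0, post_mean + 1.0, 201)
approx_loglik = np.log(kde(grid)) - prior.logpdf(grid)  # up to a constant

print("approximate MLE:", grid[np.argmax(approx_loglik)])
print("exact MLE (sample mean):", y.mean())
```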
Statistics in Medicine | 2005
Cari G. Kaufman; Valérie Ventura; Robert E. Kass
Biometrika | 2013
Cari G. Kaufman; Benjamin A. Shaby
Archive | 2007
Cari G. Kaufman; Stephan R. Sain