Matthew A. Taddy
University of Chicago
Publications
Featured research published by Matthew A. Taddy.
Journal of the American Statistical Association | 2011
Matthew A. Taddy; Robert B. Gramacy; Nicholas G. Polson
Dynamic regression trees are an attractive option for automatic regression and classification with complicated response surfaces in online application settings. We create a sequential tree model whose state changes in time with the accumulation of new data, and provide particle learning algorithms that allow for the efficient online posterior filtering of tree states. A major advantage of tree regression is that it allows for the use of very simple models within each partition. The model also facilitates a natural division of labor in our sequential particle-based inference: tree dynamics are defined through a few potential changes that are local to each newly arrived observation, while global uncertainty is captured by the ensemble of particles. We consider both constant and linear mean functions at the tree leaves, along with multinomial leaves for classification problems, and propose default prior specifications that allow for prediction to be integrated over all model parameters conditional on a given tree. Inference is illustrated in some standard nonparametric regression examples, as well as in the setting of sequential experiment design, including both active learning and optimization applications, and in online classification. We detail implementation guidelines and problem specific methodology for each of these motivating applications. Throughout, it is demonstrated that our practical approach is able to provide better results compared to commonly used methods at a fraction of the cost.
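The resample-propagate recipe behind this kind of sequential particle inference can be illustrated on a far simpler state-space problem than the tree model itself. The sketch below is a hypothetical toy, not the paper's algorithm: it filters a drifting mean, where each new observation reweights and resamples the particles, and a small local state change then propagates them.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(ys, n_particles=500, obs_sd=1.0, drift_sd=0.1):
    """Resample-propagate filtering of a drifting mean, one observation at a time."""
    particles = rng.normal(0.0, 1.0, n_particles)   # initial state draws
    estimates = []
    for y in ys:
        # Resample: weight each particle by the likelihood of the new point.
        w = np.exp(-0.5 * ((y - particles) / obs_sd) ** 2)
        w /= w.sum()
        particles = rng.choice(particles, size=n_particles, p=w)
        # Propagate: a local state change after the new observation arrives.
        particles = particles + rng.normal(0.0, drift_sd, n_particles)
        estimates.append(particles.mean())
    return np.array(estimates)

# A mean that jumps halfway through; the filter should track the change online.
ys = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])
est = particle_filter(ys)
```

As in the paper's framing, the per-observation update is cheap and local, while uncertainty lives in the particle ensemble.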
Journal of Business & Economic Statistics | 2010
Matthew A. Taddy; Athanasios Kottas
We develop a Bayesian method for nonparametric model–based quantile regression. The approach involves flexible Dirichlet process mixture models for the joint distribution of the response and the covariates, with posterior inference for different quantile curves emerging from the conditional response distribution given the covariates. An extension to allow for partially observed responses leads to a novel Tobit quantile regression framework. We use simulated data sets and two data examples from the literature to illustrate the capacity of the model to uncover nonlinearities in quantile regression curves, as well as nonstandard features in the response distribution.
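The core idea, reading quantile curves off the conditional response distribution implied by a joint model, can be sketched with a fixed two-component Gaussian mixture standing in for the Dirichlet process mixture. All weights, means, and covariances below are illustrative assumptions, not fitted quantities.

```python
import numpy as np

# Two bivariate-normal components: (weight, mean, covariance) -- fixed for illustration.
components = [
    (0.5, np.array([0.0, 0.0]), np.array([[1.0, 0.8], [0.8, 1.0]])),
    (0.5, np.array([3.0, 3.0]), np.array([[1.0, -0.5], [-0.5, 1.0]])),
]

def conditional_quantile(x0, q, grid=np.linspace(-6, 9, 2001)):
    """Quantile of y given x = x0 under the joint mixture, by mixing the
    component conditionals and inverting the resulting CDF on a grid."""
    w, dens = [], []
    for pi, mu, cov in components:
        # The marginal density of x0 under each component reweights that component.
        sx = np.sqrt(cov[0, 0])
        w.append(pi * np.exp(-0.5 * ((x0 - mu[0]) / sx) ** 2) / sx)
        # y | x0 within a component is normal, by the usual regression formulas.
        m = mu[1] + cov[0, 1] / cov[0, 0] * (x0 - mu[0])
        s = np.sqrt(cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0])
        dens.append(np.exp(-0.5 * ((grid - m) / s) ** 2) / s)
    w = np.array(w) / np.sum(w)
    f = sum(wi * di for wi, di in zip(w, dens))
    cdf = np.cumsum(f)
    cdf /= cdf[-1]
    return grid[np.searchsorted(cdf, q)]

median_at_0 = conditional_quantile(0.0, 0.5)
```

Different choices of q trace out different quantile curves from the same fitted joint, which is what lets the approach capture nonlinearity without a separate model per quantile.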
Technometrics | 2009
Matthew A. Taddy; Herbert K. H. Lee; Genetha Anne Gray; Joshua D. Griffin
Optimization for complex systems in engineering often involves the use of expensive computer simulation. By combining statistical emulation using treed Gaussian processes with pattern search optimization, we are able to perform robust local optimization more efficiently and effectively than when using either method alone. Our approach is based on the augmentation of local search patterns with location sets generated through improvement prediction over the input space. We further develop a computational framework for asynchronous parallel implementation of the optimization algorithm. We demonstrate our methods on two standard test problems and our motivating example of calibrating a circuit device simulator.
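The pattern-search half of the hybrid can be sketched on its own. The compass-search variant below polls coordinate directions and shrinks the step on failure; it is a generic sketch on an assumed toy quadratic objective, without the treed-Gaussian-process emulation that augments the search pattern in the paper.

```python
import numpy as np

def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Compass search: poll +/- each coordinate direction; move on any
    improvement, otherwise halve the step, until the step falls below tol."""
    x = np.asarray(x0, float)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(x.size):
            for sign in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5
    return x, fx

# A toy bowl with minimum at (2, -1), standing in for expensive simulator output.
bowl = lambda v: (v[0] - 2.0) ** 2 + (v[1] + 1.0) ** 2
x_star, f_star = compass_search(bowl, [0.0, 0.0])
```

The paper's contribution is, roughly, to add emulator-suggested candidate locations to the poll set above, so promising regions are visited before the step shrinks.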
Journal of the American Statistical Association | 2010
Matthew A. Taddy
This article develops a set of tools for smoothing and prediction with dependent point event patterns. The methodology is motivated by the problem of tracking weekly maps of violent crime events, but is designed to be straightforward to adapt to a wide variety of alternative settings. In particular, a Bayesian semiparametric framework is introduced for modeling correlated time series of marked spatial Poisson processes. The likelihood is factored into two independent components: the set of total integrated intensities and a series of process densities. For the former it is assumed that Poisson intensities are realizations from a dynamic linear model. In the latter case, a novel class of dependent stick-breaking mixture models are proposed to allow nonparametric density estimates to evolve in discrete time. This, a simple and flexible new model for dependent random distributions, is based on autoregressive time series of marginally beta random variables applied as correlated stick-breaking proportions. The approach allows for marginal Dirichlet process priors at each time and adds only a single new correlation term to the static model specification. Sequential Monte Carlo algorithms are described for online inference with each model component, and marginal likelihood calculations form the basis for inference about parameters governing temporal dynamics. Simulated examples are provided to illustrate the methodology, and we close with results for the motivating application of tracking violent crime in Cincinnati.
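One generic way to obtain marginally beta, serially correlated stick-breaking proportions is to push a latent Gaussian AR(1) through a quantile transform. The copula-style sketch below is an illustrative stand-in for the idea of correlated sticks with Beta(1, alpha) marginals, not necessarily the paper's exact autoregressive construction.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def dependent_sticks(T=5, K=20, alpha=2.0, rho=0.9):
    """T time steps of stick-breaking weights whose K stick proportions are
    marginally Beta(1, alpha) but correlated over time through a latent
    Gaussian AR(1); a copula-style transform used purely for illustration."""
    z = rng.standard_normal(K)                     # latent state, one per stick
    weights = np.empty((T, K))
    for t in range(T):
        u = np.vectorize(norm_cdf)(z)              # uniform marginals
        v = 1.0 - (1.0 - u) ** (1.0 / alpha)       # Beta(1, alpha) quantile transform
        w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick-breaking
        weights[t] = w
        z = rho * z + sqrt(1.0 - rho ** 2) * rng.standard_normal(K)  # AR(1) update
    return weights

W = dependent_sticks()
```

Each row is a (truncated) set of mixture weights; rho controls how slowly the implied random distribution evolves from one time step to the next, echoing the single added correlation term described above.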
The Annals of Applied Statistics | 2013
Robert B. Gramacy; Matthew A. Taddy; Stefan M. Wild
We investigate an application in the automatic tuning of computer codes, an area of research that has come to prominence alongside the recent rise of distributed scientific processing and heterogeneity in high-performance computing environments. Here, the response function is nonlinear and noisy and may not be smooth or stationary. Clearly needed are variable selection, decomposition of influence, and analysis of main and secondary effects for both real-valued and binary inputs and outputs. Our contribution is a novel set of tools for variable selection and sensitivity analysis based on the recently proposed dynamic tree model. We argue that this approach is uniquely well suited to the demands of our motivating example. In illustrations on benchmark data sets, we show that the new techniques are faster and offer richer feature sets than do similar approaches in the static tree and computer experiment literature. We apply the methods in code-tuning optimization, examination of a cold-cache effect, and detection of transformation errors.
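The tree-based notion of variable relevance can be caricatured with a single greedy split: score each input by the largest squared-error reduction one split can achieve. This hypothetical toy is far simpler than the dynamic tree machinery, but it shows why partition models expose influential inputs so directly.

```python
import numpy as np

rng = np.random.default_rng(7)

# Response depends strongly on column 0, weakly on column 1, not at all on column 2.
X = rng.uniform(0, 1, size=(2_000, 3))
y = np.where(X[:, 0] > 0.5, 2.0, -2.0) + 0.2 * X[:, 1] + rng.normal(0, 0.1, 2_000)

def best_split_scores(X, y, n_cuts=20):
    """Score each input by the largest drop in total squared error achievable
    with a single split -- a one-node caricature of tree variable selection."""
    base = ((y - y.mean()) ** 2).sum()
    scores = []
    for j in range(X.shape[1]):
        best = 0.0
        for c in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_cuts)):
            left, right = y[X[:, j] <= c], y[X[:, j] > c]
            if len(left) and len(right):
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                best = max(best, base - sse)
        scores.append(best)
    return np.array(scores)

scores = best_split_scores(X, y)
```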
Bayesian Analysis | 2012
Matthew A. Taddy; Athanasios Kottas
We propose a general modeling framework for marked Poisson processes observed over time or space. The modeling approach exploits the connection of the nonhomogeneous Poisson process intensity with a density function. Nonparametric Dirichlet process mixtures for this density, combined with nonparametric or semiparametric modeling for the mark distribution, yield flexible prior models for the marked Poisson process. In particular, we focus on fully nonparametric model formulations that build the mark density and intensity function from a joint nonparametric mixture, and provide guidelines for straightforward application of these techniques. A key feature of such models is that they can yield flexible inference about the conditional distribution for multivariate marks without requiring specification of a complicated dependence scheme. We address issues relating to choice of the Dirichlet process mixture kernels, and develop methods for prior specification and posterior simulation for full inference about functionals of the marked Poisson process. Moreover, we discuss a method for model checking that can be used to assess and compare goodness of fit of different model specifications under the proposed framework. The methodology is illustrated with simulated and real data sets.
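The intensity-density connection the framework exploits also gives a direct simulation recipe: draw the total count from a Poisson distribution with the integrated intensity, draw locations i.i.d. from the density, then draw marks conditional on location. A minimal sketch, with an assumed Beta location density and Gaussian marks:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_marked_pp(total_rate=100.0):
    """Simulate a marked Poisson process on [0, 1] via the decomposition
    intensity = total_rate * density: count, then locations, then marks."""
    n = rng.poisson(total_rate)                     # N ~ Poisson(integrated intensity)
    s = rng.beta(2.0, 5.0, size=n)                  # locations i.i.d. from the density
    marks = rng.normal(np.sin(2 * np.pi * s), 0.3)  # marks depend on location
    return s, marks

s, marks = simulate_marked_pp()
```

Modeling the location density with a nonparametric mixture, as the paper does, leaves this factorization intact while making both the intensity shape and the mark distribution flexible.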
Bayesian Analysis | 2009
Matthew A. Taddy; Athanasios Kottas
Markov switching models can be used to study heterogeneous populations that are observed over time. This paper explores modeling the group characteristics nonparametrically, under both homogeneous and nonhomogeneous Markov switching for group probabilities. The model formulation involves a finite mixture of conditionally independent Dirichlet process mixtures, with a Markov chain defining the mixing distribution. The proposed methodology focuses on settings where the number of subpopulations is small and can be assumed to be known, and flexible modeling is required for group regressions. We develop Dirichlet process mixture prior probability models for the joint distribution of individual group responses and covariates. The implied conditional distribution of the response given the covariates is then used for inference. The modeling framework allows for both non-linearities in the resulting regression functions and non-standard shapes in the response distributions. We design a simulation-based model fitting method for full posterior inference. Furthermore, we propose a general approach for inclusion of external covariates dependent on the Markov chain but conditionally independent from the response. The methodology is applied to a problem from fisheries research involving analysis of stock-recruitment data under shifts in the ecosystem state.
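The data-generating picture, a hidden Markov chain selecting which group regression produces each observation, can be sketched as follows; the transition matrix and the two group regressions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two hidden regimes, each with its own regression line, switching by a Markov chain.
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])                # transition probabilities
slopes, intercepts = [1.0, -1.0], [0.0, 2.0]

def simulate_switching(T=300):
    states = np.empty(T, dtype=int)
    states[0] = 0
    for t in range(1, T):
        states[t] = rng.choice(2, p=P[states[t - 1]])   # regime evolves as a Markov chain
    x = rng.uniform(-1, 1, T)
    y = np.array([intercepts[k] + slopes[k] * xi for k, xi in zip(states, x)])
    y += rng.normal(0, 0.2, T)                          # observation noise
    return states, x, y

states, x, y = simulate_switching()
```

The paper replaces each regime's simple linear regression with a Dirichlet process mixture, so the within-group regressions themselves can be nonlinear and non-Gaussian.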
IEEE Transactions on Geoscience and Remote Sensing | 2008
Robin D. Morris; Athanasios Kottas; Matthew A. Taddy; Roberto Furfaro; B. D. Ganapol
Process models are widely used tools, both for studying fundamental processes themselves and as elements of larger system studies. A radiative transfer model (RTM) simulates the interaction of light with a medium. We are interested in RTMs that model light reflected from a vegetated region. Such an RTM takes as input various biospheric and illumination parameters and computes the upwelling radiation at the top of the canopy. The question we address is as follows: Which of the inputs to the RTM has the greatest impact on the computed observation? We study the leaf canopy model (LCM) RTM, which was designed to study the feasibility of observing leaf chemistry remotely. Its inputs are leaf chemistry variables (chlorophyll, water, lignin, and cellulose) and canopy structural parameters (leaf area index, leaf angle distribution, soil reflectance, and sun angle). We present a statistical approach to the sensitivity analysis of RTMs to answer the question previously posed. The focus is on global sensitivity analysis, studying how the RTM output changes as the inputs vary continuously according to a probability distribution over the input space. The influence of each input variable is captured through the "main effects" and "sensitivity indices." Direct computation requires extensive computationally expensive runs of the RTM. We develop a Gaussian process approximation to the RTM output to enable efficient computation. We illustrate how the approach can effectively determine the inputs that are vital for accurate prediction. The methods are applied to the LCM with seven inputs and output obtained at eight wavelengths associated with Moderate-resolution Imaging Spectroradiometer bands that are sensitive to vegetation.
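When the simulator is cheap, the first-order quantities involved, Var(E[Y | X_i]) / Var(Y), can be estimated by brute-force Monte Carlo with no emulator at all; the Gaussian process approximation matters precisely because an RTM is too expensive for this. The sketch below uses an assumed additive toy function whose indices are known analytically, estimating each index by binning one input and averaging the output within bins.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    # Toy "simulator": additive, so the sensitivity indices are known exactly.
    return 4.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]

N = 200_000
X = rng.uniform(0, 1, size=(N, 3))
Y = f(X)

def main_effect_index(i, bins=50):
    """First-order sensitivity index Var(E[Y | X_i]) / Var(Y), estimated by
    binning X_i and averaging Y within each bin."""
    edges = np.linspace(0, 1, bins + 1)
    idx = np.clip(np.digitize(X[:, i], edges) - 1, 0, bins - 1)
    bin_means = np.array([Y[idx == b].mean() for b in range(bins)])
    return bin_means.var() / Y.var()

S = [main_effect_index(i) for i in range(3)]
```

For this toy the true indices are 16/17, 1/17, and 0; an emulator-based approach replaces the 200,000 simulator calls with a Gaussian process fit to a small design.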
Journal of Classification | 2010
Herbert K. H. Lee; Matthew A. Taddy; Genetha Anne Gray
Sometimes a larger dataset needs to be reduced to just a few points, and it is desirable that these points be representative of the whole dataset. If the future uses of these points are not fully specified in advance, standard decision-theoretic approaches will not work. We present here methodology for choosing a small representative sample based on a mixture modeling approach.
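A simple stand-in for the mixture-based selection is to fit cluster centers and then return the actual data point nearest each center, so that every representative is a member of the original dataset. The sketch below uses plain k-means as an assumed simplification of the mixture-model approach.

```python
import numpy as np

rng = np.random.default_rng(5)

def representative_points(data, k=3, n_iter=50):
    """Pick k representatives: run basic k-means (Lloyd's algorithm), then
    return the actual data point nearest each centroid."""
    # Initialize with evenly spaced rows; adequate for this illustration.
    centers = data[np.linspace(0, len(data) - 1, k).astype(int)]
    for _ in range(n_iter):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return data[d.argmin(axis=0)]

# Three well-separated clusters; one representative should come from each.
data = np.concatenate([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 5.0, 10.0)])
reps = representative_points(data, k=3)
```

Returning members of the dataset, rather than the centroids themselves, matters when the eventual (unspecified) use requires genuine observations.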
Inverse Problems | 2009
Matthew A. Taddy; Herbert K. H. Lee; Bruno Sansó
Computer models for the simulation of physical and environmental phenomena are often regulated by complicated dependences on unknown variables, and these unobservable inputs must be inferred from a comparison of simulator output against physical data. The standard Bayesian statistical approaches to this inference problem require fitting a complicated statistical model to the existing parameter evaluations, usually through use of a Markov chain Monte Carlo sampling scheme. When there already exists a large bank of simulated values, it may be undesirable to develop a sophisticated statistical surrogate or to sample additional output from the computer simulator. In response to this motivation, we discuss a sampling importance resampling algorithm for Bayesian inference in inverse problems that works in conjunction with kernel density estimation to resample, from the original computer output, an approximate posterior sample for the unobservable variables of interest. Given a sufficiently large bank of computer output, our resampling method is able to provide high-quality results at a much lower cost than the standard Bayesian techniques. We present two applications where unobservable inputs are to be inferred from scarce observations and abundant simulated output. One consists of a climate simulator and the other of a groundwater flow model.
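The resampling step can be sketched in a few lines: given a pre-computed bank of input-output pairs, weight each banked input by a Gaussian kernel comparing its output to the observation, then resample inputs in proportion to those weights. The simulator, prior, observation, and bandwidth below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# A pre-existing "bank" of simulator runs: prior draws and their outputs.
theta_bank = rng.uniform(0.0, 5.0, 50_000)
output_bank = theta_bank ** 2            # stand-in simulator, run up front

def sir_posterior(y_obs, n_resample=5_000, h=0.5):
    """Sampling importance resampling: weight each banked run by a Gaussian
    kernel comparing its output to the observation, then resample inputs."""
    w = np.exp(-0.5 * ((y_obs - output_bank) / h) ** 2)
    w /= w.sum()
    return rng.choice(theta_bank, size=n_resample, p=w)

posterior = sir_posterior(y_obs=4.0)     # approximate posterior sample for theta
```

No new simulator runs and no surrogate fitting are required, which is the point: the cost is one pass of kernel weighting over output that already exists.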