Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where David A. van Dyk is active.

Publication


Featured research published by David A. van Dyk.


Journal of the American Statistical Association | 2004

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Kosuke Imai; David A. van Dyk

In this article we develop the theoretical properties of the propensity function, which is a generalization of the propensity score of Rosenbaum and Rubin. Methods based on the propensity score have long been used for causal inference in observational studies; they are easy to use and can effectively reduce the bias caused by nonrandom treatment assignment. Although treatment regimes need not be binary in practice, the propensity score methods are generally confined to binary treatment scenarios. Two possible exceptions have been suggested for ordinal and categorical treatments. In this article we develop theory and methods that encompass all of these techniques and widen their applicability by allowing for arbitrary treatment regimes. We illustrate our propensity function methods by applying them to two datasets; we estimate the effect of smoking on medical expenditure and the effect of schooling on wages. We also conduct simulation studies to investigate the performance of our methods.
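
As a concrete point of reference, here is a minimal sketch (a hypothetical illustration with simulated data, not the authors' implementation) of the kind of procedure the propensity function generalizes: model a continuous treatment given covariates, subclassify on the fitted linear predictor, and average the within-subclass effect estimates. All variable names and the simulated data are assumptions for illustration only.

    # Minimal sketch: propensity-function-style subclassification for a
    # continuous treatment, assuming T | X ~ N(X beta, sigma^2).
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 1000, 3
    X = rng.normal(size=(n, p))
    T = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)   # treatment
    Y = 2.0 * T + X.sum(axis=1) + rng.normal(size=n)          # outcome (true effect 2.0)

    # Step 1: model the treatment given covariates; the fitted linear predictor
    # summarizes the (Gaussian) propensity function.
    Xd = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(Xd, T, rcond=None)
    theta_hat = Xd @ beta_hat

    # Step 2: subclassify on theta_hat, fit Y ~ T within each subclass,
    # and average the within-subclass slopes weighted by subclass size.
    n_subclasses = 5
    edges = np.quantile(theta_hat, np.linspace(0, 1, n_subclasses + 1))
    labels = np.clip(np.searchsorted(edges, theta_hat, side="right") - 1,
                     0, n_subclasses - 1)

    effects, weights = [], []
    for s in range(n_subclasses):
        idx = labels == s
        A = np.column_stack([np.ones(idx.sum()), T[idx]])
        coef, *_ = np.linalg.lstsq(A, Y[idx], rcond=None)
        effects.append(coef[1])
        weights.append(idx.sum())

    ate_hat = np.average(effects, weights=weights)
    print(f"estimated effect of T on Y: {ate_hat:.2f}  (true effect 2.0)")

With a binary treatment, this reduces to the familiar propensity-score subclassification of Rosenbaum and Rubin; the paper's propensity function extends the idea to arbitrary treatment regimes.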


Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 1997

The EM algorithm: an old folk-song sung to a fast new tune

Xiao-Li Meng; David A. van Dyk

Celebrating the 20th anniversary of the presentation of the paper by Dempster, Laird and Rubin which popularized the EM algorithm, we investigate, after a brief historical account, strategies that aim to make the EM algorithm converge faster while maintaining its simplicity and stability (e.g. automatic monotone convergence in likelihood). First we introduce the idea of a ‘working parameter’ to facilitate the search for efficient data augmentation schemes and thus fast EM implementations. Second, summarizing various recent extensions of the EM algorithm, we formulate a general alternating expectation–conditional maximization algorithm AECM that couples flexible data augmentation schemes with model reduction schemes to achieve efficient computations. We illustrate these methods using multivariate t-models with known or unknown degrees of freedom and Poisson models for image reconstruction. We show, through both empirical and theoretical evidence, the potential for a dramatic reduction in computational time with little increase in human effort. We also discuss the intrinsic connection between EM-type algorithms and the Gibbs sampler, and the possibility of using the techniques presented here to speed up the latter. The main conclusion of the paper is that, with the help of statistical considerations, it is possible to construct algorithms that are simple, stable and fast.
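
For orientation, here is a minimal sketch (an assumed toy example, not the paper's code) of the standard EM iteration for a univariate t-model with known degrees of freedom, one of the model classes used in the paper; the working-parameter and AECM ideas build on iterations of exactly this form to speed up convergence.

    # Minimal sketch: EM for the location and scale of a univariate t-model
    # with known degrees of freedom nu. Data and starting values are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    nu = 4.0
    y = 3.0 + 2.0 * rng.standard_t(df=nu, size=500)   # data with mu = 3, sigma = 2

    mu, sigma2 = y.mean(), y.var()                    # starting values
    for _ in range(200):
        # E-step: expected latent weights given the current (mu, sigma2)
        w = (nu + 1.0) / (nu + (y - mu) ** 2 / sigma2)
        # M-step: weighted updates; each iteration cannot decrease the likelihood
        mu = np.sum(w * y) / np.sum(w)
        sigma2 = np.mean(w * (y - mu) ** 2)

    print(f"mu_hat = {mu:.2f}, sigma_hat = {np.sqrt(sigma2):.2f}")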


The Astrophysical Journal | 2002

STATISTICS, HANDLE WITH CARE: DETECTING MULTIPLE MODEL COMPONENTS WITH THE LIKELIHOOD RATIO TEST

Rostislav Protassov; David A. van Dyk; Alanna Connors; Vinay L. Kashyap; Aneta Siemiginowska

The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ² and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ² distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive scientific results into question.
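
To make the boundary problem concrete, the sketch below (an assumed toy setup, not the paper's analysis) calibrates a likelihood ratio statistic by direct simulation under a known null model instead of using its nominal chi-square reference distribution; the paper's posterior predictive p-values extend this idea by also averaging over posterior uncertainty in the null-model parameters.

    # Minimal sketch: a toy Poisson problem in which the extra-component
    # intensity sits on the boundary (lam >= 0), so the nominal chi-square
    # reference distribution for the LRT is wrong.
    import numpy as np
    from scipy import stats

    def lrt_stat(counts, background):
        """LRT for H0: counts ~ Poisson(background) vs H1: Poisson(background + lam), lam >= 0."""
        lam_hat = max(counts.mean() - background, 0.0)
        ll0 = stats.poisson.logpmf(counts, background).sum()
        ll1 = stats.poisson.logpmf(counts, background + lam_hat).sum()
        return 2.0 * (ll1 - ll0)

    rng = np.random.default_rng(2)
    background = 5.0
    observed = rng.poisson(background, size=50)        # toy "observed" data (no source)
    t_obs = lrt_stat(observed, background)

    # Reference distribution of the statistic under the null, by simulation.
    t_rep = np.array([lrt_stat(rng.poisson(background, size=50), background)
                      for _ in range(2000)])
    p_sim = np.mean(t_rep >= t_obs)
    p_chi2 = stats.chi2.sf(t_obs, df=1)                # nominal p-value, not valid here
    print(f"simulation-based p = {p_sim:.3f}, nominal chi^2 p = {p_chi2:.3f}")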


Journal of Computational and Graphical Statistics | 2001

The art of data augmentation

David A. van Dyk; Xiao-Li Meng

The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong's Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang's algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such efficient data augmentation schemes, we introduce an effective search strategy that combines the ideas of marginal augmentation and conditional augmentation, together with a deterministic approximation method for selecting good augmentation schemes. We then apply this strategy to three common classes of models (specifically, multivariate t, probit regression, and mixed-effects models) to obtain efficient Markov chain Monte Carlo algorithms for posterior sampling. We provide theoretical and empirical evidence that the resulting algorithms, while requiring similar programming effort, can show dramatic improvement over the Gibbs samplers commonly used for these models in practice. A key feature of all these new algorithms is that they are positive recurrent subchains of nonpositive recurrent Markov chains constructed in larger spaces.
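
The probit regression case mentioned above has a particularly transparent data augmentation scheme. The sketch below (an assumed example with simulated data, not the authors' code) implements the standard latent-variable Gibbs sampler for probit regression under a flat prior; the marginal and conditional augmentation strategies developed in the paper are refinements of samplers of exactly this type.

    # Minimal sketch: data-augmentation Gibbs sampler for probit regression,
    # introducing a truncated-normal latent z_i for each binary observation.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, p = 500, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta_true = np.array([0.5, -1.0])
    y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

    XtX_inv = np.linalg.inv(X.T @ X)                  # flat prior on beta for simplicity
    beta = np.zeros(p)
    draws = []
    for it in range(2000):
        # Sample z_i ~ N(x_i' beta, 1), truncated to (0, inf) if y_i = 1, else (-inf, 0)
        mean = X @ beta
        lo = np.where(y == 1, 0.0, -np.inf)
        hi = np.where(y == 1, np.inf, 0.0)
        z = stats.truncnorm.rvs(lo - mean, hi - mean, loc=mean, scale=1.0,
                                random_state=rng)
        # Sample beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        m = XtX_inv @ X.T @ z
        beta = rng.multivariate_normal(m, XtX_inv)
        draws.append(beta)

    print("posterior means:", np.mean(draws[500:], axis=0), "  true:", beta_true)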


Journal of the American Statistical Association | 2008

Partially Collapsed Gibbs Samplers: Theory and Methods

David A. van Dyk; Taeyoung Park

Ever-increasing computational power, along with ever-more sophisticated statistical computing techniques, is making it possible to fit ever-more complex statistical models. Among the more computationally intensive methods, the Gibbs sampler is popular because of its simplicity and power to effectively generate samples from a high-dimensional probability distribution. Despite its simple implementation and description, however, the Gibbs sampler is criticized for its sometimes slow convergence, especially when it is used to fit highly structured complex models. Here we present partially collapsed Gibbs sampling strategies that improve the convergence by capitalizing on a set of functionally incompatible conditional distributions. Such incompatibility generally is avoided in the construction of a Gibbs sampler, because the resulting convergence properties are not well understood. We introduce three basic tools (marginalization, permutation, and trimming) that allow us to transform a Gibbs sampler into a partially collapsed Gibbs sampler with known stationary distribution and faster convergence.
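
A toy illustration of the marginalization tool (an assumed example, not one of the paper's applications): in a two-level normal model, the step that updates mu can either condition on the latent theta (ordinary Gibbs) or have theta integrated out of it, and the collapsed version mixes noticeably better. Partially collapsed Gibbs samplers apply this kind of marginalization, together with permutation and trimming, to only some components of a larger sampler. The model, priors, and variable names below are illustrative assumptions.

    # Minimal sketch: y_i | theta_i ~ N(theta_i, 1), theta_i | mu ~ N(mu, 1),
    # flat prior on mu. Compare lag-1 autocorrelation of the mu draws.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    theta_true = rng.normal(1.0, 1.0, size=n)
    y = rng.normal(theta_true, 1.0)

    def ordinary_gibbs(n_iter=5000):
        mu, out = 0.0, []
        for _ in range(n_iter):
            theta = rng.normal((y + mu) / 2.0, np.sqrt(0.5))   # theta_i | mu, y_i
            mu = rng.normal(theta.mean(), np.sqrt(1.0 / n))    # mu | theta
            out.append(mu)
        return np.array(out)

    def collapsed_gibbs(n_iter=5000):
        out = []
        for _ in range(n_iter):
            mu = rng.normal(y.mean(), np.sqrt(2.0 / n))        # mu | y (theta marginalized out)
            theta = rng.normal((y + mu) / 2.0, np.sqrt(0.5))   # theta_i | mu, y_i
            out.append(mu)
        return np.array(out)

    def lag1_autocorr(x):
        x = x - x.mean()
        return np.dot(x[:-1], x[1:]) / np.dot(x, x)

    print("lag-1 autocorrelation of mu draws:")
    print("  ordinary Gibbs  :", round(lag1_autocorr(ordinary_gibbs()), 3))
    print("  collapsed step  :", round(lag1_autocorr(collapsed_gibbs()), 3))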


The Astrophysical Journal | 2006

BAYESIAN ESTIMATION OF HARDNESS RATIOS: MODELING AND COMPUTATIONS

Taeyoung Park; Vinay L. Kashyap; Aneta Siemiginowska; David A. van Dyk; A. L. Zezas; C.O. Heinke; Bradford J. Wargelin

A commonly used measure to summarize the nature of a photon spectrum is the so-called hardness ratio, which compares the numbers of counts observed in different passbands. The hardness ratio is especially useful to distinguish between and categorize weak sources as a proxy for detailed spectral fitting. However, in this regime classical methods of error propagation fail, and the estimates of spectral hardness become unreliable. Here we develop a rigorous statistical treatment of hardness ratios that properly deals with detected photons as independent Poisson random variables and correctly deals with the non-Gaussian nature of the error propagation. The method is Bayesian in nature and thus can be generalized to carry out a multitude of source-population-based analyses. We verify our method with simulation studies and compare it with the classical method. We apply this method to real-world examples, such as the identification of candidate quiescent low-mass X-ray binaries in globular clusters and tracking the time evolution of a flare on a low-mass star.
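
A minimal sketch of the underlying computation (an assumed simplification, not the authors' implementation): treat the soft and hard counts as independent Poisson variables with gamma priors on their intensities and propagate the exact posteriors to the hardness ratio by Monte Carlo rather than classical error propagation. Background contamination and the paper's quadrature and Gibbs schemes are omitted, and the prior shape parameter below is an illustrative choice.

    # Minimal sketch: Bayesian hardness ratio from two Poisson counts.
    import numpy as np

    rng = np.random.default_rng(5)
    soft_counts, hard_counts = 3, 8        # observed counts in the two passbands
    alpha = 0.5                            # weakly informative gamma prior shape

    # Posterior for each Poisson intensity under this prior is gamma(counts + alpha, 1).
    lam_soft = rng.gamma(soft_counts + alpha, 1.0, size=100_000)
    lam_hard = rng.gamma(hard_counts + alpha, 1.0, size=100_000)

    hr = (lam_hard - lam_soft) / (lam_hard + lam_soft)    # "color"-style hardness ratio
    lo, med, hi = np.percentile(hr, [16, 50, 84])
    print(f"hardness ratio: {med:.2f}  (68% interval {lo:.2f} to {hi:.2f})")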


The Astrophysical Journal | 2001

ANALYSIS OF ENERGY SPECTRA WITH LOW PHOTON COUNTS VIA BAYESIAN POSTERIOR SIMULATION

David A. van Dyk; Alanna Connors; Vinay L. Kashyap; Aneta Siemiginowska

Over the past 10 years Bayesian methods have rapidly grown more popular in many scientific disciplines as several computationally intensive statistical algorithms have become feasible with increased computer power. In this paper we begin with a general description of the Bayesian paradigm for statistical inference and the various state-of-the-art model-fitting techniques that we employ (e.g., the Gibbs sampler and the Metropolis-Hastings algorithm). These algorithms are very flexible and can be used to fit models that account for the highly hierarchical structure inherent in the collection of high-quality spectra and thus can keep pace with the accelerating progress of new space telescope designs. The methods we develop, which will soon be available in the Chandra Interactive Analysis of Observations (CIAO) software, explicitly model photon arrivals as a Poisson process and thus have no difficulty with high-resolution low-count X-ray and γ-ray data. We expect these methods to be useful not only for the recently launched Chandra X-Ray Observatory and XMM but also for new generation telescopes such as Constellation X, GLAST, etc. In the context of two examples (quasar S5 0014+813 and hybrid-chromosphere supergiant star α TrA), we illustrate a new highly structured model and how Bayesian posterior sampling can be used to compute estimates, error bars, and credible intervals for the various model parameters. Application of our method to the high-energy tail of the ASCA spectrum of α TrA confirms that even at a quiescent state, the coronal plasma on this hybrid-chromosphere star is indeed at high temperatures (>10 MK) that normally characterize flaring plasma on the Sun. We are also able to constrain the coronal metallicity and find that although it is subject to large uncertainties, it is consistent with the photospheric measurements.
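
The core of such an analysis can be illustrated with a deliberately stripped-down sketch (an assumed toy model, not the CIAO implementation referred to above): binned counts are modeled as Poisson draws from a power-law spectrum, and the two spectral parameters are sampled with random-walk Metropolis-Hastings. Instrument response, background, absorption, and the hierarchical structure handled in the paper are all omitted.

    # Minimal sketch: Poisson-likelihood fit of a power-law spectrum by MCMC.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    energy = np.linspace(0.5, 8.0, 60)                    # keV bin centers (toy grid)
    norm_true, gamma_true = 50.0, 1.7
    counts = rng.poisson(norm_true * energy ** (-gamma_true))

    def log_post(norm, gamma):
        if norm <= 0:
            return -np.inf
        mu = norm * energy ** (-gamma)
        return stats.poisson.logpmf(counts, mu).sum()      # flat priors for simplicity

    norm, gamma = 30.0, 1.0
    lp = log_post(norm, gamma)
    samples = []
    for _ in range(20_000):
        norm_p, gamma_p = norm + rng.normal(0, 2.0), gamma + rng.normal(0, 0.05)
        lp_p = log_post(norm_p, gamma_p)
        if np.log(rng.uniform()) < lp_p - lp:              # Metropolis accept/reject
            norm, gamma, lp = norm_p, gamma_p, lp_p
        samples.append((norm, gamma))

    post = np.array(samples[5000:])
    print("posterior means:", post.mean(axis=0), "  true:", (norm_true, gamma_true))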


The Astrophysical Journal | 2015

STRONG LENS TIME DELAY CHALLENGE. II. RESULTS OF TDC1

Kai Liao; Tommaso Treu; Phil Marshall; C. D. Fassnacht; N. Rumbaugh; Gregory Dobler; Amir Aghamousa; V. Bonvin; F. Courbin; Alireza Hojjati; N. Jackson; Vinay L. Kashyap; S. Rathna Kumar; Eric V. Linder; Kaisey S. Mandel; Xiao-Li Meng; G. Meylan; Leonidas A. Moustakas; T. P. Prabhu; Andrew Romero-Wolf; Arman Shafieloo; Aneta Siemiginowska; C. S. Stalin; Hyungsuk Tak; M. Tewes; David A. van Dyk

We present the results of the first strong lens time delay challenge. The motivation, experimental design, and entry-level challenge are described in a companion paper. This paper presents the main challenge, TDC1, which consisted of analyzing thousands of simulated light curves blindly. The observational properties of the light curves cover the range in quality obtained for current targeted efforts (e.g., COSMOGRAIL) and expected from future synoptic surveys (e.g., LSST), and include simulated systematic errors. Several teams participated in TDC1, submitting results from a number of different method variants. After describing each method, we compute and analyze basic statistics measuring accuracy (or bias).
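
For orientation, a deliberately naive delay estimator is sketched below (an assumed toy method, not any TDC1 entry): shift one evenly sampled light curve against the other on a grid of trial delays and pick the shift that minimizes the mean squared difference. Real submissions must additionally handle irregular sampling, season gaps, microlensing, and the simulated systematic errors mentioned above.

    # Minimal sketch: grid-search time delay estimate for two toy light curves.
    import numpy as np

    rng = np.random.default_rng(7)
    t = np.arange(0.0, 500.0, 1.0)                         # days
    signal = np.sin(t / 40.0) + 0.5 * np.sin(t / 9.0)
    true_delay = 30                                        # days (integer, for simplicity)
    curve_a = signal + rng.normal(0, 0.05, t.size)
    curve_b = np.roll(signal, true_delay) + rng.normal(0, 0.05, t.size)

    delays = np.arange(-60, 61)
    costs = [np.mean((np.roll(curve_a, d) - curve_b) ** 2) for d in delays]
    print("estimated delay:", delays[int(np.argmin(costs))],
          "days  (true:", true_delay, "days)")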


The Astrophysical Journal | 2010

On Computing Upper Limits to Source Intensities

Vinay L. Kashyap; David A. van Dyk; Alanna Connors; Peter E. Freeman; Aneta Siemiginowska; Jin Xu; A. L. Zezas



Journal of Computational and Graphical Statistics | 2009

Partially Collapsed Gibbs Samplers: Illustrations and Applications

Taeyoung Park; David A. van Dyk


Collaboration


Dive into David A. van Dyk's collaborations.

Top Co-Authors

Alanna Connors, University of New Hampshire
Nathan Stein, University of Pennsylvania
Elizabeth Jeffery, University of Texas at Austin