Publications


Featured research published by Edward I. George.


The American Statistician | 1992

Explaining the Gibbs Sampler

George Casella; Edward I. George

Computer-intensive algorithms, such as the Gibbs sampler, have become increasingly popular statistical tools, both in applied and theoretical work. The properties of such algorithms, however, may sometimes not be obvious. Here we give a simple explanation of how and why the Gibbs sampler works. We analytically establish its properties in a simple case and provide insight for more complicated cases. There are also a number of examples.
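
To make the "how" concrete, here is a minimal sketch of a two-step Gibbs sampler for a beta-binomial pair of the kind the paper uses as a running example; the function name, hyperparameter defaults, and starting value are illustrative choices, not the paper's code.

```python
import numpy as np

def gibbs_beta_binomial(n=16, a=2.0, b=4.0, iters=5000, burn=500, seed=0):
    """Alternate draws from X | Y ~ Binomial(n, y) and
    Y | X ~ Beta(x + a, n - x + b); the stationary marginal
    of X is then beta-binomial(n, a, b)."""
    rng = np.random.default_rng(seed)
    y = 0.5                                  # arbitrary starting value
    xs = []
    for t in range(iters):
        x = rng.binomial(n, y)               # draw X given the current Y
        y = rng.beta(x + a, n - x + b)       # draw Y given the current X
        if t >= burn:                        # discard burn-in draws
            xs.append(x)
    return np.array(xs)

draws = gibbs_beta_binomial()
print("estimated E[X]:", draws.mean())       # true value n*a/(a+b) = 16/3
```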


Journal of the American Statistical Association | 1993

Variable Selection via Gibbs Sampling

Edward I. George; Robert E. McCulloch

A crucial problem in building a multiple regression model is the selection of predictors to include. The main thrust of this article is to propose and develop a procedure that uses probabilistic considerations for selecting promising subsets. This procedure entails embedding the regression setup in a hierarchical normal mixture model where latent variables are used to identify subset choices. In this framework the promising subsets of predictors can be identified as those with higher posterior probability. The computational burden is then alleviated by using the Gibbs sampler to indirectly sample from this multinomial posterior distribution on the set of possible subset choices. Those subsets with higher probability—the promising ones—can then be identified by their more frequent appearance in the Gibbs sample.
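
A minimal sketch of this style of sampler, assuming a known error variance and fixed hyperparameters for brevity; the `ssvs` function, its defaults, and the simulated data are illustrative, not the paper's implementation.

```python
import numpy as np

def _npdf(x, sd):
    """Normal density with mean 0 and standard deviation sd."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def ssvs(X, y, tau=0.05, c=10.0, p_incl=0.5, sigma2=1.0, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.ones(p, dtype=int)            # latent inclusion indicators
    XtX, Xty = X.T @ X, X.T @ y
    incl = np.zeros(p)
    for t in range(iters):
        # beta | gamma, y: conjugate normal (the mixture prior is
        # conditionally normal given the indicators)
        prior_sd = np.where(gamma == 1, c * tau, tau)
        cov = np.linalg.inv(XtX / sigma2 + np.diag(1.0 / prior_sd ** 2))
        beta = rng.multivariate_normal(cov @ Xty / sigma2, cov)
        # gamma_j | beta_j: Bernoulli, comparing slab and spike densities
        slab = p_incl * _npdf(beta, c * tau)
        spike = (1 - p_incl) * _npdf(beta, tau)
        gamma = rng.binomial(1, slab / (slab + spike))
        incl += gamma
    return incl / iters                      # posterior inclusion frequencies

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.5, 0.0]) + rng.standard_normal(100)
print(ssvs(X, y))                            # high for predictors 0 and 3
```

The inclusion frequencies returned here are the Gibbs-sample analogue of the "more frequent appearance" criterion described above.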


Journal of the American Statistical Association | 1998

Bayesian CART model search

Hugh A. Chipman; Edward I. George; Robert E. McCulloch

In this article we put forward a Bayesian approach for finding classification and regression tree (CART) models. The two basic components of this approach consist of prior specification and stochastic search. The basic idea is to have the prior induce a posterior distribution that will guide the stochastic search toward more promising CART models. As the search proceeds, such models can then be selected with a variety of criteria, such as posterior probability, marginal likelihood, residual sum of squares or misclassification rates. Examples are used to illustrate the potential superiority of this approach over alternative methods.
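
As a toy illustration of posterior-guided stochastic search, the sketch below runs Metropolis-Hastings over a drastically restricted tree space: the root-only tree versus all single-split stumps, with conjugate normal leaf means and known error variance so that each tree's marginal likelihood is available in closed form. This is my own simplified construction, not the paper's grow/prune/change/swap sampler, and it assumes distinct covariate values so the split proposal is uniform.

```python
import numpy as np

SIGMA2, TAU2, P_SPLIT = 1.0, 4.0, 0.5        # assumed hyperparameters

def leaf_logml(y):
    """Closed-form log marginal likelihood of one leaf, with the leaf
    mean integrated out under a N(0, TAU2) prior and known SIGMA2."""
    n, s, ss = len(y), y.sum(), (y ** 2).sum()
    return -0.5 * (n * np.log(2 * np.pi * SIGMA2)
                   + np.log(1 + n * TAU2 / SIGMA2)
                   + (ss - TAU2 * s ** 2 / (SIGMA2 + n * TAU2)) / SIGMA2)

def tree_score(tree, X, y):
    """Log prior plus log marginal likelihood; tree is None (root only)
    or a (variable, threshold) stump."""
    if tree is None:
        return np.log(1 - P_SPLIT) + leaf_logml(y)
    var, thr = tree
    left = X[:, var] <= thr
    if left.all() or not left.any():         # reject splits with an empty leaf
        return -np.inf
    return np.log(P_SPLIT) + leaf_logml(y[left]) + leaf_logml(y[~left])

def mh_tree_search(X, y, iters=4000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    state, score, visits = None, tree_score(None, X, y), {}
    for _ in range(iters):
        if state is None:                    # GROW: uniform (variable, threshold)
            var = int(rng.integers(p))
            prop = (var, X[rng.integers(n), var])
            log_q = np.log(n * p)            # proposal-asymmetry correction
        else:                                # PRUNE: back to the root tree
            prop, log_q = None, -np.log(n * p)
        new = tree_score(prop, X, y)
        if np.log(rng.random()) < new - score + log_q:
            state, score = prop, new
        visits[state] = visits.get(state, 0) + 1
    return max(visits, key=visits.get)       # most frequently visited tree
```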


Journal of the American Statistical Association | 2000

The Variable Selection Problem

Edward I. George

The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments that have led to the wide variety of approaches for this problem.
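
For scale, the brute-force version of the problem looks like the sketch below: with p candidate predictors there are 2^p subsets, and scoring each one (here with BIC, one classical criterion among the many approaches the vignette covers) is feasible only for small p, which is what motivates search-based methods like those above.

```python
import itertools
import numpy as np

def best_subset_bic(X, y):
    """Exhaustively score every subset of predictors by BIC."""
    n, p = X.shape
    best = (np.inf, ())
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
            rss = ((y - Xs @ beta) ** 2).sum()
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)
            best = min(best, (bic, subset))
    return best          # (BIC value, indices of the chosen predictors)
```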


Machine Learning | 2002

Bayesian Treed Models

Hugh A. Chipman; Edward I. George; Robert E. McCulloch

When simple parametric models such as linear regression fail to adequately approximate a relationship across an entire set of data, an alternative may be to consider a partition of the data, and then use a separate simple model within each subset of the partition. Such an alternative is provided by a treed model which uses a binary tree to identify such a partition. However, treed models go further than conventional trees (e.g. CART, C4.5) by fitting models rather than a simple mean or proportion within each subset. In this paper, we propose a Bayesian approach for finding and fitting parametric treed models, in particular focusing on Bayesian treed regression. The potential of this approach is illustrated by a cross-validation comparison of predictive performance with neural nets, MARS, and conventional trees on simulated and real data sets.
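
A minimal sketch of the model class itself, not the paper's Bayesian search: a depth-one treed regression that picks a single split greedily by residual sum of squares and fits a separate linear model in each leaf. Function names and the `min_leaf` guard are illustrative assumptions.

```python
import numpy as np

def fit_leaf(X, y):
    """Ordinary least squares with an intercept; returns (coefs, RSS)."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return beta, ((y - Z @ beta) ** 2).sum()

def treed_regression(X, y, min_leaf=10):
    """Search all single splits; fit a separate linear model per leaf."""
    n, p = X.shape
    best = None
    for var in range(p):
        for thr in np.unique(X[:, var]):
            left = X[:, var] <= thr
            if min_leaf <= left.sum() <= n - min_leaf:
                bl, rl = fit_leaf(X[left], y[left])
                br, rr = fit_leaf(X[~left], y[~left])
                if best is None or rl + rr < best[0]:
                    best = (rl + rr, var, thr, bl, br)
    return best  # (RSS, split variable, threshold, left coefs, right coefs)
```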


Journal of the Royal Statistical Society: Series B (Statistical Methodology) | 2000

Flexible empirical Bayes estimation for wavelets

Merlise A. Clyde; Edward I. George

Wavelet shrinkage estimation is an increasingly popular method for signal denoising and compression. Although Bayes estimators can provide excellent mean-squared error (MSE) properties, the selection of an effective prior is a difficult task. To address this problem, we propose empirical Bayes (EB) prior selection methods for various error distributions including the normal and the heavier-tailed Student t-distributions. Under such EB prior distributions, we obtain threshold shrinkage estimators based on model selection, and multiple-shrinkage estimators based on model averaging. These EB estimators are seen to be computationally competitive with standard classical thresholding methods, and to be robust to outliers in both the data and wavelet domains. Simulated and real examples are used to illustrate the flexibility and improved MSE performance of these methods in a wide variety of settings.
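
A minimal sketch of the idea for a single resolution level of coefficients, assuming normal errors with known variance and a point-mass-at-zero/normal mixture prior on each true coefficient: the mixing weight is estimated from the coefficients by EM on the marginal likelihood, and each coefficient is then replaced by its posterior mean, a simple multiple-shrinkage rule. Names and hyperparameter values are illustrative; the paper's estimators are richer (Student-t errors, thresholding via model selection).

```python
import numpy as np

def _npdf(x, var):
    """Normal density with mean zero and variance var."""
    return np.exp(-0.5 * x ** 2 / var) / np.sqrt(2 * np.pi * var)

def eb_shrink(d, sigma2=1.0, tau2=9.0, em_steps=50):
    """Empirical Bayes posterior-mean shrinkage of coefficients d."""
    w = 0.5                                  # initial slab (signal) weight
    for _ in range(em_steps):                # EM for the marginal mixture
        slab = w * _npdf(d, sigma2 + tau2)
        spike = (1 - w) * _npdf(d, sigma2)
        post = slab / (slab + spike)         # P(coefficient is signal | d)
        w = post.mean()                      # update the mixing weight
    return post * (tau2 / (sigma2 + tau2)) * d

rng = np.random.default_rng(2)
theta = np.where(rng.random(256) < 0.1, rng.normal(0.0, 3.0, 256), 0.0)
d = theta + rng.normal(0.0, 1.0, 256)        # noisy "wavelet coefficients"
print("raw error:   ", np.abs(d - theta).mean())
print("shrunk error:", np.abs(eb_shrink(d) - theta).mean())
```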


Nature Human Behaviour | 2018

Redefine Statistical Significance

Daniel J. Benjamin; James O. Berger; Magnus Johannesson; Brian A. Nosek; Eric-Jan Wagenmakers; Richard A. Berk; Kenneth A. Bollen; Björn Brembs; Lawrence D. Brown; Colin F. Camerer; David Cesarini; Christopher D. Chambers; Merlise A. Clyde; Thomas D. Cook; Paul De Boeck; Zoltan Dienes; Anna Dreber; Kenny Easwaran; Charles Efferson; Ernst Fehr; Fiona Fidler; Andy P. Field; Malcolm R. Forster; Edward I. George; Richard Gonzalez; Steven N. Goodman; Edwin J. Green; Donald P. Green; Anthony G. Greenwald; Jarrod D. Hadfield

We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.


International Journal of Management Science and Engineering Management | 2016

Bayes and big data: the consensus Monte Carlo algorithm

Steven L. Scott; Alexander W. Blocker; Fernando V. Bonassi; Hugh A. Chipman; Edward I. George; Robert E. McCulloch

A useful definition of ‘big data’ is data that is too big to process comfortably on a single machine, either because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
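
A minimal sketch for the one case where the consensus combination is exact, a normal mean with known variance: each shard samples from its own posterior under a fractionated prior, and the shard-level draws are combined draw by draw with precision weights. Function and parameter names are mine.

```python
import numpy as np

def consensus_normal_mean(y, shards=4, draws=2000, prior_var=100.0,
                          sigma2=1.0, seed=0):
    rng = np.random.default_rng(seed)
    parts = np.array_split(y, shards)        # split the data across machines
    shard_draws, weights = [], []
    for part in parts:
        n = len(part)
        # each shard sees 1/shards of the prior information
        prec = n / sigma2 + 1.0 / (prior_var * shards)
        mean = (part.sum() / sigma2) / prec
        shard_draws.append(rng.normal(mean, np.sqrt(1.0 / prec), size=draws))
        weights.append(prec)                 # weight = shard posterior precision
    w = np.array(weights)[:, None]
    # consensus draws: precision-weighted average across shards, draw by draw
    return (w * np.array(shard_draws)).sum(axis=0) / w.sum()

y = np.random.default_rng(3).normal(2.0, 1.0, 10000)
d = consensus_normal_mean(y)
print(d.mean(), d.std())   # matches the full-data posterior for this model
```

For this Gaussian model the precision-weighted average reproduces the full-data posterior exactly; for other models, as the abstract notes, the quality of the combined draws depends on the model.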


Journal of the American Statistical Association | 1997

Ozone Exposure and Population Density in Harris County, Texas

Raymond J. Carroll; Rong Chen; Edward I. George; T. H. Li; H. J. Newton; H. Schmiediche; Naisyin Wang

We address the following question: What is the pattern of human exposure to ozone in Harris County (Houston) since 1980? While there has been considerable research on characterizing ozone measured at fixed monitoring stations, little is known about ozone away from the monitoring stations, and whether areas of higher ozone correspond to areas of high population density. To address this question, we build a spatial-temporal model for hourly ozone levels that predicts ozone at any location in Harris County at any time between 1980 and 1993. Along with building the model, we develop a fast model-fitting method that can cope with the massive amounts of available data and takes into account the substantial number of missing observations. Having built the model, we combine it with census tract information, focusing on young children. We conclude that the highest ozone levels occur at locations with relatively small populations of young children. Using various measures of exposure, we estimate that expos...


Journal of the American Statistical Association | 2014

EMVS: The EM Approach to Bayesian Variable Selection

Veronika Rockova; Edward I. George

Despite rapid developments in stochastic search algorithms, the practicality of Bayesian variable selection methods has continued to pose challenges. High-dimensional data are now routinely analyzed, typically with many more covariates than observations. To broaden the applicability of Bayesian variable selection for such high-dimensional linear regression contexts, we propose EMVS, a deterministic alternative to stochastic search based on an EM algorithm which exploits a conjugate mixture prior formulation to quickly find posterior modes. Combining a spike-and-slab regularization diagram for the discovery of active predictor sets with subsequent rigorous evaluation of posterior model probabilities, EMVS rapidly identifies promising sparse high posterior probability submodels. External structural information such as likely covariate groupings or network topologies is easily incorporated into the EMVS framework. Deterministic annealing variants are seen to improve the effectiveness of our algorithms by mitigating the posterior multimodality associated with variable selection priors. The usefulness of the EMVS approach is demonstrated on real high-dimensional data, where computational complexity renders stochastic search to be less practical.
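
A minimal sketch of the EMVS iteration for linear regression, assuming a known error variance (fixed at 1) and a fixed prior inclusion probability, both of which the full algorithm also updates within EM; names and hyperparameter values are illustrative.

```python
import numpy as np

def _npdf(x, var):
    """Normal density with mean zero and variance var."""
    return np.exp(-0.5 * x ** 2 / var) / np.sqrt(2 * np.pi * var)

def emvs(X, y, v0=0.01, v1=10.0, theta=0.5, iters=100):
    n, p = X.shape
    beta = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(iters):
        # E-step: inclusion probabilities given the current beta
        slab = theta * _npdf(beta, v1)
        spike = (1 - theta) * _npdf(beta, v0)
        pstar = slab / (slab + spike)
        dstar = pstar / v1 + (1 - pstar) / v0   # E[1 / prior variance]
        # M-step: generalized ridge update with per-coefficient penalties
        beta = np.linalg.solve(XtX + np.diag(dstar), Xty)
    return beta, pstar

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(100)
beta, pstar = emvs(X, y)
print(np.flatnonzero(pstar > 0.5))   # should recover predictors 0 and 1
```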

Collaboration


Dive into Edward I. George's collaborations.

Top Co-Authors

Lawrence D. Brown, University of Pennsylvania
Linda H. Zhao, University of Pennsylvania
Andreas Buja, University of Pennsylvania
Richard A. Berk, University of Pennsylvania
Veronika Rockova, University of Pennsylvania
Emil Pitkin, University of Pennsylvania
Dean P. Foster, University of Pennsylvania
Kai Zhang, University of North Carolina at Chapel Hill