Tamara Broderick
University of California, Berkeley
                                 Network
                            
                            Latest external collaboration on country level. Dive into details by clicking on the dots.
                                 Publication
                            
                            Featured researches published by Tamara Broderick.
Bayesian Analysis | 2012
Tamara Broderick; Michael I. Jordan; Jim Pitman
The beta-Bernoulli process provides a Bayesian nonparametric prior for models involving collections of binary-valued features. A draw from the beta process yields an infinite collection of probabilities in the unit interval, and a draw from the Bernoulli process turns these into binaryvalued features. Recent work has provided stick-breaking representations for the beta process analogous to the well-known stick-breaking representation for the Dirichlet process. We derive one such stick-breaking representation directly from the characterization of the beta process as a completely random measure. This approach motivates a three-parameter generalization of the beta process, and we study the power laws that can be obtained from this generalized beta process. We present a posterior inference algorithm for the beta-Bernoulli process that exploits the stickbreaking representation, and we present experimental results for a discrete factor-analysis model.
Bayesian Analysis | 2013
Tamara Broderick; Jim Pitman; Michael I. Jordan
The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an “exchangeable feature probability function” (EFPF)—analogous to the EPPF in the clustering setting—for certain types of feature models. Moreover, we introduce a “feature paintbox” characterization—analogous to the Kingman paintbox for clustering—of the class of exchangeable feature models. We provide a further characterization of the subclass of feature allocations that have EFPF representations.
The Astrophysical Journal | 2004
Dragan Huterer; Alex G. Kim; Lawrence M. Krauss; Tamara Broderick
We investigate the redshift accuracy of Type Ia supernova and cluster number count surveys required for the redshift uncertainties not to contribute appreciably to the dark energy parameter error budget. For the Supernova/Acceleration Probe experiment, we find that without the assistance of ground-based measurements individual supernova redshifts would need to be determined to about 0.002 or better, a challenging but feasible requirement for a low-resolution spectrograph. However, we find that accurate redshifts for z < 0.1 supernovae obtained with ground-based experiments are sufficient to protect the results against even relatively large redshift errors at high z. For the future cluster number count surveys such as with the South Pole Telescope, Planck, or DUET, we find that the purely statistical error in the photometric redshift is less important and that the irreducible systematic bias in redshift drives the requirements. The redshift bias must be kept below 0.001-0.005 per redshift bin (which is determined by the filter set), depending on the sky coverage and details of the definition of the minimal mass of the survey. Furthermore, we find that X-ray surveys have a more stringent required redshift accuracy than Sunyaev-Zeldovich (SZ) effect surveys since they use a shorter lever arm in redshift; conversely, SZ surveys benefit from their high-redshift reach only as long as some redshift information is available for distant (z 1) clusters.
Statistical Science | 2013
Tamara Broderick; Michael I. Jordan; Jim Pitman
One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.
Journal of Computational and Graphical Statistics | 2014
Jan Luts; Tamara Broderick; M. P. Wand
We develop algorithms for performing semiparametric regression analysis in real time, with data processed as it is collected and made immediately available via modern telecommunications technologies. Our definition of semiparametric regression is quite broad and includes, as special cases, generalized linear mixed models, generalized additive models, geostatistical models, wavelet nonparametric regression models and their various combinations. Fast updating of regression fits is achieved by couching semiparametric regression into a Bayesian hierarchical model or, equivalently, graphical model framework and employing online mean field variational ideas. An Internet site attached to this article, realtime-semiparametric-regression.net, illustrates the methodology for continually arriving stock market, real estate, and airline data. Flexible real-time analyses based on increasingly ubiquitous streaming data sources stand to benefit. This article has online supplementary material.
PLOS ONE | 2009
Tamara Broderick; David J. C. MacKay
Selection methods that require only a single-switch input, such as a button click or blink, are potentially useful for individuals with motor impairments, mobile technology users, and individuals wishing to transmit information securely. We present a single-switch selection method, “Nomon,” that is general and efficient. Existing single-switch selection methods require selectable options to be arranged in ways that limit potential applications. By contrast, traditional operating systems, web browsers, and free-form applications (such as drawing) place options at arbitrary points on the screen. Nomon, however, has the flexibility to select any point on a screen. Nomon adapts automatically to an individuals clicking ability; it allows a person who clicks precisely to make a selection quickly and allows a person who clicks imprecisely more time to make a selection without error. Nomon reaps gains in information rate by allowing the specification of beliefs (priors) about option selection probabilities and by avoiding tree-based selection schemes in favor of direct (posterior) inference. We have developed both a Nomon-based writing application and a drawing application. To evaluate Nomons performance, we compared the writing application with a popular existing method for single-switch writing (row-column scanning). Novice users wrote 35% faster with the Nomon interface than with the scanning interface. An experienced user (author TB, with 10 hours practice) wrote at speeds of 9.3 words per minute with Nomon, using 1.2 clicks per character and making no errors in the final text.
The Astrophysical Journal | 2012
Adam N. Morgan; James P. Long; Joseph W. Richards; Tamara Broderick; Nathaniel R. Butler; Joshua S. Bloom
As the number of observed gamma-ray bursts (GRBs) continues to grow, follow-up resources need to be used more efficiently in order to maximize science output from limited telescope time. As such, it is becoming increasingly important to rapidly identify bursts of interest as soon as possible after the event, before the afterglows fade beyond detectability.Studyingthemostdistant(highestredshift)events,forinstance,remainsaprimarygoalformanyinthe field. Here, we present our Random Forest Automated Triage Estimator for GRB redshifts (RATE GRB-z) for rapid identification of high-redshift candidates using early-time metrics from the three telescopes onboard Swift. While the basic RATE methodology is generalizable to a number of resource allocation problems, here we demonstrate its utility for telescope-constrained follow-up efforts with the primary goal to identify and study high-z GRBs. For each new GRB, RATE GRB-z provides a recommendation—based on the available telescope time—of whether the event warrants additional follow-up resources. We train RATE GRB-z using a set consisting of 135 Swift bursts with known redshifts, only 18 of which are z> 4. Cross-validated performance metrics on these training data suggest that ∼56% of high-z bursts can be captured from following up the top 20% of the ranked candidates, and ∼84% of high-z bursts are identified after following up the top ∼40% of candidates. We further use the method to rank 200+ Swift bursts with unknown redshifts according to their likelihood of being high-z.
Journal of Classification | 2011
Tamara Broderick; Robert B. Gramacy
Recognizing the successes of treed Gaussian process (TGP) models as an interpretable and thrifty model for nonparametric regression, we seek to extend the model to classification. Both treed models and Gaussian processes (GPs) have, separately, enjoyed great success in application to classification problems. An example of the former is Bayesian CART. In the latter, real-valued GP output may be utilized for classification via latent variables, which provide classification rules by means of a softmax function. We formulate a Bayesian model averaging scheme to combine these two models and describe a Monte Carlo method for sampling from the full posterior distribution with joint proposals for the tree topology and the GP parameters corresponding to latent variables at the leaves. We concentrate on efficient sampling of the latent variables, which is important to obtain good mixing in the expanded parameter space. The tree structure is particularly helpful for this task and also for developing an efficient scheme for handling categorical predictors, which commonly arise in classification problems. Our proposed classification TGP (CTGP) methodology is illustrated on a collection of synthetic and real data sets. We assess performance relative to existing methods and thereby show how CTGP is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.
Archive | 2010
Tamara Broderick; Robert B. Gramacy
Recognizing the success of the treed Gaussian process (TGP) model as an interpretable and thrifty model for nonstationary regression, we seek to extend the model to classification. By combining Bayesian CART and the latent variable approach to classification via Gaussian processes (GPs), we develop a Bayesian model averaging scheme to traverse the full space of classification TGPs (CTGPs). We illustrate our method on synthetic and real data and thereby show how the combined approach is highly flexible, offers tractable inference, produces rules that are easy to interpret, and performs well out of sample.
Bernoulli | 2018
Tamara Broderick; Ashia C. Wilson; Michael I. Jordan
We demonstrate how to calculate posteriors for general CRM-based priors and likelihoods for Bayesian nonparametric models. We further show how to represent Bayesian nonparametric priors as a sequence of finite draws using a size-biasing approach---and how to represent full Bayesian nonparametric models via finite marginals. Motivated by conjugate priors based on exponential family representations of likelihoods, we introduce a notion of exponential families for CRMs, which we call exponential CRMs. This construction allows us to specify automatic Bayesian nonparametric conjugate priors for exponential CRM likelihoods. We demonstrate that our exponential CRMs allow particularly straightforward recipes for size-biased and marginal representations of Bayesian nonparametric models. Along the way, we prove that the gamma process is a conjugate prior for the Poisson likelihood process and the beta prime process is a conjugate prior for a process we call the odds Bernoulli process. We deliver a size-biased representation of the gamma process and a marginal representation of the gamma process coupled with a Poisson likelihood process.
