Joseph W. Richards
University of California, Berkeley
Publication
Featured research published by Joseph W. Richards.
Nature | 2011
Weidong Li; Joshua S. Bloom; Philipp Podsiadlowski; Adam A. Miller; S. Bradley Cenko; Saurabh W. Jha; Mark Sullivan; D. Andrew Howell; Peter E. Nugent; Nathaniel R. Butler; Eran O. Ofek; Mansi M. Kasliwal; Joseph W. Richards; Alan N. Stockton; Hsin-Yi Shih; Lars Bildsten; Michael M. Shara; Joanne Bibby; Alexei V. Filippenko; Mohan Ganeshalingam; Jeffrey M. Silverman; S. R. Kulkarni; Nicholas M. Law; Dovi Poznanski; Robert Michael Quimby; Curtis McCully; Brandon Patel; K. Maguire; Ken J. Shen
Type Ia supernovae are thought to result from a thermonuclear explosion of an accreting white dwarf in a binary system, but little is known of the precise nature of the companion star and the physical properties of the progenitor system. There are two classes of models: double-degenerate (involving two white dwarfs in a close binary system) and single-degenerate models. In the latter, the primary white dwarf accretes material from a secondary companion until conditions are such that carbon ignites, at a mass of 1.38 times the mass of the Sun. The type Ia supernova SN 2011fe was recently detected in a nearby galaxy. Here we report an analysis of archival images of the location of SN 2011fe. The luminosity of the progenitor system (especially the companion star) is 10–100 times fainter than previous limits on other type Ia supernova progenitor systems, allowing us to rule out luminous red giants and almost all helium stars as the mass-donating companion to the exploding white dwarf.
The Astrophysical Journal | 2011
Joseph W. Richards; Dan L. Starr; Nathaniel R. Butler; Joshua S. Bloom; John M. Brewer; Arien Crellin-Quick; Justin Higgins; Rachel Kennedy; Maxime Rischard
With the coming data deluge from synoptic surveys, there is a need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly observed variables based on small numbers of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics (features), detail methods to robustly estimate periodic features, introduce tree-ensemble methods for accurate variable-star classification, and show how to rigorously evaluate a classifier using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% error rate using the random forest (RF) classifier; this represents a 24% improvement over the best previous classifier on these data. This methodology is effective for identifying samples of specific science classes: for pulsational variables used in Milky Way tomography we obtain a discovery efficiency of 98.2% and for eclipsing systems we find an efficiency of 99.1%, both at 95% purity. The RF classifier is superior to other methods in terms of accuracy, speed, and relative immunity to irrelevant features; the RF can also be used to estimate the importance of each feature in classification. Additionally, we present the first astronomical use of hierarchical classification methods to incorporate a known class taxonomy in the classifier, which reduces the catastrophic error rate from 8% to 7.8%. Excluding low-amplitude sources, the overall error rate improves to 14%, with a catastrophic error rate of 3.5%.
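As a concrete illustration of the pipeline this abstract describes, here is a minimal scikit-learn sketch of random-forest classification with cross-validation. The feature matrix, class labels, and hyperparameters are synthetic placeholders, not the paper's actual feature set or data.

```python
# Minimal sketch: classify variable stars from light-curve features with a
# random forest, evaluated by cross-validation. Feature extraction is a
# placeholder -- the paper computes dozens of periodic and non-periodic
# features (e.g., via periodogram analysis of each light curve).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in design matrix: one row per star, one column per light-curve
# feature (period, amplitude, skewness, ...); labels are science classes.
X = rng.normal(size=(1542, 20))        # 1542 stars, 20 hypothetical features
y = rng.integers(0, 25, size=1542)     # 25 variable-star classes

rf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(rf, X, y, cv=10)          # 10-fold cross-validation
print(f"CV error rate: {1 - scores.mean():.3f}")

# The fitted forest also ranks features by importance, as used in the paper.
rf.fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:5]
print("Most important feature indices:", top)
```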
Publications of the Astronomical Society of the Pacific | 2012
Joshua S. Bloom; Joseph W. Richards; Peter E. Nugent; Robert Michael Quimby; Mansi M. Kasliwal; Dan L. Starr; Dovi Poznanski; Eran O. Ofek; S. Bradley Cenko; Nathaniel R. Butler; S. R. Kulkarni; Avishay Gal-Yam; Nicholas M. Law
The rate of image acquisition in modern synoptic imaging surveys has already begun to outpace the feasibility of keeping astronomers in the real-time discovery and classification loop. Here we present the inner workings of a framework, based on machine-learning algorithms, that captures expert training and ground-truth knowledge about the variable and transient sky to automate (1) the process of discovery on image differences, and (2) the generation of preliminary science-type classifications of discovered sources. Since follow-up resources for extracting novel science from fast-changing transients are precious, self-calibrating classification probabilities must be couched in terms of efficiencies for discovery and purity of the samples generated. We estimate the purity and efficiency in identifying real sources with a two-epoch image-difference discovery algorithm for the Palomar Transient Factory (PTF) survey. Once given a source discovery, using machine-learned classification trained on PTF data, we distinguish between transients and variable stars with a 3.8% overall error rate (with 1.7% errors for imaging within the Sloan Digital Sky Survey footprint). At >96% classification efficiency, the samples achieve 90% purity. Initial classifications are shown to rely primarily on context-based features, determined from the data itself and external archival databases. In the first year of autonomous operations of PTF, this discovery and classification framework led to several significant science results, from outbursting young stars to subluminous Type IIP supernovae to candidate tidal disruption events. We discuss future directions of this approach, including the possible roles of crowdsourcing and the scalability of machine learning to future surveys such as the Large Synoptic Survey Telescope (LSST).
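The efficiency/purity trade-off quoted above can be made concrete with a short sketch: given per-candidate transient probabilities, sweep the decision threshold and read off sample purity at a target efficiency. The scores and labels below are synthetic stand-ins, not PTF data.

```python
# Sketch of the efficiency/purity trade-off: rank candidates by classifier
# probability and report the purity of the sample at a target efficiency.
import numpy as np

def purity_at_efficiency(p_transient, is_transient, target_eff=0.96):
    order = np.argsort(p_transient)[::-1]  # sort by descending probability
    labels = is_transient[order]
    tp = np.cumsum(labels)                 # true transients recovered so far
    n = np.arange(1, len(labels) + 1)      # sample size at each threshold
    eff = tp / labels.sum()                # efficiency (completeness)
    purity = tp / n                        # fraction of sample that is real
    idx = np.searchsorted(eff, target_eff) # first threshold reaching target
    return purity[idx]

rng = np.random.default_rng(1)
is_transient = rng.random(5000) < 0.3
p = np.clip(is_transient * 0.6 + rng.random(5000) * 0.5, 0, 1)
print(f"purity at 96% efficiency: {purity_at_efficiency(p, is_transient):.2f}")
```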
Publications of the Astronomical Society of the Pacific | 2010
Richard Kessler; Bruce A. Bassett; Pavel Belov; Vasudha Bhatnagar; Heather Campbell; A. Conley; Joshua A. Frieman; Alexandre Glazov; S. González-Gaitán; Renée Hlozek; Saurabh W. Jha; Stephen Kuhlmann; Martin Kunz; Hubert Lampeitl; Ashish A. Mahabal; James Newling; Robert C. Nichol; David Parkinson; Ninan Sajeeth Philip; Dovi Poznanski; Joseph W. Richards; Steven A. Rodney; Masao Sako; Donald P. Schneider; Maximilian D. Stritzinger; Melvin Varughese
We report results from the Supernova Photometric Classification Challenge (SNPhotCC), a publicly released mix of simulated supernovae (SNe), with types (Ia, Ibc, and II) selected in proportion to their expected rates. The simulation was realized in the griz filters of the Dark Energy Survey (DES) with realistic observing conditions (sky noise, point-spread function, and atmospheric transparency) based on years of recorded conditions at the DES site. Simulations of non-Ia-type SNe are based on spectroscopically confirmed light curves that include unpublished non-Ia samples donated from the Carnegie Supernova Project (CSP), the Supernova Legacy Survey (SNLS), and the Sloan Digital Sky Survey-II (SDSS-II). A spectroscopically confirmed subset was provided for training. We challenged scientists to run their classification algorithms and report a type and photo-z for each SN. Participants from 10 groups contributed 13 entries for the sample that included a host-galaxy photo-z for each SN and nine entries for the sample that had no redshift information. Several different classification strategies resulted in similar performance, and for all entries the performance was significantly better for the training subset than for the unconfirmed sample. For the spectroscopically unconfirmed subset, the entry with the highest average figure of merit for classifying SNe Ia has an efficiency of 0.96 and an SN Ia purity of 0.79. As a public resource for the future development of photometric SN classification and photo-z estimators, we have released updated simulations with improvements based on our experience from the SNPhotCC, added samples corresponding to the Large Synoptic Survey Telescope (LSST) and the SDSS-II, and provided the answer keys so that developers can evaluate their own analysis.
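For reference, a sketch of the challenge-style figure of merit quoted above, which multiplies SN Ia efficiency by a pseudo-purity that penalizes each false positive by a factor W. The value W = 3 is assumed here; consult the paper for the exact scoring.

```python
# Sketch of an SNPhotCC-style figure of merit for SN Ia classification:
# efficiency times a pseudo-purity in which each false positive is penalized
# by a factor W (assumed W = 3; see the challenge paper for the exact form).
def snia_figure_of_merit(n_true_ia, n_false_ia, n_total_ia, W=3.0):
    efficiency = n_true_ia / n_total_ia              # fraction of Ia recovered
    pseudo_purity = n_true_ia / (n_true_ia + W * n_false_ia)
    return efficiency * pseudo_purity

# Example: recover 960 of 1000 true SNe Ia with 250 contaminants.
print(f"FoM = {snia_figure_of_merit(960, 250, 1000):.3f}")
```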
The Astrophysical Journal | 2012
Craig D. Harrison; Christopher J. Miller; Joseph W. Richards; Edward Lloyd-Davies; Ben Hoyle; A. Kathy Romer; Nicola Mehrtens; Matt Hilton; John P. Stott; D. Capozzi; Chris A. Collins; Paul James Deadman; Andrew R. Liddle; Martin Sahlén; S. Adam Stanford; Pedro T. P. Viana
This paper presents both the result of a search for fossil systems (FSs) within the XMM Cluster Survey and the Sloan Digital Sky Survey and the results of a study of the stellar mass assembly and stellar populations of their fossil galaxies. In total, 17 groups and clusters are identified at z < 0.25 with large magnitude gaps between the first and fourth brightest galaxies. All the information necessary to classify these systems as fossils is provided. For both groups and clusters, the total and fractional luminosity of the brightest galaxy is positively correlated with the magnitude gap. The brightest galaxies in FSs (called fossil galaxies) have stellar populations and star formation histories which are similar to normal brightest cluster galaxies (BCGs). However, at fixed group/cluster mass, the stellar masses of the fossil galaxies are larger compared to normal BCGs, a fact that holds true over a wide range of group/cluster masses. Moreover, the fossil galaxies are found to contain a significant fraction of the total optical luminosity of the group/cluster within 0.5 R200, as much as 85%, compared to the non-fossils, which can have as little as 10%. Our results suggest that FSs formed early and in the highest density regions of the universe and that fossil galaxies represent the end products of galaxy mergers in groups and clusters.
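The magnitude-gap selection described above can be sketched in a few lines; the gap threshold below is illustrative only, since the abstract does not state the exact criterion used.

```python
# Sketch of fossil-system selection: flag a group or cluster as a fossil
# candidate when the magnitude gap between its first- and fourth-brightest
# member galaxies (within some radius, e.g. 0.5 R200) exceeds a threshold.
# The threshold here is a placeholder, not the paper's actual criterion.
def is_fossil_candidate(member_mags, gap_threshold=2.5):
    mags = sorted(member_mags)             # brighter = smaller magnitude
    if len(mags) < 4:
        return False                       # need at least four members
    return (mags[3] - mags[0]) >= gap_threshold

print(is_fossil_candidate([15.1, 18.0, 18.2, 18.4]))  # True: gap of 3.3 mag
```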
Monthly Notices of the Royal Astronomical Society | 2013
Henrik Brink; Joseph W. Richards; Dovi Poznanski; Joshua S. Bloom; John A. Rice; Sahand Negahban; Martin J. Wainwright
Modern time-domain surveys continuously monitor large swaths of the sky to look for astronomical variability. Astrophysical discovery in such data sets is complicated by the fact that detections of real transient and variable sources are highly outnumbered by bogus detections caused by imperfect subtractions, atmospheric effects and detector artefacts. In this work we present a machine learning (ML) framework for discovery of variability in time-domain imaging surveys. Our ML methods provide probabilistic statements, in near real time, about the degree to which each newly observed source is an astrophysically relevant source of variable brightness. We provide details about each of the analysis steps involved, including compilation of the training and testing sets, construction of descriptive image-based and contextual features, and optimization of the feature subset and model tuning parameters. Using a validation set of nearly 30,000 objects from the Palomar Transient Factory, we demonstrate a missed detection rate of at most 7.7% at our chosen false-positive rate of 1% for an optimized ML classifier of 23 features, selected to avoid feature correlation and over-fitting from an initial library of 42 attributes. Importantly, we show that our classification methodology is insensitive to mis-labelled training data up to a contamination of nearly 10%, making it easier to compile sufficient training sets for accurate performance in future surveys. This ML framework, if so adopted, should enable the maximization of scientific gain from future synoptic surveys and enable fast follow-up decisions on the vast amounts of streaming data produced by such experiments.
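The headline number above, a missed detection rate at a fixed 1% false-positive rate, can be computed from classifier scores roughly as follows; the scores and labels here are synthetic, not PTF candidates.

```python
# Sketch of the headline metric: the missed detection rate (1 - TPR) at a
# fixed 1% false-positive rate, read off the ROC curve of classifier scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
is_real = rng.random(30000) < 0.5                    # real vs bogus labels
scores = is_real * 0.8 + rng.normal(scale=0.4, size=30000)

fpr, tpr, _ = roc_curve(is_real, scores)
idx = np.searchsorted(fpr, 0.01)           # first threshold with FPR >= 1%
print(f"missed detection rate at 1% FPR: {1 - tpr[idx]:.3f}")
```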
Astrophysical Journal Supplement Series | 2012
Joseph W. Richards; Dan L. Starr; Adam A. Miller; Joshua S. Bloom; Nathaniel R. Butler; Henrik Brink; Arien Crellin-Quick
With growing data volumes from synoptic surveys, astronomers necessarily must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of classification purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All-Sky Automated Survey (ASAS), and release the Machine-learned ASAS Classification Catalog (MACC), a 28-class probabilistic classification catalog of 50,124 ASAS sources in the ASAS Catalog of Variable Stars. We estimate that MACC achieves a sub-20% classification error rate and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes.
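A minimal sketch of the calibration step motivated above, using scikit-learn's isotonic calibration wrapper around a random forest (the paper's own calibration procedure may differ); features and labels are synthetic placeholders.

```python
# Sketch of probability calibration: wrap a random forest in isotonic
# calibration so that its class probabilities can be read as frequencies.
# Data are synthetic placeholders for ASAS-like light-curve features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 15))
y = rng.integers(0, 28, size=5000)         # 28 science classes, as in MACC

raw_rf = RandomForestClassifier(n_estimators=300, random_state=0)
calibrated = CalibratedClassifierCV(raw_rf, method="isotonic", cv=5)
calibrated.fit(X, y)

# Calibrated posterior probabilities over the 28 classes.
probs = calibrated.predict_proba(X[:3])
print(probs.shape, probs.sum(axis=1))      # rows sum to 1
```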
The Astrophysical Journal | 2012
Joseph W. Richards; Dan L. Starr; Henrik Brink; Adam A. Miller; Joshua S. Bloom; Nathaniel R. Butler; J. Berian James; James P. Long; John A. Rice
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because (1) standard assumptions for machine-learned model selection procedures break down and (2) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting, co-training, and active learning (AL). We argue that AL—where the data whose inclusion in the training set would most improve predictions on the testing set are queried for manual follow-up—is an effective approach and is appropriate for many astronomical applications. For a variable star classification problem on a well-studied set of stars from Hipparcos and the Optical Gravitational Lensing Experiment, AL is the optimal method in terms of error rate on the testing data, beating the off-the-shelf classifier by 3.4% and the other proposed methods by at least 3.0%. To aid with manual labeling of variable stars, we developed a Web interface which allows for easy light curve visualization and querying of external databases. Finally, we apply AL to classify variable stars in the All Sky Automated Survey, finding dramatic improvement in our agreement with the ASAS Catalog of Variable Stars, from 65.5% to 79.5%, and a significant increase in the classifier's average confidence for the testing set, from 14.6% to 42.9%, after a few AL iterations.
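A sketch of an active-learning loop in the spirit of the above. The paper's query criterion targets the objects whose inclusion would most improve test-set predictions; as a simpler, commonly used stand-in, this sketch queries the test objects the current model is least certain about. All data and the labeling oracle are placeholders.

```python
# Sketch of an active-learning loop with uncertainty sampling: repeatedly
# query the least-confident test objects, obtain labels, and retrain.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X_train = rng.normal(size=(500, 10))            # bright, nearby training stars
y_train = rng.integers(0, 5, size=500)
X_test = rng.normal(loc=0.5, size=(5000, 10))   # shifted: sample selection bias

def oracle_label(x):                            # placeholder for human labeling
    return int(x[0] > 0.5)

for it in range(3):                             # a few AL iterations
    rf = RandomForestClassifier(n_estimators=200, random_state=it)
    rf.fit(X_train, y_train)
    p = rf.predict_proba(X_test)
    uncertainty = 1.0 - p.max(axis=1)           # low max-probability = uncertain
    query = np.argsort(uncertainty)[-20:]       # 20 most uncertain test objects
    new_y = np.array([oracle_label(x) for x in X_test[query]])
    X_train = np.vstack([X_train, X_test[query]])
    y_train = np.concatenate([y_train, new_y])
```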
Monthly Notices of the Royal Astronomical Society | 2009
Peter E. Freeman; Jeffrey A. Newman; Ann B. Lee; Joseph W. Richards; Chad M. Schafer
The development of fast and accurate methods of photometric redshift estimation is a vital step towards being able to fully utilize the data of next-generation surveys within precision cosmology. In this paper we apply a specific approach to spectral connectivity analysis (SCA; Lee & Wasserman 2009) called diffusion map. SCA is a class of non-linear techniques for transforming observed data (e.g., photometric colours for each galaxy, where the data lie on a complex subset of p-dimensional space) to a simpler, more natural coordinate system wherein we apply regression to make redshift predictions. As SCA relies upon eigen-decomposition, our training set size is limited to ~ 10,000 galaxies; we use the Nyström extension to quickly estimate diffusion coordinates for objects not in the training set. We apply our method to 350,738 SDSS main sample galaxies, 29,816 SDSS luminous red galaxies, and 5,223 galaxies from DEEP2 with CFHTLS ugriz photometry. For all three datasets, we achieve prediction accuracies on par with previous analyses, and find that use of the Nyström extension leads to a negligible loss of prediction accuracy relative to that achieved with the training sets. As in some previous analyses (e.g., Collister & Lahav 2004, Ball et al. 2008), we observe that our predictions are generally too high (low) in the low (high) redshift regimes. We demonstrate that this is a manifestation of attenuation bias, wherein measurement error (i.e., uncertainty in diffusion coordinates due to uncertainty in the measured fluxes/magnitudes) reduces the slope of the best-fit regression line. Mitigation of this bias is necessary if we are to use photometric redshift estimates produced by computationally efficient empirical methods in precision cosmology.
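For concreteness, a minimal NumPy/SciPy sketch of a diffusion map with a Nyström extension, following standard textbook conventions rather than the paper's exact construction; the bandwidth, number of coordinates, and data are placeholders.

```python
# Sketch of a diffusion map with a Nystrom extension: embed training data via
# eigenvectors of a Markov matrix built from pairwise Gaussian affinities,
# then map new points without re-solving the eigenproblem.
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, eps, n_coords=5):
    W = np.exp(-cdist(X, X, "sqeuclidean") / eps)   # Gaussian affinities
    P = W / W.sum(axis=1, keepdims=True)            # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)[1:n_coords + 1]  # skip trivial eigenvalue 1
    return vals.real[order], vecs.real[:, order]

def nystrom_extend(X_new, X_train, vals, vecs, eps):
    W = np.exp(-cdist(X_new, X_train, "sqeuclidean") / eps)
    P = W / W.sum(axis=1, keepdims=True)
    return P @ vecs / vals                 # psi_k(new) = (P psi_k) / lambda_k

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 5))             # stand-in for training photometry
vals, vecs = diffusion_map(X, eps=2.0)
coords_new = nystrom_extend(rng.normal(size=(10, 5)), X, vals, vecs, eps=2.0)
print(coords_new.shape)                    # (10, 5) diffusion coordinates
```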
The Astrophysical Journal | 2009
Joseph W. Richards; Peter E. Freeman; Ann B. Lee; Chad M. Schafer
Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work, we apply a recently developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework provides a computationally efficient means by which to predict redshifts of galaxies, and thus could inform more expensive redshift estimators such as template cross-correlation. It also provides a natural means by which to identify outliers (e.g., misclassified spectra, spectra with anomalous features). We analyze 3835 Sloan Digital Sky Survey spectra and show how our framework yields a more than 95% reduction in dimensionality. Finally, we show that the prediction error of the diffusion-map-based regression approach is markedly smaller than that of a similar approach based on PCA, clearly demonstrating the superiority of diffusion maps over PCA for this regression task.
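The PCA baseline the comparison above refers to can be sketched as regression on the leading principal components; the spectra and redshifts below are synthetic. Swapping the PCA coordinates for diffusion coordinates (see the sketch after the previous abstract) gives the diffusion-map-based variant the paper finds markedly more accurate.

```python
# Sketch of the PCA baseline: project spectra onto a handful of principal
# components (a >95% dimensionality reduction) and regress redshift on
# those eigenmode coefficients. Data are synthetic placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
spectra = rng.normal(size=(3835, 1000))     # 3835 spectra, 1000 flux bins
z = rng.uniform(0, 0.5, size=3835)          # stand-in redshifts

pca = PCA(n_components=10)                  # 1000 -> 10 dimensions
coords = pca.fit_transform(spectra)
scores = cross_val_score(LinearRegression(), coords, z,
                         scoring="neg_mean_squared_error", cv=5)
print(f"PCA-regression CV MSE: {-scores.mean():.4f}")
```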