Galaxy Zoo Supernovae
A. M. Smith, S. Lynn, M. Sullivan, C. J. Lintott, P. E. Nugent, J. Botyanszki, M. Kasliwal, R. Quimby, S. P. Bamford, L. F. Fortson, K. Schawinski, I. Hook, S. Blake, P. Podsiadlowski, J. Joensson, A. Gal-Yam, I. Arcavi, D. A. Howell, J. S. Bloom, J. Jacobsen, S. R. Kulkarni, N. M. Law, E. O. Ofek, R. Walters
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 23 October 2018 (MN LaTeX style file v2.2)
Department of Physics (Astrophysics), University of Oxford, DWB, Keble Road, Oxford OX1 3RH, UK
Computational Cosmology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
Cahill Center for Astrophysics, California Institute of Technology, Pasadena, CA 91125, USA
Einstein Fellow
Department of Physics, Yale University, New Haven, CT 06511, USA
Yale Center for Astronomy and Astrophysics, Yale University, P.O. Box 208121, New Haven, CT 06520, USA
INAF-Osservatorio di Roma, via Frascati 33, I-00040 Monteporzio Catone (Roma), Italy
Department of Particle Physics and Astrophysics, Faculty of Physics, The Weizmann Institute of Science, Rehovot 76100, Israel
Las Cumbres Observatory Global Telescope Network, 6740 Cortona Dr, Suite 102, Goleta, CA 93117
University of California, Santa Barbara, Broida Hall, Mail Code 9530, Santa Barbara, CA 93106-9530, USA
Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA
Dunlap Institute for Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto M5S 3H4, Ontario, Canada
Caltech Optical Observatories, California Institute of Technology, Pasadena, CA 91125, USA
School of Physics and Astronomy, University of Nottingham, University Park, Nottingham, NG7 2RD
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
ABSTRACT
This paper presents the first results from a new citizen science project: Galaxy Zoo Supernovae. This proof of concept project uses members of the public to identify supernova candidates from the latest generation of wide-field imaging transient surveys. We describe the Galaxy Zoo Supernovae operations and scoring model, and demonstrate the effectiveness of this novel method using imaging data and transients from the Palomar Transient Factory (PTF). We examine the results collected over the period April–July 2010, during which nearly 14,000 supernova candidates from PTF were classified by more than 2,500 individuals within a few hours of data collection. We compare the transients selected by the citizen scientists to those identified by experienced PTF scanners, and find the agreement to be remarkable – Galaxy Zoo Supernovae performs comparably to the PTF scanners, and identified as transients 93% of the ∼130 spectroscopically confirmed SNe that PTF located during the trial period (with no false positive identifications). Further analysis shows that only a small fraction of the lowest signal-to-noise SN detections (r > 19.5) are given low scores: Galaxy Zoo Supernovae correctly identifies all SNe with ≥ 8σ detections in the PTF imaging data. The Galaxy Zoo Supernovae project has direct applicability to future transient searches such as the Large Synoptic Survey Telescope, by both rapidly identifying candidate transient events, and via the training and improvement of existing machine classifier algorithms.
Key words: supernovae: general – surveys – methods: data analysis
This publication has been made possible by the participation of more than 10,000 volunteers in the Galaxy Zoo Supernovae project (http://supernova.galaxyzoo.org/authors).
E-mail: [email protected] (A. M. Smith)
E-mail: [email protected] (M. Sullivan)
Supernovae (SNe) have a profound influence upon many diverse areas of astrophysics. They are the key source of heavy elements in the universe, driving cosmic chemical evolution. Their energy input can initiate episodes of star formation, and they are themselves the product of the complex physics underlying the final stages of stellar evolution. The homogeneous nature of the thermonuclear Type Ia SNe provides the most mature and direct probe of dark energy. Despite this importance in astrophysics, we understand surprisingly little about the physics governing SN explosions. Only the progenitors of the core collapse Type IIP SNe have been directly identified: the physical nature of other SN types remains uncertain (for reviews see Hillebrandt & Niemeyer 2000; Smartt 2009). We remain ignorant about many aspects of SN rates, light-curves, spectra, demographics, and the dependence of these properties on environment, progenitor composition, and explosion physics.

In part, this is due to the historical difficulty and technical challenges associated with locating SNe in the required numbers to create statistically meaningful samples, particularly at low redshift where high quality follow-up data can most easily be attained. This situation has changed with the availability of large format CCD detectors. Automated, wide-field transient searches on dedicated 1–2m class telescopes and facilities are underway, typically observing thousands of square degrees every few days (e.g. Keller et al. 2007; Law et al. 2009). These flux-limited ‘rolling searches’ select transient events without regard to host galaxy properties or type.

This large amount of imaging data naturally generates its own particular logistical challenges in dealing with the data flow, and identifying transient astrophysical objects of interest in the data (‘candidates’) for scientific study and analysis. Of particular importance is the rapid identification of new candidates once the imaging data has been obtained and processed.
Though many aspects of survey operations, such as image processing, can be efficiently pipelined, the identification of new transient sources remains challenging, with human operators (‘scanners’) invariably charged with wading through new detections on a nightly basis. Though computer algorithms can assist with identifying objects of interest in the data, this scanning can still absorb a significant amount of researcher time. A related issue is spectroscopic follow-up, a limited resource that must be prioritised and allocated efficiently to the detected candidates, with the absolute minimum of false candidates observed.

Two high-redshift SN searches highlight these challenges. The Supernova Legacy Survey (SNLS; e.g. Astier et al. 2006) used the MegaCam instrument on the 3.6m Canada–France–Hawaii Telescope to survey 4 deg² with a cadence of a few days. Following automated cuts on signal-to-noise and candidate shape, each square degree would typically generate ∼200 candidates for each night of observation (Perrett et al. 2010). Visual inspection would decrease this number to ∼20 plausible real transients. The Sloan Digital Sky Survey-II Supernova Survey (SDSS-SN; e.g. Frieman et al. 2008) used the SDSS 2.5m telescope to survey a larger area of 300 deg², though to a shallower depth than SNLS (Sako et al. 2008). After the removal of moving (solar system) objects, in the first season (3 month period), human scanners viewed 3000–5000 objects each night spread over six scanners [...]

The Palomar Transient Factory (PTF) is a wide-field survey exploring the optical transient sky. The survey is built around the 48 inch Samuel Oschin telescope at the Palomar Observatory, recently equipped with the CFH12k mosaic camera (formerly at the Canada–France–Hawaii Telescope) offering a 7.8 square degree field of view, and robotised to allow remote and automated observations. Observations are mainly conducted using the Mould-R filter.

A full description of the operations of the PTF experiment can be found in Law et al. (2009). Of most relevance for SN studies are the ‘5-day cadence’ and ‘dynamical cadence’ experiments, each using ∼40% of the observing time. The dynamical cadence revisits survey fields on time-scales of 1 minute up to 5 days and is particularly sensitive to rapid transient events (as well as longer duration SNe), whereas the 5-day cadence is specifically targeted to extra-galactic SN studies (Rau et al. 2009). Even in the 5-day cadence, images are typically taken in pairs separated in time by about one hour. This is to help identify moving objects (i.e., asteroids) in the imaging data, which might otherwise masquerade as new transients.
The PTF (near)-real-time search pipeline is hosted by the National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory (LBNL). After data is taken and transferred from the Palomar observatory to NERSC, the pipeline generates new subtraction images within an hour (Nugent et al. 2010), subtracting an older, deep ‘reference’ image from the new observations. The two images are photometrically matched using the hotpants program, an implementation of the Alard (2000) algorithm. Candidate transient events are then identified as ≥ [...]σ detections in the subtraction images using SExtractor (Bertin & Arnouts 1996). Fluxes and various other relevant parameters are measured before storing all candidates in a database. Each candidate is also ‘scored’ (producing the PTF ‘real-bogus’ value) using a machine-learning algorithm (the ‘PTF robot’) based on the characteristics of the detection and previous history of the candidate (Bloom et al., in prep.). The vast majority (∼ [...] σ must be less than 6, and the number deviating by more than 3σ must be less than 2.
(iii) Each candidate must be seen in at least one image taken in the previous 10 nights (including the night of detection), a constraint designed to remove fast moving solar system objects.
(iv) Candidates within 1″ of previously located objects (excluding the previous 10 nights) are removed to avoid the repeated detection of (e.g.) AGN or variable stars.

The effectiveness of these cuts means that a typical full night of PTF observing will yield ∼ [...] further sorted using only a short decision tree in Galaxy Zoo Supernovae.

Though the ultimate aim is to make the human scanners redundant with a fully automated machine-learning classification pipeline, at the current time a substantial amount of human scanning is still required to identify the good candidates (in part, this scanning can be used to train machine-based methods).
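As a rough illustration of cut (iv) above, the positional cross-match against previously located objects might be sketched as follows. This is a minimal sketch and not the PTF pipeline code: the function names, the representation of positions as (RA, Dec) tuples in degrees, and the choice of the haversine formula are all assumptions made for illustration. Only the 1″ matching radius is taken from the text.

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in, arcsec out).

    Uses the haversine formula, which stays accurate at small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    half_ddec = (dec2 - dec1) / 2.0
    half_dra = (ra2 - ra1) / 2.0
    h = (math.sin(half_ddec) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(half_dra) ** 2)
    return 2.0 * math.asin(math.sqrt(h)) * ARCSEC_PER_RADIAN


def passes_position_cut(candidate, known_objects, radius_arcsec=1.0):
    """Return False if the candidate lies within radius_arcsec of any
    previously located object (hypothetical helper, illustrating cut iv)."""
    ra, dec = candidate
    return all(
        angular_separation_arcsec(ra, dec, kra, kdec) > radius_arcsec
        for kra, kdec in known_objects
    )
```

In this sketch a candidate with no previously located neighbours always passes, since `all()` over an empty list is true.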
Candidates are inspected visually by human scanners in the PTF team, using a web interface to reject false transient detections. The human scanner can dynamically alter a set of cuts to control the candidates that are shown for a given image, including the signal-to-noise, shape parameters, the full-width half-maximum (FWHM) of the candidate compared to the global image value, and the output score from the machine classifier. Based on the cuts chosen, the scanner is presented with a series of detection ‘triplets’ – each triplet contains three images showing the current image of the field (containing SN light together with all other objects), the historical or reference image of the same field (with no SN light), and the difference between the two (which should contain only the SN light). Examples of triplets are shown in Fig. 1. The human scanner then decides, based on his or her subjective (but informed) judgement, whether each of the candidates presented is a real transient event, and if so marks that candidate as either a SN-like transient or a variable star.

The primary goal of Galaxy Zoo Supernovae in PTF is to initially supplement, but perhaps ultimately replace, the role of the PTF human scanners. By presenting a transient candidate to a number of different classifiers not only is the time of the PTF team freed to spend on tasks not suitable for the general public, but the potential for mis-classification of candidates due to individual human error is significantly reduced. The 5-day and dynamical cadence programs in PTF collect data on every night of the year from March to November (weather permitting) and on each night 2–4 of the PTF team share the scanning tasks, examining ∼500 candidates. This not only requires several person-hours of work, but the large number of classifications by a small number of PTF scanners is likely to contain errors, and this is where the repeat-classification by Galaxy Zoo Supernovae volunteers can help.

The Galaxy Zoo Supernovae project also has other aims. A longer-term goal is to provide sufficient classification data for the training and improvement of the PTF machine-learning classification algorithm. A final consideration is to build expertise in the citizen science community for future transient surveys, which of course generate many more candidates than PTF, perhaps approaching thousands of genuine candidates on a nightly basis.
Figure 1.
Four example detection triplets from PTF, similar to those uploaded to Galaxy Zoo Supernovae. Each image is 100″ on a side. In each triplet, the panels show (from left to right) the most recent image containing the candidate SN light (the science image), the reference image generated from data from an earlier epoch with no SN candidate light, and the subtraction or difference image – the science image minus the reference image – with the SN candidate at the centre of the crosshairs. The two triplets on the left are real SNe, and were highly scored by the Zoo; the triplets on the right are not SNe and were given the lowest possible score.
The Galaxy Zoo Supernovae website (http://supernova.galaxyzoo.org/) is built using the Zooniverse (http://zooniverse.org) Application Programming Interface (API) toolset. The Zooniverse API is the core software supporting the activities of all Zooniverse citizen science projects. Built originally for Galaxy Zoo 2, the software is currently being used by six different projects. The Zooniverse API is designed primarily as a tool for serving up a large collection of ‘assets’ (for example, images or video) to an interface, and collecting back user-generated interactions with these assets.

So that the project website can retain a high performance during spikes of activity, Galaxy Zoo Supernovae is hosted on Amazon Web Services which provides a virtualised machine environment that can auto-scale in size based upon server load. The site uses the Elastic Compute Cloud (EC2; http://aws.amazon.com/ec2) for web/database servers and the Simple Storage Service (S3; http://aws.amazon.com/s3) for image storage.

Image assets are presented to volunteers of the website through custom user interfaces, designed to aid the volunteer in classifying the object. For many projects this interface takes the form of a decision tree which walks the volunteer through a number of questions concerning the current image. The interaction of the volunteer with the website produces a set of ‘annotations’ which together constitute a ‘classification’ of the asset. These are stored for later analysis or in the case of Galaxy Zoo Supernovae are scored in real-time to change the behaviour of the website.

Similar in nature to the original Galaxy Zoo 2 interface, Galaxy Zoo Supernovae is a classic example of a ‘Zoo’. When a new highly-scored candidate is located in the PTF pipeline, an image triplet (Fig. 1) of the candidate is automatically uploaded, together with a small amount of metadata, to the Galaxy Zoo Supernovae API. Upon upload, the image is saved to Amazon S3 (a file hosting service) and registered with the website.
Finding new SNe is time critical and our method of automatically registering new assets with the API means that classifiers are inspecting SN candidates discovered just hours earlier. The interface for Galaxy Zoo Supernovae presents these candidate detection triplets (just as with the PTF human scanners, § [...]

The decision tree developed to assist volunteers in classifying candidates is described in Fig. 3. This decision tree is designed to remove as many false candidates as possible, without losing real, scientifically interesting events. In this respect the decision tree is conservative in the candidates that are removed to minimise the number of false negatives. The tree proceeds as follows:
(i)
Is there a candidate centered in the crosshairs of theright-hand image?
The PTF subtraction pipeline can occasionally undergo a failure and report (and therefore upload to the site) a ‘good’ candidate that is actually an error in the processing. This can be due to large (several pixel) mis-alignments of the two images being analysed, often localised in a particular part of the CCD where the astrometric solution fails. Other sources of failure include saturated pixels or bleed trails from bright stars, or problems with the pipeline flat-fielding. The SExtractor detection algorithm can also sometimes detect a noise peak rather than a real transient. Though the basic cuts made by PTF remove most of these errors, on occasion they are ranked highly and uploaded to Galaxy Zoo Supernovae (emphasising the need for human classifiers). Therefore, the first question in the decision tree is designed to remove such objects. The right-hand images in the triplets in Fig. 1 are the focus of this question.
Figure 2.
A schematic showing the data acquisition and analysis in Galaxy Zoo Supernovae: Raw data is processed by the PTF pipeline, automatically uploaded to the API, presented, analysed and scored by the Zooniverse community and available for review by the PTF science team. [Diagram labels: PTF pipeline; Zooniverse API; Admin/Results interface (science team); Result; Classification (general public); Amazon Web Services.]
(ii)
Has the candidate itself subtracted correctly?
Small mis-alignments between the reference and science image can result in image subtraction problems, usually indicated by a dipole of positive and negative pixels in the subtraction image. The cores of bright (but not saturated) stars can also mis-subtract, and result in ‘bullseye’ patterns in the subtraction images. This question is designed to flag such candidates.
(iii)
Is the candidate star-like and approximately circular?
This question is designed to remove unidentified cosmic rays, or diffuse/non-circular candidates which result from image subtraction problems. The volunteer is asked if the candidate looks like a round, symmetrical dot (star). Candidates that are very small (1–2 pixels, i.e., not PSF-like), elongated or otherwise distorted, or diffuse would trigger a negative response to this question.
(iv)
Is the candidate centered in a circular host galaxy?
The final question is more subjective, and is designed to categorise real astrophysical transients into two broad categories. Many of the transients which PTF detects are variable stars lying within our own galaxy, which are of interest to a different set of science users than extra-galactic transients. Variable star transients will appear to lie in ‘hosts’ that are circular (as they are stars), and will also appear to be located in the centre of these hosts. By contrast, SNe will either have no host galaxy, or will lie (probably off-centre) in a large diffuse host galaxy. This question therefore broadly splits the real transients into variable stellar transients, and SNe. Most SNe that do happen to lie in the centres of their host galaxies will not be categorised as variable stars – the question also requires the ‘host galaxy’ to be circular.

A full tutorial is available to new volunteers of the website to illustrate the different questions using real PTF data.

[Flowchart for Fig. 3:
Q: Is there a candidate centered in the crosshairs of the right-hand image? No (-1) / Yes
Q: Has the candidate itself subtracted correctly? No (-1) / Yes
Q: Is the candidate star-like and approximately circular? No (-1) / Yes (+1)
Q: Is the candidate centred in a circular host galaxy? No (+2) / Yes
Q: What is wrong with the subtraction? No candidate / Not all pixels positive / Poor subtraction
Q: What is wrong with the candidate? Not circular - too small / Not circular - elongated / Not circular - distorted / Not circular - diffuse]
Figure 3.
The decision tree that a Galaxy Zoo Supernovae volunteer is presented with when classifying a candidate (see § [...]

Once a volunteer has examined a candidate, their response is converted into a score, S, as follows.
• The initial score is zero.
• If a classifier answers negatively any question up to and including ‘Is the candidate star-like and approximately circular’, the candidate is given a score of -1.
• If a classifier instead answers positively up to that question, then the candidate is given a score of +1.
• If the classifier then also marks the candidate as not centred in a circular host, then the candidate gains an additional score of 2.
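The scoring rules listed above can be sketched as a small function. This is a minimal illustration rather than the website's actual code: the function name and the boolean-argument representation of a volunteer's answers are assumptions, with `None` standing for a question that was never reached in the tree.

```python
from typing import Optional


def classification_score(has_candidate: bool,
                         subtracted_correctly: Optional[bool] = None,
                         star_like: Optional[bool] = None,
                         in_circular_host: Optional[bool] = None) -> int:
    """Convert one volunteer's answers into a score S of -1, 1 or 3.

    Arguments mirror the four decision-tree questions, in order.
    """
    # A 'no' to any of the first three questions ends the tree with S = -1.
    if not (has_candidate and subtracted_correctly and star_like):
        return -1
    # Three positive answers give S = +1.
    score = 1
    # A candidate NOT centred in a circular host gains an additional 2.
    if in_circular_host is False:
        score += 2
    return score
```

For example, `classification_score(True, True, True, False)` returns 3, the score of the most promising SN candidates, while `classification_score(True, False)` returns -1.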
The structure of the decision and scoring of the questions means that candidates can only end up with a score of -1, 1 or 3 from each classification, with the most promising SN candidates scored 3. As each new classification is received, the arithmetic mean score (S_ave) of the candidate is recalculated. Candidates which are not astrophysically interesting tend to have S_ave < − [...] S_ave > 0, and SNe tend to have S_ave > [...] (§ [...]

(i) Unseen – Candidates which have 3 or fewer classifications.
(ii) Bulk – Candidates which have been classified between 3 and 10 times.
(iii) Stragglers – Candidates which have been classified more than 10 times, but which do not have a ‘definitive’ S_ave (i.e., those with 0.[...] < S_ave < [...]).
(iv) Done – Candidates which have been classified more than 10 times and which have S_ave < [...] or S_ave > [...].7, and candidates which have been classified more than 20 times.

Candidates in the ‘unseen’ category are given absolute precedence over all others in an aim to get an initial understanding of the quality of the candidate; they are shown in order of upload time followed by the real-bogus score. Once these are completed, the ‘bulk’ and ‘straggler’ candidate classes have equal priority. We select randomly between the two classes, choosing the newest candidate with the highest score from each group – as a candidate begins to receive ‘positive’ classifications (i.e., S of 1 or 3) then it is prioritised above any others thus allowing rapid identification of the most interesting targets.

The choice of 10 classifications as the first point at which a candidate can be considered classified is a compromise between the robustness of the classification and speed. Clearly, the greater the number of classifications required for each candidate the slower the classification process proceeds; yet the process must be robust against both user mistakes (i.e., clicking the wrong button) and misunderstanding.

The aim is to both quickly classify the best, high-scoring candidates (which will rapidly exceed the S_ave = 1.[...] [...] S_ave = 0.[...] [...]

The science of Galaxy Zoo Supernovae relies on new candidates being classified rapidly, and those classifications then being easily accessible to the science team.
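The four priority classes described above (unseen, bulk, stragglers, done) can be illustrated with a short sketch. The function name is hypothetical, and the two S_ave thresholds are placeholders: the exact values are not legible in this copy of the text, so `LOW_THRESHOLD` and `HIGH_THRESHOLD` below are assumptions chosen only to make the example runnable. The classification counts (3, 10 and 20) follow the text.

```python
from statistics import mean

# Placeholder thresholds (assumed values, not taken from the paper):
# S_ave below LOW or above HIGH counts as a 'definitive' mean score.
LOW_THRESHOLD = 0.2
HIGH_THRESHOLD = 1.7


def priority_class(scores, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Assign a candidate to a priority class from its list of scores S."""
    n = len(scores)
    if n <= 3:
        return "unseen"       # 3 or fewer classifications
    if n <= 10:
        return "bulk"         # up to 10 classifications
    if n > 20:
        return "done"         # more than 20 classifications
    s_ave = mean(scores)      # arithmetic mean score, S_ave
    if s_ave < low or s_ave > high:
        return "done"         # more than 10 classifications, definitive S_ave
    return "straggler"        # more than 10 classifications, ambiguous S_ave
```

With these placeholder thresholds, eleven classifications of S = 3 give `"done"`, whereas eleven classifications of S = 1 leave the candidate a `"straggler"` until it passes 20 classifications.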
A key part of the Galaxy Zoo Supernovae website is a science ‘dashboard’ for the PTF team. The science dashboard provides basic statistics on the number of candidate uploads, classifications and volunteers versus time as well as a more in-depth breakdown of the classification history for a candidate or individual.

Custom views have been created which break down a score-ranked list of candidates for each day and week, allowing observing teams to use these rankings to help in the identification of good candidates for follow-up observations. Candidates already identified as PTF transients show the PTF identifier on the science dashboard and a link is also provided to allow the science team to easily mark a highly-ranked candidate from the Zoo in the PTF database.
In order to improve the rate at which objects are classified, an automated alert system that monitors the number of candidates being uploaded to the website is used. Should the number of unclassified candidates reach a threshold, the website sends an automated ‘alert’ to Galaxy Zoo Supernovae subscribers. These (email) alerts are usually sent out once per day, coinciding with the end of a night’s candidates being uploaded from NERSC, and usually result in the full complement of candidates being classified within a few hours.
Providing feedback to the Galaxy Zoo Supernovae com-munity is a vital part of the overall website experience toencourage volunteers to return to the website. This is partlydone using forums and blogs where scientists can commenton individual events classified by the zoo. In addition, eachvolunteer can view a history of the candidates that theyhave classified on their ‘My Supernovae’ page.The ‘My Supernovae’ (MySN) page displays the can-didate triplets. Those which have been observed areoverlaid with a small symbol identifying the candidate as c (cid:13) , 000–000 alaxy Zoo Supernovae a SN, variable star, or asteroid. Clicking on one of thecandidates also allows the volunteer to see the averagerating across all classifications, the number of classifiersand whether the candidate was selected for followup bythe PTF team. PTF observers are encouraged to leavecomments on the science dashboard that the classifiers canalso see on their MySN page. Galaxy Zoo Supernovae was first trialled on two specificoccasions supporting PTF spectroscopic follow-up obser-vations at the 4.2m William Herschel Telescope (WHT),in August 2009 and October 2009. The selection of thecandidates observed by WHT was guided by the Zoo results,with a particular emphasis on comparing the classificationsproduced by Galaxy Zoo Supernova with those produced byPTF human scanners working on the same data. The top 20scored candidates from this initial trial run of Galaxy ZooSupernovae are shown in Fig. 4. Sixteen of these candidateswere observed by WHT; 15 were confirmed as SNe, with 1cataclysmic variable.Since April 2010, Galaxy Zoo Supernovae has beenrunning full-time on PTF candidates, and by July 15th2010 had classified (cid:39) ,
900 SN candidates at the rate ofseveral hundred candidates per observing night. In all butthe earliest weeks of the project, all submitted candidateswere classified by the zoo. This classified sample forms thebasis of our analysis in this section. A distribution of thescores ( S ave ) for all of these candidates can be found inFig. 5. The bulk of the candidates uploaded are classifiedas likely not astrophysically real events, and correspondto subtraction artefacts or other reduction problems. Thisis indicative of the conservative cuts that are made in thePTF pipeline to avoid losing real SN events for follow-up,and highlights the currently essential requirement for visualinspection of the pipeline candidates. The performance of the public at classifying candidates canbe gauged by comparing with the classifications the PTFteam assigned to the same objects. The PTF team broadlyclassify objects into 4 visual categories: not interesting (notassigned a type), asteroids, variable stars, and transients(such as SNe). Asteroids are not screened for by Galaxy ZooSupernovae – only one image is uploaded for each PTF can-didate, which clearly cannot be used to distinguish movingobjects. Asteroids are typically removed from the candidatelist prior to upload by insisting on two separate detectionsof a candidate within 1 (cid:48)(cid:48) of each other, though this processis not perfect, particularly with slow moving asteroids wherethe apparent motion can be only a few arcseconds a day.To illustrate the performance of Galaxy Zoo Supernovae,we split the candidates by their PTF assigned categoriesand calculate the fraction in each category as a functionof S ave . Fig. 6 is a stacked box plot of the results. At lowscores practically all candidates are those which the PTFteam decide are not interesting: these will include poor N u m be r −1 0 1 2 3S ave Figure 5.
The distribution of all of the scores ( S ave ) for all ofthe PTF candidates classified by Galaxy Zoo Supernovae betweenApril 2010–July 2010. (cid:39) ,
900 candidates were classified. Thebulk of these – (cid:39)
70% – were classified as not astrophysically realby the zoo ( S ave < −1 0 1 2 3S ave F r a c t i on o f c and i da t e s Likely SNLikely astrophysicaltransient
No typeTransientVarStarAsteroidNo typeTransientVarStarAsteroid
Figure 6.
A break down of the classifications collected duringoperations of Galaxy Zoo Supernovae. The bars show the dis-tribution of candidate types (as determined by the PTF team)for a given zoo score ( S ave ). The PTF team potentially assign aclassification of asteroid, variable star, or transient to each zoocandidate – objects without a PTF classification are deemed notto be interesting. Galaxy Zoo Supernovae is not designed to flagmoving objects, which are largely removed before upload. Notethat not all variable stars will be saved to the PTF database, sothis category is likely highly incomplete (and the variables starswill appear as “No type”). subtractions, artefacts/cosmic rays, etc.. As S ave increaseswe see a steady rise in the number of both variable starcandidates and transients. By a score of around 1 .
4, variablestars are no longer selected, and instead the majority of thecandidates are SN-like transients.A number of caveats should be borne in mind when c (cid:13) , 000–000 Smith et al.
PTF09eclCV PTF09ffgunknownPTF09epzSN Ia PTF09dxvSN IIPTF09vxSN Ia PTF09dnpSN IaPTF09afwunknown PTF09siSN IaPTF09bgrunknown PTF09dqtSN IaPTF09dahSN II PTF09dlcSN IaPTF09csjSN Ia PTF09akbSN IaPTF09aluSN Ia PTF09dnqSN IaPTF09dicSN Ia PTF09bdbSN IaPTF09dhxSN Ia PTF09amrunknown
Figure 4.
A montage of the 20 highest ranked PTF candidates from the October testing of the website. Each set of three images shows,from left to right, the new image, the reference image, and the subtraction image. The position of the candidate is shown in each panelby the crosshairs. The candidate name and the spectroscopic type from the WHT (where available) are also shown.c (cid:13) , 000–000 alaxy Zoo Supernovae −1 0 1 2 3S ave F r a c t i on o f c and i da t e s All candidatesSNeAll candidatesSNe
Figure 7.
A breakdown of the scores (S_ave) for the 140 known SNe identified via PTF follow-up spectroscopy (grey histogram). For reference, the distribution of the S_ave measures for all the objects is shown as the open histogram. These classifications were collected during April–July 2010.

examining this plot. The first is that not all variable stars identified by Galaxy Zoo Supernovae will be assigned that type by the PTF scanners. As the primary goal of PTF is the study of explosive transients, variable stars are frequently not recorded in the PTF catalogue (i.e., they will be assigned “No type” in Fig. 6). The second caveat is that each PTF candidate is potentially observed on many epochs over a period of several weeks, yet should only be uploaded to Galaxy Zoo Supernovae once. If there is some problem with the particular epoch that is uploaded to the zoo (a poor image subtraction, or poor seeing conditions), then a real astrophysical event may be poorly scored by the zoo on that epoch. However, that candidate may still be saved by a human scanner based on an image from a different epoch. Thus real transient events can occasionally be poorly scored by the zoo if the uploaded image is of poor quality; this is the case for some of the real transients that scored S_ave < 0. Finally, it is important to note that the true nature of many of the candidates remains unknown, and the comparison drawn here is between the zoo selection and that of a subjective (though experienced) expert opinion.

Figure 6 demonstrates that Galaxy Zoo Supernovae is capable of prioritising good candidates, and that the highest ranked candidates are likely to be SNe rather than variable stars. The candidates which were classified as asteroids in the Galaxy Zoo Supernovae sample are given a relatively high score by the zoo volunteers; they typically mimic high-quality ‘hostless’ transient events.

Some of the Galaxy Zoo Supernovae classified candidates were observed spectroscopically by the PTF collaboration, as were candidates identified by other techniques. We examine the S_ave distribution for these ∼
140 spectroscopically confirmed SNe (Fig. 7), equivalent to ∼ […] S_ave > 0) by the zoo (60% have S_ave > […] S_ave < […] S_ave < 0. Though this may represent a slightly biased test (low-scored candidates are less likely to be followed spectroscopically), there are other techniques for screening candidates within PTF that complement the zoo and partially mitigate this bias. These include the PTF robot (§ […]

Fig. 8 shows candidate scores from Galaxy Zoo Supernovae as a function of the photometric apparent R magnitude of the candidate (with the host light subtracted) and the magnitude error, both taken from the P48 PTF search pipeline. We plot these relations separately for spectroscopically confirmed SNe and PTF transients, and show the comparison with all PTF candidates as a set of contours. This latter comparison highlights just what a small fraction of all the PTF candidates the real SNe and transients represent.

Fig. 8 shows a few interesting trends. For the confirmed SNe, there is a mild decrease in S_ave as the candidates become fainter (or have a larger error), at about 3σ significance, or ∼ […] σ when considering the magnitude error. (There is an equivalent trend for all the PTF transients.) This is expected: at fainter magnitudes, SNe become harder to identify visually with a noisier detection, and the classification becomes more subjective. The SNe are also likely to be at higher redshift, and thus perhaps appear more centrally located in fainter host galaxies and are more likely to fail the final step in the decision tree (Fig. 3).

Nonetheless, Galaxy Zoo Supernovae clearly identifies and scores highly the bulk of the SNe from PTF, and at bright to intermediate magnitudes the separation of SNe is robust. Even at fainter magnitudes, the majority of the SNe score S_ave >
0, and above a detection significance of ∼ […] σ, the zoo scores all SNe and the vast majority of PTF transients at S_ave > […]

An analysis of the scoring model can reveal optimisations that can be made to the number of classifications required
[Figure 8 plot: S_ave (−1 to 3) versus candidate apparent magnitude (R, 16 to 21; left panel) and versus sigma of detection (100 to 5; right panel); legend: PTF Transients, PTF SNe, Average SN score.]
Figure 8.
The Galaxy Zoo Supernova scores S_ave of PTF candidates of various types as a function of their apparent R detection magnitude (left) and the error in that magnitude (right). The filled circles show PTF objects believed to be SN-like transients, filled squares show the confirmed SNe, while the contours show the distribution of all ∼ […],000 PTF candidates. The open squares show the average SN scores in bins of magnitude (or magnitude error). For these candidates, the trend of decreasing score with increasing magnitude is significant at about 3σ, and with increasing magnitude error at ∼ […] σ. Note that only detections of 5σ significance or greater are uploaded to the zoo, hence the cut-off in the right-hand panel.

for each candidate. As an example, we show the ‘trajectory’ of S_ave for PTF candidates as a function of the number of classifications in Fig. 9. As expected, the variation in S_ave when adding additional classifications is larger when the total number of classifications is small than when many classifications are available. It is also apparent that once ∼
15 classifications have been received, very few candidates change S_ave significantly.

We also examine the dispersion in each of the Galaxy Zoo Supernova scores, as calculated from the individual classifications, as a function of the scores themselves. Fig. 10 plots the mean absolute deviation in the score of each classified candidate as a function of the final candidate score. As the individual scores from which each S_ave is calculated are highly quantised (each classification can only result in a score of −1, 1 or 3), the resulting plot is highly structured. In particular, objects with S_ave of −1 or 3 must have a dispersion of zero, and a further dip in the dispersion is also seen around the third scoring possibility, 1. While in principle the dispersion in the score might be thought of as a good measure of the classification confidence (measuring, in essence, the agreement between individual classifiers), the current simple decision tree is not refined enough to allow this statistic to be useful.

There is therefore some room to improve the scoring model used by Galaxy Zoo Supernovae (and hence the efficiency of the project). A detailed analysis of the data in Fig. 9 shows that reducing the number of classifications needed before a candidate is considered classified (§ […]) from > 20 to > 15 (for intermediate scoring events) and from > 10 to > […] ∼ […] < […] S_ave = 1.[…] S_ave = 0.[…]

[Figure 9 plot: S_ave versus number of classifications; legend: SNe, S_ave > 1.7; SNe, 0.2 < S_ave < 1.7; SNe, S_ave < 0.2.]
Figure 9.
The Galaxy Zoo Supernova scores S_ave of PTF candidates of various types as a function of the number of classifications they have received. Each line represents a spectroscopically confirmed PTF SN. Those in red have a final S_ave > 1.7, those in black a final S_ave < 0.2, and those in blue intermediate scores. Only candidates scored with 0.[…] < S_ave < […] S_ave for clarity.

In principle, an analysis of which volunteers consistently get the classifications correct (when compared to a professional astronomer or a spectroscopic classification) could be used to weight different volunteer responses. For example, an experienced classifier with a consistent history of correct responses could have a larger weight than a novice volunteer; there is evidence from Fig. 9 that even
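[Figure 10 plot: mean absolute deviation of S_ave versus S_ave (−1 to 3); points: PTF Transients, PTF SNe; histograms: All candidates, SNe.]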
Figure 10.
The dispersion (mean absolute deviation) of the Galaxy Zoo Supernova scores as a function of the scores themselves (S_ave) for PTF candidates of various types. The filled circles show PTF objects believed to be SN-like transients, filled squares show the confirmed SNe, while the contours show the distribution of all ∼ […],000 PTF candidates. The histograms show the distribution of the mean absolute deviations in S_ave for confirmed SNe and for all PTF candidates. In principle, with a more refined scoring grid the dispersion in each candidate score could be used as a measure of the confidence of the final classification, but our current simple decision tree does not permit this.

good SN candidates can receive the lowest score (many SN trajectories start at an S_ave = −1 […] S_ave more quickly.

To date, over 13,000 individuals from the Zooniverse community have visited the Galaxy Zoo Supernovae site and 2,800 have classified one or more SN candidates. This project relies upon the rapid classification of SN candidates and, although the community is relatively small compared to e.g. Galaxy Zoo, a combination of email alerts and a committed core of a few hundred individuals has made Galaxy Zoo Supernovae a success.

An analysis of the fraction of classifications contributed compared to the average number of classifications per user shows that close to 90% of the classifications in Galaxy Zoo Supernovae are contributed by less than 20% of the community. In the Galaxy Zoo 2 project (Masters et al. 2010, Lintott et al., in prep.), close to 50% of the classifications were by individuals whose total classification count was less than 10 galaxies; for Galaxy Zoo Supernovae that fraction is 3%.
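The score statistics used in this section (S_ave, its trajectory with the number of classifications as in Fig. 9, and the mean absolute deviation of Fig. 10) can be illustrated with a minimal Python sketch. It assumes only what the text states, namely that each classification scores −1, 1 or 3 and that S_ave is the average of those scores. The function names, the choice of the mean as the centre for the deviation, the weighted variant and the example history are illustrative assumptions, not the project's actual code or data.

```python
# Illustrative sketch only; names and example data are hypothetical.

# The three scores a single classification can produce (from the text).
ALLOWED_SCORES = (-1, 1, 3)


def s_ave(scores):
    """Average score of one candidate over its individual classifications."""
    if not scores:
        raise ValueError("at least one classification is required")
    if any(s not in ALLOWED_SCORES for s in scores):
        raise ValueError("each classification must score -1, 1 or 3")
    return sum(scores) / len(scores)


def mean_abs_deviation(scores):
    """Mean absolute deviation of the individual scores about S_ave.

    Assumption: the deviation is taken about the mean score.
    """
    centre = s_ave(scores)
    return sum(abs(s - centre) for s in scores) / len(scores)


def trajectory(scores):
    """Running S_ave after each successive classification (cf. Fig. 9)."""
    running, total = [], 0
    for n, s in enumerate(scores, start=1):
        total += s
        running.append(total / n)
    return running


def weighted_s_ave(scores, weights):
    """Hypothetical weighted average, e.g. weights from a volunteer's record."""
    if len(scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one non-negative weight per score, not all zero")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical classification history for a single candidate.
history = [-1, 3, 3, 1, 3, 3, 1, 3]
print(trajectory(history))
print(s_ave(history), mean_abs_deviation(history))
```

Because the scores are quantised, a candidate can only reach S_ave = −1 or S_ave = 3 if every classifier agreed, which is why the dispersion must vanish at those two values. The example history also shows how a candidate that starts at −1 can still converge to a high score once more classifications arrive.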
This paper has introduced Galaxy Zoo Supernovae, a new web-based citizen science project modelled after ‘Galaxy Zoo’, that uses members of the public to identify good supernova candidates from wide-field imaging data. Using data from the Palomar Transient Factory (PTF), we have shown that the citizen scientists are extremely good at identifying real SNe from amongst the thousands of candidates that PTF generates, with only a small ‘false negative’ rate at the faintest candidate magnitudes.

Clearly, Galaxy Zoo Supernovae is not restricted to PTF data and can in principle be applied to any future imaging survey, such as SkyMapper (e.g., Keller et al. 2007) or Pan-STARRS-1 (e.g., Kaiser 2004). The candidate upload mechanism is flexible, and the triplet format (Fig. 1) simple, with custom results pages easily produced for individual surveys. Perhaps the most exciting aspect for massive future transient surveys such as the Large Synoptic Survey Telescope (LSST) will be the use of Galaxy Zoo Supernovae classification data to improve the training and accuracy of automated machine-learning transient classifiers (Starr et al. 2009).

The underlying concept of Galaxy Zoo Supernovae is easily extended. For example, there is no need to restrict the project to single images of new transient events. Multiple images of a potential SN from different epochs, i.e. a candidate history, could also be uploaded to improve the accuracy of the classifications and thus reduce the possibility of a mis-classification due to a single poor subtraction. If this included data from before the candidate was first detected, those candidates with a history of poor subtractions could quite trivially be eliminated. Those asteroids and moving objects which do get uploaded could also be removed by visually comparing the candidate position on several epochs. Galaxy Zoo Supernovae could also be used to identify new transients triggered by detections at other wavelengths, for example to quickly identify optical counterparts to gamma-ray bursts, where previous optical reference images might not exist and a timely search is critical for follow-up.

Galaxy Zoo Supernovae could also be used for precise volumetric SN (or any transient) rate determinations. In these calculations, the efficiency of the search (the ratio of recovered to actual SN events) needs to be accurately known, as a function of apparent magnitude and other SN properties. By uploading ‘fake’ candidates (artificial SN events inserted into the images) as well as real SNe, the reliability of the zoo can be determined accurately, allowing the discovery rate to be converted into a real physical SN rate.

With the discovery stream of new transient types becoming ever larger, and the dramatic increase set to continue with future surveys such as the LSST, the burden of identifying the best new candidates increases correspondingly. By engaging the considerable interest and enthusiasm of the public, we have demonstrated that citizen science projects like Galaxy Zoo Supernovae can play a major role in ongoing and future transient surveys.
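The efficiency measurement proposed above for rate determinations, using ‘fake’ candidates, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a fake SN counts as recovered when its S_ave exceeds a threshold (taken here as 0, following the positive-score criterion discussed earlier), efficiency is binned in apparent magnitude only, and all function names, bin edges and data values are hypothetical. Converting a corrected count into a volumetric rate would further require the survey volume and time baseline, which are not modelled here.

```python
# Illustrative sketch only; names, threshold and example data are hypothetical.


def recovery_efficiency(fakes, bin_edges, threshold=0.0):
    """Fraction of injected fake SNe scored above `threshold`, per magnitude bin.

    fakes: iterable of (apparent_magnitude, s_ave) pairs for injected events.
    bin_edges: ascending magnitude edges; bin i covers [edges[i], edges[i + 1]).
    Returns one efficiency per bin, or None where no fakes were injected.
    """
    n_bins = len(bin_edges) - 1
    injected = [0] * n_bins
    recovered = [0] * n_bins
    for mag, score in fakes:
        for i in range(n_bins):
            if bin_edges[i] <= mag < bin_edges[i + 1]:
                injected[i] += 1
                if score > threshold:
                    recovered[i] += 1
                break
    return [r / n if n else None for r, n in zip(recovered, injected)]


def corrected_count(n_found, efficiency):
    """Number of actual events implied by n_found detections at this efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    return n_found / efficiency


# Hypothetical injected events: (apparent R magnitude, S_ave from the zoo).
fakes = [(17.2, 2.4), (17.8, 1.6), (18.5, 1.9), (18.9, -0.2),
         (19.4, 0.8), (19.7, -1.0), (20.3, 0.4), (20.6, -0.6)]
edges = [17, 18, 19, 20, 21]

efficiency = recovery_efficiency(fakes, edges)
print(efficiency)
```

Dividing the number of real SNe found in a magnitude bin by the efficiency of that bin gives an estimate of the number that actually occurred there, which is the correction step the text refers to.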
ACKNOWLEDGEMENTS
We acknowledge the valuable contributions of the Zooniverse community without which this project would not have been possible. AS acknowledges support from the Leverhulme Trust. MS acknowledges support from the Royal Society. MS and AG acknowledge support from a Weizmann–UK “Making connections” grant. CJL acknowledges support from the STFC Science in Society Program and The Leverhulme Trust. PEN acknowledges support from the US Department of Energy Scientific Discovery through Advanced Computing program under contract DE-FG02-06ER06-04. KS acknowledges support from a NASA Einstein Postdoctoral Fellowship grant number PF9-00069, issued by the Chandra X-ray Observatory Center, which is operated by the Smithsonian Astrophysical Observatory for and on behalf of NASA under contract NAS8-03060. JSB acknowledges support of an NSF-CDI grant “Real-time Classification of Massive Time-series Data Streams” (Award
REFERENCES
Alard C., 2000, A&AS, 144, 363
Astier P., Guy J., Regnault N., Pain R., Aubourg E., Balam D., Basa S., Carlberg R. G., Fabbro S., Fouchez D., Hook I. M., Howell D. A., Lafoux H., Neill J. D., Palanque-Delabrouille N., Perrett K., Pritchet C. J., Rich J., Sullivan M., Taillet R., Aldering G., Antilogus P., Arsenijevic V., Balland C., Baumont S., Bronder J., Courtois H., Ellis R. S., Filiol M., Gonçalves A. C., Goobar A., Guide D., Hardin D., Lusset V., Lidman C., McMahon R., Mouchet M., Mourao A., Perlmutter S., Ripoche P., Tao C., Walton N., 2006, A&A, 447, 31
Bertin E., Arnouts S., 1996, A&AS, 117, 393
Frieman J. A., Bassett B., Becker A., Choi C., Cinabro D., DeJongh F., Depoy D. L., Dilday B., Doi M., Garnavich P. M., Hogan C. J., Holtzman J., Im M., Jha S., Kessler R., Konishi K., Lampeitl H., Marriner J., Marshall J. L., McGinnis D., Miknaitis G., Nichol R. C., Prieto J. L., Riess A. G., Richmond M. W., Romani R., Sako M., Schneider D. P., Smith M., Takanashi N., Tokita K., van der Heyden K., Yasuda N., Zheng C., Adelman-McCarthy J., Annis J., Assef R. J., Barentine J., Bender R., Blandford R. D., Boroski W. N., Bremer M., Brewington H., Collins C. A., Crotts A., Dembicky J., Eastman J., Edge A., Edmondson E., Elson E., Eyler M. E., Filippenko A. V., Foley R. J., Frank S., Goobar A., Gueth T., Gunn J. E., Harvanek M., Hopp U., Ihara Y., Ivezić Ž., Kahn S., Kaplan J., Kent S., Ketzeback W., Kleinman S. J., Kollatschny W., Kron R. G., Krzesiński J., Lamenti D., Leloudas G., Lin H., Long D. C., Lucey J., Lupton R. H., Malanushenko E., Malanushenko V., McMillan R. J., Mendez J., Morgan C. W., Morokuma T., Nitta A., Ostman L., Pan K., Rockosi C. M., Romer A. K., Ruiz-Lapuente P., Saurage G., Schlesinger K., Snedden S. A., Sollerman J., Stoughton C., Stritzinger M., Subba Rao M., Tucker D., Vaisanen P., Watson L. C., Watters S., Wheeler J. C., Yanny B., York D., 2008, AJ, 135, 338
Hillebrandt W., Niemeyer J. C., 2000, ARA&A, 38, 191
Kaiser N., 2004, in Presented at the Society of Photo-Optical Instrumentation Engineers (SPIE) Conference, Vol. 5489, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, J. M. Oschmann Jr., ed., pp. 11–22
Keller S. C., Schmidt B. P., Bessell M. S., Conroy P. G., Francis P., Granlund A., Kowald E., Oates A. P., Martin-Jones T., Preston T., Tisserand P., Vaccarella A., Waterson M. F., 2007, Publications of the Astronomical Society of Australia, 24, 1
Law N. M., Kulkarni S. R., Dekany R. G., Ofek E. O., Quimby R. M., Nugent P. E., Surace J., Grillmair C. C., Bloom J. S., Kasliwal M. M., Bildsten L., Brown T., Cenko S. B., Ciardi D., Croner E., Djorgovski S. G., van Eyken J., Filippenko A. V., Fox D. B., Gal-Yam A., Hale D., Hamam N., Helou G., Henning J., Howell D. A., Jacobsen J., Laher R., Mattingly S., McKenna D., Pickles A., Poznanski D., Rahmer G., Rau A., Rosing W., Shara M., Smith R., Starr D., Sullivan M., Velur V., Walters R., Zolkower J., 2009, PASP, 121, 1395
Lintott C., Schawinski K., Bamford S., Slosar A., Land K., Thomas D., Edmondson E., Masters K., Nichol R., Raddick J., Szalay A., Andreescu D., Murray P., Vandenberg J., 2010, ArXiv e-prints
Lintott C. J., Schawinski K., Slosar A., Land K., Bamford S., Thomas D., Raddick M. J., Nichol R. C., Szalay A., Andreescu D., Murray P., Vandenberg J., 2008, MNRAS, 389, 1179
Masters K. L., Nichol R. C., Hoyle B., Lintott C., Bamford S., Edmondson E. M., Fortson L., Keel W. C., Schawinski K., Smith A., Thomas D., 2010, ArXiv e-prints
Nugent P., Cenko S. B., Miller A. M., Poznanski D., Bloom J. S., Filippenko A. V., Sullivan M., Howell D. A., Quimby R. M., Ofek E. O., Kasliwal M. M., Kulkarni S. R., Law N. M., Dekany R. G., Rahmer G., Hale D., Smith R., Zolkower J., Velur V., Walters R., Henning J., Bui K., McKenna D., Jacobsen J., 2010, The Astronomer’s Telegram, 2600, 1
Perrett K., Balam D., Sullivan M., Pritchet C., Conley A., Carlberg R., Astier P., Balland C., Basa S., Fouchez D., Guy J., Hardin D., Hook I. M., Howell D. A., Pain R., Regnault N., 2010, AJ, 140, 518
Rau A., Kulkarni S. R., Law N. M., Bloom J. S., Ciardi D., Djorgovski G. S., Fox D. B., Gal-Yam A., Grillmair C. C., Kasliwal M. M., Nugent P. E., Ofek E. O., Quimby R. M., Reach W. T., Shara M., Bildsten L., Cenko S. B., Drake A. J., Filippenko A. V., Helfand D. J., Helou G., Howell D. A., Poznanski D., Sullivan M., 2009, PASP, 121, 1334
Sako M., Bassett B., Becker A., Cinabro D., DeJongh F., Depoy D. L., Dilday B., Doi M., Frieman J. A., Garnavich P. M., Hogan C. J., Holtzman J., Jha S., Kessler R., Konishi K., Lampeitl H., Marriner J., Miknaitis G., Nichol R. C., Prieto J. L., Riess A. G., Richmond M. W., Romani R., Schneider D. P., Smith M., Subba Rao M., Takanashi N., Tokita K., van der Heyden K., Yasuda N., Zheng C., Barentine J., Brewington H., Choi C., Dembicky J., Harnavek M., Ihara Y., Im M., Ketzeback W., Kleinman S. J., Krzesiński J., Long D. C., Malanushenko E., Malanushenko V., McMillan R. J., Morokuma T., Nitta A., Pan K., Saurage G., Snedden S. A., 2008, AJ, 135, 348
Smartt S. J., 2009, ARA&A, 47, 63
Starr D. L., Bloom J. S., Brewer J. M., Butler N. R., Poznanski D., Rischard M., Klein C., 2009, in Astronomical Society of the Pacific Conference Series, Vol. 411, Astronomical Society of the Pacific Conference Series, D. A. Bohlender, D. Durand, & P. Dowler, ed., pp. 493–+