Processing Images from the Zwicky Transient Facility
Russ R. Laher, Frank J. Masci, Steve Groom, Benjamin Rusholme, David L. Shupe, Ed Jackson, Jason Surace, Dave Flynn, Walter Landry, Scott Terek, George Helou, Ron Beck, Eugean Hacopians, Umaa Rebbapragada, Brian Bue, Roger M. Smith, Richard G. Dekany, Adam A. Miller, S. B. Cenko, Eric Bellm, Maria Patterson, Thomas Kupfer, Lin Yan, Tom Barlow, Matthew Graham, Mansi M. Kasliwal, Thomas A. Prince, Shrinivas R. Kulkarni
arXiv (astro-ph.IM). In preparation for ApJ; draft of October 18, 2017.
Preprint typeset using LaTeX style AASTeX6 v. 1.0.
IPAC, Mail Code 100-22, Caltech, 1200 E. California Blvd., Pasadena, CA 91125, U.S.A.
Anre Technologies Inc., 3115 Foothill Blvd., Suite M202, La Crescenta, CA 91214, U.S.A.
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, U.S.A.
Caltech Optical Observatories, California Institute of Technology, Pasadena, CA 91125, U.S.A.
Center for Interdisciplinary Exploration and Research in Astrophysics, Northwestern University, Evanston, IL 60208, U.S.A.
Astrophysics Science Division, NASA Goddard Space Flight Center, Code 661, Greenbelt, MD 20771, U.S.A.
Department of Astronomy, University of Washington, Seattle, WA 98195, U.S.A.
Division of Physics, Mathematics, and Astronomy, California Institute of Technology, Pasadena, CA 91125, U.S.A.
ABSTRACT

The Zwicky Transient Facility is a new robotic-observing program, in which a newly engineered 600-MP digital camera with a pioneeringly large field of view, 47 square degrees, will be installed into the 48-inch Samuel Oschin Telescope at the Palomar Observatory. The camera will generate ∼ petabyte of raw image data over three years of operations. In parallel related work, new hardware and software systems are being developed to process these data in real time and build a long-term archive for the processed products. The first public release of archived products is planned for early 2019, which will include processed images and astronomical-source catalogs of the northern sky in the g and r bands. Source catalogs based on two different methods will be generated for the archive: aperture photometry and point-spread-function fitting.

Keywords: asteroids — stars: variables, binaries, supernovae, cataclysmic variables — galactic: active nuclei — techniques: image processing, photometric — methods: observational, data analysis

1. INTRODUCTION

The Zwicky Transient Facility (ZTF) is a new program for ground-based, optical, time-domain astronomy (Bellm et al. 2015), which goes well beyond its immensely successful predecessor programs, the Palomar Transient Factory (PTF) and eponymous intermediate program (iPTF). A digital camera specially developed for the ZTF, with a faster readout time and a much larger field of view, along with other enhancements relative to the PTF camera, such as a fast exposure shutter, is nearing the completion of its development phase, and installation into the 48-inch Samuel Oschin Telescope at the Palomar Observatory is planned for late summer of 2017. Substantial telescope and dome upgrades are part of this effort, including faster drives and improved optics.
In addition, the parallel development of hardware and software systems that can handle real-time post-processing of the massive amount of image data from the camera in normal operations, as well as the development of a long-term archive for the processed products, is underway. The data processing will be performed at IPAC, and the ZTF archive will be hosted by the NASA/IPAC Infrared Science Archive (IRSA, http://irsa.ipac.caltech.edu). The data-processing system and archive will rely heavily on relational databases to consolidate and warehouse information that can later be queried in a historical context for the discovery of astrophysical transient phenomena. The major science goals for the ZTF include discovering young supernovae, searching for electromagnetic counterparts to gravitational-wave sources, identifying stellar variables, and detecting near-Earth asteroids.

In Section 2, we give background information on the ZTF project, some camera details, and highlights of the observing strategy. In Section 3, the focus of this paper, we outline our design of the ZTF data-processing system and discuss our recent performance-test results. We emphasize that the system design is still evolving and the results up to this point in time are preliminary. We briefly describe the ZTF archive in Section 4. Finally, we provide a summary and outlook in Section 5.

2. PROJECT BACKGROUND

2.1. Programmatics
The ZTF’s predecessor programs, PTF and iPTF, have led to the publication of 185 papers in peer-reviewed journals (at the time of this writing; more PTF/iPTF papers are forthcoming). As an example, a paper on a regular type II supernova based in part on PTF data was recently published in Nature Physics (Yaron et al. 2017). The iPTF project officially ended in March of 2017. A separate enterprise has been underway since then to reprocess the PTF Galactic-plane data for inclusion in a future public data release.

The ZTF is a consortium of several U.S. and global institutions, known as the “collaboration”. Both private funds from the members and public funds from the National Science Foundation have been received for this project. Work on the ZTF began in earnest in 2014, and the current plan is for the ZTF to collect data for three years after regular operations begin. Data from the ZTF will fuel research projects for students, postdocs, and scientists at institutions in the collaboration and the public at large. There is also a related new initiative for undergraduate research known as the Summer Undergraduate Astronomy Institute (Penprase & Bellm 2017).

2.2. Camera
Detailed engineering papers on the development of the ZTF camera are given by Smith et al. (2014) and Dekany et al. (2016). A definitive instrument paper is in preparation by Dekany et al., and is due out by the end of 2017 or early 2018. First light with the camera permanently mounted to the telescope is scheduled for October 2017.

The ZTF-camera field of view is truly groundbreaking, and it will enable imaging of the entire Palomar sky each night. Figure 1 compares it to the field of view of other large-survey cameras, either currently in operation or under development. Table 1 lists the salient attributes of the camera for the planned observing.
Table 1. ZTF camera and nominal observing attributes

  Attribute                Value
  Field of view            47 square degrees
  Pixel scale              1 arc second per pixel
  Pixel size               15 µm
  CCD readout channels     4
  Exposure time            30 seconds
  Readout time             ≈ seconds
  Slew & settle time       +5 seconds
  Optical filters          ZTF g, r, and i
  Limiting magnitude       20.4 (r-band, 5σ)
  Mosaic of CCDs           × layout (16 CCDs)
  Pixels per CCD           ∼ K ×, ≈ megapixels

The median image quality delivered by the 48-inch telescope optics on Palomar Mountain is ≈ arc seconds. This motivated the design choice of 1 arc second Nyquist sampling for the camera.

2.3. Observing Strategy
The acquisition of science exposures will occur throughout each observing night, weather permitting. We are budgeting for 260 good-weather nights in our estimates. Depending on the time of year, we expect to acquire 600-800 science exposures per night, assuming one exposure taken every 45 seconds, which allows for 30-second camera exposures and the fixed readout time and concurrent slew & settle times of the camera and telescope (see Table 1). Scanning at least 3760 square degrees of the sky per hour will be possible. Calibration exposures (biases, dome flats, darks, etc.) will be acquired during the day. Fresh calibration products will be made before the nightly processing of science exposures begins.

During the first year and a half of operations, ZTF will conduct two general-purpose public surveys: a three-night-cadence survey of the visible Northern Sky, and a nightly sweep of the Galactic Plane. For both programs, fields will be visited twice each night they are observed, with approximately one hour separating a g-band exposure and an r-band exposure.

3. DATA PROCESSING

Laher et al. (2014) describe the image-processing system for the ZTF’s predecessor programs, PTF and iPTF. This experience is leveraged in our new design for the ZTF. Below we give an overview of the ZTF Data System. A more detailed description will appear in a future publication.

The science exposures will be processed in real time, throughout the observing night. This requires a data-processing system that can keep up with the incoming data. Figure 2 depicts the ZTF processing system that is currently under development at IPAC. It features 64 pipeline machines, four file servers, two database machines (primary and secondary/backup), a private web server, a Kafka cluster (https://kafka.apache.org/documentation), and an IRSA public web interface. The local network supports data rates of up to 10 gigabits per second.

The pipeline machines are for running ZTF real-time pipelines in parallel.
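The exposure-budget and sky-coverage figures quoted above are mutually consistent, as a short arithmetic check shows. The night lengths used here are illustrative assumptions, not values from the text:

```python
# Back-of-the-envelope check of the nightly observing budget described
# above. The field of view and cadence come from the text and Table 1;
# the night lengths are assumed for illustration.

FIELD_OF_VIEW_DEG2 = 47  # ZTF camera field of view (square degrees)
CADENCE_S = 45           # one exposure every 45 seconds (30 s exposure
                         # plus readout and slew & settle)

exposures_per_hour = 3600 // CADENCE_S                            # 80
coverage_deg2_per_hour = exposures_per_hour * FIELD_OF_VIEW_DEG2  # 3760

def exposures_per_night(night_hours):
    """Exposures acquired in a night of the given (assumed) length."""
    return int(night_hours * exposures_per_hour)

# A short ~7.5-hour night vs. a long ~10-hour night brackets the
# 600-800 exposures per night quoted in the text.
print(coverage_deg2_per_hour)    # 3760 square degrees per hour
print(exposures_per_night(7.5))  # 600
print(exposures_per_night(10))   # 800
```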
The pipeline-machine pool is inherently scalable, and more machines can be added as needed. The CPU of each pipeline machine has 16 cores (2 threads per core), and 16 pipeline instances per machine are typically run. The open-source workload-manager software SLURM (https://slurm.schedmd.com) is used to farm out pipeline instances to pipeline machines in the proper order and number. The real-time pipeline generates many interim files at its various steps, and these files typically become inputs to a subsequent step. The interim files are written to the local disks of pipeline machines for speed and to avoid unnecessary congestion on the local network.

Figure 1. Field of view of the ZTF camera compared to that of other large-survey cameras. The Moon and the Andromeda Galaxy (Messier 31) are shown to scale. (With the permission of Joel Johansson.)

The four file servers allow parallel file transfers across the local network. Files copied between pipeline-machine local disks and the sandbox filesystem are expected to generate the most traffic. Products ultimately copied to the archive filesystem will add marginally to the load. This part of the system is also scalable, and more machines could potentially be added to service additional load.

Our data-processing system features a PostgreSQL relational database running on the primary database machine, which is equipped with 384 gigabytes of memory. Since this element is not scalable, careful, disciplined database design, together with both hardware and software tuning, has proven to be absolutely essential. Database tables and indexes are strategically micromanaged so that heavily accessed data are stored on solid-state devices (SSDs) rather than spinning disks. The large database-machine memory and judicious database tuning ensure a very high cache-hit rate. Daily database vacuuming is required for regular maintenance. The database is replicated onto a second database machine, so that design-team members can query it without affecting the primary database. Candidate transients are stored in database tables partitioned by observing date and readout channel. A set-up process that creates database views over the correct candidate partitions is executed daily.

The web server will allow ZTF personnel to perform quality analysis on the raw data and processed products.
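The daily set-up step that partitions candidate tables by observing date and readout channel, and builds views over the correct partitions, could look roughly like the sketch below. The table names, partition naming scheme, and DDL strings are illustrative assumptions, not the actual ZTF schema:

```python
# Sketch of the daily database set-up described above: candidate tables
# are partitioned by observing date and readout channel, and one view is
# created over all of a night's partitions. All names here are
# hypothetical; the real ZTF schema is not given in this paper.

def partition_name(obsdate, channel):
    """Name of one candidate partition, e.g. candidates_20171018_c03."""
    return f"candidates_{obsdate}_c{channel:02d}"

def create_partition_sql(obsdate, channel):
    """DDL for one partition, cloned from an assumed parent table."""
    return (f"CREATE TABLE {partition_name(obsdate, channel)} "
            f"(LIKE candidates INCLUDING ALL);")

def create_night_view_sql(obsdate, n_channels=64):
    """One view spanning all readout-channel partitions for a night.
    ZTF has 16 CCDs x 4 readout channels = 64 quadrants (Table 1)."""
    parts = " UNION ALL ".join(
        f"SELECT * FROM {partition_name(obsdate, c)}"
        for c in range(n_channels))
    return f"CREATE VIEW candidates_{obsdate} AS {parts};"

print(partition_name("20171018", 3))  # candidates_20171018_c03
```

Generating the DDL from a naming convention like this keeps the nightly set-up process a pure function of the observing date, which is easy to rerun idempotently if a night's processing is restarted.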
The web server will also serve as a staging ground for initial access to archive products, as well as disseminate products for solar-system science.

The Kafka cluster is a set of machines at IPAC and a set of machines at the University of Washington (UW) that facilitates the distribution of transient-event alerts to the ZTF collaboration and, eventually, the community. Once the event data have been sent to Kafka, which is facilitated by the pipeline packaging the data in Avro format (e.g., https://avro.apache.org/docs/1.8.2/), the Kafka software handles mirroring between IPAC and UW over the Internet.

Figure 2. ZTF data-processing system.

Figure 3 gives a flowchart of the ZTF real-time pipeline. The CCD-image files received from Palomar consist of one multi-extension FITS file for each CCD and exposure. These are split into readout-channel quadrants, yielding smaller images that facilitate the subsequent parallel processing, in which the quadrant images are processed independently. Bias and flat-field corrections are applied to the applicable raw images. The processing involves instrumental calibration, which consists of astrometric and photometric calibration, followed by image differencing with a reference image for finding transients. Scamp is used in conjunction with the Gaia catalog (Gaia Collaboration et al. 2016) to perform the astrometry and compute a World Coordinate System for each science image. Absolute photometry is done using the Pan-STARRS DR1 catalog as the standard (Flewelling 2017). Reference images, which are needed for image differencing, are made within the first few months of operation. The image differencing includes a combination of the steps described in Masci et al. (2017), as well as an implementation of the ZOGY algorithm (Zackay et al. 2016). Candidate transient events are detected in thresholded difference images, and then vetted with machine learning provided by JPL, for example, using the real/bogus framework of Bloom et al. (2012). Asteroid streaks are found in the difference images using the findstreaks module (Waszczak et al. 2017), and subsequently checked for reliability using a new deterministic algorithm that accounts for the point-spread function of the streak.

Figure 4 gives preliminary performance-test results for the ZTF real-time pipelines processing a night’s worth of simulated data. The density plot includes 28,505 independent pipeline instances running on 32 pipeline machines, and shows that a night spanning ∼ . hours can be processed in ∼ . hours, which is better than real time. The median run time for a pipeline instance in this test is 275.5 seconds. The primary factors that affect the pipeline run times include: 1) the number of astronomical sources extracted from the images (e.g., Galactic-plane observations are much more stressing); and 2) the fraction of images for the night that actually have reference images available (this was ∼ % for our simulated data). The simulated data for the test include random transients at a rate of 10 per CCD. Future tests will involve simulated transient rates that are up to × higher.

4. PRODUCT ARCHIVE

As illustrated in Figure 2, IRSA will set up infrastructure for a long-term archive and a web interface for the distribution of ZTF-survey products to the collaboration and public. The web interface will have functionality similar to that of IRSA archives for other projects. The entire archive is expected to grow to ∼ petabytes by the end of

Figure 3. Flowchart of the ZTF real-time pipeline.
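The Figure 4 benchmark figures can be sanity-checked with an idealized throughput model: total instance run time divided by the number of concurrently running instances. This ignores scheduling gaps and load imbalance, and treats the quoted median run time as if it were the mean, so it is only a rough lower bound on the wall-clock time, not a re-derivation of the measured result:

```python
# Idealized throughput estimate for the benchmark described above.
# The inputs are the figures quoted in the text; the model (perfect
# packing, median used as mean) is a simplifying assumption.

N_INSTANCES = 28505         # pipeline instances in the test
MEDIAN_RUNTIME_S = 275.5    # median run time per instance (seconds)
N_MACHINES = 32             # pipeline machines used in the test
INSTANCES_PER_MACHINE = 16  # concurrent instances per machine

concurrency = N_MACHINES * INSTANCES_PER_MACHINE  # 512 concurrent instances
wall_s = N_INSTANCES * MEDIAN_RUNTIME_S / concurrency

print(round(wall_s / 3600, 1))  # ~4.3 hours under these assumptions
```

An estimate of roughly four hours of wall-clock time for a full night of data is consistent with the text's claim that the system runs better than real time.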
Figure 4. Preliminary ZTF benchmark-test results.
ZTF’s three-year program. Table 2 lists the product types that will be archived, which will include both images and astronomical-source catalogs. Two different methods will be used to generate source catalogs for the archive, namely, aperture photometry and point-spread-function (PSF) fitting. The first public release of archived products is planned for early 2019. This release will include only products from the ZTF public surveys. Products from collaboration observing programs will be included in later releases.

Table 2. ZTF archive products

  Raw images
  Processed images
  Image masks
  Difference images
  SExtractor catalogs
  PSF-fit catalogs (DAOPHOT)
  Reference images, catalogs, etc.
  Calibration products
  Light curves

5. CONCLUSIONS

This paper gives an overview of the ZTF project, camera, observing strategy, data-processing system, preliminary benchmark results, and product archive, with an emphasis on describing the data-processing system and archive. In the coming months, the data-processing system will be exercised on real data from the observatory-mounted camera. Building upon our past successes gives us confidence in a favorable outcome for this endeavor. Exciting new discoveries await astronomers analyzing ZTF products!

We are grateful to Joel Johansson of the Oskar Klein Center at Stockholm University and the Weizmann Institute of Science for permission to use the original creative work shown in Figure 1.

The ZTF is supported by a collaboration including Caltech, IPAC, the Weizmann Institute of Science, the Oskar Klein Center at Stockholm University, the University of Maryland, Deutsches Elektronen-Synchrotron and Humboldt University, Los Alamos National Laboratory, the TANGO Consortium of Taiwan, the University of Wisconsin at Milwaukee, Lawrence Berkeley National Laboratory, and the University of Washington.

Matching support from the National Science Foundation MSIP program will enable public ZTF surveys, data releases, and annual summer schools.