The PAU Survey: Operation and orchestration of multi-band survey data
Nadia Tonello a,1,∗, Pau Tallada b,1,∗∗, Santiago Serrano c,d, Jorge Carretero a,1, Martin Eriksen a,1, Martin Folger c,d, Christian Neissner a,1, Ignacio Sevilla-Noarbe b, Francisco J. Castander c,d, Manuel Delfino a,1, Juan De Vicente b, Enrique Fernandez a, Juan Garcia-Bellido e, Enrique Gaztanaga c,d, Cristobal Padilla a, Eusebio Sanchez b, Luca Tortorelli f

a Institut de Física d'Altes Energies (IFAE), The Barcelona Institute of Science and Technology, Campus UAB, 08193 Bellaterra (Barcelona), Spain
b Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), Avenida Complutense 40, 28040 Madrid, Spain
c Institute of Space Sciences (ICE, CSIC), Campus UAB, Carrer de Can Magrans, s/n, 08193 Barcelona, Spain
d Institut d'Estudis Espacials de Catalunya (IEEC), E-08034 Barcelona, Spain
e Instituto de Física Teórica, Universidad Autónoma de Madrid, Cantoblanco, 28049 Madrid, Spain
f Institute for Particle Physics and Astrophysics, ETH Zürich, Wolfgang-Pauli-Str. 27, 8093 Zürich, Switzerland

∗ Main scientific author. ∗∗ Main technical author. Email address: [email protected] (Pau Tallada).
1 Also at Port d'Informació Científica (PIC), Campus UAB, C. Albareda s/n, 08193 Bellaterra (Cerdanyola del Vallès), Spain.
Abstract
The Physics of the Accelerating Universe (PAU) Survey is an international project for the study of cosmological parameters associated with Dark Energy. PAU's 18-CCD camera (PAUCam), installed at the prime focus of the William Herschel Telescope at the Roque de los Muchachos Observatory (La Palma, Canary Islands), scans part of the northern sky to collect low resolution spectral information of millions of galaxies with its unique set of 40 narrow-band filters in the optical range from 450 nm to 850 nm, and a set of 6 standard broad band filters. The PAU data management (PAUdm) team is in charge of treating the data, including the data transfer from the observatory to the PAU Survey data center, hosted at Port d'Informació Científica (PIC). PAUdm is also in charge of the storage, the data reduction and, finally, of making the results available to the scientific community. We describe the technical solutions adopted to cover the different aspects of the PAU Survey data management, from the computing infrastructure supporting the operations, to the software tools and web services for the data process orchestration and exploration. In particular we focus on the PAU database, developed for the coordination of the different PAUdm tasks, and to preserve and guarantee the consistency of data and metadata.

Keywords:
Data management, Web service, High Throughput Computing, Cosmological Survey, Database, Data modeling
1. Introduction
The Physics of the Accelerating Universe Survey (PAUS, ?) observes part of the northern sky for the study of the accelerated expansion rate of the universe. The main scientific contribution will be the calculation of the photometric redshift (photo-z) of known galaxy catalogs, with an improved resolution of 0.0035(1 + z) (?), together with the study of spectral features, clustering, intrinsic alignments and galaxy evolution, among other science cases.

The project is governed by a consortium, originally founded by the Spanish institutes Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), Instituto de Física Teórica (IFT), Instituto de Ciencias del Espacio and Institut d'Estudis Espacials de Catalunya (ICE/IEEC-CSIC), Institut de Física d'Altes Energies (IFAE) and Port d'Informació Científica (IFAE/PIC), with the later incorporation of several European institutes (Durham University, ETH Zurich, Leiden Observatory, University College London) for its scientific exploitation.
The delivery of science-ready data products to the PAUS Collaboration and to the scientific community is the responsibility of the PAU Survey data management (PAUdm) team. The main challenges faced in this project for the operation and orchestration of PAUS data are typical of a highly automated system, delivering and processing a considerably high data volume (compared to the one that can be comfortably handled by a single computing machine), to be exploited by a scientific community spread over many institutes in different countries.

Successful big projects in managing and distributing astronomical data, like the Sloan Digital Sky Survey (SDSS), are a clear reference for this work, but are hardly reusable, given the rapid evolution of the technical and software tools which can be applied to data management, in addition to the peculiarities and the size of the project.

The PAU Survey is carried out using an imaging camera, called PAUCam (?), designed and built at the engineering facilities of IFAE, in Barcelona. PAUCam is a community instrument installed at the prime focus of the 4.2-m diameter William Herschel Telescope (WHT) at the Roque de los Muchachos Observatory (La Palma, Spain) since mid 2015. PAUCam is made of 18 4k x 2k CCDs, with a system of 6 broad (u, g, r, i, z, Y) and 40 narrow band optical filters (wavelength range: from 450 nm to 850 nm) for a high-resolution photometric survey. The filters are installed in a set of moving, interchanging trays (?). With one of the five narrow-band filter trays positioned in front of the focal plane, each narrow band filter covers one of the 8 inner CCDs, while the outer CCDs are covered by broad band filters. Six trays are fully equipped with a standard broad band filter each.

Each PAUCam focal plane image consists of about 650 MiB of information, which translates into a mean total data volume of 200 GiB for a typical observing night. Each night of observing time, PAUCam data are transferred to the PAUS data center, where they are stored and processed with a specifically designed data reduction software. The specific algorithms running during the PAUS data reduction (image detrending and cleaning, astrometric and photometric calibration) and the ones running for the production of the final catalogs (source extraction and co-added object spectra) are explained in ? and ? respectively.

The infrastructure selected to run the PAU Survey pipelines and to store the data is the High Throughput Computing (HTC) facility available at the Port d'Informació Científica, primarily functioning as a Worldwide Large Hadron Collider Computing Grid Tier-1 facility, but also giving support to Astrophysics and Cosmology research groups like MAGIC, for which PIC is the reference data center, MICE (http://maia.ice.cat/mice/), DES, the European Space Agency mission Euclid and the PAU Survey.

The core of the PAUS data management (PAUdm) is the PAUS database (PAUdb), on which all the other services rely for orchestrating the jobs for the reduction of the PAUCam images, storing results and accessing them.

In this paper we present the data management system for the PAU Survey operation and orchestration. The constraints guiding the PAUdm design are detailed in section 2.
The subsequent sections describe the data management approach to fulfill them, highlighting the unique aspects that were developed for processing and making available the PAU Survey datasets. They are: the short term storage at the observatory and the transfer procedure (section 3), the long term storage at PIC (section 4), the PAUS database for metadata preservation (section 5), the automated orchestration of the nightly data reduction (section 6), and the web services for data exploration and distribution (section 7). Finally the PAUS data center infrastructure is described in section 8, followed by conclusions with a short discussion of the lessons learned in section 9.
2. PAUS data management operations constraints
The limited availability of observation time for the PAU Survey at the WHT and the ambitious scientific goals of the project constrain the PAUdm operations design. Those constraints have been formulated in terms of requirements, addressed here in detail.

• Short term storage and data transfer.
All files produced by PAUCam at the WHT shall be transferred to PIC after data taking. A temporary storage at the observatory for a minimum of 5 days must be guaranteed in case of connection failure or other temporary problems. Data is transferred to PIC the morning after a night of observation. The data buffer at the observatory and the transfer procedure from La Palma are described in section 3.

• Data files preservation.
The PAU Survey's raw and reduced data files shall be archived and preserved in an organized way. While the storage at the observatory acts as a buffer for several days' worth of data, the PAU data center shall guarantee the long-term storage of all data sets. The archive system of PAUdm is described in section 4.

• Metadata integrity and preservation.
PAUdm shall guarantee the preservation and consistency of the PAU Survey's file metadata and of the results of the image reduction, such that they are accessible for their scientific exploitation. PAU Survey metadata is checked and preserved in a PostgreSQL relational database, called PAUdb. The organization of the raw metadata, of the reduction results and of the parameters describing how they have been obtained is explained in section 5.

• Nightly results availability before the next observing night.
The pipeline for the image reduction (the nightly pipeline) shall deliver a nightly report with the quality of the data in about 6 hours. The survey plan of each observation night is guided by the nightly pipeline results. The nightly data process orchestration relies on the metadata stored in PAUdb and is described in section 6.2. The reports system works by querying PAUdb and is accessible through a web interface, as described in section 7.3.

• Data and metadata access, distribution and publication.
The PAU Survey files, both raw and reduced images, as well as the analysis results, must be accessible, distributed and published, primarily to the collaborators of the project, and finally to the whole scientific community. A web interface to PAUdb has been designed to access the data process information and the metadata produced during the image reduction and analysis. PAUdb, especially the tables storing the final catalogs, is accessible by the PAU Science group through this web interface. The tools developed to permit the data access are described in section 7.

3. PAU short term storage and data transfer

PAUCam takes exposures of the night sky at the William Herschel Telescope, at the Roque de los Muchachos Astronomical Observatory in La Palma (Canary Islands). Each exposure is a multi-extension FITS file (https://fits.gsfc.nasa.gov/fits_documentation.html) that has to be transferred to the PAU data center, preserving the information (metadata) collected by the PAUCam control system related to the sky and telescope conditions when the data was taken. The organization and collection of the information related to the FITS files is a joint effort between the PAUCam and the PAUdm teams. The FITS files, with their header information including the field coordinates corresponding to the center of each focal plane and the weather conditions, are written by the PAUCam data acquisition system (?).

The data is organized in observation sets. Each observation set is a group of contiguous exposures collected in the same directory and taken in the same telescope configuration. The metadata related to each observation set (date, operator comments, project name, list of file names) is collected in a YAML file (http://yaml.org/) by the PAUCam data acquisition system.

At the end of a night of observation, the telescope operator gives the command to start the archiving procedure. Data files (FITS and YAML) are moved atomically, together with file attributes like the adler32 checksum, from a PAUCam disk to the PAUdm disk space, located at the observatory and functioning as a temporary storage. From this point on, the responsibility for the files passes from the PAUCam team to the PAUdm team. The disk capacity of the temporary storage is 8 TiB, enough to guarantee one week of raw data storage in optimistic weather conditions.

The new observation sets located on the PAUdm disk are transferred to the PAUS archive at PIC, using a procedure triggered automatically from PIC every morning. The file transfer consists of two phases: data transfer and register job submission (see section 6.1).

The temporary storage is manually freed once the transfer to the PIC archive, the registration of the data into the PAUS database and the nightly data reduction have ended successfully, showing neither data corruption nor inconsistencies with respect to the observation set YAML file content.

The data transfer is carried out through the use of the bbcp copying tool. This tool maximizes the bandwidth usage even when using a high-latency wide area network (WAN) link, such as in our case, through a 2000 kilometer-long 1 Gb network link. At the end of this phase, all the observation sets at the WHT are synchronized with the data present in the PAUS archive at PIC.

After the data transfer, a register job is submitted for execution (see section 6.1), which takes care of looking up the new observation sets and submitting custom register jobs to insert the metadata from each exposure file into PAUdb.
Figure 1 shows a sample of the network traffic during the observation period corresponding to the 2017 second semester (2017B), compared with the volume of data downloaded. The download speed is on average around 20 MiB/s, with a good stability over the days, especially considering that the PAU Survey shares the network link from the La Palma observatory with all the other experiments running at the same astronomical site.

Figure 1: The solid line shows the daily average transfer speed registered at PIC for PAU Survey data (observation period 2017B). The histogram shows the data volume transferred in the same day. The transfer rate performance looks quite stable and independent of the transferred data volume. Fluctuations are due to the fact that the network link is shared with other projects of the La Palma observatory.
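As a concrete illustration of the checksum step mentioned above, the following sketch verifies transferred files against the adler32 values recorded for an observation set. It is a minimal sketch, not the actual PAUdm transfer code: the manifest file name and its `files`/`name`/`adler32` fields are hypothetical stand-ins for the real YAML layout.

```python
import zlib
from pathlib import Path

import yaml

def adler32_of(path, chunk_size=1 << 20):
    """Compute the adler32 checksum of a file, reading it in 1 MiB chunks."""
    checksum = 1  # adler32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF

def verify_observation_set(obs_set_dir):
    """Return the files whose checksum does not match the YAML manifest.

    The manifest layout (file name and field names) is illustrative only.
    """
    obs_set_dir = Path(obs_set_dir)
    manifest = yaml.safe_load((obs_set_dir / "obs_set.yaml").read_text())
    mismatches = []
    for entry in manifest["files"]:  # hypothetical manifest field
        path = obs_set_dir / entry["name"]
        if adler32_of(path) != int(entry["adler32"], 16):
            mismatches.append(entry["name"])
    return mismatches
```

A transfer is considered complete, and the temporary storage freed, only when such a comparison shows no mismatches with respect to the observation set content.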
4. Data files preservation: The PAUS archive
The PAUS archive stores the files produced by the PAU Survey project during normal operations in an organized structure, as shown in Figure 2.

The raw data produced by PAUCam is saved in multi-extension FITS (MEF) files called mosaics, each corresponding to one focal plane composed of the 18 PAUCam CCD images. Each extension contains data produced by the readout of one of the 4 amplifiers of each CCD, for a total of 72 extensions.

As mentioned in the previous section, a group of FITS files coming from an uninterrupted session of PAUCam observation activity (ideally one per night) is contained in one folder and constitutes an observation set, identified by date of creation and a counter, which is reset at the beginning of each night of observations. Each observation set collects several types of raw files (test files, calibration files, science images). Raw files are classified by their content in different types: bias, flats, sky images, stacked focus and tests. Flats and sky images are also classified based on the filter tray in front of the focal plane when the exposure was taken.

Raw data is processed by the nightly pipeline (see section 6.1) and the resulting products are reduced FITS files. Master bias and master flat files are stored in MEF files, with one extension containing the data of one CCD (after amplifier over-scan and gain correction). Sky images are still stored in FITS files, but the data of each CCD is stored separately: reduced (clean) data, weights and the corresponding mask. This reduces the data I/O, as the data unit of the memba pipeline is one CCD, while in the nightly pipeline it is a whole mosaic.

Figure 2: Left picture: the Oracle StorageTek SL8500 Modular Library System at PIC, hosting the magnetic tapes where PAUS archive files are stored. Right: PAUS data archive structure along with the different kinds of files it stores. Raw FITS files are organized in folders, each of them identifying a single observation set (see text). Reduced files ('red' folder) are organized per release and observation set.

Each time data is processed with a different code release, the resulting reduced files are stored in a separate folder, whose name corresponds to the release name.

The total volume occupied by PAUS files (simulated, raw and reduced) from years 2013 to 2017 is shown in Figure 3. The planned data volume of 150 TiB was estimated at the design phase for the total lifetime of the project and for the most optimistic scenario of telescope activity.

PAUS data are permanently stored on magnetic tapes. This type of support, optimal for low access rate data such as actual PAUS data, gives multiple advantages with respect to traditional spinning disks: the reduced cost of the cartridges, the small physical space occupied per TiB, and the limited energy consumption for maintenance are some of them. The main disadvantage is the access latency, due to the fact that the volume has to be mounted and sought to access the files (read and/or written), an operation that can take around one minute (nominal values: load 13 s, file access 50 s, unload 23 s), while the read and write operations from and to tape have a nominal efficiency of 250 MiB/s (dependent on the file size). The organization of the file archive explained above has been designed taking into account the storage support. The nightly and the analysis operations are typically carried out per group of observation sets, or per code release, respectively. By organizing the data of the same directory on the same cartridge, the number of loads is minimized, for a longer lifetime of the support.
The process of loading and file access is automated and transparent to the user. For more details about the magnetic tape technology handled at PIC, see Appendix A.

The number of tapes assigned to the PAUS project has changed over the years due to changes in tape technology, but the mean assigned space has proven to be reasonable.

Figure 3: Storage space assigned to the PAU Survey project at PIC and its usage between the years 2013 and 2017. Before June 2015, corresponding to PAUCam commissioning, the space was occupied by simulations and the results of test pipeline prototypes run over them. From June 2015 to the present, the storage has been filled with observed and reduced data.
Data ownership and security have been guaranteed with the creation of a dedicated group of PIC users (paus) and with write permissions on the archive space restricted to administrators (through dCache ACLs).
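The archive layout described above can be summarized with a small sketch. The directory names below (a root mount point and 'raw'/'red' branches) are illustrative assumptions based on Figure 2, not the exact paths used at PIC.

```python
from pathlib import Path

ARCHIVE_ROOT = Path("/archive/paus")  # hypothetical mount point of the PAUS archive

def raw_path(obs_set: str, filename: str) -> Path:
    """Raw mosaics are grouped in one folder per observation set."""
    return ARCHIVE_ROOT / "raw" / obs_set / filename

def reduced_path(release: str, obs_set: str, filename: str) -> Path:
    """Reduced files live under a 'red' folder, per code release and observation set."""
    return ARCHIVE_ROOT / "red" / release / obs_set / filename
```

Keeping all files of one directory on the same cartridge means that a pipeline walking either branch loads each tape only once.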
5. Metadata integrity and preservation: the PAUS database (PAUdb)
The preservation of the data and the metadata, as well as their consistency, is a critical point of the data management of every scientific project. The PAU Survey project generates a large amount of files while data is being taken and processed, and a large amount of metadata as a result of running the image analysis pipelines. The data volume is only one of the critical points: the files and metadata produced at high velocity need to be accessed concurrently by several hundreds of clients, i.e. the nodes of the PIC computer farm where the reduction and analysis code run.

In order to enable the storage, management and distribution of all this information, several alternatives for implementing a metadata repository were taken into account, such as a nested structure of ASCII files, relational databases or some newer NoSQL solutions, considering both the data management and the final users' needs. In particular, we looked for a solution allowing for the use of the SQL query language, which is already familiar to most of the scientific community; a relational database solution, to allow for comparisons between datasets; and finally a free, mature and stable software, capable of dealing with the data volume we had foreseen for the project.

We settled on a relational database setup consisting of two twin servers, configured one as a replica of the other. Each server has 12 physical cores, 96 GiB of memory and 2 TiB of storage. We selected PostgreSQL as the actual software solution for the relational database due to its stability, performance and broad compatibility with current SQL standards.

The PAU processing pipelines make heavy use of database transactions to ensure the integrity and consistency of the ingested data. The information stored in the PAU database is backed up periodically to minimize data loss in case of malfunction or catastrophic failure. Critical tables are backed up once a week, while others just once a month. Tables that are not modified, such as external catalogs, are dumped only once, when they are ingested for the first time.
The PAUdb content is organized in about 40 different tables. Figure 4 shows the high level organization of the main PAUdb tables (see Appendix B for table descriptions and columns). Some of them are filled and queried during data reduction: they contain the metadata of raw files and of files produced by the nightly pipeline, as well as calibration factors and intermediate values. The production table takes care of the code version preservation. The BT tables (see section 6.2) store configuration and metadata related to the tasks run in the computer farm, connected to the tables storing the quality checks performed during the data reduction. Other tables store the intermediate and final catalogs of the multi-epoch multi-band process. Public catalogs of reference surveys (Gaia (https://cdn.gea.esac.esa.int/Gaia/), SDSS (https://skyserver.sdss.org/dr14/en/home.aspx), CFHTLenS, etc.) are stored in PAUdb for calibration and analysis purposes.

PAUdb, like all relational databases, is interfaced using Structured Query Language (SQL) commands or statements. This language enables users to specify, in a declarative manner, the operation they want to perform on the database, such as retrieving a subset of rows or modifying some existing values. Writing complex SQL statements usually requires full knowledge of the database model and an understanding of how relational databases work. There are also strong security concerns when those statements contain user supplied input, as that may lead to unexpected results such as information leak, alteration or destruction.

In order to mitigate all those issues and to facilitate the access to PAUdb by the pipeline developers, we decided to proxy all database operations through an Object Relational Mapper (ORM), allowing users to interface with the database using the standard constructions present in their programming language. Having Python as the main programming language for the PAUS processing pipelines, we chose SQLAlchemy as the specific ORM solution because of its complete feature set and its comprehensive documentation.

The database structure, described in the ORM model, allows the developers to access the database by importing a specific Python module in their code and using the set of classes, objects and methods defined in it, instead of having to manually construct SQL statements. For instance, querying data from another table linked by a foreign key is as easy as accessing a particular object attribute; an example of this access pattern is sketched at the end of this section. Using such an abstraction layer comes with additional benefits, such as being able to change and evolve the database structure without interfering with users, as they only interact with the ORM model.

Even though the implemented relational database works fine for the PAUS reduction pipelines, it is not optimized to handle the high metadata volumes produced by the analysis jobs. The database volume grows with the number of exposures, implying an increase of processing time for large scale analysis. To ease the analysis tasks and to facilitate the publication and distribution of the data, the biggest tables handling the products of the image reduction and multi-band analysis are migrated to the PIC big data platform. Once there, several services for interactive analysis and distribution, such as CosmoHub (see section 7.6), make those tasks easier and faster, and without limitations in data set size.
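The following sketch illustrates the ORM access pattern described above, using a deliberately simplified two-table model (a cut-down version of the mosaic and image tables of Appendix B); the connection string and the module layout are placeholders, not the actual PAUdm ones.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Mosaic(Base):
    """Heavily simplified version of the 'mosaic' table (see Appendix B)."""
    __tablename__ = "mosaic"
    id = Column(Integer, primary_key=True)
    filename = Column(String)
    exp_time = Column(Float)
    images = relationship("Image", back_populates="mosaic")

class Image(Base):
    """Heavily simplified version of the 'image' table; one row per CCD image."""
    __tablename__ = "image"
    id = Column(Integer, primary_key=True)
    mosaic_id = Column(Integer, ForeignKey("mosaic.id"))
    ccd_num = Column(Integer)
    psf_fwhm = Column(Float)
    mosaic = relationship("Mosaic", back_populates="images")

engine = create_engine("postgresql://user:secret@dbhost/paudb")  # placeholder DSN
with Session(engine) as session:
    # Following the foreign key is a plain attribute access; the ORM emits the SQL.
    for image in session.query(Image).filter(Image.psf_fwhm < 1.0).limit(10):
        print(image.mosaic.filename, image.ccd_num, image.psf_fwhm)
```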
6. Nightly results availability: PAUdm operations
PAUdm operations are entirely carried out on top of the infrastructure available at PIC, the PAU data center (see section 8). The PAU database has dedicated tables, created in order to automatically activate the execution of functions and orchestrate the nightly operations. The pipelines, written in Python, are coded so that they can be run both in a HTC infrastructure and on a local computer.

The code of PAUdm is organized in pipelines. Each pipeline consists of one or more types of tasks. Each task type is connected to others by static dependencies. Tasks of the same type have a defined configurable set of parameters and can run in parallel, to guarantee scalability with the number of files to treat. Each task is composed of three parts: a prolog, a run and an epilog. The prolog and epilog of the task are able to generate new sub-jobs, with the associated dependencies and configuration. The run part executes the selected functions on the given input file(s).

Figure 4: Schematic organization of the main PAUdb tables. For descriptions and lists of columns see Tables B.1 to B.5 in Appendix B.

In PAUdm we have defined the following main pipelines, schematically shown in Figure 5: register, nightly and memba. Additional pipelines are pixelsim, for the simulation of raw PAUCam images, and crosstalk, for the evaluation of the crosstalk effect on raw mosaics and its correction. Scientific pipelines (for example, photometric redshift calculation as described in ?) are currently under development from existing separate algorithms, with the intention of integrating them into PAUdm once they reach enough maturity and stability. In order to execute a task in the PIC computer farm, the functions defined in the run must fit one job running in one node (defined by the data center as running on 1 core with a maximum of 3-4 GiB of RAM consumption).

The dependencies and the level of parallelization of the tasks running automatically each night of observation are described schematically in Figure 6; a schematic sketch of the task structure is given at the end of this subsection.

1. After the data has been transferred to PIC, a register task per observation set is created by the transfer script. In the register task, the prolog checks the list of files belonging to the observation set, as listed in the observation set YAML file, and creates one corresponding register subtask job per file. The register task reads the header of the FITS file and inserts in PAUdb the metadata corresponding to the mosaic (contained in the exposure file primary header) and the individual images (file extension headers).

2. The nightly pipeline consists of a tree of subtasks that are created per observation set, whose final result is the production of a master bias (task master bias, one per observation set), master flat files (task master flats, one per observation set and per filter tray), reduced images (task single epoch, one per target file), and the calculation of the image parameters and calibration factors that determine the data quality, prior to the memba (multi-epoch multi-band) pipeline execution. The algorithms used in the nightly pipeline are described in ?. The subtasks composing the nightly pipeline (master bias, master flats and single epoch) are orchestrated as follows.

• Once the metadata of all the FITS files of an observation set is registered in PAUdb, the nightly task for that observation set is created by the epilog of the register task.
• The prolog of the nightly task initiates the master bias task.

• The prolog of the nightly task queries PAUdb looking for flats files, creating a master flats task for each kind of flats file found (one master flat per filter tray).

• If target mosaics are also found, a single-epoch subtask is created for each of them. In this phase of the process, the image detrending, the astrometric correction and the photometric calibration are performed, taking external public catalogs as reference for bright star positions and magnitude calibration.

3. While the register and the nightly pipelines are run the morning after each night of observation, the memba pipeline is run periodically, typically after each observation period. In the memba pipeline, the flux of the stars and galaxies of each image, whose positions are taken from a selected photometric catalog (defined in the task configuration), is estimated. Then the information collected in the multiple observations of a single celestial object in different images is added up, and the final output is a catalog with calibrated fluxes of each object in all the observed wavelengths, corresponding to the 40 narrow band filters, and eventually the 6 broad band ones. The algorithms applied to obtain the results are described in ?. The memba pipeline consists of two types of tasks:

• The forced phot task computes the forced aperture flux (using the forced photometry method) of each object per image. The parallelization level is one job per CCD.

• The forced phot coadd task co-adds the forced photometry results and obtains a final estimate of the flux per object and per waveband. This last task is parallelized in chunks of objects of the selected catalog and operates only at database level.

The memba pipeline does not create new files: all the intermediate and final results are catalogs stored in dedicated tables of PAUdb.

Figure 5: PAU data management pipeline dependencies. The white rectangles identify the pipelines and the arrows their interfaces with the storage system and the PAU database.

Figure 6: PAU data management transfer, register and nightly pipelines, and their relation with the storages and the database. The pipeline that registers the mosaics and images in PAUdb is parallelized per FITS file. Jobs in the nightly pipeline run following a hierarchical structure: for each observation set, the master bias is calculated first, followed by the master flats. Finally, the single epoch job that processes each science mosaic is run separately, as soon as the corresponding master-flats mosaic is available.
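The three-part task structure and the nightly orchestration described above can be summarized with the following schematic sketch. It is purely illustrative: the class and configuration names are invented, and the real PAUdm tasks interact with PAUdb and the job scheduler rather than returning plain Python lists.

```python
class Task:
    """Schematic PAUdm-style task: prolog and epilog may spawn sub-jobs,
    while run does the payload work, sized to fit one farm slot
    (1 core, 3-4 GiB of RAM)."""

    def __init__(self, config):
        self.config = config

    def prolog(self):
        """Return subtasks (with their dependencies) to create before run."""
        return []

    def run(self):
        """Apply the selected functions to the given input file(s)."""

    def epilog(self):
        """Return follow-up tasks once this task and its subtasks are done."""
        return []


class MasterBiasTask(Task): pass
class MasterFlatsTask(Task): pass
class SingleEpochTask(Task): pass


class NightlyTask(Task):
    """Mirrors the orchestration above: one master bias per observation set,
    one master flats per filter tray, one single epoch per science mosaic."""

    def prolog(self):
        subtasks = [MasterBiasTask(self.config)]
        for tray in self.config.get("filter_trays", []):       # from a PAUdb query in reality
            subtasks.append(MasterFlatsTask({**self.config, "tray": tray}))
        for mosaic in self.config.get("science_mosaics", []):  # idem
            subtasks.append(SingleEpochTask({**self.config, "mosaic": mosaic}))
        return subtasks
```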
6.2. Job orchestration tool: BT

The orchestration of PAUdm jobs is carried out by a specific tool called brownthrower (BT). Albeit developed for the PAUdm pipelines, BT is a generic tool that may be used by other projects: in fact, it has been used for orchestrating several MICE and
Euclid job productions. Developed on top of the PIC grid job scheduler (PBS, HTCondor), it has the advantage of being able to establish and manage the dependencies between different jobs, therefore allowing the implementation of complex pipelines.

The cornerstone of this tool is a relational database, currently on the same database server as PAUdb. It holds all the information about the jobs, such as their input and output data, dependencies, and configuration settings. BT makes heavy use of the transactional nature of the relational database to ensure the consistency and integrity of that information while tracking the status of every job.

BT provides two tools to manage jobs: a manager and a runner. The BT manager is a command line interface to create, configure and submit jobs for execution, query their status and abort them. The BT runner is the tool that executes the job. Once launched in the proper environment, it starts pulling ready-to-run jobs from the database and executes them sequentially. In practice the BT runner is used as a pilot job, with several hundreds of instances running all the time in the computing farm, executing many jobs in parallel. Finally, the fact that all the jobs, past and present, are stored in a relational database makes all data available for auditing and accounting purposes. The PAUdm jobs, organized in highly parallelized tasks, are orchestrated thanks to the connection between BT and PAUdb, both for jobs created automatically and manually.

Figure 7 shows part of the computer farm activity for PAUS, from the commissioning of PAUCam to the end of 2017, both in terms of number of jobs (each of them occupying one slot in a node) and of wall time, i.e. the time spent from queue to the job end.
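The pilot-job pattern implemented by the BT runner can be sketched as follows. This is not the brownthrower code or schema: it is a minimal transactional claim-and-run loop over a hypothetical job table (the real BT job table is summarized in Appendix B), using PostgreSQL row locking so that hundreds of concurrent runners do not pick the same job.

```python
import time

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:secret@dbhost/brownthrower")  # placeholder DSN

def claim_ready_job(conn):
    """Atomically claim one ready-to-run job, or return None if there is none.

    FOR UPDATE SKIP LOCKED lets concurrent runners pull from the same table
    without blocking each other (available in PostgreSQL 9.5+).
    """
    row = conn.execute(text(
        "SELECT id FROM job WHERE status = 'QUEUED' "
        "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED"
    )).first()
    if row is None:
        return None
    conn.execute(text("UPDATE job SET status = 'RUNNING' WHERE id = :id"),
                 {"id": row.id})
    return row.id

def runner_loop(execute):
    """Pilot-style loop: keep claiming jobs and executing them sequentially."""
    while True:
        with engine.begin() as conn:  # one transaction per claim
            job_id = claim_ready_job(conn)
        if job_id is None:
            time.sleep(30)  # nothing ready: idle briefly, then poll again
            continue
        execute(job_id)
```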
7. PAUdm web services for data access, distribution and publication
We developed a web based graphical user interface to PAUdb, which provides simple access to many parts of the PAUdm system. Its functionality, described in the following sections, includes a monitor of job progress (section 7.1) and of data quality during observation runs (section 7.2), the nightly data report summary (section 7.3), access to raw and reduced image data (section 7.4), a description of the PAUdb schema and an SQL browser, and an inspector of flux measurements of objects using the 40 PAUS narrow band filters, based on the data used to calculate their final co-added values (section 7.5).
During operations, pilot jobs are launched in the PIC computing farm and the tasks are automatically fetched and executed. The execution status of the jobs can be monitored in the main section of the web interface, showing the results of the queue of batch jobs (see section 6.2).
Figure 7: Monthly sum of the number of jobs (solid lines) and wall time (red dots), from commissioning in June 2015 to the end of 2017. Only PAU register, nightly and memba jobs have been selected (other analysis and test jobs have been removed for clarity). Register jobs have been run during the PAUCam observation periods. Nightly jobs follow the register pattern. Extra nightly code releases have been run outside the observation periods over subfields of the survey for validation purposes. Most of the time spent in the computer farm is due to memba jobs, the high number of jobs depending on the high level of parallelization of the pipeline. Memba jobs have been run with different configurations and different code complexity, explaining the non-linear dependency of the number of jobs on the wall time.

The list of the most recent (or current) top-level jobs is shown in table form, listing ID, name, number of sub-jobs, execution status and quality control, as well as dates of creation, start and duration (Figure 8). The list can be filtered by task, quality control, date range and status. Pie charts indicate the percentage of jobs for each task or status.

The details of each job that can be accessed from the web page are the configuration, input, output, log messages and traceback. In particular, the job configuration and its error traceback allow for a quick look in case of a failed job.

The jobs are hierarchically structured. Each job can have sub-jobs, with static dependencies, as a result of the parallelization of the pipeline execution. From the web, the job hierarchy is maintained and sub-job details can be accessed by clicking on the top level jobs.

A series of graphical views are associated with the operation control web page. The execution status chart, for example, gives a quick overview of the time evolution of the job status of a certain pipeline (Figure 9).

Figure 8: View of the PAUS web interface for operations control. List of top-level tasks and their status.

Figure 9: View of the PAUS web interface for operations control. Time evolution of jobs processing one observation set.

During the execution of the tasks and in the epilog of the parent tasks, a series of quality checks are performed and their results assigned to the executed job. The quality checks can be either numeric values of parameters, to be compared to a certain value range, and/or plots for visual check. In the case of parent jobs, quality control plots are generated to summarize the evolution of interesting parameters calculated in their subtasks. The results of the quality checks are registered in the PAU database and linked to the corresponding job; the plots are stored in a dedicated disk space accessible both from the nodes executing the jobs and from the web server. These results can be viewed through the job details page.

A table collects the name of each quality test, a short description of the check performed, the result value and the reference range value, a label that visually shows whether the constraint is fulfilled or not, and finally a link to the corresponding plot (if available). The quality inspection is especially interesting during the process of validation and debugging. During normal operations, the quality checks result is monitored, shown as a green, red or yellow (in case of partial, but not critical, failure) flag in the main job operations page.

The main purpose of the Nightly Report web page is to feed into the survey and science program planning process. It provides an overview of all the image parameters taken in a given range of observation nights and stored as raw metadata in PAUdb. An example is shown in Figure 10. Additional metadata about seeing, sky background, detrending status etc. are available to be queried by the page after the nightly tasks have finished successfully and the information is safely stored in PAUdb.
These parameters are displayed as plots, showing the time evolution of observing conditions and data quality over the night. Additional plots show the percentage of science frames that pass the quality control tests (Figure 11), allowing the astronomer to assess the quality of the data taken during the previous observing night. Selection criteria are configurable in cut values and parameters, such as transparency, seeing, PSF, and parameters derived from the operations of detrending, astrometry and photometry. Those exposures that fail the quality control criteria can be filtered out and rescheduled, updating the survey plan accordingly; a schematic example of such a selection is sketched below.

Figure 10: PAUS web interface Nightly Report page example with plots.

Figure 11: PAUS web interface Nightly Report page example with quality selection of images.
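A schematic version of such a quality selection follows. The cut names and values are invented for illustration; in the web page they are configurable, and the metadata comes from PAUdb rather than an in-memory table.

```python
import pandas as pd

# Illustrative exposure metadata, named after columns of the 'image' table
# in Appendix B (psf_fwhm, transparency).
exposures = pd.DataFrame({
    "mosaic_id":    [101, 102, 103],
    "psf_fwhm":     [0.9, 1.8, 1.1],    # arcsec
    "transparency": [0.95, 0.60, 0.91],
})

cuts = {"psf_fwhm_max": 1.5, "transparency_min": 0.8}  # hypothetical cut values

failed = exposures[(exposures.psf_fwhm > cuts["psf_fwhm_max"]) |
                   (exposures.transparency < cuts["transparency_min"])]
print("Exposures to reschedule:", failed.mosaic_id.tolist())
```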
The PAUdb web page also allows the user to access the PAU files in the archive, the PAUdb schema and the metadata through an SQL search window.

The PAU archive at PIC is accessible through WebDAV, a protocol offered by default by dCache, the PIC massive storage system. The access to the archive through WebDAV is especially intuitive for navigating the directory tree and downloading individual files (previously staged from tape to the disk buffer). All the web page users have permission to navigate through the full archive, both raw and reduced data.

The production table is also published, with its full content. The production index and/or the release name are the most common filters for searching data through all the different code versions that have been used to process the data. Each production is associated with a pipeline, and linked to the input pipeline through the field input production id.

The SQL search page connects to PAUdb with a read-only generic user. The construction of the query and the selection of the correct data production is supported by the publication of the full database schema: the names of all the tables, their fields, types, descriptions, and the available indexes. The SQL searches are limited in the number of output rows, configurable by the user, but with an upper limit of 10k entries. The limitation is given by the performance and execution times of the platform on top of which the query runs. Nevertheless, PostgreSQL performance does not prevent having a very powerful tool to query the database, visualize the results on-line in the form of a table or plot (histogram or scatter), create new fields applying functions to or combining one or more fields, and download the result of the query in CSV format (https://tools.ietf.org/html/rfc4180). For wider searches it is possible to use CosmoHub, as explained in section 7.6. An example of the kind of query a user might run is sketched below.
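The sketch below shows such a query run through a direct read-only connection, equivalent to what the SQL search page does. The table and column names follow the forced aperture coadd table of Appendix B, with spaces rendered as underscores; the production id is made up.

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://readonly:secret@dbhost/paudb")  # placeholder DSN

# Co-added narrow-band fluxes for one (made-up) memba production.
query = """
    SELECT ref_id, band, flux, flux_error
    FROM forced_aperture_coadd
    WHERE production_id = 42
    LIMIT 10000  -- the web SQL browser enforces a cap of this order
"""
df = pd.read_sql(query, engine)
df.to_csv("coadd_fluxes.csv", index=False)  # same CSV export the page offers
```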
The web interface is also used to review the memba pipeline output, i.e. the combined flux measurements obtained through the forced aperture photometry method, for all bands and for each source measurement. A dedicated page of the web site allows the user to relate the memba result to all the pipeline reduction steps backwards, down to any of the original exposures, in a visual way, providing plots of many of the individual measurements that contribute to the final result (for the details of the reduction and analysis steps see ? and ?). An example is shown in Figure 12. The user can select a source from the reference catalog, either by its reference ID, or by search criteria such as area (e.g. RA, Dec), parameters of the source (e.g. magnitude in a given band), the PAUdm reduction results (e.g. flux limit in a given narrow band), etc.

Once an object is selected and loaded, its 40 co-added narrow band fluxes are displayed in a wavelength-flux plot. If an SDSS spectrum is available for that object, 40 corresponding SDSS measurements, derived by convolving the SDSS spectrum with the filter response functions of the 40 PAU narrow band filters, are superimposed, alongside emission and absorption lines from a line catalog (see the SDSS DR14 documentation for the methods to access the published SDSS spectroscopic data). This provides an initial check of how well the PAUdm results match results from spectroscopy surveys, and how well they pick up spectral lines; a sketch of this convolution is given at the end of this subsection.

Each of the narrow band measurements is selectable. On selection, a list of all forced aperture measurements that contributed to the overall co-added value is loaded. The user can inspect parameters specific to these individual measurements, e.g. flux, aperture size and orientation, sky background and observation date. For each individual measurement it is possible to visualize an image section around the source with the superimposition of the aperture region used to calculate the object flux and the annulus selected to calculate the sky background. Metadata related to the weather conditions and telescope parameters during the observations are loaded and displayed.

Figure 12: PAUS web interface aperture inspector page example.
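A minimal version of this spectrum-to-narrow-band convolution is sketched below, using a response-weighted mean flux as one common convention (the exact convention and the real filter curves used by PAUdm are described in ?).

```python
import numpy as np

def synthetic_flux(wavelength, spectrum, filt_wavelength, filt_response):
    """Response-weighted mean flux of a spectrum through one filter curve."""
    # Interpolate the filter response onto the spectrum grid (zero outside).
    response = np.interp(wavelength, filt_wavelength, filt_response,
                         left=0.0, right=0.0)
    return np.trapz(spectrum * response, wavelength) / np.trapz(response, wavelength)

# Toy example: a flat spectrum through a top-hat filter centered at 600 nm.
wl = np.linspace(450.0, 850.0, 4001)                  # nm
spec = np.ones_like(wl)                               # arbitrary flux units
filt_wl = np.array([593.0, 594.0, 606.0, 607.0])
filt_resp = np.array([0.0, 1.0, 1.0, 0.0])
print(synthetic_flux(wl, spec, filt_wl, filt_resp))   # ~1.0 for a flat spectrum
```

Repeating this for the 40 PAU narrow-band response curves yields the 40 synthetic SDSS points superimposed on the inspector plot.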
One of the tools we have adopted to access and distribute data for the PAUS collaboration is CosmoHub (https://cosmohub.pic.es/). CosmoHub is a web platform based on big data technologies, developed at PIC to perform interactive exploration and distribution of massive cosmological datasets without any SQL knowledge being required. The latest release has been built on top of Apache Hive, a data warehouse based on Apache Hadoop which facilitates reading, writing and managing large datasets.

CosmoHub is hosted at the Port d'Informació Científica (PIC) and provides support not only to the PAU Survey, but also to several international cosmology projects such as the ESA Euclid space mission, the Dark Energy Survey (DES) and the Marenostrum Institut de Ciències de l'Espai Simulations (MICE).

CosmoHub allows users to access value-added data, which usually are complementary files to analyze the data, such as sky or survey property maps, to load and explore pre-built datasets, and to create customized object catalogs through a guided process. All those datasets can be interactively explored using an integrated visualization tool which includes 1D histogram and 2D heatmap plots (Figure 14). Finally, all those datasets can be downloaded in standard formats.

We currently ingest into CosmoHub three tables of PAUdb (see a screenshot in Figure 13), containing metadata of memba pipeline productions (tables production and forced aperture coadd) and photometric redshift (photo-z) results. Data from external surveys used to calibrate, to perform forced photometry or to cross-check PAUS data (such as COSMOS, CFHTLenS or SDSS) is also accessible from CosmoHub for direct comparison, in addition to mock galaxy catalogs created by the MICE collaboration, which are used for calibrating and testing the different pipelines. The access to this information allows the PAU Survey collaborators to explore and download the PAUdm pipeline output data for its scientific exploitation.

Figure 13: PAUdm catalogs distributed in CosmoHub.

Figure 14: CosmoHub for PAUdm catalogs distribution and inspection example.
8. PAUS data center
PAU data management has been developed on top of the PIC infrastructure, which consists of a storage system, a network, a computing farm, databases and user interfaces (UI). Details of the PIC infrastructure are described in Appendix A.

The PAU project has access to a series of standard and customized services (user interface, test worker node, web servers on virtual machines and databases), supporting the activities of the data management. The standard user interface system of PIC allows any user of the PAU project to access the PIC infrastructure, with a dedicated 10 GiB "home" space, a shared scratch NFS area, and access to the PAU file archive. Interactive access to a test worker node offers an environment which allows developers to test and debug the PAU pipeline code before the production phase.

For computing operations, the PIC farm is shared among the hosted projects. 5% of the 8000 available cores at PIC (year 2018) have been assigned to PAUdm activities. The software developed for PAU Survey image reduction and analysis is managed through git, accessible from an NFS software area readable by all the nodes where jobs run.

Raw data from the WHT is transferred to PIC through a 10 Gb connection (further details on the PIC network connections can be found in Appendix A).

PAU files are permanently stored on tape. An automated double copy, on two different cartridges, guarantees the preservation of PAU Survey data in case of failure of one of them. The PAUCam data is kept in a disk buffer for a short period of time after its arrival at PIC, in order to read and process it as soon as it is transferred successfully.

The virtualized infrastructure of PIC hosts the web services for the PAU Survey: the official PAU Survey web site, and the internal web service (http://paudm.pau.pic.es), entirely dedicated to PAUdm activities (see section 7).

The PIC storage manager dCache integrates WebDAV, among others, as a native protocol for data access. In addition to its use for archive file access from the internal PAUdm web pages, the WebDAV protocol allows the groups that do not belong to the PAU Survey, and that received a time allocation to use PAUCam for their own scientific purposes, restricted access to download the raw and reduced images belonging to their program. Once PAU Survey raw and reduced data become public, it will be possible to have open access to them through WebDAV.

Figure 15: PAU infrastructure and services. For a detailed description of the PIC infrastructure see Appendix A.
9. Conclusions
The PAUS data management team activity is one of the key points for the success of the PAU Survey project. It is in charge of the data transfer from the observatory to the data center, the deployment and the execution of the nightly pipeline and other analysis pipelines, as well as the organization and distribution of the final results and the metadata produced during the data processing.

The PAUS PostgreSQL relational database is the solution adopted by the project for the PAUdm operation and orchestration. It has been designed to be the pillar of the data management, not only to preserve the information but also its consistency, from the raw metadata to the final catalogs and results reproducibility, keeping track of the archive status, the processes run, the metadata and the final catalogs. It has been developed with the idea of helping the data accessibility, as well as the distribution of the final catalogs. PostgreSQL has proven to be a good choice for frequent data insertion, deletion and update, but with limits when having more than hundreds of concurrent connections, blocking operations or updating the status of thousands of subjobs. ? shows the limits of PostgreSQL when querying data volumes larger than some hundreds of GiB.

CosmoHub has been a natural solution for catalog exploration and distribution, but the process of migrating PAUdb tables to this platform is still manual. We are exploring an automatic insertion of the memba PAUdm pipeline output into CosmoHub. Synchronizing contents between the PostgreSQL PAUS database and CosmoHub will not only improve the handling of PAUS catalogs, but also their findability, once the standardization of the CosmoHub content within the Virtual Observatory (VO) paradigm is completed.

The orchestration of the PAU data management tasks through the newly developed tool BT has been widely and successfully used for the nightly operations and job orchestration of PAUdm in the HTC infrastructure at PIC. The size of the project, the cost of designing, deploying, maintaining and upgrading a software tool like BT, and the availability of generic middleware tools for managing jobs (for example HTCondor, https://research.cs.wisc.edu/htcondor/) discouraged ad-hoc self-designed solutions. The advantage of a tool like BT is that, although built ad-hoc for the PAUdm pipelines case, it has been designed, developed and used in a wider and more generic context, proving its versatility and effectiveness in the interaction of complex task structures with the HTC environment.

The performance of the automatic transfer procedure has been demonstrated to fulfill the PAUS project needs. The download of the data to analyze from the observatory in La Palma to PIC finishes in a few hours, with total reliability during normal operations, and fully recovers in the rare cases of failure. While more sophisticated tools would allow total reliability and automation, the data volume to be transferred per night and the lifetime of the project led us to go for a simple and easily manageable solution.

The PIC storage system for PAU images has proven to be reliable, with total accessibility of the files stored on magnetic tapes and recovery capability. The low access rate of PAUS images by the nightly and memba pipelines justifies the use of this cheap, though slow, data storage support, with respect to other supports like hard disks or solid state drives.

The PIC computer farm has been able to process PAU jobs according to the expectations.
The high level of parallelization of the pipelines (see Figure 6) in independent jobs, the number of computing nodes and the distributed file system available at PIC for the PAUdm jobs allow us to fulfill the requirement of having nightly results and data quality available in time for optimizing the survey program. The HTC infrastructure enables the reprocessing of all the observations of the Survey in a few days, confirming it to be a valid choice.

Web services have been implemented for PAU data and metadata access, retrieval and analysis. They have received very positive feedback from the collaborators, due to their usability and information content. The data services implemented are giving valid and reliable support to the first scientific production exploiting the PAUS data (see ???).

Acknowledgements
Funding for PAUS has been provided by Durham University (via the ERC StG DEGAS-259586), ETH Zurich, Leiden University (via ERC StG ADULT-279396 and Netherlands Organisation for Scientific Research (NWO) Vici grant 639.043.512) and University College London. The PAUS participants from Spanish institutions are partially supported by MINECO under grants CSD2007-00060, AYA2015-71825, ESP2015-66861, FPA2015-68048, SEV-2016-0588, SEV-2016-0597, and MDM-2015-0509, some of which include ERDF funds from the European Union. IEEC and IFAE are partially funded by the CERCA program of the Generalitat de Catalunya. The PAU data center is hosted by the Port d'Informació Científica (PIC), maintained through a collaboration of CIEMAT and IFAE, with additional support from Universitat Autònoma de Barcelona and ERDF. We acknowledge the PIC services department team for the support work and the fruitful discussions.

Appendix A: PIC infrastructure
The Port d'Informació Científica (PIC) is a research support center that offers services to manage large amounts of data for scientific collaborations whose researchers are spread all over the world, using distributed computing technologies including clusters, grid, cloud and big data.

PIC has two different vaults in the same building, located on the campus of the Universitat Autònoma de Barcelona, with different characteristics and energy efficiency profiles: a 150 m² air-cooled room and a 25 m² highly energy efficient (?) room which uses open bath dielectric fluid tanks for the storage and computing IT equipment.

The PIC infrastructure offers a hierarchical mass storage service with a capacity of approximately 7 PiB on disk and 18 PiB on magnetic tape. The disk pools are managed by dCache/Chimera. Shared scratch disk space is also available (Network File System, NFS) to support operations.

Long term storage uses tapes, organized in a robotic tape library and managed by Enstore. The magnetic tape technology is constantly kept up to date. The technology currently in use (year 2018) is the Oracle T10000T2, with a capacity of 8.5 TiB (non compressed) per cartridge. The reliability of the magnetic support is extremely high (warranted uncorrected bit error rate of 1 × 10⁻¹⁹, 30 years of archival life and 25 000 loads/unloads). Data on tape is pre-staged in a disk buffer for fast access, totally transparent to the user.

The PIC computing batch system is based on PBS Torque/MAUI for the job queue system and scheduling, and HTCondor. It is integrated into the WLCG (Worldwide LHC Computing Grid), and it is also accessible by local users. The PIC computing cluster consists of 8000 cores running on Linux servers.

The external network is deployed in collaboration between the Catalan NREN (CSUC, Anella Científica), the Spanish NREN (RedIRIS) and the Géant pan-European network. Being the Spanish Tier-1 center for CERN, PIC is connected through a 20 Gb link between PIC, CERN and the other Tier-1s. It is also connected through the LHCONE network to the Tier-2s and to La Palma, one of the Canary Islands, where the Roque de los Muchachos Observatory is located.

PIC offers a series of services (user interface, test worker node, virtual machines) supporting the data management activities of the hosted projects (e.g. PAU, Euclid, MICE). The user interface system of PIC allows any user to access the PIC infrastructure, with a dedicated "home" space, access to a shared scratch NFS area, access to the storage system and to customized front-ends in order to optimize the delivery of users' scientific results. The virtualized infrastructure of PIC is based on Ovirt.

The PIC infrastructure includes a big data platform based on Hadoop (Hortonworks HDP v2.6), deployed at PIC for the analysis of simulated and observed cosmological catalogs, with a dedicated web portal called CosmoHub (see section 7.6) for data inspection and distribution.

PIC is involved in the EU HNSciCloud Pre-Commercial Procurement project led by CERN in order to deploy hybrid cloud prototypes oriented to science needs.
Appendix B: PAU database schema
In the following we list the name, description and columns of each of the main tables of the PAUdb schema. Foreign keys (e.g. production id, mosaic id, job id) determine the relations between tables.

Table B.1: PAUdb tables
• production: Tracks the different processing production runs for all pipelines. Columns: comments, created, id, input production id, job id, pipeline, release, software version

• mosaic: List of mosaic exposure images (raw and reduced). Columns: airmass, amb temp, archivepath, astro chi2, astro contrast, astro href sigma, astro nstars, astro nstars highsn, astro ref cat, astro ref sigma, astro status, comment, date creat, date obs, dec, detrend status, equinox, exp num, exp time, extinction, extinction err, filename, filtertray, filtertray tmp, guide enabled, guide fwhm, guide var, humidity, id, instrument, kind, mean psf fwhm, merged mosaics, nextend, obs set id, obs title, photo status, pressure, production id, psf model status, ra, rjd obs, telfocus, time creat, time obs, wind dir, wind spd, zp phot id

• image: List of images associated with the mosaics (CCD and single amplifier images). Columns: amp num, bkg mean, bkg std, ccd num, cosmic ratio, dec max, dec min, filter, gain, id, image num, max readnoise, mosaic id, naxis1, naxis2, n extracted, psf fit, psf fwhm, psf stars, ra max, ra min, rdnoise, saturate ratio, transparency, waveband, wavelength, zp nightly, zp nightly err, zp nightly stars

• obs set: List of observation sets registered in the database. Columns: id, instrument, log, night, notes, obs set, operator, rjd start, rjd stop

• project: Description of projects observing with PAUCam. Columns: contact email, contact name, created at, description, id, name

• crosstalk ratio: Crosstalk correction to be applied to each amplifier. Columns: amp num dest, amp num orig, ccd num dest, ccd num orig, production id, ratio

• detection: Detections measured directly on the image after the nightly data reduction. Columns: id, image id, insert date, band, background, class star, spread model, spreaderr model, flux auto, flux err auto, flux psf, flux err psf, flux model, flux err model, flags, elongation, dec, ra, x, y, zp offset

Table B.2: PAUdb nightly calibration tables
• phot method: Photometric methods. Columns: background method, background parameter, comments, extraction code, extraction method, extraction parameter, id, scatterlight method, scatterlight parameter

• phot zp: Photometric zero-points. Columns: band, date, id, production id, zp

• template: SED of star template references. Columns: filename, id, template index, template lib, template name

• star photometry: Calibration star fluxes. Columns: bg, bg err, flags, flux, flux err, id, image id, phot method id, ref cat, ref id, x image, y image

• star template zp: SED template associated with each star. Columns: id, star zp id, template fit band id, zp, zp error, zp weight

• star zp: Star zero points calculated in the nightly calibration. Columns: calib method, id, star photometry id, zp, zp error, zp weight

• image zp: Image zeropoint measurements for each photometry-calibration method. Columns: calib method, id, image id, phot method id, transparency, zp, zp error

Table B.3: PAUdb memba and catalogs tables
• forced aperture: Contains the single measurements using forced photometry in memba, for each band and pass and for each reference source. Columns: annulus a in, annulus a out, annulus b in, annulus b out, annulus ellipticity, annulus median, annulus samples, annulus sigma, aperture a, aperture b, aperture theta, aperture x, aperture y, flag, flux, flux error, image ellipticity, image id, pixel id, production id, ref id

• forced aperture coadd: Contains the combined measurements using forced photometry in memba, for each band and for each reference source. Columns: band, chi2, flux, flux error, n coadd, production id, ref id, run

• memba ref cat: Reference catalog used for a given memba production. Columns: production id, ref cat

• photoz bcnz: Photometric redshifts. Columns: chi2, ebv, n band, odds, production id, pz width, ref id, zb, zb mean
Table B.4: PAUdb BT tables

• dependency: Tracks the dependencies between Brownthrower jobs (operation table). Columns: super id, parent id, child id

• job: Tracks the list of Brownthrower computing jobs (operation table). Columns: config, description, id, input, name, output, status, super id, token, ts created, ts ended, ts queued, ts started

• quality control: Quality control entries measured during the data reduction process. Columns: check name, id, job id, max value, min value, plot file, qc pass, ref, time, units, value

• tag: Configurable tags associated with a job (tracebacks, logs, etc.). Columns: job id, name, value

Table B.5: PAUdb public external tables
• cosmos: External table from zCOSMOS. Sources with accurate redshifts for forced photometry and validation. Columns: acs a image, acs b image, acs mag auto, acs magerr auto, acs theta image, Bmag, conf, dchi, dec, ebv gal, ebv int, eI, F814W, Gmag, I auto, ICmag, Imag, Jmag, Kmag, mod gal, MV, NbFilt, paudm id, r50, ra, Rmag, sersic n gim2d, type, Umag, Vmag, zfits, zl68 gal, zl99 gal, Zmag, zp gal, zp sec, zspec, zu68 gal, zu99 gal

• sdss spec photo: External table from SDSS DR12 (SpecPhoto view). Sources with spectra for forced photometry and validation. Columns: class, dec, extinction g, extinction i, extinction r, extinction u, extinction z, fiberID, fiberMagErr g, fiberMagErr i, fiberMagErr r, fiberMagErr u, fiberMagErr z, fiberMag g, fiberMag i, fiberMag r, fiberMag u, fiberMag z, mjd, mode, modelMagErr g, modelMagErr i, modelMagErr r, modelMagErr u, modelMagErr z, modelMag g, modelMag i, modelMag r, modelMag u, modelMag z, objID, plate, ra, specObjID, subClass, survey, tile, z, zErr, zWarning

• spec conv: Convolved fluxes derived from spectra observations from external surveys (i.e. SDSS, COSMOS, DEEP2, etc.). Columns: band, flux, flux err, id, instrument, spec cat, spec id

• usnostars: Stars from the USNO catalog. Columns: dec, field, id, mag g, mag r, ra

• yalestars: Stars from the Yale catalog. Columns: dec, id, mag v, ra

• cfhtlens: CFHTLenS catalogue for forced photometry. Columns: alpha j2000, a world, backgr, bpz filt, bpz flagfilt, bpz nondetfilt, bulge-fraction, b world, c2, chi squared bpz, class star, delta j2000, e1, e2, erra world, errb world, errtheta j2000, extinction, extinction g, extinction i, extinction r, extinction u, extinction y, extinction z, field, field pos, fitclass, fit-probability, flag, flux radius, fwhm image, fwhm world, imaflags iso, imaflags iso g, imaflags iso i, imaflags iso r, imaflags iso u, imaflags iso y, imaflags iso z, isoarea world, kron radius, level, lp log10 sm inf, lp log10 sm med, lp log10 sm sup, lp mg, lp mi, lp mr, lp mu, lp mz, m, magerr g, magerr i, magerr r, magerr u, magerr y, magerr z, mag g, mag i, mag lim g, mag lim i, mag lim r, mag lim u, mag lim y, mag lim z, mag r, mag u, mag y, mag z, mask, maxval, model-flux, mu max, mu threshold, nbpz filt, nbpz flagfilt, nbpz nondetfilt, n-exposures-detec, n-exposures-used, nimaflags iso, odds, paudm id, psf-e1, psf-e1-exp[1-14], psf-e2, psf-e2-exp[1-14], psf-strehl-ratio, scalelength, seqnr, snratio, star flag, t b, t b stars, theta j2000, t ml, t ml stars, weight, xpos, ypos, z b, z b max, z b min, z ml

• deep2: DEEP2 catalog. Columns: comment, date, dec, dof, e2, magb, magberr, magi, magierr, magr, magrerr, mask, m b, mjd, objname, objno, obj type, pa, pgal, ra, rchi2, rg, sfd ebv, slit, slitdec, slitlen, slitpa, slitra, star type, ub, vdisp, vdisperr, z, zbest, zerr, zquality

• gaia dr2: External reference catalog from Gaia DR2.