A citizen science exploration of the X-ray transient sky using the EXTraS science gateway
Daniele D'Agostino, Duncan Law-Green, Mike Watson, Giovanni Novara, Andrea Tiengo, Stefano Sandrelli, Andrea Belfiore, Ruben Salvaterra, Andrea De Luca
Daniele D’Agostino a,∗, Duncan Law-Green b, Mike Watson b, Giovanni Novara c,d, Andrea Tiengo c,d,e, Stefano Sandrelli d, Andrea Belfiore d, Ruben Salvaterra d, Andrea De Luca d,e
a National Research Council of Italy - CNR-IMATI, Genoa, Italy
b Dept. of Physics & Astronomy, University of Leicester, U.K.
c Scuola Universitaria Superiore IUSS Pavia, Italy
d National Institute for Astrophysics INAF, Milan, Italy
e Istituto Nazionale di Fisica Nucleare - INFN, Italy
Abstract
Modern soft X-ray observatories can yield unique insights into time-domain astrophysics, and a huge amount of information is stored, and largely unexploited, in data archives. Like a treasure hunt, the EXTraS project harvested the hitherto unexplored temporal-domain information buried in the serendipitous data collected by the European Photon Imaging Camera (EPIC) instrument onboard the XMM-Newton satellite in 20 years of observations. The result is a vast catalogue describing the temporal behaviour of hundreds of thousands of X-ray sources. The catalogue, however, is only a starting point, because it must in turn be analysed further. During the project, an educational activity was defined and run in several workshops for high school students in Italy, Germany and the UK. The final goal is to engage the students, and prospectively citizen scientists, in the whole validation process: they look into the data and try to discover new sources, or to characterize already known sources. This paper describes how the EXTraS science gateway is used to accomplish these tasks and highlights the first discovery, a flaring X-ray source in the globular cluster NGC 6540.
Keywords:
Science Gateways; Astrophysics; Citizen science; X-ray astronomy; Virtual Observatory
∗ Corresponding author
Email address: [email protected] (Daniele D’Agostino)
Preprint submitted to Future Generation Computer Systems, November 18, 2019

1. Introduction

X-ray astronomy probes a wide diversity of phenomena related to the most extreme physical conditions that can be observed in the Universe: very strong gravitational and/or electromagnetic fields, very high temperatures, and populations of particles moving close to the speed of light [1]. Almost all cosmic X-ray sources, from flaring stars in the solar neighbourhood to accreting supermassive black holes in galactic nuclei at cosmological distances, are variable in time and show flux and spectral changes on distinct time scales, ranging from a fraction of a second to several years. Indeed, variability studies are crucial to understand the nature and physics of the sources.

Every day, observing facilities with time-resolved imaging capabilities [2] collect huge amounts of potentially interesting information about serendipitous X-ray sources and their temporal variability, which remains mostly unused, stored in data archives. In the soft X-ray range (∼[...]–12 keV), the European Photon Imaging Camera (EPIC) instrument onboard the European Space Agency mission XMM-Newton is the most powerful tool to study the variability of faint X-ray sources, thanks to its unprecedented combination of large effective area, good angular, spectral and temporal resolution, and large field of view [3, 4]. Twenty years after its launch, EPIC is still fully operational and its immensely rich archive of data, the XMM-Newton Science Archive (XSA), keeps growing.

Major efforts are ongoing to explore the serendipitous content of XMM data. For instance, a catalogue including all sources detected in EPIC observations is regularly updated as more data become available, and its most recent release [...]

(Footnotes: serendipitous sources are those lying by chance in the instrument field of view, in the portion of the sky surrounding the target of the observation; XSA: http://nxsa.esac.esa.int/nxsa-web; EPIC serendipitous source catalogue: https://heasarc.gsfc.nasa.gov/W3Browse/xmm-newton/xmmssc.html)
[...] to 2016 by several scientific institutions led by INAF, the Italian National Institute for Astrophysics: CNR-IMATI and IUSS Pavia (Italy), MPE and Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany), and the University of Leicester (UK). Its main goal was the realisation of a vast catalogue, to be released to the astronomical community, describing and quantifying the temporal behaviour of all EPIC sources through a set of synthetic parameters. To accomplish this goal, new data analysis algorithms were designed and implemented. The results and software are available through a science gateway (http://portal.extras-fp7.eu) [6] made up of an open-access database, named the EXTraS Archive, and a portal for running the software on new data or re-analysing an observation with different parameters. The software is of particular importance for enhancing the discovery potential of the XMM-Newton mission [7]. This is especially true looking to the future: the EPIC instrument is still fully operational and collects new data daily, and its operations may be extended by more than a decade.

As the most sensitive search for variability ever performed, EXTraS is raising new questions in high-energy astrophysics [8] and may serve as a pathfinder for future missions. The output catalogue, together with our analysis tools, discloses a huge amount of information relevant to studies of almost all classes of astrophysical sources. The present effort is focused on delving into this vast dataset and finding interesting phenomena, as in [9, 10].

Since the beginning of the project, the partners believed this would be an exciting challenge for citizens interested in astrophysics and for student internships. Therefore, an educational activity was defined to offer high school students the chance to go through the whole validation process: they study the data and try to discover new sources, or to characterise already known sources [11].
Besides doing good basic science, in which they learn and use statistics, physics and image handling, the students could be the very first people in the world to discover and characterize these transient and variable sources, which got their faces on the XMM-Newton cameras for a handful of seconds, just like extras in movies. Since October 2015, several workshops and related educational activities have taken place. This paper describes how the EXTraS science gateway has been used, together with the first important discovery, an enigmatic flaring source in the globular cluster NGC 6540 [12].

The paper is organized as follows. Section 2 discusses work related to citizen astronomy, Section 3 presents the EXTraS portal and Section 4 presents the EXTraS archive. Section 5 describes the activities carried out to engage students, while Section 6 presents the conclusions.
2. Citizen Astronomy and its Implementation
As reported by the Encyclopaedia Britannica, “The spectrum of projects and initiatives that fall under the umbrella of citizen science is broad, and consequently there is debate surrounding a formal definition”. One possible definition is “A citizen scientist is a volunteer who collects and/or processes data as part of a scientific inquiry. Projects that involve citizen scientists are burgeoning, particularly in ecology and the environmental sciences, although the roots of citizen science go back to the very beginnings of modern science itself” [13].

The most famous, if not the first, initiative in this field is SETI@Home [14], which since 1999 has exploited the idle computing time of ordinary citizens in the search for extraterrestrial life [15]. Presently, there are several Internet-based citizen science projects and associations devoted to promoting such collaboration paradigms, such as the Citizen Science Alliance, the OpenScientist blog and the American Association for the Advancement of Science. Furthermore, publications using data and results collected with the contribution of citizen scientists are becoming common in many disciplines, e.g. [16, 17, 18].

Although a valuable motivation for a citizen scientist in joining a science project is the satisfaction of his/her intellectual curiosity, thus contributing to scientific dissemination [19], both the previous definitions and the cited activities and publications mainly stress the active role that citizens fulfil when participating in a project [20]. They can:
• provide data, as for example in the Great Sunflower Project or CrowdHydrology [21];
• provide computational resources to process data, for example with the BOINC platform (http://boinc.berkeley.edu/);
• participate in the data processing and analysis.
The last two items correspond to the initiatives involving citizen scientists in astronomy [22].
In particular, there are several such initiatives, as can be seen on the Spacehack.org website, on the Zooniverse platform or, as regards providing computational resources, among the BOINC projects.

Focusing on data analysis, one of the most important platforms is Zooniverse [23, 24], “a platform for people-powered research where anyone can study authentic objects gathered by researchers, like images of faraway galaxies, by answering simple questions about them”. This platform hosts several projects, whose creation is straightforward: it only requires uploading the data and choosing the tasks the volunteers must do, such as answering questions or marking features in the data by drawing or tagging text. The platform today hosts about 21 live projects related to astronomy [25, 26], which have led to an increasing number of scientific discoveries, e.g. [27, 28, 29].

While Zooniverse is a collaboration between institutions from the U.K. and the U.S. supporting the creation of science gateways with the specific aim of offering citizen science tools, CitizenScience.gov is an official U.S. government initiative aimed at helping federal agencies and institutions accelerate the use of crowdsourcing and citizen science. Its focus is therefore on providing a toolkit for improving the software of scientific projects, i.e. on making tools originally developed for professionals usable by citizens as well, and on disseminating the advantages of this approach through an extensive list of more than 430 successful active or past projects.

Classification and annotation of data are the most common tasks, but more complex analyses can be performed. For example, the SkyServer initiative hosts several projects supporting the study of stars and galaxies in the Sloan Digital Sky Survey (SDSS) [30], i.e. the same objects that professional astronomers study. The focus here is twofold: to support the study and the use of heterogeneous data, e.g.
images, spectra, photometric and spectroscopic data, and of tools, e.g. for creating astronomical finding charts. (Footnote: https://help.zooniverse.org/getting-started/) Zooniverse is also developing an environment where more advanced data analysis tools can be developed and provided to citizen scientists.

A key factor for the success of such advanced initiatives, as outlined in [22], is “providing user-friendly, web-based tools enabling fairly sophisticated data analysis to be performed by anyone with a browser. Open source tool code is a minimal requirement in this model; finding ways beyond this to support citizen algorithm development seems to be likely to pay off”. One of the most important examples in this direction, which can be regarded as an evolution of SkyServer [31], is the SciServer collaborative environment for large-scale data-driven science [32]. It uses Jupyter notebooks [33] for advanced analysis in a Cloud environment. Jupyter is an interactive computing environment based on the concept of notebooks, i.e. documents including text, plots, equations and videos, but also live code and interactive widgets written in Python, R or other languages. In particular, the output generated by running code is embedded in the notebook, which makes it a complete and self-contained record of a computation that can be executed locally or on the same server as the source data. The result is an environment able to support highly complex searches and analyses spanning millions of sky objects from several databases.

Moreover, Jupyter notebooks can also be used to create science gateways [34]. In general, a science gateway can be defined as a set of software, data collections, instrumentation and computational capabilities that are integrated, using different technologies and middleware [35], via a Web portal (or a desktop application) into a user-friendly and effective environment supporting the scientific research and education activities of a specific community.
Science gateways are attracting increasing interest in many communities [36, 37], such as the astrophysics one [38, 39, 40], also because one of the best strategies to provide software and data to a scientific community is through a set of services designed following this paradigm. The portal we developed in the project and used for involving citizen scientists [6] is based on this paradigm.

In conclusion, most of the presented citizen science projects represent sci[...] (Footnotes: http://skyserver.sdss.org/dr7/en/tools/started/; https://blog.zooniverse.org/2013/11/26/zootools-going-deeper-with-zooniverse-project-data/ and https://github.com/zooniverse) But, at the same time, projects like SciServer allow college students and science-literate citizens to access the same tools as researchers who work with Big Data.
3. The EXTraS Portal
Figure 1: The architecture of the EXTraS science gateway
The EXTraS science gateway is composed of an archive, presented in the next Section, and a portal. The architecture is depicted in Figure 1 and described in detail here and in the next Section.

The EXTraS portal aims to provide users with a seamless environment to process the observations made available by the XSA with the [...] (Footnote: https://daily.zooniverse.org/2017/09/11/keeping-it-simple/)

Experiment configuration.
The Jobs Management User Interface is the home page of the EXTraS portal. It allows users to create, submit and manage the different analysis experiments based on the software developed within the EXTraS project. In particular, it lists all the submitted or configured analyses, making it possible to create a new analysis starting from an existing configuration or to share results with other users.

This module is based on AngularJS and is a complete web app, without any server-side code [41]. It uses the Persistence API to store and retrieve experimental data, and it activates the other portal modules corresponding to the different available operations.
Experiment execution.
All jobs are managed by the portal, based on computing resources provided by EGI Fedcloud [42] to virtual organizations (VOs), i.e. groups of users usually engaged in related research activities.

The Workflow Configuration module is responsible for interacting with the user to create and configure experiments based on the EXTraS software. In the portal every analysis corresponds to a single application, so there is no need to explicitly create and manage workflows. For this reason, a user interacts directly with an analysis-specific user interface (UI), as shown in Figure 2 for the Transient Analysis. Its main aim is to collect the parameter values and to create a namelist, which is passed to the FedCloud Submission Handler for the actual execution of the job.

The FedCloud Submission Handler module manages the submission of jobs and provides a full view of the job status, results and logs. In particular, the actual submission requires the user to specify as input for the analysis one or more observation identifiers (OBSIDs) among those contained in the XSA. Each of them corresponds to a job, so a single analysis configuration can result in multiple jobs executed at the same time. During the execution, the user can monitor the status of each job by means of the real-time log information provided by the software tool.

Since its official release at the beginning of 2017, 54 users have registered with the portal and submitted experiments for a total of 32,240 core-hours in 2017, 505,380 in 2018 and 10,040 in 2019, as reported by the EGI Accounting Portal.

Results management.
When a job terminates, the FedCloud Submission Handler makes it possible to retrieve the results and the log information. All the information related to a job (e.g. the configuration parameters, the logs, the results, the ownership/sharing information and any comments) is stored in the portal database via the Persistence API until the job is deleted by the user who owns it.

The EXTraS portal provides two further key features: the ability to share an analysis (i.e. the namelist and possibly the results), and support for interaction and discussion (in terms of comments) among the users sharing it. Sharing a completed job means not only that the experiment results are visible to other users, but also that the configuration is shared and can be used as a starting point for re-submitting the experiment on a new set of data. Thus, a job execution can be replicated by other users, who can, for example, validate the experiment results or explore the behaviour by changing one or a few parameter values.

Results computed within the portal are not automatically transferred to the archive: they have to be validated by the project community, which can use the portal to discuss them publicly.
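The job lifecycle described in this and the previous paragraphs (a configuration rendered as a namelist, one job per OBSID, sharing, comments and re-submission with a few changed parameters) can be made concrete with a minimal Python sketch. It is not the portal's code: the portal keeps this information through its Persistence API, which is not modelled here, and the Fortran-style namelist syntax, the group name `analysis` and the parameter names `threshold` and `use_pn` are assumptions made purely for illustration.

```python
import copy
from dataclasses import dataclass, field


def to_namelist(group, params):
    """Render analysis parameters as a Fortran-style namelist (format assumed)."""
    lines = [f"&{group}"]
    for key, value in params.items():
        if isinstance(value, bool):  # test bool before numbers: bool is an int subclass
            rendered = ".true." if value else ".false."
        elif isinstance(value, str):
            rendered = f"'{value}'"
        else:
            rendered = repr(value)
        lines.append(f"  {key} = {rendered}")
    lines.append("/")
    return "\n".join(lines)


@dataclass
class Job:
    owner: str
    obsid: str
    params: dict
    status: str = "SUBMITTED"
    shared_with: set = field(default_factory=set)
    comments: list = field(default_factory=list)

    @property
    def namelist(self):
        # Hypothetical group name; the real EXTraS namelist layout is not shown in the text.
        return to_namelist("analysis", self.params)

    def can_view(self, user):
        return user == self.owner or user in self.shared_with

    def share(self, user):
        self.shared_with.add(user)

    def comment(self, user, text):
        if not self.can_view(user):
            raise PermissionError(f"{user} cannot comment on this job")
        self.comments.append((user, text))

    def resubmit(self, user, obsids, **overrides):
        """Reuse a shared configuration on new data, changing only a few parameters."""
        if not self.can_view(user):
            raise PermissionError(f"{user} cannot reuse this job")
        params = {**copy.deepcopy(self.params), **overrides}
        return submit(user, obsids, params)


def submit(owner, obsids, params):
    """One analysis configuration expands into one independent job per OBSID."""
    return [Job(owner=owner, obsid=o, params=copy.deepcopy(params)) for o in obsids]


jobs = submit("alice", ["0781690201"], {"threshold": 5.0, "use_pn": True})
jobs[0].share("bob")
jobs[0].comment("bob", "The flare is visible in all three cameras.")
rerun = jobs[0].resubmit("bob", ["0781690201"], threshold=4.0)
print(rerun[0].namelist)
```

Keeping the full parameter set with each job is what makes a shared job reproducible: another user can re-run it unchanged, or vary a single value such as the hypothetical `threshold` to check how robust a detection is, while the original job's parameters stay untouched.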
Non-academic users.
Citizen scientists can interact with the portal exactly as scientists do. In particular, they can exploit the EGI Fedcloud infrastructure for submitting jobs. This has been made possible by the use of “robot certificates” [43].

EGI Fedcloud in fact relies on a single sign-on mechanism to access the federated services, based on X.509 certificates and VO membership. Robot certificates were introduced to allow users who cannot get, or are not familiar with, personal digital certificates to exploit any distributed infrastructure relying on them in their research activities. A robot certificate is usually associated with a specific application (or function) that the application developer/provider wants to share with the whole VO [44]. This is exactly the scenario arising in the EXTraS activities, because portal users can run only pre-defined software tools. Therefore the FedCloud Submission Handler module interacts with an e-Token [...]

(Footnote: https://accounting.egi.eu/vo admin/cloud/)

Figure 2: The Transient Analysis UI shown by the Workflow Configuration module of the EXTraS portal.
4. The EXTraS Data Archive
The EXTraS Data Archive is the primary online repository for the data generated during the EXTraS project. It builds on the pointed and slew XMM-Newton Serendipitous Source Catalogues [47, 48] and supports a wide range of newly generated X-ray light curves, associated source catalogues with measures of variability and statistical significance, and derived products such as hardness ratios, power spectra, source classifications and source summary files.

The design of the archive is derived from the existing high-energy astrophysics data archive at the University of Leicester, the Leicester Database and Archive Service (LEDAS), which hosts data from several major X-ray missions, including a full set of XMM-Newton EPIC serendipitous data (around 200 million seconds of observing time). The core archive system originally developed for LEDAS has been completely rewritten to current software development standards (including a Model-View-Controller architecture and unit testing). Moreover, a major technological development in archive provision after 2000 has been the rise of e-Science and, particularly in astronomy, of the Virtual Observatory (VObs) concept [49, 50]. The goal of the VObs is to provide a standardised set of metadata, access protocols and supporting software tools to improve access to, and interoperability between, very large astronomical datasets.

The data.
EXTraS draws on the 3XMM-DR4 public data to search for periodic, aperiodic and transient sources with much greater fidelity than the basic light curves and Fourier transforms generated as part of the 3XMM automated processing. This variability is being studied down to the native time resolution of the instruments (typically 73 ms and 2.6 s for the EPIC pn and MOS cameras, respectively). Long-term variability is being characterised for all sources with multi-epoch data. Upper limits will be computed in the case of non-detections in specific observations.

Catalogues and bulk products produced during the EXTraS project have been transferred to the University of Leicester, UK (UoL), to be incorporated in LEDAS. Details are provided in [51]. A total of 18 TB of LEDAS/EXTraS data is presently held on UoL central archival storage.

For VObs applications to handle EXTraS time series datasets fully and consistently, the data structures and metadata must be documented according to a standard syntax, as defined by the International Virtual Observatory Alliance (IVOA) [52]. However, the existing data models are insufficient to describe the EXTraS data products. For this reason, during the project we decided that a specific data model should be developed following two basic principles: simplicity, which eases management and adoption, and adherence to existing IVOA standards, which increases the probability of wider adoption. To that end, we propose to base the new data model, named ExtDM, on the existing PhotDM standard, with an extension to the time domain [...]

The search form.
The main page of the archive provides a top-level menu to access EXTraS data products, subdivided by the kind of analysis they relate to, i.e. Aperiodic Variability, Search for Periodicity, Transient & Highly Variable Sources, Long-Term Variability and Multiwavelength Classification. A combined catalogue, which searches all EXTraS source catalogues, and a basic catalogue cross-matching facility are also provided. Direct catalogue downloads in FITS [53] and VOTable [54] formats are available. The archive also provides an option to access all catalogues currently held by the LEDAS system (over 900 catalogues at the time of writing), covering a wide variety of missions and wavebands. Online help is available for all catalogues.

The catalogue search form, shown in Figure 3, allows searches around a given sky position, using either a cone (fixed radius from the search centre), box or rectangle search area. The form also makes it possible to type the source name and click on “Resolve Name”. In this case the Simbad database [55] is queried to find the coordinates of the object; if successful, these are displayed in the Cone Search input box on the query form. A minimal or full set of table columns can be selected, and a variety of output formats displayed.

Archive search results are generated in multiple formats (HTML, ASCII table, CSV, VOTable, etc.) using the Twig templating language for PHP. Twig is a full-featured presentational language for data output, including the generation of web pages: it handles branch and loop constructs and supports multiple inheritance, enabling complex page layouts to be built from a common set of simple blocks. Tabular search results are presented using the JavaScript DataTables library, which produces tables that can be filtered, sorted and paginated on the fly by the end user. An example is shown in Figure 4 for the full list of results, while Figure 5 shows the bulk products associated with a given dataset ID, organised by category.
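The cone search offered by the form amounts to selecting catalogue rows within a given angular distance of the search centre. The Python sketch below only illustrates that selection under stated assumptions: rows are plain dictionaries with `ra`/`dec` in degrees, the source names and coordinates are placeholders rather than real catalogue entries, and the archive's actual server-side query in the LEDAS system is not shown. The haversine form of the great-circle distance is used because it stays numerically accurate for the arcsecond-scale radii typical of X-ray source matching and handles the RA wrap at 0°/360°.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp against rounding slightly above 1 before taking asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(rows, ra0, dec0, radius_arcsec):
    """Rows within radius_arcsec of (ra0, dec0), nearest first, with separation attached."""
    radius_deg = radius_arcsec / 3600.0
    hits = []
    for row in rows:
        sep = angular_separation_deg(ra0, dec0, row["ra"], row["dec"])
        if sep <= radius_deg:
            hits.append({**row, "sep_arcsec": sep * 3600.0})
    return sorted(hits, key=lambda r: r["sep_arcsec"])


catalogue = [  # placeholder rows, not real sources
    {"name": "SRC_A", "ra": 10.0, "dec": 0.001},
    {"name": "SRC_B", "ra": 10.002, "dec": 0.0},
    {"name": "SRC_C", "ra": 11.0, "dec": 0.0},
]
hits = cone_search(catalogue, ra0=10.0, dec0=0.0, radius_arcsec=10.0)
for row in hits:
    print(row["name"], round(row["sep_arcsec"], 2))
```

A box or rectangle search would replace the distance test with range checks on RA and Dec; a linear scan like this one is fine for a sketch, while a real archive over hundreds of thousands of sources would rely on a spatial index in the database.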
(Footnote: http://twig.sensiolabs.org/)

Figure 3: The advanced catalogue search form in the EXTraS archive

Visualization.
Besides the download of the results data, an expanding set of interactive visualisations for EXTraS catalogue and bulk product data is provided. The visualisations are generated directly in the browser (i.e. by clicking on the blue “PREVIEW” buttons to the left of the links in Figure 5) and require no additional software installation.

The time-series analysis tasks generate science output typically in FITS format, which cannot natively be displayed in a web browser. A static “quick look” graphical view of e.g. spectra and time series is already available in PNG and PDF formats for sources present in the 3XMM-DR5 catalogue. For new sources, the archive data ingest script is able to generate such static images automatically using, e.g., Python matplotlib.

Figure 4: The full list of results as an HTML page. The buttons on the left allow one to explore all the products, and the links to download selected files directly.
Moreover, the archive interface has been enriched with dynamic, interactive visualisations using the Python graphics library Bokeh running on the Python WSGI application server gunicorn. Requests are sent to the visualisation server through a REST interface, and the server responds with a JSON data block which is rendered by the view using standard AJAX techniques. Examples of EXT-PDA visualisation output are shown in Figures 6 and 7. The use of JSON for data transfer enables additional features to be added to EXTraS archive data analysis in the future using standard JavaScript libraries, such as jStat.
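The request/response exchange between the archive pages and the visualisation server can be sketched as a plain WSGI application, which is the interface gunicorn serves. This is a hypothetical stand-in, not the archive's Bokeh-based server: the `/lightcurve/<source_id>` route, the `LIGHTCURVES` dictionary and its values are invented for illustration, and the Bokeh rendering step is left out so that only the REST-plus-JSON pattern remains.

```python
import json

# Hypothetical in-memory store standing in for the archive's light-curve products.
LIGHTCURVES = {
    "demo-src": {"time": [0.0, 100.0, 200.0], "rate": [0.12, 0.95, 0.10]},
}


def app(environ, start_response):
    """Minimal WSGI app: GET /lightcurve/<source_id> returns the time series as JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/lightcurve/"
    source_id = path[len(prefix):] if path.startswith(prefix) else None
    if environ.get("REQUEST_METHOD", "GET") != "GET" or source_id not in LIGHTCURVES:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")
    else:
        status = "200 OK"
        body = json.dumps({"source": source_id, **LIGHTCURVES[source_id]}).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Saved as, say, `lcserver.py` (a hypothetical module name), this could be served with `gunicorn lcserver:app`; the page's JavaScript would then fetch the JSON via AJAX and pass the arrays to its plotting code. Because the payload is plain JSON, any client-side library can consume it, which is the extensibility point made in the paragraph above.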
5. The Education Activities
During the project, a workshop scheme was designed to let high school students play the role of researchers, i.e. they are introduced to the scientific method and learn how to work with the EXTraS software tools and results. The formal goal of the students is the validation and classification of astronomical sources with a peculiar temporal behaviour (pre-selected by researchers), by exploring the data extracted from the EXTraS data archive.

(Footnotes: http://bokeh.pydata.org/; http://gunicorn.org/)

Figure 5: The summary page of the catalogue results for each single source

The workshop has been designed as a live role-playing activity with a strong hands-on strategy and some formal education: the students are divided into cooperating groups that attend laboratories and lessons given by researchers. The groups gather information about high-energy astrophysics, space technologies, data mining and analysis. Their ability to interact with experts is fundamental to reach their goals.

A typical workshop lasts about 1-2 weeks (40-80 hours). We explicitly adopt an inquiry-based learning strategy, together with a peer-to-peer education approach. This global strategy both fosters critical thinking and engages the students, who act as though they were researchers. A short but clear framework for describing inquiry in astronomy is provided by the IAU project astroEDU [56] (http://astroedu.iau.org/ebl/).

Figure 6: Bokeh visualisation of Aperiodic Variability MOS1 timeseries

During the analysis of possibly interesting X-ray source candidates, the students are guided by five major questions:
1. Is the selected candidate a real, astrophysical X-ray source?
2. Is the peculiar temporal behaviour related to any instrumental effect?
3. Does the source have a possible counterpart at other wavelengths (e.g. optical)?
4. What kind of astrophysical object or phenomenon might it be?
5. Might it be an important scientific discovery?

We ran several workshops, and other educational activities building on the workshop experience, in Italy, Germany and the United Kingdom, engaging more than 200 high school students from 2015 onwards. Details are available on the project website.

Figure 7: Bokeh visualisation of Long-Term Variability timeseries

The first part of the workshop aims at engaging participants in contemporary astrophysics, with specific regard to the high-energy band. Researchers or teachers introduce them to the contents and the technical language of astronomy (such as field of view, electromagnetic spectrum, time resolution, spatial resolution and energy resolution, light curve, photons and so on) and of mathematics (such as probability, confidence level and so on). At this stage, researchers and teachers should try to avoid a frontal approach: they are requested to solicit questions and discussion on the treated topics, instead of acting as pure experts. The goal of this part is to provide the students with the right method and conceptual tools to tackle the tasks of data reduction and interpretation, so some degree of passive learning on the part of the students is assumed.

The second part of the workshop aims at giving the students the opportunity to analyse the results produced by the EXTraS software tools, either [...]

Using the archive.
A sample of sources with peculiar temporal behaviour is pre-selected by researchers, based on a blind search in the results of the automatic characterization of variability performed within EXTraS. Such sources are proposed to the students for validation and classification. We considered the case of short-term variability, ranging from a few seconds to a few hours (i.e. on time scales shorter than the duration of an XMM observation), focusing in particular on flaring/transient sources (displaying a sudden, large increase in flux and a subsequent fading to the pre-burst flux level) and on eclipsing/dipping sources (displaying a sudden drop in flux, followed by a fast rise to the pre-eclipse level). For each source, students are asked to:
1. confirm that the source candidate corresponds to a real astronomical source by inspecting the EPIC sky images (is the source candidate close to a CCD edge or defect? Is it located close to a very bright source, or superimposed on a bright diffuse source?);
2. confirm that the candidate flare or dip is real, by inspecting the light curves of the three EPIC cameras;
3. exclude the possibility that the flare/dip is related to imperfect background subtraction, by inspecting the background light curves (is the feature simultaneous with a background flare?);
4. check whether the feature can be detected in multiple energy ranges, by inspecting energy-resolved light curves;
5. check whether the source is already known and classified, and whether its peculiar temporal behaviour has already been observed and published, by using the Simbad database;
6. if the source is not known/classified, search for possible counterparts in multi-wavelength catalogues and using the ESASky tool (http://sky.esa.int);
7. propose a classification for the source and an interpretation of its temporal behaviour;
8. try to determine whether this could be an interesting discovery.

All of the above steps can be completed by inspecting EXTraS products in the archive, or using online resources linked to the archive web pages (e.g. the 3XMM catalogue and its products, the Simbad database, and the results of cross-correlation with multi-wavelength catalogues).

We decided to focus on flares and eclipses because the simple shape of the temporal modulation eases several steps of the analysis. In principle, the same approach could be applied to any kind of temporal variability. For instance, an interesting possibility could be to study sources with large variability on long time scales (ranging from a few days to a few years, e.g. between different XMM observations). Details on how the students use the archive interface are provided in Appendix A.
Using the portal.
The EXTraS science gateway has also been designed to give non-expert users the possibility of extending the EXTraS analysis to new XMM observations, or of reproducing the EXTraS analysis of old observations with different settings and input parameters. During the workshop, the students also have the opportunity to experience this additional step in the work of a researcher by selecting targets from recent XMM observations, not included in the EXTraS archive, and performing the full EXTraS analysis, starting from raw data and reaching down to the interpretation of results.

The choice of the targets of the new observations is left to the groups of students, with only some general suggestions by the researchers. This resulted in some groups selecting particular sky regions (e.g. star clusters or nearby galaxies) and others choosing random positions (e.g. selecting the most recent or the longest observations).

After selecting the XMM observations (the observation ID is the main input that students enter into the portal interface), the students could process the data (automatically downloaded from the XSA) with the same Transient & High Variability software developed and used in the project, with the possibility of modifying a limited set of parameters (e.g. the detection threshold), using the portal UI as described in detail in [6]. In order to obtain a significant number of transient candidates to validate, the researchers suggested suitable values, but encouraged the students to repeat the analysis with different parameters.

Since the full data processing of each observation takes from a few to several hours, each group could select only a handful of observations, and the data processing could be performed by submitting the jobs to Fedcloud resources and collecting the results on the following day.
After the data processing, the results are stored in a TAR (Unix Tape Archive format) file, ready to be downloaded to the students’ laptops independently of the archive interface. The results can then be analyzed by the students following a procedure similar to the one described above for the analysis of sources in the EXTraS archive.
Given the limited number of observations that could be processed during the workshops, compared to the much larger amount of computing time used during the EXTraS project to populate the archive, only a few possibly real X-ray transients could be discovered by the students. This aspect is of particular importance. On the one hand, we showed the students a significant part of the process leading to scientific discovery, and the difficulty of obtaining scientifically relevant results. On the other hand, this limited test demonstrates how the use of the portal by a much broader community of citizen scientists, guided by expert astronomers, could substantially support the scientific community in performing such analyses.
Despite the low probability of finding a new transient, a group of students analyzing observation 0781690201 discovered a new X-ray transient candidate. This observation covers a region of the Galactic plane and has a duration of ∼78 ks. The Galactic plane is characterized by the presence of numerous stars that can give rise to stellar flares, which are the most frequent transient events expected in the X-ray sky. Using the visualization software developed within the EXTraS project for the screening analysis, the students identified the new X-ray transient by analyzing the light curves of the three EPIC instruments in the 0.5-4.5 keV energy band. The peak in the light curve, clearly visible in Fig. 8, has a duration of ∼800 seconds and occurs in a period of low background. The students then entered the coordinates of the transient candidate in the ESASky online tool (http://sky.esa.int) to search for counterparts in other energy bands, finding a correspondence with a clearly visible star in the optical/IR bands.
The first result: the peculiar transient J1806-27.
A very unusual phenomenon was discovered during a workshop held at INAF/IASF Milano in September 2017. Six high-school students participated in the research work: Lorenzo Apollonio, Bartolomeo Bottazzi-Baldi, Martino Giobbio, Razvan Patrolea, Elena Pecchini and Cinzia Torrente from Liceo Scientifico G. B. Grassi, Saronno, Italy.
Figure 8: The analysis of a transient candidate with the EXTraS screening visualization tool.
As input for the students’ work, we blindly selected ∼200 sources in the EXTraS database, half of them consistent with a transient behaviour and half with a dipping behaviour. The students were arranged in two groups, with the task of scanning the list of sources following the steps described in the previous section. At the end of the workshop, the two groups presented their results: a small sample of sources they had selected as potentially interesting. The case of the candidate transient 3XMM J180608.9-274553 (J1806-27) drew our attention.
The transient is detected with high significance by all EPIC cameras and lasts only a few minutes, as shown in Fig. 9. The source, which had never been studied before, lies within the core of the Galactic globular cluster NGC 6540. The students’ report triggered a more detailed analysis by a team of astronomers at INAF/IASF. The transient, with a duration of ∼300 s and symmetric rise and decay times, turned out to be very peculiar, defying any classification. Assuming the source to be in NGC 6540, its peak luminosity of ∼ … erg s⁻¹ is orders of magnitude lower than that of thermonuclear X-ray bursts from neutron stars in low-mass X-ray binary systems. A flare from a coronally active star in the cluster can also be excluded because of the short duration, a factor of ∼100 smaller than expected from the known correlation between luminosity and duration of stellar flares. A less luminous flare from a foreground star is also unlikely, because of the lack of an optical counterpart in archival images collected with the Hubble Space Telescope. The properties of this peculiar transient have been published in full detail in [57]. The discovery and the active role played by the students were the subject of an article on the website of the European Space Agency, and were also included in the weekly “Research highlights” section of Nature.
Figure 9: The brightness of J180608.9-274553 changed by up to 50 times its normal level in 2005, and quickly fell again after about five minutes.
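The argument against a burst or a cluster stellar flare rests on converting a peak flux into a luminosity at the assumed cluster distance and comparing it, in decades, with reference values. A minimal sketch of that arithmetic, assuming isotropic emission (L = 4πd²F); the flux and distance below are placeholders, since neither the measured flux nor the adopted distance to NGC 6540 is given in this section, and the helper names are hypothetical.

```python
import math

KPC_IN_CM = 3.0857e21  # 1 kiloparsec in centimetres


def isotropic_luminosity(flux_cgs, distance_kpc):
    """L = 4*pi*d^2*F in erg/s, for a flux in erg/cm^2/s and a distance in kpc."""
    d_cm = distance_kpc * KPC_IN_CM
    return 4.0 * math.pi * d_cm ** 2 * flux_cgs


def orders_of_magnitude(l_reference, l_source):
    """How many decades l_source lies below l_reference (positive if fainter)."""
    return math.log10(l_reference / l_source)


# Placeholder inputs, NOT the J1806-27 measurement.
lum = isotropic_luminosity(1e-12, 1.0)
print(f"L = {lum:.3e} erg/s")
```

Because L scales as d², a factor-of-two error in the assumed distance changes the luminosity by a factor of four, which is why the cluster membership assumption is stated explicitly in the text.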
6. Conclusions and Future Work
The EXTraS project, being related to rare, exotic phenomena and to extreme, poorly known astrophysical sources, has great potential for the
Acknowledgments
This research has made use of data produced by the EXTraS project, funded by the European Union’s Seventh Framework Programme under grant agreement no. 607452. This work used the EGI infrastructure with the dedicated support of CYFRONET-CLOUD and INFN-CATANIA-STACK. The authors would like to thank the EGI staff, in particular Giuseppe La Rocca and Diego Scardaci, for their valuable support. J1806-27 was selected as a potentially interesting source by L. Apollonio, B. Bottazzi-Baldi, M. Giobbio, R. F. Patrolea, E. Pecchini and C. A. Torrente (Liceo Scientifico G. B. Grassi, Saronno) during their internship at INAF-IASF Milano in September 2017, within the Alternanza Scuola-Lavoro initiative of the Italian Ministry of Education, University and Research.
Appendix A. Study of candidate flaring sources
Each group is assigned a list of pre-selected candidate flaring sources: an ASCII file with one row per source, including the source sky coordinates and a small set of other source identifiers (e.g. observation ID, exposure ID, unique source ID). For each source in the list, each group should complete the following steps, guided by the 5 major questions listed in Sect. 5.
1. Go to the EXTraS Archive main data access page, select the “Aperiodic, short-term variability” query form and search for the source by coordinates.
2. In the Results page, a list of all the XMM exposures (for each EPIC camera) analyzed by EXTraS is displayed. Select the appropriate observation and exposure ID for the pn camera and click on the Products button.
3. A list of all bulk products generated by EXTraS for the pn camera in the selected exposure is displayed. Follow the link to the 3XMM summary page.
4. Inspect the images: is the source in a noisy/confused region? Is it close to a detector edge?
Steps 3-4 allow one to answer the major question: “Is the selected candidate a real, astrophysical X-ray source?”
5. Go back to the list of bulk products; select the Bayesian block light curve (full energy range) using the search box (e.g. by searching for the “bblc” string within the names of the products) and visualize it by clicking on the Preview button. Answer the question: is a flare visible?
6. Go back to the list of bulk products; select and visualize the uniform bin light curve (500 s bin, full energy range). Is a flare visible? Identify the peak and read its time, count rate and the corresponding uncertainty by clicking on the selected data point in the plot.
7. Go back to the list of bulk products; select and visualize the background light curve. Is there any correlation between the behaviour of the source and that of the background? Is there any background flare temporally coincident with the candidate source flare?
8. Go back to the list of bulk products; select and visualize the uniform bin light curve (500 s) in the super-soft, soft and hard energy ranges. Is the flare visible in all of the energy ranges?
9. Go back to the Results page and select, for the same observation, the list of bulk products corresponding to the MOS1 camera.
10. Repeat steps 5-6 for the MOS1 camera. Is the flare visible in the Bayesian blocks light curve? In the uniform bin light curve? Is it the same feature seen with the pn camera (simultaneous, with a similar light-curve shape and roughly half the count rate measured by the pn)?
11. Go back to the Results page and select, for the same observation, the list of bulk products corresponding to the MOS2 camera.
12. Repeat steps 5-6 for the MOS2 camera. Is the flare visible in the Bayesian blocks light curve? In the uniform bin light curve? Is it the same feature seen with the pn camera (same checks as for MOS1)?
Steps 5-12 allow one to answer the major question “Is the peculiar temporal behaviour related to some instrumental effect?”
13. From the bulk products page of any camera, go again to the 3XMM summary page and follow the link to the SIMBAD results.
14. A source list is displayed, ranked by increasing distance from the position of the 3XMM source. Select the entries within 10 arcsec of the 3XMM position by clicking on their identifiers.
15. For each selected source, read the classification and take note of all the relevant information (e.g. distance, multi-wavelength counterparts, association with a globular cluster or a supernova remnant). Follow the References link, identify any published paper that could include an analysis of the XMM dataset, and scan it to assess whether the flare is detected/described; also take note of any X-ray observation performed with other instruments.
16. Start the ESASky app in the browser (not yet integrated with the EXTraS archive) and search for the source by coordinates. A portion of the sky surrounding the selected position is visualized. It is possible to display the sky as seen at different wavelengths by different instruments.
17. Assess the presence of any counterpart at different wavelengths (near infrared, optical, gamma-ray).
Steps 13-17 allow one to answer the major question “Has the source a possible counterpart at other wavelengths (e.g. optical)?”
18. Using all the collected information, try to answer the last two major questions: “What kind of astrophysical object or phenomenon might it be?” and “Might it be an important scientific discovery?”
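Several checks in the list above reduce to simple numerical tests: keeping SIMBAD entries within 10 arcsec (step 14), testing whether a background flare overlaps the source flare in time (step 7), and checking that the MOS peak rate is roughly half the pn one (steps 10 and 12). The sketch below expresses these checks as small Python functions. All function names are hypothetical, the archive does not expose them, and the 0.3-0.8 ratio window is an assumed tolerance around “roughly half”. Separations use the standard haversine formula.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two (RA, Dec) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def within_radius(src_ra, src_dec, candidates, radius_arcsec=10.0):
    """Step 14: names of (name, ra, dec) entries within radius_arcsec of the source."""
    return [name for name, ra, dec in candidates
            if angular_sep_arcsec(src_ra, src_dec, ra, dec) <= radius_arcsec]


def overlaps(a, b):
    """True if the time intervals a=(start, stop) and b=(start, stop) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def background_coincident(flare, bkg_flares):
    """Step 7: does any background flare interval overlap the source flare interval?"""
    return any(overlaps(flare, b) for b in bkg_flares)


def mos_consistent_with_pn(pn_peak, mos_peak, lo=0.3, hi=0.8):
    """Steps 10/12: is the MOS peak rate roughly half the pn one (assumed window)?"""
    return pn_peak > 0 and lo <= mos_peak / pn_peak <= hi


# Approximate position decoded from the name 3XMM J180608.9-274553.
ra, dec = 271.537, -27.765
nearby = [("A", ra, dec + 5 / 3600), ("B", ra, dec + 20 / 3600)]
print(within_radius(ra, dec, nearby))                        # ['A']
print(background_coincident((1000, 1300), [(5000, 6000)]))  # False
print(mos_consistent_with_pn(0.2, 0.1))                     # True
```

Simultaneity between cameras (step 10) can reuse `overlaps` on the flare intervals measured in each camera.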