Metadata and their importance in SO/PHI's on-board data processing
K. Albert, J. Hirzberger, D. Busse, J. S. Castellanos Durán, P. Gutierrez-Marques, M. Kolleck
Max Planck Institute for Solar System Research, Göttingen, Germany; [email protected]
Abstract.
To cope with telemetry limitations, the Polarimetric and Helioseismic Imager on Solar Orbiter performs full on-board data processing. Metadata are central to the autonomous processing flow, crucial for providing science-ready data sets to the community, and important in the blind debugging process that will occur in the commissioning phase. We designed a custom metadata logging system for SO/PHI. This paper shows how the logged information is used in the blind debugging scenario.
1. Introduction
State-of-the-art scientific instruments, especially those deployed in deep space, often produce more data than can be downloaded. This is the case for the Polarimetric and Helioseismic Imager (PHI, Solanki et al. 2018) on board the Solar Orbiter (SO, Müller et al. 2013) spacecraft. SO/PHI is an imaging spectropolarimeter, recording four-million-pixel images at six wavelengths in four polarisation states to retrieve five physical quantities: magnetic field strength, inclination, azimuth, line-of-sight velocity, and temperature. The telemetry limitations of Solar Orbiter would allow downloading ∼30 raw science data sets in each orbit. An orbit (∼160 days) typically accommodates 30 days of observations at strategic points, so downloading raw data would mean little data return. In addition, due to the accuracy requirements of SO/PHI, instrument characterisation must be done in orbit, right before the observations, which adds significantly to the necessary telemetry.
To maximise science return and cope with telemetry constraints, we implemented full on-board data processing in SO/PHI (see Albert et al. 2018a; Lange et al. 2017). The instrument calculates operational parameters for data acquisition and determines calibration data, which are then applied to the science data sets before reducing them to the final physical quantities of interest by inverting the radiative transfer equation (RTE, see Fig. 1). These steps are done for the first time in orbit, with severely limited hardware compared to ground processing (see Albert et al. 2018b), without free entry points for verification and without full access to partial results.
Metadata play a central role in the design of the on-board data processing system. Each data set has its own associated metadata file, created at image acquisition, containing all hardware parameters and processing plans. All processing steps add their own entries to this information, generating a full log of the processing. This file is then used both on board and on the ground.
Figure 1. Typical processing pipeline for spectropolarimetric science data, with compulsory and optional steps.
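As a rough check of the data volume argument in the introduction, the following sketch counts pixels before and after on-board processing. The 2048 × 2048 frame size is an assumption standing in for the "four-million-pixel" figure; bit depth and compression are deliberately left out, so only the ratio is meaningful.

```python
# Rough pixel-count comparison of a raw data set and its processed result.
# ASSUMPTION: a 2048 x 2048 frame stands in for the "four-million-pixel" image.
PIXELS_PER_IMAGE = 2048 * 2048

WAVELENGTHS = 6          # from the text
POLARISATION_STATES = 4  # from the text
PHYSICAL_QUANTITIES = 5  # from the text

# One raw data set holds one image per wavelength and polarisation state.
raw_images = WAVELENGTHS * POLARISATION_STATES
raw_pixels = raw_images * PIXELS_PER_IMAGE

# After the RTE inversion only one map per physical quantity remains.
processed_pixels = PHYSICAL_QUANTITIES * PIXELS_PER_IMAGE
reduction_factor = raw_pixels / processed_pixels

print(f"raw images per data set: {raw_images}")
print(f"pixel-count reduction factor: {reduction_factor:.1f}")
```

Under these assumptions a raw data set holds 24 images against 5 output maps, a factor of 4.8 in pixel count before any compression or change in bit depth.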
2. The metadata system
Metadata are crucial to the success of SO/PHI. Because the processing happens on board, the most important source of information on the data reduction details is what we record during the process. As the data are intended for the wider scientific community, the metadata must describe the data sets in their entirety, including all information necessary for their scientific use.
There are four different sources of metadata: the planning process, the calibration campaign, the instrument, and the data processing system. In the planning process we define processing parameters, such as the data set identifiers for calibration data. We write these into the so-called processing environment, a file located on board. The processing environment is further extended during the calibration campaign with other calculated values. It is then written into the metadata of the data set at acquisition, together with the instrument parameters. During processing we list all operations performed, with their parameters and return values, followed by higher-level information on the reason for those steps, as well as a data set summary showing the current parameters of the data set. Additionally, at steps where we load calibration data, the hardware parameters at the recording of the two data sets are cross-checked to generate warnings for ground review. For pixel-wise information we use an additional image, treated as a bit mask, to encode the pixels that reached a NaN value at any point during processing, as well as other areas of interest. Before data download this mask is encoded into the temperature image, where no unrecoverable information is lost by doing so.
The processing pipeline can be executed with the parameters recorded in the metadata of the data sets, or with the current processing environment. Each step of the processing is also based on metadata. We check the basic parameters of the data set: how many images it contains, which area of the detector it comes from, whether it was binned, and how it is scaled. These values determine further actions, e.g. how we scale the data set to ensure accuracy in the upcoming operations, or whether calibration data must be cropped or binned. Because this information is carried in the metadata, no additional information must be passed from one step of the pipeline to the next, and the data set can be understood at any time independently of the pipeline.
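The metadata-driven step described above can be sketched as follows. This is a minimal illustration, not the flight implementation: the `Metadata` class, its field names, the `plan_calibration` function, and the `fsm_position` hardware key are all hypothetical. It shows a step that decides on cropping or binning from the data set summaries alone, cross-checks hardware parameters, and appends its own log entry.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical per-data-set metadata record (not the flight format)."""
    dataset_id: int
    summary: dict   # e.g. detector region and binning of the data set
    hardware: dict  # instrument parameters at acquisition
    log: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def plan_calibration(science: Metadata, calib: Metadata) -> list:
    """Decide how to adapt calibration data, using metadata only."""
    actions = []
    if calib.summary["region"] != science.summary["region"]:
        actions.append("crop")
    if calib.summary["binning"] != science.summary["binning"]:
        actions.append("bin")

    # Cross-check the hardware parameters of the two data sets and
    # record a warning for ground review on any difference.
    for key, value in science.hardware.items():
        if calib.hardware.get(key) != value:
            science.warnings.append(
                f"{key} mismatch with data set {calib.dataset_id}"
            )

    # Every step appends its own entry, building up the processing log.
    science.log.append({
        "step": "plan_calibration",
        "operand_id": calib.dataset_id,
        "return": actions,
    })
    return actions


# Illustrative values only.
science = Metadata(101, {"region": (0, 0, 1024, 1024), "binning": 1},
                   {"fsm_position": 1})
calib = Metadata(7, {"region": (0, 0, 2048, 2048), "binning": 1},
                 {"fsm_position": 2})

actions = plan_calibration(science, calib)
print(actions)           # ['crop']
print(science.warnings)  # ['fsm_position mismatch with data set 7']
```

Because the function reads only the two metadata records, nothing else has to be handed over from the previous pipeline step, which is the property the text relies on.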
After the data download we check whether there are any errors or warnings and, if so, at which step they occurred. During commissioning we may do "blind debugging": finding an error that occurred without direct access to the pipeline parameters at runtime. In such a case the metadata provide us with the information necessary to find the error source. During this period partial results will also be available, to reproduce any problem encountered in flight on a ground instrument model.
Once the data set is available to the science community, points of interest include the steps taken during the on-board processing and the accuracy of the data set (reconstructed from the logs). In addition, pixel-wise information is available from the masks, to ensure that potential new discoveries are not instrument artefacts.
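The pixel-wise masks mentioned above can be illustrated with a small sketch. The bit assignments and the `flag_nans` helper are hypothetical, since the text does not give the mask layout; the point is only that a flag, once set, survives later steps that overwrite the NaN in the image itself.

```python
import math

# Hypothetical bit assignments; the flight mask layout is not given in the text.
NAN_BIT = 0b01       # pixel reached NaN at some point during processing
INTEREST_BIT = 0b10  # pixel lies in another flagged area of interest


def flag_nans(image, mask):
    """Set NAN_BIT in mask wherever image currently holds a NaN."""
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if math.isnan(value):
                mask[r][c] |= NAN_BIT
    return mask


image = [[1.0, float("nan")],
         [0.5, 2.0]]
mask = [[0, 0],
        [INTEREST_BIT, 0]]

flag_nans(image, mask)

# A later step may overwrite the NaN, but the flag persists in the mask.
image[0][1] = 0.0
flag_nans(image, mask)

print(mask)  # [[0, 1], [2, 0]]
```

A user of the final data can then test `mask[r][c] & NAN_BIT` for any pixel before interpreting an unusual value as a physical signal.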
3. Example for blind debugging
During blind debugging we use the higher-level metadata. These contain an entry for each pipeline block, with name, time of execution, operation target, input parameters, data set dimensions, and return value. In addition, we also have lower-level metadata and data set summaries available.
An example of erroneous results is shown in Fig. 2, alongside the expected results for comparison. The metadata associated with the results are shown in Fig. 3. From this information it is possible to determine that there was a Feed Select Mechanism (FSM) mismatch between the processed image and the demodulation matrix, indicating that the two optical paths are not identical. It is also visible that the OperandID, referring to the ID of the demodulation matrix, is not the expected one; hence there was an operator error.
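The kind of review described in this section can be sketched as a scan over the higher-level log entries. The entry layout, the step names, the operand IDs, and the convention that a return value of 0 means success are all assumptions made for illustration; they do not reproduce the metadata shown in Fig. 3.

```python
# Hypothetical higher-level log entries; field names and values are
# illustrative and do not reproduce the flight metadata format.
log = [
    {"name": "dark_subtract", "time": 12.0, "target": 101, "operand_id": 3,
     "dims": (24, 2048, 2048), "return": 0, "warnings": []},
    {"name": "demodulate", "time": 15.5, "target": 101, "operand_id": 8,
     "dims": (24, 2048, 2048), "return": 0, "warnings": ["FSM mismatch"]},
]

# Operand IDs that the planning process intended per step (assumed values).
expected_operands = {"dark_subtract": 3, "demodulate": 7}


def find_suspects(log, expected_operands):
    """Collect warnings, failed steps, and unexpected operand IDs."""
    findings = []
    for entry in log:
        name = entry["name"]
        for warning in entry["warnings"]:
            findings.append((name, f"warning: {warning}"))
        # ASSUMPTION: a return value of 0 signals success.
        if entry["return"] != 0:
            findings.append((name, f"return value {entry['return']}"))
        expected = expected_operands.get(name)
        if expected is not None and entry["operand_id"] != expected:
            findings.append(
                (name, f"operand_id {entry['operand_id']}, expected {expected}")
            )
    return findings


findings = find_suspects(log, expected_operands)
for step, message in findings:
    print(f"{step}: {message}")
```

With these illustrative entries the scan reports both the FSM warning and the unexpected operand ID on the demodulation step, mirroring the two clues discussed in the text.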
4. Conclusions
Metadata are central to the success of SO/PHI. We have custom-designed our metadata collection system so that it not only contains all essential information for ground use, but is also the central source of information for the processing pipelines. An example of ground use in blind debugging is presented to illustrate how the recorded data give clues on what could have gone wrong during processing.
Acknowledgments.
Workframe: International Max Planck Research School (IMPRS) for Solar System Science. Solar Orbiter: ESA, NASA. Grant: DLR 50 OT 1201.
Figure 2. Top: Expected results. Bottom: Obtained, erroneous results. Data are from magnetohydrodynamics simulations (Riethmüller et al. 2017), prepared with our instrument simulator (Blanco Rodríguez et al. 2018). For testing purposes the RTE is inverted with HeLIx+ (Lagg et al. 2004).
Figure 3. Excerpt of recorded metadata. Top: From the expected results, indicating that the result contains NaNs, as expected. Bottom: From the erroneous results, with a warning regarding the Feed Select Mechanism and an incorrect OperandID.
References
Albert, K., Hirzberger, J., Busse, D., et al. 2018a, in Proc. SPIE, vol. 707, 10707
Albert, K., Hirzberger, J., Busse, D., et al. 2018b, in ASP Conference Series, Vol. 523
Blanco Rodríguez, J., del Toro Iniesta, J. C., Orozco Suárez, D., et al. 2018, ApJS, 237, 35
Lagg, A., Woch, J., Krupp, N., & Solanki, S. K. 2004, A&A, 414, 1109
Lange, T., Fiethe, B., Michel, H., et al. 2017, in NASA //