An open-source, end-to-end workflow for multidimensional photoemission spectroscopy
Rui Patrick Xian, Yves Acremann, Steinn Ymir Agustsson, Maciej Dendzik, Kevin Bühlmann, Davide Curcio, Dmytro Kutnyakhov, Federico Pressacco, Michael Heber, Shuo Dong, Tommaso Pincelli, Jure Demsar, Wilfried Wurth, Philip Hofmann, Martin Wolf, Markus Scheidgen, Laurenz Rettig, Ralph Ernstorfer
Fritz Haber Institute of the Max Planck Society, 14195 Berlin, Germany
Laboratory for Solid State Physics, ETH Zurich, 8093 Zurich, Switzerland
Department of Physics, University of Mainz, 55128 Mainz, Germany
Department of Physics and Astronomy, Interdisciplinary Nanoscience Center (iNANO), Aarhus University, 8000 Aarhus C, Denmark
DESY Photon Science, 22607 Hamburg, Germany
Department of Physics, University of Hamburg, 22761 Hamburg, Germany
Department of Physics, Humboldt University of Berlin, 12489 Berlin, Germany
† Deceased (Wilfried Wurth)
* Corresponding authors: [email protected], [email protected], [email protected]
arXiv: [physics.data-an], Aug

Characterization of the electronic band structure of solid state materials is routinely performed using photoemission spectroscopy. Recent advancements in short-wavelength light sources and electron detectors give rise to multidimensional photoemission spectroscopy, allowing parallel measurements of the electron spectral function simultaneously in energy, two momentum components and additional physical parameters with single-event detection capability. Efficient processing of the photoelectron event streams at a rate of up to tens of megabytes per second will enable rapid band mapping for materials characterization. We describe an open-source workflow that allows user interaction with billion-count single-electron events in photoemission band mapping experiments, compatible with beamlines at 3rd and 4th generation light sources and table-top laser-based setups.
The workflow offers an end-to-end recipe from distributed operations on single-event data to structured formats for downstream scientific tasks and storage to materials science database integration. Both the workflow and processed data can be archived for reuse, providing the infrastructure for documenting the provenance and lineage of photoemission data for future high-throughput experiments.

Introduction
Many disciplines in the natural sciences are increasingly dealing with densely sampled multidimensional datasets. The scientific workflows to obtain and process them are becoming increasingly complex due to the provenance and structure of the data and the information that needs to be extracted and analyzed [1, 2]. In materials science and condensed matter physics, various spectroscopic and structural characterization techniques produce experimental data of distinct formats and characteristics. Their creation and understanding require customized processing and analysis pipelines designed by specialists in the respective fields. The growing incentive for building experimental materials science databases [3] that complement established theoretical counterparts [4] calls for open-source and reusable workflows for data processing [5, 6] that transform raw data to shareable formats for downstream query, analysis and comparison by non-specialists of the experimental techniques [7, 8]. Among the various properties associated with materials, the electronic band structure (EBS) of condensed matter systems is of vital importance to the understanding of their electronic properties in and out of equilibrium. Multidimensional photoemission spectroscopy (MPES) [9, 10, 11] is an emerging technique that bears the potential of high-throughput EBS characterization through band mapping experiments and holds promise as an enabling technology for building experimental EBS databases, where data integration requires traceable knowledge of the processing steps between the archived and the raw format. Here we present an open-source workflow that focuses on band mapping data from MPES. In the following, we briefly introduce the technology of MPES and the associated data processing, before providing details on the workflow from raw data to database integration.
[Figure 1: workflow schematic with steps numbered (1)-(7). Recoverable labels: Data acquisition (sample, XUV/X-ray, visible light, photoelectrons, TOF drift tube, DLD, measurement controller); Data processing workflow (processing workstation, .h5 files, file reduction and conversion, .parquet, distributed binning in original grid, artifact correction, axis calibration, coordinate mappings, distributed grid transformation, distributed binning in transformed grid, workflow parameters file outputs); data storage, data visualization, downstream analysis.]
Figure 1:
Schematic of the workflow in MPES.
The data acquisition in MPES starts from (1) photoelectrons liberated by the extreme UV (XUV) or X-ray photons travelling through the lens systems and the TOF tube to trigger detection events on the delay-line detector (DLD). (2) Single-event data acquisition is monitored and controlled by the measurement controller computer. The raw data are first streamed and stored onto a hard drive in HDF5 format (.h5) and subsequently processed in the workflow through (3) file reduction (optional), (4, 6) distributed binning, (5) artifact correction and axis calibrations, carried out at the single-event or the binned data levels. At the end of the workflow, other data formats are generated (such as HDF5, MAT or TIFF) for (7) storage, visualization or downstream analysis for extracting relevant physical parameters. Critical parameters within the workflow may be exported (as workflow parameters files), shared and reused for processing other datasets.
MPES, also called momentum microscopy (MM), was born out of the recent integration of time-of-flight (TOF) electron spectrometers with delay-line detectors (DLDs) and improved electron-optic lens designs [12, 13, 14, 15]. Compared with the earlier generations of angle-resolved photoemission spectroscopy (ARPES) [16, 17] using hemispherical analyzers to measure the 2D energy-momentum distribution of the photoemitted electrons [18], MPES is capable of recording single-electron events simultaneously sorted into the $(k_x, k_y, E)$ coordinates ($E$: electron energy; $k_x$, $k_y$: parallel momentum components) in band mapping experiments, obviating the need for scanning across sample orientations and subsequent data merging as is the case for similar experiments using a single hemispherical analyzer. Operation of the TOF DLD in MPES requires a pulsed photon source and is directly compatible with 3rd and 4th generation light sources [19] as well as laboratory-based table-top setups [20, 21, 22, 23], harnessing their high repetition rates in the range of multi-kilohertz to megahertz to drastically improve the detection speed and efficiency. Mapping of the 3D band structure with sufficient signal-to-noise ratio (SNR) may be achieved on the timescale of minutes. The technological convergence opens up the possibility of recording 3D datasets in dependence of one or more additional parameters, such as spatial location, $I(x, y, k_x, k_y, E)$, probe photon energy, $I(k_x, k_y, E, k_z)$ [10], spin polarization, $I(k_x, k_y, E, S)$ [9], or pump-probe time in time-resolved MM, $I(k_x, k_y, E, t)$ [24], within a reasonable time frame.

From the data perspective, the pulsed sources with high repetition rates generate densely sampled data at rates of multiple megabytes per second (MB/s), which has brought about challenges in data processing and management compared with conventional ARPES experiments.
The raw data in MPES are single photoelectron events registered by the DLD, and the physical quantities related to the detected events are streamed in parallel to the storage files in a hierarchical file format (e.g. HDF5 [25]). A typical dataset involves 10 − detected events with a total size of up to a few hundred gigabytes (GB), depending on the number of coordinates measured (3D or 4D) and the required SNR. Unlike the large 2D or 3D image-based datasets, such as those obtained in various forms of optical [26, 27] and electron microscopy techniques [28, 29], processing and conversion of tabulated single-event data require additional steps of statistical computing for conversion into standard images. This motivates the current workflow development for efficient data processing and analysis.

In data processing and calibration, experiments performed at different facilities share similar procedures going from the raw events to the multidimensional hypervolume with calibrated axes, which is the basis for archiving and downstream analysis. To maintain reproducibility for the particular data source and structure, we have summarized the workflow (see Fig. 1) into two open-source software packages (hextof-processor [30] and mpes [31]), with similar design principles for coping with large-scale facility and table-top experiments, respectively. The core of our approach includes distributed statistical processing at the single-event level using parameters calibrated and determined from preprocessed volumetric datasets, which enables effective instrument diagnostics, artifact correction, and sample condition monitoring. The algorithms involved balance physical knowledge and existing methods in image processing and computer vision.
The workflow is illustrated next with data obtained at some of the electron momentum microscopes currently in operation, such as the HEXTOF (high energy X-ray time-of-flight) measurement system [24] at the free-electron laser source FLASH [32] at DESY, and the table-top high harmonic generation-based setup at the Fritz Haber Institute (FHI) [21] involving a commercial TOF and DLD (METIS 1000, SPECS GmbH). We expect the workflow described here to serve as a blueprint for upcoming software platforms in similar setups to be installed in other facilities or laboratories worldwide.

Results
Workflow description.
The workflow schematic shown in Fig. 1 starts with raw single-event data from measurements. The data are (i) binned in a distributed fashion in the measurement coordinates, including each photoelectron's position on the detector $(X, Y)$, its TOF, a digital encoder (ENC) axis, and others, if more than four dimensions are acquired in parallel. The binned histogram is (ii) used to estimate the numerical transforms for distortion correction and axis calibration. Next, these transforms are (iii) applied to the raw single-event data to convert the measurement coordinates to the physical axes, $(k_x, k_y, E, t_{pp})$ and others for higher dimensions (see also Fig. 2). Finally, the single-event data are (iv) binned in the transformed grid to yield 3D, 3D+t or other higher-dimensional data with the correct axis values. The outcome may be exported in different formats for storage, visualization and downstream analysis.
Tasks and software infrastructure.
Processing billion-count single-event data requires user interaction for data checking and distributed processing to reduce the time consumption. The general tasks in the workflow include the transformation of data streams to multidimensional histograms, artifact correction and axis calibration. These operations can be efficiently decomposed into column-wise operations on the distributed dataframe format offered by the dask package [33] in Python. While the use of dask dataframes provides the common foundation for interactivity with single events in hextof-processor and mpes, the two packages differ in the experimental requirements they address. At large-scale facilities, experiments often record a large number of machine parameters that need to be stored, though only a small number of relevant parameters are needed for downstream processing. Therefore, the hextof-processor package includes a parameter sampling step to retrieve intermediate tabulated data in the Apache Parquet format (https://parquet.apache.org/), a column-based data structure optimized for computational efficiency. This approach reduces the processing overhead of searching through the raw data files every time data are queried during the subsequent processing. As an open-source project, other beamtime-specific functionalities are added by users to the existing framework at every new experimental run. The mpes package adapts to the much simpler file structure produced at table-top experimental setups and makes direct use of the HDF5 raw data. It comes with added functionalities motivated by the existing issues encountered in data acquisition and downstream processing such as axis calibration, masking, alignment and different forms of artifact correction.
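The partition-wise binning described above can be sketched as follows. This is a minimal illustration in plain NumPy and not the implementation in hextof-processor or mpes: the list of chunks stands in for the partitions of a distributed dataframe, and the function name `bin_events`, the column names, the bin counts and the axis ranges are all assumptions made for the example.

```python
import numpy as np

def bin_events(chunks, columns, bins, ranges):
    """Accumulate one multidimensional histogram over chunks of single events.

    Each chunk is a dict of equally long 1D arrays (one per column), standing
    in for one partition of a distributed dataframe (assumption).
    """
    total = np.zeros(bins, dtype=np.int64)
    for chunk in chunks:
        sample = np.stack([chunk[c] for c in columns], axis=1)  # (n_events, n_dims)
        hist, _ = np.histogramdd(sample, bins=bins, range=ranges)
        total += hist.astype(np.int64)
    return total

# Synthetic single-event table (hypothetical axes and value ranges).
rng = np.random.default_rng(0)
n_events = 100_000
events = {
    "X": rng.uniform(0, 1024, n_events),
    "Y": rng.uniform(0, 1024, n_events),
    "TOF": rng.uniform(600, 700, n_events),
}

# Split the table into four partitions by taking every fourth event.
n_parts = 4
chunks = [{k: v[i::n_parts] for k, v in events.items()} for i in range(n_parts)]

columns = ("X", "Y", "TOF")
bins = (32, 32, 20)
ranges = ((0, 1024), (0, 1024), (600, 700))

hist_chunked = bin_events(chunks, columns, bins, ranges)
hist_single = bin_events([events], columns, bins, ranges)
```

Because histogram counts are additive, the per-partition results can be computed independently and summed in any order, which is what makes the binning step straightforward to distribute.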
Artifact correction.
Artifacts in MPES data come from mechanical imperfections, stray fields (electric and magnetic), uncertainties in the alignment of the sample, light beams and the multistage electron-optic lens systems as well as the data digitization process. Minimizing and correcting instrumental imperfections plays an important role in the validity of downstream analysis. We carry out artifact correction sequentially at the level of single photoelectron events or the data hypervolume obtained from multidimensional histogramming (see Fig. 2). The outcomes are illustrated using the correction of (1) the digitization artifact (see Fig. 3) and (2) the spherical timing aberration artifact (see Fig. 4), with technical details in Methods.
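Both corrections can be illustrated on synthetic data. The sketch below is a hedged example and not the packages' code: part (1) applies the jitter of Eq. (1) in Methods before rebinning, and part (2) evaluates the geometric delay of Eq. (2) and removes it with an even polynomial fit in the radius. The drift length (1000 mm), the normalization constant (500 ns), the peak position (664.4 ns), the noise level and all function names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- (1) Digitization artifact: histogram jittering -----------------------
def jitter(values, bin_width, rng):
    """Add uniform noise bin_width * U(0, 1) to digitized values (Eq. 1)."""
    return values + bin_width * rng.uniform(0.0, 1.0, size=values.shape)

def relative_spread(counts):
    """Standard deviation of the bin counts divided by their mean."""
    return counts.std() / counts.mean()

# Synthetic digitized axis with integer steps (original bin width w_b = 1).
digitized = rng.integers(0, 100, size=200_000).astype(float)

# Rebin on an incommensurate grid (width 1.5): bins alternately contain two
# or one digitized levels, which produces the picket-fence pattern.
edges = np.arange(0.0, 99.0 + 1.5, 1.5)
raw_counts, _ = np.histogram(digitized, bins=edges)
jit_counts, _ = np.histogram(jitter(digitized, 1.0, rng), bins=edges)
spread_raw = relative_spread(raw_counts)
spread_jittered = relative_spread(jit_counts)

# --- (2) Spherical timing aberration ---------------------------------------
def spherical_tof_shift(r, d, tof0):
    """Eq. (2): extra delay of an electron arriving at detector radius r."""
    return (np.sqrt(1.0 + (r / d) ** 2) - 1.0) * tof0

d_mm, tof0_ns = 1000.0, 500.0          # assumed values, for illustration only
r_mm = np.linspace(0.0, 50.0, 26)
peak_tof = 664.4 + spherical_tof_shift(r_mm, d_mm, tof0_ns)
peak_tof = peak_tof + rng.normal(0.0, 0.005, size=r_mm.shape)  # fit scatter

# Fit the radially averaged peak position with terms r^0, r^2 and r^4,
# then subtract only the radius-dependent part of the fit.
coeffs = np.polyfit(r_mm ** 2, peak_tof, deg=2)
fitted = np.polyval(coeffs, r_mm ** 2)
corrected = peak_tof - (fitted - np.polyval(coeffs, 0.0))

span_raw = peak_tof.max() - peak_tof.min()
span_corrected = corrected.max() - corrected.min()
```

In the jittering part, the noise spreads each digitized level uniformly over its original bin, so the rebinned counts no longer depend on how many levels happen to fall into each new bin. In the aberration part, fitting in powers of r squared keeps the correction even in r, matching the symmetry of the geometric delay.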
Axis calibration.
To transform the measurement axes of the DLD into physically relevant axes for electronic band mapping, calibrations are required, as shown in Fig. 2. The calibration functions are constructed with parameters derived from comparing physical knowledge of the materials (e.g. Brillouin zone size, Fermi level position) with the corresponding scales in data. They are applied either to the binned data hypervolume, or to the single-electron events.
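As one concrete case, the energy calibration detailed in Methods (a polynomial $E(\mathrm{TOF})$ whose coefficients follow from pairwise differences between bias-shifted measurements) can be sketched as below. The sketch rests on several assumptions that are not taken from the source: the function names, the synthetic coefficients, TOF values measured relative to a reference, the sign convention linking bias steps to energy shifts, and the use of a linear least-squares solve (`numpy.linalg.lstsq`), which is swapped in here because the system is linear in the coefficients once the TOF values are fixed.

```python
from itertools import combinations

import numpy as np

def fit_energy_calibration(tof, energy_shift, order=2):
    """Solve dT . a = dE for (a_1, ..., a_order) by linear least squares.

    Rows of dT hold pairwise differences of the polynomial terms TOF^i and
    dE holds the matching pairwise differences of the energy shifts.
    """
    powers = np.arange(1, order + 1)
    pairs = list(combinations(range(len(tof)), 2))
    d_t = np.array([tof[j] ** powers - tof[k] ** powers for j, k in pairs])
    d_e = np.array([energy_shift[j] - energy_shift[k] for j, k in pairs])
    coeffs, *_ = np.linalg.lstsq(d_t, d_e, rcond=None)
    return coeffs

def energy_axis(tof, coeffs, tof_ref, e_ref=0.0):
    """Evaluate E(TOF), fixing the offset a_0 so that E(tof_ref) = e_ref."""
    powers = np.arange(1, len(coeffs) + 1)

    def poly(t):
        t = np.asarray(t, dtype=float)
        return np.sum(coeffs * t[..., None] ** powers, axis=-1)

    a0 = e_ref - poly(tof_ref)
    return a0 + poly(tof)

# Synthetic ground truth (assumed): E = a1 * t + a2 * t^2, t in ns relative
# to a reference TOF, energies in eV.
a1_true, a2_true = -0.4, 0.005
shift_ev = np.arange(-2.5, 2.75, 0.5)   # energy shifts set by 0.5 V bias steps

# TOF at which the shifted feature appears (root of the quadratic near t = 0).
tof_rel = (-a1_true - np.sqrt(a1_true ** 2 + 4 * a2_true * shift_ev)) / (2 * a2_true)

coeffs = fit_energy_calibration(tof_rel, shift_ev, order=2)
energies = energy_axis(tof_rel, coeffs, tof_ref=0.0, e_ref=0.0)
```

Working with differences removes the unknown constant offset from the fit, which is why the offset is fixed separately against an energy reference such as the Fermi level.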
[Figure 2: schematic. Recoverable labels: measurement axis, corrections, U(0,1), calibration, transforms, physical axis, binning, data hypervolume, parameter distribution.]
Figure 2:
Examples of workflow components.
Illustrations are given for artifact correction and axis calibration. Characteristic 1D distributions of the measured X, Y, TOF, ENC and an arbitrary axis are shown on the very left. $U(0, 1)$ represents uniformly distributed random noise added to suppress digitization artifacts (jittering or dithering). The transforms ($g$'s) are calibration functions that convert the values in the measurement axes to the physical ones. The transform $L(X, Y)$ corrects the symmetry distortion, while the spherical timing aberration and space charge are compensated for by $\Delta\mathrm{TOF}_{\mathrm{sph}}$ and $\Delta\mathrm{TOF}_{\mathrm{sc}}$, respectively. Binning of the corrected single-event data over the calibrated physical axes yields a multidimensional hypervolume (right picture) of photoemission intensity data along with the physical axes values.
[Figure 3: panels a-d.]
Figure 3:
Digitization artifact correction by histogram jittering.
Removal of the digitization artifact is illustrated with a 2D k-E slice across the Brillouin zone center of the band mapping dataset measured at FHI on WSe$_2$. The images before and after histogram jittering and their difference are shown in a, b and c, respectively. A zoomed-in section of the data is shown in the insets in a-c. The effective removal of the digitization artifact is further demonstrated in the momentum-integrated energy distribution curves in d. The traces in d are computed by averaging horizontally over their corresponding 2D images in a-c.
[Figure 4: panels a-d. Recoverable labels: TOF tube, DLD, source, d, Δd, r; Intensity (a.u.), W4f, Center, Edge; x (a.u.), y (a.u.); r (a.u.), peak TOF (ns), Raw data, Polynomial fit, Corrected data.]
Figure 4:
Spherical timing aberration correction.
The correction is demonstrated using W4f core-level data measured at FLASH. The energy spacing between the W4f$_{5/2}$ and W4f$_{7/2}$ levels is about 2.1 eV [34]. a. Illustration of the geometric origin of the spherical timing aberration in the time-of-flight (TOF) drift tube. b. Comparison of the W4f spectra at the center and on the edge of the detector plane. The energy spectra are extracted from the corresponding regions, marked by the dots in the same blue and red colors, respectively, in c. The white stripes crossing at the detector center block the exposed edges of the four detector quadrants. d. The uncorrected and corrected radial-averaged peak TOF positions for the W4f core level.
Data storage and format.
The simplest form of the output data hypervolume derived from single-electron events includes non-negative scalar values of the photoemission intensity and the calibrated real-valued axis coordinates, including $k_x$, $k_y$, $E$, and other parameter dependencies such as the pump-probe time delay $t_{pp}$. These values are exported as HDF5, MAT or TIFF, with the metadata included as attributes of the files.
Workflow archiving and reuse.
Computational workflows are valued for their reproducibility [35]. Archiving and sharing the workflow parameters among users of the beamlines or facilities allow comparison between experimental runs and reuse for the simultaneous benefits of machine diagnostics and experimental efficiency. To achieve this, we store critical parameters generated within the workflow in a separate file as workflow parameters (see Fig. 1) during each step, including the numerical values used in binning, the intermediate parameters and coefficients of the correction and calibration functions, etc. They can be loaded and reused when processing other datasets.
Data visualization.
The adaptation of established scientific visualization methods in the physical sciences [36, 37] to band mapping data should incorporate the requirements and knowledge of the data characteristics in this field of research. The band mapping data in 3D (multi-megavoxel) and 3D+t (multi-gigavoxel) include the inherent symmetries from the electronic band structure of the material, but the intensity modulations in the photoemission process [38], dynamics and sample condition disrupt the original symmetry. The overall goal is to emphasize the features of interest while exploiting the symmetry to simplify the visualization (see Methods). The output files from the processing pipeline are compatible with open-source visualization software such as matplotlib [39], ParaView [36] and Blender [40].
[Figure 5: panels a-c; axis labels E, k_x, k_y.]
Figure 5:
Typical visual representations of the volumetric band mapping data.
The examples are illustrated using band mapping data of the layered semiconductor WSe$_2$, measured with the HEXTOF instrument at FLASH and the METIS detector at the FHI (see Methods). The visualizations are a. the orthoslice representation, b. the band-path diagram (right) with the momentum path marked by the dashed blue line in the momentum $k_x$-$k_y$ plane (left), and c. the cut-out view. All color scales represent photoemission intensity. The letters label the high symmetry points of the hexagonal Brillouin zone of WSe$_2$ [41].
Downstream analysis integration.
Typical photoemission data analysis involves extracting electronic band structure parameters, physical coupling constants and lifetimes via fitting of lineshapes [16] or dynamical models [42], often carried out specific to the material under study. At the end of our distributed workflow, the data size is on the order of a few to tens of gigabytes, which can be directly loaded into memory on users' local machines for downstream data analysis with custom routines.
Experimental metadata.
The metadata of the data files have a tree structure and contain information on the experimental setting, parameters of the pulsed light source, the detector and the sample under study. A list of top-level metadata parameters is presented in Table 1. A full and current list of all metadata parameters, including the top-level parameters and their constituent lower-level parameters, along with their definitions, units and values, is provided in Supplementary Tables 1-4. For database integration, an accompanying data parser (parser-mpes, see Code Availability) for MPES data has been written in accordance with existing standards [43] for computational materials science in NOMAD [8], featuring an electronic version of the metadata parameter list in the file mpes.nomadmetainfo.json online. The metadata parameter list and the data parser are versioned and are updated based on the corresponding changes in the data structure for photoemission spectroscopy experiments. The existing WSe$_2$ photoemission data have been integrated into the experimental section of the materials science database NOMAD (see Data Availability).

Table 1: Top-level metadata parameters
Category name        Description
General parameters   Descriptive information of the experiment and facility
Source parameters    Technical parameters relating to the photon source
Detector parameters  Technical parameters relating to the photoelectron detector
Sample parameters    Parameters relating to the material sample in experiment
Discussion
We have designed and implemented an open-source, end-to-end workflow for processing single-event data produced in multidimensional photoemission spectroscopy, linking to downstream tasks and providing guidelines and software for integrating processed data into the NOMAD experimental materials science database. The distributed processing takes full advantage of the single-event data streams directly accessible from the TOF delay-line detector for event-wise correction and calibration and converts the raw events to the calibrated data hypervolume for project-specific downstream analysis. The functionalities within the workflow are publicly accessible through the software packages we have developed (hextof-processor [30] and mpes [31]). The processing workflow is archived at each step of operation and the processed data may be integrated into experimental databases with user-specified metadata. The methods described here are applicable to all existing types of multidimensional photoemission band mapping measurements beyond the static and time-dependent settings described here.

Our end-to-end workflow from raw data to processed data to database integration provides a fast-track and all-in-one solution to the demands for open experimental data and reproducible research in the materials science community [7, 8]. The public repositories for the software packages are the foundations for phased future extension and integration with existing analytical tools in the photoemission spectroscopy community. The modular structure of the packages introduced here allows targeted upgrades by both temporary and dedicated maintainers and users. Casting the workflow in the Python programming environment provides the foundation for convenient incorporation of existing image processing and machine-learning resources [44] for further exploration and understanding of the band mapping datasets, which contain rich information owing to the complex nature of the photoemission process [16, 18].
This is especially beneficial for broader adoption of photoemission, since the interpretation of photoemission data is often linked to the observed or extracted outstanding features such as local intensity extrema, dispersion kinks and satellites, lineshape parameters and pattern symmetry [16]. Access to experimental data and the potential integration with existing electronic structure-related software [5, 45, 46, 47] will therefore facilitate method developments and the direct comparison between experimental results and theoretical band structure calculations within the same programming platform.
Methods
Sample preparation.
Single-crystalline samples of 2D bulk WSe$_2$ were purchased from HQ Graphene. Crystals of size around 5 mm × ×
Photoemission experiments.
The measurements were conducted using the HEXTOF instrument [24] at the DESY FLASH PG-2 beamline [48] with the free-electron laser (FEL) as well as a laboratory source [21] with a METIS electron momentum microscope (SPECS METIS 1000) installed at the FHI. In all measurements at FLASH, the FEL was tuned to 36.5 eV (or 34.0 nm) and 109 eV; the optical pump pulse had a center wavelength of 775 nm. The measurements at the FHI used a 21.7 eV home-built extreme ultraviolet source based on high harmonic generation and driven by an optical parametric chirped-pulse amplifier operating at 500 kHz repetition rate [49].
Digitization artifact.
The time-to-digital converter (TDC) outputs digitized data according to the binning width of the on-board electronics. Data conversion from one digitized format to another in a rebinning process often creates a picket fence-like effect (see Fig. 3). This phenomenon originates from the incommensurate bin sizes in the two rounds of sampling (binning and rebinning). To solve the problem, one introduces a slight amount of uniformly distributed noise, with an amplitude equal to half of the original bin size, to the single-event values when carrying out the bin counts. This is similar to the histogram jittering (or dithering) technique [50, 51] used in statistical visualization and computer graphics. Mathematically, the uniformly distributed noise $U(0, 1)$ bounded in the range $[0, 1]$ is added before binning a univariate data stream, $S = \{S_i\}$, via

$$S_i' = S_i + w_b \times U(0, 1). \qquad (1)$$

Here, $w_b$ is the bin width. For binning of multivariate data streams, such as the detector X position (or $k_x$), Y position (or $k_y$), and the photoelectron TOF (or $E$), we adopt the same approach individually for each dimension. The effect of jittering in reducing the digitization artifact is demonstrated in Fig. 3.
Spherical timing aberration.
Electrons entering the TOF tube at different lateral positions travel through different path lengths to reach the detector, which is the origin of the spherical timing aberration as illustrated in Fig. 4. The lateral position-dependent time delay may be expressed as

$$\Delta\mathrm{TOF}_{\mathrm{sph}}(r) = \left(\sqrt{1 + r^2/d^2} - 1\right)\mathrm{TOF}_0, \qquad (2)$$

where $r$ is the radial distance from the center of the DLD, $d$ is the length of the field-free region and $\mathrm{TOF}_0$ is the TOF normalization constant. For a typical field-free region length of d ∼ and r = 50 mm, ΔTOF/TOF ≈ . × − . Assuming TOF = 0 . µs, the spherical timing aberration in TOF scale is $\Delta\mathrm{TOF}_{\mathrm{sph}}$ ≈ .62 ns, which is larger than the DLD's temporal resolution of ∼ f core-level data presented in Fig. 4b. For every $(X, Y)$ position on the detector, the W4f core-level peak was fitted with a Voigt profile and the peak positions are shown in Fig. 4c. As the spectra from deep core levels typically do not show dispersion, the deviation from fitting corresponds to the spherical timing aberration of the electron optics. In order to compensate for the spherical timing aberration, we first transform the data from Cartesian to polar coordinates (see Fig. 4c), and then fit the radial-averaged peak position to a polynomial function of the radius,

$$\Delta\mathrm{TOF}_{\mathrm{sph}}(r) = \frac{r^2\,\mathrm{TOF}_0}{2d^2} - \frac{r^4\,\mathrm{TOF}_0}{8d^4} + O(r^6). \qquad (3)$$

The fitting results together with the corrected radial distribution are presented in Fig. 4d.
Symmetry distortion.
Photoemission patterns in the $(k_x, k_y)$ plane (i.e. an energy slice) may exhibit distorted symmetry due to the influence of various factors from the instrument, the sample and the experimental geometry on the trajectory of low-energy photoelectrons. Correcting the symmetry distortion while preserving the intensity features requires the use of symmetry-related landmarks to solve for the symmetrization coordinate transform in the framework of nonrigid image registration [52]. In typical situations with an excellent electron lens alignment, the energy dependence of the momentum distortion within the focused phase space volume covering an energy range of several eV is negligible, so the same coordinate transform can be applied to all energy slices in the volumetric data (including both valence and conduction bands) or simultaneously to all single events.
Other single-experiment artifacts.
(1) Momentum center shift: The momentum center of the emergent photoelectrons travelling through the electron-optic system may experience an energy-dependent shift owing to slight misalignment in the system or the influence of stray fields. Correction of the center shift requires an energy-dependent center alignment of energy slices. The shift along the energy (or TOF) axis may be estimated using phase correlation [53] or mutual information-based [54] sequential image registration methods, in which the series of energy slices is treated as an image sequence. In a well-shielded and well-aligned electron-optic lens system, the momentum center shift is generally negligible in the focused photoelectron energy range.
(2) Space-charge effect (SCE): The secondary photoelectron clouds originating from the probe and pump pulses cause a "doming effect" of the photoemission intensity distribution around the momentum center of the band structure. This is especially visible in systems with a clear Fermi edge [9, 11] or non-dispersing shallow core levels. These features may be used as references for calibrating the parameters of the flattening transform, which applies a momentum-dependent shift $\Delta\mathrm{TOF}_{\mathrm{sc}}(k_x, k_y)$ in the TOF (or the calibrated energy) coordinate of the single-event data.
Momentum calibration.
The scaling factors for momentum calibration are computed by comparing the positions of known high symmetry points in the band structure with their corresponding locations in an energy slice. Suppose A and B are two high symmetry points identifiable (e.g. as local extrema) from the experimental data with pixel positions $(X_A, Y_A)$ and $(X_B, Y_B)$, and momentum positions $(k_x^A, k_y^A)$ and $(k_x^B, k_y^B)$, respectively. We calculate the pixel-to-momentum scaling ratios, $f_X$ and $f_Y$, along the X (column) and Y (row) directions of a 2D k-space image, respectively. Then, the momentum coordinate $(k_x, k_y)$ at each pixel position $(X, Y)$ may be derived:

$$f_D = (k_d^A - k_d^B)/(D_A - D_B), \qquad (4)$$

$$k_d = f_D \times (D - D_A), \qquad (D, d) = (X, x) \text{ or } (Y, y). \qquad (5)$$
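The two-point momentum calibration can be sketched as follows. The pixel positions, momentum values and function names are hypothetical. One deviation is made deliberately and is an assumption of this sketch: the momentum of point A is added as an offset, which reduces to Eq. (5) when A sits at zero momentum (e.g. the zone center). The two points must differ in both X and Y for the ratios in Eq. (4) to be defined.

```python
import numpy as np

def momentum_scaling(pix_a, pix_b, k_a, k_b):
    """Eq. (4): pixel-to-momentum ratios (f_X, f_Y) from two landmarks."""
    pix_a, pix_b, k_a, k_b = (
        np.asarray(v, dtype=float) for v in (pix_a, pix_b, k_a, k_b)
    )
    return (k_a - k_b) / (pix_a - pix_b)

def pixel_to_momentum(pix, pix_a, k_a, scaling):
    """Eq. (5) plus the offset k_a (assumption; zero if A is the zone center)."""
    return k_a + scaling * (np.asarray(pix, dtype=float) - pix_a)

# Hypothetical landmarks: A at the image center with zero momentum, B at a
# known momentum position, both given as (X, Y) pixels and (k_x, k_y).
pix_a, k_a = (256.0, 256.0), (0.0, 0.0)
pix_b, k_b = (356.0, 406.0), (0.8, 1.2)

f_x, f_y = momentum_scaling(pix_a, pix_b, k_a, k_b)

# Momentum coordinates for every pixel of a 512 x 512 energy slice
# (X indexes columns, Y indexes rows).
X, Y = np.meshgrid(np.arange(512), np.arange(512))
kx = pixel_to_momentum(X, pix_a[0], k_a[0], f_x)
ky = pixel_to_momentum(Y, pix_a[1], k_a[1], f_y)
```

The same two functions can be applied to the X and Y columns of the single-event table instead of a pixel grid, since the transform acts on each coordinate independently.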
Energy calibration.
The calibration requires a set of band mapping data measured at different bias voltages (applied between the material sample and the ground), usually sampled with a spacing of 0.5 V in a range of ± E is approximated as a polynomial function,

$$E(\mathrm{TOF}) = \sum_{i=0}^{n} a_i\,\mathrm{TOF}^i. \qquad (6)$$

The approximation is sufficiently accurate within a range of ∼20 eV, sufficient to cover the entire valence band and some low-lying conduction bands of typical materials. The polynomial coefficients are determined using nonlinear least squares by solving $\Delta T \cdot a = \Delta E$, in which $a = (a_1, a_2, \ldots)^T$ is the coefficient vector, while the constant offset $a_0$ is determined by manual alignment to an energy reference, such as the Fermi level or valence band maximum. The vector $\Delta E$ and the matrix $\Delta T$ contain, respectively, the pairwise differences of the bias voltages and the polynomial terms of differential TOF values. To calibrate a large energy range including multiple core levels, a piecewise polynomial may be used [11].
Pump-probe delay calibration.
The time origin (’time zero’) in time-resolved photoemis-sion spectroscopy, i.e. the temporal overlap of pump and probe pulses, is determined by fittingof a characteristic trace extracted from the data. Since the readings of the digital encoder (seeFig. 2) are sampled linearly, equally-spaced pump-probe delays are directly convertible fromthe readings using linear interpolation, given the boundary values of the translation stagepositions and the corresponding delay times. For unequally-spaced delays, a delay marker isfirst added to each data point as a separate column after data acquisition to group togetherthe encoder reading ranges that correspond to the specific time delays. The data binning iscarried out over the delay marker column instead of the equally-sampled encoder readings.
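Both cases can be sketched in a few lines. The encoder values, delay bounds, range edges and function names below are hypothetical, and subtracting a fitted time zero to place the origin is an assumption of this sketch rather than a documented step.

```python
import numpy as np

def encoder_to_delay(enc, enc_bounds, delay_bounds, time_zero=0.0):
    """Linear map from encoder readings to pump-probe delay.

    enc_bounds and delay_bounds are the (start, end) encoder readings and
    the delay times they correspond to; time_zero shifts the origin.
    """
    enc = np.asarray(enc, dtype=float)
    (e0, e1), (t0, t1) = enc_bounds, delay_bounds
    return t0 + (enc - e0) * (t1 - t0) / (e1 - e0) - time_zero

def delay_markers(enc, range_edges):
    """Index of the encoder range each event falls into, or -1 if outside."""
    enc = np.asarray(enc, dtype=float)
    edges = np.asarray(range_edges, dtype=float)
    idx = np.digitize(enc, edges) - 1
    idx[(idx < 0) | (idx >= len(edges) - 1)] = -1
    return idx

# Equally spaced case: encoder readings 1000..2000 span delays -1..3 ps.
delays = encoder_to_delay([1000, 1500, 2000], (1000, 2000), (-1.0, 3.0))

# Unequally spaced case: three encoder ranges of different widths; the
# marker column is then used as the binning axis instead of the encoder.
markers = delay_markers([5, 12, 29, 30, 99, 100, -1], [0, 10, 30, 100])
events_per_delay = np.bincount(markers[markers >= 0], minlength=3)
```

Binning over the marker column gives one histogram slice per delay point regardless of how wide the corresponding encoder range is.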
Visualization strategies.
We discuss here three methods for the display of volumetric band mapping data, which are, at the same time, the basis for visualizing 3D+t data with time as an animated axis. (1) The orthoslice representation includes orthogonal 2D planes selected in specific regions of the 3D volume [36], which highlights specific slices deep within the data that are less visible in a volumetrically rendered view (see Fig. 5a). Along this line, we have developed a software package, 4Dview [56], to explore 4D data using simultaneously linked orthoslices, which also features contrast adjustment and data integration within a hypervolume of interest. (2) The band-path plot (see Fig. 5b) is a 2D representation of the 3D band mapping volume generated by combining a series of 2D cuts along selected momentum paths (or k-paths) traversing a list of so-called high-symmetry points [57, 58]. This representation captures the largest dispersion within the band structure. For volumetric data, the same path may be sampled over the full energy range to produce the plot shown in Fig. 5b. The analysis and visualization modules in the mpes package include functionalities to compose customized band-path plots. (3) The cut-out view (see Fig. 5c) exposes a specific part of interest in the volumetric data, while not losing the rest [36]. The analysis module in the mpes package provides ways to generate precise cut-outs using position landmarks (e.g. high-symmetry points labelled in Fig. 5) and inequalities.
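The band-path plot and the cut-out view can be illustrated with a short NumPy sketch. It assumes a volume indexed as `volume[kx, ky, E]`, waypoints given as pixel positions of high-symmetry points, and nearest-neighbour sampling along straight segments; the function names `sample_band_path` and `cut_out_quadrant` are hypothetical and do not correspond to the mpes package interface.

```python
import numpy as np


def sample_band_path(volume, waypoints, n_per_segment=50):
    """Sample volume[kx, ky, E] along straight segments joining the waypoints.

    Returns a 2D array of shape (n_path_points, n_energy).
    """
    waypoints = np.asarray(waypoints, dtype=float)
    points = []
    for start, stop in zip(waypoints[:-1], waypoints[1:]):
        # endpoint=False avoids duplicating the waypoint shared by two segments
        frac = np.linspace(0.0, 1.0, n_per_segment, endpoint=False)[:, None]
        points.append(start + frac * (stop - start))
    points.append(waypoints[-1:])  # close the path at the last waypoint
    points = np.concatenate(points)
    # Nearest-neighbour lookup, clipped to the volume boundaries
    ix = np.clip(np.rint(points[:, 0]).astype(int), 0, volume.shape[0] - 1)
    iy = np.clip(np.rint(points[:, 1]).astype(int), 0, volume.shape[1] - 1)
    return volume[ix, iy, :]


def cut_out_quadrant(volume, landmark):
    """Mask the region kx >= x0 and ky >= y0 (two inequalities) with NaN."""
    x0, y0 = landmark
    ix, iy = np.meshgrid(
        np.arange(volume.shape[0]), np.arange(volume.shape[1]), indexing="ij"
    )
    mask = (ix >= x0) & (iy >= y0)
    out = volume.astype(float)  # astype returns a copy; input stays intact
    out[mask] = np.nan  # the 2D mask applies to every energy slice
    return out


# Small synthetic volume and a hypothetical three-point path
volume = np.arange(4 * 4 * 3).reshape(4, 4, 3)
band_path = sample_band_path(volume, [(0, 0), (3, 0), (3, 3)], n_per_segment=3)
cut = cut_out_quadrant(volume, landmark=(2, 2))
```

Nearest-neighbour lookup keeps the sketch dependency-free; an interpolating sampler would give smoother cuts on coarse grids. Masked voxels are set to NaN so that a renderer can treat them as transparent while the rest of the volume is preserved.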
Acknowledgements
We thank G. Schönhense for support on the photoelectron detector, and S. Grunewald, S. Schülke and G. Schnapka for support on the computing infrastructures. We thank G. Brenner, H. Redlin and S. Dziarzhytski at FLASH, DESY, for beamline support. The work was partially supported by BiGmax, the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant No. ERC-2015-CoG-682843), and the German Research Foundation (DFG) through the Emmy Noether program under grant number RE 3977/1. D. Kutnyakhov, M. Heber and W. Wurth acknowledge funding by the DFG within the framework of the SFB 925 (project B2). F. Pressacco acknowledges funding from the excellence cluster EXC 1074 “The Hamburg Centre for Ultrafast Imaging - Structure, Dynamics and Control of Matter at the Atomic Scale” of the DFG. S. Y. Agustsson and J. Demsar acknowledge the financial support by the DFG in the framework of the Collaborative Research Centre SFB TRR 173 “Spin +X”. D. Curcio and P. Hofmann acknowledge funding from VILLUM FONDEN via the Centre of Excellence for Dirac Materials (Grant No. 11744).
Author contributions
Y.A., K.B., S.Y.A., D.C., R.P.X. and M.D. wrote the hextof-processor package. R.P.X. and L.R. wrote the mpes package. D.K., Y.A., F.P., R.P.X., S.Y.A., D.C., M.D., M.H., S.D., P.H., L.R., R.E. and W.W. participated in the experiments at the FLASH PG-2 beamline using the HEXTOF instrument in Hamburg. S.D. and L.R. conducted the experiment at the Fritz Haber Institute using the METIS electron momentum microscope. R.P.X., M.D., T.P., L.R., R.E., M.S. T.P. constructed the metadata format; R.P.X. and M.S. implemented it in parser-mpes. R.P.X. wrote the initial manuscript with contributions from M.D. and Y.A. All authors contributed to discussions to bring the manuscript to its final form.
Data availability
The single-event photoemission data used for demonstrating the workflow are available on the Zenodo platform at 10.5281/zenodo.2704787 and 10.5281/zenodo.3987303. The preprocessed data are being integrated into the NOMAD database in the domain for experimental materials science data, accessible at https://nomad-lab.eu/prod/rae/gui/search?domain=ems.
Code availability
The code for the data transformation in the workflow is available on GitHub, under the names hextof-processor (https://github.com/momentoscope/hextof-processor) and mpes (https://github.com/mpes-kit/mpes). The parser for integrating preprocessed experimental data into the NOMAD database is parser-mpes (https://gitlab.mpcdf.mpg.de/rpx/parser-mpes).
Conflict of interests
The authors declare no conflict of interests in the content of the article.
References
[1] Pruneau, C. Data Analysis Techniques for Physical Scientists (Cambridge University Press, 2017).
[2] Deelman, E. et al. The future of scientific workflows. The International Journal of High Performance Computing Applications, 159–175 (2018). URL http://journals.sagepub.com/doi/10.1177/1094342017704893.
[3] Zakutayev, A. et al. An open experimental database for exploring inorganic materials. Scientific Data, 180053 (2018).
[4] Himanen, L., Geurts, A., Foster, A. S. & Rinke, P. Data-Driven Materials Science: Status, Challenges, and Perspectives. Advanced Science. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/advs.201900808.
[5] Pizzi, G., Togo, A. & Kozinsky, B. Provenance, workflows, and crystallographic tools in materials science: AiiDA, spglib, and seekpath. MRS Bulletin, 696–702 (2018).
[6] Perkel, J. M. Workflow systems turn raw data into scientific knowledge. Nature, 149–150 (2019).
[7] Hill, J. et al. Materials science with large-scale data and informatics: Unlocking new opportunities. MRS Bulletin, 399–409 (2016).
[8] Draxl, C. & Scheffler, M. NOMAD: The FAIR concept for big data-driven materials science. MRS Bulletin, 676–682 (2018).
[9] Schönhense, G., Medjanik, K. & Elmers, H.-J. Space-, time- and spin-resolved photoemission. Journal of Electron Spectroscopy and Related Phenomena, 94–118 (2015). URL http://linkinghub.elsevier.com/retrieve/pii/S0368204815001243.
[10] Medjanik, K. et al. Direct 3D mapping of the Fermi surface and Fermi velocity. Nature Materials, 615–621 (2017).
[11] Schönhense, B. et al. Multidimensional photoemission spectroscopy—the space-charge limit. New Journal of Physics, 033004 (2018). URL http://stacks.iop.org/1367-2630/20/i=3/a=033004?key=crossref.d5618dec9ebafc233e6eff1b7ce89ee0.
[12] Krömker, B. et al. Development of a momentum microscope for time resolved band structure imaging. Review of Scientific Instruments, 053702 (2008). URL http://aip.scitation.org/doi/10.1063/1.2918133.
[13] Ovsyannikov, R. et al. Principles and operation of a new type of electron spectrometer – ArTOF. Journal of Electron Spectroscopy and Related Phenomena, 92–103 (2013). URL https://linkinghub.elsevier.com/retrieve/pii/S0368204813001357.
[14] Damm, A. et al. Application of a time-of-flight spectrometer with delay-line detector for time- and angle-resolved two-photon photoemission. Journal of Electron Spectroscopy and Related Phenomena, 74–80 (2015). URL https://linkinghub.elsevier.com/retrieve/pii/S0368204815000663.
[15] Tusche, C., Krasyuk, A. & Kirschner, J. Spin resolved bandstructure imaging with a high resolution momentum microscope. Ultramicroscopy, 520–529 (2015). URL https://linkinghub.elsevier.com/retrieve/pii/S0304399115000698.
[16] Damascelli, A., Hussain, Z. & Shen, Z.-X. Angle-resolved photoemission studies of the cuprate superconductors. Reviews of Modern Physics, 473–541 (2003). URL https://link.aps.org/doi/10.1103/RevModPhys.75.473.
[17] Yang, H. et al. Visualizing electronic structures of quantum materials by angle-resolved photoemission spectroscopy. Nature Reviews Materials, 341–353 (2018).
[18] Suga, S. & Sekiyama, A. Photoelectron Spectroscopy: Bulk and Surface Electronic Structures (Springer, 2014). URL https://link.springer.com/content/pdf/10.1007%2F978-3-642-37530-9.pdf.
[19] Couprie, M. New generation of light sources: Present and future. Journal of Electron Spectroscopy and Related Phenomena, 3–13 (2014). URL https://linkinghub.elsevier.com/retrieve/pii/S0368204813002429.
[20] Chiang, C.-T. et al. Boosting laboratory photoelectron spectroscopy by megahertz high-order harmonics. New Journal of Physics, 013035 (2015). URL http://stacks.iop.org/1367-2630/17/i=1/a=013035?key=crossref.6ebe926eef9ebdacee9ddaabac19036d.
[21] Puppin, M. et al. Time- and angle-resolved photoemission spectroscopy of solids in the extreme ultraviolet at 500 kHz repetition rate. Review of Scientific Instruments, 023104 (2019). URL http://aip.scitation.org/doi/10.1063/1.5081938.
[22] Corder, C. et al. Ultrafast extreme ultraviolet photoemission without space charge. Structural Dynamics, 054301 (2018). URL http://aca.scitation.org/doi/10.1063/1.5045578.
[23] Buss, J. H. et al. A setup for extreme-ultraviolet ultrafast angle-resolved photoelectron spectroscopy at 50-kHz repetition rate. Review of Scientific Instruments, 023105 (2019). URL http://aip.scitation.org/doi/10.1063/1.5079677.
[24] Kutnyakhov, D. et al. Time- and momentum-resolved photoemission studies using time-of-flight momentum microscopy at a free-electron laser. Review of Scientific Instruments, 013109 (2020). URL http://aip.scitation.org/doi/10.1063/1.5118777.
[25] Folk, M., Heber, G., Koziol, Q., Pourmal, E. & Robinson, D. An overview of the HDF5 technology suite and its applications. In Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases - AD ’11, 36–47 (ACM Press, New York, New York, USA, 2011). URL http://portal.acm.org/citation.cfm?doid=1966895.1966900.
[26] Weiler, N. C., Collman, F., Vogelstein, J. T., Burns, R. & Smith, S. J. Synaptic molecular imaging in spared and deprived columns of mouse barrel cortex with array tomography. Scientific Data, 140046 (2014).
[27] Ker, D. F. E. et al. Phase contrast time-lapse microscopy datasets with automated and manual cell tracking annotations. Scientific Data, 180237 (2018).
[28] Levin, B. D. et al. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy. Scientific Data, 160041 (2016).
[29] Aversa, R., Modarres, M. H., Cozzini, S., Ciancio, R. & Chiusole, A. The first annotated set of scanning electron microscopy images for nanoscience. Scientific Data, 180172 (2018).
[30] Acremann, Y. et al. hextof-processor. URL https://github.com/momentoscope/hextof-processor.
[31] Xian, R. P. & Rettig, L. mpes. URL https://github.com/mpes-kit/mpes.
[32] Ackermann, W. et al. Operation of a free-electron laser from the extreme ultraviolet to the water window. Nature Photonics, 336–342 (2007).
[33] Dask Development Team. Dask: Library for dynamic task scheduling (2016). URL https://dask.org.
[34] Shallenberger, J. R. 2D tungsten diselenide analyzed by XPS. Surface Science Spectra, 014001 (2018). URL http://avs.scitation.org/doi/10.1116/1.5016189.
[35] Stodden, V. et al. Enhancing reproducibility for computational methods. Science, 1240–1241 (2016).
[36] Hansen, C. D. & Johnson, C. R. (eds.) The Visualization Handbook (Elsevier Butterworth-Heinemann, 2005).
[37] Lipşa, D. R. et al. Visualization for the Physical Sciences. Computer Graphics Forum, 2317–2347 (2012). URL http://doi.wiley.com/10.1111/j.1467-8659.2012.03184.x.
[38] Moser, S. An experimentalist’s guide to the matrix element in angle resolved photoemission. Journal of Electron Spectroscopy and Related Phenomena, 29–52 (2017). URL https://linkinghub.elsevier.com/retrieve/pii/S0368204816301724.
[39] Hunter, J. D. Matplotlib: A 2d graphics environment. Computing in Science & Engineering, 90–95 (2007).
[40] Community, B. O. Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam (2018).
[41] Riley, J. M. et al. Direct observation of spin-polarized bulk bands in an inversion-symmetric semiconductor. Nature Physics, 835–839 (2014).
[42] Weinelt, M. Time-resolved two-photon photoemission from metal surfaces. Journal of Physics: Condensed Matter, R1099–R1141 (2002). URL http://stacks.iop.org/0953-8984/14/i=43/a=202?key=crossref.95d2a41303a5d98bdb7544d34eeba966.
[43] Ghiringhelli, L. M. et al. Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats. npj Computational Materials, 46 (2017).
[44] Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature, 547–555 (2018).
[45] Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 314–319 (2013). URL https://linkinghub.elsevier.com/retrieve/pii/S0927025612006295.
[46] Hjorth Larsen, A. et al. The atomic simulation environment—a Python library for working with atoms. Journal of Physics: Condensed Matter, 273002 (2017). URL http://stacks.iop.org/0953-8984/29/i=27/a=273002?key=crossref.20f9751653d872507bf6c0cc5737032c.
[47] Ganose, A. M., Jackson, A. J. & Scanlon, D. O. sumo: Command-line tools for plotting and analysis of periodic ab initio calculations. Journal of Open Source Software, 717 (2018). URL http://joss.theoj.org/papers/10.21105/joss.00717.
[48] Gerasimova, N., Dziarzhytski, S. & Feldhaus, J. The monochromator beamline at FLASH: performance, capabilities and upgrade plans. Journal of Modern Optics, 1480–1485 (2011).
[49] Puppin, M. et al. 500 kHz OPCPA delivering tunable sub-20 fs pulses with 15 W average power based on an all-ytterbium laser. Optics Express, 1491 (2015).
[50] Chambers, M., Cleveland, S., Tukey, A. & Kleiner, B. Graphical Methods for Data Analysis (Wadsworth International Group, 1983).
[51] Novo, D. & Wood, J. Flow cytometry histograms: Transformations, resolution, and display. Cytometry Part A, 685–692 (2008). URL http://doi.wiley.com/10.1002/cyto.a.20592.
[52] Xian, R. P., Rettig, L. & Ernstorfer, R. Symmetry-guided nonrigid registration: The case for distortion correction in multidimensional photoemission spectroscopy. Ultramicroscopy, 133–139 (2019). URL https://linkinghub.elsevier.com/retrieve/pii/S0304399118303474.
[53] Guizar-Sicairos, M., Thurman, S. T. & Fienup, J. R. Efficient subpixel image registration algorithms. Optics Letters, 156 (2008).
[54] Viola, P. & Wells, W. M. Alignment by Maximisation of Mutual Information. International Journal of Computer Vision, 137–154 (1997). URL http://link.springer.com/10.1023/A:1007958904918.
[55] Salvador, S. & Chan, P. Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 561–580 (2007).
[56] Dendzik, M. 4Dview (2019). URL https://zenodo.org/record/3360817.
[57] Setyawan, W. & Curtarolo, S. High-throughput electronic band structure calculations: Challenges and tools. Computational Materials Science, 299–312 (2010). URL http://linkinghub.elsevier.com/retrieve/pii/S0927025610002697.
[58] Hinuma, Y., Pizzi, G., Kumagai, Y., Oba, F. & Tanaka, I. Band structure diagram paths based on crystallography. Computational Materials Science, 140–184 (2017). URL https://linkinghub.elsevier.com/retrieve/pii/S0927025616305110.