[PDF] A machine learning route between band mapping and band structure

Abstract

Full PDF

AA machine learning route between bandmapping and band structure

R. Patrick Xian, , ∗ Vincent Stimper, , Marios Zacharias, Shuo Dong, Maciej Dendzik, Samuel Beaulieu, Bernhard Sch ¨olkopf, Martin Wolf, Laurenz Rettig, Christian Carbogno, Stefan Bauer, ∗ and Ralph Ernstorfer ∗ Fritz Haber Institute of the Max Planck Society, 14195 Berlin, Germany. Department of Empirical Inference, Max Planck Institute for Intelligent Systems,72076 T¨ubingen, Germany. These authors contributed equally to this work. ∗ Correspondence authors: [email protected],[email protected], [email protected].

The electronic band structure (BS) of solid state materials imprints the multidimensionaland multi-valued functional relations between energy and momenta of periodically con-ﬁned electrons [1]. Photoemission spectroscopy is a powerful tool for its comprehensivecharacterization. A common task in photoemission band mapping is to recover the un-derlying quasiparticle dispersion [2], which we call band structure reconstruction. Tra-ditional methods often focus on speciﬁc regions of interests [3, 4] yet require extensivehuman oversight. To cope with the growing size and scale of photoemission data [5, 6],we develop a generic machine-learning approach leveraging the information within elec-tronic structure calculations for this task. We demonstrate its capability by reconstruct-ing all fourteen valence bands of tungsten diselenide and validate the accuracy on vari-ous synthetic data. The reconstruction uncovers previously inaccessible momentum-spacestructural information on both global and local scales in conjunction with theory, whilerealizing a path towards integrating band mapping data into materials science databases. a r X i v : . [ phy s i c s . d a t a - a n ] M a y he modelling and characterization of the electronic band structure of materials play an es-sential role in materials design [7] and device simulation [8]. The BS lives in the momen-tum space, Ω( k x , k y , k z , E ) , spanned by momentum ( k x , k y , k z ) and energy ( E ) coordinates ofthe electrons. Photoemission band mapping [2] (see Fig. 1a) using momentum- and energy-resolved photoemission spectroscopy (PES), including angle-resolved PES (ARPES) [9] andmultidimensional PES [5, 6] measures the BS as an intensity-valued multivariate probabilitydistribution directly in Ω . The proliferation of band mapping datasets and their public availabil-ity brought about by recent hardware upgrades [5, 6, 10, 11] have ushered in the possibilitiesof comprehensive benchmarking between theories and experiments. Interpreting the photoe-mission spectra often requires least-squares ﬁtting of the 1D energy distribution curves (EDCs)to the single-particle spectral function [9] at selected momentum locations through heuristicsderived from physical knowledge of the materials and the experimental settings. Although,in principle, such a physics-informed data model guarantees the highest accuracy and inter-pretability, upscaling the pointwise ﬁtting (or estimation) to large, densely sampled regions inthe momentum space (e.g. including or more momentum locations) presents challengesdue to limited numerical stability, nor does it form a constructive mathematical framework forprogressive reﬁnement. The situation becomes especially challenging for multidimensional andmultiband materials with complex band dispersions [12, 13, 14].To navigate and comprehend band mapping data globally, we propose an analytical frame-work (see Fig. 1b) for reconstructing the photoemission (or quasiparticle) BS as a set of energy(or electronic) bands, formed by energy values (i.e. band loci) connected along momentumcoordinates. Because the local maxima of photoemission intensities are not always good indi-cators of band loci, we exploit the connection between theory and experiment in our framework,based on a probabilistic machine learning [15, 16] model to approximate the intensity data fromband mapping experiments. The gist of the model is rooted in the Bayes rule, p ( Θ |D ) ∝ p ( D| Θ ) p ( Θ ) , (1)where the model parameters Θ = { θ i } and the data D are mapped directly onto unknowns andexperimental observables. We assign the energy values of the photoemission BS as the modelparameters to extract from data, and a nearest-neighbor (NN) Gaussian distribution as prior, p ( Θ ) , to describe the proximity of energy values at nearby momenta. The EDC at every mo-mentum grid point relates to the likelihood, p ( D| Θ ) , when we interpret the photoemission in-2ensity probabilistically. The optimal parameters are obtained via maximum a posteriori (MAP)estimation, or maximizing the posterior, p ( Θ |D ) , in statistical inference [15] (see Methods andSupplementary Figs. 2-3). The posterior in the current setting forms a Markov random ﬁeld(MRF) [15, 17], which encapsulates the energy band continuity assumption and the measuredintensities distribution of photoemission in a probabilistic graphical model. For one beneﬁt,the probabilistic formulation can incorporate imperfect physical knowledge algebraically in themodel or numerically as initialization (i.e. warm start, see Methods) of the MAP estimation,without requiring de facto ground truth and training as in supervised machine learning [18].For another, the graphical model representation allows convenient optimization and extensionto other dimensions (see Supplementary Fig. 1 and section S1).To demonstrate the effectiveness of the algorithm, we have reconstructed the entire 3D dis-persion surfaces, E ( k x , k y ) , of all 14 valence bands within the projected ﬁrst Brillouin zone,in ( k x , k y , E ) coordinates, of the semiconductor tungsten diselenide (WSe ), spanning ∼ ∼ − along each momentum direction. We adapt informatics tools to BSdata to sample and compare the reconstructed and theoretical BSs globally. The accuracy ofthe reconstruction is validated using synthetic data and the extracted local structural parametersin comparison with pointwise ﬁtting. The available data and BS informatics enable detailedcomparison of energy dispersion at a resolution of < − . Valence band mapping.

The 2D layered semiconductor WSe with 2H interlayer stacking(2H-WSe ) is a model system for band mapping experiments [20, 21, 12]. Its valence BS con-tains 14 strongly dispersive energy bands, formed by a mixture of the d and s orbitals of theW atoms and the p orbitals of the Se atoms in its hexagonal unit cell. The strong spin-orbitcoupling due to heavy elements produces large momentum- and spin-dependent energy splittingand modiﬁcations to the BS [12, 22]. The photoemission band mapping experiment capturesthe photoelectrons directly in their 3D coordinates, ( k x , k y , E ) , by a commercial electron mo-mentum microscope (METIS 1000, SPECS GmbH, see Methods) [5, 6]. Earlier valence bandmapping and reconstruction in ARPES experiments on WSe have demonstrated a high degreeof similarity between theory and experiments [20, 21, 12], but a quantitative assessment withinthe entire (projected) Brillouin zone is still lacking. Effects of sample degradation has also beenreported [21] during the course of long-duration angular scanning in ARPES measurements.3igure 1: From band mapping to band structure. a , Schematic of a photoemission band map-ping experiment. The electrons from a crystalline sample’s surface are liberated by extreme UV(XUV) or X-ray pulses and collected by a detector through either angular scanning or time-of-ﬂight detection schemes. b , Overview of the data-analytical framework for reconstructionof the photoemission (or quasiparticle) band structure: (1) The volumetric data obtained froma band mapping experiment ﬁrst (2) go through preprocessing steps, then are (3) fed into theprobabilistic machine learning algorithm along with electronic structure calculations as initial-ization of the optimization. The reconstruction algorithm for volumetric band mapping data isrepresented as a 2D probabilistic graphical model with the band energies as parameters at eachnode and tens of thousands of nodes in practice. (4) The outcome of the reconstruction is post-processed (e.g. symmetrization) to (5) yield the dispersion surfaces (i.e. energy bands) of thephotoemission band structure ordered by band indices. c - f , Effects of the intensity transformsin preprocessing viewed in both 3D and along high-symmetry lines of the projected Brillouinzone (see b (1)), from the original data ( c ) through intensity symmetrization ( d ), contrast en-hancement [19] ( e ) and Gaussian smoothing of intensities ( f ). The intensity data in c - f arenormalized individually for visual comparison.4ith our high-repetition-rate photon source [11] and the fast electronics of the momentummicroscope, band mapping of WSe achieves sufﬁcient signal-to-noise ratio for valence bandreconstruction within only tens of minutes of data acquisition, without the need for angularscanning and subsequent reconstruction from momentum-space slices. Band structure reconstruction and digitization.

We use a 2D MRF to model the loci ofan energy band within the intensity-valued 3D band mapping data, regarded as a collection ofmomentum-ordered EDCs. It is graphically represented by a rectangular grid overlaid on themomentum axes with the indices ( i, j ) ( i , j are nonnegative integers), as shown in step (3)of Fig. 1b. The undetermined band energy of the EDC at ( i, j ) , with the associated momen-tum coordinates ( k x,i , k y,j ) , is considered a random variable (or model parameter), ˜ E i,j , of theMRF. Together, the probabilistic model is characterized by a joint distribution, expressed asthe product of the likelihood and the Gaussian prior, in Eq. (1). To maintain its simplicity, wedon’t explicitly account for the intensity modulations of various origins (such as imbalancedtransition matrix elements [23]) in the original band mapping data, which cannot be remedi-ated by upgrading the photon source or detector. Instead, we preprocess the data to minimizetheir effects on the reconstruction (see Fig. 1c-f). The preprocessing steps include (1) inten-sity symmetrization, (2) contrast enhancement [19], followed by (3) Gaussian smoothing (seeMethods), whereafter the continuity of band-like features is restored. The EDCs from the pre-processed data, ˜ I , are used effectively as the likelihood to calculate the MRF joint distribution, p ( { ˜ E i,j } ) = 1 Z (cid:89) ij ˜ I ( k x,i , k y,j , ˜ E i,j ) · (cid:89) ( i,j )( l,m ) | NN exp (cid:34) − ( ˜ E i,j − ˜ E l,m ) η (cid:35) . (2)Here, Z is a normalization constant, η is a hyperparameter deﬁning the width of the Gaussianprior, (cid:81) ij denotes the product over all discrete momentum values sampled in the experimentand (cid:81) ( i,j )( l,m ) | NN the product over all the NN terms. Detailed derivation of Eq. (2) is givenin Supplementary Information section S1. Reconstruction of the bands in the photoemissionBS is carried out sequentially and relies on local optimization of the MRF model parameters, { ˜ E i,j } , and operates efﬁciently on scalable hardware (see Methods). We further inject relevantphysical knowledge to correctly resolve band crossings and nearly degenerate energies by usingdensity functional theory (DFT) band structure calculation with semi-local approximation [24]as a starting point for the reconstruction. The calculation qualitatively entails such physical5ymmetry information for WSe , albeit not quantitatively reproducing the experimental quasi-particle BSs [24] at all momentum coordinates. As shown with four DFT calculations withdifferent exchange-correlation functionals [24] to initiate the reconstruction for WSe and invarious cases using synthetic data with known ground truth (see Methods, Supplementary Table1 and Supplementary Figs. 4-8), the reconstruction algorithm is not particularly sensitive to theinitialization as long as the information about band crossings is present.The reconstructed 14 valence bands of WSe initialized by LDA-level DFT are shown inFig. 2b-d and Supplementary Videos. To globally compare the computed and reconstructedbands at a consistent resolution, we expand the BS in shape descriptors, such as orthonormalpolynomial bases [25], which forms a uniﬁed representation of BS data unbiased by the un-derlying electronic detail. The geometric featurization of band dispersion allows multiscalesampling and comparison using the coefﬁcient (or feature) vectors [26]. Although the choiceof basis is not unique, Zernike polynomials (ZPs) are used to decompose the 3D dispersionsurfaces (see Fig. 3 and Methods) because of their existing adaptations to various boundaryconditions [27]. In Fig. 3a-b, the band dispersions show generally decreasing dependence (seenfrom the magnitude of coefﬁcients) on basis terms with increasing complexities (see Fig. 3a),and the majority of dispersion is encoded into a subset of the terms (see Fig. 3b). The examplein Fig. 3b and more numerical evidence in Supplementary Fig. 10 illustrate the approximationcapability of the hexagonal ZPs. Concisely, the coefﬁcient amplitudes and their distributionfunction as geometric ﬁngerprints of the energy bands to describe their dispersions on a globalscale. We can then use similarity or distance metrics (see Methods) for their classiﬁcation andcomparison [26], an approach inspired by content-based image retrieval from databases [28].For example, in Fig. 3c, the positive cosine similarity conﬁrms the strong shape (or dispersion)resemblance of the 7 pairs of spin-split energy bands in the reconstructed BS of WSe , while thelow negative values, such as those between bands 1-2 and 13-14, reﬂect the opposite directionsof their respective dispersion (see Fig. 2d). These observations are consistent with the outcomeobtained from DFT calculations (see Supplementary Fig. 9).Moreover, we introduce a BS distance metric (see Methods), invariant to the global energyshift frequently used to adjust the energy zero, to quantify the differences in band dispersionand the relative spacing between bands. The distance is calculated using the geometric fea-ture vectors to bypass interpolation errors while reconciling the coordinate spacing difference6igure 2: Reconstructed band structure from WSe photoemission data. a , Comparisonbetween the preprocessed WSe valence band photoemission data along Γ -M direction, DFTband structure calculated with different exchange-correlation functionals (solid red lines), andtheir ﬁnal positions after band-wise rigid-shift alignment (dashed yellow lines) as part of hy-perparameter tuning. The energy zero of each DFT calculation is set at the K point (see alsoSupplementary Fig. 4). b , Exploded view (with enlarged spacing between bands for better visi-bility) of reconstructed energy bands of WSe . c , Overlay of reconstructed band dispersion (redlines) on preprocessed photoemission band mapping data cut along the high-symmetry lines inthe hexagonal Brillouin zone of WSe . The residual intensities on the low-energy end are fromcontrast-ampliﬁed background signals unrelated to the band structure. d , Band-wise compari-son between LDA-level DFT (LDA-DFT) calculation used to initialize the optimization and thereconstructed 14 valence bands of WSe (symmetrized in postprocessing). The boundaries ofthe ﬁrst Brillouin zone are traced out by the dashed hexagons. The band indices on the upperright corners in d follow the ordering of the electronic orbitals in this material obtained fromLDA-DFT. The color scale of band energies in b and d are normalized within each band toimprove visibility. 7igure 3: Digitization and comparison of WSe band structures. a , Decomposition of the14 energy bands of WSe into hexagonal Zernike polynomials (ZPs) with selected major termsdisplayed on the left. The zero spatial frequency term in the decomposition is subtracted foreach band. The counts of large ( > − by absolute value) coefﬁcients of all 14 bands areaccumulated at the bottom row of the decomposition to illustrate their distribution, which de-crease in value towards higher-order terms. b , Approximation of the shape (or dispersion) of thefourth energy band using different numbers of hexagonal ZPs. Both of the average and relativeapproximation errors (see Methods) shown on the far right drop as more terms are included,while summation in the coefﬁcient-ranked order (used to generate the intermediate results in b ) achieves faster convergence than in the default polynomial order. c , Cosine similarity ma-trix for pairwise comparison of the reconstructed band dispersion in Fig. 2. The band indicesfollow those in Fig. 2d. d , Two-part similarity matrix showing band structure distances (inthe upper triangle) and their corresponding standard errors (in the lower triangle) between thecomputed and reconstructed band structures of WSe . The abbreviation “LDA recon.” denotesreconstruction with LDA-level DFT band structure as the initialization.8etween reconstructed and theoretical BSs, essential for differentiating BS data from heteroge-neous sources in databases. The results in Fig. 3d refer to the valence BS of WSe discussedin this work, where the distances and their spread (i.e. standard errors) are displayed in theupper and lower triangles, respectively. A high degree of consistency exists among the recon-structions (pairwise distance no larger than 60 ± GW [29] or that including electronic self-energiesrenormalized by electron-phonon coupling [30], when semi-local approximation yields not onlyquantitatively, but also qualitatively wrong quasiparticle BSs compared with experiment. How-ever, a systematic benchmark of theory and experiment goes beyond the scope of this work. Asshown in Fig. 3d and Supplementary Fig. 4, the learning algorithm can bridge the epistemicgap between theories to obtain a consistent reconstruction.Besides providing global structural information, the reconstruction improves the robustnessof traditional region-of-interest lineshape ﬁtting in extended regions of the momentum space(see Supplementary Fig. 11), when used as initial guess. Pointwise ﬁtting in turn acts as re-ﬁnement of local details not explicitly included in the probabilistic reconstruction model, whichprioritizes efﬁciency. A compendium of local parameters are retrieved using this approach (seeSupplementary Table 2). We obtain the trigonal warping parameter of the ﬁrst two valencebands around K-point, 5.8 eV · ˚A and 3.9 eV · ˚A , respectively, conﬁrming the magnitude differ-ence between these spin-split bands predicted by theory [22]. Fitting around M (cid:48) (and M) revealsthat the gap opened by spin-orbit interaction extends beyond the saddle point in the dispersionsurface with the minimum gap at 338 meV, markedly larger than DFT results. Overall, thereconstruction yields local structural information consistent with the more laborious pointwiseﬁtting. Finally, our approach reduces the data size by over 5000 times from 3D band mappingdata to geometric features vectors (see Methods), facilitating database integration. Conclusion and outlook.

We have formulated band structure reconstruction ubiquitous in pho-toemission band mapping in an inference perspective and described an efﬁcient reconstructionprocedure by combining probabilistic machine learning with the physical knowledge embedded9n electronic structure calculations, as demonstrated for the energy-dispersive, multiband mate-rial of WSe . The reconstruction reveals global and local structural information challenging toaccess by pointwise ﬁtting. Our approaches lend valuable insights to automating data analysis inmaterials characterization and future upgrade into an end-to-end framework balancing physicalconstraints and computational efﬁciency to achieve desired accuracy. The reconstruction out-come should assist interpretation of deep-lying bands, parametrizing multiband Hamiltonianmodels [31], simulation of realistic devices [8], and complement theoretical data in materi-als science databases. Implementation of the reconstruction across multiple materials and tohigher-dimensional data [5], including temperature, photon energy, dynamical time delay andspin as resolved quantities, will generate comprehensive knowledge about the (non)equilibriumelectronic structure of materials to benchmark theories. Likewise, the method is transferable toextract the band structure of other quasiparticles (e.g. phonons [32] or polaritons) in periodicsystems, given the availability of these alternative band mapping data. The multidisciplinarymethodology provides an example for building next-generation high-throughput materials char-acterization toolkits combining learning algorithms with physical knowledge [33] to arrive at acomprehensive understanding of materials properties unattainable before. Methods

Band mapping measurements.

Photoemission band mapping of WSe using multidimen-sional photoemission spectroscopy were conducted using a laser-driven, high harmonic generation-based extreme UV light source [11] operating at 21.7 eV and 500 kHz and a METIS 1000(SPECS GmbH) momentum microscope featuring a delay-line detector coupled to a time-of-ﬂight drift tube [6, 34]. Single crystal samples of WSe ( > samples were attached to the Cu substrate by conductive epoxyresin (EPO-TEK H20E). The samples were cleaved by cleaving pins attached to the samplesurface upon transfer into the measurement chamber, which operates at an ambient pressureof − mbar during photoemission experiments. No effect of surface termination has beenobserved in the measured WSe photoemission spectra, similar to previous experimental ob-servations [20, 12]. For the valence band mapping experiments, the energy focal plane of thephotoelectrons within the time-of-ﬂight drift tube was set close to the top valence band.10 ata processing and reconstruction. The raw data, in the form of single-electron eventsrecorded by the delay-line detector, were preprocessed using home-developed software pack-ages [35]. The events were ﬁrst binned to the ( k x , k y , E ) grid with a size of 256 × ×

470 tocover the full valence band range in WSe within the projected Brillouin zone, which amountsto a pixel size of ∼ A − along the momentum axes and ∼

18 meV along the energyaxis. The bin sizes are within the limits of the momentum resolution ( < A − ) and energyresolution ( <

15 meV) of the photoelectron spectrometer [36].Data binning is carried out in conjunction with the necessary lens distortion correction [37]and calibrations as described in [35]. The outcome provides a sufﬁcient level of granularityin the momentum space to resolve the ﬁne features in band dispersion while achieving highersignal-to-noise ratio than using single-event data directly. Afterwards, we applied intensitysymmetrization to the data along the sixfold rotation symmetry and mirror symmetry axes [12]of the photoemission intensity pattern in the ( k x , k y ) coordinates, followed by contrast enhance-ment using the multidimensional extension of the contrast limited adaptive histogram equaliza-tion (MCLAHE) algorithm, where the intensities in the image are transformed by a look-uptable built from the normalized cumulative distribution function of local image patches [19].Finally, we applied Gaussian smoothing to the data along the k x , k y and E axes with a standarddeviation of 0.8, 0.8 and 1 pixels (or about 0.012 ˚ A − , 0.012 ˚ A − , and 18 meV), respectively.After data preprocessing, we sequentially reconstructed every energy band of WSe fromthe photoemission data using the maximum a posteriori (MAP) approach described in the maintext. The reconstruction requires tuning of three hyperparameters: (1) the momentum scalingand (2) the rigid energy shift to coarse-align the computed energy band, e.g. from densityfunctional theory (DFT), to the photoemission data, and (3) the width of the nearest-neighborGaussian prior ( η in Eq. (2)). The hyperparameter tuning is also carried out individually foreach band to adapt to their speciﬁc environment. An example of hyperparameter tuning is givenin Supplementary Fig. 3. The MAP reconstruction method involves optimization of the modelparameters (i.e. band energy random variables), { ˜ E i,j } to maximize the posterior probability p = p ( { ˜ E i,j } ) or to minimize the negative log-probability loss function, L := − log p , obtainedfrom Eq. (2) as is used in our actual implementation. L ( { ˜ E i,j } ) = − (cid:88) i,j log I ( k x,i , k y,j , ˜ E i,j ) + (cid:88) ( i,j ) , ( l,m ) | NN ( ˜ E i,j − ˜ E l,m ) η + const . (3)11e implemented the optimization using a parallelized version of the iterated conditional mode(ICM) [38] method in Tensorﬂow [39] in order to run on multicore computing clusters andGPUs. The parallelization involves a checkerboard coloring scheme (or coding method) ofthe graph nodes [40] and subsequent hierarchical grouping of colored nodes, which allowsalternating updates on different subgraphs (i.e. subsets of the nodes) of the Markov randomﬁeld during optimization. Typically, the optimization process in the reconstruction of one bandconverges within and therefore is terminated after 100 epochs, which takes ∼ , straightforward DFT calculations with semi-local approximation (which in itselfinvolves explicit optimizations such as geometric optimization of the crystal structures) aresufﬁcient, but our approach is not limited to DFT. Therefore, the use of ”warm start” in ourapplication is conceptually well-aligned with the origin of the term.To validate the MAP reconstruction algorithm in a variety of scenarios, we used syntheticphotoemission data where the nominal ground-truth band structures are available. The bandstructures are constructed using analytic functions, model Hamiltonians or DFT calculations.The initializations are generated by tuning the numerical parameters used to generate the ground-truth band structures. The procedures and results are presented in section S2 of the Supplemen-tary Information. In simple cases, such as single or well-isolated bands, the reconstruction12ields a close solution to the ground truth even with a ﬂat band initialization. In the moregeneral multiband scenario with congested bands and band crossings (or anti-crossings), anapproximate dispersion (or shape) of the band and the crossing information is required in theinitialization (i.e. warm start) in order to converge to a realistic solution. We further tested therobustness of the initializations by (1) scaling the energies of the ground truth and by (2) us-ing DFT calculations with different exchange-correlation (XC) functionals, in order to capturesufﬁcient variability of available band structure calculations in the real world. We quantify thevariations in the initializations and the performance of the reconstruction using the average error(Eq. (8), or Fig. 3b), calculated with respect to the ground truth. Among the different numericalexperiments, we ﬁnd that the optimization converges consistently to a set of bands that bettermatches the experimental data than the initialization. This is manifest in that the average errorsof the initializations are reduced to a similar level in the corresponding reconstruction outcomes,a trend seen over all bands regardless of their dispersion. In the synthetic data with an energyspacing of ∼

18 meV, the average error in the reconstruction is on the order of 40-50 meV foreach band, which amounts to an average inaccuracy of < fuller [42] for broader applications. Band structure calculations.

Electronic band structures were calculated within (generalized)DFT using the local density approximation (LDA) [43, 44], the generalized-gradient approxi-mation (GGA-PBE) [45] and GGA-PBEsol [46]), and the hybrid XC functional HSE06 [47],which incorporates a fraction of the exact exchange. All calculations were performed withthe all-electron, full-potential numeric-atomic orbital code, FHI-aims [48]. They were con-ducted for the geometries obtained by fully relaxing the atomic structure with the respectiveXC-functional to keep the electronic and atomic structures consistent. Spin-orbit coupling wasincluded in a perturbational fashion [49]. The momentum grid used for the calculation wasequally sampled with a spacing of 0.012 ˚ A − in both k x and k y directions that covers the irre-ducible part of the ﬁrst Brillouin zone at k z = 0.35 ˚ A − , estimated using the inner potential ofWSE from a previous measurement [12]. The calculated band structure is symmetrized to ﬁllthe entire hexagonal Brillouin zone to be used to initialize the band structure reconstruction and13ynthetic data generation. We note here that for the MAP reconstruction, the momentum gridsize used in theoretical calculations (such as DFT at various levels used here) need not be iden-tical to that of the data (or instrument resolution) and in those cases an appropriate upsampling(or downsampling) should be applied to the calculation to match their momentum resolution.Further details are presented in section S3 of the Supplementary Information. Band structure informatics.

The shape feature space representation of each electronic band isderived from the decomposition, E b ( k ) = (cid:88) l a l φ l ( k ) = a · Φ (4)Here, k = ( k x , k y ) represents the momentum coordinate, E b ( k ) is the single-band dispersionrelation (e.g. dispersion surface in 3D), a l and φ l ( k ) are the coefﬁcient and its associated basisterm, respectively. They are grouped separately into the feature vector, a = ( a , a , ... ) , andthe basis vector, Φ = ( φ , φ , ... ) . The orthonormality of the basis is guaranteed within theprojected Brillouin zone (PBZ) of the material. (cid:90) k ∈ Ω PBZ φ m ( k ) φ n ( k ) d k = δ mn (5)For the hexagonal PBZ of WSe , the basis terms are hexagonal Zernike polynomials (ZPs) con-structed using a linear combination of the circular ZPs via Gram-Schmidt orthonormalizationwithin a regular (i.e. equilateral and equiangular) hexagon [27]. A similar method can be usedto generate ZP-derived orthonormal basis adapted to other boundary conditions [27]. The repre-sentation in feature space [26] provides a way to quantify the difference (or distance) d betweenenergy bands or band structures at different resolutions or scales without additional interpola-tion. To quantify the shape similarity between energy bands E b and E b (cid:48) , we calculate the cosinesimilarity using the feature vectors, d cos ( E b , E b (cid:48) ) = a · a (cid:48) | a | · | a (cid:48) | , (6)The cosine similarity is bounded within [ − , , with a value of 0 describing orthogonality ofthe feature vectors and a value of 1 and -1 describing parallel and anti-parallel relations betweenthem, respectively, both indicating high similarity. The use of cosine similarity in feature spaceallows comparison of dispersion while being unaffected by their magnitudes. In comparing the14ispersion between single energy bands using Eq. (6), the ﬁrst term in the polynomial expan-sion, or the hexagonal equivalent of the Zernike piston [50], is discarded as it only represents aconstant energy offset (with zero spatial frequency) instead of dispersion, which is characterizedby a combination of ﬁnite and nonzero spatial frequencies.The electronic band structure is a collection of energy bands E B = { E b i } ( i = 1 , , ... ) .To quantify the distance between two band structures, E B = { E b ,i } and E B = { E b ,i } ,containing the same number of energy bands while ignoring their global energy difference, weﬁrst subtract the energy grand mean (i.e. mean of the energy means of all bands within theregion of the band structure for comparison). Then, we compute the Euclidean distance, or the (cid:96) -norm, for the i th pair of bands, d b,i . d b,i ( E b ,i , E b ,i ) = (cid:107) ˜ a ,i − ˜ a ,i (cid:107) = (cid:115)(cid:88) l (˜ a ,il − ˜ a ,il ) . (7)Here, ˜ a denotes the feature vector after subtracting the energy grand mean so that any globalenergy shift is removed. We deﬁne the band structure distance as the average distance over all N b pairs of bands, or d B ( E B , E B ) = (cid:80) N b i d b,i ( E b ,i , E b ,i ) /N b . The values of d B ( E B , E B ) are shown in the upper triangle of Fig. 3d and their corresponding standard errors (over the 14valence bands of WSe ) in the lower triangle. The distance in Eq. (7) is independent of basisand allows energy bands calculated on different resolutions or from different materials with thesame symmetry (e.g. differing only by Brillouin zone size) to be compared.We use same-resolution error metrics to evaluate the approximation quality of the expansionbasis and to quantify the reconstruction outcome with a known ground-truth band structure.Speciﬁcally, we deﬁne the average approximation error (with energy unit), η avg , for each energyband using the energy difference at every momentum location, η avg ( E approx , E recon ) = (cid:115) N k (cid:88) k ∈ Ω PBZ ( E approx , k − E recon , k ) , (8)where N k is the number of momentum grid points and the summation runs over the projectedBrillouin zone. In addition, we construct the relative approximation error, η rel , following thedeﬁnition of the normwise error [51] in matrix computation, η rel ( E approx , E recon ) = (cid:107) E approx − E recon (cid:107) (cid:107) E recon (cid:107) . (9)15q. (8)-(9) are used to compute the curves in Fig. 3b as a function of the number of basis termsincluded in the approximation. The relevant code for the representation using hexagonal ZPsand the computation of the metrics is also accessible in the public repository fuller [42]. Data reduction.

The raw data and intermediate results are stored in the HDF5 format [35].The ﬁle sizes quoted here for reference are calculated from storage as double-precision ﬂoatsor integers (for indices). The photoemission band mapping data of WSe (256 × ×

470 bins)have a size of about 235 MB (240646 kB) after binning from single-event data (7.8 GB or8176788 kB). The reconstructed valence bands at the same resolution occupy about 3 MB (3352kB) in storage, and the size further decreases to 46 kB when we store the shape feature vectorassociated with each band. If only the top-100 coefﬁcient (ranked by the absolute values of theiramplitudes) and their indices in the feature vectors are stored, the data amounts to 24 kB. Forthe case of WSe , the top-100 coefﬁcients can approximate the band dispersion with a relativeerror (see Eq. (9)) of < . for every energy band, as shown in Supplementary Fig. 10. Acknowledgments

We thank M. Schefﬂer for fruitful discussions and S. Sch¨ulke, G. Schnapka at GemeinsamesNetzwerkzentrum (GNZ) in Berlin and M. Rampp at Max Planck Computing and Data Facility(MPCDF) in Garching for support on the computing infrastructure. The work was partially sup-ported by BiGmax, the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science, the European Research Council (ERC) under the European Union’s Horizon 2020 re-search and innovation program (Grant No. 740233 and Grant No. ERC-2015-CoG-682843), theGerman Research Foundation (DFG) through the Emmy Noether program under grant numberRE 3977/1, the SFB/TRR 227 “Ultrafast Spin Dynamics” (projects A09 and B07), and the NO-MAD pillar of the FAIR-DI e.V. association. S. Beaulieu acknowledges the ﬁnancial support ofthe Banting Fellowship from the Natural Sciences and Engineering Research Council (NSERC)in Canada.

Authors contributions

R.P.X. and R.E. conceived the project. S.D. and Sa.B. performed the photoemission bandmapping experiment. M.Z., M.D. and C.C. performed the DFT band structure calculations.16.P.X. processed the raw data, devised the band structure digitization and algorithm validationschemes. V.S. designed and implemented the machine learning algorithm under the supervisionof St.B. and B.S. with inputs from R.P.X.. R.P.X., V.S. and M.Z. co-wrote the ﬁrst draft ofthe manuscript. All authors contributed to discussion and revision of the manuscript to its ﬁnalversion.

Data and code availability

The electronic structure calculations are available from the NOMAD repository(DOI: 10.17172/NOMAD/2020.03.28-1). The photoemission dataset used in this work will bemade available on Zenodo. The code developed for this work will be made available on GitHubat https://github.com/mpes-kit/fuller.

Competing interests

The authors declare no competing interests in the content of the article.17 upplementary InformationA machine learning route between bandmapping and band structure

S1 Band structure reconstruction

Physical foundations.

The three quantities of common interest for the interpretation of photoe-mission spectra are (1) the bare band energy, (cid:15) k , (2) the complex-valued electron self-energy, Σ( k , E ) = ReΣ( k , E ) + i ImΣ( k , E ) , and (3) the transition matrix elements connecting theﬁnal ( f ) and initial ( i ) electronic states, M f,i ( k , E ) . An established interface between theoryand experiment for quantitating and interpreting the photoemission signal is the formalism ofan experimental observable—the single-particle spectral function [52, 9], A ( k , E ) . For a singleenergy band of a many-body electronic system, A ( k , E ) = 1 π Im Σ( k , E )[ E − (cid:15) k − Re Σ( k , E )] + [ Im Σ( k , E )] . (10)Within this framework, the band loci of the photoemission (or quasiparticle) band structure(BS), b ( k , E ) = (cid:15) k + ReΣ( k , E ) , correspond to the bare band dispersion modulated by thereal part of the electron self-energy, and they occupy the local maxima of the spectral functionevaluated at different momenta. However, in the photoemission process, the intensity countsregistered by the detector are modulated by the transition matrix elements [23], the Fermi-Diracoccupation function, f FD ( E ) , and the resolution of the measuring instrument, G ( E, σ E , σ k ) ,typically a multidimensional Gaussian function. This leads to the expression of the photoemis-sion intensity, I ( k , E ) , registered on an energy- and momentum-resolved detector, I ( k , E ) ∝ | M f,i ( k , E ) | f FD ( E ) A ( k , E ) ⊗ G ( E, σ E , σ k ) . (11)For a multiband electronic structure, band mapping measurements, in principle, have access tothe spectral functions of at least all valence bands. The photoemission intensities are combined18n summation to form the multiband (MB) counterpart of the single-band formula. I MB ( k , E ) = (cid:88) j I j ( k , E ) ∝ (cid:88) j | M f j ,i j ( k , E ) | f FD ( E ) A j ( k , E ) ⊗ G ( E, σ E , σ k ) (12) ∼ (cid:88) j A j ( k , E ) ⊗ G ( E, σ E , σ k ) , (when | M f j ,i j ( k , E ) | → , f FD ( E ) → . (13)The condition f FD ( E ) → applies to valence bands, while | M f j ,i j ( k , E ) | → may be achievedthrough nonlinear intensity normalization or contrast enhancement in data processing. Theexpression of the multiband photoemission intensity in Eqs. (12)-(13) provides the physicalfoundation and inspiration for the approximate generation of band mapping data (see SectionS2) that we employ to validate the reconstruction algorithm introduced in this work. Markov random ﬁeld modeling.

The Markov random ﬁeld (MRF) model for the photoemis-sion band structure in photoemission band mapping data can be constructed similarly for datain multiple dimensions. In traditional angle-resolved photoemission spectroscopy (ARPES),photoemission intensities are measured in the ( k, E ) coordinates, the proximity of the momen-tum positions in the band structure can be modeled using an MRF composed of a 1D chainof random variables as shown in Supplementary Fig. 1a. Band mapping data in ( k x , k y , E ) coordinates, as described in the main text, can be modelled using a 2D MRF. In addition, thealgorithm can be extended to higher dimensions involving coordinates beyond energy and mo-menta. For example, time-resolved photoemission data recorded in ( k x , k y , E, t ) coordinatescan be modelled using a 3D MRF as shown in Supplementary Fig. 1c. In the following, weprovide a brief introduction to the theory underlying MRF and provide a simpliﬁed derivationof the 2D MRF model introduced in the main text.Deriving the MRF amounts to determining the joint distribution of the random variables as-sociated with its graphical representation. In graphical model theory [53], a graph is constructedfrom the fundamental components called cliques. Each clique C of a graph is a subset of nodesthat shares an edge with another node in C , with the total number of nodes in C deﬁned asits size. The MRFs in Supplementary Fig. 1a-c that model the photoemission data are builtout of cliques of sizes 1–2 shown in Supplementary Fig. 1d. Although larger cliques can beconstructed similarly [53], their parent graphical models are described by more complex jointdistributions with drastically higher computational costs in optimization, therefore are not usedin our MRFs. Mathematically, each clique is represented by a so-called potential function, ψ C ,19hich is used to derive the joint distribution that characterizes the MRF. The potential functiononly depends on the node conﬁguration in the cliques, X C , and satisﬁes ψ C ( X C ) (cid:62) . Ac-cording to the Hammersley-Clifford theorem [54, 55, 53], the joint distribution of a vector ofrandom variables, X , can be written in the factorized form, p ( X ) = 1 Z (cid:89) C ∈C ψ C ( X C ) . (14)Here, C is the set of all cliques in the graph, and the partition function Z is a normalizationconstant given by Z = (cid:88) X (cid:89) C ∈C ψ C ( X C ) . The graphical representation of the MRFs relevant to this work are rectangular grids shown in ˜E k ··· ˜E i − k i − ˜E i k i ˜E i+1 k i+1 ··· ˜E N k N a · · · ˜E i,j,m − ˜E i,j,m ˜E i,j,m+1 · · · ··· k y,j ···... t m − ... k x,i ··· ···... t m ... ··· ···... t m+1 ... bc ··· k y,j − ˜E i − − ˜E i,j − ˜E i+1,j − ······ k y,j ˜E i − ˜E i,j ˜E i+1,j ······ k y,j+1 ˜E i − ˜E i,j+1 ˜E i+1,j+1 ···... ... ...... k x,i − ... k x,i ... k x,i+1 d ˜E i ˜E i ˜E i+1 Supplementary Figure 1:

Examples of the MRF models for photoemission spectroscopydata . a , 1D MRF model for data in ( k, E ) coordinates, represented as a chain of random vari-ables ˜ E i . N is the number of measured momentum values. b , 2D MRF model of photoemissiondata in ( k x , k y , E ) coordinates as introduced and demonstrated for use in the main text, withthe random variables ˜ E i,j connected on two dimensions k x and k y . c , 3D MRF model for time-and momentum-resolved photoemission spectroscopy data in ( k x , k y , E, t ) coordinates. Therandom variables ˜ E i,j,m are ﬁrst connected in the graph to the neighboring momentum positionsas in b , then subsequently along the neighboring time points. The time variable in c may alsobe replaced with other variables without changes in the structure of the graphical model. In a - c ,the MRFs are constructed using components (cliques) with sizes 1 (left) and 2 (right) in d , withtheir respective potential functions written below the illustrations.Supplementary Fig. 1. The respective potential functions of the size-1 and size-2 cliques are20nterpreted as the likelihood and prior of the probabilistic graphical model, respectively. To castthe band structure reconstruction problem into this framework, we assign the band energies asthe random variables (or model parameters) in the model, and the potential function of eachnode (size-1 clique) as the (preprocessed) photoemission intensity at the respective grid posi-tion. For simplicity and computational efﬁciency, this formulation doesn’t explicitly accountfor the intensity modulations described in Eq. (11) and preprocessing steps are required to neu-tralize their effects. The continuity assumption (i.e. no sharp jump) of the band energies alongmomentum directions means that the potential function of size-2 cliques can be represented bya Gaussian on adjacent momentum grid positions. Intuitively, this means that the closer the twoadjacent energies is, the more probable they are the actual band loci, and vice versa .In the 1D case (see Supplementary Fig. 1a), the potential function of each node (containingone band energy random variable ˜ E i ) is given by ψ i ( ˜ E i ) = ˜ I ( k i , ˜ E i ) , (15)where ˜ I is the photoemission intensity after preprocessing. The potential function of two con-nected nodes (describing the similarity between two neighboring band energy random variables)is given by ψ j,j +1 ( ˜ E j , ˜ E j +1 ) = exp (cid:34) − ( ˜ E j − ˜ E j +1 ) η (cid:35) . (16)Plugging Eqs. (15)-(16) into Eq. (14) yields p ( ˜ E , ..., ˜ E N ) = 1 Z N (cid:89) i =1 ψ i ( ˜ E i ) · N − (cid:89) j =1 ψ j,j +1 ( ˜ E j , ˜ E j +1 )= 1 Z N (cid:89) i =1 ˜ I ( k i , ˜ E i ) · N − (cid:89) j =1 exp  − (cid:16) ˜ E j − ˜ E j +1 (cid:17) η  (17)as the joint distribution of the 1D MRF, with N being the total number of momentum gridpoints. Analogously, we can derive the joint distribution of the 2D MRF as given in the maintext, and that for the 3D MRF in the ( k x , k y , E, t ) coordinates is p ( { ˜ E i,j,m } ) = 1 Z (cid:89) i,j,m ˜ I ( k x,i , k y,j , t m , ˜ E i,j,m ) · (cid:89) ( i,j,m ) , ( l,o,q ) | NN exp (cid:34) − ( ˜ E i,j,m − ˜ E l,o,q ) η (cid:35) . The MRF models in different dimensions discussed here follow the same Bayesian interpreta-tion as the 2D MRF (Eq. (1) in the main text).21 ptimization procedure.

Optimization of the MRF model is a local minima-ﬁnding process[53]. The following procedures are described using the 2D MRF in the main text as an example,but the approach can be extended to arbitrary dimensions. Due to the large number of randomvariables ( ∼ for the 2D MRF in the main text) and their complex dependence structurein the MRF, we solved it numerically using iterated conditional mode (ICM) [38] procedureand implemented with efﬁcient parallelization schemes, including the coding method and thehierarchical grouping of random variables. Next, we discuss the motivations and clarify thedetails of these three aspects. We provide the associated pseudocode in Algorithm 1.(1) ICM: Originally developed for similar optimization problems arising in image denoising[56, 57, 53], ICM is applicable to optimizing MRF at any dimension. The ICM procedureincludes (i) initialization of the random variables (e.g. { ˜ E i,j } in 2D MRF) and (ii) selection ofa single random variable to optimize in the loss function L while keeping all the other randomvariables ﬁxed. Each round in (ii) requires to compute at most ﬁve terms in the loss (Eq. (3)in the main text Methods section) which depend on the selected random variable ˜ E i,j . We cansimply evaluate these terms at the energy axis values measured in the experiment to determinethe energy associated with the lowest loss. (iii) iterate over all other random variables using thesame procedure in (ii).(2) Coding method: The ICM procedure described above operates sequentially over every ˜ E i,j ,which is inefﬁcient for the MAP optimization involving a large number of parameters. To im-prove the optimization performance, we implement the ICM with a checkerboard paralleliza-tion scheme (or coding method) [40] that scales favorably on multicore computing clusters.The scheme assigns the nodes of the MRF alternately with white and black colors, as shownin Supplementary Fig. 2a. If the white nodes are blocked, the black nodes are no longer con-nected through paths (i.e. sequences of connected edges and nodes). This property is called d-separation [58, 53]. Analogously, blocking the black nodes d-separates the white nodes. Sincethe MRF models satisfy the Hammersley-Clifford theorem [54], d-separation is equivalent toconditional independence, meaning that the random variables represented by the black nodesare independent if we condition on those represented by the white nodes. Therefore, condition-ing on the nodes of one color allows us to compute the terms in the log-probability loss (Eq. (3)in main text Methods) that depends on the nodes of another color in parallel, which means thatthe nodes associated with different colors can be updated alternately. Further details and proofs22 b Supplementary Figure 2:

Numerical optimization of the MRF model . a , Schematic of thecheckerboard parallelization (or coding method) and hierarchical grouping schemes for speed-ing up the ICM. The nodes of the MRF are alternately colored white and black (checkerboardparallelization) and each set of four neighboring nodes are group into a unit as colored in grey(hierarchical grouping). The updates in optimization are carried out ﬁrst at the four-node unitlevel, then alternately on the white or black nodes within the units. b , An example loss curvefor reconstructing the second valence band of WSe using the 2D MRF model and parallelizedICM implementation. L is the initial value of the loss at the start of the optimization. Withinan epoch in the parallelized scheme, the white nodes and subsequently the black nodes are sep-arately updated, therefore each band energy random variable is effectively updated once. Theloss decreases rapidly in the beginning and reaches a minimum after about 90 epochs.related to the coding method have been elaborated in [59, 55].(3) Hierarchical grouping: The introduction of the checkerboard parallelization scheme re-duces the translation symmetry of the original graph (originally symmetric by translation of anarbitrary number of nodes, now only symmetric by a translation of two nodes in each direction),which complicates the matrix operations needed to update the loss. However, we can restorethe translation symmetry and carry out the computation on a higher level by grouping a set offour neighboring nodes into a unit, as illustrated in Supplementary Fig. 2a. In this way, updat-ing the loss requires only standard matrix operations at the unit level followed by consecutiveupdates of the nodes within the units. During the optimization, the loss is updated by two setsof operations concerning (i) the nearest neighbor nodes within the unit (line 18-19 in Algorithm1) and (ii) the nearest neighbor nodes of the neighboring unit (line 20-21 in Algorithm 1). Thelatter operations are carried out by shifting the higher-level rectangular grid formed by the unitsby one step vertically or horizontally, followed by an operation on nodes of the respective units23 lgorithm 1 Optimization procedure for reconstructing a single energy band.

Input : I (3D momentum-resolved photoemission data), E (2D initialization from density func-tional theory calculation), E (1D energy axis) Parameter : η (hyperparameter of the Markov random ﬁeld), N (number of epochs) Output : E rec (Reconstructed 2D energy band) size kx, size ky, size E = size(I) ind x, ind y = meshgrid(range(size kx, step=2), range(size ky, step=2)) u (i,j,...), I u (i,j,...) are the band energies and for i in [0, 1] do for j in [0, 1] do E u [i, j, :, :] = E [ind x + i, ind y + j] log I u [i, j, :, :, :] = log(I[ind x + i, ind y + j, :]) for n in range(N) do E u [0, 0, :, :] = update E(0, 0, log I u , E u , E) E u [1, 1, :, :] = update E(1, 1, log I u , E u , E) E u [0, 1, :, :] = update E(0, 1, log I u , E u , E) E u [1, 0, :, :] = update E(1, 0, log I u , E u , E) for i in [0, 1] do for j in [0, 1] do E rec [ind x + i, ind y + j] = E u [i, j, :, :] function UPDATE

E(i, j, log I u , E u , E) squ diff = (E u - E) ** 2 / (2 * η ** 2) log p values, start with log-likelihood log p = log I u [i, j, :, :, :] log p -= squ diff[(i + 1) % 2, j, :, :, :] log p -= squ diff[i, (j + 1) % 2, :, :, :] log p -= shift(squ diff[(i + 1) % 2, j, :, :, :], 2 * i - 1, axis=2) log p -= shift(squ diff[i, (j + 1) % 2, :, :, :], 2 * j - 1, axis=3) return E[argmax(log p)] 24f the original and the shifted grid. The procedure is implemented in the open-source fuller package [42] using Tensorﬂow [39]. Supplementary Fig. 2b shows an example loss curve (i.e.loss as a function of iteration) in reconstruction of an energy band, where the optimization isessentially complete within ∼

90 iterations.(4) Robust initialization: Since the current MRF model doesn’t include any explicit regulariza-tion on the outcome with respect to the initialization, the optimizer is free to explore a largerange of values. In other words, the initial band dispersion is able to freely deform to ﬁt tothe band loci embedded in the data. This design improves the robustness of the algorithm toinitialization. As a result, in scenarios with only non-crossing energy bands, the MAP optimiza-tion can simply be initialized with constant energy values to yield consistent results. In generalsituations involving band crossings, the optimization procedure requires an initialization withapproximate energy values that preserves the band-crossing information, such as those providedby electronic structure calculations. In this scenario, the robustness of the algorithm is mani-fest in the fact that it can tolerate a certain amount of deviation in the initialization and stillconverges to a satisfactory reconstruction, which, in realistic settings, is closer to the real bandstructure contained in photoemission data than the initialization (e.g. from electronic structurecalculations). Quantitative examples demonstrating the robustness of initialization are providedusing synthetic data in Supplementary Figs. 5-6 (see Section S2).

Hyperparameter tuning.

The optimization process in the band structure reconstruction in-volves the tuning of three kinds of hyperparameters, which are the momentum scaling parame-ter, the rigid energy shift and the width of the nearest-neighbor Gaussian prior.(1) Momentum scaling: applied to equalize the momentum scale and resolution between theBS calculation (e.g. conducted on relaxed unit cells, see Supplementary Table 1) and the exper-imental data (measured on real materials). In our reconstruction procedure, the scaling factoris ﬁxed in the reconstruction of all energy bands using a particular level of density functionaltheory (DFT) calculation as initialization.(2) Rigid energy shift ( ∆ E): separately applied to each energy band in the calculated BS tocoarse-align to the band mapping data. In our case, the shift is chosen manually by visualinspection of the theoretical band energies overplotted on photoemission data (usually in theenergy-momentum slices). In practice, the necessary energy shifts vary between bands and also25upplementary Figure 3:

Demonstration of hyperparameter tuning . An example of tuningthe hyperparameters, the rigid shift ( ∆ E) and the width of the nearest-neighbor Gaussian prior( η ), for reconstructing the second valence band of WSe . a , Evolution of reconstructed energyband during hyperparameter tuning. b , Evolution of the initialization and reconstructed bandalong high-symmetry directions of the hexagonal lattice of WSe . The energy bands are overlaidon top of preprocessed data from photoemission band mapping of WSe (Fig. 1f in the maintext). In a , b , the images showing the optimal region for the hyperparameters identiﬁed by thescientists are emphasized with orange-colored frames.26epend on level of approximation in the BS calculation used as initialization, as illustrated inFig. 2a of the main text.(3) Width of the nearest-neighbor Gaussian prior ( η ): The value of the parameter η is chosenmanually from an initial estimate and subsequently optimized by visual inspection of the re-construction outcome. In the case of WSe , the momentum grid of the experimental data has aspacing of ∆ k x = ∆ k y ≈ A − , we used η ∈ [0.05, 0.2] eV. Generally speaking, the initialestimate of η has the order of magnitude proportional to the momentum grid spacing times thedispersion due to the following argument: To obtain a consistent reconstruction, we expect theposterior to stay relatively constant and be independent of the momentum grid spacing, whichshould be sufﬁciently ﬁne to ensure band continuity. Since after preprocessing the data, theintensity (i.e. the likelihood) is normalized and stays constant with respect to the momentumgrid spacing, the nearest-neighbor Gaussian prior term should stay constant correspondingly.For example, for two nearest-neighbor energy variables along the k x axis, the reasoning aboverequires, const ≈ ( ˜ E i +1 ,j − ˜ E i,j ) η ≈ (cid:18) ∂E∂k x (cid:19) ∆ k x η . (18)Thereby, we obtain η ∝ ∂E∂k x ∆ k x , which provide an order-of-magnitude estimate of η . The samelines of reasoning apply to the k y axis, for detector systems with relatively constant momentumresolution. As the grid spacing is the same in both k x and k y directions, a single η is used forreconstructing each band in the case of WSe , but the best η differs somewhat between energybands due to their various amounts of dispersion and how they are connected to the neighboringbands (i.e. their environment), hence the range of η as speciﬁed earlier.To demonstrate the process of hyperparameter tuning, we provide an example showing thereconstruction of the second valence band of WSe (see Supplementary Fig. 3), visualized inthe top view of the reconstruction outcome and in the momentum path along high-symmetrylines of the projected Brillouin zone. The orange-framed subﬁgures represent the range ofhyperparameter settings that yield a good reconstruction. In practice, the rule-of-thumb is thatthe choice of hyperparameters is more ﬂexible for reconstructing more isolated bands or thosewith less number of crossings, and vice versa . Reconstructions using different theories as initializations.

Comparison between reconstructedand theoretical band structures for 2H-WSe are presented as a similarity matrix in the main27 bc d Supplementary Figure 4:

Band structure reconstructions with different theory initializa-tions . Comparisons between reconstructed photoemission band structures (abbreviated as re-con.) and calculated band structures (abbreviated as calc.) from density functional theory(DFT) with different exchange-correlation functionals, including a, local density approxima-tion (LDA); b, PBE generalized gradient approximation (GGA); c, PBEsol GGA; d, HSE06hybrid functional. For each set of DFT band structure, the same energy shift (as in Supplemen-tary Fig. 8) is applied globally to all bands to align the energy zero at the K point with thereconstruction.text. To provide more intuitive visual guidance in interpreting the BS distance metric used inconstructing the similarity matrix, we compare these band structures along the high-symmetrylines of the Brillouin zone in Supplementary Fig. 4. S2 Generation of and validation on synthetic data

The advantage of using synthetic data is that the underlying band structure (i.e. ground truth)is exactly known such that they can be used for benchmarking the performance of the MAP re-construction algorithm described in this work. Benchmarking includes numerical experimentson two interrelated aspects: (1) testing the robustness of the reconstruction algorithm using dif-ferent initializations and comparing the deviations of the outcome from the ground-truth; (2)28esting the accuracy of reconstruction by determining the closest-possible reconstruction out-come from a given initialization. In the following, we ﬁrst describe the workﬂow of generatingthe band structure, the photoemission data and the initializations, which provide all essentialcomponents to carry out the tests. Then we present the benchmarking results on various cases.

Generation of band structure data.

We have adopted two approaches to generate band struc-ture data to meet the needs for testing the reconstruction algorithm. Firstly, we used analyticfunctions to describe the band dispersion (see Supplementary Fig. 5). They are computationallyefﬁcient, contain tunable parameters, can be produced at any resolution, and are easily extend-able to higher dimensions. In 2D momentum space, we constructed a multi-sinusoidal bandand two double-crossing parabolic bands. In 3D momentum space, we constructed a scaledversion of the strongly oscillating second-order Griewank function [60] and the tight-bindingformulation of the two-band graphene band structure [61] as model band dispersion surfaces.The modiﬁed Griewank function takes the form, E griewank ( k x , k y ) = 116000 ( k x + k y ) − cos(2 k x ) cos( √ k y ) . (19)The two-band tight-binding model of graphene has the energy dispersion relations, E ± ( k x , k y ) = ± (cid:118)(cid:117)(cid:117)(cid:116) (cid:16) √ k y a (cid:17) + 4 cos (cid:32) √ k y a (cid:33) cos (cid:18) k x a (cid:19) . (20)Here, E + and E − refer to the conduction band and the valence band, respectively. Secondly,we used numerical band structures from DFT calculations with different exchange-correlationfunctionals (see section S3). They are more physically realistic, but also require more compu-tation to obtain than generating bands from analytic functions. Tuning the initialization.

For simple bands constructed using analytic functions, tuning canbe achieved by modifying the parameters in the functions. In the complex multiband situationsuch as that of WSe , we tuned the initialization of the reconstruction algorithm by scalingor perturbing the coefﬁcient amplitudes of the constituent bases of the band structure. In ourcase, the bases are the terms of the hexagonal Zernike polynomials (ZPs) [62, 27]. Althoughunconstrained basis tuning is prone to unrealistic results, it achieves a level of ad hoc control forefﬁcient generation of a large amount of distinct initializations. For more physically realistic29 ce df b I n t en s i t y I n t en s i t y − k (a.u.) − − E ne r g y ( a . u . ) − k (a.u.) ground truthinitializationreconstruction I n t en s i t y − − − k x (a.u.) − − − k y ( a . u . ) Ground truth − − − k x (a.u.)Initialization − − − k x (a.u.)Reconstruction − − − − − k x (a.u.) − − − k y ( a . u . ) Difference − − − k y ( Å − ) Conduction BandGround truth Initialization Reconstruction − − k x (Å − ) − − k y ( Å − ) Valence Band − − k x (Å − ) − − k x (Å − ) − − − − − k x (Å − ) − − k y ( Å − ) − − − k y ( Å − ) Difference − E ne r g y ( a . u . ) k (a.u.) k (a.u.) − − − − I n t en s i t y E ne r g y E ne r g y Supplementary Figure 5:

Validations on 2D and 3D synthetic data . Test results for the re-construction algorithm on band structures generated with analytic functions. a , Reconstructionof a multi-sinusoidal band. b , Reconstruction of two double-crossing parabolic bands. c , d ,Reconstruction of a multi-extrema band with dispersion following the second-order Griewankfunction (see Eq. (19)) [60]. e , f , Reconstruction of the two bands of graphene nearby its Fermilevel ( e , f ) formulated in the tight-binding model (see Eq. (20)) [61]. The volumetric renderingsin c , e , display the synthetic data. The initialization for the reconstruction in a is a ﬂat line, while2D ﬂat bands are used to initialize the cases in d , f . In b , two double-crossing curves are neededas initialization to preserve the crossing in the reconstruction. The values in the difference plotsin d , f are calculated by subtracting the ground-truth band energies from the reconstructed ones.tuning, we used DFT calculations with different exchange-correlation functionals (see sectionS3). Approximate generation of photoemission data.

We approximately synthesized momentum-resolved photoemission data for each energy band by plugging the band energy and linewidthparameter at each momentum position into the Voigt proﬁle [63] (with Gaussian and Lorentzian30upplementary Figure 6:

Validation on 3D synthetic multiband photoemission data . a ,Synthetic photoemission data with b , the underlying band structure obtained from LDA-levelDFT calculation of WSe (only the ﬁrst 8 valence bands are used here). c , Comparison of twosets of differently scaled (by 0.8 and 1.2 times, respectively) initial conditions with respect tothe ground-truth band structure calculation (LDA calc.), shown for a k x - E (left) and a k y - E (right) slice. d , e , Comparison of the average error η avg for energy bands used as initializations(solid dots) and reconstructions (hollow dots). The initializations are constructed by scalingthe ground-truth band energies ( d ) or by using other DFT calculations ( e ). The reconstructionsall have reduced η avg compared with the initialization and η avg is consistent across all energybands. f , g , Reconstruction, ground truth (LDA), and initialization overlaid on the synthetic dataalong high-symmetry lines of the hexagonal Brillouin zone, corresponding to two of the casesin d and e , respectively. The energy zeros of the initialization in d - e are aligned with the groundtruth via a global shift. h , Comparisons of ground truth (LDA), reconstructed bands, and thedifferences between initialization (PBE), reconstruction and ground truth (g.t.) for each energyband. 31arameters σ and γ , and amplitude B ) computed using the Faddeeva function W [64]. TheVoigt proﬁle approximates the convolution of a single-particle spectral function (see SectionS1), describing the photoemission observable, with a Gaussian energy resolution function. Thesynthetic photoemission intensity, I synth , for a band structure composed of a set of energy bands, E B = { E b i } , is generated by combining multiple Voigt proﬁles in summation, similar to Eqs.(12)-(13). I synth ( k x , k y , E ) = (cid:88) j B j ( k x , k y ) σ j √ π Re (cid:20) W (cid:18) E − E b j ( k x , k y ) + i γ j ( k x , k y ) σ j √ (cid:19)(cid:21) (21)Without loss of generality, we assume the energy resolution in detection for all bands to be thesame ( σ j = σ ). For the cases shown in Supplementary Figs. 5-6, the linewidth parameter γ are set to a constant throughout the band. In all synthetic data, we omitted the inhomogeneousintensity modiﬁcations in realistic photoemission data due to experimental factors such as theexperimental geometry, sample condition, matrix element effect, photon energy, etc. This omit-tance relies on the assumption that the essential preprocessing step, such as symmetrization andcontrast enhancement [19] in our workﬂow (see main text Methods), can sufﬁciently restorethe intensity continuity along the energy bands. The momentum resolution effect is also notaccounted for because the instrument (such as METIS 1000 [6, 36]) has a higher momentumresolution than the momentum spacing used in data binning or generation. Validation of the reconstruction algorithm.

Using synthetic data generated from analyticfunctions of varying complexities as the band structure, we test out the accuracy of reconstruc-tion algorithm (see Supplementary Fig. 5); Using synthetic multiband data generated from theLDA-level DFT (LDA-DFT) band structures of WSe (see section S3), we tested out the sensi-tivity of reconstruction to the initialization (see Supplementary Fig. 6). In this case, to capturesufﬁcient physical realism similar to the photoemission band mapping of WSe presented in themain text, we set the energy resolution parameter of σ = 100 meV, the lineshape parameter γ =50 meV [65], and the energy spacing of data to ∼

18 meV, identical to the energy bin size forthe experimental data. The tests include four sets of numerical experiments summarized below:(1) Reconstructing non-crossing bands: For isolated bands, we tested synthetic data constructedfrom a multi-sinusoidal band (Supplementary Fig. 5a), the band generated by the Griewankfunction (Supplementary Fig. 5c-d), and the two-band tight-binding model of graphene (Sup-32lementary Fig. 5e-f). In these cases, initialization with a ﬂat band without any initial knowl-edge of the band dispersion (i.e. cold start) is sufﬁcient to recover its shape, regardless of thecomplexity of the dispersion.(2) Reconstructing crossing bands: We tested the simplest case of crossing bands with twoparabolas of opposite directions of opening (Supplementary Fig. 5b). To recover the dispersionwithout band index scrambling, the knowledge of crossing needs to be included numericallyin the initialization. This means, operationally, that the initialization requires crossing bandsat nearby energy values, or that the reconstruction needs a warm-start optimization. For theintercrossing parabolas, the initializations that yield feasible outcomes are generated by slighttuning of the parabola parameters in the range that retains the crossing.(3) Sensitivity of reconstruction to scaled energies as initialization: We scaled the energiesof the LDA-DFT band structure of WSe (using the ﬁrst 8 valence bands) around the meanenergy of each band (see Supplementary Fig. 6c) to for use as the initialization. The accuracyof the reconstruction outcome is evaluated by its average error η avg (Eq. (8) in the main textMethods), calculated with respect to the ground-truth band energies. The results displayedin Supplementary Fig. 6d,f show that the average error and its spread in the reconstruction arereduced from the corresponding values in the initialization. Quantitatively, in the reconstruction, η avg is within the range 20-65 meV, while in the initialization, η avg varies within 45-100 meVfor all 8 valence bands.(4) Sensitivity of reconstruction to differently calculated band structures as initialization: Weused DFT band structure calculations of WSe with PBE, PBEsol and HSE06 exchange-correlationfunctionals (see section S3) to initialize the reconstruction. The accuracy of the reconstructionis quantiﬁed similarly as in the previous numerical experiment using η avg . The results displayedin Supplementary Fig. 6e,g,h show that, despite the huge spread in the average error for the dif-ferent levels of DFT calculations (used as initialization without global shift alignment of energyzero), the corresponding reconstructions all have average errors at around or below 40 meV forevery band. The value of η avg varies by up to ∼

30 meV (i.e. between band

S3 Band structure calculations

DFT calculations.

The crystal structure of bulk WSe with 2H stacking (2H-WSe ) belongsto the P6 /mmc space group and consists of two Se-W-Se triatomic layers as shown in Supple-mentary Fig. 7. The stacking order of the two hexagonal layers is - BAB - ABA - and the long c -axis is oriented perpendicular to the layers. Electronic structure calculations were performedwithin DFT using the local density approximation (LDA), the generalized-gradient approxima-tion (GGA-PBE and GGA-PBEsol), and hybrid (HSE06) exchange-correlation functionals asimplemented in FHI-aims [48]. The atomic orbitals basis sets, the integration grids and Hartreepotential employed for all calculations are according to the default “tight” numerical settings ofFHI-aims. A 16 × × k -gird was used to sample the Brillouin zone. The Broyden-Fletcher-Goldfarb-Shanno optimization algorithm was used to relax the atomic positions untilthe residual force component per atom was less than 10 − eV/ ˚A. Supplementary Table 1 showsthe optimized lattice constants, a and c , as obtained by the evaluation of the analytical stresstensor [66] using different exchange-correlation functionals. In all BS calculations, we includedthe effect of spin-orbit coupling, which is known to introduce a large splitting of the outermostvalence states of bulk 2H-WSe [67].The calculated BSs of bulk 2H-WSe using different levels of approximation for the exchange-correlation (XC) functional are shown in Supplementary Fig. 8. For each XC functional, thecalculations were performed on (1) fully optimized structures (black lines), and on (2) opti-mized structures by ﬁxing the lattice parameters of the unit cell to the experimental values(colored lines). All calculations using different XC functionals reveal an indirect band gap withthe conduction band minimum located along the Γ -K path ( Γ and K being the bulk equivalents34 b Supplementary Figure 7:

Crystal structure of bulk 2H-WSe . a , Side view and b , top view ofthe crystal structure of 2H-WSe . The space group of the hexagonal structure is P6 /mmc withthe c -axis oriented perpendicular to the stacking layers. In each case, the real-space unit cell islabelled by dashed black lines.Supplementary Table 1: Parameters from density functional theory calculations.

Op-timized lattice constants, spin-orbit splitting of the topmost valence states at the K high-symmetry point, and the band gap of bulk 2H-WSe calculated within density functionaltheory using the LDA, PBE, PBEsol and HSE06 exchange-correlation functionals. For com-parison, we also report the corresponding experimental values at room temperature.xc-functional LDA PBE PBEsol HSE06 Experiment a ( ˚A) 3.250 3.317 3.269 3.295 3.28 c ( ˚A) 12.827 14.921 13.211 13.863 12.98Spin-orbit splitting at K (eV) 0.485 Band gap (eV) 1.022 Fully optimized structure. Optimized structure by ﬁxing the lattice parameters to experimental values. Ref. [12]. Ref. [68]. of the Γ and K high-symmetry points). For both sets of optimized structures, the LDA resultsreveal a valence band maximum at the Γ point, compatible with experimental measurements,while the PBE, PBEsol, and HSE06 band structures obtained for fully optimized structures ex-hibit a valence band maximum at the K point. Nevertheless, ﬁxing the unit cell dimensions atthe experimental lattice constants reproduces the experimental behavior that the valence band35 bc d Supplementary Figure 8:

Bulk electronic band structure of 2H-WSe . a - d , Band structureof bulk 2H-WSe along the Γ -K-M- Γ momentum path of its Brillouin zone including the effectof spin-orbit coupling. Calculations were performed using the LDA (green, a ), PBE (orange, b ), PBEsol (yellow, c ), and HSE06 (blue, d ) exchange-correlation functionals and optimizedstructures (see Supplementary Table 1) with the unit cell dimensions kept ﬁxed at the experi-mental lattice constants. Black lines in a - d represent the corresponding calculations using fullyoptimized geometries. For comparison, the two band structures in each plot are rigidly shiftedto align their uppermost valence state at the K high-symmetry point, where we also deﬁne asthe energy zero. All band structure calculations used k z = 0.35 ˚ A − .36aximum resides at the Γ point. The difference between the two sets of calculations obtainedusing PBE, PBEsol, and HSE06 functionals is attributed to the overestimation of the lattice pa-rameter c and the residual strain along the c -axis [69]. The calculated indirect band gaps and thespin-orbit splitting of the two topmost valence states at the K point using both sets of optimizedstructures are shown in Supplementary Table 1. Brillouin zone tiling.

Generation of a large and densely sampled patch of energy bands cov-ering the ﬁrst Brillouin zone and beyond is crucial for initialization of the MRF model. Tobalance the computational cost using different XC functionals with the dense sampling similarto the experimental data grid, we used the symmetry properties of the Brillouin zone to tile thecalculated momentum-space rectangular patch that covers the Γ , K and M points of the Bril-louin zone. The hexagonal Brillouin zone of WSe has a sixfold rotation symmetry axis andtwo independent mirror planes in the ( k x , k y ) coordinates. The initial rectangular patch is ﬁrstsymmetrized about the two mirror planes in the Γ -K and Γ -M directions to form a larger patch,which is then rotated by ◦ and ◦ , respectively, and combined with the original mirror-symmetrized patch. The composite patch is then shifted along all six Γ -M directions by oneunit cell distance and the result is cut to the required shape compatible with photoemission data. S4 Band structure informatics

Global scale structure.

Unbiased approximators allow us to use informatics tools for dataretrieval, representation and comparison for entire bands. We extend the examples given in Fig.3 of the main text to other bands and band structures used in the present work. SupplementaryFig. 9 displays the band-wise comparison of dispersion surfaces within other DFT calculations.These results contain similar features as Fig. 3a and 3c in the main text, reafﬁrming that thegeometric featurization provides a sparse representation of the band dispersions and that thedispersion similarities are largely preserved despite the use of different exchange-correlationfunctionals in the DFT calculations. They may, therefore, be regarded as general features of theband structure of WSe .In Supplementary Fig. 10, we demonstrate numerically the approximation capability ofthe hexagonal ZP basis set to all 14 valence bands of WSe . Despite the stark differences inenergy dispersion, the approximation to reconstructed bands (Supplementary Fig. 10a-d) and37 b cd e fg h C oun t s A m p li t ude ( a . u . ) Band C oun t s A m p li t ude ( a . u . ) Band C oun t s A m p li t ude ( a . u . ) Band C oun t s A m p li t ude ( a . u . ) Band B and i nde x HSE06 − − − − − C o s i ne s i m il a r i t y B and i nde x LDA − − − − − C o s i ne s i m il a r i t y B and i nde x PBE − − − − − C o s i ne s i m il a r i t y B and i nde x PBEsol − − − − − C o s i ne s i m il a r i t y Supplementary Figure 9:

Geometric featurization of the energy bands of WSe . a-d , De-composition of the 14 valence energy bands of WSe into hexagonal Zernike polynomials forthe DFT band structure calculations carried out at the levels of LDA ( a ), PBE ( b ), PBEsol ( c ),and HSE06 ( d ), respectively. Similar characteristics are seen compared with the reconstructedband structure shown in Fig. 3a in the main text, including the sparse distribution of signiﬁcantbasis terms and the decreasing dependence on higher-order basis terms. e-h , Cosine similaritymatrices between the 14 energy bands of WSe for the DFT band structure calculations carriedout at the levels of LDA ( e ), PBE ( f ), PBEsol ( g ), and HSE06 ( h ), respectively. The character-istics of these matrices resemble that calculated for the reconstructed band structure as shownin Fig. 3c in the main text. 38

50 100 150 200 250 300 350 400

Number of terms − − − − R e l a t i v e app r o x i m a t i on e rr o r band Number of terms − − − − R e l a t i v e app r o x i m a t i on e rr o r band a bc de fg h ij Number of terms A v e r age app r o x i m a t i on e rr o r ( m e V ) band Number of terms A v e r age app r o x i m a t i on e rr o r ( m e V ) band − − − − − − − E ne r g y ( e V ) − − − − − − − E ne r g y ( e V )

45 termsReconstructionApproximation Γ M K Γ − − − − − − − E ne r g y ( e V )

150 termsReconstructionApproximationApproximation by polynomials in default order − − − − − − − E ne r g y ( e V ) − − − − − − − E ne r g y ( e V )

45 termsReconstructionApproximation Γ M K Γ − − − − − − − E ne r g y ( e V )

150 termsReconstructionApproximationApproximation by polynomials in coefficient order0 50 100 150 200 250 300 350 400

Number of terms − − − − R e l a t i v e app r o x i m a t i on e rr o r band Number of terms A v e r age app r o x i m a t i on e rr o r ( m e V ) band Number of terms − − − − R e l a t i v e app r o x i m a t i on e rr o r band Number of terms A v e r age app r o x i m a t i on e rr o r ( m e V ) band Supplementary Figure 10:

Approximation to the band structure of WSe by a polynomialbasis . a-j , Demonstration of the convergence properties of the polynomial approximation usingreconstructed photoemission band structure ( a-d ) and DFT band structure calculated at the LDAlevel ( e-h ). When summing the hexagonal Zernike polynomial in the default order, the averageand relative approximation errors for the reconstructed ( a , b ) and theoretical ( e , f ) energy bandsconverge much slower than summing the polynomials in an ordering ranked by the magnitudeof their coefﬁcients (coefﬁcient order). This observation is similar for reconstructed ( c , d ) andtheoretical ( g , h ) energy bands. i - j , Visualization of the difference in convergence rates using thereconstructed band structure along the high-symmetry lines. The naturally-ordered polynomialbasis has not yet converged with 150 terms ( i ), while the coefﬁcient-ranked polynomials ( j )produces an accurate approximation well within that limit.39heoretical band structure at the level of LDA-DFT (Supplementary Fig. 10e-h) show compa-rable convergence rates. Quantitatively, the approximation using hexagonal ZPs ordered by themagnitude of the corresponding coefﬁcients (i.e. coefﬁcient order) converges to within 10-30meV/band within 50 polynomial basis terms, signiﬁcantly faster than using the default order(see also Fig. 3b for reference). The remaining errors are on par with the ﬁnite step size alongSupplementary Figure 11: Local band structure parameters . a , The ﬁrst valence band ofWSe with constant-energy contours. The patches around high-symmetry points K and M (cid:48) fromreconstruction (with LDA-DFT as the initialization) are overlaid in color. b , c , Patch around theM (cid:48) -point, a saddle point in the dispersion surface, visualized in 3D ( b ) and 2D ( c ), respec-tively. The energy gap at M (cid:48) due to spin-orbit coupling (SOC) results in the energy difference ∆ E M (cid:48) , − . d , e , Patch around the K-point, the energy maximum of the valence band, visualizedin 3D ( d ) and 2D ( e ), respectively. The SOC results in the energy gap ∆ E K , − . The outcomeof ﬁtting to a trigonal warping (TW) model around K from k · p theory [22] is shown in e .the energy axis in the data ( ∼

18 meV) that results in the imperfect smoothness of the recon-40tructed bands. This further proves that the hexagonal ZPs can provide an accurate and sparseapproximation for the band structure data. The trend of convergence between these two typesof polynomial ordering is further illustrated in Supplementary Fig. 10i-j in the momentum pathalong high-symmetry lines of the reconstructed band structure.

Local scale structure.

Local structural information includes energy gaps, effective masses,warpings, (avoided) crossings, etc. We extracted some of their associated parameters at andaround three high-symmetry points (K, M (cid:48) , and Γ , see Supplementary Fig. 11a) and compiledthe results in Supplementary Table 2. The dispersions and band structure parameters from theMAP reconstruction are compared with those extracted by line-by-line ﬁtting of the EDCs,which used the band energies from the reconstruction as initialization to improve robustness.Around K, two spectral peaks corresponding to two spin-split bands were ﬁt simultaneously,while around M (cid:48) and Γ , four were ﬁt simultaneously due to the spectral proximity of the ﬁrstfour valence bands (see Supplementary Fig. 4). The ﬁtting is carried out using a linear su-perposition of Voigt lineshapes and the lmﬁt package [70] with the reconstructed band energyas initialization (but not ﬁxed). The ﬁtting procedure iterates over the EDCs (e.g. a total of50 ×

50 EDCs for the patch around M (cid:48) ). Unstable ﬁts yielding erratic results (e.g. if differingsigniﬁcantly from neighboring values) are re-ﬁt with either algorithmically or manually ad-justed initialization. Supplementary Table 2 shows that the local structural information fromreconstruction is generally consistent with those obtained by iterative pointwise ﬁtting, whilediffering from DFT calculations. The deviations in the size of energy gaps at K and M (cid:48) betweenreconstruction and pointwise ﬁtting lie in the same range as the momentum-averaged recon-struction errors (see section S2), which are due to the ﬁnite coordinate spacing in the data ( ∼

18 meV in energy).The region extracted around K (see Supplementary Fig. 11d-e) contains about 10% of thedistance of Γ − K. Due to the strong trigonal warping (TW) effect in this class of materials,the effective masses and the TW parameters around K were ﬁt simultaneously in 2D using themomentum-space model derived from k · p theory [22]. E ( q ) = (cid:126) q m K + C | q | cos(3 ϕ q + θ ) + E . (22)Here, q is the momentum vector k recentered on a particular K (or K (cid:48) ) point by translation, m K is the effective mass of the hole at K point, C is the magnitude of the TW (named C Band structure parameters from experiment and theory.

Effective masses of holes ( m K ), trigonal warping parameters ( C ) are extract at K pointin the ﬁrst two valence bands. Two directional effective masses at M (cid:48) ( m M (cid:48) ), and oneat Γ ( m Γ ), are obtained for the ﬁrst valence band. The energy gaps ( ∆ E ) between theﬁrst two valence bands are obtained at both K and M (cid:48) points. The number (1 or 2) inthe subscript of the parameter symbols denotes the valence band index, m e is the massof an isolated electron.Symmetry point Parameter LDA recon. Line ﬁtting LDA HSE06 K m K , /m e − − − − m K , /m e − − − − C K , (eV · ˚A ) 5.3 5.8 6.2 4.5K C K , (eV · ˚A ) 4.0 3.9 3.9 3.2K ∆ E K , − (meV) 419 446 485 467M (cid:48) m M (cid:48) − Γ , /m e (cid:48) m M (cid:48) − K (cid:48) , /m e − − − − (cid:48) ∆ E M (cid:48) , − (meV) 352 338 127 48 Γ m Γ , /m e − − − − Using band dispersion reconstructed globally by the proposed probabilistic machine learning algo-rithm with DFT calculation at the LDA level as the initialization. Using band dispersion from iterative lineshape ﬁtting of the energy distribution curves (in an regionaround the corresponding high-symmetry points). With fully optimized structure, see Supplementary Table 1. in [22]), ϕ q is the polar angle in the coordinate system centered on a K (or K (cid:48) ) point, θ is anauxiliary ﬁtting parameter used to accommodate the orientation of the TW with respect to thepixel coordinates deﬁned by the rectangular region of interest, E accounts for the energy offset.The energy gaps at K ( ∆ E K , − ) and M (cid:48) ( ∆ E M (cid:48) , − ) are illustrated in Supplementary Fig. 11(b and d), respectively. The M (cid:48) (or M) point situates at a saddle point of the dispersion surface(ﬁrst valence band), as shown in Supplementary Fig. 11b-c. Its lower symmetry (comparedwith K, K (cid:48) and Γ ) means that the effective masses exhibits anisotropy, with opposite signs andmagnitude along the M (cid:48) − Γ and M (cid:48) − K (cid:48) directions. We ﬁt the dispersion locally using a modelthat also accounts for the spin-orbit interaction involving a linear momentum-dependent shift(Eq. 14 in [22]). The second valence band is not ﬁtted at M (cid:48) due to the pronounced dispersionmodulation by interband coupling unaccounted for in the existing saddle-shaped model. Ataround Γ , a single effective mass is extracted by ﬁtting a paraboloid to a local patch of thedispersion surface. 42 eferences [1] L. P. Bouckaert, R. Smoluchowski, and E. Wigner. Theory of Brillouin Zones and Sym-metry Properties of Wave Functions in Crystals. Physical Review , 50(1):58–67, 7 1936.[2] T.-C. Chiang and F. Seitz. Photoemission spectroscopy in solids.

Annalen der Physik ,10(1-2):61–74, 2 2001.[3] T. Valla, A. V. Fedorov, P. D. Johnson, B. O. Wells, S. L. Hulbert, Q. Li, G. D. Gu, andN. Koshizuka. Evidence for Quantum Critical Behavior in the Optimally Doped CuprateBi2Sr2CaCu2O8+.

Science , 285(5436):2110–2113, 9 1999.[4] G. Levy, W. Nettke, B. M. Ludbrook, C. N. Veenstra, and A. Damascelli. Deconstructionof resolution effects in angle-resolved photoemission.

Physical Review B , 90(4):045150,7 2014.[5] Gerd Sch¨onhense, Katerina Medjanik, and Hans-Joachim Elmers. Space-, time- andspin-resolved photoemission.

Journal of Electron Spectroscopy and Related Phenomena ,200:94–118, 4 2015.[6] K. Medjanik, O. Fedchenko, S. Chernov, D. Kutnyakhov, M. Ellguth, A. Oelsner,B. Sch ¨onhense, T. R. F. Peixoto, P. Lutz, C.-H. Min, F. Reinert, S. D¨aster, Y. Acremann,J. Viefhaus, W. Wurth, H. J. Elmers, and G. Sch¨onhense. Direct 3D mapping of the Fermisurface and Fermi velocity.

Nature Materials , 16(6):615–621, 3 2017.[7] Eric B. Isaacs and Chris Wolverton. Inverse Band Structure Design via Materials DatabaseScreening: Application to Square Planar Thermoelectrics.

Chemistry of Materials ,30(5):1540–1546, 3 2018.[8] E. G. Marin, M. Perucchini, D. Marian, G. Iannaccone, and G. Fiori. Modeling of ElectronDevices Based on 2-D Materials.

IEEE Transactions on Electron Devices , 65(10):4167–4179, 10 2018.[9] Andrea Damascelli, Zahid Hussain, and Zhi-Xun Shen. Angle-resolved photoemissionstudies of the cuprate superconductors.

Reviews of Modern Physics , 75(2):473–541, 42003. 4310] Christopher Corder, Peng Zhao, Jin Bakalis, Xinlong Li, Matthew D. Kershis, Amanda R.Muraca, Michael G. White, and Thomas K. Allison. Ultrafast extreme ultraviolet photoe-mission without space charge.

Structural Dynamics , 5(5):054301, 9 2018.[11] M. Puppin, Y. Deng, C. W. Nicholson, J. Feldl, N. B. M. Schr¨oter, H. Vita, P. S. Kirch-mann, C. Monney, L. Rettig, M. Wolf, and R. Ernstorfer. Time- and angle-resolved pho-toemission spectroscopy of solids in the extreme ultraviolet at 500 kHz repetition rate.

Review of Scientiﬁc Instruments , 90(2):023104, 2 2019.[12] J. M. Riley, F. Mazzola, M. Dendzik, M. Michiardi, T. Takayama, L. Bawden,C. Granerød, M. Leandersson, T. Balasubramanian, M. Hoesch, T. K. Kim, H. Takagi,W. Meevasana, Ph. Hofmann, M. S. Bahramy, J. W. Wells, and P. D. C. King. Direct ob-servation of spin-polarized bulk bands in an inversion-symmetric semiconductor.

NaturePhysics , 10(11):835–839, 11 2014.[13] M. S. Bahramy, O. J. Clark, B.-J. Yang, J. Feng, L. Bawden, J. M. Riley, I. Markovi´c,F. Mazzola, V. Sunko, D. Biswas, S. P. Cooil, M. Jorge, J. W. Wells, M. Leandersson,T. Balasubramanian, J. Fujii, I. Vobornik, J. E. Rault, T. K. Kim, M. Hoesch, K. Okawa,M. Asakawa, T. Sasagawa, T. Eknapakul, W. Meevasana, and P. D. C. King. Ubiquitousformation of bulk Dirac cones and topological surface states from a single orbital manifoldin transition-metal dichalcogenides.

Nature Materials , 17(1):21–28, 1 2018.[14] Niels B. M. Schr ¨oter, Ding Pei, Maia G. Vergniory, Yan Sun, Kaustuv Manna, Fernandode Juan, Jonas. A. Krieger, Vicky S ¨uss, Marcus Schmidt, Pavel Dudin, Barry Bradlyn,Timur K. Kim, Thorsten Schmitt, Cephise Cacho, Claudia Felser, Vladimir N. Strocov,and Yulin Chen. Chiral topological semimetal with multifold band crossings and longFermi arcs.

Nature Physics , 5 2019.[15] Kevin P. Murphy.

Machine Learning: A Probabilistic Perspective . MIT Press, 2012.[16] Zoubin Ghahramani. Probabilistic machine learning and artiﬁcial intelligence.

Nature ,521(7553):452–459, 5 2015. 4417] Chaohui Wang, Nikos Komodakis, and Nikos Paragios. Markov random ﬁeld modeling,inference & learning in computer vision & image understanding: A survey.

ComputerVision and Image Understanding , 117(11):1610 – 1627, 2013.[18] Kevin Kaufmann, Chaoyi Zhu, Alexander S. Rosengarten, Daniel Maryanovsky, Tyler J.Harrington, Eduardo Marin, and Kenneth S. Vecchio. Crystal symmetry determination inelectron diffraction using machine learning.

Science , 367(6477):564–568, jan 2020.[19] Vincent Stimper, Stefan Bauer, Ralph Ernstorfer, Bernhard Scholkopf, and Rui PatrickXian. Multidimensional Contrast Limited Adaptive Histogram Equalization.

IEEE Access ,7:165437–165447, 6 2019.[20] M. Traving, M. Boehme, L. Kipp, M. Skibowski, F. Starrost, E. E. Krasovskii, A. Perlov,and W. Schattke. Electronic structure of WSe : A combined photoemission and inversephotoemission study. Physical Review B , 55(16):10392–10399, 4 1997.[21] Th. Finteis, M. Hengsberger, Th. Straub, K. Fauth, R. Claessen, P. Auer, P. Steiner,S. H ¨ufner, P. Blaha, M. V¨ogt, M. Lux-Steiner, and E. Bucher. Occupied and unoccupiedelectronic band structure of WSe . Physical Review B , 55(16):10400–10411, 4 1997.[22] Andor Korm´anyos, Guido Burkard, Martin Gmitra, Jaroslav Fabian, Viktor Z´olyomi,Neil D Drummond, and Vladimir Fal’ko. k · p theory for two-dimensional transitionmetal dichalcogenide semiconductors.

2D Materials , 2(2):022001, 4 2015.[23] Simon Moser. An experimentalist’s guide to the matrix element in angle resolved photoe-mission.

Journal of Electron Spectroscopy and Related Phenomena , 214:29–52, 1 2017.[24] John P. Perdew and Karla Schmidt. Jacob’s ladder of density functional approximationsfor the exchange-correlation energy. In

AIP Conference Proceedings , volume 577, pages1–20. AIP, 2001.[25] Dengsheng Zhang and Guojun Lu. Review of shape representation and description tech-niques.

Pattern Recognition , 37(1):1–19, 1 2004.[26] A. Khotanzad and Y.H. Hong. Invariant image recognition by Zernike moments.

IEEETransactions on Pattern Analysis and Machine Intelligence , 12(5):489–497, 5 1990.4527] Virendra N. Mahajan and Guang-ming Dai. Orthonormal polynomials in wavefront anal-ysis: analytical solution.

Journal of the Optical Society of America A , 24(9):2994, 2007.[28] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Image retrieval: Ideas, inﬂuences,and trends of the new age.

ACM Computing Surveys , 40(2):1–60, 4 2008.[29] Dorothea Golze, Marc Dvorak, and Patrick Rinke. The GW Compendium: A PracticalGuide to Theoretical Photoemission Spectroscopy.

Frontiers in Chemistry , 7:377, 7 2019.[30] M Zacharias, M Schefﬂer, and C Carbogno. Fully anharmonic, non-perturbative theory ofvibronically renormalized electronic band structures. arXiv , 2003.10417, 2020.[31] Matthias Ehrhardt and Thomas Koprucki, editors.

Multi-Band Effective Mass Approxima-tions , volume 94 of

Lecture Notes in Computational Science and Engineering . Springer,2014.[32] R.A. Ewings, A. Buts, M.D. Le, J. van Duijn, I. Bustinduy, and T.G. Perring. Horace :Software for the analysis of data from single crystal spectroscopy experiments at time-of-ﬂight neutron instruments.

Nuclear Instruments and Methods in Physics Research SectionA: Accelerators, Spectrometers, Detectors and Associated Equipment , 834:132–142, 102016.[33] Laura von Rueden, Sebastian Mayer, Katharina Beckh, Bogdan Georgiev, Sven Giessel-bach, Raoul Heese, Birgit Kirsch, Julius Pfrommer, Annika Pick, Rajkumar Ramamurthy,Michal Walczak, Jochen Garcke, Christian Bauckhage, and Jannis Schuecker. InformedMachine Learning – A Taxonomy and Survey of Integrating Knowledge into LearningSystems. arXiv , 1903.12394, 3 2019.[34] A. Oelsner, O. Schmidt, M. Schicketanz, M. Klais, G. Sch¨onhense, V. Mergel, O. Jagutzki,and H. Schmidt-B¨ocking. Microspectroscopy and imaging using a delay line detector intime-of-ﬂight photoemission microscopy.

Review of Scientiﬁc Instruments , 72(10):3968–3974, 10 2001.[35] Rui Patrick Xian, Yves Acremann, Steinn Ymir Agustsson, Maciej Dendzik, KevinB¨uhlmann, Davide Curcio, Dmytro Kutnyakhov, Frederico Pressacco, Michael Heber,46huo Dong, Jure Demsar, Wilfried Wurth, Philip Hofmann, Martin Wolf, Laurenz Ret-tig, and Ralph Ernstorfer. An open-source, distributed workﬂow for band mapping data inmultidimensional photoemission spectroscopy. arXiv , 1909.07714, 9 2019.[36] SPECS GmbH. METIS 1000 Brochure. , 2019.[37] Rui Patrick Xian, Laurenz Rettig, and Ralph Ernstorfer. Symmetry-guided nonrigid reg-istration: The case for distortion correction in multidimensional photoemission spec-troscopy.

Ultramicroscopy , 202:133–139, 7 2019.[38] J Kittler and J Fglein. Contextual classiﬁcation of multispectral pixel data.

Image andVision Computing , 2(1):13 – 29, 1984.[39] Mart´ın Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro,Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, IanGoodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefow-icz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Man´e, Rajat Monga, SherryMoore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, IlyaSutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, FernandaVi´egas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xi-aoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems,2015. Software available from tensorﬂow.org.[40] Stan Li.

Markov Random Field Modeling in Image Analysis . Advances in Pattern Recog-nition. Springer, 3 edition, 2009.[41] Jorge Nocedal and Stephen J. Wright.

Numerical Optimization . Springer New York, 2ndedition, 2006.[42] Vincent Stimper and Rui Patrick Xian. fuller. https://github.com/mpes-kit/fuller .[43] D. M. Ceperley and B. J. Alder. Ground state of the electron gas by a stochastic method.

Phys. Rev. Lett. , 45:566–569, 8 1980. 4744] John P. Perdew and Yue Wang. Accurate and simple analytic representation of theelectron-gas correlation energy.

Phys. Rev. B , 45:13244–13249, 6 1992.[45] John P. Perdew, Kieron Burke, and Matthias Ernzerhof. Generalized Gradient Approxi-mation Made Simple.

Physical Review Letters , 77(18):3865–3868, 10 1996.[46] John P. Perdew, Adrienn Ruzsinszky, G´abor I. Csonka, Oleg A. Vydrov, Gustavo E. Scuse-ria, Lucian A. Constantin, Xiaolan Zhou, and Kieron Burke. Restoring the Density-Gradient Expansion for Exchange in Solids and Surfaces.

Physical Review Letters ,100(13):136406, 4 2008.[47] Jochen Heyd, Gustavo E. Scuseria, and Matthias Ernzerhof. Hybrid functionals based ona screened Coulomb potential.

The Journal of Chemical Physics , 118(18):8207–8215, 52003.[48] Volker Blum, Ralf Gehrke, Felix Hanke, Paula Havu, Ville Havu, Xinguo Ren, KarstenReuter, and Matthias Schefﬂer. Ab initio molecular simulations with numeric atom-centered orbitals.

Computer Physics Communications , 180(11):2175–2196, 11 2009.[49] William P. Huhn and Volker Blum. One-hundred-three compound band-structure bench-mark of post-self-consistent spin-orbit coupling treatments in density functional theory.

Physical Review Materials , 1(3):033803, 8 2017.[50] James C. Wyant and Katherine Creath. Basic wavefront aberration theory. In

AppliedOptics and Optical Engineering , volume Xl, pages 1–53. Academic Press, Inc., 1992.[51] David S. Watkins.

Fundamentals of matrix computations . Wiley, 3rd edition, 2010.[52] Stefan H¨ufner.

Photoelectron Spectroscopy . Advanced Texts in Physics. Springer BerlinHeidelberg, Berlin, Heidelberg, 3rd editio edition, 2003.[53] Christopher M. Bishop.

Pattern Recognition and Machine Learning . Springer, 2006.[54] J. M. Hammersley and P. Clifford. Markov Fields on Finite Graphs and Lattices. Unpub-lished, 1971. 4855] Julian Besag. Spatial Interaction and the Statistical Analysis of Lattice Systems.

Journalof the Royal Statistical Society. Series B (Methodological) , 36(2):192–236, 1974.[56] Stuart Geman and Donald Geman. Stochastic Relaxation, Gibbs Distributions, and theBayesian Restoration of Images.

IEEE Transactions on Pattern Analysis and MachineIntelligence , 6(6):721–741, 11 1984.[57] Julian Besag. On the Statistical Analysis of Dirty Pictures.

Journal of the Royal StatisticalSociety: Series B (Methodological) , 48(3):259–279, 7 1986.[58] Judea Pearl.

Probabilistic Reasoning in Intelligent Systems . Morgan Kaufmann, 1988.[59] J. Besag. On the statistical analysis of nearest-neighbours. In

Proceedings of the EuropeanMeeting of Statisticians, Budapest , 1972.[60] M. Locatelli. A note on the Griewank test function.

Journal of Global Optimization ,25(2):169–174, 2 2003.[61] Cristina Bena and Gilles Montambaux. Remarks on the tight-binding model of graphene.

New Journal of Physics , 11(9):095003, 9 2009.[62] Virendra N. Mahajan and Guang-ming Dai. Orthonormal polynomials for hexagonalpupils.

Optics Letters , 31(16):2462, 8 2006.[63] H. C. van de Hulst and J. J. M. Reesinck. Line Breadths and Voigt Proﬁles.

The Astro-physical Journal , 106:121, 7 1947.[64] Mofreh R. Zaghloul and Ahmed N. Ali. Algorithm 916: Computing the Faddeyeva andVoigt Functions.

ACM Transactions on Mathematical Software , 38(2):1–22, 12 2011.[65] Hannes Huebener. Calculations of the momentum-resolved electron self-energy of 2H-WSe . Unpublished, 2016.[66] Franz Knuth, Christian Carbogno, Viktor Atalla, Volker Blum, and Matthias Schefﬂer.All-electron formalism for total energy strain derivatives and stress tensor components fornumeric atom-centered orbitals. Computer Physics Communications , 190:33 – 50, 2015.4967] D. Voß, P. Kr¨uger, A. Mazur, and J. Pollmann. Atomic and electronic structure of WSe from ab initio theory: Bulk crystal and thin ﬁlm systems. Phys. Rev. B , 60:14311–14317,11 1999.[68] K K Kam, C L Chang, and D W Lynch. Fundamental absorption edges and indirect bandgaps in W Mo x Se (0 (cid:54) x (cid:54) Journal of Physics C: Solid State Physics , 17(22):4031–4040, 8 1984.[69] Sujay B. Desai, Gyungseon Seol, Jeong Seuk Kang, Hui Fang, Corsin Battaglia, RehanKapadia, Joel W. Ager, Jing Guo, and Ali Javey. Strain-induced indirect to direct bandgaptransition in multilayer WSe . Nano Letters , 14(8):4592–4597, 2014. PMID: 24988370.[70] Matt Newville, Renee Otten, Andrew Nelson, Antonino Ingargiola, Till Stensitzki, DanAllan, Austin Fox, Faustin Carter, Michał, Dima Pustakhod, Yoav Ram, Glenn, ChristophDeil, Stuermer, Alexandre Beelen, Oliver Frost, Nicholas Zobrist, Gustavo Pasquevich,Allan L. R. Hansen, Tim Spillane, Shane Caldwell, Anthony Polloreno, Andrewhannum,Julius Zimmermann, Jose Borreguero, Jonathan Fraine, Deep-42-thought, Benjamin F.Maier, Ben Gamari, and Anthony Almarza. lmﬁt/lmﬁt-py 1.0.0. https://doi.org/10.5281/zenodo.3588521https://doi.org/10.5281/zenodo.3588521