Efficient Data Management in Neutron Scattering Data Reduction Workflows at ORNL
William F Godoy
Computer Science and Mathematics Division, Oak Ridge National Laboratory
Oak Ridge, TN, USA
Email: [email protected]

Peter F Peterson
Computer Science and Mathematics Division, Oak Ridge National Laboratory
Oak Ridge, TN, USA
Email: [email protected]

Steven E Hahn
Computer Science and Mathematics Division, Oak Ridge National Laboratory
Oak Ridge, TN, USA
Email: [email protected]

Jay J Billings
Computer Science and Mathematics Division, Oak Ridge National Laboratory
Oak Ridge, TN, USA
Email: [email protected]
Abstract—Oak Ridge National Laboratory (ORNL) experimental neutron science facilities produce 1.2 TB a day of raw event-based data that is stored using the standard metadata-rich NeXus schema built on top of the HDF5 file format. The performance of several data reduction workflows is largely determined by the amount of time spent on the loading and processing algorithms in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. The present work introduces new data management algorithms to address identified input/output (I/O) bottlenecks in Mantid. First, we introduce an in-memory binary-tree metadata index that resembles NeXus data access patterns to provide a scalable search and extraction mechanism. Second, data encapsulation in Mantid algorithms is redesigned to reduce the total compute and memory runtime footprint associated with metadata I/O reconstruction tasks. Results from this work show speedups in wall-clock time on ORNL data reduction workflows ranging from 11% to 30%, depending on the complexity of the targeted instrument-specific data. Nevertheless, we highlight the need for more research to address reduction challenges as experimental data volumes increase.
Index Terms—experimental data, reduction workflows, data management, metadata, indexing, Mantid, NeXus, HDF5, neutron scattering
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

I. INTRODUCTION
Vast quantities of data are produced at two of the largest neutron source facilities in the world hosted at Oak Ridge National Laboratory (ORNL): the High Flux Isotope Reactor (HFIR) and the Spallation Neutron Source (SNS) [1]. Neutron scattering data produced at ORNL is used to address major scientific challenges across several industries. Currently, ORNL instruments produce experimental data at a rate of 1.2 TB a day, for a grand total of 1.6 PB, with plans to expand the current volumes as new instruments, for example the VENUS beamline [1], [2], become available.

Instruments at ORNL's HFIR and SNS facilities record individual neutron events [3] containing three essential elements: i) a detector pixel identifier, ii) the neutron's total time-of-flight (TOF) from source to detector, and iii) the wall-clock time of the proton pulse the neutron is associated with [4]. The vast amount of raw event data is stored using the metadata-rich standard NeXus schema [5], built on top of the self-describing HDF5 hierarchical data file format [6]. Each instrument at HFIR and SNS stores a subset of the NeXus schema according to its application. This data is hosted at ORNL computing facilities and available to users via remote access for their scientific needs [7].

As shown in Fig. 1, the stored NeXus datasets are loaded for post-processing by several data reduction workflows using the open-source Mantid data analysis and visualization framework [8], written in C++ [9]. Mantid is part of an international collaboration between several neutron science facilities around the world, including ORNL's SNS and HFIR, the ISIS Neutron and Muon Source [10], and the Institut Laue-Langevin (ILL) [11]. Loading NeXus files is an essential component in existing production data reduction workflows deployed to the facilities' users. Mantid creates an in-memory data structure named an "event workspace" to interpret raw event data through loading and processing algorithm operations on NeXus files.

Fig. 1: Overview of the central role of the Mantid framework in data reduction workflows, from [8]. The step going from NeXus files to (event) Workspaces has been identified as a bottleneck at ORNL.

The latter operation has been identified as a major bottleneck in data reduction workflows [12]. Tackling these data I/O bottlenecks is critical for integrating novel paradigms, such as machine learning algorithms on vast amounts of experimental data, as they become more important across neutron science applications [13], [14].

As presented by Foster et al. [15], data I/O bottlenecks have been largely identified in high performance computing (HPC) co-design data reduction efforts. Furthermore, Alam et al. [16] refer to the "metadata wall" as one of the critical aspects in the performance of parallel file systems as more rich self-describing data is produced. Current efforts have been deployed in HPC systems to address metadata- and data-related bottlenecks at scale, such as ADIOS 2 [17] and ExaHDF5 [18]. As described by Zhang et al. [19], proper metadata indexing is essential for efficient search and information discovery as scientific applications continue to produce large amounts of data. Much of the data generated from experiments, observations, and simulations is stored using self-describing data formats, in which the metadata and data can be accessed efficiently all at once. However, the authors also argue that very few systematic studies exist on the discovery of in-memory index strategies for different scientific applications.
Diederich and Milton [20] proposed the creation of metadata structures that are domain-specific, as opposed to well-established knowledge-based data models. The latter metadata problem has been identified in this work as one of the bottlenecks to address in the targeted neutron scattering data reduction workflows.

The present work introduces data management strategies to address current I/O bottlenecks in the Mantid framework's processing stages of NeXus datasets, in particular by managing the metadata entries in-memory to reduce the compute and memory runtime footprint. First, a suitable memory-persistent binary-tree [21] structure is introduced to speed up search and extraction operations by using an "absolute-path" metadata entry index that matches processing operations in Mantid. The goal is to replace the current hierarchical approach, which reconstructs indices using a "relative-path" strategy similar to walking through the directories of a file system and incurs added cost as memory and disk input/output (I/O) resources are used "on-demand". Second, Mantid's algorithm architecture encapsulation is reformulated to facilitate persistent data sharing across the stages of processing NeXus files. These architectural changes allow for reusable information, thus reducing current bottlenecks associated with compute and memory runtime footprint.

The remainder of the article is organized as follows. Section II describes the NeXus format used for the raw event-based neutron data stored at ORNL SNS/HFIR facilities, together with a description of the current data reduction operations and challenges in the Mantid framework. Section III presents the proposed data management strategies in Mantid: the introduction of an in-memory binary-tree metadata indexing structure and the modification of the encapsulation of Mantid algorithms. The impact is shown in Section IV, illustrating the consistent speedups obtained with the proposed strategy for different SNS and HFIR instruments, in particular the small angle neutron scattering (SANS) reduction workflows of interest. Lastly, conclusions and future work are presented in Section V, outlining the need for further co-design research efforts to provide optimal management strategies for the generated neutron science experimental data.

II. NEUTRON SCATTERING DATA REDUCTION WORKFLOWS
A. The NeXus file format
SNS and HFIR instruments at ORNL use the international standard NeXus schema [5] for storing raw neutron event-based data. NeXus is based on the HDF5 [6] file format and follows a strict hierarchy of groups, datasets, and attributes that identify each group of raw event-based data from a neutron scattering experiment. Typical file sizes range between 0.1 and 30 GB per experiment, depending on the complexity of the instrument and the number of entries of each dataset. These datasets are then stored in an ORNL-hosted warehouse in the neutron science computing facilities, analysis.sns.gov, which already amounted to 1.6 PB of available experimental raw data as of 2020.

The NeXus schema is illustrated in Table I for the metadata structure saved to an underlying HDF5 file. Each level in the hierarchy maps to a "group" in the underlying HDF5 dataset that is described with a string attribute with key "NX_class" to identify the group type according to the data source of information. Two representative groups are shown for: i) logs, NX_class=NXlog, and ii) bank event data entries, NX_class=NXevent_data, which represent the majority of the processed group data types as described in subsection II-B. Log entries are essentially process variables stored as time-stamped data, which serve as a link to raw event data entries. Actual value entries, such as arrays or single values, are represented as scientific dataset (SDS) entries, or NX_class=SDS in the NeXus schema. Thus, SDS entries don't require explicit attribute annotation in the NeXus on-disk metadata, as they map directly to the HDF5 definition of a dataset [6]. As a result, NeXus event-based datasets have a hierarchical metadata structure in which group types are the first-level searching criteria.
Data Type       Entry Name
group           /entry
attribute       /entry/NX_class
...
group           /entry/DASlogs
attribute       /entry/DASlogs/NX_class → "NXlog"
group           /entry/DASlogs/BL6:CS:DataType
attribute       /entry/DASlogs/BL6:CS:DataType/NX_class
dataset (SDS)   /entry/DASlogs/BL6:CS:DataType/average_value
dataset (SDS)   /entry/DASlogs/BL6:CS:DataType/average_value_error
...
group           /entry/bank1_events
attribute       /entry/bank1_events/NX_class → "NXevent_data"
dataset (SDS)   /entry/bank1_events/event_id
dataset (SDS)   /entry/bank1_events/event_index
...
group           /entry/bank91_events
attribute       /entry/bank91_events/NX_class → "NXevent_data"
dataset (SDS)   /entry/bank91_events/event_id
dataset (SDS)   /entry/bank91_events/event_index

TABLE I: Schematic representation of the hierarchical NeXus schema [5] for recorded raw event-based neutron data.
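To make the on-disk layout concrete, the following minimal sketch (our illustration, not part of Mantid; the file name and standalone program are assumptions) uses the HDF5 C++ API to read the NX_class attribute that tags a NeXus group, e.g., a bank entry from Table I:

```cpp
// Minimal sketch: read the NX_class string attribute of a NeXus group.
#include <H5Cpp.h>
#include <iostream>
#include <string>

int main() {
  // Hypothetical input; ORNL NeXus event files share the same /entry layout.
  H5::H5File file("CG2_8179.nxs.h5", H5F_ACC_RDONLY);
  H5::Group bank = file.openGroup("/entry/bank1_events");

  // Groups carry an NX_class string attribute; SDS datasets do not.
  H5::Attribute attr = bank.openAttribute("NX_class");
  std::string nxClass;
  attr.read(attr.getStrType(), nxClass);
  std::cout << "/entry/bank1_events -> " << nxClass << "\n"; // "NXevent_data"
  return 0;
}
```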
B. Data Reduction Workflows on Mantid
NeXus files are processed in data reduction workflows using the Mantid framework [8]. These workflows include several input NeXus files for the physical interpretation, analysis, and visualization tasks required by users of SNS and HFIR instruments. Figure 2 shows the typical user interactions of reduction workflows through the Mantid interface, in which several MB or GB of NeXus data are reduced to a histogram or a pixelated image. Reduction workflows call a single, unified "LoadEventNexus" Mantid function for each NeXus file. "LoadEventNexus" returns an in-memory Mantid structure called an "EventWorkspace", which is designed specifically for sorting time-of-flight event histograms [4]. To build a reduced "EventWorkspace", "LoadEventNexus" requires internal calls to different algorithms processing different parts of the NeXus file entries. Particularly time-consuming algorithms include those processing logs and bank event data and those forming the in-memory metadata index at each step for each group. For more details, the reader is referred to the documentation of the "LoadEventNexus" algorithm in Mantid [22].

Fig. 2: Mantid graphical interface illustrating a reduced "EventWorkspace" and physical quantities generated from NeXus files, from [8].

Mantid's original architectural design isolates each algorithmic step shown in Fig. 3; this encapsulation prevents sharing "expensive" data resources, such as metadata index information and file handles, among these steps. Therefore, the existing implementation reconstructs metadata index information for every group level that is accessed in the search for data entries. As a result, several extra calls are made to the underlying HDF5 library tracking the metadata in the appropriate b-tree structures [6], in addition to an increased number of memory allocations required to reconstruct each hierarchical level index. The latter adds to the overall wall-clock time bottlenecks observed in several neutron data reduction workflows using Mantid. A minimal sketch of this per-step reconstruction pattern follows Fig. 3.

Fig. 3: Mantid's LoadEventNexus steps for processing entries of a single input NeXus file, identified by instrument and experiment run number (<instrument>_<run number>.nxs.h5), in data reduction workflows: LoadLogs (NXlog), LoadMonitors (optional, NXmonitor), LoadGeometry (NXgeometry), and LoadBankData (NXevent_data), all feeding the resulting EventWorkspace.
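The sketch below (our construction under stated assumptions, not Mantid source; helper and step names are hypothetical) illustrates the "relative-path, on-demand" pattern described above, where each isolated step reopens the file and re-walks the HDF5 metadata b-trees:

```cpp
// Illustrative sketch of the per-step metadata reconstruction bottleneck.
#include <H5Cpp.h>
#include <iostream>
#include <string>
#include <vector>

// List child groups of `parent` whose NX_class attribute equals `nxClass`.
std::vector<std::string> childrenOfType(const H5::Group& parent,
                                        const std::string& nxClass) {
  std::vector<std::string> names;
  for (hsize_t i = 0; i < parent.getNumObjs(); ++i) {
    const std::string name = parent.getObjnameByIdx(i);
    if (parent.childObjType(name) != H5O_TYPE_GROUP) continue; // skip SDS
    H5::Group child = parent.openGroup(name);
    if (!child.attrExists("NX_class")) continue;
    H5::Attribute attr = child.openAttribute("NX_class");
    std::string value;
    attr.read(attr.getStrType(), value);
    if (value == nxClass) names.push_back(name);
  }
  return names;
}

// Each isolated step opens the file and rebuilds its own group listing.
void loadLogs(const std::string& path) {
  H5::H5File file(path, H5F_ACC_RDONLY); // open #1
  std::cout << childrenOfType(file.openGroup("/entry/DASlogs"), "NXlog").size()
            << " logs\n";
}

void loadBankData(const std::string& path) {
  H5::H5File file(path, H5F_ACC_RDONLY); // open #2
  std::cout << childrenOfType(file.openGroup("/entry"), "NXevent_data").size()
            << " banks\n";
}

int main() {
  const std::string path = "CG2_8179.nxs.h5"; // hypothetical input file
  loadLogs(path);     // metadata b-trees walked here...
  loadBankData(path); // ...and walked again here
  return 0;
}
```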
III. PROPOSED DATA MANAGEMENT STRATEGY

The present effort introduces new data management strategies in the stages of Mantid's "LoadEventNexus" to address current I/O bottlenecks. First, an in-memory binary-tree index structure is introduced along all the stages of "LoadEventNexus". The index key consists of a string prefixed with the entry "type", the value of the "NX_class" attribute for each entry in Table I, followed by the absolute path to each entry. Second, the proposed index is generated as soon as a NeXus file is opened and is reused in the stages of Mantid's "LoadEventNexus". The intention is to remove the current I/O bottlenecks due to the cost associated with hierarchical metadata reconstruction at each NeXus group level, such as those shown in Table I. The goal is also to match the search and processing patterns of NeXus entries in "LoadEventNexus", as illustrated in Fig. 3.

Fig. 4: Schematic representation of the efficient binary-tree in-memory index metadata for NeXus file entries classified by NX_class types (NXlog, NXevent_data, NXmonitor, SDS) at the top level, e.g., NXlog → /entry/Log3 ... /entry/Log7, NXevent_data → /entry/bank3_events ... /entry/bank7_events. Each NX_class node is a binary-tree on its own.

Figure 4 and Table II show a schematic representation of the proposed index structure. The first search bucket of this binary tree is given by the number of entry types (NX_classes), which is typically only a few groups in the NeXus file, as described in subsection II-A. Each node is a binary-tree on its own; since the scientific dataset (SDS) type refers to actual values (single or array values), it is the node with the largest number of entries (NX_class entries). As a result, the complexity of a search for a given entry becomes logarithmic in the number of classes and the number of entries per class:

O(log(NX_classes × NX_entries-per-class)).  (1)
Key: NX_class    Value: Sorted binary-tree with absolute-path entry key
NXcollection     /entry/DASlogs
                 /entry/DASlogs/BL6:CS:DataType/enum
                 /entry/DASlogs/BL6:Chop:Skf1:PhaseLocked/enum
                 /entry/DASlogs/BL6:Chop:Skf2:PhaseLocked/enum
                 ...
NXdetector       /entry/instrument/bank1
                 /entry/instrument/bank2
                 ...
                 /entry/instrument/bank48
NXlog            /entry/DASlogs/BL6:CS:DataType
                 /entry/DASlogs/BL6:CS:beamslit4
                 /entry/DASlogs/BL6:Chop:Skf1:MotorSpeed
                 ...
NXevent_data     /entry/bank1_events
                 /entry/bank2_events
                 ...
                 /entry/bank48_events
SDS              /entry/DASlogs/BL6:CS:DataType/average_value
                 /entry/DASlogs/BL6:CS:DataType/average_value_error
                 ...
                 /entry/bank1_events/event_id
                 /entry/bank1_events/event_index
                 /entry/bank1_events/event_time_offset
                 /entry/bank1_events/event_time_zero
                 /entry/bank1_events/event_total_counts
                 ...
                 /entry/bank48_events/event_id
                 /entry/bank48_events/event_time_offset
                 /entry/bank48_events/event_time_zero
                 /entry/bank48_events/event_total_counts
                 ...
TABLE II: Resulting in-memory index implementation using C++'s map container.
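A minimal sketch of such a two-level ordered index follows (our illustration; the type and path names follow Table II, while the container layout and the NexusIndex alias are assumptions based on the paper's description of C++ STL maps):

```cpp
// Two-level in-memory index: an ordered map keyed by NX_class, whose values
// are ordered sets of absolute-path entry keys (both are red-black trees in
// typical STL implementations, i.e., binary search trees).
#include <iostream>
#include <map>
#include <set>
#include <string>

using NexusIndex = std::map<std::string, std::set<std::string>>;

int main() {
  NexusIndex index;
  // Populated once when the NeXus file is opened (entries as in Table II).
  index["NXlog"].insert("/entry/DASlogs/BL6:CS:DataType");
  index["NXevent_data"].insert("/entry/bank1_events");
  index["NXevent_data"].insert("/entry/bank48_events");
  index["SDS"].insert("/entry/bank1_events/event_id");
  index["SDS"].insert("/entry/bank1_events/event_index");

  // Class lookup is O(log classes); entry lookup within a class is
  // O(log entries-per-class); together this matches Eq. (1).
  for (const std::string& bank : index.at("NXevent_data"))
    std::cout << bank << "\n";

  const bool found = index.at("SDS").count("/entry/bank1_events/event_id") > 0;
  std::cout << (found ? "found" : "missing") << "\n";
  return 0;
}
```

Reusing one such structure across the "LoadEventNexus" stages avoids repeated HDF5 group traversals and the associated memory allocations.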
The resulting index is immediately constructed in-memory and passed along the algorithms called inside "LoadEventNexus" in Fig. 3. The latter required architectural changes in the algorithm encapsulation inside Mantid to enable reusability of "expensive" resources, such as the introduced binary-tree index in Fig. 4, to avoid frequent memory allocation operations.

From the implementation perspective, the available data structures from the C++ standard template library (STL) [9] are used; the end result is a two-step ordered binary tree based on C++'s "map". For implementation details and, more importantly, results and performance reproducibility, the reader is referred to the changes in Mantid's source code: https://github.com/mantidproject/mantid/pull/28495, currently available on Mantid's latest development branch.

IV. IMPACT

A. Mantid LoadEventNexus Performance

The proposed changes in data management resulted in fewer metadata operations inside Mantid's "LoadEventNexus". Figure 5 shows the flame graph [23] representation (x-width illustrates the cost of each function, y-height is the function call stack) of the CPU profiling for all the existing Mantid tests using "LoadEventNexus" for: (a) Mantid v5.0, and (b) the Mantid latest development branch with the proposed improvement. It can be seen that the CPU time spent on tasks related to metadata management for entry search has been largely reduced for a variety of files.

Fig. 5: Mantid's "LoadEventNexus" CPU profiling flame graph representation for (a) Mantid v5.0, (b) Mantid's latest implementation with our proposed strategy. The reduction of metadata-related CPU operation bottlenecks is illustrated in this comparison.

NeXus file           Entries   Size (MB)
CG2_8179 (GPSANS)    3,683     62
CG2_8947             3,712     725
CG3_943 (BIOSANS)    3,203     71
CG3_816              3,203     766
CG3_1545             3,607     137
CG3_1056             3,387     269
CG3_1003             3,387     1800
CORELLI_83353        2,660     297
CORELLI_145950       2,974     510
EQSANS_112300        2,529     461
EQSANS_113407        2,532     5800
NOM_78093            1,572     1100
NOM_78106            1,572     488

TABLE III: Summary of selected representative NeXus files generated at ORNL instruments for Mantid's "LoadEventNexus" performance comparison.

Further measurements are provided to understand the impact on each individual NeXus file generated from different SNS and HFIR instruments at ORNL neutron facilities. Each instrument generates a set of "runs", with each run stored as a NeXus file. Selected files are listed in Table III for a variety of instruments, highlighting the different numbers of NeXus entries and sizes typically processed with Mantid's "LoadEventNexus".

The performance of Mantid's "LoadEventNexus" is measured for the v5.0 release version and compared against the latest development branch with the introduced changes from this work. As shown in Fig. 6, wall-clock times are reported for "LoadEventNexus" on the NeXus files listed in Table III, running on an AMD Ryzen 7 3700X 8-core processor, 64 GB of RAM, and a Hitachi HDP72505 500 GB hard drive for file storage.
For completeness, we provide measured wall-clock times using "non-cached" files (not previously used) and "hot-cached" files (previously used) to cover the different scenarios in which a NeXus file could be retrieved by users. Overall, the results in Fig. 6 demonstrate that the changes introduced by this work provide a consistent speedup across NeXus files generated by different instruments. The impact may vary depending on the file characteristics. For example, large files from EQSANS [24] show little speedup in wall-clock time, which indicates the need to identify further bottlenecks in "LoadEventNexus", while the smaller GPSANS (CG2) [25] files see larger speedup benefits. More research is needed to understand the relationship between internal compute and I/O algorithms and file characteristics, in particular metadata entries and file sizes, such as those shown in Table III. The long-term goal is to co-design efficient data reduction workflows as new use-cases are identified.

Fig. 6: Comparison of Mantid's "LoadEventNexus" wall-clock times for the Mantid v5.0 release and our proposed strategy on Mantid's latest implementation. Results are shown for: (a) non-cached files (run once), (b) cached files, showing varying improvements across files generated by different ORNL instruments.

B. ORNL Data Reduction Workflows

Data reduction workflows are typically composed of a handful of NeXus files identified by different runs, such as those presented in Table III. The end result is the reduction of the raw NeXus event-based data into physical quantities of interest, in particular histograms and images. The current algorithmic improvements are then applied to three data reduction workflows of interest for small angle neutron scattering (SANS) instruments at ORNL facilities.

Table IV shows results for the SANS instrument reduction workflows running on ORNL production systems at analysis.sns.gov. The production hardware consists of an Intel Xeon CPU E5-2670 v3 with 48 cores equipped with 512 GB of RAM. The SANS instruments consist of one time-of-flight instrument, EQSANS [24], where information from each event is used to reduce the data; and two monochromatic instruments, BIOSANS [26] and GPSANS [25], where event data is traditionally not used. Relative speedups are presented as the ratio of the difference between the wall-clock times obtained with Mantid v5.0 and the latest development version with the current algorithm improvements, divided by the original wall-clock time obtained with Mantid v5.0.

These data reduction workflows are composed of different NeXus files, each representing a run number; thus the impact might vary according to the ratio of I/O and computation characteristics across different calls to "LoadEventNexus". Overall, it can be seen that the improvements apply consistently to all the production workflows when wall-clock times are measured before and after introducing the proposed index structure. As expected from our initial single-file assessment of "LoadEventNexus" in Fig. 6, the GPSANS data reduction workflow shows the largest improvement, a 29% speedup, being typically composed of a large number of entries and small file sizes. On the other hand, improvements in EQSANS data reduction workflows reach a reproducible 11% speedup, a more modest improvement as expected from the results in Fig. 6. BIOSANS improvements fall in between at 19%, even though that workflow takes the longest as more data processing is required. A worked example of the relative speedup definition is given below.
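As a quick arithmetic check of this definition (our computation, using the GPSANS row of Table IV):

\[
\text{relative speedup} = \frac{t_{\text{v5.0}} - t_{\text{latest}}}{t_{\text{v5.0}}}
= \frac{58.9\,\text{s} - 41.8\,\text{s}}{58.9\,\text{s}} \approx 0.29 = 29\%,
\]

matching the value reported in the table; the BIOSANS and EQSANS rows reproduce 19% and 11% in the same way.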
Overall, the improvements apply broadly, impacting a wide range of NeXus files and their compositions in data reduction workflows.

ORNL Instrument   NeXus entries   Max file    Mantid v5.0   Mantid latest   Relative
Workflow          (approx.)       size (MB)   WC time (s)   WC time (s)     speedup
GPSANS [25]       3,700           45          58.9          41.8            29%
BIOSANS [26]      3,700           444         100.2         80.9            19%
EQSANS [24]       2,500           62          99.0          88.0            11%

TABLE IV: Overall wall-clock (WC) time comparison and speedup for production data reduction workflows for SNS/HFIR instruments running on the analysis.sns.gov hardware system.

V. CONCLUSIONS

This work introduces efficient data management strategies to address I/O bottlenecks in existing data reduction workflows at ORNL neutron scattering experimental facilities. For reproducibility, the present work is available in the latest development branch of the Mantid data analysis and visualization framework. These improvements are also expected to benefit the larger Mantid community at other neutron source facilities around the world, such as ISIS (UK) and ILL (France), as they impact NeXus files with a wide range of entries and file sizes. Efficient metadata index searching is introduced using an entry "absolute-path" key binary-tree, while the reduction of CPU runtime and memory footprint is achieved by modifying the current encapsulation of Mantid algorithms that process NeXus files. The overall impact on wall-clock time results in speedups ranging from 11% to nearly 30% in current data reduction workflows of interest for small angle neutron scattering (SANS) instruments at ORNL. Future directions include continued research on different data management strategies to further customize existing reduction workflows, expected to focus on specific areas such as event data filtering, histogram generation, data storage compression, and machine learning applications.

ACKNOWLEDGMENT

Work at Oak Ridge National Laboratory was sponsored by the Division of Scientific User Facilities, Office of Basic Energy Sciences, US Department of Energy, under Contract no. DE-AC05-00OR22725 with UT-Battelle, LLC. We would like to thank Dr. Mathieu Doucet, Dr. James Kohl, and Mr. Rich Crompton of the Neutron Sciences Division at Oak Ridge National Laboratory for their helpful input to this work.

REFERENCES

[1] Oak Ridge National Laboratory, "Neutron Sciences." [Online]. Available: https://neutrons.ornl.gov/
[2] H. Bilheux, K. Herwig, S. Keener, and L. Davis, "Overview of the conceptual design of the future VENUS neutron imaging beam line at the Spallation Neutron Source," Physics Procedia, vol. 69, pp. 55–59, 2015, Proceedings of the 10th World Conference on Neutron Radiography (WCNR-10), Grindelwald, Switzerland, October 5–10, 2014.
[3] G. E. Granroth, K. An, H. L. Smith, P. Whitfield, J. C. Neuefeind, J. Lee, W. Zhou, V. N. Sedov, P. F. Peterson, A. Parizzi, H. Skorpenske, S. M. Hartman, A. Huq, and D. L. Abernathy, "Event-based processing of neutron scattering data at the Spallation Neutron Source," Journal of Applied Crystallography, vol. 51, no. 3, pp. 616–629, Jun 2018.
[4] P. F. Peterson, S. I. Campbell, M. A. Reuter, R. J. Taylor, and J. Zikovsky, "Event-based processing of neutron scattering data," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 803, pp. 24–28, 2015.
[5] M. Könnecke, F. A. Akeroyd, H. J. Bernstein, A. S. Brewster, S. I. Campbell, B. Clausen, S. Cottrell, J. U. Hoffmann, P. R. Jemian, D. Männicke, R. Osborn, P. F. Peterson, T. Richter, J. Suzuki, B. Watts, E.
Wintersberger, and J. Wuttke, "The NeXus data format," Journal of Applied Crystallography.
[6] The HDF Group, "HDF5." [Online]. Available: https://www.hdfgroup.org/
[7] Journal of Physics: Conference Series, vol. 247, p. 012013, Oct. 2010.
[8] O. Arnold, J. Bilheux, J. Borreguero, A. Buts, S. Campbell, L. Chapon, M. Doucet, N. Draper, R. F. Leal, M. Gigg, V. Lynch, A. Markvardsen, D. Mikkelson, R. Mikkelson, R. Miller, K. Palmen, P. Parker, G. Passos, T. Perring, P. Peterson, S. Ren, M. Reuter, A. Savici, J. Taylor, R. Taylor, R. Tolchenov, W. Zhou, and J. Zikovsky, "Mantid—Data analysis and visualization package for neutron scattering and µSR experiments," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 764, pp. 156–166, 2014.
[9] B. Stroustrup, The C++ Programming Language, 4th ed. Addison-Wesley Professional, 2013.
[10] J. Thomason, "The ISIS Spallation Neutron and Muon Source—The first thirty-three years," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 917, pp. 61–67, 2019.
[11] P. Ageron, "Cold neutron sources at ILL," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 284, no. 1, pp. 197–199, 1989.
[12] G. Shipman, S. Campbell, D. Dillow, M. Doucet, J. Kohl, G. Granroth, R. Miller, D. Stansberry, T. Proffen, and R. Taylor, "Accelerating Data Acquisition, Reduction, and Analysis at the Spallation Neutron Source," in , vol. 1, 2014, pp. 223–230.
[13] C. Garcia-Cardona, R. Kannan, T. Johnston, T. Proffen, K. Page, and S. K. Seal, "Learning to Predict Material Structure from Neutron Scattering Data," in , 2019, pp. 4490–4497.
[14] B. Sullivan, R. Archibald, V. Vandavasi, P. Langan, L. Coates, and V. Lynch, "Volumetric Segmentation via Neural Networks Improves Neutron Crystallography Data Analysis," in , 2019, pp. 549–555.
[15] I. Foster, "Computing just what you need: Online data analysis and reduction at extreme scales," in , 2017, pp. 306–306.
[16] S. R. Alam, H. N. El-Harake, K. Howard, N. Stringfellow, and F. Verzelloni, "Parallel I/O and the metadata wall," in Proceedings of the Sixth Workshop on Parallel Data Storage, ser. PDSW '11. New York, NY, USA: Association for Computing Machinery, 2011, pp. 13–18.
[17] W. F. Godoy, N. Podhorszki, R. Wang, C. Atkins, G. Eisenhauer, J. Gu, P. Davis, J. Choi, K. Germaschewski, K. Huck, A. Huebl, M. Kim, J. Kress, T. Kurc, Q. Liu, J. Logan, K. Mehta, G. Ostrouchov, M. Parashar, F. Poeschel, D. Pugmire, E. Suchyta, K. Takahashi, N. Thompson, S. Tsutsumi, L. Wan, M. Wolf, K. Wu, and S. Klasky, "ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management," SoftwareX, vol. 12, p. 100561, 2020.
[18] S. Byna, M. S. Breitenfeld, B. Dong, Q. Koziol, E. Pourmal, D. Robinson, J. Soumagne, H. Tang, V. Vishwanath, and R. Warren, "ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems," Journal of Computer Science and Technology, vol. 35, no. 1, p. 145, 2020.
[19] W. Zhang, S. Byna, C. Niu, and Y. Chen, "Exploring metadata search essentials for scientific data management," in , 2019, pp. 83–92.
[20] J. Diederich and J. Milton, "Creating domain specific metadata for scientific data and knowledge bases," IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 4, pp. 421–434, 1991.
[21] R. Bayer and E.
McCreight, "Organization and maintenance of large ordered indices," in Proceedings of the 1970 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control, ser. SIGFIDET '70. New York, NY, USA: Association for Computing Machinery, 1970, pp. 107–141.
[22] Mantid, "LoadEventNexus v1 Algorithm." [Online]. Available: https://docs.mantidproject.org/nightly/algorithms/LoadEventNexus-v1.html
[23] B. Gregg, "The Flame Graph," Commun. ACM, vol. 59, no. 6, pp. 48–57, May 2016. [Online]. Available: https://doi.org/10.1145/2909476
[24] J. K. Zhao, C. Y. Gao, and D. Liu, "The extended Q-range small-angle neutron scattering diffractometer at the SNS," Journal of Applied Crystallography, vol. 43, no. 5 Part 1, pp. 1068–1077, Oct 2010.
[25] K. D. Berry, K. M. Bailey, J. Beal, Y. Diawara, L. Funk, J. S. Hicks, A. Jones, K. C. Littrell, S. Pingali, P. Summers, V. S. Urban, D. H. Vandergriff, N. H. Johnson, and B. J. Bradley, "Characterization of the neutron detector upgrade to the GP-SANS and Bio-SANS instruments at HFIR," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 693, pp. 179–185, 2012.
[26] W. T. Heller, V. S. Urban, G. W. Lynn, K. L. Weiss, H. M. O'Neill, S. V. Pingali, S. Qian, K. C. Littrell, Y. B. Melnichenko, M. V. Buchanan, D. L. Selby, G. D. Wignall, P. D. Butler, and D. A. Myles, "The Bio-SANS instrument at the High Flux Isotope Reactor of Oak Ridge National Laboratory."