arXiv:… [hep-ex] … Sep …
Data Preservation at MINERvA
R. Fine,∗ B. Messerly,† and K.S. McFarland, on behalf of the MINERvA Collaboration‡
University of Rochester, Rochester, New York 14627, USA
University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
(Dated: August 31, 2020)

Over the past ten years, MINERvA has collected an accelerator neutrino interaction data set that is uniquely relevant to the energy range of DUNE. These are the only currently available data at intermediate and high momentum transfers for multiple nuclear targets in the same beam. MINERvA is undertaking a campaign to preserve these data and make them publicly available so that they may be analyzed beyond the end of the MINERvA collaboration. We encourage the community to consider the development of centralized resources to enable long-term access to these data and analysis tools for the entire HEP community.
I. TECHNICAL STRATEGY
The MINERvA data preservation project consists of three components: (1) preservation of MINERvA data in a single ROOT tuple that incorporates low- and high-level reconstructed objects; (2) the MINERvA Analysis Toolkit (MAT), a broadly applicable HEP software toolkit for calculating systematic uncertainties using event tuple objects; and (3) a software package built on the MAT for reproducing published MINERvA results, which includes templates for performing new analyses.

To date, each published MINERvA analysis has employed its own tailored strategy for preparing the ROOT tuples that feed into the macro (event-loop) stage of analysis. These tuples are prepared using the Gaudi framework [1] and apply analysis-specific reconstruction techniques to commonly calibrated and prepared low-level data. Historically, this has served MINERvA well, as it enables parallel development of distinct reconstruction techniques and decentralizes the production of analysis tuples. The MINERvA analysis program has now reached a level of maturity at which the reconstruction for a broad variety of final states can be summarized in a unified analysis tuple. This structure will support a large number of analyses with a smaller disk footprint than would be needed to preserve separate tuples specialized to individual analyses. This approach also obviates the need to develop new Gaudi routines, which further reduces the computational resources needed to perform future analyses. We will also include in these tuples low-level reconstruction objects that could, in principle, be used for novel reconstructions.

It is a common analysis strategy in HEP experiments to estimate systematic uncertainties by simulating the experiment many times in a multitude of systematic universes.
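As a minimal illustration of the many-universe idea, the sketch below treats each systematic universe as a plain list of bin contents and estimates the uncertainty from the spread of the universes around the central-value histogram. The list-of-floats representation and the function names `universe_covariance` and `bin_errors` are hypothetical and are not the MAT's interface; averaging the outer products of the deviations with a 1/N normalization is one common convention, and a given analysis may normalize or combine universes differently.

```python
import math


def universe_covariance(central, universes):
    """Covariance matrix of the universe spread around the central value.

    central   : bin contents of the central-value (nominal) histogram
    universes : one histogram per systematic universe, same binning
    Assumption: 1/N normalization over the N universes (one common convention).
    """
    nbins, n_univ = len(central), len(universes)
    cov = [[0.0] * nbins for _ in range(nbins)]
    for hist in universes:
        # Deviation of this universe from the central value, bin by bin
        dev = [hist[i] - central[i] for i in range(nbins)]
        for i in range(nbins):
            for j in range(nbins):
                cov[i][j] += dev[i] * dev[j] / n_univ
    return cov


def bin_errors(central, universes):
    """Per-bin uncertainty: square roots of the covariance diagonal."""
    cov = universe_covariance(central, universes)
    return [math.sqrt(cov[i][i]) for i in range(len(central))]


if __name__ == "__main__":
    central = [100.0, 50.0]
    # Two hypothetical universes, e.g. a +1 sigma and a -1 sigma variation
    universes = [[110.0, 55.0], [90.0, 45.0]]
    print(bin_errors(central, universes))
    print(universe_covariance(central, universes))
```

With two symmetric ±1σ universes, the per-bin uncertainty reduces to the size of the shift in each bin, and the positive off-diagonal covariance records that the two bins move together.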
In MINERvA analyses, systematic universes incorporate the effects of systematic uncertainties by inserting variations directly into physics distributions at all stages of the analyses. For example, a typical analysis at MINERvA combines the distributions of signal and background events as functions of some kinematic variable, $S(x)$ and $B(x)$, the efficiency of performing that selection, $\epsilon(x)$, and the integrated flux, $\Phi$, to extract a differential cross section,

$$\frac{d\sigma}{dx} \propto \frac{S(x) - B(x)}{\epsilon(x)\,\Phi}.$$

Each of these inputs is calculated independently in $\mathcal{O}(100)$ distinct systematic universes, and is stored in a modified version of ROOT's TH{ }D class. Using these objects, it is straightforward to calculate the uncertainty arising from any subset of the systematic variations for any of these physics distributions, or for any distribution derived from them. Similarly, any analysis technique used in the extraction of a cross section, such as the construction of background sideband constraints, is performed independently in each systematic universe. A suite of custom C++ classes facilitates the execution of this strategy and streamlines the evaluation of systematic uncertainties across all MINERvA analyses. Collectively, we refer to these classes as the MINERvA Analysis Toolkit (MAT). As part of our data preservation effort, we intend to make the MAT publicly available, and we encourage its adoption by other neutrino experiments.

∗ Now at Los Alamos National Laboratory
† Now at University of Minnesota
‡ Please direct correspondence to fi[email protected], [email protected]

Using the methods provided by the MAT, writing event loops is straightforward. Within the loop over a ROOT tuple, there is a loop over systematic universes. In each systematic universe, cuts are applied and histograms corresponding to various kinematic variables are filled.
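The nested loop just described, together with the per-universe cross-section extraction, can be sketched in plain Python. Everything here is hypothetical: the toy event records (with an assumed `E_reco` field), the `Universe` class with a single energy-scale shift and weight, the binning `BIN_EDGES`, and the cut `E_CUT` stand in for the MAT's universe classes and the tuple's branches, which are not reproduced. What the sketch shows is that the cut and the binning are evaluated on the shifted variable inside each universe, so an event can enter or leave the selection, or migrate between bins, in one universe but not another.

```python
from dataclasses import dataclass

# Hypothetical binning (GeV) and selection cut, not taken from any MINERvA analysis
BIN_EDGES = [0.0, 2.0, 4.0, 6.0]
E_CUT = 1.0


@dataclass
class Universe:
    """Toy systematic universe: an energy-scale shift plus an event weight."""
    name: str
    energy_scale: float = 1.0
    weight: float = 1.0

    def shifted_energy(self, event):
        return event["E_reco"] * self.energy_scale


def find_bin(x, edges):
    """Index of the bin containing x, or None if x is outside the histogram."""
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    return None


def fill_universes(events, universes):
    """Outer loop over events, inner loop over universes; one histogram per universe."""
    hists = {u.name: [0.0] * (len(BIN_EDGES) - 1) for u in universes}
    for event in events:
        for u in universes:
            x = u.shifted_energy(event)  # shift the kinematic variable in this universe
            if x < E_CUT:                # the cut sees the shifted value...
                continue
            b = find_bin(x, BIN_EDGES)   # ...and so does the binning
            if b is not None:
                hists[u.name][b] += u.weight * event["weight"]
    return hists


def cross_section(signal, background, efficiency, flux):
    """Bin-by-bin (S - B) / (efficiency * flux); bin widths and target counts omitted."""
    return [(s - b) / (e * flux) for s, b, e in zip(signal, background, efficiency)]


if __name__ == "__main__":
    events = [{"E_reco": 0.98, "weight": 1.0},
              {"E_reco": 1.95, "weight": 1.0},
              {"E_reco": 3.0, "weight": 1.0}]
    universes = [Universe("cv"), Universe("escale_up", 1.05), Universe("escale_down", 0.95)]
    for name, h in fill_universes(events, universes).items():
        print(name, h)
```

In the toy example, the event with `E_reco = 0.98` fails the cut in the central-value and down-shifted universes but passes it when the energy scale is raised by 5%, while the event at 1.95 migrates into the second bin in the up-shifted universe. In a full analysis, `cross_section` would then be applied to the signal, background, and efficiency histograms of each universe separately.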
Filling histograms inside the universe loop preserves the event-by-event effects of the systematic variations across all bins of each histogram constructed. Where a systematics-agnostic user would fill, for example, a TH1D in this event loop, the MAT user instead fills an “MnvH1D”, which maps a TH1D to each systematic variation. Downstream, the MnvH1D supports the standard operations available for a TH1D and executes them across all systematic universes. In general, a systematic variation may modify the value of a kinematic variable and alter an event selection through a cut placed on that kinematic variable. By incorporating the handling of systematic universes into the event loop, this approach guarantees that kinematic variables are shifted appropriately and that the effects of those shifts are propagated to all downstream aspects of the analysis. As part of our data preservation campaign, we are developing software using this approach that will make it easy to reproduce MINERvA analyses and enable future modifications to them. Future use cases may include the modification of an event selection, the construction of a new observable, or the implementation of a new signal definition. Future users will be able to make any of these modifications and re-extract cross sections. For example, a future user may be interested in adding a final-state neutron requirement to $\nu_\mu$ CCQE-like final states, defining the transverse angle between that neutron's direction and the summed $p_T$ of the leading proton and muon, and then extracting a differential cross section with respect to that new variable.

II. APPLICATIONS OF MINERVA DATA
MINERvA completed data-taking in 2019 and expects the number of new analyses undertaken by the collaboration to decline sharply beginning in 2021. Once the DUNE near detector begins to collect data (later in this decade [2]), its forward horn current (FHC) data set will likely exceed the size of MINERvA's within one year of operations [3]. Thus, there will be at least five years during which the MINERvA data set provides the community's only opportunity to study neutrino interactions at intermediate and high momentum transfers for multiple nuclear targets in the same beam. The MINERvA collaboration is likely to continue in a less active configuration during this interim period, but recognizes that it will have insufficient resources to address all questions that may arise. Therefore, we believe it is important to maintain access to the data MINERvA has collected, in order to continue probing the interaction models that will be used by DUNE in the measurement of CP violation and other neutrino phenomena.

In recent years, MINERvA has led the field in probing these models in the few-GeV regime. To date we have published 31 cross section and flux measurements using our “Low Energy” (LE; $\langle E_\nu \rangle \sim$ … $\langle E_\nu \rangle \sim$ … $\nu_\mu$ interactions in the active (CH) region of the detector, and roughly half as many $\bar{\nu}_\mu$ interactions. There are an additional $\mathcal{O}(1\,\text{million})$ interactions in the passive nuclear target region of the detector, which includes He, H$_2$O, C, Fe, and Pb. These data can be analyzed to support the construction of interaction models that span nuclei both larger and smaller than argon. Though we have used and continue to use these data, current efforts are limited by the human resources available to analyze them.

In the coming years, neutrino interaction models must be improved to ensure the success of DUNE's ambitious physics program. Until DUNE near detector data become available, MINERvA will offer the largest and most relevant neutrino interaction data set against which such models can be tested. New, discriminating analysis techniques are continually being invented and refined, and the MINERvA data have the flexibility to be used for studying new observables. For example, consider the analysis technique in which transverse kinematic imbalance is used to probe intranuclear dynamics in neutrino interactions [4–14] or the absence thereof [15–19]. This technique has provided a new handle on nuclear effects, but it has only been applied in the analysis of modern data sets, which suggests that the infrastructure needed to re-analyze historical data at this level of detail is not available. We believe that access to both the data and an infrastructure to facilitate their analysis are necessary components of a successful preservation campaign. As described in Section I, our data preservation strategy includes support for the calculation of new kinematic variables and access to software that will enable reproduction of a wide range of current MINERvA analyses. We expect this to serve as a launching point for the reanalysis of existing selections to include new kinematic variables or to test against new interaction models. We also expect that future analyzers may modify existing selections to measure additional classes of neutrino interactions. For example, a future analyzer may wish to further constrain our $\nu_\mu$ CCQE-like selection or to perform an exclusive $\nu_e$ analysis.
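To illustrate the kind of new observable discussed above, the sketch below computes the single-transverse kinematic imbalance variables (the imbalance magnitude $\delta p_T$ and the angles $\delta\alpha_T$ and $\delta\phi_T$, as commonly defined in that literature, e.g. Ref. [4]) from a muon and a leading-proton 3-momentum. It also computes the hypothetical neutron observable described in Section I: the transverse angle between the neutron direction and the summed $p_T$ of the leading proton and muon. As a simplifying assumption, the neutrino is taken to travel along +z; a real analysis would project onto the plane perpendicular to the actual beam direction. The function names are illustrative only.

```python
import math


def transverse(p):
    """Transverse (x, y) part of a 3-momentum (px, py, pz); assumes the beam is along +z."""
    return (p[0], p[1])


def norm(v):
    return math.hypot(v[0], v[1])


def angle(a, b):
    """Angle in radians between two 2-vectors; NaN if either has zero length."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return float("nan")
    c = (a[0] * b[0] + a[1] * b[1]) / (na * nb)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding outside [-1, 1]


def tki(p_mu, p_p):
    """Return (dpT, dalphaT, dphiT) for a muon and the leading proton.

    dpT    : magnitude of pT_mu + pT_p (zero for quasielastic scattering on a nucleon at rest)
    dalphaT: angle between -pT_mu and the imbalance vector pT_mu + pT_p
    dphiT  : angle between -pT_mu and pT_p
    """
    pt_mu, pt_p = transverse(p_mu), transverse(p_p)
    dpt = (pt_mu[0] + pt_p[0], pt_mu[1] + pt_p[1])
    minus_mu = (-pt_mu[0], -pt_mu[1])
    return norm(dpt), angle(minus_mu, dpt), angle(minus_mu, pt_p)


def neutron_angle(p_mu, p_p, p_n):
    """Hypothetical observable: transverse angle between the neutron direction
    and the summed pT of the leading proton and muon."""
    pt_mu, pt_p = transverse(p_mu), transverse(p_p)
    pt_sum = (pt_mu[0] + pt_p[0], pt_mu[1] + pt_p[1])
    return angle(transverse(p_n), pt_sum)


if __name__ == "__main__":
    p_mu = (0.3, 0.0, 1.0)   # toy momenta in GeV/c
    p_p = (-0.3, 0.4, 0.5)
    dpt, dalpha, dphi = tki(p_mu, p_p)
    print(f"dpT = {dpt:.3f} GeV/c, dalphaT = {math.degrees(dalpha):.1f} deg, "
          f"dphiT = {math.degrees(dphi):.1f} deg")
```

Because such variables depend only on reconstructed momenta, they are the kind of quantity that could be computed in the event loop of a preserved-tuple analysis without rerunning the reconstruction.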
Given the recent advances in machine learning, we also plan to provide tools for turning our events into images that can be used in machine learning research.

III. DATA PRESERVATION AND SNOWMASS
We intend to make all aspects of our data preservation product publicly available and documented sufficiently that a trained experimental neutrino physicist could, in principle, use them. However, we acknowledge that such a goal has not yet been realized by any modern neutrino experiment, and we worry that future analysis of our data may not be viable without some involvement from MINERvA collaborators. As a practical matter, MINERvA has always had the concept of limited authorship, wherein temporary collaborators use our data to perform a specific measurement. We expect support for this analysis approach to extend beyond the current phase of our collaboration, so we expect our data to be useful to the community in the few-year time frame. For our data to be useful farther into the future, or to be usable without involvement from current MINERvA collaborators, we believe that additional resources will be required.

Because MINERvA is a scintillator-based experiment, the disk footprint for storing the entirety of our data is small compared to that of some more recent neutrino interaction experiments. We currently expect the total size of our data set (FHC+RHC, LE+ME, Data+Simulation) to be ∼10 TB. The computational resources required to loop through these data vary with the complexity of the analysis, and in particular with its dimensionality. For reference, a one-dimensional analysis filling $\mathcal{O}(10)$ histograms can run over the entire FHC ME data set in ∼ …

[1] G. Barrand et al., Comput. Phys. Commun. , 45 (2001).
[2] R. Acciarri et al. (DUNE), (2016), arXiv:1601.05471 [physics.ins-det].
[3] C. M. Marshall, K. S. McFarland, and C. Wilkinson, Phys. Rev. D , 032002 (2020), arXiv:1910.10996 [hep-ex].
[4] X. G. Lu, L. Pickering, S. Dolan, G. Barr, D. Coplowe, Y. Uchida, D. Wark, M. Wascko, A. Weber, and T. Yuan, Phys. Rev. C , 015503 (2016), arXiv:1512.05748 [nucl-th].
[5] A. P. Furmanski and J. T. Sobczyk, Phys. Rev. C , 065501 (2017), arXiv:1609.03530 [hep-ex].
[6] K. Abe et al. (T2K), Phys. Rev. D , 032003 (2018), arXiv:1802.05078 [hep-ex].
[7] S. Dolan, U. Mosel, K. Gallmeister, L. Pickering, and S. Bolognesi, Phys. Rev. C , 045502 (2018), arXiv:1804.09488 [hep-ex].
[8] X. Lu et al. (MINERvA), Phys. Rev. Lett. , 022504 (2018), arXiv:1805.05486 [hep-ex].
[9] S. Dolan, (2018), arXiv:1810.06043 [hep-ex].
[10] X. Lu and J. T. Sobczyk, Phys. Rev. C , 055504 (2019), arXiv:1901.06411 [hep-ph].
[11] L. Harewood and R. Gran, (2019), arXiv:1906.10576 [hep-ex].
[12] T. Cai, X. Lu, and D. Ruterbories, Phys. Rev. D , 073010 (2019), arXiv:1907.11212 [hep-ex].
[13] T. Cai et al. (MINERvA), Phys. Rev. D , 092001 (2020), arXiv:1910.08658 [hep-ex].
[14] D. Coplowe et al. (MINERvA), (2020), arXiv:2002.05812 [hep-ex].
[15] X.-G. Lu, D. Coplowe, R. Shah, G. Barr, D. Wark, and A. Weber, Phys. Rev. D , 051302 (2015), arXiv:1507.00967 [hep-ex].
[16] H. Duyang, B. Guo, S. Mishra, and R. Petti, (2018), arXiv:1809.08752 [hep-ph].
[17] H. Duyang, B. Guo, S. Mishra, and R. Petti, Phys. Lett. B , 424 (2019), arXiv:1902.09480 [hep-ph].
[18] L. Munteanu, S. Suvorov, S. Dolan, D. Sgalaberna, S. Bolognesi, S. Manly, G. Yang, C. Giganti, K. Iwamoto, and C. Jesús-Valls, Phys. Rev. D , 092003 (2020), arXiv:1912.01511 [physics.ins-det].
[19] P. Hamacher-Baumann, X. Lu, and J. Martín-Albo, Phys. Rev. D 102