ROOT I/O compression improvements for HEP analysis
Oksana Shadura, Brian Paul Bockelman, Philippe Canal, Danilo Piparo, Zhe Zhang
University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States
Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States
Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States
CERN, Meyrin 1211, Geneva, Switzerland
Abstract.
We give an overview of recent changes in the ROOT I/O system that increase its performance, enhance its capabilities, and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques all have the potential to significantly improve the experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.

Introduction

During the past years the LHC experiments have commissioned, and now manage, about an exabyte of storage for analysis purposes, approximately half of which is used for archival and half for traditional disk storage. For the HL-LHC, the storage requirements per year are expected to increase by a factor of 10 [1]. Given these predictions, storage will remain one of the major cost drivers for HEP computing, and at the same time one of its bottlenecks. New storage and data management techniques, as well as new compression algorithms, are therefore likely to be required to contain storage and analysis computing costs and to handle the data rates and volumes that the experiments will need to process during the HL-LHC [1].

Zstandard (ZSTD)

Investigating innovative compression algorithms could help resolve some of these problems, for example by improving user analysis and removing the decompression-speed bottleneck while maintaining the same or better compression ratios. Zstandard [5], referred to as zstd, is a dictionary-type (LZ77) algorithm with a large search window and fast implementations of the entropy coding stage, using either fast Finite State Entropy (tANS) or Huffman coding. It is a much more modern compressor than ZLIB, which was first implemented in 1995, and it offers higher compression ratios while using less CPU than other compression algorithms. ZSTD is available as a supported compression algorithm in ROOT starting from the 6.20.00 release [3].

Three years ago, Facebook [7] open-sourced Zstandard as an innovative, high-performance data compression solution. It is widely supported by the community and continuously maintained and enhanced by its authors, who have released a variety of advanced capabilities, such as improved decompression speed and better compression ratios. The initial promise of Zstandard was that it would allow users to replace an existing data compression implementation, such as ZLIB, with one offering significant improvements in compression speed, compression ratio, and decompression speed [6]. In addition to replacing ZLIB, ZSTD has taken over many of the tasks that traditionally relied on fast compression alternatives. The fastest compression is still provided by LZ4 (at its fastest settings), but ZSTD provides a roughly twice better compression ratio. According to reports from the community, it is also slowly replacing the strong-compression scenarios previously served by XZ (LZMA) [2], with the benefit of roughly 10 times faster decompression.
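As mentioned above, ZSTD is supported in ROOT starting from release 6.20.00. A minimal sketch of selecting it when writing a file follows; the file, tree, and branch names are illustrative, and the exact spelling of the compression-setting enum can differ between ROOT versions:

// write_zstd.C -- minimal sketch: write a TTree compressed with ZSTD.
// Assumes ROOT >= 6.20 and the ROOT::kZSTD enum from Compression.h.
#include "TFile.h"
#include "TTree.h"
#include "Compression.h"

void write_zstd()
{
   // Combine algorithm and level into a single compression setting.
   int setting = ROOT::CompressionSettings(ROOT::kZSTD, /*level=*/5);

   TFile f("events_zstd.root", "RECREATE", "", setting);
   TTree t("Events", "ZSTD-compressed tree");

   float pt = 0;
   t.Branch("pt", &pt, "pt/F");
   for (int i = 0; i < 1000; ++i) {
      pt = 0.1f * i;
      t.Fill();
   }
   t.Write();
   f.Close();
}

The same setting can also be applied to an existing file via TFile::SetCompressionAlgorithm and TFile::SetCompressionLevel.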
According to reports from Facebook, with all these use cases combined, ZSTD now processes a significant amount of data every day at Facebook.

Zstandard can use a "dictionary" to compress files of an already-known type more efficiently. Here a dictionary is a file that stores the compression settings for small files. A compression dictionary is assembled from a group of typically small files that contain similar information, preferably more than 100 files. For the best efficiency, their combined size should be about one hundred times the size of the dictionary produced from them. In general, the smaller the file, the greater the improvement in compression: according to the zstd manual page, a dictionary can increase the compression of a 64 KB file by only 10 percent, compared with a 500 percent improvement for a file of less than 1 KB [6].

Compression of analysis formats

In this section we focus on evaluating the compression of the most used analysis-related formats in CMS, NanoAOD [9] and MiniAOD [8], as well as a simple analysis file used by the LHCb experiment.

MiniAOD is a high-level CMS data format introduced in 2014 to serve the needs of the mainstream physics analyses while keeping a small event size of only 30-50 KB per event. It is not readable with bare ROOT and requires a special CMSSW setup. The NanoAOD format, in contrast, is an ntuple-like format, readable with bare ROOT and containing the per-event information needed in most generic analyses; its size per event is of the order of 1 KB. NanoAODs are usually produced centrally, or even on demand with different variations of the features or columns required by different physics analysis groups. Users can also easily extend NanoAOD for their specific studies with a private production when needed.

For CMS NanoAOD files, ZSTD could be a good compromise between the size of the file on disk and the decompression speed for a faster analysis: it gives a better compression ratio and faster decompression than ZLIB, and 6x faster decompression than LZMA, while the ZSTD-compressed file is only 20% bigger (all results are shown in Figures 1 and 2).

Figure 1. Comparison of compression ratio and decompression speed for the ZLIB, LZMA and ZSTD algorithms for a NanoAOD 2019 file.
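To make the dictionary mechanism described above concrete, the sketch below trains a dictionary from a set of small, similar in-memory samples and then compresses a new record with it, using the zstd C API (zdict.h and zstd.h). The record contents and buffer sizes are purely illustrative; real training needs enough sample data, so the training call may fail on inputs this small.

// dict_demo.cc -- sketch of zstd dictionary training and use.
// Build (illustrative): g++ dict_demo.cc -lzstd
#include <zstd.h>
#include <zdict.h>
#include <vector>
#include <string>
#include <cstdio>

int main()
{
   // Many small, similar samples (in practice: >100 small files).
   std::string concat;
   std::vector<size_t> sizes;
   for (int i = 0; i < 200; ++i) {
      std::string s = "run=1 lumi=2 event=" + std::to_string(i) + " pt=42.0";
      concat += s;
      sizes.push_back(s.size());
   }

   // Train a dictionary; ~1/100 of the combined sample size is a
   // common rule of thumb for its capacity.
   std::vector<char> dict(4096);
   size_t dictSize = ZDICT_trainFromBuffer(dict.data(), dict.size(),
                                           concat.data(), sizes.data(),
                                           (unsigned)sizes.size());
   if (ZDICT_isError(dictSize)) { std::printf("training failed\n"); return 1; }

   // Compress one small record with the trained dictionary.
   std::string rec = "run=1 lumi=2 event=777 pt=13.6";
   std::vector<char> out(ZSTD_compressBound(rec.size()));
   ZSTD_CCtx* cctx = ZSTD_createCCtx();
   size_t csize = ZSTD_compress_usingDict(cctx, out.data(), out.size(),
                                          rec.data(), rec.size(),
                                          dict.data(), dictSize, /*level=*/3);
   std::printf("raw %zu -> compressed %zu bytes\n", rec.size(), csize);
   ZSTD_freeCCtx(cctx);
}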
For MiniAOD, the measured time spent decompressing on readback is smaller than for LZMA, while the file compressed with ZSTD is only 10% bigger.
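A simple way to reproduce this kind of readback measurement is to time a full GetEntry loop and compare the tree's uncompressed size with its on-disk size; a minimal sketch, with hypothetical file and tree names:

// readback.C -- sketch: time decompression-dominated readback and
// report the compression factor. File/tree names are illustrative.
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include <cstdio>

void readback()
{
   TFile f("nanoaod_zstd.root");
   TTree* t = f.Get<TTree>("Events");

   TStopwatch sw;
   sw.Start();
   for (Long64_t i = 0; i < t->GetEntries(); ++i)
      t->GetEntry(i); // forces basket reads and decompression
   sw.Stop();

   std::printf("real %.2fs cpu %.2fs, compression factor %.2f\n",
               sw.RealTime(), sw.CpuTime(),
               (double)t->GetTotBytes() / t->GetZipBytes());
}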
Figure 2. Comparison of compression ratio and decompression speed for all compression algorithms for a NanoAOD 2019 file.

Figure 3. Comparison of compression ratio and decompression speed for all compression algorithms for an LHCb file.
In the case of LHCb, for very simple ntuples with a simple structure, the best choice could be the LZ4 compression algorithm, which offers significantly faster read speed (all results are shown in Figure 3).

In ROOT, the serialization of variable-sized data (containing C-style arrays) produces two internal arrays: one contains the branch data for each event, while the other contains the byte offset of each event in the branch data. LZ4 achieves its performance by looking for byte-aligned patterns (as opposed to ZLIB, which works on individual bits) and lacks a Huffman encoding pass; as a result, the offset-array sequence is effectively incompressible with LZ4. ZSTD, in contrast, has no problem compressing the byte-offset data (all results are shown in Figure 4).
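For reference, the kind of branch that produces this data-plus-offsets layout is a variable-length C-style array; a minimal sketch, with illustrative names:

// varsize_branch.C -- sketch: a TTree branch with a variable-sized
// C-style array. ROOT stores the per-event values in one internal
// array and the per-event byte offsets in another; the offset array
// is the part that LZ4 struggles to compress.
#include "TFile.h"
#include "TTree.h"
#include "TRandom.h"

void varsize_branch()
{
   TFile f("varsize.root", "RECREATE");
   TTree t("Events", "variable-sized branch demo");

   int   nTracks = 0;
   float pt[100]; // maximum expected length

   t.Branch("nTracks", &nTracks, "nTracks/I");
   t.Branch("pt", pt, "pt[nTracks]/F"); // length taken from nTracks

   for (int i = 0; i < 1000; ++i) {
      nTracks = gRandom->Integer(100);
      for (int j = 0; j < nTracks; ++j)
         pt[j] = gRandom->Exp(10.);
      t.Fill();
   }
   t.Write();
   f.Close();
}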
TTree::kOnlyFlushAtCluster

TTrees can be forced to create new baskets only at event-cluster boundaries, using the TTree::kOnlyFlushAtCluster feature. It simplifies file layout and I/O at the cost of memory. For example, in the TTree::kOnlyFlushAtCluster tests shown in Figure 5, the NanoAOD 2017 file was bigger by only 3.6% in size, while the decompression speed improved up to almost 200 MB/s [10]. TTree::kOnlyFlushAtCluster is recommended for simple file formats such as ntuples, where it can show very interesting improvements, but not for more complex data types.
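Enabling the feature is a one-line change on the tree. A minimal sketch, assuming the status-bit interface available in recent ROOT 6 releases:

// onlyflush.C -- sketch: flush baskets only at cluster boundaries.
// Assumes TTree::kOnlyFlushAtCluster is available as a status bit.
#include "TFile.h"
#include "TTree.h"

void onlyflush()
{
   TFile f("ntuple.root", "RECREATE");
   TTree t("Events", "simple ntuple");
   t.SetBit(TTree::kOnlyFlushAtCluster); // trade memory for simpler layout

   float x = 0;
   t.Branch("x", &x, "x/F");
   for (int i = 0; i < 100000; ++i) { x = i; t.Fill(); }
   t.Write();
   f.Close();
}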
Figure 4. Compression ratio comparison for a custom analysis file with variable-sized data (containing C-style arrays).
Figure 5. Comparison of decompression speed for two NanoAOD 2017 file samples, with and without the TTree::kOnlyFlushAtCluster option.
Pre-conditioners

Some time ago, the BitShuffle pre-conditioner was demonstrated as a possible pre-conditioner for ROOT data with LZ4 for lossless compression. To improve the performance of LZ4 in this case, we investigated the combination of LZ4 with various "pre-conditioners". Pre-conditioners transform the sequence of input bytes according to a simple, deterministic algorithm before applying the compression algorithm. The two algorithms investigated, inspired by the Blosc library, are Shuffle and BitShuffle. Both pre-conditioners rearrange the input array's bytes by reading through the data with fixed strides. The resulting output of the pre-conditioner often contains long sequences of repeated bytes, improving the compression ratio for LZ4. One of the issues exposed is that it is currently difficult for ROOT to compress its buffers this way because of its 9-byte header [10].

The idea of using pre-conditioners could easily be extended to other algorithms, such as ZSTD. The next goal of the project will be to validate the use of pre-conditioners in the ROOT compression layer for both ROOT file formats (TTree and RNTuple) with the fastest ROOT compression algorithms, LZ4 and ZSTD. Another interesting investigation could be to extend pre-conditioners to support the new BYTE_STREAM_SPLIT encoding, which improves compression ratio and compression speed for certain types of floating-point data where the uppermost bytes of the values do not change much. The existing compressors and encodings in ROOT do not perform well for such data because of noise in the mantissa bytes. The new encoding improves results by extracting the well-compressible bytes into separate byte streams, which can afterwards be compressed by a compressor like ZSTD [12].
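To illustrate what such a transform does, the sketch below implements a simple byte shuffle over an array of floats, which is also the core idea of BYTE_STREAM_SPLIT: all first bytes of the values are grouped together, then all second bytes, and so on, so that the slowly varying sign/exponent bytes end up in long, highly compressible runs. This is a minimal illustration, not ROOT's, Blosc's, or Parquet's actual implementation.

// shuffle_demo.cc -- sketch of a Shuffle/BYTE_STREAM_SPLIT-style
// pre-conditioner for float data (illustration only).
#include <cstdint>
#include <cstring>
#include <vector>
#include <cstdio>

// Split an array of floats into sizeof(float) byte streams:
// output = [byte0 of every value][byte1 of every value]...
std::vector<uint8_t> byteShuffle(const std::vector<float>& in)
{
   const size_t n = in.size(), w = sizeof(float);
   std::vector<uint8_t> out(n * w);
   for (size_t i = 0; i < n; ++i) {
      uint8_t b[sizeof(float)];
      std::memcpy(b, &in[i], w);
      for (size_t k = 0; k < w; ++k)
         out[k * n + i] = b[k]; // stream k gets byte k of value i
   }
   return out;
}

int main()
{
   // Slowly varying floats: the exponent bytes are nearly constant,
   // so after shuffling they form long repeated runs that LZ4/ZSTD
   // compress well.
   std::vector<float> data(1024);
   for (size_t i = 0; i < data.size(); ++i)
      data[i] = 100.0f + 0.001f * i;

   std::vector<uint8_t> shuffled = byteShuffle(data);
   std::printf("shuffled %zu floats into %zu bytes\n",
               data.size(), shuffled.size());
}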
Conclusions

ZSTD has been successfully evaluated and is ready to be used for the compression of data analysis formats in the experiments for future LHC runs.

Acknowledgements

This work has been supported by U.S. National Science Foundation grant OAC-1450323.
References

[1] Elsen, Eckhard. "A Roadmap for HEP Software and Computing R&D for the 2020s." (2019): 16.
[2] XZ Utils. https://tukaani.org/xz/. Accessed 6 Mar. 2020.
[3] R. Brun and F. Rademakers. "ROOT - An Object Oriented Data Analysis Framework." Nucl. Inst. & Meth. in Phys. Res. A (Proceedings AIHENP'96 Workshop, 1997).
[4] Facebook GitHub organization. GitHub, https://github.com/facebook. Accessed 22 Feb. 2020.
[5] Facebook/Zstd. 2015. Facebook, 2020. GitHub, https://github.com/facebook/zstd.
[6] Collet, Y., and M. Kucherawy. "Zstandard Compression and the application/zstd Media Type." RFC 8478 (2018).
[7] "Zstandard: How Facebook Increased Compression Speed." Facebook Engineering, 19 Dec. 2018, https://engineering.fb.com/core-data/zstandard/.
[8] Petrucciani, Giovanni, Andrea Rizzi, and Carl Vuosalo. "Mini-AOD: A new analysis data format for CMS." Journal of Physics: Conference Series, Vol. 664, No. 7, IOP Publishing, 2015.
[9] Rizzi, Andrea, Giovanni Petrucciani, and Marco Peruzzi. "A further reduction in CMS event data for analysis: the NANOAOD format." EPJ Web of Conferences, Vol. 214, EDP Sciences, 2019.
[10] Shadura, Oksana, and Brian Paul Bockelman. "ROOT I/O compression algorithms and their performance impact within Run 3." arXiv preprint arXiv:1906.04624 (2019).
[11] Canal, Philippe, Brian Bockelman, and René Brun. "ROOT I/O: The fast and furious." Journal of Physics: Conference Series, Vol. 331, No. 4, IOP Publishing, 2011.
[12] "Apache/Parquet-Format." GitHub, https://github.com/apache/parquet-format.