ROOT I/O compression improvements for HEP analysis
Oksana Shadura, Brian Paul Bockelman, Philippe Canal, Danilo Piparo, Zhe Zhang
University of Nebraska-Lincoln, 1400 R St, Lincoln, NE 68588, United States
Morgridge Institute for Research, 330 N Orchard St, Madison, WI 53715, United States
Fermilab, Kirk Road and Pine St, Batavia, IL 60510, United States
CERN, Meyrin 1211, Geneva, Switzerland
Abstract.
We give an overview of recent changes in the ROOT I/O system that increase its performance, enhance its capabilities, and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques all have the potential to significantly improve the experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.

Introduction

During the past years the LHC experiments have commissioned, and now manage, about an exabyte of storage for analysis purposes, approximately half of which is used for archival and half for traditional disk storage. For the HL-LHC, the storage requirements per year are expected to increase by a factor of 10 [1]. Given these predictions, storage will remain one of the major cost drivers for HEP computing, and at the same time one of its bottlenecks. New storage and data management techniques, as well as new compression algorithms, are therefore likely to be required to contain storage and analysis computing costs and to handle the data rates and volumes that the experiments will need to process during the HL-LHC [1].

Zstandard (ZSTD)

Investigating innovative compression algorithms could help resolve some of these problems, for example by improving user analysis and removing the decompression-speed bottleneck while maintaining the same or better compression ratios. Zstandard [5], referred to as zstd, is a dictionary-type (LZ77) algorithm with a large search window and fast implementations of the entropy coding stage, using either fast Finite State Entropy (tANS) or Huffman coding. It is a much more modern compressor than ZLIB, which was first implemented in 1995, and it offers higher compression ratios while using less CPU than other compression algorithms. ZSTD is available as a supported compression algorithm in ROOT starting from the 6.20.00 release [3].

Three years ago, Facebook [7] open-sourced Zstandard as an innovative, high-performance data compression solution. It is widely supported by the community and continuously maintained and enhanced by its authors, who have released a variety of advanced capabilities, such as improved decompression speed and better compression ratios. The initial promise of Zstandard was that it would allow users to replace an existing data compression implementation, such as ZLIB, with one offering significant improvements in compression speed, compression ratio, and decompression speed [6]. In addition to replacing ZLIB, ZSTD has taken over many of the tasks that traditionally relied on fast compression alternatives. The fastest compression is still provided by LZ4 (at its fastest settings), but ZSTD provides a roughly twice better compression ratio. According to reports from the community, it is also slowly replacing the strong-compression scenarios previously served by XZ (LZMA) [2], with the benefit of roughly 10 times faster decompression.
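As mentioned above, ZSTD is supported in ROOT starting from release 6.20.00. A minimal sketch of selecting it when writing a file follows; the file, tree, and branch names are illustrative, and the exact spelling of the compression-setting enum can differ between ROOT versions:

// write_zstd.C -- minimal sketch: write a TTree compressed with ZSTD.
// Assumes ROOT >= 6.20 and the ROOT::kZSTD enum from Compression.h.
#include "TFile.h"
#include "TTree.h"
#include "Compression.h"

void write_zstd()
{
   // Combine algorithm and level into a single compression setting.
   int setting = ROOT::CompressionSettings(ROOT::kZSTD, /*level=*/5);

   TFile f("events_zstd.root", "RECREATE", "", setting);
   TTree t("Events", "ZSTD-compressed tree");

   float pt = 0;
   t.Branch("pt", &pt, "pt/F");
   for (int i = 0; i < 1000; ++i) {
      pt = 0.1f * i;
      t.Fill();
   }
   t.Write();
   f.Close();
}

The same setting can also be applied to an existing file via TFile::SetCompressionAlgorithm and TFile::SetCompressionLevel.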
According to reports from Facebook, with all these use cases combined, ZSTD now processes a significant amount of data every day at Facebook.

Zstandard can use a "dictionary" to compress files of an already-known type more efficiently. Here a dictionary is a file that stores the compression settings for small files. A compression dictionary is assembled from a group of typically small files that contain similar information, preferably more than 100 files. For the best efficiency, their combined size should be about one hundred times the size of the dictionary produced from them. In general, the smaller the file, the greater the improvement in compression: according to the zstd manual page, a dictionary can increase the compression of a 64 KB file by only 10 percent, compared with a 500 percent improvement for a file of less than 1 KB [6].

Compression of analysis formats

In this section we focus on evaluating the compression of the most used analysis-related formats in CMS, NanoAOD [9] and MiniAOD [8], as well as a simple analysis file used by the LHCb experiment.

MiniAOD is a high-level CMS data format introduced in 2014 to serve the needs of the mainstream physics analyses while keeping a small event size of only 30-50 KB per event. It is not readable with bare ROOT and requires a special CMSSW setup. The NanoAOD format, in contrast, is an ntuple-like format, readable with bare ROOT and containing the per-event information needed in most generic analyses; its size per event is of the order of 1 KB. NanoAODs are usually produced centrally, or even on demand with different variations of the features or columns required by different physics analysis groups. Users can also easily extend NanoAOD for their specific studies with a private production when needed.

For CMS NanoAOD files, ZSTD could be a good compromise between the size of the file on disk and the decompression speed for a faster analysis: it gives a better compression ratio and faster decompression than ZLIB, and 6x faster decompression than LZMA, while the ZSTD-compressed file is only 20% bigger (all results are shown in Figures 1 and 2).

Figure 1. Comparison of compression ratio and decompression speed for the ZLIB, LZMA and ZSTD algorithms for a NanoAOD 2019 file.
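To make the dictionary mechanism described above concrete, the sketch below trains a dictionary from a set of small, similar in-memory samples and then compresses a new record with it, using the zstd C API (zdict.h and zstd.h). The record contents and buffer sizes are purely illustrative; real training needs enough sample data, so the training call may fail on inputs this small.

// dict_demo.cc -- sketch of zstd dictionary training and use.
// Build (illustrative): g++ dict_demo.cc -lzstd
#include <zstd.h>
#include <zdict.h>
#include <vector>
#include <string>
#include <cstdio>

int main()
{
   // Many small, similar samples (in practice: >100 small files).
   std::string concat;
   std::vector<size_t> sizes;
   for (int i = 0; i < 200; ++i) {
      std::string s = "run=1 lumi=2 event=" + std::to_string(i) + " pt=42.0";
      concat += s;
      sizes.push_back(s.size());
   }

   // Train a dictionary; ~1/100 of the combined sample size is a
   // common rule of thumb for its capacity.
   std::vector<char> dict(4096);
   size_t dictSize = ZDICT_trainFromBuffer(dict.data(), dict.size(),
                                           concat.data(), sizes.data(),
                                           (unsigned)sizes.size());
   if (ZDICT_isError(dictSize)) { std::printf("training failed\n"); return 1; }

   // Compress one small record with the trained dictionary.
   std::string rec = "run=1 lumi=2 event=777 pt=13.6";
   std::vector<char> out(ZSTD_compressBound(rec.size()));
   ZSTD_CCtx* cctx = ZSTD_createCCtx();
   size_t csize = ZSTD_compress_usingDict(cctx, out.data(), out.size(),
                                          rec.data(), rec.size(),
                                          dict.data(), dictSize, /*level=*/3);
   std::printf("raw %zu -> compressed %zu bytes\n", rec.size(), csize);
   ZSTD_freeCCtx(cctx);
}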
For MiniAOD, the measured time spent decompressing on readback is smaller than for LZMA, while the file compressed with ZSTD is only 10% bigger.
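A simple way to reproduce this kind of readback measurement is to time a full GetEntry loop and compare the tree's uncompressed size with its on-disk size; a minimal sketch, with hypothetical file and tree names:

// readback.C -- sketch: time decompression-dominated readback and
// report the compression factor. File/tree names are illustrative.
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include <cstdio>

void readback()
{
   TFile f("nanoaod_zstd.root");
   TTree* t = f.Get<TTree>("Events");

   TStopwatch sw;
   sw.Start();
   for (Long64_t i = 0; i < t->GetEntries(); ++i)
      t->GetEntry(i); // forces basket reads and decompression
   sw.Stop();

   std::printf("real %.2fs cpu %.2fs, compression factor %.2f\n",
               sw.RealTime(), sw.CpuTime(),
               (double)t->GetTotBytes() / t->GetZipBytes());
}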
Figure 2. Comparison of compression ratio and decompression speed for all compression algorithms for a NanoAOD 2019 file.

Figure 3. Comparison of compression ratio and decompression speed for all compression algorithms for an LHCb file.
In the case of LHCb, for very simple ntuples with a simple structure, the best choice could be the LZ4 compression algorithm, which offers significantly faster read speed (all results are shown in Figure 3).

In ROOT, the serialization of variable-sized data (containing C-style arrays) produces two internal arrays: one contains the branch data for each event, while the other contains the byte offset of each event in the branch data. LZ4 achieves its performance by looking for byte-aligned patterns (as opposed to ZLIB, which works on individual bits) and lacks a Huffman encoding pass; as a result, the offset-array sequence is effectively incompressible with LZ4. ZSTD, in contrast, has no problem compressing the byte-offset data (all results are shown in Figure 4).
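For reference, the kind of branch that produces this data-plus-offsets layout is a variable-length C-style array; a minimal sketch, with illustrative names:

// varsize_branch.C -- sketch: a TTree branch with a variable-sized
// C-style array. ROOT stores the per-event values in one internal
// array and the per-event byte offsets in another; the offset array
// is the part that LZ4 struggles to compress.
#include "TFile.h"
#include "TTree.h"
#include "TRandom.h"

void varsize_branch()
{
   TFile f("varsize.root", "RECREATE");
   TTree t("Events", "variable-sized branch demo");

   int   nTracks = 0;
   float pt[100]; // maximum expected length

   t.Branch("nTracks", &nTracks, "nTracks/I");
   t.Branch("pt", pt, "pt[nTracks]/F"); // length taken from nTracks

   for (int i = 0; i < 1000; ++i) {
      nTracks = gRandom->Integer(100);
      for (int j = 0; j < nTracks; ++j)
         pt[j] = gRandom->Exp(10.);
      t.Fill();
   }
   t.Write();
   f.Close();
}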
TTree::kOnlyFlushAtCluster

TTrees can be forced to create new baskets only at event-cluster boundaries, using the TTree::kOnlyFlushAtCluster feature. It simplifies file layout and I/O at the cost of memory. For example, in the TTree::kOnlyFlushAtCluster tests shown in Figure 5, the NanoAOD 2017 file was bigger by only 3.6% in size, while the decompression speed improved up to almost 200 MB/s [10]. TTree::kOnlyFlushAtCluster is recommended for simple file formats such as ntuples, where it can show very interesting improvements, but not for more complex data types.
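Enabling the feature is a one-line change on the tree. A minimal sketch, assuming the status-bit interface available in recent ROOT 6 releases:

// onlyflush.C -- sketch: flush baskets only at cluster boundaries.
// Assumes TTree::kOnlyFlushAtCluster is available as a status bit.
#include "TFile.h"
#include "TTree.h"

void onlyflush()
{
   TFile f("ntuple.root", "RECREATE");
   TTree t("Events", "simple ntuple");
   t.SetBit(TTree::kOnlyFlushAtCluster); // trade memory for simpler layout

   float x = 0;
   t.Branch("x", &x, "x/F");
   for (int i = 0; i < 100000; ++i) { x = i; t.Fill(); }
   t.Write();
   f.Close();
}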
Figure 4. Compression ratio comparison for a custom analysis file with variable-sized data (containing C-style arrays).
Figure 5. Comparison of decompression speed for two NanoAOD 2017 file samples, with and without the TTree::kOnlyFlushAtCluster option.
Pre-conditioners

Some time ago, the BitShuffle pre-conditioner was demonstrated as a possible pre-conditioner for ROOT data with LZ4 for lossless compression. To improve the performance of LZ4 in this case, we investigated the combination of LZ4 with various "pre-conditioners". Pre-conditioners transform the sequence of input bytes according to a simple, deterministic algorithm before applying the compression algorithm. The two algorithms investigated, inspired by the Blosc library, are Shuffle and BitShuffle. Both pre-conditioners rearrange the input array's bytes by reading through the data with fixed strides. The resulting output of the pre-conditioner often contains long sequences of repeated bytes, improving the compression ratio for LZ4. One of the issues exposed is that it is currently difficult for ROOT to compress its buffers this way because of its 9-byte header [10].

The idea of using pre-conditioners could easily be extended to other algorithms, such as ZSTD. The next goal of the project will be to validate the use of pre-conditioners in the ROOT compression layer for both ROOT file formats (TTree and RNTuple) with the fastest ROOT compression algorithms, LZ4 and ZSTD. Another interesting investigation could be to extend pre-conditioners to support the new BYTE_STREAM_SPLIT encoding, which improves compression ratio and compression speed for certain types of floating-point data where the uppermost bytes of the values do not change much. The existing compressors and encodings in ROOT do not perform well for such data because of noise in the mantissa bytes. The new encoding improves results by extracting the well-compressible bytes into separate byte streams, which can afterwards be compressed by a compressor like ZSTD [12].
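To illustrate what such a transform does, the sketch below implements a simple byte shuffle over an array of floats, which is also the core idea of BYTE_STREAM_SPLIT: all first bytes of the values are grouped together, then all second bytes, and so on, so that the slowly varying sign/exponent bytes end up in long, highly compressible runs. This is a minimal illustration, not ROOT's, Blosc's, or Parquet's actual implementation.

// shuffle_demo.cc -- sketch of a Shuffle/BYTE_STREAM_SPLIT-style
// pre-conditioner for float data (illustration only).
#include <cstdint>
#include <cstring>
#include <vector>
#include <cstdio>

// Split an array of floats into sizeof(float) byte streams:
// output = [byte0 of every value][byte1 of every value]...
std::vector<uint8_t> byteShuffle(const std::vector<float>& in)
{
   const size_t n = in.size(), w = sizeof(float);
   std::vector<uint8_t> out(n * w);
   for (size_t i = 0; i < n; ++i) {
      uint8_t b[sizeof(float)];
      std::memcpy(b, &in[i], w);
      for (size_t k = 0; k < w; ++k)
         out[k * n + i] = b[k]; // stream k gets byte k of value i
   }
   return out;
}

int main()
{
   // Slowly varying floats: the exponent bytes are nearly constant,
   // so after shuffling they form long repeated runs that LZ4/ZSTD
   // compress well.
   std::vector<float> data(1024);
   for (size_t i = 0; i < data.size(); ++i)
      data[i] = 100.0f + 0.001f * i;

   std::vector<uint8_t> shuffled = byteShuffle(data);
   std::printf("shuffled %zu floats into %zu bytes\n",
               data.size(), shuffled.size());
}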
Conclusions

ZSTD has been successfully evaluated and is ready to be used for the compression of data analysis formats in the experiments for future LHC runs.

Acknowledgements

This work has been supported by U.S. National Science Foundation grant OAC-1450323.
References

[1] Elsen, Eckhard. "A Roadmap for HEP Software and Computing R&D for the 2020s." (2019): 16.
[2] XZ Utils. https://tukaani.org/xz/. Accessed 6 Mar. 2020.
[3] R. Brun and F. Rademakers. "ROOT - An Object Oriented Data Analysis Framework." Nucl. Inst. & Meth. in Phys. Res. A (Proceedings AIHENP'96 Workshop, 1997).
[4] Facebook GitHub organization. GitHub, https://github.com/facebook. Accessed 22 Feb. 2020.
[5] Facebook/Zstd. 2015. Facebook, 2020. GitHub, https://github.com/facebook/zstd.
[6] Collet, Y., and M. Kucherawy. "Zstandard Compression and the application/zstd Media Type." RFC 8478 (2018).
[7] "Zstandard: How Facebook Increased Compression Speed." Facebook Engineering, 19 Dec. 2018, https://engineering.fb.com/core-data/zstandard/.
[8] Petrucciani, Giovanni, Andrea Rizzi, and Carl Vuosalo. "Mini-AOD: A new analysis data format for CMS." Journal of Physics: Conference Series, Vol. 664, No. 7, IOP Publishing, 2015.
[9] Rizzi, Andrea, Giovanni Petrucciani, and Marco Peruzzi. "A further reduction in CMS event data for analysis: the NANOAOD format." EPJ Web of Conferences, Vol. 214, EDP Sciences, 2019.
[10] Shadura, Oksana, and Brian Paul Bockelman. "ROOT I/O compression algorithms and their performance impact within Run 3." arXiv preprint arXiv:1906.04624 (2019).
[11] Canal, Philippe, Brian Bockelman, and René Brun. "ROOT I/O: The fast and furious." Journal of Physics: Conference Series, Vol. 331, No. 4, IOP Publishing, 2011.
[12] "Apache/Parquet-Format." GitHub, https://github.com/apache/parquet-format.