Effect of Meltdown and Spectre Patches on the Performance of HPC Applications

Nikolay A. Simakov, Martins D. Innus, Matthew D. Jones, Joseph P. White, Steven M. Gallo, Robert L. DeLeon and Thomas R. Furlani

Center for Computational Research, State University of New York, University at Buffalo, Buffalo, NY. [email protected]
Abstract—In this work we examine how the updates addressing the Meltdown and Spectre vulnerabilities impact the performance of HPC applications. To study this we use the application kernel module of XDMoD to test the performance before and after the application of the vulnerability patches. We tested the performance difference for multiple applications and benchmarks including NWChem, NAMD, HPCC, IOR, MDTest and IMB. The results show that although some specific functions can have performance decreased by as much as 74%, the majority of individual metrics indicate little to no decrease in performance. The real-world applications show a 2-3% decrease in performance for single-node jobs and a 5-11% decrease for parallel multi-node jobs.
Index Terms—HPC, Security, Performance
I. INTRODUCTION
The recently discovered Meltdown [1] and Spectre [2] vulnerabilities allow reading of process memory by other unauthorized processes. This poses a significant security risk on multi-user platforms, including HPC resources, that can result in the compromise of proprietary or sensitive information [1, 2]. Software patches released to mitigate the security vulnerabilities have the potential to significantly impact performance. According to Red Hat [3], the Linux OS remedies can degrade overall performance by 1-20%. In order to quantify the impact, particularly on HPC applications, we performed independent tests utilizing XDMoD's application kernel capability [4].

The XD Metrics on Demand (XDMoD) tool, which is designed for the comprehensive management of HPC systems, provides users, managers, and operations staff with access to utilization data, job and system level performance data, and quality of service data for HPC resources [5]. Originally developed to provide independent audit capability for the XSEDE program, XDMoD was later open-sourced and is widely used by university, government, and industry HPC centers [6]. The application kernel performance monitoring module of XDMoD [4] allows automatic performance monitoring of HPC resources through the periodic execution of application kernels, which are based on benchmarks or real-world applications implemented with sensible input parameters (see Figure 1 for a web interface screen-shot).

Since the application kernels, which are computationally lightweight, are designed to run continuously on a given HPC system, they are ideal for detecting differences in application performance when system-wide changes (hardware or software) are made. Accordingly, XDMoD's application kernels were employed here to determine if the software patches that mitigate the Meltdown and Spectre vulnerabilities significantly impact performance.
Fig. 1. Screen-shot of the application kernel performance monitoring module of XDMoD showing the change in performance of NAMD executed on 2 nodes. The module automatically calculates control regions and performs automatic detection of performance degradation and application environment changes.
II. METHODS
A. Selected Application Kernels
The following XDMoD application kernels were chosen for this test: NAMD [7], NWChem [8], the HPC Challenge Benchmark suite (HPCC) [9] (which includes the memory bandwidth micro-benchmark STREAM [10] and the NAS Parallel Benchmarks (NPB) [11]), interconnect/MPI benchmarks (IMB) [12, 13], IOR [14] and MDTest [15]. The first two are based on widely used scientific applications and the others are based on commonly deployed benchmarks. Most of the application kernels were executed on one or two nodes (8 and 16 cores, respectively). For more details on application kernels refer to [4].

IOR and MDTest were executed on the parallel file system (GPFS) as well as the local file system. In order to differentiate between the two file systems, we use a ".local" suffix in the reported results when the local file system is used (e.g. IOR.local).

TABLE I
CHANGE IN WALLTIME UPON PATCH APPLICATION

                                                Before patch                    After patch
Application    Nodes  Difference,  Means        Mean,   Std. dev.,  Number     Mean,   Std. dev.,  Number
                      %            different?   seconds seconds     of runs    seconds seconds     of runs
NAMD             1      3.3          Y          306.6    1.44        24        316.9    3.05        56
NAMD             2      6.9          Y          175.4    2.78        22        188.1    3.49        56
NWChem           1      2.6          Y           77.8    1.91        23         79.9    1.11        59
NWChem           2     10.7          Y           58.4    1.05        21         65.0    4.16        56
HPCC             1      2.2          Y          304.1    6.39        23        310.9    4.88        56
HPCC             2      5.3          Y          345.1    5.41        22        364.0    8.44        56
IMB              2      4            Y           14.8    0.54        21         15.4    1.39        56
IOR              1      3.9          Y          188.5    9.41        21        195.9   11.69        55
IOR              2      1.5          N          371.1   12.23        22        376.7   19.50        56
IOR.local        1      2.1          N          462.8   16.37        12        472.8   19.03        56
MDTest           1     21.5          Y           30.5    3.17        21         37.8    4.10        56
MDTest           2      9.3          Y          166.7    3.60        23        182.8    5.30        55
MDTest.local     1     56.4          Y            3.8    0.62        12          6.7    2.61        56

Differences are calculated as the new mean value minus the old mean value, divided by the average of the two means. A larger difference indicates poorer performance after the patch. The Welch two-sample, two-sided t-test with α = 0.05 was used to determine if the before and after test results were drawn from distributions with statistically significantly different means.
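Concretely, the relative difference reported in Table I corresponds to the following expression (a restatement of the table footnote in equation form, using the one-node NAMD row as a worked example):

$$ D = \frac{\bar{t}_{\mathrm{after}} - \bar{t}_{\mathrm{before}}}{\left(\bar{t}_{\mathrm{after}} + \bar{t}_{\mathrm{before}}\right)/2} \times 100\% $$

For NAMD on one node, $D = (316.9 - 306.6) / \left((316.9 + 306.6)/2\right) \times 100\% \approx 3.3\%$, matching the value in the table.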
B. System

The tests were performed on a development cluster at the Center for Computational Research (CCR), SUNY University at Buffalo. The cluster consists of eight nodes, each with two Intel L5520 CPUs (8 cores) and 24 GiB of RAM, connected by QDR Mellanox InfiniBand. The nodes have access to a 3 PB IBM GPFS storage system shared with other HPC resources at CCR. The operating system is CentOS Linux release 7.4.1708.
C. Patches
To fix the Meltdown and Spectre vulnerabilities a new kernel was installed. Specifically, kernel-3.10.0-693.5.2.el7.x86_64 was updated to kernel-3.10.0-693.11.6.el7.x86_64, which fixes the CVE-2017-5753, CVE-2017-5715 and CVE-2017-5754 vulnerabilities.
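One way to confirm that an updated kernel actually enables the mitigations is to read the status files the kernel exposes. The snippet below is a minimal sketch, assuming a Red Hat style kernel of that era that provides the debugfs toggles /sys/kernel/debug/x86/{pti,ibrs,ibpb}_enabled and/or the newer /sys/devices/system/cpu/vulnerabilities/ directory; the paper does not describe how the mitigation state was verified, and the exact files available depend on the kernel build.

#!/usr/bin/env python
"""Print Meltdown/Spectre mitigation status exposed by the running kernel.

Minimal sketch: reads the Red Hat era debugfs toggles (if present; debugfs
usually requires root) and the newer sysfs "vulnerabilities" files (if
present). Files that do not exist on a given kernel are simply skipped.
"""
import glob
import os

# Red Hat 3.10.0-693.11.x era debugfs switches (1 = mitigation enabled).
DEBUGFS_FILES = [
    "/sys/kernel/debug/x86/pti_enabled",   # Meltdown (CVE-2017-5754), page table isolation
    "/sys/kernel/debug/x86/ibrs_enabled",  # Spectre v2 (CVE-2017-5715)
    "/sys/kernel/debug/x86/ibpb_enabled",  # Spectre v2 (CVE-2017-5715)
]

def read_status(path):
    """Return the stripped file contents, or None if the file is absent/unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

for path in DEBUGFS_FILES:
    status = read_status(path)
    if status is not None:
        print("{}: {}".format(path, status))

# Newer kernels describe each issue under sysfs instead.
for path in sorted(glob.glob("/sys/devices/system/cpu/vulnerabilities/*")):
    status = read_status(path)
    if status is not None:
        print("{}: {}".format(os.path.basename(path), status))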
D. Comparison of the Results
The tests were run prior to and after application of the vulnerability updates. The "before" tests include approximately 20 runs for most of the application kernels. The "after" tests include approximately 50 runs for all application kernels. The before and after distributions were compared using the Welch two-sample, two-sided t-test with the α parameter equal to 0.05. That is, we consider the means of two distributions to be different if the probability that such test results could be obtained from equal distributions is less than or equal to 0.05.
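This comparison can be reproduced with any statistics package that implements Welch's t-test. The sketch below assumes SciPy (the paper does not state which software performed the test); the sample walltimes in the usage example are hypothetical, not values from the study.

"""Welch two-sample, two-sided t-test at alpha = 0.05, as described in Sec. II-D.

Minimal sketch assuming SciPy. `before` and `after` are the per-run walltimes
(in seconds) of one application kernel, collected before and after the patches.
"""
from scipy import stats

ALPHA = 0.05

def means_differ(before, after, alpha=ALPHA):
    """Return (different, p_value); `different` is True if the means are judged different."""
    # equal_var=False selects Welch's t-test (unequal variances, unequal sample sizes);
    # the returned p-value is two-sided by default.
    t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)
    return p_value <= alpha, p_value

if __name__ == "__main__":
    # Hypothetical walltime samples, seconds (for illustration only).
    before = [306.2, 305.9, 307.1, 306.8, 306.4]
    after = [316.5, 317.2, 316.0, 318.1, 316.7]
    different, p = means_differ(before, after)
    print("means differ: {} (p = {:.3g})".format(different, p))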
III. RESULTS AND DISCUSSION

Table I and Figure 2 show the change in walltime before and after the patches for the suite of application kernels employed in this study. For the compute-intensive applications (NAMD, NWChem and HPCC), the performance degradation is around 2-3% for parallel single-node jobs. However, it increases to 5-11% for the case of two nodes.

The IOR and MDTest benchmarks measure the performance of the file system. As discussed above, we tested both the parallel and local file systems. Tables IV and V show selected results for these tests. In both cases there is a significant decrease in performance for file metadata operations (10-20%). However, the performance degradation for read and write operations is only in the range of 0-3%. Based on these findings, the performance degradation should be smaller for applications that use a small number of large files versus those that use a large number of small files. Data-processing applications may therefore be particularly sensitive to the patches employed to mitigate the vulnerabilities.
Fig. 2. Application kernel walltime comparisons (walltime in seconds) before and after the updates. A box plot diagram is used to show sample statistics: the left side of the box, the vertical line within the box and the right side of the box show the first quartile, the median and the third quartile. In addition, all measurements are plotted as round points.

The IMB test shows that most reported metrics are degraded by more than 2% (Table III).

The HPCC benchmark performs various tests from linear algebra, fast Fourier transformation (FFT) and memory manipulation. Interestingly, the simple array manipulations (the STREAM tests: array addition, copying and scaling) are actually faster in the two-node case (Table II). However, FFT, matrix multiplication and matrix transposition get slower. The surprising performance improvement in the STREAM tests might be due to other changes in the kernel. In any case, this improvement does not transfer to matrix multiplication and matrix transposition, which are 2% and 10% slower, respectively (two nodes).
IV. CONCLUSIONS AND FUTURE PLANS
Some of the individually measured simple metrics show a significant decrease in performance, notably MPI random access, memory copying and file metadata operations. Many other metrics show little to no change. Overall, the compute-intensive single-node applications have a moderate decrease in performance of around 2-3%. However, multi-node parallel jobs suffer a 5-11% decrease in performance. This can probably be addressed in the compiler and MPI libraries.

These tests were executed in a relatively isolated environment. After the updates are applied on our production system we will perform additional tests with a larger number of nodes and for more application kernels.
V. ACKNOWLEDGEMENTS
We gratefully acknowledge the support of NSF awards OCI 1025159, ACI 1445806, and OCI 1203560.
REFERENCES

1. Lipp, M., Schwarz, M., Gruss, D., Prescher, T., Haas, W., Mangard, S., Kocher, P., Genkin, D., Yarom, Y. & Hamburg, M. Meltdown. ArXiv e-prints. arXiv:1801.01207 (Jan. 2018).
2. Kocher, P., Genkin, D., Gruss, D., Haas, W., Hamburg, M., Lipp, M., Mangard, S., Prescher, T., Schwarz, M. & Yarom, Y. Spectre Attacks: Exploiting Speculative Execution. ArXiv e-prints. arXiv:1801.01203 (Jan. 2018).
3. Red Hat. Speculative Execution Exploit Performance Impacts - Describing the performance impacts to security patches for CVE-2017-5754, CVE-2017-5753 and CVE-2017-5715. https://access.redhat.com/articles/3307751 (2018; updated January 4, 2018 at 11:23 PM).
4. Simakov, N. A., White, J. P., DeLeon, R. L., Ghadersohi, A., Furlani, T. R., Jones, M. D., Gallo, S. M. & Patra, A. K. Application kernels: HPC resources performance monitoring and variance analysis. Concurrency and Computation: Practice and Experience, 5238-5260. ISSN: 1532-0634 (2015).
5. Furlani, T. R., Schneider, B. I., Jones, M. D., Towns, J., Hart, D. L., Gallo, S. M., DeLeon, R. L., Lu, C., Ghadersohi, A., Gentner, R. J., Patra, A. K., Laszewski, G., Wang, F., Palmer, J. T. & Simakov, N. Using XDMoD to facilitate XSEDE operations, planning and analysis. In Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery (XSEDE '13) (2013). doi:10.1145/2484762.2484763.
6. Palmer, J. T., Gallo, S. M., Furlani, T. R., Jones, M. D., DeLeon, R. L., White, J. P., Simakov, N., Patra, A. K., Sperhac, J. M., Yearke, T., Rathsam, R., Innus, M., Cornelius, C. D., Browne, J. C., Barth, W. L. & Evans, R. T. Open XDMoD: A tool for the comprehensive management of high-performance computing resources. Computing in Science and Engineering (2015).
7. Phillips, J. C. et al. Scalable molecular dynamics with NAMD. J. Comp. Chem. 26, 1781-1802 (2005).
8. Valiev, M. et al. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations. Comput. Phys. Commun. 181, 1477-1489 (2010).
9. Luszczek, P., Dongarra, J. et al. Introduction to the HPC Challenge Benchmark Suite. ICL Technical Report ICL-UT-05-01, University of Tennessee, Knoxville (2005).
10. McCalpin, J. D. Memory Bandwidth and Machine Balance in Current High Performance Computers. IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter (1995).
11. The NAS Parallel Benchmarks (NPB), NASA Advanced Supercomputing Division.
12. OSU Micro Benchmarks 3.3. http://mvapich.cse.ohio-state.edu/benchmarks/ (accessed September 1, 2011).
13. Intel MPI Benchmarks 3.2.2. http://software.intel.com/en-us/articles/intel-mpi-benchmarks (accessed September 1, 2011).
14. IOR Parallel I/O Benchmark. http://sourceforge.net/projects/ior-sio (accessed December 1, 2011).
15. mdtest: an MPI-coordinated metadata benchmark test. https://sourceforge.net/projects/mdtest/ (accessed June 30, 2016).
TABLE II. Changes in selected measured metrics from NAMD, NWChem and HPCC.

TABLE III. Changes in selected measured metrics from IMB.

TABLE IV. Changes in all measured metrics from IOR.

TABLE V. Changes in all measured metrics from MDTest.