Parallel versions of the symbolic manipulation system FORM
M. Tentyukov∗
Institut für Theoretische Teilchenphysik, Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany
E-mail: [email protected]
J.A.M. Vermaseren
Nikhef, Science Park 105, 1098 XG Amsterdam
E-mail: [email protected]
J. Vollinga
Nikhef, Science Park 105, 1098 XG Amsterdam
E-mail: [email protected]
The symbolic manipulation program FORM is specialized to handle very large algebraic expressions. Some specific features of its internal structure make FORM very well suited for parallelization. We now have two parallel versions of FORM: one is based on POSIX threads and is optimal for modern multicore computers, while the other uses MPI and can be used to parallelize FORM on clusters and Massive Parallel Processing systems. Most existing FORM programs will be able to take advantage of the parallel execution without the need for modifications.

∗Speaker.

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike Licence. http://pos.sissa.it/
1. Introduction
The symbolic manipulation system FORM [1], which has been available for more than 20 years, is specialized to handle very large algebraic expressions, of billions of terms, in an efficient and reliable way. It is widely used, in particular in the framework of perturbative Quantum Field Theory, where sometimes hundreds of thousands of Feynman diagrams have to be computed; most of the spectacular calculations of refs. [2, 3] would hardly have been possible with other available systems. However, the abilities of FORM are also quite useful in other fields of science where the manipulation of huge expressions is necessary.

Parallelization is one of the most efficient ways to increase performance. Some internal specifics [4] make FORM very well suited for parallelization, so the idea to parallelize FORM is quite natural.
2. General concepts and models in use
The general concept of FORM parallelization is as follows [4, 5, 6]: upon startup, the program launches a master and several workers. FORM treats each expression individually, which allows the master to split an incoming expression into independent chunks. The chunks are processed by the workers in parallel, and then the master collects the results.

At present there are two different models [5, 6]: in ParFORM [4] the master and the workers are independent processes communicating via MPI, while in TFORM [6] the master and the workers are separate threads of a single multithreaded process.

Both models require almost no special effort for parallel programming: all FORM programs may be executed in parallel without any changes. The user may give FORM some hints on how to parallelize certain things better; these hints are simply ignored by the sequential version of FORM.

Since TFORM uses a common address space, it runs only on SMP computers. On the other hand, it sometimes permits a more efficient parallelization, and it does not depend on MPI, which makes it much easier to deploy. ParFORM can be used not only on SMP computers but also on clusters and massively parallel processing (MPP) systems.
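As an illustration of the "no modifications needed" claim, consider a minimal FORM program. The program itself is ordinary sequential FORM; the command lines in the comments are a hedged sketch of typical invocations and may differ between installations:

```
* An ordinary FORM program; the very same file can be run as
*   form test.frm                     (sequential FORM)
*   tform -w4 test.frm                (TFORM with 4 workers)
*   mpirun -np 5 parform test.frm     (ParFORM; invocation may vary)
Symbols x, y;
Local F = (x + y)^4;
Id y = x + 1;
Print;
.end
```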
3. Performance
Both ParFORM and TFORM demonstrate approximately the same speedup [5, 6]. Here we discuss TFORM (which uses POSIX threads, or pthreads) running the Multiple Zeta Value program [7] on the computer “qftquad5” at DESY. The computer has 96 GB of main memory and 8 independent CPU cores; due to hyperthreading the effective number of CPU cores is 16. The results are given in Fig. 1. For reference, the run with the sequential version of FORM took 57078 sec.

We see three regions: first, the speedup is almost linear up to 8 workers; second, the speedup is also almost linear in the range of 8-16 workers but with a much smaller slope; and after 16 workers we observe a saturation. Looking at the total amount of CPU time used, Fig. 2, we see that the total CPU time is more or less constant up to 8 workers and above 16 workers, while in the range of 8-16 workers it grows as the hyperthreaded logical cores contribute less than physical ones.
Figure 1:
Running times of the Multiple Zeta Value TFORM program. The runs were for weight 23, up to depth 7.
Figure 2:
Total CPU time of the Multiple Zeta Value TFORM program.
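The qualitative shape of these curves is what one expects from Amdahl's law (this is our interpretation, not a claim made by the measurement itself): if a fraction f of the work, here essentially the master's splitting, collecting and sorting, is sequential, the speedup with N workers is bounded:

```latex
S(N) \;=\; \frac{1}{f + (1-f)/N}
\;\xrightarrow[\;N\to\infty\;]{}\; \frac{1}{f}
```

so a nearly linear region for small N must eventually saturate; the change of slope at 8 workers additionally reflects the lower efficiency of hyperthreaded cores.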
4. Recent development
Over the past years the parallel FORM versions have picked up a number of new features:

• Dollar variables. By default, both ParFORM and TFORM switch into sequential mode for each module in which a dollar variable obtains a value during execution. But there are common cases where dollar variables obtained from the terms of each chunk can be processed in parallel, e.g. in order to get a minimum, a maximum, or a sum of partial results; sometimes the value a dollar variable has at the end of the processing of a term does not matter at all. Hence new module options have been implemented that help FORM process such variables in parallel: minimum, maximum, sum and local.

• Right-hand side expressions (RHS). Using an expression on the right-hand side is not a problem for TFORM, since all threads work with the same file system, but it is a big problem for ParFORM, since the expression may be situated in a scratch file while different nodes may have independent scratch file systems. For a long time ParFORM forced the evaluation of modules with RHS expressions into sequential mode. Now ParFORM is able to process RHS expressions in a truly parallel mode.

• InParallel statement. A new statement, inparallel;, was implemented. It allows complete (short) expressions to be executed simultaneously, each inside a single worker. This is really useful when there are many short expressions, and it sometimes gives a significant increase in efficiency.

In Fig. 3 we summarize the speedup curves for TFORM running the MZV program on a computer with 8 CPU cores when the various features are switched off or on. The legend is as follows:
Figure 3:
Results of the MZV program runs with various features switched off/on. The runs were for weight 20, up to depth 8.

• All par – all of the above features are enabled;

• RHSseq – modules with RHS expressions are forced into sequential mode;

• NoDol – modules with dollar variables are forced into sequential mode;

• NoInPar – no InParallel statements;

• NoInPar,NoDol – modules with dollar variables are forced into sequential mode and no InParallel statements;

• RHSseq,NoInPar – modules with RHS expressions are forced into sequential mode and no InParallel statements.
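As a hedged sketch (the statement names follow the FORM reference manual; the example program itself is ours), the new module options and the InParallel statement can be used like this:

```
* Find the maximal power of x over all terms of F. Under TFORM or
* ParFORM each worker keeps a private $max for its chunk; the
* "maximum" module option tells the master how to combine the
* partial results instead of falling back to sequential mode.
Symbol x;
Local F = 1 + x + 5*x^7 + x^3;
#$max = 0;
if ( count(x,1) > $max ) $max = count_(x,1);
ModuleOption,maximum,$max;
.sort

* When there are many short expressions, InParallel hands each
* complete expression to a single worker:
InParallel;
.end
```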
As one can see, all these new features are important in practice.

If FORM programs have to run for a long time, the reliability of the hardware and of the software infrastructure becomes a critical issue: a program termination due to an unforeseen failure may waste days or weeks of invested execution time. The checkpoint mechanism was introduced to protect long-running FORM programs as well as possible against such accidental interruptions. With activated checkpoints, FORM saves its internal state and data to the hard disk from time to time; these data then allow a recovery after a crash. The parallel FORM versions support this mechanism as well.

By default, the data are saved at the end of each module. Usually this is too expensive, so optionally the data may be saved only after a given time interval has passed. The scalability of ParFORM running the test program BAICER (N = 16) for different intervals between checkpoints is depicted in Fig. 4. As one can see, even very frequent checkpoints do not affect performance much.

Figure 4: Absolute time and speedup curves for the test program BAICER without the checkpoint mechanism (“No chck”), with checkpoints every 30 minutes (“30 min”) and every 10 minutes (“10 min”).
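In a FORM program the mechanism is activated with the On checkpoint statement; the interval argument below follows our reading of the FORM manual and should be checked against the installed version:

```
* Save the internal state at most every 30 minutes instead of at
* the end of every module (interval syntax per the FORM manual).
On checkpoint, 30m;
Symbol x;
Local F = (1 + x)^100;
* ... long-running modules ...
.end
```

After a crash the run can then typically be resumed by starting FORM (or TFORM/ParFORM) on the same program with the -R command-line option.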
Acknowledgments.
This work was supported in part by the DFG through SFB/TR 9 and by the FOM foundation.
References

[1] J.A.M. Vermaseren, [arXiv:math-ph/0010025];
J.A.M. Vermaseren, Nucl. Phys. Proc. Suppl. (2008) 19, [arXiv:0806.4080 [hep-ph]].

[2] S. Moch, J.A.M. Vermaseren and A. Vogt, Nucl. Phys. B (2004) 101;
A. Vogt, S. Moch and J.A.M. Vermaseren, Nucl. Phys. B (2004) 129;
J. Blumlein and J.A.M. Vermaseren, Phys. Lett. B (2005) 130;
Y. Schröder and A. Vuorinen, JHEP (2005) 051;
J.A.M. Vermaseren, A. Vogt and S. Moch, Nucl. Phys. B (2005) 3;
R. Bonciani and A. Ferroglia, Phys. Rev. D (2005) 056004;
Y. Schröder and M. Steinhauser, JHEP (2006) 051;
K.G. Chetyrkin, J.H. Kuhn and C. Sturm, Nucl. Phys. B (2006) 121;
T. Aoyama, M. Hayakawa, T. Kinoshita and M. Nio, Nucl. Phys. B (2006) 138.

[3] A. Retey and J.A.M. Vermaseren, Nucl. Phys. B604 (2001) 281;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. Lett. (2002) 012001;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Lett. B559 (2003) 245;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. D67 (2003) 074026;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Eur. Phys. J. C33 (2004) 650;
S. Bekavac, [arXiv:hep-ph/0505174];
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. Lett. (2005) 012003;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. Lett. (2006) 012003;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. Lett. (2008) 012002;
A. Kotikov, J.H. Kuhn and O. Veretin, Nucl. Phys. B788 (2008) 47;
P.A. Baikov, K.G. Chetyrkin and J.H. Kuhn, Phys. Rev. Lett. (2010) 132004.

[4] D. Fliegner et al., [arXiv:hep-ph/9906426];
D. Fliegner et al., [arXiv:hep-ph/0007221];
M. Tentyukov et al., “Parallel Version of the Symbolic Manipulation Program FORM”, in: V.G. Ganzha et al. (Eds.), Proceedings of CASC 2004, Technische Universität München, Garching, Germany; [arXiv:cs.SC/0407066];
M. Tentyukov et al., Nucl. Instrum. Meth. A (2006) 2248.

[5] M. Tentyukov and J.A.M. Vermaseren, PoS (ACAT08) 119.

[6] M. Tentyukov and J.A.M. Vermaseren, doi:10.1016/j.cpc.2010.04.009, [arXiv:hep-ph/0702279].

[7] J. Blumlein, D.J. Broadhurst and J.A.M. Vermaseren, [arXiv:0907.2557].