lrsarith: a small fixed/hybrid arithmetic C library
David Avis
School of Informatics, Kyoto University, Kyoto, Japan and School of Computer Science, McGill University, Montréal, Québec, Canada

Charles Jordan
Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan

February 1, 2021
Abstract
We describe lrsarith, a small fixed precision and hybrid arithmetic C library for integers and rationals that we developed for use in the lrslib library for polyhedral computation. Using a generic set of operations, a program can be compiled with either 64-bit or 128-bit (if available) fixed precision, with an extended precision library such as GMP, or with the built-in MP routines. A simple scheme checks for overflow and either terminates the program or, in hybrid mode, changes to a higher precision arithmetic. Implementing these arithmetics in lrslib resulted in only minimal changes to the original code. We give computational results using lrs and mplrs, vertex/facet enumeration codes in lrslib, using 64 and 128 bit fixed integer arithmetic with and without overflow checking, GMP arithmetic, lrsarith hybrid arithmetic with both GMP and MP, and FLINT hybrid arithmetic. We give a small self-contained example C program using the lrsarith package in both fixed precision and hybrid mode.
When writing mathematical software, a fundamental choice is the arithmetic to use. It is easy to be seduced by the performance of native integers, but the possibility of incorrect output caused by overflow can be a major concern. To guarantee correctness, in many cases one must rely on multiprecision arithmetic such as that provided by the GNU Multiple Precision (GMP) Arithmetic Library. This comes with a large performance penalty compared to native integers on inputs where they suffice, and it is tempting to maintain multiple versions with different arithmetic.

Until version 7, this was the situation with lrslib, a library of programs for polyhedral computation including e.g. lrs for vertex/facet enumeration and redund for redundancy removal. From the outset lrslib used exact arithmetic and could be compiled with either fixed precision C integer arithmetic (without overflow checking) or with an internal extended precision arithmetic library. For safety, the default was to use extended precision. Later versions could also be compiled with the GMP package and, more recently, with the hybrid arithmetic package available in FLINT. The fixed precision versions perform several times faster than the extended precision versions but can produce incorrect results if overflow occurs.

For this reason we developed a simple scheme to detect the possibility of overflow occurring while using fixed precision arithmetic. The code could then halt and allow the user to restart using higher or extended precision arithmetic. This led to the development of a hybrid scheme in lrslib to allow automatic restart with the next higher precision arithmetic. This gives users the performance of native arithmetic in most cases where it is safe, while keeping the correctness provided by the extended precision versions.
The restart is from (very near) the point where a possible overflow was detected. Implementing this in lrslib resulted in only minimal changes to the original code and gives a significant performance improvement.

The main purpose of this paper is to explain how overflow checking and hybrid arithmetic were implemented and to give numerical results for a case study involving vertex/facet enumeration. This explains the improved performance of recent versions of lrslib. The arithmetic code may be useful outside the lrslib library, so we created the much smaller, independent lrsarith package which contains only the arithmetic code. It can be downloaded from [1].

Implementing arithmetic in lrsarith allows developers to postpone the choice of arithmetic until compile time. With some additional effort to handle overflows and restarting from intermediate points of the computation, it also allows the developer to obtain the performance of native integers when they suffice and the correctness of an extended precision library when required.
When comparing various codes for vertex/facet enumeration [2], we noted that the hybrid arithmetic available in normaliz could give a large speedup for certain inputs. The default in normaliz is to begin the computation in 64-bit integers and to restart using GMP extended precision if overflow occurs. However, the restart is from the beginning of the computation, which can hurt performance on certain inputs where the repeated work is significant (see the comparison in [2]).

The Fast Library for Number Theory (FLINT) [5] is a collection of modules that support various arithmetic operations. In particular, we use the fmpz module for extended precision arithmetic that balances "very efficient" performance on small integers with performance similar to GMP for large integers. We will compare lrsarith performance with FLINT in Section 4.4. Although FLINT is easier to develop with, since switching arithmetic is transparent to the user, lrsarith can give higher performance on small integers. FLINT contains much additional functionality beyond extended precision arithmetic, but usually requires the user to install it.

Porta [4] is a collection of routines related to polytopes, including vertex/facet enumeration. It supports both fixed precision and its own multiprecision arithmetic and was included in the comparison [2] mentioned earlier.

The standard general purpose library for extended precision arithmetic with large integers is GMP. It is highly optimized for many platforms and was the default in lrslib for some time. Historically, lrslib used its own multiprecision arithmetic (MP) and this is still supported. Comparisons are given in Section 4.4.

The paper contains two parts that can essentially be read independently. The first part gives a detailed explanation of how we implement the various arithmetics, with a simple but complete example program that can be used as a template for developing with lrsarith.
In Section 2 we show how overflow checking is implemented in our fixed precision lrslong library. We give a simple example to show how a single C code can be compiled with 64-bit, 128-bit, MP or GMP arithmetic. In the first three cases the program terminates when the possibility of overflow is detected. In Section 3 we continue the example to show how a program can restart after overflow has been detected, using the next level of arithmetic. This form of hybrid arithmetic is used in lrslib. Computational results are presented in both sections for a larger example involving Collatz sequences.

The second part of the paper begins in Section 4, where we present our case study and give computational results comparing various arithmetic packages that can be used in lrslib. These include fixed integer arithmetic with and without overflow checking, GMP arithmetic, two versions of lrsarith hybrid arithmetic and FLINT hybrid arithmetic.
Arithmetic is handled in lrsarith by defining a generic integer type and a set of generic operations. A generic integer a, integer vector v and integer matrix A are defined

lrs_mp a;  lrs_mp_vector v;  lrs_mp_matrix A;

allocated

lrs_alloc_mp(a);  v=lrs_alloc_mp_vector(n);  A=lrs_alloc_mp_matrix(m,n);

and freed

lrs_clear_mp(a);  lrs_clear_mp_vector(v,n);  lrs_clear_mp_matrix(A,m,n);

where m and n are the dimensions. The types are assigned at compile time depending on the arithmetic used. For 64-bit and 128-bit integers they are assigned to fixed length integers, for MP to a fixed length array, and for GMP (https://gmplib.org/) to the GMP integer type:

typedef long long lrs_mp[1];               /* one long integer   */
typedef __int128 lrs_mp[1];                /* one 128-bit integer */
typedef long long lrs_mp[MAX_DIGITS + 1];  /* one MP integer     */
typedef mpz_t lrs_mp;                      /* one GMP integer    */

Operations using lrs_mp integers are written generically. Typical examples are as follows, where a, b, c, d, e are lrs_mp integers, i is a long long, and the equivalent C code is given in parentheses:

itomp(i,a);           (a=i)
copy(b,a);            (b=a)
addint(a,b,c);        (c=a+b)
mulint(a,b,c);        (c=a*b)
divint(a,b,c);        (c=a/b, a=a%b)
linint(a,ka,b,kb);    (a=a*ka+b*kb for ka, kb long long integers)
qpiv(a,b,c,d,e);      (a=(a*b-c*d)/e where the division is exact)

A small C program, fixed.c, using some of these operations is given in Appendix A. It reads in an integer k and attempts to repeatedly square it six times. We will discuss the program later in this section. Generic operations are either assigned at compile time by macros, or by C code in the case of MP. In this way arithmetic operations are essentially the same as using the underlying arithmetic package.

The problem is that overflow is not detected in 64-bit and 128-bit arithmetic. We solve this problem by a technique we call lazy overflow handling. Using very light extra computation we detect when it is possible that an overflow may occur.
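Before turning to the overflow tests, the macro binding mentioned above can be illustrated with a small sketch. The macro bodies and the demo() helper here are ours, not the actual lrslong.h definitions, which differ in detail:

```c
#include <stdio.h>

/* Illustrative sketch only: binding the generic operations to native
   64-bit arithmetic with macros, as the unchecked -DLRSLONG build does. */
typedef long long lrs_mp[1];

#define itomp(i, a)     (*(a) = (i))          /* a = i     */
#define copy(b, a)      (*(b) = *(a))         /* b = a     */
#define addint(a, b, c) (*(c) = *(a) + *(b))  /* c = a + b */
#define mulint(a, b, c) (*(c) = *(a) * *(b))  /* c = a * b */

/* demo: compute 6 * 7 + 1 through the generic interface */
long long demo(void)
{
  lrs_mp a, b, c, one;
  itomp(6, a);
  itomp(7, b);
  itomp(1, one);
  mulint(a, b, c);    /* c = 42 */
  addint(c, one, c);  /* c = 43 */
  return c[0];
}
```

With GMP the same generic names would instead expand to calls such as mpz_mul, which is what allows a single source file to be compiled against each package.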
Control is then given to an overflow handler that either halts computation or restarts the program using arithmetic with higher precision. While restarting from the beginning of execution is simplest, later we will see options for restarting from intermediate checkpoints, allowing programs to use faster arithmetic for initial portions of computations in some cases.

To incorporate lazy overflow handling we define some constants that depend on the word size W, which is either 64 or 128 in lrsarith:

MAXDm = floor( sqrt( 2^(W-1) - 1 ) )
MAXDl = 2^(W/2-1) - 1
MAXDa = 2^(W-2) - 1

MAXDa is the constant used in testing for overflow when using addition or similar operations, MAXDl is used for linint, and MAXDm is used for multiplying. Three macros test whether the operands a and b are out of bounds. We claim that if the integer arithmetic would overflow then lrs_overflow is called. Since we are using signed integer arithmetic, this is equivalent to proving that the result (and intermediate values) in each case is at most 2^(W-1) - 1. Proceeding case by case:

• addint(a,b,c): |a|, |b| <= MAXDa and so |a + b| <= 2 * (2^(W-2) - 1) = 2^(W-1) - 2.

• mulint(a,b,c): |a|, |b| <= MAXDm and so |a * b| <= ( sqrt(2^(W-1) - 1) )^2 = 2^(W-1) - 1.

• linint(a,ka,b,kb): |a|, |b|, |ka|, |kb| <= MAXDl and so |a*ka + b*kb| <= 2 * (2^(W/2-1) - 1)^2 = 2^(W-1) - 2^(W/2+1) + 2.

Note that divint cannot overflow and that the analysis for qpiv is essentially the same as for linint, proving the claim.

Rational arithmetic is handled in lrsarith in a very simple way by representing the rational number a/b by two lrs_mp integers a and b. The operation reduce(a,b) divides a and b by their greatest common divisor. Arithmetic operations for rationals are based on the integer operations above. For example

mulrat(a,b,c,d,e,f);   (e/f = a/b * c/d, with e/f reduced)

is implemented by

mulint(a,c,e);  mulint(b,d,f);  reduce(e,f);

Overflow checking for rational arithmetic is inherited from the corresponding integer arithmetic.

We now look more closely at the code fixed.c for iterated squaring given in Listing 1 of Appendix A. Among the includes on lines 1–3 we have the lrsarith header file. On lines 7–9 is the function lrs_overflow that handles overflow processing. In this case it simply prints a message and halts the program. In the main routine we see that two lrs_mp variables are declared. These are allocated and cleared in lines 16 and 26. These are null operations for 64 or 128-bit arithmetic and MP but are needed in GMP. On line 15 we initialize the arithmetic package. The next lines read an integer k and attempt to iteratively square it six times.

Using the makefile in Listing 2 we can compile fixed.c with each of the arithmetic packages, depending on the compiler switches used. For 64 or 128-bit arithmetic the -DLRSLONG switch is set, with -DB128 also set for 128-bit arithmetic. The -DSAFE switch enables overflow checking and handling as described in this section. The files lrsarith.h and lrsarith.c use these switches to ensure that the correct arithmetic files are included. Lines 2 and 3 produce the binaries fixed1 and fixed2.
For illustrative purposes we also compile without the -DSAFE switch, obtaining fixed1n and fixed2n in lines 4 and 5. In this case no overflow checking is performed. With the -DMP switch set the MP version is produced. Finally, by setting the -DGMP switch we compile with the external GMP library, which must be preinstalled. For simplicity we assume that the necessary files are in standard locations; otherwise the locations need to be specified. We ran fixed1, fixed2, fixedmp and fixedgmp with the input k = 5, getting the following output, respectively:

5 25 625 390625 152587890625 overflow detected: halting
5 25 625 390625 152587890625 23283064365386962890625 overflow detected: halting
5 25 625 390625 152587890625 23283064365386962890625 542101086242752217003726400434970855712890625
5 25 625 390625 152587890625 23283064365386962890625 542101086242752217003726400434970855712890625

fixed1 can compute 5^16 before overflowing, and fixed2 can also compute 5^32. Both fixedmp and fixedgmp can compute 5^64. Running fixed1n and fixed2n with no overflow protection produces incorrect output. This can be observed by noting the three cases where the last digit is not 5.

We briefly compare performance of the various fixed arithmetic packages in order to evaluate the overhead used by overflow checking. While this can also be seen in the more extended comparison done in Section 4, here we use a simple code to further demonstrate the use of lrsarith.

We consider a problem that uses a great deal of relatively simple computations and little output. The Collatz conjecture [6, 7] is a famous open problem in number theory: given a number k, we replace k by 3k + 1 if k is odd and by k/2 if k is even. The conjecture states that repeating this process always reaches 1, and it has been verified computationally for a very large range of k [3]. Our goal is to compare arithmetic packages and not to computationally verify larger values, so we avoid techniques such as huge lookup tables for simplicity.

We consider the Collatz sequence from k as a path from the integer k to the value one. The set of such paths defines an infinite tree with root k = 1, and the conjecture holds if all integers appear in the tree. One way to make the tree finite is to give a parameter max and consider the tree of all paths to the root that do not contain an integer greater than max. This finite tree can be generated from the root, without using memory to store its nodes, using the reverse search procedure. We wrote a code coll to generate the tree in this way. As long as max is small enough, no overflows occur when the various arithmetic packages are compiled with coll.
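The tree generation just described can be sketched in plain C. This is a simplified stand-in using native unsigned integers and our own function name coll_nodes, not the actual coll code, and it assumes the values stay well below the word size, which is exactly the limitation that lrsarith removes:

```c
/* Count the nodes of the finite reverse Collatz tree rooted at 1 whose
   values do not exceed max.  The predecessors of k under the Collatz map
   are 2k (always) and (k-1)/3 (when that is an odd integer, skipping the
   trivial 4 -> 1 back edge).  Depth-first search needs no stored nodes:
   the tree structure itself drives the search, as in reverse search. */
static unsigned long long coll_nodes(unsigned long long k,
                                     unsigned long long max)
{
  unsigned long long count = 1;          /* count k itself */
  if (2 * k <= max)
    count += coll_nodes(2 * k, max);     /* even predecessor */
  if (k % 3 == 1 && k > 4) {             /* k = 4 leads back to the root */
    unsigned long long j = (k - 1) / 3;
    if (j % 2 == 1)
      count += coll_nodes(j, max);       /* odd predecessor */
  }
  return count;
}
```

For example, coll_nodes(1, 20) counts the eleven integers whose paths to 1 avoid values above 20. The recursion depth is bounded by the tree height, which is fine for small max; for large trees a stack-free traversal, as in coll, is preferable.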
The binary collflint uses FLINT arithmetic (version 2.6.3) and the other binaries are labelled in the same way as for fixed.c above.

Table 1: Collatz tree generation, times in seconds (mai48). Columns: max, nodes, coll1, coll1n, coll2, coll2n, collgmp, collflint, collmp.

The values in Table 1 are small enough to avoid overflows, but one can see that overflow checking in lrslong results in only minor overhead. The relative performance of the packages will vary between machines, mix of arithmetic operations, compilers and versions of extended precision libraries; however, these results give a sample of what can be obtained. It is also worth noting that the multiprecision arithmetics are generally focused more on larger operands. In any case, for this computation on this machine, 128-bit arithmetic is roughly eight times slower than 64-bit arithmetic. FLINT arithmetic is somewhat faster than GMP (version 6.0) but slower than the dedicated fixed precision arithmetics. This is generally reasonable: FLINT makes more effort to obtain good performance on small operands, but lrsarith fixed arithmetic is essentially the same as using native arithmetic.

(See also David Barina's page on the current status: https://pcbarina.fit.vutbr.cz/. The code coll is contained in lrsarith-010. mai48: 2x AMD EPYC7552 2.3GHz, 48 cores, 512 GB memory, CentOS 7.6, gcc 4.8.5.)

Hybrid arithmetic
The fixed arithmetic versions described in the previous section are very simple to implement and easy to use, giving good performance for inputs that do not require very long integers. If overflow occurs, the user can manually rerun the job with a higher precision arithmetic package. However, it is certainly more convenient to switch automatically from one arithmetic to another with higher precision when overflow could occur. In this section we describe how this can be done using lrsarith, while making only minor modifications to the original user code.
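The overall control flow can be sketched as a standalone toy. Everything here, including DEFINE_RUN, run64, run128 and the limits, is illustrative and ours; the real code in Appendix B compiles a single run() once per lrsarith arithmetic and distinguishes the copies by name mangling rather than a macro-generated body. The sketch uses the gcc/clang __int128 extension, as lrsarith does for 128-bit mode:

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf buf1;   /* restart point for overflow recovery */

/* stand-in for the lrsarith overflow handler: unwind to setjmp */
void lrs_overflow(int parm)
{
  longjmp(buf1, 1);
}

/* Generate one squaring routine per "arithmetic": square k six times,
   lazily checking the operand before each multiplication. */
#define DEFINE_RUN(name, type, limit)                        \
  int name(long k)                                           \
  {                                                          \
    type b = (type)k;                                        \
    int i;                                                   \
    if (setjmp(buf1) == 0) {                                 \
      for (i = 1; i <= 6; i++) {                             \
        if (b > (limit))        /* lazy overflow test */     \
          lrs_overflow(1);                                   \
        b = b * b;                                           \
      }                                                      \
      return 0;                 /* all six squarings done */ \
    }                                                        \
    return 1;                   /* possible overflow */      \
  }

DEFINE_RUN(run64, long long, 3037000499LL)       /* ~sqrt(2^63)  */
DEFINE_RUN(run128, __int128, (__int128)1 << 63)  /* ~sqrt(2^127) */

/* driver: escalate precision on demand, as the hybrid main does */
int hybrid(long k)
{
  if (run64(k) == 0)  return 64;
  if (run128(k) == 0) return 128;
  return 0;   /* would fall back to GMP or MP here */
}
```

With k = 5 this escalates exactly as in Section 2: run64 gives up after 5^16, run128 after 5^32, and only extended precision can finish.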
We illustrate our approach using the example program of the previous section that reads an integer k and successively squares it. The hybrid code is shown in Appendix B. Lines 9–29 contain the function run, which is essentially the function main from Appendix A. Instead of reading the integer k, it is passed to run as a parameter.

The major change is the use of setjmp to allow lrs_overflow to return control to the program that calls run. This is achieved via the variable buf1 defined on line 7. The main loop (lines 18–25) of run is enclosed by a test on line 17. If lrs_overflow (lines 31–33) is called by the arithmetic package, a long jump is performed to the statement immediately after the end of the enclosed loop, i.e. line 26. This prints a warning and returns to the routine that called run.

In fact there are three routines that call run, namely run1, run2, and runmp (GMP or MP), but only the first two can trigger a call to lrs_overflow. The function run itself will need to be compiled three times, once for each package. The arithmetic package used and the routine to call run are selected at compile time by compiler directives in the makefile and on lines 35–43, respectively. The problem now arises that since run will be compiled three times, the three versions will need different names, a process known as name mangling. In fact all of the routines that use lrs_mp will also need name mangling. The particular scheme used in lrsarith was suggested by David Bremner and is illustrated in line 5. A unique suffix suf, defined in the arithmetic header file used, is added to the function run. These define lines will be needed for each user supplied routine. The define for lrs_overflow is contained in lrslong.h.

Listing 5 contains the main program, with its header file given in Listing 4. It begins by reading the parameter k and by calling run1, which uses 64-bit arithmetic.
If overflow occurs, a return code of 1 is triggered by lrs_overflow, causing main to call run2 using 128-bit arithmetic. If this in turn overflows, a final call is made to runmp, which uses GMP. In the makefile given in Listing 6, lines 4–6 compile the arithmetic libraries. Lines 7–9 compile hybridlib.c three times, once with each arithmetic library. Finally, line 11 combines everything into a single executable hybrid. (For simplicity we have omitted the lines which produce a hybrid executable with MP, but they are in the lrsarith makefile.) Running hybrid with k = 5 produces the squarings shown in Section 2, restarting with the next arithmetic each time overflow is detected.

We continue the example of generating Collatz trees introduced in Section 2.2. This time we use much larger values of max in order to create overflow conditions in both 64-bit and 128-bit arithmetic. As the trees now become extremely large, we terminate the programs after a fixed number of nodes have been generated. The program is deterministic, so in each case the same nodes are traversed. The program coll is the hybrid code combining coll1, coll2 and collgmp, created in the same way as hybrid described above. The results are shown in Table 2. On each line the final arithmetic used in coll is shown in blue. It is interesting to observe that for each of these three runs collgmp runs in roughly the same time and that, for the largest value of max, this is faster than the 128-bit arithmetic coll2 (and hence coll).

Table 2: Collatz tree generation, times in seconds (mai48). Columns: max, coll (hybrid), coll1, coll2, collgmp, collmp, collflint (hybrid). (o) = possible overflow detected.

The simple scheme described in this section is adequate in many applications but has a number of disadvantages. First, memory allocated during the run for a given arithmetic may not be freed after an overflow occurs, causing a memory leak. More importantly, the simple scheme here restarts from the start of program execution. Forgetting the past can be helpful, but this can hurt performance if overflow occurs near the end of program execution.
It would be much better not to redo work that has already been done, especially with slower arithmetic.

In applications such as lrslib, these are serious issues. The definition of lrs_mp depends on the arithmetic, so offsets into structures containing lrs_mp can change after switching arithmetic. Handling overflows appropriately is therefore more complicated than simply forgetting the past and restarting, as was described here.

The solution used in lrslib was to use a global variable to point to the allocated data structures. This allows lrs_overflow to free the data after overflow is detected, before switching to the next arithmetic. A further improvement in lrslib was to allow the program to restart at the point in the calculation where the overflow was detected. Again this was done using a global variable, and also the ability of the original code to restart; fortunately this was already available. In other programs it may be necessary to add periodic checkpoints, and it can be helpful to separate structures that contain lrs_mp from those that do not.

When comparing various codes for vertex/facet enumeration [2], we noted that hybrid arithmetic could give a large speedup for certain inputs, but that this was not available in lrslib at the time. We now turn our attention to the original motivation for developing the hybrid version of lrsarith: obtaining these speedups in lrslib v. 7. We begin by introducing the basics of vertex and facet enumeration. Then we explain the parallel hybrid version mplrs, since this handles overflow somewhat differently than hybrid lrs. In Section 4.3 we describe the experimental setup and in Section 4.4 we present a comparison between the different arithmetic packages as used in lrs and mplrs.

A convex polyhedron P can be represented by either a list of vertices and extreme rays, called a V-representation, or a list of its facet defining inequalities, called an H-representation.
The vertex enumeration problem is to convert an H-representation to a V-representation. The computationally equivalent facet enumeration problem performs the reverse transformation. For further background see Ziegler [8]. We will use the lrs program in lrslib to experiment with various instances of these problems. A comparison with other codes, including mplrs, a parallel version of lrs, to solve these problems can be found in [2].

The input is represented by an m by n matrix. For a vertex enumeration problem this is a list of m inequalities in n - 1 variables defining P. A vertex (resp. extreme ray) is the intersection of a set of n - 1 (resp. n - 2) input inequalities, taken as equations, that satisfies the remaining inequalities. A major difficulty is caused by degeneracy, which occurs when more than n - 1 inequalities meet at a vertex; in this case lrs generates multiple representations of the vertex, known as bases. For a facet enumeration problem the input is a list of the vertices of P, each beginning with a 1 in column one (extreme rays would be indicated by a zero in column one). A facet is defined by a set of n - 1 affinely independent vertices. When degeneracy occurs, a facet may contain more than n - 1 vertices and lrs will generate the facet multiple times, each known as a basis. The vertex or facet is output when the index-wise lexicographically minimum basis is found.

mplrs

On detecting a possible overflow, the hybrid version of lrs switches to the next highest precision arithmetic package, which it uses from that point onwards. The sequential implementation was parallelized as mplrs [2], which dynamically partitions the work and provides a series of subproblems to multiple lrs workers. If each of these workers was a hybrid lrs process, it would start each problem with the fastest arithmetic and switch to higher precision when detecting a possible overflow. This has a few drawbacks:

1. Duplicate output: it is possible that when restarting after an overflow, we reprint the most recent line of output.

2. Performance: if overflow occurs, it usually occurs often in the run. There is overhead in restarting and switching arithmetic, so we would prefer to avoid frequent switches.

In the hybrid version of mplrs, each of the workers is a fixed arithmetic process that returns to mplrs when possible overflow is detected. At that point, mplrs re-initializes the worker in question using the next available arithmetic. All output produced by the subproblem is discarded and the worker restarts from the beginning (of the subproblem).
The worker uses the new arithmetic from that point onwards. This approach avoids duplicate output: by default, hybrid mplrs holds output produced by a worker if its arithmetic could overflow, flushing it only when the job ends. It also resolves the second problem: each worker can only overflow and switch arithmetic twice in the overall run, the same as hybrid lrs. In addition, if overflow is a rare event then it is possible for only some workers to overflow. This makes larger speedups possible compared to hybrid lrs.

There is overhead in two areas. First, when overflow is detected the worker restarts its job. This overhead is limited because jobs are usually very small and workers can only overflow at most twice during the run. Second, since mplrs traverses different parts of the reverse search tree in a different order compared to lrs, it is possible for overflow to occur earlier than it would in lrs. This could hurt performance if lrs overflows only at the end, but could also help performance by avoiding early lrs overflows.

One complication is that for volume computation, mplrs internally uses the maximum precision arithmetic available. This means that mplrs may not agree with the worker process on arithmetic. Communication between the worker process and mplrs therefore avoids using lrs_mp integers. After checkpointing and restarting, hybrid mplrs begins again with the fastest arithmetic available. Checkpoint files are compatible between the various arithmetic packages.

The polytopes we tested are described in Table 3 and, except for two new problems, were previously described and used in [2]. The problems range from non-degenerate to highly degenerate polyhedra. This table includes the results of an lrs run on each polytope, as lrs gives the number of bases in a symbolic perturbation of the polytope. The new problems are:

• cp7 is the cut polytope for 7 points, which has input coefficients 0 or 1 and is highly degenerate.
• p8-6 is related to the holographic cone studied in physics and is an extension of the eight pointcut polytope. Input coordinates are 0, 1 or -1 and it is the most degenerate problem studied. See e.g., Figure 3(c,d) in [2].
We include a column labelled degeneracy, which is the number of bases divided by the number of vertices (or facets) output, rounded to the nearest integer. We have sorted the table in order of increasing degeneracy. The horizontal line separates the non-degenerate from the degenerate problems. The corresponding input files are available by following the download link at [1]. Note that the input sizes are small, roughly comparable, and much smaller than the output sizes.
               Input                     Output                  lrs
Name    H/V     m    n  size        V/H   size       bases   secs  depth  degeneracy
c30      V     30   16  4.7K     341088  73.8M      319770     36     14       1
c40      V     40   21   12K   40060020  15.6G    20030010   7531     19       1
km22     H     44   23  4.8K    4194304   1.2G     4194304    234     22       1
perm10   H   1023   11   29K    3628800   127M     3628800    283     45       1
vf500    V    500    7   98K      56669    38M      202985    137     41       4
vf900    V    900    7   20K      55903   3.9M      264385     23     45       5
mit71    H     71   61  9.5K    3149579   1.1G    57613364  15474     20      18
fq48     H     48   19  2.1K     119184   8.7M     7843390     44     24      66
mit      H    729    9   21K       4862   196K     1375608    132    101     283
cp7      V     64   22    3K     116764   7.4M   308644212   4071     41    2643
bv7      H     69   57  8.1K       5040   867K    84707280   1256     17   16807
p8-6     H    154   92   36K       4452   1.2M   110640628  16152     63   24852
Table 3: Polytopes tested and lrs v. 7.1 times (mai48)

In this section we give numerical results on using lrslib to solve the vertex enumeration problems described in the previous section with the various arithmetic packages in lrsarith. We test the following suite of codes which, apart from the exception noted below, are available in lrslib v. 7.1:

• lrs1: 64-bit fixed arithmetic with overflow checking.
• lrs2: 128-bit fixed arithmetic with overflow checking.
• lrsgmp: GMP arithmetic (version 6.0). Comparable to default lrs v. 6.2 and earlier.
• lrs: lrsarith hybrid arithmetic. Starts with lrs1 and switches to lrs2 (if available) and finally lrsgmp. Default version of lrs.
• lrsMP: As lrs except in the final step internal MP arithmetic is used. Available in v. 7.2 or on request.
• lrsflint: FLINT hybrid arithmetic (version 2.6.3).
• mplrsgmp: Multi-core version of lrsgmp, comparable to default mplrs v. 6.2 and earlier.
• mplrs: Hybrid multi-core version of lrs. Default version of mplrs.

Also available in lrslib v. 7.1 are the parallel versions mplrs1, mplrs2, and mplrsflint of the corresponding lrs codes above.

The results of the tests are shown in Table 4. The programs lrs1 and lrs2 either produce the correct output or indicate that an overflow may occur (o) and terminate. The columns lrs and lrsMP give the running times for the hybrid versions. When 64 or 128-bit arithmetic suffices, these two running times are essentially the same. When extended precision is required, the internal MP arithmetic may be much slower than GMP, especially for c40, which requires 459 decimal digits. The final two columns show the speedups obtained by using the multicore versions mplrs and mplrsgmp with 40 processors available. In the table we show in blue which arithmetic version was in use when lrs, shown in red, terminated. As is to be expected, lrs performs roughly the same as lrsgmp when the integers become too large for fixed precision.

         Single Arithmetic       Hybrid Arithmetic        40 cores
Name    lrs1  lrs2  lrsgmp     lrs  lrsMP  lrsflint  mplrsgmp  mplrs
c30      (o)   (o)      36      36    295        42         2      3
c40      (o)   (o)    7707    7581  97532      8341       402    389
km22     (o)   (o)     237     234    384       187         8      8
perm10   288   538    2429     283    284      1120       113     16
vf500    (o)   (o)     139     137   1658       163        12     11
vf900    (o)    23     103      23     23       170         8      2
mit71    (o)   (o)   15489   15474 110467     26432       731    723
fq48      44    77     265      44     44       147        11      1
mit      (o)   134     563     132    131       270        27      7
cp7
bv7
p8-6

(o) = possible overflow detected

Table 4: Comparison of running times for various types of arithmetic, lrslib v. 7.1 (mai48)

Speedups of 2–4 times are observed for the combinatorial problems, which can be solved using only 64 or 128 bits. In general lrsflint did not perform as well as lrs, but it is usually faster than lrsgmp on the combinatorial problems. The approach used in lrsarith hybrid arithmetic allows it to obtain essentially native integer performance, but requires effort from the developer to handle overflow. FLINT is easier for the developer but did not achieve the same performance when all values are small. Another possible reason for the lack of performance is that we did not use FLINT matrices, as this would have required substantial reprogramming of lrs.

We can observe that 128-bit arithmetic runs roughly 1.5–2 times slower than 64-bit arithmetic, when 64-bit arithmetic suffices. Hence there is a strong incentive to start the computation with 64 bits, using 128 bits only when necessary. This latter outcome occurred for mit and vf900.

We introduced lrsarith, a small C library for performing fixed precision arithmetic on integers and rationals with overflow protection that also allows hybrid arithmetic. It was developed as part of the lrslib polyhedral computation package. However, due to its small size and ease of use, we decided to release it as an independent package. The details of the method used and small examples were given in the first part of the paper.

In the second part we gave computational results for some vertex enumeration problems. The examples show the diversity of such problems, and this is reflected in the arithmetic precision required to solve them. Many combinatorial polytopes involve calculations that can be completed without overflow using 64 (or 128) bit integers.
Using fixed arithmetic with overflow checking or hybrid arithmetic, speedups of 2–4 times can be achieved, and these carry over to parallel implementations. The new hybrid version described here explains the performance improvements found in lrslib version 7 relative to previous versions in earlier comparisons [2].
Acknowledgements
We thank David Bremner for many helpful discussions and in particular for the elegant implementation of name mangling. This work was partially supported by JSPS Kakenhi Grants 16H02785, 18H05291, 18K18027, 20H00579 and 20H00595.
References

[1] D. Avis. http://cgm.cs.mcgill.ca/~avis/C/lrs.html.
[2] David Avis and Charles Jordan. mplrs: A scalable parallel vertex/facet enumeration code. Mathematical Programming Computation, 10(2):267–302, 2018.
[3] David Barina. Convergence verification of the Collatz problem. The Journal of Supercomputing, 2020.
[4] T. Christof and A. Loebel. http://porta.zib.de.
[5] William B. Hart. Fast library for number theory: an introduction. In Mathematical Software – ICMS 2010, volume 6327 of Lecture Notes in Computer Science, pages 88–91, 2010.
[6] Jeffrey C. Lagarias. The 3x+1 problem: An annotated bibliography (1963–1999) (sorted by author). arXiv:math/0309224, 2003.
[7] Jeffrey C. Lagarias. The 3x+1 problem: An annotated bibliography, II (2000–2009). arXiv:math/0608208, 2006.
[8] Günter M. Ziegler. Lectures on polytopes. Springer, 1995.
A Single arithmetic code for repeated squaring
Listing 1: fixed.c

#include <stdio.h>
#include <stdlib.h>
#include "lrsarith.h"

FILE *lrs_ifp, *lrs_ofp;

void lrs_overflow(int parm)
{
  fprintf(lrs_ofp, "overflow detected: halting\n");
  exit(1);
}

int main(void)
{
  long i, k;
  lrs_mp a, b;

  lrs_mp_init(10, stdin, stdout);
  lrs_alloc_mp(a);
  lrs_alloc_mp(b);
  fscanf(lrs_ifp, "%ld", &k);
  itomp(k, b);
  pmp("", b);
  for (i = 1; i <= 6; i++)
  {
    copy(a, b);
    mulint(a, a, b);
    pmp(" ", b);
  }
  lrs_clear_mp(a);
  lrs_clear_mp(b);
  fprintf(lrs_ofp, "\n");
  return 0;
}

Listing 2: makefile

fixed: fixed.c lrsarith.c lrsarith.h lrslong.c lrslong.h lrsgmp.c lrsgmp.h lrsmp.c lrsmp.h
	$(CC) -DLRSLONG -DSAFE -o fixed1 lrsarith.c fixed.c
	$(CC) -DLRSLONG -DB128 -DSAFE -o fixed2 lrsarith.c fixed.c
	$(CC) -DLRSLONG -o fixed1n lrsarith.c fixed.c
	$(CC) -DLRSLONG -DB128 -o fixed2n lrsarith.c fixed.c
	$(CC) -DMP -o fixedmp lrsarith.c fixed.c
	$(CC) -DGMP -o fixedgmp lrsarith.c fixed.c -lgmp

B Hybrid arithmetic code for repeated squaring
Listing 3: hybridlib.c

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include "lrsarith.h"

#define run suf(run)

static jmp_buf buf1;        /* return location when overflowing */

int run(long k)
{
  long i;
  lrs_mp a, b;

  lrs_mp_init(0, stdin, stdout);
  lrs_alloc_mp(a);
  lrs_alloc_mp(b);
  itomp(k, b);
  pmp("", b);
  if (!setjmp(buf1))        /* overflow test */
  {
    for (i = 1; i <= 6; i++)
    {
      copy(a, b);
      mulint(a, a, b);
      pmp(" ", b);
    }
    lrs_clear_mp(a);
    lrs_clear_mp(b);
    return 0;
  }
  printf("overflow detected: restarting\n");
  lrs_clear_mp(a);
  lrs_clear_mp(b);
  return 1;
}

void lrs_overflow(int parm)
{
  longjmp(buf1, 1);
}

#if defined(MA) && defined(LRSLONG)
#ifdef B128
int run2(long k) {          /* 128 bit */
#else
int run1(long k) {          /* 64 bit */
#endif
#else
int runmp(long k) {         /* other arithmetic */
#endif
  return (run(k));
}

Listing 4: hybrid.h

#include <stdio.h>
#include <stdlib.h>

FILE *lrs_ifp;              /* input file pointer */
FILE *lrs_ofp;              /* output file pointer */

int run1(long k);
int run2(long k);
int runmp(long k);

Listing 5: hybrid.c

#include "hybrid.h"

int main(void)
{
  long k;
  scanf("%ld", &k);
  if (run1(k))              /* TRUE on 64 bit overflow */
    if (run2(k))            /* TRUE on 128 bit overflow */
      runmp(k);
  printf("\n");
  return 0;
}

Listing 6: makefile

HOBJ=lrslong1.o hybridlib1.o lrslong2.o lrsgmp.o hybridlib2.o hybridlibgmp.o

hybrid: hybrid.c lrslong.c lrslong.h lrsgmp.c lrsgmp.h hybridlib.c hybrid.h lrsarith.c
	$(CC) -DMA -DSAFE -DLRSLONG -c -o lrslong1.o lrsarith.c
	$(CC) -DMA -DB128 -DSAFE -DLRSLONG -c -o lrslong2.o lrsarith.c
	$(CC) -DMA -DGMP -c -o lrsgmp.o lrsarith.c
	$(CC) -DMA -DSAFE -DLRSLONG -c -o hybridlib1.o hybridlib.c
	$(CC) -DMA -DB128 -DSAFE -DLRSLONG -c -o hybridlib2.o hybridlib.c
	$(CC) -DMA -DGMP -c -o hybridlibgmp.o hybridlib.c
	$(CC) -DMA -DLRSLONG -DSAFE -o hybrid ${HOBJ} hybrid.c -lgmp